INLINE ASSEMBLER IN DELPHI (II) - ANSI STRINGS
By Ernesto De Spirito
In this chapter we will learn a few more assembler instructions, and the
basics of working with ANSI strings, also called long strings.
New opcodes
===========
These are the opcodes introduced in this article:
* JL (Jump if Less): A complete description would take too long, so
let's just say that JL jumps to the specified label if, in the
previous CMP (or SUB) operation, the first operand was less than the
second operand in a signed comparison:
// if signed(op1) < signed(op2) then goto @@label;
cmp op1, op2
jl @@label
JG (Jump if Greater), JLE (Jump if Less or Equal), and JGE (Jump if
Greater or Equal) complete the family of conditional jumps for
signed comparisons.
* JA (Jump if Above): jumps to the specified label if, in the previous
CMP (or SUB) operation, the first operand was greater than the second
one, with both operands treated as unsigned values:
// if unsigned(op1) > unsigned(op2) then goto @@label;
cmp op1, op2
ja @@label
JB (Jump if Below), JBE (Jump if Below or Equal), and JAE (Jump if
Above or Equal) complete the family of conditional jumps for unsigned
comparisons.
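* LOOP: Decrements ECX and, if the result is not zero, jumps to the
specified label. LOOP @@label is a shorter equivalent of the pair
below (unlike DEC, LOOP leaves the flags untouched, and on many
processors it is not faster than the pair):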
dec ecx // ECX := ECX - 1;
jnz @@label // if ECX <> 0 then goto @@label
Example:
xor eax, eax // EAX := EAX xor EAX; // EAX := 0;
mov ecx, 5 // ECX := 5;
@@label:
add eax, ecx // EAX := EAX + ECX; // Executed 5 times
loop @@label // Dec(ECX); if ECX <> 0 then goto @@label;
// EAX ends up with the value 15 (5+4+3+2+1)
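The difference between the signed and the unsigned jumps is easiest to
see with a negative number. The following is a minimal sketch, not part
of the original example set: the function names IsLessSigned and
IsBelowUnsigned are hypothetical, and the code assumes 32-bit Delphi
with the default register calling convention (first parameter in EAX,
second in EDX, Boolean result in AL).

```pascal
// Assumption: 32-bit Delphi, register calling convention
// (a in EAX, b in EDX, Boolean result returned in AL).

function IsLessSigned(a, b: Integer): Boolean;
asm
  cmp eax, edx     // Compare a with b (sets the flags)
  mov eax, 1       // Result := True; MOV leaves the flags untouched
  jl  @@done       // Keep True if signed(a) < signed(b)
  xor eax, eax     // Otherwise Result := False
@@done:
end;

function IsBelowUnsigned(a, b: Integer): Boolean;
asm
  cmp eax, edx     // Same comparison, same flags
  mov eax, 1       // Result := True
  jb  @@done       // Keep True if unsigned(a) < unsigned(b)
  xor eax, eax     // Otherwise Result := False
@@done:
end;
```

With these definitions, IsLessSigned(-1, 1) is True, while
IsBelowUnsigned(-1, 1) is False, because the bit pattern of -1
($FFFFFFFF) is the largest possible unsigned 32-bit value.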
Working with ANSI strings
=========================
A string variable is represented by a 32-bit pointer. If the string is
the empty string (''), then the pointer is Nil (zero), otherwise it
points to the first character of the string. The length of the string
and the reference count are integers stored at negative offsets from
the position of the first byte (in the layout shown here, they are
preceded by the allocated size):
  +-----------+
  | s: string |-------------------------+
  +-----------+                         |
                                        V
--+-----------+-----------+-----------+---+---+---+---+---+---+---+--
  | allocSiz  |  refCnt   |  length   | H | e | l | l | o | ! | #0|
--+-----------+-----------+-----------+---+---+---+---+---+---+---+--
    (longint)   (longint)   (longint)
  \-----------------v-----------------/
              StrRec record

const skew = sizeof(StrRec); // 12
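As a hedged illustration of this layout, the fields can be read from
Object Pascal code with plain pointer arithmetic. This is only a sketch:
PeekLength, PeekRefCount and the PLongintField type are hypothetical
names, and the offsets (-4 for the length, -8 for the reference count)
are assumed to match the record pictured above. Production code should
use Length instead of depending on this layout.

```pascal
type
  PLongintField = ^Longint; // Hypothetical helper type for this sketch

// Reads the length field located 4 bytes before the first character.
function PeekLength(const s: AnsiString): Longint;
begin
  if Pointer(s) = nil then
    Result := 0 // The empty string is a Nil pointer: no header to read
  else
    Result := PLongintField(PAnsiChar(Pointer(s)) - 4)^;
end;

// Reads the reference count located 8 bytes before the first character.
function PeekRefCount(const s: AnsiString): Longint;
begin
  if Pointer(s) = nil then
    Result := 0 // No header exists for the empty string
  else
    Result := PLongintField(PAnsiChar(Pointer(s)) - 8)^;
end;
```

For any string s, PeekLength(s) should agree with Length(s). The value
returned by PeekRefCount depends on how the string was created, so it
is best treated as a debugging aid only.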
When we pass a string as a parameter to a function, what is passed is
just the 32-bit pointer. Strings as return values are more difficult to
explain. The caller of a function returning a string must pass --as an
invisible last parameter of PString type-- the address of the string
variable that will hold the result of the function.
d := Uppercase(s); // Internally converted to: Uppercase(s, @d);
If the result of the function will be used in an expression rather than
assigned directly to a variable, the caller must use a temporary
variable initialized to Nil (the empty string). The compiler does that
for us in Object Pascal code, but we have to do it ourselves if we call
string-returning functions from assembler code.
For some tasks, we can't call the classic string functions directly. For
example, Length isn't the name of a real function. It's a construct
built into the compiler, and the compiler generates code to call the
appropriate function, depending on whether the parameter is a string or
a dynamic array. In assembler, instead of Length, we have to call the
function _LStrLen (declared in the System unit) to get the string
length.
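To make the idea concrete, here is a minimal sketch of what a length
routine has to do given the string layout described earlier. It is not
the actual source of _LStrLen: the name AsmStrLen is hypothetical, and
the code assumes 32-bit Delphi, with the string passed in EAX and the
length stored 4 bytes before the first character.

```pascal
// Assumption: 32-bit Delphi, s passed in EAX, result returned in EAX.
function AsmStrLen(const s: AnsiString): Integer;
asm
  test eax, eax        // Is the pointer Nil (empty string)?
  jz   @@done          // Yes: EAX is already 0, which is the length
  mov  eax, [eax - 4]  // No: load the length field before the text
@@done:
end;
```

The Nil check matters because an empty string has no header at all, so
reading [eax - 4] with EAX = 0 would access invalid memory.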
There are more things we should know about strings, but we have enough
for a first example.
Assembler version of Uppercase
==============================
This is the declaration of the function:
function AsmUpperCase(const s: string): string;
The parameter "s" will be passed in EAX, and the address of the "Result"
will be passed as a second parameter, i.e., in EDX.
Basically, this function should:
1) Get the length of the string to convert
2) Allocate memory for the result string
3) Copy the characters, converting them to uppercase
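Before going through the assembler code, it may help to see the same
three steps in plain Object Pascal. This is only a reference sketch for
comparison: the name PasUpperCase is hypothetical, and AnsiString is
used explicitly so that each character is one byte.

```pascal
function PasUpperCase(const s: AnsiString): AnsiString;
var
  i, n: Integer;
  c: AnsiChar;
begin
  n := Length(s);              // 1) Get the length of the string
  SetLength(Result, n);        // 2) Allocate memory for the result
  for i := 1 to n do           // 3) Copy, converting to uppercase
  begin
    c := s[i];
    if (c >= 'a') and (c <= 'z') then
      c := AnsiChar(Ord(c) - (Ord('a') - Ord('A')));
    Result[i] := c;
  end;
end;
```

Like the assembler version developed below, this sketch only converts
the unaccented letters 'a' to 'z' and copies every other character
unchanged.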
1) Get the length of the string to convert
------------------------------------------
We'll do this by calling System.@LStrLen. The function expects the
string in EAX (we already have it there) and returns the length in EAX,
so we have to save the value of EAX (the parameter "s") somewhere before
the call to avoid losing it. We can save it in a local variable "src".
Since functions are free to use EAX, ECX and EDX, we should assume that
the value of EDX ("@Result") could also be destroyed by the call to
System.@LStrLen, so we should save it first, for example in a local
variable "pdst". The result of System.@LStrLen, left in EAX, will be
passed as a parameter to System.@LStrSetLength (to allocate memory for
the content of the result string), and later we need it again to count
the bytes to be copied, so we also have to save it, for example in a
variable "n":
var
pdst: Pointer; // Address of the result string
src: PChar; // Source string
n: Integer; // String length
asm
// The address of the result string is passed in EDX.
// We save it in a local variable (pdst):
mov pdst, edx // pdst := EDX;
// Save EAX (s) in a local variable (src)
mov src, eax // src := EAX;
// n := Length(s);
call System.@LStrLen // EAX := _LStrLen(EAX);
mov n, eax // n := EAX;
2) Allocate memory for the result string
----------------------------------------
This is accomplished by calling System.@LStrSetLength. The procedure
expects two parameters: the address of the string (we saved it in
"pdst"), and the length of the string (we have it in EAX).
// SetLength(pdst^, n); // Allocates result string
mov edx, eax // EDX := n; // Second parameter for LStrSetLength
mov eax, pdst // EAX := pdst; // First parameter for LStrSetLength
call System.@LStrSetLength // _LStrSetLength(EAX, EDX);
3) Copy the characters, converting them to uppercase
----------------------------------------------------
If the length of the string was zero, we are done:
// if n = 0 then exit;
mov ecx, n // ECX := n;
test ecx, ecx // And ECX with ECX to set flags (ECX unchanged)
jz @@end // Go to @@end if the zero flag is set (ECX=0)
Otherwise, we should copy the characters from one string to the other,
converting them to uppercase as needed. We are going to use ESI and EDX
to point to the characters of the source string and the result string
respectively, AL to load a character from the source string and change
it before storing it in the destination string, and ECX with the LOOP
instruction to count the characters. Since ESI is a register we must
preserve, we have to save its value in order to restore it later. I
decided to save ESI by pushing it on the stack.
push esi // Save ESI on the stack
// Initialize ESI and EDX
mov eax, pdst // EAX := pdst; // Address of the result string
mov esi, src // ESI := src; // Source string
mov edx, [eax] // EDX := pdst^; // Result string
@@cycle:
mov al, [esi] // AL := ESI^;
// if Shortint(AL) < Shortint(Ord('a')) then goto @@nochange
cmp al, 'a'
jl @@nochange
// AL in ['a'..#127]
// if Byte(AL) > Byte(Ord('z')) then goto @@nochange
cmp al, 'z'
ja @@nochange
// AL in ['a'..'z']
sub al, 'a'-'A' // Dec(AL, Ord('a')-Ord('A'));
@@nochange:
mov [edx], al // EDX^ := AL;
inc esi // Inc(ESI);
inc edx // Inc(EDX);
loop @@cycle // Dec(ECX); if ECX <> 0 then goto @@cycle
pop esi // Restore ESI from the stack
@@end:
end;
________________________________________________________________________