I’m coding in NASM, and I don’t get what is going on. The Linux distro is 64-bit Ubuntu 16, but NASM is assembling 32-bit code.
Expected output –> "Number is: 2"
Actual output –> "number is: 134520868"
Code:
%include "io.inc"
section .data
n1 db 2 ; i know it's a bad practice to define a db variable, it's just for a test
msg: db 'number is: %d',10,0
section .text
extern printf
global main
main:
push ebp
mov ebp, esp
push dword n1
push msg
call printf
mov esp, ebp
pop ebp
ret
I’ve tried defining n1 with dd, pushing the contents of n1, and even going through a register like eax.
Update: even when I do push dword [n1], the only thing that changes is that the output is now "Number is: 1836404226"…
3 Answers
You want a 32-bit number, so use dd, and you don’t want to print the address of the variable, but its contents. So declare the variable as n1 dd 2 and push it with push dword [n1].
It is not bad practice.

If the variable is meant to be a byte, then using db is the absolute right thing to do.

On the other hand, if the variable is not meant to be a byte, then using db is still not bad practice; it is a mistake.

It being a test does not somehow invert the laws of the universe and make wrong things right.
You have two options:

If you are targeting 16-bit x86, where an int is 16 bits long, declare your variable using dw, which is 16 bits, and use push word [n1] or push [word ptr n1] or push [n1] depending on your assembler’s syntax flavor (NASM takes the first form). The rest of the code remains the same.

Alternatively, keep the variable defined with db, but use movsx ax, byte [n1] or movzx ax, byte [n1] followed by push ax to sign-extend or zero-extend that byte into a word, and then push it onto the stack. The rest of the code remains the same.

If you are targeting 32-bit code, as the question says and as the number 1836404226 suggests, then use dd instead of dw, use dword instead of word, and use eax instead of ax.

To answer the actual question, your printf() prints a large number because you are not printing the actual variable, you are printing the address of the variable. (And addresses tend to be arbitrary large numbers.)

However, even if you did push word [n1] it would still print some nonsense number, because n1 has been defined with db, so it is a byte, and this byte is followed by another byte (the letter ‘n’ in your case), and when these two bytes are read together as a word they form some other nonsensical number which is different from 2.

First of all,
push n1 pushes the address. push dword [n1] loads 4 bytes from memory and pushes them. (See: Basic use of immediates vs. square brackets in YASM/NASM x86 assembly.)
msg: db ... comes right after n1: db 2, so a 4-byte (dword) load gets 3 bytes of ASCII characters as the high bytes of the integer. A %d format takes all 4 of those bytes as the int to print.

x86 is little-endian, so if you’d used %x in your format string you’d see 0x6d756e02 for the code as posted; notice that the low byte is the 2 you loaded. 6e is 'n' in ASCII / UTF-8, etc. (The 1836404226 you actually got is 0x6d754e02, where 4e is 'N', which matches the capitalized "Number is:" in your output.) You could also use GDB to look at stack memory before the call. See the bottom of the x86 tag wiki for asm GDB tips.

It’s not bad practice to have a char, uint8_t, or int8_t global variable. It’s no worse than a uint32_t global. Whatever size your data is, you need to use appropriate instructions for it. This is assembly, there’s no compiler to implicitly convert types for you.

In this case, you need to load just a byte if you don’t want to pull in other garbage.
The things you can do include:
Zero- or Sign-extend a byte into a register and push that
movzx eax, byte [n1] / push eax will load just a byte from memory, zero-extending to 32 bits (the width of EAX). movsx is the same but with sign-extension.

This is what a C compiler would do for printf("...", n1) with int8_t n1 = 2;
Load high garbage but tell printf to only look at the low byte

Since you know you have valid data after your byte in this case, you can’t segfault from going off the end of a page into an unmapped one.

push dword [n1] and use %hhd in your format string to treat the arg as int8_t, only looking at the low byte. See the Glibc printf man page.

Loading garbage past the end of a variable isn’t something you can express in C, except maybe with memcpy(&tmp_int, &n1, sizeof(tmp_int));. A very clever compiler could potentially do this asm optimization if you used a %hhd format string so it knew the high 3 bytes of what it pushed didn’t matter. Your n1 happens to be 4-byte aligned (since it’s at the start of your .data in this file), so a 4-byte load can’t be split across cache lines, so there’s no downside.

Note that standard calling conventions including i386 System V allow narrow args to contain high garbage; they don’t have to be zero or sign extended to 32-bit. (Except maybe as an undocumented extension required by clang, at least that’s the case for register args in x86-64 Sys V.) And anyway, printf in C terms is taking an int and %hd / %hhd conversions truncate it: since it’s variadic, the default argument promotions apply, so narrow integer types promote to int.

Reserve 4 bytes for your global

n1: dd 2 allows push dword [n1] to load 0x00000002 from memory, so a %d conversion to print the whole thing as an int will print just 2.

In C terms, this is static int n1 = 2; instead of static int8_t n1 = 2;