32-bit and 64-bit Systems, CPU Registers, Assembly Language, The Call Stack, ...
2022-02-21 . 3 min read
32-bit and 64-bit Systems
A 32-bit or 64-bit system is characterized by how much data its CPU registers can hold at once: a 32-bit system has 32-bit registers, and a 64-bit system has 64-bit registers. The CPU reads data as a sequence of bits, and the machine code it executes is just such a sequence. Eight bits make a byte, and since the byte is the smallest unit the CPU addresses, it is more accurate to say that the computer is fed bytes rather than individual bits. For example, when we write the instruction mov eax, 5 in assembly language, the assembler turns it into the following machine code:
b8 05 00 00 00
Each pair of hex digits is a byte. The first byte, b8, is the opcode; it is the hex representation of the binary number 10111000 and tells the CPU to move the following four bytes, 05 00 00 00, into the eax register. Because eax is 32 bits wide, this form of the instruction always carries a 4-byte value, so even a small integer like 5 takes up 4 bytes. Written the way we normally read numbers, the 4-byte representation of 5 would be 00 00 00 05, but notice that the bytes appear in the opposite order in the machine code. This is because of endianness, the order in which the bytes of a value are stored in memory. x86 is little-endian, and most ARM systems run in little-endian mode too, so the least significant byte comes first. Hence the reversed ordering.
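To see the byte ordering for yourself, here is a minimal Python sketch using the standard struct module. It packs the number 5 as a 4-byte value in both byte orders and then rebuilds the bytes b8 05 00 00 00 quoted above. The variable names are my own, chosen only for illustration.

```python
import struct

# Pack the integer 5 as a 4-byte unsigned value in both byte orders.
little = struct.pack("<I", 5)  # little-endian, the order x86 uses
big = struct.pack(">I", 5)     # big-endian, the order we read numbers in

# The byte 0xb8 followed by the little-endian value gives the
# machine code from the example above.
machine_code = bytes([0xB8]) + little

print(little.hex(" "))        # 05 00 00 00
print(big.hex(" "))           # 00 00 00 05
print(machine_code.hex(" "))  # b8 05 00 00 00
```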
A 32-bit system can directly address only 4 GB of RAM. Do you know why?
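One way to answer the question is a bit of arithmetic. The sketch below assumes that a memory address is 32 bits wide and that every address names one byte, which is the usual setup on 32-bit x86. It ignores extensions such as PAE that let the hardware reach more physical memory.

```python
ADDRESS_BITS = 32  # assumed width of a memory address

# Each distinct bit pattern is one address, and each address is one byte.
addressable_bytes = 2 ** ADDRESS_BITS
addressable_gib = addressable_bytes / 2 ** 30  # bytes -> GiB

print(addressable_bytes)  # 4294967296
print(addressable_gib)    # 4.0
```

With 32 bits there are only 2^32 different addresses, so anything beyond the 4 GB mark simply has no address the CPU can name.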
Assembly Language
Many people like C because it sits close to the hardware: the level of abstraction is low, and the programmer has a lot of control over the machine. Assembly is a few steps below C. An instruction such as mov eax, 5 is written in assembly language following the x86 ISA (instruction set architecture). Each family of CPUs has its own ISA, and as ARM's website puts it:
The ISA acts as an interface between the hardware and the software, specifying both what the processor is capable of doing as well as how it gets done.
Register
A CPU has many registers, and each has its own purpose. They can be divided into four types, as shown in the figure below:
The figure is taken from the online resource begin.re, which I found really helpful when learning assembly. The registers shown above are 32-bit registers; their 64-bit counterparts are named RAX, RSI, RBP, and so on.
Earlier 16-bit CPUs had only 16-bit registers such as AX, which live on as the lower half of the 32-bit registers (EAX), and the 32-bit registers are in turn the lower half of the 64-bit registers (RAX) in the CPU you are probably using right now. Because the registers are nested this way, a 64-bit x86 CPU can usually still run 32-bit programs. Now, enough about this; let's talk about the types of registers we see above. The general-purpose registers (GPRs), as the name suggests, are for general purposes such as holding data for the ALU to operate on. Some of them also have conventional roles, like the EBP register, which we will talk about in the Call Stack section.
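The nesting of register sizes can be sketched with plain bit masks. This is only an illustration in Python, not real register access: the variable names mirror the x86 names RAX, EAX, AX, AH and AL, and the starting value is made up.

```python
# A made-up 64-bit value standing in for the contents of RAX.
rax = 0x1122334455667788

eax = rax & 0xFFFFFFFF   # EAX: the low 32 bits of RAX
ax = rax & 0xFFFF        # AX:  the low 16 bits of EAX
al = rax & 0xFF          # AL:  the low 8 bits of AX
ah = (rax >> 8) & 0xFF   # AH:  the high 8 bits of AX

print(hex(eax))  # 0x55667788
print(hex(ax))   # 0x7788
print(hex(ah))   # 0x77
print(hex(al))   # 0x88
```

A 32-bit program that only ever talks about EAX is, in effect, using the lower half of RAX.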
The code we write is compiled and stored in memory, and the CPU fetches it from memory one instruction at a time to execute it. The instruction pointer (EIP) always holds the memory address of the next instruction to be fetched.
Another interesting register is EFLAGS, a 32-bit register whose individual bits are flags that record information about operations. For example, the zero flag is set when the result of the previous operation is zero. A conditional jump instruction can then use it to set the instruction pointer to some memory location if the condition is met. A conditional jump such as jz (jump if zero) checks whether the zero flag is set, and if it is, the next instruction is taken from the memory location given as its operand. So whenever we write a conditional check in an if-else block or a while loop, something like this is going on under the hood.
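To make the idea concrete, here is a toy interpreter in Python. It is not real x86, just a sketch under my own assumptions: the program is a list of tuples, the list index plays the role of the instruction pointer, and only three instructions (cmp, jz, mov) exist. The function name run and the register values are hypothetical.

```python
def run(program, registers):
    """Run a toy program and return the indexes of the executed instructions."""
    zero_flag = False
    ip = 0  # plays the role of EIP
    trace = []
    while ip < len(program):
        op, *args = program[ip]
        trace.append(ip)
        if op == "cmp":
            # cmp subtracts its operands and keeps only the flags.
            zero_flag = (registers[args[0]] - args[1]) == 0
            ip += 1
        elif op == "jz":
            # Jump if the zero flag is set, otherwise fall through.
            ip = args[0] if zero_flag else ip + 1
        elif op == "mov":
            registers[args[0]] = args[1]
            ip += 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    return trace


program = [
    ("cmp", "eax", 5),  # 0: is eax equal to 5?
    ("jz", 3),          # 1: if so, skip the next instruction
    ("mov", "ebx", 0),  # 2: runs only when eax != 5
    ("mov", "ecx", 1),  # 3: code after the branch
]

regs = {"eax": 5, "ebx": 99, "ecx": 0}
trace = run(program, regs)
print(trace)        # [0, 1, 3]
print(regs["ebx"])  # 99
```

Because eax equals 5, the comparison sets the zero flag, the jump is taken, and instruction 2 never runs. That is roughly what an if statement compiles down to.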
The Call Stack
A stack is a data structure that can be thought of as a stack of books or a stack of plates, but in the call stack we stack function calls. Whenever a function is called, a block of memory for that call, its stack frame, is pushed onto the call stack. The frame is laid out in a particular way. We won't go into much detail, but in a typical 32-bit x86 calling convention the function's arguments are pushed first, then the return address, then the caller's saved base pointer, and finally space is reserved for the local variables. The function's instructions themselves are not copied onto the stack; they stay where they were loaded in memory. The base pointer (EBP) points to the base of the currently running function's frame, and the stack pointer (ESP) points to the top of the stack, which moves as values are pushed and popped. Hence, when we want to access a particular argument or local variable, we use EBP as a fixed reference and add or subtract an offset to reach the address where it is located.
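Here is a toy model of a single function call in Python. It assumes a typical 32-bit x86 convention: arguments are pushed first, then the return address, then the caller's EBP, and locals sit below EBP. The dictionary standing in for memory, the starting addresses, and the return address are all made up for illustration.

```python
memory = {}                         # address -> 4-byte value
regs = {"esp": 0x1000, "ebp": 0}    # made-up starting values


def push(value):
    regs["esp"] -= 4                # the stack grows toward lower addresses
    memory[regs["esp"]] = value


def pop():
    value = memory.pop(regs["esp"])
    regs["esp"] += 4
    return value


# Caller: push the argument, then the return address (what `call` does).
push(7)                             # argument
push(0x401234)                      # return address (made-up)

# Callee prologue: save the caller's EBP and start a new frame.
push(regs["ebp"])
regs["ebp"] = regs["esp"]
regs["esp"] -= 4                    # reserve one 4-byte local variable
memory[regs["ebp"] - 4] = 42        # local = 42

# Inside the function everything is found relative to EBP.
argument = memory[regs["ebp"] + 8]
local = memory[regs["ebp"] - 4]

# Callee epilogue: drop the locals, restore EBP, pop the return address.
regs["esp"] = regs["ebp"]
regs["ebp"] = pop()
return_address = pop()

print(argument, local, hex(return_address))  # 7 42 0x401234
```

Notice that ESP moves with every push and pop, while EBP stays put for the whole call. That is why EBP makes a convenient reference point.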
A function can use only a finite amount of the call stack's memory. When a program writes more data into a buffer than the buffer was given, we get a buffer overflow; when the call stack itself runs out of space, for example through very deep recursion, we get a stack overflow. Vulnerabilities like buffer overflows can let an attacker read or overwrite parts of memory that only the program is supposed to touch. It is partly the job of the programming language to restrict access to memory. C had major problems with functions like gets(), strcpy(), and strcmp(), which place no limit on how many bytes they read or write. Instead of these, we should use fgets(), strncpy(), and strncmp(), which take an explicit size limit. For people doing asynchronous programming, for example fetching data from a remote server in JavaScript, it can also be very beneficial to know and understand the call stack.
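Python will not let us smash real memory, but the idea can be simulated. In the sketch below a bytearray stands in for a stack frame: an 8-byte buffer followed directly by a 4-byte return address. This layout is simplified (a real frame usually has more between the two), and unsafe_copy and bounded_copy are hypothetical helpers that only loosely mimic gets() and fgets().

```python
BUFFER_SIZE = 8
RETURN_ADDRESS = b"\x34\x12\x40\x00"  # made-up address, little-endian


def fresh_frame():
    # An 8-byte buffer followed directly by the "return address".
    return bytearray(BUFFER_SIZE) + bytearray(RETURN_ADDRESS)


def unsafe_copy(frame, data):
    """Loosely like gets(): copies everything, ignoring the buffer size."""
    frame[0:len(data)] = data


def bounded_copy(frame, data, size):
    """Loosely like fgets(): never writes more than `size` bytes."""
    chunk = data[:size]
    frame[0:len(chunk)] = chunk


payload = b"A" * 12  # 4 bytes longer than the buffer

frame = fresh_frame()
unsafe_copy(frame, payload)
overwritten = bytes(frame[BUFFER_SIZE:])

frame = fresh_frame()
bounded_copy(frame, payload, BUFFER_SIZE)
intact = bytes(frame[BUFFER_SIZE:])

print(overwritten)               # b'AAAA'
print(intact == RETURN_ADDRESS)  # True
```

In the unsafe case the last four bytes of the input land on top of the return address. In a real program that is exactly the value an attacker wants to control.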
And many more
This is just the tip of the iceberg. Many other processes and operations are happening behind your screen and under your keyboard, but the level of abstraction we have today has made our lives much easier. We do not have to think about all of this when we program a computer: when we want to print something on the screen, we don't have to think about the syscalls, the interrupts, the hardware access, user mode, kernel mode, and so on. The abstractions all around us make these things seem like magic, and it blows my mind to think about the time when people had to write in assembly and know about the call stack, the hardware interrupts, and the rest.
References and Resources
- Begin.re
- Binary Exploitation - Live Overflow
- The Internet
Hey, I assume you have finished reading. I would love to hear your feedback, and if you found any error or mistake in this blog post, please do not hesitate to reach out to me.