Binary Exploitation

Welcome to Binary Exploitation, popularly also known as Pwning.

What’s a binary?

Say you wrote some C code and compiled it. The compiler gives you a file that you can run, say with ./a.out. That file is a binary: it contains the machine code that is actually executed on the machine. The binary format depends on the target OS: Linux uses the ELF format, and Windows uses the PE format (the one behind .exe files).

What’s binary exploitation?

Suppose you find a binary running on some server which reads input from the user. Binary exploitation is the process of exploiting (read “hacking”) that binary by feeding it malicious input, so that it does something its author never intended, for example spawning a shell or leaking internal data. In short, we force it to do what we want!

In CTFs, we’re usually asked either to pop a shell or to read a file named “flag.txt”.

Okay, but what’s this “pwning” word?

Well, “pwn” is leetspeak slang for “own”, said to have been created accidentally by mistyping “own”, since “O” and “P” sit next to each other on QWERTY keyboards. As Wikipedia states:

In script kiddie jargon, pwn means to compromise or control, specifically another computer (server or PC), website, gateway device, or application.{:.info}

In binary exploitation, our goal is indeed to “pwn” the system, hence the term “pwning”.

Well well well, but how do you “hack” a binary?

Pwning is a skill! It takes quite some hard work and patience. But but but, you should at least give it a try? Maybe your love (unknown :wink:) for Assembly will grow, and maybe you’ll become a better human, huh? Anyways, let’s get into this.

Most of pwning nowadays revolves around Memory Corruption!

Memory corruption occurs in a computer program when the contents of a memory location are modified due to programmatic behavior that exceeds the intention of the original programmer or program/language constructs.{:.info}

If someone could write perfect code, there would be no exploits at all!

All binary and web exploits occur because of programming errors.
- Sahil Jain XD {:.info}

Lemme show you the kinds of vulnerabilities that commonly occur in binaries:

Buffer Overflows

Buffer overflow is probably the best-known form of software security vulnerability.

Despite being so well known, buffer overflows are still very common in deployed software.

But what is a buffer overflow?

A buffer overflow occurs when a program writes, or attempts to write, more data into a buffer than it can hold. If there are no boundary checks, the extra data lands in the memory area just past the buffer. Writing past the allocated memory can corrupt data, crash the program or cause the execution of malicious code!
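To make the “data lands past the buffer” part concrete, here is a small sketch. A real overflow is undefined behavior, so this only simulates one inside a struct whose layout we control. The names record, is_admin, unchecked_copy and checked_copy are made up for illustration, and the demo assumes the compiler puts no padding between the two fields (which is what common compilers do for char-sized members).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a 4-byte buffer with a flag sitting right after it. */
struct record {
    char buf[4];
    unsigned char is_admin;
};

/*
 * Simulates a copy with no check against the size of buf.
 * The length is clamped to the size of the whole record only so that
 * the demo itself stays well-defined; a real bug has no such clamp.
 */
void unchecked_copy(struct record *r, const char *input, size_t len)
{
    if (len > sizeof *r)
        len = sizeof *r;
    memcpy(r, input, len);          /* bytes 4 and up spill into is_admin */
}

/* The fix: never copy more than buf itself can hold. */
void checked_copy(struct record *r, const char *input, size_t len)
{
    if (len > sizeof r->buf)
        len = sizeof r->buf;
    memcpy(r->buf, input, len);
}
```

Calling unchecked_copy(&r, "AAAA\x01", 5) on a zeroed record fills buf with AAAA and flips is_admin to 1, even though nobody ever assigned to it. With checked_copy the fifth byte is simply dropped and is_admin stays 0.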

There are two kinds of BOFs (Buffer OverFlows), depending on where the buffer is located in memory:

Stack Overflows

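These occur when the vulnerable buffer resides on the stack. They are comparatively easy to exploit since function calls go through the stack as well: the saved return address sits near the local buffers, so an overflow can overwrite it and redirect execution!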

Heap Overflows

These occur when the vulnerable buffer resides on the heap. Exploiting them is a real skill :)

But how do BOFs arise?

Lemme show you some code first.

  char a[4];
  gets(a);

What could possibly go wrong over here? Notice that the char array has a size of 4. And what is the gets function? Its prototype is char * gets ( char * str );
Reads characters from the standard input (stdin) and stores them as a C string into str until a newline character or the end-of-file is reached{:.info}

gets has no idea how big the buffer is, so it will read any amount of data :boom: You could just as well type a 10-character string into the array a. gets would then write 11 bytes (the string plus the terminating null byte) into a 4-byte array, running past its boundary. This is how BOFs arise. It is also why gets was removed from the language in C11.
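For contrast, here is a minimal sketch of reading the same input with fgets, which takes the buffer size as an argument. The helper name read_line_bounded is made up for this post, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: reads at most size - 1 characters of one line
 * into buf and always null-terminates it.
 * Returns the number of characters stored, or -1 on EOF or error.
 */
long read_line_bounded(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return (long)strlen(buf);
}
```

With char a[4] and the call read_line_bounded(stdin, a, sizeof a), a 10-character line only puts its first 3 characters (plus the null byte) into a. The rest of the line stays in the input stream for the next read instead of trampling over memory.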

Format String Vulnerabilities

To understand these vulnerabilities, we first need to dive into some C stuff.
Format specifiers, which start with a % character in C, say how a piece of data should be converted to text and where it should be inserted. Format strings are strings containing format specifiers; at runtime they are turned into ordinary strings, with each specifier replaced by the actual data. Format functions are the input/output functions, in C and many other languages, which usually take a format string as their first argument and use the rest of the arguments to fill in the specifiers.

Well, if you’re new to this stuff, the above sentences must have been tiresome to read, so lemme show an example -

  printf("%d",5);

Here %d is a format specifier which interprets the corresponding argument as an int, and hence the above line prints 5.

  const char *s = "hello";
  printf("%s\n",s);

Here %s interprets the corresponding argument as a string, or rather, a pointer to a null-terminated string. Thus, it prints hello and a newline after that. (With a C++ std::string you would have to pass s.c_str() instead; passing the std::string object itself to printf is undefined behavior.)

Moving on to how the vulnerability arises. Suppose we provide fewer arguments than there are format specifiers.

  printf("%d\n");

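What would happen in such a case? Would the program stop running? NOOOOO, it usually doesn’t! Formally this is undefined behavior, and the compiler may warn about it, but in practice it outputs some value. We may think of that value as garbage, but it has actually been read from wherever the missing argument would have been: the stack, or a register on calling conventions that pass arguments in registers! What next? Now comes the actual programming error which leads to trouble.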

  printf(str);

What could go wrong over here? It works nicely if str were hello there! :happy: but what happens if str were hello there! %x? Hehe, a format specifier! This is almost the same as the previous example, except that %x prints the value in hexadecimal. Yeah, this would print something from the stack (or a register), thus leaking internal data, and this is how the vulnerability arises whenever str comes from the user. The %n format specifier even allows us to write: it stores the number of characters printed so far at the address given by the corresponding argument. That permits the attacker to cause denial of service or execute malicious code.
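The fix is tiny: always pass a constant format string and hand the user’s text over as an argument. A minimal sketch, using snprintf so the result can be checked; the helper name format_safely is made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes user into out, treating it purely as data.
 * Returns what snprintf returns (the length the full output needs).
 */
int format_safely(char *out, size_t size, const char *user)
{
    /* The vulnerable version would be: snprintf(out, size, user); */
    return snprintf(out, size, "%s", user);
}
```

If user is hello there! %x, the output is exactly hello there! %x, since the %x is never interpreted as a specifier. The same goes for printf("%s", str) in place of printf(str). GCC and Clang can also flag the vulnerable pattern at compile time with -Wformat -Wformat-security.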

Some Awesome Resources

Some proposed mitigations

  • NX: Non-executable stack
  • Stack Canaries: Stack protectors
  • ASLR (Address Space Layout Randomization)
  • PIE (Position-Independent Executable)

How to mitigate?

  1. Use memory-safe languages which perform boundary checks
  2. Write good code :)
  3. Use bounded functions like fgets, snprintf, strncpy etc. instead of gets, sprintf, strcpy
  4. Learn how to pwn first XD