
Introduction to Format Strings Vulnerabilities

A format string is, simply put, a string that is used to format dynamic data for display. Format strings are typically used to avoid hard-coding values into a string, and they also allow the programmer to specify the format in which specific data is displayed. For example, say you are making a social media application and you want to display a user's username when they open their profile. A minimalistic implementation of this in C looks like this:

char *username = "4g3nt47";
printf("Username: %s\n", username);

In the above example, the format string is "Username: %s\n". The %s in it is a format specifier that tells printf() to display the corresponding argument as a string, in this case the variable username. This should output the following:

Username: 4g3nt47

So… how does this work? Let's look at the printf() function prototype:

int printf(const char *format, ...);

format is the format string, and the ... indicates that the function accepts a variable number of additional arguments. These additional arguments are consumed sequentially whenever printf() encounters a format specifier that needs to be resolved, in our example %s. Note that many functions use format strings, not just printf(). See the reference manual (man 3 printf on Linux).


How It Really Works


Format strings look pretty useful and safe until you think about how they actually work. As discussed, printf() resolves format specifiers by reading their data from a variable-length list of arguments. What happens if no argument is provided? Let's see…

#include <stdio.h>

int main(void){
  printf("Welcome, %s");
}

Compiling and running this a few times, I got a different garbage value in place of %s on every run, which indicates the function is still reading something. As you know, a C string is just a pointer to a sequence of characters, so we can assume the %s is causing printf() to dereference some address. If we want to see the actual address that is being read, we need a format specifier that prints the raw value. Since we are working on 32-bit x86, where pointers are 4 bytes in size, an ideal format specifier for this is an 8-character-wide, zero-padded hexadecimal (%08x):

printf("Welcome, %08x\n");

Sweet! We are now getting the address, which appears different each time because of ASLR. Now the question is: where exactly is printf() getting this address, considering we didn't give it any? Well, function arguments on 32-bit x86 typically go onto the stack, so it's a safe guess that this is where the value is coming from. To confirm this, we can load the program in a debugger and set a breakpoint immediately after the call to printf() to see if the leaked value is on the stack.

As expected, it is. The address 0x565c6008 (highlighted in blue in the debugger output), which points to our format string, is right at the top of the stack. Next to it (highlighted in green) is the value we are leaking. Remember that when calling functions on 32-bit x86, arguments are pushed onto the stack in reverse order, meaning the first argument is pushed last and therefore ends up on top of the stack. In this case, we didn't actually provide any extra argument to printf(), but the function has no way of knowing that: it does not track how many arguments were given to it and relies solely on the format specifiers in the format string. Since the format string we used has one format specifier, the function simply pulls the next value off the stack, even though that value was never meant for it.


Exploitation: Leaking Memory


Now let’s look at another simple example;

#include <stdio.h>

int main(int argc, char **argv){
  
  if (argc < 2){
    printf("[-] Please specify your name!\n");
    return 1;
  }
  printf("Welcome: ");
  printf(argv[1]);
  return 0;
}

The above code may look safe until you take a closer look at line 10:

printf(argv[1]);

Remember that the first argument to printf() is the format string, in this case argv[1], which is the first argument passed to the program on the command line. Since we have full control over this value, we can provide printf() with our own format specifiers and leak values from the stack:

(agent47@debian) ~> ./test '%08x %08x %08x %08x'
Welcome: 00000001 5656d000 5656a1c1 00000002

You can see we are able to leak 16 bytes from the top of the stack in hex. The first and last DWORDs (double words, 4 bytes each) are clearly not addresses, so we can ignore those. But the middle two might point to something. To read the contents at these addresses, we can simply replace their format specifiers with %s, which causes printf() to dereference them (caution: reading from an invalid address will cause the program to segfault):

(agent47@debian) ~> ./test '%08x %s %s %08x'
Welcome: 00000001 > ?. 00000002

We were now able to read the data at those addresses. Nothing interesting in this case, but you could get lucky going further down the stack in sensitive programs that happen to be vulnerable.


Exploitation: Writing to Memory


Leaking stuff is fun, but let's take a look at an example that shows how dangerous format string bugs can be:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

void show_profile(char *name, int *is_admin){
  printf("[*] Username: ");
  printf(name);
  printf("\n[*] Admin privs: %d\n", *is_admin);
}

int main(int argc, char **argv){
  if (argc < 2){
    printf("[-] Usage: %s <username>\n", argv[0]);
    return 1;
  }
  char *name = argv[1];
  int is_admin = 0;
  printf("[+] NOTE: Administrative function disabled due to security concerns!\n");
  show_profile(name, &is_admin);
  if (is_admin){
    printf("[+] YOU ARE AN ADMIN: %d\n", is_admin);
    char *cmd[] = {"/bin/sh", NULL};
    execve(cmd[0], cmd, NULL);
  }else{
    printf("[-] Out of here, you peasant!\n");
  }
  return 0;
}

When executed, the above code will:

  1. Expect to be given the name of the user as an argument, and save it to name.
  2. Set is_admin to 0, and inform the user that admin functions are disabled.
  3. Call show_profile() with the name of the user, and a pointer to the is_admin variable.
  4. Check if is_admin is non-zero, which should always be false since we hard-coded the value to 0. Therefore, the block of code that calls execve() to open a shell should never run.

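Running it with an ordinary username, the program works as expected: it prints the username, reports admin privs of 0, and ends with the "Out of here, you peasant!" message.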

Looking at the show_profile() code, we can clearly see that the second call to printf() is vulnerable:

void show_profile(char *name, int *is_admin){
  printf("[*] Username: ");
  printf(name);
  printf("\n[*] Admin privs: %d\n", *is_admin);
}

This function accepts two arguments, both of which are pointers, and both are placed on the stack. Let's see if we can find them! Since our call stack is just main() >> show_profile() >> printf(), we shouldn't need to dig deep into the stack before finding them. So I leaked the first 10 DWORDs and examined their contents (see the previous section on how to do this) until I found the address of my format string.

And success! Our format string (the name variable) is the 8th DWORD on the stack. Since we passed both name and is_admin in the call to show_profile(), we can be certain that the next DWORD (highlighted in red) is the pointer to our is_admin variable. Printing it with %s isn't useful, since the value it points to is currently 0, which printf() sees as a null terminator and therefore as an empty string. But what if we can modify it?

Turns out we can! There is a special format specifier, %n, that prints nothing. Instead, it stores the number of bytes printf() has written so far into the int that the corresponding argument points to. Since in our case we don't really care what is written to the target address as long as it's not zero, we can use this to bypass the check guarding the code block that calls execve(): consume the first eight DWORDs with padding specifiers such as %08x, then place %n where the is_admin pointer sits.

And just like that, we modified memory and got a shell! You might be curious why is_admin is now set to 72. That is the number of bytes the second call to printf() had printed by the time it reached %n.

This is great, but notice our payload is quite long, and programs will often restrict the length of the string you are allowed to input. An elegant way to shorten format string payloads significantly is direct parameter access, which lets you specify exactly which argument a specifier should use. On platforms that support it (it is a POSIX extension rather than part of standard C), the syntax is %<index>$<format>, where <index> is the position of the argument after the format string (starting from 1), which in our case corresponds to its position from the top of the stack, and <format> is the format specifier you wish to apply to it.

In our case, we know that the pointer to is_admin is the 9th item from the top of the stack, so we can target it directly with %9$n, preceded by at least one printed character so that the value written is non-zero.

And we got the same exploit with a much shorter payload! Note that this feature may not be available on some platforms. If you are interested in learning more about format string exploitation, consider reading the 4th chapter of The Shellcoder's Handbook: Discovering and Exploiting Security Holes.


This post is licensed under CC BY 4.0 by the author.