A primer on Windows PE files and doing API calls without knowledge of memory layout

This blog post started as a ridiculously long comment on a GitHub issue. It’s long enough that it should be a blog post, as someone on Twitter pointed out to me, so now I’m replicating it here with some tweaks to make it read a bit better in continuous prose.

A caveat: I very quickly slapped this together and have not 100% validated everything. There might be some mistakes. Shout at me ~~on Twitter~~ (I’m over on Mastodon now) if you find issues.

The issue at hand here is this: you’ve got an x86_64 Windows PE and you want to change its behaviour by executing a stub or some shellcode in the process when it runs. That stub or shellcode needs to make API calls, but you can’t guarantee that the PE you’re injecting into actually imports the APIs you want to use, and you don’t know anything about the memory layout ahead of time. So how do you make this work?

To make sure we’re all on the same page, I’m going to start with the PE format.

All screenshots here are from CFF Explorer, which is a PE editor tool. It’s kinda old but it gets the job done. I’m also just looking at a 64-bit executable since 32-bit structures are slightly different.

PE files start with an old 16-bit DOS header. This header is almost entirely ignored on modern Windows, so the only fields that typically matter are e_magic (which must be ‘MZ’) and e_lfanew, which holds the file offset of the NT header.

In the example file the NT header is at offset 0x108, which is where e_lfanew said it would be. You might notice that there’s a bit of a gap between the end of the DOS header at 0x40 and the start of the NT header at 0x108.

What sits in that space is the DOS stub. You know the old “This program cannot be run in DOS mode” message? That’s printed by an actual 16-bit x86 DOS program, stored in the file right after the DOS header (whose last field is e_lfanew) but before the NT header. If you try to run a modern Windows PE under DOS, it runs that program instead of the PE. Since e_lfanew is a 32-bit offset, the NT header can be pushed as far into the file as you like, so you can actually embed a complete 16-bit DOS program in there for cross-compatibility!

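For fun, here’s what the standard stub emitted by Microsoft’s linker does when disassembled: push cs / pop ds points DS at the stub’s own segment, mov dx, 0x0E / mov ah, 0x09 / int 0x21 prints the “This program cannot be run in DOS mode.” string stored at offset 0x0E of the stub, and mov ax, 0x4C01 / int 0x21 exits with return code 1.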
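If you want to do the MZ/e_lfanew dance yourself, here’s a minimal sketch that works on a raw byte buffer (e.g. the file read into memory). The function name find_nt_header is my own invention, and it only does basic bounds checks. It also checks for the “PE\0\0” signature that the NT header begins with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PE files are little-endian, so assemble the value byte by byte. */
static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the NT header, or 0 if this doesn't look like a PE.
   e_magic lives at offset 0x00 and e_lfanew at offset 0x3C of the 64-byte DOS header. */
uint32_t find_nt_header(const uint8_t *image, size_t size)
{
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;

    uint32_t e_lfanew = read_u32(image + 0x3C);

    /* make sure the 4-byte signature fits inside the buffer */
    if (e_lfanew > size - 4)
        return 0;

    /* the NT header begins with the signature "PE\0\0" */
    if (memcmp(image + e_lfanew, "PE\0\0", 4) != 0)
        return 0;

    return e_lfanew;
}
```

With the example file above, this would return 0x108.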

The NT header starts with the 4-byte signature “PE\0\0”, followed by the file header (IMAGE_FILE_HEADER). Its first field is Machine, which tells you what machine this was built for: 0x8664 means x86_64, and 0x14C means 32-bit x86. There are a bunch more defined values, but unless you’re planning on working with Itanium or ARM PEs I wouldn’t worry about them.

Next is the NumberOfSections field. This tells you how many sections there are in the section table. We’ll come to that later.

TimeDateStamp and the symbol table fields (PointerToSymbolTable and NumberOfSymbols) can be ignored. SizeOfOptionalHeader is the next field of importance: it tells us how big the next structure, the optional header, is going to be. It should always be 0xF0 on a 64-bit executable.

Finally there’s Characteristics, a bitfield that specifies various flags. The flags in here should be irrelevant for your use-case, but flag 0x20 (IMAGE_FILE_LARGE_ADDRESS_AWARE, “image can handle >2GB address space”) will be important if you ever do 32-bit stuff: it tells Windows that the process can use user-mode addresses above 2GB, giving it up to 3GB of virtual address space on 32-bit Windows booted with the 3GB option, or 4GB when running under WOW64 on 64-bit Windows. If you’re just doing 64-bit, ignore this.
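To make the file header fields concrete, here’s a sketch that pulls them out of a buffer. It assumes nt points at the “PE\0\0” signature, so the 20-byte IMAGE_FILE_HEADER starts 4 bytes later. The file_header_info struct and the function names are hypothetical, just for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct holding the IMAGE_FILE_HEADER fields discussed above. */
typedef struct {
    uint16_t machine;
    uint16_t number_of_sections;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
} file_header_info;

static uint16_t read_u16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* nt points at the "PE\0\0" signature; the file header follows it. */
file_header_info parse_file_header(const uint8_t *nt)
{
    const uint8_t *fh = nt + 4;
    file_header_info info;
    info.machine = read_u16(fh + 0);
    info.number_of_sections = read_u16(fh + 2);
    /* TimeDateStamp (+4), PointerToSymbolTable (+8) and NumberOfSymbols (+12) are skipped */
    info.size_of_optional_header = read_u16(fh + 16);
    info.characteristics = read_u16(fh + 18);
    return info;
}

/* 0x8664 = x86_64, and a normal 64-bit optional header is 0xF0 bytes */
bool is_normal_x64_image(const file_header_info *info)
{
    return info->machine == 0x8664 && info->size_of_optional_header == 0xF0;
}

/* flag 0x20 = IMAGE_FILE_LARGE_ADDRESS_AWARE */
bool is_large_address_aware(const file_header_info *info)
{
    return (info->characteristics & 0x0020) != 0;
}
```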

The optional header is where most of the magic happens. It’s different between 32-bit and 64-bit programs. I’ll focus only on 64-bit.

The data directories table sits at the end of the optional header. It has 16 slots: 15 defined directories, each containing an RVA and size field, plus a final reserved entry that must be null. The NumberOfRvaAndSizes field tells you how many entries are actually present, so for a full table (the norm) it’s 16, or 0x10. Normally you won’t see any other value than 0x10 in a non-packed executable.

The meaning of each directory is hard-coded by its index, i.e. export = 0, import = 1, resource = 2, exception = 3, and so on, with the IAT at 12.

The RVA is the virtual address, relative to the image base, of the location of the data for that directory. These match up with sections, i.e. every directory points to some address in a section, rather than to an offset in the file. The one exception is the security (certificate) directory at index 4, whose “RVA” is actually a file offset.

The ones you care about are the Export Directory, Import Directory and the Import Address Table (IAT) Directory. I’ll describe these later since it makes more sense to look at sections first.
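Here’s a sketch of looking up a data directory by index in a 64-bit optional header. It assumes opt points at the start of the optional header in a buffer and relies on the PE32+ layout, where NumberOfRvaAndSizes sits at offset 0x6C and the directory array starts at 0x70 (which is why the whole header comes to 0xF0 with 16 entries). The enum, struct and function names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Directory indices (same values as winnt.h's IMAGE_DIRECTORY_ENTRY_* constants). */
enum { DIR_EXPORT = 0, DIR_IMPORT = 1, DIR_RESOURCE = 2, DIR_EXCEPTION = 3, DIR_IAT = 12 };

typedef struct {
    uint32_t rva;
    uint32_t size;
} data_directory;

static uint16_t read_u16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* opt points at a PE32+ optional header. *out is filled whenever the index exists in
   the table; the return value is true only if that directory is also non-empty. */
bool get_data_directory(const uint8_t *opt, uint32_t index, data_directory *out)
{
    if (read_u16(opt) != 0x20B)            /* Magic: 0x20B means PE32+ (64-bit) */
        return false;

    uint32_t count = read_u32(opt + 0x6C); /* NumberOfRvaAndSizes */
    if (index >= count || index >= 16)
        return false;

    const uint8_t *entry = opt + 0x70 + index * 8; /* each entry: RVA (4) + size (4) */
    out->rva = read_u32(entry);
    out->size = read_u32(entry + 4);
    return out->rva != 0 && out->size != 0;
}
```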

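Immediately after the optional header (and therefore after the data directories) you have the sections table, with NumberOfSections entries. Each entry is an IMAGE_SECTION_HEADER giving the section’s name, where it lives in memory (VirtualAddress, which is an RVA, and VirtualSize) and where it lives in the file (PointerToRawData and SizeOfRawData). That pairing is what lets you convert an RVA into a file offset: find the section whose virtual range contains the RVA, then the offset is RVA - VirtualAddress + PointerToRawData.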
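A minimal sketch of that RVA-to-file-offset conversion, assuming sections points at the first 40-byte IMAGE_SECTION_HEADER in a buffer holding the file. It deliberately ignores RVAs that fall inside the headers (before the first section), and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset for rva, or 0 if no section's raw data covers it. */
uint32_t rva_to_file_offset(const uint8_t *sections, uint16_t count, uint32_t rva)
{
    for (uint16_t i = 0; i < count; i++) {
        const uint8_t *s = sections + (size_t)i * 40;
        uint32_t virtual_size    = read_u32(s + 8);  /* VirtualSize */
        uint32_t virtual_address = read_u32(s + 12); /* VirtualAddress */
        uint32_t raw_size        = read_u32(s + 16); /* SizeOfRawData */
        uint32_t raw_pointer     = read_u32(s + 20); /* PointerToRawData */

        /* some linkers leave VirtualSize as 0, so fall back to the raw size */
        uint32_t span = virtual_size ? virtual_size : raw_size;

        if (rva >= virtual_address && rva - virtual_address < span) {
            uint32_t delta = rva - virtual_address;
            /* memory-only tail of a section (e.g. zero-fill) has no file offset */
            if (delta >= raw_size)
                return 0;
            return raw_pointer + delta;
        }
    }
    return 0;
}
```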

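Now for the export directory, which is an IMAGE_EXPORT_DIRECTORY struct. Its Name field is the RVA of a null-terminated string that specifies the name of the DLL. If you convert the RVA to an offset you’ll find the string in the file there. In this case I’m using kernel32.dll as an example. The fields that matter for finding functions are NumberOfFunctions and NumberOfNames, which give the array lengths, and AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals, which are the RVAs of the arrays themselves.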

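So basically you’ve got three arrays, stored at those RVAs:

uint32_t Functions[NumberOfFunctions]; // RVAs of the exported functions
uint32_t Names[NumberOfNames];         // RVAs of null-terminated ASCII names
uint16_t NameOrdinals[NumberOfNames];  // index into Functions for each name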

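Names and NameOrdinals are parallel arrays: for the name at index n, NameOrdinals[n] is the index into Functions where that function’s RVA lives. Functions with no name can only be imported by ordinal, which is the function’s index in Functions plus the export directory’s Base field. Don’t confuse that with the values in NameOrdinals, which are plain indices with no Base added. You don’t really care about import-by-ordinal here, but you do need NameOrdinals to get from a name to its function.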
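For the ordinal case, the thing to remember is that an import-by-ordinal value is biased by Base, while NameOrdinals entries aren’t. A rough sketch, assuming image points at a module laid out by RVA (i.e. loaded in memory) and export_dir_rva comes from data directory 0. The offsets 16, 20, 24, 28 and 36 are the Base, NumberOfFunctions, NumberOfNames, AddressOfFunctions and AddressOfNameOrdinals fields of IMAGE_EXPORT_DIRECTORY; the function names are my own.

```c
#include <stdint.h>

static uint16_t read_u16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Resolve an import-by-ordinal to a function RVA; returns 0 if the ordinal is out of range. */
uint32_t export_rva_by_ordinal(const uint8_t *image, uint32_t export_dir_rva, uint32_t ordinal)
{
    const uint8_t *dir = image + export_dir_rva;
    uint32_t base = read_u32(dir + 16);                /* Base */
    uint32_t number_of_functions = read_u32(dir + 20); /* NumberOfFunctions */
    uint32_t functions_rva = read_u32(dir + 28);       /* AddressOfFunctions */

    if (ordinal < base || ordinal - base >= number_of_functions)
        return 0;

    /* subtract Base to get the index into the Functions array */
    return read_u32(image + functions_rva + (ordinal - base) * 4);
}

/* The n-th name's NameOrdinals entry is an unbiased index into Functions, so its
   "real" ordinal is that index plus Base. Returns 0 if name_index is out of range. */
uint32_t ordinal_of_named_export(const uint8_t *image, uint32_t export_dir_rva, uint32_t name_index)
{
    const uint8_t *dir = image + export_dir_rva;
    uint32_t base = read_u32(dir + 16);              /* Base */
    uint32_t number_of_names = read_u32(dir + 24);   /* NumberOfNames */
    uint32_t name_ordinals_rva = read_u32(dir + 36); /* AddressOfNameOrdinals */

    if (name_index >= number_of_names)
        return 0;

    return base + read_u16(image + name_ordinals_rva + name_index * 2);
}
```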

So to find a function by its name in the export table, you look at the AddressOfNames field to get the RVA of the names array, then use that to loop through each of the name RVAs to find the one that matches the name of the API you want. Its index n gives you NameOrdinals[n], which is the index into Functions where you’ll find the function’s RVA.

For example:

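// moduleBase is the module's base address in memory; returns the function's RVA, or 0 if not found
DWORD getFunctionRVA(uint8_t* moduleBase, IMAGE_EXPORT_DIRECTORY* exportDirectory, const char* functionName)
{
  DWORD* names = (DWORD*)(moduleBase + exportDirectory->AddressOfNames);
  WORD* nameOrdinals = (WORD*)(moduleBase + exportDirectory->AddressOfNameOrdinals);
  DWORD* functions = (DWORD*)(moduleBase + exportDirectory->AddressOfFunctions);
  for (DWORD n = 0; n < exportDirectory->NumberOfNames; n++)
  {
    // each entry in the names array is an RVA, so add the base to get the string
    if (strcmp(functionName, (const char*)(moduleBase + names[n])) == 0)
    {
      // map the name index to an index into the functions array
      return functions[nameOrdinals[n]];
    }
  }
  return 0;
}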

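Keep in mind that this gives you the RVA, so if you want the virtual address you need to add the base address of the module. One more gotcha: if that RVA falls inside the export directory itself (the range given by the export data directory’s RVA and size), it isn’t code at all but a forwarder string of the form “OTHERDLL.FunctionName”, meaning the export actually lives in another DLL.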
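The forwarder check is just a range test against the export data directory. A minimal sketch with hypothetical function names, assuming you already have the export directory’s RVA and size from data directory 0:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An export whose RVA falls inside the export directory's own [rva, rva + size) range
   is a forwarder: it points at an ASCII string like "OTHERDLL.FunctionName", not code. */
bool is_forwarded_export(uint32_t function_rva, uint32_t export_dir_rva, uint32_t export_dir_size)
{
    return function_rva >= export_dir_rva &&
           function_rva - export_dir_rva < export_dir_size;
}

/* Turn an export RVA into an address in the loaded module, or NULL for a forwarder
   (which you'd have to resolve by loading the named DLL and looking it up there). */
const void *export_address(const uint8_t *module_base, uint32_t function_rva,
                           uint32_t export_dir_rva, uint32_t export_dir_size)
{
    if (is_forwarded_export(function_rva, export_dir_rva, export_dir_size))
        return NULL;
    return module_base + function_rva;
}
```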

So now you know how to find an API export in a PE file, as long as you know its base address. But how do you find its base address? All Windows processes have a structure called the Process Environment Block (PEB) in memory. The structure is mostly undocumented (winternl.h only exposes a handful of its fields), but it’s extremely stable in practice. You can access the PEB via the Thread Environment Block (TEB), which has a ProcessEnvironmentBlock field at offset 0x60 on 64-bit processes. The TEB is accessible via the GS segment register, so reading the PEB pointer is just a case of doing mov rax, gs:[0x60] or using an intrinsic such as __readgsqword(0x60).

The field at offset 0x10 of the PEB is ImageBaseAddress. This tells you the base address of the main executable module for the current process. So if your process is Task Manager, this is the image base address for the taskmgr.exe module.

The field at offset 0x18 is Ldr, also known as the loader data. It is a pointer to a PEB_LDR_DATA structure that includes information about modules loaded into the process. The InLoadOrderModuleList and InMemoryOrderModuleList fields of that structure are the heads of doubly-linked lists that describe the modules that are loaded into the process. On 64-bit Windows their offsets are 0x10 and 0x20 respectively, and they’ve stayed put across Windows versions.
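Here’s a sketch of reading those fields by raw offset, so you don’t need the full structure definitions. It assumes a 64-bit process (the offsets above only hold when pointers are 8 bytes), and the macro and helper names are my own. The __readgsqword part only compiles with MSVC targeting x64, so it’s guarded; everything else works on whatever PEB pointer you hand it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x64 offsets from the text; these are only valid for 64-bit processes. */
#define TEB_PEB_OFFSET        0x60 /* TEB.ProcessEnvironmentBlock */
#define PEB_IMAGE_BASE_OFFSET 0x10 /* PEB.ImageBaseAddress */
#define PEB_LDR_OFFSET        0x18 /* PEB.Ldr */
#define LDR_IN_LOAD_ORDER     0x10 /* PEB_LDR_DATA.InLoadOrderModuleList */
#define LDR_IN_MEMORY_ORDER   0x20 /* PEB_LDR_DATA.InMemoryOrderModuleList */

/* Read a pointer-sized field at a byte offset (memcpy avoids alignment/aliasing trouble). */
static uintptr_t read_ptr(const void *base, size_t offset)
{
    uintptr_t value;
    memcpy(&value, (const uint8_t *)base + offset, sizeof value);
    return value;
}

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
/* On x64 Windows, gs:[0x60] holds the current process's PEB pointer. */
const void *current_peb(void)
{
    return (const void *)__readgsqword(TEB_PEB_OFFSET);
}
#endif

/* Base address of the main executable module. */
const void *peb_image_base(const void *peb)
{
    return (const void *)read_ptr(peb, PEB_IMAGE_BASE_OFFSET);
}

/* Address of a list head inside PEB_LDR_DATA (pass LDR_IN_LOAD_ORDER or LDR_IN_MEMORY_ORDER). */
const void *peb_module_list_head(const void *peb, size_t list_offset)
{
    const uint8_t *ldr = (const uint8_t *)read_ptr(peb, PEB_LDR_OFFSET);
    return ldr + list_offset;
}
```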

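Each of these is a circular, doubly-linked list built from LIST_ENTRY structs. The list head lives inside the PEB_LDR_DATA struct itself and doesn’t describe a module; every other node is embedded inside an LDR_DATA_TABLE_ENTRY struct, which describes a module that has been loaded into the process. On x64, InLoadOrderLinks is the first field of that struct and InMemoryOrderLinks is the second, at offset 0x10, so depending on which list you walk you may need to subtract 0x10 from the node pointer to get to the start of the entry (this is what the CONTAINING_RECORD macro does). The struct’s fields of key interest include DllBase, EntryPoint, FullDllName, and BaseDllName. Ignore the use of “DLL” here - it really means any executable module.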

The DllBase field tells you the base address of the module after it was loaded into memory. The EntryPoint field tells you the address of the entry point for that module, which should match the AddressOfEntryPoint field from that module’s PE (albeit as a virtual address, not an RVA). The FullDllName and BaseDllName fields are UNICODE_STRING structures that contain the full path to the module file and the name of the module respectively. You can use these to find a module by name. Note that a UNICODE_STRING’s Length is in bytes rather than characters, and its Buffer isn’t guaranteed to be null-terminated.
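Because of those UNICODE_STRING quirks, it’s worth having a small comparison helper rather than reaching for wcscmp. A sketch, assuming a hand-written mirror of UNICODE_STRING (uint16_t stands in for WCHAR, since wchar_t is 32 bits on some non-Windows compilers) and ASCII-only case folding, which is fine for names like kernel32.dll but not for arbitrary Unicode. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written mirror of UNICODE_STRING, with uint16_t standing in for WCHAR. */
typedef struct {
    uint16_t Length;        /* size of the string in BYTES, not counting any terminator */
    uint16_t MaximumLength; /* size of the buffer in bytes */
    const uint16_t *Buffer; /* not guaranteed to be null-terminated */
} unicode_string_t;

/* ASCII-only lowercase; enough for module names like kernel32.dll */
static uint16_t ascii_lower(uint16_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint16_t)(c + ('a' - 'A')) : c;
}

/* Case-insensitive comparison of a UNICODE_STRING against an ASCII string. */
bool unicode_equals_ascii_ci(const unicode_string_t *s, const char *name)
{
    size_t chars = s->Length / sizeof(uint16_t); /* bytes -> characters */
    size_t i;
    for (i = 0; i < chars; i++) {
        if (name[i] == '\0')
            return false; /* name is shorter than s */
        if (ascii_lower(s->Buffer[i]) != ascii_lower((uint16_t)(unsigned char)name[i]))
            return false;
    }
    return name[i] == '\0'; /* name must not be longer than s either */
}
```

Checking the lengths on both sides matters: a plain prefix comparison bounded by Length would happily match a shorter module name against the start of the target.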

In short, the process to find a module in memory, by name, is:

1. Read the ProcessEnvironmentBlock pointer from the TEB at gs:[0x60].
2. Read the Ldr pointer from the PEB to get the loader data.
3. Iterate through either InLoadOrderModuleList or InMemoryOrderModuleList using the Flink field (forward link), stopping when you arrive back at the list head.
4. For each LIST_ENTRY, get the LDR_DATA_TABLE_ENTRY struct that contains it. For InLoadOrderModuleList that’s the same address; for InMemoryOrderModuleList, subtract 0x10.
5. Compare BaseDllName case-insensitively against the name that you want, e.g. kernel32.dll, remembering that its Length is in bytes.
6. If it matches, read the DllBase field to get its base address in memory.
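Here’s a self-contained sketch of that walk, using hand-rolled stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY so it can be exercised without a real PEB. It walks InMemoryOrderModuleList specifically to show the CONTAINING_RECORD-style subtraction. The struct, macro and function names are hypothetical, and the name field is a plain char* stand-in for BaseDllName to keep it short; a real version would compare the UNICODE_STRING case-insensitively.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for LIST_ENTRY: a node in a circular doubly-linked list. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Stand-in for the start of LDR_DATA_TABLE_ENTRY. The first four fields follow the
   real x64 order, so on a 64-bit build InMemoryOrderLinks sits 0x10 bytes in. */
typedef struct {
    list_entry InLoadOrderLinks;
    list_entry InMemoryOrderLinks;
    list_entry InInitializationOrderLinks;
    void *DllBase;
    const char *name; /* simplified stand-in for BaseDllName */
} ldr_entry;

/* Same idea as the CONTAINING_RECORD macro from winnt.h. */
#define CONTAINING(ptr, type, field) ((type *)((uint8_t *)(ptr) - offsetof(type, field)))

/* Walk InMemoryOrderModuleList from its head and return the matching module's DllBase. */
void *find_module_base(list_entry *head, const char *wanted)
{
    /* the head itself isn't a module, so start at head->Flink and stop when we loop back */
    for (list_entry *node = head->Flink; node != head; node = node->Flink) {
        ldr_entry *entry = CONTAINING(node, ldr_entry, InMemoryOrderLinks);
        if (strcmp(entry->name, wanted) == 0)
            return entry->DllBase;
    }
    return NULL;
}
```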

The base address of the module points to the DOS header (MZ …) of the module in memory. You can then apply the techniques previously discussed to find the export table and figure out where APIs are.

So let’s say you want to find LoadLibraryA and GetProcAddress in kernel32.dll at runtime. Here are the steps in pseudocode:

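// assumes the full PEB, PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY definitions;
// the versions in winternl.h hide most of these fields behind Reserved members
// get PEB from TEB at gs:[0x60]
PEB* peb = (PEB*)__readgsqword(0x60);
PEB_LDR_DATA* ldr = peb->Ldr;
// the list head lives inside PEB_LDR_DATA and isn't a module, so start at its Flink
LIST_ENTRY* head = &ldr->InLoadOrderModuleList;
LIST_ENTRY* currentNode = head->Flink;
IMAGE_DOS_HEADER* kernel32_dos = NULL;
const wchar_t* target = L"kernel32.dll";
// the list is circular, so stop when we arrive back at the head
while (currentNode != head)
{
  // InLoadOrderLinks is the first field of LDR_DATA_TABLE_ENTRY, so this doesn't
  // move the pointer; for InMemoryOrderModuleList it would subtract 0x10
  LDR_DATA_TABLE_ENTRY* entry = CONTAINING_RECORD(currentNode, LDR_DATA_TABLE_ENTRY, InLoadOrderLinks);
  // UNICODE_STRING lengths are in bytes, and Buffer may not be null-terminated
  USHORT length = entry->BaseDllName.Length / sizeof(WCHAR);
  wchar_t* dllNameStr = entry->BaseDllName.Buffer;
  // case-insensitive wide string comparison; check the length first so that a
  // shorter name that happens to be a prefix of the target doesn't match
  if (length == wcslen(target) && _wcsnicmp(target, dllNameStr, length) == 0)
  {
    // this is kernel32
    kernel32_dos = (IMAGE_DOS_HEADER*)entry->DllBase;
    break;
  }
  // not kernel32, try the next module
  currentNode = currentNode->Flink;
}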

// did we find kernel32?
if (!kernel32_dos)
  return -1;

uint8_t* kernel32_base = (uint8_t*)kernel32_dos;

// find the NT header at the offset specified by e_lfanew
IMAGE_NT_HEADERS64* ntHeader = (IMAGE_NT_HEADERS64*)(
  kernel32_base + kernel32_dos->e_lfanew
);

// get the file & PE (optional) headers
IMAGE_FILE_HEADER* fileHeader = &ntHeader->FileHeader;
IMAGE_OPTIONAL_HEADER64* peHeader = &ntHeader->OptionalHeader;
uint8_t* peHeaderBase = (uint8_t*)peHeader;

// winnt.h's IMAGE_OPTIONAL_HEADER64 already ends with the DataDirectory array,
// so the directories are inside the optional header, not after it
IMAGE_DATA_DIRECTORY* directories = peHeader->DataDirectory;

// the section table starts right after the optional header, whose size is given
// by SizeOfOptionalHeader (this is what the IMAGE_FIRST_SECTION macro computes)
IMAGE_SECTION_HEADER* sections = (IMAGE_SECTION_HEADER*)(
  peHeaderBase + fileHeader->SizeOfOptionalHeader
);

// get the virtual address of the export directory
IMAGE_EXPORT_DIRECTORY* exportDir = (IMAGE_EXPORT_DIRECTORY*)(
  kernel32_base + directories[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress
);

// get the export arrays (all of them are RVAs)
DWORD* nameRVAs = (DWORD*)(kernel32_base + exportDir->AddressOfNames);
WORD* nameOrdinals = (WORD*)(kernel32_base + exportDir->AddressOfNameOrdinals);
DWORD* functionRVAs = (DWORD*)(kernel32_base + exportDir->AddressOfFunctions);

void* fnLoadLibrary = NULL;
void* fnGetProcAddress = NULL;

for (DWORD n = 0; n < exportDir->NumberOfNames; n++)
{
  char* name = (char*)(kernel32_base + nameRVAs[n]);
  // NameOrdinals maps the name index to an index into the functions array
  void* func = (void*)(kernel32_base + functionRVAs[nameOrdinals[n]]);
  // LoadLibrary is a macro; the actual exports are LoadLibraryA and LoadLibraryW
  if (strcmp("LoadLibraryA", name) == 0)
    fnLoadLibrary = func;
  if (strcmp("GetProcAddress", name) == 0)
    fnGetProcAddress = func;

  if (fnLoadLibrary != NULL && fnGetProcAddress != NULL)
    break;
}

// did we find the APIs?
if (fnLoadLibrary == NULL || fnGetProcAddress == NULL)
  return -2;

// ok, now you've got the address of LoadLibrary and GetProcAddress and you can call them!

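Once you’ve got LoadLibraryA and GetProcAddress, you can load any DLL and get any API you like by calling through those pointers, cast to the right function types:

typedef HMODULE (WINAPI* LoadLibraryA_t)(LPCSTR);
typedef FARPROC (WINAPI* GetProcAddress_t)(HMODULE, LPCSTR);
HMODULE hKernel32 = ((LoadLibraryA_t)fnLoadLibrary)("kernel32.dll");
SOME_FUNCTION_TYPE fnOpenProcess = (SOME_FUNCTION_TYPE)((GetProcAddress_t)fnGetProcAddress)(hKernel32, "OpenProcess");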

Congrats, you’re done.