September 23, 2012 / eddierusty8

Chapter 5 Beyond Documentation The Generic Table API in NTDLL.DLL

src : Reversing: Secrets of Reverse Engineering
The generic table API is part of the Windows Native API . It belongs to the run-time library portion of NTDLL.DLL : functions that are typically not used for communicating with the OS , but simply serve as a toolkit of commonly required services such as string manipulation and data management .

Dump NTDLL.DLL using dumpbin.exe :

dumpbin.exe is located in \VC\bin ( I have VC++ 2010 Express Edition)
To initialize the environment variables in cmd , run the vcvars32.bat file located in the same directory
I used the /EXPORTS and /OUT options :
dumpbin.exe /OUT:ntdllexp.txt /EXPORTS C:\WINDOWS\system32\ntdll.dll
Sample output from the file :
ordinal hint RVA      name
8    0 0001EB58 CsrAllocateCaptureBuffer
9    1 0001EBB9 CsrAllocateMessagePointer
10    2 0001E3AF CsrCaptureMessageBuffer

641  27C 000114FA RtlInitializeCriticalSectionAndSpinCount
642  27D 00021491 RtlInitializeGenericTable
643  27E 00030161 RtlInitializeGenericTableAvl
644  27F 00018B7A RtlInitializeHandleTable
645  280 0002FE1A RtlInitializeRXact

Note that the RVA for RtlInitializeGenericTable is 00021491

Using OllyDbg

I opened generictable.exe , which uses this API
get generictable.exe from here
View the executable modules and you will see the imported module NTDLL.DLL loaded at base address 7C900000

RtlInitializeGenericTable

Add the RVA from the export table to the module base : 7C900000 + 00021491 = 7C921491
Go to address 7C921491 in the code section to see the disassembly of RtlInitializeGenericTable
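The address arithmetic above can be written as a small helper. This is only a sketch; the function name rva_to_va is made up for illustration and is not part of any Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a relative virtual address (RVA) taken from the export table
   into the virtual address of the function once the module is loaded.
   rva_to_va is a hypothetical helper name. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    /* An RVA is an offset from the load address, so the sum must not wrap. */
    assert(rva <= UINT32_MAX - image_base);
    return image_base + rva;
}
```

Calling rva_to_va(0x7C900000, 0x00021491) gives 0x7C921491, the address visited in OllyDbg.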

Note on Calling Conventions :

RET 14

14 : how many bytes of stack the function unwinds when it returns

  • it's not cdecl : in cdecl the caller unwinds the stack , so the function would end with a plain RET
  • it's not the fastcall convention , because it doesn't take any registers from the caller ; all the registers it uses are initialized in the function itself
  • it's not a C++ member function : those have decorated names in the export table , and this name is not decorated
  • by process of elimination , this function follows the stdcall calling convention

14 hex = 20 decimal , and since this is a 32-bit architecture the parameters are 32-bit aligned , so 20/4 = 5 : the function probably takes 5 parameters , and at most 5
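The same arithmetic as a tiny C helper, assuming a stdcall function on 32-bit x86 where every stack slot is 4 bytes. The name stdcall_param_count is hypothetical.

```c
#include <assert.h>

/* Upper bound on the number of stack parameters implied by the operand
   of "RET n" in a stdcall function on 32-bit x86. */
unsigned stdcall_param_count(unsigned ret_bytes)
{
    assert(ret_bytes % 4 == 0);   /* each stack slot is 4 bytes wide */
    return ret_bytes / 4;
}
```

For RET 14 (hex) this gives stdcall_param_count(0x14) == 5.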

Analyzing the code :

MOV EDI,EDI
PUSH EBP
MOV EBP,ESP

PUSH EBP and MOV EBP,ESP are the basic stack frame setup ( ESP is copied to EBP ) , no need to comment on them
just remember that if local variables need to be allocated , this is done by subtracting from ESP

Note on Segment Registers :

Windows uses a flat memory model , so DS , SS , ES and CS can safely be ignored
Windows does use FS , to reach the current thread's local memory

MOV EAX,DWORD PTR SS:[EBP+8]
XOR EDX,EDX
LEA ECX,DWORD PTR DS:[EAX+4]

EAX points to the first parameter
EDX is zeroed
ECX = EAX + 4 , LEA is used here for arithmetic on addresses

MOV DWORD PTR DS:[EAX],EDX
first member ( first 4 bytes referenced by EAX ) is set to zero
MOV DWORD PTR DS:[ECX+4],ECX
third member points to the second member
MOV DWORD PTR DS:[ECX],ECX
second member points to itself
MOV DWORD PTR DS:[EAX+C],ECX
fourth member points to the second member

UnknownStruct->Member1 = 0;
UnknownStruct->Member3 = &UnknownStruct->Member2;
UnknownStruct->Member2 = &UnknownStruct->Member2;
UnknownStruct->Member4 = &UnknownStruct->Member2;

MOV ECX, DWORD PTR SS:[EBP+C]
MOV DWORD PTR DS:[EAX+18],ECX
MOV ECX, DWORD PTR SS:[EBP+10]
MOV DWORD PTR DS:[EAX+1C],ECX

UnknownStruct->Member7 = Param2;
UnknownStruct->Member8 = Param3;

Continuing through the rest of the function gives the following structure of 10 members , 4 * 10 = 40 bytes :
struct TABLE
{
UNKNOWN  Member1;
UNKNOWN_PTR Member2;
UNKNOWN_PTR Member3;
UNKNOWN_PTR Member4;
UNKNOWN Member5;
UNKNOWN Member6;
UNKNOWN Member7;
UNKNOWN Member8;
UNKNOWN Member9;
UNKNOWN Member10;
};
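Putting the stores analyzed so far into C gives roughly the following. This is a host-side sketch, not the real NTDLL code: the field types are guesses, the helper name init_table_sketch is made up, and on a 64-bit host the offsets differ from the 32-bit ones in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the reversed structure; member names follow the notes. */
struct TABLE {
    uintptr_t Member1;
    void     *Member2;
    void     *Member3;
    void     *Member4;
    uintptr_t Member5;
    uintptr_t Member6;
    uintptr_t Member7;
    uintptr_t Member8;
    uintptr_t Member9;
    uintptr_t Member10;
};

/* Mirrors only the stores seen so far in the disassembly. */
void init_table_sketch(struct TABLE *table, uintptr_t param2, uintptr_t param3)
{
    assert(table != NULL);
    table->Member1 = 0;                 /* MOV [EAX],EDX    */
    table->Member3 = &table->Member2;   /* MOV [ECX+4],ECX  */
    table->Member2 = &table->Member2;   /* MOV [ECX],ECX    */
    table->Member4 = &table->Member2;   /* MOV [EAX+C],ECX  */
    table->Member7 = param2;            /* MOV [EAX+18],ECX */
    table->Member8 = param3;            /* MOV [EAX+1C],ECX */
}
```

Members 5, 6, 9 and 10 are left untouched here because their stores are not shown in the notes.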

RtlNumberGenericTableElements

RVA 00023B59

so [EAX+14] contains the number of elements , which makes it the sixth member

struct TABLE
{
UNKNOWN Member1;
UNKNOWN_PTR Member2;
UNKNOWN_PTR Member3;
UNKNOWN_PTR Member4;
UNKNOWN Member5;
ULONG NumberOfElements;
UNKNOWN Member7;
UNKNOWN Member8;
UNKNOWN Member9;
UNKNOWN Member10;
};

RtlIsGenericTableEmpty

it sets AL to one if the first member of the structure is zero
so Member1 is nonzero whenever the table is not empty
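The two small functions above can be modelled like this. It is a sketch under assumptions: every member is declared as a 32-bit value so the offsets match the disassembly, and the names TABLE32, number_of_elements and is_table_empty are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit layout model of the table: 10 members of 4 bytes each. */
struct TABLE32 {
    uint32_t Member1;            /* +0x00: zero while the table is empty */
    uint32_t Member2;            /* +0x04 */
    uint32_t Member3;            /* +0x08 */
    uint32_t Member4;            /* +0x0C */
    uint32_t Member5;            /* +0x10 */
    uint32_t NumberOfElements;   /* +0x14 */
    uint32_t Member7;            /* +0x18 */
    uint32_t Member8;            /* +0x1C */
    uint32_t Member9;            /* +0x20 */
    uint32_t Member10;           /* +0x24 */
};

_Static_assert(offsetof(struct TABLE32, NumberOfElements) == 0x14,
               "element count must sit at offset +14 hex");

/* What RtlNumberGenericTableElements appears to do: read offset +14. */
uint32_t number_of_elements(const struct TABLE32 *table)
{
    assert(table != NULL);
    return table->NumberOfElements;
}

/* What RtlIsGenericTableEmpty appears to do: test the first member. */
bool is_table_empty(const struct TABLE32 *table)
{
    assert(table != NULL);
    return table->Member1 == 0;
}
```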

RtlGetElementGenericTable

7C9624E0 PUSH EBP
7C9624E1 MOV EBP,ESP
7C9624E3 MOV ECX,DWORD PTR [EBP+8]
7C9624E6 MOV EDX,DWORD PTR [ECX+14]
7C9624E9 MOV EAX,DWORD PTR [ECX+C]


7C9624EC PUSH EBX
7C9624ED PUSH ESI

7C9624EE MOV ESI,DWORD PTR [ECX+10]
7C9624F1 PUSH EDI
7C9624F2 MOV EDI,DWORD PTR [EBP+C]   # the second parameter
7C9624F5 CMP EDI,-1                  # the JE is not directly after the CMP
# a case of interleaved code
7C9624F8 LEA EBX,DWORD PTR [EDI+1]   # EBX = EDI + 1
# LEA is chosen because it keeps both values and leaves the flags set by CMP untouched
7C9624FB JE SHORT ntdll.7C962559     # jump to the end of the function
if (Param2 == 0xffffffff)
    return 0;

7C9624FD CMP EBX,EDX
7C9624FF JA SHORT ntdll.7C962559

Another interesting and informative hint you find here is the fact that the
conditional jump instruction used is JA (jump if above), which uses the carry
flag (CF). This indicates that EBX and EDX are both treated as unsigned values.
If they were signed, the compiler would have used JG, which is the signed
version of the instruction.
ULONG AdjustedElementToGet = ElementToGet + 1;
if (ElementToGet == 0xffffffff ||
    AdjustedElementToGet > Table->TotalElements)
    return 0;
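The range check above can be tried out on its own. In this sketch the function name index_in_range is hypothetical; it only models the two comparisons, with uint32_t standing in for ULONG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the requested element index passes both checks. */
bool index_in_range(uint32_t element_to_get, uint32_t total_elements)
{
    /* Unsigned arithmetic: 0xffffffff + 1 wraps around to 0, which is
       why the explicit test against 0xffffffff is needed first. */
    uint32_t adjusted = element_to_get + 1;

    assert(element_to_get != 0xffffffffu || adjusted == 0);

    if (element_to_get == 0xffffffffu || adjusted > total_elements)
        return false;
    return true;
}
```

Without the first test , an index of 0xffffffff would give an adjusted value of 0 and slip past the second comparison.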


7C962501 CMP ESI,EBX
# ESI is DWORD PTR [ECX+10]
# so ECX+10 holds a kind of index
7C962503 JE SHORT ntdll.7C962554

# at ntdll.7C962554
7C962554 ADD EAX,0C  # EAX is DWORD PTR [ECX+C]
# EAX = EAX + 12
7C962557 JMP SHORT ntdll.7C96255B
# 7C96255B is at the end , so if ESI and EBX are equal the function returns EAX+12

You know that RtlGetElementGenericTable is returning the value
of one of these pointers to the caller, but not before it is incremented by
12. Note that 12 also happens to be the total size of those three pointers.

All of this leads to one conclusion. RtlGetElementGenericTable is
returning a pointer to an element, and adding 12 simply skips the element’s
header and gets directly to the element’s data. It seems very likely that this
header is another three-pointer data structure just like that in offset +4 in the
root data structure. Furthermore, it would make sense that each of those
pointers point to other items with three-pointer headers, just like this one.

One other thing you have learned here is that offset +10 is the index of the cached
element—the same element pointed to by the third pointer, at offset +c. The
difference is that +c is a pointer to memory, and offset +10 is an index into the
table, which is equivalent to an element number.
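The "skip the header" step can be sketched in C as follows. The names ELEMENT_HEADER_SIZE and element_user_data are assumptions for illustration; the header is modelled as three 32-bit values because the pointers are 4 bytes wide in the 32-bit process being debugged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three 32-bit pointers precede the user data of every element. */
enum { ELEMENT_HEADER_SIZE = 3 * sizeof(uint32_t) };   /* 12 bytes */

/* Given a pointer to an element (header included), return a pointer to
   the element's data, like the ADD EAX,0C before the function returns. */
void *element_user_data(void *element)
{
    assert(element != NULL);
    return (unsigned char *)element + ELEMENT_HEADER_SIZE;
}
```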

7C962505 JBE SHORT ntdll.7C96252B
7C962507 MOV EDX,ESI
7C962509 SHR EDX,1
7C96250B CMP EBX,EDX
7C96250D JBE SHORT ntdll.7C96251B
7C96250F SUB ESI,EBX
7C962511 JE SHORT ntdll.7C96254E

September 22, 2012 / eddierusty8

Chapter 4 Reversing Tools

User mode :

  • OllyDbg
  • WinDbg
  • IDA Pro
  • PEBrowse

Kernel mode :

  • WinDbg
  • Numega SoftICE

System Monitoring Tools http://www.sysinternals.com :

  • FileMon
  • TCPView
  • TDIMon
  • RegMon
  • WinObj : for named objects
  • Process Explorer

Patching Tools :

  • Hex Workshop

Miscellaneous Reversing Tools :

  • DUMPBIN
  • PEVIEW
  • PEBrowse
September 21, 2012 / eddierusty8

Chapter 3 Windows Fundamentals

src : Reversing: Secrets of Reverse Engineering
page table entries carry a set of flags

Working Sets : a per-process data structure that lists the physical pages the process is currently using

hint ! A 32 bit number whose first hexadecimal digit is 8 or above is not a valid user mode address
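The hint can be turned into a quick check. This sketch assumes the usual 32-bit split where user mode gets the lower 2 GB; the function name is_user_mode_address32 is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the first hexadecimal digit of a 32-bit address is below 8,
   i.e. the address falls in the lower 2 GB of the address space. */
bool is_user_mode_address32(uint64_t address)
{
    assert(address <= UINT32_MAX);   /* the hint is about 32-bit values */
    return (address >> 28) < 0x8;    /* top hex digit of the address */
}
```

For example , 0x7C921491 ( inside NTDLL.DLL ) passes the check , while 0x80000000 does not.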

The Kernel Memory Space :

  • Paged and Non Paged pools :

    • are kernel mode heaps
    • non paged pool is never flushed to the hard drive
  • System Cache :

    • is where the Windows cache manager maps all currently cached files
    • when you open a file a section object is created
    • when you use the ReadFile or WriteFile API the file system internally accesses the mapped copy of the file using the cache manager API
  • Terminal Services Session Space :

    • WIN32K.SYS is loaded in multiple instances
    • each instance is loaded into the same virtual address but in different session space
    • there is also a session pool
  • Page Tables and Hyper Space :

    • hyper space primarily for mapping working sets
  • System Working Set

  • System PTE :

    • virtual memory space used for large kernel allocations
    • device drivers can allocate from it by calling MmAllocateMappingAddress

Section Objects (Memory Mapped Files) :

  • special chunk of memory managed by OS
  • before its contents can be accessed it must be mapped
  • key property ! : can be mapped to more than one place
  • used to share memory , including between kernel mode and user mode
  • two basic types :
    1. Page file backed
    2. File backed eg : used when loading an exe

VAD Trees

User Mode Allocations :

  • private allocations : Win32 API VirtualAlloc
  • heaps : Win32 API HeapAlloc
  • stacks
  • executables , loaded as memory mapped files
  • mapped views : memory mapped files created to share memory

Memory Management APIs :

  • VirtualAlloc : reserve vs. commit ; committing allocates space in the system page file , but no physical memory is used until the memory is actually accessed
    if privileged , there is an Ex version of the API that receives a handle to a process object
  • VirtualProtect
  • VirtualQuery
  • VirtualFree
  • ReadProcessMemory and WriteProcessMemory : APIs that read and write another process's address space
  • MapViewOfFileEx : maps a section object into user mode

Objects and Handles

Named Objects :

  • arranged in hierarchical directories
  • the Win32 API restricts user mode access to them
  • most interesting directories :
    • BaseNamedObjects : such as mutexes
    • DeviceObjects : can't be accessed with Win32 directly , use symbolic links
    • Global : symbolic links directory

Threads :

a thread can have two stacks , because a thread alternates between user mode and kernel mode

Process Initialization Sequence :

  1. Win32 API CreateProcess
    • creates a process object
    • allocates a new memory address space
  2. CreateProcess maps NTDLL.DLL and the .exe into the address space
  3. creates the first thread and allocates its stack
  4. the thread starts in the LdrpInitialize function from NTDLL.DLL
  5. LdrpInitialize recursively traverses the exe import table
  6. LdrpRunInitializeRoutines initializes the DLLs
  7. then LdrpInitialize calls BaseProcessStart from KERNEL32.DLL , which calls WinMain

Win32APIs :

  • Kernel APIs ( the BASE API ) , KERNEL32.DLL : implemented using direct calls to the native API
  • GDI APIs , GDI32.DLL : basic graphics , implemented in the kernel inside WIN32K.SYS
  • USER APIs , USER32.DLL : window management

Native API (NTDLL.DLL) :

  • WIN32 API is just a layer above the Native API
  • every native API has two versions , one with the Nt prefix and one with the Zw prefix

System Calling Mechanism :

  •  <= 2000

    ntdll ZwReadFile
    mov eax, 0xa1
    lea edx, [esp+0x4]
    int 0x2e
    ret 0x24

    • the processor uses the IDT ( Interrupt Descriptor Table )
    • the IDT entry for 2e points to an internal NTOSKRNL function called KiSystemService
    • KiServiceTable is an array of pointers to the various services
    • eax is the index into the KiServiceTable array
  • >= XP , 2003

    • instead of an IDT handler the system uses the SYSENTER instruction
    • it jumps to a predetermined function whose address is stored in a special model specific register ( MSR )
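The "index into an array of service pointers" idea can be sketched in user mode C. Nothing here is real kernel code: service_table, the two sample services and dispatch_service are made-up stand-ins for KiServiceTable and KiSystemService, and the bounds check is my own addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A service receives a pointer to its arguments, like EDX in the stub. */
typedef int (*service_fn)(const uint32_t *args);

static int service_first(const uint32_t *args) { return (int)args[0]; }
static int service_sum(const uint32_t *args)   { return (int)(args[0] + args[1]); }

/* Stand-in for KiServiceTable: an array of pointers to services. */
static const service_fn service_table[] = { service_first, service_sum };

/* Stand-in for the dispatcher: the index plays the role of EAX. */
int dispatch_service(uint32_t index, const uint32_t *args)
{
    size_t count = sizeof service_table / sizeof service_table[0];

    assert(args != NULL);
    if (index >= count)
        return -1;               /* unknown service number */
    return service_table[index](args);
}
```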

Image Sections

Section Alignment : 2 values ( in the file and in memory )
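Both alignment values are applied the same way : a section size is rounded up to the next multiple of the alignment. A minimal sketch , assuming power-of-two alignments ; the helper name align_up and the example values 0x200 and 0x1000 are my own assumptions , not values taken from the notes.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment (a power of two). */
uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With these example values , a section of 0x1234 bytes would take 0x1400 bytes in the file and 0x2000 bytes in memory.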

DLLS :

  • if a certain DLL is loaded into more than one address space , a single copy in physical memory is mapped into each of them
  • two different methods for attaching to DLLs
    1. static linking : import table in PE header
    2. runtime linking

Headers :

  • DOS Header :
    • _IMAGE_NT_HEADERS
      • IMAGE_FILE_HEADER
      • IMAGE_OPTIONAL_HEADER32
    • most of the interesting contents of the PE header reside in DataDirectory

Import and Export :

the addresses of imported functions are found by going over the exporting module's export table , which contains the name and RVA of every exported function
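The name-to-RVA lookup can be sketched with a tiny table. The entries are copied from the dumpbin listing in the Chapter 5 notes ; the struct export_entry , the function find_export_rva and the linear search are simplifications of my own , not the real PE export directory layout or the loader's algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified export table entry: a name and its RVA. */
struct export_entry {
    const char *name;
    uint32_t    rva;
};

static const struct export_entry exports[] = {
    { "RtlInitializeCriticalSectionAndSpinCount", 0x000114FA },
    { "RtlInitializeGenericTable",                0x00021491 },
    { "RtlInitializeGenericTableAvl",             0x00030161 },
    { "RtlInitializeHandleTable",                 0x00018B7A },
    { "RtlInitializeRXact",                       0x0002FE1A },
};

/* Return the RVA of the named export, or 0 when it is not found. */
uint32_t find_export_rva(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++) {
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].rva;
    }
    return 0;
}
```

Adding the module's base address to the returned RVA gives the address that ends up in the importer's IAT.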

Directories , List of Common Special Directories :

  • Import Table
  • Export Table
  • Resource Table : points to the exe resource directory , the static definition of various user interface elements
  • Base Relocation Table
  • Debugging Information
  • Thread Local Storage Table
  • Load Configuration Table : contains a variety of image configuration elements
  • Bound Import Table
  • IAT Import Address Table
  • Delay Import Descriptor : not implemented by OS , but by C run-time library

The IO System

  • layered which means that for each device there can be multiple device drivers that are stacked on top of each other
  • think of transparent file compression-encryption
  • tools that insert special filtering code

The WIN32 Subsystem :

  • it's important to realize that the components considered the Win32 subsystem are not responsible for the entire Win32 API , only for the GDI and USER portions of it . As described earlier , the BASE API exported from KERNEL32.DLL is implemented using direct calls to the native API
  • the Win32 subsystem is implemented in the WIN32K.SYS kernel component and is controlled by the GDI32.DLL and USER32.DLL user components
  • Object Management :
    • USER and GDI don't use the kernel object manager
    • they have their own little object management mechanisms
    • they maintain object tables
    • the tables are stored and managed in kernel memory
    • but are also mapped into each process address space for read only access from user mode
    • hence USER and GDI handles are global

Structured Exception Handling :

  • hardware exceptions and software exceptions
  • throw keyword is implemented using the RaiseException WIN32 API which goes down into the kernel and follows the same code path as hardware exception
  • each thread is assigned an exception handler list
  • the list is stored in thread information block TIB data structure which is available from user mode
  • the TIB is stored in regular private allocation user mode memory
  • each process can have multiple TIB data structures
  • How does a thread find its own TIB structure ?
  • by using the FS segment register as a pointer ; the TIB is always available at FS:[0]
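The per-thread handler list can be pictured as a linked list that is walked until some handler accepts the exception. This is only a portable sketch : struct handler_node , the two sample handlers and dispatch_exception are hypothetical names , and the real list hangs off the TIB with different record and handler signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One registration record: a link to the next record and a handler. */
struct handler_node {
    struct handler_node *next;
    bool (*handler)(int code);   /* returns true if it handled the code */
};

/* Sample handlers used for illustration only. */
bool handles_only_42(int code) { return code == 42; }
bool handles_nothing(int code) { (void)code; return false; }

/* Walk the list from the most recently installed handler outward. */
bool dispatch_exception(const struct handler_node *head, int code)
{
    for (const struct handler_node *node = head; node != NULL; node = node->next) {
        assert(node->handler != NULL);
        if (node->handler(code))
            return true;         /* stop at the first handler that accepts */
    }
    return false;                /* nobody handled it */
}
```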
September 21, 2012 / eddierusty8

Chapter 2 Low Level Software

Intel NetBurst

Micro Ops :

  • below opcodes
  • micro code RAM , contains microcode for every instruction in the instruction set
  • execution trace cache for execution

Pipelines :

  1. Front end : from opcode to microcode
  2. Out of order core : reorders them based on availability of CPU resources
  3. Retirement Section : responsible for preserving original order

Four Execution Ports !

Branch Prediction :

branch trace buffer which records results of most recent branch instructions processed