Having had some free time recently, I took the opportunity to dig deeper into a feature of all x64 systems… Last month, while I was analysing a sample of the Expiro file infector, I encountered an instruction like this:
mov r11, gs:10h
Of course, judging by the code context and my previous x86 experience, this instruction moves the content of the StackLimit field of the current TEB (Thread Environment Block) into the r11 register.
But how is this implemented on an x64 CPU?
According to Intel manuals (System Programming Guide, Chapter 3.2.4):
“In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as an additional base registers in linear address calculations. They facilitate addressing local data and certain operating system data structures.”
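The rule in this quote can be sketched as a small model. This is only an illustration of the address arithmetic, not how the hardware is implemented; the function name `linear_address` and the example base value are hypothetical.

```python
# Simplified model of linear-address formation in 64-bit mode.
# Assumption: only the rule quoted from the Intel manual is modelled
# (no canonical-address checks, no limit checks).
MASK64 = (1 << 64) - 1

def linear_address(segment, effective_address, fs_base=0, gs_base=0):
    """Return the linear address for an access made through `segment`."""
    if segment in ("CS", "DS", "ES", "SS"):
        base = 0                    # base is treated as zero in 64-bit mode
    elif segment == "FS":
        base = fs_base              # FS keeps a real base
    elif segment == "GS":
        base = gs_base              # GS keeps a real base
    else:
        raise ValueError("unknown segment register: %r" % segment)
    return (base + effective_address) & MASK64

# Hypothetical TEB address, chosen only for the example.
example_teb = 0x000007FFFFFDE000
stack_limit_field = linear_address("GS", 0x10, gs_base=example_teb)
```

With this model, an access such as "mov r11, gs:10h" resolves to the GS base plus 0x10, while the same offset through DS would resolve to 0x10 itself.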
It seems that 64-bit segments and their descriptor tables are now useless, with the FS and GS segments as the exceptions. I dumped the GDT of a live 64-bit system, then developed a specific driver to perform some analysis. Here are the results:
According to the previous picture, it seems that all x64 segments are used only for memory protection (Ring 0-3 protection). But one important question arises: how is it possible that the base address of the GS segment is 0? It is indeed very improbable that the TEB is located at address 0x00000000.
The following snapshot demonstrates that in the x64 architecture all 64-bit segments are used only for memory protection, whereas standard x86 (32-bit) segments are still used for full segmentation, as in the previous x86 architecture:
So far so good… Now it is time to understand why the FS and GS segments work the way they do in 64-bit long mode. The answer lies in the Windows x64 syscall handler. The Intel manual states that the new SYSCALL instruction transfers execution control to the address found in the IA32_LSTAR model-specific register (and changes the current privilege level of the CPU). The IA32_LSTAR register points to the “KiSystemCall64” Nt kernel routine. Whenever a native API is called from user mode, Ntdll code uses the SYSCALL instruction to perform the kernel transition, which is then handled by the “KiSystemCall64” procedure.
This routine first executes the “swapgs” instruction. According to the Intel manuals: “SWAPGS exchanges the current GS base register value with the value contained in MSR address C0000102H (IA32_KERNEL_GS_BASE). The SWAPGS instruction is a privileged instruction intended for use by system software”. Based on our tests, this description needs one addition: the value of the GS base register is the value exposed by the IA32_GS_BASE model-specific register (MSR address C0000101H), so SWAPGS effectively exchanges IA32_GS_BASE and IA32_KERNEL_GS_BASE. On x64 Windows systems, while a thread runs in user mode, these MSRs contain:
- IA32_KERNEL_GS_BASE – Pointer to current processor control region (PCR)
- IA32_GS_BASE – Pointer to current execution thread TEB
- IA32_FS_BASE – Currently unused in Windows x64. Its value equals the base address of the 32-bit FS segment descriptor (located in the GDT). In 64-bit executables an instruction like “MOV RAX, FS:[10h]” causes an access violation
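The role of the two GS-related MSRs listed above can be modelled in a few lines. This is a toy model under stated assumptions: the class `ToyCpu`, its method names and the TEB/PCR addresses are hypothetical, and only the register exchange is modelled (not the rest of SYSCALL).

```python
# Toy model of SWAPGS on a kernel transition.
# The MSR addresses are the architectural ones; everything else is illustrative.
IA32_GS_BASE = 0xC0000101
IA32_KERNEL_GS_BASE = 0xC0000102

class ToyCpu:
    def __init__(self, teb, pcr):
        self.cpl = 3  # start in user mode
        self.msr = {IA32_GS_BASE: teb, IA32_KERNEL_GS_BASE: pcr}

    @property
    def gs_base(self):
        # The GS base used for addressing is the value seen through IA32_GS_BASE.
        return self.msr[IA32_GS_BASE]

    def swapgs(self):
        if self.cpl != 0:
            raise PermissionError("#GP(0): SWAPGS is privileged")
        self.msr[IA32_GS_BASE], self.msr[IA32_KERNEL_GS_BASE] = (
            self.msr[IA32_KERNEL_GS_BASE],
            self.msr[IA32_GS_BASE],
        )

    def syscall_entry(self):
        self.cpl = 0   # SYSCALL switches to ring 0 ...
        self.swapgs()  # ... and the handler executes swapgs first

# Hypothetical addresses, chosen only for the example.
TEB = 0x000007FFFFFDE000
PCR = 0xFFFFF80001D00000
cpu = ToyCpu(teb=TEB, pcr=PCR)
user_gs = cpu.gs_base
cpu.syscall_entry()
kernel_gs = cpu.gs_base
```

In this model GS resolves to the TEB before the transition and to the PCR after it, while the TEB pointer is parked in IA32_KERNEL_GS_BASE until the next swapgs.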
The test driver I developed confirms all the previous conclusions. In fact, when Windows runs in 64-bit long mode, the GS segment always points to the current thread TEB in user mode, whereas in kernel mode it points to the current processor PCR.
Adding a segment descriptor and running some other tests confirmed again that in long mode the x64 GDT is used only for memory protection. In 32-bit compatibility mode, all segments are used as normal for memory segmentation.
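For readers who want to repeat this kind of check on their own GDT dump, the fields of a legacy 8-byte code/data segment descriptor can be decoded as sketched here. The bit layout follows the Intel manuals; the function name and the two sample descriptor values are assumptions made for illustration and are not taken from the dump discussed in this article.

```python
# Decode an 8-byte code/data segment descriptor given as a 64-bit integer.
def decode_descriptor(raw):
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    return {
        "base": base,                    # 32-bit base (bits 16-39 and 56-63)
        "limit": limit,                  # 20-bit limit (bits 0-15 and 48-51)
        "type": (raw >> 40) & 0xF,       # segment type
        "dpl": (raw >> 45) & 0x3,        # descriptor privilege level (ring)
        "present": (raw >> 47) & 0x1,    # P flag
        "long_mode": (raw >> 53) & 0x1,  # L flag: 64-bit code segment
        "granularity": (raw >> 55) & 0x1 # G flag: limit counted in 4 KB pages
    }

# Illustrative values: a ring 0 and a ring 3 64-bit code segment with base 0.
kernel_code = decode_descriptor(0x00209B0000000000)
user_code = decode_descriptor(0x0020FB0000000000)
```

Both sample descriptors have a base of zero and differ only in their DPL, which is the kind of pattern described above for long mode.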
Returning to the Windows system call handler, its job is quite simple: the user-mode stack pointer is saved in the PCR data structure, the kernel stack pointer is then retrieved, all GP registers (and the MXCSR SSE control/status register) are saved on the stack, and the user-mode debug environment is saved (if needed). Execution control is then transferred to “KiSystemServiceStart”. The latter routine is the core of the system service dispatch feature: first, it calculates the right system table pointer (if the native API number is 0x1000 or above, the required function is a Win32 GDI graphics one, otherwise it is a standard Nt kernel API). It then retrieves the right native API pointer from the table, copies all remaining stack parameters (KiSystemServiceCopyStart) and finally calls the kernel API. It is worth noting that the syscall dispatch of Windows 8 and 8.1 is totally changed from that of older Microsoft operating systems. Describing these new features in depth is beyond the scope of this brief paper…
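The table selection step can be illustrated with a short sketch. The 0x1000 threshold comes from the description above; treating the low 12 bits as the index inside the selected table is an assumption made for this example, and the function name and table labels are hypothetical.

```python
# Sketch of how a native API number could be split into table and index.
# Assumption: the low 12 bits are the index within the selected table.
def decode_service_number(number):
    table = "win32k" if number >= 0x1000 else "nt"
    index = number & 0xFFF
    return table, index

nt_example = decode_service_number(0x55)      # a standard Nt kernel API
gdi_example = decode_service_number(0x1003)   # a Win32 graphics API
```

The dispatcher would then use the index to fetch the API pointer from the chosen table before copying the stack parameters and calling it.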
A printable “pdf” version of this article is available here.
9 thoughts on “X64 Memory segmentation – Is the game over?”
Interesting article. You inspired me to research more on this topic. Thanks!
You are welcome hasherezade!
If you would like to share with me your new research topic, it could be great!
In the meantime I am trying your product: PE-bear, from your website…. 🙂
Hey! I am glad to find your post. I actually stumbled upon it while seeking an answer to the question: why is the cs register being used as if it was a kernel data structure?
Looking at some code from the kernel driver win32k.sys, I have seen the code referring to cs:_gptiCurrent; this seemed a bit odd, as the cs register should be used to indicate the current code section, and it made me wonder. This question still remains unanswered, so if you have some insights on the matter, I’d love to hear them.
This post made me revisit the topic of segmentation which I have researched a bit in the past, and made me re-think about it a bit. In my research, I have learned how Intel defines segmentation and tried to reflect upon how Windows is using it; thus, I have written a kernel mode driver which is pretty similar to the one you described in your article.
The code for it is found here: https://github.com/scalys7/Windows-Kernel-Research/tree/master/Windows%20Kernel%20Research .
When I revisited it and its execution results and compared them to what you have described here, I found it interesting. At first glance, the results showed that the gs register in kernel mode has base address 0 and points to a user-mode segment. This contradicted all common sense, so I revisited the code I wrote to print segment register values; that confirmed my assumption: the details printed were extracted from the corresponding GDT entry, given by the index from the selector part of the register. This makes sense, due to the following two points: 1) the instruction mov ax, gs only reads the selector part of the register and 2) the actual values used are those in the hidden part of the segment register.
I later went on to look at the pseudo-code (http://www.felixcloutier.com/x86/SWAPGS.html) describing the swapgs instruction, which fully resolved this anomaly; the swapgs instruction only replaces the hidden part of the gs register value.
One remark on your article: it is not true to say that the swapgs instruction uses IA32_GS_Base, as you may clearly see in the pseudo-code. What I think might have confused you is that most likely IA32_GS_Base always holds the TEB; I do not know where it is being used, as the swapgs instruction uses Kernel_Base exclusively, but I guess that when dumping the values in user mode, you would find it pointing to the TEB and Kernel_Base pointing to the PCR, which makes sense. During kernel mode execution, Kernel_Base should in turn point to the TEB, according to the code linked above.
I have written this reply mainly because I have a few questions unanswered:
1) You have claimed that:
“X64 GDT is used in long mode only for memory protections. In 32 bit compatibility mode, all segments are used as normal for memory segmentation.”
Is there any documentation upon how Windows 32-bit uses a full segmentation model?
2) The cs register usage as described in the very beginning of this comment.
Hi J.C. Scaly!
First of all, thanks for pointing this out.
I would like to try to answer your questions.
If you look at the Intel manual (Volume 3A, section 3.2.4) you will find that: “In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address.” This means that the CS, GS, DS, ES and SS segment descriptors are still valid, but their base (“offset”) part will be zero or will be ignored by the processor. The only part considered is the one regarding memory protection (at least based on my tests, to implement the ring 0 / ring 3 separation). This is why you will find the base of every 64-bit segment descriptor always equal to zero in 64-bit mode (see also my screenshots). Even in modern 32-bit Windows operating systems, segmentation is mainly (but not always) used for the ring 0 / ring 3 separation. Take a look at “Chapter 5.5 – Privilege Levels” of the Intel manual to find out how the system implements this separation. You can also check the following article:
Regarding the “CS” matter, honestly I have no idea. I am 10000% sure that segmentation is not used in 64-bit mode, but on rare occasions I have seen something like “mov rax, qword ptr cs:[0xADDR]”. I think that it is there for historical reasons. I have even tried to substitute the opcode, and I can assure you that the results are still the same. The only segment register used in x64 is the GS segment.
If you take a look at the Intel manuals you will find:
IF CS.L ≠ 1 (* Not in 64-Bit Mode *)
THEN #UD; FI;
IF CPL ≠ 0
THEN #GP(0); FI;
tmp ← GS.base;
GS.base ← IA32_KERNEL_GS_BASE;
IA32_KERNEL_GS_BASE ← tmp;
My tests have also confirmed this pseudocode.
Hope that this will help you.
Sorry for the late answer.
Andrea (aka Aall86)
Nice article! I do have one follow-up. I was checking the value of the fs register, and on Windows systems it seems to be 0x30, but on a 64-bit system it seems to be 0x53 (it looks the same in your post). Sorry if this is a dumb question, I am just getting started 🙂
This is because, according to the Intel manuals (see section 3.4.2), the 3 lowest bits of the segment selector store the RPL (2 bits) and the TI flag (1 bit), while the remaining bits are the index into the descriptor table. In your case, 0x53 corresponds to:
Requested Privilege Level (RPL) = 3 – User-mode code
Table Indicator = 0 – GDT
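The decoding can be checked with a short sketch; the function name is hypothetical, while the bit layout is the one from the Intel manual section mentioned above.

```python
# Split a 16-bit segment selector into its three fields.
def decode_selector(selector):
    selector &= 0xFFFF
    return {
        "index": selector >> 3,      # entry number in the descriptor table
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 3,         # requested privilege level
    }

sel_53 = decode_selector(0x53)
sel_30 = decode_selector(0x30)
```

Here 0x53 decodes to index 10, TI 0, RPL 3, while 0x30 decodes to index 6, TI 0, RPL 0.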
Hope this helps.
Hey, AaLl86, thanks for the article.
I have a question. If the base address of the GS segment is just 32-bits how is it used in a 64-bit flat addressing space? I mean, how can a segment reach beyond the first 4GB of the 64-bit linear address space with just 32-bits? Or is it sign-extended to 64-bits. Which still doesn’t help the issue too much.
The GS segment descriptor in a 64-bit environment is always 64-bit. Are you perhaps confusing it with the segment selector?
I meant 64-bit base address for GS register. Where does it come from if the segment descriptor has only 32-bit base?