Windows Internals
2025-04-11
2025-04-12

Windows divides its architecture into two distinct privilege domains: Kernel Mode and User Mode.

In Kernel Mode, the Executive layer implements core operating system services through specialized managers:

  • The Process Manager handles task creation, termination and scheduling
  • The Memory Manager controls virtual address spaces and paging operations
  • The I/O Manager orchestrates data flow between applications and hardware by routing I/O Request Packets (IRPs) through appropriate driver stacks
  • The Object Manager provides a unified framework for creating and managing kernel resources like mutexes, events, and semaphores through handle tables
  • Security enforcement falls to the Security Reference Monitor (SRM), which validates access tokens against Access Control Lists for every protected resource operation

Beneath the Executive sits the Kernel itself, managing fundamental CPU scheduling, interrupt dispatching, and multiprocessor synchronization using primitives like spinlocks and dispatcher objects. The Hardware Abstraction Layer (HAL) creates a consistent interface to diverse hardware platforms, isolating the kernel from platform-specific details like ACPI power management and multiprocessor communication protocols. Device Drivers, though executing in Kernel Mode, are modular components that can be loaded and unloaded dynamically to control hardware devices or provide system services.

In User Mode, applications execute with restricted privileges, unable to directly access hardware or critical system structures. Environment Subsystems provide API implementations that applications use to request system services:

  • The Win32 subsystem serves as the primary interface for traditional Windows applications
  • Newer additions like Windows Subsystem for Linux (WSL2) enable Linux binary compatibility through lightweight virtualization

User applications never interact directly with kernel components; instead, they invoke documented API functions exposed by these subsystems. These calls are translated into system service requests that transition from User Mode to Kernel Mode through carefully controlled entry points. For example, when an application calls CreateFile(), the request passes through kernel32.dll and eventually ntdll.dll, which executes a supervised transition to Kernel Mode where the NtCreateFile system service is performed.

Windows Process Architecture

A Windows process is built from several internal structures and components. At the center is the EPROCESS structure, which acts as the kernel’s representation of a process. It provides information about process identity, memory space, threads, handles, access tokens, and environment details.

The Process ID serves as a unique identifier assigned to every process, enabling the system to track and manage processes individually. The Virtual Address Space is the memory sandbox allocated to the process, containing sections like code, data, stack, and heap. This space is managed through page tables, which map virtual addresses to physical memory.

Threads, represented by ETHREAD objects, are the units of execution within a process. Each thread has an associated TEB (Thread Environment Block), which contains user-mode information such as the stack and thread-local storage. Multiple threads can belong to the same process, supporting parallelism.

Handles refer to resources the process interacts with, such as open files, registry keys, mutexes, and shared memory sections. These handles link the process to kernel objects and resources it uses during execution.

The Access Token holds the security context of the process, defining the user and group SIDs, specific Privileges, and the process’s Integrity Level. This token governs what the process can access and execute based on the system’s security policies.

The PEB (Process Environment Block) is a user-mode structure containing process-specific metadata like environment variables, command line arguments, loaded modules (DLLs), and memory heaps. It is crucial for linking functionality between the kernel and user-mode applications.

Finally, the executable image describes the program code and data loaded into memory for execution. It includes the PE Headers (Portable Executable format) and sections such as the compiled code and initialized data.

Process/Thread Management Engine

The Process/Thread Management Engine governs the lifecycle and execution of threads through a state-driven model. In simplified form, threads transition between Ready, Running, Waiting, and Terminated states (the kernel also tracks intermediate states such as Initialized, Standby, Transition, and DeferredReady). A thread becomes Ready once it is eligible to run, enters Running when the scheduler dispatches it to a CPU core, and returns to Ready if it is preempted by a higher-priority thread or its quantum expires. When a thread blocks, for example on a synchronous I/O request, it moves to the Waiting state until the operation completes. Termination occurs when a thread exits or is forcefully ended, triggering resource cleanup.
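The simplified four-state model above can be written down as a small transition table. This is only an illustrative sketch: the `ThreadState` enum, the `ALLOWED` table, and the `transition` helper are hypothetical names, and the choice to allow termination from any live state is an assumption made to model forced termination.

```python
from enum import Enum, auto


class ThreadState(Enum):
    READY = auto()
    RUNNING = auto()
    WAITING = auto()
    TERMINATED = auto()


# Allowed moves in the simplified model (assumption: a thread can be
# forcefully terminated from any live state).
ALLOWED = {
    ThreadState.READY: {ThreadState.RUNNING, ThreadState.TERMINATED},
    ThreadState.RUNNING: {ThreadState.READY, ThreadState.WAITING, ThreadState.TERMINATED},
    ThreadState.WAITING: {ThreadState.READY, ThreadState.TERMINATED},
    ThreadState.TERMINATED: set(),
}


def transition(current: ThreadState, target: ThreadState) -> ThreadState:
    """Return the new state, or raise if the model does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Note that a waiting thread goes back to Ready rather than straight to Running: once its wait is satisfied, it still has to be selected by the scheduler.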

Windows employs a priority-driven preemptive scheduler with 32 priority levels (0–31). Each thread carries two priority values: a base priority, derived from the process priority class (set with SetPriorityClass) and the thread’s relative priority (set with SetThreadPriority), and a dynamic priority, which the scheduler adjusts temporarily to improve responsiveness (e.g., boosting threads that have just completed an I/O wait or that belong to the foreground application). The length of a time slice (quantum) can be tuned through the Win32PrioritySeparation value under HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl, allowing adjustment for specific workloads.
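The core selection rule (always run the highest-priority ready thread, round-robin within a level) can be sketched as follows. This is a toy model, not the real dispatcher: `ReadyQueues`, `make_ready`, `pick_next`, and `should_preempt` are hypothetical names, and priority boosts, quanta, and multiple processors are deliberately left out.

```python
from collections import deque

NUM_PRIORITIES = 32  # levels 0-31, as in the text


class ReadyQueues:
    """One FIFO queue per priority level; the highest non-empty level wins."""

    def __init__(self) -> None:
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]

    def make_ready(self, thread_name: str, priority: int) -> None:
        if not 0 <= priority < NUM_PRIORITIES:
            raise ValueError("priority must be in 0..31")
        self.queues[priority].append(thread_name)

    def pick_next(self):
        """Pop the head of the highest-priority non-empty queue, or None."""
        for priority in range(NUM_PRIORITIES - 1, -1, -1):
            if self.queues[priority]:
                return self.queues[priority].popleft(), priority
        return None  # nothing is ready; an idle thread would run here


def should_preempt(running_priority: int, new_ready_priority: int) -> bool:
    """A newly ready thread preempts only if its priority is strictly higher."""
    return new_ready_priority > running_priority
```

Threads at the same level are served in arrival order, which is why two priority-8 workers alternate while a priority-24 thread always goes first.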

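Key kernel data structures underpin this system. EPROCESS stores process-wide metadata, including virtual memory information, the handle table, and the security token; its embedded KPROCESS (the Pcb field) holds what the scheduler needs, such as the page directory base, the thread list, and default priority and affinity. KTHREAD, embedded in ETHREAD, tracks thread-specific details such as the pointer to the Thread Environment Block (TEB), the kernel stack and saved register context, and Asynchronous Procedure Call (APC) queues. The Dispatcher Header is the common header of every waitable object (mutexes, events, threads, and so on); it records whether the object is signaled and which threads are waiting on it, so that a change from unsignaled to signaled can release those waits.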

Virtual Memory Subsystem

The Virtual Memory Subsystem manages how Windows allocates, maps, and optimizes memory across physical and virtual spaces. At its core, the Memory Manager maintains the 4-level paging structure (PML4 → Page Directory Pointer → Page Directory → Page Table) that the CPU walks to translate the virtual addresses used by applications into physical addresses. This hierarchy supports 4 KB pages as well as large pages (2 MB, and 1 GB where the hardware allows), balancing granularity and efficiency for diverse workloads.
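To make the 4-level walk concrete, here is a small worked sketch of how a 48-bit x64 virtual address splits into four 9-bit table indices plus a page offset. The function names are hypothetical; the bit positions follow the standard x64 layout for 4 KB pages, and the 2 MB case simply stops one level earlier and uses a 21-bit offset.

```python
def split_virtual_address(va: int) -> dict:
    """Split a 48-bit x64 virtual address into 4-level paging indices (4 KB pages)."""
    return {
        "pml4": (va >> 39) & 0x1FF,   # bits 47..39
        "pdpt": (va >> 30) & 0x1FF,   # bits 38..30
        "pd": (va >> 21) & 0x1FF,     # bits 29..21
        "pt": (va >> 12) & 0x1FF,     # bits 20..12
        "offset": va & 0xFFF,         # bits 11..0
    }


def large_page_offset(va: int) -> int:
    """For a 2 MB page the walk ends at the Page Directory; the low 21 bits are the offset."""
    return va & 0x1FFFFF
```

Each index is 9 bits wide because every table holds 512 eight-byte entries, which fills exactly one 4 KB page.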

A process’s Virtual Address Space is partitioned into logical regions (code, data, stack, heap) and described by Page Tables, which track whether each page currently lives in physical memory or on disk. The Working Set, the subset of these pages resident in RAM for the process, is trimmed by an aging scheme similar to the clock algorithm, which evicts less-recently-used pages to free space. The page table entry for a working-set page is Valid (immediately accessible); a trimmed page whose contents are still in RAM on the standby or modified list is marked Transition and can be brought back without disk I/O.
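The clock-style eviction mentioned above can be illustrated with the textbook second-chance algorithm. This is a sketch of the general technique only, under the assumption of a fixed-size resident set; the real working-set manager is considerably more involved, and `ClockWorkingSet` is a hypothetical name.

```python
class ClockWorkingSet:
    """Fixed-size set of resident pages with second-chance (clock) eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages = []        # resident page numbers, one per frame
        self.referenced = []   # one "accessed" bit per frame
        self.hand = 0          # next frame the clock hand will inspect

    def access(self, page: int):
        """Touch a page; return the evicted page number, or None if nothing was evicted."""
        if page in self.pages:
            self.referenced[self.pages.index(page)] = True
            return None
        if len(self.pages) < self.capacity:
            self.pages.append(page)
            self.referenced.append(True)
            return None
        # Sweep: clear accessed bits until a frame with a clear bit is found.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % self.capacity
        victim = self.pages[self.hand]
        self.pages[self.hand] = page
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % self.capacity
        return victim
```

A page that was touched since the hand last passed gets a second chance: its bit is cleared instead of the page being evicted, so only pages that stay untouched for a full sweep are removed.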

The Page Frame Number (PFN) Database acts as a global index of all physical memory pages, recording their states (such as active, standby, modified, free, or zeroed). When a Page Fault occurs, the Memory Manager determines its type: a Hard Fault requires reading the page from disk (pagefile.sys or a mapped file), while a Soft Fault is resolved without disk I/O, for example by remapping a page that is still in physical memory on the standby or modified list, or by completing a copy-on-write operation.

I/O Subsystem Mechanics

The I/O subsystem orchestrates the flow of data between applications and hardware through a layered, packet-driven architecture. When an application initiates an I/O operation, the I/O Manager creates an I/O Request Packet (IRP), a data structure representing the operation throughout its lifecycle. Alongside a fixed header, the IRP carries one I/O stack location per driver in the stack, and each stack location holds the essential fields: the MajorFunction code identifying the operation type (such as IRP_MJ_READ or IRP_MJ_WRITE), Parameters specific to the operation (offsets, lengths, and similar), and a CompletionRoutine callback address that a driver can set to be notified when the lower layers finish.

Once created, the IRP travels down a stack of drivers, each handling a specific layer of abstraction. The I/O Manager routes the IRP to the appropriate file system driver, such as NTFS, which interprets file metadata through its Master File Table (MFT) to locate the physical storage location. NTFS then passes the processed IRP to volume management drivers like VolMgr, which translate logical volumes to physical partitions. Finally, the IRP reaches storage port drivers such as Storport, which communicate with hardware-specific miniport drivers designed for particular storage controllers.

At the hardware level, modern storage operations leverage Direct Memory Access (DMA) to transfer data between memory and peripherals without CPU intervention. The disk driver prepares scatter/gather lists that map potentially non-contiguous memory buffers into a format the hardware can efficiently process. This mapping enables the storage controller to read from or write to system memory directly, significantly reducing CPU overhead during data transfers.

When hardware completes the operation, it signals the system via an interrupt. This interrupt triggers the disk driver to process completion status and propagate the results back up the driver stack. Each driver in the chain may perform post-processing before passing control upward. Eventually, the I/O Manager receives the completed IRP and either notifies the application directly (for synchronous calls) or invokes the registered completion routine (for asynchronous operations).
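The down-then-up journey described in this section can be modelled in a few lines. The sketch below is a toy: the `Irp` class is not the real IRP layout, `send_down` is a hypothetical helper, and the driver names are just labels taken from the text. It only shows the ordering, namely that completion routines run in the reverse order of the trip down the stack.

```python
class Irp:
    """Toy stand-in for an I/O Request Packet (not the real structure layout)."""

    def __init__(self, major_function: str, length: int) -> None:
        self.major_function = major_function
        self.length = length
        self.status = "PENDING"
        self.trace = []                # records the path, for illustration
        self.completion_routines = []  # registered on the way down


def send_down(irp: Irp, driver_stack: list) -> Irp:
    """Pass the IRP through each driver top to bottom, then complete it bottom to top."""
    for driver in driver_stack:
        irp.trace.append(f"down:{driver}")
        # Each driver registers a routine to run once the lower layers finish.
        irp.completion_routines.append(lambda i, d=driver: i.trace.append(f"up:{d}"))
    irp.status = "SUCCESS"  # pretend the hardware finished the transfer
    while irp.completion_routines:
        irp.completion_routines.pop()(irp)  # most recently registered runs first
    return irp
```

Running it with `["ntfs", "volmgr", "storport"]` yields a trace that descends through the three drivers and then unwinds through them in the opposite order.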

Security Enforcement Layer

The Security Enforcement Layer in Windows implements an access control model centered around the Security Reference Monitor (SRM), a kernel-mode component that validates all resource access attempts. When an application or process attempts to access a secured object, the SRM evaluates the request by comparing the requestor’s Access Token against the object’s Security Descriptor. This token serves as the security context of the process, containing the user’s Security Identifier (SID), group memberships, and specific Privileges such as SeDebugPrivilege or SeBackupPrivilege that grant special capabilities beyond standard permissions.

Security Descriptors attached to each securable object contain two critical access control lists. The Discretionary Access Control List (DACL) defines which users or groups can access the object and what operations they can perform through Access Control Entries (ACEs). Each ACE specifies a SID and the allowed or denied permissions. The System Access Control List (SACL) enables security auditing by defining which access attempts should generate entries in the security event log, essential for compliance monitoring and forensic analysis.
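The way a DACL is evaluated can be sketched as an ordered walk over its ACEs. This is a simplified model of the documented access-check behaviour: it ignores ownership, privileges, inheritance, and integrity levels. The `Ace` class, the SID strings, and the `READ`/`WRITE`/`DELETE` bits are illustrative assumptions, not real Windows constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ace:
    kind: str   # "allow" or "deny"
    sid: str    # placeholder SID string
    mask: int   # bitmask of access rights


READ, WRITE, DELETE = 0x1, 0x2, 0x4  # illustrative bits only


def access_check(token_sids: set, dacl, desired: int) -> bool:
    """Walk ACEs in order; a matching deny ACE wins if reached before the right is granted."""
    if dacl is None:      # a NULL DACL grants full access to everyone
        return True
    remaining = desired
    for ace in dacl:
        if ace.sid not in token_sids:
            continue
        if ace.kind == "deny" and ace.mask & remaining:
            return False
        if ace.kind == "allow":
            remaining &= ~ace.mask
            if remaining == 0:
                return True
    return False          # rights that no ACE granted are implicitly denied
```

Because evaluation stops at the first decisive ACE, order matters, which is why deny entries are conventionally placed before allow entries. An empty DACL (as opposed to a missing one) grants nothing.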

Windows extends this model with Mandatory Integrity Control, which assigns integrity levels (Low, Medium, High, System) to processes and objects. By default, a process cannot modify objects with higher integrity levels regardless of DACL permissions, creating a hierarchical protection system. This prevents lower-privileged processes from tampering with more sensitive system components. For enhanced protection of credential data, Windows implements Credential Guard, which uses Virtualization-Based Security (built on the Hyper-V hypervisor) to keep the secrets normally held by the Local Security Authority Subsystem Service (LSASS) in an isolated process, LSAIso, that the normal kernel cannot read. This normally prevents malware from extracting credential material such as NTLM hashes and Kerberos tickets even if it gains system-level access.
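The default "no write up" rule reduces to a comparison of integrity levels. The sketch below uses the well-known mandatory label RIDs (the last component of an S-1-16-<RID> SID); the function name is hypothetical, and it models only the default write policy, not the optional no-read-up or no-execute-up flags.

```python
# Well-known mandatory label RIDs (last component of S-1-16-<RID>).
INTEGRITY_RID = {"Low": 0x1000, "Medium": 0x2000, "High": 0x3000, "System": 0x4000}


def write_allowed_by_mic(subject_level: str, object_level: str) -> bool:
    """Default no-write-up policy: the subject must be at or above the object's level."""
    return INTEGRITY_RID[subject_level] >= INTEGRITY_RID[object_level]
```

This check runs before the DACL is consulted, so a Low-integrity process is refused write access to a Medium-integrity object even when an ACE would otherwise allow it.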

Practical example

WARNING

All of the following actions are performed in a high-integrity context.

Using WinDbg, we can interact with the kernel and modify the Token of a Windows process:

As an example, let’s change the Token of the cmd.exe process. We first need to retrieve the address of its EPROCESS structure:

!process 0 0 cmd.exe

Then look up the offset of the Token field:

dt nt!_eprocess <CMD_EPROCESS> Token

We can see that its offset is 0x4b8 (on this build; the offset varies between Windows versions) and its type is _EX_FAST_REF.
Next, retrieve the EPROCESS address of the System process:

!process 0 0 system

And get the Token value of System by adding the offset:

dt _EX_FAST_REF <SYSTEM_EPROCESS>+0x4b8

Replace the token of the cmd.exe process with the one from System:

eq <CMD_EPROCESS>+0x4b8 <SYSTEM_Token>

And verify the change :

dt _EX_FAST_REF <CMD_EPROCESS>+0x4b8

Finally, continue kernel execution with WinDbg’s g (Go) command.
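A note on the _EX_FAST_REF values seen in the walkthrough: on x64 the structure packs a cached reference count into the low 4 bits of the pointer, so the raw value is not itself the token address. The sketch below shows the arithmetic; `decode_fast_ref` is a hypothetical helper and the sample value is made up for illustration.

```python
FAST_REF_MASK = 0xF  # low 4 bits on x64 (3 bits on 32-bit builds)


def decode_fast_ref(value: int):
    """Split a raw x64 _EX_FAST_REF value into (object pointer, cached reference count)."""
    return value & ~FAST_REF_MASK, value & FAST_REF_MASK


# Made-up sample value, formatted the way WinDbg would display it.
pointer, ref_count = decode_fast_ref(0xFFFFA68F1C2D5A7C)
print(hex(pointer), ref_count)
```

Copying the raw value as done above carries the count bits along with the pointer; to inspect the token itself (for example with !token), mask off the low bits first.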

Registry Database Engine

The Registry Database Engine serves as the centralized hierarchical configuration repository for Windows, storing settings for the operating system, applications, users, and hardware. It organizes data in a tree-like structure with major root keys including HKEY_LOCAL_MACHINE (HKLM), which stores system-wide settings in hives such as SYSTEM (boot and driver configurations), SOFTWARE (application settings), and HARDWARE (detected device information, rebuilt in memory at each boot). User-specific settings reside in HKEY_USERS, organized by Security Identifier (SID), while HKEY_CURRENT_CONFIG is a view of the current hardware profile, linked to HKLM\SYSTEM\CurrentControlSet\Hardware Profiles\Current.

Physically, the Registry is stored across multiple hive files, each corresponding to a major section of the Registry hierarchy. These files reside primarily in %SystemRoot%\System32\Config and in user profile directories. The hive files employ a sophisticated database-like structure with atomic transaction support to maintain data integrity. When Registry modifications occur, changes are first written to transaction log files (log1/log2) rather than directly to the hive, creating a journaling system. This approach protects against corruption during system crashes or power failures, as incomplete modifications remain in the logs until they can be safely committed or rolled back during the next system boot.

Internally, the Registry uses a cell-based storage model where Key Cells contain metadata about Registry keys, including timestamps, security descriptors, and references to subkeys and values. To optimize lookups, the subkeys of a key are kept in sorted index cells; the Fast Leaf (lf) and Hash Leaf (lh) formats store a short name hint or a name hash next to each subkey offset, so a binary search can locate a key in O(log n) comparisons without opening every candidate key cell. This indexing is crucial for rapid access to Registry data, especially in hives containing thousands of entries, such as the SOFTWARE hive on a typical installation.
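An O(log n) lookup over a sorted subkey index can be sketched with a plain binary search. This illustrates only the search idea and not the on-disk hive format: `find_subkey` and the sample key names are hypothetical, and names are upper-cased because Registry key names compare case-insensitively.

```python
import bisect


def find_subkey(sorted_names: list, name: str) -> int:
    """Binary-search an upper-cased, sorted subkey list; return the index or -1."""
    key = name.upper()
    i = bisect.bisect_left(sorted_names, key)
    if i < len(sorted_names) and sorted_names[i] == key:
        return i
    return -1


# Hypothetical subkey names, upper-cased and sorted before searching.
subkeys = sorted(n.upper() for n in ["Microsoft", "Classes", "Policies", "WOW6432Node"])
```

With the list sorted once, each lookup halves the candidate range per comparison instead of scanning every subkey.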

For developers and security products, Windows provides a Registry Filtering mechanism through the CmRegisterCallback and CmRegisterCallbackEx APIs. These functions allow kernel-mode drivers to intercept Registry operations before they execute, enabling monitoring, modification, or blocking of Registry access. This capability forms the foundation for many antivirus products, application control solutions, and system monitoring tools that need to protect or audit sensitive Registry operations.

The Registry’s Recovery Mechanism activates during system startup when Windows detects “dirty” hives, those that were not cleanly closed during the previous session. The system automatically applies or rolls back pending transactions from the log files to restore consistency. In cases of severe corruption, Windows can also recover from backup hives (*.bak files) or restore points.

Kernel Synchronization Primitives

The Kernel Synchronization Primitives in Windows establish a hierarchical interrupt handling and thread coordination framework essential for system stability and performance. At the core of this architecture is the IRQL (Interrupt Request Level) mechanism, which prioritizes code execution based on urgency and criticality. IRQLs form a progression from lower to higher levels, with each step up representing higher execution priority and a more restrictive execution context.

At PASSIVE_LEVEL (IRQL 0), the lowest priority, both user-mode code and most kernel-mode operations execute. This level permits thread context switches and page faults, allowing memory access operations to complete transparently. When the system needs to deliver Asynchronous Procedure Calls (APCs) to a thread, it temporarily raises the IRQL to APC_LEVEL (IRQL 1), preventing further APCs from interrupting while current ones process. For scheduling operations and deferred processing, the system elevates to DISPATCH_LEVEL (IRQL 2), where the thread scheduler runs and Deferred Procedure Calls (DPCs) execute. At this level, thread context switches and page faults are prohibited, constraining what operations can safely occur.

Hardware interrupts occur at Device IRQL (DIRQL) levels above DISPATCH_LEVEL, with each device assigned a specific IRQL based on its priority. Network cards, disk controllers, and other peripherals trigger interrupts that preempt code running at lower IRQLs. At the highest level, HIGH_LEVEL (IRQL 31 on x86, 15 on x64), critical interrupts like Non-Maskable Interrupts (NMIs) and Performance Monitoring Counter overflows execute, preempting all other system activities.
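The masking rule behind IRQLs (an interrupt is serviced only if its level is above the processor's current IRQL, otherwise it stays pending) can be modelled with a toy processor. This is a sketch under simplifying assumptions: the `Cpu` class and its methods are hypothetical, there is a single processor, and handlers do no work beyond recording that they ran.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2


class Cpu:
    """Toy processor: an interrupt is serviced only above the current IRQL."""

    def __init__(self) -> None:
        self.irql = PASSIVE_LEVEL
        self.pending = []   # levels of interrupts that are currently masked
        self.serviced = []  # order in which interrupts actually ran

    def raise_irql(self, level: int) -> int:
        """Raise the IRQL and return the previous level so it can be restored."""
        previous, self.irql = self.irql, level
        return previous

    def interrupt(self, level: int) -> None:
        if level > self.irql:
            previous = self.raise_irql(level)  # handler runs at the interrupt's level
            self.serviced.append(level)
            self.lower(previous)               # handler done: restore and drain
        else:
            self.pending.append(level)         # masked until the IRQL drops

    def lower(self, new_irql: int) -> None:
        self.irql = new_irql
        # Deliver masked interrupts now above the current IRQL, highest first.
        for level in sorted(self.pending, reverse=True):
            if level > self.irql and level in self.pending:
                self.pending.remove(level)
                self.interrupt(level)
```

For example, while code holds the processor at DISPATCH_LEVEL, an APC_LEVEL request is queued but a device interrupt at level 5 runs immediately; the queued request is delivered only once the IRQL is lowered back to PASSIVE_LEVEL.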

Windows provides specialized synchronization primitives optimized for different scenarios across these IRQL levels. Spinlocks, acquired via functions like KeAcquireSpinLockAtDpcLevel, provide exclusive access protection in multiprocessor environments by continuously polling (spinning) until the lock becomes available. Unlike waiting synchronization objects, spinlocks don’t relinquish CPU time, making them suitable for brief, critical sections at DISPATCH_LEVEL or above. For more complex synchronization patterns, Executive Resources implement reader-writer semantics through ExAcquireResourceSharedLite and ExAcquireResourceExclusiveLite, allowing multiple concurrent readers but exclusive access for writers.

Windows Internals
https://xsec.fr/posts/evasion/windows-internals/
Author
Xsec
Published at
2025-04-11
License
CC BY-NC-SA 4.0