In this post we’ll examine the exploitability of CVE-2021-1648, a privilege escalation bug in splwow64. I actually started writing this post to organize my notes on the bug and subsystem, and was initially skeptical of its exploitability. I went back and forth on the notion, ultimately ditching the bug. Regardless, organizing notes and writing blogs can be a valuable exercise! The vector is useful, seems to have a lot of attack surface, and will likely crop up again unless Microsoft performs a serious exorcism on the entire spooler architecture.
This bug was first detailed by Google Project Zero (GP0) on December 23, 2020. While it’s unclear from the original GP0 description if the bug was discovered in the wild, k0shl later detailed that it was his bug reported to MSRC in July 2020 and only just patched in January of 2021. Seems, then, that it was a case of bug collision. The bug is a usermode crash in the splwow64 process, caused by a wild memcpy in one of the LPC endpoints. This could lead to a privilege escalation from a low IL to medium.
This particular vector has a sordid history that’s probably worth briefly detailing. In short, splwow64 is used to host 64-bit usermode printer drivers and implements an LPC endpoint, thus allowing 32-bit processes access to 64-bit printer drivers. This vector was popularized by Kaspersky in their great analysis of Operation Powerfall, an APT they detailed in August of 2020. As part of the chain they analyzed CVE-2020-0986, effectively the same bug as CVE-2021-1648, as noted by GP0. In turn, CVE-2020-0986 is essentially the same bug as another found in the wild, CVE-2019-0880. Each time Microsoft failed to adequately patch the bug, leading to a new variant: first there were no pointer checks, then it was guarded by driver cookies, then offsets. We’ll look later at how they finally chose to patch the bug (for now, at least).
I won’t regurgitate how the LPC interface works; for that, I recommend reading Kaspersky’s Operation Powerfall post as well as the blog by ByteRaptor. Both of these cover the architecture of the vector well enough to understand what’s happening. Instead, we’ll focus on what’s changed since CVE-2020-0986.
To catch you up very briefly, though:
splwow64 exposes an LPC endpoint that any process can connect to and send requests. These requests carry opcodes and input parameters to a variety of printer functions (OpenPrinter, ClosePrinter, etc.). These functions occasionally require pointers as input, and thus the input buffer needs to support those.
As alluded to, Microsoft chose to use offsets in the LPC request buffers instead of raw pointers. Since the input/output addresses are used in memcpy’s, they need to be translated back from offsets to absolute addresses. Functions such as UMPDOffsetFromPointer were added to accommodate this need. Here’s
So as per the GP0 post, the buffer addresses are indeed restricted to <=0x7fffffff. Implicit in this is also the fact that our offset is unsigned, meaning we can only work with positive numbers; therefore, if our target address is somewhere below our lpBufStart, we’re out of luck.
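The restriction can be modeled in a few lines of Python. This is only a sketch of the constraints as described in the prose, not the decompiled routine: the function names, the `MAX_ADDR` constant, and the exact form of the comparison are assumptions, and the real check in the binary may differ in detail.

```python
MAX_ADDR = 0x7FFFFFFF  # upper bound quoted in the post


def pointer_from_offset(buf_start: int, offset: int):
    """Translate an offset into an absolute address, or None if rejected.

    Hypothetical model: the offset is unsigned and the resulting address
    may not exceed MAX_ADDR.
    """
    if offset < 0 or offset > MAX_ADDR:
        return None
    addr = buf_start + offset
    return addr if addr <= MAX_ADDR else None


def offset_for_target(buf_start: int, target: int):
    """Return the offset that reaches `target`, or None if it is unreachable."""
    if target < buf_start:   # would require a negative offset
        return None
    if target > MAX_ADDR:    # beyond the permitted 32-bit range
        return None
    return target - buf_start


# A target above the buffer is reachable; one below it is not.
print(offset_for_target(0x1000, 0x5000))  # 16384
print(offset_for_target(0x5000, 0x1000))  # None
```

The takeaway is that the set of reachable addresses is a window that starts at the message buffer and ends at 0x7fffffff, so the position of our own buffer in memory determines what we can hit.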
This new offset strategy kills the previous techniques used to exploit this vulnerability. Under CVE-2020-0986, the attackers exploited the memcpy by targeting a global function pointer. When request 0x6A is called, a function (bLoadSpooler) is used to resolve a dozen or so winspool functions used for interfacing with printers:
These global variables are “protected” by
RtlEncodePointer, as detailed by
Kaspersky, but this is relatively trivial to break when executing locally.
Using the memcpy with arbitrary src/dst addresses, they were able to overwrite
the function pointers and replace one with a call to
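To illustrate why this kind of pointer "protection" is weak, here is a minimal Python model. It assumes the commonly documented scheme for 64-bit processes, in which the pointer is XORed with a 32-bit per-process cookie and rotated right by the cookie's low six bits. The helper names and sample values are hypothetical, and `recover_cookie_candidates` is just one way someone who knows both a raw and an encoded pointer might proceed; it is not taken from the Kaspersky write-up.

```python
MASK64 = (1 << 64) - 1


def ror64(value: int, count: int) -> int:
    count &= 63
    return ((value >> count) | (value << (64 - count))) & MASK64


def rol64(value: int, count: int) -> int:
    count &= 63
    return ((value << count) | (value >> (64 - count))) & MASK64


def encode_pointer(ptr: int, cookie: int) -> int:
    """Assumed scheme: XOR with the cookie, rotate right by its low 6 bits."""
    return ror64((ptr ^ cookie) & MASK64, cookie & 0x3F)


def decode_pointer(encoded: int, cookie: int) -> int:
    """Inverse of encode_pointer: rotate left, then XOR with the cookie."""
    return rol64(encoded, cookie & 0x3F) ^ cookie


def recover_cookie_candidates(raw_ptr: int, encoded: int) -> list:
    """Try all 64 rotations; keep cookies consistent with their own low bits."""
    candidates = []
    for rot in range(64):
        cand = rol64(encoded, rot) ^ raw_ptr
        if cand < (1 << 32) and (cand & 0x3F) == rot:
            candidates.append(cand)
    return candidates


cookie = 0x1234ABCD          # hypothetical process cookie
ptr = 0x00007FFB12345678     # hypothetical function address
enc = encode_pointer(ptr, cookie)
print(hex(decode_pointer(enc, cookie)))
print([hex(c) for c in recover_cookie_candidates(ptr, enc)])
```

With only 64 possible rotations, a single known raw/encoded pair narrows the cookie down almost immediately, after which any replacement pointer can be encoded correctly.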
Unfortunately, now that offsets are used, we can no longer target any arbitrary address. Not only are we restricted to 32-bit addresses, but we are also restricted to addresses >= the message buffer and <= 0x7fffffff.
I had a few thoughts/strategies here. My first attempt was to target UMPD cookies. This was part of a mitigation added after 0986, as again described by Kaspersky. Essentially, in order to invoke the other functions available to splwow64, we need to open a handle to a target printer. Doing this, GDI creates a cookie for us and stores it in an internal linked list. The cookie is created by LoadUserModePrinterDriverEx and is of type UMPD:
When a request for a printer action comes in, GDI will check if the request contains a valid printer handle and a cookie for it exists. Conveniently, there’s a function pointer table at the end of the UMPD structure called by a number of LPC functions. By using the pointer to the head of the cookie list, a global variable, we can inspect the list:
This is the first UMPD cookie entry, and we can see its function table contains 5 entries. Conveniently all of these heap addresses are 32-bit.
Unfortunately, none of these functions are called from splwow64 LPC. When processing the LPC requests, the following check is performed on the received buffer:
This effectively limits the functions we can call to 0x6a through 0x74, and the function tables are only referenced by message types below 0x6a.
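As a rough illustration of that gate (not the actual check from the binary), the following Python sketch treats the allowed range as inclusive on both ends. The constant and function names are hypothetical.

```python
ALLOWED_FIRST = 0x6A  # lowest message type the post says is reachable
ALLOWED_LAST = 0x74   # highest message type the post says is reachable


def is_message_type_allowed(msg_type: int) -> bool:
    """Model of the gate: only types in [0x6a, 0x74] are dispatched."""
    return ALLOWED_FIRST <= msg_type <= ALLOWED_LAST


for msg_type in (0x69, 0x6A, 0x6D, 0x74, 0x75):
    print(hex(msg_type), is_message_type_allowed(msg_type))
```

Anything that touches the UMPD function table lives below the allowed window, which is why corrupting that table does not help on its own.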
Another strategy I looked at was abusing the fact that request buffers are allocated from the same heap, and thus linear. Essentially, I wanted to see if I could TOCTTOU the buffer by overwriting the memcpy destination after it’s transformed from an offset to an address, but before it’s processed. Since the splwow64 process is disposable and we can crash it as often as we’d like without impacting system stability, it seems possible. After tinkering with heap allocations for a while, I discovered a helpful primitive.
When a request comes into the LPC server, splwow64 will first allocate a buffer and then copy the request into it:
Notice there are effectively no checks on the message size; this gives us the ability to allocate chunks of arbitrary size. What’s more, once the request has finished processing, the output is copied back to the memory view and the buffer is released. Since the Windows heap aggressively reuses free chunks for same-sized requests, we can obtain reliable read/write into another message buffer. Here’s the leaked heap address after several runs:
Since we can only write to addresses ahead of ours, we can use 0xdd9e90 to write into 0x2b43fe0 (offset of 0x1d6a150). Note that these allocations are coming out of the front-end allocator due to their size, but as previously mentioned, we’ve got a lot of control there.
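The offset quoted above is simply the distance between the two buffers, with the lower address acting as the writer. A small Python check of that arithmetic (the helper name is hypothetical):

```python
def pick_writer_and_victim(addr_a: int, addr_b: int):
    """Order two leaked buffer addresses so the writer sits below the victim."""
    writer, victim = sorted((addr_a, addr_b))
    return writer, victim, victim - writer


writer, victim, offset = pick_writer_and_victim(0x2B43FE0, 0xDD9E90)
print(hex(writer), hex(victim), hex(offset))  # 0xdd9e90 0x2b43fe0 0x1d6a150
```

Because offsets are unsigned, the roles cannot be swapped: the buffer at 0x2b43fe0 has no valid offset that reaches 0xdd9e90.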
After a few hours and a lot of threads, I abandoned this approach as I was unable to trigger an appropriately timed overwrite. I found a memory leak in the port connection code, but it’s tiny (0x18 bytes) and doesn’t improve the odds, no matter how much pressure I put on the heap. I next attempted to target the message type field; maybe the connection timing was easier to land. Recall that splwow64 restricts the message type we can request. This is because certain message types are considered “privileged”. How privileged, you ask? Well, let’s see what 0x76 does:
A fully controlled memcpy with zero checks on the values passed. If we could gain access to this, we could reuse the old techniques for exploiting this vulnerability.
After rigging up some threads to spray, I quickly identified a crash:
That’s the format of our spray, but you’ll notice it’s crashing during allocation. Basically, the message buffer chunk was freed and we’ve managed to overwrite the freelist chunk’s forward link prior to it being reused. Once our next request comes in, it attempts to allocate a chunk out of this sized bucket and crashes walking the list.
Notably, we can also corrupt a busy chunk’s header, leading to a crash during the free process:
This is an interesting primitive because it grants us full control over a heap chunk, both free and busy, but unlike the browser world, full of its class objects and vtables, our message buffer is flat, already assumed to be untrustworthy. This means we can’t just overwrite a function pointer or modify an object length. Furthermore, the lifespan of the object is quite short. Once the message has been processed and the response copied back to the shared memory region, the chunk is released.
I spent quite a bit of time digging into public work on NT/LF heap exploitation primitives in modern Windows 10, but came up empty. Most work these days focuses on browser heaps and, typically, abusing object fields to gain code execution or AAR/AAW. @scwuaptx has a great paper on modern heap internals/primitives and an example from a CTF in ‘19, but ends up using a FILE object to gain r/w which is unavailable here.
While I wasn’t able to take this to full code execution, I’m fairly confident this is doable provided the right heap primitive comes along. I was able to gain full control over a free and busy chunk with valid headers (leaking the heap encoding cookie), but Microsoft has killed all the public techniques, and I don’t have the motivation to find new ones (for now ;P).
The code, which is based on the public PoC, is available on GitHub. It uses my technique described above to leak the heap cookie and smash a free chunk’s flink.
Microsoft patched this in January, just a few weeks after Project Zero FD’d the bug. They added a variety of things to the function, but the crux of the patch now requires a buffer size which is then used as a bounds check before performing memcpy’s.
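The idea behind that bounds check can be sketched in Python. This is not the patched routine: `checked_copy` is a hypothetical helper that uses the length of a `bytearray` as a stand-in for the buffer size the caller now has to supply, and it simply refuses any copy whose source or destination range falls outside the message buffer.

```python
def checked_copy(buf: bytearray, src_off: int, dst_off: int, length: int) -> bool:
    """Copy `length` bytes within `buf` only if both ranges stay in bounds."""
    size = len(buf)  # stand-in for the caller-supplied buffer size
    if length < 0:
        return False
    for off in (src_off, dst_off):
        # Written as a subtraction so an oversized offset cannot wrap around.
        if off < 0 or off > size or length > size - off:
            return False
    # Snapshot the source first so overlapping ranges behave like memmove.
    buf[dst_off:dst_off + length] = bytes(buf[src_off:src_off + length])
    return True


msg = bytearray(b"AAAABBBBCCCCDDDD")
print(checked_copy(msg, 0, 8, 4), bytes(msg))   # in-bounds copy succeeds
print(checked_copy(msg, 0, 14, 4), bytes(msg))  # destination overruns: rejected
```

With a check of this shape, an offset like the 0x1d6a150 used earlier is rejected outright, since it points far beyond the end of the request buffer.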
GdiPrinterThunk now checks if DisableUmpdBufferSizeCheck is set in HKLM\Software\Microsoft\Windows NT\CurrentVersion\GRE_Initialize. If it’s not, GdiPrinterThunk_Unpatched is used, otherwise, GdiPrinterThunk_Patched. I can only surmise that they didn’t want to break compatibility with…something, and decided to implement a hack while they work on a more complete solution (AppContainer..?). The new
Along with the buf size they now also require the return buffer size and check to ensure it’s large enough to hold the output (this is supplied by the ProxyMsg in
And the specific patch for the 0x6d memcpy:
It’s a little funny at first and seems like an incomplete patch, but that’s because Microsoft has removed (or rather, inlined) all of the previous UMPDPointerFromOffset calls. The function still exists, but it’s only called from within UMPDStringPointerFromOffset_Patched and is now named UMPDPointerFromOffset_Patched. Here’s how they’ve replaced the source offset conversion/check:
It seems messier this way, but that’s probably just a compiler optimization. MCpySrc is the address of the source struct, which is:
Size is likely split out for additional functionality in other LPC functions, but I didn’t bother figuring out why. The destination offset/pointer is resolved in a similar fashion.
Funnily enough, GdiPrinterThunk_Unpatched really is unpatched; the vulnerable memcpy code lives on.