This blog post discusses the technical details of WeSee. Our paper shows how to break the confidentiality and integrity of SEV-SNP VMs with #VC interrupts; here we walk through the implementation details of the exploit. This is an early revision and not fully finished yet.
#VC interrupts are used by SEV-ES and SEV-SNP guests. They are raised by hardware when a guest attempts to execute an instruction that cannot be executed directly inside the encrypted VM (e.g., CPUID or MSR accesses that require hypervisor involvement). Depending on the instruction, the guest either asks the hypervisor for assistance to emulate it or simply skips the instruction and does nothing (e.g., mwait). Communication with the hypervisor happens via a shared buffer page called the GHCB (Guest-Hypervisor Communication Block).
The code in arch/x86/entry/entry_64.S
is executed once a #VC is recognized by the CPU. The stack frame created by hardware when the interrupt arrives is shown in the following figure (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf, page 738).
The #VC IDT entry is marked as IST,
so it operates on a dedicated stack when the interrupt is raised (https://www.kernel.org/doc/Documentation/x86/kernel-stacks / https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf, page 740). The figure below shows how the hardware sets up the context in this case.
The IST stacks Linux uses for such interrupts are defined in arch/x86/include/asm/cpu_entry_area.h
.
After the hardware transfers control to the interrupt service routine, the kernel executes some default routines (swapgs
, pushing registers onto the stack so that it contains the interrupted context's register state in struct pt_regs
format, ...). Then the #VC routine tries to switch off the IST stack. Depending on whether the entry came from userspace or kernelspace, different functions are executed.
#ifdef CONFIG_AMD_MEM_ENCRYPT
/**
* idtentry_vc - Macro to generate entry stub for #VC
* @vector: Vector number
* @asmsym: ASM symbol for the entry point
* @cfunc: C function to be called
*
* The macro emits code to set up the kernel context for #VC. The #VC handler
* runs on an IST stack and needs to be able to cause nested #VC exceptions.
*
* To make this work the #VC entry code tries its best to pretend it doesn't use
* an IST stack by switching to the task stack if coming from user-space (which
* includes early SYSCALL entry path) or back to the stack in the IRET frame if
* entered from kernel-mode.
*
* If entered from kernel-mode the return stack is validated first, and if it is
* not safe to use (e.g. because it points to the entry stack) the #VC handler
* will switch to a fall-back stack (VC2) and call a special handler function.
*
* The macro is only used for one vector, but it is planned to be extended in
* the future for the #HV exception.
*/
.macro idtentry_vc vector asmsym cfunc
SYM_CODE_START(\asmsym)
	UNWIND_HINT_IRET_ENTRY
	ENDBR
	ASM_CLAC
	cld

	/*
	 * If the entry is from userspace, switch stacks and treat it as
	 * a normal entry.
	 */
	testb	$3, CS-ORIG_RAX(%rsp)
	jnz	.Lfrom_usermode_switch_stack_\@

	/*
	 * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
	 * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
	 */
	call	paranoid_entry

	UNWIND_HINT_REGS

	/*
	 * Switch off the IST stack to make it free for nested exceptions. The
	 * vc_switch_off_ist() function will switch back to the interrupted
	 * stack if it is safe to do so. If not it switches to the VC fall-back
	 * stack.
	 */
	movq	%rsp, %rdi		/* pt_regs pointer */
	call	vc_switch_off_ist
	movq	%rax, %rsp		/* Switch to new stack */

	ENCODE_FRAME_POINTER
	UNWIND_HINT_REGS

	/* Update pt_regs */
	movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */

	movq	%rsp, %rdi		/* pt_regs pointer */

	call	kernel_\cfunc

	/*
	 * No need to switch back to the IST stack. The current stack is either
	 * identical to the stack in the IRET frame or the VC fall-back stack,
	 * so it is definitely mapped even with PTI enabled.
	 */
	jmp	paranoid_exit

	/* Switch to the regular task stack */
.Lfrom_usermode_switch_stack_\@:
	idtentry_body user_\cfunc, has_error_code=1

_ASM_NOKPROBE(\asmsym)
SYM_CODE_END(\asmsym)
.endm
#endif
Once context switching and register preparation are done, we call the 'actual' interrupt handler DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
defined in arch/x86/kernel/sev.c
.
At the beginning we validate which stack we are operating on. If we run on the VC2
stack we panic (unless this is a nested #VC exception caused by an instruction already executing on the VC2
stack). vc_switch_off_ist
shows the conditions that can lead to a VC2
stack.
static __always_inline bool vc_from_invalid_context(struct pt_regs *regs)
{
	unsigned long sp, prev_sp;

	sp	= (unsigned long)regs;
	prev_sp	= regs->sp;

	/*
	 * If the code was already executing on the VC2 stack when the #VC
	 * happened, let it proceed to the normal handling routine. This way the
	 * code executing on the VC2 stack can cause #VC exceptions to get handled.
	 */
	return is_vc2_stack(sp) && !is_vc2_stack(prev_sp);
}
...
DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
{
	...
	if (unlikely(vc_from_invalid_context(regs))) {
		instrumentation_begin();
		panic("Can't handle #VC exception from unsupported context\n");
		instrumentation_end();
	}
	...
	if (!vc_raw_handle_exception(regs, error_code)) {
		/* Show some debug info */
		show_regs(regs);

		/* Ask hypervisor to sev_es_terminate */
		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);

		/* If that fails and we get here - just panic */
		panic("Returned from Terminate-Request to Hypervisor\n");
	}

	instrumentation_end();
	irqentry_nmi_exit(regs, irq_state);
}