I am trying to write to Extended Control Register 0 (xcr0) on an x86_64 Debian v7 virtual machine. My approach is a kernel module (so CPL=0) with some inline assembly. However, I keep getting a general protection fault (#GP) when I try to execute the xsetbv instruction.

The init function of my module first checks that the osxsave bit is set in control register 4 (cr4). If it isn't, it sets it. Then I read the xcr0 register using xgetbv. This works fine and (in the limited testing I have done) returns the value 0b111. I would like to set the bndreg and bndcsr bits, which are the 3rd and 4th bits (0-indexed), so I do some ORing and write 0b11111 back to xcr0 using xsetbv. The code for this last part is as follows.
    unsigned long xcr0;           /* extended control register 0 */
    unsigned long bndreg = 0x8;   /* bit 3 of xcr0 */
    unsigned long bndcsr = 0x10;  /* bit 4 of xcr0 */

    /* ... checking cr4 for osxsave and reading xcr0 ... */

    if (!(xcr0 & bndreg))
        xcr0 |= bndreg;
    if (!(xcr0 & bndcsr))
        xcr0 |= bndcsr;

    /* ... xcr0 is now 0b11111 ... */

    /*
     * write changes to xcr0; ignore high bits (set them =0) b/c they are reserved
     */
    unsigned long new_xcr0 = ((xcr0) & 0xffffffff);

    __asm__ volatile (
        "mov $0, %%ecx\n\t"       // %ecx selects the xcr to write
        "xor %%rdx, %%rdx\n\t"    // set %rdx to zero
        "xsetbv\n\t"              // write from edx:eax into xcr0
        :
        : "a" (new_xcr0)          /* input */
        : "ecx", "rdx"            /* clobbered */
    );

By looking at the trace from the general protection fault, I determined that the xsetbv instruction is the problem. However, if I don't manipulate xcr0 and just read its value and write it back, things seem to work fine. Looking at the Intel manual and this site, I found various reasons for a #GP, but none of them seem to match my situation. The reasons are listed below, each followed by my explanation of why it most likely doesn't apply.

- If the current privilege level is not 0 -> I use a kernel module to achieve CPL=0
- If an invalid xcr is specified in %ecx -> 0 is in %ecx, which is valid and worked for xgetbv
- If the value in edx:eax sets bits that are reserved in the xcr specified by ecx -> according to the Intel manual and Wikipedia, the bits I am setting are not reserved
- If an attempt is made to clear bit 0 of xcr0 -> I printed out xcr0 before setting it, and it was 0b11111
- If an attempt is made to set xcr0[2:1] to 0b10 -> I printed out xcr0 before setting it, and it was 0b11111
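
For reference, the conditions listed above can be written as a small predicate. This is only a sketch under stated assumptions: xsetbv_would_fault is a hypothetical helper (not a kernel API), it models xcr0 only, and it treats "reserved" as any bit outside a caller-supplied allowed_bits mask. Which bits a given (virtual) CPU actually allows is exactly the open question, so the mask is an input rather than something the sketch determines.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if one of the #GP conditions listed
 * above applies to writing `value` into the XCR selected by `xcr_index`.
 * `allowed_bits` is an assumption supplied by the caller: the set of xcr0
 * bits that are NOT reserved on the CPU in question.
 */
static bool xsetbv_would_fault(unsigned cpl, uint32_t xcr_index,
                               uint64_t value, uint64_t allowed_bits)
{
    if (cpl != 0)
        return true;                    /* privilege level is not 0 */
    if (xcr_index != 0)
        return true;                    /* this sketch only models xcr0 */
    if (value & ~allowed_bits)
        return true;                    /* sets a reserved bit */
    if (!(value & 0x1))
        return true;                    /* clears bit 0 */
    if (((value >> 1) & 0x3) == 0x2)
        return true;                    /* xcr0[2:1] == 0b10 */
    return false;
}
```

If allowed_bits were only 0b111, the predicate would flag a write of 0b11111 but not a write-back of 0b111, which would match the behavior I observe. The checks above do not establish whether that is the case here.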

Thank you in advance for any help discovering why this #GP is happening.

2 Answers

Peter Cordes was right: it was a problem with my hypervisor. I am using VMware Fusion for virtualization, and after a lot of digging on the internet I found the following quote from VMware:

The solution VMware proposed was to edit the virtual machine's .vmx file with the following directive.

After I did this, things worked and I was able to use xsetbv to enable the bndreg and bndcsr bits of xcr0.

When using VMware to expose CPU features from the host to the guest under more normal conditions (i.e. the feature isn't plagued with deprecation), you can mask the bits of cpuid leaves by adding the following to the VM's .vmx file.

So, for example, if we assume that SMAP can be exposed this way, we would want to set bit 20 of cpuid leaf 7.

Colons are optional and only make the string easier to read, ones and zeros override any default settings, and dashes leave the default setting alone.
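
To make the mask-string semantics concrete, here is a minimal sketch in C of how such a string could be applied to a register's default value. It is an illustration only, not VMware's implementation: apply_cpuid_mask is a hypothetical name, the sketch assumes the string covers one 32-bit register with the leftmost character standing for bit 31, and it handles only the three characters described above ('1', '0' and '-') plus the optional colons.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply a 32-position mask string to `defaults`.
 * Assumption: leftmost position is bit 31, rightmost is bit 0.
 *   '1' forces the bit on, '0' forces it off, '-' keeps the default,
 *   ':' is a separator and does not consume a bit position.
 * Returns false if the string is malformed (wrong length or character).
 */
static bool apply_cpuid_mask(const char *mask, uint32_t defaults,
                             uint32_t *out)
{
    uint32_t value = defaults;
    int bit = 31;

    for (const char *p = mask; *p != '\0'; ++p) {
        if (*p == ':')
            continue;                       /* readability separator */
        if (bit < 0)
            return false;                   /* more than 32 positions */
        if (*p == '1')
            value |= (UINT32_C(1) << bit);  /* override: set */
        else if (*p == '0')
            value &= ~(UINT32_C(1) << bit); /* override: clear */
        else if (*p != '-')
            return false;                   /* unknown character */
        --bit;
    }
    if (bit != -1)
        return false;                       /* fewer than 32 positions */

    *out = value;
    return true;
}
```

Under these assumptions, setting bit 20 while leaving everything else at its default corresponds to the string "----:----:---1:----:----:----:----:----", since the third group from the left covers bits 23 down to 20.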

It would enable MPX if you ran it on a machine that supported MPX. (Assuming your code is correct.)

The virtual x86 CPU your VM is running on does not, according to its own virtualized CPUID, so it's not surprising at all that this faults. The hypervisor might be doing this manually in a VMEXIT, emulating xsetbv and checking the changes to the virtualized xcr0.

If you want to use features your HW has but your VM doesn't support, in general you have to run on bare metal instead. Or find a different VM that does expose the feature to the guest.
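
One way to see what the (virtual) CPU advertises is to query CPUID from user space inside the guest. The sketch below assumes GCC or Clang on x86 (it uses `__get_cpuid_count` from `<cpuid.h>`); the helper names are made up for illustration. It checks the MPX feature flag in CPUID leaf 7 (sub-leaf 0, EBX bit 14) and whether CPUID leaf 0DH (sub-leaf 0, EDX:EAX) lists xcr0 bits 3 and 4 as supported.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#endif

#define XCR0_BNDREG     (UINT64_C(1) << 3)
#define XCR0_BNDCSR     (UINT64_C(1) << 4)
#define CPUID7_EBX_MPX  (UINT32_C(1) << 14)

/* Combine CPUID.(EAX=0DH,ECX=0) EDX:EAX into the supported-xcr0 mask. */
static uint64_t xcr0_mask_from_regs(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}

/* True if both the feature flag and the two xcr0 bits are enumerated. */
static bool mpx_enumerated(uint32_t leaf7_ebx, uint64_t xcr0_supported)
{
    const uint64_t need = XCR0_BNDREG | XCR0_BNDCSR;
    return (leaf7_ebx & CPUID7_EBX_MPX) != 0 &&
           (xcr0_supported & need) == need;
}

/*
 * Query the running CPU. Returns false if CPUID cannot be queried
 * (non-x86 build, or the leaves are not available); otherwise stores
 * the answer in *result and returns true.
 */
static bool query_mpx_enumerated(bool *result)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t leaf7_ebx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    leaf7_ebx = ebx;

    if (!__get_cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx))
        return false;

    *result = mpx_enumerated(leaf7_ebx, xcr0_mask_from_regs(eax, edx));
    return true;
#else
    (void)result;
    return false;
#endif
}
```

If query_mpx_enumerated reports false inside the guest but true on the host, the hypervisor is hiding the feature, and a write to xcr0 that sets those bits can be expected to fault in the guest.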

Note that MPX introduces new architectural state (the bnd registers) that has to be saved and restored on context switches. If your hypervisor doesn't want to do that, that would be one reason to disable MPX. (I think it can get saved/restored as part of xsave, but it does make the save area slightly larger.) I haven't looked at MPX much; it might be something the hypervisor would have to deal with in vmexits so that bounds checking doesn't apply to the hypervisor itself. If so, that would be a major inconvenience.