Use LatencyMon instead of DPC Lat Checker. LatencyMon tells you the exact ISR causing the long latency instead of making you guess. I'd bet money it's the Nvidia driver.
You should set up 2MB hugepages to reduce some pressure on the MMU.
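As a rough sketch of what that looks like in the domain XML (the page size matches the 2MB suggestion above; everything else about your domain is left out):

```xml
<!-- Sketch only: back guest RAM with 2 MiB hugepages.
     Goes directly inside the <domain> element. -->
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>
```

The host has to have enough 2 MiB pages reserved before the guest starts (for example through the vm.nr_hugepages sysctl). Assuming a 16 GiB guest, which is just a number I picked for illustration, that works out to 16384 MiB / 2 MiB = 8192 pages, plus a little headroom.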
I don't have an AMD system anymore (I used to run a 1950X, but now it's a 7920X), so I can only tell you what I have gleaned from research. The first thing is that you need to distro hop or upgrade ASAP; CentOS 7 is way too old for this platform. It only gets the bare minimum of backports from upstream to make Ryzen work at all, so you have none of the Ryzen-specific optimizations that have been made to KVM since then. I don't know the exact version of QEMU currently in the RH repo, but it's probably ancient: I saw a reference to QEMU 2.12 being used in CentOS this year, and the current version is 4.2.0. The same applies there, since QEMU has gotten a ton of updates specifically for Ryzen. Zen's SMT implementation didn't even work right until QEMU 3.0 was released.
For VFIO I recommend Arch or a derivative distro, though I know it can be scary to jump into a rolling release. Personally I've only had an update break my system once, and it was because I was running Antergos, which fucked up their localization settings when they pushed out their last dying-breath update. When I switched all my packages to raw Arch and fixed my language, everything was good again. Otherwise I've only broken my system through my own stupidity (owning and using an Nvidia GPU on Linux, expecting their driver to survive a kernel update).
You should enable AVIC when you upgrade your kernel to one that supports it. Then in Libvirt you should disable vAPIC. vAPIC is a paravirtualized device that performs similar functions to what AVIC does in hardware, and the two are incompatible, so turn off vAPIC to get the faster hardware interrupt controller support. Luckily for you, SynIC and therefore STimers can still work on AMD systems using AVIC, whereas on Intel systems using the comparable APICv tech, these enlightenments are incompatible. So you can enable those in your hyperv block too.
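A minimal sketch of the hyperv block described above, assuming you have no other enlightenments configured yet. I've added vpindex and the hypervclock timer because, as far as I know, QEMU wants those present before it will accept synic and stimer; treat that as my assumption and check it against your QEMU version.

```xml
<!-- Sketch only: merge into your existing <features> and <clock> elements. -->
<features>
  <hyperv>
    <!-- paravirtualized APIC off, so AVIC can do the work in hardware -->
    <vapic state='off'/>
    <!-- assumed prerequisite for synic/stimer -->
    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>
  </hyperv>
</features>
<clock offset='localtime'>
  <!-- assumed prerequisite for stimer -->
  <timer name='hypervclock' present='yes'/>
</clock>
```

AVIC itself is not switched on here; that happens on the host side through the kvm_amd module's avic parameter.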
The ioapic driver might not be available for changing in your old version of libvirt, but it basically has 2 states: split (which libvirt's XML spells driver='qemu') = APIC split between userspace and kernel space; kvm = all APIC done in kernel space. kvm can perform slightly better, but you can only use ioapic=kvm with AVIC and SynIC on very recent distributions; a fix that enabled this combination was put into the kernel just 2-3 months ago. Before that, AVIC + SynIC only worked properly in split ioapic mode. On top of all that, split ioapic mode has been known to cause issues with the Nvidia driver, so you should strive to use ioapic=kvm if possible.
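For reference, a sketch of how the ioapic setting looks in the domain XML, assuming your libvirt is new enough to know the element at all:

```xml
<!-- Sketch only: goes inside your existing <features> element. -->
<features>
  <!-- 'kvm' keeps the whole I/O APIC in the kernel;
       'qemu' is libvirt's name for the split mode -->
  <ioapic driver='kvm'/>
</features>
```

If the guest misbehaves with AVIC and SynIC on an older kernel, switching the driver value to 'qemu' is the fallback to try.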
You should dedicate specific physical cores to the vCPUs in order to prevent threadwalking (use vcpupin). You also need to set a couple of CPU features that tell Windows how to handle the cache architecture and SMT on Zen. I'm also passing through an invariant TSC for timekeeping here; you should do that as well if your system has a stable TSC:
<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' cores='5' threads='2'/>
  <cache mode='passthrough'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='require' name='topoext'/>
</cpu>
The topoext feature exposes the Zen SMT arch and CCX split to the guest.
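Here is a sketch of the vcpupin side for the 5-core, 2-thread topology above. The host CPU numbers are purely hypothetical: I'm assuming host threads 2-11 are free and that SMT siblings are numbered as adjacent pairs (2,3), (4,5) and so on. Check your real sibling layout with lscpu -e before copying any of this.

```xml
<!-- Sketch only: 10 vCPUs = 5 cores x 2 threads.
     Host cpuset values are hypothetical; guest vCPUs 0/1, 2/3, ... are
     SMT siblings, so each pair is pinned to one host sibling pair. -->
<vcpu placement='static'>10</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <vcpupin vcpu='2' cpuset='4'/>
  <vcpupin vcpu='3' cpuset='5'/>
  <vcpupin vcpu='4' cpuset='6'/>
  <vcpupin vcpu='5' cpuset='7'/>
  <vcpupin vcpu='6' cpuset='8'/>
  <vcpupin vcpu='7' cpuset='9'/>
  <vcpupin vcpu='8' cpuset='10'/>
  <vcpupin vcpu='9' cpuset='11'/>
</cputune>
```

On Zen it is also worth trying to keep all the pinned cores on as few CCXs as possible, since topoext is telling the guest about that split.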
Can you afford to dedicate 5 entire cores to the VM while it's running? Consider using a real-time scheduler (vcpusched with fifo, priority 1) and process isolation: cpuset shielding, disabling kernel scheduling timer ticks on the dedicated cores, preventing RCU callbacks from running on those cores, and having the RCU offload threads on the host cores poll for callbacks instead of being woken by the guest cores. Check the kernel options nohz_full=cset, rcu_nocbs=cset, and rcu_nocb_poll, where cset is the list of dedicated cores.
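The real-time scheduler part can be sketched like this, assuming the same 10 vCPUs (0-9) as the topology earlier in the post:

```xml
<!-- Sketch only: goes inside your existing <cputune> element. -->
<cputune>
  <!-- run all ten vCPU threads under SCHED_FIFO at priority 1 -->
  <vcpusched vcpus='0-9' scheduler='fifo' priority='1'/>
</cputune>
```

Be careful with this one: a FIFO thread does not yield to normal host tasks on the same core, so it really only makes sense once the cores are properly dedicated and isolated from the host.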
Can you afford to dedicate even more cores to your VM while it's running? Consider using emulatorpin to dedicate a thread for emulation processes. Also consider using iothread to dedicate a thread for your disk I/O.
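A sketch of both, with the caveat that every host CPU number and the disk path below are hypothetical placeholders I made up for illustration:

```xml
<!-- Sketch only: host cpusets and the disk source path are hypothetical. -->
<iothreads>1</iothreads>
<cputune>
  <!-- QEMU's emulator threads stay on host CPUs not used by the vCPUs -->
  <emulatorpin cpuset='0-1'/>
  <!-- the single I/O thread gets its own host CPU pair -->
  <iothreadpin iothread='1' cpuset='12-13'/>
</cputune>
<devices>
  <disk type='block' device='disk'>
    <!-- iothread='1' hands this virtio disk's I/O to the thread defined above -->
    <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
    <source dev='/dev/hypothetical-vg/windows'/>
    <target dev='vda' bus='virtio'/>
  </disk>
</devices>
```

The emulatorpin and iothreadpin lines belong in the same single cputune element as any vcpupin entries you already have; a domain only gets one.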
Unfortunately any advice I would have regarding the use of irqbalance is dependent on whether or not AVIC supports posted interrupts like Intel's APICv, but I can't find info on that. You could experiment with rebalancing kvm interrupts both exclusively onto and exclusively off of your vCPUs if nothing else works, but you have a lot to do before hitting this point.
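Worth noting: for any of the phrases I italicized in this post (hugepages, vAPIC, ioapic, vcpupin, vcpusched, emulatorpin, iothread), you should be able to find more info on how to use them on the Libvirt domain XML format page (libvirt.org/formatdomain.html).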
Also, go join and ask questions at the /r/vfio subreddit.