The AMD Secure Processor is an on-chip unit that operates completely separately from the x86 CPU cores. It is built around a 32-bit ARM Cortex-A5 microcontroller, runs its own secure OS/kernel, and keeps its firmware and data in off-chip storage. Its purpose is to provide cryptographic functionality for the secure generation and management of keys for different applications. A hardware-validated boot option establishes a root of trust for the security of the entire platform.
On-die Secure Memory Encryption (SME) uses a single key to encrypt system memory, including that of virtual machines or containers, thus protecting against physical memory attacks. When your servers are located in a datacenter, you may not be able to stop somebody from physically tampering with a machine and hooking up equipment to read the contents of its memory modules. SME is fully transparent and requires no OS or driver support. It also lets hardware devices, including network and storage controllers and GPUs, access encrypted pages without issue via direct memory access (DMA). We asked AMD whether any built-in protection against DMA attacks exists and were told that nothing has been added on top; as such, DMA attacks remain possible and can circumvent SME.
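For readers who want to see what their hardware exposes: SME and SEV capability are advertised through CPUID leaf 0x8000001F, which AMD documents for this purpose. Below is a minimal sketch, assuming a GCC/Clang toolchain on an x86-64 system; keep in mind that the CPU advertising the feature and the OS actually enabling it (e.g. via the Linux mem_encrypt=on kernel parameter) are two different things.

```c
/* Minimal sketch: query CPUID leaf 0x8000001F to see whether the processor
 * advertises SME/SEV and where the encryption "C-bit" sits in page table
 * entries. This only reports capability, not whether encryption is active. */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(0x8000001F, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 0x8000001F not available (no SME/SEV support).");
        return 1;
    }

    printf("SME supported:  %s\n", (eax & (1u << 0)) ? "yes" : "no");
    printf("SEV supported:  %s\n", (eax & (1u << 1)) ? "yes" : "no");
    printf("C-bit position: %u\n", ebx & 0x3F);                 /* EBX bits 5:0  */
    printf("Physical address bits reduced: %u\n", (ebx >> 6) & 0x3F); /* EBX bits 11:6 */
    return 0;
}
```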
AMD's second memory encryption technology is called Secure Encrypted Virtualization (SEV), which protects virtual machines (VMs) and containers from each other as well as against tampering in general. It generates one key per hypervisor, per VM, per group of VMs, or per VM with multiple containers, which isolates guests from the hypervisor and from one another. Of course, you can always run unencrypted VMs, but given the option, it is best to encrypt and use SEV. This capability comes in handy when you rent virtual machines (e.g. on Amazon AWS) without any control over the machine or hypervisor, which could let third parties access your virtual machine's data without you ever knowing about it. With SEV, this is prevented, since neither the machine's provider nor your competitors (running VMs on the same host) can access your unencrypted data.
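On the host side, a quick way to see whether SEV is actually usable under KVM is to look at the module parameter the kvm_amd driver exposes. The sysfs path below is an assumption that holds on recent Linux kernels with the kvm_amd module loaded; it is not something AMD specifies, and the exact value format can vary by kernel version.

```c
/* Minimal sketch: on a Linux KVM host, check whether the kvm_amd module
 * reports SEV as enabled. The path is an assumption for recent kernels;
 * guests would instead check dmesg or CPUID to confirm encryption. */
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/module/kvm_amd/parameters/sev";
    char value[8] = {0};
    FILE *f = fopen(path, "r");

    if (!f) {
        printf("%s not found; kvm_amd not loaded or SEV not exposed.\n", path);
        return 1;
    }
    if (fgets(value, sizeof(value), f))
        /* Typical values are "1"/"Y" for enabled, "0"/"N" for disabled. */
        printf("kvm_amd SEV parameter: %s", value);
    fclose(f);
    return 0;
}
```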
Note also that AMD quotes a latency increase of 7-8 ns when memory encryption is enabled, which results in a roughly 1.5% performance hit in SPECint. In my opinion, this is very reasonable and well worth the benefits of encryption.
Intel's competing SGX memory protection requires that applications be recompiled to make use of memory encryption, whereas AMD's technology works transparently. That makes it usable for legacy applications, which to this day often run mission-critical systems whose source code is either unavailable or can never be modified because of the high validation cost and complexity of code changes.
Infinity Fabric
AMD has engineered their EPYC processors to use four individual silicon dies in each CPU. This move makes sense because it vastly improves production yields: the bigger a processor die, the higher the chance of a defect occurring within its surface area and rendering the whole die unusable. When the design is broken into smaller pieces, the ratio of good dies to failed dies improves. The only challenge is that you have to connect the pieces of the puzzle somehow, for which AMD has invented Infinity Fabric - their take on a cache-coherent, energy-efficient, high-speed, low-latency interconnect.
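To make the yield argument concrete, here is a toy calculation using the classic Poisson yield model (yield = e^(-defect density x die area)). The defect density and die area below are invented numbers purely for illustration, not AMD data.

```c
/* Illustrative only: a simple Poisson yield model comparing one large die
 * against quarter-size dies. All numbers are made up for the example.
 * Compile with: cc yield.c -lm */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double defect_density = 0.2;   /* defects per cm^2 (assumed)            */
    double big_die_area   = 8.0;   /* cm^2, hypothetical monolithic die     */
    double small_die_area = big_die_area / 4.0;

    double yield_big   = exp(-defect_density * big_die_area);
    double yield_small = exp(-defect_density * small_die_area);

    printf("Monolithic die yield:   %.1f%%\n", 100.0 * yield_big);   /* ~20% */
    printf("Quarter-size die yield: %.1f%%\n", 100.0 * yield_small); /* ~67% */
    /* Four good small dies are still needed per package, but a defective
     * small die wastes only a quarter of the silicon a defective big die would. */
    return 0;
}
```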
Each of the four dies has three Infinity Fabric links, one to each of the other dies, which keeps latency low because no intermediate hops are needed. The bi-directional bandwidth provided by each link, clocked at 10.6 Gbps, is 42 GB/s. AMD was also careful to minimize energy costs, which would otherwise have taken away power budget from the number-crunching circuitry in the processor. Transmitting a single bit takes only 2 picojoules, which translates into roughly 0.7 W for a single link fully active in both directions.
In a multiprocessor system, the processors are connected by several Infinity Fabric links: each die has its own link to the corresponding die on the other socket. Combined with the die-to-die interconnects inside each processor, this ensures that data has to travel at most two hops to get from any die to any other die in a 2-socket system. Each socket-to-socket link runs at 38 GB/s, for a total bandwidth of 152 GB/s across the four links. Energy usage is higher than for the on-package links, but still low at 9 picojoules per bit (roughly 11 W maximum for all four links in the worst case).
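These power figures follow directly from the quoted bandwidth and energy-per-bit numbers. Here is a back-of-the-envelope check, treating the GB/s figures as the total bits moved per second over a fully loaded link, which is my interpretation rather than an AMD statement.

```c
/* Back-of-the-envelope check of the Infinity Fabric power figures quoted
 * above: watts = (GB/s * 8 bits/byte * 1e9) * (picojoules/bit * 1e-12). */
#include <stdio.h>

int main(void)
{
    double on_pkg_gbytes = 42.0;   /* GB/s per die-to-die link        */
    double on_pkg_pj     = 2.0;    /* picojoules per transmitted bit  */
    double socket_gbytes = 38.0;   /* GB/s per socket-to-socket link  */
    double socket_pj     = 9.0;    /* picojoules per transmitted bit  */

    double on_pkg_watts = on_pkg_gbytes * 8e9 * on_pkg_pj * 1e-12;
    double socket_watts = socket_gbytes * 8e9 * socket_pj * 1e-12;

    printf("Die-to-die link:       %.2f W\n", on_pkg_watts);       /* ~0.67 W */
    printf("Socket-to-socket x 4:  %.2f W\n", 4.0 * socket_watts); /* ~10.9 W */
    return 0;
}
```

Both results line up with AMD's rounded figures of 0.7 W per die-to-die link and about 11 W for all four socket-to-socket links.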