# GPU server project



## storm-chaser (Sep 11, 2020)

Hello fellow computer enthusiasts! 

The server at the heart of this little project is a Dell PowerEdge C4130, with internal provisions for up to four full-sized PCIe GPUs located just behind the front grill. Consequently, it has a very deep 1U form factor. I am eyeing the Nvidia Tesla K80 for the GPU choice because it's actually two GPUs housed on one board, each with a dedicated 12GB of GDDR5 memory. I have a source for brand-new-in-box (new old stock) K80s that run about $300 a pop, so that is likely the route I will be taking.

This project is done purely out of my passion for computers, and I'm taking this time to learn as much as possible about using GPUs for general computing. I intend to use this server for GPGPU-related tasks, but I have nothing definitive planned in terms of its actual use case. That's partly why I'm here: I'd like to know what you guys think I could do with something like this, given the hardware configurations I've put forward for both the Dell C4130 and the Dell PowerEdge R620 that I will be running alongside it as a cluster of sorts.
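To give a flavor of what "GPGPU tasks" look like in practice: the bread-and-butter pattern is data-parallel math like SAXPY (y = a*x + y), where every element is independent, so on a K80 each of its ~5,000 CUDA cores can chew on its own slice of the arrays. Here's a toy CPU reference version in plain Python, purely to illustrate the pattern (nothing K80-specific about it):

```python
# SAXPY (y = a*x + y): the classic data-parallel kernel. Every output element
# depends only on the matching input elements, which is exactly what maps
# well onto thousands of GPU cores. Shown here in plain Python for clarity.
def saxpy(a, x, y):
    """Return a*x + y element-wise (CPU reference implementation)."""
    return [a * xi + yi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(saxpy(2.0, x, y))  # [12.0, 24.0, 36.0]
```

On a GPU the loop body becomes the kernel and each element is handled by its own thread; libraries like CUDA's cuBLAS ship this exact routine.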

*K80 background (credits to TechPowerUp):*
The Tesla K80 was a professional graphics card by NVIDIA, launched in November 2014. Built on the 28 nm process and based on the GK210 graphics processor (in its GK210-885-A1 variant), the card supports DirectX 12. The GK210 is a large chip with a die area of 561 mm² and 7,100 million transistors. The Tesla K80 combines two graphics processors to increase performance. It features 2,496 shading units, 208 texture mapping units, and 48 ROPs per GPU. NVIDIA has paired 24 GB of GDDR5 memory with the Tesla K80, connected via a 384-bit memory interface per GPU (each GPU manages 12,288 MB). The GPU operates at a frequency of 562 MHz, which can be boosted up to 824 MHz; the memory runs at 1253 MHz.

Being a dual-slot card, the NVIDIA Tesla K80 draws power from 1x 8-pin power connector, with power draw rated at 300 W maximum. This device has no display connectivity, as it is not designed to have monitors connected to it. Tesla K80 is connected to the rest of the system using a PCI-Express 3.0 x16 interface. The card measures 267 mm in length, and features a dual-slot cooling solution. 

*Additional Specs on the Nvidia Tesla K80 GPU:*

4992 NVIDIA CUDA cores with a dual-GPU design (two GK210 chips on the same board)
Up to 2.91 teraflops double-precision performance with NVIDIA GPU Boost
Up to 8.73 teraflops single-precision performance with NVIDIA GPU Boost
24 GB of GDDR5 memory
480 GB/s aggregate memory bandwidth
ECC protection for increased reliability
Server-optimised to deliver the best throughput in the data center
Additional specs on the K80. Keep in mind most of these numbers reflect only a single GPU subsystem, so multiply by two to get real-world numbers, since this card has two GPUs on the same PCB.
*Memory Bus:* 384-bit
*Bandwidth:* 240.6 GB/s
*Bus Interface:* PCIe 3.0 x16

*Base Clock:* 562 MHz
*Boost Clock:* 824 MHz
*Memory Clock:* 1253 MHz (5 Gbps effective)

*Transistors:* 7,100 million
*Pixel Rate:* 42.85 GPixel/s
*Texture Rate:* 171.4 GTexel/s
*FP32 (float) performance:* 4.113 TFLOPS
*FP64 (double) performance:* 1,371 GFLOPS (1:3)

*Shading Units:* 2496
*TMUs:* 208
*ROPs:* 48
*TDP:* 300 W
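The per-GPU and whole-board numbers tie together with the standard formulas, using only the table values above (the table's 240.6 GB/s comes from the exact 1253 MHz × 4 = 5012 Mbps effective rate; the sketch below rounds that to 5 Gbps):

```python
# Sanity-checking the per-GPU spec numbers, then doubling for the dual-GPU
# board. Inputs are from the spec table above.
BUS_BITS = 384        # memory bus width per GPU
EFFECTIVE_GBPS = 5.0  # GDDR5 effective data rate (quad-pumped 1253 MHz ~ 5 Gbps)
SHADERS = 2496        # CUDA cores per GK210
BOOST_GHZ = 0.824     # boost clock in GHz

bw_per_gpu = BUS_BITS / 8 * EFFECTIVE_GBPS     # bytes/s per transfer line -> GB/s
fp32_per_gpu = SHADERS * 2 * BOOST_GHZ / 1000  # TFLOPS (2 ops/cycle via FMA)

print(f"per GPU: {bw_per_gpu:.1f} GB/s, {fp32_per_gpu:.3f} TFLOPS")
print(f"board:   {2 * bw_per_gpu:.1f} GB/s, {2 * fp32_per_gpu:.3f} TFLOPS")
```

That reproduces the ~240 GB/s / 4.113 TFLOPS per GPU and the 480 GB/s aggregate quoted above; doubling FP32 gives ~8.2 TFLOPS at the 824 MHz boost bin (NVIDIA's 8.73 figure assumes the top autoboost clock).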

So this server has provisions for up to two 1.8" uSATA (micro SATA) SSDs for your OS partition, etc. Thing is, the OEM Dell SSDs sometimes still go for upwards of $400, even in used condition. According to Dell documentation, these run on a uSATA interface and are described as 1.8" form factor SSDs. Try as I may, I am having a difficult time getting clarity on exactly what I can use as a substitute for the overpriced OEM SSDs, because I don't want to pay through the nose for OEM drives if I can help it. I can get an adapter to convert from the 1.8" interface to a standard 2.5" drive, but I'm not even sure that will work. There also seems to be some confusion about exactly what interface Dell is using here: some people tell me it's uSATA, others tell me it's mSATA. I have attached pictures of the actual 1.8" SSD drive bays (in the C4130) and of the specific SATA connector on the server board. Ideally, I want to convert from the 1.8" SSD interface to standard 2.5" SATA so I can run a standard SSD. This will obviously require an adapter plus an extension, since the bigger 2.5" drives will not fit in the 1.8" drive bays. It will be a little messy, but I can't think of any other way to do this. Any suggestions on how I can get around this little interface issue in the most painless way possible are appreciated.

As I said earlier, I will be using a Dell PowerEdge R620 that I have laying around here (as you can see in some of the pictures, both servers will need a good cleaning before going back into service; sorry they are pretty dusty right now). Some of the parts and upgrade plans are listed below.


----------



## storm-chaser (Sep 11, 2020)

A little background for you guys on the C4130 GPU server I will be using here:

*Dell PowerEdge C4130 Server - System Overview*

*Description*

The Dell PowerEdge C4130 server is part of Dell's C-series and delivers up to 7.2 teraflops in a single, ultra-dense 1U platform. A teraflop is one trillion floating-point operations per second, a unit of measure used in high-performance computing (and gaming consoles); this system reaches compute rates far beyond a typical server by adding GPU coprocessors. It supports dual Intel Xeon processors and up to four 300W PCIe accelerators, either Intel Xeon Phi coprocessors or Nvidia Tesla GPUs. Five unique configurations are available for this system to support workloads in science, research, finance, and medical applications.

*Performance*
This system can support one or two Intel Xeon E5-2600 v3 or E5-2600 v4 processors with up to 22 cores each. Additional compute power is provided by up to four Intel Xeon Phi accelerators or Nvidia Tesla GPUs such as the K40 or the dual-GPU K80.

*Memory*
Supported by Intel's C612 series chipset, the Dell C4130 is capable of supporting a maximum of 16 DDR4 memory modules in a two processor configuration. Each processor has four memory channels, and each channel supports two memory modules, either Registered (RDIMM) or Load Reduced (LRDIMM). With dual processors and 64GB memory modules installed in all 16 slots, this system will support up to 1TB of memory at speeds of up to 2400MT/s. Of course, memory speed is dependent on the memory module, memory configuration, and a compatible CPU that supports this transfer speed.

*Storage*
There are two 1.8-inch SATA SSD boot drives that are accessible from the back of the system. Four more 2.5-inch cabled SAS or SATA drives can be mounted in an optional HD cage that occupies the top power supply slot. However, adding the additional 2.5-inch drive cage will disable power redundancy. The system also features an optional Internal Dual SD Card Module (IDSDM) that can be configured to boot the system or for additional storage, and can be configured for redundancy or single card operation.

*CPU Choice:*

*Pulled the trigger on two Xeon E5-2678 v3 chips: 12 cores / 24 threads with Hyper-Threading.* This chip is nearly identical to the E5-2680 v3 in most respects.
*Turbo Data:*
Base 2.5 GHz, turbo bins 4/4/4/4/4/4/4/4/5/6/8/8 (in 100 MHz steps, listed from 12 active cores down to 1)
That means the single-core max turbo speed is *2.5 + 0.8 = 3.3 GHz*
and the all-core turbo speed is *2.5 + 0.4 = 2.9 GHz*
The E5-2678 v3 can also run DDR3 memory at up to 1866 MHz, whereas the E5-2680 v3 runs DDR4 only.
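The turbo-bin arithmetic above generalizes to any active-core count; a quick sketch of how to read the table:

```python
# Decoding the E5-2678 v3 turbo table: base clock is 2.5 GHz, and the bins
# (in 100 MHz steps) give the max turbo by active-core count, listed from
# 12 cores active (first entry) down to 1 core active (last entry).
BASE_GHZ = 2.5
BINS = [4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 8, 8]  # index 0 = 12 active cores

def turbo_ghz(active_cores):
    """Max turbo clock (GHz) with `active_cores` cores loaded."""
    return round(BASE_GHZ + BINS[len(BINS) - active_cores] * 0.1, 1)

print(turbo_ghz(1))   # single-core: 3.3
print(turbo_ghz(12))  # all-core: 2.9
```

So up to 2 cores can hit 3.3 GHz, 3 cores drop to 3.1 GHz, 4 cores to 3.0 GHz, and anything beyond that settles at 2.9 GHz (thermals permitting).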

This C4130 will be getting a dedicated Nvidia GeForce GT 710, which I just installed in the case after switching out the low-profile bracket. The R620 will get the same treatment with a very similar GT 710 that I will install later today.

*Memory Upgrade:*
The PowerEdge C4130 will be getting a 64GB ECC kit (8 × 8GB) running at 2133 MHz. With one module per channel across both CPUs, that populates all eight memory channels, so I can capitalize on memory bandwidth, though capacity is by no means maxed out. I could have gone bigger, but this DDR4 stuff is still pretty expensive, and I can always add more down the road if I feel so inclined. *I am confident memory performance will not be a problem.*



The PowerEdge R620 has a total of 24 memory slots accepting ECC DDR3 modules. *I will be populating all of them to fully maximize memory bandwidth.* This kit consists of twenty-four 4GB modules, giving me a total of 96GB of ECC memory running at 1600MHz. Doing the math: each E5-2600-series CPU has four memory channels with three slots per channel, so with both CPUs populated that works out to 8 channels of memory bandwidth (24 slots ÷ 3 slots per channel), not the 12 I first assumed.
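For what it's worth, here's the channel math on the R620 spelled out, using the standard E5-2600 memory layout (four DDR3 channels per CPU, three DIMM slots per channel):

```python
# R620 memory topology: each E5-2600-series CPU has 4 DDR3 channels, and the
# 24 slots work out to 3 DIMMs per channel, so the box runs 8 channels total
# (it's 3 slots per channel on this platform, not 2).
CPUS = 2
CHANNELS_PER_CPU = 4
SLOTS = 24

channels = CPUS * CHANNELS_PER_CPU   # 8
dimms_per_channel = SLOTS // channels  # 3

# Peak theoretical bandwidth at DDR3-1600: 1600 MT/s x 8 bytes per channel
peak_gbs = channels * 1600 * 8 / 1000
print(channels, dimms_per_channel, peak_gbs)  # 8 3 102.4
```

Note that on this platform, populating all three DIMMs per channel can drop the rated memory speed a notch, so the 102.4 GB/s figure is a best-case ceiling.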

New memory for the R620:
This is actually OEM HP server memory, but it should work fine in my PowerEdge R620...
In case there is a compatibility problem with the first kit, I also have other memory to try out: the sticks with the black heat spreaders on them.

New Memory kit for the C4130:


----------



## storm-chaser (Sep 11, 2020)

Hard drive upgrade: In addition to the SSD boot drive, the PowerEdge R620 will be getting two 300GB 15K SAS drives for a data partition. Since this equipment isn't mission-critical, I will be opting for RAID 0 to maximize disk I/O.
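A minimal sketch of what RAID 0 actually does with those two 15K SAS drives: consecutive chunks alternate round-robin between the disks, so sequential I/O hits both spindles at once, at the cost of zero redundancy. (The 4-byte chunk here is just for illustration; real stripe sizes are 64KB and up.)

```python
# RAID 0 striping: split data into fixed-size chunks and deal them out
# round-robin across the member disks. Losing any one disk loses everything.
def stripe(data, n_disks, chunk=4):
    """Distribute `data` across `n_disks` in `chunk`-byte stripes."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return disks

d0, d1 = stripe(b"AAAABBBBCCCCDDDD", 2)
print(d0, d1)  # bytearray(b'AAAACCCC') bytearray(b'BBBBDDDD')
```

Since both drives serve half of every large read or write, sequential throughput roughly doubles, which is the whole point of picking RAID 0 on non-critical data.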

This is the 1.8" SSD cage for the C4130; any suggestions on parts to convert it to 2.5" would be appreciated. I just want to get some feedback before I go out and buy the adapters I will need here...


----------



## storm-chaser (Sep 11, 2020)

Low Profile GPU installed:

Upgraded PSU to support four 300 watt GPUs, such as the Tesla K80


----------



## phill (Sep 17, 2020)

Look forward to seeing the rest of the build and setup!!   I have a R620 myself as well, upgraded it to 2 10C 20T CPUs with 64GB of RAM which I was happily surprised at being quite cheap!


----------



## storm-chaser (Sep 26, 2020)

phill said:


> Look forward to seeing the rest of the build and setup!!  I have a R620 myself as well, upgraded it to 2 10C 20T CPUs with 64GB of RAM which I was happily surprised at being quite cheap!


Yeap, the R620 is a decent workhorse, but in my case the CPUs that came with it were near the bottom of the list in terms of performance, and I definitely had to do something about that.

*So last week I pulled the trigger on two Xeon E5 2673 v2 CPUs. *These are 8 core 16 thread chips that have a base clock of 3.3GHz and a single core turbo of 4.0GHz. They have an all core turbo of 3.6GHz, and if cooling is good it will hold this no problem. 

The E5-2673 v2 is an OEM processor and very difficult to find on the US market; most often you have to go to the Chinese market to get them. The attributes that make it desirable here are its low TDP of 110 watts (vs. 130 watts for its retail equivalent) and its strong turbo characteristics. As a matter of fact, there are three 8C/16T Xeon chips in the E5-2600 v2 family that are very close in performance characteristics: the E5-2673 v2, the E5-2667 v2, and the E5-2687W v2:

In theory, all three of these CPUs have nearly identical clock and turbo specifications. They all have an all core turbo speed of 3.6GHz and a single core turbo speed of 4.0GHz. So performance characteristics between all three should be very similar. In my opinion the E5 2673 v2 is the best choice here, due to lower TDP and same clock speeds as the higher rated 130w and 150w processors. But I will be doing a detailed performance comparison between the E5 2667 v2 and the E5 2673 v2 to see exactly how they perform toe to toe.
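One rough way to frame that comparison ahead of the benchmarks, using the 3.6 GHz all-core turbo quoted above and each chip's rated TDP; core-GHz per watt is a crude metric, but it shows why the 2673 v2 looks attractive on paper:

```python
# Crude perf-per-watt comparison of the three 8C/16T Ivy Bridge-EP Xeons,
# assuming all three sustain the same 3.6 GHz all-core turbo (per the text).
chips = {
    "E5-2673 v2": 110,   # OEM part, watts TDP
    "E5-2667 v2": 130,
    "E5-2687W v2": 150,
}
CORES = 8
ALL_CORE_GHZ = 3.6

for name, tdp in chips.items():
    ghz_per_watt = CORES * ALL_CORE_GHZ / tdp
    print(f"{name}: {ghz_per_watt:.3f} core-GHz/W")
```

Same nominal throughput, but the 2673 v2 does it in a 27% smaller power envelope than the 2687W v2; real sustained clocks under load are what the head-to-head test will show.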


----------



## phill (Sep 26, 2020)

I've been trying to find a few V4 Xeons for my servers but they do seem to be rather hard to get hold of. I'm looking forward to the write-up....


----------



## Toothless (Sep 27, 2020)

I wanna see the F@H and WCG numbers.


----------

