
Dell T7810 + Tesla K80 graphic card with 1300 W PSU

Tryphon

New Member
Joined
Jul 31, 2021
Messages
7 (0.01/day)
Hello,

I am trying the combination listed in the title. I really need your help, at least to understand what is wrong with the setup described below. I saw in this forum that people have the knowledge to give valuable help (https://www.techpowerup.com/forums/threads/dell-workstation-owners-club.243124/page-50). I give as much detail as possible on what I did.

Nominal configuration, which works with both of the following PSUs:
Dell Precision T7810
Dell PSU 685W or 1300W
2 x Xeon 2690 v4
RAM 256 GB
nVidia Quadro K4000

I made some tests to understand the rail configuration on my PSUs : Dell 685 and 1300 W. The sticker on the PSU top lists 5 and 10 rails respectively.

Both PSUs work like a charm in my T7810 workstation.

The distribution board I have to power my T7810 cannot use all the rails provided by the 1300 W PSU.

1627754416737.png

For that reason, the 685 W PSU is enough for a motherboard with standard equipment in this workstation.
Note that the T7600 workstation can make full use of the 1300 W PSU. I also have that distribution board.
1627754551878.png


In both distribution boards, the rails are connected to the pins labeled A1 to A32 and to some of the pins between B1 and B32. At least 18 of the B lines are connected to ground. Power is mainly provided by the A lines, which are grouped into sets that power the connectors on the distribution board.

In the T7810 distribution board, for example, the P1 connector uses 2 sets of lines: 5 pins for ground and 3 pins connected to lines A9-11. Lines A9-11 seem to form an independent power source dedicated to feeding a video card. This distribution board does not use the A12-14 and A29-32 power sets. However, they are used on the T7600 distribution board to feed a video card and one of the CPUs. Hence I know where to find additional power from my 1300 W PSU.

From the T7810 distribution board, I count 7 power sets, but the 685 W PSU has only 5 rails. From the T7600 distribution board, I count 10 power sets, and the 1300 W PSU has 10 rails. I do not know the exact relation between a rail and a power set.

If the A12-14 and A29-32 power sets correspond to 2 rails, I would expect to get 2 x 12 V x 18 A = 432 W.

I connected a Tesla K80 card using extra wires that I soldered onto A12-14 and A29-32 and fitted into an 8-pin EPS plug, which fits the K80. Precisely, pins A12-14 connect to pins 7 and 8, and pins A25-28 connect to pins 5 and 6 on the Tesla board (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/tesla-product-literature/Tesla-K80-Bo...). On the Tesla side, the EPS connector requires 225 W. I hypothesized that A12-14 plus A25-28 could deliver 432 W.
Since my nominal configuration works with less than 685 W, adding the K80 board cannot bring the total draw above 685 + 300 = 985 W. I should be okay with the 1300 W PSU.
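
For anyone who wants to redo this arithmetic with their own numbers, here is a minimal shell sketch. It only restates the figures quoted above: the 18 A per rail is my assumption about the rails, not a measured value, and the function names rail_power_w and headroom_w are made up for illustration.

```shell
#!/bin/bash
# Back-of-the-envelope power budget using the figures quoted in the post.
# ASSUMPTION: each 12 V rail can deliver 18 A (not measured).

# rail_power_w VOLTS AMPS RAILS -> watts available from RAILS identical rails
rail_power_w() {
    echo $(( $1 * $2 * $3 ))
}

# headroom_w PSU_W BASE_W CARD_W -> watts left once the base system and the card are fed
headroom_w() {
    echo $(( $1 - $2 - $3 ))
}

echo "Two 12 V / 18 A rails: $(rail_power_w 12 18 2) W"
echo "Worst-case draw: $(( 685 + 300 )) W"
echo "Headroom on the 1300 W PSU: $(headroom_w 1300 685 300) W"
```

With these inputs the two spare power sets would cover the 225 W asked for on the EPS connector, provided each set really behaves like an independent 18 A rail.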

Note that I printed a plastic guide to funnel air from the T7810 front fans through the K80's passive heatsink: the air has no choice but to blow through the K80 shroud.

Result with the 1300 W PSU: when the Tesla card is seated in an x16 PCIe 3.0 slot without my extra EPS plug connected, the T7810 starts normally (of course, I cannot use the card that way). When I connect the extra cable, the PSU seems to restart continuously (nothing appears on the screen, I cannot reach the BIOS, and no beep is emitted). Note: "Memory Map IO above 4GB" is enabled in the BIOS.

Although this result is frustrating after so much effort, I wonder if someone can provide any information or point out any mistake that would help me get the Tesla K80 card powered correctly.

Thank you for your help.
 

Jawesker

New Member
Joined
Aug 5, 2021
Messages
5 (0.00/day)
Hi, maybe the problem is not the PSU. I own a T7820 and a K80, and I could never make it work even though I tried everything. The computer never gets to the BIOS; it just stays idle and never sends anything to the monitor. My PSU is 950 W. I even tried an external PSU, with the same result. I believe that Dell doesn't support these Tesla cards.
 

Tryphon

New Member
Joined
Jul 31, 2021
Messages
7 (0.01/day)
Thank you for sharing your experience. In fact, you gave me the intuition that my PSU is not involved in the issue.

I just tried this on my T7810 machine:

I upgraded the BIOS to version A26
I disabled the PCI slot in the settings
I forced PCIe to Gen3 mode (not Auto)

The workstation rebooted twice and ... IT WORKS!

The PSU configuration described before can feed my Tesla K80.
With the fans forced to 50% from the BIOS, I can keep the GPUs in idle mode at 56°C and 39°C.

Now I can continue to investigate with a high load on the GPUs.

Thank you.
 

Jawesker

New Member
Joined
Aug 5, 2021
Messages
5 (0.00/day)
Hi, amazing, congrats! When you say you disabled the PCI slot, what do you mean? Could you please send me some photos of the BIOS configuration?
Kind regards
 

Tryphon

New Member
Joined
Jul 31, 2021
Messages
7 (0.01/day)
Hi,

To disable the PCI slot: System configuration/Miscellaneous devices (not sure this is really required)
To extend the BAR: System configuration/Memory Map IO above 4GB
To manage the PCIe link speed: Advanced configurations
To force the fan speed: Power management/Fan Speed Control

Remember that you need an air guide to channel the air flow from the front fan to the K80 heatsink. I designed one and printed it with a 3D printer. Normally, the K80 should send a signal to the motherboard to ask for more air flow. It does not appear to work in my configuration.
Tell me if you need more settings than that, but I do not see anything else.
 

Jawesker

New Member
Joined
Aug 5, 2021
Messages
5 (0.00/day)
Hi, I just tried, but I got the same result: the computer turns on but never reaches the BIOS. I just realized that I never told you about my configuration:
Dell T7820
Dual Xeon Silver 4114
128 GB memory
PCIe NVMe 1 TB
Quadro P2000
Tesla K80, properly cooled

As I told you, I tried everything:
1. Tested the card in all PCI slots
2. Set Memory Map IO above 4GB
3. Disabled SERR messages
4. Tested the Tesla in another computer (it works fine)
5. Applied the configuration you described (everything except Gen3: I only have the options Auto, Gen1 and Gen2, and I tried all of them)
6. Tried an external PSU for the GPU
7. Removed one of the processors
8. Updated my BIOS firmware

Something really curious that I tried, and that partially worked, was to remove the P2000 and boot the workstation with only the Tesla. It actually worked: the computer finally started Windows. I know that because I connected to it through Remote Desktop, but the Tesla device was not working properly and I got the message "This device cannot find enough free resources that it can use. (Code 12)".
I thought that maybe the motherboard could not handle a heavy-duty GPU, so I tried a Quadro K6000, but it worked really well, without a single issue. The problem is with the Tesla series. This week I will try a Tesla M40, just to check whether the Tesla series is incompatible with the T7820.
I have attached some pictures: the BIOS configuration, actual photos of the setup, and the error message in Windows when I connect through RDP.
Thanks again for all your help


Kind regards
 

Attachments

  • IMG_1057.jpg
  • IMG_4932.jpg
  • IMG_5127.jpg
  • IMG_7173.jpg
  • IMG_8373.jpg
  • IMG_0177.jpg
  • IMG_3526.jpg
  • IMG_5310.jpg
  • IMG_7346.jpg
  • IMG_8258.jpg

Tryphon

New Member
Joined
Jul 31, 2021
Messages
7 (0.01/day)
Hi,

You are probably not far from success. I read somewhere from nVidia that you cannot use 2 different nVidia drivers for the K80 and another nVidia card. This configuration can work only if both cards use the same driver. Perhaps this explains why you saw the K80 partially working without the P2000.

Make sure you install the right driver for the K80. Could that be the cause of error code 12? Maybe, if 2 drivers try to access the same I/O port.

I set Gen2 in the BIOS and the K80 still seems to work, with less bandwidth and slightly less heat.

I chose legacy boot and did not try UEFI. I don't know the impact.

I have not used Windows at home for more than 10 years. My setup runs Ubuntu 20.04. The Tesla K80 comes from server configurations, which mainly run Linux, and was optimized for this OS.

Cooling the K80 the way you do is probably not adequate. Air needs to be forced through the K80. You can find various solutions on the net.

Hope this helps.
 

Jawesker

New Member
Joined
Aug 5, 2021
Messages
5 (0.00/day)
Hi, sorry for the late reply. No success so far :(. The next thing I will do is install Ubuntu to see whether this is a problem with the OS, the BIOS or the motherboard. I tried a Tesla M40 without success; well, at least I finally managed to get video output from the P2000, but error 12 is still there. Something weird is that the workstation always tries to set the Teslas as the primary video card, even though I configured another PCI slot with the P2000 as primary. And yes, you are right, the cooling of the Tesla in the photos is poor; I usually use a 3D-printed duct with a fan, but I didn't fit it for the photo. I am this close to quitting and returning the Dell workstation. When I bought the Tesla I thought it would be a matter of cooling it, then plug and play, but...
 

Tryphon

New Member
Joined
Jul 31, 2021
Messages
7 (0.01/day)
Hi,

Sad to hear it is not working for you. Like you, I took a chance buying a T7810 with plenty of used hardware. It was quite easy to assemble ... except for the K80, which drove me crazy. It was important for me to use a Dell workstation, which works great under Linux. Many computing programs are up to 30% faster under Linux than under Windows. And you don't need an antivirus, which always drains significant resources.

Today it is 30°C here, a good occasion to test my new 3D-printed air duct. It collects the air from one of the front fans and sends it directly into the K80. GPU1 reached 48°C and GPU2 35°C in idle mode. I loaded the GPUs to around 85 W consumption and they reached 75°C (the limit is 88°C). It should be lower in normal conditions (20°C). The front fan is set to 50%. Under very heavy load, I should be able to cool the card. You could have the same arrangement without the complication of adding an internal fan, which could only blow hot air from inside the chassis.
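
If you want to watch the same readings on your side, here is a minimal shell sketch that picks the hottest GPU out of nvidia-smi's CSV output and compares it with the 88°C limit mentioned above. The function names hottest_gpu and over_limit are made up, and the sample lines fed to it below are illustrative input only, not a log from my machine.

```shell
#!/bin/bash
# Expects lines of "index, temperature, power", the layout printed by
#   nvidia-smi --query-gpu=index,temperature.gpu,power.draw --format=csv,noheader,nounits
# and prints "index temperature" for the hottest GPU.
hottest_gpu() {
    awk -F', *' '
        # keep the row with the highest numeric temperature seen so far
        $2 ~ /^[0-9]+$/ && (max == "" || $2 + 0 > max) { max = $2 + 0; idx = $1 }
        END { if (max != "") print idx, max }
    '
}

# over_limit TEMP LIMIT -> succeeds when TEMP has reached LIMIT
over_limit() {
    [ "$1" -ge "$2" ]
}

SLOWDOWN_C=88   # limit quoted in the post

# Illustrative input only, standing in for real nvidia-smi output.
result=$(printf '0, 48, 85.12\n1, 35, 84.90\n' | hottest_gpu)
echo "hottest GPU (index temp): $result"
if over_limit "${result#* }" "$SLOWDOWN_C"; then
    echo "at or above the slowdown temperature"
fi
```

On a machine with the driver installed, the printf line would be replaced by the nvidia-smi command shown in the first comment, piped into hottest_gpu.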

Did you try the nVidia 470.05 driver? I use it for the K80 and a Quadro card with Ubuntu 20.04 and CUDA 11.4.
 

Jawesker

New Member
Joined
Aug 5, 2021
Messages
5 (0.00/day)
Hi,

Today I will try Linux to see if the GPU works properly. For cooling, I was thinking of using liquid cooling, since I will stress the GPU often. Have you considered that? I know it is a bit expensive, but I always try to buy second-hand, and in the end it costs a quarter of the price of a new one. Something that troubles me is not knowing whether the problem affects the T7820 series in general or whether just my motherboard has an issue. :( I believe I have tried that driver. I will let you know how things work out in the end. Thanks for all your help.

Kind regards
 

dzung

New Member
Joined
Aug 17, 2021
Messages
1 (0.00/day)
I am also planning to use a Tesla K80 on my Dell T7610 (waiting for the card from eBay). I will try to make it work and share some info here.

Thanks for bringing up this topic.
 

Tryphon

New Member
Joined
Jul 31, 2021
Messages
7 (0.01/day)
Hi,

I looked at the water cooling solution. Of course, water removes heat far better than air, and such a device is quiet. However, it is not so easy to install in the chassis. I just got a Quadro K6000 and installed it next to the Tesla K80. They are working fine with the nVidia 470.05 driver. I used the 8-pin PCIe connector available on the power distribution board to feed the K6000. So, in the T7810, it is possible to have 2 CPUs + 256 GB RAM and 3 GPUs with 36 GB VRAM. I can use the GPUs with the Matlab software.

Unfortunately, the fancontrol software does not work properly on my workstation for controlling the internal front fans. Otherwise it would be possible to tie the front fan speed to the K80 temperature. It is doable with an Arduino connected to one of the front fans. This is the only way I found to automatically control the fan speed according to the K80 temperature. Maybe I will do it ... to get more comfort with less noise.

Thank you for following this thread. The T7610 seems to be equivalent to my T7810. I do not know if it can take a more powerful power supply (the 675 W PSU is not enough to feed the workstation plus the K80 and a graphics card). Otherwise, the T7610 distribution board offers an easy way to connect the K80 (using the provided cable adapter) plus another graphics card. Remember not to power the K80 without a cooling solution.

Good luck.
 

robin997

New Member
Joined
Sep 24, 2021
Messages
1 (0.00/day)
Hi.

I stumbled across this forum by coincidence. I have successfully installed 4x K80 in a T630. It is a different model, but I think it is worth mentioning some of the problems I ran into along the way.

- Regarding cooling, Dell's fan setup can never sufficiently cool the K80s. They are 300 W each, which is more than most dual-CPU systems draw. Some people add a small fan on one end of the card; you can find those on eBay by searching for "K80 active cooling fan". What I did instead was attach server fans to the outside of the PC case. Right at the end of the card would be better, since Tesla cards don't have any display outputs. Putting the fan in the PUSH direction can achieve a lower temperature; however, you then need a PULL fan somewhere, or you can just remove the front panel, so as not to overheat the other components.

- Regarding power cables: first, be careful, because K80s use the CPU (EPS) pin arrangement rather than the PCIe one. Second, I remember that Dell uses a special wiring scheme, at least for the T630 power distribution card, so you are supposed to buy their cables. What I had to do was rewire the cables myself before they go into the distribution card. After that, no more crashes.
 

Tryphon

New Member
Joined
Jul 31, 2021
Messages
7 (0.01/day)
Hi,

I worked out how to control the fan I use to cool the K80 inside my T7810 workstation. The Dell Precision has 3 front fans (not independent). These fans can only be configured from the BIOS. I chose the bottom front fan, which blows air through the 3D-printed duct attached to the rear of the K80. This solution avoids an inelegant extra fan on the K80, which could only blow hot air taken from inside the chassis. The 3D model is not so difficult to build, with Tinkercad for example. It cools the K80 really efficiently, except that it is noisy, since all 3 fans have to spin fast at the same speed. I decided to reduce the fan noise for better comfort and more pleasure working with the workstation. As I said, the fancontrol software does not work with these fans. The main idea is to take control of the Pulse Width Modulation (PWM) signal of the front fan cooling the K80. What we need:
- single wires with male and female connectors (see picture) in yellow, red, black and blue
- one cheap Arduino Micro controller
- one USB A to micro B cable
- Arduino-IDE software to program the controller
- any plastic cylinder where the Arduino Micro may fit inside (to protect the electrical pins)
- read again my previous post

Hardware steps :
- unplug power cord of the workstation
- pull out the K80
- detach the front fan you chose and look at the wire colors
- plug the female connectors of the yellow, red and black single wires onto the motherboard system-fan connector (the yellow RPM wire is required, otherwise the workstation does not start)
- plug the male connectors of these single wires into the matching positions (yellow, red and black wires) of the fan cable's female connector
- plug the blue single wire (male connector) into the remaining position of the fan cable (blue PWM wire)
- use cloth tape to secure any connections that may be loose
- plug the female connector of this blue wire onto Arduino pin 3
- plug the USB micro-B connector into the Arduino
- plug the USB A connector into the internal USB port of the motherboard
- cover the Arduino with the plastic cylinder
- start the workstation
- press F12 and enter the BIOS: to force the fan speed, go to Power management/Fan Speed Control, choose 10 or 20, apply, restart

Software steps (tested with Ubuntu 20.04, do NOT plug 2 identical Arduinos at the same time!) :
- download/start Arduino-IDE and configure your Arduino following any tutorial found on the web
- copy paste this code and upload it into your device
C++:
// Temperatures are in Celsius, NOT Fahrenheit
// Speed varies between 0 and 255
int PWMpin = 3; // define pin number for PWM output
int speed0 = 125; // define nominal speed for the K80 fan, PLEASE increase this value if you feel this is safer for the K80
int MinSpeed = 25; // define minimum fan speed
long timer = 0; // define start value for a timer
int temp = 50; // init temperature parameter
long t0 = millis();
int speed = speed0;
long t1 = t0;

void setup() {
  // put your setup code here, to run once:
    Serial.begin(9600);
    // write to PWM pin (K80 protection level 1)
    analogWrite(PWMpin, speed);
    // initialize digital pin LED_BUILTIN as an output.
    pinMode(LED_BUILTIN, OUTPUT);
}

void loop() {
  // put your main code here, to run repeatedly:
  if(Serial.available()>0)
    {
        digitalWrite(LED_BUILTIN, HIGH);
        delay(2);
        digitalWrite(LED_BUILTIN, LOW);
        temp = Serial.parseInt(); // collect K80 temperature
        // Clear buffer
        while(Serial.available() > 0) {
          char t = Serial.read();
        }
        speed = max(10,3*temp); // convert to fan speed : adjust this model to meet your requirement
        // test if speed is garbage
        if((speed < MinSpeed) || (speed > 255))
            {
              speed = speed0; // change speed to nominal value in order to keep the K80 safe (K80 protection level 2)
            }
        else
            {
              t0 = millis(); // reset starting value for the timer           
            }
    }
  t1 = millis(); // define ending value for the timer
  timer = abs(t1 - t0); // timer value
  // if no update about K80 temperature after 50000 ms, set speed to nominal level
  if(timer > 50000)
    {
      digitalWrite(LED_BUILTIN, HIGH);   // turn the LED on (HIGH is the voltage level)
      speed = speed0; // change speed to nominal value in order to protect the K80 (K80 protection level 3)
    }
  else
    {
      digitalWrite(LED_BUILTIN, LOW);    // turn the LED off by making the voltage LOW
    }
  // write to PWM pin
  analogWrite(PWMpin, speed);
}
- Open the terminal
Code:
sudo mkdir -p /opt/nvidia
sudo nano /opt/nvidia/CoreTempTX.sh
- copy paste this code
Bash:
#!/bin/bash

script_name=$(basename -- "$0")

if pidof -x "$script_name" -o $$ >/dev/null; then
    echo "Process already running"
else
    # Requires vendor:product as argument
    if [ $# -eq 0 ]
        then
            echo "Requires vendor:product as argument"
    else
            while :
            do
                    # Find device name of the Arduino in /dev
                    ArduinoMicro=$(find $(grep -l "PRODUCT=$(printf "%x/%x" "0x${1%:*}" "0x${1#*:}")" \
                    /sys/bus/usb/devices/[0-9]*:*/uevent | sed 's,uevent$,,') \
                    /dev/null -name dev -o -name dev_id | sed 's,[^/]*$,uevent,' | xargs sed -n -e s,DEVNAME=,/dev/,p -e s,INTERFACE=,,p)

                    # Read core temperatures of all nVidia GPUs and collect the highest one (Only K80 has no fan, hence highest temp should always come from K80)
                    T=$(nvidia-smi dmon -s p -c 1 | tail -n +3 | tr -s ' ' | cut -d ' ' -f4 | sort -V | tail -n 1)

                    # Send the K80 temperature to the Arduino, which controls the front fan cooling the K80
                    # MANDATORY: close the Serial Monitor in Arduino-IDE to free its USB port!
                    echo "$T" > "$ArduinoMicro"
                    echo "$T"
                    sleep 15
            done
    fi
fi
- save the file and exit (CTRL+X, then confirm)
- set permissions
Code:
sudo chmod 744 /opt/nvidia/CoreTempTX.sh
- display Arduino "vendor:product" :
Code:
lsusb
- note the vendor:product string (shown after "ID") for the Arduino
- I used crontab to start and maintain the K80 fan monitoring
Code:
sudo crontab -e
- copy paste this line (a log is included) :
Code:
* * * * * /opt/nvidia/CoreTempTX.sh vendor:product >> /opt/nvidia/arduino.log 2>&1
- replace "vendor:product" with the string you found for your Arduino
- save
- start the monitoring
Code:
sudo service cron reload
- only one of the 3 front fans now spins faster than the others
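
Before putting the ID in crontab, it can help to check what CoreTempTX.sh will actually search for. The script converts the vendor:product argument with printf "%x/%x", which drops leading zeros, and greps for that string after PRODUCT= in the uevent files. Here is a minimal sketch of that conversion on its own; the function name usb_id_to_product and the two example IDs are only for illustration and are not necessarily those of your Arduino.

```shell
#!/bin/bash
# usb_id_to_product VENDOR:PRODUCT
# Prints "vendor/product" in hex without leading zeros, using the same
# printf "%x/%x" conversion as CoreTempTX.sh. Hypothetical helper name.
usb_id_to_product() {
    hex='[0-9A-Fa-f]'
    case "$1" in
        $hex$hex$hex$hex:$hex$hex$hex$hex) ;;   # four hex digits, colon, four hex digits
        *) echo "expected vendor:product as printed by lsusb" >&2; return 1 ;;
    esac
    printf '%x/%x\n' "0x${1%:*}" "0x${1#*:}"
}

# Arbitrary example IDs.
usb_id_to_product 2341:8037
usb_id_to_product 0403:6001
```

The second example shows the leading zero being dropped (0403 becomes 403), which is why the script does not grep for the lsusb string directly.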

Insert the K80
- shut off the workstation and unplug the power cord
- attach your 3D printed conduct to the K80 rear
- insert this assembly in a PCIe 16X slot (the one aligned with the chosen front fan)
- plug the EPS connector for the K80 extra power
- start the machine

Results
- the Arduino Micro takes control of the front fan very early during startup and blows a good air flow into the K80
- when the cron task is active, the bash script continuously sends the highest GPU temperature to the Arduino (via USB)
- the programmed Arduino adjusts the fan speed smoothly (up and down) according to the GPU temperature
- no messy stuff around the workstation: everything is well integrated
- stress test with MatLab + CUDA on the K80 GPUs:
GPU 1 did not exceed 72°C (GPU 1 always received air warmed by GPU 2)
GPU 2 did not exceed 60°C
Slowdown temperature is 88°C (never reached)
- in idle mode, I get 43°C/26°C on GPU 1 and 2; moreover, the Arduino reduces noise by 10 dB, which is a welcome reward (56 dB total noise with my config)
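
To preview which PWM value a given temperature produces before flashing the Arduino, the rule from the sketch (speed = max(10, 3*temp), replaced by the nominal value when the result is below MinSpeed or above 255) can be mirrored in a few lines of shell. This is only a desk check of the posted rule, not something that runs on the Arduino; the function name temp_to_pwm is made up.

```shell
#!/bin/bash
# temp_to_pwm TEMP_C -> PWM duty (0-255) following the rule in the Arduino sketch.
NOMINAL=125    # speed0 in the sketch
MIN_SPEED=25   # MinSpeed in the sketch

temp_to_pwm() {
    speed=$(( 3 * $1 ))
    if [ "$speed" -lt 10 ]; then
        speed=10
    fi
    # Values outside [MIN_SPEED, 255] are treated as garbage and replaced by the nominal speed.
    if [ "$speed" -lt "$MIN_SPEED" ] || [ "$speed" -gt 255 ]; then
        speed=$NOMINAL
    fi
    echo "$speed"
}

for t in 26 43 60 72 85 86; do
    echo "$t C -> PWM $(temp_to_pwm "$t")"
done
```

One thing this makes visible: 3 x 86 = 258 is above 255, so from 86°C upward the rule falls back to the nominal 125 instead of staying at full speed. I never reached that temperature, but if your card runs hotter you may prefer to clamp the value at 255 or use a smaller multiplier.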

Disclaimer: this test was done with my configuration. You need to understand what you are doing in order not to damage your system. This tutorial may need to be adjusted for your setup.
 

Attachments

  • MalefemaleSingleWire.png