
Upgrading cluster nodes

Solaris17


Introduction​

So it's about that time: my existing cluster runs on some E3 Xeons from the Skylake era, and while they are in perfect working order, I am under a LOT of memory pressure from the 32GB/node cap. I decided it was time to bite the bullet and shop around for a platform, knowing I will buy at least 3.

It started with some Sapphire Rapids Xeons, but the depth of those chassis is a bit more than I want, since I populate both sides of my rack. The only affordable option was eBay ES Xeons, which is honestly fine, but the motherboards either had to come from a WS line like ProArt or MSI's flavor, or they had to be part of a barebones sled, which ran me into the length constraints.

Moving on, I decided to take a look at what options were out there. Gigabyte looked great, I have used their systems briefly, but they are hard to get. Supermicro, an obvious choice, was meh and again didn't really have the dimensions I needed for the compute I wanted. Lenovo was a no, their boot sequence is terribly long. I looked at some Dell and HP servers, and while they would fit the bill, iLO and iDRAC are terrible and they hide a lot of features behind licensing, à la Supermicro.

Then I found it: the ASRock Rack 1U4LW-B650/2L2T RPSU. This is a server with all the works, but built on a consumer platform.

One of the biggest issues I found immediately was that no amount of snooping online, whether YouTube, hosting forums, or ASRock's own site, indicated exactly what I would get if I actually bought one. That's kind of scary, since separate coolers (looking at you, Supermicro) or missing rails (Dell) are pains in the ass to order after the fact, and shipping generally takes forever.

This made me nervous but I decided to take the plunge.

It comes in a marked box like most servers, pretty industry standard, but if you have porch pirates it's something to make note of, though I doubt most people are ordering Pure Storage flash arrays to their house.

IMG_4363.JPG



Opening it up, I was actually super happy! I immediately recognized the rail box as well as the heatsink box. I'm off to a great start! At least I would be able to boot this. Of course the server itself was packaged nicely, which is always a plus; I have had terrible luck with HP and Supermicro sending me bent chassis.

IMG_4364.JPG


The other little box contained a single power cable, some M.2 screws, some chassis screws, and other odds and ends. I found the single power cable mildly inconvenient since it's a dual-PSU system, but it's also only like $1,400 barebones, so it's fine.

IMG_4365.JPG


The heatsink itself is really nothing special: a square piece of metal that you line up with your fans (if you have a square socket) and 4 retention screws. This design isn't new, and anyone who has looked at servers over the last 20 years has seen it.

IMG_4366.JPG


Surprisingly, the bottom has a copper contact plate; most 1U systems don't anticipate a ton of power draw, so most manufacturers just give you a solid hunk of aluminium. A pleasant surprise. It has some pre-applied server goo that I swapped out for some Thermal Grizzly. After this gets racked I won't visit the DC again for another 5 years, so it's important to make sure all of this is set up exactly as you intend to leave it.

IMG_4367.JPG


Side shot, no heatpipes here, just a copper base with some fins soldered on.

IMG_4368.JPG


This is the MOST ANNOYING part of missing heatsinks: since server heatsinks are generally the same, the major difference is usually height. So here you go, the part number; you're welcome, Evan.

IMG_4373.JPG


Front shot of the server itself. It's a 1U with 4 hot-swap bays and integrated ears.

IMG_4369.JPG


Tugging on one of the PSUs, I was pleasantly surprised. This sports 2 450W PSUs by FSP, rated 80+ Platinum.

IMG_4370.JPG


Taking the top off, we are greeted with your run-of-the-mill 1U layout: PSUs to the right, CPU center, and IO to the left. It also sports an internal 2.5" mount point that is not hot-swappable. The cooling baffles are plastic sheets; they aren't my favorite, or anyone's in the industry most likely, but they do work.

IMG_4371.JPG


The server itself has specs as follows:

AM5 Ryzen or EPYC 4004-series CPUs
4x DDR5 UDIMM (unregistered) slots
1x M.2 PCIe 5.0 x4 slot
4x hot-swap bays
2x 1GbE Intel i210 NICs
2x 10GbE Broadcom NICs
1x 1GbE Realtek dedicated IPMI NIC
5x 40x28mm fans

That's the gist anyway. I was happy to see that the IPMI is driven by an ASPEED AST2600. That's relatively new in server land; it was released in 2019. The AST2500, which I'm used to working with, launched back in about 2015. The AST2700 is on the horizon but not actually launched, so that's a nice win.


A closer look​


Now that we have it all taken apart, we need to install some parts. For this test build we are going to throw in the following, under the assumption that this is what I will stick with, which generally means getting the best I can in the density I need. The parts list is as follows:

1x AMD Ryzen 9 7950X
4x 32GB Crucial UDIMMs (CT2K32G48C40U5)
1x 1TB Samsung 990 Pro NVMe
2x 500GB SanDisk Ultra SSDs
2x 20TB Seagate Exos
1x Intel Arc Pro A40

Now that we have it all installed, we need to start with the basics. IPMI is your lifeline on any machine you are not in front of, so before ANYTHING makes it into my rack, it has to work, have the features I want, and most importantly, be stable.

1721771448556.png


Nice. I'm used to early-2000s yahoo.com vibes. It looks like these come with Redfish right out of the box. After logging in, I was immediately prompted to change the password. Nice! Once in, I was greeted with the main control panel.

1721771594110.png


Looks good. Now time to beat on it.
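Since Redfish is on out of the box, you don't even need the web UI for a first sanity check. Here's a minimal sketch I'd use to confirm the service is alive and report the BMC firmware version; the address and credentials are hypothetical, substitute your own.

```python
# Minimal Redfish sanity check for a freshly racked BMC.
# The IP and credentials below are hypothetical; use your own.
import requests
import urllib3

urllib3.disable_warnings()  # the BMC ships with a self-signed cert

BMC = "https://192.168.1.50"
s = requests.Session()
s.auth = ("admin", "your-new-password")
s.verify = False

# The service root is unauthenticated per the Redfish spec
root = s.get(f"{BMC}/redfish/v1/").json()
print("Redfish version:", root.get("RedfishVersion"))

# Follow the Managers collection to the BMC itself rather than
# guessing the resource ID, since IDs vary by vendor
managers = s.get(f"{BMC}/redfish/v1/Managers").json()
mgr_uri = managers["Members"][0]["@odata.id"]
mgr = s.get(f"{BMC}{mgr_uri}").json()
print("BMC model:", mgr.get("Model"))
print("BMC firmware:", mgr.get("FirmwareVersion"))
```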


IPMI: the good, the bad, the ugly


First order of business is what an admin will absolutely need, which can be broken down into only a few items:

- BMC upgrading: updating the IPMI firmware itself
- BIOS upgrading: updating the server's BIOS
- Remote control: KVM and its abilities

Those are needed, and unfortunately a lot of the time they are locked behind a license key. Some nice-to-haves are as follows:

- Configuration backup (IPMI)
- Configuration backup (BIOS)
- Sensor readings

Remember though, after all of this we still need it to be stable, so it's time to start clicking around.

First things first, we want to see if we can update firmware or if I'm going to need to buy a license. This is important because licenses are generally not part of the BOM from the manufacturer; you need to get them from a VAR like CDW or PC Connection. That means dealing with salespeople, and it can get pretty trying on your patience.

1721772383325.png


Thankfully that's not the case! At least not yet... Let's actually click on one to see what happens. We will do BIOS, since that's usually the one locked behind a license.

1721772432647.png


Wow, no shit. At this point you would normally have a license key field or an upload dialogue for your key file. Straight from the factory, however, they don't seem to limit you. Alright, let's try the BMC firmware next.

1721772505654.png


Nice, it was expected, but it's good to make sure. So far so good. Let's see how manageable this is; time to click around and see how responsive and buggy ASRock's IPMI implementation is.

1721772639290.png


Honestly, so far it's pretty quick, and they include some really nice features. The sensors window actually draws a graph for each sensor being read, and the above shows some extras they throw in, including a BSOD screenshot. Nice.
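As an aside, those graphs are fed by the same data Redfish exposes, so you can scrape the readings yourself. A hedged sketch, with the same hypothetical address and credentials as before (newer firmware may expose ThermalSubsystem instead of the legacy Thermal resource):

```python
# Dump temperature and fan readings over Redfish, roughly what the
# sensor graphs page is charting. Address/credentials are hypothetical.
import requests
import urllib3

urllib3.disable_warnings()

BMC = "https://192.168.1.50"
s = requests.Session()
s.auth = ("admin", "your-new-password")
s.verify = False

chassis = s.get(f"{BMC}/redfish/v1/Chassis").json()
for member in chassis["Members"]:
    # Older BMC firmware exposes the legacy Thermal resource
    thermal = s.get(f"{BMC}{member['@odata.id']}/Thermal").json()
    for t in thermal.get("Temperatures", []):
        print(f"{t.get('Name')}: {t.get('ReadingCelsius')} C")
    for f in thermal.get("Fans", []):
        print(f"{f.get('Name')}: {f.get('Reading')} {f.get('ReadingUnits')}")
```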

Let's try KVM.

1721772732177.png


Not terrible at all: it's snappy, allows media mounting, and has controls built in. However, it is kind of annoying. Here are some gripes I immediately have:

- The timeout does not seem to be based on usage. It will boot me after 15 minutes no matter what, and so far there is no setting to adjust it.
- The BIOS and IPMI versions from the box are JANKY. The features are nice, but when a readout says "N/A", how useful is it really? Thankfully an update did cure most of this.

Those don't seem like terrible gripes, until you are trying to image a box in an emergency and your session disconnects, or you are inexperienced with datacenter hardware and don't understand why most functions seem half-baked.

As I said, after upgrading the BIOS and IPMI firmware it's smooth sailing, minus the timeout, which is still intensely annoying. If you are mounting media from your personal PC and streaming it a continent away for a remote install, you're not going to be pleased when it boots you 36% of the way into writing to disk.
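One workaround worth knowing: the Redfish standard defines a VirtualMedia resource that can mount an ISO from an HTTP share server-side, so the transfer doesn't ride on your browser session. I haven't confirmed ASRock wires this action up, so treat this as a sketch of the standard, not a promise; all addresses are hypothetical.

```python
# Hedged sketch: mount an ISO via the standard Redfish VirtualMedia
# action so the BMC pulls it from an HTTP share instead of streaming
# through your browser session. Whether this firmware implements the
# action is unverified; all addresses are hypothetical.
import requests
import urllib3

urllib3.disable_warnings()

BMC = "https://192.168.1.50"
ISO = "http://192.168.1.10/images/WinServer.iso"  # hypothetical HTTP share

s = requests.Session()
s.auth = ("admin", "your-new-password")
s.verify = False

managers = s.get(f"{BMC}/redfish/v1/Managers").json()
mgr_uri = managers["Members"][0]["@odata.id"]

# Find a CD/DVD-capable virtual media slot and insert the image
vm_coll = s.get(f"{BMC}{mgr_uri}/VirtualMedia").json()
for member in vm_coll["Members"]:
    slot = s.get(f"{BMC}{member['@odata.id']}").json()
    if "CD" in slot.get("MediaTypes", []):
        action = f"{BMC}{member['@odata.id']}/Actions/VirtualMedia.InsertMedia"
        r = s.post(action, json={"Image": ISO, "Inserted": True})
        print(slot.get("Id"), r.status_code)
        break
```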


The BIOS​

The BIOS gets graded on features. Unlike consumer land, the only player I can remember that actually tries to make their BIOS EVGA/Gigabyte-pretty is Dell, and frankly it sucks: it's slow, cumbersome, and glitchy. As far as looks go, this is purely a you're-intimate-with-the-industry-or-you're-not affair. If you expect gorgeous interactive BIOS systems, stick to consumer land where they cater to you.

Standard OpROM init.

1721773327142.png


Now we get the interactive splash, kinda looks nice to be honest!

1721773358666.png


Now we have the BIOS itself. Everyone older than 20 has seen it.

1721773396742.png


Not bad. I should also note that even with the factory BIOS, it booted without issue with the test parts. That doesn't mean it was without its gremlins though.

First off, the disk detection was too fast or broken, meaning you had to warm reboot to see drives in the BIOS. The newer ROMs fix this, but it was annoying. Additionally, the product page itself does not contain all of the firmware. If you want the latest, you need to go to the motherboard section, not the server section.

Server Section:

1721773619249.png


Board Section:

1721773658081.png


However, it's not as bad as Broadcom's or HP's site, so I will chalk this up to just being jaded. The support itself is actually great! They already have a beta BIOS for the AMD 9000-series chips.

1721773725737.png


Neat. Moving on, the next stop is the famous "Advanced" tab. The impact can vary by OEM, but it's generally ALWAYS AMD-based systems: there are SO MANY options under the Advanced tab that most juniors get tricked into thinking the server or session has hung. It hasn't, it just takes forever to iterate through.

1721773858401.png


From here we are looking for a few specific things.

- Virtualization settings: SR-IOV, IOMMU, etc.
- Security, both platform and memory

Taking a peek around, we find a few right off the bat: ReBAR and SR-IOV.

1721774977625.png


Looking further, we find the TPM 2.0 enablement.

1721775039218.png


DDR Data scrambling

1721775155349.png


The expected ECC enablement.

1721775186613.png


IOMMU perfect

1721775220910.png


Pluton support extra nice

1721775332636.png


Secure Boot, that's expected but fine.

This all seems in order, but you know what would be cool? Being able to change some of these options from IPMI.
 



Remote BIOS management​


Looks like they have some sort of functionality; there's a link near the account management stuff.

1721775662412.png


Clicking it prompts us to log in again.

1721775685194.png


Oh wow. I won't get my hopes up yet, but it appears they have some of the sections.

1721775729600.png


1721776863050.png


Yeah, I mean it's cool. The ones you might more or less need are present, but not all of them. That might make some sense if you take into account that some settings are dangerous or only apply during a reboot, but those options are already highlighted in yellow. So why didn't they include all of them? Who's to say.
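For what it's worth, the settings they do expose should also be reachable through Redfish's Bios resource, which stages changes into a pending settings object that applies on reboot. A sketch under the assumption the firmware implements it; the attribute name below is hypothetical, so dump the Attributes dict first to learn the real names.

```python
# Hedged sketch: read BIOS attributes over Redfish and stage a change
# for the next reboot. The attribute name is hypothetical; dump the
# Attributes dict first to learn the real names on this firmware.
import requests
import urllib3

urllib3.disable_warnings()

BMC = "https://192.168.1.50"
s = requests.Session()
s.auth = ("admin", "your-new-password")
s.verify = False

systems = s.get(f"{BMC}/redfish/v1/Systems").json()
sys_uri = systems["Members"][0]["@odata.id"]

# Current attributes
bios = s.get(f"{BMC}{sys_uri}/Bios").json()
for name, value in sorted(bios.get("Attributes", {}).items()):
    print(name, "=", value)

# Stage a change into the pending settings object; it applies on reboot
r = s.patch(
    f"{BMC}{sys_uri}/Bios/Settings",
    json={"Attributes": {"PCIeSRIOVSupport": "Enabled"}},  # hypothetical name
)
print("PATCH status:", r.status_code)
```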


Thermals and Noise pollution​


I honestly debated making this a section since there won't be a ton to talk about; during a typical bootup sequence things get spicy before settling into the upper 40s ºC.

1721777341548.png


Fun fact: all your sensors emit graphs.

1721777369266.png


For you dashboard people, the IPMI control panel also allows you to download the MIB, so you can use it in a TIG stack, Prometheus, Nagios, etc.

1721777458644.png
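If you go the SNMP route, polling the BMC is only a few lines once SNMP is enabled in the IPMI settings. A minimal sketch with pysnmp; the community string is a placeholder and the generic sysDescr OID stands in for the real sensor OIDs you'd pull from the downloaded MIB.

```python
# Minimal SNMP v2c poll of the BMC using pysnmp. The community string
# is a placeholder, and sysDescr.0 stands in for the real sensor OIDs
# you'd pull from the downloaded MIB.
from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity, getCmd,
)

BMC_IP = "192.168.1.50"  # hypothetical BMC address

err_indication, err_status, err_index, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),      # SNMP v2c
        UdpTransportTarget((BMC_IP, 161)),
        ContextData(),
        ObjectType(ObjectIdentity("1.3.6.1.2.1.1.1.0")),  # sysDescr.0
    )
)

if err_indication:
    print(err_indication)
else:
    for name, value in var_binds:
        print(f"{name} = {value}")
```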


Moving on: with that spicy heat comes an aggressive fan curve and heuristics. Not unlike the competition, however; after all, we aren't reviewing a Ferrari's ability to go to the grocery store.

That said, the PWM 40x28mm fans are a pretty common 1U industry standard. You can get them from the likes of Delta, Dynatron, and Sunon. If you understand what I'm saying, then the rest is for everyone else.

I can hear these fans ramp downstairs in my kitchen. They are loud, and in a DC environment require hearing protection. This is par for the course and I won't dock it any points. Your basement, closet, or whatever you are hiding your true self in is not the place to keep this server. However, the cooling performance is good!

When it ramps up it sounds like a jet, the fans easily passing 10k RPM, but the system generally peaks just shy of 80ºC. 77ºC might be warm for some, but it will be cooler in a DC. It's also worth mentioning that the loads are different: while you might struggle to play Crysis, hosting VMs actually doesn't take a ton of CPU power, unless you SPECIFICALLY and purposefully put a CPU load on it.

The OS bits

In my case I am migrating my Windows Server cluster to these new machines. For that I will be transitioning from my SAN over 10G fiber to Storage Spaces Direct over 10G Ethernet. This comes with some challenges, specifically drivers. Most home users don't care about or even look at these settings, but since we will be doing hyperconverged storage, we will be looking for some specific ones. We will install the actual Broadcom NIC drivers instead of the ones MS provides.

- RoCE (ideally v2)
- SR-IOV
- Flow Control
- QoS

Flow Control ok

1721778573967.png


NetworkDirect, unexpected but sick.

1721778674465.png


RoCEv2!

1721778706249.png



QoSSSS

1721778733877.png


SR-IOV



1721778763102.png


Wow, no shit, that's unexpected. Now S2D should be fire-and-forget.
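Well, almost fire-and-forget. Before flipping the S2D switch I still sanity-check the same features from inside the OS. A rough sketch, just shelling out to the stock Windows networking cmdlets from Python; run it on the node itself, nothing here is ASRock-specific.

```python
# Rough pre-S2D sanity check, run on the node itself. It shells out to
# the stock Windows networking cmdlets; a checklist, not a deployment
# script.
import subprocess

CHECKS = {
    "RDMA (RoCE) adapters": "Get-NetAdapterRdma",
    "SR-IOV support":       "Get-NetAdapterSriov",
    "QoS policies":         "Get-NetQosPolicy",
}

for label, cmd in CHECKS.items():
    print(f"--- {label} ---")
    subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=False)
```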

Now to test it for a week under different performance loads and see what shakes out.


1721778965851.png
 


In Closing​


Deep down I have reservations about consumer hardware, however there is no denying that the performance uplift over what I have is significant. After a week of load tests and mock setups, ranging from purposefully destroyed arrays to unplugging a PSU mid-benchmark, it really handled everything I threw at it.

So much so I did the bad.

1721779123342.png


3 total systems, exactly the same:

1x AMD Ryzen 9 7950X
4x 32GB Crucial UDIMMs (CT2K32G48C40U5)
1x 1TB Samsung 990 Pro NVMe
2x 500GB SanDisk Ultra SSDs
2x 20TB Seagate Exos
1x Intel Arc Pro A40

This will allow me to continue to grow my cluster and VMs in general. The Arc cards will be passed through for AI training, and the system itself has all the prerequisites that Server 2025 will require in the next few months.
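For the Arc passthrough I'll be using Hyper-V's Discrete Device Assignment. Per node, the dance looks roughly like the sketch below; the location path and VM name are made up (pull the real path via Get-PnpDevice / Get-PnpDeviceProperty on the GPU's instance ID), and the MMIO sizes depend on the card.

```python
# Hedged sketch of the Hyper-V Discrete Device Assignment (DDA) steps
# for handing a GPU to a VM, shelled out to the real Hyper-V cmdlets.
# The location path and VM name are hypothetical.
import subprocess

LOCATION = "PCIROOT(0)#PCI(0101)#PCI(0000)"  # hypothetical GPU location path
VM = "ai-train-01"                           # hypothetical VM name

STEPS = [
    # Give the guest enough MMIO space for the card's BARs
    f"Set-VM -VMName {VM} -LowMemoryMappedIoSpace 3GB "
    f"-HighMemoryMappedIoSpace 32GB -AutomaticStopAction TurnOff",
    # Detach the device from the host...
    f'Dismount-VMHostAssignableDevice -LocationPath "{LOCATION}" -Force',
    # ...and assign it to the VM
    f'Add-VMAssignableDevice -LocationPath "{LOCATION}" -VMName {VM}',
]

for cmd in STEPS:
    subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=True)
```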

Additionally, when 48GB ECC DIMMs are more easily had and the new EPYC or Ryzen Pro AM5 9000-series systems come out, I can switch both CPU and RAM to more DC-oriented parts. With prices as they are, and ECC and compute already had to some degree on DDR5, for me it was worth it to do it twice, even if that means a refresh in a year or two. As it stands now, this should give me enough breathing room for some time.

In the next few weeks I will take the weekend trip to do the migration. If you want to know about that kind of nerd shit, lmk and I can document it.
 