
THREAD CLOSED!!! Post Your AMD RyZen Blender Benchmarks at 200 Samples!

  • Thread starter: Deleted member 50521
Status: Not open for further replies.
So CUDA active with one card is slower than the CPU alone. Not what I expected with this old CPU. The CPU is still better than one card.
You might want to re-check that for a single 970. I had some trouble getting Blender to use my GPU at first.
 
There shouldn't be a problem with the single GPU run. I have a Logitech G19s keyboard with a display in it, and with MSI Afterburner I can keep an eye on the CPU as well as GPU load in real time. During the single-card test only one of the GPUs was in use.
In the dual GPU test both GPUs were in use, and a friend of mine confirmed that a single GPU is slower. He has an i7-6900K and a single EVGA GTX 1080 Classified, and that was also way slower with one GPU vs. the CPU-only run: the CPU took 32 seconds, while CUDA on his single GTX 1080 took over a minute.
 
If you download the Ryzen Blender file now, it defaults to 150 samples instead of the older default of 200 samples.

So AMD changed the sample setting for their Blender file, I see.
 
CUDA for the GeForce 1000 series is not optimized yet. It's not just Blender; Nvidia's Iray render engine has issues with the GeForce 1000 series too.

In general, it takes a full year for CUDA code optimization to catch up. This is nothing new.
 
In the Render > Performance tab:

For GPU, enter 256 by 256 and start with 256 as well, or even try 564 by 564 with a start of 564. It depends on your GPU; for a 1080 I believe 564 is better.

For CPU, enter 16 by 16 (start with 16) or 32 by 32 (start with 32).
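If you'd rather set this from Blender's Python console than the UI, here's a minimal sketch assuming the Blender 2.7x-era API (tile_x/tile_y were replaced by automatic tiling in later releases):

```python
import bpy

scene = bpy.context.scene

# Big tiles for GPU (CUDA) rendering, small tiles for CPU rendering,
# along the lines of the numbers suggested above.
if scene.cycles.device == 'GPU':
    scene.render.tile_x = 256
    scene.render.tile_y = 256
else:
    scene.render.tile_x = 32
    scene.render.tile_y = 32
```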

If the CPU (I mean an i5 or i7) and the GPU are from the same generation, then the GPU should be faster than the CPU in Blender at the current build. I sometimes render video material for the company I work for.

I don't know how it behaves with 6/8/etc. core processors, server processors, or AMD hardware.

I am using 2 Blender instances side by side to use the CPU and GPU simultaneously.
 
Or use the bundled Blender add-on called Auto Tile Size. It detects whether you're rendering on CPU or GPU and changes the tile size automatically.

For GPU though, the best tile size is the render resolution itself, if it's possible to set it that way (see the sketch below).
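For what it's worth, the "tile size equals render resolution" trick can also be scripted. A rough sketch, again assuming the Blender 2.7x tile_x/tile_y properties:

```python
import bpy

render = bpy.context.scene.render

# One tile covering the whole frame: match the tile size to the final
# output resolution (resolution_percentage scales the base resolution).
render.tile_x = render.resolution_x * render.resolution_percentage // 100
render.tile_y = render.resolution_y * render.resolution_percentage // 100
```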
 
1:02:73 200 samples

6700K @ 4.8 GHz, 4.5 GHz cache. RAM @ 3333 MHz 14-17-17-32 1T

View attachment 82051

I really hope Zen is as good as we're being led to believe. Could be game changing :D
Could you be so kind as to test 4.7 GHz with 4.2 GHz cache? I wanna compare clock for clock with mine :D Thanks

I got 1:12 @ 200 samples.
 
His Skylake is going to be faster than your Haswell even at the same clock speed. I was matching or coming in just under a Skylake 6700K @ 4.6 GHz while my 4790K was @ 4.7 GHz, although depending on the benchmark it does flip-flop too.
 
CUDA for the GeForce 1000 series is not optimized yet. It's not just Blender; Nvidia's Iray render engine has issues with the GeForce 1000 series too.

In general, it takes a full year for CUDA code optimization to catch up. This is nothing new.

Funny thing: in my system the 750 Ti is faster than the 980 Ti in this bench... ~1:29 vs 1:41 at 150 samples (although the monitor is hooked to the 980 Ti, which might screw the party).

I really don't know whether to be happy or cry about that lol
 
Hm, 1:29 and 1:41? You can't be serious? That's waaaaaay too slow for those GPUs.

Anyway,

The 150-sample render is showing a weird result for me on CUDA as well. I've done my own renders, which take hours, on both CPU and GPU, and from those results I've understood that my CPU and 1060 are pretty much neck and neck.

However, in this 150-sample render, CUDA scores 17 seconds whereas my CPU scores 31 seconds. In the 100 / 200 sample renders, the 1060 was identical to my CPU render time: 20 seconds / 40 seconds.

There may be a reason why AMD chose this specific 150-sample setting.

Edited to correct spelling.
 
I get 47s while using both GPUs... haven't changed anything else. Tomgang also didn't have stellar results with the GPU. I got 45.77s on the CPU using 150 samples.

The key probably lies in the filters used and the scene complexity. This is a really simple scene, so the heavy x86 optimizations don't get a chance to shine, and the raw power of the GPU pushes its way through. There is some mojo hidden in there for sure.
 
Try this: under File -> User Preferences, type "auto" in the search box and click to activate Auto Tile Size.

The thing with GPU rendering is that the GPU cores as a whole can only act as one giant core. Therefore, smaller tile sizes actually hinder their speed due to overhead. Auto Tile Size should speed things up for the GPU. It doesn't matter much for the CPU, though.
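If you prefer the Python console over the preferences dialog, a minimal sketch (assuming the add-on ships under the module name render_auto_tile_size, as it did in the 2.7x releases):

```python
import bpy

# Enable the bundled Auto Tile Size add-on and keep it enabled across restarts.
bpy.ops.wm.addon_enable(module="render_auto_tile_size")
bpy.ops.wm.save_userpref()
```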
 
Always nice to learn something new. :lovetpu:


Got 11.32s on both GPUs, 40.44s on the 750 Ti and 13.45s on the 980 Ti.

Edit: forgot to change them samples :D

At 150 samples: both GPUs 9.54s, 750 Ti 30.25s and 980 Ti 10.19s.
 
FX-8150 @ 5GHz
200 Samples @ 3:26.58
Bulldozer sucks pretty hard. :laugh: Time to go back to X58 + Xeon 6-core soon.
Screenshot (66).png
 
2:44
i5-6200U

Update: 6:59 on Core 2 Duo T9600 on the good old Precision M4400 :D

The model renders about 30% faster under Linux (I use Xubuntu 16.04) than under Windows 10.

Made a figure for a more intuitive comparison. The gap is big enough that it can't be ignored!
compare.png


A screenshot from Linux. `cat /proc/cpuinfo` returned the frequency the CPU was running at after rendering ended.
aaa.png
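For reference, the same readout can be grabbed without a screenshot; a quick Python sketch that pulls the per-core clocks straight out of /proc/cpuinfo on Linux:

```python
# Print the per-core clock speeds the Linux kernel reports, the same
# numbers `cat /proc/cpuinfo` shows right after a render finishes.
with open("/proc/cpuinfo") as f:
    mhz = [line.split(":")[1].strip() for line in f if line.startswith("cpu MHz")]

for core, freq in enumerate(mhz):
    print("core %d: %s MHz" % (core, freq))
```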


(Update) T9600:
m4400.png
 
IPC has increased over the generations. Since Sandy Bridge it's not a huge leap each time, but compared to earlier generations it's massive. The T9600 has nothing on the i5-6200U, and neither does the i5-540M.
Ah, yes, correct! For almost all applications the later generations perform better. (The only rare occasions where the i5-6200U lags behind older generations would be very special cases and/or when it's thermal throttling.)

Plus the integrated GPUs kept improving after Sandy Bridge XD
One very noticeable difference is that the old Dell Precision M4400 lags even when scrolling in Firefox, because its GPU sits at Performance Level 0; forcing the GPU to maximum speed with CoolBits can fix this.
 
4790K @ 4.7 GHz, DDR3-2400 10-10-12-1T, latest *.blend file, Blender 2.78a x64, Win 8.1

150 samples: 54.9 seconds
100 samples: 36.2 seconds
200 samples: 73.8 seconds

(My laptop took like 11 minutes for 200 samples lol.)

9.53 seconds with the 1080FE at 150 samples
 
GeiAb6D.png
PKqyksT.png


Interestingly the Xeon Phi ran at around 10% load for most of the benchmark and memory wasn't noticeably taxed either. The Xeon Phi still needs a lot of optimization.
 
It would be great if some of the 86 guests currently viewing this thread registered with TPU and submitted a score.

Fill in your specs here
https://www.techpowerup.com/forums/account/specs

Did just that lol.

200 samples

200sam.PNG


100 samples

100sam.PNG


My scores seem to be terrible with my 295X2, though. I activated auto tile, but it didn't seem to help much. My GPUs sat pretty much idle during this. Guessing this is probably a CrossFire issue?

Edit 2: It would help if I set it to GPU Compute...
1 GPU - 34.5s
2 GPUs - 18.01s

That's @ 150 samples. Damn, I think I finally need to go get a new GPU; a 1080 FE is apparently 4x faster than my current heat-spewing card.

Pretty neat to see scaling like that for once lol.
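For anyone else tripping over the same thing: the "GPU Compute" switch mentioned in the edit above is also exposed to Python. A minimal sketch, covering only the per-scene device toggle (which CUDA/OpenCL card actually gets used is picked separately in the user preferences on 2.78-era builds):

```python
import bpy

# Switch Cycles from CPU to GPU Compute for the current scene. The actual
# card selection (CUDA or OpenCL device) is configured in the user
# preferences, not here.
bpy.context.scene.cycles.device = 'GPU'
```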
 
Thread title still says blend on top of it all.. lol..

Surprised this cluster is still going. Plans for a new thread, properly titled, and using the same file???
 
Fixed the title. And still, holy mother of god, this thread is going STRONG!

Over 14k views. Somehow I feel like TPU should thank me for making this thread in the first place. :D We got tons of new members who registered just to post their scores!
 
Could you be so kind as to test 4.7 GHz with 4.2 GHz cache? I wanna compare clock for clock with mine :D Thanks

I got 1:12 @ 200 samples.
Hi @TheHunter, sorry for the slow reply.

I got 1:04:71 at those settings :)
I'm a bit surprised by that; I expected a bigger drop. That's only 2 seconds longer than my best run, with -100 MHz on the CPU, -300 MHz on the cache, and -333 MHz on the RAM.

4.7_4.2_3.0 requested.png

Edit: For the benefit of new overclockers, don't copy my voltage! This chip was a poor overclocker to start with, and high-voltage benchmarking plus 24/7 crunching have degraded the CPU quite substantially. Stay below 1.35 V (on Skylake) if you want your CPU to last.
 