
Two 16-core AMD Threadripper Parts Listed Online

HEDT is never really for gamers, but for pros doing pro stuff who need the threads / raw CPU power. It's a different socket as well, and it needs different cooling considering the TDP.

Uh, depends on the size of your hand, I guess; I'm going to say bigger for most. Big enough to need new coolers.

If you think that CPU is big, you've never seen the Sun M5000 server:


Gold-plated CPU to optimize cooling. :D These things were terribly huge and wouldn't fit in a proper 2/4U rack. But it's good to see more cores becoming available for us consumers. It means that future games, apps and all will be made more multithreaded to get the best out of multiple cores/threads.
 
Work done for work done, clock speed for clock speed, are the new generation of AMD chips more energy efficient than the current generation of Intel chips? My 7700K is 91 watts in theory out of the box. Multiply its 91 watts by 4 to give it 16 cores and 32 threads and we would have just over 360 watts.

We could lose half of that by slowing everything down to, say, 3.5 GHz, but we would still have around 180 watts. It will all come down to the clock speed of these promised chips. I reckon it ain't gonna be what some folks think it is. I don't see how it can be.

trog
 
Yet they've already stated the TDP: about 155 W for the 1998X, IIRC. The 1700 only uses 60 W with 16 threads; doubling that gives 120 W, so 155 W might not be so unreasonable.
 
It means that future games, apps and all will be made more multithreaded to get the best out of multiple cores/threads.
Not necessarily. Coders are lazy. They still don't code to use the hex and octo cores we had a few platforms ago.
 
No it really isn't that simple. I ran my 6700K with 1.4v and none of the heat issues being discussed with the 7700K. There definitely seems to be something wrong with that specific CPU.
Similar to the 4770k vs the 4790k
 
Not necessarily. Coders are lazy. They still don't code to use the hex and octo cores we had a few platforms ago.
Ugh. It's not exactly that simple. I've been thinking about writing up a rant for TPU to explain exactly why you can't always just write multi-threaded code and that there is a bit more to it than devs just being lazy, because that's an excuse and, even worse, it's the wrong excuse. To make a long story short, there is overhead associated with writing concurrent processes, and in cases like games, low latency is super important but, what people always seem to forget is that efficient concurrent systems use queues between the different functional parts of the application. While this increases throughput on multi-core machines while letting each part scale as necessary (assuming purely functional transforms,) it harms latency due to managing multi-threaded resources and asynchronously conveying values between different parts of the application. So unless you have one particular thing going on that's consuming a ton of CPU time, isn't dependent on other calculations, and is a substantial enough workload to overcome all of the overhead and pitfalls of multi-threaded programming, it's not going to offer much of a gain and could actually use more CPU power and harm performance.
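
A rough sketch of what I mean, in Java since that's what keeps coming up in this thread (class name and numbers are purely illustrative, not from any real engine): handing one trivial calculation to another stage through queues costs far more than doing it in place, and that's exactly the latency tax I'm talking about.
Code:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueHandoff {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> in = new ArrayBlockingQueue<>(1);
        BlockingQueue<Integer> out = new ArrayBlockingQueue<>(1);

        // Worker stage: takes a value off one queue, transforms it, puts the result on another.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    out.put(in.take() + 1);
                }
            } catch (InterruptedException ignored) { }
        });
        worker.setDaemon(true);
        worker.start();

        // Doing the work in place.
        long t0 = System.nanoTime();
        int direct = 41 + 1;
        long directNs = System.nanoTime() - t0;

        // Same work, round-tripped through the queues to the worker thread.
        long t1 = System.nanoTime();
        in.put(41);
        int viaQueue = out.take();
        long queuedNs = System.nanoTime() - t1;

        System.out.println("direct:    " + direct + " in " + directNs + " ns");
        System.out.println("via queue: " + viaQueue + " in " + queuedNs + " ns");
    }
}
The queued version only wins when each stage has real work to chew on; for one tiny calculation it just adds overhead.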

I would like to emphasize that almost every time I decide to make some code multi-threaded, it's because the task is relatively tolerant of higher latency and requires high throughput to be accomplished in a reasonable amount of time but, my latency boundaries tend to be measured in milliseconds because I typically do web services as opposed to microseconds in the case of game state processing and rendering.

tl;dr: Stop bashing devs for not making multi-threaded games, it's not always quite that simple. If it's so easy, why don't you do it? ...and it's not like they're not multi-threaded, they just can't saturate as many cores as you would like because not every thread is running full tilt.
 

Thread thread = new Thread(); or something like that, it's really not that hard. At least in Java; mind you, I only have about 2 and a half years of Java coding experience.
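
Something like this, for example (just a toy, nothing from real game code):
Code:
public class ThreadExample {
    public static void main(String[] args) throws InterruptedException {
        // Create and start a thread that does some work.
        Thread thread = new Thread(() -> System.out.println("Hello from another thread"));
        thread.start();
        thread.join(); // wait for it to finish
    }
}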

Also isn't that what things like Vulkan and stuff are for?
 
Vulkan is closer to the GPU. Your new Thread sits inside a software scheduler, which causes the same kind of overhead that DX11 and older used to.

If game devs don't know how to properly code for multiple cores / threads, then use an off-the-shelf game engine that does it by default. That way the game dev only has to focus on the game itself and not the visual / technical part.
 
Thread thread = new Thread(); or something like that, it's really not that hard. At least in Java; mind you, I only have about 2 and a half years of Java coding experience.
Easy to start a thread, yes. You're forgetting the time it takes to spin up the thread and to schedule it, the time waiting if you're using locks, the latency incurred if you use a queue instead. It's a little more than just spinning up a thread, it's how you use it and what its characteristics are.
Also isn't that what things like Vulkan and stuff are for?
Vulkan is basically OpenGL, but every command that would normally be executed in the OpenGL global scope instead gets executed as command buffers, or in other words, queues of commands. So the application prepares a series of commands that tells the Vulkan engine how to prepare stuff. The performance is had from those command buffers because you can submit multiple command buffers, so you essentially gain a queue of queues which represents the full set of processing you need to do. This decouples the actual rendering process/thread from the process/thread that describes what needs to be done. This is a case where the latency incurred from making hundreds of thousands of OpenGL calls to draw is greater than submitting an ordered list of things that need to be done, because the engine can process the workload so long as there is a queue to be processed versus waiting for a render loop to be calling draw commands and such but, it's not like Vulkan is killing/starting processes or threads to do all of this; the same threads are used because of the overhead of setting everything up. It's also not unrealistic that some command buffers could be static and might be prepared ahead of time so all that needs to be done is to submit them when rendering.
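
To make the "queue of queues" idea concrete without pretending to quote the actual Vulkan API, here's a toy Java analogy (class and command names are made up): commands get recorded into buffers up front and a render thread just drains whole buffers as they're submitted.
Code:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class CommandBufferAnalogy {
    public static void main(String[] args) throws InterruptedException {
        // Each "command buffer" is just an ordered list of commands recorded ahead of time.
        LinkedBlockingQueue<List<Runnable>> submitted = new LinkedBlockingQueue<>();

        // The "render thread" drains whole buffers as they arrive instead of being
        // called once per draw command.
        Thread render = new Thread(() -> {
            try {
                while (true) {
                    for (Runnable cmd : submitted.take()) {
                        cmd.run();
                    }
                }
            } catch (InterruptedException ignored) { }
        });
        render.setDaemon(true);
        render.start();

        // Record a buffer of commands (this could happen on any thread, even ahead of time)...
        List<Runnable> buffer = new ArrayList<>();
        buffer.add(() -> System.out.println("bind pipeline"));
        buffer.add(() -> System.out.println("draw a pile of triangles"));

        // ...then submit it as one unit instead of making one call per command.
        submitted.put(buffer);
        Thread.sleep(100); // give the render thread a moment before the demo exits
    }
}
Again, that's only the shape of the idea; real command buffers describe GPU work, not Runnables.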

So I write Clojure which is a Lisp on top of the JVM and JavaScript. It's a great language for doing concurrent programming. So consider the time it takes to spin up a thread:
Code:
> (time (async/<!! (async/thread (+ 1 2 3))))
"Elapsed time: 0.834924 msecs"                                                                                                                                                                    
6
Just to add 1, 2, and 3 on a new thread and return the value takes almost a full millisecond, and the majority of that is spinning up the thread:
Code:
(time (+ 1 2 3))
"Elapsed time: 0.049105 msecs"                                                                                                                                                                    
6

Keep in mind, I'm running this interactively so the time includes parsing and compiling since it's being JIT'ed on the spot. It would take less time if I AOT'ed it but, you would still see the same kind of difference in performance and probably even more so because the adding function is actually a lot faster than that after being compiled.
 
But wouldn't using a game engine like CryEngine or Unreal Engine 4 help optimize that? Look at how well optimized Battlefield is on the Frostbite engine, and it uses at least 8 cores, or should I say at least 8 threads.
 
But wouldn't using a game engine like CryEngine or Unreal Engine 4 help optimize that? Look at how well optimized Battlefield is on the Frostbite engine, and it uses at least 8 cores, or should I say at least 8 threads.
Sure, if the only thing your game is doing is rendering, which isn't the only thing most games do. :)

Point being, any system is only as fast as its weakest link.
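
If you want to put rough numbers on the weakest-link point, Amdahl's law is the usual back-of-the-envelope (the 60% figure below is completely made up, just to illustrate): the part of the work you can't parallelize caps the speedup no matter how many cores you add.
Code:
public class AmdahlExample {
    public static void main(String[] args) {
        double parallelFraction = 0.6; // assume 60% of the work can be spread across cores
        for (int cores : new int[] {2, 4, 8, 16}) {
            // Amdahl's law: speedup = 1 / ((1 - p) + p / n)
            double speedup = 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
            System.out.printf("%2d cores -> %.2fx speedup%n", cores, speedup);
        }
    }
}
With those made-up numbers you never get past about 2.5x, no matter how many cores you throw at it.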
 
How much does the language used affect that thread time? Wouldn't C++ be faster than JavaScript in a game with a lot of threads?
 
How much does the language used affect that thread time? Wouldn't C++ be faster than JavaScript in a game with a lot of threads?
Yes but, that doesn't change the limitations of writing multi-threaded code and what costs there are to it.
 
Yes but, that doesn't change the limitations of writing multi-threaded code and what costs there are to it.
All I know is that the games that run in multiple cores/threads don't seem to suffer from performance loss and instead gain performance vs single thread games
 
All I know is that the games that run in multiple cores/threads don't seem to suffer from performance loss and instead gain performance vs single thread games
That's because game devs aren't going to release multi-threaded code that doesn't help performance because it won't pass QA. Companies try to release features that work, not ones that don't. :)
 
Only if done wrong, plus 1.28 V :eek: @ 4.8 is way too much :rolleyes:
You would be more shocked if you saw my i5 4670K @ 5 GHz at 1.45 V - it has been running fine for over two years now.
 
So I write Clojure which is a Lisp on top of the JVM and JavaScript. It's a great language for doing concurrent programming. So consider the time it takes to spin up a thread:
Code:
> (time (async/<!! (async/thread (+ 1 2 3))))
"Elapsed time: 0.834924 msecs"                                                                                                                                                           
6
Just to add 1, 2, and 3 on a new thread and return the value takes almost a full millisecond, and the majority of that is spinning up the thread:
Code:
(time (+ 1 2 3))
"Elapsed time: 0.049105 msecs"                                                                                                                                                           
6
Main: 30,000-50,000 ticks (typical 40,000 ticks)
Worker: 40,000-180,000 ticks (typical 50,000 ticks)
Code:
using System;
using System.Threading;
namespace ConsoleApplication1
{
    class Program
    {
        public static long Start;
        [STAThread]
        public static void Main(string[] args)
        {
            Start = DateTime.Now.Ticks;
            Thread worker = new Thread(Worker);
            worker.Start();
            //Worker();
            Console.ReadKey();
        }
        private static void Worker()
        {
            int a = 0;
            a += 1;
            a += 2;
            a += 3;
            Console.WriteLine(a + " " + (DateTime.Now.Ticks - Start).ToString());
        }
    }
}
I'm not counting compile time. Most of the difference in yours is likely coming from fetching and compiling the async threading libraries.
 
More like JVM sucks.

I did the same thing in C# and it boggles my mind but the second thread literally takes 0 ticks to complete.
Code:
using System;
using System.Threading;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Worker(); //6 0.0030015
            Thread worker = new Thread(Worker);
            worker.Start(); //6 0
            while (worker.ThreadState == ThreadState.Running)
                Thread.Sleep(10);
            worker = null;
            Console.ReadKey();
        }
        static void Worker()
        {
            DateTime start = DateTime.Now;
            int a = 0;
            a += 1;
            a += 2;
            a += 3;
            Console.WriteLine(a.ToString() + " " + (DateTime.Now - start).ToString());
        }
    }
}
.NET Framework is doing some wizardry to make it execute a second time instantaneously. For giggles, I ran it three times (thread or not) and it's zero ticks. Put simply, it seems that the processor can execute that +++ in just a few processor cycles.
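
For what it's worth, you see the same kind of warm-up effect on the JVM side if you time the same trivial workload back to back; the first run also pays first-call/class-loading costs, which is why only the first result of a micro-benchmark like this is interesting. A quick illustrative sketch (timings will vary by machine):
Code:
public class WarmUpDemo {
    static int add() {
        int a = 0;
        a += 1;
        a += 2;
        a += 3;
        return a;
    }

    public static void main(String[] args) {
        // Time the same trivial workload several times; the first run typically
        // measures slower because of first-call overhead, later runs are near-free.
        for (int i = 0; i < 5; i++) {
            long t = System.nanoTime();
            int result = add();
            System.out.println("run " + i + ": " + result + " in " + (System.nanoTime() - t) + " ns");
        }
    }
}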
 

I did the same test in Java. Here is the code:

Code:
public class ThreadTest
{
    static ThreadTest aaa = new ThreadTest();
    static long start = System.currentTimeMillis();

    public static void main(String[] args)
    {
        aaa.randoThread.start();
    }

    private Thread randoThread = new Thread(new Runnable()
    {
        @Override
        public void run()
        {
            int a = 0;
            a += 1;
            a += 2;
            a +=3;
            long end = System.currentTimeMillis();
            System.out.println("A=" + a + "\nThis test took: " + (end - start) + "ms");
        }
    });
}

I had the start time set at the top of the program instead of at the beginning of the thread; wasn't sure if that was the best way to do it, but the output was still 0 ms.
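
If I wanted finer resolution I'd probably use System.nanoTime() and start the clock right before starting the thread, something like this (untested sketch, same idea as above):
Code:
public class ThreadTest2 {
    public static void main(String[] args) {
        // Start the clock right before the thread is created and started, so the
        // measurement covers thread startup plus the tiny workload and nothing else.
        final long start = System.nanoTime();
        Thread randoThread = new Thread(() -> {
            int a = 0;
            a += 1;
            a += 2;
            a += 3;
            long end = System.nanoTime();
            System.out.println("A=" + a + "\nThis test took: " + (end - start) + "ns");
        });
        randoThread.start();
    }
}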
 
See latest edit. The problem is they cache the worker so the result always takes 0 ticks after the first run. You have to run the program repeatedly and only the first result is useful.

On my system, the second thread incurs about a 20% penalty, not 1700%.
 
I'm not counting compile time. Most of the difference in yours is likely coming from fetching and compiling the async threading libraries.
Not exactly, there is one thing that mine is doing that yours is not, so let's look at the one I did again (also keep in mind, this is only to measure the overhead from starting a thread and getting the result back.)

So, "async" is a library called "clojure.core.async" which can do all kinds of cool asynchronous things like easily spinning up threads, using thread pools, etc. What it also provides is a stream abstraction, a channel. In core.async, every time a thread or thread pool is used, that function call returns a channel which represents the value that's going to eventually be returned by the thread.

Obviously we know what (+ 1 2 3) does and what (async/thread ...) does but, what about that (async/<!! ...)? That's a function to get the next value off the channel, the two bangs indicate that it's a blocking function call (as in, it will block the thread until a result is returned,) and the outermost (time ...) function simply prints out execution time. Your example times from the time before the thread starts to the time the thread finished doing the calculation but, is still running. In mine, it's doing the same thing except, I'm including time to get the value back from the thread. That alone could account for the extra time because people don't really realize it but, a lot of threads get held up because they're waiting for data or for a lock to be released.

Edit: Consider this new application:
Code:
(defn add-with-thread []
  (let [start (System/nanoTime)]
    (async/<!!
      (async/thread
        [(+ 1 2 3)
         (- (System/nanoTime) start)]))))

(comment
  (time (add-with-thread)))

It will give me a result that looks like this:
Code:
"Elapsed time: 0.622553 msecs"
[6 100905]

So total execution time is 0.622553 ms but, the time between starting the thread and finishing the calculation only took 100905ns which is ~0.101ms. That means that most of the time was spent trying to get the data back from the thread.
 
Well I was thinking of getting a Ryzen/Vega laptop when I upgraded.


But I think I'm going to get one of these Threadrippers and build myself a purdy desktop again.


If no one else is getting one I'll stick up a review. Probably YouTube, as I suck at writing.
 
and getting the result back.
I added a delegate and event to do just that and it made no difference on performance because nothing is waiting for the result to proceed.

Your example times from the time before the thread starts to the time the thread finished doing the calculation but, is still running. In mine, it's doing the same thing except, I'm including time to get the value back from the thread.
Because there's no reason to wait: Console is a static class tied to the Main thread which prints what it receives in the order it received it. When I did the aforementioned delegate and event, it worked the same way: Console was checking for key input while the other thread raised the event which wrote to the console circumventing the check. As long as you don't press a key terminating the console, the threads can continue to inject and display data in the console.


That alone could account for the extra time because people don't really realize it but, a lot of threads get held up because they're waiting for data or for a lock to be released.
Thread locking defeats the purpose of async data processing. Delegates and events are the proper way to handle async threading (no threads in wait state, just notified to proceed).
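
For the Java folks following along, the rough equivalent of that delegate/event pattern would be a callback interface that the worker thread invokes when it has a result; names here are made up, it's only a sketch:
Code:
public class CallbackExample {
    // Rough Java stand-in for a C# delegate: a callback the worker invokes when done.
    interface ResultListener {
        void onResult(int value, long elapsedNanos);
    }

    static void runWorker(ResultListener listener) {
        new Thread(() -> {
            long start = System.nanoTime();
            int a = 0;
            a += 1;
            a += 2;
            a += 3;
            // Notify the listener instead of making anything block on a lock or a queue.
            listener.onResult(a, System.nanoTime() - start);
        }).start();
    }

    public static void main(String[] args) throws InterruptedException {
        runWorker((value, ns) -> System.out.println(value + " in " + ns + " ns"));
        Thread.sleep(100); // keep the demo alive long enough for the callback to print
    }
}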


FYI, there are 10,000 ticks per millisecond in .NET. Doing the calculation directly took 4 ms versus 5 ms on a separate thread. I suspect DateTime itself has a lot of overhead because it calculates everything (even the month/day/year from 1/1/1 as well as day of week and several other metrics) using the default culture on initialization. Proving this: Environment.TickCount measures milliseconds since the computer started, circumventing DateTime overhead. Using that method, both the main thread and worker thread get a 0 ms result because the workload is too minor to measure.
 
Because there's no reason to wait: Console is a static class tied to the Main thread which prints what it receives in the order it received it. When I did the aforementioned delegate and event, it worked the same way: Console was checking for key input while the other thread raised the event which wrote to the console circumventing the check. As long as you don't press a key terminating the console, the threads can continue to inject and display data in the console.
Most applications aren't using threads to spit results to the screen, it's giving the data to something else so it can be used. Returning data from a thread is actually super important for doing just about anything useful.
Thread locking defeats the purpose of async data processing. Delegates and events are the proper way to handle async threading (no threads in wait state, just notified to proceed).
Actually, something like a thread pool does (typically) "wait" but, it's kind of like parking the thread without necessarily stopping it, and there are times where the only option is to block unless the workload and everything you're using already implements some sort of non-blocking I/O, which is kind of a stretch to say holds for a lot of applications. More often than not, values at some point need to get synchronously joined up. You can't always get rid of that but, you can minimize how much time it takes to do it.
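
With a plain Java thread pool, for example, the Future.get() call is exactly that synchronous join point: you can't get rid of it if you need the value, you can only do as much other work as possible before you hit it (illustrative sketch):
Code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JoinPointExample {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // The pool "parks" idle threads instead of constantly killing and recreating them.
        Future<Integer> result = pool.submit(() -> 1 + 2 + 3);

        // ...do other useful work here while the task runs...

        // At some point the value has to be joined back in; get() blocks until it's ready.
        System.out.println("result = " + result.get());
        pool.shutdown();
    }
}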

Edit: For what it's worth, I wouldn't have used <!! if I wasn't done working with that data; I would apply a transducer to it or something. But the point is, <!! is the last step, not a step that should be done often.
 
Most applications aren't using threads to spit results to the screen, it's giving the data to something else so it can be used. Returning data from a thread is actually super important for doing just about anything useful.
Which is what the delegate and event are for. Any class listening to that event will get the data contained in the delegate. In this case, the delegate had the value (6) and the time (ticks). When the event is raised, the delegate method is called by the issuing thread and carried out (printed to console). In cases where cross-thread references are a problem, the issuing thread invokes the owning thread: owning thread executes the method while the issuing thread continues its tasks (usually requesting more work from the main thread).
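
The Java/Swing analogue of that invoke-on-the-owning-thread step, for anyone following along, would be something like SwingUtilities.invokeLater: the worker computes the value and only the Event Dispatch Thread ever touches the UI (toy sketch):
Code:
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

public class InvokeExample {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Invoke demo");
            JLabel label = new JLabel("working...");
            frame.add(label);
            frame.setSize(200, 100);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);

            // A worker thread computes a value, then hands the UI update back to the
            // owning (Event Dispatch) thread instead of touching the label directly.
            new Thread(() -> {
                int a = 1 + 2 + 3;
                SwingUtilities.invokeLater(() -> label.setText("result: " + a));
            }).start();
        });
    }
}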
 