# Core i7 940 Review Shows SMT and Tri-Channel Memory Let-down



## btarunr (Oct 13, 2008)

As the computer enthusiast community gears up for Nehalem November, with reports suggesting a series of product launches for both Intel's Core i7 processors and compatible motherboards, industry observer PC Online.cn has already published an in-depth review of the Core i7 940 2.93 GHz processor. The processor is based on the Bloomfield core, and essentially the Nehalem architecture that has been making news for over a year now. PC Online went right to the heart of the matter, evaluating the 192-bit wide (tri-channel) memory interface and the advantage of HyperThreading on four physical cores. In the tests, the 2.93 GHz Bloomfield chip was pitted against a Core 2 Extreme QX9770 operating both at its reference speed of 3.20 GHz and underclocked to 2.93 GHz, so that a clock-for-clock comparison could be made.

The evaluation found that the performance increments tri-channel offers over dual-channel memory, in real world applications and games, are just about insignificant. Super Pi Mod 1.4 shows only a fractional lead for tri-channel over dual-channel, and the trend continued with Everest Memory Benchmark. On the brighter side, the integrated memory controller does offer improvements over the previous generation setup, with the northbridge handling memory. Even in games such as Call of Duty 4 and Crysis, tri-channel memory did not shine.
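For context, the gap the review is probing can be sketched with back-of-envelope arithmetic: a third 64-bit channel raises *theoretical* peak bandwidth by 50%, which real applications rarely saturate. The DDR3-1066 figure below is an assumption for illustration (the speed officially supported by Bloomfield), not a number taken from the review:

```python
# Theoretical peak memory bandwidth, dual- vs tri-channel,
# assuming DDR3-1066 modules on a 64-bit-per-channel interface.

def peak_bandwidth_gbs(channels, transfers_per_sec, bus_width_bits=64):
    """Peak bandwidth in GB/s: channels * MT/s * bytes per transfer."""
    return channels * transfers_per_sec * (bus_width_bits // 8) / 1e9

dual = peak_bandwidth_gbs(2, 1066e6)  # ~17.1 GB/s
tri = peak_bandwidth_gbs(3, 1066e6)   # ~25.6 GB/s
print(f"dual-channel: {dual:.1f} GB/s, tri-channel: {tri:.1f} GB/s")
```

The 50% headroom only matters once a workload actually consumes it, which is the review's point about games.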







As for the other architectural change, simultaneous multi-threading (SMT) makes its comeback on the desktop with the Bloomfield processors, offering as many as eight logical processors for the operating system to address. In terms of performance, it appears to be a mixed bag. Across tests, enabling SMT brought performance increments of roughly 10~20% in general benchmarks, including Cinebench, WinRAR, TMPGEnc, and Fritz Chess. With 3DMark Vantage, SMT provided a very significant boost to the scores, with about 25% increments. It did not do the same for current-generation games such as Call of Duty 4, World in Conflict, and Company of Heroes. What's more, the games didn't seem to benefit from Bloomfield in the first place: the QX9770 underclocked to 2.93 GHz outperformed the i7 940, both with and without SMT, in some games.






*View at TechPowerUp Main Site*


----------



## InnocentCriminal (Oct 13, 2008)

Hmm... interesting. So that's an insight into the performance. Now for the power consumption...


----------



## Basard (Oct 13, 2008)

is it just me, or is the FPS on the games going down with tri-channel?


----------



## InnocentCriminal (Oct 13, 2008)

Once Far Cry 2, Left 4 Dead and other natively multi-threaded games come out, we'll have a better look at how well they compare to the Core 2 line.


----------



## Mussels (Oct 13, 2008)

this is kind of the same old argument - go a quad + HT if you use multithreaded apps, but if all you do is games, a faster clocked quad/even faster dual, is the better choice.

Video encoders are going to cream themselves, at least.


----------



## eidairaman1 (Oct 13, 2008)

more of a server upgrade is what this is primarily


----------



## Mussels (Oct 13, 2008)

after reading through the page, one thing caught my attention. The Core i7 CPU has 8MB cache, while the QX9770 has 12MB.

I'm not sure what the i7 CPU will cost, but isn't comparing it to the Extreme Core 2 model giving it a bit of a disadvantage? Wouldn't the performance difference be a lot smaller vs a lower-cached CPU?


edit:

load and idle power graph.


----------



## InnocentCriminal (Oct 13, 2008)




----------



## D4S4 (Oct 13, 2008)

Mussels said:


> after reading through the page, one thing caught my attention. The Core i7 CPU has 8MB cache, while the QX9770 has 12MB.
> 
> I'm not sure what the i7 CPU will cost, but isnt comparing it to the Extreme core 2 model giving it a bit of a disadvantage? Wouldnt hte performance different be a lot smaller vs a lower cached CPU?



Not really; the Core i7 doesn't even need 8MB, since it has an integrated memory controller.


One thing that should go wild on that thing is Photoshop.


----------



## FordGT90Concept (Oct 13, 2008)

Basard said:


> is it just me, or is the FPS on the games going down with tri-channel?


Memory is a funny thing, because the faster it goes in terms of bandwidth, the slower it goes in terms of clock cycles.  We see this going from DDR, to DDR2, to DDR3.  Ultimately, I think it's time to stop with DDR and move to something that accomplishes more per work cycle, like QDR or ODR.  DDR2 and DDR3 have been able to keep pace with processor needs, but they fail to improve not only on bandwidth, but also clock speeds and/or latency.  Basically, the original DDR technology is being stretched to meet modern demands when it is long overdue to explore the possibilities of something else.

Ultimately, bandwidth doesn't matter so long as it doesn't run out.  For instance, interstate highways are great--until they have a traffic jam.  As such, memory never really weighs very heavily into benchmarking.  It only has a major impact if it errors or if there is a traffic jam--both are for the worse.  So in regards to memory, uneventful is a great thing.


Now directly to your question: tri-channel is addressing the needs of the processor more than anything else.  In order to add another two DIMMs, they increase the distance and therefore the latency.  Specific measures such as FPS go down slightly in order to prevent a disaster (traffic jam).  I think they are being very proactive on this whole memory bandwidth issue, but I really don't like the way it is progressing (has been progressing for almost a decade).  This small decrease in performance with tri-channel is basically universal, until they try to reinvent memory.
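FordGT90Concept's bandwidth-versus-clock-cycles point can be put in rough numbers. The sketch below uses typical (not exhaustive) module speeds and CAS latencies to show how the cycle count climbed across DDR generations while wall-clock latency stayed roughly flat, even as bandwidth doubled:

```python
# CAS latency in wall-clock time across DDR generations.

def cas_ns(cas_cycles, transfers_per_sec):
    """CAS latency in nanoseconds."""
    # The I/O clock is half the transfer rate (double data rate),
    # and CAS latency is counted in I/O clock cycles.
    io_clock_hz = transfers_per_sec / 2
    return cas_cycles / io_clock_hz * 1e9

print(f"DDR-400   CL3: {cas_ns(3, 400e6):.2f} ns")   # 15.00 ns
print(f"DDR2-800  CL6: {cas_ns(6, 800e6):.2f} ns")   # 15.00 ns
print(f"DDR3-1600 CL9: {cas_ns(9, 1600e6):.2f} ns")  # 11.25 ns
```

So the "slower in clock cycles" trend is real, but in absolute nanoseconds latency has barely moved while transfer rates quadrupled.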


----------



## Mussels (Oct 13, 2008)

As a very good analogy to explain Ford's post, I shall use video cards.

Look at a video card such as the 8600GT and its 128-bit memory bus. You can slap more RAM on it (256/512/1024MB), and that will prevent running out of RAM (texture swapping) _without_ improving performance.

You could also add more channels with the same amount of RAM, which gives you better performance, but not if the game/application was designed for a 128-bit bus/256MB of RAM.

Two examples:
a 128-bit bus with 1GB of RAM, or a 512-bit bus with 256MB of RAM.

Tri-channel will help prevent bottlenecks as we all go 8GB+ in our systems, and I think we need benchmarks with MORE than a measly 2GB of RAM before calling it a failure.


----------



## FordGT90Concept (Oct 13, 2008)

Mussels said:


> Tri channel will help prevent bottlenecks as we all go 8GB+ in our systems, and i think we need benchmarks with MORE than a measly 2GB of ram, before calling it a failure.


To add to that, Nehalem seems to be designed for tomorrow--not today.  This is a processor essentially built for when most software is demanding and multithreaded, 64-bit environment is common place, and memory flows like a beer tap at Oktoberfest.  When it is launched, it will only seem like it belongs in high-bandwidth server environments and not in your desktop computer.  It will be a few years from now when Core 2 seems like Pentium 4 and Core i7 is a must-have chip.

I believe Core i7 (or at least the concepts it is pushing for) won't go mainstream until Microsoft releases a 64-bit-only operating system.  I just hope AMD takes the cues from Intel and moves in the same direction.  AMD will get kicked in the balls again if they don't.


----------



## Mussels (Oct 13, 2008)

In summary, tri-channel is for users of massive bandwidth, with multithreaded applications.

Today's games do not fit that category; DirectX 11 games will (native multithreading), and any form of media encoding definitely will.

I bet VMware will run uber fast on these systems.


----------



## _jM (Oct 13, 2008)

This is like one of those "I told ya so" posts. I knew with the new chips there would be some kind of flaw, and it happens to be in the tri-channel RAM.


----------



## FordGT90Concept (Oct 13, 2008)

Mussels said:


> Todays games do not fit that category, directX 11 games will (native multithreading), and any form of media encoding definately will.


Which reminds me of yet another thing: I believe the future of GPUs will be much like CPUs.  That is, less complicated cores but more of them, like what Intel is doing with Larrabee.  The more GPUs you have, the more requests the CPU receives, and therefore more leads to more.

It is awkward how manufacturers are pushing for server technology in home computers.  I mean, ten years ago it was all about the clock speed.  Today, they have realized that clock speed isn't virtually unlimited and have looked to mainframe servers for how to fix it: more processors.  Because more processors means more of everything (sockets, DIMMs, power, etc.), they had to find a way to make it more affordable and marketable.  The answer came in the form of multiple cores on the same CPU die.  The GPU crowd is now realising the same thing, but there is a great deal of latency involved.  Once the GPU crowd jumps on the same multi-core bandwagon the CPU crowd has been on for several years, games will start to benefit from CPUs with lots of cores.


----------



## tkpenalty (Oct 13, 2008)

Well, a triple memory channel setup is basically somewhat ahead of its time for general consumers; I mean, games aren't even designed to make use of this advantage.


----------



## Mussels (Oct 13, 2008)

tkpenalty said:


> Well a triple memory channel setup basically, is somewhat ahead of its time for the general consumers, I mean the games aren't even designed to make use of this advantage.



Indeed. If a game was, it'd perform terribly on modern hardware. Kinda like running 1920x1200 on a card with a 64-bit memory bus.


----------



## DarkMatter (Oct 13, 2008)

FordGT90Concept said:


> Ultimately, I think it's time to stop with DDR and move to something that accomplishes more per work cycle like QDR or ODR.



How in hell do you make QDR or ODR? (I understand it as Quad/Octo Data Rate.)

DDR is called that because it works on the clock's low and high states. That is Double Data Rate, so again, how do you do Quad/Octo Data Rate if your clock signal only has two states?
As I see it, multiplexing the clock signal in time is not an option, because isn't that the same as just running the memory at twice the speed?

EDIT:  Forget about QDR, I'm stupid, I forgot you still have rising and falling edges like in Intel's Quad pumped FSB. I still fail to see how you could do 8 ops per cycle though.


----------



## Deleted member 3 (Oct 13, 2008)

DarkMatter said:


> How in hell do you make QDR or ODR? (I understand it as Quad/Octo Data Rate)
> 
> DDR is such because it works in clock's low and high states. That is Dual Data Rate, so again how do you do Quad/Octo data rate if your clock signal only has two states?
> As I see it, multiplexing the clock signal in time is not an option, because isn't that the same as just running the memory at twice the speed?
> ...



XDR sends 8 bits per clock. Not sure about the theory behind it, though. Either way, it already exists, so apparently it is possible.


----------



## Deleted member 3 (Oct 13, 2008)

Oh, http://en.wikipedia.org/wiki/XDR2_DRAM

HDR


----------



## btarunr (Oct 13, 2008)

Yes, claimed to be faster than GDDR5 in its applications: http://www.rambus.com/us/products/xdr2/xdr2_vs_gddr5.html   It comes from Rambus itself, though. And a broad memory interface isn't something software needs "optimizations" for; it's just a physical thing. You have a beefy 192-bit-wide memory interface, and an accordingly stepped-up memory bandwidth.


----------



## SimFreak47 (Oct 13, 2008)

btarunr said:


> Comes from Rambus itself though.


Gotta be pretty expensive.


----------



## FordGT90Concept (Oct 13, 2008)

DarkMatter said:


> I still fail to see how you could do 8 ops per cycle though.


At eight points along the sine wave.  A wave is like a string where the entire length of the string represents a single wavelength.  Instead of reading/writing at two points like we do with DDR, we read/write at eight points along it (the rising edge, peak, falling edge, intersection, falling edge, summit, rising edge, intersection).

History tells us anything made by Rambus is doomed to failure in the PC world; just look at RDRAM and the brief stint Intel had with it.  Rambus specializes in high-performance, soldered-in situations.  They're kind of like Apple, come to think of it.  They make a product, don't care what others say about it, and just expect people to come crawling to them for licensing.

It's JEDEC (a forum including all major processor and memory manufacturers) that has to decide when it's time to move to a new memory technology.  I'm afraid we're probably going to be stuck with DDR derivatives until photonic processors come out. :shadedshu


----------



## DarkMatter (Oct 13, 2008)

DanTheBanjoman said:


> XDR sends 8 bits per clock. Not sure about the theory behind it though. Either way it already exists so it is possible apparently



First of all, three things:

1- I need more sleep.
2- I was right in the first place. Falling and rising edges are still ONLY 2; the left edge of the low state is the same as the right edge of the high state. Quad pumping is done using 2 clocks with a 90º phase difference.
3- XDR, AFAIK, uses a ring bus to access the different memory banks, so it is effectively multiplexing the data signals, and it's a completely different approach to SDRAM. It's also different from what I said about multiplexing the clock signal, which would be pointless IMO: if memory can run faster, just run it faster; IMO the FSB could easily keep up. In fact, I have always considered the FSB to be so "slow" compared to the CPU clock because the memory was even slower. And if you are doubling the memory bits/banks per clock, why multiplex the external clock (or use two differently phased signals) to be able to use them, and not just double the lanes?

That being said, I think I have to elaborate more on my question. Using XDR as main memory is out of the question; we could do that (with its pros and cons), but that wouldn't be using QDR/ODR SDRAM. My question is how and why you would use Quad Pumped Synchronous RAM when, to do that, you have to double the accessible bits per clock of your memory chips without obtaining the benefits of a fully parallel design, if you could just run the memory twice as fast. I'm going to make a diagram of what I mean, because I don't know how to explain it better right now, and I'm sure no one will understand this mess.
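The point about two clocks with a 90º phase difference can be illustrated with a toy simulation; this is a sketch of the edge-counting idea only, not of any real memory interface:

```python
# One clock gives 2 edges per cycle (DDR); adding a second clock
# 90 degrees out of phase yields 4 distinct edge events (quad pumping).

def square(t, phase=0.0):
    """Ideal square wave with period 1.0: 1 in the first half, else 0."""
    return 1 if ((t + phase) % 1.0) < 0.5 else 0

def edges_per_cycle(clocks, steps=1000):
    """Count time steps where any clock changes state over one period."""
    count = 0
    prev = [clk(0.0) for clk in clocks]
    for i in range(1, steps + 1):
        t = i / steps
        cur = [clk(t) for clk in clocks]
        if cur != prev:
            count += 1
        prev = cur
    return count

clk0 = lambda t: square(t)
clk90 = lambda t: square(t, phase=0.25)  # 90 degree phase shift

print(edges_per_cycle([clk0]))         # 2 edges: DDR
print(edges_per_cycle([clk0, clk90]))  # 4 edges: quad pumping
```

Each clock alone still only has two edges per cycle; the extra transfer opportunities come from interleaving the two phase-shifted clocks, which is the crux of the "why not just run it faster" question.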


----------



## DarkMatter (Oct 13, 2008)

FordGT90Concept said:


> At eight points along the sine wave.  A wave is like a string where the entire length of the string represents a single wavelength.  Instead of reading/writing at two points like we do with DDR, we read/write at eight points along it (the rising edge, peak, falling edge, intersection, falling edge, summit, rising edge, intersection).
> 
> History tells us anything made by Rambus is doomed to failure in the PC world; just look at RDRAM and the brief stint Intel had with it.  Rambus specializes in high performance, soldered in situations.  They're kind of like Apple come to think of it.  They make a product, don't care what others say about it, and just expect people to come crawling to them for licensing.
> 
> It's JEDEC (a forum including all major processor and memory manufacturers) that has to decide when it's time to move to a new memory technology.  I'm afraid we're probably going to be stuck with DDR derivitives until photon processors come out. :shadedshu



I was asking exactly about that. You CAN'T ask any digital circuit to understand such things as peak, intersection, etc. You only have two states (low/high) at your disposal. What you are saying would be like using an 8-state machine, which of course would be ideal, but impossible with current technology. Otherwise the whole digital world would be based on more-than-2-state machines!!

A circuit can know when it is in the high/low state OR when it is changing from low to high and vice versa, as is the case with DDR. But once it is in one state, how does it know it has to perform another task? It can't, until another edge comes.


----------



## FordGT90Concept (Oct 13, 2008)

DarkMatter said:


> I was asking exactly for that. You CAN'T ask any digital circuit to understand such things as peak, intersection, etc.


Which is to suggest that QDR technology and beyond work primarily via analog signals.


----------



## Mussels (Oct 13, 2008)

System memory isn't something I'd want left to something as unstable as analogue voltages. They'd be so prone to interference...


----------



## niko084 (Oct 13, 2008)

Mussels said:


> Video encoders are going to cream themselves, at least.



Eh, doubtful... Most people I know that are heavy into video encoding are running dual quads and 8+ GB of RAM. If you are really into it, you can afford a system built for it; if you are doing it as something for yourself, the difference between a quad and a quad with HT isn't going to be enough, I think, especially considering the overclockability of the Core 2 quads.


----------



## Deleted member 3 (Oct 13, 2008)

niko084 said:


> Eh, doubtful... Most people I know that are heavy into video encoding are running dual quads, and 8+ gb of ram. If you are really into it you can afford the system built for it, if you are doing it as something for yourself, the difference between a Quad and a Quad with HT isn't going to be enough I think, especially considering the over clock ability of the Core2 quads.



Though in a few months, dual 6-core machines with HTT will cost as much as the current dual quads. Of course, it'll be interesting to know how it scales. This is where CSI should show its power.


----------



## Mussels (Oct 13, 2008)

niko084 said:


> Eh, doubtful... Most people I know that are heavy into video encoding are running dual quads, and 8+ gb of ram. If you are really into it you can afford the system built for it, if you are doing it as something for yourself, the difference between a Quad and a Quad with HT isn't going to be enough I think, especially considering the over clock ability of the Core2 quads.



triple octas with HT and 16GB of ram will get to them soon enough!

HT is something that *will* apply to video encoding, as will the RAM bandwidth. Can't you see those graphs in the first post!

Yeah, it's not video encoding, but in most reviews I see, the performance gains for video encoding tend to be very similar to WinRAR/WinZip compression tasks.


----------



## DarkMatter (Oct 13, 2008)

Analog signals are totally contrary to the goal digital computing was created for: efficiency. You want to use as little energy as you can, and circuits are bound to some degree of inaccuracy. In fact, digital in reality is not 0 or 1. It works as if anything above, say, 0.75 V is seen as 1 and anything below 0.25 V is seen as 0. You can't make a whole chip run at a specified voltage, especially because every circuit has resistance, so while the first transistor gets 1 V, the last one can easily get only 0.8 V (add to that the fact that different power supplies, mobos, etc. give different input voltages). You have to design your circuit with this in mind. Now imagine you add more states: how do you know which state 0.36 V represents THROUGHOUT your whole chip, when the voltage for the same state can vary by almost that amount? You can't, and you would only have one option: increase your input voltage by an order of magnitude...
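The threshold argument can be made concrete with a toy receiver model; the 0.75/0.25 cutoffs echo the figures in the post and are illustrative, not from any real logic family:

```python
# A binary receiver doesn't see exact voltages, only whether the
# level clears a high threshold or stays under a low one.

V_IL, V_IH = 0.25, 0.75  # max voltage read as 0, min voltage read as 1

def read_bit(voltage):
    """Binary read: tolerant of drift between the two thresholds."""
    if voltage >= V_IH:
        return 1
    if voltage <= V_IL:
        return 0
    return None  # undefined region: neither a clean 0 nor a clean 1

# A 1 V signal that sags to 0.8 V along the chip still reads as 1:
print(read_bit(1.0), read_bit(0.8), read_bit(0.1))  # 1 1 0
# With, say, 4 voltage levels, the margins shrink and the same
# sag would push a level into a neighboring state.
```

The wide dead band between the thresholds is exactly the noise margin that multi-level (analog-ish) signaling gives up.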


----------



## niko084 (Oct 13, 2008)

DanTheBanjoman said:


> Though in a few months dual 6-core machines with htt will cost as much as the current dual quads. Of course, it'll be interesting to know how it scales. This should be where CSI should show its power.



Indeed, but I think even with scaling it won't much matter comparing dual 6-core chips to dual quads... unless someone manages 5 GHz out of an air-cooled Xeon 5k series...

I know when I moved from my E6750 @ 3.2 to my X3210 @ 3.2, my encoding speeds increased by about 50%.


----------



## FordGT90Concept (Oct 13, 2008)

Mussels said:


> system memory isnt something i'd want left to something as unstable as analogue voltages. they'd be so prone to interference...


Both analog and digital are prone to interference.  I think I said it wrong, though.  Analog and digital are two different means of sending the actual data on a wave.  Digital is taking the wave and pulling specific information out of it (binary).  Analog is converting the wave into a format that can be used, like what happens in a CRT.

Imagine the string I spoke of earlier having extra points of information (up to 16 so far) induced into it.  As long as you read the signal on the other end the same way it was produced, you'll end up with more information on the same stream of electrons in the same amount of time.  So on a single sine, you can induce and extract any number of points of data from it.

How exactly they do this, I have no idea.  I don't even know how deep ODR and HDR go (controller to translator on the stick, or direct to the memory modules).  I do know it works, as seen in the PS3.


----------



## FordGT90Concept (Oct 13, 2008)

DarkMatter said:


> Analog signals are totally contrary to the goal for what digital computing was created: efficiency. You want to use as few energy as you can and circuits are bound to some degree of inaccuracy. In fact digital in reality is not 0 or 1. It works as if anything above 0,75 is seen as one and anything below 0,25 is seen as 0. You can't make a whole chip run at an specified voltage, specially because every circuit has resistance, so while the first transistor gets 1v the last one can easily get only 0,8v (add to that the fact that different power supplies, mobos, etc. give different input voltages). You have to make your circuit with this in mind. Now imagine you add more states, how do you know which state is 0,36V THROUGHOUT your whole chip when the voltage for the same state can change almost by that amount? You can't and you would only have one option: increase your input voltage by an order of magnitude...


This is about all I can find on the subject of how they do it:


> http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2341
> The next technology that FlexIO enables is DRSL with LVDS (Low Voltage Differential Signaling), which is a technology similar to what Intel uses in the Pentium 4 to reduce power consumption of their high-speed ALUs.  We will actually explain the technology in greater detail later on this week in unrelated coverage, but the basic idea is as follows: normally the lower the voltage you run your interfaces at, the more difficult it becomes to detect an electrical "high" from an electrical "low."  The reason being that it is quite easy to tell a 5V signal from a 0V signal, but telling a 0.9V signal from a 0V signal becomes much more difficult.  DRSL instead takes the difference between two voltage lines with a very low voltage difference and uses that difference for signaling.  By using low signal voltages, you can ensure that even though you may have a high speed bus, power consumption is kept to a minimum.  The technology isn't  quite sophisticated enough to make the transition to the mobile world, but with some additional circuitry to dynamically enable/disable interface pins it would be quite easy to apply FlexIO to mobile applications of the Cell architecture.


----------



## Morgoth (Oct 13, 2008)

i want xdr


----------



## CDdude55 (Oct 13, 2008)

As I've said, as a gamer this shows even more why I am not getting Core i7.

Core i7, I think, will be a really good thing among the video encoders and Photoshop people (and folding); for gaming they aren't that great. Even if the apps aren't multi-threaded, that still isn't much of an excuse for why it can't beat the QX9770 with the apps of today.

Maybe when games use more cores, Core i7 will be useful, but even then I don't see a huge leap over the Core 2s.

Hearing this is even more disappointing news for the gamers.


----------



## niko084 (Oct 13, 2008)

I think the i7 will have the market that they seem to be pushing it toward...

Professional workstations... They are not overclocked, and they need to be powerful and stable.


----------



## DarkMatter (Oct 13, 2008)

That's XDR again, which has nothing to do with SDRAM memory, and it isn't analog by any means. Nor is it octal-data-rated in the same way that DDR SDRAM is double-data-rated.

It just uses 8-bit-wide lanes to achieve 8 bits per clock per lane, but I fail to see how that is octal pumped.

From the Cell (microprocessor) wiki article:



> The system interface used in Cell, also a Rambus design, is known as FlexIO. The FlexIO interface is organized into 12 lanes, each lane being a unidirectional 8-bit wide point-to-point path. Five 8-bit wide point-to-point paths are inbound lanes to Cell, while the remaining seven are outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) at 2.6 GHz. The FlexIO interface can be clocked independently, typ. at 3.2 GHz. 4 inbound + 4 outbound lanes are supporting memory coherency.



It's a completely different technology, with its pros and its cons. It's been a long time since DDR SDRAM proved to be the more economical RAM solution while providing almost the same performance as Rambus memory. XDR is better for embedded solutions that don't require too much memory.

I also still fail to see how you could do quad-pumped SDRAM, that is, where each memory cell performs 4 ops per clock cycle. And I also don't understand what the benefit of that would be versus a DDR RAM at double the speed. I.e., if your memory cells can do 1600 MT/s, wouldn't a DDR running at 800 MHz be better (simpler, easier to implement, cheaper...) than a "QDR" at 400 MHz?
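That closing question reduces to simple arithmetic: the transfer rate is just the base clock times the pump factor, so the two schemes land on identical throughput:

```python
# Transfer rate = base clock * transfers per clock; the pumping
# scheme alone buys nothing if the cells can run at the same MT/s.

def transfers_per_sec(base_clock_hz, pump_factor):
    """Effective transfer rate in transfers per second."""
    return base_clock_hz * pump_factor

ddr_800 = transfers_per_sec(800e6, 2)  # DDR: two transfers per clock
qdr_400 = transfers_per_sec(400e6, 4)  # "QDR": four transfers per clock
print(f"DDR-800: {ddr_800 / 1e6:.0f} MT/s, QDR-400: {qdr_400 / 1e6:.0f} MT/s")
```

Both come out to 1600 MT/s, which is why the pumping scheme only matters when the external clock, not the memory cells, is the limiting factor.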


----------



## BOSE (Oct 13, 2008)

*What I would want to see.*

What I really want to see is multiple programs running at once. Tri-channel might not be good for games or most other applications, but perhaps it will do well in a multitasking world, where people are running at least 4 programs at once.

We need multitasking benchmarks.


----------



## masterbw2000 (Oct 13, 2008)

This review is flawed.

They did not use a Tri-Channel certified kit for the test. The system was running in dual-channel mode although three sticks were used.

Tri-channel REQUIRES a Tri-Channel Certified Memory Kit, like the A-DATA news we have seen previously.
http://www.techpowerup.com/72926/A-Data_Releases_Tri-Channel_Memory_Kits_for_Intel_X58_Platform.html

You cannot just buy 2 x Dual Channel Kits and grab 3 out of the 4 sticks, because that will not enable true Tri-Channel mode.


----------



## BOSE (Oct 13, 2008)

masterbw2000 said:


> This review is flawed.
> 
> They did not use Tri-Channel certified kit to do the test. The system was running at Dual channel mode although three are used.
> 
> ...



No, you don't. It's all the same RAM. It's just a selling gimmick for people like you.

You can either buy 3 packs of 2x1GB, or you can buy 2 packs of 3x1GB sticks. It's all the same.


----------



## niko084 (Oct 13, 2008)

masterbw2000 said:


> This review is flawed.
> 
> They did not use Tri-Channel certified kit to do the test. The system was running at Dual channel mode although three are used.
> 
> ...



Although that is "certified", I highly doubt that it is "required"...

First of all, there is no way for the computer to tell if it's certified or not, and I doubt it's an entirely different line of RAM.

It's like the dual-channel certified kits; you didn't need those either, but they said you did.


----------



## masterbw2000 (Oct 13, 2008)

BOSE said:


> No you dont. Its all the same RAM. Its just a selling gimmick for people like you.
> 
> You can either buy 3 packs of of 2x1GB or, you can buy 2 packs of 3x1GB sticks. Its all the same.




Hello. There is no need to argue about this, because I had doubts about it myself previously.
Or you can say that both MB & RAM manufacturers need to cross-certify, because the MB BIOS is involved in this as well.
But of course, you are most welcome to check with ASUS, Gigabyte, MSI, or any manufacturer that is going to sell X58 boards.


----------



## Morgoth (Oct 13, 2008)

so what point are you making?


----------



## DarkMatter (Oct 13, 2008)

You don't need any kind of certification; it's all marketing BS.


----------



## niko084 (Oct 13, 2008)

masterbw2000 said:


> Hello. There is no need to argue about this because I had doubts myself about that too previously.
> Or you can say that both MB & RAM Manufacturers need to Cross-Certify because MB BIOS is involved in that as well.
> But of course, you are most welcome to check with ASUS, Gigabyte, MSI, or any manufacturer that is going to sell the X58 boards.



The board manufacturers also list "certified" memory... yet, hmm, almost everything works..

It's multi-level marketing and cross-marketing; if ASUS says Crucial RAM, Micron pays them.


----------



## wolf2009 (Oct 13, 2008)

masterbw2000 said:


> Hello. There is no need to argue about this because I had doubts myself about that too previously.
> Or you can say that both MB & RAM Manufacturers need to Cross-Certify because MB BIOS is involved in that as well.
> But of course, you are most welcome to check with ASUS, Gigabyte, MSI, or any manufacturer that is going to sell the X58 boards.



Wait a minute, who do you work for?

Since you are from Taipei, Taiwan, maybe you have some of these samples? You may be right, but that is a big maybe.

I also think it's marketing BS.


----------



## BOSE (Oct 13, 2008)

It just makes it easier for people to buy RAM in a pack of 3, instead of buying a pack of 2 and then an additional stick of RAM.

It's common sense. This way, an average Joe or Jane feels better about their purchase because it has the word "certified" on the package.

So let it go.


----------



## masterbw2000 (Oct 13, 2008)

I can tell you that X58 sample boards are out there enjoying being "pumped". The launch date is coming up soon.

Notes:
1. Three (3) sticks out of dual-channel memory kits: the test didn't show any gain.
2. The certified version of tri-channel memory shows a large performance gain.
3. There are circuitry differences.


----------



## Morgoth (Oct 13, 2008)

There is no difference between a dual pack and a tri pack; they're both physically and technically the same.
If I buy 3x 1GB singles, or 3 packs of 2x2/2x1, or a 3-pack, it all runs tri-channel, and can also run dual-channel.
I will prove to you that a 3x dual-channel pack can run tri-channel once I get my stuff.


----------



## niko084 (Oct 13, 2008)

Morgoth said:


> There is no difference between a dual pack and a tri pack; they're both physically and technically the same.
> If I buy 3x 1GB singles, or 3 packs of 2x2/2x1, or a 3-pack, it all runs tri-channel, and can also run dual-channel.
> I will prove to you that a 3x dual-channel pack can run tri-channel once I get my stuff.



You can also verify that it's running in dual or tri-channel in the BIOS, as well as in the OS.


----------



## PP Mguire (Oct 13, 2008)

Holy Jesus, you guys know too much! Y'all sound like my damn dad where this signal stuff is concerned.


----------



## niko084 (Oct 13, 2008)

PP Mguire said:


> Holy Jesus you guys know to much! Yall sound like my damn dad where this signal thing is concernd.



Some of us are into more than computers...


----------



## PP Mguire (Oct 13, 2008)

Yeah, he has a few electronics and other degrees besides audio engineering. When he starts talking about that kind of stuff, I get so lost.

But yeah, like they were saying, it's like the old days of dual-channel. You don't need a special kit, because it's all the same. I never ran matching RAM and it still ran in dual channel.


----------



## niko084 (Oct 13, 2008)

masterbw2000 said:


> Please prove it to us later. But what's the point if we already know the answer?
> Sure, you will get some points more for trying to enable tri-channel on dual-channel type RAM, thus making tri-channel look bad.
> But the real tri-channel will give you MORE points, that's the fun part.
> 
> Will you be getting the evaluation samples, or the mass production version?



So now it's Fake vs. Real Tri-Channel, huh...

Buddy, take a hike or show some evidence outside of marketing hype.


----------



## BOSE (Oct 13, 2008)

masterbw2000 said:


> 2. Certified version of Tri-Channel memory shows large performance gain.
> 3. Circuitry differences.




Says who?? Where is it stated?


----------



## Morgoth (Oct 13, 2008)

Can you tell me what physical and technical differences there are between a dual pack / single modules and a tri-channel pack?

I'm not talking about the packaging.

I pre-ordered the MSI X58 Eclipse board and the webshop has already shipped it to me.


----------



## masterbw2000 (Oct 13, 2008)

That's definitely a secret from the memory manufacturers as of now, but they did say that there are differences. Like I said earlier, I doubted it initially too, so I tried 3 identical sticks out of two dual-channel kits, compared them with just 2 sticks, then benchmarked it.


----------



## Morgoth (Oct 13, 2008)

And how many percent increase was it? Got a source?
What memory modules did you use?


----------



## niko084 (Oct 13, 2008)

Morgoth said:


> And how many percent increase was it? Got a source?
> What memory modules did you use?



And how about a few screenshots of the benches...


----------



## masterbw2000 (Oct 13, 2008)

Sorry, I would love to, but unfortunately I can't, and I'm sure y'all understand.
Please just wait till Morgoth gets his stuff and then we will find out. Thank you.

Patience is a virtue.


----------



## BOSE (Oct 13, 2008)

masterbw2000 said:


> Sorry, I would love to, but unfortunately I can't, and I'm sure y'all understand.
> Please just wait till Morgoth gets his stuff and then we will find out. Thank you.



Morgoth is allowed to post the info and screenshots and you can't?? That's BS, dude.

You, sir, are a liar and a hypocrite.


----------



## masterbw2000 (Oct 13, 2008)

Dear BOSS, thanks for the comment.

I don't need to please you.

http://forum.coolaler.com/showthread.php?p=2185937
http://forum.coolaler.com/showthread.php?t=191847

Yup, different RAMs mixed up.


----------



## niko084 (Oct 13, 2008)

Mussels said:


> Tri channel will help prevent bottlenecks as we all go 8GB+ in our systems, and i think we need benchmarks with MORE than a measly 2GB of ram, before calling it a failure.



I think this is more the issue than the idea that it's not running in tri channel.


----------



## BOSE (Oct 13, 2008)

masterbw2000 said:


> Dear BOSS, thanks for the comment.
> 
> I don't need to please you.
> 
> http://forum.coolaler.com/showthread.php?p=2185937




Look at the RAM sticks in this picture: two of them are the same and one of them is different.  Durrrrr!!!! 

http://www.coolaler.com.tw/coolalercbb/INTEL_I7_EXTREME965_X58/30.JPG


p.s. He is using G.Skill RAM, and they haven't made Tri-Channel kits yet!

A better pic of his RAM: two black sticks and one silver stick. 

http://www.coolaler.com.tw/coolalercbb/ECS_9800GTX+SLI/12.JPG


----------



## swaaye (Oct 13, 2008)

This CPU isn't designed for games that can only use one or maybe two threads. Nehalem is probably going to look disappointing if you want more speed for such apps. But man, if you give this thing an app that can pound all 8 threads, it eats every other CPU alive. The whole architecture is designed to extract more performance from each core when there are many threads.

I don't know of any game that will really leverage the design here. SupCom is probably the most aggressively parallelized game engine and it can't peg more than about 3 cores. This chip may be ahead of its time...


----------



## PP Mguire (Oct 13, 2008)

That dude just got owned by his own linkage. Take a hike, bro; these guys don't tolerate liars!!


----------



## masterbw2000 (Oct 13, 2008)

I knew you'd say that.
So take a look at the results from certified ones later.


----------



## DarkMatter (Oct 13, 2008)

masterbw2000 said:


> Dear BOSS, thanks for the comment.
> 
> I don't need to please you.
> 
> ...



In those links (in the second one, with 6 identical sticks) we can clearly see that it's not benefiting from triple channel one bit. The memory running at 1800+ MHz is ~35-40% faster than at 1333 MHz. If we add that to the ~15000 MB/s results in the OP graphs, we get a theoretical ~20000-21000 MB/s, around what the guy is getting with his overclocked PC. No benefit from triple channel.
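The scaling argument above reduces to quick arithmetic. A minimal sketch; the input figures are the approximate ones read off the OP graphs and the linked screenshots, so treat them as assumptions:

```python
# Rough sanity check of the clock-scaling argument (assumed figures:
# ~15000 MB/s at DDR3-1333 in the OP graphs, ~21000 MB/s in the
# overclocked 1800+ MT/s run from the linked thread).
base_clock = 1333          # MT/s, the OP's dual-channel configuration
oc_clock = 1800            # MT/s, the overclocked run
base_bandwidth = 15000     # MB/s, measured at 1333 MT/s

scaling = oc_clock / base_clock       # ~1.35, i.e. ~35% faster clock
expected = base_bandwidth * scaling   # what clock scaling alone predicts

print(f"clock scaling factor: {scaling:.2f}")
print(f"bandwidth expected from clock alone: {expected:.0f} MB/s")
```

If clock scaling alone already accounts for the overclocked result, the third channel contributed essentially nothing, which is the point being made.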



masterbw2000 said:


> I knew you'd say that.
> So take a look at the results from certified ones later.



So where are those results? Or are the ones in the second link (which you added after an edit) the ones with certified modules?


----------



## BOSE (Oct 13, 2008)

This is with the new A-Data Tri-Channel kit. It's posted on this website, btw. 

link -  http://www.techpowerup.com/73397/A-DATA_and_ASUS_Demonstrate_Intel_Nehalem_s_DDR3_Performance.html

pic - http://www.techpowerup.com/img/08-10-09/a-data-x58-ddr3-bench.jpg *At 2000 MHz*




And this is the same test done by the guy from the link you provided earlier. He is using G.Skill RAM.

http://www.coolaler.com.tw/toppc/I7920/20087.JPG *At 1867 MHz*


As you can see, the read and write speeds of both RAM kits are almost the same. That proves the point that you just got OWNED!


----------



## niko084 (Oct 13, 2008)

Alright, chill out guys. All we are really showing is that with the "current" software that "has" been used to test tri-channel RAM, we see nothing that makes us want to buy it.

While I doubt you will need a tri-channel kit for it to work, I do believe that tri-channel will show its benefits down the line with the proper applications.


----------



## DarkMatter (Oct 14, 2008)

BOSE said:


> This is with new A-Data Tri-Chanel Kit. Its posted on this website btw.
> 
> link -  http://www.techpowerup.com/73397/A-DATA_and_ASUS_Demonstrate_Intel_Nehalem_s_DDR3_Performance.html
> 
> ...



Hmm, it's interesting that 200 MHz more didn't help increase the bandwidth over what 1800 MHz offered. Considering 20000 MB/s is also below what DDR3 1800 MHz should offer relative to 1333 MHz's 15000 MB/s (assuming the same efficiency in both cases), I think we can take one note: a Nehalem CPU overclocked to its limits can only offer 20000 MB/s in that specific benchmark, and on stock it can probably "only" give 15000 MB/s, which is much better than Core2 anyway. My point is that maybe, in that benchmark, the memory is bottlenecked by the CPU!

It's no surprise to me anyway: memory performance has increased a lot in recent years, while CPU performance hasn't kept the same pace. We have gone from memory-bottleneck situations before 2004 to diminishing returns beyond DDR2 667 in 2008. 
A CPU that is 33-50% faster than Core2 will never be able to use much more than 33-50% more memory bandwidth than Core2, but thanks to proper usage of DDR3 and the IMC, the available memory bandwidth is much, much greater. Also thanks to the IMC it CAN cope with more REAL bandwidth than that 33-50% increase, but it still has a limit. 

Adding a third channel increases the theoretical bandwidth, but because the CPU can't work with such high bandwidth we see no gains. Keep in mind DDR3 1333 has a theoretical peak bandwidth of about 10800 MB/s per channel (21600 MB/s in dual, 32400 in triple channel), so 15000 MB/s in dual channel is way below that mark. On the other hand, in the charts we can see the single-channel result sitting at 10300 MB/s, pretty close to its peak bandwidth, one more detail that makes me think I'm right about this.
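The per-channel figures above follow from the standard peak-bandwidth formula: transfers per second times a 64-bit (8-byte) channel width. A minimal sketch (the exact formula gives ~10667 MB/s for DDR3-1333; the post rounds slightly upward):

```python
# Theoretical peak bandwidth of DDR3: transfers/s x bus width x channels.
# Each DDR3 channel is 64 bits (8 bytes) wide.
def peak_bandwidth_mb_s(transfers_mt_s, channels, bytes_per_transfer=8):
    """Peak bandwidth in MB/s for a given transfer rate and channel count."""
    return transfers_mt_s * bytes_per_transfer * channels

single = peak_bandwidth_mb_s(1333, 1)   # one DDR3-1333 channel
dual = peak_bandwidth_mb_s(1333, 2)     # dual channel
triple = peak_bandwidth_mb_s(1333, 3)   # triple channel

print(f"single: {single} MB/s, dual: {dual} MB/s, triple: {triple} MB/s")
```

Measured results sitting well below the dual-channel peak, while the single-channel result nearly saturates its peak, is exactly the CPU-bottleneck pattern described above.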


----------



## Basard (Oct 14, 2008)

Summing it all up... sorta... yeah, this CPU will be cool when Crysis 6 comes out... lol


----------



## PP Mguire (Oct 14, 2008)

I'm glad I'm just looking at an E8400. These seem to be a flop.


----------



## DarkMatter (Oct 14, 2008)

Well, I wouldn't say so. Just because Nehalem doesn't seem capable of using all the bandwidth it has unlocked by itself, that doesn't mean it's a fail. It's still way faster than Core2 clock for clock and has a lot more bandwidth even in single-channel mode! SMT also seems to work very well.

It's maybe not worth it for games, but for everything else it's fast enough to justify the expense for many people (not me, TBH), just as with any other new CPU. Don't forget it's supposed to be aimed at the server market. Of course an upgrade from a C2Q is not worth it either, but if you were to build a completely new PC and you care about more than gaming, Nehalem is worth a look or two.


----------



## FordGT90Concept (Oct 14, 2008)

I think what it really boils down to is that the IMC in Nehalem is over-engineered, and slower because of it.  They designed the IMC to handle not just four cores, but eight, and maybe even more (as many as 72).  I doubt we will see any significant changes to the IMC until as far off as the Sandy Bridge chips.

But yeah... maybe there is a bug in it.  When AMD shifted to an IMC, they got a huge memory performance boost.  Intel appears to be taking a hit instead.  CIS is a more complex scheme, but you'd think that would show through in more than just gaming benchmarks.

I just wonder how Nehalem runs with FB-DIMMs.


----------



## DarkMatter (Oct 14, 2008)

FordGT90Concept said:


> I think what it really boils down to is that the IMC in Nehalem is over-engineered, and slower because of it.  They designed the IMC to handle not just four cores, but eight, and maybe even more (as many as 72).  I doubt we will see any significant changes to the IMC until as far off as the Sandy Bridge chips.
> 
> But yeah... maybe there is a bug in it.  When AMD shifted to an IMC, they got a huge memory performance boost.  *Intel appears to be taking a hit instead.*  CIS is a more complex scheme, but you'd think that would show through in more than just gaming benchmarks.
> 
> I just wonder how Nehalem runs with FB-DIMMs.



What are you talking about? A hit? Core2 peaks at 8000 MB/s while Nehalem does 15000 MB/s, both in dual channel, where do you see a hit there?


----------



## Wile E (Oct 14, 2008)

masterbw2000 said:


> I knew you'd say that.
> So take a look at the results from certified ones later.



They will be exactly the same. 



DarkMatter said:


> *I also still fail to see how you could do quad pumped SDRAM, that is that each memory cell performs 4 ops per clock cycle. And I also don't understand what would be the benefit of that, versus a DDR RAM with double the speed*. I.e if your memory cells can perform 1600MT/s wouldn't it be better (simpler, easy to implement, cheaper...) a DDR running at 800Mhz than a "QDR" at 400Mhz?


I don't know much about the technical side of RAM, but what about GDDR5? It's rated as QDR, and as far as I was aware, it gets its roots from SDRAM as well.

As a side note, do you think triple channel would come in handy on lower speed modules? Say, DDR3 1066 cas7?


----------



## PP Mguire (Oct 14, 2008)

DarkMatter said:


> Well, I wouldn't say so. Just because Nehalem doesn't seem capable of using all the bandwidth it has unlocked by himself, that doesn't mean it's a fail. It's still way faster than Core2 clock for clock and has a lot more bandwidth even on single channel mode! SMT also seems to work very well.
> 
> It's maybe not worth it for games, but for everything else is faster enough to justify the expenses for many people (not me TBH), just as any other new CPU. Don't forget it's suposed to be aimed at the server market. Off course an upgrade from a C2Q is not worth it either, but if you were to build a completely new PC and you care about more than gaming, Nehalem is worth a look or two.



I only game, so it's not worth it for me to even have a quad, really. A 4+ GHz dually is ideal for me.


----------



## FordGT90Concept (Oct 14, 2008)

DarkMatter said:


> What are you talking about? A hit? Core2 peaks at 8000 MB/s while Nehalem does 15000 MB/s, both in dual channel, where do you see a hit there?


Game performance, where tighter timings yield better FPS than higher bandwidth.  Games don't need a lot of bandwidth (only enough to satisfy the engaged cores), but they need very quick response times.

Maybe the IMC/memory has little to do with poor Nehalem game performance, which reminds me of something else.  Nehalem's architecture has stronger ties to Pentium 4 w/ Hyper-Threading than to the Core 2 architecture.  We all remember how the Athlon 64 was the better gamer but the Pentium 4 w/ Hyper-Threading took the cake in terms of multimedia.  That pretty much explains everything.


----------



## e6600 (Oct 14, 2008)

Right now, a Q9550 E0 should last us many years. Very good price/performance chip.


----------



## DarkMatter (Oct 14, 2008)

Wile E said:


> They will be exactly the same.
> 
> 
> I don't know much about the technical side of ram, but what about GDDR5? It's rated as QDR, and as far as I was aware, it gets it's roots from SDRAM as well.
> ...



I have noticed there's an insane number of things which are called QDR and are not exactly that. In this regard Quad/Octal Pumping is a much better description of being able to do 4/8 operations per cycle. AFAIK GDDR5 uses two DDR streams that are combined (multiplexed) to form a 2x faster memory module. Again, it's not that each memory cell performs 4 ops per clock cycle, but that 2 cells perform 2 ops per cycle and send the data together. In practice it's almost the same, as you are doubling the frequency per pin, but it's important to note that the difference with a QDR signal is evident: you don't have control over every operation being made, you can only control them by pairs. For GPUs that kind of parallelism is not a problem, but in CPUs it could create an insane and undesired amount of latency (it already does in GPUs BTW, but it's masked by the parallelism of the GPU).

In this regard it's as if someone called a 2.4 GHz dual-core CPU a 4.8 GHz CPU. It's not the same, and in the case of GDDR5 that's exactly what we do. Although, as I understand it, the memory controller must work at twice the speed of the memory cells, so from that point of view it is twice as fast. It's what I said: if your memory is twice as fast (by any method), make the external clock twice as fast.

This is how I think it works, just from the clock usage perspective:







Consider the input the external clock generator. 

- In DDR a spike is created for every rising and falling edge.

- QDR would create 4 spikes. I don't know how; that's what I've been asking all the time.

- Note how GDDR5's input clock is twice as fast. A completely different thing would be if the input were 100 MHz and it were doubled inside the memory itself and not externally. That's what I said would be pointless, IMO.
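The per-pin point above (a "QDR" bus at clock f moving the same data as a DDR bus at clock 2f) is just rate arithmetic. This is only an illustration of the rates being compared, not a model of any real memory controller:

```python
# Per-pin transfer rate: base clock x transfers per clock cycle.
# The point above: a "QDR" bus at clock f and a DDR bus at clock 2f
# move the same number of transfers per second per pin.
PUMPING = {"SDR": 1, "DDR": 2, "QDR": 4, "ODR": 8}

def transfers_per_second(clock_mhz, scheme):
    """Transfers per second per pin for a given clock and pumping scheme."""
    return clock_mhz * PUMPING[scheme] * 1e6

qdr_at_400 = transfers_per_second(400, "QDR")   # QDR at 400 MHz
ddr_at_800 = transfers_per_second(800, "DDR")   # DDR at 800 MHz

print(f"QDR@400: {qdr_at_400:.2e} T/s, DDR@800: {ddr_at_800:.2e} T/s")
```

Both come out to 1.6 GT/s per pin, which is why the question of "why QDR instead of a faster DDR clock" is a fair one.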


----------



## Hayder_Master (Oct 14, 2008)

Did anyone expect more performance in games with tri-channel RAM? Why does it go down?


----------



## Poisonsnak (Oct 14, 2008)

DarkMatter said:


> ... This is how I think it works, just from the clock being use perspective: ...



Yeah, you're definitely on the right track there.  I have a bit of background in digital signaling, so I know how this stuff works.  Interestingly enough, if you read up on AGP (yes, AGP) it's the same concept: AGP = SDR, AGP2x = DDR, AGP4x = QDR, AGP8x = ODR.

Some references here:
http://en.wikipedia.org/wiki/Agp
http://en.wikipedia.org/wiki/Quadruple_data_rate

I can summarize it here though:

SDR = transmits data on the rising edge of each clock
DDR = transmits data on the rising and falling edge of each clock
QDR = uses 2 clock generators, same frequency, one is 90° ahead (or behind) of the other (e.g. clock #1 has a rising edge, then halfway before its falling edge clock #2 has a rising edge).  Transmits data on the rising and falling edge of both clocks.

In practice you wouldn't actually use 2 generators but instead delay the second signal by 90° somehow but you get the idea.  ODR follows the same sort of pattern except you're using 4 clocks instead of 2.

The 2 clocks 90° apart thing can be hard to visualize so if I have time I'll draw a picture later today.  Another way to think of it (maybe easier) is that the falling edge of the clock could be considered to be a clock signal that is 180° behind the original clock.  In that case:

SDR = 1 clock 0°
DDR = 2 clocks, 0° and 180°
QDR = 4 clocks, 0°, 90°, 180°, and 270°
ODR = 8 clocks, 0°, 45°, 90°, 135°, etc.
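The phase picture above can be sketched numerically. This is just an illustration of where the transfer points fall within one base clock period, assuming ideal, evenly phased clocks:

```python
# Model each scheme as n clocks of the same frequency offset by 360/n
# degrees (rising and falling edges folded in as 180-degree-shifted
# clocks, per the post above), and collect the transfer points.
def edge_times(n_clocks, period=1.0):
    """Evenly spaced transfer times inside one base clock period."""
    return [i * period / n_clocks for i in range(n_clocks)]

for name, n in [("SDR", 1), ("DDR", 2), ("QDR", 4), ("ODR", 8)]:
    print(name, edge_times(n))
```

SDR yields 1 transfer point per period, DDR 2, QDR 4, and ODR 8, all evenly spaced, which is exactly the 0°/90°/180°/270° picture described above.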


----------



## Swansen (Oct 15, 2008)

FordGT90Concept said:


> We see this going from DDR, to DDR2, to DDR3.  Ultimately, I think it's time to stop with DDR and move to something that accomplishes more per work cycle like QDR or ODR.


my vote goes for a completely different memory architecture, my choice being XDR.


----------



## DarkMatter (Oct 15, 2008)

Poisonsnak said:


> Yeah you're definitely on th right track there.  I have a bit of background in digital signaling so I know how this stuff works.  Interestingly enough if you read up on AGP (yes AGP) it's the same concept. AGP = SDR, AGP2x = DDR, AGP4x = QDR, AGP8x = ODR.
> 
> Some references here:
> http://en.wikipedia.org/wiki/Agp
> ...



Yeah, thanks, that's what I thought: phased signals and multiplexed input/output data (or something similar). Now I just need someone to explain why it would be better to use QDR instead of a faster bus, when AFAIK the only reason FSBs (and the like) are not faster is the slow memories. With DDR the benefit is clear, as you can use the rising and falling edges of the same signal, but QDR and ODR require an additional signal (be it a different one or the same one phase-shifted), and I don't see much benefit there for main memory, where good latency is a must*. Maybe the answer is really simple and I'm just missing it, I dunno.

* One of the requirements to convince me is that the advantage of using QDR is not used for memory cell/bank parallelization (like in GDDR5), as that wouldn't be a good solution for main memory.


----------



## Morgoth (Oct 15, 2008)

Psst, remember we're using an IMC now; memory traffic no longer goes through the northbridge.


----------



## DarkMatter (Oct 15, 2008)

Morgoth said:


> Psst, remember we're using an IMC now; memory traffic no longer goes through the northbridge.





DarkMatter said:


> FSBs (or the like)



I used FSB as a generic term. I should have said bus interconnect or something like that. QPI and HT are still buses, and in some way they're still at the "front" of the architecture. FSB is a very specific technology, but IMHO it also describes more or less what HT or QPI is in a generic way. Much like HD means 720/1080p, but the term itself could mean any high resolution.

Anyway, the IMC actually helps my point. As I understand it, the IMC allows for much faster interconnects between the CPU and the memory, so whenever faster memory (by any method) is available, the bus should be made faster instead of using "multiple instances" of the same one. Am I right or not?


----------



## Poisonsnak (Oct 16, 2008)

DarkMatter said:


> ... I now just need someone to explain why would be better to use QDR, instead of a faster bus ...



I think (?) the main reason to use a lower clock frequency is signal synchronization over long distances (PCB traces).  If a 1 GHz signal is travelling along 10 cm of PCB trace at the speed of light, then it takes 3 ns to make the trip, or more importantly, 3 clock cycles.


----------



## DarkMatter (Oct 16, 2008)

Poisonsnak said:


> I think (?) the main reason to use a lower clock frequency is signal synchronization over long distances (PCB traces).  If a 1 GHz signal is travelling along 10 cm of PCB trace at the speed of light, then it takes 3 ns to make the trip, or more importantly, 3 clock cycles.



That doesn't make sense at all; those numbers don't add up. An electron or a hole travelling at light speed will cover 10 cm in 0.33 ns = 10 cm / (300,000 km/s × 1000 m/km × 100 cm/m). So it would have time to make 3 trips per clock cycle.

Anyway, why would that matter? As I see it, it doesn't. It would be like saying that on an assembly line you can't have a product every second because it takes 4 hours for each of them to go from start to finish. It's the production rate which matters, and unless I'm missing something important, the same happens in electronics. Besides, clock speed limits are set by much slower elements, such as the gates' (NAND, NOR...) state-change delay, or the delay in the transistors' state change (one depends on the other, really). Following the analogy, we can compare that to the time it takes to fill the trailers that will carry the goods to another place.

I understand there are limitations on clock speed, but considering the speeds at which GDDR5 runs, doubling what we have on our mobos several times wouldn't be a problem yet.
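The trace-delay arithmetic from this exchange can be redone directly. The PCB propagation speed used here is an assumed rough figure (~half the vacuum speed of light, typical for FR-4), so the numbers are illustrative only:

```python
# Propagation delay of a signal along a PCB trace vs. the clock period.
# In vacuum, light covers 10 cm in ~0.33 ns (the figure given above);
# on FR-4 PCB the signal travels at roughly half that speed (assumed).
C_VACUUM = 3.0e8    # m/s, speed of light in vacuum
C_PCB = 1.5e8       # m/s, rough signal speed on an FR-4 trace (assumption)
TRACE = 0.10        # m, a 10 cm trace
CLOCK = 1.0e9       # Hz, a 1 GHz clock -> 1 ns period

delay_vacuum = TRACE / C_VACUUM          # time of flight in vacuum
delay_pcb = TRACE / C_PCB                # time of flight on the PCB
cycles_in_flight = delay_pcb * CLOCK     # clock periods spent in flight

print(f"vacuum: {delay_vacuum*1e9:.2f} ns, PCB: {delay_pcb*1e9:.2f} ns, "
      f"cycles in flight: {cycles_in_flight:.2f}")
```

Even on a real PCB the flight time is a fraction of one 1 GHz period, not 3 cycles, supporting the correction above; and as the assembly-line analogy says, it's the transfer rate, not the one-way latency, that the clock scheme sets.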


----------



## Morgoth (Oct 16, 2008)

http://en.wikipedia.org/wiki/Speed_of_electricity


----------



## DarkMatter (Oct 17, 2008)

Morgoth said:


> http://en.wikipedia.org/wiki/Speed_of_electricity



Yeah, I didn't want to get into specifics, so I used a simple analogy to differentiate between electron mobility and signal propagation.


----------

