Yes, I did. Did you? The link I posted was an interview with the guy in charge of HBM development @ AMD. You read memory controller traces... I read MEMORY traces... the same things that are used to connect GDDR5 and GDDR3: the PCB bits. That's not part of the memory controller. The interface is wider, yes, but the connections are also denser and slower per pin, which is why far fewer high-speed interconnects have to run through the PCB.
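To put rough numbers on the wide-and-slow versus narrow-and-fast trade-off, here's a back-of-the-envelope sketch. The widths and per-pin rates are assumed round figures (a 32-bit GDDR5 chip at ~7 Gbps/pin, a 1024-bit first-gen HBM stack at ~1 Gbps/pin), not exact product specs:

def bandwidth_gbs(bus_width_bits, per_pin_gbps):
    # peak bandwidth in GB/s = width (bits) * per-pin rate (Gbps) / 8
    return bus_width_bits * per_pin_gbps / 8

gddr5_chip = bandwidth_gbs(32, 7.0)    # ~28 GB/s from 32 fast traces routed through the PCB
hbm_stack = bandwidth_gbs(1024, 1.0)   # ~128 GB/s from 1024 slow, short traces in the interposer

print(f"GDDR5 chip: {gddr5_chip:.0f} GB/s over 32 data pins")
print(f"HBM stack:  {hbm_stack:.0f} GB/s over 1024 data pins")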
I'll quote Joe Macri (AMD employee in charge of HBM) again:
By this diagram, you can see that the interposer is merely full of electrical pathways. The logic die in the HBM stack is what you are thinking about for saving on die space, NOT the interposer.
For a moment, let's assume we're both idiots. Let's just look at the pretty picture.
You've got the package substrate, which is the PCB. You've got a GPU die with just the interconnection points shown. Connect them, using your diagram. What the picture shows is that the interposer can pick up the connection points wherever they happen to sit on the GPU and run conductors three-dimensionally over to the DRAM dies. This means that no matter what the GPU traces look like, it can connect to the DRAM. I'm assuming, given how obvious this statement is, that we can agree on it.
Now back to intelligent people land. What packages do chips come in? There's BGA, which is more expensive because it requires a PCB with multiple layers. Those are a royal PITA. There's packaging that places all of the pins on the outside edge. Those are great for manufacturing, but the trade-off is that the inside of the GPU has to accommodate traces getting to those pins. Conductors are threaded across the die, so die space is wasted just getting signals out to those edge pins.
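For a sense of scale on that packaging trade-off, here's some made-up round-number arithmetic: pads around the perimeter grow linearly with package size, while an area array grows with the square of it.

def perimeter_pins(positions_per_side):
    # pads along the outside edge of a square package only
    return 4 * positions_per_side - 4

def area_array_pins(positions_per_side):
    # BGA-style full grid of balls under the package
    return positions_per_side ** 2

for n in (20, 40, 60):
    print(f"{n} per side: perimeter = {perimeter_pins(n)} pins, area array = {area_array_pins(n)} pins")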
AMD is demonstrating BGA, where there is no need for multi-layer PCBs. They've basically taken that difficulty out of the equation by having an interposer do it. The interposer can be produced relatively cheaply if it uses older lithography, and it's far easier to make than layering traces inside of fiberglass.
So at this point, I need you to be reasonable. Looking at the picture above, how do you connect the GPU to the DRAM? You can argue that a 7+ layer PCB would work, or you can argue that the memory controller has been given free rein to reach for the DRAM chips like an octopus. I'll be blunt: if you can find a way to make 7+ layer PCBs consistently and cheaply, I'll give the argument to you. In my experience, 3 layers is enough of a nightmare.
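Just to show why the PCB routing burden matters, here's a rough trace count. The figures are assumptions based on typical configurations (a 512-bit GDDR5 card with sixteen 32-bit chips, and a Fiji-style part with four 1024-bit HBM stacks), and they ignore address/command/clock lines:

gddr5_data_traces_on_pcb = 16 * 32   # 512 high-speed data traces routed through the fiberglass
hbm_data_traces = 4 * 1024           # 4096 data traces, all inside the silicon interposer
hbm_data_traces_on_pcb = 0           # the package below carries PCIe, display, and power instead

print(f"GDDR5 high-speed data traces on the PCB: {gddr5_data_traces_on_pcb}")
print(f"HBM high-speed data traces (in the interposer): {hbm_data_traces}")
print(f"HBM high-speed data traces on the PCB: {hbm_data_traces_on_pcb}")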
AMD isn't insanely stupid. They add cost by making an interposer. They decrease costs (or improve performance) by decreasing the necessary silicon on the more expensive GPU die and letting the interconnect spaghetti live in the interposer instead. I don't care about the logic die. Unless you can prove otherwise, my understanding is that it is basically just a secondary controller similar to what is already embedded in our current DRAM chips. I haven't seen anything to indicate that isn't the case, and more specifically, this article never indicates otherwise. Their exact words are:
"Although the interposer is essential, the truly intriguing innovation in the HBM setup is the stacked memory. Each HBM memory stack consists of five chips: four storage dies above a single logic die that controls them."
I'm even going out on a limb here, and calling out your other statement. Here it is:
LuLz. You didn't wait for my edits.
I have yet to see anywhere that the IMC is on the interposer. It's mentioned that it is passive (ie, not powered), so that idea that memory control is on the interposer doesn't make sense to me personally. It's also cheap, since it's passive, and just a few layers thick. It IS large, though, so could be fairly complex, but that's not how I read what is posted.
Also, HBM2 apparently removes the need for the interposer. PASCAL cards shown already have a rather normal-looking GPU substrate. So any conjecture about the interposer on future products that are reported to use HBM2 is useless. This also plays into the comparison of HBM vs GDDR5X.
....
In your own article, they explain exactly why your assertion that the interposer may not be needed for HBM is crap. Here's the quote:
"Macri explained that the interposer is completely passive; it has no active transistors because it serves only as an electrical interconnect path between the primary logic chip and the DRAM stacks.
The interposer is what makes HBM's closer integration between DRAM and the GPU possible. A traditional organic chip package sits below the interposer, as it does with most any GPU, but that package only has to transfer data for PCI Express, display outputs, and some low-frequency interfaces. All high-speed communication between the GPU and memory happens across the interposer instead."
I'm just a schmuck, who you can choose to ignore. The problem is that when your expert says the exact opposite, he's no longer an expert somehow? You really should stop arguing here. Your pictures prove my points. Your articles prove my points. It requires a very narrow and difficult-to-justify perspective to claim what you've said. They clearly say that the interposer is offloading high-speed communication that used to be on the GPU package. They clearly diagram that said high-speed communication is between the GPU and DRAM. I don't know how else you want to try to justify that the interposer isn't largely an extension of the memory controller.
In my mind, the only argument that remains is that they didn't outright say that the interposer is an extension of the memory controller. The catch is that they can't. The high-speed interconnects of the memory controller are in the interposer (they confirmed this), as well as the power feeds and interconnects to the rest of the PCB. As such, it's a device which performs multiple functions, some of which are usually handled on the PCB. If you'd like to understand why that is, let's review exactly how carefully Macri chose his words:
"When I asked Macri about this issue, he expressed confidence in AMD's ability to work around this capacity constraint. In fact, he said that current GPUs aren't terribly efficient with their memory capacity simply because GDDR5's architecture required ever-larger memory capacities in order to extract more bandwidth. As a result, AMD "never bothered to put a single engineer on using frame buffer memory better," because memory capacities kept growing. Essentially, that capacity was free, while engineers were not. Macri classified the utilization of memory capacity in current Radeon operation as "exceedingly poor" and said the "amount of data that gets touched sitting in there is embarrassing.""
This is why Macri said that 4 GB on Fiji wouldn't be an issue. He said they had "a couple of engineers" on it, and that simply throwing more RAM at the problem had always been the solution before. What Macri said here is tantamount to "we don't ever optimize," yet he said it in a way that the interviewer bought as a valid explanation for why they needed to optimize now. Talk about getting punched in the gut, then asking "please sir, may I have another?" Whatever else can be said, Macri is aware that the interposer is vital for memory access. He concedes that memory utilization has historically been crap.
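To make the coupling Macri describes concrete, here's a rough sketch of why GDDR5 capacity balloons as you chase bandwidth. Chip density and per-pin rate are assumed round numbers (4 Gb chips at ~7 Gbps), not exact board specs:

def gddr5_config(bus_width_bits, chip_density_gbit=4, per_pin_gbps=7.0):
    chips = bus_width_bits // 32                  # each GDDR5 chip contributes 32 bits of bus
    capacity_gb = chips * chip_density_gbit / 8   # capacity you're forced to buy with that width
    bandwidth_gbs = bus_width_bits * per_pin_gbps / 8
    return chips, capacity_gb, bandwidth_gbs

for width in (256, 384, 512):
    chips, cap, bw = gddr5_config(width)
    print(f"{width}-bit bus: {chips} chips, {cap:.0f} GB minimum, ~{bw:.0f} GB/s")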
After all of that, you're still arguing that AMD hasn't chosen to make gains by introducing cheap processes to save on the expensive ones? You can't see how a 65 nm interposer that saves them a couple million dollars in engineering work makes sense? I'm sorry, but I just can't put it more simply than that. Heck, Macri himself said that high-speed data was the primary purpose of the interposer. If memory access isn't high-speed data, when PCI-e was cited as low speed, I haven't the slightest clue what is.
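If it helps to see that gap in numbers, here's the rough comparison I'm talking about (assumed round figures: PCIe 3.0 x16 at 8 GT/s per lane with 128b/130b encoding, and a Fiji-style four-stack HBM setup at ~1 Gbps/pin):

pcie3_x16_gbs = 16 * 8 * (128 / 130) / 8   # ~15.8 GB/s stays down on the package/PCB
fiji_hbm_gbs = 4 * 1024 * 1.0 / 8          # ~512 GB/s runs over the interposer

print(f"PCIe 3.0 x16: ~{pcie3_x16_gbs:.0f} GB/s")
print(f"HBM (4 stacks): ~{fiji_hbm_gbs:.0f} GB/s")
print(f"Ratio: ~{fiji_hbm_gbs / pcie3_x16_gbs:.0f}x")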