Wednesday, August 13th 2014
Intel Haswell TSX Erratum as Grave as AMD Barcelona TLB Erratum
Intel's "Haswell" micro-architecture introduced the transactional synchronization extensions (TSX) as part of its upgraded feature-set over its predecessor. The instructions are designed to speed up certain types of multithreaded software, and although they are too new for any major software vendor to have implemented them, some of the more eager independent software developers began experimenting with them, only to discover that the TSX implementation is buggy and can cause critical software failures.
The buggy TSX implementation on Core "Haswell" processors was discovered by a developer outside Intel, who reported it to the company, which then labeled it as an erratum (a known design flaw). Intel is addressing the situation by releasing a micro-code update to motherboard manufacturers, who will then release it as a BIOS update to customers. The update disables TSX on affected products (Core and Xeon "Haswell" retail, and "Broadwell-Y" engineering samples).
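For readers who want to see whether their own system still advertises TSX after a BIOS update, here is a minimal sketch, assuming a Linux machine where the kernel lists the two TSX feature flags as `hle` and `rtm` on the `flags` line of `/proc/cpuinfo`. The function name `tsx_flags_reported` is made up for illustration, and the idea that a patched CPU stops listing these flags is an assumption, not something Intel's statement spells out.

```python
# Hypothetical helper: reports which TSX-related flags appear in the text
# of /proc/cpuinfo (Linux only). "hle" and "rtm" are the flag names the
# Linux kernel uses for the two halves of TSX.
TSX_FLAGS = ("hle", "rtm")


def tsx_flags_reported(cpuinfo_text):
    """Return {flag: bool} based on the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in TSX_FLAGS}
    # No 'flags' line at all: treat every flag as not reported.
    return {flag: False for flag in TSX_FLAGS}
```

On a Linux box this could be called as `tsx_flags_reported(open("/proc/cpuinfo").read())`. It only shows what the processor reports to the operating system; it says nothing about whether the erratum itself is present.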
TechReport's Scott Wasson draws a parallel between the TSX erratum and the infamous translation lookaside buffer (TLB) erratum of AMD's "Barcelona" chips, which caused the company to temporarily halt production of its first single-die quad-core Opteron processors and to release similar "performance-impacting" micro-code updates for its consumer Phenom X4 processors. Expect your motherboard vendor to dish out a BIOS update with Intel's micro-code patch very soon.
Source:
TechReport
20 Comments on Intel Haswell TSX Erratum as Grave as AMD Barcelona TLB Erratum
Seems a bit sensationalist doesn't it?
Barcelona's TLB was producing erroneous results in enterprise workloads from day one. As far as I'm aware the only TSX enabled processors are desktop Haswell and whatever Broadwell samples are being sent around...and the grand total of software applications using TSX totals zero.
it reeks of flame bait?
The title in my opinion is justified. It is the conditions that are totally different. It's how, because of different conditions, we sometimes look at the same thing from two totally different angles.
Haswell TSX bug is stupid. As if Intel didn't have enough to make their things right... c'mon intel...
On the other hand, the Barcelona TLB bug "fix" affected ALL software - anyone who bought an AMD CPU with the TLB bug experienced reduced performance overall. No, we don't give a s**t because there's no reason to. The ordinary users who make up 99.999999% of Haswell's marketshare will never use TSX, so they won't be affected. TSX has been around since 2012 and hasn't seen wide adoption. Designing silicon is difficult. Put the two together and it's not difficult to figure out how a bug like this could get into production silicon.
Kind of sucks if you bought Haswell because of the TSX extension, but I'm picking most people don't even know what it is let alone have it on their must have list. That doesn't make a great deal of sense to me. Overall market share has nothing to do with it - even market shares comparing Haswell users to Opteron users (while more relevant - I don't believe Haswell owners make up the 14.6% of their market that Opteron did in Q4 2007). AMD had to cease in-progress shipments of flagship enterprise part that was causing errors in current workloads. How can it affect more people when there is no software that could take advantage of TSX even if it were available? No, they don't (maybe) give a sh!t because they couldn't use TSX in any case. More to the point, disabling something you couldn't use anyway doesn't make the bug a dealbreaker for most people. And that is another very large distinction. Disabling TSX doesn't affect performance using other ISA extensions, where the workaround for the TLB bug for Barcelona came at a ~5-20% performance penalty.
For the Broadwell's already circulating (Broadwell-Y) ? No, but I think only OEM's have them for validation at the moment.
From the Intel spokesflunky's wording, the reason for the bug has been found, so it just comes down to how quickly new lithography masks can be set up and the 8-12 weeks for fabbing the chips. When it's implemented depends on how seriously Intel views the bug. If it's serious then they'd likely get on to it quickly and the new chips will receive new sSpec codes at the very least.
ark.intel.com/products/80807/Intel-Core-i7-4790K-Processor-8M-Cache-up-to-4_40-GHz
How about all the companies that may have purchased Haswell hardware with the intention of using it to improve database performance, along with other time sensitive software performance, and now.....can't.
www.anandtech.com/show/6290/making-sense-of-intel-haswell-transactional-synchronization-extensions
Like banks, and like most of all other financial institutions, where consumers are demanding faster deposit, transfer, and availability times, which requires...... you guessed it, faster database processing for millions of transactions per hour.
Good troll post asshat.
AFAIK, enterprise systems will continue to be Xeon E5 (Grantley) and E7 (Brickland). The former might initially be affected, but since their launches will be staggered it's probably a safe bet that not all will be. The E7 line (Haswell-EX - which is being pushed as the cloud computing big data SKUs) will have TSX enabled. Given your argument isn't exactly watertight how about toning down the insults?
The degree of effect these bugs have on computing is only half the discussion as to whether or not they are equally grave. They are not given the circumstances, but in my case resulted in similar ends. That matters most to a bottom-line company such as Intel.
Now, will the bottom-line be affected equally? Probably not, given that Intel has the experience and resources to make a speedy recovery while keeping a happy face about it. Just saying that this aspect should not be forgotten, as other companies are not as lucky as Intel. Their underestimation has led to much more dire consequences.
btarunr used TechReport as the source and quoted content from the article in his repost. My problem is the fact that the headline doesn't match the article specified and quoted as the source. If btarunr wanted to use that headline he needed to find different/additional reference material. This is not the first time a news piece at TPU has had an unqualified and unrelated headline as clickbait. And like the last time I had already read the source material prior to seeing his repost and was scratching my head as to how someone came up with that headline from the content that was sourced.
They are already out, and have the errata.
www.pcworld.com/article/2464880/intel-finds-specialized-tsx-enterprise-bug-on-haswell-broadwell-cpus.html
As does Broadwell (E5), so the new enterprise chips are going to have it until Intel finds a hardware fix. OEMs already have the chips. So we have hardware in the wild, production systems being built, that cannot support a new feature that was being pushed. For those who have based their purchasing decision on this, it's going to suck. Will the OEM or Intel provide new chips to those people? It wouldn't be the first time Intel has had to do that, although the last time was perhaps before you were born.
My asshat comment was about critiquing of an article title, if you find it to be clickbait, go elsewhere, or do research on the topic and realize it is something show stopping for end users and OEM's.
techreport.com/news/26911/errata-prompts-intel-to-disable-tsx-in-haswell-early-broadwell-cpus
Intel doesn't have a current timeline for the fix, and really considering how much complexity it takes and improvement it was offering they might not be able to implement an actual full fix, perhaps limiting the number of threads or cores that can use it.
Also, damn that is a long list of broken things, 5 pages
www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf
Apparently normal Broadwell-DT will be fixed.