Monday, November 18th 2019

MATLAB MKL Codepath Tweak Boosts AMD Ryzen MKL Performance Significantly
MATLAB is a popular math computing environment used by engineering firms, universities, and other research institutes. Some of its operations can leverage Intel MKL (Math Kernel Library), which is poorly optimized for, and notoriously slow on, AMD Ryzen processors. Reddit user Nedflanders1976 devised a way to restore performance, with gains of anywhere from 20 to 300 percent, on Ryzen and Ryzen Threadripper processors by forcing MATLAB to use advanced instruction sets such as AVX2. By default, MKL queries your processor's vendor ID string, and if it sees anything other than "GenuineIntel...," it falls back to SSE. That puts "AuthenticAMD" Ryzen processors, which have full SSE4, AVX, and AVX2 implementations, at a significant performance disadvantage.
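To make the reported behaviour concrete, here is a toy Python model of vendor-gated code-path selection. It is not MKL's actual dispatcher: the function name `select_codepath`, the feature labels, and passing the override as a string are illustrative assumptions. The only MKL-specific detail taken from this page is that setting `MKL_DEBUG_CPU_TYPE=5` forces the AVX2 path.

```python
from __future__ import annotations

from typing import Optional


def select_codepath(vendor_id: str, cpu_features: set[str],
                    debug_cpu_type: Optional[str] = None) -> str:
    """Toy model of vendor-gated dispatch (hypothetical, not MKL's real code).

    vendor_id:      CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    cpu_features:   ISA extensions the CPU actually supports
    debug_cpu_type: value of MKL_DEBUG_CPU_TYPE, if set
    """
    # Override reported for the tweak: "5" forces the AVX2 path with no
    # vendor or feature check, so it only makes sense on CPUs that have AVX2.
    if debug_cpu_type == "5":
        return "AVX2"
    if vendor_id == "GenuineIntel":
        # Intel CPUs get the widest path their features allow.
        for isa in ("AVX-512", "AVX2", "AVX"):
            if isa in cpu_features:
                return isa
    # Any other vendor string falls back to the SSE baseline,
    # even when cpu_features would allow AVX2.
    return "SSE"
```

In this model, `select_codepath("AuthenticAMD", {"AVX", "AVX2"})` returns `"SSE"`, while the same call with `debug_cpu_type="5"` returns `"AVX2"`.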
The tweak, meant to be applied manually by AMD Ryzen users, forces MKL to use AVX2 regardless of the CPU vendor ID query result. The tweak is as simple as it is powerful: a 4-line Windows batch file with a set of arguments starts MKL in AVX2 mode. You can also make the tweak "permanent" by creating a system environment variable, which applies to all instances of MATLAB, not just those spawned by the batch file. Nedflanders1976 also posted a benchmark script that highlights the performance impact of AVX2; however, you can use your own scripts and post your results.
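The batch file itself is not reproduced here. As a rough, cross-platform equivalent of its per-session approach, the Python sketch below starts MATLAB with `MKL_DEBUG_CPU_TYPE=5` set only in the child process's environment (the variable named in the comments below). The executable name `matlab` and the helper names `mkl_avx2_env` and `launch_matlab` are assumptions; point the launcher at your own MATLAB install. The variable is undocumented, so its effect may differ between MKL versions.

```python
from __future__ import annotations

import os
import subprocess
from typing import Mapping, Optional


def mkl_avx2_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE=5 added.

    The original mapping (os.environ by default) is left untouched, so the
    override only affects processes started with the returned dict.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_DEBUG_CPU_TYPE"] = "5"  # reported to force MKL's AVX2 code path
    return env


def launch_matlab(matlab_exe: str = "matlab") -> subprocess.Popen:
    """Start MATLAB with the override (hypothetical helper; adjust the path)."""
    return subprocess.Popen([matlab_exe], env=mkl_avx2_env())
```

Scoping the variable to one child process mirrors the batch-file approach; the "permanent" variant corresponds to defining `MKL_DEBUG_CPU_TYPE=5` as a system environment variable instead.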
Source:
Nedflanders1976 (Reddit)
67 Comments on MATLAB MKL Codepath Tweak Boosts AMD Ryzen MKL Performance Significantly
There are many things that made me start hating Intel, but probably the biggest for me was the suppression of HSA. I'm talking about the acquisition of ATI by a strong AMD (one with its own fabs and highly competitive products) to bring a new level of efficiency to computing in general, and how Intel effectively stopped it.
HSA should have been a step above CUDA, OpenCL, and similar standards. HSA should have taken the developer out of the equation: developers would write code normally, and HSA would be handled at the compiler level.
HSA Foundation members are AMD, ARM, Samsung, MediaTek, Qualcomm, and Texas Instruments... Who is missing? Of course, Intel, because it has no on-chip GPU worth speaking about, and of course NVIDIA, because it has no CPU at all...
For those who aren't familiar with HSA: both the CPU and the GPU do calculations, except that floating-point work runs many times faster on the GPU, while some other work is CPU-exclusive. HSA was meant to be a 'marriage' of CPU and GPU on the same die, with each task assigned to the part that does it better and with the two cooperating on shared resources.
Why did it fail? Because of the ill-fated AMD Fusion project. Mistakes were made, solutions were delayed, the Bulldozer architecture (and its successors) was bad, etc. It all ended with a weak AMD whose products couldn't compete with Intel. On the other, uglier side, both Intel and NVIDIA actively sabotaged progress for selfish reasons. Say, what are the components of a "typical" supercomputer? Many Intel CPUs and many NVIDIA GPUs.
Would an AMD APU with HSA actually in use have made a difference? I think yes. I think this may still happen, now that AMD has competitive products for both CPU and GPU. I think it could make a difference in home computing, too. I think we have lower-quality software today thanks to shady business practices. I really liked the HSA idea :)
I don't know how useful the patcher still is, given that it dates back to 2010.
Compilers have had the ability to generate binaries that check the feature flags of the CPU they are running on since... forever. Intel should absolutely not be praised for what it is doing here.
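Feature-based dispatch of the kind this commenter describes asks what the CPU can do rather than who made it. Real compilers query the CPUID instruction at run time; the Python sketch below instead assumes the flags line format of Linux's `/proc/cpuinfo` so that it stays self-contained. The helper names and the kernel labels are hypothetical.

```python
from __future__ import annotations


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect ISA flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found: assume nothing beyond the baseline


def pick_kernel(flags: set[str]) -> str:
    """Choose the widest SIMD path the CPU reports; the vendor is never checked."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags and "fma" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    if "sse4_2" in flags:
        return "sse4.2"
    return "generic"
```

On Linux, `pick_kernel(parse_cpu_flags(open("/proc/cpuinfo").read()))` would pick a path for the local machine. Because the decision rests on reported features, a CPU that dropped AVX2 would simply get a narrower path instead of crashing.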
It's not like I expect everyone to be a developer, but having a minimal understanding of how software works would be useful while discussing this topic...
I think I expected more... Intel is entitled to provide an API optimized for their CPUs (that's the whole point).
AMD is entitled to provide a similar product.
The main goal of software like Matlab is not to support CPU market or promote competition.
The goal is to compute efficiently. And since Intel provides an API that makes Matlab faster on Intel CPUs, why would they not use it? In fact: shouldn't we demand that they use it? Because the gains on Intel CPUs are really significant.
AMD can also provide such an API, and I'm sure MathWorks (like every other major software maker) will happily provide a backend optimized for AMD.
And most importantly: I have no idea why the criticism is aimed at Intel and not MathWorks. Intel gives no guarantee of MKL performance on AMD platforms. In fact, they might as well block it completely. It's the software maker's role to provide compatibility.
If AMD felt this was unfair or "anticompetitive", they should have pointed this out. They didn't. Why? Yes. MKL runs very slowly on AMD CPUs because it falls back to the simplest instruction set.
Of course, we could argue whether this is OK or not. Intel can't guarantee that AMD CPUs support AVX2. What if they suddenly stopped? Would we then blame Intel for making a library that crashes on its competitor's CPUs?
But let's not do that.
Let's focus on the very simple fact: it's an Intel library. For the most part it's not open source. It's not designed to become a market-wide standard. And it simply shouldn't be used with AMD CPUs.
Similarly, we could criticize Nvidia because CUDA doesn't work with Radeon. Or blame Ford because their navigation system doesn't work in a Toyota. Why would it?
I mean... seriously... it's Matlab. Hardware makers should fight for performance in such applications. Intel does. Nvidia does.
AMD doesn't. And AMD fanboys - instead of expecting AMD to try harder - criticize everyone else.
Also, MATLAB isn't being unfair by going with Intel's MKLs. They need to because of AVX-512 and efficient FFTs. If your code is mostly linear algebra and not implementing any NLP, then MATLAB on AMD CPUs should be fine, which is why they implemented the MKL_DEBUG_CPU_TYPE=5 environment variable in the first place.
AMD doesn't support AVX-512, but they can make use of AVX at the very least (just like Intel since SB) and AVX2 (just like Intel since Haswell). Even on Intel, AVX-512 support for its various instructions is still selective and patchy on the whopping two current platforms that can support it (Xeon Phi and non-mainstream Skylake). By that logic, Broadwell-E and Haswell-E should also be kicked all the way down to hilarious SSE despite their AVX and AVX2 support. But the "GenuineIntel" string means that they aren't, now, are they?
Wonder why Intel would suggest LegitReviews use MATLAB as a CPU benchmark?
There are some tricks to measuring MATLAB performance. Because it is not a compiled language but can do just-in-time compilation, you can get run-to-run variations in performance. Feel free to send me a PM; I'd be happy to give you my 2 cents where I can.
Is it a s**tty, idiotic, anticompetitive practice? Yes. Does it achieve anything other than making Intel look like idiots? No. Is it their right to do this? Absolutely.
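One common way to tame JIT-related run-to-run variation is to discard a few warm-up runs and then report the median of several timed runs. The Python sketch below shows that pattern for any zero-argument callable; the function name `benchmark` and the default counts are assumptions, not the commenter's method. MATLAB's own `timeit` function follows a similar idea, calling the function several times and returning the median.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> dict:
    """Time fn after some untimed warm-up calls; report the median and spread."""
    for _ in range(warmup):
        fn()  # untimed: lets JIT compilation, caches and thread pools settle
    samples: list[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),  # robust to one-off outliers
        "min_s": min(samples),
        "max_s": max(samples),
        "runs": len(samples),
    }
```

For example, `benchmark(lambda: sum(i * i for i in range(100_000)))` returns a dictionary whose `median_s` entry is a steadier figure to compare across machines than a single cold run.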
Finally, I'm not sure why this is even making news now. It's been common knowledge in ICC since 2009 and in MKL since 2013, so it's hardly new, and anyone who knows anything about ICC or MKL already knows how to patch the offending check out.
As I said earlier: Intel MKL is not supposed to serve the whole market. It's not universal. It's their software - made for their hardware.
They took things that existed (BLAS, LAPACK, FFT, etc.) and rewrote them to make the best use of what Intel CPUs can provide. That's it.
MKL is not meant to replace the open-source libraries. Software makers can (and should) provide a separate implementation for AMD - just like they would have to do for ARM etc.
Intel and AMD share the same fundamental architecture, but there are significant differences in instruction set (not just AVX-512, but also DNN and more things will follow soon).
Is Matlab optimally coded for AMD CPUs? No. But it's MathWorks' and AMD's fault, not Intel's. Why is this anticompetitive? And if yes, then who is to blame?
If someone said MathWorks promotes Intel (i.e. Intel pays them not to make an AMD version), it would smell of a flat-Earth conspiracy, but I couldn't really prove that it's wrong.
But the thesis in this discussion, that Intel should optimize their software for competing hardware, is just bizarre.
software.intel.com/en-us/mkl
It's official. AMD isn't supported. Can we move on? :)
Given the above, and the fact that MKL is essentially the only library available that does what it does, it could likely be argued that Intel's behaviour here violates antitrust laws. Certainly, if someone wanted to sue Intel on this basis, they would likely have a better chance than when they were sued for doing this in ICC - at that time Intel was able to weasel their way out of a deserved smackdown by virtue of the fact that consumers weren't forced to use ICC, as there were other compilers that could be used. Yes, you could argue that AMD has had, and does have, the opportunity to create a competing library - but everyone knows how difficult it is to dislodge the market incumbent, even with a superior product.
Honestly though, I don't care if this breaks the law or not, it's just really terrible and unnecessary behaviour that goes against the grain of everything that is responsible and ethical software engineering. I don't like to blow the "all software should be free" horn, but this is an example where it's really necessary.
To be fair, Intel has dominated the processor market for over a decade, so clients for whom performance really matters were probably running Intel already. But as the tables turn, their design choices are coming under more scrutiny (and much deserved).