How to test AVX-512 instructions w/o supported hardware? [closed]
I'm trying to learn x86-64's new AVX-512 instructions, but neither of my computers has support for them. I tried using various disassemblers (from Visual Studio to online ones: 1, 2) to see the instructions for specific opcode encodings, but I'm getting somewhat conflicting results. Plus, it would be nice to run some instructions and see their actual output.
So I'm wondering if there is an online service that lets me compile a small piece of (x86-64) assembly code and run it, or step through it, on a specific processor? (Say, Intel's Sandy Bridge, Cannon Lake, etc.)
This question appears to be off-topic. The users who voted to close gave this specific reason:
2 Answers
Use the Intel® Software Development Emulator (SDE) to run an executable on an emulated CPU that supports future instruction sets. It's freeware (not open source, but a free download), and is available for Linux, Windows, and I think also OS X.
https://software.intel.com/en-us/articles/debugging-applications-with-intel-sde has step-by-step instructions for debugging with it on Windows or Linux. SDE can work as a GDB remote, so you can run sde -debug -- ./your-program, then in another terminal run gdb ./your-program and use target remote :portnumber to connect to the SDE process so you can set breakpoints and single-step.
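The two-terminal workflow above can be sketched as a small Python helper that only builds the command lines, so you can print and paste them. This is a minimal sketch, not part of SDE or GDB: the function names are made up for illustration, the port 10000 in the usage note is a placeholder for whatever port SDE actually reports, and the only assumption beyond the commands quoted above is GDB's standard -ex option for running a command at startup.

```python
import shlex


def sde_debug_command(program, program_args=()):
    """Terminal 1: run the program under SDE, waiting for a debugger.

    Mirrors `sde -debug -- ./your-program` from the answer.
    """
    return ["sde", "-debug", "--", program, *program_args]


def gdb_attach_command(program, port):
    """Terminal 2: start GDB on the same binary and connect to SDE.

    `port` is whatever port number SDE reported when it started;
    this helper does not discover it for you.
    """
    if not 0 < port < 65536:
        raise ValueError("port must be in the range 1..65535")
    # -ex runs a GDB command right after startup.
    return ["gdb", program, "-ex", f"target remote :{port}"]


def as_shell(command):
    """Render an argument list as a single copy-pasteable shell line."""
    return shlex.join(command)
```

For example, as_shell(gdb_attach_command("./your-program", 10000)) gives a line you can paste into the second terminal once SDE has told you its real port. Keeping the commands as argument lists also means they can be handed to subprocess.run unchanged if you later want to script the first step.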
You might be able to do the same thing with QEMU, if they've added support for emulating AVX512. QEMU can also act as a GDB remote.
QEMU definitely has configurable instruction-set stuff, e.g. you could tell it to emulate an x86 with AVX but not AVX2 (like Sandy Bridge). SDE can probably do the same thing.
You could even tell it to emulate something you won't find on real hardware, like AVX2 but not BMI1/2, if you want to verify that your CPUID checks don't assume anything implies anything else that isn't guaranteed.
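To illustrate the "don't assume one feature implies another" point, here is a rough Python sketch that checks every required feature on its own against the flags line of Linux's /proc/cpuinfo. The function names are hypothetical, the flag spellings (avx2, bmi1, bmi2, avx512f, ...) are the ones the Linux kernel uses, and this is a stand-in for the CPUID checks in your native code, not a replacement for them. Note the assumption that it runs on real hardware or inside a full-system guest (such as a QEMU VM); as far as I know, a user-mode emulator like SDE still shows the host's /proc/cpuinfo.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. not an x86 Linux system)


def missing_features(flags, required):
    """List required features that are absent, testing each one separately.

    Nothing is inferred: having 'avx2' says nothing about 'bmi1' or
    'bmi2', so every feature the code uses must be named in `required`.
    """
    return sorted(name for name in required if name not in flags)


def host_flags(path="/proc/cpuinfo"):
    """Read the flags of the machine (or guest) this script runs on."""
    with open(path) as f:
        return parse_cpu_flags(f.read())
```

A typical use would be missing_features(host_flags(), ["avx2", "bmi1", "bmi2"]) inside an emulated guest configured with AVX2 but without BMI1/2: a non-empty result is exactly the case a dispatcher that only tests AVX2 would get wrong.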
Remember that both of these are essentially useless for performance testing; they only let you check the correctness of your vectorization. IACA could be useful to get an idea of performance on SKX, but it's far from perfect and doesn't model memory bottlenecks at all. (Only the actual pipeline in some level of detail.)
@MikeF: My answer shows how you can single-step through the emulated code with a debugger. (Or at least links to an Intel article about how to do that on Windows. I only quoted the Linux part, because it's a couple simple commands.)
– Peter Cordes
Aug 12 at 4:56
@MikeF: If you literally just want a disassembler, use objdump -drwC -Mintel or Agner Fog's objconv to convert machine code into asm text. Your CPU doesn't have to support AVX512 for a disassembler to work, no emulation or anything needed. Or if you're compiling C or C++, use godbolt.org to get asm output from the compiler directly, without creating an executable and then disassembling it. e.g. godbolt.org/g/YsVuAX has some example functions with compiler output from gcc, clang, and MSVC.
– Peter Cordes
Aug 12 at 5:09
@MikeF: Are you doing that for performance testing? Your question doesn't say that, so a free emulator you can run on your desktop to single-step AVX512 code seems a lot better to me.
– Peter Cordes
Aug 12 at 5:22
@MikeF: That's exactly what you can do with an emulator, like my answer explains, without having to remote-desktop to a cloud VM to run a debugger there. That's how I learned AVX512. (Actually I spent more time just looking at compiler-generated asm for stuff I tried with intrinsics; I think I only actually ran things in SDE once or twice. Seeing what syntax was accepted by NASM was another way I learned how/when you could use masking and broadcast loads, and rounding-mode overrides.)
– Peter Cordes
Aug 12 at 5:28
There are online tools which allow you to at least select different assembly dialects, but I'm not seeing anything that supports Xeon Phi or Skylake. However, the Intel C++ and Fortran compilers support cross-compiling for those additional architectures. It seems you're using Windows, and that is directly supported.
An additional route would be renting an AWS EC2 C5 instance, which natively supports AVX-512, to play with. For learning purposes, this can be done for as little as $0.085/hr for a reserved instance or $0.0185/hr if you're fine with Spot pricing.
Hey, thanks. Your AWS idea sounds very interesting, although I've never dealt with them before. Where do you take all these prices from? And also what is "spot pricing"?
– MikeF
Aug 12 at 4:14
Pricing varies over time, but this link should stay up to date. The "spot" instances differ from the "on-demand" instances in that you don't get a machine instantly allocated necessarily. Amazon uses them to fill the gaps in the normal usage and is willing to offer a discount since something is better than nothing (as long as that something exceeds their operating overhead). Your testing likely doesn't require lots of resources or persistent storage between instances on their machines, so the cheapest option should work fine.
– Hans Musgrave
Aug 12 at 4:18
Examining your comment on the other answer, AWS is Amazon, and Azure has a comparable product with AVX-512. Their pricing is competitive -- not outdoing the spot instances but handily beating AWS on-demand products.
– Hans Musgrave
Aug 12 at 4:19
Yep, thanks. I'll try to dig through it. So far it's all very confusing. Let me try to get it straight. I'd rent a VM that I can install, say, Windows on and then remote into it, right? If so, it would be a good idea, as I can run a remote debugger on it with Visual Studio. What confuses me is their naming in that list you linked. Say t1.micro, t2.small, and so on -- a million things on that list. Also how do I select which CPU it will run on?
– MikeF
Aug 12 at 4:23
Those cloud services are IMO needlessly complex. You'd rent a VM and be able to choose what kind of VM it is (e.g. Windows). You don't have to install the OS. You'd need to dig into the docs to verify the CPU type, or you can take my word for it that Amazon is bragging about AVX512 in the C5 instances and that Microsoft is bragging about it in their Fv2 instances. Both providers use Skylake processors which have the newer version of the AVX512 instruction set. To select which kind of, for example, C5 instance you want, you'd need to compare their other properties like RAM. Cheapest should work.
– Hans Musgrave
Aug 12 at 4:31
Yeah, I thought about an emulator too. I may try it. Although it's quite limiting. Stepping through code with a debugger would be my optimal solution. As for other online disassemblers, as my experience shows, most run on processors that don't support AVX512. I need to see if Amazon or Microsoft's Azure has a plan that supports low cost CPU rental. (like Hans Musgrave suggested.)
– MikeF
Aug 12 at 4:17