512 AVX enabled CPU is needed

nulik · Jan 24, 2016

Hi,
I need to buy a CPU with the AVX-512 instruction set for development purposes, but I don't understand which CPUs actually have it.
It can be an AMD or an Intel CPU, it doesn't really matter; of course I will be buying the cheapest one. The question is, where do I get the info about which CPUs support it? Is there a list or something like that? Some blogs say such CPUs have not been released yet, others say it is only available on the Intel Xeon Phi, but I am not crazy enough to spend 2,500 USD on a Xeon Phi. The info on this subject is a complete mess. Please help!

TIA
Nulik

ATMINSIDE · Jan 24, 2016

Only on the Xeon Phi right now, no mainstream processors have AVX-512 yet.
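
Side note for anyone who wants to check a machine they already have access to (a rented server, for example): here is a minimal sketch, assuming Linux, where the kernel lists CPU feature flags in /proc/cpuinfo and reports the AVX-512 Foundation subset as `avx512f`. The helper names `cpu_flags` and `has_avx512f` are made up for this example.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 Linux)


def has_avx512f(cpuinfo_text):
    """True if the AVX-512 Foundation flag is present."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        text = path.read_text()
        print("AVX2:    ", "avx2" in cpu_flags(text))
        print("AVX-512F:", has_avx512f(text))
    else:
        print("No /proc/cpuinfo here; this sketch only works on Linux.")
```

This only tells you what the CPU in front of you supports; it is not a list of models. For that you still have to go through the vendor's spec pages.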

nulik · Jan 25, 2016

ATMINSIDE said:
Only on the Xeon Phi right now, no mainstream processors have AVX-512 yet.

You know what I was thinking today: this thing may be dead already. I mean, who would want 512-bit AVX if they can get a GPU that is thousands of times faster, with lots of shaders, for around 200 bucks? I am probably betting on dead technology; I should be developing for GPUs instead. The thing is, not everybody has a GPU.

wingman99 · Jan 27, 2016

Some Skylake Xeon processors, expected in 2016 or later, will have it: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX-512

Dolk · Jan 28, 2016

nulik, what are you trying to do with a 512-AVX instruction set? Not many people require that particular IS. Skylake CPUs can execute two 256-bit AVX instructions and combine the results, but this is a much slower process. There is also an issue with the instruction set on Skylake CPUs: http://danluu.com/cpu-bugs/

RJARRRPCGP · Feb 3, 2016

Looks like the biggest Intel F-up of this century! The last time Intel had major processor flaws was in 1994 and maybe 1995!

Expect Intel to have a processor recall pandemic and processor RMA pandemic!

I'm boycotting Skylake processors!

nulik · Feb 3, 2016

Dolk said:
nulik, what are you trying to do with a 512-AVX instruction set?

I am writing a high-performance network daemon plus my own TCP/IP stack, so it is very cool to have parallel execution inside the processor.

Dolk said:
Not many people require that particular IS.

You are right. I am a performance-obsessed person; I will try to write everything in assembly, but will eventually get tired and switch back to C.

Dolk said:
Skylake CPUs can execute two 256-bit AVX instructions and combine the results, but this is a much slower process. There is also an issue with the instruction set on Skylake CPUs: http://danluu.com/cpu-bugs/

Depressing stuff indeed. But what can you do? Run on AMD? It probably has many more bugs. Do you know of any CPU that is much better than Intel's and has far fewer bugs? Please tell me which one!

Dolk · Feb 3, 2016

Some guess work here:

Assuming that you are using a wide AVX IS to quickly align/filter/create TCP/IP payloads, you would need 10 Gigabit+ Ethernet or Fibre Channel. Really, you should be using an FPGA co-processor on PCIe Gen 3.0 to properly implement this project. CPUs and GPUs are great for general processing in an OS environment, but are very limited when you are assigning precise instructions and utilizing the system for a precise purpose.

Thus I suggest you bring in an FPGA to do your TCP/IP stack work, as you can easily create your own AVX-like IS of any bit width. Attach the FPGA to a PCIe Gen 3.0 riser card and then use the OS to control the FPGA. You will need to write firmware to handle the communication between the x86 host and the FPGA, but this isn't too difficult if you write it all in C and create your own program. Hell, you could even use Matlab to interact with the FPGA.

If you are not too comfortable with that route, go with a Haswell/Broadwell Intel CPU and do a two-step 256-bit AVX execution for each 512-bit AVX instruction. Disable Hyper-Threading unless it is only interacting with memory handling, as this will create headroom in your AVX performance. Using two cores in parallel, you can perform the two 256-bit AVX instructions in one step. Yes, this means that for each CPU you get roughly half the performance, but it's a solution. You will not be able to utilize a GPU unless the AVX stack is properly parallelized so that each instruction can be performed in a parallel loop. In either case you will need a lot of system memory (dependent on the stack size).

One last note:

An Intel/AMD x86 system will not be able to download, process, and upload as quickly as an FPGA co-processor for this kind of work. This is due to the CPU bus. Best case, your CPU will communicate directly with the PCIe bus that your Ethernet/fiber controller is attached to. The CPU cores will have to spend some clock cycles telling the Ethernet/fiber controller what to do with the payload that it's sending. This kind of overhead will be significant to your total performance. An FPGA can easily handle the parallel task of being an Ethernet/fiber controller and an AVX computation unit.
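
To put a rough number on the "some clocks" point above, here is a back-of-the-envelope sketch of how many CPU cycles a single core has per packet at 10 Gigabit Ethernet line rate. It assumes standard Ethernet framing (8 bytes of preamble/SFD plus a 12-byte inter-frame gap per frame on the wire); the 3 GHz clock is a hypothetical figure, and the function names are made up for this example.

```python
def packets_per_second(link_bits_per_s, frame_bytes):
    """Maximum frame rate on an Ethernet link for a given frame size."""
    # On the wire, every frame also costs 8 bytes of preamble/SFD
    # and a 12-byte inter-frame gap.
    wire_bytes = frame_bytes + 8 + 12
    return link_bits_per_s / (wire_bytes * 8)


def cycles_per_packet(cpu_hz, link_bits_per_s, frame_bytes):
    """Clock cycles one core can spend on each packet without falling behind."""
    return cpu_hz / packets_per_second(link_bits_per_s, frame_bytes)


if __name__ == "__main__":
    cpu_hz = 3e9   # assumed 3 GHz core, hypothetical
    link = 10e9    # 10 Gigabit Ethernet
    for frame in (64, 1518):  # minimum and maximum standard frame sizes
        pps = packets_per_second(link, frame)
        budget = cycles_per_packet(cpu_hz, link, frame)
        print(f"{frame:5d}-byte frames: {pps / 1e6:6.2f} Mpps, "
              f"{budget:7.1f} cycles per packet")
```

With minimum-size frames the budget is only a couple of hundred cycles per packet on one core, which is why the per-packet bookkeeping matters more than the width of the SIMD unit.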

nulik · Feb 3, 2016

Dolk said:
Some guess work here:

Assuming that you are using a wide AVX IS to quickly align/filter/create TCP/IP payloads, you would need 10 Gigabit+ Ethernet or Fibre Channel. Really, you should be using an FPGA co-processor on PCIe Gen 3.0 to properly implement this project.

Wow! What a great idea!! I will investigate this ASAP. In fact, I googled the topic and Intel is working on a microprocessor with an embedded FPGA. Here is the news link:
http://www.pcworld.com/article/3006...ce-boosting-fpga-to-ship-early-next-year.html
I am really eager to see the price tag. What cards would you recommend? I found some at 5k, and here is one for 99 bucks:
http://dangerousprototypes.com/2011/05/18/99-dollar-pci-e-fpga-kit/
I need an inexpensive one to run some tests.

I should investigate more; this is a new subject for me, and I very much appreciate your insights. Thank you.

nulik · Feb 4, 2016

Found this:
http://www.nvidia.com/object/nvlink.html
Indeed, FPGAs are great, and they are the future. But right now GPU ASICs are faster for SIMD processing (imho). FPGAs still have low clock speeds and support small amounts of RAM. I think FPGAs are a long-term investment and will start to go mainstream after 2020, like GPUs now.
But this NVLink from Nvidia is difficult to resist. PCI Express will never give 80 or 200 GB/sec transfer rates.
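
For a sense of scale on the PCI Express side, here is a small sketch that computes the raw one-direction link bandwidth from the per-lane transfer rate and the line encoding (5 GT/s with 8b/10b for PCIe 2.0, 8 GT/s with 128b/130b for PCIe 3.0). It ignores packet and protocol overhead, so real throughput is lower. The 80 GB/s figure is simply the one quoted in the post above, and `pcie_gb_per_s` is a made-up helper name.

```python
def pcie_gb_per_s(gt_per_s, lanes, payload_bits, encoded_bits):
    """Raw one-direction PCIe link bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * lanes * payload_bits / encoded_bits / 8


if __name__ == "__main__":
    gen2 = pcie_gb_per_s(5, 16, 8, 10)      # PCIe 2.0: 5 GT/s per lane, 8b/10b
    gen3 = pcie_gb_per_s(8, 16, 128, 130)   # PCIe 3.0: 8 GT/s per lane, 128b/130b
    print(f"PCIe 2.0 x16: {gen2:.2f} GB/s")
    print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
    print(f"80 GB/s is about {80 / gen3:.1f}x a PCIe 3.0 x16 link")
```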

Dolk · Feb 4, 2016

You are not going to get the performance you desire unless you bring in more money. Sad truth there.

Also, modern FPGAs compete with GPUs on SIMD processing every day.

https://www.altera.com/products/fpga/stratix-series/stratix-10/overview.html
http://www.xilinx.com/products/silicon-devices/fpga/virtex-ultrascale-plus.html

Granted, those two examples are for high-end development. You can easily use a Cyclone V or a Spartan-6 to do the work you are trying to achieve. I'm also going to start assuming that this is for a senior project or something similar, so you have $0 for just about anything. Start off small and grow; this project is already an advanced subject, normally handled in large companies that have very knowledgeable CpE/EE designers. I'm going to reiterate that an FPGA as a co-processor is going to be your only solution, unless you find an ASIC that will do what you want. Most companies already go this route anyway for high-speed communication and data parsing. This is because FPGAs and custom ASICs are the only means of quickly taking in data and adding custom parsing. You do not want to rely on an x86 solution, as it will be way too slow.

To add, Intel will have their FPGA-embedded CPUs ready sometime this year? I have not gotten my hands on one yet, but my Altera peeps tell me they will be here soon enough. I am personally looking forward to these types of CPUs, as they have been needed for a long time. Also, don't expect them to enter the PC market, ever. These types of CPUs will almost exclusively live in the server world due to size and power consumption.

nulik · Feb 4, 2016

Dolk said:
You are not going to get the performance you desire unless you bring in more money. Sad truth there.

The problem is, my project attempts to bring parallel server applications to the masses, to people who don't know about performance and parallel computing. But they do have x86 SIMD hardware starting from SSE4.2, and they can easily buy a 200-buck GPU card if they get a 20x to 50x performance gain by switching to my software. I doubt they would buy additional hardware for more than 1,000 bucks, which is (as far as I have briefly investigated) the cost of a good FPGA card (and that is without Ethernet; I would still have to use the PCI bus).
FPGAs are better, I perfectly understand, but I won't get that kind of attention from the market if I show high upgrade costs. It's purely a marketing decision. GPUs will give me the speedup I need, and after the market has been taught the benefits of parallel computing, you can push FPGAs. Kind of a version 2 software release.

Dolk said:
To add, Intel will have their FPGA-embedded CPUs ready sometime this year? I have not gotten my hands on one yet, but my Altera peeps tell me they will be here soon enough. I am personally looking forward to these types of CPUs, as they have been needed for a long time. Also, don't expect them to enter the PC market, ever. These types of CPUs will almost exclusively live in the server world due to size and power consumption.

Yes, that would be ideal, but how much will it cost? Intel is expensive; their Xeon Phi was released at a price above 2k. I am not going to persuade users to switch to my software if they have to pay, I don't know, 2,500 bucks for a Xeon + FPGA. By using 256-bit AVX I will get a minimum speedup of 20x. Now, if I switch to GPUs, I lose speed on the PCI Express bus but win on massively parallel computation, so I should be able to neutralize the PCI Express slowness.
Unfortunately, money sometimes drives technology decisions toward a less than ideal solution...
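
Whether the GPU's parallelism really cancels out the bus cost depends on how much data has to cross it. Here is a very rough model, assuming the transfer and the computation do not overlap at all (a pessimistic simplification). Every number in the example run is hypothetical, and `effective_speedup` is a made-up name.

```python
def effective_speedup(compute_s, kernel_speedup, bytes_moved, bus_bytes_per_s):
    """Overall speedup of offloading a job when bus transfers are not overlapped.

    compute_s        time the job takes on the CPU alone, in seconds
    kernel_speedup   how much faster the accelerator runs the computation itself
    bytes_moved      total bytes copied to and from the accelerator
    bus_bytes_per_s  sustained bus bandwidth in bytes per second
    """
    transfer_s = bytes_moved / bus_bytes_per_s
    offload_s = compute_s / kernel_speedup + transfer_s
    return compute_s / offload_s


if __name__ == "__main__":
    # Hypothetical job: 1 s of CPU work, a 50x faster GPU kernel, a 12 GB/s bus.
    for gigabytes in (0.01, 0.1, 1.0):
        s = effective_speedup(1.0, 50, gigabytes * 1e9, 12e9)
        print(f"{gigabytes:5.2f} GB moved -> about {s:.1f}x overall")
```

The more bytes a job has to move per second of compute, the further the overall gain drops below the raw kernel speedup, so the "neutralize the PCI Express slowness" bet only pays off for work that does a lot of computation per byte transferred.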
