
Advice on build: large RAM + max storage throughput


ned3000
I'm attempting to put together a new system with the main requirement being a large amount of memory. Maybe 256GB if that's practical.

The other thing I'll want is a lot of throughput from the drives.

If you're wondering why I'm prioritizing those two things, the main purpose of the machine will be running virtual instruments for music production. They basically load a lot (seriously) of data into RAM and then stream relentlessly from disk under relatively tight real-time constraints.

It looks like the AMD motherboards max out at 128GB RAM, so I'm thinking Intel. The 9820X CPU looks like it might work OK, but I haven't really narrowed it down too much yet. Selecting a motherboard has been more problematic. I *think* the difference between having fast M.2 drives connected directly to CPU PCIe lanes vs. through shared PCIe lanes on the chipset might make a difference in my use case.

It's been kind of difficult to get to the bottom of how various motherboards work in that regard. With a few models it's sort of implied that the motherboard M.2 slots are direct CPU connections, but it's not 100% clear. In most of the motherboards, though, it seems like they're shared.

Another option is one of those PCIe cards with 4 M.2 connectors. There's some confusion there over how the second PCIe slot is shared with the first (which will contain the graphics card) as well, but it seems like in some cases two of the slots can each run at a full x16 (I think), so 4 drives could each get x4 lanes. That seems ideal, but from what I've seen that solution can be a hassle to set up.
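If I've got the lane math right, that's the appeal (rough PCIe 3.0 numbers, assuming ~985 MB/s usable per lane; just a back-of-envelope sketch):

[CODE]
# Rough PCIe 3.0 lane arithmetic for an x16 card bifurcated into 4 M.2 slots.
per_lane_gbs = 0.985            # ~985 MB/s usable per PCIe 3.0 lane
slot_lanes   = 16
drives       = 4

lanes_per_drive = slot_lanes // drives     # 4 lanes each
print(lanes_per_drive * per_lane_gbs)      # ~3.9 GB/s per drive
print(slot_lanes * per_lane_gbs)           # ~15.8 GB/s for the whole card
[/CODE]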

So ... any input would be appreciated. In particular what a good choice for motherboard might be and what option to go with for the drives.
 
What is your budget?

Are you sure that you would really benefit from more than 128GB of memory?

I think that with PCIe 4.0 there are some M.2 slots that communicate directly with the CPU (maybe one of them) while the others go through the southbridge.

Another option for fast storage might be a striped RAID of some kind.
 
The 9820X CPU looks like it might work OK, but I haven't really narrowed it down too much yet. Selecting a motherboard has been more problematic. I *think* the difference between having fast M.2 drives connected directly to CPU PCIe lanes vs. through shared PCIe lanes on the chipset might make a difference in my use case.

The i9-9820X supports 44 PCIe lanes. Look at the MSI Creator X299, X299 PRO 10G, or X299 PRO motherboards, which come with 2 M.2 slots connected directly to the CPU and support 256GB capacity.
 
I would think it would be wise when using that much RAM to stick to lower frequencies so as not to overtax the CPU's memory controller.
 
I would think it would be wise when using that much RAM to stick to lower frequencies so as not to overtax the CPU's memory controller.
There is little choice but to do so. I'd look at around 3200 MHz... maybe even check the QVL for the motherboard you choose.

I also have to ask the OP: have you really looked at the RAM use and storage throughput? I don't understand why, if it loads a ton of stuff into RAM, it's streaming off storage.
 
I wonder if you are running into a PCI-e lane management conflict.
 
I wonder if you are running into a PCI-e lane management conflict.

With a 44 PCIe lane CPU? He's only talking about a single GPU.

The MSI document I linked explicitly states:

"More Efficiency for M.2 Devices
Due to the increasing CPU lanes from 44 to 48 lanes, MSI new X299 motherboards are optimized to utilize more M.2 slots direct to CPU. It is evident that M.2 direct to CPU rather than Chipset can have better transfer speed and efficiency because M.2 devices would not be restricted by DMI 3.0 bandwidth limitations. The Creator X299 motherboard supports two M.2 slots direct to CPU among total three M.2 slots."
 
The i9-9820X supports 44 PCIe lanes. Look at the MSI Creator X299, X299 PRO 10G, or X299 PRO motherboards, which come with 2 M.2 slots connected directly to the CPU and support 256GB capacity.

That looks promising, but it looks like the model that supports 256GB and 2 M.2 drives direct to CPU (the "creator") is not yet available. Thanks for the link though; I'll keep an eye out for news about that.

MSI also talks about a PCIe card for M.2 drives: [URL="https://www.msi.com/PC-component/M.2-XPANDER-AERO"]M.2 XPANDER AERO[/URL], but it looks like that's vaporware.

I'm starting to think that's going to be the way to go (a PCIe card hosting the drives), but information on that is really sparse/sketchy. I'd be interested to hear if anyone's set that up successfully.

I also have to ask the OP: have you really looked at the RAM use and storage throughput? I don't understand why, if it loads a ton of stuff into RAM, it's streaming off storage.

The total amount of data that's required to be "live" simultaneously is enormous, TBs at a time. The RAM just acts like a cache or buffer to cover the time it takes to start streaming individual samples from disk (thousands of samples, so a lot of RAM). A lot of the time people resort to using multiple machines, but I'm trying to get set up so I can load a lot of stuff on one box.
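If it helps to picture it, here's roughly the pattern as I understand it, as a simplified Python sketch (this is not Cubase's actual code, and the 64 KB preload size is just an example number):

[CODE]
# Simplified sketch of how disk-streaming samplers generally work (my
# understanding, not Cubase's implementation). The head of every sample
# is preloaded into RAM; the tail is streamed from disk only once a note
# actually plays, so RAM use scales with sample count, not library size.

PRELOAD_BYTES = 64 * 1024  # example preload size per sample

class StreamedSample:
    def __init__(self, path):
        self.path = path
        with open(path, "rb") as f:
            self.head = f.read(PRELOAD_BYTES)  # stays resident in RAM

    def play(self):
        # Playback starts instantly from the RAM copy...
        yield self.head
        # ...while the rest is read from disk just in time.
        with open(self.path, "rb") as f:
            f.seek(PRELOAD_BYTES)
            while chunk := f.read(PRELOAD_BYTES):
                yield chunk
[/CODE]

With thousands of samples loaded, those preloaded heads are what eat the RAM.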
 
That looks promising, but it looks like the model that supports 256GB and 2 M.2 drives direct to CPU (the "creator") is not yet available. Thanks for the link though; I'll keep an eye out for news about that.

Since the MSI website info is over a month old, I thought they should be "out in the wild". Sorry about that.
 
This sounds like an interesting challenge, but it is difficult to understand the real requirements without more experience of it. Looking at what others have done may give a pointer to what is achievable. I assume this is real-time music, not some kind of offline rendering, which would not be time-dependent.

What is the software that will be doing all this? Does the CPU performance matter much? Is it just literally moving data around? Or will it do processing effects in real time also?

ned3000, presuming this is something that is being done already, what hardware are you currently using? Can you monitor its usage to help give a pointer on requirements?

I work with sound, not in a music environment, but in communications. Taking something we're all familiar with, "CD quality" is only about 176 kB/s. Even assuming the raw data is uncompressed at 24-bit and double the sampling rate (I have no idea what quality is used), we're still only up to about 530 kB/s per stream. How many simultaneous streams are needed at once? You could fit over 1000 of these over a single SATA interface, without even touching NVMe. Depending on the size of the files, I'm guessing the potentially random nature of the reads will be the bigger limiting factor. In that case, assuming the software supports it, multiple single drives will probably perform better than RAID. I'm not sure sequential performance is what's needed. I also presume this vast data is pretty much read-only, so the SSDs don't have to be high-write-performance ones, which is when pricing starts to really hurt.
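Back-of-envelope, in Python (assuming uncompressed stereo PCM; the SATA and DMI figures are rough usable-bandwidth estimates, not exact):

[CODE]
# Per-stream bandwidth vs. interface bandwidth, rough numbers only.
cd_rate    = 44_100 * 2 * 2    # 44.1 kHz, 16-bit, stereo -> 176,400 B/s (~176 kB/s)
hires_rate = 88_200 * 3 * 2    # 88.2 kHz, 24-bit, stereo -> 529,200 B/s (~530 kB/s)

sata_bw = 550 * 1000 * 1000    # ~550 MB/s usable on SATA III
dmi_bw  = 4 * 1000**3          # ~4 GB/s DMI 3.0 chipset link

print(cd_rate, hires_rate)     # 176400 529200 bytes per second
print(sata_bw // hires_rate)   # ~1000 hi-res streams over one SATA SSD
print(dmi_bw // hires_rate)    # ~7500 hi-res streams through the chipset link
[/CODE]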

On the chipset bandwidth on Intel, it's about 4 GB/s from memory (DMI 3.0 is effectively a PCIe 3.0 x4 link). My gut feeling still is: how much do you think you will really need to transfer at the same time? To help scale that, how much total SSD capacity will be required?
 
The software I'm running is Cubase. For this application, I don't think actual CPU performance is the bottleneck; it's, like you said, "moving data around."

The hardware I have now is Intel 4790K w/ 32GB RAM and a few SATA SSDs.

I know from experience that the RAM is an issue; it runs out depressingly quickly when loading virtual instruments.

Storage performance is a little harder to quantify. Under heavy load there are sometimes audio glitches/dropouts while CPU utilization is relatively low. I'm guessing that there are SSD reads not making it in time; hard to say for sure though.

The raw bandwidth numbers are probably enough to keep up, but I think the problem is timing. There is a lot of data involved, but more importantly there's a hard real-time deadline. The software does all its disk I/O, mixes and processes, and then fills a buffer with audio to be output by the hardware every ~100 ms. If that process doesn't complete in time there's a glitch. My thinking is that decreasing I/O latency (by making each individual read faster) will increase reliability/performance. Maybe I'm thinking about it wrong though; I'd be interested if anyone's got any insights.
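To make that concrete, here's the crude way I'm budgeting it in my head (Python, all the numbers are guesses rather than measurements, and treating I/O as a simple queue is a big simplification):

[CODE]
# Crude latency-budget model for one audio buffer cycle (guessed numbers).
buffer_ms = 100      # the host fills an output buffer roughly this often
mix_ms    = 30       # guess at time spent mixing/processing per cycle
io_budget = buffer_ms - mix_ms

voices          = 500    # hypothetical count of simultaneously streaming samples
reads_per_voice = 1      # disk reads needed per voice per cycle
queue_depth     = 8      # how many reads are in flight at once

def io_time_ms(read_latency_ms):
    """Total I/O time if reads beyond queue_depth have to wait in line."""
    total_reads = voices * reads_per_voice
    return total_reads * read_latency_ms / queue_depth

# Rough worst-case random read latencies for each drive type.
for name, latency in (("SATA SSD", 0.5), ("NVMe SSD", 0.1)):
    print(f"{name}: ~{io_time_ms(latency):.0f} ms of I/O vs {io_budget} ms budget")
[/CODE]

Both fit on paper, but the NVMe case leaves a lot more headroom for the occasional slow read, which is what I suspect the glitches actually are.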

I guess one of the reasons I'm going out of my way to try to maximize storage throughput is that it doesn't really cost anything. Maybe a bit more for a motherboard with the M.2 slots set up right, or a PCIe card for the drives (which seem to be relatively inexpensive).
 
Personally I wouldn't touch X299 if you want multiple M.2 SSDs in RAID in any form. The main reason is that RAID on X299 requires VROC support, so you need an additional VROC module. I don't think it's built into any PCIe card that supports multiple SSDs. ASUS and ASRock M.2 PCIe x16 cards still require a VROC module to set up RAID. Also, max bandwidth is ~8-9GB/s no matter what the manufacturer says (they put only the max theoretical bandwidth in the specs). On X399 it's ~12GB/s max and doesn't require VROC for RAID. MSI actually shows results for their AERO card run on an X399 mobo.
Another thing is that RAID performance in VROC is pathetic. I've covered that topic a couple of times on OCF. You get high sequential bandwidth but lower random bandwidth than on a single NVMe SSD.

I would wait for TRX40 mobos (2-3 weeks maybe) and grab a new Threadripper. More cores and probably better performance per price than Intel. Also, PCIe 4.0 SSDs are faster. Motherboards should support up to 128GB in standard modules and up to 256GB in new high-density modules (you can already buy some, like Corsair Vengeance 2666-3200, at a reasonable price).

If you want the lowest delays and still higher capacity, then I wouldn't pick more than 128GB RAM or more than a single NVMe SSD. Everything above that will have higher latency because of additional controllers, drivers, higher density/more relaxed timings, etc.
 
While I'm not familiar with Cubase, I still suspect RAID is barking up the wrong tree. X299, if it supports bifurcation, would just be an enabler to have multiple NVMe SSDs. The minimum and recommended hardware for Cubase isn't actually very high at all, so I wonder if there are more system-level optimisations that may help also. For example, while previously testing Optane, I didn't see much difference between CPU and chipset lane connection, but CPU speed made a bigger difference to random reads. I think there was talk that with Zen 2 the response time for the CPU to go from a power-save state to normal running was lower compared to Intel. There may be a benefit to responsiveness from turning off idle power saving on the CPU, for example.

I also think multiple SSDs, if needed, on multiple controllers is still better than throwing everything on one storage device. This is kinda like a software RAID 0, presuming files are spread across the drives. On a single device it'll have to service all read requests by itself. With multiple devices, you can get some parallel action going.
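To illustrate the idea (a toy Python sketch; the mount points and file names are made up, and a real sampler would do this internally):

[CODE]
# Spread sample files across independent drives so reads that land on
# different devices can proceed in parallel, RAID-0-ish but at file
# granularity rather than stripes. Paths below are invented.
import zlib
from concurrent.futures import ThreadPoolExecutor

DRIVES = ["/mnt/samples0", "/mnt/samples1", "/mnt/samples2", "/mnt/samples3"]

def drive_for(sample_name):
    # Deterministically map each sample to one drive.
    return DRIVES[zlib.crc32(sample_name.encode()) % len(DRIVES)]

def read_chunk(sample_name, offset, size):
    path = f"{drive_for(sample_name)}/{sample_name}"
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

# Requests that hit different drives overlap instead of queueing on one device.
with ThreadPoolExecutor(max_workers=len(DRIVES)) as pool:
    futures = [pool.submit(read_chunk, f"piano_{i}.wav", 0, 64 * 1024) for i in range(8)]
    chunks = [f.result() for f in futures]
[/CODE]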
 
ned3000, how much total storage do you estimate you will need for the databases? This may have a bearing on whether you do something with one large SSD or multiple drives; that, together with the fact that SSDs don't come in huge capacities like spinners do.
 
Thanks for the comments guys. This has been helpful.

I wasn't really considering RAID, although it's a bit unclear how the M.2 PCIe card/VROC thing works. If I were to go that route, I was hoping to just run the drives independently (I *think* that's possible). If not, from what I've seen RAID 0 is doable without the special key/dongle, although now I'm reading Woomack's comment that VROC RAID performance is terrible anyway. Would that apply to RAID 0 as well? What about just non-RAID drives connected over PCIe?

The upcoming TRX40/Threadripper 3 combo is certainly interesting, but $1400 is more than I was looking to spend on the CPU (I was actually hoping that would be the cheap part of this build to offset the rather large costs elsewhere).

At this point it's tempting to just settle for one fast M.2 drive w/ a direct CPU connection (which is easy), but since I'm going to need at least 2 just for the data capacity, I really want to try to get each drive on its own x4 PCIe lanes if possible. So any other thoughts are welcome on good ways to do it.

Also, regarding the power-state switching thing that mackerel mentioned: Cubase has an option that keeps the CPU at full power continuously (I think I should be OK with that; I've got a 120mm water cooler to dedicate just to the CPU).
 
I'd have to imagine a 120mm AIO may struggle to keep a 9920X cool for prolonged periods of heavy load. You may want to bump that up to a 240mm AIO.
 
At this point in the discussion I would point out that the OP mentioned most people use two computers to do what he wants to do. Looking at the expense of moving to a motherboard and CPU platform that could handle 256GB of RAM, I wonder if he would be better off building two less expensive systems. Maybe he has run into the reason most people use two computers for this task.
 
I think I may have found a motherboard that could work: the ASRock X299 TAICHI CLX. I'm not really familiar with that brand, but that model seems to tick the necessary boxes: 256GB memory support and 2 M.2 slots direct to CPU. Anybody think that's a bad call? Looks like a good CPU might be the 9820X.

Regarding the two-computer thing: there are a few downsides to that. For one, a license is required on each machine for the host software, Cubase (~$600). Also, there's some complexity in setting up synchronization, plus additional audio hardware, the second computer itself, an additional monitor or KVM, etc. Basically enough pain that I feel like it's worth dealing with the hassles we're talking about here.

Also, just a quick clarification about the cooling situation: it's a 3x 120mm radiator (not a single 120).
 