
i7-7800X impressions


mackerel (Member, joined Mar 7, 2008)
After the fun of getting the bits together, the build has gone fairly smoothly. There was some pain with trying to put a mobo in a case with an existing watercooling setup and no slack in the tubing. I got the old mobo out, so this one should go in. Didn't stop it fighting back in the process.

I'm still putting software on it at the moment, so I haven't even thought about OC yet.

System:
i7-7800X - Asus bios defaults, probably stock, what's the all-core turbo? It's running 3.8 GHz and I don't know if the mobo is OCing for me
Asus X299 TUF Mark 2, BIOS 0402 (the only one released)
Corsair HX1000i
Samsung 960 Evo 500GB
Asus 980Ti reference
EK custom loop for both CPU and GPU, with P240 and P360 rads, but fans are only on the P240 currently.
G.Skill Ripjaws 4 3333 4x4GB (at 2133 for now)

Only done a quick stress with Prime95 29.2. With all cores loaded it went to 3.8 GHz. Core voltage was reported to be 0.93v! Max temp in that condition was 58C so I have some thermal headroom to explore later.

Running Realbench on it now as alternate stress test. Gonna run out of time for more testing tonight so I will put it to some distributed computing work overnight, and start looking at overclocking tomorrow.


Edit: RealBench got temps up to 63C. While attempting to use the system at the same time as it was running, I had a couple of instances where the UI totally froze. Not even the mouse pointer was moving. I noticed there was still disk activity, and after waiting some time (tens of seconds) it came back as if nothing had happened. I dunno if this is something to do with RB or if there is something else going on here.

Anyone have experience with Samsung's NVMe driver? I ran CrystalDiskMark before (MS driver) and after installing it, and it did seem to help random and non-queued sustained transfer speeds.
 
Did they have a 960 Evo firmware update? I don't have any issues with my PM961 and samsung's 2.2 NVMe driver and realbench.
 
Ooh, can you use Samsung's NVMe driver with the PM961? I have that in another system, and as an OEM drive I don't recall it being supported by their consumer software last time I tried. My 960 Evo was reported as having current firmware.
 
Learning the new BIOS options... I have some baseline Prime95 data for reference, and went to increase ram speed as a first step. The modules are rated at 3333 but I have never proved them 100% stable at that speed on any system so far. As I would on other Intel systems, I turned on XMP (declining the Asus all-core optimisation that was offered when doing so), then saved and exited. Windows started to load and... lockup. That didn't look promising. Rebooting into the BIOS, I saw it listing a target CPU speed of 5 GHz. There's the problem. It turned out that to hit 3333 ram speed, it had switched to the 125 strap and adjusted the multiplier. Unfortunately it wasn't sensible enough to compensate the CPU multiplier, which I had left on auto. I set the 100 strap and 3200 ram speed, and tried again. It booted with no problem this time. I'm doing a quick run of P95 blend on it for now, and will reboot into memtest next. If that appears stable, I'll move on to CPU OC.
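
The 5 GHz target follows from core clock = BCLK x CPU multiplier. A minimal sketch, assuming the auto CPU multiplier stayed at 40 (the 7800X's 4.0 GHz maximum turbo ratio); that value is my inference, not something the BIOS showed:

```python
def core_clock_mhz(bclk_mhz: int, multiplier: int) -> int:
    """Core clock = base clock (BCLK strap) x CPU multiplier."""
    return bclk_mhz * multiplier

# Assumption: the auto CPU multiplier stayed at 40 (7800X max turbo ratio).
AUTO_MULTIPLIER = 40

for strap in (100, 125):
    print(f"{strap} MHz strap x {AUTO_MULTIPLIER} = "
          f"{core_clock_mhz(strap, AUTO_MULTIPLIER)} MHz")

# On the 125 strap, the nearest whole multiplier to stock 3.8 GHz is 30 (3750 MHz).
print(core_clock_mhz(125, round(3800 / 125)))
```

With the multiplier left on auto, the same ratio that gives 4.0 GHz on the 100 strap lands at 5.0 GHz on the 125 strap, which is why fixing the strap (or lowering the CPU multiplier by hand) avoids the lockup.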

Since I'm using P95 as my primary stress (AVX) it could get interesting later as it looks like I can set separate CPU ratios for AVX, AVX-512, and other.

One minor criticism of the TUF Mark 2 so far: there's only one internal USB2 header on it. I could use two, one each for the PSU and the LED/fan controller. The single header does support two USB ports, but each device connector hogs both spaces. I had made a splitter in the past but have no idea where I put it...
 
Did 2 complete passes of memtest at 3200, and most of a 3rd pass before I got bored and went back to benching.

I came across a weird thing after that. AIDA64 ram bandwidth showed an improvement at 3200 over 2133, as would be expected, but when I looked at the Prime95 benchmark results, they were all way below expectations. I could repeat this even after a reboot. It was like something was bottlenecking the crap out of it. CPU-Z reported the expected CPU, cache and ram clocks and configuration. Weird. Remembering Woomack's post about cache clock influencing ram, I arbitrarily changed it from 2000 to 2500, and things were back where I expected them to be. I then reset it back to default... and the scores remained high. I suspect this is a BIOS quirk, in that something maybe didn't get initialised correctly until the setting was changed. I had seen similar behaviour with SMT on one of the Asus Ryzen boards. I've now added a short (30 second) P95 benchmark to my test routine to verify performance.

Enough of the ram for now, time to see what the CPU can do. I manually set 4 GHz in the BIOS: no problem with a P95 64k in-place FFT load on 12 threads. I want to initially concentrate on the CPU cores. At this point I installed XTU to make things easier. I noticed a discrepancy between the core voltage reported by the mobo (shown by both HWiNFO64 and CPU-Z), typically 0.9x V, and the value reported by XTU, around 1.10 V. Relative to Skylake, I'm more inclined to believe 1.1 V, as 0.9x sounds rather low. Without touching the voltage, I started climbing up the multipliers, watching out for throttling or other problems, with only a short pause at each step to check for stability. 45x crashed and rebooted, so the next step down was 44x. Not good either: I got a BSOD. Currently letting 43x burn in. After over 10 minutes running, the hottest core is 75C and the VRM is 77C. I should take and add a photo some time, but the case is assembled and there will be some degree of airflow around the mobo.
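
The climb-then-step-down routine above, written out as a sketch. quick_ok and long_ok are hypothetical stand-ins for a short stability check and a long burn-in. The demo mocks them with this post's outcome (quick checks passing up to 44x, since 45x was the first crash while climbing, and only 43x surviving the burn-in); the 44x quick-check result is an assumption. It also assumes the starting multiplier is already known stable.

```python
from typing import Callable

def find_max_multiplier(start: int, limit: int,
                        quick_ok: Callable[[int], bool],
                        long_ok: Callable[[int], bool]) -> int:
    """Climb with short checks until one fails, then step down until a long burn-in passes."""
    m = start
    # Climb: move up only while the next multiplier passes a short check.
    while m < limit and quick_ok(m + 1):
        m += 1
    # Verify: step down until the candidate survives the long burn-in.
    # Assumes `start` itself is known stable, so it is never re-tested.
    while m > start and not long_ok(m):
        m -= 1
    return m

# Hypothetical mocks reflecting the outcomes described in the post.
result = find_max_multiplier(40, 50,
                             quick_ok=lambda m: m <= 44,
                             long_ok=lambda m: m <= 43)
print(result)  # 43
```

The short checks make the climb cheap, and the expensive burn-in only runs on the few candidates near the limit.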

[Chart: slx-p95.png]

Here's a summary of the P95 testing so far. I'm comparing a 6700k at stock (4.0 GHz with all cores loaded) with a 7800X at stock (3.8 GHz with all cores loaded). Prime95 lets you test either one task spread over multiple cores/threads, or one task per core or thread. I tested two scenarios: one task on multiple cores, and one task per core. Although HT was on, it was deselected and treated as if it wasn't there. The assumption remains that, as with previous Intel CPUs, it provides no benefit for this type of task.

The FFT size determines the data set size: multiply by 8 to get the working size, e.g. a 64k FFT = 512 KB of ram. If multiple cores are working on the same task, there is still only one master set of data, although I don't know if it needs to be duplicated per core. If you are running one task per core, the tasks are independent of each other, but you will have multiple sets of data floating around, so simply multiply further by the number of working cores. I'm using the total data size for the horizontal axis, as that shows how it fits in with the cache levels.
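
The sizing rule above as a quick calculator. A minimal sketch: it uses the post's x8 rule (consistent with one 8-byte double per FFT element, though that reason is my assumption) and treats one task on multiple cores as a single shared copy, since whether the data is duplicated per core is unknown.

```python
BYTES_PER_ELEMENT = 8  # the x8 rule from the post

def working_set_kib(fft_k: int, tasks: int = 1) -> int:
    """Total data size in KiB for an FFT of fft_k K elements, times the number of independent tasks."""
    return fft_k * BYTES_PER_ELEMENT * tasks

# One task spread over all cores: a single shared data set (assumed not duplicated).
print(working_set_kib(64))           # 512 KiB
# One task per core on a 6-core 7800X: six independent data sets.
print(working_set_kib(64, tasks=6))  # 3072 KiB = 3 MiB
```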

For one task on multiple cores, there is a drop-off in performance at smaller task sizes. The assumption here is that there isn't enough work to spread around, and the overheads of going multi-threaded far outweigh the benefits. Using my main 6700k system turned out not to be the best choice: since I un-overclocked it from its previous 4.2 GHz all-core, it has become practically unlimited by ram bandwidth as far as P95 is concerned. The lines are almost horizontal. If you look really carefully, there is a slight dip after 8192 KB, which is where the data no longer fits inside the 8MB L3 cache. With slower ram or a faster processor, this would be more obvious. I had posted something similar previously when comparing i5 vs i7 (6 vs. 8MB cache, as well as architecture generations), but it won't be trivial to find it again for now.

Onto the 7800X: let's start with the dark green and blue lines at the top. Between about 3M and 12M they are practically on top of each other. This represents the best throughput available for small workloads when not ram limited. I haven't done an exact comparison, but the difference is roughly from about 7 for the 6700k to just over 10 for the 7800X, which is a similar ratio to 4.0x4 : 3.8x6 (GHz x cores). At a crude level this indicates no significant IPC difference in the cores, which was expected anyway. It is interesting that the drop-off doesn't start until 12MB, even though the 7800X also has only about 8MB of L3 cache. We have to remember that in the 6700k the L3 was inclusive, duplicating the lower-level cache data. In the 7800X it is non-inclusive (I may have incorrectly referred to it as exclusive previously), so the effective size is harder to pin down, as it is now a blend of L2 and L3.
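
A back-of-envelope check of both points above, as a sketch. If per-clock performance is unchanged, throughput should scale with cores x GHz. With a non-inclusive L3, the private L2s add capacity instead of being mirrored in L3. Cache sizes are Intel's published figures (6700k: 256 KB L2 per core, 8 MB L3; 7800X: 1 MB L2 per core, 8.25 MB L3). Adding L2 and L3 together is only an upper bound, not the real effective size.

```python
def core_ghz(cores: int, ghz: float) -> float:
    """Total core-GHz: a crude throughput proxy when IPC is equal."""
    return cores * ghz

expected = core_ghz(6, 3.8) / core_ghz(4, 4.0)
observed = 10 / 7  # rough plateau values read off the chart
print(f"expected 7800X/6700k: {expected:.3f}, observed: {observed:.3f}")

def cache_capacity_mib(cores: int, l2_mib_per_core: float,
                       l3_mib: float, inclusive: bool) -> float:
    """Upper bound on unique data held in L2 + L3.
    Inclusive L3 duplicates L2 contents, so L3 alone bounds it;
    a non-inclusive L3 can hold different data, so the sizes add."""
    return l3_mib if inclusive else l3_mib + cores * l2_mib_per_core

print(cache_capacity_mib(4, 0.25, 8.0, inclusive=True))   # 6700k: 8.0
print(cache_capacity_mib(6, 1.0, 8.25, inclusive=False))  # 7800X: 14.25
```

A drop-off starting around 12MB sits inside the 14.25 MiB upper bound, which fits the idea that L2 and L3 together set the 7800X's effective size.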

To the right of the chart the blue and green lines diverge, with the faster ram giving more performance as expected, but the difference is smaller than expected. This may be because we're neither purely ram limited nor purely CPU limited, and the transition is not as simple as it looks. I believe it could be approximated by a tanh function (with appropriate unit scaling), but there may also be other bottlenecks, for example in the L3 cache. Again based on Woomack's previous observations, I suspect the cache clock is limiting the ram bandwidth potential. That'll be something to look at next.
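
One way to write the tanh idea down, as a sketch only. The plateau levels, midpoint and width are hypothetical fitting parameters, not values measured here. Working in log2 of the data size is also my assumption, since the chart axis spans several orders of magnitude.

```python
import math

def throughput(size_kib: float, cache_level: float, ram_level: float,
               mid_kib: float, width_log2: float) -> float:
    """Smooth step from a cache-bound plateau (small sizes) to a ram-bound plateau (large sizes).
    The transition is centred at mid_kib, with its sharpness set by width_log2 (in doublings)."""
    x = (math.log2(size_kib) - math.log2(mid_kib)) / width_log2
    # (1 - tanh(x)) / 2 goes from ~1 (well below mid) to ~0 (well above mid).
    weight = (1 - math.tanh(x)) / 2
    return ram_level + (cache_level - ram_level) * weight

# Hypothetical parameters purely for illustration.
for size in (3 * 1024, 12 * 1024, 48 * 1024, 192 * 1024):
    print(size, round(throughput(size, cache_level=10.0, ram_level=7.0,
                                 mid_kib=24 * 1024, width_log2=1.5), 2))
```

Fitting cache_level, ram_level, mid_kib and width_log2 to each ram/cache setting would show whether faster ram mainly raises the right-hand plateau or also shifts the transition.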


I took so long to write that bit, the stress test has now run for over 48 minutes. CPU max core temp 76C, VRM 79C. Time to mix it up a bit...

I might settle on 4.3 as my "AVX speed" and using the offsets see if I can push non-AVX clocks even higher.
 
I'm happy that 4.3 GHz seems stable based on running 60m P95 set to 64k FFT, 90m P95 blend, 1h Realbench.

Just started to play with the AVX offset. For starters I set the ratio to 44, with an AVX offset of 1. So it should run 4.4 GHz for non-AVX code, and 4.3 GHz for AVX code. Ok so far? Now I just need non-AVX code to test it. Ryzen advocates have long argued AVX isn't important, but it seems it may be lurking in more places than you think. Prime95 was out, unless I force it not to use AVX. What else is there? AIDA64? I've paid for an upgrade licence so I'm current, and with either CPU or FPU selected I can see the clock flickering between 4300 and 4400. AVX! There is no other throttling that I can see. OK, what about RealBench? It does it too. The CPU-Z built-in stress test doesn't, but it is hardly anything of note. What else... I downloaded OCCT, which I haven't used in a very long time. It still looks rubbish, but under the skin it does seem feature-rich. The default OCCT stress test triggered the AVX downclock. Linpack depended on whether you selected the AVX checkbox or not.
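
How the ratio and offsets combine, as a sketch. It assumes a 100 MHz BCLK and that each offset is simply subtracted from the core ratio while that class of code runs. The AVX-512 offset of 2 is a hypothetical value, since only an AVX offset of 1 was set here.

```python
BCLK_MHZ = 100  # assumption: 100 MHz strap

def effective_mhz(ratio: int, workload: str, avx_offset: int, avx512_offset: int) -> int:
    """Core clock for a workload class under ratio-minus-offset rules."""
    offsets = {"non-avx": 0, "avx": avx_offset, "avx512": avx512_offset}
    return (ratio - offsets[workload]) * BCLK_MHZ

for w in ("non-avx", "avx", "avx512"):
    print(w, effective_mhz(44, w, avx_offset=1, avx512_offset=2))
```

With a ratio of 44 and an AVX offset of 1, mixed code switches between 4400 and 4300 MHz, which matches the flicker seen in AIDA64 and RealBench.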

All considered, are there any other non-AVX stability tools out there I should look at? Ideally I want something that can run continuously and alert me if something isn't quite right, preferably without having to wait for a hard crash or similar.
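
On the "alert if something isn't quite right" point: the usual approach, as Prime95 does, is to repeat a deterministic calculation and flag any result that differs from a known-good reference instead of waiting for a crash. Below is a minimal single-threaded sketch of that idea using pure Python integer arithmetic. Whether it really avoids AVX depends on how the interpreter was compiled: generic x86-64 builds normally don't emit AVX, but that is an assumption. A real tool would also run one copy per thread and use much longer runs.

```python
def workload(n: int, seed: int = 12345) -> int:
    """Deterministic integer-only work: a 64-bit linear congruential generator folded into a checksum."""
    mask = (1 << 64) - 1
    x, acc = seed, 0
    for _ in range(n):
        x = (x * 6364136223846793005 + 1442695040888963407) & mask
        acc = ((acc ^ (x >> 17)) * 31) & mask
    return acc

def self_check(rounds: int, n: int) -> int:
    """Repeat the workload and count results that differ from the first run.
    Assumes the first run itself is correct (a fixed known-good value is safer)."""
    reference = workload(n)
    errors = 0
    for i in range(1, rounds):
        if workload(n) != reference:
            errors += 1
            print(f"round {i}: result mismatch, possible instability")
    return errors

print("mismatches:", self_check(rounds=5, n=200_000))
```

On a stable system the count stays at zero. Any mismatch means the hardware produced a different answer to the same calculation, which is the early warning that comes before a hard crash.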
 
This ASUS board sometimes sets weird clocks when you keep the multicore enhancement option enabled. For some reason, when I left everything at auto, 3000 and 3600 memory results were sometimes almost the same. Yesterday I tested my memory kit with everything set manually and cache at auto. Differences in memory bandwidth were up to 10GB/s between 3000 and 3600, while in my previous tests they were no more than 4-5GB/s. Something is bumping some clocks and it's not visible in software. Probably that ASUS option which is "improving" multicore performance.
There is only 1 BIOS for this board, which is a bit weird. Usually, within 1-2 months of launch, ASUS has 3-4 BIOS releases, and now nothing. The other thing is that the first BIOS is already numbered 0402, so there were many betas before it.
 
Uses AVX1 not AVX2 or FMA3.

Doesn't matter. There's only two offset options, AVX and AVX-512. Presumably AVX will include AVX2 and FMA3 as well as older functions.


Ok, time for the next data dump...

I should add, I have for the moment settled on 4.3 GHz stock voltage as my operating CPU overclock. I might push it further for non-AVX later, but for now this has taken everything I've thrown at it. Testing below is all at 4.3 GHz, where I will look at ram and cache influence.


Firstly, AIDA64 results. Note I'm using the current release, and it warns that it is not yet optimised for my CPU. The current beta changelog doesn't suggest that would help.

[Images: 4300-2133-2000.png, 4300-2133-3000.png]
Above results are at 2133 ram, 2000 and 3000 cache respectively.

[Images: 4300-3200-2000.png, 4300-3200-3000b.png]
Above results are at 3200 ram, 2000 and 3000 cache respectively.

Taking the 4 results above together, it is not surprising that increasing the cache speed hugely impacts the L3 results. Looking at read bandwidth only, a 50% increase in cache clock gave 43% and 45% for 2133 and 3200 ram cases respectively. Similarly a 50% increase in ram speed from 2133 to 3200 gave 11% and 29% for 2000 and 3000 cache speeds respectively. It seems the cache speed is curtailing the higher ram speed potential.
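
Expressing the gains above as a fraction of the clock increase makes the bottleneck clearer. The percentages are the read-bandwidth results quoted above. "Scaling efficiency" here just means gain divided by clock increase, which is my own framing rather than an AIDA64 figure.

```python
CLOCK_GAIN_PCT = 50  # both changes were ~50% (cache 2000->3000, ram 2133->3200)

def scaling_efficiency(gain_pct: float, clock_gain_pct: float = CLOCK_GAIN_PCT) -> float:
    """Fraction of the clock increase that showed up as extra read bandwidth."""
    return gain_pct / clock_gain_pct

# Read-bandwidth gains quoted above (percent).
results = {
    "cache +50% at 2133 ram": 43,
    "cache +50% at 3200 ram": 45,
    "ram +50% at 2000 cache": 11,
    "ram +50% at 3000 cache": 29,
}
for label, gain in results.items():
    print(f"{label}: {scaling_efficiency(gain):.0%} of ideal")
```

This prints 86%, 90%, 22% and 58%. The cache change scales well at either ram speed, but the ram change delivers only 22% of its clock gain at 2000 cache compared with 58% at 3000 cache. That is consistent with the cache clock holding back the ram.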

Note for a stock system, I think ram bandwidth is far less important than for Skylake-S. Taking theoretical peak ram bandwidth (GB/s) divided by total core GHz (cores × all-core clock), the 7800X at a stock all-core speed of 3.8 GHz with quad channel ram at 2133 has a ratio of about 3.0x. A 6700k at an all-core 4.0 GHz with 2133 ram only has a 2.1x ratio. My 6700k system with 3200 ram would rate 3.2x, and the 7800X at 4.3 GHz with 3200 ram is a massive 4.0x. Note this value isn't a linear scale, it depends on the application, and I've not taken rank into consideration for this overview. For Prime95-like uses, a ratio of 2x is under 70% efficient, a ratio of 3x is around 80% efficient, and a ratio of 4x is better than 90% efficient.
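
The ratios above can be reproduced as theoretical peak ram bandwidth (channels x MT/s x 8 bytes per transfer on a 64-bit channel) divided by total core GHz (cores x all-core clock). This is a sketch: like the figures in the post, it ignores rank, timings and real-world efficiency.

```python
def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Theoretical peak DDR4 bandwidth: each 64-bit channel moves 8 bytes per transfer."""
    return channels * mts * 8 / 1000

def bandwidth_per_core_ghz(channels: int, mts: int, cores: int, ghz: float) -> float:
    """GB/s of peak ram bandwidth per core-GHz."""
    return peak_bandwidth_gbs(channels, mts) / (cores * ghz)

configs = [
    ("7800X 3.8 GHz, 4ch 2133", 4, 2133, 6, 3.8),
    ("6700k 4.0 GHz, 2ch 2133", 2, 2133, 4, 4.0),
    ("6700k 4.0 GHz, 2ch 3200", 2, 3200, 4, 4.0),
    ("7800X 4.3 GHz, 4ch 3200", 4, 3200, 6, 4.3),
]
for name, ch, mts, cores, ghz in configs:
    print(f"{name}: {bandwidth_per_core_ghz(ch, mts, cores, ghz):.1f}x")
```

This prints 3.0x, 2.1x, 3.2x and 4.0x, matching the figures above.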

[Image: 6700k.png]
For comparison, this is my 6700k at stock: 4.0 base to 4.2 boost, 3200 dual channel dual rank ram. Ram bandwidth is comparable to 7800X with 2133 ram, but latency is much lower. L1 bandwidth is lower but at comparable latency. L2 bandwidth also lower, but so is the latency. L3 is the shocker. The old 6700k has far more bandwidth and far less latency. Not even increasing the cache on the 7800X closes the gap. It seems the 7800X really wants the data to stay in L2 and only use L3 as last resort before hitting ram.

[Chart: gtav-bench.png]
This is only one scenario looking at the gaming impact. I used the GTAV built-in benchmark. Settings were 1080p full screen, vsync off, with most settings on high or very high. I tested 4 conditions: ram at 2133 and 3200, cache at 2000 and 3000. It looks like increasing either cache or ram gave a boost, and enabling both gave the highest framerate. There's not much between doing just cache or just ram, but ram has a slight edge. Then again, the cache is essentially free performance when you OC it. The differences between 2133-2000 and 3200-3000 for the five passes are: 17%, 7%, 14%, 9%, 14%. I have to wonder how much of this is CPU as opposed to GPU limiting...

[Chart: p95-bench.png]
Mainly for my own interest, here's one P95 result in more detail. This is large enough that it would hit ram hard, but also note my earlier comment that the balance of this system means even without ram OC it shouldn't be heavily limited. From 2133-2000 to 3200-3000 is a 19% increase in both test cases.
 
Does it reduce with other AVX software? I'm just reporting what I'm seeing. What CPU is that on? The 7600k in sig? Maybe the implementation is different between the processors.
 

I just did a test with RealBench V2.54 and it has AVX, so it reduced my clock speed from 4.6GHz to 4.0GHz. Version 2.43 does not have AVX. I found out the Chrome browser uses AVX: I open 50 tabs at the same time, because I use the "Continue where you left off" startup option.

What is the stock cache speed for the i7 7800X?
 
Ok, I failed to spot the difference in Realbench version. Good to know. The 7800X cache is 2000 stock. I'm confident 3000 is stable without touching voltage.

Temporarily stopped tinkering with it for now. As a vague justification to myself for getting this system to play with, I'm now forcing myself to sell some bits once I extract it, take a nice photo and list it at various places. Could take some time...
 

Is cache listed on your motherboard as uncore frequency?
 
There are separate uncore and cache voltages, but only a cache clock. I haven't seen any difference at higher voltages.

I don't think that L3 is really important. Most applications take advantage of fast L1/L2. L3, as I remember, is some kind of addition. I don't remember exactly where it was, but someone mentioned that not all applications can fully utilize L3. I could be wrong, as I don't remember the details. Even though L3 is slow, L1/L2 are fast. I also heard that the memory controller wasn't improved at all, and that's pretty much how it looks in tests.
Most generations of Intel processors in recent years had performance improvements based on cache speed. In many applications, the differences came only from memory/cache speed. Every generation comes with a wall of improvements that mean something only for marketing purposes. In reality, the performance gain comes only from a few specific changes or, as in the last generation, new AVX instructions (as long as they're used by software).

SB-> IB = different bus/memory controller, some other little changes
IB-> HW = improved cache/memory controller, some other little changes
HW-> BW = added L4 for IGP
HW/BW-> SL = new memory controller/faster cache, some other little changes
SL-> KL = barely any changes

Along the way we had the X series, and these processors had bigger changes simply because they were on the market for longer:

BF->GT = improved cache/memory controller
BF/GT->SBE/IBE = huge improvements in everything
SBE-> IBE = memory controller/cache improvements
SBE/IBE-> HWE/BWE = different memory controller, faster cache
HWE-> BWE = some more changes but generally nothing special
HWE/BWE-> SLX = faster L1/L2, slower L3, no difference in memory controller, added AVX-512 and some other little improvements

As I see it, almost everything is based on improvements in data transfers between CPU and RAM. A large and fast cache reduces the need for fast RAM, and because of that the end user can't see the delays. In each generation (or maybe in most) the cache is getting larger or faster. Memory controllers haven't changed much for some time, but memory speed and density keep going up.

I see that your L3 cache results are quite low in general. I have 90-100GB/s without OC and 160-170GB/s after OC. Also, same as in the previous generation, memory bandwidth on more cores is much higher.
I don't know if you've seen the Maxxmem results, but they are really low. Single-threaded memory bandwidth is measured at about 10GB/s, while previous generations had 20GB/s+. I'm not sure if it's a benchmark issue or a real result. In general it doesn't mean much.

I was testing a 4x8GB single rank kit yesterday and the results are a bit weird. Read was ~3GB/s lower than on the dual rank kit, but writes were about 5GB/s higher. I was expecting it the other way around.
Max clock on single rank was 3800, but there were problems with booting. 3733 booted into Windows without issues but sometimes acted weird. 3600 worked the same as the dual rank kit. So I guess I will stick to the dual rank kit for everything.
I hope there will be a new BIOS soon with some max memory clock improvements. The current BIOS is fully stable and all seems fine, but I wish to play some more with settings and check memory at higher frequencies.
 
There is quite a lot of discussion around the net in general that Skylake-X, even with core, ram and cache overclocked, falls short of the 7700k's high fps performance by quite a margin. This is a bit of a bad result for me, as I was intending to use this as a platform for going into high fps gaming... doh! I don't think anyone understands the reasons why yet, but it may be that optimisations for the cache structure of previous CPUs don't suit Skylake-X. Maybe I should have dropped a 7700k into my existing desktop after all...

On L3 bandwidth, does that scale with cores? I can see it could, if each core was accessing its local slice in parallel. Never given it much thought before.

Regardless, I do have a real test which helps show possible problems, in the form of the Prime95 benchmark. I never got to the bottom of how the low performance state happens, but once gone, it stays gone as long as I don't change any settings. Due to the data size, L3 does help out here in certain circumstances. In theory the 7800X isn't bad on ram bandwidth relative to core demand, so maybe that lessens the dependency on L3. Intel may have rebalanced the cache, but now I wonder if software also needs to be rebalanced to use it better. I've already seen some devs considering optimising for L2 rather than L3, as would have been done previously.

I should add to my todo list how rank influences ram performance, but to do that I'm going to have to be a bit creative, running two channels with one or two single rank modules on each. Might also see if the ram works above its rated 3333 (currently 3200).
 
I don't think that L3 bandwidth scales with cores, but 6 and 8+ core CPUs are a bit different and have different amounts of cache, so maybe there is parallel access with more cores. L1 and L2 look about the same on 6 and 10 cores... or I've missed something. I have about 30-60% higher L3 cache bandwidth on 10 cores compared to your results.

On this platform I can't see a big difference between ranks. I was expecting more. That 3-5GB/s of bandwidth here or there can probably be covered by timings. Dual rank certainly has more relaxed sub-timings, and it's losing in writes. I had no time to play with timings. I only tested everything with manual main timings + auto subs. For some tests I set more timings.

Generally the platform runs nicely at higher memory frequencies, but I'm not sure whether 4000+ will require an expensive motherboard. I'm stuck at not much above 3600, which is still great for dual rank, while single rank kits should go much higher... and simply can't. Maybe a new BIOS will help, but somehow I doubt it. On the other hand, it's not helping much in general performance, and I just don't want to pay 2x more for a motherboard to see higher memory clocks without a matching gain in performance. I'm not complaining about 3600 13-13-13 1N in quad channel on a dual rank kit; it's actually more than I was expecting ;)
 