FEATURED AMD ZEN Discussion (Previous Rumor Thread)

DaveB · Mar 6, 2017

Bluefalcon13 said:
I'm not entirely surprised at this point. We know there are BIOS issues, particularly around RAM settings. I wouldn't be surprised if the Mobo manufacturers are focusing on their high-end models for fixing the UEFI, and just gonna trickle those changes down to the lesser boards.

Another thing to look forward to: a ways back I posted a link to a Tom's (ewwww I know....) announcement article from G.Skill saying they are releasing 2 new lines of memory designed around Ryzen. I'm hoping those will help with some of the RAM pain in addition to UEFI updates.

Yes, I'm sure that it is a motherboard issue. Looking at various motherboard support pages, I was surprised how little high speed memory had been validated. They seem to have focused all the testing on DDR4-2133 and 2400, with only scattered testing of higher speed modules. If I were running the testing program, I'd do just the opposite and validate as much high speed RAM as possible. DDR4 is not new, and all the motherboard manufacturers have lots of experience validating DDR4 with Intel motherboards, so IMO there is no excuse for this.

Kenrou · Mar 6, 2017

https://www.techpowerup.com/231268/...yzed-improvements-improveable-ccx-compromises

TLDR: "there seems to be some problem with Ryzen's L3 implementation, in that it produces latency results that are up to 30 ms higher than the average, at 90 ms, than the L3 latency found on Intel's i7 6900K or even AMD's FX 8350 (both with latency around 60 ms)."

mackerel · Mar 6, 2017

Johan45 said:
Which board are you on? I'm using the CH6. I know it sounds weird, but with the old Windows install 2933 wouldn't post on the 1700X. Now it does and runs stably. It took 3 tries initially to get the RAM to train. When it hung I used the retry button, and after the third attempt it booted and ran fine. Every boot since, it's working fine, no hangs etc. I have G.Skill "B" die. I just set manual timings of 15-15-15-15-35 and left the voltage on auto. It didn't seem to like it if I set it manually. There's also a MemVTT voltage you can try setting to 0.7v; on auto it runs at 1/2 of the set mem voltage. This will help the IMC.

Asus B350M-A as stated previously. Dunno how, it did boot with 4 sticks at 2666 (all else auto) which I benched and showed near enough 25% improvement in ram limited Prime95 conditions compared to 2 sticks of 2133. However the 2133 4 stick results were on average 4% faster than 2 sticks. I don't know what's happening, if it is the test itself, or the test subject, since both are essentially new to me. The Intel test uses different code so I can't assume the AMD code necessarily behaves the same way.
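
As a rough sanity check on those percentages, here is a minimal sketch that separates the stick-count effect from the memory-speed effect. It assumes the two effects simply multiply, which is only an assumption and not something the benchmark confirms; the helper name `speed_only_gain` is made up for illustration.

```python
# Rough decomposition of the Prime95 numbers quoted in the post.
# Assumption (not verified): stick-count gain and memory-speed gain multiply.

def speed_only_gain(total_gain: float, stick_gain: float) -> float:
    """Gain left over for memory speed once the stick-count gain is divided out."""
    return (1 + total_gain) / (1 + stick_gain) - 1

# DDR4-2666 vs DDR4-2133 data rate ratio
data_rate_ratio = 2666 / 2133

total = 0.25   # 4 sticks @ 2666 vs 2 sticks @ 2133 (from the post)
sticks = 0.04  # 4 sticks @ 2133 vs 2 sticks @ 2133 (from the post)

speed_gain = speed_only_gain(total, sticks)

print(f"data rate ratio 2666/2133: {data_rate_ratio:.3f}")
print(f"gain attributable to speed alone: {speed_gain:.1%}")
```

Under that assumption, roughly 20% of the improvement would come from the speed change alone, against a 25% increase in data rate, so the workload looks close to bandwidth-bound but not perfectly so.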

Kenrou said:
https://www.techpowerup.com/231268/...yzed-improvements-improveable-ccx-compromises

TLDR: "there seems to be some problem with Ryzen's L3 implementation, in that it produces latency results that are up to 30 ms higher than the average, at 90 ms, than the L3 latency found on Intel's i7 6900K or even AMD's FX 8350 (both with latency around 60 ms)."

They're using AIDA64 which has problems with Ryzen as they didn't get an early sample to update. I'm not aware if there's been an update since to properly support it. My install isn't reporting any updates available recently.

Edit: there are beta versions with various updates in them, the last dated 3/3, but I don't see anything in the changelog for the versions I can see that specifically mentions correcting the results on Ryzen.

Edit 2:

https://twitter.com/x/status/837219996166205442

L1 and mem are ok, L2 and L3 are not. So unless we're sure they're using a fixed version of Aida64, ignore.

DaveB · Mar 6, 2017

mackerel said:
They're using AIDA64 which has problems with Ryzen as they didn't get an early sample to update. I'm not aware if there's been an update since to properly support it. My install isn't reporting any updates available recently.

Edit: there are beta versions with various updates in them, the last dated 3/3, but I don't see anything in the changelog for the versions I can see that specifically mentions correcting the results on Ryzen.

My AIDA64 results, which I posted on page 127 of this thread, were derived from an updated version of AIDA64 Engineer sent to me by Tamas Miklos of FinalWire. He sent it to me after I sent them my initial results from the version available for download. They show 98.2 ns latency for my 1700.

Bluefalcon13 · Mar 6, 2017

Silly question, but maybe someone with a better understanding of caches and such could answer. Could these results be caused by the issues notice by the Slilt in his thread? He specifically mentioned cache issues in win10, iirc win10 sees each core as having its own L3 cache, with a value of the total L3 cache.

Sentential · Mar 6, 2017

Bluefalcon13 said:
You keep saying "die" which is confusing. A two die chip would be the Core 2 Quad. Ryzen is a single continuous piece of silicon, and a single die.

https://www.pcper.com/news/Processo...onfirms-AMD-Using-Solder-IHS-Ryzen-Processors

It didn't look that way to me; from what I read about the uarch they're essentially 2 core units fused together, unless I'm misunderstanding their modular design (note the two separate TIM pads). They may be on one slug but they don't appear to be physically connected other than the L3 like you said. Essentially that crossbeam of L3 cache is the weak point, and the two units aren't talking to one another very well, either because of resource hogging or lack of memory bandwidth.

Essentially Ryzen is a single-socket, single slug 2P setup with two separate core units that share memory. I'd wager there's probably quite a bit of thrashing occurring between the two.

Blaylock · Mar 6, 2017

Bluefalcon13 said:
Silly question, but maybe someone with a better understanding of caches and such could answer. Could these results be caused by the issues notice by The STILT in his thread? He specifically mentioned cache issues in win10, iirc win10 sees each core as having its own L3 cache, with a value of the total L3 cache.

There, fixed that for ya. Sorry, my OCD couldn't take anymore

DaveB · Mar 6, 2017

Bluefalcon13 said:
Silly question, but maybe someone with a better understanding of caches and such could answer. Could these results be caused by the issues notice by the Slilt in his thread? He specifically mentioned cache issues in win10, iirc win10 sees each core as having its own L3 cache, with a value of the total L3 cache.

It might be the way 2 CCX modules use a single "victim cache" split in 2, one for each CCX module. Some think a patch to the Windows scheduler might be a quick fix for this but we'll have to see. Linux users will most likely need a kernel patch also.

Bluefalcon13 · Mar 6, 2017

Sentential said:
https://www.pcper.com/news/Processo...onfirms-AMD-Using-Solder-IHS-Ryzen-Processors

It didn't look that way to me; from what I read about the uarch they're essentially 2 core units fused together, unless I'm misunderstanding their modular design (note the two separate TIM pads). They may be on one slug but they don't appear to be physically connected other than the L3 like you said.

AFAIK from my understandings from the microcontroller class I took, all ins and outs (memory access, instructions, data manipulation) only occur from cache. IE, the cores themselves shouldn't talk directly to each other. They can only access the same shared cache, but not the core exclusive cache, and manipulate that data.

For example, core 0 adds two numbers. The result is stored in cache L2. If core 2 needs that data, it must be in the shared cache, so it can access it or in RAM. This can be accomplished many different ways (one very inefficient example: core 0 saves the data to RAM, core 2 loads data from RAM, etc).

Sentential · Mar 6, 2017

Bluefalcon13 said:
AFAIK from my understandings from the microcontroller class I took, all ins and outs (memory access, instructions, data manipulation) only occur from cache. IE, the cores themselves shouldn't talk directly to each other. They can only access the same shared cache, but not the core exclusive cache, and manipulate that data.

For example, core 0 adds two numbers. The result is stored in cache L2. If core 2 needs that data, it must be in the shared cache, so it can access it or in RAM. This can be accomplished many different ways (one very inefficient example: core 0 saves the data to RAM, core 2 loads data from RAM, etc).

Could very well be but from what I read it suggested that the L3 acted more as an overflow than something that was actually utilized which would explain why the numbers were off. This would essentially restrict each core to 128bit memory kinda like the way nVidia did the GeforceFX series. Either way something is amiss and I seriously doubt patching Windows is going to fix it.

mackerel · Mar 6, 2017

DaveB said:
My AIDA64 results, which I posted on page 127 of this thread, were derived from an updated version of AIDA64 Engineer sent to me by Tamas Miklos of FinalWire. He sent it to me after I sent them my initial results from the version available for download. They show 98.2 ns latency for my 1700.

Which latency? If for ram, that was apparently ok anyway. It is the L2/L3 cache that it was mis-reporting on. Also, what version did you use? I note the latest beta version on their public site is 5.80.4089. I'd try it myself as soon as I find where my licence code is...

DaveB · Mar 6, 2017

mackerel said:
Which latency? If for ram, that was apparently ok anyway. It is the L2/L3 cache that it was mis-reporting on. Also, what version did you use? I note the latest beta version on their public site is 5.80.4089. I'd try it myself as soon as I find where my licence code is...

That's the one. And yes, my AIDA64 results are for memory latency, and it is roughly 25% higher than on comparable recent Intel CPUs. I'm sure the AIDA64 Engineer update isn't a total fix, since Ryzen is new to everybody.
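
To put the "roughly 25% higher" figure into numbers, a quick back-of-the-envelope sketch follows. It only inverts the percentage against the 98.2 ns result quoted earlier; the Intel baseline it prints is implied by that arithmetic, not a measured value, and `implied_baseline_ns` is a hypothetical helper name.

```python
# Back out the Intel memory latency implied by "98.2 ns is ~25% higher".
# The baseline is derived from the two quoted figures, not measured.

def implied_baseline_ns(measured_ns: float, pct_higher: float) -> float:
    """Latency of the reference system if measured_ns is pct_higher above it."""
    return measured_ns / (1 + pct_higher)

ryzen_ns = 98.2  # AIDA64 memory latency reported for the 1700
baseline = implied_baseline_ns(ryzen_ns, 0.25)

print(f"implied Intel baseline: {baseline:.1f} ns")
print(f"absolute gap: {ryzen_ns - baseline:.1f} ns")
```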

Bluefalcon13 · Mar 6, 2017

Sentential said:
Could very well be but from what I read it suggested that the L3 acted more as an overflow than something that was actually utilized which would explain why the numbers were off. This would essentially restrict each core to 128bit memory kinda like the way nVidia did the GeforceFX series. Either way something is amiss and I seriously doubt patching Windows is going to fix it.

In that thread, they were talking about different results for how win7 sees the caches vs win10. This would mean it is something that can be modified via the OS itself.

AFAIK, L3 should be a data sharing cache for all cores. IE, all cores have access to it, and can manipulate it. Granted, it is slower than L2, but it is faster than RAM access. Furthermore, the way a lot of multi-threaded applications are written, the cores do not manipulate the same data anyway. IE, the code core 2 executes does not depend on core 0's result. So some of that may be a moot point.

Granted, my assembly code experience is solely based on an ARM single core microcontroller, with a very limited cache and RAM set. I'm sure things are more complicated than that, but I'd assume the general principle would hold: cores do not access RAM directly, they access cache.

Bluefalcon13 · Mar 6, 2017

Blaylock said:
There, fixed that for ya. Sorry, my OCD couldn't take anymore

Thanks blay, I fell off the short bus as a kid [emoji14]

Woomack · Mar 6, 2017

mackerel said:
Which latency? If for ram, that was apparently ok anyway. It is the L2/L3 cache that it was mis-reporting on. Also, what version did you use? I note the latest beta version on their public site is 5.80.4089. I'd try it myself as soon as I find where my licence code is...

80-100ns latency for DDR4 in dual channel isn't ok; it could be if it was a quad channel controller at a lower memory frequency. On Intel dual channel controllers, results are between 30-60ns. Cache in general isn't the best compared to Intel, but FX had the same. Large cache = higher latency. Here you have cache like in server processors, so it is hard to compare it to Intel's lower series.

Btw you can check memory performance in various tests using Geekbench 3/4. There are many tests which maybe won't be affected by software version, patches or anything else. I bet that AIDA64 will be updated soon, they are adding updates quite often so if something isn't right with results/tests then they had no access to hardware to test it.
I'm not sure if there is any other benchmark which is showing memory performance in various tests, not only one single pass with bandwidth.

Btw2 at the end I got ASUS Prime X370 but still waiting for CPU. Maybe tomorrow it will arrive and I will make some quick tests.

mackerel · Mar 6, 2017

Just ran the latest public beta of AIDA64 once, with results shown above. Note aida64 now gives a warning about it not being optimised for the CPU.

Comparing the numbers for L2 and L3 to the TPU article that Kenrou linked earlier, mine are significantly better, and this is with a lower CPU model than was reported on. I'm running completely stock here.

I also just noticed, the source article that TPU reported on is dated 2nd, the day before the latest beta version of aida64, although I can't rule out previous betas having a relevant update. The activity with loading is also interesting, as that is what I was trying to achieve in my testing. The difference is, my testing is with multiple separate jobs, so each would be able to use the L3 cache of each CCX it resides on, and not run into the limitations shown if you wanted to run a single task using the entire but split L3 cache.
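
For anyone wanting to try the "separate jobs, one per CCX" idea deliberately rather than leaving it to the scheduler, a minimal sketch is below. It assumes an 8-core, 16-thread part with two 4-core CCXs, and it assumes logical CPUs are numbered contiguously per CCX with SMT siblings adjacent; that numbering is an assumption and varies by OS, so check the real topology first. The helper names are hypothetical, and `os.sched_setaffinity` only exists on Linux.

```python
import os

def ccx_cpu_sets(n_ccx=2, cores_per_ccx=4, threads_per_core=2):
    """Hypothetical helper: logical CPU ids belonging to each CCX.

    Assumes CCXs occupy contiguous blocks of logical CPU ids, which is
    an assumption about numbering, not something guaranteed by the OS.
    """
    per_ccx = cores_per_ccx * threads_per_core
    return [set(range(i * per_ccx, (i + 1) * per_ccx)) for i in range(n_ccx)]

def pin_to_ccx(ccx_index, cpu_sets):
    """Pin the current process to one CCX; returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. Windows or macOS builds of Python
    os.sched_setaffinity(0, cpu_sets[ccx_index])
    return True

sets = ccx_cpu_sets()
for index, cpus in enumerate(sets):
    print(f"CCX {index}: logical CPUs {sorted(cpus)}")
```

Each benchmark job would call `pin_to_ccx(0, sets)` or `pin_to_ccx(1, sets)` before starting its workload, so its working set stays within the L3 of a single CCX.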

Oh, I've now finished the switch from B350M-A to X370-Pro. There is a slightly newer bios I haven't yet installed on it. First impressions are... it isn't much different. I still can't find a way to turn off SMT in the bios. There looks like a lot more ram timings available to play with. I just swapped SSD over and it didn't even show as detecting hardware or want a reboot for anything. Suppose I should give OC a go some time...

- - - Updated - - -

Woomack said:
80-100ns latency for DDR4 in dual channel isn't ok

When I said it was ok earlier, I meant in the sense that it was supposed to be valid, not that the value was good or bad.

Woomack · Mar 6, 2017

mackerel said:
When I said it was ok earlier, I meant in the sense that it was supposed to be valid, not that the value was good or bad.

I know, I just said it's not ok

... maybe try some other benchmark like Geekbench, which I suggested? Or check what bandwidth you get from the winsat command in Windows. It usually shows how the OS uses memory bandwidth.

mackerel · Mar 6, 2017

Woomack said:
I know, I just said it's not ok ... maybe try some other benchmark like Geekbench, which I suggested? Or check what bandwidth you get from the winsat command in Windows. It usually shows how the OS uses memory bandwidth.

https://browser.geekbench.com/v4/cpu/2021631

I am not familiar with it at all so I have no idea how this compares, and I don't have an idle higher end Intel to run and compare it to at this moment...

Woomack · Mar 6, 2017

It's a dual channel platform, so you can compare it to an i7 6700K.

Johan45 · Mar 6, 2017

Just wanted to add this here for those saying that Ryzen is "terrible" for gaming
