UL Benchmarks Launches CPU Profile Benchmarking Software

Yesterday, UL Benchmarks added CPU Profile to its 3DMark Advanced and Professional Edition benchmarking software. CPU Profile runs six tests on the CPU, using 1, 2, 4, 8, 16, and the maximum number of threads, to produce scores that can be compared with other CPUs. If you own a copy of 3DMark Advanced Edition, it is available as a free update. If you would like to purchase this useful tool, you can buy it from Steam for only $4.49 until July 8th, 2021. The press release below has additional details along with links.

3DMark CPU Profile—CPU benchmarks for modern processors

The 3DMark CPU Profile introduces a new approach to CPU benchmarking. Instead of producing a single number, the 3DMark CPU Profile shows how CPU performance scales and changes with the number of cores and threads used.

The CPU Profile has six tests, each of which uses a different number of threads. The benchmark starts by using all available threads. It then repeats using 16 threads, 8 threads, 4 threads, 2 threads, and ends with a single-threaded test.

These six tests help you benchmark and compare CPU performance for a range of threading levels. They also provide a better way to compare different CPU models by looking at the results from thread levels they have in common.

The 3DMark CPU Profile shows you how your CPU scores compare with other results from the same CPU model. It’s a great way to check if your CPU is performing as expected. For overclockers, the 3DMark CPU Profile shows the overclocking potential of your CPU and provides more ways to track and measure the gains from overclocking.

More cores, more threads

The trend in processor development is towards an increasing number of cores. More cores mean more work can be performed at the same time.

Simultaneous multithreading (SMT) enables each core to run multiple threads. The more threads you have, the greater the throughput of work.

However, core counts are increasing faster than the ability of popular applications to make use of them. Some tasks are more suited to multithreading and multiple cores than others.

A modern CPU benchmark should demonstrate the benefits of having many cores and threads by scaling beyond 16 threads. It should also show how a processor performs for gaming and other real-world activities where performance rarely scales beyond a modest number of cores and threads.

It is not possible to represent both these aspects of CPU performance with a single number. A different type of benchmark is needed.

3DMark CPU Profile benchmarks

The 3DMark CPU Profile includes six tests that feature a combination of physics computations and custom simulations. All six tests use the same workload; it is only the amount of threading that changes, with tests limited to using either 1, 2, 4, 8, 16, or the maximum number of available threads.
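As a rough illustration of how the six thread levels map onto a particular processor, the following Python sketch assumes that a test cannot use more threads than the CPU exposes. The function name `effective_threads` and the capping rule are assumptions made for illustration, not UL's implementation.

```python
import os

# Fixed thread levels described in the text; "max" means all available threads.
FIXED_LEVELS = [16, 8, 4, 2, 1]

def effective_threads(max_threads):
    """Map each test label to the number of threads it could actually use.

    Assumption: a test asking for more threads than the CPU has
    is capped at the CPU's logical thread count.
    """
    levels = {"max": max_threads}
    for n in FIXED_LEVELS:
        levels[str(n)] = min(n, max_threads)
    return levels

if __name__ == "__main__":
    # os.cpu_count() reports logical processors and may return None.
    print(effective_threads(os.cpu_count() or 1))
```

On a CPU with 8 logical threads, this sketch maps the 16-thread and max-thread tests to the same 8 threads, which is why those levels would be expected to score alike.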

Each of the six tests produces a score. Scores are comparable across tests. You can compare the 8-thread score with the 4-thread score, for example. A higher score means the CPU performed the work faster.

A hardware monitoring chart shows you how the CPU clock frequency and CPU temperature changed while the tests were running.

 

How to benchmark and compare CPU performance

The 3DMark CPU Profile shows you how your CPU scores compare with other results from the same processor.

The green bars on the 3DMark CPU Profile result screen show you how your scores compare with the best scores for your CPU. The longer the green bar, the closer your score is to the best result for your CPU model.

The median score, shown by the marker, shows the performance level you should expect for your CPU. In most cases, the median represents performance with stock settings. If your score is below the median, it may indicate a problem with cooling or background processes. Check the hardware monitoring chart to see how the CPU temperature changed during the run.

The distance from the median marker to the end of the bar represents the overclocking potential of the CPU. For overclockers, the 3DMark CPU Profile provides more ways to measure the effects of overclocking and more ways to compete for the highest scores!
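The relationship between your score, the median, and the best result can be sketched with a little arithmetic. The press release does not say how the bar is drawn, so the formulas and function names below are assumptions, and the numbers are hypothetical placeholders.

```python
def bar_fraction(score, best):
    """Fraction of the bar filled; 1.0 means the score equals the best result."""
    return min(score / best, 1.0)

def headroom_percent(median, best):
    """Gap between the median and the best result, as a percentage of the median."""
    return (best - median) / median * 100

def below_median(score, median):
    """Flag a score worth investigating (cooling, background processes)."""
    return score < median

# Hypothetical values for one thread level, not real 3DMark results.
score, median, best = 5700, 6000, 7200
print(f"bar filled: {bar_fraction(score, best):.0%}")
print(f"headroom above median: {headroom_percent(median, best):.1f}%")
print(f"below median: {below_median(score, median)}")
```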

Please note that these features are powered by benchmark results from 3DMark users. These insights may be unavailable for some CPU models until enough results are submitted.

Your 3DMark CPU Profile scores should increase up to the number of threads supported by your CPU. In this screenshot from a CPU with 4 cores and 8 threads, you can see that the scores for 8 threads, 16 threads and max threads are the same within the usual 3% accuracy range for UL benchmarks. For CPUs with SMT, which have more threads than cores, the benefit of having more threads decreases beyond the number of CPU cores.
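A minimal sketch of that check, assuming the 3% range is applied as a relative difference against the larger of two scores (the press release does not define it precisely). The scores are hypothetical values for a 4-core, 8-thread CPU.

```python
def within_tolerance(a, b, tol=0.03):
    """True if two scores differ by no more than tol relative to the larger one."""
    return abs(a - b) <= tol * max(a, b)

def plateau_levels(scores, tol=0.03):
    """Return thread levels whose score matches the max-threads score within tol."""
    top = scores["max"]
    return [level for level, s in scores.items()
            if level != "max" and within_tolerance(s, top, tol)]

# Hypothetical scores, illustrative values only.
example = {"1": 800, "2": 1580, "4": 3050, "8": 4100, "16": 4060, "max": 4120}
print(plateau_levels(example))  # prints ['8', '16']
```

Levels that land in the returned list are effectively the same result as the max-threads test, so differences between them should not be read as real scaling.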

Six levels of CPU performance

The 3DMark CPU Profile includes six tests. These six levels make it easier to compare the performance of different CPU models by looking at the results from thread levels they have in common.

Max threads

The Max-threads score represents the full performance potential of your CPU when using all available threads. The practical use cases for this score lie outside of gaming in extremely heavy, multithreaded workloads such as movie-quality rendering, simulations, and scientific analysis.

16 threads

Computationally intensive tasks such as digital content creation and 3D rendering benefit from more threads, but the 16-threads score is less relevant for estimating practical gaming performance.

8 threads

Modern DirectX 12 games make better use of multithreaded performance beyond 4 cores. The gaming performance of a CPU usually correlates most closely with the 8-threads score. This score also has a high correlation with the 3DMark Time Spy CPU score.

4 threads and 2 threads

Older games developed for DirectX 9 are often bottlenecked by the CPU on modern gaming PCs. The frame rates of popular esports titles, such as DotA 2, League of Legends, and Counter-Strike: Global Offensive, usually correlate most closely with the 2-threads and 4-threads scores.

1 thread

The 1-thread score is a fundamental measure of the processor’s performance. For games and real-world use cases, however, the multithreaded scores are usually a better indicator of practical performance.

3DMark CPU Profile benchmarks for Windows PCs

The 3DMark CPU Profile is available now as a free update for 3DMark Advanced Edition. From now until July 8, 3DMark is 85% off, only $4.49 USD, when you buy it from Steam or the UL Benchmarks website.

The CPU Profile benchmarks are available as a free update for 3DMark Professional Edition customers with a valid annual license.

-John Nester (Blaylock)

About John Nester 399 Articles
John started writing and reviewing PC components for Overclockers.com in 2015, but his passion for PCs dates all the way back to the early 1980s. His first personal computer was a Commodore 64 with a cassette drive. As a dedicated member of the news team, he focuses his articles on new product releases and software updates. He reviews a wide variety of PC components including chassis, storage drives, keyboards, and more. John works in technology as a C.A.D. designer for a major automotive manufacturer. His other passions in life include motorcycles, hunting, guns, and football.

mackerel

Member

3,863 messages 586 likes

I'm away from home, so I can only run it on my laptop with a Zen 3 5800H mobile CPU, which has 8 cores and 16 threads.

cpuprofile.png
Link to web results: http://www.3dmark.com/cpu/16620

Well, I get some numbers. How those numbers compare will have to wait until I get home next week.

As is, the bench results may be slightly skewed by thermal and/or power limits. The benchmark starts at the highest thread count and works down. At the start, the CPU may be cooler and so be less affected by thermal limits. Likewise, time-based power budgets will be consumed early on and may give the early tests a boost.

1 to 2 threads: 1.93x
2 to 4 threads: 1.92x
4 to 8 threads: 1.69x
8 to 16 threads: 1.17x

Looking at how the score scales with increasing threads, we seem to get near-ideal scaling from 1 to 4 threads, with a little overhead somewhere. 4 to 8 threads is a smaller jump. It will take more work to determine whether this is due to software scaling overheads or a hardware resource limitation. For example, this might be determined by running an 8-core CPU at a fixed lower clock, which reduces the load on non-execution hardware resources. If it then scales better, the limit is hardware. If the scaling doesn't change, the limit is software. The 8 to 16 thread step on an 8-core system indicates how much benefit SMT gives. 17% is a pretty unremarkable value, assuming it is not hardware limited.
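To see what those step ratios add up to, here is a small Python sketch that multiplies them into a cumulative speedup over the 1-thread score and divides by the thread count to get a scaling efficiency. The ratios are the ones quoted in the post; the helper names are made up for illustration.

```python
from itertools import accumulate
from operator import mul

# Step-to-step ratios quoted in the post (5800H laptop).
step_ratios = [1.93, 1.92, 1.69, 1.17]   # 1->2, 2->4, 4->8, 8->16 threads
thread_counts = [2, 4, 8, 16]

def cumulative_speedup(ratios):
    """Multiply successive step ratios to get speedup relative to 1 thread."""
    return list(accumulate(ratios, mul))

def efficiency(speedups, threads):
    """Speedup divided by thread count; 1.0 would be ideal linear scaling."""
    return [s / t for s, t in zip(speedups, threads)]

speedups = cumulative_speedup(step_ratios)
for t, s, e in zip(thread_counts, speedups, efficiency(speedups, thread_counts)):
    print(f"{t:>2} threads: {s:.2f}x vs 1 thread, efficiency {e:.0%}")
```

Note that the 16-thread efficiency is measured against thread count, not core count, so on an 8-core CPU it will always look low because the last step is SMT rather than extra cores.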

Open questions:

What is the benchmark effective peak IPC on Intel vs AMD CPUs?
Is it strongly affected by cache sizes and speeds, or by memory bandwidth and latency?
How co-dependent are the threads on each other? For example, 1 task using 8 threads is different from 8 independent tasks using 1 thread each.

Johan45

Benching Team Leader Super Moderator

18,290 messages 167 likes

Here's my 5950X using the CTR OC tool.

3DMark CPU.JPG

1 to 2 threads: 1.95x
2 to 4 threads: 1.95x
4 to 8 threads: 1.84x
8 to 16 threads: 1.61x
16 to 32 threads: 1.26x

Maybe you do have some throttling going on, Mac. My SMT gave about 26%.
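The two SMT figures being compared come from different steps on each CPU: the step where the thread count first exceeds the core count. A short Python sketch of that comparison follows. It assumes that a test with as many threads as cores runs one thread per core, which depends on the scheduler and is not confirmed by the thread.

```python
def smt_uplift_percent(ratio):
    """Convert a score ratio such as 1.26x into a percentage gain."""
    return round((ratio - 1) * 100, 1)

# Ratios quoted in the thread for the step from one thread per core
# to two threads per core on each CPU.
smt_step = {
    "5800H (8 cores, 8 to 16 threads)": 1.17,
    "5950X (16 cores, 16 to 32 threads)": 1.26,
}

for cpu, ratio in smt_step.items():
    print(f"{cpu}: {smt_uplift_percent(ratio)}% SMT uplift")
```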

mackerel

Member

3,863 messages 586 likes

Johan45 said: "Maybe you do have some throttling going on, Mac. My SMT gave about 26%."

I forgot something important: CPUs will generally boost differently depending on how many cores are in use. That is, 1 active core will generally run at a higher clock than 2, and so on. That might better explain the not-so-ideal scaling I saw. Also, I need to monitor the CPU power during these runs. I might be hitting a power limit at higher core usage. I don't have a second monitor on the laptop, so doing this live isn't practical. I know there is overlay software, but I'm not set up for that.

Also mine is the mobile Zen 3, so I only have half the L3 cache/core compared to the desktop versions.

Once I'm home it will be easier to test these by using fixed clocks which takes out a bunch of variables.
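Until fixed-clock results are available, the effect of boost clocks on a scaling ratio can be roughly estimated. The Python sketch below assumes the score is proportional to clock speed, which ignores cache and memory effects. The 1.69x ratio is from the earlier post, while both clock values are hypothetical placeholders, not measurements.

```python
def clock_normalised_ratio(score_ratio, clock_before_mhz, clock_after_mhz):
    """Scale a score ratio by the clock drop between two thread levels.

    Assumption: score is proportional to clock speed, which ignores
    memory and cache effects.
    """
    return score_ratio * (clock_before_mhz / clock_after_mhz)

# 4 -> 8 thread ratio from the earlier post; clocks are made-up examples.
print(round(clock_normalised_ratio(1.69, 4200, 3800), 2))
```

If the clock drops as more cores become active, the normalised ratio comes out higher than the raw one, which would account for part of the gap to ideal 2x scaling.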

dejo

Senior Moment Senior Member

4,163 messages 73 likes

Looking at the discrepancies in scores vs. threads, you can see that from 1 core to 2, then to 4, then to 8, they do scale closely. What you can also see is that 1 and 2 cores seem to run at higher clocks than 4 and 8 cores. Once over 8 cores on the CPUs that have been shown in this thread, a thread vs. a core has a substantially different capability. I have heard it said that a thread is only about 30% as capable as a core.

3dcpu.jpg

I will also state that this is my R7 5700G with PBO in a mini-ITX case.

Brutal-Force

Member

2,331 messages 7 likes

The 5950X stomps the i9-10980XE :(

CPU Profile v1.PNG

mackerel

Member

3,863 messages 586 likes

dejo said: "I have heard it said that a thread is only about 30% as capable as a core."

Presuming you're talking about having SMT/HT vs. not, it varies a lot depending on the workload. For Prime95 it is 0%. Prime95 does not benefit from SMT, and enabling it actually makes things worse because power consumption increases. I forget the exact number, but Cinebench R15/R20 is around 30%, so maybe that's where the number comes from. However, this is a relatively good case. I don't know what the average is if you were to pick a lot of varied workloads, but I'd expect it to be lower than 30%.

There are some interesting outliers. I've seen 50% twice: once in a long-retired distributed computing project, and once in the Blender Ryzen benchmark with the software version at the time. The all-time record that I'm aware of is in some of the subtests of the 3DPM base version. That isn't heavily optimised, but I saw some 80% uplift on it. The AVX-512 optimised version isn't public as far as I'm aware, but it gained massive increases from SMT. I had thought of SMT as improving the use of execution resources, but my new understanding is that getting data around optimally can be a limit, and that's probably what is going on in this particular case.

dejo

Senior Moment Senior Member

4,163 messages 73 likes

You can see the change when you have no more cores and have to rely on HT. Look at the R9 5950X vs. my 5700G: mine scales pretty much as expected until I run out of cores at 8, whereas the 5950X makes a nice jump even at 16 threads.

EarthDog

Gulper Nozzle Co-Owner

76,285 messages 3,035 likes

What you see is normal: 30-50% depending on workload. P95 isn't something people play, so scaling there may not apply anywhere else. :)

BugFreak
2,406 messages 628 likes

Always love new benchmarks! Here is mine on the 11700k.
cpuproifile.png

mackerel

Member

3,863 messages 586 likes

EarthDog said: "What you see is normal: 30-50% depending on workload. P95 isn't something people play, so scaling there may not apply anywhere else. :)"

People don't play Cinebench either. You can equally say Cinebench doesn't represent scaling elsewhere. They are different niches. If you were to average a wide variety of workloads, I'd bet the benefit is less than 30%. The implication is that most software scales less well than Cinebench. 50% can happen but is rare. Prime95 is heavily optimised to make use of execution resources, so it doesn't need SMT to extract the available performance.

An example of my previous testing is at the link below. You may note that the tests chosen are mostly synthetic or niche compute, so arguably that won't represent everything either.

https://linustechtips.com/topic/985591-skylake-vs-zen-vs-zen-htsmt/
