AMD has had much success this past year with their fully redesigned Zen CPU core. First, they gave us Ryzen and our first look at AMD’s building block technique which uses CPU Complexes (CCX), the cornerstone of their design, and the Infinity Fabric which ties these “blocks” together. This approach allows AMD to easily “stack” these four-core, eight-thread CCXs together to increase core count, as in the Threadripper CPU with as many as 16 cores and 32 threads, or, as in this case, to attach a CCX to a Radeon Vega graphics core and renew their APU lineup.
Today I have the AMD Ryzen 5 2400G and the Ryzen 3 2200G on the test bench. These are AMD’s all-new, revised APUs based on the Zen architecture and Radeon graphics processing, code-named Raven Ridge. AMD has found through independent research that PCs sold without a discrete graphics card make up 30% of the market, and that an APU from the Ryzen family would be ideally suited to this segment.
With suggested pricing of $169.00 for the Ryzen 5 2400G and $99.00 for the Ryzen 3 2200G, AMD has set a compelling price point that does not require a dedicated GPU. In AMD’s testing, the Ryzen 5 2400G APU often compares favorably to $75 dedicated GPUs, making this APU a wise choice when it comes to performance per dollar for price-conscious consumers. These two APUs will ultimately replace the Ryzen 5 1400 and Ryzen 3 1200 with similar or lower suggested pricing, higher base and boost clocks, and integrated graphics. It’s a natural progression.
Specifications and Features
Looking at the specifications table below, the 2400G is a quad-core with SMT for a total of eight threads. This total core/thread count comes from the use of a single CPU Complex (CCX) with SMT active. The base clock comes in at 3.6 GHz and will boost to 3.9 GHz on AMD’s improved Precision Boost 2 technology (more on this later). The 2400G APU also includes 11 Radeon Vega Compute Units clocked up to 1250 MHz. The 2200G is also built on one CCX for four cores but SMT isn’t active on this SKU. The base clock comes in at 3.5 GHz and will boost to 3.7 GHz with Precision Boost 2 technology and also includes eight Radeon Vega Compute Units clocked up to 1100 MHz.
Both are produced on AMD’s 14 nm FinFET process with a TDP (Thermal Design Power) of 65 W. The cooling medium between the die and IHS is TIM (Thermal Interface Material) instead of solder; AMD chose this method to keep production costs down and pricing competitive.
There are benefits to using a single CCX, such as a lower cost and a more compact size, which makes it more suitable for desktop as well as mobile solutions. This also leads to improved latency over a two-CCX CPU, but there are some drawbacks. This move reduces the L3 cache from 8 MB to 4 MB, which AMD has offset with higher CPU clocks. The new CPU package also allows Raven Ridge to officially support JEDEC DDR4-2933, the highest official memory clock of any consumer processor.
Regarding PCI Express (PCIe) support, Raven Ridge offers a total of 24 lanes out of the CPU with eight dedicated to graphics and eight for general use such as M.2 PCIe NVMe (four of those eight dedicated to the chipset). The remainder is split up over SATA and USB 2.0, 3.1 and 3.1 Gen2 functionality. AMD’s decision to reduce the dedicated graphics PCIe lanes from sixteen to eight is based on the mid-range GPUs and workloads likely to be paired with the APU. The upside is that the chip is simpler to manufacture, allowing AMD to reduce consumer costs.
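To put the move from sixteen to eight graphics lanes in perspective, here is a rough back-of-the-envelope calculation. It assumes PCIe 3.0 signalling (8 GT/s per lane with 128b/130b encoding), and the function name is purely illustrative; real-world throughput will be somewhat lower due to protocol overhead.

```python
# Rough one-direction bandwidth of a PCIe 3.0 link.
# Assumption: PCIe 3.0 signalling, 8 GT/s per lane, 128b/130b line encoding.

TRANSFERS_PER_SECOND = 8e9   # per lane
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(lanes: int) -> float:
    """Approximate usable bandwidth in GB/s for a link with the given lane count."""
    bits_per_second = TRANSFERS_PER_SECOND * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9


for lanes in (16, 8):
    print(f"x{lanes}: ~{link_bandwidth_gbs(lanes):.2f} GB/s")
```

Under these assumptions an x8 link still offers a little under 8 GB/s in each direction, which is the headroom AMD is betting mid-range cards will not exhaust.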
Windows 10 is the officially supported platform for the Ryzen APUs. At this point, it’s unclear whether any legacy operating systems such as Windows 7 will be supported.
|APU||AMD Ryzen 5 2400G||AMD Ryzen 3 2200G|
|# of Cores||4||4|
|# of Threads||8||4|
|Base Clock Speed||3.6 GHz||3.5 GHz|
|Boost Clock Speed||3.9 GHz||3.7 GHz|
|Instruction Set Extensions||SSE 4.1/4.2/4a, AVX2, SHA||SSE 4.1/4.2/4a, AVX2, SHA|
|Lithography||14 nm FinFET||14 nm FinFET|
|Transistor Count||4.94 billion||4.94 billion|
|TDP||65 W||65 W|
|Thermal Solution Spec||Traditional nonmetallic TIM||Traditional nonmetallic TIM|
|Integrated Graphics||11 Radeon Vega CUs Up to 1250 MHz||8 Radeon Vega CUs Up to 1100 MHz|
|L1 Cache||64 KB I-Cache, 32 KB D-Cache per Core||64 KB I-Cache, 32 KB D-Cache per Core|
|L2 Cache||2 MB (512 KB per core)||2 MB (512 KB per core)|
|L3 Cache||4 MB Shared||4 MB Shared|
|Max Memory Size||128 GB||128 GB|
|# of Memory Channels||2||2|
|ECC Memory Support||No||No|
The table below lists the Ryzen APU desktop lineup equipped with AMD’s new Radeon Vega graphics. In it, we see the Ryzen 5 2400G at the top with its four-core, eight-thread configuration and 11 Radeon Vega compute units, followed by the Ryzen 3 2200G with four cores, four threads, and eight Radeon Vega compute units. Both CPUs are overclockable, assuming you buy a motherboard with a chipset capable of doing so.
|AMD Ryzen APU Model||Cores/Threads||Base Clock||Boost Clock||L3 Cache||Cooler Included||Graphics||TDP|
|Ryzen 5 2400G||4/8||3.6 GHz||3.9 GHz||4 MB||Wraith Spire||11 CU 1250 MHz||65 W|
|Ryzen 3 2200G||4/4||3.5 GHz||3.7 GHz||4 MB||Wraith Spire||8 CU 1100 MHz||65 W|
AMD SenseMI Technology
The following information was provided by AMD.
First and foremost, it is important to understand that each AMD Ryzen processor has a distributed “smart grid” of interconnected sensors that are accurate to 1 mA, 1 mV, 1 mW, and 1 °C with a polling rate of 1000/sec. These sensors generate vital telemetry data that feed into the Infinity Fabric control loop, and the control loop is empowered to make real-time adjustments to AMD Ryzen processor’s behavior based on current and expected future operating conditions.
AMD SenseMI is a package of five related “senses” that rely on sophisticated learning algorithms and/or the command-and-control functionality of the Infinity Fabric to empower AMD Ryzen processors with Machine Intelligence (MI). This intelligence is used to fine-tune the performance and power characteristics of the cores, manage speculative cache fetches, and perform AI-based branch prediction.
- Pure Power
The distributed network of smart sensors that drive Precision Boost can do double duty to streamline processor power consumption with any given workload. And for next-level brilliance: telemetry data from the Pure Power optimization loop allows each AMD Ryzen processor to inspect the unique characteristics of its own silicon to extract individualized power management.
- Precision Boost 2
After the unveiling of Precision Boost and the AMD Ryzen desktop processor, AMD has observed scenarios where 3+ cores are in use, yet the overall size of the workload is relatively small. This creates a scenario where the “all core boost” state is triggered, even though there is no imminent electrical, thermal, or utilization boundary that would practically halt further clock speed increases. This scenario represents additional opportunity to drive higher performance. The thermal, electrical, and utilization headroom of the product can be converted into higher clock speeds to capitalize on the opportunity. Precision Boost 2 carries forward the 25 MHz granularity of its predecessor, but importantly transitions to an algorithm that will intelligently pursue the highest possible frequency until an aforementioned limit is encountered, or the rated frequency of the part is met (whichever comes first). This applies to any number of threads in flight, without arbitrary limits. Precision Boost 2 could be described as opportunistic, linear, or graceful, and a conceptual comparison of Precision Boost 1 vs. 2 has been plotted for clarity below.
If a hardware limit is encountered, Precision Boost 2 is designed to level off and employ its granular clock selection to dither at a small range of frequencies circa the leveling off point. This process is a continuous adjustment loop managed by the AMD Infinity Fabric, and it cycles up to 1000 times per second. A real-world example of this is shown below with OCCT, where the boost gracefully transitions across one to eight threads and then maintains a max-thread clock speed well above the base. Taken as a whole, Precision Boost 2 invests the AMD Ryzen Processor with Radeon Vega Graphics with greater performance in real-world multi-threaded applications by freeing the CPU to make the most performant clock selection for its defined electrical/thermal/load/frequency capacity, regardless of the number of threads in flight.
- Neural Net Prediction
A true AI inside every AMD Ryzen processor harnesses a neural network to do real-time learning of an application’s behavior and speculate on its next moves. The predictive AI readies vital CPU instructions so the processor is always primed to tackle a new workload.
- Smart Prefetch
Sophisticated learning algorithms understand the internal patterns and behaviors of applications and anticipate what data will be needed for fast execution in the future. Smart Prefetch predictively pre-loads that data into large caches on the AMD Ryzen processor to enable fast and responsive computing.
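The Precision Boost 2 behavior described above (step up in 25 MHz increments until a limit or the rated frequency is reached, then dither around the leveling-off point) can be pictured with a toy loop. This is only a conceptual sketch, not AMD's algorithm: the `has_headroom` flag and the `limit_mhz` threshold are hypothetical stand-ins for the thermal, electrical, and utilization telemetry the real control loop reads.

```python
STEP_MHZ = 25  # clock granularity quoted for Precision Boost 2


def next_clock(current_mhz: int, rated_max_mhz: int, has_headroom: bool) -> int:
    """One iteration of a toy boost loop.

    has_headroom is a hypothetical flag that folds the thermal, electrical,
    and utilization limits into a single yes/no answer.
    """
    if has_headroom:
        # Climb one step, but never beyond the rated boost frequency.
        return min(current_mhz + STEP_MHZ, rated_max_mhz)
    # A limit was hit: back off one step.
    return current_mhz - STEP_MHZ


def simulate(base_mhz: int, rated_max_mhz: int, limit_mhz: int, iterations: int):
    """Run the loop; limit_mhz is an assumed clock at which a limit is reached."""
    clock = base_mhz
    trace = []
    for _ in range(iterations):
        clock = next_clock(clock, rated_max_mhz, has_headroom=clock < limit_mhz)
        trace.append(clock)
    return trace


# Example: 2400G-like base/boost clocks with an assumed limit at 3800 MHz.
print(simulate(3600, 3900, 3800, 12))
```

With the assumed limit at 3800 MHz the trace climbs in 25 MHz steps and then alternates between 3775 and 3800 MHz; with no limit below the rated clock it simply settles at 3900 MHz.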
SMT (Simultaneous Multi-Threading)
This is AMD’s new equivalent to Intel’s HyperThreading (HT) technology. It allows each core to function as two threads, adding performance in multi-threaded applications.
Every Processor is Unlocked
AMD is allowing overclocking on all CPU models, much as they have in the past. The only caveat this time around is you must have a motherboard with a chipset supporting overclocking (X370, B350, or X300).
The “Zen” X86 Microarchitecture
On the performance side, the Zen micro-architecture represents a quantum leap in core execution capability versus AMD’s previous desktop designs. Notably, the Zen architecture features a 1.75x larger instruction scheduler window and 1.5x greater issue width and resources; this change allows Zen to schedule and send more work into the execution units. Further, a micro-op cache allows Zen to bypass L2 and L3 cache when utilizing frequently-accessed micro-operations. Zen also gains a neural network-based branch prediction unit which allows the Zen architecture to be more intelligent about preparing optimal instructions and pathways for future work. Finally, products based on the Zen architecture may optionally utilize SMT to increase utilization of the compute pipeline by filling app-created pipeline bubbles with meaningful work.
A high-performance engine requires fuel, and the Zen architecture’s throughput characteristics deliver in this regard. Chief amongst the changes are major revisions to cache hierarchy with a dedicated 64 KB L1 instruction cache and 32 KB L1 data cache per core, 512 KB dedicated L2 cache per core, and 8 MB of L3 cache shared across four cores. This cache is augmented with a sophisticated learning prefetcher that speculatively harvests application data into the caches so they are available for immediate execution. Altogether, these changes establish lower level cache nearer to the core netting up to 5x greater cache bandwidth into a core.
Beyond adopting the more power efficient 14 nm FinFET process, the Zen architecture specifically utilizes the density-optimized version of the GlobalFoundries 14 nm FinFET process. This permits smaller die sizes and lower operating voltages across the complete power/performance curve. The Zen architecture also incorporates AMD’s latest low power design methodologies, such as the previously mentioned micro-op cache to reduce power-intensive faraway fetches, aggressive clock gating to zero out dynamic power consumption in minimally utilized regions of the core, and a stack engine for low-power address generation into the dispatcher.
It is in this realm, especially, that the power management wisdom of AMD’s APU teams shines through to impart in Zen the ability to scale from low-wattage mobile to HEDT configurations.
Scalability in the Zen architecture starts with the CPU Complex (CCX), a natively four-core, eight-thread module. Each CCX has 64 KB L1 I-Cache and 32 KB L1 D-Cache per core, 512 KB dedicated L2 cache per core, and 8 MB L3 cache shared across cores. Each core within the CCX may optionally feature SMT for additional multi-threaded capabilities.
More than one CCX can be present in a Zen-based product, wherein the AMD Ryzen processor features two CCX’s consisting of eight cores and 16 threads (total). Individual cores within the CCX may be disabled by AMD, and the CCX’s communicate across the high-speed Infinity Fabric. This modular design allows AMD to scale core, thread, and cache quantities as necessary to target the full spectrum of the client, server, and HPC markets.
- Infinity Fabric
The Infinity Fabric, meanwhile, is a flexible and coherent interface/bus that allows AMD to quickly and efficiently integrate a sophisticated IP portfolio into a cohesive die. These assembled pieces can utilize the Infinity Fabric to exchange data between CCX’s, system memory, and other controllers (e.g. memory, I/O, PCIe) present on the AMD Ryzen SoC design. The Infinity Fabric also gives Zen architecture powerful command and control capabilities, establishing a sensitive feedback loop that allows for real-time estimations and adjustments to core voltage, temperature, socket power draw, clock speed, and more. This command and control functionality is instrumental to AMD SenseMI technology.
The Vega Graphics Architecture
Seventeen years since the introduction of the first Radeon, the usage model for graphics processors continues to expand, both within the realm of visual computing and beyond. AMD’s customers are employing GPUs to tackle a diverse set of workloads spanning from machine learning to professional visualization and virtualized hosting–and into new fields like virtual reality. Even traditional gaming constantly pushes the envelope with cutting-edge visual effects and unprecedented levels of visual fidelity in the latest games. Along the way, the data sets to be processed in these applications have mushroomed in size and complexity. The processing power of GPUs has multiplied to keep pace with the needs of emerging workloads, but the throughput of nearly all types of high-performance processors has been increasingly gated by power consumption.
With these needs in mind, the Radeon Technologies Group set out to build a new architecture known as Vega. Vega is the most sweeping change to AMD’s core graphics technology since the introduction of the first GCN-based chips five years ago. The Vega architecture is intended to meet today’s needs by embracing several principles: flexible operation, support for large data sets, improved power efficiency, and extremely scalable performance. Vega introduces a host of innovative features in pursuit of this vision, which we’ll describe in the following pages. This new architecture promises to revolutionize the way GPUs are used in both established and emerging markets by offering developers new levels of control, flexibility, and scalability.
Next Generation Geometry
To meet the needs of both professional graphics and gaming applications, the geometry engines in Vega have been tuned for higher polygon throughput by adding new fast paths through the hardware and by avoiding unnecessary processing. This next-generation geometry (NGG) path is much more flexible and programmable than before.
To highlight one of the innovations in the new geometry engine, primitive shaders are a key element in its ability to achieve much higher polygon throughput per transistor. Previous hardware mapped quite closely to the standard Direct3D rendering pipeline, with several stages including input assembly, vertex shading, hull shading, tessellation, domain shading, and geometry shading. Given the wide variety of rendering technologies now being implemented by developers, however, including these stages isn’t always the most efficient way of doing things. Each stage has various restrictions on inputs and outputs that may have been necessary for earlier GPU designs, but such restrictions aren’t always needed on today’s more flexible hardware.
Vega’s new primitive shader support allows some parts of the geometry processing pipeline to be combined and replaced with a new, highly efficient shader type. These flexible, general-purpose shaders can be launched very quickly, enabling more than four times the peak primitive cull rate per clock cycle.
In a typical scene, around half of the geometry will be discarded through various techniques such as frustum culling, back-face culling, and small-primitive culling. The faster these primitives are discarded, the faster the GPU can start rendering the visible geometry. Furthermore, traditional geometry pipelines discard primitives after vertex processing is completed, which can waste computing resources and create bottlenecks when storing a large batch of unnecessary attributes. Primitive shaders enable early culling to save those resources.
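To make the culling idea concrete, here is a small software sketch of two of the techniques named above, back-face and small-primitive culling, applied to 2D screen-space triangles. It is an illustration only: the counter-clockwise-is-front convention and the area threshold are assumptions, and real hardware tests sample coverage rather than a simple area cutoff.

```python
def signed_area(tri):
    """Signed area of a 2D triangle; positive for counter-clockwise winding."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def cull(triangles, min_area=0.5):
    """Return only the triangles that survive both culling tests.

    Assumptions: counter-clockwise triangles face the viewer, and anything
    smaller than min_area (in pixels squared) is treated as too small to matter.
    """
    visible = []
    for tri in triangles:
        area = signed_area(tri)
        if area <= 0:
            continue  # back-facing or degenerate
        if area < min_area:
            continue  # small-primitive cull (simplified)
        visible.append(tri)
    return visible


triangles = [
    ((0, 0), (4, 0), (0, 4)),      # front-facing, large
    ((0, 0), (0, 4), (4, 0)),      # same triangle with reversed winding
    ((0, 0), (0.5, 0), (0, 0.5)),  # front-facing but tiny
]
print(len(cull(triangles)), "of", len(triangles), "triangles kept")
```

Only the first triangle survives here; the earlier such rejections happen in the pipeline, the less attribute data has to be computed and stored for geometry that never reaches the screen.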
Primitive shaders can operate on a variety of different geometric primitives, including individual vertices, polygons, and patch surfaces. When tessellation is enabled, a surface shader is generated to process patches and control points before the surface is tessellated, and the resulting polygons are sent to the primitive shader. In this case, the surface shader combines the vertex shading and hull shading stages of the Direct3D graphics pipeline, while the primitive shader replaces the domain shading and geometry shading stages.
Geometry Engine Load Balancing with NGG
Primitive shaders have many potential uses beyond high-performance geometry culling. Shadow-map rendering is another ubiquitous process in modern engines that could benefit greatly from the reduced processing overhead of primitive shaders. We can envision even more uses for this technology in the future, including deferred vertex attribute computation, multi-view/multi-resolution rendering, depth pre-passes, particle systems, and full-scene graph processing and traversal on the GPU.
Primitive shaders will coexist with the standard hardware geometry pipeline rather than replacing it. In keeping with Vega’s new cache hierarchy, the geometry engine can now use the on-chip L2 cache to store vertex parameter data. This arrangement complements the dedicated parameter cache, which has doubled in size relative to the prior generation Polaris architecture. This caching setup makes the system highly tunable and allows the graphics driver to choose the optimal path for any use case. Combined with high-speed HBM2 memory, these improvements help to reduce the potential for memory bandwidth to act as a bottleneck for geometry throughput.
Another innovation of Vega’s NGG is improved load balancing across multiple geometry engines. An intelligent workload distributor (IWD) continually adjusts pipeline settings based on the characteristics of the draw calls it receives to maximize utilization.
One factor that can cause geometry engines to idle is context switching. Context switches occur whenever the engine changes from one render state to another, such as when changing from a draw call for one object to that of a different object with different material properties. The amount of data associated with render states can be quite large, and GPU processing can stall if it runs out of available context storage. The IWD seeks to avoid this performance overhead by avoiding context switches whenever possible.
Some draw calls also include many small instances (i.e., they render many similar versions of a simple object). If an instance does not include enough primitives to fill a wavefront of 64 threads, then it cannot take full advantage of the GPU’s parallel processing capability, and some proportion of the GPU’s capacity goes unused. The IWD can mitigate this effect by packing multiple small instances into a single wavefront, providing a substantial boost to utilization.
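A quick calculation shows why packing matters. The sketch below compares wavefront utilization when every instance gets its own wavefront against an idealized case where instances are packed back to back. The function names are illustrative, and the packed case is an upper bound; it is not a description of how the IWD actually schedules work.

```python
import math

WAVEFRONT_SIZE = 64  # threads per wavefront, as stated above


def utilization_unpacked(instance_sizes):
    """Fraction of lanes doing useful work when each instance gets its own wavefront(s)."""
    used = sum(instance_sizes)
    waves = sum(math.ceil(n / WAVEFRONT_SIZE) for n in instance_sizes)
    return used / (waves * WAVEFRONT_SIZE)


def utilization_packed(instance_sizes):
    """Idealized utilization when instances are packed back to back into wavefronts."""
    used = sum(instance_sizes)
    waves = math.ceil(used / WAVEFRONT_SIZE)
    return used / (waves * WAVEFRONT_SIZE)


# Example: 16 instances of a simple object with 12 primitives each.
sizes = [12] * 16
print(f"unpacked: {utilization_unpacked(sizes):.1%}")
print(f"packed:   {utilization_packed(sizes):.1%}")
```

In this example the unpacked case leaves more than four fifths of the lanes idle, while the packed case fills three wavefronts completely.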
Next Generation Compute Unit (NCU) with Rapid Packed Math
GPUs today often use more mathematical precision than necessary for the calculations they perform. Years ago, GPU hardware was optimized solely for processing the 32-bit floating point operations that had become the standard for 3D graphics. However, as rendering engines have become more sophisticated, and as the range of applications for GPUs has extended beyond graphics processing, the value of data types beyond FP32 has grown.
The programmable compute units (Figure 7) at the heart of Vega GPUs have been designed to address this changing landscape with the addition of a feature called Rapid Packed Math. Support for 16-bit packed math doubles peak floating-point and integer rates relative to 32-bit operations. It also halves the register space as well as the data movement required to process a given number of operations. The new instruction set includes a rich mix of 16-bit floating point and integer instructions, including FMA, MUL, ADD, MIN/MAX/MED, bit shifts, packing operations, and many more.
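The register-space claim is easy to see in software: two FP16 values occupy the same 32 bits as one FP32 value. The sketch below emulates the idea with Python's `struct` module (the `e` format code is IEEE half precision). The helper names are made up for illustration, and this models only the data layout, not the NCU's actual instructions.

```python
import struct


def pack_half2(a: float, b: float) -> int:
    """Pack two FP16 values into a single 32-bit word."""
    return int.from_bytes(struct.pack("<2e", a, b), "little")


def unpack_half2(word: int):
    """Recover the two FP16 lanes from a 32-bit word."""
    return struct.unpack("<2e", word.to_bytes(4, "little"))


def packed_add(w0: int, w1: int) -> int:
    """Emulate one packed add: both 16-bit lanes are processed in a single operation."""
    a0, b0 = unpack_half2(w0)
    a1, b1 = unpack_half2(w1)
    return pack_half2(a0 + a1, b0 + b1)


x = pack_half2(1.5, 2.0)
y = pack_half2(0.5, 4.0)
print(unpack_half2(packed_add(x, y)))
```

Each call to `packed_add` handles two values per 32-bit operand, which is where the doubled peak rate and halved register footprint come from.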
For applications that can leverage this capability, Rapid Packed Math can provide a substantial improvement in compute throughput and energy efficiency. In the case of specialized applications like machine learning and training, video processing, and computer vision, 16-bit data types are a natural fit, but there are benefits to be had for more traditional rendering operations, as well. Modern games, for example, use a wide range of data types in addition to the standard FP32. Normal/direction vectors, lighting values, HDR color values, and blend factors are some examples of where 16-bit operations can be used.
With mixed-precision support, Vega can accelerate the operations that don’t benefit from higher precision while maintaining full precision for the ones that do. Thus, the resulting performance increases need not come at the expense of image quality.
In addition to Rapid Packed Math, the NCU introduces a variety of new 32-bit integer operations that can improve performance and efficiency in specific scenarios. These include a set of eight instructions to accelerate memory address generation and hashing functions (commonly used in cryptographic processing and cryptocurrency mining), as well as new ADD/SUB instructions designed to minimize register usage.
The NCU also supports a set of 8-bit integer SAD (Sum of Absolute Differences) operations. These operations are important for a wide range of video and image processing algorithms, including image classification for machine learning, motion detection, gesture recognition, stereo depth extraction, and computer vision. The QSAD instruction can evaluate 16 4×4-pixel tiles per NCU per clock cycle and accumulate the results in 32-bit or 16-bit registers. A maskable version (MQSAD) can provide further optimization by ignoring background pixels and focusing computation on areas of interest in an image.
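For readers unfamiliar with SAD, the operation itself is simple. The following sketch computes it over a pair of 4×4 tiles of 8-bit pixels, with an optional mask to mimic the idea behind the maskable variant. The mask convention used here (1 keeps a pixel, 0 ignores it) is an assumption for illustration and is not taken from the instruction's actual definition.

```python
def sad(tile_a, tile_b, mask=None):
    """Sum of absolute differences between two equally sized tiles.

    mask is an optional tile of 0/1 values; pixels with a 0 are skipped.
    This masking convention is a hypothetical illustration only.
    """
    total = 0
    for r, (row_a, row_b) in enumerate(zip(tile_a, tile_b)):
        for c, (pa, pb) in enumerate(zip(row_a, row_b)):
            if mask is None or mask[r][c]:
                total += abs(pa - pb)
    return total


reference = [[10] * 4 for _ in range(4)]
candidate = [[12] * 4 for _ in range(4)]
top_row_only = [[1] * 4] + [[0] * 4 for _ in range(3)]

print(sad(reference, candidate))                # all 16 pixels
print(sad(reference, candidate, top_row_only))  # 4 pixels
```

A low SAD means two tiles look alike, which is why the operation shows up in motion detection and stereo matching: the best match for a tile is the candidate with the smallest sum.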
Revised Pixel Engine
As ultra-high resolution and high-refresh displays become more widespread, maximizing pixel throughput is becoming more important. Monitors with 4K+ resolutions and refresh rates up to 240 Hz are dramatically increasing the demands on today’s GPUs. The pixel engines in the Vega architecture are built to tackle these demands with an array of new features.
The Draw-Stream Binning Rasterizer (DSBR) is an important innovation to highlight. It has been designed to reduce unnecessary processing and data transfer on the GPU, which helps both to boost performance and to reduce power consumption. The idea was to combine the benefits of a technique already widely used in handheld graphics products (tiled rendering) with the benefits of immediate-mode rendering used in high-performance PC graphics.
Standard immediate-mode rendering works by rasterizing each polygon as it is submitted until the whole scene is complete, whereas tiled rendering works by dividing the screen into a grid of tiles and then rendering each tile independently.
The DSBR works by first dividing the image to be rendered into a grid of bins or tiles in screen space and then collecting a batch of primitives to be rasterized in the scan converter. The bin and batch sizes can be adjusted dynamically to optimize for the content being rendered. The DSBR then traverses the batched primitives one bin at a time, determining which ones are fully or partially covered by the bin. Geometry is processed once, requiring one clock cycle per primitive in the pipeline. There are no restrictions on when binning can be enabled, and it is fully compatible with tessellation and geometry shading.
This design economizes off-chip memory bandwidth by keeping all the data necessary to rasterize geometry for a bin in fast on-chip memory (i.e., the L2 cache). The data in off-chip memory only needs to be accessed once and can then be reused before moving on to the next bin. Vega uses a relatively small number of tiles, and it operates on primitive batches of limited size compared with those used in previous tile-based rendering architectures. This setup keeps the costs associated with clipping and sorting manageable for complex scenes while delivering most of the performance and efficiency benefits.
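The binning step described above can be sketched in a few lines. This version assigns each primitive in a batch to every screen-space bin its bounding box touches, which is a conservative simplification; the fixed bin size and the function name are assumptions, whereas the hardware adjusts bin and batch sizes dynamically.

```python
import math
from collections import defaultdict


def bin_primitives(primitives, bin_size):
    """Map each bin (bx, by) to the indices of primitives whose bounding box overlaps it.

    primitives: list of triangles, each a sequence of (x, y) screen-space vertices.
    bin_size: assumed square bin edge length in pixels.
    """
    bins = defaultdict(list)
    for index, tri in enumerate(primitives):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        first_bx, last_bx = math.floor(min(xs) / bin_size), math.floor(max(xs) / bin_size)
        first_by, last_by = math.floor(min(ys) / bin_size), math.floor(max(ys) / bin_size)
        for by in range(first_by, last_by + 1):
            for bx in range(first_bx, last_bx + 1):
                bins[(bx, by)].append(index)
    return dict(bins)


batch = [
    [(1, 1), (10, 1), (1, 10)],       # fits inside one 32x32 bin
    [(20, 20), (40, 20), (20, 40)],   # straddles four bins
]
for bin_id, members in sorted(bin_primitives(batch, 32).items()):
    print(bin_id, members)
```

Once primitives are grouped this way, everything needed to rasterize one bin can stay in on-chip memory until that bin is finished.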
Pixel shading can also be deferred until an entire batch has been processed so that only visible foreground pixels need to be shaded. This deferred step can be disabled selectively for batches that contain polygons with transparency. Deferred shading reduces unnecessary work by reducing overdraw (i.e., cases where pixel shaders are executed multiple times when different polygons overlap a single screen pixel).
Deferred pixel processing works by using a scoreboard for color samples prior to executing pixel shaders on them. If a later sample occludes or overwrites an earlier sample, the earlier sample can be discarded before any pixel shading is done on it. The scoreboard has limited depth, so it is most powerful when used in conjunction with binning.
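As a rough software analogy for the scoreboard, the sketch below keeps only the nearest sample per pixel across a batch, so that shading would run once per visible pixel instead of once per submitted sample. It assumes opaque geometry and that a smaller depth value is nearer; the data layout and names are hypothetical, and the hardware's limited scoreboard depth is not modeled.

```python
def resolve_visible(samples):
    """Return {pixel: primitive_id} for the nearest sample at each pixel.

    samples: iterable of (pixel, depth, primitive_id) in submission order.
    Assumption: opaque geometry, smaller depth means nearer to the viewer.
    """
    scoreboard = {}
    for pixel, depth, prim in samples:
        best = scoreboard.get(pixel)
        if best is None or depth < best[0]:
            scoreboard[pixel] = (depth, prim)  # earlier, farther sample is discarded
    return {pixel: prim for pixel, (_, prim) in scoreboard.items()}


def shader_invocations_saved(samples):
    """How many pixel shader runs are avoided compared with shading every sample."""
    samples = list(samples)
    return len(samples) - len(resolve_visible(samples))


batch = [
    ((0, 0), 0.9, "A"),
    ((0, 0), 0.2, "B"),
    ((1, 0), 0.5, "A"),
    ((0, 0), 0.6, "C"),
]
print(resolve_visible(batch))
print(shader_invocations_saved(batch), "shader invocations avoided")
```

Three primitives overlap pixel (0, 0) in this batch, but only the nearest one would be shaded.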
These optimizations can significantly reduce off-chip memory traffic, boosting performance in memory-bound scenarios and reducing total graphics power consumption. In the case of Vega desktop GPUs, we observed memory bandwidth reductions of up to 33% when the DSBR is enabled for existing game applications, with no increase in power consumption.
Built for Higher GPU Clock Speeds
One of the key goals for the Vega architecture was achieving higher operating clock speeds than any prior Radeon GPU. Put simply, this effort required the design teams to close on higher frequency targets. The simplicity of that statement belies the scope of the task, though. Meeting Vega’s substantially tighter timing targets required some level of design effort for virtually every portion of the chip.
In some units, for instance in the texture decompression data path of the L1 cache, the teams added more stages to the pipeline, reducing the amount of work done in each clock cycle to meet Vega’s tighter timing targets.
Adding stages is a common means of improving the frequency tolerance of a design, but those additional stages can contribute more latency to the pipeline, potentially impacting performance. In many cases, these impacts can be minor. In our texture decompression example, the additional latency might add up to two clock cycles out of the hundreds required for a typical texture fetch—a negligible effect.
In other instances, on more performance-critical paths, the Vega project required creative design solutions to better balance frequency tolerance with per-clock performance. Take, for example, the case of the Vega NCU. The design team made major changes to the compute unit to improve its frequency tolerance without compromising its core performance.
First, the team changed the fundamental floor plan of the compute unit. In prior GCN architectures with less aggressive frequency targets, the presence of wired connections of a certain length was acceptable because signals could travel the full distance in a single clock cycle. For this architecture, some of those wire lengths had to be reduced so signals could traverse them within the span of Vega’s much shorter clock cycles. This change required a new physical layout for the Vega NCU with a floor plan optimized to enable shorter wire lengths.
This layout change alone wasn’t sufficient, though. Key internal units, like the instruction fetch and decode logic, were rebuilt with the express goal of meeting Vega’s tighter timing targets. At the same time, the team worked very hard to avoid adding stages to the most performance-critical paths. Ultimately, they could close on a design that maintains the four-stage depth of the main ALU pipeline and still meets Vega’s timing targets.
Vega also leverages high-performance custom SRAMs originally developed by the Zen CPU team. These SRAMs, modified for use in the general-purpose registers of the Vega NCU, offer improvements on multiple fronts, with 8% less delay, an 18% savings in die area, and a 43% reduction in power use versus standard compiled memories.
AMD Ryzen APU Topology
Employing the Zen, Vega, and Infinity Fabric technologies described in the previous section, the AMD Ryzen Processor with Radeon Vega Graphics employs the physical topology shown below (Figure 10). The Infinity Fabric services six unique clients representing different categories of technologies in the AMD IP portfolio. These clients are centrally monitored and managed via the data/control capabilities of the fabric.
Below is a die shot of the Raven Ridge APU compared to the Ryzen CPU structure.
Below are some images from the care package I received from AMD and the product packaging for the new Ryzen APUs. As you can see, the APU and the cooler each have their own packaging: the slender box AMD has been using for its processors for some time, and a very similar cardboard box for the cooler.
Next up are pictures of the Ryzen APU samples we have. The APUs are packaged in the usual fashion from AMD with a plastic sleeve inside the small cardboard box and include a case badge denoting Ryzen 5 or Ryzen 3 depending on the APU inside. Moving on to the Wraith Spire CPU cooler, I can say it will keep the Ryzen 5 2400G within its thermal envelope at stock speeds but that’s about as far as it goes. During stress testing, the APU would reach temperatures in the mid-eighties which is still under the throttling limit of 95 °C but dashes hopes of overclocking far on the stock cooler. It was also a bit awkward to install, but as you can see the stock TIM shows good coverage of the IHS and there was just the right amount pre-applied.
During the benchmarks, I wanted to give the APUs a fair shake in their own weight class, but the only CPU I had available with the Intel UHD 630 graphics was an 8700K, a six-core, twelve-thread CPU that retails for over twice the suggested retail price of the Ryzen 5 2400G. The i3 8350K would be the natural comparison since it retails at the same price point as the 2400G APU, but I didn’t have one and wasn’t about to buy one either.
Just to be clear, I’m trying to be as fair as possible, so I took my i7 8700K and reduced it to a four-core, four-thread CPU set at 4.0 GHz with a 3.7 GHz cache to mimic an i3 8350K as closely as possible with what I have. In the parts list I have denoted the 8350K with an asterisk and added a footnote as well, but going forward I will refer to the CPU as an i3 8350K*.
|CPU||Ryzen 5 2400G||Ryzen 3 2200G||A10-7870K||i3 8350K*|
|Motherboard||MSI B350I PRO AC||MSI B350I PRO AC||ASUS Crossblade Ranger||ASUS ROG Strix Z370-E Gaming|
|Memory||G.Skill FlareX 2×8 GB DDR4-3200 MHz 14-14-14-34||G.Skill FlareX 2×8 GB DDR4-3200 MHz 14-14-14-34||G.Skill 2×4 GB DDR3-2400 10-12-12-31||G.Skill FlareX 2×8 GB DDR4-3200 MHz 14-14-14-34|
|HDD||Samsung 120 GB 840 EVO||Samsung 120 GB 840 EVO||Samsung 120 GB 840 EVO||Samsung 120 GB 840 EVO|
|Power Supply||Super Flower 1000W Platinum||Super Flower 1000W Platinum||Super Flower 1000W Platinum||Super Flower 1000W Platinum|
|iGPU||Radeon Vega 11||Radeon Vega 8||Radeon R7 512 Shaders||Intel UHD 630|
|Cooling||AMD Wraith Spire||AMD Wraith Spire||Noctua NH-D15||Noctua NH-D15|
|OS||Windows 10 x64||Windows 10 x64||Windows 10 x64||Windows 10 x64|
*i3 8350K = i7 8700K @ 4.0 GHz with four cores and four threads, to simulate an i3 8350K
In the review package from AMD, we find parts from MSI and G.Skill. The motherboard is the MSI B350I PRO AC, a full-featured Mini-ITX AM4 board. For RAM, G.Skill supplied two FlareX DIMMs in a 2×8 GB kit rated for DDR4-3200 at 14-14-14-34. Shortly after the Ryzen launch last year, G.Skill released the FlareX and Fortis lines, which are specifically tuned for the AM4 platform. The full contents of the review package are pictured below.
The MSI B350I PRO AC is, as I said, a Mini-ITX AM4 motherboard, but don’t let its size fool you: MSI has packed a few goodies into that small real estate. The board is equipped with the B350 chipset and supports most AM4 CPUs currently available. I did notice, however, that the Ryzen 7 1800X wasn’t on the support list, likely due to TDP constraints. The two DRAM slots support up to 32 GB of DDR4 in dual channel at speeds up to 3200 MHz. There’s one PCIe 3.0 x16 slot on the board and an M.2 connector on the back which supports PCIe 3.0 x4 NVMe and SATA drives. The PCIe link widths depend on the processor installed. With the Ryzen CPUs both run at full width, but with the reduced lane count of the APUs both are cut in half, to PCIe 3.0 x8 and x2 respectively. The MSI PRO AC also has USB 3.1 Gen2 connectivity, HDMI and DisplayPort outputs, and Intel dual-band wireless/Bluetooth.
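To put the halved lane counts in perspective, here is a rough sketch of the theoretical one-way bandwidth for each link. It assumes the standard PCIe 3.0 figures of 8 GT/s per lane with 128b/130b encoding and ignores protocol overhead, so real-world throughput will be lower. The function and label names are my own.

```python
# Theoretical PCIe 3.0 bandwidth per direction.
# Assumption: 8 GT/s per lane with 128b/130b encoding, no protocol overhead.
GT_PER_S = 8.0          # giga-transfers per second, per lane
ENCODING = 128 / 130    # usable fraction after 128b/130b line encoding


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Approximate one-way bandwidth in GB/s for a PCIe 3.0 link."""
    return GT_PER_S * ENCODING / 8 * lanes  # divide by 8 to go from bits to bytes


# Link widths as described for the MSI B350I PRO AC.
links = {
    "x16 slot with a Ryzen CPU": 16,
    "x16 slot with a Ryzen APU": 8,
    "M.2 with a Ryzen CPU": 4,
    "M.2 with a Ryzen APU": 2,
}

for name, lanes in links.items():
    print(f"{name}: x{lanes} = {pcie3_bandwidth_gbs(lanes):.2f} GB/s")
```

On paper that leaves the graphics slot with a little under 8 GB/s each way and the M.2 connector with a little under 2 GB/s when an APU is installed.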
All benchmarks were run with the motherboard set to optimized defaults (outside of some memory settings which had to be configured manually). When “stock” is mentioned along with the clock speed, this includes Precision Boost 2 on the AMD Ryzen APUs. I tested this way to observe AMD’s updated Precision Boost 2 and how it manipulates the clock speeds under different loads. I’d also like to reiterate that the Intel comparison is an 8700K pared down to an 8350K performance level, set at a static speed of 4 GHz with four cores and four threads. All onboard graphics were left at stock speeds for this testing.
After the testing, I set the AMD APUs to their maximum overclock for the CPU and iGPU that I could obtain on the MSI motherboard. I did find it had some limitations with voltage so I could only go so far. This will give you an idea of the possible performance gains to be had if you choose to overclock the APU. Memory was kept at the rated speeds for the FlareX DDR4 kit.
- AIDA64 Engineer CPU, FPU, and Memory Tests
- Cinebench R11.5 and R15
- x265 1080p Benchmark (HWBOT)
- SuperPi 1M/32M
- wPrime 32M/1024M
All CPU tests were run at their default settings unless otherwise noted.
All game tests were run at 1920×1080 on low presets, and V-Sync was verified to be disabled.
- 3DMark Fire Strike
- Middle Earth: Shadow of Mordor
- Metro Last Light
- Ashes of the Singularity
- Rise of the Tomb Raider
Just a note here: I used the latest AIDA64 Engineer beta for testing, but it still doesn’t officially support the Ryzen APUs. First up is the AIDA64 cache and memory benchmark.
|AIDA64 Cache and Memory Benchmark|
|CPU||Read (MB/s)||Write (MB/s)||Copy (MB/s)||Latency (ns)|
|Ryzen 5 2400G @ 3.6 GHz||46981||47573||41578||68.8|
|Ryzen 3 2200G @ 3.5 GHz||46750||47660||41652||67.3|
|Intel i3 8350K* @ 4.0 GHz||47089||47755||42998||44.6|
|AMD A10-7870K @ 3.9 GHz||22236||12393||21413||77|
As you can see, Ryzen is working much better with RAM than it was a year ago, but latency is still quite high compared to Intel. Up next are the AIDA64 CPU benchmarks.
|AIDA64 CPU Tests|
|CPU||Queen||PhotoWorxx||ZLib||AES||Hash|
|Ryzen 5 2400G @ 3.6 GHz||46989||18366||356.4||33069||11308|
|Ryzen 3 2200G @ 3.5 GHz||30895||13768||228.1||29018||7335|
|Intel i3 8350K* @ 4.0 GHz||36091||26878||285.2||17698||4375|
|AMD A10-7870K @ 3.9 GHz||18809||9791||175.3||8685||2745|
As you can see, the four extra threads gave the 2400G a nice advantage through most of the CPU tests, and the 2200G wasn’t all that far behind. On to the last of the AIDA64 benchmarks.
|AIDA64 FPU Tests|
|CPU||VP8||Julia||Mandel||SinJulia|
|Ryzen 5 2400G @ 3.6 GHz||6170||19247||10051||6362|
|Ryzen 3 2200G @ 3.5 GHz||5786||18411||9588||4367|
|Intel i3 8350K* @ 4.0 GHz||6611||33031||18289||3391|
|AMD A10-7870K @ 3.9 GHz||3788||6240||3184||1484|
The floating point tests seem to be a bit of a weak spot for the Ryzen-based APUs. Even with the extra threads, the 2400G was left behind by the Intel CPU in all but the SinJulia test.
Real World Tests
Next, we will move on to something a bit more tangible/productivity based with compression, rendering, and encoding benchmarks.
|Cinebench R11.5/R15, POVRay, x265 (HWBot), 7Zip – Raw Data|
|CPU||Cinebench R11.5||Cinebench R15||POVRay||x265 (HWBot)||7Zip|
|Ryzen 5 2400G @ 3.6 GHz||9.23||826||1702.86||19.84||21913|
|Ryzen 3 2200G @ 3.5 GHz||6.66||585||1374.92||17.04||16115|
|Intel i3 8350K* @ 4.0 GHz||7.82||683||1665.39||27.13||19203|
|AMD A10-7870K @ 3.9 GHz||3.71||326||857.51||9.32||11912|
Here again, the extra threads gave the 2400G a bit of an advantage over the Intel CPU in all but HWBot’s x265 benchmark. The latest generation of Intel CPUs got a real boost in this benchmark compared to their predecessors.
Moving on from all the multi-threaded goodness above, we get to some Pi and prime number based tests: SuperPi and wPrime, specifically. Even though AMD isn’t particularly strong in these benchmarks, you can see there’s a vast improvement over their Steamroller counterpart.
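For anyone who wants the gaps as percentages, the sketch below works them out from the raw data. It assumes the columns follow the order given in the table title and that higher is better in every one of them. The dictionary layout and function name are purely illustrative.

```python
# Scores copied from the raw-data table above.
# Assumption: columns follow the order in the table title, higher is better.
BENCHMARKS = ["Cinebench R11.5", "Cinebench R15", "POVRay", "x265 (HWBot)", "7Zip"]
SCORES = {
    "Ryzen 5 2400G": [9.23, 826, 1702.86, 19.84, 21913],
    "Ryzen 3 2200G": [6.66, 585, 1374.92, 17.04, 16115],
    "i3 8350K*": [7.82, 683, 1665.39, 27.13, 19203],
}


def relative_to(baseline: str, chip: str) -> dict:
    """Percent difference of `chip` versus `baseline` for each benchmark."""
    return {
        bench: (score / base - 1) * 100
        for bench, score, base in zip(BENCHMARKS, SCORES[chip], SCORES[baseline])
    }


for bench, pct in relative_to("i3 8350K*", "Ryzen 5 2400G").items():
    print(f"{bench:16s} {pct:+6.1f}%")
```

Swapping in "Ryzen 3 2200G" as the second argument gives the same comparison for the cheaper APU.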
|SuperPi and wPrime Benchmarks – Raw Data|
|CPU||SuperPi 1M||SuperPi 32M||wPrime 32M||wPrime 1024M|
|Ryzen 5 2400G @ 3.6 GHz||11.066||625.307||6.425||181.54|
|Ryzen 3 2200G @ 3.5 GHz||12.114||671.124||8.863||271.828|
|Intel i3 8350K* @ 4.0 GHz||9.141||461.783||6.939||221.602|
|AMD A10-7870K @ 3.9 GHz||17.891||957.549||12.298||392.797|
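To quantify the improvement over the Steamroller-based A10-7870K, this small sketch turns the completion times into speedup factors. It assumes the table values are times in seconds, where lower is better. The names used are my own.

```python
# Completion times from the table above.
# Assumption: values are seconds, so lower is better.
TESTS = ["SuperPi 1M", "SuperPi 32M", "wPrime 32M", "wPrime 1024M"]
TIMES = {
    "Ryzen 5 2400G": [11.066, 625.307, 6.425, 181.54],
    "Ryzen 3 2200G": [12.114, 671.124, 8.863, 271.828],
    "A10-7870K": [17.891, 957.549, 12.298, 392.797],
}


def speedup(old: str, new: str) -> list:
    """How many times faster `new` finishes each test compared with `old`."""
    return [t_old / t_new for t_old, t_new in zip(TIMES[old], TIMES[new])]


for chip in ("Ryzen 5 2400G", "Ryzen 3 2200G"):
    for test, factor in zip(TESTS, speedup("A10-7870K", chip)):
        print(f"{chip} {test}: {factor:.2f}x the speed of the A10-7870K")
```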
As far as the games go, tests were done at 1080p and low presets. These APUs are meant to be an affordable all-in-one solution so gaming isn’t their primary purpose but as you’ll see it is doable with acceptable frame rates. For the gamers out there, you definitely won’t be disappointed with the performance of the Radeon Vega graphics!
As for the synthetic benchmark, 3DMark Fire Strike, you can see the results are similar to the graph above with the Vega graphics pulling in some impressive numbers for an iGPU.
Precision Boost 2
Just a few words on my observations of AMD’s improved boost function. First off, it does behave differently than the first iteration of Precision Boost. I’ll start with the Ryzen 3 2200G since the two CPUs behaved slightly differently from each other. The 2200G has a base clock of 3.5 GHz and boosts to 3.7 GHz, and from what I saw it stayed at its top boost frequency on all four cores even under heavy loads such as Cinebench R15 or HWBot x265. During single-threaded operations, it would boost one or more cores up to 3.7 GHz, but the load appeared to move between different cores in a way that almost seemed erratic. I retested with affinity set to a single core so that only that core boosted. It stayed at 3.7 GHz for the full test, but the benchmark scored the same, so the stock behavior didn’t affect the outcome.
The Ryzen 5 2400G behaved a bit differently in that it didn’t reach full boost on all cores during heavy, multithreaded loads; instead it would boost from its base speed of 3.6 GHz and hover between 3.75 GHz and 3.8 GHz. During light loads, however, it did reach its full boost clock of 3.9 GHz.
For overclocking, I switched coolers from the included Wraith Spire to the Noctua NH-D15. The 2400G was already close to its thermal limits running at stock with the stock cooler. The 2200G, on the other hand, had a bit more headroom, and I managed to do stability testing at 4.1 GHz with the included Wraith Spire cooler, which was quite surprising, to say the least. I did try running it with the Noctua cooler, but again I ran into the limit of the motherboard, which wouldn’t allow me to set the core voltage above 1.4 V. It could have been my sample, but the Ryzen 5 2400G was nearly at its limits even with improved cooling. I only managed 3.95 GHz for the CPU cores and 1350 MHz for the GPU core. It was stable at these settings, but I really didn’t get the graphics improvement I was hoping for; as you’ll see in the following results, gaming only improved by one or two frames per second. The Ryzen 3 2200G had a lot more headroom. I managed 4.1 GHz for the CPU cores and 1300 MHz for the GPU core, and only needed a slight boost to the SOC/NB voltage to get that extra 200 MHz from the APU.
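To put those overclocks in perspective, here is a quick sketch that expresses each result as a percentage over the rated maximum clock. It assumes the rated boost clocks from the specifications section as the baseline. The 2400G did not always hold its full boost under heavy load, so the real-world gain may differ. The helper name is my own.

```python
# Rated maximum clocks come from the specifications section of the review,
# overclocked values are the ones reported above. All figures in MHz.
# Assumption: the rated boost clock is a fair baseline for the comparison.
CLOCKS = {
    "Ryzen 5 2400G CPU": (3900, 3950),
    "Ryzen 5 2400G GPU": (1250, 1350),
    "Ryzen 3 2200G CPU": (3700, 4100),
    "Ryzen 3 2200G GPU": (1100, 1300),
}


def gain_percent(stock: float, overclocked: float) -> float:
    """Overclock gain as a percentage of the stock clock."""
    return (overclocked - stock) / stock * 100


for part, (stock, oc) in CLOCKS.items():
    print(f"{part}: {stock} -> {oc} MHz ({gain_percent(stock, oc):+.1f}%)")
```

Seen this way, the 2400G's CPU overclock is barely over one percent, which lines up with the small gaming gains, while the 2200G picks up roughly 11% on the CPU and 18% on the GPU.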
So let’s see how they stacked up.
Power Consumption and Temperatures
In the graph below, we tested power use of the system across multiple situations, from idle to Prime95 Small FFT (with FMA3/AVX) to 3DMark Fire Strike for a combined load. At stock, the system pulled a maximum of 104 W under CPU-only load with the 2400G, while the 2200G maxed out at 93 W; both of these results came from the Prime95 Small FFT test. Even when overclocked, the 2400G only made it up to 122 W, with 93 W for the 2200G at 4.1 GHz. Keep in mind this is full-system power usage, so these CPUs definitely sip the electricity.
Temperatures were surprisingly well controlled with the included Wraith Spire cooler, and I saw no throttling at any point. The highest temperature at stock was 85 °C with the Ryzen 5 2400G during Prime95 Small FFT. This shows that the stock cooler is adequate for the 2400G at stock settings, but it wouldn’t hold up during overclocking. For the Ryzen 3 2200G, it worked well all the way to 4.1 GHz under stress testing. Mind you, the CPU was reaching and slightly passing 90 °C during Prime95 Small FFT testing, but the stock cooler is all you would need.
Overall, the performance from AMD’s newest addition to the Ryzen stable is quite impressive. Compared to the A10-7870K, from a computing perspective there really is no contest, and the graphics performance has nearly doubled. Looking at the numbers and the dollars, the Ryzen 5 2400G compares quite favorably to the 8350K, which is in the same ballpark price-wise with an MSRP of $169.99. It has four more threads to aid in multi-threaded workloads and a pretty decent graphics processor if you feel like kicking back and doing some light gaming. Personally, I feel the real sweetheart is the $99.99 Ryzen 3 2200G, and I wouldn’t be surprised to see it show up in a lot of OEM machines in the near future.
I think AMD has hit the nail on the head this time. These two CPUs fit their intended market like a glove, offering the best of both worlds with great performance that won’t break the bank. Overclockers Approved!
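As a rough illustration of the value argument, the sketch below divides each APU's Cinebench R15 score by its suggested price. The prices are the suggested figures quoted in the introduction, the scores come from the real-world results table (assuming R15 is its second column), and picking Cinebench R15 as the yardstick is my own arbitrary choice.

```python
# Suggested prices (USD) as quoted in the introduction, Cinebench R15 scores
# from the real-world results table.
# Assumption: Cinebench R15 is a reasonable single yardstick for CPU value.
CHIPS = {
    "Ryzen 5 2400G": {"price": 169.00, "cb_r15": 826},
    "Ryzen 3 2200G": {"price": 99.00, "cb_r15": 585},
}


def points_per_dollar(chip: str) -> float:
    """Cinebench R15 points delivered per dollar of suggested price."""
    spec = CHIPS[chip]
    return spec["cb_r15"] / spec["price"]


for chip in CHIPS:
    print(f"{chip}: {points_per_dollar(chip):.2f} Cinebench R15 points per dollar")
```

By this one measure the 2200G comes out ahead, at roughly 5.9 points per dollar against roughly 4.9 for the 2400G.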
Shawn Jennings – Johan45