High Bandwidth Memory (HBM3)

High Bandwidth Memory 3 (HBM3) is the latest generation of the High Bandwidth Memory (HBM) standard.

It is an advanced memory system that provides very high data transfer speeds (bandwidth), uses low power, and packs a large amount of memory (high capacity) into a small physical size (form factor).

HBM is a type of memory architecture used in high-performance computing, known for its ability to provide extremely high memory bandwidth.

HBM3e is the newest variant in the HBM series, following HBM2, HBM2E, and HBM3.

The 'e' in HBM3e denotes an enhanced version of the HBM3 standard.

And development continues: the market may soon see a second generation of HBM3 devices, following the trend set by LPDDR5, which has already seen speed upgrades.

HBM also uses a very wide interface to the processor chip.

An interface is how two different parts of a system connect and communicate with each other. By using many parallel connections (like having many lanes on a highway), HBM can send and receive a massive amount of data to/from the processor simultaneously.

One of the most significant advantages of HBM3 is its increased storage capacity.

Supporting die densities of up to 32 Gb and stacks up to 16 dies high, HBM3 can provide a maximum of 64 GB per stack, almost triple the capacity of HBM2E.
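
As a quick sanity check, the per-stack capacity follows directly from die density times stack height, as in this small calculation based on the figures above:

```python
# Capacity implied by the HBM3 figures above (die density x stack height).
die_density_gbit = 32            # up to 32 Gb per DRAM die
stack_height = 16                # up to 16 dies per stack
capacity_gbyte = die_density_gbit * stack_height / 8
print(capacity_gbyte)            # 64.0 GB per HBM3 stack
```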

This expanded memory capacity is crucial for handling the increasing demands of advanced applications.

Data Transfer

In addition to its storage capabilities, HBM3 boasts speed, with a top data transfer rate of 6.4 Gbps per pin, nearly double that of HBM2E (3.6 Gbps).
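
Combined with the 1024-bit interface described below, that per-pin rate implies the following back-of-the-envelope peak bandwidth per stack (ignoring protocol overhead):

```python
# Peak per-stack bandwidth implied by the per-pin data rate and bus width.
pin_rate_gbps = 6.4              # HBM3 data rate per pin
bus_width_bits = 1024            # total interface width of one stack
peak_gbyte_s = pin_rate_gbps * bus_width_bits / 8
print(peak_gbyte_s)              # 819.2 GB/s (vs ~460.8 GB/s for HBM2E at 3.6 Gbps)
```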

HBM memory stacks several chips vertically on a substrate, which is then connected to a processor or GPU via a silicon interposer.

Vertical Stacking and Silicon Interposer

HBM uses an innovative approach of stacking multiple DRAM dies on top of each other vertically.

DRAM stands for Dynamic Random Access Memory, which is the type of memory commonly used in computers. A die is a small block of semiconducting material on which a given functional circuit is fabricated. So an HBM stack has several DRAM dies stacked up.

  • HBM (High Bandwidth Memory) uses a unique architecture where multiple DRAM chips are stacked vertically on a substrate, rather than being placed side by side like in traditional memory layouts.

  • The stacked DRAM chips are connected to a processor or GPU using a silicon interposer, which is a thin layer of silicon that sits between the memory stack and the processor/GPU.

  • The silicon interposer contains a large number of tiny wires (interconnects) that enable high-speed communication between the stacked memory and the processor/GPU.

  • This vertical stacking and use of a silicon interposer allow for a much wider interface and higher bandwidth compared to traditional memory configurations.

The benefits of stacking

The DRAM dies are linked together using vertical interconnects called Through-Silicon Vias (TSVs).

A TSV is a vertical electrical connection that passes completely through a silicon die. It allows the stacked dies to communicate with each other much faster than traditional wire-bonding. Think of it as an elevator shaft that lets data move between different floors (dies) quickly.

This vertical stacking, combined with a wider interface, enables much higher bandwidth compared to traditional flat, planar layouts of DRAM.

This significant increase in bandwidth enables faster processing and improved overall system performance.

Power Efficiency

Another key benefit of HBM3 is its improved power efficiency.

HBM3 reduces the core voltage (the voltage supplied to the DRAM chips) to 1.1 volts, down from HBM2E's 1.2 volts.

This lower voltage means less power is consumed by the memory. This allows HBM3 to offer substantial power savings without compromising performance.

Remember, power consumption is proportional to the square of the voltage (P = V²/R). So even a small reduction in voltage can have a significant impact on power efficiency. The challenge is maintaining signal integrity and data retention at lower voltages.
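
A quick worked example of that voltage-squared relationship, using the HBM2E and HBM3 core voltages quoted above:

```python
# Dynamic power scales roughly with the square of the supply voltage.
v_hbm2e, v_hbm3 = 1.2, 1.1
relative_power = (v_hbm3 / v_hbm2e) ** 2                            # ~0.84
print(f"~{(1 - relative_power) * 100:.0f}% lower dynamic power")    # ~16%
```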

This improved power efficiency has, in turn, permitted improvements in bandwidth and reliability.

Bandwidth

HBM3 achieves its higher bandwidth through an enhanced channel architecture, dividing its 1024-bit interface into 16 64-bit channels or 32 32-bit pseudo-channels.

What are pseudo-channels? HBM3 splits each physical 64-bit channel into two 32-bit "pseudo-channels", effectively doubling the number of independent sub-channels from 16 to 32.

More pseudo-channels allow greater parallelism - more data can be accessed simultaneously from different regions of the DRAM. This improves bandwidth utilisation and performance.

However, the pseudo-channel logic does consume some additional power. The power savings from the core voltage reduction help offset this.

Nonetheless, this doubled number of pseudo-channels, combined with the increased data rate, results in a substantial performance improvement over HBM2E.
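
A minimal sketch of how that interface division works out arithmetically:

```python
# How the 1024-bit HBM3 interface divides into channels and pseudo-channels.
interface_bits = 1024
channels = 16
channel_width = interface_bits // channels       # 64 bits per channel
pseudo_channels = channels * 2                   # each channel split in two
pseudo_channel_width = channel_width // 2        # 32 bits per pseudo-channel
print(channels, channel_width, pseudo_channels, pseudo_channel_width)   # 16 64 32 32
```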

Reliability

HBM3 also incorporates advanced RAS (reliability, availability, and serviceability) features that enhance data integrity and system reliability.

On-die ECC

Error-Correcting Code (ECC) is a method of detecting and correcting bit errors in memory. HBM3 introduces on-die ECC, where the ECC bits are stored and the correction is performed within each DRAM die.

On-die ECC improves reliability by catching and fixing errors locally before data is transmitted to the host. However, the ECC circuits do add some power overhead. Careful design is needed to minimise this.

Error Check and Scrub (ECS)

This is a background process that periodically reads data from the DRAM, checks the ECC for errors, and writes back corrected data if necessary.

ECS helps maintain data integrity over time, preventing the accumulation of bit errors. The scrubbing does consume some additional power, but it is essential for mission-critical applications.
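
Conceptually, an ECS pass behaves like the loop below. This is only an illustrative sketch, not a real memory-controller API; the names (`scrub_pass`, `read_with_ecc`, `write`) are hypothetical.

```python
# Illustrative error-check-and-scrub sweep: read every word, let the ECC decode
# repair single-bit errors, and write corrected data back so errors don't accumulate.
def scrub_pass(num_words, read_with_ecc, write):
    corrected = 0
    for addr in range(num_words):
        data, had_error = read_with_ecc(addr)   # decode returns corrected data + error flag
        if had_error:
            write(addr, data)                   # write the corrected word back
            corrected += 1
    return corrected                            # number of errors scrubbed this pass
```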

Refresh Management

DRAM cells lose their data over time due to charge leakage and must be periodically refreshed. HBM3 introduces advanced refresh management techniques like Refresh Management (RFM) and Adaptive Refresh Management (ARFM).

These allow the refresh rate to be optimised based on temperature and usage conditions, so unnecessary refreshes can be avoided, saving power. The refresh logic does add some complexity and power, but the net effect is a power saving.
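
The sketch below illustrates the general idea of temperature-aware refresh. The thresholds and intervals are assumptions chosen for illustration, not JEDEC-specified HBM3 values.

```python
# Illustrative temperature-aware refresh policy: refresh more often when hot,
# relax the interval when cool to save power.
def refresh_interval_us(temp_c, base_trefi_us=3.9):   # base interval is an assumption
    if temp_c > 95:
        return base_trefi_us / 4      # very hot: refresh 4x as often
    if temp_c > 85:
        return base_trefi_us / 2      # hot: refresh 2x as often
    return base_trefi_us              # normal operating range

print(refresh_interval_us(70), refresh_interval_us(90))   # 3.9 1.95
```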

Latency

The new clocking architecture in HBM3 decouples the traditional host-to-device clock signal from the data strobe signals, allowing a lower-latency, higher-performance solution when migrating from HBM2E to HBM3.

Clock architecture

HBM3 decouples the command/address clock from the data bus clock. The command clock runs at half the frequency of the data clock. This allows the DRAM I/O to run faster without burdening the core DRAM arrays.

Splitting the clocks does require some additional clock generation and synchronisation logic which consumes power. But it enables a significant data rate increase without a proportional power increase.

Summary

In conclusion, HBM3 represents a leap forward in memory technology, offering increased storage capacity, faster data transfer rates, improved power efficiency, and advanced features.

With its ability to meet the growing demands of high-performance computing applications, HBM3 is poised to become the memory solution of choice for industries seeking cutting-edge performance and efficiency.

As the adoption of HBM3 grows, we can expect to see groundbreaking advancements in graphics, cloud computing, networking, AI, and automotive sectors, propelling us into a new era of technological innovation.

Other Technical Details

Pseudo-Channels:

  • Each HBM channel is quite wide (128 bits in HBM2). To further increase parallelism, each channel can be split into narrower "pseudo-channels".

  • HBM2 could split each 128-bit channel into 2 pseudo-channels of 64 bits, while HBM3 splits each of its 16 narrower 64-bit channels into 2 pseudo-channels of 32 bits.

  • This gives 32 independent pseudo-channels across the interface, allowing even more simultaneous data access.

Wide Data Interface

  • HBM has a very wide data bus - 1024 bits in total. In HBM2 this was split into 8 channels of 128 bits each, while in HBM3 it's 16 channels of 64 bits each.

  • This wide interface allows a high data rate (amount of data transferred per second) to be achieved at a relatively lower clock speed, which helps manage power consumption.
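
To see the trade-off, the sketch below compares the clock needed to hit the same bandwidth over the 1024-bit HBM bus versus a conventional 64-bit bus, assuming double-data-rate signalling (two transfers per clock):

```python
# Same bandwidth target, two bus widths: a wide bus needs a far lower clock.
target_gbyte_s = 819.2
def clock_ghz(bus_width_bits, transfers_per_clock=2):   # DDR: 2 transfers/clock
    transfers_per_s = target_gbyte_s * 8 / bus_width_bits
    return transfers_per_s / transfers_per_clock

print(clock_ghz(1024))   # 3.2 GHz for the 1024-bit HBM interface
print(clock_ghz(64))     # 51.2 GHz for a conventional 64-bit bus, which is impractical
```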

High-Speed Signaling:

  • HBM uses advanced circuit techniques to achieve very high signaling rates on the data interface.

  • HBM2 could transfer data at up to 2 Gigabits per second (Gbps) per pin. HBM2E increased this to 3.2 to 3.6 Gbps, and HBM3 reaches 6.4 Gbps or even higher.

  • To ensure reliable operation at these high speeds, HBM uses techniques like equalization (adjusting the signal strength and shape), Forward Error Correction (adding redundant data that allows errors to be corrected), and careful alignment and training of the signal timing.

Separate Row and Column Commands

  • The DRAM chips in HBM are arranged in a grid of rows and columns. Accessing data requires first selecting a row (called activating the row) and then reading or writing the desired columns.

  • HBM has separate command buses for row commands (like activate and precharge) and column commands (like read and write). This allows the memory controller to prepare the next row while still reading or writing data from the current row, improving utilization.

Additive Latency and Read-Modify-Write

  • Additive latency allows the memory controller to send a read command before the associated row activate command. The DRAM internally delays the read until the row is ready. This hides some of the row activation time, reducing overall latency.

  • Read-modify-write is a feature that allows a small piece of data within a larger block to be updated without having to read the entire block, modify it in the processor, and write the entire block back. This saves time and power.

To achieve its high bandwidth goals, HBM requires careful engineering of the electrical signals and power delivery:

Signal Integrity

  • Signal integrity refers to ensuring that the electrical signals representing the data maintain their correct shape and timing as they travel from the sender to the receiver.

  • At the high speeds used by HBM, this is challenging. The engineers must carefully model the entire signal path, including the microscopic bumps and TSVs in the HBM stack and the wiring in the interposer.

  • They use advanced simulation models (like IBIS-AMI) that capture the behavior of the circuits and the physics of the signals. They run many simulations with different patterns of data to statistically predict the likelihood of signal errors (the Bit Error Rate or BER).

  • The goal is to achieve a clean "eye diagram" - a visual representation of the signal quality that shows there is adequate margin for the signal voltage and timing to correctly represent the data bits even with the expected manufacturing variability and changes in operating conditions like temperature and voltage.

Power Integrity

  • Power integrity refers to ensuring that the voltage supplied to the HBM remains stable and noise-free despite the large, fast changes in current draw as the HBM operates.

  • The electrical current consumed by the HBM has to flow through the wiring in the interposer and package. The resistance and inductance of this wiring cause the voltage to droop when the current changes rapidly, which can cause errors if the voltage goes too low.

  • To mitigate this, the engineers place decoupling capacitors (small reservoirs of charge) very close to the HBM stack - on the same die, on the interposer, and on the package. These capacitors supply current to the HBM when needed and help smooth out the voltage fluctuations.

  • The placement and size of these capacitors, as well as the geometry of the power delivery network wiring, must be carefully optimized to ensure a clean power supply.

  • In some cases, specialized voltage regulators designed specifically for HBM may be placed very close to the stack to further improve the power integrity.

In addition to raw performance, HBM includes several reliability, availability, and serviceability (RAS) features:

Error Checking

  • HBM includes parity bits on critical signals like the command bus and address bus. Parity is a simple form of error detection where an extra bit is added to a group of bits to make the total number of '1's either even or odd. If a single bit gets corrupted, the parity will no longer match, indicating an error (a minimal example is sketched after this list).

  • HBM3 introduces an even more advanced error detection method called Pulse Amplitude Modulation 4-level (PAM4). This allows the receiver to not only detect errors but also to assess the signal quality in real-time.
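
A minimal even-parity check, as described in the first bullet above (illustrative Python, not device logic):

```python
# Even parity: the extra bit makes the total number of 1s even, so any
# single-bit corruption produces a mismatch at the receiver.
def even_parity(bits):
    return sum(bits) % 2

cmd = [1, 0, 1, 1, 0, 1, 0, 0]
p = even_parity(cmd)                 # parity bit sent alongside the command/address bits
received = cmd.copy()
received[3] ^= 1                     # a single bit flips on the bus
print(even_parity(received) != p)    # True -- the mismatch flags an error
```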

Error Correction

  • While earlier HBM versions could only detect errors, HBM3 adds the ability to correct errors using Error Correction Codes (ECC). ECC works by adding redundant bits to the data that allow the receiver to not only detect but also correct a certain number of bit errors (a toy example is sketched after this list).

  • HBM3 includes ECC within the DRAM chips themselves, which can correct single-bit errors and detect multi-bit errors.

  • The HBM also performs background scrubbing, where it periodically reads out the data, checks and corrects any errors, and writes the corrected data back. This prevents the accumulation of errors over time.
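
As a toy illustration of single-error correction, the sketch below uses a Hamming(7,4) code: 4 data bits protected by 3 parity bits. Real HBM3 on-die ECC uses much wider codewords, but the principle of locating and flipping the bad bit is the same.

```python
# Toy Hamming(7,4) code: 3 parity bits protect 4 data bits and let the decoder
# locate (and flip back) any single-bit error in the 7-bit codeword.
def encode(d1, d2, d3, d4):
    p1 = d1 ^ d2 ^ d4            # covers codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4            # covers positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4            # covers positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(c):
    p1, p2, d1, p3, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    syndrome = s1 + 2 * s2 + 4 * s3      # 1-based position of the flipped bit, 0 = clean
    if syndrome:
        c[syndrome - 1] ^= 1             # repair the single-bit error
    return [c[2], c[4], c[5], c[6]]      # recovered data bits

code = encode(1, 0, 1, 1)
code[5] ^= 1                             # inject a single-bit error
print(correct(code))                     # [1, 0, 1, 1] -- original data recovered
```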

Lane Repair

  • HBM includes spare data and command/address lanes (like extra highway lanes). If a lane is not performing well or has failed, the controller can swap in one of the spare lanes to replace it.

  • This repair can be done by programming the HBM's configuration registers, or it can be made permanent by blowing fuses (essentially tiny electrical switches that can be permanently opened).
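
A toy model of the remapping idea (the table and lane counts are purely illustrative; real devices implement this through configuration registers or fuses):

```python
# Lane repair as a remapping table: when a physical lane fails, route its
# logical lane onto one of the spare physical lanes.
lane_map = {lane: lane for lane in range(64)}    # logical -> physical, identity to start
spare_lanes = [64, 65]                           # spare physical lanes held in reserve

def repair(failed_logical_lane):
    lane_map[failed_logical_lane] = spare_lanes.pop(0)

repair(17)
print(lane_map[17])   # 64 -- traffic for logical lane 17 now uses a spare lane
```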

Temperature Monitoring

  • The performance and reliability of DRAM is sensitive to temperature. If the DRAM gets too hot, it may not be able to meet its timing requirements, leading to errors.

  • HBM includes several temperature sensors within the stack that continuously monitor the temperature.

  • The memory controller can read out these temperature values and adjust its operation accordingly. For example, it can increase the frequency of refresh operations (which are necessary to maintain data integrity in DRAM) at higher temperatures. If the temperature gets too high, the controller can throttle down the data transfer rate to reduce power consumption and heat generation.

Designing an HBM system that achieves multi-gigabit per second data rates while maintaining signal integrity, power integrity, thermal control, and high reliability is a complex multi-disciplinary challenge.

It requires close collaboration and co-design across the entire system, from the DRAM chips themselves to the interposer, the package, the PHY circuits, and the memory controller.

The engineering teams must carefully budget and allocate the available timing and voltage margins across these different components.

They rely heavily on detailed modeling and simulation to predict and optimise the system's behavior.

Specialised circuit designs, advanced packaging technologies, sophisticated error correction and calibration methods, and dynamic monitoring and adaptation techniques are all essential to make the HBM system robust and reliable.

When all of these elements come together successfully, HBM provides a step-change improvement in memory performance within a manageable power envelope.

This has made HBM a critical enabler for applications like high-performance computing, artificial intelligence and machine learning training, and high-speed networking, which all demand the highest possible memory bandwidth.

As the latest generation, HBM3, ramps up into volume production, and future generations like HBM4 are developed, we can expect to see HBM continue to advance the leading edge of computing systems. It's a complex and fascinating technology that showcases the ingenuity and perseverance of the engineers and scientists pushing the boundaries of what's possible in the world of semiconductors and computing architecture.

Figure: I/O speed of different HBM versions
Figure: Evolution of HBM cheat sheet