August 13, 2019

Breaking the Memory Wall: The AI Bottleneck

In the long unfolding arc of technology innovation, artificial intelligence (AI) looms immense. In its quest to mimic human behavior, the technology touches energy, agriculture, manufacturing, logistics, healthcare, construction, transportation and nearly every other imaginable industry – a defining role that promises to fast-track the Fourth Industrial Revolution. And if the industry oracles have it right, AI growth will be nothing shy of explosive.

“The gains these days are not incremental,” said Ajit Manocha, SEMI president and CEO, addressing the Chinese American Semiconductor Professional Association (CASPA) Summer Symposium in July at SEMI’s headquarters in Milpitas. “They are hockey stick – exponential – with AI semiconductors growing in market size from $4 billion this year to $70 billion in 2025.”

Manocha left little doubt that AI is remaking the semiconductor industry and, in the process, the world at large. Internet of Things (IoT) and 4G/5G, both key AI enablers, will account for more than 75 percent of device connections by 2025.

“Today, 30 billion devices worldwide are connected,” Manocha said, citing an Applied Materials prediction that the number of connected devices globally will grow to between 500 billion and 1 trillion by 2030. Those devices will generate stunning amounts of data collected, interpreted and used to reason, solve problems, learn and plan, leading to the holy grail of autonomous machine behavior.

To process this colossal amount of data central to the promise of AI, the industry must break through the limits of a key technology: memory.

Memory: A Critical AI Bottleneck

The challenge for memory starts with performance. Historically, compute performance has improved 100 times faster than memory speed every decade, and over the past 20 years that gap has widened, said Steven Woo, a fellow and distinguished inventor at Rambus, presenting at the symposium. The upshot is that memory has bottlenecked compute and, in turn, AI performance.
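
The bottleneck Woo describes is commonly pictured with the roofline model, in which attainable throughput is capped by the lesser of peak compute and memory bandwidth times arithmetic intensity. The sketch below uses made-up but representative numbers to show how low-intensity workloads are pinned to the memory roof; it illustrates the general principle and is not material from the talk.

```python
# Minimal roofline-model sketch. Attainable performance is capped at
# min(peak compute, memory bandwidth x arithmetic intensity).
# All numbers are illustrative assumptions.

PEAK_TFLOPS = 100.0   # accelerator peak compute, TFLOP/s (assumed)
BANDWIDTH_TBS = 1.0   # memory bandwidth, TB/s (assumed)

def attainable_tflops(flops_per_byte: float) -> float:
    """Performance is memory-bound until intensity reaches the ridge point."""
    return min(PEAK_TFLOPS, BANDWIDTH_TBS * flops_per_byte)

for intensity in (1, 10, 100, 1000):  # FLOPs performed per byte fetched
    perf = attainable_tflops(intensity)
    bound = "memory-bound" if perf < PEAK_TFLOPS else "compute-bound"
    print(f"{intensity:5d} FLOP/byte -> {perf:6.1f} TFLOP/s ({bound})")
```

Below the ridge point (100 FLOPs per byte in this toy setup), a faster processor changes nothing; only more memory bandwidth, or more reuse of each byte fetched, moves the needle.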

The industry has responded with new ways to implement memory systems on AI chips. Each is suited to different performance requirements and, of course, comes with trade-offs. Among the frontrunners (a rough bandwidth comparison follows the list):

  • On-chip memory delivers the highest bandwidth and power efficiency but is limited in capacity.
  • HBM (High Bandwidth Memory) offers both very high memory bandwidth and density.
  • GDDR balances trade-offs among bandwidth, power efficiency, cost and reliability.
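
The bandwidth differences behind these trade-offs follow from simple arithmetic: peak bandwidth is interface width times per-pin data rate. The sketch below uses spec-sheet-style numbers for a single HBM2 stack and a single GDDR6 chip; treat them as representative assumptions, not figures from the symposium.

```python
# Peak bandwidth = bus width (bits) x data rate (Gbit/s per pin) / 8 bits per byte.
# Interface widths and pin rates are representative, not tied to any one product.

def peak_gb_per_s(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8  # Gbit/s -> GB/s

hbm2_stack = peak_gb_per_s(1024, 2.0)  # very wide interface, relatively slow pins
gddr6_chip = peak_gb_per_s(32, 16.0)   # narrow interface, very fast pins

print(f"HBM2 stack: {hbm2_stack:.0f} GB/s")  # ~256 GB/s
print(f"GDDR6 chip: {gddr6_chip:.0f} GB/s")  # ~64 GB/s
```

HBM earns its bandwidth from a very wide interface sitting next to the processor on an interposer, which is also where its cost premium comes from; GDDR pushes far fewer pins much faster across an ordinary board.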

Since 2012, AI training capability has grown 300,000-fold, doubling every 3.5 months, a blistering pace that outstrips the 18-month doubling cycle of Moore’s Law by a factor of roughly 25,000, Woo said. The staggering improvement has been driven by parallel computing capacity and new application-specific silicon like Google’s Tensor Processing Unit (TPU).
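
Those figures are easy to sanity-check with exponential arithmetic: back out the time span implied by 300,000-fold growth at a 3.5-month doubling, then compute how far an 18-month Moore's Law doubling gets over the same span. The quick check below is my arithmetic on the cited numbers, not part of the presentation.

```python
import math

GROWTH = 300_000            # cited growth in AI training capability since 2012
AI_DOUBLING_MONTHS = 3.5    # cited AI doubling period
MOORE_DOUBLING_MONTHS = 18  # classic Moore's Law doubling period

doublings = math.log2(GROWTH)            # ~18.2 doublings
months = doublings * AI_DOUBLING_MONTHS  # ~64 months, i.e. roughly 2012-2018

moore_gain = 2 ** (months / MOORE_DOUBLING_MONTHS)  # ~11.6x over the same span
print(f"span: {months:.0f} months; Moore's Law gain: {moore_gain:.1f}x")
print(f"AI gain over Moore's Law: {GROWTH / moore_gain:,.0f}x")  # ~25,800x
```

The ratio lands near 26,000, consistent with the roughly 25,000-fold figure Woo cited.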

These specialized silicon architectures and parallel engines are key to sustaining future gains in compute performance and combating the slowing of Moore’s Law and the end of power scaling, Woo said. By rethinking how processors are architected for specific markets, chipmakers can build dedicated hardware that operates with 100 to 1,000 times greater energy efficiency than general-purpose processors, overcoming another big limiter of compute scaling: power.
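
The power argument can be made concrete: within a fixed power envelope, throughput is simply power divided by energy per operation, so a 100- to 1,000-fold drop in energy per operation buys the same factor in throughput at the same wattage. A toy calculation, with a hypothetical baseline energy cost:

```python
# Throughput at a fixed power budget = power / energy per operation.
# The 1 nJ/op baseline is a hypothetical round number, not a measured figure.

POWER_BUDGET_W = 100.0
BASELINE_J_PER_OP = 1e-9  # assumed general-purpose cost: 1 nJ per operation

baseline_ops = POWER_BUDGET_W / BASELINE_J_PER_OP  # 1e11 ops/s
for efficiency_gain in (100, 1000):  # the range Woo cites for dedicated hardware
    dedicated_ops = baseline_ops * efficiency_gain
    print(f"{efficiency_gain:4d}x efficiency: {baseline_ops:.0e} -> {dedicated_ops:.0e} ops/s")
```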

For its part, the memory industry can improve performance by signaling at higher data rates, by adopting stacked architectures like HBM for greater power efficiency and performance, and by bringing compute closer to the data.

Memory Scaling for AI

A key challenge is scaling memory for AI. Demand for better voice, gesture and facial recognition experiences and for more immersive virtual reality and augmented reality interactions is tremendous, said Bill En, senior director at AMD, speaking at the symposium. These capabilities require more processing power across both high-performance computing (HPC) for big data analytics and machine learning, which relies on AI and machine intelligence to generate meaningful insights.

Emerging machine learning applications include classification and security, medicine, advanced driver assistance, human-aided design, real-time analytics and industrial automation. And with 75 billion IoT-connected devices – all generating data – expected by 2025, there will be no shortage of data to analyze, En said. The wings alone of a new Airbus A380-1000 feature some 10,000 sensors.

Mountains of this data are stored in massive data centers on magnetic hard drives, then transferred to DRAM before moving to SRAM within the CPU for the handoff to the compute hardware for analysis.
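
Because the data passes through each tier in sequence, end-to-end time is dominated by the slowest link. A minimal sketch, assuming round-number bandwidths for each tier (illustrative only, not figures from the talk):

```python
# Time to stage one batch of data through the storage-to-compute chain.
# Per-tier bandwidths are assumed round numbers for illustration.

BATCH_GB = 64.0

tiers = [               # (name, assumed bandwidth in GB/s)
    ("hard drive", 0.2),
    ("DRAM",       25.0),
    ("SRAM/cache", 1000.0),
]

total_s = 0.0
for name, gb_per_s in tiers:
    seconds = BATCH_GB / gb_per_s
    total_s += seconds
    print(f"{name:<10} {gb_per_s:7.1f} GB/s -> {seconds:8.2f} s")

slowest = min(tiers, key=lambda tier: tier[1])[0]
print(f"total: {total_s:.2f} s, dominated by the {slowest}")
```

Nearly all of the time goes to the slowest tier, which is why designs push data toward faster memory and compute toward the data.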

With data growing at an exponential clip, the question is how to ensure the rest of the memory system can handle the flood of data. AMD’s answer is a chiplet architecture: eight smaller chips around the edge drive the compute, while a large chip in the center doubles the I/O interface and memory capability, in turn doubling chip bandwidth.

AMD has also moved from a legacy GDDR5 memory configuration to HBM to bring memory bandwidth closer to the GPU for more efficient processing of AI applications. HBM provides much higher bandwidth while reducing power consumption; compared with conventional DRAM, AMD’s HBM delivers a much faster data rate and far greater memory density, En said.

Over the next decade, look for more performance improvements from multi-chip architectures, innovations in memory technology and integration, aggressive 3D stacking and streamlined system-level interconnects, he said. The industry will also continue to drive performance gains in devices, compute density and power through technology scaling.

Michael Hall is a global marketing communications manager at SEMI.