Ultra-low power processor design using sub-threshold design techniques

David Blaauw, Bo Zhai
DOI: https://doi.org/10.1007/978-0-387-34047-0_1
2007-01-01
Abstract: Power consumption is becoming worse with every technology generation. While much recent research has proposed design methods to address this issue, one of the most efficient approaches is to reduce the supply voltage, which helps mitigate both dynamic and static power consumption. In our research, we identified the energy-optimal supply voltage (Vmin) in CMOS technology. We analyzed the factors affecting Vmin and found that Vmin usually lies in the subthreshold voltage regime. The increased sensitivity of subthreshold switching current to process variation poses a significant design challenge. We investigated the impact of subthreshold variation on circuit performance and energy consumption statistically and proposed design guidelines to mitigate variation.

To verify the high energy efficiency of subthreshold operation, we designed and fabricated two subthreshold processors in 0.13 µm technology: the Subliminal 1 and Subliminal 2 processors. Measurements confirm an efficiency of 2.60 pJ per instruction for the Subliminal 1. However, we also found that the on-chip SRAM was the energy consumption bottleneck. We therefore designed the first sub-200 mV compact 6-T SRAM. It was fabricated in a commercial 0.13 µm CMOS technology, and silicon measurements show that all 24 dies measured were fully functional and that a typical die operates from 1.2 V down to 193 mV. This range could be further extended to sub-170 mV with 2% bit redundancy.

The downside of voltage scaling into the subthreshold regime is the considerable performance loss. To address this issue, we proposed a novel micro-architecture that combines chip multi-processing (CMP) and subthreshold techniques. By tuning the supply voltage and threshold voltage of the L1 cache and the processor core independently, we found that having multiple cores share one faster local L1 cache provides the best energy efficiency. In particular, SPLASH2 benchmarks show about a 53% energy improvement over the traditional CMP approach (about 70% over a single-core machine).
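
The Vmin trade-off mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' model: it assumes a common first-order approximation in which dynamic energy scales with Vdd squared while leakage energy per operation grows as the supply drops, because subthreshold gate delay increases exponentially. The constants `N_VT`, `SWITCH_WEIGHT`, and `LEAK_WEIGHT`, the function names, and the 0.15 V lower search bound (a stand-in for the functional limit of the logic) are all hypothetical values chosen for illustration, not figures from the paper.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
N_VT = 0.039           # assumed n * kT/q in volts (slope factor ~1.5 at room temperature)
SWITCH_WEIGHT = 1.0    # assumed relative switched capacitance per operation
LEAK_WEIGHT = 1000.0   # assumed relative leakage weight (idle gates times logic depth)


def energy_per_op(vdd):
    """Relative energy per operation at supply voltage vdd (volts).

    dynamic ~ C * Vdd^2
    leakage ~ I_leak * Vdd * delay, with delay ~ Vdd / I_on and
              I_on / I_leak ~ exp(Vdd / (n * vT)) in the subthreshold model.
    """
    dynamic = SWITCH_WEIGHT * vdd ** 2
    leakage = LEAK_WEIGHT * vdd ** 2 * math.exp(-vdd / N_VT)
    return dynamic + leakage


def find_vmin(v_low=0.15, v_high=1.2, steps=1051):
    """Grid-search the supply voltage that minimizes energy_per_op.

    v_low is an assumed functional lower limit; the simple model is not
    meaningful below it.
    """
    grid = [v_low + i * (v_high - v_low) / (steps - 1) for i in range(steps)]
    return min(grid, key=energy_per_op)


if __name__ == "__main__":
    vmin = find_vmin()
    print(f"Vmin = {vmin:.3f} V, relative energy = {energy_per_op(vmin):.4f}")
    print(f"relative energy at 1.2 V = {energy_per_op(1.2):.4f}")
```

Under these assumed constants the minimum falls well below the nominal 1.2 V supply, which is the qualitative behavior the abstract describes. Where the minimum lands depends on the assumed ratio of leakage to switching weight, so the sketch should not be read as predicting any measured value.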