Abstract
In the quest for exascale computing, energy efficiency is a fundamental goal in high-performance computing systems, typically achieved via dynamic voltage and frequency scaling (DVFS). However, such mechanisms rely on accurate methods for predicting the performance and power/energy consumption of these systems. Unlike previous work in the literature, this research focuses on creating novel GPU predictive models that do not require run-time information from the applications. The proposed models, implemented using recurrent neural networks, take into account the sequence of GPU assembly instructions (PTX) and can accurately predict changes in the execution time, power and energy consumption of applications when the frequencies of different GPU domains (core and memory) are scaled. Validated with 24 applications on GPUs from different NVIDIA microarchitectures (Turing, Volta, Pascal and Maxwell), the proposed models attain high accuracy. In particular, the obtained power consumption scaling model provides an average error rate of 7.9% (Tesla T4), 6.7% (Titan V), 5.9% (Titan Xp) and 5.4% (GTX Titan X), which is comparable to state-of-the-art run-time counter-based models. When the models are used to select the minimum-energy frequency configuration, significant energy savings are attained: 8.0% (Tesla T4), 6.0% (Titan V), 29.0% (Titan Xp) and 11.5% (GTX Titan X).
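The final step described above, selecting the minimum-energy frequency configuration, can be illustrated with a minimal Python sketch. It assumes the models output two predictions for each (core, memory) frequency pair: the execution time and the average power, both normalized to the default configuration. Under that assumption, normalized energy is simply their product. The function name `select_min_energy_config`, the `Config` type and every number below are hypothetical illustrations. They are not the paper's implementation or measurements.

```python
from typing import Dict, Tuple

# Hypothetical representation: a configuration is (core_MHz, memory_MHz).
Config = Tuple[int, int]


def select_min_energy_config(
    time_scale: Dict[Config, float],
    power_scale: Dict[Config, float],
) -> Tuple[Config, float]:
    """Pick the configuration with the lowest predicted normalized energy.

    Both dictionaries hold predictions normalized to the default configuration
    (1.0 = default). Since energy = average power * time, the normalized energy
    of a configuration is the product of its two normalized predictions.
    """
    energy_scale = {cfg: time_scale[cfg] * power_scale[cfg] for cfg in time_scale}
    best = min(energy_scale, key=energy_scale.__getitem__)
    return best, energy_scale[best]


if __name__ == "__main__":
    # Made-up predictions for illustration only (not results from the paper).
    predicted_time = {
        (1500, 5000): 1.00,  # assumed default configuration
        (1200, 5000): 1.15,
        (900, 5000): 1.45,
        (1500, 800): 1.60,
    }
    predicted_power = {
        (1500, 5000): 1.00,
        (1200, 5000): 0.75,
        (900, 5000): 0.60,
        (1500, 800): 0.70,
    }
    cfg, energy = select_min_energy_config(predicted_time, predicted_power)
    print(f"min-energy config {cfg}: predicted saving {1 - energy:.1%}")
```

With these made-up numbers, the 1200 MHz core setting wins even though it runs 15% slower, because its predicted power drop outweighs the longer run time. The 900 MHz setting, which is slower still, ends up with slightly higher predicted energy. A practical selector might also constrain the maximum acceptable slowdown; this sketch omits that.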
I. Introduction
Over the past decade, the high-performance computing (HPC) field has seen a marked increase in the use of accelerators, more specifically graphics processing units (GPUs). The energy efficiency of these devices can have a large impact on the total cost of large-scale computer clusters. As an example, the Summit supercomputer (the top-ranked system in the June 2019 TOP500 list [1]) uses a total of 27 648 NVIDIA Volta GPUs to achieve a peak performance of almost 200 petaflops. To do so, it requires a power supply of 13 million watts (13 MW), which corresponds to an estimated cost of 17 million dollars per year for power alone [2]. The magnitude of these values highlights the importance of effective mechanisms to maximize the energy efficiency of such systems: a mere 5% decrease in energy consumption could yield savings of around 1 million dollars per year. One such mechanism is dynamic voltage and frequency scaling (DVFS), which allows devices to be placed into lower performance/power states. When carefully applied to match the needs of the executing applications, DVFS can lead to significant power and energy savings, sometimes with minimal impact on performance [3], [4]. A recent study showed that applying DVFS to GPUs executing deep neural network applications can provide energy savings of up to 23% during training and 26% during inference [5].
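The cost figures for Summit can be checked with a short worked calculation. This is a back-of-the-envelope sketch rather than a calculation taken from [2]. It assumes the 13 MW is drawn continuously for a full year (8760 h) and that electricity cost scales linearly with energy consumed. Under these assumptions, the first line gives the implied electricity price and the second gives the savings from a 5% energy reduction:

```latex
$$
E_{\text{year}} = 13\,\text{MW} \times 8760\,\text{h} = 113\,880\,\text{MWh},
\qquad
\frac{\$17 \times 10^{6}}{113\,880\,\text{MWh}} \approx \$149/\text{MWh} \;(\approx \$0.15/\text{kWh})
$$
$$
\Delta C = 0.05 \times \$17 \times 10^{6}\ \text{per year} = \$0.85 \times 10^{6}\ \text{per year} \approx \$1\ \text{million per year}
$$
```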