Abstract
In the analysis and design of algorithms and data structures, most research focuses solely on the space/time trade-off, and little attention has been paid to energy consumption. Moreover, most efforts in the field of Green Computing have been devoted to hardware-related issues, while green software is still in its infancy. Optimizing the usage of computing resources, minimizing power consumption, and increasing battery life are some of the goals of this field of research. To address current sustainability challenges, energy consumption must be incorporated as a first-class constraint when designing new compact data structures. As a preliminary step toward that goal, we first need to understand the factors that affect energy consumption and their relation to compression. In this work, we study the energy consumption required by several integer vector representations, executing typical operations over datasets of different nature. Our results confirm that, as commonly believed, energy consumption is strongly correlated with the time required by the process, although not always. We also analyze other parameters, such as the number of instructions, the number of CPU cycles, and memory loads.
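The parameters mentioned above (instructions, CPU cycles, memory loads) are typically obtained from hardware performance counters. As a purely illustrative sketch, and without implying that this is the measurement setup used in this paper, the following C++ snippet reads the cycle and retired-instruction counters on Linux through the perf_event_open system call around a dummy workload; the helper name open_counter and the workload are illustrative assumptions.

```cpp
// Illustrative only: sample CPU-cycle and instruction counters on Linux
// via perf_event_open; not the measurement methodology of the paper.
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

static int open_counter(uint32_t type, uint64_t config) {
    perf_event_attr attr{};            // zero-initialized event descriptor
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;                 // start disabled, enable explicitly
    attr.exclude_kernel = 1;           // count user-space events only
    // pid = 0 (this process), cpu = -1 (any CPU), group_fd = -1, flags = 0
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
}

int main() {
    int cycles = open_counter(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES);
    int instrs = open_counter(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS);

    ioctl(cycles, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(instrs, PERF_EVENT_IOC_ENABLE, 0);

    volatile std::uint64_t sum = 0;    // dummy workload under measurement
    for (std::uint64_t i = 0; i < 10000000; ++i) sum = sum + i;

    ioctl(cycles, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(instrs, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t c = 0, n = 0;
    read(cycles, &c, sizeof(c));
    read(instrs, &n, sizeof(n));
    std::printf("cycles: %llu  instructions: %llu\n",
                (unsigned long long)c, (unsigned long long)n);
}
```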
Introduction
We are surrounded by digital information, such as the huge amount of data generated on the Internet and the data we collect in our daily lives: human-generated data, produced both consciously (emails, tweets, pictures, voice) and unconsciously (clicks, likes, follows, logs, ...), as well as observed data (biological, astronomical, etc.). When managing large volumes of digital information, data compression has always been considered vitally important. Traditionally, data compression focused on obtaining the smallest possible representation in order to save space and transmission time, thus providing a good archival method. However, most compression techniques require decompressing the data whenever they need to be accessed, especially when those accesses are not sequential, which limits the applicability of data compression. To overcome these issues, compact data structures appeared in the 1990s and evolved rapidly during the early years of the current century [1]. They use compression strategies that exploit the patterns existing in the data to reduce the size of the stored data, but with a key difference: the data can be managed and queried directly in compressed form, without prior decompression. Their main contribution is that they allow larger datasets to fit in faster levels of the memory hierarchy than classical representations, thus dramatically improving processing times. In addition, many compact data structures are equipped with additional information that, within the same compressed space, acts as an index and speeds up queries.
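To give the flavor of direct access in compressed form, the following sketch (not part of the original paper) shows a minimal fixed-width bit-packed integer vector in C++; the class name PackedVector and its interface are illustrative assumptions, not the representations evaluated in this work. Each value is stored with a fixed number of bits, and get extracts a single value directly from the packed words, without decompressing the sequence.

```cpp
#include <cstdint>
#include <vector>
#include <iostream>

// Minimal bit-packed integer vector: n values of `width` bits each occupy
// roughly n*width bits instead of n*32 or n*64, and every value can be read
// directly from the packed representation.
class PackedVector {
public:
    PackedVector(const std::vector<uint64_t>& values, uint8_t width)
        : width_(width), size_(values.size()),
          words_((values.size() * width + 63) / 64, 0) {
        for (size_t i = 0; i < values.size(); ++i) set(i, values[i]);
    }

    // Direct access to the i-th value: locate its bit offset and extract
    // `width_` bits, which may span two consecutive 64-bit words.
    uint64_t get(size_t i) const {
        size_t bit = i * width_, word = bit / 64, off = bit % 64;
        uint64_t value = words_[word] >> off;
        if (off + width_ > 64)                      // value crosses a word boundary
            value |= words_[word + 1] << (64 - off);
        return value & mask();
    }

    size_t size() const { return size_; }

private:
    void set(size_t i, uint64_t v) {
        size_t bit = i * width_, word = bit / 64, off = bit % 64;
        words_[word] |= (v & mask()) << off;
        if (off + width_ > 64)
            words_[word + 1] |= (v & mask()) >> (64 - off);
    }
    uint64_t mask() const { return width_ == 64 ? ~0ULL : (1ULL << width_) - 1; }

    uint8_t width_;
    size_t size_;
    std::vector<uint64_t> words_;       // packed payload
};

int main() {
    // Ten small values stored with 5 bits each instead of 32 or 64.
    PackedVector pv({3, 7, 12, 0, 31, 5, 9, 18, 27, 1}, 5);
    std::cout << pv.get(4) << '\n';     // prints 31, read directly from packed form
}
```

The design choice illustrated here, trading a few shifts and masks per access for a much smaller memory footprint, is precisely what lets larger datasets reside in faster levels of the memory hierarchy.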