Abstract
1- Introduction
2- Related work
3- Introduction to relational and NoSQL databases
4- Systems, metrics and benchmarks
5- Speedup, powerup, and greenup metrics
6- Experimental results of NoSQL databases
7- Experimental results of relational databases
8- Cross database comparison
9- Conclusion and future work
References
Abstract
As big data becomes the norm in various industrial applications, the complexity of database workloads and database system design has increased significantly. To address these challenges, conventional relational databases have been constantly improved, and NoSQL databases such as MongoDB and Cassandra have been proposed and implemented to compete with SQL databases. In addition to traditional metrics such as response time, throughput, and capacity, modern database systems face higher requirements on energy efficiency due to the large volume of data that needs to be stored, queried, updated, and analyzed. While decades of research in the database and data processing communities have produced a wealth of literature that optimizes for performance, optimizations for energy efficiency have historically been overlooked, and only a few studies have investigated the energy efficiency of database systems. To the best of our knowledge, there are currently no comprehensive studies that analyze the impact of query optimizations on performance and energy efficiency across both relational and NoSQL databases. In fact, the energy behavior of many basic database operations (e.g., insertion, deletion, searching, update, and indexing) remains largely unknown due to the lack of accurate power measurement methodologies for various databases and queries. In this paper, we investigate a series of query optimization techniques for improving the energy efficiency of relational and NoSQL databases. We use both widely accepted benchmarks (e.g., the Yahoo! Cloud Serving Benchmark) and customized datasets (converted from ~100 GB of Twitter data) in our experiments to evaluate the effectiveness of various optimization techniques. We conduct a cross-database analysis of a relational database (MySQL) and NoSQL databases (MongoDB and Cassandra) to compare their performance and energy efficiency.
Additionally, we study a variety of optimization techniques that can improve energy efficiency without compromising performance on the databases derived from the Twitter data. Using these techniques, we are able to achieve significant energy savings without performance degradation. Moreover, we investigate the impact of Dynamic Voltage and Frequency Scaling (DVFS) on the performance and energy efficiency of MySQL, MongoDB and Cassandra.
INTRODUCTION
Energy efficiency has become a critical design and operational criterion for computing systems ranging from data centers, small clusters, and stand-alone servers to mobile and embedded devices. Unfortunately, Database Management Systems (DBMS) running in server environments have largely ignored the energy efficiency issue, and we can no longer afford such an oversight. For example, Google currently processes about 40,000 queries per second, or 3.5 billion queries per day [1]. Today, people express their opinions and views on Twitter, and emerging events or news are often followed almost instantly by a burst in Twitter volume. This makes Twitter another exemplary big dataset, on which many social media analytics tools are used to determine people's attitudes towards a product, an idea, and so on. However, analyzing such an enormous volume of data accurately and efficiently is very costly, and thus requires databases to be highly efficient in terms of both performance and energy. Despite the fact that the majority of data is stored and processed using different forms of databases, either relational or NoSQL, current academic research and industrial practice on databases emphasize performance more than energy efficiency. To the best of our knowledge, there are currently no comprehensive studies that analyze the impact of query optimizations on performance and energy efficiency across both relational and NoSQL databases. In fact, the energy behavior of many basic database operations (e.g., insertion, deletion, searching, update, and indexing) remains largely unknown due to the lack of accurate power measurement tools and analysis methodologies that can be applied to various databases. However, understanding the energy benefit of various query optimizations is paramount for both relational and NoSQL databases, especially for servers that respond to millions of queries on a daily basis.
Maximizing the energy efficiency of each single query could significantly reduce the accumulated cost of large-scale database systems. Meanwhile, it is worth noting that there are ongoing debates about how to improve database energy efficiency. Numerous database researchers believe that hardware optimizations (e.g., replacing HDDs with SSDs [2] [21]) are more effective than software optimizations. Some studies conclude that energy savings are merely a byproduct of performance optimizations [3]. Nonetheless, other researchers argue that performance optimization and energy optimization are conflicting goals (i.e., performance needs to be sacrificed to save energy, or vice versa). They believe that tradeoffs are inevitable in a multi-objective optimization problem. All these arguments are reasonable in certain scenarios, but they do not reveal the whole picture of database optimizations. For example, the best-case scenario, in which an optimization reduces energy consumption without degrading performance and the energy saving exceeds the performance improvement, has been overlooked due to the lack of measurement tools and analysis methodologies. In fact, researchers may doubt the existence of such best-case scenarios because they sound too idealistic.
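To make the best-case scenario concrete, recall that the energy consumed by a query is its average power draw multiplied by its runtime. If an optimization shortens the runtime and also lowers the average power, the relative energy saving is necessarily larger than the relative time saving. The following minimal Python sketch illustrates this arithmetic. The function names and all power and runtime values are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy consumed by a query: average power multiplied by runtime."""
    return avg_power_watts * runtime_seconds


def relative_saving(baseline: float, optimized: float) -> float:
    """Fraction by which `optimized` is lower than `baseline`."""
    return (baseline - optimized) / baseline


# Hypothetical values for one query before and after an optimization.
base_power, base_time = 100.0, 10.0  # watts, seconds
opt_power, opt_time = 80.0, 8.0      # watts, seconds

time_saving = relative_saving(base_time, opt_time)
energy_saving = relative_saving(
    energy_joules(base_power, base_time),
    energy_joules(opt_power, opt_time),
)

# Best case as described in the text: performance does not degrade
# and the energy saving exceeds the performance improvement.
is_best_case = opt_time <= base_time and energy_saving > time_saving

print(f"time saving:   {time_saving:.0%}")
print(f"energy saving: {energy_saving:.0%}")
print(f"best case:     {is_best_case}")
```

With these illustrative numbers the runtime drops by 20% while the energy drops by 36%, because the power reduction compounds with the runtime reduction. Whether real query optimizations behave this way is exactly what has to be established by measurement.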