Abstract
1- Big Data science
2- Cloud computing for Big Data
3- Main requirements of Big Data analysis on Clouds
4- Big Data science frameworks
5- Discussion and open challenges
References
Abstract
Recent years have seen huge advances in the scale of the data we routinely generate and collect in almost every activity, as well as in our ability to exploit modern technologies to process, analyze and understand this data. The intersection of these trends is what is now called Big Data Science. Big Data Science requires scalable architectures for storing and processing data. Cloud computing represents a practical and cost-effective solution for supporting Big Data storage, processing and sophisticated analytics applications. We analyze in detail the building blocks of the software stack for supporting Big Data Science as a commodity service for data scientists. In addition, we analyze and classify the state-of-the-art big data analytics frameworks, available today mostly on Clouds, based on their supported service models. Furthermore, we provide insights into the latest ongoing developments and open challenges in this domain.
Big Data science
The continuous growth and integration of data storage, computation, digital devices and networking has created a rich environment for the explosive growth of Big Data as well as of the tools through which data is produced, shared, curated and analyzed [43]. In addition to the 4Vs (Volume, Velocity, Variety and Veracity), it is vital to consider a further feature of Big Data: Value. Value is obtained by analyzing Big Data and extracting hidden patterns, trends and knowledge models from it using smart data analysis algorithms and techniques. Data science methods must be able to analyze Big Data and extract previously unknown features. These learned features increase the value of the data, making it possible to better understand phenomena and behaviors, optimize processes, and improve machine, business and scientific discovery. Therefore, we cannot look at Big Data Science without considering data analysis and machine learning as key steps for including value in a Big Data Science strategy. In practice, big data analytics tools enable data scientists to discover correlations and patterns by analyzing massive amounts of data of different types from various sources. Recently, Big Data science [3] has emerged as a modern and important data analysis discipline. It is considered an amalgamation of classical disciplines such as statistics, artificial intelligence, mathematics and computer science, with its sub-disciplines including database systems, machine learning and distributed systems. It combines existing approaches with the aim of turning abundantly available data into value for individuals, organizations and society. The ultimate goal of data science techniques is to convert data into meaningful information. In both business and science, data science methods have been shown to facilitate more robust decision making.
In the last few years, Big Data Science has spread rapidly across real-world applications such as business optimization, financial trading, healthcare data analytics and social network analysis, to name but a few [43]. In particular, we can think of the relationship between Big Data and data science as being like the relationship between crude oil and an oil refinery.