Abstract
1- Introduction
2- Theoretical background
3- Research method
4- Results
5- Discussion and Implications for Future Work
6- Conclusion
References
Abstract
The data lake approach has emerged as a promising way to handle large volumes of structured and unstructured data. This big data technology enables enterprises to profoundly improve their Business Intelligence. However, there is a lack of empirical studies on the use of the data lake approach in enterprises. This paper provides the results of an exploratory study designed to improve the understanding of the use of the data lake approach in enterprises. I interviewed 12 experts who had implemented this approach in various enterprises and identified three important purposes of implementing data lakes: (1) as staging areas or sources for data warehouses, (2) as a platform for experimentation for data scientists and analysts, and (3) as a direct source for self-service business intelligence. The study also identifies several perceived benefits and challenges of the data lake approach. The results may be beneficial for both academics and practitioners. Further, suggestions for future research are presented.
Introduction
Business Intelligence (BI) is a contemporary approach that combines methodologies, processes, architectures, and technologies to transform raw data into meaningful information for decision making [1]. BI can play a vital role in improving organizational performance by identifying new opportunities, highlighting potential threats, revealing new business insights, and enhancing decision-making processes [2, 3]. Therefore, BI is a top priority for organizations in most industries [4]. Traditionally, BI has focused primarily on structured and internal enterprise data, overlooking potentially valuable information embedded in unstructured and external data. This can result in an incomplete view of reality and biased enterprise decision making [5].
The accelerated growth and pervasive development of internet, web, and cloud technologies have given new meaning to the phrase “information overload” [6]. These technological advances have led to the generation and accumulation of unprecedented volumes of data. Large and complex data are often described by the term “big data” [7]. As big data becomes increasingly available, the challenge of analyzing large and growing data sets is becoming more urgent. Therefore, BI today faces new challenges, but also exciting opportunities [5].
Big data was one of the big buzzwords of the 2000s [8]. The first organizations to embrace big data were online and start-up companies. According to Davenport and Dyché [8], companies like Google, eBay, and Facebook were built around big data from the beginning. Big data changed the way enterprises manipulated data, providing not only new opportunities to handle data, but also new ways to use and add value to vast amounts of data coming from the Internet of Things (IoT), social media, web logs, and sensors [9]. Big data also supports the supply of data as a resource that organizations can utilize [10].
Big data has also led to the emergence of modern technologies like data lakes, which enable enterprises to store and handle large volumes of structured and unstructured data in their native format. However, despite the prevalence of this technology, my literature search yielded only a handful of studies discussing data lakes. One study discussed data lakes in a cursory manner [11], while another [12] discussed some of the challenges of data lakes in detail. I found no empirical studies on the use of data lakes in enterprises.
The main objectives of this study are to understand the role of data lakes in a BI architecture and how data lakes are used in practice by enterprises. The following research questions guided this research: (1) What are the purposes of implementing a data lake in a BI architecture? (2) How do data lakes affect the BI architecture of an enterprise? (3) What are the benefits and challenges of implementing a data lake in a BI architecture? Since the topic has not been empirically examined in prior research, this study takes an exploratory approach based on interviews with BI experts from various industries.
In the next section of this paper, I discuss the theoretical background for this study. Then, I describe the exploratory study approach, including the data collection and data analysis procedures. Subsequently, I present the results of this exploratory study. The article ends with a discussion of the research findings, directions for future research, a conclusion, and the study’s limitations.