Editor’s note: This editorial launches a series written by editors and co-authored with a senior executive, thought leader, or scholar from a different field to explore new content areas and grand challenges with the goal of expanding the scope, interestingness, and relevance of the work presented in the Academy of Management Journal. The principle is to use the editorial notes as “stage setters” for further work and to open up fresh, new areas of inquiry for management research.
Big data is everywhere. In recent years, there has been an increasing emphasis on big data, business analytics, and “smart” living and work environments. Though these conversations are predominantly practice driven, organizations are exploring how large-volume data can usefully be deployed to create and capture value for individuals, businesses, communities, and governments (McKinsey Global Institute, 2011). Whether it is machine learning and web analytics to predict individual action, consumer choice, search behavior, traffic patterns, or disease outbreaks, big data is fast becoming a tool that not only analyzes patterns, but can also provide the predictive likelihood of an event.
Organizations have jumped on this bandwagon of using ever-increasing volumes of data, often in tera- or petabytes’ worth of storage capacity, to better predict outcomes with greater precision. For example, the United Nations’ Global Pulse is an initiative that uses new digital data sources, such as mobile calls or mobile payments, with real-time data analytics and data mining to assist in development efforts and understanding emerging vulnerabilities across developing countries. Though “big data” has now become commonplace as a business term, there is very little published management scholarship that tackles the challenges of using such tools or, better yet, that explores the promise and opportunities for new theories and practices that big data might bring about. In this editorial, we explore some of its conceptual foundations as well as possible avenues for future research and application in management and organizational scholarship.
WHAT IS “BIG DATA”?
Big data is generated from an increasing plurality of sources, including Internet clicks, mobile transactions, user-generated content, and social media as well as purposefully generated content through sensor networks or business transactions such as sales queries and purchase transactions. In addition, genomics, health care, engineering, operations management, the industrial Internet, and finance all add to the pervasiveness of big data. These data require the use of powerful computational techniques to unveil trends and patterns within and between these extremely large socioeconomic datasets. New insights gleaned from such data-value extraction can meaningfully complement official statistics, surveys, and archival data sources that remain largely static, adding depth and insight from collective experiences, and doing so in real time, thereby narrowing both information and time gaps.
Perhaps the misnomer is in the “bigness” of big data, which invariably attracts researchers’ attention to the size of the dataset. Among practitioners, there is emergent discussion that “big” is no longer the defining parameter, but, rather, how “smart” the data are (that is, the insights that the volume of data can reasonably provide). For us, the defining parameter of big data is the fine-grained nature of the data itself, thereby shifting the focus away from the number of participants to the granular information about the individual. For example, a participant in a Formula 1 car race generates 20 gigabytes of data from the 150 sensors on the car; these data can help analyze not only the technical performance of its components but also the driver reactions, pit stop delays, and communication between crew and driver that contribute to overall performance (Munford, 2014). The emphasis thus moves away from outcomes (winning or losing the race) toward each proximal, contributory element of success or failure, mapped for every second of the race. Similarly, one could analyze the social networks and social engagement behaviors of individuals by mapping mobility patterns onto physical layouts of workspaces using sensors, or the frequency of meeting room usage using remote sensors that track entry and exit patterns, which could provide information on communication and coordination needs based on project complexity and approaching deadlines. These micro data provide a richness of individual behaviors and actions that have not yet been fully tapped in management research. Whether it is “big” or “smart” data, the use of large-scale data to predict human behavior is gaining currency in business and government policy practice, as well as in scientific domains where the physical and social sciences converge (recently referred to as “social physics”) (Pentland, 2014).
Sources of Big Data
Big data is also a wrapper for different types of granular data. Below, we list five key sources of high-volume data: (1) public data, (2) private data, (3) data exhaust, (4) community data, and (5) self-quantification data.
“Public data” are data typically held by governments, governmental organizations, and local communities that can potentially be harnessed for wide-ranging business and management applications. Examples of such data include those concerning transportation, energy use, and health care that are accessed under certain restrictions in order to guard individual privacy.
“Private data” are data held by private firms, non-profit organizations, and individuals that reflect private information that cannot readily be imputed from public sources. For example, private data include consumer transactions, radio-frequency identification tags used by organizational supply chains, movement of company goods and resources, website browsing, and mobile phone usage, among several others.
“Data exhaust” refers to ambient data that are passively collected, non-core data with limited or zero value to the original data-collection partner. These data were collected for a different purpose, but can be recombined with other data sources to create new sources of value. When individuals adopt and use new technologies (e.g., mobile phones), they generate ambient data as by-products of their everyday activities. Individuals may also be passively emitting information as they go about their daily lives (e.g., when they make purchases, even at informal markets; when they access basic health care; or when they interact with others). Another source of data exhaust is information-seeking behavior, which can be used to infer people’s needs, desires, or intentions. This includes Internet searches, telephone hotlines, or other types of private call centers.
“Community data” is a distillation of unstructured data, especially text, into dynamic networks that capture social trends. Typical community data include consumer reviews on products, voting buttons (such as, “I find this review useful”), and Twitter feeds, among many others. These community data can then be distilled for meaning to infer patterns in social structure (e.g., Kennedy, 2008).
“Self-quantification data” are types of data that are revealed by the individual through quantifying personal actions and behaviors. For example, a common form of self-quantification data is that obtained through wristbands that monitor exercise and movement; these data are then uploaded to a mobile phone application, where they can be tracked and aggregated. In psychology, individuals have “stated preferences” of what they would like to do versus “revealed preferences,” wherein the preference for an action or behavior is inferred. For example, an individual might buy energy-efficient lightbulbs with the goal of saving electricity but then keep the lights on longer because the bulbs now use less energy. Such self-quantification data help bridge the connection between psychology and behavior. Social science scholars from diverse areas, such as psychology, marketing, or public policy, could benefit from stated and implicit preference data for use in their research.