Abstract
Analysis of public transportation data in large cities is a challenging problem. Managing data ingestion, data storage, data quality enhancement, modelling and analysis is computationally intensive and requires a non-trivial amount of resources. In EUBra-BIGSEA (Europe-Brazil Collaboration of Big Data Scientific Research Through Cloud-Centric Applications) we address these problems in a comprehensive and integrated way. EUBra-BIGSEA provides a platform for building data analytics workflows on top of elastic cloud services without requiring skills in either programming or cloud services. The approach combines cloud orchestration, Quality of Service and automatic parallelisation on a platform that includes a toolbox for implementing privacy guarantees and data quality enhancement, as well as advanced services for sentiment analysis, traffic jam estimation and trip recommendation based on estimated crowdedness. All developments are available under Open Source licenses (http://github.org/eubr-bigsea, https://hub.docker.com/u/eubrabigsea/).
Introduction
Public transportation in large cities is a major source of highly valuable data for understanding and improving citizens' lifestyles and for reacting dynamically to unplanned events. Multiple heterogeneous data sources are available, and a variety of data analytics tools exist. However, processing such data requires downloading the data, installing processing tools, managing the resources and developing processing software. EUBra-BIGSEA (Europe-Brazil Collaboration of Big Data Scientific Research Through Cloud-Centric Applications) is a collaboration aimed at developing convenient cloud-based data analytics services, tailored mainly to public transportation data. These services process data under several restrictions, such as Quality of Service (QoS) constraints and privacy-awareness, by means of convenient, auto-parallelisable programming models. EUBra-BIGSEA has developed and implemented a software architecture that addresses a significant number of software requirements for three main use cases in public transportation data analysis.