Abstract
A u-shapelet is a subsequence of a time series used for clustering time series datasets. The purpose of this paper is to discover u-shapelets in uncertain time series. To this end, we propose a dissimilarity score called FOTS, whose computation is based on the eigenvector decomposition and comparison of the autocorrelation matrices of the time series. This score is robust to the presence of uncertainty, is not very sensitive to transient changes, captures complex relationships between time series such as oscillations and trends, and is well adapted to the comparison of short time series. We use the FOTS score with the Scalable Unsupervised Shapelet Discovery algorithm to cluster 63 datasets, and it yields a substantial improvement in clustering quality as measured by the Rand Index. This work defines a novel framework for the clustering of uncertain time series.
Introduction
All measurements performed by a mechanical system contain uncertainty. Indeed, the uncertainty principle is partly a statement about the limits of a mechanical system's ability to perform measurements on a system without disturbing it [1]. Thus, time series from measurement instruments are uncertain. Such sensor-produced time series constitute a vast proportion of the time series used in science, whether in medicine with ECGs, in physics with measurements recorded by telescopes, or in computing with the Internet of Things. Ignoring the uncertainty of the data during analysis can lead to inaccurate conclusions [2], hence the need for uncertain data management techniques. Several recent studies have focused on the processing of uncertainty in data mining. Rizvandi et al. [3] studied the CPU utilization time patterns of several MapReduce applications, using Dynamic Time Warping and Euclidean distance to compare time series, and investigated the minimum distance/maximum similarity of these applications. Their results showed the effectiveness of their approach on a private cloud with up to 25 virtual nodes. Time series data often contain uncertainty, and DUST is one of the latest methods that can deal with arbitrary probability distributions, but its computational cost is high, particularly when the dataset is large. Hwang et al. [4] demonstrated that a GPU-based implementation of DUST is much faster than the CPU-based one. Rehfeld and Kurths [5] investigated similarity estimators suitable for the quantitative investigation of dependencies in irregular and age-uncertain time series, such as paleoclimate time series.