With the development of the Internet of Things (IoT), heterogeneous sensory data have become ubiquitous in our daily lives. Unlike traditional sensory data, a heterogeneous sensory data set often contains data of various modalities, so in this paper we call it multi-modal sensory data. Such data make it possible to monitor more complex objects and to improve monitoring accuracy. However, because there is no integration model for multi-modal sensory data, most existing sensory data management algorithms consider only single-modal sensory data, which leaves sensory data underutilized. In this paper, we therefore propose a model based on the Hidden Markov Process for integrating the heterogeneous sensory data generated in an IoT system, and then present a distributed algorithm for constructing this model. The integration model can support many applications; we use cooperative event detection as an illustrative example. Extensive theoretical analysis and experimental results show that all the proposed algorithms are efficient and effective.
With the rapid development of sensing techniques, embedded systems and cross-technology communication, an IoT system, or even a single device, typically involves a variety of sensors. For example, current smartphones are equipped with several different sensors, such as an accelerometer, a digital compass, a gyroscope, GPS, a microphone and a camera. An intelligent traffic monitoring system may involve many traffic-flow monitoring sensors, such as electronic eyes, GPS devices and intelligent traffic lights. A smart home application typically contains RFID tags for locating objects; sensors for sampling the temperature, humidity, light intensity, air flow and other properties of the environment; smart bracelets for obtaining the health information of monitored persons; and cameras and acoustic sensors for capturing abnormal events and guaranteeing the safety of the house.

Unlike the data in traditional sensor networks, the sensory data sampled by current IoT systems not only have a large volume but also involve diverse modalities. In the examples above, a crowdsourcing task running on a smartphone may use the accelerometer, microphone and camera to collect sensory data simultaneously, producing vector data, audio data and video data, respectively. Similarly, an intelligent traffic system generates scalar data, vector data and video data simultaneously. In a forest ecology monitoring system, temperature and humidity are represented as scalar data, wind velocity and direction as vector data, and pictures of plants and videos of animals as multimedia data. Likewise, in a smart home application, the data set includes scalar data such as temperature and humidity, vector data such as the movement information of monitored persons, and multimedia data such as the data sampled by the cameras and acoustic sensors.
The data sets generated by the above IoT systems therefore span multiple modalities, and we call such a heterogeneous data set a multi-modal sensory data set.
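To make the definition concrete, the sketch below represents a sensory data set as a collection of readings, each tagged with one of the coarse modalities named above (scalar, vector, multimedia). It flags the set as multi-modal when it mixes at least two modalities. This is a hypothetical illustration, not the paper's integration model. The names `Modality`, `Reading`, `SensoryDataSet` and `is_multi_modal`, as well as the smart-home sensor names and values, are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Set


class Modality(Enum):
    # Hypothetical coarse modality labels, following the examples in the text.
    SCALAR = "scalar"          # e.g. temperature, humidity
    VECTOR = "vector"          # e.g. wind velocity and direction, movement
    MULTIMEDIA = "multimedia"  # e.g. pictures, audio, video


@dataclass
class Reading:
    # One sample from one sensor (illustrative fields only).
    sensor: str
    modality: Modality
    value: Any


@dataclass
class SensoryDataSet:
    readings: List[Reading] = field(default_factory=list)

    def modalities(self) -> Set[Modality]:
        # Distinct modalities present in the set.
        return {r.modality for r in self.readings}

    def is_multi_modal(self) -> bool:
        # Under this sketch, a set is multi-modal if it mixes two or more modalities.
        return len(self.modalities()) >= 2


# Hypothetical smart-home data set mixing scalar, vector and multimedia readings.
home = SensoryDataSet([
    Reading("thermometer", Modality.SCALAR, 22.5),
    Reading("hygrometer", Modality.SCALAR, 0.41),
    Reading("bracelet", Modality.VECTOR, (0.3, -0.1, 9.8)),
    Reading("camera", Modality.MULTIMEDIA, b""),  # frame bytes omitted
])
print(home.is_multi_modal())  # True
```

A set containing only thermometer readings would have a single modality and would not be treated as multi-modal under this sketch. Tagging each reading with an explicit modality keeps the check independent of how the raw values are stored.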