Abstract
User behavior clustering analysis has a wide range of applications in business intelligence, information retrieval, image pattern recognition, and fault diagnosis. Most existing user behavior analysis methods suffer from problems such as weak generality and the lack of tags for the resulting clusters. With growing awareness of privacy protection, user behavior analysis also needs to operate on ciphertext in order to protect user data. Building on clustering algorithms, homomorphic encryption, and information security techniques, this paper proposes a user behavior clustering scheme that supports automatic tagging over ciphertext. First, we design secure protocols for basic operations such as addition, multiplication, and comparison, and apply them in the scheme. Then, the relevant features of user behavior are integrated with the clustering process, the latent factor model, and matrix decomposition. We have implemented our method and evaluated its performance using K-means and K-means++ clustering. The results show that the scheme can automatically tag encrypted data and that the resulting tags are consistent with the actual situation, which demonstrates the validity and generality of the scheme.
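The abstract does not name the homomorphic encryption scheme behind the secure addition and multiplication protocols. As a minimal sketch, assuming an additively homomorphic scheme such as Paillier, the Python code below shows how two encrypted values can be added, and an encrypted value multiplied by a public constant, without decryption. The primes `P` and `Q` and all function names are hypothetical and far too small to be secure. Paillier does not directly support multiplying two ciphertexts or comparing them; those operations would need an interactive protocol, which is not shown here.

```python
import math
import secrets

# Toy Paillier cryptosystem (an assumed stand-in; the paper's scheme is not specified).
# These tiny primes are for illustration only and are NOT secure.
P, Q = 1009, 1013


def keygen(p=P, q=Q):
    """Return the public modulus n and the private key (lam, mu), using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lam^-1 mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n)
        if r > 0 and math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_cipher(n, c1, c2):
    """Enc(a) * Enc(b) mod n^2 decrypts to a + b (mod n)."""
    return (c1 * c2) % (n * n)


def scale_cipher(n, c, k):
    """Enc(a)^k mod n^2 decrypts to k * a (mod n) for a public constant k."""
    return pow(c, k, n * n)


if __name__ == "__main__":
    n, priv = keygen()
    ca, cb = encrypt(n, 42), encrypt(n, 58)
    print(decrypt(n, priv, add_cipher(n, ca, cb)))
    print(decrypt(n, priv, scale_cipher(n, ca, 3)))
```

Choosing `g = n + 1` keeps key generation simple, since `mu` is then just the inverse of `lam` modulo `n`. Because each encryption draws a fresh random `r`, encrypting the same value twice generally yields different ciphertexts.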
Introduction
With the increasing maturity of mobile Internet technology, people use various mobile devices and wireless communication networks to browse the web, read news, and engage in social activities anytime and anywhere, making information exchange ever more convenient. Massive amounts of data are constantly generated in various fields, so Internet data and resources have reached a massive scale. How to extract useful information and knowledge from redundant data to help us make more objective and effective decisions has become an important problem. User behavior analysis, which refers to the statistics and analysis of user interests, can address this problem. Clustering algorithms are a common means of achieving it and are widely used in statistical data analysis fields such as business intelligence, information retrieval, image pattern recognition, and fault diagnosis [1]. At present, most clustering algorithms still have two problems: the number of clusters is unknown, and the tags of the resulting clusters are unknown. Without iterating through the data in each group, the category that a group represents cannot be known, and after user behavior data are clustered there is no suitable method to tag each group directly. For example, shopping websites usually record the products that members buy or comment on, together with each product's category. These websites want to obtain a tag for each group after clustering and combine it with the product categories, so that they know what kinds of products the users in each group like. The company can then offer different marketing plans to different groups.
While user behavior clustering is widely applied, it can also cause serious privacy disclosure that harms the data owner [2], [3]. For example, when clustering is used for stock analysis, leaking the behavior information of individual stocks during clustering could bring chaos to the stock market. In addition, criminals may steal user behavior data, which often reflects users' interests and hobbies, and use it to commit fraud.
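To make the shopping-website example concrete, the plaintext sketch below clusters users by per-category purchase counts with K-means and then tags each cluster with the category that dominates its centroid. The data, the category names, the deterministic farthest-first seeding (used here in place of K-means++'s randomized seeding so that the result is reproducible), and the "largest centroid coordinate" tagging rule are all hypothetical illustrations. This is not the paper's latent-factor-based tagging, and because it runs on plaintext it exposes exactly the data that the privacy concerns above are about.

```python
# Plaintext K-means with a hypothetical cluster-tagging rule (illustration only).

CATEGORIES = ["books", "electronics", "clothing"]  # hypothetical categories

# Hypothetical purchase counts: one row per user, one column per category.
USERS = [
    [9, 1, 0], [8, 0, 1], [7, 2, 1],  # mostly books
    [0, 9, 1], [1, 8, 0], [0, 7, 2],  # mostly electronics
    [1, 0, 9], [0, 1, 8], [2, 1, 7],  # mostly clothing
]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def farthest_first_init(points, k):
    """Deterministic seeding used here instead of K-means++'s random seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from its nearest already-chosen centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's K-means: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:  # converged: centroids no longer change
            break
        centroids = new
    return labels, centroids


def tag_clusters(centroids, categories):
    """Hypothetical rule: tag each cluster with its largest centroid coordinate."""
    return [categories[max(range(len(c)), key=lambda i: c[i])] for c in centroids]


if __name__ == "__main__":
    labels, centroids = kmeans(USERS, k=3)
    tags = tag_clusters(centroids, CATEGORIES)
    for j, tag in enumerate(tags):
        members = [i for i, lab in enumerate(labels) if lab == j]
        print(f"cluster {j}: tag={tag}, users={members}")
```

With well-separated groups like these, each cluster ends up tagged with the category its members buy most; real behavior data would call for a more careful tagging rule and an estimate of the number of clusters, which are the two open problems noted above.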