Abstract
1- Introduction
2- Literature Review
3- The Dataset
4- Background
5- Experiment 1: Clustering by K-means
6- Experiment 2: Self-Organizing Map (SOM) and K-means
7- Conclusion & Future work
References
Abstract
Customer segmentation is an important concept for designing marketing campaigns that improve business performance and increase revenue, and clustering algorithms can help marketing experts achieve this goal. The rapid growth of high-dimensional databases and data warehouses, such as those behind Customer Relationship Management (CRM) systems, has stressed the need for advanced data analytics techniques. In this paper we investigate two data analytics algorithms, K-Means and the Self-Organizing Map (SOM), using the TIC CRM dataset. While K-Means showed promising clustering results, SOM outperformed it in speed, quality of clustering, and visualization. We also discuss how segmentation analysis with both techniques can be useful in studying customers' interests. The purpose of this paper is to provide a proof of concept (based on a small, publicly available dataset) of how big data analytics can be used in customer segmentation.
Introduction
Companies nowadays work continuously to increase their competitiveness. With the availability of high-dimensional big data in Customer Relationship Management (CRM) systems and data warehouses, the need for advanced data mining technologies has increased significantly. Data mining algorithms can help a business find interesting knowledge in its customers' data, both demographic and behavioral; it is then the marketing experts' responsibility to use these insights in designing the company's marketing campaigns to meet customers' interests. The insurance company dataset (TIC), which we mine in this paper, was used in the COIL 2000 challenge. The goal of the challenge was to predict which customers are interested in a caravan insurance policy. The main target of our study is to answer the following research question: Can we discover meaningful clusters using different cluster analysis techniques applied to the high-dimensional TIC CRM data? We attempt to answer this question by exploring cluster analysis, and we identify some interesting patterns that can be used by marketing experts in insurance companies. In particular, we investigate two data mining techniques: the well-known K-means clustering algorithm, and K-means combined with the SOM technique (which is based on artificial neural networks, ANN). This combination has shown promising results in clustering and visualizing the CRM dataset. Clustering and visualization of a high-dimensional dataset can be used to recognize the characteristics of customers in CRM data in order to design customer-centric marketing plans.
The remainder of this paper is organized as follows. Section 2 reviews other research on data mining tasks and techniques applied to CRM data in different domains, highlighting the strengths and points of interest of other high-quality papers. Section 3 presents the methodology and techniques used in this research. Section 4 presents our solutions as an experimental evaluation of the data mining techniques, showing the results of applying them to the TIC CRM dataset. Finally, Section 5 concludes this study with a general discussion of the proposed solutions and possible future work.
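To make the research question above more concrete, the following minimal sketch shows what partitioning customer records with K-means looks like on a small synthetic dataset. It is an illustration only and not the experimental setup of this paper: the toy data, the helper names (`make_toy_customers`, `farthest_first_init`, `kmeans`), the choice of two clusters and ten attributes, and the deterministic farthest-first initialization (used here instead of random seeding so the run is reproducible) are all assumptions.

```python
import random


def make_toy_customers(n_per_group=30, dims=10, seed=0):
    """Build two synthetic 'customer' groups centred at 0 and at 10 in every attribute."""
    rng = random.Random(seed)
    low = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(n_per_group)]
    high = [[rng.gauss(10.0, 1.0) for _ in range(dims)] for _ in range(n_per_group)]
    return low + high


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Start at points[0], then repeatedly add the point farthest from the chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain K-means: alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels, centroids


customers = make_toy_customers()
labels, centroids = kmeans(customers, k=2)
for c in range(2):
    print(f"cluster {c}: {labels.count(c)} customers")
```

On real CRM data the attributes would typically need scaling or encoding first, and the number of clusters would have to be chosen rather than assumed, which is part of what makes the high-dimensional case harder than this toy example.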