Human capital is a major concern for company management, whose main interest is in hiring highly qualified personnel who are expected to perform well. Recently, there has been growing interest in data mining, where the objective is the discovery of knowledge that is correct and of high benefit to users. In this paper, data mining techniques were used to build a classification model to predict the performance of employees. The CRISP-DM data mining methodology was adopted to build the model. The decision tree was the main data mining tool used, and several classification rules were generated from it. To validate the generated model, several experiments were conducted using real data collected from several companies. The model is intended to be used for predicting the performance of new applicants.
Human resources have become one of the main concerns of managers in almost all types of organizations, including private companies, educational institutions and governmental organizations. Business organizations are keen to set up plans for selecting the right employees. After hiring, management becomes concerned with the performance of these employees and builds evaluation systems in an attempt to retain the good performers (Chein and Chen, 2006).
Data mining is a young and promising field of information and knowledge discovery (Han et al., 2011). It has attracted the interest of the information industry because of the availability of huge volumes of data containing large amounts of hidden knowledge. With data mining techniques, such knowledge can be extracted and accessed, transforming the role of databases from storage and retrieval to learning and knowledge extraction.
Data mining consists of a set of techniques that can be used to extract relevant and interesting knowledge from data. It covers several tasks, such as association rule mining, classification and prediction, and clustering. Classification techniques are supervised learning techniques that assign each data item to one of a set of predefined class labels. Classification is one of the most useful data mining techniques for building models from an input data set, and the resulting models are commonly used to predict future data trends. There are several algorithms for data classification, such as decision tree and Naïve Bayes classifiers. With classification, the generated model is able to predict a class for given data based on information previously learned from historical data.
The decision tree is one of the most widely used techniques. It builds the tree from the given data using simple calculations that depend mainly on the gain ratio, which automatically assigns a form of weight to the attributes used, so the researcher can see which attributes have the most effect on the predicted target. The result is a decision tree from which classification rules can be generated (Han et al., 2011).
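To make the gain ratio calculation concrete, the following is a minimal sketch of how it could be computed for categorical attributes, using the standard definition (information gain divided by split information). The attribute names (`degree`, `experience`), the target name (`performance`) and the six records are hypothetical and are not taken from the data used in this study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on a categorical `attribute`."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after.
    before = entropy([row[target] for row in rows])
    after = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = before - after
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(
        (len(p) / total) * math.log2(len(p) / total) for p in partitions.values()
    )
    if split_info == 0:
        return 0.0  # the attribute has a single value and cannot split the data
    return info_gain / split_info


# Hypothetical toy records, for illustration only.
rows = [
    {"degree": "bachelor", "experience": "low", "performance": "low"},
    {"degree": "bachelor", "experience": "high", "performance": "high"},
    {"degree": "master", "experience": "low", "performance": "low"},
    {"degree": "master", "experience": "high", "performance": "high"},
    {"degree": "bachelor", "experience": "low", "performance": "low"},
    {"degree": "master", "experience": "high", "performance": "high"},
]

ratios = {a: gain_ratio(rows, a, "performance") for a in ("degree", "experience")}
best_attribute = max(ratios, key=ratios.get)
print(ratios, best_attribute)
```

In this toy data the `experience` attribute separates the two classes perfectly, so it receives the higher gain ratio and would be chosen as the root split. A tree builder would repeat the same calculation on each resulting partition.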
The Naïve Bayes classifier is another classification technique used to predict a target class. Its calculations are based on probabilities, namely Bayes' theorem. Because of this probabilistic basis, its results are often accurate and efficient, and the model is responsive to new data added to the dataset (Han et al., 2011).
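As an illustration of how Bayes' theorem is applied, the sketch below implements a small Naïve Bayes classifier for categorical attributes. It assumes Laplace (add-one) smoothing, which is a common choice but is not stated in the text, and it uses hypothetical attribute names and records rather than the data of this study.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, attributes, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domain = defaultdict(set)            # attribute -> set of observed values
    for row in rows:
        for attribute in attributes:
            value_counts[(attribute, row[target])][row[attribute]] += 1
            domain[attribute].add(row[attribute])
    return class_counts, value_counts, domain


def predict_proba(model, record, attributes):
    """Posterior probability of each class for one record."""
    class_counts, value_counts, domain = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for attribute in attributes:
            seen = value_counts[(attribute, label)][record[attribute]]
            # Assumed Laplace smoothing avoids zero probabilities.
            score *= (seen + 1) / (n + len(domain[attribute]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical toy records, for illustration only.
rows = [
    {"degree": "bachelor", "experience": "low", "performance": "low"},
    {"degree": "bachelor", "experience": "high", "performance": "high"},
    {"degree": "master", "experience": "low", "performance": "low"},
    {"degree": "master", "experience": "high", "performance": "high"},
    {"degree": "bachelor", "experience": "low", "performance": "low"},
    {"degree": "master", "experience": "high", "performance": "high"},
]
attributes = ["degree", "experience"]

model = train_naive_bayes(rows, attributes, "performance")
applicant = {"degree": "master", "experience": "high"}
posterior = predict_proba(model, applicant, attributes)
predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

Because the model is just a set of counts, adding a new record only requires incrementing the relevant counters, which is one reason the classifier adapts easily when new data are added.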
Several studies have used data mining to extract rules and predict certain behaviors in areas such as science, information technology, human resources, education, biology and medicine.
For example, Beikzadeh and Delavari (2004) used data mining techniques to suggest enhancements to higher educational systems. Al-Radaideh et al. (2006) also used data mining techniques to predict university students’ performance. Many medical researchers, on the other hand, have used data mining techniques to extract clinical knowledge from the enormous volume of patient data files and histories; Lavrac (1999) was one such researcher. Mullins et al. (2006) also worked on patients’ data to extract disease association rules using unsupervised methods.
Karatepe et al. (2006) defined the performance of a frontline employee as his/her productivity compared with that of his/her peers. Schwab (1991), on the other hand, described the performance of the university teachers included in his study as the number of research works cited or published. In general, performance is usually measured by the units produced by the employee in his/her job within a given period of time.
Researchers such as Chein and Chen (2006) have worked on improving employee selection by building a model, using data mining techniques, to predict the performance of new applicants. Based on attributes selected from applicants' CVs, job applications and interviews, their performance could be predicted and used as a basis for decision makers to decide whether or not to employ them.