With popularization of internet, internet attack cases are increasing, and attack methods differs each day, thus information safety problem has became a significant issue all over the world. Nowadays, it is an urgent need to detect, identify and hold up such attacks effectively. The research intends to compare efficiency of machine learning methods in intrusion detection system, including classification tree and support vector machine, with the hope of providing reference for establishing intrusion detection system in future.
Compared with other related works in data mining-based intrusion detectors, we proposed to calculate the mean value via sampling different ratios of normal data for each measurement, which lead us to reach a better accuracy rate for observation data in real world. We compared the accuracy, detection rate, false alarm rate for four attack types. More over, it shows better performance than KDD Winner, especially for U2R type and R2L type attacks.
In recent years, as internet and personal computers are populated, utilization rate of internet keeps increasing. It is changing people’s lives gradually, and the majorities of people study, recreate, communicate and buy through internet. Besides common people, enterprise structure and business mode also undergoes transformation due to internet, and large enterprise or government organizations, in order to achieve operation purpose and efficiency, develop many application and service items resting on internet; these are an irresistible tendency in the new era.
However, though internet brings about convenience and realtimeliness, consequently comes information safety problem; for example: servers are attacked and paralyzed, inner data and information are stolen, and so on. In the event of such cases, big losses may be caused in money and business credit. For example, in 2000, American Yahoo was subject to DDos attack, the servers were paralyzed for 3 hours approximately, 1 million users were affected, and the losses involved were too large to calculate. Other famous business internets, such as CNN, eBay, Amazon.com, Buy.com, and so on, also suffered such internet attacks.
5.2. Future research suggestions
-Dataset KDD Cup 99 applied in the research is popularly used in current intrusion detection system; however, it is data of 1999, and network technology and attack methods changes greatly, it cannot reflect real network situation nowadays. Therefore, if newer information is got and tested and compared refresh, they can more accurately reflect current network situation.
-Through test and comparison, the accuracy and detection rate of C4.5 is higher than that of SVM, but false alarm rate of SVM is better; if we combine the two methods, overall accuracy can be increased greatly.
-In sampling, the research supposes that the distribution of attack data other than normal data is even, which cannot surely get optimal results, and this should be improved and validated in future.
-C4.5 parameters set in the research is not optimal, thus the future work should optimize the parameters according to C4.5 parameters and different training dataset.
-SVM applied in the research uses its built-in grid.py to optimize its parameters, and it needs approximately 2 hours to search parameters for 10,000 groups of data in the research; however, it is not suitable, for intrusion detection system requires realtimeliness. The future research should aim at the direction where the parameters can be optimized rapidly.