The increasing size, variety, rate of growth and change, and complexity of network data call for advanced network analysis and services. Tools that automate analysis through traditional or advanced signature-based systems, or through machine learning classifiers, face practical difficulties: when put to use in operational cyber security, they fail to provide comprehensive and contextual insight into the network. In this paper, we present an effective tool for network security and traffic analysis that uses high-performance data analytics based on a class of unsupervised learning algorithms called tensor decompositions. The tool aims to provide scalable analysis of network traffic data, to reduce the cognitive load on network analysts, and to be friendly to network experts by presenting clear and actionable insights into the network. We demonstrate the successful use of the tool in two very different operational cyber security environments: (1) the security operations center (SOC) for the SCinet network at the SuperComputing (SC) Conference in 2016 and 2017, and (2) Reservoir Labs’ Local Area Network (LAN). In each of these environments, we produce actionable results for cyber security specialists, including (but not limited to) (1) finding malicious network traffic from internal and external attackers using port scans, SSH brute forcing, and NTP amplification attacks, (2) uncovering obfuscated network threats such as data exfiltration over the DNS port and over ICMP traffic, and (3) finding network misconfiguration and performance degradation patterns.
Network analysis and network threat identification are notoriously difficult problems. Traditional signature-based approaches are often thwarted by the ever-changing nature of modern cyber threats. It is nearly impossible to define signatures of normal and abnormal behavior that generalize across many networks, and even on a single network, expected behavior may change from day to day. Furthermore, it may not be possible to write coherent rules that capture all activities of concern. Applying cutting-edge data analytics to network traffic logs has so far struggled to overcome the shortcomings of classical signature-based systems. Supervised techniques run afoul of the same key problem: it is not realistic to specify normal versus abnormal behavior up front. Other approaches that train a model on large volumes of historical data face another obstacle: because network traffic is sensitive, very little training data is publicly available, and that data is not guaranteed to generalize meaningfully to the user’s own network.
Tensor decompositions are a class of algorithms that provide a new approach to analyzing network traffic data, one that has been demonstrated to overcome these traditional shortcomings. A tensor is a multidimensional array of data, a suitable abstraction for structured network metadata collected in the form of network logs. A tensor decomposition breaks a tensor, such as one built from a log, into a finite set of patterns called components. In this way, tensor decompositions perform a form of unsupervised learning on network traffic that does not require prior training data.
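To make the idea of turning a log into a tensor and decomposing it into components more concrete, here is a minimal sketch under several assumptions that are ours, not the paper's. The toy log is invented. The three fields (source IP, destination IP, destination port) are an illustrative choice. The decomposition is a nonnegative CP (CANDECOMP/PARAFAC) model fitted with simple multiplicative updates, used as a stand-in; the tool described here may use a different decomposition and a far more scalable implementation. The names `build_tensor`, `nonneg_cp`, and `describe` are hypothetical.

```python
import numpy as np

# Hypothetical toy flow log: (source IP, destination IP, destination port).
# Pattern 1 resembles one host probing SSH on several machines;
# pattern 2 resembles two hosts repeatedly contacting one DNS server.
LOG = (
    [("10.0.0.5", "192.168.1.10", 22),
     ("10.0.0.5", "192.168.1.11", 22),
     ("10.0.0.5", "192.168.1.12", 22)]
    + [("10.0.0.7", "192.168.1.20", 53)] * 4
    + [("10.0.0.8", "192.168.1.20", 53)] * 2
)


def build_tensor(log):
    """Build a count tensor: one mode per log field, one index per distinct value."""
    labels = [sorted(set(values)) for values in zip(*log)]
    index = [{v: i for i, v in enumerate(lab)} for lab in labels]
    tensor = np.zeros([len(lab) for lab in labels])
    for entry in log:
        # Each log entry increments the cell addressed by its field values.
        tensor[tuple(index[k][v] for k, v in enumerate(entry))] += 1.0
    return tensor, labels


def nonneg_cp(X, rank, n_iter=1000, seed=0, eps=1e-12):
    """Nonnegative CP decomposition of a 3-way tensor via multiplicative updates.

    Returns factor matrices (A, B, C) with X[i, j, k] ~ sum_r A[i, r] B[j, r] C[k, r],
    plus the relative reconstruction error before and after each iteration.
    """
    rng = np.random.default_rng(seed)
    A, B, C = (rng.uniform(0.1, 1.0, size=(n, rank)) for n in X.shape)
    norm_x = np.linalg.norm(X)

    def rel_error():
        approx = np.einsum("ir,jr,kr->ijk", A, B, C)
        return np.linalg.norm(X - approx) / norm_x

    errors = [rel_error()]
    for _ in range(n_iter):
        # Each update multiplies a factor by (data term) / (model term);
        # both terms are nonnegative, so the factors stay nonnegative.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        errors.append(rel_error())
    return (A, B, C), errors


def describe(factors, labels):
    """For each component, report the highest-weighted value in every mode."""
    rank = factors[0].shape[1]
    return [
        tuple(lab[int(np.argmax(F[:, r]))] for F, lab in zip(factors, labels))
        for r in range(rank)
    ]


if __name__ == "__main__":
    tensor, labels = build_tensor(LOG)
    factors, errors = nonneg_cp(tensor, rank=2)
    print(f"relative error: {errors[-1]:.3f}")
    for summary in describe(factors, labels):
        print(summary)
```

Each component groups field values that tend to co-occur, so on data like this one would expect the SSH-probing host and the DNS-heavy hosts to end up in separate components. The exact result depends on the chosen rank and the random initialization. Nonnegativity is a design choice made here because the tensor holds counts, which keeps each component readable as an additive pattern of traffic.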