Abstract
1- Introduction
2- Backgrounds
3- Incremental genetic algorithm
4- Genetic framework
5- Experimental results
6- Conclusion
References
Abstract
Graph pattern matching is a key problem in many applications in which data are represented as graphs, and it is generally formulated as the subgraph isomorphism problem. In this paper, we analyze an incremental hybrid genetic algorithm for the subgraph isomorphism problem, considering various design issues to improve the performance of the algorithm. This incremental hybrid genetic algorithm was previously proposed for the subgraph isomorphism problem and has shown good performance. It decomposes the problem into a sequence of consecutive subproblems that have an optimal substructure. Each subproblem is solved by the hybrid genetic algorithm, and the solutions obtained are extended to serve as initial solutions for the next subproblem. We examine a wide range of schemes that determine the overall performance of the incremental process and conduct a number of experiments on a synthetic dataset of random graphs to verify the effectiveness of each scheme. We show that, by applying the appropriate schemes identified in these experiments, the performance of the incremental approach can be significantly improved compared with previous representative studies. In addition, we investigate the effect of different genetic parameters and assess the scalability of our method through experiments on a real-world dataset of large graphs.
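The incremental scheme described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: pattern vertices are added one at a time in index order, so subproblem k asks for a mapping of the pattern subgraph induced by its first k vertices; fitness is the number of pattern edges preserved by the mapping; and the hybrid genetic algorithm of each stage is replaced by a deliberately simplified evolutionary step (mutation plus truncation selection, with no crossover and no local search). All function names (`fitness`, `extend`, `mutate`, `solve_stage`, `incremental_search`) are hypothetical.

```python
import random

def fitness(mapping, pattern_edges, target_adj):
    """Count pattern edges whose endpoints are both mapped onto a target edge."""
    k = len(mapping)  # stage k: pattern vertices 0..k-1 are mapped
    return sum(
        1
        for u, v in pattern_edges
        if u < k and v < k and mapping[v] in target_adj[mapping[u]]
    )

def free_vertices(mapping, target_n):
    """Target vertices not yet used by the (injective) mapping."""
    used = set(mapping)
    return [t for t in range(target_n) if t not in used]

def extend(mapping, target_n, rng):
    """Carry a stage-k solution into stage k+1: map pattern vertex k to a random unused target vertex."""
    return mapping + [rng.choice(free_vertices(mapping, target_n))]

def mutate(mapping, target_n, rng):
    """Reassign one pattern vertex to an unused target vertex; the mapping stays injective."""
    child = list(mapping)
    free = free_vertices(child, target_n)
    if free:
        child[rng.randrange(len(child))] = rng.choice(free)
    return child

def solve_stage(population, pattern_edges, target_adj, generations, rng):
    """Simplified stand-in for the hybrid GA: mutation plus truncation selection."""
    def score(m):
        return fitness(m, pattern_edges, target_adj)

    for _ in range(generations):
        children = [mutate(m, len(target_adj), rng) for m in population]
        pool = sorted(population + children, key=score, reverse=True)
        population = pool[: len(population)]  # keep the best individuals
    return population

def incremental_search(pattern_n, pattern_edges, target_adj,
                       pop_size=20, generations=50, seed=0):
    """Solve subproblems k = 1..pattern_n, seeding each from the previous one."""
    if pattern_n > len(target_adj):
        return None  # no injective mapping can exist
    rng = random.Random(seed)
    population = [[] for _ in range(pop_size)]
    for _ in range(pattern_n):
        population = [extend(m, len(target_adj), rng) for m in population]
        population = solve_stage(population, pattern_edges, target_adj, generations, rng)
    return max(population, key=lambda m: fitness(m, pattern_edges, target_adj))

# Pattern: a triangle. Target: the 4-cycle 0-1-2-3 with the chord 0-2.
pattern_edges = [(0, 1), (1, 2), (0, 2)]
target_adj = [{1, 2, 3}, {0, 2}, {0, 1, 3}, {0, 2}]
best = incremental_search(3, pattern_edges, target_adj)
print(best, fitness(best, pattern_edges, target_adj))
```

Because each stage starts from the population of the previous stage, good partial mappings found early are carried forward rather than rediscovered. Assigning the new pattern vertex to a random unused target vertex is only one possible way to extend a solution.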
Introduction
A graph is a simple and universal data representation for modeling pairwise relationships among a set of objects. One of the interesting problems that arise when handling graph data is graph pattern matching, which appears in pattern recognition, knowledge discovery, biology, cheminformatics, dynamic network traffic, and intelligence analysis [1–4]. This matching is typically formulated as the subgraph isomorphism problem. Given two graphs G and H, the subgraph isomorphism problem is to determine whether H contains a subgraph that is isomorphic to G; this decision problem is well known to be NP-complete [5]. Many algorithms have been proposed for this problem, starting with the backtracking algorithm of Ullmann [6]. VF2 [7], QuickSI [8], GraphQL [9], GADDI [10], and SPath [11] improved on the performance of Ullmann's algorithm by exploiting different join orders, pruning rules, and auxiliary information [12]. The maximum common subgraph problem, a generalization of the subgraph isomorphism problem, has also been tackled by many algorithms [13–15]. However, the algorithms for both problems have exponential time complexity, so their scalability is limited, and they depend on auxiliary information such as vertex or edge labels. On the other hand, metaheuristic algorithms, especially genetic algorithms, have also been applied to this problem [16–22]. They can usually find good-quality solutions within a reasonable amount of time, but most of them lack the search capability needed to cover the large and complex search space of the subgraph isomorphism problem.
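To make the definition concrete, the following is a minimal backtracking search in the spirit of Ullmann's approach, though not a reproduction of his matrix-based refinement procedure. It assumes unlabeled, undirected graphs given as lists of adjacency sets and looks for an injective map f from the vertices of G to those of H such that every edge (u, v) of G is mapped to an edge (f(u), f(v)) of H. The function name `find_embedding` is hypothetical, and the only pruning is a simple degree filter. Its worst-case running time is exponential, which illustrates the scalability limitation discussed above.

```python
def find_embedding(g_adj, h_adj):
    """Return a list f with f[u] = image of G-vertex u in H, or None if G is not a subgraph of H."""
    n = len(g_adj)
    mapping = []   # mapping[u] is the H-vertex assigned to G-vertex u
    used = set()   # H-vertices already taken (keeps the map injective)

    def consistent(u, x):
        # Every already-mapped neighbour w of u must map to a neighbour of x.
        return all(mapping[w] in h_adj[x] for w in g_adj[u] if w < u)

    def backtrack(u):
        if u == n:
            return True
        for x in range(len(h_adj)):
            # Degree filter: x needs at least as many neighbours as u.
            if x in used or len(h_adj[x]) < len(g_adj[u]) or not consistent(u, x):
                continue
            mapping.append(x)
            used.add(x)
            if backtrack(u + 1):
                return True
            mapping.pop()          # undo and try the next candidate
            used.remove(x)
        return False

    return list(mapping) if backtrack(0) else None

triangle = [{1, 2}, {0, 2}, {0, 1}]
square_with_chord = [{1, 2, 3}, {0, 2}, {0, 1, 3}, {0, 2}]
square = [{1, 3}, {0, 2}, {1, 3}, {0, 2}]
print(find_embedding(triangle, square_with_chord))  # → [0, 1, 2]
print(find_embedding(triangle, square))             # → None
```

Label-based pruning of the kind used by the algorithms cited above would plug into the same candidate filter, which is why such methods benefit from vertex or edge labels.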