The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing the necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem, all designed for shared-memory machines and analyzed under the PRAM model. These algorithms do not translate directly into efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
Let G = (V, E) be a graph, where V is the set of vertices and E is the set of edges. An independent set in G is a set of vertices no two of which are adjacent. The largest independent sets (there may be more than one) are called maximum independent sets. Since finding a maximum independent set is NP-hard, most applications settle for finding a maximal independent set. A maximal independent set (MIS) of a graph is an independent set that is not a subset of any other independent set (see Figure 1). Finding a MIS is an important graph problem used in many applications, including computer vision, coding theory, molecular biology, and process scheduling. Although efficient MIS algorithms are well known, the increasing scale of data-intensive applications suggests the use of distributed-memory hardware (clusters), which in turn requires distributed-memory algorithms.
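The distinction between independent, maximal, and maximum sets can be made concrete with a short sketch. The function names below are illustrative (not from the paper), and graphs are represented as adjacency dicts of sets:

```python
# Illustrative sketch: check the definitions on a small graph given as an
# adjacency dict mapping each vertex to the set of its neighbors.

def is_independent(adj, s):
    """True if no two vertices of s are adjacent."""
    return all(not (adj[v] & s) for v in s)

def is_maximal_independent(adj, s):
    """Independent, and no outside vertex can be added: every vertex not in
    s must already have a neighbor in s."""
    return is_independent(adj, s) and all(adj[v] & s for v in adj if v not in s)

# Path 1-2-3-4-5: {1, 3, 5} is a maximal (here also maximum) independent set,
# while {3} is independent but not maximal, since 1 or 5 could still be added.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
```

Note that maximality is a local property (nothing more can be added), whereas finding a maximum set is the NP-hard global optimization problem mentioned above.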
Luby’s Monte Carlo MIS algorithms are often used as the basis of parallel MIS implementations. They were designed for shared-memory machines and analyzed using the Parallel Random Access Machine (PRAM) model, and they do not immediately lend themselves to efficient distributed-memory parallel algorithms because of the overhead incurred by synchronization and distributed subgraph computations. In this paper, we present distributed versions of Luby’s Monte Carlo algorithms (Algorithm A and Algorithm B) that minimize these overheads. Furthermore, we derive a variation of Luby(A) that avoids computing random numbers in every iteration. All presented algorithms are implemented on the AM++ runtime, and their performance is evaluated. Our results show that the proposed algorithms scale well in distributed settings. We also compare our results with the FilteredMIS implementation in the Combinatorial BLAS library and show that our implementations are several times faster than the FilteredMIS algorithm.
II. RELATED WORK
Most parallel MIS algorithms focus on shared memory, using the PRAM model for parallel complexity analysis. In this work, we specifically focus on Luby’s randomized algorithms (see Section III for details). Luby provided a detailed analysis of his algorithms under the PRAM model, and his ideas were later used to build distributed versions. Lynch et al. discuss a distributed version of Luby’s algorithm for synchronous distributed networks. Métivier et al. [12] present an improved version of Lynch’s algorithm with better message complexity. Kuhn et al. provide a deterministic distributed MIS algorithm; however, they assume a synchronous communication model and provide no experimental results.
The Combinatorial BLAS library implements a distributed version of Luby’s algorithm. Their implementation uses linear algebra primitives, and the algorithm works on filtered graphs. Salihoglu et al. implemented a distributed version of Luby’s algorithm for Pregel-like systems, using it to solve the graph coloring problem. Garimella et al. compare the performance of Luby’s implementation in Pregel with a parallel algorithm designed by Blelloch et al., who parallelize the sequential greedy lexicographically-first MIS algorithm. Their algorithm uses a priority DAG constructed over the vertices of the input graph, where the DAG edges connect higher-priority to lower-priority endpoints based on random values assigned to the vertices.
The problem of parallel MIS has been the focus of much theoretical research, and part of the related work discussed above analyzes the parallel time complexity or bit complexity of the algorithms involved. Existing implementations of parallel MIS mostly adopt Luby’s algorithms or use an algorithm based on Luby’s MIS. However, Luby’s MIS does not immediately extend to an efficient distributed-memory parallel algorithm, for reasons we discuss in Section III.
A distributed implementation of Luby’s MIS is openly available in the CombBLAS library. However, the CombBLAS algorithm is designed to work on “filtered” graphs, and it performs several pre-processing steps, including self-edge removal and load balancing. The distributed Luby algorithms we present in this paper are designed specifically for large-scale static graph processing in a distributed-memory setting and are capable of processing unstructured graphs without preprocessing.
III. LUBY’S ALGORITHMS
Luby’s algorithms are the most widely used parallel algorithms for finding a MIS in shared memory. In his original publication, Luby discussed a general iterative scheme and four particular variations based on it. The general iterative scheme is listed in Algorithm 1. In every iteration, the scheme selects a non-empty independent set (Line 5) and merges it into the output (Smis). Then, the selected independent set and its neighbors are removed from the input graph, and the resulting subgraph is fed into the scheme for the next iteration (Lines 7–9). This process repeats until the resulting subgraph is empty. Each iteration thus generates a new independent set, and Luby proved that the union of all those independent sets is a maximal independent set.
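The select-merge-delete loop of the general scheme can be sketched as follows. This is an illustrative Python rendering, not the paper’s pseudocode; `mis_scheme` and `greedy_select` are our names, and the trivial stand-in Select (take the single lowest-numbered live vertex, which is an independent set by itself) is ours — Luby’s actual Select steps are randomized:

```python
def mis_scheme(adj, select):
    """Sketch of the general iterative scheme: select an independent set of
    the remaining subgraph, merge it into Smis, delete the set and its
    neighborhood, and repeat until the subgraph is empty."""
    adj = {u: set(ns) for u, ns in adj.items()}   # mutable working copy
    smis = set()
    while adj:
        i_set = select(adj)                       # non-empty independent set
        smis |= i_set
        dead = i_set | {w for v in i_set for w in adj[v]}
        adj = {u: ns - dead for u, ns in adj.items() if u not in dead}
    return smis

# Trivial deterministic stand-in for a Select step (illustration only):
greedy_select = lambda adj: {min(adj)}

# Path 1-2-3-4-5 yields {1, 3, 5} with this greedy stand-in.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
```

The scheme is agnostic to how the independent set is chosen in each iteration; correctness of the union argument depends only on the selected set being independent and its closed neighborhood being deleted before the next iteration.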
To select an independent set from a subgraph in each iteration, Luby proposed two Monte Carlo algorithms: Select A and Select B. Select B is further refined into two more variations, Select C and Select D. Select algorithms A, B, and C are randomized (non-deterministic), while Select D is deterministic. In this paper, we focus on Select algorithms A and B (since C and D are variations of B). The Select A and Select B algorithms are summarized in Table I.
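The two selection steps can be sketched as below. This is our reading of Luby’s paper, not a definitive implementation: in Select A, a vertex enters the candidate set with probability 1/(2·deg(v)) and conflicts on an edge are resolved by dropping the lower-degree endpoint; in Select B, each vertex draws a random priority and joins when its priority beats all of its neighbors’. Tie-breaking by vertex id and the driver `luby` are our own assumptions:

```python
import random

def select_a(adj, rng):
    # Select A (sketch): candidate with probability 1/(2*deg); isolated
    # vertices always join. For an edge with both endpoints chosen, drop the
    # lower-degree endpoint (ties broken by vertex id here - an assumption).
    deg = {v: len(adj[v]) for v in adj}
    cand = {v for v in adj if deg[v] == 0 or rng.random() < 1.0 / (2 * deg[v])}
    drop = set()
    for v in cand:
        for w in adj[v] & cand:
            drop.add(v if (deg[v], v) < (deg[w], w) else w)
    return cand - drop

def select_b(adj, rng):
    # Select B (sketch): each live vertex draws a random priority; a vertex
    # joins the independent set if its priority is below all its neighbors'.
    pri = {v: rng.random() for v in adj}
    return {v for v in adj if all(pri[v] < pri[w] for w in adj[v])}

def luby(adj, select, seed=1):
    """Run the general iterative scheme with the given Select step."""
    rng = random.Random(seed)
    adj = {u: set(ns) for u, ns in adj.items()}
    mis = set()
    while adj:
        i_set = select(adj, rng)
        if not i_set:          # Select A can come up empty; redraw
            continue
        mis |= i_set
        dead = i_set | {w for v in i_set for w in adj[v]}
        adj = {u: ns - dead for u, ns in adj.items() if u not in dead}
    return mis
```

Both selections return an independent set of the current subgraph: in Select A every surviving candidate edge loses one endpoint, and in Select B a vertex and its neighbor cannot both hold the locally minimal priority.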