Abstract
1- Preliminaries
2- Relevant algorithmic work and structural characterization of successive minimal maximum subsequences
3- Parallel algorithm on PRAM model for Max-computation
4- Parallel algorithm on cluster systems for Max-computation
5- Conclusion
References
Abstract
A minimal maximum subsequence is a minimal subsequence with maximum cumulative sum. We present two parallel algorithms that find all successive minimal maximum subsequences: one on the parallel random-access model in logarithmic time with linear work, and the other with overlapping domain decomposition on cluster systems. We confirm the efficacy and efficiency of the latter algorithm for random sequences via: (1) an application of random-walk theory that derives an approximate probabilistic length upper bound for overlapping subsequences — thus facilitating concurrent/independent computations of all minimal maximum subsequences in hosting processors, and (2) an empirical study with normally-distributed random sequences.
Conclusion
The problem of computing the set of all successive non-empty minimal maximum subsequences of a real-valued sequence has major applications, among other areas, in informatics and textual information retrieval. The Max-computation has real practical importance as it appears as a filtering subroutine in biological sequence analysis. Hence there is a natural need for computing Max in parallel and its implementation on practical parallel systems. Our theoretical parallel algorithm computes Max in logarithmic parallel time and optimal linear work on the EREW PRAM model. Our initial efforts in adapting the algorithm to a cluster of processors are impeded by the lack of a communicationefficient algorithm in solving an embedded subproblem. The Max-computation has linear sequential complexity, and it is solved very efficiently by an optimal linear-time sequential algorithm, hence achieving good speed-ups on a practical parallel system is a challenge. The underlying support for the overlapping domain-decomposed parallel algorithm (on cluster systems with Message passing Interface) for random sample sequences is an application of the theory of random walk in R1, which derives an approximate probabilistic length upper bound for the common intersection of overlapping subsequences in an appropriate probabilistic setting for the random sequences. The length bound is incorporated in the algorithm to ensure the probabilistic satisfiability of the -locality/Max-independence, which admits concurrent and independent computations of all non-empty minimal maximum subsequences in hosting processors. We confirm the efficacy and efficiency of the practical parallel algorithm with a small-scale empirical study with synthetic random sample sequences from normal distributions with negative mean. Our work in progress includes a comparative empirical/probabilistic study based on current implementation and refining the algorithm to detect and resolve violations of -locality among near-neighbor processors.