Background: Handling the vast amount of gene expression data generated by genome-wide transcriptional profiling techniques is a challenging task, demanding an informed combination of pre-processing, filtering and analysis methods if meaningful biological conclusions are to be drawn. For example, a range of traditional statistical and computational pathway analysis approaches have been used to identify over-represented processes in microarray data derived from various disease states. However, most of these approaches tend not to exploit the full spectrum of gene expression data, or the various relationships and dependencies. Previously, we described a pathway enrichment analysis tool created in MATLAB that yields a Pathway Regulation Score (PRS) by considering signalling pathway topology, and the overrepresentation and magnitude of differentially-expressed genes (J Comput Biol 19:563–573, 2012). Herein, we extended this approach to include metabolic pathways, and described the use of a graphical user interface (GUI).
Results: Using input from a variety of microarray platforms and species, users are able to calculate PRS scores, along with a corresponding z-score for comparison. Further pathway significance assessment may be performed to increase confidence in the pathways obtained, and users can view Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway diagrams marked-up to highlight impacted genes.
Conclusions: The PRS tool provides a filter in the isolation of biologically-relevant insights from complex transcriptomic data.
Increasingly, high-throughput transcriptional profiling techniques (microarrays or, increasingly, RNAseq) inform modern life-science research. Such techniques provide a molecular “camera” taking genome-wide “snapshots” of genetic activity. However, the effective analysis of microarray data presents a number of challenges, in particular handling the large number of genes that are studied simultaneously.
Analysing gene expression in the context of curated knowledge, or “knowledge base-driven pathway analysis”, is critical as this guides the reduction in search space from many thousands of genes to an subset of biological processes, which are much more tractable to human interpretation . According to Khatri et al , pathway enrichment approaches can be divided into three generations:
i. Over-representation Analysis (ORA): This scores a pathway by considering the proportion of differentiallyexpressed genes (DEGs) observed in each pathway relative to the proportion of all microarray DEGs. This is used by several pathway analysis tools, including GenMAPP , GoMiner , Onto-Express  and FatiGo .
ii. Functional Class Scoring (FCS): FCS gives a score to each gene in a pathway based on its expression, from which a pathway-score is calculated based on the scores of all the genes in the pathway. A number of FCS methods have been implemented through standalone tools such as GSEA , SigPathway , and SAFE , or web tools such as T-profiler , Gazer  and GeneTrail .
iii. Pathway Topology (PT)-based approaches: These approaches exploit the topology of pathways by giving weights to pre-defined connections between genes, which inform pathway scoring. Several topology-based approaches have been described in the literature over the past few years. According to Mitrea et al , PT-based approaches differ in the way they translate pathway topology information into a pathway score. Some methods use only the topology data of differentially-expressed genes (DEGs) in the enrichment score (for example MetaCore  and EnrichNet ), whereas others (including SPIA  and GANPA ) use expression data of DEGs along with the topology data. Alternatively, some methods use expression data derived from all microarray genes, whether they change between conditions or not, for example PathOlogist , DEGraph , and ACST . Importantly, some PT-based tools use only signalling pathway descriptions, such as Pathway-Express , NetGSA , ScorePAGE , TAPPA  MetPA , and Clipper .
Previously, we proposed a new pathway enrichment method, in which both pathway topology and the magnitude of gene expression changes informed the creation of a Pathway Regulation Score (PRS) . Specifically, by combining fold-change data for those transcripts exceeding a significance threshold, and by taking into account the potential of altered gene expression to impact upon downstream transcription, we identified those pathways most relevant to the pathophysiological process under investigation. Our approach addressed a number of issues that potentially compromise enrichment methods. We took steps to mitigate the influence of errors in ID mapping, and to reduce the bias introduced by highlyredundant pathways (i.e. multiple instances of the same gene). Topology methods also have to handle loops effectively, so we used a search algorithm derived from graph theory to resolve this problem. We also felt that arbitrarily dividing processes into either up- or downregulated was artificial as changes in gene expression are likely to be distributed throughout pathways, thus ours was an overall impact assessment.
Herein, we described the implementation of our PRS approach as a standalone tool that provides end users with the option of importing data from different microarray platforms and species. The tool yields both PRS and z-scores, provides statistical analysis, and allows browsing of pathways with impacted genes highlighted in different colours. An enhancement from our original report is that users are able to enrich both signalling and metabolic pathways.
The PRS approach was implemented in MATLAB. Users without access to the MATLAB environment can download the MATLAB Runtime Compiler (MRC) in order to deploy the software described herein, via a user-friendly GUI. The PRS interface (Figure 1) provides users with several functions:
Preprocessing microarray data
We did not re-engineer a filter to normalise data from a variety of platforms, rather users must first preprocess transcriptomic data using one of the myriad existing tools.