Pathway analysis
In bioinformatics research, pathway analysis software is used to identify related proteins within a pathway or to build a pathway de novo from the proteins of interest. This is helpful when studying differential expression of a gene in a disease or analyzing any OMICS dataset with a large number of proteins. By examining the changes in gene expression in a pathway, their biological causes can be explored. "Pathway" is a term from molecular biology that denotes an artificial, simplified model of a process within a cell or tissue. A typical pathway model starts with an extracellular signaling molecule that activates a specific protein, which in turn triggers a chain of protein-protein or protein-small molecule interactions.[1] Pathway analysis helps to interpret OMICS data from the point of view of canonical prior knowledge structured in the form of pathway diagrams. It allows finding distinct cellular processes, diseases or signaling pathways that are statistically associated with a selection of differentially expressed genes between two samples.[2] Pathway analysis is often used as a synonym for network analysis (functional enrichment analysis and gene set analysis).[3]
Uses
The data for pathway analysis come from high-throughput biology, including high-throughput sequencing data and microarray data. Before pathway analysis can be done, the OMICS data should be normalized, and genes should be ranked by differential expression, usually with the help of Student's t-test, ANOVA or other statistics. In general, any list of statistically ranked genes can be analyzed by pathway analysis. When no ranking is available, a plain list of genes can be analyzed instead. It is also possible to integrate multiple microarray data sets from different research groups by meta-analysis and cross-platform normalization.[4] By using pathway analysis software, researchers can determine which gene groups, such as pathways, cell processes or diseases, are enriched with genes that are over- or under-expressed in the experimental data. They can also infer associated upstream and downstream regulators, proteins, small molecules, drugs, etc.[5] For example, pathway analysis of several independent microarray experiments (meta-analysis) helped to discover potential biomarkers in a single pathway important for the fast-to-slow fiber type transition in Duchenne muscular dystrophy.[6] In another study, meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease that can be useful for monitoring the disease.[7]
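The ranking step can be illustrated with a minimal Python sketch. It uses Welch's variant of the t-statistic (unequal variances) rather than Student's pooled version, computes no p-values, and assumes the data are already normalized. The gene names, expression values and function names are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples.

    Assumes each group has at least two values and that the groups are
    not both constant (otherwise the standard error would be zero).
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression):
    """Sort genes by absolute t-statistic, most differential first.

    `expression` maps a gene name to (case_values, control_values).
    """
    scores = {gene: welch_t(case, ctrl) for gene, (case, ctrl) in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical normalized log2 expression values, invented for illustration.
toy_expression = {
    "GENE_A": ([8.1, 8.4, 8.0], [5.0, 5.2, 4.9]),
    "GENE_B": ([6.0, 6.3, 5.8], [6.1, 5.9, 6.2]),
    "GENE_C": ([3.0, 3.2, 2.9], [4.5, 4.4, 4.8]),
}
ranked = rank_genes(toy_expression)
for gene, t_stat in ranked:
    print(f"{gene}\t{t_stat:+.2f}")
```

The sign of each statistic indicates the direction of change (positive for higher expression in the case group), so the same ranked list can feed either a threshold-based or a rank-based analysis.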
Pathway databases
Pathway analysis requires knowledge bases containing pathway collections and interaction networks. The content, structure and functionality of pathway collections vary between sources. The most popular free public pathway collections include KEGG [8] and Reactome.[9] There are also commercial pathway collections, such as Pathway Studio pathways [10] and IPA pathways.[11]
Methods and software
Pathway analysis software can generally be divided into web-based applications, desktop programs and programming packages. Programming packages are mostly written in the R and Python languages and are shared openly through the BioConductor [12] and GitHub [13] projects. Methods of pathway analysis evolve quickly, so their classification is still open to debate.[14][15] According to one classification,[16] there are three main groups of methods in pathway analysis: ORA, FCS and PT.
Over-Representation Analysis or Enrichment Analysis (ORA)
This method measures the percentage of genes in a pathway or any other gene group (gene ontology (GO) groups, protein families, pathways) that are differentially expressed. The aim of ORA is to obtain a list of the most relevant pathways, ordered by p-value. The basic hypothesis in ORA is that relevant pathways can be identified by the number of differentially expressed genes they contain. The statistical significance of the overlap between the genes of a pathway and the list of differentially expressed genes is determined by statistical tests such as Fisher's exact test or the hypergeometric test; the overlap can also be summarized by a similarity measure such as the Jaccard index.
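The overlap test can be sketched in Python as follows. The sketch computes the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test, and applies no multiple-testing correction. The gene identifiers, pathway names and function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe_size, pathway_size, n_selected, overlap):
    """P(X >= overlap) when n_selected genes are drawn without replacement
    from a universe that contains pathway_size pathway members."""
    total = comb(universe_size, n_selected)
    upper = min(pathway_size, n_selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def ora(de_genes, pathways, universe):
    """Return (pathway, overlap, p-value) tuples sorted by p-value."""
    universe = set(universe)
    de = set(de_genes) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(de & members)
        p = hypergeom_pvalue(len(universe), len(members), len(de), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical toy data: 20 measured genes, two pathways of five genes each.
universe = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_X": ["g0", "g1", "g2", "g3", "g4"],
    "pathway_Y": ["g10", "g11", "g12", "g13", "g14"],
}
de_genes = ["g0", "g1", "g2", "g3", "g10"]

ora_results = ora(de_genes, pathways, universe)
for name, overlap, p in ora_results:
    print(f"{name}\toverlap={overlap}\tp={p:.4g}")
```

The choice of universe (here, all measured genes) affects the p-values directly, so it should match the set of genes that could have been detected in the experiment.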
Functional Class Scoring (FCS)
This method analyzes the expression changes of all genes in the experiment rather than only those that pass a significance cut-off, and thereby removes the threshold limitation of ORA. The aim of FCS is to compute enrichment scores for differentially expressed genes (see Gene set enrichment), using pathways as the gene sets. One of the first and most popular methods using the FCS approach is Gene Set Enrichment Analysis (GSEA).[17]
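A rank-based enrichment score of this kind can be sketched in Python. The sketch implements only an unweighted, Kolmogorov-Smirnov-like running sum; it omits the correlation weighting, permutation-based significance and normalization used in GSEA, and it assumes that every gene appears once in the ranked list. All gene names and the function name are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of an unweighted running sum.

    Walking down the ranked list, the sum rises by 1/n_hits at each gene in
    the set and falls by 1/n_miss at each gene outside it.
    """
    gene_set = set(gene_set) & set(ranked_genes)
    n_hits = len(gene_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hits if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most up-regulated gene first.
ranked_genes = [f"g{i}" for i in range(1, 11)]

es_top = enrichment_score(ranked_genes, {"g1", "g2", "g3"})
es_bottom = enrichment_score(ranked_genes, {"g8", "g9", "g10"})
es_mixed = enrichment_score(ranked_genes, {"g1", "g5", "g10"})
print(es_top, es_bottom, es_mixed)
```

A set concentrated at the top of the ranking yields a score near +1, a set concentrated at the bottom a score near -1, and a set scattered through the list a score closer to zero. No gene-level cut-off is involved at any point.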
Pathway Topology (PT)
Pathway topology analysis is essentially the same as FCS, except that PT computes gene-level statistics by integrating information from different databases.[18] The critical difference is that, by leveraging information about the role, position and direction of interactions from the pathway database, PT can re-score the significance of a pathway as the linkages change, whereas FCS will always provide the same score.[19]
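The dependence of the score on the wiring can be illustrated with a toy Python sketch. The propagation rule used here is an assumption chosen for illustration and is not the implementation of SPIA or of any published tool: each gene's perturbation is its own log fold change plus the signed perturbation of its upstream genes, divided by the number of downstream targets of each upstream gene. The pathway is assumed to be acyclic, and the gene names, values and function names are hypothetical.

```python
def perturbation_factors(log_fc, edges):
    """Propagate expression changes along a directed acyclic pathway.

    `edges` is a list of (upstream, downstream, beta) tuples, where beta is
    +1 for activation and -1 for inhibition. Cycles are not supported.
    """
    n_downstream = {}
    upstream_of = {}
    for src, dst, beta in edges:
        n_downstream[src] = n_downstream.get(src, 0) + 1
        upstream_of.setdefault(dst, []).append((src, beta))

    cache = {}

    def pf(gene):
        # Own change plus the share inherited from each upstream regulator.
        if gene not in cache:
            inherited = sum(
                beta * pf(src) / n_downstream[src]
                for src, beta in upstream_of.get(gene, [])
            )
            cache[gene] = log_fc.get(gene, 0.0) + inherited
        return cache[gene]

    genes = set(log_fc) | set(n_downstream) | set(upstream_of)
    return {gene: pf(gene) for gene in genes}


def pathway_score(log_fc, edges):
    """Toy pathway-level score: total absolute perturbation."""
    return sum(abs(v) for v in perturbation_factors(log_fc, edges).values())


# Hypothetical log fold changes for a receptor (R), kinase (K) and target (T).
log_fc = {"R": 2.0, "K": 0.5, "T": 0.0}

chain = [("R", "K", 1), ("K", "T", 1)]  # R activates K, K activates T
direct = [("R", "T", 1)]                # R activates T, K is unconnected

score_chain = pathway_score(log_fc, chain)
score_direct = pathway_score(log_fc, direct)
topology_blind = sum(abs(v) for v in log_fc.values())
print(score_chain, score_direct, topology_blind)
```

The expression data are identical in both cases, so a score that ignores the edges stays the same, while the topology-aware score differs between the two wirings.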
Notable companies
Several companies license software that performs a range of analytic methods on gene sets. Most free software solutions provide only links to online pathway collections, whereas commercial ones maintain their own collections. The choice of software depends on the user's skills, the cost and the time available for pathway analysis.[20] Ingenuity, for example, charges a fee for the use of its software, whereas some software, such as STRING or Cytoscape, is open-source. Ingenuity, however, maintains a knowledge base against which gene expression data can be compared.[21] Pathway Studio [22] is commercial software that allows users to search for biologically relevant facts, analyze experiments and create pathways. Pathway Studio Viewer [23] is a free resource from the same company for getting acquainted with the Pathway Studio interactive pathway collection and database. Only two commercial applications are known to offer pathway topology (PT) based analyses: PathwayGuide from Advaita Corporation and MetaCore from Thomson Reuters.[24] Advaita uses the peer-reviewed Signaling Pathway Impact Analysis (SPIA) method,[25][26] whereas the MetaCore method is unpublished.[27]
Limits
Missing annotations on cell types and conditions
Many current methods for pathway analysis depend on existing databases, and the data they contain are not always completely annotated. Many gene interactions in databases are relatively speculative because the underlying findings come from specific cell types or diseases. In addition, most canonical pathways are built from knowledge obtained in a limited number of experiments with narrow cell models. Therefore, results of pathway analysis of OMICS data obtained from different tissues should be interpreted with caution.[28]
References
- ↑ Berg JM, Tymoczko JL, Stryer L. Biochemistry, 5th edition, New York: W H Freeman; 2002
- ↑ García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6. doi:10.3389/fphys.2015.00383.
- ↑ GSEA
- ↑ Walsh, Christopher, Pingzhao Hu, Jane Batt, and Claudia Santos. 2015. "Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery." Microarrays 4 (3): 389–406. doi:10.3390/microarrays4030389
- ↑ Subramanian, Aravind, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, et al. 2005. "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles." Proceedings of the National Academy of Sciences of the United States of America 102 (43): 15545–50. doi:10.1073/pnas.0506580102
- ↑ Kotelnikova, Ekaterina, Maria A. Shkrob, Mikhail A. Pyatnitskiy, Alessandra Ferlini, and Nikolai Daraselia. 2012. "Novel Approach to Meta-Analysis of Microarray Datasets Reveals Muscle Remodeling-Related Drug Targets and Biomarkers in Duchenne Muscular Dystrophy." PLoS Computational Biology 8 (2): e1002365. doi:10.1371/journal.pcbi.1002365
- ↑ Santiago, Jose A., and Judith A. Potashkin. 2015. "Network-Based Metaanalysis Identifies HNF4A and PTBP1 as Longitudinally Dynamic Biomarkers for Parkinson's Disease." Proceedings of the National Academy of Sciences of the United States of America 112 (7): 2257–62. doi:10.1073/pnas.1423573112
- ↑ Ogata, H., S. Goto, K. Sato, W. Fujibuchi, H. Bono, and M. Kanehisa. 1999. "KEGG: Kyoto Encyclopedia of Genes and Genomes." Nucleic Acids Research 27 (1): 29–34
- ↑ Vastrik, Imre, Peter D’Eustachio, Esther Schmidt, Geeta Joshi-Tope, Gopal Gopinath, David Croft, Bernard de Bono, et al. 2007. "Reactome: A Knowledge Base of Biologic Pathways and Processes." Genome Biology 8 (3): R39. doi:10.1186/gb-2007-8-3-r39
- ↑ Pathway Studio Pathways
- ↑ Pathway Central
- ↑ Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5:R80. doi: 10.1186/gb-2004-5-10-r80
- ↑ Dabbish, L., Stuart, C., Tsay, J., and Herbsleb, J. (2012). "Social coding in github: transparency and collaboration in an open software repository," in Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (New York, NY: ACM), 1277–1286
- ↑ Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. Plos Comput Biol. 2012;8(2)
- ↑ Henderson-Maclennan NK, Papp JC, Talbot CC, McCabe ERB, Presson AP. Pathway analysis software: annotation errors and solutions. Mol Genet Metab. 2010 Nov;101(2–3):134–40
- ↑ Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. Plos Comput Biol. 2012;8(2)
- ↑ Subramanian, Aravind, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, et al. 2005. "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles." Proceedings of the National Academy of Sciences of the United States of America 102 (43): 15545–50. doi:10.1073/pnas.0506580102
- ↑ Emmert-Streib, F., and Dehmer, M. (2011). Networks for systems biology: conceptual connection of data and function. Syst. Biol. IET 5, 185–207. doi: 10.1049/iet-syb.2010.0025
- ↑ Khatri, Purvesh; Sirota, Marina; Butte, Atul J.; Ouzounis, Christos A. (23 February 2012). "Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges". PLoS Computational Biology. 8 (2): e1002375. doi:10.1371/journal.pcbi.1002375.
- ↑ García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6. doi:10.3389/fphys.2015.00383.
- ↑ "Ingenuity IPA - Integrate and Understand Complex 'omics Data." Ingenuity. Web. 8 Apr. 2015. <http://www.ingenuity.com/products/ipa#/?tab=features>.
- ↑ Pathway Studio
- ↑ Pathway Studio Viewer
- ↑ Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4. doi:10.3389/fphys.2013.00278.
- ↑ Draghici, S.; Khatri, P.; Tarca, A. L.; Amin, K.; Done, A.; Voichita, C.; Georgescu, C.; Romero, R. (4 September 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–1545. doi:10.1101/gr.6202607.
- ↑ Tarca, A. L.; Draghici, S.; Khatri, P.; Hassan, S. S.; Mittal, P.; Kim, J.-s.; Kim, C. J.; Kusanovic, J. P.; Romero, R. (5 November 2008). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577.
- ↑ Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4. doi:10.3389/fphys.2013.00278.
- ↑ Henderson-Maclennan, Nicole K., Jeanette C. Papp, C. Conover Talbot, Edward R. B. McCabe, and Angela P. Presson. "Pathway Analysis Software: Annotation Errors and Solutions." Molecular Genetics and Metabolism (2010): 134-40. PMC. Web. 8 Apr. 2015.