What is it about?
Cancer cells harbor molecular alterations at all levels of information processing. These alterations are interrelated and affect clinical traits in complicated ways. We infer the associations among these molecular and clinical features and construct an Integrated Hierarchical Association Structure (IHAS) from The Cancer Genome Atlas (TCGA) data. IHAS makes a unique contribution to cancer omics because it (1) represents complicated associations in a hierarchical structure and presents views at varying levels of detail, from a single gene in a specific cancer type to groups of genes across multiple cancer types; (2) performs both vertical (across multiple types of assays) and horizontal (across multiple cancer types) data integration; (3) incorporates a large-scale biological knowledge base in the model; and (4) validates the inferred associations in over 300 external datasets. In the long term, IHAS can illuminate the universal and idiosyncratic aspects of cancer omics data, give new insights into diagnosis, and guide targeted cancer therapies for precision medicine.
Why is it important?
Cancer cells harbor molecular alterations at all levels of information processing in the central dogma. To comprehensively chart the landscape of these alterations, The Cancer Genome Atlas (TCGA) provides seven types of omics data and rich clinical information on over 11,000 patients across 33 cancer types. Aberrations in genomes and epigenomes likely modulate transcriptomic variations, which in turn affect clinical and molecular phenotypes. While many existing studies exploit this causal chain to mine useful information, relatively few integrate all types of omics data probing the same cohorts (vertical integration). Even fewer simultaneously combine data from multiple omics measurements and multiple cancer types (horizontal integration). To our knowledge, no prior study has established associations among genomic/epigenomic alterations, transcriptomic variations, and clinical/molecular phenotypes for all omics data in all TCGA cancer types, organized these associations in a hierarchical structure, and validated them in a wide range of external datasets. To this end, we develop a data integration framework that infers associations between alterations in genomes/epigenomes (effectors) and variations in transcriptomes (targets) from TCGA data, and we term the resulting set of associations the Integrated Hierarchical Association Structure (IHAS). IHAS offers unique value in terms of scope, resource, perspective, method, and biological findings. We establish the effector-target associations for all types of omics data in all cancer types, compile a large database of multi-omics associations in cancers, and make it publicly accessible. The hierarchical structure of these associations depicts the complex systems of cancers at varying levels of detail. One can view associations pertaining to individual targets, individual effectors, or groups of closely related targets, both within individual cancer types and across multiple cancer types.
This hierarchical view is a novel perspective unexplored by prior studies. Furthermore, we combine vertical (multi-omics) and horizontal (multi-cancer-type) data integration and incorporate knowledge of biomolecular interactions and pathways into the statistical model used to infer the associations. We are also one of the first teams to validate inference outcomes in numerous external datasets comparable in scope and size to the TCGA data, including multi-omics data of tumors and cancer cell lines, 294 cancer transcriptomics datasets from GEO, responses of cancer cell lines to drug treatments and gene perturbations, and transcriptomic and epigenomic data of normal tissues. More important are the biological findings and insights drawn from the inferred associations. Almost all associations across cancer types impact target genes belonging to 18 Gene Groups. Half of these Gene Groups are further reduced to three Meta Gene Groups enriched with (1) immune and inflammatory responses, (2) embryonic development and neurogenesis, and (3) cell cycle process and DNA repair. The combinatorial expressions of IHAS subunits account for more than 80% of the molecular and clinical phenotypes reported in TCGA, such as PAM50 subtypes in breast cancer, CMS subtypes in colon cancer, and sample purity and stemness across cancer types. This limited number of combinatorial expression patterns is mis-regulated by diverse alterations in genomes and epigenomes from multiple cancer types. Validations on external datasets indicate that (1) associations may arise from the intrinsic properties of cancer cells or from the interactions between cancer cells and their microenvironment, (2) IHAS contains critical information about the perturbation responses of cancer cells, and (3) IHAS also carries tissue-specific signatures in normal tissues. While most of these findings were previously noted, they have not been woven into a systematic framework.
Perspectives
To my knowledge, this is one of the most comprehensive characterizations relating molecular alterations to transcriptomic variations in cancers. IHAS not only provides a detailed catalog of these associations but also points to general principles governing them. Furthermore, the data analysis framework we propose to infer IHAS is applicable to much broader contexts and problems.
Chen-Hsiang Yeang
Read the Original
This page is a summary of: An integrated analysis of the cancer genome atlas data discovers a hierarchical association structure across thirty three cancer types, PLOS Digital Health, December 2022, PLOS,
DOI: 10.1371/journal.pdig.0000151.