We combined contemporary statistical methods, modern programming languages, and high-performance computing with traditional Cox proportional hazards (PH) regression survival analysis to create a workflow for finding genes associated with survival outcomes. The workflow explores a high-dimensional dataset to identify the most robust and influential set of correlated genes. It was tested on transcriptomic data from a cohort of cholangiocarcinoma (CCA) patients from The Cancer Genome Atlas (TCGA) and externally validated on the International Cancer Genome Consortium (ICGC) CCA dataset. The workflow was also applied to a larger cancer genomic dataset, breast cancer (BRCA) from the TCGA project, by training on 700 cases and validating on the remaining 350 TCGA BRCA cases, as well as on the 2509 BRCA cases characterized in the METABRIC study.
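To make the Cox PH building block concrete, the following is a minimal sketch of how a single gene's association with survival could be estimated. It is not the project's actual workflow: the function name `cox_univariate_beta`, the Breslow-style partial likelihood, the Newton-Raphson fit, and the toy values are all assumptions chosen for illustration, and the numbers are synthetic rather than drawn from TCGA, ICGC, or METABRIC.

```python
import math


def cox_univariate_beta(times, events, x, n_iter=25, tol=1e-9):
    """Estimate the log hazard ratio for one covariate (hypothetical helper).

    Maximizes the Cox partial likelihood (Breslow form) by Newton-Raphson.
    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    x      : covariate value per patient (e.g. one gene's expression)
    """
    beta = 0.0
    n = len(times)
    for _ in range(n_iter):
        grad = 0.0  # first derivative of the log partial likelihood
        info = 0.0  # negative second derivative (observed information)
        for i in range(n):
            if not events[i]:
                continue  # censored patients only contribute to risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:  # patient j is still at risk
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            grad += x[i] - mean
            info += s2 / s0 - mean * mean
        if info <= 0.0:
            break  # no information left to take a Newton step
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Synthetic toy cohort (assumed values): higher expression tends to go
# with shorter survival, but not perfectly, so the estimate stays finite.
toy_times = [2, 3, 5, 7, 8, 11, 13, 17]
toy_events = [1, 1, 1, 0, 1, 1, 0, 1]
toy_expr = [2.0, 1.5, 0.5, 1.8, 1.0, 0.2, 0.9, 0.4]

beta_hat = cox_univariate_beta(toy_times, toy_events, toy_expr)
hazard_ratio = math.exp(beta_hat)  # > 1 suggests higher expression, higher risk
```

In a workflow of the kind described, a per-gene estimate like this would presumably be computed on the training cases only and then checked on the held-out and external cohorts; an established survival-analysis library would be a more appropriate choice than this hand-rolled fit at genomic scale.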
Principal Investigator: Dr. Oliver Bathe
Co-Principal Investigator: Dr. Karen Kopciuk