Through integration of genomic data from multiple sources, we may obtain a more accurate and complete picture of the molecular mechanisms underlying tumorigenesis. We discuss the integration of DNA copy number and mRNA gene expression data from an observational integrative genomics study involving cancer patients. The two molecular levels involved are linked through the central dogma of molecular biology. DNA copy number aberrations abound in the cancer cell. Here we investigate how these aberrations affect gene expression levels within a pathway using observational integrative genomics data of cancer patients. In particular, we aim to identify differential edges between regulatory networks of two groups involving these molecular levels. Motivated by the rate equations, the regulatory mechanism between DNA copy number aberrations and gene expression levels within a pathway is modeled by a simultaneous-equations model, for the one- and two-group case. The latter facilitates the identification of differential interactions between the two groups. Model parameters are estimated by penalized least squares using the lasso (L1) penalty to obtain a sparse pathway topology. Simulations show that the inclusion of DNA copy number data benefits the discovery of gene-gene interactions. In addition, the simulations reveal that cis-effects tend to be over-estimated in a univariate (single gene) analysis. In the application to real data from integrative oncogenomic studies we show that inclusion of prior information on the regulatory network architecture benefits the reproducibility of all edges. Furthermore, analyses of the TP53 and TGFb signaling pathways between ER+ and ER- samples from an integrative genomics breast cancer study identify reproducible differential regulatory patterns that corroborate with existing literature.
|Statistical Applications in Genetics and Molecular Biology
|Published - 2014