Computational analysis of copy number profiles of tumors

Ewald Van Dyk

Research output: Thesis, Dissertation (TU Delft)

Abstract

Cancer is a genetic disease. The activation, alteration or deactivation of cancer genes can stimulate undesirable cell proliferation. Cancer genes can be subdivided into oncogenes and tumor suppressors. Oncogenes, such as growth factor receptors, are altered and/or overexpressed genes that are causally linked to tumorigenesis. Tumor suppressors, by contrast, are typically underexpressed or deleted in tumors, since they would otherwise serve a protective role.
There are two main genetic mechanisms that can activate or deactivate cancer genes: mutations and DNA copy number alterations. In this work, we focus on detecting novel cancer genes using somatic DNA copy number data. The philosophy is simple: if independently acquired somatic amplifications or deletions occur frequently across multiple tumor samples, they are likely to harbor oncogenes or tumor suppressors, respectively. With a single tumor DNA copy number profile, it is not possible to know which copy number alterations activate or deactivate cancer genes, since many of the alterations (referred to as passenger aberrations) occur due to genomic instability and do not necessarily provide a selective advantage for cancerous cells. However, when aggregating across many samples, we expect cancer genes to be amplified or deleted more frequently than expected by chance, which allows us to detect them.
This application can be regarded as a peak calling problem. We aggregate (sum) copy number profiles across many tumors and call peaks that are significantly high. To do this, we define a null model that describes the behavior of an aggregate copy number profile that would arise if only passenger aberrations occurred. The null aggregate profile (also called the noise profile) exhibits high autocorrelation across the genome due to the segmented nature of copy number profiles.
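As a minimal sketch of the aggregation step described above, the following Python code sums toy copy number profiles and compares the aggregate with a permutation-based noise profile. The circular-shift null, the max-statistic threshold, the function names and the simulated data are all assumptions made for illustration; the abstract does not specify the null model used in the thesis.

```python
import numpy as np


def simulate_profiles(n_samples, n_probes, driver, rng):
    """Toy gain profiles (samples x probes): random passenger segments in every
    sample, plus a shared driver gain in every second sample."""
    profiles = np.zeros((n_samples, n_probes))
    for i in range(n_samples):
        for _ in range(3):  # three passenger gains at random positions
            start = rng.integers(0, n_probes - 50)
            width = rng.integers(10, 50)
            profiles[i, start:start + width] += 1.0
        if i % 2 == 0:  # recurrent (driver) gain
            profiles[i, driver[0]:driver[1]] += 1.0
    return profiles


def aggregate(profiles):
    """Sum the per-sample profiles into one aggregate profile."""
    return np.asarray(profiles, dtype=float).sum(axis=0)


def null_maxima(profiles, n_perm, rng):
    """Maximum of the aggregate after circularly shifting each sample by a
    random offset (hypothetical null). Shifting keeps each sample's segments
    intact but breaks their alignment across samples."""
    profiles = np.asarray(profiles, dtype=float)
    n_probes = profiles.shape[1]
    maxima = np.empty(n_perm)
    for p in range(n_perm):
        total = np.zeros(n_probes)
        for row in profiles:
            total += np.roll(row, rng.integers(0, n_probes))
        maxima[p] = total.max()
    return maxima


def call_peaks(profiles, n_perm=200, alpha=0.05, seed=0):
    """Mask of probes whose aggregate exceeds the (1 - alpha) quantile of the
    null maxima, together with that threshold."""
    rng = np.random.default_rng(seed)
    threshold = np.quantile(null_maxima(profiles, n_perm, rng), 1 - alpha)
    return aggregate(profiles) > threshold, threshold


rng = np.random.default_rng(1)
profiles = simulate_profiles(40, 1000, driver=(400, 420), rng=rng)
called, threshold = call_peaks(profiles)
print(f"threshold={threshold:.1f}, probes called={int(called.sum())}")
```

Because each sample is shifted as a whole, the simulated noise profile keeps the segment-induced autocorrelation that the paragraph above attributes to the null aggregate profile.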
We therefore developed a statistical framework for calling peaks of varying widths when the noise profile exhibits strong autocorrelation. The framework allows us to detect such peaks with high statistical power while controlling the false discovery rate of detected peaks. We employ two concepts. First, we take advantage of the fact that broad peaks can be detected with much higher statistical power when the profile is smoothed, and we developed techniques for adaptive smoothing. Second, we use a powerful statistic called the expected Euler characteristic, which is insensitive to platform resolution, is directly compatible with our smoothing methodology, and can be used directly to estimate the expected number of false positive peaks called.
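The two concepts can be illustrated with a small sketch, under stated assumptions. In one dimension, the Euler characteristic of the set of probes at or above a threshold is simply the number of connected runs above that threshold, so its average over null profiles estimates the expected number of false positive peaks. The code below estimates that expectation empirically from simulated white noise, which stands in for the analytical expectation developed in the thesis; the Gaussian kernel, the fixed smoothing width, the function names and the toy data are illustrative assumptions.

```python
import numpy as np


def gaussian_smooth(profile, sigma):
    """Smooth a 1-D profile with a truncated, normalized Gaussian kernel
    (sigma in probes). Edges are zero-padded by np.convolve."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(profile, kernel, mode="same")


def euler_characteristic(profile, threshold):
    """Number of connected runs of probes with profile >= threshold, which is
    the Euler characteristic of the excursion set in 1-D."""
    above = np.asarray(profile) >= threshold
    if above.size == 0:
        return 0
    run_starts = above[1:] & ~above[:-1]
    return int(above[0]) + int(run_starts.sum())


def threshold_for_expected_ec(null_profiles, target, grid):
    """Smallest threshold in `grid` whose mean Euler characteristic over the
    null profiles is <= target. Requires target < 1, so that thresholds low
    enough to put a whole profile in one run (EC = 1) are rejected."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    for u in np.sort(grid):
        mean_ec = np.mean([euler_characteristic(p, u) for p in null_profiles])
        if mean_ec <= target:
            return float(u)
    return float(np.max(grid))


rng = np.random.default_rng(0)
n_probes, sigma = 2000, 30.0

# Empirical null: smoothed white noise (an assumption, not the thesis's null).
null = [gaussian_smooth(rng.normal(size=n_probes), sigma) for _ in range(200)]
threshold = threshold_for_expected_ec(null, target=0.05,
                                      grid=np.linspace(0.0, 1.0, 101))

# Observed profile: the same noise model plus one broad, low-amplitude peak.
observed = rng.normal(size=n_probes)
observed[900:1100] += 1.0
smoothed = gaussian_smooth(observed, sigma)
print(f"threshold={threshold:.2f}, "
      f"peaks called={euler_characteristic(smoothed, threshold)}")
```

The peak has an amplitude of one noise standard deviation, so it is hard to see in the raw profile, but smoothing averages the noise down while leaving the broad peak nearly intact. Repeating the null simulation for several values of `sigma` would give one threshold per smoothing scale.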
This framework does not rely directly on the inherent properties of DNA copy number profiles and can therefore be applied to many other problems, given suitably defined null models. Although the mathematics we develop in this framework might be taxing at times, the resulting equations that are ultimately used in our peak calling algorithms are simple, and their validity can easily be verified by simulating data and comparing our theoretical expectations with measured observations.
Original language: English
Awarding Institution
  • Delft University of Technology
Supervisors/Advisors
  • Wessels, L.F.A., Supervisor
  • Reinders, M.J.T., Supervisor
Award date: 9 Jan 2019
Print ISBNs: 978-94-6384-003-3
DOIs
Publication status: Published - 2019

Keywords

  • copy number profile
  • segmentation
  • recurrent aberrations
  • recurrent copy number breaks
  • oncogene
  • tumor suppressor
  • driver gene
  • scale space
  • Euler characteristic
