Data-driven modelling of protein synthesis: A sequence perspective

Alexey Gritsenko

doi:10.4233/uuid:064f0a35-5d76-42e8-a1ad-3afb5916dd3c

Pattern Recognition and Bioinformatics

Research output: Thesis › Dissertation (TU Delft)

Abstract

Recent advances in DNA sequencing, synthesis and genetic engineering have made it possible to introduce DNA sequences of choice into living cells. This is an exciting prospect for industrial biotechnology, which aims to use microorganisms to produce foods, beverages, pharmaceuticals, and fine and bulk chemicals sustainably. Biotechnologists often achieve this by genetically engineering these microorganisms to introduce novel production pathways using genes found in other strains or species. However, a detailed understanding of gene expression regulation remains elusive, especially at the level of translation; as a result, we are still far from being able to write DNA that expresses proteins at user-specified levels.

Second-generation DNA sequencing technologies have made it easy and affordable to reconstruct the genomes of industrially relevant microbes, thus providing better reference sequences for genetic engineering. However, technological limitations allow only parts of a genome to be reconstructed unambiguously, so additional scaffolding steps are required to obtain genome-length reconstructions. We propose a method that improves genome scaffolding by integrating heterogeneous sources of information on genome contiguity. This method improves the quality of genome reconstructions at the cost of a limited number of additional errors.

The ease and affordability of DNA sequencing have also led to the development of a number of biological assays that exploit sequencing, including the ribosome profiling assay. This assay allows for unprecedented examination of protein synthesis by recording the positions of actively translating ribosomes across thousands of living cells. We used these data to develop data-driven models of Saccharomyces cerevisiae protein synthesis. A relatively simple model was used to re-design genes for heterologous expression; a second, more complex model yielded insights into the process of translation. Our models suggest that protein synthesis is limited at the stage of initiation, and that codon translation rates are not determined by tRNA levels alone but appear to depend on sequence context.

Finally, the combination of DNA synthesis and sequencing makes it possible to perform high-throughput in vivo assays that study the effect of user-designed sequences. We used this approach to study translation initiation at Internal Ribosome Entry Sites (IRESs). We identified short sequence elements predictive of IRES activity in viruses and humans, and obtained insights into the effect of element sequence, multiplicity and position on IRES activity. We propose a high-level architecture of viral and cellular IRESs, and offer a mechanistic explanation for differences between the IRES architectures of different virus types.

Original language	English
Qualification	Doctor of Philosophy
Awarding Institution	Delft University of Technology
Supervisors/Advisors	de Ridder, Dick, Supervisor; Reinders, M.J.T., Supervisor
Award date	22 Mar 2017
Print ISBNs	978-94-028-0559-8
DOIs	https://doi.org/10.4233/uuid:064f0a35-5d76-42e8-a1ad-3afb5916dd3c
Publication status	Published - 2017

Keywords

translation
protein synthesis
sequence modelling
sequence analysis
cap-independent translation

Access to Document

10.4233/uuid:064f0a35-5d76-42e8-a1ad-3afb5916dd3c

Dissertation Gritsenko: Final published version, 29.6 MB

Cite this

@phdthesis{064f0a355d7642e8a1ad3afb5916dd3c,

title = "Data-driven modelling of protein synthesis: A sequence perspective",

abstract = "Recent advances in DNA sequencing, synthesis and genetic engineering have enabled the introduction of choice DNA sequences into living cells. This is an exciting prospect for the field of industrial biotechnology, which aims at using microorganisms to produce foods, beverages, pharmaceuticals and fine- and bulk chemicals in a sustainable fashion. Biotechnologists often achieve this by genetically engineering these microorganisms to introduce novel production pathways using genes found in other strains or species. However, detailed understanding of gene expression regulation remains elusive, especially at the level of translation; thus, when it comes to writing DNA to express proteins at user-specified levels, we are still miles away. Second generation DNA sequencing technologies have made it easy and affordable to reconstruct the genomes of industrially relevant microbes, thus providing better reference sequences for genetic engineering. However, technological limitations allow for reconstructing only parts of the entire genomes unambiguously, thus requiring additional scaffolding steps to obtain genome-length reconstructions. We propose a method that improves genome scaffolding by integrating heterogeneous sources of information on genome contiguity. These methods improve the quality of genome reconstructions at the cost of a limited number of additional errors. The ease and affordability of DNA sequencing has also led to the development of a number of biological assays which exploit sequencing, among which the ribosome profiling assay. This assay allows for unprecedented examination of the process of protein synthesis by recording positions of actively translating ribosomes across thousands of living cells. We employed these data to develop data-driven models of Saccharomyces cerevisiae protein synthesis. A relatively simple model was used to re-design genes for heterologous expression; a second, more complex model yielded insights into the process of translation. Our models suggest that protein synthesis is limited at the stage of initiation, and that codon translation rates are not determined by tRNA levels alone, and appear to be sequence context-dependent. Finally, the combination of DNA synthesis and sequencing offers the possibility to perform high-throughput in vivo assays to study the effect of user-designed sequences. We used this approach to study translation initiation at Internal Ribosome Entry Sites (IRESs). We identified short sequence elements predictive of IRES activity in viruses and humans, and obtained insights into the effect of element sequence, multiplicity and position on IRES activity. We propose a high-level architecture of viral and cellular IRESs, and offer a mechanistic explanation for differences between IRES architectures of different virus types.",

keywords = "translation, protein synthesis, sequence modelling, sequence analysis, cap-independent translation",

author = "Alexey Gritsenko",

year = "2017",

doi = "10.4233/uuid:064f0a35-5d76-42e8-a1ad-3afb5916dd3c",

language = "English",

isbn = "978-94-028-0559-8",

type = "Dissertation (TU Delft)",

school = "Delft University of Technology",

}

TY - THES

T1 - Data-driven modelling of protein synthesis

T2 - A sequence perspective

AU - Gritsenko, Alexey

PY - 2017

Y1 - 2017

AB - Recent advances in DNA sequencing, synthesis and genetic engineering have enabled the introduction of choice DNA sequences into living cells. This is an exciting prospect for the field of industrial biotechnology, which aims at using microorganisms to produce foods, beverages, pharmaceuticals and fine- and bulk chemicals in a sustainable fashion. Biotechnologists often achieve this by genetically engineering these microorganisms to introduce novel production pathways using genes found in other strains or species. However, detailed understanding of gene expression regulation remains elusive, especially at the level of translation; thus, when it comes to writing DNA to express proteins at user-specified levels, we are still miles away. Second generation DNA sequencing technologies have made it easy and affordable to reconstruct the genomes of industrially relevant microbes, thus providing better reference sequences for genetic engineering. However, technological limitations allow for reconstructing only parts of the entire genomes unambiguously, thus requiring additional scaffolding steps to obtain genome-length reconstructions. We propose a method that improves genome scaffolding by integrating heterogeneous sources of information on genome contiguity. These methods improve the quality of genome reconstructions at the cost of a limited number of additional errors. The ease and affordability of DNA sequencing has also led to the development of a number of biological assays which exploit sequencing, among which the ribosome profiling assay. This assay allows for unprecedented examination of the process of protein synthesis by recording positions of actively translating ribosomes across thousands of living cells. We employed these data to develop data-driven models of Saccharomyces cerevisiae protein synthesis. A relatively simple model was used to re-design genes for heterologous expression; a second, more complex model yielded insights into the process of translation. Our models suggest that protein synthesis is limited at the stage of initiation, and that codon translation rates are not determined by tRNA levels alone, and appear to be sequence context-dependent. Finally, the combination of DNA synthesis and sequencing offers the possibility to perform high-throughput in vivo assays to study the effect of user-designed sequences. We used this approach to study translation initiation at Internal Ribosome Entry Sites (IRESs). We identified short sequence elements predictive of IRES activity in viruses and humans, and obtained insights into the effect of element sequence, multiplicity and position on IRES activity. We propose a high-level architecture of viral and cellular IRESs, and offer a mechanistic explanation for differences between IRES architectures of different virus types.

KW - translation

KW - protein synthesis

KW - sequence modelling

KW - sequence analysis

KW - cap-independent translation

UR - http://resolver.tudelft.nl/uuid:064f0a35-5d76-42e8-a1ad-3afb5916dd3c

U2 - 10.4233/uuid:064f0a35-5d76-42e8-a1ad-3afb5916dd3c

DO - 10.4233/uuid:064f0a35-5d76-42e8-a1ad-3afb5916dd3c

M3 - Dissertation (TU Delft)

SN - 978-94-028-0559-8

ER -

Unbiased quantitative models of protein translation derived from ribosome profiling data

Using predictive models to engineer biology: a case study in codon optimization

GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies
