From genome to transcriptome: new layers of gene expression

by Dulguun Amgalan

The field of molecular biology, as its name implies, is the study of life at the molecular scale. Living things are made of cells, and each of these basic units of living matter is a complex molecular machine in which macromolecules such as nucleic acids, polysaccharides and proteins interact to sustain the metabolic processes we know as “life”. Although organisms are overwhelmingly diverse when observed from the outside, their fundamental molecular processes are remarkably uniform, and it is this constancy that gives molecular biologists the tools to understand how living things function.

One of the first things we learn as biology students is the “central dogma” of molecular biology: the concept that explains how our genetic information is used to produce the structural and functional components of our cells. In its simplest form, the central dogma can be outlined as follows:

DNA (gene) → RNA → protein.

Our genetic information, stored in the form of DNA, is first transcribed into messenger RNA (mRNA), a close molecular relative of DNA. The mRNA then relays the genetic information to ribosomes and directs the synthesis of proteins. To use a computer analogy: genes (encoded in DNA) are the software of a cell, containing all the information needed to build and run it, and proteins are the hardware that performs the actual physical work (e.g. energy production, cell migration, light emission). For many decades, genes and the proteins they encode were the focus of molecular biology research. After all, our genes and their expression as proteins define our appearance and behavior. For example, human skin color is largely determined by the amount of a natural pigment called melanin, which is produced by specialized cells in our skin. An enzyme called tyrosinase, encoded by the TYR gene, catalyzes the synthesis of melanin, so people with more tyrosinase activity have darker skin [1]. In some genetic disorders, mutations in the TYR gene produce defective tyrosinase, causing albinism. Observations like these, in which one or several genes determine what we are and how we look, led many scientists to hypothesize that higher organisms must have more “protein-coding genes” than lower organisms to account for their greater complexity. But how do we count the total number of genes in an organism?

Advances in DNA sequencing technology in the 1990s allowed scientists to map an organism’s entire genome: all of the genetic information encoded in its DNA. To date, the genomes of hundreds of organisms have been sequenced. One unexpected result to emerge from this growing library of genomes is that there is no direct relationship between the number of protein-coding genes and organismal complexity. Simple organisms such as the roundworm C. elegans have nearly as many genes as humans, and humans actually have fewer protein-coding genes than plants such as rice [2]. Interestingly, however, the proportion of the genome that does not code for proteins correlates strongly with developmental complexity, suggesting that these sequences, once dismissed as mostly “junk” DNA, may have crucial biological functions.

Surprisingly, while only about 2% of the human genome encodes proteins, more than 70% of it is transcribed into RNA, indicating that our cells make vast amounts of non-coding RNAs [3]. These include small RNAs, long non-coding RNAs (lncRNAs), pseudogene transcripts, and circular RNAs (circRNAs). These findings have moved RNA biology into the spotlight of post-genomic molecular biology. Over the past 15 years, a large body of research has documented regulatory, structural, and catalytic functions of the human transcriptome, the entire set of RNA transcribed from the genome.

In 2011, Dr. Pier Paolo Pandolfi’s group at Harvard Medical School advanced a radical proposal: the ceRNA hypothesis [4]. They postulated that all RNA transcripts communicate with and regulate each other through a new “RNA language”, acting as “competing endogenous RNAs” (ceRNAs). It is well known that microRNAs (miRNAs), a class of small non-coding RNAs, can bind to protein-coding RNAs at specific sequences called miRNA response elements (MREs). Once bound, the miRNA guides a protein complex, the RNA-induced silencing complex (RISC), that degrades the target RNA or blocks its translation, thereby repressing the protein expression of that gene. Interestingly, it has become evident that other RNA species (e.g. small RNAs, lncRNAs, pseudogene transcripts and circRNAs) also harbor MREs that can be targeted by miRNAs. This led to the hypothesis that all RNA transcripts containing MREs can compete for limited pools of miRNAs, thereby acting as competing endogenous RNAs that control each other’s expression.
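The competition for a limited miRNA pool can be made concrete with a toy titration model. The sketch below is an illustration under simplifying assumptions, not the model used by Salmena et al.: a single miRNA species, one MRE per transcript, the same binding affinity for every transcript (a hypothetical dissociation constant `kd`), arbitrary concentration units, and binding at equilibrium with no RNA synthesis or decay. The function names `free_mirna` and `free_target_fraction` are hypothetical and introduced here only for illustration.

```python
"""Toy equilibrium model of ceRNA competition for a shared miRNA pool.

Assumptions (illustrative only): one miRNA species, one MRE per transcript,
the same dissociation constant kd for every transcript, arbitrary units,
and no RNA synthesis or decay.
"""


def free_mirna(mirna_total, rna_totals, kd):
    """Solve m + sum_i R_i * m / (kd + m) = mirna_total for free miRNA m.

    The left-hand side increases monotonically with m, equals 0 at m = 0 and is
    at least mirna_total at m = mirna_total, so bisection on [0, mirna_total]
    converges to the unique root.
    """
    lo, hi = 0.0, float(mirna_total)
    for _ in range(200):
        mid = (lo + hi) / 2
        bound = sum(r * mid / (kd + mid) for r in rna_totals)
        if mid + bound > mirna_total:
            hi = mid  # more miRNA accounted for than exists: free pool is smaller
        else:
            lo = mid
    return (lo + hi) / 2


def free_target_fraction(mirna_total, target, competitors, kd=1.0):
    """Fraction of the target transcript's MREs left unbound by the miRNA."""
    m = free_mirna(mirna_total, [target, *competitors], kd)
    return kd / (kd + m)


if __name__ == "__main__":
    # Raise the level of a competing transcript (a "sponge") and watch the
    # target escape repression gradually rather than in an on/off step.
    for sponge in [0, 10, 20, 40, 80]:
        frac = free_target_fraction(mirna_total=20, target=10, competitors=[sponge])
        print(f"competitor={sponge:>3}  free target fraction={frac:.2f}")
```

As the competitor is added, the free fraction of the target climbs steadily, a graded rather than binary response. Real miRNA-target systems involve multiple MREs per transcript, differing affinities and RNA turnover, so this is only a qualitative picture.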

One of the first pieces of evidence for ceRNA-like regulation came from plants in 2007 [5]. In the thale cress Arabidopsis thaliana, a small weed of the mustard family and the first plant to have its genome sequenced, a gene called PHO2 encodes a protein that limits phosphate accumulation in shoots. A miRNA called miR-399 targets PHO2 RNA for cleavage, which increases shoot phosphate content. However, a non-coding RNA called IPS1 carries an MRE that miR-399 can bind but cannot cleave, so IPS1 sequesters the miRNA, a mechanism the authors termed “target mimicry”. Accordingly, overexpression of IPS1 reduced the availability of miR-399 to cleave PHO2 RNA, resulting in accumulation of PHO2 transcripts and a concomitant reduction in shoot phosphate content. This one-to-one communication between a coding and a non-coding RNA initially appeared to be an elegantly simple mechanism, but various ceRNA species can in fact engage in direct and indirect crosstalk, creating complex networks termed ceRNETs. In recent years, experimental evidence has extended the ceRNA hypothesis to diverse systems, from viruses to plants and higher vertebrates, suggesting that it may be a conserved and nearly universal mechanism of gene regulation.
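The difference between direct and indirect crosstalk in a ceRNET can be illustrated with a small graph sketch. Here two transcripts are treated as directly linked if they carry MREs for at least one shared miRNA, and transcripts connected only through intermediaries end up in the same network without a direct link. Apart from the PHO2, IPS1 and miR-399 entry taken from the example above, every transcript and miRNA name is hypothetical, and real prediction resources typically apply additional filters that this toy ignores.

```python
from collections import defaultdict
from itertools import combinations

# Which transcripts carry MREs for which miRNA. Only the miR-399 entry comes
# from the A. thaliana example; all other names are hypothetical.
MRE_MAP = {
    "miR-399": {"PHO2", "IPS1"},
    "miR-x": {"mRNA_A", "lncRNA_B"},
    "miR-y": {"lncRNA_B", "circRNA_C"},
    "miR-z": {"circRNA_C", "pseudogene_D"},
}


def direct_links(mre_map):
    """Map each pair of transcripts that share a miRNA to the shared miRNAs."""
    links = defaultdict(set)
    for mirna, rnas in mre_map.items():
        for a, b in combinations(sorted(rnas), 2):
            links[(a, b)].add(mirna)
    return dict(links)


def cernets(mre_map):
    """Group transcripts into connected components of the direct-link graph."""
    graph = defaultdict(set)
    for a, b in direct_links(mre_map):
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    for (a, b), shared in sorted(direct_links(MRE_MAP).items()):
        print(f"{a} <-> {b} via {sorted(shared)}")
    for component in cernets(MRE_MAP):
        print(sorted(component))
```

In this toy network, mRNA_A and circRNA_C share no miRNA, yet they fall into the same ceRNET through lncRNA_B, so a change in the level of one could in principle affect the other indirectly.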

The ceRNA hypothesis is highly innovative because it unifies the diverse populations of RNA transcripts, both coding and non-coding, under a common RNA language mediated by MREs and miRNAs, and defines a new layer of gene regulation. Importantly, it provides a foundation for assigning functions to a considerable portion of the transcriptome. From this perspective, protein expression is driven not simply by the production of mRNA, but by the layers of regulatory and competing RNAs that “decide” whether that mRNA can reach a ribosome and initiate protein synthesis. This allows any given gene to act not simply as a binary “on-off” switch but along an expression gradient, and this interpretation may help explain how a relatively small number of protein-coding genes can give rise to such phenotypically complex organisms. The framework has had a considerable impact on RNA biology, as reflected in several rapidly expanding databases, such as ceRDB [6] and starBase [7], that catalogue and predict ceRNA interactions. Intriguingly, the expression of non-coding genes, like that of coding genes, can be heavily altered in disease, and many genes involved in ceRNA interactions have been implicated in various cancers [8], underscoring the importance of the transcriptome in normal cellular function. This additional layer of gene regulation therefore has major implications for human physiology and pathophysiology, and may open new avenues for developing therapies for cancer and other diseases.


  1. Iwata, M. et al. J Invest Dermatol 95, 9-15 (1990).
  2. Taft, R.J. et al. Bioessays 29, 288-99 (2007).
  3. Djebali, S. et al. Nature 489, 101-8 (2012).
  4. Salmena, L. et al. Cell 146, 353-8 (2011).
  5. Franco-Zorrilla, J.M. et al. Nat Genet 39, 1033-7 (2007).
  6. Sarver, A.L. et al. Bioinformation 8, 731-3 (2012).
  7. Li, J.H. et al. Nucleic Acids Res 42, D92-7 (2014).
  8. Karreth, F.A. et al. Cancer Discov 3, 1113-21 (2013).

Dulguun was born and raised in Mongolia, a richly historic and rapidly developing country in Central Asia. Driven by her love of science, she graduated from the University of Tokyo with a B.S. in Biochemistry and is currently a Ph.D. candidate at Albert Einstein College of Medicine. Dulguun is conducting her thesis research on the fundamental mechanisms of cell death and its role in heart dysfunction. Her research involves powerful techniques of chemical, cellular and molecular biology, and she hopes that one day her work will lead to the development of drugs against heart disease, the most common cause of death worldwide. Dulguun has recently been awarded the American Heart Association (AHA) Predoctoral Fellowship for her innovative research.
