Introduction
Multicellular eukaryotic organisms are composed of many distinct cell types, each of which possesses an identical nuclear genome. However, each cell type (e.g., liver, kidney, neuron) has a unique morphology and function. This is possible because every cell type transcribes a distinct set of genes. Genes that are transcribed within a given cell type are said to be expressed, and the set of all the RNAs transcribed is known as the transcriptome. The transcriptome includes all types of RNA, including messenger RNA (mRNA) and functional RNA molecules such as ribosomal RNA (rRNA). Expression of the correct set of genes at the correct time is necessary for proper organismal development, cellular function, and maintenance of homeostasis. Hence, a gene can show different levels of expression at different developmental stages or in different tissues. For example, a gene might be expressed at high levels in embryos but not in adults. Transcription is therefore a highly regulated process in eukaryotes.
The TSS Modules aim to familiarize students with the process of defining the core promoter region for each isoform of a gene and identifying its transcription start site or sites. This primer therefore reviews some biological concepts that may be helpful to revisit before working through the four TSS Modules: gene transcription, mRNA processing, promoter structure, and chromatin.
Review of Gene Transcription
Transcription is the biological process by which a template strand of genomic DNA is used to generate a single-stranded RNA molecule, the transcript. Transcription begins at a position in the genomic DNA called the transcription start site (TSS). The TSS is found within a region of the genomic DNA known as the core promoter. The core promoter is bound by RNA Polymerase II (RNA Pol II), the enzyme that transcribes protein-coding genes.
It is RNA Pol II that synthesizes the RNA transcript by incorporating ribonucleotides that are complementary to the bases found in the template strand of genomic DNA. RNA Pol II does not act alone, however. There are cellular proteins known as transcription factors that are critical in recruiting the RNA Pol II enzyme and controlling the level of transcription. Figure 1 shows a basic overview of transcription, with special emphasis on RNA Pol II and the core promoter region.
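The base-pairing rule described above can be illustrated with a minimal Python sketch. The function name `transcribe` and the example sequence are hypothetical, chosen only for illustration, and the sketch assumes the template strand is written 3' to 5' so that the RNA can be read 5' to 3' from left to right.

```python
# RNA base incorporated opposite each base of the DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


template = "TACGGCATT"        # hypothetical template strand, written 3'->5'
print(transcribe(template))   # AUGCCGUAA
```

Note that the resulting RNA has the same sequence as the non-template (coding) DNA strand, with U in place of T.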
mRNA Processing
Transcripts that can be translated into proteins are called messenger RNAs (mRNAs) and are processed in three ways prior to export to the cytoplasm where translation occurs. The three mRNA processing events are:
- 5' Cap: A 7-methylguanosine is added to the 5' end of the RNA transcript.
- Splicing: Intronic regions are removed via splicing. Conserved nucleotide sequences found in the genomic DNA and the corresponding transcript, including the splice donor (GT in the DNA, GU in the transcript) and splice acceptor (AG) sites, are critical for faithful removal of introns by the splicing machinery (the spliceosome). Alternative patterns of splicing can result in different gene isoforms.
- Poly-A Tail: A string of non-templated adenosines (~30-200) is added to the 3' end of the transcript.
Processing of the transcript in these three ways results in a completed or "mature" mRNA molecule. Figure 2 below summarizes eukaryotic gene structure from genomic DNA to mature transcript.
On Figure 2, use brackets and labels to indicate the following parts of the final processed transcript: 5' untranslated region (5' UTR), coding sequence (CDS) and 3' untranslated region (3' UTR).
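As a rough illustration of the splicing and poly-A steps, the Python sketch below removes introns from a toy pre-mRNA and appends a tail. The function `process_pre_mrna`, its parameters, and the sequences are hypothetical; intron coordinates are assumed to be 0-based and end-exclusive, and the GU...AG check is a simplification of how real splice sites are recognized.

```python
def process_pre_mrna(pre_mrna, introns, tail_length=50):
    """Splice out introns and append a poly-A tail.

    introns: (start, end) intervals in pre_mrna, 0-based and end-exclusive.
    The 5' cap is a chemical modification rather than a templated base,
    so it is not represented in the returned sequence.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Introns are expected to begin with GU (donor) and end with AG (acceptor).
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])
        cursor = end
    exons.append(pre_mrna[cursor:])
    # Join the exons, then add the non-templated adenosines at the 3' end.
    return "".join(exons) + "A" * tail_length


# Toy transcript: two exons separated by one intron (all sequences are made up).
exon1, intron1, exon2 = "AUGGCC", "GUAAGUCCAG", "UUUUAA"
pre_mrna = exon1 + intron1 + exon2
intron_interval = (len(exon1), len(exon1) + len(intron1))
mature = process_pre_mrna(pre_mrna, [intron_interval], tail_length=30)
print(mature)
```

Passing a different set of intron intervals for the same pre-mRNA yields a different mature sequence, which is one way to picture how alternative splicing produces distinct isoforms.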
The Core Promoter and Promoter Structure
The DNA sequence surrounding the first transcribed exon of a gene isoform is required for recruitment of RNA Pol II to the genomic DNA and is known as the core promoter (Figure 1). The core promoter for a given isoform consists of a unique complement of specific nucleotide sequences referred to as DNA motifs. Two such motifs, known as the TATA box and the Inr motif, are shown in Figure 1 and Figure 2 to help you visualize the position of these nucleotide sequences relative to the start of transcription. The DNA sequences that make up these motifs are critical because transcription factor proteins bind directly to them and help to recruit and properly position RNA Pol II on the genomic DNA (Figure 1). The nucleotide position on the genomic DNA that corresponds to the start of transcription, the transcription start site, is designated as position +1, and all other core promoter motif positions are designated relative to the +1 TSS position. By convention, the nucleotide immediately upstream of the TSS is position -1; there is no position 0. Core promoters in Drosophila typically encompass the region from -40 to +40 nt relative to the +1 TSS position [Vo Ngoc et al., 2019]. Once RNA Pol II has been recruited and properly positioned at the TSS on the genomic DNA, the process of transcription initiation and elongation can begin.
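The +1 numbering scheme can be made concrete with a short Python sketch. The function names, the example coordinate, and the use of 1-based inclusive genomic coordinates are assumptions made for illustration; the sketch also follows the usual convention that the base just upstream of +1 is -1, with no position 0.

```python
def relative_to_tss(genomic_pos: int, tss_pos: int, strand: str = "+") -> int:
    """Convert a genomic coordinate to a TSS-relative position (TSS = +1, no 0)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand, transcription runs toward smaller coordinates.
    offset = genomic_pos - tss_pos if strand == "+" else tss_pos - genomic_pos
    return offset + 1 if offset >= 0 else offset


def core_promoter_window(tss_pos: int, strand: str = "+",
                         upstream: int = 40, downstream: int = 40):
    """Return inclusive genomic (start, end) spanning -upstream to +downstream."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "+":
        return tss_pos - upstream, tss_pos + downstream - 1
    return tss_pos - downstream + 1, tss_pos + upstream


tss = 1000  # hypothetical genomic coordinate of the TSS
print(relative_to_tss(1000, tss))       # 1
print(relative_to_tss(999, tss))        # -1
print(core_promoter_window(tss, "+"))   # (960, 1039)
print(core_promoter_window(tss, "-"))   # (961, 1040)
```

Because position 0 is skipped, the -40 to +40 window covers 80 nucleotides rather than 81.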
Core promoters can be classified into one of three major categories: Peaked, Broad, or Intermediate (Figure 3). Classification is based on the number and distribution of the TSSs utilized by the RNA polymerase. This is readily seen in Figure 3, where the locations of the transcription start sites for each promoter type are shown as arrows and peaks. The size of the arrows and the height of the peaks correspond to the levels of transcripts generated from each TSS: larger arrows and higher peaks indicate higher levels of gene expression. "Promoter shape" refers to the distribution of the TSSs for a given promoter. Peaked promoters have a narrow shape because they possess only a single TSS. Broad promoters have a broad shape because multiple TSSs within the promoter are utilized by RNA Pol II. Given that peaked promoters have a single TSS, transcription from peaked promoters is sometimes referred to as focused transcription, whereas transcription from broad promoters is referred to as dispersed transcription. Genes that exhibit tissue- or development-specific expression often have peaked promoters, while broad promoters are often associated with genes that are expressed ubiquitously.
The classification of core promoters is rarely cut and dried. Recent analysis has shown that most promoters in the fruit fly Drosophila melanogaster have multiple TSSs and that their promoter type may be classified as "Intermediate" [Hoskins et al., 2011]. Intermediate promoters have multiple TSSs, but some TSSs are used at higher frequencies than others (Figure 3, bottom panel).
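To make the idea of promoter shape more tangible, the Python sketch below assigns a shape from the share of transcripts that start at the most-used TSS. This is a simplified heuristic for illustration only and is not the method used by Hoskins et al.; the function name `classify_promoter`, the two cutoff values, and the example counts are all hypothetical.

```python
def classify_promoter(tss_counts: dict[int, int],
                      peaked_fraction: float = 0.9,
                      broad_fraction: float = 0.5) -> str:
    """Classify promoter shape from per-position TSS usage counts.

    tss_counts maps a TSS position to the number of transcripts starting there.
    The cutoffs are arbitrary illustrative values, not published thresholds.
    """
    total = sum(tss_counts.values())
    if total == 0:
        raise ValueError("no TSS usage data")
    # Fraction of all transcripts that initiate at the single most-used position.
    top_share = max(tss_counts.values()) / total
    if top_share >= peaked_fraction:
        return "Peaked"
    if top_share < broad_fraction:
        return "Broad"
    return "Intermediate"


print(classify_promoter({100: 95, 103: 5}))                      # Peaked
print(classify_promoter({100: 20, 105: 25, 110: 30, 120: 25}))   # Broad
print(classify_promoter({100: 60, 104: 15, 110: 15, 115: 10}))   # Intermediate
```

A single summary number like this ignores how far apart the TSSs are, so it should be read as a teaching aid rather than an annotation tool.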
Review of Chromatin Packaging
It is important to remember that the process of transcription occurs on DNA that is packaged into chromatin. Chromatin refers to the complex of DNA, histone proteins and non-histone proteins that interact with each other to form an intact chromosome within the cell nucleus.
Within chromatin, DNA is wound around octamers of histone proteins to form nucleosomes. The amino acids of histone proteins can be chemically modified, which allows the DNA to be wound more tightly or more loosely around the histone octamer. Areas of the genome in which the DNA is packaged comparatively loosely are referred to as euchromatin (Figure 4). Areas in which the DNA is packaged tightly around the histone octamer are referred to as heterochromatin. Euchromatic regions of the genome are generally transcriptionally active, whereas heterochromatic regions are comparatively transcriptionally inactive.
In the TSS Modules you will see that identifying DNA motifs and characterizing chromatin packaging in a particular area of the genome can provide evidence to assist in TSS annotation.
Bibliography
- [Hoskins et al., 2011] Hoskins, R. A. et al. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res. 21, 182–192 (2011).
- [Vo Ngoc et al., 2019] Vo Ngoc, L., Kassavetis, G. A., & Kadonaga, J. T. The RNA Polymerase II Core Promoter in Drosophila. Genetics 212, 13–24 (2019).