8.1 Gene Transcription and Expression


Central Dogma

Fig. 8-1 Central dogma

The genetic information for a protein is the information that determines its primary structure (amino acid sequence); at the molecular level, it is the nucleotide sequence (base sequence) of DNA. The base sequence contained in DNA is copied into mRNA (messenger RNA; see Section 2 of this chapter), which is synthesized using DNA as a template. This process is called transcription. The protein is then synthesized according to the base sequence of the mRNA. Because one information language (the base sequence of mRNA) is converted into another (the amino acid sequence), this process is called translation. The central dogma of molecular biology is the concept that genetic information flows from DNA to mRNA and then to protein (Fig. 8-1). This principle is common to all organisms, prokaryotes and eukaryotes alike, from bacteria to humans.



Genetic Code

Fig. 8-2 Codon table

The genetic information contained in DNA consists of base sequences. The genetic code, however, is defined in terms of the base sequence of the mRNA transcribed from the DNA template: a sequence of three specific bases (a codon) corresponds to one amino acid. There are 64 possible codons (4³ = 64), 61 of which encode the 20 types of amino acids (Fig. 8-2). For example, the codon 5′-AUG-3′ on mRNA corresponds to the amino acid methionine in a protein (Fig. 8-3). A protein consisting of a chain of 400 amino acids is derived from a 1200-base sequence of DNA and thus a 1200-base sequence of mRNA (not counting the termination codon). In this case, the 1200-base stretch within the entire DNA molecule is the gene for this protein. In eukaryotes this information is contained in the exon regions, but a gene is also considered to include its introns (see Section 3 of Chapter 7, Fig. 7-4).
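The arithmetic in this paragraph can be checked with a few lines of Python. This is only an illustrative sketch; the names `BASES`, `codons`, and `coding_length` are chosen here for the example and do not come from the text.

```python
from itertools import product

BASES = "UCAG"  # the four bases found in mRNA

# Every ordered triplet of bases is one possible codon: 4 x 4 x 4 = 64.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))  # 64


def coding_length(n_amino_acids: int) -> int:
    """Number of bases needed to encode n amino acids (termination codon not counted)."""
    return 3 * n_amino_acids


print(coding_length(400))  # 1200
```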

AUG is the codon for methionine, and it also functions as the initiation codon in protein synthesis. After the three bases that specify the first amino acid, each successive set of three bases specifies the next amino acid (Fig. 8-3). There are three termination codons (Fig. 8-2), to which no amino acid corresponds; protein synthesis stops when a termination codon is reached. The region from the initiation codon to the termination codon is the coding region.
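The reading of a coding region described above can be sketched in Python as follows. The sketch assumes the standard genetic code, writes amino acids with one-letter symbols (`*` marks a termination codon), and makes the simplifying assumption that translation begins at the first AUG in the sequence. The function name `translate` and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
# One-letter amino acid symbols; "*" marks the three termination codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an mRNA (5' to 3') from the first AUG up to a termination codon."""
    start = mrna.find("AUG")  # simplification: the first AUG is the initiation codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination codon: no amino acid is added
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: 5' non-coding bases, AUG GCU UGG UAA, 3' non-coding bases.
print(translate("GCCAUGGCUUGGUAAGC"))  # MAW (Met-Ala-Trp)
```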

Fig. 8-3 Genes and genetic information



Sense Strand DNA

The strand of DNA complementary to the template strand for RNA synthesis is termed the sense strand (Fig. 8-3). The codons on the sense strand are almost the same as those on the mRNA: replacing each T on the sense strand with U gives the mRNA sequence. Thus, ATG on the sense strand corresponds to AUG on mRNA. Along double-stranded DNA, which of the two strands is the sense strand differs from gene to gene; it is not the same strand along the whole span of the DNA (Fig. 8-4A). An mRNA contains a coding region, which carries the amino acid information, flanked by a 5′ non-coding region at its 5′ end and a 3′ non-coding region at its 3′ end (Fig. 8-4B).
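The relationship between the sense strand, the template strand, and the mRNA can be illustrated with a short Python sketch. All sequences are written 5′ to 3′; the function names and the example sequence are hypothetical and serve only to show that transcribing the template strand gives the same result as replacing T with U on the sense strand.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def template_from_sense(sense: str) -> str:
    """Template strand (5' to 3'): the reverse complement of the sense strand."""
    return sense.translate(DNA_COMPLEMENT)[::-1]


def mrna_from_sense(sense: str) -> str:
    """mRNA has the same sequence as the sense strand, with U in place of T."""
    return sense.replace("T", "U")


def mrna_from_template(template: str) -> str:
    """mRNA is complementary and antiparallel to the template strand."""
    return template.translate(RNA_COMPLEMENT)[::-1]


sense = "ATGGCTTGGTAA"  # hypothetical sense-strand sequence
template = template_from_sense(sense)
print(template)                      # TTACCAAGCCAT
print(mrna_from_sense(sense))        # AUGGCUUGGUAA
print(mrna_from_template(template))  # AUGGCUUGGUAA
```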

Fig. 8-4 RNA transcription



Gene Expression

Gene expression is the synthesis of mRNA based on the genetic information of a gene, followed by the production of protein from that mRNA. For genes that encode non-coding RNA, however, expression means the production of the RNA itself. When a gene is in an inactive state, its expression is said to be suppressed.
