The meaning of the word “gene,” and the concept behind it, changes as research advances. The word has a different history in each field, and its significance can vary with the researcher or the context in which it is used. Its usage has gone through historical transitions and fluctuations. Although we defined genes earlier in this book, the subject is important enough that we include here a review summarizing what we have learned about genes thus far.
Genes Are Factors That Control Traits
In the latter half of the 19th century, Mendel recognized from his experiments with peas that something was controlling the traits passed from parent to offspring. This was how the concept of genes was established. However, the material basis of genes was still unclear when Morgan created the chromosomal genetic map with Drosophila in the 1910s. It remained unclear even at the start of the 1940s, when Beadle proposed his “one gene–one enzyme” theory using Neurospora crassa, by which time biochemical genetics had developed from genetics.
Genes Are DNA
Avery’s transformation experiments and the experiments of Hershey and Chase confirmed that DNA is the material basis of genes, and in 1953, Watson and Crick presented a structural model for DNA. In all prokaryotes and eukaryotes, without exception, DNA is the substance that carries genes. However, some viruses, e.g., influenza virus, have no DNA and carry their genes on RNA instead.
Genes Are Parts of DNA That Encode Amino Acids
Advances in molecular biology using E. coli and phages led to the establishment of the central dogma, and it became common to regard genes as the parts of DNA that encode amino acids. Specifically, a gene in this sense begins with ATG and ends at (or before) a stop codon. It is often said that genes are DNA, but this means only that the material basis of genes is DNA. Not all of a DNA sequence consists of genes.
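The “begins with ATG and ends at a stop codon” definition can be sketched in a few lines of Python. The function name find_orf and the example sequence are hypothetical, chosen only for illustration. The sketch scans a single strand and ignores introns, the reverse strand, and alternative start codons, all of which real gene finding has to handle.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orf(dna):
    """Return (start, end) of the first ATG...stop open reading frame, or None.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Read codons in the frame set by this ATG until a stop codon appears.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop codon: try the next ATG further downstream.
        start = dna.find("ATG", start + 1)
    return None

# Hypothetical example: two flanking bases, one short ORF, two trailing bases.
example = "CCATGAAATTTGGGTAACC"
orf = find_orf(example)
```

Note that the example contains the triplet TGA immediately after the A of ATG, but it is out of frame and is therefore not read as a stop codon. Only the part of the sequence between the returned coordinates would count as a gene under this definition.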
Introns Are Also Included in Genes
With the discovery of splicing, eukaryotic genes were described as fragmented, because genetic information was believed to be present in the exons and not in the introns. However, it is now accepted that a eukaryotic gene comprises the whole transcribed region, exons and introns alike. Thus, a gene is considered to be the part of DNA corresponding to the region from the beginning to the end of the pre-mRNA, including the non-translated regions at both the 5′ and 3′ ends.
In prokaryotes, however, a polycistronic mRNA is a single transcription unit, yet this one mRNA contains the information of multiple genes. In this case, a gene represents only a part of DNA that encodes amino acids. Similarly, genes on monocistronic mRNA in prokaryotes are considered to be the parts of DNA that encode amino acids. Nevertheless, if polycistronic mRNA is treated as an exception, it is also often said that a prokaryotic gene is the entire transcribed range, including the non-translated regions at both the 5′ and 3′ ends. This is the same as the definition in eukaryotes.
Furthermore, complementary DNA (cDNA) is widely used because it has practically the same function as a gene and can be easily synthesized from mRNA. For this reason, when “cDNA is cloned,” it can also be said that “genes have been cloned” (see Column Selection 3 of Chapter 8). However, in eukaryotes, a cloned cDNA and a cloned gene are clearly different as DNA, because the cDNA lacks the introns.
Structural and Regulatory Genes
Typical genes that determine the primary structure of a protein are called structural genes. The term arose in contrast to the concept of regulatory genes, which modulate gene function. Historically, promoter and operator genes were examples of regulatory genes, because any part of DNA that fulfilled a specific function was considered a gene. At present, such parts are no longer called genes; they are called promoter and operator regions, and collectively regulatory regions. Even now, however, genes encoding proteins that regulate the expression of other genes are occasionally called regulatory genes. Genes for transcriptional regulatory factors are thus structural genes that, at the same time, function as regulatory genes.
When the One Gene–One Enzyme Theory Does Not Apply
In principle, one gene determines one type of protein. However, in a considerable number of cases, this one-to-one correspondence fails and multiple types of proteins with different functions are produced. For example, in prokaryotes and viruses, two different proteins can be synthesized from different reading frames on the same mRNA. In some cases, transcription starts at multiple transcription initiation points, producing multiple types of mRNA that are read in different reading frames. There may also be cases in which different types of mRNA are synthesized from roughly the same region of DNA, with each of the two strands serving as a template.
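To see how one nucleotide sequence can specify different proteins depending on the reading frame, consider the following minimal Python sketch. It uses the standard genetic code written in the DNA alphabet (T instead of U), to match the “ATG” notation used in this chapter. The function name translate_frame and the 12-base message are hypothetical, and the sketch translates from a fixed offset rather than searching for a start codon.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code keyed by DNA-alphabet codons; "*" marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_frame(seq, frame):
    """Translate seq from offset frame (0, 1, or 2) until a stop codon."""
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical message read in two different frames.
message = "ATGGCATGGTAA"
frame0 = translate_frame(message, 0)  # codons ATG GCA TGG TAA
frame1 = translate_frame(message, 1)  # codons TGG CAT GGT
```

Shifting the start of reading by a single base changes every codon, so the two frames yield unrelated amino acid sequences from the same stretch of nucleic acid.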
In addition, in eukaryotes, different types of mRNA and hence multiple types of proteins are often synthesized by selective (alternative) splicing of the same pre-mRNA. This is why the number of protein types is so large even though the number of human genes is comparatively small. Transcription can also begin at different transcription initiation points, giving pre-mRNAs with different numbers of exons, so that the resulting mRNAs occasionally produce multiple types of proteins. When polyadenylation signals occur at two sites in a pre-mRNA, a protein with a shortened C terminus is occasionally synthesized from the resulting short mRNA.
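The way selective splicing multiplies the number of products from one pre-mRNA can be illustrated with a small Python sketch. It assumes a deliberately simplified model in which certain exons (called cassette exons here) are each independently either kept or skipped, while all other exons are always kept. The function name splice_isoforms and the four exon sequences are hypothetical; real splicing is regulated and does not necessarily produce every combination.

```python
from itertools import product

def splice_isoforms(exons, cassette):
    """List every mRNA obtained by keeping or skipping each cassette exon.

    exons    -- exon sequences in transcript order
    cassette -- set of indices of exons that may be skipped
    """
    optional = sorted(cassette)
    isoforms = []
    for choice in product([True, False], repeat=len(optional)):
        skipped = {i for i, keep in zip(optional, choice) if not keep}
        # Join the retained exons in their original order.
        isoforms.append(
            "".join(e for i, e in enumerate(exons) if i not in skipped)
        )
    return isoforms

# Hypothetical gene with four exons; the two middle exons are optional.
pre_mrna_exons = ["ATGGCT", "GAAGAA", "CCTCCT", "TGGTAA"]
isoforms = splice_isoforms(pre_mrna_exons, cassette={1, 2})
```

With two optional exons, this model gives four distinct mRNAs from a single gene; with n independent optional exons it would give up to 2^n, which suggests how a modest gene count can support a much larger number of protein types.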
Furthermore, in some cases, a large completed protein is cleaved and then multiple cleaved proteins with specific functions are produced; these proteins are given specific names.
In these cases, even when the transcribed region is called a gene, the name of the gene cannot simply be used as the name of its proteins. There is no unified notation for relating the name of a gene product (the protein) to the name of the DNA region or gene that encodes it, and thus we have no choice but to devise a notation case by case to prevent misunderstandings.
Genes without Protein Information
The idea that genes are the parts of DNA that are transcribed is illustrated by the rRNA, tRNA, and snRNA genes. Their products are ncRNAs, which carry no structural information for proteins. The rRNA gene refers to the whole part of DNA that corresponds to the transcribed 45S rRNA; the more specific term 18S rRNA gene, however, refers to the part of DNA that corresponds to the 18S rRNA within the 45S rRNA. miRNA, another class of ncRNA, is believed to be mainly responsible for the phenomenon of RNAi in cells. The parts of DNA that serve as templates for miRNA are also ncRNA genes, and a very large number of different types of these genes are thought to exist.
Regulatory Regions Are Occasionally Included in Genes
When the entire DNA is divided into functionally important and unimportant parts, structural genes and the regions that regulate their transcription both count as functionally important. Genes are therefore sometimes considered to include transcription regulatory regions in addition to the structural gene portion. This view is meaningful in its own way. However, transcription regulatory regions in eukaryotes are often extremely large, and it is often impossible to delimit the entire related region accurately.
The Number of Genes
By the beginning of the 21st century, almost the entire human DNA sequence had been determined by the Human Genome Project. Efforts are in progress to determine the entire DNA sequence of many other organisms, and gene numbers are estimated from these data. In humans, the number of genes was first estimated to be greater than 30,000, but the estimate later dropped to 22,000. Advances in methods for predicting undefined genes then put the number at about 25,000, although this number could still change. As described in 5, it was recently found that small RNAs are transcribed from a considerable portion of the DNA in mice. If many of these small RNAs function as miRNA etc., then their template DNA regions should be considered genes, raising the possibility that the number of human genes exceeds several tens of thousands. This would overturn the conventional view that the types of ncRNA genes are very limited and that almost all genes carry protein information. Instead, the majority of eukaryotic genes may be ncRNA genes. The issue remains unresolved.
Revisiting the Genome
The genome is the genetic information required for the creation of an organism. DNA is the material basis of genes and holds all the information required for the creation of an organism. Prokaryotic cells and eukaryotic germ cells are haploid, with a single set of the genome, while many eukaryotic somatic cells are diploid, with two sets. At first glance this is simple to understand, but there are some contentious points. As the genetic information required for the creation of an organism, the human genome is defined as “22 chromosomes + X chromosome + Y chromosome” (the set of all genes, or all DNA, found within them). The same applies to other organisms that have sex chromosomes. On the other hand, germ cells are said to be haploid because they have one set of the genome. Yet the egg has 22 chromosomes and an X chromosome, while the sperm has 22 chromosomes and either an X or a Y chromosome. In both cases, one sex chromosome of the full set is missing. Somatic cells have two sets of the genome. However, female somatic cells do not include a Y chromosome; thus, one set is partially lacking. Likewise, male somatic cells have only one X and one Y chromosome, so males are not strictly diploid as far as the sex chromosomes are concerned. However, the matter is usually not pursued to this extent.
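The bookkeeping behind this argument can be checked with a short Python sketch that compares the chromosomes of several cell types against the “22 chromosomes + X chromosome + Y chromosome” definition. The names REFERENCE_GENOME and missing_from_reference are hypothetical, and chromosomes are represented simply as labels, with no sequence content.

```python
AUTOSOMES = {str(n) for n in range(1, 23)}   # chromosomes 1 to 22
REFERENCE_GENOME = AUTOSOMES | {"X", "Y"}    # 24 kinds of chromosome

def missing_from_reference(chromosomes):
    """Return the reference chromosomes absent from a cell's chromosome list."""
    return REFERENCE_GENOME - set(chromosomes)

# Chromosome lists for the cell types discussed in the text.
egg = sorted(AUTOSOMES) + ["X"]
sperm_y = sorted(AUTOSOMES) + ["Y"]                  # a Y-bearing sperm
female_somatic = sorted(AUTOSOMES) * 2 + ["X", "X"]
male_somatic = sorted(AUTOSOMES) * 2 + ["X", "Y"]

missing = {
    "egg": missing_from_reference(egg),
    "sperm_y": missing_from_reference(sperm_y),
    "female_somatic": missing_from_reference(female_somatic),
    "male_somatic": missing_from_reference(male_somatic),
}
```

Under these assumptions, only the male somatic cell contains every kind of chromosome in the reference definition, yet it holds a single copy each of X and Y, whereas the female somatic cell holds two copies of everything it has but no Y at all.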
Chromosomal and Extrachromosomal Genomes
The original cellular genome is called the chromosomal genome or chromosomal DNA. In eukaryotes it is also called the nuclear genome. The word chromosome originally referred to the characteristic aggregated structures of condensed chromatin (46 in humans) that appear during eukaryotic mitosis. Outside mitosis, these condensed structures are not formed, and the DNA is distributed as chromatin within the nucleus. Prokaryotes have no chromosomes in this original sense, but the phrases “chromosomal genome” and “chromosomal DNA” are used for prokaryotes and eukaryotes alike. In contrast, plasmid DNA in prokaryotic cells and the DNA of eukaryotic mitochondria and chloroplasts are treated separately as the extrachromosomal genome or extrachromosomal DNA. The organelle DNAs are also called the mitochondrial and chloroplast genomes. These are all treated separately because they have some elements of parasitic origin and because the amount contained per cell is not always constant.