24.1 What is the Genome?

The term “genome” was coined by adding “ome,” a suffix that means “total,” to “gene.”*1 Since the age of classical genetics, the term “gene” has been used to refer to something that is passed on to future generations to convey an organism’s traits, and it is now known that this role is actually carried out by DNA (or its sequence). DNA exists not only in the chromosomes in the nucleus but also in certain subcellular organelles outside the nucleus, such as mitochondria and chloroplasts. Bacteria are also known to have circular DNA (plasmid DNA) that can replicate autonomously, apart from chromosomal DNA. Furthermore, DNA includes both sequences that specify the amino acid sequences of proteins and sequences that clearly do not code for proteins, e.g., introns, repeated sequences, and transcriptional regulatory regions that regulate gene expression. On this basis, the term “genome” can be defined as “a set of genetic information required for maintaining an organism of a species in the complete state.”*2 It is thus appropriate to understand life phenomena from the perspective of how genetic information is expressed along temporal and spatial axes.

As with the term “gene,” the meaning of “genome” and “genetic information” also changes with time and purpose. In eukaryotes, it is gradually becoming clear that not only DNA sequence information but also the state of the proteins and RNA associated with that DNA affects gene expression. Consequently, genetic information in the nucleus comprises a complex of information carried by nucleic acids such as DNA and RNA and by associated proteins (or their state of modification).

*1 Likewise, the term transcriptome was created from transcript, and proteome from protein.
*2 The genome is defined as “a set of genetic information required for maintaining an organism of a species in the complete state.” In organisms with a diploid nuclear phase, for example, all chromosomal DNA was conventionally thought to make up the genome. In the Human Genome Project, however, the genome was defined as one complete set of specific intranuclear sequences, expressed as 22 autosomes plus the X and Y chromosomes. Cells generally contain multiple copies of mitochondrial and chloroplast DNA, but the terms mitochondrial genome and chloroplast genome each refer to a single set of the DNA information in the mitochondria and chloroplasts, respectively.



Genome sequencing

Efforts to decode the DNA sequences of genes have been made since the 1970s. In the latter half of the 1970s, a method for determining DNA sequences was established (see the Column below), which gradually helped clarify the base sequences of genes. Then, from the 1990s onward, projects attempting to decode the entire genomes of organisms were launched, the most ambitious of which was the Human Genome Project, a grand plan to sequence all of the approximately 3 billion base pairs of the human genome.


Frederick Sanger: A genius who contributed enormously to life science research

Genome sequencing is essentially the process of decoding, one by one, the numerous pieces of DNA that make up an organism’s genome. This was made possible by technological innovations that established methods for determining DNA base sequences. Today, the most widely applied of these is the method developed by Frederick Sanger. To recap the differences between DNA and RNA discussed in the previous chapter: among the five-carbon sugars, also called pentoses, ribose has a hydroxyl group at both the 2′ and 3′ positions, while deoxyribose has a hydroxyl group only at the 3′ position. Removing this 3′ hydroxyl group from deoxyribose yields a compound called dideoxyribose. When running a DNA elongation reaction in a test tube, Sanger intentionally mixed dideoxynucleotides into the deoxynucleotides used for the reaction at a set ratio. The 3′ hydroxyl group is essential for forming the phosphate bond with the next nucleotide, so once a dideoxynucleotide is incorporated into a growing chain, elongation cannot proceed any further. This is why mixing the two compounds at an appropriate ratio is important. Because incorporation of a dideoxynucleotide is purely a matter of probability, elongation stops in some chains (those that happened to incorporate a dideoxynucleotide at one point or another) and continues further in others (those that did not), producing chains of varied lengths. After the elongation reaction is completed, high-resolution polyacrylamide gel electrophoresis (see Selection 5 of Appendix, Figure 5) separates these fragments by length. By then reading the terminal base of each fragment in order of length, the base sequence of the DNA can be determined. In recognition of this work, Sanger was awarded the Nobel Prize in Chemistry in 1980*3. In fact, Sanger had already won the same prize in 1958.
His first Nobel Prize was awarded for determining the amino acid sequence of insulin, the hormone famous for its role in lowering blood sugar levels. Here too, the unique method he devised for amino acid sequence determination was key to winning the prize. Moreover, in the 1960s, Sanger also established a method for determining the base sequence of RNA. This method, which breaks down RNA with RNA-degrading enzymes, efficiently sorts the resulting fragments, and assembles them like a jigsaw puzzle, is also an accomplishment well worthy of the Nobel Prize. Sanger thus developed, all by himself, methods for decoding the information in the three most important biomolecules: DNA, RNA, and protein. With the discovery of reverse transcriptase (see Column Selection 2 of Chapter 8), it has become very easy to synthesize DNA artificially from RNA by reverse transcription and then determine its base sequence. For this reason, Sanger’s method of decoding base sequences directly from RNA is hardly used now, but his achievements will never fade.

*3 Walter Gilbert also received the Nobel Prize in Chemistry in the same year for a different method for determining base sequences in nucleic acids.

Column Figure 24-1 Outline of Sanger’s method

A) Schematic diagram of nucleotides, deoxynucleotides, and dideoxynucleotides
B) DNA fragments of various lengths are produced by starting the DNA elongation reaction with a small amount of dideoxynucleotide added. The fragments are then separated according to length, and the results are analyzed to read the DNA sequence.
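The chain-termination principle described in this Column can be sketched as a small simulation. This is a minimal Python sketch under simplifying assumptions, not a model of any real laboratory protocol: the template is written in the order it is copied, each fragment is assumed to carry a label identifying its terminal dideoxynucleotide (the role played in practice by four separate reactions or by fluorescent labels), and the names `sanger_fragments` and `read_sequence`, as well as the parameter values, are hypothetical.

```python
import random

# Base pairing used to build the newly synthesized (complementary) strand
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template, ddntp_fraction=0.1, n_molecules=2000, seed=0):
    """Simulate dideoxy chain termination on a single-stranded template.

    `template` is written in the order it is copied (3' to 5'), so the new
    strand grows from left to right. At each step a dideoxynucleotide is
    incorporated with probability `ddntp_fraction` and halts elongation.
    Returns the set of (length, terminal_base) pairs of terminated chains.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    fragments = set()
    for _ in range(n_molecules):
        for i, base in enumerate(template):
            if rng.random() < ddntp_fraction:
                # A ddNTP lacks the 3'-OH, so no further nucleotide can attach
                fragments.add((i + 1, COMPLEMENT[base]))
                break
    # Chains that never incorporated a ddNTP run to full length and are ignored
    return fragments


def read_sequence(fragments):
    """Order fragments by length, as gel electrophoresis does, and read
    the terminal base of each one, shortest first (5' to 3')."""
    return "".join(base for _, base in sorted(fragments))


# Hypothetical 12-base template
template = "TACGGATCCGTA"
print(read_sequence(sanger_fragments(template)))
```

Sorting by length stands in for electrophoresis. Because every chain stops at a random position, many molecules are simulated so that each position is almost certainly represented by at least one terminated fragment.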

Meanwhile, owing to advances in DNA sequencing techniques and in the performance of the computers used for analysis, the time required for genome sequencing has been shortened remarkably. Today, the genomes of various species are being sequenced on a daily basis. The Human Genome Project was declared complete in 2003, when 99% of the human genome had been sequenced. In recent years, progress has reached the point that the entire genome of an individual human can be sequenced in a short period*4.

*4 The initial cost of the Human Genome Project was said to be 100 million dollars. In contrast, sequencing the genome of James Watson, co-discoverer of the double-helix structure of DNA, in 2007 cost only about one million dollars and took only 2 months. Both the time and the cost of genome sequencing have thus been reduced sharply.



Classification of species based on the genome

As already discussed in Chapters 1 and 7, prokaryotes and eukaryotes differ greatly in structure, for example in the presence or absence of a nuclear membrane and of organelles. A comparison of their genomes also clearly shows these differences (Figure 24-1).

■The genome of prokaryotes
The genome of prokaryotes comprises chromosomes and plasmid DNA. Both have circular double-helix structures*5*6.

■The genome of eukaryotes
The genome of eukaryotes comprises the intranuclear chromosomes and the unique DNA sequences in mitochondria and plastids (e.g., chloroplasts). Unlike in prokaryotes, most of the genome consists of non-coding regions that do not contain protein information. In humans, the protein-coding regions account for only 1.3% of the genome.

The genomes of mitochondria and plastids (such as chloroplasts) have a circular structure, similar to those of prokaryotes. Some eukaryotes also carry plasmids; yeast, for example, has a plasmid that is circular like those of prokaryotes.

*5 About 90% of the prokaryotic genome sequence consists of protein-coding regions that specify the amino acid sequences of proteins.
*6 A few prokaryotes have linear chromosomes.
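The contrast between the roughly 90% coding content of prokaryotic genomes and the 1.3% of the human genome is simply the ratio of protein-coding positions to genome length. The hypothetical Python helper below computes it for a circular genome, assuming gene coordinates are given as 0-based, half-open intervals and that a gene whose end precedes its start wraps around the origin; the example coordinates are made up for illustration.

```python
def coding_fraction(genome_length, genes):
    """Return the fraction of a circular genome covered by coding regions.

    `genes` holds (start, end) pairs, 0-based and half-open. On a circular
    genome an interval with end < start wraps past the origin, so it covers
    start..genome_length-1 and then 0..end-1. Overlaps are counted once.
    """
    covered = [False] * genome_length
    for start, end in genes:
        if end < start:
            # Gene spans the origin of the circular chromosome
            positions = list(range(start, genome_length)) + list(range(end))
        else:
            positions = range(start, end)
        for p in positions:
            covered[p] = True
    return sum(covered) / genome_length


# Made-up example: a 100-bp circular "genome" with three genes,
# two of which overlap and one of which wraps around the origin
print(coding_fraction(100, [(0, 40), (30, 60), (95, 5)]))
```

Marking positions in a boolean array keeps overlapping genes from being double-counted; for very large genomes, merging sorted intervals would use less memory.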

Figure 24-1 Illustration of genomes of prokaryotes and eukaryotes

A) Prokaryotes, as well as the mitochondria and plastids of eukaryotes, have circular genomes. B) The chromosomal genome of eukaryotes is linear, with repetitive base sequences at each end of the chain (in mammals, the 6-base sequence TTAGGG is repeated). This repetitive sequence is called a telomere.
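Counting the telomeric repeat units at the end of a chromosome is a simple string operation. The sketch below assumes the G-rich strand is written 5′→3′, so the TTAGGG repeats sit at the right-hand end of the string; `count_terminal_repeats` is a hypothetical helper, and it counts only complete repeat units flush with the end.

```python
def count_terminal_repeats(seq, unit="TTAGGG"):
    """Count consecutive complete copies of `unit` at the 3' end of `seq`.

    The strand is assumed to be written 5' to 3', so the telomeric repeats
    of this strand sit at the right-hand end of the string.
    """
    count = 0
    end = len(seq)
    # Step leftward one repeat unit at a time while the unit still matches
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


# Hypothetical chromosome end: a short internal sequence plus 3 repeats
print(count_terminal_repeats("GATTACA" + "TTAGGG" * 3))
```

A trailing partial repeat (for example, a final "TTA") stops the count at zero in this simple version; handling such ends would require allowing an incomplete last unit.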

By contrast, the nuclear genome of eukaryotic cells, which forms the chromosomes, consists of multiple linear DNA molecules. Its structure is more complex than that of the prokaryotic genome. Its features are described below.

■Nucleosome structure
As described in Chapter 10, the nuclear genome of eukaryotes forms structures called nucleosomes, in which DNA is wrapped around basic proteins called histones. Changes in nucleosome structure caused by differences in histone modification are known to affect the expression of genes in the DNA wrapped around those histones.

■Exon-intron structure
As described in Chapter 8, eukaryotic genes have an exon-intron structure; during gene expression, the introns are removed from the transcript and the exons are joined together in a process called splicing.
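At the sequence level, splicing amounts to cutting out the introns and joining the exons in order. Below is a minimal Python sketch, assuming exon boundaries are already known and given as 0-based, half-open transcript coordinates; the function name and example sequences are hypothetical, and the recognition of splice sites by the spliceosome is not modeled.

```python
def splice(pre_mrna, exons):
    """Join exon segments of a pre-mRNA in order, discarding the introns.

    `exons` is a list of (start, end) positions, 0-based and half-open,
    in transcript coordinates.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Illustrative sequences; the intron starts with GU and ends with AG,
# as most introns do
exon1, intron, exon2 = "AUGGCC", "GUAAGUAG", "UUCUAA"
pre_mrna = exon1 + intron + exon2
print(splice(pre_mrna, [(0, 6), (14, 20)]))
```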

Why introns exist in the eukaryotic genome at all (that is, whether they are necessary) is a fundamental question that is difficult to answer. At present, the advantages of having introns are thought to be as follows. When new genes are created by genetic recombination, recombination that takes place within introns can combine exons into a new protein with novel functions. This process is called exon shuffling and is considered an important element of evolution (see Column Selection 3 of Chapter 7).

In addition, varying which exons are joined during splicing increases the number of protein types that can be translated from a single gene (Figure 24-2). This phenomenon, called selective (alternative) splicing, allows approximately 100,000 diverse proteins to be created from the approximately 25,000 genes encoded in the human genome.
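The combinatorial effect of alternative splicing can be illustrated with exon skipping alone: if the first and last exons are always kept and each internal exon may be included or skipped, a gene with n internal exons can yield up to 2^n mature mRNAs. The Python sketch below enumerates them. It is a simplification, since real alternative splicing also uses other patterns (such as alternative splice sites and mutually exclusive exons) and cells do not produce every possible combination; the function name and exon labels are hypothetical.

```python
from itertools import combinations


def splice_variants(exons):
    """Enumerate mature mRNAs produced by exon skipping.

    `exons` lists exon sequences in genomic order (at least two). The first
    and last exons are always kept; every subset of the internal exons is
    included in turn, and exon order is always preserved.
    """
    first, *middle, last = exons
    variants = []
    for k in range(len(middle) + 1):
        # combinations() yields subsets in their original positional order
        for chosen in combinations(middle, k):
            variants.append(first + "".join(chosen) + last)
    return variants


print(splice_variants(["E1", "E2", "E3", "E4"]))
```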

■Telomere structure
The telomere is a region of repetitive nucleotide sequences at each end of the linear nuclear DNA molecules (see Column Selection 4 of Chapter 7). Because of the way DNA replication works, telomeres grow shorter each time the cell divides. Normally, cells eventually stop dividing when their telomeres become too short. Cancer cells, however, either have lost their telomeres, which results in chromosome abnormalities, or, conversely, have telomeres that are replenished endlessly by overactive telomerase (an enzyme that elongates telomeres).
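The progressive shortening described above, and how telomerase can offset it, can be sketched as a simple counting loop. All lengths are hypothetical inputs in base pairs chosen for illustration rather than measured values, real telomere loss per division varies, and `divisions_until_arrest` is a hypothetical helper.

```python
def divisions_until_arrest(initial_length, loss_per_division, critical_length,
                           telomerase_gain=0, max_divisions=10_000):
    """Count cell divisions until telomeres fall below a critical length.

    All lengths are hypothetical, in base pairs. Each division removes
    `loss_per_division` bp and telomerase adds back `telomerase_gain` bp.
    Returns None if the cell is still dividing after `max_divisions`.
    """
    length = initial_length
    for division in range(1, max_divisions + 1):
        length = length - loss_per_division + telomerase_gain
        if length < critical_length:
            return division  # telomeres too short: division stops here
    return None


print(divisions_until_arrest(10_000, 100, 5_000))
print(divisions_until_arrest(10_000, 100, 5_000, telomerase_gain=100))
```

With these made-up numbers the first call returns 51, while the second, where telomerase restores the 100 bp lost at each division, returns None, mirroring the endless replenishment seen in some cancer cells.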

■Higher-order structure of the genome and gene expression
As mentioned earlier, nucleosome structure affects gene expression, which is one reason gene expression differs among the various types of differentiated cells. Multicellular organisms have various tissues and organs, which are composed of differentiated cells specific to each tissue or organ. However, almost all the cells of an organism (with a few exceptions) carry the same genome information (i.e., DNA sequence). Proteins surrounding the genome regulate the expression of genes according to the specific pattern of each cell’s differentiation. This shows that, in addition to the DNA sequence, the higher-order structures surrounding the genome (i.e., the epigenome, discussed later) are vital to gene expression.

Figure 24-2 Actions of intron in the genome

A) The genome of eukaryotes (particularly multicellular eukaryotes) contains numerous introns. When genetic recombination takes place within an intron, not only are amino acid sequences left undamaged, but genes with new functions may also be created. B) In eukaryotes, RNA is transcribed with the introns included, and the precursor mRNA (pre-mRNA) becomes mature mRNA through splicing. In this process, slightly different proteins are produced from the same gene by selectively joining exons together. The example here shows 6 types of mRNA (A to F), expressed in different tissues or at different times, produced by alternative splicing. The squares in the figure represent exons, the straight lines represent introns, and the bent lines indicate the parts removed by splicing.



Organelles and Genome

Several eukaryotic organelles, such as mitochondria and chloroplasts, have long been known to carry their own DNA. Decoding and analyzing the DNA sequences of these organelles yielded surprising results. The mitochondrial genome was found to be most homologous to the genome of Rickettsia, a parasitic bacterium that grows inside animal cells. Likewise, the chloroplast genome was found to be closest to the genome of cyanobacteria, which are photosynthetic bacteria. There are two hypotheses regarding the origin of these organelles. One is the autogenous theory, which proposes that they formed and evolved within the cell itself; the other is the endosymbiotic theory, which proposes that they descended from parasitic cells that were incorporated into a host cell. As studies of mitochondrial and chloroplast genomes progress, the results appear to strongly support the endosymbiotic theory, which holds that these organelles originated from parasites living in host cells.
