24.5 Genome Sequencing and Future Biology

With the successful sequencing of the genomes of various organisms, including humans, it is becoming increasingly possible to elucidate the entire picture not only of DNA but also of the RNA and proteins made from this information. Just as the discovery of the Rosetta Stone set off a competition among researchers to decipher the text on the stone, researchers today are working intensively to decipher the information obtained from genome sequences. In eukaryotic cells, it is gradually being revealed that genetic information is contained not only in DNA sequences but also in modifications of DNA and histones. The decoding of this epigenome information, which is progressing rapidly, is expected to lead to a comprehensive understanding of life.

24.5.1 Genetic differences between individuals: SNP and CNV

In an actual genome project, the genome of one individual of a species is sequenced first, and the differences between individuals of that species are then examined. In humans, comparing the sequences of two unrelated persons shows a difference of about one base in every 1,000 to 2,000 bases. An international comparison of the sequences of many people from different ethnic groups revealed polymorphisms at about 10 million sites in the whole genome of 3 billion base pairs. Such a single-base difference in the DNA sequence is called a single nucleotide polymorphism (SNP). Some SNPs lie in regions that code for the amino acids of proteins, but most are found in non-coding sequences, including those that regulate gene activity.
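The figures in this paragraph imply some simple arithmetic, which the following Python sketch works through. The function names and the idea of comparing two already-aligned sequences position by position are illustrative assumptions, not the method used by any particular genome project.

```python
GENOME_SIZE = 3_000_000_000  # base pairs in the human genome, as stated in the text


def expected_differences(genome_size, bases_per_difference):
    """Expected number of single-base differences between two genomes,
    given one difference per `bases_per_difference` bases."""
    return genome_size // bases_per_difference


def snp_positions(seq_a, seq_b):
    """Return the 0-based positions at which two aligned sequences differ.

    Assumes the sequences are already aligned and of equal length
    (no insertions or deletions), which is a simplification.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# One difference per 1,000 to 2,000 bases over 3 billion base pairs:
low = expected_differences(GENOME_SIZE, 2000)
high = expected_differences(GENOME_SIZE, 1000)
print(f"Two unrelated people differ at roughly {low:,} to {high:,} sites")

# 10 million known polymorphic sites correspond to about one per 300 bases:
print(f"One polymorphic site per {GENOME_SIZE // 10_000_000} bases")
```

Under these assumptions, two unrelated genomes differ at a few million positions, while the catalogue of 10 million sites describes variation across the whole population rather than between any one pair of people.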

Furthermore, variation is also known to occur in the copy number of genes. Because humans carry chromosomes in pairs, the general understanding has been that each gene is present in two copies. However, it is gradually being revealed that long regions of several thousand to several million bases, including genes, are duplicated or deleted in the genome, so that the number of copies of a gene differs among individuals. This form of variation is called copy number variation (CNV). Such variation is far more widespread than expected, affecting more than 12% of the human genome, and is increasingly believed to be related to individual differences in genes and in the incidence of diseases.

These genetic differences at the individual level show that the genome of the human species contains a wide range of genetic variation, which generates diversity within the species and serves as a driving force for its continued survival.

24.5.2 Transcriptome

Figure 24-8 Example of a transcriptome

This shows the transcriptome analysis of cancer and non-cancerous regions of the surgically resected liver. The expression level of mRNA was compared between the two regions to identify the genes expressed at a higher level in the cancerous region (AFP, glypican3).

With technological progress, it has become possible to extract all the RNA molecules from a given cell or individual and determine their sequences. This comprehensive information on RNA sequences is called the transcriptome (meaning the totality of transcripts). In multicellular organisms, there are not only housekeeping genes that are expressed in all cells, but also groups of genes expressed specifically in certain cell types, so each cell type has a characteristic transcriptome. By analyzing the transcriptome, it is possible to discover gene expression specific to cancer (Figure 24-8). Through transcriptome analysis, new target genes for the diagnosis and treatment of cancers are being identified, and genomic drug discovery is progressing.
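The comparison between cancerous and non-cancerous tissue can be sketched as a log2 fold-change calculation. This is a minimal illustration only: the expression values are invented for the example, the gene symbols GPC3 (glypican3) and ACTB (a commonly used housekeeping gene) are labels chosen here, and the pseudocount and threshold are arbitrary assumptions rather than values from the study behind Figure 24-8.

```python
import math


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of tumor to normal expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes that are not detected in one of the samples.
    """
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


def upregulated(tumor_expr, normal_expr, threshold=1.0):
    """Return {gene: log2 fold change} for genes at or above the threshold."""
    result = {}
    for gene, tumor_level in tumor_expr.items():
        lfc = log2_fold_change(tumor_level, normal_expr.get(gene, 0.0))
        if lfc >= threshold:
            result[gene] = lfc
    return result


# Hypothetical mRNA expression levels (arbitrary units), NOT measured data.
tumor = {"AFP": 799.0, "GPC3": 399.0, "ACTB": 1000.0}
normal = {"AFP": 24.0, "GPC3": 24.0, "ACTB": 1000.0}

for gene, lfc in upregulated(tumor, normal).items():
    print(f"{gene}: log2 fold change = {lfc:.1f}")
```

In this toy data set the housekeeping gene is unchanged and drops out, while the two cancer-associated genes pass the threshold. A real analysis would also normalize between samples and test statistical significance across many specimens.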

The comparison of the transcriptomes of various animals with their genome sequences has also revealed that, in addition to the mRNA sequences coding for proteins, DNA in many other regions is transcribed into RNA. Besides the known tRNA, which carries amino acids according to the mRNA sequence, and rRNA, which forms the ribosomes that synthesize proteins, short miRNAs, which regulate transcription and translation, and other types of RNA are gradually being discovered.

Furthermore, it has also been found that the cells of female mammals have a long untranslated RNA molecule, called Xist, which adheres to one of the two X chromosomes and inactivates it. Thus, both male and female animals have one active X chromosome (see Chapter 10).

RNA molecules exist as single nucleic acid chains, which allows them to recognize DNA and RNA with complementary sequences and bind to them specifically. They can also fold on their own into various three-dimensional conformations, such as loops and hairpins, and bind to proteins. Examples of such RNA molecules forming complexes with proteins, DNA, and RNA to carry out physiological functions are gradually being uncovered, and among their most important functions is the control of gene transcription and translation. RNA molecules have also been discovered to play an important role in the formation of intranuclear structures.
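Complementary binding and hairpin formation both follow from base pairing, which can be illustrated with a short Python sketch. The function names, the stem length, and the minimum loop size are assumptions made for this example; the check considers only perfect Watson-Crick pairs (A-U, G-C) and ignores G-U wobble pairs and folding energetics.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that pairs with `rna` when the two strands run antiparallel."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def can_bind(probe, target):
    """True if `target` contains a site perfectly complementary to `probe`."""
    return reverse_complement(probe) in target


def has_hairpin_stem(rna, stem_len=4, min_loop=3):
    """True if some segment of `stem_len` bases is followed, after at least
    `min_loop` unpaired bases, by its own reverse complement, so that the
    chain could fold back on itself into a stem and loop."""
    last_start = len(rna) - 2 * stem_len - min_loop
    for i in range(last_start + 1):
        partner = reverse_complement(rna[i:i + stem_len])
        if rna.find(partner, i + stem_len + min_loop) != -1:
            return True
    return False


print(reverse_complement("AUGC"))        # the strand that pairs with AUGC
print(can_bind("AUGC", "CCGCAUCC"))      # complementary site present
print(has_hairpin_stem("GGGGAAAACCCC"))  # GGGG can pair with CCCC across the loop
```

The same single chain can therefore both recognize another nucleic acid and fold on itself, which is the property the paragraph above describes.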

24.5.3 Proteome: Systematic analysis of cell proteins and their modifications

Building on discoveries by researchers such as Koichi Tanaka, who was awarded the Nobel Prize, numerous proteins can now be identified in a short time by mass spectrometry (see Selection 3 of Appendix, Mass Spectrometry). Just as sequencers enable genomic analysis and microarrays enable transcriptome analysis, mass spectrometry enables analysis of the proteome, the entire set of proteins present in an organism or a cell. First, the proteins are digested with trypsin and the masses of the resulting peptides are determined; next, the ionized peptides are analyzed one by one to determine their amino acid composition. Modifications such as phosphorylation, acetylation, and methylation of proteins can also be elucidated systematically. Because a modified protein has a larger mass than the sum of its amino acids, the modifying group is dissociated by collision with an inert gas or by irradiation with low-energy electrons; accurately measuring the resulting decrease in molecular mass reveals the type of modification.
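The digestion and mass-matching steps described above can be sketched in Python. This is a simplified illustration under stated assumptions: trypsin is modeled by the common rule of cutting after lysine (K) or arginine (R) unless the next residue is proline (P), masses are standard monoisotopic values rounded to five decimals, the tolerance is arbitrary, and the function names and example peptides are invented for the example.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

# Mass added by each modification named in the text.
MOD_SHIFT = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def trypsin_digest(protein):
    """Cut after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def identify_modification(observed_mass, peptide, tolerance=0.01):
    """Name the modification whose mass shift explains the excess mass,
    or return None if no listed modification matches."""
    excess = observed_mass - peptide_mass(peptide)
    for name, shift in MOD_SHIFT.items():
        if abs(excess - shift) <= tolerance:
            return name
    return None


print(trypsin_digest("AGKSRPLK"))
print(round(peptide_mass("AGK"), 5))
print(identify_modification(peptide_mass("AGSK") + 79.96633, "AGSK"))
```

The sketch handles a single modification per peptide. Real search software also scores fragment-ion spectra and allows for combinations of modifications and missed cleavages.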

Such effective methods for identifying proteins can be applied in medicine, for example in the systematic analysis of proteins in blood and urine, which allows the search for biomarkers that reflect health and nutritional status.

Another important element in systematic studies of proteins is the determination of their three-dimensional conformation. As it is translated, each protein folds in a characteristic manner, a process called protein folding. If the protein can be crystallized, its structure can be determined in detail by X-ray crystallographic analysis. However, the conditions for crystallization differ from protein to protein and are often difficult to achieve.

Efforts are being made to develop systematic methods of crystallization. Structural analysis in aqueous solution using nuclear magnetic resonance (NMR) is also advancing. In recent years, computer-based molecular dynamics calculations have been developed to predict changes in folding in aqueous solution. Previously, such predictions were thought to be impossible because of the large number of atoms involved, but with improvements in computer performance and parallel processing, advances in this area are anticipated.

column

Search for disease causal genes from haplotypes

Column Figure 24-3 Search for haplotypes linked to the onset of diseases

International HapMap Project
(adapted from http://hapmap.ncbi.nim.nih.gov/originhaplotype.html.ja)

Column Figure 24-3 shows how the chromosome sequences of two distant ancestors are blended by repeated recombination over the generations. The various regions of a chromosome are marked by numerous SNPs and CNVs, which allow the identification of regions that are transmitted together as a set. Such a set of genetic markers is called a “haplotype” in the narrow sense of the term. In the broad sense, the term refers to the sequence of a single chromosome. However, it is most often used in the narrow sense, as in the international haplotype mapping projects. If a disease is triggered by the region marked with X, determining the haplotype characteristic of patients with the disease can help identify the region where the causal gene lies (see Column Selection 4 of Chapter 18). Hence, it is important to properly detect and mark SNPs, CNVs, and other variations in the different regions.

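In humans, the easiest haplotype analyses use the Y chromosome, which is found only in men and passed from father to son, and mitochondrial DNA, which everyone carries but which is inherited only from the mother. Because these largely escape recombination, each is transmitted essentially as a single haplotype.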
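The search for a disease-linked haplotype described in this column can be sketched as a frequency comparison between patients and unaffected controls. The sketch below is only illustrative: each haplotype is written as a string of alleles at three hypothetical SNP sites, the sample counts are invented, and the function names are assumptions of this example.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Fraction of the sample carrying each haplotype."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: n / total for hap, n in counts.items()}


def most_enriched(patients, controls):
    """Haplotype whose frequency in patients most exceeds that in controls."""
    patient_freq = haplotype_frequencies(patients)
    control_freq = haplotype_frequencies(controls)
    return max(
        patient_freq,
        key=lambda hap: patient_freq[hap] - control_freq.get(hap, 0.0),
    )


# Hypothetical samples: alleles at three SNP sites on one chromosome region.
patients = ["ACG"] * 6 + ["TCA"] * 2 + ["ATG"] * 2
controls = ["ACG"] * 2 + ["TCA"] * 4 + ["ATG"] * 4

print(most_enriched(patients, controls))
```

A haplotype that is clearly over-represented in patients points to the chromosomal region that may contain the causal gene. An actual study would use far larger samples and a statistical test rather than a raw difference in frequency.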

24.5.4 Epigenome: The memory system of eukaryotic cells

Figure 24-9 Concept of epigenome and genetic expression

As organisms evolved, genetic information also underwent major changes. It has been discovered that in eukaryotic organisms, epigenetic information such as modifications of DNA and histones is conveyed from cell to cell, thereby changing the gene expression profile.

In particular, in multicellular organisms that form tissues and organs, epigenetic changes are recorded as cells differentiate. The entirety of these records is called the epigenome, and it characterizes the cell lineages of multicellular organisms and their adaptation to the environment. It has been suggested that the human body has about 200 types of cells, each with a unique epigenome.

Technological advances now allow DNA regions carrying epigenetic modifications to be sequenced, which has made accurate analysis of epigenetic information across the whole genome possible. In particular, as shown in Appendix-Figure 8, methylated regions of DNA can be determined across the entire genome by cutting the DNA into fragments, separating the methylated fragments with antibodies against methylated DNA, and then sequencing the separated fragments with a high-speed sequencer. Likewise, regions where important amino acids of histones are methylated can be determined using antibodies.

These results suggest that in undifferentiated cells such as ES and iPS cells, a gene can carry both an acceleration-type modification that promotes transcription (methylation of histone H3 at lysine 4) and a deceleration-type modification that inhibits transcription (methylation of histone H3 at lysine 27). If only the acceleration-type modification remains after differentiation, the cell expresses that gene. Conversely, if only the deceleration-type modification remains, the cell does not express it (Figure 24-9).
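The rule described in this paragraph can be written as a small decision function. This is a deliberately simplified sketch: the function name and the state labels are assumptions of this example, and each modification is treated as simply present or absent, whereas real data are quantitative signals measured along the genome.

```python
def chromatin_state(h3k4_methylated, h3k27_methylated):
    """Classify a gene by the two histone H3 modifications in the text.

    h3k4_methylated:  acceleration-type mark (promotes transcription)
    h3k27_methylated: deceleration-type mark (inhibits transcription)
    """
    if h3k4_methylated and h3k27_methylated:
        return "both marks: undecided, as in ES and iPS cells"
    if h3k4_methylated:
        return "acceleration only: gene expressed"
    if h3k27_methylated:
        return "deceleration only: gene not expressed"
    return "neither mark: not covered by this rule"


# A gene in an undifferentiated cell, then in two differentiated lineages.
print(chromatin_state(True, True))
print(chromatin_state(True, False))
print(chromatin_state(False, True))
```

Differentiation, in this picture, amounts to resolving the first state into one of the other two for each gene.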

The concept of the epigenome is bringing about an enormous change in our understanding of the principles of organisms. Until now, it has been taken for granted that DNA codes for RNA and that RNA carries the information for proteins. However, studies on the epigenome have revealed that in eukaryotic organisms, the genetic information of DNA is modified by proteins and RNA within an individual organism, and that information, including these modifications, is transmitted to the next generation. The principles by which genetic information creates a stable system while being modified and changed are not yet clear, and this question is expected to become a major theme in biology in the future.
