In order to answer the fundamental question of what is life, it is important to know the history of life and how it has evolved. Genomes can also provide vital information in this area of evolution studies.
Comparative biology and taxonomy from the perspective of morphological characteristics
In the past, studies on the evolution and taxonomy of organisms classified organisms mainly according to morphology, lifestyle, and habitat. For example, hedgehogs, which live in Europe, Asia, and Africa, and the greater hedgehog tenrec, which lives on the island of Madagascar off the coast of Africa, look very much alike. Because of this close resemblance, they used to be classified in the same lineage.
However, the habitats of the two animals are separated by the sea, so their present distribution cannot be explained by the migration of a common ancestor. Today, it is understood that two entirely different lineages evolved independently in separate places. As this example shows, evolution may proceed in the same direction separately in different regions (parallel evolution). In other cases, different species living in similar environments may eventually develop similar traits, as when animals living underground have eyes that have atrophied in the same way (convergent evolution). Thus, organisms with similar traits do not always share a close common ancestor, and classifying animals only by morphology, lifestyle, and habitat may lead to wrong conclusions.
Such problems can also be resolved by comparing genome sequences. For example, protists of the order Diplomonadida have a nuclear membrane but no mitochondria. This was thought to be proof that Diplomonadida had descended from primitive eukaryotic cells that existed before mitochondria were incorporated into eukaryotic cells through symbiosis. However, comparison of genomes has revealed that these protists once had mitochondria but lost them in the course of evolution. In other words, it was incorrect to classify these protists as primitive eukaryotic cells on the basis of their morphology alone.
In this way, molecular phylogeny based on studies of the genome is gradually starting to hold an important position, but this will not lead to the complete rejection of conventional methods. As discussed later, information obtained from fossils may be more accurate in some cases. It is thus important to combine conventional approaches with molecular analysis.
Comparative biology and taxonomy from the perspective of nucleic acid sequence
As mentioned earlier, with classification methods based only on fossils and morphology, it is difficult to determine accurately whether a morphological trait was newly acquired through evolution or inherited from the organism’s ancestors, and whether a morphological similarity is the result of parallel evolution or is derived from a common ancestor. In such cases, comparison of genome sequences may shed light on the direction of evolution. For example, taking four similar species a, b, c, and d (see the Column at the bottom), if the same mutation is seen at the same location in the genomes of all four species, the mutation most likely occurred before these species branched. If a, b, and c have no mutation and only d shows the mutation, d is assumed to be relatively distant from a, b, and c. In this way, by comparing base substitutions at each position in the genomes, it is possible to estimate the phylogenetic tree through calculations that use mutations as elements (Figure 24-3).
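The site-by-site reasoning above can be sketched in a few lines of Python. This is only an illustration: the four five-base sequences are hypothetical (they are not taken from the figures), and the labels "invariant", "informative", and "uninformative" are names chosen for this sketch. A position is treated as helping to group species only when at least two different bases are each shared by at least two species.

```python
from collections import Counter

# Hypothetical aligned sequences for species a, b, c, d (illustration only)
ALIGNMENT = {"a": "GCTAA", "b": "GCTTA", "c": "ACTAG", "d": "ACGAA"}

def classify_site(column: str) -> str:
    """Label one alignment column by how its bases are distributed."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"        # same base in every species
    # A column can group species only if at least two different bases
    # are each shared by at least two species.
    shared_states = [base for base, n in counts.items() if n >= 2]
    if len(shared_states) >= 2:
        return "informative"
    return "uninformative"        # every change is confined to single species

names = sorted(ALIGNMENT)
length = len(ALIGNMENT[names[0]])
labels = []
for i in range(length):
    column = "".join(ALIGNMENT[name][i] for name in names)
    labels.append(classify_site(column))
    print(i + 1, column, labels[-1])
```

With these made-up sequences, only the first position separates the species into two pairs; the remaining variable positions each single out one species.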
Let us now consider genetic mutations in terms of probability. Assuming that base substitutions occur stochastically at a roughly constant rate over time, genomes that differ by numerous mutations can be considered to have diverged over a long time. This method of back-calculating time from the number of accumulated mutations is called the molecular clock.
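As a minimal sketch of this back-calculation, assume a strictly constant substitution rate (an idealization) and purely hypothetical numbers. If two lineages differ by d substitutions per site and each lineage accumulates r substitutions per site per year, the time since divergence is roughly t = d / (2r), because both lineages have been changing independently since they split. The function name and the values of d and r below are assumptions made for illustration.

```python
def divergence_time(distance_per_site: float, rate_per_site_per_year: float) -> float:
    """Estimate years since two lineages split under a strict molecular clock."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    # Both lineages accumulate substitutions independently after the split,
    # so the observed pairwise distance corresponds to 2 * rate * time.
    return distance_per_site / (2.0 * rate_per_site_per_year)

# Hypothetical values, for illustration only
d = 0.02   # 2% of compared sites differ
r = 1e-9   # substitutions per site per year in each lineage
print(divergence_time(d, r))  # on the order of ten million years
```

The estimate is only as good as the assumed rate, which is why calibration against independent evidence matters.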
How are phylogenetic trees created? (maximum parsimony method)
DNA and amino acid sequences can be used as the input data for constructing a molecular phylogenetic tree. However, there are various methods for constructing molecular phylogenetic trees, each with its merits and demerits, and so far there is no single definitive method. In practice, it is best to estimate a phylogenetic tree with several methods and to compare the results in order to judge the validity of the tree. In this Column, the maximum parsimony method is briefly described.
Assume that there are four species with the base sequences shown in Column Figure 24-2 and you want to construct a phylogenetic tree on the basis of these sequences, but you do not have any other information on the mutual relationships between the four species. Let us start with the tree form shown in (1).
Next, select a species (the out-group) to be used as a reference for calculating the phylogenetic tree. The out-group should be a species or group of species that is closely related to the species in question (the in-group) but that can be shown, on the basis of other evidence, not to belong to the in-group. For simplicity, let us proceed on the assumption that d has been found to be an out-group*7.
*7 It should be noted that the species with the most divergent sequence should not necessarily be chosen as the out-group. The choice of out-group should be validated from a viewpoint other than sequence patterns.
Looking at the first base from the left, the base in c is the same as that in out-group d, whereas those in a and b are different. It can therefore be assumed that a and b have acquired a new trait, so drive a wedge as shown in (2) to separate the species into two groups. Since the second base is the same in all the species, let us skip it and focus on the third base. This base differs only in the out-group, so drive another wedge as shown in (3). The fifth base is more complicated. One might consider separating the species into the (a, b, d) group and the c group. However, since d is known to be an out-group, it makes more sense to regard this change as a base substitution introduced after species c was established, so the wedge should be placed as in (4).
In this way, by reflecting base substitutions in the phylogenetic tree one base at a time, the form shown in (5) will eventually be obtained. This method of estimating the phylogenetic tree is called the maximum parsimony method. In reviewing base substitutions according to the maximum parsimony method, it is important to bear in mind that changes have a direction, running from plesiomorphy (ancestral character) to apomorphy (derived character) as one moves from the out-group to the in-group, and to form groups on the basis of synapomorphy (shared derived characters).
When actually constructing a maximum parsimony phylogenetic tree, one should in principle prepare a tree for every possible topology and then select the tree that requires the smallest number of base substitutions. This is in line with the simple idea behind maximum parsimony, namely that the hypothesis requiring the fewest assumptions is the best.
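The exhaustive procedure described above can be sketched for four species as follows. This is a minimal illustration under stated assumptions, not the procedure used for the figures: the sequences are hypothetical, the three candidate trees are the three possible unrooted groupings of four species into two pairs, and the number of substitutions per position is counted with Fitch's algorithm, a standard counting technique that the text does not name. Rooting with the out-group is left aside here.

```python
from typing import Dict, Tuple

# Hypothetical five-base sequences, chosen only for illustration
SEQS: Dict[str, str] = {"a": "GCTAA", "b": "GCTTA", "c": "ACTAG", "d": "ACGAA"}

# The three possible unrooted topologies for four species, written as two pairs
TOPOLOGIES = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]

def fitch_join(left: set, right: set) -> Tuple[set, int]:
    """Combine two child state sets; return (parent set, substitutions added)."""
    common = left & right
    if common:
        return common, 0        # children agree on at least one base
    return left | right, 1      # disagreement costs one substitution

def site_cost(topology, site: int) -> int:
    """Minimum number of substitutions needed at one position on one tree."""
    (p, q), (r, s) = topology
    left, c1 = fitch_join({SEQS[p][site]}, {SEQS[q][site]})
    right, c2 = fitch_join({SEQS[r][site]}, {SEQS[s][site]})
    _, c3 = fitch_join(left, right)
    return c1 + c2 + c3

def tree_cost(topology) -> int:
    """Total substitutions over all positions for one candidate tree."""
    n_sites = len(next(iter(SEQS.values())))
    return sum(site_cost(topology, i) for i in range(n_sites))

costs = {t: tree_cost(t) for t in TOPOLOGIES}
best = min(costs, key=costs.get)
for topology, cost in costs.items():
    print(topology, cost)
print("most parsimonious:", best)
```

With these made-up sequences, the tree that pairs a with b and c with d needs four substitutions, while the other two need five each. The number of possible topologies grows very quickly with the number of species, so real analyses rely on heuristic searches rather than full enumeration.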
Apart from the maximum parsimony method, several methods of preparing phylogenetic trees have been proposed. For further details, it may be interesting to study the neighbor-joining method.
Problems of molecular phylogeny
Figure 24-3 Example of steps to construct a phylogenetic tree using the maximum parsimony method
Mitochondrial DNA (mtDNA) is approximately 16,000 bp long in each of the following species: humans, bonobos, chimpanzees, gorillas, orangutans, and rhesus macaques. The mtDNA sequences of these species were compared, and a phylogenetic tree was constructed from the comparison by the neighbor-joining method. The number near each branch denotes the frequency of base substitution*8. The resulting phylogenetic tree suggests that the bonobo and the chimpanzee are each other's nearest neighbors, and that the human, gorilla, orangutan, and rhesus macaque are progressively more distant from them, in that order.
*8 Denotes the probability that the base of the species of interest differs from the base at the same position in the reference species. The closer the value is to 0, the more closely related the species are; the closer it is to 1, the greater the difference. When comparing the base substitution frequency between two organisms in this drawing, the lengths of all the branches between them must be summed.
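As a rough sketch of the quantity described in footnote *8, the simplest estimate of base substitution frequency is the proportion of aligned positions at which two sequences differ. The sequences and branch lengths below are hypothetical, and the figure itself may well use a corrected distance rather than this raw proportion.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)

# Hypothetical aligned fragments, for illustration only
species_x = "ACGTACGTAC"
species_y = "ACGTTCGTAA"
print(p_distance(species_x, species_y))  # 2 differences over 10 sites

# Hypothetical branch lengths lying on the path between two tips of a tree;
# the distance between the two species is the sum along that path.
path_branches = [0.25, 0.5, 0.125]
print(sum(path_branches))
```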
In this way, as long as genetic information is available, the evolution of species can be compared objectively, without interference from subjective human judgment. However, this method has disadvantages as well as advantages. First, comparative methods based on genome sequences can rarely be applied to extinct species. With the techniques presently available, it is extremely difficult to extract DNA from fossils. Even if an organism has been recovered from amber with its original form intact, the DNA found in it would be heavily fragmented and could not be used for analysis.
Another serious problem of comparison at the genome level is that the rate of gene mutation is not uniform; different sequences follow different molecular clocks. In other words, the frequency of base substitution differs depending on the gene and the species examined. Thus, it is “risky” to discuss the time of species divergence (absolute age of divergence) in phylogenetic trees solely on the basis of gene sequence comparisons.
The age of a fossil and the environment in which the organism lived can be estimated by isotopic age determination and from the stratum in which the fossil was found. Fossil information is therefore essential for determining the absolute age of divergence in molecular phylogenetic trees. In addition, fossils provide important information when discussing the evolution and phylogeny of extinct species. For these reasons, it is safer to discuss evolution as a whole by combining estimates of molecular evolution based on genome comparison with evidence obtained from fossils.