The Use of Phylogenetic Analysis in the Classification of Plant Viruses: Some Examples

The classification of plant viruses has traditionally been a conservative field of endeavour, with plant virologists resolutely resisting the introduction of Latinate nomenclature and Linnaean classification schemes. Until recently, plant viruses have also been classified in only two taxa: the "virus" (more or less equivalent to animal virus species) and the "group" (equivalent to genus).

Recently, however, there has been a move to introduce higher-level taxa such as the "supergroup" or family into plant virus classification schemes, in line with animal and human virus classification. There has also been increasing use of genome sequence data for establishing relationships between viruses: some of these analyses have been especially illuminating in that they have revealed distant relationships between several plant virus groups and some animal virus families.

Two notable examples are the relationships between picornaviruses, comoviruses and potyviruses; and between caulimoviruses, retroviruses and hepadnaviruses.

Such sequence comparisons have largely made use of strongly conserved predicted protein sequences that are related to replication; the inference drawn from these analyses is that the replicases of the viruses in these separate groups have diverged from a common ancestral source.

Similar sequence comparisons have been used to estimate evolutionary distances between, and make taxonomic proposals for, many different organisms, ranging from bacteria to plants to hominoids. A common approach in these studies has been the construction of phylogenetic trees for the organisms in question: these have been used to clarify lines of evolutionary descent of the organisms, and illustrate relationships with taxonomic significance. It is interesting that very few such analyses have been performed for viruses in general, and plant viruses in particular - probably both because of a general reluctance to speculate upon virus evolutionary processes, and because of a lack of sequence data. The analyses that have been done have also concentrated mainly on distance data - that is, pairwise similarity or difference estimates - for use in phenetic comparisons, rather than using the cladistic techniques which have become popular in animal and plant systematics recently.

Data Sets Used for Relationship Dendrograms and Phylogenetic Analyses

1. Use of serological data

Some of the first quantitative estimates of "difference" between plant viruses were based on serological data. Serology has long been used for grouping plant viruses (Gibbs, 1985); however, the concept of quantitating serological differences is fairly recent. In 1970 van Regenmortel and von Wechmar introduced the concept of the serological differentiation index or SDI: this is the serological cross-reactivity between two viruses expressed as the number of twofold dilution steps separating homologous and heterologous titers.
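As a worked illustration of this definition, the SDI for one antiserum and one heterologous virus is the base-2 logarithm of the ratio of the homologous to the heterologous titer. The Python sketch below assumes that titers are recorded as reciprocal endpoint dilutions (1024 for a 1/1024 endpoint). The function name `sdi` and the titer values are hypothetical.

```python
import math

def sdi(homologous_titer, heterologous_titer):
    """Serological differentiation index: the number of twofold dilution
    steps separating the homologous and heterologous titers.

    Titers are reciprocal endpoint dilutions (e.g. 1024 for 1/1024)."""
    if homologous_titer <= 0 or heterologous_titer <= 0:
        raise ValueError("titers must be positive reciprocal dilutions")
    return math.log2(homologous_titer / heterologous_titer)

# Hypothetical example: the antiserum reaches 1/1024 against its own virus
# but only 1/64 against a related virus, i.e. four twofold steps apart.
print(sdi(1024, 64))  # 4.0
```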

Formerly such investigations were performed using precipitin techniques; recently, however, ELISA techniques have been used for SDI determinations: Jaegle and van Regenmortel demonstrated the general use of the technique, and Dekker et al. (1988) and Pinner et al. (1992) put it to use in determining relationships among maize streak virus (MSV) and related geminiviruses of cereals and grasses.

SDI data have been used for taxonomic purposes, in that "clusters" of related viruses can be differentiated from less-related species (van Regenmortel, 1982); they have also been used to construct relationship dendrograms: Koenig (1976) presented a comprehensive analysis of the serology of 13 tymoviruses, which included a novel "loop structure".

In fact, one can take a table of reciprocal SDI values for a group of viruses and analyse it directly to construct a dendrogram. SDI data are conceptually similar to the pairwise DNA hybridisation data that have often been used for the construction of phylogenetic trees; thus, one may also root the dendrogram to make it into a tree, and speculate on the phylogeny of virus coat proteins from serological data.
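A minimal sketch of such a direct analysis follows. It assumes a hypothetical table of reciprocal SDI values for four unnamed viruses. Each pair of reciprocal values is averaged into a single symmetric distance, and the matrix is then clustered by UPGMA (unweighted pair-group averaging). UPGMA is used here only for illustration and is not necessarily the method used by Koenig (1976) or others. It yields a rooted dendrogram directly, because it assumes roughly equal rates of divergence in all lineages.

```python
from itertools import combinations

# Hypothetical reciprocal SDI table: sdi[a][b] is the SDI obtained when
# antiserum to virus a is tested against virus b.
sdi = {
    "A": {"B": 1, "C": 4, "D": 5},
    "B": {"A": 2, "C": 4, "D": 6},
    "C": {"A": 3, "B": 5, "D": 6},
    "D": {"A": 5, "B": 6, "C": 5},
}

def symmetrize(table):
    """Average each pair of reciprocal SDI values into one distance."""
    return {
        frozenset((a, b)): (table[a][b] + table[b][a]) / 2
        for a, b in combinations(sorted(table), 2)
    }

def upgma(names, dist):
    """Cluster by UPGMA; returns nested (left, right, height) tuples."""
    d = dict(dist)
    size = {n: 1 for n in names}
    clusters = list(names)
    while len(clusters) > 1:
        # join the closest pair of current clusters
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        node = (a, b, d[frozenset((a, b))] / 2)
        for c in clusters:
            if c not in (a, b):
                # size-weighted average of the distances from a and b to c
                d[frozenset((node, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[node] = size[a] + size[b]
        clusters = [c for c in clusters if c not in (a, b)] + [node]
    return clusters[0]

tree = upgma(sorted(sdi), symmetrize(sdi))
print(tree)  # ('D', ('C', ('A', 'B', 0.75), 2.0), 2.75)
```

Each internal node is written as (left, right, height), where height is half the averaged SDI at which the two clusters join.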

2. Coat protein composition

Gibbs (1980, 1985) has shown that the relatedness of tobamovirus coat proteins as assessed from amino acid sequences agrees closely with that assessed from their amino acid compositions. He has used these latter data to construct relationship dendrograms for many tobamoviruses: these dendrograms may also be regarded as phylogenetic trees, as they are rooted, and reflect a line of evolutionary descent. These may productively be used for re-classification of certain of the viruses, and as confirmation of the classification of several others: for example, closely-clustered viruses could be regarded as strains, and distinct clusters as groups of strains of different viruses.
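Amino acid compositions can be compared numerically in a similar way. The sketch below uses one simple difference index: the sum of squared differences in mole percent over all residues. Whether this matches the statistic Gibbs used is an assumption. The three four-residue "compositions" are toy values; real data would cover every amino acid and sum to 100%. The resulting distance table could then be clustered into a dendrogram.

```python
from itertools import combinations

# Toy mole-percent values for three hypothetical coat proteins.
compositions = {
    "virus_X": {"Ala": 8.0, "Ser": 10.0, "Thr": 10.0, "Val": 9.0},
    "virus_Y": {"Ala": 9.0, "Ser": 11.0, "Thr": 9.0, "Val": 9.0},
    "virus_Z": {"Ala": 13.0, "Ser": 6.0, "Thr": 7.0, "Val": 12.0},
}

def composition_distance(a, b):
    """Sum of squared mole-percent differences; absent residues count as 0."""
    residues = set(a) | set(b)
    return sum((a.get(r, 0.0) - b.get(r, 0.0)) ** 2 for r in residues)

table = {
    (x, y): composition_distance(compositions[x], compositions[y])
    for x, y in combinations(sorted(compositions), 2)
}
for pair, value in table.items():
    print(pair, value)
```

With these toy values, virus_X and virus_Y (distance 3.0) would cluster together and lie well apart from virus_Z (59.0 and 54.0).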

That the approach may comfortably be used to define a taxonomic group (or genus) is shown by Gibbs (1980), where a tree that includes the furovirus beet necrotic yellow vein virus (BNYVV) clearly separates it from the definitive tobamoviruses.

Fauquet et al. (1985) have also claimed that the amino acid compositions of most virus coat proteins fall into groups closely corresponding to accepted plant virus taxonomic groups. Determining the composition of a virus coat protein may therefore be a most useful exercise from a taxonomic point of view, as such data appear to define the virus taxonomic group (=genus); however, it is still an analytical undertaking of some magnitude, and difficult for most laboratories.
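If compositions do define groups, an unknown coat protein might be assigned to the group whose mean composition it most closely resembles. The sketch below is a nearest-centroid classifier chosen purely for illustration; it is not the procedure of Fauquet et al. (1985). The group names and mole-percent values are hypothetical.

```python
# Hypothetical reference groups, each a list of member compositions
# (mole percent over a toy subset of residues).
groups = {
    "group_1": [{"Ala": 8.0, "Ser": 10.0, "Val": 9.0},
                {"Ala": 9.0, "Ser": 11.0, "Val": 9.0}],
    "group_2": [{"Ala": 13.0, "Ser": 6.0, "Val": 12.0},
                {"Ala": 12.0, "Ser": 7.0, "Val": 11.0}],
}

def centroid(members):
    """Mean mole percent of each residue across the group's members."""
    residues = set().union(*members)
    return {r: sum(m.get(r, 0.0) for m in members) / len(members) for r in residues}

def squared_distance(a, b):
    """Squared Euclidean distance between two compositions."""
    residues = set(a) | set(b)
    return sum((a.get(r, 0.0) - b.get(r, 0.0)) ** 2 for r in residues)

def assign_group(unknown, reference_groups):
    """Return the name of the group whose centroid lies closest to `unknown`."""
    centroids = {g: centroid(m) for g, m in reference_groups.items()}
    return min(centroids, key=lambda g: squared_distance(unknown, centroids[g]))

unknown = {"Ala": 9.5, "Ser": 10.0, "Val": 8.5}
print(assign_group(unknown, groups))  # group_1
```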

3. Limited RNAse digestion and oligonucleotide mapping

For many years researchers have been mapping and typing ssRNA viruses of mammals (and of plants) by limited RNAse digestion of genomic RNA, followed by oligonucleotide mapping by electrophoresis and chromatography. Such analyses are obviously possible for plant viruses, and could be equally informative. They are, however, only really usable at the level of "species" relationships, as distantly related RNAs tend to have very different oligonucleotide maps.
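One simple way to put oligonucleotide map comparisons on a numerical footing is to score the proportion of spots that two maps share. The sketch below assumes each map has already been reduced to a set of spot identifiers matched between maps, which is a hypothetical encoding. It uses the Dice coefficient, 2 * shared / (spots in A + spots in B). The low score for the distant virus shows why the approach loses resolution above the species level.

```python
def dice_similarity(map_a, map_b):
    """Dice coefficient: 2 * shared spots / (spots in A + spots in B)."""
    if not map_a and not map_b:
        return 1.0
    return 2 * len(map_a & map_b) / (len(map_a) + len(map_b))

# Hypothetical fingerprints: each spot has been matched across maps and given an ID.
isolate_1 = {"s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"}
isolate_2 = {"s1", "s2", "s3", "s4", "s5", "s6", "s9", "s10"}
distant_virus = {"s1", "s11", "s12", "s13", "s14", "s15", "s16", "s17"}

print(dice_similarity(isolate_1, isolate_2))      # 0.75
print(dice_similarity(isolate_1, distant_virus))  # 0.125
```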

4. Restriction fragment and map data

It is a relatively simple task - and certainly easier than exhaustive protein purification protocols - to purify double-stranded DNAs from plants, and to analyse them by restriction endonuclease digestion, electrophoresis, and hybridisation with labelled probes. The use of restriction endonuclease cleavage patterns and derived restriction maps of DNA sequences for phylogenetic analysis - particularly of mitochondrial DNAs - is well established; however, the methods have had little application in plant virology since most plant viruses have ssRNA genomes.

One group of viruses that has been studied by restriction mapping and by differences in restriction fragment patterns is the geminiviruses of maize and cereals (Clarke et al., 1989; Kirby et al., 1989; Rybicki et al., 1989; Hughes et al., 1990).

It is possible to directly convert a series of restriction fragment patterns of different viral DNAs on an electropherogram into a digital data matrix and analyse this by cladistic techniques; it is also possible to construct difference tables (Clarke et al., 1989; Rybicki et al., 1989) and analyse them phenetically. Detailed restriction maps of related virus DNAs may be transformed into distance matrices (Kirby et al., 1989; Hughes et al., 1990), or also treated as digital information for cladistic analysis.
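The conversion of fragment patterns into data can be sketched as follows. Each distinct fragment size becomes a binary character (present or absent). A difference table counts the fragments not shared by each pair of isolates, for phenetic use. The same matrix is then scored cladistically by computing Fitch parsimony lengths for the three possible unrooted trees of four isolates. The isolate names and fragment sizes are hypothetical. A real analysis would also have to allow for the fact that fragment characters are not independent, since a single site change alters more than one fragment.

```python
from itertools import combinations

# Hypothetical single-enzyme fragment sizes (bp) for four virus isolates.
fragments = {
    "iso1": {1400, 800, 500},
    "iso2": {1400, 800, 300, 200},
    "iso3": {1400, 1300},
    "iso4": {900, 1300, 500},
}

def binary_matrix(frags):
    """One column per distinct fragment size; 1 = fragment present."""
    sizes = sorted(set().union(*frags.values()))
    return sizes, {iso: [int(s in f) for s in sizes] for iso, f in frags.items()}

def difference_table(frags):
    """Number of fragments not shared by each pair of isolates (for phenetics)."""
    return {(a, b): len(frags[a] ^ frags[b])
            for a, b in combinations(sorted(frags), 2)}

def fitch_length(tree, states):
    """Fitch parsimony steps for one character on a binary tree of nested pairs."""
    def visit(node):
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}, 0
        (left, cost_l), (right, cost_r) = visit(node[0]), visit(node[1])
        common = left & right
        if common:
            return common, cost_l + cost_r
        return left | right, cost_l + cost_r + 1  # a change is required
    return visit(tree)[1]

def tree_length(tree, matrix, n_chars):
    """Total parsimony length summed over all fragment characters."""
    return sum(fitch_length(tree, {iso: row[i] for iso, row in matrix.items()})
               for i in range(n_chars))

sizes, matrix = binary_matrix(fragments)
# The three possible unrooted topologies for four isolates.
candidates = {
    "T1": (("iso1", "iso2"), ("iso3", "iso4")),
    "T2": (("iso1", "iso3"), ("iso2", "iso4")),
    "T3": (("iso1", "iso4"), ("iso2", "iso3")),
}
lengths = {name: tree_length(t, matrix, len(sizes)) for name, t in candidates.items()}
print(difference_table(fragments))
print(lengths)  # {'T1': 8, 'T2': 10, 'T3': 9}
```

With these toy data, T1 is the shortest tree (8 steps). It pairs iso1 with iso2 and iso3 with iso4.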

With recent developments in cDNA synthesis technology, it is also possible to map RNA viruses routinely by restriction enzyme cleavage. For instance, DNA fragments obtained from human rhinoviruses by polymerase chain reaction (PCR) amplification of cDNA have been used to type virus isolates by their restriction endonuclease cleavage patterns, and PCR has also been used for amplification and typing of viroids from cDNA. Restriction map data fail, however, to demonstrate relationships beyond about 30% sequence difference (Kirby et al., 1989); map comparisons may therefore only be useful for sub-group level comparisons.
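A back-of-envelope model suggests why restriction maps lose resolution at this level of divergence. Assume that sequence differences are scattered independently and uniformly. A recognition site of r bases is then conserved between two sequences that differ at a proportion p of positions with probability (1 - p)^r. The sketch below tabulates this for a 6-base enzyme. The model ignores new sites arising by chance and is an illustrative assumption, not the analysis of Kirby et al. (1989).

```python
def site_retention(p, r=6):
    """Expected fraction of r-base recognition sites conserved between two
    sequences that differ at a proportion p of positions (simple model:
    independent, uniformly scattered differences; new sites are ignored)."""
    return (1.0 - p) ** r

for p in (0.05, 0.10, 0.20, 0.30):
    print(f"difference {p:.0%}: {site_retention(p):.3f} of 6-base sites conserved")
```

At 30% difference, only about 12% of 6-base sites would be expected to survive in both sequences. That leaves few shared sites on which to base a comparison.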

5. Nucleic acid and protein sequences

Recent developments in nucleic acid technology make it almost easier to obtain long stretches of RNA or DNA sequence than to sequence, or to determine the composition of, the protein encoded by that sequence; however, the latter has also become easier (Shukla and Ward, 1989). Thus, comparisons of coat protein gene sequences, or of the predicted amino acid sequences they encode, are probably going to be far more widely used for comparison purposes than serological or amino acid composition data. The usefulness of coat protein sequence differences for phylogenetic analysis of the tobamoviruses has already been mentioned; it is worth noting that Gibbs (1976) has used such an analysis to estimate a divergence time of between 200 and 600 Myr for the definitive tobamoviruses. Both sorts of analysis (directly from sequence, and from pairwise sequence difference comparisons) lead to diagrams which can be used to group viruses.
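Pairwise sequence difference comparisons start from a distance between aligned sequences. The sketch below computes the uncorrected proportion of differing sites (p-distance). It then applies a Poisson correction, d = -ln(1 - p), which allows for multiple substitutions at the same site. The three short sequences are invented, and the choice of correction is an assumption, since several others exist. The resulting pairwise values could be clustered into a dendrogram, or the aligned sequences analysed directly by cladistic methods.

```python
import math
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    skipping positions where either sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def poisson_distance(p):
    """Poisson-corrected substitutions per site, d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(1.0 - p)

# Invented, pre-aligned protein fragments (not real coat protein sequences).
aligned = {
    "virus_a": "MASTKLVGED",
    "virus_b": "MASTKIVGQD",
    "virus_c": "MGSNRLIGQE",
}

for a, b in combinations(sorted(aligned), 2):
    p = p_distance(aligned[a], aligned[b])
    print(a, b, round(p, 2), round(poisson_distance(p), 3))
```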

Application of Phylogenetic Analyses to Plant Virus Classification

A convenient rule of thumb with the simpler viruses appears to be that coat protein gene sequences vary the most, followed by certain of the non-structural proteins such as movement proteins, followed by proteins associated with the replication machinery. This is nowhere clearer than among the plant virus groups linked by possession of a tripartite ssRNA genome and isometric or bacilliform particles: the bromo-, cucumo-, ilar- and alfalfa mosaic virus groups, which have been proposed as members of a plant virus family to be called the Bromoviridae.

These examples could be taken as leading inexorably to a proposal for the establishment of plant virus taxa right up to the "super-familial" level: however, plant virus genome relationships are not as simple as are the phylogenetic relationships of some of their constituent parts, and any taxon higher than the familial level would be very hard to justify. For example, the Bromoviridae all have a similar genomic structure and organisation, and it could be confidently asserted that all components of all of them have a common source. Tobamoviruses also all have a similar genetic organisation. Their putative replicases are related to those of the Bromoviridae; however, there is no demonstrable relationship between any other genes of the two groups of viruses, and the genetic organisation is markedly different. Moreover, both sets of viruses share sequence similarity with the animal alphaviruses, in the family Togaviridae, which have even less similar genetic organisation. Should these viruses all be grouped in a "superfamily"?

Much the same can be said for the picorna-, como- and potyviruses. Goldbach (1987) has proposed that simple viruses have evolved by building up modules into genomes, and that different "supergroups" can be distinguished by their core modules, which consist of genes related to replication. Thus there is a picornavirus-like supergroup, of viruses with similar polymerase/protease core modules, and an alphavirus supergroup, of viruses with similar "nucleotide-binding" protein/replicase cores.



Last Modified September 28, 1994