With a grant from the NSF's Plant Genome Research Program, Bob Jansen is applying next-generation DNA sequencing methods to better understand why the geranium has evolved to be so radically different from other plants.
Intergenome Cooperation
New Tools Bring New Challenges
The technologies that researchers use to sequence and analyze genetic data are only a few years old and the scale of the information involved is massive. Before Jansen and collaborators could start interpreting the genomic data, they needed to determine the most efficient way to gather it.
"We first went through the literature to see what everybody thought we should do and there was absolutely no consensus," Jansen said. "Many of the aspects of the sequencing and analysis hadn't even been compared."
Basic questions needed to be answered: Which sequencing platform works best for this type of problem? Which algorithm is fastest and most accurate for assembling sequences? And how much information is needed to find significant factors in the evolution of the genome?
A recent analysis by Jansen and his colleagues explored these questions and advanced the researchers' quest for the optimal experimental setup. They found that using the Illumina HiSeq 2000 platform (a next-generation sequencer) in tandem with Trinity (a leading assembly tool) gave the most accurate and efficient results. They also determined that roughly 40% of the sequence data was enough to reach a plateau of useful information for assembling a complete transcriptome.
"We had no idea how much data we needed and the more data you have to gather the more expensive it is," Jansen said.They established this percentage by taking increments of a huge amount of data — about 14 billion sequence reads — from 5% up to 100%, assembling those different increments, and using a reference genome to see how many more genes they found and how the coverage of each improved.
Supercomputers like TACC's Ranger speed up sequence analyses by breaking the process down into small chunks and distributing them to thousands of computer processors working together. In the case of Jansen's project, Ranger also acted as a test-bed for method development, allowing the researchers to compare multiple experimental approaches to find the best one.
"For each species that we're looking at, we get all of these DNA or RNA sequences and we have to assemble these short reads into a complete genome, or into complete transcriptomes. This takes lots of memory and space," Jansen said. "The bottom line in our case—we could not do it without TACC."
Identifying Genetic Differences
Above and beyond the specific evolutionary history of the geranium, the researchers are hoping their investigation will uncover basic facts about evolution. They speculate that the high levels of rate change occurring in this group might have something to do with genes that are involved in DNA repair and recombination.
"Experimental evidence demonstrates that if you mutate the recombination genes, you can generate instability in the genome," Jansen said. "We're hoping to uncover some evidence that this phenomenon is related to those classes of genes."
Understanding how plant genomes evolve, interact with each other, and coordinate functions may seem obscure, but a general model of the division of labor within plant cells and their shared genomic functions could eventually lead to practical applications.
"We use evolution for lots of purposes agriculturally. We select for certain features in crop plants to have bigger ears of corn or bigger tomatoes," Jansen said. "If you don't understand the genes that are involved in that and how they work, it's hit or miss with regard to whether you're doing the right thing."
Written by Aaron Dubrow, Science and Technology Writer