A computer scientist at Washington University in St. Louis has developed a novel technique to extract more DNA sequence from a single sequencing reaction than is normally possible, reducing both the cost and the time of the sequencing process.
Michael R. Brent, Ph.D., associate professor of computer science, has applied software developed in his Washington University laboratory that sorts through the maze of genetic information and finds predicted sequences.
“Normally, you get one 600 to 700 base pair sequence in a reaction, but under certain conditions, we’ve figured out how to get more than one sequence out of a single sequencing reaction,” said Brent. “In most cases, people would throw out a reaction with more than one sequence, but we’ve developed software that allows us to sort out the mess and figure out the different sequences.”
Writing in the April issue of Genome Research, Brent and collaborators at Baylor College of Medicine, led by Richard A. Gibbs, Ph.D., director of Baylor’s Human Genome Sequencing Center, discuss related techniques in genome analysis. They note that the recent publication of a third mammalian genome, that of the brown rat, suggests a new approach to genome annotation is needed. Sequencing genomes has proven so labor-intensive and expensive that researchers fear little headway will be made in future genome analyses, hence the need for automated analysis.
The researchers describe their method of predicting genes in the brown rat using Brent’s TWINSCAN software, which predicts the existence of genes by looking at two genomes in parallel and homing in on statistical patterns in the individual DNA sequences of each genome. The annotation of the recently sequenced brown rat genome was conducted primarily using another program called Ensembl. Brent and his collaborators tested 444 TWINSCAN-predicted rat genes that showed significant homology, or correspondence, to known human genes implicated in disease. Ensembl and other techniques that use protein-to-genome mapping missed these genes.
Brent and his collaborators verified the existence of 59 percent of their predicted genes.
“We showed that we can do this efficiently with a reasonable fraction of the genes that TWINSCAN predicts and that you can actually produce a gene structure with the method,” Brent said. “These predictions are a viable springboard for doing experiments. When you start with a prediction you’ll get an experimental result pretty frequently. We believe it’s a good way to complete the annotation of a genome.”
The approach stands traditional genome annotation on its head: it starts with a computer analysis of genome data, treats the resulting predictions as hypotheses, and designs experiments to test them.
“Currently, experimental sequencing of both genomes and gene products is followed by computational analysis of the resulting sequences,” said Brent. “It’s a one-way street. We want to integrate computational and experimental genomics, so that the parts of the process talk to each other.”