Inside Iowa State

Inside Iowa State, a newspaper for faculty and staff, is published by the Office of University Relations.

April 1, 2005

His code pieces together the biological puzzle

by Samantha Beres

It almost sounds like science fiction. DNA is put into a box that breaks it up into fragments, or DNA sequences. The fragments are read and each is described by a cryptic word from a strange language that uses only the letters A, T, C and G; all the words are stored in the box's memory. Then, someone writes a computer program to reassemble the fragments.

This is a simple description of shotgun sequencing, one technique for building a genome -- the complete sequence of DNA that contains the instructions for how an organism grows and develops.

Only a handful of research groups in the nation write computer programs to assemble the sequences. Xiaoqiu Huang, associate professor of computer science, leads one of them. An early version of his program, PCAP (Parallel Contig Assembly Program), was used to assemble a mouse genome. A more recent version was used to assemble the chicken genome. Those results were published in a December 2004 issue of Nature.

"We understand computer systems very well because we designed them, but biological systems are designed by nature and much more complex," Huang said. "It's an interesting problem to understand them and computation is an important part."

Computation is necessary because of the vast amount of information in a genome. The human genome is more than three billion letters long. The chicken genome is somewhere around a billion letters long, enough to fill more than 25,000 pages in a newspaper. On a computer, it takes up about a gigabyte.
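The figures above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: the letter and page counts are the article's round numbers, while the storage schemes (one byte per letter as plain text, or two bits per letter when packed) are assumptions made for the example, not a description of how any particular program stores a genome.

```python
# Back-of-the-envelope check of the sizes quoted in the article.
CHICKEN_LETTERS = 1_000_000_000   # "somewhere around a billion" (article's round figure)
NEWSPAPER_PAGES = 25_000          # "more than 25,000 pages" (article's round figure)

BITS_PER_LETTER_TEXT = 8          # assumption: plain text, one byte per A/T/C/G
BITS_PER_LETTER_PACKED = 2        # assumption: four possible letters fit in 2 bits


def storage_bytes(letters, bits_per_letter):
    """Bytes needed to store `letters` symbols at the given bits per symbol."""
    return letters * bits_per_letter // 8


plain = storage_bytes(CHICKEN_LETTERS, BITS_PER_LETTER_TEXT)
packed = storage_bytes(CHICKEN_LETTERS, BITS_PER_LETTER_PACKED)
letters_per_page = CHICKEN_LETTERS // NEWSPAPER_PAGES

print(f"plain text: {plain / 1e9:.2f} GB")
print(f"2-bit packed: {packed / 1e9:.2f} GB")
print(f"letters per newspaper page: {letters_per_page:,}")
```

Stored as plain text, a billion letters come to a billion bytes, which matches the gigabyte quoted above. Because the alphabet has only four letters, a packed encoding could in principle cut that to a quarter.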

That box in which the DNA gets chopped is called a sequencing machine. It can read and spit out only about 500 letters at a time. Huang's program takes the smaller, more manageable pieces and looks for overlapping letter sequences to rebuild the larger, original sequence.

"It's similar to a puzzle," said Huang. "It will look at two pieces to see if there is a perfect fit."
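The puzzle analogy can be made concrete with a toy example. The sketch below is not PCAP and does not reflect how Huang's program works internally. It is a minimal greedy assembler, written for illustration, that assumes error-free fragments and exact overlaps; the function names, the `min_len` threshold and the sample reads are all invented for the example. Real assemblers must also cope with reading errors and repeated sequences.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len` (a hypothetical threshold).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest exact overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; the pieces stay as separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Four made-up 10-letter reads cut from one 19-letter sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
contigs = greedy_assemble(reads)
print(contigs)
```

For these four reads the pieces chain together into a single contig. The greedy strategy is chosen here only because it is short and easy to follow; it can go wrong on sequences that contain repeats.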

His program uses parallel computing, in which an interconnected array of computers works on a task simultaneously.
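The idea of dividing the work can be sketched on a single machine. The example below is an illustration under stated assumptions, not PCAP's design: it hands the pairwise overlap comparisons to a pool of workers using Python's standard `concurrent.futures` module. A thread pool stands in for the array of computers, so it shows how the comparisons are split up rather than delivering a real speedup. The helper names, the worker count and the sample reads are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def overlap(a, b, min_len=3):
    """Longest exact suffix-of-a / prefix-of-b match, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def score_pair(pair):
    """Unit of work handed to one worker: compare a single ordered pair."""
    a, b = pair
    return a, b, overlap(a, b)


def all_overlaps(reads, workers=4):
    """Score every ordered pair of reads, spreading the pairs across workers."""
    pairs = list(permutations(reads, 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(score_pair, pairs)
        # Keep only pairs that actually overlap.
        return {(a, b): k for a, b, k in results if k > 0}


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
overlaps = all_overlaps(reads)
for (a, b), k in sorted(overlaps.items(), key=lambda item: -item[1]):
    print(f"{a} -> {b}: {k} letters")
```

Each comparison is independent of the others, which is what makes this step a natural candidate for parallel work. On a real cluster the pairs would be distributed across separate machines rather than threads.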

The final product is a "draft" assembly because assembled genomes aren't perfect. Huang, whose work is funded by the National Human Genome Research Institute of the National Institutes of Health, is continually refining the program, and that could take years. He took two years to develop it and has so far spent two more refining it.

Researchers from the Washington University Genome Sequencing Center in St. Louis, a world leader in genome sequencing, used Huang's program to assemble chimpanzee and chicken genomes. They also provided feedback to Huang.

So far, his program has played a major role in reducing the time and cost of genome sequencing projects. It took years and cost billions of dollars to sequence the human genome. The chicken genome took nine months and cost $10 million using his program.

Though perfection of the program is a personal goal, Huang also likes to think about the comparisons that can be made once the genomes are assembled.

A small percentage of the sequence has been conserved in the genomes of all animals over long periods of evolutionary history.

"It's a very small percentage, maybe 1 or 2 percent," Huang said. "If you compare the conserved regions in different animals, you can see where the animals diverged.

"To me there are lots of important computational problems in genome research," he added. "If my programs can produce results that are useful to biologists, then I could have an impact in that area."
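The kind of comparison Huang describes can be hinted at with a toy measure. The sketch below is a deliberately crude stand-in, not the method used in genome research, which relies on sequence alignment. It simply counts how many of the short "words" in one sequence also appear in another. The function names, the word length `k` and the sample sequences are all made up for the example.

```python
def kmers(seq, k):
    """Set of all k-letter words that occur in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(seq_a, seq_b, k=8):
    """Fraction of distinct k-letter words in seq_a that also occur in seq_b."""
    words_a = kmers(seq_a, k)
    if not words_a:
        return 0.0  # seq_a is shorter than k letters
    return len(words_a & kmers(seq_b, k)) / len(words_a)


# Two made-up sequences that share a middle stretch but differ at the ends.
animal_a = "TTGACCA" + "GATTACAGATTACC" + "CGGTAAT"
animal_b = "AACCGTT" + "GATTACAGATTACC" + "TTAGCAC"
print(f"shared 8-letter words: {shared_fraction(animal_a, animal_b):.0%}")
```

A higher fraction suggests more sequence in common. Turning such a measure into a statement about where two animals diverged takes far more careful modeling than this example attempts.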

Computer scientist Xiaoqiu Huang leads one of just a few research groups in this country that write computer programs to assemble genome sequences. Photo by Bob Elbert.