Computer-savvy science and technology

In late October, the International Human Genome Sequencing Consortium, led in the United States by the National Human Genome Research Institute and the Energy Department, published a scientific summary of the finished human genome sequence, reducing the estimated number of human protein-coding genes from 35,000 to 25,000.


Appearing in the Oct. 21 issue of the journal Nature, the paper described the final product of the Human Genome Project, a 13-year effort to read the information encoded in human chromosomes that reached its culmination in 2003.


Deciphering genetic complexity seems a task tailor-made for the Information Age. Rapid advances in processor technology and processing speed are letting biologists begin to decipher how individual genes work and how genes interact in networks, inhibiting and activating each other and even regulating themselves.


Genes also "express" proteins, that is, they create the very substances that enable life in all its variety to survive and prosper.


The marriage of computer acumen with biological research, known as computational biology or bioinformatics, may let life sciences transfer unprecedented discoveries from labs to homes. It's an effort that, when and if successful, could amount to billions or even trillions of dollars of return to companies and individuals.


"Biology is becoming much more information-based, dependent on databases and computation in general," said Jean-Pierre Auffret, director of the Bioscience Management Program at George Mason University in Fairfax, Va. "There's more data. Now, the challenge is to process it."


For pharmaceutical companies, he said, "it provides a means of accelerating drug discovery."


Another enabling technology for genetic discovery is the microarray, sometimes called a "gene chip." Microarrays are keychain-size devices with as many as several million tiny spots. Researchers can use the devices to monitor simultaneously the activities of thousands of genes from a single tissue sample and identify patterns that could indicate disease.


Because the technology is relatively new, standards ensuring the reliability and comparability of results are just beginning to take shape.


Another computational-biology frontier is the emerging field of proteomics, which aims to understand how gene-created proteins act and interact within the body for good and for ill. Scientists are beginning to identify and quantify thousands of the proteins, but for a complete understanding they will need to determine how a wide variety of factors influence the proteins' effects on physiology.


Because such a challenge is substantially more complex than genetic sequencing, the computational demands will prove intense.


"Advanced biology is about crunching numbers and synthesizing data," said Tim Howard, founder and former CEO of Galt Associates Inc., a bioinformatics firm in Sterling, Va. "It's all about leveraging knowledge."


Thanks to computation, acquiring biological knowledge is becoming dramatically less expensive. Rodney Brooks, director of the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory, said the cost of sequencing DNA is diminishing exponentially. By 2005, the cost of sequencing a person's genome is expected to be a mere penny per base pair, according to a recent article by Brooks in MIT's Technology Review.


"Compare that to the $10 it cost in 1990," he said.


As a practical matter, Brooks said, one need examine only 10 million base pairs to cover all the variations in the human genome. Consequently, sequencing this number to determine a person's genetic fingerprint and disease susceptibility will cost only about $1 by sometime in the 2020s.
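
The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers quoted above ($10 per base pair in 1990, a penny by 2005, 10 million base pairs for about $1) and assumes, purely for illustration, that the cost keeps halving at the same steady rate after 2005. That constant rate is an assumption of this sketch, not a model attributed to Brooks, and the function and variable names are hypothetical.

```python
import math


def halving_time_years(cost_start, cost_end, years):
    """Years needed for the cost to halve, assuming a smooth exponential decline."""
    halvings = math.log2(cost_start / cost_end)
    return years / halvings


# Figures quoted in the article.
COST_1990 = 10.0                 # dollars per base pair in 1990
COST_2005 = 0.01                 # dollars per base pair expected by 2005
VARIANT_BASE_PAIRS = 10_000_000  # base pairs said to cover human variation
TARGET_TOTAL = 1.0               # dollars for sequencing that whole set

# How quickly the per-base-pair cost halved between 1990 and 2005.
halving = halving_time_years(COST_1990, COST_2005, 2005 - 1990)

# Cost of the 10 million base pairs at the 2005 price.
cost_in_2005 = COST_2005 * VARIANT_BASE_PAIRS

# Per-base-pair price needed to hit the $1 total.
target_per_base = TARGET_TOTAL / VARIANT_BASE_PAIRS

# ASSUMPTION: the 1990-2005 halving rate continues unchanged after 2005.
extra_halvings = math.log2(COST_2005 / target_per_base)
projected_year = 2005 + extra_halvings * halving

print(f"Halving time: {halving:.2f} years")
print(f"Cost of 10 million base pairs in 2005: ${cost_in_2005:,.0f}")
print(f"Target price per base pair: ${target_per_base:.7f}")
print(f"Year the $1 target is reached at a constant rate: {projected_year:.0f}")
```

Under that constant-rate assumption, the cost halves roughly every year and a half, and the $1 target arrives at about 2030, at the edge of the 2020s window cited above. A slightly faster decline would put it inside that decade.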
