IT as health care warrior
Bioinformatics battles cancer, epidemics and bioterrorism
- By Doug Beizer
- Nov 04, 2005
It's a scenario that keeps politicians awake at night: A deadly form of avian flu mutates, spreads from birds to humans and sets off a global pandemic.
If (some say when) that scenario becomes a reality, it would fall to health professionals and researchers to come up with measures to combat it and treat patients. Perhaps more important, researchers are looking for safeguards that would prevent a pandemic from happening in the first place.
That's where IT in the form of bioinformatics plays a major role.
Bioinformatics is the collection, organization and analysis of large amounts of biological data through the use of computers, networks and databases. A number of projects under way demonstrate the direction the field is taking and its continued importance in health care and homeland security.
Defend the homeland
Northrop Grumman IT, for example, is working on a project at the National Institute of Allergy and Infectious Diseases to create a large bioinformatics data repository, said Robert Cothran, chief scientist for Northrop Grumman IT's Federal Enterprise Solutions business unit.
NIAID is developing a consortium of bioinformatics resource centers to study several pathogenic micro-organisms. The BioHealthBase database will contain data about pathogens such as Mycobacterium tuberculosis and the influenza virus.
"The institute's goal was to choose a large number of pathogens of interest in biodefense, and then to provide a single resource for collecting all of the data on these pathogens for researchers to work with," Cothran said.
With assistance and data from the resource center, researchers will have a tool to accelerate and expand their study of these dangerous organisms.
"But perhaps more important from a biodefense or bioterrorism standpoint is that researchers have a resource of information to come out with countermeasures," Cothran said. "It could lead to treatment for those infected, or vaccinations or other prophylactic methods for people who are not infected."
From an IT perspective, the system is a data center that houses large databases, with terabytes of data for each pathogen, along with an interface that lets researchers submit data to the center and retrieve data from it.
The BioHealthBase bioinformatics resource center focuses on data about six priority pathogens to help fill in gaps in genomic and other data critical to scientific researchers. Relational databases will include data on genome sequencing, comparative genomics, genome polymorphisms, gene expression, proteomics, host and pathogen interactions and pathways.
A university professor doing work on tuberculosis could use the resource center to access all data available on the pathogen from research done at other universities and research centers. In the past, a researcher would have to get that data from each university individually.
"At the same time, when she gets done doing research or makes some discovery, such as finding a new variant, that variant can be uploaded to this single resource, and everybody knows where to look for it," Cothran said. "So it's a mechanism for publishing that type of information to the rest of the research community. It's kind of a clearinghouse for all the information that a collection of researchers would use who are working on different aspects of the same infectious agent."
The interface researchers use to upload and download data is Web-based, mostly implemented through open-source technology.
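The clearinghouse idea Cothran describes, one place to publish a finding and one place to look it up, can be sketched in a few lines. This is only an illustration, not the BioHealthBase design: the table layout, the function names (`open_repository`, `upload_variant`, `variants_for`) and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the real database engine.

```python
import sqlite3


def open_repository():
    """Create an in-memory stand-in for a shared pathogen repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE variants (
            id INTEGER PRIMARY KEY,
            pathogen TEXT NOT NULL,
            variant_name TEXT NOT NULL,
            submitted_by TEXT NOT NULL,
            UNIQUE (pathogen, variant_name)
        )
        """
    )
    return conn


def upload_variant(conn, pathogen, variant_name, submitted_by):
    """Publish one finding to the shared repository."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO variants (pathogen, variant_name, submitted_by) "
            "VALUES (?, ?, ?)",
            (pathogen, variant_name, submitted_by),
        )


def variants_for(conn, pathogen):
    """Return every variant on file for one pathogen, whoever submitted it."""
    rows = conn.execute(
        "SELECT variant_name, submitted_by FROM variants "
        "WHERE pathogen = ? ORDER BY variant_name",
        (pathogen,),
    )
    return rows.fetchall()


# Made-up sample submissions from three different institutions.
repo = open_repository()
upload_variant(repo, "Mycobacterium tuberculosis", "variant-A", "University 1")
upload_variant(repo, "Mycobacterium tuberculosis", "variant-B", "University 2")
upload_variant(repo, "Influenza virus", "variant-C", "University 3")

# A tuberculosis researcher makes one query instead of contacting each university.
tb_variants = variants_for(repo, "Mycobacterium tuberculosis")
```

The point of the sketch is the single lookup at the end: submissions from different institutions land in one table, so a researcher no longer has to collect them source by source.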
A challenge for the project is handling the volume of data required and the computing power needed to process it.
"Bioinformatics produces more information than probably any other research area that's ongoing today," Cothran said. "It's just huge volumes of data that are used constantly."
Northrop used large network-storage devices and a high-powered database engine, not usually found in an academic setting, to handle the load.
A related project Northrop is working on at the Allergy and Infectious Diseases Institute is the Bioinformatics Integration Support Contract, which would integrate several tools for researchers. One goal is to establish a unified way for researchers to configure data.
The effort will provide the means for scientists to easily access, store, analyze and exchange complex high-quality data sets.
The lessons learned from both projects eventually could lead to applications outside bioinformatics.
"These types of technology are appropriate any time you're dealing with very large data sets in a research-type environment," Cothran said. "That's what's really special about this marriage, dealing with very large data sets in a research environment where open source tools are important."
Fighting disease
Bioinformatics is also a key part of research for diseases such as cancer. A program at the National Cancer Institute, for example, aims to make clinical trials and research easier and more effective, said Ken Buetow, director of the National Cancer Institute Center for Bioinformatics.
"The Cancer Biomedical Informatics Grid aims to put in place an infrastructure that uses IT to let cancer centers share data, applications and infrastructure," Buetow said. "It's trying to put in place, metaphorically, a World Wide Web of cancer research, a semantically interoperable World Wide Web of cancer research."
Across the United States, cancer centers such as the Memorial Sloan-Kettering Cancer Center and the Mayo Clinic are participating in the grid project, said Robin Portman of Booz Allen Hamilton Inc. of McLean, Va., one of the contractors working on the project.
Until now, there has not been a cohesive way for universities, cancer centers and other organizations to collect and share data, Portman said.
"The goal was to develop and promote the use of a common infrastructure so the research community could focus more on innovation instead of on building their own infrastructures," Portman said.
Once the grid is established, its developers expect the research community to take ownership of it and continue developing it.
By the end of the year, major data sets are expected to be available on the grid. And analysis tools will be available in a program called integrated cancer research.
"If you have research data that is broadly accessible and captured in a standardized manner, you can let your head go to the next step," Portman said. "We also have health information technology out there and electronic medical records coming along. So given that all this information is collected in a standardized fashion, you can see a better and easier interface between basic research data and clinical care."
To achieve that, the project is advocating a shared vocabulary, shared data elements and shared data models.
"It also includes the use of interoperable applications, which would be developed to these common standards," Portman said.
The Cancer Institute wants to make raw cancer research data available on the informatics grid for data mining and integration. Eventually, Cancer Institute officials expect the program to extend to other groups such as the Food and Drug Administration and pharmaceutical companies.
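Why shared data elements matter for data mining can be shown with a small sketch. It assumes two cancer centers that record the same facts under different local field names; the center names, field names, mapping table and patient values below are all invented for illustration and do not come from the grid project.

```python
# Hypothetical mapping from each center's local field names to shared data elements.
SHARED_ELEMENTS = {
    "center_a": {"dx": "diagnosis", "pt_age": "age_years"},
    "center_b": {"Diagnosis": "diagnosis", "AgeAtEnrollment": "age_years"},
}


def to_shared_model(center, record):
    """Rename a center's local fields to the shared vocabulary."""
    mapping = SHARED_ELEMENTS[center]
    unknown = set(record) - set(mapping)
    if unknown:
        # Refuse to guess at fields that have no agreed shared name.
        raise KeyError(f"no shared data element for: {sorted(unknown)}")
    return {mapping[field]: value for field, value in record.items()}


# Made-up records, as each center might store them locally.
pooled = [
    to_shared_model("center_a", {"dx": "melanoma", "pt_age": 54}),
    to_shared_model("center_b", {"Diagnosis": "melanoma", "AgeAtEnrollment": 61}),
]

# Once every record uses the same element names, one query covers both centers.
melanoma_ages = [r["age_years"] for r in pooled if r["diagnosis"] == "melanoma"]
```

Without the mapping step, the final query would have to be written once per institution, which is the kind of duplicated infrastructure work the project aims to remove.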
Booz Allen is helping build some of the tools that will make the infrastructure a reality.
"We've been responsible for what we call a middle-layer application that allows for the communication between the actual framework and the application that might sit on top of it," Portman said.
Getting that built is essential to making the project work. A strong collaborative spirit exists today in the cancer research community, but the infrastructure that developed over the years was not conducive to sharing.
"There are real obstacles to groups working together, and one of those is the exchange and integration of information," Buetow said.
"It is very difficult, even within an institution, to bring that information," he said. "And if it's difficult within an institution to collaborate, it is daunting when you try to cross institutional boundaries to try to share and integrate information."
The Cancer Institute's project is one of the broadest to date in the field of bioinformatics, said Susan Flood, genomics solutions manager at business analytical software company SAS Institute Inc.
"It is really a precedent-setting project in the field of bioinformatics and clinical informatics, because they're actually tackling two different things," Flood said. "One is dealing with the molecular information so they can get a better understanding of what causes disease. The other is putting that information into a more clinical and medical setting where it is being used for treating patients directly."
Fifty cancer centers have agreed to join the grid, and more than 80 organizations are participating in the governance model and all of the developmental activities.
It won't be long before organizations and patients will benefit from the effort, Buetow said.
"We will be able to much more easily do multicenter clinical research investigations, so a clinical trial running at Sloan-Kettering would now be able to enroll patients and participate with patients across the entire United States or world," Buetow said. "Suddenly, clinical trials will be widely available to anyone who wants to participate in them."
Staff Writer Doug Beizer can be reached at firstname.lastname@example.org