Open government looks for next-generation technologies

Whereas Web 2.0 is about connecting people through social-networking applications, Web 3.0 will be about connecting information in new ways that people will find more useful and relevant. It'll be a boon for government transparency, but it won't be happening soon.

Transparency and accountability have become the watchwords of the Obama administration, and part of that promise is making all sorts of government data available to citizens and public interest groups via the Web.

But go past the slick home pages of public access Web sites such as Data.gov and Recovery.gov, and one finds frustrating inconsistencies in the volume and presentation of available information. Critics look at these shortcomings and gripe about the oxymoron of “government accountability.” But it’s also certainly true that the Web tools the Obama team is using are not cut out for the job.

Almost every morsel of government data exists in electronic form somewhere, and with the exception of classified data, it is perfectly acceptable for public consumption. However, making it easy for people to find, analyze, share and ultimately understand the information is another story.

Many tech experts say the solution lies in the Semantic Web, a slowly emerging set of technologies that aim to improve access to and the usability of information and software services on the Internet, ushering in a new era of Internet applications that some are already calling Web 3.0.

Whereas Web 2.0 is about connecting people through social-networking applications such as Facebook, wikis and Twitter, the next generation will be about connecting information in new ways that people will find more useful, relevant and enjoyable.

The question is: Will open government and open-access technology meet anytime soon? No one can say for sure, partly because it’s nearly impossible to predict what the Web experience will be like after these technologies take hold. No one foresaw the meteoric rise and wide reach of Facebook and Twitter, even after those tools hit the market, and it’s just as hard to know what will happen when the next new thing comes along.

The Semantic Web “is something that has a lot of potential and people are asking about it. But there’s a long way to go,” said Michael Donovan, chief technologist for U.S. Department of Defense Business at EDS, an HP company.

Widespread usage is probably at least five years away, Donovan said, because Web sites need to be revamped and Web browsers endowed with new capabilities. But that isn’t stopping a number of government Semantic Web efforts from taking some baby steps now. One of President Barack Obama’s pet projects is among the early efforts.

How the Semantic Web works

The Web’s traditional function is to simply present content, such as a government report posted online. The Semantic Web goes a step further by seeking to illuminate the content’s meaning. It’s only a subtle shift, to be sure, but moving down that road creates powerful opportunities for associating content with other related resources on the Web.

The World Wide Web Consortium (W3C), which is a focal point for Semantic Web developers from academia and industry, describes the Semantic Web as a common framework for data sharing.

That framework can play multiple roles, but two functions stand out in particular: describing data so users can more easily find it through Web searches, and describing applications so other applications can more easily identify how they work and then use their functionality for new purposes.

“The first step is to make data available by description,” said Michael Cataldo, chief executive officer at Cambridge Semantics, a two-year-old company that focuses on semantic technology.

Enter metadata tagging. The tags encoded in a particular piece of content provide a description that makes it easier to locate in a Web search. W3C has advanced the Resource Description Framework (RDF) as the specification for describing things such as documents, images or people.
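To make the idea concrete, here is a minimal sketch of RDF-style description in Python. RDF expresses each statement as a subject-predicate-object triple; everything else here is an assumption for illustration: the example.gov URLs, the vocabulary terms, the report itself and the helper names `describe` and `find_by`. Plain tuples stand in for a real RDF library.

```python
# RDF models metadata as (subject, predicate, object) triples.
# The URLs, vocabulary and values below are hypothetical.
TERMS = "http://example.gov/terms/"
REPORT = "http://example.gov/reports/asthma-2009"

triples = [
    (REPORT, TERMS + "title", "Childhood Asthma Treatment"),
    (REPORT, TERMS + "subject", "asthma"),
    (REPORT, TERMS + "subject", "child health"),
    (REPORT, TERMS + "date", "2009-06-01"),
]


def describe(subject, store):
    """Collect every predicate/object pair recorded for one subject."""
    description = {}
    for s, p, o in store:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


def find_by(predicate, value, store):
    """Return the subjects whose metadata carries the given tag value."""
    return sorted({s for s, p, o in store if p == predicate and o == value})


# A search for the "asthma" tag locates the report by its description.
matches = find_by(TERMS + "subject", "asthma", triples)
```

Because the tags are structured data, a search tool can match on the subject field directly instead of guessing from the words in the report.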

Ruhollah Farchtchi, an associate at Booz Allen Hamilton, said he considers RDF a base layer for the Semantic Web. Another specification, the Web Ontology Language (OWL), provides more detailed descriptions by allowing systems to infer additional information based on the data explicitly provided.
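The kind of inference described here can be sketched in a few lines of Python. This is not an OWL reasoner. It is a hand-rolled loop, written as an assumption about how such rules behave, that applies two simple class-hierarchy rules until nothing new turns up. The class names and `report-17` are hypothetical.

```python
# Explicitly stated facts as (subject, predicate, object). Names are made up.
facts = {
    ("AsthmaStudy", "subClassOf", "HealthReport"),
    ("HealthReport", "subClassOf", "GovernmentDocument"),
    ("report-17", "type", "AsthmaStudy"),
}


def infer(facts):
    """Apply two rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    # Rule 1: subclass relationships chain together.
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    # Rule 2: a member of a class belongs to its superclasses.
                    new.add((a, "type", d))
        if new <= known:
            return known
        known |= new


# Facts that were never written down but follow from the ones that were.
inferred = infer(facts) - facts
```

Nobody tagged `report-17` as a government document, yet a query for government documents could now return it.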

As Web resources become better defined, searches will produce higher fidelity answers, said Doug Chabot, vice president and principal solutions architect at QinetiQ North America Mission Solutions Group.

“We expect the Web to understand our context so well that it predicts what we really mean in our queries,” Chabot said.

In its second role, the Semantic Web serves as an application integration mechanism. It does this by using machine-readable metadata that lets one application interpret the meaning of the data it receives from another application. With that common understanding, watchdog groups and other organizations could use government data feeds to more easily build mashups: Web applications that pull together data and software functionality from two or more sources.

Gary Bass, executive director of OMB Watch, said his group would like to look at government contractors to see if they comply with Occupational Safety and Health Administration, Equal Employment Opportunity Commission and other agency directives. But the group would need to know that a company listed in one database is the same entity listed in others.

“Semantic technology, if done properly, should be able to tell us that,” Bass said.
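One way to picture the matching problem Bass describes: if each database publishes "same as" links between its records and those held elsewhere, a program can follow the links to decide whether two records describe one company. The sketch below assumes such links exist. The record identifiers are invented, and the grouping uses a standard union-find structure, not any particular semantic toolkit.

```python
# Each record is named "database:local-id". All identifiers are invented.
same_as = [
    ("contracts:ACME-001", "osha:4471"),
    ("osha:4471", "eeoc:acme-corp"),
    ("contracts:BOLT-009", "osha:5120"),
]


def cluster(links):
    """Group identifiers connected by same-as links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we walk it
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def same_entity(a, b, links):
    """True when two records resolve to the same real-world entity."""
    return any(a in group and b in group for group in cluster(links))
```

Here the contracts record and the EEOC record are never linked directly, but both connect to the same OSHA record, so they resolve to one company.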

Government early adopters

Government agencies have already begun adopting semantic technology. Data.gov, which aims to provide access to executive branch data, uses the Dublin Core metadata standard to define datasets. Dublin Core, which makes use of RDF, includes elements such as title, creator, date and description.

In Dublin Core, the government has “chosen a humble and broadly accepted specification to describe reports,” said Ed Lyons, chief engineer at Keane.
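For readers who want to see what such a description looks like, the sketch below builds a Dublin Core record as RDF/XML with Python's standard library. The two namespace addresses are the published RDF and Dublin Core ones, and `creator` is the Dublin Core element that names a resource's author. The dataset URL, the field values and the helper name `dublin_core_record` are hypothetical, and this is not a reproduction of how Data.gov encodes its catalog.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def dublin_core_record(url, **elements):
    """Wrap Dublin Core elements in an rdf:Description of one resource."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": url})
    for name, value in elements.items():
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return root


# Hypothetical dataset; the values are placeholders, not real catalog entries.
record = dublin_core_record(
    "http://example.gov/datasets/42",
    title="Sample Air Quality Readings",
    creator="Example Agency",
    date="2009-07-01",
    description="Placeholder description of a dataset.",
)
xml_text = ET.tostring(record, encoding="unicode")
```

The result is a small XML document in which each field is labeled with a shared vocabulary term, which is what lets outside software read the record without custom parsing.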

For the most part, although government agencies might be aware of semantic technology, they aren’t ready to deploy it. Recovery.gov, the Web site that tracks the distribution and management of funds stemming from the American Recovery and Reinvestment Act, is focusing on other priorities, said Ed Pound, a spokesman at the Recovery Accountability and Transparency Board. The introduction of Web 3.0 technology will have to wait while they get the Web site in shape to receive ARRA recipient reports, due in October.

Other agencies have taken a similar wait-and-see position. “We are monitoring the situation as the technology matures; it is not factoring into our business requirements at this point,” a General Services Administration spokesman said.

However, industry is showing how semantic technology can be applied to government data. Cambridge Semantics, for example, demonstrated how its semantic technology can combine data from Recovery.gov and the U.S. Census Bureau to compare recovery spending with population distribution, Cataldo said.

In this case, Cambridge Semantics used its Anzo for Excel product, which semantically enables the data in Microsoft Excel spreadsheets. Users can link spreadsheets, generate RDF descriptions of their content, and combine them with other semantically tagged data.
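The underlying idea, combining two datasets that describe the same places, can be sketched without any special product. The Python below is not Anzo and uses no real Recovery.gov or Census figures. The region names and numbers are invented, and the helper `per_capita` is a hypothetical stand-in for the comparison Cataldo described.

```python
# Invented figures for illustration only.
spending = {"region-a": 1_000_000, "region-b": 3_000_000, "region-c": 500_000}
population = {"region-a": 50_000, "region-b": 60_000, "region-d": 10_000}


def per_capita(spending, population):
    """Join two datasets on their shared region keys and divide."""
    shared = spending.keys() & population.keys()
    return {region: spending[region] / population[region]
            for region in sorted(shared)}


result = per_capita(spending, population)
```

The join only works when both sources identify a region the same way, which is the problem shared semantic identifiers are meant to solve. Regions that appear in only one source, such as region-c and region-d here, drop out.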

Other appearances of semantic technology in the public sector and academia tend to focus on specific domains such as health care.

The University of Texas Health Science Center at Houston provides an early example. In 2004, the center developed its semantics-based Situational Awareness and Preparedness for Public Health Incidences Using Reasoning Engines system. Sapphire, still used today in biosurveillance, helped spawn additional projects built around semantic technology.

Those spin-offs include the Biomedical Language Understanding and Extraction System, a clinical text processing application, and the Survey On Demand System, which administers surveys and collects data.

Dr. Parsa Mirhaji, assistant professor of medicine at the center, cited Harris County, Texas, as a user of the survey system. The county’s health department plans to use the system to collect data to support its influenza preparedness efforts.

Challenges

Walton Smith, senior associate at Booz Allen, said agencies are asking the company about semantic tools, but he noted that the full spectrum of government agencies has not yet embraced the technology. The lack of a killer application, as the browser was for the first-generation Web, is part of the reason. “There is no platform like the browser that makes this available for widespread, mainstream use,” Smith said.

Donovan at EDS said he believes most of the technology advances will occur as incremental extensions to familiar tools such as Web servers, search engines and collaboration tools.

The time and effort required to tag and describe the government’s vast data holdings represent another adoption challenge. Clay Johnson, director of Sunlight Labs, expressed concern that the government might become preoccupied with formatting data rather than releasing it. Sunlight Labs is an open-source development team that launched as a project of the Sunlight Foundation, an open-government advocacy group in Washington.

“I would hate to see them get bogged down in trying to make their data Semantic Web compatible before it even sees the light of day,” he said.

Keane’s Lyons said the Semantic Web could prove to be the linchpin for the administration’s openness agenda. But he said other paths to greater accessibility might arise, such as efforts to open more government data to Google searches.

Discoverability and access are the key goals, Lyons said, regardless of the method.

“It doesn’t matter if information is technically available if there is no reasonable way for the average citizen to access it,” he said.

Uses for the Semantic Web

Enter search terms: A person using a browser equipped with semantic search capability enters terms into a search form, such as "Cincinnati, asthma, child, treatment."

Scan metadata

Identify resources

Scan ontologies

Build new applications

Return search results