By Jeff Erlichman, 1105 Government Information Group Custom Media.
Citizen-engaging technology can make government data available online, easy to access, and understandable.
On Friday, March 6, 2009, President Obama and Attorney General Eric Holder announced the allocation of $2 billion in funds from the American Recovery and Reinvestment Act to control crime and improve the criminal justice system. You can follow the money on http://www.recovery.gov.
Following the money is what most Americans have on their minds these days when they think of transparency and open government. And this website allows them to do just that. It is one important solution for a transparent government, but just one.
“Do a Google search for DC Data Catalog (http://data.octo.dc.gov/) and hopefully this is what Data.gov will look like, but obviously for the federal government,” explained Jerry Brito, senior research fellow at Virginia's George Mason University, in a recent interview with 1105 Government Information Group Custom Media.
Brito is the author of “Hack, Mash, and Peer: Crowdsourcing Government Transparency,” published last year in the Columbia Science and Technology Law Review. What he is describing is what Federal CIO Vivek Kundra did when he was CTO for Washington, DC.
What Kundra did with the DC Data Catalog was very simple, according to Brito.
“Every government agency in DC has its own IT system they use to process their cases, whatever they do. So for example police department arrest records go into their computer system. Every time a water pipe is fixed or one is recorded broken, it goes into a computer system. The same thing for Metro system bus and train schedules. All are on computer systems,” explained Brito.
These are all internal IT systems the government has access to. So what Kundra did is basically build a pipe between the systems and the Web. And they are all in one place and they are just XML feeds or other kinds of structured raw data feeds that anybody can take and use.
What makes this fantastic, according to Brito, is that third parties, whether they are watchdog groups, citizen groups, academics or Web application developers, can take this data and do amazing things with it.
Brito explained, for example, that a developer took the feed for alcohol licenses and bars and plotted it on a map. The developer then took crime statistics from another data feed and plotted them on top of that map. You could now see the relationship visually, which is a great tool for the police.
Tools include RSS feeds, mashups, which highlight hidden connections between different data sets, and crowdsourcing, which makes light work of sifting through mountains of data by focusing thousands of eyes on a particular set of data.
“Another is called 'we the people wiki' where they took data feeds of different things that were being reported, street light bulbs out, potholes, and they created a wiki page automatically for each of these reports,” explained Brito.
“Citizens can go there and use the wiki to track whether the thing has been fixed and keep their government accountable. So if you make the data available, third parties can do amazing things with it. And that’s what I'm hoping will happen with Data.gov.”
RSS and Mashups
Brito thinks what is driving the transparency revolution is that communications and storage technologies have become so cheap. Combine this with the fact that all this data is publicly available by law, and that people are demanding more data be available online, and you can use it in interesting ways.
“For example RSS (Really Simple Syndication) feeds are just raw data feeds. If the government makes those feeds available, third parties can take them and do interesting things,” explained Brito.
RSS usually refers to a family of data formats that allows the automation and aggregation of data. For example, if a website offers an RSS feed for its homepage, you can subscribe to that feed with a desktop application or Web-based “feed reader.”
“Anytime something is added to the homepage, it is simultaneously published in the site's RSS feed,” said Brito. “When subscribers turn on their feed reader, it checks all the subscribed feeds for new items. So, with one simple feed reader application, a user can keep track of multiple feeds without having to regularly visit the Web sites of the publisher.”
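To make the feed reader idea concrete, here is a minimal sketch in Python of the "check for new items" step Brito describes. The feed contents, item identifiers and the function name new_items are all hypothetical stand-ins, and a real reader would download the feed from the publisher's site rather than use a built-in sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents in RSS 2.0 style; a real reader would
# download this text from the publisher's feed address.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Agency News</title>
    <item><guid>item-1</guid><title>Pothole repaired on Main St</title></item>
    <item><guid>item-2</guid><title>New bus schedule posted</title></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_ids):
    """Return titles of items not seen before and record them as seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid not in seen_ids:
            seen_ids.add(guid)
            fresh.append(item.findtext("title"))
    return fresh


seen = set()
first_check = new_items(SAMPLE_FEED, seen)   # every item is new on the first check
second_check = new_items(SAMPLE_FEED, seen)  # nothing has been added since
```

The reader only has to remember which item identifiers it has already shown; everything else comes from the feed itself, which is why one small application can follow many sites.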
What happens then is third parties can make mashups. “Mashups are when you take two or more of those feeds and you combine them to create new and interesting tools,” Brito said. “So you can take a combination of arrest records or crime records and alcohol licensing on top of a map, that is a mashup and it's very interesting. So now this new thing is more useful than either of those things standing alone.”
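A mashup of the kind Brito describes can be sketched in a few lines of Python. The records below are invented for illustration, as are the field names and the mash_up function; a real mashup would read the two data sets from published feeds and would typically plot them on a map rather than count them by neighborhood.

```python
from collections import Counter

# Hypothetical records standing in for two separate public data feeds.
licenses = [
    {"business": "Bar A", "neighborhood": "Downtown"},
    {"business": "Bar B", "neighborhood": "Downtown"},
    {"business": "Bar C", "neighborhood": "Uptown"},
]
crimes = [
    {"offense": "theft", "neighborhood": "Downtown"},
    {"offense": "assault", "neighborhood": "Downtown"},
    {"offense": "theft", "neighborhood": "Riverside"},
]


def mash_up(license_records, crime_records):
    """Combine two feeds into one per-neighborhood view."""
    license_counts = Counter(r["neighborhood"] for r in license_records)
    crime_counts = Counter(r["neighborhood"] for r in crime_records)
    # Include every neighborhood that appears in either feed.
    areas = sorted(set(license_counts) | set(crime_counts))
    return {
        area: {"licenses": license_counts[area], "crimes": crime_counts[area]}
        for area in areas
    }


combined = mash_up(licenses, crimes)
```

Neither feed alone shows how licenses and crime reports line up; the combined view does, which is the point of the technique.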
So what is that going to mean for the government staffer and IT professional? What will they have to do that they are not doing now?
Brito said that if agencies aren't providing RSS feeds now, they will have to. But what this really does, according to Brito, is liberate government from having to worry about the presentation of data.
“If they can give us the raw data, everything, they don't have to worry about creating the maps, about creating different presentations for the data. Third parties can take it and create a million different presentations.
“If the government creates one presentation and that's not exactly what you wanted, you are stuck with that one view. But if you provide the raw data you can go out and create your own or you can look around and maybe someone else has built it,” Brito declared.
While mashups can help ease the information overload by highlighting the most interesting connections among data sets, human judgment is still necessary to determine the most relevant facts, said Brito. “Crowdsourcing presents the key to sifting through the data made available by official disclosures, hacks, and mashups.” That's because the more eyes you have studying the data, the more likely you are to spot problems and offer solutions.
For all this to happen there have to be some standards for recovery of data. “You want all agencies to be reporting how much money they are spending on the different projects and what they are getting for their money,” said Brito. “But you want all the agencies to report it in the same ways. For example let's say, imagine it as an Excel sheet and you would have a row and that row represents a project. And then you have columns. It's just making sure the columns are the same for all agencies.”
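Brito's spreadsheet picture, one row per project with the same columns for every agency, can be checked mechanically. The Python sketch below assumes a made-up list of required columns and two invented agency reports in comma-separated form; none of the names or figures come from an actual reporting standard.

```python
import csv
import io

# Hypothetical column names; not taken from any real reporting standard.
REQUIRED_COLUMNS = ["project", "amount_spent", "outcome"]

# Invented sample reports: one row per project, as in Brito's spreadsheet picture.
agency_a = "project,amount_spent,outcome\nBridge repair,1200000,2 bridges reopened\n"
agency_b = "project,cost,outcome\nPolice hiring,800000,10 officers hired\n"


def missing_columns(report_csv):
    """List the required columns that a report's header row lacks."""
    reader = csv.DictReader(io.StringIO(report_csv))
    header = reader.fieldnames or []
    return [column for column in REQUIRED_COLUMNS if column not in header]


a_missing = missing_columns(agency_a)  # header matches the shared layout
b_missing = missing_columns(agency_b)  # uses "cost" instead of "amount_spent"
```

Once every report passes a check like this, the rows from all agencies can be stacked into a single table and compared directly.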
Brito went on to say that even when public information is available online, it is often not available in an easily accessible form. And to allow users to exploit the full potential of the Internet (to subscribe to data streams, to mix and match data sources), data must be presented in a structured machine-readable format.
“Structured data,” Brito said, “is a term of art, meaning that information is presented in a format that allows computers to easily parse and manipulate it. Although a static Web page that lists a series of news stories is not structured, it may have a companion XML file containing the same information. A structured XML file would allow a user to sort the data by ascending or descending date, alphabetically and in many other ways that a static Web page does not afford.”
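As a rough illustration of what structured data makes possible, the Python sketch below reads a small XML list of news stories and sorts it two ways. The XML layout, element names and story titles are invented for the example and do not describe any particular agency's file.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical companion XML file for a page of news stories.
SAMPLE_XML = """<stories>
  <story><title>Water main fixed</title><date>2009-03-02</date></story>
  <story><title>Bus routes updated</title><date>2009-03-06</date></story>
  <story><title>Crime report released</title><date>2009-02-20</date></story>
</stories>"""


def load_stories(xml_text):
    """Parse each story into a dict with a title and a real date value."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": story.findtext("title"),
            "date": date.fromisoformat(story.findtext("date")),
        }
        for story in root.findall("story")
    ]


stories = load_stories(SAMPLE_XML)
# Descending date order: most recent story first.
newest_first = [s["title"] for s in sorted(stories, key=lambda s: s["date"], reverse=True)]
# Alphabetical order by title.
alphabetical = sorted(s["title"] for s in stories)
```

The same list on a static Web page would have to be re-sorted by hand; with the XML version, each new ordering is one line of code.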
It would also allow users to download the data. Brito said the benefit of downloading the data is that, with the complete data set, computers can help people delve more deeply into the data and put it in new forms, such as charts and maps, that would be too time-consuming to create by hand.
The bottom line, said Brito, is: “If government data is made available online in useful and flexible formats, citizens will be able to utilize modern Internet tools to shed light on government activities.” And deliver more solutions for a transparent government.