Maps that show where specific data is stored can facilitate information sharing
- By Doug Beizer
- Jul 20, 2007
A good map is essential for navigating an unfamiliar town.
The same can be said for sharing data. Without a complete and accurate map, sharing information among several sources is difficult at best.
A map of where specific data resides is essential for sharing to succeed. One database may record, for example, a person's race with a single digit in a specific field, while another database may record the same information with two digits in a different field.
The lack of standardization makes it difficult for a single application to access data from several sources.
After the 2001 terrorist attacks, the push for better data sharing became a national priority.
In that spirit, the Florida Department of Law Enforcement launched an effort to gather better crime-fighting intelligence by sharing information among state courts, police forces and other agencies.
Existing information-sharing systems in the state made it too costly and time-consuming to require localities to switch to a new, statewide system. Florida officials also wanted to conform to the National Information Exchange Model to share data with federal agencies.
Florida officials launched a project last year to establish a single statewide information-sharing infrastructure.
The discovery and mapping of relationships across hundreds of disparate systems was achieved in part by using software from Sypherlink Inc. of Dublin, Ohio.
"Instead of relying on individuals and domain experts from every system to manually map fields between the systems, we automated that as much as possible," said James Paat, Sypherlink's chief executive officer.
Agreeing on a standard isn't the biggest challenge to sharing data; the biggest hurdle is getting to that standard. Without tools like Sypherlink's, conforming to a standard is a largely manual effort.
Before the statewide effort, Florida was divided into seven regions, each of which had its own data-sharing effort under way and its own data model.
"While they could share information within that region, it was very difficult to share information between the regions," Paat said. "It's a similar analogy to the national front."

Model builders
State information-sharing initiatives often focused on just one state, not its neighbors. After the Sept. 11 attacks, there was a big push for data sharing but no strong guideline existed. As a result, states took different approaches to solving the problem.
In Ohio, officials built a data warehouse for law enforcement information. They also created a data model that all the state's agencies feed information into.
South Carolina took a similar approach but used a different data model than Ohio.
"At the end of the day, they're all collecting law enforcement data, but the challenge is there's a lack of interoperability," Paat said.
In today's world, data is distributed, and it develops inconsistencies as it moves from system to system. Distributed systems also hold a great deal of potentially overlapping data, so when that data is consolidated, the records don't match perfectly.
One system might use the number 23 to describe armed robbery while another system might use 55.
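Reconciling such mismatched codes typically means building a crosswalk table that translates each system's local values into one shared vocabulary. The sketch below illustrates the idea in Python; the system names, codes, and category labels are hypothetical, not drawn from any real code list.

```python
# Hypothetical crosswalk: each system's local offense codes are mapped
# to a shared standard vocabulary so records become comparable.
CROSSWALK = {
    "system_a": {23: "ARMED_ROBBERY", 24: "BURGLARY"},
    "system_b": {55: "ARMED_ROBBERY", 61: "BURGLARY"},
}

def normalize(system: str, local_code: int) -> str:
    """Translate a system-specific code into the shared vocabulary."""
    return CROSSWALK[system][local_code]

# Code 23 in one system and code 55 in the other now mean the same thing.
print(normalize("system_a", 23) == normalize("system_b", 55))  # True
```

The hard part in practice is not applying a table like this but discovering it, which is exactly the manual effort that automated mapping tools aim to reduce.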
"The problem is when you go and look at the data, it's very hard to tell what is what," said Todd Goldman, vice president of marketing at Santa Clara, Calif.-based Exeros Inc., a maker of automated data relationship discovery software.
"That's where data mapping comes into play," he said. "When you start relating these systems together, you might be able to find some key, like first name, Social Security number, or something else that lets you line up the row between those systems."
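Lining up rows between systems amounts to a join on whatever key the systems share. Here is a minimal sketch of that idea, assuming Social Security number is the common key; the field names and records are invented for illustration.

```python
# Hypothetical records from two systems that share an SSN field.
arrests = [
    {"ssn": "123-45-6789", "name": "J. Doe", "charge": "ARMED_ROBBERY"},
    {"ssn": "987-65-4321", "name": "A. Roe", "charge": "BURGLARY"},
]
court_records = [
    {"ssn": "123-45-6789", "disposition": "convicted"},
]

# Index one side by the shared key, then probe it from the other side.
by_ssn = {row["ssn"]: row for row in court_records}
matched = [
    {**arrest, **by_ssn[arrest["ssn"]]}
    for arrest in arrests
    if arrest["ssn"] in by_ssn
]
print(len(matched))  # 1 -- only J. Doe appears in both systems
```

Real record linkage is messier: names are misspelled, keys are missing, and fuzzy matching is often needed, but the lining-up step is the same.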
Mapping must also discover all the information that might be overlapping or shared. Data relationships are complex. For example, Social Security numbers tend to get buried into other larger numbers because information technology people like using the numbers as a unique identifier. Those relationships must be clear so agencies don't accidentally release sensitive data.
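Finding identifiers buried inside larger values is a pattern-scanning problem. The sketch below flags values that contain a nine-digit run that could be an embedded Social Security number; a real discovery tool would also validate number ranges and measure hit rates across a column, so this is only the first-pass idea, with made-up sample values.

```python
import re

# A nine-digit run is a candidate embedded SSN (after stripping dashes).
SSN_RUN = re.compile(r"\d{9}")

def may_contain_ssn(value: str) -> bool:
    """Flag values containing a nine-digit run that could be an SSN."""
    return SSN_RUN.search(value.replace("-", "")) is not None

# A case number that embeds an SSN as its middle segment is flagged;
# an ordinary case number is not.
print(may_contain_ssn("FL-123456789-2007"))  # True
print(may_contain_ssn("FL-0042-2007"))       # False
```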
Adding to the complexity is how data has been gathered over the years: it started on mainframe computers, moved to servers, and finally reached desktop computers, finding its way into all those different systems.
Because it is much more efficient to build applications on Unix servers rather than mainframes, IT departments began developing smaller, more distributed server-based applications. Those applications run faster when the data access is local, so data started getting sent to multiple locations, which can degrade the data.
It is like keeping three separate address books. Information is updated in one, but not in the other two. New contacts are added to one, but not the others. Pretty soon none of the address books is perfect.

Key advice
Organizations often try to manually resolve those relationships. But with hundreds or thousands of applications, that might be impossible.
A commercial customer of Exeros spent about three years trying to justify a project to discover sensitive data in 1,700 applications.
"The cost would be 50 people over five years to do the work, and management said, 'No, you're not going to do that,'" Goldman said. "Using our product, they were able to automate that process."
Exeros' software looks at data in two systems simultaneously. It looks at the data values and by analyzing millions of rows of data, it can find patterns that identify the relationships between systems.
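One simple way to find such relationships from values alone is to compare the distinct values of two columns and score their overlap: columns drawn from the same underlying population overlap heavily even when their names differ. This is a toy sketch of that idea, not Exeros' actual algorithm, and the sample columns are invented.

```python
def overlap_score(col_a, col_b):
    """Fraction of the smaller column's distinct values found in the other."""
    a, b = set(col_a), set(col_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Two columns with different names but shared identifiers score high;
# an unrelated column scores zero.
subject_id = ["1001", "1002", "1003", "1004"]
person_ref = ["1002", "1003", "1004", "1005"]
zip_codes  = ["33101", "32801", "33601", "32202"]

print(overlap_score(subject_id, person_ref))  # 0.75 -- likely the same identifier
print(overlap_score(subject_id, zip_codes))   # 0.0  -- unrelated columns
```

Production tools scan millions of rows and also compare value formats and distributions, but high value overlap is a strong first signal that two columns describe the same thing.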
For systems integrators and agencies tackling data-mapping projects, Goldman cautions against underestimating the complexity they're facing.
"End users will do things and create their own code because they don't want to wait for an IT person to do it," he said.
When that happens, a whole category of data may exist and be totally unknown to the people in charge of maintaining the database.
Doug Beizer is a staff writer for Washington Technology.