Getting out in front of the burgeoning data deluge

We are now well into the exabyte-per-year era of data (an exabyte is 1 billion gigabytes), with predictions that the size of the digital universe will double every 18 months. How do you store all of that data, let alone find ways to manage it so you can retrieve it and make use of it?

Digital preservation is one of those issues whose importance everyone thinks is obvious, yet no one really talks about it, let alone offers any solutions for it.

Everyone will have yet another chance to confront the issue on April 1, when the Blue Ribbon Task Force on Sustainable Digital Preservation and Access plans to hold a symposium in Washington on “sustainable digital preservation practices.” It will include a wide range of organizations whose existence depends upon digital preservation, such as Google, as well as representatives from the publishing and movie industries. There surely will be plenty of government involvement as well.

This isn’t a new subject, but it’s one that rarely makes the headlines. Data protection is all the rage, and producing data comes in a close second, but who ever talks about preserving it? It’s taken as a given. Every now and then, stories about the work of the National Archives or the Library of Congress are published, but they never seem to make it onto the most-popular or most-read lists.

But think about it: We are now well into the exabyte-per-year era of data (an exabyte is 1 billion gigabytes), and reports from outfits such as International Data Corp. predict a doubling of the size of the digital universe every 18 months. Given the explosion in social media, and the coming one in online video, I’d say that’s conservative.

How do you store all of that data, let alone find ways to manage it so you can retrieve it and make use of it? Government mandates on retaining data aren’t going away, after all.

The blue ribbon panel recently came out with a report that examines the economics involved in both preserving data and making sure it can be accessed.

In an earlier article, task force member Dr. Fran Berman, director of the San Diego Supercomputer Center (SDSC) at the University of California-San Diego, talked about what’s needed to meet the data-cyberinfrastructure challenge.

Good stuff. I’m sure that and lots more will be discussed at the April 1 symposium. But will anybody be listening?