Report: Homeland security data mining generates concern

The National Security Agency is now sponsoring intelligence data mining with massive databases that are growing as fast as four petabytes per month, according to a new report published by the Congressional Research Service.

The report, "Data Mining and Homeland Security," written by Jeffrey Seifert, highlights the growing popularity of data mining and its benefits while also outlining its limitations and possible privacy and mission-creep concerns. It was posted online by the Federation of American Scientists.

Although data mining has become more common in recent years, its rapid expansion in homeland security raises concerns related to how it is implemented and monitored.

Data mining, which can help reveal patterns and relationships, requires skilled personnel to determine the value and significance of the information. What's more, data mining on its own does not show causal relationships. Issues of data quality, interoperability, mission creep and privacy also have been obstacles in applying data mining, the report said.

The National Security Agency's recently disclosed surveillance of alleged domestic terrorists has sparked privacy concerns in Congress, as have the former Terrorism Information Awareness project and the Homeland Security Department's Computer-Assisted Passenger Prescreening System II, both discontinued. The Multi-State Anti-Terrorism Information Exchange, formerly operated by Florida and several other states, and the Defense Department's Able Danger project also have attracted attention.

Lesser-known data mining programs include the NSA's Novel Intelligence from Massive Data program, which is being developed through grants from its Advanced Research and Development Activity arm.

In the program's terms, massive data is data that is especially challenging for common data analysis tools, either because of its unusually large size, such as a petabyte or greater, or because it resides in databases of great complexity and varied formats, such as those that include unstructured text, audio, video, graphs, diagrams, images, maps, equations, chemical formulas or tables.

"Some intelligence data sources grow at a rate of four petabytes per month now, and the rate of growth is increasing," the congressional research report states, quoting from the advanced research activity's former Web site. The huge expansion in electronic communications means that NSA requires more sophisticated data tools.

"Whereas NSA once predicted it was in danger of becoming proverbially deaf due to the spreading use of encrypted communications, it appears that NSA may now be at greater risk of being 'drowned' in information," the report states.
