Workin' in a data mine

Privacy and civil-rights concerns
have dogged the use of
data mining for counterterrorism
and national security purposes,
but it is quietly thriving
nonetheless.

Although several high-profile
federal data-mining programs
have been shut down, contractors
supporting the work say
new opportunities continue to
develop for agencies to use
commercial data mining and
analytics.

"Data mining facilitates the
modeling that would take years
to do manually," said Jesus
Mena, chief strategy officer at
InferX Corp., a software company
in McLean, Va., that specializes
in data mining and analytics
for systems integrators
and the military.

Mena helped the Homeland
Security Department's Office of
Inspector General review DHS'
data-mining programs in 2006.

"Each of [DHS'] components
has missions, and they cannot
accomplish those missions
without advanced analytics,"
Mena said.

"Without a doubt, data mining
is an area of growth," said
Patrick Crago, president of
Multi-Threaded Inc., an information
technology company in
Herndon, Va., with customers
at defense and intelligence
agencies. "Our customers are
becoming a lot more sophisticated
in their use of data. Data
mining can help you analyze
unstructured data, which has a
volume that is larger than structured
data by a factor of 10 to 1."

Despite progress, the public's
concerns about using data mining
for homeland security have
slowed its adoption and forced
agencies to turn to more modest
forms and different terminologies,
industry experts say. In
addition, some proponents
claim that newer data-mining
techniques could offer better
privacy protections.

"People are doing it but not
calling it data mining," said Gary
Monroe, director of federal operations
at MicroStrategy Inc., of
McLean, Va. "It is a very valid
technology that can be applied
to uncover trends. Speaking for
myself, we try to steer away from
the term 'data mining' because
it has preconceived negative
connotations."

Although commercial
and government data-mining
applications are
growing, the commercial
ones are more popular:
For every $1 in federal data
mining, there is $20 worth of
commercial data mining,
Monroe said.

DHS demand

Data mining is broadly
defined as the analysis of large
amounts of data to uncover hidden
relationships and patterns. In
one type of analysis, keywords
are used to search large amounts
of data to determine patterns
and associations and ultimately
develop behavioral profiles.
Those profiles can identify other
people who fit the pattern and
possibly predict their behavior.

For example, marketers can
sort through data to identify the
behavior and characteristics of
people who bought a particular
item most quickly at a Web site.
Then they develop marketing
strategies to target more likely
buyers.
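
The marketing example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the attribute names and the helper functions build_profile and match_score are all hypothetical, and real products use far richer models than a simple count of shared traits.

```python
from collections import Counter

# Hypothetical purchase records: (customer attributes, bought quickly?)
records = [
    ({"age_band": "25-34", "referrer": "email", "device": "mobile"}, True),
    ({"age_band": "25-34", "referrer": "email", "device": "desktop"}, True),
    ({"age_band": "45-54", "referrer": "search", "device": "desktop"}, False),
    ({"age_band": "25-34", "referrer": "search", "device": "mobile"}, True),
    ({"age_band": "35-44", "referrer": "email", "device": "desktop"}, False),
]


def build_profile(records, min_share=0.6):
    """Return the attribute values shared by at least min_share of fast buyers."""
    fast = [attrs for attrs, quick in records if quick]
    counts = Counter(item for attrs in fast for item in attrs.items())
    return {item for item, n in counts.items() if n / len(fast) >= min_share}


def match_score(profile, attrs):
    """Return the fraction of profile traits that a prospect shares."""
    if not profile:
        return 0.0
    shared = sum(1 for key, value in profile if attrs.get(key) == value)
    return shared / len(profile)


# Build the profile from past buyers, then score a new (hypothetical) prospect.
profile = build_profile(records)
prospect = {"age_band": "25-34", "referrer": "email", "device": "desktop"}
score = match_score(profile, prospect)
```

A marketer would rank prospects by score and target the highest ones first. The same pattern-then-match logic is what raises concern when the "prospects" are travelers rather than shoppers.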

Similar techniques were
used to develop the Terrorist
Screening Center's watch list of
750,000 individuals and create
DHS' Automated Targeting
System, which produces risk
scores for cargo and airline passengers
entering the United
States. Both programs have been
criticized for inadequate privacy
protections, lack of transparency
and high error rates.

Within weeks, DHS intends
to introduce its long-delayed
Secure Flight program, through
which it will assume full responsibility
for checking airline passengers'
names against the terrorist
watch list. Airlines conduct
those checks now.

There have also been highly
publicized flops.

Congress rejected the
Pentagon's Terrorism Information
Awareness program in
2003. And the DHS-financed
Multistate Anti-Terrorism Information
Exchange, which offered
search and data-mining capabilities
to local law enforcement
agencies, was terminated in
2005 because of privacy fears.

More recently, DHS' Analysis,
Dissemination, Visualization,
Insight and Semantic Enhancement
program was suspended in
August because of privacy concerns,
according to a Government
Accountability Office
report.

"I have a feeling data mining
has probably lost its luster," said
Jim Harper, director of information
policy studies at the
Cato Institute, a Washington
think tank.

Data mining for commercial
purposes is booming, but when
the technology is used to target
terrorists, there is too little data
to accurately identify patterns,
which could result in accusations
against innocent people,
he added.
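
One way to see how rare targets can lead to accusations against innocent people is a simple base-rate calculation. The sketch below is not Harper's own analysis, and every number in it is an assumption chosen only for illustration: a population of 1 million people, 10 real targets, and a screening model that is right 99 percent of the time in both directions.

```python
def flagged_breakdown(population, true_targets, hit_rate, false_alarm_rate):
    """Split the people a model flags into real targets and innocent people."""
    true_hits = true_targets * hit_rate
    false_alarms = (population - true_targets) * false_alarm_rate
    # Share of flagged people who are real targets.
    precision = true_hits / (true_hits + false_alarms)
    return true_hits, false_alarms, precision


# All figures are hypothetical.
true_hits, false_alarms, precision = flagged_breakdown(
    population=1_000_000,
    true_targets=10,
    hit_rate=0.99,          # chance a real target is flagged
    false_alarm_rate=0.01,  # chance an innocent person is flagged
)
```

Under these assumed figures, the model flags roughly 10,000 innocent people while catching about 10 real targets, so fewer than one in a thousand flagged people is a true match.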

Funding continues

Even so, House appropriators
approved $12 million in fiscal
2008 for the FBI's National
Security Branch Analysis
Center, which will have 36 staff
positions. The money will support
advanced analytical techniques,
technologies and data
resources for terrorist tracking.

However, Rep. Brad Miller
(D-N.C.), chairman of the
House Science and Technology
Committee's Investigations and
Oversight Subcommittee, and
Rep. F. James Sensenbrenner Jr.
(R-Wis.), the subcommittee's
ranking member, asked GAO to
evaluate whether the FBI can
properly manage the center's
proposed 6 billion records.

DHS has 12 data-mining programs
on the books, nine of
which are active, according to a
survey released in August 2006
by DHS Inspector General
Richard Skinner.

To address privacy concerns,
DHS could use newer technologies
that allow searchable data
to remain at its original location,
Mena said. Doing so would
help protect privacy by avoiding
situations in which data is collected
for one purpose but eventually
used for many other purposes,
he said.
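
The idea of leaving data where it lives can be illustrated with a small sketch. It is not a description of any product Mena discussed; the DataSource class, the federated_search function and the sample records are all hypothetical. Each source runs the query locally and hands back only the matching fields, so no central copy of the full records is ever built.

```python
class DataSource:
    """Hypothetical agency data store that keeps its records in place."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # never copied out wholesale

    def search(self, predicate, fields):
        """Run the query locally; return only the requested fields of matches."""
        return [{f: r[f] for f in fields} for r in self._records if predicate(r)]


def federated_search(sources, predicate, fields):
    """Send the same query to every source and merge only the results."""
    results = []
    for src in sources:
        for row in src.search(predicate, fields):
            results.append({"source": src.name, **row})
    return results


# Hypothetical records held by two separate sources.
travel = DataSource("travel", [
    {"name": "A. Example", "flight": "XX100", "passport": "P1"},
    {"name": "B. Sample", "flight": "XX200", "passport": "P2"},
])
cargo = DataSource("cargo", [
    {"name": "A. Example", "container": "C-9", "tax_id": "T1"},
])

hits = federated_search(
    [travel, cargo],
    predicate=lambda r: r["name"] == "A. Example",
    fields=["name"],
)
```

In this sketch the searcher learns which sources hold a match but never sees passport or tax numbers, because those fields were not requested and never leave the source.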

Newer data-mining techniques
are also more effective,
Mena said. "You can buy more
power in commercial products
off the shelf," he said. "DHS is
using 20-year-old technologies."

Sergei Ananyan, president of
Megaputer Intelligence Inc., of
Bloomington, Ind., said federal
agencies, including DHS, are
using data mining to achieve
goals such as evaluating
employee training and improving
safety records.

DHS could benefit from data-mining
software that analyzes
text on Web sites, he said. In a
test project, that tool helped a
law enforcement agency identify
connections and links among
various crime groups, but it is
difficult to make such a program
transparent without revealing
too much information, he added.

As for counterterrorism, "the
techniques exist today that, if
they are put to good use, the
results would be quite good,"
Ananyan said. "Can it protect
privacy? It's not a technical
question but a political decision."

Despite progress on many
data-mining fronts, concerns
about privacy and civil rights are
not likely to go away.

"Data mining is a very sensitive
issue," said Michael Daconta,
an independent consultant who
was formerly metadata program
manager at DHS. "It is a powerful
tool, but it has lots of implications,
and people get nervous."

Staff writer Alice Lipowicz can
be reached at alipowicz@1105govinfo.com.
