As a hardcore geek, I was excited to watch Watson, a supercomputer named after IBM’s founder Thomas Watson, recently dominate its human competitors on “Jeopardy!” In deference to the “Terminator” generation, my son took the side of the humans and was a bit offended that I was rooting for the computer.
More important than answering trivia questions, and setting aside fears that artificially intelligent (AI) computers will make us all slaves, what Watson achieved represents another milestone, and possibly a watershed event, for government IT in general and information management in particular. To understand why, we need to first understand what innovations the IBM Research team, led by David Ferrucci, achieved and then extrapolate how they can be applied to pressing government challenges.
For a computer to compete at the highest levels of “Jeopardy!”, it must understand the question, search its knowledge base and deliver an answer in approximately six seconds. Contestants can press the buzzer only after Alex Trebek has finished reading the question.
By researching IBM’s websites on Watson and the DeepQA project and watching Nova’s excellent documentary titled “Smartest Machine on Earth,” I learned that Watson achieved this feat using a pipelined approach:
- Use natural-language processing (NLP) to parse the sentence and extract keywords.
- Use information retrieval techniques against structured and unstructured data sources to create candidate answers.
- Use rules and logic to winnow down the field of candidates.
- Use machine learning techniques to score the candidates.
- Deliver the highest scoring candidate as the answer.
Although that description significantly simplifies the hundreds, if not thousands, of nontrivial innovations delivered by this research team, we can correlate those major pipeline stages to specific systemic information management problems that every government organization faces. And it’s refreshing to see such a pragmatic, hybrid approach to solving this problem rapidly — in a span of three years — in contrast to the plodding, linear progress that more traditional AI techniques had demonstrated in the past two decades. That is not a condemnation of those techniques, just another confirmation that “perfect is the enemy of the good.”
There are many grand challenges in every field. But in government information management, there is one that stands above all others: information overload. And information overload has three subchallenges: poor data quality, poor knowledge modeling and poor precision.
Watson’s pipelined, pragmatic process demonstrated significant innovations that overcame all of those challenges. That should inspire both hope and excitement in the government IT community.
Let’s examine each subchallenge and Watson’s solution.
First, to deal with poor data quality, Watson used information retrieval techniques on curated data sources to develop its list of candidate answers. Wolfram Alpha also makes extensive use of the same technique of carefully curating data. Data quality is foundational: government agencies must institute data quality programs or their data will be unreliable and unsuitable for higher-order information management functions.
Second, Watson used rules and logic to winnow the candidates. Matching that capability requires government agencies to go beyond data modeling and embrace the semantic techniques of knowledge modeling — also known as ontology development — to enable robust rules and logical inferences.
Finally, Watson used machine learning to select its final answer. That is one solution, among others (even mundane ones such as data standardization), for improving the precision and relevance of government information retrieval. Though not suitable in all domains, machine learning holds significant promise for knowledge domains we cannot model well.
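To make the final-answer selection concrete, here is a minimal logistic-style scorer that combines weighted evidence features into a confidence value for each candidate. The feature names, weights and bias are invented for this illustration and are not drawn from Watson; a production system would learn such weights from training data across many more features.

```python
import math

# Hypothetical evidence features for a candidate answer, each in [0, 1].
# Feature names, weights and bias are illustrative assumptions.
WEIGHTS = {"keyword_overlap": 2.0, "source_reliability": 1.5, "type_match": 3.0}
BIAS = -3.0

def confidence(features: dict[str, float]) -> float:
    """Logistic combination of weighted evidence features,
    yielding a confidence value in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def best_candidate(scored: list[tuple[str, dict[str, float]]]) -> str:
    """Deliver the candidate whose evidence yields the highest confidence."""
    return max(scored, key=lambda item: confidence(item[1]))[0]
```

A candidate with strong keyword overlap, a reliable source and a matching answer type scores near 1; a weakly supported candidate scores near 0, and the system can decline to buzz in when no candidate clears a confidence threshold — as Watson visibly did during the match.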