IBM makes open source analytics move

By Nick Wakeman,
Editor-in-Chief, Washington Technology

June 15, 2015

IBM makes a deeper commitment to open source data analytics by agreeing to embed Apache Spark into some of its platforms and donating its machine-learning software to the open source community.

IBM has been involved with Apache Spark, an open source data analytics project, since its inception but has now upped the ante by releasing some of its own software and adding the technology to several of its own products.

Big Blue announced plans today to embed Spark into its analytics and commerce platforms and offer Spark as a service on Bluemix. The company also will put more than 3,500 researchers and developers to work on Spark-related projects, IBM said in a release.

As part of the commitment, IBM is donating IBM SystemML, its machine learning technology, to the Spark open source ecosystem.

Apache Spark began as a project at the University of California, Berkeley in 2009, and IBM claims it is the fastest-growing open source project in history.

Spark should help federal agencies use and manage the massive amounts of data they produce more quickly.

For example, IBM is working with NASA and the SETI Institute to analyze terabytes of deep space radio signals using Spark’s machine learning capabilities.

“With Spark as a Service on Bluemix, we’ll be able to work with IBM to develop promising new ways to analyze signal data as we hunt for evidence of intelligence elsewhere in the cosmos,” said Seth Shostak, senior astronomer and director of the Center for SETI Research.

Any agency that has to juggle and analyze massive amounts of data is a potential user of Spark.

IBM officials also pointed to the Agriculture Department as an example. USDA collects data on farming and food inspections, as well as economic data related to food production. It also has access to weather data and agricultural data from around the world. Spark can put all of those disparate forms and sources of data into a single data stream for analytics.

IBM Watson Health Cloud also will use Spark.

IBM is one of four founding members of the UC Berkeley AMPLab, where Spark was invented.
