Return of the supercomputers
Experts debate U.S. response to Japan's Earth Simulator
- By Brad Grimes
- Feb 05, 2004
Christopher Jehn, vice president of government programs for Cray Inc., said Japan is making improvements to its Earth Simulator while the U.S. supercomputing community awaits additional research funding.
In late January, the National Science Foundation flipped the switch on the first phase of its TeraGrid supercomputing project. By lashing together large-scale computing systems at research centers around the country, the TeraGrid project delivers 4.5 teraflops of computing power that can be used for everything from astrophysics to biomolecular research.
After the second phase goes live, NSF expects TeraGrid to be capable of more than 20 teraflops. That's more than 20 trillion floating-point operations per second.
But as supercomputing users know, 20 teraflops represents only half the theoretical performance of the world's fastest supercomputer: Japan's Earth Simulator, built by NEC Corp. and funded by the Japanese government at a cost of between $350 million and $500 million.
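To put those figures side by side, here is a minimal Python sketch of the arithmetic. The Earth Simulator's peak is not quoted directly; it is inferred from the statement that 20 teraflops is half of it. The function and variable names are illustrative only.

```python
# One teraflop is one trillion (10**12) floating-point operations per second.
FLOPS_PER_TERAFLOP = 10**12


def teraflops_to_flops(teraflops: float) -> float:
    """Convert a teraflops figure to raw floating-point operations per second."""
    return teraflops * FLOPS_PER_TERAFLOP


def fraction_of(system_tf: float, reference_tf: float) -> float:
    """Return a system's performance as a fraction of a reference system's."""
    return system_tf / reference_tf


# Figures quoted in the article.
teragrid_phase1_tf = 4.5
teragrid_phase2_tf = 20.0

# Inferred, not quoted: 20 teraflops is described as half the Earth
# Simulator's theoretical peak, so the peak works out to 20 / 0.5.
earth_simulator_peak_tf = teragrid_phase2_tf / 0.5

if __name__ == "__main__":
    print(f"TeraGrid phase 2: {teraflops_to_flops(teragrid_phase2_tf):.1e} flops")
    print(f"Inferred Earth Simulator peak: {earth_simulator_peak_tf} teraflops")
    share = fraction_of(teragrid_phase1_tf, earth_simulator_peak_tf)
    print(f"TeraGrid phase 1 share of that peak: {share:.1%}")
```

These are theoretical figures; the sketch says nothing about what applications actually achieve on either system.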
This power disparity has the U.S. supercomputing community wringing its hands and federal officials seeking funds for advanced supercomputing systems.
"It's a sign that the United States has not focused enough on high-performance computing technologies in recent years," said Jack Dongarra, a computer scientist at the University of Tennessee and one of the keepers of an authoritative list of the top 500 supercomputing sites.
Dongarra pointed to congressional testimony and a series of reports indicating that government and industry might be regaining their focus.
The Defense Department and a separate interagency task force set up by the President's Office of Science and Technology Policy have issued reports to guide supercomputing funding.
Agencies such as NSF and the Energy Department enjoyed 2004 budget increases for supercomputing research. For example, Congress increased the Energy Department's 2004 budget for advanced scientific computing research by $30 million to more than $200 million for the express purpose of acquiring additional computing power.
At NSF, the 2004 budget for its Partnerships for Advanced Computational Infrastructure was up 19 percent to $87 million, including $20 million for hardware upgrades.
But the president's 2005 budget request for the Energy Department's supercomputing research, released last week, provided only a slight increase to $204.3 million.
Meanwhile, NSF's PACI program ends in 2004 but effectively will be replaced by a widely shared cyberinfrastructure program, highlighted by TeraGrid. NSF requested $137.9 million for the program.
In the meantime, supercomputer manufacturers are jockeying for position, pitting old technologies against new in an effort to convince people that their way of supercomputing will bring the title of fastest supercomputer back to U.S. soil.
WHAT'S AT STAKE
When it was confirmed that Japan had the fastest supercomputer in the world, Dongarra characterized the ensuing furor as "Computenik," an allusion to the space race with the Soviet Union.
Aside from the bragging rights that would come with housing the world's fastest supercomputer, experts said there are practical reasons for the United States to invest heavily in next-generation technologies.
In testimony before the House Science Committee last July, Ray Orbach, the Energy Department's Office of Science director, said the United States needs to invest in greater supercomputing power so U.S. businesses can remain competitive.
Orbach described how General Motors uses 3.5 teraflops of supercomputing power to design cars but requires up to 50 percent more power annually to keep up with higher safety standards. This need, Orbach said, "will not be met by existing architectures and technologies."
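A rough sketch of how quickly that demand compounds, assuming the "up to 50 percent" figure is treated as a fixed annual growth rate (an upper bound, and an assumption of this sketch rather than something Orbach stated). The 20-teraflop target used below is borrowed from TeraGrid's planned second phase purely for scale.

```python
GM_CURRENT_TF = 3.5    # teraflops GM uses today, per Orbach's testimony
ANNUAL_GROWTH = 0.50   # "up to 50 percent more power annually", taken as an upper bound


def projected_teraflops(years: int,
                        start_tf: float = GM_CURRENT_TF,
                        growth: float = ANNUAL_GROWTH) -> float:
    """Compound the starting requirement forward by a number of whole years."""
    return start_tf * (1.0 + growth) ** years


def years_to_reach(target_tf: float,
                   start_tf: float = GM_CURRENT_TF,
                   growth: float = ANNUAL_GROWTH) -> int:
    """Count whole years until the compounded requirement meets the target."""
    years = 0
    level = start_tf
    while level < target_tf:
        level *= 1.0 + growth
        years += 1
    return years


if __name__ == "__main__":
    for year in range(6):
        print(f"year {year}: {projected_teraflops(year):.2f} teraflops")
    print("years until the requirement passes 20 teraflops:", years_to_reach(20.0))
```

Under that assumption, the requirement would more than triple in three years and pass 20 teraflops in five.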
Paul Muzio, vice president for high-performance computing programs at Minneapolis-based Network Computer Services Inc. (NCSI), said private- and public-sector researchers need more power to solve new problems.
"If a researcher has a limited resource, he will structure his work within that limited resource," Muzio said. "If you provide him with a computing system that has significantly more capability, he will extend his approach ... to solve new problems."
NCSI handles systems integration and services for the Army High Performance Computing Research Center. The center has several supercomputers with a combined 2.9 teraflops of power. The Defense Department uses the systems for research into armor and anti-armor design, biological and chemical defenses and fluid dynamics.
Experts are also concerned about the lure of a system such as the Earth Simulator.
"Our scientists ... have to go to Japan to get access to this Earth Simulator," Dongarra said. "We may be losing some talent, because they have to go someplace else" for the best equipment to help them solve problems.
Last April, in response to congressional worries over inadequate supercomputing resources, the Defense Department issued a report called "High Performance Computing for the National Security Community." The report recommended extensive government investment in supercomputing research.
The president's 2004 budget request established the High-End Computing Revitalization Task Force, an interagency group that spent last year preparing its recommendations. The report will be out soon.
"We expect the [task force] report to be similar to the DoD report in its recommendations," said Christopher Jehn, vice president of government programs for Seattle-based Cray Inc.
Jehn, however, noted that improvements are being made to Japan's Earth Simulator.
"Meanwhile, we haven't embarked on anything closely resembling the program that the Defense Department report issued last April called for," he said.
Although today's U.S. supercomputers can't compete with the Earth Simulator, that doesn't stop their proponents from touting their relative strengths.
Christopher Willard, vice president of high-performance technologies research at IDC in Framingham, Mass., said that although supercomputers with RISC processors are still popular, clusters of commodity-priced processors are quickly gaining acceptance.
In the 1990s, companies such as IBM Corp. began stringing together inexpensive processors or servers to create supercomputer clusters. These clusters use chips from household names such as Apple Computer Inc., IBM, Intel Corp. and others.
Aside from the Earth Simulator, many of the world's top supercomputers use clusters of commodity processors.
"If you have a problem that is well suited for a cluster, then it is the most cost-effective way to solve the problem," Willard said.
The Earth Simulator, on the other hand, uses specialized vector processors and high-speed interconnects developed specifically for supercomputers by NEC. The vector processors are of the type Cray developed for its newest line of X1 supercomputers.
Cray, which invented supercomputing in the 1970s but was left for dead when Silicon Graphics Inc. acquired the company in 1996, has enjoyed a rebirth. Bought back from SGI by a Seattle company, the new Cray has sold several of its X1 supercomputers to research facilities, including the Army High Performance Computing Research Center.
The X1 uses an interconnect technology developed by Cray that is faster than the technologies used in other supercomputers. So-called interconnects move data between processors and system memory.
"On commodity-based systems, you might see about 5 percent of peak utilization," Muzio said, referring to the share of the total supercomputing power that an application can use at one time. "That's because of the way the processors access memory."
Muzio said the Cray X1 his company deployed for the Army research center can achieve 35 percent peak use.
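The gap between those two percentages is easier to see as arithmetic. The sketch below assumes sustained performance is simply peak performance multiplied by the utilization share Muzio cited. The 1-teraflop peak is a hypothetical round number for illustration, not the rating of any system named here.

```python
COMMODITY_UTILIZATION = 0.05   # "about 5 percent of peak," per Muzio
CRAY_X1_UTILIZATION = 0.35     # 35 percent, per Muzio
HYPOTHETICAL_PEAK_TF = 1.0     # illustrative peak rating, not from the article


def sustained_teraflops(peak_tf: float, utilization: float) -> float:
    """Estimate the performance an application sees as peak times utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_tf * utilization


def peak_needed(sustained_tf: float, utilization: float) -> float:
    """Return the peak rating required to sustain a given level of performance."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be greater than 0 and at most 1")
    return sustained_tf / utilization


if __name__ == "__main__":
    commodity = sustained_teraflops(HYPOTHETICAL_PEAK_TF, COMMODITY_UTILIZATION)
    vector = sustained_teraflops(HYPOTHETICAL_PEAK_TF, CRAY_X1_UTILIZATION)
    print(f"commodity system sustains {commodity:.2f} teraflops")
    print(f"vector system sustains {vector:.2f} teraflops")
    print(f"commodity peak needed to match: {peak_needed(vector, COMMODITY_UTILIZATION):.1f} teraflops")
```

On those numbers, a commodity system would need roughly seven times the peak rating to match the vector system on the same application. The real ratio depends on the workload, as Muzio notes later.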
David Turek, vice president of Deep Computing at Armonk, N.Y.-based IBM, bristles at the idea that commodity processor-based supercomputers are somehow inferior to vector processor-based systems built by Cray or NEC.
"The Earth Simulator has gotten some degree of disproportionate attention, because it's like building a 300-mph car for the auto industry and declaring that, therefore, the guys in Stuttgart and Detroit don't know what they're doing anymore," Turek said.
Turek said the number of teraflops a supercomputer such as the Earth Simulator can handle is irrelevant. For one thing, no application exists that can take advantage of that level of computing power, which is why the theoretical peak performance of a supercomputer is significantly higher than its actual performance under real-world conditions.
In addition, Turek said, the available computing resources are inevitably divvied up among the facility's users. According to Turek, access is the best measure of a supercomputer.
"Research is not so much a function of the size of the biggest system that exists," Turek said. "It's more a function of how ubiquitous technology is."
Turek cited the TeraGrid, which IBM helped develop, as an example of accessible computing power.
Others are not as impressed.
"Just because we hook a bunch of computers together doesn't give us the ability to solve scientific problems," Dongarra said.
Muzio, whose company manages both vector-based and commodity-based supercomputers, said there are applications where each is the better solution. Compute-intensive tasks are well suited to clusters, while applications that must move large amounts of data around the system do well on vector-based systems such as the Cray X1.
Cray itself is not above using commodity processors to build a supercomputer. Last fall, the company won a $90 million contract from the Sandia National Laboratory, an Energy Department facility in Albuquerque, N.M., to develop a new supercomputer. The new system will combine Cray's high-speed interconnect technology with commodity processors from Sunnyvale, Calif.-based Advanced Micro Devices Inc.
Cray expects to deliver the system by the end of the year and to reach 40 teraflops of theoretical peak performance by 2005.
What everyone agrees on is the need for fundamental research into next-generation supercomputing technologies. The question is: Who will pay for the necessary R&D?
"Every major vendor has told the government that they simply will not make the kinds of investments in high-performance computing that a lot of us think is necessary," Jehn said. "It's just not profitable."
Dongarra said grids would prove more effective if algorithms could be developed to use distributed resources more effectively. Experts also said research into new software, hardware and basic materials is needed to lay a foundation for future breakthroughs.
"Even if I had the money right now, I'm not sure I'd buy a leadership-type machine," Orbach said. He said the most efficient way to spend the Energy Department's 2005 budget for advanced computing science research is to develop new technologies, and determine what supercomputing architecture will be most effective going forward.
So far, the Defense Department has been the main source of funding for supercomputing research. Last summer, the Defense Advanced Research Projects Agency awarded three contracts, worth more than $146 million, to Cray, IBM and Santa Clara, Calif.-based Sun Microsystems Inc. The awards are part of DARPA's High Productivity Computing Systems project to develop next-generation technologies.
"It is a race," Willard said. "It's a technological race, and it behooves the United States to maintain a high level of technical expertise and capability."
Staff Writer Brad Grimes can be reached at email@example.com.