Open Systems: Web-Basing Leads Linux to Forefront
By Jon William Toigo
Just three years ago, debates were raging in the technology trade press regarding the future of Linux and open-source computing. Would government and business be willing to host mission-critical applications on what amounted to a shareware operating system?
Today, that question seems to have been answered decisively: Linux is big business.
Shipments of open-source Unix operating system "look-alikes" accounted for one-quarter of the 5.7 million operating environment shipments made to consumers around the world in 1999, according to Dan Kusnetzky, vice president of systems software research for market research firm International Data Corp. in Framingham, Mass.
The Linux share ranked a close second to Microsoft Corp., whose NT operating system made up 38 percent of operating environment shipments.
The Linux market share could be much larger, according to Kusnetzky, owing to the fact that only purchased software, as opposed to software that is freely distributed, is included in IDC estimates.
The Linux operating system can be downloaded free of charge from numerous Web servers and file transfer protocol sites on the Internet, in addition to being purchased as shrink-wrapped software from Red Hat Inc. of Durham, N.C., Corel Corp. of Ottawa and others. It is also included on many servers shipping today, including all platforms from Dell Computer Corp.
But market share data alone is insufficient to evaluate the penetration of Linux into the mainstream of government or business computing, according to Kusnetzky. Dismissing as incorrect "a popular perception" that all operating environments are general purpose in nature, he said that end users tend to use different operating environments for very different purposes.
The four uses most often cited by companies deploying Microsoft NT and Novell Netware operating environments are, in order, file and print services, electronic messaging, communications services and database support, Kusnetzky said. By contrast, companies fielding Unix servers ranked database support as their No. 1 use for the operating system, followed by electronic messaging and custom application development.
"While functionally similar, operating environments fill very different application niches," he said.
Linux, while capable of supporting a broad range of applications, "is primarily used to support Web servers," he added. "In contrast to Unix, less than 10 percent of companies use Linux to host databases."
If Web serving tops the list of applications for which Linux typically is deployed, high-performance computing must be the second application niche for the operating system, especially within the circles of governmental scientific and technological research.
For the past two years, Linux has been at the heart of numerous government-sponsored supercomputer development efforts throughout the United States. Collaborations involving the National Science Foundation, the departments of Energy and Defense, leading academic institutions and name-brand computer hardware vendors have focused on exploiting the clustering capabilities of Linux implemented on commodity hardware platforms.
The objective has been to build "supercluster" platforms capable of doing the work of single-purpose supercomputers, but at a fraction of the cost. According to Frank Gilfeather, director of the High Performance Computing, Education and Research Center (HPCERC) at the University of New Mexico in Albuquerque, the short-term results of this activity will boost supercomputing capabilities for use in government and private research.
And in the longer term, he said, the collaborations will set the stage for a new generation of low-cost, high-performance, business computing platforms that offer scalability and manageability well surpassing existing systems.
Gilfeather and others also said forward-looking systems integrators and IT solution providers should keep an eye on the work being done at the leading academic institutions in the field, including the University of New Mexico, University of Minnesota, Carnegie Mellon University in Pittsburgh and Cornell University in Ithaca, N.Y., as well as Energy Department-backed national research laboratories, including Argonne, Sandia, Los Alamos and the Pacific Northwest National Laboratory.
These organizations are spearheading the technology that may one day support business applications amenable to high-speed parallel processing, such as data mining, according to Tom Morgan, program manager for Argonne National Laboratory near Chicago.
Inventing the Grid
Argonne is no stranger to primary research, Morgan said. The first national laboratory, Argonne was an outgrowth of the Manhattan Project and developed the first nuclear reactors for power generation and submarine propulsion.
In the early 1960s, the laboratory moved into basic scientific and mathematical research. Last fall, in an effort funded by the Energy Department's Office of Science and supported by IBM Corp. of Armonk, N.Y., and VA Linux Systems Inc. of Sunnyvale, Calif., Argonne fielded a 512-processor Linux supercluster nicknamed Chiba City.
Morgan said that the supercluster, which unites 256 IBM servers running VA Linux Systems via a combination of Fast Ethernet, Gigabit Ethernet and Myrinet interconnects, provides a scalable platform for use in "pure science research." He could not disclose the total cost of the implementation, but suggested that the laboratory's long-term partnership with IBM allowed Argonne significant discounts on hardware street prices.
Chiba City was assembled by Argonne team members in two days in September 1999, according to Morgan. It provides a flexible development environment for scalable, open-source software in four key categories: cluster management, high-performance systems software (file systems, schedulers and libraries), scientific visualization and distributed computing. The platform is used to "push the boundaries of high-end clustering and to design software and algorithms that can utilize it," Morgan said.
He said that Argonne-developed shared-processing software is at the heart of the National Computing Grid, a National Science Foundation-backed effort to provide a high-performance computing infrastructure accessible to government, academia and business. One of Argonne's designers, Ian Foster, is credited with developing the Globus Project, Morgan said, "which is the basis of most Grid software."
In addition to contributing the underlying software and algorithms, Argonne also participates directly in grid management as a member of the National Computational Science Alliance. The alliance, which comprises more than 50 universities and research labs, was formed by NSF in October 1997 with the mission of prototyping an advanced computational infrastructure for the 21st century. NSF has promised to invest $170 million in the effort over five years.
The alliance has been involved in numerous supercluster development projects, serving as a funding agent in some cases. It is headed by director Daniel Reed, who also chairs the Department of Computer Science at the University of Illinois in Champaign-Urbana.
Morgan said three universities are the major stars in the supercomputing universe: the University of Illinois, which hosts the National Center for Supercomputing Applications; the University of California at San Diego, which hosts the National Partnership for Advanced Computational Infrastructure; and the University of New Mexico, with its HPCERC.
Chiba City was not developed as part of the NSF-sponsored Grid, but as a separate, Energy Department-funded project, Morgan noted. The difference is important, as it affects how the resources may be accessed and used by universities, businesses or government agencies.
"We contribute software for the Grid, but we are not subject to the Grid's peer review processes [which determine how, when and by whom the supercluster is used]," Morgan said. Argonne has collaborated since 1990 with numerous universities, businesses and government agencies on various projects, he said, but not as part of the Grid.
Pacific Northwest National Laboratory
Scott Jackson, system administration task leader for Linux Clusters at the Pacific Northwest National Laboratory in Richland, Wash., noted that his organization's new 96-node, dual-processor Linux supercluster is not part of the national Grid either, but said the laboratory has planned several Grid-type software experiments.
As an Energy Department-backed lab, Pacific Northwest conducts research in the fields of environment, energy, health sciences and national security. The laboratory has been operated on behalf of the agency by Battelle Memorial Institute, Columbus, Ohio, since 1965.
According to Jackson, the laboratory went looking in October 1999 for a supercomputing platform to support the growing needs of its 3,400-strong staff involved in multidisciplinary scientific research.
"We looked at available technologies that would give us cost-effective operations and support for a diversity of applications, such as the molecular simulation of the effects of contamination on microorganisms," Jackson said. Using $380,000 from internal laboratory investments and funds from two Department of Energy programs, Pacific Northwest turned to Dell Computer Corp., Round Rock, Texas, to build its supercluster solution, nicknamed Colony.
"We already had a managed hardware program with Dell, and their recommended clustering solution was cost-effective," said Jackson. Contracts were made for equipment and software in November 1999.
Initially, the clustering technology selected for interconnecting the Dell PowerEdge-1300 servers was limited to 64 servers, said Jarek Nieplocha, chief scientist within Pacific Northwest's Advanced Computing Group, who contributed his insights to the cluster acquisition. When Dell delivered the equipment in January, the cluster needed to be divided into two partitions of 32 and 64 nodes, respectively.
"We upgraded the Giganet Network cLAN interconnect, operating the Virtual Interface Architecture (VIA) protocol, and merged the entire cluster into a single partition in early May," Nieplocha said.
VIA is an interface protocol that defines mechanisms for low-latency, high-bandwidth message passing between interconnected nodes. It is embraced by a number of industry-leading companies such as Compaq Computer Corp., Intel Corp. and Microsoft Corp.
The Colony has become a key platform for researchers at Pacific Northwest, according to Jackson. It is not shared with the National Computational Science Alliance Grid because of the high demand of internal programs. The Energy Department, he noted, purchased the system, and agency programs get dibs on its use.
Pacific Northwest is interested in upgrading and enlarging the Colony as budgets permit, Jackson added.
Worldwide Shipments of Operating Systems, 1999
Operating environment     Share of shipments
Microsoft NT Server       38 percent
Linux                     25 percent
Novell Netware            18 percent
Unix                      15 percent
Other                     4 percent
Total shipments: 5.7 million copies
Source: International Data Corp.
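For a sense of scale, the percentage shares can be converted into approximate unit counts. The short Python sketch below does only that arithmetic; the variable and function names are illustrative, and because the published shares are rounded and cover purchased copies only, the results are rough estimates rather than IDC figures.

```python
# Shares and total as reported in the IDC table; names are illustrative.
TOTAL_SHIPMENTS = 5_700_000
SHARES_PERCENT = {
    "Microsoft NT Server": 38,
    "Linux": 25,
    "Novell Netware": 18,
    "Unix": 15,
    "Other": 4,
}


def estimated_units(total, shares_percent):
    """Convert rounded percentage shares into approximate unit counts."""
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


units = estimated_units(TOTAL_SHIPMENTS, SHARES_PERCENT)
for name, count in units.items():
    print(f"{name:<22}{count:>12,}")
```

On these numbers, the Linux share works out to roughly 1.4 million purchased copies, before counting any free downloads.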
Los Lobos
In contrast to the national laboratories, the University of New Mexico, with its two high-performance computing centers in Albuquerque and Maui, Hawaii, is a centerpiece of the Grid computing effort.
Using a combination of Alliance and NSF funding, university money and grants from IBM, the university has fielded three clusters in as many years, according to Brian Smith and Patricia Kovatch, officials at the Albuquerque center.
The latest is Los Lobos, which features 256 IBM NetFinity 4500R dual processor servers running the Red Hat Linux operating system and linked via a Myrinet interconnect. The new supercluster, which has an estimated street price of $2.25 million, is viewed as an upgrade of the university's existing 128-node Road Runner supercluster, which was put into service last April.
Equipment was delivered in early June and was expected to go live July 1, according to Kovatch, who manages the High Performance Computing Support Group at the center.
Smith, the director there, said that Los Lobos would join the resources contributed by six educational institutions on the Grid. "Researchers can submit proposals for projects that will use compute cycles," he said. "Three allocation boards, composed of academicians at the San Diego and Illinois centers and the National Science Foundation, perform technical reviews of the reasonableness and feasibility of the proposed use, then grant time."
Kovatch said that with Los Lobos, researchers in fields ranging from astronomy and fluid dynamics to computational chemistry will have access to a computing platform capable of more than 375 billion floating-point operations per second (375 gigaflops).
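As a rough check on what that figure means per machine, the sketch below spreads 375 gigaflops across the 512 processors implied by 256 dual-processor servers. It assumes the figure is an aggregate peak for the whole cluster, which the article does not state; the variable names are illustrative.

```python
# Figures from the article; treating 375 gigaflops as an aggregate
# peak for the whole cluster is an assumption.
SERVERS = 256
PROCESSORS_PER_SERVER = 2
PEAK_GIGAFLOPS = 375

processors = SERVERS * PROCESSORS_PER_SERVER

# One gigaflops is one billion floating-point operations per second.
peak_flops = PEAK_GIGAFLOPS * 10**9

per_processor_gigaflops = PEAK_GIGAFLOPS / processors

print(f"Processors: {processors}")
print(f"Peak: {peak_flops:,} floating-point operations per second")
print(f"Per processor: {per_processor_gigaflops:.2f} gigaflops")
```

Under that assumption, each processor contributes a little under three-quarters of a gigaflops.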
In addition to its superclustering work, New Mexico also is working with IBM on hyperclustering. A hypercluster couples two or more superclusters of different machine types. Using a hypercluster, the operations of an application can be allocated to either supercluster based on the suitability of the machine type.
Gilfeather at New Mexico's HPCERC used the example of a rendering application to illustrate the concept. "The Linux cluster can do the compute operations involved in efficient rendering," Gilfeather said. "But current Linux clusters are weak in input/output operations and visualization, so these tasks could be passed to [a cluster of machines that perform these tasks well]."
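The idea of sending each part of a job to the cluster best suited to it can be sketched in a few lines of Python. This is a minimal illustration, assuming each cluster simply advertises the kinds of work it handles well; the cluster names, task kinds and function names are hypothetical and are not drawn from the university's software.

```python
# Hypothetical clusters and the kinds of work each is assumed to do well.
CLUSTER_STRENGTHS = {
    "linux_supercluster": {"compute"},
    "partner_cluster": {"io", "visualization"},
}


def pick_cluster(task_kind, strengths=CLUSTER_STRENGTHS):
    """Return the first cluster that lists this kind of task as a strength."""
    for cluster, kinds in strengths.items():
        if task_kind in kinds:
            return cluster
    raise ValueError(f"no cluster is suited to task kind {task_kind!r}")


def plan(tasks):
    """Pair each (name, kind) task with the cluster chosen for its kind."""
    return [(name, pick_cluster(kind)) for name, kind in tasks]


# The rendering example: compute stays on the Linux cluster, while
# input/output and visualization are passed to the other cluster.
rendering_job = [
    ("load scene data", "io"),
    ("trace rays", "compute"),
    ("display frames", "visualization"),
]
schedule = plan(rendering_job)
for name, cluster in schedule:
    print(f"{name} -> {cluster}")
```

A real hypercluster scheduler would also have to weigh data movement between the clusters, which this sketch ignores.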
Project Vista Azul, initiated at HPCERC in December 1999 following a grant from IBM, is expected to run through 2000 as university scientists seek to create a stable hypercluster from an IBM RS/6000 SP cluster and a Linux supercluster based on IBM NetFinity servers.
Gilfeather said he expects the project to enhance existing "community codes" and produce new codes that can be reused within the community of researchers to build other hyperclusters. This will be a foundation for building scalable heterogeneous platforms to serve the broader world of business computing, he said.
Is a Linux Cluster In Your Future?
Dave Turek, IBM's vice president of deep computing, is more reluctant to endorse Linux clustering as ready for the mainstream. He referred to universities and government laboratories as classic early adopters of new technology. They have skills that enable them to capitalize on open-source code, "free labor" in the form of skilled workers and graduate students and a motivation to do as much as possible with limited resources, Turek said.
And "they have strong economic reasons for acquiring free or low-cost operating systems and running them on commodity hardware," Turek said.
Once the Linux platform becomes operational, service and support is an even bigger issue, Turek said. Experience shows that support for Linux is more timely than support for other software products ("if you have a problem, send it out via e-mail and you will probably find another Linux programmer somewhere in the world who can help you at any time day or night"), but he questioned whether companies will be willing to surrender their shrink-wrap software service agreements.
Al Stutz, director of high-performance computing at the Ohio Supercomputer Center in Columbus, said "only time will tell the answer to that issue." Stutz said support from Silicon Graphics Inc. of Mountain View, Calif., with respect to his center's 128-processor supercluster has been very good to date.
New Mexico's Kovatch said the calls she fields from businesses are increasing. "We get calls from companies who are interested in Linux clusters. [They] are finding that relying on one vendor to look after their software [leads to systems] that are down more often than they are up," she said.
The ultimate determination of Linux clustering, superclustering or hyperclustering success will be the number of applications found suitable to the platform, said Morgan at Argonne. "There is a divided opinion over what is useful or appropriate for high-performance computing even in scientific research," he said. "Being able to get support is important. But it is also important to identify applications that can take advantage of parallel processing.
"Certain types of code will never run on this platform," he said. "Data mining, however, is an example of a business app that is also a parallel app. There may be others that haven't been invented yet."
The Lowdown on Clustering, Etc.
By Jon William Toigo
The terminology of high-performance clustering can sound more like astronomy than computer science. Here are brief definitions of the key terms:
Clustering: Connecting two or more computers in such a way that they behave like a single computer. Clustering is used for parallel processing of applications, for load balancing and for fault tolerance. Most IT professionals are familiar with failover clustering, in which a standby server takes over the load from a primary server if the primary server fails.
Application clusters, invented by Digital Equipment Corp. nearly 20 years ago, are tight couplings of servers that represent themselves to applications as a single, virtual server system. Application operations can be distributed among the processors of an application cluster to make the best possible use of the computing resources available, resulting theoretically in the best possible application performance.
Interconnect: A technology for coupling the servers that serve as the nodes of a cluster. Interconnects range from Fast Ethernet and Gigabit Ethernet network connections to high-performance and low-cost products, such as Myrinet from Myricom Inc., to proprietary (and often extremely expensive) technologies from Compaq Computer Corp., Silicon Graphics Inc. and others.
Parallel Processing: The simultaneous use of more than one central processing unit to execute an application program. Theoretically, parallel processing makes a program operate more efficiently because there are more processing engines to support it. In practice, it is often difficult to divide a program to capitalize on multiple CPUs without having program operations interfere with each other. Parallel processing is different from multitasking, in which a single processor executes several programs at once.
Superclustering: A term coined by the University of New Mexico High Performance Computing, Education and Research Center to describe a class of clusters featuring nodes with fast processors and large memories coupled via a high-bandwidth (greater than one gigabit per second), low-latency (under 20 microseconds) interconnect technology.
Hyperclustering: Another University of New Mexico HPCERC-coined term, describing a clustering architecture in which two or more superclusters, often with different node operating systems (e.g., IBM AIX and Linux), are coupled via a common interconnect technology. Application processes may be divided between the joined superclusters based on the appropriateness of the supercluster to the application program task.
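For readers who want to see the parallel processing idea in miniature, the Python sketch below divides one job into independent chunks, hands each chunk to a separate worker and combines the partial results. It is an illustration only, and the function names are invented for the example. It uses threads from Python's standard library to show the structure; on a real cluster the same divide-and-combine pattern would run across separate processes or nodes, typically with a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by one worker; it touches only its own chunk."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Divide the job, run the pieces on separate workers, combine results."""
    chunks = split_into_chunks(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


data = list(range(1, 1001))
total = parallel_sum_of_squares(data)
print(total)
```

The job divides cleanly because no chunk depends on another. Programs whose steps must share or update the same data are the ones that are hard to split across multiple CPUs.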