GENETIC RESEARCH AT DARTMOUTH COLLEGE WITH PSSC LABS POWERWULF CLUSTERS
PROFILE
Dartmouth College's DISCOVERY Supercomputing Cluster
CHALLENGE
Processing large, multi-gigabyte data files for human genetics research
Scaling processor cores to expand the compute cluster without increasing
the amount of space needed.
SOLUTION
Dartmouth selected system integrator PSSC Labs to supply an initial
PowerWulf Cluster consisting of 25 dual-core nodes in 2005, and now has
nearly 90 nodes with 328 processor cores, 600 gigabytes of memory,
and 11 terabytes of disk space.
IMPACT
Peter Schmitt and his team have built the largest computing cluster at
Dartmouth and one of the largest educational computing clusters in New
England. This facility enables world-renowned research into the genetic
causes of cancer and other diseases and also provides high-performance
computing resources for engineering, physics, and other programs.
ORGANIZATIONAL PROFILE
Founded in 1769, Dartmouth College is an Ivy League school that offers an outstanding
undergraduate education along with world-famous graduate institutions, including the
Tuck School of Business, Dartmouth Medical School, and the Thayer School of
Engineering.
To enhance its medical research capabilities, the College's Norris-Cotton Cancer Center hired Jason Moore, a renowned genetics research scientist, to build a large computing cluster at the College in 2004. The cluster became known as the Dartmouth Initiative for SuperComputing Ventures in Education and Research, or DISCOVERY. Moore had overseen development of a similar cluster at Vanderbilt University, and Dartmouth wanted to provide similar or better facilities for its Computational Genetics Computing Laboratory.
CHALLENGE
Increases in computing power over the past two decades have driven far more
sophisticated data analyses in the field of genetics. Many of these compute sessions
involve massive files of 20 gigabytes or more. A leading graduate
educational institution, Dartmouth College wanted to provide its genetics students with
up-to-date computing resources that would not only speed execution of their projects
but enable new and highly sophisticated analyses.
In 2005, the College's Computational Genetics Lab began an effort to build a supercomputing server cluster and hired Peter Schmitt as the Lab's technical director. A former programmer with no prior experience in building clusters, Schmitt faced a steep learning curve. He interviewed server manufacturers and system integrators while evaluating cluster management software, educating himself about how these large systems were built and run.
SOLUTION
After evaluating major suppliers of clustering hardware and software such as HP, IBM,
and Sun as well as third-party system integrators, Schmitt selected PSSC Labs, a Southern
California-based systems integrator focusing on high-performance server clusters for
corporate and government clients. "PSSC Labs had the best combination of service and
price," says Schmitt. "Some vendors had lower prices with no service, while others had
great service with very high prices. PSSC Labs had just the right combination."
In stating his requirements, Schmitt had one firm request. "We requested AMD
Opteron™ processors in the servers because we believed their memory management
was superior," he says. "Our cluster is 100 percent AMD Opteron processor-based."
This even includes the processors in legacy servers the lab owned before PSSC
brought in its equipment. PSSC Labs supplied servers with 64-bit Dual-Core AMD
Opteron processors, along with 8 gigabytes of RAM, high-speed, low-latency InfiniBand
interconnects, and one 80-gigabyte hard drive.
Although the servers arrived with nearly perfect configurations, selecting cluster management software involved a longer period of trial and error for Schmitt. "We started off with Maui and Torque as the cluster software," he says, "and we have now settled on Moab, which has been a great product."
In addition to the PSSC Labs PowerWulf cluster servers, Schmitt added some existing server nodes with single-core AMD Opteron processors to create a free pool of computing resources for the engineering, physics, and chemistry students. "We share the cluster's resources with the rest of the community," he says. "We have a buy-in process where these other departments actually purchase hardware nodes and get four years of access to the cluster. But there's always enough performance left over for the genetics jobs."
Students and professors in the Computational Genetics Lab develop and run their own applications using standard tools such as C++, FORTRAN, Perl, Python, and Java. Students from the engineering, physics, and chemistry departments use applications such as Fluent (a computational fluid dynamics tool), EMAN (a set of image/volume processing tools that perform single particle reconstructions to determine the 3-dimensional structures of molecules), and MATLAB (a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation).
As manager of the cluster, Schmitt also developed a web site that shows current utilization and usage information and facilitates scheduling to minimize request calls from potential users. The web site (http://discovery.dartmouth.edu) is open to anyone who wants to know more about the DISCOVERY cluster.
IMPACT
The performance and reliability of the PSSC Labs PowerWulf cluster servers have been
outstanding, and the cluster now supports major projects in cancer research as well as
other disciplines. "Students can now use hundreds of processors to handle a
computationally-intensive problem or to process 26 gigabytes of data in a few days
when it would have taken a year or more on a single system," says Schmitt.
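As a rough illustration of the scale of that speedup, the estimate below applies Amdahl's law to a job that would take one year on a single core. The function name and the 99 percent parallel fraction are assumptions made for this sketch; they are not measurements from the DISCOVERY cluster.

```python
def estimated_runtime_days(serial_days: float, cores: int,
                           parallel_fraction: float = 1.0) -> float:
    """Estimate wall-clock days on `cores` cores using Amdahl's law.

    parallel_fraction is the share of the work that can be spread
    across cores (hypothetical value; real jobs must be profiled).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_part = serial_days * (1.0 - parallel_fraction)      # cannot be sped up
    parallel_part = serial_days * parallel_fraction / cores    # divided across cores
    return serial_part + parallel_part


if __name__ == "__main__":
    # One year of single-core work on all 328 cores of the cluster.
    print(f"perfectly parallel: {estimated_runtime_days(365, 328):.1f} days")
    print(f"99% parallel:       {estimated_runtime_days(365, 328, 0.99):.1f} days")
```

Under these assumptions the estimate falls between roughly one and five days, which is in line with the "few days" Schmitt describes.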
InfiniBand's low-latency server interconnects are also delivering fast results. "We have a user who saw a 70 percent improvement in processing speed on a job due to the InfiniBand interconnect," says Schmitt.
Although the PowerWulf cluster started with 25 dual-core nodes in 2005, Schmitt added dozens of nodes during 2006 to bring the cluster to its current total of 328 processor cores. For the future, he wants to increase the number of processing cores. "When Quad-Core AMD Opteron processors come out, we will replace our dual-core units as they reach their end-of-life cycles," says Schmitt. "We want to get to 500 CPUs within the next two years, and the only way we're going to get there in the space available is to use quad-core CPUs." Fortunately, the sophisticated power management capabilities of the AMD Opteron processor will allow Schmitt and his team to upgrade to Quad-Core AMD Opteron processors without linear increases in power requirements.
Throughout the cluster's development, PSSC Labs has provided highly responsive support. "We get very excellent turnaround on our service requests, and we can always ship a system back to them if it's a big problem," says Schmitt.
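The space argument behind the move to quad-core processors can be sketched with simple arithmetic. The example assumes two processor sockets per node, which the case study does not state, and treats the 500-CPU goal as 500 cores; the function name is hypothetical.

```python
def nodes_needed(target_cores: int, sockets_per_node: int,
                 cores_per_socket: int) -> int:
    """Return the smallest number of nodes that supplies target_cores."""
    cores_per_node = sockets_per_node * cores_per_socket
    if cores_per_node < 1:
        raise ValueError("each node must provide at least one core")
    # Integer ceiling division: round up to a whole node.
    return -(-target_cores // cores_per_node)


if __name__ == "__main__":
    TARGET = 500   # goal quoted by Schmitt
    SOCKETS = 2    # assumption: dual-socket nodes
    print("dual-core nodes:", nodes_needed(TARGET, SOCKETS, 2))
    print("quad-core nodes:", nodes_needed(TARGET, SOCKETS, 4))
```

With these assumptions, quad-core processors reach the target with about half as many nodes as dual-core processors, and therefore about half the rack space.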
By relying on AMD processor performance and PSSC Labs' deep knowledge of educational and scientific computing clusters, Peter Schmitt and the Dartmouth Computational Genetics Laboratory have built a truly scalable community resource that is helping speed the advance of medical science.