ASCR Monthly Computing News Report - December 2010



Team Exploring Technology at the Nanoscale Breaks Petaflop on Jaguar
A team led by Gerhard Klimeck of Purdue University has broken the petascale barrier while addressing a relatively old problem in the very young field of computer chip design. Using Oak Ridge National Laboratory’s Jaguar supercomputer, Klimeck and Purdue colleague Mathieu Luisier reached more than a thousand trillion calculations a second (1 petaflop) modeling electrons as they travel through electronic devices at the smallest possible scale. Klimeck, leader of Purdue’s Nanoelectronic Modeling Group, and Luisier, a member of the university’s research faculty, used more than 220,000 of Jaguar’s 224,000 processing cores to reach 1.03 petaflops.
The team is pursuing this work on Jaguar with two applications, known as Nanoelectronic Modeling (NEMO) 3D and OMEN (a more recent effort whose name is an anagram of NEMO). The team calculates the most important particles in the system, the valence electrons located on atoms’ outermost shells, from their fundamental properties. These are the electrons that flow in and out of the system. The applications approximate the behavior of less critical particles, namely the atomic nuclei and the electrons on the inner shells.
The team is working with two experimental groups to bring this research to the real world. One is led by Jesus Del Alamo at the Massachusetts Institute of Technology, the other by Alan Seabaugh at Notre Dame. With Del Alamo’s group the team is looking at making the electrons move through a semiconductor faster by building it from a material called indium arsenide rather than silicon. With Seabaugh’s group the modeling team is working on band-to-band-tunneling transistors, which could dramatically reduce the energy consumption in traditional field-effect transistors.
Contact: Jayson Hines
ALCF Helps Deliver “Green” Wind Turbines and Jet Engines
Understanding the complex turbulence noise-generation mechanism from wind turbine airfoils and jet exhaust nozzles is critical to delivering the next generation of “green,” low-noise-emission wind turbines and jet engines. Scientists at GE Global Research are leveraging their INCITE allocation at the Argonne Leadership Computing Facility (ALCF) to develop and prove high-fidelity, direct-from-first-principles predictions of noise to characterize these hard-to-measure acoustic sources.
Comparisons with experimental data show that large eddy simulation (LES) predictions for both single-flow and dual-flow nozzles are successful in predicting the turbulent flow evolution. Far-field acoustics prediction based on the near-field flow data compares very favorably to the experiments. This first-principles-based approach can be used for design guidance. In addition, preliminary investigation of the airfoil simulation demonstrated an ability to correctly predict the eddy convection velocity in the boundary layer – an important step towards improved modeling of turbulent boundary layer noise sources for reduced order methods.
With proof-of-concept aero and acoustic LES calculations completed, the research team is pursuing the use of LES and high performance computing to design noise-reduction features, demonstrate numerical wind tunnel capability, and improve the efficiency of numerical algorithms and parallel scalability beyond 32K cores.
Contact: Umesh Paliath
Berkeley Lab Team Wins Best Paper at CloudCom
Cloud computing has proven to be a cost-efficient model for many commercial web applications, but will it work for scientific computing? Not unless the cloud is optimized for it, writes a team from the Lawrence Berkeley National Laboratory.
After running a series of benchmarks designed to represent a typical midrange scientific workload—applications that use less than 1,000 cores—on Amazon’s EC2 system, the researchers found that the EC2’s interconnect severely limits performance and causes significant variability. Overall, the cloud ran six times slower than a typical midrange Linux cluster, and 20 times slower than a modern high performance computing system.
The team’s paper, “Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud,” was honored with the Best Paper Award at the IEEE’s International Conference on Cloud Computing Technology and Science (CloudCom 2010) held Nov. 30-Dec. 1 in Bloomington, Ind. The authors were Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, John Shalf, Harvey J. Wasserman, and Nicholas J. Wright.
The award is the second such honor for Jackson and Ramakrishnan this year. Along with Berkeley Lab colleagues Karl Runge of the Physics Division and Rollin Thomas of the Computational Cosmology Center, they won the Best Paper Award at the Association for Computing Machinery’s ScienceCloud 2010 workshop for “Seeking Supernovae in the Clouds: A Performance Study.”
PNNL Releases Climate Analysis Tool: Parallel Analysis of Geodesic Data (Pagoda)
Researchers at Pacific Northwest National Laboratory have released version 0.5 of “pagoda,” a suite of parallel climate analysis command-line tools and APIs that uses PNNL’s Global Arrays toolkit. Pagoda is a first-of-its-kind climate analysis tool to use a data-parallel design and parallel IO, making it capable of distributing large datasets.
This project directly supports the development of the Global Cloud Resolving Model (GCRM) being developed by Colorado State University. The GCRM will soon be running multi-day simulations at 4 km grid spacing, generating massive amounts of data. At this scale, a single snapshot of a single 3D variable would take up to 50 GB. Post-processing and analysis of the data is not feasible given today’s analysis tools; most climate analysis tools are serial and cannot handle data of this size, nor can they correctly handle the requirements of the geodesic grid used by the GCRM.
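As a rough check on the snapshot size quoted above, the sketch below estimates the storage needed for one 3D variable on a global geodesic grid. The cell count uses the standard formula for an icosahedral geodesic grid, 10 * 4^r + 2 cells at refinement level r. The refinement level (11), the number of vertical levels (256), and 4-byte single-precision values are illustrative assumptions, not figures published by the GCRM team.

```python
EARTH_SURFACE_KM2 = 5.1e8  # approximate surface area of the Earth


def geodesic_cells(refinement: int) -> int:
    """Number of cells in an icosahedral geodesic grid at a given refinement level."""
    return 10 * 4 ** refinement + 2


def snapshot_bytes(cells: int, levels: int, bytes_per_value: int) -> int:
    """Bytes needed to store one 3D variable at one time step."""
    return cells * levels * bytes_per_value


# Assumed values for illustration only.
REFINEMENT = 11        # assumed refinement level for a grid of roughly 4 km
VERTICAL_LEVELS = 256  # assumed number of vertical levels
BYTES_PER_VALUE = 4    # assumed single-precision storage

cells = geodesic_cells(REFINEMENT)
# Square root of the average area per cell, a crude measure of grid spacing.
spacing_km = (EARTH_SURFACE_KM2 / cells) ** 0.5
size_gb = snapshot_bytes(cells, VERTICAL_LEVELS, BYTES_PER_VALUE) / 1e9

if __name__ == "__main__":
    print(f"cells: {cells:,}")
    print(f"approximate spacing: {spacing_km:.1f} km")
    print(f"one 3D snapshot: {size_gb:.1f} GB")
```

Under these assumptions a single snapshot comes to a few tens of gigabytes, the same order of magnitude as the figure quoted above; double-precision values or more vertical levels would push it higher.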
Pagoda specializes in using parallel IO via the Parallel NetCDF library and also operates on regularly gridded data, making it a general tool for climate analysis and post processing. Primary users include the group at CSU; however, this recently released version has also been picked up by Argonne as well as the National Center for Atmospheric Research (NCAR) for evaluation.
Contact: Mary Anne Wuennecke
Improving HPC I/O Performance on ALCF Leadership-Class Supercomputers
The performance mismatch between the computing and I/O components of current-generation HPC systems has made I/O the critical bottleneck for data-intensive scientific applications. I/O forwarding attempts to bridge this increasing performance gap by regulating the I/O traffic. Researchers’ evaluation of the I/O forwarding mechanism in the IBM Blue Gene/P system revealed significant performance bottlenecks caused by resource contention on the I/O node. Their approaches to overcome this bottleneck include an I/O scheduling mechanism that leverages a work-queue model and an asynchronous data staging mechanism.
On the leadership computers at Argonne National Laboratory, these approaches yield up to 53 percent improvement in performance over the existing I/O forwarding mechanism and close to 95 percent efficiency of the maximum achievable end-to-end throughput between the BG/P compute nodes in a pset and the file server nodes. Argonne researchers Venkatram Vishwanath, Mark Hereld, Kamil Iskra, Dries Kimpe, Vitali Morozov, Michael E. Papka, Robert Ross, and Kazutomo Yoshii presented their work in a paper titled “Accelerating I/O Forwarding in IBM Blue Gene/P Systems” at the SC10 conference held in November in New Orleans.
The researchers believe this is a significant step toward scaling the I/O performance of applications on current large-scale systems and will provide insight for the design of I/O architectures for exascale systems.
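The general idea of a work-queue model combined with asynchronous data staging can be illustrated with a small sketch. This is a toy model using Python threads, not the Argonne researchers’ implementation; the class name `IONode`, its methods, and the in-memory dictionary standing in for the file servers are all hypothetical.

```python
import queue
import threading


class IONode:
    """Toy I/O node: write requests are staged in a queue and drained by a fixed worker pool."""

    def __init__(self, num_workers: int = 2):
        self.requests = queue.Queue()
        self.storage = {}  # stands in for the file server nodes
        self.lock = threading.Lock()
        self.workers = [
            threading.Thread(target=self._drain, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self.workers:
            worker.start()

    def write_async(self, rank: int, offset: int, data: bytes) -> None:
        """Stage a copy of the data and return immediately, so the caller can keep computing."""
        self.requests.put((rank, offset, bytes(data)))

    def _drain(self) -> None:
        """Worker loop: pull staged requests off the queue and forward them to storage."""
        while True:
            item = self.requests.get()
            if item is None:  # shutdown signal
                self.requests.task_done()
                break
            rank, offset, data = item
            with self.lock:
                self.storage[(rank, offset)] = data
            self.requests.task_done()

    def flush(self) -> None:
        """Block until every staged request has been forwarded."""
        self.requests.join()

    def close(self) -> None:
        """Stop the worker pool."""
        for _ in self.workers:
            self.requests.put(None)
        for worker in self.workers:
            worker.join()


def run_demo(num_ranks: int = 8, chunks: int = 4) -> dict:
    """Simulate several compute-node ranks issuing writes through one I/O node."""
    node = IONode(num_workers=2)
    for rank in range(num_ranks):
        for chunk in range(chunks):
            node.write_async(rank, chunk * 4, bytes([rank]) * 4)
    node.flush()
    node.close()
    return node.storage


if __name__ == "__main__":
    print(len(run_demo()), "chunks forwarded")
```

Limiting the number of workers caps how many requests contend for the I/O node at once, while the staging queue lets the simulated compute ranks return from a write without waiting for it to reach storage.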
PNNL Releases Task Scheduling Library: tascel
Researchers at Pacific Northwest National Laboratory have released version 0.1 of tascel (pronounced “tassel”), a task-scheduling library. The library enables scheduling and load balancing of task-based computations on distributed memory machines. Typical parallel applications are organized as a collection of processes, one per processor. This requires the programmer to organize the application such that all processes perform the same amount of work. If some of the processes have less work than others, they sit idle while waiting for the other processes to complete their work. This results in load imbalance – a challenge for application programmers, especially as computers scale in the number of processors. Tascel automatically balances the load across the processors, relieving the programmer of the problem. The programmer organizes the work to be done as a collection of tasks, units of work that can be performed in parallel. Tascel schedules these tasks, together with the associated data movement, based on the constraints expressed by the programmer. The library is distributed as part of the Global Arrays suite, a collection of libraries for global address space programming. Researchers are currently evaluating the algorithms in the context of computational chemistry kernels.
Contact: Mary Anne Wuennecke
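The load-imbalance problem described in the tascel item can be illustrated with a small simulation. The sketch below does not use tascel’s interface or reproduce its algorithms; it simply compares a static split of tasks against a generic dynamic scheme in which each idle worker pulls the next task from a shared pool. The task costs and function names are hypothetical.

```python
import heapq


def static_makespan(costs, workers):
    """Finish time when tasks are split into equal-sized contiguous blocks, one per worker."""
    block = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)


def dynamic_makespan(costs, workers):
    """Finish time when the worker that becomes idle first always takes the next task."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # worker that becomes idle first
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical task costs: a few expensive tasks followed by many cheap ones.
TASK_COSTS = [8, 7, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1]

if __name__ == "__main__":
    print("static: ", static_makespan(TASK_COSTS, 4))
    print("dynamic:", dynamic_makespan(TASK_COSTS, 4))
```

With the static split, one worker receives all of the expensive tasks while the others sit idle; pulling tasks on demand spreads the same work far more evenly.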


LBNL’s Niels Gronbech-Jensen Named APS Fellow
Niels Gronbech-Jensen, a faculty scientist in CRD’s Scientific Computing Group and a professor in the Applied Science Department at UC Davis, has been named a fellow of the American Physical Society, nominated by the Computational Physics Division. He was cited “for his development and application of new computational algorithms and tools in Biological and Condensed Matter Physics, especially those involving massively parallel molecular dynamics, electrostatic interactions, ion implantation, and nonlinear physics.”
LBNL’s Kathy Yelick Contributes to NRC Report on Future of Computing Performance
The rapid advances in information technology that drive many sectors of the U.S. economy could stall unless the nation aggressively pursues fundamental research and development of parallel computing — hardware and software that enable multiple computing activities to process simultaneously — says a new report by the National Research Council (NRC). Better options for managing power consumption in computers will also be essential for continued improvements in IT performance.
The report, titled The Future of Computing Performance: Game Over or Next Level?, was written by the NRC’s Committee on Sustaining Growth in Computing Performance. Berkeley Lab Associate Laboratory Director Kathy Yelick was the only representative of a government agency on the committee and contributed to the report. A National Academies news release summarizes the report’s conclusions and recommendations.
LLNL’s Dona Crawford Profiled as Rock Star of HPC by insideHPC Newsletter
Dona Crawford, Associate Director for Computation at Lawrence Livermore National Laboratory, is the latest HPC luminary to be spotlighted as a “Rock Star of HPC” by insideHPC. Writer Mike Bernhardt notes “From her days as one of the original leaders of the Accelerated Strategic Computing Initiative (ASCI) program, a national effort dating back to the early 90s, to her current position as Associate Director for Computation at Lawrence Livermore National Laboratory (LLNL) where she is responsible for a staff of roughly 900, she has built a tremendous following of loyal employees and close friends. I have heard numerous colleagues refer to Dona as a true leader who inspires and motivates with vision and passion.”
ESnet’s Inder Monga Gives Talks at Cisco’s “Nerd Lunch,” IEEE ANTS
Collaborating across distances poses not just technological challenges, one of ESnet’s core areas of expertise, but also communication challenges. That’s what Inder Monga, ESnet’s Area Lead for Research and Services, discovered when he was invited to Cisco Systems’ headquarters to give a presentation in their Nerd Lunch series on “Collaborative Science Research: Driving Network Innovation at ESnet.” With more employees attending the talk from their desks than in person, Monga was deprived of visual feedback on how the audience was responding to the presentation, and he is rethinking his approach.
Monga gave another invited talk that required a little more travel in the 100 Gbps Networking Session at the IEEE International Symposium on Advanced Networks and Telecommunications Systems (ANTS), held December 16–18 in Mumbai, India. A paper on “A Heuristic for Combined Protection of IP Services and Wavelength Services in Optical WDM Networks,” which he coauthored with ESnet’s Chin Guok and collaborators from UC Davis and Politecnico di Milano, Italy, was presented in the High-Speed Networks Session.


OLCF Announces 930 Million Hours for 2011 INCITE Projects
The Oak Ridge Leadership Computing Facility (OLCF) will provide more than 930 million processor hours via the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, the U.S. government’s premier supercomputing allocation, jointly managed by the Department of Energy’s leadership computing facilities at Argonne and Oak Ridge National Laboratories. In all, the OLCF will host 32 projects representing a wide array of scientific inquiry, from combustion to climate to chemistry. “This year’s group of proposals was probably the best we’ve seen to date,” said OLCF Director of Science Bronson Messer. “The final list of awardees is a collection of projects that we believe will have remarkably high scientific impact through the use of leadership computing resources.”
After peer review and computational readiness evaluations, projects were selected for their potential to advance scientific discoveries and speed technological innovations, and for their ability to make use of hundreds of thousands of processors working in concert to do so. OLCF computational and computer science experts play a vital role in ensuring INCITE researchers fully harness the power of the supercomputers to meet their goals. They work closely with the scientists and engineers to enhance the fidelity and scalability of the simulations. The INCITE program will sponsor a total of 1.7 billion computing hours at both the OLCF and the Argonne Leadership Computing Facility, located at Argonne National Laboratory in Chicago, Illinois.
Contact: Jayson Hines
ALCF to Support 30 Research Projects through INCITE Program
Based on their potential for breakthroughs in science and engineering research, 30 projects have been awarded a total of 732 million hours of computing time at Argonne’s Leadership Computing Facility (ALCF) as part of the DOE’s INCITE program. The projects include research in the areas of biological sciences, chemistry, computer science, earth science, energy technologies, engineering, materials science and physics. The projects will be run on ALCF’s Intrepid supercomputer, a 40-rack IBM Blue Gene/P capable of a peak performance of 557 Teraflops (557 trillion calculations per second). Intrepid features a low-power system-on-a-chip architecture, reducing power demands and lowering operating costs by using one-third as much electricity as other machines of comparable size.
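The peak figure quoted for Intrepid can be reproduced with simple arithmetic. The rack count and the 557-teraflop peak come from the text above; the per-rack node count, cores per node, clock rate, and floating-point operations per cycle are the commonly published Blue Gene/P specifications and are stated here as assumptions.

```python
RACKS = 40               # from the text above
NODES_PER_RACK = 1024    # assumed: commonly published Blue Gene/P figure
CORES_PER_NODE = 4       # assumed: quad-core compute nodes
CLOCK_HZ = 850e6         # assumed: 850 MHz cores
FLOPS_PER_CYCLE = 4      # assumed: floating-point operations per core per cycle


def peak_flops(racks: int) -> float:
    """Theoretical peak floating-point rate for a given number of racks."""
    cores = racks * NODES_PER_RACK * CORES_PER_NODE
    return cores * CLOCK_HZ * FLOPS_PER_CYCLE


total_cores = RACKS * NODES_PER_RACK * CORES_PER_NODE
peak_teraflops = peak_flops(RACKS) / 1e12
per_rack_teraflops = peak_flops(1) / 1e12

if __name__ == "__main__":
    print(f"cores: {total_cores:,}")
    print(f"peak: {peak_teraflops:.0f} teraflops")
    print(f"per rack: {per_rack_teraflops:.1f} teraflops")
```

Under these assumptions the 40 racks hold 163,840 cores, and the computed peak matches the 557 teraflops cited for the machine.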
As part of the INCITE program, the ALCF provides in-depth expertise and assistance in using ALCF systems and optimizing applications to help researchers from all different scientific disciplines to scale successfully to an unprecedented number of processors to solve some of the nation’s most pressing technology challenges. This year’s allocations were awarded on a competitive basis and represent the largest amount of supercomputing time ever awarded under the INCITE program, reflecting both the growing sophistication of the field of computer modeling and simulation and the rapid expansion of supercomputing capabilities at DOE national laboratories in recent years.
NERSC Allocations to Support More Than 400 Projects in 2011
More than 400 research projects led by scientists at national laboratories and universities across the country have been awarded supercomputing and data storage allocations at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. In all, 244 million processor hours on the Cray XT4 were awarded to 407 projects selected by the six program offices in DOE’s Office of Science. These allocations will be increased later in 2011 when NERSC’s newest system, a 153,408 processor-core Cray XE6 system, known as Hopper, goes into production. Hopper will nearly quadruple the amount of available computing time at NERSC. Additionally, 433 projects were awarded a total of 99 million Storage Resource Units at NERSC.
ESnet Reaches Milestone: 10 Petabytes of Traffic in One Month
November 2010 was the first month in which the ESnet network crossed a major threshold – over 10 petabytes of traffic. Traffic volume was 40 percent higher than the prior month and 10 times higher than just a little over four years ago. But what’s behind this dramatic increase in network utilization?
Breaking down the ESnet traffic highlighted a few things. The bump in traffic was not caused by the demo traffic sent to the SC conference in New Orleans (151.99 TB delivered), since that accounted for less than 1 percent of November’s ESnet traffic. Instead, for the first time significant volumes of genomics data traversed the network as DOE’s Joint Genome Institute (JGI) sent over 1 petabyte of data to NERSC and continued to use NERSC’s cloud service heavily. JGI alone accounted for about 10 percent of November’s traffic volume. And as the Large Hadron Collider increases its luminosity, it continues to churn out massive datasets, which ESnet delivers to researchers across the US.
Contact: Wendy Tsabba
IBM Case Study: NERSC Creates an Ultra-Efficient Supercomputer
A recently published IBM case study, “NERSC creates an ultra-efficient supercomputer,” describes how the Carver and Magellan clusters were designed to maximize utilization of space, power, and cooling capacity, resulting in a lower-cost, higher-performance option than the clusters they replaced. Brent Draney, NERSC’s group lead for Network, Security, and Servers, is quoted throughout the case study, describing the innovative design and its benefits for NERSC.


OLCF Staff Members Lead BoF to Raise Awareness for Women in HPC
OLCF staff members Rebecca Hartman-Baker, Hai Ah Nam, and Judith C. Hill organized a Birds of a Feather session at SC10 titled “Developing, Recruiting, and Retaining Women in HPC,” exploring ways to bring women into high-performance computing and keep them there.
The session included three panelists from different HPC work environments. Nannette J. Boden, president and chief executive officer at Myricom, which provides extreme-performance, 10-Gigabit Ethernet products, covered industry. Lois Curfman McInnes, computational scientist in the Mathematics and Computer Science Division of Argonne National Laboratory, dealt with national laboratories. Finally, Patricia Teller, professor in the Department of Computer Science at the University of Texas at El Paso, discussed academia. The session will expand with a two-fold plan for next year’s SC conference. The first part will be to bring successful managers to discuss how they have kept high numbers of women in their organizations. The second task will be to plan how to build a community of women in supercomputing through approaches such as social networking.
Contact: Jayson Hines
User Meeting Highlights Applications of Nek5000 Simulation Code
Thirty researchers from around the world gathered for the first Nek5000 meeting, held Dec. 9-10 at Argonne. The objective was to enable developers and users of Nek5000 to exchange information, address technical issues, and share experiences in areas of common interest. Nek5000 is a scalable fluid-mechanics and heat-transfer simulation code developed in the Mathematics and Computer Science Division at Argonne under DOE’s Advanced Scientific Computing Research program, with additional application support from the DOE Nuclear Energy Advanced Modeling and Simulation Program.
An open source code, Nek5000 enjoys an active users group with over 60 registered users. Participants at the workshop gave short presentations of their applications of Nek5000, which included nuclear reactor modeling and simulation, film cooling for gas turbines, sub-mesoscale oceanography, combustion in microscale channels, astrophysics, and vascular flow modeling. In addition to the technical presentations, the workshop included hands-on sessions in the use of Nek5000 and associated codes for mesh generation and parallel data analysis and visualization.
Nek5000 is an early science application on Blue Gene/Q at Argonne and, through funding from the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, has been used for large-scale numerical simulations on DOE’s most powerful leadership computing facilities.
OLCF Provides Expertise to Student Cluster Challenge at SC10
On Monday, Nov. 15, the starting gun fired, and students began feverishly computing. For 47 hours, sleep was out of the question, caffeinated beverages were consumed like water, and the power of supercomputers was laid at the fingertips of eight teams vying to be known as the best of the next generation in high performance computing. “We’re having [students] run a high-performance cluster on the power it takes to run three coffee makers,” said OLCF’s Hai Ah Nam, computational scientist and technical chair of this year’s Student Cluster Competition (SCC). Students had to build a commercial computer cluster capable of running open-source software and meeting HPCC benchmarks, all on 26 amps of power.
The competition had OLCF staff organizing, judging, interviewing, and getting to know the students throughout the week. OLCF’s Jeff Kuehn, Bronson Messer, Arnold Tharrington, Rebecca Hartman-Baker, and Ilene Carpenter all served as scientific application judges for this year’s competition. Various parts of the world were represented in the competition, with teams from National Tsing Hua University in Taiwan, Nizhni Novgorod State University in Russia, Florida A&M University, Louisiana State University, the University of Colorado, the University of Texas at Austin, Purdue University in Indiana, and Stony Brook University in New York. Students were aided in their preparation for the competition by teaming with experts from the high-performance computing industry. When the closing bell rang, National Tsing Hua University was declared the winner.
Contact: Jayson Hines
LBNL Staff Present Job-Search Skills Program to High School IT Academy Students
Berkeley Lab Computing Sciences Communications staffers Jon Bashor and Linda Vu joined LBNL recruiters Andi Horton and Robert Rodriguez in a presentation on finding rewarding jobs to a group of seniors at Kennedy High School in Richmond, California, on Thursday, Dec. 16. The session drew 40 students from the school’s IT Academy and stemmed from an outreach program organized by Communications staff last summer. Among the topics covered were where to look for jobs, whom to call on for help, dressing for success in the job interview, likely interview questions, and a “circle of support” exercise to identify people, organizations, and other support resources.
At the end, more than three-fourths of the students raised their hands when asked if they found the session useful, and a handful stayed afterwards to ask questions about opportunities at the Lab. Computer science teacher Lane Good also approved, asking the group to give a similar presentation to seniors during the next school year.
Contact: Jon Bashor
Coalition of Lustre Users Announces Open Registration for LUG 2011
LUG 2011 will be held in Orlando, Florida from Tuesday, April 12, through April 14, 2011. This two-and-a-half-day event is the primary venue for discussion and seminars on open source parallel file system technologies with a unique focus on the Lustre parallel file system. “LUG is the preeminent venue for users of the Lustre parallel file system, drawing individuals both internationally and across a variety of disciplines,” said Galen Shipman, Technology Integration Group Leader at ORNL. LUG 2011 is a user-led event with an organizing committee made up of representatives from Commissariat à l’Energie Atomique, Indiana University—Pervasive Technology Institute, Lawrence Livermore National Laboratory, Naval Research Laboratory, Oak Ridge National Laboratory, Sandia National Laboratories, and Texas Advanced Computing Center.
As of December 15, members of the Lustre community can register for LUG 2011 via the conference website. Early bird registration (through March 15) is $400 per person, while standard registration (after March 15) is $550 per person for the entire two-and-a-half-day event. The LUG program committee invites members of the Lustre community to submit presentation abstracts for inclusion in this year’s meeting. The deadline to submit presentation abstracts is February 14, 2011. For questions or to submit a presentation abstract, contact the program committee chair, Stephen Simms.
Registration Open for ALCF 2011 Winter Workshop Series
The 2011 Winter Workshop Series, sponsored by the Argonne Leadership Computing Facility (ALCF), will be held in January and is open to all ALCF users. Registration is required.
Getting Started – January 18
The Getting Started workshop is held annually to provide users with information on ALCF services and resources, technical details on the IBM Blue Gene/P architecture, as well as hands-on assistance in porting and tuning of applications on the Blue Gene/P.
Productivity Tools for Leadership Science – January 19-20
Experts will help boost users’ productivity using TAU, Allinea, and other HPC tools. Topics will also include parallel I/O, visualization and data analysis, and libraries on the IBM Blue Gene/P system at the ALCF. Hands-on assistance also will be available.
INCITE Proposal Writing – January 24
Nearly 1.7 billion processing hours were awarded for scientific research through the 2011 INCITE program. The INCITE Proposal Writing workshop can help users/potential users prepare proposals for 2012 awards. Scientists with the Argonne Leadership Computing Facility and Oak Ridge’s Scientific Computing group will provide tips and suggestions to improve the quality of INCITE proposal submissions. This workshop will be presented concurrently as a live event and a webinar.
Workshop to Focus on Manycore- and Accelerator-Based Scientific HPC
The second workshop on “Manycore and Accelerator-Based High-performance Scientific Computing” will be held Jan. 24-28, 2011, in Berkeley, Calif. The workshop is organized by the International Center for Computational Science (ICCS) located at Lawrence Berkeley National Laboratory and the University of California, Berkeley. ICCS is an international collaboration to research and deliver state-of-the-art high-performance computing (HPC) hardware and software solutions to broader scientific communities. The inaugural three-day workshop, held in December 2009, drew experts from Asia, Europe and North America. The 2011 workshop will also feature a three-day program, supplemented by a two-day tutorial program on Jan. 24-25.