ASCR Monthly Computing News Report - November 2010



Labs Contribute Expertise to Eight Technical Papers at SC10
The Technical Papers program at the SC conference is the leading international forum for highlighting research in the areas of applications, architecture and networks, clouds and Grids, performance, storage and system software. For SC10, 253 papers were submitted, making it one of the most competitive fields in the history of the conference. Of the 51 papers selected after careful review, eight were authored or co-authored by experts from DOE labs. The papers and authors are:
  • “Parallel Fast Gauss Transform,” (Best Paper candidate), Rahul S. Sampath, Oak Ridge National Laboratory; Hari Sundar, Siemens; and Shravan K. Veerapaneni, New York University.
  • “A Flexible Reservation Algorithm for Advance Network Provisioning,” Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani and Alex Sim, all of Lawrence Berkeley National Laboratory.
  • “Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System,” Adam Moody, Greg Bronevetsky, Kathryn Mohror and Bronis R. de Supinski, all of Lawrence Livermore National Laboratory.
  • “Accelerating I/O Forwarding in IBM Blue Gene/P Systems,” Venkatram Vishwanath, Mark Hereld, Kamil Iskra, Dries Kimpe, Vitali Morozov, Michael E. Papka, Robert Ross and Kazutomo Yoshii, all of Argonne National Laboratory.
  • “A Scalable and Distributed Dynamic Formal Verifier for MPI Programs,” Anh Vo, Sriram Aananthakrishnan and Ganesh Gopalakrishnan, all of University of Utah; Greg Bronevetsky, Bronis R. de Supinski and Martin Schulz, all of Lawrence Livermore National Laboratory.
  • “Functional Partitioning to Optimize End-to-End Performance on Many-Core Architectures,” Min Li, Virginia Tech; Sudharshan S. Vazhkudai, Oak Ridge National Laboratory; Ali R. Butt, Virginia Tech; Fei Meng and Xiaosong Ma, both of North Carolina State University; Youngjae Kim, Christian Engelmann and Galen Shipman, all of Oak Ridge National Laboratory.
  • “Diagnosis, Tuning and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method,” Aparna Chandramowlishwaran, Georgia Institute of Technology; Kamesh Madduri, Lawrence Berkeley National Laboratory; Richard Vuduc, Georgia Institute of Technology.
  • “Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations,” Scott Hampton, Oak Ridge National Laboratory; Sadaf Alam, Swiss National Supercomputing Centre; Paul Crozier, Sandia National Laboratories; Pratul Agarwal, Oak Ridge National Laboratory.
Blood Flow Simulation on Jaguar Takes ACM Gordon Bell Prize at SC10
A team from Georgia Tech, New York University, and Oak Ridge National Laboratory (ORNL) took this year’s ACM Gordon Bell Prize at SC10 by pushing ORNL’s Jaguar supercomputer to 700 trillion calculations per second (700 teraflops) with a groundbreaking simulation of blood flow. The team wins a $10,000 prize provided by HPC pioneer Bell as well as the distinction of having the world’s leading scientific computing application. The winning team used 196,000 of Jaguar’s 224,000 processor cores to simulate 260 million red blood cells and their interaction with plasma in the circulatory system. Another team using Jaguar took an honorable mention in the competition for developing an innovative framework that calculates critical nanoscale properties of materials.
Lawrence Berkeley National Laboratory’s Horst Simon, in announcing the winners Nov. 18, noted that they had achieved a 10,000-fold improvement over previous simulations of this type.
“This team from Georgia Tech, NYU, and Oak Ridge National Lab received the award for obtaining four orders of magnitude improvement over previous work and achieved an impressive more than 700 teraflops on 200,000 cores of the Jaguar system,” Simon said. “It’s a very significant accomplishment.” Simon noted also that the team simulated realistic, “deformable” blood cells that change shape rather than simpler, but less realistic, spherical red blood cells, calling the approach a “very challenging multiscale, multiphysics problem.” The winning team included Abtin Rahimian, Ilya Lashuk, Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon, Aashay Shringarpure, Richard Vuduc, and George Biros of Georgia Tech; Shravan Veerapaneni and Denis Zorin of NYU; and Rahul Sampath and Jeffrey Vetter of ORNL.
Jaguar Shines in Supercomputing Challenge
The Jaguar supercomputer housed at Oak Ridge National Laboratory’s Leadership Computing Facility (OLCF) continued to demonstrate its balanced architecture at SC10, taking many of the top spots in this year’s High-Performance Computing (HPC) Challenge. The challenge gives the world’s most powerful systems an opportunity to demonstrate the range of hardware and software capabilities necessary for a useful supercomputer.
The OLCF Jaguar system took first place in two of the competition’s four benchmarks, known as HPL and STREAM. HPL, or High-Performance Linpack, measures the speed of a supercomputer by solving a dense linear system of equations, while STREAM measures the memory bandwidth and corresponding computational rate for a simple vector kernel. In addition, Jaguar took second place in a benchmark known as Global FFT and third place in another called RandomAccess. Global FFT evaluates a system’s ability to transform one function into another via fast Fourier transform, a method used in many DOE science applications, including fusion, materials, accelerators and many fluid dynamics problems; while RandomAccess evaluates a memory system’s performance with small, randomly placed transactions.
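To give a flavor of what STREAM measures, here is a minimal Python sketch of its “triad” kernel; this is an illustration of the idea only, not the official C benchmark, and the array size and bandwidth accounting are simplified assumptions.

```python
import time

def stream_triad(n, scalar=3.0):
    """Time the STREAM 'triad' kernel, a[i] = b[i] + scalar * c[i]."""
    b = [1.0] * n
    c = [2.0] * n
    a = [0.0] * n
    start = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + scalar * c[i]
    elapsed = time.perf_counter() - start
    # Triad touches three arrays per element (two reads, one write);
    # assume 8-byte values, as the C benchmark does for doubles.
    bytes_moved = 3 * 8 * n
    return a, bytes_moved / elapsed  # result array and bytes per second

a, bandwidth = stream_triad(100_000)
print(a[0])  # 7.0, i.e. 1.0 + 3.0 * 2.0
```

Because the kernel does almost no arithmetic per byte loaded, its achieved rate is limited by memory bandwidth rather than by the processor, which is exactly the property the benchmark exploits.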
Berkeley’s Hasenkamp Wins Third Place in ACM Student Research Poster Competition
Daren Hasenkamp, an undergraduate intern from UC Berkeley working at Lawrence Berkeley National Laboratory, has won third place in the SC10 ACM Student Research Competition for his poster on “Finding Tropical Cyclones on Clouds.” As one of the top finishers, he will be invited to participate in the ACM Student Research Competition Grand Finals. A full paper on his work with Alex Sim, Michael Wehner, and Kesheng John Wu has been accepted at CloudCom 2010. Here is the poster abstract: “In this work, we bring the power of cloud computing to bear on the task of analyzing trends of tropical cyclones in climate simulation data. The cloud computing platform is attractive here because it can provide an environment familiar to climatologists and their analysis tools. We created virtual machines (VMs) and ran them on the Magellan Scientific Cloud at Argonne National Laboratory. Our VM communicates with instances of itself to split up and analyze large datasets in parallel. In a preliminary test, we used this virtual climate analysis platform to analyze ~500 GB of climate data. Using 34 VMs, the total analysis time was reduced by a factor of ~40 from the traditional analysis method. The main advantages of our method are that the level of parallelism is easily configurable, and software dependency resolution is simple.”
DOE Labs’ Habib, Colella, Stevens, Harrison Deliver Invited Talks at SC10
Four leading computational scientists – Salman Habib of Los Alamos National Laboratory (LANL), Phil Colella of Lawrence Berkeley National Laboratory (LBNL), Rick Stevens of Argonne National Laboratory (ANL), and Robert Harrison of Oak Ridge National Laboratory (ORNL) – were among the 16 invited speakers in the Masterworks series at SC10.
Habib, a member of the Nuclear & Particle Physics, Astrophysics and Cosmology Group in the Theoretical Division at LANL, spoke on “Computing the Universe” during the Big Science, Big Data I session on Tuesday, Nov. 16.
Colella, leader of the Applied Numerical Algorithms Group in LBNL’s Computational Research Division, discussed “High-End Computing and Climate Modeling: Future Trends and Prospects” during the Big Science, Big Data II session on Tuesday, Nov. 16.
Stevens, ANL’s Associate Laboratory Director for Computing, Environment and Life Sciences, who also holds an appointment as professor at the University of Chicago, described “Computing and Biology: Toward Predictive Theory in the Life Sciences” as part of the Genomics-Driven Biology session on Tuesday, Nov. 16.
Harrison, leader of the Computational Chemical Sciences Group in ORNL’s Computer Science and Mathematics Division who has a joint appointment in the chemistry department of the University of Tennessee, Knoxville, gave a talk on “Applications of MADNESS (Multiresolution ADaptive Numerical Environment for Scientific Simulation)” during the Climbing the Computational Wall session on Thursday, Nov. 18.
NERSC’s Bautista, Shalf Create SC10 Showcase for Disruptive Technologies
Each year since 2006, the SC conference has highlighted “disruptive technologies” – drastic innovations in current practices with the potential to completely transform the high-performance computing field as it exists today, ultimately overtaking the incumbent technologies or software tools in the marketplace. For SC10, NERSC’s Elizabeth Bautista and John Shalf made a concerted effort to provide a showcase for such technologies, especially in light of SC10 keynote speaker Clayton Christensen, who coined the term “disruptive technology” in his 1997 book, The Innovator’s Dilemma. The focus this year was on technologies that could enable exascale computing. By all accounts, the path to exascale computing will require many highly disruptive technology phase transitions, so any technology that is able to overcome these hurdles will be “disruptive” by definition. In New Orleans, the Disruptive Technologies program included 14 competitively selected projects featured in the SC10 exhibits, seven plenary talks, and two panel discussions.


INCITE Program Allocates 1.7 Billion Hours to 57 Advanced Research Projects
Energy Secretary Steven Chu announced the largest-ever awards of the Department’s supercomputing time to 57 innovative research projects through DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. The awards include nearly 1.7 billion processor hours on the Department of Energy’s advanced supercomputers — the largest total ever — reflecting both the growing sophistication of the field of computer modeling and simulation and the rapid expansion of supercomputing capabilities at DOE National Laboratories in recent years. The projects include both academic and commercial research, including partnerships with companies such as GE and Boeing to use sophisticated computer modeling in the development of better wind turbines and jet engines.
Scientists Perform the Largest Unstructured LES of a Real, Full Combustion Chamber
The increase of computer power has allowed science to make important strides in a variety of domains, such as plasma studies, biomechanics, and molecular dynamics. With access to the INCITE program, researchers from CERFACS (the European Centre for Research and Advanced Training in Scientific Computation) have been able to perform state-of-the-art simulations of highly complex cases in pursuit of their goal: the fully numerical modeling of a real combustor.
Their research is focused on large eddy simulation (LES) of gas turbine engines, including liquid-phase phenomena. CERFACS has performed simulations and validation of two-phase-flow experiments. In parallel, the researchers have performed the largest unstructured LES done to date of a real, full combustion chamber (330 million elements) on more than 16,000 cores of the Blue Gene/P at the Argonne Leadership Computing Facility. This simulation contributes to the validation of the LES approach for combustion instabilities, where the effect of mesh refinement is a highly critical point; that effect was examined during the Stanford Center for Turbulence Research (CTR) summer program. A second mesh-independence validation used a simpler, single-burner two-phase-flow configuration with three levels of refinement (4, 8, and 16 million elements). These results have been published in the CTR Proceedings of the 2010 Summer Program by Cambridge University Press and were submitted to Flow, Turbulence and Combustion in October 2010.
Contact: Cheryl Drugan
LBNL Team Develops Flexible Reservation Algorithm for Advance Network Provisioning
As scientific research becomes more collaborative and more data-intensive, many applications need networking support that provides predictable performance, which in turn requires effective algorithms for bandwidth reservations. Network reservation systems such as ESnet’s OSCARS (On-Demand Secure Circuits and Advance Reservation System) establish secure virtual circuits with guaranteed bandwidth for a specified length of time. However, users currently cannot inquire about bandwidth availability, nor can they receive alternative suggestions when reservation requests fail.
To address this, Mehmet Balman, Arie Shoshani and Alex Sim of Berkeley Lab’s Scientific Data Management Group and Evangelos Chaniotakis of ESnet developed a flexible reservation algorithm for advance network provisioning. The algorithm is a novel approach for pathfinding in time-dependent networks, taking advantage of user-provided parameters of total volume of data to be sent and time constraints for moving the data. Users receive a list of options for earliest completion and shortest duration. A paper describing the work was one of 51 technical papers accepted by the SC10 conference (out of 253 submissions) and was presented by Balman on Tuesday, Nov. 16 at the conference in New Orleans. The algorithm will also be incorporated into the next version of the OSCARS software release.
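The published algorithm is more involved, but the core idea of fitting a fixed data volume into time-varying available bandwidth can be sketched as follows. This is a hypothetical simplification for illustration, not the team’s algorithm or the OSCARS implementation, and the interval representation is an assumption.

```python
def earliest_completion(intervals, volume, start_time=0.0):
    """Earliest time a transfer of `volume` data units can finish.

    intervals: sorted list of (t_start, t_end, available_bandwidth)
    describing how much bandwidth the path offers in each time window.
    Returns the completion time, or None if the horizon is too short.
    """
    remaining = volume
    for t0, t1, bw in intervals:
        lo = max(t0, start_time)
        if lo >= t1 or bw <= 0:
            continue
        capacity = bw * (t1 - lo)       # data movable within this window
        if capacity >= remaining:
            return lo + remaining / bw  # transfer finishes inside window
        remaining -= capacity           # consume the window, keep going
    return None

# 100 units of data; 10 units/s for 5 s, then 25 units/s afterwards:
slots = [(0, 5, 10), (5, 10, 25)]
print(earliest_completion(slots, 100))  # prints 7.0
```

A real reservation system would search over candidate paths as well as time windows, which is what makes the pathfinding problem time-dependent.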
Magellan Cloud Testbed at ANL, LBNL Top Choice of HPCwire Readers
The U.S. Department of Energy’s cloud computing testbed project known as Magellan has been recognized in the annual HPCwire Readers’ and Editors’ Choice Awards. Representatives of Argonne and Lawrence Berkeley national laboratories, where Magellan testbeds are located, were presented the award at SC10, the international conference for high performance computing, networking, storage and analysis, in New Orleans.
The Magellan research project, which is exploring the suitability of cloud computing to help meet the computational science requirements of DOE researchers, was honored with the Readers’ Choice Award for “Best Use of HPC in the Cloud.” Magellan systems, funded under the American Recovery and Reinvestment Act, are located at the Argonne Leadership Computing Facility (ALCF) in Illinois and the National Energy Research Scientific Computing (NERSC) Center at Berkeley Lab in California.
The HPCwire Readers’ Choice Awards are determined through online polling of the global HPCwire audience, along with a rigorous selection process involving HPCwire editors and industry luminaries. The awards are an annual feature of the publication and constitute prestigious recognition from the HPC community. These awards are revealed each year to kick off the SC conference, which showcases high performance computing, networking, storage, and data analysis.


Berkeley Lab’s Kesheng John Wu Is Named ACM Distinguished Scientist
The ACM (the Association for Computing Machinery) has named 47 of its members as Distinguished Members in recognition of their individual contributions to practical and theoretical aspects of computing that drive innovation and sustain economic competitiveness. Kesheng John Wu of the Scientific Data Management Research Group in Berkeley Lab’s Computational Research Division is one of 41 ACM members named 2010 Distinguished Scientists. Wu is the lead developer of FastBit, an indexing technology for accelerating search operations on massive databases that can run searches up to 100 times faster than other technologies. FastBit received a 2008 R&D 100 Award.
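The basic idea behind a bitmap index like FastBit’s can be illustrated with a toy example: one bit vector per distinct column value lets queries run as bitwise operations. The class below is a hypothetical sketch, not FastBit’s API, and FastBit itself adds compressed encodings (Word-Aligned Hybrid) that make the approach practical at scale.

```python
class BitmapIndex:
    """Toy equality-encoded bitmap index over one column."""

    def __init__(self, column):
        self.n = len(column)
        self.bitmaps = {}  # value -> Python int used as a bit vector
        for row, value in enumerate(column):
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row)

    def rows_equal(self, value):
        """Row numbers where column == value."""
        bits = self.bitmaps.get(value, 0)
        return [row for row in range(self.n) if bits >> row & 1]

    def rows_in(self, values):
        """Row numbers where the column matches any of the values."""
        bits = 0
        for v in values:  # OR merges several equality bitmaps
            bits |= self.bitmaps.get(v, 0)
        return [row for row in range(self.n) if bits >> row & 1]

index = BitmapIndex(["a", "b", "a", "c"])
print(index.rows_equal("a"))      # [0, 2]
print(index.rows_in({"b", "c"}))  # [1, 3]
```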
ORNL’s Aprà Takes HPCwire Readers’ Choice Award
ORNL computational chemist Edoardo Aprà is the winner of this year’s HPCwire Readers’ Choice Award in supercomputing achievement. The awards were presented Monday, November 15, in New Orleans at the 2010 International Conference for High Performance Computing, Networking, Storage and Analysis, better known as SC10. Aprà was honored for his work with a computational chemistry application known as NWChem, which was developed at Pacific Northwest National Laboratory. Under Aprà’s guidance, the application reached 1.39 thousand trillion calculations per second, or 1.39 petaflops, on ORNL’s Cray XT5 Jaguar system.
NWChem helped Aprà and his colleagues uncover the electronic structure of water using a quantum chemistry technique called coupled cluster. They published some of their scientific results in the October 21 issue of the Journal of Physical Chemistry Letters. The team was also a finalist for the prestigious 2009 Gordon Bell Prize, which recognizes the world’s top supercomputing application. “Top supercomputing achievement” is one of about 20 categories offered in the awards. The Readers’ Choice winners are determined through polling among HPCwire’s online audience. The site’s 30,000 newsletter subscribers are also asked to vote.
Argonne Researcher Balaji Discusses the Next Step in High-End Computing
In an article in HPCsource, Pavan Balaji, an assistant computer scientist in Argonne’s Mathematics and Computer Science Division, discusses architectural trends in the post-petascale era. He compares two models: a homogeneous but heavily hierarchical architecture, where each level of the hierarchy has an increasingly large amount of shared hardware, and systems with a heterogeneous collection of a few general-purpose CPU cores together with a large number of accelerators. The article, titled “The Next Step in High-end Computing,” appeared in the autumn 2010 issue of HPCsource.
Berkeley Lab’s Kathy Yelick Gives Invited Talk in MIT Distinguished Lecture Series
Kathy Yelick, LBNL’s Associate Laboratory Director for Computing Sciences and Director of the NERSC Division, returned to MIT, where she earned her bachelor’s, master’s, and doctoral degrees, to give an invited talk on “Exascale Computing: More and Moore?” on November 4. Her talk looked at the issues facing HPC as it moves toward the exascale. In her abstract, Yelick noted, “Past growth in the high end has relied on a combination of faster clock speeds and larger systems, but the clock speed benefits of Moore’s Law have ended, and 200-cabinet petascale machines are near a practical limit. Future system designs will instead be constrained by power density and total system power demand, resulting in radically different architectures. The challenges associated with exascale computing will require broad research activities across computer science, including the development of new algorithms, programming models, system software and computer architecture. While these problems are most evident at the high end, they limit the growth in computing performance across scales, from hand-held client devices to personal clusters and computational clouds.”
Yelick’s presentation was part of MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) Dertouzos Lecture series, named in honor of Prof. Michael L. Dertouzos, the former director of the MIT Laboratory for Computer Science who died in 2001.
ANL’s Boyana Norris Delivers Keynote at Surface Dynamics Modeling System Meeting
Argonne computer scientist Boyana Norris gave a keynote address at the Community Surface Dynamics Modeling System Meeting 2010 “Modeling for Environmental Change,” held in San Antonio, Texas, October 14–17. Her presentation, titled “High-Performance Component-Based Scientific Software Development,” included a frank discussion of both the benefits and the challenges in producing component-based software. The CSDMS Project is an international effort to develop a suite of modular numerical models able to simulate the evolution of landscapes and sedimentary basins, on time scales ranging from individual events to many millions of years. Norris is also on the CSDMS steering committee, where she focuses on using component technology to provide consistent interfaces to software developed within CSDMS and to ensure that the component software infrastructure and tools meet the needs of CSDMS researchers.


OLCF Leads in Petascale Science
The Oak Ridge Leadership Computing Facility (OLCF) continues to demonstrate its importance to the science community, offering unmatched resources to researchers exploring climate change, alternative energy sources, and the full range of critical science challenges. Jaguar at the OLCF was ranked second among the world’s most powerful supercomputers on the latest TOP500 list, released this week at the SC10 conference in New Orleans. And Jaguar continues to lead the world in science application performance as the only system to deliver sustained petascale performance on demanding scientific application codes.
ORNL Associate Laboratory Director for Computing and Computational Sciences Jeff Nichols pointed to five applications in physics, materials science, and chemistry that have broken the petascale barrier, taking Jaguar to more than 1 thousand trillion calculations a second (1 petaflop):
  • Swiss National Supercomputing Center Director Thomas Schulthess and colleagues used DCA++, an application that simulates high-temperature superconductors.
  • Markus Eisenbach of ORNL and colleagues used WL-LSMS, an application that analyzes magnetic systems and, in particular, the effect of temperature on these systems.
  • Edoardo Aprà of ORNL and colleagues used NWChem, a quantum chemistry application that accurately describes the electronic structure of water.
  • Gerhard Klimeck of Purdue University and colleagues used OMEN, an application that delves into the quantum mechanical behavior of electrons traveling through electronic devices at the smallest possible scale.
  • Schulthess and colleagues once again broke the petaflop barrier with a method that calculates an important parameter for DCA++.
NERSC’s Hopper Breaks the Petaflops Barrier
DOE’s National Energy Research Scientific Computing Center (NERSC), already one of the world’s leading centers for scientific productivity, is now home to the fifth most powerful supercomputer in the world and the second most powerful in the United States, according to the latest edition of the TOP500 list, the definitive ranking of the world’s top computers.
NERSC’s newest supercomputer, a 153,408-processor-core Cray XE6 system, posted a performance of 1.05 petaflops (quadrillions of calculations per second) running the Linpack benchmark. This makes Hopper the second most powerful system in the U.S., the fifth fastest in the world, and only the third U.S. machine to achieve petaflop/s performance. In keeping with NERSC’s tradition of naming computers for renowned scientists, the system is named Hopper in honor of Admiral Grace Hopper, a pioneer in software development and programming languages. The system, installed in September 2010, is funded by DOE’s Office of Advanced Scientific Computing Research.
Established in 1974, NERSC is located at Lawrence Berkeley National Laboratory in California and provides computing systems and services to more than 3,000 researchers supported by the Department of Energy (DOE). NERSC’s users, located at universities, national laboratories, and other research institutions around the country, report producing more than 1,500 scientific publications each year as a result of calculations run at NERSC.
IBM’s Next-Generation Blue Gene Named #1 on Green500 List
A prototype of IBM’s next-generation Blue Gene has been cited as Number 1 on the Green500 list. The list ranks the most energy-efficient supercomputers in the world, focusing on power consumption and reliability as well as performance. The next-generation Blue Gene supercomputer is scheduled to be deployed in 2012 by Argonne and LLNL. The two laboratories have collaborated closely with IBM on the design of Blue Gene, influencing many aspects of the system’s software and hardware. In a press release from IBM, Rick Stevens, associate laboratory director for computing at Argonne, said, “IBM’s next-generation Blue Gene provides a glimpse of the discipline needed to improve power efficiency in order to allow the industry to build exascale-class systems capable of solving highly complex challenges.”
Argonne currently is home to the 557-teraflops Intrepid, an energy-efficient IBM Blue Gene/P supercomputer that uses about one-third as much electricity as a comparable supercomputer. The next-generation, 10-petaflops system will include more than 0.75 million cores and 0.75 petabytes of memory.
ALCF’s Intrepid Ranked #1 on Graph 500 List, NERSC’s Franklin Comes in Second
Intrepid, the IBM Blue Gene/P supercomputer housed at the Argonne Leadership Computing Facility (ALCF), was ranked number one on the first Graph 500 list, unveiled November 17 at the SC10 conference in New Orleans. Franklin, a Cray XT4 at the National Energy Research Scientific Computing Center (NERSC), was ranked number two on the list; and Jaguar, the Cray XT5 at the Oak Ridge Leadership Computing Facility (OLCF), was ranked ninth. The list ranks supercomputers based on their performance on data-intensive applications and thus complements the TOP500 list, which is based on the LINPACK benchmark.
Data-intensive supercomputer applications are increasingly important HPC workloads. Current benchmarks and performance metrics do not provide useful information on the suitability of supercomputing systems for data-intensive applications.
Backed by a steering committee of more than 30 international HPC experts from academia, industry and national laboratories, Graph 500 establishes a new set of large-scale benchmarks for these applications. These benchmarks will guide the design of hardware architectures and software systems intended to support such applications and will help guide procurements. Graph algorithms are a core part of many analytics workloads. The Graph 500 was introduced at the 2010 International Supercomputing Conference held May 30-June 3 in Germany. In future years, the list is expected to rotate between the annual ISC and SC conferences.
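The Graph 500 kernel is a breadth-first search, scored in traversed edges per second (TEPS) rather than floating-point operations. The sketch below shows the shape of such a measurement; it is not the reference implementation, which generates a large scale-free graph and follows precise counting rules.

```python
import time
from collections import deque

def timed_bfs(adj, root):
    """BFS from root over adj ({vertex: [neighbors]}).

    Returns the BFS parent map and a rough traversed-edges-per-second
    rate, the style of metric the Graph 500 list reports.
    """
    start = time.perf_counter()
    parent = {root: root}
    frontier = deque([root])
    edges = 0
    while frontier:
        v = frontier.popleft()
        for w in adj.get(v, []):
            edges += 1               # every edge inspection counts
            if w not in parent:      # first visit: record BFS-tree parent
                parent[w] = v
                frontier.append(w)
    elapsed = time.perf_counter() - start
    return parent, edges / elapsed

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
parent, teps = timed_bfs(adj, 0)
print(sorted(parent))  # [0, 1, 2, 3]
```

Because BFS is dominated by irregular, pointer-chasing memory accesses, a machine’s TEPS rate stresses its memory and network subsystems in ways LINPACK does not.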
Fenius Networking Demo Scores a Global First
ESnet has been promoting global interoperability in virtual circuit provisioning by collaborating on the Fenius project. Recently this effort took another step forward by enabling four different provisioning systems to cooperate for the Automated GOLE demonstration at the GLIF workshop held at CERN in Geneva, Switzerland.
The demonstration was a complete success and resulted in what is believed to be a global first: a virtual circuit was set up completely automatically through five different networks and four different provisioning systems. Setup was fast as well, taking only about five minutes from the initiating request until packets were flowing from end to end.