ASCR Monthly Computing News Report - January 2011


Computer Chip Design Simulation Breaks the Petascale Barrier using Jaguar

A team led by Gerhard Klimeck of Purdue University has broken the petascale barrier while addressing a relatively old problem in the modern field of computer chip design. Using the OLCF's Jaguar supercomputer, Klimeck and Purdue colleague Mathieu Luisier reached more than a thousand trillion calculations a second (1 petaflop) modeling the journey of electrons as they travel through electronic devices at the smallest possible scale. Klimeck, leader of Purdue's Nanoelectronic Modeling Group, and Luisier, a member of the university's research faculty, used more than 220,000 of Jaguar's 224,000 processing cores to reach 1.03 petaflops.
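As a rough back-of-the-envelope check on the figures quoted above, the sustained rate can be divided across the cores used. Because the article says "more than 220,000" cores, the per-core figure in this sketch is an upper bound on the average, not a measured value.

```python
# Figures quoted in the article; the per-core result is only an average upper bound.
SUSTAINED_FLOPS = 1.03e15   # 1.03 petaflops
CORES_USED = 220_000        # "more than 220,000" cores
TOTAL_CORES = 224_000       # Jaguar's processing cores

per_core_gflops = SUSTAINED_FLOPS / CORES_USED / 1e9
fraction_of_machine = CORES_USED / TOTAL_CORES

print(f"{per_core_gflops:.2f} GFLOP/s per core (average, upper bound)")
print(f"{fraction_of_machine:.1%} of Jaguar's cores in use")
```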

"What we do is build models that try to represent how electrons move through transistor structures," Klimeck explained. "Can we come up with geometries on materials or on combinations of materials - or physical effects at the nanometer scale - that might be different than on a traditional device, and can we use them to make a transistor that is less power hungry or doesn't generate as much heat or runs faster?" The team is pursuing this work on Jaguar with two applications, known as Nanoelectronic Modeling (NEMO) 3D and OMEN (a more recent effort whose name is an anagram of NEMO). "Having machines like Jaguar made these calculations possible. The theory of how to do that was understood with NEMO 1D, but it was computationally prohibitively expensive. OMEN is the next-generation prototype that runs on Jaguar now," said Klimeck.

Team Develops Optimal Strategies for Inventory, Distribution of Industrial Gases
Developing effective strategies for optimizing industrial gas distribution systems raises numerous challenges. In particular, short-term distribution planning decisions (e.g., which customers receive deliveries, how much to deliver, when to deliver, and how to combine routes) must be balanced with long-term inventory decisions (e.g., how many tanks to install in each customer location and when to upgrade or downgrade existing tanks). In addition, uncertainties arising from demand fluctuations and the losses or gains of customers in the distribution network may significantly affect the decision-making across the industrial gas supply chains.

Researchers at Argonne National Laboratory, Carnegie Mellon University, and Praxair Inc. (a worldwide provider of industrial gases) developed stochastic mixed-integer nonlinear programming (MINLP) and mixed-integer linear programming (MILP) models to address these challenges. To handle the significant computational complexity that arises, the team developed a continuous approximation approach, which estimates the operational cost at the strategic level and determines the tradeoff with the capital cost from tank sizing. A tailored branch-and-refine algorithm based on successive piecewise linear approximation was also developed to globally optimize the stochastic MINLP problems.
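The idea of successive piecewise linear approximation can be illustrated with a small sketch. This is not the team's branch-and-refine algorithm: it has no branching and no MILP solve, and the square-root cost curve, the tolerance, and all function names are assumptions chosen for illustration. It only shows the refinement step, in which a concave function is underestimated by straight-line segments and new breakpoints are added where the error is largest.

```python
import math
from bisect import insort


def interpolate(f, breakpoints, x):
    """Value at x of the piecewise-linear interpolant of f through the breakpoints."""
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return (1.0 - t) * f(lo) + t * f(hi)
    raise ValueError("x lies outside the breakpoint range")


def worst_gap(f, breakpoints, grid):
    """Largest underestimation error on the sample grid, paired with its location."""
    return max((f(x) - interpolate(f, breakpoints, x), x) for x in grid)


def refine(f, lo, hi, tol, samples=2001, max_iters=500):
    """Add a breakpoint at the worst sampled point until the gap is at most tol."""
    grid = [lo + (hi - lo) * i / (samples - 1) for i in range(samples - 1)] + [hi]
    breakpoints = [lo, hi]
    for _ in range(max_iters):
        gap, x = worst_gap(f, breakpoints, grid)
        if gap <= tol:
            break
        insort(breakpoints, x)  # keep the breakpoint list sorted
    return breakpoints, worst_gap(f, breakpoints, grid)[0]


# Hypothetical concave cost curve (economies of scale), not taken from the study.
points, gap = refine(math.sqrt, 0.0, 100.0, tol=0.05)
print(len(points), "breakpoints, worst sampled gap", round(gap, 4))
```

For a concave function the segments never lie above the curve, so the approximation acts as a lower bound on the cost and tightens with every added breakpoint. A full branch-and-refine scheme would pair such a bound with an upper bound from evaluating the true nonlinear cost.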

Case studies with up to 200 customers - and involving as many as 42,230 binary variables, 12,262 continuous variables, and 17,394 constraints - show the effectiveness of the approach in solving the distribution-inventory planning problem of large-scale industrial gas supply chains. The research has been accepted for publication in Industrial & Engineering Chemistry Research (the second-most-cited journal in chemical engineering).

Contact: Fengqi You - youf@mcs.anl.gov

The 20th Century Reanalysis Project: A Climate Time Machine
From the hurricane that smashed into New York in 1938 to the impact of the Krakatoa eruption of 1883, the late 19th and 20th centuries are rich with examples of extreme weather. Now an international team of climatologists has created a comprehensive reanalysis of all global weather events from 1871 to the present day, and from the earth's surface to the jet stream level. The 20th Century Reanalysis Project, outlined in the January 2011 issue of the Quarterly Journal of the Royal Meteorological Society, not only allows researchers to understand the long-term impact of extreme weather, but also provides key historical comparisons for our own changing climate.

"Producing this huge dataset required an international effort to collate historical observations and recordings from sources as diverse as 19th century sea captains, turn of the century explorers and medical doctors, all pieced together using some of the world's most powerful supercomputers at the US Department of Energy's National Energy Research Scientific Computing Center in California and the Oak Ridge Leadership Computing Facility in Tennessee," said lead author Dr. Gil Compo. Compo leads the 20th Century Reanalysis Project (20CR) at the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratory (ESRL) and the Cooperative Institute for Research in Environmental Sciences (a joint project of NOAA and the University of Colorado) Climate Diagnostics Center.

Whole-Genome Sequencing Simulated on Jaguar
The Human Genome Project paved the way for genomics, the study of an organism's genome. Personalized genomics can establish the relationship between DNA sequence variations among individuals and their health conditions and responses to drugs and treatments. To make genome sequencing a routine procedure, however, the time must be reduced to less than a day and the cost to less than $1,000 - a feat not possible with current knowledge and technologies. Using ORNL's Jaguar, Aleksei Aksimentiev, assistant professor in the physics department at the University of Illinois–Urbana-Champaign, and his team are developing a nanopore approach, which promises a drastic reduction in time and costs for DNA sequencing. Their research reveals the shape of DNA moving through a single nanopore - a protein pore a billionth of a meter wide that traverses a membrane. As the DNA passes through the pore, the sequence of nucleotides (DNA building blocks) is read by a detector.

Aksimentiev's group uses the nanopore MspA, an engineered protein. Its sequence must be altered to bind more strongly to the moving DNA strand. MspA is an ideal platform for sequencing DNA because scientists can now measure dams in the pore, which could slow DNA's journey through the protein. Altering the MspA protein to optimize dams is both time-consuming and costly in a laboratory but simple on a computer. The team received 10 million processor hours on Jaguar through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. With the INCITE allocation, the scientists were able to reproduce the dams in the MspA nanopore for the type of DNA nucleotides confined to it, slowing down the sequence movement through the nanopore. "We have carried out a pilot study on several variants of the MspA nanopore and observed considerable reduction of the DNA strand speed," said Aksimentiev. "These very preliminary results suggest that achieving a 100-fold reduction of DNA velocity, which should be sufficient to read out the DNA sequence with single-nucleotide resolution, is within reach. Future studies will be directed toward this goal."

NERSC, LBNL Help Planck Mission Peel Back Layers of Universe
On Jan. 11, the Planck mission released a new catalog of data from its initial maps of the entire sky. The catalog, called the Early Release Compact Source Catalog, includes everything from thousands of never-before-seen dusty cocoons where stars are forming to some of the most massive clusters of galaxies. The data were produced by the U.S. Planck team using computing resources at the Department of Energy's NERSC at Berkeley Lab and at NASA's Jet Propulsion Laboratory. NERSC is the primary U.S. computing center for Planck, a European Space Agency mission with significant contributions from NASA. Planck launched in May 2009 to probe the universe just a few hundred thousand years after the Big Bang, an explosive event that created the universe about 13.7 billion years ago. The spacecraft's state-of-the-art detectors will ultimately survey the whole sky up to five times, measuring the cosmic microwave background, or radiation left over from the Big Bang. The data will help scientists decipher clues about the evolution, fate, and fabric of our universe.

This image of the microwave sky was synthesized using data spanning the range of light frequencies detected by Planck. Image credit: ESA, HFI & LFI consortia (2010).

Berkeley Lab's participation in the Planck data analysis is coordinated by Julian Borrill, one of the leaders of the Computational Cosmology Center (C3) in the Computational Research Division, and includes Christopher Cantalupo and Theodore Kisner of C3 and George Smoot and Martin White of the Physics Division.

A New Approach to Understanding Boundary Layers Using High Reynolds Numbers
Boundary layers are the primary interface between vehicles (e.g., airplanes, boats) and the medium in which they move. Their physical understanding is not only an intellectual challenge but also a prerequisite for better vehicle design.

INCITE researchers from the University of Texas at Austin discovered that at large Reynolds numbers, the boundary layer relaxes slowly from the artifacts of the inlet conditions. This appears to be physical and implies that boundary layers have longer memory than had previously been thought. However, particularly for the recycling inflow used, the inlet artifacts persist for so long that it is too expensive to simulate a long enough domain to have a significant range that is not affected.

Using the IBM Blue Gene/P system at the Argonne Leadership Computing Facility (ALCF), the researchers developed a new approach in which an effectively longer spatial domain is simulated by breaking the domain into two shorter pieces, with a plane at the end of the useful range of the first simulation used as an inlet condition for the second. This has a significant advantage, since the first domain can be simulated with much coarser resolution, making it about 10 times less expensive to simulate this portion than it would be as part of a unified single simulation. This approach has been implemented and tested during the course of 34M hours (29M capability hours), typically using 32,768 cores for each job. Coarse resolution simulations using this approach are now being calculated to eliminate statistical transients. A new set of production simulations will follow.
Contact: Robert Moser, rmoser@ices.utexas.edu

Visualization of the vorticity in a boundary layer at Re up to 2100.

PNNL's New Multiscale Model Simulates Heterogeneous Reactions in Catalytic Reactors
Donghai Mei and Guang Lin at Pacific Northwest National Laboratory (PNNL) have developed a multiscale model to simulate the heterogeneous reactions in catalytic reactors by combining first-principles kinetic Monte Carlo (KMC) simulation with a continuum computational fluid dynamic model. The developed multiscale model is employed to study the effects of heat and mass transfer on the heterogeneous reaction kinetics. The integrated computational framework consists of a surface phase, where catalytic surface reactions occur, and a gas-phase boundary layer imposed on the catalyst surface, where the fluctuating temperature and pressure gradients exist. The surface phase domain is modeled by the site-explicit first-principles KMC simulation. The gas-phase boundary layer domain is described using a computational fluid dynamic model.

Unlike in other hybrid models, the heat and mass fluxes between two domains are directly coupled by the varying boundary conditions at each simulation time-step from the unsteady state reaction regime to the steady state reaction regime in the present model. The simulation results indicate that the limitation of heat and mass transfer in the surrounding environment over the catalyst could dramatically affect the observed macroscopic reaction kinetics under presumed operating reaction conditions. This work has been published in Catalysis Today.

Contact: Mary Anne Wuennecke, maryanne.wuennecke@pnl.gov

Release of MAGMA 1.0 Math Library for GPUs and its use in MATLAB
MAGMA is a LAPACK-like math library specialized to exploit GPU performance. The MAGMA research is supported in part by the DOE Math/CS institute called EASI. A major goal of the Extreme-scale Algorithms and Software Institute is to get its newly developed algorithms and software out into the larger science community as quickly as possible. MAGMA 1.0, including the MAGMA sources, is now available.

MAGMA is designed to achieve the maximum possible performance for selected algorithms from a single CUDA-enabled NVIDIA GPU. The graph illustrates MAGMA LU performance for real and complex arithmetic. The results show single precision (SP) achieving over 450 GFlops and complex arithmetic reaching as high as 600 GFlops. Both are a very high percentage of the peak available performance.
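The article does not say how the GFlops figures were computed. A common convention for LU factorization, assumed here, is to divide the leading-order LAPACK-style operation count (about 2n^3/3 for real matrices, four times that for complex) by the measured run time. The sketch below applies that convention; the function names, matrix size, and timing are hypothetical, and no MAGMA calls are involved.

```python
def lu_flops(n, complex_arithmetic=False):
    """Leading-order operation count for LU factorization of an n-by-n matrix.

    Real arithmetic: about 2/3 * n**3 floating-point operations.
    Complex arithmetic: counted as four times the real figure
    (a complex multiply as 6 flops, a complex add as 2).
    """
    real_flops = (2.0 / 3.0) * n**3
    return 4.0 * real_flops if complex_arithmetic else real_flops


def gflops(n, seconds, complex_arithmetic=False):
    """Achieved rate in GFlops for a factorization that took the given time."""
    return lu_flops(n, complex_arithmetic) / seconds / 1e9


# Hypothetical timing: a 10,000 x 10,000 real matrix factored in 1.48 seconds.
print(round(gflops(10_000, 1.48)), "GFlops")
```

Under this convention a complex factorization of the same size and run time reports four times the rate of a real one, which is one reason real and complex results are usually plotted separately.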

A second way that the algorithms are being made available to a wide community is through MATLAB. The MAGMA library is being used inside MATLAB to accelerate matrix operations on GPU-based computers. MATLAB GPU support is available in Parallel Computing Toolbox. Using MATLAB for GPU computing lets students and researchers explore new ideas and take advantage of GPU performance without low-level C or Fortran programming.

Contact: Jack Dongarra, dongarra@cs.utk.edu


Sandia's Tamara Kolda Accepts HPC Editorship of Key Journal
Tamara Kolda of Sandia California has accepted a section editorship of the Society for Industrial and Applied Mathematics' Journal on Scientific Computing [SIAM SISC], overseeing the portion reserved for high-performance computing and software. Kolda had served on the journal's editorial board for six years and had strongly suggested formation of the new section. She is also an associate editor for SIAM Journal on Matrix Analysis and a senior member of the Association for Computing Machinery (ACM).

Kolda's research interests include multilinear algebra and tensor decompositions, data and graph mining, optimization, nonlinear solvers, graph algorithms, cybersecurity, parallel computing, and the design of scientific software. Among her awards is a 2003 Presidential Early Career Award for Scientists and Engineers.

Sandia Researcher Mike Heroux Named Editor-in-Chief of Key Software Journal
Mike Heroux has been named editor-in-chief of the Association for Computing Machinery (ACM) journal Transactions on Mathematical Software. The quarterly publication has been rated among the top 20 journals for its “impact factor” - roughly, the number of times its articles are cited by others. Heroux, a member of Sandia's Scalable Algorithms Department, has been involved in the development of mathematical software for more than 20 years. He started and leads the Trilinos Project, which is the largest single collection of open-source software libraries for scientific computing and is funded through the ASCR TOPS-2 SciDAC Center. He is also an author of other open software efforts such as Tramonto, Mantevo, BPKIT and Aztec.

Contact: Mike Heroux, maherou@sandia.gov

HPCwire Names ORNL's Aprà, Sandia's Murphy as “People to Watch” in 2011
Oak Ridge National Laboratory's Edoardo Aprà and Sandia National Laboratories researcher Richard Murphy have been named as “People to Watch” in 2011 by the online computing magazine HPCwire. The magazine each year names a handful of researchers whom its editors believe to be doing the world's most interesting work in supercomputing. “These thought leaders from the academic, government, industrial and vendor communities comprise an elite group of individuals that we believe will impact and influence the future of High Performance Computing in 2011 and far beyond,” wrote Jeff Hyman, HPCwire's president and publisher.

Aprà was also a recipient of the 2010 HPCwire Reader's Choice Award for Supercomputing Achievement. Dr. Aprà was honored for his work with a computational chemistry application known as NWChem, which was developed at Pacific Northwest National Laboratory. Under Aprà's expert guidance, the application reached an astounding 1.39 thousand trillion calculations per second, or 1.39 petaflops, on Oak Ridge National Lab's (ORNL) Jaguar system.

Murphy is principal investigator for Sandia's X-caliber project, an effort to radically lower the power usage of computer systems at all scales by 2018, when the next generation of supercomputers is predicted to come into use. The work is funded by the Defense Advanced Research Projects Agency (DARPA). Murphy also led the launch this year of the newly created Graph500 test, an internationally used benchmark that offers an alternative to the Linpack benchmark, which for years has been the standard measure of the ability of computers to manipulate large data sets.

ANL's Zavala Gives Invited Talk on Optimization Methods and Energy Applications
Victor Zavala, an assistant computational mathematician in Argonne's Mathematics and Computer Science Division, was an invited speaker at the U.S.-Mexico Workshop on Optimization and Its Applications, held January 3-7 in Oaxaca, Mexico. Zavala's presentation, following the initial welcome by the workshop chairman, focused on optimization challenges in the energy industry. Zavala, in collaboration with his MCS colleagues, is exploring the use of detailed physical models, large-scale optimization solvers, and control concepts to enable a more consistent integration of multiple decision-making levels in electricity markets and, ultimately, to achieve higher overall efficiencies. Domains of this work involve transmission planning, stochastic unit commitment and economic dispatch, dynamic electricity markets, and building energy management.

NERSC User James Drake Receives 2010 APS Maxwell Prize for Plasma Physics
Long-time NERSC user James Drake has been awarded the 2010 James Clerk Maxwell Prize for Plasma Physics, the highest honor bestowed on plasma physicists by the American Physical Society (APS). Drake, a Professor of Physics at the University of Maryland, has been a user of NERSC supercomputer systems for over a decade. He and his Maryland colleagues use NERSC resources to focus on two key problems in what is called “magnetic reconnection,” which refers to the breaking and topological rearrangement of magnetic field lines in a plasma.

In 2010 alone, Drake and his coworkers reported eight journal publications arising from computations using NERSC resources. Drake is a leading investigator on the NERSC project “Turbulence, Transport and Magnetic Reconnection in High Temperature Plasma,” which is funded by the Department of Energy's Office of Fusion Energy Science (FES).


Engineering Mixed Traffic on ESnet 100 Gbps Testbed

The first crop of experiments using ESnet's Advanced Networking Initiative testbed is now in full swing. In a project funded by the DOE Office of Science, Prof. Malathi Veeraraghavan and post-doc Zhenzhen Yan at the University of Virginia, along with consultant Prof. Admela Jukan, are investigating the role of hybrid networking in ESnet's next-generation 100 Gbps network. Their goal is to learn how to optimize a hybrid network composed of two components, an IP datagram network and a high-speed optical dynamic circuit network, to best serve users' data communication needs. ESnet deployed a hybrid network in 2006, based on an IP-routed network and the Science Data Network (SDN), which is a dynamic virtual circuit network.

“It is a question of efficiency, which essentially boils down to cost,” Veeraraghavan said. “IP networks have to be operated at low utilization for the performance requirements of all types of flows to be met. With hybrid networks, it is feasible to meet performance requirements while still operating the network at higher utilization.”

Data flows have certain characteristics that make them suited for certain types of networks, and matching flows with the right networks is a complex problem. In the ESnet core network, flows can be identified by looking at multiple fields in packet headers, according to Veeraraghavan, but neither the size of a flow (in bytes) nor whether it will be long or short can be known in advance. A challenge of this project is therefore to predict the characteristics of data flows based on prior history. To do this, the researchers are using machine learning techniques. Flows are classified based on size and duration. Large-sized (“elephant”) flows are known to consume a higher share of bandwidth and thus adversely affect small-sized (“mice”) flows, which makes them good candidates to redirect to the SDN. If SDN circuits are to be established dynamically, i.e., after a router starts seeing packets in a flow that history indicates is a good candidate for SDN, then the flow must be not only large-sized but also of long duration (“tortoise”) because circuit setup takes minutes. Short-duration (“dragonfly”) flows are not good candidates for dynamic circuits, but if they are of large size and occur frequently, static circuits could be used.
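The classification rules described above can be sketched as a small decision function. The thresholds, field names, and path labels below are hypothetical, since the article gives no numeric cutoffs, and the sketch takes a flow's size and duration as given, whereas the researchers predict them from prior history using machine learning.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration; the article does not state any.
ELEPHANT_BYTES = 1024**3      # flows of 1 GiB or more count as "elephant"
TORTOISE_SECONDS = 10 * 60    # must comfortably exceed a circuit setup time of minutes


@dataclass(frozen=True)
class Flow:
    size_bytes: int             # predicted total size of the flow
    duration_s: float           # predicted duration of the flow
    recurs_often: bool = False  # whether history shows the flow recurring frequently


def classify(flow):
    """Label a flow by size (elephant/mouse) and duration (tortoise/dragonfly)."""
    size = "elephant" if flow.size_bytes >= ELEPHANT_BYTES else "mouse"
    duration = "tortoise" if flow.duration_s >= TORTOISE_SECONDS else "dragonfly"
    return size, duration


def recommend_path(flow):
    """Pick a network path following the rules described in the article."""
    size, duration = classify(flow)
    if size == "mouse":
        return "ip"               # small flows stay on the IP-routed network
    if duration == "tortoise":
        return "dynamic-circuit"  # large and long enough to amortize circuit setup
    if flow.recurs_often:
        return "static-circuit"   # large, short, but frequent
    return "ip"


print(recommend_path(Flow(size_bytes=5 * 1024**3, duration_s=3600)))
```

Keeping the size and duration labels separate from the path decision mirrors the article's two-step description: flows are first classified, and the classification then determines whether a dynamic circuit, a static circuit, or the IP network is the better fit.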

Peridynamics “Gets Math” at Oberwolfach
A Mathematisches Forschungsinstitut Oberwolfach mini-workshop on the mathematical analysis for peridynamics was held January 16-22. Organized by Etienne Emmrich (University of Bielefeld), Max Gunzburger (FSU) and Richard Lehoucq (Sandia National Labs), the workshop brought together 17 participants to present their research on the peridynamic theory of continuum mechanics, a novel multiscale mechanical model proposed by Stewart Silling (Sandia National Labs) where the flux is given by an integral operator. As such, the underlying regularity assumptions are more general, allowing, for instance, discontinuous, let alone non-differentiable, deformation. Although the theoretical mechanical formulation of peridynamics is well understood, the mathematical and numerical analyses are in their early stages and are being actively explored under an ASCR Applied Math project led by Lehoucq. Oberwolfach workshops are proposed and granted through a peer-reviewed process.

Contact: Rich Lehoucq, rblehou@sandia.gov

2011 ALCF Winter Workshop Series Offers Hands-on Experience to Users
The January 2011 Winter Workshop Series, sponsored by the Argonne Leadership Computing Facility (ALCF), drew 68 attendees. The “Getting Started” workshop, held January 18, provided users with key information on ALCF services and resources, technical details on the IBM Blue Gene/P architecture, as well as hands-on assistance in porting and tuning of applications on the Blue Gene/P. In the "Productivity Tools for Leadership Science" workshop, held January 19-20, ALCF experts helped boost users' productivity using TAU, Allinea, and other HPC tools. The workshop covered parallel I/O, visualization and data analysis, and libraries on the IBM Blue Gene/P system at the ALCF. Hands-on assistance was provided. Attendees remarked how helpful they found the interactions with the ALCF experts and hands-on demos to be.

In addition, 54 people registered for the ALCF-sponsored webinar, "INCITE Proposal Writing," held January 24. Scientists with the ALCF and Oak Ridge's Scientific Computing group provided tips and suggestions to improve the quality of proposal submissions for 2012 INCITE awards. This workshop was presented concurrently as a live event and a webinar.

To download workshop and webinar presentations, visit the ALCF websites (http://workshops.alcf.anl.gov/wss11/agenda/ and http://workshops.alcf.anl.gov/pww11/).

Attendees at ANL's “Getting Started” workshop.

LBNL Hosts Successful Workshop on Manycore- and Accelerator-based HPC
The second annual workshop on "Manycore and Accelerator-based High-performance Scientific Computing," held Jan. 24-28 in Berkeley, drew nearly 100 attendees and a new international partner. According to workshop co-organizer Hemant Shukla, 85 attendees registered in advance and another five signed up at the door. A two-day tutorial session drew 40 participants, and about 20 were turned away due to lack of space.

The meeting focused on harnessing the full potential of emerging many-core architectures in science and technology by bringing together experts and enthusiasts from academia and industry to introduce, explore and discuss the scope and challenges of these novel architectures for high performance computing. The developed solutions will have broader impact across science and technology disciplines such as healthcare, energy, aerospace and others.

The workshop was organized by the International Center for Computational Science (ICCS) located at Lawrence Berkeley National Laboratory and the University of California, Berkeley. ICCS is an international collaboration to research and deliver state-of-the-art high-performance computing (HPC) hardware and software solutions to broader scientific communities. During the workshop, Nagasaki University joined ICCS, which also includes the University of Heidelberg in Germany and the National Astronomical Observatories of the Chinese Academy of Sciences.

Sandia Researchers Organize Tensor Workshop
Sandia's David Gleich (current Von Neumann Fellow) and Tamara Kolda, along with Andreas Argyriou of the Toyota Technological Institute at Chicago, Vicente Malave of the University of California at San Diego, and Marco Signoretto and Johan Suykens of K. U. Leuven, co-organized a workshop on "Tensors, Kernels, and Machine Learning." The Dec. 10 workshop was held as part of the 2010 Neural Information Processing Systems (NIPS) conference, one of the premier venues for leading-edge research in the technological, mathematical, and theoretical aspects of information processing systems. The workshop took place in Whistler, British Columbia. The purpose of the workshop was to foster cross-fertilization between experts in tensors and experts in machine learning, with the hope that techniques could be adapted in both directions. Three plenary speakers anchored the workshop, which also featured eight contributed talks, a contributed poster session, and two lively discussion sessions. One of the contributed talks was given by Eric Chi, a DOE Computational Sciences Graduate Fellowship student, who lectured on statistics-based techniques for fitting tensor decompositions, based on his summer internship at Sandia. Abstracts and slides are available online at http://csmr.ca.sandia.gov/~dfgleic/tkml2010/.

Contact: Tamara Kolda, tgkolda@sandia.gov

Hone Your Supercomputing Skills at OLCF's Spring Training
Users of ORNL's Cray XT5 will get an opportunity to hone their supercomputing skills this March at a weeklong "Spring Training" planned at ORNL. The event will run March 7-11, offering training opportunities for expert users and novices alike, and will culminate on Friday with the annual OLCF Users Meeting. The training is being sponsored by the OLCF, home to the Jaguar system. Supercomputing novices will benefit from the "High-Performance Computing Crash Course" offered Monday and Tuesday. It will be taught by Rebecca Hartman-Baker and Arnold Tharrington from the OLCF, with Monday's session focusing on Linux and Tuesday's focusing on the MPI (Message-Passing Interface) communications protocol.

More advanced users will be interested in Wednesday and Thursday workshops focusing on the Cray XT5 and led by OLCF staff. According to Bobby Whitten of the OLCF, Wednesday's workshop is geared to intermediate users, while Thursday's will be for both intermediate and advanced users. Friday will be devoted to the OLCF Users Meeting. Users will hear from OLCF group leaders and the facility's User Council. Liaisons from the center's Scientific Computing Group will also address the meeting. According to Whitten, the meeting will be open to users offsite through the WebEx online conferencing tool. He said organizers hope meeting software will boost overall attendance. "If overall participation increases, I don't think we really care if it's in-person participation or distance participation," he noted. If there is enough demand, Whitten said, the weeklong training may be offered in summer and fall as well. For more information on the "Spring Training," see www.olcf.ornl.gov/event/olcf-nics-springtraining/.

NERSC Hosts Members of Parallel Computing Club from Community College
During a conversation at the SC10 conference in New Orleans, Contra Costa College professor Tom Murphy mentioned to Berkeley Lab staff that his students wanted to learn parallel programming, but since there was no funding, he was launching a Parallel Programming Club. Ideas were tossed around for collaboration, and on Jan. 14, Murphy and 10 of his students paid a visit to DOE's National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab. The group was welcomed by NERSC User Services Group Lead Katie Antypas, who gave an introduction to parallel computing and NERSC, as well as a tour of the facility's computer room. "Thanks so much," Murphy said afterward. "It was a great impactful day!"