DDRM projects

Dynamic Distributed Resource Management (DDRM)

Three projects were awarded.

Title: Deduce: Distributed Dynamic Data Analytics Infrastructure for Collaborative Environments
PI: Deborah Agarwal
URL: http://deduce.lbl.gov

Next-generation scientific discoveries lie at the intersections of datasets: for instance, in data integration across multiple science disciplines, across data resources (simulation and experiment), across institutions, and across spatial and temporal scales. Collaborative frameworks for data integration across multiple, non-coordinating sites are therefore needed. The goal of this project is to investigate a distributed resource infrastructure that supports collaborative, dynamic, distributed data integration.

Title: DREAM: Distributed Resources for the Earth System Grid Federation (ESGF) Advanced Management

PI/Co-PI: Dean Williams
URL: http://dream.llnl.gov/

The Distributed Resources for the Earth System Grid Federation (ESGF) Advanced Management (DREAM) project will provide a new way to access large data sets across multiple facilities, immediately benefiting research efforts as well as numerous other data-intensive applications. Through a customizable user interface that communities of scientists can use to interact with each other, DREAM will provide a host of underlying services that can be adopted in part or as a whole, including services for publishing, discovering, moving, and analyzing data. The infrastructure will be deployable on private or public clouds, will enable seamless scaling of services to meet increased demand (for example, for a large computation task), and will provide fault tolerance and failover. The approach is applicable to anyone working with distributed data, large or small, and spans analytic data processing, resource management, security, and client applications. To illustrate the applicability of this infrastructure across disciplines, the project will provide a number of use cases in areas including climate science, hydrology, and biology.
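The data-discovery services mentioned above build on the ESGF software stack, which exposes a RESTful, facet-based search endpoint. A minimal sketch of constructing such a query follows; the endpoint and parameter names mirror the public ESGF search API, but treat the specifics as illustrative rather than a DREAM interface:

```python
# Sketch: building a facet-based data-discovery query against an
# ESGF-style search service. Endpoint and facet names follow the public
# ESGF search REST API; treat them as illustrative.
from urllib.parse import urlencode

def build_search_url(base_url, **facets):
    """Combine default output options with caller-supplied search facets."""
    params = {"format": "application/solr+json", "limit": 10}
    params.update(facets)
    return base_url + "?" + urlencode(params)

url = build_search_url(
    "https://esgf-node.llnl.gov/esg-search/search",
    project="CMIP5",
    variable="tas",          # near-surface air temperature
    time_frequency="mon",
)
print(url)
```

Fetching the resulting URL would return a JSON document listing matching datasets, which a client can then pass to the data-movement and analysis services.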

This project will engage closely with DOE, NASA, and NOAA science groups working at the leading edge of computing. These engagements—in domains such as biology, climate, and hydrology—will allow us to advance disciplinary science goals and inform our development of technologies that can accelerate discovery across DOE more broadly. In addition, the project will utilize well established applications, such as: the Earth System Grid Federation (ESGF: http://esgf.llnl.gov), the Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT: http://uvcdat.llnl.gov), and the Earth System CoG (CoG: https://www.earthsystemcog.org/projects/cog/).

Title: VC3: Virtual Clusters for Community Computation

PI: Doug Thain
URL: http://virtualclusters.org

This project aims to make research computing facilities more amenable to self-service use by broad user communities. The organizing principle is the idea of virtual clusters for community computation (VC3), in which end users can effectively "allocate" clusters from existing facilities by requesting, for example, 200 nodes of 24 cores and 64 GB RAM each, with 100 TB of local scratch and 10 Gb connectivity to external data sources. Once the cluster is allocated, its owner can install software, load data, execute jobs, and share the allocation with collaborators. Of course, we do not expect to unilaterally change the underlying resource management systems of existing facilities. Rather, virtual clusters will be provisioned on top of existing facilities by deploying user-level tools (e.g., HTCondor-based tools for resource management and data caching, and Parrot for virtual file system access) within the existing batch systems. In short, a virtual cluster will appear to facility administrators as a large parallel job managed by the end user.
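The resource request described above could be captured as a simple declarative specification. The sketch below is hypothetical (the `VirtualClusterSpec` class and its field names are not part of any published VC3 interface), but it shows the kind of information such a request would carry:

```python
# Hypothetical declarative spec for a virtual cluster request; the class
# and field names are illustrative, not a published VC3 interface.
from dataclasses import dataclass

@dataclass
class VirtualClusterSpec:
    nodes: int            # number of nodes requested
    cores_per_node: int
    ram_gb_per_node: int
    scratch_tb: int       # total local scratch space
    network_gbps: int     # connectivity to external data sources

# The example allocation from the text: 200 nodes of 24 cores and
# 64 GB RAM each, 100 TB local scratch, 10 Gb external connectivity.
spec = VirtualClusterSpec(nodes=200, cores_per_node=24,
                          ram_gb_per_node=64, scratch_tb=100,
                          network_gbps=10)

total_cores = spec.nodes * spec.cores_per_node    # 4800 cores in aggregate
total_ram_gb = spec.nodes * spec.ram_gb_per_node  # 12800 GB in aggregate
print(total_cores, total_ram_gb)
```

A provisioning service could translate such a spec into a large batch submission (e.g., an HTCondor glidein-style pilot job) on the underlying facility.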

But virtual clusters by themselves are not enough: workloads must be self-configuring so that they can bring their necessary dependencies, discover the dynamic configuration, and adapt to the (possibly) changing conditions of the virtual cluster. To accomplish this integration with reasonable effort, we will leverage a variety of existing open source tools and production-quality services. The end product will be a flexible, optimizable software ecosystem with a generic set of capabilities, presented to users as a virtual cluster service accessible via a web portal. We regard the cluster as an optimal computing abstraction because it naturally covers a large swath of modern scientific computation and a broad base of expertise (e.g., campus HPC centers).
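The self-configuration idea can be sketched as a workload that probes its environment at startup instead of assuming a static cluster layout. The function names below are hypothetical; real VC3 workloads would rely on tools such as HTCondor for this kind of discovery:

```python
# Sketch: a self-configuring job probing its (possibly changing) virtual
# cluster environment at startup. Function names are hypothetical.
import os

def discover_configuration():
    """Probe locally visible resources rather than hard-coding them."""
    return {
        "cores": os.cpu_count() or 1,
        "scratch_dir": os.environ.get("SCRATCH", "/tmp"),
    }

def plan_workers(config, tasks):
    """Adapt parallelism to the discovered core count."""
    return min(config["cores"], tasks)

config = discover_configuration()
print(plan_workers(config, tasks=8))
```

Re-running the discovery step periodically would let the workload adapt as nodes join or leave the virtual cluster.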