Slurm User Group Meeting 2015
Registration
The conference cost is:
- $250 per person for early registration by 31 July 2015
- $350 per person for standard registration by 31 August 2015
- $600 per person for late registration starting 1 September 2015
This includes presentations, tutorials, lunch and snacks on both days,
plus dinner on Tuesday evening.
Register here.
Agenda
Hosted by The George Washington University
The 2015 Slurm User Group Meeting will be held on September 15 and 16 in Washington, DC. The meeting will include an assortment of tutorials, technical presentations, and site reports. The schedule and abstracts are shown below.
The meeting will be held at the Marvin Center Grand Ballroom, 800 21st St NW, Washington, DC 20052.
Hotel Information
The official George Washington University Hotels
Schedule
15 September 2015
Time | Theme | Speaker | Title |
---|---|---|---|
08:00 - 08:30 | Registration | ||
08:30 - 08:45 | Welcome | Wickberg | Welcome |
08:45 - 09:30 | Keynote | Putman | 10 years of computing and atmospheric research at NASA: 1 day per day |
09:30 - 10:00 | Technical | Jette, Auble, Georgiou | Overview of Slurm Version 15.08 |
10:00 - 10:15 | Break | ||
10:15 - 11:00 | Technical | Christiansen, Auble | Trackable Resources (TRES) |
11:00 - 11:30 | Technical | Auble, Perry | Message Aggregation |
11:30 - 12:00 | Technical | Jette and Wickberg | Burst Buffer Support |
12:00 - 12:15 | Technical | Auble | Quality Of Service Attached to a Partition |
12:15 - 13:15 | Lunch | ||
13:15 - 13:45 | Technical | Jette | Power Management Support for Cray Systems |
13:45 - 14:15 | Technical | Hautreux | Slurm Layouts Framework |
14:15 - 14:45 | Technical | Georgiou, Hautreux | Power adaptive scheduling based on layouts |
14:45 - 15:15 | Technical | Jacobsen, Botts, Canon | Never port your code again: Docker deployment with Slurm |
15:15 - 15:30 | Break | ||
15:30 - 16:00 | Technical | Silla | Increasing cluster throughput with Slurm and rCUDA |
16:00 - 16:30 | Technical | Markwardt | Running Virtual Machines using Slurm |
16:30 - 17:00 | Technical | Lu, Zhang, et al. | Extending Slurm with Support for SR-IOV and IVShmem |
19:00 | Dinner | Old Ebbitt Grill (Partial Atrium 1) | 675 15th Street, NW, Washington, DC 20005 |
16 September 2015
Time | Theme | Speaker | Title |
---|---|---|---|
08:00 - 08:30 | Technical | Schultz, Perry | Support for heterogeneous resources and MPMD model |
08:30 - 09:00 | Technical | Rajagopal, Glesser | Towards a multi-constraints resources selection within Slurm |
09:00 - 09:30 | Technical | Glesser, Georgiou | Improving Job Scheduling by using Machine Learning |
09:30 - 10:00 | Technical | Chakraborty et al. | Enhancing Startup Performance of Parallel Applications in Slurm |
10:00 - 10:15 | Break | ||
10:15 - 10:45 | Technical | Haymore | Profile-driven testbed |
10:45 - 11:15 | Technical | Benini, Trofinoff | Workload Simulator |
11:15 - 11:45 | Technical | Auble, Georgiou | Slurm Roadmap |
11:45 - 12:15 | Technical | Christiansen and Jette | Federated Cluster Scheduling |
12:15 - 13:15 | Lunch | ||
13:15 - 13:45 | Technical | Jacobsen, Botts | Experiences of Native SLURM on the NERSC Edison Cray XC30 |
13:45 - 14:05 | Site Report | Cox | Brigham Young University |
14:05 - 14:25 | Site Report | Desantis | University of South Florida |
14:25 - 14:45 | Site Report | Pfaff | NASA Center for Climate Simulation (NCCS) |
14:45 - 15:05 | Site Report | Krause | Jülich Supercomputing Center |
15:05 - 15:20 | Break | ||
15:20 - 15:40 | Site Report | Toro, Hernandez | Slurm Experiences on GUANE-1 |
15:40 - 16:00 | Site Report | Wickberg | The George Washington University |
16:00 - 17:00 | Closing | Auble | Closing discussions |
Abstracts
15 September 2015
Keynote: 10 years of computing and atmospheric research at NASA: 1 day per day
William Putman (NASA Center for Climate Simulation, NCCS)
Global weather and climate models have evolved dramatically from their origins as basic mathematical models in the 1950s and 1960s. The growth and availability of more advanced computing over the latter part of the twentieth century led to interactive atmosphere and land models and eventually fully coupled ocean/atmosphere Earth system models. The twenty-first century has seen these global climate/weather models evolve into massive numerical missions, including more components of the Earth system and representing more processes at much finer scales. Throughout this evolution, scientists have been willing to explore the boundaries of computational capacity to push these models beyond their limitations. Often they are willing to run at a rate of just a single simulated day per day to see features never before seen in these models.
At Goddard’s NASA Center for Climate Simulation (NCCS) and the Global Modeling and Assimilation Office (GMAO), the development of the Goddard Earth Observing System model (GEOS-5) over the last 10 years serves as a microcosm of this evolution. From spiraling storms in the tropics to the fidelity of clouds over the North Pacific, this global atmospheric model has evolved into an Earth system simulator depicting global weather and climate at resolutions never before explored on a global scale. This evolution takes us from Hurricane Katrina in 2005 to Superstorm Sandy in 2012. We will explore stratocumulus clouds across the North Pacific and tornadoes in the Midwest United States, with branches along the way to explore fine particles across the globe and the first global view of waves of carbon dioxide leaving their sources and engulfing the world.
Overview of Slurm Version 15.08
Morris Jette and Danny Auble (SchedMD)
Yiannis Georgiou (Bull)
This presentation will describe a multitude of new capabilities provided in Slurm version 15.08 (released August 2015) which are not covered in a separate talk. These enhancements include:
- Resource allocation optimization for both Dragonfly and SGI Hypercube networks
- Dedication of nodes to a single user, with the ability to run multiple jobs per node
- Archiving job accounting information to Elasticsearch with its powerful analytic tools
- Job dependencies joined with an OR operator (see the sketch after this list)
- Automatic replacement of resources in advanced reservation with idle resources
- sbcast support for file transfers based upon job step allocation
- Additional options for automatically distributing a job’s tasks across allocated nodes (pack/no_pack options)
- Reservation of hyperthreads (or cores) for system use
- Permit QOS based preemption with job suspend/resume
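As a small illustration of two of the items above, the sketch below uses the new OR separator for job dependencies and user-level node dedication. The job IDs and script names are placeholders; the option syntax follows the 15.08 documentation.

```
# OR-joined dependency (hypothetical job IDs): the job may start once
# either prerequisite job completes successfully.
sbatch --dependency=afterok:1001?afterok:1002 analysis.sh

# Dedicate nodes to a single user while still allowing that user to run
# several of their own jobs per node.
sbatch --exclusive=user --ntasks=4 step1.sh
sbatch --exclusive=user --ntasks=4 step2.sh
```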
Trackable Resources (TRES)
Brian Christiansen and Danny Auble (SchedMD)
Resource accounting has largely been built on CPU utilization. As more heterogeneous workloads arrive, we see some that take up all the memory on a node but only a few CPUs. The user is charged for only a fraction of the node, even though only a job with small memory requirements could use the remaining CPUs. The same kind of issue can exist on nodes with accelerators or system licenses. Accounting for energy use against power caps is also becoming of interest. Basically, any resource that has a limit can now be tracked by Slurm. Each of these resources is tracked separately in the Slurm database, making this information available to tools like sreport. A variety of system metrics can be displayed to help determine where the hardware bottleneck is for the system’s workload. This presentation will describe the design and implementation of this functionality as well as specific information about its configuration and use.
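As a rough sketch of how this might look in practice (the GPU gres and the dates are placeholders, not part of the talk):

```
# slurm.conf: track GPUs as a TRES, in addition to the CPU, memory,
# energy, and node resources tracked by default.
AccountingStorageTRES=gres/gpu

# Report per-TRES utilization with sreport.
sreport cluster utilization --tres=cpu,mem,gres/gpu start=2015-09-01 end=2015-09-16
```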
Burst Buffer Support
Morris Jette (SchedMD) and
Tim Wickberg (The George Washington University)
Slurm version 15.08 includes support for burst buffers, a shared high-speed storage resource. Slurm provides support for allocating these resources, staging files in, scheduling compute nodes for jobs using these resources, then staging files out. Burst buffers can also be used as temporary storage during a job’s lifetime, without file staging. Slurm also supports the concept of a persistent burst buffer, which is not associated with any specific job. A typical use of persistent burst buffers is to maintain datasets used by multiple programs. Slurm support for burst buffers is provided using a plugin mechanism so that various infrastructures may be easily supported. Two plugins are currently available: one for Cray (DataWarp) systems and a second which relies upon scripts to provide a generic interface. This presentation will describe the design and implementation of Slurm’s burst buffer support as well as specific information about its configuration and use.
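A minimal job script sketch for the Cray (DataWarp) plugin is shown below, assuming BurstBufferType=burst_buffer/cray is set in slurm.conf; the capacity and file system paths are placeholders.

```
#!/bin/bash
#SBATCH --nodes=16
# DataWarp directives parsed by Slurm for burst buffer allocation and staging.
#DW jobdw capacity=1TiB access_mode=striped type=scratch
#DW stage_in source=/lustre/project/input destination=$DW_JOB_STRIPED/input type=directory
#DW stage_out source=$DW_JOB_STRIPED/output destination=/lustre/project/output type=directory
srun ./simulation $DW_JOB_STRIPED
```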
Message Aggregation
Martin Perry (Bull)
Danny Auble (SchedMD)
In the effort to support larger systems we look at potential scaling issues. One of these involves messages that do not originate with the slurmctld and can create a many-to-one scenario; examples in previous versions of Slurm are the epilog complete and node registration messages, which each slurmd sends whenever it is ready. What has been added is the ability to "route" these messages through other slurmd daemons, gathering them up and delivering only a small number of combined messages to the slurmctld. This drastically reduces the number of connections the slurmctld has to service and respond to, and limits contention on locks within the slurmctld. This presentation will describe the design and implementation of this functionality as well as specific information about its configuration and use.
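One possible configuration, with illustrative values rather than recommendations:

```
# slurm.conf: aggregate slurmd-originated messages, flushing a window
# after 24 messages or 100 ms, whichever comes first.
MsgAggregationParams=WindowMsgs=24,WindowTime=100
```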
Quality Of Service Attached to a Partition
Danny Auble (SchedMD)
A partition can now have an associated Quality Of Service (QOS). This allows a partition to have all the limits available to a QOS. If a limit is set in both, the partition QOS will take precedence over the job’s QOS unless the job’s QOS has the ’OverPartQOS’ flag set. This also allows for truly floating partitions: a partition can have access to all the nodes in the system, while a GrpCPU limit in the partition QOS caps how many CPUs can be used at once, regardless of which nodes they are on. This can also improve utilization, since nodes no longer need to be carved off for debugging or similar purposes. This presentation will describe the design and implementation of this functionality as well as specific information about its configuration and use.
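A sketch of a floating partition of the kind described above; the QOS name, node list, and limit are placeholders.

```
# Create a QOS and give it a group CPU limit.
sacctmgr add qos part_floating
sacctmgr modify qos part_floating set GrpCPUs=512

# slurm.conf: attach the QOS to a partition spanning the whole system, so at
# most 512 CPUs are in use at once, regardless of which nodes they are on.
PartitionName=floating Nodes=node[001-100] QOS=part_floating
```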
Power Management Support for Cray Systems
Morris Jette (SchedMD)
Power consumption has become a critical factor in high performance computer management. Slurm version 15.08 provides an integrated power management system for power capping on Cray systems. The mode of operation is to take the configured power cap for the system and distribute it across the compute nodes under Slurm control. Initially that power is distributed evenly across all compute nodes. Slurm then monitors actual power consumption and redistributes power as appropriate. Specifically, Slurm lowers the power caps on nodes using less than their cap and redistributes that power across the other nodes. The thresholds at which a node’s power cap is raised or lowered are configurable, as is the rate at which the power cap may change. In addition, starting a job on a node immediately triggers resetting the node’s power cap to a higher level. A variety of configuration parameters are available to control the rate of change permitted in a node’s power cap, triggers for changing a node’s power cap, and how power caps are managed across resources allocated to each job. This presentation will describe the design and implementation of Slurm’s power management support as well as specific information about its configuration and use.
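A hedged configuration sketch (parameter names follow the 15.08 documentation; the values are placeholders, not recommendations):

```
# slurm.conf: enable the Cray power plugin with a system-wide cap and
# bounded per-node rates of change.
PowerPlugin=power/cray
PowerParameters=cap_watts=2000000,balance_interval=30,decrease_rate=10,increase_rate=20
```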
Slurm Layouts Framework, latest evolutions
Matthieu Hautreux (CEA)
Looking at HPC and data center trends over the past years, increasing pressure has been put on making the most of the available resources while minimizing the associated operating costs. New workload and/or system characteristics have been studied or even added to resource managers, showing their benefits when selecting the best places to spread an ever-increasing demand for IT tasks.
The Slurm layouts framework aims at providing a new and generic way to describe resource characteristics and related collateral resources, as well as the relations between them. By giving Slurm an extensible stack of consistent layers to represent the different aspects of systems, the framework’s goal is to ease the management of multiple-objective scheduling and to better integrate Slurm with its future hosting systems for smart interactions.
This talk will present the latest evolutions in the layouts framework: the newly introduced API and the layout consistency logic.
Power adaptive scheduling based on layouts
Yiannis Georgiou (BULL)
Matthieu Hautreux (CEA)
The power consumption of a supercomputer needs to be adjusted based on a varying power budget or electricity availability. As a consequence, Slurm has to be adequately adapted in order to efficiently schedule jobs with optimized performance while limiting power usage whenever needed. Based on last year’s prototype and theoretical studies, along with the latest evolutions of the layouts framework within Slurm, we have developed power adaptive scheduling that uses layouts for the description of the power characteristics and the dynamic calculation of the varying power budget. This presentation will provide a description of the developments’ internals, along with various use cases for administrators and users.
Never port your code again: Docker deployment with Slurm
Douglas Jacobsen, James Botts, Shane Canon (NERSC, Lawrence Berkeley National Laboratory)
Linux container technology has been transforming many aspects of software engineering, testing, and delivery. The promise of this technology is extreme portability, reproducibility of code and, for a large-scale HPC facility, ease of new application deployment. The application of Docker containers to scientific codes could enable portability of applications between HPC centers, repeatability of data analysis, and increased ease-of-use. The authors have developed Shifter, a software mechanism for importing Docker and other user-defined images to scalably and securely run them across thousands of nodes. An additional benefit of the Shifter approach is improved I/O performance when starting large applications with many shared-library or other dependencies. Using Slurm plugin capabilities, Shifter is tightly integrated into Slurm, enabling a seamless user experience.
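A hypothetical job script of the sort Shifter enables; the --image option comes from the Shifter integration rather than stock Slurm, and the image and application names are placeholders.

```
#!/bin/bash
#SBATCH --image=docker:ubuntu:14.04
#SBATCH --nodes=2
# The shifter wrapper launches each task inside the imported image.
srun -n 48 shifter ./my_application
```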
Running Virtual Machines using Slurm
Ulf Markwardt (Technische Universität Dresden)
The diverse user requests in high-throughput computing are sometimes easier to meet with user-specific operating systems. Some communities even need a certain version of, e.g., Scientific Linux in order to create reproducible results. From this we derive a demand for running virtual machines on compute nodes. This is justified by the low overhead of the virtualization infrastructure in terms of CPU usage and memory footprint. The challenge, then, is to manage these virtual nodes with the same batch system as the real nodes, with minimal requirements for the users.
We are proposing a lightweight infrastructure based on Slurm to manage virtual machines based on users’ demands. This infrastructure runs in a test cluster, and is about to be installed in our new Bull petaflop system. We will present our experiences, as well as the working scheme, benchmarks and figures.
Challenges and Designs of Extending Slurm with Support for SR-IOV and IVShmem
Xiaoyi Lu, Jie Zhang, Sourav Chakraborty, Hari Subramoni, Mark Arnold, Jonathan Perkins, Dhabaleswar Panda (Ohio State University)
Significant growth has been witnessed during the last few years in HPC clusters with multi-/many-core processors, accelerators, and high-performance interconnects (e.g. InfiniBand). To alleviate the cost burden, sharing HPC cluster resources with end users through virtualization is becoming more and more attractive. Due to the lower performance of virtualized I/O devices, however, the adoption of virtualization in the HPC domain remains low. The recently introduced Single Root I/O Virtualization (SR-IOV) technique for InfiniBand and High Speed Ethernet provides native I/O virtualization capabilities and is changing the landscape of HPC virtualization. However, achieving near-native throughput for HPC applications that use both MPI point-to-point and collective operations on virtualized systems presents a new set of challenges for the designers of high performance middleware such as Slurm, MPI libraries, etc.
First of all, our earlier studies have shown that SR-IOV lacks locality-aware communication support, which leads to performance overheads for inter-VM communication within the same physical node. In this context, another novel feature, Inter-VM Shared Memory (IVShmem), is proposed to support shared-memory-backed intra-node inter-VM communication. Our enhanced MVAPICH2 MPI library can fully take advantage of SR-IOV and IVShmem to deliver near-native performance for HPC applications. Through these studies, we find that there is a significant need to manage and isolate the virtualized resources of SR-IOV and IVShmem in order to support running multiple concurrent MPI jobs, and such isolation is hard to achieve with the MPI library alone. Furthermore, modern multi-core architectures give users the flexibility to choose from various VM subscription policies, ranging from one VM per node, to one VM per CPU socket, to one VM per CPU core. These choices allow for finer-grained resource management and scheduling, depending on the resource requirements of various applications and workloads. These issues lead us to the following broad challenges:
- Can Slurm be extended to support SR-IOV and IVShmem for running concurrent MPI jobs efficiently?
- Can critical HPC resources be efficiently shared among multiple users by using extended Slurm with support for SR-IOV and IVShmem based virtualization?
- Can SR-IOV and IVShmem enabled Slurm and MPI library provide bare-metal performance for end HPC applications?
In this talk, we will first discuss all these technical requirements and challenges of extending Slurm with support for SR-IOV and IVShmem. Then, we will present the alternative designs of enhancing Slurm with virtualization-oriented capabilities such as job submission to dynamically created virtual machines with SR-IOV and IVShmem resources on InfiniBand clusters. Some preliminary performance evaluation results will be shared with the Slurm community.
16 September 2015
Increasing cluster throughput with Slurm and rCUDA
Federico Silla (Technical University of Valencia, Spain)
In this presentation we will introduce a modified version of Slurm supporting the use of the remote GPU virtualization mechanism, using the rCUDA framework as an example. Furthermore, we will present an extensive performance evaluation carried out on a 16-node, 16-GPU cluster, using workloads of up to 400 jobs composed of a mix of 8 different applications. Results show that by combining the use of rCUDA with a modified version of Slurm, cluster throughput is increased by up to 45%. Similar reductions are attained in overall power consumption. Additionally, GPU utilization is noticeably increased. The use of fewer GPUs than nodes has also been considered. In this case, results show that cluster throughput is maintained when the rCUDA middleware is used, thanks to the combined ability of rCUDA+Slurm to share GPUs among jobs.
Towards a multi-constraints resources selection within Slurm
Dineshkumar Rajagopal, David Glesser, Yiannis Georgiou (Bull)
The selection of resources within Slurm is currently done through the select plugins. Those plugins are efficient and scalable but are not easily extensible to take into account new and multiple types of constraints, such as power or temperature. This presentation will investigate a new, flexible method of resource selection based on the layouts framework that has the ability to support multiple constraints on resources within the selection algorithms. It will provide a description of the prototype developed on top of Slurm and show performance evaluation results in terms of efficiency and scalability.
Improving Job Scheduling by using Machine Learning
David Glesser, Yiannis Georgiou (BULL)
Denis Trystram (INRIA)
More and more data are produced within Slurm by monitoring the system and the jobs. The methods studied in the field of big data, including Machine Learning, could be used to improve scheduling. This talk will investigate the following question: to what extent can Machine Learning techniques be used to improve job scheduling? We will focus on two main approaches. In the first, based on an online supervised learning algorithm, we try to predict the execution time of jobs in order to improve backfilling. In the second, a particular ’Learning2Rank’ algorithm is implemented within Slurm as a priority plugin to sort jobs in order to optimize a given objective.
Enhancing Startup Performance of Parallel Applications in Slurm
Sourav Chakraborty, Hari Subramoni, Jonathan Perkins, Adam Moody and Dhabaleswar K. Panda (Ohio State University)
As system sizes continue to grow, the time taken to launch a parallel application on a large number of cores becomes an important factor affecting overall system performance. Slurm is a popular choice to launch parallel applications written in Message Passing Interface (MPI), Partitioned Global Address Space (PGAS) and other programming models. Most of these libraries use the Process Management Interface (PMI) to communicate with the process manager and bootstrap themselves. The current PMI protocol suffers from several bottlenecks due to its design and implementation, and adversely affects the performance and scalability of launching parallel applications at large scale.
In our earlier work, we identified several of these bottlenecks and evaluated different designs to address them. We also showed how the proposed designs can improve performance and scalability of the startup mechanism of MPI and hybrid MPI+PGAS applications. Some of these designs are already available as part of the MVAPICH2 MPI library and pre-release version of Slurm. In this work we present these designs to the Slurm community. We also present some newer designs and how they can accelerate startup of large scale MPI and PGAS applications.
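For context, this is how an MPI library is typically bootstrapped through PMI under Slurm; a generic sketch, not the optimized designs presented in the talk, with the task count and binary name as placeholders.

```
# slurm.conf: use the PMI2 interface by default for srun-launched tasks.
MpiDefault=pmi2

# Or select it per launch.
srun --mpi=pmi2 -n 4096 ./mpi_application
```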
Experiences using the Adaptable Profile-driven Testbed (Apt) and Slurm to dynamically provision & schedule Bare Metal HPC resources
Brian Haymore (University of Utah)
In a traditional HPC environment, cluster resources have a fixed configuration into which usage has to fit. In collaboration with both the Apt project (http://www.flux.utah.edu/project/apt) at Utah and SchedMD, CHPC has been exploring ways to utilize dynamic provisioning of a set of resources that are being shared by a number of very different missions, to better deliver HPC resources to researchers.
We are using tools within the Apt facility to dynamically provision bare metal compute, network and storage resources to meet the needs of a job. The extensible dynamic "cloud" support built into Slurm is being used to manage and schedule the workloads. Key areas of interest have been robustness and ease of support of the system, job turnaround, effectiveness in handling "bursting", and resource contention.
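Slurm's elastic "cloud" support is driven by slurm.conf entries of roughly the following shape; the program paths and node definitions below are hypothetical stand-ins for site scripts that provision and release bare-metal nodes through Apt.

```
SuspendProgram=/usr/local/sbin/apt_release_nodes   # releases idle nodes
ResumeProgram=/usr/local/sbin/apt_provision_nodes  # provisions nodes on demand
SuspendTime=600
ResumeTimeout=900
NodeName=dyn[001-064] State=CLOUD CPUs=16 RealMemory=64000
```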
Results on our current implementation have been positive and there are usage models for which we see clear benefit to this manner of operation. We have been in operational status for about 8 months now with only minor issues. There are, however, several areas in which improvements in the integration between Slurm and Apt would lead to fewer failed job starts and therefore a more robust system. In addition, we have identified use cases for which dynamic provisioning of HPC resources may be of great value, for example, supporting compliance-regulated content and integrating software-defined network controls.
Using the Barcelona Slurm Workload Simulator at CSCS--Porting, Re-engineering and Set up
Massimo Benini, Stephen Trofinoff and Gilles Fourestey (CSCS)
Several years ago, BSC produced a beta version of a Slurm workload simulator. Consisting of some modifications to the Slurm code base at the time plus some additional daemons and tools, it can simulate the scheduling of various workloads in reduced time. Thus, it can give its user an idea of how Slurm will perform with its current configuration under different workloads. For any site that uses Slurm, especially sites such as CSCS with varied and sometimes complex Slurm installations, this tool has the potential to give valuable insight into what works best and what does not. New development, however, ceased while the simulator was still in beta form. This report/presentation discusses some of the technical details and challenges encountered as part of CSCS’s effort to utilize this tool.
Slurm Roadmap
Morris Jette and Danny Auble (SchedMD)
Yiannis Georgiou (Bull)
This presentation will describe new capabilities planned for future releases of Slurm:
- Xeon Phi Knights Landing support
- Greater control over a computer’s power consumption including power floor and controlling the rate of change (Cray only)
- Control over a job’s frequency limits based upon its QOS
- Inter-cluster job management, supporting jobs submitted to multiple clusters and started on the first cluster where resources actually become available, as well as job dependencies between jobs on different clusters
- Dynamic runtime settings environment
- Support of VM and containers management within Slurm (HPC, Cloud/Big Data)
- Deploy Big Data workflow upon HPC infrastructure
Federated Cluster Scheduling
Brian Christiansen and Morris Jette (SchedMD)
Slurm has provided support for resource management across multiple clusters, but with notable limitations. We have designed Slurm enhancements to eliminate these limitations in a scalable and reliable fashion while increasing both system utilization and responsiveness. This design allows jobs to be submitted to multiple clusters, with their execution host determination delayed until execution can actually begin, which optimizes responsiveness in the face of workload changes. Unique enterprise-wide job IDs will be used to permit rapid enterprise-wide job operations such as job dependencies, status reports, and cancellation. Finally, each cluster operates with a great deal of autonomy. A limited number of inter-cluster operations are coordinated directly between the Slurm daemons managing each individual cluster. A single centralized daemon is only required to provide initial message routing information when the individual clusters start. We anticipate the overhead of this design to be sufficiently low for Slurm to retain the ability to execute hundreds of jobs per second per cluster. An overview of the design will be presented along with an analysis of its capabilities.
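For reference, the existing multi-cluster mechanism that this design builds on looks like the following; the cluster names are placeholders.

```
# Submit to several clusters; the job is routed to the one expected to
# start it earliest.
sbatch --clusters=cluster_a,cluster_b,cluster_c job.sh

# Query jobs across clusters.
squeue --clusters=cluster_a,cluster_b,cluster_c
```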
Support for heterogeneous resources and MPMD model
Rod Schultz, Martin Perry, Bill Brophy, Doug Parisek, Yiannis Georgiou (BULL)
Matthieu Hautreux (CEA)
Slurm, in its current stable versions, provides support for the SPMD model (Single Program Multiple Data) as well as limited MPMD (Multiple Program Multiple Data) support. By limited MPMD support, we mean that although users can specify different binaries to be used within a parallel job, all tasks are currently associated with the same resource requirements. These approaches are not well suited to managing complex jobs with different, heterogeneous resource requirements per task. For example, users wishing to leverage different types of hardware resources inside the same MPI application, with part of their code running on GPUs, another part on standard CPUs with 2GB per core, and a last part on CPUs with 8GB per core, have to request the most complete set of resources for each task, wasting some of the hardware on tasks that do not need all of it. In some cases, the total configuration required to run such a job does not even exist, as all the nodes of the cluster may not provide all the hardware features. The presentation will provide the current progress of our studies and developments towards the support of heterogeneous resources and the MPMD model within Slurm.
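The limited MPMD support mentioned above corresponds to srun's --multi-prog mode, sketched here with placeholder binaries; note that every task still shares the single resource request of the srun call.

```
# mpmd.conf maps task ranks to commands.
cat > mpmd.conf <<'EOF'
0     ./master
1-63  ./worker
EOF
srun --ntasks=64 --multi-prog mpmd.conf
```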
Experiences of Native SLURM on the NERSC Edison Cray XC30
Douglas Jacobsen, James Botts (NERSC, Lawrence Berkeley National Laboratory)
The authors deployed Native SLURM on a Cray test system earlier this year and have been able to evaluate most aspects of running SLURM in this environment. Leveraging that experience, the authors deployed Native SLURM on the NERSC production XC30, edison, during a day of dedicated time. The primary goal was to measure the effectiveness of Native SLURM on a large, 5572 compute node, production system.
A simulation of the production workflow was used to load the system to typical NERSC production levels, and measurements were taken of system efficiency and responsiveness. The SLURM configuration was tuned during this run, and much was learned about how to effectively run Native SLURM on a large system.
Brigham Young University Site Report
Ryan Cox (Brigham Young University)
- BYU’s Slurm configuration, including information about our Lua job_submit script (see the sketch after this list)
- Development work to catch ssh-launched processes and adopt them into Slurm for resource tracking/enforcement
- New web-based tools for account coordinators to manage their accounts, soon to be available on github
- A new tool to gather job performance data, visualize the data on a web page, and automatically alert admins about anomalies
- Questions and answers about Fair Tree
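The Lua job_submit hook referenced above is enabled through slurm.conf; a minimal sketch, with the site policy itself living in job_submit.lua.

```
# Load the Lua job submission plugin; Slurm then runs job_submit.lua from
# its configuration directory for every submitted job.
JobSubmitPlugins=lua
```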
University of South Florida Site Report
John Desantis (University of South Florida)
Research Computing, an organization within Information Technology at the University of South Florida, was facing several issues with its production scheduler:
- Long delays in dispatching jobs;
- Scheduler processes continually consuming large amounts of CPU resources;
- Reservations needing to be scheduled via cron jobs;
- Lots of fragmentation in resources;
- Difficult to explain priority calculation to users;
- Complicated preemption policies.
Research Computing began looking at Slurm as a replacement scheduling system in late July 2014. An alpha test cluster was deployed on old hardware and Slurm testers were recruited. Within 1 month, the benefits of Slurm over the previous scheduler were clear:
- Ease of deployment and upgrades;
- Rich accounting system;
- Predictable preemption;
- Multifactor plugins (MaxJobAge, FavorSmallJobs, etc) work as expected;
- QOS system controlled jobs as configured;
- Priorities via QOS and other multifactor weights are easily explainable to users;
- Low system overhead for scheduler processes.
Research Computing moved its full production environment to Slurm in March 2015. In addition to providing insights into the benefits of Slurm, this presentation will discuss implementation issues and solutions.
NASA Center for Climate Simulation (NCCS) Site Report
Bruce Pfaff (NASA Center for Climate Simulation)
The NASA Center for Climate Simulation (NCCS) converted to Slurm in the fall of 2013. Since then, we have migrated from version 2.x to 14.x and have converted from a partition centric scheduling regime to a QOS centric regime.
While serving a diverse user base and supporting a variety of local modeling and scientific research codes, we have attempted to achieve our scheduling goals by using existing Slurm features, without making extensive local modifications.
This site report will review our migration, user experiences, and our current QOS based approach to scheduling, as well as configuration changes made since then to support multiple hardware upgrades and running with multiple versions of the OS.
Jülich Supercomputing Center Site Report
Dorian Krause (Jülich Supercomputing Center)
The Jülich Supercomputing Center (JSC) at Forschungszentrum Jülich operates several top-class supercomputers with a number of different workload and resource management systems. Since 2013 JSC has been evaluating Slurm as a workload manager for future cluster systems in combination with JSC’s custom software stack. In autumn 2014 the first user-accessible cluster with the Slurm workload manager was deployed. Currently JSC is in the process of installing its next-generation general-purpose supercomputer, with a peak performance of 2 petaflops, which will be the first large-scale production system at JSC to leverage Slurm. In this site report we will discuss our batch system setup, experiences with Slurm gained since 2013, and possibly outline our ideas for the evolution of our Slurm deployments.
Slurm Experiences on GUANE-1 (GPUS Unified Advanced Environment for Supercomputing)
Gilberto Javier Diaz Toro, Carlos Jaime Barrios Hernandez (Universidad Industrial de Santander)
GUANE-1 is our main supercomputing platform, based on NVIDIA GPUs with 128 TESLA cards. The platform entered operation in 2012 (initially with 64 GPUs) and was upgraded in 2013 to improve processing, network, and storage performance while keeping the same power consumption. To date, GUANE-1 offers 205 TFLOPS of peak performance.
Since December 2014, after an extensive evaluation procedure, GUANE-1 has run Slurm in cohabitation with the OAR workload manager. Experience over these six months shows that users prefer Slurm (today there are 687 active users, of whom 70% use Slurm). On the management and support side, the maintenance process, fault-tolerance implementation, in-house plugins, interconnection with other platforms (including interoperability with grid and other large-scale platforms), and scalability allow good performance and relatively easy operation of the platform.
This presentation covers performance evaluations, experiences, and open questions about using Slurm both in cohabitation with other workload schedulers and on its own, based on work performed on the SC3UIS platforms, mainly the GUANE-1 supercomputing infrastructure.
The George Washington University Site Report
Tim Wickberg (The George Washington University)
The George Washington University is proud to host the 2015 user group meeting in Washington, DC. We present a brief overview of our use of Slurm on Colonial One, our university-wide shared HPC cluster. We present both a detailed overview of our use and configuration of the "fairshare" priority model to assign resources across disparate participating schools, colleges, and research centers, and some novel uses of the scheduler for non-traditional tasks such as file system backups.
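A generic multifactor/fairshare configuration sketch of the kind described; the weights are placeholders, not Colonial One's actual values.

```
# slurm.conf: multifactor priority with fairshare dominating.
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=10000
```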
Last modified 17 August 2015