Chapter 9. Job Scheduling with SLURM

Table of Contents

9.1. Overview
9.2. Using SLURM
SLURM Jargon
Cluster Status
Job Status
Using top
Job Submission
Terminating a Job
Terminating Stray Processes
Viewing Output of Active Jobs
Checking Job Stats with sacct
Checking Status of Running Jobs with scontrol
Job Sequences
Self-test
9.3. Local Customizations

Before You Begin

Before reading this chapter, you should be familiar with basic Unix concepts (Chapter 3, Using Unix), the Unix shell (the section called “Command Line Interfaces (CLIs): Unix Shells”), redirection (the section called “Redirection and Pipes”), shell scripting (Chapter 4, Unix Shell Scripting), and job scheduling (Chapter 7, Job Scheduling).

Overview

SLURM is an acronym for Simple Linux Utility for Resource Management. SLURM manages resources such as CPU cores and memory on a cluster. Originally developed on Linux with the lessons of earlier job schedulers in mind, SLURM is also supported on FreeBSD and NetBSD, and has been used on Mac OS X. SLURM is also rich in advanced features, so the 'S' and the 'L' in SLURM could safely be removed at this point.
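To make "resources such as CPU cores and memory" concrete, here is a minimal sketch of a SLURM batch script. A job states the cores and memory it needs in #SBATCH lines, and SLURM reserves those resources before running it. The job name, the resource amounts, and the file name hello.sbatch are hypothetical choices for illustration, not site defaults. The SLURM_CPUS_PER_TASK variable is set by SLURM when --cpus-per-task is requested, and the script falls back to 1 so it can also be tried outside a cluster.

```shell
#!/bin/sh -e
# Hypothetical example job: name and resource amounts are illustrative only.
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G

# sbatch reads the #SBATCH lines above; the shell treats them as comments.
# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given.
# Default to 1 so the script also runs outside SLURM for testing.
cores=${SLURM_CPUS_PER_TASK:-1}
printf 'Running on %s with %s core(s)\n' "$(uname -n)" "$cores"
```

On a SLURM cluster this would be submitted with a command such as "sbatch hello.sbatch". Job submission is covered in detail later in this chapter.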

For complete information, see the official SLURM documentation at http://slurm.schedmd.com/.

This chapter covers the basics of SLURM along with some extensions provided by SPCM (Simple, Portable Cluster Manager), a set of tools for building and managing SLURM clusters. SPCM was developed on CentOS Linux and FreeBSD, but is designed to be easily adapted to any POSIX platform. Some convenient tools provided by SPCM are mentioned below.