kslurm is a high-performance assistant for slurm-powered compute clusters. It simplifies job requests, manages python environments, and makes your cluster workflow smooth and efficient.
Intuitive, uniform commands to schedule batch jobs, request interactive sessions, and fire up jupyter notebooks
Tools for python venv management specially adapted for cluster life.
A container manager for pulling and organizing singularity containers from Docker Hub
kslurm is best installed through pipx:
pipx install kslurm
After installing, use
kpy bash >> $HOME/.bash_profile
to initialize bash integrations.
See here for a detailed installation guide.
Slurm provides a very powerful interface for compute cluster job scheduling, most of which is unnecessary for daily computing needs. kslurm simplifies these commands into an intuitive syntax.
Consider the following examples:
Requesting an interactive session
An interactive session can be requested with slurm using the
salloc command. Often, you'll need to request extra memory and time unless your task is very minimal:
salloc --mem=4G --time=3:00:00 --account def-account
kslurm uses 4G of memory and a 3hr runtime as the defaults, so the above command becomes simply:
krun
Running a specific command interactively
Typically you'd use
srun to achieve this. In kslurm, the same command
krun is used both for requesting an open interactive console and for running a specific job interactively:
krun python my-script.py
Specifically, with no command provided, krun uses salloc, but uses srun when a command is given.
Requesting more resources
slurm resources are specified via traditional command line keywords:
--mem=<num> for memory,
--cpus-per-task=<num> for cpus, and so forth. kslurm uses a pattern-matching syntax that lets you specify the three most common resources (runtime, memory, and number of cpus) in a flash.
For example, memory is specified as a number plus a unit, e.g. 16G.
cpus is a simple number with no unit, e.g. 4:
krun 16G 4
Runtime is requested as hours and minutes, e.g. 2:00:
krun 16G 4 2:00
The above command starts an interactive session with 16G of memory and 4 cores, with a runtime of 2hr.
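Because the three token shapes (number plus unit, bare number, colon-separated time) are unambiguous, a parser can sort them without keyword flags. The following is an illustrative Python sketch of the idea only; it is not kslurm's actual implementation:

```python
import re

# Illustrative token patterns (assumed for this sketch, not kslurm's source):
MEM = re.compile(r"^\d+[MG]$")          # e.g. 16G or 500M -> memory
TIME = re.compile(r"^(\d+):(\d{2})$")   # e.g. 2:00 -> 2hr runtime
CPUS = re.compile(r"^\d+$")             # bare number -> cpu count

def classify(tokens):
    """Map kslurm-style resource tokens onto slurm-style settings."""
    resources = {}
    for tok in tokens:
        if MEM.match(tok):
            resources["mem"] = tok
        elif m := TIME.match(tok):
            hours, minutes = m.groups()
            resources["time"] = f"{hours}:{minutes}:00"
        elif CPUS.match(tok):
            resources["cpus-per-task"] = int(tok)
    return resources
```

With this sketch, the tokens from the command above sort themselves out: classify(["16G", "4", "2:00"]) yields memory 16G, 4 cpus, and a 2:00:00 runtime, regardless of the order the tokens are given in.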
Interactive sessions are not intended for long-running jobs (e.g. anything longer than 3 hrs); those should be scheduled instead (see below). Check your organization's policies for specific length guidelines.
Scheduling a longer job
Most jobs on the cluster are submitted to the scheduler via
sbatch. This command typically requires you to write a bash script containing the commands to run. While that's still appropriate for complex, multistep jobs, it's overkill for single-process programs with long runtimes, such as those commonly used in neuroimaging analysis (e.g. running FreeSurfer). For these applications, kslurm provides kbatch:
kbatch 12:00 16 24G recon-all <recon-all-args>
The above command schedules a 12hr job with 16 cores and 24G of memory. Once started, the job will run recon-all.
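As a rough mental model, a kbatch call like the one above maps onto a single sbatch invocation. Here is a hypothetical Python sketch of that translation; the flags are real slurm options, but the wrapper function (and the sample recon-all argument) are illustrative, not kbatch's source:

```python
def to_sbatch(time, cpus, mem, command):
    """Translate kbatch-style resources into an sbatch argument list (sketch)."""
    hours, minutes = time.split(":")
    return [
        "sbatch",
        f"--time={hours}:{minutes}:00",     # 12:00 -> 12:00:00
        f"--cpus-per-task={cpus}",
        f"--mem={mem}",
        "--wrap", " ".join(command),        # sbatch --wrap runs a command string
    ]
```

For example, to_sbatch("12:00", 16, "24G", ["recon-all", "subj01"]) produces the argument list for a 12hr, 16-core, 24G job that runs recon-all on a hypothetical subject.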
Other slurm args?
The underlying slurm commands (salloc, srun, and sbatch) are highly configurable, each with a long list of available args. While most of these don't have kslurm counterparts,
krun and kbatch will still accept slurm arguments and pass them on to the respective slurm command.
Compute environments present a few unique challenges to python environment management:
Central venv management in the home directory is prevented by space limitations (e.g. most clusters discourage the use of anaconda)
The most performant location to install venvs is the local scratch dir of a compute node, but these directories are deleted as soon as the job finishes.
requirements.txt files help with reproducibility, but installing a venv from scratch takes a few minutes, a delay that's especially annoying for interactive work.
Compute nodes often don't have internet connection, so installing any arbitrary package from the pip index is not possible.
kslurm addresses these problems through
kpy, a set of commands to save and reload entire virtual environments as tar files. A venv can be composed on a login node, using the available internet connection, then saved and reloaded as needed on a compute node. Get all the details here!
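The core workflow can be pictured as packing and unpacking the venv directory. This is a minimal Python sketch of the idea under that assumption, not kpy's actual code:

```python
import tarfile
from pathlib import Path

# Sketch only: pack a venv directory into a tar archive on the login node,
# then unpack it later (e.g. into a compute node's local scratch dir).
def save_venv(venv_dir: Path, archive: Path) -> None:
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(venv_dir, arcname=venv_dir.name)

def load_venv(archive: Path, dest: Path) -> None:
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
```

Note that a venv's activation scripts and entry points hardcode absolute paths, so a real tool also has to rewrite those paths when the venv is unpacked in a new location; the sketch above ignores that step.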
Singularity (now Apptainer) is a container runner that lets you install and run software in easily reproducible environments. kslurm assists with container management via the
kapp command. kapp allows you to pull containers directly from Docker Hub without worrying about login node memory limits. It resolves and sorts tags and makes it easy to run containers from anywhere on the cluster. It also integrates with snakemake, providing a directory of images that snakemake can directly consume using its
--singularity-prefix parameter. See the full page here.