In order to use any new system, one must become familiar with a few definitions. Below are some terms used throughout this document that are necessary for understanding LPJS. This chapter assumes that the reader is familiar with the material in Chapter 6, Parallel Computing and Chapter 7, Job Scheduling, which cover the general concepts of HPC and HTC.
A node is a single computer in the cluster or grid.
A job is the execution of a program under the LPJS scheduler. Each job is assigned a unique integer job ID. A job is analogous to a Unix process, but not the same. Job IDs are not Unix process IDs, and a job may entail more than one Unix process, if it is a parallel program. There are three kinds of LPJS jobs:
A serial job runs a single Unix process.
A shared memory parallel job runs multiple cooperating Unix processes or threads on the same node. They most commonly use pthreads (POSIX threads) or OpenMP (not to be confused with OpenMPI), but any shared memory parallel programming API is possible.
A distributed parallel job runs multiple cooperating Unix processes, possibly on different nodes. These most commonly use MPI (Message Passing Interface), a set of libraries and tools for creating parallel programs. There are multiple implementations of MPI, including MPICH, OpenMPI, etc. The processes that make up an MPI job can be on the same node or on different nodes. A typical MPI job runs multiple processes on each of multiple nodes.
A submission refers to all jobs created by one lpjs submit command. A submission is not an entity in the LPJS scheduler, but only a concept used in this document. The only unit of work managed by LPJS is the job.
The lpjs nodes command lists available compute nodes and available resources on each node. The following shows a Unix workstation (barracuda) and a Mac Mini (tarpon) being used as compute nodes on a local home network:
shell-prompt: lpjs nodes
Hostname              State  Procs Used PhysMiB Used OS      Arch
barracuda.acadix.biz  Up         4    0   16350    0 FreeBSD amd64
tarpon.acadix.biz     Up         8    0    8192    0 Darwin  arm64
Total                 Up        12    0   24542    0 -       -
Total                 Down       0    0       0    0 -       -
The following shows a small cluster consisting of dedicated Dell PowerEdge servers:
shell-prompt: lpjs nodes
Hostname              State  Procs Used PhysMiB Used OS      Arch
compute-001.albacore  Up        16    0   65476    0 FreeBSD amd64
compute-002.albacore  Up        16    0   65477    0 FreeBSD amd64
compute-003.albacore  Up        16    0   65477    0 FreeBSD amd64
compute-004.albacore  Up        16    0   65477    0 FreeBSD amd64
compute-005.albacore  Up        16    0  131012    0 FreeBSD amd64
compute-006.albacore  Up        16    0  131012    0 FreeBSD amd64
Total                 Up        96    0  523931    0 -       -
Total                 Down       0    0       0    0 -       -
The lpjs jobs command shows currently pending (waiting to start) and running jobs. Below, we see an RNA-Seq adapter trimming job utilizing our workstation and Mac Mini to trim six files at once.
shell-prompt: lpjs jobs
Legend: P = processor  J = job  N = node

Pending

JobID IDX Jobs P/J P/N MiB/P User  Compute-node         Script
  169   7   18   2   2    10 bacon TBD                  04-trim.lpjs
  170   8   18   2   2    10 bacon TBD                  04-trim.lpjs
  171   9   18   2   2    10 bacon TBD                  04-trim.lpjs
  172  10   18   2   2    10 bacon TBD                  04-trim.lpjs
  173  11   18   2   2    10 bacon TBD                  04-trim.lpjs
  174  12   18   2   2    10 bacon TBD                  04-trim.lpjs
  175  13   18   2   2    10 bacon TBD                  04-trim.lpjs
  176  14   18   2   2    10 bacon TBD                  04-trim.lpjs
  177  15   18   2   2    10 bacon TBD                  04-trim.lpjs
  178  16   18   2   2    10 bacon TBD                  04-trim.lpjs
  179  17   18   2   2    10 bacon TBD                  04-trim.lpjs
  180  18   18   2   2    10 bacon TBD                  04-trim.lpjs

Running

JobID IDX Jobs P/J P/N MiB/P User  Compute-node         Script
  163   1   18   2   2    10 bacon barracuda.acadix.biz 04-trim.lpjs
  164   2   18   2   2    10 bacon barracuda.acadix.biz 04-trim.lpjs
  165   3   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs
  166   4   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs
  167   5   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs
  168   6   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs
Below, we see an RNA-Seq adapter trimming job utilizing our cluster to trim all eighteen of our files simultaneously. This should get the job done much faster than the two computers in the previous example.
shell-prompt: lpjs jobs
Legend: P = processor  J = job  N = node

Pending

JobID IDX Jobs P/J P/N MiB/P User  Compute-node         Script

Running

JobID IDX Jobs P/J P/N MiB/P User  Compute-node         Script
   19   1   18   3   3    50 bacon compute-001.albacore 04-trim.lpjs
   20   2   18   3   3    50 bacon compute-001.albacore 04-trim.lpjs
   21   3   18   3   3    50 bacon compute-001.albacore 04-trim.lpjs
   22   4   18   3   3    50 bacon compute-001.albacore 04-trim.lpjs
   23   5   18   3   3    50 bacon compute-001.albacore 04-trim.lpjs
   24   6   18   3   3    50 bacon compute-002.albacore 04-trim.lpjs
   25   7   18   3   3    50 bacon compute-002.albacore 04-trim.lpjs
   26   8   18   3   3    50 bacon compute-002.albacore 04-trim.lpjs
   27   9   18   3   3    50 bacon compute-002.albacore 04-trim.lpjs
   28  10   18   3   3    50 bacon compute-002.albacore 04-trim.lpjs
   29  11   18   3   3    50 bacon compute-003.albacore 04-trim.lpjs
   30  12   18   3   3    50 bacon compute-003.albacore 04-trim.lpjs
   31  13   18   3   3    50 bacon compute-003.albacore 04-trim.lpjs
   32  14   18   3   3    50 bacon compute-003.albacore 04-trim.lpjs
   33  15   18   3   3    50 bacon compute-003.albacore 04-trim.lpjs
   34  16   18   3   3    50 bacon compute-004.albacore 04-trim.lpjs
   35  17   18   3   3    50 bacon compute-004.albacore 04-trim.lpjs
   36  18   18   3   3    50 bacon compute-004.albacore 04-trim.lpjs
It is important to know how many processors and how much memory are being used by the Unix processes that make up our jobs. For this, we use standard Unix process monitoring tools, such as top. In order to do so, we must know which compute nodes are being used by the job. This is shown by the lpjs jobs command described above. We must also have the ability to run commands manually on the compute nodes. This can often be done using ssh. Use the ssh -t flag to enable full terminal control, which is required by top.
shell-prompt: ssh -t compute-001 top

last pid: 11680;  load averages:  7.12,  6.36,  3.54    up 0+01:57:38  19:22:18
75 processes:  10 running, 65 sleeping
CPU:  3.5% user,  0.0% nice,  0.5% system,  0.1% interrupt, 95.9% idle
Mem: 128M Active, 1320M Inact, 4416M Wired, 1572M Buf, 57G Free
ARC: 1024M Total, 56M MFU, 766M MRU, 188M Anon, 10M Header, 2422K Other
     748M Compressed, 976M Uncompressed, 1.31:1 Ratio
Swap: 5120M Total, 5120M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
11590 bacon       1  68    0    13M  2664K piperd   6   4:10  52.98% fastq-tr
11631 bacon       1 111    0    13M  2664K CPU8     8   4:07  49.07% fastq-tr
11640 bacon       1 109    0    13M  2656K CPU7     7   4:10  48.49% fastq-tr
11605 bacon       1 110    0    13M  2664K CPU6     6   4:06  47.17% fastq-tr
11591 bacon       1 108    0    13M  2660K CPU9     9   4:08  43.16% fastq-tr
11617 bacon       1  47    0    14M  3916K piperd  11   2:08  26.76% gzip
11607 bacon       1  45    0    14M  3904K piperd   0   2:06  26.46% gzip
11646 bacon       1  44    0    14M  3908K piperd   5   2:09  24.27% gzip
11644 bacon       1  45    0    14M  3900K CPU4     4   2:07  23.49% gzip
11647 bacon       1  48    0    14M  3892K piperd  14   2:11  23.39% gzip
11618 bacon       1  45    0    14M  3908K CPU10   10   2:05  23.29% gzip
11635 bacon       1  47    0    14M  3916K piperd  10   2:06  23.00% gzip
11642 bacon       1  47    0    14M  3912K piperd  15   2:05  23.00% gzip
11634 bacon       1  42    0    14M  3916K piperd   2   2:10  21.09% gzip
11610 bacon       1  40    0    14M  3908K CPU5     5   2:05  20.65% gzip
11608 bacon       1  36    0    25M    11M zio->i   0   1:15  15.67% xzcat
11592 bacon       1  32    0    25M    11M select   7   1:14  15.58% xzcat
11643 bacon       1  34    0    25M    11M select   0   1:16  14.26% xzcat
TBD: Document top-job when LPJS is integrated with SPCM.
All jobs under LPJS are described by a script that contains some special directives to describe the LPJS job parameters. Otherwise, it is an ordinary script, which is run on each compute node selected for the submission.
An LPJS job script can be written in any scripting language that sees "#lpjs" as a comment.
As LPJS is a cross-platform job scheduler, it is strongly recommended that scripts be written in a portable language and without any operating system specific features. The easiest solution is to use POSIX Bourne shell, which is supported by all Unix-like systems, and is described in Chapter 4, Unix Shell Scripting.
#!/bin/sh -e

#lpjs jobs 10
#lpjs processors-per-job 1
#lpjs threads-per-process 1
#lpjs pmem-per-processor 100MiB
#lpjs path /usr/local/bin:/usr/bin:/bin

my-program my-arguments
It may be tempting to use a more advanced shell language, but doing so may be problematic on some clusters or grids, as some compute nodes may not have the shell installed, or it may behave differently under different operating systems due to different versions of the shell, or differences between the operating systems. Features of advanced shells are rarely useful in HPC/HTC anyway, since the scripts tend to be short and simple. POSIX Bourne shell is more than adequate for most jobs.
LPJS job parameters are specified in the script using lines that begin with "#lpjs". To the scripting language, these are comments, so they are ignored when the script is run on a compute node. The lpjs submit command, however, extracts these lines from the script and uses the information to set the job's resource requirements and other parameters.
The parameters listed below are required by LPJS. Job submissions will fail if any of them are missing from the script.
#lpjs jobs: Number of jobs to run in one submission. Each job is assigned a different job ID and an array index beginning with 1.
A submission with jobs > 1 is known as a job array.
#lpjs processors-per-job: Number of processors to allocate for each job in the submission. A processor is whatever the operating system defines it to be. In most situations, it is a logical processor, which may be affected by SMT (simultaneous multithreading, known as hyper-threading on Intel processors). With SMT, a physical CPU core is treated as two or more logical processors by most operating systems. The same core can execute more than one machine instruction at the same time, as long as the instructions don't contend for the same CPU components. SMT is usually disabled on HPC clusters and HTC grids, due to its limited performance benefits and increased contention for memory and other resources.
In general terms, a processor is what the operating system uses to run a process, so the number of processes actually running at any given moment cannot exceed the number of processors. Other processes must wait in a queue for their next turn to use a processor.
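If you are unsure how many logical processors a node provides, you can ask the operating system directly. The following is a minimal sketch, not part of LPJS itself; the appropriate command depends on the platform, and getconf _NPROCESSORS_ONLN is widely supported but not strictly required by POSIX.

# FreeBSD and macOS: report the number of logical processors
sysctl -n hw.ncpu

# Most Linux systems
nproc

# Works on most Unix-like systems
getconf _NPROCESSORS_ONLN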
#lpjs threads-per-process: Minimum number of processors that must be on the same node. For shared memory multiprocessing, where all processes or threads must be on the same node, this must be equal to processors-per-job.
#lpjs processors-per-job 8
#lpjs threads-per-process 8
LPJS allows you to use "processors-per-job" as a value here, so you don't need to remember to change both parameters in the future.
#lpjs processors-per-job 8
#lpjs threads-per-process processors-per-job
Distributed parallel programs, such as MPI programs, may use threads-per-process < processors-per-job. Setting threads-per-process to 1 will allow the most flexible scheduling of available processors for an MPI job, which may allow it to start sooner.
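As a rough illustration, a distributed parallel submission might look like the sketch below. The program name is hypothetical, and how mpirun discovers the nodes and processors allocated by LPJS depends on your MPI implementation and site configuration, so consult your systems manager before adapting it.

#!/bin/sh -e

#lpjs jobs 1
#lpjs processors-per-job 16
# Processes may be spread across nodes, so require only 1 processor per node
#lpjs threads-per-process 1
#lpjs pmem-per-processor 100MiB

# Hypothetical MPI program; LPJS_PROCS_PER_JOB is set by LPJS
mpirun -np $LPJS_PROCS_PER_JOB my-mpi-program arguments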
#lpjs pmem-per-processor: Physical memory (RAM) per process. This refers to the actual amount of RAM (electronic memory) used by a process. All Unix-like (POSIX) operating systems use virtual memory, where the most active parts of a process are in physical memory and less active parts may be swapped out to disk or other slower storage.
It is important to set this parameter correctly, as requesting the wrong amount of physical memory can cause serious problems:
If pmem-per-processor is set too high, then your job is hoarding memory resources that it isn't using, which may prevent other jobs from running.
If pmem-per-processor is set too low, then the compute node will be oversubscribed. This will cause processes to run slowly or crash, and may even cause the node to crash.
The only way to determine the correct pmem-per-processor is by doing sample runs of your program with the same inputs used by the job, and observing the memory use using tools such as top. If possible, this should be done on a workstation rather than on a cluster or grid, so that these test runs don't negatively impact other jobs.
Then set pmem-per-processor slightly larger than the observed maximum (maybe 10% to 20%).
Note that top may show both virtual and physical memory use, in different columns. FreeBSD top shows virtual memory under the "SIZE" header and physical memory under "RES". Physical memory is also known as resident memory in Unix, i.e. the portion of the program that resides in real memory rather than swap. Linux top shows virtual memory under "VIRT" and physical memory under "RES".
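As a worked example (the numbers here are hypothetical), suppose test runs on your workstation show a maximum RES of about 850 MiB per process in top. Adding roughly 15% headroom gives 850 * 1.15 = ~978 MiB, which rounds up to a convenient value:

# Observed maximum resident memory: ~850 MiB per process
# 850 MiB * 1.15 = ~978 MiB; round up to 1000 MiB
#lpjs pmem-per-processor 1000MiB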
There are additional, optional parameters as well.
#lpjs path: Sets the environment variable PATH on the compute node before running the script.
Note that this is not quite the same as setting
PATH in the script, since #lpjs path
sets it before the script begins running, and before
pull-command
is executed. Hence, this
could affect which version of rsync
is used for pull-command
.
#lpjs push-command: File transfer command for transferring output files from a compute node to the submit node after a job completes.
#lpjs push-command rsync -av %w/Outputs/ %h:%s/Results/Outputs
The "%w" represents the temporary working directory on the compute node.
The "%s" represents the hostname of the submit node.
The "%s" represents the directory from which the job was submitted on the submit node.
You can choose any file transfer command you like, sending the output to some other server or directory.
Examples:
#lpjs push-command scp -r %w %h:/jobs/data/username/Results
#lpjs push-command scp -r %w myworkstation.my.domain:/jobs/data/username/Results
#lpjs pull-command: Like push-command, but pulls files to the compute node before the job script executes.
#lpjs pull-command rsync -r --copy-links %h:%s/Inputs/ %w/Inputs
#lpjs log-dir: Sets the parent directory for job logs. Each job creates a subdirectory under this directory containing the chaperone log, a copy of the job script created at the time of submission, and the stdout and stderr output from the script.
The log directory name may not contain whitespace.
The default is <working-directory>/LPJS-logs/script-name/Job-jobid.
Example:
#lpjs log-dir Logs/04-trim
LPJS will not dispatch a job until enough processors and memory are available.
Note that the total processors and memory required by a submission are irrelevant to scheduling. All jobs, even those from the same submission, are scheduled independently. LPJS will dispatch as many jobs as possible from a given submission, and the rest will wait in the queue until sufficient resources become available.
Memory requirements must specify units, which can be MB (megabytes, 10^6 bytes), MiB (mebibytes, 2^20 bytes), GB (gigabytes, 10^9 bytes), or GiB (gibibytes, 2^30 bytes).
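For example, since 1 GiB = 1024 MiB, the following two directives request the same amount of physical memory per processor:

#lpjs pmem-per-processor 1GiB
#lpjs pmem-per-processor 1024MiB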
Other schedulers have default units, which may seem like a convenience, but this often leads to confusion and errors in practice.
LPJS sets a number of environment variables, which, like all environment variables, are inherited by child processes. In this way, information about the job is passed from LPJS to the programs run by your job scripts.
Note: LPJS_PMEM_PER_PROC is shown in MiB.
#!/bin/sh -e

#lpjs jobs 1
#lpjs processors-per-job 3
#lpjs threads-per-process processors-per-job
#lpjs pmem-per-processor 50MiB

# Print all environment variables and filter for those containing LPJS_
printenv | grep LPJS_
LPJS_JOB_COUNT=1
LPJS_COMPUTE_NODE=TBD
LPJS_PUSH_COMMAND=rsync -av %w/ %h:%s
LPJS_PMEM_PER_PROC=9
LPJS_MIN_PROCS_PER_NODE=1
LPJS_PRIMARY_GROUP_NAME=bacon
LPJS_SUBMIT_HOST=moray.acadix.biz
LPJS_JOB_LOG_DIR=LPJS-logs/env
LPJS_USER_NAME=bacon
LPJS_SUBMIT_DIRECTORY=/home/bacon
LPJS_JOB_ID=1983
LPJS_SCRIPT_NAME=env.lpjs
LPJS_PROCS_PER_JOB=1
LPJS_ARRAY_INDEX=1
LPJS_HOME_DIR=/home/bacon
Shell scripts can use these variables directly, as described in Chapter 4, Unix Shell Scripting.
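As a minimal sketch (not taken from the LPJS documentation), a job script might use these variables to tell a program how many processors and how much memory it may use. The program name and its options below are hypothetical:

#!/bin/sh -e

#lpjs jobs 1
#lpjs processors-per-job 4
#lpjs threads-per-process processors-per-job
#lpjs pmem-per-processor 256MiB

# Total physical memory allocated to this job, in MiB
# (LPJS_PMEM_PER_PROC is reported in MiB)
total_mem=$(($LPJS_PMEM_PER_PROC * $LPJS_PROCS_PER_JOB))
printf "This job may use up to $LPJS_PROCS_PER_JOB processors and $total_mem MiB.\n"

# Hypothetical program that accepts thread-count and memory-limit options
my-program --threads $LPJS_PROCS_PER_JOB --max-mem-mib $total_mem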
A batch serial submission has both the jobs and processors-per-job parameters set to 1.
#!/bin/sh -e

#lpjs jobs 1
#lpjs processors-per-job 1
#lpjs threads-per-process 1
#lpjs pmem-per-processor 100MiB

my-serial-program arguments
A batch parallel submission, or job array, has a jobs parameter > 1. Individual jobs may be serial (1 process) or parallel (multiple processes).
When submitting a job array, it's helpful to have input and output filenames that contain integer indexes, like 1, 2, 3, etc.
shell-prompt: ls
input-1.fastq.xz   input-15.fastq.xz  input-20.fastq.xz  input-8.fastq.xz
input-10.fastq.xz  input-16.fastq.xz  input-3.fastq.xz   input-9.fastq.xz
input-11.fastq.xz  input-17.fastq.xz  input-4.fastq.xz
input-12.fastq.xz  input-18.fastq.xz  input-5.fastq.xz
input-13.fastq.xz  input-19.fastq.xz  input-6.fastq.xz
input-14.fastq.xz  input-2.fastq.xz   input-7.fastq.xz
Note that the listing above does not show the filenames in numeric order. This is because they are sorted lexically (more or less alphabetically) rather than numerically. Lexically, "10" is less than "9", because the names are compared as strings, not numbers, and "1" comes before "9", just as "A" comes before "B". This is a quirk of ls and of many programming/scripting constructs.
#!/bin/sh -e

for file in *.xz; do
    echo $file
done
input-1.fastq.xz
input-10.fastq.xz
input-2.fastq.xz
input-3.fastq.xz
input-4.fastq.xz
input-5.fastq.xz
input-6.fastq.xz
input-7.fastq.xz
input-8.fastq.xz
input-9.fastq.xz
Sometimes this is preferable, but it's a problem if you need to process files in numeric order. You will know whether it matters when you actually start to work with your input files. It will be an issue any time the indexes contain a variable number of digits. It can be solved by simply left-padding the numbers with 0s, so that all the numbers have the same number of digits. This makes the lexical and numeric orders the same.
shell-prompt: ls
input-001.fastq.xz  input-007.fastq.xz  input-013.fastq.xz  input-019.fastq.xz
input-002.fastq.xz  input-008.fastq.xz  input-014.fastq.xz  input-020.fastq.xz
input-003.fastq.xz  input-009.fastq.xz  input-015.fastq.xz
input-004.fastq.xz  input-010.fastq.xz  input-016.fastq.xz
input-005.fastq.xz  input-011.fastq.xz  input-017.fastq.xz
input-006.fastq.xz  input-012.fastq.xz  input-018.fastq.xz
Running the same script as above with the new filenames:
input-001.fastq.xz
input-002.fastq.xz
input-003.fastq.xz
input-004.fastq.xz
input-005.fastq.xz
input-006.fastq.xz
input-007.fastq.xz
input-008.fastq.xz
input-009.fastq.xz
input-010.fastq.xz
If raw input filenames are cryptic, as they often are, you can simplify things by creating symbolic links with names that are both easier for people to read and easier for scripts and other programs to parse.
Consider the following FASTQ RNA sequence files, downloaded from the SRA (Sequence Read Archive) at NCBI. Note how the numbers embedded in the filenames appear somewhat sequential, with mostly increments of 7, but not always.
shell-prompt: ls
ERR458493.fastq.gz  ERR458528.fastq.gz  ERR458563.fastq.gz  ERR458906.fastq.gz
ERR458500.fastq.gz  ERR458535.fastq.gz  ERR458878.fastq.gz  ERR458913.fastq.gz
ERR458507.fastq.gz  ERR458542.fastq.gz  ERR458885.fastq.gz  ERR458920.fastq.gz
ERR458514.fastq.gz  ERR458549.fastq.gz  ERR458892.fastq.gz  ERR458927.fastq.gz
ERR458521.fastq.gz  ERR458556.fastq.gz  ERR458899.fastq.gz  ERR458934.fastq.gz
Writing a script to utilize these numbers would be a calamity. Instead, we can generate new names, without losing the old ones, using symbolic links:
#!/bin/sh -e

index=1
for file in *.fastq.gz; do
    prefixed_index=$(printf "%03d" $index)
    ln -s $file input-$prefixed_index.fastq.gz
    index=$(($index + 1))
done
shell-prompt: ls
ERR458493.fastq.gz  ERR458563.fastq.gz  input-001.fastq.gz@  input-011.fastq.gz@
ERR458500.fastq.gz  ERR458878.fastq.gz  input-002.fastq.gz@  input-012.fastq.gz@
ERR458507.fastq.gz  ERR458885.fastq.gz  input-003.fastq.gz@  input-013.fastq.gz@
ERR458514.fastq.gz  ERR458892.fastq.gz  input-004.fastq.gz@  input-014.fastq.gz@
ERR458521.fastq.gz  ERR458899.fastq.gz  input-005.fastq.gz@  input-015.fastq.gz@
ERR458528.fastq.gz  ERR458906.fastq.gz  input-006.fastq.gz@  input-016.fastq.gz@
ERR458535.fastq.gz  ERR458913.fastq.gz  input-007.fastq.gz@  input-017.fastq.gz@
ERR458542.fastq.gz  ERR458920.fastq.gz  input-008.fastq.gz@  input-018.fastq.gz@
ERR458549.fastq.gz  ERR458927.fastq.gz  input-009.fastq.gz@  input-019.fastq.gz@
ERR458556.fastq.gz  ERR458934.fastq.gz  input-010.fastq.gz@  input-020.fastq.gz@
Now that we have links with rational filenames, specifying an input file in a job array is easy:
#!/bin/sh -e

#lpjs jobs 20
#lpjs processors-per-job 1
#lpjs threads-per-process 1
#lpjs pmem-per-processor 50MiB

# Add leading 0s to index provided by LPJS
index=$(printf "%03d" $LPJS_ARRAY_INDEX)
myprog --input input-$index.fastq.gz --output output-$index.fastq.zst
A multiprocessing job is a job with the processors-per-job parameter > 1. This could be shared memory (all processes or threads on the same node, threads-per-process = processors-per-job), or distributed parallel (processes may be spread across more than one node, threads-per-process <= processors-per-job).
#!/bin/sh -e

#################################################
# A submission of 10 shared memory parallel jobs
# Each job requires 5 processors
# LPJS will run as many of the 10 jobs as possible at the same time

# Array of 10 multiprocessing jobs
#lpjs jobs 10

# 5 processors per job
#lpjs processors-per-job 5

# All processors must be on the same node
#lpjs threads-per-process processors-per-job

#lpjs pmem-per-processor 100MiB

# Make sure the program uses only 5 processors (do not oversubscribe the processors)
OMP_NUM_THREADS=5
export OMP_NUM_THREADS
my-openmp-program arguments
The lpjs cancel command takes one or more job IDs or ranges of job IDs. To specify a range, separate two job IDs with a '-', and only a '-'. The range must be a single Unix shell argument, so it cannot contain any whitespace.
shell-prompt: lpjs jobs
Legend: P = processor  J = job  N = node

Pending

JobID IDX Jobs P/J P/N MiB/P User  Compute-node         Script
  169   7   18   2   2    10 bacon TBD                  04-trim.lpjs
  170   8   18   2   2    10 bacon TBD                  04-trim.lpjs
  171   9   18   2   2    10 bacon TBD                  04-trim.lpjs
  172  10   18   2   2    10 bacon TBD                  04-trim.lpjs
  173  11   18   2   2    10 bacon TBD                  04-trim.lpjs
  174  12   18   2   2    10 bacon TBD                  04-trim.lpjs
  175  13   18   2   2    10 bacon TBD                  04-trim.lpjs
  176  14   18   2   2    10 bacon TBD                  04-trim.lpjs
  177  15   18   2   2    10 bacon TBD                  04-trim.lpjs
  178  16   18   2   2    10 bacon TBD                  04-trim.lpjs
  179  17   18   2   2    10 bacon TBD                  04-trim.lpjs
  180  18   18   2   2    10 bacon TBD                  04-trim.lpjs

Running

JobID IDX Jobs P/J P/N MiB/P User  Compute-node         Script
  163   1   18   2   2    10 bacon barracuda.acadix.biz 04-trim.lpjs
  164   2   18   2   2    10 bacon barracuda.acadix.biz 04-trim.lpjs
  165   3   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs
  166   4   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs
  167   5   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs
  168   6   18   2   2    10 bacon tarpon.acadix.biz    04-trim.lpjs

# Cancel one job
shell-prompt: lpjs cancel 163

# Cancel all pending and running jobs
shell-prompt: lpjs cancel 163-180

# Cancel all jobs running on tarpon
shell-prompt: lpjs cancel 165-168

# Cancel a random sample of jobs for no good reason other than demonstration
shell-prompt: lpjs cancel 163-165 177 179-180
The lpjs cancel command does a fairly thorough job hunting down and terminating all processes run by a job script. There may be circumstances where some processes are missed, however. The only way to kill such processes is by first identifying them using top, ps, or similar commands, and then manually terminating them using kill.
It can be tricky to identify which processes are part of an active job and which are strays. LPJS chaperone creates a process group for this purpose. Running ps -jxw will show processes, along with their process group (PGID). Processes that are not in the same group as any chaperone process are strays.
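For example, you might identify and terminate a stray process as follows. The process ID shown is hypothetical; verify that a process really is a stray before killing it.

# List processes along with their process group IDs (PGID)
ps -jxw

# Terminate a stray process by PID (12345 is hypothetical)
kill 12345

# If it ignores the default TERM signal, force termination
kill -KILL 12345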
Clusters normally have one or more file servers, so that jobs can run in a directory that is directly accessible from all nodes. This is the ideal situation, as input files are directly available to jobs, and output files from jobs can be written to their final location without needing to transfer them.
At present, it appears to be impractical to use macOS for compute nodes with data on a file server. macOS has a security feature that prevents programs from accessing most directories unless the user explicitly grants permission via the graphical interface. In order for LPJS to access file servers as required for normal operation, the program lpjs_compd must be granted full disk access via System Settings, Privacy and Security. Otherwise, you may see "operation not permitted" errors in the log when trying to access NFS shares.
The major problem is that this is not a one-time setting. Each time LPJS is updated, full disk access is revoked, and the user must enable it via the graphical interface again.
Grids normally do not have file servers. In this case, it will be necessary for all nodes to have the ability to pull files from and push files to somewhere. Typically, this somewhere would be the submit node, or a server accessible for file transfers from the submit node and all compute nodes.
LPJS does not provide file transfer tools. There are numerous highly-evolved, general-purpose file transfer tools already available, so it is left to the systems manager and user to decide which one(s) to use. We recommend using rsync if possible, as it is highly portable and reliable, and minimizes the amount of data transferred when repeating a transfer.
The lpjs submit command creates a marker file in the working directory on the submit host, named "lpjs-submit-host-name-shared-fs-marker" (replace "submit-host-name" with the FQDN of your submit node). If this file is not accessible to the compute node, then LPJS will take the necessary steps to create the temporary working directory and transfer it back to the submit node after the script terminates.
If the working directory (the directory from which the job is submitted on the submit node) is not accessible to the compute nodes (e.g. via NFS), then the user's script is responsible for downloading any required input files. Below is an example from Test/fastq-trim.lpjs in the LPJS Github repository.

Note the use of the --copy-links option with rsync, so that it copies the files pointed to by symbolic links, rather than just recreating the symbolic links on the compute node. You must understand each situation and decide whether this is necessary.
# Marker file is created by "lpjs submit" so we can detect shared filesystems.
# If this file does not exist on the compute nodes, then the compute nodes
# must pull (download) the input files.
marker=lpjs-$LPJS_SUBMIT_HOST-shared-fs-marker
if [ ! -e $marker ]; then
    printf "$marker does not exist.  Using rsync to transfer files.\n"
    set -x
    printf "Fetching $LPJS_SUBMIT_HOST:$LPJS_WORKING_DIRECTORY/$infile\n"
    # Use --copy-links if a file on the submit node might be a symbolic
    # link pointing to something that is not also being pulled here
    rsync --copy-links ${LPJS_SUBMIT_HOST}:$LPJS_WORKING_DIRECTORY/$infile .
    set +x
else
    printf "$marker found.  No need to transfer files.\n"
fi
LPJS will, by default, transfer the contents of the temporary working directory back to the working directory on the submit node, using rsync -av temp-working-dir/ submit-host:working-dir. The "working-dir" above is the directory from which the job was submitted, and "temp-working-dir" is a job-specific temporary directory created by LPJS on the compute node. Following this transfer, the working directory on the submit node should contain the same output files as it would using a shared filesystem. Users can override the transfer command. See the Research Computing User Guide for details.
# If we downloaded the input file, remove it now to avoid wasting time
# transferring it back.  By default, LPJS transfers the entire temporary
# working directory to the submit node using rsync.
if [ ! -e $marker ]; then
    rm -f $infile
fi
If your system uses NFS or some other network file sharing service, so that the compute nodes have direct access to the same files as the submit host, then you can see output files grow while the job is running. There may be a slight delay due to NFS buffering, so the output files may not show everything that has been written by a different node at a given moment.
If compute nodes do not have direct access via NFS or similar, then a temporary working directory is created on the compute node. You can only monitor progress if you can log into that compute node and view files in the temporary working directory.
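For example, if lpjs jobs shows that a job is running on compute-001, you might check its progress with something like the following sketch. The path to the temporary working directory and the output filename are hypothetical; the actual location is site- and job-specific and can usually be found in the job's log output.

# List files in the job's temporary working directory (path is hypothetical)
ssh compute-001 ls -l /path/to/lpjs-temp-working-dir

# View the end of a growing output file (filename is hypothetical)
ssh compute-001 tail /path/to/lpjs-temp-working-dir/output.log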