As a Unix scripter, you have been living in a cocoon until now, growing and developing, but confined and unable to do much. In the next few sections, you will become ready to emerge and spread your wings, so you can see the vast possibilities of automated research computing for the first time. To use another metaphor, this is where you will reach the critical mass of knowledge needed to step aside and let Unix do much of the work for you. Give this material the attention it deserves, so that your future as a computational scientist will be as easy and rewarding as it can be.
Sometimes we need to run a particular command or sequence of commands only if a certain condition is true. For example, if program B processes the output of program A, we probably won't want to run B at all unless A finished successfully.
Conditional execution in Unix shell scripts often utilizes the exit status of the most recent command. All Unix programs return an exit status. By convention, programs return an exit status of 0 if they determine that they completed their task successfully and a variety of non-zero error codes if they failed. There are some standard error codes defined in the C header file sysexits.h. You can learn about them by running man sysexits. Or, for a quick listing of their names and values, run grep '#define.*EX_' /usr/include/sysexits.h.
A shell script can assign an exit status by providing an argument to the exit command:
    #!/bin/sh -e

    exit 0      # Report success (EX_OK)
    exit 65     # Report input error (EX_DATAERR)
We can check the exit status of the most recent command by examining the shell variable $? in Bourne shell family shells or $status in C shell family shells.
    bash> ls
    myprog.c
    bash> echo $?
    0
    bash> ls -z
    ls: illegal option -- z
    usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwx1] [-D format] [file ...]
    bash> echo $?
    1
    bash>

    tcsh> ls
    myprog.c
    tcsh> echo $status
    0
    tcsh> ls -z
    ls: illegal option -- z
    usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwx1] [-D format] [file ...]
    tcsh> echo $status
    1
    tcsh>
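Because every command, including echo and printf, sets a new exit status, a script that needs the status later usually copies it into a variable right away. The following is a minimal sketch rather than part of the original examples. It omits the -e flag so that the deliberately failing command does not terminate the script, and the variable names saved_status and after_printf are arbitrary.

```shell
#!/bin/sh

# Run a child shell that reports an input error (EX_DATAERR).
sh -c 'exit 65'
saved_status=$?     # Copy $? immediately; the next command overwrites it

printf 'The child shell exited with status %s\n' "$saved_status"

# printf succeeded, so $? now describes printf, not the child shell.
after_printf=$?
printf 'After printf, the status is %s\n' "$after_printf"
```

The saved copy still holds 65 at the end of the script, while $? itself has long since been replaced.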
Run several commands correctly and incorrectly and check the $? or $status variable after each one.
All Unix shells have an if-then-else construct implemented as internal commands. The Bourne shell family of shells all use the same basic syntax. The C shell family of shells also use a common syntax, which is somewhat different from the Bourne shell family, more closely resembling the C language.
Unlike general-purpose languages such as C and Java,
a Bourne shell conditional command
does not take a Boolean (true/false) expression.
Rather, it takes a Unix command, and the decision is based
on the exit status of that command.
The general syntax of a Bourne shell family conditional is shown below. Note that there can be an unlimited number of elifs, but we will use only one for this example.
    #!/bin/sh -e

    if command1; then
        # Command1 succeeded (exit status was 0)
        command
        command
        ...
    elif command2; then
        # Command2 succeeded (exit status was 0)
        command
        command
        ...
    else
        # All commands above failed
        command
        command
        ...
    fi
The 'if' and the 'then' are actually two separate commands, so they must either be on separate lines, or separated by a ';', which can be used instead of a newline to separate Unix commands.
Code controlled by an if should be consistently indented as shown above. How much indentation is used is a matter of personal taste, but four spaces is typical.
In the example above, the if command executes command1 and checks the exit status when it completes. If the exit status is 0 (indicating success), then all the commands before the elif are executed, and everything after the elif is skipped.
If the exit status is non-zero, then nothing above the elif is executed. The elif command then executes command2 and checks its exit status.
If the exit status of command2 is 0, then the commands between the elif and the else are executed and everything after the else is skipped.
If the exit status of command2 is non-zero, everything above the else is skipped and everything between the else and the fi is executed.
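As a concrete sketch of the flow just described, the conditions below are ordinary grep commands, and each branch is chosen purely by grep's exit status. The file contents and variable names are made up for illustration, and mktemp is assumed to be available, as it is on most modern Unix systems.

```shell
#!/bin/sh

# Create a small scratch file to search (contents are arbitrary).
datafile=$(mktemp)
printf 'alpha\nbeta\n' > "$datafile"

# grep -q prints nothing; it exits 0 if the pattern is found, non-zero if not.
if grep -q 'gamma' "$datafile"; then
    result="found gamma"
elif grep -q 'beta' "$datafile"; then
    result="found beta"
else
    result="found neither"
fi

printf '%s\n' "$result"
rm -f "$datafile"
```

The first grep fails, so control passes to the elif, whose grep succeeds; the else branch is never reached.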
In most programming languages, we use some sort of Boolean expression (usually a comparison, also known as a relation), not a command, as the condition for an if statement. This is generally true in Bourne shell scripts as well, but the capability is provided in an interesting way. We'll illustrate by showing an example and then explaining how it works.
Suppose we have a shell variable and we want to check whether it contains the string "blue". We could use the following if command to test:
    #!/bin/sh -e

    printf "Enter the name of a color: "
    read color

    if [ "$color" = "blue" ]; then
        printf "You entered blue.\n"
    elif [ "$color" = "red" ]; then
        printf "You entered red.\n"
    else
        printf "You did not enter blue or red.\n"
    fi
This may look like it violates what we just stated: that Bourne shell conditionals take a command, not a Boolean expression. The interesting thing about this code is that the square brackets are not Bourne shell syntax.
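The '[' in the conditional above is actually an external command. In fact, it is simply another name for the test command. (Most modern shells also provide built-in versions of both for speed, but they are used the same way.) The files /bin/test and /bin/[ are actually links to the same executable file: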
    shell-prompt: ls -l /bin/test /bin/[
    -r-xr-xr-x  2 root  wheel  8516 Apr  9  2012 /bin/[*
    -r-xr-xr-x  2 root  wheel  8516 Apr  9  2012 /bin/test*
We could have also written the following, to make it more obvious that we are actually running another command in the if command:
if test "$color" = "blue"; then
Hence, '$color', '=', 'blue', and ']' are arguments to the '[' command, and must be separated by white space. If the command is invoked as '[', it requires the last argument to be ']'. If invoked as 'test', the ']' is not allowed.
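The white space requirement is easy to get wrong, and the mistake is silent. The sketch below, with an arbitrary color value, compares a correctly spaced test against one where the spaces around '=' were forgotten. Without the spaces, the shell hands '[' a single argument, and a single non-empty argument always counts as true.

```shell
#!/bin/sh

color="red"

# Three arguments before the ']': a string, an operator, and a string.
if [ "$color" = "blue" ]; then
    with_spaces="true"
else
    with_spaces="false"
fi

# No spaces: the shell passes the single argument 'red=blue'.
# A lone non-empty argument is always considered true.
if [ "$color"="blue" ]; then
    without_spaces="true"
else
    without_spaces="false"
fi

printf 'With spaces: %s\n' "$with_spaces"
printf 'Without spaces: %s\n' "$without_spaces"
```

The second condition reports true even though the color is red, which is why each operand and operator must be its own argument.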
The test command can be used to perform comparisons (relational operations) on variables and constants, as well as a wide variety of tests on files. For comparisons, test takes three arguments: the first and third are string values and the second is a relational operator.
    # Compare a variable to a string constant
    test "$name" = 'Bob'
    [ "$name" = 'Bob' ]

    # Compare the output of a program directly to a string constant
    test `myprog` = 42
    [ `myprog` = 42 ]
For file tests, test takes two arguments: The first is a flag indicating which test to perform and the second is the path name of the file or directory.
    # See if output file exists and is readable to the user
    # running test
    test -r output.txt
    [ -r output.txt ]
The exit status of test is 0 (success) if the test is deemed to be true and a non-zero value if it is false.
    shell-prompt: test 1 = 1
    shell-prompt: echo $?    # Or $status for C shell
    0
    shell-prompt: test 1 = 2
    shell-prompt: echo $?    # Or $status for C shell
    1
The relational operators supported by test are shown in Table 4.5, “Test Command Relational Operators”.
Table 4.5. Test Command Relational Operators
Operator | Relation |
---|---|
= | Lexical equality (string comparison) |
-eq | Integer equality |
!= | Lexical inequality (string comparison) |
-ne | Integer inequality |
< | Lexical less-than (10 < 9) |
-lt | Integer less-than (9 -lt 10) |
-le | Integer less-than or equal |
> | Lexical greater-than |
-gt | Integer greater-than |
-ge | Integer greater-than or equal |
Note that some operators, such as < and >, have special meaning to the shell, so they must be escaped or quoted.
    [ 10 > 9 ]
    test 10 > 9
    # Redirects output to a file called '9'.
    # The only operand sent to the test command is '10'.
    # A single non-empty argument counts as true, so the command
    # reports success without comparing anything.

    [ 10 \> 9 ]
    test 10 \> 9
    # Compares 10 to 9 (lexically)

    [ 10 '>' 9 ]
    test 10 '>' 9
    # Compares 10 to 9 (lexically)
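The difference between the lexical and integer operators matters whenever the operands are numbers. The following sketch runs the same comparison both ways. It assumes a test implementation that supports the lexical '<' operator listed in the table, and the variable names are arbitrary.

```shell
#!/bin/sh

# Integer comparison: 10 is not less than 9.
if [ 10 -lt 9 ]; then
    integer_result="true"
else
    integer_result="false"
fi

# Lexical comparison: the string "10" sorts before "9" because
# the character '1' comes before '9'. Note the escaped '<'.
if [ 10 \< 9 ]; then
    lexical_result="true"
else
    lexical_result="false"
fi

printf 'Integer 10 -lt 9: %s\n' "$integer_result"
printf 'Lexical 10 <  9: %s\n' "$lexical_result"
```

For numeric data, the integer operators are almost always the ones wanted.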
Common file tests are shown in Table 4.6, “Test command file operations”. To learn about additional file tests, run man test.
Table 4.6. Test command file operations
Flag | Test |
---|---|
-e | Exists |
-r | Is readable |
-w | Is writable |
-x | Is executable |
-d | Is a directory |
-f | Is a regular file |
-L | Is a symbolic link |
-s | Exists and is not empty |
-z | String has zero length (a string test, not a file test) |
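Several file tests are often chained to find out what kind of thing a path refers to. The sketch below builds a scratch directory and classifies a few paths. The describe function and all file names are hypothetical, and mktemp is assumed to be available.

```shell
#!/bin/sh

# Build a scratch directory holding one empty and one non-empty file.
workdir=$(mktemp -d)
: > "$workdir/empty.txt"
printf 'some data\n' > "$workdir/full.txt"

# Hypothetical helper: describe a path using the file tests above.
describe() {
    if [ -d "$1" ]; then
        printf 'directory\n'
    elif [ -s "$1" ]; then
        printf 'non-empty file\n'
    elif [ -f "$1" ]; then
        printf 'empty file\n'
    else
        printf 'missing\n'
    fi
}

dir_kind=$(describe "$workdir")
full_kind=$(describe "$workdir/full.txt")
empty_kind=$(describe "$workdir/empty.txt")
none_kind=$(describe "$workdir/no-such-file")

printf '%s\n' "$dir_kind" "$full_kind" "$empty_kind" "$none_kind"
rm -rf "$workdir"
```

The order of the tests matters here: -d is checked first because a directory would also pass the -s test.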
Variable references in a [ or test command should usually be enclosed in soft quotes. If the value of the variable contains white space, such as "navy blue", and the variable is not enclosed in quotes, then "navy" and "blue" will be considered two separate arguments to the [ command, and it will fail.
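Furthermore, if a variable used in a comparison might be empty and is not quoted, it disappears from the argument list entirely and test reports a syntax error. Quoting the variable keeps the empty string as an argument. A traditional extra safeguard, which also protects against values that happen to look like test operators, is to attach a common string to the arguments on both sides of the operator. It can be almost any character, but '0' is popular and easy to read.

    name=""
    if [ $name = "Bob" ]; then          # Error, expands to: if [ = Bob ]; then
    if [ 0"$name" = 0"Bob" ]; then      # OK, expands to: if [ 0 = 0Bob ]; then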
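The effect of quoting and of the prefix can be observed directly by saving the exit status of each form. This is only a sketch: it omits -e so the failing tests do not end the script, it discards the error messages, and the exact non-zero status reported for a malformed test may vary between shells.

```shell
#!/bin/sh

color="navy blue"

# Quoted: test receives 'navy blue' as a single argument.
[ "$color" = "navy blue" ]
quoted_status=$?

# Unquoted: the value splits into 'navy' and 'blue', so test sees
# an argument list it cannot parse. The error message is discarded.
[ $color = "navy blue" ] 2> /dev/null
unquoted_status=$?

name=""

# Unquoted and empty: the variable vanishes from the argument list.
[ $name = "Bob" ] 2> /dev/null
empty_status=$?

# Prefixed: both sides are guaranteed to be non-empty strings,
# so the comparison is simply false.
[ 0"$name" = 0"Bob" ]
prefixed_status=$?

printf 'quoted=%s unquoted=%s empty=%s prefixed=%s\n' \
    "$quoted_status" "$unquoted_status" "$empty_status" "$prefixed_status"
```

Only the quoted comparison succeeds. The prefixed comparison fails cleanly with status 1, meaning "false" rather than "error".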
Relational operators are provided by the test command, not by the shell. Hence, to find out the details, we would run "man test" or "man [", not "man sh" or "man bash".
Run the following commands in sequence and run echo $? after every test under Bourne shell or echo $status after every test under C shell.
    which [
    test
    test 1 = 1
    test 1=2
    test 1 = 2
    [ 1 = 1
    [ 1 = 1 ]
    [ 2 < 10 ]
    [ 2 \< 10 ]
    [ 2 -lt 10 ]
    name=''             # Bourne shell only
    set name=''         # C shell only
    [ $name = Bill ]
    [ 0$name = 0Bill ]
    name=Bob            # Bourne shell only
    set name=Bob        # C shell only
    [ $name = Bill ]
    [ $name = Bob ]
Unlike the Bourne shell family of shells, the C shell family implements its own conditional expressions and operators, so there is generally no need for the test or [ command, though you can use it in C shell scripts if you really want to.
The C shell if command requires () around the condition, and the condition is a Boolean expression, just like in C and similar languages. As in C, and unlike Bourne shell, a value of zero is considered false and non-zero is true.
    #!/bin/csh -ef

    printf "Enter the name of a color: "
    set color = "$<"

    if ( { test "$color" \< blue } ) then
        printf "Yup.\n"
    endif

    if ( "$color" == "blue" ) then
        printf "You entered blue.\n"
    else if ( "$color" == "red" ) then
        printf "You entered red.\n"
    else
        printf "You did not enter blue or red.\n"
    endif
The C shell relational operators are shown in Table 4.7, “C Shell Relational Operators”.
Table 4.7. C Shell Relational Operators
Operator | Relation |
---|---|
< | Integer less-than |
> | Integer greater-than |
<= | Integer less-than or equal |
>= | Integer greater-than or equal |
== | String equality |
!= | String inequality |
=~ | String matches glob pattern |
!~ | String does not match glob pattern |
C shell if commands also need soft quotes around strings that contain white space. However, unlike the test command, they can handle empty strings, so we don't need to add an arbitrary prefix like '0' if the string may be empty.
if [ 0"$name" = 0"Bob" ]; then
if ( "$name" == "Bob" ) then
The most readable way to check the status of a command in C shell is using the status variable. Note that we need to avoid invoking csh with -e so that the shell process will not terminate when a command fails.
    #!/bin/csh -f

    command1
    if ( $status == 0 ) then
        # Stuff to do only if command1 succeeded
    else
        exit 1  # Exit on error since we did not use #!/bin/csh -e
    endif
Unix shells provide conditional operators that allow us to invert the exit status of a command or combine exit status from multiple commands. They use the same Boolean operators as C for AND (&&), OR (||), and NOT (!).
Table 4.8. Shell Conditional Operators
Operator | Meaning | Exit status |
---|---|---|
! command | NOT | 0 if command failed, 1 if it succeeded |
command1 && command2 | AND | 0 if both commands succeeded |
command1 || command2 | OR | 0 if either command succeeded |
    # Invert exit status (0 to non-zero, non-zero to 0)
    shell-prompt: ! command

    # See if both command1 and command2 succeeded
    shell-prompt: command1 && command2

    # See if either command1 or command2 succeeded
    shell-prompt: command1 || command2
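A short sketch of all three operators used as if conditions follows. The scratch directory and the output.txt name are hypothetical, and the file is deliberately never created so that the first test has something to negate.

```shell
#!/bin/sh

workdir=$(mktemp -d)
outfile="$workdir/output.txt"   # Hypothetical output file; not created

# NOT: the branch runs because the file test fails.
if ! [ -e "$outfile" ]; then
    first_check="output is missing"
else
    first_check="output exists"
fi

# AND: both tests must succeed.
if [ -d "$workdir" ] && [ -w "$workdir" ]; then
    second_check="directory is usable"
else
    second_check="directory is not usable"
fi

# OR: one success is enough; the first test fails, the second succeeds.
if [ -e "$outfile" ] || [ -d "$workdir" ]; then
    third_check="at least one exists"
else
    third_check="neither exists"
fi

printf '%s\n' "$first_check" "$second_check" "$third_check"
rm -rf "$workdir"
```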
These operators can be used in Bourne shell conditionals much the same way as in C:
if [ 0"$first_name" = 0"Bob" ] && [ 0"$last_name" = 0"Newhart" ]; then
We can also use the test command's own operators:
if [ 0"$first_name" = 0"Bob" -a 0"$last_name" = 0"Newhart" ]; then
Note that in the case of the && operator, command2 will not be executed if command1 fails (exits with non-zero status). There is no point running the second command, since both commands must succeed to produce an overall status of 0. Once any command in an && sequence fails, the exit status of the whole sequence will be non-zero (that of the failed command) no matter what would have happened after that.
Likewise in the case of a || operator, once any command succeeds (exits with zero status), the remaining commands will not be executed.
This fact is often used as a clever trick to conditionally execute a command only if another command succeeds or fails.
    # Execute main-processing only if pre-processing succeeds and
    # post-processing only if main-processing succeeds
    pre-processing && main-processing && post-processing

    # Equivalent using an if-then-fi
    if pre-processing; then
        if main-processing; then
            post-processing
        fi
    fi
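To watch the short-circuit behavior happen, the sketch below replaces the real programs with a hypothetical step function that records its name and then returns whatever exit status it is told to. Neither step nor trace comes from the original text; they exist only to show which commands actually run.

```shell
#!/bin/sh

trace=""

# Hypothetical stand-in for a real program: records its name in
# 'trace' and returns the exit status given as its second argument.
step() {
    trace="$trace$1 "
    return "$2"
}

# main fails, so post never runs.
step pre 0 && step main 1 && step post 0
and_status=$?

# check fails, so fallback runs; the list succeeds because fallback does.
step check 1 || step fallback 0
or_status=$?

printf 'Commands run: %s\n' "$trace"
printf 'AND list status: %s, OR list status: %s\n' "$and_status" "$or_status"
```

The name post never appears in the trace, and the status of the && list is the status of the command that failed.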
Conditional operators can also be used in a C shell if command. Parentheses are recommended around each relation for readability.
if ( ("$first_name" == "Bob") && ("$last_name" == "Newhart") ) then
Run the following commands in sequence and run echo $? after every command under Bourne shell or echo $status after every command under C shell.
    ls -z
    ls -z && echo Done
    ls -a && echo Done
    ls -z || echo Done
    ls -a || echo Done
Instructor: Lead the class through development of a script that does the following. The solution is at the end of the chapter. No peeking...
Lists the CWD
Prompts the user for a filename and reads it
Prints an error message and exits with status 65 if the file does not exist or is not a regular file
Displays the first 5 lines of the file
Prompts the user for a simple search string and reads it
Displays lines in the file that contain the search string
    shell-prompt: ./search.sh
    CNC-EMDiff Reference search-me.txt Computer SVN search.sh
    Enter the name of the file to search: searchme.txt
    searchme.txt is not a regular file.
    shell-prompt: ./search.sh
    CNC-EMDiff Reference search-me.txt Computer SVN search.sh
    Enter the name of the file to search: GFF
    GFF is not a regular file.
    shell-prompt: ./search.sh
    CNC-EMDiff Reference search-me.txt Computer SVN search.sh
    Enter the name of the file to search: search-me.txt
    This
    is
    a
    test
    file
    Enter a string to search for in search-me.txt: test
    test
If you need to compare a single variable to many different values, you could use a long string of elifs or else ifs:
    #!/bin/sh -e

    printf "Enter a color name: "
    read color

    if [ "$color" = "red" ] || \
       [ "$color" = "orange" ]; then
        printf "Long wavelength\n"
    elif [ "$color" = "yellow" ] || \
         [ "$color" = "green" ] || \
         [ "$color" = "blue" ]; then
        printf "Medium wavelength\n"
    elif [ "$color" = "indigo" ] || \
         [ "$color" = "violet" ]; then
        printf "Short wavelength\n"
    else
        printf "Invalid color name: $color\n"
    fi
Like most languages, however, Unix shells offer a cleaner solution.
Bourne shell has the case command:
    #!/bin/sh -e

    printf "Enter a color name: "
    read color

    case $color in
    red|orange)
        printf "Long wavelength\n"
        ;;
    yellow|green|blue)
        printf "Medium wavelength\n"
        ;;
    indigo|violet)
        printf "Short wavelength\n"
        ;;
    *)
        printf "Invalid color name: $color\n"
        ;;
    esac
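The labels in a case command are shell glob patterns, the same kind used to match file names, so they can do more than list literal strings. The '*' label above is one such pattern. As a further sketch, the hypothetical classify_answer function below accepts several spellings of a yes/no answer; the function name and the accepted spellings are assumptions made for illustration.

```shell
#!/bin/sh

# Hypothetical helper: classify a yes/no answer typed by a user.
classify_answer() {
    case "$1" in
    [Yy]|[Yy][Ee][Ss])
        printf 'yes\n'
        ;;
    [Nn]*)
        printf 'no\n'
        ;;
    "")
        printf 'empty\n'
        ;;
    *)
        printf 'unknown\n'
        ;;
    esac
}

classify_answer YES
classify_answer nope
classify_answer ""
classify_answer maybe
```

Patterns are tried from top to bottom and only the first match runs, so the catch-all '*' belongs last.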
C shell has a switch command that looks almost exactly like the switch statement in C, C++, and Java:
    #!/bin/csh -ef

    printf "Enter a color name: "
    set color = "$<"

    switch($color)
    case red:
    case orange:
        printf "Long wavelength\n"
        breaksw
    case yellow:
    case green:
    case blue:
        printf "Medium wavelength\n"
        breaksw
    case indigo:
    case violet:
        printf "Short wavelength\n"
        breaksw
    default:
        printf "Invalid color name: $color\n"
    endsw
Code controlled by a case or switch should be consistently indented as shown above. How much indentation is used is a matter of personal taste, but four spaces is typical.
What is the exit status of a command that succeeds? One that fails?
What variables contain the exit status of the most recent command in Bourne shell and C shell?
What is the meaning of [ in a shell script and how is it used?
Write a shell script that lists the files in the CWD, asks the user for the name of a file, tests for the existence of the file, and issues an error message if it does not exist. If the file does exist, the script then asks for a source string and a replacement string, verifying that the source string is not empty, and then shows the file content with all occurrences of the source replaced by the replacement. The script should exit with status 65 (EX_DATAERR) if any bad input is received.
    shell-prompt: cat fox.txt
    The quick brown fox jumped over the lazy dog.
    shell-prompt: ./replace.sh
    Documents R fox.txt stringy Downloads igv stringy.c
    File name? fxo.txt
    File fxo.txt does not exist.
    shell-prompt: echo $?
    65
    shell-prompt: ./replace.sh
    Documents R fox.txt stringy Downloads igv stringy.c
    File name? fox.txt
    Source string?
    Source string cannot be empty.
    shell-prompt: echo $?
    65
    shell-prompt: ./replace.sh
    Documents R fox.txt stringy Downloads igv stringy.c
    File name? fox.txt
    Source string? fox
    Replacement string? tortoise
    The quick brown tortoise jumped over the lazy dog.
    shell-prompt: echo $?
    0
Modify the previous script so that it reports an error if the replacement string is empty or the same as the source. Use a conditional operator to check both conditions in one if-then-else command.
    shell-prompt: ./replace.sh
    Documents R fox.txt stringy Downloads igv stringy.c
    File name? fox.txt
    Source string? fox
    Replacement string?
    Replacement must not be empty or the same as source.
    shell-prompt: echo $?
    65
    shell-prompt: ./replace.sh
    Documents R fox.txt stringy Downloads igv stringy.c
    File name? fox.txt
    Source string? fox
    Replacement string? fox
    Replacement must not be empty or the same as source.
    shell-prompt: echo $?
    65
Write a shell script that asks the user for a directory name and the name of an archive to create from it, checks the file name extension on the archive name using a switch/case, and creates a tarball with the appropriate compression. The script should report an error and exit with status 65 if an invalid file name extension is used for the archive name, or if the directory name entered does not exist or is not a directory.
    shell-prompt: ./case.sh
    Coral Qemu case.sh scripts
    Directory to archive? Qem
    Qem is not an existing directory.
    shell-prompt: ./case.sh
    Coral Qemu case.sh scripts
    Directory to archive? case.sh
    case.sh is not an existing directory.
    shell-prompt: ./case.sh
    Coral Qemu case.sh scripts
    Directory to archive? Qemu
    Archive name? qemu.ta
    Invalid archive name: qemu.ta
    shell-prompt: ./case.sh
    Coral Qemu case.sh scripts
    Directory to archive? Qemu
    Archive name? qemu.txz
    a Qemu
    a Qemu/FreeBSD-13.0-RELEASE-riscv-riscv64.raw
Table 4.9. Compression tool for each filename extension
Extension | Tool |
---|---|
tar | No compression |
tar.gz or tgz | gzip |
tar.bz2 or tbz | bzip2 |
tar.xz or txz | xz |
In Bourne shell, the file name extension can be extracted from a shell variable as follows:
extension=${filename##*.} # Strip off everything to the last '.'
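The '##' form removes the longest prefix that matches the pattern, which is why only the text after the last '.' survives. The sketch below compares it with two related expansions on a hypothetical file name; the variable names two_part and stem are arbitrary.

```shell
#!/bin/sh

filename="results.tar.gz"    # Hypothetical archive name

extension=${filename##*.}    # Remove the longest prefix matching '*.'
two_part=${filename#*.}      # Remove the shortest prefix matching '*.'
stem=${filename%.*}          # Remove the shortest suffix matching '.*'

printf 'extension: %s\n' "$extension"
printf 'two_part:  %s\n' "$two_part"
printf 'stem:      %s\n' "$stem"
```

Note that for a two-part extension such as tar.gz, the '##' form yields only gz.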
In C shell:
set extension=${archive:e} # Extract filename extension
Write a shell script that does the following in sequence:
Check for the existence of Homo_sapiens.GRCh38.107.chromosome.1.gff3 in the CWD. If it is not present, download the gzipped file using curl from http://ftp.ensembl.org/pub/release-107/gff3/homo_sapiens/ and decompress it.
Display the first 5 lines of the file that do not begin with '#', so the user can see the format of an entry. Hint: See the grep man page for an option to select lines that do not match the given pattern.
Ask the user which column to search, and then display all unique values in that column. Hint: Use grep again to filter out lines beginning with '#', run them through cut or awk to select just the desired column, and run the output through sort and uniq or just sort with the appropriate flags.
Ask the user for a search key, and display all the lines in the file not beginning with '#' that contain the given key in the given column. Hint: The awk '~' operator tests for a regular expression match, which for plain text amounts to "contains", e.g. '$1 ~ "text"' means the first field contains the string "text". You can use the -v flag to set column and key variables to use in the awk pattern.
awk -v column="$column" -v key="$key" 'your awk script'
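The mechanics of -v and '~' can be tried on a tiny file before working with the real data. The sketch below uses made-up, tab-separated records that merely resemble GFF3 lines. The values of column and key are fixed here rather than read from the user, and this is one possible form of the pattern, not the only one.

```shell
#!/bin/sh

# Made-up, tab-separated sample records (not taken from the real file).
sample=$(mktemp)
printf '1\thavana\tgene\t100\t200\n'            >  "$sample"
printf '1\thavana\tfive_prime_UTR\t100\t120\n'  >> "$sample"
printf '1\thavana\tthree_prime_UTR\t180\t200\n' >> "$sample"

column=3
key="UTR"

# Inside awk, 'column' and 'key' are awk variables set by -v.
# $column is the field whose number is stored in column, and
# '~' tests whether that field matches key as a regular expression.
matches=$(awk -F '\t' -v column="$column" -v key="$key" \
    '$column ~ key' "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```

Only the two UTR records are printed, since the third field of the gene record does not contain the key.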
    shell-prompt: ./gff.sh
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 4111k  100 4111k    0     0   550k      0  0:00:07  0:00:07 --:--:--  589k
    1 GRCh38 chromosome 1 248956422 . . . ID=chromosome:1;Alias=CM000663.2,chr1,NC_000001.11
    1 . biological_region 10469 11240 1.3e+03 . . external_name=oe %3D 0.79;logic_name=cpg
    1 . biological_region 10650 10657 0.999 + . logic_name=eponine
    1 . biological_region 10655 10657 0.999 - . logic_name=eponine
    1 . biological_region 10678 10687 0.999 + . logic_name=eponine
    Column to search? 3
    CDS
    biological_region
    chromosome
    exon
    five_prime_UTR
    gene
    lnc_RNA
    mRNA
    miRNA
    ncRNA
    ncRNA_gene
    pseudogene
    pseudogenic_transcript
    rRNA
    scRNA
    snRNA
    snoRNA
    three_prime_UTR
    unconfirmed_transcript
    Search key? UTR
    1 havana five_prime_UTR 65419 65433 . + . Parent=transcript:ENST00000641515
    1 havana five_prime_UTR 65520 65564 . + . Parent=transcript:ENST00000641515
    1 havana three_prime_UTR 70009 71585 . + . Parent=transcript:ENST00000641515
    1 havana five_prime_UTR 923923 924431 . + . Parent=transcript:ENST00000616016
    1 havana three_prime_UTR 944154 944574 . + . Parent=transcript:ENST00000616016
    ...