Conditional Execution

As a Unix scripter, you have been living in a cocoon until now, growing and developing, but confined and unable to do much. In the next few sections, you will become ready to emerge and spread your wings, so you can see the vast possibilities of automated research computing for the first time. To use another metaphor, this is where you will reach the critical mass of knowledge needed to step aside and let Unix do much of the work for you. Give this material the attention it deserves, so that your future as a computational scientist will be as easy and rewarding as it can be.

Sometimes we need to run a particular command or sequence of commands only if a certain condition is true. For example, if program B processes the output of program A, we probably won't want to run B at all unless A finished successfully.

Command Exit Status

Conditional execution in Unix shell scripts often utilizes the exit status of the most recent command. All Unix programs return an exit status. By convention, programs return an exit status of 0 if they determine that they completed their task successfully and a variety of non-zero error codes if they failed. There are some standard error codes defined in the C header file sysexits.h. You can learn about them by running man sysexits. Or, for a quick listing of their names and values, run grep '#define.*EX_' /usr/include/sysexits.h.

A shell script can assign an exit status by providing an argument to the exit command:

#!/bin/sh -e

# These are alternatives: a script ends at the first exit it executes
exit 0      # Report success (EX_OK)

exit 65     # Report input error (EX_DATAERR)
            

We can check the exit status of the most recent command by examining the shell variable $? in Bourne shell family shells or $status in C shell family shells.

bash> ls
myprog.c
bash> echo $?
0
bash> ls -z
ls: illegal option -- z
usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwx1] [-D format] [file ...]
bash> echo $?
1
bash> 
            
tcsh> ls
myprog.c
tcsh> echo $status
0
tcsh> ls -z
ls: illegal option -- z
usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwx1] [-D format] [file ...]
tcsh> echo $status
1
tcsh> 
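The minimal sketch below shows both sides of the mechanism in a Bourne shell script. A child shell exits with 65 (EX_DATAERR), and the parent reads $? right afterward. The shebang deliberately omits -e, because under -e the first failing command would end the script before $? could be examined. The path /no/such/path is a hypothetical name assumed not to exist.

```shell
#!/bin/sh

# Child shell reports EX_DATAERR (65) using the exit command
sh -c 'exit 65'
child_status=$?     # Save it immediately; the next command resets $?
printf "Child exit status: %d\n" "$child_status"

# ls fails on a path that does not exist (hypothetical path below).
# The exact non-zero value varies between ls implementations.
ls /no/such/path 2> /dev/null
ls_status=$?
printf "ls exit status: %d\n" "$ls_status"
```

The first printf reports 65. The second reports some non-zero value, typically 1 or 2 depending on the system.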
            

Practice Break

Run several commands correctly and incorrectly and check the $? or $status variable after each one.

If-then-else Commands

All Unix shells have an if-then-else construct implemented as internal commands. The Bourne shell family of shells all use the same basic syntax. The C shell family of shells also uses a common syntax, which is somewhat different from the Bourne shell family and more closely resembles the C language.

Bourne Shell Family

Unlike general-purpose languages such as C and Java, a Bourne shell conditional command does not take a Boolean (true/false) expression. Rather, it takes a Unix command, and the decision is based on the exit status of that command. The general syntax of a Bourne shell family conditional is shown below. Note that there can be an unlimited number of elifs, but we will use only one for this example.

#!/bin/sh -e

if command1; then   # Command1 succeeded (exit status was 0)
    command
    command
    ...
elif command2; then # Command2 succeeded (exit status was 0)
    command
    command
    ...
else                # All commands above failed
    command
    command
    ...
fi
                

Note

The 'if' and the 'then' are separate keywords, and the shell recognizes 'then' only at the start of a new command. Therefore they must either be on separate lines or be separated by a ';', which can be used instead of a newline to separate Unix commands.

Note

Code controlled by an if should be consistently indented as shown above. How much indentation is used is a matter of personal taste, but four spaces is typical.

In the example above, the if command executes command1 and checks the exit status when it completes. If the exit status is 0 (indicating success), then all the commands before the elif are executed, and everything after the elif is skipped.

If the exit status is non-zero, then nothing above the elif is executed. The elif command then executes command2 and checks its exit status.

If the exit status of command2 is 0, then the commands between the elif and the else are executed and everything after the else is skipped.

If the exit status of command2 is non-zero, everything above the else is skipped and everything between the else and the fi is executed.

Note

In Bourne shell if commands, an exit status of zero effectively means 'true' and non-zero means 'false', which is the opposite of C and similar languages.
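Because the condition is just a command, any program that reports success or failure through its exit status can drive an if. The minimal sketch below assumes a throwaway file named fruit.txt in the CWD. It uses grep -q, which prints nothing and exits with 0 only if it finds the pattern.

```shell
#!/bin/sh

# Create a small sample file (fruit.txt is an illustrative name)
printf 'apple\nbanana\n' > fruit.txt

if grep -q cherry fruit.txt; then     # Fails: no cherry in the file
    found="cherry"
elif grep -q banana fruit.txt; then   # Succeeds, so this branch runs
    found="banana"
else
    found="neither"
fi
printf "Found: %s\n" "$found"         # Prints "Found: banana"

rm fruit.txt
```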

In most programming languages, we use some sort of Boolean expression (usually a comparison, also known as a relation), not a command, as the condition for an if statement. This is generally true in Bourne shell scripts as well, but the capability is provided in an interesting way. We'll illustrate by showing an example and then explaining how it works.

Suppose we have a shell variable and we want to check whether it contains the string "blue". We could use the following if command to test:

#!/bin/sh -e

printf "Enter the name of a color: "
read color

if [ "$color" = "blue" ]; then
    printf "You entered blue.\n"
elif [ "$color" = "red" ]; then
    printf "You entered red.\n"
else
    printf "You did not enter blue or red.\n"
fi

This may seem to contradict what we just stated: that Bourne shell conditionals take a command, not a Boolean expression. The interesting thing about this code is that the square brackets are not Bourne shell syntax.

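The '[' in the conditional above is actually a command, not shell syntax. In fact, it is simply another name for the test command. Most shells implement test and [ as built-in commands for speed, but they are also available as external programs. On some systems, such as the one shown below, the files /bin/test and /bin/[ are links to the same executable file. On others, they are separate programs that behave the same way.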

shell-prompt: ls -l /bin/test /bin/[
-r-xr-xr-x  2 root  wheel  8516 Apr  9  2012 /bin/[*
-r-xr-xr-x  2 root  wheel  8516 Apr  9  2012 /bin/test*
                

We could have also written the following, to make it more obvious that we are actually running another command in the if command:

if test "$color" = "blue"; then
                

Hence, '$color', '=', 'blue', and ']' are arguments to the '[' command, and must be separated by white space. If the command is invoked as '[', it requires the last argument to be ']'. If invoked as 'test', the ']' is not allowed.
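The short sketch below runs the same comparison under both names to show that '[' and test are one command. The value "navy blue" is arbitrary; it also shows why the variable reference is quoted, since quoting keeps the value a single argument. The last part shows that a missing ']' produces an error status, not a plain false.

```shell
#!/bin/sh

color="navy blue"

# Invoked as test: no closing ] allowed
if test "$color" = "navy blue"; then
    via_test="match"
else
    via_test="no match"
fi

# Invoked as [: the last argument must be ]
if [ "$color" = "navy blue" ]; then
    via_bracket="match"
else
    via_bracket="no match"
fi
printf "test: %s, [: %s\n" "$via_test" "$via_bracket"

# Forgetting the ] is an error (message discarded here), not a false result
[ "$color" = "navy blue" 2> /dev/null
missing_status=$?
printf "Missing ] exit status: %d\n" "$missing_status"
```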

The test command can be used to perform comparisons (relational operations) on variables and constants, as well as a wide variety of tests on files. For comparisons, test takes three arguments: the first and third are string values and the second is a relational operator.

# Compare a variable to a string constant
test "$name" = 'Bob'
[ "$name" = 'Bob' ]
                
# Compare the output of a program directly to a string constant
# (quoted in case the output is empty or contains white space)
test "`myprog`" = 42
[ "`myprog`" = 42 ]
                

For file tests, test takes two arguments: The first is a flag indicating which test to perform and the second is the path name of the file or directory.

# See if output file exists and is readable to the user
# running test
test -r output.txt
[ -r output.txt ]
                

The exit status of test is 0 (success) if the test is deemed to be true and a non-zero value if it is false.

shell-prompt: test 1 = 1
shell-prompt: echo $?   # Or $status for C shell
0
shell-prompt: test 1 = 2
shell-prompt: echo $?   # Or $status for C shell
1
                

The relational operators supported by test are shown in Table 4.5, “Test Command Relational Operators”.

Table 4.5. Test Command Relational Operators

Operator   Relation
=          Lexical equality (string comparison)
-eq        Integer equality
!=         Lexical inequality (string comparison)
-ne        Integer inequality
<          Lexical less-than (10 < 9)
-lt        Integer less-than (9 -lt 10)
-le        Integer less-than or equal
>          Lexical greater-than
-gt        Integer greater-than
-ge        Integer greater-than or equal

Caution

Note that some operators, such as < and >, have special meaning to the shell, so they must be escaped or quoted.

[ 10 > 9 ]
test 10 > 9     # Both lines redirect output to a file called '9'.
                # The only operand sent to the test command is '10'.
                # With a single non-empty operand, test simply succeeds,
                # so the "comparison" is true regardless of the values.

[ 10 \> 9 ]
test 10 \> 9    # Compares 10 to 9

[ 10 '>' 9 ]
test 10 '>' 9   # Compares 10 to 9
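The sketch below contrasts lexical and integer operators from the table. It assumes your shell's test supports the lexical < operator. bash and dash do, but older POSIX versions of test did not include it. Note that the < is escaped so the shell does not treat it as a redirection.

```shell
#!/bin/sh

a=10
b=9

# Lexical comparison: "10" sorts before "9" because '1' comes before '9'
if [ "$a" \< "$b" ]; then lexical_less="yes"; else lexical_less="no"; fi

# Integer comparison: 10 is not less than 9
if [ "$a" -lt "$b" ]; then integer_less="yes"; else integer_less="no"; fi

# "07" and "7" are different strings but the same integer
if [ "07" = "7" ]; then string_equal="yes"; else string_equal="no"; fi
if [ "07" -eq "7" ]; then integer_equal="yes"; else integer_equal="no"; fi

printf "lexical <: %s, -lt: %s, =: %s, -eq: %s\n" \
    "$lexical_less" "$integer_less" "$string_equal" "$integer_equal"
```

This prints "lexical <: yes, -lt: no, =: no, -eq: yes". Choosing the wrong family of operator silently gives a different answer, not an error.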
                

Caution

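It is a common error to use '==' with the test command. The standard equality operator is '=', unlike C and similar languages. Some shells, such as bash, accept '==' in their built-in test. Others, such as dash, reject it, so scripts that use '==' are not portable.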

Common file tests are shown in Table 4.6, “Test command file operations”. To learn about additional file tests, run man test.

Table 4.6. Test command file operations

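Flag   Test
-e     Exists
-r     Is readable
-w     Is writable
-x     Is executable
-d     Is a directory
-f     Is a regular file
-L     Is a symbolic link
-s     Exists and is not empty

Note that -z is not a file test: test -z string succeeds if the string has zero length. To check whether a file exists and is empty, combine two tests, e.g. [ -e file ] && [ ! -s file ].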
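The following minimal sketch exercises several of these file tests. test-scratch is a hypothetical scratch directory created in the CWD and removed at the end. The last test combines -e with a negated -s, because no single flag means "exists and is empty".

```shell
#!/bin/sh

# test-scratch is a hypothetical scratch directory in the CWD
mkdir -p test-scratch
: > test-scratch/empty.txt                 # Create an empty regular file
printf 'hello\n' > test-scratch/data.txt   # Create a non-empty regular file

if [ -d test-scratch ]; then is_dir="yes"; else is_dir="no"; fi
if [ -f test-scratch/data.txt ]; then is_file="yes"; else is_file="no"; fi
if [ -s test-scratch/data.txt ]; then data_has_bytes="yes"; else data_has_bytes="no"; fi
if [ -s test-scratch/empty.txt ]; then empty_has_bytes="yes"; else empty_has_bytes="no"; fi

# Exists and is empty: -e combined with a negated -s
if [ -e test-scratch/empty.txt ] && [ ! -s test-scratch/empty.txt ]; then
    exists_and_empty="yes"
else
    exists_and_empty="no"
fi

printf "%s %s %s %s %s\n" "$is_dir" "$is_file" "$data_has_bytes" \
    "$empty_has_bytes" "$exists_and_empty"    # Prints "yes yes yes no yes"

rm -r test-scratch
```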

Caution

Variable references in a [ or test command should usually be enclosed in soft quotes. If the value of the variable contains white space, such as "navy blue", and the variable is not enclosed in quotes, then "navy" and "blue" will be considered two separate arguments to the [ command, and it will fail.

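Furthermore, if there is a chance that a variable used in a comparison is empty, we must make sure it still occupies its place in the argument list. An unquoted empty variable vanishes entirely, which leaves test with too few arguments. Quoting passes an empty argument, and modern test implementations handle that correctly. However, some older implementations could misparse an empty operand, or one that looks like an operator such as '!' or '-f'. A portable safeguard is to attach a common string to the arguments on both sides of the operator. It can be almost any character, but '0' is popular and easy to read.

name=""
if [ $name = "Bob" ]; then      # Error, expands to: if [ = Bob ]; then
if [ "$name" = "Bob" ]; then    # OK on modern systems: if [ '' = Bob ]; then
if [ 0"$name" = 0"Bob" ]; then  # OK everywhere, expands to: if [ 0 = 0Bob ]; then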
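The sketch below shows what the [ command actually receives in each case. The values are arbitrary. Error messages from the malformed command are discarded so only the exit status is visible.

```shell
#!/bin/sh

color="navy blue"

# Quoted: [ receives "navy blue", "=", "navy blue" and the closing ]
if [ "$color" = "navy blue" ]; then quoted="match"; else quoted="no match"; fi

# Unquoted: the value splits into "navy" and "blue", so [ receives too many
# arguments, reports an error (discarded here), and exits non-zero
[ $color = "navy blue" ] 2> /dev/null
unquoted_status=$?

# A quoted empty variable is passed as an empty argument: a plain false (1)
name=""
[ "$name" = "Bob" ]
empty_status=$?

# The prefix idiom compares "0" to "0Bob": also false, never an error
if [ 0"$name" = 0"Bob" ]; then prefixed="match"; else prefixed="no match"; fi

printf "quoted: %s, unquoted status: %d, empty status: %d, prefixed: %s\n" \
    "$quoted" "$unquoted_status" "$empty_status" "$prefixed"
```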
                

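Relational operators are provided by the test command, not by the shell's own syntax. Hence, to find out the details, we would run "man test" or "man [". If the shell implements test as a built-in command, the shell's manual (e.g. the SHELL BUILTIN COMMANDS section of "man bash") documents that version as well.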

Practice Break

Run the following commands in sequence and run echo $? after every test under Bourne shell or echo $status after every test under C shell.

which [ test
test 1 = 1
test 1=2
test 1 = 2
[ 1 = 1
[ 1 = 1 ]
[ 2 < 10 ]
[ 2 \< 10 ]
[ 2 -lt 10 ]
name=''             # Bourne shell only
set name=''         # C shell only
[ $name = Bill ]
[ 0$name = 0Bill ]
name=Bob            # Bourne shell only
set name=Bob        # C shell only
[ $name = Bill ]
[ $name = Bob ]
                    
C shell Family

Unlike the Bourne shell family of shells, the C shell family implements its own conditional expressions and operators, so there is generally no need for the test or [ command, though you can use it in C shell scripts if you really want to.

The C shell if command requires () around the condition, and the condition is a Boolean expression, just like in C and similar languages. As in C, and unlike Bourne shell, a value of zero is considered false and non-zero is true.

#!/bin/csh -ef

printf "Enter the name of a color: "
set color = "$<"

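# { command } is true if the command exits with status 0, so the
# test command can still be used in a C shell condition if desired
if ( { test "$color" \< blue } ) then
    printf "%s sorts before blue.\n" "$color"
endif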

if ( "$color" == "blue" ) then
    printf "You entered blue.\n"
else if ( "$color" == "red" ) then
    printf "You entered red.\n"
else
    printf "You did not enter blue or red.\n"
endif

The C shell relational operators are shown in Table 4.7, “C Shell Relational Operators”.

Table 4.7. C Shell Relational Operators

Operator  Relation
<         Integer less-than
>         Integer greater-than
<=        Integer less-than or equal
>=        Integer greater-than or equal
==        String equality
!=        String inequality
=~        String matches glob pattern
!~        String does not match glob pattern

C shell if commands also need soft quotes around strings that contain white space. However, the C shell compares empty strings correctly, so we don't need to add an arbitrary prefix like '0' if the string may be empty. Compare:

if [ 0"$name" = 0"Bob" ]; then      # Bourne shell

if ( "$name" == "Bob" ) then        # C shell
                

The most readable way to check the status of a command in C shell is using the status variable. Note that we need to avoid invoking csh with -e so that the shell process will not terminate when a command fails.

#!/bin/csh -f

command1
if ( $status == 0 ) then
    # Stuff to do only if command1 succeeded
else
    exit 1  # Exit on error since we did not use #!/bin/csh -e
endif
                
Shell Conditional Operators

Unix shells provide conditional operators that allow us to invert the exit status of a command or combine the exit statuses of multiple commands. They use the same Boolean operators as C for AND (&&), OR (||), and NOT (!).

Table 4.8. Shell Conditional Operators

Operator               Meaning  Exit status
! command              NOT      0 if command failed, 1 if it succeeded
command1 && command2   AND      0 if both commands succeeded
command1 || command2   OR       0 if either command succeeded

# Invert exit status (0 to non-zero, non-zero to 0)
shell-prompt: ! command

# See if both command1 and command2 succeeded
shell-prompt: command1 && command2

# See if either command1 or command2 succeeded
shell-prompt: command1 || command2
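A minimal sketch of the three operators uses the standard true and false commands, which exit with 0 and a non-zero status (1 on common systems), respectively. As in earlier examples, it omits -e so that failing commands do not end the script.

```shell
#!/bin/sh

! false
not_status=$?        # 0: NOT turns failure into success

true && false
and_status=$?        # Non-zero: AND requires both commands to succeed

false || true
or_status=$?         # 0: OR requires only one command to succeed

printf "NOT: %d, AND: %d, OR: %d\n" "$not_status" "$and_status" "$or_status"
```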
            

These operators can be used in Bourne shell conditionals much the same way as in C:

if [ 0"$first_name" = 0"Bob" ] && [ 0"$last_name" = 0"Newhart" ]; then
            

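We can also use the test command's own operators. Note, however, that POSIX marks -a and -o as obsolescent, so combining separate [ commands with && and || is more portable: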

if [ 0"$first_name" = 0"Bob" -a 0"$last_name" = 0"Newhart" ]; then
            

Note that in the case of the && operator, command2 will not be executed if command1 fails (exits with non-zero status). There is no point running the second command, since both commands must succeed to produce an overall status of 0. Once any command in an && sequence fails, the exit status of the whole sequence will be non-zero (the status of the command that failed), no matter how many && commands follow it.

Likewise, in the case of the || operator, once any command succeeds (exits with zero status), the remaining commands will not be executed.

This fact is often used as a clever trick to conditionally execute a command only if another command succeeds or fails.

# Execute main-processing only if pre-processing succeeds and
# post-processing only if main-processing succeeds
pre-processing && main-processing && post-processing

# Equivalent using an if-then-fi
if pre-processing; then
    if main-processing; then
        post-processing
    fi
fi
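The short-circuit behavior can be observed directly. In the sketch below, each command after && or || appends a letter to a variable only if it actually runs. The variable name trace is arbitrary.

```shell
#!/bin/sh

trace=""

# The command after && runs only if the one before it succeeds
false && trace="${trace}A"    # Skipped
true  && trace="${trace}B"    # Runs

# The command after || runs only if the one before it fails
false || trace="${trace}C"    # Runs
true  || trace="${trace}D"    # Skipped

printf "%s\n" "$trace"        # Prints "BC"
```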
            

Conditional operators can also be used in a C shell if command. Parentheses around each relation are recommended for readability.

if ( ("$first_name" == "Bob") && ("$last_name" == "Newhart") ) then
            

Practice Break

Run the following commands in sequence and run echo $? after every command under Bourne shell or echo $status after every command under C shell.

ls -z
ls -z && echo Done
ls -a && echo Done
ls -z || echo Done
ls -a || echo Done
                

Practice Break

Instructor: Lead the class through development of a script that does the following. The solution is at the end of the chapter. No peeking...

  1. Lists the CWD

  2. Prompts the user for a filename and reads it

  3. Prints an error message and exits with status 65 if the file does not exist or is not a regular file

  4. Displays the first 5 lines of the file

  5. Prompts the user for a simple search string and reads it

  6. Displays lines in the file that contain the search string

shell-prompt: ./search.sh 
CNC-EMDiff              Reference               search-me.txt
Computer                SVN                     search.sh
Enter the name of the file to search: searchme.txt
searchme.txt is not a regular file.

shell-prompt: ./search.sh
CNC-EMDiff              Reference               search-me.txt
Computer                SVN                     search.sh
Enter the name of the file to search: GFF
GFF is not a regular file.

shell-prompt: ./search.sh
CNC-EMDiff              Reference               search-me.txt
Computer                SVN                     search.sh
Enter the name of the file to search: search-me.txt
This
is
a
test
file
Enter a string to search for in search-me.txt: test
test
                
Case and Switch Commands

If you need to compare a single variable to many different values, you could use a long string of elifs or else ifs:

#!/bin/sh -e

printf "Enter a color name: "
read color

if [ "$color" = "red" ] || \
   [ "$color" = "orange" ]; then
    printf "Long wavelength\n"
elif [ "$color" = "yellow" ] || \
     [ "$color" = "green" ] || \
     [ "$color" = "blue" ]; then
    printf "Medium wavelength\n"
elif [ "$color" = "indigo" ] || \
     [ "$color" = "violet" ]; then
    printf "Short wavelength\n"
else
    printf "Invalid color name: %s\n" "$color"
fi

Like most languages, however, Unix shells offer a cleaner solution.

Bourne shell has the case command:

#!/bin/sh -e

printf "Enter a color name: "
read color

case $color in
    red|orange)
        printf "Long wavelength\n"
        ;;
    yellow|green|blue)
        printf "Medium wavelength\n"
        ;;
    indigo|violet)
        printf "Short wavelength\n"
        ;;
    *)
        printf "Invalid color name: %s\n" "$color"
        ;;
esac
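The patterns in a case command are shell glob patterns, the same kind used for file name matching. That is why '*' works as a catch-all default. The sketch below normalizes a yes/no answer. The answer is hard-coded for illustration where a real script would read it.

```shell
#!/bin/sh

answer="Yes"    # Hard-coded here; a real script might use: read answer

case $answer in
    [Yy]|[Yy][Ee][Ss])        # y, Y, yes, Yes, YES, yEs, ...
        reply="yes"
        ;;
    [Nn]|[Nn][Oo])            # n, N, no, No, NO, nO
        reply="no"
        ;;
    *)                        # Anything else
        reply="unknown"
        ;;
esac

printf "Normalized answer: %s\n" "$reply"   # Prints "Normalized answer: yes"
```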

C shell has a switch command that looks almost exactly like the switch statement in C, C++, and Java:

#!/bin/csh -ef

printf "Enter a color name: "
set color = "$<"

switch($color)
case    red:
case    orange:
    printf "Long wavelength\n"
    breaksw
case    yellow:
case    green:
case    blue:
    printf "Medium wavelength\n"
    breaksw
case    indigo:
case    violet:
    printf "Short wavelength\n"
    breaksw
default:
    printf "Invalid color name: %s\n" "$color"
endsw

Note

The ;; terminator and the breaksw command cause a jump to the first command after the entire case or switch. In the Bourne shell case command, ;; is required after the commands for every pattern except the last, where it is optional. The breaksw is optional in the switch command. If omitted, the script will simply "fall through" to the next case (continue on and execute the commands for the next case value).

Note

Code controlled by a case or switch should be consistently indented as shown above. How much indentation is used is a matter of personal taste, but four spaces is typical.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.
  1. What is the exit status of a command that succeeds? One that fails?

  2. What variables contain the exit status of the most recent command in Bourne shell and C shell?

  3. What is the meaning of [ in a shell script and how is it used?

  4. Write a shell script that lists the files in the CWD, asks the user for the name of a file, tests for the existence of the file, and issues an error message if it does not exist. If the file does exist, the script then asks for a source string and a replacement string, verifying that the source string is not empty, and then shows the file content with all occurrences of the source replaced by the replacement. The script should exit with status 65 (EX_DATAERR) if any bad input is received.

    shell-prompt: cat fox.txt
    The quick brown fox jumped over the lazy dog.
    
    shell-prompt: ./replace.sh
    Documents          R                  fox.txt            stringy
    Downloads          igv                stringy.c
    File name? fxo.txt
    File fxo.txt does not exist.
    shell-prompt: echo $?
    65
    
    shell-prompt: ./replace.sh
    Documents          R                  fox.txt            stringy
    Downloads          igv                stringy.c
    File name? fox.txt
    Source string? 
    Source string cannot be empty.
    shell-prompt: echo $?
    65
    
    shell-prompt: ./replace.sh
    Documents          R                  fox.txt            stringy
    Downloads          igv                stringy.c
    File name? fox.txt
    Source string? fox
    Replacement string? tortoise
    The quick brown tortoise jumped over the lazy dog.
    shell-prompt: echo $?
    0
            
  5. Modify the previous script so that it reports an error if the replacement string is empty or the same as the source. Use a conditional operator to check both conditions in one if-then-else command.

    shell-prompt: ./replace.sh
    Documents          R                  fox.txt            stringy
    Downloads          igv                stringy.c
    File name? fox.txt
    Source string? fox
    Replacement string?
    Replacement must not be empty or the same as source.
    shell-prompt: echo $?
    65
    
    shell-prompt: ./replace.sh
    Documents          R                  fox.txt            stringy
    Downloads          igv                stringy.c
    File name? fox.txt
    Source string? fox
    Replacement string? fox
    Replacement must not be empty or the same as source.
    shell-prompt: echo $?
    65
            
  6. Write a shell script that asks the user for a directory name and the name of an archive to create from it, checks the file name extension on the archive name using a switch/case, and creates a tarball with the appropriate compression. The script should report an error and exit with status 65 if an invalid file name extension is used for the archive name, or if the directory name entered does not exist or is not a directory.

    shell-prompt: ./case.sh
    Coral              Qemu               case.sh            scripts
    Directory to archive? Qem
    Qem is not an existing directory.
    
    shell-prompt: ./case.sh
    Coral              Qemu               case.sh            scripts
    Directory to archive? case.sh
    case.sh is not an existing directory.
    
    shell-prompt: ./case.sh
    Coral              Qemu               case.sh            scripts
    Directory to archive? Qemu
    Archive name? qemu.ta
    Invalid archive name: qemu.ta
    
    shell-prompt: ./case.sh
    Coral              Qemu               case.sh            scripts
    Directory to archive? Qemu  
    Archive name? qemu.txz
    a Qemu
    a Qemu/FreeBSD-13.0-RELEASE-riscv-riscv64.raw
            

    Table 4.9. Compression tool for each filename extension

    Extension          Tool
    tar                No compression
    tar.gz or tgz      gzip
    tar.bz2 or tbz     bzip2
    tar.xz or txz      xz

    In Bourne shell, the file name extension can be extracted from a shell variable as follows. Note that this yields only the part after the last '.', e.g. gz for qemu.tar.gz:

    extension=${filename##*.}   # Strip off everything through the last '.'
            

    In C shell:

    set extension=${archive:e}  # Extract filename extension
            
  7. Write a shell script that does the following in sequence:

    1. Check for the existence of Homo_sapiens.GRCh38.107.chromosome.1.gff3 in the CWD. If it is not present, download the gzipped file using curl from http://ftp.ensembl.org/pub/release-107/gff3/homo_sapiens/ and decompress it.

    2. Display the first 5 lines of the file that do not begin with '#', so the user can see the format of an entry. Hint: See the grep man page for an option to select lines that do not match the given pattern.

    3. Ask the user which column to search, and then display all unique values in that column. Hint: Use grep again to filter out lines beginning with '#', run them through cut or awk to select just the desired column, and run the output through sort and uniq or just sort with the appropriate flags.

    4. Ask the user for a search key, and display all the lines in the file not beginning with '#' that contain the given key in the given column. Hint: The awk '~' operator tests whether a string matches a regular expression; for a plain string without special characters, this means "contains". For example, '$1 ~ "text"' means the first field contains the string "text". You can use the -v option to set column and key variables to use in the awk pattern.

      awk -v column="$column" -v key="$key" 'your awk script'
                  
    shell-prompt: ./gff.sh 
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 4111k  100 4111k    0     0   550k      0  0:00:07  0:00:07 --:--:--  589k
    
    1       GRCh38  chromosome      1       248956422       .       .       .      ID=chromosome:1;Alias=CM000663.2,chr1,NC_000001.11
    1       .       biological_region       10469   11240   1.3e+03 .       .      external_name=oe %3D 0.79;logic_name=cpg
    1       .       biological_region       10650   10657   0.999   +       .      logic_name=eponine
    1       .       biological_region       10655   10657   0.999   -       .      logic_name=eponine
    1       .       biological_region       10678   10687   0.999   +       .      logic_name=eponine
    
    Column to search? 3
    CDS
    biological_region
    chromosome
    exon
    five_prime_UTR
    gene
    lnc_RNA
    mRNA
    miRNA
    ncRNA
    ncRNA_gene
    pseudogene
    pseudogenic_transcript
    rRNA
    scRNA
    snRNA
    snoRNA
    three_prime_UTR
    unconfirmed_transcript
    
    Search key? UTR
    1       havana  five_prime_UTR  65419   65433   .       +       .       Parent=transcript:ENST00000641515
    1       havana  five_prime_UTR  65520   65564   .       +       .       Parent=transcript:ENST00000641515
    1       havana  three_prime_UTR 70009   71585   .       +       .       Parent=transcript:ENST00000641515
    1       havana  five_prime_UTR  923923  924431  .       +       .       Parent=transcript:ENST00000616016
    1       havana  three_prime_UTR 944154  944574  .       +       .       Parent=transcript:ENST00000616016
    ...