Loops

As in other programming languages, our scripts often need to run the same command or commands repeatedly. Very often, we need to run the same sequence of commands on a group of files. In simple cases, we can provide all of the files as arguments to a single invocation of a command, or use xargs to supply them:

shell-prompt: analyze input*.txt
shell-prompt: find . -name 'input*.txt' | xargs analyze

We can achieve the same effect, as well as handle more complex situations involving multiple commands, using a loop in a shell script.

For and Foreach

Unix shells offer a type of loop that takes an enumerated list of string values, rather than counting through a sequence of numbers or looping while some input condition is true. This makes shell scripts very convenient for working with sets of files or arbitrary sets of values. This type of loop is well suited for use with globbing (file name patterns using wild cards, as discussed in the section called “Globbing (File Specifications)”):

#!/bin/sh -e

# Process input-1.txt, input-2.txt, etc.
for file in input-*.txt
do
    ./myprog "$file"
done

#!/bin/csh -ef

# Process input-1.txt, input-2.txt, etc.
foreach file (input-*.txt)
    ./myprog $file
end

Note

Code controlled by a loop should be consistently indented as shown above. How much indentation is used is a matter of personal taste, but four spaces is typical.
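
A file name loop like the Bourne shell example above has one quirk worth guarding against: when no file matches the pattern, the shell leaves the pattern itself in the list, so the loop runs once with the literal text input-*.txt. The following is a minimal sketch of one way to handle this, assuming it is acceptable to silently skip names that do not exist. The function name process_inputs is hypothetical, and printf stands in for ./myprog.

```shell
#!/bin/sh -e

# Hypothetical helper: report each file matching input-*.txt in the
# current directory.  printf stands in for a real program such as ./myprog.
process_inputs()
{
    for file in input-*.txt
    do
        # If nothing matched, $file holds the literal pattern, so skip it
        [ -e "$file" ] || continue

        # Quotes keep a name containing white space as a single argument
        printf "Processing %s\n" "$file"
    done
}

process_inputs
```

Quoting "$file" matters only when a name contains white space or wildcard characters, but it costs nothing to do it every time.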

These loops are not limited to using file names, however. We can use them to iterate through any list of string values:

#!/bin/sh -e

for fish in flounder gobie hammerhead manta moray sculpin
do
    printf "%s\n" $fish
done

#!/bin/sh -e

for c in 1 2 3 4 5 6 7 8 9 10
do
    printf "%d\n" $c
done

To iterate through a list of integers too long to type out, we can use the seq command, which takes a starting value, optionally an increment value, and an ending value, and prints the sequence to the standard output. We can use shell output capture (the section called “Output Capture”) to represent the output of the seq command as a string in the script:

#!/bin/sh -e

# Count from 0 to 1000 in increments of 5
for c in $(seq 0 5 1000); do
    printf "%d\n" $c
done

#!/bin/csh -ef

foreach c (`seq 0 5 1000`)
    printf "%s\n" $c
end
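
The argument forms of seq described above can be explored directly before using them in a loop. The following sketch assumes a seq that accepts one, two, or three arguments as well as negative increments, as the common GNU and BSD versions do. Note that seq is not part of the POSIX standard, so its behavior may differ on other systems.

```shell
#!/bin/sh -e

# One argument: count from 1 to the ending value
seq 3            # prints 1 2 3, one value per line

# Two arguments: starting value and ending value
seq 8 10         # prints 8 9 10

# Three arguments: starting value, increment, ending value
seq 0 5 20       # prints 0 5 10 15 20

# A negative increment counts down
seq 10 -5 0      # prints 10 5 0
```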

The seq command can even be used to embed integer values in a non-integer list:

#!/bin/sh -e

# Process all human chromosomes
for chromosome in $(seq 1 22) X Y; do
    printf "chr%s\n" $chromosome
done

Practice Break

Type in and run the fish example above.

Example 4.3. Multiple File Downloads

Often we need to download many large files from another site. This process would be tedious to do manually: Start a download, wait for it to finish, start another... There may be special tools provided by the website, but often they are poorly maintained or difficult to use. In many cases, we may be able to automate the download using a simple script and a mainstream transfer tool such as curl, rsync, or wget.

The model scripts below demonstrate how to download a set of files using curl. The local file names will be the same as those on the remote site, and if the transfer is interrupted for any reason, we can simply run the script again to resume the download where it left off.

Depending on the tools available on your local machine and the remote server, you may need to substitute another file transfer program for curl.

#!/bin/sh -e

# Download genome data from the ACME genome project
site=http://server.with.my.files/directory/with/my/files
for file in frog1 frog2 frog3 toad1 toad2 toad3; do
    printf "Fetching %s...\n" "$site/$file.fasta.gz"
    
    # Use filename from remote site and try to resume interrupted
    # transfers if a partial download already exists
    curl --continue-at - --remote-name $site/$file.fasta.gz
done

#!/bin/csh -ef

# Download genome data from the ACME genome project
set site=http://server.with.my.files/directory/with/my/files
foreach file (frog1 frog2 frog3 toad1 toad2 toad3)
    printf "Fetching %s...\n" "$site/$file.fasta.gz"
    
    # Use filename from remote site and try to resume interrupted
    # transfers if a partial download already exists
    curl --continue-at - --remote-name $site/$file.fasta.gz
end

While Loops

A for or foreach loop is only convenient for iterating through a fixed set of values or a sequence generated by a program such as seq. Sometimes we may need to terminate a loop based on inputs that are unknown when the loop begins, or values computed over the course of the loop.

The while loop is a more general loop that iterates as long as some condition is true. It uses the same types of expressions as an if command. The while loop can be used to iterate through long integer sequences, as we might do with seq and a for/foreach loop:

#!/bin/sh -e

c=1
while [ $c -le 100 ]
do
    printf "%d\n" $c
    c=$(($c + 1))        # $(( )) expands to the value of an integer expression
done

Note again that the [ above is an external command, as discussed in the section called “Command Exit Status”, so we must use white space to separate the arguments.

#!/bin/csh -ef

set c = 1
while ( $c <= 100 )
    printf "%d\n" $c
    @ c = $c + 1        # @ is like set, but indicates an integer expression
end

Practice Break

Type in and run the script above.
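
A while loop is most valuable when the number of iterations depends on values computed along the way, which a for loop over a fixed list cannot express. As a minimal sketch, the script below doubles a number until it reaches a limit. The variable names limit, power, and doublings are chosen for this example only.

```shell
#!/bin/sh -e

# Find the smallest power of 2 that is at least $limit.
# We do not know in advance how many iterations this will take.
limit=1000
power=1
doublings=0
while [ $power -lt $limit ]
do
    power=$(($power * 2))
    doublings=$(($doublings + 1))
done
printf "%d doublings reach %d\n" $doublings $power
```

With limit set to 1000, the loop stops at 1024 after 10 doublings. Changing limit changes the iteration count without any other edits to the script.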

While loops can also be used to iterate until an input condition is met:

#!/bin/sh -e

continue=''
while [ 0"$continue" != 0'y' ] && [ 0"$continue" != 0'n' ]; do
    printf "Would you like to continue? (y/n) "
    read continue
done

#!/bin/csh -ef

set continue=''
while ( ("$continue" != 'y') && ("$continue" != 'n') )
    printf "Continue? (y/n) "
    set continue="$<"
end

Practice Break

Type in and run the script above.
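
The same idea extends to input of unknown length. In a Bourne-family shell, read returns a non-zero exit status at the end of the input, so it can serve directly as the loop condition. The sketch below is one common way to process input line by line. The here document is an assumption standing in for a file or the output of another command, and the variable names are hypothetical.

```shell
#!/bin/sh -e

# Read one name per line until the input is exhausted.
# The here document stands in for a real file or command output.
count=0
while read -r name
do
    count=$(($count + 1))
    printf "%d: %s\n" $count "$name"
done << EOF
flounder
gobie
hammerhead
EOF
printf "Read %d names\n" $count
```

The -r flag tells read to treat backslashes in the input literally rather than as escape characters.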

We may even want a loop to iterate forever. This is often useful when using a computer to collect data at regular intervals. It is up to the user to terminate the process using Ctrl+c or kill.

#!/bin/sh -e

# 'true' is an external command that always returns an exit status of 0
while true; do
    sample-data     # Read instrument
    sleep 10        # Pause for 10 seconds without using any CPU time
done

#!/bin/csh -ef

while ( 1 )
    sample-data     # Read instrument
    sleep 10        # Pause for 10 seconds without using any CPU time
end
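
Since such a script is normally stopped with Ctrl+c or kill, it can be useful to have it report or clean up before exiting. The following is a rough sketch of one way to do this in a Bourne-family shell using trap. The function sample_data is a hypothetical stand-in for a real instrument reader, and max_samples is an assumption added only so that the demonstration ends by itself; a real data collector would use while true as shown above.

```shell
#!/bin/sh -e

# Hypothetical stand-in for a program that reads an instrument
sample_data()
{
    date '+%H:%M:%S'
}

# Report the number of samples taken if the user presses Ctrl+c
# or sends the default signal with kill
samples=0
trap 'printf "\nStopped after %d samples\n" $samples; exit 0' INT TERM

# max_samples exists only so that this demonstration terminates
max_samples=3
while [ $samples -lt $max_samples ]
do
    sample_data
    samples=$(($samples + 1))
    sleep 1         # Pause for 1 second without using any CPU time
done
```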

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.

  1. Describe three ways to run the same program using multiple input files.

  2. Write a shell script that prints the square of every number from 1 to 10 using for/foreach and seq.

  3. Write a shell script that prints the square of every number from 1 to 100 using while.

  4. Write a shell script that sorts each file in the CWD whose name begins with "input" and ends in ".txt", and saves the output to original-filename.sorted. You may assume that there are not too many input files for a simple globbing pattern. The script then merges all the sorted text into a single file called combined.txt.sorted, with duplicate lines removed. Hint: The sort command can also merge presorted files. Check the man page for the necessary flag.

    shell-prompt: ./sort.sh 
    
    input1.txt:
    Starbuck
    Adama
    
    input1.txt.sorted:
    Adama
    Starbuck
    
    input2.txt:
    Tigh
    Apollo
    
    input2.txt.sorted:
    Apollo
    Tigh
    
    Combined:
    Adama
    Apollo
    Starbuck
    Tigh