2.15. Loops

We often need to run the same command or commands on a group of files or other data.

2.15.1. For and Foreach

Unix shells offer a type of loop that takes an enumerated list of string values, rather than counting through a sequence of numbers. This makes it more flexible for working with sets of files or arbitrary sets of values.

This type of loop is well suited for use with globbing (file name patterns using wild cards, as discussed in Section 1.7.5, “Globbing (File Specifications)”):

#!/usr/bin/env bash

# Process input-1.txt, input-2.txt, etc.
for file in input-*.txt
do
    ./myprog "$file"    # Quotes keep names containing spaces intact
done

#!/bin/csh -ef

# Process input-1.txt, input-2.txt, etc.
foreach file (input-*.txt)
    ./myprog "$file"
end
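Two details matter when a loop variable holds a file name. File names may contain spaces, so quoting "$file" keeps each name as a single argument. Also, if no files match the pattern, Bourne-family shells such as sh and bash (in their default configuration) leave the pattern itself in the list, so the loop runs once with the literal string input-*.txt. The sketch below guards against that with a [ -e ] test. It assumes a POSIX sh, works in a scratch directory created with mktemp, and uses wc -l as a stand-in for ./myprog.

```shell
#!/bin/sh

# Work in a scratch directory so the example is self-contained
olddir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create sample input files; one name contains a space
printf 'a\nb\n' > input-1.txt
printf 'c\n' > 'input-2 copy.txt'

processed=0
for file in input-*.txt
do
    # If nothing matched, "$file" is the unexpanded pattern; skip it
    [ -e "$file" ] || continue
    # Quoting keeps "input-2 copy.txt" as one argument
    wc -l < "$file"     # Stand-in for ./myprog
    processed=$((processed + 1))
done

# A pattern that matches nothing: the guard skips the literal pattern
unmatched=0
for file in nomatch-*.txt
do
    [ -e "$file" ] || continue
    unmatched=$((unmatched + 1))
done

printf 'Processed %d files, %d unmatched\n' "$processed" "$unmatched"

# Clean up the scratch directory
cd "$olddir" || exit 1
rm -rf "$workdir"
```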

These loops are not limited to using file names, however. We can use them to iterate through any list of string values:

#!/bin/sh

for fish in flounder gobie hammerhead manta moray sculpin
do
    printf "%s\n" "$fish"
done

#!/usr/bin/env bash

for c in 1 2 3 4 5 6 7 8 9 10
do
    printf "%d\n" $c
done
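The list can also come from a variable. When the variable is expanded without quotes, the shell splits its value at white space, so each word becomes one iteration; quoting it produces a single iteration with the whole string. A minimal sketch in POSIX sh, where the variable name fish_list is only for illustration:

```shell
#!/bin/sh

fish_list="flounder gobie hammerhead"

# Unquoted: the value is split into three words, so three iterations
count=0
for fish in $fish_list
do
    printf "%s\n" "$fish"
    count=$((count + 1))
done

# Quoted: the value stays one word, so only one iteration
quoted_count=0
for fish in "$fish_list"
do
    quoted_count=$((quoted_count + 1))
done

printf "unquoted: %d, quoted: %d\n" "$count" "$quoted_count"
```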

To iterate through a list of integers too long to type out, we can use the seq command, which takes a starting value, optionally an increment value, and an ending value. We use shell output capture (Section 2.11.4, “Output Capture”) to represent the output of the seq command as a string in the script:

#!/bin/sh -e

# Count from 0 to 1000 in increments of 5
for c in $(seq 0 5 1000); do
    printf "%d\n" $c
done

#!/bin/csh

foreach c (`seq 0 5 1000`)
    printf "%s\n" $c
end
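Output capture from seq also combines well with printf when the loop values are used to build file names. The sketch below assumes a hypothetical two-digit naming scheme (input-01.txt, input-02.txt, ...), which differs from the input-1.txt names used earlier:

```shell
#!/bin/sh

# Generate zero-padded names: input-01.txt ... input-12.txt
names=''
for c in $(seq 1 12); do
    # %02d pads the number with a leading zero to two digits
    name=$(printf "input-%02d.txt" "$c")
    printf "%s\n" "$name"
    names="$names $name"
done
```

Zero padding is a design choice worth considering: globbing and ls sort names character by character, so input-10.txt would otherwise sort before input-2.txt.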

The seq command can also be used to embed a range of integers in a list of other values:

#!/bin/sh -e

# Process all human chromosomes
for chromosome in $(seq 1 22) X Y; do
    printf "chr%s\n" $chromosome
done
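When the loop variable is embedded in a longer word, such as an output file name, braces mark where the variable name ends. Without them, $chromosome_counts is read as a reference to a different variable named chromosome_counts, which is unset here. In this sketch the file names are hypothetical:

```shell
#!/bin/sh -e

for chromosome in $(seq 1 22) X Y; do
    # ${chromosome} ends the variable name before the "_counts" suffix
    outfile="chr${chromosome}_counts.txt"
    printf "%s\n" "$outfile"
done

# Without braces, the shell looks up "chromosome_counts" (unset),
# so this expands to "chr.txt"
wrong="chr$chromosome_counts.txt"
printf "without braces: %s\n" "$wrong"
```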

Practice Break

Type in and run the fish example above.

Note

Note again that Unix commands, including the shell, generally don't care whether their input comes from a file or a device such as the keyboard. Try running the fish example by typing it directly at the shell prompt as well as by writing a script file. When running it directly, be sure to use the correct shell syntax for the interactive shell you are running.
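At an interactive prompt, a Bourne-family shell such as sh or bash also accepts the whole loop on one line, with semicolons separating the parts. This sketch assumes such a shell; in csh or tcsh, foreach is instead typed over several lines, and the shell keeps prompting for body lines until end is entered.

```shell
for fish in flounder gobie hammerhead manta moray sculpin; do printf "%s\n" "$fish"; done
```

After the loop finishes, the loop variable keeps its last value, so echo "$fish" would print sculpin.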

Example 2.3. Multiple File Downloads

Often we need to download many large files from another site. This process would be tedious to do manually: Start a download, wait for it to finish, start another... There may be special tools provided, but often they are unreliable or difficult to install. In many cases, we may be able to automate the download using a simple script and a file transfer tool such as curl, fetch, rsync or wget.

The model scripts below demonstrate how to download a set of files using curl. The local file names will be the same as those on the remote site, and if the transfer is interrupted for any reason, we can simply run the script again to resume the download where it left off.

Depending on the tools available on your local machine and the remote server, you may need to substitute another file transfer program for curl.

#!/bin/sh -e

# Download genome data from the ACME genome project
site=http://server.with.my.files/directory/with/my/files
for file in frog1 frog2 frog3 toad1 toad2 toad3; do
    printf "Fetching %s...\n" "$site/$file.fasta.gz"

    # Use filename from remote site and try to resume interrupted
    # transfers if a partial download already exists
    curl --continue-at - --remote-name "$site/$file.fasta.gz"
done

#!/bin/csh -ef

# Download genome data from the ACME genome project
set site=http://server.with.my.files/directory/with/my/files
foreach file (frog1 frog2 frog3 toad1 toad2 toad3)
    printf "Fetching %s...\n" "$site/$file.fasta.gz"

    # Use filename from remote site and try to resume interrupted
    # transfers if a partial download already exists
    curl --continue-at - --remote-name "$site/$file.fasta.gz"
end
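Before starting a long transfer, it can help to preview the URLs the loop will request. One approach, sketched below for sh, is a dry run that prints each URL and the local file name --remote-name would use, without downloading anything. The site URL is the same placeholder as above, and the DRY_RUN variable is a convention invented for this sketch, not a curl feature.

```shell
#!/bin/sh -e

# Placeholder site from the example above
site=http://server.with.my.files/directory/with/my/files

# Hypothetical switch: run with DRY_RUN=0 in the environment to download
: "${DRY_RUN:=1}"

for file in frog1 frog2 frog3 toad1 toad2 toad3; do
    url="$site/$file.fasta.gz"
    if [ "$DRY_RUN" = 1 ]; then
        # ${url##*/} strips everything up to the last "/", which is the
        # name --remote-name gives the local copy
        printf "Would fetch %s -> %s\n" "$url" "${url##*/}"
    else
        curl --continue-at - --remote-name "$url"
    fi
done
```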

2.15.2. While Loops

A for or foreach loop is only convenient for iterating through a fixed set of values. Sometimes we may need to terminate a loop based on inputs that are unknown when the loop begins, or values computed over the course of the loop.

The while loop is a more general loop that iterates as long as some condition is true. It uses the same types of expressions as an if statement.

The while loop is often used to iterate through long integer sequences:

#!/usr/bin/env bash

c=1
while [ $c -le 100 ]
do
    printf "%d\n" $c
    c=$(($c + 1))        # $(( )) evaluates an integer expression
done

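Note again that the [ above is a command in its own right, not part of the while syntax, as discussed in Section 2.14.1, “Command Exit Status”. It is available as an external program, though most modern shells, including bash, also provide it as a built-in.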

#!/bin/csh -ef

set c = 1
while ( $c <= 100 )
    printf "%d\n" $c
    @ c = $c + 1        # @ is like set, but indicates an integer expression
end

Practice Break

Type in and run the script above.
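A while loop is the natural choice when the number of iterations depends on values computed during the loop itself. This sketch in POSIX sh keeps doubling a value until it exceeds a limit; the limit of 1000 is arbitrary.

```shell
#!/bin/sh

value=1
steps=0
# The iteration count is not known in advance; the loop stops
# as soon as the computed value passes the limit
while [ "$value" -le 1000 ]; do
    value=$((value * 2))
    steps=$((steps + 1))
done
printf "%d doublings give %d\n" "$steps" "$value"
```

This prints "10 doublings give 1024".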

While loops can also be used to iterate until an input condition is met:

#!/bin/sh

continue=''
while [ 0"$continue" != 0'y' ] && [ 0"$continue" != 0'n' ]; do
    printf "Would you like to continue? (y/n) "
    read continue
done

#!/bin/csh -ef

set continue=''
while ( ("$continue" != 'y') && ("$continue" != 'n') )
    printf "Continue? (y/n) "
    set continue="$<"
end

Practice Break

Type in and run the script above.
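Another common input-driven loop reads a file one line at a time and stops when read fails at end of file. A sketch assuming a POSIX sh, with a temporary file created by mktemp as sample input; IFS= and -r keep leading white space and backslashes in each line intact.

```shell
#!/bin/sh

# Create a small sample file in a scratch location
file=$(mktemp)
printf "flounder\ngobie\nhammerhead\n" > "$file"

lines=0
# read returns a nonzero exit status at end of file, ending the loop
while IFS= read -r line; do
    printf "Got: %s\n" "$line"
    lines=$((lines + 1))
done < "$file"

rm -f "$file"
printf "Read %d lines\n" "$lines"
```

Redirecting the file into the loop, rather than piping into it, keeps the loop in the current shell in common shells such as bash and dash, so lines still holds its value afterward.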

We may even want a loop to iterate forever. This is often useful when using a computer to collect data at regular intervals:

#!/bin/sh

# 'true' is a command that always returns an exit status of 0
while true; do
    sample-data     # Read instrument
    sleep 10        # Pause for 10 seconds without using any CPU time
done

#!/bin/csh -ef

while ( 1 )
    sample-data     # Read instrument
    sleep 10        # Pause for 10 seconds without using any CPU time
end
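An infinite loop like this is normally stopped with Ctrl+C, but a loop can also end itself with break when some condition is met. In the sketch below, date stands in for the hypothetical sample-data command, and the sample count of 3 and the 1-second pause are arbitrary values for illustration.

```shell
#!/bin/sh

samples=0
while true; do
    date            # Stand-in for the hypothetical sample-data command
    samples=$((samples + 1))
    # Leave the loop once enough samples have been collected
    if [ "$samples" -ge 3 ]; then
        break
    fi
    sleep 1
done
printf "Collected %d samples\n" "$samples"
```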

2.15.3. Self-test

  1. Write a shell script that prints the square of every number from 1 to 100.
  2. Write a shell script that sorts each file with a name ending in ".txt" one at a time, removes duplicate entries, and saves the result to filename.txt.sorted. The script then merges all the sorted text into a single file called combined.txt.sorted. The sort command can also merge presorted files when used with the -m flag.

    The standard Unix sort command can be used to sort an individual file. The uniq command removes duplicate lines only when they are adjacent to each other, so its input should already be sorted.

  3. Do the examples for shell loops above give you any ideas about using multiple computers to speed up processing?