We often need to run the same command or commands on a group of files or other data.
Unix shells offer a type of loop that takes an enumerated list of string values, rather than counting through a sequence of numbers. This makes it more flexible for working with sets of files or arbitrary sets of values.
This type of loop is well suited for use with globbing (file name patterns using wild cards, as discussed in Section 1.7.5, “Globbing (File Specifications)”):
    #!/usr/bin/env bash

    # Process input-1.txt, input-2.txt, etc.
    for file in input-*.txt
    do
        ./myprog $file
    done

    #!/bin/csh -ef

    # Process input-1.txt, input-2.txt, etc.
    foreach file (input-*.txt)
        ./myprog $file
    end
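One detail worth guarding against in Bourne-family shells: if no file matches the pattern, sh and bash (with default settings) pass the unexpanded pattern itself to the loop. The sketch below shows one way to skip that case and to quote the variable so that file names containing spaces survive intact. The function name list_inputs is hypothetical, and printf stands in for the real processing command.

```shell
#!/bin/sh

# Hypothetical helper: report each input file found in directory $1
list_inputs() {
    for file in "$1"/input-*.txt; do
        # With no matches, the pattern is left unexpanded and names
        # a file that does not exist, so skip it
        [ -e "$file" ] || continue
        # Quoting "$file" keeps names with spaces as a single argument
        printf 'Would process %s\n' "$file"
    done
}

list_inputs .
```

In a real script, the printf line would be replaced by the command to run on each file, such as ./myprog "$file".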
These loops are not limited to using file names, however. We can use them to iterate through any list of string values:
    #!/bin/sh

    for fish in flounder gobie hammerhead manta moray sculpin
    do
        printf "%s\n" $fish
    done

    #!/usr/bin/env bash

    for c in 1 2 3 4 5 6 7 8 9 10
    do
        printf "%d\n" $c
    done
To iterate through a list of integers too long to type out, we can utilize the seq command, which takes a starting value, optionally an increment value, and an ending value. We use shell output capture (Section 2.11.4, “Output Capture”) to represent the output of the seq command as a string in the script:
    #!/bin/sh -e

    # Count from 0 to 1000 in increments of 5
    for c in $(seq 0 5 1000); do
        printf "%d\n" $c
    done

    #!/bin/csh

    foreach c (`seq 0 5 1000`)
        printf "%s\n" $c
    end
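The two argument forms described above can be compared in a short sketch. It assumes seq is installed, which is the case on most modern systems even though seq is not part of the POSIX standard. The variable names are chosen only for illustration.

```shell
#!/bin/sh

# Two arguments: start and end, with an implied increment of 1
short=$(seq 1 5)

# Three arguments: start, increment, end
stepped=$(seq 0 5 20)

# $stepped is deliberately unquoted so the shell splits it into words
for c in $stepped; do
    printf '%d\n' "$c"
done
```

The captured output is a single string of whitespace-separated numbers, which the for loop splits into separate values.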
The seq command can also be used to embed integer values in a non-integer list:
    #!/bin/sh -e

    # Process all human chromosomes
    for chromosome in $(seq 1 22) X Y; do
        printf "chr%s\n" $chromosome
    done
Type in and run the fish example above.
Example 2.3. Multiple File Downloads
Often we need to download many large files from another site. This process would be tedious to do manually: Start a download, wait for it to finish, start another... There may be special tools provided, but often they are unreliable or difficult to install. In many cases, we may be able to automate the download using a simple script and a file transfer tool such as curl, fetch, rsync or wget.
The model scripts below demonstrate how to download a set of files using curl. The local file names will be the same as those on the remote site and if the transfer is interrupted for any reason, we can simply run the script again to resume the download where it left off.
Depending on the tools available on your local machine and the remote server, you may need to substitute another file transfer program for curl.
    #!/bin/sh -e

    # Download genome data from the ACME genome project
    site=http://server.with.my.files/directory/with/my/files
    for file in frog1 frog2 frog3 toad1 toad2 toad3; do
        printf "Fetching $site/$file.fasta.gz...\n"

        # Use filename from remote site and try to resume interrupted
        # transfers if a partial download already exists
        curl --continue-at - --remote-name $site/$file.fasta.gz
    done

    #!/bin/csh -ef

    # Download genome data from the ACME genome project
    set site=http://server.with.my.files/directory/with/my/files
    foreach file (frog1 frog2 frog3 toad1 toad2 toad3)
        printf "Fetching $site/$file.fasta.gz...\n"

        # Use filename from remote site and try to resume interrupted
        # transfers if a partial download already exists
        curl --continue-at - --remote-name $site/$file.fasta.gz
    end
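Before starting a long series of transfers, it can be helpful to check that the loop builds the URLs we expect. The sketch below is a dry run that prints each curl command instead of executing it. The function name dry_run is hypothetical, and the site is the same placeholder address used in the scripts above.

```shell
#!/bin/sh

# Placeholder address, as in the download scripts above
site=http://server.with.my.files/directory/with/my/files

# Hypothetical helper: print the command for each file name given
dry_run() {
    for file in "$@"; do
        # Print the command rather than running it
        printf 'curl --continue-at - --remote-name %s/%s.fasta.gz\n' \
            "$site" "$file"
    done
}

dry_run frog1 frog2 frog3 toad1 toad2 toad3
```

Once the printed commands look right, the printf can be replaced by the actual curl command.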
A for or foreach loop is only convenient for iterating through a fixed set of values. Sometimes we may need to terminate a loop based on inputs that are unknown when the loop begins, or values computed over the course of the loop.
The while loop is a more general loop that iterates as long as some condition is true. It uses the same types of expressions as an if statement.
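As a minimal sketch of a loop whose number of iterations is not known in advance, the script below doubles a value until it exceeds a limit. The variable names and the limit of 1000 are arbitrary choices for illustration.

```shell
#!/bin/sh

limit=1000
n=1
steps=0

# The condition depends on a value computed inside the loop
while [ "$n" -le "$limit" ]; do
    n=$((n * 2))
    steps=$((steps + 1))
done

printf '%d doublings, final value %d\n' "$steps" "$n"
```

A for loop would be awkward here, since we would have to work out the list of values before the loop starts.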
The while loop is often used to iterate through long integer sequences:
    #!/usr/bin/env bash

    c=1
    while [ $c -le 100 ]
    do
        printf "%d\n" $c
        c=$(($c + 1))   # (( )) encloses an integer expression
    done
Note again that the [ above is an external command, as discussed in Section 2.14.1, “Command Exit Status”.
    #!/bin/csh -ef

    set c = 1
    while ( $c <= 100 )
        printf "%d\n" $c
        @ c = $c + 1    # @ is like set, but indicates an integer expression
    end
Type in and run the script above.
While loops can also be used to iterate until an input condition is met:
    #!/bin/sh

    continue=''
    while [ 0"$continue" != 0'y' ] && [ 0"$continue" != 0'n' ]; do
        printf "Would you like to continue? (y/n) "
        read continue
    done

    #!/bin/csh -ef

    set continue=''
    while ( ("$continue" != 'y') && ("$continue" != 'n') )
        printf "Continue? (y/n) "
        set continue="$<"
    end
Type in and run the script above.
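An input loop like the Bourne shell version above keeps prompting if the input ends before a valid answer arrives, for example when input is redirected from a file. The sketch below is one possible variation that stops when read reports end of input. The function name ask_yes_no is hypothetical, and the prompt is sent to the standard error so that only the answer appears on the standard output.

```shell
#!/bin/sh

# Hypothetical helper: prompt until the user enters y or n
ask_yes_no() {
    answer=''
    while [ "$answer" != y ] && [ "$answer" != n ]; do
        printf 'Would you like to continue? (y/n) ' >&2
        # read fails at end of input; give up instead of looping forever
        read -r answer || return 1
    done
    printf '%s\n' "$answer"
}

# Simulate a user who types an invalid answer, then a valid one
printf 'maybe\ny\n' | ask_yes_no
```

Here the input is piped in so the example runs unattended. Calling ask_yes_no with no pipe reads from the keyboard instead.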
We may even want a loop to iterate forever. This is often useful when using a computer to collect data at regular intervals:
    #!/bin/sh

    # 'true' is an external command that always returns an exit status of 0
    while true; do
        sample-data     # Read instrument
        sleep 10        # Pause for 10 seconds without using any CPU time
    done

    #!/bin/csh -ef

    while ( 1 )
        sample-data     # Read instrument
        sleep 10        # Pause for 10 seconds without using any CPU time
    end
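While testing a sampling loop, we may prefer that it stop on its own after a few iterations. The sketch below keeps the "while true" form but leaves the loop with the shell's break statement once a given number of samples has been taken. The function name collect_samples is hypothetical, and the date command is only a stand-in for sample-data.

```shell
#!/bin/sh

# Usage: collect_samples count interval
# Hypothetical helper; 'date' stands in for the sample-data command
collect_samples() {
    taken=0
    while true; do
        date                    # Stand-in for reading an instrument
        taken=$((taken + 1))
        # Leave the otherwise endless loop after 'count' samples
        if [ "$taken" -ge "$1" ]; then
            break
        fi
        sleep "$2"              # Pause between samples
    done
}

# Take 3 samples, 1 second apart
collect_samples 3 1
```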
Write a shell script that sorts all files with names ending in ".txt" one at a time, removes duplicate entries, and saves the output to filename.txt.sorted. The script then merges all the sorted text into a single file called combined.txt.sorted. The sort command can also merge presorted files when used with the -m flag.
The standard Unix sort command can be used to sort an individual file. The uniq command removes duplicate lines only when they are adjacent to each other, so the data should already be sorted before it is passed to uniq.
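The behavior of these commands can be seen on a few lines of sample data, without giving away the loop that the exercise asks for. This is a small sketch, assuming the mktemp command is available for creating a scratch directory. The file and variable names are made up for illustration.

```shell
#!/bin/sh

# uniq alone leaves both b's, because they are not adjacent
not_deduped=$(printf 'b\na\nb\n' | uniq)

# Sorting first makes the duplicates adjacent, so uniq can remove one
deduped=$(printf 'b\na\nb\n' | sort | uniq)

# Merge two files that are already sorted (scratch files for this demo)
tmpdir=$(mktemp -d)
printf 'a\nc\n' > "$tmpdir/one.sorted"
printf 'b\nd\n' > "$tmpdir/two.sorted"
merged=$(sort -m "$tmpdir/one.sorted" "$tmpdir/two.sorted")
rm -r "$tmpdir"

printf '%s\n' "$merged"
```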