1.15. File Transfer

Many users need to transfer data between their own computer and a remote Unix system. For example, users of a shared research computer running Unix need to transfer input data from their computer to the Unix machine, run the research programs, and finally transfer the results back to their computer. Many software tools are available to accomplish this. Some of the more convenient ones are described below.

1.15.1. File Transfers from Unix

The sftp (Secure File Transfer Protocol) command connects to another machine on the network over SSH and opens an interactive session for transferring files to or from it. Not all remote Unix systems have sftp enabled.

shell-prompt: sftp [name@]host
            
shell-prompt: sftp joe@unixdev1.ceas.uwm.edu
            

For Unix (including Mac and Cygwin) users, the recommended method for transferring files is the rsync command. The rsync command is a simple but intelligent tool that makes it easy to synchronize two directories on the same machine or on different machines across a network. Rsync is free software and part of the base installation of many Unix systems including Mac OS X. On Cygwin, you can easily add the rsync package using the Cygwin Setup utility.

Rsync has two major advantages over other file transfer programs:

  • If you have transferred the directory before, and only want to update it, rsync will automatically determine the differences between the two copies and only transfer what is necessary. When conducting research that generates large amounts of data, this can save an enormous amount of time.
  • If a transfer fails for any reason (which is more likely for large transfers), rsync's inherent ability to determine the differences between two copies allows it to resume from where it left off. Simply run the exact same rsync command again, and the transfer will resume.
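
The first advantage can be observed without a remote host, since rsync also synchronizes two directories on the same machine. The following is a minimal sketch under that assumption: the temporary directory and the names Project, Backup, a.txt, and b.txt are placeholders chosen for illustration, and the -a and -v flags are the same ones explained later in this section.

```shell
# Local stand-in for a network transfer: both copies live under a temp directory.
work=$(mktemp -d)
mkdir "$work/Project"
echo "run 1" > "$work/Project/a.txt"
echo "run 1" > "$work/Project/b.txt"

# First sync: everything is copied to Backup/Project.
rsync -a "$work/Project" "$work/Backup"

# Change only one file. Appending changes its size, so rsync notices it.
echo "run 2" >> "$work/Project/b.txt"

# Second sync: -v lists the files rsync actually transfers this time.
updated=$(rsync -av "$work/Project" "$work/Backup")
echo "$updated"
```

On the second run, the verbose listing should mention b.txt but not a.txt, because a.txt is already identical in both copies. Running the same command a third time should transfer nothing at all.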

The rsync command can either push (send) files from the local machine to a remote machine, or pull (retrieve) files from a remote machine to the local machine. The command syntax is basically the same in both cases. It's just a matter of how you specify the source and destination for the transfer.

The rsync command has many options, but the most typical usage is to create an exact copy of a directory on a remote system. The general rsync command to push files to another host would be:

shell-prompt: rsync -av --delete source-path [username@]hostname:[destination-path]
            

Example 1.4. Pushing data with rsync

The following command synchronizes the directory Project from the local machine to ~joeuser/Data/Project on unixdev1.ceas.uwm.edu:

shell-prompt: rsync -av --delete Project joeuser@unixdev1.ceas.uwm.edu:Data
                

The general syntax for pulling files from another host is:

shell-prompt: rsync -av --delete [username@]hostname:[source-path] destination-path
            

Example 1.5. Pulling data with rsync

The following command synchronizes the directory ~joeuser/Data/Project on unixdev1.ceas.uwm.edu to ./Project on the local machine:

shell-prompt: rsync -av --delete joeuser@unixdev1.ceas.uwm.edu:Data/Project .
                

If you omit "username@" from the source or destination, rsync will try to log into the remote system with your username on the local system.

If you omit destination-path from a push command or source-path from a pull command, rsync will use your home directory on the remote host.

The command-line flags used above have the following meanings:

-a
Use archive mode. Archive mode copies all subdirectories recursively and preserves as many file attributes as possible, such as ownership, permissions, and modification times.
-v
Verbose copy: Display names of files and directories as they are copied.
--delete
Delete files and directories from the destination if they do not exist in the source. Without --delete, rsync will add and replace files in the destination, but never remove anything.
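
Because --delete removes data, it can be worth previewing its effect first. The sketch below uses local directories as a stand-in for a remote host. The names Project, Backup, keep.txt, and old.txt are placeholders, and the -n (--dry-run) flag is an addition not covered above: it asks rsync to report what it would do without changing anything.

```shell
work=$(mktemp -d)
mkdir "$work/Project"
echo data > "$work/Project/keep.txt"
echo data > "$work/Project/old.txt"
rsync -a "$work/Project" "$work/Backup"

# Remove a file from the source after the first sync.
rm "$work/Project/old.txt"

# Without --delete, the stale copy of old.txt stays in the destination.
rsync -a "$work/Project" "$work/Backup"

# Dry run: report what --delete would remove, but change nothing.
preview=$(rsync -avn --delete "$work/Project" "$work/Backup")
echo "$preview"
before=$(ls "$work/Backup/Project")

# Real run: the destination now matches the source exactly.
rsync -a --delete "$work/Project" "$work/Backup"
after=$(ls "$work/Backup/Project")

echo "before: $before"
echo "after:  $after"
```

If the preview lists anything you did not expect to lose, fix the source or destination path before running the command without -n.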

Caution

Note that a trailing / on source-path affects where rsync stores the files on the destination system. Without a trailing /, rsync will create a directory called source-path under destination-path on the destination host.

With a trailing / on source-path, rsync copies the contents of source-path directly into destination-path, so destination-path takes the place of source-path on the destination host. This feature is a somewhat cryptic way of letting you change the name of the directory during the transfer. However, it matches how the cp command treats a trailing / on a source directory on some Unix systems (BSD-derived ones, for example).

Note also that the trailing / only affects the command when applied to source-path. A trailing / on destination-path has no effect.

The command below creates an identical copy of the local directory Model in Data/Model (~joeuser/Data/Model to be precise) on unixdev1.ceas.uwm.edu. The resulting directory is the same regardless of whether the destination directory existed before the command was run.

shell-prompt: rsync -av --delete Model joeuser@unixdev1.ceas.uwm.edu:Data
            

The command below dumps the contents of Model directly into Data, and deletes everything else in the Data directory! In other words, it makes the destination directory Data identical to the source directory Model.

shell-prompt: rsync -av --delete Model/ joeuser@unixdev1.ceas.uwm.edu:Data
            

To achieve the same effect as the command with no /, you would need to fully specify the destination path:

shell-prompt: rsync -av --delete Model/ joeuser@unixdev1.ceas.uwm.edu:Data/Model
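
The effect of the trailing / can be tried safely on local directories before running it against a remote host. This is a minimal sketch: dest1, dest2, and params.txt are placeholder names, and --delete is deliberately left out so that nothing can be removed while experimenting.

```shell
work=$(mktemp -d)
mkdir "$work/Model" "$work/dest1" "$work/dest2"
echo params > "$work/Model/params.txt"

# No trailing slash: rsync creates dest1/Model and copies into it.
rsync -a "$work/Model" "$work/dest1"

# Trailing slash: the contents of Model land directly in dest2.
rsync -a "$work/Model/" "$work/dest2"

# Show where the file ended up in each case.
find "$work/dest1" "$work/dest2" -type f
```

The first destination should contain Model/params.txt, while the second should contain params.txt at its top level.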
            

If you use globbing patterns to select files on the remote system, protect them from expansion by the local shell by escaping them or enclosing them in quotes. The pattern should be expanded on the remote system, not the local system:

shell-prompt: rsync -av --delete joeuser@unixdev1.ceas.uwm.edu:Data/Study\* .
shell-prompt: rsync -av --delete 'joeuser@unixdev1.ceas.uwm.edu:Data/Study*' .
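
To see what quoting changes, the sketch below counts the arguments the local shell hands to a command. It does not involve rsync at all: count_args is a hypothetical helper defined only for this illustration, and Data/Study1 and Data/Study2 are placeholder files created locally so that the pattern has something to match.

```shell
cd "$(mktemp -d)"
mkdir Data
touch Data/Study1 Data/Study2

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args Data/Study*)    # local shell expands the pattern
quoted=$(count_args 'Data/Study*')    # pattern is passed through literally
escaped=$(count_args Data/Study\*)    # escaping has the same effect as quoting

echo "unquoted=$unquoted quoted=$quoted escaped=$escaped"
```

The unquoted pattern becomes two arguments before the command ever runs, while the quoted and escaped forms arrive as a single literal pattern, which is what rsync needs in order to pass it on to the remote system. When an unquoted pattern matches nothing locally, shells differ: some pass it through unchanged and others report an error, so quoting is the dependable choice.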
            

For full details on the rsync command, type

shell-prompt: man rsync