1.15. File Transfer

Many users need to transfer data between their own computer and a remote Unix system. For example, users of a shared research computer running Unix will need to transfer input data from their computer to the Unix machine, run the research programs, and finally transfer the results back to their computer. Many software tools are available to accomplish this. Some of the more convenient ones are described below.

1.15.1. File Transfers from Unix

For Unix (including Mac and Cygwin) users, the recommended method for transferring files is the rsync command. The rsync command is a simple but intelligent tool that makes it easy to synchronize two directories on the same machine or on different machines across a network. Rsync is free software and part of the base installation of many Unix systems including Mac OS X. On Cygwin, you can easily add the rsync package using the Cygwin Setup utility.

Rsync has two major advantages over other file transfer programs:

  • If you have transferred the directory before, and only want to update it, rsync will automatically determine the differences between the two copies and only transfer what is necessary. When conducting research that generates large amounts of data, this can save an enormous amount of time.
  • If a transfer fails for any reason (which is more likely for large transfers), rsync's inherent ability to determine the differences between two copies allows it to pick up where it left off. Simply run the exact same rsync command again: files that already arrived intact are skipped, and only the remaining ones are transferred.

The rsync command can either push (send) files from the local machine to a remote machine, or pull (retrieve) files from a remote machine to the local machine. The command syntax is basically the same in both cases. It's just a matter of how you specify the source and destination for the transfer.

The rsync command has many options, but the most typical usage is to create an exact copy of a directory on a remote system. The general rsync command to push files to another host would be:

shell-prompt: rsync -av --delete source-path [username@]hostname:[destination-path]
	    

Example 1.2. Pushing data with rsync

The following command synchronizes the directory Project from the local machine to ~joeuser/Data/Project on Peregrine:

shell-prompt: rsync -av --delete Project joeuser@data.peregrine.hpc.uwm.edu:Data
		

The general syntax for pulling files from another host is:

shell-prompt: rsync -av --delete [username@]hostname:[source-path] destination-path
	    

Example 1.3. Pulling data with rsync

The following command synchronizes the directory ~joeuser/Data/Project on Peregrine to ./Project on the local machine:

shell-prompt: rsync -av --delete joeuser@data.peregrine.hpc.uwm.edu:Data/Project .
		

If you omit "username@" from the source or destination, rsync will try to log in to the remote system using your username on the local system.

If you omit destination-path from a push command or source-path from a pull command, rsync will use your home directory on the remote host.

The command-line flags used above have the following meanings:

-a
Use archive mode. Archive mode copies all subdirectories recursively and preserves as many file attributes as possible, such as ownership, permissions, etc.
-v
Verbose copy: Display names of files and directories as they are copied.
--delete
Delete files and directories from the destination if they do not exist in the source. Without --delete, rsync will add and replace files in the destination, but never remove anything.

Caution

Note that a trailing / on source-path affects where rsync stores the files on the destination system. Without a trailing /, rsync will create a directory called source-path under destination-path on the destination host.

With a trailing / on source-path, rsync copies the contents of source-path into destination-path, so destination-path is the directory that takes the place of source-path on the destination host. This somewhat cryptic feature lets you change the name of the directory during the transfer. It matches the behavior of cp -R on BSD-derived systems, although not every cp implementation treats a trailing / this way.

Note also that the trailing / only affects the command when applied to source-path. A trailing / on destination-path has no effect.

The command below creates an identical copy of the local directory Model in Data/Model (/home/bacon/Data/Model to be precise) on data.peregrine.hpc.uwm.edu. The resulting directory is the same regardless of whether the destination directory existed before the command was run.

shell-prompt: rsync -av --delete Model bacon@data.peregrine.hpc.uwm.edu:Data
	    

The command below dumps the contents of Model directly into Data, and deletes everything else in the Data directory! In other words, it makes the destination directory Data identical to the source directory Model.

shell-prompt: rsync -av --delete Model/ bacon@data.peregrine.hpc.uwm.edu:Data
	    

To achieve the same effect as the command with no /, you would need to fully specify the destination path:

shell-prompt: rsync -av --delete Model/ bacon@data.peregrine.hpc.uwm.edu:Data/Model
	    

Note that if using globbing on the remote system, any globbing patterns must be protected from expansion by the local shell by escaping them or enclosing them in quotes. We want the pattern expanded on the remote system, not the local system:

shell-prompt: rsync -av --delete bacon@unixdev1.hpc.uwm.edu:Data/Study\* .
shell-prompt: rsync -av --delete 'bacon@unixdev1.hpc.uwm.edu:Data/Study*' .
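
To see what the local shell does with each form, the sketch below passes a pattern to echo instead of rsync. It deliberately uses a plain local pattern with hypothetical file names so that the expansion is visible. With the hostname prefix used above, the unquoted pattern would rarely match a local file, but shells differ in how they handle an unmatched pattern (some pass it through unchanged, others report an error), so quoting or escaping is the safer habit.

```shell
#!/bin/sh
# Sketch: see what the local shell hands to a command for each form.
# The directory and file names are hypothetical scratch names.
work=$(mktemp -d)
mkdir "$work/Data"
touch "$work/Data/Study1" "$work/Data/Study2"
cd "$work" || exit 1

# Unquoted: the local shell expands the pattern before the command runs.
unquoted=$(echo Data/Study*)

# Escaped or quoted: the pattern is passed through unchanged, which is
# what rsync needs so that the remote side can expand it.
escaped=$(echo Data/Study\*)
quoted=$(echo 'Data/Study*')

echo "unquoted: $unquoted"
echo "escaped:  $escaped"
echo "quoted:   $quoted"
```

Only the unquoted form is replaced by the matching local file names; the escaped and quoted forms reach the command as the literal pattern.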
	    

For full details on the rsync command, type

shell-prompt: man rsync
	    

1.15.2. File Transfer from Windows without Cygwin

If you're using Cygwin from Windows, you can utilize the rsync command as discussed in Section 1.15.1, “File Transfers from Unix”, provided you've installed the Cygwin rsync package. Otherwise, WinSCP provides a simple way to transfer files to and from your Windows PC. WinSCP is a free program that can be downloaded and installed in a few minutes from http://winscp.net.

After installing WinSCP, simply launch the program, and the following dialog appears:

[Figure: The WinSCP login dialog]

WinSCP uses the secure shell protocol to connect to a remote system. As with the Unix ssh command, the first time you connect to a host from this computer, you will be asked whether you want to add the host key and continue:

[Figure: A WinSCP session]

Once you've successfully logged in, you can simply drag files or directories from one system to the other. If you're updating a large directory that already exists on the destination machine, you may want to check the New and updated file(s) only box. This will cause WinSCP to transfer only the files that are different on each end. This feature is a crude approximation to the functionality of rsync.

[Figure: A WinSCP copy operation]

1.15.3. Self-test

  1. Show the simplest Unix command to accomplish each of the following:
    1. Copy or synchronize the directory ./PCB-Study to ~/PCB-Study under the user "joeuser" on the host data.peregrine.hpc.uwm.edu.
    2. Copy or synchronize the directory ~/PCB-Study under the user "joeuser" on the host data.peregrine.hpc.uwm.edu to ./PCB-Study.