A Unix file is simply a sequence of bytes (8-bit values) stored on a disk and given a unique name. The bytes in a file may be printable characters (letters, digits, punctuation symbols), invisible control characters (which cause a printer or terminal to perform actions such as backspacing or scrolling), parts of a number (a typical integer or floating point number occupies 4 or 8 bytes), or other non-character, non-numeric data.
This is how Unix sees all files. It takes no interest whatsoever in the meaning of the bytes within a file. The meaning of the content is determined solely by the programs using the file.
Files are often classified as either text or binary files. All of the bytes in a text file are interpreted as ASCII/ISO characters by the programs that read or write the file, while binary files may contain both character and non-character data.
Again, Unix does not make a distinction between text and binary files. This is left to the programs that use the files.
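As a minimal sketch of the idea that a file is nothing more than a sequence of bytes, the commands below write three bytes to a scratch file and then dump them. The scratch directory and the file name demo.txt are made up for illustration, and mktemp is assumed to be available (it is on most modern Unix systems).

```shell
# Work in a scratch directory so that no real files are touched.
tmpdir=$(mktemp -d)

# Write the characters 'H' and 'i' followed by a newline: three bytes in all.
printf 'Hi\n' > "$tmpdir/demo.txt"

# wc -c counts bytes; tr strips the padding that some versions of wc add.
byte_count=$(wc -c < "$tmpdir/demo.txt" | tr -d ' ')
echo "demo.txt contains $byte_count bytes"

# od -c shows every byte as a character or an escape sequence such as \n.
od -c "$tmpdir/demo.txt"

rm -r "$tmpdir"
```

Neither wc nor od knows or cares whether the file is "text"; each simply processes the bytes it is given.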
Example 3.8. Practice Break
Try the following commands:
shell-prompt: cat /etc/hosts
What do you see? The /etc/hosts file is a text file, and cat is used here to echo (concatenate) it to the terminal output.
Now try the following:
shell-prompt: cat /bin/ls
What do you see? The file /bin/ls is not a text file. It contains binary program code, not characters.
The cat command does not examine the contents of the file. It simply sends each byte to your terminal. The terminal tries to interpret each byte as an ASCII/ISO character and display it on the screen. Since the file does not contain a sequence of characters, it appears as nonsense on your terminal. Some of the bytes sent to the terminal may even knock it out of whack, causing it to behave strangely.
If this happens, run the reset command to restore your terminal to its default state.
While it is the program that interprets the contents of a file, there are some conventions regarding text file format that all Unix programs follow, so that they can all manipulate the same files. Unfortunately, Windows programs follow different conventions. Unix programs assume that text files terminate each line with a control character known as a line feed (also known as a newline or NL for short), which is character code 10 in the standard ASCII/ISO character sets. Windows programs use both a carriage return or CR (character code 13) and an NL.
Text files created on Windows will contain both a CR and NL at the end of each line. Text files created on Unix will have only an NL. This can cause problems for programs on either Unix or Windows. Hence, it is not a good idea to use a Windows editor to write code for Unix systems or vice-versa.
The dos2unix and unix2dos commands can be used to clean up files that have been transferred between Unix and Windows. These programs convert text files between the Windows and Unix standards. If you've edited a text file on a non-Unix system, and are now using it on a Unix system, you can clean it up by running:
shell-prompt: dos2unix filename
The dos2unix and unix2dos commands are not standard with most Unix systems, but they are free programs that can easily be added via most package managers.
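The difference between the two conventions can be seen directly with od. The sketch below fabricates a Windows-style file with printf and strips the carriage returns with tr, which is only a stand-in for systems where dos2unix is not installed. It is not how dos2unix itself works, and it removes every CR in the file, not just those at the ends of lines. The file names are hypothetical.

```shell
tmpdir=$(mktemp -d)

# Simulate a file saved on Windows: each line ends with CR (\r) then NL (\n).
printf 'one\r\ntwo\r\n' > "$tmpdir/windows.txt"

# Delete every CR, leaving Unix-style line endings.
tr -d '\r' < "$tmpdir/windows.txt" > "$tmpdir/unix.txt"

# The Unix version is one byte per line shorter.
windows_bytes=$(wc -c < "$tmpdir/windows.txt" | tr -d ' ')
unix_bytes=$(wc -c < "$tmpdir/unix.txt" | tr -d ' ')
echo "Windows version: $windows_bytes bytes, Unix version: $unix_bytes bytes"

# od -c displays the line terminators as \r and \n.
od -c "$tmpdir/windows.txt"
od -c "$tmpdir/unix.txt"

rm -r "$tmpdir"
```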
A Unix file system contains files and directories. A file is like a document, and a directory is like a folder that contains documents and/or other directories. The terms "directory" and "folder" are interchangeable, but "directory" is the standard term used in Unix.
Directories are so called because they serve the same purpose as the directory you might find in the lobby of an office building: They are listings that keep track of what files and other directories are called and where they are located on the disk.
Unix file systems use case-sensitive file and directory names. That is, Temp is not the same as temp, and both can coexist in the same directory.
macOS is the only mainstream Unix system that violates this convention. The standard macOS file system is case-preserving, but not case-sensitive. This means that if you call a file Temp, it will remember that the T is capital, but the file can also be referred to as temp, tEmp, etc. Only one of these names can exist in a given directory at any one time.
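One way to check how a particular file system treats case is to try creating two names that differ only in case. This is a rough sketch using made-up file contents in a scratch directory. The result depends on the file system holding the scratch directory, so no single output is guaranteed.

```shell
tmpdir=$(mktemp -d)

# Attempt to create two files whose names differ only in case.
echo "capital"   > "$tmpdir/Temp"
echo "lowercase" > "$tmpdir/temp"

# Count the names that actually exist in the directory.
name_count=$(ls "$tmpdir" | wc -l | tr -d ' ')
echo "Names in directory: $name_count"

# On a case-sensitive file system, Temp still holds "capital".
# On a case-insensitive one, the second echo overwrote the same file.
cat "$tmpdir/Temp"

rm -r "$tmpdir"
```

A count of 2 indicates a case-sensitive file system, while a count of 1 indicates a case-insensitive one such as the macOS default.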
A Unix file system can be visualized as a tree, with each file and directory contained within another directory. Figure 3.2, “Sample of a Unix File system” shows a small portion of a typical Unix file system. On a real Unix system, there are usually thousands of files and directories. Directories are shown in green and files are in yellow.
Unix uses a forward slash (/) to separate directory and file names while Windows uses a backslash (\).
The one directory that is not contained within any other is known as the root directory, whose name under Unix is /. There is exactly one root directory on every Unix system. Windows systems, on the other hand, have a root directory for each disk partition such as C:\ and D:\.
The Cygwin compatibility layer works around the separate drive letters of Windows by unifying them under a common parent directory called /cygdrive. Hence, for Unix commands run under Cygwin, /cygdrive/c is equivalent to c:\, /cygdrive/d is equivalent to d:\, and so on. This allows Cygwin users to do things like search multiple Windows drive letters with a single command starting in /cygdrive.
Unix file system trees are fairly standardized, but most have some variation. For instance, all Unix systems have a /bin and a /usr/bin, which contain standard Unix commands. Not all of them have /home or /usr/local. Many Linux systems install commands from add-on packages into /usr/bin, mixing them with the standard Unix commands that are essential to the basic functioning of the system. Other systems such as most BSDs keep them separated in /usr/local/bin or /usr/pkg/bin.
The root directory is the parent of /bin and /home and an ancestor of all other files and directories.
The /bin and /home directories are subdirectories, or children of /. Likewise, /home/joe and /home/sue are subdirectories of /home, and grandchildren of /.
All of the files and directories in and under /home make up the subtree rooted at /home.
The children of a directory, all of their children, and so on, are known as descendants of the directory. All files and directories on a Unix system, except /, are descendants of /.
Each user has a home directory, which can be arbitrarily assigned, but is generally a child of /home on many Unix systems or of /Users on macOS. Most or all of a user's files and subdirectories are found under their home directory. In the example above, /home/joe is the home directory for user joe, and /home/sue is the home directory for user sue.
In some situations, a home directory can be referred to as ~ or ~user. For example, user joe can refer to his home directory as ~, ~/, or ~joe, while he can only refer to sue's home directory as ~sue.
The absolute path name, also known as full path name, of a file or directory denotes the complete path from / (the root directory) to the file or directory of interest. It is the path we would "walk" from the root directory (/) to the file or directory of interest. For example, the absolute path name of Sue's .cshrc file is /home/sue/.cshrc, and the absolute path name of the ape command is /usr/local/bin/ape. To walk the directory tree, we would start in / and progress from there:
Start in /
Go to /usr
Go to /usr/local
Go to /usr/local/bin
End at /usr/local/bin/ape
The absolute path name is the only way to uniquely identify a file or directory in the file system.
Example 3.9. Practice Break
Try the following commands:
shell-prompt: ls
shell-prompt: ls /etc
shell-prompt: cat /etc/hosts
shell-prompt: ls ~
Every Unix process has an attribute called the current working directory, or CWD. This is the directory that the process is currently "in". When you first log into a Unix system, the shell process's CWD is set to your home directory.
The pwd (print working directory) command prints the CWD of the shell process. The cd (change directory) command changes the CWD of the shell process. Running cd with no arguments sets the CWD to your home directory, much like clicking your heels together three times to get back to Kansas. Running cd - changes the CWD to its previous value.
Example 3.10. Practice Break
Try the following commands:
shell-prompt: pwd
shell-prompt: cd /
shell-prompt: pwd
shell-prompt: cd
shell-prompt: pwd
shell-prompt: cd -
shell-prompt: pwd
shell-prompt: cd -
shell-prompt: pwd
Many commands, such as ls, use the CWD as a default if you don't provide a directory name on the command line. For example, if the CWD is /home/joe, then the following commands are the same:
shell-prompt: ls
shell-prompt: ls /home/joe
shell-prompt: ls ~joe
Whereas an absolute path name denotes the path from / to a file or directory, the relative path name denotes the path from the CWD to a file or directory.
Any path name that does not begin with a '/' or '~' is interpreted as a relative path name. The absolute path name is then derived by appending the relative path name to the CWD. For example, if the CWD is /etc, then the relative path name hosts refers to the absolute path name /etc/hosts, and the relative path name of /etc/ssh/ssh_config is ssh/ssh_config. Likewise, the relative path name bin refers to /bin when the CWD is / and to /usr/bin when the CWD is /usr.
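The rule for deriving an absolute path name can be sketched as a small shell function. The function name resolve and its two-argument interface are invented here for illustration. Real programs never need such a helper, because the operating system performs this step itself. A '~' is not handled, since the shell expands it before a program ever sees the path name.

```shell
# resolve CWD PATHNAME
# Print the absolute path name that PATHNAME refers to when the
# current working directory is CWD. (Hypothetical helper.)
resolve() {
    cwd=$1
    pathname=$2
    case $pathname in
        /*) # Already absolute: the CWD plays no role.
            printf '%s\n' "$pathname" ;;
        *)  # Relative: append it to the CWD. ${cwd%/} drops a trailing
            # slash so that a CWD of / does not produce //bin.
            printf '%s/%s\n' "${cwd%/}" "$pathname" ;;
    esac
}

resolve /etc hosts         # prints /etc/hosts
resolve / bin              # prints /bin
resolve /usr bin           # prints /usr/bin
resolve /usr /etc/hosts    # prints /etc/hosts
```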
When you run a program from the shell, the new process inherits the CWD from the shell. Hence, you can use relative path names as arguments in any Unix command, and they will use the CWD inherited from the shell process. For example, the two cat commands below have the same effect.
shell-prompt: cd /etc          # Set shell's CWD to /etc
shell-prompt: cat hosts        # Inherits CWD from shell, so hosts = /etc/hosts
shell-prompt: cat /etc/hosts   # Same effect as above
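Inheritance of the CWD works in one direction only: a child process starts with a copy of the parent's CWD, but changing the CWD in the child has no effect on the parent. The following sketch illustrates this by using sh -c to start child shell processes from a scratch directory. The variable names are arbitrary, and pwd -P is used so that symbolic links in the scratch path do not confuse the comparison.

```shell
start_dir=$(pwd)
tmpdir=$(mktemp -d)
cd "$tmpdir"

# CWD of this shell process.
shell_cwd=$(pwd -P)

# A child process starts out in the same directory as its parent.
child_cwd=$(sh -c 'pwd -P')

# The child can change its own CWD ...
moved_child_cwd=$(sh -c 'cd / && pwd -P')

# ... but the CWD of the parent shell is unaffected.
after_child_cwd=$(pwd -P)

echo "shell: $shell_cwd"
echo "child: $child_cwd"
echo "child after cd /: $moved_child_cwd"
echo "shell afterwards: $after_child_cwd"

cd "$start_dir"
rm -r "$tmpdir"
```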
The cd command is one of the most overused Unix commands. Many people use it where it is completely unnecessary and actually results in significantly more typing than needed. Don't use cd if it is actually more work than using an absolute path name as an argument. For example, consider the sequence of commands:
shell-prompt: cd /etc
shell-prompt: more hosts
shell-prompt: cd
The same effect could have been achieved much more easily using the following single command:
shell-prompt: more /etc/hosts
Example 3.11. Practice Break
Try to predict the results of the following commands before running them:
shell-prompt: cd
shell-prompt: pwd
shell-prompt: cd /etc
shell-prompt: pwd
shell-prompt: cat hosts
shell-prompt: cat /etc/hosts
shell-prompt: cd
shell-prompt: pwd
shell-prompt: cat hosts
Why does the last command result in an error?
The relative path name is potentially much shorter than the equivalent absolute path name. Using relative path names also makes code more portable.
Suppose you have a project contained in the directory /Users/joe/Thesis on your Mac. Now suppose you want to work on the same project on an HPC cluster, where there is no /Users directory, and you have to store it in /share1/joe/Thesis. The absolute path name of every file and directory under Thesis will be different on the cluster than it is on your Mac. This can cause major problems if you were using absolute path names in your scripts, programs, and makefiles. Statements like the following will have to be changed in order to run the program on a different computer.
infile = fopen("/Users/joe/Thesis/Inputs/input1.txt", "r");
sort /Users/joe/Thesis/Inputs/names.txt
While the absolute path names change when you move the Thesis directory, the path names relative to the Thesis directory remain the same. For this reason, absolute path names should be avoided in programs and scripts. The statements below will work on any computer as long as the program or script is running with Thesis as the CWD. It does not matter where the Thesis directory is located, so long as the Inputs directory is its child.
infile = fopen("Inputs/input1.txt", "r");
sort Inputs/names.txt
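The effect of moving a project can be simulated in a scratch directory. In this sketch, the directories mac and cluster are hypothetical stand-ins for the two computers, and names.txt holds made-up data. The same sort command with the same relative path name works before and after the move.

```shell
start_dir=$(pwd)
tmpdir=$(mktemp -d)

# Build a small Thesis project in its first location.
mkdir -p "$tmpdir/mac/Thesis/Inputs"
printf 'sue\njoe\n' > "$tmpdir/mac/Thesis/Inputs/names.txt"

# Run the command with Thesis as the CWD.
cd "$tmpdir/mac/Thesis"
sorted_before=$(sort Inputs/names.txt)
cd "$start_dir"

# Move the whole project, changing every absolute path name under it.
mkdir "$tmpdir/cluster"
mv "$tmpdir/mac/Thesis" "$tmpdir/cluster/Thesis"

# The identical command still works from the new location.
cd "$tmpdir/cluster/Thesis"
sorted_after=$(sort Inputs/names.txt)
cd "$start_dir"

echo "$sorted_after"
rm -r "$tmpdir"
```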
In addition to absolute path names and relative path names, there are a few special symbols for directories that are commonly referenced:
Table 3.5. Special Directory Symbols
Symbol | Refers to |
---|---|
. | The current working directory |
.. | The parent of the current working directory |
~ | Your home directory |
~user | user's home directory |
The '.' notation for CWD is useful for copying files to CWD and other commands that require a target directory name.
shell-prompt: cp /etc/hosts .
It is also useful if a mishap occurs, leading to the creation of a file whose name begins with a special character such as '-' or '~'. If we have a file called "-file.txt", we cannot remove it with rm -file.txt, since the rm command will think the '-' indicates a flag argument. To get around this, we simply need to make the argument not begin with a '-'. We can use either the absolute path name of the file, e.g. /home/joe/-file.txt, or the relative path name ./-file.txt. The path name ./path always refers to exactly the same file as path.
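The following sketch creates and removes such files in a scratch directory. In addition to the ./ technique described above, it shows a second approach that is not covered in the text: most standard commands, including rm, treat a -- argument as the end of the flags. The file names are made up for illustration.

```shell
start_dir=$(pwd)
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Create two files whose names begin with '-'.
touch ./-file.txt ./-other.txt

# The leading ./ keeps rm from mistaking the name for a flag.
rm ./-file.txt

# Alternative: -- tells rm that everything after it is a path name.
rm -- -other.txt

# Count what is left in the scratch directory.
remaining=$(ls -A | wc -l | tr -d ' ')
echo "Files remaining: $remaining"

cd "$start_dir"
rm -r "$tmpdir"
```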
The ".." notation refers to the parent of the CWD and allows for relative path names that are not under the CWD. For example, if the CWD is /home/joe, then the relative path of /home/sue/.cshrc is ../sue/.cshrc and the relative path name of /etc/hosts is ../../etc/hosts. We can "walk" a relative path such as ../../etc/hosts just as we walk an absolute path:
Start at /home/joe (.)
Go to /home (..)
Go to / (../..)
Go to /etc (../../etc)
End at /etc/hosts (../../etc/hosts)
Note that /home/joe/../sue/.cshrc (/home/joe + / + ../sue/.cshrc) is a valid absolute path name, but it can be shortened to /home/sue/.cshrc. We can remove a ../ along with the path component to the left of it, such as joe/../, provided that component is an ordinary directory rather than a symbolic link. Likewise, /home/joe/../../etc/hosts can be reduced to /home/../etc/hosts and further to /etc/hosts.
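The shortening procedure can be sketched as a shell function that processes an absolute path name one component at a time. The function name normalize is invented for this example. It works purely on the text of the path name and never consults the file system, so it assumes that none of the components is a symbolic link.

```shell
# normalize ABSOLUTE-PATHNAME
# Print the path name with "." and ".." components removed.
# (Hypothetical helper; purely textual, ignores symbolic links.)
normalize() {
    result=""
    old_ifs=$IFS
    IFS=/       # Split the path name into components at each slash
    set -f      # Disable wildcard expansion while splitting
    for component in $1; do
        case $component in
            ''|.) ;;                            # Nothing to walk
            ..)   result=${result%/*} ;;        # Back up one directory
            *)    result="$result/$component" ;; # Walk down one level
        esac
    done
    set +f
    IFS=$old_ifs
    # An empty result means we ended up in the root directory.
    printf '%s\n' "${result:-/}"
}

normalize /home/joe/../sue/.cshrc      # prints /home/sue/.cshrc
normalize /home/joe/../../etc/hosts    # prints /etc/hosts
```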
Example 3.12. Practice Break
Try the following commands and see what they do:
shell-prompt: cd
shell-prompt: pwd
shell-prompt: ls
shell-prompt: ls ~
shell-prompt: ls .
shell-prompt: mkdir Data Scripts
shell-prompt: cp /etc/hosts .
shell-prompt: mv hosts Data
shell-prompt: ls Data
shell-prompt: ls ./Data
shell-prompt: cd Data
shell-prompt: cd ../Scripts
shell-prompt: ls ..
shell-prompt: ls ../Data
shell-prompt: more ../Data/hosts
shell-prompt: rm ../Data/hosts
shell-prompt: ls ~/Data
shell-prompt: ls /bin
shell-prompt: cd ..
shell-prompt: pwd
Every file and directory on a Unix system has inherent access control features based on a simple system:
Every file and directory belongs to an individual user and to a group of users.
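There are 3 types of permissions, which are controlled separately from each other: read, write, and execute.
Read, write, and execute permissions can be granted or denied separately for each of the following: the individual owner of the file or directory, members of the group that owns it, and all other users (the world).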
Execute permissions on a file mean that the file can be executed as a script or a program by typing its name. It does not mean that the file actually contains a script or a program: It is up to the owner of the file to set the execute permissions appropriately for each file.
Execute permissions on a directory mean that permitted users can cd into it and access the files within it by path name. Read permissions on a directory only allow users to list the names it contains. Without execute permissions, their processes cannot make it the CWD or open any file inside it.
Unix systems provide this access using 9 on/off switches (bits) associated with each file.
If you do a long listing of a file or directory, you will see the ownership and permissions:
shell-prompt: ls -l
drwx------   2 joe  users    512 Aug  7 07:52 Desktop/
drwxr-x---  39 joe  users   1536 Aug  9 22:21 Documents/
drwxr-xr-x   2 joe  users    512 Aug  9 22:25 Downloads/
-rw-r--r--   1 joe  users  82118 Aug  2 09:47 bootcamp.pdf
The leftmost column shows the type of object and the permissions for each user category.
A '-' in the leftmost character means a regular file, 'd' means a directory, 'l' means a link, etc. Running man ls will reveal all the codes.
The next three characters are, in order, read, write and execute permissions for the owner (joe).
The next three after that are permissions for members of the owning group (users).
The next three are permissions for world (other).
A '-' in a permission bit column means that the permission is denied for that user or set of users and an 'r', 'w', or 'x' means that read, write, or execute is permitted.
The next three columns show the number of links (different path names for the same file) and the individual and group ownership of the file or directory. The remaining columns show the size, the date and time it was last modified, and the name. In addition to the 'd' in the first column, directory names are followed by a '/' if ls is so configured.
You can see above that Joe's Desktop directory is readable, writable, and executable for Joe, and completely inaccessible to everyone else.
Joe's Documents directory is readable, writable and executable for Joe, and readable and executable for members of the group "users". Users not in the group "users" cannot access the Documents directory at all.
Joe's Downloads directory is readable and executable to anyone who can log into the system.
The file bootcamp.pdf is readable by group and world, but only writable by Joe. It is not executable by anyone, which makes sense because a PDF file is not a program.
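To make the layout of the leftmost column concrete, this small sketch splits the 10-character string for the Documents directory from the listing above into its four parts using cut. The variable names are arbitrary.

```shell
# Leftmost column of the ls -l line for Documents in the listing above.
mode='drwxr-x---'

# Character 1 is the object type, then three characters per user category.
type_char=$(printf '%s' "$mode" | cut -c1)
owner_perms=$(printf '%s' "$mode" | cut -c2-4)
group_perms=$(printf '%s' "$mode" | cut -c5-7)
world_perms=$(printf '%s' "$mode" | cut -c8-10)

echo "type:  $type_char"
echo "owner: $owner_perms"
echo "group: $group_perms"
echo "world: $world_perms"
```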
Users cannot change individual ownership on a file, since this would allow them to subvert disk quotas and do other malicious acts by placing their files under someone else's name. Only the superuser (the system administrator) can change the individual ownership of a file or directory.
Every user has a primary group and may also be a member of supplementary groups. Users can change the group ownership of a file to any group that they belong to using the chgrp command, which requires a group name as the first argument, followed by one or more path names:
shell-prompt: chgrp group path [path ...]
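Before using chgrp, it helps to know which groups you belong to. The sketch below lists them with the standard id command and then sets the group of a scratch file. To keep the example runnable for any user, it uses the numeric ID of the primary group, which chgrp accepts in place of a name. The file name results.txt is hypothetical.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/results.txt"

# List the names of all groups the current user belongs to.
echo "Groups you belong to: $(id -Gn)"

# Numeric ID of the current user's primary group.
my_gid=$(id -g)

# Set the group ownership of the file.
chgrp "$my_gid" "$tmpdir/results.txt"

# ls -ln shows numeric owner and group IDs; the group is the 4th column.
file_gid=$(ls -ln "$tmpdir/results.txt" | awk '{ print $4 }')
echo "Group ID of results.txt: $file_gid"

rm -r "$tmpdir"
```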
All sharing of files on Unix systems is done by controlling group ownership and file permissions.
File permissions are changed using the chmod command:
shell-prompt: chmod permission-specification path [path ...]
The permission specification has a symbolic form, and a raw form, which is an octal number.
The symbolic form consists of one or more of the three user categories 'u' (user/owner), 'g' (group), and 'o' (other/world) followed by a '+' (grant) or '-' (revoke), and finally one or more of the three permissions 'r', 'w', and 'x'.
To add read and execute (cd) permissions for group and world on the Documents directory:
shell-prompt: chmod go+rx Documents
Sometimes it is impossible to express the changes we want to make in one simple specification. In that case, we can use a compound specification, two or more basic specs separated by commas. Remember that white space indicates the end of an argument, so we cannot have any white space next to the comma.
To revoke all permissions for world on the Documents directory and grant read permission for the group:
shell-prompt: chmod o-rwx,g+r Documents
Disable write permission for everyone, including the owner, on bootcamp.pdf. This helps protect an important file from accidental changes, and rm will normally ask for confirmation before deleting a write-protected file.
shell-prompt: chmod ugo-w bootcamp.pdf
Run man chmod for additional information.
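The effect of each specification can be checked with ls -l. The following sketch applies symbolic specifications to a scratch file (the name bootcamp.pdf is reused from the examples above, but the file is empty) and captures the first 10 characters of the listing after each change.

```shell
tmpdir=$(mktemp -d)
file="$tmpdir/bootcamp.pdf"
touch "$file"

# Start from a known state by revoking everything.
chmod ugo-rwx "$file"

# Compound specification: read/write for the owner, read for group and world.
chmod u+rw,go+r "$file"
mode_shared=$(ls -l "$file" | cut -c1-10)
echo "$mode_shared"      # -rw-r--r--

# Revoke write permission for everyone, including the owner.
chmod ugo-w "$file"
mode_readonly=$(ls -l "$file" | cut -c1-10)
echo "$mode_readonly"    # -r--r--r--

rm -rf "$tmpdir"
```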
The raw form for permissions uses a 3-digit octal number to represent the 9 permission bits. This is a quick and convenient method for computer nerds who can do octal/binary conversions in their head.
shell-prompt: chmod 644 bootcamp.pdf   # 644 = 110100100 = rw-r--r--
shell-prompt: chmod 750 Documents      # 750 = 111101000 = rwxr-x---
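For those who prefer not to convert in their heads, the conversion can be sketched as a shell function. Each octal digit is the sum of 4 (read), 2 (write), and 1 (execute). The function name octal_to_symbolic is invented for this example and is not a standard command.

```shell
# octal_to_symbolic DIGITS
# Print the rwx string for a 3-digit octal permission number.
# (Hypothetical helper for illustration.)
octal_to_symbolic() {
    symbolic=""
    # sed puts a space after each digit so the loop sees them one at a time.
    for digit in $(echo "$1" | sed 's/./& /g'); do
        case $digit in
            [0-7]) ;;
            *) echo "Not an octal digit: $digit" >&2; return 1 ;;
        esac
        r=-; w=-; x=-
        [ $((digit & 4)) -ne 0 ] && r=r    # 4 bit = read
        [ $((digit & 2)) -ne 0 ] && w=w    # 2 bit = write
        [ $((digit & 1)) -ne 0 ] && x=x    # 1 bit = execute
        symbolic="$symbolic$r$w$x"
    done
    printf '%s\n' "$symbolic"
}

octal_to_symbolic 644    # prints rw-r--r--
octal_to_symbolic 750    # prints rwxr-x---
```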
By default, new files you create are owned by you and your primary group. If you are a member of more than one group and wish to share a directory with one of your supplementary groups, it may also be helpful to set a special flag on the directory so that new files created in it will have the same group as the directory, rather than your primary group. Then you won't have to remember to chgrp every new file you create.
shell-prompt: chmod g+s Shared-research
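When this flag is set, ls -l typically shows an 's' in place of the group's 'x'. The sketch below sets the flag on a scratch directory and inspects the result. It assumes you belong to the group that owns the new directory, which is normally the case for directories you create, since some systems silently ignore g+s otherwise.

```shell
tmpdir=$(mktemp -d)
shared="$tmpdir/Shared-research"
mkdir "$shared"

# Start from a known state: full access for the owner,
# read and execute for the group, nothing for the world.
chmod ugo-rwx "$shared"
chmod u+rwx,g+rx "$shared"

# Set the flag so new files inherit the directory's group.
chmod g+s "$shared"

# -d lists the directory itself rather than its contents.
setgid_mode=$(ls -ld "$shared" | cut -c1-10)
echo "$setgid_mode"

# The group execute position is the 7th character of the listing.
group_exec_char=$(printf '%s' "$setgid_mode" | cut -c7)

rm -rf "$tmpdir"
```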
Example 3.13. Practice Break
Try the following commands, and try to predict the output of each ls before you run it.
shell-prompt: touch testfile
shell-prompt: ls -l
shell-prompt: chmod go-rwx testfile
shell-prompt: ls -l
shell-prompt: chmod o+rw testfile
shell-prompt: ls -l
shell-prompt: chmod g+rwx testfile
shell-prompt: ls -l
shell-prompt: rm testfile
Now recreate testfile (the last command above removed it) and set its permissions so that it is readable, writable, and executable by you, only readable by the group, and inaccessible to everyone else.
What is a file from the viewpoint of Unix?
What is the difference between a text file and a binary file?
What will happen if you echo a binary file to your terminal?
What is the difference between Windows and Unix text files?
How can we convert text files between the Unix and Windows standards?
What is a directory?
What does it mean that Unix filenames are case-sensitive?
What is a root directory?
How many root directories does a Unix system have? How many does Windows have?
What is contained in the /bin and /usr/bin directories?
What is a subdirectory?
What is a home directory?
What is an absolute path name and how do we recognize one?
What is the absolute path name of Sue's asg01.c in the tree diagram in this section?
Of what is the CWD a property?
Show a Unix command that prints the CWD of a shell process.
Show a Unix command that sets the CWD of a shell process to /tmp.
Show a Unix command that sets the CWD of a shell process to our home directory.
What is a relative path name and how do we recognize one?
Is a relative path name unique? Prove your answer with an example.
How does Unix determine the absolute path name from a relative path name?
If the CWD of a process is /usr/local, what is the absolute path name of "bin/ape"?
If the CWD of a process is /usr/local, what is the relative path name of /usr/local/lib/libxtend.a?
If the CWD of a process is /usr/local, what is the relative path name of /usr/bin?
If the CWD of a process is /usr/local, what is the relative path name of /etc/motd?
Where does a new process get its initial CWD?
Why should we avoid using absolute path names in programs and scripts?
Show a Unix command that lists the contents of the parent directory of CWD.
If the CWD of a process is /home/bob/Programs, what is the relative path name of /home/bob/Data/input1.txt?
How do we remove a file called "~sue" in the CWD?
What are the three user categories that can be granted permissions on a file or directory?
What does it mean to set execute permission on a file? On a directory?
Given the following ls -l output, who can do what to bootcamp.pdf?
-rw-r----- 1 joe users 82118 Aug 2 09:47 bootcamp.pdf
How would we allow users who are not in the owning group to read bootcamp.pdf?
How would we allow members of the group to read and execute the program "simulation" and at the same time revoke all access to other users?
Show a Unix command that makes the directory "MyScripts" world writable.
Show a Unix command that changes the group ownership of the directory "Research" to the group "smithlab".
Assuming your primary group is "joe", show a Unix command that configures the directory Research from the previous question so that new files you create in it will be owned by "smithlab" instead of "joe".