1.2. What is Unix?

1.2.1. Aw, man... I Have to Learn Another System?

Well, yeah, but it's the last time, I promise. As you'll see in the sections that follow, once you've learned to use Unix, you'll be able to use your new skills on virtually any computer. Over time you'll get better and better at it, and never have to start over from scratch again. Read on to find out why.

If you plan to do computational research, you have two choices:

  • Learn to use Unix.
  • Rely on the charity of others.

Most scientific software runs only on Unix and very little of it will ever have a graphical or other user interface that allows you to run it without knowing Unix.

There have been many attempts to provide access to scientific software via web interfaces, but most of them die out after a short time. Many are created by graduate students who move on, and those created by more seasoned professionals usually become too much of a burden to sustain.

Hence, in order to be independent in your research computing, you must know how to use Unix in the traditional way. This is the reality of research computing. It's much easier to adapt yourself to reality than to adapt reality to yourself. This chapter will help you become proficient enough to survive and even flourish on your own.

Unix began as the trade name of an operating system developed at AT&T Bell Labs around 1970. It quickly became the model on which most subsequent operating systems have been based. Eventually, "Unix" came into common use to refer to any operating system mimicking the original Unix, much like Band-Aid is now used to refer to any adhesive bandage purchased in a drug store.

Over time, formal standards were developed to promote compatibility between the various Unix-like operating systems, and eventually, Unix ceased to be the name of a single product. Today, the name Unix (now a trademark of the Open Group) officially refers to a set of standards to which most operating systems conform.

Look around the room and you will see many standards that make our lives easier (wall outlets, keyboards, USB ports, light bulb sockets, etc.). All of these standards make it possible to buy interchangeable devices from competing companies. This competition forces the companies to offer better value. They need to offer a lower price and/or better quality than their competition in order to stay in business.

In a nutshell, Unix is every operating system you're likely to use except Microsoft Windows. Table 1.1, “Partial List of Unix Operating Systems” provides links to many Unix-compatible operating systems. This is not a comprehensive list. Many more Unix-like systems can be found by searching the web.


Note

Apple's Mac OS X has many proprietary extensions, including Apple's own user interface, but is almost fully Unix-compatible and can be used much like any other Unix system by simply choosing not to use the Apple extensions.

Unix is historically connected with other standards such as X/Open XPG and POSIX (Portable Operating System Interface based on Unix). The Unix-related standards are in fact the only open standards in existence for operating systems. For the official definition of Unix and associated standards, see the Open Group website: http://www.unix.org/what_is_unix.html.

The Unix standards serve the same purpose as all standards: to foster collaboration, give the consumer freedom of choice, reduce unnecessary learning time, and annoy developers who would rather ignore what everyone else is doing and reinvent the wheel at their employer's expense to gratify their own egos.

Unix standards make things interchangeable in the same way as many other standards, such as power plugs, USB, cell phone SIM cards, etc.

They allow us to become operating system agnostic nomads, readily switching from one Unix system to another as our needs or situation dictate.

Note

When you develop programs for any Unix-compatible operating system, those programs can be easily used by people running any other Unix-compatible system. Most Unix programs can even be used on a Microsoft Windows system with the aid of a compatibility layer such as Cygwin (See Section 1.5.1, “Cygwin: Try This First”).

I once knew a mechanic whose wife was concerned about how often he changed jobs. His response: "That's why toolboxes have wheels." Likewise, you can write programs with "wheels" that can easily be taken elsewhere to run on other Unix-compatible systems, or you can bolt your programs to the floor of a proprietary system like MS Windows and abandon them when it's time to move.

Note

Once you've learned to use one Unix system, you're ready to use any of them. Hence, Unix is the last system you'll ever need to learn!

Unix systems run on everything from your cell phone to the world's largest supercomputers. Unix is the basis for Apple's iOS, the Android mobile OS, embedded systems such as networking equipment and robotics controllers, most PC operating systems, and many large mainframe systems.

Many Unix systems are completely free (as in free beer) and can run thousands of high quality free software packages.

It's a good idea to regularly use more than one Unix system. This will make you aware of how much they all have in common and what the subtle differences are.

1.2.2. Operating System or Religion?

Aside

Keep the company of those who seek the truth, and run from those who have found it.

-- Vaclav Havel

The more confident someone is in their views, the less they probably know about the subject. As we gain life experience and wisdom, we become less certain about everything and more comfortable with that uncertainty.

What looks like confidence is usually a symptom of ignorance of our own ignorance, generally fueled by ego.

If you discuss operating systems at length with most people, you will discover, as the ancient philosopher Socrates did while discussing many topics with "experts", that their views are not based on broad knowledge and objective comparison. Before taking advice from anyone, it's a good idea to find out how much they really know and what role emotion and ego play in their preferences. Most people quickly become attached to their views and expend more effort clinging to them than it would actually take to verify or refute them.

The whole point of the Unix standard, like any other standard, is freedom of choice.

However, you won't have any trouble finding evangelists for a particular brand of Unix-compatible operating system on whom this point is lost. "Discussions" about the merits of various Unix implementations often involve arrogant pontification and emotional outbursts, and even occasional cussing.

If you step back and ask yourself what kind of person gets emotionally attached to a piece of software, you'll realize whose advice you should value and whose you should not. Rational people will keep an open mind and calmly discuss the objective measures of an OS, such as performance, reliability, security, ease of maintenance, specific capabilities, etc. They will also back up their opinions with facts rather than try to bully you into agreeing with their views.

If someone tells you that a particular operating system "isn't worth using", "is way behind the times", or "sucks wads", rather than asking you what you need and objectively discussing alternatives, this is probably someone whose advice you can safely ignore.

Evangelists are typically pretty easy to spot. They will somehow magically assess your needs without asking you a single question and proceed to explain (often aggressively) why you should be using their favorite operating system or programming language. They tend to have limited or no experience with other alternatives. This is easy to expose with a minute or two of questioning. So this OS is garbage? How many years of experience do you have with it? For what types of problems have you attempted to use it?

Ultimately, the system that most easily runs your programs to your satisfaction is the best one for you. That could turn out to be BSD, Cygwin, Linux, Mac OS X, or any other.

Someone who knows what they're doing and truly wants to help you will always begin by asking you a series of questions in order to better understand your needs. "What program(s) do you need to run?", "Do they require any special hardware?", "Do you need to run any commercial software, or just open source?", etc. They will then consider multiple alternatives and inform you about the capabilities of each one that might match your needs.

Unfortunately, many people in science and engineering are not so rational or knowledgeable, but driven by ego. They want to look smart or be cool by using the latest trendy technologies, even if those technologies do nothing to meet their needs. In the minds of the ego-driven, new technologies become solutions looking for problems.

Some recent "we're not cool unless we use this" fads include Hadoop, cgroups, solid state disk drives (SSDs), ultrathin laptops, GPUs for scientific computing, parallel file systems, machine learning, Bayesian networks, containers, etc. All of these technologies are very useful under the right circumstances, but many people waste time and money trying to apply them to ordinary tasks that don't benefit from them at all, and may actually suffer due to the high cost and added man-hours wasted on them.

For example, SSDs are roughly three times faster than magnetic disks. However, they cost a lot more for the same capacity. Does the added speed they provide actually allow you to accomplish something that could not be done if your program took a little longer to run on a magnetic disk? How much does an SSD actually reduce your run time? If your software is mainly limited by CPU speed, you won't see any measurable reduction in run time by switching to an SSD.

The vastly larger capacity/cost ratio of a magnetic disk might be of more value to you. SSDs also burn out over time, as they have a limited number of write cycles. An SSD on a computer with constant heavy disk activity will improve performance, but the SSD might be dead in a year or two. Magnetic disks on average actually last longer, despite being more physically fragile than SSDs.

All scientists and engineers are capable of logical thought, but outside peer-reviewed publications, many only use it to rationalize what they want to believe despite all evidence to the contrary.

Aside

"I don't understand why some people wash their bath towels. When I get out of the shower, I'm the cleanest object in my house. In theory, those towels should be getting cleaner every time they touch me... Are towels supposed to bend?"

-- Wally (Dilbert)

Of course, the "more is always better" fallacy is not limited to science and engineering. Many people waste vast amounts of time and money on things that have little or no real impact on their lives (a new set of golf clubs, a racing jersey for weekend bike rides, four wheel drive, the latest iPhone, etc.). Avoid falling into this trap and life will be simpler, more productive, and more relaxing.

My personal recommendations for running Unix software are listed below. They reflect my views for now and could change in the distant future. Note that these recommendations are meant to indicate only what is optimal to minimize your wasted effort, not what is necessary to succeed in your work. All of these systems are somewhat interchangeable with each other and the many other Unix-based systems available, so deviating from these recommendations will not lead to catastrophe.

More details on choosing a Unix platform are provided in ???.

  • Servers running mostly open source software: FreeBSD.

    FreeBSD is extremely fast, reliable, and secure. Software management is very easy with FreeBSD ports, which offers over 33,000 distinct and usually very recent software packages (not counting different builds of the same software). The ports system supports installation via either generic binary packages, or you can just as easily build from source with custom options or optimizations for your specific CPU. With the Linux compatibility module, FreeBSD can directly run most Linux closed-source programs with no performance penalty and a little added effort and resources.

  • Servers running mainly commercial applications or CUDA GPU software: Enterprise Linux (CentOS, RHEL, Scientific Linux, SUSE).

    These systems are designed for better reliability, security, and long-term binary compatibility than bleeding-edge Linux systems. They are the only platforms besides MS Windows and Mac OS X supported by many commercial software vendors. While you may be able to get some commercial engineering software running on Ubuntu or Mint, it is often difficult and the company will not provide support. Packages in the native Yum repository of enterprise Linux are generally outdated, but more recent open source software can be installed using a separate add-on package manager such as pkgsrc.

  • An average Joe who wants to browse the web, use a word processor, etc.: GhostBSD, Ubuntu, or similar open source Unix system with graphical installer and management tools, or Macintosh.

    These systems make it easy to install software packages and system updates, with minimal risk of breakage that only a computer wizard would know how to fix.

    Other options in this category include Debian Linux and FreeBSD with the Lumina desktop environment.

  • Someone who uses mostly Windows-based software, but needs a basic Unix environment for software development or connecting to other Unix systems: A Windows PC with Cygwin.

    Cygwin is free, entirely open source, and very easy to install in about 10 minutes on most Windows systems. It has some performance bottlenecks, fewer packages than a real Unix system running on the same machine, and a few other limitations, but it's more than adequate for the needs of many typical users.

    WSL (Windows Subsystem for Linux) is an alternative to Cygwin which is binary compatible with a real Linux system such as Debian, but it lacks graphical capabilities, is slower than Cygwin (see Table 1.2, “Pkgsrc Build Times”), and is not entirely open source, leaving you at the mercy of Microsoft if you become dependent on it.

1.2.3. The Unix Standard API

Programmers cost money. This is a problem in the computer industry for which we haven't found a solution. Even if we keep them locked in a basement for days at a time and only pay them for half the hours they work (which many of them oddly find perfectly acceptable), they still need to be fed and watered occasionally, and paid a reasonably competitive salary so they won't leave and go work in someone else's basement.

Unix systems adhere to an application program interface (API) standard, which means that programs written for one Unix-based system can be run on any other with little or no modification, even if it runs on completely different hardware. For example, programs written for an Intel/AMD-based Linux system will also run on a PowerPC-based Mac, a Solaris system running on a Sparc processor, or FreeBSD on an ARM processor.

An API defines a set of functions (subprograms) used to request services from the operating system, such as opening a file, allocating memory, running another program, etc.

These functions are the same on all Unix systems, but some of them are different on Windows and other non-standard systems. For example, to open a file in a C program on any Unix system, one could use the fopen() function:

	    FILE *fopen(const char *filename, const char *mode);
	    

Microsoft compilers support fopen() as well, but also provide another function for the same purpose that won't work on other systems:

	    errno_t fopen_s(FILE** pFile, const char *filename, const char *mode);
	    

Note

Microsoft claims that fopen_s() is more secure, which is debatable. Note, however, that even if this is true, the existing fopen() function itself could have been made more secure rather than creating a separate, non-standard function that does the same thing.

Here are a few other standard Unix functions that can be used in programs written in C and most other compiled languages. These functions can be used on any Unix system, regardless of the type of hardware running it. Some of these may also work in Windows, but for others, Windows uses a completely different function to achieve the same goal.

	    chdir()         // Change current working directory
	    execl()         // Load and run another program
	    mkdir()         // Create a directory
	    unlink()        // Remove a file
	    sleep()         // Pause execution of the process
	    DisplayWidth()  // Get the width of the screen (an Xlib macro; requires X11)
	    

Because the Unix API is platform-independent, it is also possible to compile and run most Unix programs on Windows with the aid of a compatibility layer, software that bridges the difference between two platforms (see Section 1.5.1, “Cygwin: Try This First” for details). It is not generally possible to compile and run Windows software on Unix, however, since Windows has many PC-specific features.

Since programs written for Unix can be run on almost any computer, including Windows computers, they will probably never have to be rewritten in order to run somewhere else!

Programs written for non-Unix platforms will only run on that platform, and will have to be rewritten (at least partially) in order to run on any other system. This leads to an enormous waste of man-hours that could have gone into creating something new.

They may also become obsolete as the proprietary systems for which they were written evolve. For example, most programs written for MS DOS and Windows 3.x are no longer in use today, while programs written for Unix around that same time will still work on modern Unix systems.

Of course, if you had a lot of fun writing a program the first time, you may want to do it again. In that case, you won't want to use Unix, since it would take away at least half your fun.

1.2.4. Shake Out the Bugs

Another advantage of programming on standardized platforms is the ability to easily do more thorough testing.

Compiling and running a program on multiple operating systems and with multiple compilers will almost always expose bugs that you were unaware of while running it on the original development system. The same bug will have different effects on different operating systems, with different compilers or interpreters, or with different compile options (e.g. with and without optimization).

For example, an errant array subscript or pointer might cause corruption in a non-critical memory location in some environments, while causing the program to crash in others.

A program may seem to be fine when you compile it with Clang and run it on your Mac, but may not compile, or may crash when compiled with GCC on a Linux machine.

Finding bugs now may save you from the stressful situation of tracking them down under time pressure later, with an approaching grant deadline. A bug that was invisible on your Mac for the test cases you've used could also show up on your Mac later, when you run the program with different inputs.

Developing for the Unix API makes it easy to test on various operating systems and with different compilers. There are many free BSD and Linux based systems, as well as free compilers such as Clang and GCC. Most of them can be run in a virtual machine (???), so you don't even need to have another computer for the sake of program testing.

Take advantage of this easy opportunity to stay ahead of program bugs, so they don't lead to missed deadlines down the road.

1.2.5. The Unix Standard UI

Let's face it: Most people don't like to learn new things. At least not about computers. Unix can help with this, too.

All Unix systems support the same basic set of commands, which conform to standards so that they behave the same way everywhere. So, if you learn to use FreeBSD, most of that knowledge will directly apply to Linux, Mac OS X, Solaris, etc.

Another part of the original Unix design philosophy was to do everything in the simplest way possible. As you learn Unix, you will likely find some of its features befuddling at first. However, upon closer examination, you will often come to appreciate the elegance of the Unix solution to a difficult problem. If you're observant enough, you'll learn to apply this Zen-like simplicity to your own work, and maybe even your everyday life. Who knows, mastering Unix could even help you attain enlightenment someday.

You will also gradually recognize a great deal of consistency between various Unix commands and functions. For example, many Unix commands support a -v (verbose) flag to indicate more verbose output, as well as a -q (quiet) flag to indicate no unnecessary output. Over time, you will develop an intuitive feel for Unix commands, become adept at correctly guessing how things work, and feel downright God-like at times.

Unix documentation also follows a few standard formats, which users quickly get used to, making it easier to learn new things about commands on any Unix system.

The consistency provided by Unix standards minimizes the amount of knowledge Unix users need in order to effectively utilize the numerous Unix systems available.

In a nutshell, the time and effort you spend learning any Unix system will make it easy to use any other in the future. You need only learn Unix once, and you'll be proficient with many different implementations such as FreeBSD, Linux, and Mac OS X.

1.2.6. Freedom of Choice

Unix standards are designed to give the user as much freedom of choice as possible. Unix users can run their programs on virtually any type or brand of hardware, and switch at will when a better or cheaper option appears.

As a case in point, until the early 1990s, most Unix systems were high-end workstations or minicomputers costing $10,000 or more. Many of the same programs that ran on those systems are now running on commodity PCs and even laptops that cost a few hundred dollars. In fact, at this very moment I'm editing this chapter on an old ThinkPad someone else threw away, which is now running FreeBSD with the XFCE desktop environment.

Another of the main design goals of Unix is to stay out of the user's way. With freedom comes responsibility, though. A common quip about Unix is that it gives us the freedom to shoot ourselves in the foot. Unix does a lot to protect users from each other, but very little to protect users from themselves. This usually leads to some mistakes by new users, but most users quickly become conditioned to be more careful and come to prefer the freedom Unix offers over more restrictive, cumbersome systems.

1.2.7. Fast, Stable and Secure

Since Unix systems compete directly with each other to win and retain users running the same programs, developers are highly motivated to optimize objective measures of the system such as performance, stability, and security.

Most Unix systems operate near the maximum speed of the hardware on which they run. Unix systems typically respond faster than other systems on the same hardware and run intensive programs in less time. Many Unix systems require far fewer resources than non-Unix systems, leaving more disk and memory for use by your programs.

Unix systems may run for months or even years without being rebooted. Software installations almost never require a reboot, and even most security updates can be applied without rebooting. As a professional systems manager, I run the risk of forgetting that some of my Unix systems exist because I haven't touched them for so long. I'm occasionally reminded when a sleep-deprived graduate student nods off at his desk and knocks the monitor to the floor with a nervous twitch.

Stability is critical for research computing, where computational models often run for weeks or months. Users of non-Unix operating systems often have to choose between killing a process that has been running for weeks and neglecting critical security updates that require a reboot.

Very few viruses or other malware programs exist for Unix systems. This is in part due to the inherently better security of Unix systems and in part due to a strong tradition in the Unix community of discouraging users from engaging in risky practices such as running programs under an administrator account and installing software from pop-ups on the web.

1.2.8. Sharing Resources

Your mom probably told you that it's nice to share, but did you know it's also more efficient?

One of the major problems for researchers in computational science is managing their own computers.

Most researchers aren't very good at installing operating systems, managing software, applying security updates, etc., nor do they want to be. Unfortunately, they often have to do these things in order to conduct computational research. Computers managed by a tag-team of researchers usually end up full of junk software, out-of-date, full of security issues, and infected with malware.

Some universities have staff to help with research computing support, but most do not. Good IT guys are expensive and hard to find, so it takes a critical mass of demand from researchers to motivate the creation of a research computing support group.

Since Unix is designed from the ground up to be accessed remotely, Unix creates an opportunity to serve researchers' needs far more cost-effectively than individual computers for each researcher. A single Unix machine on a modern PC can support dozens or even hundreds of users at the same time.

IT staff managing one or a few central Unix hosts can serve the needs of many researchers, freeing the researchers to focus on what they do best. All the researchers need is a computer that can connect to the central Unix host, and the systems managers of the host can take care of all the hard stuff.