Download It

Fortunately for the vast population of underfunded researchers in most fields, there is a huge and growing collection of open source software available for research.

Open source software is software for which the source code is freely available. Source code (a mass noun, like "milk" and "honey") is the program written in a human-readable language such as C, C++, Fortran, MATLAB, Python, or R. In order to run on a computer, it must either be compiled (translated to machine language by a program called a compiler) or be interpreted by a program called an interpreter. Compiled and interpreted languages are discussed in the section called “Compiled vs Interpreted Languages”.

The quality of open source software varies from almost unusable to better than commercial alternatives. The only way to determine whether open source software will serve your needs is to explore the available options. Things can also change rapidly, so what you learned about software options a year ago may no longer apply.

The main advantages of open source over commercial software are:

  1. It's usually free.
  2. There are no licenses to manage.
  3. It will usually run on whatever hardware and operating system you prefer.
  4. Installation is trivial if done properly.

On the whole, open source software has come of age. It is now possible for most computer users to do all of their everyday work using exclusively open source operating systems such as BSD, Illumos and Linux and open source applications such as Firefox and LibreOffice.

How to Shoot Yourself in the Foot with Open Source Software

Many people fear open source software because they assume it is hard to install and learn. Installation of open source software is actually far easier than commercial software installation when done properly, using a package manager such as Debian packages, FreeBSD ports, MacPorts, or pkgsrc. Package managers became popular during the 1990s, as open source software availability exploded along with computer speed and storage.

Fear of open source software installations usually arises from a lack of awareness of package managers and subsequent unnecessary attempts to perform difficult and poorly documented "caveman installs", where software is manually downloaded, patched, built, and installed. Unfortunately, many people still perform caveman installs, mostly because they don't know any better. I miss the 1980s too, but nobody should be installing software this way in the 21st century.

Example 2.1, “A Typical Caveman Install”, describes a typical caveman install for the R statistics package. Note that this example is relatively simple and well documented compared to many.

Example 2.1. A Typical Caveman Install

2.1 Simple compilation

First review the essential and useful tools and libraries in Essential and useful other programs under a Unix-alike, and install those you want or need. Ensure that the environment variable TMPDIR is either unset (and /tmp exists and can be written in and scripts can be executed from) or points to a valid temporary directory (one from which execution of scripts is allowed).

Choose a directory to install the R tree (R is not just a binary, but has additional data sets, help files, font metrics etc). Let us call this place R_HOME. Untar the source code. This should create directories src, doc, and several more under a top-level directory: change to that top-level directory (At this point North American readers should consult Setting paper size.)

Issue the following commands:

                ./configure
                make
                

(See Using make if your make is not called 'make'.)

Users of Debian-based 64-bit systems may need

                ./configure LIBnn=lib
                make
                

Then check the built system works correctly by

                make check
                

Failures are not necessarily problems as they might be caused by missing functionality, but you should look carefully at any reported discrepancies. (Some non-fatal errors are expected in locales that do not support Latin-1, in particular in true C locales and non-UTF-8 non-Western-European locales.) A failure in tests/ok-errors.R may indicate inadequate resource limits (see Running R). More comprehensive testing can be done by

                make check-devel
                

or

                make check-all
                

See file tests/README. If the configure and make commands execute successfully, a shell-script front-end called R will be created and copied to R_HOME/bin. You can link or copy this script to a place where users can invoke it, for example to /usr/local/bin/R. You could also copy the man page R.1 to a place where your man reader finds it, such as /usr/local/man/man1.

If you want to install the complete R tree to, e.g., /usr/local/lib/R, see installation. Note: you do not need to install R: you can run it from where it was built. You do not necessarily have to build R in the top-level source directory (say, TOP_SRCDIR).

To build in BUILDDIR, run

                cd BUILDDIR
                TOP_SRCDIR/configure
                make
                

and so on, as described further below. This has the advantage of always keeping your source tree clean and is particularly recommended when you work with a version of R from Subversion. (You may need GNU make to allow this, and you will need no spaces in the path to the build directory.)

Now rehash if necessary, type R, and read the R manuals and the R FAQ (files FAQ or doc/manual/R-FAQ.html, or http://CRAN.R-project.org/doc/FAQ/R-FAQ.html which always has the version for the latest release of R).


Before doing the above, however, one must also install dozens of other prerequisite packages, following a similar process for each one. This would include a compiler suite, GNU configure, possibly a make utility, and many math libraries on which R depends.

If you can follow the instructions and all goes well, you may be done with all this in a day or two. More likely, you will struggle for weeks and ultimately give up. If you're not using the exact same version of the exact same operating system as the developers, instructions like these are unlikely to work. Since the developers of the software and all its prerequisites likely use a variety of operating systems, it's very unlikely that you'll get through any installation without running into problems that you're probably not qualified to solve.

How Not to Shoot Yourself in the Foot with Open Source Software

Lucky for you, there are thousands of nerds like me creating ports and packages of popular software. As a result, all of the pain described above can be avoided by simply choosing an operating system with a good package manager and learning how to use it. The current state of the most popular package managers can be found at https://repology.org/.

FreeBSD ports, for example, makes it possible to install any one of over 30,000 software packages over the Internet, usually in a matter of seconds. Instead of following the instructions above from the R developers, a FreeBSD user would simply run:

            pkg install R
            

This single command will automatically download and install R and all the necessary prerequisite packages required to run it.
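Installing "all the necessary prerequisite packages" in a workable order amounts to a topological sort of the dependency graph: every package must be installed after the packages it depends on. The minimal sketch below illustrates the idea with the standard POSIX tsort utility. The package names and dependency pairs are hypothetical and simplified, not R's real dependency list, and real package managers such as pkg and apt also deal with versions, downloads, and conflicts.

```shell
#!/bin/sh
# Minimal sketch: compute a valid install order with tsort(1).
# Each pair reads "prerequisite dependent": the first package must be
# installed before the second. These package names and dependencies are
# HYPOTHETICAL, chosen only to illustrate the idea.

install_order() {
    tsort <<'EOF'
libgcc   blas
blas     lapack
lapack   R
libgcc   R
readline R
pcre2    R
EOF
}

# Prints one package per line, prerequisites first. When two packages do
# not depend on each other, their relative order is unspecified.
install_order
```

Because R depends, directly or indirectly, on every other package in this toy graph, it always comes out last.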

The pkg command installs a "binary" (precompiled) package, which is built to use only CPU features present on most typical systems so that it will run on nearly any common CPU. For example, an AMD Epyc processor has features not present in an Intel Core i5 that might make a given program significantly faster. However, binary packages won't use these features, because a program compiled to use them would not run on the i5.

With FreeBSD ports, we can just as easily build and install the R package optimized for the local CPU type. It also allows us to build the package with non-default features and options. The command below will build R, instructing the compiler to use all CPU features available on the computer doing the build.

            cd /usr/ports/math/R
            env CFLAGS=-march=native make install
            

Installations of this type take longer to complete, typically minutes to hours, but they require no more effort on your part than installing with the pkg command.
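One way to see which CPU features -march=native unlocks is to compare the preprocessor macros the compiler predefines with and without it. The sketch below assumes a GCC- or Clang-compatible compiler invoked as cc; the function name native_extras is hypothetical. On a recent x86-64 machine you would typically expect macros such as __AVX2__ to appear only in the native list, which is exactly the kind of feature a portable binary package cannot rely on.

```shell
#!/bin/sh
# Minimal sketch, assuming a GCC- or Clang-compatible compiler installed
# as "cc" that understands -march=native (true for GCC and Clang on x86-64).
# It lists predefined macros that appear only when compiling for the local
# CPU, compared with the compiler's default target. The default target is
# usually close to the generic baseline that binary packages are built for.

native_extras() {
    base=$(mktemp) || return 1
    native=$(mktemp) || { rm -f "$base"; return 1; }
    # -dM -E dumps every predefined macro without compiling anything.
    cc -dM -E - </dev/null | LC_ALL=C sort > "$base"
    cc -march=native -dM -E - </dev/null | LC_ALL=C sort > "$native"
    # comm -13 keeps lines found only in the second (native) list.
    LC_ALL=C comm -13 "$base" "$native"
    rm -f "$base" "$native"
}

# Feature macros such as __AVX2__ or __FMA__ mean the compiler may emit
# instructions that older CPUs lack. Other differences (e.g. tuning
# macros) are filtered out here.
native_extras | grep -E '^#define __(SSE|AVX|FMA|BMI)' ||
    echo "no extra CPU feature macros found"
```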

FreeBSD ports also provides a menu for selecting build options, as shown in Figure 2.1, “FreeBSD Ports Build Options”. Providing this kind of flexibility via binary packages would mean a separate binary package for every possible combination of options, which is not very practical.

Figure 2.1. FreeBSD Ports Build Options

We can similarly install R via a Debian package on Debian-based Linux distributions (e.g. Debian or Ubuntu) as follows:

            apt install r-base
            

Unfortunately, the Debian package system does not provide an easy way for the average user to build an optimized or customized version from source. Some package managers, such as FreeBSD ports, Gentoo's Portage, MacPorts, and pkgsrc, are designed to support conveniently building from source. Others, such as Debian packages, RPM, and Conda, are designed primarily for installing prebuilt binary packages.

Fortunately, Linux users can use the pkgsrc package manager in addition to native package managers such as apt. Pkgsrc is designed to work on any Unix-like platform and can exist alongside other package managers (with some care).

More detail on various package managers and how to use them can be found in Chapter 40, Software Management. In summary, here's a brief comparison of caveman installs and package managers:

  • Caveman installs are very difficult and require extensive knowledge of software development tools. Package managers make installing software trivial.

  • Upgrading involves the same nightmarish process as the initial installation. In contrast, upgrading R and all other packages installed via FreeBSD packages is a matter of typing pkg upgrade. To upgrade all your Debian packages, simply run apt update && apt upgrade.

  • Some of the software that the caveman install depends on may come from package managers such as FreeBSD ports or Debian packages. Upgrading or removing these packages may break the caveman install. R will suddenly stop working and it may be difficult to fix. Package managers, in contrast, make sure that all of the packages installed are compatible.

  • Uninstalling a caveman install requires knowing where all the files are and removing them manually. Using FreeBSD ports, you would simply run pkg remove R. Using Debian packages, run apt remove r-base.

  • Caveman installs might overwrite files installed by other software. Package managers have safeguards that detect conflicts and prevent this from happening. To get around a conflict with FreeBSD ports, we can install from source to a different installation prefix.
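The conflict checks and clean uninstalls described in the list above come down to bookkeeping: a package manager knows exactly which files each package owns. The toy shell functions below are a hypothetical sketch of that idea, not how pkg or apt are actually implemented. The names toy_install and toy_remove are made up: toy_install copies a staged build into a prefix only if none of its files already exist and records them in a manifest, and toy_remove deletes exactly those files.

```shell
#!/bin/sh
# Minimal, HYPOTHETICAL sketch of two package-manager safeguards:
#   1. conflict detection: refuse to overwrite files that already exist;
#   2. a manifest: record every installed file so removal is exact.
# Real tools (pkg, dpkg) keep a full database with checksums, dependencies,
# and ownership. This toy assumes file names contain no whitespace or
# glob characters.

# toy_install STAGEDIR PREFIX MANIFEST
toy_install() {
    stage=$1 prefix=$2 manifest=$3
    # Relative paths of every regular file in the staged build.
    files=$(cd "$stage" && find . -type f | sed 's|^\./||') || return 1
    # First pass: check for conflicts before touching anything.
    for f in $files; do
        if [ -e "$prefix/$f" ]; then
            echo "conflict: $prefix/$f already exists" >&2
            return 1
        fi
    done
    # Second pass: copy each file and record it in the manifest.
    : > "$manifest"
    for f in $files; do
        mkdir -p "$prefix/$(dirname "$f")"
        cp "$stage/$f" "$prefix/$f"
        printf '%s\n' "$f" >> "$manifest"
    done
}

# toy_remove PREFIX MANIFEST
toy_remove() {
    prefix=$1 manifest=$2
    # Delete exactly the files listed in the manifest, nothing else.
    while IFS= read -r f; do
        rm -f "$prefix/$f"
    done < "$manifest"
    rm -f "$manifest"
}

# Usage example in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/stage/bin"
printf '#!/bin/sh\necho hello\n' > "$demo/stage/bin/hello"
toy_install "$demo/stage" "$demo/prefix" "$demo/hello.manifest"
cat "$demo/hello.manifest"              # bin/hello
toy_install "$demo/stage" "$demo/prefix" "$demo/again.manifest" ||
    echo "second install refused"
toy_remove "$demo/prefix" "$demo/hello.manifest"
rm -rf "$demo"
```

Checking for conflicts in a separate first pass means a refused install leaves the prefix untouched, rather than half-installed.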

What if There is No Package?

If there is no package for your software in the package manager you are using, there are likely better solutions than doing caveman installs for all your software.

  • Look into portable package managers such as Conda, pip, or pkgsrc. These can coexist with the native package manager for your system.

  • Switch to a different operating system. This might sound radical, but it's actually much easier and safer than a lifetime of caveman installs.

  • Learn to create your own packages. This will require an investment of time, but by becoming a package maintainer, you break your dependence on others for managing the software you need. You will forever have the power to cleanly install, remove, and upgrade the software you need.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions”, before doing the practice problems below.

  1. What are three advantages of FOSS (free and open source software)?

  2. What can the average computer user do with FOSS nowadays?

  3. What is a caveman installation and when should one be performed?

  4. What is a package manager?

  5. What is an advantage of source-based package managers such as FreeBSD ports, Gentoo Portage, MacPorts, and pkgsrc, over binary-only package managers such as Debian packages and Conda?

  6. What are some of the problems that package managers solve when compared with caveman installations?

  7. What can you do besides resort to caveman installations if your package manager doesn't have a package for your software?