Chapter 2. Where Do I Get the Software?

Most researchers today don't need to do much programming beyond some scripting to automate analyses. A vast number of commercial and free software applications are available to handle most of the computational needs of researchers, and more are being developed all the time.

The need for researchers who can write basic scripts to automate processing pipelines continues to grow. Existing software may do all or most of what you need, but not always conveniently. You will likely need to run multiple programs in sequence to generate the results you need. This sequence is called a pipeline and can often be automated with a simple script. Most computational researchers should set out immediately to learn Unix and scripting, as described in Chapter 3, Using Unix and Chapter 4, Unix Shell Scripting, but learning hard-core programming will be a lower priority for most.
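To make the idea of a pipeline concrete, here is a minimal sketch in Python of running programs in sequence, with each program's output fed to the next. It assumes a Unix system, and the two commands, sort and uniq -c, are stand-ins chosen only for illustration; a real pipeline would run your own analysis programs. The function name run_pipeline and the sample input are hypothetical.

```python
import subprocess


def run_pipeline(steps, data):
    """Run each command in order, passing one step's output to the next."""
    for command in steps:
        # check=True stops the pipeline if any program exits with an error.
        result = subprocess.run(
            command, input=data, capture_output=True, text=True, check=True
        )
        data = result.stdout
    return data


# Stand-in programs (assumed to be on the PATH of a Unix system):
# sort the input lines, then count how often each distinct line occurs.
steps = [["sort"], ["uniq", "-c"]]

if __name__ == "__main__":
    print(run_pipeline(steps, "chr2\nchr1\nchr2\n"), end="")
```

The same sequence could be written even more briefly as a Unix shell script, which is one reason shell scripting is usually the first skill to acquire. Keeping the steps in a list makes it easy to add, remove, or reorder programs as the analysis changes.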

In many cases, however, solid programming skills could turn out to be a major advantage in the race for research grants.

Unfortunately, most scientific software is of very low quality. While there are many well-organized projects around, much of the software is developed as part of someone's thesis and then abandoned after they graduate. The ability to improve or replace low-quality existing software can remove major barriers to your research.

Also, research by definition involves doing things that have never been done before, and this may require writing new software to perform novel analyses.

Hiring someone else to do the programming is not feasible for most researchers. Experienced scientific programmers are very rare, and likely have higher salaries than you do, so you probably can't afford one even if you can find one. You might find a student to work with you cheaply or for free (for credit), but most likely they'll leave you with badly written, unmaintainable code that the next programmer won't be able to work with.

The only sustainable solution for most researchers who need code written is to do it themselves. The question, then, is which of the dozens of popular programming languages should you learn? This topic is covered in detail in Part III, “High Performance Programming”.

For now, suffice it to say that you should become adept at Unix shell scripting, one purely compiled language such as C, in which programs may run hundreds of times faster than in scripting languages, and perhaps another interpreted language such as Perl, Python, or R, whichever is most useful in your field. This topic is discussed in more depth below.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.

What are the three ways to obtain research software?
What kind of programming do most researchers need to learn? Why?
Why is it a good idea for researchers to learn how to program beyond simple scripting?
What is the benefit of learning a compiled language?