4. Motivation and Goals

4.1. Why Are We Here?

The difference between what is being accomplished in scientific computing today and what could be accomplished, using existing inexpensive and free tools, is staggering. Many researchers spend months struggling to do simple computational analyses that could be done in a few hours with the right tools and knowledge. Worse yet, they often give up, leaving potentially life-saving research unfinished. The entire reason for all of this unrealized potential is a simple lack of proper education.

A major problem in scientific computing is people who know just enough to be dangerous. Many computational scientists know a little Unix, a little scripting, a little Python, and a little C or C++. This leads to badly designed programs and scripts that waste computing resources and make results difficult or impossible to reproduce, even though reproducibility is a cornerstone of all science.

This book and this course are here to address these issues by showing how to use many of the amazing tools available for free, manage the software you need with minimal effort, and write software of your own as efficiently as possible. We will focus on depth, not breadth, so that you will learn how to do things well. The hope is that you will then carry these good habits forward as you develop more breadth of knowledge. Perhaps someday you will teach others as well, so that the benefits of this knowledge will continue to spread far and wide.

4.2. Damn it Jim, I'm a Scientist, not a Systems Programmer...

Why should researchers learn about computing? Because nobody can do it for them. Some researchers cling to the dream of hiring computer staff to handle these things, so they won't have to learn and do it themselves. This is unrealistic for several reasons:

  • There aren't enough computer experts around to fill even a small fraction of the needs in scientific research.

  • Even if the talent existed, we couldn't afford it. Most computer professionals with the necessary skills are earning more than typical PIs (principal investigators) doing scientific research will ever earn.

  • Computer staff would need to be trained in your field of research before they can do anything more than manage computers for you. Writing software to conduct computational research requires a level of expertise in the domain typically only achieved by experienced researchers.

The bottom line is, if you don't have the computing skills to do your own computational analysis, your research will likely be severely delayed or stalled entirely.

There is an enormous gap between what many researchers want and what is feasible. Many researchers wish for easier ways to do their research computing, including software with a graphical user interface or web-based interface. Convenient point-and-click interfaces help people avoid learning the things they fear such as the Unix command line, scripting, and programming. However, the manpower to build and maintain these convenient user interfaces does not exist, and never will. Building and maintaining graphical user interfaces is astronomically expensive, and the research community has only a tiny fraction of the resources necessary to pay for it. The interfaces that scientists attempt to develop tend to be limiting and unreliable.

Avoidance is counterproductive and futile. It not only delays the inevitable need to learn other approaches, it also wastes resources attempting to create more convenient interfaces that will likely never be finished or maintained. If you do manage to find a convenient user interface that works for you, chances are it will be abandoned soon, which means your research will not be reproducible. Convenient scientific software is often written, with the best of intentions, by students as part of a thesis. In most cases, work on such software ceases entirely after they graduate. More rarely, a lab might get funding to develop and maintain a software package for a longer term, but discover that they can't find time or staff to do the work.

Generic tools, such as the Unix command line and scripting languages, are widely used by a much broader audience than just the scientific research community. As a result, they are better maintained, more reliable, and better supported. By using them, we leverage the vast resources of other industries much wealthier than we are. Learning to use these generic tools now will be much easier than resisting and giving in to the inevitable later.

I spent nearly twenty years working in understaffed research computing support environments. During that time, it became clear that struggling to hire and retain the support staff needed to assist the many researchers across campus was a hopeless cause. The only solution to the support problem is to make researchers more self-sufficient. The required skills are not difficult to learn. A small investment of time will produce a huge payoff for your research output.

4.3. Solutions Looking for Problems

Beware the tendency to confuse tools with fields of study. Many professionals deliberately study specific technologies for their own sake, such as virtualization, containers, machine learning, specific languages and operating systems, etc. It is often necessary to focus on the technologies while learning them, but doing so is not such a good idea when applying them. For example, using machine learning to optimize the organization of your sock drawer might be useful as a learning exercise, but would be foolish as a real-world solution.

Problems arise when we try to apply the tools we have invested in learning to solve every new problem we encounter. This is a backward approach to problem-solving that usually leads to a suboptimal solution at best. It is unfortunately a common mistake, however. When all you have is a hammer, everything looks like a nail. Shoe-horning problems into a solution that you think is "cool" or one that you already know generally leads to wasted effort and computing resources. Unfortunately, the world is full of techno-geeks selling over-complicated solutions to simple problems that could be handled much more cost-effectively.

To find the best solution for a problem, we must look for the best solution for the problem. Sound obvious? It is. Nevertheless, many people don't think this way and instead look for ways to solve it with their favorite tools. Finding the optimal solution means examining the problem with an open mind and exploring solutions that we don't know as well as the ones we do. A willingness to learn new things every time you take on a new project is the difference between a great engineer or scientist and a mediocre one.

4.4. What You Will Learn

The overall goal of this user guide is to provide all the knowledge needed for researchers to get started with research computing. If you're a researcher, you will learn to be self-sufficient so that your computational analysis can move forward in the absence of I.T. staff to help you. If you are one of the few I.T. personnel working in scientific computing, you will learn to use your time as efficiently as possible, so that you can effectively serve the researchers who vastly outnumber you.

Different users have different needs, and most will not need to read all of the chapters of this guide. The guide is divided into four parts, each of which focuses on the needs of a typical type of researcher. You may only need the knowledge presented in one or two parts, or you may need it all!

Some of the most commonly needed knowledge covered here:

  • How computers are used in scientific research

  • How to find (or build) and use available computing resources

  • How to use Unix-compatible operating system environments, including BSD, Cygwin, Linux, and macOS (formerly Mac OS X)

  • How to write portable shell scripts to automate the execution of your research tools on any Unix-compatible operating system

  • The types of parallel computing available today

  • How to schedule typical jobs on clusters and grids

  • Where to find more detailed information on all of the above

This document and other information can be found at:

https://acadix.biz/publications.php