The table below represents the timeline of a computational science project.
Table 39.1. Computation Time Line
Development Time | Deployment Time | Learning Time | Run Time |
---|---|---|---|
Hours to years | Hours to months (or never) | Hours to weeks | Hours to months |
Not relevant to most researchers.
Learn software development life cycle, efficient coding and testing techniques.
Understand objective language factors: compiled vs. interpreted speed, portability, etc.
Deployment time is virtually eliminated by package managers, described in the section called “Package Managers”.
Software efficiency (algorithms, language selection) should always be the first focus. Often software can be made to run many times faster simply by changing the inputs. Is the resolution of your fluid model higher than you really need? Are you analyzing garbage data along with the useful data? Is your algorithm implemented in an interpreted language such as MATLAB, Perl, or Python? If so, it might run as much as 100 times faster if rewritten in a compiled language such as C, C++, or Fortran. See the section called “Language Selection”.
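As a rough illustration of how much the inputs alone can matter, the sketch below counts the cells in a uniform grid at two resolutions. It assumes a 3-D grid with the same number of cells along every axis; the function name `grid_cells` and the resolutions 512 and 256 are hypothetical, chosen only for the example. Actual run time depends on the solver, so the cell count is only a proxy for the work involved.

```python
def grid_cells(cells_per_axis: int, dimensions: int = 3) -> int:
    """Number of cells in a uniform grid with equal resolution on every axis."""
    return cells_per_axis ** dimensions


# Hypothetical resolutions, for illustration only.
fine = grid_cells(512)
coarse = grid_cells(256)

print(f"fine grid:   {fine:,} cells")
print(f"coarse grid: {coarse:,} cells")
print(f"halving the resolution leaves {fine // coarse}x fewer cells to update")
```

Halving the resolution along each axis of a 3-D grid cuts the cell count by a factor of 8, before any change to the code or the hardware.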
System reliability: system crashes cause major setbacks, especially where checkpointing is not used. Reliability depends on factors such as the operating system (FreeBSD, Enterprise Linux) and the use of a UPS (uninterruptible power supply).
Some scientific analyses, such as single-threaded FSL runs, take a month or more to complete. An average uptime of 1 month is not good enough.
From the researcher's perspective, this may mean restarting simulations or analyses, losing weeks' worth of work if checkpointing is not possible.
From the sysadmin's perspective, if you manage 30 machines with an average uptime of 1 month, you average 1 system crash per day.
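The arithmetic behind these figures can be sketched as follows. The crash rate is simply the number of machines divided by the mean uptime, taking a month as 30 days. The completion probability is an added assumption, not something stated in the text: it models crashes as random events with a constant rate (an exponential model), which real systems follow only approximately. Both function names are hypothetical.

```python
import math


def fleet_crashes_per_day(machines: int, mean_uptime_days: float) -> float:
    """Expected crashes per day across a fleet of independent machines."""
    return machines / mean_uptime_days


def chance_of_finishing(job_days: float, mean_uptime_days: float) -> float:
    """Probability that one machine runs job_days without crashing.

    ASSUMPTION: crashes occur at a constant random rate (exponential model).
    """
    return math.exp(-job_days / mean_uptime_days)


# 30 machines, each averaging 30 days between crashes.
print(f"crashes per day: {fleet_crashes_per_day(30, 30.0):.1f}")

# A 30-day job on a machine that averages 30 days between crashes.
print(f"chance of finishing: {chance_of_finishing(30.0, 30.0):.0%}")
```

Under that assumption, a month-long job on a machine with a one-month average uptime finishes uninterrupted only about 37% of the time.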
Some sites choose scheduled reboots to maximize the likelihood of completing jobs. It is better to do your homework and find an operating system with longer uptimes.
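Where the software can support it, checkpointing limits the damage of a crash to the work done since the last checkpoint. The following is a minimal sketch, not a definitive implementation: the names `load_state`, `save_state`, and `run`, the JSON file format, and the `crash_at` parameter (which only simulates a crash for demonstration) are all hypothetical, and the loop body is a stand-in for real computation.

```python
import json
import os
import tempfile


def load_state(path):
    """Resume from the checkpoint file if it exists, else start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0.0}


def save_state(state, path):
    """Write to a temporary file, then rename it over the old checkpoint,
    so a crash during the write cannot corrupt the previous checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(n_steps, path, every=100, crash_at=None):
    """Run n_steps of work, saving a checkpoint every `every` steps.

    crash_at is hypothetical: it simulates a system crash for the demo.
    """
    state = load_state(path)
    while state["step"] < n_steps:
        state["total"] += state["step"] * 0.5  # stand-in for real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_state(state, path)
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
    save_state(state, path)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "checkpoint.json")
        try:
            run(1000, checkpoint, crash_at=250)
        except RuntimeError:
            print("crashed; last checkpoint at step", load_state(checkpoint)["step"])
        print("finished at step", run(1000, checkpoint)["step"])
```

In this sketch, the second call to `run` resumes from the last saved step instead of step 0, so the crash costs only the steps since the last checkpoint. The checkpoint interval is a trade-off: more frequent checkpoints lose less work per crash but spend more time writing to disk.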
Parallelism is expensive in terms of both hardware and learning curve. It should be considered a last resort after attempting to improve software performance.