Ronni Grapenthin - Notes
New Mexico Tech
Dept. of Earth &
801 Leroy Place
Grad school puts a new level of strain on students compared to the good old undergraduate times. In between taking classes, figuring out research targets, and outlining a career (never too early), you should not forget to convert the code / scripts you'll be writing over the coming 2-N years into tools that are applicable to future problems. These problems could pop up in chapters 2 and 3 of your thesis/dissertation or during your post-doc / whatever comes after. You should also put some time and effort into understanding software you didn't write, at a level that enables you to install and run it at your new job. Your tools should become transportable and accompany you to future endeavors; much like a pirate's parrot.
This post is a natural extension of my previous call to teach students how to make tools. Here, I go into how students should approach the (at first) intimidating task of using existing research-style software and writing their own.
The central questions are:
Research software continues, for the most part, to live far outside of traditional software development wisdom. When I transitioned from computer science into geophysics, I was exposed to some FORTRAN code and couldn't believe what a desolate condition software development was in. Fast-forward 10 years and I've learned that code that answers a specific question operates under a different set of rules. Often someone develops a theory and provides a well-tested FORTRAN code along with the journal article, and people will keep using this until a better theory accompanied by code comes along. Note that theories without accompanying code have a harder time establishing themselves! The reason? It is very expensive to sit down and (re)write someone's code, or transform math into software.
There's no point in rewriting if you can dig up a compiler that translates it and somehow get it to work. Particularly since academic hero-code often comes without comments and with creative variable names. Neither boosts confidence when touching the code. And so our virtual labs turn into duct-taped scaffolding that may very well collapse with the next compiler or OS upgrade. Once out of the protective shelter that grad school is (I know, it doesn't feel like it; enjoy it anyway) and drowning in (hopefully?) a tenure-track position, plenty of other tasks compete for your attention. Withdrawal to your coding lair will be a rare treat.
One of the worst things that can happen at that point is to lose access to the analysis tools you used when working on your dissertation. This can happen quite easily: your school provides proprietary / licensed software, which you depend on like addicts on their meth. Once you leave, you may face thousands of dollars in license fees, or you have to downgrade to a partial version of the software, or you may as well have dropped your bucket of tools in a glacial stream. At least your source code is still yours; maybe there's a cross-compiler to another, free language, or a free interpreter/compiler that runs directly on your code. Else, you're out of luck and you get to translate it yourself, manually. (How about creating a cross-compiler and making it available for others to use?)
This gets at the first critical question you should ask when it comes to tool choices (see list above): pick a development environment that you can carry home when you leave, along with your degree. Sure, the lab you're working in may set tight constraints (policy, highly specialized commercial software package, ...). If so, operate as freely within these constraints as you can. More than likely, your adviser will not care too much about the minute details of your if-statements and will instead debug your code with you via figures that illustrate the tests you've run. So choose whatever is best for the future you.
The second, no less important, point lies with generalization and concurrent organization of your tools:
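As an illustration only, here is a minimal sketch of what generalizing a one-off script might look like. Python is an arbitrary choice (the post does not prescribe a language), and the task (removing a linear trend from a time series), the function names `fit_line`, `detrend`, `read_two_columns`, and the two-column file layout are all hypothetical. The idea is simply that file names and data become arguments instead of being hard-coded, so the same code can be reused on the next problem.

```python
"""Hypothetical example: a one-off detrending script reshaped into reusable functions."""
import argparse


def fit_line(t, y):
    """Least-squares fit y ~ slope * t + intercept; returns (slope, intercept)."""
    n = len(t)
    if n < 2 or n != len(y):
        raise ValueError("need at least two samples and equal-length inputs")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("all time stamps are identical")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def detrend(t, y):
    """Return y with the best-fit linear trend removed."""
    slope, intercept = fit_line(t, y)
    return [yi - (slope * ti + intercept) for ti, yi in zip(t, y)]


def read_two_columns(path):
    """Read whitespace-separated 'time value' rows; skip blank lines and '#' comments."""
    t, y = [], []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            a, b = line.split()[:2]
            t.append(float(a))
            y.append(float(b))
    return t, y


def main(argv=None):
    """Command-line entry point: the input file is an argument, not a hard-coded path."""
    parser = argparse.ArgumentParser(description="Remove a linear trend from a time series.")
    parser.add_argument("infile", help="two-column text file: time value")
    args = parser.parse_args(argv)
    t, y = read_two_columns(args.infile)
    for ti, ri in zip(t, detrend(t, y)):
        print(f"{ti} {ri:.6f}")

# To run this from a shell, add the usual guard:
#     if __name__ == "__main__":
#         main()
```

With the logic in functions, a later analysis can import `detrend` directly instead of copying and editing the old script, and `main` keeps a command-line entry point for quick use.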
Once you've generalized and organized your tools, you're fairly well set. Superstars go one step further and place their (academic) codes into publicly accessible repositories. Why? Because you may have solved a problem others encounter, and you ultimately make their lives easier: they can use your tool to find solutions rather than wasting time, energy, and resources reinventing the wheel. Such reinvention tends to be particularly infuriating in academia, where the math is usually communicated through papers and the reader is left to write the code on their own.
If you don't want to deal with code hosting yourself, GitHub is one service that does it for you, for free; Bitbucket is another. GitHub, in collaboration with Zenodo, provides a mechanism to attach DOIs to your code. That makes it citable; the ultimate motivation for an academic, I guess. Bitbucket allows you to keep your repositories non-public without payment. I use both for different purposes.
Now you're a little more ready for grad school.
rg <at> nmt <dot> edu | Created: 2015/09/03 | Last modified: September 04 2015 04:18.