Ronni Grapenthin - Notes

twitter: @rngrp
New Mexico Tech
Dept. of Earth &
Environmental Science
801 Leroy Place
Socorro, NM-87801

Teach your students how to make tools.

Published: 2015/02/05 (A version of this was originally published in Eos, Vol. 92, No. 50, 13 December 2011; as it is still relevant and Eos recently made new submissions open access, here it is again.)

When I announced my intention to pursue a Ph.D. in Geophysics, some people gave me confused looks because I was working on a Master's in Computer Science at the time. My friends, just like many incoming Geoscience graduate students, have trouble linking these two fields. From my perspective it is pretty straightforward: Much of Geoscience revolves around novel analyses of large data sets that require custom tools - computer programs - to minimize the drudgery of manual data handling; other disciplines share this characteristic. While most faculty adapted to the need for tool development quite naturally, as they grew up around terminal interfaces, incoming graduate students lack the intuitive understanding of programming concepts such as generalization and automation. I argue that the major cause is the intuitive graphical user interfaces of modern operating systems and applications, which isolate the user from all technical details. Generally, current curricula do not recognize this gap between user and machine. To operate effectively, students require specialized courses that teach them the tools they need to make tools that operate on particular data sets and solve their specific problems. Courses in Computer Science departments are aimed at a different audience and are of limited help.

In 2009, my adviser, Jeff Freymueller, and I began to experiment with a course on programming for Geoscience graduate students in our department at the University of Alaska Fairbanks. This emerged from a fortunate mix of people in one room: a graduate student in need; myself, already thinking about such a course; and very supportive and aware faculty. We have now gone through three iterations of this experiment. Our course goals are ambitious for a one-semester, 2-credit course. We learned a lot from our many mistakes, and I want to share some of our experiences and encourage other institutions to follow along. I will not mention any specific programming languages or tools in this article as these vary by discipline and department. The overarching points we believe such a course should touch on are as follows:

Repetitive work is for machines.

Students need to realize that a problem is worth solving once. Exactly once. Yet there are students manually laboring through identical procedures on a daily basis. We want them to understand that breaking down a complex problem into simple tasks, writing out the respective steps, testing them individually, and finally bundling them into one command is of great value and is time well invested. From this we advance to generalizing specific solutions such that their tool tackles an array of problems. Assume, for example, that you have a tool that analyzes a day's worth of data for one sensor. We want students to ask how this tool can be used to treat all available sensors on all days. Trying to think of such a configuration is a worthwhile, yet challenging exercise. The solution to questions like this is abstract and entirely free of code, but it establishes the fundamental concept of computers working while you are out for a nice afternoon run.
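Although the idea itself needs no code, a small sketch can make the step from "one sensor, one day" to "all sensors, all days" concrete. The following is purely illustrative and not taken from the course: Python is an arbitrary choice, and analyze(), date_range(), analyze_all(), and the sensor names are hypothetical stand-ins for whatever single-case tool a student already has.

```python
from datetime import date, timedelta
from itertools import product


def analyze(sensor, day):
    """Hypothetical stand-in for an existing tool that handles ONE sensor on ONE day.

    Here it only returns a label; a real tool would read and process data.
    """
    return f"{sensor}:{day:%Y%m%d}"


def date_range(first, last):
    """Yield every calendar day from first to last, inclusive."""
    for offset in range((last - first).days + 1):
        yield first + timedelta(days=offset)


def analyze_all(sensors, first, last):
    """Generalize the single-case tool: run it for every sensor on every day."""
    return [analyze(sensor, day)
            for sensor, day in product(sensors, date_range(first, last))]


if __name__ == "__main__":
    # Assumed sensor names and dates, chosen only for illustration.
    for result in analyze_all(["AB01", "AB02"], date(2012, 5, 22), date(2012, 5, 23)):
        print(result)
```

The single-case function stays untouched; the generalization lives entirely in the loop around it, which is what lets the machine work through the whole archive unattended.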

Understand fundamental principles.

No single programming language is the ultimate tool for all problems. Handing your students one tool to solve a specific task will be a great quick fix until a different kind of problem emerges, rendering this tool a poor fit. Exposing students to a small variety of programming languages and the fundamental principles that connect them loosens the tension a new syntax brings and also hands them abilities they crave. Comprehension of the concepts of variables, functions, and flow control gives students sufficient momentum and the ability to transition to whichever shiny new language comes around in the future. While Object-Oriented Programming certainly deserves consideration as it enables wonderful software design, it seems impossible to teach such advanced concepts well in a few lectures and labs, so we decided against including it.

Organize data consistently.

Data-related programming revolves around traversing directories, picking files, reading data, processing data, and writing out results. To have a computer operate effectively and keep coding efforts under control, a consistent naming scheme for files and directories is crucial. Imagine needing all available data for May 23rd, 2012. It's easy if all files carry the date in their name in a consistent format, say 20120523. Consistent data archives allow your program to find files in a minimal number of steps. Admittedly, this is pretty straightforward, but students are so accustomed to easily recognizing a multitude of date formats that they do not realize how hard it is for a machine to do so.
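As a minimal sketch of why this pays off, assume an archive in which every file name contains its date as YYYYMMDD. Finding all data for one day then takes a single pattern match. The function name files_for_day and the archive layout are assumptions made for illustration; Python is again an arbitrary choice.

```python
from datetime import date
from pathlib import Path


def files_for_day(archive, day):
    """Return all files below `archive` whose name contains `day` as YYYYMMDD.

    Assumes every file name carries its date in that one consistent format.
    """
    stamp = day.strftime("%Y%m%d")  # e.g. 20120523
    return sorted(path for path in Path(archive).rglob(f"*{stamp}*")
                  if path.is_file())


if __name__ == "__main__":
    # "data" is a hypothetical archive directory.
    for path in files_for_day("data", date(2012, 5, 23)):
        print(path)
```

With mixed conventions such as "May23_2012" and "23-05-12" in the same archive, this one-line search would turn into a collection of special cases, one per naming habit.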

Create legible, reproducible figures.

In many disciplines the figure is the ultimate conveyor of achievements, summarizing findings in an accessible way (we think). A lot of effort goes into figure creation. Yet this effort should not be repeated whenever new data come around. Once created, a figure is a solved problem. Hours wasted on recreating it indicate the use of the wrong tool. Similarly, illegible axis labels or poor color schemes should prompt everyone to at least wonder about a tool's capabilities and, if necessary, switch to a tool that offers the required level of freedom. Sadly, more often than not, this is not done. Conveying these thoughts is not unique to us; we join the choir of people like Edward Tufte and Jon Claerbout, calling for sensible and reproducible visualization of data.

The course has been well received by both students and faculty in our department. Several Biology students have taken the class in the last two iterations, which shows that the demand for the class extends beyond Geoscience. Apart from classic lecture settings, three core ideas are responsible for this success:

Provide guided practical application.

Probably the biggest mistakes we made were to assume too much prior knowledge and to provide too little individualized guidance. Basically, we assumed we were instructing experienced students, but in reality they are entering a new field and are beginners in this topic. Although banging your head against a wall is an integral part of computer programming, it is necessary to keep a healthy balance between frustration and gratification; this makes a controlled lab environment indispensable. It is of great help to demonstrate individually how to solve the mostly minor problems encountered when working through problem sets. Most of this knowledge seems so deeply engraved in the minds of experienced programmers that it appears natural. Conveying these techniques and simple concepts is critical, and impossible in a pure lecture setting.

Solve student-specific problems.

We assign projects that are ideally related to a student's thesis work so that they incorporate course concepts into their daily routine. Here, the key to success is heavy mentoring, which includes time-intensive code review. Given the diversity of student research this is hard, but it comes with the tremendous gratification of an engaging education that sticks with the student.

Demonstrate problem solving.

A last point that inspires significant progress is "live coding." I pick a simple problem and think it through, but write the actual program with the students in class. Naturally, this brings embarrassment and high entertainment potential. Between laughs, students break down complex problems into simpler tasks, learn to read error messages, see the value of search engines in debugging, and get a feeling for connecting the dots.

In general, as a result of the course our students make enormous strides in their programming skills and their confidence to take on problems that require them. We see them apply these techniques in their research and in other courses. This will allow instructors of other courses to transition to a state where they can assume basic knowledge of the subject and use computational exercises to teach concepts of Geoscience rather than programming. Other curriculum changes are planned that will depend on this. Our experience gives us confidence that our students will leave behind a trace of useful tools. Some already advance their community by making their work freely available; some consider publishing on their tools. This is surely more desirable than stacks of sticky notes, and hopefully fresh in your mind for coming curriculum changes.


I want to thank my adviser Jeff Freymueller for supporting me in the pursuit of this experiment and providing valuable feedback on the manuscript. Jamie Martin and Bernie Coakley also provided many helpful comments on early drafts of the article, for which I am grateful. The Department of Geology and Geophysics at UAF provided financial support to realize the course. I would also like to acknowledge the supportive and constructive comments by Eos Editor Christina Cohen and two anonymous reviewers, which improved the manuscript.

rg <at> nmt <dot> edu | Created: 2015/02/05 | Last modified: February 05 2015 17:19.