University of Alaska Fairbanks
Geophysical Institute

Beyond the Mouse 2010 - The geoscientist's computational chest.

Lab 9: Unix / Shell Introduction II

"Programming is legitimate and necessary academic endeavor."
Donald E. Knuth

Lab slides

none.

Note

As your solution, send me your scripts and the answers to the questions. No data files please; I have plenty of those :)

Running the VirtualBox

Check here if you forgot how that works. Really. Go there if you forgot something.

Exercise 0: Permanently changing your Path and stuff

To make your shell scripts accessible on all machines, they should go into a special directory on the shared drive. This drive is N:\ on the Windows machines and is mounted at /home/btm_user/N_DRIVE inside the VirtualBox. Since there is only one user, btm_user, for the VirtualBox but many Windows users who will use this login, everybody has to use the same directory name for the scripts.

Go to /home/btm_user/N_DRIVE and create a directory btm_unix_scripts. Check here to see how to do this.

Now that you all have this directory, you will have to edit the .tcshrc file in your home directory. This is the "Run Command" (rc) file for the tcsh shell. It is executed every time you log into a shell or open a new terminal window or subshell, so all environment variables, aliases, etc. defined in it will be available in any shell session you start on this system. Here is a brief description of things that happen during the login process (for a shell). As you can see, this file makes it easy to configure your working environment. If it does not exist, you will have to create it. (The leading dot is important: it is part of the filename and 'hides' the file in normal ls listings. This is generally used for configuration files and directories that have to be in your home directory but that you don't have much business messing around with, or so the developers thought. To see all the stuff that's in your directory, try ls -lisa. The options l, i, s, a are explained in the man page of ls.)

After all this talk, here's what to do (assuming you created /home/btm_user/N_DRIVE/btm_unix_scripts):
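A minimal sketch of lines that could go into ~/.tcshrc. It assumes that the variable BTM_BIN (the name used later in this lab) should point at the script directory created above; treat it as an illustration, not as the official step list.

```shell
# ~/.tcshrc fragment (tcsh syntax; this will not work in sh or bash).
# Assumption: BTM_BIN holds the location of the shared script directory.
setenv BTM_BIN /home/btm_user/N_DRIVE/btm_unix_scripts

# Append the script directory to the command search path so that
# scripts stored there can be called from any working directory.
set path = ( $path $BTM_BIN )
```

After saving the file, open a new terminal window and check the result with echo $BTM_BIN and echo $path.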

Exercise 1: Exercise 1 of last week's lab, altered by a little bit -- Writing a Shell script (Commented solution)

In this directory you will find GPS data for a certain day. That's not essential. The key is that there are many, many files. Some of which are gzipped, others are duplicates: gzipped and unzipped. What I want you to do now is find all the duplicates and rename the unzipped files to all upper case:

While developing this script you might wanna test it. To test it you will have to make it an executable:
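A short sketch of setting the execute permission with chmod. The directory chmod_demo and the script name myscript are hypothetical stand-ins for your own script; the commands behave the same in tcsh and sh.

```shell
# Create a tiny stand-in script (hypothetical name) in a scratch directory.
mkdir -p chmod_demo
echo 'echo hello from my script' > chmod_demo/myscript

# Add execute permission for the owner of the file (u+x).
chmod u+x chmod_demo/myscript

# The permission string in the long listing now contains an x for the owner.
ls -l chmod_demo/myscript

# Run the script by giving its path.
./chmod_demo/myscript
```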

You will have to repeat this for any script you want to be executed on the command line. Otherwise you'll get a "could not find ..." response.

The testing happens in ~/lab08! Open a new terminal window (to refresh the path contents with your new executable), go to ~/lab08, and execute whatever you called your script. And yes, I will do exactly that and expect your script to work, no matter where it is stored.

Now the funky part: rename the file such that all lower-case letters become upper case. You will find almost a full solution at this website. You will have to convert from lower case to upper case, since they convert from upper case to lower case. I find this task challenging and yet rewarding enough that giving the solution away is fine with me. However, you still have to find the correct line on the website, copy it correctly into your script, modify it, and explain to me what this line does (use the man pages and the Internet to find answers).
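To get a feel for the building blocks, here is a sketch that upper-cases one file name with tr and backticks. It is one possible approach and not necessarily the line from the website; the directory tr_demo and the file name abcd0010.10qm are made up for illustration. The commands behave the same in tcsh and sh.

```shell
# Scratch directory with one hypothetical lower-case file name.
mkdir -p tr_demo
touch tr_demo/abcd0010.10qm

# tr replaces each character of the first set with the character
# at the same position in the second set.
echo abcd0010.10qm | tr 'a-z' 'A-Z'

# Backticks insert the output of a command into the command line,
# so the translated name can serve as the target of mv.
mv tr_demo/abcd0010.10qm tr_demo/`echo abcd0010.10qm | tr 'a-z' 'A-Z'`

# Only the upper-case name is left.
ls tr_demo
```

Your script still has to apply this idea to every duplicate file, not to a single hard-coded name.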

As a guideline: my neatly formatted, yet uncommented solution script is 8 lines long.

Once you're done, try:
> ls *QM | wc -l
The result should be 944.

You've just changed the name of 944 files. Given the boredom caused by doing the actual conversion by hand and the number of files, writing the script, testing, failing, fixing, testing, succeeding was still a lot faster.

Now you could go ahead and remove all upper-case files using rm ./*.QM. Renaming first is a safe way to mark files and, hopefully, avoid deleting precious data accidentally (good thing you have backups). The result should be:
> ls *qm
ls: No match.
Good! All the unnecessary stuff has been removed.

To be fair though, in real life you would simply call: gzip *qm and let gzip complain about existing files. But the point of this exercise was to introduce you to a few unix tools, get you to do some scripting and do a simple task on many, many files. I hope this objective was accomplished.

Exercise 2: Data Handling with awk (Commented solution, Commented log file of some runs of the code)

Hopefully you remember Exercise 2 of Lab 05, where we had you fiddle with pesky formatting strings to extract some data from a file that contains a lot more data. Now we'll go back to the FAIR.pfiles text file and treat it with Unix tools to extract the information we want.
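As a rough illustration of the kind of extraction awk makes easy, here is a sketch on a made-up sample file. The layout of FAIR.pfiles is not reproduced in this handout, so the file name sample.txt, its contents, and the column numbers are placeholders, not the real format. The commands behave the same in tcsh and sh.

```shell
# Made-up sample data: a label followed by three numeric columns.
mkdir -p awk_demo
echo 'A 10.5 20.5 30.5' >  awk_demo/sample.txt
echo 'B 11.5 21.5 31.5' >> awk_demo/sample.txt
echo 'A 12.5 22.5 32.5' >> awk_demo/sample.txt

# awk splits every line into whitespace-separated fields $1, $2, ...
# The pattern before the braces selects the lines whose first field is "A";
# the action inside the braces prints fields 2 to 4 of those lines.
awk '$1 == "A" {print $2, $3, $4}' awk_demo/sample.txt
```

The single quotes keep the shell from interpreting $1, $2, ... itself, so awk sees them unchanged.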

Now that you know how to do those two key actions, create a new tcsh script pfiles2llh in $BTM_BIN that generalizes this for any .pfiles file it is handed as a command-line argument. The format for executing this script at the command line should be like this:

> pfiles2llh STATION_NAME.pfiles

Command-line arguments can be passed to a script in various ways. One is to use the built-in variables $0, $1, ..., $N. Inside your script, $0 is the name of the program that has been called, and $1, $2, ..., $N are its first, second, ..., N-th arguments. This convention is generally used when you expect a few arguments to be handed to the script in a certain order. Next week, we'll do something more fancy. Here is an interesting article that tells you how to find the maximum number of arguments for a shell command.
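A small sketch of how $0 and $1 behave. The directory args_demo and the script name show_args are hypothetical; the demo script is run through sh here to keep the example short, but $0 and $1 are read the same way inside a tcsh script.

```shell
# Write a two-line demo script. The single quotes keep $0 and $1
# from being expanded while the file is being created.
mkdir -p args_demo
echo 'echo "program name: $0"'   >  args_demo/show_args
echo 'echo "first argument: $1"' >> args_demo/show_args

# Call the script with one argument.
sh args_demo/show_args FAIR.pfiles
```

The call prints the script path for $0 and FAIR.pfiles for $1.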

Here's what your script is expected to do:

ronni <at> gi <dot> alaska <dot> edu | last changed: December 27, 2010