University of Alaska Fairbanks
Geophysical Institute

Beyond the Mouse 2010 - The geoscientist's computational chest.

Lab 7: Unix / Shell Introduction

"Programming is legitimate and necessary academic endeavor."
Donald E. Knuth

Lab slides

last slides from lecture

Running the VirtualBox (Read this before doing anything else!)

I installed VirtualBox on all the machines in the lab. VirtualBox lets you install many different guest operating systems inside one host operating system. For us the host is, big surprise, Windows XP, into which I installed a Linux distribution, namely Ubuntu. I chose Ubuntu because it's rather user friendly and it's also what I run on my laptop these days (read: familiar). The installation routine is rather hassle free and a lot of effort is put into supporting modern hardware. This, of course, comes with drawbacks I am not going to debate here. You can read through one of the comparisons of the most widely used Linux distributions if you're interested and wondering what you would choose.

How do you get your Ubuntu started? Double-click on the "Oracle VM VirtualBox" icon on the desktop, select "Ubuntu10.10" and click "start". The startup takes a while: an actual operating system is being loaded. Ubuntu doesn't know a thing about the Windows it's running in; poor thing. Do NOT run any of the updates!

At some point, after the system is initialized (which may take a while because everything being virtual slows things down a bit), you will see a desktop with a task bar and all that stuff. Welcome to your working environment for the next labs (this desktop environment is called GNOME). Here are a few useful shortcuts:

I recommend working in fullscreen mode; but that's just me.

You will be automatically logged in as user btm_user. This is a local user which exists on all the computers in the lab. In your home directory (/home/btm_user/) you will find a directory N_DRIVE; this is your personal N:\ on nugget from the Windows world. I mounted this as a shared drive so that you have a place to save your data/scripts - use it! Everything you save to /home/btm_user/N_DRIVE (case sensitive!) will be available on all machines. Everything else is local to the GEOS-XX machine you're working on. So save your scripts (NOT THE DATA!) to N_DRIVE or your personal flash drive (assuming USB works; untested). If there's any software you want to install, you can do that locally in /home/btm_user only. If it's a reasonably useful application (read: useful to everybody), let me know and I'll sudo it on the machines.

This distribution comes with quite a few applications already; check out the applications menu, or go to /usr/bin. The text editor we will use is jedit (press Alt+F2, type jedit, and press Enter). Please use this editor, as other editors can have issues with writing to N_DRIVE! I added a few other things including MATLAB, GMT, a LaTeX compiler, and several other standard development tools, which we'll probably not use, though. Feel free to poke around; one objective of these labs is indeed getting you acquainted with a (maybe) unfamiliar operating system (a crusade in disguise: one can do the virtualbox thing also the other way around -- Windows inside a Linux ;) ).

In the upper task bar, you'll find a shortcut to the gnome-terminal application (you could also press Alt+F2, type gnome-terminal, and press Enter to open such a window). Another popular terminal application is xterm. Whichever you choose, your shell will be tcsh. Yes, I want you to open that window now ...

You will be greeted by a prompt: GEOS-XX:~> Let's look at what this means ('>' marks the prompt):

Exercise 0: Warm-up with some shell tricks.

Type these commands and follow the output on the screen. I want you to get to know this world a bit better:

Sweet. A useful thing to know is that with the up and down arrows you can browse through the history of your commands. Quite handy if you just typed a very long command and you want to do something quite similar again (well, in that case it might be time for some shell scripting).

Now for some fun with the PATH and other variables. Be sure to do exactly what is listed. Answer questions where I ask.

You can learn more about other environment variables by using env | more.
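
As a small sketch of poking at the environment (these are illustrative commands, not the original exercise listing), the lines below print the search path one directory per line and pick a single variable out of the env output. They only use plain commands and pipes, so they should behave the same typed at the tcsh prompt or run by sh.

```shell
# Show the search path, one directory per line (tr swaps each ':' for a newline).
echo "$PATH" | tr ':' '\n'

# Pick one variable out of the full environment listing.
env | grep '^PATH='

# Count how many directories are on the search path.
echo "$PATH" | tr ':' '\n' | wc -l
```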

Exercise 1: Commands and Piping

Below we give a list of unix commands which we find useful and you'll get to know this week. Some of these commands work on files / directories. Download this to your home directory and unpack it using tar xfz lab07_ex1.tar.gz. Now cd into the newly created directory lab07_ex1.

Each of the commands given in the table below does something and creates output that can be piped into the others (multiple pipes are perfectly fine). The syntax of Unix commands given in the man pages is generally something like command [options] file(s). This means you write the command name on the command line, you get to choose whether you want to use any of the options the command offers (square brackets usually indicate that things can be left out), and then you operate on one or more files. Here are the commands:

command    useful options                    explanation
ls         -l, -s, -t, -r                    list files in current directory
wc         -l, -c, -w                        count lines, characters (bytes), words
head       -### (### represents a number)    output first part of a file
tail       -### (### represents a number)    output last part of a file
diff                                         compare files line by line
sort       -n, -r, -k                        sort lines of text files
history    none                              list history of commands
df / du    -k, -h                            show available disk space / space used by files
cat                                          display / concatenate files
top        -n                                show process statistics; for piping use top -n 1, as it keeps running otherwise

Here are a few examples that can be run in lab07_ex1, which should illustrate how these commands work:
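
The following is a minimal sketch of how some of the commands from the table combine with pipes. It does not rely on the contents of lab07_ex1; instead it builds its own sandbox, and the directory /tmp/btm_demo_pipes and the two file names are made up for illustration. head -n 2 is the spelled-out form of the -### option from the table.

```shell
# Set up a small sandbox (hypothetical directory and file names).
mkdir -p /tmp/btm_demo_pipes
cd /tmp/btm_demo_pipes
printf '10\n9\n100\n2\n' > numbers.txt
printf 'alpha\nbeta\ngamma\n' > words.txt

# Count the lines in each file.
wc -l numbers.txt words.txt

# Sort numerically (-n) in reverse order (-r) and keep the first two lines.
sort -n -r numbers.txt | head -n 2

# Concatenate both files and show only the last line.
cat numbers.txt words.txt | tail -n 1

# Long listing (-l), ordered by modification time (-t), newest first.
ls -l -t
```

Try the same pipes on the files in the example directory, and swap options in and out to see how the output changes.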

The first thing you will do is play with these commands and their command line options. If you need any files to work on, explore the example directory we gave you, maybe with the example commands listed above. You should go to the man pages and read up on the options we've highlighted in the table if you want to know what they actually mean (> man COMMAND). You will send to us 1) three commands that may include command line options of your choice (please tell us from which directory you invoke the command) 2) an explanation of what the command does, and 3) the output of each command.

Now it's time to step it up a bit: from the list of commands use as many as you like and pipe the output into other commands to create 5 new commands that do something you find useful. Redirect the final output of each command to a new file. Send in: (1) the command, (2) a description of the command, (3) a file with the redirected output for each command.
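
To give an idea of the shape such a command can take, here is a sketch that chains three commands and redirects the result to a file. The sandbox directory and file names are hypothetical, and this is only one possible pipeline, not a model answer.

```shell
# Set up a small sandbox (hypothetical directory and file names).
mkdir -p /tmp/btm_demo_redirect
cd /tmp/btm_demo_redirect
printf '10\n9\n100\n2\n57\n' > numbers.txt

# Sort largest first, keep the top three, then keep the last of those:
# that is the third-largest value. The final output goes to a new file.
sort -n -r numbers.txt | head -n 3 | tail -n 1 > third_largest.txt

# Look at what ended up in the file.
cat third_largest.txt
```

One thing to keep in mind: the shell creates the output file before the pipeline starts, so a command such as ls that is redirected into the directory it lists will also see its own output file.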

Exercise 2a: Permanently changing your Path and stuff

So that your shell scripts are accessible on all machines, they should go into a special directory on the shared drive: N:\ on the Windows machines, which is mounted to /home/btm_user/N_DRIVE inside the VirtualBox. Since there is only one user btm_user for the VirtualBox, but many Windows users that will use this login, everybody will have to use the same directory name for the scripts.

Go to /home/btm_user/N_DRIVE and create a directory btm_unix_scripts. Check here to see how to do this.

Now that you all have this directory you will have to edit the .tcshrc file. This is the "Run Command" (rc) file for the tcsh shell. It is executed every time you log into a shell or open a new terminal window or subshell. All environment variables, aliases, etc. defined in it will therefore be available in any shell session you start on this system. Here is a brief description of things that happen during the login process (for a shell). You might see that you can easily configure your working environment using this file. If it does not exist, you will have to create it. (The leading dot is important; it's part of the filename and 'hides' the file in normal ls listings. This is generally used for configuration files and directories that have to be in your home directory, but that you don't have much business messing around with (or so the developer thought). To see all the stuff that's in your directory, try ls -lisa. The options l, i, s, a are explained in the man pages of ls.)
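
The effect of the leading dot is easy to see in a throwaway directory. This is a small sketch; the directory /tmp/btm_demo_hidden and both file names are made up for illustration.

```shell
# Set up a sandbox with one normal file and one dot file (hypothetical names).
mkdir -p /tmp/btm_demo_hidden
cd /tmp/btm_demo_hidden
touch visible.txt .hidden_config

# Plain ls skips names that start with a dot.
ls

# -a lists everything, including . (this directory) and .. (its parent).
ls -a
```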

After all this talk, here's what to do (assuming you created /home/btm_user/N_DRIVE/btm_unix_scripts):
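
The original step list is not reproduced here. As a hedged sketch of the kind of line that extends the search path in tcsh, the commands below write a "set path" line into a practice file. The file /tmp/btm_demo_rc/tcshrc_practice is a made-up stand-in so that nothing in your real ~/.tcshrc is touched, and the line itself is an assumption about what the setup needs, not necessarily the author's exact wording.

```shell
# Create a practice location (hypothetical; NOT your real ~/.tcshrc).
mkdir -p /tmp/btm_demo_rc

# Append a tcsh "set path" line to the practice file.
# The single quotes keep $path from being expanded by the shell running echo,
# so the text lands in the file exactly as typed.
echo 'set path = ( $path /home/btm_user/N_DRIVE/btm_unix_scripts )' >> /tmp/btm_demo_rc/tcshrc_practice

# Check what the file contains now.
cat /tmp/btm_demo_rc/tcshrc_practice
```

Once a line like this is in ~/.tcshrc, a newly opened terminal (or typing source ~/.tcshrc in tcsh) should pick it up, and echo $path will show the added directory at the end.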

Exercise 2b: Writing a Shell script

In this directory you will find GPS data for a certain day. That's not essential. The key is that there are many, many files. Some are gzipped only; others exist twice, once gzipped and once unzipped. What I want you to do now is find all the duplicates and rename the unzipped files to all upper case:

While developing this script you might wanna test it. To test it you will have to make it executable. In a terminal window:

You will have to repeat this for any script you want to be executed on the command line. Otherwise you'll get a "could not find ..." response.
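
As a sketch of what making a script executable looks like, the lines below create a throwaway script, show its permissions before and after chmod, and run it. The directory and the name my_script are hypothetical. The block is written as an sh script; at an interactive tcsh prompt the ! in the printf line may be taken as a history reference, so there you would create the file with the editor instead.

```shell
# Create a throwaway two-line script (hypothetical name and location).
mkdir -p /tmp/btm_demo_chmod
cd /tmp/btm_demo_chmod
rm -f my_script
printf '%s\n' '#!/bin/sh' 'echo "hello from my_script"' > my_script

# Before: a freshly created file normally has no x in its permission string.
ls -l my_script

# Give the owner (u) execute (x) permission.
chmod u+x my_script

# After: the owner's permissions now include x.
ls -l my_script

# Run it by path; once its directory is on your PATH the bare name works too.
./my_script
```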

The testing happens in ~/lab07_ex2! Open a new terminal window (to refresh the path contents with your new executable), go to ~/lab07_ex2, and execute whatever you called your script. And yes, I will do exactly that and expect your script to work, no matter where it is stored.

Now the funky part: Rename the file such that all lower case letters are upper case. You will find almost a full solution at this website. You will have to do the conversion from lower case to upper case, since they convert from upper case to lower case. I find this task challenging and yet rewarding enough that giving the solution away is fine with me. However, you still have to find the correct line on the website, copy it correctly into your script, modify it, and explain to me what this line does (use man pages and the Internet to find answers).
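
To see the case conversion in isolation, here is a minimal sketch that upper-cases one made-up file name with tr. It is not the line from the website, it does not rename anything, and the file name is hypothetical. The assignment uses sh syntax; in tcsh it would be written with set and backquotes instead.

```shell
# Hypothetical file name for illustration; not taken from the lab data.
name="abcd0010.10qm"

# tr reads standard input and maps every lower-case letter to its upper-case partner.
upper=$(echo "$name" | tr '[:lower:]' '[:upper:]')

# prints: abcd0010.10qm -> ABCD0010.10QM
echo "$name -> $upper"
```

Digits and the dot pass through unchanged, since tr only touches the characters named in its first set.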

As a guideline: my neatly formatted, yet uncommented solution script is 8 lines long.

Once you're done try > ls *QM | wc -l
The result should be 944.

You've just changed the name of 944 files. Given the boredom caused by doing the actual conversion by hand and the number of files, writing the script, testing, failing, fixing, testing, succeeding was still a lot faster.

Now you could go ahead and remove all files that are upper case using rm ./*.QM. Renaming first is a safe way to mark files and hopefully avoid deleting precious data accidentally - good thing you have backups. The result should be:
> ls *qm
ls: No match.
Good! All the unnecessary stuff has been removed.

To be fair though, in real life you would simply call: gzip *qm and let gzip complain about existing files. But the point of this exercise was to introduce you to a few unix tools, get you to do some scripting and do a simple task on many, many files. I hope this objective was accomplished.

ronni <at> gi <dot> alaska <dot> edu | Last modified: April 20 2016 17:00.