Using the Unix/Linux Shell as Glue
As a longtime Linux/Unix consultant and administrator, I have found the underlying Unix paradigm to be a very enduring model for system operation and maintenance. Unlike in many other operating systems, simple Unix tools can be built relatively quickly because some key operating system primitives are exposed to the shell user interface without the need to resort to a lower-level language. Here are the key primitives which are exposed to the shell:
- fork()
- exec()
- pipe()
- dup() (i.e. file descriptor manipulation)
Writing against these primitives in, say, C requires a fair amount of bookkeeping and low-level coding, which makes it difficult to "glue" existing programs together. The shell, on the other hand, exposes these key primitives in an elegant and easy-to-use way. I call the shell a glue language, since it lets you glue existing programs together without doing a lot of detailed coding.
In the shell the "|" (or pipe symbol) allows the user to create processes and glue them together by piping the output of one command into the input of another. The redirection operators "<", ">" and ">>" manipulate file descriptors so that input or output can come from or go to a different place without modifying an existing program. Together, these higher-level constructs take care of all the low-level bookkeeping you would need to do in C to achieve the same result.
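As a small illustration of these constructs, here is a sketch that uses all four operators in a scratch directory. The file names fruit.txt and counts.txt are made up for the example, and it assumes mktemp is available, as it is on most Linux and BSD systems.

```shell
#!/bin/sh
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ">" creates (or truncates) a file from a command's standard output.
printf 'pear\napple\npear\n' > fruit.txt

# ">>" appends to the file instead of truncating it.
printf 'apple\npear\n' >> fruit.txt

# "<" feeds the file to sort's standard input; each "|" connects one
# process's standard output to the next process's standard input.
sort < fruit.txt | uniq -c | sort -rn > counts.txt

# counts.txt now lists pear (3 occurrences) ahead of apple (2).
cat counts.txt
```

None of sort, uniq or cat knows anything about the others. The shell does the fork(), pipe() and dup() work that connects them.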
The shell syntax is so familiar to well-versed Unix users that using it becomes second nature. Exposing the primitives at this high level profoundly alters the way the typical Unix user uses the system. Instead of being limited to the specific functionality of the various built-in tools, the Unix user can use the shell as glue to tinkertoy programs together and extend their functionality. This approach is both modular and extensible, thereby making the toolset far more powerful.
Here is one simple example of this power.
A Simple Log Parser and Viewer
Suppose you want to extract and view the entries in the /var/log/mail.log file for mail sent to the smoot@tic.com address. A simple way to do this is to use the grep program:
grep smoot@tic.com /var/log/mail.log
This will match all lines which have the string smoot@tic.com in them. There is nothing special about this. Almost any decent command line can do this sort of processing. I can also look at the logfile in real time by using tail:
tail -f /var/log/mail.log
I can combine these two tools together to see the log entries in real time and also filter the output at the same time:
tail -f /var/log/mail.log | grep smoot@tic.com
Under the covers this is two processes, with the output of the first connected to the input of the second. tail -f prints the end of the logfile and then continuously outputs new lines as the file grows. This works because of some simple programming rules which allow easy pipelining. grep follows these rules, which basically say that in the absence of filenames on the command line, the program reads its standard input (stdin). Another programming rule is that program output should always go to standard output (stdout). Almost all Unix programs follow these simple rules.
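To make these rules concrete, here is a minimal sketch of a filter that follows them. The name upcase is hypothetical, invented for this illustration; it leans on cat, which itself reads standard input when given no filenames.

```shell
#!/bin/sh
# upcase is a hypothetical filter, invented for this illustration.
upcase() {
    # With no arguments, cat copies standard input; with arguments,
    # it reads the named files. Either way the result goes to stdout.
    cat "$@" | tr '[:lower:]' '[:upper:]'
}

# Used in a pipeline: no filename, so it reads stdin.
printf 'status=sent\n' | upcase

# Used on a file: the same filter, with no changes needed.
tmpfile=$(mktemp)
printf 'status=deferred\n' > "$tmpfile"
upcase "$tmpfile"
rm -f "$tmpfile"
```

Because the filter reads stdin by default and writes to stdout, it can sit anywhere in a pipeline, exactly as grep does above.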
With this in mind we can do some interesting things with our simple one line program. Suppose we want to run it on a remote system. We can use ssh for this by typing:
ssh some_remote_host.com tail -f /var/log/mail.log | grep smoot@tic.com
ssh runs a command on a remote system just as if it were local. It follows the programming rules, so the output of the remotely run tail program is piped into the locally run grep program. How do I know that only tail is run remotely? Because the local shell splits the command line at the pipe (|) before ssh ever sees it. ssh takes its first argument as the name of the remote host and the second argument as the name of a command to run on the remote system. All subsequent tokens up to the pipe are arguments to tail. Everything after the pipe symbol is considered another local process and its arguments.
You can change the above pipeline slightly and get the grep to run remotely by quoting the entire command set you want to run remotely:
ssh some_remote_host.com "tail -f /var/log/mail.log | grep smoot@tic.com"
The quotes tell the invoking shell to pass the quoted argument in its entirety to be processed by the ssh command. ssh interprets the argument as the command to pass to the remote shell. This has the advantage of reducing the amount of data sent over the network, since only the matching lines cross the connection.
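One way to see the difference the quotes make without a remote host is to swap ssh for a stand-in that only prints what the local shell hands it. In this sketch show_args is a hypothetical helper, not part of ssh, and cat takes the place of the locally run grep.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for ssh: it only reports the
# arguments it was given by the local shell.
show_args() {
    printf '%d argument(s):\n' "$#"
    for arg in "$@"; do
        printf '  [%s]\n' "$arg"
    done
}

# Unquoted: the local shell ends the command at "|", so the stand-in
# gets four arguments and the next stage (cat here) runs locally.
show_args some_remote_host.com tail -f /var/log/mail.log | cat

# Quoted: the whole pipeline arrives as a single second argument,
# pipe symbol included, ready to be handed to a remote shell.
show_args some_remote_host.com "tail -f /var/log/mail.log | grep smoot@tic.com"
```

The first call reports four arguments and the second reports two, which is the whole difference between filtering locally and filtering on the remote side.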
As a further step, we can encapsulate this simple pipeline into a script and do some parameterizing and defaulting to make it more general:
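#! /bin/sh
# Follow a logfile, locally or over ssh, and show only matching lines.
search=$1
logfile=$2
remote=$3
if [ -z "$search" ]; then
    echo "usage: $0 search [logfile] [remote]"
    exit 1
fi
# Fall back to the system log when no logfile is given.
[ -n "$logfile" ] || logfile=/var/log/syslog
if [ -n "$remote" ]; then
    # The quoted pipeline runs entirely on the remote host.
    ssh "$remote" "tail -f '$logfile' | grep '$search'"
else
    tail -f "$logfile" | grep "$search"
fi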
This particular script is written to be a standard Bourne shell script and does not use any bash extensions or features. If you want your scripts to be portable, you should write them in this manner. This script takes 3 arguments:
- search string
- logfile name
- remote host name
It uses some defaults, so you can run it with one, two or three arguments and it behaves in a reasonable way: the logfile defaults to /var/log/syslog, and without a remote host name the pipeline runs locally. We could continue to refine this script and add features, but it is probably better to leave it as a simple tool which can be combined with other tools using the underlying Unix primitives which are elegantly exposed in the shell.
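To see how the defaulting plays out, here is a dry-run sketch that prints the pipeline each invocation would run rather than running it. The describe function is hypothetical and exists only for this illustration; it uses the POSIX ${2:-default} expansion as a shorter way of writing the same logfile default.

```shell
#!/bin/sh
# describe is a hypothetical dry-run helper: it prints the pipeline the
# script would run for a given argument list instead of running it.
describe() {
    search=$1
    logfile=${2:-/var/log/syslog}   # POSIX default-value expansion
    remote=$3
    if [ -n "$remote" ]; then
        printf 'ssh %s "tail -f %s | grep %s"\n' "$remote" "$logfile" "$search"
    else
        printf 'tail -f %s | grep %s\n' "$logfile" "$search"
    fi
}

# One argument: default logfile, local pipeline.
describe smoot@tic.com
# Two arguments: named logfile, local pipeline.
describe smoot@tic.com /var/log/mail.log
# Three arguments: named logfile, pipeline runs on the remote host.
describe smoot@tic.com /var/log/mail.log some_remote_host.com
```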
