Pet Peeves: run-parts(8), asynchronous execution, and logging
I run Slackware on two of my systems. I have always admired Slackware's simplicity and honesty. Some people accuse it of being overly complex, but I've found that what others see as complexity, I see as the ability to control and fine-tune. Other distributions may be great for those who are looking for a Unix system with a Microsoft feel, but I prefer my OS to not hide it's controls, to not tell me that, for my own good, it won't do what I want it to do.
Slackware's run-parts(8) is one of those simple tools that works well. Built from a simple /bin/sh script, and using basic Linux tools, Slackware's run-parts does the basic work of "batch" processing for cron. But, this simplicity has it's price; target scripts are run synchronously, one after the other. If one get's stuck, the remainder can't run. And, there's no easy way to determine if a script has got "stuck"; there's no conveniently accessable logging of the start or termination of each script, nor is there any consistent way to gather messages from the scripts into one place.
And, so, my pet peeve: run-parts(8) doesn't do everything that I need in a batch processor. So, I went and built my own add-on solution.
How it started
Originally, I had a number of scripts that performed backups on specific directories. Each script would ensure that a specific USB-attached hard drive was mounted, create it's backup on that hard drive, and terminate. There were scripts for daily backups, scripts for weekly backups, and scripts for monthly backups. They seemed to ideally fit within the /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly directories parsed by Slackware's root crontab using run-parts(8).
But, there were some hiccups.
- because of the synchronous design of run-parts(8), each script had to complete before the next one would start, and some of these scripts took a long time to complete.
- run-parts(8) had no convenient way to indicate which script it was currently waiting on; you either receive an email containing all the script startup notifications, once the last script has completed, or you receive no email at all. When things ran long, there was no easy way to determine what had been done, and what was waiting to be done
- I wanted the USB hard drive to remain unmounted until needed for the backups. Because the daily, weekly, and monthly scripts started at different times, the USB hard drive had to be mounted independently of, and before the first script was started.
- I wanted the USB hard drive to be unmounted once all the backups were complete. Because the daily, weekly, and monthly scripts ended at different times, the USB hard drive had to be unmounted independently of, and after the last script had completed
To make the scripts asynchronous, I added wrapper logic to each script, so that the working part of the script ran in the background, while the wrapper terminated and let run-parts(8) proceed to the next script. And so, the scripts got complex.
Additionally, I put lots of logging logic (using logger(1) to log to syslog) into the scripts. This helped me track the execution of each script, with the assistance of grep and awk. But, this logging was inconsistent in format, and did nothing to solve the synchronicity and resource management problems, and the scripts got even more complex.
Finally, to solve the resource dependency issue (the mounting and unmounting of the USB drive), I added even more common logic to each script, this time to determine if the drive was mounted, and mount it if not, and (for the last script to finish) to determine that the drive was no longer needed, and to unmount and fsck it. And the scripts got even more complex.
So, my simple backup scripts, which had just started out as a bunch of tar(1) commands, now contained logic to lock sentinal files, mount drives, log to syslog, daemonize, umount drives, and unlock sentinal files, as well as run individual tar(1) commands.
Gawd, what a mess. There had to be a better way.
The genesis of RMonitor
I have no qualms about, for my own Slackware installations, enhancing the scripts and tools that Pat V. supplies in the Slackware distribution package. I did take a brief look at the run-parts(8) script to see how I could enhance it, but quickly abandoned that as a bad idea. run-parts(8) is simple, works well, and is crucial to a healthy system. You don't fsck with the things that work, when you can just add stuff to make it work better.
So, I decided to take all my custom script code; the syslogging of start and finish, the asynchonous execution, the resource manipulation and locking; and built it all into a program that would handle everything for me. A local program that could be used in conjunction with run-parts(8), to give me the features that I was looking for.
What I came up with is an "interpreter" that you use as the hashbang of a script. This interpreter (called rmonitor) launches the "real" script in an asynchronous child process. The child takes care of manipulating the resource management and sentinal/lock files to ensure that named resources are made available before the first target runs, and cleaned up after the last target completes. It also generates, to syslog, a fixed-format set of messages indicating the start and completion (with exit status) of each target process run, and intercepts any stdout and/or stderr text written by the target scripts, and logs that to syslog in a standard format.
Now, each script, that embeds this interpreter as it's hashbang, runs asynchronously, has standard start and termination messages logged to syslog in a consistent and identifiable format, has it's own informational and error messages automatically logged to syslog in a consistent and identifiable format, and manipulates required resources on demand. My run-parts(8) target scripts are now back to the simple scripts they were, taking care of just the things that they need to take care of.
RMonitor Block Overview
>-----, \ ,-->{STDERR} | ,--->{SYSLOG} v / +----------+ | rmonitor | | gparent |- - - - ->+----------+ +----------+ | rmonitor |- - - - ->+----------+ | | parent | | PREP |+ | | ~ ~ ~ | | task ||+ / | |<- - - - -+----------+|| <----' | | +----------+| | | +----------+ | | | |- - - - ->+----------+ | | | target | | ~ ~ ~ | | task | | |<- - - - -+----------+ | | | |- - - - ->+----------+ | ~ ~ ~ | | POST |+ | | | task ||+ | |<- - - - -+----------+|| +----------+ +----------+| | \ +----------+ | \ / '-->{SYSLOG} <--------------------------'
RMonitor Manpage
RMONITOR(1) RMONITOR(1) NAME rmonitor - run script with on-demand pre and post processing SYNOPSIS rmonitor resourcename[,resourcename...] [/path/to/interpreter] /path/to/script DESCRIPTION rmonitor is intended to be used as the binary named by a script's hash- bang interpreter line. A resourcename names a single resource that must be prepared for use before the target script runs, and cleaned up after the target script completes. A resource list consists of one or more resourcenames, separated by commas, permitting a script to involve one or more resources. The resourcename is used to tie a number of different target scripts together, in that every script that uses a specific resourcename uses the same specific resources. rmonitor uses the resourcename to select a specific helper script to perform resource management prior to the first use, and following the last use. Typical resource management pro- cesses would mount external storage before the first process writes to it, and unmount it after the last process finishes with it. The optional /path/to/interpreter names the script interpreter program to be used to execute the target script in which rmonitor is embedded. If the hashbang omits the interpreter path, rmonitor uses /bin/sh as the target interpreter program. OPTIONS As a hashbang, the rmonitor commandline will take the format #!/path/to/rmonitor resourcelist /path/to/interpreter or #!/path/to/rmonitor resourcelist Where - resourcelist is a list of resourcename values, separated by commas Example: resourcename1,resourcename2,resourcename3 or: resourcename - resourcename is a 7 to 20 character long name consisting only of upper or lower case alphabetics, digits, dash and/or underscore - /path/to/interpreter is an optional path to a script interpreter (defaults to /bin/sh) FILES /var/lock/rmonitor.*.lock acts as both a serialization lockfile, and a semaphore file for the named resource. Each resource will have a unique lock/semaphore file. This file is built and managed by rmonitor as it executes. /usr/local/lib/rmonitor/*.exit provides the PREProcessing and POSTpost- processing logic for the named resource. These executable files will accept a single argument which will be set to "PREP" for preprocessing, and "POST" for postprocessing. The executable will use this argument to determine and execute the correct processing. The following is a trivial example exit #!/bin/sh echo ::: $0 Begins ::: RC=0 case $1 in PREP) echo ::: PREProcessing exit invoked ::: # perform resource allocation here ;; POST) echo ::: POSTprocessing exit invoked ::: # perform resource deallocation here ;; *) echo ::: $1 option invalid ::: >&2 RC=1 ;; esac echo ::: $0 Terminates ::: exit $RC /bin/sh is used as the default interpreter should none be specified in commandline arguments. DIAGNOSTICS rmonitor emits diagnostic messages to stderr and/or syslog (LOG_LOCAL1). The program logs according to it's phase of operation. The grandparent process, parses commandline arguments and prepares the internal processing controls. While validating the commandline, it logs only to stderr. After validation, the grandparent logs to both stderr and syslog. Under normal use, all grandparent messages will show up in the stderr log (i.e. email from cron) of the initiating script. The parent process logs all diagnostic messages to syslog. Informa- tional messages, including parent generated messages, preprocessing stdout, target stdout, and postprocessing stdout are directed to LOG_INFO, while critical messages including parent generated messages, preprocessing stderr, target stderr, and postprocessing stderr are directed to LOG_ERR. Diagnostics will contain values derived from execution parameters and/or script-generated data - <PgmName> is the name of the rmonitor binary, stripped of it's path - </Path/To/JobBase/JobScript> is the path to the hashbanged script - <JobBase> is the name of the directory in which the hashbanged script is found - <JobScript> is the name of the hashbanged script within the JobBase directory - <Resource> is the name of one of the resources listed in the hash- banged script - <Interpreter> is the interpreter specified by the hashbanged script - <Pid> is the process id of the rmonitor child process for the hash- banged script - <text> is arbitrary text generated by one of the invoked scripts
Example Syslogs
Mar 9 04:40:03 merlin cron.daily: JOB 21.daily.backup.etc[10200] Launched Mar 9 04:40:03 merlin cron.daily: JOB 21.daily.backup.htdocs[10226] Launched Mar 9 04:40:03 merlin cron.daily: JOB 21.daily.backup.mysql[10246] Launched Mar 9 04:40:03 merlin cron.daily: JOB 21.daily.backup.userdirs[10266] Launched Mar 9 04:40:03 merlin 21.daily.backup.etc[10200]: JOB Begins Mar 9 04:40:03 merlin 21.daily.backup.etc[10200]: PREP UsbBackup: EXIT Begins Mar 9 04:40:03 merlin 21.daily.backup.etc[10200]: PREP UsbBackup: Will fsck filesystem and mount to /var/backup Mar 9 04:40:03 merlin 21.daily.backup.htdocs[10226]: JOB Begins Mar 9 04:40:03 merlin 21.daily.backup.mysql[10246]: JOB Begins Mar 9 04:40:03 merlin 21.daily.backup.userdirs[10266]: JOB Begins Mar 9 04:48:29 merlin 21.daily.backup.etc[10200]: PREP UsbBackup: usb_backup: 9233/39075840 files (3.9% non-contiguous), 31231875/78142160 blocks Mar 9 04:48:31 merlin 21.daily.backup.etc[10200]: PREP UsbBackup: /sbin/fsck returned 1 Mar 9 04:48:31 merlin 21.daily.backup.etc[10200]: PREP UsbBackup: Filesystem 1K-blocks Used Available Use% Mounted on Mar 9 04:48:31 merlin 21.daily.backup.etc[10200]: PREP UsbBackup: /dev/sdh1 307663800 120022660 172012708 42% /var/backup Mar 9 04:48:31 merlin 21.daily.backup.etc[10200]: PREP UsbBackup: EXIT Status 0 Mar 9 04:48:31 merlin 21.daily.backup.htdocs[10226]: PROCESSING Begins Mar 9 04:48:31 merlin 21.daily.backup.etc[10200]: PROCESSING Begins Mar 9 04:48:31 merlin 21.daily.backup.mysql[10246]: PROCESSING Begins Mar 9 04:48:31 merlin 21.daily.backup.userdirs[10266]: PROCESSING Begins Mar 9 04:48:52 merlin 21.daily.backup.mysql[10246]: PROCESSING Status 0 Mar 9 04:48:52 merlin 21.daily.backup.mysql[10246]: JOB Status 0 Mar 9 04:48:54 merlin 21.daily.backup.etc[10200]: PROCESSING Status 0 Mar 9 04:48:54 merlin 21.daily.backup.etc[10200]: JOB Status 0 Mar 9 04:49:29 merlin 21.daily.backup.htdocs[10226]: PROCESSING Status 0 Mar 9 04:49:29 merlin 21.daily.backup.htdocs[10226]: JOB Status 0 Mar 9 05:09:39 merlin 21.daily.backup.userdirs[10266]: PROCESSING Status 0 Mar 9 05:09:39 merlin 21.daily.backup.userdirs[10266]: POST UsbBackup: EXIT Begins Mar 9 05:09:39 merlin 21.daily.backup.userdirs[10266]: POST UsbBackup: Will umount and fsck /var/backup Mar 9 05:09:39 merlin 21.daily.backup.userdirs[10266]: POST UsbBackup: Filesystem 1K-blocks Used Available Use% Mounted on Mar 9 05:09:39 merlin 21.daily.backup.userdirs[10266]: POST UsbBackup: /dev/sdh1 307663800 120023924 172011444 42% /var/backup Mar 9 05:18:00 merlin 21.daily.backup.userdirs[10266]: POST UsbBackup: usb_backup: 9233/39075840 files (3.9% non-contiguous), 31232191/78142160 blocks Mar 9 05:18:01 merlin 21.daily.backup.userdirs[10266]: POST UsbBackup: /sbin/fsck returned 1 Mar 9 05:18:01 merlin 21.daily.backup.userdirs[10266]: POST UsbBackup: EXIT Status 0 Mar 9 05:18:01 merlin 21.daily.backup.userdirs[10266]: JOB Status 0
Attachment | Size |
---|---|
rmonitor-2.0.0.tar.bz2 | 21.26 KB |
rmonitor-2.0.1.tar.bz2 | 22.43 KB |
rmonitor-2.0.3.tar.bz2 | 23.84 KB |
rmonitor-2.0.4.tar.bz2 | 25.17 KB |
- Lew's blog
- Log in to post comments
Comments
Coming enhancements
I'm working on rmonitor v2.0.1, to satisfy a request for an alteration of the syslog ID used by the rmonitor "parent" process. With v2.0.0, rmonitor logs all messages from, or intercepted by, the parent, with an ID that identifies the target script. I have a request to enhance this ID so that it selectively includes the name of the monitor program as well. In this way, "the rmanager job management is easily distinguished from the processes of the script itself".
This means that syslog lines that currently look like
Mar 9 04:40:03 merlin cron.daily: JOB 21.daily.backup.htdocs[10226] Launched
Mar 9 04:40:03 merlin 21.daily.backup.htdocs[10226]: JOB Begins
Mar 9 04:48:31 merlin 21.daily.backup.htdocs[10226]: PROCESSING Begins
Mar 9 04:48:31 merlin 21.daily.backup.htdocs[10226]: message from script
Mar 9 04:49:29 merlin 21.daily.backup.htdocs[10226]: PROCESSING Status 0
Mar 9 04:49:29 merlin 21.daily.backup.htdocs[10226]: JOB Status 0
would, with a commandline option, look like
Mar 9 04:40:03 merlin cron.daily: JOB 21.daily.backup.htdocs[10226] Launched
Mar 9 04:40:03 merlin rmonitor: 21.daily.backup.htdocs[10226]: JOB Begins
Mar 9 04:48:31 merlin rmonitor: 21.daily.backup.htdocs[10226]: PROCESSING Begins
Mar 9 04:48:31 merlin 21.daily.backup.htdocs[10226]: message from script
Mar 9 04:49:29 merlin rmonitor: 21.daily.backup.htdocs[10226]: PROCESSING Status 0
Mar 9 04:49:29 merlin rmonitor: 21.daily.backup.htdocs[10226]: JOB Status 0
It's not a difficult change, but it means that I have to refactor the commandline parsing logic (because it was complicated, and not conducive to change), and add in an option to select this "Long Name" logging.
I've got the coding complete, and an initial test looks good, but I'm going to try it out on my systems for a while before releasing the code into the wild.
General Availability for RMonitor v2.0.1
OK, I've taken some time, and tested the revisions to RMonitor v2.0.0 that provide an alternate logging identifier. Everything seems to work well, so I've added the v2.0.1 tarball to my original posting.
Please note that the v2.0.1 codebase is the one I will go forward with.
To activate the alternate logging identifier, include a "-L" option before the resourcelist.
I.e.
#!/usr/local/bin/rmonitor resource1,resource2,resource3 /bin/bash
#!/usr/local/bin/rmonitor -L resource1,resource2,resource3 /bin/bash
That's all there is to it.
"Management Reporting"
Here's a simple script to provide simple "management reporting" of the rmonitor jobstream:
The output looks something like this:
rmonitor-2.0.4: Evolution, not revolution
I built rmonitor to take some of the "heavy lifting" out of scripts that I run using run-parts(8). One of those problems that I meant rmonitor to address involved the management and reporting of long-running scripts, and what I came up with managed to adequately handle that requirement. Until it didn't.
A while back, I encountered a situation in which I had an extraordinarily-long-running rmonitor job, that I needed to kill and restart. Unfortunately, while the existing rmonitor code adequately handled the splitting out and reporting of this job, it didn't give me any way to properly kill the job. Specifically, I didn't know which of the processes to kill, and when I killed the "target" process, I didn't kill the "monitor" process.
I immediately recognized the error of my implementation; not only should rmonitor fork off children processes to run the target tasks in, it should take the role of session leader, in order to properly manage and distribute signals to the target tasks. With rmonitor as session leader, I could SIGHUP or SIGTERM the rmonitor process, and have it propogate those signals to it's children, giving me a way to initiate a controlled termination. And so, I made some changes.
With rmonitor-2.0.4, the rmonitor process intercepts signals sent to it, and propogates them forward to the children processes. Once all the children have terminated, the rmonitor process terminates cleanly. As the rmonitor process uses it's own PID when it logs both it's own messages and child messages to syslog, the external user will know exactly which process ID to kill; the one logged with the messages.