
Application Heartbeat Monitor Documentation

Version 0.2

Introduction

The Application Heartbeat Monitor lets local applications send periodic, time-based notifications ("heartbeats") to the monitor via message passing. Applications running on remote systems register with the heartbeat monitor running on their own system.

Usage

The Application Heartbeat Monitor should be started from inittab so that it is restarted automatically if it terminates. In normal operation it runs as a daemon process with a separate thread for each application it monitors. Monitoring an application means timing the arrival of "heartbeats" sent by that application over a named pipe (FIFO). If an application's timer expires before its next heartbeat arrives, the monitor assumes that the application has failed and must be recovered.
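The core step of one monitoring thread can be sketched with POSIX `poll()`: wait up to the application's interval for data on its FIFO, and treat a timeout as a failure. This is a minimal sketch, not the monitor's actual code; `wait_for_heartbeat` and `enum hb_result` are hypothetical names, and it assumes the FIFO has already been opened for reading (for example with `O_RDWR` on Linux, so the descriptor does not report end-of-file when a client closes its end).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Result of waiting for a single heartbeat (hypothetical type). */
enum hb_result { HB_RECEIVED, HB_TIMEOUT, HB_ERROR };

/*
 * Wait up to timeout_ms milliseconds for a heartbeat on fifo_fd, the read
 * end of one application's heartbeat FIFO. Whatever bytes are available
 * (up to 64) are drained and counted as a single heartbeat.
 */
enum hb_result wait_for_heartbeat(int fifo_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fifo_fd, .events = POLLIN };

    for (;;) {
        int n = poll(&pfd, 1, timeout_ms);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: wait again (full timeout) */
            return HB_ERROR;
        }
        if (n == 0)
            return HB_TIMEOUT;     /* interval elapsed with no heartbeat */

        if (pfd.revents & POLLIN) {
            char buf[64];
            ssize_t r = read(fifo_fd, buf, sizeof buf);
            return r > 0 ? HB_RECEIVED : HB_ERROR;
        }
        return HB_ERROR;           /* POLLHUP/POLLERR/POLLNVAL without data */
    }
}
```

A thread would call this in a loop with the application's registered interval converted to milliseconds, and start recovery as soon as it returns `HB_TIMEOUT`. Using one blocking wait per thread keeps each application's timer independent of the others.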

The Application Heartbeat Monitor can also be started in a non-daemon mode for debugging purposes. In addition, the monitor can create a timer that periodically updates the watchdog timer; this feature is configured at startup. Other features, such as the registration directory, can also be configured at startup.
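Startup configuration of this kind is typically parsed with POSIX `getopt()`. The sketch below is illustrative only: the option letters (`-f` for non-daemon mode, `-w SECONDS` for the watchdog interval, `-r DIR` for the registration directory), `struct hb_config`, and `parse_config` are all hypothetical and are not the monitor's documented command line. Only the default directory `/var/apphbd` comes from this page.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical startup configuration. */
struct hb_config {
    bool foreground;          /* true: non-daemon (debug) mode */
    unsigned watchdog_secs;   /* 0: do not update the watchdog timer */
    const char *reg_dir;      /* registration directory */
};

/* Parse hypothetical options -f, -w SECONDS, -r DIR. Returns 0 or -1. */
int parse_config(int argc, char *argv[], struct hb_config *cfg)
{
    cfg->foreground = false;
    cfg->watchdog_secs = 0;
    cfg->reg_dir = "/var/apphbd";

    optind = 1;               /* restart scanning from argv[1] */
    int opt;
    while ((opt = getopt(argc, argv, "fw:r:")) != -1) {
        switch (opt) {
        case 'f':
            cfg->foreground = true;
            break;
        case 'w': {
            char *end;
            unsigned long v = strtoul(optarg, &end, 10);
            if (*optarg == '\0' || *end != '\0')
                return -1;    /* interval must be a plain decimal number */
            cfg->watchdog_secs = (unsigned)v;
            break;
        }
        case 'r':
            cfg->reg_dir = optarg;
            break;
        default:
            return -1;        /* unknown option or missing argument */
        }
    }
    return 0;
}
```

After parsing, a daemon would normally detach only when `foreground` is false, for example with `daemon(0, 0)` (available in glibc and the BSDs, but not part of POSIX).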

The heartbeat monitor can be terminated by sending it SIGTERM (signal 15). SIGTERM shuts down the monitor and stops all listening for heartbeats. An application that generates heartbeats through the client API will detect that writing to its heartbeat FIFO would block, and will stop sending heartbeats. It will, however, keep checking whether it can send a heartbeat successfully. When the heartbeat monitor is started again, it automatically registers the application again and resumes listening for its heartbeats.
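The client behavior described above can be approximated with non-blocking I/O on the FIFO. This is a minimal sketch under stated assumptions, not the client library's actual API: `send_heartbeat`, `ensure_fifo`, and `enum send_result` are hypothetical names. It relies on standard POSIX behavior: opening a FIFO with `O_WRONLY | O_NONBLOCK` fails with `ENXIO` when no process has it open for reading, and a non-blocking write to a full pipe fails with `EAGAIN`. A client using it should ignore `SIGPIPE`, because a write after the reader disappears raises that signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum send_result { SEND_OK, SEND_WOULD_BLOCK, SEND_NO_MONITOR, SEND_ERROR };

/* Create the FIFO if it does not exist yet (hypothetical helper). */
int ensure_fifo(const char *path)
{
    if (mkfifo(path, 0600) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/*
 * Try to send one heartbeat byte without ever blocking. The FIFO is
 * reopened on every call, so a restarted monitor is picked up on the
 * next attempt.
 */
enum send_result send_heartbeat(const char *fifo_path)
{
    /* ENXIO: nobody has the FIFO open for reading (monitor not running). */
    int fd = open(fifo_path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return errno == ENXIO ? SEND_NO_MONITOR : SEND_ERROR;

    ssize_t n = write(fd, "h", 1);
    int saved_errno = errno;          /* close() may overwrite errno */
    close(fd);

    if (n == 1)
        return SEND_OK;
    if (n < 0 && (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK))
        return SEND_WOULD_BLOCK;      /* pipe buffer full: monitor not reading */
    return SEND_ERROR;                /* includes EPIPE if the reader vanished */
}
```

On `SEND_NO_MONITOR` or `SEND_WOULD_BLOCK`, the client simply tries again at its next heartbeat interval, which matches the "keep checking" behavior described above.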

Overview of Main Components

The main components of the monitor are:

  1. Registration/Unregistration
  2. Application Recovery
  3. Self-Recovery
  4. Watchdog Timer Update
  5. Listening for heartbeats
See the activity diagram at the bottom of this page for a graphical overview of the system.

Registration/Unregistration Component

Registration consists of creating a specially named file in a registration directory, currently /var/apphbd. The file contains registration information in a specific format; if the contents do not match that format, the registration is considered invalid and fails. The client library takes care of creating this file and its contents.
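This page does not spell out the registration file format, so the following validator assumes a purely hypothetical one: `key=value` lines with required keys `pid`, `interval`, `fifo`, and `recovery`. The names `struct registration` and `parse_registration` are also hypothetical. The sketch only illustrates the rule stated above: any deviation from the expected format makes the whole registration invalid.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REG_PATH_MAX 256              /* hypothetical limit */

/* Hypothetical in-memory form of one registration file. */
struct registration {
    long pid;                         /* registered application's PID */
    long interval_secs;               /* heartbeat interval */
    char fifo[REG_PATH_MAX];          /* heartbeat FIFO path */
    char recovery[REG_PATH_MAX];      /* recovery script path */
};

/* Parse a strictly positive decimal integer with no trailing characters. */
static bool parse_positive(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno != 0 || v <= 0)
        return false;
    *out = v;
    return true;
}

/* Copy src into dst, failing instead of silently truncating. */
static bool copy_value(char *dst, size_t size, const char *src)
{
    int n = snprintf(dst, size, "%s", src);
    return n >= 0 && (size_t)n < size;
}

/* Apply one "key=value" line; *seen records which keys were present. */
static bool parse_line(char *line, struct registration *reg, unsigned *seen)
{
    char *eq = strchr(line, '=');
    if (eq == NULL || eq[1] == '\0')
        return false;                 /* missing '=' or empty value */
    *eq = '\0';
    const char *key = line, *val = eq + 1;

    if (strcmp(key, "pid") == 0) {
        *seen |= 1u;
        return parse_positive(val, &reg->pid);
    }
    if (strcmp(key, "interval") == 0) {
        *seen |= 2u;
        return parse_positive(val, &reg->interval_secs);
    }
    if (strcmp(key, "fifo") == 0) {
        *seen |= 4u;
        return copy_value(reg->fifo, sizeof reg->fifo, val);
    }
    if (strcmp(key, "recovery") == 0) {
        *seen |= 8u;
        return copy_value(reg->recovery, sizeof reg->recovery, val);
    }
    return false;                     /* unknown key: registration is invalid */
}

/* Validate a registration file's full text; all four keys are required. */
bool parse_registration(const char *text, struct registration *reg)
{
    char buf[1024];
    if (!copy_value(buf, sizeof buf, text))
        return false;                 /* too large for this sketch */

    unsigned seen = 0;
    char *save = NULL;
    for (char *line = strtok_r(buf, "\n", &save); line != NULL;
         line = strtok_r(NULL, "\n", &save)) {
        if (!parse_line(line, reg, &seen))
            return false;
    }
    return seen == 0xFu;
}
```

Rejecting unknown keys and incomplete files outright, rather than filling in defaults, keeps a malformed registration from silently monitoring the wrong FIFO or running the wrong script.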

Application Recovery

The recovery script provided during registration is executed when an application fails to send a heartbeat within its stated interval. A message is also logged regarding the application failure.
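Logging the failure and running the recovery script could look like the following. This is a sketch, not the monitor's implementation: `run_recovery` is a hypothetical name, and logging through `syslog()` is an assumption, since the page only says that a message is logged. The script is started with `fork()` and `execl()` and the caller waits for it with `waitpid()`; in a multithreaded daemon, the child calls only `execl()` and `_exit()`, which are async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

/*
 * Log the missed heartbeat, run the recovery script, and wait for it.
 * Returns the script's exit status (127 if it could not be executed),
 * or -1 if fork()/waitpid() failed or the script did not exit normally.
 */
int run_recovery(const char *app_name, const char *script)
{
    syslog(LOG_ERR, "%s missed its heartbeat interval; running %s",
           app_name, script);

    pid_t child = fork();
    if (child < 0)
        return -1;                    /* could not create a child process */
    if (child == 0) {
        /* Child: replace this process image with the recovery script. */
        execl(script, script, (char *)NULL);
        _exit(127);                   /* reached only if execl() failed */
    }

    /* Parent: wait for the script, retrying if a signal interrupts us. */
    int status;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because each monitored application has its own thread, blocking in `waitpid()` here delays only that application's monitoring, not the others.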

Self-Recovery

If the heartbeat monitor terminates unexpectedly, no registration information is lost, because it is kept outside the daemon in the /var/apphbd directory. When the monitor restarts, every application that has a registration file in that directory is registered again, and the daemon resumes monitoring its FIFO for heartbeats. If an application terminated abnormally while the heartbeat monitor was not running, its registration file remains but no longer corresponds to any process. The heartbeat monitor logs this as an error but does not execute the recovery script in this case: since the heartbeat monitor itself was not running, no process was available to authenticate the recovery script or the registration file. If this behavior turns out to be undesirable, it can easily be changed.

Figure: Activity Diagram of Self-Recovery Component (CMonitor_RecoverSelf.gif)

Watchdog Timer Update

The watchdog timer can be reset by configuring the heartbeat monitor to open /dev/watchdog at startup and write to it at a fixed interval. Both resetting the watchdog timer and the reset interval are optional and configurable via a command-line parameter.
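On Linux, opening `/dev/watchdog` arms the hardware or software watchdog, and any write to the device resets it. The helpers below are a hypothetical sketch of that pattern (`watchdog_open`, `watchdog_kick`, and `watchdog_close` are not names from the monitor). Opening with `O_CLOEXEC` keeps recovery scripts started by the monitor from inheriting the device. Writing the "magic close" character `'V'` before closing asks drivers that support it to disarm the timer instead of rebooting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open the watchdog device (normally "/dev/watchdog"); this arms it on Linux. */
int watchdog_open(const char *path)
{
    return open(path, O_WRONLY | O_CLOEXEC);
}

/* Reset ("kick") the watchdog: any write to the device resets the timer. */
bool watchdog_kick(int fd)
{
    ssize_t n;
    do {
        n = write(fd, "\0", 1);
    } while (n < 0 && errno == EINTR);
    return n == 1;
}

/* Close the device, optionally writing the magic 'V' to request disarming. */
void watchdog_close(int fd, bool disarm)
{
    if (disarm && write(fd, "V", 1) != 1) {
        /* Best effort: the driver may keep the timer running regardless. */
    }
    close(fd);
}
```

The monitor would call `watchdog_kick()` from a periodic timer whose period is shorter than the watchdog timeout. If that timer is a GLib `g_timeout_add()` source, the callback stays scheduled as long as it returns `TRUE`; returning `FALSE` removes the source.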

The following diagram gives an overview of the entire Application Heartbeat Monitor.

Figure: Activity Diagram for the Application Heartbeat Monitor (Overview.gif)

Todo:
Determine whether the g_timeout_add() callback is terminated automatically.


Generated on Wed Oct 30 15:21:17 2002 for Application Heartbeat Monitor by doxygen 1.2.14, written by Dimitri van Heesch, © 1997-2002