Overview
In a highly available system, the software must be available as well as the hardware. This requires a method for monitoring applications for availability and proper functionality. In software health monitoring, a heartbeat is a signal or message generated by one application and received by another. If regular heartbeats stop arriving from an application, it can be assumed that the application is no longer functioning as expected. Other applications that depend on the failed application should then be notified so that the proper course of action can be taken to minimize potential downtime. The heartbeat functionality must work both for local applications and for remote applications reached over a network. The heartbeat response times must also be compatible with a real-time environment.
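The missed-heartbeat check described above can be illustrated with a minimal sketch in C. This is not the AHM implementation: the names `hb_client`, `hb_beat`, `hb_missed`, and `hb_failed` are hypothetical, and the sketch assumes the caller supplies timestamps in milliseconds from a monotonic clock (for example, one read with `clock_gettime(CLOCK_MONOTONIC, ...)`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-application record a monitor might keep.
 * None of these names come from the AHM source. */
typedef struct {
    uint64_t last_beat_ms;  /* monotonic time of the most recent heartbeat */
    uint64_t interval_ms;   /* expected time between heartbeats */
    unsigned max_missed;    /* consecutive misses tolerated before failure */
} hb_client;

/* Record a heartbeat that arrived at now_ms. */
static void hb_beat(hb_client *c, uint64_t now_ms)
{
    c->last_beat_ms = now_ms;
}

/* Number of whole heartbeat intervals that have elapsed since the
 * last heartbeat was recorded. */
static uint64_t hb_missed(const hb_client *c, uint64_t now_ms)
{
    assert(c->interval_ms > 0);
    if (now_ms <= c->last_beat_ms)
        return 0;
    return (now_ms - c->last_beat_ms) / c->interval_ms;
}

/* True once more intervals have elapsed than the client tolerates. */
static bool hb_failed(const hb_client *c, uint64_t now_ms)
{
    return hb_missed(c, now_ms) > c->max_missed;
}
```

A monitor built along these lines would call `hb_beat` whenever a heartbeat arrives and evaluate `hb_failed` from a periodic timer. Tolerating a configurable number of misses before declaring failure is one way to avoid reacting to a single delayed message.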
The Application Heartbeat Monitor (AHM) is a daemon that handles local application-level health monitoring: applications register with it, and it issues notifications when heartbeats are missed. It uses real-time system timers to respond to local application heartbeats and performs local application recovery in the event of an application failure. The AHM can also be configured to reset a hardware watchdog timer through the /dev/watchdog interface.
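As a rough illustration of resetting a watchdog through /dev/watchdog, the sketch below follows the usual Linux convention: writing any byte to the open device resets the timer, and writing the character 'V' just before closing asks the driver to disarm it. The source does not describe how the AHM does this, so the helper names `wd_open`, `wd_kick`, and `wd_close` are hypothetical. The device path is a parameter so the sketch can be tried against an ordinary file. On real hardware, opening the device typically starts the countdown, and whether the 'V' request is honored depends on the driver and its configuration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open the watchdog device for writing.
 * Returns a file descriptor, or -1 on error. */
static int wd_open(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY);
}

/* Reset ("kick") the watchdog timer by writing a single byte.
 * Returns 0 on success, -1 on error. */
static int wd_kick(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Request an orderly shutdown of the watchdog by writing 'V',
 * then close the descriptor. Returns 0 only if both steps succeed. */
static int wd_close(int fd)
{
    int wrote = (write(fd, "V", 1) == 1);
    return (close(fd) == 0 && wrote) ? 0 : -1;
}
```

A daemon would typically call `wd_kick` from its main loop only while all monitored applications are healthy, so that a hung monitor or an unrecovered failure lets the hardware timer expire.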
More Information
For a more complete description of what this daemon does, see the documentation links at the left, or obtain a copy of the source code and run "make docs" to produce the latest HTML (or LaTeX) documentation. For information on how to get the latest source from CVS, go to the CVS info page.
How you can help...
Several areas need review by experienced developers. Because this code monitors the "health" of other processes, it is critical that it be robust itself. The more people who read the code and identify areas to improve, the better! Please subscribe to the mailing list to start sharing your input.