Availability Manager

Enhanced Definition

An Availability Manager is a critical software component or system function within the z/OS environment designed to proactively monitor, manage, and restore the availability of vital applications, subsystems, and resources. Its primary purpose is to minimize downtime, automate recovery from failures, and ensure continuous operation of enterprise-critical services. This functionality is often implemented by products like IBM's System Automation for z/OS (`SA z/OS`).

Key Characteristics

    • Real-time Monitoring: Continuously tracks the operational status and health of various z/OS resources, including CICS regions, DB2 subsystems, IMS control regions, batch jobs, and network components.
    • Policy-Driven Automation: Executes predefined, customizable policies and rules to manage resource dependencies, start/stop sequences, and automated recovery actions.
    • Event-Driven Response: Automatically reacts to system events, messages, alerts, and detected failures by initiating corrective actions without manual intervention.
    • Complex Resource Management: Orchestrates the startup and shutdown of interdependent applications and subsystems, ensuring correct sequencing and resource allocation.
    • High Availability Facilitation: Provides mechanisms for automated failover, workload balancing, and disaster recovery across one or more z/OS images.
    • Root Cause Analysis Integration: Often integrates with monitoring tools to provide context for failures, aiding in quicker problem determination.
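
The dependency-aware start/stop sequencing described above can be illustrated with a small sketch. This is not how SA z/OS expresses its policy; it is a generic Python illustration, assuming a hypothetical policy table (`DEPENDS_ON`) and made-up resource names, that derives a startup order with the standard library's topological sorter and shuts down in the reverse order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical policy (illustrative names only): each resource maps to the
# set of resources that must be up before it can start.
DEPENDS_ON = {
    "DB2A": set(),
    "IMSA": {"DB2A"},
    "CICSA": {"DB2A"},
    "BILLING": {"IMSA", "CICSA"},
}


def start_order(depends_on):
    """Return resources ordered so that every prerequisite comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Shut down in the reverse of the startup order."""
    return list(reversed(start_order(depends_on)))
```

With this table, `start_order(DEPENDS_ON)` places `DB2A` first and `BILLING` last, and `stop_order` reverses that. A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is a convenient way to reject an inconsistent policy before it is used.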

Use Cases

    • Automated Subsystem Recovery: Automatically restarting a failed CICS region, DB2 subsystem, or IMS control region to restore service quickly.
    • Application Dependency Management: Ensuring that an IMS system starts only after the DB2 subsystem it depends on is fully operational, preventing startup failures.
    • Scheduled Maintenance Automation: Orchestrating the graceful shutdown and restart of applications and their components during planned maintenance windows, reducing manual effort and errors.
    • Disaster Recovery Failover: Automating the activation of backup systems and data replication processes to a recovery site in the event of a primary site failure.
    • Resource Reconfiguration: Dynamically reconfiguring or restarting components based on performance thresholds or resource contention to maintain service levels.
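
As a rough illustration of automated subsystem recovery, the sketch below decides whether a failed resource should be restarted or handed to an operator. It is a generic Python example, not an SA z/OS interface: the `RestartPolicy` class, its parameters, and the "restart"/"escalate" outcomes are all assumptions made for illustration. The idea is that restarts are automatic until a resource fails too often within a time window, at which point repeated restarting is more likely to mask a problem than fix it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RestartPolicy:
    """Hypothetical policy: allow at most max_restarts within window_seconds."""
    max_restarts: int = 3
    window_seconds: float = 600.0
    _history: deque = field(default_factory=deque)

    def on_failure(self, now: float) -> str:
        """Return 'restart' or 'escalate' for a failure observed at time `now`."""
        # Forget restarts that fall outside the sliding window.
        while self._history and now - self._history[0] > self.window_seconds:
            self._history.popleft()
        if len(self._history) >= self.max_restarts:
            return "escalate"  # too many recent restarts: involve an operator
        self._history.append(now)
        return "restart"
```

For example, with `RestartPolicy(max_restarts=2, window_seconds=60)`, failures at 0 and 10 seconds are restarted, a third failure at 20 seconds is escalated, and a failure at 100 seconds is restarted again because the earlier restarts have aged out of the window.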

Related Concepts

The Availability Manager, particularly SA z/OS, is tightly integrated with other z/OS components and products. SA z/OS runs on top of NetView, which provides its message and event automation, and it can use monitoring data from tools such as OMEGAMON and RMF to detect issues. It complements the Workload Manager (WLM) by ensuring that the resources WLM manages are available to meet performance goals, and it is a foundational technology for High Availability (HA) and Disaster Recovery (DR) strategies within the z/OS ecosystem.

Best Practices

    • Comprehensive Policy Definition: Develop detailed and robust automation policies that cover all critical applications, their dependencies, and potential failure scenarios, including specific recovery actions.
    • Regular Testing and Validation: Periodically test all automated recovery procedures, failover scenarios, and disaster recovery plans to ensure they function as expected and meet recovery time objectives (RTOs).
    • Integration with Monitoring: Ensure seamless integration with z/OS monitoring tools to provide the Availability Manager with accurate and timely alerts and performance data.
    • Granular Control and Grouping: Design policies to manage individual components as well as logical groups of resources, allowing for flexible and precise control over complex applications.
    • Thorough Documentation: Maintain up-to-date documentation of all automation policies, recovery procedures, and configuration settings for auditing, troubleshooting, and knowledge transfer.
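
The testing practice above can be made concrete with a small check that compares recovery times measured during a drill against each resource's RTO. This is a minimal sketch in Python; the function name, resource names, and all figures are hypothetical and chosen only to show the comparison.

```python
def rto_violations(measured_seconds, rto_seconds):
    """Return {resource: (measured, target)} for every drill that missed its RTO."""
    return {
        name: (measured, rto_seconds[name])
        for name, measured in measured_seconds.items()
        if measured > rto_seconds[name]
    }


# Hypothetical targets and drill measurements, in seconds.
RTO_SECONDS = {"DB2A": 300, "CICSA": 120}
DRILL_RESULTS = {"DB2A": 240, "CICSA": 185}
```

Here `rto_violations(DRILL_RESULTS, RTO_SECONDS)` reports only `CICSA`, which took 185 seconds against a 120-second target. A resource with a measurement but no defined RTO raises `KeyError`, which surfaces gaps in the documented objectives rather than silently passing them.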
