Modernization Hub

Intermittent

Occasional
Enhanced Definition

In the context of mainframe systems, "intermittent" describes a condition, error, or performance characteristic that occurs sporadically and unpredictably rather than on every run. Such issues are hard to diagnose and resolve because they are difficult to reproduce and often appear only under specific, transient circumstances.

Key Characteristics

    • Unpredictable Occurrence: Manifests at irregular intervals, making it difficult to anticipate or intentionally trigger for debugging.
    • Non-Reproducible: Often cannot be consistently replicated in test environments or even repeatedly in production, hindering problem determination.
    • Transient Nature: The underlying cause might be a temporary resource contention, timing issue, or environmental factor that resolves itself before full analysis can occur.
    • Diagnostic Challenge: Requires extensive logging, monitoring, and correlation of system events to identify the specific conditions under which the issue arises.
    • Potential for Impact: Despite being occasional, intermittent issues can still lead to production outages, data integrity problems, or significant performance degradation when they do occur.
    • Often Resource-Related: Frequently linked to transient shortages or contention for resources such as CPU, memory, I/O channels, enqueues, or network bandwidth.

Use Cases

    • Intermittent Program Abends: A COBOL or PL/I batch program occasionally ABENDs (abnormally terminates) with an S0C4 (protection exception) or S0C7 (data exception), but runs successfully most of the time.
    • Intermittent CICS Transaction Slowdowns: CICS transactions experience sporadic periods of high response times, often during peak load, but return to normal performance without intervention.
    • Intermittent DB2 Deadlocks: Applications accessing DB2 databases occasionally encounter deadlocks or timeouts, which are difficult to trace to a specific query or application logic.
    • Intermittent JCL Job Failures: A job stream fails in an IEFBR14 or IDCAMS step on some runs, typically because a dataset is temporarily unavailable or held by a conflicting enqueue.
    • Intermittent Network Connectivity Issues: Communication between LPARs or to external systems (e.g., MQ, TCP/IP sockets) occasionally drops or experiences latency spikes.

Related Concepts

Intermittent issues are a common challenge in problem determination (PD) and performance tuning on z/OS. They often relate to resource contention, system monitoring, logging, and error handling strategies. Their resolution typically involves a deep understanding of system internals, application behavior, and the effective use of diagnostic tools like SMF, RMF, IPCS, and OMEGAMON.
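One error handling strategy for transient conditions such as lock timeouts is to retry with a growing delay while logging every failed attempt, so the intermittent condition stays visible for later analysis. The following is a minimal Python sketch of that idea, not a mainframe API: `TransientError`, `retry`, and the parameter names are hypothetical, and the caller is assumed to translate a real transient failure (for example, a database timeout) into `TransientError`.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError, back off exponentially and retry.

    Each failed attempt is printed so the sporadic failure leaves a trace
    even when a later attempt succeeds. The final failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == attempts:
                raise
            # Exponential backoff plus jitter, so that concurrent retries
            # do not collide again at the same instant.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            sleep(delay)


# Example: an operation that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("resource temporarily unavailable")
    return "ok"


result = retry(flaky_operation, sleep=lambda seconds: None)  # skip real sleeping
```

Retrying hides the symptom rather than the cause, so the logged attempts should still feed into problem determination.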

Best Practices

  • Comprehensive Logging and Tracing: Implement detailed logging within applications and ensure z/OS system logs (SYSLOG, SMF, CICS logs, DB2 logs) are robust and retained for analysis.
  • Proactive Monitoring: Utilize z/OS monitoring tools (e.g., RMF, OMEGAMON, SYSVIEW) to capture performance metrics and system events continuously, looking for correlations with issue occurrences.
  • Reproducibility Efforts: Attempt to isolate and replicate the issue in a controlled test environment by simulating production loads, data volumes, or specific timing conditions.
  • Systematic Analysis: When an intermittent issue occurs, immediately gather all relevant diagnostic data, including dumps, logs, and traces, and perform a thorough timeline analysis.
  • Resource Contention Analysis: Investigate potential transient resource shortages (CPU, memory, I/O, enqueues, dataset locks) using RMF reports and specialized tools.
  • Change Management Correlation: Review recent system, application, or configuration changes (applied PTFs and APAR fixes, application deployments) that might have introduced the intermittent behavior.
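
The timeline analysis and correlation steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the output format of RMF or any other monitor: the metric names (`cpu_pct`, `enq_wait_ms`), the thresholds, the five-minute window, and the `correlate` function are all hypothetical, and the samples are assumed to have already been extracted into (timestamp, metric, value) tuples.

```python
from collections import Counter
from datetime import datetime, timedelta


def correlate(incidents, samples, thresholds, window=timedelta(minutes=5)):
    """Count, per metric, how many incidents were preceded by an elevated sample.

    incidents  -- list of datetime values, one per occurrence of the issue
    samples    -- list of (timestamp, metric_name, value) tuples
    thresholds -- dict mapping metric_name to the value considered elevated
    window     -- how far back before each incident to look
    """
    hits = Counter()
    for occurred_at in incidents:
        elevated = set()
        for sampled_at, metric, value in samples:
            in_window = occurred_at - window <= sampled_at <= occurred_at
            # Metrics without a threshold are never treated as elevated.
            if in_window and value > thresholds.get(metric, float("inf")):
                elevated.add(metric)
        # Count each metric at most once per incident.
        hits.update(elevated)
    return hits


# Hypothetical data: two incidents and a handful of monitoring samples.
day = datetime(2024, 1, 15)
incidents = [day.replace(hour=10, minute=0), day.replace(hour=14, minute=30)]
samples = [
    (day.replace(hour=9, minute=57), "cpu_pct", 97),
    (day.replace(hour=9, minute=58), "enq_wait_ms", 20),
    (day.replace(hour=12, minute=0), "cpu_pct", 99),  # not near any incident
    (day.replace(hour=14, minute=28), "cpu_pct", 95),
    (day.replace(hour=14, minute=29), "enq_wait_ms", 900),
]
thresholds = {"cpu_pct": 90, "enq_wait_ms": 500}

hits = correlate(incidents, samples, thresholds)
for metric, count in hits.most_common():
    print(f"{metric}: elevated before {count} of {len(incidents)} incidents")
```

A metric that is elevated before most incidents is a candidate for closer investigation, not proof of cause; the same metric may also be elevated at times when nothing fails, as the 12:00 sample shows.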
