Intermittent
Occasional
Enhanced Definition
In the context of mainframe systems, "intermittent" describes a condition, error, or performance characteristic that occurs sporadically and unpredictably rather than consistently. Such issues are difficult to diagnose and resolve because they are hard to reproduce, often appearing only under specific, transient circumstances.
Key Characteristics
- Unpredictable Occurrence: Manifests at irregular intervals, making it difficult to anticipate or intentionally trigger for debugging.
- Non-Reproducible: Often cannot be consistently replicated in test environments or even repeatedly in production, hindering problem determination.
- Transient Nature: The underlying cause might be a temporary resource contention, timing issue, or environmental factor that resolves itself before full analysis can occur.
- Diagnostic Challenge: Requires extensive logging, monitoring, and correlation of system events to identify the specific conditions under which the issue arises.
- Potential for Impact: Despite being occasional, intermittent issues can still lead to production outages, data integrity problems, or significant performance degradation when they do occur.
- Often Resource-Related: Frequently linked to transient shortages or contention for resources such as CPU, memory, I/O channels, enqueues, or network bandwidth.
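The correlation work described under Diagnostic Challenge and Often Resource-Related can be sketched in a few lines. The Python example below is a minimal illustration, not a feature of any monitoring product. It assumes that incident timestamps and per-interval CPU busy percentages have already been extracted (for instance from RMF-style interval reports) into the hypothetical `IntervalSample` record. It then compares average utilization in intervals that contained an incident against intervals that did not.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class IntervalSample:
    """One monitoring interval (hypothetical record, not an RMF/SMF layout)."""
    start: datetime
    length: timedelta
    cpu_busy_pct: float

    def contains(self, t: datetime) -> bool:
        # Half-open interval [start, start + length) so adjacent intervals never overlap.
        return self.start <= t < self.start + self.length


def contention_profile(
    samples: Sequence[IntervalSample], incidents: Sequence[datetime]
) -> tuple[Optional[float], Optional[float]]:
    """Return (mean CPU busy % of intervals containing an incident,
    mean CPU busy % of intervals without one); None if a group is empty."""
    with_incident: list[float] = []
    without_incident: list[float] = []
    for s in samples:
        if any(s.contains(t) for t in incidents):
            with_incident.append(s.cpu_busy_pct)
        else:
            without_incident.append(s.cpu_busy_pct)
    return (
        mean(with_incident) if with_incident else None,
        mean(without_incident) if without_incident else None,
    )


if __name__ == "__main__":
    # Illustrative, made-up data: six 15-minute intervals and two incidents.
    base = datetime(2024, 1, 1, 8, 0)
    busy = [42.0, 55.0, 97.0, 60.0, 98.5, 47.0]
    samples = [
        IntervalSample(base + timedelta(minutes=15 * i), timedelta(minutes=15), pct)
        for i, pct in enumerate(busy)
    ]
    incidents = [base + timedelta(minutes=33), base + timedelta(minutes=61)]
    print(contention_profile(samples, incidents))  # (97.75, 51.0)
```

A large gap between the two averages suggests that the issue tracks resource pressure. It is still only a correlation, and the same comparison can be repeated for other metrics such as enqueue waits or I/O rates.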
Use Cases
- Intermittent Program Abends: A COBOL or PL/I batch program occasionally ABENDs (abnormally terminates) with an S0C4 or S0C7 error, but runs successfully most of the time.
- Intermittent CICS Transaction Slowdowns: CICS transactions experience sporadic periods of high response times, often during peak load, but return to normal performance without intervention.
- Intermittent DB2 Deadlocks: Applications accessing DB2 databases occasionally encounter deadlocks or timeouts, which are difficult to trace to a specific query or application logic.
- Intermittent JCL Job Failures: A JCL job stream fails in an IEFBR14 or IDCAMS step on some runs, typically due to temporary dataset unavailability or enqueue conflicts.
- Intermittent Network Connectivity Issues: Communication between LPARs or to external systems (e.g., MQ, TCP/IP sockets) occasionally drops or experiences latency spikes.
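What separates these cases from an ordinary failure is the pattern across runs: the same job or transaction sometimes fails but usually does not. The minimal Python sketch below captures that distinction. It assumes run history has already been reduced to one entry per run (`None` for a successful run, otherwise an abend code string such as "S0C7"). The function name and its three labels are illustrative, not an established classification.

```python
from collections import Counter
from typing import Optional, Sequence


def summarize_runs(results: Sequence[Optional[str]]) -> tuple[str, float, Counter]:
    """Summarize repeated runs of one job.

    results: one entry per run, None for success, else an abend code.
    Returns (pattern, failure_rate, failures_by_code).
    """
    if not results:
        raise ValueError("need at least one run")
    # Count failures per abend code; successful runs (None) are skipped.
    failures = Counter(r for r in results if r is not None)
    failed = sum(failures.values())
    rate = failed / len(results)
    if failed == 0:
        pattern = "clean"
    elif failed == len(results):
        pattern = "consistent"    # fails every time: usually reproducible
    else:
        pattern = "intermittent"  # fails on some runs only
    return pattern, rate, failures


if __name__ == "__main__":
    # Made-up history of ten runs of the same batch job.
    history = [None, None, "S0C7", None, None, None, "S0C7", None, None, "S0C4"]
    print(summarize_runs(history))
```

In practice the history would come from job-completion records such as SMF data. That extraction step is outside this sketch.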
Related Concepts
Intermittent issues are a common challenge in problem determination (PD) and performance tuning on z/OS. They often relate to resource contention, system monitoring, logging, and error handling strategies. Their resolution typically involves a deep understanding of system internals, application behavior, and the effective use of diagnostic tools like SMF, RMF, IPCS, and OMEGAMON.
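One common error-handling strategy for transient failures is to retry the unit of work a bounded number of times with increasing delays. Each retry should be logged so that the intermittent behavior stays visible for problem determination. The Python sketch below is generic and hypothetical. `TransientError` stands in for whatever the application treats as retryable; for DB2, deadlock or timeout conditions reported as SQLCODE -911 or -913 are typical candidates. Retrying is only safe when the failed attempt has been rolled back or is otherwise harmless to repeat.

```python
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(); on TransientError wait and try again.

    The base wait doubles each attempt (base_delay, 2*base_delay, ...) and
    random jitter of up to the same amount is added, so concurrent retries
    are less likely to collide again.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                log.error("giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            # Log every retry: silently absorbed failures hide the pattern.
            log.warning("attempt %d failed (%s); retrying in ~%.2fs", attempt, exc, delay)
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waits. The total wait should typically stay well below any transaction or job time limit.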
Best Practices
- Comprehensive Logging and Tracing: Implement detailed logging within applications and ensure z/OS system logs (SYSLOG, SMF, CICS logs, DB2 logs) are captured completely and retained long enough for analysis.
- Proactive Monitoring: Use z/OS monitoring tools (e.g., RMF, OMEGAMON, SYSVIEW) to capture performance metrics and system events continuously, looking for correlations with issue occurrences.
- Reproducibility Efforts: Attempt to isolate and replicate the issue in a controlled test environment by simulating production loads, data volumes, or specific timing conditions.
- Systematic Analysis: When an intermittent issue occurs, immediately gather all relevant diagnostic data, including dumps, logs, and traces, and perform a thorough timeline analysis.
- Resource Contention Analysis: Investigate potential transient resource shortages (CPU, memory, I/O, enqueues, dataset locks) using RMF reports and specialized tools.
- Change Management Correlation: Review recent system, application, or configuration changes (PTFs and the APARs they address, application deployments) that might have introduced the intermittent behavior.
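Timeline analysis and change correlation both come down to merging time-stamped events from several sources and examining what happened around the failure. The minimal Python sketch below assumes each source (SYSLOG extracts, DB2 messages, a change log, and so on) has already been parsed into the hypothetical `Event` record and sorted by time. Parsing the real log formats is deliberately left out.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical normalized event; `source` labels are free-form."""
    when: datetime
    source: str
    text: str


def build_timeline(
    sources: Iterable[list[Event]],
    incident: datetime,
    before: timedelta,
    after: timedelta,
) -> list[Event]:
    """Merge already time-sorted per-source event lists into one ordered
    timeline and keep events from (incident - before) to (incident + after)."""
    merged = heapq.merge(*sources, key=lambda e: e.when)
    start, end = incident - before, incident + after
    return [e for e in merged if start <= e.when <= end]


if __name__ == "__main__":
    t0 = datetime(2024, 3, 5, 2, 0)  # made-up example data
    syslog = [Event(t0, "SYSLOG", "job started"),
              Event(t0 + timedelta(minutes=9), "SYSLOG", "S0C4 abend")]
    db2 = [Event(t0 + timedelta(minutes=8), "DB2", "lock timeout")]
    changes = [Event(t0 - timedelta(hours=6), "CHANGE", "PTF applied"),
               Event(t0 + timedelta(minutes=5), "CHANGE", "config refresh")]
    abend_at = t0 + timedelta(minutes=9)
    for e in build_timeline([syslog, db2, changes], abend_at,
                            before=timedelta(minutes=10), after=timedelta(minutes=2)):
        print(e.when.strftime("%H:%M"), e.source, e.text)
```

With a 10-minute look-back, the PTF applied six hours earlier falls outside the window. Change correlation usually needs a much longer `before` window (hours or days), which is why the two bounds are separate parameters.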