Data Set
A fundamental unit of data storage on IBM mainframe systems running z/OS, representing a collection of logically related records. It is the mainframe equivalent of a file in other operating systems, managed by z/OS and its data management services.
Key Characteristics
- Organization: Data sets can be organized in several ways, including Sequential Data Set (PS), Partitioned Data Set (PDS) or Partitioned Data Set Extended (PDSE), and Virtual Storage Access Method (VSAM) (KSDS, ESDS, RRDS, LDS). Successive versions of a data set can also be grouped as a Generation Data Group (GDG).
- Naming Convention: Each data set is identified by a data set name (DSN), a hierarchical name of up to 44 characters built from period-separated qualifiers of 1 to 8 characters each (e.g., PROD.APPL.COBOL.SOURCE).
- Allocation: Space on Direct Access Storage Devices (DASD) must be allocated before the data set is used, through JCL or utilities, specifying attributes such as space, record format, and block size.
- Attributes: Defined by characteristics such as RECFM (Record Format: Fixed, Variable, or Undefined), LRECL (Logical Record Length), BLKSIZE (Block Size), and DSORG (Data Set Organization).
- Management: Managed by DFSMSdfp (Data Facility Product), the z/OS component that handles allocation, cataloging, and I/O operations.
- Cataloging: Most production data sets are cataloged in an Integrated Catalog Facility (ICF) catalog so that programs and users can locate them by DSN without supplying volume information.
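The naming rules above can be illustrated with a small check. This is a minimal Python sketch, not part of z/OS: the function name is_valid_dsn is hypothetical, and it assumes the name is already in uppercase and covers only the basic rules (total length, qualifier length, permitted characters), not site standards or SMS restrictions.

```python
import re

# One qualifier: 1 to 8 characters. The first must be alphabetic or
# national (@, #, $); the rest may also be digits or a hyphen.
_QUALIFIER = re.compile(r"[A-Z@#$][A-Z0-9@#$-]{0,7}")
MAX_DSN_LENGTH = 44  # total length, including the separating periods


def is_valid_dsn(dsn: str) -> bool:
    """Return True if dsn satisfies the basic data set naming rules."""
    if not dsn or len(dsn) > MAX_DSN_LENGTH:
        return False
    # Every period-separated qualifier must match the pattern in full;
    # an empty qualifier (two adjacent periods) fails the match.
    return all(_QUALIFIER.fullmatch(q) for q in dsn.split("."))


print(is_valid_dsn("PROD.APPL.COBOL.SOURCE"))  # True
print(is_valid_dsn("PROD.APPLICATION.DATA"))   # False: qualifier longer than 8
print(is_valid_dsn("1PROD.DATA"))              # False: qualifier starts with a digit
```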
Use Cases
- Source Code Storage: PDS or PDSE libraries are commonly used to store source code for programs written in COBOL, PL/I, or Assembler, as well as JCL procedures.
- Program Libraries: PDS and PDSE libraries store executable programs (load modules in a PDS, program objects in a PDSE) that are invoked by batch jobs, CICS transactions, or IMS applications.
- Transaction Data: VSAM KSDS or ESDS data sets are frequently used by online transaction processing systems like CICS and IMS for high-volume random or sequential access to application data.
- Batch Processing Input/Output: Sequential data sets are used extensively for input files, intermediate work files, and output reports generated by batch jobs.
- System Logs and Journals: Sequential data sets or VSAM ESDS can store system logs, audit trails, and journal records for recovery or compliance.
Related Concepts
Data sets are the fundamental building blocks for data storage on z/OS. They are defined and manipulated using JCL (Job Control Language) statements (specifically DD statements) to specify their names, attributes, and access methods for batch jobs. COBOL and other programming languages interact with data sets through file definitions (SELECT, FD) and I/O statements (OPEN, READ, WRITE). CICS and DB2 leverage VSAM data sets for their underlying data storage, with DB2 managing its own data within VSAM Linear Data Sets (LDS).
Best Practices
- Descriptive Naming: Use clear, hierarchical DSNs that indicate ownership, application, and content (e.g., SYS1.PROD.APPL.DATA.MASTFILE) for easier identification and management.
- Optimal Block Size: Choose a BLKSIZE that optimizes I/O performance and DASD utilization. For fixed-length records it must be a multiple of LRECL, and it is usually sized so that blocks fill a track efficiently (for example, half-track blocking); coding BLKSIZE=0 lets the system determine the value.
- Cataloging: Always catalog production data sets in an ICF catalog to simplify access and facilitate data set management.
- Space Allocation: Allocate sufficient primary and secondary space to prevent x37 abends (B37, D37, E37), but avoid over-allocating space that wastes DASD.
- GDG Usage: Use Generation Data Groups (GDGs) for sequential files that are regularly updated or archived; they simplify JCL and provide built-in versioning.
- Data Set Security: Implement RACF (or equivalent security product) profiles to control access to data sets, granting access levels such as READ, UPDATE, and ALTER to protect sensitive information.
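The block size guidance above can be made concrete with a little arithmetic. The following Python sketch assumes RECFM=FB data on 3390 DASD, where 27998 bytes is the largest block size that still allows two blocks per track; the function name half_track_blksize is hypothetical and only mirrors the calculation, it is not a z/OS interface.

```python
HALF_TRACK_3390 = 27998  # largest block size allowing two blocks per 3390 track


def half_track_blksize(lrecl: int, limit: int = HALF_TRACK_3390) -> int:
    """Largest multiple of lrecl that does not exceed limit (RECFM=FB)."""
    if lrecl <= 0 or lrecl > limit:
        raise ValueError("LRECL must be between 1 and the block size limit")
    # Integer division gives the number of whole records per block.
    return (limit // lrecl) * lrecl


print(half_track_blksize(80))   # 27920 (349 records per block)
print(half_track_blksize(133))  # 27930 (210 records per block)
```

In practice this calculation is rarely done by hand, since leaving the block size to the system yields a comparable result; the sketch only shows why values such as 27920 are so common for 80-byte records.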