Extracting Data from LFS Files

This topic describes the TMON Data Extract File Service (herein called Data Extract). It describes how to set up a Data Extract job and the control statements needed to select, limit, and sort the data.

Data Extract provides you with the capability to retrieve data from a TMON product’s log files and use it as input to another system. While TMON Report Writer and NaviGraph allow you to process and display data collected by a TMON product, they provide only limited capability to combine and manipulate that data. Data Extract allows you to select the data fields for the TMON log records and output them to a file in a generalized format. Then you can input the file to another software package (e.g., ASG-Safari) or download it to a PC or other distributed system.

Standard SYSUT2 Output

With standard SYSUT2 output processing, Data Extract converts LFS data to normalized forms suitable for processing in a distributed environment. This batch service permits the selection, sorting, and filtering of LFS data to create data in normalized form, without the repeating segments of LFS data. The normalized data structure is referred to as an aggregate.

Creation of an aggregate is accomplished via a batch utility, program $DXTRACT. It accepts as input a set of control statements and an LFS file:

The control statements indicate which TMON product and aggregate are being processed, which fields are requested (if less than an entire aggregate), filtering constraints, and data output order. You can also format the output by separating each field with a character, and by formatting dates and times in printable form. The control statements are standard 80-column card images.
The LFS file can be any archive or summarized LFS file containing data of the specified TMON product and aggregate. Online access to TMON product data is also supported. Sample members are provided for each product record that is supported by $DXTRACT. See Aggregate Data Elements Summary Report for an example of an aggregate element list.

$DXTRACT output is an aggregate file, a data map file, and a summary file that reports the results of the data extraction and conversion.

See Special SYSUT3 Output if SYSUT3 file output is selected for data record reduction.
The output file is the aggregate data as requested and formatted via the control statements.
The data map file describes the normalized data structure: its field names and their characteristics, including offset within aggregate records, length, format, and units. This file describes only unformatted data; if anything other than FORMAT(NONE) is specified, the offsets of the fields in the output will not match the outputs listed in the description.
The summary file reports the results of the data extraction and conversion, including messages and the number of records read and processed.

Special SYSUT3 Output

The optional SYSUT3 output file produces a subset of the archive input file that reflects only the selection criteria of the WHERE clause and is not otherwise specially formatted or processed.

SYSUT3 record output is only intended to provide unprocessed, but selected, records for passing to a subsequent $DXTRACT step. It should be used to select more complex data with multiple job steps, each of which further reduces the set of qualifying records using the step’s selection criteria. WHERE (condition [AND|OR condition]...) for an explanation of allowable selection criteria.

Job Control Language

This JCL is used to execute data extraction.

A sample of this JCL is distributed with each TMON product and is named pppXTRCT, where ppp is the product code (e.g., TCE for TMON for CICS TS, TCP for TMON for TCP/IP, etc.). See the TMON product’s system administrator document for a list of the samples that support Data Extract.

//EXTRACT EXEC PGM=$DXTRACT,REGION=4M

//STEPLIB DD DSN=pppAPF.pppLOAD,DISP=SHR

// DD DSN=LSSAPF.LMKLOAD,DISP=SHR

// DD DSN=LSRAPF.LMKRLOD,DISP=SHR

//SYSIN DD *

control statements, described below

//***************************************************************

//*

//* Input data - Compressed/Decompressed Archive or Summary File

//*

//* Uncomment one of the SYSUT1 cards

//****************************************************************

//*

//* Archive File

//*

//****************************************************************

//*SYSUT1 DD DSN=pppPFX.cccc0n(0),DISP=SHR > archive file

//*

//****************************************************************

//*

//* Decompressed Archive File

//*

//****************************************************************

//*SYSUT1 DD DSN=pppPFX.EXPANDED.cccc0n,DISP=SHR > Decompressed

//* archive file

//*

//****************************************************************

//*

//* Summary File

//*

//****************************************************************

//SYSUT1 DD DSN=pppQUAL.cccc0n.HISTOUT(0),DISP=SHR > History file

//*

//****************************************************************

//*

//* On-line LFS data

//*

//****************************************************************

//*LMRKA DD DSN=pppPFX.vsam-on-line.cccc0n,DISP=SHR

//*LMRKB DD DSN=pppPFX.vsam-on-line.cccc0n,DISP=SHR

//*LMCFnnnn DD DSN=pppPFX.vsam-on-line.cccc0n,DISP=SHR

//*

//****************************************************************

//* Output data - extracted data

//*

//****************************************************************

//SYSUT2 DD DSN=output_aggregate_data,

// UNIT=SYSDA,

// SPACE=(TRK,(225,15),RLSE),

// DCB=(RECFM=VB,LRECL=27994,BLKSIZE=27998),

// DISP=(NEW,CATLG)

//SYSPRINT DD SYSOUT=* Summary Report

//DATAMAP DD SYSOUT=* Data map

//SYSUDUMP DD SYSOUT=*

//*

//****************************************************************

//* Optional Output data - LFS Archive selected subset of data

//* (uncomment if needed)

//*

//* NOTE. LFS Archive output is for use in multi-step data record reduction

//* and is only produced if a SYSUT3 DD file output is supplied.

//* (if the ORDERBY parm is used in control statements then the

//* SYSUT3 DD is ignored).

//*

//****************************************************************

//*SYSUT3 DD DSN=&&TEMP,DISP=(,PASS), >>> selected LFS data records (only)

//* UNIT=Z#RSYSDA,

//* SPACE=(CYL,(12,12),RLSE),

//* DCB=(SYS1.LINKLIB,DSORG=PS,RECFM=VB,LRECL=32756,BLKSIZE=32760),

//*

//*

//****************************************************************

//* Optional - Multi-Step Selection Additional Step, using the subset of

//* data selected in the prior Extract step.

//*

//****************************************************************

//*

//*EXTRACT2 EXEC PGM=$DXTRACT,REGION=4M

//*STEPLIB DD DSN=pppAPF.pppLOAD,DISP=SHR

//* DD DSN=LSSAPF.LMKLOAD,DISP=SHR

//* DD DSN=LSRAPF.LMKRLOD,DISP=SHR

//*SYSIN DD *

//* control statements, described below

//*

//****************************************************************

//* Input File - temporary Archive file passed from previous step

//*

//****************************************************************

//*SYSUT1 DD DSN=&&TEMP,DISP=(OLD,PASS,KEEP)

//*

//****************************************************************

//* Output data - extracted data

//*

//****************************************************************

//*SYSUT2 DD DSN=output2_aggregate_data,

//* UNIT=SYSDA,

//* SPACE=(TRK,(225,15),RLSE),

//* DCB=(RECFM=VB,LRECL=27994,BLKSIZE=27998),

//* DISP=(NEW,CATLG)

//*SYSPRINT DD SYSOUT=* Summary Report

//*DATAMAP DD SYSOUT=* Data map

//*SYSUDUMP DD SYSOUT=*

DD Statements

This section describes each DD statement in the JCL model.

DATAMAP

Identifies the description file of the aggregate records. Each output field is described by this file. The record format of DATAMAP is FBA, and the record length is 121. Typically, DATAMAP is routed to SYSOUT, but it may be routed to a dataset instead, in which case a block size equal to a multiple of 121 must be specified.

LMCFnnnn

Identifies a series of a product’s active LFS datasets, where nnnn is any 1- to 4-character combination of valid characters for DD names. This permits any number of online files to be read by $DXTRACT, which reads the files in the order in which they appear in the JCL. In an active system, online data is volatile and may change even as it is being read. $DXTRACT, however, never modifies the data and ignores LMCFnnnn if //LMRKA, or //LMRKB, or //SYSUT1 (archive data) is present in the JCL.

LMRKA/LMRKB

Identifies a product’s active LFS data. When reading these file(s), $DXTRACT begins with LMRKA, then at end-of-file, begins reading LMRKB. If LMRKA is not present, $DXTRACT begins with LMRKB. In an active system, online data is volatile and may change even as it is being read. $DXTRACT, however, never modifies the data and ignores LMRKA/LMRKB if //SYSUT1 (archive data) is present in the JCL.

SYSIN

Identifies the file of control statements to be used. Control statements must be specified on fixed-block, 80-byte records. Block size may be of any size.

SYSOUT

Identifies the summary report of SORT, which is invoked only if the control statement ORDERBY is employed. SYSOUT can be dynamically allocated by SORT and need not be specified in your JCL.

SYSPRINT

Identifies the summary report for $DXTRACT. All control statements are listed along with any informational, warning, or error messages that may have been produced. Input and output record counts are also listed. The record format of SYSPRINT is FBA, and the record length is 121. Typically, SYSPRINT is routed to SYSOUT, but it may be routed to a dataset instead, in which case you must specify a block size equal to a multiple of 121. File SYSPRINT is dynamically allocated and need not be specified. If this DD name is specified in the JCL, however, dynamic allocation will not occur and the JCL specification will be honored.

SYSUT1

Identifies the input file of archive LFS records. Compressed or expanded forms of LFS data are supported, as well as summary (or history) files. Online data is not supported through SYSUT1; instead, use LMCFnnnn or LMRKA/LMRKB. SYSUT1, if present, prevents the reading of online files by $DXTRACT.

SYSUT2

Identifies the output file of aggregate records. The record format of output file SYSUT2 is always variable-blocked (VB) and need not be specified in the JCL. Record length and block size are set to 27994 and 27998, respectively.

SYSUT3

Identifies the optional output file of selected LFS records from an input file as selected by the SELECT statement’s WHERE logic. This option can be used to narrow down the selection of records as a way to handle complex selections of the data by means of multiple job steps, each of which further reduces the set of selected records. If an ORDERBY statement is included in the control statements, the SYSUT3 DD is ignored and no output is produced.

Control Statements

You use the control statements to select the input data and control the output format. The statements are 80-column card images, using columns 1 through 72. Columns 73 through 80 are ignored by $DXTRACT. $DXTRACT requires the SELECT and OUTPUT statements, which each has a number of subparameters that define how the data should be selected and formatted. This is the control syntax for $DXTRACT:

* Select parameters follow

SELECT ( (*|field [,field]...) -

FROM (PRODUCT(pppppppp), AGGREGATE(aggregate_name)) -

[WHERE (condition [AND|OR condition]...)] -

[ORDERBY (field,A|D [,field,A|D]...)] )

 

* Output parameters follow

OUTPUT (FORMAT (COMMASEPARATED|NONE) -

[DATE(date_pattern)] [TIME(time_pattern)] -

[HEADINGS(YES)] [SEPARATOR(‘c’|X’hh’)] ) -

[NONPRINTABLE(ASIS|BLANK|?)] -

[QUALIFIER(NONE|QUOTE|APOST)] -

[LMRKCLK(HEX)]

The dash character (-), as seen on the right end of some statements, is a continuation character and must be specified to indicate that the statement continues on the next line.

The SELECT statement, although similar in syntax to Structured Query Language (SQL), supports only the parameters shown.

Parameters

This section describes the parameters of the SELECT and OUTPUT statements in alphabetical order.

*|field [,field]...

Required. Specifies names of fields to be extracted from an LFS file to form the aggregate. An asterisk (*) indicates that all fields defined to the complete aggregate are to be extracted. Specifying a name other than "*" indicates that a specific list of fields is to be extracted. Derived fields are not supported.

condition

Specifies a filter condition that must be met to select input for the extraction process. The format of condition is:

field-name comparator value|field-name-2

where:

field-name is the name of the field being tested.

comparator is the way the test is conducted and is represented by one of these values:

Comparator

Description

LT or <

Less than

LE or <= or !>

Less than or equal

EQ or =

Equal to

NE or != or <>

Not equal to

GE or >= or !<

Greater than or equal to

GT or >

Greater than

value is the value against which the value of the field will be tested. value must be delimited with apostrophes or quotation marks when an alphabetic literal, date, or time is specified. Do not use delimiters with other numeric fields or with the name of a field.

Values for date fields must be in yyyy-mm-dd format, where yyyy specifies the year, mm specifies the month, and dd specifies the day. Specify any leading zeros - the date must occupy 10 characters. Values for time fields must be of form hh:mm:ss.nnnnnn, where hh, mm, ss, and nnnnnn specify the numeric hour, minute, second, and fractional second of the desired time. Specify leading zeros. The minimum time specification is hh:mm:ss - eight characters. The fractional second is optional and may be used to specify increased precision. Use from one to six digits preceded by a decimal point.

field-name-2 is the name of a field against which field-name will be compared. If field-name-2 is specified, value cannot be coded.

DATE (date_pattern)

Specifies a pattern in which all date fields in the output file will be displayed when FORMAT is not set to NONE. The default DATE pattern is YYYY-MM-DD. Date fields created with a pattern are in printable format. If you want to use a delimiter (space or comma) as the separator between the date elements, you must enclose the pattern in apostrophes or quotation marks. DATE is ignored when FORMAT(NONE) is in effect.

The pattern is a combination of the letters D, M, and Y that indicate day, month, and year, respectively. This table describes valid date patterns:

Value

Description

D

Numeric day of the month from 1 through 31 (no leading zero).

DD

Numeric day of the month from 01 through 31 (leading zero is required).

DDD

Name of the day of the week in three characters: SUN, MON, TUE, WED, THU, FRI, and SAT.

DDDD

Name of the day of the week in multiple characters: SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.

M

Numeric month of the year from 1 through 12 (no leading zero).

MM

Numeric month of the year from 01 through 12 (leading zero is required).

MMM

Name of the month of the year in three characters: JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, and DEC.

MMMM

Name of the month of the year in multiple characters: JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, and DECEMBER.

YY

Two-digit year (leading zero is required).

YYYY

Four-digit year.

Any other character in the pattern is considered a separation character and is reproduced in the output as is. You can delimit date_pattern with apostrophes or quotation marks.

FORMAT (COMMASEPARATED|NONE)

Specifies that output fields are to be separated by commas (COMMASEPARATED) or not separated at all (NONE). Default is NONE. COMMASEPARATED, which may be abbreviated CSV, specifies a field-separated format. The default field separator character is the comma and may be overridden with the SEPARATOR(c|X’hh’) parameter. See SEPARATOR(c|X‘hh’).

When COMMASEPARATED is specified, all time and date fields are formatted according to their respective patterns, and all numeric fields are converted to display (zoned decimal).
When NONE is specified, either explicitly or by default, all other formatting operands are ignored (DATE, HEADINGS, TIME, NONPRINTABLE, QUALIFIER, and SEPARATOR). See Data Description for Extracted Data with FORMAT(NONE) for more information.

FROM (PRODUCT(pppppppp), AGGREGATE(aggregate_name))

Specifies the 6- to 8-character code of the product (pppppppp) and the name of an aggregate (aggregate_name) of that product from which data is to be extracted. Aggregate names can be up to eight characters long. FROM is a required parameter. This table lists the product codes:

Code

Product

TMONCICS

TMON for CICS TS

TMONDB2

TMON for DB2

TMONIMS

TMON for IMS

TMONMQ

TMON for MQ

TMONMVS

TMON for z/OS

TMONTCP

TMON for TCP/IP

TMONVTAM

TMON for VTAM

TMONWEB

TMON for Web Services

HEADINGS(YES)|(NO)

Specifies whether the names of the fields extracted will be output, separated by commas, as the first record in the output dataset (//SYSUT2). The default is NO. HEADINGS is ignored when FORMAT(NONE) is in effect.

LMRKCLK(HEX)

Specifies that when the LMRKCLK field is selected for retrieval, it should be written to the output file in a hexadecimal format (e.g., X’5E31A8007B42F511’). If the LMRKCLK(HEX) parameter is not specified, the field is written to the output file as a 20-digit decimal number.

NONPRINTABLE(ASIS|BLANK|?)

Specifies that character fields containing one or more bytes not represented in the character set (e.g., X’00’) either remain unchanged (ASIS) or be replaced with a blank (X’40’) or a question mark (?). ASIS is the default.

ORDERBY (field,A|D [,field,A|D]...)

Specifies the order of the output in field-value hierarchy and by ascending (A) or descending (D) order by field. The output will be created in order of the first field specified, followed by the second, etc., down to the last field specified.

If an ORDERBY statement is specified in the control statements for the current job step, any SYSUT3 DD is ignored.

QUALIFIER(NONE|APOST|QUOTE)

Specifies how character fields are to be delimited. The specified character will be placed prior to the beginning of the string and after the end of the string. If a character string contains the specified QUALIFIER character, that character is doubled. For example, if you specify QUALIFIER(APOST) and the character string is ab’cd, then the output string would be ‘ab’’cd’. NONE is the default.

SEPARATOR(c|X‘hh’)

Specifies an alternative field separation character to the comma, which is the default. c is any single, specific character, and must be enclosed within apostrophes (‘ ’) or quotation marks (“ ”). X’hh’ is a hexadecimal value, where h represents a valid hex digit (0 through 9, A through F). You can specify x in either lowercase or uppercase. If an apostrophe itself is desired as the separation character, specify SEPARATOR("’") (quote-apostrophe-quote). Similarly, if the quotation mark is desired as the separation character, specify SEPARATOR(‘"’) (apostrophe-quote-apostrophe). SEPARATOR is ignored when FORMAT(NONE) is in effect.

TIME (time_pattern)

Specifies a pattern in which all time fields in the output file will be displayed when FORMAT is not set to NONE. The default TIME pattern is HH:MM:SS. Time fields created with a pattern will be in printable format. If you want to use a delimiter (space or comma) as the separator between the TIME elements, the time_pattern must be enclosed in apostrophes or quotation marks. TIME is ignored when FORMAT(NONE) is in effect.

The pattern is a combination of the letters H, M, N, and S that indicate hours, minutes, fractions, and seconds, respectively. This table describes valid time patterns:

Value

Description

H

For time of daynumeric hours from 0 through 23 (no leading zero).

For elapsed time–numeric hours 0 through 99 (no leading zero).

HH

For time of daynumeric hours from 00 through 23 (leading zero is required).

For elapsed time–numeric hours 0 through 99 (leading zero is required).

HHH

For time of daynumeric hours from 00 through 23 (leading zero is required).

For elapsed time–numeric hours 0 through 999 (leading zero is required if the value matches the HH pattern).

HHHH

For time of daynumeric hours from 00 through 23 (leading zero is required).

For elapsed time–numeric hours 0 through 8750 (leading zero is required if the value matches the HH pattern).

M

Numeric minutes from 0 through 59 (no leading zero).

MM

Numeric minutes from 00 through 59 (leading zero is required).

S

Numeric seconds from 0 through 59 (no leading zero).

SS

Numeric seconds from 00 through 59 (leading zero is required).

N through NNNNNN

Numeric fractions of seconds from tenths (N = 10-1) through microseconds (NNNNNN = 10-6).

Any other character in the pattern is considered a separation character and is reproduced in the output as is.

WHERE (condition [AND|OR condition]...)

Specifies conditions, or combinations of conditions, that must be met for a record to be selected for output. You can use the operators AND or OR to form the selection filter criteria. AND and OR operators are evaluated in order from left to right. ASG recommends that the combination of conditions within a WHERE statement be limited to simple selection of either a series of AND conditions or a series of OR conditions.

If more complex selections are required (e.g., logically nested selection criteria), you can use a multiple job step approach using the optional SYSUT3 DD output file that will provide unprocessed, but selected, records to be passed to one or more subsequent $DXTRACT job steps. Then the set of data records is successively reduced until only those records you want to analyze or display remain. Example 4.

The Data Extract SELECT statement, although similar in syntax to Structured Query Language (SQL), supports only the parameters shown.

Examples

These examples of control statements illustrate the descriptions in this section.

Example 1

SELECT ((*) -

FROM (PRODUCT(TMONDB2),AGGREGATE(DBMAIN)) )

This example selects all fields for aggregate DBMAIN from the input file. The fields are written to the output file without field separators. This example shows the minimum control statement specification.

Example 2

SELECT ((*) -

FROM (PRODUCT(TMONDB2),AGGREGATE(DBMAIN)) )

OUTPUT (FORMAT (COMMASEPARATED) DATE(yyyy-mm-dd) TIME(hh:ss))

This example selects all fields for aggregate DBMAIN from the input file and places a comma (,) between each output field. Date and time fields are output as character data in the forms indicated.

Example 3

SELECT ((LMRKDATE LMRKTIME DBHSSSID) -

FROM (PRODUCT(TMONDB2),AGGREGATE(DBMAIN)) -

WHERE (DBHSSSID EQ 'DB71') -

ORDERBY (LMRKDATE,D,LMRKTIME,D) )

OUTPUT (FORMAT (CSV) SEPARATOR(‘/’) HEADINGS(YES))

This example selects three fields from aggregate DBMAIN from the input file, sorts the output by fields LMRKDATE and LMRKTIME in descending order, selects only those records whose field DBHSSID has a value equal to 'DB71', and places a slash (/) between each output field. The first record in the output will contain a list of the selected field names, where each field name is also separated by a slash (/).

Example 4

You can use SYSUT3 processing to progressively subset the data. This example shows two ways to accomplish the same result. The first set of statements uses a complex nested clause. The second is a two-step process. For example, this is a complex clause:

SELECT ((field-name,field-name-2,field-name-3) -

FROM (PRODUCT(pppppppp),AGGREGATE(agg_name)) -

WHERE ((field-name EQ 'n' OR field-name EQ 'n') -

AND (field-name-2 EQ 'n' OR field-name-2 EQ 'n'))-

ORDERBY (field-name,D,field-name-2,D) )

OUTPUT (FORMAT(CSV) HEADINGS(NO) SEPARATOR(';') -

TIME(hh:mm:ss:nnn) DATE(yyyy.mm.dd) )

The following two-step process within the same job produces the same SYSUT2 output. The complex clause is broken into simpler steps in combination with the use of the optional SYSUT3 output.

The first step of the process would be to run $DXTRACT with the following selection clause and the extra temporary output file of SYSUT3 coded in the JCL:

SELECT ((field-name,field-name-2,field-name-3) -

FROM (PRODUCT(pppppppp),AGGREGATE(agg_name)) -

WHERE (field-name EQ 'n' OR field-name EQ 'n')) -

OUTPUT (FORMAT(CSV) HEADINGS(NO) SEPARATOR(';') -

TIME(hh:mm:ss:nnn) DATE(yyyy.mm.dd) )

No ORDERBY statement is allowed if SYSUT3 is to be used.

The second step is to use this temporary output file produced by the SYSUT3 output in the first step as SYSUT1 input in the same job:

SELECT ((field-name,field-name-2,field-name-3) -

FROM (PRODUCT(pppppppp),AGGREGATE(agg_name)) -

WHERE (field-name-2 EQ 'n' OR field-name-2 EQ 'n') -

ORDERBY (field-name,D, field-name-2,D) )

OUTPUT (FORMAT(CSV) HEADINGS(NO) SEPARATOR('/') -

TIME(hh:mm:ss:nnn) DATE(yyyy.mm.dd) )

The two-step approach allows you to select three fields from the aggregate and from the input file, sorts the output by field name in descending order and selects only those records whose fields match the complex selection criteria, and places a slash (/) between each output field. The first record in the output contains a list of the selected field names, where each field name is also separated by a slash (/).