Preface

INFOPAK VSAM and INFOPAK MVS support external Lempel-Ziv dictionaries for compressing KSDS, ESDS, PDS or sequential data sets running under MVS. INFOPAK provides a tool for creating Lempel-Ziv dictionaries and is also able, when compressing a file, to accept an external dictionary name instead of one of the standard compression levels.

This document is intended for data administrators and system administrators who need to:

  • Evaluate the space savings they can expect from the use of INFOPAK with external dictionaries.

  • Use INFOPAK with external Lempel-Ziv dictionaries.

Creating external dictionaries

TESTVPAK is the program to be used for creating external dictionaries. It simulates the effects of INFOPAK compression on your datasets and also creates a Lempel-Ziv dictionary optimized for compressing those datasets.

When the creation of a dictionary is requested, TESTVPAK reports the compression gains obtained with the external dictionary (in line CP-LZ) instead of the results obtained with the standard INFOPAK LZ dictionary.

note

If the hardware compression instruction is not present, TESTVPAK simulates it.

TESTVPAK furnishes actual results which can be used to redefine the files that will be subject to compression.

The installation procedure places the TESTVPAK module in the library with the ddname MVSLOAD (see "PRODUCT INSTALLATION JCL" delivered with the tape).

Using TESTVPAK for creating dictionaries

The TESTVPAK program, which can be executed in batch mode, may be run with the following JCL for creating dictionaries.

//jobname  JOB............
//STEP1 EXEC PGM=TESTVPAK,PARM='CPU,LZDIC=64K'
//STEPLIB DD DSN=library containing TESTVPAK
//PRINT DD SYSOUT=*
//SYSLIN DD DSN=library(member-name),DISP=SHR
//DDxx DD DSN=name of the first file to scan
//etc. etc.
//DDyy DD DSN=name of the last file to scan
//SYSIN DD *
Command cards for each file
etc...
/*

The runtime parameters of TESTVPAK are:

  • CPU

    provides CPU consumption in the compression modules.

  • LZDIC

    indicates that a dictionary is to be created and used. Compression gains for CP-LZ are reported for this dictionary.

    The size of the dictionary to be created can be specified along with this parameter. The authorized values are: 8K, 16K, 32K, 64K and 128K. The default value is 64K.

    note

    The size specified is a maximum. If TESTVPAK finds that the same compression gain can be obtained with a smaller dictionary, it creates the smaller dictionary.

After execution the SYSLIN dataset will contain the TXT file (object) of the dictionary.
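The runtime parameters above can be assembled and checked before a job is submitted. The following is a minimal sketch in Python, not part of INFOPAK: the helper name `build_testvpak_parm` is hypothetical, and it assumes that the options are simply joined with commas, as in the sample JCL.

```python
# Hypothetical helper, not delivered with INFOPAK: it only builds the PARM text.
AUTHORIZED_LZDIC_SIZES = ("8K", "16K", "32K", "64K", "128K")


def build_testvpak_parm(cpu=True, lzdic_size="64K"):
    """Return the PARM string for the TESTVPAK EXEC statement.

    lzdic_size=None omits LZDIC, so no dictionary is requested.
    """
    options = []
    if cpu:
        options.append("CPU")
    if lzdic_size is not None:
        size = lzdic_size.upper()
        # Only the sizes listed in this document are accepted.
        if size not in AUTHORIZED_LZDIC_SIZES:
            raise ValueError(
                f"LZDIC size must be one of {AUTHORIZED_LZDIC_SIZES}, got {lzdic_size!r}"
            )
        options.append(f"LZDIC={size}")
    return ",".join(options)


print(build_testvpak_parm(lzdic_size="128K"))  # CPU,LZDIC=128K
```

The returned text is what would be placed between the quotes of PARM='...' on the EXEC statement.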

The format of the SYSIN DD Command card for each file is described below:

  • OPERAND 1

    ddname of the file to be scanned

  • OPERAND 2

    (optional) number of records (or blocks) in this file to read for the compression report. If this is left blank, then TESTVPAK will read the entire file. For large files, it is recommended to limit the number of records to avoid excessive run times.

note

OPERAND 1 and OPERAND 2 must be separated by at least one space.
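The card format described above can be checked with a short script before the job is run. This is a minimal sketch in Python under stated assumptions: the function name `parse_command_card` is hypothetical, the 8-character limit is the usual JCL ddname limit, and OPERAND 2 is assumed to be an unsigned integer.

```python
def parse_command_card(card):
    """Split one SYSIN command card into (ddname, record_limit).

    record_limit is None when OPERAND 2 is blank, which means
    that TESTVPAK reads the entire file.
    """
    fields = card.split()  # operands are separated by one or more spaces
    if not 1 <= len(fields) <= 2:
        raise ValueError(f"expected 1 or 2 operands, got {len(fields)}: {card!r}")
    ddname = fields[0]
    if len(ddname) > 8:  # assumption: standard JCL ddname length limit
        raise ValueError(f"ddname longer than 8 characters: {ddname!r}")
    limit = None
    if len(fields) == 2:
        if not (fields[1].isascii() and fields[1].isdigit()):
            raise ValueError(f"OPERAND 2 must be a number: {fields[1]!r}")
        limit = int(fields[1])
    return ddname, limit


print(parse_command_card("DD1"))      # ('DD1', None)
print(parse_command_card("DD2 770"))  # ('DD2', 770)
```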

Example:

//jobname  JOB ................
//STEP1 EXEC PGM=TESTVPAK,PARM='CPU,LZDIC=128K'
//STEPLIB DD DSN=library containing TESTVPAK
//PRINT DD SYSOUT=*,DCB=(LRECL=133,BLKSIZE=1330,
//             RECFM=FBA)
//SYSLIN DD DISP=SHR,DSN=INFOTEL.DICO.OBJ(DICO1)
//DD1 DD DSN=FILE1.SEQ1,DISP=SHR
//DD2 DD DSN=FILE2.SEQ2,DISP=SHR
//SYSIN DD *
DD1
DD2 770
/*

TESTVPAK scans the two files with ddnames DD1 and DD2:

  • it reads all physical records (blocks) of the first file, FILE1.SEQ1,

  • it reads 770 blocks of the second file, FILE2.SEQ2,

  • it creates a dictionary named DICO1 in the PDS INFOTEL.DICO.OBJ.

TESTVPAK Results

When creating dictionaries, TESTVPAK produces a report for the selected file or files, limited to the number of records requested. The report shows the space gains achieved with the CP and CS compression levels and with the external dictionary. A sample TESTVPAK report and a description of each report element follow.

 ************************************************************************************************************************************
* I N F O P A K V S A M / I N F O P A K M V S 01/05/95 13 H 34 PAGE : 1 *
************************************************************************************************************************************
* LIST OF CONTROL CARDS IN SYSIN *
************************************************************************************************************************************
DD1
DD2
************************************************************************************************************************************
* INFOPAK PROPERTY OF INFOTEL *
************************************************************************************************************************************
 ************************************************************************************************************************************
* I N F O P A K V S A M / I N F O P A K M V S 01/05/95 13 H 34 PAGE : 1 *
************************************************************************************************************************************
* FILE COMPRESSION STATISTICS *
************************************************************************************************************************************
! DDNAME ! RECORD LG ! NUMBER OF ! NB OF BYTES IN ! COMPR. ! NB OF BYTES ! COMPR. ! RECOMMENDED! CPU/OCC(MICS) !
! ! MIN / MAX !OCCURRENCES ! ! MODULE ! AFTER COMPR. ! GAINS ! BLKSIZE ! COMP ! DECOMP !
!+DD1 ! 1862 / 23408 ! 127 ! 2 951 270 ! CP ! 1 526 840 ! 48,27 % ! 26467 ! 27216 ! 9131 !
! ! ! ! ! CS ! 2 320 284 ! 21,37 % ! 23541 ! 8566 ! 1812 !
! ! ! ! ! CP-LZ ! 1 216 684 ! 58,78 % ! 26334 ! 17677 ! 1938 !
!+DD2 ! 720 / 3120 ! 770 ! 2 400 000 ! CP ! 1 981 760 ! 17,43 % ! 22480 ! 2497 ! 1235 !
! ! ! ! ! CS ! 2 398 000 ! 0,09 % ! 28160 ! 654 ! 152 !
! ! ! ! ! CP-LZ ! 335 920 ! 86,01 % ! 32240 ! 1418 ! 375 !
! ! ! ! ! ! ! ! ! !
!**** TOTAL **** ! 897 ! 5 351 270 ! CP ! 3 508 600 ! 34,44 % ! ! ! !
! ! ! ! CS ! 4 718 584 ! 11,83 % ! ! ! !
! ! ! ! CP-LZ ! 1 552 604 ! 70,99 % ! ! ! !
*NOTE: ABOVE MEASUREMENTS TAKE INTO ACCOUNT THE FACT THAT PRIMARY AND ALTERNATE KEYS MAY NOT BE
MOVED COMPRESSED OR ALTERED.
[1] [2] [3] [4] [5] [6] [7] [8] [9]
************************************************************************************************************************************
* INFOPAK SOFTWARE PROPERTY OF INFOTEL *
************************************************************************************************************************************

note

The fact that primary and alternate keys are not compressed is accounted for in the DASD savings calculation. The character "*" appearing in front of the DDNAME indicates an ESDS file. The character "+" appearing in front of the DDNAME indicates a sequential file.

Each numbered square corresponds to the following:

  1. DDname of the file

  2. Minimum and maximum record lengths encountered

  3. Number of physical records read.

  4. Number of bytes accumulated before compression

  5. Compression Code: high performance (CS), high compression (CP) or Lempel-Ziv hardware compression with external dictionary (CP-LZ).

  6. Number of bytes accumulated after compression: this is the actual result of INFOPAK on each file.

  7. Compression gains: the percentage obtained from the equation [7] = ([4] - [6]) / [4] × 100.

  8. Recommended BLKSIZE. This value may look unusual by conventional standards because it is expressed in terms of uncompressed data. The physical block actually written to DASD is smaller, because compression reduces this large uncompressed block. Use the recommended BLKSIZE to maximize the benefit of compression on your DASD and tape data sets.

  9. CPU consumption: time in microseconds for compressing and decompressing a block (MVS) or a record (VSAM).

    The CPU time reported for hardware compression (CP-LZ) depends on the following logic:

    • In MVS 4.3, the CVT contains a bit that indicates whether the hardware compression microcode is available.

    • If hardware compression is requested and MVS 4.3 is in use, INFOPAK tests this bit and automatically uses the compression microcode if it is available; the CPU time reflects this.

    • If hardware compression is requested and MVS 4.3 is not in use, INFOPAK assumes that the hardware is not present and uses software emulation of the hardware microcode; the CPU time reflects this.

    • If the hardware compression microcode is installed and MVS 4.3 is not in use, you may request from InfoTel a zap that forces the use of the microcode.
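The decision logic described for item 9 can be summarized as a small function. This is only an illustrative sketch in Python: the function and parameter names are hypothetical, and the real test is performed by INFOPAK against the CVT bit, not by user code.

```python
def cp_lz_execution_path(mvs_4_3, microcode_installed, zap_applied=False):
    """Return which path the reported CP-LZ CPU time reflects.

    mvs_4_3             -- True if MVS 4.3 is in use (CVT bit available)
    microcode_installed -- True if the hardware compression microcode exists
    zap_applied         -- True if the InfoTel zap forcing the microcode is on
    """
    if mvs_4_3:
        # INFOPAK can test the CVT bit and follows what it reports.
        return "microcode" if microcode_installed else "software emulation"
    # Without MVS 4.3, INFOPAK assumes the hardware is absent,
    # unless the zap forces the use of installed microcode.
    if microcode_installed and zap_applied:
        return "microcode"
    return "software emulation"


print(cp_lz_execution_path(mvs_4_3=True, microcode_installed=True))
print(cp_lz_execution_path(mvs_4_3=False, microcode_installed=True))
```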

The indicated gain is a MAXIMUM gain because it does not take into account the length of disk control information, inter-block gaps, and so on.

The exact gains achieved by INFOPAK (for example expressed in number of cylinders) can be determined after reloading the compressed file.
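The gain in column 7 can be recomputed from columns 4 and 6 of the report. The sketch below, in Python, applies the formula to two rows of the sample report (DD2 with CP, and the TOTAL line with CP-LZ). The function name is hypothetical, and how TESTVPAK rounds the last digit is not stated in this document, so other rows may differ by 0,01.

```python
def compression_gain(bytes_before, bytes_after):
    """Gain in percent: ([4] - [6]) / [4] * 100."""
    if bytes_before <= 0:
        raise ValueError("bytes_before must be positive")
    return (bytes_before - bytes_after) / bytes_before * 100


# DD2, module CP: 2 400 000 bytes before, 1 981 760 bytes after
print(f"{compression_gain(2_400_000, 1_981_760):.2f} %")  # 17.43 %

# TOTAL, module CP-LZ: 5 351 270 bytes before, 1 552 604 bytes after
print(f"{compression_gain(5_351_270, 1_552_604):.2f} %")  # 70.99 %
```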

TESTVPAK Error Messages and Return Codes

The TESTVPAK return code is 0, unless the PRINT DD statement is omitted, in which case it is 8.

TESTVPAK can produce the following error message:

  • THE VALIDITY DATE HAS EXPIRED, EXECUTION STOPPED

When creating a dictionary, TESTVPAK sends the following messages to routing code 11 (ROUTCDE=11):

TPKDIC01 DICTIONARY SIZE REQUESTED  = XXXK
TPKDIC02 DICTIONARY SIZE CREATED    = XXXK
TPKDIC03 RETURN CODE OF BUILD PHASE = X

Creating dictionary load modules

The object file created by TESTVPAK (ddname SYSLIN) must be link-edited to create a dictionary load module. This load module must be stored in LPA, LINKLIST or the STEPLIB of every job that uses external dictionaries for compression.

The Linkedit step is as follows:

//LINK    EXEC PGM=IEWL,PARM='AMODE(24),RMODE(24),REFR'
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD UNIT=......,SPACE=(3200,(100,100))
//INFUT1 DD UNIT=......,SPACE=(3200,(100,100))
//INFUT2 DD UNIT=......,SPACE=(3200,(100,100))
//SYSLIN DD DSN=obj-library(dictionary-name),DISP=SHR
//SYSLMOD DD DSN=output-library(dictionary-name),DISP=SHR

note

RMODE can be ANY if all programs that use an external dictionary use AMODE(31).

Using external dictionaries

Two new compression modes are provided for using external dictionaries: CX for standard compression with an external dictionary and RX for smart compression with an external dictionary.

All INFVSUT1 commands support these new parameters. The syntax of the new compression level is as follows:


>>--- Command ---- entry ----+-- CX(LZ) --+-- LZDIC - dictionary ----><
                             +-- CX(HD) --+
                             +-- RX(LZ) --+
                             +-- RX(HD) --+

  • Description:

    • entry

      Cluster name, sequential file name, generic intent or dataclass.

    • CX(LZ)

      Lempel Ziv with external dictionary. If the hardware compression instruction is not present on the CPU, a software emulation will be used.

    • CX(HD)

      Hardware Driven with external dictionary. This compression code forces the use of the hardware compression instruction when it is supported by the CPU. Otherwise the standard Huffman compression (CP) is used.

    • RX(LZ)

      Smart Lempel Ziv with external dictionary. If the hardware compression instruction is not present on the CPU, a software emulation will be used.

    • RX(HD)

      Smart Hardware Driven with external dictionary. This compression code forces the use of the hardware compression instruction when it is supported by the CPU. Otherwise the standard Huffman compression (RP) is used.

    • dictionary

      Name of the load module external dictionary to be used for compression.

      note

      The load module containing the dictionary does not have to exist when the INFVSUT1 command is processed. However, it must be available through LPA, LINKLIST or STEPLIB at compression time.
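The four compression codes described above differ mainly in what happens when the hardware compression instruction is not present. The following Python sketch restates that behavior as a lookup table. The names are hypothetical, and the wording of the returned strings is a paraphrase of the descriptions in this section, not output produced by INFOPAK.

```python
# What each code falls back to when the hardware instruction is absent,
# as described in this section.
FALLBACK_WITHOUT_HARDWARE = {
    "CX(LZ)": "software emulation of Lempel-Ziv with external dictionary",
    "CX(HD)": "standard Huffman compression (CP)",
    "RX(LZ)": "software emulation of Lempel-Ziv with external dictionary",
    "RX(HD)": "standard Huffman compression (RP)",
}


def effective_compression(code, hardware_present):
    """Describe the compression actually performed for a given code."""
    if code not in FALLBACK_WITHOUT_HARDWARE:
        raise ValueError(f"unknown compression code: {code!r}")
    if hardware_present:
        return "hardware compression instruction with external dictionary"
    return FALLBACK_WITHOUT_HARDWARE[code]


print(effective_compression("CX(HD)", hardware_present=False))
# standard Huffman compression (CP)
```

Note that with CX(HD) or RX(HD) on a CPU without the instruction, the external dictionary is not used at all, whereas the LZ codes keep using it through emulation.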

Error messages for external dictionaries

Messages of INFVSUT1

VSUT2916 DICTIONARY NAME NOT SPECIFIED OR INVALID.

Explanation: The keyword LZDIC is missing or the dictionary name is invalid.

  • Return code

    8.

  • Operator action

    Check the syntax.

  • System action

    Execution stopped.

Messages of INFOPAK VSAM/MVS interface

VSUT1095 INFOPAK - EXTERNAL DICTIONARY dictionary INVALID

Explanation: This message is sent to ROUTCDE=11 by the interface if it detects that a load module dictionary has an invalid length or is not loaded on a page boundary.

  • Operator action

    Check the dictionary load module. Its size must be a multiple of 4096.

  • System action

    Abend 4021 for security reasons.
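The length condition mentioned for VSUT1095 is a simple arithmetic check. The sketch below, in Python, shows it under the assumption that the module length in bytes is already known, for example from the link-edit listing. The function name is hypothetical, and the sketch does not cover the page-boundary condition, which depends on where the module is loaded.

```python
PAGE_SIZE = 4096  # the dictionary length must be a multiple of this value


def dictionary_length_is_valid(length_in_bytes):
    """Return True if the length is a positive multiple of 4096."""
    return length_in_bytes > 0 and length_in_bytes % PAGE_SIZE == 0


print(dictionary_length_is_valid(64 * 1024))  # True
print(dictionary_length_is_valid(65_000))     # False
```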

VSUT1096 INFOPAK - EXTERNAL DICTIONARY dictionary NOT FOUND

Explanation: This message is sent to ROUTCDE=11 by the interface if it cannot find the load module "dictionary".

  • Operator action

    Check that the dictionary load module is available in LINKLIST, LPA or the STEPLIB of the job that fails.

  • System action

    Abend 4021 for security reasons.