Preface
INFOPAK for DB2 allows compression of DB2 tables running under MVS, MVS/XA or MVS/ESA.
This document is meant to be used by database administrators and system administrators for the purpose of:
-
evaluating the space savings they can expect from the use of INFOPAK, and
-
using the INFOPAK compression system.
The installation tape included with this manual contains all the modules necessary to install INFOPAK. The modules are date protected during trials.
note
The installation of INFOPAK for DB2 is described in the "Installation manual" HP106.
An introduction to INFOPAK for DB2
INFOPAK is designed to take full advantage of the features inherent in the DB2 data base system. By exploiting the data attribute information provided by DB2, INFOPAK for DB2 is able to optimize the data compression.
An EDIT Routine exit is provided by DB2 for the express purpose of allowing user supplied routines for edit, encryption or compression. INFOPAK uses this exit for compression. The INFOPAK for DB2 compression routine is specified in the CREATE TABLE statement for each base table that you want to compress. The EDIT Routine exit is executed by DB2 just after the record (ROW) is retrieved and just before it is stored.
By using the DB2 provided exit, table compression is transparent to the user programs.
Compatibility with DB2's extensive backup and recovery system is maintained through the use of the DB2 EDIT Exit. Further savings from compression will be achieved in the DB2 data base recovery system. The IMAGE copy backup program will execute in less time because it is copying less data and will occupy less storage space.
INFOPAK for DB2 will only compress user table data. Storage space used by pointers, indexes, etc. will not be compressed.
Although INFOPAK picks up the data format information provided by DB2, it will always compress according to the data itself. In other words, if character data is found in a decimal field it will compress and decompress exactly what it found. Data is never changed and if there is a doubt, INFOPAK will just bypass and not compress it.
INFOPAK for DB2 has also been designed to accommodate all of the new data types in DB2, such as NULL, NOT NULL and the new formats that DB2 uses to store decimal data.
The INFOPAK for DB2 module is release independent. Once your data base tables are compressed, no further maintenance is necessary.
INFOPAK compression
INFOPAK uses a modified Huffman Encoding Technique, but unlike other Huffman compression programs, INFOPAK does not require external user defined tables to describe the data to compress.
INFOPAK employs high speed scanning instructions using techniques derived from Artificial Intelligence theory. The scanning program performs a 3 dimensional analysis of data and then makes compression decisions based on the nature of the data. The scanning program may decide to use several different compression methods for optimum performance/compression. Most data is compressed using the Huffman Encoding Technique, (i.e. high frequency characters are represented by small bit strings). Other data compression techniques include repeating character substitution, bit representation of blank and zero fields and a proprietary algorithm for certain kinds of data.
Because INFOPAK does not require a scan of the data base to generate an external compression table, it can be used on new data bases. This allows application managers to design data bases with maximum filler space which provides for painless evolution and maintenance, at virtually no extra cost in disk storage.
Evaluation
INFOPAK for DB2 also includes a separate test program, TPAKDB2, that will let you quickly evaluate the savings that INFOPAK offers.
TPAKDB2 is a compression test program running under TSO that allows you to scan DB2 data base tables and print a report showing the compression gains that INFOPAK will produce for each DB2 table that you want to test for compression.
TPAKDB2 will perform a total or partial sequential read of one or more DB2 tables, and calls the compression routine for each line (row) that is read. Just supply a parameter for each DB2 table that you want TPAKDB2 to read. The TPAKDB2 compression report can then be used for calculating the size of DB2 pages when you actually compress the DB2 tables with INFOPAK for DB2.
DB2 design considerations
Instead of using variable length fields (VARCHAR), use fixed length fields big enough to hold your data and let INFOPAK compress it without any wasted space. Usually INFOPAK will store fixed length data fields in less space than variable. INFOPAK will compress blanks or spaces down to the bit level. If a data field is specified as variable, a two byte binary length field must be carried in the table itself along with the data, so that user programs can access that field. Because fixed field lengths are carried in the DB2 Catalog, those two bytes for the variable length field are wasted table space in most cases.
Fixed length fields are also a lot easier to use with COBOL programs or with QMF applications. Naturally, the fixed length field cannot be larger than the current DB2 maximum of 256 bytes.
INFOPAK performance
INFOPAK compression is performed using high performance scanning instructions, and groups fields of the same type for global packing and unpacking. Therefore CPU usage is reduced to a minimum and response time for online systems will usually improve because of the gains in CPU I/O from compressed data.
Interactive Compression Simulation
Execution of TESTPAK under TSO
The CLIST PAKSTAR should be in a library in the LOGON procedure (see installation manual: HP106) This CLIST, PAKSTAR, invokes the interactive part of INFOPAK.
Invoking PAKSTAR, displays the following screen:
Primary menu
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-- I N F O P A K \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\--
COMMAND ===\>
Select one of the following options:
0 DEFAULTS
1 TESTPAK VSAM/MVS NOT AVAILABLE
2 TESTPAK DB2
P CONFIDENTIAL CODE
T TUTORIAL
X END
Type END or RETURN to quit
note
If you have also TESTPAK VSAM/MVS, NOT AVAILABLE disappears.
Option 0, Defaults
This option is used to list and modify the default parameters for each user.
The parameters used by TESTPAK for DB2 are :
-
the JOB card used for batch execution,
-
The default value for the DB2 subsystem.
-
the default of the maximum number of rows per table that TESTPAK read for the evaluation of compression.
-
The maximum number of rows per page. This number is used to calculate the allocation gains. Since DB2 V3, this value can be 255.
The following panel is displayed:
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-- DEFAULT VALUES - INFOPAK \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\--
COMMAND ===\>
Enter below the JOB card(s) for batch execution:
===\> //JOB1 JOB \'TESTPAK\',MSGCLASS=S,NOTIFY=INFO0
===\> //*
===\> //*
===\> //*
Specific parameters of TESTPAK DB2:
DB2 subsystem ===\> DSN
Maximum number of rows to be read ===\> 10000
Maximum number of rows per page ===\> 127 (127/255)
Option 1, TESTPAK DB2
The following panel is displayed:
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-- TESTPAK DB2 \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\--
COMMAND ===\>
DB2 subsystem ===\> DSN
Enter selection criteria for tables to be tested:
CREATOR ===\> INFO%\...
TABLE ===\> \...\...\...\...\...\...
DATABASE ===\> \...\.....
TABLESPACE ===\> \...\.....
Each criteria can be a full name or a generic name.
This panel is used to select tables for simulating compression.
Description of input fields
The input fields are as follows:
{wrapper="1" role="DL"}
-
DB2 subsystem
Name of the DB2 subsystem. The default value is entered on defaults panel. ( option 0) It can be modified on this panel.
-
Creator
Select criteria of creators.
-
Table
Select criteria of tables.
-
Database
Select criteria of databases.
-
Tablespace
Select criteria of tablespaces.
Each field may be a full name or a generic name. The generic characters used follow the DB2 syntax ( '_' , '%' ).
note
Default: ARCHIVE file exists. The creation of this file is described in the installation manual of INFOPAK.
Selection panel
After the validation of the selection, the followed panel is displayed :
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-- INFOPAK DB2 SIMULATION \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\--
COMMAND ===\>
DB2 subsystem : DSN
CREATOR : INFO% TABLE :
DATABASE : TABLESPACE :
Maximum number of rows to be read ===\> 10000
Execution type, Batch(B) or Interactive(I) ===\> I
S: S selection, ? information, E erase
S CREATOR NAME OF THE TABLE DATABASE TABLESPACE CP CS CP-LZ
. INFO0 TA0PAK DB001T TA0PAK£ 42.98 36.49 38.23
. INFO0 TA00VC DB001T TA00VC\$ 66.37 54.12 56.45
. INFO0 TA0VD DB001T TA0VD£
. INFO0 TA00VE DB001T TA00VE\$ 77.65 39.62 43.45
******************************** BOTTOM OF DATA *******************************
Description of input fields
The input fields are as follows:
{wrapper="1" role="DL"}
-
Number of rows
Indicates the maximum number of rows to be read for each selected table. This value is used in interactive and batch mode. A null value means that all rows of the table are to be read. The default value is defined in the user profile, it can be modified by the option 0 (default).
Execution type :
Indicates the execution mode of TESTPAK DB2.
{wrapper="1" role="DL"}
-
(I)nteractive
The simulation is processed for each selected table in synchronous mode.
A message shows the table name in process and the statistics are updated automatically.
'I' is the default value.
-
(B)atch
Execution JCL is displayed for all selected tables. Statistics are available at the end of the compression job and it is necessary to reselect the tables to display the statistics.
If archive file card is not present, results are in the sysout file.
When selecting numerous tables, use the batch mode.
-
Selection code
Type 'S' to select a table to be processed.
The code '?' display more information about the table compression. See Information panel .
The code 'E' erases statistics in archive file.
Output fields
The output fields are as follows:
{wrapper="1" role="DL"}
-
Selection criteria
Selection criteria entered on last panel
-
CP
Compression gain compression for the CP routine
-
CS
Compression gain compression for the CS routine
-
CP-LZ
Compression gain for the Lempel-Ziv (hardware) routine
Selection validation
The execution of commands '?' or 'E' is immediate. PF3 exits the operation. The validation of the command 'S' is different as shown below:
-
In interactive mode.
The execution is immediate for all the selected tables when depressing: ENTER, PF7 or PF8.
-
In batch mode.
After the selection, is necessary to press PF3 to obtain the execution JCL. The user can submit this JCL, modify it or cancel the operation by pressing PF3.
The archive file
The archive rules are as follows:
-
The compression simulation data is stored in a VSAM cluster, the ARCHIVE file (see installation manual)
-
The archive file name is defined in the INFOTEL PARMLIB member.
-
If the archive file is not available, statistics are not stored.
-
New statistics replace old statistics by dataset name.
Information panel
The command (?) displays more information for a table. A panel is displayed as follows:
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-- INFOPAK for DB2 SIMULATION - INFORMATION \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\--
COMMAND ===\>
DATABASE : DB001T DB2 subsystem : DSN
TABLESPACE : TA00VE\$ Max number of rows per page : 127
CREATOR : INFO0
TABLE : TA00VE
Minimum row length : 389
Maximum row length : 389
Number of rows read : 100
Number of bytes before compression : 38 900
Comp. Data Alloc. Number of bytes Compress. Decompress.
Module Gains Gains after compression Time Time
CP 77.65 70.00 8 696 114 72
CS 39.62 40.00 23 487 57 24
CP-LZ 50.23 42.00 18 542 1214 852
Description of output fields
{wrapper="1" role="DL"}
-
DB2 subsystem
Name of the DB2 subsystem.
-
Max number of rows per page
The maximum number of rows per pages that has been used to calculate the allocation gains. The default value is 127.
-
DATABASE
Name of the database.
-
TABLESPACE
Name of the tablespace.
-
CREATOR
Name of the table creator.
-
TABLE
Name of the table.
-
Minimum row length
Minimum length of rows.
-
Maximum row length
Maximum length of rows.
-
Number of rows read
Number of rows read for the specified table.
-
Number of bytes before compression
Number of bytes before compression for the rows used in the specified table.
-
Comp. Module
Name of the compression module.
-
CP for optimal compression
-
CS for simple compression
-
CP-LZ for hardware ES/9000 compression
-
Data Gains
Compression percentages using the corresponding compression module on the specified table.
-
Alloc. Gains
Physical savings that would be achieved when using the corresponding compression module.
-
Number of bytes after compression
Number of compressed bytes for the rows used in the specified table.
-
Compress. Time
Time of the compression in microsecond.
-
Decompress. Time
Time of the decompression in microsecond.
TPAKDB2 batch
TPAKDB2 is a utility program functioning under the control of TSO, and simulates compression by INFOPAK on your DB2 tables.
TPAKDB2 will perform a total or partial read of one or more DB2 tables, and calls the compression routine for each line (row) that is read. TPAKDB2 then prints a report summary showing the gain that is possible to obtain with INFOPAK compression on DB2 tables.
The TPAKDB2 Compression report can then be used for calculating the size of "TABLESPACE".
The installation procedure copies the TPAKDB2 module on the library referenced by the MVSLOAD ddname and the member TPAKDB2 on the library referenced by the DB2DBRM ddname (see manual HP106).
INFOPAK for DB2 will only compress user table data. Storage space used by pointers, indexes, etc. will not be compressed.
To get started we suggest the following steps:
-
Install INFOPAK for DB2.
-
Install TPAKDB2.
-
Create the TPAKDB2 execution plan.
-
Run TPAKDB2.
-
Use INFOPAK for DB2 on the same tables to measure performance and prove the actual storage savings.
Using TPAKDB2
Before executing TPAKDB2, it is necessary to create a DB2 execution plan. This is achieved by a BIND of the TPAKDB2 member. The user who creates the execution plan must be authorized to use BINDADD.
The BIND can be done in two ways:
-
By means of the DB2 menu under ISPF
-
By using the DSN command under TSO as shown below:
DSN SYSTEM (DB2-sub-system-name)
BIND PLAN(TPAKDB2)
MEMBER (TPAKDB2)
LIBRARY (DBRM-library-name)
ISOLATION (CS)
END
TPAKDB2 requires the following DD statements:
-
PRINT
TPAKDB2 produces a report output record and the PRINT DD statement must be supplied according to the following format:
//PRINT DD SYSOUT=* -
SYSIN
SYSIN permits introducing one or more parameters defining the tables and the number of lines (rows) of each of these tables on which you want to see the results of compression calculated and printed by TPAKDB2.
The format of the SYSIN parameter is as follows:
----+----1----+----2----+----3----+----4
bbbnnnnnnnnnnnnnnnnnnnnnnnnnnnbllllllllwhere the symbols meanings are:
{wrapper="1" role="DL"}
-
b
space
-
nn...nn
DB2 table name (Max 27 characters)
-
llllllll
Number of rows (lines) to read in this table. This parameter is optional. If omitted TPAKDB2 will read the entire table in sequence.
-
A JCL example for executing TPAKDB2 is as follows:
//INFTSTPK JOB USER=USER1,....
//EXECIMS2 EXEC PGM=IKJEFT01
// cards DD necessary for execution of TSO
//SYSTSIN DD *
DSN SYSTEM(DB2 sub-system)
RUN PROGRAM(TPAKDB2) PLAN(TPAKDB2) LIB(userlib) PARM('CPU')
END
//PRINT DD SYSOUT=*
//SYSIN DD *
USER1.TA0PAK 20
USER1.TA00VC 100
USER2.TA00VD
USER2.TA00VE 100
/*
note
In this example, TPAKDB2 will process 20 rows of table USER1.TA0PAK, 100 rows of table USER1.TA00VC, all of table USER2.TA00VD, etc...
TPAKDB2 results
TPAKDB2 produces a report showing the gain obtained for the table(s) requested by INFOPAK's three compression modules INFDBCPE, INFDBCX0 and INFDBCZE (provided with hardware option). PARM('CPU') must be specified on the RUN statement to get compression decompression CPU time.
The TPAKDB2 report is as follows:
************************************************************************************************************************************
* INFOTEL I N F O P A K FOR D B 2 03/09/93 14 : 17 PAGE : 1 *
************************************************************************************************************************************
* COMPRESSION STATISTICS *
* ALLOCATION GAINS : 127 LINES PER PAGE USE THE HARDWARE COMPRESSION SIMULATION *
*********************+-+****************************************************************+--+****************************************
|9| |10|
|1| |2| |3| |4| |5| |6| |7| |8|
! TABLE NAME ! ROW LENGTH ! NUMBER OF ROWS ! NUMBER OF BYTES ! COMPRES. ! NUMBER OF BYTES ! DATA ! ALLOC. !
! ! MIN / MAX ! READ ! READ ! MODULE !AFTER COMPRESSION ! GAINS ! GAINS !
! USER1.TA0PAK ! 63 / 70 ! 20 ! 1 340 ! INFDBCPE ! 764 ! 42.98 %! 0.00 %!
! ! ! ! ! INFDBCX0 ! 851 ! 36.49 %! 0.00 %!
! ! ! ! ! INFDBCZE ! 801 ! 40.22 %! 0.00 %!
! USER1.TA00VC ! 68 / 68 ! 100 ! 6 800 ! INFDBCPE ! 2 287 ! 66.37 %! 50.00 %!
! ! ! ! ! INFDBCX0 ! 3 119 ! 54.12 %! 50.00 %!
! ! ! ! ! INFDBCZE ! 2 835 ! 58.32 %! 50.00 %!
! USER1.TA00VD ! 73 / 103 ! 120 ! 11 359 ! INFDBCPE ! 4 713 ! 58.51 %! 50.00 %!
! ! ! ! ! INFDBCX0 ! 8 793 ! 22.59 %! 25.00 %!
! ! ! ! ! INFDBCZE ! 7 362 ! 35.19 %! 25.00 %!
! USER1.TA00VE ! 389 / 389 ! 100 ! 38 900 ! INFDBCPE ! 8 696 ! 77.65 %! 70.00 %!
! ! ! ! ! INFDBCX0 ! 23 487 ! 39.62 %! 40.00 %!
! ! ! ! ! INFDBCZE ! 13 682 ! 64.83 %! 60.00 %!
! ! ! ! ! ! ! !
!**** TOTAL **** ! 340 ! 58 399 ! INFDBCPE ! 16 460 ! 71.81 %! 59.00 %!
! ! ! ! INFDBCX0 ! 36 250 ! 37.92 %! 35.00 %!
! ! ! ! INFDBCZE ! 23 879 ! 59.11 %! 49.00 %!
************************************************************************************************************************************
* INFOPAK SOFTWARE PROPERTY OF INFOTEL *
************************************************************************************************************************************
In the TPAKDB2 report, the headings are interpreted as follows:
-
Name of the table used.
-
Length of rows (minimum length/maximum length).
-
Number of rows read for each table specified.
-
Number of bytes before compression for the rows used in each table.
-
Name of the compression modules (INFDBCPE, INFDBCX0, INFDBCZE).
-
Number of bytes after compression. This is the actual number of compressed bytes for the rows used in each table and a grand total of all the tables used in this run.
-
Compression percentages for all compression modules for each table and a total percentage for all the tables used in this run.
Those percentages are computed using the following formula:
(7) = [(4) - (6)] / (4)These figures are the actual logical gains when compressing rows. They do not take into consideration any physical constraint.
-
Allocation gains are the physical savings that would be achieved when using each compression module.
These percentages do take into consideration control information of the DB2 pages, the limit of 127 rows per page until version 2.3, 255 rows per pages since V3 (see (9)), and the size of the page. One assumes that the pages contain rows of only one table. This result is correct if the FREEPAGE and FREESPACE that was used in the uncompressed table is unchanged and the number of rows read spans more than one page.
note
:
TPAKDB2 uses the SYSIBM.SYSTABLES and SYSIBM.SYSTABLESPACE tables to determine the size of the page. The user must have authorization to select rows from those tables.
-
Maximum number of rows per page.
-
Message that indicates if the hardware instruction is present and used by routine INFDBCZE, or if the instruction is simulated.
note
Only user data is affected. Indexes, if any, are not compressed.
Error messages
TPAKDB2 will produce error messages:
-
ACCESS ERROR ON: table_name SQLCODE: return_code FOR SQL STATEMENT: sql_statement
In case of access error.
-
table_name IS NOT A TABLE, ALLOCATION GAIN COMPUTED USING 4K PAGES
if the specified object is an alias or a view. The allocation gain is still performed but using a 4K pages.
-
ALLOCATION GAIN COMPUTED USING 4K PAGES
the access to the DB2 catalog has failed, defaulted to 4K pages.
-
EXPIRATION DATE EXCEEDED, EXECUTION CANCELED
Contact INFOTEL.
INFOPAK for DB2
This chapter describes the necessary operations to be performed in order to use INFOPAK,and answers questions that are most frequently asked by the users.
Activating compression
This process may be performed :
-
either directly, using the module INFDBCPE, INFDBCX0 which requires unload and reload of the table,
-
or using the modules INFFECPE, INFFECX0 without unloading the table but under the control of INFOTOOL for DB2 (documentation HI100).
Standard compression: INFDBCPE or INFDBCX0
Several steps are necessary to compress a table:
-
Unload the table
-
DROP/CREATE the table using the EDITPROC parameter to specify INFOPAK.
-
Reload the DB2 table to be compressed.
Unload the table
For example, to unload the table, use the program DSNTIAUL.
example
JCL example :
//*
//* RUN UNLOAD PROGRAM
//*
//UNLOAD EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(DB2T) <-- DB2 owner system
RUN PROGRAM(DSNTIAUL) PLAN(DSNTIAU2) -
LIB('DB2_modules_library')
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSREC00 DD UNIT=3380,SPACE=(1024,(10,10)),DISP=(,CATLG),
DSN=unload_file
//SYSPUNCH DD UNIT=3380,SPACE=(800,(15,15)),DISP=(,CATLG),
DSN=file used to reload the table
//SYSIN DD *
table_name
DROP/CREATE
To compress a table, INFOPAK requires the EDITPROC parameter in the CREATE TABLE statement.
CREATE TABLE table_name
EDITPROC INFDBCPE
or
EDITPROC INFDBCX0
.....................;
{wrapper="1" role="DL"}
-
EDITPROC
Informs DB2 that this table will be compressed.
-
INFDBCPE
INFOPAK routine name (or INFDBCX0).
Reload the DB2 table
For example, to reload, use the LOAD command of DB2 UTILITIES (DSNUTILB) with the files SYSRECnn and SYSPUNCH created by DSNTIAUL.
At this point the data base table is now compressed. The compression is transparent to the user programs.
note
These steps are totally automated by the command ALTER ADD EDITPROC UNLOAD YES of INFOTOOL for DB2 (ref documentation HI100).
Compression in place: INFFECPE or INFFECX0
These modules cannot be used for tables in RRF (reordered row format). If used, either SQLCODE -652 or message DSNU280I with ERROR=X'00000000' FROM EDIT ROUTINE INFFECPx FOR tablename is returned by DB2.
Installing compression requires :
-
Add the EDITPROC INFFECPE (or INFFECX0) parameter to the CREATE TABLE without unloading reloading the table.
This operation is not directly supported by DB2, it is automatically performed by INFOTOOL for DB2 using the command ALTER ADD EDITPROC UNLOAD NO.
The table may now be used. Only rows that are written or updated are compressed (logical reorganization of a partition using a REORG UNLOAD PAUSE will compress the entire partition).
-
INFOTOOL for DB2 also supports the conversion of the EDITPROC INFFECPE parameter to INFDBCPE or INFFECX0 into INFDBCX0 in the CREATE TABLE statement without unloading and reloading.
The modules INFDBCPE and INFFECPE have compatible compression/decompression algorithms as well as INFDBCX0 and INFFECX0. But, INFFECPE, or INFFECX0 do not allow addition of a column with unload/reload (ref documentation HI100). It is then necessary to replace the EDITPROC INFFECPE by INFDBCPE or INFFECX0 by INFDBCX0.
This operation should be performed when all rows of the table are compressed, never before. The command ALTER REPLACE EDITPROC UNLOAD NO of INFOTOOL for DB2 automatically performs this operation.
Warning:INFBDCPE and INFDBCX0 are incompatible. Therefore, it is not possible to replace the EDITPROC INFFECX0 by INFDBCPE without unloading the table (or INFFECPE by INFDBCX0).