TAA Tools
CLCDBFHSH       CALCULATE DATA BASE FILE HASH          TAAHSHA

The Calculate Data Base  File Hash command determines a  hash value for
the  data in  a data  base member.   The  intent of  the command  is to
provide  a  comparison  method  for large  files  on  different systems
without transporting  the  entire file  and making  a  comparison.   An
optional  outfile HASHP  may  be  written.   The  CMPDBFHSH command  is
supported to compare HASHP files in different libraries.

The  model  file for  the outfile  is TAAHSHAP  with  a format  name of
HSHRCD.

To try out the command, use it on a reasonably small file (such as
one with 1,000 records or fewer).

             CLCDBFHSH    FILE(xxx)

Messages  describe  the  results.   You  can  compare  the  hash  value
manually  using CLCDBFHSH on what  is supposed to be  a duplicate file.

However, the real power of the tool comes from combining the 'block
count' function (described later) with the outfile capability and then
making the comparison with the CMPDBFHSH command.

Assume you  have a large FILEA  on two systems and want  to ensure that
the  data  matches  100%.   On  the  From system  you  would  enter the
following:

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            OUTLIB(xxx)

This creates the  HASHP outfile in the  named library.  The  member has
only  a single  record with  the count  of records  found and  the hash
value.

You can review the data in the file with the command:

             PRTDB2       FILE(xxx/HASHP)

You would then transfer the HASHP file to the 2nd system that has the
duplicate FILEA.  On the 2nd system you would issue the same
CLCDBFHSH command and output a second HASHP file to a different
library (shown as yyy in the CMPDBFHSH command that follows).
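
As a sketch of that step, the command on the 2nd system might look
like the following.  The library name yyy is a placeholder for
whatever library you choose on that system.

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            OUTLIB(yyy)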

Then enter the CMPDBFHSH command as:

             CMPDBFHSH  FROMLIB(xxx) TOLIB(yyy)

An escape  message would be sent  if the hash  values do not match.   A
listing is always produced.

Assume that the hash values do not agree, meaning the data is not
identical.  For example, one record in a million-record file may
differ.  The 'block count' function can be used to produce a separate
hash value for each 'block of records'.  For example, you could
specify blocks of 50,000:

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            OUTLIB(xxx) BLOCKCNT(50000)

A separate record would be written to the outfile for each block of
50,000 records (the last block probably contains fewer than 50,000
records).  By doing the same function on the 2nd system and then
using CMPDBFHSH, you can determine which block of 50,000 is not
identical.

Assume  it was the 300,001 to  350,000 block.  You  can then reduce the
block size  (assume 5,000)  and describe  a specific  block of  records
using the FROMRCD and TORCD parameters such as:

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            FROMRCD(300001) TORCD(350000)
                            OUTLIB(xxx) BLOCKCNT(5000)

This would output 10  records to the HASHP file.  Assuming  you did the
same  on the 2nd  system and then  used CMPDBFHSH, you  could determine
which block of 5,000 records differed.

On each iteration, you could ask  for smaller and smaller block  counts
as you identify where the  problem is.  Assume you were  able to narrow
the problem  area to a  block of 100 records.   You can  then request a
block of 1 and identify the record which is not identical.
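
As an illustration, assume the differing block of 100 records turned
out to be records 322,401 through 322,500 (hypothetical numbers chosen
only for this example).  The request for a block of 1 might then be:

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            FROMRCD(322401) TORCD(322500)
                            OUTLIB(xxx) BLOCKCNT(1)

This should output 100 records to the HASHP file, one for each record
position in the range.  Doing the same on the 2nd system and then
using CMPDBFHSH would identify the individual record that differs.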

CLCDBFHSH utilizes a good deal of CPU time.  If you have large files,
you should use the command at off-peak times of the day.  Specifying
'from' and 'to' records allows you to perform the hash in multiple
steps.
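
A sketch of that approach for a file of 1,000,000 records, run as two
separate steps.  The split point and the member names STEP1 and STEP2
are assumptions made for this example.  A separate output member is
used for each step so that the keys generated in one step cannot
collide with those of the other.

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            FROMRCD(*START) TORCD(500000)
                            OUTLIB(xxx) OUTMBR(STEP1)

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            FROMRCD(500001) TORCD(1000000)
                            OUTLIB(xxx) OUTMBR(STEP2)

The same two steps would be run on the 2nd system, and each pair of
members compared with CMPDBFHSH using the FROMMBR and TOMBR
parameters.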

Members without data
--------------------

It  is valid to  use CLCDBFHSH  on a member  without any  data, but the
defaults must be  used for FROMRCD and  TORCD.  TAA9894  is sent as  an
escape message if the defaults are not used.
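
In a CL program, the escape message can be monitored.  The fragment
below is a sketch only.  The record range and the decision to rerun
the command with the defaults are assumptions made for illustration.

             CLCDBFHSH    FILE(FILEA) FROMRCD(1) TORCD(50000)
             MONMSG       MSGID(TAA9894) EXEC(DO)
               /* Member has no data: use the defaults instead */
             CLCDBFHSH    FILE(FILEA)
             ENDDO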

HASHP file
----------

The  HASHP  file  will  contain  one record  for  each  block  that  is
requested.    The  default for  BLOCKCNT  is  *ALL  meaning the  entire
member is considered as one block and one record would be output.

The key structure for the HASHP file is:

            Library
              File
                Member
                  Key

The key  field is  taken from  the  KEY parameter  on CLCDBFHSH.    The
default is *GEN meaning  the command will generate a key  for you using
the naming  convention of NBR0000001, NBR0000002, etc.   If you request
multiple  blocks, each block  record would receive  a unique key value.
While you can name  your own key value,  the default should be used  in
most cases.

When  CMPDBFHSH is  used, the  records  in the  From file  are used  to
chain  to the records  in the To  file.  Using the  same key convention
is required.   If  the  To file  record  does not  exist, an  error  is
noted.   Both the  hash value  and the  record counts  are compared  by
CMPDBFHSH.

CMPDBFHSH allows  you to compare one or all  records in the HASHP file.
You can  output  multiple files/members  to  the same  HASHP  file  and
request a comparison on one or all of the files/members.
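
For example, if the HASHP files hold records for several files, the
comparison could be limited to the records for FILEA.  This is a
sketch.  The *ALL library qualifier is the documented default and is
written out here only for clarity.

             CMPDBFHSH    FROMLIB(xxx) TOLIB(yyy)
                            CMPFILE(*ALL/FILEA)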

The  default on  CLCDBFHSH  is  REPLACE(*YES)  for the  output  member.
This means  the member is  cleared first before any  output records are
written.    When you  are only  using HASHP  to compare  one file  at a
time, the default works properly.

If you  want  records from  multiple files/members  in  the same  HASHP
file, you  do not  want the  default to clear  the file  when CLCDBFHSH
begins.

In that case, the special value REPLACE(*MTN) should be considered.
This invokes the MTNHSH command, which deletes any records in the
HASHP file for the FILE and MBR parameters specified on CLCDBFHSH.
This allows you to add new records with the default generated key of
NBR0000001, etc., without causing duplicate key errors.
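
A sketch of collecting records for two files in the same HASHP file.
FILEB is a hypothetical second file, and the example assumes the HASHP
file already exists in library xxx.  Only the records for the file and
member being processed are removed, so the records for the other file
are kept and either command can be rerun without duplicate key errors.

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE)
                            OUTLIB(xxx) REPLACE(*MTN)

             CLCDBFHSH    FILE(FILEB) OUTPUT(*OUTFILE)
                            OUTLIB(xxx) REPLACE(*MTN)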

MTNHSH Command
--------------

The MTNHSH  command is  normally requested  by using  REPLACE(*MTN)  on
the CLCDBFHSH command.

However, you can use MTNHSH at any time to clean up old records in a
HASHP file.  You must identify the file and member that you want to
delete records for.

MTNHSH  must allocate the  HASHP file member.   The library/file/member
records specified  are deleted,  the file  is copied  to the  temporary
file  HASHP2 (created  in  the same  library  as HASHP),  and then  the
records  are copied back.   The HASHP2  file is deleted,  and the HASHP
file is de-allocated.

If the HASHP2 file  exists when the  command starts, it indicates  that
the previous use of the command did not complete successfully.
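
In a CL program, you could check for a leftover HASHP2 file before
using MTNHSH.  The fragment below is a sketch only.  CHKOBJ and the
CPF9801 'object not found' message are standard system functions.
The label name and the decision to stop and send a message are
assumptions made for this example.

             CHKOBJ       OBJ(xxx/HASHP2) OBJTYPE(*FILE)
             MONMSG       MSGID(CPF9801) EXEC(GOTO CMDLBL(MTN))
             /* HASHP2 exists: a prior MTNHSH did not complete */
             SNDPGMMSG    MSG('HASHP2 exists in xxx. Review first.')
             RETURN
 MTN:        MTNHSH       FILE(FILEA) HASHPLIB(xxx)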

Technique used
--------------

The 'hash' technique is not a CRC (Cyclic Redundancy Check) such as
the system uses for its own 'hashing'.  CRC provides an 8 byte
value.  Instead, the RIPEMD-160 hash functions are used, as provided
by the Dept of Electrical Engineering - ESAT/COSIC at K.U. Leuven.
C Language is used to provide a 16 byte return value.

For more information about the technique, refer to the TAAHSHAE1
source member and the RIPEMD-160 software written by Antoon
Bosselaers, available at
http://www.esat.kuleuven.ac.be/-cosicart/ps/AB-9601/ (the character
before 'cosicart' should be a 'tilde').

CLCDBFHSH escape messages you can monitor for
---------------------------------------------

      TAA9892    The assigned key is not unique in the file
      TAA9894    If no records exist, the defaults must be used

Escape messages from based-on functions will be re-sent.
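
A sketch of monitoring for TAA9892 in a CL program.  The scenario is
an assumption made for illustration: records are being added with
REPLACE(*NO) and the generated key already exists in the HASHP file,
so the fragment retries with REPLACE(*MTN) to remove the old records
for the file first.

             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE) +
                            OUTLIB(xxx) REPLACE(*NO)
             MONMSG       MSGID(TAA9892) EXEC(DO)
               /* Key is not unique: remove old records, retry */
             CLCDBFHSH    FILE(FILEA) OUTPUT(*OUTFILE) +
                            OUTLIB(xxx) REPLACE(*MTN)
             ENDDO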

CMPDBFHSH escape messages you can monitor for
---------------------------------------------

      TAA9893    Differences were found
      TAA9895    No records were found to compare
                   Check the CMPxxx parameters

Escape messages from based-on functions will be re-sent.
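
A sketch of handling both messages in a CL program.  The label names
and message texts are assumptions made for this example.

             CMPDBFHSH    FROMLIB(xxx) TOLIB(yyy)
             MONMSG       MSGID(TAA9893) EXEC(GOTO CMDLBL(DIFF))
             MONMSG       MSGID(TAA9895) EXEC(GOTO CMDLBL(NONE))
             RETURN       /* Hash values and counts agree */
 DIFF:       SNDPGMMSG    MSG('Differences found. See listing.')
             RETURN
 NONE:       SNDPGMMSG    MSG('Nothing compared. Check CMPxxx.')
             RETURN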

MTNHSH escape messages you can monitor for
------------------------------------------

None.  Escape messages from based-on functions will be re-sent.

CLCDBFHSH Command parameters                          *CMD
----------------------------

   FILE          The  qualified name of  the file to  generate the hash
                 value for.    The  library value  defaults  to  *LIBL.
                 *CURLIB may also be used.

   MBR           The  member  to generate  the  hash  value for.    The
                 default  is *FIRST for  the first member  of the file.

                 A specific member name may be entered.

   FROMRCD       The 'from'  record  in  the member  to  start  reading
                 from.    The  default  is  *START  meaning  the  first
                 record in the file.

                 A  specific relative  number may  be  entered up  to a
                 maximum  of  9,999,999,999.   If  a specific  value is
                 entered, it must  be *LE to  the TORCD value and  must
                 be *LE to the number of records in the member.

                 The  file is  read  in arrival  sequence.   The  value
                 entered  (*START = 1) is used  on an OVRDBF command to
                 begin  the  reading   of  the   member.    The   first
                 non-deleted record is read from that point.

                 If  the BLOCKCNT  parameter  is other  than *ALL,  you
                 are  identifying  the block  size  within  the FROMRCD
                 and TORCD.

   TORCD         The 'to'  record  in the  member  to end  reading  on.
                 The default is *END meaning to the 'end of file'.

                 If *END is  used, the 'end  of file' is  determined by
                 the number  of records in the member  when the command
                 starts  processing.   This  value determines  the last
                 record to be  read.  If  additional records are  added
                 to  the end  of file  while CLCDBFHSH  is  in process,
                 they are not considered.

                 A  specific number may  be entered that is  *GE to the
                 value of the  FROMRCD parameter.   The number  entered
                 will  be the  last record  read unless  'end of  file'
                 occurs prior to that value.

   OUTPUT        The  type  of  output  to  be  performed.   *  is  the
                 default meaning  that messages  are sent  to  describe
                 the results.

                 *OUTFILE may be  specified to mean that  both messages
                 and an  outfile with the  results will be  output.  If
                 *OUTFILE   is  specified,  you   may  also  enter  the
                 parameters   OUTLIB,   OUTMBR,   REPLACE,   KEY,   and
                 BLOCKCNT.

   OUTLIB        The library  in which the  file HASHP will  be placed.
                 The  default is  *LIBL.   If the  HASHP file  does not
                 already exist, a library must be named.

   OUTMBR        The  member of  the  HASHP  file  to  be  used.    The
                 default is  HASHP.   If the member  does not  exist it
                 is added.

   REPLACE       A *YES/*NO/*MTN value that controls how existing
                 records in the member of the HASHP file are handled
                 before new records are written.  The default is *YES
                 meaning the member is cleared first.

                 *NO may  be specified to  add records to  any existing
                 data.

                 *MTN  may be specified  to invoke the  MTNHSH command.
                 This  will cause  a deletion  of any  existing records
                 for the  same  file/library/member before  adding  any
                 new records to the file.

   KEY           The  key assigned to  the record  in the  output file.
                 The  default is  *GEN meaning  a naming  convention is
                 used  of  NBR0000001,  NBR0000002   ...    Using   the
                 default is usually the best solution.

                 The total key in the HASHP file is made up of the
                 library and file names from the FILE parameter plus
                 the MBR and KEY parameters (this generates
                 Library/File/Member/Key).  Unique keys are required.

                 If you  use the CMPDBFHSH  command, the  key structure
                 is  used to  access  the corresponding  record  in the
                 file  being  compared.   You  must be  consistent (the
                 default normally provides  the best  approach).  If  a
                 BLOCKCNT   is  specified,  you   must  use   the  *GEN
                 default.

   BLOCKCNT      The block count used.  An entry is valid only when
                 OUTPUT(*OUTFILE) is specified.

                 The default is  *ALL meaning that  one record will  be
                 output with  a hash  value for  all records  specified
                 between the FROMRCD and TORCD values.

                 A  block size may  be entered (such  as 50000) meaning
                 that a  record  will  be  output  for  each  block  of
                 50,000  records that  exist  between the  FROMRCD  and
                 TORCD values  (the last block would  normally not have
                 the number specified).

                 Using a block count lets you compare smaller and
                 smaller segments of the file to identify the records
                 that are not the same.

                 A  block of 1 is valid.   The block size cannot exceed
                 the number of  records between  the FROMRCD and  TORCD
                 values  nor can  it exceed  the number  of records  in
                 the file.

                 If  1) defaults  for FROMRCD/TORCD are  used and  2) a
                 block  count  is  specified  and  3)  deleted  records
                 exist in  the file, the  number of records  to process
                 will be  the sum of the active  and deleted records in
                 the file.

                 If  a  block  contains  only  deleted  records, X'00's
                 will be returned as the hash value.

CMPDBFHSH Command parameters                          *CMD
----------------------------

   FROMLIB       The  library containing  the  HASHP  file  created  by
                 CLCDBFHSH  that has  the 'from'  data to  be compared.
                 *LIBL or *CURLIB may be used as the library value.

   TOLIB         The  library  containing  the  HASHP  file  created by
                 CLCDBFHSH that  has  the  'to' data  to  be  compared.
                 *LIBL or *CURLIB may be used as the library value.

   FROMMBR       The member of  the 'from' HASHP file to be  used.  The
                 default is *FIRST.

   TOMBR         The  member of the  'to' HASHP file  to be used.   The
                 default is *FIRST.

   CMPFILE       The  qualified  object   name  of  the   file  to   be
                 compared.   The  file name  defaults  to *ALL  meaning
                 any file name will be compared.

                 The  library  defaults  to  *ALL meaning  any  library
                 name will be compared.

   CMPMBR        The  member  name of  the file  to  be compared.   The
                 default  is  *ALL meaning  any  member  name  will  be
                 compared.

   CMPKEY        The  assigned key  to  be compared.    The default  is
                 *ALL  meaning any  assigned key.   Either  the default
                 should be  used or  the value  for  the KEY  parameter
                 you entered  on CLCDBFHSH (assuming  you did  not take
                 the default).

MTNHSH Command parameters                             *CMD
-------------------------

   FILE          The qualified  name of the file  to delete records for
                 in the  HASHP file.   The  library value  defaults  to
                 *LIBL.  *CURLIB may also be used.

                 If  a   special  value   is  used   for  the   library
                 qualifier,  the file must  exist and its  library name
                 is  used  to determine  the records  to be  deleted in
                 HASHP.

   MBR           The member  to  delete  records  for in  HASHP.    The
                 default is  *FIRST for the  first member of  the file.
                 If  the default is  used, the file must  exist and the
                 name of the  first member  will be  used to  determine
                 the records to be deleted.

                 A specific member name may be entered.

   HASHPLIB      The library  containing the  HASHP file.   The default
                 is *LIBL.  *CURLIB may be specified.

   HASHPMBR      The  member  of  the  HASHP  file  that  contains  the
                 records to  be deleted.   The  default is  *FIRST.   A
                 specific member name may be entered.

Restrictions
------------

The maximum record length supported is 32,000.

Prerequisites
-------------

The following TAA Tools must be on your system:

     CHKOBJ3         Check object 3
     CVTHEX          Convert hex
     CVTTIM          Convert time
     EDTVAR          Edit variable
     RTVDAT          Retrieve date
     RTVDBFA         Retrieve data base file attributes
     RTVSYSVAL3      Retrieve system value 3
     SNDCOMPMSG      Send completion message
     SNDESCMSG       Send escape message
     SNDHEXMSG       Send hex message
     SNDSTSMSG       Send status message

Implementation
--------------

None, the tool is ready to use.

Objects used by the tool
------------------------

   Object        Type    Attribute      Src member    Src file
   ------        ----    ---------      ----------    ----------

   CLCDBFHSH     *CMD                   TAAHSHA       QATTCMD
   CMPDBFHSH     *CMD                   TAAHSHA2      QATTCMD
   MTNHSH        *CMD                   TAAHSHA3      QATTCMD
   TAAHSHAC      *PGM       CLLE        TAAHSHAC      QATTCL
   TAAHSHAC2     *PGM       CLP         TAAHSHAC2     QATTCL
   TAAHSHAC3     *PGM       CLP         TAAHSHAC3     QATTCL
   TAAHSHAR      *PGM
   TAAHSHAR3     *PGM                   TAAHSHAR3     QATTRPG
   TAAHSHAR11    *PGM       RPGLE       TAAHSHAR11    QATTRPG
   TAAHSHAR      *MODULE    RPGLE       TAAHSHAR      QATTRPG
   TAAHSHAE1     *MODULE    CLE         TAAHSHAE1     QATTPL1
   TAAHSHAE2     *MODULE    CLE         TAAHSHAE2     QATTPL1
   TAAHSHAP      *FILE      PF          TAAHSHAP      QATTDDS
   TAAHSHAQ      *FILE      PF

TAAHSHAQ is created from the TAAHSHAP source.

Structure
---------

CLCDBFHSH   Cmd
  TAAHSHAC    CL Pgm
    TAAHSHAR11   RPG Pgm  - Checks for duplicate key
    TAAHSHAR     RPG Pgm  - Does hash function
       TAAHSHAR    RPGLE *MODULE
       TAAHSHAE1   CLE  *MODULE
       TAAHSHAE2   CLE  *MODULE
    TAAHSHAR11   RPG Pgm  - 2nd use to write to the HASHP file

CMPDBFHSH   Cmd
  TAAHSHAC2   CL Pgm
    TAAHSHAR2    RPG Pgm

MTNHSH      Cmd
  TAAHSHAC3   CL Pgm
    TAAHSHAR3    RPG Pgm
					

Added to TAA Productivity tools December 15, 2002

