
The timeless art of validation - Tips and Tricks to help cleanse and regulate PeopleSoft Conversion

By Archive User posted 11-09-2010 01:23 PM


How you validate conversion data affects the outcome, so we wanted to share a method of validation that improves results and reduces the need to re-run validation steps to reach the desired data set.

When it comes to converting legacy data, there are a number of things to consider. Format is usually one of the first things to check – date, number, length, type, etc. Matching these data characteristics is important for a number of reasons, the most important being consistency between past, present, and future data. Another important consideration is legitimacy of the data – can the data be inserted, searched, viewed, or saved without error?

One of the best practices for validation is to place conversion data into a staging table – a temporary table with the same or similar structure (fields and keys) as the production/destination record. This allows you to cleanse and update the source data before it is inserted to the destination record. It also enables you to perform any “translations” of the data – where you update a field from a legacy value to a new value. For example, in the legacy system, you may have used “US” as a currency code, but the new system wants you to use “USD”. In the staging table, you would perform a “translate” to update all currency codes from “US” to “USD”.
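The "translate" step above can be sketched with Python's sqlite3 module standing in for the database (the table name PS_CUSTOM_TEST_STG matches the post's example, but the CURRENCY_CD column and sample rows are illustrative):

```python
# Sketch of a staging-table "translate": update a legacy value to the
# new system's value before inserting into the destination record.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PS_CUSTOM_TEST_STG (EMPLID TEXT, CURRENCY_CD TEXT)")
cur.executemany("INSERT INTO PS_CUSTOM_TEST_STG VALUES (?, ?)",
                [("E001", "US"), ("E002", "CAD"), ("E003", "US")])

# Translate legacy "US" to the new system's "USD".
cur.execute("UPDATE PS_CUSTOM_TEST_STG SET CURRENCY_CD = 'USD' "
            "WHERE CURRENCY_CD = 'US'")
conn.commit()

rows = cur.execute("SELECT EMPLID, CURRENCY_CD FROM PS_CUSTOM_TEST_STG "
                   "ORDER BY EMPLID").fetchall()
print(rows)  # [('E001', 'USD'), ('E002', 'CAD'), ('E003', 'USD')]
```

Because the translate runs in the staging table, the source extract never has to change – only the staged copy is updated before the final insert.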

In addition to inserting data into a staging table, another best practice for validation is to use an error number field (ERRNUM in PeopleSoft) to separate source values that should not be added to the destination records. The ERRNUM field allows you to assign an error code to any row in the staging table that does not meet a specific requirement. After all validations are complete, only the error-free rows (ERRNUM = 0) are inserted into the destination records.

As an example, you could assign a documented error number of 9999 to all rows in the staging table (PS_CUSTOM_TEST_STG) where the person does not exist in the PS_PERSON table.

UPDATE PS_CUSTOM_TEST_STG A
SET A.ERRNUM = 9999
     WHERE NOT EXISTS(
     SELECT 'X' FROM PS_PERSON B
          WHERE A.EMPLID = B.EMPLID);

For many Campus Solutions conversions, we would run multiple validations on very large sets of data. If we take the above example and also assign an error number to those students who were not term activated, there is a chance that those 9999 errors get reassigned a new error.

UPDATE PS_CUSTOM_TEST_STG A
SET A.ERRNUM = 8888
     WHERE NOT EXISTS(
     SELECT 'X' FROM PS_STDNT_CAR_TERM B
          WHERE A.EMPLID = B.EMPLID
          AND A.ACAD_CAREER = B.ACAD_CAREER
          AND A.STRM = B.STRM);

In the above example, any student who received the first error (9999) and then the second error (8888) would end up with only 8888, losing the 9999. A first instinct is to validate only rows that do not currently have an error (ERRNUM = 0). While this fixes the error-replacement concern, it does not give you an accurate overview of error counts for estimates, and it does not let you provide a complete error report to the legacy data team to investigate.
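The overwrite problem can be shown in a few lines with sqlite3 (the row and the two hard-coded validations are illustrative – in a real conversion the WHERE clauses would be the NOT EXISTS checks shown above):

```python
# A single-valued ERRNUM column can hold only one error, so each
# validation step overwrites whatever the previous step recorded.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PS_CUSTOM_TEST_STG "
            "(EMPLID TEXT, ERRNUM INTEGER DEFAULT 0)")
cur.execute("INSERT INTO PS_CUSTOM_TEST_STG (EMPLID) VALUES ('E001')")

# Validation 1: person missing -> 9999
cur.execute("UPDATE PS_CUSTOM_TEST_STG SET ERRNUM = 9999 WHERE EMPLID = 'E001'")
# Validation 2: not term activated -> 8888 (replaces 9999)
cur.execute("UPDATE PS_CUSTOM_TEST_STG SET ERRNUM = 8888 WHERE EMPLID = 'E001'")

errnum = cur.execute("SELECT ERRNUM FROM PS_CUSTOM_TEST_STG").fetchone()[0]
print(errnum)  # 8888 -- the 9999 error is lost
```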

The Binary Approach

For certain clients, we have had conversions with 20+ validation/error detection steps on very large sets of data. As we ran our validations, we hit the situation above, where rows of data would switch error values between steps. As we progressed through iterations, we would fix some of the data in the legacy system, clearing one error only to expose another that the overwrite had hidden. This was a time-consuming process: we had to cycle through very large sets of data numerous times to remove all of the errors (fix a 9999 error and re-run, only to discover it is an 8888 error, and so on).

The binary approach allows multiple validations to occur without affecting previous validation steps.

Binary Approach Steps:

1)    Create a new custom error number field (CUSTOM_ERRNUM), number format, length 20 (or more) to handle the number of validation/error detection steps

2)    Document the required error checks and assign each a binary-style error number (1, 10, 100, etc.) – one digit position per check

3)    Write code for validations

4)    Write code for error summary (output to audit file) – See Appendix #1

5)    Insert error-free rows (CUSTOM_ERRNUM = 0) into destination tables.

Above examples re-coded with Binary Approach:

UPDATE PS_CUSTOM_TEST_STG A
SET A.CUSTOM_ERRNUM = A.CUSTOM_ERRNUM + 1
     WHERE NOT EXISTS(
     SELECT 'X' FROM PS_PERSON B
          WHERE A.EMPLID = B.EMPLID);

UPDATE PS_CUSTOM_TEST_STG A
SET A.CUSTOM_ERRNUM = A.CUSTOM_ERRNUM + 10
     WHERE NOT EXISTS(
     SELECT 'X' FROM PS_STDNT_CAR_TERM B
          WHERE A.EMPLID = B.EMPLID
          AND A.ACAD_CAREER = B.ACAD_CAREER
          AND A.STRM = B.STRM);

To Query the specific errors:

Error 1 – EMPLID does not exist in PS:
SELECT * FROM PS_CUSTOM_TEST_STG WHERE CUSTOM_ERRNUM LIKE '%1';
       returns all rows with a '1' in the rightmost position, regardless of what precedes it
Error 10 – EMPLID not term activated for the career/term:
SELECT * FROM PS_CUSTOM_TEST_STG WHERE CUSTOM_ERRNUM LIKE '%1_'; (underscore = single-character wildcard)
       returns all rows with a '1' in the second-rightmost position
Error 100:
SELECT * FROM PS_CUSTOM_TEST_STG WHERE CUSTOM_ERRNUM LIKE '%1__';
       returns all rows with a '1' in the third-rightmost position
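These positional LIKE queries can be verified with sqlite3, which (like Oracle) implicitly converts the numeric CUSTOM_ERRNUM to text for the LIKE comparison. The sample rows are illustrative:

```python
# Querying individual error flags by digit position with LIKE patterns.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PS_CUSTOM_TEST_STG (EMPLID TEXT, CUSTOM_ERRNUM INTEGER)")
cur.executemany("INSERT INTO PS_CUSTOM_TEST_STG VALUES (?, ?)",
                [("E001", 0), ("E002", 1), ("E003", 10), ("E004", 11)])

# Error 1: a '1' in the rightmost position
err1 = [r[0] for r in cur.execute(
    "SELECT EMPLID FROM PS_CUSTOM_TEST_STG "
    "WHERE CUSTOM_ERRNUM LIKE '%1' ORDER BY EMPLID")]
# Error 10: a '1' in the second-rightmost position ('_' matches one character)
err10 = [r[0] for r in cur.execute(
    "SELECT EMPLID FROM PS_CUSTOM_TEST_STG "
    "WHERE CUSTOM_ERRNUM LIKE '%1_' ORDER BY EMPLID")]
print(err1)   # ['E002', 'E004']
print(err10)  # ['E003', 'E004']
```

Note that E004 (CUSTOM_ERRNUM = 11) appears in both result sets, which is exactly the complete error picture the single ERRNUM field could not provide.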

With the binary approach, you will have rows from the above examples with
CUSTOM_ERRNUM = 1 (first error only)
CUSTOM_ERRNUM = 10 (second error only)
CUSTOM_ERRNUM = 11 (both errors)

As you can see, the binary approach lets you perform multiple validations on a data set without losing the results of other error detection steps. This reduces the number of conversion program re-runs and validation cycles required to get closer to final production data sets.


For specific follow-up on this matter, please contact

Chad MacDonald
Consultant | Deloitte Inc.
chadmacdonald@deloitte.ca
