Tuesday, February 18, 2014

Parallel Processing using Temporary Record with Application Engine in PeopleSoft

This is another important question we often encounter during the interviews. I have been struggling a lot to understand and finally did it with a classic example.

Parallel processing comes in picture when there is a requirement to process large bulk of rows without compromising the performance which might otherwise get impacted greatly with non-parallel based processing. We would try to understand this concept with an actual example where this concept has been used.

Requirement - There are 100,000 employees in the organization and many of them have more than one assignments (EMPL_RCD) hence the total number of rows in the PER_ORG_ASGN record collectively will be 120,000. Develop a program to update the primary job indicator flag for each employee in the table PRIMARY_JOBS.

Please also see -
PeopleSoft Set Control Field Concept and Tableset Sharing
Understanding dynamic prompts in Peoplesoft
PeopleSoft 9.1 Person Model
Adding and Maintaining Person Of Interest in PeopleSoft
PeopleSoft Set Control Field
Peoplesoft Row Level Security Search Records
SQL Query for Max Effective (MAX (EFFDT) dated row from JOB table
SQL Query to find all the Direct Reports for a Manager
How to find the List of Users Assigned to a Role 
Understanding Component Interface in PeopleSoft
Creating Query Report with PS Query in PeopleSoft
PeopleSoft HRMS Online Training

Implementing Parallel Processing using Temporary Record with Application Engine

Step :1 Open the App Designer
Step :2 Create three record definitions BN_EMPLID_WRK, BN_EMPLID1_WRK and BN_JOB_WRK of type "Temporary Table"



Step 3: Create a record BN_AGE50_AET of type "Derived/Work"




Step 4: Create a new Application Engine program "BN_UPD_PRIM" in app designer.
Step 5: Open the Application engine properties, go to "Temp Tables" tab and assign all three temp tables. Give the instance count as 10.




Step 6: Likewise, go to "State Records" tab and add the record BN_AGE50_AET.

Step 7: Build the temporary records BN_EMPLID_WRK, BN_EMPLID1_WRK and BN_JOB_WRK. App designer will create the number of instances for each of these temporary tables equal to given in the Program Properties -> Temp Tab - Instance count box which is "10" in our case which means the tables for each temp record will be created as below:

   BN_EMPLID_WRK -  BN_EMPLID_WRK1, BN_EMPLID_WRK2... up to 10 instances
   BN_EMPLID1_WRK - BN_EMPLID1_WRK1, BN_EMPLID1_WRK2.... up to 10 instances
   BN_JOB_WRK - BN_JOB_WRK1, BN_JOB_WRK2.... up to 10 instances

Step 8:  In the MAIN section of app engine (Or create a new section) add a Step/Action of SQL type, then add below SQL in that.




This SQL will load all the employees into temp table instances BN_EMPLID_WRK1, BN_EMPLID_WRK2... etc, Please make sure that you are using the meta sql %Table with the temp table name.
Lets say we have 100,000 employees in PS_PERSON record and each employee has just one assignment in PS_PER_ORG_ASGN. Since we have instance count for the temp record to 10 so when the program is run, The selected 100000 rows will be divided into like 100000/10 = 10000 and each chunk of 10000 rows will be inserted into one of the available instances BN_EMPLID_WRK1, BN_EMPLID_WRK2 ...etc.

Please also see -
PeopleSoft Set Control Field Concept and Tableset Sharing
Understanding dynamic prompts in Peoplesoft
PeopleSoft 9.1 Person Model
Adding and Maintaining Person Of Interest in PeopleSoft
PeopleSoft Set Control Field
Peoplesoft Row Level Security Search Records
SQL Query for Max Effective (MAX (EFFDT) dated row from JOB table
SQL Query to find all the Direct Reports for a Manager
Hiring a Person in PeopleSoft
Business Unit, Company and Regulatory Region in PeopleSoft
Process Security in PeopleSoft 

Please note that we don't have to worry about which instance a particular chunk is being inserted into instead, %Table metasql does that for us. However, Its not necessary that application engine will process the rows in exactly same way explained above, its just a logical explanation which is trying to convey that the number of selected rows i.e 100,000 will be divided and loaded into appropriate temp table instances to be processed simultaneously which is nothing but parallel processing.
Once above step is processed, the temp table BN_EMPLID_WRK has been loaded with large volume of data (1000,000 rows) of employees eligible to be processed. How many rows have been loaded in which temp table instance (BN_EMPLID_WRK1, BN_EMPLID_WRK2... etc) is none of our concerns.

Step 9: Add below Step/Action to update the statistics of temp table BN_EMPLID_WRK for the indexes to improve the performance when the rows are selected from the instances of this table in further steps.


Step 10: Add below Step/Action to get the JOB data (EFFDT, EFFSEQ and EMPL_STATUS) for each employee loaded in BN_EMPLID_WRK, and insert into another temp table BN_JOB_WRK (BN_JOB_WRK1, BN_JOB_WRK2 ... upto 10 instances).



Here again as you notice, we just used %Table metasql with the table BN_JOB_WRK in the SQL which will take care as to how should be the allocation of rows for the instances BN_JOB_WRK1, BN_JOB_WRK2...etc.

Step 11:  Add below Step/Action to update the statistics of temp table BN_JOB_WRK for the indexes to improve the performance when the rows are selected from the instances of this table in further steps.


Step 12: Add below Step/Action to select those employees loaded in the temp table BN_EMPLID_WRK who are all don't have more than one assignments (EMPL_RCD) for a benefit record number (BENEFIT_RCD_NBR), and insert into another temp table BN_EMPLID1_WRK (BN_EMPLID1_WRK1, BN_EMPLID1_WRK2...upto 10 instances).



Step 13: Add below Step/Action to update the statistics of temp table BN_EMPLID1_WRK for the indexes to improve the performance when the rows are selected from the instances of this table in further steps.



Step 14: Finally, mark the current EMPL_RCD as "Primary Job" in the  table PS_PRIMARY_JOBS for the processed employees by joining all three temp tables.


Well, it's quite evident that the procedure to implement parallel processing appears simpler than we thought. However, if we try to compare it with the non-parallel way of addressing the same request then it would certainly help us understand the concept clearly.

Let's discuss how would it have been addressed if there was no mechanism called "Parallel Processing".
1) Record Definition - The number of records will be same but the record type for BN_EMPLID_WRK, BN_EMPLID1_WRK and BN_JOB_WRK will be "Table".
2) Since there is no temporary record created hence no need to assign the temp records in Program Properties -> Temp Tables. It won't be applicable.
3) No Need to use %Table metasql while referring to the records.

Everything else will remain exactly same. We also need to understand how different it will behave without parallel processing.
In this case, When we run the program, The system will  store all the 100,000 rows into the table BN_EMPLID_WRK during the execution of Step 1 explained earlier unlike, into various temp table instances BN_EMPLID_WRK1, BN_EMPLID_WRK2..etc. So, when execution goes further then all the 100,000 rows are fetched from one table and processed but in case of parallel processing the chunk of rows i.e 10000 if fetched from each temp table instance and processed all simultaneously. 
Hence, If time taken to complete the processing is 20 Mins then it will be 20/10 ~ 2 to 5 Mins if parallel processing is used.



Please also see -
PeopleSoft Functional and Technical Online Training
Understanding dynamic prompts in Peoplesoft
PeopleSoft 9.1 Person Model
Adding and Maintaining Person Of Interest in PeopleSoft
PeopleSoft Set Control Field
Peoplesoft Row Level Security Search Records
SQL Query for Max Effective (MAX (EFFDT) dated row from JOB table
SQL Query to find all the Direct Reports for a Manager
Hiring a Person in PeopleSoft
How to find the List of Users Assigned to a Role 
Understanding Component Interface in PeopleSoft 
FTE For Multiple Jobs in PoeopleSoft
Business Unit, Company and Regulatory Region in PeopleSoft  

17 comments:

  1. Hello, Wonderful article. Thank you for elaborating concept of Parallel processing in AE.

    However, I got one question. Is Parallel or concurrent processing simply means having more than one instance of your app engine program running at the same time and each instance of the program gets it's dedicated Temp table instance.
    OR
    Parallel or concurrent processing could be achieved using Temp tables. For example if there are 100K rows to be processed and 10 temp table instances, each table is responsible to process 10K rows simultaneously on a single execution of application engine.

    I believe your blog indicates a second concept. I tried to implement lattar in one of the application engine to process about over one Million records and when executed there is only one instance of Temp table allocated to the process. I want to know how it will process 10 temp tables (or more than one temp table) in single run of program.

    ReplyDelete
    Replies
    1. As this is written it is going to dump 100,000 records into BN_EMPLID_WRK1 (actually BN_EMPLID_WRK04). It will not split up the 100,000 rows. You could create 10 programs and limit the initial load to say 10,000 rows,then each program would be processing that specific data at the same time. Or you could actually have one AE and it could determine the number of total rows to load, divide it by the number of temp tables you have, then create a SQL statement to load this data, store it in your state record and schedule each app engine to run. Either way, I do not see how what is written here is going to AUTOMATICALLY know how to divide the 100,000 rows and actually use all instances of the temp tables and process all 10 instances of the data at the same time. If I am missing anything please let me know, but I just do not see it here.

      Delete
    2. Dale- you are absolutely correct.

      Delete
  2. If I were to process 100 K rows in an application Engine, do I need to run different instances of application Engine for parallel processing of the process or data could automatically be spread out into number of Temp tables (Say 10000 rows among 10 temp tables where 10000 is arbitrary) in single run of program.

    My objective is to divide data among instances of temp tables in a single run without running many instances of the same program as our program will be scheduled and must not require any user intervention therefore no one will be there to enter run control parameters manually.

    Our expected setting: One process -> one instance run of the program -> Uses 5 temp tables instances to insert huge data simultaneously in a single run using parallel processing to achieve performance. Your blog gives us a clue that we can implement that way (referring to para starting with 'Lets say....'). Are you sure your AE program when executed were using all instance of the temp tables to process data on a single run of an application engine.

    ReplyDelete
  3. @Mavarick_1979 - It can be implemented in both ways however the second one PeopleSoft delivered one and a lot easier to implement too.
    Did you mention number of instance count in "Instance count " filed of App engine program properties (Explained in Step: 5)

    @PeopleSoftOracleFusion - Yes, your understanding is correct.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. I can not see the SQL that is referred to in Step 8, BUT, the way this is written, it will not spread the 100,000 rows across multiple temp tables. It will simply use 1 instance of the 10 tables created. If you want to spread this across 10 tables, you would need to have 10 copies of this app engine. the initial load in step 8, of each AE, would need to setup to only load a specific number or 'chunk' of data. From that point on, each program would be processing the 10k rows that it has loaded, The only value in what I see here is if this AE abended, you would be able to run it again using a different run control and it would use the next available instance of the temp tables

    ReplyDelete
  6. A very elaborated explanation thanks for sharing it...

    ReplyDelete
  7. Hi Santosh,

    Nice explanation. Could you please explain restart-ability in app engine.

    ReplyDelete
  8. I have queued it up in my to be published list.. thanks for visiting guys I am delighted that you all are liking the blog. .

    ReplyDelete
  9. Dale barthold is correct- parallel processing means you have the ability to run the same appengine working on different sets of data- if you implement this as it is elaborated here- it is only going to use one instance of the temp table. You need to segregate data at the runcontrol level and desing your insert sql for Insert into %table (temp_table) so that it fetches data based on different runcontrol parameters.

    ReplyDelete
  10. This comment has been removed by the author.

    ReplyDelete
  11. can anybody please explain on the calculation of total instance count of temporary tables, if count is defined online as well as in application engine

    ReplyDelete
  12. Thanks Santhosh . Very nicely explained.

    ReplyDelete
  13. errors


    PeopleTools 8.50 - Application Engine
    Copyright (c) 1988-2015 PeopleSoft, Inc.
    All Rights Reserved


    Application error: no SQL statement defined for SQL action (108,535)
    Message Set Number: 108
    Message Number: 535
    Message Reason: Application error: no SQL statement defined for SQL action (108,535)

    Process 1335 ABENDED at Step PARR_AE.MAIN.Step02 (SQL) -- RC = ? (108,524)
    Message Set Number: 108
    Message Number: 524
    Message Reason: Process 1335 ABENDED at Step PARR_AE.MAIN.Step02 (SQL) -- RC = ? (108,524)

    Process %s ABENDED at Step %s.%s.%s (Action %s) -- RC = %s

    ReplyDelete
  14. Such a wonderful work....You are Guru of Guru Sir....Thanks.

    ReplyDelete