SSIS Incremental Pkg Definition

Let’s imagine we have a scenario where a family of similar entities needs to be processed; an example that you might have seen (working on BI projects) is to produce a set of reports that are quite similar in the structure but that might differentiate themselves for some computation applied. Let’s say you cannot use SSRS or because the nature of the problem that would not be suitable…

Scenario definition

Let’s define synthetically and in a simple way the problem to solve and then let’s see how we could address it:

Sources:

  • some ReportTypesID (to identify the type of report to produce)
  • 2 tables with data: TableX, TableY
  • 3 templates: XLS template

Target:

  • 3 XLS generated using the XLS template following the logic:

Using the data in TableX and TableY (they have some columns in common) and 3 given an XLS template produce 3 report types as following:
1. XLS = UNION of TableX & Y values and use Template1
2. XLS = INTERSECTION of TableX & Y values and use Template2
3. XLS = DIFFERENCE of TableX & Y values and use Template3

Given a parameter ReportTypeID generate the related XLS

Not flexible approach

Following a not flexible approach would define 3 different packages ( ex NotFlexibleUNION.dtsx, NotFlexibleINTERSECTION.dtsx and NotFlexibleDIFFERENCE.dtsx) with the same structure

image001

plus some vars

image003

ReportTypeID = and ID 1 to 3 in our case (useful to log what we are running)
veSQLForCalculation = sql to use in the Data Flow (to achieve UNION, INTERSECTION etc…)

and specific hardcoded logic/sql inside to address the 1…3. In this way you end up with a lot of code duplication and a lot different packages to maintain;

A more flexible approach

The following might be suitable in a general situation using child pkg to do calculations; each of one keeps a different setup for ReportTypeID and veSQLForCalculation

image005

Less code duplication but still we have different Childs pkg to manage and all the complexity related passing data between them; looking at our scenario we can reach a better compromise

Incremental approach

The main difference between the 3 reports is how we do calculation (the sql snippet in veSQLForCalculation), to grab the data used to fill the xls;
So what about keeping the sql to run in a table and use this table to configure the pkg at runtime extracting the sql and map it to veSQLForCalculation?

image007

SQL Get veSQLForCalculation grabs the veSQLForCalculation value given the ReportTypeID, that value is mapped to the var veSQLForCalculation. There is an optional SCR parse object to veSQLFor that can be omitted to simplify, what it does is cope a more realistic scenario where you have several columns form the table below, not only 1 veSQLForCalculation and the sql might be big, so does the trick in the code below:

public void Main()
{

OleDbDataAdapter A = new OleDbDataAdapter();
System.Data.DataTable dt = new System.Data.DataTable();
A.Fill(dt, Dts.Variables["User::oveMetaSQL"].Value);
// as the object are stored as varmax() this trick make able to copy in the vars, you cannot use an execute task and map to and object directly ;(... if the table was varchar(8000) ok that was possible, but some sql are much bigger

foreach (DataRow row in dt.Rows)
{

object[] array = row.ItemArray;

/*
User::veSQLForSQLCalculateControlFileTotals,User::veSQLForSQLCalculateCreateFileTotals,User::veSQLForSQLCalculateUpdateFileTotals,User::veSQLForSQLIdentifyDiscrepancyDetails,User::veSQLForSQLReconcileTotals,User::veSQLForSQLUpdateReportStatus
*
*/

Dts.Variables["veSQLForSQLTotals"].Value = array[0].ToString();
Dts.Variables["veSQLForSQLControlFile"].Value = array[1].ToString();
Dts.Variables["veSQLForSQLCalculateUpdateFile"].Value = array[2].ToString();

Dts.TaskResult = (int)ScriptResults.Success;
}
}

The table BLOG.tReportypeConfiguration is something like

ReportTypeID (smallint) veSQLForCalculation (varmax)
1 UNION

SELECT Col1, Col2

UNION ALL

2 — INTERSECTION
3 — DIFFERENCE

So we need just to call the package (from a sql agent job o PROC, passing the ReportTypeID parameter)…

Adding new Types

This design is handy when you need add more new types: just add a new row to the BLOG.tReportypeConfiguration table and probably a new related XLS template to use for the new type introduced, and without code changes affecting the package we can expand to 4 the report typed the pkg can manage. Another value is that when (if) sql changes because some fix/biz requirements we can simply update the BLOG.tReportypeConfiguration without changing the pkg

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s