BI consulting the hard way: an SSIS template package, my way

So you have a job spec to implement: one or more SSIS packages for data extraction, data loading, file moving and so on. Often the approach to the SQL tasks involves parameters and dynamic queries. Are you going to hard-code them, or use SQL parameters (whose syntax changes depending on whether you use an OLE DB or ADO.NET connection…)?

Assuming you have already written the SQL snippets that implement the logic of the spec, let's see how to build a package that integrates them while keeping the SQL inside the package in sync with the SQL from the spec. This is very useful in real-world scenarios: the spec changes and you need to apply those changes to the package while keeping everything together. We live in an Agile world, especially if you're a consultant.

This screenshot shows the objects involved; let me describe them part by part.

I assume the package uses SSIS Configurations and that you have some sort of table where you log each execution of the package (Batch), and probably another table to keep the details of each Batch execution (a common approach for an ETL logging framework).
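As a minimal sketch of what such a logging schema might look like (all table and column names here are hypothetical, not the template's actual ones):

```sql
-- Hypothetical ETL logging schema; adapt names to your own framework.
CREATE TABLE dbo.Batch (
    BatchId          INT IDENTITY(1,1) PRIMARY KEY,
    PackageName      NVARCHAR(200) NOT NULL,
    BatchDescription NVARCHAR(500) NULL,      -- e.g. mapped from BatchProcessDescription
    StartTime        DATETIME NOT NULL DEFAULT (GETDATE()),
    EndTime          DATETIME NULL,
    Status           NVARCHAR(20) NOT NULL    -- e.g. 'Started', 'Success', 'Failed'
);

CREATE TABLE dbo.BatchDetail (
    BatchDetailId    INT IDENTITY(1,1) PRIMARY KEY,
    BatchId          INT NOT NULL REFERENCES dbo.Batch (BatchId),
    TaskName         NVARCHAR(200) NOT NULL,
    LoadStart        DATETIME NULL,
    LoadEnd          DATETIME NULL,
    Status           NVARCHAR(20) NULL
);
```

One header row per batch plus one detail row per task keeps the queries for "what ran, when, and how far it got" trivial.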

The History SEQC

It's important to manage this, but having some nice text describing the code changes is not the main point; what really matters is incrementing the values of two package-level version properties:

<DTS:Property DTS:Name="VersionMajor">1</DTS:Property>

<DTS:Property DTS:Name="VersionMinor">0</DTS:Property>

This will help you a lot when troubleshooting a production deployment; it has happened to me to find old versions of packages deployed rather than the upgrades…

The title is mapped to a variable, using an expression like the following:

@[System::PackageName] + " " + @[User::BatchProcessDescription]
Where the latter is

"<Package purpose description to go here>" + " ver " +
(DT_WSTR, 11) @[System::VersionMajor] + "_" +
(DT_WSTR, 11) @[System::VersionMinor] + "_b" +
(DT_WSTR, 11) @[System::VersionBuild]

BatchProcessDescription is a variable used to log the current execution status to a table, so you can check the running state of the deployed packages by querying that table, and even use it as a simple semaphore/coordinator.
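As a sketch of the semaphore idea, assuming a hypothetical Batch logging table with PackageName and Status columns, a check could look like:

```sql
-- Hypothetical check: is another instance of this package still running?
-- Table and column names are illustrative, not the template's actual ones.
SELECT COUNT(*) AS RunningInstances
FROM dbo.Batch
WHERE PackageName = ?        -- parameter mapped to @[System::PackageName]
  AND Status = 'Started';    -- still running: no Success/Failed row logged yet
```

If RunningInstances is greater than zero, a coordinating package can wait or abort instead of starting a concurrent load.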

TaskFlow block

All the logic is in the following block:

1) Print Vars logs the contents of all the variables to the logging tables; this is great when you need to look back at a batch execution searching for the root cause of an issue.

2) the SQL Log Started/Success tasks are just SQL tasks that log the main task-flow details, such as LoadStart, LoadEnd and Status, to a table.
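A minimal sketch of what those two SQL tasks might run (table, column and variable names are illustrative, assuming a Batch logging table):

```sql
-- SQL Log Started (sketch): open a new batch row and return its id
INSERT INTO dbo.Batch (PackageName, BatchDescription, StartTime, Status)
VALUES (?, ?, GETDATE(), 'Started');   -- ? parameters mapped to package variables
SELECT SCOPE_IDENTITY() AS BatchId;    -- captured into a variable, e.g. User::BatchId

-- SQL Log Success (sketch): close the batch row
UPDATE dbo.Batch
SET EndTime = GETDATE(),
    Status  = 'Success'
WHERE BatchId = ?;                     -- mapped to User::BatchId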


It might be necessary to perform some pre/post checks around the actual execution, so the two containers come in handy for that.

For example, if the package loads data from Source to Target, you might want to check that the data has not been loaded already (in a normal execution mode), so you can add some checks in SEQC Housekeeping Pre Tasks, as the main block is intended to log the actual data movement…
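A sketch of such a housekeeping pre-check, assuming a hypothetical target table with a LoadDate column:

```sql
-- Hypothetical pre-check: has today's data already been loaded?
-- dbo.TargetTable and LoadDate are illustrative names.
IF EXISTS (SELECT 1
           FROM dbo.TargetTable
           WHERE LoadDate = CAST(GETDATE() AS DATE))
    RAISERROR('Data already loaded for today; aborting batch.', 16, 1);
```

Raising an error with severity 16 makes the Execute SQL Task fail, so the package stops before the main block attempts a duplicate load.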

Execution override

It's very common to use some sort of configuration to set the package variables used in the data flow. The configuration is applied when the package is loaded, so this section is a handy tool for development and testing.

The Boolean variable Dev_Mode controls the behavior of the package itself: when it is true, SEQC Dev Mode Pre is enabled, so Dev_Mode Setup is executed.


You can override some values coming from the configuration, so you can change the behavior of the package without updating anything in, for example, the SSISConfigurations table (if you're using it).

Another handy SCR is Parse Meta SQL veSQLFor. Since you can end up with loads of SQL scripts, and the related SQL needs to live in variables (or expression-based variables), this task parses some "meta SQL", very close to the SQL you wrote to test the spec in the pre-coding phase, and translates it into an SSIS expression you can copy and paste into the variable expression, rather than building each one manually…

Ex

Let's assume you have these two SQL snippets implementing this (dummy) logic:

-- 1.1 Read Data.sql

SELECT GETDATE() AS CurrentDate

And

-- 2.1 Do something

DECLARE @dateToCheck DATETIME = '2014-01-01' -- PARAM

SELECT DATEDIFF(SECOND, @dateToCheck, GETDATE()) DiffInSeconds

You can define two SQL variables:

veSQLFor1 to hold snippet 1.1, and

veSQLFor2 for 2.1, which is trickier because it needs the value computed by the previous code. Assuming that value is stored in a DATETIME variable CurrentDate, you have to write something like this:

"DECLARE @dateToCheck DATETIME = '" +
(DT_WSTR, 4) YEAR(@[User::CurrentDate]) +
RIGHT("0" + (DT_WSTR, 2) MONTH(@[User::CurrentDate]), 2) +
RIGHT("0" + (DT_WSTR, 2) DAY(@[User::CurrentDate]), 2) +
"' SELECT DATEDIFF(SECOND, @dateToCheck, GETDATE()) DiffInSeconds"

As you can see, apart from the complication of the SSIS type conversions, you have a nice match with the original SQL.

The Parse Meta SQL veSQLFor task does exactly that: it parses the meta SQL and prints the corresponding SSIS expression, so you can define the variable expression without having to build it by hand in the expression dialog (very expensive in time and not handy).

Ex

Given this input

-- 2.1 Do something

DECLARE @dateToCheck DATETIME = @User::CurrentDate

SELECT DATEDIFF(SECOND, @dateToCheck, GETDATE()) DiffInSeconds
it will print the SSIS expression, so you can copy and paste it to set the veSQLFor2 expression (this is done only at package development time, of course; I will write a specific post to share the details):
public void Main()
{
    // N> these are the variables initially contained in the veSQLForxyz meta SQL
    List<string> lVarsToReplaceInMetaSQL = new List<string> {
        "User::vSourceTable001",
        "User::vTargetTable001"
    };

    // TODO
    List<string> lVeSQLForVars = new List<string> {
        "veSQLFor1"
    };

    // dictionary that will hold each veSQLForxyz var with the proper SSIS expression built from the meta SQL
    Dictionary<string, string> dictMetaSQL = new Dictionary<string, string>();

    // N> add the meta SQL from Mng SQL
    AddMetaSQL(dictMetaSQL);

    // N> update ExpressifyVar()
    ParseMetaSQL(dictMetaSQL, lVarsToReplaceInMetaSQL);

    PersistInVars(dictMetaSQL);

    CheckLenghtNoUnreplaced(dictMetaSQL);

    PrintSQL(lVeSQLForVars);

    Dts.TaskResult = (int)ScriptResults.Success;
}

Dev_Mode Check for TOP is just a way to check whether you are using TOP in some SQL clause. This is very common when prototyping the flow: you need to process/extract only a few rows to check that the logic works, and then you can fully run it at the finalization stage to check for ETL performance issues, if any…
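For example, a prototyping version of an extraction query might carry a TOP clause like this (the table name is illustrative):

```sql
-- Dev-mode prototype extraction: only a handful of rows to validate the logic.
-- Remember to remove (or the check above will flag) the TOP for the full run.
SELECT TOP (100) *
FROM dbo.SourceTable;
```

Having a task that scans the SQL variables for a leftover TOP is a cheap safety net against deploying a package that silently loads only the sample rows.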

Finally, all the main event handlers are enabled/disabled by Dev_Mode (mapped to their Disable property), as I noticed this speeds up the package execution.
