ETL Architecture Smart Examples & Snippets (1 of N)

This topic is something that is usually very hard to find on the blog, there is a plethora of blogs about SSIS tips (sometime I do that too 8P ), but honestly I think that what makes a difference is how you develop the major core part of a system, how do you think of it and the balance to keep between what you want to deliver and the amount of resources (time and skills) you have.

I am trying here to list the core components of the ETL system I have seen in my past experience trying to focus on the most relevant characteristics they had in common, so you can reuse this information when developing yours; I am not focusing on a particular problem as that would be too specific: DW projects would face some special design consideration on how to deal with SCD Dimensions and Very Deep Fact tables, where a pure Load and Transform system should be designed to be fast and balanced with a special consideration to provide restartability in case of connection issues… etc

Generally speaking an ETL system should have: (in random order)

Configurability

Each package has a list of Vars where the values that might change are maintained. These variables usually are configured using SSIS Configuration (usually indirect configuration where you a have and Environment variable that points to the path of an XML with the connection to the sql where all the vars configured value are kept)

Why to keep the pkg configuration in sql is a pro? Because in this way you could think of the following scenario:

For each XYZ SSIS pkg define these core sqls:

  • PkgConfiguration.sql
  • PROC_uspXYZ_OverrideMode.sql
  • SQLAgentJob_XYZ_NormalMode.sql
  • AddAgentJobSchedules.sql
  • SQLAgentJob_XYZ_OverrideMode.sql

I’ll describe each one in a way to focus on what they do and how they are built

PkgConfiguration.sql

It contains the pkg configuration that is maintained in the special table ETL.[SSISConfigurations]; this table has the standard columns

ConfigurationFilter ConfiguredValue PackagePath ConfiguredValueType

That dtexec uses when a package is configured …

The empty PkgConfiguration.sql is

<PRE>

DECLARE @ConfigFilter VARCHAR(255) = 'Config_#XYZ#';

IF NOT EXISTS (
SELECT 1
FROM [ETL].[SSISConfigurations]
WHERE ConfigurationFilter = @ConfigFilter
)
BEGIN

-- *** INSERT (from xls)

-- ***

PRINT 'New [ETL].[SSISConfigurations] values created on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
END
ELSE
BEGIN
PRINT '[ETL].[SSISConfigurations] values already exists on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database' + ' - no action taken';
END
GO

</PRE>

The part to fill with the specific code is

<PRE>

-- *** INSERT (from xls)

-- ***

</PRE>

To simplify and make that (semi) automatic you can use an external XLS like the following

In the Pkg commit sheet make a check list of the Vars that need to be kept in configuration; at minimum the Sources/Logging/Targets (so you can test the pkg in the DEV env and then move it to QA/PROD just updating the sql configuration – don’t hardcode them!)

So how to we setup the:

<PRE>

-- *** INSERT (from xls)

-- ***

</PRE>

I came up to this approach:

Use the BIDS wizard to put in configuration all the vars listed in the Pkg commit sheet

When done; BIDS will fill AUTOMATICALLY the .[ETL].[SSISConfigurations] for the ‘Config_PKG name NO ext’ ConfigFilter

What you have to do is to use the DEV Build SQL Insert and run the following

DECLARE @ConfigFilter varchar(255) = ‘Config_PKG name NO ext’;

SELECT *

FROM [ETL].[SSISConfigurations]

WHERE ConfigurationFilter = @ConfigFilter"

And place the values in the sheet to build the INSERT snippet:

Let’s have a look through screenshots using fake values:

And to use the sql snippet from the last sheet

" DECLARE @ConfigFilter varchar(255) = ‘Config_BLOG’;

SELECT *

FROM [ETL_Logging].[ETL].[SSISConfigurations]

WHERE ConfigurationFilter = @ConfigFilter"

Note> In XLS you can play a lot with cell ref to propagate values and keep all consistent…in this case the cell value is

=" DECLARE @ConfigFilter varchar(255) = ‘"& E3&"’;

SELECT *

FROM [ETL_Logging].[ETL].[SSISConfigurations]

WHERE ConfigurationFilter = @ConfigFilter"

Where E3 is

="Config_"&’Pkg Commit’!B2

For example the row

Config_BLOG

Y

Package.Variables[User::SourceReady].Properties[Value]

String

SELECT @ConfigFilter ,’Y’,’Package.Variables[User::SourceReady].Properties[Value]’,’String’ UNION ALL

is calculated as

=IF(D15="DateTime",

" SELECT @ConfigFilter "&",’"&TEXT(B15,"DD/MM/YYYY hh:mm:ss")&"’,’"&C15&"’,’"&D15&"’ UNION ALL ",

" SELECT @ConfigFilter "&",’"&B15&"’,’"&C15&"’,’" &D15&"’ UNION ALL ")

So we can copy and paste the yellow code as :

<PRE>

DECLARE @ConfigFilter VARCHAR(255) = 'Config_BLOG';

IF NOT EXISTS (
SELECT 1
FROM [ETL].[SSISConfigurations]
WHERE ConfigurationFilter = @ConfigFilter
)
BEGIN
-- *** INSERT (from xls)

	INSERT INTO [ETL_Logging].[ETL].[SSISConfigurations] (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
	SELECT @ConfigFilter ,'Y','Package.Variables[User::SourceReady].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'Y','Package.Variables[User::TargetReady].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'ETL','Package.Variables[User::LoggingDB].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'OMEGA1','Package.Variables[User::LoggingServer].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'HOGAN','Package.Variables[User::SourceDB1].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'X','Package.Variables[User::SourceServer].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'ALLINONE','Package.Variables[User::TargetDB].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'0','Package.Variables[User::TargetImportMonth].Properties[Value]','Int32' UNION ALL
	SELECT @ConfigFilter ,'OMEGAX','Package.Variables[User::TargetServer].Properties[Value]','String' -- UNION ALL

-- ***

PRINT 'New [ETL].[SSISConfigurations] values created on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
END
ELSE
BEGIN
PRINT '[ETL].[SSISConfigurations] values already exists on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database' + ' - no action taken';
END
GO

</PRE>

And you have done; what is left is to clone this with QA/PROD specific values (changing server at least) and deploying the pkg as it is + the PkgConfiguration_QA.sql and PkgConfiguration_PROD.sql so the pkg will configure itself at runtime… and any other changes will involve a DML against [ETL].[SSISConfigurations], without the configuration you should each time edit the pkg and redeploy it … what a nightmare and how many things could go wrong …

Conclusion

So that’s I think a nice tip to use, I guess; we’ll see

PROC_uspXYZ_OverrideMode.sql

SQLAgentJob_XYZ_NormalMode.sql

AddAgentJobSchedules.sql

SQLAgentJob_XYZ_OverrideMode.sql

in the next posts, in the middle time feel free to comment and give your impression or ask for clarification😉

One thought on “ETL Architecture Smart Examples & Snippets (1 of N)

  1. Pingback: ETL Architecture Smart Examples & Snippets (2 of N) | Obar1 aka Mario Amatucci

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s