Plan Big, an agile pomodoro spreadsheet

I’ve seen too many colleagues start their day without spending 10 minutes thinking upfront about what to do, and they end up with loads of wasted time and confusion… honestly, it’s much better to start your day with a quick thought about what you’ve done, what you need to do today and what you plan to achieve tomorrow; in a team this is essential to have a scope and the feeling of working on something meaningful… I cannot understand why these simple steps are still not a dogma in so many companies… are we supposed to be slaves of incompetent PMs/team leaders forever, letting projects fail just for a lack of common sense?

You can use the Google spreadsheet below to help yourself; it takes 2 minutes of your precious time and it will help you stay focused and think about what you are working on…

To plan your daily tasks – https://docs.google.com/spreadsheet/ccc?key=0AsSPt8G9vLMjdDlzMk1DdlNkQl90TzhYZm1RV1dlZUE&usp=sharing

1st, clone it into your own Google Docs account.
 2nd, update it with your daily tasks and how long you think each one is going to take, considering:

20140364_000383-CapturFiles

1 = a 25 min commitment (a pomodoro).
So, for each task, you update the DONE column as you go.
 
20140364_000359-CapturFiles 
Note that the time is updated every minute and the Estimated Done state changes when you change the DONE column; in this way you can see in real time how long it is going to take before you finish your daily tasks.
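If you want to rebuild the sheet yourself, the estimated finish time boils down to one formula. Purely as a guess at the sheet’s internals (assuming, as an example, that column C holds the estimated pomodoros and column D the DONE count for rows 2 to 20), it could be something like:

=NOW() + (SUM(C2:C20) - SUM(D2:D20)) * TIME(0,25,0)

i.e. now, plus 25 minutes for every pomodoro still left.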
 
Last but not least, you might want to save the daily sheet to keep track of your week: just add a new sheet for each new week and copy and paste the columns at the end of the day, so you can switch off your brain and enjoy your free time.

SSIS Incremental Pkg Definition

Let’s imagine we have a scenario where a family of similar entities needs to be processed; an example you might have seen (working on BI projects) is producing a set of reports that are quite similar in structure but differ in some of the computations applied. Let’s say you cannot use SSRS, or that the nature of the problem makes it unsuitable…

Scenario definition

Let’s define synthetically and in a simple way the problem to solve and then let’s see how we could address it:

Sources:

  • a ReportTypeID (to identify the type of report to produce)
  • 2 tables with data: TableX, TableY
  • 3 XLS templates (Template1, Template2, Template3)

Target:

  • 3 XLS files generated using the XLS templates, following this logic:

Using the data in TableX and TableY (they have some columns in common) and the 3 XLS templates, produce 3 report types as follows:
1. XLS = UNION of TableX & Y values and use Template1
2. XLS = INTERSECTION of TableX & Y values and use Template2
3. XLS = DIFFERENCE of TableX & Y values and use Template3

Given a ReportTypeID parameter, generate the related XLS.

Inflexible approach

Following an inflexible approach, we would define 3 different packages (e.g. NotFlexibleUNION.dtsx, NotFlexibleINTERSECTION.dtsx and NotFlexibleDIFFERENCE.dtsx) with the same structure

image001

plus some vars

image003

ReportTypeID = an ID, 1 to 3 in our case (useful to log what we are running)
veSQLForCalculation = the SQL to use in the Data Flow (to achieve UNION, INTERSECTION etc…)

and specific hardcoded logic/SQL inside each package to address cases 1…3. In this way you end up with a lot of code duplication and many different packages to maintain.

A more flexible approach

The following might be suitable in a general situation, using child packages to do the calculations; each one keeps a different setup for ReportTypeID and veSQLForCalculation.

image005

Less code duplication, but we still have different child packages to manage and all the complexity of passing data between them; looking at our scenario we can reach a better compromise.

Incremental approach

The main difference between the 3 reports is how we do the calculation (the SQL snippet in veSQLForCalculation) to grab the data used to fill the XLS.
So what about keeping the SQL to run in a table, and using that table to configure the pkg at runtime, extracting the SQL and mapping it to veSQLForCalculation?

image007

The SQL task Get veSQLForCalculation grabs the veSQLForCalculation value given the ReportTypeID, and that value is mapped to the var veSQLForCalculation. There is an optional script task, SCR parse object to veSQLFor, that can be omitted to keep things simple; it covers a more realistic scenario where you have several columns coming from the table below (not just one veSQLForCalculation) and the SQL might be big, so it does the trick with the code below:

public void Main()
{
    // requires: using System.Data; using System.Data.OleDb;
    // The configuration row arrives in the object var User::oveMetaSQL. Because the columns
    // are varchar(max) you cannot map them to string vars directly from an Execute SQL Task
    // (that works up to varchar(8000), but some of the SQL is much bigger), so we fill a
    // DataTable from the object var and copy the values out here.
    OleDbDataAdapter adapter = new OleDbDataAdapter();
    System.Data.DataTable dt = new System.Data.DataTable();
    adapter.Fill(dt, Dts.Variables["User::oveMetaSQL"].Value);

    foreach (DataRow row in dt.Rows)
    {
        object[] array = row.ItemArray;

        /*
        Columns expected, in order:
        User::veSQLForSQLCalculateControlFileTotals, User::veSQLForSQLCalculateCreateFileTotals,
        User::veSQLForSQLCalculateUpdateFileTotals, User::veSQLForSQLIdentifyDiscrepancyDetails,
        User::veSQLForSQLReconcileTotals, User::veSQLForSQLUpdateReportStatus
        */

        Dts.Variables["veSQLForSQLTotals"].Value = array[0].ToString();
        Dts.Variables["veSQLForSQLControlFile"].Value = array[1].ToString();
        Dts.Variables["veSQLForSQLCalculateUpdateFile"].Value = array[2].ToString();
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

The table BLOG.tReportypeConfiguration is something like

ReportTypeID (smallint) | veSQLForCalculation (varchar(max))
1 – UNION               | SELECT Col1, Col2 … UNION ALL …
2 – INTERSECTION        | …
3 – DIFFERENCE          | …
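To make it concrete, here is a minimal sketch of the configuration table and of the lookup run by SQL Get veSQLForCalculation (table and column names come from the text above; the stored SQL values are only placeholders, and the ? marker assumes an OLE DB parameter mapped to ReportTypeID):

<PRE>

CREATE TABLE BLOG.tReportypeConfiguration
(
    ReportTypeID        smallint     NOT NULL PRIMARY KEY,
    veSQLForCalculation varchar(max) NOT NULL
);

-- one row per report type; the snippet is what the pkg runs at runtime
INSERT INTO BLOG.tReportypeConfiguration (ReportTypeID, veSQLForCalculation)
VALUES
 (1, 'SELECT Col1, Col2 FROM TableX UNION     SELECT Col1, Col2 FROM TableY'),  -- UNION
 (2, 'SELECT Col1, Col2 FROM TableX INTERSECT SELECT Col1, Col2 FROM TableY'),  -- INTERSECTION
 (3, 'SELECT Col1, Col2 FROM TableX EXCEPT    SELECT Col1, Col2 FROM TableY');  -- DIFFERENCE

-- what the SQL Get veSQLForCalculation task executes, with ? mapped to the ReportTypeID parameter
SELECT veSQLForCalculation
FROM   BLOG.tReportypeConfiguration
WHERE  ReportTypeID = ?;

</PRE>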

So we just need to call the package (from a SQL Agent job or a PROC, passing the ReportTypeID parameter)…

Adding new Types

This design is handy when you need to add new types: just add a new row to the BLOG.tReportypeConfiguration table (and probably a new XLS template for the new type introduced), and without any code changes to the package we can expand to 4 the report types the pkg can manage. Another benefit is that when (if) the SQL changes because of some fix or business requirement, we can simply update BLOG.tReportypeConfiguration without changing the pkg.
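For example, adding a hypothetical 4th report type would just be one more row (the SQL value below is only a placeholder):

<PRE>

INSERT INTO BLOG.tReportypeConfiguration (ReportTypeID, veSQLForCalculation)
VALUES (4, 'SELECT Col1, Col2 FROM TableX WHERE Col2 IS NOT NULL');  -- new type, no change to the pkg

</PRE>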

ETL Architecture Smart Examples & Snippets (1 of N)

This topic is something that is usually very hard to find on blogs: there is a plethora of blogs about SSIS tips (sometimes I do that too 8P), but honestly I think that what makes the difference is how you develop the core parts of a system, how you think about it, and the balance you keep between what you want to deliver and the amount of resources (time and skills) you have.

I am trying here to list the core components of the ETL systems I have seen in my past experience, focusing on the most relevant characteristics they had in common, so you can reuse this information when developing yours. I am not focusing on a particular problem, as that would be too specific: DW projects face special design considerations on how to deal with SCD dimensions and very deep fact tables, while a pure load-and-transform system should be designed to be fast and balanced, with special consideration given to restartability in case of connection issues… etc.

Generally speaking, an ETL system should have the following (in random order):

Configurability

Each package has a list of Vars where the values that might change are maintained. These variables are usually configured using SSIS Configurations (usually indirect configuration, where you have an environment variable that points to the path of an XML file holding the connection to the SQL Server database where all the configured values are kept).

Why is keeping the pkg configuration in SQL a plus? Because in this way you can think of the following scenario:

For each XYZ SSIS pkg define these core sqls:

  • PkgConfiguration.sql
  • PROC_uspXYZ_OverrideMode.sql
  • SQLAgentJob_XYZ_NormalMode.sql
  • AddAgentJobSchedules.sql
  • SQLAgentJob_XYZ_OverrideMode.sql

I’ll describe each one, focusing on what it does and how it is built.

PkgConfiguration.sql

It contains the pkg configuration that is maintained in the special table ETL.[SSISConfigurations]; this table has the standard columns

ConfigurationFilter ConfiguredValue PackagePath ConfiguredValueType

That dtexec uses when a package is configured …
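For reference, a table with those columns can be created like this (a sketch based on the standard layout generated by the SSIS package configuration wizard, here kept in the ETL schema as this post does):

<PRE>

CREATE TABLE [ETL].[SSISConfigurations]
(
    ConfigurationFilter  nvarchar(255) NOT NULL,
    ConfiguredValue      nvarchar(255) NULL,
    PackagePath          nvarchar(255) NOT NULL,
    ConfiguredValueType  nvarchar(20)  NOT NULL
);

</PRE>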

The empty PkgConfiguration.sql is

<PRE>

DECLARE @ConfigFilter VARCHAR(255) = 'Config_#XYZ#';

IF NOT EXISTS (
SELECT 1
FROM [ETL].[SSISConfigurations]
WHERE ConfigurationFilter = @ConfigFilter
)
BEGIN

-- *** INSERT (from xls)

-- ***

PRINT 'New [ETL].[SSISConfigurations] values created on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
END
ELSE
BEGIN
PRINT '[ETL].[SSISConfigurations] values already exists on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database' + ' - no action taken';
END
GO

</PRE>

The part to fill with the specific code is

<PRE>

-- *** INSERT (from xls)

-- ***

</PRE>

To simplify and make that (semi) automatic you can use an external XLS like the following

In the Pkg commit sheet, make a checklist of the Vars that need to be kept in configuration; at a minimum the Sources/Logging/Targets (so you can test the pkg in the DEV env and then move it to QA/PROD just by updating the SQL configuration – don’t hardcode them!)

So how do we set up the:

<PRE>

-- *** INSERT (from xls)

-- ***

</PRE>

I came up with this approach:

Use the BIDS wizard to put in configuration all the vars listed in the Pkg commit sheet

When done, BIDS will AUTOMATICALLY fill [ETL].[SSISConfigurations] for the ‘Config_PKG name NO ext’ ConfigFilter.

What you have to do is use the DEV Build SQL Insert sheet and run the following:

DECLARE @ConfigFilter varchar(255) = 'Config_PKG name NO ext';

SELECT *

FROM [ETL].[SSISConfigurations]

WHERE ConfigurationFilter = @ConfigFilter

And place the values in the sheet to build the INSERT snippet:

Let’s have a look through screenshots using fake values:

And here is the SQL snippet from the last sheet:

DECLARE @ConfigFilter varchar(255) = 'Config_BLOG';

SELECT *

FROM [ETL_Logging].[ETL].[SSISConfigurations]

WHERE ConfigurationFilter = @ConfigFilter

Note> in XLS you can play a lot with cell references to propagate values and keep everything consistent… in this case the cell value is

=" DECLARE @ConfigFilter varchar(255) = '"& E3 &"';

SELECT *

FROM [ETL_Logging].[ETL].[SSISConfigurations]

WHERE ConfigurationFilter = @ConfigFilter"

Where E3 is

="Config_"&'Pkg Commit'!B2

For example, the row

Config_BLOG | Y | Package.Variables[User::SourceReady].Properties[Value] | String

produces the snippet

SELECT @ConfigFilter ,'Y','Package.Variables[User::SourceReady].Properties[Value]','String' UNION ALL

which is calculated by the formula

=IF(D15="DateTime",

" SELECT @ConfigFilter "&",'"&TEXT(B15,"DD/MM/YYYY hh:mm:ss")&"','"&C15&"','"&D15&"' UNION ALL ",

" SELECT @ConfigFilter "&",'"&B15&"','"&C15&"','"&D15&"' UNION ALL ")

So we can copy and paste the yellow code as:

<PRE>

DECLARE @ConfigFilter VARCHAR(255) = 'Config_BLOG';

IF NOT EXISTS (
SELECT 1
FROM [ETL].[SSISConfigurations]
WHERE ConfigurationFilter = @ConfigFilter
)
BEGIN
-- *** INSERT (from xls)

	INSERT INTO [ETL_Logging].[ETL].[SSISConfigurations] (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
	SELECT @ConfigFilter ,'Y','Package.Variables[User::SourceReady].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'Y','Package.Variables[User::TargetReady].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'ETL','Package.Variables[User::LoggingDB].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'OMEGA1','Package.Variables[User::LoggingServer].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'HOGAN','Package.Variables[User::SourceDB1].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'X','Package.Variables[User::SourceServer].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'ALLINONE','Package.Variables[User::TargetDB].Properties[Value]','String' UNION ALL
	SELECT @ConfigFilter ,'0','Package.Variables[User::TargetImportMonth].Properties[Value]','Int32' UNION ALL
	SELECT @ConfigFilter ,'OMEGAX','Package.Variables[User::TargetServer].Properties[Value]','String' -- UNION ALL

-- ***

PRINT 'New [ETL].[SSISConfigurations] values created on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
END
ELSE
BEGIN
PRINT '[ETL].[SSISConfigurations] values already exists on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database' + ' - no action taken';
END
GO

</PRE>

And you are done; what is left is to clone this with QA/PROD-specific values (changing at least the server names) and to deploy the pkg as it is, plus the PkgConfiguration_QA.sql and PkgConfiguration_PROD.sql, so the pkg will configure itself at runtime… any further change will just be a DML statement against [ETL].[SSISConfigurations]; without the configuration you would have to edit the pkg and redeploy it every time… what a nightmare, and how many things could go wrong…
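For example, pointing the QA or PROD copy of the configuration at a different target server is just a DML statement like this (the server name is a placeholder; the filter and path come from the example above):

<PRE>

UPDATE [ETL].[SSISConfigurations]
SET    ConfiguredValue     = 'PRODSERVER01'
WHERE  ConfigurationFilter = 'Config_BLOG'
AND    PackagePath         = 'Package.Variables[User::TargetServer].Properties[Value]';

</PRE>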

Conclusion

So that’s a nice tip to use, I think. We’ll see

PROC_uspXYZ_OverrideMode.sql

SQLAgentJob_XYZ_NormalMode.sql

AddAgentJobSchedules.sql

SQLAgentJob_XYZ_OverrideMode.sql

in the next posts; in the meantime feel free to comment, give your impressions or ask for clarification ;)

SSIS Passing ObjVar from Child back to Parent

It’s straightforward to use a Parent Variable configuration to set a child package variable when using the Execute Package task.

It’s a bit trickier to do it the other way around. Let’s see how…

Let’s assume we have a SSIS project with the following:

ParentPkg.dtsx

With the following vars:

And

ChildPkg.dtsx

With the following vars:

And a simple Parent Variable configuration to get the value of TargetMonthEnd:

How does it work:

The flow is the following:

1. ParentPkg runs:
1.1. it sets the value of the TargetMonthEnd
1.2. it calls the child passing it
1.3. The ChildPkg runs:
1.3.1. Get the value of TargetMonthEnd from the parent
1.3.2. Calculate the value of TargetMonthEndChild
1.3.3. Copy it to TargetMonthEnd
1.4. Print both values

Some snippets from the code, by section:

1.1

<PRE>
Dts.Variables["TargetMonthEnd"].Value = 20140131;
</PRE>

1.3.2

<PRE>
int TargetEndMonth = Convert.ToInt32(Dts.Variables["User::TargetMonthEnd"].Value);

Dts.Variables["User::TargetEndMonthTextChild"].Value = string.Format("[ {0} ] ", TargetEndMonth);
</PRE>

The most relevant part is
1.3.3

<PRE>
public void Main()
{
    try
    {
        Variables vars = null;
        // copy the value only when both vars exist, so the child can also run standalone
        if (Dts.VariableDispenser.Contains("User::TargetEndMonthText")
            && Dts.VariableDispenser.Contains("User::TargetEndMonthTextChild"))
        {
            Dts.VariableDispenser.LockForWrite("User::TargetEndMonthText");
            Dts.VariableDispenser.LockForRead("User::TargetEndMonthTextChild");
            Dts.VariableDispenser.GetVariables(ref vars);
            vars["User::TargetEndMonthText"].Value = vars["User::TargetEndMonthTextChild"].Value;
            vars.Unlock();
        }
    }
    catch { }
    Dts.TaskResult = (int)ScriptResults.Success;
}
</PRE>

In this way we can run the child as standalone without breaking it in case it’s not called by a Parent.

The way it works behind the scenes is (probably) that, when using the Execute Package task, the Parent and the Child share the same collection of vars (passed by reference), so both can modify them.

Test and Execution

Let’s ‘unit test’ the Child and then go for an integration test:

As expected, the child just picks the default value when called standalone.

Now let’s run the Parent with Child package enabled

And to double check … with child disabled

So the rule is:

share 2 matching vars between Parent and Child, and

have an extra var in the child that will do the trick.

BI consultants the hard way SSIS template package my way

So you have some job spec to implement: one or more SSIS packages for data extraction, data loading, file moving etc. Sometimes the approach to SQL tasks involves parameters and dynamic queries: are you going to hardcode them or use SQL parameters (whose syntax changes depending on whether you use an OLE DB or an ADO.NET connection…)?

Assuming you have already coded some SQL snippets that address the logic parts of the spec, let’s see how to build a package that can integrate them, keeping the SQL in the package in sync with the SQL code from the spec. This is very useful in real-world scenarios, as the spec changes and you need to apply those changes to the pkg while keeping everything together; we live in an Agile world, especially if you’re a consultant.

This screenshot shows the objects involved; let me detail them part by part.

I assume the package uses SSISConfigurations and that you have some sort of table where you log the execution of the package (Batch), and probably another table to keep the details of each Batch execution (a common approach for an ETL logging framework).

The History SEQC

It’s important to maintain this, but having some nice text describing the code changes is not the most important thing; what really matters is to increment the values of these 2 package-level version properties:

<DTS:Property DTS:Name="VersionMajor">1</DTS:Property>

<DTS:Property DTS:Name="VersionMinor">0</DTS:Property>

This will help you a lot when troubleshooting a production deployment; it has happened to me to find old versions of packages deployed rather than the upgraded ones…

The title is mapped to a Var, using an expression like the following:

@[System::PackageName] + " " + @[User::BatchProcessDescription]

Where the latter is

"<Package purpose description to go here>" + " ver " +

(DT_WSTR, 11) @[System::VersionMajor] + "_" +

(DT_WSTR, 11) @[System::VersionMinor] + "_b" +

(DT_WSTR, 11) @[System::VersionBuild]

BatchProcessDescription is a Var used to log the current execution status in a table, so you can check the running state of the deployed packages by querying that table, and even use it as a simple semaphore/coordinator.

TaskFlow block

All the logic is in the following block:

1) Print Vars is used to log all the Var contents into the logging tables; this is great when you need to look back at a batch execution searching for the root cause of an issue.

2) The SQL Log Started/Success tasks are just SQL tasks that log the main task flow details, such as LoadStart, End and Status, into a table.

Ex

It might be necessary to do some pre/post checks around the actual execution, so the 2 containers come in handy for that.

For example, if the package loads data from Source to Target, you might check that the data is not already loaded (in normal execution mode), so you can add some checks in SEQC Housekeeping Pre Tasks, as the main block is intended to log actual data movement…

Execution override

It’s very common to use some sort of configuration to set the package variables used in the dataflow; the configuration happens when the package is loaded, so this section is a sort of handy tool for development and testing.

The Boolean var Dev_Mode is used to set the behavior of the pkg itself. When it is true, SEQC Dev Mode Pre is enabled, so the Dev_Mode Setup is hit.

Ex

You can override some values coming from the Config, so you can change the behavior of the package without updating anything on the SSISConfigurations table (if you’re using it), for example.

Another handy SCR is Parse Meta SQL veSQLFor. You can end up with loads of SQL scripts that you need to keep in Vars (or expression Vars); this task parses some meta-SQL, very close to the SQL you wrote to test the spec in the pre-coding phase, and translates it into an SSIS expression you can copy and paste into the Var expression, rather than building each one manually…

Ex

Let’s assume you have these 2 SQL snippets implementing this (dummy) logic:

– 1.1 Read Data.sql

SELECT GETDATE() AS CurrentDate

And

– 2.1.Do something

DECLARE @dateToCheck DATETIME = '2014-01-01' -- PARAM

SELECT DATEDIFF(SECOND, @dateToCheck, GETDATE()) DiffInSeconds

You can define 2 sql vars

veSQLFor1 to keep 1.1

veSQLFor2 for 2.1, which is trickier as it needs the value calculated by the previous code; assuming that value is saved in a datetime var CurrentDate, you have to write something like this:

"DECLARE @dateToCheck DATETIME = '" +

(DT_WSTR, 4) YEAR(@[User::CurrentDate]) + "-" +

RIGHT("0" + (DT_WSTR, 2) MONTH(@[User::CurrentDate]), 2) + "-" +

RIGHT("0" + (DT_WSTR, 2) DAY(@[User::CurrentDate]), 2) +

"' SELECT DATEDIFF(SECOND, @dateToCheck, GETDATE()) DiffInSeconds"

image
As you can see, apart from the complication of the SSIS type conversions, you have a nice match with the original SQL.

The Parse Meta SQL veSQLFor task does exactly that: it parses the meta-SQL and prints the corresponding SSIS expression, so you can define the var expression without having to create it in the expression dialog (very time-consuming and not handy).

Ex

Given this input

– 2.1.Do something

DECLARE @dateToCheck DATETIME =@User::CurrentDate

SELECT DATEDIFF(SECOND, @dateToCheck, GETDATE()) DiffInSeconds
it will print the SSIS expression, so you can copy and paste it to set the veSQLFor2 expression (this is done only during package development, of course; I will write a specific post to share the details).
public void Main()
{
    // N> these are the vars initially contained in the lveSQLForxyz meta-SQL
    List<string> lVarsToReplaceInMetaSQL = new List<string> {
        "User::vSourceTable001",
        "User::vTargetTable001"
    };

    // TODO
    List<string> lVeSQLForVars = new List<string> {
        "veSQLFor1"
    };

    // dict that will contain the lveSQLForxyz vars with the proper SSIS expression coming from the meta-SQL
    Dictionary<string, string> dictMetaSQL = new Dictionary<string, string>();

    // N> add the meta-SQL from Mng SQL
    AddMetaSQL(dictMetaSQL);

    // N> update ExpressifyVar()
    ParseMetaSQL(dictMetaSQL, lVarsToReplaceInMetaSQL);

    PersistInVars(dictMetaSQL);

    CheckLenghtNoUnreplaced(dictMetaSQL);

    PrintSQL(lVeSQLForVars);

    Dts.TaskResult = (int)ScriptResults.Success;
}

Dev_Mode Check for TOP is just a way to check whether you are using TOP in some SQL clause; this is very common when prototyping the flow, since you only need to process/extract a few rows to check that the logic works, and you can then run it fully in a finalization stage to check for ETL performance issues, if any…

Finally, all the main EventHandlers are enabled/disabled by Dev_Mode, as I noticed this speeds up package execution; it is mapped to their Disable property.

BI consultants the hard way from specs to ssis

So you have some job spec to implement as one or more SSIS packages; let’s see an approach that will help you build a quality package in a short time.

A lot of BI developers just jump into BIDS and start coding, trying to match the specs and frequently reworking what they have done as they come to understand the data workflow: wrong. This approach is inconsistent, slow and prone to errors.

Unfortunately you are a BI consultant and not many tools are given to you, so let’s be smart; a cool approach I developed in my last roles is the following. Please give me some feedback.

The idea is quite simple: use an Excel worksheet to abstract the business logic of the spec, so the implementation becomes a matter of clicks and drag-and-drops in BIDS once the dataflow is laid out with the necessary tasks, connections and vars.

This is an example of an xls:

The general structure of the xls has the following columns:

  • SPEC
  • TaskName
  • Connection Input
  • Input Var Used
  • veSQLFor Input Var
  • DFT Fields
  • ROW_Count
  • Output Var
  • Connection output
  • hyperlink to file

Let’s see what they mean.

SPEC

It’s just a reference to the spec doc section (if any)

If they are missing, I suggest adding them to split the spec logic into more manageable chunks.

TaskName

The name of the task; try to prefix the name itself with the type of task

Ex:

SQL Read Latest Business Date Active = sql task to read latest

SCR Extract CurrentZipDate = Script Task to extract …

DFT Import the file content = Data Flow Task to import …

In this way you can apply (as I did) conditional formatting to the column to see them in different colours.

Connection Input

It’s the name of the connection used by the task (if any). This is very useful because you always know which data source (file, db or other) each task is pulling data from.

You can use the

To define the connection used by the task and define a data validation:

In this way, when you need a new connection, you first add it to the Connection worksheet and then select it in the Task sheet for the task itself… keeping it consistent; avoid copying and pasting the text.

Input Var Used

The idea is similar to the connections: the pkg vars are often a source of errors if not managed properly; the best approach is to list them in a separate sheet and set up data validation to constrain the values you can select in the Task sheet for the Input Var Used column.

veSQLFor Input Var

This is an advanced topic and I will post something specific about it; in general, avoid hardcoding SQL snippets in the tasks: define them as vars and use expressions if they need to be dynamic (when they depend on some var values). At the very least you can use a formula to build the name of the var based on the task, so the matching is straightforward:

Ex

SQL Update ControlTable = veSQLForSQLUpdateControlTable

So in the SQL task SQL Update ControlTable, the task will get the SQL from a var named veSQLForSQLUpdateControlTable… easy.

You can use a simple formula like this to build the name

="veSQLFor"&SUBSTITUTE(CELL WITH TASKNAME," ","")

DFT Fields & ROW_Count

These can be optional; they hold the fields of a DFT task and any ROW_Count you need to log.

Ex

Usually in data extraction you need to log the number of rows read/inserted/deleted etc… keep the naming consistent.

Output Var & Connection output

The same as for the input ones: if the task has output connections or is updating a var, this is where to track them.

In this way you can see the dataflow even if you haven’t coded the dtsx yet, just by scrolling down the xls…

Hyperlink to file

This makes sense if you have read the previous post https://mamatucci.wordpress.com/2014/01/15/bi-consultants-the-hard-way-structuring-sql-and-ssis-solution/ where I advise writing the SQL in separate snippets kept in a separate SQL solution; you can use a hyperlink to open the SQL snippet from the xls and check its details.

BI consultants the hard way structuring sql and ssis solution

So you have some job spec to implement: one or more SSIS packages and related SQL code for data extraction, data loading, file moving etc.

A handy approach I developed for the second part is to use the following files as a starting point and expand them; I will try to give you good tips, but feel free to contact me if you need more details.

The workflow is basically the following:

  1. Understand the Business SPEC
  2. Prototype the SQL solution
  3. Writing some test plan to verify correctness
  4. Deploy the SQL to QA/UAT/PROD

So let’s see, point by point, what we can reuse to keep good quality and be quick.

1. Understand the Business SPEC

2. Prototype the SQL solution

Well, 1 & 2 are all up to you; generally speaking, it’s important to clarify as much as possible before starting to code… not the other way around.

  • Deploy.SupportTables.sql

Use it if you need tables to implement the logic, to avoid missing newly created tables or forgetting newly added columns; the same goes for support views, UDFs, procs etc… better to separate them into several files.

3. Writing some test plan to verify correctness

After some initial tests executed on your local machine, you need to automate the pkg as you approach the test and deployment phase. I hope you are using some sort of strategy to deploy/push your new code to QA/UAT and finally PROD, so this deliverable can be deployed with a simple script (e.g. SQLCMD).

  • Deploy.PkgConfiguration.sql

You can use it to implement a very common step in SSIS package creation: defining the SSIS configuration for the pkg (SSISConfigurations). Without being too formal, an SSIS package always needs some way to configure itself, such as changing some var values (e.g. those used in the connection managers); if your pkg has no configuration it means everything is hardcoded, and that is bad, as you cannot deploy the same package to different environments without code changes.

  • Deploy.SQLAgentJob_XYZ_NormalMode.sql
  • Deploy.SQLAgentJob_XYZ_OverrideMode.sql

A normal way to execute a package is to define SQL Agent jobs that call it (DTEXEC), so you can define the scheduling, set up the configuration connection, and override at runtime some params that are not kept in the configuration, of course.

I generally use 2 versions: one for normal mode, implementing the business-as-usual execution of the package, and a second one, called override mode, to reload some data (it’s quite common to be asked to reload data that was updated in a source file/db…).

The difference usually resides in the way the pkg is configured, so it will act/behave in a slightly different way in NormalMode/OverrideMode.
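As a rough sketch (assuming the jobs themselves are created earlier in these scripts, and using placeholder paths and variable names), the two job steps might differ only in the dtexec overrides:

<pre>

-- Normal mode: rely entirely on the stored configuration
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'SQLAgentJob_XYZ_NormalMode',
    @step_name = N'Run XYZ pkg',
    @subsystem = N'CmdExec',
    @command   = N'dtexec /FILE "D:\SSIS\XYZ.dtsx"';

-- Override mode: same package, but force a value at runtime (e.g. the month to reload)
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'SQLAgentJob_XYZ_OverrideMode',
    @step_name = N'Run XYZ pkg (override)',
    @subsystem = N'CmdExec',
    @command   = N'dtexec /FILE "D:\SSIS\XYZ.dtsx" /SET \Package.Variables[User::TargetImportMonth].Properties[Value];201401';

</pre>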

  • TEST.usp_NormalMode.sql
  • TEST.usp_OverrideMode.sql

At the end of the testing phase, the Deploy.*.sql files should be ready and tested for deployment. The TEST.*.sql files are important because you can reuse them to hand over to support and instruct IT support on how to reload data or escalate issues.

  • Deploy.AddAgentJobSchedules.sql

At the very end, update it with the pkg scheduling (if any)
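A minimal sketch of what it might contain (job name, schedule name and times are placeholders):

<pre>

USE msdb;
GO

-- hypothetical schedule: run the normal-mode job every day at 06:00
EXEC dbo.sp_add_jobschedule
    @job_name          = N'SQLAgentJob_XYZ_NormalMode',
    @name              = N'Daily_0600',
    @freq_type         = 4,       -- daily
    @freq_interval     = 1,       -- every day
    @active_start_time = 60000;   -- 06:00:00 (HHMMSS)
GO

</pre>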

4. Deploy the SQL to QA/UAT/PROD

The last part is the deployment. Assuming that all the pieces to deploy are ready, you can logically divide the scripts with the following approach:

first the DDL code (create/drop/alter objects), then the DML (fill the tables with configuration etc.) and at the end any admin code (not related to the logical implementation but sometimes necessary, e.g. calling an external PROC at the end of the deployment to refresh some data…).

  • Deploy.Helper.bat

This is quite handy: it walks through the Deploy.*.sql files looking for text strings you want to keep track of and listing the USE clauses found, so you can stay clean and organized, splitting the DDL/DML/Admin by db rather than putting everything in one huge, unmanageable script.

  • Deploy.AdminDMLDDL.Helper.bat

Checking the Deploy.Helper.txt output, you can call this to create empty placeholders into which you copy and paste the Deploy.*.sql code from points 1, 2 and 3. Note that no code change should happen here…

The result would be something like:

  • Deploy.DDL_DB1_deployment_task.txt
  • Deploy.DML_DB1_deployment_task.txt
  • Deploy.Admin_DB1_deployment_task.txt
  • Deploy.DDL_DB2_deployment_task.txt
  • Deploy.DML_DB2_deployment_task.txt
  • Deploy.Admin_DB2_deployment_task.txt

All the code is ready now to be deployed to the next step of the software lifecycle.

Ex

<pre>

/*
 *** TODO change
 #XYZ#
 #DB#
 #Schema.Table#
 #Schema.Proc#
 #Schema.View#
 */

--**********************
 --DDL Deployment Task *
 --**********************

--*************************************************************************************************
 --Prepared By * MARIOA
 --Date * YYYYMMDD
 --Purpose * Support DDL SQL for the #XYZ# ETL job
 --Name * DDL_Deploy#DB#_rFrameSupportSQL
 --*************************************************************************************************

USE #DB#;
 GO

SET ANSI_NULLS ON
 GO

SET QUOTED_IDENTIFIER ON
 GO

-- CREATE TABLE ---

DECLARE @object_type char(2) = 'U' --User Table
 PRINT '#Schema.Table#...'

IF NOT EXISTS (SELECT *
 FROM sys.objects
 WHERE [object_id] = OBJECT_ID('#Schema.Table#')
 AND [type] = @object_type)
 BEGIN

--
 -- DML
 --

PRINT 'New table created on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
 END
 ELSE
 BEGIN
 PRINT 'Table already exists on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database' + ' - no action taken';
 END
 GO

-- ALTER TABLE ---

DECLARE @object_type char(2) = 'U' --User Table
 PRINT '#Schema.Table#...'

IF EXISTS (SELECT *
 FROM sys.objects
 WHERE [object_id] = OBJECT_ID('#Schema.Table#')
 AND [type] = @object_type)
 BEGIN

--
 -- DML
 --

PRINT 'Table altered on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
 END
 ELSE
 BEGIN
 PRINT 'Table does not exist on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database' + ' - no action taken';
 END
 GO

-- Proc ---

DECLARE @object_type char(2) = 'P' --Stored Procedure
 PRINT '#Schema.Proc#...'

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'#Schema.Proc#') AND type in (N'P', N'PC'))
 DROP PROCEDURE #Schema.Proc#
 GO

--
 -- DML
 --

PRINT 'Stored proc dropped and recreated on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
 GO

-- VIEW ---
 IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'#Schema.View#'))
 DROP VIEW #Schema.View#
 GO

--
 -- DML
 --

PRINT 'View dropped and recreated on ' + @@SERVERNAME + ' in ' + DB_NAME() + ' database';
 GO
</pre>

Some credits to Ian Bradshaw (for the normal/override mode part)

SSIS TooManyTasks on GitHub

I’ve just started a project on GitHub: it’s an SSIS solution to train on every SSIS control flow and data flow task. I think you need to know all the tools in your belt to deliver an efficient and agile solution. Every BI developer should remember that every BI project by nature has a very short development time and a very long maintenance time, so it’s likely that different people will work on the same project at different stages; a full knowledge of the toolset is therefore necessary.

Here is the readme with the current implementation.

The solution is currently based on SQL Server 2008, but I plan to fork it for 2008 and 2012 and check the relevant differences between them.

The solution structure is the following, and I will update it from time to time to cover everything… but it will take a bit of time 8-P

image

WPF Listbox: non blocking UI… in simple steps

I have been working on a new project for the last 2 months… a new feature of a WPF application. I am new to this world, but it has been a good experience… taking advantage of the MVVM pattern. In this post I want to share a small ‘dissertation’ on how to update the UI in real time with a collection of objects that change their state while some long-running computation is done on them in other threads…

The keywords here are: BackgroundWorker, ViewModel, Dispatcher and thread-safe collection…

Ingredients

The first ingredient is a class implementing INotifyPropertyChanged; each property bound to the UI exposes a setter that invokes the PropertyChanged event of the interface (a minimal sketch of this pattern is shown right after the list below).

The approach can be summarized:

  1. Define a class MainViewModel : INotifyPropertyChanged
    1. The ViewModel is an abstraction of the actual xaml page; each property of the class is bound to the xaml through a Binding statement
      • Ex: <TextBlock Text="{Binding ChangingText}"/>
      • The class has a property ChangingText that raises the PropertyChanged event each time it is changed (by some thread), so the UI is notified of the new value and updates itself… note that the class doesn’t call any method like we used to (TextBlockID.Text = ChangingText)… if you do it that way you are not taking advantage of the Binding mechanics that do the dirty job for you.
    2. Define a SafeObservableCollection of whatever you want to bind to the ListBox, TabControl etc. that will be updated by a long-running process.
      • We can present the updated data in the UI without any extra checks… we bind the collection to the UI (xaml) only once, then the Binding notifies the UI of the changes (the INotifyPropertyChanged logic)
      • It’s not enough to use an ObservableCollection; if you try, you’ll get this exception:
        • This type of CollectionView does not support changes to its SourceCollection from a thread different from the Dispatcher thread.
    3. Use a BackgroundWorker to run the task (which usually takes a long time) in an async fashion, so the UI is never blocked
      • Using the SafeObservableCollection, the BackgroundWorker can change it without causing the exception (it uses the Dispatcher when necessary)
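To make the binding mechanics concrete, here is a minimal sketch of the pattern (MainViewModel and ChangingText come from the points above; the backing field and the OnPropertyChanged helper are my assumptions, written to match the signature used later in the post):


using System.ComponentModel;

public class MainViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string _changingText;

    // Bound in xaml as <TextBlock Text="{Binding ChangingText}"/>;
    // raising PropertyChanged is what makes the UI refresh itself.
    public string ChangingText
    {
        get { return _changingText; }
        set
        {
            _changingText = value;
            OnPropertyChanged(this, "ChangingText");
        }
    }

    protected void OnPropertyChanged(object sender, string propertyName)
    {
        var handler = PropertyChanged;
        if (handler != null)
            handler(sender, new PropertyChangedEventArgs(propertyName));
    }
}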

The example solution

Attached here is the solution implementing the approach:

Change the number of items in the textbox and click somewhere else (to defocus the textbox), and the Bw that adds the data will be instantiated and executed (async).

image

The lb on the right is bound to the special collection


public SafeObservableCollection<Item> Items
{
    get
    {
        return _items;
    }
    set
    {
        _items = value;
        OnPropertyChanged(this, "Items");
    }
}

The  BackgroundWorker instance is defined using  a simple delegate that will do the job:


var bw = DefineBw(nItems, DoAdd);
bw.RunWorkerCompleted +=
    (s, args) =>
    {
        if (!args.Cancelled)
        {
            bw = DefineBw(nItems, DoChange);
            bw.RunWorkerAsync();
        }
    };

bw.RunWorkerAsync();

DoAdd randomly adds the Items to the lb and, when finished (if not cancelled), queues another Bw to do some changes (this shows the difference between changes to the collection and changes to the values of the items held in the collection)…
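For completeness, a hypothetical sketch of what DefineBw and DoAdd could look like (the real code is in the attached solution; the delegate signature, Item.Value and the usings System.ComponentModel / System.Threading are my assumptions, and DoChange would be analogous to DoAdd):


// Hypothetical sketch of the two helpers referenced above.
private BackgroundWorker DefineBw(int nItems, Action<BackgroundWorker, int> work)
{
    var bw = new BackgroundWorker { WorkerSupportsCancellation = true };
    bw.DoWork += (s, e) =>
    {
        var worker = (BackgroundWorker)s;
        work(worker, nItems);
        e.Cancel = worker.CancellationPending; // so RunWorkerCompleted sees args.Cancelled
    };
    return bw;
}

private void DoAdd(BackgroundWorker bw, int nItems)
{
    var rnd = new Random();
    for (int i = 0; i < nItems && !bw.CancellationPending; i++)
    {
        // SafeObservableCollection marshals the change to the Dispatcher thread for us
        Items.Add(new Item { Value = rnd.Next(100) });
        Thread.Sleep(50); // simulate slow work so the incremental updates are visible
    }
}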

image

A last screenshot showing queued Bws and a cancel invocation; note the lb on the left with the actions history…

image

Conclusion

This is just an example of the major building blocks needed to reach a nice result; send me an email or add a comment if you need further details…

Reference

http://msdn.microsoft.com/en-us/library/system.componentmodel.backgroundworker(v=vs.90).aspx

http://msdn.microsoft.com/en-us/library/system.windows.threading.dispatcher(v=vs.90).aspx

http://msdn.microsoft.com/en-us/library/ms743695(v=vs.90).aspx

http://msdn.microsoft.com/en-us/library/bb613546(v=vs.90).aspx

more…

http://elegantcode.com/2009/07/03/wpf-multithreading-using-the-backgroundworker-and-reporting-the-progress-to-the-ui/

http://blogs.imeta.co.uk/jyoung/archive/2010/04/06/848.aspx

http://www.planetgeek.ch/2011/05/17/threadsafe-observablecollection/

The Pareto principle, a dimensional justification

The main concern everybody has with the Pareto principle is that when it works against you… you get seriously disappointed, and you start to have regrets. Generally speaking, the Pareto principle says that life is not fair: 80% goes to the 20%; well, today, if we think of the crisis, it’s likely to be even more skewed: 10% of the super-rich hold 90% of the global wealth… I don’t mind this; I am more interested in the everyday effects of the Pareto principle on myself.

I see its major effects when laziness comes over me and stops me from investing a bit of time to solve problems; it can spread consequences that go beyond any expectation. Let’s look at some examples:

  • Your mattress is a bit old and you need a new one, but instead of going straight to the shop to buy one, you simply do nothing. Result: you sleep badly for many days, waking up tired and so on…
  • The email you should have sent to your accountant has been waiting for a week, but you are too busy with other matters, and so you don’t get the important reply you needed to save money…
  • The key to the main door needed to be copied because the one you have is old and doesn’t work properly, and every time you come back you have no idea whether you will get in… and maybe you are not alone.

 

You could say: what a stupid guy, just a little bit of effort would have had a big impact on the consequences… well, think about yourself and all the times you got into trouble just because you didn’t invest the small amount of time needed to get things done… It’s not a matter of strength of character; it’s just that we are not aware of how the consequences of our actions spread along the dimension that is not time.

We are concerned more with the time than with the extension; start thinking in two dimensions, time and space together; then prioritize what to do, and do it.

In the previous examples it is easy to see that the small amount of time saved by not getting things done is nothing compared to the extension of the consequences that not taking action implies.

A simple rule could be: don’t think only about how long it takes to do something (or not to do it); more importantly, focus on the extension that the action (or inaction) will have on your life. If the extension is huge, it is likely to have a huge impact on you, so consider taking action… it happens that the most important things and decisions in our life have to be taken… in seconds.

The Pareto principle can then be read as: 90% of the extension of the consequences is usually connected with actions that take 10% of your time to sort out…

I hope this helped…