So you have a job spec to implement as one or more SSIS packages. Let's look at an approach that will help you build a quality package in a short time.
A lot of BI developers just jump into BIDS and start coding, trying to match the specs and frequently reworking what they have done as they come to understand the data workflow. Wrong. This approach is inconsistent, slow and prone to errors.
Unfortunately, as a BI consultant you are not given many tools, so let's be smart. Here is an approach I developed in my last few roles; please give me some feedback.
The idea is quite simple: use an Excel worksheet to abstract the business logic of the spec, so that the implementation becomes a matter of clicks and drag-and-drops in BIDS once the dataflow is complete with the necessary tasks, connections and variables.
This is an example of an xls:
The general structure of the xls has the following columns:
- Spec reference
- Task name
- Connection Input
- Input Var Used
- veSQLFor Input Var
- DFT Fields
- ROW_Count
- Output Var
- Connection output
- Hyperlink to file
Let's see what they mean.
Spec reference
It's just a reference to the spec doc section (if any). If section numbers are missing, I suggest adding them to split the spec logic into more manageable chunks.
Task name
The name of the task; try to prefix the name itself with the type of the task:
- SQL Read Latest Business Date Active = SQL Task to read the latest business date
- SCR Extract CurrentZipDate = Script Task to extract …
- DFT Import the file content = Data Flow Task to import …
In this way you can apply conditional formatting (as I did) on the column to display the task types in different colours.
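If you ever want to audit this naming convention outside Excel, the prefix-to-type mapping can be sketched in a few lines of Python. This is a hypothetical helper, not part of the worksheet itself; the task names are the examples above:

```python
# Map each task-name prefix to its SSIS task type (the convention from this post).
TASK_PREFIXES = {
    "SQL": "Execute SQL Task",
    "SCR": "Script Task",
    "DFT": "Data Flow Task",
}

def task_type(task_name: str) -> str:
    """Return the SSIS task type implied by the name's prefix, or 'Unknown'."""
    prefix = task_name.split(" ", 1)[0]
    return TASK_PREFIXES.get(prefix, "Unknown")

print(task_type("SQL Read Latest Business Date Active"))  # Execute SQL Task
print(task_type("DFT Import the file content"))           # Data Flow Task
```

The same mapping is what drives the conditional-formatting rules: one colour per prefix.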
Connection Input
It's the name of the connection used by the task (if any). This is very useful because you always know which data source (file, database or other) each task is pulling data from.
List the connections in a dedicated Connections worksheet, then define a Data Validation on this column:
In this way, when you need a new connection you first add it to the Connections worksheet, and then you can select it in the Task sheet for the task itself, keeping everything consistent; avoid copying and pasting the text.
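The same consistency check that Data Validation enforces interactively can be run as a batch check, for example if you export the two worksheets. A minimal sketch, where the connection names and task rows are made-up examples, not values from the post:

```python
# Connections listed in the Connections worksheet (hypothetical example values).
connections = {"cnFlatFileIn", "cnStagingDB"}

# (task name, connection) pairs as entered in the Task worksheet.
task_connections = [
    ("SQL Read Latest Business Date Active", "cnStagingDB"),
    ("DFT Import the file content", "cnFlatFileIn"),
    ("SQL Update ControlTable", "cnStagingDb"),  # typo: wrong casing
]

# Flag any task pointing at a connection missing from the master list --
# exactly what the Data Validation dropdown prevents when you avoid copy/paste.
invalid = [(t, c) for t, c in task_connections if c not in connections]
for task, conn in invalid:
    print(f"{task}: unknown connection '{conn}'")
```

Note that the check is case-sensitive, which is precisely why copy-and-paste typos like `cnStagingDb` slip through when you type names by hand.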
Input Var Used
The idea is similar to the connections: the package variables are often a source of errors if not managed properly. The best approach is to list them in a separate sheet and set up Data Validation to constrain the values you can select in the Task sheet's Input Var Used column.
veSQLFor Input Var
This is an advanced topic and I will post something specific about it. In general, avoid hardcoding SQL snippets in the tasks; define them as variables and use expressions if they need to be dynamic (i.e. when they depend on some variable's value). At the very least, you can use a formula to build the variable name from the task name, so the matching is straightforward:
SQL Update ControlTable = veSQLForSQLUpdateControlTable
So in the SQL Task SQL Update ControlTable, the task will get the SQL from a variable named veSQLForSQLUpdateControlTable. Easy.
You can use a simple formula like this to build the name:
="veSQLFor"&SUBSTITUTE(CELL WITH TASKNAME," ","")
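For completeness, the same transformation sketched in Python, mirroring the Excel formula (prefix plus the task name with spaces removed):

```python
def ve_sql_var_name(task_name: str) -> str:
    """Python equivalent of ="veSQLFor"&SUBSTITUTE(task_name, " ", "")."""
    return "veSQLFor" + task_name.replace(" ", "")

print(ve_sql_var_name("SQL Update ControlTable"))  # veSQLForSQLUpdateControlTable
```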
DFT Fields & ROW_Count
These columns are optional; they hold the fields of a DFT task and any ROW_Count you need to log.
In data extraction you usually need to log the number of rows read/inserted/deleted etc., so keep the naming consistent.
Output Var & Connection output
The same as for the input ones: if the task has output connections, or it's updating a variable, this is where to track them.
In this way you can see the dataflow even before you have coded the dtsx, just by scrolling down the xls.
Hyperlink to file
This makes sense if you have read the previous post https://mamatucci.wordpress.com/2014/01/15/bi-consultants-the-hard-way-structuring-sql-and-ssis-solution/, where I advise writing the SQL in separate snippets kept in a separate SQL solution; you can use a hyperlink to open the SQL snippet straight from the xls and check its details.