ETL Architecture Smart Examples & Snippets (3 of N) Job Monitor

After the pkg and related jobs have been deployed you’d like to keep an eye on their execution. In general you can define just a few queries to run against the PROD server and to check the current status of the job manually; this idea can be further developed defining a proper ETL Job Monitor define a dtsx to accomplish that in an automatic fahion and serving more user (ex BA would check Summary/ volume of data imported, Support more focused on error/warning checking)

SSRS could be an option, but sometime the logic itself can be more complicated;

For example let’s say you have 2 dbs on different server you want you use a merge join based on common fields to check if there is a match (ex some data imported on both)

It’s very likely in a ETL process that you might have a situation like this where the 2 sources have 1 or 2 column in common:

So in this case you can use an Outer Join to check if data is in sync/imported or not

Using nice colour-scheme code to have a quick overview (in the img the first 2 have no matches while the last 2 have not been imported properly…)

General Pkg structure

Let’s see the general pkg structure and its components: this is the main SEQC

We have the following:

1) A set of sql queries to validate the data on the sources/targets:

You can prototype them to check RowCount imported, sql job status, file pkgs (this is easy if you are using an etl logging framework to build your dtsx)

2) DFT tasks to implement db cross checking (as the previous example)

3) SCR tasks to generate the email contents.

This is an example of a report:

Let’s focus details the point 3) as this is the one that is independent (you know your queries😉 )

We have (simplifying):

SQL Get Report Support Mail Conf Configuration

MailBodyTemplate + MailSubjectTemplate

SCR Build ETL Monitor Report

SCR Send ETL Monitor Report

To support personalized email you can define SupportSubscribersMailConf table:

<PRE>

CREATE TABLE [dbo].[SupportSubscribersMailConf] (
[SubscriberMail] VARCHAR(255) NOT NULL, -- me@mail.com
[IsEnabled] BIT, -- t/f
[DIVSubscribed] text NULL

</PRE>

Note> The DIVSubscribed helps on creating personalized emails, in practice the mail you’re are going to send is a html with the sections contained in div elements, in this way you can simply set what the SubscriberMail (if IsEnabled) will see

Ex

SubscriberMail IsEnabled DIVSubscribed

mario.amatucci@mail.com 1 veSQLForSQLPackageRunSummary; veSQLForSQLPackageRunsWithErrorsOrWarnings;

jack.fox@mail.com 1 veSQLForSQLPackageRunSummary; veSQLForSQLPackageRowCounts;

Note> yes you might think of normalizing that using IDs and adding a new tDIVSubscribed, but I feel it’s a bit of over-complication in this scope

MailBodyTemplate + MailSubjectTemplate are 2 SSIS var to hold the meta sql used to build the actual mail to send

This is what I am using with the PackageRunSummary section example:

<PRE>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>3-col layout via CSS</title>
<style type="text/css">
         /* <![CDATA[ */ /* GENERIC STYLES */ body { margin:0; padding:0; font-family: Calibri; }
 .FirstRow { margin-top:1%;  padding-left:1%;
background-color: #EEF;
border-top: 1px solid gray;
  text-decoration:underline; font-weight:bold; }
            .container { width: 100%; display: table; border-top: none; border-bottom: 1px solid gray; }
            td { font-size: 10; text-align: left; }
            td * { display: inline; }
            th { text-align: left; }
            th * { display: inline; }
            td.LastReportInstanceStatusIDNormal{font-size:90%;}
            td.LastReportInstanceStatusIDSmall{font-size:70%;}
            td.LastReportInstanceStatusIDSmallItalic{font-size:70%; font-style:italic}
            td.LastReportInstanceStatusID2{ color:red; }
            td.LastReportInstanceStatusID0{ color:orange; }
            td.LastReportInstanceStatusID1{ color:green; }
                                  }
                        /* ]]> */
</style>
</head>
<body>
<div id="header">
	<p>
		 Dear <b>XXX ETL Support Team</b> (User::SubscriberMail),
	</p>
	<p>
		 Please find below a summary of the XXX ETL jobs run over the past 24 hours.
	</p>
	<p>
		 Please review and investigate any issues.
	</p>
	<p>
		 Regards, User::FromMail
	</p>
	<p>
		 ETL Job Status details <i>(BatchRunID =User::BatchRunID)</i>:
	</p>
	<span><a name="Summary" id="Summary"></a></span>
</div>

<div id="DIVveSQLForSQLPackageRunSummary" style="display:none">
	<div class="FirstRow">
	<a href="#Summary"> User::trveSQLForSQLPackageRunSummaryTitleHTML </a>
</div>
	<div id="TabletrveSQLForSQLPackageRunSummaryHTML" class="Table container">
	<table border="0">
	<thead>
	<tr>
		<th>
			 PackageName
		</th>
		<th>
			 RunsInLastDay
		</th>
		<th>
			 LatestStatus
		</th>
		<th>
			 TotalErrors
		</th>
		<th>
			 TotalWarnings
		</th>
	</tr>
	</thead>
	<tbody>
	 User::trveSQLForSQLPackageRunSummaryHTML
	</tbody>
	</table>
</div>
</div>

</PRE>

In SCR Build ETL Monitor Report the core function is:

<PRE>

// ParseRSVeSQLForSQLPackageRunSummary
ParseRSVeSQLForSQLXYZ("veSQLForSQLPackageRunSummary", ref currMessageBody, new object[] { "Normal", "Small", 3, null, "Normal", "Normal" });

// ParseRSVeSQLForSQLPackageRunsWithErrorsOrWarnings(ref currMessageBody);            ParseRSVeSQLForSQLXYZ("veSQLForSQLPackageRunsWithErrorsOrWarnings", ref currMessageBody, new object[] { "Normal", "Small", 3, null, "Small", "Small", "Normal", "Normal" });

</PRE>

The idea is to use some meta html and to feel with the query results, each section has the following structure (just replce XYZ with the specific section ID):

<PRE>

<div class="FirstRow">
	 User::trveSQLForSQLXYZTitleHTML
</div>

<div id="TabletrveSQLForSQLXYZHTML" class="Table container">
	<table border="0" width="85%">
	<thead>
	<tr>

		<th>
			Col1
		</th>
		<th>
			Col2
		</th>

	</tr>
	</thead>
	<tbody>
	 User::trveSQLForSQLXYZHTML
	</tbody>
	</table>
</div>

</PRE>

The colour coding comes from this simple rule:

It adds the contents of rsveSQLForSQLPackageRunSummary (populated by veSQLForSQLPackageRunSummary) and add css class as the following

  • Object[0] col css class in the html template Normal LastReportInstanceStatusIDNormal
  • Object[1] css class LastReportInstanceStatusIDSmall
  • Object[2]columns will get the value of the row[3] (at the moment 0,1 2 as status id ) and will use this to define the css class for the 3th column
  • Object[3] the column is not rendered as it holds the values used in the 2

Ex

ParseRSVeSQLForSQLXYZ(“veSQLForSQLPackageRunSummary”, ref currMessageBody, new object[] { “Normal”, “Small”, 3, null, “Normal”, “Normal” });

PackageName: Normal CSS class

RunsInLastDay: Small CSS class

LatestStatus: Use the value of column 3 (LatestStatusID)

LatestStatusID: Hidden because the values {0,1,2} are translated in colours and the LatestStaus text is displayed with that colour code

TotalErrors: Normal CSS class

TotalWarnings: Normal CSS class

Note> Some useful links to build the html:

http://htmledit.squarefree.com

http://www.w3schools.com/html/tryit.asp?filename=tryhtml_basic

http://www.digitalcoding.com/tools/html-beautifier.html

http://www.textfixer.com/tools/remove-line-breaks.php

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s