ETL Without the T: A Loop-Driven, Data-About-the-Dataflow Approach Using SSIS

Microsoft SQL Server Integration Services (SSIS) is designed for ETL (extract, transform, load). However, sometimes the T in ETL isn’t necessary: data simply needs to be moved without manipulation. While defining a simple source-to-destination dataflow is easy, filling a package with the numerous copy operations required to move data between multiple source-destination pairs results in a verbose package that violates the Don’t Repeat Yourself (DRY) principle.

Using a loop to apply a single copy-data dataflow to a list of source-destination pairs eliminates this drawback. With this approach, the main dataflow in the SSIS package is not the data being moved; rather, it is data about that data: the list of sources and destinations to which the extract-and-load operation should be applied.
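The pattern itself is not specific to SSIS, so here is a minimal Ruby sketch of the idea, assuming in-memory hashes stand in for the source and destination databases. The table names, `COPY_JOBS` and `copy_table` are hypothetical illustrations, not part of any SSIS API. The point is that the copy logic is written once and the list of pairs drives it, much as the loop drives the single dataflow in the package.

```ruby
# Hypothetical in-memory stand-ins for a source and a destination database:
# each maps a table name to an array of row hashes.
SOURCE_DB = {
  "customers" => [{ id: 1, name: "Ann" }, { id: 2, name: "Bo" }],
  "orders"    => [{ id: 10, customer_id: 1 }]
}
DEST_DB = Hash.new { |db, table| db[table] = [] }

# The "data about the dataflow": which source table feeds which destination.
COPY_JOBS = [
  { source: "customers", destination: "stage_customers" },
  { source: "orders",    destination: "stage_orders" }
]

# The single extract-and-load routine, written once. There is no transform
# step: rows are loaded exactly as they were read.
def copy_table(source_db, source_table, dest_db, dest_table)
  rows = source_db.fetch(source_table)         # extract
  dest_db[dest_table].concat(rows.map(&:dup))  # load
  rows.size
end

# The loop applies the one copy routine to every source-destination pair.
COPY_JOBS.each do |job|
  count = copy_table(SOURCE_DB, job[:source], DEST_DB, job[:destination])
  puts "#{job[:source]} -> #{job[:destination]}: #{count} row(s)"
end
```

With this shape, moving another table means adding one entry to `COPY_JOBS` rather than another copy operation.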

SQL Tip: Breaking into Batches

Internet search results are typically broken into batches. The initial page displays results 1 through perhaps 10, page two shows matches 11–20 and so on. Splitting results up like this is known as (no surprise) paging.

To implement results paging with Microsoft SQL Server (version 2012 or later), try extending your query’s ORDER BY clause with OFFSET and FETCH. OFFSET tells SQL Server how many rows of the sorted result set to skip before it starts returning rows; FETCH defines how many rows to return from that point.
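The only arithmetic involved is turning a 1-based page number into a row offset: page n skips (n - 1) × page size rows. The Ruby sketch below shows that calculation, builds the corresponding T-SQL text, and mimics OFFSET/FETCH on an already-sorted array. The table `dbo.SearchResults` and its columns are hypothetical; in real code, pass the offset and page size as query parameters rather than interpolating them into the SQL string.

```ruby
# Rows to skip for a 1-based page number.
def page_offset(page, page_size)
  raise ArgumentError, "page must be >= 1" if page < 1
  (page - 1) * page_size
end

# T-SQL for one page (SQL Server 2012+). Table and column names are hypothetical.
def paged_query(page, page_size)
  <<~SQL
    SELECT id, title
    FROM dbo.SearchResults
    ORDER BY id
    OFFSET #{page_offset(page, page_size)} ROWS
    FETCH NEXT #{page_size} ROWS ONLY;
  SQL
end

# The same skip-then-take logic applied to an already-ordered Ruby array.
def page_of(rows, page, page_size)
  rows.drop(page_offset(page, page_size)).first(page_size)
end

results = (1..25).to_a
p page_of(results, 1, 10)  # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p page_of(results, 3, 10)  # => [21, 22, 23, 24, 25]
puts paged_query(2, 10)
```

Note that OFFSET and FETCH only have a well-defined meaning against a stable sort order, which is why they hang off ORDER BY.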

SQL Tip: Adjusting Alphabetization

Alphabetical sorting is simple: A comes first, then B, next comes C, then D and so on all the way to Z…right?!

If only it were so easy! Consider the character Á: should it be sorted before A, after A or as if it were A (ignoring the accent)? The answer depends on language: in Hungarian, Á comes after A; in Spanish, the accent should be ignored. In many languages, Y comes between X and Z; however, Lithuanian orders Y before J. French has an interesting twist: sometimes strings are sorted by first comparing their letters from left to right, then by comparing diacritics from right to left. Some languages even treat certain multi-character combinations as single letters, like CH and DD in Welsh. If this doesn’t sound complex enough, alphabetization rules can change: for example, in 1994, the Congress of the Association of Spanish Language Academies decreed that Spanish’s CH would no longer be alphabetized as a single letter!
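To see why "the" alphabetical order does not exist, compare a plain byte-wise sort with two rule-based ones. The Ruby sketch below is illustrative only: `accent_insensitive_key`, `custom_key` and the `CUSTOM_ORDER` alphabet are hypothetical helpers, and `CUSTOM_ORDER` is a simplified ordering loosely modeled on the Lithuanian rule above, not a real collation. Databases such as SQL Server handle this properly through collations; the sketch just shows that each rule is a different sort key.

```ruby
# Decompose accented characters (Á -> A + combining acute) and drop the
# combining marks, so accents are ignored, as in the Spanish rule above.
def accent_insensitive_key(word)
  word.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

words = ["Zaragoza", "Ávila", "Almería"]

# Ruby's default String comparison is byte-wise, so Á (UTF-8 0xC3 0x81)
# sorts after every unaccented ASCII letter, including Z.
p words.sort                                           # => ["Almería", "Zaragoza", "Ávila"]
p words.sort_by { |w| accent_insensitive_key(w) }      # => ["Almería", "Ávila", "Zaragoza"]

# Hypothetical simplified alphabet with Y moved between I and J.
CUSTOM_ORDER = "ABCDEFGHIYJKLMNOPQRSTUVWXZ".chars

# Map each letter to its position; unknown characters sort last.
def custom_key(word)
  word.upcase.chars.map { |c| CUSTOM_ORDER.index(c) || CUSTOM_ORDER.size }
end

p %w[JA YA IA].sort                                    # => ["IA", "JA", "YA"]
p %w[JA YA IA].sort_by { |w| custom_key(w) }           # => ["IA", "YA", "JA"]
```

Real collations layer many more rules (multi-character letters, secondary comparison of diacritics, case), which is exactly why relying on a built-in, language-aware collation beats hand-rolled keys.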

Knowledge, Reality and Reconciling the Difference (Part 1)

Values often need to be tracked as they change over time. Imagine we’re building a payroll system. A foundational requirement would be to store an employee’s pay rate. In all but the most basic of systems, saving this rate as a simple value is insufficient (“Joe’s wage = $20/hour”); rather, a history of the rate over time needs to be maintained (“Joe’s wage = $15/hour as of 10/15/13; Joe’s wage = $20/hour as of 2/1/14”). We do this by giving pay rate a time dimension.
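Giving pay rate a time dimension means storing one row per change and asking which row was in effect on a given date. Here is a minimal Ruby sketch, assuming the dates above are US month/day/year (10/15/13 is October 15, 2013). `PayRate`, `HISTORY` and `rate_on` are hypothetical names chosen for illustration.

```ruby
require "date"

# One row per change: the hourly rate in effect from effective_on onward.
PayRate = Struct.new(:employee, :hourly, :effective_on)

HISTORY = [
  PayRate.new("Joe", 15, Date.new(2013, 10, 15)),
  PayRate.new("Joe", 20, Date.new(2014, 2, 1))
]

# The rate in effect on a date is the most recent change on or before it.
def rate_on(history, employee, date)
  candidates = history.select { |r| r.employee == employee && r.effective_on <= date }
  latest = candidates.max_by(&:effective_on)
  latest && latest.hourly
end

p rate_on(HISTORY, "Joe", Date.new(2013, 12, 1))  # => 15
p rate_on(HISTORY, "Joe", Date.new(2014, 3, 1))   # => 20
p rate_on(HISTORY, "Joe", Date.new(2013, 1, 1))   # => nil (no rate recorded yet)
```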

What our system knows about a given moment in time and what really was at that same moment can differ. Suppose Joe’s manager signs paperwork on January 20 giving Joe a pay raise but leaves for vacation before turning in the paperwork. At close of business on January 20, the payroll system knows Joe’s rate to be one amount when, in reality, it’s a different amount. Joe has been given a raise but the system doesn’t yet know about it.
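One common way to capture this gap is to give each fact two dates: when it took effect in reality and when the system recorded it (often called a bitemporal model). The Ruby sketch below is a minimal illustration under stated assumptions: the amounts ($20 raised to $22), the year 2015 and the January 27 recording date are all hypothetical, as are `RateFact`, `FACTS` and `rate_as_known`.

```ruby
require "date"

# Each fact carries two time dimensions:
#   effective_on - when the rate took effect in reality
#   recorded_on  - when the payroll system learned about it
RateFact = Struct.new(:hourly, :effective_on, :recorded_on)

# Hypothetical figures: a raise from $20 to $22 effective January 20, 2015,
# not entered until January 27 because the paperwork was delayed.
FACTS = [
  RateFact.new(20, Date.new(2014, 2, 1), Date.new(2014, 2, 1)),
  RateFact.new(22, Date.new(2015, 1, 20), Date.new(2015, 1, 27))
]

# Rate in effect on valid_date, according to what was known on known_date.
# Ties on effective_on (corrections) go to the most recently recorded fact.
def rate_as_known(facts, valid_date, known_date)
  known = facts.select { |f| f.recorded_on <= known_date && f.effective_on <= valid_date }
  latest = known.max_by { |f| [f.effective_on, f.recorded_on] }
  latest && latest.hourly
end

jan20 = Date.new(2015, 1, 20)
p rate_as_known(FACTS, jan20, jan20)                  # => 20 (what the system knew on Jan 20)
p rate_as_known(FACTS, jan20, Date.new(2015, 1, 27))  # => 22 (reality, once recorded)
```

Asking the same question about January 20 with two different "known as of" dates is what separates the system's knowledge from reality.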

Quick Tip: Setting Variables Inside Array & Hash Literals

Did you know that you can assign variables inside Ruby’s array and hash literals?

While the values inside square-bracket array literals are typically hard-coded, provided by a variable, computed from an inline expression or generated by a method call, any valid Ruby expression can be used, including an assignment. The same is true for both the keys and the values in hash literal key => value pairs.
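Because an assignment in Ruby is itself an expression that returns the assigned value, it can sit anywhere a value can. The short example below shows both cases; the variable names are arbitrary. Elements are evaluated left to right, so a later element can use a variable assigned by an earlier one. In the hash literal, parentheses around each assignment keep the parse unambiguous.

```ruby
# Each assignment evaluates to the assigned value, so it can be an element;
# later elements can already use variables assigned by earlier ones.
values = [first = 10, second = first * 2, first + second]
p values           # => [10, 20, 30]
p [first, second]  # => [10, 20]

# Keys and values in a hash literal can be assignments too.
rates = { (name = "Joe") => (rate = 20) }
p rates["Joe"]     # => 20
p name             # => "Joe"
p rate             # => 20
```

Use this sparingly: it is handy for capturing an intermediate value while building a literal, but it can make code harder to scan.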

RSpec: be true & be_true Are Different!

RSpec’s be true and be_true look deceptively similar. In fact, their naming suggests that they might be synonyms for the same assertion. Can the two clauses be used interchangeably to produce the same effect? No!

Both check trueness, but from different perspectives. be true checks whether the value is the true object itself: “Does the value literally equal true?” be_true asserts that the value evaluates to true in a boolean context: “Is the value truthy?” be false and be_false are the respective inverse matchers: be false passes only for false, while be_false also passes for nil.
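The difference can be shown without RSpec at all. The plain Ruby predicates below mirror the two checks described above; they are illustrations, not RSpec’s own implementation, and `literally_true?` and `truthy?` are hypothetical names. In Ruby, only `nil` and `false` are falsy, so values like `1`, `"no"` and `[]` are truthy without being `true`. (RSpec 3 later renamed `be_true`/`be_false` to `be_truthy`/`be_falsey`; `be true` is unchanged.)

```ruby
# Mirrors `be true`: passes only for the true object itself (identity check).
def literally_true?(value)
  value.equal?(true)
end

# Mirrors RSpec 2's `be_true`: passes for anything except nil and false.
def truthy?(value)
  !!value
end

# In a spec (RSpec 2 syntax), for comparison:
#   expect(1).to be_true   # passes: 1 is truthy
#   expect(1).to be true   # fails:  1 is not the object true

[true, 1, "no", [], false, nil].each do |v|
  puts format("%-6s be true: %-5s be_true: %s", v.inspect, literally_true?(v), truthy?(v))
end
```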