In my last post, I detailed my approach to resolving the eighth #PreppinData challenge of the year. This entailed a bit of filtering and branching to separate the Theft and Stock Adjusted elements, before bringing everything together via a join. It worked, but was it the “best” method?

After submitting my workflow to the world of Twitter, I searched through the #PreppinData hashtag to see how others had tackled the challenge, and as usual there were a number of different approaches.

By far the most efficient workflow I saw was the solution that Tableau Zen Master Rosario Gauna proposed:

[Screenshot: Rosario's workflow]
It looks like a couple of initial cleaning steps akin to the ones I carried out on each table, but instead of filtering, branching and joining, Rosario overcame this week's challenge with a single join and aggregation.

A key thing to remember with aggregation is that you can use it to your advantage to ignore nulls (as articulated by Joshua Milligan here). So if you create the right combination of calculations, you can subsequently aggregate things to suppress the null values, much like this quartet of calcs:

[Screenshot: the four null-suppressing calculated fields]
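The idea behind those calcs can be sketched in pandas. This is a minimal illustration, not Rosario's actual fields: apart from [Date of Stock Adjusted], which the post names, the column names (Action, Date, Number of Items) and values here are assumptions.

```python
import pandas as pd

# Two rows for the same crime reference: one Theft action, one Stock Adjusted.
rows = pd.DataFrame({
    "Crime Ref Number": ["C1", "C1"],
    "Action": ["Theft", "Stock Adjusted"],
    "Date": pd.to_datetime(["2019-01-03", "2019-01-10"]),
    "Number of Items": [5, 5],
})

# Each calc keeps the value only for its matching Action and returns null
# otherwise, mirroring an IF [Action] = 'Theft' THEN [Date] END style calc.
rows["Date of Theft"] = rows["Date"].where(rows["Action"] == "Theft")
rows["Date of Stock Adjusted"] = rows["Date"].where(rows["Action"] == "Stock Adjusted")
rows["Items Stolen"] = rows["Number of Items"].where(rows["Action"] == "Theft")
rows["Items Adjusted"] = rows["Number of Items"].where(rows["Action"] == "Stock Adjusted")
```

Each new column is half nulls by design; the aggregation step later squeezes those nulls out.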

Null values are the inevitable output of conditional calculations where the unmatched rows are left as null:

[Screenshot: the Date of Stock Adjusted calculation]

Where the Action is not ‘Stock Adjusted’ above, the [Date of Stock Adjusted] calc returns a null. That’s fine as it will be addressed at the aggregation stage.

Having created those calcs and joined to the Branch ID table, it’s time to perform the aggregation. The workflow up until this point looks like this:

[Screenshot: the workflow up to the aggregation]

When defining the aggregation, we know that the key grouping fields are Branch Name, Crime Ref Number and Type, and that we'll be aggregating our four calculated fields. When aggregating the dates, you can choose either MIN or MAX; with this week's data set that makes little practical difference, as there appears to be a single row per Theft and per Stock Adjusted action.

[Screenshot: the Aggregate step configuration]
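The null-suppressing behaviour of the aggregation can be sketched like this, again with assumed field names and values. As in Prep, pandas' MAX skips nulls, so the two half-empty rows per crime reference collapse into one complete row.

```python
import pandas as pd

# Pre-aggregation state: one row per action, each with nulls in the
# columns that belong to the other action.
rows = pd.DataFrame({
    "Branch Name": ["Leeds", "Leeds"],
    "Crime Ref Number": ["C1", "C1"],
    "Type": ["Bike", "Bike"],
    "Date of Theft": pd.to_datetime(["2019-01-03", None]),
    "Date of Stock Adjusted": pd.to_datetime([None, "2019-01-10"]),
})

# Group on the key fields and take MAX: nulls are ignored, so each
# group keeps the one non-null value from either row.
result = (
    rows.groupby(["Branch Name", "Crime Ref Number", "Type"], as_index=False)
        .max()
)
```

After the groupby, `result` holds a single row per crime reference with both dates populated, which is exactly what the Aggregate step achieves.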

Once this aggregation has been done, it’s time to perform the final two calculations:

[Screenshot: the final two calculations]
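The post doesn't reproduce the final two calculations, so as a purely hypothetical illustration, one plausible calc at this stage is a date difference between the two now-populated date fields:

```python
import pandas as pd

# Hypothetical final calc: days between theft and stock adjustment.
# The field name "Days to Adjust Stock" is an assumption, not from the post.
row = pd.DataFrame({
    "Date of Theft": pd.to_datetime(["2019-01-03"]),
    "Date of Stock Adjusted": pd.to_datetime(["2019-01-10"]),
})
row["Days to Adjust Stock"] = (
    row["Date of Stock Adjusted"] - row["Date of Theft"]
).dt.days
```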

And that is that. Rosario’s elegant solution avoids the need for pivoting and extra joins by working with the data's existing layout. I much prefer it to the route I took, though both solutions arrive at the correct end point, so neither should be deemed less effective.

Through reviewing Rosario’s approach I learned a lot about data prep in Prep Builder, and there’s still at least one more solution to the week eight challenge – data pivoting.