A bonus of #MakeoverMonday is that more often than not, the data is nice, clean and ready to viz. A downside of this is that I have no real experience of data preparation at all. Nothing. All I have done is pivot a few fields and maybe written a calculation or two to parse a dodgy date format.

On week 51, we received a wide dataset with 18 columns of data relating to 9 different metrics (one “Actual”, and one “Target” for each metric). With my submission, I didn’t even Pivot the data, but arrived at a so-so solution (albeit it with the Crime and Injuries metrics misrepresented against plan).

When I recreated Lindsey Poulter‘s viz, I didn’t have to fiddle with the data, as Lindsey proved that you don’t need to work with long and tall data to deliver a terrific end result. However, when I looked to recreate David Krupp‘s submission, my lack of data prep experience (and, frankly, my lack of Excel on my MacBook) created a bit of an obstacle.

His data was pre-prepared in Excel, and kind of combined the wide original dataset with a pivoted equivalent:

screen-shot-2016-12-21-at-19-57-35

This meant that subsequent calculations could be tidier:

screen-shot-2016-12-21-at-19-59-18

So whilst David’s workbook does contain plenty of calculations, they can be shared across the metrics, which is more efficient than duplicating and editing calculations eight or nine times. I was bumbling about trying to create an equivalent only using Tableau, when I realised that I didn’t really know what I was doing. Then I saw that Ashish Chauhan had shared a post on how he tackled the same dataset.

This was really helpful for me, and I was especially keen to understand the calculations Ashish wrote in Tableau to arrange the data once it had been Pivoted. When I glanced at the calculations, I didn’t really “get” them, so I’ve dumbed them down to my way of thinking in order to help it stick in my mind, and hopefully it will help others too.

Step One. The first calc was as below. Note that after Pivoting, I called “Measure Names” Measure, and “Measure Values” Value:

screen-shot-2016-12-21-at-20-13-22

What is this doing?

screen-shot-2016-12-21-at-20-14-33

And in my dumbed-down terms? Well, it looks for ” (” as a sub-string within the Measure field, and truncates it when that string appears. What does this translate to?

Screen Shot 2016-12-21 at 20.17.33.png

See how the Measure for Crimes (per 1M passengers) has been reduced to Crimes? And how Customer Injury Rate (per 1M passengers) is now Customer Injury Rate? It has just chopped off everything to the final character before the space in the ” (” sub-string.

Good. I get this one. Time to move onto the second step. This calculation is:

screen-shot-2016-12-21-at-20-23-15

So in Noddy terms:

Take the “Clean Original Measure” calculation, and if it contains the word ” Target”, just replace that word (and the space before it) with nothing, otherwise just bring back whatever was in the original calculation.

And what does this amount to?

screen-shot-2016-12-21-at-20-23-03

See how the “New Measure” field effectively duplicates “Bus Fleet Reliability” and “Bus On-Time Performance” and “Crimes”? Why would we want that? I didn’t understand it initially, but it all becomes clear with calculation number three:

screen-shot-2016-12-21-at-20-26-40

Clink!

That is the sound of a penny dropping! By duplicating the Metric, Ashish then gives himself a chance to return to the original calc to denote whether the Row contains “Target” or “Actual” values. How does the end result look?

screen-shot-2016-12-21-at-20-29-44

So let’s walk through that to reinforce what has happened.

  1. Trim bracketed sub-string from original Measure string
  2. Remove “Target” from cleaned up string
  3. Write calc to denote whether the original Measure related to “Target” or “Actual” values

Once your data is structured like this, it’s pretty easy to then whack stuff onto the canvas and start your viz:

screen-shot-2016-12-21-at-20-44-05

The learning doesn’t end there either! Ashish shows the calc he used to refine the source date into something usable:

screen-shot-2016-12-21-at-20-36-18

So:

screen-shot-2016-12-21-at-20-38-21

Once the data type is set to Date (which is why the above is wrapped in DATE()), the Hours, Minutes and Seconds of the specified format are trimmed. I just wrote this, which is probably a consequence of my distinct lack of “coding” experience, but years of messing about on Excel:

screen-shot-2016-12-21-at-20-35-20

Same end result, just a different way of getting there.

Advertisements