We’re a quarter of the way through 2017’s #WorkoutWednesday challenges. For week 13, Andy Kriebel used the World Indicators data set to test out Benford’s Law. In comparison to some of the challenges this year, this wasn’t a major brain-acher, but I was absolutely reliant on Google for one calculation which would otherwise have meant I needed to look at Andy’s workbook for the answer. This blog post will explore some key components of the challenge.

First of all, we needed to write a calculation to plot the percentage of countries falling into each category. It was a simple enough one to write from scratch, or you could just use the in-built table calc wizard:

Screen Shot 2017-04-01 at 12.43.33.png

A trickier calculation related to the data item to put on Columns. We need to trim the leftmost digit from the Metric, which in itself is just a basic string calculation. There were, however, a couple of minor things to throw into the calc to get things working as required. Initially, my calculation looked like this:

Screen Shot 2017-04-01 at 12.47.45.png

Things looked broadly OK, until I started to look at the [Birth Rate] or [Infant Mortality Rate] metrics:

Screen Shot 2017-04-01 at 12.48.58.png

Hmm, why do I have a zero on the x-axis? Let’s look at the data itself:

Screen Shot 2017-04-01 at 12.51.55.png

I reordered things a bit to highlight the issue. CO2 emissions come through fine, and it’s because we’re dealing with whole numbers, whereas both Birth Rate and Infant Mortality Rate are all single digit percentages, so the leftmost value is always a zero. If I multiply these by 100, I should be back on track:

Screen Shot 2017-04-01 at 12.53.40.png

Screen Shot 2017-04-01 at 12.54.07.png

B-I-N-G-O.

To address Benford’s Law itself, it’s Google-time. The wikipedia article linked at the top of the page is great, but when I saw this:

Screen Shot 2017-04-01 at 12.56.20.png

I had to have a sit down. When this happens, it’s time to take a punt:

Screen Shot 2017-04-01 at 12.57.26

And what is returned first of all? An online tutorial video on Tableau’s site looking at Benford’s Law! Superbly useful! Even better – the video is accompanied by a detailed blog post, so there’s simply no reason to fail to complete this #WorkoutWednesday challenge!

Watching the video revealed two key things to me. First, the calculation which untangles the equation above:

Screen Shot 2017-04-01 at 13.04.05

And a problem with my [First Number Trim] calculation:

Screen Shot 2017-04-01 at 13.04.16

I just need to convert the output of that calc to an Integer, which can either be done in the Benford’s Law calc:

Screen Shot 2017-04-01 at 13.04.40.png

Or at source:

Screen Shot 2017-04-01 at 13.05.08.png

Once this is all set up and you filter to exclude nulls and zeroes from the [First Number Trim], you’re ready to tackle the distribution band, which is achieved as shown in the video and the companion blog.

Screen Shot 2017-04-01 at 13.10.08

The problem with this approach, is that the lowest section has a blue tinge to it, so it doesn’t 100% match the challenge. Technically I’d failed the challenge elsewhere too, as I set the axis to fix from 0% to 50%, rather than allowing it to be dynamic. I preferred this approach as I felt you got a better sense of how each metric’s distribution fitted within a standardised scale.

However, whilst I could live with my design choice for the axis, I just couldn’t let the distribution band colour get in the way. I didn’t ultimately get the colours bang on, but I resolved the blue tint issue just by unticking some boxes:

Screen Shot 2017-04-01 at 13.27.25.png

A final calc was required to tidy up the metrics for inclusion in the title. The default metrics have the first letter of each word capitalised so you need to employ LOWER() to sort things out. It’s a bit weird to see GDP presented as “gdp”, so I made an exception in that instance:

Screen Shot 2017-04-01 at 13.21.14

It was another really useful challenge, especially for making me aware of Benford’s Law and its various applications. I work alongside a Fraud Team and it’s something that I’ll look to leverage ASAP, as it could make our methods of detecting potentially fraudulent behaviour more effective.

Screen Shot 2017-04-01 at 13.31.18

Advertisements