We’re a quarter of the way through 2017’s #WorkoutWednesday challenges. For week 13, Andy Kriebel used the World Indicators data set to test out Benford’s Law. In comparison to some of the challenges this year, this wasn’t a major brain-acher, but I was absolutely reliant on Google for one calculation which would otherwise have meant I needed to look at Andy’s workbook for the answer. This blog post will explore some key components of the challenge.
First of all, we needed to write a calculation to plot the percentage of countries falling into each category. It was a simple enough one to write from scratch, or you could just use the in-built table calc wizard:
A trickier calculation related to the data item to put on Columns. We need to trim the leftmost digit from the Metric, which in itself is just a basic string calculation. There were, however, a couple of minor things to throw into the calc to get things working as required. Initially, my calculation looked like this:
Things looked broadly OK, until I started to look at the [Birth Rate] or [Infant Mortality Rate] metrics:
Hmm, why do I have a zero on the x-axis? Let’s look at the data itself:
I reordered things a bit to highlight the issue. CO2 emissions come through fine, and it’s because we’re dealing with whole numbers, whereas both Birth Rate and Infant Mortality Rate are all single digit percentages, so the leftmost value is always a zero. If I multiply these by 100, I should be back on track:
To address Benford’s Law itself, it’s Google-time. The wikipedia article linked at the top of the page is great, but when I saw this:
I had to have a sit down. When this happens, it’s time to take a punt:
And what is returned first of all? An online tutorial video on Tableau’s site looking at Benford’s Law! Superbly useful! Even better – the video is accompanied by a detailed blog post, so there’s simply no reason to fail to complete this #WorkoutWednesday challenge!
Watching the video revealed two key things to me. First, the calculation which untangles the equation above:
And a problem with my [First Number Trim] calculation:
I just need to convert the output of that calc to an Integer, which can either be done in the Benford’s Law calc:
Or at source:
Once this is all set up and you filter to exclude nulls and zeroes from the [First Number Trim], you’re ready to tackle the distribution band, which is achieved as shown in the video and the companion blog.
The problem with this approach, is that the lowest section has a blue tinge to it, so it doesn’t 100% match the challenge. Technically I’d failed the challenge elsewhere too, as I set the axis to fix from 0% to 50%, rather than allowing it to be dynamic. I preferred this approach as I felt you got a better sense of how each metric’s distribution fitted within a standardised scale.
However, whilst I could live with my design choice for the axis, I just couldn’t let the distribution band colour get in the way. I didn’t ultimately get the colours bang on, but I resolved the blue tint issue just by unticking some boxes:
A final calc was required to tidy up the metrics for inclusion in the title. The default metrics have the first letter of each word capitalised so you need to employ LOWER() to sort things out. It’s a bit weird to see GDP presented as “gdp”, so I made an exception in that instance:
It was another really useful challenge, especially for making me aware of Benford’s Law and its various applications. I work alongside a Fraud Team and it’s something that I’ll look to leverage ASAP, as it could make our methods of detecting potentially fraudulent behaviour more effective.