On #MakeoverMonday this week, we explored data breaches (not breeches!) since 2004. As ever, the link to the data is available via Andy Kriebel‘s blog, which now redirects you to the fancy new #MakeoverMonday site – you can access the datasets for each of the weekly challenges via the “Get Involved” tab.

My last post detailed my own submission, but this one instead focuses on an element of Andy’s own submission – and not even a part of his final viz. When I read his blog, my eye was taken by this simple chart:


Now it’s not the best thing in the world, but I liked it, and so wanted to recreate it. I’m a sucker for clean, simple charts and this works for me, and suits the corporate environment in which I work. Looks simple, doesn’t it? Hmm.

So I opened the dataset up into a new session on Tableau 10 and just focused on that image  from the blog without downloading the workbook. The title helps me out, and the Year is also a clue as to the basic structure of things. Let’s get to work:

Obviously Year goes on Rows, and the title explains that each square represents one breach. This means I don’t want any existing Measures on Columns as all we have there is the choice between Records Stolen and Records Lost. Why don’t I want them? Because they will spread out the squares as they aren’t uniform volumes of Records Lost or Stolen:


One mark per Year, why? No Entity on Detail. If I add that, this happens:


Better in the sense that I now have a Mark for each Entity where a breach occurred. However we still have that irregular horizontal distribution, because we’re plotting a variable measure rather than a uniform value. How to fix that? Good old INDEX():


OK. That’s not right, is it? What’s up? Well, I haven’t told the INDEX() function in which direction to count, so it has defaulted to Table (down) on Year:


So what do I need to do? Well I know that I want to count each breach per Entity and Year, so let’s tweak things accordingly:


Note how it has a sort of waterfall / gantt like appearance? Look also at the x-axis where somewhere between 220 and 240 marks are being plotted. The issue is that the table calculation isn’t being told to restart, so it’s kind of like a cumulative count of breaches which starts each new Year counting from the closing number of breaches of the prior Year. Easy to fix by just setting “Restarting every” to Year:


Progress! Still looks a bit crap, but I now have the right sort of content on the sheet. To get closer, I’m going to do a few things now:

  1. Reverse the y-axis to sort Year in Ascending order
  2. Colour by Records Stolen and add borders to each square
  3. Resize the Squares
  4. Remove the x-axis header and the y-axis title
  5. Remove gridlines, tick marks and axis rulers


Still more to do. Obvious issues are the fact that I’m missing every other Year on the y-axis, plus when I look at Andy’s submission I can see that he has obviously applied some Sorting in order to put the highest number of Records Stolen closest to the leftmost edge of each year. To remedy this, I made Year Discrete, and sorted Entity in descending order by Records Stolen:


That’s good enough! I had to fiddle with the sizing of the squares to get them looking this neat, but it ultimately looks comparable. Reverting Year to Discrete removes the spacing between the Rows, so it’s all neat and tidy now. It’s clear that by Sorting at an Entity level that it doesn’t quite work year by year, as is best shown in 2014 where the second square is lighter than the subsequent ones. That’s because AOL also had a big breach in 2004, and the sort is summing up all breaches, irrespective of year. I’m sure that can be resolved via calculations, but I can’t be bothered with that for now.

I then downloaded Andy’s workbook to see if his approach was the same. It wasn’t. The majority was, but Andy used a Gantt sized by a calculated field to alleviate the whole square sizing conundrum. He literally used an AVG(1) calculation to size the Gantt. I copied the process but just used a calculation of 1 with no aggregation, and it seems to work the same:


So there you go. Seems like there is always more than one way of doing things in Tableau, and you can learn new things (at least at my stage of development) with every single workbook you download.