It has been a busy few months for me, so my #MakeoverMonday submissions have been somewhat hurried. A case in point was my week 46 effort. In contrast, there were several excellent alternatives created across the community, and one that I really wanted to break down was Jeremy Poole‘s. I loved a lot of things about it and wasn’t sure how he’d created it, so it was time to download the workbook and learn a bunch of stuff.

screen-shot-2016-11-25-at-19-06-59

Aesthetically it’s impactful, but the things I wanted to explore were:

  1. How Jeremy managed to normalise the “chart width” for songs with differing word counts. ‘The Message’ has twice as many words used as ‘Respect’, but the Viz manages to even that distribution out
  2. The colour coding. I really like how the first use of each word is highlighted. How has Jeremy achieved that?
  3. In general, how has this graphic equaliser sort of style been squeezed into Tableau?

As ever, I start with the bits that I can figure out, and then I look under the hood at the bits I struggle with. What can I do? Everything except the chart! Combining the Title and Artist can be done a few ways. I could create a Combined Field:

screen-shot-2016-11-25-at-19-14-00

The initial output might not be perfect:

screen-shot-2016-11-25-at-19-14-58

But you can just edit a Combined field to reorder the fields, and change the separator:

screen-shot-2016-11-25-at-19-16-10

You literally just drag the Song Title to switch it with the Artist, and then amend the “Separate members by” box to choose the type of separation you want – and it’s immediately previewed to the right.

screen-shot-2016-11-25-at-19-17-38

In Jeremy’s case, he can’t have done this, because you’ll note that he clearly identifies the Song Title by wrapping it in single quotes. It’s a small detail, but it has quite an impact. Achieving that is simple and just a case of writing a quick calculation:

screen-shot-2016-11-25-at-19-23-12

As a guy entrenched in a world of Excel for 20 years, I’m used to a CONCATENATE syntax, but this is pretty much the same gist, so it was easy enough to do. Job one done, now let’s tackle the next two columns.

First up is the “# of Words”. This looks like just being SUM(Number of Records), so that’s all I do. I use a Mac so Alt-drag the Number of Records field to Rows, aggregate it as a SUM and Right-click the resultant Green Pill to change it to Discrete. Then I just change the name of the field to mirror the original:

screen-shot-2016-11-25-at-19-29-04

Two out of the three bits that I think I know. Time for the ‘% Unique’ calculation. This is just to show the percentage of all distinct words in the whole song, so it appears to be an easy one:

screen-shot-2016-11-25-at-19-31-59

Once done, I can just Convert to Discrete, to enable me to drag it to Rows without plotting a chart of some type. Looks like I’m on track?

screen-shot-2016-11-25-at-19-33-17

This is where the learning really started. I had no idea where to turn next as I just couldn’t tell what I needed to shove on Columns, nor what Measure(s) needed to be plotted. I knew from the Dashboard that some sort of Parameter was in play to allow you to compare the Word Count of songs, or to “normalise” the Word Count to achieve a uniform width for the charts, but that was it.

Time to unhide the worksheet and tick things off, one by one. First – Columns. Jeremy has created a calculation to drive a Parameter, which says:

screen-shot-2016-11-25-at-19-43-00

So if the Parameter says one thing, do this, otherwise do that. Rather than write the calc, I just dragged Word Count to Columns to see what would happen:

screen-shot-2016-11-25-at-19-44-45

O……kay. No idea what this is all about. First – I know that Mick hasn’t blurted out 33,153 words in this song, and second – I see a Green Pill and therefore a chart is plotted. Not really sure what I’m doing, so I’ll just change it to Discrete, as that’s just something I randomly do. In Jeremy’s workbook, he has Word Count as a Dimension. I could do that, but let’s see if my usual blundering works.

Once I have one part of the calculation physically hard-coded into the view, I look at the Measures plotted.

screen-shot-2016-11-25-at-19-48-13

O……kay. So I know that Measure Values means that Jeremy has plotted at least two Measures. I can’t even guess what one of them is, so I take a look:

screen-shot-2016-11-25-at-19-50-16

Things begin to make sense. It’s obviously the same calc, but with one where the values are inverted. When you look at the “graphic equaliser” chart, that figures because the y-axis is essentially mirrored. Time to see how clearly Jeremy names his calcs. “# of Uses in Song” sounds to me like the overall number of times that a particular word is used in a song. How would I calculate that?

screen-shot-2016-11-25-at-19-54-08

Level of Detail calcs are cool. I don’t fully “get” them, but I know that in this case I have to fix on the two fields to the left of the semi-colon, or else I’ll end up summing all instances of either the same word across all of the artists’ songs, or the same word across all songs by all artists. If I pop this measure on the view, this happens:

screen-shot-2016-11-25-at-19-57-52

This is not going well! I set the drop-down menu at the top to Fit Width so I could see everything across each song, and it’s clear that just one “thing” is plotted for each song. Why is that? Well look at the level of detail over on the left – there’s nothing there. All Tableau is doing is plotting the number of different words used in each song. To get a whole load of stuff plotted, I need to put “Word” on Detail:

Screen Shot 2016-11-25 at 20.01.22.pngOh crap! Now what’s up? You can see some tiny little bars and that’s OK, that just needs me to set Independent axis ranges for each row or column. My main issue here is why are things looking disaggregated? Why has this Rolling Stones classic suddenly got more than one Row?! That must mean one of my earlier calculations is screwy and is now biting me on the arse. My course of action in these cases is always to panic and then plagiarise. How did Jeremy avoid these issues? Let’s look at his calcs on the Row shelf.

screen-shot-2016-11-25-at-20-07-47

Unlike my hastily assembled SUM(Number of Records), Mr. Poole has used a LoD calc to focus the calc on specific data intersections. Only Sum up the Number of Records where it is  this specific combination of Song Title and Artist. Alright, I’ll copy that. When created, it is saved by default as a Green Measure. I could Convert to Dimension like Jeremy (which immediately makes it Discrete), or I could leave it as a Measure and just convert to Discrete there. In the spirit of discovery, I’m doing the latter just to see if it causes an issue further down the line:

screen-shot-2016-11-25-at-20-12-39

No real improvement, although I’ve tweaked the axes to make them independent to make myself feel better. Not a problem, if I know that calc is fixed, then it must either be that the other calc is screwed, or it really does matter if things are regarded as Discrete Measures rather than Discrete Dimensions? Right? What does the “proper” “% Unique” calculation look like?:

screen-shot-2016-11-25-at-20-14-26

Once more, LoD comes into play, whereas my sloppy iteration omitted to be so precise. Will remedying this bring a smile to my face?

screen-shot-2016-11-25-at-20-20-45

Well, there’s a sort of smile, but also a frown. It’s different, but is it right? To humour myself, I thought I’d add the second, inverted Measure to try to get the “graphic equaliser” aesthete to work:

screen-shot-2016-11-25-at-20-37-59

As you’d expect, I’ve just mirrored things. It wasn’t the silver bullet to make everything alright. A period of trying all sorts of things randomly ensued, but a step back and a beer made me think that I’d instead just try to see what the impact of Jeremy’s use of “Word Count” as a Dimension would have. I made the conversion, and we have half a “bingo” moment:

screen-shot-2016-11-25-at-20-39-47

Granted, it’s not the final product, but a bit of sorting and formatting can clean things up pretty quickly:

screen-shot-2016-11-25-at-20-43-19

Phew! We’re finally getting somewhere! Two things for me to focus on now. First of all, it’s  all about colouring the chart. The original highlighted the first usage of a word in pink, and subsequent usage of that same word in teal. I want a piece of that action in this recreation too. Sounds easy, was and wasn’t, and the solution is really neat.

To cut a long story short, Jeremy whacked a boolean calc on Colour:

screen-shot-2016-11-25-at-20-46-47

What is this “Earliest Appearance” calc?

screen-shot-2016-11-25-at-20-47-52

OK. So for each Song / Artist and Word combo, it tags the first instance of the word appearing. That means that the prior calc is just saying, in a roundabout way “Is this the first time it has been sung, or not?”

Screen Shot 2016-11-25 at 20.51.47.png

Final step is to understand how the original Viz manages to “normalise” the distribution of the words. Wa-ay back up the blog is the “Measure to Use” calc that Jeremy wrote, and it’s time to look in more detail at one of the component parts of that calculation:

screen-shot-2016-11-25-at-20-54-41

What’s this doing? Dunno. I looked at the data itself and at the calc a bunch of times, and the sound of air whooshing over my head was deafening. It’s just dividing Word Count by the total number of Words used in each Song Tile / Artist combination, but I don’t understand why it standardises the width of the charts in this case. If a kind soul could relieve me of this one remaining uncertainty, it would be much appreciated. If replicated, it distributed things beautifully, I just have no inkling why that proves to be the case:

screen-shot-2016-11-25-at-20-59-32

Jesus! What a journey this was! Thanks to Jeremy for posting such an interesting Viz, and thanks to Tableau for making a product that allows me to learn / get confused and enjoy myself in relatively equal quantities.

EDIT!

Jeremy himself has stepped in to answer this final conundrum:

screen-shot-2016-11-26-at-20-05-16

It makes complete sense. If you’re plotting the whole song and dividing each individual word position by the total words in the song, you simply end up as a percentage in the range 0-100 – thereby normalising the position and creating a uniform x-axis.

Advertisements