This week, Tableau eager beavers were set the task of replicating this chart from Emma Whyte. As is often the case, I found some things pretty hard to resolve, I had to download the workbook to understand a really early step, but I also found one aspect pleasingly simple.

If you want to do things properly and efficiently, read Andy Kriebel’s blog post here. If you want to read another learning journey with Tableau – keep reading! Emma’s blog post accompanying the challenge included a tick-list of stuff to tackle, and I’ll work through it in order.

You will have to use a Union in Tableau to get the data in to 1 table

Sounded like it might be vaguely fiddly, but was a walk in the park. When you load the data, you see this array of uniformly structured sheets (and you need uniformity to Union, else your field contents won’t be aligned):

screen-shot-2017-01-25-at-20-03-26

You can click the “New Union” icon at the foot of the list to bring up a sort of wizard to guide you through the process, but I just shift-selected from 2006 to 2016 and dumped the lot into the data section. I right-clicked to edit Union to see how it looked, and it’s exactly the same as if you’d gone through the shiny UI of the wizard:

screen-shot-2017-01-25-at-20-05-32

I pivoted the data to get it in to an easier to use format

Down to earth with a bump here! I thought I understood pivoting (still do!), but I just could not get the right pivot to allow me to tackle the chart itself. I had all sorts of combinations:

screen-shot-2017-01-25-at-20-09-34

screen-shot-2017-01-25-at-20-10-42

Believe me – I could add plenty more. Ultimately, when I fired a sheet up, I ended up needing to convert Pivot Field Values to a Measure, and then this happened:

screen-shot-2017-01-25-at-20-11-41

I’m not a patient man, so I downloaded Emma’s workbook at this early stage, just to see what shape her data was in.

screen-shot-2017-01-25-at-20-14-34

OK. Looks kind of like my last effort above is pretty similar? All I can see is that my metric value equivalent was still a string data type, whilst Emma is using either a decimal or Whole number. It’s green as it’s Continuous – I can’t set that in this window, but I can when I’m in a sheet.

So all I do is set the Pivot Field Values data type to Number (decimal), flip into the sheet, convert that field to Continuous, and then convert to Measure. I’m good to go, and just assume that Tableau is obliged to force a distinct count onto “Values” that it reads as strings – but that’s just a guess.

I filtered the quarterbacks to those that have played at least 3 seasons and those that have passed for at least 2000 yards in total

Sounds like a couple of booleans. First up, only those with at least 3 seasons of data. Emma doesn’t state whether she’s talking complete or incomplete seasons, so I looked at the data to see if there was any indication of a game count per season field:

screen-shot-2017-01-25-at-20-24-08

American sports are not my thing. Nothing jumps out but this dude Carson has racked up 22,121 Sheet – which is the sum of years in which Carson has thrown a ball about. I thought I’d deal with the first filter criteria by just stating that SUM(Sheet)>6,020. That’s because the first year of data is 2006, and 2006+2007+2008 is 6,021, so by only picking up players with at least 6,021 in this calc, I’d get those who have played at least three seasons. Even if the player only played in the last three seasons of data (2014-2016), the same basic logic works.

In practise, however, I couldn’t get that to work, but that’s OK as there’s always at least one other way. Here’s one:

Screen Shot 2017-01-25 at 20.41.58.png

So the filter on Metric Name focuses on Sheet, and I could Group players where the SUM(Number of Records) is >=3. If I then edit the created group by right-clicking, I can select to “Include Others” in their own group:

screen-shot-2017-01-25-at-20-44-55

Doing that, results in this:

screen-shot-2017-01-25-at-20-45-03

You can then pop that Group onto filters, edit it and uncheck “Others” to only include the relevant players.

screen-shot-2017-01-25-at-20-47-43

If you then put Player on Rows, you can see all the players and the number of seasons they featured in:

Screen Shot 2017-01-25 at 20.51.10.png

If you check the bottom-left of the screen, you can confirm that there are 87 players in the dataset with at least 3 seasons of actions under their belts:

screen-shot-2017-01-25-at-20-51-00

Alternatively……you could just achieve the same with a Set.

Screen Shot 2017-01-25 at 20.53.03.png

Stick that on filters (it should default to “Show members in set”) and you get the same result:

screen-shot-2017-01-25-at-20-54-02

Or…..you could just write a boolean calculation, and I’m sure these can take many forms. I did this:

screen-shot-2017-01-25-at-21-36-20

Put that on filters, set it to True, and Robert’s your mother’s brother:

screen-shot-2017-01-25-at-20-56-46

I’d go with the Set or calc, as the group is static, whereas the other two will update dynamically if applying the same methodology to a live dataset.

The second filter was something I could tackle with calculations. It could possibly be nested in a clever way, but I broke it down:

screen-shot-2017-01-25-at-21-14-52

With that in place, I then made use of the Level of Detail functionality to create a boolean to use as a filter:

screen-shot-2017-01-25-at-21-16-16

By using the 3 Season Set and the LoD calc, I’m hoping that they’ll work in conjunction to limit the dataset to the athletes I need to retain. To test this, I created a worksheet structured like this:

screen-shot-2017-01-25-at-21-20-06

If I scrolled to the bottom, I found this:

screen-shot-2017-01-25-at-21-20-18

Matt Flynn meets both criteria (at least 3 seasons of competition and at least 2,000 passing yards), so I reckon we can move onto the next bit. I also tested with both booleans on filters, and the result was the same.

You should be able to pick the QB that you want to highlight – this should be only the list of QBs after the filter

Before this, I needed to get the basic structure of the chart in place, and that wasn’t super easy. I knew that the first attempt wouldn’t work, but here’s how it looks when you drag existing Measures and Dimensions onto the canvas:

Screen Shot 2017-01-25 at 21.40.55.png

I know I want a RUNNING_SUM of Yards by Season for each Player, so the table calc needs to be configured thus:

Screen Shot 2017-01-25 at 21.43.11.png

Less jumbled, but still not what we’re after. The problem is the x-axis, and it’s because of this:

screen-shot-2017-01-25-at-21-47-13

Blake Bortles has three seasons of data, but he only started playing in 2014, whilst other players had a first NFL season back in 2006. The whole point of a common baseline chart is that you plot each player’s first season from a common point of 1, their second from 2 etc. At the moment, we have a mess as we’re plotting from the basis of Season, which disregards if it is the player’s first or fifth season.

To resolve that, INDEX() is brought out of the bag. But why does this work? Let’s look at Blake Bortles, with INDEX() set to Compute Using Season:

screen-shot-2017-01-25-at-21-55-44

INDEX() only kicks in when it finds data, so 2006-2013 aren’t indexed. Let’s look at someone whose career started earlier:

screen-shot-2017-01-25-at-21-58-18

That’s a great example. Looks like Mark Sanchez maybe had an injury which prevented him from playing in 2013, but the index recognises that and doesn’t increment until 2014. Again, the index starts and stops depending on when data is there to be indexed.

So, if we now apply INDEX() to Season on the x-axis, that magical common baseline calibration kicks in:

screen-shot-2017-01-25-at-22-00-36

Now onto the relevant bit: how to allow users to select a Quarterback that they want to highlight, picking only from the filtered list (at least 3 seasons and 2,000 passing yards). Can’t be too hard? First, I looked at the Highlighter option in the Analysis menu:

screen-shot-2017-01-25-at-22-08-33

That gives me a free text box, but if I select it, the player names appear and the highlighting kicks in:

screen-shot-2017-01-25-at-22-11-37

Where’s the catch? The brief of this bit of the challenge was “You should be able to pick the QB that you want to highlight – this should be only the list of QBs after the filter”. I always expect a complication somewhere, but this standard functionality seems to do the job?

Granted, it’s not a filter-type drop-down as in Emma’s final viz, but I’m all for taking the easy route when it works. I even checked to make sure that the Highlighter was filtering out players not meeting the two criteria. See how Aaron Rodgers tops the list? Here’s the list of players with fewest seasons, sorted alphabetically:

screen-shot-2017-01-25-at-22-15-48

None of those appear in the highlighter list, so the job’s a good ‘un!

The highlighted QB should be shown in the light blue colour and should have a circle at the end of the line displaying their total passing yards

That job? Not a good ‘un. The Highlighter doesn’t give me control of colour, but a calculation would. To fully meet this part of the challenge, maybe I’m going to need something which provides control and a calc – parameter time? At this point, I called it a night!

I always overcomplicate things, and I initially assumed that there would be more of a challenge than I expected to populate the parameter, and I tweeted as such:

Screen Shot 2017-01-26 at 19.09.54.png

I assumed that the list of Players would be unfiltered if I used the usual option to populate a Parameter from a field:

screen-shot-2017-01-26-at-19-19-23

And I was right. Look at the players in the list alphabetically:

screen-shot-2017-01-26-at-19-22-03

An image just up the page shows that Aaron Brooks, AJ McCarron, Alex Tanney and Andrew Walter shouldn’t be in the list, as they only feature in one season in the dataset. What the hell is going on? Andy asked if I knew about the Tableau order of operations. The truth is that I had heard of it, I’d even skim-read an article, but it hadn’t sunk in. Here’s a link to the article in question, and an image of the key “query pipeline” it discusses:

screen-shot-2017-01-26-at-19-26-37

After first looking at it, I’d assumed that my use of LoD calcs to match the criteria of the challenge would work for me, and in a sense they do. If I add a filter on Player and choose to only show Relevant Values……

screen-shot-2017-01-26-at-19-29-32

…..then the players not meeting the two criteria are excluded. No more Aaron Brooks.

screen-shot-2017-01-26-at-19-30-35

That’s great if you aren’t needing to use colour to highlight a Dimension member whilst retaining all the other members. Filtering is not the solution here, and it seems that Parameters are populated not based on the order of operations, but on the Dimension itself irrespective of any filters in place on the view.

At this point, I needed the not-too-discreet clue that I was given:

screen-shot-2017-01-26-at-19-34-05

OK. Let’s look at Data Source Filters to sort this out. Right-click the source at the top of the Data Pane and select Edit Data Source Filters:

Screen Shot 2017-01-26 at 19.36.51.png

Therein, you can see your Dimensions and Measures, including any calculations or Sets you have created. Select the relevant calc to filter the data at source:

screen-shot-2017-01-26-at-19-39-17

When I opened the already created parameter, it still retained the likes of Aaron Brooks, so there’s presumably some sort of caching going on which retains the original dimension members. I deleted that original parameter, recreated it from the Player field, and bingo!

screen-shot-2017-01-26-at-19-43-33

Now we’re in business! I’ve forgotten what the aim is…..oh yeah:

The highlighted QB should be shown in the light blue colour and should have a circle at the end of the line displaying their total passing yards

I started out with this, shoved it on Colour and tinkered with the colour settings:

screen-shot-2017-01-26-at-19-46-52

Screen Shot 2017-01-26 at 19.48.29.png

Some progress at last. More formatting stuff to do, but let’s concentrate on the circle on the end. My recent exposure to, and practise with, WINDOW_SUM, made me think I could nail this:

screen-shot-2017-01-26-at-19-54-10

Before setting Compute Using, computer said no:

Screen Shot 2017-01-26 at 19.57.08.png

Some aimless fumbling resulted in this:

Screen Shot 2017-01-26 at 19.59.38.png

The table calc on Circle equals the Running Sum of Yards. This is kind of good. But I couldn’t sort it out. Then I tried to make use of some other calculations I already had, but I ended up in the same position:

screen-shot-2017-01-26-at-20-08-20screen-shot-2017-01-26-at-20-08-36

I conceded defeat, and looked at Emma’s workbook. In hindsight, the reason my calc failed is because I wasn’t asking Tableau to only plot the value at the final data point, so it plotted it across all of the data points. Here’s what Emma did:

Screen Shot 2017-01-26 at 20.10.52.png

The second green pill is the circle, and the calc is shown below. What is the [Latest Passing Yards] calc?

screen-shot-2017-01-26-at-20-11-10

So if it is the last value, return that positions final cumulative value. I can replicate this using some existing calculations I had:

screen-shot-2017-01-26-at-20-21-40

The referenced calculation there is simply:

screen-shot-2017-01-26-at-20-22-30

So if I put the [Final Yards] calculation into my [Circle] calculation, we have a chance:

screen-shot-2017-01-26-at-20-26-26

It still looks familiar when I first plop [Circle] on Rows:

Screen Shot 2017-01-26 at 20.27.32.png

But a tweak to Compute Using gives the expected result:

Screen Shot 2017-01-26 at 20.28.56.png

Hooray! What else did Emma want?

Make the formatting of the chart the same as mine

Converting to dual-axis, synchornising axes and performing some basic formatting got me more or less to the end product. The final notable thing was the title:

screen-shot-2017-01-26-at-20-36-10

And I ended up with the final visualisation, which is uploaded and downloadable here. It includes a series of worksheets highlighting a few of the trip up points I encountered, so please feel free to check it out and share my pain! I learned so much this week, and have even incorporated this chart type at work today. Thanks Emma and Andy!

UPDATE! THINGS I LEARNED FROM ANDY KRIEBEL’S BLOG POST

Having read Andy’s blog post (linked at the top), I thought I’d start from scratch and highlight key differences between his efficient method, and my clumsy attempts. At this point, I haven’t actually downloaded his workbook, but I’m using his blog post to guide me.

The first key difference is at the data prep level, and it would have saved me a load of time. I struggled initially with the Pivot, but having read Andy’s post, I realised that there was just no need to Pivot – I could just hide the fields I didn’t need:

Screen Shot 2017-01-27 at 19.29.26.png

I don’t need any more than that. As the Yds Measure is explicitly retained (and not lost as one Measure in many, in a large “Metrics” clump, I can directly refer to it in the data source filter. Originally, I had to write a calc isolating Yds from the “Metrics”. This makes the data source filter possible immediately:

Screen Shot 2017-01-27 at 19.31.33.png

So straight away we have the reduced dataset we need. It just makes the whole process simpler. The next key difference is in the handling of the “common baseline” of the x-axis. I went down the INDEX() route, whereas Andy used a couple of calcs. In his post, he mentions whacking a [Seasons Played] calculation on columns, but doing that in its default Continuous Measure state did this:

Screen Shot 2017-01-27 at 20.09.18.png

More to add, so I’m not worried yet, but I did want to verify what the calcs Andy used actually achieved. I looked at Tom Brady, after converting the [Seasons Played] calc to a Dimension:

Screen Shot 2017-01-27 at 20.34.01.png

So the Seasons Played calculation is including missed seasons (see 2008 – maybe an injury?), whereas the INDEX() technique would not have counted 2008 as the third season:

Screen Shot 2017-01-27 at 20.36.16.png

I guess it’s all down to interpretation as to the approach you take.

Next we populate Rows with the RUNNING_SUM(Sum(Yds)). Without setting Compute Using, we end up with this:

Screen Shot 2017-01-27 at 20.10.29.png

Adding Player and Season to the Detail shelf gives this:

Screen Shot 2017-01-27 at 20.11.33.png

I’m replicating this in the 10.2 Beta, which includes a great new table calc feature called “Show calculation assistance”. Here’s how it looks when setting the Compute Using to Season:

Screen Shot 2017-01-27 at 20.13.18.png

Look how it helpfully labels each line to show how the calculation is running. It’s a great feature and not one Tableau have been promoting. I still struggle to fully comprehend table calcs, and this helps.

What I was interested in was understanding why in my case I set Compute Using on Player and Season, restarting each Player, whilst Andy can just reference Season. The reason? I didn’t need to do what I did – here’s my viz just with Compute Using on Season:

screen-shot-2017-01-27-at-20-22-54

 

Advertisements