For #MakeoverMonday week 31, Andy Kriebel provided a light-hearted dataset looking at how US broadcasters linguistically tiptoe around the delicate subject of sportsmen taking a smack in the balls. The inspiration was this article on FiveThirtyEight.com, and the dataset can be found on Andy’s blog.
This is my kind of dataset, because it is simple! When I look at the original article, the two clear Dimensions are the media source, and the phraseology adopted to dodge around the subject matter.
So I knew that I wanted to focus on these two aspects: firstly by looking at the types of wording used by each media source, and secondly by comparing the sources against each other in a simple way.
Stacked bars were an option, but they become quite hard to read when you have a number of things to compare, and a number of component parts (phrases in this case) within each “thing”.
Next, I thought of word clouds, but I ditched them because I didn’t think they would make effective use of space when comparing across the various media sources.
I hadn’t created a treemap before, but had recently written up an article pulling together Tableau content from around the interwebnet about them, so that was my preferred chart type in this instance. A link to my viz is here:
It’s very basic, with the structure at a worksheet level being:
By adding Media source to rows, it basically chops up the treemap into distinct sections for each member of that Dimension. This achieves my primary aim of looking at each source independently. I then elected to use colour to focus on the direct references to “Groin”, to see if it gave any insight into how the different sources refer to a kick in the nuts.
For me, it effectively highlights the difference between the old school terminology of established media sources and that of the newer players, who are happier to pop out balls, nuts, dicks and all manner of todger references, whilst still doffing a cap to the good old groin in almost a third of cases.
I created a calculated field called “Groin conversion rate!” to sort my data, which comprises:
“Total Mentions” is also a calculated field. I could have nested it within the “Groin conversion rate!” calculation, but I’m still new to this, and would likely have cocked it up. “Total Mentions” is a Level of Detail calculation:
So it just sums up the total number of mentions for each Media source independently, and that total is used in the “Groin conversion rate!” calculation. The “Groin conversion rate!” field is then used to sort Media source in descending order by its sum.
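For anyone who wants to sanity-check the logic outside Tableau, here is a rough Python sketch of the same idea. It is only a sketch under assumptions: it takes the rate to be groin mentions divided by total mentions for each source, and the rows and names (`rows`, `groin_conversion_rates`) are made up for illustration rather than taken from the dataset or from my workbook.

```python
from collections import defaultdict

# Hypothetical rows: (media source, phrase, mentions). Not the real dataset.
rows = [
    ("Source A", "groin", 8),
    ("Source A", "balls", 2),
    ("Source B", "groin", 3),
    ("Source B", "nuts", 6),
    ("Source B", "balls", 3),
]

def groin_conversion_rates(rows):
    """Return (source, rate) pairs sorted by rate, highest first.

    Assumption: rate = groin mentions / total mentions for that source.
    """
    total = defaultdict(int)  # total mentions per source (the per-source sum)
    groin = defaultdict(int)  # mentions of "groin" per source
    for source, phrase, mentions in rows:
        total[source] += mentions
        if phrase.lower() == "groin":
            groin[source] += mentions
    # Skip sources with no mentions to avoid dividing by zero.
    rates = {s: groin[s] / total[s] for s in total if total[s] > 0}
    # Descending sort, mirroring the sort applied to Media source.
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

ranked = groin_conversion_rates(rows)
for source, rate in ranked:
    print(f"{source}: {rate:.0%}")
```

The per-source total is computed once and reused for every phrase in that source, which is the same job the Level of Detail calculation does in the workbook.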
The theory here was that it would allow easy identification of those sources which tended to stick to “groin” when describing this sort of meat and veg incident. Looking at the result, I think it not only achieves that but, in conjunction with the treemaps, also reveals more clearly that a richer variety of phraseology is adopted at sources with a lower “Groin conversion rate!”.
That was the thought process behind my visualisation this week!