Our experiments with Tableau accompanied our discussions of data visualization and our readings of “Data Visualization as a Scholarly Activity” by Martyn Jessop (unfortunately pay-walled here) and “Feminist Data Visualization” by Catherine D’Ignazio and Lauren F. Klein. These two works are fantastic introductions to the scholarly notions of visualizing data and should be kept in mind during all visualization projects. In this post, I will discuss my experiments with Tableau in light of the ideas presented in these articles.
Tableau is a company that provides a robust toolkit for data visualization and analysis at a cost, but also provides a free public version with limited functionality. For this lab, I obviously used the free public version since my university has not purchased access to the full version of the software.
For the purposes of this write-up I will be working with one of Tableau’s freely available data sets (all of which can be found here) on 2016 Presidential election spending as reported by the Federal Election Commission. Any analysis is only as good and conclusive as the data behind it, so one must be critical of the data itself:
In this case, the data provided does not show the party affiliation of these candidates, in either Tableau or Excel. Additionally, the candidates named are Marco Rubio, Ben Carson, Ted Cruz, Rand Paul, Rick Perry, and Hillary Clinton, which excludes many of the candidates from that year: all the other Democrats, many of the Republicans, and, most notably, Donald Trump. No explanation is given for these gaps in the data, so we can only speculate.
We will examine the data we have, however, and see what conclusions we can draw from it. Even the public version of Tableau provides a whole suite of tools that takes considerable time to learn. That is both a limitation and an affordance of the software, in my opinion, since the functionality is fantastic, but the initial operability is dismal. For the purposes of this demonstration, I will attempt to visualize which candidate (of those provided in this data set) paid out the most money and to whom. Tableau provides a very easy method for determining how many rows feature each candidate, with Hillary Clinton clearly leading in the number of individual expenses:
This of course does not tell us which of these candidates spent the most. Tableau also offers an easy way to demonstrate this:
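Outside Tableau, the same two summaries (rows per candidate and total spending per candidate) can be reproduced in a few lines of Python. The sketch below is only an illustration: the rows, the candidate labels, and the field names `candidate` and `amount` are invented placeholders, not the FEC data or Tableau's actual column names.

```python
from collections import Counter, defaultdict

# Invented placeholder rows; these are NOT values from the FEC data set.
rows = [
    {"candidate": "Candidate A", "amount": 120.00},
    {"candidate": "Candidate A", "amount": 80.00},
    {"candidate": "Candidate A", "amount": 50.00},
    {"candidate": "Candidate B", "amount": 900.00},
    {"candidate": "Candidate B", "amount": 600.00},
]


def summarize(rows):
    """Return (row counts, total spending), each keyed by candidate."""
    counts = Counter(row["candidate"] for row in rows)
    totals = defaultdict(float)
    for row in rows:
        totals[row["candidate"]] += row["amount"]
    return counts, dict(totals)


counts, totals = summarize(rows)
for name in counts:
    print(name, counts[name], totals[name])
```

In this made-up example the candidate with more rows spends less overall, which is exactly why the row count alone cannot answer the spending question.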
If we compare these two values, we can tell something interesting about the data set as a whole:
Of course Hillary Clinton spent the most and had the most line-item expenses; this is not surprising. However, from comparing these two visualizations, we can tell that Ted Cruz spent more money per line-item transaction relative to the other candidates, given the disparity between the values in each respective graph. This sort of visualization might then lead us back to the actual data to see what exactly this candidate spent his budget on. In Tableau, we can easily calculate the median and mean of each candidate’s expenses:
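For readers who want to check such figures by hand, here is a minimal sketch of the same per-candidate mean and median using Python's standard `statistics` module. The candidate labels and amounts are invented for illustration, and the function name `mean_and_median` is my own, not anything Tableau exposes.

```python
from collections import defaultdict
from statistics import mean, median

# Invented placeholder expenses; these are NOT values from the FEC data set.
expenses = [
    ("Candidate A", 10.0),
    ("Candidate A", 20.0),
    ("Candidate A", 300.0),
    ("Candidate B", 100.0),
    ("Candidate B", 200.0),
]


def mean_and_median(expenses):
    """Group amounts by candidate and return {candidate: (mean, median)}."""
    grouped = defaultdict(list)
    for candidate, amount in expenses:
        grouped[candidate].append(amount)
    return {name: (mean(amounts), median(amounts))
            for name, amounts in grouped.items()}


stats = mean_and_median(expenses)
for name, (avg, mid) in stats.items():
    print(name, avg, mid)
```

A single large payment pulls Candidate A's mean well above the median here, which is why it helps to look at both measures side by side.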
By examining this data, we can then confirm that Ted Cruz had the highest average disbursements per line-item expense, which means that, while Hillary Clinton had many more line-item expenses than other candidates, her expenses were cheaper on average. Ted Cruz spent the most per line item, and Ben Carson and Rick Santorum likewise show high averages. Perhaps more interesting, however, is that Rick Santorum had the largest median expense value by far. If we look back at the data, Santorum only reported 6 line-item expenses:
So few expenses make him a definite outlier in this data set and raise the question of where the data on his other expenses is. This is a great question for political scientists, one which I cannot address. This example, however, shows how greatly the data provided affects the analysis done in software like Tableau, regardless of the many amazing tools that it provides.
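One way to catch this kind of gap before reading too much into a mean or median is to check how many rows each candidate has. The sketch below flags any group with fewer rows than a chosen threshold. The labels are invented, and the default threshold of 30 is an arbitrary assumption on my part rather than a statistical rule.

```python
from collections import Counter


def flag_sparse_groups(labels, min_rows=30):
    """Return the sorted labels that appear fewer than min_rows times."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n < min_rows)


# Invented example: one candidate with 40 rows, another with only 6.
labels = ["Candidate A"] * 40 + ["Candidate B"] * 6
print(flag_sparse_groups(labels))
```

Any candidate flagged this way deserves a look at the underlying rows before their summary statistics are compared with anyone else's.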