Reflection on Digitizing the Wilcox Collection

As I finish the final project for my Digital Humanities seminar this term, I realize that the work on this particular assignment is far from over. For this project, I aimed to complete the online database for the Wilcox Collection, on which another undergraduate and I have been working for nearly two years. It started as a simple task of documenting coins which had been collecting dust for decades and had never been studied (perhaps even seen) by scholars of either Numismatics or Classics. Having long been interested in Digital Humanities and with a background in Computer Science, this originally seemed like the perfect project for me. Alas, it turned out to be more daunting than I had originally anticipated.

At the beginning of the term, I had already figured out a few things regarding the website-to-be-constructed (why doesn’t English have future passive participles?). I knew that I wanted it to have a clean, modern design and a fully searchable back-end database. It didn’t hit me that these two things comprised an incredibly small part of what I would need to plan out for the site. I spent most of my time this semester figuring out which back-end would actually support the functionality we wanted. I tried coding something myself, but quickly realized that the task was far too enormous for one person, let alone one inexperienced student, to handle. So I turned to investigating other options: WordPress, Omeka, CollectiveAccess, etc., but nothing worked how we had imagined.

It was then that I remembered a site a colleague had sent me when I began the project: Nomisma.org. Nomisma aims to provide field-wide standards for describing numismatic data, with the hopeful result that these hoards of coins make their way into open linked-data repositories like Pelagios. What I didn’t find on this site when I first looked at it was a link to something called Numishare, an open-source back-end developed by Ethan Gruber at the American Numismatic Society. It is a piece of software aimed directly at helping collection managers (curators, I suppose) digitize their collections and contribute to the aforementioned linked-data repositories.

The whole time I had been looking for other digital platforms to accomplish my task, the perfect piece of software was right under my nose. This realization was the most frustrating yet jubilant experience of the entire year. We are currently working to implement Numishare on our university’s servers and, although we are running into some roadblocks, the path ahead seems promising. My advice to anyone starting a DH project: investigate every option you can find. Be thorough. Trial and error is this field’s bread and butter.

Due to these technological constraints, I was forced to leave the mock-up of the site as is and instead write up technical documentation on the entire process of this long project. I was a bit sad to downsize my project after realizing that the site would not be up and running by the end of the semester, but the result has been productive: creating an artifact of the creation process will aid the project’s longevity after I leave the university. Overall, this experience has been frustrating, fruitful, and engaging, and I probably wouldn’t change a thing about it.


Lab Experiment: Tableau

Our experiments with Tableau came with our discussions on data visualization and readings on “Data Visualization as a Scholarly Activity” by Martyn Jessop (unfortunately pay-walled here) and “Feminist Data Visualization” by Catherine D’Ignazio and Lauren F. Klein. These two works are fantastic introductions to the scholarly notions of visualizing data and should be kept in mind during all visualization projects. In this post, I will discuss my experiments with Tableau in light of the ideas presented in these articles.

Tableau is a company that provides a robust toolkit for data visualization and analysis at a cost, but also provides a free public version with limited functionality. For this lab, I obviously used the free public version since my university has not purchased access to the full version of the software.

For the purposes of this write-up I will be working with one of Tableau’s freely available data sets (all of which can be found here) on 2016 Presidential election spending as reported by the Federal Election Commission. As always, an analysis is only as good and conclusive as the data provided, so one must be critical of the data itself:

[Screenshot: the raw FEC expenditure data as loaded into Tableau]

In this case, the data provided fails to display the party affiliation of these candidates, both in Tableau and in Excel. Additionally, the candidates named are Marco Rubio, Ben Carson, Ted Cruz, Rand Paul, Rick Perry, and Hillary Clinton, thus excluding many of the candidates from that year, namely all the other Democrats, many of the Republicans, and, most notably, Trump. No explanation is given for these gaps in the data, so we can only speculate.

[Screenshot: the list of candidates included in the data set]

We will examine the data we have, however, and see what conclusions we can draw from it. Even the public version of Tableau provides a whole suite of tools with which it would take considerable time to become acquainted. That is both a limitation and an affordance of the software, in my opinion: the functionality is fantastic, but the initial operability is dismal. For the purposes of this demonstration, I will attempt to visualize which candidate (of those provided in this data set) paid out the most money and to whom. Tableau provides a very easy method for determining how many rows feature each candidate, with Hillary Clinton clearly leading in the number of individual expenses:

[Screenshot: bar chart of the number of line-item expenses per candidate]

This of course does not tell us which of these candidates spent the most. Tableau also offers an easy way to demonstrate this:

[Screenshot: bar chart of total disbursements per candidate]

If we compare these two values, we can tell something interesting about the data set as a whole:

[Screenshot: side-by-side comparison of expense counts and total disbursements]

Of course Hillary Clinton spent the most and had the most line-item expenses; this is not surprising. However, by comparing these two visualizations, we can tell that Ted Cruz spent more money per line-item transaction than the other candidates, given the disparity between the values in each respective graph. This sort of visualization might then lead us back to the actual data to analyze what exactly this candidate spent his budget on. In Tableau, we can easily calculate the median and mean of each candidate’s expenses:

[Screenshot: mean disbursement per line-item expense for each candidate]

[Screenshot: median disbursement per line-item expense for each candidate]

By examining this data, we can confirm that Ted Cruz had the highest average disbursement per line-item expense, which means that, while Hillary Clinton had many more line-item expenses than the other candidates, her expenses were cheaper on average. Ted Cruz spent the most per line item, and the same can be said of Ben Carson and Rick Santorum. Perhaps more interesting, however, is that Rick Santorum had the largest median expense value by far. If we look back at the data, Santorum only reported six line-item expenses:

[Screenshot: Rick Santorum’s six reported line-item expenses]


So few expenses is definitely an outlier in this data set and raises the question: where is the data on his other expenses? That is a question for political scientists, one which I cannot address. This example, however, shows how much the data provided affects the analysis done in software like Tableau, regardless of the many amazing tools it provides.
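For readers without Tableau access, the same aggregations are easy to reproduce in Python with pandas. The sketch below is a minimal example, assuming a hypothetical CSV export of the FEC data with columns named candidate and disbursement_amount (the real export’s column names will likely differ):

```python
# A minimal sketch of reproducing the Tableau aggregations with pandas.
# ASSUMPTION: "fec_2016_expenses.csv" and its column names are
# hypothetical stand-ins for the actual FEC export.
import pandas as pd

df = pd.read_csv("fec_2016_expenses.csv")

summary = df.groupby("candidate")["disbursement_amount"].agg(
    line_items="count",  # number of individual expenses
    total="sum",         # total disbursements
    mean="mean",         # average cost per line item
    median="median",     # median cost per line item
)

# Sorting by the mean surfaces candidates (like Cruz) whose individual
# expenses were unusually large relative to their number of line items.
print(summary.sort_values("mean", ascending=False))
```

Sorting by the mean rather than the total is what makes the Cruz pattern discussed above jump out: a large total spend spread over comparatively few line items.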

Lab Experiment: AntConc

After an introduction to text analysis (most notably Ted Underwood’s article on the subject) and text-mining, our class explored the platform AntConc. This project is aimed at equipping anyone unfamiliar with programming languages (such as Python) with the tools to do introductory analysis on English texts. As I will demonstrate later in this post, this software does not work well with texts written in languages other than English. I will examine the affordances and limitations of the software with special regard to the requirements of text analysis on Latin and Greek texts in Classics.

AntConc

AntConc is developed by an independent scholar, Laurence Anthony, and has no institutional affiliations, which perhaps gives some cause for hesitation about the longevity of the software, but such is the case for all projects of this nature. The fact that Anthony’s software is desktop-based instead of online (like Voyant) does help with replicating results, but it might make it inaccessible for some. For experimenting with this software, I downloaded a copy of Horace’s Odes (in Latin) from Project Gutenberg.

Importing the text into AntConc is quite easy, but one must make sure to use a clean text. I had to manually delete much of the Project Gutenberg legal jargon from the text file before I was able to get any results about the Odes at all. However, this attempt quickly proved futile, given that AntConc has no support for inflected languages such as Latin. Take for instance the noun puer, meaning ‘boy’. Depending on its function in the sentence, this noun can take all the following forms: puer, pueri, puero, puerum, puerorum, pueris, and pueros, with many of these serving different grammatical functions while looking the same. To illustrate, AntConc’s concordance for puer is as follows:

[Screenshot: AntConc concordance results for “puer”]

While this is somewhat interesting, it doesn’t actually tell us anything about how Horace uses the noun in his poetry. I then decided to try the Clusters/N-Grams feature, but to no avail:

[Screenshot: AntConc Clusters/N-Grams results for “puer”]

Again, this only looks for the verbatim “puer” and none of its other forms, thus the picture painted by AntConc is dismally limiting. Given this utter failure, I quickly opted for studying an English translation of the Odes. This didn’t yield much better results, however, since I could only work with a translation by John Conington from the 1800s (also from Project Gutenberg). After cleaning the text, these are the results I got for the word “boy”:

[Screenshot: AntConc concordance results for “boy” in Conington’s translation]

Clearly something is amiss. We just saw that Horace used the word puer more than 20 times in his text, so why are there only 9 instances of its most direct translation, “boy,” in Conington’s text? This is a prime example of the perils of studying translated texts: what we have is Horace filtered through the literary, translational lens of Conington. Thus we are unable to draw any real conclusions about this text from AntConc. Of course, it could be suited to studying a wide variety of English texts, but it lacks the ability to filter stop words–a key tool for digital text analysis.
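To make the inflection problem described above concrete, here is a minimal Python sketch of the kind of lemma-aware counting AntConc cannot do. The filename and the hard-coded form list are assumptions for illustration; a real workflow would use a proper Latin lemmatizer (such as the CLTK’s) rather than a hand-written list:

```python
# Count every inflected form of "puer" in a plain-text copy of the Odes.
# ASSUMPTION: "horace_odes.txt" is a hypothetical cleaned text file, and
# the hard-coded form list stands in for a real lemmatizer.
import re
from collections import Counter

PUER_FORMS = {"puer", "pueri", "puero", "puerum",
              "puerorum", "pueris", "pueros"}

with open("horace_odes.txt", encoding="utf-8") as f:
    text = f.read().lower()

tokens = re.findall(r"[a-z]+", text)  # crude tokenizer: runs of letters
hits = Counter(t for t in tokens if t in PUER_FORMS)

print(sum(hits.values()), "total occurrences:", dict(hits))
```

Even this crude approach captures what AntConc’s verbatim search misses: a count for the lemma as a whole rather than for a single surface form.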

In an attempt to see whether AntConc is capable of surfacing basic connections within texts, I tried a query for which I already knew the proper results. Horace writes extensively about Augustus and themes of Empire in this work, so a search for ‘Rome’ should provide context of this sort. Here’s what I got:

[Screenshot: AntConc concordance results for “Rome”]

While these search results are not ideal (perhaps due to the use of a translation instead of the original), we can tease out some of the aforementioned themes from the collocations with “undying,” “standing,” “plenteous,” and “nourish.”

Overall, I would argue that AntConc is only useful for exploring texts–not crafting arguments about them. The suite of tools available in this software allows the lay user to make basic queries about what surrounds certain words or what possible N-grams exist within the text. If one seeks to do rigorous research, however, they will need to familiarize themselves with a language like Python and its packages, such as the NLTK.
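As a starting point, here is a minimal sketch of that NLTK workflow: tokenizing a cleaned text, filtering stop words (the very feature AntConc lacks), and producing a concordance. The filename is a hypothetical placeholder, and the one-time nltk.download calls fetch the tokenizer models and stop word lists:

```python
# A minimal NLTK workflow: tokenize, strip stop words, build a concordance.
import nltk
from nltk.corpus import stopwords
from nltk.text import Text

nltk.download("punkt")       # tokenizer models (one-time download)
nltk.download("stopwords")   # stop word lists (one-time download)

# ASSUMPTION: "odes_conington.txt" is a hypothetical cleaned copy
# of Conington's translation.
with open("odes_conington.txt", encoding="utf-8") as f:
    raw = f.read()

tokens = [t.lower() for t in nltk.word_tokenize(raw) if t.isalpha()]
stops = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t not in stops]

# Concordance lines for "boy", with stop words already stripped out.
Text(content_tokens).concordance("boy", width=60, lines=10)
```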

Rev. Pharos: Doing Justice to the Classics

In the DH seminar I’m taking this semester, we are tasked with reviewing a digital project and posting that review on our blog, aka here. I want to tailor my DH assignments to Classics, since we are graciously given quite a bit of leeway and freedom (Thanks, Dhanashree!). I thought it might be interesting to review a project I’d heard a lot about but not explored in depth. So, I would like to take this opportunity to investigate the Pharos project, which seeks to “document appropriations of Greco-Roman culture by hate groups” and provide a space where scholars (and not just those working for institutions) can publicly respond to them. This project was started (presumably) in 2017 by Curtis Dozier at Vassar College in New York and is funded by the Vassar College Department of Greek and Roman Studies and the Vassar College Office of Communications.

I first learned of this project via Twitter and was super excited that someone had finally created a way to document and respond to these hate groups, which perpetually appropriate what we study to meet their own ends. Curtis Dozier has set up a simple WordPress site, which ensures its accessibility to future directors and users. The site is hosted on the Vassar servers, which lends it traditional credibility in terms of having a ‘.edu’ address and also helps ensure its longevity. Having started to create a new site for our own Classics museum, I have learned that any site hosted at a collegiate institution is essentially guaranteed oversight and stability.

The overall setup of the site is simple, but thorough. There are four different post tags which allow the reader to filter and search through them: Documenting Appropriations, Response Essays, Scholars Respond, and Site News/Editorial. These four categories adequately describe the different types of posts that concern the project. They are also well-described alongside the overall structure and goals of the site on the home page. In terms of accessibility, this site loads well on all devices and has many methods by which one might read the material.

The material of the site holds itself to a high scholarly standard. It is also remarkably consistent with its own ideology: when an article criticizes how a hate group has misappropriated Greco-Roman culture, Pharos always links the reader to an archived version of the site, so that one may view the original content as it appeared when the article was written without giving the site “revenue-generating traffic.” Thus, this project is supremely grounded in factual criticism and always provides citations as well as a list of the scholars who contributed to the response (many of whom are well-known, respected paragons of the field).

To illustrate, the most recent topic, for which there is both documentation and a response, concerns the male-supremacy hate group, Return of Kings, and their arguments about Roman virtus. The documentation page begins by immediately linking the reader to an archive of the original post, an article on the group by The Southern Poverty Law Center, a screenshot of the original article title and image, and an updated link to the scholarly response. All of this comes before the actual description of the article, thus prodding the reader to investigate the case for themselves before reading any secondary material. This focus on the primary evidence is compelling and a great way to introduce problematic appropriations.

The response to this document, with seven Classicists listed as contributors (and several very recognizable names), is structured in bullet-point format, which makes it easy to dissect the scholarly arguments against Return of Kings’ article and clearly demonstrates the faults, misogyny, and utter incorrectness of their arguments, all while citing many educational sources, none of which are behind a paywall. I counted ~40 citations to both primary texts and secondary sources, which support their points but mostly provide avenues for non-Classicists to learn about the ancient world through trusted sources. Here we have professors and Classicists pointing the reader to open-access articles, texts, and translations so that they might become more familiar with this material and perhaps prevent the spread of these wrongful appropriations.

Lastly, I would like to draw attention to the sole post (for now, I’m sure) under the tagged section ‘Site News/Editorial’, since it outlines exactly why Curtis Dozier created this project and why he believes it has significance. Dozier cites incredibly noble and thorough reasons for the project, and the post is most definitely worth reading, even if you are not a Classicist. In fact, the whole project is approachable by anyone; readers need not be steeped in Greco-Roman culture to understand the writing or arguments. One need only have a computer, an open mind, and the will to think critically about social justice in terms of the past. This is the kind of digital scholarship that we should be producing, and I’m proud to be a part of a discipline that takes social justice so seriously.

A(n Incomplete) Survey of Digital Tools for Classicists

As I explore the online community of Classicists and their digital products, I keep thinking that it would be incredibly helpful to have a thorough list of all current projects. I hope to provide that here, working off Sarah Bond’s list as well as the extensive list of projects at the Universität Leipzig under the direction of Monica Berti. I hope to add more projects as I learn about them. All suggestions are welcome!

Literary & Textual Analysis:

CLTK: The CLTK is an expansive project undertaken by Patrick J. Burns, Luke Hollis, and Kyle P. Johnson. It aims to provide a thorough Python framework for linguistic analysis of ancient texts, extending the NLTK, which provides a similar framework for the study of modern languages (see the sketch after this list).

Quantitative Criticism Lab: Started in 2014 by Pramit Chaudhuri and Joseph Dexter, the QCL is a project that develops quantitative and computational approaches to the study of literature and culture.

Perseus @ Tufts: The OG workspace for reading ancient texts and their commentaries.

Perseus/Philologic @ UChicago: Chicago hosts a version of Philologic on their site which allows the user to search texts, create concordances, and make mid-level linguistic searches with relative ease.

Perseids: The Perseids editor is a web-based text-editing environment that enables collaborative editing of texts within a framework of rigorous and transparent peer review, credit mechanisms, and strong editorial oversight.

Arethusa: A client-side treebanking framework for accessing texts, annotations, and linguistic services from a variety of sources.

Thesaurus Linguae Graecae (TLG): The TLG is an unparalleled platform for searching and analyzing ancient Greek texts and vocabulary.

Thesaurus Linguae Latinae (TLL): While the TLL is something I’ve never personally used (too expensive for my department), I’ve heard it’s quite good.
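Since the AntConc post above ran aground on Latin inflection, it is worth showing what the CLTK offers in that regard. Below is a minimal sketch, assuming the CLTK 1.x interface (pip install cltk); the first run downloads language models, and the exact pipeline output can vary by version:

```python
# Lemmatize a line of Horace (Odes 1.11) with the CLTK's Latin pipeline.
# ASSUMPTION: this uses the CLTK 1.x API; earlier/later versions differ.
from cltk import NLP

nlp = NLP(language="lat")
doc = nlp.analyze(text="Tu ne quaesieris, scire nefas, quem mihi, "
                       "quem tibi finem di dederint")

# Every surface form is reduced to its headword, so a search over
# lemmata would find "puer" whether the text reads pueri or pueros.
for token, lemma in zip(doc.tokens, doc.lemmata):
    print(f"{token:12} -> {lemma}")
```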

Digital Texts:

All these sites have been created under the Open Greek and Latin Project at the University of Leipzig and are under the direction of Monica Berti.

DFHG: The Digital Fragmenta Historicorum Graecorum provides the five volumes of the Fragmenta Historicorum Graecorum (FHG) edited by Karl Müller in the 19th century.

First 1000 Years of Greek: This project seeks to provide a digital copy of every extant Greek text, prioritizing those not already hosted by the Perseus Digital Library.

Digital Athenaeus: A digital version of The Deipnosophists by Athenaeus which describes several banquet conversations on a variety of topics.

Digital Marmor Parium: A digital version of the marble slab found at Paros which records a timeline of Greek history (1581/80-299/98 BC): archons, kings, and short references to historical events from the Athenian perspective.

Mapping:

Pelagios: The Pelagios linked-data project provides a robust network of linked geographical and literary data from Classical literature and scholarship.

Orbis: The Orbis project provides an extensive geospatial network model of the Roman world, offering a way to calculate travel times and costs across the vast distances of the empire.

Map for the DFHG: A digital, interactive map of the fragments from the DFHG, linked above.

Hestia: The Hestia Project seeks to elucidate the literary geography of Herodotus’ Histories.

Map Tiles: This site provides accurate and scalable maps for different periods of history, as compiled by the Ancient World Mapping Center.

Archaeological Resources:

FASTI: A searchable database of archaeological excavations, conservation projects, and surveys since 2000, created by the International Association of Classical Archaeology (AIAC) and the Center for the Study of Ancient Italy at the University of Texas at Austin (CSAI).