After an introduction to text analysis (most notably Ted Underwood’s article on the subject) and text mining, our class explored two platforms, one of which was AntConc. AntConc is aimed at equipping anyone unfamiliar with programming languages (such as Python) with the tools to do introductory analysis of English texts. As I will demonstrate later in this post, it does not work well with texts written in other languages, especially heavily inflected ones. I will examine the affordances and limitations of this software with special regard to the requirements of text analysis on Latin and Greek texts in Classics.
AntConc is developed and maintained by a single scholar, Laurence Anthony, rather than by an institutional team, which perhaps gives some cause for hesitation about the longevity of the software, but such is the case for all projects of this nature. The fact that Anthony’s software runs locally on one’s own machine rather than online (like Voyant) helps with replicating results, but it may also make it inaccessible to some users. To experiment with this software, I downloaded a copy of Horace’s Odes (in Latin) from Project Gutenberg.
Importing the text into AntConc is quite easy, but one must make sure to use a clean text. I had to manually delete much of the Project Gutenberg legal boilerplate from the text file before I was able to get any results about the Odes at all. However, this attempt was quickly undermined by the fact that AntConc has no built-in support for inflected languages such as Latin. Take for instance the noun puer, meaning ‘boy’. Depending on its function in the sentence, this noun can take all of the following forms: puer, pueri, puero, puerum, puerorum, pueris, and pueros. Several of these forms also serve more than one grammatical function while looking identical (pueri, for example, is both genitive singular and nominative plural). A search for one form therefore misses all the others. To illustrate: AntConc’s concordance for puer is as follows:
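Outside AntConc, a search that accepts every form of a word is not hard to sketch. The Python below is a minimal illustration under stated assumptions, not a lemmatizer: the forms of puer are supplied by hand (the list above), the sample is an invented toy sentence rather than Horace’s text, and the names `PUER_FORMS`, `tokenize`, and `kwic` are my own. It contrasts a verbatim match with a keyword-in-context (KWIC) listing that accepts any listed form.

```python
import re

# Hand-written list of the forms of "puer" discussed above (an assumption);
# a real project would generate forms or lemmas with a Latin morphological tool.
PUER_FORMS = frozenset({"puer", "pueri", "puero", "puerum", "puerorum", "pueris", "pueros"})


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def kwic(tokens, forms, window=3):
    """Return (left context, hit, right context) for each token found in `forms`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok in forms:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Invented toy sentences, not lines from the Odes.
sample = "Puer currit. Pueri rident. Puerum video."
tokens = tokenize(sample)

exact_hits = [t for t in tokens if t == "puer"]  # what a verbatim search sees
all_hits = kwic(tokens, PUER_FORMS, window=2)    # any listed form counts

print(len(exact_hits))  # → 1
print(len(all_hits))    # → 3
for left, hit, right in all_hits:
    print(f"{left:>15} [{hit}] {right}")
```

Even this version would miss forms with an enclitic attached, such as puerque (“and the boy”), which is one reason a proper morphological analyzer is the more robust route for Latin.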
While this is somewhat interesting, it doesn’t actually tell us anything about how Horace uses the noun in his poetry. I then tried the Clusters/N-Grams feature, but to no avail:
Again, this only finds the verbatim “puer” and none of its other forms, so the picture AntConc paints is dismally limited. Given this failure, I switched to studying an English translation of the Odes. This didn’t return much better results, however, since I could only work with John Conington’s translation from the 1800s (also from Project Gutenberg). After cleaning the text, these are the results I got for the word “boy”:
Clearly something is amiss. We just saw that Horace uses the word puer more than 20 times in his text, so why are there only 9 instances of its most direct translation, “boy,” in Conington’s text? This is a prime example of the pitfalls of studying translated texts: what we are reading is Horace filtered through Conington’s literary and translational lens. Thus we are unable to draw any real conclusions about Horace’s text from AntConc. Of course, AntConc could be suited to studying a wide variety of English texts, but it offers little built-in support for filtering stop words, a key tool for digital text analysis.
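Stop-word filtering itself is a simple idea: drop very common function words before counting, so that content words rise to the top of a frequency list. Here is a minimal Python sketch, assuming a tiny hand-picked stop list (real lists, such as the English list in NLTK’s stopwords corpus, are much longer) and an invented sample sentence; `STOP_WORDS` and `top_words` are illustrative names of my own.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); real lists are far longer.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "with"})


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(text, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent tokens after removing anything in `stop_words`."""
    kept = [t for t in tokenize(text) if t not in stop_words]
    return Counter(kept).most_common(n)


# Invented sample sentence, not from any translation of Horace.
sample = "The boy and the girl sing to the gods of the sea."

print(top_words(sample, n=1, stop_words=frozenset()))  # → [('the', 4)]
print(top_words(sample))  # only content words remain
```

Without the stop list, “the” dominates the count; with it, the words that actually carry meaning are what remain.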
To see whether AntConc can surface basic connections within a text, I tried a query whose results I already knew. Horace writes at length about Augustus and themes of empire in this work, so a search for ‘Rome’ should provide context of this sort. Here’s what I got:
These search results are not ideal, perhaps because I used a translation instead of the original, but we can still tease out some of the aforementioned themes from collocates such as “undying,” “standing,” “plenteous,” and “nourish.”
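Collocates of this kind can be approximated by counting the words that fall within a fixed window on either side of a node word such as “Rome.” The following is a minimal Python sketch, assuming a window of two tokens, ranking by raw frequency, and an invented two-sentence sample (not Conington’s translation); the function names are my own.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def collocates(tokens, node, window=2, stop_words=frozenset()):
    """Count tokens within `window` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Words to the left and right of the node, excluding the node itself.
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(t for t in span if t not in stop_words)
    return counts


# Invented sample text, not a passage from Conington's translation.
sample = "Rome stood strong. The walls of Rome stood."
tokens = tokenize(sample)

print(collocates(tokens, "rome").most_common(1))  # → [('stood', 2)]
print(collocates(tokens, "rome", stop_words={"of", "the"}))
```

Raw counts favour very frequent words, which is why dedicated tools usually rank collocates with association measures such as mutual information instead.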
Overall, I would argue that AntConc is useful only for exploring texts, not for crafting arguments about them. The suite of tools in this software allows the lay user to make basic queries about what surrounds certain words or what N-grams occur within a text. If one seeks to do rigorous research, however, one will need to become familiar with a programming language like Python and libraries such as NLTK.