Course Level
Knowledge Unit
Fundamental Programming Concepts
Collection Item Type

This programming assignment asks students to examine a corpus of Old English poetry and prose texts and to test a conjecture: do any words in the corpus appear only in the poetry, and if so, how many times does each of them occur? Students use a Python dictionary (also called a “hash table” or “map”) to keep track of all words in the poetry and then remove from that dictionary any word that also appears in the prose. Learning goals include problem decomposition (functions), extending existing code, technical writing, and writing scripts to produce HTML output.
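The dictionary approach described above can be sketched as follows. This is a minimal illustration, not the assignment's starter code: the function names `tokenize` and `poetry_only_counts` are hypothetical, and the tokenizer (lowercase, runs of letters only) is an assumption about how words are separated.

```python
import re


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens.

    The pattern matches Unicode letters, so Old English characters such
    as thorn, eth, and ash are kept. (Assumed tokenization rule.)
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def poetry_only_counts(poetry_texts, prose_texts):
    """Return {word: count} for words that occur in the poetry but never in the prose."""
    counts = {}
    # Step 1: count every word that appears in the poetry.
    for text in poetry_texts:
        for word in tokenize(text):
            counts[word] = counts.get(word, 0) + 1
    # Step 2: remove any word that also appears in the prose.
    for text in prose_texts:
        for word in tokenize(text):
            counts.pop(word, None)  # no error if the word was never in the poetry
    return counts
```

Whatever remains in the dictionary after the second pass is the set of poetry-only words, each paired with its number of occurrences in the poetry.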

The author of this material was awarded a 2016 NCWIT Engagement Excellence Award for this assignment. Learn more on NCWIT's awards page.


I recommend a thorough discussion of the input file structure, especially how to reach every file within multiple directories (folders) using glob(). Students are required to add a function that outputs the results in HTML. This brings the semester "full circle": students revisit the HTML tags they learned at the start, but now their Python source must create the HTML "on the fly". Instructors are encouraged to require professional reports that include Methods, Results, and Discussion.
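As a rough sketch of the two pieces mentioned above, the following reads every text file in a directory with glob() and renders a word-count dictionary as an HTML page. The directory layout, the `*.txt` pattern, the UTF-8 encoding, and the function names `read_texts` and `counts_to_html` are all assumptions made for illustration; the actual corpus files may be organized differently.

```python
import glob
import html
import os


def read_texts(directory, pattern="*.txt"):
    """Return the contents of every file in `directory` matching `pattern`."""
    texts = []
    # sorted() makes the order predictable; glob itself gives no ordering guarantee.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read())
    return texts


def counts_to_html(counts, title="Words found only in the poetry"):
    """Build a complete HTML page with one table row per word."""
    rows = []
    # Most frequent words first; ties are broken alphabetically.
    for word, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        rows.append(f"<tr><td>{html.escape(word)}</td><td>{count}</td></tr>")
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<title>{html.escape(title)}</title>\n</head>\n<body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        "<table>\n<tr><th>Word</th><th>Count</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>\n</body>\n</html>\n"
    )
```

A script could call `read_texts` once per poetry or prose folder and write the string returned by `counts_to_html` to a file such as `results.html` (a hypothetical name) for viewing in a browser.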

For more information on how to implement this assignment, see Mark's Teaching Paper, "Computing and the Digital Humanities-Computing for Poets." 


Engagement Highlights

Counting tokens or ngrams in a text is a wonderful entry into computational stylometry and/or text mining. This assignment poses an open question to be asked over an entire corpus: which words in the poetry of the Old English (Anglo-Saxon) corpus never occur in the prose? The interdisciplinary connection between text analysis and computing is immediately evident. Students focus on solving a problem that is both meaningful and relevant for scholars of this corpus. An added benefit is that students recognize their solution can be applied to other corpora and collections of texts, e.g., are there any words used by the Bronte sisters that are never used by Jane Austen?
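For readers unfamiliar with ngram counting, a minimal sketch is given below. It is not part of the assignment as described; the function name `ngram_counts` is hypothetical, and the input is assumed to be a list of already-tokenized words.

```python
from collections import Counter


def ngram_counts(tokens, n=2):
    """Count each run of `n` consecutive tokens in a list of tokens."""
    # zip over n staggered copies of the list: tokens[0:], tokens[1:], ...
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)
```

With `n=1` this reduces to plain token counting, with each key a one-element tuple.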

Materials and Links


Computer Science Details

Programming Language

Material Format and Licensing Information

Creative Commons License