Course Level
CS1
Knowledge Unit
Fundamental Programming Concepts
Collection Item Type
Assignment
Synopsis

This programming assignment asks students to examine a collection of Old English poetry and prose texts and test a conjecture: do any words appear only in the poetry across the entire corpus, and if so, how many times does each of these words occur? Students use a Python dictionary (also called a “hash table” or “map”) to keep track of all words in the poetry and then remove from that dictionary any word that also appears in the prose. Learning goals include problem decomposition (functions), extending existing code, technical writing, and writing scripts that produce HTML output.
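The dictionary approach described in the synopsis can be sketched roughly as follows. This is a minimal illustration, not the assignment's starter code or reference solution: the function names `tokenize` and `poetry_only_counts` are hypothetical, and treating a "word" as a lowercased run of letters is a simplifying assumption that an instructor may want to refine for the actual corpus.

```python
import re


def tokenize(text):
    """Split text into lowercased words.

    Assumption: a word is any run of letters. The pattern matches
    Unicode letters, so characters such as þ, ð and æ are kept.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def poetry_only_counts(poetry_texts, prose_texts):
    """Return {word: count} for words that occur in the poetry only."""
    counts = {}

    # Step 1: count every word that appears in the poetry.
    for text in poetry_texts:
        for word in tokenize(text):
            counts[word] = counts.get(word, 0) + 1

    # Step 2: remove any word that also appears in the prose.
    for text in prose_texts:
        for word in tokenize(text):
            counts.pop(word, None)  # no error if the word is absent

    return counts


if __name__ == "__main__":
    # Toy strings for illustration only, not taken from the corpus files.
    poetry = ["hwæt we gardena", "gardena in geardagum"]
    prose = ["we in"]
    print(poetry_only_counts(poetry, prose))
```

Using `counts.pop(word, None)` avoids a separate membership test before deleting, which keeps the second loop short for students who are new to dictionaries.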

The author of this material was awarded a 2016 NCWIT Engagement Excellence Award for this assignment. Learn more on NCWIT's awards page.

Recommendations

I recommend a thorough discussion of the input file structure, especially how to process each file within multiple directories (folders) using glob(). Students are required to add a function that outputs the results as HTML. This assignment brings the semester "full circle": students revisit the HTML tags learned at the start, but now their Python source must generate the HTML "on the fly". Instructors are encouraged to require professional reports that include Methods, Results, and Discussion sections.
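As a starting point for that discussion, here is a hedged sketch of the two pieces mentioned above: reading files from several folders with `glob`, and generating an HTML report from a dictionary of counts. The folder layout (one level of subfolders containing UTF-8 `.txt` files) and the function names `read_texts` and `counts_to_html` are assumptions made for illustration; the actual corpus layout and required function signature may differ.

```python
import glob
import html
import os


def read_texts(root_dir):
    """Read every .txt file found one folder below root_dir.

    Assumed layout: root_dir/<subfolder>/<file>.txt, encoded as UTF-8.
    """
    pattern = os.path.join(root_dir, "*", "*.txt")
    texts = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read())
    return texts


def counts_to_html(counts, title="Words found only in the poetry"):
    """Build an HTML page with one table row per word.

    Rows are sorted by descending count, then alphabetically.
    """
    safe_title = html.escape(title)
    lines = [
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{safe_title}</title>",
        "</head>",
        "<body>",
        f"<h1>{safe_title}</h1>",
        "<table>",
        "<tr><th>Word</th><th>Count</th></tr>",
    ]
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for word, count in ordered:
        # Escape the word so characters like < or & cannot break the page.
        lines.append(f"<tr><td>{html.escape(word)}</td><td>{count}</td></tr>")
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical output file name; the counts here are toy values.
    page = counts_to_html({"gardena": 2, "hwæt": 1})
    with open("results.html", "w", encoding="utf-8") as out:
        out.write(page)
```

Returning the page as a string, rather than writing the file inside the function, makes the HTML easy to inspect or test before anything is saved to disk.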

For more information on how to implement this assignment, see Mark's Teaching Paper, "Computing and the Digital Humanities-Computing for Poets." 

 

Engagement Highlights

Counting tokens or n-grams in a text is a wonderful entry into computational stylometry and text mining. This assignment poses an open question to be asked over an entire corpus: which words in the poetry of the Old English (Anglo-Saxon) corpus do not occur in the prose? The interdisciplinary connection between text analysis and computing is immediately evident. Students focus on solving a problem that is both meaningful and relevant for scholars of this corpus. An added benefit is that students recognize their solution can be applied to other corpora and collections of texts, e.g., are there any words used by the Bronte sisters that are never used by Jane Austen?

Materials and Links

Materials
Rubric for Grading

Computer Science Details

Computer Science Topic(s)
code review
documentation
file i/o
functional decomposition
functions
loops (for)
string manipulation
strings
Programming Language
Python

Additional Details

Prerequisites / Prep Materials

See Recommendations above and the Teaching Paper entitled "Computing and the Digital Humanities-Computing for Poets" in Materials and Links.

Lessons Learned / Pitfalls

See Recommendations above.

Material Format and Licensing Information

Material Format
Python File
Text File
Word file
Technology Platform Required
Mobile
Creative Commons License
CC BY-NC

Author's Institutional Information

Institution Type
Baccalaureate Colleges - Liberal Arts
Community Type
Town Fringe