In this activity, students use hierarchical clustering and k-means clustering to find clusters of similar genes, which can then be used to predict which genes affect certain cancers. Students use a priority queue to find close pairs of objects to cluster, and then use other data structures to carry out the rest of the algorithm. This assignment works well for students who would appreciate combining several data structures in a non-trivial algorithm with real-world applications.
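The priority-queue step described above can be sketched roughly as follows. This is an illustrative sketch, not the assignment's reference solution: the function names (`centroid`, `hierarchical_clusters`), the representation of each gene as a tuple of numbers, the Euclidean distance, and the use of cluster centroids to measure closeness are all assumptions made for the example.

```python
import heapq
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def hierarchical_clusters(profiles, k):
    """Repeatedly merge the closest pair of clusters until k remain.

    `profiles` is a list of numeric tuples (hypothetical gene profiles).
    Closeness is the Euclidean distance between cluster centroids,
    which is an assumption of this sketch.
    """
    # Each active cluster id maps to its members and to its centroid.
    clusters = {i: [p] for i, p in enumerate(profiles)}
    centers = {i: p for i, p in enumerate(profiles)}

    # Priority queue of (distance, id_a, id_b) for every initial pair.
    heap = [(math.dist(centers[a], centers[b]), a, b)
            for a, b in combinations(clusters, 2)]
    heapq.heapify(heap)

    next_id = len(profiles)
    while len(clusters) > k and heap:
        _, a, b = heapq.heappop(heap)
        # Lazy deletion: ignore pairs that mention an already-merged cluster.
        if a not in clusters or b not in clusters:
            continue
        merged = clusters.pop(a) + clusters.pop(b)
        del centers[a], centers[b]
        new_center = centroid(merged)
        # Queue the distance from the new cluster to every remaining one.
        for other, center in centers.items():
            heapq.heappush(heap, (math.dist(new_center, center), other, next_id))
        clusters[next_id] = merged
        centers[next_id] = new_center
        next_id += 1
    return list(clusters.values())
```

Calling `hierarchical_clusters(profiles, 3)` on a small list of tuples returns three lists of tuples, one per cluster. Stale heap entries are skipped when popped rather than removed eagerly, which keeps the sketch short at the cost of a larger queue.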
Incorporate Student Choice by letting students find their own data set on which to run the clustering analysis.
By explaining the ideas of clustering in the context of identifying genes that affect cancers, the activity Employs Meaningful and Relevant Content. By using real genetic data and explaining how these methods have contributed to the field of medicine, this assignment Makes Interdisciplinary Connections to CS.