Computing the Bacon Number Using IMDB Actor-Movie Data and BRIDGES

Add Bookmark

2 Bookmarks

Course Level

Data Structures

Knowledge Unit

Fundamental Data Structures

Collection Item Type

Assignment

Project

Synopsis

This is a project involving graphs that is motivated by the Bacon Number problem (or more generally six degrees of separation), i.e. to compute the smallest number of links from one actor to another. It is well known that Kevin Bacon, a prolific actor, has worked with a large number of actors over his career and it can be shown that within about 6 links you can reach almost any actor in the actor-movie graph of this dataset. Given that the graph is unweighted, we can use a BFS traversal of the graph to find the links that lead to Kevin Bacon (or any particular actor) from any actor in the graph. Using the BRIDGES API, you can visualize the graph and highlight the links (actors and movies) that are in the path.

In the project students will:

Read in actor-movie data using the BRIDGES API into a list of objects (Java or C++).
They will build an actor-movie graph using the input dataset. The graph needs to be built ensuring that the actor and movie nodes are unique (since actors are part of multiple movies and movies have multiple actors, while the data comes as actor-movie pairs).
They will implement the BFS traversal algorithm on graphs. They will need to use a queue as part of this algorithm.
The application will receive user input for two actor names and the result will be a set of highlighted nodes and links that form the path from the source to the destination actor. See the example visualization on the BRIDGES site (http://bridgesuncc.github.io/)

The NSF-funded BRIDGES project (Bridging Real-world Infrastructure Designed to Goal-align, Engage, and Stimulate) provides easy-to-use interfaces to make it possible to use to real-world data sets in freshmen and sophomore level CS courses. The BRIDGES API makes it easier to visualize the data structures that students create as part of their assignments.

Recommendations

This is typically done in a data structures course, where students are learning about graphs, their representations and traversal algorithms (DFS, BFS). While typically used in a sophomore-level data structures course, it may be useful in advanced CS2 courses. BRIDGES provides some helper functions and example tutorials on how to build and visualize the resulting graph.

Students should have an understanding of arrays, linked lists, queues and hash tables as these are used in different parts of the project (or used by BRIDGES in its graph implementations).
Students also need to learn about hash tables (assignment requires use of them - either Java's HashMap or C++'s unordered_map) in this assignment. The actor-movie objects come in pairs and the graph needs to be built ensuring the actor and movie nodes are unique. A hash table can be used to ensure if the actor or movie has already been encountered prior to adding it as a graph vertex.
The project can be done with the BRIDGES graph adjacency list or the adjacency matrix implementation (relevant classes are GraphAdjLIst, GraphAdjMatrix).
See the simple graph example using the IMDB data at http://bridgesuncc.github.io/Hello_World_Tutorials/Graph.html that shows a small IMDB graph built and visualized.
See the versions of the assignment descriptions (http://bridgesuncc.github.io/assignments.html ) from Spring 2016 (assignments 1-1 and 1-2); in the current version of BRIDGES, the dataset is retrieved directly through the BRIDGES API (see http://bridgesuncc.github.io/Hello_World_Tutorials/Graph.html )

Engagement Highlights

Movies and entertainment are typically an interesting area for students. This BRIDGES assignment brings together an interesting data set, a classic graph algorithm (Breadth First Search/Traversal), and a visualization of the results in a single assignment. This is one of those assignments where visualization and algorithms nicely complement each other. The visualization also helps students diagnose errors in their implementation using the visualizations without taking away significant class time.

The project uses a curated subset of the IMDB dataset (about 1813 actor-movie pairs). In particular, BRIDGES takes away the complexity of retrieving/parsing data from its original form--which can be uninteresting and frustrating to students--and lets students focus on the algorithmic details of the project.

Engagement Practices Employed

Use Meaningful and Relevant Content

Materials and Links

Materials

Student-facing Handout (PDF for quick view)

Student-facing Handout (Word doc for download)

Computer Science Details

Programming Language

C++

Java

Material Format and Licensing Information

Creative Commons License

CC BY-NC