Document Type



Doctor of Philosophy (PhD)


Computer Science

First Advisor's Name

Mark A. Finlayson

First Advisor's Committee Title

Committee chair

Second Advisor's Name

Christine Lisetti

Second Advisor's Committee Title

Committee member

Third Advisor's Name

Anthony Dick

Third Advisor's Committee Title

Committee member

Fourth Advisor's Name

Santiago Ontañón

Fourth Advisor's Committee Title

Committee member

Fifth Advisor's Name

Monique Ross

Fifth Advisor's Committee Title

Committee member


animacy, character, archetypes

Date of Defense



If we are to understand stories, we must understand characters: characters are central to every narrative and drive the action forward. Critically, many stories (especially cultural ones) employ stereotypical character roles for several purposes, among them communicating bundles of default characteristics and associations efficiently and easing understanding of those characters' roles in the overall narrative. These roles include ideas such as hero, villain, or victim, as well as culturally specific roles such as the donor (in Russian tales) or the trickster (in Native American tales). My thesis aims to learn these roles automatically, inducing them from data using a clustering technique.

The first step of learning character roles, however, is to identify which coreference chains correspond to characters, which narratologists define as animate entities that drive the plot forward. The first part of my work focused on this character identification problem, specifically on animacy detection. Prior work treated animacy as a word-level property, and researchers developed statistical models to classify words as either animate or inanimate. I claimed that this formulation of the problem is ill-posed and presented a new hybrid approach for classifying the animacy of coreference chains that achieved state-of-the-art performance.
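To make the difference between word-level and chain-level animacy concrete, here is a minimal sketch that lifts word-level decisions to a whole coreference chain by majority vote. It is an illustration only and is not the hybrid approach described above: the lexicon `ANIMATE_WORDS`, the `CoreferenceChain` class, and the majority-vote rule are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy lexicon standing in for a trained word-level animacy model.
ANIMATE_WORDS = {"he", "she", "king", "princess", "wolf", "ivan"}


@dataclass
class CoreferenceChain:
    """A coreference chain, reduced here to the head words of its mentions."""
    mentions: List[str]


def word_is_animate(word: str) -> bool:
    """Word-level decision: look the word up in the toy lexicon."""
    return word.lower() in ANIMATE_WORDS


def chain_is_animate(
    chain: CoreferenceChain,
    word_classifier: Callable[[str], bool] = word_is_animate,
) -> bool:
    """Chain-level decision: animate if a strict majority of mentions are animate."""
    if not chain.mentions:
        return False
    votes = sum(1 for mention in chain.mentions if word_classifier(mention))
    return 2 * votes > len(chain.mentions)


# "ruler" is missing from the toy lexicon, but the chain as a whole is still animate.
king = CoreferenceChain(["King", "he", "he", "ruler"])
sword = CoreferenceChain(["sword", "it", "blade"])
```

In this sketch a single word missing from the lexicon does not flip the decision for the chain, which is one motivation for classifying chains rather than isolated words.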

The next step of my work was to identify the characters and then to develop a new unsupervised clustering approach to learn stereotypical roles. My character identification system consists of two stages: first, I detect animate chains from the coreference chains using my existing animacy detector; second, I apply a supervised machine learning model that identifies which of those chains qualify as characters. I proposed a narratologically grounded definition of character and built a supervised machine learning model with a small set of features that achieved state-of-the-art performance.
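The two-stage structure of the character identification system can be sketched as a pair of filters applied in sequence. This is a schematic under stated assumptions, not the system itself: the `Chain` fields, `toy_is_animate`, and `toy_is_character` (a mention-count threshold standing in for the supervised model) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chain:
    """Hypothetical stand-in for a coreference chain with precomputed attributes."""
    name: str
    animate: bool
    mention_count: int


def identify_characters(
    chains: List[Chain],
    is_animate: Callable[[Chain], bool],
    is_character: Callable[[Chain], bool],
) -> List[Chain]:
    """Stage 1 keeps animate chains; stage 2 keeps those judged to be characters."""
    animate_chains = [chain for chain in chains if is_animate(chain)]
    return [chain for chain in animate_chains if is_character(chain)]


def toy_is_animate(chain: Chain) -> bool:
    # Placeholder for the animacy detector (stage 1).
    return chain.animate


def toy_is_character(chain: Chain) -> bool:
    # Placeholder for the supervised classifier (stage 2); the threshold is an assumption.
    return chain.mention_count >= 3


chains = [
    Chain("Ivan", animate=True, mention_count=12),
    Chain("a passing bird", animate=True, mention_count=1),
    Chain("the sword", animate=False, mention_count=7),
]
characters = identify_characters(chains, toy_is_animate, toy_is_character)
```

Here the sword is removed in the first stage despite its many mentions, and the bird is removed in the second stage despite being animate, so only the chain for Ivan remains.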

In the last step, I implemented a clustering approach that uses plot and thematic information to cluster characters into archetypes. This work resulted in a new approach to understanding the structure of stories, advancing the state of the art in story understanding.




Previously Published In

Labiba Jahan, Geeticka Chauhan, Mark A. Finlayson (2017). Building on Word Animacy to Determine Coreference Chain Animacy in Cultural Narratives. In Proceedings of the 10th Workshop on Interactive Narrative Technologies (INT) co-located with AIIDE-17, Salt Lake City, Utah, (pp. 198-203) and in Widening NLP (WiNLP 2018) workshop co-located with NAACL, New Orleans, Louisiana. Non-archival.

Labiba Jahan, Geeticka Chauhan, Mark A. Finlayson (2018). A New Approach to Animacy Detection. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), Santa Fe, New Mexico, (pp. 1-12).

Labiba Jahan, W. Victor H. Yarlott, Rahul Mittal, Mark A. Finlayson (2020). Confirming the Generalizability of a Chain-Based Animacy Detector. In Proceedings of the 1st Workshop on Artificial Intelligence for Narratives (AI4N), (pp. 43-46).

Labiba Jahan, Mark A. Finlayson (2019). Character Identification Refined: A Proposal. In Proceedings of the First Workshop on Narrative Understanding (WNU) co-located with NAACL, Minneapolis, Minnesota, (pp. 12-18).

Labiba Jahan, Rahul Mittal, W. Victor H. Yarlott, Mark A. Finlayson (2020). A Straightforward Approach to Narratologically Grounded Character Identification. In Proceedings of the 28th International Conference on Computational Linguistics (COLING), Barcelona, Spain (Online), (pp. 6089-6100).



Rights Statement


In Copyright. URI:
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).