Narratives are rich in temporal information. To capture it, natural language processing researchers developed TimeML, a markup language for annotating temporal information in text. Temporal graphs can be derived directly from TimeML annotations and reveal a partial ordering of events and times. For many purposes, however, a global order (a timeline) is more useful.
The first component of my work focused on extracting timelines from TimeML annotations. Prior approaches relied on machine learning-based systems with several limitations: imperfect scores, no handling of subordinated relations, and an inability to handle all types of temporal relations. I addressed these issues with a solution based on a constraint satisfaction problem formulation, which achieved state-of-the-art performance.
One way to produce TimeML annotations is to annotate texts manually. However, manual annotations contain human errors. In the second component of my work, I built a system that detects errors in gold-standard annotations and helps users fix them. I tested the system on the TimeBank corpus and provided corrections for the entire corpus.
Another way to produce TimeML annotations is to use automatic annotation systems. In the third component of my work, I developed a novel suite of methods for evaluating automatic annotators by measuring the information lost during the automatic annotation process. I presented eight metrics and used them to evaluate four state-of-the-art automatic annotation tools.
In the last component, I implemented a duration extraction system. This work produced a large dataset containing hundreds of thousands of possible event durations. By combining this system with the timeline extraction system, I was able to extract the duration of entire narratives.