Summ-it: Extractive summarization, reference cohesion and coreference resolution
Dr. Renata Vieira's web page: http://www.inf.unisinos.br/~renata/
One great challenge in automatic summarization, beyond reducing a source text while preserving its central informative content, is generating a shorter text that is coherent. Extractive approaches to summarization identify a subset of sentences in the source text that are likely to best express its core informative content. The resulting extracts may disrupt the reference cohesion of the source text, for example when a pronoun is extracted without the sentence that introduces its antecedent, and this in turn harms the informativeness of the extract.
One of our current research questions is whether coreference resolution, although known to be a very hard problem (current systems report around 65% F-measure), can be exploited to detect and repair reference cohesion problems in extracts. As a basis for this investigation, we are building the Summ-it corpus, which consists of Portuguese newspaper articles annotated with coreference and rhetorical structure. In this talk I will present the general framework of this project and some preliminary results.
Renata Vieira is a Professor in the Computer Science Department at Universidade do Vale do Rio dos Sinos, Sao Leopoldo, Brazil. She received her PhD from the University of Edinburgh in 1998. Her research interests include natural language understanding, discourse processing, agent communication, knowledge representation, ontologies and the semantic web. She is currently participating in a Brazilian national project for the development of Portuguese corpora, of which Summ-it is a sub-corpus.