PRST: A PageRank based Summarization Technique for Summarizing Bug Reports with Duplicates

        This page describes the material used in our paper, "PRST: A PageRank based Summarization Technique for Summarizing Bug Reports with Duplicates".

        We construct two datasets: the modified BRC corpus and the OSCAR corpus. The bug reports were collected from four open source projects: Eclipse, Mozilla, KDE and Gnome. The modified BRC corpus contains 28 bug reports, comprising 9 master bug reports and 19 duplicate bug reports. Similarly, the OSCAR corpus contains 59 bug reports, comprising 19 master bug reports and 40 duplicate bug reports.

        Since bug reports are descriptive in nature and represent a discussion among members, we employ human annotators to produce manual summaries for all bug reports. Each bug report is annotated by 3 different annotators, all of them master's degree students. Each report in a corpus consists of turns, each of which is further divided into sentences; these sentences are the units from which the extractive summary is formed.
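
        The report structure described above (report, turns, sentences, per-annotator selections) can be pictured with a minimal sketch. This is not the corpus file format, which is not specified on this page. The class names, field names, ID scheme, annotator labels and sample sentences below are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Sentence:
    sentence_id: str   # hypothetical ID scheme: "1.2" = turn 1, sentence 2
    text: str
    # Labels of the annotators who picked this sentence for their summary.
    selected_by: Set[str] = field(default_factory=set)


@dataclass
class Turn:
    author: str
    sentences: List[Sentence]


@dataclass
class BugReport:
    bug_id: str
    project: str                      # e.g. Eclipse, Mozilla, KDE or Gnome
    turns: List[Turn]
    master_id: Optional[str] = None   # None for a master report, else the master's ID

    def extractive_summary(self, annotator: str) -> List[str]:
        """Return the sentences one annotator selected, in report order."""
        return [
            s.text
            for t in self.turns
            for s in t.sentences
            if annotator in s.selected_by
        ]


# Invented toy data, not taken from either corpus.
report = BugReport(
    bug_id="example-1",
    project="Eclipse",
    turns=[
        Turn("reporter", [
            Sentence("1.1", "The editor crashes on save.", {"A1", "A2", "A3"}),
            Sentence("1.2", "I am using the latest build.", {"A2"}),
        ]),
        Turn("developer", [
            Sentence("2.1", "Fixed in the nightly build.", {"A1", "A3"}),
        ]),
    ],
)

print(report.extractive_summary("A1"))
```

        Storing the selections on each sentence keeps the three annotators' summaries side by side, so any one of them can be read back in the original report order.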

        The bug corpus can be downloaded here: the bug corpus and corresponding annotated corpus.
