This page provides details about the material used in our paper, "PRST: A PageRank based Summarization Technique for Summarizing Bug Reports with Duplicates".
We construct two datasets, namely the modified BRC corpus and the OSCAR corpus. The
bug reports have been collected from four open source
projects: Eclipse, Mozilla, KDE and Gnome. The modified version of the BRC corpus contains 28 bug reports,
comprising 9 master bug reports and 19 duplicate bug reports. Similarly, OSCAR contains 59
bug reports, comprising 19 master bug reports and 40 duplicate bug
reports.
Since
bug reports are descriptive in nature and represent a discussion among
members, we employ human annotators to generate manual summaries for all
bug reports. Each bug report is annotated by 3 different annotators, who are
master's degree students at a university. Each report in a corpus consists of turns,
which are further divided into sentences; these sentences are the units from
which the extractive summary is formed.
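The report, turn and sentence layout described above can be sketched in Python as follows. This is only an illustrative sketch and not the format of the released files: the class names, the field names, the sentence id scheme (such as "1.2") and the `agreed_summary` helper are all hypothetical. In particular, keeping the sentences chosen by at least 2 of the 3 annotators is one common convention for combining extractive annotations, and this page does not state that the corpora use it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Sentence:
    sentence_id: str  # hypothetical id scheme: "1.2" = turn 1, sentence 2
    text: str


@dataclass
class Turn:
    author: str
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class BugReport:
    bug_id: str
    turns: List[Turn] = field(default_factory=list)
    # Annotator name -> ids of the sentences that annotator selected.
    annotations: Dict[str, Set[str]] = field(default_factory=dict)

    def all_sentences(self) -> List[Sentence]:
        """Return every sentence of the report in reading order."""
        return [s for turn in self.turns for s in turn.sentences]

    def agreed_summary(self, min_votes: int = 2) -> List[Sentence]:
        """Sentences selected by at least `min_votes` annotators (assumed rule)."""
        votes: Dict[str, int] = {}
        for selected in self.annotations.values():
            for sid in selected:
                votes[sid] = votes.get(sid, 0) + 1
        return [s for s in self.all_sentences()
                if votes.get(s.sentence_id, 0) >= min_votes]


# Made-up example data, not taken from either corpus.
report = BugReport(
    bug_id="example-1",
    turns=[
        Turn("reporter", [Sentence("1.1", "The editor crashes on save."),
                          Sentence("1.2", "I am using the nightly build.")]),
        Turn("developer", [Sentence("2.1", "This duplicates an older report.")]),
    ],
    annotations={
        "annotator_a": {"1.1", "2.1"},
        "annotator_b": {"1.1"},
        "annotator_c": {"1.2", "2.1"},
    },
)

summary_ids = [s.sentence_id for s in report.agreed_summary()]
print(summary_ids)
```

Storing each annotator's selection as a set of sentence ids keeps the three annotations separate, so a different agreement threshold can be applied later without touching the report text.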
Available for download: the bug corpus and the corresponding annotated corpus.