Unsupervised Deep Bug Report Summarization

Abstract

Bug report summarization is an effective way to reduce the considerable time developers spend wading through numerous bug reports. Although some supervised and unsupervised algorithms have been proposed for this task, their performance is still limited due to the particular characteristics of bug reports, including the evaluation behaviours in bug reports, the diverse sentences mixing software language and natural language, and the domain-specific predefined fields. In this study, we conduct the first exploration of deep learning networks for bug report summarization. Our approach, called DeepSum, is a novel stepped auto-encoder network with evaluation enhancement and predefined fields enhancement modules, which successfully integrates the bug report characteristics into a deep neural network. DeepSum is unsupervised, which significantly reduces the effort of labeling huge training sets. Extensive experiments show that DeepSum outperforms the comparative algorithms by up to 13.2% and 9.2% in terms of F-score and Rouge-n metrics respectively on the public datasets, achieving state-of-the-art performance.

Packages

We provide the packages to reproduce the reported results of DeepSum in ``Unsupervised Deep Bug Report Summarization'' (ICPC 2018). Since the original study evaluated Centroid, MMR, Grasshopper and DivRank under different criteria, we also present reproducible packages for these algorithms here. To reproduce these algorithms, the following files are needed:

  1. The SDS and ADS datasets. We compress the datasets in data.zip. In this file, the folder "dataset" contains the basic information to run each algorithm and the folder "deepdata" contains the additional information to run DeepSum.
  2. The Java JDK. We implement these algorithms with jdk-6u45-windows-i586.
  3. The reproducible packages of DeepSum and other algorithms.
  4. The Java libraries (libs.zip) to support the reproducible packages.
  5. To run Grasshopper, we also need the MCRInstaller.exe file, extracted from Matlab 2010a.

A readme file is provided to illustrate how to run these packages.

Supplementary Details

1. Stop words
The stop words list can be downloaded from http://www.ranks.nl/stopwords. We use the "Default English stopwords list". We also add four stop words that are programming-language specific or project specific: "java", "bug", "file", and "don" (short for "don't").
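As a minimal sketch of how this stop-word filtering might be applied during preprocessing (the default-list entries shown are an illustrative subset; the full list comes from the URL above, and the tokenization rule is an assumption, not the paper's exact implementation):

```java
import java.util.*;

public class StopWordFilter {
    // Illustrative subset of the "Default English stopwords list",
    // plus the four project-specific additions named in the text.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "the", "is", "in", "to", "and",   // sample of the default list
        "java", "bug", "file", "don"                 // project-specific additions
    ));

    // Lowercase the sentence, split on non-alphanumerics (so "don't"
    // yields the token "don"), and drop any token in the stop list.
    static List<String> filter(String sentence) {
        List<String> kept = new ArrayList<>();
        for (String token : sentence.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Keeps only content-bearing tokens: [open, reproduce, quickly]
        System.out.println(filter("Open the Java file to reproduce the bug quickly"));
    }
}
```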

2. Software sentence detection
We present more details on software sentence detection in Section 3.2.2(a) of the paper, including the regular expressions and rules.
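The paper's actual regular expressions and rules are those given in Section 3.2.2(a); as a rough illustration of the idea only, a hypothetical detector that flags a sentence as "software language" when it contains code-like tokens (camelCase identifiers, method calls, file paths, stack-trace frames) might look like this. All four patterns here are assumptions for illustration, not the paper's rules:

```java
import java.util.regex.Pattern;

public class SoftwareSentenceDetector {
    // Hypothetical patterns for code-like content; the authoritative
    // regular expressions are in Section 3.2.2(a) of the paper.
    static final Pattern[] CODE_PATTERNS = {
        Pattern.compile("[a-z]+[A-Z][a-zA-Z]*"),      // camelCase identifiers
        Pattern.compile("\\w+\\.\\w+\\([^)]*\\)"),    // method calls like obj.call()
        Pattern.compile("(/[\\w.-]+){2,}"),           // file-system paths
        Pattern.compile("\\bat\\s+[\\w.$]+\\(")       // stack-trace frames
    };

    // A sentence is treated as a software sentence if any pattern matches.
    static boolean isSoftwareSentence(String sentence) {
        for (Pattern p : CODE_PATTERNS) {
            if (p.matcher(sentence).find()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSoftwareSentence("at org.eclipse.swt.SWT.error(SWT.java:3777)")); // true
        System.out.println(isSoftwareSentence("This crash happens every time."));              // false
    }
}
```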

FAQs

1. Can the network architecture and setup (as described in Section 3.2.2) be further improved by tuning various parameters (number of layers, number of units per layer, etc.)?
Yes. The network architecture influences the performance of a deep neural network, and in our opinion the influence is relatively large. However, since, to the best of our knowledge, very few studies have proposed an effective and efficient approach to tuning these parameters, we follow the architecture settings of a traditional auto-encoder network from a previous study.

2. What open problems remain for bug report summarization?
We think two problems should be solved before further improving summarization results.
First, can we construct a large dataset (oracle) for bug report summarization? There are only two publicly available datasets, with 36 and 96 bug reports respectively, for evaluating an algorithm. This is too small, so it is important to (automatically) construct a large dataset.
Second, is bug report summarization really useful in real development scenarios? There is no strong evidence to support this yet.
If these two problems cannot be solved, the benefit of improving the results will be limited.
