Bug report summarization is an effective way to reduce the considerable time spent wading through numerous bug reports. Although several supervised and unsupervised algorithms have been proposed for this task, their performance is still limited by the particular characteristics of bug reports, including the evaluation behaviours in bug reports, the mixture of sentences written in software language and natural language, and the domain-specific predefined fields. In this study, we conduct the first exploration of deep learning networks for bug report summarization. Our approach, called DeepSum, is a novel stepped auto-encoder network with evaluation enhancement and predefined-fields enhancement modules, which integrates these bug report characteristics into a deep neural network. DeepSum is unsupervised, so it significantly reduces the effort of labeling huge training sets. Extensive experiments show that DeepSum outperforms the comparison algorithms by up to 13.2% and 9.2% in terms of F-score and Rouge-n respectively on the public datasets, achieving state-of-the-art performance.
We provide packages to reproduce the reported results of DeepSum in "Unsupervised Deep Bug Report Summarization" (ICPC 2018). Since the original study evaluated Centroid, MMR, Grasshopper, and DivRank under a different criterion, we also present reproducible packages for these algorithms here. To reproduce these algorithms, the following files are needed:
Here is a readme file to illustrate how to run these packages.
1. Stop words
The stop word list can be downloaded from http://www.ranks.nl/stopwords; we use the "Default English stopwords list". We also add four stop words that are programming-language or project specific: "java", "bug", "file", and "don" (short for "don't").
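A minimal Python sketch of how this stop word set might be assembled (the file name english_stopwords.txt and the tokenization are assumptions; only the download source and the four extra words come from the description above):

```python
def load_stop_words(path="english_stopwords.txt"):
    """Load the 'Default English stopwords list' (one word per line) and
    add the four programming-language / project-specific words."""
    with open(path, encoding="utf-8") as f:
        stop_words = {line.strip().lower() for line in f if line.strip()}
    # Extra words from the description above; "don" covers "don't" after tokenization.
    stop_words.update({"java", "bug", "file", "don"})
    return stop_words

if __name__ == "__main__":
    sw = load_stop_words()
    tokens = ["the", "java", "parser", "don", "crash"]
    print([t for t in tokens if t not in sw])  # -> ['parser', 'crash']
```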
2. Software sentence detection
We present more details on software sentence detection in Section 3.2.2(a) of the paper, including the regular expressions and rules.
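For illustration only, the sketch below shows what regex-based detection of software-language (code-like) sentences can look like; the patterns here are assumptions for demonstration and are not the paper's actual regular expressions or rules, which are listed in Section 3.2.2(a):

```python
import re

# Hypothetical patterns for code-like sentences; see Section 3.2.2(a) of the
# paper for the actual regular expressions and rules used by DeepSum.
CODE_PATTERNS = [
    re.compile(r"\b[a-z]+[A-Z]\w*\b"),      # camelCase identifiers, e.g. getValue
    re.compile(r"\b\w+\([^)]*\)"),          # method calls, e.g. foo(bar)
    re.compile(r"\b(?:\w+\.){2,}\w+\b"),    # dotted names, e.g. org.eclipse.core
    re.compile(r"\bat\s+\w+(?:\.\w+)+\("),  # stack-trace frames
]

def is_software_sentence(sentence, min_hits=1):
    """Flag a sentence as software language if enough patterns match."""
    hits = sum(bool(p.search(sentence)) for p in CODE_PATTERNS)
    return hits >= min_hits

print(is_software_sentence("at org.eclipse.swt.SWT.error(SWT.java:3777)"))  # True
print(is_software_sentence("Thanks for the quick fix, it works now."))      # False
```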
1. Whether the network architecture and setup (as described in Section 3.2.2) can be further improved by tuning various parameters (number of layers, number of units per layer, etc.).
Yes. The network architecture influences the performance of a deep neural network, and in my opinion the influence is relatively large. However, since, to the best of our knowledge, very few studies have proposed an effective and efficient approach to tuning these parameters, we follow the architecture setting of a traditional auto-encoder network from a previous study.
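For concreteness, here is a minimal generic sketch (in PyTorch, which is an assumption; this is not DeepSum's actual architecture or setup) of how the number of layers and the number of units per layer of a stacked auto-encoder could be exposed as tunable hyperparameters:

```python
import torch
import torch.nn as nn

class StackedAutoEncoder(nn.Module):
    """Generic stacked auto-encoder; hidden_sizes controls depth and width."""
    def __init__(self, input_dim, hidden_sizes=(512, 128)):
        super().__init__()
        dims = [input_dim, *hidden_sizes]
        enc, dec = [], []
        for i in range(len(dims) - 1):          # encoder: input -> bottleneck
            enc += [nn.Linear(dims[i], dims[i + 1]), nn.Sigmoid()]
        for i in range(len(dims) - 1, 0, -1):   # decoder mirrors the encoder
            dec += [nn.Linear(dims[i], dims[i - 1]), nn.Sigmoid()]
        self.encoder, self.decoder = nn.Sequential(*enc), nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Tuning the architecture amounts to varying hidden_sizes, e.g.:
model = StackedAutoEncoder(input_dim=2000, hidden_sizes=(512, 256, 64))
x = torch.rand(8, 2000)                 # e.g. 8 sentence term vectors
loss = nn.MSELoss()(model(x), x)        # reconstruction error to minimize
```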
2. Several problems for bug report summarization.
I think two problems should be solved before further improving summarization results. First, can we construct a large dataset (oracle) for bug report summarization? There are only two publicly available datasets, with 36 + 96 bug reports, for evaluating an algorithm, which is too small; it is therefore important to (automatically) construct a large dataset. Second, is bug report summarization really useful in real development scenarios? There is no strong evidence to support this. If these two problems cannot be solved, the benefit of improving the results will be weakened.