University of Limerick

Using and characterizing change-sets to support feature location

posted on 2023-01-25, 11:23 authored by Muslim Chochlov
Feature location is the task of finding the source code that implements specific functionality in a software system. It is a complex activity and, when performed manually, may require significant developer effort. Consequently, semi-automated and automated feature location techniques have been proposed to assist developers. One common group of such approaches utilizes textual information in source code and applies information retrieval techniques to it. Since there is a paucity of meaningful terms in source code, a reasonable research direction is to mix various data sources to expand the set of meaningful terms associated with source code entities for information retrieval. One such data source is the set of change-set descriptions. Little work has been done on meaningful term expansion using change-set descriptions, and the extent to which such expansions are useful has not been thoroughly studied in the literature. This work proposes a technique, ACIR, which leverages change-set data sets as a source of meaningful terms that can act as source code descriptors. It is the first work to study change-sets in such a role in isolation and to characterize their effectiveness as a data set for information retrieval based feature location. Specifically, using a custom-built tool that implements ACIR, it characterizes the performance of ACIR in terms of granularity, recentness of change-sets, aggregation of recent change-sets by change request, and filtering of "management" change-sets using textual classification. The evaluation is larger than those of other works in this area, employing 8 different subject systems with a total of 600 re-enactment samples. It was found that, for ACIR, the effort required to locate entities is, in general, lower at method level than at file level of granularity. Additionally, using more recent change-sets improves the effectiveness of ACIR. However, aggregation of recent change-sets by change request decreases effectiveness.
Surprisingly, it was also found that text classification based filtering of "management" change-sets, based on generic management terms, decreases the effectiveness of ACIR. Further, the findings indicate that certain characteristics of subject systems seem to affect the performance of ACIR: a strongly pronounced dichotomy of subject systems emerged, where one set recorded better feature location using ACIR and another recorded better feature location using a more traditional baseline approach. Finally, it was found that merging ACIR and a baseline approach significantly improves performance, by 95% over the baseline approach and by 17% over ACIR alone. Apart from the more concrete findings on the effectiveness of the newly proposed technique itself, the most fundamental finding is the importance of rigorously characterizing proposed feature location techniques in order to identify their optimal configurations. The results also suggest that it is important to characterize the software systems under study when selecting the appropriate feature location technique. In the past, configuration of the techniques and characterization of subject systems have not been considered first-class entities in research papers, whereas the results presented here suggest that these factors can have a big impact.
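The core idea, using change-set descriptions as textual descriptors of the code entities they touched, can be illustrated with a minimal sketch. It is not the thesis's tool: the abstract does not name the retrieval model, so TF-IDF with cosine similarity is assumed here, and the function names, change-set messages, and entity names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-letters; a stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def build_entity_documents(change_sets):
    """Attach each change-set description to every entity that change-set touched."""
    docs = defaultdict(list)
    for message, entities in change_sets:
        for entity in entities:
            docs[entity].extend(tokenize(message))
    return docs


def rank_entities(query, docs):
    """Rank entities by TF-IDF cosine similarity to the query (assumed model)."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(tokens):
        # Terms unseen in any entity document carry no weight.
        tf = Counter(t for t in tokens if t in idf)
        return {t: f * idf[t] for t, f in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vectorize(tokenize(query))
    scored = [(cosine(q, vectorize(tokens)), entity) for entity, tokens in docs.items()]
    # Highest score first; ties broken by entity name for determinism.
    return [entity for _, entity in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical history: (change-set description, entities it modified).
CHANGE_SETS = [
    ("Fix crash when saving bookmark list", ["BookmarkStore.save"]),
    ("Add bookmark export to HTML", ["BookmarkStore.save", "HtmlExporter.write"]),
    ("Speed up syntax highlighting of large files", ["Highlighter.paint"]),
]

ranking = rank_entities("export bookmarks as html", build_entity_documents(CHANGE_SETS))
print(ranking)
```

In this sketch the granularity studied in the thesis corresponds to what counts as an entity (a method or a file), and the recentness setting would correspond to restricting `CHANGE_SETS` to the most recent entries per entity before building the documents.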



  • Faculty of Science and Engineering


  • Doctoral

First supervisor

Buckley, Jim

Second supervisor

English, Michael



Also affiliated with

  • LERO - The Irish Software Research Centre

Department or School

  • Computer Science & Information Systems
