University of Limerick

Towards more reliable cross-comparison of feature location techniques

thesis
posted on 2022-12-22, 14:16 authored by ABDUL RAZZAQ
Feature location (FL) is the task of finding the source code that implements a specific, user-observable functionality in a software system. Given its key role in many software maintenance tasks, it is an area of much research, and researchers have proposed a wide variety of Feature Location Techniques (FLTs) that rely on source code structure, dynamic analysis, or textual analysis. As FLTs evolve and more novel FLTs are introduced, it is important to perform comparison studies to investigate which FLTs are relatively better. However, this thesis shows, through a systematic survey of the FL literature, that performing such comparisons would be an arduous process because of the large number of techniques to be compared, the heterogeneous nature of the empirical designs employed to evaluate those FLTs, the lack of openly available, executable FLTs for re-evaluation, and existing, contradictory performance results. This thesis builds on this Systematic Literature Review (SLR) to present an empirical design, cognisant of FL goals, that is based on best empirical practice and common empirical design elements. Then, to facilitate the cross-comparison of FLTs going forward, this thesis employs the resultant empirical design to cross-compare replicable FLTs in order to relate their performance. The results suggest that the Vector Space Model (VSM) with a Lucene implementation is frequently the best-performing openly available Information Retrieval (IR)-based FLT, but that the performance of specific FLTs is (partially) driven by feature-set differences. Towards understanding the impact of feature-set differences, this thesis defines a feature-metric suite that is assessed in terms of its effect on FLTs' performance, both holistically across FLTs and on the individual FLTs.
As contributions, this thesis presents empirical guidelines and an empirical framework that allow better goal-cognisant, performance-based ranking of FLTs and also help to explain the performance of FLTs in relation to the employed feature-set. It is intended that these advances will ultimately allow a standard selection of systems and benchmarks during FLT evaluation, which will not only increase reliability across FLT evaluations but will also greatly improve the generalisable knowledge needed to recommend FLTs to practitioners for a specific software system. This work is seen as a step towards standardizing evaluation in the field, thus facilitating comparison across FLTs.

Faculty

  • Faculty of Science and Engineering

Degree

  • Doctoral

First supervisor

Buckley, Jim

Second supervisor

Exton, Chris

Note

peer-reviewed

Other Funding information

SFI

Language

English

Also affiliated with

  • LERO - The Irish Software Research Centre

Department or School

  • Computer Science & Information Systems
