University of Limerick

Predicting problem difficulty for genetic programming applied to data classification

conference contribution
posted on 2013-02-11, 16:31 authored by Leonardo Trujillo, Yuliana Martínez, Edgar Galván-López, Pierrick Legrand
During the development of applied systems, an important problem that must be addressed is that of choosing the correct tools for a given domain or scenario. This general task has been addressed by the genetic programming (GP) community by attempting to determine the intrinsic difficulty that a problem poses for a GP search. This paper presents an approach to predict the performance of GP applied to data classification, one of the most common problems in computer science. The novelty of the proposal is to extract statistical descriptors and complexity descriptors of the problem data, and from these estimate the expected performance of a GP classifier. We derive two types of predictive models: linear regression models and symbolic regression models evolved with GP. The experimental results show that both approaches provide good estimates of classifier performance, using synthetic and real-world problems for validation. In conclusion, this paper shows that it is possible to accurately predict the expected performance of a GP classifier using a set of descriptors that characterize the problem data.

Funding

U.S.-Hungary Cooperative Mathematical Research on Vilenkin-Fourier Series

Office of the Director

Publication

GECCO '11 Proceedings of the 13th annual conference on Genetic and evolutionary computation; pp. 1355-1362

Publisher

Association for Computing Machinery

Note

peer-reviewed

Other Funding information

CONACYT

Rights

"© ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in GECCO '11 Proceedings of the 13th annual conference on Genetic and evolutionary computation, http://dx.doi.org/10.1145/2001576.2001759"

Language

English
