posted on 2013-02-11, 16:31 authored by Leonardo Trujillo, Yuliana Martínez, Edgar Galván-López, Pierrick Legrand
During the development of applied systems, an important
problem that must be addressed is that of choosing the right
tools for a given domain or scenario. The genetic programming (GP)
community has approached this general task by attempting to
determine the intrinsic difficulty that a problem poses for a GP
search. This paper presents
an approach to predict the performance of GP applied to data
classification, one of the most common problems in computer
science. The novelty of the proposal is to extract statistical
and complexity descriptors from the problem data and to use
them to estimate the expected performance of a GP
classifier. We derive two types of predictive models: linear
regression models and symbolic regression models evolved
with GP. Experimental results on synthetic and real-world
problems show that both approaches provide good estimates of
classifier performance. In conclusion,
this paper shows that it is possible to accurately predict the
expected performance of a GP classifier using a set of descriptors
that characterize the problem data.
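The general idea of computing descriptors from a dataset and regressing classifier error on them can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses Fisher's discriminant ratio (a standard data-complexity measure, chosen here as an assumed example descriptor) and fits an ordinary least-squares linear model that maps a vector of descriptors to classification error. The function names `fisher_ratio`, `fit_linear` and `predict`, and the descriptor values in the usage example, are hypothetical. The GP symbolic-regression variant is not shown.

```python
import statistics
from typing import List, Sequence


def fisher_ratio(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Fisher's discriminant ratio for one feature of a two-class problem:
    (mu_a - mu_b)^2 / (var_a + var_b). Larger values suggest the classes are
    easier to separate along this feature. Assumes the variances are not both zero."""
    mu_a, mu_b = statistics.fmean(class_a), statistics.fmean(class_b)
    var_a, var_b = statistics.pvariance(class_a), statistics.pvariance(class_b)
    return (mu_a - mu_b) ** 2 / (var_a + var_b)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(descriptors: Sequence[Sequence[float]], errors: Sequence[float]) -> List[float]:
    """Least-squares fit of error = b0 + b1*d1 + ... + bk*dk via the normal equations.
    Returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(d) for d in descriptors]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * e for r, e in zip(rows, errors)) for i in range(k)]
    return _solve(xtx, xty)


def predict(coef: Sequence[float], descriptor: Sequence[float]) -> float:
    """Predicted classification error for one problem's descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], descriptor))


# Hypothetical training table: (Fisher ratio, number of features) per problem,
# with errors generated from a known linear rule so the fit can be checked.
descriptors = [(0.5, 2), (1.0, 4), (2.0, 3), (3.0, 8), (1.5, 5)]
errors = [0.5 - 0.1 * f + 0.01 * n for f, n in descriptors]
coef = fit_linear(descriptors, errors)
print([round(c, 6) for c in coef])
```

Because the example errors are generated from an exact linear rule, the fitted coefficients should recover that rule (intercept 0.5, slopes -0.1 and 0.01). With real data, the descriptor set would be richer and the fit would only approximate the observed errors.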
Funding
U.S.-Hungary Cooperative Mathematical Research on Vilenkin-Fourier Series