University of Limerick

Towards realistic sampling: generating dependencies in a relational database

conference contribution
posted on 2013-11-28, 12:22 authored by Teodora Sandra Buda, John Murphy, Morten Kristiansen
Managing large amounts of information is one of the most expensive, time-consuming and non-trivial activities, and it usually requires expert knowledge. In a wide range of application areas, such as data mining, histogram construction, approximate query evaluation, and software validation, handling exponentially growing databases has become a difficult challenge, and a subset of the data is generally preferred. As a solution to the current challenges in managing large amounts of data, database sampling from the available operational data has proved to be a powerful technique. However, none of the existing sampling approaches considers the dependencies between the data in a relational database. In this paper, we propose a novel approach towards constructing a realistic testing environment, by analyzing the distribution of data in the original database along these dependencies before sampling, so that the sample database is representative of the original database.

History

Publication

ACM ICUIMC’13; Article no. 12

Publisher

Association for Computing Machinery

Note

peer-reviewed

Other Funding information

SFI

Rights

© ACM, 2013. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM ICUIMC’13, article no. 12, http://dx.doi.org/10.1145/2448556.2448568

Language

English
