Posted on 2013-11-15, 10:01. Authored by Teodora Sandra Buda, Thomas Cerqueus, Morten Kristiansen, John Murphy.
In a wide range of application areas (e.g. data
mining, approximate query evaluation, histogram construction),
database sampling has proved to be a powerful technique. It
is generally used when the computational cost of processing
large amounts of information is prohibitively high and a faster
but less accurate response is preferred.
Previous sampling techniques achieve this balance; however, the
cost of the sampling process itself should also be evaluated. We
argue that current relational database sampling techniques that
maintain the data integrity of the sample database perform poorly,
and that a faster strategy needs to be devised. In this paper we
propose a very fast sampling method that keeps the referential
integrity of the sample database intact. The method targets the
production environment of a system under development, which
generally consists of large amounts of data that are computationally
costly to analyze. We evaluate our method against previous database
sampling approaches and show that it produces a sample database at
least 300 times faster, with a maximum trade-off of 0.5% in sample
size error.
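For intuition only, the sketch below illustrates what it means for a sample database to preserve referential integrity; it is not the authors' algorithm, and the two-table SQLite schema (customers referenced by orders via customer_id) is hypothetical. The idea shown is the generic one: sample the parent table, then keep only the child rows whose foreign keys point at sampled parents, so every reference in the sample resolves.

    import random
    import sqlite3

    # Hypothetical schema: orders.customer_id references customers.id.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"customer-{i}") for i in range(1, 101)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, random.randint(1, 100), 10.0 * i)
                      for i in range(1, 501)])

    def sample_with_referential_integrity(conn, fraction, seed=42):
        """Sample a fraction of the parent table, then keep only the
        child rows whose foreign keys reference sampled parents, so the
        sample database violates no referential-integrity constraint."""
        rng = random.Random(seed)
        ids = [row[0] for row in conn.execute("SELECT id FROM customers")]
        kept = rng.sample(ids, max(1, int(len(ids) * fraction)))
        marks = ",".join("?" * len(kept))
        parents = conn.execute(
            f"SELECT * FROM customers WHERE id IN ({marks})", kept).fetchall()
        children = conn.execute(
            f"SELECT * FROM orders WHERE customer_id IN ({marks})",
            kept).fetchall()
        return parents, children

    customers_sample, orders_sample = sample_with_referential_integrity(conn, 0.1)
    print(len(customers_sample), "customers,", len(orders_sample), "orders")

Because the child rows are induced by the sampled parents rather than drawn independently, the achieved sample fraction can drift from the target; a discrepancy of this kind is one way a sample size error such as the 0.5% figure above can arise.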