Publication

Towards realistic sampling: generating dependencies in a relational database

Date
2013
Abstract
Managing large amounts of information is one of the most expensive, time-consuming and non-trivial activities and it usually requires expert knowledge. In a wide range of application areas, such as data mining, histogram construction, approximate query evaluation, and software validation, handling exponentially growing databases has become a difficult challenge, and a subset of the data is generally preferred. As a solution to the current challenges in managing large amounts of data, database sampling from the operational data available has proved to be a powerful technique. However, none of the existing sampling approaches considers the dependencies between the data in a relational database. In this paper, we propose a novel approach towards constructing a realistic testing environment, by analyzing the distribution of data in the original database along these dependencies before sampling, so that the sample database is representative of the original database.
Description
peer-reviewed
Publisher
Association for Computing Machinery
Citation
ACM ICUIMC’13; Article no. 12
Funding Information
Science Foundation Ireland (SFI)