COCOA: A synthetic data generator for testing anonymization techniques

Ayala-Rivera, Vanessa; Portillo-Domínguez, Andrés Omar; Murphy, Liam; Thorpe, Christina

conference contribution

posted on 2017-02-06, 15:05 authored by Vanessa Ayala-Rivera, Andrés Omar Portillo-Domínguez, Liam Murphy, Christina Thorpe

Extensive testing of anonymization techniques is critical to assess their robustness and to identify the scenarios where they are most suitable. However, access to real microdata is highly restricted, and the microdata that is publicly available is usually anonymized or aggregated, which reduces its value for testing purposes. In this paper, we present a framework (COCOA) for generating realistic synthetic microdata that allows users to define multi-attribute relationships in order to preserve the functional dependencies of the data. We show how COCOA strengthens the testing of anonymization techniques by broadening the number and diversity of the test scenarios. Results also show that COCOA is practical for generating large datasets.
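To make the idea of preserving a functional dependency concrete, the following is a minimal sketch, not COCOA's actual implementation or API. It assumes a hypothetical dependency `city -> country` encoded in a lookup table; the attribute names, the table contents, and the functions `generate_records` and `holds_fd` are all illustrative assumptions.

```python
import random

# Hypothetical lookup table encoding the dependency city -> country.
# These values are illustrative and do not come from the paper.
CITY_TO_COUNTRY = {
    "Dublin": "Ireland",
    "Cork": "Ireland",
    "Lyon": "France",
    "Porto": "Portugal",
}


def generate_records(n, seed=0):
    """Return n synthetic records in which city determines country."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    cities = sorted(CITY_TO_COUNTRY)
    records = []
    for _ in range(n):
        city = rng.choice(cities)
        records.append({
            "age": rng.randint(18, 90),         # independent attribute
            "city": city,                       # determining attribute
            "country": CITY_TO_COUNTRY[city],   # derived, never sampled
        })
    return records


def holds_fd(records, lhs, rhs):
    """Check that the functional dependency lhs -> rhs holds in records."""
    seen = {}
    for rec in records:
        # The first value seen for each lhs key must match all later ones.
        if seen.setdefault(rec[lhs], rec[rhs]) != rec[rhs]:
            return False
    return True


if __name__ == "__main__":
    data = generate_records(1000)
    print(holds_fd(data, "city", "country"))
```

Deriving the dependent attribute from the determining one, rather than sampling both independently, is what keeps the dependency intact regardless of dataset size; sampling `country` on its own would produce inconsistent pairs such as a French city in Ireland.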

History

Publication

International Conference on Privacy in Statistical Databases, Lecture Notes in Computer Science (LNCS), vol. 9867, pp. 163-177

Publisher

Springer

Note

peer-reviewed

Other Funding information

SFI

Rights

The original publication is available at www.springerlink.com

Language

English

Keywords

anonymization techniques, COCOA

Licence

CC BY-NC-SA 1.0
