University of Limerick

Towards accurate detection of offensive language in online communication in Arabic

conference contribution
posted on 2019-06-10, 09:40, authored by Azalden Alakrot, Liam Murray, Nikola S. Nikolov
We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text.
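The kind of pre-processing and N-gram feature extraction mentioned in the abstract might look roughly like the sketch below. The abstract does not list the specific steps used, so the normalisation rules here (removing diacritics and tatweel, unifying alef variants) are assumptions drawn from common Arabic text-processing practice, and the function names `normalise` and `word_ngrams` are hypothetical. The resulting features could then be passed to any Support Vector Machine implementation.

```python
import re

# Assumed normalisation rules, common in Arabic NLP; not taken from the paper.
DIACRITICS = re.compile(r"[\u064B-\u0652]")          # tashkeel marks, fathatan to sukun
TATWEEL = "\u0640"                                   # elongation character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
BARE_ALEF = "\u0627"


def normalise(text: str) -> str:
    """Strip diacritics and tatweel, unify alef forms, collapse whitespace."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub(BARE_ALEF, text)
    return " ".join(text.split())


def word_ngrams(text: str, n_max: int = 2) -> list[str]:
    """Return word-level N-grams for N = 1..n_max over the normalised text."""
    tokens = normalise(text).split()
    features = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features
```

Each comment would be mapped to its list of unigrams and bigrams, which can then be counted or weighted to form the classifier's input vector.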

Publication

Procedia Computer Science; 142, pp. 315-320. 4th International Conference on Arabic Computational Linguistics 2018, Dubai

Publisher

Elsevier

Note

peer-reviewed

Language

English
