University of Limerick

Detection of anti-social behaviour in online communication in Arabic

thesis
posted on 2022-12-22, 14:24 authored by Azalden Alakrot
Anti-social behaviour on social media cannot easily be ignored, as it affects a large and growing percentage of the world’s population. It often has a negative effect on people’s lives: incidents of online abuse that may seem insignificant can have a cumulative impact on mental health. A growing number of incidents of suicide and violence have reportedly been provoked by anti-social behaviour on social media. Most existing machine-learning approaches to the detection of offensive language are tailored specifically to online communication in English. Solutions targeting Arabic are rare, even though, as we also demonstrate in this thesis, offensive language is widespread on Arabic social media as well.
Our hypothesis has been that Arabic may require a specific approach, different from the solutions developed for English, because of the particular linguistic characteristics of Arabic text and the mixture of dialects, unique to Arabic, that is frequently observed within the same conversation on social media. The objective of this thesis is to contribute to work on the automatic prevention of anti-social behaviour in online written communication in Arabic by introducing a large dataset of YouTube comments and proposing a text-mining pipeline for training a binary classifier.
The main challenge to the automatic detection of offensive language in Arabic is the absence of appropriate training datasets. Therefore, as part of this work, we collected data from Arabic social media (Arabic YouTube channels) and constructed a labelled dataset. We then used this dataset to experiment with a variety of text preprocessing techniques, feature-selection methods, and machine-learning classification algorithms in order to recommend a process for the automatic detection of offensive language in online written communication in Arabic.
Our results are encouraging: they suggest that a Support Vector Machine (SVM) classifier can be deployed effectively for the detection of offensive language in online written communication in Arabic. We believe that the proposed text-mining process will open the door to further research in this direction and will eventually lead to the effective automatic prevention of incidents of verbal abuse on Arabic social media.
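The abstract does not state which preprocessing steps, features, or SVM implementation the thesis actually uses. Purely as an illustration of the general shape of such a text-mining pipeline, the following Python sketch makes its own assumptions throughout: a hypothetical Arabic normalisation step (removing diacritics and tatweel, unifying alef variants, mapping alef maqsura to yeh and teh marbuta to heh), binary bag-of-words features with a simple document-frequency threshold standing in for feature selection, and a linear SVM trained with the Pegasos stochastic sub-gradient method. The function names (`normalize_arabic`, `build_vocabulary`, `vectorize`, `train_linear_svm`, `predict`) and default parameter values are illustrative only and are not taken from the thesis.

```python
import random
import re
from collections import Counter

# Assumed preprocessing (illustrative, not necessarily the thesis's steps).
# Arabic diacritics U+064B..U+0652 (fathatan..sukun) and tatweel U+0640.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda, hamza above and hamza below (U+0622, U+0623, U+0625).
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_arabic(text):
    """Hypothetical normalisation: strip diacritics/tatweel, unify letter variants."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # -> bare alef
    text = text.replace("\u0649", "\u064A")     # alef maqsura -> yeh
    text = text.replace("\u0629", "\u0647")     # teh marbuta -> heh
    return text


def tokenize(text):
    # In Python 3, \w matches Unicode letters, so Arabic words survive as tokens.
    return re.findall(r"\w+", normalize_arabic(text).lower())


def build_vocabulary(docs, min_df=1):
    """Stand-in for feature selection: keep tokens found in >= min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    kept = sorted(tok for tok, n in df.items() if n >= min_df)
    return {tok: i for i, tok in enumerate(kept)}


def vectorize(doc, vocab):
    """Binary bag of words as a set of active indices; index len(vocab) is the bias."""
    active = {vocab[tok] for tok in tokenize(doc) if tok in vocab}
    active.add(len(vocab))
    return active


def train_linear_svm(X, y, dim, lam=0.1, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    X is a list of active-index sets (binary features); y holds labels in {+1, -1}.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w[j] for j in X[i])  # margin under current weights
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularisation (shrink) step
            if margin < 1.0:                          # hinge loss is active
                for j in X[i]:
                    w[j] += eta * y[i]
    return w


def predict(w, x):
    """Return +1 (offensive) or -1 (not offensive) from the sign of w . x."""
    return 1 if sum(w[j] for j in x) >= 0 else -1
```

A typical call sequence would be `vocab = build_vocabulary(train_docs)`, `X = [vectorize(d, vocab) for d in train_docs]`, `w = train_linear_svm(X, labels, dim=len(vocab) + 1)`, and then `predict(w, vectorize(comment, vocab))` for a new comment. In practice a library implementation, such as scikit-learn's `LinearSVC` combined with `TfidfVectorizer`, would normally replace the hand-written trainer; the from-scratch version is shown only to make the hinge-loss optimisation visible.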

Faculty

  • Faculty of Science and Engineering

Degree

  • Doctoral

First supervisor

Nikolov, Nikola S.

Second supervisor

Murray, Liam

Note

peer-reviewed

Language

English

Department or School

  • Computer Science & Information Systems

    University of Limerick Theses
