Posted on 2022-12-22, 14:24. Authored by Azalden Alakrot.
Anti-social behaviour on social media cannot be easily ignored as it affects a
large and growing percentage of the world’s population. It often has a negative effect on people’s lives; incidents of online abuse that may seem insignificant can have a cumulative impact on mental health. An increasing number of incidents of suicide and violence have been reportedly provoked by
anti-social behaviour on social media. Most of the existing machine-learning
approaches for detection of offensive language are specifically tailored for
online communication in English. Solutions targeting the Arabic language are
rare, although, as we also demonstrate in this thesis, offensive language is
widespread in Arabic social media as well. Our hypothesis has been that Arabic
may require a dedicated approach, different from the solutions for English, due
to the linguistic characteristics of Arabic text and the mixture of dialects, unique to Arabic, that is frequently observed within the same conversation on
social media.
The objective of this thesis is to contribute to the work on the automatic prevention of anti-social behaviour in online written communication in Arabic by introducing a large dataset of YouTube comments and proposing a
text-mining pipeline for training a binary classifier. The main challenge
to automatic detection of offensive language is the absence of appropriate
training datasets. Thus, as part of this work we undertook to collect data
from Arabic social media (Arabic YouTube channels) and construct a labelled
dataset. Then we utilised this dataset to experiment with a variety of text preprocessing techniques, feature-selection methods, and machine-learning classification algorithms in order to recommend a process for automatic detection
of offensive language in online written communication in Arabic. Our results
are encouraging; they suggest that a Support Vector Machine classifier can be effectively deployed for the detection of offensive language in online written
communication in Arabic. We believe that the proposed text-mining process
will open the door for further research in this direction and will eventually
result in effective automatic prevention of incidents of verbal abuse on Arabic social media.
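
To give a flavour of the text preprocessing stage mentioned above, the following is a minimal sketch of Arabic comment normalisation of the kind often applied before feature extraction. The abstract does not list the techniques actually evaluated in the thesis, so every rule here (stripping diacritics and tatweel, unifying alef variants, collapsing elongated characters) and every name (`normalise`, `tokenise`) is an assumption made purely for illustration.

```python
import re

# All rules below are illustrative assumptions; the abstract does not
# specify which preprocessing steps the thesis evaluated.
DIACRITICS = re.compile(r"[\u064B-\u0652]")           # fathatan .. sukun
TATWEEL = "\u0640"                                    # elongation mark
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")   # alef with madda/hamza
BARE_ALEF = "\u0627"
REPEATS = re.compile(r"(.)\1{2,}")                    # same character 3+ times


def normalise(text: str) -> str:
    """Return a normalised copy of an Arabic comment (hypothetical helper)."""
    text = DIACRITICS.sub("", text)           # drop short-vowel marks
    text = text.replace(TATWEEL, "")          # drop decorative elongation
    text = ALEF_VARIANTS.sub(BARE_ALEF, text) # unify alef spellings
    text = REPEATS.sub(r"\1", text)           # collapse runs of 3+ to one
    return " ".join(text.split())             # collapse whitespace


def tokenise(text: str) -> list[str]:
    """Whitespace tokeniser applied after normalisation (hypothetical helper)."""
    return normalise(text).split()
```

The resulting tokens could then be passed to a feature extractor and a binary classifier such as a Support Vector Machine; the choice and order of normalisation rules would need to be validated experimentally on the labelled dataset.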