University of Limerick

Trustworthiness of health-related information on Arabic social media

thesis
posted on 2022-12-02, 11:24 authored by Yahya Albalawi

Social media (SM) platforms play a vital role in disseminating health-related information. However, evidence suggests that Twitter posts (i.e., tweets) are often inaccurate; for example, research from Saudi Arabia indicates that 50% of health-related tweets contain inaccurate information. Previous studies also suggest that tweets do not need to be evidence-based or accurate to gain traction, which exacerbates the accuracy concern in the sphere of health information. The goal of the thesis is to develop a framework for automatically determining the accuracy of health-related tweets.

Knowing the accuracy of tweets offers the potential to recommend or promote accurate tweets while identifying, flagging, or demoting inaccurate ones. As a first step, this thesis employed a pilot study to identify possible metrics that may correlate with the accuracy of health-related tweets. The results showed that tweet meta-characteristics have some limited potential to identify inaccurate tweets and to indicate their dissemination potential.

The research then built on this work to develop a framework for automatically determining the accuracy of health-related tweets in Arabic. The first step was to develop a model to detect health-related tweets. This was accomplished by determining the best pre-processing techniques for use with traditional machine learning and then developing traditional machine learning classifiers. The model was then compared with state-of-the-art pre-trained word embeddings. The evaluation of the pre-processing techniques with traditional machine learning showed that they perform differently from one algorithm to another. In addition, most pre-processing methods highlighted in the literature were not included in the best combination. Pre-processing techniques specific to the Arabic language are more likely to improve classifier performance than other generalized pre-processing techniques. However, ultimately the deep learning model outperformed the traditional machine learning models, even with optimized pre-processing.

After developing a model to detect health-related tweets, the next step was to determine their accuracy. To develop classifiers for this step, we built data sets labeled “accurate” and “inaccurate.” Two medical doctors labeled each tweet, and the data sets were used to evaluate pre-trained language models and word embeddings, to identify the best model for detecting the trustworthiness of health-related tweets. The results suggest that pre-trained language models perform better than pre-trained word embeddings.
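To illustrate what "pre-processing techniques specific to the Arabic language" can mean in practice, the following is a minimal sketch of a few normalization steps that are common in Arabic text processing. The abstract does not list the techniques the thesis actually evaluated, so the function name `normalize_arabic` and the particular rules chosen here are assumptions made for illustration only.

```python
import re

# Hypothetical rule set: these are widely used Arabic normalization steps,
# not necessarily the ones evaluated in the thesis.
_DIACRITICS = re.compile(r"[\u064B-\u0652]")          # fathatan through sukun
_TATWEEL = "\u0640"                                   # elongation character
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
_BARE_ALEF = "\u0627"


def normalize_arabic(text: str) -> str:
    """Apply a small set of Arabic-specific normalization rules."""
    text = _DIACRITICS.sub("", text)            # drop short-vowel marks
    text = text.replace(_TATWEEL, "")           # drop decorative elongation
    text = _ALEF_VARIANTS.sub(_BARE_ALEF, text) # unify alef forms
    text = text.replace("\u0649", "\u064A")     # alef maqsura -> ya
    return text
```

Rules like these collapse spelling variants of the same word into one token, which is one plausible reason language-specific steps could help a bag-of-words classifier more than generic ones.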

Results from both phases were impressive individually, but when the two phases are combined to detect accurate health tweets, performance suffers slightly from the inaccuracy of each phase. However, we believe that the proposed process to identify health trustworthiness and the findings from these experiments will open the door for further research in this direction and may eventually result in an even more effective automatic prevention of incidents of health misinformation in Arabic.
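The two-phase process described above can be sketched as a simple cascade: a tweet must first pass the health-related detector and then the accuracy classifier. The sketch below is illustrative only; `select_accurate_health_tweets` and `combined_recall` are hypothetical names, the classifiers are passed in as plain functions, and the recall estimate assumes the two phases make independent errors, which the abstract does not state.

```python
from typing import Callable, Iterable, List

Classifier = Callable[[str], bool]


def select_accurate_health_tweets(
    tweets: Iterable[str],
    is_health_related: Classifier,
    is_accurate: Classifier,
) -> List[str]:
    """Keep tweets that pass phase 1 (health-related) and phase 2 (accurate)."""
    selected = []
    for tweet in tweets:
        if not is_health_related(tweet):
            continue  # a phase 1 miss can never be recovered in phase 2
        if is_accurate(tweet):
            selected.append(tweet)
    return selected


def combined_recall(stage1_recall: float, stage2_recall: float) -> float:
    """Rough end-to-end recall, assuming the two phases err independently."""
    return stage1_recall * stage2_recall
```

Under that independence assumption, two phases with an illustrative recall of 0.9 each (not a figure from the thesis) would give an end-to-end recall of about 0.81, which shows how individually strong phases can still lose some performance in combination.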

Faculty

  • Faculty of Science and Engineering

Degree

  • Doctoral

First supervisor

Jim Buckley

Second supervisor

Nikola Nikolov

Also affiliated with

  • LERO - The Irish Software Research Centre

Department or School

  • Computer Science & Information Systems
