Purpose
The use of social media, and in particular community Q&A websites, by
learners has increased significantly in recent years. The vast amounts of data
posted on these sites provide an opportunity to investigate the topics under
discussion and those receiving most attention. The purpose of this article is
to automatically analyse the content of a popular computer programming Q&A
website, StackOverflow, determine the exact topics of posted Q&As, and
narrow down their categories to help determine subject difficulties of learners.
By doing so, we were able to rank the identified topics and categories
by frequency, mark the most asked-about subjects and, hence, identify
the most difficult and challenging topics commonly faced by learners of
computer programming and software development.
Design/methodology/approach
In this work we adopted a heuristic research approach combined with a text
mining approach to investigate the topics and categories of Q&A posts on the
StackOverflow website. Almost 160,000 Q&A posts were analysed and their
categories refined using Wikipedia as a crowd-sourced classification system.
After the occurrence frequencies of all topics and categories were identified
and counted, their semantic relationships were established. The resulting data
is presented as a rich graph that can be visualized using graph visualization
software such as Gephi.
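
As a rough illustration of the kind of pipeline described above, the following Python sketch extracts Wikipedia links from post bodies, counts how often each linked topic occurs, and writes a weighted edge list. It is not the authors' implementation. The function names, the link pattern, and the sample input are hypothetical, and linking two topics when they appear in the same post is an assumption made here as a simple stand-in for the semantic relationships the article derives from Wikipedia categories. The Source/Target/Weight header follows the column names commonly used for edge tables imported into Gephi.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations
from urllib.parse import unquote

# Assumed link pattern: English Wikipedia article URLs; the title is captured.
WIKI_LINK = re.compile(r"""https?://en\.wikipedia\.org/wiki/([^\s"'<>#?)]+)""")


def extract_topics(post_body):
    """Return the set of Wikipedia article titles linked from one post."""
    titles = set()
    for raw in WIKI_LINK.findall(post_body):
        # Drop trailing sentence punctuation, decode %-escapes, use spaces.
        title = unquote(raw.rstrip(".,;:")).replace("_", " ")
        if title:
            titles.add(title)
    return titles


def count_topics(posts):
    """Count topic frequencies and same-post co-occurrences (an assumption)."""
    topic_counts = Counter()
    edge_counts = Counter()
    for body in posts:
        topics = sorted(extract_topics(body))
        topic_counts.update(topics)
        # Each unordered pair of topics in the same post adds one to the edge weight.
        edge_counts.update(combinations(topics, 2))
    return topic_counts, edge_counts


def edges_to_csv(edge_counts):
    """Serialize the weighted edges as CSV text with a Source/Target/Weight header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edge_counts.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample posts, not taken from the studied data set.
    sample_posts = [
        'See <a href="https://en.wikipedia.org/wiki/Hash_table">hash table</a> '
        "and https://en.wikipedia.org/wiki/Recursion",
        "Read https://en.wikipedia.org/wiki/Recursion first.",
        "No links here",
    ]
    topics, edges = count_topics(sample_posts)
    for topic, count in topics.most_common():
        print(f"{count}\t{topic}")
    print(edges_to_csv(edges), end="")
```

Ranking the counter with `most_common()` gives the frequency ordering of topics, while the CSV text can be saved to a file and loaded into a graph tool for visual inspection.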
Findings
The reported results and corresponding discussion indicate that the insight
gained from the process can be further refined and potentially used by
instructors, teachers and educators to focus on the commonly occurring
topics/subjects when designing their course material, delivery and teaching
methods.
Research limitations/implications
The proposed approach limits the scope of the analysis to a subset of Q&As
which contain one or more links to Wikipedia. Therefore, developing more
sophisticated text mining methods capable of analysing a larger portion of
available data would improve the accuracy and generalizability of the results.
Originality/value
The application of text mining and data analytics technologies in education
has created a new interdisciplinary field of research between the education
and information sciences, called Educational Data Mining (EDM). The work
presented in this article falls under this field of research and is an early
attempt at investigating the practical applications of text mining
technologies in the area of computer science education.
History
Publication
Journal of Enterprise Information Management, 29 (2), pp. 255-275
Publisher
Emerald
Note
peer-reviewed
Rights
This article is (c) Emerald Group Publishing and permission has been granted for this version to appear here http://ulir.ul.ie. Emerald does not grant permission for this article to be further copied/distributed or hosted elsewhere without the express permission from Emerald