posted on 2021-01-28, 14:46authored byArash Joorabchi, Alaa Alahmadi, Michael English, Abdulhussain E. Mahdi
Measurement of words semantic relatedness plays an important role in a wide range of natural language processing and information retrieval applications, such as full-text search, summarization, classification and clustering. In this paper, we propose an easy to implement and low-cost method for estimating words semantic relatedness. The proposed method is based on the utilization of words temporal footprints as found in publicly available corpora such as Google Books Ngrams (GBN), and knowledge bases such as Wikipedia. The extracted footprints are represented as time series, their similarities is measured using the Minkowski distance, and averaged using a correlation-based weighting scheme to quantify the words semantic relatedness. The overall performance of the method and the quality of the two sources used for extracting words temporal footprints (i.e., GBN and Wikipedia) are evaluated using the MTurk-287 dataset and the standard measures of Pearson's r and Spearman's ρ.
History
Publication
2017 9th IEEE-GCC Conference and Exhibition (GCCCE);