Question Tags or Text for Topic Modeling: Which is better

Authors

  • Abdel Nasser H. Zaied, Vice-dean for Education and Students Affairs, College of Computers and Informatics (Author)

Abstract

Topic modeling is a probabilistic statistical technique used to find the latent topics that best describe the content of a set of documents.
Community question answering websites such as Quora, Stack Overflow and Yahoo! Answers are widely used, and the large number
of questions posted every day makes it challenging to understand, summarize and synthesize the main topics of discussion, which
motivates applying topic modeling to them. On these websites, two sources of information are available for analyzing the key
latent topics: question text and tags. Questions are in textual format, and tags are keywords or tokens related to the question
that describe its content. In past studies, most researchers have used question text for topic modeling. It is still unclear why
tags have not been considered. To address this gap, this paper performs topic modeling using both question tags and question text.
Tag-based topic modeling is compared with text-based topic modeling on two metrics, coherence and perplexity. Experiments were
conducted on three real datasets from the Stack Exchange website: Artificial Intelligence, Software Engineering and Quantum
Computing. At a high level, tag-based topic modeling looked promising, but closer observation revealed the opposite. Topic
modeling using question text was found to be preferable, as topic modeling using tags collapses after a certain number of topics.
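
As a rough illustration of the two metrics named in the abstract, the following minimal sketch computes a coherence score and a perplexity value in plain Python. The abstract does not say which coherence measure was used, so the UMass formulation here is an assumption, and the function names, toy corpora and topic word lists are all hypothetical rather than taken from the paper.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (assumed measure, not confirmed by the paper).

    top_words: the topic's top words, most probable first.
    documents: list of token lists (question text tokens or question tags).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # word never occurs; skip to avoid division by zero
            co = doc_freq(top_words[i], top_words[j])
            score += math.log((co + 1) / d_j)
    return score


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of held-out tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical toy corpora: the same four questions as text tokens and as tags.
text_docs = [
    ["neural", "network", "training", "loss"],
    ["neural", "network", "layers"],
    ["search", "algorithm", "heuristic"],
    ["training", "loss", "gradient"],
]
tag_docs = [
    ["neural-networks", "training"],
    ["neural-networks", "deep-learning"],
    ["search", "heuristics"],
    ["training", "gradient-descent"],
]

# Hypothetical top words for one topic from each model.
text_score = umass_coherence(["neural", "network"], text_docs)
tag_score = umass_coherence(["neural-networks", "training"], tag_docs)
```

Higher coherence and lower perplexity are generally read as indicating a better topic model. In a comparison like the one the abstract describes, both values would be computed for a range of topic counts on the tag corpus and on the text corpus.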

Published

2020-10-16

How to Cite

Question Tags or Text for Topic Modeling: Which is better. (2020). International Journal of Engineering and Science Research, 10(4), 10-16. https://ijesr.org/index.php/ijesr/article/view/1196