Knowing the hot topics in Social Networking Systems has become essential for many applications, such as marketing and social studies. Topic detection from Arabic posts or tweets using standard cluster labeling techniques, such as the most frequent terms or the most predictive terms in the cluster, lacks the accuracy needed to capture the implicit semantic relations between terms. In this paper, we automatically cluster Arabic tweets using a linguistic clustering model for Arabic. This model exploits the rich morphology of Arabic, in which most words are derived from roots: changing vowels and inserting consonants generates inflections and derivations. In addition, our approach introduces a semantically enriched Bayesian Network model for cluster labeling. This model generates a list of candidate labels for each cluster given its centroid terms. It enriches the semantics of the centroid terms with world knowledge and models the semantic relations between them by building a Bayesian Network. It then uses probabilistic inference techniques to elect a list of labels for the candidate topic. The elected labels are those with the highest posterior probability given the observed evidence, taking into consideration both the explicit and implicit features of the cluster’s centroid terms. We conducted a preliminary evaluation using a manually annotated data set consisting of 100 clusters extracted from 38,000 tweets. The results indicate that our approach can effectively label clusters of tweets in Social Networking Systems.
Keywords: Arabic, Bayesian network, semantic, stemmer.
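
To illustrate the label election step described in the abstract, the following is a minimal sketch that ranks candidate labels by their posterior probability given the observed centroid terms. It assumes a simplified two-layer network (one label node with conditionally independent term nodes), which is not necessarily the structure used in this work. The function name `rank_labels`, the smoothing constant, and all probabilities are hypothetical and chosen only for illustration.

```python
from math import prod


def rank_labels(prior, likelihood, evidence, top_k=3, unseen=1e-6):
    """Rank candidate labels by P(label | observed centroid terms).

    Assumption: a two-layer network in which the term nodes are
    conditionally independent given the label node.
    """
    # Unnormalized posterior: P(label) * product over terms of P(term | label).
    # Terms with no entry for a label get a small hypothetical probability.
    scores = {
        label: p * prod(likelihood[label].get(term, unseen) for term in evidence)
        for label, p in prior.items()
    }
    total = sum(scores.values())
    # Normalize so the posteriors over all candidate labels sum to 1.
    posterior = {label: s / total for label, s in scores.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy numbers, not taken from the paper's data set.
prior = {"sports": 0.5, "politics": 0.5}
likelihood = {
    "sports": {"match": 0.6, "goal": 0.5, "vote": 0.05},
    "politics": {"match": 0.1, "goal": 0.1, "vote": 0.7},
}
ranked = rank_labels(prior, likelihood, ["match", "goal"])
print(ranked[0][0])
```

In this toy setting the centroid terms "match" and "goal" act as evidence, and the label with the highest posterior is returned first. A semantically enriched network would add intermediate concept nodes between labels and terms, which this sketch omits.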