Publication type: journal article
Year of publication: 2019
DOI: 10.17223/15617793/448/6
Keywords: emotion, Internet texts, sentiment analysis, verbal markers, machine learning, cognition
Abstract: The article presents theoretical grounds for the concept of the verbal marker, proposes a typology of such markers, and summarizes observations about the impact of verbal marker combinations on the accuracy of a computer classifier designed to assign Russian-language Internet texts to different emotional classes. Drawing on a comprehensive analysis of current international scholarship, the authors define a "verbal marker" as a unit or structure that belongs to one of the levels of the linguistic system, is amenable to parametrization, and appears in the text as an indicator of processes in the human cognitive system that are hidden from direct observation. According to the level of the linguistic system at which the marking unit or structure is located, the authors propose distinguishing the following types of verbal markers relevant to the analysis of written texts: lexical, morphological, syntactic, semantic, punctuation, and textual markers. To demonstrate the practical viability of the concept, the authors applied it in a sentiment analysis project intended to solve the problem of attributing a Russian-language Internet text to a particular class of emotions. The authors are interested in the emotional tonality of Internet texts because such texts have become one of the most common forms of text in Russian, and the technology for their automatic assessment has clear commercial and social prospects. The classifier is based on the eight emotions that the Swedish neuroscientist H. Lövheim related to specific combinations of monoamine levels in the limbic system of the human brain. To build the classifier, the authors used supervised machine learning, which requires sample selection and feature extraction.
As data, the authors took 15,000 emotionally rich fragments of 60-80 words each, selected from the public group Podslushano [Overheard] on the Russian social network VK. To extract the sample, the authors first mapped the eight emotional classes of Lövheim's model to a range of hashtags that the group's editors use when publishing users' posts. Second, each text in the sample was assessed by three informants on a crowdsourcing platform. The preliminarily classified data then underwent expert linguistic analysis using multiple tools offered by the corpus manager Sketch Engine. This analysis yielded a feature set for the SVM-based classifier. Analyzing the eight text classes with corpus linguistics methods and running a prototype of the classifier showed how the weighted average F1-score changed as different verbal markers were incorporated as classifier features. The results showed that lexical and punctuation markers were the most effective. However, syntactic and morphological markers also proved effective for some classes of emotions. In addition, the authors stress the relevance of marker combinations for the accuracy of the statistical models created by the classifier. At present, the F1-score of the classifier varies from 30% to 50% across emotional classes, which is comparable with the results shown by classifiers built for other languages.
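The abstract reports a weighted average F1-score without spelling out the metric. As a minimal sketch of the standard definition (per-class F1 averaged with weights proportional to each class's share of the true labels), the following Python function may help; the function name and the emotion labels in the usage note are illustrative assumptions, not taken from the article, and the authors' own evaluation code is not described in the source.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Illustrative helper (hypothetical name); classes are taken from y_true,
    so a label that only ever appears in y_pred contributes no term of its own.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled example is required")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n / total  # weight each class by its share of y_true
    return score
```

For example, with the assumed labels `["joy", "joy", "anger", "fear"]` as true classes and `["joy", "anger", "anger", "fear"]` as predictions, the per-class F1 values are 2/3, 2/3 and 1, and the weighted average is 0.75.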
Journal: TOMSK STATE UNIVERSITY JOURNAL
Journal issue: Is. 448
Pages: 48-58
Journal ISSN: 15617793
Place of publication: TOMSK
Publisher: TOMSK STATE UNIV