Title: Pushing the envelope of sentiment analysis beyond words and polarities
Funders
Engineering and Physical Sciences Research Council
Principal Investigator
Co-Investigators
Spasic, Irena
Project Details
Start date: 01.10.2013
End date: 30.09.2017
Summary
Idioms are multi-word expressions that carry both a literal and a figurative
meaning, the latter being conventionally understood by native speakers. Their
overall meaning often cannot be deduced from the literal meanings of their
constituent words.
Sentiment analysis, also referred to as opinion mining, aims to automatically
extract and classify sentiments, opinions, and emotions expressed in text. The
research in this thesis is motivated by the fact that idioms, which often
express an affective stance towards an entity or an event, are not used
systematically as features in sentiment analysis. To estimate the degree to
which including idioms as features may improve the results of traditional
sentiment analysis, we compared our results with those of two state-of-the-art
sentiment analysis approaches. Firstly, we collected a set of idioms that are
relevant to sentiment analysis, i.e. those that can be mapped to an emotion.
These mappings were obtained using a crowdsourcing approach. Secondly, to
evaluate the results of sentiment analysis, we assembled a corpus of sentences
in which idioms are used in context. Each sentence was annotated with an
emotion, and these annotations formed the gold standard against which our
method and the baseline methods were compared. Classification performance
improved by almost 20 percentage points. Despite these positive findings, the
main limitation of this initial approach was the significant
knowledge-engineering overhead involved in hand-crafting the lexico-semantic
resources used to support idiom-based features.
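The crowdsourced idiom-emotion mappings and the idiom-based features described above can be illustrated with a minimal Python sketch. It is not the project's method: it assumes that crowd judgements are aggregated by majority vote with a hypothetical agreement threshold (`min_agreement`), that idioms are detected by whole-phrase matching on lowercased text, and that the label set is Ekman's six basic emotions (the summary does not list the labels used in the initial study). The function names `build_idiom_lexicon` and `idiom_features` and the sample idioms and votes are hypothetical.

```python
import re
from collections import Counter

# Assumed label set (Ekman's six basic emotions); the project's may differ.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")


def build_idiom_lexicon(judgements, min_agreement=0.5):
    """Hypothetical helper: map each idiom to its majority crowd label.

    judgements: dict of idiom -> list of emotion labels from crowd workers.
    An idiom is kept only if its most frequent label receives more than
    min_agreement of the votes (an assumed threshold).
    """
    lexicon = {}
    for idiom, labels in judgements.items():
        if not labels:
            continue  # no votes, no mapping
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > min_agreement:
            lexicon[idiom.lower()] = label
    return lexicon


def idiom_features(sentence, lexicon):
    """Hypothetical helper: count lexicon idioms per emotion in a sentence.

    Matches whole phrases only, so inflected variants such as
    "spilled the beans" for "spill the beans" are missed.
    """
    text = sentence.lower()
    counts = dict.fromkeys(EMOTIONS, 0)
    for idiom, emotion in lexicon.items():
        pattern = r"\b" + re.escape(idiom) + r"\b"
        if emotion in counts and re.search(pattern, text):
            counts[emotion] += 1
    return [counts[e] for e in EMOTIONS]


if __name__ == "__main__":
    # Illustrative votes only, not data from the project.
    crowd = {
        "over the moon": ["happiness", "happiness", "happiness", "surprise"],
        "down in the dumps": ["sadness", "sadness", "anger"],
        "blow hot and cold": ["anger", "surprise", "fear"],  # no majority
    }
    lexicon = build_idiom_lexicon(crowd)
    print(lexicon)  # {'over the moon': 'happiness', 'down in the dumps': 'sadness'}
    print(idiom_features("She was over the moon about the result.", lexicon))
    # [0, 0, 0, 1, 0, 0]
```

The resulting count vector could be appended to whatever features a baseline classifier already uses; majority voting is only one of several ways to aggregate crowd labels.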
To alleviate the bottleneck associated with acquiring these lexico-semantic
resources, we scaled up our original approach by automating their engineering.
The automatically engineered resources then replaced their manually engineered
counterparts in the originally proposed method. The fully automated approach
outperformed the two baseline methods by 7 and 9 percentage points,
respectively. These improvements, however, were smaller than those achieved in
the initial study. Nevertheless, we demonstrated not only that idiom-based
features can be engineered automatically, but also that they improve sentiment
classification results when such features are present. Taking a long-term view
of the research in this thesis, we aim to address the limitations of
state-of-the-art sentiment analysis approaches by focusing on a full range of
emotions rather than on sentiment polarity. However, there is no consensus
among researchers on a standardised framework for classifying emotions.
Proposing such a framework would be a major contribution to the field of
sentiment analysis, as it would stimulate its evolution into fully fledged
emotion classification and allow for the systematic comparison of independent
studies. With this goal in mind, we investigated the utility of different
emotion classification frameworks for sentiment analysis. A comprehensive
statistical analysis of our experimental results provided explicit evidence
that, of the frameworks compared, the six basic emotions are the best suited to
sentiment analysis. However, we also identified a major shortcoming of this
framework: it oversimplifies positive emotions.
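The gains reported in this summary are stated in percentage points, i.e. absolute differences between two scores, not relative percentage increases. The short Python sketch below makes the distinction concrete. It assumes accuracy against the gold-standard labels as the performance measure (the summary does not name the metric), and the labels in it are toy values for illustration, not data or results from the project; `accuracy` and `gain` are hypothetical helper names.

```python
def accuracy(predicted, gold):
    """Fraction of sentences whose predicted emotion matches the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def gain(new_score, baseline_score):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    if baseline_score == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    diff = new_score - baseline_score
    return diff * 100, diff / baseline_score * 100


if __name__ == "__main__":
    # Toy labels only: the baseline gets 5/10 right, the new system 7/10.
    gold = ["joy", "anger", "fear", "joy", "sadness",
            "joy", "anger", "fear", "joy", "sadness"]
    baseline = ["joy", "joy", "fear", "anger", "joy",
                "joy", "anger", "joy", "sadness", "sadness"]
    new = ["joy", "anger", "fear", "joy", "joy",
           "joy", "anger", "joy", "sadness", "sadness"]
    points, relative = gain(accuracy(new, gold), accuracy(baseline, gold))
    print(f"{points:.1f} percentage points, {relative:.1f}% relative")
    # prints: 20.0 percentage points, 40.0% relative
```

In this toy case, moving from 50% to 70% accuracy is a gain of 20 percentage points but a 40% relative improvement, which is why the unit matters when comparing the figures quoted above.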
Related Data Sets