Comparing the utility of different classification schemes for emotive language analysis - Williams L, Spasic I, Artemiou A, et al. (2019). Cardiff University. 10.17035/d.2019.0067889599. Natural Language Processing, Machine Learning (AI), Data capture, Data Mining - Research Portal

Title: Comparing the utility of different classification schemes for emotive language analysis

Citation
Williams L, Spasic I, Artemiou A, et al. (2019). Comparing the utility of different classification schemes for emotive language analysis. Cardiff University. https://doi.org/10.17035/d.2019.0067889599

Access Rights: Creative Commons Attribution 4.0 International

Access Method: To request access to this data, email opendata@caerdydd.ac.uk

Dataset Creators from Cardiff University

Dataset Details

Publisher: Cardiff University

Date (year) the data became publicly available: 2019

Data creation start date: 01.01.2015

Data creation end date: 17.03.2016

Data format: .csv

Estimated total storage size of the dataset: Less than 100 megabytes

DOI: 10.17035/d.2019.0067889599

DOI URL: http://doi.org/10.17035/d.2019.0067889599

Description

We investigated the utility of different classification schemes for emotive language analysis. We compared six schemes: (1) Ekman's six basic emotions, (2) Plutchik's wheel of emotion, (3) Watson and Tellegen's Circumplex theory of affect, (4) the Emotion Annotation Representation Language (EARL), (5) WordNet-Affect, and (6) a free text classification scheme.

To measure their utility, we investigated their ease of use by human annotators as well as the performance of supervised machine learning when these schemes were used to annotate the training data. We assembled a corpus of 500 emotionally charged tweets. To ensure that the text contained emotion, we selected tweets that contained emoticons, hashtags with emotion terms, or idioms, as well as tweets with an automatically generated sentiment. We also included emotionally neutral or ambiguous tweets while correcting for bias towards certain emotions based on the choice of idioms, emoticons and hashtags.
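The cue-based selection described above could be sketched roughly as follows. This is only an illustration: the cue lists and the function name `emotion_cues` are hypothetical, and the actual emoticons, emotion terms and idioms used to build the corpus are not listed in this record.

```python
import re

# Illustrative cue lists only (assumptions); the cues actually used
# to assemble the corpus are not given in this record.
EMOTICONS = (":)", ":(", ":D", ";)")
EMOTION_TERMS = {"happy", "sad", "angry", "scared"}
IDIOMS = ("over the moon", "down in the dumps")


def emotion_cues(tweet):
    """Return the kinds of emotion cue found in a tweet."""
    lowered = tweet.lower()
    cues = set()
    # Emoticons are case-sensitive (":D"), so search the original text.
    if any(emoticon in tweet for emoticon in EMOTICONS):
        cues.add("emoticon")
    # A hashtag counts if it contains an emotion term as a substring.
    hashtags = re.findall(r"#(\w+)", lowered)
    if any(term in tag for tag in hashtags for term in EMOTION_TERMS):
        cues.add("hashtag")
    if any(idiom in lowered for idiom in IDIOMS):
        cues.add("idiom")
    return cues


print(emotion_cues("Over the moon with my results :D #sohappy"))
print(emotion_cues("Train leaves at 9"))
```

Substring matching on hashtags is deliberately crude and will produce false positives; a real pipeline would need curated lexicons and the bias correction mentioned above.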

The corpus was annotated manually using an online crowdsourcing platform (CrowdFlower) by five independent annotators per text document, per classification scheme.

The data provided here consist of the annotator ID (their IP address), the annotation given, and the text document from the corpus, per classification scheme.
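As a rough illustration of how such a file might be used, the sketch below groups annotations by text document and reports the most common label together with the share of annotators who chose it. The column names (`annotator_id`, `annotation`, `text`) and the sample rows are assumptions made for this example; the headers and contents of the real .csv files may differ.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical column names and made-up rows, for illustration only.
SAMPLE = """annotator_id,annotation,text
a1,joy,Great day out!
a2,joy,Great day out!
a3,surprise,Great day out!
a4,joy,Great day out!
a5,joy,Great day out!
a1,anger,Stuck in traffic again
a2,sadness,Stuck in traffic again
a3,anger,Stuck in traffic again
a4,anger,Stuck in traffic again
a5,fear,Stuck in traffic again
"""


def majority_labels(csv_file):
    """Map each text to (most common annotation, share of annotators choosing it)."""
    by_text = defaultdict(list)
    for row in csv.DictReader(csv_file):
        by_text[row["text"]].append(row["annotation"])
    result = {}
    for text, labels in by_text.items():
        # On a tie, Counter returns the label that was seen first.
        label, count = Counter(labels).most_common(1)[0]
        result[text] = (label, count / len(labels))
    return result


summary = majority_labels(io.StringIO(SAMPLE))
for text, (label, share) in summary.items():
    print(f"{text!r}: {label} ({share:.0%} agreement)")
```

To run this on a real file, the `io.StringIO(SAMPLE)` argument would be replaced by an opened file handle, one file per classification scheme, with the column names adjusted to match.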

Research results based upon these data are published at http://doi.org/10.1007/s00357-019-9307-0

Research Areas

Related Projects

Pushing the envelope of sentiment analysis beyond words and polarities (01.10.2013 - 30.09.2017)