The Mascara corpus annotated for metonymies, including annotation
schemes, is downloadable here. The training and testing data for SemEval 2007 (again including all documentation) are downloadable here. You will need to register to download the SemEval data. Please note that the Mascara data and the SemEval data partially overlap. Should you want to use both, you need to remove all data that occurs twice. Currently
I suggest using the SemEval data only, as the Mascara data needs an overhaul.
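If you do combine the two resources, the overlap removal mentioned above could be sketched as follows. This is only a minimal illustration: it assumes each sample has already been read into a plain text string, and the function names and the whitespace/case normalisation are hypothetical choices, not part of either data release. SemEval samples are passed first so that their copy is the one kept.

```python
from typing import Iterable, List


def normalise(sample: str) -> str:
    """Collapse whitespace and case so trivially different copies compare equal."""
    return " ".join(sample.split()).lower()


def merge_without_duplicates(primary: Iterable[str], secondary: Iterable[str]) -> List[str]:
    """Return all primary samples plus the secondary samples not already seen.

    Assumption: two samples count as the same if their normalised text matches.
    """
    seen = set()
    merged = []
    for sample in list(primary) + list(secondary):
        key = normalise(sample)
        if key not in seen:
            seen.add(key)
            merged.append(sample)
    return merged


# Hypothetical toy samples, not taken from either corpus.
semeval_samples = ["Brussels agreed to the deal.", "He read Shakespeare."]
mascara_samples = ["brussels  agreed to the deal.", "She drove a BMW."]
combined = merge_without_duplicates(semeval_samples, mascara_samples)
```

Matching on normalised text is deliberately simple; if the released files carry sample identifiers, comparing those would be more reliable.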
ISNotes: Information status and anaphora annotation for OntoNotes
50 OntoNotes WSJ articles annotated for information
status as well as added bridging and comparative anaphora
annotation (on top of the OntoNotes coreference annotation). The
corpus is available as standoff annotation, as you will need an
OntoNotes 4.0 license to access the underlying texts. The corpus is
downloadable from my collaborators at the Heidelberg Institute of
here. The corpus was used in our ACL
2012, NAACL 2013 and EMNLP 2013 papers.
Leeds Set Element Bank
75 WSJ articles annotated for set
element relationships (both intersententially and
intrasententially). The corpus is in standoff format as you need a
Penn Treebank Release 2 license to access the underlying texts. This
data is available
here. It was used in our RANLP 2011
and IWCS 2013 papers.
SAL: Sentiment-Annotated Lexicon (WordNet 2.0)
All 120,000 WordNet synsets annotated for subjectivity using our method from
our NAACL 2009 paper. This data is
available here.
Manually Annotated Word Sense Sentiment
Fangzhong Su and I have annotated over 1000 WordNet synsets for subjectivity.
The data is used in our COLING 2008 paper as well as in our NAACL 2009 paper.
This data is available here. The
README describes the data and its copyright restrictions.
created 1994-11-04, last modified 2013-23-08