The Mascara corpus annotated for metonymies, including annotation schemes, is downloadable here. The training and testing data for SemEval 2007 (again including all documentation) is downloadable here. You will need to register to download the SemEval data. Please note that the Mascara data and the SemEval data partially overlap. Should you want to use both, you need to remove all data that occurs twice. Currently I suggest using the SemEval data only, as the Mascara data needs an overhaul.
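If you do combine the two data sets, the overlap can be removed with a few lines of code. The following is only a minimal sketch: it assumes you have already extracted the samples from both distributions into plain lists of text strings, and that duplicates differ at most in whitespace and capitalisation. The function names and the example sentences are invented for illustration and are not part of either distribution.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def drop_overlap(mascara_samples, semeval_samples):
    """Return the Mascara samples whose text does not also occur in SemEval.

    Assumption: a sample is identified by its text alone. If your extracted
    samples carry document or sample IDs, comparing those is more reliable.
    """
    seen = {normalize(s) for s in semeval_samples}
    unique = []
    for sample in mascara_samples:
        key = normalize(sample)
        if key not in seen:
            seen.add(key)  # also drops repeats within the Mascara list itself
            unique.append(sample)
    return unique


# Invented toy samples, not taken from either corpus.
semeval = ["He was shocked by Vietnam.", "Hungary voted against the proposal."]
mascara = ["He was  shocked by Vietnam.", "The talks with Japan stalled."]

combined = semeval + drop_overlap(mascara, semeval)
```

The sketch keeps the SemEval copy of any duplicated sample, in line with the recommendation above to prefer the SemEval data.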

ISNotes: Information status and anaphora annotation for OntoNotes

50 OntoNotes WSJ articles annotated for information status, with added bridging and comparative anaphora annotation (on top of the OntoNotes coreference annotation). The corpus is available as standoff annotation because you need an OntoNotes 4.0 license to access the underlying texts. The corpus is downloadable from my collaborators at the Heidelberg Institute for Theoretical Studies here. The corpus was used in our ACL 2012, NAACL 2013 and EMNLP 2013 papers.

Leeds Set Element Bank

75 WSJ articles annotated for set element relationships (both intersententially and intrasententially). The corpus is in standoff format because you need a Penn Treebank Release 2 license to access the underlying texts. This data is available here. It was used in our RANLP 2011 and IWCS 2013 papers.

SAL: Sentiment-Annotated Lexicon (WordNet 2.0)

All 120,000 WordNet synsets annotated for subjectivity using the method from our NAACL 2009 paper. This data is available here.

Manually Annotated Word Sense Sentiment

Fangzhong Su and I have annotated over 1000 WordNet synsets for subjectivity. The data, used in our COLING 2008 paper as well as in our NAACL 2009 paper, is available here. The README describes the data and its copyright restrictions.

created 1994-11-04, last modified 2013-08-23