Tags for clause-level annotation
|Clause type||clause||(main): assertion (default), question, directive, other; embedded: proposition, e.question, conditional, temporal, adverbial, attributive|
|Temporal domain||time||past, future, present|
|Modal domain||mood||factual, counterfactual, possible|
|Event structure||event||bounded, ongoing, repeated, stative, cos (change-of-state)|
|Polarity||polarity||positive (default), negative|
- future is incompatible with factual
- present is incompatible with bounded
- With generics, it is not quite clear whether they are factual or possible (He sells vacuum cleaners. -- How many has he sold? -- None so far.). Consider the following two examples. The first sentence, about a snake, does not seem to imply that these snakes ever actually get very hungry, while the second sentence, about the owl, certainly implies that it is sometimes dawn. Or is that a function of world knowledge?
|It can go down for them, when it's very hungry (DAA.6043)|
|At dawn, it goes to sleep. (DAA.0498)|
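The two incompatibility constraints listed above can be checked mechanically. The following is only a minimal sketch in Python, not part of our actual tooling: the function name `check_clause` and the representation of a clause's tags as a dictionary are assumptions made for illustration. The tag names and values are copied from the table above.

```python
# Tag values copied from the table of clause-level tags.
ALLOWED = {
    "time": {"past", "future", "present"},
    "mood": {"factual", "counterfactual", "possible"},
    "event": {"bounded", "ongoing", "repeated", "stative", "cos"},
    "polarity": {"positive", "negative"},
}

# Pairs of (tag, value) that must not co-occur on the same clause.
INCOMPATIBLE = [
    (("time", "future"), ("mood", "factual")),
    (("time", "present"), ("event", "bounded")),
]


def check_clause(tags):
    """Return a list of problems for one clause; empty if none are found.

    `tags` is a hypothetical dict such as {"time": "past", "mood": "factual"}.
    Missing tags are not reported here, only unknown values and conflicts.
    """
    problems = []
    for name, values in ALLOWED.items():
        value = tags.get(name)
        if value is not None and value not in values:
            problems.append(f"unknown {name} value: {value}")
    for (tag1, val1), (tag2, val2) in INCOMPATIBLE:
        if tags.get(tag1) == val1 and tags.get(tag2) == val2:
            problems.append(f"{val1} is incompatible with {val2}")
    return problems


if __name__ == "__main__":
    print(check_clause({"time": "future", "mood": "factual"}))
    print(check_clause({"time": "past", "mood": "factual", "event": "bounded"}))
```

Keeping the constraints in a plain list means that further incompatibilities can be added without touching the checking logic. The open question about generics is deliberately not encoded, since it is not a hard constraint.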
We are not aiming to exhaust the possible combinations of those tag values for each corpus (which would give us 1800 different contexts). Instead, I will list below the most important distinctions we should try to tag according to clause type:
In addition to the obligatory tags, we will use keywords to keep track of properties that are very relevant to our questions, but may only apply to a subset of clauses. Related questions that we will not track systematically through tags include the following:
- Embedded questions: How are embedded polarity questions formed?
- Modal flavor and modal force: How are certain modal meanings expressed that correspond to modal auxiliaries of Indo-European languages such as must and can?
- Adverbial clauses: What types of adverbial clauses do we find?
- Directives: Are they restricted to second-person subjects?
- Structure of texts and discourse: Are there TAM frame setters?
Clause segmentation and annotation
To facilitate annotations on parts of a sentence, i.e., on spans of more than one token that may be smaller than the whole sentence ("ref", after the respective Toolbox default marker \ref), we have introduced "subref" annotations. Each subref annotation defines the span it applies to via the indices of the tokens that span includes.
There are different types of subref annotations (see the legend further below in section [Legend]). The "identified" types make it possible to define several different subref spans within the same ref.
For more details, see Clause segmentation and annotation.
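To illustrate the idea of a span defined by token indices, here is a minimal sketch in Python. It does not reproduce our actual subref format: the class `Subref`, its fields, the function `span_tokens`, and the choice of 0-based indices are all assumptions made for this example. The English translation of example DAA.6043 stands in for the tokens of a ref.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subref:
    """Hypothetical subref: a set of token indices plus the tags on that span."""
    token_indices: List[int]  # assumed 0-based indices into the ref's tokens
    tags: Dict[str, str] = field(default_factory=dict)


def span_tokens(ref_tokens, subref):
    """Return the tokens of the ref covered by the subref, in sentence order."""
    for i in subref.token_indices:
        if not 0 <= i < len(ref_tokens):
            raise IndexError(f"token index {i} is outside the ref")
    return [ref_tokens[i] for i in sorted(set(subref.token_indices))]


if __name__ == "__main__":
    # Stand-in tokens: the English translation, split on whitespace.
    ref = "It can go down for them when it's very hungry".split()

    # Two spans within the same ref, each with its own clause tag.
    main_clause = Subref(list(range(0, 6)), {"clause": "assertion"})
    when_clause = Subref(list(range(6, 10)), {"clause": "temporal"})

    print(span_tokens(ref, main_clause), main_clause.tags)
    print(span_tokens(ref, when_clause), when_clause.tags)
```

Storing indices rather than copies of the tokens keeps the annotation valid as long as the tokenization of the ref does not change, and it allows spans that are not contiguous.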
|Genre||gn||story, explanation, conversation, report, speech|
|Filter||filter||no translation, prime|