# Tagset

## Tags for clause-level annotation

| Category | Name | Values |
|---|---|---|
| Clause type | clause | main: assertion (default), question, directive, other; embedded: proposition, e.question, conditional, temporal, adverbial, attributive |
| Temporal domain | time | past, future, present |
| Modal domain | mood | factual, counterfactual, possible |
| Event structure | event | bounded, ongoing, repeated, stative, cos (change-of-state) |
| Polarity | polarity | positive (default), negative |
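
The value inventory above can double as a validation schema. The Python sketch below assumes that a clause's tags are stored as a dict from tag name to value. That representation and the function name `normalize_clause_tags` are assumptions, not part of the tagset. The function fills in the two stated defaults and rejects unknown names or values.

```python
from __future__ import annotations

# Allowed values for each clause-level tag, transcribed from the table above.
CLAUSE_TAGS: dict[str, set[str]] = {
    "clause": {
        # main clauses
        "assertion", "question", "directive", "other",
        # embedded clauses
        "proposition", "e.question", "conditional",
        "temporal", "adverbial", "attributive",
    },
    "time": {"past", "future", "present"},
    "mood": {"factual", "counterfactual", "possible"},
    "event": {"bounded", "ongoing", "repeated", "stative", "cos"},
    "polarity": {"positive", "negative"},
}

# Defaults named in the table. Simplification: "assertion" is applied to any
# clause without a clause tag, although the default is meant for main clauses.
DEFAULTS = {"clause": "assertion", "polarity": "positive"}


def normalize_clause_tags(tags: dict[str, str]) -> dict[str, str]:
    """Fill in default values and reject unknown tag names or values."""
    result = {**DEFAULTS, **tags}
    for name, value in result.items():
        if name not in CLAUSE_TAGS:
            raise KeyError(f"unknown tag name: {name!r}")
        if value not in CLAUSE_TAGS[name]:
            raise ValueError(f"{value!r} is not a valid value for {name!r}")
    return result
```

For example, `normalize_clause_tags({"time": "past", "event": "bounded"})` adds `clause=assertion` and `polarity=positive`. A misspelled value such as `"time": "pres"` raises a `ValueError`.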

### Incompatibilities

• future is incompatible with factual
• present is incompatible with bounded
• With generics, it is not clear whether they should be tagged factual or possible (e.g., "He sells vacuum cleaners." "How many has he sold?" "None so far."). Consider the following two examples. The first sentence, about a snake, does not seem to imply that these snakes ever actually get very hungry. The second, about an owl, clearly implies that it is sometimes dawn. Or is that difference a matter of world knowledge?
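
| ma | kuowilye | ka | we | seling | myane, | ka | myaa | te | ate | ten |
|---|---|---|---|---|---|---|---|---|---|---|
| REAL | know | MOD | POT | descend | with | COMP | hunger | DIST | bite | very |

'It can go down for them, when it's very hungry.' (DAA.6043)

| ka | or | te | yuop | te | nge | mwe | vyan | pwe | pwer | myaek |
|---|---|---|---|---|---|---|---|---|---|---|
| MOD | place | DIST | be.dawn | DISC | 3s | REAL | go | CONT | stay | be.night |

'At dawn, it goes to sleep.' (DAA.0498)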
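
The two firm constraints in the Incompatibilities list can be checked mechanically. The open question about generics cannot be checked this way. The minimal Python sketch below assumes that a clause's tags are held as a dict such as `{"time": "future", "mood": "factual"}`. That format and the function name `incompatibilities` are assumptions made for illustration.

```python
from __future__ import annotations

# Incompatible (tag, value) pairs, taken from the Incompatibilities list above.
INCOMPATIBLE = [
    (("time", "future"), ("mood", "factual")),
    (("time", "present"), ("event", "bounded")),
]


def incompatibilities(tags: dict[str, str]) -> list[str]:
    """Return one message for every incompatible pair present in `tags`."""
    problems = []
    for (name_a, value_a), (name_b, value_b) in INCOMPATIBLE:
        if tags.get(name_a) == value_a and tags.get(name_b) == value_b:
            problems.append(f"{name_a}={value_a} is incompatible with {name_b}={value_b}")
    return problems
```

An empty list means that no listed constraint is violated. Further pairs can be added to `INCOMPATIBLE` if the generics question is settled.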

### Prioritization

We are not aiming to cover every possible combination of these tag values in each corpus (10 clause types × 3 × 3 × 5 × 2 = 900 different contexts). Instead, the table below lists, for each clause type, how many values of each tag we should try to distinguish. The Total column gives the resulting number of combinations.

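| Clause type | Time | Mood | Event | Polarity | Total |
|---|---|---|---|---|---|
| Assertion | 3 | 3 | 5 | 2 | 90 |
| Question | 3 | 1 | 1 | 2 | 6 |
| Directive | 1 | 1 | 1 | 2 | 2 |
| Other | 0 | 0 | 0 | 0 | 0 |
| Proposition | 3 | 3 | 1 | 2 | 18 |
| E.question | 3 | 1 | 1 | 1 | 3 |
| Conditional | 3 | 2 | 1 | 2 | 12 |
| Temporal | 3 | 2 | 1 | 1 | 6 |
| Adverbial | 3 | 3 | 1 | 1 | 9 |
| Attributive | 3 | 1 | 1 | 2 | 6 |
| Total | | | | | 152 |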
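
As a check on the arithmetic, the Python sketch below recomputes each row as the product of its four value counts and sums the rows. It also computes the size of the full space for comparison: 10 clause types times the full inventories of 3 time, 3 mood, 5 event, and 2 polarity values. The dictionary simply copies the table and is not a file format used by the project.

```python
from math import prod

# Value counts (time, mood, event, polarity) per clause type, copied from the table above.
PLANNED = {
    "assertion": (3, 3, 5, 2),
    "question": (3, 1, 1, 2),
    "directive": (1, 1, 1, 2),
    "other": (0, 0, 0, 0),
    "proposition": (3, 3, 1, 2),
    "e.question": (3, 1, 1, 1),
    "conditional": (3, 2, 1, 2),
    "temporal": (3, 2, 1, 1),
    "adverbial": (3, 3, 1, 1),
    "attributive": (3, 1, 1, 2),
}

# Full value inventories from the tag table: 3 time, 3 mood, 5 event, 2 polarity.
FULL_COUNTS = (3, 3, 5, 2)

row_totals = {clause: prod(counts) for clause, counts in PLANNED.items()}
planned_total = sum(row_totals.values())
full_space = len(PLANNED) * prod(FULL_COUNTS)

for clause, total in row_totals.items():
    print(f"{clause:<12}{total:>4}")
print(f"{'total':<12}{planned_total:>4}")
print(f"full space: {full_space}")
```

Run as is, it reports 90 combinations for assertions, 152 planned combinations in total, and a full space of 900.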

In addition to the obligatory tags, we will use keywords to track properties that are highly relevant to our questions but may apply only to a subset of clauses. Related questions that we will not track systematically through tags include the following:

• Embedded questions: How are embedded polarity questions formed?
• Modal flavor and modal force: How are modal meanings expressed that correspond to the modal auxiliaries of Indo-European languages, such as English *must* and *can*?
• Directives: Are they restricted to second-person subjects?
• Structure of texts and discourse: Are there TAM frame setters?

### Clause segmentation and annotation

Some annotations apply to part of a sentence, i.e., to a span of more than one token that may be smaller than the whole sentence. The whole sentence is called a "ref", after the Toolbox default marker \ref. To support such annotations, we have introduced "subref" annotations. Each subref defines the span it applies to by listing the indices of the tokens it includes.

There are different types of subref annotations (see the legend further below, in section [Legend]). The "identified" types make it possible to define several different subref spans within the same ref.
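
Because a subref is defined by token indices, resolving it to the tokens it covers is a simple lookup. The Python sketch below rests on several assumptions:

- The `Subref` class is a hypothetical in-memory form, not the project's storage format.
- Indices are taken to be 1-based.
- The `label` field stands in for whatever distinguishes "identified" subrefs within one ref.
- The tokens come from example DAA.6043 above, with the comma dropped, which is itself an assumption about tokenization.
- The `clause` tag on the example subref is purely illustrative and is not a corpus annotation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subref:
    """Hypothetical in-memory form of a subref annotation."""
    indices: list[int]                                  # token positions, assumed 1-based
    tags: dict[str, str] = field(default_factory=dict)
    label: str | None = None                            # tells "identified" subrefs apart


def span_tokens(tokens: list[str], subref: Subref) -> list[str]:
    """Return the tokens covered by a subref, in sentence order."""
    positions = sorted(set(subref.indices))
    for i in positions:
        if not 1 <= i <= len(tokens):
            raise IndexError(f"token index {i} outside 1..{len(tokens)}")
    return [tokens[i - 1] for i in positions]


# Tokens of example DAA.6043, with the comma dropped (assumed tokenization).
tokens = "ma kuowilye ka we seling myane ka myaa te ate ten".split()
# Illustrative subref over the clause glossed "when it's very hungry".
when_clause = Subref(indices=[7, 8, 9, 10, 11], tags={"clause": "temporal"}, label="a")
print(span_tokens(tokens, when_clause))
```

The sketch accepts any set of indices rather than only contiguous ranges, because the text says only that a subref lists the indices of the tokens it includes.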

For more details, see Clause segmentation and annotation.

| Category | Name | Values |
|---|---|---|
| Notes | nt | Free text |
| Certainty | ct | |
| Age | age | Birth year of participants |
| Gender | gen | Gender of participants |
| Storyboard | storyboard | Title of the storyboard used in elicitation, e.g., Woodchopper |
| Frame number | frame | Title plus number of the frame, e.g., StoryboardsWoodchopper01 |
| PDF page number | ppdf | Number of the PDF page containing the frame |
| DOI | sbdoi | Storyboard DOI with resolver prefix (https://doi.org/) |
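
Since `sbdoi` values carry the resolver prefix, a small helper can normalize DOIs that arrive bare or with another common prefix. This Python sketch is hypothetical: the function name and the list of accepted input prefixes are assumptions, and `10.1234/example` is a placeholder, not a real storyboard DOI.

```python
RESOLVER = "https://doi.org/"

# Prefixes that may already be attached to a DOI string (matched case-insensitively).
KNOWN_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def sbdoi_value(doi: str) -> str:
    """Strip any known prefix and return the DOI with the https://doi.org/ resolver prefix."""
    doi = doi.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return RESOLVER + doi.strip()


print(sbdoi_value("doi:10.1234/example"))
```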

## Text-level tags

| Category | Name | Values |
|---|---|---|
| Genre | gn | story, explanation, conversation, report, speech |
| Synopsis | syn | free text |
| Filter | filter | no translation, prime |

## \ref-level tags

| Name | Description |
|---|---|
| `\rtx` | Retained `\tx` line, i.e., the original `\tx` line, untokenized, with whitespace normalized |
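
The sketch below derives `\rtx` from `\tx` in Python. It assumes that "normalized for whitespace" means collapsing runs of whitespace to single spaces and trimming both ends; such runs include the spaces used to align interlinear lines. The record parser is a simplified reading of Toolbox's backslash-marker format: a line starting with `\marker` opens a field, and any other line continues the previous field. The function names and the sample record (`\ref 001`) are hypothetical.

```python
from __future__ import annotations

import re

# A field line starts with a backslash marker, e.g. "\tx ma kuowilye".
FIELD_LINE = re.compile(r"\\(\S+)\s?(.*)$")


def parse_record(text: str) -> list[tuple[str, str]]:
    """Split a Toolbox-style record into (marker, value) pairs."""
    fields: list[list[str]] = []
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields.append([match.group(1), match.group(2)])
        elif fields:
            # Continuation line: append it to the previous field's value.
            fields[-1][1] += " " + line
    return [(marker, value) for marker, value in fields]


def retained_tx(tx: str) -> str:
    """Collapse whitespace runs to single spaces and trim the ends."""
    return " ".join(tx.split())


record = r"""\ref 001
\tx ma   kuowilye  ka
  we seling"""
fields = parse_record(record)
print(retained_tx(dict(fields)["tx"]))
```

Here the continuation line is joined to the `\tx` field before normalization, so the retained line reads as one untokenized string.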