User's manual
User roles
Each user has one of three roles in the system: reader, editor or administrator.
- Reader – may view documents.
- Editor – may edit the documents (Section 6).
- Administrator – has access to configuration (Section 7).
Roles are assigned to users by the administrator.
Getting started
The system can be accessed from: IA tagger home
It requires one of the following browsers: Chrome, Opera or Mozilla Firefox. To start, log in. The default password is tagger; you can change it at any time.
login | <request login from administrator> |
password | tagger |
How to upload a text document
Select Documents. Click File/Wybierz dokument (Polish for "Choose document") to select the text document to upload. In the Language window, select the language of the document. Click Submit to confirm the choice of file and language. The name of the file will be added to the list of documents uploaded so far (Documents list).
During the upload, the text is split into sentences and words. One line corresponds to one sentence. A string of characters between spaces is interpreted as a word.
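The splitting rule above can be illustrated with a minimal sketch. This is not the system's actual code; the function name split_document is hypothetical, and skipping blank lines and ignoring repeated spaces are assumptions.

```python
def split_document(text: str) -> list[list[str]]:
    """Split raw text into sentences (one per line) and words (space-separated)."""
    sentences = []
    for line in text.splitlines():
        # A word is any string of characters between spaces;
        # empty strings produced by repeated spaces are dropped (assumption).
        words = [w for w in line.split(" ") if w]
        if words:  # blank lines produce no sentence (assumption)
            sentences.append(words)
    return sentences


document = split_document("first sentence here\nsecond one")
```

Note that, under this rule, punctuation attached to a word stays part of that word, which is why the sentence and word editing tools described below may be needed after upload.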
How to open a document
Select the Documents option and click the name of the document from the list (or the adjacent icon).
How to save a document
You don’t have to save the document. Each modification is automatically saved.
How to annotate a document
An opened document is split into sentences. Click next to the sentence you want to annotate. You can navigate between sentences using CTRL + arrow (up or down).
Adjusting the sentence splitting
You can adjust the automatic sentence splitting. Switch on the Edit mode (just below the menu). You can split the sentences with the scissors icon or merge them with the glue icon.
Editing the contents of a sentence
You can modify a sentence by adding or deleting words. Click any word in a sentence. You will see the following icons:
- Insert a new word before the current one (available for all words that are not postpositions).
- Add a new word after the current one (available only for the last word).
- Remove the word (available only for words that do not have postpositions; to remove a word that has a postposition, you must first separate it from the postposition).
- Mark the next word as a postposition, using the icon or Ctrl-Y (available only for words that do not have postpositions, are not postpositions themselves and do not end the line).
- Separate the word from the postposition, using the icon or Ctrl-U (available only for postpositions or words that have postpositions).
Word breaking
Words can be split into a stem and a suffix. Click the word, select the position of the split with the mouse and press Ctrl-J. Confirm with Enter. To remove the split, press Ctrl-K and confirm with Enter.
Word annotation
Words can be annotated at six different levels (levels can be set in the Configuration menu, Section 7). To annotate a word at a given level, click that level under the word.
Annotation levels
- LEXEME = equivalent English gloss
- GRAMMAR = grammatical role.
Type the starting character(s) of the tag and you will see the list of all tags beginning with the typed characters. For example, typing the ‘f’ character displays three tags starting with ‘f’: F (feminine), FOC (focus), FUT (future); typing the ‘pr’ string displays six tags starting with ‘pr’. Click the tag from the list to annotate the word.
- OTHER LEVELS
Annotation at other levels consists of selecting one or more suggestions offered by the system. To select or deselect a suggestion, click it or press its shortcut key (given in brackets). Moving to another edit box or pressing Enter saves the value of the current edit box.
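The tag lookup described for the GRAMMAR level behaves like a prefix filter. The sketch below is only an illustration: the function name and the tag list are hypothetical (the list contains only tags mentioned in this manual, not the full set configured in the system), and case-insensitive matching is an assumption based on the ‘f’ example.

```python
# Hypothetical subset of GRAMMAR tags, taken from examples in this manual.
GRAMMAR_TAGS = ["F", "FOC", "FUT", "INS", "LOC", "M", "PTCP", "SG"]


def tags_starting_with(prefix: str, tags: list[str]) -> list[str]:
    """Return the tags that begin with the typed characters (case-insensitive)."""
    typed = prefix.upper()
    return [tag for tag in tags if tag.upper().startswith(typed)]


suggested = tags_starting_with("f", GRAMMAR_TAGS)
```

With this sample list, typing ‘f’ yields F, FOC and FUT, as in the example above.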
Sentence annotation
Sentences are annotated at two levels (levels are set up in the Configuration menu, Section 7): Add info (additional information) and English (English translation). At both levels sentences are annotated manually.
Tags suggested by the system
The suggestion cloud
IA tagger suggests tags for words that fulfil one of the following conditions:
- The same word has already occurred in one of the documents and has been tagged.
- The structure of the word allows the system to automatically deduce its tags.
Suggestions for tags appear in a cloud above the word. At most three suggestion lines may appear for one word. Each line suggests tags for a few levels. For instance, the suggestions for the word nagari look as follows:
You can accept a chosen suggestion by clicking the ‘check’ symbol on the left side of the cloud. In the above example, the first two suggestions are generated according to annotations already existing in the system. By clicking the edit icon you can go to the word that was used to generate the suggestion. The number in the red border is the suggestion frequency score. It corresponds to the number of words in the system annotated with the tags of the suggestion.
The third suggestion in the example is marked with the letter R in a red border. This means that it was generated according to predefined rules. The rule used to generate this suggestion is "*|i". It catches words with the suffix "i".
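To make the rule "*|i" more concrete, here is a minimal sketch of how such a rule might be matched. The manual only documents this one rule, so the interpretation is an assumption: "*" stands for any non-empty stem and "|" separates the stem from the suffix. The function name match_rule is hypothetical.

```python
def match_rule(rule: str, word: str):
    """Return (stem, suffix) if the word matches the rule, otherwise None.

    Assumed rule format: "<stem pattern>|<suffix>", where the stem pattern
    "*" matches any non-empty stem. Other stem patterns are not handled.
    """
    stem_pattern, suffix = rule.split("|", 1)
    if stem_pattern != "*":
        raise ValueError("only '*' stem patterns are handled in this sketch")
    if suffix and word.endswith(suffix) and len(word) > len(suffix):
        return word[: -len(suffix)], suffix
    return None


result = match_rule("*|i", "nagari")
```

Under these assumptions the word nagari matches with the stem nagar and the suffix i, while a word not ending in i does not match.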
Accepting the first suggestion (shortcut Ctrl + 1) would add the following tags to the annotation:
Level | Tags |
---|---|
LEXEME | town |
GRAMMAR | LOC, M, SG |
POS | NOUN |
Tags overwritten by suggestions
Note that if the word for which the suggestions were generated had already been annotated, the suggested tags would overwrite existing annotations on corresponding levels. Original annotations on the levels which are absent in the suggestion are left untouched after applying this suggestion. In the above example, let us suppose that the word nagari already has the following annotations:
Level | Tags |
---|---|
GRAMMAR | INS |
SEM | REC |
If we now apply the first suggestion, the tag INS on the level GRAMMAR will be overwritten with 3 tags: LOC, M and SG, while the tag REC on the level SEM will be left untouched.
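The overwrite behaviour can be summarised as a per-level replacement. The following sketch reproduces the nagari example; the function name and the dictionary representation of annotations are assumptions made for illustration, not the system's internal data model.

```python
def apply_suggestion(existing: dict[str, list[str]],
                     suggestion: dict[str, list[str]]) -> dict[str, list[str]]:
    """Overwrite levels present in the suggestion; keep all other levels."""
    result = {level: list(tags) for level, tags in existing.items()}
    for level, tags in suggestion.items():
        result[level] = list(tags)  # replaces any tags already on this level
    return result


existing = {"GRAMMAR": ["INS"], "SEM": ["REC"]}
first_suggestion = {"LEXEME": ["town"], "GRAMMAR": ["LOC", "M", "SG"], "POS": ["NOUN"]}
merged = apply_suggestion(existing, first_suggestion)
```

In the result, GRAMMAR holds LOC, M and SG instead of INS, SEM still holds REC, and LEXEME and POS are added.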
System configuration
The configuration of the system can be set solely by the administrator - other users don’t have access to configuration options.
By clicking Configuration you can manage the following options:
Users
This option allows you to manage system users. You can add a new user (+ Add user) or delete a user (Delete). You can assign or change the role of a user (Change Role). You can reset a user’s password to the default password (Reset password), e.g. when the user forgets the password.
Languages
This option allows you to manage the languages of tagged documents. You can add a new language (+ Add language). The language will be added to the list of languages supported by the system. Once a new language is added, opening and tagging documents in this language becomes possible.
You can delete a chosen language (Delete) or edit it (Edit). Editing a language allows you to change the language code (Code), which serves as a short identifier of the language, or to change the language description (Description).
Word annotation levels
This option allows you to manage the levels of word annotation. For each level you manage the following features:
- Name
- Description
- Strict Choice - if this option is selected (+), annotations on this level have the form of tags (e.g. POS), not text (e.g. LEXEME)
- Multiple Choice - if this option is selected (+) in addition to the above, tags for the words are chosen from a long list by their names (e.g. GRAMMAR)
Not selecting any of these options means that the word on this level should be annotated with text (e.g. LEXEME).
Levels are set in a pre-defined order. For example, by default the POS level is set as the first level. You may change the order of the levels by using arrows in the Order column. You can add a new level (+ Add word annotation level) or delete an existing level (Delete). You can edit the existing level (Edit), i.e. edit its name or the description, or modify Strict Choice and Multiple Choice.
For each level you can add or delete tags. To do so, use the option Edit tags. For each tag you should define its value (Value) and description (Description). During tagging, the editor may annotate words only with the tag values defined for the given level.
You can change the order in which tags are suggested to the user. For example, for the POS level, the first suggestion on the list is the NOUN tag. You can move it to another position on the list by using arrows.
Sentence annotation levels
This option allows you to manage the levels of sentence annotation, i.e. the levels on which the Editor annotates whole sentences. Managing sentence annotation levels is similar to managing word annotation levels (word tags are not configured though).
Statistics
You can generate statistics of selected/tagged texts by selecting the Statistics option from the main menu. There are two types of statistics: word statistics and collocation statistics.
Word statistics
You can display word statistics by selecting a caption starting with a black dot, e.g.
Verb participles (PTCP on the level "Grammar", V on "SYNTAX")
The system displays the number of words for which the tags on the GRAMMAR level include PTCP and the tags on the SYNTAX level include V. Moreover, the list of words meeting the criteria, along with all their tags, is displayed.
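The word statistics query can be thought of as a filter over tagged words. The sketch below uses a hypothetical data layout (each word is a dictionary with a form and per-level tag lists) and placeholder word forms; it is an illustration, not the system's implementation.

```python
def word_statistics(words, criteria):
    """Return (count, matching words) for words whose tags include every required tag.

    criteria maps a level name to the tag that must be present on that level.
    """
    matches = [
        word for word in words
        if all(tag in word["tags"].get(level, []) for level, tag in criteria.items())
    ]
    return len(matches), matches


# Placeholder sample data, not taken from any real document.
words = [
    {"form": "w1", "tags": {"GRAMMAR": ["PTCP", "M"], "SYNTAX": ["V"]}},
    {"form": "w2", "tags": {"GRAMMAR": ["PTCP"], "SYNTAX": ["A"]}},
    {"form": "w3", "tags": {"GRAMMAR": ["INS"], "SYNTAX": ["V"]}},
]
count, matches = word_statistics(words, {"GRAMMAR": "PTCP", "SYNTAX": "V"})
```

Only the first placeholder word satisfies both conditions, so it is the only one counted and listed.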
Collocation statistics
You can display collocation statistics by selecting a caption starting with an empty dot, e.g.
PTCP(PTCP) + A(INS)
The system displays information on pairs of words occurring in the same sentence, the first of which meets the criteria PTCP(PTCP) (tags for both GRAMMAR and POS include PTCP) and the other meets the criteria A(INS) (tags for SYNTAX include A and tags for GRAMMAR include INS). Moreover, for each pair the system gives the distance between the words in the text (how many words are between them).
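The pairing and the distance measure can be sketched as follows. The data layout, function names and word forms are hypothetical. Whether the system considers pairs in both orders within a sentence is not stated in this manual, so the sketch assumes it does; the distance is the number of words between the two members of the pair, so adjacent words have distance 0.

```python
def meets(word, criteria):
    """True if the word carries every required tag on every required level."""
    return all(tag in word["tags"].get(level, []) for level, tag in criteria.items())


def collocations(sentence, first, second):
    """Return (form_a, form_b, distance) for each matching pair in one sentence."""
    pairs = []
    for i, a in enumerate(sentence):
        if not meets(a, first):
            continue
        for j, b in enumerate(sentence):
            if i != j and meets(b, second):
                # Distance = number of words lying between the two words.
                pairs.append((a["form"], b["form"], abs(i - j) - 1))
    return pairs


# Placeholder sentence, not taken from any real document.
sentence = [
    {"form": "w1", "tags": {"GRAMMAR": ["PTCP"], "POS": ["PTCP"]}},
    {"form": "w2", "tags": {}},
    {"form": "w3", "tags": {"SYNTAX": ["A"], "GRAMMAR": ["INS"]}},
]
pairs = collocations(
    sentence,
    first={"GRAMMAR": "PTCP", "POS": "PTCP"},
    second={"SYNTAX": "A", "GRAMMAR": "INS"},
)
```

Here one pair is found, with one word between its members.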
Filtering by documents
All types of statistics can be filtered by document. At the top of the Statistics page you can find the list of all documents in the system with corresponding checkboxes. If the checkbox is selected, the words from this document are counted and shown in the statistics. If the checkbox is empty, all the words from the corresponding document are omitted from the statistics.
Clicking on the checkboxes refreshes the statistics automatically.