Results

Task 1 (A)

| System name | POS accuracy (known words) | POS accuracy (unknown words) | POS accuracy (Subtask A score) |
|---|---|---|---|
| Toygger | 95.2425 | 65.4741 | 94.6343 |
| KRNNT_AB | 94.4888 | 61.1807 | 93.8083 |
| KRNNT_ABv | 94.3022 | 62.0751 | 93.6438 |
| NeuroParser | 94.209 | 64.9374 | 93.6109 |
| KRNNT_AB_morf1 | 93.6716 | 59.7496 | 92.9785 |
| AvgPer_Forced | 91.4104 | 67.0841 | 90.9134 |
| AvgPer_RAW | 89.4776 | 62.4329 | 88.925 |
| Morphosyntactic disambiguer | N/A | N/A | N/A (submitted file did not comply with the standard) |
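The overall Subtask A score appears to be a token-weighted average of the known-word and unknown-word accuracies (the figures in the table are consistent with roughly 2% of test tokens being unknown). A minimal sketch of that combination, assuming this weighting; the ~2.04% unknown-token fraction is inferred from the published numbers, not stated in the task description:

```python
def overall_accuracy(acc_known, acc_unknown, frac_unknown):
    """Token-weighted average of known-word and unknown-word accuracy."""
    return acc_known * (1 - frac_unknown) + acc_unknown * frac_unknown

# With an assumed ~2.04% unknown tokens, Toygger's per-class accuracies
# reproduce its published Subtask A score of 94.6343:
print(round(overall_accuracy(95.2425, 65.4741, 0.02043), 2))
```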

Notes

Task 1 (A) and (B) evaluation data set (gold labels): gold-task-a-b.xml.gz

Command used to evaluate systems:

PYTHONIOENCODING=utf8 ./tagger-eval.py -t poleval submission.xml gold-task-a-b.xml

Task 1 (B)

| System name | Lemmatization accuracy (known words) | Lemmatization accuracy (unknown words) | Lemmatization accuracy (Subtask B score) |
|---|---|---|---|
| KRNNT_AB | 98.194 | 80.8587 | 97.8398 |
| KRNNT_ABv | 97.959 | 79.7853 | 97.5876 |
| KRNNT_AB_morf1 | 97.7724 | 80.1431 | 97.4122 |
| NeuroParser | 97.3731 | 84.6154 | 97.1125 |

Notes

Task 1 (A) and (B) evaluation data set (gold labels): gold-task-a-b.xml.gz

Command used to evaluate systems:

PYTHONIOENCODING=utf8 ./tagger-eval.py -t poleval submission.xml gold-task-a-b.xml

Task 1 (C)

| System name | POS accuracy | Lemmatization accuracy | Overall accuracy (Subtask C score) |
|---|---|---|---|
| KRNNT_voted | 92.9833 | 96.9089 | 94.9461 |
| KRNNT_raw | 92.6242 | 96.8581 | 94.7411 |
| NeuroParser | 91.5865 | 97.0032 | 94.2949 |
| MorphoDiTaPL | 89.6746 | 95.7769 | 92.7258 |

Notes

Task 1 (C) evaluation data set (gold labels): gold-task-c.xml.gz

Command used to evaluate systems:

PYTHONIOENCODING=utf8 ./tagger-eval.py -t poleval -s submission.xml gold-task-c.xml

Task 2

Task 2 evaluation data set (gold labels): gold_labels

Results computed against the gold labels file:

| System name | Variant | Accuracy |
|---|---|---|
| Tree-LSTM-NR | predictions2 | 0.795 |
| Tree-LSTM-NR | predictions3 | 0.795 |
| Tree-LSTM-NR | predictions1 | 0.779 |
| Paschał | results | 0.770 |
| Pawsemble | output-labels | 0.769 |
| Tree-LSTM-ML | test_file | 0.768 |
| Alan Turing climbs a tree | fast_emblr | 0.678 |
| Alan Turing climbs a tree | slow_emblr | 0.671 |
| Alan Turing climbs a tree | ens | 0.670 |
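The Task 2 score is presumably plain label accuracy against the gold file, i.e. the fraction of items whose predicted label matches the gold label. A minimal sketch under that assumption (the label values shown are illustrative, not from the actual data set):

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted label lists must be the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# One of four hypothetical labels disagrees, so accuracy is 0.75:
print(accuracy(["pos", "neg", "pos", "neu"], ["pos", "neg", "neu", "neu"]))
```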