Predicted non-projectivity. How accurately each model predicts the number of non-projective structures in each document. Each (x, y) datapoint compares the number of gold non-projective arcs in a document (x) with the number of non-projective arcs in the predicted output for that document (y). The area of each point is proportional to the number of documents it covers. A perfect model would place all points on the diagonal y = x (dashed grey line). The Stanford 1N and 2N decoders, despite being able to produce non-projective structures, produce almost entirely projective output. The UMass and FAUST models produce more non-projective structures, though neither is especially precise at predicting the correct amount of non-projectivity. These experiments were performed on the development section of Genia.
McClosky et al. BMC Bioinformatics 2012 13(Suppl 11):S9 doi:10.1186/1471-2105-13-S11-S9
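The datapoints described in the caption can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: each tree is encoded as a list of 1-based head indices with 0 as an artificial root, an arc counts as non-projective when some token between its head and dependent is not a descendant of the head, and a document is a list of such trees. The function names `count_nonprojective_arcs` and `scatter_points` are hypothetical, and the authors' exact counting procedure may differ.

```python
from collections import Counter


def is_descendant(node, ancestor, heads):
    """Return True if `ancestor` is `node` itself or lies on its path to the root."""
    # The loop bound guards against malformed (cyclic) input.
    for _ in range(len(heads) + 1):
        if node == ancestor:
            return True
        if node == 0:  # reached the artificial root without meeting `ancestor`
            return False
        node = heads[node - 1]
    return False


def count_nonprojective_arcs(heads):
    """Count non-projective arcs in one tree.

    Assumed encoding: heads[i] is the head of token i + 1, and 0 is the root.
    An arc is counted when a token strictly between head and dependent
    is not dominated by the head.
    """
    count = 0
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(dep, head), max(dep, head)
        if any(not is_descendant(k, head, heads) for k in range(lo + 1, hi)):
            count += 1
    return count


def scatter_points(gold_docs, pred_docs):
    """Map each (gold count, predicted count) pair to the number of documents.

    The document count per pair corresponds to the point area in the plot.
    """
    points = Counter()
    for gold_trees, pred_trees in zip(gold_docs, pred_docs):
        x = sum(count_nonprojective_arcs(t) for t in gold_trees)
        y = sum(count_nonprojective_arcs(t) for t in pred_trees)
        points[(x, y)] += 1
    return points


# Toy data only: three one-tree documents, not drawn from Genia.
gold_docs = [[[3, 0, 2, 2]], [[2, 0, 2]], [[3, 0, 2, 2]]]
pred_docs = [[[2, 0, 2, 2]], [[2, 0, 2]], [[2, 0, 2, 3]]]
points = scatter_points(gold_docs, pred_docs)
```

In this toy input, points below the diagonal y = x correspond to documents where the predicted output contains fewer non-projective arcs than the gold annotation.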