Figure 7.
Estimated success of the extraction pipeline, based on manual validation by human experts. These percentages were calculated using a manual sampling and validation protocol
conducted on 100 abstracts. Correct – Database confirmed: These are the mutations
that have already been found in at least one of the analyzed databases (UniProt, SAAPdb,
COSMIC, KinMutBase or Greenman). Correct – Manual validation: This subset corresponds
to the mutation-protein pairs that were found to be correct after manual validation
on 100 abstracts. Correct – Orthologue: This subset corresponds to the cases where
the mapping is confirmed by manual validation and the mutation is mapped to a non-human
orthologue. Incorrect Mutation to Protein Assignment: Cases where
both candidate proteins share the same amino acid at the mutated position and the algorithm
chooses the incorrect pair. Incorrect Mutation assignment: Cases where the mutation
is not properly identified. A notable special case is confusion with
cell lines (accounting for 66% of this category). Too ambiguous even for human experts:
Rare, poorly informative cases where even human experts reading the abstracts are
unable to identify which protein the mutation corresponds to.
Krallinger et al. BMC Bioinformatics 2009 10(Suppl 8):S1 doi:10.1186/1471-2105-10-S8-S1