Table 1

Orthographic features.

| Orthographic Feature | Reg. Exp. |
| --- | --- |
| Init Caps | |
| Init Caps Alpha | [A-Z][a-z]* |
| All Caps | |
| Caps Mix | |
| Has Digit | |
| Single Digit | |
| Double Digit | |
| Natural Number | |
| Real Number | [-0-9]+[.,]+[0-9.,]+ |
| | [ivxdlcm]+ or [IVXDLCM]+ |
| Has Dash | |
| Init Dash | |
| End Dash | |



This table defines the complete set of orthographic predicates used by the system. The observation list for each token includes one predicate for every regular expression that the token matches.
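As a rough illustration of how such an observation list could be built, the sketch below tests a token against a small set of patterns and collects the name of each one that matches. It is not the authors' implementation. It uses only the three expressions that are legible in Table 1, reading the real-number pattern as `[-0-9]+[.,]+[0-9.,]+` (an assumption about the intended bracket placement). The predicate names, the `RomanLike` label, the function name `orthographic_predicates`, and the choice to require a match over the whole token are all assumptions made for this example.

```python
import re

# Only the patterns legible in Table 1 are included. The dictionary keys are
# hypothetical predicate names chosen for this sketch, not the authors' names.
ORTHOGRAPHIC_PATTERNS = {
    "InitCapsAlpha": re.compile(r"[A-Z][a-z]*"),
    # Assumed reading of the real-number expression in the table.
    "RealNumber": re.compile(r"[-0-9]+[.,]+[0-9.,]+"),
    # The table gives "[ivxdlcm]+ or [IVXDLCM]+"; "or" is written as "|".
    "RomanLike": re.compile(r"[ivxdlcm]+|[IVXDLCM]+"),
}


def orthographic_predicates(token):
    """Return the names of all patterns that match the whole token.

    Whole-token matching (re.fullmatch) is an assumption of this sketch.
    """
    return [
        name
        for name, pattern in ORTHOGRAPHIC_PATTERNS.items()
        if pattern.fullmatch(token)
    ]


if __name__ == "__main__":
    for tok in ["Protein", "3.14", "IV", "I", "kinase"]:
        print(tok, orthographic_predicates(tok))
```

Note that a single token can fire several predicates at once: under these assumptions, the token "I" matches both the capitalized-alphabetic pattern and the Roman-numeral-like pattern, so both names appear in its observation list.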

McDonald and Pereira BMC Bioinformatics 2005 6(Suppl 1):S6   doi:10.1186/1471-2105-6-S1-S6
