Named-entity recognition (NER) plays a vital role in information extraction, question
answering and text mining. Classic NER research has focused on tagging
instances of PERSON, LOCATION and ORGANISATION in the newswire domain.
More recent fine-grained NER (FG-NER) covers subtypes of the classic NE types. The goal of this
study was to investigate an FG-NER scenario with a set of new specific NEs (SNEs)
typical of a new, restricted journalistic domain. Reports on the birth of animals in zoos
were identified as such a productive domain. A 700-document corpus (241K tokens)
named ZooBirth was compiled from a newspaper archive and annotated. It contained
2,811 instances of the ten most frequent numerical SNEs shortlisted from 43
candidates. Using Conditional Random Fields (CRF) made it possible to test positional and
order-within-document features, which were hypothesized to improve the tagging of SNEs. In
support of positional features, an analysis of the distribution of SNEs within documents
yielded SNE-specific patterns. The token-position feature produced a statistically
significant but modest improvement for two SNEs (82.2 to 84.4 strict
precision, and 59.5 to 61.1 F-measure). Order-effect features yielded a statistically
significant improvement in F-measure when tagging the weight at birth (from 68.4 to 71.1
strict, and from 75.5 to 80.6 lenient). In the final stage of the study, a novel technique
named subtractive tagging was introduced to enrich the negative examples used to train the CRF.
When tagging the newborn animal’s date of birth and the age of its mother, strict recall
improved with statistical significance from 52.8 to 60.1 and from 65.5 to 68.9, respectively.
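The strict and lenient scores above depend on how a predicted SNE span is matched against a gold span, and the abstract does not spell out the matching rule. The following is a minimal sketch under a common convention, which is an assumption here: strict matching requires identical boundaries and label, while lenient matching accepts any token overlap with the same label. The span tuples, the function name `prf`, and the label names are hypothetical illustrations, not the study's own code.

```python
from typing import Callable, List, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def _exact(a: Span, b: Span) -> bool:
    # Strict match (assumed convention): same boundaries and same label.
    return a == b


def _overlaps(a: Span, b: Span) -> bool:
    # Lenient match (assumed convention): same label and at least one shared token.
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def prf(gold: List[Span], pred: List[Span], lenient: bool = False) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) over entity spans."""
    match: Callable[[Span, Span], bool] = _overlaps if lenient else _exact
    # Precision counts predictions that match some gold span;
    # recall counts gold spans that are matched by some prediction.
    correct_pred = sum(1 for p in pred if any(match(p, g) for g in gold))
    found_gold = sum(1 for g in gold if any(match(p, g) for p in pred))
    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Usage with hypothetical labels: the first prediction truncates a gold span.
gold = [(0, 2, "WEIGHT_AT_BIRTH"), (5, 6, "DATE_OF_BIRTH")]
pred = [(0, 1, "WEIGHT_AT_BIRTH"), (5, 6, "DATE_OF_BIRTH"), (8, 9, "DATE_OF_BIRTH")]
strict = prf(gold, pred)
lenient = prf(gold, pred, lenient=True)
```

In this toy example the strict F-measure is 0.4 and the lenient one is 0.8 (up to floating-point rounding), because the truncated weight span only counts under lenient matching. Counting matched predictions and matched gold spans separately keeps precision and recall well defined when, under lenient matching, one prediction overlaps several gold spans.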