The Smart Taxonomy Facilitator (STF) Skill Cartridge®, which can be used to extract terms and noun phrases from thesauri, acts as a vehicle for applying taxonomies and controlled vocabularies to documents. This framework is distributed under the Creative Commons Attribution license (CC BY), which allows its users to distribute their own cartridges built with TEMIS technology.
Additional capabilities of STF include:
- Exploiting part-of-speech tagging information to avoid false positives caused by ambiguous taxonomical terms.
- Cleaning a thesaurus by handling inconsistent naming and enabling proper identification of duplicates and variants that appear in different places in the hierarchy.
- Normalizing the extracted terms of other Skill Cartridges® according to the user-specific domain.
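The thesaurus-cleaning idea in the list above can be illustrated with a minimal Python sketch. It is not STF code: the function names, the slash-separated path format, and the normalization rules (lowercasing, treating punctuation as spaces) are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalize_label(label):
    """Lowercase a label, turn punctuation into spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", label.lower())
    return " ".join(cleaned.split())


def find_duplicates(paths):
    """Group hierarchy paths whose leaf labels normalize to the same key.

    `paths` is a hypothetical list of slash-separated thesaurus paths.
    Only keys shared by more than one path are returned.
    """
    groups = defaultdict(list)
    for path in paths:
        leaf = path.split("/")[-1]
        groups[normalize_label(leaf)].append(path)
    return {key: found for key, found in groups.items() if len(found) > 1}


thesaurus_paths = [
    "Science/Biology/Cell-Division",
    "Medicine/Oncology/cell division",
    "Science/Physics/Optics",
]
duplicates = find_duplicates(thesaurus_paths)
```

Here the two "cell division" entries are reported together even though they differ in case and hyphenation and sit under different branches.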
The strength of the STF Skill Cartridge® is its ability to find variants of thesaurus terms. It embeds technologies that help overcome two key weaknesses associated with taxonomy-based indexing:
- Fuzzy Term Matching: STF automatically produces variants of the forms of terms present in the taxonomy, thereby helping to improve recall.
- Relevance Scoring: STF applies a range of heuristics to assign a relevance score to each extracted concept and discards the less relevant ones, therefore improving extraction precision.
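The two mechanisms above can be sketched in Python as follows. This is only an illustration under stated assumptions: the variant rules (hyphen, space, and joined spellings plus a naive plural), the scoring heuristic (occurrence count plus a bonus for an early first mention), and the threshold are hypothetical and do not describe the heuristics STF actually uses.

```python
import re


def generate_variants(term):
    """Return simple surface variants of a taxonomy term (assumed rules)."""
    base = term.lower()
    variants = {
        base,
        base.replace("-", " "),
        base.replace(" ", "-"),
        base.replace(" ", ""),
    }
    # Naive plural forms; real morphology is far richer than this.
    variants |= {v + "s" for v in variants if not v.endswith("s")}
    return variants


def score_term(term, text):
    """Score a term: number of variant hits, plus 1.0 if first seen early."""
    lowered = text.lower()
    positions = set()
    for variant in generate_variants(term):
        pattern = r"\b" + re.escape(variant) + r"\b"
        positions.update(m.start() for m in re.finditer(pattern, lowered))
    if not positions:
        return 0.0
    early_bonus = 1.0 if min(positions) < 0.2 * len(lowered) else 0.0
    return float(len(positions)) + early_bonus


def extract(terms, text, threshold=2.0):
    """Keep only the terms whose score reaches the (hypothetical) threshold."""
    scored = [(term, score_term(term, text)) for term in terms]
    return [(term, score) for term, score in scored if score >= threshold]


text = (
    "Text mining tools extract terms. Text-mining also supports indexing. "
    "A database appears once."
)
result = extract(["text mining", "data base", "ontology"], text)
```

Variant generation lets "data base" match "database" (better recall), while the score threshold drops it because it occurs only once and late in the text (better precision).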
Customization and Extension
The STF Skill Cartridge® offers unique customization properties that allow users to build and distribute their own Skill Cartridge® with their own taxonomy, according to their area of interest. This can be done in the Annotation Workbench.
The STF Skill Cartridge® is well suited to leveraging domain-specific or application-specific thesauri for entity extraction. The process is both fast and simple: false positives are managed using the term's immediate neighborhood in the hierarchy, and false negatives are significantly reduced thanks to STF's capacity to search for variants.
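One way to picture the use of a term's neighborhood in the hierarchy is the Python sketch below. It is an assumption about how such a check could work, not a description of STF internals: an ambiguous label is kept for a given branch only if one of its ancestors or siblings is also mentioned in the document. The tuple-based paths, the function names, and the crude substring test are all hypothetical.

```python
def context_labels(path, all_paths):
    """Return the ancestors and sibling labels of a concept path."""
    ancestors = set(path[:-1])
    siblings = {
        other[-1]
        for other in all_paths
        if other[:-1] == path[:-1] and other != path
    }
    return ancestors | siblings


def disambiguate(label, text, all_paths):
    """Keep the readings of `label` whose hierarchy context occurs in `text`."""
    lowered = text.lower()
    kept = []
    for path in all_paths:
        if path[-1] != label:
            continue
        # Crude substring check; a real system would match tokens or variants.
        if any(ctx in lowered for ctx in context_labels(path, all_paths)):
            kept.append(path)
    return kept


paths = [
    ("animals", "mammals", "jaguar"),
    ("animals", "mammals", "leopard"),
    ("companies", "car makers", "jaguar"),
    ("companies", "car makers", "bentley"),
]
text = "The jaguar and the leopard are both threatened by habitat loss."
result = disambiguate("jaguar", text, paths)
```

In this example only the animal reading of "jaguar" survives, because its sibling "leopard" appears in the text while nothing from the car-maker branch does.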