The RTF Skill Cartridge® identifies and extracts the most characteristic (relevant) terms or topics of a document by comparing its vocabulary to a statistical model built from a reference corpus. The analysis is quantitative: it compares the frequency of each term extracted from a given text against that term's average frequency in the reference corpus. When a term's frequency is surprisingly high compared to the reference frequency, the term is assigned a high score. A final filtering step adds the highest-ranked terms to the list of topics for the text.
This approach prevents irrelevant terms from being extracted as topics on the basis of absolute frequency alone. Each extracted topic is associated with a score that reflects its relative level of originality.
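The comparison against a reference frequency can be illustrated with a minimal Python sketch. It is not the cartridge's actual scoring formula, which is not documented here: the log-ratio score, the add-one smoothing, and the names `score_terms`, `ref_counts`, and `ref_total` are all assumptions made for illustration only.

```python
import math
from collections import Counter

def score_terms(doc_terms, ref_counts, ref_total, top_n=5, smoothing=1.0):
    """Rank a document's terms by how far their relative frequency exceeds
    the relative frequency observed in a reference corpus.

    Hypothetical sketch: the score is the log of the frequency ratio,
    an assumed stand-in for the real (undocumented) scoring.
    """
    doc_counts = Counter(doc_terms)
    doc_total = sum(doc_counts.values())
    # Reserve one extra slot so that unseen terms get a non-zero frequency.
    vocab_size = len(ref_counts) + 1

    scores = {}
    for term, count in doc_counts.items():
        doc_freq = count / doc_total
        ref_freq = (ref_counts.get(term, 0) + smoothing) / (
            ref_total + smoothing * vocab_size
        )
        scores[term] = math.log(doc_freq / ref_freq)

    # Final filtering: keep only the highest-ranked terms as topics.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Illustrative data (invented for this example, not taken from a real model).
ref_counts = {"the": 60000, "of": 30000, "document": 500, "text mining": 2}
ref_total = 1_000_000
doc_terms = ["the"] * 6 + ["of"] * 3 + ["text mining"] * 3 + ["document"] * 2

topics = score_terms(doc_terms, ref_counts, ref_total, top_n=2)
print(topics)
```

In this toy document, "the" is the most frequent term in absolute terms, yet it is almost as common in the reference data, so it scores low. "text mining" occurs only three times but is rare in the reference data, so it ranks first.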
The RTF Skill Cartridge® uses the Analytics2 Skill Cartridge® for term extraction. It also contains a reference corpus (model) for three languages (English, French, and German), whose content is exported from Wikipedia and is considered a representative sample of each language. The reference corpus has been analyzed with the Analytics2 Skill Cartridge® to extract terms and thereby define an average frequency of words and noun phrases in a generic context.
This model can be adjusted through training on any corpus that appropriately reflects your specific domain.
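As a rough illustration of what adjusting a frequency model with a domain corpus could look like, the sketch below blends generic term counts with counts taken from domain documents. The actual training procedure of the cartridge is not described here, so the functions `build_reference_model` and `adjust_model` and the `domain_weight` parameter are hypothetical.

```python
from collections import Counter

def build_reference_model(corpus_docs):
    """Count term occurrences over a corpus given as lists of extracted terms."""
    counts = Counter()
    for terms in corpus_docs:
        counts.update(terms)
    return counts

def adjust_model(generic_counts, domain_counts, domain_weight=1):
    """Return a new model blending generic counts with weighted domain counts.

    Hypothetical sketch: the inputs are left unmodified.
    """
    adjusted = Counter(generic_counts)
    for term, count in domain_counts.items():
        adjusted[term] += domain_weight * count
    return adjusted

# Illustrative data (invented for this example).
generic = build_reference_model([["the", "cell"], ["the"]])
domain = build_reference_model([["cell", "cell", "genome"]])
adjusted = adjust_model(generic, domain, domain_weight=2)
print(adjusted)
```

Under this assumption, terms that are common throughout the domain corpus gain a higher reference frequency, so a frequency comparison against the adjusted model favors terms that stand out within the domain rather than against general language.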
The RTF Skill Cartridge® is mainly used to perform cross-domain, cross-language analyses. It can be used in domain-specific contexts for which no terminology has yet been defined. Unlike other TEMIS Skill Cartridges®, which address specific markets with predefined, accurate concept extraction, RTF can be fed with relevant corpora from any domain to enable specialized topic recognition. This Skill Cartridge® requires no predefined conceptual structure and can be applied to a wide variety of use cases, such as Similar Document Recommendation, Clustering, and domain-specific Terminology Extraction.
The RTF Skill Cartridge® can be used to:
- Create a terminology base for a new domain.
- Unveil new technologies through the presence of unknown technical terms.
- Index a large document corpus according to the domain-specific terms it contains.
- Extend an existing terminology base with newly emerging technical terms.
- Highlight potential associations of any type between specific terms, such as people, places, and organizations, using the Luxid® proximity tool.
- Enhance controlled authoring or machine translation processes, for example by managing corporate terminology and pinpointing inconsistencies and spelling errors.