This Skill Cartridge® extracts scientific terminology from English documents. It was built from a large vocabulary (>1.7M terms) derived from DBPedia, using only labels that are categorized under "Science".
DBPedia (http://dbpedia.org) is a great resource, but its category schema is often inconsistent, and a considerable share of its descriptors come from trivia subjects (athletes, films, games, sitcom characters, ...). Restricting the vocabulary to the "Science" branch of the DBPedia category schema is an attempt to leave out these noisy terms as far as possible.
By default, the Skill Cartridge® returns only the top N% of the descriptors it finds. We have seen the best results with longer texts (well over one page of text); with shorter, abstract-like texts, noisy terms can often be found even among the top 15%. Each term carries its DBPedia URI as an attribute, which is the first step towards linking and enriching the indexed documents with background knowledge and related content.
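As an illustration of how the DBPedia URI attribute might be used for enrichment, the following minimal Python sketch builds a SPARQL query that asks for the abstract of a resource. It only constructs the query text; sending it to the public DBPedia SPARQL endpoint (https://dbpedia.org/sparql) is left to the reader. The function name build_abstract_query is hypothetical, and the sketch assumes that the URIs delivered by the Skill Cartridge® have the form http://dbpedia.org/resource/..., which should be checked against the actual output.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"  # assumed URI form, verify against real output


def build_abstract_query(dbpedia_uri: str, lang: str = "en") -> str:
    """Return SPARQL text that selects the abstract of one DBPedia resource.

    Hypothetical helper: it does not contact any server.
    """
    if not dbpedia_uri.startswith(RESOURCE_PREFIX):
        raise ValueError(f"not a DBPedia resource URI: {dbpedia_uri!r}")
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in dbpedia_uri for ch in '<>" \n\t'):
        raise ValueError(f"unsafe characters in URI: {dbpedia_uri!r}")
    if not lang.isalpha():
        raise ValueError(f"unexpected language tag: {lang!r}")
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{dbpedia_uri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )
```

The resulting query text can be submitted with any SPARQL client; the returned abstract is one example of the background knowledge that can be attached to an indexed document.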
A few things are essential in order to get good results:
First, keep in mind what the Skill Cartridge® does and what it does not do. It does not index all scientific concepts in a document; instead, it delivers a parameterizable number of the most pertinent concepts. Setting this parameter is essential. For a typical scientific publication of a few pages of text, the topmost 30 concepts or the top 5-10% of concepts are often of high quality.
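The effect of such a cutoff can be sketched in a few lines of Python. This is not the Skill Cartridge® implementation, only an illustration of "top N concepts" versus "top N%" applied to a ranked list. The Descriptor type, its score field and the function top_descriptors are hypothetical names introduced for this example.

```python
import math
from typing import List, NamedTuple, Optional


class Descriptor(NamedTuple):
    term: str
    score: float  # hypothetical relevance score, higher means more pertinent
    uri: str      # DBPedia URI attribute of the term


def top_descriptors(
    descriptors: List[Descriptor],
    max_count: Optional[int] = None,
    top_fraction: Optional[float] = None,
) -> List[Descriptor]:
    """Keep the best-scoring descriptors, limited by count and/or fraction."""
    ranked = sorted(descriptors, key=lambda d: d.score, reverse=True)
    limit = len(ranked)
    if top_fraction is not None:
        # Round up so that a short list still yields at least one descriptor.
        limit = min(limit, math.ceil(len(ranked) * top_fraction))
    if max_count is not None:
        limit = min(limit, max_count)
    return ranked[:limit]
```

For example, top_descriptors(found, max_count=30) keeps at most the 30 best concepts, while top_descriptors(found, top_fraction=0.05) keeps the top 5%. When both limits are given, the stricter one applies.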
Also, since we want the Skill Cartridge® to apply in a rather strict form (not accepting a wide variety of variations of the known concepts), we recommend setting the following parameters:
- 'slop' to 0
- 'allow spelling variants' to false
Please examine the settings of the Skill Cartridge® after installation to make sure they meet your use case. See the STF User Guide (which comes as part of this Skill Cartridge®) for more information on the available parameters.
The Skill Cartridge® has been compiled using Luxid7.0. It is provided as is, in the hope that it is useful. No warranty!