The STF DBPedia locations Skill Cartridge® extracts the names of geographical locations and their positions (latitude, longitude) from text. It is built on a set of more than 700,000 places taken from DBPedia as of January 2013.
Powered by Smart Taxonomy Facilitator (STF) technology, the Skill Cartridge® resolves each recognized place to its latitude and longitude and to its DBPedia URI. Each resulting term receives a confidence score indicating how appropriate it is as a topic or index term for the document.
The Skill Cartridge® is compiled for the English language.
Combined with a separate component (not part of this Skill Cartridge®) that translates latitude and longitude into positions on a map, the Skill Cartridge® can help to visualize the geographical aspect of a document or a collection of documents. For instance, the picture below illustrates the effect of applying the Skill Cartridge® to a text on the "Pacific Ring of Fire" and then using the extracted lat/long information to enrich a map:
STF technology adds two distinctive features to thesaurus-based extraction:
- Fuzzy Term Matching, which produces and recognizes variants of thesaurus terms, reducing missed mentions (silence) and increasing recall
- Relevance Scoring, which evaluates the contextual relevance of each recognized term and discards the less relevant ones, improving precision. Relevance Scoring exploits a range of heuristics, including statistics and part-of-speech tagging.
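To make the idea of fuzzy term matching more concrete, here is a minimal Python sketch. It is not the STF implementation, whose internals are not described here; it only illustrates how a misspelled or variant mention might still be mapped to a thesaurus entry. The thesaurus contents, the `match_term` function, and the `min_similarity` threshold are all assumptions made for this example, and the similarity measure is the one from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical mini-thesaurus (illustrative entries, not the real data set).
THESAURUS = ["Mount Fuji", "Krakatoa", "Mount St. Helens"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_term(candidate: str, min_similarity: float = 0.85):
    """Return (thesaurus_term, similarity) for the closest entry,
    or None if no entry reaches the min_similarity threshold."""
    best_term, best_score = None, 0.0
    for term in THESAURUS:
        score = similarity(candidate, term)
        if score > best_score:
            best_term, best_score = term, score
    if best_term is not None and best_score >= min_similarity:
        return best_term, best_score
    return None


if __name__ == "__main__":
    print(match_term("Krakatao"))  # a transposed spelling of "Krakatoa"
    print(match_term("London"))    # not in the mini-thesaurus
```

Lowering the threshold accepts more variants (higher recall) at the cost of more false matches, which is the trade-off that a relevance-scoring step is meant to balance.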
Customization and Extension
The Skill Cartridge® can be customized by tuning the STF parameters. These control, among many other things, how results are scored, the maximum number of terms to return, the minimum confidence score a term must reach, and the string-distance threshold for matching known thesaurus terms.
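The effect of two of these parameters, the minimum confidence score and the maximum number of terms, can be illustrated with a short Python sketch. The names `ExtractedPlace`, `select_terms`, `min_confidence`, and `max_terms` are hypothetical and chosen for this example only; they are not the actual STF parameter names.

```python
from dataclasses import dataclass


@dataclass
class ExtractedPlace:
    """Hypothetical container for one extraction result."""
    name: str
    confidence: float


def select_terms(places, min_confidence=0.5, max_terms=10):
    """Keep results at or above min_confidence, best first, capped at max_terms."""
    kept = [p for p in places if p.confidence >= min_confidence]
    kept.sort(key=lambda p: p.confidence, reverse=True)
    return kept[:max_terms]


if __name__ == "__main__":
    # Illustrative scores, not real extraction output.
    results = [
        ExtractedPlace("Japan", 0.92),
        ExtractedPlace("Chile", 0.40),
        ExtractedPlace("Alaska", 0.75),
        ExtractedPlace("Indonesia", 0.81),
    ]
    for place in select_terms(results, min_confidence=0.5, max_terms=2):
        print(place.name, place.confidence)
```

Raising the confidence threshold favors precision, while raising the term limit favors coverage of the places mentioned in a document.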
Typical use cases for the STF DBPedia locations Skill Cartridge® include:
- Automated indexing of documents with respect to the geographical locations mentioned in them
- Creation of hierarchical facets for enhancing search and browsing
- Document recommendations based on indexed terms
- Enriching maps with information on the locations referred to in an underlying document or a collection of documents
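For the map-enrichment use case, one possible approach is to convert the extracted places into GeoJSON, a format that many mapping tools can display. The Python sketch below assumes that each extraction result is available as a dictionary with `name`, `latitude`, `longitude`, `uri`, and `confidence` keys; this record layout and the `to_geojson` function are assumptions for illustration and do not reflect the actual output format of the Skill Cartridge®. The sample coordinates are approximate.

```python
import json


def to_geojson(places):
    """Build a GeoJSON FeatureCollection from extracted place records."""
    features = []
    for place in places:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [place["longitude"], place["latitude"]],
            },
            "properties": {
                "name": place["name"],
                "uri": place["uri"],
                "confidence": place["confidence"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Hypothetical extraction result with approximate coordinates.
    extracted = [
        {
            "name": "Mount Fuji",
            "latitude": 35.36,
            "longitude": 138.73,
            "uri": "http://dbpedia.org/resource/Mount_Fuji",
            "confidence": 0.9,
        },
    ]
    print(json.dumps(to_geojson(extracted), indent=2))
```

Carrying the DBPedia URI along as a feature property lets a map marker link back to further information about the place.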