The NEC (Named Entity Chinese) Skill Cartridge® extracts three types of named entities from Simplified Chinese text: People, Organizations and Locations.
People names include Chinese names and transliterated foreign names. Both complete transliterated names (first and last name separated by a specific punctuation mark) and last names used alone can be extracted.
Organization names cover companies (press agencies, banks, stock, manufacturers), political movements, political organizations, government bodies (ministries, commissions and committees, political unions of countries), international organizations and public organizations (schools, universities, hospitals, associations).
Location names include continents, countries, cities, provinces, counties and villages.
To achieve this, NEC uses a rule-based method that draws on knowledge from lexicon lists. These lists include family names, organization and location suffixes, country and organization names, and other context words (verbs, prepositions, titles...).
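As a rough illustration of how lexicon lists and rules can combine, here is a minimal Python sketch. It is not NEC's actual implementation: the miniature lexicons, the `classify` function, the assumption that the text has already been segmented into tokens, and the choice of the middle dot (·) as the separator in transliterated names are all hypothetical and chosen only for the example.

```python
# Hypothetical miniature lexicons (illustrative only, not NEC's actual lists).
FAMILY_NAMES = {"王", "李", "张", "刘", "陈"}
TITLES = {"先生", "女士", "教授", "主席"}
ORG_SUFFIXES = ("公司", "银行", "大学", "委员会")
LOC_SUFFIXES = ("省", "市", "县", "村")
COUNTRIES = {"中国", "法国", "美国"}
NAME_SEPARATOR = "·"  # assumed separator between transliterated first and last name


def has_suffix(token, suffixes):
    """True if the token ends with a listed suffix and has text before it."""
    return any(token.endswith(s) and len(token) > len(s) for s in suffixes)


def classify(tokens):
    """Label pre-segmented tokens as PERSON, ORGANIZATION or LOCATION."""
    entities = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if NAME_SEPARATOR in tok.strip(NAME_SEPARATOR):
            # Complete transliterated name: parts joined by the separator.
            entities.append((tok, "PERSON"))
        elif tok[:1] in FAMILY_NAMES and 2 <= len(tok) <= 3 and nxt in TITLES:
            # Chinese name: known family name, confirmed by a following title.
            entities.append((tok, "PERSON"))
        elif tok in COUNTRIES:
            entities.append((tok, "LOCATION"))
        elif has_suffix(tok, ORG_SUFFIXES):
            entities.append((tok, "ORGANIZATION"))
        elif has_suffix(tok, LOC_SUFFIXES):
            entities.append((tok, "LOCATION"))
    return entities


tokens = ["乔治·布什", "访问", "北京市", "和", "中国银行", "，", "会见", "王小明", "教授"]
for text, label in classify(tokens):
    print(label, text)
```

In this sketch the context word (the title 教授) is what confirms 王小明 as a person name, and the suffix lists decide between organization and location. A production system would need far larger lexicons and rules that resolve conflicts between overlapping candidates.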
NEC is especially suited to extracting information from well-formed texts. It can be used standalone or in combination with other components in a wide variety of use cases where the entity categories it covers are of particular interest. One example is enriching articles in newspapers, magazines or B2B trade publications, or proprietary corporate content, with metadata about the key entities mentioned, for instance to improve navigation or retrievability.
- Windows only
- Specific packages for Chinese extraction must be installed to use this Skill Cartridge®. Please contact us if you are interested in this extractor.