Locates personally identifying information for removal so that documents can be distributed anonymously. Originally developed to anonymize case law.


This is implemented as a two-step process. First, the SC® recognizes and annotates all family names, company names, postal addresses, e-mail addresses, phone numbers and fax numbers in the document. Second, each of these items is tagged as either 'to anonymize' (for example, where a name can be replaced by a single letter) or 'to exclude' from anonymization.

  • For person names, the SC® uses titles as triggers to exclude the names of attorneys, magistrates and experts from anonymization. In the current version of the SC®, first names are also excluded from anonymization.
  • For addresses, only those associated with a party to the case are tagged as 'to anonymize'.
  • Company names are automatically excluded, unless they contain a family name cited in the document as a party.
  • All phone numbers, fax numbers and e-mail addresses are tagged as 'to anonymize'.
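The tagging rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the SC® implementation: the Entity class, the tag function, the entity kinds and the list of trigger titles are all hypothetical, and the sketch assumes that any family name not preceded by an excluding title is tagged 'to anonymize'.

```python
from dataclasses import dataclass

# Hypothetical trigger titles; the actual lexicon used by the SC® is not described here.
EXCLUDING_TITLES = {"maître", "me", "juge", "expert"}


@dataclass
class Entity:
    kind: str                       # "family_name", "first_name", "company", "address", "phone", "fax" or "email"
    text: str
    title: str = ""                 # title found before a person name, if any
    belongs_to_party: bool = False  # only meaningful for addresses


def tag(entity, party_family_names):
    """Return 'to anonymize' or 'to exclude' for one recognized entity."""
    if entity.kind in {"phone", "fax", "email"}:
        return "to anonymize"       # always anonymized
    if entity.kind == "first_name":
        return "to exclude"         # first names are excluded in the current version
    if entity.kind == "family_name":
        # A title such as "Maître" marks an attorney, magistrate or expert.
        if entity.title.lower().rstrip(".") in EXCLUDING_TITLES:
            return "to exclude"
        return "to anonymize"       # assumption: untitled family names are anonymized
    if entity.kind == "address":
        return "to anonymize" if entity.belongs_to_party else "to exclude"
    if entity.kind == "company":
        # Excluded unless the company name contains a party's family name.
        words = entity.text.lower().split()
        if any(name.lower() in words for name in party_family_names):
            return "to anonymize"
        return "to exclude"
    raise ValueError(f"unknown entity kind: {entity.kind}")
```

For example, with "Dupont" as the only party, tag(Entity("company", "Dupont SARL"), {"Dupont"}) gives 'to anonymize', while tag(Entity("family_name", "Martin", title="Maître"), {"Dupont"}) gives 'to exclude'.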

The SC® provides two annotation procedures. The first (Anonymization) extracts all the entities and tags them as 'to anonymize' or 'to exclude'; the second (AnoSansExclu) only extracts the 'to anonymize' entities.
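The difference between the two procedures can be pictured as a filter over the tagged entities. The sketch below is a hypothetical illustration in Python: the function names and the (text, tag) pair representation are assumptions and do not reflect the actual Luxid® interface.

```python
def anonymization(tagged_entities):
    """Mimics the 'Anonymization' procedure: keep every entity with its tag."""
    return list(tagged_entities)


def ano_sans_exclu(tagged_entities):
    """Mimics the 'AnoSansExclu' procedure: keep only the 'to anonymize' entities."""
    return [(text, tag) for text, tag in tagged_entities if tag == "to anonymize"]


# Hypothetical sample of entities that have already been tagged.
sample = [
    ("Dupont", "to anonymize"),
    ("Maître Martin", "to exclude"),
    ("01 23 45 67 89", "to anonymize"),
]

full_output = anonymization(sample)       # three entities, both tags present
filtered_output = ano_sans_exclu(sample)  # only the two 'to anonymize' entities
```

The first form is useful when excluded entities need to be reviewed; the second yields only what has to be replaced.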

Typical Applications

The original use case for which this SC® was developed is the anonymization of legal decisions for online publication in conformity with national regulations. It may also be adapted to anonymize any large-scale corpus containing personal information (for example, in healthcare-related applications).

Skill cartridge
Language(s): FR
Compatibility: Luxid® 6.0
Posting date: January 2013
Version: 2.2
Business model: Project