Content Quality Assessment
Skill Cartridge®
Detects off-topic and improper posts in your forums, blogs, wikis...
By TEMIS

Scope

The Content Quality Assessment (CQA) Skill Cartridge detects off-topic and improper documents in a corpus.
To perform this detection, the component extracts several indicators of the content quality of the text itself, covering both the subject matter discussed and the fluidity of its phrasing.

This cartridge was designed as part of the ROBUST European project* (Risk and Opportunity management of huge-scale BUSiness communiTy cooperation). Within this project, the CQA Skill Cartridge is used to spot forum, wiki and blog posts that are off-topic within a thread and/or have a low level of readability.

Internals

CQA processes the input data to obtain the different quality indicators and creates a fingerprint for each document/group of documents. The fingerprint of a post is a description of the post in terms of its quality indicators:

  • Content on topic: The topic of the content must be related to the topic of the space where it is posted. No posting of advertisements or spam (promotional materials, surveys, junk mail, chain letters, or any other form of solicitation, commercial or otherwise).
  • Use of proper language: Formal language is more credible and easier to read than informal language (swear words, slang).
  • Information/value provided to the reader: The more precise the text is (high level of detail), the easier it is to understand. Example: "He is my best friend" is less informative than "Max Mustermann has a best friend named Martin Muster".
  • Use of fluid language: Avoiding long and complex sentences makes a text easier to understand.
  • Use of correct grammar, typography and spelling: Content is more credible and easier to read if it is free of spelling and grammatical errors.
  • Free of SMS language: Content is more credible and easier to read if it is free of instant-messaging style (emoticons, web abbreviations…).
  • Explanations/illustrations provided: Reformulating or giving examples can make a text easier to understand.
  • Use of an expressive subject header: Users should describe the matter of their post in a few words in the subject header (around 50 characters or less).
  • Use of a respectful tone: Use of common politeness formulas. No use of offensive, insulting or defamatory language.
  • Structured content: Use of paragraphs if the content is long (well-spaced text).
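To make the idea of a fingerprint more concrete, here is a minimal Python sketch that computes four of the indicators listed above for a single post. It is an illustration only and does not reflect how the CQA Skill Cartridge is implemented: the `Fingerprint` class, the `fingerprint` function, the `SMS_TOKENS` word list and the 100-word limit for "long" content are all assumptions made for this example. Only the 50-character subject limit comes from the description above.

```python
import re
from dataclasses import dataclass

# Hypothetical word list, for illustration only (not the cartridge's lexicon).
SMS_TOKENS = {"lol", "omg", "thx", "u", "btw", ":)", ":(", ";)"}


@dataclass
class Fingerprint:
    avg_sentence_length: float  # words per sentence ("fluid language")
    sms_ratio: float            # share of SMS-style tokens ("free of SMS language")
    subject_ok: bool            # non-empty subject of 50 characters or less
    has_paragraphs: bool        # long content is split into paragraphs


def fingerprint(subject: str, body: str, long_text_words: int = 100) -> Fingerprint:
    """Compute a toy fingerprint for one post (assumed heuristics)."""
    tokens = body.lower().split()
    # Naive sentence splitter: it also splits on decimals such as "2.1".
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    n_tokens = max(len(tokens), 1)
    n_sentences = max(len(sentences), 1)
    sms_hits = sum(1 for t in tokens if t.strip(".,!?") in SMS_TOKENS)
    # Short content needs no paragraph breaks; long content needs a blank line.
    structured = len(tokens) <= long_text_words or "\n\n" in body
    return Fingerprint(
        avg_sentence_length=len(tokens) / n_sentences,
        sms_ratio=sms_hits / n_tokens,
        subject_ok=0 < len(subject.strip()) <= 50,
        has_paragraphs=structured,
    )


fp = fingerprint(
    "Printer driver fails on install",
    "The driver fails during setup. I tried it twice.",
)
print(fp)
```

A real system would rely on linguistic analysis rather than these surface heuristics; the point is only that each post is reduced to a small, comparable set of indicator values.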

 

Typical applications

CQA is designed to perform analysis on user-generated content (UGC) organized in threads, and is therefore particularly suited for:

  • Content curation on social media: Highlight the most relevant input and clean up unwanted content
  • Forum moderation: Spot improper posts instantly and redirect off-topic messages
  • Community management: Identify potential influencers within large volumes of data
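As an illustration of the forum moderation use case, the following Python sketch flags posts whose vocabulary barely overlaps with the topic of their thread. It uses a plain Jaccard word overlap, which is a stand-in chosen for this example and not the method used by the CQA Skill Cartridge. The function names, the `STOPWORDS` list and the 0.1 threshold are all hypothetical.

```python
import re

# Small illustrative stopword list (an assumption, not a linguistic resource).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on",
             "my", "i", "it", "for", "with"}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def topic_overlap(post: str, thread_topic: str) -> float:
    """Jaccard similarity between the post's and the topic's word sets."""
    post_words = content_words(post)
    topic_words = content_words(thread_topic)
    if not post_words or not topic_words:
        return 0.0
    return len(post_words & topic_words) / len(post_words | topic_words)


def flag_off_topic(posts: list[str], thread_topic: str,
                   threshold: float = 0.1) -> list[str]:
    """Return the posts whose overlap with the topic is below the threshold."""
    return [p for p in posts if topic_overlap(p, thread_topic) < threshold]


topic = "Installing the printer driver on Linux"
posts = [
    "The printer driver crashes on Linux after the update",
    "Buy cheap watches now, visit our shop",
]
for post in flag_off_topic(posts, topic):
    print("Possibly off-topic:", post)
```

Word overlap misses synonyms and paraphrases, so in practice such a score would be one signal among several rather than a verdict on its own.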

 

* This work has been partially funded by the EU project ROBUST, grant no. 257859.

Skill cartridge
Document management Generic
Language(s): EN
Compatibility: Luxid® 6.0
Posting date: October 2013
Version: 1.0
Business model: Project