Critical issues presentations/StrepHit: Generating a New Loop of Trustworthiness and Reliability for Wikimedia Data

From Wikimania 2016 • Esino Lario, Italy
Submission no. 20
Title of the submission

StrepHit: Generating a New Loop of Trustworthiness and Reliability for Wikimedia Data

Author of the submission
  • Marco Fossati
Country of origin

Italy

Topics

Research, Technical

Keywords
  • Wikidata
  • Data Quality
  • Information Extraction
  • Natural Language Processing
  • Frame Semantics
  • Crowdsourcing
  • Machine Learning
Abstract

In all Wikimedia projects, the trustworthiness of the data is crucial to delivering a high-quality information system. To assess whether data are true, they should be validated against third-party resources, yet few efforts have tackled this essential challenge in a homogeneous way.

One form of validation is to support data with references to external (i.e., non-wiki), authoritative sources.
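In Wikidata's data model, such a reference is attached directly to the statement it supports. The Python sketch below builds a statement in the Wikibase JSON shape with one reference block holding a "reference URL" (P854) snak. The helper names `url_reference` and `item_statement` are hypothetical, and the example uses the generic "instance of" (P31) / "human" (Q5) pair with a placeholder URL.

```python
import json


def url_reference(url: str) -> dict:
    """Build one reference block containing a single 'reference URL' (P854) snak."""
    return {
        "snaks": {
            "P854": [
                {
                    "snaktype": "value",
                    "property": "P854",
                    "datavalue": {"value": url, "type": "string"},
                    "datatype": "url",
                }
            ]
        },
        "snaks-order": ["P854"],
    }


def item_statement(prop: str, target_qid: str, reference_url: str) -> dict:
    """Build an item-valued statement (e.g. P31 -> Q5) backed by one URL reference."""
    numeric_id = int(target_qid[1:])  # "Q5" -> 5
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {
                    "entity-type": "item",
                    "numeric-id": numeric_id,
                    "id": target_qid,
                },
                "type": "wikibase-entityid",
            },
            "datatype": "wikibase-item",
        },
        "references": [url_reference(reference_url)],
    }


if __name__ == "__main__":
    stmt = item_statement("P31", "Q5", "https://example.org/source-page")
    print(json.dumps(stmt, indent=2))
```

A statement without a `references` list is exactly the kind of unsupported claim the validation effort targets.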

We present the outcomes of StrepHit, the Individual Engagement Grant (IEG) project that received the largest grant in the second 2015 round. StrepHit is a Natural Language Processing pipeline that harvests structured data from raw text and produces Wikidata statements with reference URLs.
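The text-to-statement step can be illustrated with a deliberately simplified Python sketch. It is not StrepHit's implementation. The lexicon `LEXICAL_UNITS`, the entity index `ENTITIES`, and the regex matcher are hypothetical stand-ins for frame-semantic extraction and entity linking, and the object item ID is illustrative only and should be verified on Wikidata. What the sketch does show is the output shape: each extracted fact becomes a QuickStatements line whose `S854` column cites the source web page as its reference URL (P854).

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: lexical unit -> Wikidata property.
# P69 = "educated at", P108 = "employer".
LEXICAL_UNITS = {
    "studied at": "P69",
    "worked for": "P108",
}

# Hypothetical toy entity index: surface string -> item ID (illustrative only).
ENTITIES = {
    "Douglas Adams": "Q42",
    "St John's College": "Q691283",
}


@dataclass
class Statement:
    subject: str     # item ID of the subject
    prop: str        # Wikidata property ID
    value: str       # item ID of the object
    source_url: str  # web page the sentence was harvested from


def extract(sentence: str, source_url: str) -> list[Statement]:
    """Match 'SUBJECT <lexical unit> OBJECT.' and keep fully resolved facts."""
    statements = []
    for lu, prop in LEXICAL_UNITS.items():
        pattern = rf"^(?P<subj>.+?) {re.escape(lu)} (?P<obj>.+?)\.?$"
        match = re.match(pattern, sentence)
        if not match:
            continue
        subj = ENTITIES.get(match.group("subj"))
        obj = ENTITIES.get(match.group("obj"))
        if subj and obj:  # drop facts whose entities cannot be linked
            statements.append(Statement(subj, prop, obj, source_url))
    return statements


def to_quickstatements(st: Statement) -> str:
    """Render a tab-separated QuickStatements line; S854 adds a reference URL."""
    return "\t".join([st.subject, st.prop, st.value, "S854", f'"{st.source_url}"'])


if __name__ == "__main__":
    url = "https://example.org/biography"
    for st in extract("Douglas Adams studied at St John's College.", url):
        print(to_quickstatements(st))
```

Discarding facts whose entities cannot be resolved keeps unlinkable output out of the generated statements; a real pipeline would also need confidence scores before anything reaches Wikidata.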

The long-term objective is to establish a novel, automatic process that ensures Wikidata content can be trusted and considered reliable, thus alleviating the burden of manual curation.

Result

Not accepted