Critical issues presentations/StrepHit: Generating a New Loop of Trustworthiness and Reliability for Wikimedia Data
- Submission no. 20
- Title of the submission
StrepHit: Generating a New Loop of Trustworthiness and Reliability for Wikimedia Data
- Author of the submission
- Marco Fossati
- Country of origin
- Data Quality
- Information Extraction
- Natural Language Processing
- Frame Semantics
- Machine Learning
In all Wikimedia projects, the trustworthiness of data is crucial to delivering a high-quality information system. To assess their truth, data should be validated against third-party resources, yet few efforts have been made to tackle this essential challenge in a homogeneous way.
One form of validation can be achieved via references to external (i.e., non-wiki), authoritative sources.
We present the outcomes of StrepHit, the IEG project that received the largest grant in 2015 round 2. StrepHit is a Natural Language Processing pipeline that harvests structured data from raw text and produces Wikidata statements with reference URLs.
The long-term objective is to generate a novel, automatic process to ensure that Wikidata content can be trusted and considered reliable, thus alleviating the burden of manual curation.
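To make the idea of a "Wikidata statement with a reference URL" concrete, here is a minimal sketch of how such a statement could be represented and serialized. It is not taken from the StrepHit code base: the `ReferencedStatement` class, its field names, the item and property IDs, and the example URL are all illustrative assumptions. The output follows the tab-separated QuickStatements convention, in which `S854` marks a source given through the reference URL property (P854).

```python
from dataclasses import dataclass

# QuickStatements writes a source property with an "S" prefix;
# S854 corresponds to Wikidata property P854 (reference URL).
REFERENCE_URL_SOURCE = "S854"


@dataclass(frozen=True)
class ReferencedStatement:
    """Hypothetical container for one extracted claim plus its source."""
    subject: str        # Wikidata item ID, e.g. "Q42"
    prop: str           # Wikidata property ID, e.g. "P69"
    value: str          # item ID of the value
    reference_url: str  # external page the claim was extracted from

    def to_quickstatements(self) -> str:
        """Serialize as one tab-separated QuickStatements line."""
        return "\t".join([
            self.subject,
            self.prop,
            self.value,
            REFERENCE_URL_SOURCE,
            f'"{self.reference_url}"',  # URLs are quoted as strings
        ])


# Illustrative IDs and a placeholder URL, not real pipeline output.
example = ReferencedStatement(
    subject="Q42",
    prop="P69",
    value="Q691283",
    reference_url="https://example.org/biography",
)
print(example.to_quickstatements())
```

Keeping the reference URL inside the same record as the claim means a statement cannot be emitted without its source, which matches the goal of validating content against external sources.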