Critical issues presentations/Detecting Copyright Concerns in Near Real Time

From Wikimania 2016 • Esino Lario, Italy
Submission no. 29
Title of the submission

Detecting Copyright Concerns in Near Real Time

Author of the submission
Country of origin



Outreach, Projects, Technical

  • Copyright
  • WikiProject Med Foundation
  • Wikimedia Israel
  • Collaboration

An ongoing problem is people adding copyrighted material to Wikipedia. When this occurs and the material is not rapidly removed, it puts our shared brand at risk. Often it is done not with malicious intent but simply due to a misunderstanding of copyright. Occasionally editors in obscure topic areas make tens of thousands of concerning edits before the issues are noticed.

We began discussing potential technological methods to assist the editing community in addressing this problem back in 2012. A partnership was formed with Turnitin, which agreed to give us access to its API free of charge.

At Wikimedia in London a community programmer (User:Eran) joined our team and they hacked together a simple bot. Since then we have been working to improve the bot's functioning and to develop a community to address its output.

This presentation will discuss not only the internal workings of the bot for the technical crowd but also the efforts to develop a community around it. The accuracy of the results will be discussed, along with possible methods to improve it. Additionally, there is the possibility of getting this bot up and running in languages other than English.



Interested attendees and comments

Interested attendees:


The presentation slides are available here and are under a CC BY-SA license.