Training sessions/Proposals/Wikidata and GLAM intermediate workshop/backupetherpad

From Wikimania 2016 • Esino Lario, Italy

Google presentation: https://docs.google.com/presentation/d/14eVUkKCMItMT72Mq3TgNgdEhNO3KiM552e57keptbVg/edit?usp=sharing

Participants

Facilitator: Alex Stinson

  • Sandra Fauconnier / sandra.fauconnier@gmail.com / User:Spinster
  • Maarten Dammers / Multichill
  • Susanna Ånäs
  • Liam Wyatt \o/
  • Lluís Madurell / lowis.madu@gmail.com / User:Lluis_tgn
  • Venus Lui/ venuslui629@gmail.com / User: Venuslui
  • Jan Ainali / User:Ainali
  • Pierre-Yves Mevel / User:Pymouss
  • Nicolas Vigneron / @belett user:VIGNERON
  • DickBos - User:Dick Bos
  • Luca Martinelli / Sannita
  • Michelle van Lanschot/ user: mtmlan84
  • User:Arianit
  • Esther Solé / esther.sole@gmail.com / User:ESM
  • Liridon Selmani / liridon.slm@gmail.com / Liridon
  • psychoslave (mathieu stumpf guntz)
  • Vahur Puik / vahur@ajapaik.ee / User:Puik
  • User:Peaceray peaceray@cascadia.wiki
  • revi / revi@reviwiki.info / User:-revi
  • User:Jens_Ohlig_(WMDE)
  • user:Axel Pettersson (WMSE)
  • many more people; it's really crowded!! But no one is checking Facebook, we all <3 GLAM+WIKIDATA!

Short project presentations (5 mins each)

  1. What is/was the project about?
  2. What did you learn?
  3. What do you need to make the project even better or to take it further?

Sum of all Paintings

  • https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings By Maarten the pioneer ;)
  • Not just creating painting items, but related things like painters. Authority control comes in to make sure the data are good.
  • To add data to Wikidata, a (CSV) file can be used. The Quick Statements tool then makes it easy to populate items (see the sketch after this list): https://tools.wmflabs.org/wikidata-todo/quick_statements.php
  • Western Europe + US have good to very good coverage, while other areas are poorly covered, mostly because access to data is really difficult. There is a need to find volunteers in those countries who may help in getting new catalogues released.
  • You can help the project move forward!
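
A minimal sketch of Quick Statements input for one new painting item (the label is invented; the identifiers are real Wikidata IDs: P31 = instance of, Q3305213 = painting, P170 = creator, Q5598 = Rembrandt, P195 = collection, Q190804 = Rijksmuseum). Fields are tab-separated, one statement per line:

    CREATE
    LAST    Len     "Example painting"
    LAST    P31     Q3305213
    LAST    P170    Q5598
    LAST    P195    Q190804

CREATE starts a new item and LAST refers back to it, so each row of a museum's CSV export can be turned into lines like these with a spreadsheet formula or a small script.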

Flemish art museums on Wikidata

https://www.wikidata.org/wiki/Wikidata:Flemish_art_collections,_Wikidata_and_Linked_Open_Data

Whitepaper: https://www.wikidata.org/wiki/Wikidata:Flemish_art_collections,_Wikidata_and_Linked_Open_Data/Whitepaper

30,000+ artworks from 8 Flemish art collections - metadata added to Wikidata as a low-cost Open Data strategy for the museums

Good stuff

  • We closed part of the huge 'Belgium gap' in art and culture, at least on Wikidata
  • The collections become part of a larger whole (e.g. oeuvres of artists; historical and geographical context as described on Wikimedia projects). Do the museums appreciate this?
  • All kinds of visualizations possible that the museums wouldn't be able to produce themselves. Do they appreciate this?
  • There is also coverage for contemporary art, which is nice (remember copyright issues!)
  • The CSV files provided by the museums were very clean and well-prepared already. That made the import much easier.

To make it even better/easier:

  • It would be great to have upload tools that are friendly to non-coders, so that you don't need to rely on volunteers with bots
  • We need a working and user-friendly reconciliation service
  • It would be great to have a tool to correct errors in bulk (similar to VisualFileChange on Commons)
  • The data providers (in this case museums) want to see what happens to their data. They want to see statistics of edits, use and re-use of the data on and outside Wikimedia projects.
  • Wikidata and the museums' collection databases both change. We'd like to keep them mutually up to date but there are no tools for that yet.

Alex: what happens after the import is complete? Answer: We'll do other things, on Wikipedia or elsewhere. Maarten: for example, ArticlePlaceholder.

Q: Was there feedback about found errors on the GLAM side? Answer: No real feedback, but some of them know how to find the error reports themselves ;) In some cases, there were mistakes during the import for technical reasons. Sandra needs something like VisualFileChange on Commons but for Wikidata (we all want it too!!).

Alex pointed out that reporting is important.

Europeana and Europeana280

https://www.wikidata.org/wiki/Wikidata:Europeana_Art_History_Challenge

Liam talks about Europeana, which doesn't own the works itself, so it cares a lot about metadata; for them, metadata is the thing, and they're interested in having more of it.

For example, by connecting data you can build a map of the birthplaces of the creators of a museum's works with a simple SPARQL query (a sketch follows below). This is politically useful for Europeana too.
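
A minimal sketch of such a query, runnable at https://query.wikidata.org/ (Q190804 = Rijksmuseum is just an example institution; P195 = collection, P170 = creator, P19 = place of birth, P625 = coordinate location):

    #defaultView:Map
    SELECT ?creator ?creatorLabel ?coord
    WHERE {
      ?work wdt:P195 wd:Q190804 .      # work is in the museum's collection
      ?work wdt:P170 ?creator .        # creator of the work
      ?creator wdt:P19 ?birthplace .   # creator's place of birth
      ?birthplace wdt:P625 ?coord .    # coordinates to plot on the map
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }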

That's a new way to sell this to GLAM institutions.

As a matter of fact, through Wikidata we're creating a global back-end access point that institutions may use to show their own collections and to run whatever actions they need to work further on their data.

There is no catalogue identifier (like an ISBN) for artworks, and Wikidata can be this identifier! Long live the Qs! (Institutions cannot even reach a consensus about such things internally... and that applies to libraries too!)

Mobilizing Open Cultural Data (Finland)

https://fi.wikipedia.org/wiki/Wikipedia:Wikiprojekti_Avoin_kulttuuridata_hyötykäyttöön/en

10 cultural institutions:

  • Finnish Broadcasting Company YLE: Migrating from Freebase to Wikidata > http://wikimedia.fi/2016/04/15/yle-3-wikidata/
  • Laji.fi: Updating Finnish species names >> Ideas about a bird species event including data, sound archives & Wikipedia
  • Linked Data Finland: Historical place names > WikiProject Historical Place https://www.wikidata.org/wiki/Wikidata:WikiProject_Historical_Place
  • Kuntaliitto: Basic facts about Finnish municipalities
  • Open science and research
  • Finnish National Gallery artist and artwork database >> Plan to include the collection data at large
  • GIS database of the National Board of Antiquities >> Plan to participate in Wiki Loves Monuments
  • National Library of Finland, Finnish thesaurus and ontology service Finto > https://www.wikidata.org/wiki/Property:P2347
  • National Library of Finland, National GLAM aggregator Finna

Extra outcomes

Breakout groups

  1. What's the implication of SPARQL for this? (10, Jens)
  2. Statistics: how do we get them, how do we show them? (12, Sandra)
  3. Synchronisation between institutions and Wikidata (10, Susanna)
  4. Integration on Wikimedia Commons (19, Maarten)
  5. Models for archives and libraries (there is already some discussion going on on Wikidata) (6, Alex)

SPARQL

  • There is a link to map generation directly from SPARQL, and also a new Graph extension. These visualizations are becoming more important and are very popular, such as the timeline. Note: all information is available in English.

https://query.wikidata.org/ - an example query for cats (decoded below): https://query.wikidata.org/#%23Cats%0ASELECT%20%3Fitem%20%3FitemLabel%0AWHERE%0A%7B%0A%09%3Fitem%20wdt%3AP31%20wd%3AQ146%20.%20%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D
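
Decoded, that URL contains the following query (Q146 = house cat):

    #Cats
    SELECT ?item ?itemLabel
    WHERE {
      ?item wdt:P31 wd:Q146 .    # instance of (P31) house cat (Q146)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }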

Autocompletion is possible with CTRL+SPACE.

Right now (and probably for some time), SPARQL can only query Wikidata's own data, not other information (for example, only the sitelink, not the article's creation date or size in bytes, etc.). But that would be very useful.

VIGNERON talks about https://www.mediawiki.org/wiki/Extension:Graph (but too technical for me).

Pymouss points out that SPARQL can be a tool to improve data storage in GLAMs, leading by example.

Queries + Graphs

  • Number of artifacts in an institution per decade of creation (see the sketch below)
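
A minimal sketch of such a query (again using Q190804 = Rijksmuseum as a stand-in institution; P571 = inception, i.e. date of creation):

    #defaultView:BarChart
    SELECT ?decade (COUNT(?work) AS ?count)
    WHERE {
      ?work wdt:P195 wd:Q190804 .                 # in the institution's collection
      ?work wdt:P571 ?date .                      # date of creation (inception)
      BIND(FLOOR(YEAR(?date)/10)*10 AS ?decade)   # bucket by decade
    }
    GROUP BY ?decade
    ORDER BY ?decade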

Commons group

Discussion right now in Village Pump about integration

Is the Commons community ready? - Prototype

Lots of templates are already matched

Multilingual tags in Commons should be in Wikidata

Wikipedias are more conservative; notability on Wikidata is lower than on Wikipedia. There is monuments data for WLM, with the same identifiers on Commons. One can create creator templates with Wikidata; information about notable items can be stored on Wikidata.

Synchronisation between institutions and projects

  1. How do we manage synchronisation right now? We need to build some mechanism to do this. What if there are changes in the datasets?
  2. Do we want more universal solutions, or do we want to continue with case-by-case solutions?

Project "Connected Open Heritage" (WMSE + WMIT): we're moving WLM databases to Wikidata, but we still need to correct many data and things to be sure that data to be fed in WD are right. We don't have still a solution, but it's along the project's objectives.

We need a common page for the tools needed for mapping properties to databases and for automatic imports; in short: WE NEED CLEAR INSTRUCTIONS ON HOW TO DO THINGS.

We also need showcase examples: not just items, but visualisations of those items, or combinations of tools, to make institutions understand the impact of their donation.

  • A tool for matching Wikidata with external databases that then produces (automatic) reports of changes, notifying both ends(?) and letting the involved end make corrections.
  • An interesting point is: is the database software open source or closed source?

Probably we only want to collect, structure and then offer data: not necessarily giving it back automatically, but letting the institution take that data anyway.

Example: structure a query with a fixed URI and/or create an API to identify all those items that, for example, have a VIAF id but not $GLAM_id, or some other tool that lets us identify the "problematic" items (i.e. items with missing info). A sketch follows below.
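
A minimal sketch of such a query (P214 = VIAF ID is the real property; P650 = RKDartists ID stands in here for whatever the institution's own $GLAM_id property would be):

    SELECT ?item ?itemLabel
    WHERE {
      ?item wdt:P214 ?viaf .                     # has a VIAF identifier
      FILTER NOT EXISTS { ?item wdt:P650 ?id }   # but lacks the GLAM's own identifier
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
    LIMIT 100                                    # keep the result set manageable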

  • What happens when institutions don't have any database of their own, but rely on Wikidata? What if we don't have any "authorised" data?

YLE used Freebase and now uses Wikidata, and whenever they don't find a concept, they create it on Wikidata. <-- moving the focus from just donating stuff to Wikidata to also curating that stuff on Wikidata

Anyway, Wikidata should remain a secondary source, so this might prove problematic.

NO SELF-MADE DATA! :)

We also need to take into consideration that most authorities first need to be convinced of the value of crowd-sourcing: most of them still feel the need to reaffirm their authority.

Libraries + Archives

WikiCite - how to get citation data into Wikibase (specifically journal articles and books)

Books are complicated -- so are ISBNs and other book materials

Linking catalogues with the various parts of Wikidata's  -- What  

Libraries have the very messy data that libraries have been creating

Social Collections - Archives -- have a lot of professional 

SNAC -- Social Networks and Archival Context -- an interface/aggregator for archival authority records -- Daniel - http://socialarchive.iath.virginia.edu/

  • Wikidata as the tagging tool for the record

Queries -- for publications and the connections between them -- bibliographies generated from an author's works

Finding aids for specific research concerns -- https://en.wikipedia.org/wiki/List_of_Australian_diarists_of_World_War_I -- could be an aggregated finding aid

Need to think about collating different records of research materials into Wikipedia as a starting point

One of the challenges for the institutions in emerging communities is that they don't think about data, they think about

Statistics breakout group

We have experience with the monthly BaGLAMA statistics on Commons + with GLAMorous.

These deal with page views and with re-use of images. Similar statistics are interesting for Wikidata as well.

Some examples have already been created ad hoc for the Europeana Art History Challenge. It would be great to be able to recreate these for other projects. https://www.wikidata.org/wiki/Wikidata:Europeana_Art_History_Challenge#Statistics

GLAMs will probably be interested in monthly reports about their data. They want to know these things for 'their' dataset, which can be extracted via a SPARQL query (a sketch follows below).
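
A minimal sketch of such an extraction query (again with Q190804 = Rijksmuseum as a stand-in institution; P195 = collection):

    SELECT ?item ?itemLabel
    WHERE {
      ?item wdt:P195 wd:Q190804 .   # everything in the institution's collection
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }

Statistics tooling could run such a query on a schedule and diff the results month over month.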

It would be awesome if this could output a monthly PDF that is emailed automatically (compare with Google Analytics)

Wikimedians, on the other hand, want to see real-time statistics.

Changes to the data on Wikidata - useful statistics would be:

  • What changes take place?
  • Who edits the data?
  • What did people edit?
  • What percentage of edits is done by bots, and what percentage by humans?
  • Translation statistics for descriptions, for labels, for certain sets of terms...
  • How does the 'completeness' of items evolve?
  • How many statements have external sources? What external sources (URL patterns)?
  • How many 'external' items (from outside the dataset) are connected to the dataset?

Can be visualized in:

  • progress bars (what percentage of statements is completed in a month / over the course of several months)

Use and re-use of the data on Wikimedia projects and elsewhere - useful statistics would be:

  • How often has the data been pulled from Wikidata?
  • In what contexts?
  • Integration on Wikipedia -> how often was their data included on Wikipedia, and how often has this been viewed?
  • Apps that use the Wikidata API - external re-users: can this be inventoried or crowdsourced? E.g. ICOMOS and UNESCO re-use the monuments database.