Critical issues presentations/Wikidata Human Gender Index: How Should Biography Gender Gap Data Serve The Community?

From Wikimania 2016 • Esino Lario, Italy

Scheduled for Saturday, June 25, 10:30 - Space 3

Submission no. 120
Title of the submission

Wikidata Human Gender Index: How Should Biography Gender Gap Data Serve The Community?

Author of the submission
  • Maximilian Klein
Country of origin

United States of America



  • gender gap
  • wikidata
  • statistics
  • tools

Last Wikimania we introduced the project “Wikidata Human Gender Indicators” (WHGI), a biographical dataset based on Wikidata that looks at gender disparities across time, space, culture, occupation and language. Now that a prototype has existed for a year, it's time to evaluate whether it can serve its purpose: to raise awareness of the biography gender gap that exists on Wikipedia. Are the statistics useful to the community? If not, how can we make them useful?

First we present updates on what WHGI has been measuring: the gender statistics of humans in Wikidata, recorded every week from 2014 to the present. Our measurements show that female representation is rising in most Wikipedia languages. English Wikipedia's biographies grew 0.5% more female in 2015, but the Japanese, Estonian and Lithuanian Wikipedias each topped a 4% rise. Wikidata is also becoming richer in the information it tracks about humans. Usage of the date of birth, date of death, place of birth, citizenship, ethnic group and occupation properties all increased significantly over the same period.
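To make the measurement concrete, here is a minimal sketch of how a change in female share between two snapshots could be computed. The counts, the dictionary layout and the function names are hypothetical, and reporting the change in percentage points is an assumption; the abstract does not say how WHGI defines its figures.

```python
def female_share(counts):
    """Return the female fraction of biographies that have a recorded gender."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no biographies with a recorded gender")
    return counts.get("female", 0) / total


def share_change(earlier, later):
    """Change in female share between two snapshots, in percentage points."""
    return 100 * (female_share(later) - female_share(earlier))


# Made-up counts for one hypothetical language edition (not WHGI data).
start_of_year = {"female": 150, "male": 850}
end_of_year = {"female": 200, "male": 800}

change = share_change(start_of_year, end_of_year)
print(f"{change:+.1f} percentage points")
```

Counting only biographies with a recorded gender keeps the share comparable across snapshots even as coverage of the gender property itself grows.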

Wikidata's gender disparity seems to be modelling the real world more closely over time. Our technique was to validate the gender disparity of Wikidata against three external measurements: the world’s historical population, “traditional” gender-disparity indices (GDI, GEI, GGGI and SIGI), and occupational gender according to the US Bureau of Labor Statistics. As we correlate Wikidata's disparity with the real world, we also learn more about Wikipedia's notability policies: they inherit the biases of who is in positions of power.
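As an illustration of the validation step, the sketch below correlates a per-country female share with an external index score. Using Pearson's coefficient is an assumption, since the abstract does not name the statistic used, and every number here is a placeholder rather than a WHGI or index result.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values for four hypothetical countries (not real measurements).
wikidata_female_share = [0.12, 0.15, 0.18, 0.22]
external_index_score = [0.60, 0.65, 0.70, 0.78]

r = pearson(wikidata_female_share, external_index_score)
print(f"r = {r:.3f}")
```

A coefficient near 1 would suggest the Wikidata ratio tracks the external index; a rank-based statistic could be swapped in if the relationship is not expected to be linear.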

Now that we are collecting data on the biography gender gap, how can we use it to best serve editors? We created a website that displays four interactive visualizations. There, editors can browse and make their own interpretations of the dataset, but is this enough? We could create automatic alerts for spikes and drops in gender representation. Would this be useful? We are seeking community input on what kinds of tools or search they would like from WHGI data.
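One way such an alert could work is sketched below: each week's female share is compared with the average of the preceding weeks and flagged when it moves by more than a fixed amount. The window size, the threshold and the function name are illustrative assumptions, not a description of any existing WHGI feature.

```python
from statistics import mean


def find_alerts(shares, window=4, threshold=0.01):
    """Flag weeks whose female share departs from the recent average.

    shares: weekly female shares as fractions, oldest first.
    window: number of preceding weeks used as the baseline (assumed value).
    threshold: minimum absolute change that triggers an alert (assumed value).
    Returns a list of (week_index, "spike" or "drop") tuples.
    """
    alerts = []
    for i in range(window, len(shares)):
        baseline = mean(shares[i - window:i])
        delta = shares[i] - baseline
        if abs(delta) >= threshold:
            alerts.append((i, "spike" if delta > 0 else "drop"))
    return alerts


# Made-up weekly shares with one sudden jump (not WHGI data).
weekly_shares = [0.15, 0.15, 0.15, 0.15, 0.15, 0.18, 0.15]
print(find_alerts(weekly_shares))
```

A fixed threshold is easy to explain to editors; a threshold scaled by recent variability would suit language editions with few biographies, where the share is noisier.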



Interested attendees and comments