Critical issues presentations/How do we handle uncertain quantitative scientific data?

From Wikimania 2016 • Esino Lario, Italy
Submission no. 212
Title of the submission

How do we handle uncertain quantitative scientific data?

Author of the submission
  • Finn Årup Nielsen
Country of origin

Denmark

Topics

Projects, Research

Keywords
  • Wikidata
  • science
  • sources
Abstract

Science is presently experiencing what has been referred to as a reproducibility crisis, in which individual scientific results are viewed with skepticism and not readily trusted. Indeed, some studies comparing reported scientific results with later scrutiny report double-digit percentage failure rates when replication is attempted; see, e.g., the "Reproducibility Project" in psychological science, which made headlines in 2015.

Wikipedians already have a healthy skepticism toward science: the English Wikipedia guideline "Identifying reliable sources" states that "Isolated studies are usually considered tentative and may change in the light of further academic research" and that "Secondary sources, such as meta-analyses, textbooks, and scholarly review articles are preferred".

The question is whether Wikimedia projects can become better at handling uncertain quantitative scientific data, both for the sake of science and for Wikipedia.

One approach would be to represent quantitative research results from published papers in a structured format, making it possible to aggregate, analyze and visualize data from multiple published studies together. Wikidata could possibly serve as the backend. Wikibase has been able to represent quantities since 2015, but it is not at all clear whether a Wikibase representation is convenient for such an application.

Say we want to know the volume of a particular brain area. "Volume" is a property in Wikidata, so the volume of the area can be specified as a value on the Wikidata item for that brain area. However, reported volumes vary considerably between studies. Potentially, we could add the values from all published studies reporting the volume of the brain area and use Wikidata qualifiers and references to keep track of the "metadata" related to each value: species, diagnosis, bibliographic information, number of subjects in the study, and standard deviation of the group mean volume.
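A minimal Python sketch of how reported volumes could be held together with the qualifiers listed above and then filtered before any aggregation. This is an illustration under stated assumptions only: the class `VolumeStatement`, its field names and the function `select` are hypothetical, do not correspond to actual Wikidata property identifiers, and say nothing about how Wikibase stores such data internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VolumeStatement:
    """One reported volume value with Wikidata-style qualifiers (hypothetical fields)."""
    value: float              # reported group-mean volume (unit left to the source paper)
    species: str              # qualifier: species studied
    diagnosis: Optional[str]  # qualifier: diagnosis of the group; None for healthy controls
    n_subjects: int           # qualifier: number of subjects in the study
    sd: float                 # qualifier: standard deviation reported with the group mean
    reference: str            # reference: bibliographic identifier of the source paper


def select(statements: List[VolumeStatement], species: str,
           diagnosis: Optional[str] = None) -> List[VolumeStatement]:
    """Keep only statements whose species and diagnosis qualifiers both match."""
    return [s for s in statements
            if s.species == species and s.diagnosis == diagnosis]


def total_subjects(statements: List[VolumeStatement]) -> int:
    """Sum the number-of-subjects qualifier over the selected statements."""
    return sum(s.n_subjects for s in statements)
```

Filtering on species and diagnosis first matters because pooling values from, for example, healthy humans and a patient group would mix different populations in a single estimate.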

There is a range of critical issues. Are individual research results "notable enough" for Wikidata? What is a succinct representation of the data? How can we compute with such data? Can Wikidata better support quantitative weighting of statements? For instance, Wikidata states that a specific bioinformatics item is involved in the biological process "memory". Although the statement is referenced, Wikidata does not state to what degree the item is involved, or which studies initially made the claim.

If Wikidata gets this right, it could be used as a backend for quantitative scientific results, making possible large-scale meta-analyses that would dwarf "ordinary" studies.
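As one illustration of what computing with such data could mean, the sketch below applies standard fixed-effect, inverse-variance pooling of group means. It uses exactly the quantities the volume example would store as qualifiers (mean, standard deviation, number of subjects): each study's standard error is SD/√n and its weight is 1/SE². This is a textbook meta-analytic technique chosen here for illustration, not a method proposed in the submission; it ignores between-study heterogeneity, which a random-effects model would address. The function name `pooled_mean` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_mean(means: Sequence[float], sds: Sequence[float],
                ns: Sequence[int]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooled estimate of a group mean.

    Returns (pooled_mean, pooled_standard_error).
    """
    if not (len(means) == len(sds) == len(ns)) or len(means) == 0:
        raise ValueError("need equally long, non-empty inputs")
    weights = []
    for sd, n in zip(sds, ns):
        if sd <= 0 or n <= 0:
            raise ValueError("standard deviations and sample sizes must be positive")
        se = sd / math.sqrt(n)          # standard error of this study's mean
        weights.append(1.0 / se ** 2)   # inverse-variance weight
    total = sum(weights)
    estimate = sum(w * m for w, m in zip(weights, means)) / total
    return estimate, math.sqrt(1.0 / total)  # pooled SE = 1 / sqrt(sum of weights)
```

For example, two studies with means 10 and 20 and standard errors 1 and 2 pool to 12: the more precise study receives four times the weight.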

Result

Not accepted