Critical issues presentations/How do we handle uncertain quantitative scientific data?
- Submission no. 212
- Title of the submission
How do we handle uncertain quantitative scientific data?
- Author of the submission
- Finn Årup Nielsen
- Country of origin
Science is presently experiencing what has been referred to as a
reproducibility crisis, in which individual scientific results are viewed
with skepticism and not readily trusted. Indeed, some studies that
compare reported scientific results with later scrutiny report
double-digit percentage failure rates when replication is attempted;
see, e.g., the "Reproducibility Project" in psychological science
that made headlines in 2015.
Wikipedians already have a healthy amount of skepticism towards
science, with the English Wikipedia stating in its guidelines for
"Identifying reliable sources": "Isolated studies are usually
considered tentative and may change in the light of further academic
research" and "Secondary sources, such as meta-analyses, textbooks,
and scholarly review articles are preferred".
The question is whether Wikimedia projects can become better at handling
uncertain quantitative scientific data, both for the sake of science
and for Wikipedia.
One way would be to represent quantitative research results from
published papers in a structured format, with the ability to
aggregate, analyze and visualize data from multiple (published)
studies together. Wikidata could possibly be used as a backend. Since
2015 Wikibase has been able to represent quantities, but it is not at
all clear whether a Wikibase representation is convenient in such an
Say we want to know the volume of a particular brain area. "Volume"
is a property in Wikidata, so the volume of the area can be specified
with a value in the Wikidata item for the brain area. However,
reported volumes vary considerably between studies. Potentially, we
could add the values from all published studies reporting volume for the
brain area and use Wikidata qualifiers and references to keep track of
"metadata" related to the volume data: species, diagnosis,
bibliographic information, number of subjects in the study, and standard
deviation of group mean volume.
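As a rough illustration of how such per-study values might be combined, the following minimal Python sketch pools group-mean volumes with a fixed-effect inverse-variance weighting, one common meta-analytic choice among several. The `VolumeStatement` class, its field names, the unit and all numbers are hypothetical and chosen only for illustration; they do not come from Wikidata or from any published study.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class VolumeStatement:
    """One reported group-mean volume with qualifier-like metadata (hypothetical schema)."""
    source: str        # bibliographic reference (placeholder labels below)
    species: str
    mean_cm3: float    # group mean volume
    sd_cm3: float      # standard deviation across subjects
    n_subjects: int    # number of subjects in the study


def pooled_mean(statements):
    """Return the fixed-effect inverse-variance weighted mean and its standard error."""
    # Weight of each study = 1 / (standard error of its mean)^2 = n / sd^2.
    weights = [s.n_subjects / s.sd_cm3 ** 2 for s in statements]
    total = sum(weights)
    mean = sum(w * s.mean_cm3 for w, s in zip(weights, statements)) / total
    return mean, sqrt(1.0 / total)


# Made-up numbers for illustration only, not taken from any publication.
statements = [
    VolumeStatement("Study A", "human", 4.0, 0.5, 25),
    VolumeStatement("Study B", "human", 5.0, 1.0, 25),
]
mean, se = pooled_mean(statements)
print(f"pooled mean = {mean:.2f} cm3, standard error = {se:.3f} cm3")
```

The study with the smaller standard deviation dominates the pooled estimate, which is why the standard deviation and the number of subjects would need to be recorded alongside each value. A random-effects model may be more appropriate when the studies differ systematically, for instance in species or diagnosis.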
There is a range of critical issues, e.g., are individual research
results "notable enough" for Wikidata? What is a succinct
representation of the data? How can we compute with such data? Can
Wikidata better support quantitative weights on statements? For
instance, Wikidata states that a specific bioinformatics item is
involved in the biological process "memory". Though the statement is
referenced, Wikidata states neither to what degree the item is involved
nor which studies initially made the claim.
If Wikidata gets it right, it could serve as a backend for
quantitative scientific results, making possible large-scale
meta-analyses that would dwarf "ordinary" studies.