Citing Bytes - Adventures in Data Citation: 2017

"Should we cite preprints?" - Green Tea and Velociraptors

Agrees with my "cite what you use" rule of thumb

"Preprints won’t just publish themselves: Why we need centralized services for preprints" - Collaborative Knowledge Foundation

Neylon C, Pattinson D, Bilder G and Lin J. On the origin of nonequivalent states: How we can talk about preprints [version 1; referees: 1 approved]. F1000Research 2017, 6:608 (doi: 10.12688/f1000research.11408.1)

Really interesting article that proposes a model that distinguishes the characteristics of the object, its “state” (the external, objectively determinable, characteristics), from the subjective “standing” (the position, status, or reputation) granted to it by different communities.

Baldwin, Melinda, "In referees we trust?", Physics Today 70, 2, 44 (2017); doi: http://dx.doi.org/10.1063/PT.3.3463

Fascinating article about the history of academic journal peer review, and the societal pressures that have made peer review the "gold standard" of academic credibility, with some discussion of how it's creaking at the seams.

"Does It Matter Whose Name Appears After the © When Using Creative Commons?" - Todd Carpenter (The Scholarly Kitchen)

"Citation Performance Indicators — A Very Short Introduction" - Phil Davis (The Scholarly Kitchen)

"Satire in Scholarly Publishing" - COPE

A satirical article made it into a serious review article - COPE (Committee on Publication Ethics) give their judgement on the case. TL;DR - always fully read the papers you're citing!

"Journal accepts bogus paper requesting removal from mailing list" - The Guardian

A tale of a predatory open access journal accepting a paper (with lovely diagrams) which just repeated the words: "Get me off your ******* mailing list"

Two little aliens stowed away for this trip, and were very pleased that the venue was all space themed.

RDA Plenary 9 was held in Barcelona, in April 2017. I made my usual bunch of scrappy notes, which I've tidied up and added links and commentary (in italics) for those who are interested.

Opening plenary session

Ideas spreadsheet for suggestions on how to coordinate and communicate across RDA groups

has to be an actionable suggestion - no moaning!
closed for suggestions now, but you can see what was proposed

WG RDA/WDS Scholarly Link Exchange
Interesting stuff and presenting things that are approaching maturity and could be useful and usable systems in the future.

All about linking research objects
Scholix information model:

mandatory: for link information package: publication date, link publisher. For source and target object: identifier and object type
other optional metadata includes link provider, relationship type, license URL of link information package (for link information package), title, creator, publication date, publisher (for source/target objects)

DLI service available as a prototype

automatically picks up stuff from DataCite
Scopus using the DLI system to find links to data

information available for preview users
wishlist for Scopus includes: clearer information on where data is stored, ability to retrieve richer metadata...

Scopus planning on doing data citation counts in the future

Scholix plans on collecting every link possible, not just citations
information about datasets in the text of papers, needs to be mined out and extracted - some publishers doing this
community focus groups within the WG - working on documents to answer the main questions "why?" "how?" FAQs - hoping to have them produced in the next 3 months or so
use cases - how data centres can contribute artile links to DataCite = use "relatedIdentifier" property in DataCite metadata schema
Scholix doesn't say whether the dataset or the article is open, or about the licensing of the objects being linked

How to give credit to scientists for their involvement in making data & samples available for sharing
Unfortunately seemed to spend too much time rehashing old data citation, data publication and data metrics arguments.

BRIF - Bioresource Research Impact Factor
Data metrics and reward systems - table 3 in report
Analysis of metadata records in DataCite reveals that not all records are complete.

Consensus and standardisation of metadata needed

Top data creator in DataCite is a mycologist
WG RDA / TD Metadata Standards for attribution of physical and digital collections stewardship already exists. Reasearch Data Provenance IG already exists.
Focussing very much on data publication as a method for giving credit - too much overlap with existing WG/IGs
CoBRA short checklist for citation of bioresources in scientific journal articles
IGSN is now in DataCite metadata schema as relatedIdentifierType

IG RDA/WDS Certification of Digital Repositories
Started with presentations, then we broke out into groups to discuss certain questions and responses in the self certification process. I also got photographed by the official photographer.

Core Trustworthy Data Repository Requirements incude:

explicit mission, licenses, continuity plan, disciplinary and ethical norms, adequate funding (3-5 years) and qualified staff, expert guidance, integrity and authenticity of the data, relevence and understandability, documented processes and procedure, long-term preservation

IG RDA/WDS Publishing Data
A key topic of this session was trying to figure out the next direction the IG should take... unfortunately still to be determined

WG on Data Fitness for use - just starting - see below
OECD-GSF CODATA project: business models for sustainable research data repositories
Niso recommendation on assessment of scholarly research - non traditional metrics
Where to take the IG?

think about where scholarly publishing is going in the future. New publishing models - preprint repositories, open peer-review...

IG Data policy standardisation and implementation
Came from a BoF last plenary, but now an official IG - this meeting primarily about what already exists

UK Concordat on open research data
IG primary objective - define a common framework for research data policy allowing for different requirements, different levels of commitment and acknowledging disciplinary differences
Journal research data policy registry
Complying with funder policy is what researchers give as their motivation to share data, but researchers find it hard to comply with policy
Springer Nature Research Data Policy framework
A Data Citation Roadmap for Publishers
Do studies of quantitative results of the impact of data sharing exist? Citation benefits?

doing studies, but insufficient evidence as yet.

The Open Data Citation Advantage

Suggestion that the Belmont Forum is bringing together people for standardising policies...?

Software Source Code focus group
Good discussion in this BoF - though mainly asking questions rather than providing answers

Statement of the problem clear - need software for scientific reproducibility. But don't have suitable repositories/ontologies for source code.

differences between scientific software and open source? Can we learn from open source developers?
is RDA a suitable venue for this work? Anything else going on in this area?

Mailing lists and bug tracking chains are important sources of information about the code
Software as knowledge, versus software as an instrument in the process

Docker - focussing on re-run-abilty

Archives do throw things out - so saving all the commits might not be possible/practical

Open Source software - don't know when it starts what it will turn into - often safer to archive everything and then throw things out later.

Distinction between code as knowledge and reproducibility
Cost of storage, curation and maintainence of the metadata

Reproducibility IG working on this a bit

Difference between replicability and reproducibility

Docker image not enough for reproducibility - as we need to be able to modify the source code

Don't get the chance to read a scientific article's first five drafts. People don't want to share their first drafts. Might put people off sharing.

first drafts of literature don't usually get shared, until the person writing them becomes famous, in which case people are interested

Rely on top layers overlaying archival? e.g. overlay journal
Work being done on software citation - in/out of scope? Connected to metadata
Notes from the session

WG RDA/WDS Assessment of Data Fitness for Use
New WG - meeting primarily about the criteria that can be used to assess data fitness for (re)use.

Looking at individual data sets
Needs to be efficient, high impact and visibility
Data quality: "degree to which a set of characteristics of data fulfills requirements" (ISO900)

any data are usable as long as they fit the requirements

Criteria 1

inherent properties: objectively verifiable/measurable e.g. validity of used methodologies, completeness of metadata
non-inherent propertise: subjective assessments

Criteria 2: properties directly related to data objects/ data accessibility/ data management processes
FAIR data principles

FAIRness Index - a collection of metrics to assess adherence to the FAIR principles

DANS FAIR badge scheme - going through testing at the moment

reusability as the resultant of the other 3 (F+A+I)/3=R
scores for F,A,I as 1 to 5
publish number of user reviews, archivist assessments, downloads
mapping of reusable criteria to other F/A/I criteria
examples of star values criteria for each F/A/I
Online questionnaire system developed for reviewers of datasets
planning on creating a neutral website to assess datasets FAIRDAT.org (DAT = data assessment tool)

Issues with asssessing multi-file datasets (with files in different formats), quality of metadata (how to evaluate when metadata is insufficient versus rich), how to define use of standard vocabularies

Citing Bytes - Adventures in Data Citation

Monday, 22 May 2017

Link roundup - academic publishing edition

Tuesday, 2 May 2017

RDA Plenary 9, Barcelona, April 2017