Two little aliens stowed away for this trip, and were very pleased that the venue was all space themed.
RDA Plenary 9 was held in Barcelona, in April 2017. I made my usual bunch of scrappy notes, which I've tidied up, adding links and commentary (in italics) for those who are interested.
- Ideas spreadsheet for suggestions on how to coordinate and communicate across RDA groups
- has to be an actionable suggestion - no moaning!
- closed for suggestions now, but you can see what was proposed
WG RDA/WDS Scholarly Link Exchange
Interesting stuff, presenting things that are approaching maturity and could be useful and usable systems in the future.
- All about linking research objects
- Scholix information model:
- mandatory: publication date and link publisher (for the link information package); identifier and object type (for the source and target objects)
- other optional metadata includes link provider, relationship type, license URL of link information package (for link information package), title, creator, publication date, publisher (for source/target objects)
- DLI service available as a prototype
- automatically picks up stuff from DataCite
- Scopus using the DLI system to find links to data
- information available for preview users
- wishlist for Scopus includes: clearer information on where data is stored, ability to retrieve richer metadata...
- Scopus planning on doing data citation counts in the future
- Scholix plans on collecting every link possible, not just citations
- information about datasets in the text of papers needs to be mined out and extracted - some publishers are doing this
- community focus groups within the WG - working on documents to answer the main questions "why?" "how?" FAQs - hoping to have them produced in the next 3 months or so
- use cases - how data centres can contribute article links to DataCite: use the "relatedIdentifier" property in the DataCite metadata schema
- Scholix doesn't say whether the dataset or the article is open, or about the licensing of the objects being linked
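The mandatory/optional split in the Scholix information model noted above can be sketched as a pair of Python dataclasses. This is only an illustration: the class names, field names and the `missing_mandatory` helper are my own assumptions, not the official Scholix schema or any real library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkedObject:
    """Source or target object of a link (hypothetical field names)."""
    identifier: str                          # mandatory
    object_type: str                         # mandatory
    title: Optional[str] = None              # optional
    creator: Optional[str] = None            # optional
    publication_date: Optional[str] = None   # optional
    publisher: Optional[str] = None          # optional


@dataclass
class LinkInformationPackage:
    """One link between two research objects (hypothetical field names)."""
    publication_date: str                    # mandatory
    link_publisher: str                      # mandatory
    source: LinkedObject
    target: LinkedObject
    link_provider: Optional[str] = None      # optional
    relationship_type: Optional[str] = None  # optional
    license_url: Optional[str] = None        # optional


def missing_mandatory(package: LinkInformationPackage) -> List[str]:
    """Return the names of mandatory fields that are empty."""
    checks = {
        "publication_date": package.publication_date,
        "link_publisher": package.link_publisher,
        "source.identifier": package.source.identifier,
        "source.object_type": package.source.object_type,
        "target.identifier": package.target.identifier,
        "target.object_type": package.target.object_type,
    }
    return [name for name, value in checks.items() if not value]


# Placeholder values only - not a real link.
example = LinkInformationPackage(
    publication_date="2017-04-05",
    link_publisher="Example Data Centre",
    source=LinkedObject(identifier="10.1234/example-article", object_type="literature"),
    target=LinkedObject(identifier="", object_type="dataset"),
)
print(missing_mandatory(example))
```

Here the target identifier is deliberately left empty, so the helper reports it as the one mandatory field still missing.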
How to give credit to scientists for their involvement in making data & samples available for sharing
Unfortunately seemed to spend too much time rehashing old data citation, data publication and data metrics arguments.
- BRIF - Bioresource Research Impact Factor
- Data metrics and reward systems - table 3 in report
- Analysis of metadata records in DataCite reveals that not all records are complete.
- Consensus and standardisation of metadata needed
- Top data creator in DataCite is a mycologist
- WG RDA / TD Metadata Standards for attribution of physical and digital collections stewardship already exists. Research Data Provenance IG already exists.
- Focussing very much on data publication as a method for giving credit - too much overlap with existing WG/IGs
- CoBRA short checklist for citation of bioresources in scientific journal articles
- IGSN is now in DataCite metadata schema as relatedIdentifierType
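For anyone wondering what the "relatedIdentifier" property looks like in practice, here is a rough sketch that builds the corresponding XML fragment with Python's standard library. The attribute names (`relatedIdentifierType`, `relationType`) come from the DataCite metadata schema; the identifiers are made-up placeholders, the choice of relation types is my assumption, and the DataCite XML namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET


def related_identifiers_block(links):
    """Build a <relatedIdentifiers> fragment from (identifier, type, relation) tuples."""
    parent = ET.Element("relatedIdentifiers")
    for identifier, identifier_type, relation_type in links:
        elem = ET.SubElement(
            parent,
            "relatedIdentifier",
            {
                "relatedIdentifierType": identifier_type,
                "relationType": relation_type,
            },
        )
        elem.text = identifier
    return ET.tostring(parent, encoding="unicode")


# Placeholder identifiers - not real records.
fragment = related_identifiers_block([
    # a dataset pointing at the article it supplements
    ("10.1234/example-article", "DOI", "IsSupplementTo"),
    # a dataset pointing at the physical sample it was derived from
    ("XXXX00001", "IGSN", "IsDerivedFrom"),
])
print(fragment)
```

A data centre would include a fragment like this in the metadata record it registers for the dataset.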
IG RDA/WDS Certification of Digital Repositories
Started with presentations, then we broke out into groups to discuss certain questions and responses in the self certification process. I also got photographed by the official photographer.
- Core Trustworthy Data Repository Requirements include:
- explicit mission, licenses, continuity plan, disciplinary and ethical norms, adequate funding (3-5 years) and qualified staff, expert guidance, integrity and authenticity of the data, relevance and understandability, documented processes and procedures, long-term preservation
IG RDA/WDS Publishing Data
A key topic of this session was trying to figure out the next direction the IG should take... unfortunately still to be determined
- WG on Data Fitness for use - just starting - see below
- OECD-GSF CODATA project: business models for sustainable research data repositories
- NISO recommendation on assessment of scholarly research - non-traditional metrics
- Where to take the IG?
- think about where scholarly publishing is going in the future. New publishing models - preprint repositories, open peer-review...
IG Data policy standardisation and implementation
Came from a BoF last plenary, but now an official IG - this meeting primarily about what already exists
- UK Concordat on open research data
- IG primary objective - define a common framework for research data policy allowing for different requirements, different levels of commitment and acknowledging disciplinary differences
- Journal research data policy registry
- Complying with funder policy is what researchers give as their motivation to share data, but researchers find it hard to comply with policy
- Springer Nature Research Data Policy framework
- A Data Citation Roadmap for Publishers
- Do studies of quantitative results of the impact of data sharing exist? Citation benefits?
- doing studies, but insufficient evidence as yet.
- Suggestion that the Belmont Forum is bringing together people for standardising policies...?
Software Source Code focus group
Good discussion in this BoF - though mainly asking questions rather than providing answers
- Statement of the problem clear - need software for scientific reproducibility. But don't have suitable repositories/ontologies for source code.
- differences between scientific software and open source? Can we learn from open source developers?
- is RDA a suitable venue for this work? Anything else going on in this area?
- Mailing lists and bug tracking chains are important sources of information about the code
- Software as knowledge, versus software as an instrument in the process
- Docker - focussing on re-run-ability
- Archives do throw things out - so saving all the commits might not be possible/practical
- Open Source software - don't know when it starts what it will turn into - often safer to archive everything and then throw things out later.
- Distinction between code as knowledge and reproducibility
- Cost of storage, curation and maintenance of the metadata
- Reproducibility IG working on this a bit
- Difference between replicability and reproducibility
- Docker image not enough for reproducibility - as we need to be able to modify the source code
- Don't get the chance to read a scientific article's first five drafts. People don't want to share their first drafts. Might put people off sharing.
- first drafts of literature don't usually get shared, until the person writing them becomes famous, in which case people are interested
- Rely on top layers overlaying archival? e.g. overlay journal
- Work being done on software citation - in/out of scope? Connected to metadata
- Notes from the session
WG RDA/WDS Assessment of Data Fitness for Use
New WG - meeting primarily about the criteria that can be used to assess data fitness for (re)use.
- Looking at individual data sets
- Needs to be efficient, high impact and visibility
- Data quality: "degree to which a set of characteristics of data fulfills requirements" (ISO 9000)
- any data are usable as long as they fit the requirements
- Criteria 1
- inherent properties: objectively verifiable/measurable e.g. validity of used methodologies, completeness of metadata
- non-inherent properties: subjective assessments
- Criteria 2: properties directly related to data objects/ data accessibility/ data management processes
- FAIR data principles
- FAIRness Index - a collection of metrics to assess adherence to the FAIR principles
- DANS FAIR badge scheme - going through testing at the moment
- reusability as the resultant of the other three: R = (F+A+I)/3
- scores for F,A,I as 1 to 5
- publish number of user reviews, archivist assessments, downloads
- mapping of reusable criteria to other F/A/I criteria
- examples of star values criteria for each F/A/I
- Online questionnaire system developed for reviewers of datasets
- planning on creating a neutral website to assess datasets FAIRDAT.org (DAT = data assessment tool)
- Issues with assessing multi-file datasets (with files in different formats), quality of metadata (how to evaluate when metadata is insufficient versus rich), how to define use of standard vocabularies
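The reusability score in the FAIR badge scheme noted above is just the mean of the other three scores. A minimal sketch, assuming each of F, A and I is a score from 1 to 5 as in my notes (the function name and the range check are my own, not part of the DANS scheme):

```python
def reusability_score(f, a, i):
    """Return R as the mean of the F, A and I scores, each expected to be 1 to 5."""
    for name, value in (("F", f), ("A", a), ("I", i)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} score must be between 1 and 5, got {value}")
    return (f + a + i) / 3


print(reusability_score(5, 4, 3))  # prints 4.0
```

One consequence of averaging is that a dataset scoring badly on one principle can still get a respectable R if the other two are high.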