Monday 22 May 2017

Link roundup - academic publishing edition


  • Neylon C, Pattinson D, Bilder G and Lin J. On the origin of nonequivalent states: How we can talk about preprints [version 1; referees: 1 approved]. F1000Research 2017, 6:608 (doi: 10.12688/f1000research.11408.1)
    • Really interesting article that proposes a model that distinguishes the characteristics of the object, its “state” (the external, objectively determinable, characteristics), from the subjective “standing” (the position, status, or reputation) granted to it by different communities. 
  • Baldwin, Melinda, "In referees we trust?", Physics Today 70, 2, 44 (2017); doi: http://dx.doi.org/10.1063/PT.3.3463
    • Fascinating article about the history of academic journal peer review, and the societal pressures that have made peer review the "gold standard" of academic credibility, with some discussion of how it's creaking at the seams.
  • "Satire in Scholarly Publishing" - COPE
    • A satirical article made it into a serious review article - COPE (Committee on Publication Ethics) give their judgement on the case. TL;DR - always fully read the papers you're citing!

Tuesday 2 May 2017

RDA Plenary 9, Barcelona, April 2017

Two little aliens stowed away for this trip, and were very pleased that the venue was all space themed.

RDA Plenary 9 was held in Barcelona, in April 2017. I made my usual bunch of scrappy notes, which I've tidied up, adding links and commentary (in italics) for those who are interested.

Opening plenary session

  • Ideas spreadsheet for suggestions on how to coordinate and communicate across RDA groups 
    •  has to be an actionable suggestion - no moaning! 
    •  closed for suggestions now, but you can see what was proposed


WG RDA/WDS Scholarly Link Exchange
Interesting stuff, presenting things that are approaching maturity and could be useful and usable systems in the future.

  • All about linking research objects
  • Scholix information model:
    • mandatory: publication date and link publisher (for the link information package); identifier and object type (for the source and target objects)
    • optional: link provider, relationship type and license URL (for the link information package); title, creator, publication date and publisher (for the source/target objects)
  • DLI service available as a prototype 
    • automatically picks up stuff from DataCite
    • Scopus using the DLI system to find links to data
      •   information available for preview users
      • wishlist for Scopus includes: clearer information on where data is stored, ability to retrieve richer metadata...
    • Scopus planning on doing data citation counts in the future
  • Scholix plans on collecting every link possible, not just citations
  • information about datasets in the text of papers needs to be mined out and extracted - some publishers are doing this
  • community focus groups within the WG - working on documents to answer the main questions "why?" "how?" FAQs - hoping to have them produced in the next 3 months or so
  • use cases - how data centres can contribute article links to DataCite: use the "relatedIdentifier" property in the DataCite metadata schema
  • Scholix doesn't say whether the dataset or the article is open, or about the licensing of the objects being linked
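
To make the mandatory/optional split above a bit more concrete, here is a rough Python sketch of a link information package, built from one DataCite-style "relatedIdentifier" entry. The class and function names are my own invention (not part of Scholix or DataCite), the DOIs are placeholders, and the object type values are just illustrative; only the "relatedIdentifier", "relatedIdentifierType" and "relationType" keys come from the DataCite metadata schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LinkedObject:
    """Source or target of a link (hypothetical class name)."""
    identifier: str                  # mandatory per the notes
    object_type: str                 # mandatory per the notes
    title: Optional[str] = None      # optional per the notes
    publisher: Optional[str] = None  # optional per the notes


@dataclass
class LinkInfoPackage:
    """One link between two research objects (hypothetical class name)."""
    publication_date: date                   # mandatory per the notes
    link_publisher: str                      # mandatory per the notes
    source: LinkedObject
    target: LinkedObject
    relationship_type: Optional[str] = None  # optional per the notes
    license_url: Optional[str] = None        # optional per the notes


def link_from_related_identifier(dataset_doi, related, link_publisher, published):
    """Build a link package from one DataCite-style relatedIdentifier entry.

    Assumes the source is a dataset and the target is an article; the
    object type strings used here are illustrative only.
    """
    return LinkInfoPackage(
        publication_date=published,
        link_publisher=link_publisher,
        source=LinkedObject(identifier=dataset_doi, object_type="dataset"),
        target=LinkedObject(
            identifier=related["relatedIdentifier"],
            object_type="literature",
        ),
        relationship_type=related.get("relationType"),
    )


# Placeholder values, not real identifiers.
related = {
    "relatedIdentifier": "10.1234/example-article",
    "relatedIdentifierType": "DOI",
    "relationType": "IsSupplementTo",
}
link = link_from_related_identifier(
    "10.1234/example-dataset", related, "Example Data Centre", date(2017, 4, 5)
)
print(link.source.identifier, link.relationship_type, link.target.identifier)
```

Note that nothing here records whether either object is open or how it is licensed, which matches the last point above.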


How to give credit to scientists for their involvement in making data & samples available for sharing
Unfortunately seemed to spend too much time rehashing old data citation, data publication and data metrics arguments.



IG RDA/WDS Certification of Digital Repositories
Started with presentations, then we broke out into groups to discuss certain questions and responses in the self certification process. I also got photographed by the official photographer.

  • Core Trustworthy Data Repository Requirements include:
    • explicit mission, licenses, continuity plan, disciplinary and ethical norms, adequate funding (3-5 years) and qualified staff, expert guidance, integrity and authenticity of the data, relevance and understandability, documented processes and procedures, long-term preservation


IG RDA/WDS Publishing Data
A key topic of this session was trying to figure out the next direction the IG should take... unfortunately still to be determined

  • WG on Data Fitness for use - just starting - see below
  •  OECD-GSF CODATA project: business models for sustainable research data repositories 
  • NISO recommendation on assessment of scholarly research - non-traditional metrics
  • Where to take the IG?
    •  think about where scholarly publishing is going in the future. New publishing models - preprint repositories, open peer-review...


IG Data policy standardisation and implementation
Came from a BoF last plenary, but now an official IG - this meeting primarily about what already exists




Software Source Code focus group
Good discussion in this BoF - though mainly asking questions rather than providing answers

  • Statement of the problem clear - need software for scientific reproducibility. But don't have suitable repositories/ontologies for source code.
    • differences between scientific software and open source? Can we learn from open source developers?
    • is RDA a suitable venue for this work? Anything else going on in this area?
  • Mailing lists and bug tracking chains are important sources of information about the code
  • Software as knowledge, versus software as an instrument in the process
    • Docker - focussing on re-run-ability
  •  Archives do throw things out - so saving all the commits might not be possible/practical
    • Open Source software - don't know when it starts what it will turn into - often safer to archive everything and then throw things out later.
  • Distinction between code as knowledge and reproducibility
  • Cost of storage, curation and maintenance of the metadata
    • Reproducibility IG working on this a bit
  • Difference between replicability and reproducibility
    •  Docker image not enough for reproducibility - as we need to be able to modify the source code
  • Don't get the chance to read a scientific article's first five drafts. People don't want to share their first drafts. Might put people off sharing.
    • first drafts of literature don't usually get shared, until the person writing them becomes famous, in which case people are interested
  •  Rely on top layers overlaying archival? e.g. overlay journal
  • Work being done on software citation - in/out of scope? Connected to metadata
  • Notes from the session


WG RDA/WDS Assessment of Data Fitness for Use 
New WG - meeting primarily about the criteria that can be used to assess data fitness for (re)use.

  • Looking at individual data sets
  • Needs to be efficient, high impact and visibility
  • Data quality: "degree to which a set of characteristics of data fulfills requirements" (ISO 9000)
    •   any data are usable as long as they fit the requirements
  • Criteria 1
    • inherent properties: objectively verifiable/measurable e.g. validity of used methodologies, completeness of metadata
    • non-inherent properties: subjective assessments
  • Criteria 2: properties directly related to data objects/ data accessibility/ data management processes
  • FAIR data principles
    •  FAIRness Index - a collection of metrics to assess adherence to the FAIR principles
  • DANS FAIR badge scheme - going through testing at the moment
    • reusability as the resultant of the other 3: R = (F+A+I)/3
    • scores for F,A,I as 1 to 5
    • publish number of user reviews, archivist assessments, downloads
    • mapping of reusable criteria to other F/A/I criteria
    • examples of star values criteria for each F/A/I
    •  Online questionnaire system developed for reviewers of datasets
    •  planning on creating a neutral website to assess datasets FAIRDAT.org (DAT = data assessment tool)
  • Issues with assessing multi-file datasets (with files in different formats), quality of metadata (how to evaluate when metadata is insufficient versus rich), how to define use of standard vocabularies
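
As a quick illustration of the badge arithmetic noted above (R as the mean of F, A and I, each scored 1 to 5), here is a minimal Python sketch. The function name and the range check are my own assumptions, not anything DANS has published.

```python
def reusability_score(findable, accessible, interoperable):
    """Return R = (F + A + I) / 3, with each input scored from 1 to 5."""
    scores = (("F", findable), ("A", accessible), ("I", interoperable))
    for name, score in scores:
        # The notes give 1 to 5 as the range; rejecting anything else
        # is an assumption of this sketch.
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be between 1 and 5, got {score}")
    return (findable + accessible + interoperable) / 3


print(reusability_score(4, 5, 3))  # prints 4.0
```

So a dataset scoring 4 for findability, 5 for accessibility and 3 for interoperability would get a reusability score of 4.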