Citing Bytes - Adventures in Data Citation: August 2016

Standing on the Digits of Giants: Research data, preservation and innovation

ALPSP seminar, London, 8 March 2016

I was asked to present at an Association of Learned and Professional Society Publishers seminar, back in March this year. You can find my presentation slides here, and the audio of my presentation here.

I've info-dumped my notes on the various talks below, but to sum up, it was a very interesting seminar that seemed to go down well with an audience of primarily publishers, many of whom were getting to grips with this whole data thing for the first time.

William Kilbride, Digital Preservation Coalition

* "Access is not an event, it's a process"
* Standing on someone's shoulders is quite precarious! We need a stable and secure platform - but how do we make one?
* Solutions for digital preservation need to be put in place at the beginning of the lifecycle
* Discussions with publishers can get bogged down in Open Access issues
* Small publishers hold the content that's most at risk
* We need action on Open Access! We've talked about it lots already
* International profile is important

Mark Thorley, NERC

* The digital, networked world is a real game changer. People want on-line access now and for free. And anyone can "publish" anything on the web
* Open research is not an admin overhead
* The data revolution is replaying the printing revolution established by Gutenberg's mechanical, moveable type
* ICSU's report "Open Data in a Big Data World"
* Open research costs money - we have to learn to live with that
* Technology is the "easy bit" - people are complicated!

Robert Gurney, University of Reading

* The cloud approach is developing fast in environmental data - visualisation of data (especially large quantities of data) is very important
* Infrastructure as a service provides easy access to resources
* Problems in Big Data - volume, variety, veracity
* The Belmont Forum
  * is set up to allow common cross-national calls. Their data policy and principles are published on the web
  * is establishing a data and e-Infrastructure coordination office
  * is creating a common enhanced data plan
  * is planning scoping workshops and international calls for case studies and to share infrastructure and develop best practice
* NERC are leading the effort on a cross-disciplinary training curriculum to expand human capacity. This will involve the UN training agency, and there will be an open call for a training champion
* The Belmont Forum implementation plan is published

Phil Jones, Digital Science

* We are moving from cottage industry to industrial scale science, but funding structures are more set up to support cottage industry science.
* Valen, Blanchat, figshare, 2015 - Survey of data policies for funders across the UK and USA
* Open Academic Tidal Wave is moving from recommendations to enforcement
* Data repositories have different approaches - structured versus unstructured
* Publishers only have a limited window of time to engage with researchers during the research workstream - but new tools are coming out to allow publishers to interact with researchers over a longer period
* If we want compliance, the simpler we can make the tools to do it, the better

Peter Burnhill, EDINA

* Increasingly, references point to the wild web, not just back to other articles
* Scholarly record always has a fuzzy edge
* Libraries no longer have e-collections, only e-connections
* Mostly big publisher content being archived - but we don't know if the small stuff is being archived. Research libraries archiving stuff aren't going for the long tail of stuff published by small publishers
* Reference rot = link rot + content drift
* analysed ~ 1 million URI links - tested whether the URIs still worked, and whether there was a "memento" of that reference in the "archived web"
* ~75% not archived within 14 days of publication
* Klein 2014, PLOS One - "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot"
* rotten references mean defective articles!
* author workflow - note taking software, working with Zotero
* Publishers should accept robust links in cited references, and avoid reference rot by triggering archiving of snapshots and inserting Hiberlinks/robust links at the point of ingest into the submission system.
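
To make the "reference rot = link rot + content drift" idea a bit more concrete, here's a minimal sketch of how a single reference might be classified. This is my own illustration, not the method used in the study Peter described: the function names are made up, and comparing exact SHA-256 fingerprints is a deliberately crude stand-in for judging whether a page's content has really drifted (it will flag even trivial changes).

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 fingerprint of a page's content."""
    return hashlib.sha256(content).hexdigest()


def classify_reference(cited_fingerprint: str, live_content: Optional[bytes]) -> str:
    """Classify a cited URI (hypothetical helper, for illustration only).

    cited_fingerprint: fingerprint of the page as it was when cited.
    live_content: what the URI returns today, or None if it no longer resolves.
    """
    if live_content is None:
        # The URI does not resolve any more.
        return "link rot"
    if fingerprint(live_content) != cited_fingerprint:
        # The URI still resolves, but to different content.
        return "content drift"
    return "intact"
```

In practice the "as cited" fingerprint would have to come from an archived snapshot taken around the time of citation, which is exactly why triggering archiving at submission matters.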

Mike Taylor, Elsevier

* Research data metrics - interest has exploded in past few years
* NISO - data metrics recommendations - set up 3 working groups
  * "metrics for non-traditional outputs" group
  * recommending that dataset download usage be reported using COUNTER-compliant formulations, and that funders support repositories to do this
* Elsevier is adapting its research infrastructure to deal with research data
* much easier to set up new products than adapt existing systems!
* Ambitions for next year:
  * most Elsevier journals promoting data publishing with data policies
  * submission system to support data citations and data submissions
  * communicate what's being done
* Data metrics part of the value loop encouraging researchers to make their data available. (Also including data)
* Metrics based on data citation will be happening in the near future, as soon as the infrastructure is built
* Not just one metric!
  * article level metrics
  * journal level metrics
* the more metrics, the harder it is to hide things - multiple metrics give multiple points of view

Josh Brown, ORCID

* CRediT schema - update ORCID schema to include other research roles e.g. data etc.
* Contributor type badges
* project-thor.eu
* need PIDs for organisations
* issues with versioning, identifier equivalence, granularity, changes over time, making cultural changes mainstream
* all research activities need to be taken into account
* we can't reward it if we don't recognise it
* we won't recognise it if we can't agree on what it is

Matthew Addis, Arkivum

* direct benefit to researchers in getting involved with digital preservation
* tools and services exist now that allow researchers to get on and do digital preservation
* 44% of links to Astronomy data broken after 10 years
* Researchers only really get judged on how much grant money they bring in, and how many publications they produce - digital preservation will help with both of these
* Lots of tools and models out there, but not particularly helpful for most researchers. Too much choice!
* do the bare minimum to get benefits from digital preservation - parsimonious preservation
  * know what you have - understand the formats, catalogue the data
  * put it somewhere safe
* link rot - how to address it?
* DROID - file format identification tool, can generate XML/PDF reports. Metadata includes links to PRONOM - technical registry for file formats
* checksums - useful to establish if data has been lost/corrupted. Tools e.g. exactly - creates BagIt manifest of files
* ADMIRe survey at Nottingham
* make lots of copies to keep stuff safe - put them in places like institutional repositories...
* links are important. DOIs are dependent on URLs, which are as brittle as any URLs - lots of links compensate for reference rot
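
The checksum point is easy to try at home. Below is a minimal sketch, assuming nothing more than the Python standard library: it records a SHA-256 checksum for every file under a folder and later reports any file that has gone missing or changed. It is in the spirit of a BagIt manifest but is not a BagIt implementation, and the function names are my own, not taken from any of the tools mentioned in the talk.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks, so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List files that are missing or whose checksum no longer matches."""
    damaged = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(relative_path)
    return damaged
```

The idea would be to run `build_manifest` once when the data is deposited, keep the result alongside each copy, and run `find_damage` periodically; an empty list means every recorded file is still present and unchanged.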

Wendy White, University of Southampton

* PIs as change agents - collaboration with academic leadership to enact changes
* collaboration - e.g. capturing information about equipment and facilities
* Risk of garbage in and pretty visualisations out
* Quick wins - embedding DOIs, CC0 metadata
* Zika initiative - engage with lots of other smaller initiatives as well e.g. greynet.org
* Networks of repositories - institutional repositories working with international and national disciplinary repositories
* Not making enough of theses data - encourage more theses to have data made available
* Library triaged research data services - consultancy, engagement with editors, advice, workshops
* Different training models - pick and mix, intense and seasonal, integrated pathways (what we want!), emergency boost (help panicking people)
* Southampton reviewing curricula - modules on data analysis, ethics and research methods are good areas to discuss data management
* PhD students are great agents for change - passionate advocates
* Embedded librarians inside research teams iutility.ac.uk
* Research data - more than management!
* An archive isn't a thing, it's a strategy

Peter Doorn - DANS

* Lots of different types of data journals and data papers
* Data paper describes the research context of a dataset
* Presentation of a data paper should look attractive - more user-friendly than the view of the dataset in the archive
* Variety of interactive data visualisation - make the data more alive
* publishing data in Mendeley Data - Elsevier aren't making it obligatory to publish data there


Friday, 26 August 2016
