Citing Bytes - Adventures in Data Citation: July 2016

COPE Seminar: An Introduction to Publication Ethics, 13th May 2016, Oxford

Old books in my local second hand bookshop

The COPE (Committee on Publication Ethics) Seminar: An Introduction to Publication Ethics was held on Friday 13th May 2016, in Oxford.

Being fairly new to this editor business, and the workshop being so local, I took the opportunity to go and found it all really useful, not only from my perspective as someone in charge of a journal, but also from the data management and publication point of view. A lot of the issues raised during the workshop, such as attribution, authorship and plagiarism, apply just as easily to datasets as they do to journal articles.

The workshop was a mixture of talks and discussion sessions, where we were given examples of actual cases that COPE had been told about, and we had to discuss and decide what the best course of action was. Then we were told what the response from the COPE members was in those particular cases - reassuringly we were pretty much in agreement in all cases!

Key notes that I jotted down during the day include:

Retractions of papers are growing at a rate faster than publications
An emerging area of concern is the growth of fake peer reviewers
Ethical guidelines for peer reviewers are available on the COPE website, along with other guidelines
Similarly, there are flowcharts on the COPE site to guide you through what to do if you suspect an ethical problem
Report for the Nuffield Council on Bioethics on the culture of scientific research
Academy of Medical Sciences - Reproducibility and reliability of biomedical research
Some authors will put in white quotation marks around text to get around plagiarism detection software

The main take-home message for me was that COPE have a lot of resources on their website, all free to use.

Data visualisation and the future of academic publishing, Oxford, 10 June 2016

Astrolabes at the Museum of the History of Science, Oxford

Once again wearing my Editor-in-Chief hat, I was invited to the "Data visualisation and the future of academic publishing" workshop, hosted by the University of Oxford and Oxford University Press on Friday 10th June 2016.

It was a pretty standard workshop format - lots of talks - but there was a wide variety of speakers from a wide spread of backgrounds, which really helped make people think about the issues involved in data visualisation. I particularly enjoyed the interactive demonstrations from the BBC and Financial Times speakers - both saying things that seem really obvious in retrospect, but are worth remembering when doing your own data visualisations (like keep it simple and self-contained, and make sure it tells a story).

For those who are interested, I've copied my (slightly edited) notes from the workshop below. Hopefully they'll make sense!

Richard O’Beirne (Digital Strategy Group, Oxford University Press)

What is a figure? A scientific result converted into a collection of pixels
Steep growth in "data visualisation" in Web of Science, PubMed
Data visualisation in Review: Summary, Canada 2012
Infographics tell a story about datasets
Preservation of visualisations is an issue
OUP got funding to identify suitable datasets to create visualisations (using 3rd party tools) and embed them in papers

Mark Hahnel (figshare)

Consistency of how you get to files on the internet is key
Institutional instances of figshare now happening globally e.g. ir.stedwards.edu / stedwards.figshare.com
Making files available on the internet allows the creation of a story
How do you get credit? Citation counts? Not being done yet
Files on the internet -> context -> visualisation
Data FAIRport initiative - to join and support existing communities that try to realise and enable a situation where valuable scientific data is ‘FAIR’ in the sense of being Findable, Accessible, Interoperable and Reusable
Hard to make visualisations scale!
Open data and APIs make it easier to understand the context behind the stories
Whose responsibility is it to look after these data visualisations?
Need to make files human and machine readable - add sufficient metadata!
Making things FAIR just allows people to build on stuff that has gone before - but it's easy to break if people don't share
How to deal with long-tail data? Standardisation...
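
As a rough illustration of the "human and machine readable" point above, here is a minimal Python sketch that writes a small JSON metadata file alongside a data file. It was not shown at the workshop: the function name `write_sidecar`, the `.metadata.json` suffix and the choice of fields are all assumptions made for illustration, and they do not follow figshare's schema or any particular metadata standard.

```python
import hashlib
import json
from pathlib import Path


def write_sidecar(data_path, title, creators, licence, description=""):
    """Write '<data file>.metadata.json' next to the data file; return its path.

    Hypothetical helper: the field names below are illustrative only and do
    not follow any particular metadata standard.
    """
    data_path = Path(data_path)
    record = {
        "title": title,
        "creators": list(creators),
        "licence": licence,
        "description": description,
        "filename": data_path.name,
        "size_bytes": data_path.stat().st_size,
        # A checksum lets a reuser confirm they have the file that was described.
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
    }
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    # Indented, key-sorted JSON is readable by people and parseable by machines.
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar("rainfall.csv", "Example rainfall data", ["A. Researcher"], "CC-BY-4.0")` on an existing file would produce `rainfall.csv.metadata.json` in the same folder. A real repository would ask for much richer metadata than this, usually following a discipline-specific standard.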

John Walton (Senior Broadcast Journalist, BBC News)

Example of data visualisation of number of civilians killed by month in Syria
Visualisation has to make things clear - the layer of annotation around a dataset is really important
Most interactive visualisations are bespoke
It's helpful to keep things simple and clear!
Explain the facts behind things with data visualisation, but not just to people who like hard numbers - also include human stories
Lots of BBC web users are on mobile devices - need to take that into account
Big driver for BBC content is sharing on social media - BBC spend time making the content rigorous and collaborating with academia
Jihadism: tracking a month of deadly attacks - during the month there were about 600 deaths and ~700 attacks around the world
Digest the information for your audience
Keep interaction simple - remember different devices are used to access content

Rowan Wilson (Research Technology Specialist, University of Oxford)

Creating crosswalks for common types of research data to get them into Blender
People aren't that used to navigating around 3-dimensional data - example imported into Minecraft (as a sizeable proportion of the population are comfortable with navigating around that environment)
Issues with confidentiality and data protection, data ownership, copyright and database rights
Open licenses are good for data, but should consider waiving the hard requirement for attribution, as cumbersome attribution lists will put people off using data
MeshLab - tool to convert scientific data into Blender format

Felix Krawatzek (Department of Politics and International Relations, University of Oxford)

Visualising 150 years of correspondence between the US and Germany
Letters (handwritten/typed) need significant resource and time to process them before they can be used
Software produced to systematically correct OCR mistakes
Visualise the temporal dynamics of the letters
Visualisation of political attitudes
Can correlate geographic data from the corpus with census data
Always questions about availability of time or resources
Crowdsourcing projects that tend to work are those that appeal to people's sense of wonder, or their human interest. Get more richly annotated data if you can harness the power of crowds.
Zooniverse created a byline to give the public credit for their work in Zooniverse projects

Andrea Rota (Technical Lead and Data Scientist, Pattrn)

Origin of the platform: the Gaza platform - documenting atrocities of war, humanitarian and environmental crises

"improving the global understanding of human evil"

Not a data analysis tool - for visualisation and exploration
Data in Google Sheets (no setup needed)
Web-based editor to submit/approve new event data
Information and computational politics - Actor Network Theory - network of human and non-human actors - how to cope with loss
Pattrn platform for sharing of knowledge, data, tools and research, not for profit
Computational agency - what are we trading in exchange for short-term convenience?
"How to protect the future web from its founders' own frailty" Cory Doctorow 2016
Issues with private data backends e.g. dependency on cloud proprietary systems
Computational capacity - where do we run code? Computation is cheap, managing computation isn't easy

Alan Smith (Data Visualisation Editor, Financial Times)

Gave a lovely example of a bad chart published in the Times, and how it should have been presented
Visuals need to carry the story
Avoid chart junk!
Good example of taking an academic chart and reformatting it to make the story clearer
Graphics have impact on accompanying copy
Opportunity to "start with the chart"
Self-contained = good for social media sharing
Fewer charts, but better
Content should adapt to different platforms
The Chart Doctor - monthly column in the FT
Visualisation has a grammar and a vocabulary, it needs to be read, like written text

Scott Hale (Data Scientist, Oxford Internet Institute, University of Oxford)

Making existing tools easy to use, online interfaces to move from data file to visualisation
Key: make it easy
Plugin to Gephi to export data as a JavaScript plugin for a website
L2project.org - compiles straight to JavaScript - write code once - attach tables/plots to an HTML element. Interactive environment that can go straight into an HTML page

Alejandra Gonzalez-Beltran (Research Lecturer, Oxford e-Research Centre)

All about Scientific Data journal
Paper on survey about reproducibility - "More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments."
FAIR principles
isaexplorer to find and filter data descriptor documents

Philippa Matthews (Honorary Research Fellow, Nuffield Department of Medicine)

Work is accessible if you know where to look
Lots of researcher profiles on lots of different places - LinkedIn, ResearchFish, ORCID,...
Times for publication are long
Spotted minor error with data in a supplementary data file - couldn't correct it
Want to be able to share things better - especially entering dialogue with patients and research participants
Want to publish a database of HBV epitopes - publish as a peer-reviewed journal article, but journals wary of publishing a live resource

My response to this was to query the underlying assumption that a database needs to be published like a paper - again a casualty of the "papers are the only true academic output" meme.

Public engagement - dynamic and engaging rather than static images e.g. Tropical medicine sketchbook

3rd LEARN Workshop, Helsinki, June 2016

Cute bollard at Helsinki airport

The 3rd LEARN (Leaders Activating Research Networks) workshop on Research Data Management, “Make research data management policies work” was held in Helsinki on Tuesday 28th June. I was invited wearing my CODATA hat (as Editor-in-Chief for the Data Science Journal) to give the closing keynote about the Science International Accord "Open Data in a Big Data World".

The problem with doing closing talks is that so much of what I wanted to say had pretty much already been said by someone during the course of the day - sometimes even by me during the breakout sessions! Still, it was a really interesting workshop, with excellent discussion (despite the pall that Brexit cast over the coffee and lunchtime conversation - but that's a topic for another time).

There were three breakout sessions, and the timings meant that you could go to two of them.

I started with Group 3: Making possible and encouraging the reuse of data: incentives needed. This is my day job - taking data in from researchers, making it understandable and reusable, and figuring out ways to give them credit and rewards for doing so. And my group has been doing this for more than 2 decades, so I'm afraid I might have gone off on a bit of a rant. Regardless, we covered a lot, though mainly the old chestnuts of the promotion and tenure system being fixated on publications as the main academic output, the requirements for standards (especially for metadata - acknowledging just how difficult it would be to come up with a universal metadata standard applicable to all research data), and the fact that repositories can control (to a certain extent) the technology, but culture change still needs to happen. Though there were some positives on the culture change - I noted that journals are now pushing DOIs for data, and this has had an impact on people coming to us to get DOIs.

The next breakout group I went to was Group 1: Research Data services planning, implementation and governance. What surprised me in this session (maybe it shouldn't have) was just how far advanced the UK is when it comes to research data management policies and the like, in comparison to other countries. This did mean that my UK colleagues and I got quizzed a fair bit about our experiences, which made sense. I had a bit of a different perspective from most of the other attendees - being a discipline-specific repository means that we can pick and choose what data we take in, unlike institutional repositories, which have to be more general. On being asked about what other services we provide, I did manage to name-drop JASMIN, in the context of a UK infrastructure for data analysis and storage.

I think the key driver in the UK for getting research data management policies working was the Research Councils, and their policies, but also their willingness to stump up the cash to fund the work. A big push on institutional repositories was EPSRC's putting the onus on research institutions to manage EPSRC-funded research data. But the increasing importance of data, and people's increased interest in it, is coming from a wide range of drivers - funders, policies, journals, repositories, etc.

I understand that the talks and notes from the breakouts will be put up on the workshop website, but they're not up at the time of writing. You can find the slides from my talk here.

Friday, 1 July 2016
