Once again wearing my Editor-in-Chief hat, I was invited to the "Data visualisation and the future of academic publishing" workshop, hosted by the University of Oxford and Oxford University Press on Friday 10th June 2016.
It was a pretty standard workshop format - lots of talks - but the speakers came from a wide variety of backgrounds, which really helped make people think about the issues involved in data visualisation. I particularly enjoyed the interactive demonstrations from the speakers from the BBC and the Financial Times - both said things that seem obvious in retrospect, but are worth remembering when doing your own data visualisations: keep it simple and self-contained, and make sure it tells a story.
For those who are interested, I've copied my (slightly edited) notes from the workshop below. Hopefully they'll make sense!
Richard O’Beirne (Digital Strategy Group, Oxford University Press)
- What is a figure? A scientific result converted into a collection of pixels
- Steep growth in "data visualisation" in Web of Science, PubMed
- Data visualisation in Review: Summary, Canada 2012
- Infographics tell a story about datasets
- Preservation of visualisations is an issue
- OUP got funding to identify suitable datasets to create visualisations (using 3rd party tools) and embed them in papers
Mark Hahnel (figshare)
- Consistency of how you get to files on the internet is key
- Institutional instances of figshare now happening globally e.g. ir.stedwards.edu / stedwards.figshare.com
- Making files available on the internet allows the creation of a story
- How do you get credit? Citation counts? Not being done yet
- Files on the internet -> context -> visualisation
- Data FAIRport initiative - to join and support existing communities that try to realise and enable a situation where valuable scientific data is ‘FAIR’ in the sense of being Findable, Accessible, Interoperable and Reusable
- Hard to make visualisations scale!
- Open data and APIs make it easier to understand the context behind the stories
- Whose responsibility is it to look after these data visualisations?
- Need to make files human and machine readable - add sufficient metadata!
- Making things FAIR just allows people to build on stuff that has gone before - but it's easy to break if people don't share
- How to deal with long-tail data? Standardisation...
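To make the "human and machine readable - add sufficient metadata" point concrete, here's a minimal sketch of a JSON metadata sidecar for a data file. The field names loosely follow Dublin Core / DataCite conventions, and every value (title, creator, DOI, licence) is an invented placeholder:

```python
import json

# Illustrative metadata record for a hypothetical dataset; field names
# loosely follow Dublin Core / DataCite conventions.
metadata = {
    "title": "Monthly rainfall measurements, Oxford, 2010-2015",
    "creator": "Jane Researcher",
    "identifier": "https://doi.org/10.5072/example",  # placeholder DOI
    "license": "CC-BY-4.0",
    "format": "text/csv",
    "description": "Rainfall in mm recorded at a single weather station.",
    "keywords": ["rainfall", "meteorology", "Oxford"],
}

# Serialise as a sidecar file stored alongside the data itself, so both
# humans and machines can discover what the file contains.
sidecar = json.dumps(metadata, indent=2)
print(sidecar)
```

A plain-text sidecar like this is readable by a person in any editor, while the JSON structure lets harvesters and search tools index it - which is most of what "Findable" and "Accessible" ask for.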
John Walton (Senior Broadcast Journalist, BBC News)
- Example of data visualisation of number of civilians killed by month in Syria
- Visualisation has to make things clear - the layer of annotation around a dataset is really important
- Most interactive visualisations are bespoke
- It's helpful to keep things simple and clear!
- Explain the facts behind things with data visualisation, but not just to people who like hard numbers - also include human stories
- Lots of BBC web users are on mobile devices - need to take that into account
- A big driver for BBC content is sharing on social media - the BBC spends time making the content rigorous and collaborating with academia
- Jihadism: tracking a month of deadly attacks - during the month there were about 600 deaths and ~700 attacks around the world
- Digest the information for your audience
- Keep interaction simple - remember different devices are used to access content
Rowan Wilson (Research Technology Specialist, University of Oxford)
- Creating crosswalks for common types of research data to get it into Blender
- People aren't that used to navigating around 3-dimensional data - example imported into Minecraft (as a sizeable proportion of the population is comfortable with navigating around that environment)
- Issues with confidentiality and data protection, data ownership, copyright and database rights; open licences are good for data, but consider waiving a hard requirement for attribution, as cumbersome attribution lists will put people off using the data
- Meshlab - tool to convert scientific data into Blender format
Felix Krawatzek (Department of Politics and International Relations, University of Oxford)
- Visualising 150 years of correspondence between the US and Germany
- Letters (handwritten/typed) need significant resources and time to process before they can be used
- Software produced to systematically correct OCR mistakes
- Visualise the temporal dynamics of the letters
- Visualisation of political attitudes
- Can correlate geographic data from the corpus with census data
- Always questions about availability of time or resources
- Crowdsourcing projects that tend to work are those that appeal to people's sense of wonder, or their human interest. You get more richly annotated data if you can harness the power of crowds.
- Zooniverse created a byline to give the public credit for their work in Zooniverse projects
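As a sketch of the kind of corpus-census correlation mentioned above - the state codes, the letter counts, the population figures, and the per-10,000 normalisation are all invented for illustration:

```python
# Hypothetical letter counts per US state, derived from the corpus.
letters_by_state = {"NY": 120, "PA": 85, "OH": 40}

# Hypothetical census figures (German-born population) for the same
# states and period.
census_by_state = {"NY": 250000, "PA": 180000, "OH": 90000}

# Correlate the two sources on the shared geographic key, normalising
# letter counts by population so states of different sizes are comparable.
letters_per_10k = {
    state: letters_by_state[state] / census_by_state[state] * 10000
    for state in letters_by_state
}
print(letters_per_10k)
```

The join-on-a-shared-key step is the essential move: once both datasets are indexed by the same geographic unit, any census variable can be brought alongside the corpus-derived counts.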
Andrea Rota (Technical Lead and Data Scientist, Pattrn)
- Origin of the platform: the Gaza platform - documenting atrocities of war, humanitarian and environmental crises
- "improving the global understanding of human evil"
- Not a data analysis tool - for visualisation and exploration
- Data in google sheets (no setup needed)
- Web-based editor to submit/approve new event data
- Information and computational politics - Actor Network Theory - network of human and non-human actors - how to cope with loss
- Pattrn platform for sharing of knowledge, data, tools and research, not for profit
- Computational agency - what are we trading in exchange for short-term convenience?
- "How to protect the future web from its founders' own frailty" Cory Doctorow 2016
- Issues with private data backends e.g. dependency on cloud proprietary systems
- Computational capacity - where do we run code? Computation is cheap, managing computation isn't easy
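The "data in Google Sheets, no setup needed" backend can be sketched in a few lines. The URL shape assumes a sheet that has been published to the web as CSV; `<SHEET_ID>`, the column names, and the sample row below are all placeholders, not Pattrn's actual schema:

```python
import csv
import io
import urllib.request

def parse_events(csv_text: str) -> list:
    """Parse CSV rows (one event per row) into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def load_events(sheet_id: str) -> list:
    """Fetch a Google Sheet published to the web as CSV and parse it.

    Assumes the sheet has been shared publicly; <SHEET_ID> is a
    placeholder for the sheet's document ID.
    """
    url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"
    with urllib.request.urlopen(url) as resp:
        return parse_events(resp.read().decode("utf-8"))

# Example with inline data standing in for a fetched sheet:
sample = "date,location,description\n2016-01-05,Gaza,example event\n"
events = parse_events(sample)
print(events[0]["location"])  # Gaza
```

Treating a spreadsheet as the backend is exactly the trade-off discussed above: zero setup and easy editing for contributors, in exchange for a dependency on a proprietary cloud system.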
Alan Smith (Data Visualisation Editor, Financial Times)
- Gave a lovely example of bad chart published in the Times, and how it should have been presented
- Visuals need to carry the story
- Avoid chart junk!
- Good example of taking an academic chart and reformatting it to make the story clearer
- Graphics have impact on accompanying copy
- Opportunity to "start with the chart"
- Self-contained = good for social media sharing
- Fewer charts, but better
- Content should adapt to different platforms
- The Chart Doctor - monthly column in the FT
- Visualisation has a grammar and a vocabulary, it needs to be read, like written text
Scott Hale (Data Scientist, Oxford Internet Institute, University of Oxford)
- Making existing tools easy to use, online interfaces to move from data file to visualisation
- Key: make it easy
- Plugin for Gephi to export data as a JavaScript plugin for a website
- L2project.org - compiles straight to JavaScript - write code once - attach tables/plots to an HTML element. Interactive environment that can go straight into an HTML page
Alejandra Gonzalez-Beltran (Research Lecturer, Oxford e-Research Centre)
- All about Scientific Data journal
- Paper on survey about reproducibility - "More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments."
- FAIR principles
- isaexplorer to find and filter data descriptor documents
Philippa Matthews (Honorary Research Fellow, Nuffield Department of Medicine)
- Work is accessible if you know where to look
- Lots of researcher profiles on lots of different places - LinkedIn, ResearchFish, ORCID,...
- Times for publication are long
- Spotted minor error with data in a supplementary data file - couldn't correct it
- Want to be able to share things better - especially entering dialogue with patients and research participants
- Want to publish a database of HBV epitopes - publish as a peer-reviewed journal article, but journals are wary of publishing a live resource
- My response to this was to query the underlying assumption that a database needs to be published like a paper - again a casualty of the "papers are the only true academic output" meme.
- Public engagement - dynamic and engaging rather than static images e.g. Tropical medicine sketchbook