Men with printing press, circa 1930s by Seattle Municipal Archives, on Flickr
Ruth Wilson (Nature Publishing Group) set the scene for us with an excellent keynote talk, which led to some very spirited discussion, both after the talk and down the bar before dinner. I scribbled down three and a half pages of notes, so I'm not going to transcribe them all (that would be silly). Instead, I'll aim to pick out the key points as I understood them. If it's a case of tl;dr, skip down to the end for the talk's conclusions and you'll get the gist.
- NPG's main drivers for their interest in data publication are ensuring the transparency of the scientific process and speeding that process up.
- Data needs to be: available, findable, interpretable, re-usable and citeable.
The Data Publication Pyramid (http://www.alliancepermanentaccess.org/wp-content/uploads/downloads/2011/11/ODE-ReportOnIntegrationOfDataAndPublications-1_1.pdf)
- Increasing amounts of information are integral to the article (and even more is supplementary). How can we link to data when there is no repository to serve it?
- Interactive data is becoming important: things like 3D structures, re-graphing data, adding or removing traces, downloading the data behind graphs and figures, and geospatial data on maps. These are all being pulled together in projects like Elsevier's Article of the Future.
- Supplementary information has become "a limitless bag of stuff!", often with the data locked in PDFs. It is adversely affecting the review process, putting extra pressure on authors, reviewers and readers. Supplementary information increased by 65% between 2008 and 2011. Sometimes it's only tenuously linked to the article; at other times it's integral to the article but ends up as supplementary information because of stringent journal space restrictions.
- Nature Neuroscience will be trialling a new type of paper from April 2012, where the authors submit one seamless article containing all of the essential information. Editors will then work with the referees and the authors to decide which elements should stay in the paper and which should be treated as supplementary. The aim is to make people think about what's integral to the paper, and to ensure that all the information submitted is peer-reviewed.
- Nature are also investigating an extended online version of articles (in HTML and PDF) that can include up to 14 extra figures or tables.
- Nature Chemistry was shown as an example: it publishes a lot of new compounds, and the synthetic procedures for them, which sit in the supplementary information, are pulled through into the online article in an interactive way.
- Linking and integration between journals and data repositories is important. NPG are looking for bidirectional linking between article and data, and are seeking more serious, interactive integration.
- NPG has a condition that "authors are required to make materials, data and associated protocols promptly available to others without undue qualifications". It also "strongly recommends" data deposit in subject repositories.
- Regarding data publications, the talk acknowledged the call for data to be a first-class scientific object, along with the interest publishers now have in data (as shown by the growing number of fledgling data publications).
- Data papers were described as detailed descriptions of a dataset, with no conclusions, focussing instead on increasing interoperability and reuse. The data should be held in a trusted repository (the definition of "trusted" is still to be decided!), with linking and integration between the paper and the data. Data producers would receive credit through citation, and data papers would also provide attribution and credit for data managers, who might not qualify for authorship of a traditional paper.
- Linking publications and data strengthens the scientific record and improves transparency.
- Funders' policies are a key driver for integrating data and publications.
- Journals can and do influence data deposition
- It's not a one-size-fits-all situation!
- Partnerships are important (institutions, repositories, publishers, researchers, funders), but the roles are not well established, and business models need to be determined.