Citing Bytes - Adventures in Data Citation: IDCC 2011

The SS Great Britain, location of the opening reception

There were some absolutely amazing speakers at IDCC11, and I'd heartily encourage you to go and watch the videos that were made of the event. Below are the take-home messages I scribbled down in my notebook.

[Anything in square brackets and italics is my own comment or thought]

Opening Keynote by Ewan McIntosh (NoTosh)
Ewan started off by challenging us to be problem finders, rather than problem solvers, as that's where the innovations are really made, by finding a problem and then solving it. There's a lot of stuff out there that just doesn't work, because it's not got a problem to solve.

Scientists have to be careful - taking too much time to make sure that the data's correct can mean that we sit on it until it becomes useless. Communication of the data is as important as the data itself.

Even open data isn't really open, because people can't use it. Note that "open" does not mean "free".

Ewan went into a school where the kids were having problems listening and talking. And he got them to put on their very own TEDx event. A load of 7-8 year olds watched a lot of TED talks, and then they presented their own. [The photos from this event were amazing!]

He said that we've got to look at the impact of our data in the real world, and if we're not enthusiastic about what we're doing, no one else will be. Media literacy is also in the eye of the beholder.

He left us with some challenges for how we deal with data and science:
1. Tell a story
2. Create curiosity
3. Create wonder
4. Find a user pain (and solve it)
5. Create a reason to trade data

[I'm very pleased that the last two points are being addressed by the whole data citation thing I've been working on!]

David Lynn (Wellcome Trust)
The Wellcome Trust has a data management and sharing policy that was published in January 2007. In it, researchers are required to maximise access to data and produce a data management plan, while the Trust commits to meet the costs of data sharing.

David's key challenges for data sharing were:

Infrastructure
Cultural (including incentives and recognition)
Technical
Professional (including training and career development of data specialists [hear hear!])
Ethical

Jeff Haywood (University of Edinburgh)

The University's mission: the creation, dissemination and curation of knowledge.

For example, the Tobar an Dualchais site hosts an archive of video, audio, text and images of Scottish songs and stories from the 1930s on.

But to do data management, there need to be incentives, something of value for researchers at every level.

Herding cats is easy - put fish at the end of the room where you want them to go!

Internal pressure from researchers came first. They wanted storage, which is a different problem from research data management.

Edinburgh's policy is that responsibility for research data management lies primarily with the PIs. New research proposals have to be accompanied by data management plans. The university will archive stuff that is important, and that funders/other repositories won't/can't.

One of their solutions is Dropbox-like storage, which is also easily accessible from off-site and for collaborators.

Andrew Charlesworth (University of Bristol)

Focusing on the legal aspects of data.

People are interested in the workflows/processes/methodologies in science as well as the data.

There are legal implications of releasing data, including data protection, confidentiality, IPR, etc.

Leaving safe storage to researchers over long periods of time is problematic: people leave, technology changes, and questions come up about security for personal data, FOI requests, and data deletion/ownership.

Most legal and ethical problems arise because of:

lack of control (ownership)
lack of metadata
poor understanding of legal/ethical issues
not adjusting policies to new circumstances
lack of sanction (where do consequences of data loss/breach/misuse fall?)

We can't just open data, we have to put it into context.

We want to avoid undue legalisation, so use risk assessments rather than blanket rules.

Institutions and researchers should be prepared for FOI requests.

"Avoiding catching today's hot potatoes with the oven gloves of yesterday."

Mark Hahnel (FigShare)

"Scientists are egomaniacs...but it's not their fault."

We could leverage altmetrics on top of normal metrics to get extra information.

The new FigShare website will be released in January. Datasets on it are released under CC0, while everything else is CC-BY. Stuff put on the FigShare site can be cited using a DOI.

Filesets are anything that has more than one file in it.

Victoria Stodden (Columbia University)

Talking about reproducible research

"Without code, you don't have data." Open code is part of open data. Reproducibility scopes what to share and how.

[I got a bit confused during her talk, until I realised that code doesn't just mean computer code, but all the workflows associated with producing a scientific result]

Scientific culture should be made so that scientific knowledge doesn't dissipate. Reproducibility requires tools, infrastructure and incentives [and in the case of observational data, a time machine]

Many deep intellectual contributions are only captured in code - hence it's difficult to access these implementations without the code.

Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124

Heather Piwowar (DataONE)
Science is based on "standing on the shoulders of giants" - but "building broad shoulders is hard work" and it doesn't help you become top dog.

Researchers overwhelmingly agree that sharing data is the right thing to do and that they'll get more citations.

We need to facilitate the deep recognition of the labour of dataset creation, and encourage researchers to have CV sections for data and code.

There is a place for quick and dirty solutions.

We have a big problem in that citation info is often behind paywalls - we need open bibliography. Moreover, we need open access to full text, as a citation alone doesn't tell us if the dataset was critiqued or not. We also need access to other metrics, like repository download stats.

Call to action!

Raise our expectations about what we can mash up, and about our roles
Raise our voices
Get excited and make things! [I like this one!]

A future where what kind of impact something makes is as important as how much impact it makes.

[Heather very kindly has made all of her presentation notes available on her blog.]

Friday, 16 December 2011

IDCC 2011 - notes from day 1 plenary talks
