Friday, 17 February 2023

Do AIs Hallucinate Electric Sheep? Part 2: Creativity and the Right to Make Imperfect Things

I am a creative person. I write fiction and poetry for fun, I make (naïve) art and a lot of crafts, and it is a big part of how I view myself in the world. 

I am also a consumer of other people's creativity, whether that's books, graphic novels, TV, music, film, or even the random little things that people do in this world, like sticking googly eyes in unexpected places.

Googly eyes on the cover of the USB charging point in a bus

Being creative is not easy, but not in the way that you might expect. I am lucky in that I don't lack creative ideas at all, and I have enough experience both to implement them and not to worry too much when they don't work out as I expect.

The thing that blocks my creative practice the most is judgement. In a world where we can hear professional musicians at the touch of a button, or see media made with vast amounts of talent and resources streamed into our devices any time we want, it can be really hard to look at my own efforts. When I compare my amateur attempts at, for example, lino cutting with those of professional artists who have been doing it for many, many more hours than I have, it's hard not to become discouraged and lose the will to keep trying, keep creating.

Hustle culture adds to this negative pressure on human creativity. In a world where everything has to be monetised, any time spent doing something badly for fun is considered a waste, and therefore shameful. 

Let me be clear - this world, where we can see and hear amazing creations so quickly and easily, is a wonderful one, and I am very happy to be part of it. But I do feel that we need to acknowledge the creative things that are raw, that are rough around the edges, that aren't perfect - and that we as humans are allowed to create such things.

I believe that humans are fundamentally creative. We want to make stuff, and in times of enforced idleness, we will make stuff. During the Covid lockdowns, when people were furloughed, it would have been so easy to default to the oft-held belief that people would just sit on their sofas watching TV all day. This didn't happen. Whether it was musicians recording acoustic albums in their bedrooms, or the fad for sourdough bread, people were making things. Even with all this free time for "self-indulgence", people still found, and made, purpose in their lives.

So, what about AI? What about AI generated art and ChatGPT writing poetry? 

I don't believe that it is possible to be creative in a vacuum - though of course there's no way of testing this. All human creativity is inspired by and builds on what comes before, whether that's how we make our clothes, or how we put paint on things.

AI art generation does this on an industrial scale - it hoovers up vast amounts of training data, i.e. images of artworks, and then uses those artworks to generate images of its own. There are lots of issues with how the training data was collected - copying copyrighted artworks from the web is not ethically or legally sound, but I'm not going to get into that at this time. Some people claim that this is just how human inspiration works - we see things and then we spin ideas off those things to create new things. The difference is just the speed and volume that the AI can manage.

I am not a trained philosopher. I am not a professional creative. So I can only tell you my opinions and thoughts here, rather than being able to generalise more widely.

ChatGPT's ability to write a perfectly metred and rhymed sonnet on a topic you give it makes me feel like my efforts to write sonnets are somehow less. Yes, the AI outputs are trite and lacking in deeper meaning, but to be honest, so are some of my creations. I am not a visual artist as such, but if I were, the AI art bots would induce the same feelings in me. 

This makes me cross. AI and mechanisation were supposed to give us the tools to do boring stuff quickly and easily, so we could spend more time doing fun stuff like art. But it seems like it's the other way around - we're left with the boring drudgery while the AI is pushing out images and words at a rate that no human would ever be able to manage. What's worse - this AI art is quicker and cheaper than a human artist, so of course in a world where costs need to be cut to the bone to maximise profit, it's the human artist who's going to be ditched.

And then where will that leave us? Starved for non-AI generated media and new content? AI can't use art to explore what it means to be human, because it isn't human, and is only basing its output on statistical transformations of its training data, what has been done before. 

So, what am I arguing for? A world that allows and encourages us to be creative, and rewards us for our efforts, even if our painting is wonky, or our fiction is derivative. A world where we have the time to experience amazing art done by amazing people, and become inspired by it to make our own creations. A world where the effort needed to create good art and music and words is not invisible, so we can all see and know and appreciate just how much effort it takes to create something.

Where does AI fit in? Let it be a tool, a prompt generator to spark ideas. Let it be a starting point for inspiration, not the ending point of creative endeavour. Let it give us the inverse of ourselves, so that by seeing what it is, we can understand what we are. 

Most importantly, let us play. Let us be creative, and let us acknowledge our creativity, in whatever ways it manifests. Because that is a huge part of what makes us human.


Do AIs Hallucinate Electric Sheep? Part 1: Context

AI seems to be everywhere nowadays, whether it's in the "creation" of new artworks, the generation of deepfakes, or even ChatGPT's ability to produce confident text in a variety of formats on any subject you'd care to mention.

So where does that leave us, the humans whose job or inclination is to be creative, to bring together different pieces of information to create something new, or to provide a new insight into something that's already known?

Firstly, let's start with what ChatGPT (and other chatbot large language models) is not: it is not human. It is not an expert, and in many cases it can be absolutely wrong on fundamental bits of knowledge. It is a model that takes the proximity of words to each other in a given corpus (for example, a load of crawled webpages, or Wikipedia) and encodes those relationships as a set of numbers. When it's called on to answer a question, what it does is string words together in a way that is determined by those numbers. It's a statistical process that produces readable text in a user-friendly way.
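To make that concrete, here's a toy next-word model in a dozen lines of Python. It's nothing like the real architecture (which uses neural networks and vastly more data), but it gives the flavour of "relationships between words encoded as numbers" being used to string text together:

    import random
    from collections import Counter, defaultdict

    # A tiny corpus standing in for "a load of crawled webpages".
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Count which word follows which - a crude version of encoding
    # word-proximity relationships as a set of numbers.
    follows = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        follows[current][nxt] += 1

    # "Answer" by repeatedly sampling a statistically plausible next word.
    word, output = "the", ["the"]
    for _ in range(6):
        counts = follows[word]
        word = random.choices(list(counts), weights=counts.values())[0]
        output.append(word)

    print(" ".join(output))  # e.g. "the dog sat on the mat ."

There's no understanding anywhere in that, just counting and sampling - which is rather the point.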

Alright, it's an interesting computer science problem to work on, with some cool applications. But why are people collectively freaking out about it, now that it's freely open and available for anyone to use?

My answer to this is culture. We, as humans, are so used to accepting that "computer says x" is the right answer, because it's instilled in us from an early age in schools. Computers use maths, and maths always has a right answer and a wrong answer. Therefore, if computers do arithmetic perfectly (which they don't, but that's a digression), then the answers they give must always be correct.
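(A quick demonstration of that digression, in Python:)

    # Computers store most decimal fractions in binary floating point,
    # so even very simple arithmetic picks up tiny errors.
    print(0.1 + 0.2)         # 0.30000000000000004
    print(0.1 + 0.2 == 0.3)  # False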

Combine this with a deterministic view of the world from school-taught science, and we can easily wind up thinking that computers can model the world around us to a level of precision that we don't need to question. "Computer says X" is always the correct answer.

Even computer scientists buy into this mode of thinking sometimes - as the rapidly growing field of AI and data science ethics can show you. Computers may not be biased in themselves, but they are very, very good at replicating and amplifying any biases in their datasets. And history is full of bias; there's no denying that.
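Here's a deliberately crude sketch of how that amplification happens (an invented example of my own, not any real system) - a tendency in the data becomes a rule in the output:

    from collections import Counter

    # Invented, skewed "historical" data: 70% of past outcomes favour group A.
    history = ["A"] * 70 + ["B"] * 30

    # The crudest possible model: always predict the majority outcome.
    majority = Counter(history).most_common(1)[0][0]
    predictions = [majority for _ in range(100)]

    print(Counter(history))      # Counter({'A': 70, 'B': 30}) - biased
    print(Counter(predictions))  # Counter({'A': 100})         - more biased

Real models are far more sophisticated, but the dynamic is similar: optimising for accuracy against biased data rewards leaning into the bias.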

For some AI models, there's also the well-known issue of hallucination - OpenAI acknowledges in its list of ChatGPT's limitations that "ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers." These answers, or hallucinations, have no basis in the data the AI was trained on, but the chatbot can deliver them with the same certainty as all its other answers, even going so far as to argue for the validity of the hallucination when challenged on it. Determining which answers are accurate and which are hallucinations can be very difficult, especially for non-experts in the field of the question being asked. Which, to be fair, is likely to be the vast majority of users.

So we have computers that are not always right, combined with a strong tendency to be convincing, and that means people are worried about floods of misinformation and about misuse in a wide range of contexts: getting a chatbot to write school essays for you, making excuses about why you're filing your taxes late, or explaining how to break into a house in rap form.

From a research integrity point of view, there have been documented examples of ChatGPT including references in academic answers where the cited works simply do not exist.
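One small mechanical defence: real academic references usually carry a DOI, and DOIs can be checked against a registry. A minimal sketch using the public Crossref API (the first DOI below is the Schwab et al. paper cited later in this blog; the second is one I made up for illustration):

    import requests

    def doi_exists(doi: str) -> bool:
        """Return True if the DOI is registered with Crossref."""
        response = requests.get(f"https://api.crossref.org/works/{doi}")
        return response.status_code == 200

    print(doi_exists("10.1371/journal.pcbi.1010139"))  # True - a real paper
    print(doi_exists("10.1234/entirely.made.up.doi"))  # False - no such record

Of course, this only catches references that don't exist at all - it says nothing about real references attached to claims they don't support.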

All this is enough to have universities, academic publishers and knowledge repositories coming out with restrictions on the use of ChatGPT, and in some cases outright bans.

Where do we go from here? The chatbot is very firmly out of the bag now, and I am sure that the problems it has surfaced are already being worked on, one way or another. But what does that mean for the future of research and, more fundamentally, for the future of human creativity?

I don't know, but in my next post I'm going to explore human creativity, and what it means for us when an AI can easily do things that we find difficult but that are ultimately fundamental to our sense of self as human beings.


Thursday, 16 February 2023

User stories for Research Practice Training - where to draw the boundaries?

In an attempt to figure out what the core aspects of research practice training are, versus the domain-specific ones, I had a think and wrote a series of user stories to try to tease out commonalities. These are not meant to be accurate or complete, but I’m hoping they’ll be a good start for conversation. The key question for each user story is:

What does this researcher need to know to do their research effectively, ethically, transparently and verifiably?

The assumption is that everyone already has their funding sorted.

Researcher in particle physics needs to know:

  • How to access the data they need to use
  • How to manage the data
  • How to visualise and analyse the data (Scientific computing, high performance computing)
  • The background and metadata of the data collection process
  • The current state of the art in their field
  • How to communicate their research results (conferences/publications)
  • Health and safety for working with experimental machinery
  •  …?

Researcher in social sciences working with asylum seekers

  • How to gain ethical approval for their work
  • How to formulate their work so that it causes no harm
  • How to manage and safely store their data, including dealing with the privacy and dignity of their contacts
  • How to keep themselves and their contacts safe (physically and psychologically)
  • How to communicate their research results (conferences/publications)
  • How to influence policy and engage with non-academics as stakeholders
  • How to deal with conscious and unconscious bias
  •  …?

Researcher in AI-driven drug design

  • How to access and understand the databases that feed into the system
  • How to troubleshoot and understand the system outputs
  • Health and safety in the lab
  • Data and code management
  • How to communicate their research results (conferences/publications)
  •  …?

Researcher in ancient history

  • How to access and cite primary sources
  • Archive access and handling of fragile artefacts
  • How to store and manage their data
  • How to communicate their research results (conferences/publications)
  • How to respect the artefact’s cultural background, bearing in mind it might have been taken from another culture during a period of colonialism
  • The context around the artefact, and its past interpretations, bearing in mind historical biases
  •  …?

Researcher in clinical trials

  • Effective clinical good practice
  • How to deal with conscious and unconscious bias
  • Ethical approvals
  • Double blind experimental design
  • Human/animal experiment good practice
  • How to communicate with trial participants/other non-academic stakeholders
  • How to communicate their research results (conferences/publications)
  • …?

Researcher in modern arts

  • How to access and use their resources
  • How to manage and keep records of their observations/practices
  • How to communicate their research results (conferences/publications/exhibitions)
  • Stakeholder engagement
  • Research ethics and integrity
  • …?

Common topics, by stage of research

Beginning
  • Ethical approvals
  • Research integrity
  • Current state of the art in the field, including community standards
  • Safe working practices (physical and psychological health)

Middle
  • Accessing, managing, analysing and using data/artefacts/physical resources
  • Documenting their workflows/processes/practices – research records
  • Stakeholder management
  • Peer review and how to do it

End
  • Communicating research results (stakeholders, policy makers, general public)

Unpicking those topics a bit:

General
  • Research integrity and ethics
  • Data management
  • Stakeholder management
  • Research misuse
  • Peer review
  • Results communication (journals, presentations)

Domain specific
  • Current state of the art
  • Safe working practices
  • Workflow/practice recording

Note that safe working practices are mostly covered under the general health and safety training that everyone should go through when working for an employer, so even though there are some aspects that are very domain specific (radiation training, safeguarding, psychological safety), I haven't really included them in my further thinking about research practice.

When drawing the boundaries around what is research practice (i.e. what we want to train people in to help them do better research) versus what are techniques/tools/practices commonly used by researchers, I tend to think in terms of "is it something that only a researcher would do as part of their work?" It's always going to be a fuzzy boundary, and somewhat artificial, but we need to draw the line of scope somewhere. So that's why I'm not really thinking about health and safety, or project management, as core research practice topics, at this point in time anyway.


Thursday, 2 February 2023

"Good Research Practice" – what does that mean?

I’ve been thinking a lot about Research Practice in the past months, from a variety of perspectives. Of course, a lot has been said and written about it, and I’ve been doing a lot of listening and reading too. But trying to synthesise all I’ve learned over my career with all the new things I’ve learned recently, I find myself in a bit of a muddle.

In these sorts of cases, I’ve found that going back to first principles can be really useful to ground my thinking. So, on that basis, what do we mean when we say something is “good research practice”? “Good” I think we can (hopefully) all agree on (or at least the definition of “good” can be considered out of scope for the moment) – so that leaves the question of what research practice is, because we need to know what it is before we can do it (or is that a bit chicken and egg?).

To Google! A search for “research practice” (in my geographical area at least) returns as its first result the UKRI policy on the governance of good research practice (GRP). This is an interesting policy document in that it clearly lays out the responsibilities of the various parties (funders, institutions, researchers) when it comes to research integrity and research misconduct, but it doesn’t say much about what good research practice actually is.

Now, to be fair, this may very well be because different research domains do research in radically different ways, so it’s easier to list what counts as research misconduct than it is to say what good research practice actually looks like. (The UKRI GRP policy has Appendix 2 devoted to defining what research misconduct is.)

The UKRI Good Research Resource Hub is another interesting site which gives guidance on important research topics including open research, research integrity, equality, diversity and inclusion, human research participants and many, many more. But it still doesn’t give you a recipe or definition of what research practice is.

A bit further down the search results we find Schwab et al. (2022), which does what it says in the title and provides “Ten simple rules for good research practice”. These are then broken up into three sections according to the stage of the research, whether that’s planning, execution, or reporting.

These rules are (Fig 1 of the paper):

Planning     

  • Specify your research question
  • Write and register a study protocol
  • Justify your sample size
  • Write a data management plan
  • Reduce bias

Execution

  • Avoid questionable research practices
  • Be cautious with interpretations of statistical significance
  • Make your research open

Reporting

  • Report all findings
  • Follow reporting guidelines

The authors then discuss these headings in more detail in the text.

So, that’s all sorted then, right?

Well, remember how I mentioned different domains earlier? Yes, it’s not quite as simple as that. The terms used in the Ten Rules above aren’t universal across all scientific research, let alone across all possible research (which includes the humanities and arts as well).

Let’s take a stab at generalising them, shall we?

Planning

  • Decide what hypothesis you want to test/decide what information you want to find
  • Decide your methodology, taking into account domain and community standards
  • Will the data you’re collecting be enough for you to confirm your results with an appropriate degree of certainty? (How about error margins and statistical significance? See the sketch after this list.)
  •  Write your data management plan
  • Are there any sources of bias in your data or your methods that need to be compensated for?
    • Does your research have the potential to cause harm? If so, is it worth it? Can you mitigate the risks of that harm occurring?
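On the “enough data” question above: for quantitative studies, a back-of-envelope power calculation makes it concrete. A minimal sketch using the standard normal-approximation formula for comparing two group means (the effect size, significance level and power below are conventional placeholder values, not recommendations):

    import math
    from scipy.stats import norm

    def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
        """n per group = 2 * (z_(1-alpha/2) + z_power)^2 / effect_size^2"""
        z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance threshold
        z_power = norm.ppf(power)          # desired statistical power
        return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

    # A "medium" standardised effect (Cohen's d = 0.5), 5% significance, 80% power:
    print(sample_size_per_group(0.5))  # 63 participants per group

The exact formula varies by study design, which is precisely why this belongs in the planning stage.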

Execution

  • Do your research according to high standards of research integrity
    • Be the best researcher with the highest integrity you can be
  •  Be cautious with your interpretations
    • Are there any other reasons why you might have got the results you did?
  • Make your research as open as possible

Reporting

  • Report all findings, and all the details of the research, the good, the bad and the ugly
    • Try to make your research and component parts (data, code, workflows, etc.) FAIR
  • Follow community standards and practices for reporting
    • If possible, try to make those standards more open

Obviously, these are just preliminary thoughts on a subject with a lot of complexity, but hopefully they’re enough to get the brain cells working on this topic. And as always, figuring out where something isn’t quite right can be really helpful for determining what really works the best, given the circumstances.

~~~

Wellcome Sanger Institute (2021) Good Research Practice Guidelines v4. https://www.sanger.ac.uk/wp-content/uploads/Good-Research-Practice-Guidelines-v4-Feb-2021.pdf

Schwab S, Janiaud P, Dayan M, Amrhein V, Panczak R, Palagi PM, et al. (2022) Ten simple rules for good research practice. PLoS Comput Biol 18(6): e1010139. https://doi.org/10.1371/journal.pcbi.1010139