Open Data?

Following on from the recent posts on Open Access Publishing, I wanted to pick up on something else that appeared recently in a similar vein: the call by the Royal Society for Open Access Datasets in their report Science as an Open Enterprise: Open Data for Open Science.  The report argues that open inquiry is at the heart of research and that ‘publication of scientific theories – and of experimental and observational data on which they are based – permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge.  Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge.’  These are very fine words and apply to all areas of research, scientific or not, but they also go against an inherent element of human nature, epitomised by the school kid crouched over their exercise book in case their neighbour should steal an advantage!  Protecting one’s sources and one’s data is a natural instinct in the competitive research culture in which we live.  The report argues for a change of culture in which we open up our data to other scientists and to the public at large, and contends that by being more open we can maximise the value of that data for the research community and, crucially, for society as a whole.

On a personal level, as someone who has had a line of research and a field site stolen from them by petty academic politics and rivalries, I find the ability to gain access to data held by others very appealing, especially when you have something valuable to add.  It is why in my current research grant I made a commitment, which I will honour later this year, to make all my data (thousands of digital footprint scans from sites across the world) available via a project website.  The pleasure in doing so lies in knowing that the data will be used by others to explore new ideas and agendas in the future, long after I have moved on to other topics.  This ideal is not without its challenges, however.  Issues of data security and accessibility are considerable, as is the need to future-proof such archives against changes in technology.  These are the challenges faced by any long-term archiving project.  As a real-world illustration of these challenges, I draw your attention to a local example.  Around ten years ago Bournemouth University was a partner in a Heritage Lottery project entitled the Dorset Coastal Digital Archive, a resource of digitised and geo-rectified maps and charts from along the Dorset Coast supported by a range of learning packages.  This web-based archive was hosted by the University and was recently the subject of a malicious cyber-attack that corrupted both the site and the data backup; as a result the site has had to be taken down for the time being while a solution is sought.  I also know from personal experience the difficulty and frustration involved in extracting raw data, deposited in national data repositories, that is linked to publications several decades old.  But despite these issues the benefits are clear, or at least they are to me.

There has already been some discussion here at BU in the Research & Knowledge Exchange Forum about whether to establish our own data repository similar to BURO, and while this debate has yet to conclude, it is an idea that would be in line with the proposals from the Royal Society.

The working group at the Royal Society behind the report was chaired by Professor Geoffrey Boulton, who just happens to be my former PhD supervisor, but notwithstanding this association it is a really fine document and makes six clear recommendations:

  • Scientists need to be more open with their data, among themselves and with the public and the media.
  • Greater recognition needs to be given to the value of data gathering, analysis and communication, for example through recognition in future research assessment exercises or in the promotion criteria for academics.
  • There needs to be a drive towards common standards for sharing information so that it can be accessed by all.
  • They argue that publishing data in a reusable form to support findings should be a mandatory part of, or a prerequisite for, publication and a requirement of all the main funding bodies.
  • They suggest that we need more experts in managing and supporting the use of digital data to maximise the potential that it offers to researchers and society as a whole.
  • Finally, they recognise that new software tools need to be developed to analyse the growing amount of data being gathered.

Interestingly, it is the reference to the role of datasets in research assessments such as the REF that was picked up in the news, particularly by the THE.  According to this article, if REF panels were to treat datasets on a par with publications there would be a huge revolution in open data access.  Notably, the REF criteria do not currently exclude datasets, and 132 datasets were evaluated as part of RAE 2008.  This is an interesting and important idea, and it is no different from the evaluation of artefacts or similar outputs.  Whether we will see datasets more explicitly mentioned in future exercises is something to watch with interest.  As an aside, I remember a conversation with my former supervisor about defining key research questions; his reply was that there are lots of questions but very little good data available!  The other observation I would make here is that our own Smart Technology Research Centre is leading the way in the production of new software tools to deal with the ever-growing amount of data available.

So, in conclusion, I would encourage you to read the Royal Society report, and I would welcome your thoughts and suggestions about how we could incorporate these ideas into our research strategy here at BU.