On Thursday 4 July CsJCC and NRG supported a one-day workshop that demonstrated and explored the use of databases in humanities research. The day had two parts: the first introduced participants to the possibilities of this approach, and the second discussed potential collaborative projects. It was attended by FMC colleagues and doctoral students researching a range of subjects (including English, Marketing, Advertising, Law, Journalism, Computer Animation, and Radio), and by two external scholars.
Ian Stephenson (Senior Lecturer in Computer Animation and Faculty Data Champion) led the morning session, a gentle introduction to using Structured Query Language (SQL) to ask complex questions across multiple existing data tables. Research generates data, either during primary research or as metadata when we annotate and organise existing media. While simple notetaking can work at first, projects usually need more structure as they grow and evolve. Ian demonstrated how free database software such as Postgres can easily be installed on a laptop, offering much of the power and flexibility of the systems used in commercial data centres and the capacity to store and organise very large amounts of data. Data can then be stored securely, organised well, and shared between researchers, allowing us to ask new questions of it beyond the scope of the initial investigation.
This workshop grew out of Ian’s development of Julia Round’s database of Misty stories (available at www.juliaround.com/misty). Julia’s project explores the nature of the stories in the British girls’ comic Misty, and also contains supporting information on their creators, origins, and so forth (courtesy of online communities of scholars and fans). Her online database is searchable and helps interested readers find this information, but by its nature it cannot answer more complicated questions. Ian restructured this data as a relational database: a series of interlinked tables, each focused on a subject such as stories (type, length, themes, characters, etc.), people (artist, writer, letterer, colourist, editor, etc.), and publication details (title, co-title, issue date, price, cover image, tagline, free gift, etc.).
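A minimal sketch of such interlinked tables follows, using Python's built-in sqlite3 module as a lightweight stand-in for Postgres. The table names, columns and placeholder rows (Title A, Story 1, Writer X and so on) are hypothetical simplifications for illustration, not the actual structure or contents of the Misty database. The final query joins all four tables to list who worked on which story, in what role, and when it was published.

```python
import sqlite3

# HYPOTHETICAL, simplified schema for illustration only; a real comics
# database would have more tables and columns (themes, taglines, gifts...).
SCHEMA = """
CREATE TABLE issues (
    issue_id   INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    issue_date TEXT NOT NULL          -- ISO date, e.g. '1978-02-04'
);
CREATE TABLE stories (
    story_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    pages      INTEGER,
    issue_id   INTEGER NOT NULL REFERENCES issues(issue_id)
);
CREATE TABLE people (
    person_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- Link table: one row per person per role per story.
CREATE TABLE credits (
    story_id   INTEGER NOT NULL REFERENCES stories(story_id),
    person_id  INTEGER NOT NULL REFERENCES people(person_id),
    role       TEXT NOT NULL          -- 'writer', 'artist', 'letterer', ...
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO issues VALUES (?, ?, ?)",
                     [(1, "Title A", "1978-02-04"), (2, "Title A", "1978-02-11")])
    conn.executemany("INSERT INTO stories VALUES (?, ?, ?, ?)",
                     [(1, "Story 1", 3, 1), (2, "Story 2", 4, 2)])
    conn.executemany("INSERT INTO people VALUES (?, ?)",
                     [(1, "Writer X"), (2, "Artist Y")])
    conn.executemany("INSERT INTO credits VALUES (?, ?, ?)",
                     [(1, 1, "writer"), (1, 2, "artist"), (2, 2, "artist")])
    return conn


def credits_with_dates(conn: sqlite3.Connection) -> list:
    """Join all four tables: who worked on which story, in what role, and when."""
    return conn.execute("""
        SELECT p.name, c.role, s.name, i.issue_date
        FROM credits AS c
        JOIN people  AS p ON p.person_id = c.person_id
        JOIN stories AS s ON s.story_id  = c.story_id
        JOIN issues  AS i ON i.issue_id  = s.issue_id
        ORDER BY i.issue_date, s.name, c.role
    """).fetchall()


if __name__ == "__main__":
    for row in credits_with_dates(build_demo_db()):
        print(row)
```

Keeping people and credits in separate tables means each name is stored once, while the credits table can record any number of roles per story. The SQL itself is fairly standard and should carry over to Postgres with little or no change.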
SQL thus enables us to ask questions that link all of this information. For example:
– In what months were new titles launched?
– When did price hikes take place and how does this look if adjusted for inflation?
– How long did merged comic titles usually last?
– Were boys’ and girls’ titles different in terms of pricing, story length, or other factors amenable to numerical analysis?
– Which artists’ work appeared on the covers most frequently?
– Which artists’ work appeared in the internal colour (centre) pages most frequently?
– Which writers and artists most frequently worked together?
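Two of the questions above (launch months, and writer/artist partnerships) can be sketched as SQL queries, again run through Python's sqlite3 module for convenience. The two tables here are a hypothetical, cut-down version of the kind described earlier, filled with placeholder rows rather than real Misty data; a fuller schema would link credits to a separate people table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with HYPOTHETICAL tables and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE titles  (title TEXT PRIMARY KEY, launch_date TEXT NOT NULL);
        CREATE TABLE credits (story_id INTEGER, person TEXT, role TEXT);
    """)
    conn.executemany("INSERT INTO titles VALUES (?, ?)", [
        ("Title A", "1978-02-04"), ("Title B", "1979-09-01"), ("Title C", "1981-02-14"),
    ])
    conn.executemany("INSERT INTO credits VALUES (?, ?, ?)", [
        (1, "Writer X", "writer"), (1, "Artist Y", "artist"),
        (2, "Writer X", "writer"), (2, "Artist Y", "artist"),
        (3, "Writer X", "writer"), (3, "Artist Z", "artist"),
    ])
    return conn


# In what months were new titles launched? (SQLite's strftime extracts the month.)
LAUNCH_MONTHS = """
    SELECT strftime('%m', launch_date) AS month, COUNT(*) AS launches
    FROM titles
    GROUP BY month
    ORDER BY launches DESC, month
"""

# Which writers and artists most frequently worked together?
# A self-join on credits pairs each story's writer with its artist.
PARTNERSHIPS = """
    SELECT w.person AS writer, a.person AS artist, COUNT(*) AS stories
    FROM credits AS w
    JOIN credits AS a ON a.story_id = w.story_id
    WHERE w.role = 'writer' AND a.role = 'artist'
    GROUP BY writer, artist
    ORDER BY stories DESC, writer, artist
"""

if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(LAUNCH_MONTHS).fetchall())
    print(conn.execute(PARTNERSHIPS).fetchall())
```

For these placeholder rows the script prints `[('02', 2), ('09', 1)]` and `[('Writer X', 'Artist Y', 2), ('Writer X', 'Artist Z', 1)]`. On Postgres, assuming `launch_date` is stored as a `DATE`, `EXTRACT(MONTH FROM launch_date)` would take the place of SQLite's `strftime`.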
In his brief demonstration, Ian presented statistics on several topics. These included patterns in story crossover points (where serials overlapped); price rises in comics, which before the 1980s were not significant once adjusted for inflation and compared with newspaper prices; and the finding that almost all new titles were launched in February or at the end of the summer. Participants saw how restructuring simple spreadsheet data as a relational database allowed it to be expanded, interrogated and repurposed. Sharing such datasets can extend the boundaries of existing research projects and take interdisciplinary and collaborative work to new levels.
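Adjusting prices for inflation amounts to rescaling each nominal price by a price index: a price from year $t$ expressed in base-year money is the nominal price multiplied by the index for the base year and divided by the index for year $t$. A minimal sketch, assuming the user supplies yearly index values (for example a consumer price index); the function names and the round numbers in the usage example are hypothetical and are not historical comic prices or index figures.

```python
import math


def adjust_for_inflation(prices: dict, index: dict, base_year: int) -> dict:
    """Express each year's nominal price in base_year money.

    real = nominal * index[base_year] / index[year]
    `prices` and `index` are HYPOTHETICAL inputs keyed by year.
    """
    base = index[base_year]
    return {year: price * base / index[year] for year, price in prices.items()}


def real_change_percent(prices: dict, index: dict, start: int, end: int) -> float:
    """Percentage change in the inflation-adjusted price from start to end."""
    real = adjust_for_inflation(prices, index, base_year=start)
    return (real[end] / real[start] - 1) * 100


if __name__ == "__main__":
    # Invented round numbers: prices in pence, index rising 50% over the period.
    index = {1975: 100.0, 1980: 150.0}
    print(adjust_for_inflation({1975: 8, 1980: 12}, index, base_year=1980))
    print(real_change_percent({1975: 8, 1980: 15}, index, 1975, 1980))
```

In the first call the nominal rise from 8p to 12p exactly matches the 50% rise in the index, so both prices come out at 12.0 in 1980 money: a rise that vanishes once inflation is taken into account. In the second, a rise to 15p is a 25% increase in real terms.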
The afternoon session invited interested participants to discuss how we might collaborate on developing such a project. Discussion began with a review of existing resources (sites such as the Grand Comics Database, Jinty blog, Girls’ Comics of Yesterday, Great News for All Readers, and Down the Tubes) and of what these sites do and do not offer. The group felt that even the most inclusive sites, such as the GCD, do not allow complex searches, and that most sites and blogs are set up with a single aim in mind. We therefore agreed there was a demonstrable need for a live, shared resource that would give students, researchers and fans access to a much wider dataset, along with the ability to ask complex, interlinked questions of it.
We were lucky enough to have a brief discussion with a colleague from the Law Department, who advised us on copyright issues when including quotations and images, and on intellectual property (IP) rights when incorporating data gathered by other people or the templates they created to hold it. Images in particular will need to be kept within private circulation, and database rights will need to be explored further to ensure we have the correct permissions from contributors.
To develop this project, the first step will be to find server space to host the dataset. In the longer term, we will approach the academic and fan communities for the spreadsheet data currently held by individuals, using our existing networks and conference presentations that demonstrate the value of this resource. Later steps, supported by funding bids or other initiatives, will include expanding the database to global coverage and developing tools that allow contributors to add data directly.
The proposed database has clear benefits as a data discovery tool, with a demonstrable need from comics scholars at multiple levels. It will also be valuable as a teaching aid and as a source of primary data for future research outputs.