There’s no such thing as a bad metric.

Lizzie Gadd warns against jumping on ‘bad metrics’ bandwagons without really engaging with the more complex responsible metrics agenda beneath.

An undoubted legacy of the Metric Tide report has been an increased focus on the responsible use of metrics, and along with it a notion of ‘bad metrics’. Indeed, the report itself even recommended awarding an annual ‘Bad Metrics Prize’. This has never been awarded as far as I’m aware, but nominations are still open on its web pages. There has been a lot of focus on responsible metrics recently. The Forum for Responsible Metrics has surveyed UK institutions and is reporting the findings on 8 February in London. DORA has upped its game, appointing a champion to promote its work, and it seems to be regularly retweeting messages that remind us all of its take on what it means to do metrics responsibly. There are also frequent Twitter conversations about the impact of metrics in the upcoming REF. In all of this I see an increasing amount of ‘bad metrics’ bandwagon-hopping. The anti-Journal Impact Factor (JIF) wagon is now full, and its big sister, the “metrics are ruining science” wagon, is taking on supporters at a heady pace.

It’s not a bad thing, this increased awareness of responsible metrics, all these conversations. I’m responsible metrics’ biggest supporter, and a regular slide in my slide-deck shouts ‘metrics can kill people!’. So why am I writing a blog post claiming that there is no such thing as a bad metric? Surely these things can kill people? Well, yes, but guns can also kill people; they just can’t do so unless they’re in the hands of a human. Similarly, metrics aren’t bad in and of themselves; it’s what we do with them that can make them dangerous.

In Yves Gingras’ book, “Bibliometrics and Research Evaluation”, he defines the characteristics of a good indicator as follows:

  • Adequacy of the indicator for the object that it measures
  • Sensitivity to the intrinsic inertia of the object being measured
  • Homogeneity of the dimensions of the indicator.

So, you might have an indicator such as ‘shoe size’, where folks with feet of a certain length get assigned a certain shoe size. No problem there: it’s adequate (length of foot consistently maps onto shoe size); it’s sensitive to the thing it measures (foot grows, shoe size increases accordingly); and it’s homogeneous (one characteristic, length, leads to one indicator, shoe size). However, in research evaluation we struggle on all of these counts, because the thing we really want to measure, this elusive, multi-faceted “research quality”, doesn’t have any adequate, sensitive and homogeneous indicators. We need to measure the immeasurable. So we end up making false assumptions about the meanings of our indicators, and then make bad decisions based on those false assumptions. In all of this, it is not the metric that’s at fault; it’s us.
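
To make Gingras’ criteria concrete, here is a minimal Python sketch (entirely illustrative; the function names, numbers and weights are all made up) contrasting an indicator that satisfies all three criteria with a composite ‘research quality’ score that satisfies none of them:

    # Illustrative toy functions only, not real evaluation tools.

    def shoe_size(foot_length_mm: float) -> float:
        """A 'good' indicator in Gingras' sense: adequate (it stands directly for
        the thing it measures), sensitive (it grows when the foot grows) and
        homogeneous (one dimension in, one dimension out)."""
        return round(foot_length_mm / 6.67, 1)  # crude sizing approximation

    def research_quality_score(citations: int, journal_prestige: float,
                               grant_income: float) -> float:
        """A composite 'research quality' score: it mixes unrelated dimensions
        behind one number (not homogeneous), none of which is the thing we
        actually care about (not adequate), and its sensitivity to real quality
        is anyone's guess. The weights are arbitrary."""
        return 0.5 * citations + 0.3 * journal_prestige + 0.2 * grant_income

    print(shoe_size(270))                            # longer foot, bigger size, every time
    print(research_quality_score(40, 8.2, 125000))   # what does this number actually mean?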

In my view, the JIF is the biggest scapegoat of the Responsible Metrics agenda. The JIF is just the average number of cites per paper for a journal, calculated over a two-year window. That’s it. A simple calculation. And as an indicator of the communication effectiveness of a journal for collection development purposes (the reason it was introduced), it served us well. It has simply been misused as an indicator of the quality of individual academics and individual papers. It wasn’t designed for that. This is misuse of a metric, not a bad metric. (Although recent work has suggested that it’s not that bad an indicator for the latter anyway, but that’s not my purpose here.) If the JIF is a bad metric, so is Elsevier’s CiteScore, which is based on EXACTLY the same principle but uses a three-year time window rather than two, a slightly different set of document types and journals, and is freely available.
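
To underline how simple that calculation is, here is a rough Python sketch of the two indicators side by side (simplified: the real definitions also hinge on which document types Clarivate and Elsevier count as ‘citable’, which isn’t modelled here):

    # Simplified journal-level averages; not the official calculations.

    def impact_factor(citations_this_year_to_previous_2_years: int,
                      citable_items_previous_2_years: int) -> float:
        """JIF-style average: citations received this year to items published in
        the previous two years, divided by the items published in those years."""
        return citations_this_year_to_previous_2_years / citable_items_previous_2_years

    def citescore(citations_to_previous_3_years: int,
                  citable_items_previous_3_years: int) -> float:
        """CiteScore-style average: the same principle over a three-year window."""
        return citations_to_previous_3_years / citable_items_previous_3_years

    # The journal gets the number, not any individual paper or author in it:
    print(impact_factor(1500, 300))   # 5.0
    print(citescore(2400, 480))       # 5.0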

I understand why DORA trumpets the misuse of JIFs; it is rife, and there are less imperfect tools for the job. But there are other metrics that DORA doesn’t get in a flap about, like the individual h-index, which are subject to just as much misuse but are actually more damaging. The individual h-index disadvantages certain demographics more than others (women, early-career researchers, anyone with non-standard career lengths); at least the JIF mis-serves everyone equally. And whilst we’re at it, peer review can be an equally inadequate research evaluation tool (which, ironically, metrics have proven). So if we’re to be really fair, we should be campaigning for responsible peer review with as much vigour as we call for responsible metrics.
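
For readers unfamiliar with it, the individual h-index is the largest number h such that a researcher has h papers with at least h citations each. A minimal Python sketch (with made-up citation counts) shows why the index can never exceed a researcher’s paper count, and so tends to reward long, uninterrupted careers:

    def h_index(citation_counts: list[int]) -> int:
        """Largest h such that at least h papers have at least h citations each."""
        h = 0
        for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # Made-up examples: similar citation rates, very different career lengths.
    early_career = [12, 9, 7, 3]                          # 4 papers
    established = [40, 33, 25, 18, 15, 12, 9, 7, 3, 2]    # 10 papers
    print(h_index(early_career))   # 3 (and could never exceed 4, the paper count)
    print(h_index(established))    # 7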

Image: Bumper stickers by Paul van der Werf (CC-BY)

It looks to me like we have moved from a state of ignorance about metrics to a little knowledge. Which, I hear, is a dangerous thing. A little knowledge can lead to a bumper-sticker culture (“I HEART DORA”, anyone? “Ban the JIF”?) which could move us away from, rather than towards, the responsible use of metrics. These concepts are easy to grasp hold of, but they mask a far more complex and challenging set of research evaluation problems that lie beneath. The responsible use of metrics is about more than the avoidance of certain indicators, or signing DORA, or even developing your own bespoke responsible metrics policy (as I’ve said before, this is certainly easier said than done).

The responsible use of metrics requires responsible scientometricians: people who understand that there is really no such thing as a bad metric, but that it is very possible to misuse one; people with a deeper level of understanding of what we are trying to measure, what the systemic effects of measuring it might be, what indicators are available, what their limitations are, where they are appropriate, and how best to triangulate them with peer review. We have good guidance on this in the form of the Leiden Manifesto, the Metric Tide and DORA. However, these are the starting points of often painful responsible metrics journeys, not easy-ride bandwagons to be jumped on. If we’re not careful, I fear that in a hugely ironic turn, DORA and the Leiden Manifesto might themselves become bad (misused) metrics: an unreliable indicator of a commitment to the responsible use of metrics that may or may not be there in practice.

Let’s get off the ‘metric-shaming’ bandwagons, deepen our understanding and press on with the hard work of responsible research evaluation.

 


Elizabeth Gadd

Elizabeth Gadd is the Research Policy Manager (Publications) at Loughborough University. She has a background in libraries and scholarly communication research. She is the co-founder of the Lis-Bibliometrics Forum and the ARMA Metrics Special Interest Group Champion.

 

 

Original content posted on The Bibliomagician, reposted here with permission. Content is licensed under a Creative Commons Attribution 4.0 International License.