

Blaise Cronin
Indiana University at Bloomington


BMC Freedom of Information Conference 2000

Professor Blaise Cronin
Indiana University at Bloomington

Bibliometrics and Beyond: Some thoughts on webometrics and influmetrics

I was asked at short notice to talk about the future of bibliometrics in the world of the web, the internet, and developments such as PubMed and BioMed Central.

What bibliometricians basically do is count things and, at the risk of oversimplifying, what they count are publications, typically publications which have appeared in peer-reviewed journals. They also count citations to the work of scientists, scholars, and researchers, and they count at the nano-level - how many times have you been cited, how frequently has your work been invoked. They do it at numerous levels, from a micro level, such as a research team, to a macro level, such as a nation state. Basically, for the last thirty or forty years bibliometricians have looked at publications and citations to scholars. They've plotted, they've measured, they've tracked, and they've tried to map the information-theoretic structure of disciplines, the intersections of disciplines, and the evolution of fields, and have tried to use these techniques, amongst other things, to pick winners in science.

The emerging publishing environment that this event is addressing suggests a raft of new opportunities for people interested in measuring. Citations, after all, are the links that scientists make in their papers to the rest of the literature. I often think it would have been extremely interesting had, say thirty years ago, Eugene Garfield - who is the patriarch of citation indexing - met Ted Nelson, who was the conceptual grandfather of hypertext, which underpins the world wide web. In a sense, citation indexing is a conceptual linking scheme that's been waiting for years for something like the world wide web. I think we're going to see over the next few years the emergence of non-commercial, public domain, open, autonomous citation indexing techniques and I think we're at an extremely interesting moment in the evolution of bibliometrics.

The origins of science citation

Let me start with a very brief history of citation indexing and analysis. Most, if not all, of you are familiar with the Science Citation Index. The name was coined by none other than Joshua Lederberg who, way back in the 1950s, was an ardent supporter of the idea of citation indexing and analysis and was a redoubtable supporter of Eugene Garfield and his efforts to get funding and ultimately develop the company ISI, which brought us the Science Citation Index (SCI) and its sister products. In fact, the SCI grew out of a subject specialty index and owes a great deal to Shepard's Citations, a tool which is still being used in law to track citations to legal literature.

If we look from 1950 through to the 1990s, the primary application of citation indexing was to retrieve the literature of science. It was conceived by Garfield as a retrieval tool, not as something to be used in evaluating academics, departments, research groups, or the state of health of high energy physics in Britain, France or anywhere else. It really was a retrieval tool, but quite a number of individuals recognised that it had far more serious and significant implications in terms of our understanding of, if you like, the structural dynamics and the social nature of scientific activity. But basically the Science Citation Index is limited to three-and-a-half to four thousand journals. That's what I call a bounded set and they are primarily print-based, peer-reviewed journals. It is a slice of life, it is the crème de la crème, ISI will argue. Others will contest it and challenge the reliability of the data set but by and large it is the crème de la crème of science literature, the social sciences, and also the arts and humanities.

Creating "silent" scientists

Over the years researchers in different fields and individuals such as Henry Small, the director of research at ISI, have developed rather more sophisticated techniques for mapping and modelling the growth, evolution, and interaction of scientific fields by developing such things as co-citation analyses, citation mapping, and visualisation techniques. Those of us who've done this kind of work and/or have played with these kinds of things have been historically reliant upon the tool sets from ISI. Those tool sets are not just limited to a finite slice of the journal literature, they basically exclude things such as monographs. Now that won't matter in biomedicine; it does, however, matter not only in the humanities but quite significantly in social science. So if you are trying to identify significant thought leaders in fields such as the sociology of science, reliant upon ISI indexes, excellent though they be, you will not get the full picture because of the absence of monographs. I've argued for years with ISI and tried to persuade them to include them but they declined so to do for cost reasons.
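To make the idea of co-citation analysis concrete, here is a minimal sketch in Python. It only illustrates the basic counting step (two works are co-cited when the same paper cites both); it is not ISI's method or tooling, and the paper names and reference labels are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of works is cited together by one paper.

    reference_lists maps a citing paper to the list of works it cites.
    """
    counts = Counter()
    for refs in reference_lists.values():
        # set() drops repeated references within one paper;
        # sorted() gives every unordered pair a single canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers and cited works (invented data).
papers = {
    "paper_A": ["W1", "W2", "W3"],
    "paper_B": ["W1", "W2"],
    "paper_C": ["W2", "W3"],
}

counts = cocitation_counts(papers)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are the raw material for the maps described above; a real analysis would normally normalise these counts and cluster them, which this sketch leaves out.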

For the last eight or ten years I've been trying to persuade ISI to also include acknowledgements, the goat's droppings of academia and scholarship some would say, but that is to underestimate the social significance of acknowledgements. If we broaden it to look at the debate currently in the medical literature, suppose we do get rid of the author and replace it with contributors and, in some cases, guarantors, wouldn't we like to identify all those who have contributed? If you've contributed by critiquing a paper or assisting in data analyses, you at least want to find your name in the acknowledgement section. The trouble is, like the second, third, and nth authors of a paper, you disappear into the black hole of citation space, you don't exist. Jack Meadows, a historian of science, talks about silent scientists, and there are many academics, not just scientists, who have contributed significantly to the evolution of ideas and the maturation of doctoral and post-doctoral students, but it doesn't show up in the two most evident measures, publication counts and citation counts. And ISI has, perhaps for good commercial reasons, resisted incorporating acknowledgees. Now imagine a database of all the individuals who were acknowledged. You may dismiss it but if you happen to be a telescope operator this is precisely what you're looking for. If you've taken part in a clinical trial and are not a co-author this is just the sort of thing you're looking for.

Hearing silence through the web

Now I think that is set to change as we move into the world of the web. Why? Because the extensive growth in links and the increased transparency that links provide allow us to see a much wider range of scholarly contribution. New modes of contribution or historically invisible modes of contribution will become potentially visible. The development of e-print archives is significant not just because it challenges certain historical assumptions about the usability of unvetted information but also because you can find that a paper has been subsequently cited, even if it has not yet appeared in the journal of record. So it provides further insight into the historically invisible linkages between documents and texts.

Developments such as PubMed Central, BioMed Central, and HighWire are going to create the conditions, I suspect, for different modalities of publication, ranging across the spectrum of peer review from highly robust, full-blooded, double-blind peer review through to various lighter options. What that's going to do is throw up a much broader range of objects for bibliometricians and citation analysts to look at, analyse, track, and measure. So it's an extremely interesting moment, a cusp in the evolution of research and bibliometrics, because we now have an incipient infrastructure which will allow us to look at a much finer level of granularity. And we have developments like CrossRef. What is CrossRef but an initial commercial variant on what ISI is or has historically done? It represents, I suspect, a threat to ISI's historic near-monopoly on citation indexing and linking.

And then we have something we call research indexing. This allows you to see citations in context. For example, you do not just get Smith 1994 but you can actually see the context, if you like the semantic wrap-around, in which Smith is being cited. It tells you more about the nature, purpose, and motivation of the citing author. This is giving us a level of detail and contextualisation that we have not previously had and it's going to allow us to explore and understand at a deeper, richer level what actually is happening, what individuals link to and cite.

Now you may think that acknowledgements sitting down at the bottom are trivial but the Wellcome Trust has been tracking acknowledgement data so they can provide information to funding agencies in Europe and the UK. We could eventually see systems being developed that would allow us to track individuals' contributions in the context of clinical trials and other studies. One could also imagine, for example, citations to clinical guidelines being tracked in the context of evidence-based medicine. So we're at a moment in time where new tools are likely to be coming out of unusual stables and contexts and applied by new populations to new outputs for new purposes.

New publishing - messy, slippy, evanescent and promiscuous

Traditional publishing and traditional bibliometrics dealt with printed, peer-reviewed journal articles in persistent scholarly journals. To exaggerate for effect, what we're going to see is a sort of free-form publishing, a sort of libertarian publishing. We're going to see many different modes of output, we're going to see many different forums in which scholars, researchers, and scientists can publish their work. So the units of analysis with which bibliometricians are going to deal are going to become much more diverse and it seems to me there are three elementary questions to be asked. What is it that's being measured? Where is it exactly and how do we access it? And what is the 'it' which we're counting part of? The "what is the it" - it could be a traditional journal article, an overhead transparency of Nancy Kerrigan, an e-print paper, a self-posting, or it may be version 3 of a working paper which is being dynamically revised. These are not traditionally the units of analysis addressed by bibliometricians. They are messy, they are slippy, sometimes evanescent, and we're not quite sure how to deal with them.

So, what is being measured? And where is it? How do we identify this miscellany of new age, post-modern scholarly outputs? What's an accepted format in each case for representing and labelling such output? Where do they reside? Who owns them? How can we ensure persistence and stability over time? How do we deal with link rot? How do we deal with vanishing URLs? Does it belong to a traditional journal? Some new-age variant on our historic concept of a journal? Is it part of a host service? An archive? An e-depository? All this brings to mind a phrase of John Seely Brown's: "The social life of documents". Documents have a social life, they have kinship structures, and we need to understand how they are socially contextualised.

One presumes bibliometricians are concerned about the integrity, quality, and significance of the things they are measuring, tracking, and counting. As we move into what I call promiscuous publishing there are serious questions to be asked about the pedigree and persistence of the publication itself, of the source and the host. If it's resident in PubMed Central, which is legitimated through its association with the NIH, we feel pretty comfortable and relaxed. If it's sitting in my mother's server at home, well we may or may not have some grounds for concern! There are issues of implied or perceived quality - are these outputs? Remember it could be Nancy Kerrigan's overhead, it could be scholarly skywriting, it could be the latest version of my paper sitting in granny's server. Are they covered by abstracting and indexing services, historically a good indicator of presumed quality? Has any of the work been funded by agencies such as the NIH or NSF? So what are the kinds of indicators of perceived quality that we're going to want to rely upon or invoke when dealing with multi-modal, promiscuous output from publishing? And whose imprimatur undergirds these units? Have they been subjected to full peer review or is it rather laid back? Is there no peer review whatsoever? We need to understand the significance of different peer-review apparatus and paraphernalia in different disciplines, fields, and sub-fields.

Differing cultures in different fields of science

One of our colleagues asked me why the Los Alamos model hasn't been adopted by every other field around the world. The answer is quite simply that the structure of high energy particle physics research is fundamentally different from biomedicine, oceanography, or fruit fly research. By that I mean the nature of the collaboration, the intense level of internal, intramural, intra-group review, and the ground rules, procedures, and processes which are formally in place before work gets released. It is a very self-knowing community. It's unlikely that dubious work, flawed work, or fraudulent work will escape in a way that, one has to acknowledge, happens in medical research. And it is the intense social relationships and the material practices of those people who are requesting beam time and trying to find the next particle that allow a development such as the Los Alamos e-print archive to develop. There is such inherent credibility in the work being produced even though it may not have gone through the formal, final stages of peer review. That model technologically could be replicated in other fields but sociologically it may not be acceptable, and so one has to acknowledge the socio-cognitive differences across fields and sub-fields and how those psychic differences, cognitive differences, and behavioural differences will either accelerate or delay the adoption of new modes of publishing, storing and archiving, and communicating information.

The last point in this has to do with the prevailing reward system in scientific fields. What counts in one field may not count in another. For example, in computer science conference papers are taken fairly seriously and, in the context of promotion and tenure, count for something. In other fields, such as information science, we look rather less favourably upon conference proceedings and papers. What counts in medicine and what counts in other fields will be reflected differently in the reward systems of those fields.

What value a citation?

Many people challenge some of the claims made for citation data in evaluation contexts, whether evaluating journals using impact factors or evaluating research groups through citation counts. Nonetheless there is a considerable body of rigorous research that shows by and large that citations are perhaps the single most useful indicator of something and that something may be impact, may be utility, may, at a stretch, be quality and, of course, it may be lots of other things that are not quite so acceptable. But what exactly are bibliometricians measuring? Is it quality? Is it faddishness? Is it a critical flash in the pan? What exactly does a citation signify?

These questions have not been answered to everybody's satisfaction. Some citations are negative, but that doesn't matter - at least I took the time to say that your work was flawed, and the fact that I selected you is in itself revealing. But what do citations tell us? Is to be linked-to analogous with being heavily cited? If so, how might such measures be used in conjunction with others to develop new metrics for the age of the world wide web? How do we determine what is substantive and what is transient in the world of volatile electronic publishing and posting? Do we utterly ignore motivations - I don't care why you voted for Reagan, I don't care why you voted for Thatcher, I'm simply looking at the broad distribution of votes. And then if we're going to be looking at metrics in the context of the web there are serious issues as to the reliability of the search engines. Run a search on AltaVista today on subject X, run it twelve hours later, and you're going to get a very different output. Run a search using AltaVista and a search using HotBot and you'll get very different outputs. The coverage of these engines is partial and so if you're going to rely upon indicators or measures derived from the web you need to take account of reliability. Are we going to weight all indicators of an individual's presence on the web equally? Same question in citations, are all citations equal? If Dr Varmus cites my work I'll feel good, if some third rate masters student does I might feel less good, yet in counting citations we count the Varmus one the same way that we count the masters student's. If somebody mentions me in a news group, if somebody mentions me in a conference programme, if I'm the subject of an animated discussion on some issue, is that significant, is it trivial? Who determines and how do we weight?

I don't have the answers, nor am I proposing any. I'm simply saying that with the advent of the transparency of the web, with the availability of all kinds of indicators of people's cognitive and social activity, the opportunity for counting, for name-numerology, is rampant.

And the last thing, Steven Adler coined the term "Slashdot effect" to describe the surge or swarming behaviour that takes place when everybody gallops to a particular URL. You saw it during the Super Bowl. The Victoria's Secret web site crashed as a result of the number of people who raced out of their rooms having seen the TV advert during the Super Bowl and logged onto Victoria's Secret. What does that tell us? But what are the equivalents of Victoria's Secret hits in the world of scholarship?

Then there is the Clever project. Folk at IBM have been looking at the web in terms of the pattern of linkages to and from sites and they talk in terms of authorities and hubs. What's most interesting is that if you read their work they explicitly acknowledge the Science Citation Index, they explicitly acknowledge the contributions of Garfield, and they explicitly acknowledge the utility, from their point of view, of the journal impact factor as a measure which could be ported across to the world wide web.
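The notion of authorities and hubs can be sketched in a few lines of Python. What follows is a simplified version of the well-known hubs-and-authorities iteration (often called HITS): a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. It is an illustration under stated assumptions, not the IBM implementation; the function name, the iteration count, and the tiny link graph are all invented for the example.

```python
import math


def hubs_and_authorities(links, iterations=50):
    """Iteratively score pages as hubs and authorities.

    links maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auths = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auths[t] += hubs[src]
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {n: v / norm for n, v in auths.items()}

        # Hub score: sum of the authority scores of pages linked to.
        hubs = {n: sum(auths[t] for t in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {n: v / norm for n, v in hubs.items()}

    return hubs, auths


# Hypothetical link graph (invented page names).
links = {
    "portal": ["journal_x", "journal_y"],
    "review": ["journal_x"],
    "journal_x": [],
}

hubs, auths = hubs_and_authorities(links)
print("top authority:", max(auths, key=auths.get))
print("top hub:", max(hubs, key=hubs.get))
```

The parallel with citation analysis is direct: read "page" as "paper" and "link" as "citation", and a heavily cited paper plays the role of an authority while a good review article plays the role of a hub.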

Prospect mining, web style

The last thing - and I don't have any concrete examples - is that I suspect we'll see a battery of tool sets and engines that go beyond today's search engines and allow us to prospect. Where are the ideas germinating that haven't quite worked their way into formal literature? What about those post docs who don't have the social status and access to resources to get their ideas across? With new modes of publishing, with new modes of communication, we're going to be able to see perhaps early signs of subjects of interest or incipient trends in the conduct of science.

And so I conclude with two words: "indicator mining". I can imagine X years from now that someone will be developing tools that will allow us to marshal these distributed goat's droppings way beyond acknowledgements and citation counts. There will be richer, multi-dimensional pictures of individuals' presence in particular scientific communities and of the differential and multifarious impacts their scholarship is having upon their peer communities.

Questions from the floor

Questioner 1: I'm interested to hear your thoughts on including acknowledgement and contributor-ship as a way of helping to judge people's value on the web. One of the campaigns that I am working on is open peer review and trying to find ways of rewarding peer reviewers for the work they do. It seems to me that the way that you're suggesting is exactly a way to do that. That if open peer review exists on the web - a signed commentary attached to a paper - the peer reviewer can then be rewarded in the way that you are indicating for the work they're doing.

Professor Blaise Cronin: I have done a number of very tedious analyses of acknowledgement in different fields. I looked at them over the years, gathering literally tens of thousands of acknowledgements to look for patterns and to see whether there was evidence that there is a population of mentors who may not be highly visible in terms of publication and citation output but who clearly are having instrumental effects upon the development and growth of their field. We have done rank ordered analyses of these individuals and I have done some very vulgar and crude correlations of ranking acknowledgees with rankings of citations. In other words, looking at people who are highly cited and people who are highly acknowledged and it doesn't always work out. There's a Russian astrophysicist who has created an astronomy acknowledgement index and he said, after some initial scepticism, it's been very warmly received within the astronomy community. And I mentioned in passing earlier on the telescope operators. They were one of the groups that felt vindicated, perhaps validated, it was almost a legitimisation of their existence. They are the unsung heroes and heroines of our working lives - it doesn't attract kudos and visibility but without people fixing, managing, operating, and setting telescopes, the real astronomy, the science, doesn't get done. They were delighted that somebody had actually produced what was in fact a public register of their contribution.
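For readers who want to see what a crude correlation of rankings might look like, here is a minimal sketch of a Spearman rank correlation in Python. The talk does not say which statistic was actually used, so Spearman is an assumption made for illustration, and the counts below are invented, not data from the studies described.

```python
import math


def average_ranks(values):
    """Rank values from highest (rank 1) downwards, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank lists."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for five researchers (invented data).
citations = [120, 45, 300, 10, 80]
acknowledgements = [15, 40, 22, 3, 9]

rho = spearman(citations, acknowledgements)
print(round(rho, 2))
```

A value near 1 would mean the highly cited are also the highly acknowledged; a modest value, as with these invented numbers, is the pattern described above, where the two rankings do not always line up.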

I'm not suggesting we start counting these things and that people's annual pay rises are based on being acknowledged 2.7 times or 9.3 times. That trivialises the potential. But it does provide some social recognition, if nothing else, for people who make significant contributions. Sociologists talk in terms of trusted assessor-ship for this phenomenon. So yes, one could imagine in the context of open peer review, if I take the trouble to critique constructively somebody else's paper it may be archived for us all to see. That is how things like Epinions on the web work. They have ratings. There are people who offer opinions and they struggle with each other to be the most highly vaunted, highly cited opinion giver. They do it for nothing but it matters. It is the prestige, the status, so the social factors in this are much more important than some of the rather crude quantitative aspects which could be exploited. So I really do see it as throwing a sort of search light on some of the shadow lands of scientific research so that all those who make contributions can receive their due recompense.




© 1999-2009 BioMed Central Ltd unless otherwise stated. Part of Springer Science+Business Media.