
David Lipman
National Center for Biotechnology Information


BMC Freedom of Information Conference 2000


PubMed Central: still on course to revolutionise biomedical publishing

A universal electronic template will create a reliable and flexible interface for journals

The concept of making the results of primary research freely available to anyone with an internet connection caused a great stir in the media and the biomedical science community when proposed last year by the National Institutes of Health (NIH). After some revision to the original proposal, PubMed Central was launched early this year. So where is the promised publishing revolution? As this article explains, addressing the technical challenges presented by such an ambitious project has kept us busy behind the scenes, but we are now moving ahead to make PubMed Central a reality.

What is PubMed Central?
The aim of PubMed Central is to deliver primary research findings to the scientific community free of charge, without registration, advertisements, or other barriers. Participation by publishers in PubMed Central is voluntary, but participating publishers must meet the minimum standard of having at least three members on their editorial boards who are currently principal investigators on research grants from major funding agencies. Copyright of material remains with the publisher or the author of an article, and not with PubMed Central. There is currently no provision for non-peer-reviewed literature on the PubMed Central site.

PubMed Central is still a fledgling system. If you visit the PubMed Central website, you will see a modest number of articles available from a handful of journals. We would be delighted if more content were already there. However, we underestimated the technical issues involved in displaying content from different sources. Resolving these issues to the satisfaction of all concerned has proved to be a non-trivial task.

Making it happen
The technical approach being taken by PubMed Central is to display journal articles in a web browser by conversion 'on the fly' from the source data, which are tagged in Standard Generalized Markup Language (SGML). Currently, SGML tagging of articles is usually a by-product of the printing process. In PubMed Central we require the SGML version of an article to be the definitive source.

The advantage of this approach - working directly from the SGML - is twofold. First, SGML is an international standard, which means that the data are portable and can be used by others. Second, the maximum amount of information about the actual content of an article is retained. This is obviously desirable for the working archive that PubMed Central hopes to become. Future users of the archive will not be dependent on a particular technology for continued access to its contents.

True, we could store articles as HTML (itself an application of SGML, but one with a small, fixed set of tags); this would be fine for merely displaying articles, but is inadequate for defining the structure of an article. For example, in SGML, an article title is described as such and retains the title tag if the article is presented in different display styles; in HTML, it is merely a set of large, bold letters. For this reason, many are looking to XML, a halfway house between the complex SGML and the overly simplistic HTML, to use as the standard for text-based information. In order for online publishing to evolve from a process based around journal articles into a dynamic and rich set of information, it will be essential to keep the source document, tagged in SGML or XML, as the archive copy.
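The difference between structural and display tagging can be illustrated with a minimal Python sketch. It is not PubMed Central's conversion code: the element names (`title`, `para`), the two style tables, and the function names are all assumptions made for illustration. The same structurally tagged source is rendered in two display styles, and the title can still be retrieved as a title whatever the style.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structurally tagged source; the element names are illustrative.
SOURCE = "<article><title>Gene X &amp; disease</title><para>Findings.</para></article>"

# Hypothetical per-journal display styles: structural tag -> HTML tag.
STYLES = {
    "journal_a": {"title": "h1", "para": "p"},
    "journal_b": {"title": "h2", "para": "div"},
}

def render(xml_text: str, style_name: str) -> str:
    """Convert the structural source to HTML 'on the fly' for one style."""
    style = STYLES[style_name]
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        html_tag = style[child.tag]
        parts.append(f"<{html_tag}>{escape(child.text or '')}</{html_tag}>")
    return "".join(parts)

def extract_title(xml_text: str) -> str:
    """The title is recoverable from the source regardless of display style."""
    return ET.fromstring(xml_text).findtext("title")
```

Once only the rendered HTML is kept, nothing marks the `h1` or `h2` text as the article title, which is why the tagged source is treated as the archive copy.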

A more streamlined approach
The disadvantage of this 'SGML first' approach is that different publishers use different sets of rules or templates, known as document type definitions (DTDs), for tagging SGML. Furthermore, the content, tags, and DTD have to be in perfect synchrony for a document to be displayed correctly. One of the teething problems encountered by PubMed Central has been a lack of this synchrony in many cases. Compounding these problems has been the need to fine-tune the translation from SGML to HTML for each journal, to conform to the distinct display styles of different journals. We have come to question whether this approach makes the best use of our time and resources.
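The synchrony problem can be sketched in a few lines of Python. This is a deliberately simplified stand-in for real DTD validation, which also checks nesting, ordering, and attributes: here a hypothetical set of allowed element names plays the role of the DTD, and the function reports any tags in a document that the rules do not define.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: only the set of element names it defines.
ALLOWED = {"article", "title", "para"}

def unknown_tags(xml_text: str) -> list[str]:
    """Return the element names used in the document but absent from ALLOWED."""
    root = ET.fromstring(xml_text)
    return sorted({elem.tag for elem in root.iter()} - ALLOWED)
```

A document that uses a tag outside the rules, such as a `heading` element here, is flagged before any attempt is made to display it.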

We are now considering a more streamlined approach, which we expect will deliver a reliable interface to the articles in PubMed Central, while also allowing flexibility for the development of the fabric of new articles in the future. Under this approach we would continue to display SGML- or XML-tagged articles on the fly, but would first convert the SGML/XML supplied by the publisher so that it conforms to a common set of tagging rules; in other words, to a single PubMed Central DTD. This would mean, for example, that all article titles are called <article title> rather than a mixture of <article title>, <article name>, <paper title>, <paper heading>, etc. Such an approach would make it more feasible to develop novel information retrieval methods, a robust archiving system, and computational analysis tools.
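The conversion step described above might look something like the following Python sketch, offered as an illustration rather than as the actual PubMed Central conversion. The mapping table and the `normalize` function are assumptions, and hyphens replace the spaces in the tag names quoted above because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from publisher-specific tag names to one common name.
TAG_MAP = {
    "article-name": "article-title",
    "paper-title": "article-title",
    "paper-heading": "article-title",
}

def normalize(xml_text: str) -> str:
    """Rename publisher-specific tags so every document uses the common names."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = TAG_MAP.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

# Two publishers tagging the same kind of content differently.
publisher_a = "<article><paper-title>Gene X</paper-title></article>"
publisher_b = "<article><article-name>Gene Y</article-name></article>"
```

After normalization both documents carry an `article-title` element, so a single display routine, search index, or analysis tool can be written against one set of names.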

Why we need PubMed Central
Some have questioned the need for a system such as PubMed Central, given the growing free access to article archives at many journals' own sites, combined with the reference linking that will be available through CrossRef. Our response is that PubMed (the citation retrieval system with more than 10 million entries, to which PubMed Central is linked) already provides a more powerful, fully operational, and completely free linking facility, which reaches beyond simple bibliographic links to factual databases and other resources. Even with alternative sources of free literature, an advantage of PubMed Central is that it will provide access to all articles in a single place, regardless of where they are published.

The NIH, through the National Library of Medicine (NLM), has a strong commitment to making PubMed Central a valuable resource for the life sciences. For the NLM, PubMed Central is an extension of its longstanding commitment to preserve and maintain open access to the world's biomedical literature. Clearly it will take some time to resolve some of the technical issues discussed here, but we believe that the end results will more than justify the effort, and we encourage other publishers to join this initiative.

David J Lipman, Director
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Bethesda, Maryland, USA

Competing interests: The author has responsibility for PubMed Central.


© 1999-2008 BioMed Central Ltd unless otherwise stated