October 20, 2003

INTERVIEW

A journey into DSpace

Institutional full-text repositories have recently emerged as a promising way of increasing access to scholarly research material. DSpace is an institutional digital 'super-archive' system jointly developed by the Massachusetts Institute of Technology (MIT) Libraries and Hewlett-Packard Laboratories. Open Access Now talked to MacKenzie Smith, director of the MIT DSpace team, about the DSpace project.

Much of the high-profile focus of the Open Access movement has been on providing Open Access alternatives to traditional subscription-based research journals. But the Internet also provides another way for researchers and their host institutions to provide free access to their research articles, in the form of institutional or 'self' archives. As with so many web-based technologies, however, self-archiving is not quite as straightforward as it might seem.

DSpace and Open Access
DSpace began three years ago with a US$1.8 million collaboration between Massachusetts Institute of Technology (MIT) Libraries and Hewlett-Packard (HP) to develop a dynamic repository for intellectual output in digital formats. "DSpace was originally conceived as a tool to assist universities, and particularly research universities, with making research material more easily available, through Open Access wherever that was possible," explains Smith.


"The two main functional aspects of DSpace are preservation and access to the material"

MacKenzie Smith


Universities and research institutions are developing research materials and scholarly publications in increasingly complex digital formats, and there is a pressing need for systems to collect, preserve, index and distribute them. But the time and technical expertise required to do this properly are beyond the resources of most laboratories or departments. The DSpace system provides a way to manage research materials and publications in a professionally maintained repository, to give them greater visibility and accessibility over time.

"We designed a platform that was somewhat neutral about the politics of access," Smith notes. "We wanted to create a tool that universities could use to support Open Access, if that was their goal, but that would not prohibit them from having restricted-access material as well, if that were necessary. And because the initiative came out of the library and archive community at MIT, we were very concerned about the issues of long-term access and preservation. If researchers move to a model of self-archiving over time and there are fewer and fewer things being published in print, then you have to worry about the scholarly record for the future. So, we tried to design a platform that would also help support institutions that want not only to make the material more accessible to the public but also to preserve it so that it's still there in a hundred years' time. The two main functional aspects of DSpace are preservation and access to the material."

DSpace manages and distributes digital items, each made up of digital files (or bitstreams), and allows for the creation, indexing, and searching of associated metadata to locate and retrieve the items. Each DSpace service is made up of 'communities' - research groups that contribute content to DSpace. The communities might be departments, laboratories, research centers, or any other administrative unit within an institution. Communities determine their own content guidelines and decide who has access to the community's contributions. DSpace is a web-based application, so if the material is made publicly available then access can easily be unlimited.
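The structure described above can be pictured with a minimal sketch. It is an illustration only and is not drawn from the DSpace code base: the class names, fields, and the `can_read` and `search` methods are hypothetical, and the access rule (public, or a named list of readers) is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bitstream:
    """One deposited file (hypothetical model, not DSpace's own class)."""
    name: str
    data: bytes


@dataclass
class Item:
    """A digital item: descriptive metadata plus one or more bitstreams."""
    metadata: dict[str, str]
    bitstreams: list[Bitstream] = field(default_factory=list)


@dataclass
class Community:
    """A contributing unit (department, lab, center) that sets its own access policy."""
    name: str
    public: bool = True                      # assumption: a simple public/restricted switch
    allowed_readers: set[str] = field(default_factory=set)
    items: list[Item] = field(default_factory=list)

    def can_read(self, user: str | None) -> bool:
        """Public communities are open to all; otherwise the user must be listed."""
        return self.public or user in self.allowed_readers

    def search(self, term: str, user: str | None = None) -> list[Item]:
        """Return items whose metadata values contain the term (case-insensitive)."""
        if not self.can_read(user):
            return []
        needle = term.lower()
        return [
            item
            for item in self.items
            if any(needle in value.lower() for value in item.metadata.values())
        ]
```

In this sketch a restricted community returns nothing to an unlisted reader, while flipping `public` to `True` makes the same items searchable by anyone, which mirrors the choice each community is described as making for itself.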


"We are trying to get the barriers down so low that people can do self-archiving without even thinking about it"

MacKenzie Smith


"You can limit access to the university campus community, or even to your department or lab," says Smith. "Our philosophy is to give people the tool to do what they want to do, but constantly encourage them and remind them of our greater goal - to make more of this information available to the public. We have noticed that a lot of faculty are reluctant or nervous about all this and we realize that it may take some time to convince them that it's a good idea. Open Access is something that some disciplines have embraced in a big way, while other disciplines have reservations and concerns."

Smith is keen to emphasize that what an institution does with DSpace is entirely a matter of the policy of that institution. "At MIT we made a series of decisions about how we are going to use this platform. For example, we have limited it to faculty research material and teaching material; we don't accept student work or material from non-affiliated researchers. But another university could decide to do something quite different with it - requiring all campus members to deposit their articles, or whatever." MIT has a policy of encouraging, rather than forcing, faculty to deposit material in the electronic archive. "We don't dictate how people use this system; we encourage them and show them by example what you can do with it," says Smith.

Development of the software by MIT and HP took nearly two years and the DSpace system has been in production at MIT for about a year. "At MIT there is a very wide understanding of the system, what it's for and what it does," says Smith. But she admits that adoption by the MIT faculty has been fairly slow. "It takes a while, and part of the reason for that is that communities have to make a lot of decisions about policies - who can submit, who has access, what is the workflow involved," she explains. "Communities have to sit down and think hard about how people will submit files and how they will be processed before being posted; this can take anywhere from six months to a year depending on the department. It is really making people think about how they want to share their research material."

Greater sharing
Smith also explains how implementing DSpace at MIT has encouraged faculty to think about what it is that they want to share, and with whom. DSpace is designed with a flexible storage and retrieval architecture adaptable to a multitude of data formats and distinct research disciplines. Each DSpace community has its own customized user portal that can be adapted to the community's own practices and terminology.

DSpace accepts all manner of digital formats: from standard documents, such as articles and preprints, to working papers, conference papers, books, or theses, as well as large datasets, visual images and audio files. "We work closely with our faculty about what kind of things they can be putting into the system," says Smith. She notes that MIT researchers have been particularly receptive to ways to openly distribute supplementary material, genome-scale datasets and images. DSpace provides permanence, which is preferable to storing information on a random departmental webserver. "We have made a policy decision to make the service free for MIT faculty. If they need additional help - like extensive storage space or help creating metadata - then we have cost-recovered services to support them."

 

"Every university worth its salt has some kind of library or archive and those were set up mainly to store things that were published outside the university that the faculty needed access to. But now more and more things, such as technical reports and working papers, are being published internally and fewer things are getting to some kind of formally published form. So that has become the scholarly record now. But who is taking care of all these datasets and image sets? Some academics are trying to do all of it themselves, but they don't really have the resources. A few disciplines have a central archive for it but most of them don't. So it seems kind of obvious that universities and other research institutions have to start doing this for at least their own material. And platforms like DSpace are designed to help with this problem." Smith says that the MIT Libraries team is hoping to attract as many research papers as possible because of the issues of Open Access. "But the system isn't limited to dealing with papers," she adds. "It's meant to start to get research institutions positioned to deal with all the research information they are creating. For example, we are in the process of converting about 10,000 theses to put into DSpace. Some of the things we are learning are very surprising. For example, a lot of faculty are very worried about their teaching material: they are creating all these great online course materials and there is nowhere to put them, so they have to try to do it themselves or rely on inadequate course-management systems."

There is also a lot of non-textual material, which has made it hard to develop effective search strategies. Currently, the MIT DSpace system preserves files in the format in which they were deposited without automatic conversion to a standardized format. But the developers are continuing to think about strategies for long-term preservation.

The DSpace Federation
The original goal of the DSpace developers was to create a system that could be widely used by universities and research institutions. The DSpace system is freely available as open source software: it is available for anyone to download and run at any type of institution, organization, or company (or even just as an individual). Users are also allowed to modify DSpace to meet an organization's specific needs.

To encourage wide-scale use of DSpace, MIT Libraries and HP established the DSpace Federation, with the goal of bringing together sets of universities or research organizations that share similar concerns and problems. Currently the DSpace Federation Project is supported by a grant from the Andrew W. Mellon Foundation and aims to test the application of DSpace in a variety of university settings, to discuss what sort of multi-institutional federated services might usefully be built on the DSpace platform, and to explore how interoperability among these organizations' systems may create a far more valuable resource than is possible through individual systems.

MIT's Killian Court and the Great Dome - a landmark at the Institute

The Federation is currently being defined by a core group of eight universities that are evaluating DSpace in different institutional contexts, and a further 120 institutions worldwide that are looking into the system. "The system has also been downloaded about 5,000 times," says Smith. "We provide a fair amount of technical documentation and other kinds of help to get people started and as much additional support as we can manage with our limited financial resources." In the spirit of open-source software, many people have developed tools to help other users take advantage of DSpace. Smith says that a reasonable IT team can set up DSpace within a week. "But if you want to turn it into a production system then it takes longer to develop the necessary policies and administrative processes."


"You can imagine a sort of virtual library of every research publication in the world"

MacKenzie Smith


One important aspect of institutional repositories is that they can encourage an institution-wide commitment to Open Access. Smith admits that 'super-archives' are not yet accelerating Open Access. "But the first step is to provide the platform that allows it to happen at all. Then the second step is to go out and promote it and explain to faculty why there are problems and encourage them to think about how they would want to use a tool like this."

"If faculty get addicted to using systems like DSpace and start to put in all their research articles, and if every research institution in the world has something like this, then over time it could become a serious alternative to commercial publishing as a means of communication between scholars," Smith speculates. "Whether that is what happens, we will have to wait and see. But we certainly set the system up so that could happen. We are promoting it and trying to get every research institution in the world to do this, or something like it, and they should all be interoperable. We all support the same standards, and the systems can all talk to each other, so you can imagine a sort of virtual library of every research publication in the world."

"Some disciplines are very eager to move ahead and some just don't get it. But you have to start somewhere," concludes Smith. "Researchers in the life sciences are pretty good in general about the idea of sharing. The first step is to provide the tools to enable it and the next is to get out there and let people know about it. We are trying to get the barriers down so low that people can do self-archiving without even thinking about it."

www.dspace.org

 

 
 

Open Access Now is published by BioMed Central.
Editor: Jonathan B Weitzman.