October 20, 2003
INTERVIEW
A journey into DSpace
Institutional full-text repositories
have recently
emerged as a promising
way of providing increased
access to scholarly research
material. DSpace is an institutional
digital 'super-archive'
system jointly developed
by Massachusetts Institute
of Technology Libraries and
Hewlett-Packard Laboratories.
Open Access Now talked to
MacKenzie Smith, director of
the MIT DSpace team, about
the DSpace project.
Much of the high-profile focus of the
Open Access movement has been on
providing Open Access alternatives to
traditional subscription-based research
journals. But the Internet also provides
another way for researchers and their
host institutions to provide free access
to their research articles, in the form of
institutional or 'self' archives. As with
so many web-based technologies,
however, self-archiving is not quite as
straightforward as it might seem.
DSpace and Open Access
DSpace began three years ago with a
US$1.8 million collaboration between
Massachusetts Institute of Technology
(MIT) Libraries and Hewlett-Packard
(HP) to develop a dynamic repository
for intellectual output in digital
formats. "DSpace was originally conceived
as a tool to assist universities,
and particularly research universities,
with making research material more
easily available, through Open Access
wherever that was possible," explains
Smith.
"The two main functional
aspects of DSpace are
preservation and access
to the material"
MacKenzie Smith
Universities and research institutions
are developing research materials and
scholarly publications in increasingly
complex digital formats, and there is a
pressing need for systems to collect,
preserve, index and distribute them.
But the time and technical expertise
required to do this properly are beyond
the resources of most laboratories
or departments. The DSpace system
provides a way to manage research
materials and publications in a professionally
maintained repository, to give
them greater visibility and accessibility
over time.
"We designed a platform that was
somewhat neutral about the politics
of access," Smith notes. "We wanted to
create a tool that universities could
use to support Open Access, if that was
their goal, but that would not prohibit
them from having restricted-access
material as well, if that were necessary.
And because the initiative came out of
the library and archive community at
MIT, we were very concerned about
the issues of long-term access and
preservation. If researchers move to a
model of self-archiving over time and
there are fewer and fewer things being
published in print, then you have to
worry about the scholarly record for
the future. So, we tried to design a
platform that would also help support
institutions that want not only to
make the material more accessible to
the public but also to preserve it so
that it's still there in a hundred years'
time. The two main functional aspects
of DSpace are preservation and access
to the material."
DSpace manages and distributes digital
items, each made up of digital files (or
bitstreams), and allows for the creation,
indexing, and searching of associated
metadata to locate and retrieve the
items. Each DSpace service is made up
of 'communities' - research groups that contribute content to
DSpace. The communities might be
departments, laboratories, research
centers, or any other administrative
unit within an institution. Communities
determine their own content guidelines
and decide who has access to the
community's contributions. DSpace is
a web-based application, so if the
material is made publicly available
then access can easily be unlimited.
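The structure described above (communities that hold items, items made up of files plus metadata, and access decided community by community) can be pictured with a small sketch. This is only an illustration under stated assumptions: the class names, fields, and the `search` helper are hypothetical and are not taken from the DSpace software or its API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bitstream:
    """One digital file belonging to an item (hypothetical model)."""
    name: str
    size_bytes: int


@dataclass
class Item:
    """A deposited work: descriptive metadata plus its files."""
    metadata: dict[str, str]
    bitstreams: list[Bitstream] = field(default_factory=list)


@dataclass
class Community:
    """A contributing unit that decides who may read its items."""
    name: str
    public: bool = True
    allowed_groups: set[str] = field(default_factory=set)
    items: list[Item] = field(default_factory=list)

    def can_read(self, user_groups: set[str]) -> bool:
        # Public communities are open to everyone; restricted ones
        # require membership in at least one allowed group.
        return self.public or bool(self.allowed_groups & user_groups)


def search(communities, term, user_groups=frozenset()):
    """Return readable items whose metadata mentions the term."""
    term = term.lower()
    hits = []
    for community in communities:
        if not community.can_read(set(user_groups)):
            continue
        for item in community.items:
            if any(term in value.lower() for value in item.metadata.values()):
                hits.append(item)
    return hits
```

In this sketch a department could leave `public=True` for Open Access material, while a lab could restrict reading to its own members by setting `public=False` and listing a group in `allowed_groups`.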
"We are trying to get the barriers down so low
that people can do self-archiving without even
thinking about it"
MacKenzie Smith
"You can limit access to the university
campus community, or even to your
department or lab," says Smith. "Our
philosophy is to give people the tool
to do what they want to do, but
constantly encourage them and remind
them of our greater goal - to make
more of this information available to
the public. We have noticed that a lot
of faculty are reluctant or nervous
about all this and we realize that it
may take some time to convince them
that it's a good idea. Open Access is
something that some disciplines have
embraced in a big way, while other
disciplines have reservations and
concerns."
Smith is keen to emphasize that what
an institution does with DSpace is
entirely a matter of the policy of that
institution. "At MIT we made a series
of decisions about how we are going
to use this platform. For example, we
have limited it to faculty research
material and teaching material; we
don't accept student work or material
from non-affiliated researchers.
But another university could decide
to do something quite different with it
- requiring all campus members to
deposit their articles, or whatever."
MIT has a policy of encouraging,
rather than forcing, faculty to deposit
material in the electronic archive.
"We don't dictate how people use
this system; we encourage them and
show them by example what you can
do with it," says Smith.
Development of the software by MIT
and HP took nearly two years and
the DSpace system has been in production
at MIT for about a year. "At MIT
there is a very wide understanding of
the system, what it's for and what
it does," says Smith. But she admits
that adoption by the MIT faculty has
been fairly slow. "It takes a while, and
part of the reason for that is that
communities have to make a lot of
decisions about policies - who can
submit, who has access, what is the
workflow involved," she explains.
"Communities have to sit down and
think hard about how people will
submit files and how they will be
processed before being posted; this
can take anywhere from six months
to a year depending on the department.
It is really making people think about
how they want to share their research
material."
Greater sharing
Smith also explains how implementing
DSpace at MIT has encouraged faculty
to think about what it is that they
want to share, and with whom. DSpace
is designed with a flexible storage
and retrieval architecture adaptable
to a multitude of data formats and
distinct research disciplines. Each
DSpace community has its own customized
user portal that can be adapted
to the community's own practices
and terminology.
DSpace accepts all manner of digital
formats: from standard documents,
such as articles and preprints,
to working papers, conference papers,
books, or theses, as well as large
datasets, visual images and audio files.
"We work closely with our faculty
about what kind of things they can be
putting into the system," says Smith.
She notes that MIT researchers have
been particularly receptive to ways to
openly distribute supplementary material,
genome-scale datasets and images.
DSpace provides permanence, which is
preferable to storing information on a
random departmental webserver. "We
have made a policy decision to make
the service free for MIT faculty. If they
need additional help - like extensive
storage space or help creating metadata
- then we have cost-recovered
services to support them."
"Every university worth its salt has
some kind of library or archive and
those were set up mainly to store
things that were published outside the
university that the faculty needed
access to. But now more and more
things, such as technical reports and
working papers, are being published
internally and fewer things are getting
to some kind of formally published
form. So that has become the scholarly
record now. But who is taking care of
all these datasets and image sets?
Some academics are trying to do all of
it themselves, but they don't really
have the resources. A few disciplines
have a central archive for it but most
of them don't. So it seems kind of
obvious that universities and other
research institutions have to start doing
this for at least their own material.
And platforms like DSpace are
designed to help with this problem."
Smith says that the MIT Library team
are hoping to attract as many research
papers as possible because of the
issues of Open Access. "But the system
isn't limited to dealing with papers,"
she adds. "It's meant to start to get
research institutions positioned to
deal with all the research information
they are creating. For example, we
are in the process of converting about
10,000 theses to put into DSpace.
Some of the things we are learning
are very surprising. For example, a lot
of faculty are very worried about their
teaching material: they are creating
all these great online course materials
and there is nowhere to put them, so
they have to try to do it themselves
or rely on inadequate course-management
systems."
There is also a lot of non-textual
material, which has made it hard to
develop effective search strategies.
Currently, the MIT DSpace system
preserves files in the format in which
they were deposited without automatic
conversion to a standardized format.
But the developers are continuing to
think about strategies for long-term
preservation.
The DSpace Federation
The original goal of the DSpace
developers was to create a system that
could be widely used by universities
and research institutions. The DSpace
system is freely available as open source
software: it is available for
anyone to download and run at any
type of institution, organization, or
company (or even just as an individual).
Users are also allowed to
modify DSpace to meet an organization's
specific needs.
To encourage wide-scale use of
DSpace, MIT Libraries and HP
established the DSpace Federation,
with the goal of bringing together
sets of universities or research
organizations that share similar
concerns and problems. Currently the
DSpace Federation Project is
supported by a grant from the
Andrew W. Mellon Foundation and
aims to test the application of DSpace
in a variety of university settings, to
discuss what sort of multi-institutional
federated services might usefully be
built on the DSpace platform, and to
explore how interoperability among
these organizations' systems may
create a far more valuable resource
than is possible through individual
systems.

MIT's Killian Court and the
Great Dome - a landmark at the Institute
The Federation is currently being
defined by a core group of eight
universities who are evaluating
DSpace in different institutional
contexts, and a further 120 institutions
worldwide who are looking into the
system. "The system has also been
downloaded about 5,000 times," says
Smith. "We provide a fair amount of
technical documentation and other
kinds of help to get people started and
as much additional support as we can
manage with our limited financial
resources." In the spirit of open-source
software, many people have developed
tools to help other users take advantage
of DSpace. Smith says that a reasonable
IT team can set up DSpace within a
week. "But if you want to turn it into a
production system then it takes longer
to develop the necessary policies and
administrative processes."
"You can imagine
a sort of virtual library
of every research publication
in the world"
MacKenzie Smith
One important aspect of institutional
repositories is that they can encourage
an institution-wide commitment to
Open Access. Smith admits that
'super-archives' are not yet accelerating
Open Access. "But the first step
is to provide the platform that
allows it to happen at all. Then the
second step is to go out and promote
it and explain to faculty why there
are problems and encourage them to
think about how they would want to
use a tool like this."
"If faculty get addicted to using systems
like DSpace and start to put in
all their research articles, and if every
research institution in the world has
something like this, then over time
it could become a serious alternative
to commercial publishing as a means
of communication between scholars,"
Smith speculates. "Whether that is
what happens, we will have to wait
and see. But we certainly set the
system up so that could happen.
We are promoting it and trying to get
every research institution in the
world to do this, or something like it,
and they should all be interoperable.
We all support the same standards,
and the systems can all talk to each
other, so you can imagine a sort of
virtual library of every research
publication in the world."
"Some disciplines are very eager to
move ahead and some just don't get
it. But you have to start somewhere,"
concludes Smith. "Researchers in the
life sciences are pretty good in general
about the idea of sharing. The first step
is to provide the tools to enable it and
the next is to get out there and let people
know about it. We are trying to get
the barriers down so low that people
can do self-archiving without even
thinking about it."
www.dspace.org