SHV β-lactamases confer resistance to a broad range of antibiotics by accumulating mutations. The number of SHV variants is steadily increasing. 117 SHV variants have been assigned in the SHV mutation table (http://www.lahey.org/Studies/). In addition, information about SHV β-lactamases can be found in the rapidly growing NCBI protein database. The SHV β-Lactamase Engineering Database (SHVED) has been developed to collect the SHV β-lactamase sequences from the NCBI protein database and the SHV mutation table. It serves as a tool for the detection and reconciliation of inconsistencies, and for the identification of new SHV variants and amino acid substitutions.
The SHVED contains 200 protein entries with distinct sequences and 20 crystal structures. 83 protein sequences are included in both the SHV mutation table and the NCBI protein database, while 35 and 82 protein sequences are found only in the SHV mutation table and only in the NCBI protein database, respectively. Of these 82 sequences, 41 originate from microbial sources, and 22 of them are full-length sequences that harbour a mutation profile which has not yet been classified in the SHV mutation table. 27 protein entries from the NCBI protein database were found to have an inconsistency in SHV name identification. These inconsistencies were reconciled using information from the SHV mutation table and stored in the SHVED.
The SHVED is accessible at http://www.LacED.uni-stuttgart.de/classA/SHVED/. It provides sequences, structures, and a multisequence alignment of SHV β-lactamases with the corrected annotation. Amino acid substitutions at each position are also provided. The SHVED is updated monthly and supplies all data for download.
The SHV β-Lactamase Engineering Database (SHVED) contains information about SHV variants with reconciled annotation. It serves as a tool for detection of inconsistencies in the NCBI protein database, helps to identify new mutations resulting in new SHV variants, and thus supports the investigation of sequence-function relationships of SHV β-lactamases.
Since the introduction of penicillin into clinical practice in the 1940s, the effectiveness of β-lactam antibiotics has been reduced drastically [1-3]. One of the main reasons is the hydrolysis of their β-lactam ring by β-lactamases (EC 3.5.2.6), resulting in a loss of function. These enzymes, especially SHV and TEM β-lactamase variants, gradually accumulate mutations [4,5] that allow them to resist β-lactam antibiotics, and they spread rapidly over the world [6-8].
SHV β-lactamases belong to the class A β-lactamases and have a serine in the active site. The precursor protein consists of 286 amino acids. The first 21 amino acids at the N-terminus form the signal sequence and are removed to yield the mature enzyme. SHV β-lactamases were first described in members of the genus Klebsiella as narrow-spectrum β-lactamases active against penicillin [6,11]. Their genes are located either on the bacterial chromosome or on a plasmid. Genes encoding these enzymes have mutated rapidly and have been transferred to other Gram-negative bacteria in different geographical regions. Currently, 117 SHV variants have been described. A list of assigned SHV variants was compiled and is maintained by Jacoby and Bush; it is referred to in this paper as the "SHV mutation table". Besides the SHV mutation table, sequence information on SHV β-lactamases can also be found in the NCBI protein database. An important data source of the NCBI protein database is the NCBI nucleotide database, which is open for submission of new sequences without further validation; it is therefore growing rapidly but contains inconsistencies. In contrast, the SHV mutation table is manually curated by experts in the β-lactamase field and is therefore widely accepted as a reliable and consistent information source. In the SHV mutation table, each SHV variant is characterized by its name and its mutation profile, which is a set of amino acid substitutions at certain positions in the sequence. Positions are identified according to the Ambler numbering scheme. To be listed in the SHV mutation table as a new SHV β-lactamase, a protein must have arisen naturally, be fully sequenced, and harbor a new mutation profile. Therefore, engineered proteins are not considered.
The SHV Engineering Database (SHVED) was built as a comprehensive inventory by collecting data on SHV β-lactamases from these two databases. Its purpose is to facilitate the detection and eventual reconciliation of inconsistencies in entries derived from the NCBI protein database, to detect new SHV β-lactamases with novel mutation profiles, and to identify new amino acid positions at which mutations can occur.
Construction and content
Development and construction of SHVED
The amino acid sequence of SHV-1 from Klebsiella pneumoniae (GenInfo Identifier (GI): 4337048) was used as the seed sequence for building the SHVED. A BLAST search was performed against the NCBI protein database without filtering of low-complexity regions and with a low E-value threshold (10^-124) to keep TEM β-lactamases and other non-SHV β-lactamases out of the BLAST results. For each hit in the BLAST result, the GI was extracted and the complete XML entry was downloaded from the NCBI protein database. Information on sequence, position-specific annotations, functional descriptions, and source organism was extracted from the entry and parsed by an automated retrieval system into a relational database system developed in-house. For BLAST results representing protein structures, monomers were extracted from the PDB and deposited as structure entries.
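The effect of the strict E-value cut-off can be illustrated with a minimal sketch. It is not the retrieval system used for the SHVED; the `Hit` record, the function name, and the toy hits with their E-values are assumptions made for illustration only. Only the threshold value is taken from the text.

```python
from dataclasses import dataclass

# Threshold quoted in the text; everything else below is illustrative.
E_VALUE_THRESHOLD = 1e-124


@dataclass(frozen=True)
class Hit:
    identifier: str  # would be a GI number in the real workflow
    e_value: float


def keep_shv_candidates(hits, threshold=E_VALUE_THRESHOLD):
    """Keep only hits whose E-value is at or below the threshold."""
    return [hit for hit in hits if hit.e_value <= threshold]


# Toy hits with made-up E-values (not real BLAST output).
toy_hits = [
    Hit("close_homolog", 1e-160),
    Hit("distant_homolog", 1e-90),
    Hit("identical", 0.0),
]
kept = [hit.identifier for hit in keep_shv_candidates(toy_hits)]
```

In this toy run only the two hits with very small E-values survive, which mirrors the intent of excluding more distantly related β-lactamases.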
Sequences generated from the annotated mutation profiles deposited in the SHV mutation table  were also incorporated into the SHVED. Except for 16 assigned SHVs which were "withdrawn" or "not yet released", 117 assigned SHV sequences were generated and parsed into the SHVED using the available information on amino acid exchanges and the reference sequence SHV-1. On the webpage, the "source organism" of these sequences was set to "Clinical sample" and the data source to 'lc' abbreviated from "Lahey Clinic" where the SHV mutation table is hosted.
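Generating a variant sequence from a reference and a list of amino acid exchanges can be sketched as follows. This is a simplified illustration, not the procedure used for the SHVED: the function name, the `position_to_index` mapping, and the toy reference are assumptions, and only plain substitutions of the form "L35Q" are handled. A real mapping would have to translate Ambler position labels into indices of the SHV-1 sequence.

```python
import re

# A plain substitution such as "L35Q": old residue, position label, new residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(reference, profile, position_to_index):
    """Return a copy of `reference` with every substitution in `profile` applied.

    `position_to_index` maps a position label from the mutation table to a
    0-based index in `reference` (a hypothetical helper mapping).
    """
    residues = list(reference)
    for mutation in profile:
        match = SUBSTITUTION.match(mutation)
        if match is None:
            raise ValueError(f"unsupported mutation format: {mutation}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position_to_index[position]
        # Guard against a numbering mismatch between table and reference.
        if residues[index] != old:
            raise ValueError(
                f"{mutation}: reference has {residues[index]} at position {position}"
            )
        residues[index] = new
    return "".join(residues)


# Toy reference (NOT the SHV-1 sequence) with simple 1-based positions.
toy_reference = "MRLQGSE"
toy_map = {i + 1: i for i in range(len(toy_reference))}
variant = apply_substitutions(toy_reference, ["L3Q", "G5S"], toy_map)
```

The check on the original residue is a deliberate design choice: it makes a numbering mismatch between the mutation table and the reference fail loudly instead of silently producing a wrong sequence.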
Identification and naming of SHV β-lactamase sequences
Each protein sequence in the SHVED was aligned with SHV-1 using ClustalW to identify its mutation profile. This mutation profile is the set of amino acid exchanges, deletions, and insertions occurring in a certain SHV, e.g. L35Q for the substitution of leucine at position 35 by glutamine. Subsequently, the mutation profile was matched against the mutation profiles listed in the SHV mutation table to determine whether the respective protein sequence is identical to an already assigned SHV. If the mutation profiles were identical, the protein was named accordingly (e.g. "SHV-3"). Otherwise it was named "SHV-like" and its mutation profile was stored. In the case of sequences longer than SHV-1, only the region corresponding to SHV-1 was examined to identify the mutation profile. Amino acid insertions arising inside the protein sequence were annotated, e.g. "-162.1D -162.2R" for the insertion of the two residues aspartic acid and arginine after the residue at position 162. Amino acid deletions were annotated with the corresponding residue and position, e.g. "G54-" for the deletion of a glycine at position 54.
For sequences longer than SHV-1, the number of additional residues was recorded, e.g. "C+5" for a sequence 5 residues longer at its C-terminus. Sequences shorter than SHV-1 were considered as fragments of the respective SHV sequences or SHV-like sequences, although they may have been named differently in the entry of the source database. The numbers of missing residues at the N- and C-terminus were annotated, e.g. "N-21 C-3" for 21 and 3 residues missing at the N- and C-terminus, respectively.
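The profile notation described above can be derived from a gapped pairwise alignment along the following lines. This is a minimal sketch under stated assumptions, not the SHVED implementation: both alignment rows are assumed to have equal length with "-" as the gap character, `labels` is an assumed list giving the position label of each reference residue, and the sequences and the table entry "TOY-2" are toy data.

```python
def _residue_span(row):
    """Return the first and last alignment column that holds a residue."""
    columns = [i for i, char in enumerate(row) if char != "-"]
    return columns[0], columns[-1]


def mutation_profile(aligned_ref, aligned_query, labels):
    """List substitutions, internal deletions and internal insertions.

    Terminal overhangs of either sequence are ignored, so only the region
    shared by reference and query is examined.
    """
    ref_start, ref_end = _residue_span(aligned_ref)
    qry_start, qry_end = _residue_span(aligned_query)
    start, end = max(ref_start, qry_start), min(ref_end, qry_end)

    profile = []
    ref_index = -1  # index of the most recent reference residue
    inserted = 0    # residues inserted after that reference residue
    for column, (ref_res, qry_res) in enumerate(zip(aligned_ref, aligned_query)):
        if ref_res != "-":
            ref_index += 1
            inserted = 0
        if column < start or column > end:
            continue
        if ref_res == "-":
            if qry_res != "-":
                inserted += 1
                profile.append(f"-{labels[ref_index]}.{inserted}{qry_res}")
        elif qry_res == "-":
            profile.append(f"{ref_res}{labels[ref_index]}-")
        elif qry_res != ref_res:
            profile.append(f"{ref_res}{labels[ref_index]}{qry_res}")
    return profile


def assign_name(profile, known_profiles):
    """Return the table name for an identical profile, else "SHV-like"."""
    return known_profiles.get(frozenset(profile), "SHV-like")


# Toy alignment; the reference is NOT SHV-1 and positions are simply 1..7.
toy_profile = mutation_profile("MLKG--DRT", "MQKGDRD-T", list(range(1, 8)))
toy_table = {frozenset({"L2Q"}): "TOY-2"}
toy_name = assign_name(toy_profile, toy_table)
```

Comparing profiles as sets makes the match independent of the order in which mutations are listed.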
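The terminal annotations can be read directly off a pairwise alignment by counting leading and trailing gap columns. The sketch below assumes a gapped pairwise alignment in which no column is a gap in both rows; the function name and the toy sequences are illustrative and not taken from the SHVED.

```python
def terminal_annotation(aligned_ref, aligned_query):
    """Describe length differences at the termini, e.g. "N-21 C-3" or "C+5"."""

    def leading_gaps(row):
        return len(row) - len(row.lstrip("-"))

    def trailing_gaps(row):
        return len(row) - len(row.rstrip("-"))

    parts = []
    for tag, count_gaps in (("N", leading_gaps), ("C", trailing_gaps)):
        missing = count_gaps(aligned_query)  # query lacks reference residues
        extra = count_gaps(aligned_ref)      # query extends past the reference
        if missing:
            parts.append(f"{tag}-{missing}")
        elif extra:
            parts.append(f"{tag}+{extra}")
    return " ".join(parts)


# Toy rows (not real SHV sequences).
fragment = terminal_annotation("MLKGDRT", "-LKGD--")
extended = terminal_annotation("MLKGDRT--", "--KGDRTAA")
```

A query can be shortened at one terminus and extended at the other, so the two ends are evaluated independently.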
Multisequence alignment and feature annotation
The annotation information was enriched by performing a multisequence alignment using ClustalW. Information on secondary structure calculated using DSSP was also included in the SHVED. Individual residues in the sequences as well as in the alignments were numbered according to the standard scheme suggested by Ambler.
Reconciliation of data inconsistencies
A systematic comparison of entries of the NCBI protein database and the SHV mutation table allows the reconciliation of NCBI protein database entries which have an inconsistent annotation. In the SHVED, a wrong name assignment is corrected if the mutation profile of the sequence is already included in the SHV mutation table. A sequence with a new mutation profile is stored in the SHVED as a new SHV β-lactamase, even if it has been given a (wrong) SHV name by the authors in the NCBI protein database. A link from the reconciled SHVED entry to the original NCBI protein database entry allows the author of the respective entry to correct an erroneous entry.
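The reconciliation rule can be expressed compactly. The sketch below is an illustration under assumptions, not the SHVED code: the function name, the regular expression used to pull an SHV name out of an annotation string, and the two-entry table are hypothetical. The two profiles in the table (SHV-12 and SHV-48) are the ones quoted in this paper; a real table would hold every assigned variant.

```python
import re

# Only the two profiles quoted in this paper; illustrative, not complete.
KNOWN_PROFILES = {
    frozenset({"L35Q", "G238S", "E240K"}): "SHV-12",
    frozenset({"V119I"}): "SHV-48",
}


def reconcile(annotation, profile, known_profiles=KNOWN_PROFILES):
    """Return (name_to_store, annotation_is_consistent).

    The stored name comes from the mutation profile alone. The annotation
    counts as inconsistent when it names an SHV variant that differs from
    the name implied by the profile.
    """
    match = re.search(r"SHV-\d+", annotation)
    annotated = match.group(0) if match else None
    assigned = known_profiles.get(frozenset(profile), "SHV-like")
    consistent = annotated is None or annotated == assigned
    return assigned, consistent


# Annotated as SHV-5, but the profile is that of SHV-12.
renamed = reconcile("beta-lactamase SHV-5", ["L35Q", "G238S", "E240K"])
# Annotated as SHV-48, but the profile is not in the table.
novel = reconcile("SHV-48", ["L35Q", "R191H", "G238S", "E240K"])
```

Deriving the stored name from the profile rather than from the submitted annotation is what allows a wrongly named entry to be corrected without manual inspection.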
Data content of the SHVED
452 protein sequence entries from the NCBI protein database and 117 protein sequences from the SHV mutation table were collected and parsed into the SHVED, resulting in 200 distinct protein entries. 20 crystal structures of 2 SHV β-lactamases (SHV-1 and SHV-2) were stored in the SHVED. 19 crystal structures were from SHV-1 with one or two engineered mutations. Apart from the structure with PDB entry 3D4F, which is a full-length sequence, all crystal structures lack the 21 residues of the N-terminal signal sequence. Two protein sequences (PDB entries 2A3U and 2A49) possess 5 and 4 additional residues, respectively, at their C-terminus (Table 1).
Table 1. PDB code of crystal structure entries in SHVED and their sequence annotations
Of the 200 proteins, 35 SHV sequences were derived from the SHV mutation table but not from the NCBI protein database, 82 protein sequences were found exclusively in the NCBI protein database, and 83 protein sequences were accessible in both source databases. Among the 82 protein sequences found only in the NCBI protein database, 41 originate from microbial sources and harbor a new mutation profile. 22 are full-length sequences (table 2) and 19 are fragments (table 3).
Table 2. New mutation profiles of full length sequences originating from microorganisms
Table 3. Fragments with new mutation profiles
Analysis of amino acid substitutions and substitution positions
In addition to the amino acid substitutions described in the SHV mutation table, 27 new substitution positions in protein entries originating from microbial sources have been identified. 11 new substitution positions were found in full-length sequences (table S1, Additional file 1) and 18 in fragments (table S2, Additional file 1); 2 of these positions (6 and 289) were found both in full-length sequences and in fragments. The new substitution positions are spread over the complete protein sequence, including the signal peptide and the C-terminus. Most of the substitutions found in full-length sequences are located at the protein surface and are distant from the active site, except for T235 and I260 (figure 1). Of the 18 new substitution positions found in fragments, 9 positions are at the C-terminus, 4 positions on the protein surface, 3 positions in the protein core, and 2 in the signal peptide (figure 2). Not only substitutions at new positions, but also new amino acid exchanges at already known positions were found. As an example, the protein sequence with GI 259038268 harbors a lysine at position 252 instead of a proline. In the SHV mutation table, only the substitution P252G is described.
Figure 1. The structure of SHV-1 β-lactamase (PDB entry 1SHV) with new substitution positions found in full-length sequences. Amino acid side chains are shown in stick representation: substitutions occurring at novel positions (green), novel amino acid substitution at known position (red), active site residues (yellow).
Figure 2. The structure of SHV-1 β-lactamase (PDB entry 1SHV) with new substitution positions found in fragments. Amino acid side chains are shown in stick representation: substitutions occurring at novel positions (green), novel amino acid substitution at known position (red), active site residues (yellow).
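The total of 27 new positions follows from the counts given above, because the two positions shared by full-length sequences and fragments must be counted only once. Writing F for the set of new positions found in full-length sequences and G for the set found in fragments (symbols introduced here for illustration only):

```latex
\[
|F \cup G| \;=\; |F| + |G| - |F \cap G| \;=\; 11 + 18 - 2 \;=\; 27
\]
```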
There are 27 distinct protein entries derived from the NCBI protein database that have inconsistent annotations (table 4). In all cases, the annotated SHV name is inconsistent with the mutation profile. For example, the protein sequence with GI 40950644 has three mutations (L35Q, G238S, and E240K) and should therefore be named "SHV-12" according to the SHV mutation table, but it is annotated as "beta-lactamase SHV-5" in the NCBI protein database. In 12 cases, the protein sequence is a fragment and therefore there is not enough information to rename it in the SHVED.
Table 4. Inconsistencies between information from NCBI protein database and SHV mutation table
Data content of the SHVED
By systematic analysis of protein sequences in the SHVED, 41 protein sequences with a new mutation profile were identified. 22 of them are full-length sequences originating from microbial sources and are therefore candidates for a new SHV number assignment. The new mutations occurred either at new positions in the sequence or as new amino acid exchanges at already described positions.
Detection of novel SHV β-lactamases and novel amino acid substitutions
Except for one new mutation profile originating from a synthetic construct (GI 151861), all new mutation profiles originated from microbial sources. Being plasmid-borne, the blaSHV genes encoding SHV β-lactamases are easily transferred among Gram-negative bacteria, especially Enterobacteriaceae, because of their close genetic relationship. Thus, most of the newly detected SHV β-lactamases are from Enterobacteriaceae such as Klebsiella pneumoniae (14 SHVs), Escherichia coli (15 SHVs), Enterobacter cloacae (1 SHV), from both K. pneumoniae and E. coli (1 SHV), and from both K. pneumoniae and E. cloacae (1 SHV). Additionally, 3 new SHV variants were found in Acinetobacter baumannii and 1 new SHV variant was found in Salmonella enterica. Although 19 fragments harbor a new mutation profile, they cannot be assigned a new SHV number because of missing sequence information. However, the information about substitutions at new positions found in these fragments could be used in the future to predict the occurrence of new SHV variants.
Data inconsistencies and reconciliation
In all 27 cases of inconsistency, the annotated name differed from the actual mutation profile. However, the reasons for the inconsistency varied. In the case of the protein sequence with GI 154269503, the lysine at position 256 is substituted by an arginine, while it is reported that the lysine is exchanged by an arginine at position 250 (K250R). In the SHV mutation table, it is listed as SHV-103 and characterized by the substitution of a leucine at position 250 by an arginine (L250R). A mutation at position 256 is not yet recorded in the SHV mutation table, and a mutation at position 250 occurs only in SHV-103. The inconsistency was probably caused by a difference in amino acid numbering between the author of GI 154269503 and the curators of the SHV mutation table at Lahey Clinic. In the case of the protein sequence with GI 161367444, the inconsistency might derive from the primer used. In the sequence, only one mutation, R202S, was found, while it is annotated as SHV-104, which has two mutations (M5L and R202S) according to the SHV mutation table. It is noted in the NCBI entry that the forward primer "ATGCGTTATATTCGCCTGTGTATT" was used to amplify the target DNA, which results in a methionine at position 5. Therefore, the deduced amino acid substitution M5L (if it actually occurred) could not be present in the deposited amino acid sequence, and the deposited amino acid sequence should not be annotated as SHV-104 because it does not harbor the mutation profile 'M5L R202S'. In the case of the protein sequence with GI 15718691, the duplication of a pentapeptide 163DRWET167 was reported and assigned as SHV-16. In addition, however, two mutations, H96T and Y97H, are present in the amino acid sequence. Therefore, it is not clear whether the actual SHV-16 harbors only the pentapeptide duplication or additionally the mutations H96T and Y97H.
In other cases of inconsistency, the amino acid sequences were submitted to the NCBI protein database without corresponding publication and showed inconsistencies in their annotation. One example is the protein sequence with GI 30230495. It is annotated as SHV-48 which should harbor mutation V119I according to the SHV mutation table, while actually four mutations (L35Q, R191H, G238S, and E240K) were found in the deposited amino acid sequence. In the SHV mutation table, an inconsistency in residue numbering (position 253 and 255) was revealed and communicated to the curator for correction.
The SHV Lactamase Engineering Database (SHVED) was established to identify new SHV β-lactamases and to identify inconsistencies in public databases. Based on our analysis, 22 candidates for assignment of new SHV names were identified. 27 protein entries with inconsistencies were found and reconciled. Also, three assigned mutation profiles were identified to be in doubt: SHV-16, SHV-103, and SHV-104. The SHVED thus supports the scientific community in naming new SHV β-lactamases and in reconciling existing annotations of SHV β-lactamase sequences.
Availability and requirements
GI: GenInfo Identifier; SHVED: SHV Engineering Database
QKT developed the database, built the web pages, analyzed the data, and drafted the manuscript. JP supervised the study and finalized the manuscript. All authors read and approved the final manuscript.
We acknowledge Florian Wagner for valuable discussions and his support in building up the database. This work was supported by the Federal Ministry of Education and Research of Germany (VNB 04/B12).
Clin Microbiol Rev 2001, 14(4):933-951.
Paterson DL, Hujer KM, Hujer AM, Yeiser B, Bonomo MD, Rice LB, Bonomo RA: Extended-spectrum beta-lactamases in Klebsiella pneumoniae bloodstream isolates from seven countries: dominance and widespread prevalence of SHV- and CTX-M-type beta-lactamases.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Abbassi MS, Torres C, Achour W, Vinue L, Saenz Y, Costa D, Bouchami O, Ben Hassen A: Genetic characterisation of CTX-M-15-producing Klebsiella pneumoniae and Escherichia coli strains isolated from stem cell transplant patients in Tunisia.