Annotation pipeline for the assignment of enzymatic functions to K. lactis genes. Each EEGC locus tag was first queried in UniProt; if a record was present, the assignment was accepted and the gene was annotated. If not, an S. cerevisiae gene was sought among the BLAST hits kept by merlin for that gene (STEP B). If a baker’s yeast homologue was available (STEP B1), its identifier (YXX####x) was searched in both the UniProt and SGD databases. When the records from both databases were identical, the gene was annotated; otherwise, the records were examined and the SGD entry was favoured. For EEGCs without any S. cerevisiae homologue (STEP B2), a new, specific similarity search was performed in NCBI BLAST, restricting the results to Swiss-Prot reviewed records and the organism to S. cerevisiae, with the acceptable e-value threshold lowered to e < 1E-10. If an entry met those conditions, the gene was annotated; otherwise, the BLAST similarity search was repeated without the organism restriction. Again, if an entry met the remaining conditions, the gene was annotated as a homologue of the first hit; otherwise, it was discarded. The information gathered so far was then checked against BRENDA to verify the function about to be assigned to the gene (STEP C). Finally, the information collected in the previous steps was assigned to the EEGC (STEP D), either establishing the EEGC as a metabolic gene or discarding it as one.
Dias et al. BMC Genomics 2012 13:517 doi:10.1186/1471-2164-13-517
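The decision flow described in the caption can be sketched in Python as follows. This is only an illustrative reading of the caption, not the authors' implementation: the `Hit` record, the function names, and the five lookup callables (`uniprot`, `sgd`, `merlin_hits`, `blast`, `brenda_ok`) are hypothetical stand-ins for the manual database queries. The sketch also makes three assumptions that the caption does not state: every candidate function passes through the BRENDA check, a failed BRENDA check discards the gene, and the UniProt record is used when SGD has no entry.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

E_VALUE_CUTOFF = 1e-10  # threshold given in the caption (e < 1E-10)
YEAST = "Saccharomyces cerevisiae"


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit."""
    identifier: str
    organism: str
    e_value: float
    reviewed: bool  # True for Swiss-Prot reviewed records
    function: str


def first_compliant(hits: Iterable[Hit]) -> Optional[Hit]:
    """Return the first reviewed hit below the e-value cutoff, if any."""
    for hit in hits:
        if hit.reviewed and hit.e_value < E_VALUE_CUTOFF:
            return hit
    return None


def annotate_eegc(
    locus_tag: str,
    uniprot: Callable[[str], Optional[str]],
    sgd: Callable[[str], Optional[str]],
    merlin_hits: Callable[[str], Iterable[Hit]],
    blast: Callable[[str, Optional[str]], Iterable[Hit]],
    brenda_ok: Callable[[str], bool],
) -> Optional[str]:
    """Return the function assigned to an EEGC, or None if it is discarded."""
    # First query: is the locus tag itself annotated in UniProt?
    function = uniprot(locus_tag)

    if function is None:
        # STEP B: look for an S. cerevisiae gene among the hits kept by merlin.
        yeast_hit = next(
            (h for h in merlin_hits(locus_tag) if h.organism == YEAST), None
        )
        if yeast_hit is not None:
            # STEP B1: compare UniProt and SGD; favour SGD when they differ.
            uni_rec = uniprot(yeast_hit.identifier)
            sgd_rec = sgd(yeast_hit.identifier)
            # Assumption: use the UniProt record if SGD has none.
            function = uni_rec if uni_rec == sgd_rec else (sgd_rec or uni_rec)
        else:
            # STEP B2: search restricted to S. cerevisiae, then unrestricted.
            hit = first_compliant(blast(locus_tag, YEAST))
            if hit is None:
                hit = first_compliant(blast(locus_tag, None))
            function = hit.function if hit is not None else None

    # STEP C: verify the candidate function (assumed to apply to every path,
    # and assumed to discard the gene on failure).
    if function is None or not brenda_ok(function):
        return None

    # STEP D: the verified function is assigned to the EEGC.
    return function
```

Passing the lookups in as callables keeps the sketch independent of any particular database client; in practice each one would wrap whatever query or curated table is actually used.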