Abstract
Background
Discovering the precise binding specificity of transcription factors is an important step toward understanding the complex mechanisms of gene regulation in eukaryotes. Recently, double-stranded protein-binding microarrays were developed as a potentially scalable approach to transcription factor binding site identification.
Results
Here we present an algorithmic approach to the experimental design of a microarray that allows testing the full specificity of a transcription factor binding to all possible DNA binding sites of a given length, with optimally efficient use of the array. This design is universal: it works for any factor that binds a sequence motif and is not species-specific. Furthermore, simulation results show that data produced with the designed arrays are easier to analyze and would result in more precise identification of binding sites.
Conclusion
In this study, we present a design of a double-stranded DNA microarray for protein-DNA interaction studies and show that our algorithm allows optimally efficient use of the arrays for this purpose. We believe such a design will prove useful for transcription factor binding site identification and other biological problems.
Background
With the human and many other genome sequences complete or nearing completion, we are approaching the goal of identifying all the protein-coding genes. However, to understand the function of these genes in different physiological contexts, it is important to understand how their expression is regulated. Mechanisms of gene regulation are varied and complex, and unraveling them will require a combination of approaches[1,2]. Having a catalog of all the transcription factors and being able to characterize their binding specificity at cis-regulatory sites would provide a fruitful starting point.
Recent advances in chromatin immunoprecipitation (ChIP) methods have led to large-scale efforts to determine all protein-DNA binding events in yeast[3,4], but scaling up such methods for mammalian genomes may prove difficult. Protein-binding microarrays (PBMs), initially developed on a small scale by Bulyk et al.[5,6], showed promise in identifying transcription factor binding specificity with high accuracy and were recently scaled up for the yeast genome by using PBMs with all known yeast intergenic regions[7]. Although this is an exciting advance in the field, the current design of PBMs still leaves room for uncertainty because some of the intergenic regions may be too long to pinpoint the binding sites with high accuracy. Scaling this method up to mammalian genomes would also require designs spanning multiple arrays, with a new design for each genome. Both ChIP and PBM methods are well suited for low-resolution identification of genes affected by a given transcription factor. However, in order to fully understand regulation, researchers will always be interested in pinpointing the specific regions to which the factor binds. Identifying this region from ChIP-chip or PBM data requires sophisticated computational analysis, much like that used in ab initio cis-regulatory region discovery. The reliability of such analyses is sometimes questionable, in part because of the repetitive and degenerate nature of the intergenic sequences. Harbison et al. note that some intergenic sequences are highly homologous, thus skewing the results of motif discovery algorithms[4]. If there were a way to test the binding of a given factor to all possible motifs of a given length, it would then be trivial to scan the intergenic sequences for potential sequences corresponding to a well-defined motif. We therefore propose a new PBM design that would allow the testing of all possible binding sequences of a given length in an optimally efficient, non-degenerate manner.
In recent years, a number of technological innovations have allowed programmable synthesis of microarrays as well as new techniques to make the arrays double-stranded[8,9]. In particular, Warren et al. successfully constructed and tested a combinatorial dsDNA array with all possible 8-mer sequences, with one sequence per spot[9]. Since the proof of principle for this technology has now been shown, here we focus on optimizing experimental design. Using variations on established graph theory algorithms, we propose a new PBM design that would allow in vitro testing of transcription factor binding to all possible DNA targets up to length 12. This approach removes some of the redundancy in testing long intergenic regions. In addition, our design is organism-independent.
Results
Algorithm
The design described by Bulyk et al. in proof-of-concept papers[5,6] allows for testing N binding sites by screening N spots on the array. This approach is straightforward but not very practical for most transcription factors because the number of possible binding sequences is 4^{k}, where k is the length of the binding site.
The more recent design involved spotting all annotated yeast intergenic regions on the array[7]. This comprehensive approach is more scalable, although mammalian genomes contain long "desert" regions[10], which would most likely have to be broken up into shorter segments for spotting on microarrays. In order to identify the transcription factor binding sites within the spotted regions, in this as well as in many other approaches, the authors rely on a variant of the Gibbs sampling algorithm. Some of the longer intergenic regions tested may present a problem in identifying binding patterns for low-specificity transcription factors. The uniform probe length and optimal non-redundancy of the array proposed here would make it easier to analyze experimental results and estimate their statistical significance.
We propose the design of a dsDNA array that allows screening for length-k TF binding sites with maximum efficiency by allowing the k-mers to overlap. For instance, the 8-mer probe ACTGTGCA represents two potential 7-mer TF binding sites: ACTGTGC and CTGTGCA. It turns out that we can easily design an array with probes of a certain length b that contain all possible k-mers, such that the required number of probes is minimal. If we can find the shortest string that contains all possible k-mer substrings, we can then "cut up" this string into individual probes of the desired length. The problem of constructing such a minimum-length string can be represented in a graph-theoretical formulation (see Methods for details).
Imagine a directed graph with nodes represented by all possible k-mers, where edges exist between nodes that overlap by (k - 1) bases. Finding the shortest path for a graph of all possible k-mers results in a superstring of length (4^{k} + k). Given a desired probe length b > k, we can design an array with N probes that enables us to test the binding specificity of any transcription factor that can bind to a k-mer. The number of probes would have to be approximately
N = 4^{k}/(b - k + 1)
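The probe-count estimate above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `probes_needed` is hypothetical, the result is rounded up to a whole probe, and no reverse-complement saving, spacers or primers are taken into account.

```python
from math import ceil


def probes_needed(k: int, b: int) -> int:
    """Approximate probe count N = 4^k / (b - k + 1), rounded up.

    Hypothetical helper: ignores reverse-complement savings, spacers and primers.
    """
    if b < k:
        raise ValueError("probe length b must be at least the site length k")
    kmers_per_probe = b - k + 1  # a probe of length b holds b - k + 1 overlapping k-mers
    return ceil(4 ** k / kmers_per_probe)


print(probes_needed(7, 8))    # 8192: each 8-mer probe carries two 7-mers
print(probes_needed(10, 25))  # 65536, before any reverse-complement saving
```

Halving the second figure for double-stranded probes gives a number of the same order as the probe count quoted for k = 10 and b = 25 in the Experimental design section.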
The length of a string produced by naively joining all possible k-mers is k * 4^{k}. This means we are able to reduce the number of probes by a factor of approximately k. Furthermore, we can turn the reverse complementarity of double-stranded DNA sequence to our advantage and gain another factor of 2 reduction in the number of array probes[9,11]. For instance, having included the 7-mer ACTGTGC in the superstring and assuming that the array probe will be double-stranded, we are already accounting for the reverse complement 7-mer GCACAGT. This introduces some complications in the algorithm, which we discuss in Methods.
Figure 1 shows the graph and the resulting "probes" for the simplest case, where k = 2. Here, we save approximately a factor of 4 in the length of DNA to be tested, but for all possible 10-mers, we would save a factor of ~20.
Figure 1. Probe design from the shortest path on a graph. The de Bruijn graph for all possible DNA base doublets and one possible solution for a shortest path represented as a pseudo-Eulerian cycle (bold edges). The reverse complement solution is represented by dashed edges in the graph and also the inner cycle sequence. "Cutting" the circular sequence while retaining one overlapping base results in two sequences of total length 12 (containing all doublets) as compared to the length of all non-overlapping concatenated doublets, 2 * 4^{2} = 32. Cutting the circular sequence at different points allows screening multiple replicates and helps identify biases in sequence recognition preferences. Reverse complement strands for the replicates are not shown.
We would also need to take into account some additional considerations, such as allowing for spacers on either side of the designed sequence to ensure reliable binding, as well as a primer if the double-stranded DNA is constructed enzymatically. We believe such an approach removes some of the ambiguity from the decoding process needed in current approaches that rely on spotting long intergenic regions[7].
Experimental design
Using our combinatorial design, testing of all possible 10-mers with an array of probes of length 25 (not including any spacers or primers) requires only 32928 probes. To avoid potential problems with factors binding to multiple sites on a given probe, and to aid in the identification of precise binding sites, the experiment may be performed in duplicate, with the cut points on the cyclical superstring shifted by k/2 (Figure 1). Table 1 shows the calculations for the number of probes needed on the array for a range of motif lengths k and array probe lengths b.
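As a rough illustration of the cutting step and of the shifted replicate, the sketch below cuts a cyclic superstring into probes of length b whose neighbours overlap by k - 1 bases. The function name `cut_cyclic` and the `offset` parameter are hypothetical, and the 16-base example cycle is simply one cyclic string containing every doublet once; it is not the reverse-complement-aware cycle of Figure 1.

```python
def cut_cyclic(cycle: str, k: int, b: int, offset: int = 0) -> list[str]:
    """Cut a cyclic string into probes of length b that overlap by k - 1 bases.

    Hypothetical helper: `offset` moves the cut points, as in a shifted replicate.
    """
    n = len(cycle)
    step = b - k + 1  # number of new k-mers contributed by each probe
    probes = []
    for start in range(offset, offset + n, step):
        # modular indexing wraps the last probe(s) around the circle
        probes.append("".join(cycle[(start + i) % n] for i in range(b)))
    return probes


cycle = "AACAGATCCGCTGGTT"  # cyclic string containing each of the 16 doublets once
k, b = 2, 5
array = cut_cyclic(cycle, k, b)                     # primary array
replicate = cut_cyclic(cycle, k, b, offset=k // 2)  # cut points shifted by k/2
print(array)
print(replicate)
```

Both probe sets contain the same k-mers, but each k-mer sits at a different position within its probe in the replicate.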
Table 1. Sample calculations for the number of probes/array
Identifying the actual binding sequences given intergenic array spot data is a non-trivial problem, which Mukherjee et al. addressed with Gibbs sampling algorithms[7,12]. This problem arises from a combination of two factors: 1) many intergenic sequences are quite long (mean length 486 bp for yeast), increasing the probability of finding multiple binding sites; 2) intergenic sequences are inherently redundant. Our combinatorial design addresses both of these issues by proposing reasonably short and optimally non-redundant sequence features.
In order to illustrate the advantage of our design in more precisely identifying the exact binding sequences, we carried out simulation experiments with the yeast Rap1 transcription factor, the yeast TATA-box binding protein (TBP), and 100 random binding sites of length 10. Since some transcription factors are known to tolerate substantial variation of the binding site sequence, we generated all possible double mutants for every starting consensus binding site sequence and assumed that all those sequences would be recognized on the array. For our designed array, we chose a design from Table 1 with k = 10 and b = 25. Because a probe of length 25 is statistically much less likely to contain multiple binding sites for a given factor than a probe of length 486, we also included a combinatorial design with b = 486. Note that synthesis of a dsDNA array with a feature length of 486 would be very difficult if not impossible; it is used here only to illustrate the properties of the combinatorial design. The results of these simulations are presented in Figures 2, 3 and 4. The simulation data show that for Rap1 and for random 10-mers, about 20–30% of intergenic PBM probes producing signal on the array in fact contain more than one binding site. This figure is greater than 70% for the more degenerate TATA-box sequence. In all cases, the designed array, even with an average probe length of 486, results in significantly fewer multiple-site probes, showing that non-redundancy comes from our combinatorial design and not just from the reduced probe length. Furthermore, results for the designed array with 25-mer probes are good enough that, in analyzing the array, one can assume a single binding event per probe. Rap1 and the averaged data for 100 random sequences show ~1–2% multiple binding sites per probe. The TBP simulation results in ~6.5% putative multiple binding events.
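The binding model used in these simulations (consensus plus all sequences one or two mutations away) can be sketched as follows. This is a minimal illustration, not the code used for the figures: the function names are hypothetical, and for simplicity only the forward strand of a probe is scanned.

```python
from itertools import combinations, product


def within_two_mutations(consensus: str, alphabet: str = "ACGT") -> set[str]:
    """Consensus plus every sequence differing from it at one or two positions."""
    variants = {consensus}
    for n_mut in (1, 2):
        for positions in combinations(range(len(consensus)), n_mut):
            for bases in product(alphabet, repeat=n_mut):
                seq = list(consensus)
                for pos, base in zip(positions, bases):
                    seq[pos] = base
                variants.add("".join(seq))  # the set removes duplicates
    return variants


def count_sites(probe: str, sites: set[str], k: int) -> int:
    """Number of positions in the probe where a k-mer from `sites` starts."""
    return sum(probe[i:i + k] in sites for i in range(len(probe) - k + 1))


rap1_sites = within_two_mutations("CACCCATACA")  # Rap1 consensus from the text
print(len(rap1_sites))  # 1 + 10*3 + 45*9 = 436 distinct sequences
print(count_sites("GGCACCCATACAGG", rap1_sites, 10))
```

Tallying `count_sites` over every probe that scores at least one hit gives the kind of distribution shown in Figures 2, 3 and 4.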
Figure 2. Distribution of putative PBM probe hits for Rap1. Frequency of array probe hits distributed by number of potential binding sites per probe. All sequences one or two mutations away from the consensus sequence are assumed to bind.
Figure 3. Distribution of putative PBM probe hits for TBP. Frequency of array probe hits distributed by number of potential binding sites per probe. All sequences one or two mutations away from the consensus sequence are assumed to bind.
Figure 4. Distribution of putative PBM probe hits for 100 random transcription factor binding sites of length 10. Frequency of array probe hits distributed by number of potential binding sites per probe. The data are averaged over 100 random 10-mer binding sites. For each 10-mer, all sequences one or two mutations away from the consensus sequence are assumed to bind.
Signal-to-noise ratio
As mentioned above, the problem of finding precise binding sites in the long intergenic sequences used in ChIP and PBM experiments is traditionally addressed by Gibbs sampling and related algorithms. The reasons why Gibbs sampling algorithms do not always perform well fundamentally come down to the ratio of signal to noise in the dataset in question. This ratio can be estimated as the number of base pairs involved in binding divided by the total number of base pairs in the array probe. Since the number of binding site bases in the combinatorial design remains approximately the same, and the total probe length decreases from a mean of 486 bp to 25 bp, we can estimate that our design increases the signal-to-noise ratio by at least an order of magnitude (for a 10 bp site, from about 10/486 to 10/25). Indeed, finding a 10-mer binding site in a set of 25-mers is almost a trivial Gibbs sampling problem. In order to test the robustness of our designed array to experimental noise, we constructed a 10 bp wide PWM (Position Weight Matrix) of the Rap1 transcription factor from TRANSFAC[13] data, containing 14 distinct aligned sequences. Assuming, for testing purposes, that these sequences represent the entire set of Rap1 targets, we found all the combinatorial array probes and those of one replicate (see Figure 1 and legend) that included those sequences. We then proceeded to remove a fraction of these sequences from the probe set and substitute for them random probes not containing the binding site. Upon each iteration, we used BioProspector[14], a popular implementation of the Gibbs sampling algorithm, to scan the sequences 100 times and find an overrepresented motif. We then used CompareACE[15] to calculate the correlation coefficient between the obtained motif and the original PWM that we started with. The results are presented in Figure 5. The motif extracted with the Gibbs sampler remains essentially identical to the original, withstanding up to 50% substituted noise.
Figure 5. Robustness of designed array and Gibbs sampler to addition of noise. Starting with a set of 10-mer Rap1 TRANSFAC binding sites, the effect of added noise is measured as the correlation of the original PWM with that derived from 100 Gibbs sampler runs. Each level of noise is represented by a standard box-and-whisker plot. In the 0–50% noise range, the boxes are so small that they are essentially represented by a single line.
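The signal-to-noise estimate and the noise-substitution step described above can be sketched as follows. The function names are hypothetical, the probe strings in the usage example are placeholders rather than real array probes, and the motif search itself (BioProspector) and the motif comparison (CompareACE) are not reproduced here.

```python
import random


def signal_to_noise(site_len: int, probe_len: int) -> float:
    """Fraction of the probe's base pairs that take part in binding."""
    return site_len / probe_len


def substitute_noise(bound: list[str], decoys: list[str],
                     fraction: float, seed: int = 0) -> list[str]:
    """Replace `fraction` of the bound probes with randomly chosen decoy probes.

    Assumes `decoys` holds probes without the binding site and is large enough.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_swap = round(fraction * len(bound))
    kept = rng.sample(bound, len(bound) - n_swap)
    return kept + rng.sample(decoys, n_swap)


gain = signal_to_noise(10, 25) / signal_to_noise(10, 486)
print(round(gain, 1))  # 19.4, i.e. more than an order of magnitude

bound = ["probe_with_site_%d" % i for i in range(4)]  # placeholder names
decoys = ["random_probe_%d" % i for i in range(10)]   # placeholder names
print(substitute_noise(bound, decoys, 0.5))
```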
Flanking sequences
The early versions of PBMs were made double-stranded by enzymatic primer extension[5,8], which would mean that the combinatorial portion of the probe intended to assay for protein binding would be adjacent (either 3' or 5') to a constant primer sequence. Of course, any such primer sequence could also contain a portion of a binding site or even an entire binding site, making it difficult to analyze the data. The more recent approach involved only a short 3-base flanking sequence on either side of the combinatorial portion of the probe, thus eliminating the problem[9]. Nevertheless, the enzymatic primer extension approach remains a valid option and has the advantage of higher fidelity compared with oligo synthesis. It is therefore important to address the potential challenge of analyzing data from an experiment where the flanking sequence is bound on some probes and deciphering the true binding site in such an experiment.
We propose that this challenge be addressed by making a replicate array (Figure 1). The simplest approach would be to make a replicate array with different primers/flanking sequences. If the number of bound probes differs significantly between the two replicates, it would suggest that the flanking sequence is involved in one of them. Analysis of the array with the smaller number of bound probes should reveal the true binding site and help extract additional information from the other replicate.
Even with a constant flanking sequence, we could solve the problem by making one or more non-identical array replicates obtained by "shifting" the probe cut sites on the superstring sequence as illustrated in Figure 1. The advantage of such a replicate design is that, while the set of k-mers on the array remains the same, the position of each k-mer with respect to the chip surface is different. Table 2 contains simulated examples for the case when half of the Rap1 consensus binding site (CACCCATACA) is contained in the flanking primer sequence of the probe, thus allowing for a large number of possibilities matching in the combinatorial part of the probe. We can filter the matching probes, retaining only those replicate probe pairs that contain at least one combinatorial k-mer in common with each other. If the flanking sequence contained a portion of the binding site, the number of probes should decrease substantially after filtering; otherwise most of the probes will be retained (Table 2). For cases when a portion of the flanking sequence is involved in binding, the filtering procedure will also retain some randomly paired probes, but because the signal-to-noise ratio is high, the true binding site can still be easily detected by Gibbs sampling.
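The replicate filter described above can be sketched as a shared-k-mer lookup between the bound probes of the two arrays. The function names are hypothetical, and the sketch assumes that each probe string holds only the combinatorial part, with any primer or flanking sequence already removed.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_pairs(hits_a: list[str], hits_b: list[str], k: int) -> set[tuple[str, str]]:
    """Pairs (a, b) of bound probes from two replicates that share at least one k-mer."""
    index: dict[str, set[str]] = {}
    for probe_b in hits_b:  # index replicate B by k-mer for fast lookup
        for kmer in kmers(probe_b, k):
            index.setdefault(kmer, set()).add(probe_b)
    pairs = set()
    for probe_a in hits_a:
        for kmer in kmers(probe_a, k):
            for probe_b in index.get(kmer, ()):
                pairs.add((probe_a, probe_b))
    return pairs


# Toy example with k = 4: only the first pair shares a 4-mer (CCGG).
print(shared_kmer_pairs(["AACCGG", "TTTTTT"], ["CCGGAA", "GAGAGA"], 4))
```

Comparing the number of bound probes before and after this filter gives the kind of contrast summarized in Table 2.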
Table 2. Using array replicates to discover the Rap1 binding site when the flanking sequence is involved in binding.
Discussion
While the technological aspects of array construction have been the subject of much recent work, less attention has been paid to the experimental design of the oligonucleotides on these arrays. Here we have laid out an algorithmic solution to the design of a DNA microarray that would allow the characterization of the binding specificity of any transcription factor, independent of the species under study. The solution discussed here focuses on the algorithmic part of the problem and does not address some of the concerns involved in the production of such an array. However, we believe that, given the recent advances in microarray technology, the arrays described here are well within the reach of the current state of the art. Custom arrays can be obtained from sources such as Agilent and Nimblegen[16], and new technologies for programmable array synthesis are still being developed[17]. Synthesis of the complementary strand on the arrays can be achieved enzymatically with a surface-proximal primer[5] or with other, more recently developed methods[8,9].
Analysis of intergenic PBM data has been complicated by the fact that the sequences are long, redundant, and often contain multiple binding sites, especially for factors that do not bind with high specificity. Our design addresses this problem and in simulations produces data that are much easier to analyze owing to a higher signal-to-noise ratio. Given our simulation data, it seems reasonable to assume a single binding site per probe, which makes it much easier for Gibbs sampling algorithms to converge on the correct solution.
The combinatorial array design that includes all possible k-mers also has the advantage that, as genome annotation continues to improve, including the validation of intron/exon boundaries and the discovery of novel genes, the data obtained from such an array remain valid and relevant.
Despite the probe number savings offered by the design presented here, the exponential growth of the number of probes as a function of k will limit the length of combinatorial binding sites. However, even with k up to 12, the design can be applied to many important unresolved problems. Applications of the ideas presented here extend beyond transcription factor interactions. For instance, they may also prove useful for characterizing restriction enzyme specificity and DNA methylation patterns, and in other systematic studies. The array could be used not only to study the binding patterns of natural DNA-binding proteins, but also to analyze mutants and thus help us gain a more detailed understanding of the nature of specificity/promiscuity of these interactions, as well as to design new ones.
Conclusion
In this study, we present the design of a microarray containing all combinations of a DNA motif for testing transcription factor binding and other protein-DNA interaction applications. The advantage of this approach is that it is exhaustive and the exact same design could be used for any genome. Furthermore, uniform probe lengths and optimal non-redundancy allow for a more straightforward statistical analysis of the results. Combined with recent advances in PBM technology development[9], our design will enable more precise identification of true binding sites.
Methods
The problem of constructing a minimum-length string can be represented in a graph-theoretical formulation. Imagine a directed graph with nodes represented by all possible k-mers, where the edges
<u,v> exist iff u = s_{1}s_{2} ... s_{n-1} and v = s_{2} ... s_{n-1}s_{n}
Then, walking the shortest path through this graph results in the construction of the shortest cyclical sequence that contains each subsequence only once. This turns out to be a well-known problem in computer science, known as the Chinese Postman problem. The shortest path visiting each edge only once is known as the Eulerian cycle. Moreover, the problem is specifically known in terms of constructing the minimal string sequence, known as the de Bruijn sequence. The graph consisting of all possible subsequences of a certain length from an alphabet of a given size is known as the de Bruijn graph. An Eulerian cycle is easily found with Fleury's algorithm[18]; Hierholzer's algorithm finds one in time linear in the number of edges.
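As an illustration of the Eulerian-cycle construction, the sketch below builds a cyclic de Bruijn sequence over the DNA alphabet. It is not the implementation used in this study: it uses Hierholzer's algorithm rather than Fleury's, ignores reverse complementarity, and treats (k - 1)-mers as nodes so that each edge spells one k-mer. The function name `de_bruijn_cycle` is hypothetical.

```python
from itertools import product


def de_bruijn_cycle(k: int, alphabet: str = "ACGT") -> str:
    """Cyclic string of length len(alphabet)**k containing every k-mer exactly once."""
    if k == 1:
        return alphabet
    # Nodes are (k-1)-mers; each outgoing edge appends one base and spells a k-mer.
    unused = {"".join(p): list(alphabet) for p in product(alphabet, repeat=k - 1)}
    start = alphabet[0] * (k - 1)
    stack, circuit = [start], []
    while stack:  # iterative Hierholzer traversal
        node = stack[-1]
        if unused[node]:
            base = unused[node].pop()
            stack.append(node[1:] + base)  # follow an unused edge
        else:
            circuit.append(stack.pop())    # dead end: node joins the circuit
    circuit.reverse()  # circuit now starts and ends at `start`
    return "".join(node[-1] for node in circuit[1:])


cycle = de_bruijn_cycle(2)
print(cycle, len(cycle))  # a 16-base cycle covering all 16 doublets
```

Reading the returned string as a circle, every window of length k is a distinct k-mer; cutting it open and appending the first k - 1 bases gives a linear superstring.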
The algorithm has to be modified to take advantage of the fact that, for a double-stranded DNA probe, every k-mer in the probe is accompanied by its reverse complement; the reverse complement sequence therefore should not also be included in the superstring. Every de Bruijn graph thus contains within it two "reverse complementary" subgraphs. There is an additional complication arising from the fact that graphs with even k and odd k have different properties. Constructing the minimal superstring for odd-k graphs amounts to finding two "pseudo-Eulerian" cycles, which are reverse complementary to each other. This can be achieved simultaneously in the context of Fleury's algorithm. Even-k graphs are further complicated by the fact that some nodes are their own reverse complements (e.g. ACGT) and are therefore shared between the two reverse complementary subgraphs. Because of this peculiarity, the number of nodes in a "pseudo-Eulerian" cycle containing each k-mer or its reverse complement only once is equal to 4^{k}/2 for odd-k graphs and slightly more than 4^{k}/2 for even-k graphs. As shown in Figure 1, this comes from the fact that k-mers that are their own reverse complements have to be counted twice, once for each of the reverse-complementary subgraphs. The figure shows two possible "pseudo-Eulerian" reverse-complementary cycles for k = 2, with the four self-complementary nodes highlighted.
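The odd/even difference can be made concrete by counting k-mers when a k-mer and its reverse complement are treated as one. The sketch below is only an illustration with hypothetical function names; the closed form relies on the fact that a self-complementary k-mer exists only for even k and is fixed by its first k/2 bases.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence over the alphabet ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmer_count(k: int) -> int:
    """Number of k-mers when a k-mer and its reverse complement count as one."""
    total = 4 ** k
    # Self-complementary k-mers exist only for even k: 4^(k/2) of them.
    self_complementary = 4 ** (k // 2) if k % 2 == 0 else 0
    return (total + self_complementary) // 2


print(reverse_complement("ACTGTGC"))  # GCACAGT, as in the 7-mer example in the text
print(canonical_kmer_count(2))        # 10: six complementary pairs plus AT, TA, CG, GC
print(canonical_kmer_count(3))        # 32: exactly 4^3 / 2
```

For k = 2 this matches Figure 1: a cycle of 10 doublets, cut into two pieces with one overlapping base each, gives sequences of total length 12.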
In the simulation testing how robust the array probes are to noise, BioProspector was run 100 times at each noise level to search for a motif, using the probe sequences from the entire designed array as background.
In the primer/flanking sequence simulations, we used ACTGACGTACTGGTTT as a control primer (not containing any part of the Rap1 binding site) and ACTGACGTACTCACCC as the primer sequence with the last 5 bases overlapping the Rap1 consensus binding site (CACCCATACA).
Authors' contributions
JM and MBE conceived and designed the study. JM carried out the study and drafted the manuscript. All authors read and approved the final manuscript.
Acknowledgements
J.M. was supported by Department of Energy Computational Science Graduate Fellowship (CSGF). The authors wish to thank Boris Shakhnovich for advice and discussions.
References

Davidson EH, Rast JP, Oliveri P, Ransick A, Calestani C, Yuh CH, Minokawa T, Amore G, Hinman V, Arenas-Mena C, Otim O, Brown CT, Livi CB, Lee PY, Revilla R, Rust AG, Pan Z, Schilstra MJ, Clarke PJ, Arnone MI, Rowen L, Cameron RA, McClay DR, Hood L, Bolouri H: A genomic regulatory network for development.
Science 2002, 295(5560):1669-1678.

Bolouri H, Davidson EH: Modeling transcriptional regulatory networks.
Bioessays 2002, 24(12):1118-1129.

Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae.
Science 2002, 298(5594):799-804.

Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome.
Nature 2004, 431(7004):99-104.

Bulyk ML, Gentalen E, Lockhart DJ, Church GM: Quantifying DNA-protein interactions by double-stranded DNA arrays.
Nat Biotechnol 1999, 17(6):573-577.

Bulyk ML, Huang X, Choo Y, Church GM: Exploring the DNA-binding specificities of zinc fingers with DNA microarrays.
Proc Natl Acad Sci U S A 2001, 98(13):7158-7163.

Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML: Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays.
Nat Genet 2004, 36(12):1331-1339.

Wang JK, Li TX, Lu ZH: A method for fabricating uni-dsDNA microarray chip for analyzing DNA-binding proteins.
J Biochem Biophys Methods 2005, 63(2):100-110.

Warren CL, Kratochvil NC, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GNJ, Ansari AZ: Defining the sequence-recognition profile of DNA-binding molecules.
Proc Natl Acad Sci U S A 2006, 103(4):867-872.

Hillier LW, Graves TA, Fulton RS, Fulton LA, Pepin KH, Minx P, Wagner-McPherson C, Layman D, Wylie K, Sekhon M, Becker MC, Fewell GA, Delehaunty KD, Miner TL, Nash WE, Kremitzki C, Oddy L, Du H, Sun H, Bradshaw-Cordum H, Ali J, Carter J, Cordes M, Harris A, Isak A, van Brunt A, Nguyen C, Du F, Courtney L, Kalicki J, Ozersky P, Abbott S, Armstrong J, Belter EA, Caruso L, Cedroni M, Cotton M, Davidson T, Desai A, Elliott G, Erb T, Fronick C, Gaige T, Haakenson W, Haglund K, Holmes A, Harkins R, Kim K, Kruchowski SS, Strong CM, Grewal N, Goyea E, Hou S, Levy A, Martinka S, Mead K, McLellan MD, Meyer R, Randall-Maher J, Tomlinson C, Dauphin-Kohlberg S, Kozlowicz-Reilly A, Shah N, Swearengen-Shahid S, Snider J, Strong JT, Thompson J, Yoakum M, Leonard S, Pearman C, Trani L, Radionenko M, Waligorski JE, Wang C, Rock SM, Tin-Wollam AM, Maupin R, Latreille P, Wendl MC, Yang SP, Pohl C, Wallis JW, Spieth J, Bieri TA, Berkowicz N, Nelson JO, Osborne J, Ding L, Meyer R, Sabo A, Shotland Y, Sinha P, Wohldmann PE, Cook LL, Hickenbotham MT, Eldred J, Williams D, Jones TA, She X, Ciccarelli FD, Izaurralde E, Taylor J, Schmutz J, Myers RM, Cox DR, Huang X, McPherson JD, Mardis ER, Clifton SW, Warren WC, Chinwalla AT, Eddy SR, Marra MA, Ovcharenko I, Furey TS, Miller W, Eichler EE, Bork P, Suyama M, Torrents D, Waterston RH, Wilson RK: Generation and annotation of the DNA sequences of human chromosomes 2 and 4.
Nature 2005, 434(7034):724-731.

Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.
Science 1993, 262(5131):208-214.

Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles.
Nucleic Acids Res 2003, 31(1):374-378.

Liu X, Brutlag DL, Liu JS: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes.
Pac Symp Biocomput 2001, 127-138.

Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae.
J Mol Biol 2000, 296(5):1205-1214.

Albert TJ, Norton J, Ott M, Richmond T, Nuwaysir K, Nuwaysir EF, Stengele KP, Green RD: Light-directed 5'->3' synthesis of complex oligonucleotide microarrays.
Nucleic Acids Res 2003, 31(7):e35.

Egeland RD, Southern EM: Electrochemically directed synthesis of oligonucleotides for DNA microarray fabrication.
Nucleic Acids Res 2005, 33(14):e125.

Skiena SS: The Algorithm Design Manual. New York: Springer; 1998.