<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-12-342</ui><ji>1471-2164</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in <it>Acacia auriculiformis </it>and <it>Acacia mangium </it>via <it>de novo </it>transcriptome sequencing</p>
</title>
<aug>
<au id="A1"><snm>Wong</snm><mi>ML</mi><fnm>Melissa</fnm><insr iid="I1"/><email>melissawongukm@gmail.com</email></au>
<au id="A2"><snm>Cannon</snm><mi>H</mi><fnm>Charles</fnm><insr iid="I2"/><insr iid="I3"/><email>chuck@xtbg.ac.cn</email></au>
<au ca="yes" id="A3"><snm>Wickneswari</snm><fnm>Ratnam</fnm><insr iid="I1"/><email>wicki@ukm.my</email></au>
</aug>
<insg>
<ins id="I1"><p>School of Environmental and Natural Resource Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi 43600, Selangor, Malaysia</p></ins>
<ins id="I2"><p>Ecological Evolution Group, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Science, Menglun, Mengla 666303, Yunnan, P. R. China</p></ins>
<ins id="I3"><p>Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409 USA</p></ins>
</insg>
<source>BMC Genomics</source>
<issn>1471-2164</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>342</fpage>
<url>http://www.biomedcentral.com/1471-2164/12/342</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-12-342</pubid><pubid idtype="pmpid">21729267</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>16</day><month>1</month><year>2011</year></date></rec><acc><date><day>5</day><month>7</month><year>2011</year></date></acc><pub><date><day>5</day><month>7</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Wong et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>
<it>Acacia auriculiformis </it>&#215; <it>Acacia mangium </it>hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs is a major objective of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well-characterized but genes in these pathways are poorly characterized in <it>Acacia </it>hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and for discovering informative sequence variants.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We sequenced transcriptomes of <it>A. auriculiformis </it>and <it>A. mangium </it>from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine. <it>De novo </it>assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for <it>A. auriculiformis </it>and <it>A. mangium</it>, respectively. The assemblies of <it>A. auriculiformis </it>and <it>A. mangium </it>had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx, and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against a public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168) and one legume-specific family (miR2086). Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs) in the transcriptomes of <it>A. auriculiformis </it>and <it>A. mangium</it>, respectively, thus yielding useful markers for population genetics studies and marker-assisted selection.</p>
</sec>
<sec>
<st>
<p>Conclusion</p>
</st>
<p>We have produced the first comprehensive transcriptome-wide analysis in <it>A. auriculiformis </it>and <it>A. mangium </it>using <it>de novo </it>assembly techniques. Our high quality and comprehensive assemblies allowed the identification of many genes in the lignin biosynthesis and secondary cell wall formation pathways in <it>Acacia </it>hybrids. Our results demonstrate that Next Generation Sequencing is a cost-effective method for discovering genes, regulatory sequences and informative markers in a non-model plant.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Next Generation Sequencing (NGS) is quickly becoming the standard for the generation of cheap, accurate and high throughput DNA sequence data <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. The major NGS platforms are Roche 454 GS-FLX Titanium (330 bp), Illumina GAIIx (75-100 bp) and SOLiD3 (50 bp), which differ in read length, error rate and cost <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. Transcriptome sequencing using NGS, commonly known as RNA-Seq, enables rapid and cost-effective gene and marker discovery, gene expression analysis, and detection of rare variants and splice isoforms. Most previous studies have sequenced the transcriptomes of plants with completed reference genomes, such as <it>Arabidopsis thaliana </it>
<abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>, <it>Medicago truncatula </it>
<abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> and <it>Zea mays </it>
<abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
</abbrgrp>. Direct sequencing of the transcriptome of non-model organisms has the potential to rapidly generate valuable genomic resources in poorly known species. However, <it>de novo </it>transcriptome assembly is challenging due to short reads, lack of reference sequences and the need for development of improved bioinformatic tools to facilitate data analysis <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>.</p>
<p>Most <it>de novo </it>transcriptome studies have used the Roche 454 platforms <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp> as the longer reads allow more reliable <it>de novo </it>assembly. However, the reactions are relatively expensive, which reduces the achievable sequencing coverage, a major factor in the accuracy of <it>de novo </it>assembly. Hybrid sequencing approaches using 454/Illumina technologies can reduce cost and compensate for the biases of each sequencing technology <abbrgrp>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
</abbrgrp>. Sequencing exclusively with Illumina technology, the most widely published NGS platform, is an attractive and cheap alternative because the high coverage obtained can compensate for sequencing errors and short read length, yet relatively few <it>de novo </it>transcriptome studies have exploited these advantages in plants <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>
<url>https://atgc-illumina.googlecode.com/files/PAG_2010_AKozik_V09.pdf</url>. As read lengths increase, paired-end library construction techniques improve and costs continue to go down, Illumina RNA-seq will become a powerful tool for transcriptome characterization of non-model plants.</p>
<p>
<it>Acacia mangium </it>and <it>Acacia auriculiformis </it>are important forest tree species, belonging to the Fabaceae or Legume family, and are native to Australia, Papua New Guinea and Indonesia. <it>A. mangium </it>is widely planted in Southeast Asia because of its superior growth, wide site suitability and multiple uses <abbrgrp>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp> while <it>A. auriculiformis </it>has higher adaptability, greater durability and is less susceptible to diseases than <it>A. mangium</it>. <it>A. auriculiformis </it>and <it>A. mangium </it>are predominantly out-crossing <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>. Naturally-crossed <it>Acacia </it>hybrids were first noted in Sabah in the late 1970s <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp>. These hybrids possessed many attractive traits highly sought in tree improvement, such as enhanced growth, form, disease resistance and adaptability. For the wood and pulp industry, the <it>Acacia </it>hybrids have great potential as raw material due to superior growth, longer wood fibers and better pulp quality over their parents <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>. Low lignin and high cellulose content are desirable in the pulping process and studies have shown increased accumulation of cellulose occurs when lignin is reduced in plants <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>. The monolignol biosynthesis pathway is well-characterized but the coordination and regulation of genes in the pathway are not well understood. Recent studies revealed that known regulatory sequences, including several classes of transcription factors and microRNAs, play important roles in the regulation of lignin and wood formation <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
</abbrgrp>. These regulatory sequences may be good candidates in selective breeding and genetic engineering programs to increase pulp yield and reduce pulping costs.</p>
<p>The C-values for <it>A. auriculiformis </it>and <it>A. mangium </it>(both 2n = 26) are estimated to be 0.83 pg and 0.65 pg, respectively <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> while <it>A. auriculiformis </it>&#215; <it>A. mangium </it>hybrid genome size is estimated to be 750 Mb <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, making the hybrid genome 1.4 times larger than the <it>Populus trichocarpa </it>genome. Currently, no genome sequences for any <it>Acacia </it>species are available although the genomes of several model legume species like <it>M. truncatula </it>and <it>Glycine max </it>have been sequenced. Unfortunately, all of these model legumes are in a separate subfamily, the Faboideae, while <it>Acacia </it>species are in the Mimosoideae subfamily. In terms of EST resources for <it>A. mangium</it>, a total of 147 ESTs from floral tissues <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>, 8,963 from secondary xylem and shoot tissue <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp> and 2,459 from inner bark of the <it>A. auriculiformis </it>&#215; <it>A. mangium </it>hybrid <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp> have been deposited in the NCBI dbEST. However, no genomic resources are available for <it>A. auriculiformis</it>. Several important genes involved in monolignol biosynthesis and wood-related pathways, including <it>cinnamate 4-hydroxylase </it>(C4H), <it>caffeoyl CoA 3-O-methyltransferase </it>(CCoAOMT), <it>cinnamyl alcohol dehydrogenase </it>(CAD), <it>phenylalanine ammonia lyase </it>(PAL), <it>caffeic acid O-methyltransferase </it>(COMT) and <it>cellulose synthase </it>(CesA), have been successfully isolated and characterized from the <it>Acacia </it>hybrid <abbrgrp>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
</abbrgrp>.</p>
<p>Conventional breeding programs for the improvement of forest trees are slow, laborious and land intensive due to the long life cycle and large size of trees. The application of genomic approaches facilitated by emerging DNA sequencing technologies may significantly accelerate breeding programs. Given the lack of genomic resources for tree crops, particularly tropical species, the simple discovery of genes controlling wood-related traits will be a major step forward. Ultimately, the development of large-scale genomic resources will facilitate the application of linkage and association mapping within tree improvement programs.</p>
<p>Here we applied paired-end Illumina GAII sequencing to non-normalized cDNAs of <it>A. auriculiformis </it>and <it>A. mangium </it>to discover important genes involved in lignin and secondary cell wall formation in these non-model tree species. Using standard <it>de novo </it>assembly algorithms, we examined the quality of the contigs generated and attempted to identify wood-related genes, particularly genes and their isoforms in the monolignol biosynthesis pathway. We also sought to identify potential transcription factors involved in secondary wood formation and lignin deposition, as well as highly conserved microRNAs and their wood-related gene targets. A major objective of our analysis was to detect a large number of informative SNPs to be used for linkage mapping of hybrid progenies and population genetic studies of the two parental species. Our results could provide powerful tools for the efficient selection of hybrid offspring with favorable traits, allowing rapid and continued improvement.</p>
</sec>
<sec>
<st>
<p>Results and Discussion</p>
</st>
<sec>
<st>
<p>De novo transcriptome assembly</p>
</st>
<p>In this study, we constructed a non-normalized cDNA library for each parental species, as this yields more full-length transcripts for gene discovery. Each library was sequenced on one lane of a flow cell on the Illumina GAII platform using paired-end protocols. We obtained 19,899,637 and 17,859,793 raw paired-end reads of 51 bp for <it>A. auriculiformis </it>and <it>A. mangium</it>, respectively. Filtering and conversion to FASTQ format resulted in 13,648,154 and 12,621,865 paired-end reads for <it>A. auriculiformis </it>and <it>A. mangium</it>, respectively. After removal of ribosomal RNA sequences, 51-57% of these reads remained, with an average Phred score of 34-35.</p>
<p>The filtered reads were assembled <it>de novo </it>with several programs, including Velvet <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>, SOAPdenovo <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp> and Oases <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>; of these, we found that SOAPdenovo produced the longest assemblies despite using longer k-mers. We assessed different k-mer sizes and chose a k-mer size of 29 to obtain a good tradeoff between assembly size and accuracy. <it>De novo </it>transcriptome assembly for <it>A. auriculiformis </it>(subsequently referred to as '<it>Aa</it>') and <it>A. mangium </it>(subsequently referred to as '<it>Am</it>') produced 42,217 and 35,759 contigs with an N50 contig size of 948 bp and 938 bp, a longest contig of 15,262 bp and 15,220 bp, and an average length of 496 bp and 498 bp, respectively (Table <tblr tid="T1">1</tblr>). The sequencing depth was estimated to be 18.7 &#215; and 18.3 &#215;, respectively. Blastx indicated that the longest contigs of <it>Aa </it>and <it>Am </it>were homologs of the <it>A. thaliana </it>BIG gene (binding/ubiquitin-protein ligase/zinc ion binding) (Figure <figr fid="F1">1A</figr>). This gene, which is one of the longest genes in plants, was also reported in a <it>de novo </it>transcriptome assembly of lettuce <url>https://atgc-illumina.googlecode.com/files/PAG_2010_AKozik_V09.pdf</url>.</p>
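<p>The N50 statistic quoted above can be illustrated with a minimal sketch. It assumes that contig lengths are available as a plain list of integers; the function name <it>n50 </it>and the toy values are hypothetical and are not taken from the study or from any assembler's output.</p>
<p>
```python
from itertools import accumulate


def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which, walking from the longest
    contig downwards, the cumulative length first reaches at least half
    of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    for length, running in zip(ordered, accumulate(ordered)):
        # Doubling the running sum avoids fractional halves for odd totals.
        if 2 * running >= total:
            return length


# Toy lengths for illustration only (hypothetical, not study data).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```
</p>
<p>Because N50 is weighted towards long contigs, it can be much larger than the average contig length, as is the case for the values in Table <tblr tid="T1">1</tblr>.</p>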
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Summary of <it>de novo </it>transcriptome assembly</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <b>Species</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>
                  <it>A. auriculiformis</it>
               </b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>
                  <it>A. mangium</it>
               </b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Filtered reads (paired-ends)</p>
         </c>
         <c ca="right">
            <p>7,743,336</p>
         </c>
         <c ca="right">
            <p>6,392,887</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Filtered reads (single-ends)</p>
         </c>
         <c ca="right">
            <p>15,486,672</p>
         </c>
         <c ca="right">
            <p>12,785,774</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total assembled size (bp)</p>
         </c>
         <c ca="right">
            <p>21,022,649</p>
         </c>
         <c ca="right">
            <p>17,838,260</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of contigs and scaffolds</p>
         </c>
         <c ca="right">
            <p>42,217</p>
         </c>
         <c ca="right">
            <p>35,759</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Longest contig (bp)</p>
         </c>
         <c ca="right">
            <p>15,262</p>
         </c>
         <c ca="right">
            <p>15,220</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>N50 (bp)</p>
         </c>
         <c ca="right">
            <p>948</p>
         </c>
         <c ca="right">
            <p>938</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Average length (bp)</p>
         </c>
         <c ca="right">
            <p>496</p>
         </c>
         <c ca="right">
            <p>498</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GC content (%)</p>
         </c>
         <c ca="right">
            <p>43</p>
         </c>
         <c ca="right">
            <p>43</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Estimated coverage</p>
         </c>
         <c ca="right">
            <p>18.7 &#215;</p>
         </c>
         <c ca="right">
            <p>18.3 &#215;</p>
         </c>
      </r>
   </tblbdy></tbl>
<fig id="F1"><title><p>Figure 1</p></title><caption><p><it>De novo </it>transcriptome assemblies</p></caption><text>
   <p><b><it>De novo </it>transcriptome assemblies</b>. A) Alignment of the longest contig to the <it>Arabidopsis thaliana </it>genome. Blastn alignment of the longest contig of <it>A. auriculiformis </it>and <it>A. mangium </it>to the <it>Arabidopsis thaliana </it>BIG/ubiquitin ligase gene located on chromosome 3 (E-value = 0), as shown in GBrowse; B) Proportion of filtered reads mapped to <it>Acacia </it>transcriptomes and other genomes.</p>
</text><graphic file="1471-2164-12-342-1" hint_layout="single"/></fig>
<p>To determine the similarity at the nucleotide level between the transcriptomes, we first mapped filtered reads to their corresponding <it>de novo </it>contigs before mapping each set of reads against the contigs obtained from the other species. To substantially increase the number of mappable reads, we mapped single-end reads using Bowtie with the -v setting, allowing up to three mismatches. A total of 5,766,757 <it>Aa </it>single-end reads (37.24%) and 4,647,280 <it>Am </it>single-end reads (36.35%) mapped to their corresponding contigs. We observed only a small drop (roughly 15%) in the proportion of mappable reads when reads from one <it>Acacia </it>species were mapped to the contigs of the other <it>Acacia </it>species, indicating that the two transcriptomes share a great deal of identity at the nucleotide level and are closely related.</p>
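<p>The mapping percentages above follow directly from the mapped read counts and the single-end filtered read totals in Table <tblr tid="T1">1</tblr>. The short sketch below reproduces that arithmetic; the function name <it>mapped_percentage </it>is hypothetical and is used only for illustration.</p>
<p>
```python
def mapped_percentage(mapped_reads, total_reads):
    """Return the percentage of reads that mapped to the reference."""
    if total_reads == 0:
        raise ValueError("total_reads must be non-zero")
    return 100.0 * mapped_reads / total_reads


# Mapped single-end reads (text) over filtered single-end reads (Table 1).
aa_rate = mapped_percentage(5_766_757, 15_486_672)
am_rate = mapped_percentage(4_647_280, 12_785_774)
print(f"Aa: {aa_rate:.2f}%  Am: {am_rate:.2f}%")
```
</p>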
<p>The observation that a large proportion of filtered reads (&gt; 60%) failed to map to the <it>Acacia </it>transcriptomes led us to investigate their origins by mapping them to various genomes (Figure <figr fid="F1">1B</figr>). A further 5-6% of the reads mapped to the transcriptome of the other <it>Acacia </it>species, probably due to differentially expressed transcripts. Approximately 14% of the reads mapped to the mitochondrial and chloroplast genomes of <it>A. thaliana</it>, suggesting that a significant amount of mitochondrial and chloroplast transcripts was sequenced. We suspect that mitochondrion sequences may not be assembled due to the highly heterozygous nature of genomes that were present in high copy number. We tried to map the remaining reads to several model plant genomes but found that less than 10% were mappable, with no large differences between these plant genomes. The proportion of reads mappable to the masked genome (version Mt3.0) of the model legume <it>M. truncatula </it>was 6.6-7.6%. The remaining ~36% of filtered reads were unmappable, possibly for several reasons. Some of these reads may be unique <it>Acacia </it>sequences from intergenic and intronic regions, as Wang et al. <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp> reported that 40.75% of RNA-Seq reads from <it>Aspergillus oryzae </it>were located in such regions. Other factors, such as the lack of <it>Acacia </it>genome information, poor quality reads and microbial contamination, may also have contributed to the large number of unmappable reads.</p>
</sec>
<sec>
<st>
<p>Discovery of monolignol biosynthetic genes and isoforms</p>
</st>
<p>The monolignol biosynthesis pathway consists of several large protein families with members commonly known as isoforms. Isoform identification is challenging due to the presence of many closely related superfamily members ("like" proteins) in the transcriptome; for example, 27 "like" proteins of COMT, CCR and 4CL were observed in <it>A. thaliana </it>
<abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp>. In this study, we found a total of 52 contigs in the <it>Aa </it>and <it>Am </it>transcriptomes with E-value &#8804; 1E-10 corresponding to all ten monolignol genes in <it>A. thaliana</it>. Gene identification using Blast alone often results in an overestimation of the total number of genes and isoforms. Shi et al. <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp> reported 95 members of phenylpropanoid gene families in the <it>P. trichocarpa </it>genome using Blastp (E-value &#8804; 1E-3); however, many were proposed to be unrelated to the monolignol biosynthesis pathway based on phylogenetic and expression analysis. Therefore, we tried to remove unrelated proteins by checking for conserved motifs, which provide important clues to protein function and identity. We excluded contigs with low homology to <it>A. thaliana </it>monolignol genes (less than 55% identity) and checked the remaining contigs for conserved amino acid motifs identified in previous studies <abbrgrp>
<abbr bid="B38">38</abbr>
<abbr bid="B39">39</abbr>
<abbr bid="B40">40</abbr>
<abbr bid="B41">41</abbr>
<abbr bid="B42">42</abbr>
<abbr bid="B43">43</abbr>
<abbr bid="B44">44</abbr>
<abbr bid="B45">45</abbr>
<abbr bid="B46">46</abbr>
</abbrgrp> from the protein alignments (Additional File <supplr sid="S1">1</supplr>).</p>
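<p>The two numeric thresholds described above (E-value &#8804; 1E-10 and at least 55% identity) can be sketched as a simple filter. This is only an illustration under stated assumptions: the <it>Hit </it>record, its field names and the example hits are hypothetical and do not reproduce the study's pipeline or any Blast output format, and the subsequent conserved-motif check is not modelled.</p>
<p>
```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A hypothetical, simplified record of one contig-to-gene alignment."""
    contig: str
    subject: str
    identity: float  # percent identity of the alignment
    evalue: float


MAX_EVALUE = 1e-10    # E-value cut-off quoted in the text
MIN_IDENTITY = 55.0   # contigs below 55% identity were excluded


def keep_hit(hit):
    """Return True if the hit passes both numeric thresholds."""
    return MAX_EVALUE >= hit.evalue and hit.identity >= MIN_IDENTITY


def filter_hits(hits):
    """Keep only the hits that pass the E-value and identity thresholds."""
    return [hit for hit in hits if keep_hit(hit)]


# Invented example hits, for illustration only.
example_hits = [
    Hit("contig_a", "PAL", 78.0, 1e-50),  # passes both thresholds
    Hit("contig_b", "CAD", 48.0, 1e-30),  # identity too low
    Hit("contig_c", "4CL", 60.0, 1e-5),   # E-value too high
]
kept = filter_hits(example_hits)
print([hit.contig for hit in kept])
```
</p>
<p>Hits that pass such a filter would still need the motif check against the protein alignments before being counted as isoforms.</p>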
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Multiple protein sequence alignments of monolignol biosynthetic genes in <it>Arabidopsis thaliana</it>, <it>A. auriculiformis </it>and <it>A. mangium</it>
</b>. The file provides the multiple protein sequence alignments of all ten monolignol biosynthetic genes detected in <it>A. auriculiformis </it>and <it>A. mangium </it>with corresponding <it>A. thaliana </it>genes. Conserved motifs are highlighted in colour.</p>
</text>
<file name="1471-2164-12-342-S1.DOC">
   <p>Click here for file</p>
</file>
</suppl>
<p>We were able to detect all ten genes involved in the monolignol biosynthesis pathway, namely <it>phenylalanine ammonia lyase </it>(PAL), <it>cinnamate 4-hydroxylase </it>(C4H), <it>4-coumarate 3-hydroxylase </it>(C3H), <it>caffeic acid O-methyltransferase </it>(COMT), <it>ferulate 5-hydroxylase </it>(F5H), <it>4-coumarate:CoA ligase </it>(4CL), <it>hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase </it>(HCT), <it>caffeoyl CoA 3-O-methyltransferase </it>(CCoAOMT), <it>cinnamyl alcohol dehydrogenase </it>(CAD) and <it>cinnamoyl Co-A reductase </it>(CCR), which compares favourably with traditional EST sequencing in <it>A. mangium </it>
<abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp> and <it>A. auriculiformis </it>&#215; <it>A. mangium </it>hybrid <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp>. We discovered more than one isoform for half of the genes, which EST sequencing had failed to detect. We identified a total of 18 isoforms for each species, of which 16 were orthologous isoforms shared by both species (Figure <figr fid="F2">2</figr>). All isoforms shared high identity with the corresponding <it>A. thaliana </it>genes: C3H shared the highest identity (68-85%), followed by PAL (71-84%), C4H (64-84%), CCoAOMT (64-83%), HCT (72-78%), CAD (58-76%), COMT (74%), CCR (73%), 4CL (57-71%) and F5H (59-62%). Two observations were important in determining the number of isoforms for both species: orthologous isoforms of <it>Aa </it>and <it>Am </it>shared at least 99% similarity at both the nucleotide and protein levels, whereas isoforms within the same family usually did not share an exact match of more than 16 nucleotides.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Monolignol biosynthesis pathway isoforms of <it>A. auriculiformis </it>and <it>A. mangium</it></p></caption><text>
   <p><b>Monolignol biosynthesis pathway isoforms of <it>A. auriculiformis </it>and <it>A. mangium</it></b>. The number of isoforms found in <it>A. auriculiformis </it>(shown in the red circle) and <it>A. mangium </it>(shown in the blue circle) based on Blastx (E-value &#8804; 1E-10) and conserved motifs. The number of orthologous isoforms shared by both species is indicated in the overlapping region. The figure is reprinted with permission from The Brazilian Society of Genetics. <it>Phenylalanine ammonia lyase </it>(PAL), <it>cinnamate 4-hydroxylase </it>(C4H), <it>4-coumarate 3-hydroxylase </it>(C3H), <it>caffeic acid O-methyltransferase </it>(COMT), <it>ferulate 5-hydroxylase </it>(F5H), <it>4-coumarate:CoA ligase </it>(4CL), <it>hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase </it>(HCT), <it>caffeoyl CoA 3-O-methyltransferase </it>(CCoAOMT), <it>cinnamyl alcohol dehydrogenase </it>(CAD), <it>cinnamoyl Co-A reductase </it>(CCR).
</p>
</text><graphic file="1471-2164-12-342-2" hint_layout="single"/></fig>
<p>The total assembled sequence lengths of the 36 isoforms ranged from 503 to 2,460 bp, and only 14 contained a complete open reading frame (ORF). As expected, no polyadenylation site was observed because short polyA sequences failed to be assembled. One limitation of our sequence analysis is the presence of gap regions in the contigs. Half of the assembled sequences contain gap regions with a total size of 13-403 bp. These regions, which are masked by Ns, often occur in low coverage areas where two contigs or mate pairs are connected during scaffolding. Although most <it>de novo </it>assemblers can estimate the size of a gap region, the predicted size is not always correct, which sometimes results in inaccurate protein prediction. We recommend double-checking the protein sequences by translating each fragment of a gapped assembly using other protein prediction software. Missing data pose a challenge to sequence comparison and analysis; therefore, gap filling by resequencing should be done in the future.</p>
<p>The total number of isoforms detected in this study is generally lower than those found in <it>A. thaliana </it>
<abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp> and <it>P. trichocarpa </it>
<abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>. The identified isoforms possessed 99% DNA sequence identity with previously characterized isoforms from the <it>A. auriculiformis </it>&#215; <it>A. mangium </it>hybrid for the five genes that we examined, namely PAL, C4H, COMT, CCoAOMT and CAD <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>. The high sequence similarity among <it>A. auriculiformis</it>, <it>A. mangium </it>and their hybrids will allow more efficient cross amplification in gene isolation and characterization efforts. Given that several isoforms were found in only one species, greater sequencing depth is required to overcome incomplete assemblies and sampling biases, which were previously observed in genomic sequences of <it>Pseudomonas syringae </it>strains <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp>. In addition, transcriptome sequencing of other tissues such as secondary xylem should reveal more differentially expressed isoforms, which could be new targets for the improvement of wood properties.</p>
</sec>
<sec>
<st>
<p>Identification of wood-related transcription factors</p>
</st>
<p>We found 1,306 <it>Aa </it>and 1,160 <it>Am </it>contigs with high sequence identity (E-value &#8804; 1E-10) corresponding to 72 and 73 families out of 82 <it>A. thaliana </it>transcription factor families downloaded from PlnTFDB <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. The five most abundant transcription factor families were WRKY, Orphans, PHD, HB and the MYB-related group. Several major classes of transcription factors involved in lignin and wood formation were found in both species (Figure <figr fid="F3">3A</figr>), generally in similar numbers of contigs, although the NAC family was substantially more abundant in <it>Aa</it>. Additionally, eight <it>Aa </it>and nine <it>Am </it>contigs were identified as class III HD-ZIP, a member of the Homeobox (HB) family.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Transcription factors involved in regulation of wood formation and lignin biosynthesis</p></caption><text>
   <p><b>Transcription factors involved in regulation of wood formation and lignin biosynthesis</b>. (A) Number of contigs found in <it>A. auriculiformis </it>and <it>A. mangium </it>corresponding to five major wood-related transcription factor families (E-value &#8804; 1E-10); (B) Phylogenetic tree of R2R3-MYBs of <it>A. auriculiformis</it>, <it>A. mangium </it>and other plant species that are involved in secondary cell wall formation. The neighbour-joining tree was obtained with MEGA 5.0 software and ClustalW2 alignments of full length amino acid sequences. Bootstrap values &gt; 50% are shown. The bar indicates an evolutionary distance of 0.1. Ath: <it>Arabidopsis thaliana</it>, Am: <it>Antirrhinum majus</it>, Zm: <it>Zea mays</it>, Pt: <it>Pinus taeda</it>, Ptr: <it>Populus trichocarpa</it>, Eg: <it>Eucalyptus gunnii</it>, Aau: <it>A. auriculiformis</it>, Amg: <it>A. mangium</it>.</p>
</text><graphic file="1471-2164-12-342-3" hint_layout="single"/></fig>
<p>Some members of the R2R3-MYB family are known to control lignin deposition and secondary wall formation by interacting with other R2R3-MYB genes, being activated by NAC transcription factor master switches and binding to AC elements <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. The AC elements are cis-acting elements found in most promoters of monolignol biosynthetic genes <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp>. In this study, we identified five contigs, three in <it>Aa </it>and two in <it>Am</it>, which are homologous to R2R3-MYBs regulating wood-related pathways in other plant species. In addition to R2R3-MYBs, NtLIM1 in tobacco has been shown to bind AC elements, and its inhibition reduced lignin content <abbrgrp>
<abbr bid="B50">50</abbr>
</abbrgrp>. We found one <it>Am </it>contig which was highly homologous to tobacco NtLIM1 with 86% identity.</p>
<p>Phylogenetic analysis of the <it>Acacia </it>R2R3-MYB proteins with wood-related R2R3-MYBs from <it>A. thaliana </it>and other plant species showed that they fall into three groups (Figure <figr fid="F3">3B</figr>). In group one, three <it>Acacia </it>R2R3-MYB proteins, namely AauMYB1, AmgMYB1 and AauMYB3, are close homologs of <it>Arabidopsis </it>MYB61 and Pine MYB8, while AauMYB1 and AmgMYB1 are orthologs. Pine MYB8 is a close homolog of MYB61 whose overexpression caused ectopic lignin deposition, but its exact functions are not yet known <abbrgrp>
<abbr bid="B51">51</abbr>
<abbr bid="B52">52</abbr>
</abbrgrp>. Only one <it>Am </it>R2R3-MYB protein (AmgMYB2) belongs to group two; it is a close homolog of <it>Arabidopsis </it>MYB20 and MYB43. MYB20, MYB42 and MYB43 are activated by NAC master switches to regulate downstream MYB proteins in wood-related pathways <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp> whereas MYB85 can induce secondary wall biosynthetic genes <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>. Another member of this group, PineMYB1, binds AC elements <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp> and is involved in secondary cell wall deposition <abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp>. AauMYB2 belongs to group three, where it clusters with EgMYB1, AmMYB308, ZmMYB31 and ZmMYB42, suggesting an important role in regulating the monolignol biosynthesis pathway. EgMYB1 binds AC elements and represses the monolignol biosynthesis pathway <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp>. AmMYB308, ZmMYB31 and ZmMYB42 have been shown to affect lignin content by regulating the expression of lignin genes <abbrgrp>
<abbr bid="B57">57</abbr>
<abbr bid="B58">58</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Identification of microRNA genes and gene targets</p>
</st>
<p>For non-model species like <it>Acacia</it>, microRNAs (miRNAs) can be identified from the transcriptome data based on homology searches against publicly available databases <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. We searched for miRNAs by comparing our contigs to known plant miRNA stem-loop sequences downloaded from miRBase <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>. We found nine matching sequences from <it>Aa </it>corresponding to eight conserved families (miR319, 396, 162, 160, 168, 166, 172 and 159) and one recently identified family (miR2086). Four of these families (miR319, 396, 2086 and 166) were also found in <it>Am</it>. Most of the predicted miRNA genes, such as miR319, miR396, miR162, miR166, miR168 and miR172, are highly conserved in plants. The number of miRNAs detected in this study was lower than in another study <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>, likely because miRNAs are most abundant in leaves and flowers, which were not sampled in this study.</p>
<p>Blastx results showed that none of the primary transcripts except miR2086 had significant hits to any protein-coding gene, suggesting that primary transcript sequences are less conserved in plants. Primary transcripts of miR159 and miR166 were removed from further analysis because of an incomplete stem-loop structure and a missing mature miRNA sequence. Gaps in the stem-loop sequences of miR396, miR160 and miR172 in <it>Aa </it>resulted in inaccurate stem-loop structure prediction. Therefore, PCR amplification and sequencing were carried out to fill the gaps. The secondary structures of miR319, miR396, miR2086, miR160, miR162, miR168 and miR172 predicted by Mfold were stable (Figure <figr fid="F4">4</figr>) and all except miR160 have high MFEI values (Table <tblr tid="T2">2</tblr>). miR2086 is a relatively new family highly expressed in the stem of <it>M. truncatula </it>
<abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>. Blastx indicated that both primary transcripts of miR2086 code for DNA glycosylase (E-value = 0.0). The predicted target of miR2086 is a nodulin-like protein, suggesting that it might play a role in the nitrogen-fixation pathway. This family is predicted to be legume-specific.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Stem-loop structures of miRNAs found in <it>A. auriculiformis </it>and <it>A. mangium</it></p></caption><text>
   <p><b>Stem-loop structures of miRNAs found in <it>A. auriculiformis </it>and <it>A. mangium</it></b>.</p>
</text><graphic file="1471-2164-12-342-4" hint_layout="single"/></fig>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Predicted miRNAs in <it>A. auriculiformis </it>and <it>A. mangium</it>.</p></caption><tblbdy cols="7">
      <r>
         <c ca="left">
            <p>
               <b>miRNA family</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>mature miRNA sequence</b>
            </p>
            <p>
               <b>(5'-3')</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>miRNA</b>
            </p>
            <p>
               <b>mismatch</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Length (nt)</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>MFE</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>GC %</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>MFEI</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>aau-miR319</p>
         </c>
         <c ca="left">
            <p>uuggacugaagggagcucccu</p>
         </c>
         <c ca="left">
            <p>3</p>
         </c>
         <c ca="left">
            <p>197</p>
         </c>
         <c ca="left">
            <p>-68.9</p>
         </c>
         <c ca="left">
            <p>41.1</p>
         </c>
         <c ca="left">
            <p>0.85</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>aau-miR396</p>
         </c>
         <c ca="left">
            <p>uuccacagcuuucuugaacug</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>146</p>
         </c>
         <c ca="left">
            <p>-63.0</p>
         </c>
         <c ca="left">
            <p>44.5</p>
         </c>
         <c ca="left">
            <p>0.97</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>aau-miR2086</p>
         </c>
         <c ca="left">
            <p>gacaugaaugcagaacuggaa</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>87</p>
         </c>
         <c ca="left">
            <p>-23.4</p>
         </c>
         <c ca="left">
            <p>39.1</p>
         </c>
         <c ca="left">
            <p>0.69</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>aau-miR160</p>
         </c>
         <c ca="left">
            <p>uggcauacagggagccaggca</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>88</p>
         </c>
         <c ca="left">
            <p>-29.4</p>
         </c>
         <c ca="left">
            <p>56.8</p>
         </c>
         <c ca="left">
            <p>0.59</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>aau-miR162</p>
         </c>
         <c ca="left">
            <p>ucgauaaaccucugcauccag</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>103</p>
         </c>
         <c ca="left">
            <p>-40.0</p>
         </c>
         <c ca="left">
            <p>48.5</p>
         </c>
         <c ca="left">
            <p>0.80</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>aau-miR168</p>
         </c>
         <c ca="left">
            <p>auucaguugaugcaaggcgggauc</p>
         </c>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>127</p>
         </c>
         <c ca="left">
            <p>-57.8</p>
         </c>
         <c ca="left">
            <p>59.1</p>
         </c>
         <c ca="left">
            <p>0.77</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>aau-miR172</p>
         </c>
         <c ca="left">
            <p>ugagaaucuugaugaugcugcau</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>165</p>
         </c>
         <c ca="left">
            <p>-59.6</p>
         </c>
         <c ca="left">
            <p>40.6</p>
         </c>
         <c ca="left">
            <p>0.89</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>amg-miR319</p>
         </c>
         <c ca="left">
            <p>guggacugaaggaagcucucu</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>182</p>
         </c>
         <c ca="left">
            <p>-82.0</p>
         </c>
         <c ca="left">
            <p>41.2</p>
         </c>
         <c ca="left">
            <p>1.09</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>amg-miR396</p>
         </c>
         <c ca="left">
            <p>uuccacagcuuucuugaacug</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>145</p>
         </c>
         <c ca="left">
            <p>-63.8</p>
         </c>
         <c ca="left">
            <p>44.1</p>
         </c>
         <c ca="left">
            <p>1.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>amg-miR2086</p>
         </c>
         <c ca="left">
            <p>gacaugaaugcagaacuggaa</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>87</p>
         </c>
         <c ca="left">
            <p>-23.4</p>
         </c>
         <c ca="left">
            <p>39.1</p>
         </c>
         <c ca="left">
            <p>0.69</p>
         </c>
      </r>
   </tblbdy></tbl>
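<p>The MFEI column of Table 2 can be recomputed from the MFE, length and GC content columns. The sketch below assumes the commonly used definition (adjusted MFE per 100 nt divided by the GC percentage), which this section does not state explicitly; under that assumption the recomputed values agree with the table to two decimal places. The function and variable names are illustrative only.</p>

```python
# Sketch: recompute the MFEI column of Table 2 from its other columns.
# Assumption (not stated in the text): AMFE = |MFE| / length * 100 and
# MFEI = AMFE / GC%.


def mfei(mfe_kcal_per_mol, length_nt, gc_percent):
    """Return the minimal folding free energy index of a stem-loop."""
    amfe = abs(mfe_kcal_per_mol) / length_nt * 100.0  # adjusted MFE per 100 nt
    return amfe / gc_percent


# (miRNA, MFE, length in nt, GC %, MFEI as reported in Table 2)
TABLE_2 = [
    ("aau-miR319", -68.9, 197, 41.1, 0.85),
    ("aau-miR396", -63.0, 146, 44.5, 0.97),
    ("aau-miR2086", -23.4, 87, 39.1, 0.69),
    ("aau-miR160", -29.4, 88, 56.8, 0.59),
    ("aau-miR162", -40.0, 103, 48.5, 0.80),
    ("aau-miR168", -57.8, 127, 59.1, 0.77),
    ("aau-miR172", -59.6, 165, 40.6, 0.89),
    ("amg-miR319", -82.0, 182, 41.2, 1.09),
    ("amg-miR396", -63.8, 145, 44.1, 1.00),
    ("amg-miR2086", -23.4, 87, 39.1, 0.69),
]


def recomputed_mfei(rows=TABLE_2):
    """Map each miRNA name to its MFEI recomputed and rounded to 2 decimals."""
    return {
        name: round(mfei(mfe, length, gc), 2)
        for name, mfe, length, gc, _reported in rows
    }


if __name__ == "__main__":
    for name, value in recomputed_mfei().items():
        print(name, value)
```

<p>Recomputing the index this way is a quick consistency check on the table; it does not replace the Mfold structure prediction itself.</p>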
<p>A total of 512 and 442 contigs in <it>Aa </it>and <it>Am</it>, respectively, were predicted to be targets of 135 and 134 miRNA families found in plants. Blastx results for the predicted targets of several highly conserved miRNAs are indicated in Table <tblr tid="T3">3</tblr>. We found known targets such as Auxin Response Factor, APETALA 2, F-box protein, Cc-NBS-LRR disease resistance genes and Heat Shock Protein for miRNA 160, 172, and 396. We predicted three wood-related genes, namely flavonol synthase-like, xyloglucan fucosyltransferase and glucan synthase-like genes, to be the targets of miR170, miR172 and miR319, respectively, suggesting that miRNAs might be directly involved in the regulation of the phenylpropanoid pathway and the hemicellulose biosynthesis pathway. Glucan synthase is involved in synthesizing the &#946;-1,4-glucan backbone of xyloglucan, while xyloglucan fucosyltransferase adds fucose side chains to the backbone. Downregulation of flavonol synthase is predicted to redirect the carbon flux towards lignin biosynthesis, as flavonoid biosynthesis uses 4-coumaroyl CoA as a precursor. The potential roles of these putative miRNA targets in wood formation should be examined by functional analysis in future work.</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Predicted miRNA targets in <it>A. auriculiformis </it>and <it>A. mangium</it>.</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>
               <b>miRNA</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Known miRNA targets</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Blastx ID</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Blastx annotation</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>E-value</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>160<sup>a</sup></p>
         </c>
         <c ca="left">
            <p>Auxin Response Factors</p>
         </c>
         <c ca="left">
            <p>XP_002519531.1</p>
         </c>
         <c ca="left">
            <p>Auxin Response factor</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>160<sup>b</sup></p>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>XP_002519531.1</p>
         </c>
         <c ca="left">
            <p>Auxin Response factor</p>
         </c>
         <c ca="center">
            <p>5e-145</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>170<sup>a</sup></p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>AAM63621.1</p>
         </c>
         <c ca="left">
            <p>Flavonol synthase-like protein</p>
         </c>
         <c ca="center">
            <p>7e-12</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>172<sup>a</sup></p>
         </c>
         <c ca="left">
            <p>APETALA 2</p>
         </c>
         <c ca="left">
            <p>XP_002534399.1</p>
         </c>
         <c ca="left">
            <p>APETALA 2</p>
         </c>
         <c ca="center">
            <p>7e-65</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>XP_002527501.1</p>
         </c>
         <c ca="left">
            <p>Signal transducer</p>
         </c>
         <c ca="center">
            <p>1e-88</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>XP_002320412.1</p>
         </c>
         <c ca="left">
            <p>F-box protein</p>
         </c>
         <c ca="center">
            <p>2e-146</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>XP_002331783.1</p>
         </c>
         <c ca="left">
            <p>Cc-NBS-LRR resistance protein</p>
         </c>
         <c ca="center">
            <p>5e-60</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>AAD41092.1</p>
         </c>
         <c ca="left">
            <p>Xyloglucan fucosyltransferase</p>
         </c>
         <c ca="center">
            <p>1e-110</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>172<sup>b</sup></p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>XP_002320412.1</p>
         </c>
         <c ca="left">
            <p>F-box protein</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>NP_973532.1</p>
         </c>
         <c ca="left">
            <p>Protein kinase</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>XP_002516311.1</p>
         </c>
         <c ca="left">
            <p>ATP binding protein</p>
         </c>
         <c ca="center">
            <p>4e-158</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>NP_001119113.1</p>
         </c>
         <c ca="left">
            <p>Zinc ion binding</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Q9M5Q1.1</p>
         </c>
         <c ca="left">
            <p>Xyloglucan fucosyltransferase</p>
         </c>
         <c ca="center">
            <p>2e-104</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>319<sup>a</sup></p>
         </c>
         <c ca="left">
            <p>TCP transcription factors</p>
         </c>
         <c ca="left">
            <p>NP_187372.4</p>
         </c>
         <c ca="left">
            <p>ATGSL10 (glucan synthase-like 10)</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>AAC16330.1</p>
         </c>
         <c ca="left">
            <p>SAR DNA-binding protein</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>319<sup>b</sup></p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>NP_187372.4</p>
         </c>
         <c ca="left">
            <p>ATGSL10 (glucan synthase-like 10)</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>396<sup>a</sup></p>
         </c>
         <c ca="left">
            <p>Cell proliferation, GRL transcription factors</p>
         </c>
         <c ca="left">
            <p>AAB99745.1</p>
         </c>
         <c ca="left">
            <p>Heat shock protein 70</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>XP_002331783.1</p>
         </c>
         <c ca="left">
            <p>Cc-NBS-LRR resistance protein</p>
         </c>
         <c ca="center">
            <p>2e-74</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>AAM61431.1</p>
         </c>
         <c ca="left">
            <p>Developmental protein</p>
         </c>
         <c ca="center">
            <p>5e-84</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>396<sup>b</sup></p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>NP_195570.1</p>
         </c>
         <c ca="left">
            <p>Metal ion binding protein</p>
         </c>
         <c ca="center">
            <p>7e-64</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2086<sup>a</sup></p>
         </c>
         <c ca="left">
            <p>Unknown</p>
         </c>
         <c ca="left">
            <p>AAC27411.1</p>
         </c>
         <c ca="left">
            <p>Nodulin-like protein</p>
         </c>
         <c ca="center">
            <p>0.0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2086<sup>b</sup></p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>AAC27411.1</p>
         </c>
         <c ca="left">
            <p>Nodulin-like protein</p>
         </c>
         <c ca="center">
            <p>3e-16</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>
         <sup>a </sup>
         <it>A. auriculiformis</it>
      </p>
      <p>
         <sup>b </sup>
         <it>A. mangium</it>
      </p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Detection of Single Nucleotide Polymorphisms (SNPs)</p>
</st>
<p>Single Nucleotide Polymorphisms (SNPs) are abundant markers that are suitable for a species with low genetic diversity such as <it>A. mangium </it>
<abbrgrp>
<abbr bid="B63">63</abbr>
</abbrgrp>. Because no genome sequence is available for these non-model species, we detected SNPs by mapping all reads to the <it>de novo </it>contigs as a reference. We used only contigs at least 200 bp long to ensure sufficient flanking regions for genotyping purposes. Although paired-end reads provide more accurate alignments, a large fraction of our contigs were too short to make effective use of the paired-end information, so we mapped the reads as single-end data, which substantially increased the number of mappable reads.</p>
<p>Using Bowtie with default settings and allowing two mismatches, we detected a total of 30,837 <it>Aa </it>and 19,070 <it>Am </it>putative SNPs. After applying several filtering parameters to remove low-coverage, low-confidence, low minor allele frequency and multi-allelic SNPs, the number of putative SNPs was reduced to 16,648 and 9,335, respectively (Table <tblr tid="T4">4</tblr>). As expected, transition SNPs occur almost twice as frequently as transversion SNPs. One SNP was estimated to occur in every 1,123 bp and 1,704 bp in the <it>Aa </it>and <it>Am </it>transcriptomes, respectively. Although these SNPs represent only a portion of the <it>Acacia </it>transcriptome, this study provides a better estimate of SNP frequency than a previous study <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>, which was based on SNP variation in two lignin genes. Further investigations are being carried out to validate these SNPs, which will be useful for constructing an <it>Acacia </it>hybrid linkage map.</p>
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Summary of SNPs detected in <it>A. auriculiformis </it>and <it>A. mangium</it>.</p></caption><tblbdy cols="3">
      <r>
         <c>
            <p/>
         </c>
         <c ca="right">
            <p>
               <b>
                  <it>A. auriculiformis</it>
               </b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>
                  <it>A. mangium</it>
               </b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of contigs at least 200 bp</p>
         </c>
         <c ca="right">
            <p>23,850</p>
         </c>
         <c ca="right">
            <p>20,387</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total size of contigs at least 200 bp (bp)</p>
         </c>
         <c ca="right">
            <p>18,701,412</p>
         </c>
         <c ca="right">
            <p>15,903,039</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Putative SNPs</p>
         </c>
         <c ca="right">
            <p>30,837</p>
         </c>
         <c ca="right">
            <p>19,070</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Filtered SNPs</p>
         </c>
         <c ca="right">
            <p>16,648</p>
         </c>
         <c ca="right">
            <p>9,335</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Transition SNPs</p>
         </c>
         <c ca="right">
            <p>10,826</p>
         </c>
         <c ca="right">
            <p>6,064</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Transversion SNPs</p>
         </c>
         <c ca="right">
            <p>5,822</p>
         </c>
         <c ca="right">
            <p>3,271</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SNP frequency</p>
         </c>
         <c ca="right">
            <p>1 every 1,123 bp</p>
         </c>
         <c ca="right">
            <p>1 every 1,704 bp</p>
         </c>
      </r>
   </tblbdy></tbl>
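<p>The SNP frequency row of Table 4 appears consistent with dividing the total size of the contigs at least 200 bp long by the number of filtered SNPs, and the transition and transversion counts give the ratio behind the "almost twice" statement. The sketch below reproduces both from the values in Table 4; the function names and the helper that classifies a single substitution are illustrative and are not taken from the authors' pipeline.</p>

```python
# Sketch: reproduce the SNP frequency and the transition/transversion (Ts/Tv)
# ratio from the counts reported in Table 4. Names are illustrative.

TABLE_4 = {
    "A. auriculiformis": {
        "contig_bp": 18_701_412,
        "filtered_snps": 16_648,
        "transitions": 10_826,
        "transversions": 5_822,
    },
    "A. mangium": {
        "contig_bp": 15_903_039,
        "filtered_snps": 9_335,
        "transitions": 6_064,
        "transversions": 3_271,
    },
}

PURINES = {"A", "G"}


def is_transition(ref, alt):
    """True for purine/purine (A, G) or pyrimidine/pyrimidine (C, T) changes."""
    return ref != alt and (ref in PURINES) == (alt in PURINES)


def bp_per_snp(contig_bp, snp_count):
    """Average number of assembled bases per SNP, rounded to the nearest bp."""
    return round(contig_bp / snp_count)


def ts_tv_ratio(transitions, transversions):
    """Ratio of transition SNPs to transversion SNPs."""
    return transitions / transversions


if __name__ == "__main__":
    for species, row in TABLE_4.items():
        freq = bp_per_snp(row["contig_bp"], row["filtered_snps"])
        ratio = ts_tv_ratio(row["transitions"], row["transversions"])
        print(f"{species}: 1 SNP every {freq} bp, Ts/Tv = {ratio:.2f}")
```

<p>Because the denominator is the assembled contig length rather than a genome, these figures describe SNP density in the sampled transcriptome only.</p>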
</sec>
</sec>
<sec>
<st>
<p>Conclusion</p>
</st>
<p>This is the first comprehensive transcriptome-wide analysis of <it>Acacia auriculiformis </it>and <it>Acacia mangium</it>. Our results provide valuable genetic resources for further investigation of lignin biosynthesis and wood-related pathways in <it>Acacia </it>hybrids. As Next Generation Sequencing and analytical techniques improve, whole transcriptome sequencing using short read platforms will be the most cost-effective way for significant discovery of genes, regulatory sequences and markers in previously unstudied plants.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Plant materials and RNA extraction</p>
</st>
<p>Plant materials were collected from one <it>A. auriculiformis </it>individual (AA6) and one <it>A. mangium </it>individual (AM20) growing at the Forest Research Institute Malaysia (FRIM), Kepong. AA6 and AM20 are parents of an <it>Acacia </it>hybrid mapping population. Both trees were about 5 years old at the time of sampling. The trees were propagated by marcotting the 4-year-old mother trees in FRIM's field station at Bidor, Perak, and were planted at the Bukit Hari field plot in FRIM Kepong in 2004. Three different tissues, namely young stem, intermediate inner bark and old inner bark tissues, were sampled. Young stem tissues consisted of ~5 cm of non-lignified stem, starting from the shoot tip. Inner bark tissues from intermediate and old developmental stages were sampled by cutting the largest branch on each tree into two halves. The upper half represented the intermediate stage while the lower half represented the old stage. The halves were further cut into disks of about 3 cm each. The outer bark tissues were peeled off and we collected the inner bark tissues by separating them from the sapwood. The inner bark tissues were cut into smaller pieces, immediately frozen in liquid nitrogen and stored at -80&#176;C until further use. RNA extraction was carried out using the QIAGEN RNeasy Mini Kit for each tissue. A single RNA sample for each individual was generated by pooling 20 &#956;g of RNA from each of the three tissues. The quality and quantity of the RNA were evaluated using a Nanodrop ND-1000 Spectrophotometer and an Agilent Bioanalyzer. The RNA Integrity Number (RIN) value given by the Agilent Bioanalyzer was greater than 7.5. RNase inhibitor was added to the RNA samples before they were sent to Canada's Michael Smith Genome Sciences Center, where ribosomal RNA depletion using the Invitrogen Ribominus Kit, cDNA synthesis and library construction were carried out. Each sample was subjected to one lane of sequencing on an Illumina GAII platform.</p>
</sec>
<sec>
<st>
<p>De novo transcriptome assembly and annotation</p>
</st>
<p>Raw reads in QSEQ format were filtered and converted to FASTQ format using an AWK command. Ribosomal RNA reads were removed by mapping to <it>A. thaliana </it>25S and 18S ribosomal RNA sequences using MUMmer <abbrgrp>
<abbr bid="B64">64</abbr>
</abbrgrp> and filtered by a custom Python script (available upon request). The quality of the filtered reads was assessed using the htseq-qa Python script from the HTSeq package <url>http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html</url>. The filtered reads were used in <it>de novo </it>assembly using SOAPdenovo v1.03 <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp> with default settings, except that the -R option was enabled and an insert size of 180-250 bp was used. SOAPdenovo performed scaffolding using paired-end read information and returned the assemblies as contigs and scaffolds. In this paper, we use the term "contigs" to refer to both contigs and scaffolds. The sequencing depth was estimated as the total length of the reads used in the assembly divided by the total size of the transcriptome assembly. The contigs were searched against the NCBI Non-redundant Database using Blastn and Blastx (E-value &#8804; 1e-10). All the contigs were translated into protein sequences using FrameDP <abbrgrp>
<abbr bid="B65">65</abbr>
</abbrgrp>. To compare transcriptomes of <it>Aa </it>and <it>Am</it>, we mapped single-end filtered reads from both species to both transcriptome assemblies separately using Bowtie-0.12.3 <abbrgrp>
<abbr bid="B66">66</abbr>
</abbrgrp> by allowing three mismatches and ignoring quality scores. We applied an iterative mapping and filtering approach to the unmappable reads to determine their origins. Using Bowtie and allowing three mismatches, the single-end filtered reads were mapped to both <it>Acacia </it>transcriptomes and to other genomes as references in the following order: the corresponding <it>de novo </it>contigs, <it>de novo </it>contigs from the other <it>Acacia </it>species, <it>A. thaliana </it>organelles (TAIR8 mitochondrial and chloroplast genomes) and the <it>M. truncatula </it>genome (Mt3.0). After each alignment, mapped reads were removed using Bowtie's --un option and the remaining reads were mapped to the next reference sequences. Reads remaining after the last step were considered unmapped. The raw reads of <it>Aa </it>and <it>Am </it>were deposited in the NCBI Sequence Read Archive (SRA) with accession numbers <a href="http://www.ncbi.nlm.nih.gov/sra/47114">SRR098315</a> and <a href="http://www.ncbi.nlm.nih.gov/sra/47113">SRR098314</a>.</p>
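<p>The depth estimate described above (total length of the assembled reads divided by the total size of the assembly) can be sketched as follows. The function name and the toy read and contig lengths are hypothetical and serve only to illustrate the arithmetic; they are not values from this study.</p>

```python
# Sketch: sequencing depth as total read length / total assembly size.
# The example numbers below are made up for illustration.


def estimate_depth(read_lengths, contig_lengths):
    """Return total read bases divided by total assembled bases."""
    total_read_bp = sum(read_lengths)
    total_assembly_bp = sum(contig_lengths)
    if total_assembly_bp == 0:
        raise ValueError("assembly has no bases; depth is undefined")
    return total_read_bp / total_assembly_bp


if __name__ == "__main__":
    toy_reads = [50, 50, 50, 50]   # four hypothetical 50 bp reads
    toy_contigs = [40, 60]         # two hypothetical contigs, 100 bp in total
    print(estimate_depth(toy_reads, toy_contigs))
```

<p>This is an average over the whole assembly; coverage of individual contigs will vary with transcript abundance.</p>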
</sec>
<sec>
<st>
<p>Discovery of monolignol biosynthetic genes and isoforms</p>
</st>
<p>All monolignol biosynthetic genes and isoforms were downloaded from the Arabidopsis Monolignol Biosynthesis Gene Families Database <abbrgrp>
<abbr bid="B67">67</abbr>
</abbrgrp>. We searched the contigs for homologs of <it>A. thaliana </it>genes in the monolignol biosynthesis pathway using the local NCBI Blast-2.2.23+ blastx algorithm (E-value &#8804; 1E-10). The protein sequences of the contigs were double-checked with the ExPASy Translate Tool <url>http://expasy.org/tools/dna.html</url> and aligned with the corresponding <it>A. thaliana </it>genes using ClustalW2 <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp>. NCBI ORF finder <abbrgrp>
<abbr bid="B69">69</abbr>
</abbrgrp> was used to search for open reading frames (ORFs). The protein sequences were checked for the presence of conserved amino acid motifs to distinguish members within the same family. Protein identity shared between the isoforms and the closest <it>A. thaliana </it>isoforms was checked using EMBOSS Matcher <abbrgrp>
<abbr bid="B70">70</abbr>
</abbrgrp> available at <url>http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::matcher</url>. The nucleotide sequences were trimmed and deposited at NCBI Transcriptome Shotgun Assembly (TSA) (Additional File <supplr sid="S2">2</supplr>). Protein and nucleotide sequences of the monolignol genes of <it>Aa &#215; Am </it>hybrid, namely PAL, C4H, COMT, CCoAOMT and CAD were downloaded from Genbank [Genbank: <ext-link ext-link-id="AAW78382.1" ext-link-type="gen">AAW78382.1</ext-link>, <ext-link ext-link-id="AAY86361.1" ext-link-type="gen">AAY86361.1</ext-link>, <ext-link ext-link-id="ABD42947.1" ext-link-type="gen">ABD42947.1</ext-link>, <ext-link ext-link-id="ABX75853.1" ext-link-type="gen">ABX75853.1</ext-link> and <ext-link ext-link-id="ABX75854.1" ext-link-type="gen">ABX75854.1</ext-link>] and aligned to the homologs in <it>Aa </it>and <it>Am </it>using ClustalW2.</p>
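<p>The E-value cut-off used in the blastx search (E-value &#8804; 1E-10) can be applied to saved results as sketched below. The sketch assumes the hits were written in the default 12-column BLAST+ tabular layout (-outfmt 6), which the text does not specify; the contig and subject identifiers in the example are hypothetical.</p>

```python
# Sketch: keep the best hit per contig at or below an E-value cut-off.
# Assumes the default 12-column BLAST+ tabular layout (-outfmt 6); all
# identifiers in EXAMPLE are made up for illustration.
import csv
import io

EVALUE_COLUMN = 10  # 0-based position of "evalue" in the default layout


def best_hits(tabular_text, max_evalue=1e-10):
    """Map each query to (subject, evalue) for its lowest E-value hit."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        evalue = float(row[EVALUE_COLUMN])
        if evalue > max_evalue:
            continue  # hit fails the cut-off
        if query not in best or best[query][1] > evalue:
            best[query] = (subject, evalue)
    return best


EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ["contig_1", "subjA", "78.5", "200", "43", "0", "1", "600", "1", "200", "3e-95", "350"],
    ["contig_1", "subjB", "60.0", "180", "72", "0", "1", "540", "5", "184", "1e-40", "160"],
    ["contig_2", "subjC", "35.0", "90", "58", "1", "10", "279", "3", "92", "2e-05", "48.1"],
])

if __name__ == "__main__":
    for query, (subject, evalue) in best_hits(EXAMPLE).items():
        print(query, subject, evalue)
```

<p>In this toy input only contig_1 passes the cut-off; contig_2 is dropped because its single hit has an E-value above 1E-10.</p>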
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Genbank accession numbers of monolignol biosynthetic genes in <it>A. auriculiformis </it>and <it>A. mangium</it>
</b>. The table provides the lengths and accession numbers for the assembled sequences of monolignol biosynthetic genes from <it>A. auriculiformis </it>and <it>A. mangium </it>that were deposited in the NCBI Transcriptome Shotgun Assembly (TSA) database.</p>
</text>
<file name="1471-2164-12-342-S2.DOC">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Identification of wood-related transcription factors</p>
</st>
<p>We downloaded 82 transcription factor and transcriptional regulator families of <it>A. thaliana </it>from the PlnTFDB database <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. We searched the translated contigs against this database using the blastp algorithm of a local installation of NCBI BLAST 2.2.23+ (E-value &#8804; 1E-10). We further analyzed several classes of wood-related transcription factors such as MYB, LIM and HD-ZIPIII. Protein sequences of R2R3-MYB [Genbank: <ext-link ext-link-id="CAE09058.1" ext-link-type="gen">CAE09058.1</ext-link>, <ext-link ext-link-id="CAE09057.1" ext-link-type="gen">CAE09057.1</ext-link>, <ext-link ext-link-id="NP_566467.2" ext-link-type="gen">NP_566467.2</ext-link>, <ext-link ext-link-id="NP_172425.2" ext-link-type="gen">NP_172425.2</ext-link>, <ext-link ext-link-id="NP_567390.4" ext-link-type="gen">NP_567390.4</ext-link>, <ext-link ext-link-id="NP_176797.1" ext-link-type="gen">NP_176797.1</ext-link>, <ext-link ext-link-id="NP_197163.1" ext-link-type="gen">NP_197163.1</ext-link>, <ext-link ext-link-id="ACA33851.1" ext-link-type="gen">ACA33851.1</ext-link>, <ext-link ext-link-id="AAQ62540.1" ext-link-type="gen">AAQ62540.1</ext-link>, <ext-link ext-link-id="ABD60280.1" ext-link-type="gen">ABD60280.1</ext-link>, <ext-link ext-link-id="NP_196791.1" ext-link-type="gen">NP_196791.1</ext-link>, <ext-link ext-link-id="NP_567664.1" ext-link-type="gen">NP_567664.1</ext-link>, <ext-link ext-link-id="NP_001106009.1" ext-link-type="gen">NP_001106009.1</ext-link>, <ext-link ext-link-id="NP_001105949.1" ext-link-type="gen">NP_001105949.1</ext-link>, <ext-link ext-link-id="XP_002313303.1" ext-link-type="gen">XP_002313303.1</ext-link>, <ext-link ext-link-id="NP_187463.1" ext-link-type="gen">NP_187463.1</ext-link>, <ext-link ext-link-id="XP_002299944.1" ext-link-type="gen">XP_002299944.1</ext-link>; Swiss-Prot: <ext-link ext-link-id="P81395.1" ext-link-type="sprot">P81395.1</ext-link>, <ext-link ext-link-id="P81393.1" ext-link-type="gen">P81393.1</ext-link>], LIM [Genbank: <ext-link ext-link-id="AT1G01780.1" ext-link-type="gen">AT1G01780.1</ext-link>, <ext-link ext-link-id="AT1G10200" ext-link-type="gen">AT1G10200</ext-link>, 
<ext-link ext-link-id="AT1G39900.1" ext-link-type="gen">AT1G39900.1</ext-link>, <ext-link ext-link-id="AT2G45800.1" ext-link-type="gen">AT2G45800.1</ext-link>, <ext-link ext-link-id="AT3G61230.1" ext-link-type="gen">AT3G61230.1</ext-link>, <ext-link ext-link-id="AT3G55770.1" ext-link-type="gen">AT3G55770.1</ext-link>] and HD-ZIPIII [Genbank: <ext-link ext-link-id="AY919616.1" ext-link-type="gen">AY919616.1</ext-link>
<ext-link ext-link-id="-AY919623.1" ext-link-type="gen">-AY919623.1</ext-link>] from other plant species were downloaded from the NCBI Protein Database. To generate the phylogenetic tree of the R2R3-MYB family, the full-length amino acid sequences of R2R3-MYBs from other plant species and five homologous <it>Acacia </it>R2R3-MYBs, namely AauMYB1, AauMYB2, AauMYB3, AmgMYB1, AmgMYB2 [Genbank: <ext-link ext-link-id="JL052980" ext-link-type="gen">JL052980</ext-link>, <ext-link ext-link-id="JL052981" ext-link-type="gen">JL052981</ext-link>, <ext-link ext-link-id="JL052982" ext-link-type="gen">JL052982</ext-link>, <ext-link ext-link-id="JL053003" ext-link-type="gen">JL053003</ext-link>, <ext-link ext-link-id="JL053004" ext-link-type="gen">JL053004</ext-link>, <ext-link ext-link-id="JL053005" ext-link-type="gen">JL053005</ext-link>] were used. The protein sequences of the homologous <it>Acacia </it>R2R3-MYBs were double-checked with the ExPASy Translate Tool <url>http://expasy.org/tools/dna.html</url>. All the sequences were aligned using ClustalW in BioEdit and the alignments were manually improved (Additional File <supplr sid="S3">3</supplr>). The unrooted tree was constructed using MEGA 5 <abbrgrp>
<abbr bid="B71">71</abbr>
</abbrgrp> with the neighbour-joining method and 1,000 bootstrap replicates (Poisson model and pairwise deletion).</p>
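<p>For readers unfamiliar with the distance settings named above, the following Python sketch shows the standard Poisson-corrected amino acid distance, d = -ln(1 - p), combined with pairwise deletion, where p is the proportion of differing sites among the sites that have a residue in both sequences. It is an independent illustration rather than MEGA's own code, and treating "-" and "?" as the gap and missing-data symbols is an assumption.</p>

```python
import math

GAP_CHARS = {"-", "?"}  # assumed symbols for alignment gaps and missing data


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Pairwise deletion: a site is ignored only for this pair if either
    sequence has a gap or missing residue there.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion of this site
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    p = different / compared  # observed proportion of differing sites
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

<p>A matrix of such pairwise distances is the input that a neighbour-joining algorithm clusters into a tree.</p>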
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>Multiple protein sequence alignments of R2R3-MYBs in <it>A. auriculiformis </it>and <it>A. mangium </it>and other species used in phylogenetic tree construction</b>. R2 and R3 repeats are shown.</p>
</text>
<file name="1471-2164-12-342-S3.DOC">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Identification of MicroRNA genes and gene targets</p>
</st>
<p>Stem-loop sequences of all major plant miRNAs were downloaded from the miRBase database. The transcriptomes of <it>Aa </it>and <it>Am </it>were searched for potential stem-loop miRNAs using the blastn algorithm of a local installation of NCBI BLAST 2.2.23+ (E-value &#8804; 1e-10). The matching sequences were trimmed to 1,000 bp before submission to the miRBase search tool to find stem-loop sequences and mature miRNAs. For miR396, miR160 and miR172 in <it>Aa</it>, PCR amplification and sequencing were carried out to obtain the complete stem-loop sequences. Primers flanking the gap region were designed based on the primary transcript sequences (Additional file <supplr sid="S4">4</supplr>). RNA was extracted from inner bark tissues of the <it>Aa </it>individual (AA6) using the Qiagen RNeasy Plant Mini kit. The quantity and quality of the total RNA were checked using a Nanodrop ND-1000 Spectrophotometer and gel electrophoresis. Total RNA (5 &#956;g) was treated with DNase and converted to cDNA using Fermentas RevertAid Premium Reverse Transcriptase. Each PCR reaction consisted of 300 ng cDNA, 1 &#215; PCR buffer, 2 mM MgCl<sub>2</sub>, 0.2 mM dNTP, 0.25 &#956;M of each primer and 1 U Vivantis Taq polymerase. The amplification profile consisted of a 2 min incubation at 94&#176;C, followed by 35 cycles of 94&#176;C for 30 s, 58&#176;C for 30 s and 72&#176;C for 30 s, and a final extension at 72&#176;C for 10 min. The specific PCR products were visualized on a 1% agarose gel stained with ethidium bromide and purified using the Qiagen Gel Extraction kit. The purified PCR products were cloned into Promega pGEM-T Easy Vectors and transformed into <it>E. coli </it>strain JM109. The transformed bacteria were spread on an LB plate containing ampicillin, IPTG and X-gal before overnight incubation. Five colonies from each plate were selected and grown overnight in LB broth containing ampicillin. PCR amplification using 1 &#956;l of the culture pellet as DNA template was carried out to select three positive colonies for each primer pair. 
Plasmid DNA was extracted using the Qiagen QIAprep Spin Miniprep kit and sent to First Base Laboratories Sdn. Bhd. (Malaysia) for forward and reverse sequencing using M13 primers. The sequence data were analyzed using BioEdit and gap regions were identified. Stem-loop sequences were extracted to predict secondary structures using Mfold 3.1 <url>http://mfold.rna.albany.edu/?q=mfold/RNA-Folding-Form/</url>. The secondary structures were examined visually and compared to the existing structures in the database. We used a modified version of the method of Zhang et al. <abbrgrp>
<abbr bid="B72">72</abbr>
</abbrgrp> to identify miRNA genes, except that a lower cutoff value was set for the Minimal Folding Energy Index (MFEI). miRNA genes with complete stem-loop and mature miRNA sequences are available in the miRBase database. We assigned the prefixes aau- and amg- to represent <it>A. auriculiformis </it>and <it>A. mangium</it>, respectively. The miRNA targets were identified in the <it>Aa </it>and <it>Am </it>transcriptomes by allowing 3 mismatches using a custom search in psRNATarget <url>http://bioinfo3.noble.org/psRNATarget/</url>.</p>
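<p>The MFEI mentioned above is commonly defined in the plant miRNA literature as the adjusted minimal folding energy (the MFE scaled to 100 nucleotides) divided by the G+C percentage of the sequence. The Python sketch below assumes that definition, with the MFE supplied as the negative free energy reported by a folding program such as Mfold. The function names are hypothetical, and the cutoff used in this study is not restated here, so it is left as a caller-supplied parameter.</p>

```python
def gc_percent(seq):
    """Percentage of G and C bases in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def mfei(seq, mfe_kcal_per_mol):
    """Minimal Folding Energy Index of a stem-loop sequence.

    AMFE = |MFE| / length * 100, and MFEI = AMFE / (G+C)%.
    This follows the commonly used definition (an assumption here).
    """
    gc = gc_percent(seq)
    if gc == 0.0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    amfe = abs(mfe_kcal_per_mol) / len(seq) * 100.0
    return amfe / gc


def passes_mfei(seq, mfe_kcal_per_mol, cutoff):
    """True if the stem-loop meets the caller-supplied MFEI cutoff."""
    return mfei(seq, mfe_kcal_per_mol) >= cutoff
```

<p>For example, a 100-nucleotide stem-loop with 50% G+C and an MFE of -45 kcal/mol has an AMFE of 45 and an MFEI of 0.9 under this definition.</p>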
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>Primer pairs for miRNA stem-loop sequences in <it>A. auriculiformis</it>
</b>. The table lists the primer sequences, product sizes and annealing temperatures used for amplification and sequencing of the miR160, miR172 and miR396 stem-loop sequences in <it>A. auriculiformis</it>.</p>
</text>
<file name="1471-2164-12-342-S4.DOC">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Detection of Single-nucleotide Polymorphisms (SNPs)</p>
</st>
<p>The filtered reads were mapped back to the reference using Bowtie-0.12.3, allowing two mismatches. Only contigs of at least 200 bp were used as the reference. The generated SAM files were imported into SAMtools 0.1.7 <abbrgrp>
<abbr bid="B73">73</abbr>
</abbrgrp> and converted to BAM format. We called SNPs using the SAMtools pileup command and removed any SNPs with a SNP score less than 20. The putative SNPs were further filtered using the following criteria: 1) mapping and SNP scores of more than 100; 2) SNPs must be covered by at least 10 reads; 3) at least three non-reference alleles are present; 4) SNPs must be bi-allelic; 5) minor allele frequency must be at least 5%; 6) total frequency of major and minor alleles must be at least 0.95. All filtering was done using Awk and Python scripts (available upon request).</p>
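<p>Criteria 2 to 6 above can be sketched as a single Python predicate. This is not the authors' script. It assumes that the bases observed at a site have already been extracted from the pileup output, reads criterion 3 as at least three reads carrying a non-reference base, and reads criteria 4 and 6 together as requiring two observed alleles whose combined frequency is at least 0.95. The score thresholds of criterion 1 are omitted, and the function name is hypothetical.</p>

```python
from collections import Counter


def passes_snp_filters(ref_base, read_bases, min_depth=10, min_alt_reads=3,
                       min_maf=0.05, min_top2_freq=0.95):
    """Apply the depth and allele-frequency criteria to one candidate site.

    ref_base   -- reference base at the site
    read_bases -- string or list of bases observed in the mapped reads
    """
    counts = Counter(base.upper() for base in read_bases)
    depth = sum(counts.values())
    if min_depth > depth:
        return False  # criterion 2: covered by at least 10 reads
    alt_reads = depth - counts.get(ref_base.upper(), 0)
    if min_alt_reads > alt_reads:
        return False  # criterion 3 (as interpreted here)
    ranked = counts.most_common()
    if 2 > len(ranked):
        return False  # criterion 4: a second allele must be observed
    major, minor = ranked[0][1], ranked[1][1]
    if min_maf > minor / depth:
        return False  # criterion 5: minor allele frequency of at least 5%
    # criterion 6: the two most frequent alleles account for nearly all reads
    return (major + minor) / depth >= min_top2_freq
```

<p>A site with 15 reads of A and 5 reads of G against reference A passes, whereas a site with 19 reads of A and a single G is rejected because only one read carries a non-reference base.</p>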
</sec>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>MW prepared the samples, performed data analysis and drafted the manuscript. CC assisted in bioinformatics analysis. WR secured funding and coordinated the project. All the authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>We would like to acknowledge the Forest Research Institute Malaysia for providing samples, Diane Miller and Zhao YongJun from the Michael Smith Genome Sciences Centre for library construction and sequencing, Zhang Guojie for conceptual advice, Zhang Di for writing Python scripts, Syuhaidah Sulaiman for sequencing of miRNA stem-loops, and Simon Southerton for critical reading of the manuscript. We are extremely thankful to Xishuangbanna Tropical Botanical Garden for hosting an attachment. This project was funded by Universiti Kebangsaan Malaysia (UKM-GUP-KPB-08-33-131) and the Ministry of Science, Technology and Innovation (MOSTI), Malaysia (02-01-02-SF0403).</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>The impact of next-generation sequencing technology on genetics</p></title><aug><au><snm>Mardis</snm><fnm>ER</fnm></au></aug><source>Trends Genet</source><pubdate>2008</pubdate><volume>24</volume><issue>3</issue><fpage>133</fpage><lpage>141</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tig.2007.12.007</pubid><pubid idtype="pmpid" link="fulltext">18262675</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Sequencing technologies - the next generation</p></title><aug><au><snm>Metzker</snm><fnm>ML</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2010</pubdate><volume>11</volume><issue>1</issue><fpage>31</fpage><lpage>46</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg2626</pubid><pubid idtype="pmpid" link="fulltext">19997069</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing</p></title><aug><au><snm>Weber</snm><fnm>AP</fnm></au><au><snm>Weber</snm><fnm>KL</fnm></au><au><snm>Carr</snm><fnm>K</fnm></au><au><snm>Wilkerson</snm><fnm>C</fnm></au><au><snm>Ohlrogge</snm><fnm>JB</fnm></au></aug><source>Plant Physiol</source><pubdate>2007</pubdate><volume>144</volume><issue>1</issue><fpage>32</fpage><lpage>42</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.107.096677</pubid><pubid idtype="pmcid">1913805</pubid><pubid idtype="pmpid" link="fulltext">17351049</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Sequencing of natural strains of <it>Arabidopsis thaliana </it>with short reads</p></title><aug><au><snm>Ossowski</snm><fnm>S</fnm></au><au><snm>Schneeberger</snm><fnm>K</fnm></au><au><snm>Clark</snm><fnm>RM</fnm></au><au><snm>Lanz</snm><fnm>C</fnm></au><au><snm>Warthmann</snm><fnm>N</fnm></au><au><snm>Weigel</snm><fnm>D</fnm></au></aug><source>Genome Res</source><pubdate>2008</pubdate><volume>18</volume><issue>12</issue><fpage>2024</fpage><lpage>2033</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1101/gr.080200.108</pubid><pubid idtype="pmcid">2593571</pubid><pubid idtype="pmpid" link="fulltext">18818371</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Sequencing <it>Medicago truncatula </it>expressed sequenced tags using 454 Life Sciences technology</p></title><aug><au><snm>Cheung</snm><fnm>F</fnm></au><au><snm>Haas</snm><fnm>BJ</fnm></au><au><snm>Goldberg</snm><fnm>SM</fnm></au><au><snm>May</snm><fnm>GD</fnm></au><au><snm>Xiao</snm><fnm>Y</fnm></au><au><snm>Town</snm><fnm>CD</fnm></au></aug><source>BMC Genomics</source><pubdate>2006</pubdate><volume>7</volume><fpage>272</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-7-272</pubid><pubid idtype="pmcid">1635983</pubid><pubid idtype="pmpid" link="fulltext">17062153</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Gene discovery and annotation using LCM-454 transcriptome sequencing</p></title><aug><au><snm>Emrich</snm><fnm>SJ</fnm></au><au><snm>Barbazuk</snm><fnm>WB</fnm></au><au><snm>Li</snm><fnm>L</fnm></au><au><snm>Schnable</snm><fnm>PS</fnm></au></aug><source>Genome Res</source><pubdate>2007</pubdate><volume>17</volume><issue>1</issue><fpage>69</fpage><lpage>73</lpage><xrefbib><pubidlist><pubid idtype="pmcid">1716268</pubid><pubid idtype="pmpid" link="fulltext">17095711</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>SNP discovery via 454 transcriptome sequencing</p></title><aug><au><snm>Barbazuk</snm><fnm>WB</fnm></au><au><snm>Emrich</snm><fnm>SJ</fnm></au><au><snm>Chen</snm><fnm>HD</fnm></au><au><snm>Li</snm><fnm>L</fnm></au><au><snm>Schnable</snm><fnm>PS</fnm></au></aug><source>Plant J</source><pubdate>2007</pubdate><volume>51</volume><issue>5</issue><fpage>910</fpage><lpage>918</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-313X.2007.03193.x</pubid><pubid idtype="pmcid">2169515</pubid><pubid idtype="pmpid" link="fulltext">17662031</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Next-generation sequencing 
technologies and their implications for crop genetics and breeding</p></title><aug><au><snm>Varshney</snm><fnm>RK</fnm></au><au><snm>Nayak</snm><fnm>SN</fnm></au><au><snm>May</snm><fnm>GD</fnm></au><au><snm>Jackson</snm><fnm>SA</fnm></au></aug><source>Trends Biotechnol</source><pubdate>2009</pubdate><volume>27</volume><issue>9</issue><fpage>522</fpage><lpage>530</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tibtech.2009.05.006</pubid><pubid idtype="pmpid" link="fulltext">19679362</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>High-throughput gene and SNP discovery in <it>Eucalyptus grandis</it>, an uncharacterized genome</p></title><aug><au><snm>Novaes</snm><fnm>E</fnm></au><au><snm>Drost</snm><fnm>DR</fnm></au><au><snm>Farmerie</snm><fnm>WG</fnm></au><au><snm>Pappas</snm><fnm>GJ</fnm></au><au><snm>Grattapaglia</snm><fnm>D</fnm></au><au><snm>Sederoff</snm><fnm>RR</fnm></au><au><snm>Kirst</snm><fnm>M</fnm></au></aug><source>BMC Genomics</source><pubdate>2008</pubdate><volume>9</volume><fpage>312</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-9-312</pubid><pubid idtype="pmcid">2483731</pubid><pubid idtype="pmpid" link="fulltext">18590545</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis</p></title><aug><au><snm>Sun</snm><fnm>C</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Wu</snm><fnm>Q</fnm></au><au><snm>Luo</snm><fnm>H</fnm></au><au><snm>Sun</snm><fnm>Y</fnm></au><au><snm>Song</snm><fnm>J</fnm></au><au><snm>Lui</snm><fnm>EM</fnm></au><au><snm>Chen</snm><fnm>S</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>262</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-262</pubid><pubid idtype="pmcid">2873478</pubid><pubid idtype="pmpid" 
link="fulltext">20416102</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Comparison of the transcriptomes of American chestnut (<it>Castanea dentata</it>) and Chinese chestnut (<it>Castanea mollissima</it>) in response to the chestnut blight infection</p></title><aug><au><snm>Barakat</snm><fnm>A</fnm></au><au><snm>DiLoreto</snm><fnm>DS</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Smith</snm><fnm>C</fnm></au><au><snm>Baier</snm><fnm>K</fnm></au><au><snm>Powell</snm><fnm>WA</fnm></au><au><snm>Wheeler</snm><fnm>N</fnm></au><au><snm>Sederoff</snm><fnm>R</fnm></au><au><snm>Carlson</snm><fnm>JE</fnm></au></aug><source>BMC Plant Biol</source><pubdate>2009</pubdate><volume>9</volume><fpage>51</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2229-9-51</pubid><pubid idtype="pmcid">2688492</pubid><pubid idtype="pmpid" link="fulltext">19426529</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Global characterization of <it>Artemisia annua </it>glandular trichome transcriptome using 454 pyrosequencing</p></title><aug><au><snm>Wang</snm><fnm>W</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Qi</snm><fnm>Y</fnm></au><au><snm>Guo</snm><fnm>D</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>465</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-465</pubid><pubid idtype="pmcid">2763888</pubid><pubid idtype="pmpid" link="fulltext">19818120</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Transcriptome sequencing and comparative analysis of cucumber flowers with different sex 
types</p></title><aug><au><snm>Guo</snm><fnm>S</fnm></au><au><snm>Zheng</snm><fnm>Y</fnm></au><au><snm>Joung</snm><fnm>JG</fnm></au><au><snm>Liu</snm><fnm>S</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Crasta</snm><fnm>OR</fnm></au><au><snm>Sobral</snm><fnm>BW</fnm></au><au><snm>Xu</snm><fnm>Y</fnm></au><au><snm>Huang</snm><fnm>S</fnm></au><au><snm>Fei</snm><fnm>Z</fnm></au></aug><source>BMC Genomics</source><pubdate>2010</pubdate><volume>11</volume><fpage>384</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-11-384</pubid><pubid idtype="pmcid">2897810</pubid><pubid idtype="pmpid" link="fulltext">20565788</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>The genome of the cucumber, <it>Cucumis sativus </it>L</p></title><aug><au><snm>Huang</snm><fnm>S</fnm></au><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Li</snm><fnm>L</fnm></au><au><snm>Gu</snm><fnm>X</fnm></au><au><snm>Fan</snm><fnm>W</fnm></au><au><snm>Lucas</snm><fnm>WJ</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Xie</snm><fnm>B</fnm></au><au><snm>Ni</snm><fnm>P</fnm></au><au><snm>Ren</snm><fnm>Y</fnm></au><au><snm>Zhu</snm><fnm>H</fnm></au><au><snm>Li</snm><fnm>J</fnm></au><au><snm>Lin</snm><fnm>K</fnm></au><au><snm>Jin</snm><fnm>W</fnm></au><au><snm>Fei</snm><fnm>Z</fnm></au><au><snm>Li</snm><fnm>G</fnm></au><au><snm>Staub</snm><fnm>J</fnm></au><au><snm>Kilian</snm><fnm>A</fnm></au><au><snm>van der Vossen</snm><fnm>EA</fnm></au><au><snm>Wu</snm><fnm>Y</fnm></au><au><snm>Guo</snm><fnm>J</fnm></au><au><snm>He</snm><fnm>J</fnm></au><au><snm>Jia</snm><fnm>Z</fnm></au><au><snm>Tian</snm><fnm>G</fnm></au><au><snm>Lu</snm><fnm>Y</fnm></au><au><snm>Ruan</snm><fnm>J</fnm></au><au><snm>Qian</snm><fnm>W</fnm></au><au><snm>Wang</snm><fnm>M</fnm></au><au><snm>Huang</snm><fnm>Q</fnm></au><etal/></aug><source>Nat 
Genet</source><pubdate>2009</pubdate><volume>41</volume><issue>12</issue><fpage>1275</fpage><lpage>1281</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.475</pubid><pubid idtype="pmpid" link="fulltext">19881527</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Genomic and small RNA sequencing of <it>Miscanthus &#215; giganteus </it>shows the utility of sorghum as a reference genome sequence for Andropogoneae grasses</p></title><aug><au><snm>Swaminathan</snm><fnm>K</fnm></au><au><snm>Alabady</snm><fnm>MS</fnm></au><au><snm>Varala</snm><fnm>K</fnm></au><au><snm>De Paoli</snm><fnm>E</fnm></au><au><snm>Ho</snm><fnm>I</fnm></au><au><snm>Rokhsar</snm><fnm>DS</fnm></au><au><snm>Arumuganathan</snm><fnm>AK</fnm></au><au><snm>Ming</snm><fnm>R</fnm></au><au><snm>Green</snm><fnm>PJ</fnm></au><au><snm>Meyers</snm><fnm>BC</fnm></au><au><snm>Moose</snm><fnm>SP</fnm></au><au><snm>Hudson</snm><fnm>ME</fnm></au></aug><source>Genome Biol</source><pubdate>2010</pubdate><volume>11</volume><issue>2</issue><fpage>R12</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2010-11-2-r12</pubid><pubid idtype="pmcid">2872872</pubid><pubid idtype="pmpid" link="fulltext">20128909</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>An approach to transcriptome analysis of non-model organisms using short-read sequences</p></title><aug><au><snm>Collins</snm><fnm>LJ</fnm></au><au><snm>Biggs</snm><fnm>PJ</fnm></au><au><snm>Voelckel</snm><fnm>C</fnm></au><au><snm>Joly</snm><fnm>S</fnm></au></aug><source>Genome Inform</source><pubdate>2008</pubdate><volume>21</volume><fpage>3</fpage><lpage>14</lpage><xrefbib><pubid idtype="pmpid">19425143</pubid></xrefbib></bibl><bibl id="B17"><title><p>Introduction to a plantation species - <it>Acacia mangium </it>Willd</p></title><aug><au><snm>Tham</snm><fnm>CK</fnm></au></aug><source>Proceedings of the 6th Malaysian Forestry Conference, Kuching, 
Sarawak</source><pubdate>1976</pubdate><volume>2</volume><fpage>11</fpage><lpage>17</lpage></bibl><bibl id="B18"><title><p>Diseases and potential threats to <it>Acacia mangium </it>plantations in Malaysia</p></title><aug><au><snm>Lee</snm><fnm>SS</fnm></au></aug><source>Unasylva</source><pubdate>2004</pubdate><volume>55</volume><issue>217</issue><fpage>31</fpage><lpage>35</lpage></bibl><bibl id="B19"><title><p>Breeding systems and genetic diversity in <it>Acacia auriculiformis </it>and <it>A. crassicarpa</it></p></title><aug><au><snm>Moran</snm><fnm>GF</fnm></au><au><snm>Muona</snm><fnm>O</fnm></au><au><snm>Bell</snm><fnm>JC</fnm></au></aug><source>Biotropica</source><pubdate>1989</pubdate><volume>21</volume><issue>3</issue><fpage>250</fpage><lpage>256</lpage><xrefbib><pubid idtype="doi">10.2307/2388652</pubid></xrefbib></bibl><bibl id="B20"><title><p>Spatial heterogeneity of outcrossing rates in <it>Acacia auriculiformis </it>A.Cunn.ex Benth in Australia and Papua New Guinea</p></title><aug><au><snm>Wickneswari</snm><fnm>R</fnm></au><au><snm>Norwati</snm><fnm>M</fnm></au></aug><source>Population genetics and genetic conservation of forest trees</source><pubdate>1995</pubdate><fpage>329</fpage><lpage>337</lpage></bibl><bibl id="B21"><title><p>Studies on <it>Acacia mangium </it>in Kemasul Forest, Malaysia I. 
Biomass and productivity</p></title><aug><au><snm>Lim</snm><fnm>MT</fnm></au></aug><source>Journal of Tropical Ecology</source><pubdate>1988</pubdate><volume>4</volume><fpage>293</fpage><lpage>302</lpage><xrefbib><pubid idtype="doi">10.1017/S0266467400002856</pubid></xrefbib></bibl><bibl id="B22"><title><p>Possibility of improvement in fundamental properties of wood of Acacia hybrids by artificial hybridization</p></title><aug><au><snm>Kim</snm><fnm>NT</fnm></au><au><snm>Matsumura</snm><fnm>J</fnm></au><au><snm>Oda</snm><fnm>K</fnm></au><au><snm>Cuong</snm><fnm>NV</fnm></au></aug><source>Journal of Wood Science</source><pubdate>2009</pubdate><volume>55</volume><issue>1</issue><fpage>8</fpage><lpage>12</lpage><xrefbib><pubid idtype="doi">10.1007/s10086-008-0993-1</pubid></xrefbib></bibl><bibl id="B23"><title><p>Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees</p></title><aug><au><snm>Hu</snm><fnm>WJ</fnm></au><au><snm>Harding</snm><fnm>SA</fnm></au><au><snm>Lung</snm><fnm>J</fnm></au><au><snm>Popko</snm><fnm>JL</fnm></au><au><snm>Ralph</snm><fnm>J</fnm></au><au><snm>Stokke</snm><fnm>DD</fnm></au><au><snm>Tsai</snm><fnm>CJ</fnm></au><au><snm>Chiang</snm><fnm>VL</fnm></au></aug><source>Nat Biotechnol</source><pubdate>1999</pubdate><volume>17</volume><issue>8</issue><fpage>808</fpage><lpage>812</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/11758</pubid><pubid idtype="pmpid" link="fulltext">10429249</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Transcriptional regulation in wood formation</p></title><aug><au><snm>Demura</snm><fnm>T</fnm></au><au><snm>Fukuda</snm><fnm>H</fnm></au></aug><source>Trends Plant Sci</source><pubdate>2007</pubdate><volume>12</volume><issue>2</issue><fpage>64</fpage><lpage>70</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tplants.2006.12.006</pubid><pubid idtype="pmpid" link="fulltext">17224301</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Novel and 
mechanical stress-responsive MicroRNAs in <it>Populus trichocarpa </it>that are absent from Arabidopsis</p></title><aug><au><snm>Lu</snm><fnm>S</fnm></au><au><snm>Sun</snm><fnm>YH</fnm></au><au><snm>Shi</snm><fnm>R</fnm></au><au><snm>Clark</snm><fnm>C</fnm></au><au><snm>Li</snm><fnm>L</fnm></au><au><snm>Chiang</snm><fnm>VL</fnm></au></aug><source>Plant Cell</source><pubdate>2005</pubdate><volume>17</volume><issue>8</issue><fpage>2186</fpage><lpage>2203</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.105.033456</pubid><pubid idtype="pmcid">1182482</pubid><pubid idtype="pmpid" link="fulltext">15994906</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Plant DNA C-values Databases</p></title><url>http://data.kew.org/cvalues/</url></bibl><bibl id="B27"><title><p><it>In vitro </it>polyploid induction in Acacia</p></title><aug><au><snm>Yap</snm><fnm>JW</fnm></au></aug><publisher>Universiti Kebangsaan Malaysia</publisher><pubdate>2010</pubdate><note>M.Sc Thesis</note></bibl><bibl id="B28"><title><p>Isolation and characterization of flower-specific transcripts in <it>Acacia mangium</it></p></title><aug><au><snm>Wang</snm><fnm>XJ</fnm></au><au><snm>Cao</snm><fnm>XL</fnm></au><au><snm>Hong</snm><fnm>Y</fnm></au></aug><source>Tree Physiol</source><pubdate>2005</pubdate><volume>25</volume><issue>2</issue><fpage>167</fpage><lpage>178</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">15574398</pubid></xrefbib></bibl><bibl id="B29"><title><p>Analysis of expressed sequence tags in developing secondary xylem and shoot of <it>Acacia mangium</it></p></title><aug><au><snm>Suzuki</snm><fnm>S</fnm></au><au><snm>Suda</snm><fnm>K</fnm></au><au><snm>Sakurai</snm><fnm>N</fnm></au><au><snm>Ogata</snm><fnm>Y</fnm></au><au><snm>Hattori</snm><fnm>T</fnm></au><au><snm>Suzuki</snm><fnm>H</fnm></au><au><snm>Shibata</snm><fnm>D</fnm></au><au><snm>Umezawa</snm><fnm>T</fnm></au></aug><source>Journal of Wood 
Science</source><pubdate>2011</pubdate><volume>57</volume><issue>1</issue><fpage>40</fpage><lpage>46</lpage><xrefbib><pubid idtype="doi">10.1007/s10086-010-1141-2</pubid></xrefbib></bibl><bibl id="B30"><title><p>Analysis of ESTs generated from inner bark tissue of an <it>Acacia auriculiformis </it>x <it>Acacia mangium </it>hybrid</p></title><aug><au><snm>Yong</snm><fnm>SYC</fnm></au><au><snm>Choong</snm><fnm>CY</fnm></au><au><snm>Cheong</snm><fnm>PL</fnm></au><au><snm>Pang</snm><fnm>SL</fnm></au><au><snm>Nor Amalina</snm><fnm>R</fnm></au><au><snm>Harikrishna</snm><fnm>JA</fnm></au><au><snm>Mat-Isa</snm><fnm>MN</fnm></au><au><snm>Hedley</snm><fnm>P</fnm></au><au><snm>Milne</snm><fnm>L</fnm></au><au><snm>Vaillancourt</snm><fnm>R</fnm></au><au><snm>Wickneswari</snm><fnm>R</fnm></au></aug><source>Tree Genetics and Genomes</source><pubdate>2011</pubdate><volume>7</volume><issue>1</issue><fpage>143</fpage><lpage>152</lpage><xrefbib><pubid idtype="doi">10.1007/s11295-010-0321-y</pubid></xrefbib></bibl><bibl id="B31"><title><p>Extensive DNA sequence variations in two lignin genes, Cinnamate 4-hydroxylase and Cinnamyl Alcohol Dehydrogenase from <it>Acacia mangium </it>and <it>Acacia auriculiformis</it></p></title><aug><au><snm>Nur Fariza</snm><fnm>MS</fnm></au><au><snm>Pang</snm><fnm>SL</fnm></au><au><snm>Choong</snm><fnm>CY</fnm></au><au><snm>Wickneswari</snm><fnm>R</fnm></au></aug><source>Journal of Biological Sciences</source><pubdate>2008</pubdate><volume>8</volume><issue>3</issue><fpage>687</fpage><lpage>690</lpage><xrefbib><pubid idtype="doi">10.3923/jbs.2008.687.690</pubid></xrefbib></bibl><bibl id="B32"><title><p>Velvet: algorithms for de novo short read assembly using de Bruijn graphs</p></title><aug><au><snm>Zerbino</snm><fnm>DR</fnm></au><au><snm>Birney</snm><fnm>E</fnm></au></aug><source>Genome Res</source><pubdate>2008</pubdate><volume>18</volume><issue>5</issue><fpage>821</fpage><lpage>829</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1101/gr.074492.107</pubid><pubid idtype="pmcid">2336801</pubid><pubid idtype="pmpid" link="fulltext">18349386</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>SOAPdenovo</p></title><url>http://soap.genomics.org.cn/soapdenovo.html</url></bibl><bibl id="B34"><title><p>Oases</p></title><url>http://www.ebi.ac.uk/~zerbino/oases/</url></bibl><bibl id="B35"><title><p>Survey of the transcriptome of <it>Aspergillus oryzae </it>via massively parallel mRNA sequencing</p></title><aug><au><snm>Wang</snm><fnm>B</fnm></au><au><snm>Guo</snm><fnm>G</fnm></au><au><snm>Wang</snm><fnm>C</fnm></au><au><snm>Lin</snm><fnm>Y</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Zhao</snm><fnm>M</fnm></au><au><snm>Guo</snm><fnm>Y</fnm></au><au><snm>He</snm><fnm>M</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Pan</snm><fnm>L</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><issue>15</issue><fpage>5075</fpage><lpage>5087</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkq256</pubid><pubid idtype="pmcid">2926611</pubid><pubid idtype="pmpid" link="fulltext">20392818</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Genome-wide characterization of the lignification toolbox in Arabidopsis</p></title><aug><au><snm>Raes</snm><fnm>J</fnm></au><au><snm>Rohde</snm><fnm>A</fnm></au><au><snm>Christensen</snm><fnm>JH</fnm></au><au><snm>Van de Peer</snm><fnm>Y</fnm></au><au><snm>Boerjan</snm><fnm>W</fnm></au></aug><source>Plant Physiol</source><pubdate>2003</pubdate><volume>133</volume><issue>3</issue><fpage>1051</fpage><lpage>1071</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.103.026484</pubid><pubid idtype="pmcid">523881</pubid><pubid idtype="pmpid" link="fulltext">14612585</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Towards a systems approach for lignin biosynthesis in <it>Populus trichocarpa</it>: transcript abundance and specificity of the monolignol biosynthetic 
genes</p></title><aug><au><snm>Shi</snm><fnm>R</fnm></au><au><snm>Sun</snm><fnm>YH</fnm></au><au><snm>Li</snm><fnm>Q</fnm></au><au><snm>Heber</snm><fnm>S</fnm></au><au><snm>Sederoff</snm><fnm>R</fnm></au><au><snm>Chiang</snm><fnm>VL</fnm></au></aug><source>Plant Cell Physiol</source><pubdate>2010</pubdate><volume>51</volume><issue>1</issue><fpage>144</fpage><lpage>163</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/pcp/pcp175</pubid><pubid idtype="pmpid" link="fulltext">19996151</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Identification of 4-coumarate:coenzyme A ligase (4CL) substrate recognition domains</p></title><aug><au><snm>Ehlting</snm><fnm>J</fnm></au><au><snm>Shin</snm><fnm>JJ</fnm></au><au><snm>Douglas</snm><fnm>CJ</fnm></au></aug><source>Plant J</source><pubdate>2001</pubdate><volume>27</volume><issue>5</issue><fpage>455</fpage><lpage>465</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-313X.2001.01122.x</pubid><pubid idtype="pmpid" link="fulltext">11576429</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Structural basis for the modulation of lignin monomer methylation by caffeic acid/5-hydroxyferulic acid 3/5-O-methyltransferase</p></title><aug><au><snm>Zubieta</snm><fnm>C</fnm></au><au><snm>Kota</snm><fnm>P</fnm></au><au><snm>Ferrer</snm><fnm>JL</fnm></au><au><snm>Dixon</snm><fnm>RA</fnm></au><au><snm>Noel</snm><fnm>JP</fnm></au></aug><source>Plant Cell</source><pubdate>2002</pubdate><volume>14</volume><issue>6</issue><fpage>1265</fpage><lpage>1277</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.001412</pubid><pubid idtype="pmcid">150779</pubid><pubid idtype="pmpid" link="fulltext">12084826</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Plant cytochrome P450 monooxygenases</p></title><aug><au><snm>Schuler</snm><fnm>MA</fnm></au></aug><source>Critical Reviews in Plant 
Sciences</source><pubdate>1996</pubdate><volume>15</volume><issue>3</issue><fpage>235</fpage><lpage>284</lpage></bibl><bibl id="B41"><title><p>A molecular model for cinnamyl alcohol dehydrogenase, a plant aromatic alcohol dehydrogenase involved in lignification</p></title><aug><au><snm>McKie</snm><fnm>JH</fnm></au><au><snm>Jaouhari</snm><fnm>R</fnm></au><au><snm>Douglas</snm><fnm>KT</fnm></au><au><snm>Goffner</snm><fnm>D</fnm></au><au><snm>Feuillet</snm><fnm>C</fnm></au><au><snm>Grima-Pettenati</snm><fnm>J</fnm></au><au><snm>Boudet</snm><fnm>AM</fnm></au><au><snm>Baltas</snm><fnm>M</fnm></au><au><snm>Gorrichon</snm><fnm>L</fnm></au></aug><source>Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology</source><pubdate>1993</pubdate><volume>1202</volume><issue>1</issue><fpage>61</fpage><lpage>69</lpage><xrefbib><pubid idtype="doi">10.1016/0167-4838(93)90063-W</pubid></xrefbib></bibl><bibl id="B42"><title><p>Isolation and characterisation of three cinnamyl alcohol dehydrogenase homologue cDNAs from perennial ryegrass (<it>Lolium perenne </it>L.)</p></title><aug><au><snm>Lynch</snm><fnm>D</fnm></au><au><snm>Lidgett</snm><fnm>A</fnm></au><au><snm>McInnes</snm><fnm>R</fnm></au><au><snm>Huxley</snm><fnm>H</fnm></au><au><snm>Jones</snm><fnm>E</fnm></au><au><snm>Mahoney</snm><fnm>N</fnm></au><au><snm>Spangenberg</snm><fnm>G</fnm></au></aug><source>Journal of Plant Physiology</source><pubdate>2002</pubdate><volume>159</volume><issue>6</issue><fpage>653</fpage><lpage>660</lpage><xrefbib><pubid idtype="doi">10.1078/0176-1617-0776</pubid></xrefbib></bibl><bibl id="B43"><title><p>Conserved sequence motifs in plant S-adenosyl-L-methionine-dependent methyltransferases</p></title><aug><au><snm>Joshi</snm><fnm>CP</fnm></au><au><snm>Chiang</snm><fnm>VL</fnm></au></aug><source>Plant Molecular Biology</source><pubdate>1998</pubdate><volume>37</volume><issue>4</issue><fpage>663</fpage><lpage>674</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1023/A:1006035210889</pubid><pubid idtype="pmpid" link="fulltext">9687070</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Molecular cloning and characterization of cDNAs encoding cinnamoyl CoA reductase (CCR) from barley (<it>Hordeum vulgare</it>) and potato (<it>Solanum tuberosum</it>)</p></title><aug><au><snm>Larsen</snm><fnm>K</fnm></au></aug><source>J Plant Physiol</source><pubdate>2004</pubdate><volume>161</volume><issue>1</issue><fpage>105</fpage><lpage>112</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1078/0176-1617-01074</pubid><pubid idtype="pmpid">15002670</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Purification, cloning, and properties of an acyltransferase controlling shikimate and quinate ester intermediates in phenylpropanoid metabolism</p></title><aug><au><snm>Hoffmann</snm><fnm>L</fnm></au><au><snm>Maury</snm><fnm>S</fnm></au><au><snm>Martz</snm><fnm>F</fnm></au><au><snm>Geoffroy</snm><fnm>P</fnm></au><au><snm>Legrand</snm><fnm>M</fnm></au></aug><source>J Biol Chem</source><pubdate>2003</pubdate><volume>278</volume><issue>1</issue><fpage>95</fpage><lpage>103</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12381722</pubid></xrefbib></bibl><bibl id="B46"><title><p>The phenylalanine ammonia-lyase gene family in <it>Arabidopsis thaliana</it></p></title><aug><au><snm>Wanner</snm><fnm>LA</fnm></au><au><snm>Li</snm><fnm>G</fnm></au><au><snm>Ware</snm><fnm>D</fnm></au><au><snm>Somssich</snm><fnm>IE</fnm></au><au><snm>Davis</snm><fnm>KR</fnm></au></aug><source>Plant Mol Biol</source><pubdate>1995</pubdate><volume>27</volume><issue>2</issue><fpage>327</fpage><lpage>338</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/BF00020187</pubid><pubid idtype="pmpid">7888622</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>De novo assembly of short sequence reads</p></title><aug><au><snm>Paszkiewicz</snm><fnm>K</fnm></au><au><snm>Studholme</snm><fnm>DJ</fnm></au></aug><source>Brief 
Bioinform</source><pubdate>2010</pubdate><volume>11</volume><issue>5</issue><fpage>457</fpage><lpage>472</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/bbq020</pubid><pubid idtype="pmpid" link="fulltext">20724458</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>PlnTFDB: updated content and new features of the plant transcription factor database</p></title><aug><au><snm>Perez-Rodriguez</snm><fnm>P</fnm></au><au><snm>Riano-Pachon</snm><fnm>DM</fnm></au><au><snm>Correa</snm><fnm>LG</fnm></au><au><snm>Rensing</snm><fnm>SA</fnm></au><au><snm>Kersten</snm><fnm>B</fnm></au><au><snm>Mueller-Roeber</snm><fnm>B</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><issue>suppl 1</issue><fpage>D822</fpage><lpage>827</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2808933</pubid><pubid idtype="pmpid" link="fulltext">19858103</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Transcriptional regulation of lignin biosynthesis</p></title><aug><au><snm>Zhong</snm><fnm>R</fnm></au><au><snm>Ye</snm><fnm>ZH</fnm></au></aug><source>Plant Signal Behav</source><pubdate>2009</pubdate><volume>4</volume><issue>11</issue><fpage>1028</fpage><lpage>1034</lpage><xrefbib><pubidlist><pubid idtype="doi">10.4161/psb.4.11.9875</pubid><pubid idtype="pmcid">2819510</pubid><pubid idtype="pmpid" link="fulltext">19838072</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Functional analysis of tobacco LIM protein Ntlim1 involved in lignin biosynthesis</p></title><aug><au><snm>Kawaoka</snm><fnm>A</fnm></au><au><snm>Kaothien</snm><fnm>P</fnm></au><au><snm>Yoshida</snm><fnm>K</fnm></au><au><snm>Endo</snm><fnm>S</fnm></au><au><snm>Yamada</snm><fnm>K</fnm></au><au><snm>Ebinuma</snm><fnm>H</fnm></au></aug><source>Plant J</source><pubdate>2000</pubdate><volume>22</volume><issue>4</issue><fpage>289</fpage><lpage>301</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-313x.2000.00737.x</pubid><pubid idtype="pmpid" 
link="fulltext">10849346</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Involvement of <it>Pinus taeda </it>MYB1 and MYB8 in phenylpropanoid metabolism and secondary cell wall biogenesis: a comparative in planta analysis</p></title><aug><au><snm>Bomal</snm><fnm>C</fnm></au><au><snm>Bedon</snm><fnm>F</fnm></au><au><snm>Caron</snm><fnm>S</fnm></au><au><snm>Mansfield</snm><fnm>SD</fnm></au><au><snm>Levasseur</snm><fnm>C</fnm></au><au><snm>Cooke</snm><fnm>JE</fnm></au><au><snm>Blais</snm><fnm>S</fnm></au><au><snm>Tremblay</snm><fnm>L</fnm></au><au><snm>Morency</snm><fnm>MJ</fnm></au><au><snm>Pavy</snm><fnm>N</fnm></au><au><snm>Grima-Pettenati</snm><fnm>J</fnm></au><au><snm>Seguin</snm><fnm>A</fnm></au><au><snm>Mackay</snm><fnm>J</fnm></au></aug><source>J Exp Bot</source><pubdate>2008</pubdate><volume>59</volume><issue>14</issue><fpage>3925</fpage><lpage>3939</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/jxb/ern234</pubid><pubid idtype="pmcid">2576632</pubid><pubid idtype="pmpid" link="fulltext">18805909</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Involvement of the R2R3-MYB, AtMYB61, in the ectopic lignification and dark-photomorphogenic components of the det3 mutant phenotype</p></title><aug><au><snm>Newman</snm><fnm>LJ</fnm></au><au><snm>Perazza</snm><fnm>DE</fnm></au><au><snm>Juda</snm><fnm>L</fnm></au><au><snm>Campbell</snm><fnm>MM</fnm></au></aug><source>Plant J</source><pubdate>2004</pubdate><volume>37</volume><issue>2</issue><fpage>239</fpage><lpage>250</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-313X.2003.01953.x</pubid><pubid idtype="pmpid" link="fulltext">14690508</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>The MYB46 transcription factor is a direct target of SND1 and regulates secondary wall biosynthesis in Arabidopsis</p></title><aug><au><snm>Zhong</snm><fnm>R</fnm></au><au><snm>Richardson</snm><fnm>EA</fnm></au><au><snm>Ye</snm><fnm>ZH</fnm></au></aug><source>Plant 
Cell</source><pubdate>2007</pubdate><volume>19</volume><issue>9</issue><fpage>2776</fpage><lpage>2792</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.107.053678</pubid><pubid idtype="pmcid">2048704</pubid><pubid idtype="pmpid" link="fulltext">17890373</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>A battery of transcription factors involved in the regulation of secondary cell wall biosynthesis in Arabidopsis</p></title><aug><au><snm>Zhong</snm><fnm>R</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Zhou</snm><fnm>J</fnm></au><au><snm>McCarthy</snm><fnm>RL</fnm></au><au><snm>Ye</snm><fnm>ZH</fnm></au></aug><source>Plant Cell</source><pubdate>2008</pubdate><volume>20</volume><issue>10</issue><fpage>2763</fpage><lpage>2782</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.108.061325</pubid><pubid idtype="pmcid">2590737</pubid><pubid idtype="pmpid" link="fulltext">18952777</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>Characterisation of Pt MYB1, an R2R3-MYB from pine xylem</p></title><aug><au><snm>Patzlaff</snm><fnm>A</fnm></au><au><snm>Newman</snm><fnm>LJ</fnm></au><au><snm>Dubos</snm><fnm>C</fnm></au><au><snm>Whetten</snm><fnm>RW</fnm></au><au><snm>Smith</snm><fnm>C</fnm></au><au><snm>McInnis</snm><fnm>S</fnm></au><au><snm>Bevan</snm><fnm>MW</fnm></au><au><snm>Sederoff</snm><fnm>RR</fnm></au><au><snm>Campbell</snm><fnm>MM</fnm></au></aug><source>Plant Mol Biol</source><pubdate>2003</pubdate><volume>53</volume><issue>4</issue><fpage>597</fpage><lpage>608</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">15010621</pubid></xrefbib></bibl><bibl id="B56"><title><p>EgMYB1, an R2R3 MYB transcription factor from eucalyptus negatively regulates secondary cell wall formation in Arabidopsis and 
poplar</p></title><aug><au><snm>Legay</snm><fnm>S</fnm></au><au><snm>Sivadon</snm><fnm>P</fnm></au><au><snm>Blervacq</snm><fnm>AS</fnm></au><au><snm>Pavy</snm><fnm>N</fnm></au><au><snm>Baghdady</snm><fnm>A</fnm></au><au><snm>Tremblay</snm><fnm>L</fnm></au><au><snm>Levasseur</snm><fnm>C</fnm></au><au><snm>Ladouce</snm><fnm>N</fnm></au><au><snm>Lapierre</snm><fnm>C</fnm></au><au><snm>Seguin</snm><fnm>A</fnm></au><au><snm>Hawkins</snm><fnm>S</fnm></au><au><snm>Mackay</snm><fnm>J</fnm></au><au><snm>Grima-Pettenati</snm><fnm>J</fnm></au></aug><source>New Phytol</source><pubdate>2010</pubdate><volume>188</volume><issue>3</issue><fpage>774</fpage><lpage>786</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1469-8137.2010.03432.x</pubid><pubid idtype="pmpid">20955415</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>The AmMYB308 and AmMYB330 transcription factors from antirrhinum regulate phenylpropanoid and lignin biosynthesis in transgenic tobacco</p></title><aug><au><snm>Tamagnone</snm><fnm>L</fnm></au><au><snm>Merida</snm><fnm>A</fnm></au><au><snm>Parr</snm><fnm>A</fnm></au><au><snm>Mackay</snm><fnm>S</fnm></au><au><snm>Culianez-Macia</snm><fnm>FA</fnm></au><au><snm>Roberts</snm><fnm>K</fnm></au><au><snm>Martin</snm><fnm>C</fnm></au></aug><source>Plant Cell</source><pubdate>1998</pubdate><volume>10</volume><issue>2</issue><fpage>135</fpage><lpage>154</lpage><xrefbib><pubidlist><pubid idtype="pmcid">143979</pubid><pubid idtype="pmpid" link="fulltext">9490739</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Down-regulation of the maize and <it>Arabidopsis thaliana </it>caffeic acid O-methyl-transferase genes by two new maize R2R3-MYB transcription 
factors</p></title><aug><au><snm>Fornale</snm><fnm>S</fnm></au><au><snm>Sonbol</snm><fnm>FM</fnm></au><au><snm>Maes</snm><fnm>T</fnm></au><au><snm>Capellades</snm><fnm>M</fnm></au><au><snm>Puigdomenech</snm><fnm>P</fnm></au><au><snm>Rigau</snm><fnm>J</fnm></au><au><snm>Caparros-Ruiz</snm><fnm>D</fnm></au></aug><source>Plant Mol Biol</source><pubdate>2006</pubdate><volume>62</volume><issue>6</issue><fpage>809</fpage><lpage>823</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s11103-006-9058-2</pubid><pubid idtype="pmpid" link="fulltext">16941210</pubid></pubidlist></xrefbib></bibl><bibl id="B59"><title><p>Computational identification of microRNAs and their targets</p></title><aug><au><snm>Zhang</snm><fnm>B</fnm></au><au><snm>Pan</snm><fnm>X</fnm></au><au><snm>Wang</snm><fnm>Q</fnm></au><au><snm>Cobb</snm><fnm>GP</fnm></au><au><snm>Anderson</snm><fnm>TA</fnm></au></aug><source>Comput Biol Chem</source><pubdate>2006</pubdate><volume>30</volume><issue>6</issue><fpage>395</fpage><lpage>407</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.compbiolchem.2006.08.006</pubid><pubid idtype="pmpid" link="fulltext">17123865</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>miRBase: tools for microRNA genomics</p></title><aug><au><snm>Griffiths-Jones</snm><fnm>S</fnm></au><au><snm>Saini</snm><fnm>HK</fnm></au><au><snm>van Dongen</snm><fnm>S</fnm></au><au><snm>Enright</snm><fnm>AJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><issue>Database issue</issue><fpage>D154</fpage><lpage>158</lpage></bibl><bibl id="B61"><title><p>One-step identification of conserved miRNAs, their targets, potential transcription factors and effector genes of complete secondary metabolism pathways after 454 pyrosequencing of calyx cDNAs from the Labiate <it>Salvia sclarea 
</it>L</p></title><aug><au><snm>Legrand</snm><fnm>S</fnm></au><au><snm>Valot</snm><fnm>N</fnm></au><au><snm>Nicole</snm><fnm>F</fnm></au><au><snm>Moja</snm><fnm>S</fnm></au><au><snm>Baudino</snm><fnm>S</fnm></au><au><snm>Jullien</snm><fnm>F</fnm></au><au><snm>Magnard</snm><fnm>JL</fnm></au><au><snm>Caissard</snm><fnm>JC</fnm></au><au><snm>Legendre</snm><fnm>L</fnm></au></aug><source>Gene</source><pubdate>2010</pubdate><volume>450</volume><issue>1-2</issue><fpage>55</fpage><lpage>62</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.gene.2009.10.004</pubid><pubid idtype="pmpid" link="fulltext">19840835</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>High-throughput sequencing of <it>Medicago truncatula </it>short RNAs identifies eight new miRNA families</p></title><aug><au><snm>Szittya</snm><fnm>G</fnm></au><au><snm>Moxon</snm><fnm>S</fnm></au><au><snm>Santos</snm><fnm>DM</fnm></au><au><snm>Jing</snm><fnm>R</fnm></au><au><snm>Fevereiro</snm><fnm>MP</fnm></au><au><snm>Moulton</snm><fnm>V</fnm></au><au><snm>Dalmay</snm><fnm>T</fnm></au></aug><source>BMC Genomics</source><pubdate>2008</pubdate><volume>9</volume><fpage>593</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-9-593</pubid><pubid idtype="pmcid">2621214</pubid><pubid idtype="pmpid" link="fulltext">19068109</pubid></pubidlist></xrefbib></bibl><bibl id="B63"><title><p><it>Acacia mangium</it>: a tropical forest tree of the coastal lowlands with low genetic diversity</p></title><aug><au><snm>Moran</snm><fnm>GF</fnm></au><au><snm>Muona</snm><fnm>O</fnm></au><au><snm>Bell</snm><fnm>JC</fnm></au></aug><source>Evolution</source><pubdate>1989</pubdate><volume>43</volume><issue>1</issue><fpage>231</fpage><lpage>235</lpage><xrefbib><pubid idtype="doi">10.2307/2409180</pubid></xrefbib></bibl><bibl id="B64"><title><p>Versatile and open software for comparing large 
genomes</p></title><aug><au><snm>Kurtz</snm><fnm>S</fnm></au><au><snm>Phillippy</snm><fnm>A</fnm></au><au><snm>Delcher</snm><fnm>AL</fnm></au><au><snm>Smoot</snm><fnm>M</fnm></au><au><snm>Shumway</snm><fnm>M</fnm></au><au><snm>Antonescu</snm><fnm>C</fnm></au><au><snm>Salzberg</snm><fnm>SL</fnm></au></aug><source>Genome Biol</source><pubdate>2004</pubdate><volume>5</volume><issue>2</issue><fpage>R12</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2004-5-2-r12</pubid><pubid idtype="pmcid">395750</pubid><pubid idtype="pmpid" link="fulltext">14759262</pubid></pubidlist></xrefbib></bibl><bibl id="B65"><title><p>FrameDP: sensitive peptide detection on noisy matured sequences</p></title><aug><au><snm>Gouzy</snm><fnm>J</fnm></au><au><snm>Carrere</snm><fnm>S</fnm></au><au><snm>Schiex</snm><fnm>T</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><issue>5</issue><fpage>670</fpage><lpage>671</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp024</pubid><pubid idtype="pmcid">2647831</pubid><pubid idtype="pmpid" link="fulltext">19153134</pubid></pubidlist></xrefbib></bibl><bibl id="B66"><title><p>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome</p></title><aug><au><snm>Langmead</snm><fnm>B</fnm></au><au><snm>Trapnell</snm><fnm>C</fnm></au><au><snm>Pop</snm><fnm>M</fnm></au><au><snm>Salzberg</snm><fnm>SL</fnm></au></aug><source>Genome Biol</source><pubdate>2009</pubdate><volume>10</volume><issue>3</issue><fpage>R25</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2009-10-3-r25</pubid><pubid idtype="pmcid">2690996</pubid><pubid idtype="pmpid" link="fulltext">19261174</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>Arabidopsis Monolignol Biosynthesis Gene Families</p></title><url>http://www.arabidopsis.org/browse/genefamily/Raes.jsp</url></bibl><bibl id="B68"><title><p>Multiple sequence alignment using ClustalW and 
ClustalX</p></title><aug><au><snm>Thompson</snm><fnm>JD</fnm></au><au><snm>Gibson</snm><fnm>TJ</fnm></au><au><snm>Higgins</snm><fnm>DG</fnm></au></aug><source>Curr Protoc Bioinformatics</source><pubdate>2002</pubdate><volume>Chapter 2</volume><note>Unit 2 3</note></bibl><bibl id="B69"><title><p>NCBI ORF Finder</p></title><url>http://www.ncbi.nlm.nih.gov/projects/gorf/</url></bibl><bibl id="B70"><title><p>EMBOSS: the European Molecular Biology Open Software Suite</p></title><aug><au><snm>Rice</snm><fnm>P</fnm></au><au><snm>Longden</snm><fnm>I</fnm></au><au><snm>Bleasby</snm><fnm>A</fnm></au></aug><source>Trends Genet</source><pubdate>2000</pubdate><volume>16</volume><issue>6</issue><fpage>276</fpage><lpage>277</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid><pubid idtype="pmpid" link="fulltext">10827456</pubid></pubidlist></xrefbib></bibl><bibl id="B71"><title><p>MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods</p></title><aug><au><snm>Tamura</snm><fnm>K</fnm></au><au><snm>Peterson</snm><fnm>D</fnm></au><au><snm>Peterson</snm><fnm>N</fnm></au><au><snm>Stecher</snm><fnm>G</fnm></au><au><snm>Nei</snm><fnm>M</fnm></au><au><snm>Kumar</snm><fnm>S</fnm></au></aug><source>Molecular Biology and Evolution</source><pubdate>2011</pubdate><note><b>msr121v1-msr121</b>.</note></bibl><bibl id="B72"><title><p>Identification and characterization of new plant microRNAs using EST analysis</p></title><aug><au><snm>Zhang</snm><fnm>BH</fnm></au><au><snm>Pan</snm><fnm>XP</fnm></au><au><snm>Wang</snm><fnm>QL</fnm></au><au><snm>Cobb</snm><fnm>GP</fnm></au><au><snm>Anderson</snm><fnm>TA</fnm></au></aug><source>Cell Res</source><pubdate>2005</pubdate><volume>15</volume><issue>5</issue><fpage>336</fpage><lpage>360</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.cr.7290302</pubid><pubid idtype="pmpid" link="fulltext">15916721</pubid></pubidlist></xrefbib></bibl><bibl 
id="B73"><title><p>The Sequence Alignment/Map format and SAMtools</p></title><aug><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Handsaker</snm><fnm>B</fnm></au><au><snm>Wysoker</snm><fnm>A</fnm></au><au><snm>Fennell</snm><fnm>T</fnm></au><au><snm>Ruan</snm><fnm>J</fnm></au><au><snm>Homer</snm><fnm>N</fnm></au><au><snm>Marth</snm><fnm>G</fnm></au><au><snm>Abecasis</snm><fnm>G</fnm></au><au><snm>Durbin</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><issue>16</issue><fpage>2078</fpage><lpage>2079</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp352</pubid><pubid idtype="pmcid">2723002</pubid><pubid idtype="pmpid" link="fulltext">19505943</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>