In most mammalian species, the TRB locus shares a common organization: a library of TRBV genes positioned at the 5' end of two tandemly aligned D-J-C gene clusters, each composed of a single TRBD gene, 6-7 TRBJ genes and one TRBC gene. An enhancer located at the 3' end of the last TRBC gene, together with a well-defined promoter at the 5' end of the TRBD1 gene and/or a less well-characterized promoter at the 5' end of TRBD2, is sufficient to confer full recombinase accessibility on the locus. In ruminant species, the 3' end of the TRB locus is characterized by the presence of three D-J-C clusters, each consisting of a single TRBD, 5-7 TRBJ and one TRBC gene. The central cluster shows a hybrid structure combining features of the upstream and downstream clusters, suggesting that the duplication arose by unequal crossover. An enhancer downstream of the last TRBC gene and a promoter at the 5' end of each TRBD gene are also present.
In this paper we analyzed a large number of sheep TR β-chain transcripts, derived from four different lymphoid tissues of animals belonging to three different sheep breeds, to assess the use and frequency of the three gene clusters in the β-chain repertoire. Because the genomic organization of the sheep TRB locus is known, the V-D-J rearrangements could be interpreted exactly. Our results demonstrate that the sheep β-chain displays a level of variability substantially greater than that described in other mammalian species. This is due not only to the larger number of D and J genes available for somatic recombination, but also to the occurrence of trans-rearrangement. Moreover, the functional complexity of the β-chain repertoire is further increased by other mechanisms, such as alternative cis- and trans-splicing and recombinational diversification, which seem to affect the variability of the constant region.
Altogether, our data demonstrate that a diverse set of molecular mechanisms operates to generate a diversified sheep β-chain repertoire, and this could confer special biological properties on the corresponding αβ T cells in the ruminant lineage.
Mature T lymphocytes must express heterodimeric αβ or γδ T cell receptors (TRs) on their surface in order to provide protection from pathogens. The diversity of the TR repertoire derives in large part from the random somatic rearrangement, during T-cell differentiation, of the Variable (V), Diversity (D) and Joining (J) genes (β and δ chains) or of the V and J genes (α and γ chains) that encode the variable portion of these molecules.
The V(D)J process requires binding of the lymphocyte-specific recombination activating gene 1 and 2 (RAG1/2) protein complex to the recombination signal sequences (RSs) flanking the rearranging sides of the individual V, D and J genes. Upon binding, the RAG1/2 recombinase introduces a nick at the border between the RS heptamer and the adjacent coding sequence, and the DNA repair factors of the nonhomologous end-joining (NHEJ) machinery then join the cleaved gene ends. The RSs consist of conserved heptamer and nonamer sequences separated by a spacer of 12 or 23 bp of relatively non-conserved DNA. Efficient recombination occurs between genes flanked by dissimilar RSs, one with a 12-bp spacer (12RS) and one with a 23-bp spacer (23RS) (the 12/23 rule). However, at the locus encoding the β-chain (TRB), TRBV 23RSs are efficiently targeted by TRBD 12RSs but not by TRBJ 12RSs, despite the 12/23 compatibility of both. This phenomenon, termed the "beyond 12/23" rule, preserves TRBD gene usage and ensures ordered V(D)J recombination at the TRB locus, with TRBD-to-TRBJ joining occurring before TRBV-to-TRBD assembly.
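The pairing constraints just described can be expressed as a small predicate. This is a toy model rather than a published algorithm: the `RS` type, the gene labels and the function names are hypothetical, and it encodes only the two rules stated in the text (12/23 compatibility and the TRB-specific "beyond 12/23" restriction on direct TRBV-TRBJ pairing), assuming the standard arrangement of a 12RS on the 5' side and a 23RS on the 3' side of TRBD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RS:
    """Hypothetical toy representation of a recombination signal sequence."""
    gene: str    # segment type carrying the RS: "V", "D" or "J"
    spacer: int  # spacer length in bp: 12 or 23


def follows_12_23(a: RS, b: RS) -> bool:
    """12/23 rule: one partner must carry a 12RS and the other a 23RS."""
    return {a.spacer, b.spacer} == {12, 23}


def efficient_at_trb(a: RS, b: RS) -> bool:
    """Toy TRB model: 12/23 rule plus the 'beyond 12/23' restriction,
    under which a TRBV 23RS is targeted by a TRBD 12RS but not by a TRBJ 12RS."""
    if not follows_12_23(a, b):
        return False
    if {a.gene, b.gene} == {"V", "J"}:
        return False  # 12/23 compatible, but direct V-to-J joining is inefficient
    return True


# Assumed layout: TRBD has a 12RS on its 5' side and a 23RS on its 3' side.
V_23, D5_12, D3_23, J_12 = RS("V", 23), RS("D", 12), RS("D", 23), RS("J", 12)
print(efficient_at_trb(D3_23, J_12))  # D-to-J joining
print(efficient_at_trb(V_23, D5_12))  # V-to-D joining
print(efficient_at_trb(V_23, J_12))   # V-to-J: rejected despite 12/23 compatibility
```

Because only the D-J and V-D pairings pass, the model reproduces the ordered D-to-J then V-to-DJ assembly described in the text.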
Diversity at the recombination level is further enhanced by other processes, including exonuclease digestion (trimming) of the 3' end of V genes, the 5' and 3' ends of D genes and the 5' end of J genes, imprecise joining of the cleaved genes, and the addition of non-germline nucleotides (N nucleotides) at the V-J, V-D and D-J junctions. For this reason, the product of V(D)J joining, which corresponds to the CDR3 region of the chain, is highly variable and plays the dominant role in peptide recognition. After transcription, the V(D)J sequence is spliced to the constant (C) gene.
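The junctional processes listed above can be illustrated with a toy simulation. This is a hedged sketch, not a model fitted to sheep data: the trimming and N-addition limits are arbitrary assumptions, the input strings are placeholders rather than sheep germline genes, `form_junction` is a hypothetical name, and P nucleotides are ignored.

```python
import random


def form_junction(v_end: str, d: str, j_start: str, rng: random.Random,
                  max_trim: int = 4, max_n: int = 6) -> str:
    """Toy V-D-J junction: trim the 3'-V, 5'/3'-D and 5'-J ends, then insert
    random N nucleotides at the V-D and D-J junctions (limits are arbitrary)."""

    def n_region() -> str:
        # Non-germline nucleotides drawn uniformly from A, C, G, T.
        return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_n)))

    v_trim = rng.randint(0, max_trim)
    d5_trim = rng.randint(0, max_trim)
    d3_trim = rng.randint(0, max_trim)
    j_trim = rng.randint(0, max_trim)

    v_part = v_end[:len(v_end) - v_trim]   # exonuclease trimming of the V 3' end
    d_part = d[d5_trim:len(d) - d3_trim]   # trimming of both D ends (empty if over-trimmed)
    j_part = j_start[j_trim:]              # trimming of the J 5' end
    return v_part + n_region() + d_part + n_region() + j_part


rng = random.Random(42)
# Placeholder sequences, not sheep germline genes.
for _ in range(3):
    print(form_junction("TGTGCCAGCAGC", "GGGACAGGGGGC", "AACACCGGGCAGCTC", rng))
```

Running it repeatedly shows how the same three germline segments can yield junctions of different length and composition, which is why the CDR3 is the most variable part of the chain.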
The genomic organization of the TR loci defines the resources available to generate the potential repertoire and the framework for its regulation. In most mammalian species, the TRB locus shares a common organization: a library of TRBV genes positioned at the 5' end of two tandemly aligned D-J-C gene clusters, each composed of a single TRBD gene, 6-7 TRBJ genes and one TRBC gene, followed at the 3' end by a single TRBV gene with inverted transcriptional orientation. This genomic organization is well conserved in human, mouse [6,7], rat, chimpanzee, rhesus monkey and horse. A peculiar feature of the mammalian TRB locus is the presence of two very similar TRBC genes, which differ by only a few residues in the coding region but have distinct 3'-UTR regions.
In the artiodactyl lineage, i.e., in sheep as well as in cattle and pig, a duplication event within the 3' end of the TRB locus has led to the generation of a third D-J-C cluster. The additional cluster not only increases the number of D and J genes available for somatic recombination, but also expands the distance between the enhancer (Eβ) and promoter (PDβ1) elements within the locus. Surprisingly, despite the presence of three D-J-C clusters, the nucleotide and protein sequences of all three TRBC genes are highly similar. Only four amino acid residues have been replaced in the TRBC1 gene with respect to the TRBC2 and TRBC3 genes, while the TRBC3 3'-UTR region is identical to that of the TRBC1 gene. Of these replacements, two are located at the N-terminus, one in the E β-strand and one in the FG loop, all within well-defined regions of the extracellular domain of the TRBC molecule.
To determine whether the altered genomic architecture of the ruminant TRB locus modifies the mechanisms of recombination, we investigated the β-chain repertoire in sheep. For this purpose we produced a collection of cDNAs derived from four different tissues of four animals belonging to three different sheep breeds. Because the genomic organization is known, the β-chain transcripts could be interpreted exactly. The results demonstrate not only that sheep possess a repertoire of functional TRβ genes substantially larger than that described for other mammalian species, but also that other mechanisms, such as trans-rearrangement, intra-allelic trans-splicing and DNA recombinational diversification involving the constant regions, seem to shape the β-chain repertoire considerably. However, the general paradigms of mammalian TRB regulation seem to be preserved.
Analysis of β-chain transcripts
A previous study on cloning and sequencing of the sheep TRB locus revealed that the D-J-C region is organized into three independent, tandemly aligned clusters, D-J-C cluster 3 being additional with respect to the other mammalian TRB loci. D-J-C cluster 1 contains one TRBD, six TRBJ and one TRBC gene. D-J-C cluster 3, located 2.4 kb downstream of cluster 1, includes one TRBD, five TRBJ and one TRBC gene. Finally, D-J-C cluster 2 is positioned 2.6 kb downstream of cluster 3 and contains one TRBD, seven TRBJ and one TRBC gene (fig. 1).
To evaluate the contribution of each gene cluster to the formation of the β-chain repertoire, a total of 72 clones containing rearranged V-D-J-C transcripts with a correct open reading frame were analyzed. All cDNA sequences were deposited in the EMBL database under accession numbers FM993913 to FM993984. Of these clones, 21 were derived from perinatal thymus (pSTMOS series) of a Moscia Leccese sheep, 15 from adult thymus (pSTA series) and 19 from spleen (pSMA series) of a Gentile di Puglia sheep, and 17 from peripheral blood (pSSAR series) of a Sarda Ionica sheep. The clones were obtained by RT-PCR. The 5' primer was designed on the YLCASS amino acid motif of the TRBV genes, because members of the TRBV subgroups carrying this motif seem to be the most frequently used, while the 3' primer was designed on a conserved region of the three TRBC genes.
The deduced amino acid sequences of the V-D-J regions of all 72 cDNA clones are reported in Table 1, together with the corresponding TRBC genes, grouped by tissue of origin. Only one sequence is shared between two tissues, blood (pSSAR25) and adult thymus (pSTA03). No tissue-specific gene expression was found. A total of 16 TRBJ genes were recovered among the cDNAs; thus, only one of the 17 functional TRBJ genes present in the genomic sequence (TRBJ2.6) was completely absent. Moreover, all TRBJ sequences match the corresponding genomic ones well, and the high sequence similarity observed among the different animals is consistent with a close phylogenetic relationship among sheep breeds. The TRBJ2 cluster seems to be preferentially used (38/72 = 52.8%) and, although the numbers are too low to be statistically relevant, a slightly higher usage of the TRBJ2.3 (14/38 = 36.8%) and TRBJ2.7 (10/38 = 26.3%) genes can be observed. In addition, 20 clones carry a member of the TRBJ3 cluster, with TRBJ3.4 used most frequently (9/20 = 45%), while 14 clones carry a TRBJ1 gene, without any preferential usage.
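The usage percentages above are simple ratios of clone counts, with a denominator that changes with the level of analysis: all 72 clones at the cluster level, and the cluster size for individual TRBJ genes. A minimal sketch for recomputing them; `usage_percent` is a hypothetical helper and the counts are taken from the text.

```python
from typing import Dict, Optional


def usage_percent(counts: Dict[str, int], total: Optional[int] = None,
                  digits: int = 1) -> Dict[str, float]:
    """Percentage of `total` (default: sum of counts) for each category, rounded."""
    denom = sum(counts.values()) if total is None else total
    return {name: round(100 * n / denom, digits) for name, n in counts.items()}


# Clone counts per TRBJ cluster, as given in the text (72 clones in total).
cluster_counts = {"TRBJ1": 14, "TRBJ2": 38, "TRBJ3": 20}
print(usage_percent(cluster_counts))  # {'TRBJ1': 19.4, 'TRBJ2': 52.8, 'TRBJ3': 27.8}

# Within-cluster usage takes the cluster size as the denominator.
print(usage_percent({"TRBJ2.3": 14, "TRBJ2.7": 10}, total=38))
print(usage_percent({"TRBJ3.4": 9}, total=20))
```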
Table 1. Predicted amino acid sequences of the cDNAs and length of the junctional region. The TRBD, TRBJ and TRBC gene assignments are indicated.
Three nucleotide differences at the N-terminus allow the three TRBC isotypes to be distinguished: TRBC1 differs from TRBC2 and TRBC3 by two nucleotide substitutions in the third and fourth codons, while TRBC3 (like TRBC1) is distinguishable from TRBC2 by a silent substitution at the third position of the first codon. On the basis of these criteria, the N-terminus of the TRBC portion of each cDNA sequence was analyzed, identifying a large group of cDNAs carrying the TRBC3 gene (35/72 = 48.6%). The remaining clones carry TRBC2 (25/72 = 34.7%) or TRBC1 (12/72 = 16.7%) (Table 1).
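The isotype assignment described above amounts to comparing the first codons of each constant region with reference sequences at the diagnostic positions. A minimal sketch, assuming hypothetical placeholder references: the 12-nt N-terminal strings below are NOT the sheep TRBC sequences; they were constructed only to reproduce the stated pattern (a silent change at codon 1, position 3, shared by TRBC1 and TRBC3, plus TRBC1-specific changes in codons 3 and 4). `classify_isotype` is a hypothetical name.

```python
# Hypothetical placeholder N-terminal sequences (codons 1-4), NOT real sheep data.
# They mimic the described pattern only:
#   TRBC3 vs TRBC2: silent substitution at codon 1, position 3 (GAC -> GAT)
#   TRBC1 vs TRBC3: substitutions in codons 3 and 4
REFERENCE_NTERM = {
    "TRBC2": "GACCTGAAGAAC",
    "TRBC3": "GATCTGAAGAAC",
    "TRBC1": "GATCTGAGGGAC",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_isotype(constant_region: str) -> str:
    """Assign a TRBC isotype by the closest reference over the first four codons."""
    nterm = constant_region[:12].upper()
    if len(nterm) < 12:
        raise ValueError("need at least the first four codons (12 nt)")
    scores = {iso: hamming(nterm, ref) for iso, ref in REFERENCE_NTERM.items()}
    best = min(scores.values())
    hits = [iso for iso, score in scores.items() if score == best]
    return hits[0] if len(hits) == 1 else "ambiguous"


print(classify_isotype("GATCTGAAGAACTTC"))  # TRBC3
print(classify_isotype("gatctgagggacaaa"))  # TRBC1
```

Reporting ties as "ambiguous" rather than forcing a call keeps sequences that carry only part of the diagnostic pattern visible for manual inspection.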
Determining the genes involved in CDR3 formation, and their contribution, is more complex. The CDR3β region is defined as the stretch of nucleotides running from the codon after the one encoding the cysteine at position 104 of the TRBV gene to the codon before the one encoding the phenylalanine of the TRBJ FGXG motif (http://imgt.cines.fr/). The CDR3 amino acid sequences deduced from the nucleotide sequences are heterogeneous in composition (Table 1). The mean CDR3 length was approximately the same in spleen (mean 12.3 aa, range 10-16 aa) and adult thymus (mean 12.6 aa, range 9-16 aa), but longer in blood (mean 13.9 aa, range 10-15 aa) and young thymus (mean 13.7 aa, range 10-20 aa). For comparison, the CDR3β loop is about 12.7 residues long in human peripheral blood and 11.9 residues long in mouse. A similar CDR3 length and size range was reported in thymus and peripheral blood lymphocytes of piglets (mean 13.1 aa, range 10-17 aa).
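The CDR3 boundaries defined above can be located in a deduced amino acid sequence by motif search. A minimal sketch, assuming that the conserved Cys can be found as the first `CAS` of the YLCASS motif targeted by the 5' primer and that the first `FGXG` downstream marks the TRBJ motif; `extract_cdr3` and the example sequence are hypothetical, and a production pipeline would rely on IMGT numbering rather than on this heuristic, which could be misled by an FGXG-like stretch inside the CDR3 itself.

```python
import re
from typing import Optional


def extract_cdr3(aa_seq: str) -> Optional[str]:
    """Return the residues after the conserved Cys (taken from the first 'CAS')
    up to, but excluding, the Phe of the first downstream FGXG motif."""
    cys = aa_seq.find("CAS")
    if cys == -1:
        return None
    start = cys + 1  # the CDR3 starts after the Cys
    match = re.search(r"FG.G", aa_seq[start:])
    if match is None:
        return None
    return aa_seq[start:start + match.start()]  # stop before the Phe


cdr3 = extract_cdr3("YLCASSLGQGNTLYFGSGTRL")  # hypothetical V-D-J fragment
print(cdr3, len(cdr3))
```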
For closer inspection of the CDR3s, the nucleotide sequences were excised from each cDNA sequence and analyzed in detail. In the absence of the TRBV germline sequences, the deletions at the 3' end of the TRBV genes and the nucleotide additions at the V-D junctions cannot be estimated accurately. However, comparison of the 72 V-D-J junctions downstream of the ASS motif allowed us to determine, in a substantial proportion of sequences, the probable 3' end of the TRBV gene left untrimmed by exonuclease during rearrangement (Table 1). By comparison with the TRBD genomic sequences, nucleotides in the CDR3 regions were considered to belong to a TRBD gene if they formed a stretch of at least four consecutive residues matching the TRBD1, TRBD3 or TRBD2 germline sequence. In this way the 72 sequences were grouped according to TRBD1 (fig. 2a, 36 sequences), TRBD3 (fig. 2b, 16 sequences) or TRBD2 (fig. 2c, 8 sequences) usage. The 12 sequences with no recognizable TRBD gene were grouped separately (fig. 2d). These sequences could be interpreted as direct V-J junctions; however, it is also possible that nucleotide trimming masked the original participation of a D gene in the rearrangement. In the other groups, the degree of germline nucleotide trimming at the 3' end of V, the 5' end of J, and the 5' and 3' ends of D is similar (fig. 2). Nucleotides that could not be attributed to any template sequence are considered N nucleotides. The mean length of the N-D-N region, including the D region, is 15 nt (range 6-23 nt) for the TRBD1 group (fig. 2a), 13.8 nt (range 4-22 nt) for the TRBD3 group (fig. 2b) and 16 nt (range 6-33 nt) for the TRBD2 group (fig. 2c). The mean N addition in the clones without a TRBD sequence (fig. 2d) is 8.3 nt (range 2-16 nt).
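The TRBD assignment criterion above (a stretch of at least four consecutive nucleotides identical to a germline TRBD) can be implemented as a longest-common-substring search. A minimal sketch: the germline strings below are hypothetical placeholders, not the sheep TRBD sequences, and `assign_trbd` is an illustrative name. The threshold is the four-nucleotide rule from the text; ties are reported rather than resolved.

```python
from typing import Dict, Optional

# Hypothetical placeholder germline TRBD sequences (NOT the sheep genes).
GERMLINE_D = {
    "TRBD1": "GGGACAGGG",
    "TRBD3": "TCCATTACG",
    "TRBD2": "CTTTAGCCC",
}


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest run of consecutive identical nucleotides shared
    by a and b (longest-common-substring dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def assign_trbd(cdr3_nt: str, germline: Dict[str, str], min_run: int = 4) -> Optional[str]:
    """Return the TRBD with the longest shared run if it reaches `min_run`,
    'ambiguous' on a tie, or None (candidate direct V-J junction)."""
    runs = {name: longest_common_run(cdr3_nt.upper(), seq.upper())
            for name, seq in germline.items()}
    best = max(runs.values())
    if best < min_run:
        return None
    winners = [name for name, run in runs.items() if run == best]
    return winners[0] if len(winners) == 1 else "ambiguous"


print(assign_trbd("AGCAGCTTGGGACAGGGTACAAC", GERMLINE_D))
print(assign_trbd("AGCAGCTGACCCAAC", GERMLINE_D))
```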
Notable features of the CDR3 regions are nucleotide substitutions within the D region, as well as an insertion (pSTMOS13, fig. 2b) and a deletion (pSTA12, fig. 2a) relative to the germline sequences. Although the numbers are too low to be statistically relevant, a trend towards longer CDR3s was observed in TRBD2 transcripts (mean 42.3 bp, range 27-60 bp) compared with TRBD1 (mean 40.3 bp, range 33-54 bp), TRBD3 (mean 38.5 bp, range 30-48 bp) and transcripts with no apparent TRBD (mean 36.2 bp, range 30-42 bp).
Together, these data suggest that all three TRB D-J-C clusters are used to generate functional TR β-chains in sheep, with no specific influence of any single cluster.
Figure 2. CDR3 nucleotide sequences retrieved from the cDNA clones. Sequences are shown from the codon after the conserved Cys 104 of the TRBV gene to the codon before the Phe of the TRBJ FGXG motif, and are grouped on the basis of TRBD1 (a), TRBD3 (b), TRBD2 (c) or no TRBD usage (d). Nucleotides conserved at the 3' end of the V portion are considered of TRBV genomic origin and are shown in bold upper case. Residues belonging to the different TRBJ genes, listed on the right, are also shown in bold upper case at the 3' end of each sequence. The germline sequences of the TRBD1, TRBD3 and TRBD2 genes are shown at the top of each panel. Sequences considered to contain recognizable TRBD genes (see text) are shown in lower case, with nucleotide substitutions or insertions underlined. Nucleotides that cannot be attributed to any coding element (N nucleotides) are shown in capital letters to the left and right of the TRBD regions. Numbers in the right column indicate the extent of nucleotide trimming at the 5' end of the TRBJ gene.
Analysis of the D-J-C rearrangements
Since the genomic organization of the 3' region of the sheep TRB locus is known (fig. 1), the D-J-C arrangements can be formally interpreted. Of the 60 clones with a recognizable TRBD gene, intra-cluster rearrangements account for a substantial portion (25/60 = 41.7%), with 10 TRBD1-TRBJ1, 9 TRBD3-TRBJ3 and 6 TRBD2-TRBJ2 rearrangements (Table 1). A comparable proportion (32/60 = 53.3%) can be interpreted as direct 5'-to-3' joining across clusters (inter-cluster rearrangements), with 20 TRBD1-TRBJ2, 6 TRBD1-TRBJ3 and 6 TRBD3-TRBJ2 rearrangements (Table 1). Interestingly, we also observed two TRBD2-TRBJ3 joinings (pSTMOS23 and pSTA09, italics in Table 1) and one TRBD3-TRBJ1 joining (pSSAR08, italics in Table 1). Because D-J-C cluster 2 lies downstream of D-J-C cluster 3, and D-J-C cluster 3 lies downstream of D-J-C cluster 1, within the TRB locus, these junctions can only be explained by chromosomal inversion or, more probably, by trans-rearrangement during TRB locus recombination.
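The reasoning above depends only on the order of the clusters along the chromosome (cluster 1, then cluster 3, then cluster 2). A minimal sketch that classifies each TRBD-TRBJ pair on that basis and tallies the counts given in the text; the function and label names are illustrative. The percentages correspond to the 60 clones with a recognizable TRBD gene (72 minus the 12 without one), not to all 72 clones.

```python
# 5'-to-3' order of the D-J-C clusters in the sheep TRB locus, as described in the text.
CLUSTER_ORDER = ["1", "3", "2"]


def classify_dj(d_cluster: str, j_cluster: str) -> str:
    """Classify a TRBD-TRBJ joining by the relative genomic position of its clusters."""
    pos_d = CLUSTER_ORDER.index(d_cluster)
    pos_j = CLUSTER_ORDER.index(j_cluster)
    if pos_d == pos_j:
        return "intra-cluster"
    if pos_j > pos_d:
        return "inter-cluster"  # J downstream of D: compatible with deletional cis joining
    return "trans-rearrangement or inversion"  # J upstream of D: not reachable in cis


# (TRBD cluster, TRBJ cluster) -> number of clones, as reported in the text.
OBSERVED = {
    ("1", "1"): 10, ("3", "3"): 9, ("2", "2"): 6,
    ("1", "2"): 20, ("1", "3"): 6, ("3", "2"): 6,
    ("2", "3"): 2, ("3", "1"): 1,
}

totals = {}
for (d, j), n in OBSERVED.items():
    label = classify_dj(d, j)
    totals[label] = totals.get(label, 0) + n

n_with_d = sum(totals.values())  # clones with a recognizable TRBD
for label, n in totals.items():
    print(f"{label}: {n}/{n_with_d} = {100 * n / n_with_d:.1f}%")
```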
A systematic analysis of the constant region of the transcripts also revealed multiple splice variants. Canonical splicing is present in 49/72 (68%) clones, with 10 TRBJ1-TRBC1, 17 TRBJ3-TRBC3 and 22 TRBJ2-TRBC2 transcripts (Table 1). A group of 7 clones (9.7%; 4 TRBJ1-TRBC3 and 3 TRBJ3-TRBC2) results from alternative cis-splicing. Finally, it is noteworthy that 16 clones (22.2%, bold in Table 1) carrying TRBJ2 genes showed TRBC3 or TRBC1 instead of the expected TRBC2 gene. Since the TRBC3 and TRBC1 genes are located upstream of the TRBJ2 cluster in the germline DNA, TRBJ2 joined to TRBC1 or TRBC3 sequences cannot be a cis-spliced product of a single precursor RNA. Consequently, these transcripts must be products of trans-splicing between a transcript containing TRBJ2-TRBC2 genes and a transcript containing TRBC1 or TRBC3 genes.
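The same positional logic separates the three splicing categories: a TRBC gene from the TRBJ's own cluster is canonical, a TRBC gene further downstream can be reached by alternative cis-splicing of one precursor RNA, and a TRBC gene upstream of the TRBJ requires trans-splicing. A minimal sketch with illustrative names; the split of the 16 trans-spliced clones into 14 TRBJ2-TRBC3 and 2 TRBJ2-TRBC1 is not stated in the text but is inferred from the isotype totals reported earlier (35 TRBC3 and 12 TRBC1 clones).

```python
CLUSTER_ORDER = ["1", "3", "2"]  # 5'-to-3' order of the D-J-C clusters


def classify_splice(j_cluster: str, c_cluster: str) -> str:
    """Classify a TRBJ-TRBC junction in a transcript by genomic cluster order."""
    pos_j = CLUSTER_ORDER.index(j_cluster)
    pos_c = CLUSTER_ORDER.index(c_cluster)
    if pos_j == pos_c:
        return "canonical"
    if pos_c > pos_j:
        return "alternative cis-splicing"  # C downstream: reachable within one pre-mRNA
    return "trans-splicing"  # C upstream of J: requires a second transcript


# (TRBJ cluster, TRBC gene) -> clones. The 14/2 split of the TRBJ2 trans-spliced
# clones is inferred from the TRBC3 (35) and TRBC1 (12) totals, not stated directly.
OBSERVED = {
    ("1", "1"): 10, ("3", "3"): 17, ("2", "2"): 22,
    ("1", "3"): 4, ("3", "2"): 3,
    ("2", "3"): 14, ("2", "1"): 2,
}

total_clones = sum(OBSERVED.values())
totals = {}
for (j, c), n in OBSERVED.items():
    label = classify_splice(j, c)
    totals[label] = totals.get(label, 0) + n

for label, n in totals.items():
    print(f"{label}: {n}/{total_clones} = {100 * n / total_clones:.1f}%")
```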
We excluded the possibility that these non-canonical sequences are PCR artifacts, since the crossover points are not randomly distributed, as would be expected for artifacts, but always lie at the D-J and/or J-C junction, giving rise to products of the appropriate length and sequence.
The presence of these splice variants suggests that the TRBC genes may be involved in generating the functional diversity of the TR β-chain.
Structure of the TRBC region
To complete the analysis of the TRBC domain in the cDNA clones, the entire constant portion of each transcript was extracted from the sequences and aligned by TRBC isotype for each animal and tissue.
Comparison of the 72 cDNAs revealed different sequences, distinguished by nucleotide variability at 14 positions (12 in the first exon and two in the third), resulting in six amino acid substitutions, all in the first exon and hence in the extracellular domain of the chain (fig. 3). On the basis of these variations, we observed more distinct sequences than expected. For example, five different groups of sequences were assigned to the TRBC3 gene in the young thymus of the Moscia Leccese individual, clearly more than the maximum of two allelic forms expected for a single gene in a diploid individual. To understand the origin of the additional sequences, we isolated by PCR the allelic variants of all three TRBC genes from the young thymus genomic DNA of the Moscia Leccese individual, used as a reference with respect to the others. PCR specificity was achieved by using a reverse primer that binds either the TRBC1/TRBC3 (B40) or the TRBC2 (B42) 3'-UTR sequences, together with isotype-specific forward primers complementary to a specific region upstream of the TRBC1 (CC1), TRBC3 (CC3) and TRBC2 (CC2) coding regions (see Methods). The three PCR products were sequenced and, in every case, two allelic forms were obtained for each TRBC gene (data not shown). Comparison of the genomic sequences with the corresponding young thymus cDNA sequences established that the two most abundant groups of TRBC3 sequences represent the two allelic forms of the TRBC3 gene (pink and lilac in fig. 3), while alternative splicing of the third exon and DNA recombinational diversification with the TRBC2 gene could have generated the other three groups of TRBC3 sequences (mixed color in fig. 3). Moreover, the two groups of TRBC2 cDNA sequences (green and yellow in fig. 3) perfectly matched the two allelic forms (data not shown).
Among the cDNA sequences, only one allelic form of the TRBC1 gene was recovered (italics in fig. 3), while the other TRBC1 sequence could have been generated by DNA recombinational diversification with a TRBC3 allele (mixed color).
Once the allelic variants of the three constant genes had been deduced for the other tissues, alternative splicing and recombinational diversification could also explain the excess sequences in those cases.
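The interpretation of figure 3 can be framed as a simple test on the haplotype of variable positions: a cDNA either matches one of the genomic alleles exactly, or is a mosaic whose 5' part matches one allele and whose 3' part matches another. A minimal single-breakpoint sketch; the haplotype strings are toy placeholders over the 14 variable positions, not the observed sequences, and `explain_haplotype` is hypothetical. On its own it cannot tell recombinational diversification from alternative splicing of the third exon; a switch falling between the exon 1 and exon 3 positions could reflect either process.

```python
from typing import Dict, Tuple

# Toy haplotypes over the 14 variable positions (NOT the observed sheep sequences).
ALLELES = {
    "TRBC3 allele 1": "AAAAAAAAAAAAAA",
    "TRBC3 allele 2": "AAAAGAAAAAAAAA",
    "TRBC2 allele 1": "GGGGGGGGGGGGGG",
}


def explain_haplotype(query: str, alleles: Dict[str, str]) -> Tuple[str, object]:
    """Return ('allele', name) for an exact match, ('mosaic', (left, right, breakpoint))
    if the first `breakpoint` positions match one allele and the rest another,
    or ('unexplained', None). Only the smallest consistent breakpoint is reported."""
    for name, seq in alleles.items():
        if query == seq:
            return ("allele", name)
    for bp in range(1, len(query)):
        for left, left_seq in alleles.items():
            if query[:bp] != left_seq[:bp]:
                continue
            for right, right_seq in alleles.items():
                if right != left and query[bp:] == right_seq[bp:]:
                    return ("mosaic", (left, right, bp))
    return ("unexplained", None)


print(explain_haplotype("AAAAAAAAAAAAAA", ALLELES))
print(explain_haplotype("AAAAAAAAGGGGGG", ALLELES))
```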
Figure 3. Nucleotide sequences of the TRBC isotypes derived from the cDNA clones. Only the codons containing the 14 variable nucleotide positions (12 in the first and two in the third exon, numbered from the first position of the constant region in the cDNA) are shown. The amino acids specified by each codon, and those resulting from nucleotide substitutions, are given above the codon in single-letter code. The sequences are aligned against one of the TRBC3 allelic sequences isolated from the young thymus of the Moscia Leccese individual. Positions in the other allelic form of the same gene, or in the other TRBC isotypes from the other tissues, that are identical to the reference sequence are indicated by dashes, while nucleotide substitutions are shown. The number on the left indicates the number of clones with the corresponding sequence. Each allelic form of the TRBC isotypes is identified by a color; color changes indicate recombinational diversification or alternative splicing.