
Unit 6 Assignment 1 Aligning Account Types And Privileges Or Immunities


We report the cloning and analysis of a bovine JH locus comprising a DQ52 segment, six JH segments and sequence to a 5′ H chain intronic enhancer. The contig was mapped to BTA 11 and evidence was found for rearrangement of the sixth JH segment at a low but detectable frequency. In contrast, the fourth segment present at a second copy of the bovine JH locus mapping to BTA 21 was found to rearrange at high‐frequency, forming FR4 in the majority of bovine Ig H chains. The data thus show that bovine H chains can be generated from segments at two distinct genomic locations. Further investigation should establish if rearrangement takes place at each locus or if the participating segments are brought together from different chromosomal locations by less conventional processes (for example by gene conversion or trans‐chromosomal rearrangement).

antibodies, J genes, joining, rearrangement


Studies of the structure, function and genetics of Ig from livestock animals and other veterinary species have revealed many facets of immunology that could not have been predicted from the human–murine paradigm. For example, the Igs of camels and llamas have evolved to functional independence from L chains (1) and veterinary immunologists have shown that post rearrangement processes can generate diverse H chain repertoires from small families of conserved segments or even single V genes (2).

The bovine Ig system has several properties that distinguish it from mice or humans. The H chain repertoire is founded upon the expression of a single gene family of modest size comprising segments of very limited diversity (3–5). In consequence, cattle are unable to generate Ig diversity through rearrangement, a process which underlies Ab formation in mice and humans (6). Humoral immunity is therefore reliant upon post‐rearrangement diversification, the nature of which is presently obscure (7). The length of CDR3 is also distinctive. Bovine H chains possess CDR3 sequences that are frequently long (3–5) and sometimes in excess of 50 amino acids in length (8). This arises in foetal lymphoid tissue (3,9) indicating that it is created through rearrangement of long D segments (10) rather than from antigen‐driven processes.

Here we report our characterization of the bovine JH system. The mammalian JH locus typically carries six or more segments that are utilized to varying degrees (11). For example, in humans the JH4 segment forms the fourth framework region (FR4) in ∼50% of Ig H chains (12). In cattle, bias towards a single JH segment is even more pronounced and a common FR4 sequence can be observed in a high proportion of bovine H chain cDNAs (3–5). Infrequent rearrangement of an alternative JH segment also appears possible (3). The aims of our study were therefore to define the number of JH segments present at the JH locus, to identify which segments undergo rearrangement and to seek an explanation for the bias apparent in this process. In tackling these objectives, we have identified a further distinctive property of the bovine Ig system.


Recovery of the bovine JH locus

Since bovine and ovine Igs are very similar, we used PCR to recover the main part of the JH locus with primer pairs designed from the ovine sequence (11). As template, bovine genomic DNA was prepared with a Wizard DNA purification kit (Promega, Southampton, UK) from fresh liver tissue of several individual animals, obtained from a local slaughterhouse. The tissue was stored at –80°C prior to DNA isolation. One microgram of genomic DNA was used as template in PCR with a high‐fidelity polymerase mixture (Expand HiFi, Roche, Lewes, UK) and homology primer 1 and downstream primer (Table 1). Homology primer 1 (Table 1) spanned the heptamer motif and adjacent regions of the 5′ terminal segment at the ovine JH locus, whilst downstream primer annealed to sequences ∼80 bp downstream from the 3′ terminal segment. Primers were used at final concentrations of 600 nM in a reaction buffer containing 5 mM magnesium chloride and 750 µM dNTPs. After initial denaturation at 93°C for 1.5 min, reactions were cycled 35 times through 93°C (30 sec), 55°C (1 min) and 68°C (2.5 min, extended by 5 sec for each block of 5 cycles). The 1.8 kb amplicon was isolated from agarose gels, blunted by reaction with the Klenow fragment of Escherichia coli DNA polymerase (Promega) and then ligated into pZErO2 (Invitrogen, Paisley, UK). Escherichia coli DH5α transformed with the ligation products were selected on Luria agar plates containing 35 µg/ml kanamycin and 3 mM isopropyl β‐d‐thiogalactopyranoside. Induction of the ccd gene borne on the plasmid provided efficient selection for vector containing inserts. After characterization of candidate clones by restriction analysis, inserts were sequenced using M13 forward and reverse primers and a primer‐walking strategy with internal primers 7, 8 and 9 (Table 1).
Conventional protocols for automated sequencing were used based upon Sanger chemistry (13), using Big Dye reagents (Applied Biosystems, Warrington, UK) and ABI 373 stretch and 377 instrumentation. Sequencing was carried out at the Molecular Biology Support Unit (IBLS, University of Glasgow).

Recovery of the downstream flanking regions

The primer Eµ reverse (Table 1) was designed from aligned sequences of H chain 5′ intronic enhancers including that of the sheep (GenBank accession number Z98207). FR4 primer (Table 1) carried a sequence commonly observed in the fourth framework region of bovine H chain cDNA. These primers were used in PCR with bovine genomic DNA. The 900 bp product was isolated from agarose gels, blunted and phosphorylated by concurrent reaction with Klenow and T4 polynucleotide kinase (Promega). The reaction was conducted at 37°C for 40 min in a 50 µl volume containing about 1 µg of amplicon, 1 mM ATP, 35 µM dNTPs, 20 mM magnesium chloride, 5 mM DTT and 80 µM Tris pH 7.6. Ten units of kinase and five units of Klenow were used. After heat inactivation and precipitation, the DNA was ligated into dephosphorylated SmaI‐cut pUC18 (Amersham Biosciences, Little Chalfont, UK) and transformed into E. coli. Recombinant clones were identified by restriction analysis and plasmid DNA was sequenced using M13 forward and reverse primers.

Recovery of the upstream flanking regions

Initially, a primer was designed from alignment of human, mouse, rabbit and shrew DQ52 sequences (14). PCR with this oligonucleotide and primers for the bovine JH locus was unsuccessful. Therefore, a lambda clone carrying bovine Cµ exons [clone 15 (15)] was obtained from Professor K. Knight, Stritch School of Medicine, Loyola University, Chicago. Comparison of the published characterization of this clone with emerging sequence of the JH locus suggested that a limited upstream stretch might be recoverable. Once the orientation of the insert had been established, PCR was carried out with primers against the lambda right arm and the JH locus (λ right reverse and primer 8 reverse, respectively; Table 1). The 1.2 kb amplicon was blunted and phosphorylated for ligation into pUC18 as described above. Sequencing was carried out with M13 forward and reverse primers, a lambda‐specific primer designed to anneal close to the bovine insert (λ right reverse 2) and primer 10 reverse (Table 1).

Chromosomal localization of a JH locus

It is known that a duplication of the IgM locus exists on BTA11 (16). PCR was carried out with primers Cµ forward and Cµ reverse (Table 1) using DNA from a lambda library prepared specifically from bovine chromosome 11 (17). This reaction was also carried out with BAC 944D11, a clone found to carry the JH locus isolated in our studies (see Results), and BAC 66R4C11, a clone carrying a second JH locus (18,19). Amplicons were cloned into pCR‐TOPO (Invitrogen) following the manufacturer’s protocol. Several independent clones were sequenced with M13 forward and reverse primers and compared with each other and depositions at GenBank.

Identification of the dominant rearranging segment

Genomic DNA prepared from isolated peripheral B cells was provided by Dr S. Stephens, Institute of Animal Health, Compton, UK. B cell DNA was used in PCR with VHF and downstream primer (Table 1) to isolate rearranged segments and adjacent, downstream regions of the JH locus. VHF annealed to a leader sequence common to all members of the expressed VH gene family (5). The downstream primer was complementary to sequences ∼80 bp from the segment at the 3′ terminus of the bovine JH locus. This strategy was free of assumptions regarding which JH segment(s) might undergo rearrangement. Four prominent products from the reaction were isolated from agarose gels and cloned separately into pCR4‐TOPO. Three of the four products proved to be irrelevant, but an amplicon of 1.5 kb was informative. Following this analysis, further PCRs were conducted with genomic DNA from leukocytes of the same donor animal. The JH loci were recovered with the following primer pairs: internal primers 6 and 8; internal primer 11 and insert reverse; insert forward and downstream primer (see below for further explanation).

Locus‐specific PCR

To check if copies of the JH loci identified in these studies represented variants of the same allele, locus‐specific reactions were designed using BAC clones 944D11 and 66R4C11 as control templates. PCR was carried out with primers lo forward and downstream primer, and with internal primer 11 and hi reverse. With an annealing temperature of 58°C, the former reaction was specific for the locus carried on 944D11 and the latter reaction only recovered sequence from 66R4C11. The reactions were then carried out with genomic DNA isolated from sperm from three individual animals from three breeds (South Devon, Belgian Blue, Limousin). Sperm was obtained from Lindsay’s AI, Carlisle, UK. DNA was isolated using QIAamp reagents and the manufacturer’s modified protocols (Qiagen, Crawley, UK).


Recovery and sequencing of a bovine JH locus

Following preliminary work that showed close similarity between the JH loci of cattle and sheep, the majority of a bovine locus was recovered from liver genomic DNA as a single 1.8 kb amplicon by optimized PCR. The product was cloned and sequence was gathered from its termini using M13 forward and reverse primers. Internal primers were designed to complete the characterization of the insert. Our strategy of cloning by homology was extended to recover the downstream flanking region with a primer designed from alignment of H chain 5′ intronic enhancer sequences. Once reaction conditions had been optimized, a single product of 900 bp was obtained which was cloned and sequenced. Attempts to extend the analysis upstream from the JH locus with a primer designed by alignment of DQ52 sequences failed. However, this region was successfully recovered on a 1.2 kb amplicon from a lambda clone provided by Professor K. Knight. Overall, the resulting data formed a contiguous sequence of 3282 bp starting about 70 bp upstream from a bovine DQ52 segment, through the JH locus to the 5′ intronic enhancer region lying ∼570 bp downstream (GenBank accession number AY149283). Although the sequence of the locus was assembled in several sections from different sources, its existence as a contiguous sequence was confirmed by successful recovery from a BAC clone (see below) and bovine genomic DNA (data not shown).

The contig was analysed by comparison with the ovine JH locus (11), by searching for RSS and by checking for FR4 sequences observed in H chain cDNA. As predicted from preliminary studies, the overall homology between the bovine locus and its ovine homologue was high (89% identity). Nucleotide differences were scattered throughout the 2.7 kb overlapping region with four areas where insertion or deletion of between 11 and 27 nt could be detected. Analysis by BLAST (20) and NIX (http://www.hgmp.mrc.ac.uk) did not suggest that these features were of functional significance. For reasons that will become clear, this contig is designated the JHlow locus throughout the remainder of our report.

Identification of six JH segments

Analysis of the sequence of the JHlow locus revealed six segments with spacing and organization similar to that observed in the sheep. The segments were numbered according to their order from the DQ52 proximal to the distal regions of the contig. In summary, they spanned a region of ∼1.8 kb with inter‐segment distances ranging from 130 to 500 bp. The nucleotide sequences of these segments are presented in Fig. 1. RSS motifs could be identified immediately upstream of each segment and matched to a greater or lesser extent the Ig consensus. Generally, nonamer motifs and the spacer regions were variable in sequence, the spacer ranging from 20 (JH5) to 23 bp (JH4). The heptamer sequences showed better match to the consensus, exceptions being those adjacent to the second and third segments. Potential reading frames carried on each segment were aligned and translated around the W codon which characteristically forms the first amino acid of FR4 in cattle and many other vertebrates, and the two S codons at the carboxy‐terminus of this region. The amino acid sequences predicted by these criteria are also shown in Fig. 1.

Segments JH1, JH2, JH3 and JH5 were judged likely to be pseudogenes. For JH1, the nonamer and splice site for RNA processing departed from consensus (21). For JH2, the splice site for RNA processing was also aberrant. The heptamer adjacent to JH3 departed markedly from consensus. The nonamer associated with JH5 departed from the consensus, the RSS spacer was short (20 nt) and the RNA splice site was defective. Functionality of the remaining two segments was assessed by comparison with the sheep JH locus and bovine FR4 sequences observed in Ig H chain cDNA.
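The RSS criteria applied above (departure of the heptamer or nonamer from consensus, and spacer length) can be expressed as a short scoring routine. The following is a minimal sketch only and is not the procedure used in this study: the consensus motifs are the widely quoted Ig heptamer and nonamer, the expected spacer of 23 bp with a tolerance of 1 bp is an assumption, and the function names and example inputs are hypothetical. Motifs must be supplied in heptamer‐first orientation; for an RSS read upstream of a JH segment, the helper reverse_complement can be used first.

```python
# Widely quoted consensus motifs for Ig recombination signal sequences.
# These are assumptions for illustration, not values taken from Fig. 1.
HEPTAMER_CONSENSUS = "CACAGTG"
NONAMER_CONSENSUS = "ACAAAAACC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatches(motif, consensus):
    """Count positions at which a motif differs from a consensus of equal length."""
    if len(motif) != len(consensus):
        raise ValueError("motif and consensus must have the same length")
    return sum(a != b for a, b in zip(motif.upper(), consensus))


def assess_rss(heptamer, spacer, nonamer, expected_spacer=23, tolerance=1):
    """Summarise how far an RSS departs from consensus.

    expected_spacer and tolerance are hypothetical defaults chosen for
    illustration; they are not thresholds reported in the text.
    """
    return {
        "heptamer_mismatches": mismatches(heptamer, HEPTAMER_CONSENSUS),
        "nonamer_mismatches": mismatches(nonamer, NONAMER_CONSENSUS),
        "spacer_length": len(spacer),
        "spacer_ok": abs(len(spacer) - expected_spacer) <= tolerance,
    }


if __name__ == "__main__":
    # Hypothetical input: perfect heptamer, 20 nt spacer, one nonamer mismatch.
    print(assess_rss("CACAGTG", "N" * 20, "ACAAGAACC"))
```

A report of this kind only flags candidates; the assignment of pseudogene status in the text also relied on the RNA splice sites, which this sketch does not examine.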

Comparison of JH4 with its homolog at the ovine locus (termed JH1) revealed striking similarity (Fig. 1). In sheep, this segment is identical to the majority of FR4 sequences in ovine H chain cDNA suggesting that it undergoes high‐frequency rearrangement and expression (11). Specifically, there were only three nucleotide differences in the RSS (two substitutions, one gap) and two differences 5′ to the conserved W codon (one gap, one substitution) which would lead to amino acid substitutions in this region. Further differences between the bovine and ovine segments occurred in the main part of the reading frame where five nucleotide differences gave rise to three amino acid substitutions. In cattle, a single FR4 sequence is also predominant (Fig. 1); in one study, Berens et al. (3) isolated this sequence repeatedly from foetal splenic cDNA, suggesting minimal alteration by combinatorial imprecision or other processes of diversification. It is clear from Fig. 1 that the bovine JH4 segment at the JHlow locus could not form this FR4 sequence. Specifically, five nucleotide differences led to three amino acid changes upstream from the W codon, and six nucleotide differences created four amino acid substitutions in the main part of the reading frame. Focusing on the latter features, a CAA (Q) codon in the dominant sequence was represented in JH4 as CCA (P), and a CTC (L) codon was mismatched by ATC (I). Finally a run of four A residues in JH4 spanned Q and N codons (CAA AAC) whereas in IgH cDNA, TGGT contributed to L and V codons (CTG GTC).

Of the six segments present at the JHlow locus, the data indicated that just one was expressed directly: JH6 matched perfectly an alternative FR4 sequence identified by Berens et al. (3) in foetal and adult Ig cDNA (Fig. 1). JH6 was also near‐identical to its equivalent at the ovine locus (Fig. 1).

A bovine DQ52 segment departs from the vertebrate consensus

A bovine DQ52 homologue was located upstream from the JHlow locus (Fig. 2). The coding sequence was flanked by RSS and, at 14 nt, was ∼27% longer than that observed in other species. Alignment with DQ52 segments from llama, humans, mouse, rabbit and house shrew (Suncus murinus) revealed marked differences, explaining the failure to recover the segment from bovine genomic DNA with consensus primers. Although features of the bovine upstream RSS suggested potential functionality, translation of the coding sequence did not reveal the frequency of Y and C codons that are characteristic of bovine H chain CDR3 (3–5,8–10). It therefore seemed unlikely that the segment was utilized with significant frequency during foetal or adult life. This was reinforced by examination of the downstream RSS in which there was a marked departure from consensus in the heptamer motif (Fig. 2).

Chromosomal assignment of the JHlow locus

BAC clone 944D11 was isolated from a genomic library (22) and PCR was used as described earlier to recover the JH locus present on the insert. Sequencing confirmed that 944D11 carried the JHlow locus as described above. Further analysis of 944D11 and BAC clone 355H4, a standard marker for the bovine Ig H chain locus on BTA21 (23,24), revealed overlapping, identical sequences unconnected with the Ig system. This suggested two possibilities: that the JHlow locus mapped to the main Ig H chain locus on chromosome 21; or that the insert on BAC 944D11 was derived from a large translocation of sequence from BTA21 to another region of the bovine genome. To investigate these ideas, part of the Cµ locus was recovered as an amplicon of ∼500 bp from a BTA11‐specific library, BAC 944D11 and BAC 66R4C11. BAC 66R4C11 carries a second JH locus that can be unequivocally anchored to BTA21 through the presence of Ig constant region genes (18,19). The sequences of amplicons from 944D11 were essentially identical to those from the BTA11‐specific library whereas products from 66R4C11 consistently differed at the 3′ terminus of Cµ exon 1, through the intron and into Cµ exon 2 (Fig. 3). Interestingly, the amplicon from 944D11 closely matched GenBank file U63637 (Fig. 3), a deposition made from the original description of bovine IgM constant region sequences (25). The sequence deposited as U63637 was originally isolated from Knight’s lambda clone 15 (15), used in our study to recover flanking sequence upstream from the JHlow locus. Data recovered from 66R4C11 matched GenBank file AY230207, a result that would have been predicted as they have a common origin (18). This enabled the elimination of PCR error rate as a significant confounding factor. Taken together, the results anchor the JHlow locus and lambda clone 15 to BTA11.

Analysis of B cell genomic DNA

In order to identify the JH segment that undergoes frequent rearrangement, B cell genomic DNA was used in a PCR with primers against the VH segment and the region downstream of the JH locus. A 1.5 kb amplicon was obtained from the reaction, cloned and DNA from multiple transformants was sequenced. At one terminus of each insert, part of the VH leader could be detected, followed by the VH intron and sequence through FR1 into CDR1. At the other terminus, sequence was near‐identical to the downstream flanking region at the JHlow locus. These features confirmed that the amplicons originated from rearranged genomic DNA (data not shown). In all clones, alignment of the sequence with the JHlow locus showed rearrangement of a segment similar to JH4 had taken place. Sequences downstream of the rearranged segment showed excellent matches to the inter‐genic regions present at the JHlow locus, and segments similar but not identical to JH5 and JH6 were identified. Given that the B cells were isolated from an adult animal, it was not surprising that all clones were unique since antigen‐driven processes would likely have introduced nucleotide substitutions during the development of individual B lymphocytes.

The regions encoding FR4 from six independent clones are shown in Fig. 4(A). C to A and A to C substitutions to the sequence observed at the JHlow locus were observed, modifying codons for P and I to Q and L. Adjacent, the AAAA motif noted earlier at the JHlow locus was replaced in B cell DNA with TGGT. This substituted codons for Q and N with L and V. These substitutions matched the sequence observed commonly in FR4 of Ig H chain cDNA. Additional substitutions were also detected. Some were silent, others generated amino acid substitutions at a G codon (Fig. 4A).
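The codon changes described above can be checked by direct translation. The sketch below uses the standard genetic code and only the codons quoted in the text (CCA, ATC, CAA AAC and their B cell counterparts); the function and variable names are hypothetical, and the full FR4 sequences of Fig. 4(A) are not reproduced here.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string; length must be a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    # (codons at the JHlow locus, codons in B cell DNA), as quoted in the text.
    comparisons = [
        ("CCA", "CAA"),        # C to A substitution
        ("ATC", "CTC"),        # A to C substitution
        ("CAAAAC", "CTGGTC"),  # AAAA motif replaced by TGGT
    ]
    for jh_low, b_cell in comparisons:
        print(jh_low, translate(jh_low), "->", b_cell, translate(b_cell))
```

Run as a script, this reproduces the P to Q, I to L and QN to LV changes stated in the text.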

Two other features of amplicons from bovine B cell DNA were significant. The first was the appearance of an insert of 21 bp immediately upstream of the JH5 segment (Fig. 4B). This was well conserved and very similar to a sequence from the equivalent region of the ovine JH locus. The second feature was a series of nucleotide substitutions in the JH6 segment that gave rise to changes in the coding sequence (Fig. 4C). The most significant of these alterations was a G to C substitution that created a radical W to C alteration at the start of the coding sequence for FR4.

We addressed the possibility that the genotype of the B cell donor differed in some fundamental way from animals previously sampled. To do this, the majority of the JHlow locus spanning segments JH3, JH4 and JH5 was recovered from leukocytic DNA of the donor animal by PCR with internal primers 8 and 6. No significant differences with the JHlow locus were detectable (Fig. 4A and B).

Duplication of the bovine JH locus

To resolve the mismatch between the JHlow locus and rearrangement products recovered from B cells, we sought evidence for a second JH locus that was selected for high‐frequency rearrangement. We designated this JHhigh. Primers were designed against the insertion noted upstream of JH5 in rearranged DNA but absent from the JHlow locus (Fig. 4B; Table 1). PCR was carried out on liver genomic DNA with internal primer 11 and insert reverse to recover a product of ∼480 bp. A fragment of 750 bp was generated with insert forward and downstream primer. Multiple clones were sequenced (Fig. 5) and compared with the JHlow locus (Fig. 1), the bovine JH locus recovered from BAC 66R4C11 by Zhao et al. (18) (GenBank accession number AY158087), and B cell amplicons recovered earlier (Fig. 4). Alignment of the JH4 segments was particularly striking (Fig. 5A). Limited differences between JHlow and JHhigh were seen in the RSS; the substitution of G for T improved the match of the nonamer motif with consensus. In the reading frame for the JH4 segment, a nucleotide insertion and 10 substitutions were observed at JHhigh compared with JHlow. Rearrangement of JH4 from JHhigh would therefore create directly the FR4 sequence recovered from B cell genomic DNA and commonly observed in bovine Ig H chains (3–5). At the JHhigh locus, the sequence from the JH4 segment to the insertion matched precisely the corresponding region of AY158087, the JH locus recovered from BAC 66R4C11 by Zhao et al. (18) (Fig. 5A).

Downstream from the JH4 segment, a copy of the JH5 segment was present at the JHhigh locus that was almost identical in sequence to that at JHlow (Fig. 5B). A 20 nt deletion was detected ∼350 bp downstream from this feature. Beyond, a JH6 segment was present. RSS sequences appeared identical at the two loci but there were significant differences in the reading frame at JHhigh: notably, the W codon was replaced with TGC (C). Comparison again showed that bases present in JH6 at JHhigh consistently appeared in sequences recovered from B cells. Throughout, the match to the sequence characterized by Zhao et al. (18) was striking.

Thus, there was no evidence to indicate that the JH4 segment present at JHlow and mapped to BTA11 underwent rearrangement, although rearrangement of JH6 could be detected. Rather, the majority of bovine Ig H chains originated from rearrangement of the fourth segment at the JHhigh locus located on BTA21. It seemed that the only feature of the sixth segment at this location to exclude it from rearrangement was the radical change that it would bring to the sequence of FR4. As would be predicted from their chromosomal assignments, JHhigh and JHlow could be recovered from bovine genomic DNA by locus‐specific PCR with internal primer 11 and hi reverse, and with lo forward and downstream primer, respectively. This result was obtained with samples from three individual animals from the breeds South Devon, Belgian Blue and Limousin (data not shown), excluding the possibility that the loci were variants of the same allele.


The Ig H chains of cattle and sheep are very similar at the nucleotide and protein levels, both being founded upon the rearrangement and expression of single VH gene families and dominant JH segments (3–5,11,26). This study exploited these similarities to recover and characterize a copy of the bovine JH locus (GenBank accession number AY149283) that we have termed JHlow since it undergoes rearrangement with low but detectable frequency. The sixth JH segment present at the locus formed the substrate for this process. The large majority of bovine Ig H chains carry an FR4 sequence derived from a second JH segment (3–5,8,9) but this could not be detected at the JHlow locus. This observation marks an important distinction between the Ig systems of cattle and sheep, animals that are otherwise strikingly similar in immunological terms.

This finding was unexpected and we have checked carefully that it did not arise from an artefact of the isolation strategy. Using PCR, the JHlow locus could be isolated reproducibly from multiple individual animals of different breeds, arguing against allelic variation within the bovine population or the existence of multiple allotypes. It was also recovered from a lambda clone carrying sequence to beyond the IgM constant region exons (15) and from a BAC clone carrying Cµ exons. Linkage with a Cµ locus has enabled assignment of JHlow to bovine BTA11, another important finding since the main bovine Ig H chain locus is located on chromosome 21 (23). Although trans‐chromosomal switching of Ig class has been documented in rabbits (27–29) and mice (30), translocation of the antigen‐receptor loci to other chromosomes is more often associated with their exclusion from the rearrangement process (31–33). In some cases, duplication on the same chromosome can have the same effect (34). The ability of the bovine Ig system to recruit the JHlow locus for rearrangement, albeit at low frequency, therefore seems a highly unusual property.

By recovering the rearranged H chain locus from B cell genomic DNA, it was apparent that the JH segment selected for rearrangement lay on a JH locus similar but not identical to JHlow. This was designated JHhigh and was found to be identical to the JH locus recently reported by Zhao et al. (18,19). These authors isolated the locus from BAC 66R4C11, which carries Cµ, Cδ, Cγ3 and Cγ1 constant region genes and which, through overlap with other clones carrying the remainder of the bovine Ig constant region locus, can be unequivocally mapped to BTA21.

Formation of the bovine H chain would thus seem possible by two pathways. The first pathway establishes the majority of the bovine H chain repertoire using the JHhigh locus on BTA21 and the VH, D and constant region exons known to be present at this chromosome (23,24,35). The second involves low‐frequency rearrangement of the sixth segment at the JHlow locus on BTA11. Ig H chain products of this pathway carry the FR4 sequence typified by foetal clone F52M (3) in which R residues substitute for the more commonly observed residues Q and L. For the moment, it remains unknown if VH and D segments are present on this chromosome. If they prove to be absent, less conventional mechanisms [e.g. trans‐chromosomal rearrangement (27–29)] may have to be sought.


CDR—complementarity‐determining region

FR—framework region

RSS—recombination signal sequence


The authors are grateful to Professor K. Knight (Loyola University, Chicago), Professor A. Ponce de León (University of Minnesota), Dr S. Stephens (Institute for Animal Health, Compton, UK) and Dr F. Piumi (INRA, Jouy‐en‐Josas, France) for their support, advice and provision of materials. This work was supported by a scholarship to A.H. from the Ministry of Science, Research and Technology, Islamic Republic of Iran, and funds from the Research Committee, Institute of Biomedical and Life Sciences, University of Glasgow.

Fig. 1. Aligned nucleotide and protein coding sequences of bovine JH segments present at the JHlow locus. The nucleotide sequences of six bovine segments (JH1 to JH6) are aligned using features of the RSS indicated at the top, and the TGG codon for W that marks the start of FR4. Reading frames are shown over each nucleotide sequence, using the reading frame for the W codon. JH4 is aligned with a sequence commonly observed in bovine H chain cDNA (clone F27M) (3). JH4 is also aligned with its ovine homolog (ovine JH1) (11). JH6 is aligned with an alternative sequence detected in bovine H chain cDNA (clone F52M) (3) and its ovine homolog [ovine JH2; (11)].


Error and bias correction with unique antibody identifiers

Since, by definition, the naïve B cell subset does not contain any clonally related sequences, we focused our lineage analysis on the IgG+ memory B cell subset. We isolated IgG+ memory B cells from 8 healthy human donors and sequenced the encoded antibody heavy chains (Supplementary Table 1). To minimize sequencing errors and amplification bias, we adapted a previously used barcoding strategy (17) that involves labeling transcripts with unique random sequence tags. Based on the estimated number of input B cells (15,000–50,000 per sample), we selected a random tag length of 20 nucleotides, which theoretically produces 4^20 (roughly one trillion) unique antibody identifiers (UAIDs) and provides a very high likelihood (97.2%) that each antibody transcript is uniquely labeled.
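The size of the tag space and the chance that every labeled molecule receives a distinct tag can be estimated with a few lines of arithmetic. This is a rough sketch under stated assumptions: tags are taken to be drawn uniformly at random, the standard birthday approximation is used, and the function names are hypothetical. It does not attempt to reproduce the 97.2% figure quoted above, because that depends on the number of labeled transcripts per sample, which is not stated here; the number of input cells is used below purely as a stand-in.

```python
import math


def tag_space(tag_length, alphabet_size=4):
    """Number of distinct random tags of the given length (4 bases by default)."""
    return alphabet_size ** tag_length


def prob_all_unique(n_molecules, n_tags):
    """Birthday approximation to the probability that n uniformly drawn tags are all distinct."""
    return math.exp(-n_molecules * (n_molecules - 1) / (2 * n_tags))


if __name__ == "__main__":
    n_tags = tag_space(20)
    print(f"20-nt tag space: {n_tags:,}")
    # Assumption: molecule counts equal to the stated range of input B cells.
    for n in (15_000, 50_000):
        print(n, f"{prob_all_unique(n, n_tags):.5f}")
```

Because the collision probability grows roughly with the square of the number of labeled molecules, the estimate is sensitive to how many transcripts each cell contributes.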

To investigate the degree to which amplification biases affect the sequenced antibody repertoire composition, we grouped sequences by UAID and determined the size of each UAID group (Fig. 1A). We discovered several UAID groups containing over 1000 sequences, indicating that amplification bias had skewed the representation of these transcripts by multiple orders of magnitude. Compounding the problem, sequencing errors are able to convincingly mimic the natural antibody maturation process (9). To examine the effect of a lack of error correction on the generation of accurate clonal lineages, lineage assignments were made using raw sequences without UAID correction and a single lineage was selected from each donor (Fig. 1B). When UAID correction was then applied, each of the lineages in Fig. 1B was found to originate from a single antibody transcript. In other words, all of the diversity contained in these lineages is due to sequencing error, not the antibody maturation process; sequencing errors and disproportionate amplification have combined to produce artifactual ‘lineages’ that contain no information about naturally occurring antibody diversification.
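Grouping reads by UAID, measuring group sizes and collapsing each group to one sequence can be sketched as follows. The text does not state how reads within a UAID group were collapsed; the per-position majority vote used here is an assumption, as are the function names, the requirement that reads in a group have equal length, and the toy reads.

```python
from collections import Counter, defaultdict


def group_by_uaid(reads):
    """Group (uaid, sequence) pairs into a dict mapping each UAID to its sequences."""
    groups = defaultdict(list)
    for uaid, seq in reads:
        groups[uaid].append(seq)
    return dict(groups)


def consensus(seqs):
    """Majority-vote consensus of equal-length sequences (an assumed collapsing rule)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences in a UAID group must have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


if __name__ == "__main__":
    # Toy reads: one transcript amplified three times (one read carries an
    # error) and a second transcript seen once.
    reads = [
        ("AAAC", "ACGTACGT"),
        ("AAAC", "ACGTACGT"),
        ("AAAC", "ACGTACGA"),
        ("GGTT", "TTGCATGC"),
    ]
    groups = group_by_uaid(reads)
    for uaid, seqs in groups.items():
        print(uaid, len(seqs), consensus(seqs))
```

In this toy case the three reads sharing a UAID collapse to a single sequence, so the erroneous read no longer appears as a separate variant.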

Clonal lineage assignment

To permit unseeded lineage analysis of corrected antibody sequences, we developed the Clonify algorithm, which is shown schematically in Fig. 2A. Briefly, a distance matrix is calculated using an antibody-specific distance metric for each pair of sequences and the sequences are hierarchically clustered into lineages. To determine an appropriate clustering threshold, we used Clonify to calculate distance scores for 1000 UAID-corrected sequences from each of the 8 donors. Scores were binned and the frequency of each bin was calculated. There was a distinct divide in the scoring frequencies, indicating clear separation between the scores of related and unrelated sequences (Fig. 2B). We next sought to test the accuracy of Clonify’s lineage assignments. Unfortunately, the sort of large, annotated datasets of clonal lineages that would allow robust accuracy assessment are not available. Instead, we assayed the accuracy of the Clonify algorithm using three resources: a relatively small dataset of known clonally-related antibody sequences, larger datasets of presumably clonally-related antibody sequences that were identified using a seeded lineage assignment algorithm (19), and several large NGS datasets from normal human donors for which antibody clonal relationships are unknown.
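The general shape of the clustering step (pairwise distances followed by hierarchical clustering cut at a threshold) can be illustrated with a toy example. This is not the Clonify implementation: the antibody-specific distance metric is not described here, so a length-normalized edit distance stands in for it, single linkage is an assumed linkage rule, and the threshold, function names and example sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def toy_distance(a, b):
    """Stand-in metric: edit distance divided by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster(seqs, threshold):
    """Single-linkage clustering: join any pair whose distance is <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if toy_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(seqs):
        groups.setdefault(find(i), []).append(seq)
    # Largest clusters first, ties broken by content for a stable order.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    junctions = ["CARDYW", "CARDFW", "CARDYY", "CTTTTTTTW"]  # hypothetical
    for lineage in cluster(junctions, threshold=0.2):
        print(lineage)
```

Joining every pair below the threshold is equivalent to cutting a single-linkage dendrogram at that height, which is why a clear gap between related and unrelated scores matters when the threshold is chosen.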

We first assembled a panel of HIV broadly neutralizing antibody (bnAb) sequences that contains multiple groups of known clonally related sequences (10,20–28). Overwhelmingly, Clonify correctly grouped sequences into lineages (Fig. 2C) and appropriately segregated singletons (sequences without known clonal relatives in the bnAb dataset). Notably, although the bnAb dataset contains several genetically similar ‘VRC01-class’ lineages (10,20,21) from multiple donors, Clonify correctly assigns these lineages. Further, the two cases in which Clonify made putatively incorrect assignments, namely excluding PGT153 from the PGT151 lineage (26) and assigning PGT130/131 and PGT125–128 to separate lineages (23), involve the lineages for which evidence of a clonal relationship is weakest. In the case of the PGT151 lineage, PGT153 has very low HCDR3 homology to other PGT151 lineage members (39–46%; Figure S2) and shares very few somatic mutations with them (26). In fact, the variable region of PGT153 is so distinct that both IMGT and IgBLAST assign it a different DH gene and VH allele from the rest of the PGT151 family (Table S3). PGT130/131 and PGT125–128 are a similar case, with substantial divergence in the HCDR3 and minimal shared somatic mutation (23). If these sequences are true somatic relatives, they appear to have diverged from the rest of the lineage very early and matured independently.

We next compared Clonify to a previously published seeded lineage assignment algorithm (19). We ran Clonify on two datasets of sequences that the seeded assignment algorithm had identified as clonally related to the HIV bnAbs PGT141 or PGV04. For both datasets, we found that Clonify closely reproduced the results of the seeded lineage assignment algorithm. In the case of the PGT141 lineage, 274 putative PGT141-like sequences were identified by the seeded algorithm and 259 of those sequences (94.5%) were assigned to a single lineage by Clonify. Lineage sizes and representative junctions for each Clonify-assigned lineage are shown in Table S4. Similar results were seen for the PGV04 lineage. Of 4267 putative PGV04-like sequences identified by the seeded algorithm, 4002 were assigned to a single lineage by Clonify (93.8%; Table S5).

Since clonal lineages, by their most literal definition, must originate from a single naive B cell, true lineages should not be shared between individuals. It follows, then, that an accurate lineage assignment algorithm, when given a pool of sequences from multiple donors, should build clonal lineages consisting of sequences exclusively from a single donor. We randomly selected two datasets, containing either 1000 or 7000 sequences, from each of our donors, assigned the sequences from each dataset to clonal lineages, and determined the frequency of sequences that belonged to a lineage with at least two members. As expected, due to the deeper sampling in the 7000 sequence datasets, the level of clonality was significantly higher in the larger dataset in each instance (Fig. 3A). We then made 8 leave-one-out cross-validation (LOOCV) sequence pools containing 1000 sequences from 7 of the 8 donors such that each of the 8 donors was left out of one of the sequence pools. Analysis of the LOOCV sequence pools provides a simple test for lineage assignment accuracy: if, at one extreme, the lineage assignment algorithm makes no distinction between sequences from multiple donors, the increased depth provided in the LOOCV pool should result in a frequency of clonally-related sequences equivalent to the single-donor pool of 7000 sequences. At the opposite extreme, if assigned lineages exclusively contain sequences from a single donor, the frequency of clonally related sequences in the LOOCV pool will be equivalent to the frequency seen in the single-donor datasets containing 1000 sequences. As shown in Fig. 3A, the frequency of clonally related sequences in the LOOCV pools is statistically indistinguishable from the single-donor pools containing just 1000 sequences, indicating a high level of algorithmic distinction between sequences from different donors.
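
The leave-one-out pooling described above can be sketched as follows. The dictionary layout, the function name `loocv_pools`, the fixed random seed and the toy sequence labels are all assumptions for illustration; only the pooling logic (sample a fixed number of sequences from every donor except one) follows the text.

```python
import random


def loocv_pools(seqs_by_donor, per_donor, seed=0):
    """Build one pool per donor, each omitting that donor.

    `seqs_by_donor` maps a donor id to a list of sequences. Every pool
    holds `per_donor` randomly sampled sequences from each remaining
    donor, tagged with the donor of origin.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    pools = {}
    for left_out in seqs_by_donor:
        pool = []
        for donor, seqs in seqs_by_donor.items():
            if donor != left_out:
                pool.extend((donor, s) for s in rng.sample(seqs, per_donor))
        pools[left_out] = pool
    return pools


# Toy input: 8 donors with 20 placeholder sequences each.
seqs_by_donor = {f"donor{i}": [f"d{i}_s{j}" for j in range(20)]
                 for i in range(1, 9)}
pools = loocv_pools(seqs_by_donor, per_donor=5)
```

Each of the 8 pools then contains 7 × 5 = 35 sequences in the toy case, mirroring the 7 × 1000 = 7000 sequences per pool in the experiment.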

To more precisely calculate the frequency of assignments to lineages containing sequences from multiple donors, we iteratively selected increasing numbers of sequences from each of the single-donor sequence sets. The sequences were pooled into multi-donor sequence sets, lineage assignments were made, and the frequencies of ‘correct’ assignments (sequences belonging to lineages containing sequences from a single donor) and ‘incorrect’ assignments were calculated (Fig. 3C). Even using this overly strict definition of clonality, the vast majority of sequences (>97%) are ‘correctly’ assigned. Moreover, among incorrectly assigned sequences, we found an HCDR3 length distribution that skewed toward short HCDR3s (Fig. 3B). Since there is less junctional diversity among sequences with short HCDR3s, multiple donors are much more likely to coincidentally express very similar antibodies with short HCDR3s. Therefore, although a strict definition of clonal lineages (one in which cross-donor lineages are impossible) implies that every lineage containing sequences from multiple donors results from incorrect assignment, the low diversity of the short HCDR3 population means that some inter-donor sequences are genetically indistinguishable from clonally related sequences.
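
The ‘correct’ versus ‘incorrect’ bookkeeping described above reduces to a small calculation, sketched here in Python. The `(donor_id, lineage_id)` record layout and the toy labels are assumptions for illustration.

```python
from collections import defaultdict


def cross_donor_fraction(assignments):
    """Fraction of sequences in lineages that span more than one donor.

    `assignments` is a list of (donor_id, lineage_id) pairs, one per
    sequence in the multi-donor pool.
    """
    donors_by_lineage = defaultdict(set)
    for donor, lineage in assignments:
        donors_by_lineage[lineage].add(donor)
    # A sequence counts as 'incorrect' if its lineage mixes donors.
    mixed = sum(1 for _, lineage in assignments
                if len(donors_by_lineage[lineage]) > 1)
    return mixed / len(assignments)


# Toy pool: lineage L3 mixes donors d2 and d3, all others are single-donor.
assignments = [("d1", "L1"), ("d1", "L1"), ("d2", "L2"), ("d2", "L3"),
               ("d3", "L3"), ("d3", "L4"), ("d1", "L5"), ("d2", "L6")]
incorrect = cross_donor_fraction(assignments)
correct = 1 - incorrect
```

Here 2 of 8 sequences fall in a mixed-donor lineage, so the ‘incorrect’ fraction is 0.25 under the strict definition used in the text.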

Comparison between Clonify and other unseeded lineage assignment algorithms

To gauge the accuracy of Clonify relative to other unseeded lineage assignment algorithms, we evaluated the performance of each algorithm in two ways. The first, which determines the accuracy of lineage assignment on highly mutated HIV-1 bnAbs of known clonal relationships, is designed to test the inclusiveness of each algorithm. The second test compares the stringency of each algorithm using a pool of antibody sequences isolated from eight healthy donors. Five algorithms were selected for comparison and are referred to by the senior author and year of publication: Quake2013a (29), Quake2013b (16), Boyd2014 (30), Church2014 (8), and Martinez-Bernetche2015 (31). Details of each algorithm can be found in the Supplementary Methods.

To compare algorithmic inclusiveness, we used the same panel of HIV bnAb sequences shown in Fig. 2C, except that all singleton antibody sequences (those belonging to a lineage with only a single member) were removed. These sequences were assigned to lineages by each algorithm, and the number of correctly assigned antibody sequences was determined (Fig. 4A). Clonify performed the best, correctly assigning 41 of 44 antibodies (93%). Quake2013b assigned 24 antibody sequences correctly (55%). Church2014, Quake2013a and Boyd2014 each assigned approximately one third of the sequences correctly (39%, 34% and 30%, respectively). Martinez-Bernetche2015 performed the least well, assigning each antibody to a separate lineage. We next measured assignment accuracy at the level of the lineage. For a lineage to be counted as correct, every member of the lineage must be correctly assigned. This is a distinct measurement from that shown in Fig. 4A, because at the antibody level, a partially correct lineage earns partial credit; when measuring accuracy at the lineage level, completely correct lineages are required. Of the 12 lineages, Clonify assigned 10 of them completely correctly (83%; Fig. 4B). Interestingly, although Quake2013b performed better than Church2014 when looking at individual antibody sequences, Quake2013b and Church2014 performed identically at the lineage level, with each assigning 4 lineages completely correctly (29%). Quake2013a correctly assigned 3 lineages (21%), Boyd2014 correctly assigned 2 lineages (14%), and Martinez-Bernetche2015 did not assign any lineages completely correctly. It is important to note that the HIV bnAb inclusiveness test is extremely difficult and, in many cases, is a scenario for which these previously published algorithms were not designed. We provide these results not to diminish the usefulness of these algorithms, which have previously been shown to be highly accurate on datasets for which they were designed, but only to demonstrate the difficulty of performing unseeded lineage assignment when lineages contain highly divergent antibody sequences.
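
The difference between antibody-level and lineage-level scoring can be made concrete with a short Python sketch. The text does not spell out the exact scoring rules, so the rules in the docstring are assumptions chosen to reproduce the described behaviour (partial credit per antibody, no credit when every antibody is placed alone); the antibody and lineage labels are hypothetical.

```python
from collections import Counter, defaultdict


def score_assignments(true_lineage, predicted):
    """Return (correct_antibodies, correct_lineages).

    Assumed rules (not taken from the source):
    - every true lineage has at least two members (singletons removed);
    - the "home" cluster of a true lineage is the predicted cluster that
      holds most of its members;
    - an antibody is correct when it sits in its lineage's home cluster
      and that cluster holds at least two members of the lineage;
    - a lineage is completely correct when all of its members share one
      predicted cluster and that cluster contains no outsiders.
    """
    members = defaultdict(list)
    for antibody, lineage in true_lineage.items():
        members[lineage].append(antibody)
    cluster_sizes = Counter(predicted.values())

    correct_abs = 0
    correct_lins = 0
    for lineage, antibodies in members.items():
        counts = Counter(predicted[ab] for ab in antibodies)
        home, n_home = counts.most_common(1)[0]
        if n_home >= 2:
            correct_abs += n_home  # partial credit at the antibody level
        if n_home == len(antibodies) and cluster_sizes[home] == len(antibodies):
            correct_lins += 1  # all members together, nothing else mixed in
    return correct_abs, correct_lins


# Hypothetical panel: lineage A is split, B is recovered, C is fully split.
true_lineage = {"a1": "A", "a2": "A", "a3": "A",
                "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
predicted = {"a1": 1, "a2": 1, "a3": 2, "b1": 3, "b2": 3, "c1": 4, "c2": 5}
n_antibodies, n_lineages = score_assignments(true_lineage, predicted)
```

In the toy panel, lineage A earns credit for two of its three antibodies but is not a completely correct lineage, which is the distinction the two measurements are meant to capture.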

Our second comparison uses a pool of sequences derived from eight healthy donors. As in Fig. 3C, lineages are assigned from the cross-donor pool, and the fraction of incorrectly assigned sequences (sequences assigned to lineages containing sequences from multiple donors) is computed. Strikingly, although Clonify is inclusive enough to correctly assign nearly all of the HIV bnAbs, it assigns cross-donor lineages at approximately the same rate as the much less inclusive algorithms Church2014, Quake2013a and Boyd2014 (Fig. 4C). The second-highest scoring algorithm on the inclusiveness test (Quake2013b) is the least discriminating algorithm by a large margin: over 2% of sequences in the 8,000-sequence cross-donor pool are assigned to lineages containing sequences from multiple donors, compared to approximately 1% for Clonify. These results show that Clonify is more inclusive than the other unseeded lineage assignment algorithms tested and, critically, accomplishes this inclusiveness while retaining high stringency.

Antibody repertoire clonality

The combination of error correction and unseeded lineage assignment allows us to broadly characterize clonal lineages in human IgG+ memory B cell repertoires for the first time. We first analyzed the overall level of clonality in the IgG+ memory population for the set of 8 donors. From each donor, increasing numbers of sequences were randomly selected, clonal lineages were assigned, and the frequency of sequences belonging to lineages with at least two members was determined (Fig. 5A). Following a rapid increase in the frequency of clonally related sequences, clonality reaches a plateau at approximately 75%, suggesting that the IgG memory population consists of a relatively small number of frequently occurring lineages combined with a much larger number of less common lineages. To verify this, we determined the distribution of lineage sizes for lineages containing at least two members (Fig. 5B). Across all lineages with at least two members, nearly 50% contain more than 10 members. When considering all lineages, ‘singleton’ lineages containing only a single member comprise 84.5% of the lineage pool.
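
The two summary statistics used above measure different things: one is a fraction of sequences, the other a fraction of lineages. A minimal Python sketch, with hypothetical lineage labels and key names, shows how both can be high at once.

```python
from collections import Counter


def clonality_summary(lineage_ids):
    """Summarise clonality from one lineage id per sequence."""
    sizes = Counter(lineage_ids)  # lineage id -> number of member sequences
    n_seqs = len(lineage_ids)
    clonal_seqs = sum(n for n in sizes.values() if n >= 2)
    singletons = sum(1 for n in sizes.values() if n == 1)
    return {
        # share of sequences that have at least one clonal relative
        "clonal_sequence_fraction": clonal_seqs / n_seqs,
        # share of lineages that consist of a single sequence
        "singleton_lineage_fraction": singletons / len(sizes),
    }


# Toy repertoire: two expanded lineages plus eight singleton lineages.
lineage_ids = ["L1"] * 6 + ["L2"] * 2 + [f"S{i}" for i in range(8)]
summary = clonality_summary(lineage_ids)
```

In the toy repertoire half of the sequences are clonally related even though 80% of the lineages are singletons, the same qualitative pattern as the ~75% and 84.5% figures reported in the text.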

Several prominent pathogens, including influenza virus and RSV, have been shown to elicit antibody responses with biased germline gene use (32–35). Since the majority of the population has been repeatedly exposed to such pathogens, we attempted to isolate the effect of large clonal lineages on the composition of the IgG+ memory repertoire. We first compared the germline composition of lineages to the germline composition of individual sequences. To calculate the germline composition of lineages, we determined the germline gene family used by each lineage, counting each lineage only once regardless of size. In this way, we eliminate the influence of lineage size on the use of germline genes to determine the extent to which large lineages are able to bias the total IgG memory repertoire. The frequency of variable, diversity and joining germline gene family use at the lineage level (Fig. 5C–E) is statistically indistinguishable from the frequency of germline gene family use at the individual sequence level. We next sought to determine additional features related to lineage size. We sorted all lineages by size and compared CDR3 length, nucleotide mutation frequency and amino acid mutation frequency to the lineage size (Fig. 5F–K). None of these features correlated with lineage size, indicating that several prominent genetic features are not skewed by the size of the lineage.
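
Counting each lineage once, as described above, can be sketched as follows. The `(lineage_id, family)` record layout is an assumption, as is the simplification that every member of a lineage carries the same germline family, so the first record seen for a lineage is taken as representative. The toy data are deliberately skewed to show what a size-driven bias would look like; the paper reports no such difference.

```python
from collections import Counter


def family_frequencies(records, per_lineage=False):
    """Germline family frequencies at the sequence or lineage level.

    `records` is a list of (lineage_id, family) pairs, one per sequence.
    With `per_lineage=True` each lineage contributes exactly one count.
    """
    if per_lineage:
        first_seen = {}
        for lineage, family in records:
            first_seen.setdefault(lineage, family)  # keep one entry per lineage
        families = list(first_seen.values())
    else:
        families = [family for _, family in records]
    counts = Counter(families)
    total = len(families)
    return {family: n / total for family, n in counts.items()}


# Toy data: one expanded IGHV1 lineage and three singleton lineages.
records = ([("L1", "IGHV1")] * 5
           + [("L2", "IGHV3"), ("L3", "IGHV3"), ("L4", "IGHV4")])
by_sequence = family_frequencies(records)
by_lineage = family_frequencies(records, per_lineage=True)
```

At the sequence level the expanded lineage pushes IGHV1 to 5/8 of the toy repertoire, whereas at the lineage level it contributes a single count out of four.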
