Recombinant human B cell repertoires enable screening for rare, specific, and natively paired antibodies

Abstract
The human antibody repertoire is increasingly being recognized as a valuable source of therapeutic-grade antibodies. However, methods for mining primary antibody-expressing B cells are limited in their ability to rapidly isolate rare and antigen-specific binders. Here we encapsulate two million primary B cells into picoliter-sized droplets, where their cognate V genes are fused in-frame to form a library of scFv cassettes. We used this approach to construct natively paired phage-display libraries from healthy donors and drove selection towards cross-reactive antibodies targeting influenza hemagglutinin. Within 4 weeks we progressed from B cell isolation to a panel of unique monoclonal antibodies, including seven that displayed broad reactivity to different clinically relevant influenza hemagglutinin subtypes. Most isolated antibody sequences were not detected by next-generation sequencing of the paired repertoire, illustrating how this method can isolate extremely rare leads that are unlikely to be found by existing technologies.

Introduction
Antibodies are among the fastest-growing therapeutic classes within the biopharmaceutical industry1. Whereas most approved therapeutic antibodies have been obtained by engineering rodent antibodies, highly potent antibodies have recently been identified within humans against many diseases, including microbial infection2,3,4,5, autoimmunity6,7,8, and cancer9. Because these antibodies are elicited in human responses to disease, they are believed to be safer, less immunogenic and, in general, more translatable to human therapy10. However, the B cells producing these therapeutic antibodies tend to be rare in convalescent patients, making their discovery very challenging.
Adding to this challenge is the fact that antibodies are heterodimeric proteins whose specificities are encoded by unique pairs of heavy-chain and light-chain transcripts. Technologies that preserve this native pairing are therefore best suited to recapitulate the functional characteristics of naturally produced antibodies. These methods generally fall into two categories. First, B cells can be cultured in individual wells and their conditioned media screened for function, or their antibody genes cloned directly. However, maintaining large numbers of monoclonal cultures for extended periods is laborious and expensive, and it limits screening to a fraction of the B cell repertoire11. A second, more recent approach uses next-generation sequencing (NGS) to profile the paired repertoire from millions of B cells12,13. Although this method can sequence large numbers of cells, inferring antigen specificity from sequence information is very challenging, especially in humans, who are constantly exposed to a large diversity of antigens. Moreover, validating leads requires gene synthesis, cloning, and expression, which can create a severe bottleneck in the number of candidates that can be functionally assessed14. Both approaches therefore have a screening throughput that vastly under-samples the ~10⁷ B cells obtained from a typical blood draw. There is an urgent need for a discovery engine that adequately mines natural B cell diversity to rapidly isolate antigen-specific antibodies from human patients.

Here we report the creation of a microfluidic platform that pairs cognate VH and VL transcripts from millions of single cells into “expression-ready” scFv libraries, while still maintaining the ability to profile the paired repertoire by NGS (Fig. 1a). We coupled these recombinant repertoires with the enormous screening power of phage-display to rapidly enrich for antigen-specific clones.
Using this method, we interrogated the antibody repertoires of healthy individuals by screening for influenza hemagglutinin (HA) binding and selected a panel of cross-reactive leads targeting multiple HA subtypes.

Fig. 1: Strategy for generating natively paired libraries. a Isolated B cells are purified from convalescent donors and encapsulated into water-in-oil droplets with RT-PCR reagents such that cognate VH and VL domains are amplified and linked. The resulting amplicon forms an expression-ready scFv that can be directly expressed for screening, displayed on phage for selections, and deep sequenced for repertoire characterization. b Single-cell encapsulation using droplet microfluidics. Cells were stained with CellTracker Red and Green dyes (pseudocolored magenta and cyan, respectively), mixed, and encapsulated into droplets at a density that favored single-cell encapsulation. Each droplet forms an independent reaction vessel in which that cell’s cognate V genes can be amplified and paired. Scale bar = …00 μm. c Validation of native chain pairing during droplet RT-PCR. Primary human and mouse B cells were mixed and their CH1 and Cκ domains linked in either encapsulated (“droplet”) or pooled (“open”) RT-PCR. The resulting amplicons were purified and chain pairing was determined using specific nested primer combinations. Correctly paired species were obtained only with encapsulation, whereas the scrambled format generated all possible combinations in roughly equal amounts.

Results
Cognate chain pairing from encapsulated primary B cells
Our approach involves encapsulating single B cells into water-in-oil droplets of ~400 pl in volume (Fig. 1a). We used glass microfluidic chips with pressure pumps to reliably generate evenly sized droplets at high rates, such that one million B cells could routinely be encapsulated within 40 min.
The microfluidic chip is designed to merge two aqueous streams: one carrying a suspension of B cells and the other containing reagents for one-step reverse transcription and overlap-extension PCR. We titrated the B cell suspension to achieve ~1 cell for every 10 droplets, which, based on Poisson statistics, means that about 95% of occupied droplets contain a single cell (Fig. 1b). Because the contents of droplets cannot easily be modified once they have formed, we optimized a one-pot reaction mixture to perform cell lysis, reverse transcription, PCR amplification of VH and VL domains, and their linking by overlap-extension PCR. Although several groups have reported that cell-based RT-PCR is not feasible in volumes of 5 nl12,13,15,16, we successfully amplified Ig transcripts directly from B cells in picoliter-sized droplets. Achieving this required extensive optimization of the reaction components and conditions. For instance, of 11 commercially available one-step RT-PCR reaction mixes tested, only one reliably generated stable droplets and reproducibly yielded RT-PCR amplicons of human GAPDH (Supplementary Table 1). We confirmed that cells were robustly lysed upon addition of RT-PCR buffer and incubation at 50 °C by Trypan Blue staining (Supplementary Fig. 1), by release of cytosolic dyes, and by detection of nuclear material with SYBR-Green (Supplementary Fig. 2).

To validate that our approach achieves single-cell encapsulation and cognate chain pairing, we first used a mixture of primary human and mouse B cells and designed primer sets to amplify and link the CH1 and Cκ domains (Fig. 1c; Supplementary Fig. 3, Supplementary Table 2). We added complementary overhangs to the reverse-CH1 and forward-Cκ primer sets (the “inside” primers) to fuse the two domains with a (Gly4-Ser)3 linker used in our previously described scFv libraries17.
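The fusion of two amplicons through complementary overhangs can be modeled in silico to check that a pair of primer designs yields the intended in-frame junction. The sketch below is a simplified string model, not a simulation of polymerase behavior or annealing. It assumes both amplicons are written 5′→3′ on the sense strand, so the shared overhang appears as a suffix of the upstream fragment and a prefix of the downstream fragment. The function name `overlap_extend` and all toy sequences, including the codon choice for the linker overhang, are hypothetical and are not the primers used in this study.

```python
def overlap_extend(upstream: str, downstream: str, min_overlap: int = 15) -> str:
    """Fuse two amplicons that share an overlapping sequence (sense strand, 5'->3').

    The shared overhang must appear as a suffix of `upstream` and a prefix of
    `downstream`; the fused product contains that overlap only once.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    # Test the longest candidate overlap first, down to the minimum allowed length.
    for k in range(min(len(upstream), len(downstream)), min_overlap - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream + downstream[k:]
    raise ValueError(f"no overlap of at least {min_overlap} nt between the amplicons")


# Hypothetical toy fragments: a VH 3' end and a VL 5' end that carry the same
# 15-nt overhang encoding Gly-Gly-Gly-Gly-Ser (GGA GGT GGC GGT AGC).
overhang = "GGAGGTGGCGGTAGC"
vh_fragment = "ATGCAGGTGCAG" + overhang
vl_fragment = overhang + "GACATCCAGATG"

scfv_junction = overlap_extend(vh_fragment, vl_fragment)
print(scfv_junction)  # ATGCAGGTGCAGGGAGGTGGCGGTAGCGACATCCAGATG
```

Because the search returns the longest exact match, a highly repetitive overhang (for example, a (Gly4-Ser)3 linker encoded with identical codons in every repeat) can produce a shorter, mis-joined product in this toy model; the varied codons in the toy overhang avoid that ambiguity.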
Using this design, we mixed equal amounts of primary mouse and human B cells and generated single-chain amplicons in either encapsulated or unencapsulated formats. The four possible products, native (hCH1–hCκ and mCH1–mCκ) and combinatorial (hCH1–mCκ and mCH1–hCκ), were easily discernible by nested PCR using species-specific primers and were confirmed by Sanger sequencing. As expected, the combinatorial format produced all possible products; strikingly, however, only natively paired amplicons were generated with encapsulation (Fig. 1c; Supplementary Fig. 3).

Natively paired antibody libraries from human B cells
Having confirmed that we could pair cognate chains from single primary B cells, we proceeded to capture the paired immunoglobulin repertoire in an expressible format. Primer sets for multiplex amplification of all known human V and J genes were computationally designed from IMGT consensus sequences. We designed 92 primers to amplify the 542 functional human V and J alleles with the appropriate overhangs for scFv generation (Supplementary Data 1). One million total B cells isolated from the blood of a healthy donor were mixed with 10,000 IM-9 cells (1%) before being encapsulated with our optimized RT-PCR mix to generate a natively paired amplicon library consisting of (from 5′ to 3′) part of the VH leader sequence, VH, the (Gly4-Ser)3 linker, VL, and the N-terminus of CL. This product was then used as a template for nested PCR with VH FR1- and VL FR4-specific primer sets to generate a full-length scFv amplicon library (Fig. 2a). As a second validation of correct chain pairing, we used a forward primer specific to the IM-9 CDRH3 sequence (RRGVTDIDPFDI; IM-9-CDRH3-Fwd) with a generic reverse primer (R1; Supplementary Table 3) to amplify all VL sequences that paired with the IM-9 heavy chain.
The resulting amplicon was cloned and analyzed by Sanger sequencing, which showed correct pairing with the known IM-9 VL (QHYNRPWT) in 97/101 colonies (96% pairing accuracy), illustrating that our method maintains correct chain pairing even with a 100-fold excess of competing B cells.

Fig. 2: Generating natively paired libraries for screening and next-generation sequencing (NGS). a VH and VL domains from each encapsulated B cell are amplified with specific primer sets and paired in-frame via complementary overhangs (green). A nested PCR with VH FR1 and VL FR4 primers generates full-length scFv with overhangs (purple) that enable barcoded paired MiSeq sequencing. VH and VL FR3, CDR3 and FR4 are sequenced with the R2 and R1 primers, respectively, while the standard P5-specific primer provides the index read to enable demultiplexing. Adapter sequences can be cleaved from the amplicons via SfiI/NotI restriction sites for subcloning into expression or phagemid vectors for library generation. b, c NGS data from a representative dataset in which scFv libraries from one million stimulated B cells were generated in either emulsion or combinatorial fashion. b Emulsion libraries favored a 1:1 VL:VH ratio, whereas combinatorial libraries were scrambled. c Where multiple pairings were seen, the dominant partner accounted for 96% of sequences in the emulsion library, whereas partners were more evenly distributed in the combinatorial library.

We then set out to use this method in a discovery campaign to isolate antibodies against influenza hemagglutinin, an antigen to which humans are commonly exposed. Two million B cells were isolated from the blood of each of two healthy donors: total B cells from Donor 1 and switched memory B cells from Donor 2. For each donor, one million cells were encapsulated as above to generate natively paired emulsion libraries.
The remaining million cells were used to build combinatorial scFv libraries by separately amplifying VH and VL domains from purified RNA and pairing them by standard overlap-extension PCR, using the same primer set.

To obtain an in-depth assessment of the captured repertoire, our nested PCR primers contain barcoded overhangs that enable NGS on the Illumina MiSeq (Fig. 2a). We designed a custom priming strategy to obtain paired 300 bp reads of the 3′ ends of VH and VL, which generated high-quality sequences in regions mapping to FR4, CDR3, and FR3 (Supplementary Fig. 4). Whereas the VL read was obtained using a priming site introduced at the 3′ end of the construct, the VH read required an internal primer annealing to the (G4S)3 linker sequence (Supplementary Table 3). Sequencing reads were quality-filtered and translated, and sequences that were more than 88% identical were then clustered; this threshold falls within the range of reported thresholds used to correct for sequencing errors while minimizing the loss of truly unique sequences (reviewed in ref. 18). Sequences were annotated using IgBLAST19 to assign germline families and delineate CDR3 regions. We recovered a total of 212,018 and 2,549,415 unique CDRH3:CDRL3 clusters for the emulsion and combinatorial libraries, respectively (Supplementary Table 4). We observed similar distributions of CDRH3 and CDRL3 lengths between the emulsion and combinatorial libraries (Supplementary Fig. 5a, b and Supplementary Fig. 6a, b). The emulsion libraries also included most known V and J germline families (Supplementary Fig. 5c, d, e and Supplementary Fig. 6c, d).

As a third validation that our system preserves chain pairing, we determined the number of unique CDRL3 sequences paired with each CDRH3 sequence. As expected, the combinatorial libraries displayed promiscuous pairing, with each CDRH3 sequence paired with a median of at least five unique CDRL3 sequences (Fig. 2b; Supplementary Table 4). Because the sequencing depth (10⁶) vastly under-samples the theoretical sequence diversity of the combinatorial libraries (10¹²), the true rate of combinatorial pairing is likely considerably higher. This was in stark contrast to the emulsion libraries, where we observed a median of 1:2 CDRH3/CDRL3 pairing with a narrow distribution. In cases where multiple pairings were detected, top-pair analysis determined a 96% accuracy in VH–VL pairing (Fig. 2c). Top-pair analysis has previously been used to validate cognate chain pairing in a published method that generates amplicons suitable only for sequencing, not for screening13. Using similar sequencing depth and starting cell numbers, we found the pairing efficiency to be significantly better with our approach (p < 0.001; Supplementary Fig. 7, Supplementary Table 5). This could be because our method covalently links VH and VL domains within the droplets, rather than using an intermediate bead-binding step in which mRNA species might be exchanged between beads20.

Selection for cross-reactive anti-hemagglutinin antibodies
The emulsion and combinatorial libraries were bulk subcloned into a phagemid vector17 to construct phage-display libraries of approximately 10⁸ transformants. Monoclonal phage ELISA against the myc tag fused to the scFv indicated that most transformants in the libraries displayed scFv well, with positive display for 90–99% of clones (Supplementary Table 6). The four libraries were subjected to two rounds of enrichment on the 2009 pandemic influenza A hemagglutinin (A/California/07/2009 H1N1). Polyclonal phage ELISA confirmed robust enrichment for specific binders regardless of the B cell source (Fig. 3a, b; Supplementary Fig. 8).
Of note, while the combinatorial libraries showed stronger overall specific enrichment, NGS of the enriched libraries revealed a strong bias (85%) for the IGHV1-69 germline family compared with the corresponding emulsion library (15%; Supplementary Fig. 9). IGHV1-69-containing antibodies have previously been shown to contact group 1 hemagglutinin subtypes through heavy-chain interactions alone2. It is therefore likely that enrichment of the combinatorial libraries was driven by selecting VL partners of IGHV1-69 that expressed or folded well in bacteria, which highlights a key bias of combinatorial libraries.

Fig. 3: Phage-display enrichment for antigen-specific antibodies. Emulsion libraries obtained from total (a) and memory (b) B cells were subjected to two rounds of phage-display panning on hemagglutinin H1 (A/California/07/2009 H1N1), and enrichment was measured by polyclonal phage ELISA; the unselected library is shown in blue and the enriched library in red. c Specific binding data for 17 monoclonal scFv-Fc antibodies against two different influenza hemagglutinin subtypes. Monovalent affinities (measured by biolayer interferometry) and bivalent EC50 binding (measured by ELISA) are shown as filled and open symbols, respectively, against H1 (A/California/07/2009 H1N1; circles) and H5 (A/Vietnam/1203/2004 H5N1; triangles). d Binding of cross-reactive antibodies to a panel of hemagglutinin antigens from group 1 and 2 subtypes of influenza A and both lineages of influenza B; binding observed by biolayer interferometry is shown in green (GP130-his was included as a negative control). Full virus strain designations are given in Methods.

To specifically enrich for cross-reactive antibodies, we panned the first-round output from the emulsion libraries on a non-circulating group 1 subtype, influenza A hemagglutinin H5 (A/Vietnam/1203/2004 H5N1).
The enriched scFv pools were bulk subcloned into an scFv-Fc expression vector21 and screened by ELISA. Of the 5,632 clones screened, we identified 320 that showed specific binding to H1, comprising 17 unique antibodies that represent different germlines and display a range of affinities, several in the picomolar range (Fig. 3c; Supplementary Data 2). These include seven unique antibodies that bind to both antigens used during panning, most of which display even greater breadth of cross-reactivity while showing no binding to a negative control antigen, GP130-his (Fig. 3d). Notably, two of these antibodies (0089AY-D17 and 0084GM-D06) display specific binding to all 10 hemagglutinin subtypes we tested, including subtypes from influenza A (groups 1 and 2) and both lineages of influenza B (Fig. 3d). Competition studies using biolayer interferometry revealed that these antibodies do not compete for binding with the published cross-reactive antibodies FI6v3 (ref. 22), MEDI8852 (ref. 23), 1N23 (ref. 24), and CR9114 (ref. 25), suggesting that they may define novel cross-reactive epitopes (Supplementary Fig. 10). Finding such universal anti-influenza antibodies was unexpected, particularly in healthy donor samples, but underscores the power of our method to isolate rare specificities through deep mining of the repertoire. It is possible that we selected for an epitope unique to the recombinant HA, such as the Foldon-His domain added to enable trimerization and facilitate purification. To test this, we expressed a different protein (RSV-F) fused to the identical Foldon-His domain and did not observe binding by our antibodies (Supplementary Fig. 11). Structural studies will be needed to fully characterize the identity of this epitope.

To test for neutralization activity, we expressed and purified the seven cross-reactive antibodies as IgG1 heterodimers.
Two of these (ZP14CE-A03 and 0084GM-D06) did not retain binding after conversion and were excluded from further analysis. This falls within the attrition rate we have typically observed when converting between scFv-Fc and IgG1 formats26. For the remaining five antibodies, we performed microneutralization assays using the seasonal H1N1 A/California/07/2009 virus or the non-seasonal H5N1 A/Vietnam/1203/2004 virus isolated from human infection. Our antibodies did not exhibit neutralizing activity against either virus at the highest concentration tested (1 mg ml⁻¹; Supplementary Fig. 12), though we cannot rule out that these antibodies act via indirect antiviral mechanisms such as Fc-mediated effector function. Although our goal was to drive discovery of cross-reactive antibodies through selection of natively paired libraries, applying this technique to functionally profiled B cell sources (for instance, using AMBRA27) could improve the chances of also isolating antibodies with specific mechanistic properties.

To ascertain the relative frequency of our leads within the captured B cell repertoire, we searched for their respective CDRH3:CDRL3 pairs within our sequencing dataset, allowing up to four amino acid mismatches to account for possible PCR- and sequencing-induced mutations. The recent introduction of unique molecular identifiers (UMIs)28 to repertoire sequencing allows such mutations to be corrected, and ideally our strategy would have included this approach. However, this was challenging because terminally appended barcodes would be removed during nested PCR, and incorporating internal barcodes would likely compromise the functional integrity of our scFv library. Even with this permissive cutoff, only one of the 17 antigen-specific sequences was observed among the 212,018 unique paired sequence clusters, implying that the remaining hits were too rare to be detected by NGS.
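Both steps of this frequency analysis, the tolerant search for a lead's CDRH3:CDRL3 pair and the fold-enrichment arithmetic reported for 0089EA-C02 in this section, can be sketched as follows. This is a minimal illustration under stated assumptions: mismatches are counted as substitutions only (Hamming distance), summed across CDRH3 and CDRL3, and clusters whose CDR3 lengths differ from the query are not matched. The exact matching rule used in the study is not specified beyond "up to four amino acid mismatches". The function names and all clusters other than the IM-9 pair quoted in the text are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, str]  # (CDRH3, CDRL3) amino acid sequences


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def pair_mismatches(query: Pair, target: Pair) -> Optional[int]:
    """Total CDRH3 + CDRL3 substitutions, or None if either CDR3 length differs."""
    (q_h3, q_l3), (t_h3, t_l3) = query, target
    if len(q_h3) != len(t_h3) or len(q_l3) != len(t_l3):
        return None  # substitutions-only model: length changes are not matched
    return hamming(q_h3, t_h3) + hamming(q_l3, t_l3)


def find_lead(query: Pair, clusters: Iterable[Pair],
              max_mismatches: int = 4) -> List[Tuple[Pair, int]]:
    """Return (cluster, mismatches) for every cluster within the cutoff, closest first."""
    hits = []
    for cluster in clusters:
        d = pair_mismatches(query, cluster)
        if d is not None and d <= max_mismatches:
            hits.append((cluster, d))
    return sorted(hits, key=lambda hit: hit[1])


def fold_enrichment(pre_count: int, pre_total: int,
                    post_count: int, post_total: int) -> float:
    """Ratio of a clone's frequency after selection to its frequency before selection."""
    return (post_count / post_total) / (pre_count / pre_total)


# The IM-9 CDR3 pair comes from the text; the other clusters are made up.
clusters = [
    ("RRGVTDIDPFDI", "QHYNRPWT"),
    ("RRGVSDIDPFDI", "QHYNRPWS"),
    ("ARDGYSSGWYFDY", "QQYNSYPLT"),
]
print(find_lead(("RRGVTDIDPFDI", "QHYNRPWT"), clusters))

# Read and clone counts reported for 0089EA-C02: 2 of 4,820,834 reads before
# selection, 32 of 5,632 screened clones after selection.
print(round(fold_enrichment(2, 4_820_834, 32, 5_632)))  # 13696
```

The computed value of about 13,700-fold is consistent with the "approximately 14,000 fold" enrichment stated for this clone.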
This sequence (0089EA-C02) accounted for 2 of 4,820,834 mapped VH–VL reads (Supplementary Table 4). Following selection, we found this clone in 32 of the 5,632 clones screened, an enrichment of approximately 14,000-fold. More abundant antigen-specific sequences may have existed within the repertoire but were not selected because of differences in the expression and folding of human antibodies in bacteria29,30. This could be addressed in future iterations of the technology by using other platforms for displaying native human antibodies, such as yeast display31. Because this platform depends on successful PCR with gene-specific primers, antibody genes mutated within the primer binding sites may be excluded from the resulting library, although ancestral antibodies with equal activity but fewer mutations23,32 could still be captured. Nevertheless, this particular set of leads could not have been predicted from sequencing information alone. If the scarcity of these leads measured by NGS reflects their frequency in the original B cell pool, it also seems very unlikely that they could be found by standard methods of culturing and screening individual B cells.

Discussion
We have developed a technology that rapidly captures the native repertoire from millions of primary human B cells in a powerful and sensitive screening platform, with significant implications for therapeutic antibody development and immune repertoire characterization, and with potential wide application in rational vaccine design. By linking the variable domains into a translatable scFv format, we can combine the strengths of multiple technologies, using the immense screening power of display platforms to mine the full richness of a naturally evolved antibody response. The process outlined here is one of the fastest for antibody discovery from natural repertoires.
A single researcher can progress from millions of primary B cells to specific monoclonal antibodies within 4 weeks. This could be especially valuable for combating emerging infectious diseases, such as the recent Ebola and Zika outbreaks. The method is also robust: to date we have recovered scFv libraries from over 20 separate encapsulations of one million B cells from various donors (a subset of which is shown in Supplementary Fig. 13). These libraries constitute a renewable resource that can be expanded as new donors are added, panned repeatedly against a multitude of targets (including whole bacteria or tumor tissue), or archived indefinitely for future use. Large-scale efforts that use NGS to predict antibody function, such as the recently launched Human Immunome Program33, could particularly benefit. This project aims to sequence the expressed antibody repertoires of 1000 individuals and infer vaccine reactivity from sequencing information alone. An exciting addition to this project would be to use the method outlined here to build display libraries from these individuals, so that the reactivity of the human repertoire to any number of vaccine candidates could be measured directly.

Although the steps outlined here enable isolation of antibodies from human B cells, the method can readily be extended to isolate antibodies from any species for which V-gene sequence information is available. This could be particularly useful for expanding the breadth and depth of hybridoma technology, where low fusion efficiencies (0.5%; ref. 34) lead to significant loss of the repertoire. The technology can also be applied to generate monoclonal antibodies from organisms for which myeloma fusion partners are not available.
T cell receptor (TCR) repertoires (consisting of paired α/β or δ/γ chains) could also be captured and screened in a similar recombinant format, as single-chain TCRs have been shown to be amenable to selection by phage display and yeast display35,36. Finally, the method presented here uses equipment and reagents that can be purchased from commercial vendors at prices well within the reach of most laboratories. This opens the door to widespread mining and archiving of antibody repertoires for de novo identification of functional antigen-specific antibodies from natural sources.

Methods
Primer design
All primers were designed to anneal with a minimal melting temperature of 60 °C and, where possible, were consolidated to have at most 4 degenerate bases. Primer sets for amplifying the CH1 and Cκ domains are listed in Supplementary Table 2 and follow the same naming scheme as below. Primer sets for scFv amplification were designed for maximal coverage of all human immunoglobulin sequences. Consensus nucleotide sequences for leader, variable and constant regions were downloaded from IMGT37 for the human heavy, lambda, and kappa genes (excluding pseudogenes and truncated transcripts), and four sets of primers were designed for each gene family. Outside primers (“out” subscript in the primer names; Supplementary Data 1) were designed using Primer3 (refs. 38,39) to span the splice junction of the leader sequence (“out_5” primers) or to bind within the first 50 bases of the constant domain (“out_3” primers), including, when possible, the V–C splice junction. Inside primers (“in” subscript in the primer names; Supplementary Data 1) were designed by fixing the 5′ end of the primer to the start (“in_5”) or end (“in_3”) of the V coding sequence. FR4-specific primers were also manually extended for increased specificity.
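The consolidation rule above (at most 4 degenerate bases per primer) is easy to check automatically. A degenerate primer is really a mixture of oligonucleotides, and the number of distinct species in it is the product of the number of bases allowed at each position. The sketch below uses the standard IUPAC nucleotide codes; it covers only degeneracy bookkeeping, not melting temperature or the Primer3-based design itself. The function names and the example primer are hypothetical and are not taken from Supplementary Data 1.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_positions(primer: str) -> int:
    """Number of positions that encode more than one base."""
    return sum(len(IUPAC[base]) > 1 for base in primer.upper())


def fold_degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides present in the degenerate primer mix."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    return ["".join(combo) for combo in product(*(IUPAC[base] for base in primer.upper()))]


def within_limit(primer: str, max_degenerate: int = 4) -> bool:
    """Check the 'at most 4 degenerate bases' consolidation rule."""
    return degenerate_positions(primer) <= max_degenerate


primer = "CAGGTGCAGCTGGTRSAGTCTGG"  # hypothetical primer with two degenerate positions
print(degenerate_positions(primer), fold_degeneracy(primer), within_limit(primer))  # 2 4 True
print(expand(primer))
```

Counting fold degeneracy alongside the number of degenerate positions is useful because a single N contributes as much diversity as two two-base codes, which dilutes each individual species in the primer mix.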
The inside PCR primer sets were fused to overhangs to enable linker formation (VH_in_3 and VK/L_in_5) or restriction digestion by NotI or SfiI and barcoded Illumina MiSeq sequencing (Fig. 2a; Supplementary Table 2, Supplementary Table 3 and Supplementary Data 1).

Validation of native chain pairing of antibodies
Informed consent was obtained from all subjects prior to human blood collection, and the process was approved under the Chesapeake IRB (protocol number 2010-001). Primary human B cells were isolated from 80 ml human blood samples using the RoboSep Human B Cell Enrichment Kit (StemCell Technologies). Primary mouse B cells were isolated from splenocytes using the Mouse B cell Isolation Kit (StemCell Technologies) according to the manufacturer’s instructions. Isolated cells were centrifuged at 500 × g for 10 min, re-suspended in RPMI1640 (Invitrogen) supplemented with insulin-transferrin-selenium (Invitrogen), 10% fetal bovine serum (Invitrogen), 0.5 μg ml⁻¹ megaCD40L (Enzo), 33 ng ml⁻¹ IL-21 (produced in-house) and penicillin–streptomycin–glutamine (Invitrogen), and incubated at 37 °C and 5% CO2 for 48 h. Mouse and human B cells were combined in a 1:1 ratio, and 10,000 cells from this mixture were encapsulated as described below. A parallel “combinatorial” reaction was performed by combining 10,000 cells directly in RT-PCR mix (described below) without encapsulation. RT-PCR and nested PCR conditions were as described below, using primer sets designed to amplify and link the CH1 and Cκ domains (Supplementary Table 2).

Encapsulation of primary human B cells
Total B cells from healthy donors were isolated and stimulated as described above. For memory B cell isolation, we additionally used the Human Switched Memory B Cell Isolation Kit (Miltenyi Biotec).
Two million cells were washed in PBS (3 min at 700 × g) and split into two halves: one million cells were processed for total RNA using the RNeasy RNA isolation kit (Qiagen) according to the manufacturer’s instructions. The remaining one million cells were re-suspended in 250 μl encapsulation buffer: hypo-osmolar electrofusion buffer (Eppendorf cat no 940002001) containing a 1:1000 dilution of Anti-Clumping Agent (Invitrogen cat no 01-0057AE) and 16% OptiPrep Density Gradient medium (Sigma cat no D1556). This cell concentration resulted in one cell being encapsulated in every 10 droplets on average. Theoretically, this means that the number of cells encapsulated in a single droplet follows a Poisson distribution:

$$p(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad \text{where } \lambda = 0.1$$

From this, we calculate the percentage of single-cell droplets among all non-empty droplets as follows:

$$\frac{p(k = 1)}{1 - p(k = 0)} = 95.08\%$$

Cells were encapsulated at a 1:1 ratio with 2× RT-PCR master mix. The primers within each set were mixed in equal amounts, and optimized concentrations of each set were added to the RT-PCR mix. The 2× RT-PCR master mix was composed of 139 nM VH-out-F, 416 nM VL-out-R, 39 nM VH-in-R, and 13 nM VL-in-F (Supplementary Data 1), 2× One Tube RT-PCR reaction buffer (Roche cat no 11855476001), 4% Titan One Tube RT-PCR enzyme mix (Roche cat no 11855476001), 18.2% Q solution (Qiagen cat no 210212), 0.4 mM dNTP (Invitrogen cat no 18427013), 10 mM DTT (Roche cat no 11855476001), and 120 units RNaseOUT (Invitrogen cat no 10777019).

Encapsulation was performed on a 2-reagent droplet generation fluorophilic chip (Dolomite cat no 3200510) with fluids pumped using an OB1 flow controller (Elveflow cat no MKII).
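The Poisson calculation above can be reproduced numerically, which also makes it easy to see how the loading density trades droplet purity against the number of empty droplets. This is a minimal sketch; `fraction_of_cells_alone` is an additional quantity derived from the same Poisson model and is not reported in the study. It relies on the property that, under Poisson loading, the number of other cells sharing a given cell's droplet is itself Poisson-distributed with the same mean.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet contains exactly k cells at mean occupancy lam."""
    return lam ** k * exp(-lam) / factorial(k)


def single_cell_fraction_of_occupied(lam: float) -> float:
    """Fraction of non-empty droplets that hold exactly one cell: p(1) / (1 - p(0))."""
    return poisson_pmf(1, lam) / (1 - poisson_pmf(0, lam))


def fraction_of_cells_alone(lam: float) -> float:
    """Fraction of cells that share their droplet with no other cell.

    Under Poisson loading the number of *other* cells in a given cell's droplet
    is itself Poisson(lam), so this equals p(0) = exp(-lam).
    """
    return exp(-lam)


lam = 0.1  # ~1 cell per 10 droplets, as in the text
print(f"{single_cell_fraction_of_occupied(lam):.2%}")  # 95.08%
print(f"{fraction_of_cells_alone(lam):.2%}")  # 90.48%

# Loading-density trade-off: lower lam gives purer droplets but more empty ones.
for lam in (0.05, 0.1, 0.2, 0.5):
    print(lam, round(single_cell_fraction_of_occupied(lam), 4), round(poisson_pmf(0, lam), 4))
```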
Aqueous solutions of cells and RT-PCR mix were each pumped at 30 mbar, while HFE7500 fluorinated oil with …% w/v 008-fluoro-surfactant (RAN Biotechnologies cat no 008-FLUOROSURFACTANT-HFE7500) was pumped at 67 mbar, with pressures fine-tuned to obtain a 1:1 mix of aqueous fluids. The resulting emulsion was collected in fractions (about 40 μl emulsion per fraction) in PCR strip tubes and overlaid with 40 μl mineral oil. Excess fluorinated oil was removed to maintain an overall volume of about 100 μl.

In parallel, we created a combinatorial library from the purified RNA. In total, 250 ng total RNA was used for RT-PCR with the same master mix as for emulsions, except that the VH and VL sequences were amplified separately and then paired by overlap-extension PCR, as described below.

Amplification of scFv containing natively paired V genes
Encapsulated and combinatorial libraries were created by reverse transcription for 30 min at 50 °C followed by 2 min at 88 °C. This was followed by 45 (emulsion) or 35 (combinatorial) cycles of PCR (88 °C for 10 s, 62 °C for 30 s, 68 °C for 45 s) and a final extension step of 7 min at 68 °C. Excess oil below the droplets was manually removed, and the droplets were chemically coalesced using an equal volume of Pico-Break 1 (Dolomite cat no 3200228). Amplified DNA was electrophoresed on 2% agarose, and the region between 650 bp and 1000 bp was purified using the QIAquick gel-extraction kit (Qiagen).

Nested PCR amplification consisted of 25% purified RT-PCR product, 100 nM VH-in-F and VL-in-R primer pools (Supplementary Data 1), 1× HiFi Platinum PCR buffer, 0.15 mM dNTP, 1.5 mM MgSO4, and 0.6 units HiFi Platinum Taq (Invitrogen, cat no 11304011). Cycling conditions consisted of an initial denaturation step of 2 min at 94 °C followed by 50 cycles of PCR (94 °C for 30 s, 55 °C for 30 s, 68 °C for 60 s) and a final extension step of 10 min at 68 °C.
Products were again size-selected as above.

A final scale-up PCR was performed using common forward (Illu_scaleup_F) and barcoded reverse primers (Illu_R_N50X) to enable library construction and Illumina sequencing (Supplementary Table 3). We used Q5 polymerase (NEB, M0491S) according to the manufacturer’s instructions with the following thermocycling program: 98 °C for 2 min; 12–…0 cycles of 98 °C for 10 s and 72 °C for 30 s; 72 °C for 2 min.

Imaging of single encapsulated B cells
B cells were stained using CellTracker Red CMTPX or CellTracker Green CMFDA dyes (Life Technologies) according to the manufacturer’s instructions. Stained cells were re-suspended in PBS and encapsulated using the conditions described above, substituting PBS for the RT-PCR mix. Bulk cell lysis was measured by counting 100,000 cells on the Vi-CELL Cell Viability Analyzer (Beckman Coulter) using Trypan Blue staining and default settings, or by incubating the same number of cells in either 100 μl culture media or RT-PCR buffer and imaging cells in a 96-well U-bottom plate (Corning) at 40× magnification. Cell lysis in droplets was imaged in two ways: (1) stained cells were re-suspended in encapsulation buffer, encapsulated with RT-PCR mix and heated to 50 °C for 5 min; (2) unstained cells were encapsulated with RT-PCR mix containing 2× SYBR-Green (Invitrogen) and heated to 50 °C for 5 min. Droplets were collected in μ-Slide 0.1 channel slides (Ibidi) and imaged at 200× magnification using the EVOS FL Auto Cell Imaging System (Invitrogen).

Next-generation sequencing and bioinformatic analysis
Each barcoded library was size-selected to 850 bp by 6% polyacrylamide gel electrophoresis, quantified using the Agilent TapeStation, combined in equal amounts, and subjected to 2 × 300 bp MiSeq sequencing using a custom priming approach (SeqMatic).
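The two pairing metrics used in this analysis are the number of unique CDRL3 partners per CDRH3 and the top-pair weight $w = \max(c_i)/\sum_i c_i$ (ref. 13), where $c_i$ are the read counts of the CDRL3 partners of one CDRH3. Both can be computed from a list of clustered (CDRH3, CDRL3) read pairs, as in the minimal sketch below. It assumes reads have already been demultiplexed, paired, annotated and clustered at 88% identity (not shown), and it interprets "singleton" as a CDRH3 supported by a single read. The function names and sequences are hypothetical. The resulting per-library weight distributions are what the one-sided Mann–Whitney comparison described in this section is applied to.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def build_pair_table(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Map each CDRH3 to a Counter of its partner CDRL3 read counts."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for cdrh3, cdrl3 in pairs:
        table[cdrh3][cdrl3] += 1
    return dict(table)


def unique_partners(table: Dict[str, Counter]) -> Dict[str, int]:
    """Number of distinct CDRL3 partners observed for each CDRH3."""
    return {cdrh3: len(partners) for cdrh3, partners in table.items()}


def top_pair_weights(table: Dict[str, Counter]) -> Dict[str, float]:
    """w = max(c_i) / sum(c_i) for every CDRH3 seen in more than one read."""
    weights = {}
    for cdrh3, partners in table.items():
        total = sum(partners.values())
        if total < 2:  # singleton: its weight would trivially be 1, so it is skipped
            continue
        weights[cdrh3] = max(partners.values()) / total
    return weights


# Hypothetical clustered reads as (CDRH3, CDRL3) amino acid pairs.
reads = (
    [("ARDYW", "QQSYST")] * 9 + [("ARDYW", "QQYNSY")]
    + [("AKGGF", "QHYNRP")] * 4 + [("AKGGF", "QQSYST")] * 4
    + [("ARWLL", "QQANSF")]
)
table = build_pair_table(reads)
print(median(unique_partners(table).values()))  # 2
print(top_pair_weights(table))  # {'ARDYW': 0.9, 'AKGGF': 0.5}
```

A CDRH3 with one dominant partner (here ARDYW) scores close to 1, whereas an evenly split CDRH3 (AKGGF) scores 0.5, mirroring the contrast between emulsion and combinatorial libraries.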
We performed two forward 300 bp reads with the R1 and R2 primers (Supplementary Table 3), modifying the instrument sample sheet to replace the index 1 read with our custom primer and generate the VL sequence. The index 2 read used the standard Illumina P5 primer. Following demultiplexing, raw FASTQ reads were quality-filtered using FastQC, paired by the Illumina FASTQ ID, and aligned to IMGT V and J genes using IgBLAST19 to annotate germline families and delineate CDR3 regions. Subsequently, CDRH3 and CDRL3 sequences were concatenated and clustered at greater than 88% amino acid identity. Unique CDRH3 and CDRL3 sequences were counted, and the number of unique VL sequences pairing with each unique VH was used as a measure of pairing efficiency.

For CDRH3 sequences paired with multiple CDRL3 sequences, the top-pair weight is defined as the ratio of the count of the most abundant CDRL3 to the total count of all CDRL3 sequences13. Briefly, for each unique CDRH3, we counted the associated unique CDRL3 sequences. If a CDRH3 has m CDRL3 partners with counts \(c_i\) (\(1 \le i \le m\)), then the top-pair count of this CDRH3 is defined as \(\max_i c_i\) and the top-pair weight is \(w = \frac{\max_i c_i}{\sum_{i=1}^{m} c_i}\). The calculation is performed for all unique CDRH3 sequences, excluding singletons, since such sequences would all have a top-pair weight of 1. Ideally, if one CDRH3 is uniquely paired with one CDRL3, the top-pair weight should be 1 (i.e., \(\max_i c_i = \sum_{i=1}^{m} c_i\)).

Statistical comparisons of top-pair distributions were performed using the one-sided, unpaired Mann–Whitney non-parametric test with the R stats function "wilcox.test".

Phage-display library construction and enrichment for anti-hemagglutinin antibodies

Amplicon libraries were subcloned into pCANTAB6 using NotI and SfiI restriction enzymes (New England Biolabs) and phage-display libraries were generated as previously described17.
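The pairing-efficiency count (unique light-chain partners per CDRH3) and the top-pair weight w defined in the bioinformatic analysis above can both be computed from a list of clustered (CDRH3, CDRL3) reads. The Python sketch below assumes one tuple per read and treats a "singleton" as a CDRH3 observed in a single read. This is our reading of the definition, not the published igseqanalysis code, and the CDR3 strings are toy values.

```python
# Sketch: pairing statistics per unique CDRH3 from (CDRH3, CDRL3) read pairs.
# ASSUMPTIONS: one tuple per read after clustering; a "singleton" is a CDRH3
# observed in exactly one read. Toy sequences below are illustrative only.
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def partner_counts(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Map each CDRH3 to a Counter of its CDRL3 partners (the c_i values)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cdrh3, cdrl3 in pairs:
        counts[cdrh3][cdrl3] += 1
    return dict(counts)


def unique_partners(counts: Dict[str, Counter]) -> Dict[str, int]:
    """Number of unique CDRL3 sequences paired with each CDRH3 (m)."""
    return {cdrh3: len(c) for cdrh3, c in counts.items()}


def top_pair_weights(counts: Dict[str, Counter]) -> Dict[str, float]:
    """w = max(c_i) / sum(c_i), skipping single-read CDRH3s (w would be 1)."""
    weights = {}
    for cdrh3, c in counts.items():
        total = sum(c.values())
        if total > 1:
            weights[cdrh3] = max(c.values()) / total
    return weights


if __name__ == "__main__":
    reads = (
        [("ARDYW", "QQYNSYPLT")] * 3
        + [("ARDYW", "QQSYSTPLT")]
        + [("AKGGF", "QQYGSSPWT")] * 2
        + [("ARVLL", "MQALQTPYT")]
    )
    counts = partner_counts(reads)
    print(unique_partners(counts))  # {'ARDYW': 2, 'AKGGF': 1, 'ARVLL': 1}
    print(top_pair_weights(counts))  # {'ARDYW': 0.75, 'AKGGF': 1.0}
```

In the toy data, ARDYW is seen with two light chains in a 3:1 ratio, giving w = 0.75; AKGGF is consistently paired (w = 1); and the single-read ARVLL is excluded from the weight calculation.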
A total of 96 colonies from each of the four libraries were cultured to mid-log phase and infected with M13-K07 (Invitrogen) to initiate overnight monoclonal phage production. Antibody display was determined by ELISA: 1 μg ml⁻¹ anti-myc antibody (Invitrogen) was immobilized overnight on 96-well MaxiSorp plates (Nunc) and blocked for 2 h with 3% BSA (Sigma) and 0.05% Tween-20 (BDH). Following washing with PBST (PBS pH 7.2 (Invitrogen) + 0.05% Tween-20), diluted phage supernatant was bound and detected using an anti-M13-HRP antibody (1:5000, GE Healthcare, cat no 27942101) and visualized with TMB (KPL, cat no 52-00-01).

Recombinant hemagglutinin (HA) proteins were expressed and purified as previously described40. The HA proteins used were as follows: H1 CA/09, A/California/07/2009 H1N1; H1 SD/07, A/South Dakota/06/2007 H1N1; H2 MO/06, A/Swine/Missouri/2006 H2N3; H5 VN/04, A/Vietnam/1194/2004 H5N1; H6 HK/97, A/teal/Hong Kong/W312/97 H6N1; H9 HK/97, A/chicken/Hong Kong/G9/97 H9N2; H3 PE/09, A/Perth/16/2009 H3N2; H7 NL/03, A/Netherlands/219/2003 H7N7; B FL/06, B/Florida/04/2006 Yamagata lineage; B BR/08, B/Brisbane/60/2008 Victoria lineage.

Phage-display libraries were enriched over two rounds of panning as previously described17. The first round used 75 nM biotinylated hemagglutinin H1 (A/California/07/2009 H1N1), while the second round was panned for cross-reactive clones using 75 nM hemagglutinin H5 (A/Vietnam/1203/2004 H5N1).
Amplified phage outputs were profiled by polyclonal ELISA as above, using immobilized NeutrAvidin (Thermo Fisher Scientific) to capture the specific biotinylated antigen prior to incubation with phage.

NGS libraries from panning outputs were prepared by amplifying DNA purified from the panning outputs with the phagemid-specific primers pCANTAB6-F and pCANTAB6-R (Supplementary Table 3) using Q5 polymerase (NEB) according to the manufacturer's instructions with the following thermocycling program: 98 °C for 30 s; three cycles of 98 °C for 10 s, 65 °C for 20 s and 72 °C for 60 s; and a final 72 °C extension for 2 min. This was followed by Illumina barcoding PCR using the same primers and conditions as above (Illu_scaleup_F + Illu_R_N50X; Supplementary Table 3) but with only 10 cycles of amplification.

Enriched libraries were subcloned into an scFv-Fc expression vector21 using NotI and SfiI restriction enzymes (New England Biolabs) and transformed into chemically competent TOP10 cells (Invitrogen). Single clones were grown overnight in LB containing 100 μg ml⁻¹ carbenicillin (Invitrogen) and 2% glucose (TekNova) before being diluted 1:500 in reconstituted MagicMedia (Invitrogen) containing 100 μg ml⁻¹ carbenicillin (Invitrogen). Cells were induced for 72 h at 25 °C and pelleted by centrifugation. Diluted supernatants were used to determine antigen reactivity by ELISA as described above, using an anti-Fc-gamma-HRP secondary antibody (Jackson ImmunoResearch). Following Sanger sequencing of identified hits, unique clones were expressed in HEK-293 cells for 6 days and supernatants were used to determine EC50 values by ELISA.

Measurement of binding affinity and epitope binning by biolayer interferometry

All binding steps were performed in PBS pH 7.2 containing 3% BSA (Sigma) and 0.05% Tween-20 (BDH). ScFv-Fc antibodies at 10 μg ml⁻¹ were immobilized on Protein A biosensors (ForteBio cat no 18-5010) for 300 s, followed by an equilibration step of 180 s.
Bound antibodies were associated with a minimum of five concentrations of biotinylated hemagglutinin H1 (A/California/07/2009 H1N1) or hemagglutinin H5 (A/Vietnam/1203/2004 H5N1) for 300 s, then dissociated in buffer for 600 s, or 2000 s for high-affinity antibodies. The affinity of the 0084-B14 antibody was assayed as above, but using anti-human Fc biosensors (ForteBio cat no 18-5060). Positive binding was called when the background-corrected delta value was greater than 0.03 nm. For KD determination, curves were fit to a standard 1:1 binding model using the ForteBio analysis software. Only curves that fit this model well (R² > 0.96) were used in affinity calculations.

Competition assays were performed by immobilizing biotinylated antibody at 10 μg ml⁻¹ on streptavidin biosensors (ForteBio cat no 18-5019) for 1200 s before equilibrating the sensors in buffer containing 250 μM d-biotin (Amresco) for 180 s. Association was performed with 1 μM hemagglutinin H1 (A/California/07/2009 H1N1) or buffer for 500 s before associating with competing IgG for 500 s.

Viruses and microneutralization assay

Wild-type influenza strains were obtained from the Centers for Disease Control and Prevention or purchased from the American Type Culture Collection. All viruses were propagated in embryonated chicken eggs, and virus titers were determined as the mean 50% tissue culture infective dose (TCID50) per milliliter. The microneutralization assay was performed as described previously40. Briefly, 60 TCID50 of virus was added to each well containing three-fold serial dilutions of antibody (starting from 1 mg ml⁻¹) in a 384-well plate in complete minimal essential medium containing 0.75 μg ml⁻¹ trypsin (Worthington), in duplicate wells. After 1 h incubation at 33 °C and 5% CO2, 2 × 10⁴ MDCK cells were added to each well.
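The antibody concentrations in the microneutralization plate form a geometric series that starts at 1 mg ml⁻¹ and falls three-fold per step. The sketch below assumes eight dilution points, since the number of points is not stated here, and expresses the starting concentration as 1000 μg ml⁻¹. It describes the dilution series as prepared, not the final in-well concentration after virus and cells are added.

```python
# Sketch: three-fold serial dilution series for the microneutralization assay.
# ASSUMPTION: the number of dilution points (8) is hypothetical.


def serial_dilution(start: float, fold: float, n_points: int) -> list:
    """Concentrations start, start/fold, start/fold**2, ... (n_points values)."""
    return [start / fold**i for i in range(n_points)]


if __name__ == "__main__":
    series = serial_dilution(start=1000.0, fold=3.0, n_points=8)  # ug/ml
    for conc in series:
        print(f"{conc:10.3f} ug/ml")
```

With eight points the lowest concentration is 1000/3⁷ ≈ 0.457 μg ml⁻¹, so the number of points directly sets the lower bound of the measurable neutralization range.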
Plates were incubated in a 33 °C, 5% CO2 incubator for approximately 40 h, and neuraminidase (NA) activity was measured by adding a fluorescently labeled substrate, methylumbelliferyl-N-acetyl neuraminic acid (Sigma), to each well and incubating at 37 °C for 1 h. Virus replication, represented by NA activity, was quantified by reading fluorescence with the following settings: excitation 355 nm, emission 460 nm; 10 flashes per well.

Data availability

The NGS data generated during the current study are available in the SRA archive (study accession: SRP104286). Source code for the analysis methods used in this study is available under the GPL v.3 license on GitHub: https://github.com/Jincheng2009/igseqanalysis.

References

1. Ecker, D. M., Jones, S. D. & Levine, H. L. The therapeutic monoclonal antibody market. mAbs 7, 9–14 (2015).
2. Pappas, L. et al. Rapid development of broadly influenza neutralizing antibodies through redundant mutations. Nature 516, 418–422 (2014).
3. Wu, X. et al. Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing. Science 333, 1593–1602 (2011).
4. Wang, B. et al. Facile discovery of a diverse panel of anti-Ebola virus antibodies by immune repertoire mining. Sci. Rep. 5, 13926 (2015).
5. Tan, J. et al. A LAIR1 insertion generates broadly reactive antibodies against malaria variant antigens. Nature 529, 105–109 (2016).
6. Amara, K. et al. Monoclonal IgG antibodies generated from joint-derived B cells of RA patients have a strong bias toward citrullinated autoantigen recognition. J. Exp. Med. 210, 445–455 (2013).
7. Di Niro, R. et al. High abundance of plasma cells secreting transglutaminase 2-specific IgA autoantibodies with limited somatic hypermutation in celiac disease intestinal lesions. Nat. Med. 18, 441–445 (2012).
8. Lomakin, Y. A. et al. Heavy-light chain interrelations of MS-associated immunoglobulins probed by deep sequencing and rational variation. Mol. Immunol. 62, 305–314 (2014).
9. Bushey, R. T. et al. A therapeutic antibody for cancer, derived from single human B cells. Cell Rep. 15, 1505–1513 (2016).
10. Wilson, P. C. & Andrews, S. F. Tools to therapeutically harness the human antibody response. Nat. Rev. Immunol. 12, 709–719 (2012).
11. Tiller, T. Single B cell antibody technologies. New Biotechnol. 28, 453–457 (2011).
12. DeKosky, B. J. et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat. Biotechnol. 31, 166–169 (2013).
13. DeKosky, B. J. et al. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med. 21, 86–91 (2015).
14. Galson, J. D., Kelly, D. F. & Trück, J. Identification of antigen-specific B-cell receptor sequences from the total B-cell repertoire. Crit. Rev. Immunol. 35, 463–478 (2015).
15. White, A. K. et al. High-throughput microfluidic single-cell RT-qPCR. Proc. Natl Acad. Sci. USA 108, 13999–14004 (2011).
16. Eastburn, D. J., Sciambi, A. & Abate, A. R. Picoinjection enables digital detection of RNA with droplet rt-PCR. PLoS ONE 8, e62961 (2013).
17. Vaughan, T. J. et al. Human antibodies with sub-nanomolar affinities isolated from a large non-immunized phage display library. Nat. Biotechnol. 14, 309–314 (1996).
18. Greiff, V., Miho, E., Menzel, U. & Reddy, S. T. Bioinformatic and statistical analysis of adaptive immune repertoires. Trends Immunol. 36, 738–749 (2015).
19. Ye, J., Ma, N., Madden, T. L. & Ostell, J. M. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41, W34–W40 (2013).
20. McDaniel, J. R., DeKosky, B. J., Tanno, H., Ellington, A. D. & Georgiou, G. Ultra-high-throughput sequencing of the immune receptor repertoire from millions of lymphocytes. Nat. Protoc. 11, 429–442 (2016).
21. Xiao, X. et al. A novel dual expression platform for high throughput functional screening of phage libraries in product like format. PLoS ONE 10, e0140691 (2015).
22. Corti, D. et al. A neutralizing antibody selected from plasma cells that binds to group 1 and group 2 influenza A hemagglutinins. Science 333, 850–856 (2011).
23. Kallewaard, N. L. et al. Structure and function analysis of an antibody recognizing all influenza A subtypes. Cell 166, 596–608 (2016).
24. Xiao, X. et al. A novel antibody discovery platform identifies anti-influenza A broadly neutralizing antibodies from human memory B cells. mAbs 8, 916–927 (2016).
25. Dreyfus, C. et al. Highly conserved protective epitopes on influenza B viruses. Science 337, 1343–1348 (2012).
26. Xiao, X. et al. A high-throughput platform for population reformatting and mammalian expression of phage display libraries to enable functional screening as full-length IgG. mAbs 9, 996–1006 (2017).
27. Pinna, D., Corti, D., Jarrossay, D., Sallusto, F. & Lanzavecchia, A. Clonal dissection of the human memory B-cell repertoire following infection and vaccination. Eur. J. Immunol. 39, 1260–1270 (2009).
28. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
29. Saggy, I. et al. Antibody isolation from immunized animals: comparison of phage display and antibody discovery via V gene repertoire mining. Protein Eng. Des. Sel. 25, 539–549 (2012).
30. Griffiths, A. D. et al. Isolation of high affinity human antibodies directly from large synthetic repertoires. EMBO J. 13, 3245–3260 (1994).
31. Wang, B. et al. Discovery of high affinity anti-ricin antibodies by B cell receptor sequencing and by yeast display of combinatorial VH:VL libraries from immunized animals. mAbs 8, 1035–1044 (2016).
32. Macagno, A. et al. Isolation of human monoclonal antibodies that potently neutralize human cytomegalovirus infection by targeting different epitopes on the gH/gL/UL128-131A complex. J. Virol. 84, 1005–1013 (2010).
33. Crowe, J. E. & Koff, W. C. Deciphering the human immunome. Expert Rev. Vaccin. 14, 1421–1425 (2015).
34. Yu, X., McGraw, P. A., House, F. S. & Crowe, J. E. An optimized electrofusion-based protocol for generating virus-specific human monoclonal antibodies. J. Immunol. Methods 336, 142–151 (2008).
35. Li, Y. et al. Directed evolution of human T-cell receptors with picomolar affinities by phage display. Nat. Biotechnol. 23, 349–354 (2005).
36. Smith, S. N., Harris, D. T. & Kranz, D. M. T cell receptor engineering and analysis using the yeast display platform. Methods Mol. Biol. 1319, 95–141 (2015).
37. Lefranc, M.-P. et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. 37, D1006–D1012 (2009).
38. Untergasser, A. et al. Primer3--new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).
39. Koressaar, T. & Remm, M. Enhancements and modifications of primer design program Primer3. Bioinformatics 23, 1289–1291 (2007).
40. Benjamin, E. et al. A broadly neutralizing human monoclonal antibody directed against a novel conserved epitope on the influenza virus H3 hemagglutinin globular head. J. Virol. 88, 6743–6750 (2014).

Acknowledgments

We thank Shin Ji Lee and Mary Everett for their work in B cell isolation, Qing Zhu for performing the microneutralization assays, Arnita Barnes and Sandrina Phipps for expression and purification of recombinant antigens, Reena Varkey and Jodi Karnell for technical advice regarding B cell handling, and Nicole Kallewaard-LeLay for helpful comments in reviewing this manuscript. This work was partially funded by a grant from DARPA (#HR0011-16-3-0001).

Author information

Author notes
Andrew Mercer. Present address: RegenxBio Inc, Rockville, MD, 20850, USA
Xiaodong Xiao. Present address: Bristol-Myers Squibb, Redwood City, CA, 94063, USA
Partha S. Chowdhury. Present address: Sanofi-Genzyme R&D Center, Framingham, MA, 01701, USA

Affiliations
Antibody Discovery and Protein Engineering, MedImmune, Gaithersburg, MD, 20878, USA: Saravanan Rajan, Michael R. Kierny, Andrew Mercer, Herren Wu, William F. Dall'Acqua, Xiaodong Xiao & Partha S. Chowdhury
Research Bioinformatics, MedImmune, Gaithersburg, MD, 20878, USA: Jincheng Wu & Andrey Tovchigrechko

Contributions
S.R. conceived, designed and performed experiments, interpreted data, developed software, and wrote the manuscript. M.R.K. and A.M. performed experiments, analyzed the data, and prepared figures. J.W. and A.T. implemented a software pipeline for the analysis of NGS data and generated visualizations. H.W. and W.F.D.A. provided scientific guidance. X.X. and P.S.C. conceived and designed experiments, interpreted data, and provided mentorship and scientific guidance. P.S.C. edited the manuscript.

Corresponding authors
Correspondence to Saravanan Rajan or Partha S. Chowdhury.

Ethics declarations

Competing interests
The authors declare that they are or have been employed by AstraZeneca and own AstraZeneca stock.