WO2024254506A2 - SatSeq: a modular system for coupling saturation mutagenesis, DNA barcoding, and deep sequencing - Google Patents
- Publication number
- WO2024254506A2 (PCT/US2024/033082)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- barcode
- expression vector
- promoter
- nucleic acid
- variant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/5005—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
- G01N33/5008—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1058—Directional evolution of libraries, e.g. evolution of libraries is achieved by mutagenesis and screening or selection of mixed population of organisms
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/85—Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
- C12N15/86—Viral vectors
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
- C40B40/08—Libraries containing RNA or DNA which encodes proteins, e.g. gene libraries
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/5005—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
- G01N33/5008—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics
- G01N33/502—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics for testing non-proliferative effects
- G01N33/5023—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics for testing non-proliferative effects on expression patterns
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2740/00—Reverse transcribing RNA viruses
- C12N2740/00011—Details
- C12N2740/10011—Retroviridae
- C12N2740/16011—Human Immunodeficiency Virus, HIV
- C12N2740/16041—Use of virus, viral particle or viral elements as a vector
- C12N2740/16043—Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2830/00—Vector systems having a special element relevant for transcription
- C12N2830/48—Vector systems having a special element relevant for transcription regulating transport or export of RNA, e.g. RRE, PRE, WPRE, CTE
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2563/00—Nucleic acid detection characterized by the use of physical, structural and functional properties
- C12Q2563/179—Nucleic acid detection characterized by the use of physical, structural and functional properties the label being a nucleic acid
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2500/00—Screening for compounds of potential therapeutic value
- G01N2500/04—Screening involving studying the effect of compounds C directly on molecule A (e.g. C are potential ligands for a receptor A, or potential substrates for an enzyme A)
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2500/00—Screening for compounds of potential therapeutic value
- G01N2500/10—Screening for compounds of potential therapeutic value involving cells
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2500/00—Screening for compounds of potential therapeutic value
- G01N2500/20—Screening for compounds of potential therapeutic value cell-free systems
Definitions
- SATSEQ A MODULAR SYSTEM FOR COUPLING SATURATION MUTAGENESIS, DNA BARCODING, AND DEEP SEQUENCING
- Phenotype-to-genotype relationships are complex and non-linear, and can evolve within dynamic and highly interconnected systems. Proteins are often highly pleiotropic, meaning they can perform more than one function and participate in a wide range of biological processes. As such, perturbations to a single gene can affect multiple, independent cellular responses. However, it remains unclear whether the individual protein directly controls a specific cellular response or does so indirectly, through interactions with other proteins in associated signaling cascades. Separation-of-function mutants can be harnessed to tease apart individual protein functions, but these require knowledge of which residues map to specific functions, information that is often lacking. Mutations that specifically abolish the activity of a drug against a protein are called “gatekeeper” mutations.
- Gatekeeper mutations prevent drug binding without impacting target function, thereby abolishing drug effects.
- The identification of gatekeeper mutations is the gold standard for validating on-target drug effects. However, these mutations are frequently identified only in clinical trials, after patients develop drug resistance.
- a method of identifying a non-responsive or a hyper-responsive variant of a target protein for a drug comprising: (a) generating a library of expression vectors encoding a plurality of variants of the target protein, wherein each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique first barcode, and wherein the nucleic acid and the first barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells; (c) contacting the plurality of cells with the drug; and (d) determining a drug response profile of each cell containing the single variant, wherein the drug response profiles identify each variant that responds less to drug treatment than a wild-type target protein as a non-responsive variant; and a variant that responds more to drug treatment than a wild-type target protein as a hyper-responsive variant.
- a method of identifying a non-responsive or a hyper-responsive variant of a target protein for a drug comprising: (a) generating a library of expression vectors encoding all possible single amino acid variants of the target protein, wherein each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique first barcode, and wherein the nucleic acid and the first barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells; (c) contacting the plurality of cells with the drug; and (d) determining a drug response profile of each cell containing the single variant, wherein the drug response profiles identify each variant that responds less to drug treatment than a wild-type target protein as a non-responsive variant; and a variant that responds more to drug treatment than a wild-type target protein as a hyper-responsive variant.
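The comparison in step (d) can be illustrated with a minimal sketch. It assumes each cell has already been reduced to a single numeric response score and assigned to a variant through its barcode; the function name, the `tolerance` margin, and the variant names are hypothetical and do not come from the source, which does not specify how "responds less" or "responds more" is quantified.

```python
from statistics import mean

def classify_variants(responses, wild_type="WT", tolerance=0.2):
    """Label each variant relative to the wild-type drug response.

    responses: dict mapping variant name -> list of per-cell response scores
    tolerance: hypothetical fractional margin around the wild-type mean
    """
    wt_mean = mean(responses[wild_type])
    margin = abs(wt_mean) * tolerance
    labels = {}
    for variant, scores in responses.items():
        if variant == wild_type:
            continue
        variant_mean = mean(scores)
        if variant_mean < wt_mean - margin:
            labels[variant] = "non-responsive"    # responds less than wild type
        elif variant_mean > wt_mean + margin:
            labels[variant] = "hyper-responsive"  # responds more than wild type
        else:
            labels[variant] = "wild-type-like"
    return labels

# Toy data with made-up variant names and scores
example = {
    "WT": [1.0, 1.1, 0.9],
    "A10V": [0.1, 0.2, 0.15],
    "G12D": [1.9, 2.1, 2.0],
    "S5T": [1.0, 0.95, 1.05],
}
labels = classify_variants(example)
```

A real analysis would likely replace the fixed margin with a statistical test across replicate cells; the margin is used here only to keep the sketch short.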
- a method of differentiating non-specific drug effects from specific drug effects for a drug comprising: (a) generating a library of expression vectors encoding a plurality of variants of the target protein, wherein each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique first barcode, and wherein the nucleic acid and the first barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells; (c) contacting the plurality of cells with the drug; and (d) determining a drug response profile of each cell containing the single variant by performing single cell RNA sequencing for the plurality of cells, wherein the drug response profiles identify effects that are induced across all variants treated with the drug as non-specific drug effects; and effects that vary in a subset of the variants as specific drug effects.
- a method of differentiating non-specific drug effects from specific drug effects for a drug comprising: (a) generating a library of expression vectors encoding all possible single amino acid variants of the target protein, wherein each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique first barcode, and wherein the nucleic acid and the first barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells; (c) contacting the plurality of cells with the drug; and (d) determining a drug response profile of each cell containing the single variant by performing single cell RNA sequencing for the plurality of cells, wherein the drug response profiles identify effects that are induced across all variants treated with the drug as non-specific drug effects; and effects that vary in a subset of the variants as specific drug effects.
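The distinction drawn in step (d), effects induced across all variants versus effects that vary in a subset of variants, can be sketched as follows. This is an illustrative simplification, assuming each variant's drug response profile has been summarized as a per-gene change (for example a log fold change); the function name, the gene names, and the `threshold` value are hypothetical.

```python
def split_effects(profiles, threshold=1.0):
    """Separate non-specific from specific drug effects.

    profiles: dict mapping variant name -> dict of gene -> drug-induced change
    threshold: hypothetical cutoff for calling a gene "induced"
    """
    genes = set()
    for profile in profiles.values():
        genes.update(profile)

    non_specific, specific = [], []
    for gene in sorted(genes):
        induced = [abs(p.get(gene, 0.0)) >= threshold for p in profiles.values()]
        if all(induced):
            non_specific.append(gene)  # induced in every variant
        elif any(induced):
            specific.append(gene)      # induced only in a subset of variants
    return non_specific, specific

# Toy profiles with made-up gene names and values
profiles = {
    "WT": {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": 0.1},
    "V1": {"GENE_A": 2.1, "GENE_B": 0.1, "GENE_C": 0.0},
    "V2": {"GENE_A": 1.8, "GENE_B": 1.6, "GENE_C": 0.2},
}
non_specific, specific = split_effects(profiles)
```

Here `GENE_A` changes regardless of the variant and is treated as non-specific, while `GENE_B` is lost in one variant and is treated as a specific effect.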
- a method of identifying a drug that binds a variant of a target protein comprising: (a) generating a library of expression vectors encoding a plurality of variants of the target protein, wherein each vector comprises (i) a nucleic acid encoding a single variant fused to a protein purification tag, and (ii) a unique first barcode, and wherein the nucleic acid and the first barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells to express all variants of the target protein; (c) purifying all variants of the target protein; (d) incubating all variants of the target protein with a library of drug candidates; and (e) identifying the drug that binds to a variant of the target protein.
- a method of identifying a drug that binds a variant of a target protein comprising: (a) generating a library of expression vectors encoding all possible single amino acid variants of the target protein, wherein each vector comprises (i) a nucleic acid encoding a single variant fused to a protein purification tag, and (ii) a unique first barcode, and wherein the nucleic acid and the first barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells to express all variants of the target protein; (c) purifying all variants of the target protein; (d) incubating all variants of the target protein with a library of drug candidates; and (e) identifying the drug that binds to a variant of the target protein.
- the protein purification tag is a FLAG or HA tag.
- the library of drug candidates is a DNA-encoded chemical library.
- the method further comprises identifying a non-responsive or a hyper-responsive variant for the identified drug using the method described herein.
- a method of identifying a pathway-specific modulator of a pleiotropic gene comprising: (a) generating a library of expression vectors encoding a plurality of variants of the protein encoded by the pleiotropic gene, wherein each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique barcode, and wherein the nucleic acid and the barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells; (c) enriching variants that modulate the pathway in a phenotypic assay; (d) purifying the enriched variants; (e) incubating the enriched variants with a library of drug candidates; and (f) identifying a drug that binds to at least one variant of the enriched variants, thereby identifying a pathway-specific modulator of the pleiotropic gene.
- a method of identifying a pathway-specific modulator of a pleiotropic gene comprising: (a) generating a library of expression vectors encoding all possible single amino acid variants of the protein encoded by the pleiotropic gene, wherein each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique first barcode, and wherein the nucleic acid and the first barcode are transcribed as a single transcript; (b) delivering the library of expression vectors to a plurality of cells; (c) enriching variants that modulate the pathway in a phenotypic assay; (d) purifying the enriched variants; (e) incubating the enriched variants with a library of drug candidates; and (f) identifying a drug that binds to at least one variant of the enriched variants, thereby identifying a pathway-specific modulator of the pleiotropic gene.
- the methods provided herein are used for patient stratification. In some embodiments, the methods provided herein are used for predicting drug resistance.
- the target protein is an intracellular protein. In some embodiments, the target protein is SUNG. In some embodiments, the target protein is KRAS.
- the phenotypic assay is a FACS assay for a cellular marker, single-cell RNA sequencing, growth assay, or an in vivo metastatic assay.
- the method further comprises identifying a non-responsive or a hyper-responsive variant for the identified pathway specific modulator using the method described herein.
- the step (a) comprises: (1) generating a library of mutagenesis vectors comprising nucleic acid sequences encoding a plurality of variants of the target protein, wherein each vector comprises a nucleic acid encoding a single variant; (2) inserting a unique barcode into each vector; and (3) subcloning the DNA fragment comprising the nucleic acid encoding the single variant and the unique barcode into the expression vector.
- the step (a) comprises: (1) generating a library of mutagenesis vectors comprising nucleic acid sequences encoding all possible single amino acid variants of the target protein, wherein each vector comprises a nucleic acid encoding a single variant; (2) inserting a unique first barcode into each vector; and (3) subcloning the DNA fragment comprising the nucleic acid encoding the single variant and the unique first barcode into the expression vector.
- the library of mutagenesis vectors (e.g., comprising nucleic acid sequences encoding all possible single amino acid variants of the target protein) is generated by any methodology known in the art or described herein, such as commercial synthesis of the variant library, error-prone PCR, site-directed mutagenesis, Plasmid one-pot, or pooled oligo synthesis.
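To make the size of a library of "all possible single amino acid variants" concrete, the sketch below enumerates them at the protein level for a toy sequence. It assumes substitutions to the 19 other standard amino acids at every position, giving 19 × L variants for a protein of length L; whether the start codon is mutated or stop codons are included is not stated in the source, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def single_aa_variants(protein):
    """Return (name, sequence) pairs for every single amino acid substitution."""
    variants = []
    for pos, wild_type_aa in enumerate(protein, start=1):
        for aa in AMINO_ACIDS:
            if aa == wild_type_aa:
                continue  # skip the wild-type residue itself
            mutated = protein[:pos - 1] + aa + protein[pos:]
            variants.append((f"{wild_type_aa}{pos}{aa}", mutated))
    return variants

# Toy three-residue "protein": 3 positions x 19 substitutions = 57 variants
variants = single_aa_variants("MKT")
```

Each variant name (for example `M1A`) records both the position and the substitution, which is the information the first barcode is described as encoding.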
- the first barcode encodes information regarding the position of the mutation. In some embodiments, the first barcode encodes information regarding both the position of the mutation and the amino acid substitution. In some embodiments, the first barcode is a unique 16-nt DNA sequence. In some embodiments, each first barcode comprises one, two, three, four or all five of the following characteristics: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; (4) absence of any BsaI (GGTCTC) restriction sites; and (5) absence of any BsmBI (CGTCTC) restriction sites.
- each first barcode is characterized by: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; and (4) absence of any BsaI (GGTCTC) or BsmBI (CGTCTC) restriction sites.
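The barcode criteria can be expressed as a simple filter. The sketch below is a hedged illustration, not the procedure used by the applicants: it greedily accepts candidates that satisfy the length, GC content, restriction site, and Levenshtein distance criteria. Checking the reverse strand for restriction sites is an added assumption, and the homodimer Tm criterion is not implemented because it needs a thermodynamic model that the source does not specify. All function names and the candidate sequences are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

FORBIDDEN_SITES = ("GGTCTC", "CGTCTC")  # BsaI and BsmBI recognition sites

def has_forbidden_site(seq):
    # Assumption: a site on either strand disqualifies the barcode
    rc = revcomp(seq)
    return any(site in seq or site in rc for site in FORBIDDEN_SITES)

def select_barcodes(candidates, length=16, min_distance=4):
    """Greedily keep candidates that pass every implemented criterion."""
    accepted = []
    for barcode in candidates:
        if len(barcode) != length:
            continue
        if not 0.40 <= gc_fraction(barcode) <= 0.60:
            continue
        if has_forbidden_site(barcode):
            continue
        if all(levenshtein(barcode, other) >= min_distance for other in accepted):
            accepted.append(barcode)
    return accepted

# Toy candidates chosen only to exercise each filter
candidates = [
    "ACGTACGTACGTACGT",  # passes every implemented check
    "ACGTACGTACGTACGA",  # one edit away from the first barcode
    "GGTCTCAATTAACCGG",  # contains a BsaI site
    "AAAAAAAATTTTTTTT",  # GC content of 0%
    "AGGAAGAGGAGAAGGA",  # passes and is far from the first barcode
]
selected = select_barcodes(candidates)
```

The greedy pass depends on candidate order, so a different ordering can yield a different accepted set; a production design would also add the Tm filter as a further condition.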
- the expression vector is any expression vector or plasmid known in the art or described herein.
- the expression vector is a lentivirus expression vector, an adeno-associated viral (AAV) vector, or a piggyBac system.
- the expression vector is a lentivirus expression vector.
- the lentivirus expression vector comprises a 5’ LTR comprising a mutated BsaI site, and/or a 3’ LTR comprising a mutated BsaI site.
- the BsaI (GGTCTC) restriction site in the 5’ LTR and/or 3’ LTR is mutated to remove the BsaI restriction site but preserve transduction efficiency of the lentivirus expression vector (e.g., to make it compatible with both BsaI and BsmBI Golden Gate assembly (GGA) cloning).
- the BsaI (GGTCTC) restriction site in the 5’ LTR and/or 3’ LTR of the lentivirus expression vector is replaced with the sequence GGTTTC.
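The single-base change described above (GGTCTC to GGTTTC) can be sketched as a string operation. This is only an illustration on toy sequences, not the actual LTR sequences of SEQ ID NO: 2 or 4. Handling a site on the reverse strand (GAGACC, rewritten to GAAACC) is an added assumption, and the function names are hypothetical.

```python
BSAI_SITE = "GGTCTC"
BSAI_SITE_RC = "GAGACC"  # the same site read on the opposite strand

def has_bsai_site(seq):
    """True if a BsaI recognition site is present on either strand."""
    return BSAI_SITE in seq or BSAI_SITE_RC in seq

def mutate_bsai_sites(seq):
    """Replace GGTCTC with GGTTTC (and GAGACC with GAAACC on the other strand)."""
    return seq.replace(BSAI_SITE, "GGTTTC").replace(BSAI_SITE_RC, "GAAACC")

# Toy sequences, one site on each strand
forward_example = mutate_bsai_sites("AAGGTCTCTT")
reverse_example = mutate_bsai_sites("TTGAGACCAA")
```

Removing the internal site means BsaI digestion during cloning cuts only at the intended cloning sites; whether a given substitution preserves transduction efficiency has to be confirmed experimentally.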
- the 5’ LTR comprises the nucleic acid sequence of SEQ ID NO: 2.
- the 3’ LTR comprises the nucleic acid sequence of SEQ ID NO: 4.
- the expression vector further comprises one or more elements from a first promoter, a selection marker, a P2A domain, and a WPRE element, optionally wherein the selection marker is Blasticidin.
- the expression vector comprises, from 5’ to 3’, a first promoter, a selection marker, a P2A domain, a nucleic acid sequence encoding a single variant, a first barcode, and a WPRE element.
- the expression vector further comprises two BsaI cloning sites at the 3’ untranslated region (UTR) of the nucleic acid sequence encoding the single variant and at the 5’ of the first barcode.
- the expression vector comprises an insertion between the two BsaI cloning sites, wherein the insertion comprises a second promoter, a constant region, a second barcode or a unique clonal identifier, and a ribozyme, and wherein the constant region is at the 5’ of the second promoter, optionally wherein the second promoter is a U6 promoter.
- the expression vector further comprises a capture feature (such as 10X feature capture sequence 1 (5’-GCTTTAAGGCCGGTCCTAGCAA-3’)) at 3’ downstream of the first barcode.
- the constant region does not form primer dimers with Nextera Rd1 (5’-GCAGCGTCAGATGTGTATAAGAGACAG-3’ (SEQ ID NO: 7)), TruSeq Rd1 (5’-CTACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 8)), and TSO (5’-AAGCAGTGGTATCAACGCAGAG-3’ (SEQ ID NO: 9)).
- the constant region has the sequence of 5’-AGAACCTTGCGGGTAAATC-3’ (SEQ ID NO: 5).
- the constant region can be modified to act as a PCR priming site and to be compatible with a capture sequence (e.g., such that it does not form primer dimers).
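One crude way to screen a candidate constant region against a set of primers is to measure the longest stretch of perfect complementarity between the two oligos. The sketch below does this by finding the longest common substring between one oligo and the reverse complement of the other. It is a rough proxy only: the source does not say how primer-dimer formation was assessed, a proper check would use a thermodynamic model, and the function names are hypothetical. The sequences are the constant region and primers listed above.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_common_substring(a, b):
    """Length of the longest contiguous stretch shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, curr[-1])
        prev = curr
    return best

def max_complementary_run(oligo_a, oligo_b):
    """Longest run of bases in oligo_a that can pair perfectly with oligo_b."""
    return longest_common_substring(oligo_a, revcomp(oligo_b))

CONSTANT_REGION = "AGAACCTTGCGGGTAAATC"  # SEQ ID NO: 5
PRIMERS = {
    "Nextera Rd1": "GCAGCGTCAGATGTGTATAAGAGACAG",  # SEQ ID NO: 7
    "TruSeq Rd1": "CTACACGACGCTCTTCCGATCT",        # SEQ ID NO: 8
    "TSO": "AAGCAGTGGTATCAACGCAGAG",               # SEQ ID NO: 9
}

# Longest complementary run between the constant region and each primer
runs = {name: max_complementary_run(CONSTANT_REGION, seq)
        for name, seq in PRIMERS.items()}
```

A short maximum run suggests a low risk of stable dimers, but the acceptable cutoff is a design decision that the source does not state.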
- the expression vector comprises, from 5’ to 3’, a first promoter, a selection marker, a P2A domain, a nucleic acid encoding a single variant of a target protein, a second promoter, a constant region, a second barcode or a unique clonal identifier, a first barcode, and a capture sequence (e.g., 10X capture sequence 1), optionally wherein the second promoter is a U6 promoter.
- the variant is identified using long-read sequencing.
- the long-read sequencing offers >1.5kb read length.
- the long-read sequencing offers >5 million reads sequencing depth and >1.5kb read length.
- the long-read sequencing is PacBio Revio sequencing. Regarding the long-read sequencing, see, e.g., Marx, 2023, Nature Methods 20:6-11, which is incorporated by reference herein in its entirety.
- the drug response is determined in an in vitro cellular context, or in vivo context, at step (d).
- the cell lacks endogenous expression of the target protein.
- the method further comprises training a machine learning model using the library of expression vectors and the drug response profiles, to perform phenotypic prediction from sequences.
- training the machine learning model comprises inferring structures for the library of expression vectors and providing the inferred structures to the machine learning model.
- the method further comprises: providing a sequence to the machine learning model and receiving therefrom a phenotypic prediction.
- the machine learning model comprises an artificial neural network.
- the drug response profile is a transcriptional profile, cell fitness, or expression level of a marker.
- the drug response profile is a transcriptional profile, and the transcriptional profile is determined by single cell RNA sequencing.
- a vector (such as a mutagenesis vector or an expression vector) comprising (e.g., from 5’ to 3’) a first promoter, a nucleic acid sequence encoding a target protein variant, a first barcode (e.g., wherein the first barcode encodes information regarding the position of the mutation), and optionally a capture sequence; wherein the first barcode comprises one, two, three, four or all five of the following characteristics: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; (4) absence of any BsaI (GGTCTC) restriction sites; and (5) absence of any BsmBI (CGTCTC) restriction sites.
- the first barcode is a unique 16-nt DNA sequence.
- the vector is any vector or plasmid known in the art or described herein (e.g., a lentivirus vector).
- the provided vector can be used in any of the methods and systems described herein.
- an expression vector comprising (e.g., from 5’ to 3’) a first promoter, a selection marker, a P2A domain, a nucleic acid sequence encoding a target protein variant, a first barcode (e.g., wherein the first barcode encodes information regarding the position of the mutation), and optionally a capture sequence and a WPRE element; wherein the first barcode comprises one, two, three, four or all five of the following characteristics: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; (4) absence of any BsaI (GGTCTC) restriction sites; and (5) absence of any BsmBI (CGTCTC) restriction sites.
- the first barcode is a unique 16-nt DNA sequence.
- the vector is any vector or plasmid known in the art or described herein (e.g., a lentivirus vector). In some embodiments, such an expression vector can be used in any of the methods and systems described herein.
- a vector (such as a mutagenesis vector or an expression vector) comprising (e.g., from 5’ to 3’) a first promoter, a nucleic acid encoding a target protein variant, a second promoter, a constant region, a second barcode or a unique clonal identifier, a first barcode (e.g., wherein the first barcode encodes information regarding the position of the mutation), and optionally a capture sequence.
- the constant region does not form primer dimers with Nextera Rd1 (5’-GCAGCGTCAGATGTGTATAAGAGACAG-3’ (SEQ ID NO: 7)), TruSeq Rd1 (5’-CTACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 8)), and TSO (5’-AAGCAGTGGTATCAACGCAGAG-3’ (SEQ ID NO: 9)).
- the constant region has the sequence of 5’-AGAACCTTGCGGGTAAATC-3’ (SEQ ID NO: 5).
- the constant region can be modified to act as a PCR priming site and to be compatible with a capture sequence (e.g., such that it does not form primer dimers).
- the first barcode is a unique 16-nt DNA sequence.
- the first barcode comprises one, two, three, four or all five of the following characteristics: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; (4) absence of any BsaI (GGTCTC) restriction sites; and (5) absence of any BsmBI (CGTCTC) restriction sites.
- the first barcode is characterized by: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; and (4) absence of any BsaI (GGTCTC) or BsmBI (CGTCTC) restriction sites.
- the capture sequence is 10X capture sequence 1 (5’-GCTTTAAGGCCGGTCCTAGCAA-3’).
- the second promoter is a U6 promoter.
- the vector is any vector or plasmid known in the art or described herein (e.g., a lentivirus vector). In some embodiments, such a vector can be used in any of the methods and systems described herein.
- an expression vector comprising (e.g., from 5’ to 3’) a first promoter, a selection marker, a P2A domain, a nucleic acid encoding a target protein variant, a second promoter, a constant region, a second barcode or a unique clonal identifier, a first barcode, and optionally a capture sequence.
- the constant region does not form primer dimers with Nextera Rd1 (5’-GCAGCGTCAGATGTGTATAAGAGACAG-3’ (SEQ ID NO: 7)), TruSeq Rd1 (5’-CTACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 8)), and TSO (5’-AAGCAGTGGTATCAACGCAGAG-3’ (SEQ ID NO: 9)).
- the constant region has the sequence of 5’-AGAACCTTGCGGGTAAATC-3’ (SEQ ID NO: 5).
- the constant region can be modified to act as a PCR priming site and to be compatible with a capture sequence (e.g., such that it does not form primer dimers).
- the first barcode is a unique 16-nt DNA sequence.
- the first barcode comprises one, two, three, four or all five of the following characteristics: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; (4) absence of any BsaI (GGTCTC) restriction sites; and (5) absence of any BsmBI (CGTCTC) restriction sites.
- the first barcode is characterized by: (1) a Levenshtein distance of at least 4 from any other barcode in the set; (2) a GC content between 40-60%; (3) a maximum homodimer melting temperature (Tm) of less than 40 degrees Celsius; and (4) absence of any BsaI (GGTCTC) or BsmBI (CGTCTC) restriction sites.
- the capture sequence is 10X capture sequence 1 (5’-GCTTTAAGGCCGGTCCTAGCAA-3’).
- the second promoter is a U6 promoter.
- the vector is any vector or plasmid known in the art or described herein (e.g., a lentivirus vector).
- such a vector can be used in any of the methods and systems described herein.
- the vectors described herein comprise a BsaI and/or BsmBI restriction enzyme site for insertion of a nucleic acid sequence encoding a variant of a target protein and/or a DNA barcode.
- a system comprising a mutagenesis vector and an expression vector, wherein the mutagenesis vector and the expression vector each comprise a BsaI and/or BsmBI restriction enzyme site for insertion of a nucleic acid sequence encoding a variant of a target protein and/or a DNA barcode.
- the expression vector is a lentivirus expression vector, an adeno-associated viral (AAV) vector, or a piggyBac system.
- the expression vector is a lentivirus expression vector.
- the lentivirus expression vector comprises a 5’ LTR comprising a mutated BsaI site, and/or a 3’ LTR comprising a mutated BsaI site.
- the 5’ LTR comprises the nucleic acid sequence of SEQ ID NO: 2.
- the 3’ LTR comprises the nucleic acid sequence of SEQ ID NO: 4.
- the expression vector further comprises one or more elements from a first promoter, a selection marker, a P2A domain, and a WPRE element, optionally wherein the selection marker is Blasticidin.
- the expression vector comprises, from 5’ to 3’, a first promoter, a selection marker, a P2A domain, a nucleic acid sequence encoding a single variant, a first barcode, and a WPRE element.
- the expression vector further comprises two BsaI cloning sites at the 3’ untranslated region (UTR) of the nucleic acid sequence encoding the single variant and at the 5’ of the first barcode.
- the expression vector comprises an insertion between the two BsaI cloning sites, wherein the insertion comprises a second promoter, a second barcode or a unique clonal identifier, a ribozyme, and a constant region, and wherein the constant region is at the 5’ of the second promoter, optionally wherein the second promoter is a U6 promoter.
- the expression vector further comprises a 10X feature capture sequence 1 at 3’ downstream of the first barcode.
- the constant region does not form primer dimers with Nextera Rd1 (5’-GCAGCGTCAGATGTGTATAAGAGACAG-3’ (SEQ ID NO: 7)), TruSeq Rd1 (5’-CTACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 8)), and TSO (5’-AAGCAGTGGTATCAACGCAGAG-3’ (SEQ ID NO: 9)).
- the constant region has a sequence of 5’-AGAACCTTGCGGGTAAATC-3’ (SEQ ID NO: 5).
- the expression vector comprises, from 5’ to 3’, a first promoter, a selection marker, a P2A domain, a nucleic acid encoding a single variant of a target protein, a second promoter, a constant region, a second barcode or a unique clonal identifier, a first barcode, and a 10X capture sequence 1, optionally wherein the second promoter is a U6 promoter.
- a lentiviral expression vector comprising: (1) a mutated 5’ LTR, (2) a nucleic acid encoding a single variant of a target protein, (3) a first barcode, and (4) a mutated 3’ LTR.
- the mutated 5’ LTR comprises a mutated BsaI site.
- the mutated 3’ LTR comprises a mutated BsaI site.
- the BsaI (GGTCTC) restriction site in the 5’ LTR and/or 3’ LTR is mutated to remove the BsaI restriction site but preserve transduction efficiency of the lentivirus expression vector (e.g., to make it compatible with both BsaI and BsmBI GGA cloning).
- the BsaI (GGTCTC) restriction site in the 5’ LTR and/or 3’ LTR of the lentivirus expression vector is replaced with sequence GGTTTC.
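As an illustration of the site removal described above, the following minimal sketch scans both strands of a sequence for the BsaI recognition sequence (GGTCTC) and confirms that the GGTCTC to GGTTTC substitution removes it. The fragment `ltr_fragment` is a made-up toy sequence, not SEQ ID NO: 2 or SEQ ID NO: 4, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sites(seq: str, site: str = "GGTCTC") -> list:
    """List (0-based position, strand) for each occurrence of a
    non-palindromic recognition site on either strand."""
    seq = seq.upper()
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Toy fragment for illustration only (assumption, not a real LTR sequence).
ltr_fragment = "ACTGGGTCTCTCTGGTTAGACC"
mutated_fragment = ltr_fragment.replace("GGTCTC", "GGTTTC")

print(find_sites(ltr_fragment))      # site present on the top strand
print(find_sites(mutated_fragment))  # no site remains
```

Note that a plain string replacement only edits top-strand occurrences; a site on the bottom strand appears as GAGACC in the top-strand sequence and would need the complementary change.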
- the mutated 5’ LTR comprises the nucleic acid sequence of SEQ ID NO: 2.
- the mutated 3’ LTR comprises the nucleic acid sequence of SEQ ID NO: 4.
- such lentivirus expression vector further comprises any one or more additional features (including but not limited to: a first promoter, a nucleic acid encoding a target protein variant, a second promoter, a constant region, a second barcode or a unique clonal identifier, and a first barcode) as described herein.
- the lentiviral expression vector further comprises one or more elements from a first promoter, a selection marker, a P2A domain, and a WPRE element, optionally wherein the selection marker is Blasticidin.
- the lentiviral expression vector comprises, from 5’ to 3’, a first promoter, a selection marker, a P2A domain, a nucleic acid sequence encoding a single variant, a first barcode, and a WPRE element.
- the lentiviral expression vector further comprises two BsaI cloning sites at the 3’ untranslated region (UTR) of the nucleic acid sequence encoding the single variant and at the 5’ of the first barcode.
- the lentiviral expression vector comprises an insertion between the two BsaI cloning sites, wherein the insertion comprises a second promoter, a second barcode or a unique clonal identifier, a Ribozyme, and a constant region, and wherein the constant region is at the 5’ of the second promoter, optionally wherein the second promoter is a U6 promoter.
- the lentiviral expression vector further comprises a 10X feature capture sequence 1 at 3’ downstream of the first barcode.
- the constant region does not form primer dimers with Nextera Rd1 (5’-GCAGCGTCAGATGTGTATAAGAGACAG-3’ (SEQ ID NO: 7)), TruSeq Rd1 (5’-CTACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 8)), and TSO (5’-AAGCAGTGGTATCAACGCAGAG-3’ (SEQ ID NO: 9)).
- the constant region has a sequence of 5’-AGAACCTTGCGGGTAAATC-3’ (SEQ ID NO: 5).
- the lentiviral expression vector comprises, from 5’ to 3’, a first promoter, a selection marker, a P2A domain, a nucleic acid encoding a single variant of a target protein, a second promoter, a constant region, a second barcode or a unique clonal identifier, a first barcode, and a 10X capture sequence 1, optionally wherein the second promoter is a U6 promoter.
- a vector can be used in any of the methods and systems described herein.
- FIGS. 1A-1G show the experimental overview.
- FIG. 1A shows that STING is a highly pleiotropic protein responsible for multiple functions such as induction of Interferon, NF-κB and ER stress. Some of these functions have been associated with specific protein domains while others are still unknown.
- FIG. 1B shows that a deep saturation library was synthesized with all possible variants of each amino acid position. Each of these variants was directly associated with a specific DNA barcode, a unique 16-nt DNA sequence that indicated which amino acid was mutated, and a random DNA barcode which acted as a unique molecular identifier. The combination of the two barcodes was unique to each transduced cell.
- FIG. 1C shows that the mutagenesis variants were cloned into a lentivirus expression vector for stable integration into a cell line of interest, capable of variable expression levels.
- FIG. 1D shows the library coverage of synthesized variants across the 3D structure of the STING homodimer.
- FIG. 1E shows that the Barcoded Variant library was injected into mice to quantify in vivo phenotypic effects. Cells were sampled from the mice at various stages of disease progression to quantify both the frequency (and therefore fitness) as well as the transcriptional profile of each mutant.
- FIG. 1F shows that the variant library was treated with immunofluorescence and FACS sorted to quantify variants that affected the abundance of specific proteins.
- FIG. 1G shows that scRNAseq was used to map each barcoded variant to a transcriptional response in a cellular and in vivo context.
- FIG. 2 shows pSatSeq vectors for mutagenesis and expression of a gene of interest.
- FIG. 2 shows schematics of the pMUT and pSatSeq vectors.
- pMUT acts as a template for variant library synthesis or mutagenesis. It is a minimal vector containing the coding sequence of a protein of interest, a DNA barcode, and flanking cloning sequences. These cloning sequences allow the coding sequence and barcode to be inserted into pSatSeq, the lentiviral expression vector.
- pSatSeq stably integrates into the host genome to drive expression of the gene of interest and can be customized for individual experiments as needed, without having to re-generate the mutagenesis library from scratch.
- FIG. 3 shows that the synthesized saturation mutagenesis library has uniform coverage of all possible single amino acid variants. For 48 positions, every possible single amino acid substitution was synthesized. Sequencing of the DNA synthesis shows relatively uniform coverage of all possible variants that were successfully synthesized.
- FIGS. 4A-4B show single cell phenotyping of drug effects across the STING variant library.
- FIG. 4A shows UMAP projection capturing single cell transcriptional responses colored by each STING variant; directed arrows show average response of variant before and after treatment with agonist DMXAA.
- FIG. 4B shows distribution of pairwise Euclidean distance between each variant -/+ DMXAA (across all variable genes).
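A minimal sketch of the per-variant distance described for FIG. 4B is given below. It assumes that each variant is summarized by its mean expression profile under DMSO and under DMXAA and that the Euclidean distance is taken between those two means; the source does not state whether means or individual cell pairs were used. The expression values are invented toy numbers, not data from the disclosure.

```python
import math
from collections import defaultdict


def mean_profile(rows):
    """Average a list of equal-length expression vectors gene by gene."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def treatment_distance(cells, control="DMSO", treated="DMXAA"):
    """cells: iterable of (variant, treatment, expression_vector).
    Returns {variant: Euclidean distance between mean profiles}."""
    grouped = defaultdict(list)
    for variant, treatment, expression in cells:
        grouped[(variant, treatment)].append(expression)
    distances = {}
    for variant in sorted({v for v, _ in grouped}):
        if (variant, control) in grouped and (variant, treated) in grouped:
            distances[variant] = math.dist(
                mean_profile(grouped[(variant, control)]),
                mean_profile(grouped[(variant, treated)]),
            )
    return distances


# Toy cells with two "genes" each (assumed values for illustration).
toy_cells = [
    ("REF", "DMSO", [0.0, 0.0]),
    ("REF", "DMSO", [0.0, 2.0]),
    ("REF", "DMXAA", [3.0, 5.0]),
    ("STOP", "DMSO", [1.0, 1.0]),
    ("STOP", "DMXAA", [1.0, 1.0]),
]
distances = treatment_distance(toy_cells)
print(distances)
```

In practice the expression vectors would be restricted to the variable genes, and the variant label for each cell would come from its captured barcode.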
- the I229A variant is known to have a strong decrease in affinity for DMXAA, validating that drug effects can be captured with scRNA-seq readouts.
- FIGS. 5A-5D show the fitness effects of the barcoded STING mutagenesis library in vitro and in vivo.
- FIG. 5A shows the change of relative abundance of each STING variant in each condition. Light shade labelled variants (see R76) showed an increase in relative abundance, while darker shade (N153) labelled variants showed a decrease in relative abundance.
- data represents three independent biological replicates.
- data represents four independent mice per time point.
- FIG. 5B shows the fitness of each variant of STING along the 2D protein sequence.
- FIG. 5C shows fitness effects of STING variants mapped to the 3D structure of Mus musculus STING.
- FIG. 5D shows fitness effects measured in vitro are significantly more correlated with each other than with fitness effects observed in vivo.
- FIG. 6 shows prediction of structural changes from single amino acid substitutions using AlphaFold2. All single amino acid positions were mutated to serine and their structures were simulated using AlphaFold2. A heatmap is shown with the log(adjusted distance in Å), which is weighted by the confidence of the simulation. Variants are grouped into one of three main clusters, either no significant change in structure, a bend between the transmembrane (TM) and cytosolic domains, or a twist in the cGAMP binding cytosolic domain.
- TM: transmembrane.
- the known structural changes that occur during activation from cyclic di-nucleotides include a twist in the cGAMP binding cytosolic domain, a bend between the transmembrane and cytosolic domains, and the formation of a beta sheet above the cGAMP binding pocket.
- FIG. 7 shows distribution of position barcodes.
- FIG. 8 shows 4 different phenotypic effects of individual mutations mapped to the 3D protein structure.
- FIG. 9 shows the in silico structural prediction of an alanine scan of STING, where each position has individually been mutated to an alanine. The root mean square distance between the residues of these structures, and all other mutant STING structures in the alanine scan is then calculated and displayed as a clustergram.
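The pairwise comparison behind such a clustergram can be sketched as follows. This is an illustrative calculation only: it assumes one coordinate per residue (for example, the Cα atom) and that all predicted structures are already superimposed in a common frame, neither of which is specified in the source. The coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between matched residue coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def pairwise_rmsd(structures):
    """structures: {name: [(x, y, z), ...]}. Returns (names, square matrix)."""
    names = sorted(structures)
    matrix = [
        [rmsd(structures[a], structures[b]) for b in names] for a in names
    ]
    return names, matrix


# Toy two-residue "structures" (assumed values for illustration).
toy_structures = {
    "mutant_1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "mutant_2": [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    "mutant_3": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
}
names, matrix = pairwise_rmsd(toy_structures)
```

The resulting symmetric matrix is the kind of input a hierarchical clustering routine would take to draw the dendrogram and heatmap.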
- FIG. 10 shows a schematic diagram of single cell RNA sequencing.
- FIGS. 11A-11B show optimization of DNA Barcode Capture.
- FIG. 12 shows barcode capture efficiency was increased (from ⁇ 10%) to nearly 70%.
- FIGS. 13A-13C show a vector map for P SatSeq2 TWIST mSTINGU6 vBC tBC.
- FIGS. 14A-14B show a vector map for pSatSeq mSTING vBC SSBC0001.
- FIGS. 15A-15D show a machine learning processing pipeline according to embodiments of the present disclosure.
- FIG. 16 depicts a computing node according to an embodiment of the present disclosure.
- FIG. 17 shows the distribution of barcodes after full library synthesis and cloning into pSatSeq expression vector. 66 million unique clones, ~8,500 clones/variant, and a median 160k clones/position were detected.
- FIG. 18 shows the distribution of barcodes after library transduction into EO771.1mb cell line. 7,348,827 unique clones, ~1,000 clones/variant, and median 16,630 clones/position were detected.
- FIG. 20 shows in vivo metastatic seeding fitness from metastatic lung tumors 21 days after tail-vein injection of EO771.1mb SatSeq mSTING variant library in C57BL/6 mice. Data consisted of 10 replicates. Relative abundance was quantified in cells before injection and in lungs after 21 days of growth. Each dot represented the change in relative abundance in an individual mouse for a given position. Specific variants were highlighted on each extreme. These included the RRDD control, which also showed a decrease in lung surface metastasis (bottom left) when injected individually. On the other end, K337 showed an increase in TGFβ pathway and a loss of Interferon response when mutated, providing a mechanistic explanation for why this variant may be more fit in the metastatic niche.
- FIG. 21 shows single cell transcriptional data from ~80,000 EO771.1mb SatSeq mSTING variant library treated with either DMXAA or DMSO.
- CellAssign was modified to call specific transcriptional pathways instead of cell-types.
- Cells treated with DMSO were mostly unassigned (light shade), ER_stress (dots), or Autophagy (dots).
- Cells treated with DMXAA were overwhelmingly assigned to interferon (see the dot mass on top), demonstrating a strong interferon response when induced with a STING agonist (DMXAA).
- FIG. 22 shows Euclidean distance in gene space for each variant in EO771.1mb SatSeq mSTING variant library treated with either DMXAA or DMSO (FIG. 22A). Specific variants were highlighted. The STOP negative control, which did not produce any STING protein, had one of the smallest responses to a STING agonist, as expected.
- the reference mSTING sequence (REF) had a robust response to DMXAA. There were a number of variants that were unresponsive to DMXAA such as A155 and I229A. I229A is a mutation known to attenuate DMXAA response. There were also variants that were more responsive to DMXAA than the REF strain such as S128 and S161A.
- S161A is a known mutation that restores sensitivity to DMXAA in human STING.
- the other hypoactivating and hyperactivating mutants such as A155 and S128 were confirmed with qPCR (FIG. 22B). These variants are used to stratify patients for clinical trials or treatments, and non-responders are expected to emerge as resistant phenotypes after drug treatment.
- FIG. 23 shows transcriptional hallmark analysis of significant pathways from single cell RNA-sequencing. Across known functions of STING, most amino acid positions only significantly affected one pathway.
- FIG. 24 shows structural features of STING mapped to phenotypic effects.
- the center dendrograms represent the protein structures represented in FIG. 6 as predicted by AlphaFold2.
- the innermost circle represents Ifnb1 levels in DMSO treated cells.
- the middle circle represents Ifnb1 levels in cells treated with DMXAA and shows a number of variants which were no longer able to produce an interferon response.
- the outer layer indicates known protein domains with transmembrane (see S117A and others of the same shade), dimerization (see Y163A and others of the same shade), ligand binding domain (see C256A and others of the same shade), and C-terminal Tail (see T347A and others of the same shade). Residues that are conserved in humans are outlined in a black box.
- FIG. 25 shows a schematic of the SatSeq expression system.
- FIG. 26 shows the use of long read sequencing to associate the transgene encoding the target protein with the combined DNA barcodes (which are, in total, 36 nucleotides in length) for each individual clone within the population, and to quantify rates of recombination during viral vector integration.
- binding refers to an association, which may be a stable association, between two molecules, e.g., between a probe and a target nucleic acid molecule, or between a primer and a nucleic acid template, e.g., due to, for example, electrostatic, hydrophobic, ionic and/or hydrogen-bond interactions.
- the two or more nucleic acid sequences that correspond to each other all have the same nucleic acid sequences.
- the two or more nucleic acid sequences that correspond to each other may have different nucleic acid sequences. For example, 5’-TTT-3’ and 5’-UUU-3’ have different nucleic acid sequences but correspond to each other because they are both complementary to 5’- AAA-3’.
- drug response refers to all types of phenotypical and/or functional readouts for a cell in response to a drug treatment, including but not limited to transcriptional profile of the cell, cell fitness, or expression level of a marker, etc.
- drug response used herein includes both on-target responses and off-target responses.
- polynucleotide and “nucleic acid” are used herein interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown.
- non-limiting examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, synthetic polynucleotides, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
- a polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs.
- modifications to the nucleotide structure may be imparted before or after assembly of the polymer.
- the sequence of nucleotides may be interrupted by non-nucleotide components.
- a polynucleotide may be further modified, such as by conjugation with a labeling component.
- the present disclosure is based, at least in part, on the development of SatSeq technology.
- SatSeq is a modular breakthrough technology that couples the power of single cell sequencing with saturation mutagenesis and DNA barcoding to assay protein functions and on-target drug effects in a cellular context.
- every amino acid of the target protein is mutated one-by-one, such that the effect of individual residues can be examined through loss or gain-of-function.
- These mutations are associated with specific and unique DNA barcodes. This allows all variants to be pooled and assessed as an ensemble, greatly increasing the quality and quantity of information extracted in downstream assays.
- These assays can include e.g., high-throughput small compound screening or single cell transcriptomics, which can be screened both in vivo and in vitro.
- metastatic progression can be specifically blocked by targeting residues of STING contributing to the pro-tumorigenic activation of ER stress and NF-κB pathways, but not its anti-tumor IFN functions. This would maintain the anti-tumor immune functions of the target protein, greatly diminishing unwanted effects of drug treatment as well as potentially enhancing therapeutic efficacy.
- compounds without any cytotoxic or cytostatic effects can be disregarded as usual.
- Compounds that affect all protein variants and act through non-specific mechanisms can also be disregarded.
- any compound that affects some, but not all variants must act through specific protein mechanisms. Moreover, the 3D location of the variants may indicate the direct binding site of the small compound without further molecular or structural experiments. This provides additional, clinically actionable information regarding which variants may respond to or develop resistance to the compound.
- Variants that mediate resistance to compounds are often called “gatekeeper” mutations, which are typically identified in late stage clinical trials when drug resistance develops.
- gatekeeper mutations of such lethal transitions can be identified before clinical trials even begin. This allows prediction of both individual patient outcomes before treatment (based on their specific target protein sequence) and future mutations that may evolve under drug selection to facilitate therapy resistance. This information is critical, as it allows for the design of evolutionary traps. In addition to non-responsive “gatekeeper” mutations, some variants that are hyper-responsive to certain drugs can also be found. This information allows drug combinations for a target to be paired synergistically, i.e., “gatekeepers” for one drug may be hyper-responsive to another drug. Evolutionary pressures to become drug resistant will then lead to dead ends.
- compound identification can be further valorized with single cell transcriptomics.
- By coupling the DNA barcodes with single cell transcriptional readouts, global phenotypic effects across the entire mutagenesis library can be measured.
- the transcriptional response associated with each compound and/or gatekeeper mutants allows for further information regarding the targeted biochemical pathway and the cellular responses it may elicit. This allows for a high-dimensional interrogation of the genome-wide phenotypic responses to a compound and their evolving physiologic roles in vivo.
- the vector system allows for rapid association of each variant within the saturation mutagenesis library with a DNA barcode that is directly compatible with existing commercial single cell transcriptomics solutions. Since all of the expression vectors in the SatSeq system are modular, the same mutagenesis library can be swapped into the different expression vectors for specific cell lines or experimental needs. Furthermore, the DNA barcodes allow all possible single mutation variants of a protein to be tracked in an ensemble. This ensemble can then be used in both traditionally bulk and single-cell assays. The DNA barcodes allow the effects of each condition/treatment on each protein variant to be parsed in a single, highly multiplexed experiment.
- SatSeq contains a modular 2 vector system: a mutagenesis vector and an expression vector.
- both vectors have been synthesized to be optimal for high efficiency cloning using BsaI or BsmBI restriction enzymes, which has not previously been reported for lentivector systems.
- the mutagenesis vector contains a BsaI cloning site for the insertion of a random nucleotide DNA barcode in addition to the DNA barcode generated during DNA synthesis. This encodes information regarding both the position of the mutation, as well as the amino acid substitution, in contrast to just the positional barcodes offered by some commercial sources.
- this two-vector system allows a single mutagenesis library to be generated on the mutagenesis vector and inserted into multiple different versions of the expression system if specific features need to be added, such as fusion proteins, protein tags, or alternative promoters. This significantly reduces the costs and time required to optimize the protein variant library for each application.
- each phenotypic effect can be directly mapped to the protein of interest’s 3D structure.
- the protein variant library reveals which mutants no longer bind to the drug; information regarding mutations that provide drug resistance, and the effect that all mutations of unknown significance have on the activity of the drug, can also be immediately obtained.
- drugs that specifically target clinically significant variants of proteins, such as mutant KRAS, can be screened for. This makes it possible to interrupt processes which are essential for disease progression while maintaining native functions in healthy tissues. Because all possible variants are screened in an ensemble, it is possible to combine all of these traditionally separate experiments, which would typically take several years or decades, into a single experiment.
- the first step of the methods provided herein is to introduce the coding DNA sequence (CDS) of a targeted protein into the mutagenesis vector (pMUT). This may be done through either direct synthesis of pMUT with the CDS, or inserting the CDS into an existing pMUT backbone with either Restriction Digestion/Ligation, Gibson Cloning, or another method.
- the resulting pMUT: CDS vector becomes the template for saturation mutagenesis. Saturation mutagenesis may be performed from previous established or new protocols.
- the pMUT vector contains a BsaI cloning site to insert DNA barcodes.
- a defined 16-mer is used for each amino acid position in the CDS (with defined features such as Levenshtein distance > 3, 40-60% GC, no hairpins, etc.), and a random 16-mer is used for each individual vector to differentiate different mutations at the same amino acid position.
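The barcode constraints listed above (16-mers, Levenshtein distance > 3, 40-60% GC, no hairpins) can be illustrated with a simple rejection-sampling sketch. This is not the design procedure used in the disclosure, which is not described; in particular, `has_hairpin` is a crude assumed heuristic (a 4-nt stem whose reverse complement occurs after a loop of at least 3 nt), and all function names and default parameters are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def levenshtein(a, b):
    """Edit distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def has_hairpin(seq, stem=4, min_loop=3):
    """Assumed heuristic: any stem-length window whose reverse complement
    appears downstream after at least min_loop bases."""
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        arm = seq[i:i + stem]
        if reverse_complement(arm) in seq[i + stem + min_loop:]:
            return True
    return False


def design_barcodes(n, length=16, seed=0, max_tries=100_000):
    """Draw random candidates and keep those passing every filter."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not 0.40 <= gc_fraction(candidate) <= 0.60:
            continue
        if has_hairpin(candidate):
            continue
        if all(levenshtein(candidate, other) > 3 for other in accepted):
            accepted.append(candidate)
    return accepted


position_barcodes = design_barcodes(10)
```

The all-against-all distance check is quadratic in the number of accepted barcodes, which is acceptable for one barcode per amino acid position but would need indexing for much larger sets.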
- the pMUT: CDS vector also has 2 flanking BsmBI restriction sites. This allows the variant pMUT: CDS library to be inserted into the expression vector (pSATSEQ) through a high efficiency golden gate cloning strategy.
- the pSATSEQ vector has a BsmBI cloning site that allows the CDS from pMUT to be inserted in-frame regardless of the CDS used.
- the pSatSeq vector is also compatible with BsaI restriction enzymes, unlike traditional lentivectors. This allows for the insertion of barcodes into either pMUT or pSATSEQ, as required.
- pSATSEQ is a lentivector, which allows for stable transduction of the CDS sequence into a broad range of cell types.
- the library of pSATSEQ: CDS vectors can be used as an ensemble to screen small compounds in traditional microplate formats.
- the advantage of screening the saturation mutagenesis ensemble instead of a wild-type cell line is that it provides vastly more information from a single screen. Small compounds that are cytotoxic or cytostatic to the cell line of interest can be identified and, in addition, whether the cytotoxic/cytostatic effects are specific to a targeted protein of interest and where those compounds bind to the targeted protein can be directly inferred. This is because any small compounds that bind specifically to a targeted protein should have their activity abolished by mutations in the binding site of the protein.
- if the small compound is equally effective against all variants in the ensemble, its effects are non-specific. If the small compound preferentially inhibits some variants more than others, the specific inhibition of each variant can be mapped directly to the 3D structure of the protein. The specific inhibition is quantified by assessing the distribution of each DNA barcode (each of which is uniquely associated with a protein variant) before and after drug treatment. This change in relative frequency allows a fitness to be assigned to each variant in the ensemble. This method of assigning a fitness to each variant is broadly generalizable as well. While microplates are likely the most straightforward starting point, this can also be applied in vivo or with other selective pressures and/or stimuli.
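One common way to turn the change in relative barcode frequency into a per-variant score is a log2 ratio of relative abundances, sketched below. The log2 form and the pseudocount are assumptions made for illustration; the source only states that fitness is assigned from the change in relative frequency. The counts are toy values.

```python
import math


def relative_abundance(counts, pseudocount=0.5):
    """Convert raw barcode counts to frequencies, with a pseudocount so
    that barcodes with zero reads do not produce log(0)."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {bc: (n + pseudocount) / total for bc, n in counts.items()}


def barcode_fitness(before, after, pseudocount=0.5):
    """log2(frequency after / frequency before) for every barcode seen
    in either sample."""
    barcodes = sorted(set(before) | set(after))
    before_full = {bc: before.get(bc, 0) for bc in barcodes}
    after_full = {bc: after.get(bc, 0) for bc in barcodes}
    freq_before = relative_abundance(before_full, pseudocount)
    freq_after = relative_abundance(after_full, pseudocount)
    return {bc: math.log2(freq_after[bc] / freq_before[bc]) for bc in barcodes}


# Toy read counts before and after treatment (assumed values).
before = {"variant_1": 100, "variant_2": 100}
after = {"variant_1": 300, "variant_2": 100}
scores = barcode_fitness(before, after, pseudocount=0.0)
print(scores)
```

A positive score means the variant became relatively more abundant under the treatment, a negative score that it was depleted; because frequencies are relative, one variant's gain lowers the scores of the others.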
- the variant library can be further valorized by determining the mode of action of any small compounds identified above. Since all variants are barcoded in the 3’ UTR of the variant transcript, they can easily be detected in single cell transcriptomics.
- a 10X feature capture sequence 1 is incorporated directly downstream of the variant barcode to make the system directly compatible with current commercial platforms such as 10X Genomics. This allows specific transcriptional responses to be associated with each protein variant using both in vitro and in vivo systems. By mapping which variants are affected by small molecules to these transcriptional responses, the downstream pathways that are impacted by candidate small molecules in a cellular context can be directly observed.
- the target gene of interest displays dominant mutations
- if the target gene is essential, the order of operations can be reversed: it can first be complemented and then the native gene can be deleted. If the gene is deleted through CRISPR-Cas9 or similar methods, the target sequence (or PAM) should be altered in the complemented CDS through alternative codons, without changing the amino acid sequence. This prevents the nuclease from targeting the complemented CDS during genome editing.
- the pSATSEQ expression vector can be used to fuse a variety of protein tags onto the CDS sequences of the mutagenesis library. These include protein purification tags such as FLAG or HA, but also biotin tags. The incorporation of a biotin tag potentially allows much larger small compound libraries to be screened through direct-binding interactions.
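The synonymous recoding mentioned above can be sketched as follows: codons in a chosen window of the complemented CDS are swapped for alternative codons encoding the same amino acid, and the translation is checked to be unchanged. This is a generic illustration using the standard genetic code; the CDS, the window, and the rule of picking the most different synonymous codon are all assumptions, and the sketch does not check whether a particular guide RNA or PAM is actually disrupted.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence ('*' marks stop codons)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode_window(cds, first_codon, last_codon):
    """Replace each codon in [first_codon, last_codon] (0-based, inclusive)
    with the synonymous codon that differs at the most positions.
    Codons without synonyms (Met, Trp) are left unchanged."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for idx in range(first_codon, last_codon + 1):
        original = codons[idx]
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[original] and c != original
        )
        if synonyms:
            codons[idx] = max(
                synonyms,
                key=lambda c: sum(x != y for x, y in zip(c, original)),
            )
    recoded = "".join(codons)
    if translate(recoded) != translate(cds):
        raise AssertionError("recoding changed the protein sequence")
    return recoded


# Hypothetical CDS (M A R G K stop); window covers codons 1 through 3.
toy_cds = "ATGGCTCGTGGAAAGTAA"
recoded_cds = recode_window(toy_cds, 1, 3)
```

A real design would place the window over the protospacer and PAM of the guide used to delete the native gene and would also consider codon usage in the host.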
- DNA-labeled small compound libraries can be screened by mixing the barcoded small-compound library with the biotinylated variant protein library. This mixture can then be added to streptavidin coated magnetic beads to allow for purification of all small compounds that bind to any possible variant of the targeted protein. In some embodiments, this is further expanded with Ribosome Display to maintain association of each protein variant with their own DNA barcode.
- the variants are identified by sequencing the barcodes.
- a third generation sequencing technology is used to identify variants.
- the variant is identified using long-read sequencing.
- the long-read sequencing can be any sequencing technology that offers >5 million reads sequencing depth and >1.5kb read length.
- the long-read sequencing is PacBio Revio sequencing.
- deep learning protein 3D structure prediction algorithms such as AlphaFold2, RoseTTAFold, or similar programs are used to perform in silico structural predictions. It has been found that structural changes with single amino acid variations can be predicted. The predicted structural changes correspond with known structural changes that occur during protein activation. Similar to phenotypic metrics, the amino acid positions which result in structural changes do not directly correspond to the known 2D or 3D protein structure. As a result, the feature outputs from the deep learning algorithms can be used to train a classifier. This classifier can be used to map phenotypic changes to intermediate structural changes such as the pair representation, pairwise distances, or other features of the 3D structure prediction.
- because the approach described herein uniformly perturbs every position in the protein of interest with the same frequency, and all positions are represented in each experiment, it greatly reduces the biases in the data. This would allow prediction of which residues will have the greatest impact on protein function, so that a targeted panel of residues can be selected for mutagenesis, which may substantially reduce the costs.
- the methods described herein are used for identification and characterization of small molecules that bind to and interact with specific protein variants.
- the approach described herein can be used to find small molecules that inhibit critical oncogenes that have previously been found difficult to drug such as RAS proteins. However, this approach is generalizable to any disease or condition with a well-defined protein target.
- this approach can also be used to determine the precise binding site of monoclonal antibodies to the targeted protein.
- the non-responsive or hyper-responsive variants of a target protein for a drug identified using the methods described herein can be used for patient stratification.
- the methods further comprise treating a patient with the drug if the patient has a hyper-responsive variant of a target protein of the drug.
- the methods further comprise not treating a patient with a drug if the patient has a non-responsive variant of a target protein of the drug.
- the methods further comprise treating a patient with a different drug if the patient contains a non-responsive variant of a target protein of the drug.
- the non-responsive variants of a target protein for a drug identified using the methods described herein can be used for predicting drug resistance (in patients identified as having or expressing a non-responsive target protein variant).
- the SatSeq method can be used to map protein function directly to its 3D structure.
- This data-rich approach is ideally suited to feed into the new wave of machine-learning approaches which attempt to predict protein function from sequence and structural data.
- Unfortunately, such approaches are only as strong as the data which they are based on, and current mutagenesis data for most proteins are both incomplete and biased towards in vitro activities. This is in sharp contrast to the data generated by SatSeq, which exhibits uniform coverage of all possible mutations and has associated functions based on in vivo biological activities.
- FIGS. 15A-15D show incorporating SatSeq with Machine Learning to predict mutational effects.
- Data quality is a key factor for modern machine learning algorithms.
- SatSeq generates a high-dimensional, uniform, and unbiased data set that can be used to train existing and future algorithms for both supervised and unsupervised learning. This can be used to predict how amino acid substitutions will affect protein structure and function.
- In silico protein structure can be predicted using algorithms such as AlphaFold2 (FIG. 15A) or ESM-fold.
- These programs are able to predict changes in STING protein structure from single amino acid changes; the predicted changes match structural changes known to occur when STING is activated by a ligand (although the predictions have not been experimentally validated).
- These programs contain intermediate representations of the protein structure such as histograms and contact maps. These intermediate representations can be combined with the high-dimensional data output from SatSeq (FIG. 15B) to train a classifier to predict phenotypic effects directly from the protein structure or changes in protein structure. Furthermore, the phenotypic data from SatSeq can be used to rank features extracted from in silico structural predictions based on their ability to predict function. Two approaches can be used. The first is a large language model that uses variant embedding as a latent representation of protein structure (FIG. 15C).
- The second is a graph convolutional network (FIG. 15D), which leverages convolutional neural networks to train a classifier of protein function based on individual mutations using the SatSeq dataset (Gligorijevic et al. (2021) Nature Communications 12:3168).
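- As a rough illustration of the graph-based approach, the sketch below applies a single graph-convolution step to a residue contact map. The contact map, per-residue features, weights, and function name are hypothetical placeholders rather than values taken from SatSeq or from any structure predictor, and the normalization used (averaging over each residue and its contacts) is only one common choice.

```python
def graph_conv_layer(contacts, features, weights):
    """One graph-convolution step: ReLU(D^-1 (A + I) X W).

    contacts: n x n 0/1 contact map (hypothetical, e.g. thresholded distances)
    features: n x f per-residue features (hypothetical)
    weights:  f x h weight matrix (would be learned during training)
    """
    n = len(contacts)
    # Add self-loops so each residue keeps its own features.
    adj = [[contacts[i][j] + (1 if i == j else 0) for j in range(n)]
           for i in range(n)]
    out = []
    for i in range(n):
        degree = sum(adj[i])
        # Average the features of residue i and of its contacts.
        pooled = [
            sum(adj[i][j] * features[j][k] for j in range(n)) / degree
            for k in range(len(features[0]))
        ]
        # Linear map followed by ReLU.
        row = [
            max(0.0, sum(pooled[k] * weights[k][h] for k in range(len(weights))))
            for h in range(len(weights[0]))
        ]
        out.append(row)
    return out


# Toy 3-residue chain: residue 1 contacts residues 0 and 2.
contacts = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0], [0.5]]
hidden = graph_conv_layer(contacts, features, weights)
```

- In a full model, several such layers would typically be stacked and the per-residue outputs pooled into a single vector before classification; those steps are omitted here.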
- SatSeq data are uniform, high-quality, and unbiased, making them ideal for the existing machine learning approaches discussed above as well as for newly emerging techniques.
- A feature vector as described herein is provided to a learning system. Based on the input features, the learning system generates one or more outputs.
- In some embodiments, the learning system comprises a linear classifier, support vector machine (SVM), or random decision forest. In other embodiments, the learning system comprises an artificial neural network. In some embodiments, the learning system is pre-trained using training data such as the derived datasets described herein.
- Suitable artificial neural networks include but are not limited to a feedforward neural network, a radial basis function network, a self-organizing map, learning vector quantization, a recurrent neural network, a Hopfield network, a Boltzmann machine, an echo state network, long short term memory, a bi-directional recurrent neural network, a hierarchical recurrent neural network, a stochastic neural network, a modular neural network, an associative neural network, a deep neural network, a deep belief network, a convolutional neural network, a convolutional deep belief network, a large memory storage and retrieval neural network, a deep Boltzmann machine, a deep stacking network, a tensor deep stacking network, a spike and slab restricted Boltzmann machine, a compound hierarchical-deep model, a deep coding network, a multilayer kernel machine, or a deep Q-network.
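- To make the feature-vector and learning-system language concrete, the following minimal sketch one-hot encodes a single amino acid variant and passes it to a linear classifier. The encoding, the function names, and the weights are assumptions made for illustration; the text above does not prescribe a particular feature representation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_variant(position, new_residue, protein_length):
    """One-hot feature vector of length protein_length * 20 (1-based position)."""
    vec = [0] * (protein_length * len(AMINO_ACIDS))
    vec[(position - 1) * len(AMINO_ACIDS) + AMINO_ACIDS.index(new_residue)] = 1
    return vec


def linear_classify(vec, weights, bias=0.0):
    """Linear classifier: returns 1 if w.x + b > 0, otherwise 0."""
    score = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1 if score > 0 else 0


# Hypothetical example: substitution to cysteine at position 2 of a 3-residue protein.
x = encode_variant(position=2, new_residue="C", protein_length=3)
```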
- the methods provided herein comprise generating a library of expression vectors encoding all possible amino acid variants of a target protein.
- the target protein can be any protein of interest.
- the target protein is an intracellular protein.
- the target protein is a transmembrane protein.
- the target protein is an enzyme.
- the target protein is a transcription factor or regulator.
- the target protein is a secreted protein (e.g., a ligand), and the target protein is linked to a domain that can tether the target protein to the cell that secretes the target protein.
- the target protein is a target of a drug (e.g., a small molecule compound, a recombinant peptide or protein drug, a nucleic acid-based drug, or an antibody).
- the target protein is a STING protein (e.g., a mouse STING protein or a human STING protein). In some embodiments, the target protein is a KRAS protein.
- the target protein can be derived from any species or organisms. In some embodiments, the target protein is a mammalian protein. In some embodiments, the target protein is a rodent (e.g., rat or mouse) protein. In some embodiments, the target protein is a primate protein. In some embodiments, the target protein is a human protein.
- the methods provided herein comprise generating and delivering a library of expression vectors to a plurality of cells. In some embodiments, the methods provided herein comprise generating and delivering more than one such library of expression vectors to a plurality of cells. For example, in some embodiments, the methods provided herein comprise generating and delivering two libraries of expression vectors to a plurality of cells, wherein one library of expression vectors encodes all possible amino acid variants (e.g., all possible single variants) of a first target protein, and the other library of expression vectors encodes all possible amino acid variants (e.g., all possible single variants) of a second target protein. In some embodiments, the two target proteins encoded by the two libraries of expression vectors are within the same biological pathway. In some preferred embodiments, the two target proteins encoded by the two libraries of expression vectors interact with each other.
- the amino acid variants of the target protein may include insertion, deletion, and/or substitution of amino acid(s) of the target protein.
- the amino acid variants are single amino acid variants.
- The term “single amino acid variant” as used herein refers to a substitution at a single amino acid position of the target protein.
- the wild-type residue can be substituted by any other residue at the single amino acid position of the target protein.
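- The size of such a library follows directly from this definition: each position can carry any of the 19 non-wild-type residues. A minimal sketch of the enumeration is given below, using a hypothetical three-residue sequence; it covers substitutions only, not the insertions or deletions mentioned above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_variants(wild_type):
    """Yield (position, wild-type residue, new residue) for every single substitution."""
    for pos, wt in enumerate(wild_type, start=1):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield pos, wt, aa


toy_protein = "MKT"  # hypothetical sequence, for illustration only
variants = list(single_variants(toy_protein))
```

- For a protein of length N, this yields 19 × N single amino acid variants.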
- the expression vectors encoding all possible amino acid variants of a target protein can be any type of expression vector.
- the terms “vector”, “cloning vector” and “expression vector” mean the vehicle by which a DNA or RNA sequence (e.g., a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g., transcription and translation) of the introduced sequence.
- a further subject matter encompassed by the present invention relates to a vector (e.g., expression vector or cloning vector) comprising a nucleic acid (e.g., nucleic acid sequences encoding all possible amino acid variants of a target protein) encompassed by the present invention.
- Such vectors may comprise regulatory elements, such as a promoter, enhancer, terminator and the like, to cause or direct expression of said polypeptide upon administration to a subject.
- Examples of promoters and enhancers used in expression vectors for animal cells include the early promoter and enhancer of SV40, the LTR promoter and enhancer of Moloney mouse leukemia virus, the promoter and enhancer of immunoglobulin H chain, and the like.
- Any expression vector or plasmid suitable for expression in an animal cell can be used.
- Examples of viral vectors include adenoviral, retroviral, lentiviral, herpes virus and AAV vectors.
- Such recombinant viruses may be produced by techniques known in the art, such as by transfecting packaging cells or by transient transfection with helper plasmids or viruses. Detailed protocols for producing such replication-defective recombinant viruses are well-known in the art and may be found, for instance, in PCT Publ. WO 95/14785, PCT Publ. WO 96/22378, U.S. Pat. No. 5,882,877, U.S. Pat. No. 6,013,516, U.S. Pat. No. 4,861,719, U.S. Pat. No. 5,278,056, and PCT Publ. WO 94/19478.
- the expression vector is a lentivirus expression vector, an adeno-associated viral (AAV) vector, or a piggyBac system.
- the expression vector is a viral vector.
- the expression vector is a lentivirus expression vector.
- the expression vector is an AAV vector.
- the lentivirus expression vector comprises a mutated 5’ LTR and/or a mutated 3’ LTR.
- the lentivirus expression vector comprises a 5’ LTR comprising a mutated BsaI site, and/or a 3’ LTR comprising a mutated BsaI site.
- the 5’ LTR comprises the nucleic acid sequence of SEQ ID NO: 2.
- the 3’ LTR comprises the nucleic acid sequence of SEQ ID NO: 4.
- the composition comprises an expression vector comprising an open reading frame encoding a variant (e.g., a single variant) of a target protein described herein.
- the expression vector includes regulatory elements necessary for expression of the open reading frame. Such elements may include, for example, a promoter, an initiation codon, a stop codon, and a polyadenylation signal. In addition, enhancers may be included. These elements may be operably linked to a sequence that encodes the binding protein, polypeptide or fragment thereof.
- the expression vector further comprises one or more elements from a promoter, a selection marker, a P2A domain, and a WPRE element.
- the promoter can be any promoter that drives the expression of a gene. Examples of promoters include, but are not limited to, promoters from Simian Virus 40 (SV40), Mouse Mammary Tumor Virus (MMTV) promoter, Human Immunodeficiency Virus (HIV) such as the HIV Long Terminal Repeat (LTR) promoter, Moloney virus, Cytomegalovirus (CMV) such as the CMV immediate early promoter, Epstein Barr Virus (EBV), Rous Sarcoma Virus (RSV) as well as promoters from human genes such as human actin, human myosin, human hemoglobin, human muscle creatine, and human metallothionein. Examples of suitable polyadenylation signals include but are not limited to SV40 polyadenylation signals and LTR polyadenylation signals.
- enhancers/promoters include, for example, human actin, human myosin, human hemoglobin, human muscle creatine and viral enhancers such as those from CMV, RSV and EBV.
- the selection marker can be any marker (e.g., an antibiotic, a fluorescent reporter, a tag protein, etc.) that can be used to select and/or enrich cells that comprise the expression vector.
- the selection marker is an antibiotic (e.g., Blasticidin).
- the expression vector comprises, from 5’ to 3’, a promoter, a selection marker, a P2A domain, a nucleic acid sequence encoding a single variant, a first barcode, and a WPRE element.
- the expression vector further comprises two cloning sites (e.g., two BsaI cloning sites) at the 3’ untranslated region (UTR) of the nucleic acid sequence encoding the single variant and at the 5’ of the first barcode.
- the expression vector comprises an insertion between the two cloning sites (e.g., two BsaI cloning sites).
- the insertion may comprise a second promoter, a second barcode (or a unique clonal identifier), a ribozyme, and/or a constant region.
- the constant region is at the 5’ of the second promoter.
- the second promoter is a U6 promoter.
- the expression vector further comprises a 10X feature capture sequence 1 at 3’ downstream of the first barcode.
- the constant region does not form primer dimers with Nextera Rd1 (5’-GCAGCGTCAGATGTGTATAAGAGACAG-3’ (SEQ ID NO: 7)), TruSeq Rd1 (5’-CTACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 8)), and TSO (5’-AAGCAGTGGTATCAACGCAGAG-3’ (SEQ ID NO: 9)).
- the constant region has a sequence of 5’-AGAACCTTGCGGGTAAATC-3’ (SEQ ID NO: 5).
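- One simple way to screen a candidate constant region against the listed primers is to check for 3’-end complementarity. The sketch below uses a crude heuristic that is an assumption of this example, not a criterion stated in the text: two oligonucleotides are flagged if the last k nucleotides of either one can base-pair with any stretch of the other. The function names and the default k = 5 are hypothetical, and a real design would normally also consider thermodynamic stability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(a, b, k=5):
    """True if the last k nt of `a` can base-pair with some stretch of `b`."""
    return reverse_complement(a[-k:]) in b


def may_form_dimer(a, b, k=5):
    """Heuristic dimer flag: 3' end of either oligo is complementary to the other."""
    return three_prime_overlap(a, b, k) or three_prime_overlap(b, a, k)


CONSTANT_REGION = "AGAACCTTGCGGGTAAATC"  # SEQ ID NO: 5
PRIMERS = {
    "SEQ ID NO: 7": "GCAGCGTCAGATGTGTATAAGAGACAG",
    "SEQ ID NO: 8": "CTACACGACGCTCTTCCGATCT",
    "SEQ ID NO: 9": "AAGCAGTGGTATCAACGCAGAG",
}
# Flag each primer against the constant region under the heuristic above.
flags = {name: may_form_dimer(CONSTANT_REGION, seq) for name, seq in PRIMERS.items()}
```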
- the expression vector comprises, from 5’ to 3’, a first promoter (e.g., a CMV promoter), a selection marker, a P2A domain, a nucleic acid encoding a single variant of a target protein, a second promoter (e.g., a U6 promoter), a constant region, a second barcode (or a unique clonal identifier), a first barcode, and a 10X capture sequence 1.
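- Given this 5’ to 3’ arrangement, the constant region can serve as an anchor for locating both barcodes in a sequencing read. The following sketch assumes an 8-nt second barcode (clonal identifier) immediately followed by a 16-nt first barcode; the 8-nt length and the function name are hypothetical, and real reads would also require handling of sequencing errors.

```python
CONSTANT_REGION = "AGAACCTTGCGGGTAAATC"  # SEQ ID NO: 5


def extract_barcodes(read, clonal_len=8, positional_len=16):
    """Return (second_barcode, first_barcode), or None if they cannot be located."""
    start = read.find(CONSTANT_REGION)
    if start < 0:
        return None  # constant region not present in this read
    i = start + len(CONSTANT_REGION)
    j = i + clonal_len
    k = j + positional_len
    if k > len(read):
        return None  # read too short to cover both barcodes
    return read[i:j], read[j:k]


# Hypothetical read: flanking bases, constant region, both barcodes, more flank.
example_read = "TT" + CONSTANT_REGION + "ACGTACGT" + "A" * 16 + "GG"
barcodes = extract_barcodes(example_read)
```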
- expression vectors e.g., lentiviral expression vectors described herein, or a library of expression vectors described here.
- a host cell may include any individual cell or cell culture which may receive a vector or the incorporation of nucleic acids and/or proteins, as well as any progeny cells.
- the term also encompasses progeny of the host cell, whether genetically or phenotypically the same or different. Suitable host cells may depend on the vector and may include mammalian cells, animal cells, human cells, simian cells, insect cells, yeast cells, and bacterial cells.
- transformation means the introduction of a “foreign” (i.e., extrinsic or extracellular) gene, DNA or RNA sequence to a host cell, so that the host cell will express the introduced gene or sequence to produce a desired substance, typically a protein or enzyme coded by the introduced gene or sequence.
- a host cell that receives and expresses introduced DNA or RNA has been “transformed.”
- the nucleic acids encompassed by the present invention may be used to produce a recombinant polypeptide encompassed by the present invention in a suitable expression system.
- expression system means a host cell and compatible vector under suitable conditions, e.g., for the expression of a protein coded for by foreign DNA carried by the vector and introduced to the host cell.
- Common expression systems include E. coli host cells and plasmid vectors, insect host cells and Baculovirus vectors, and mammalian host cells and vectors.
- Other examples of host cells include, without limitation, prokaryotic cells (such as bacteria) and eukaryotic cells (such as yeast cells, mammalian cells, insect cells, plant cells, etc.).
- Referring now to FIG. 16, a schematic of an example of a computing node is shown.
- Computing node 10 is only one example of a suitable computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments described herein. Regardless, computing node 10 is capable of being implemented and/or performing any of the functionality set forth hereinabove.
- In computing node 10 there is a computer system/server 12, which is operational with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 12 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
- Computer system/server 12 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system.
- program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
- Computer system/server 12 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote computer system storage media including memory storage devices.
- computer system/server 12 in computing node 10 is shown in the form of a general-purpose computing device.
- the components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.
- Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, Peripheral Component Interconnect (PCI) bus, Peripheral Component Interconnect Express (PCIe), and Advanced Microcontroller Bus Architecture (AMBA).
- Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.
- System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
- Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a "hard drive”).
- A magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media, can be provided.
- memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
- Program/utility 40 having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment.
- Program modules 42 generally carry out the functions and/or methodologies of embodiments as described herein.
- Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18.
- the present disclosure may be embodied as a vector, a library of vectors, a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- Embodiment 1 A method of identifying a non-responsive or a hyper-responsive variant of a target protein for a drug, comprising:
- each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique barcode, and wherein the nucleic acid and the barcode are transcribed as a single transcript;
- Embodiment 2 A method of differentiating non-specific drug effects from specific drug effects for a drug, comprising:
- each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique barcode, and wherein the nucleic acid and the barcode are transcribed as a single transcript;
- Embodiment 3 A method of identifying a drug that binds a variant of a target protein, comprising:
- each vector comprises (i) a nucleic acid encoding a single variant fused to a protein purification tag, and (ii) a unique barcode, and wherein the nucleic acid and the barcode are transcribed as a single transcript;
- Embodiment 4 The method of embodiment 3, wherein the protein purification tag is a FLAG or HA tag.
- Embodiment 5 The method of embodiment 3 or 4, wherein the library of drug candidates is a DNA-encoded chemical library.
- Embodiment 6 The method of any one of embodiments 3-5, wherein the method further comprises identifying a non-responsive or a hyper-responsive variant for the identified drug using the method of embodiment 1.
- Embodiment 7 A method of identifying a pathway specific modulator of a pleiotropic gene, comprising:
- each vector comprises (i) a nucleic acid encoding a single variant and (ii) a unique barcode, and wherein the nucleic acid and the barcode are transcribed as a single transcript;
- Embodiment 8 The method of embodiment 7, wherein the phenotypic assay is a FACS assay for a cellular marker, single-cell RNA sequencing, growth assay, or an in vivo metastatic assay.
- Embodiment 9 The method of embodiment 7 or 8, further comprising identifying a non-responsive or a hyper-responsive variant for the identified pathway specific modulator using the method of embodiment 1.
- Embodiment 10. The method of any one of embodiments 1-9, wherein the step (a) comprises: (1) generating a library of mutagenesis vectors comprising nucleic acid sequences encoding all possible single amino acid variants of the target protein, wherein each vector comprises a nucleic acid encoding a single variant; (2) inserting a unique barcode into each vector; and (3) subcloning the DNA fragment comprising the nucleic acid encoding the single variant and the unique barcode into the expression vector.
- Embodiment 11 The method of embodiment 10, wherein the library of mutagenesis vectors comprising nucleic acid sequences encoding all possible single amino acid variants of the target protein is generated by commercial synthesis of the variant library, error-prone PCR, site-directed mutagenesis, Plasmid one-pot, or pooled oligo synthesis.
- Embodiment 12 The method of any one of embodiments 1-11, wherein the barcode encodes information regarding both the position of the mutation and the amino acid substitution.
- Embodiment 13 The method of any one of embodiments 1-12, wherein the barcode is a unique 16-nt DNA sequence.
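- One way a 16-nt barcode could carry both the position of the mutation and the amino acid substitution is a fixed-width base-4 code. The scheme below (13 nt for the position followed by 3 nt for the residue index) is purely hypothetical and is not the encoding used by SatSeq; practical barcode sets are usually also designed to tolerate sequencing errors, which this sketch ignores.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
BASES = "ACGT"


def to_base4(value, width):
    """Write a non-negative integer as `width` nucleotides (A=0, C=1, G=2, T=3)."""
    digits = []
    for _ in range(width):
        value, rem = divmod(value, 4)
        digits.append(BASES[rem])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(digits))


def from_base4(seq):
    """Inverse of to_base4."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def encode_barcode(position, new_residue):
    """Hypothetical layout: 13 nt for the position, 3 nt for the substituted residue."""
    return to_base4(position, 13) + to_base4(AMINO_ACIDS.index(new_residue), 3)


def decode_barcode(barcode):
    """Recover (position, substituted residue) from a 16-nt barcode."""
    return from_base4(barcode[:13]), AMINO_ACIDS[from_base4(barcode[13:])]


# Hypothetical example: substitution to isoleucine at position 162.
barcode = encode_barcode(162, "I")
```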
- Embodiment 14 The method of any one of embodiments 1-13, wherein the expression vector is a lentivirus expression vector.
- Embodiment 15 The method of any one of embodiments 1-14, wherein the drug response is determined in an in vitro cellular context, or in vivo context, at step (d).
- Embodiment 16 The method of any one of embodiments 1-15, wherein the expression vector further comprises a 10X feature capture sequence 1 downstream of the barcode.
- Embodiment 17 The method of any one of embodiments 1-16, wherein the cell lacks endogenous expression of the target protein.
- Embodiment 18. A system comprising a mutagenesis vector and an expression vector, wherein the mutagenesis vector and the expression vector each comprise a BsaI and/or BsmBI restriction enzyme site for insertion of a nucleic acid sequence encoding a variant of a target protein and/or a DNA barcode.
- Embodiment 19 The system of embodiment 18, wherein the expression vector is a lentivirus expression vector.
- Embodiment 20 The system of embodiment 18 or 19, wherein the expression vector comprises a nucleic acid sequence encoding a fusion protein, a protein tag, and/or a promoter.
- Embodiment 21 The system of any one of embodiments 18-20, wherein the expression vector further comprises a 10X feature capture sequence 1.
- Embodiment 22 The method of embodiment 1 or 2, further comprising: training a machine learning model using the library of expression vectors and the drug response profiles, to perform phenotypic prediction from sequences.
- Embodiment 23 The method of embodiment 22, wherein training the machine learning model comprises inferring structures for the library of expression vectors and providing the inferred structures to the machine learning model.
- Embodiment 24 The method of embodiment 22, further comprising: providing a sequence to the machine learning model and receiving therefrom a phenotypic prediction.
- Embodiment 25 The method of embodiment 22 or 23, wherein the machine learning model comprises an artificial neural network.
- Embodiment 26 The method of embodiment 1, 2, or 15, wherein the drug response profile is a transcriptional profile, cell fitness, or expression level of a marker.
- Embodiment 27 The method of embodiment 26, wherein the drug response profile is a transcriptional profile, and the transcriptional profile is determined by single cell RNA sequencing.
- Embodiment 28 The method or system of any one of embodiments 1-27, wherein the unique barcode comprises two DNA barcodes.
- Embodiment 29 The method or system of any one of embodiments 1-28, wherein the expression vector further comprises a constant region of DNA located at the 5’ upstream of the unique barcode.
- Embodiment 30 The method or system of any one of embodiments 1-29, wherein the expression vector further comprises a U6 promoter and a constant region of DNA, both of which are located at the 5’ upstream of the unique barcode, and wherein the U6 promoter is located at the 5’ upstream of the constant region of DNA.
- Example 1 Pre-clinical validation of drug candidate.
- The STING agonist DMXAA began Phase III clinical trials in 2008 after promising results in the preclinical stage.
- Human clinical trials failed after the compound demonstrated no effect on the human STING protein.
- DMXAA effects were mouse-specific due to a single amino acid difference at position 162.
- SatSeq can be used to validate potential drug candidates before entering clinical trials and identify all possible gate-keeper mutations, without waiting for drug resistance to develop.
- Saturation mutagenesis was performed on human STING protein and cloned using the SatSeq methodology described herein. The overview of the experiment is shown in FIGS. 1A-1G. Specifically, the human STING CDS was synthesized and inserted into the pMUT vector.
- This vector became the basis for site-directed saturation mutagenesis using existing DNA synthesis technologies such as those of Twist Bioscience.
- a known positional barcode was incorporated into the 3’ UTR of the STING CDS.
- the mutant CDS sequences were pooled and inserted into the pSatSeq vector.
- Schematics of the pMUT vector and the pSatSeq vector are shown in FIG. 2.
- a second random DNA barcode was inserted into the vectors immediately adjacent to the positional barcode to identify specific variants and clones.
- each STING variant was associated with a unique DNA barcode that consisted of a random DNA sequence and a known positional sequence.
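- Because each composite barcode pairs a known positional sequence with a random sequence, the number of independent clones recovered for each variant can be tallied from the barcode pairs. The sketch below assumes the pairs have already been extracted from sequencing reads; the function name and the toy barcodes are hypothetical.

```python
from collections import defaultdict


def clones_per_variant(barcode_pairs):
    """Count distinct random barcodes observed with each positional barcode."""
    clones = defaultdict(set)
    for positional, random_bc in barcode_pairs:
        clones[positional].add(random_bc)
    return {positional: len(seen) for positional, seen in clones.items()}


# Toy input: (positional barcode, random barcode) pairs; repeated pairs are one clone.
pairs = [("P1", "AAA"), ("P1", "AAA"), ("P1", "CCC"), ("P2", "GGG")]
counts = clones_per_variant(pairs)
```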
- the library of all possible single amino acid variants of STING was transduced into human cells with the native STING gene deleted.
- the variant library was treated with either DMSO or drug candidates such as DMXAA.
- the treated variant library was then assayed with single-cell RNA transcriptomics such as 10X or similar technologies (FIG. 10).
- the transcriptional profile of each variant was measured and compared between the DMSO vs drug treatment for each variant. Variants which responded significantly less to drug treatment than the wild-type protein sequence were putative gate-keeper variants which could be validated individually.
- variants which were hyper-responsive relative to wild-type sequence were putative potentiating mutations.
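The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: each variant is reduced to a single per-cell response score (for example, an interferon signature score), the wild-type drug response is positive, and a simple fold threshold stands in for the significance test the text refers to. The function name, the `fold` parameter, and the labels are hypothetical and are not part of the SatSeq method.

```python
from statistics import mean


def classify_variants(responses, wild_type="WT", fold=2.0):
    """Label each variant by its drug response relative to wild type.

    responses maps a variant name to (dmso_scores, drug_scores), two lists of
    per-cell response scores. The fold threshold is an illustrative stand-in
    for a proper statistical test.
    """
    def drug_effect(name):
        dmso_scores, drug_scores = responses[name]
        return mean(drug_scores) - mean(dmso_scores)

    wt_effect = drug_effect(wild_type)
    if wt_effect <= 0:
        raise ValueError("this sketch assumes a positive wild-type drug response")

    calls = {}
    for name in responses:
        if name == wild_type:
            continue
        effect = drug_effect(name)
        if effect < wt_effect / fold:
            calls[name] = "putative gatekeeper"
        elif effect > wt_effect * fold:
            calls[name] = "putative potentiating"
        else:
            calls[name] = "wild-type-like"
    return calls
```

In practice the calls would come from a differential-expression model with multiple-testing correction rather than a fixed fold cut-off; the sketch only shows the direction of the comparison.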
- I229(A,G,T) were identified as gatekeeper variants in mouse STING, and G230I was identified as a potentiating mutation in human STING.
- FIGS. 3-9 and 17-24 demonstrate the successful use of the SatSeq technology to identify hypoactivating and hyperactivating variants of STING.
- non-specific drug effects induced across all variants treated with the drug candidates
- drug-effect genes that vary in only some of the variants.
- This makes it possible to demonstrate whether the effects of a drug candidate are on-target or instead act through general mechanisms such as cytotoxic phospholipidosis.
- These off-target effects are currently a significant hurdle, as was shown recently with the failure to repurpose a large number of drugs as potential anti-viral therapies against COVID-19.
- the highly variable genes demonstrate not only that the drug effects are on-target, but also which molecular pathways are on-target and which are off-target, and the relative contribution of each. This greatly increases the value of the drug candidate and the likelihood of successful clinical trials, at a fraction of the cost and without undue risk to patients.
- the KRAS variant library is tagged with a protein purification marker such as FLAG, HA, or similar during cloning into pSatSeq. This allows the purification of all variants of KRAS as an ensemble.
- the variant ensemble is challenged with DNA-encoded chemical libraries or similar technologies to screen a large number of small molecules against all possible variants of KRAS.
- Individual drug candidates identified in this manner are then screened as in Example 1 to identify candidates that specifically inhibit or activate KRAS, and to determine which KRAS variants act as gatekeepers. This allows for exceptionally fast development of personalized medicine for individuals with specific KRAS driver mutations, as all possible KRAS mutations are identified in parallel.
- Pleiotropy, the ability of one gene to influence two or more seemingly unrelated phenotypic traits, is incredibly common in human genetics and disease. As a result, many drug candidates may have undesirable side effects even if their activity is completely on-target.
- the immune regulator STING has a well-described role in the induction of interferon signaling in response to cytosolic DNA during viral infections. However, it has also been implicated as an inducer of autophagy and the ER stress response, in maintaining calcium homeostasis in the ER through association with STIM1, and in activating the NF-kB stress-response pathway.
- the interferon response functions of STING can provide a tumor-suppressive role in early-stage cancers, which has motivated the development of STING agonists to stimulate anti-tumor immunity.
- chronic STING activation can lead to IFN-specific tachyphylaxis and upregulation of its immune-suppressive ER stress response function.
- This shift in functional responses with chronic activation of STING leads to metastatic progression and makes STING antagonists attractive drug candidates in advanced metastatic cancers.
- This pleiotropic response means that even perfectly on-target agonists or antagonists carry risks of promoting cancer progression or interfering with innate immunity in different contexts.
- Specific inhibitors of the tumor-promoting functions of STING are developed using the SatSeq method described here, by first disentangling this pleiotropy. This is accomplished by generating a STING saturation mutagenesis library as in Example 1. Variant enrichment and transcriptional responses are then quantified through in vivo and in vitro assays to measure the contribution of each amino acid position to various cellular phenotypes. These include but are not limited to: FACS analysis and barcode quantification after staining with various cellular markers for UPR, autophagy, NF-kB, etc., single-cell RNA sequencing for highly variable gene detection of both in vitro variant library and variant library tumors collected in vivo, growth fitness, or in vivo metastatic potential.
- phenotypic assays allow each amino acid position to be scored for its relative contribution to each phenotype.
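One way to picture the per-position scoring is a barcode enrichment calculation between a phenotype-selected population (for example, a FACS gate) and the unselected input library. The sketch below is a hypothetical illustration, not the scoring used in the source: the function name, the pseudocount, and the choice of a mean log2 enrichment per position are all assumptions.

```python
import math
from collections import defaultdict


def position_enrichment(selected_counts, input_counts, barcode_to_position,
                        pseudocount=1.0):
    """Mean log2 enrichment of barcodes per mutated amino acid position.

    selected_counts and input_counts map barcode -> read count in the selected
    and input populations; barcode_to_position maps barcode -> mutated position.
    """
    selected_total = sum(selected_counts.values())
    input_total = sum(input_counts.values())

    per_position = defaultdict(list)
    for barcode, position in barcode_to_position.items():
        # Pseudocounts avoid division by zero for barcodes that drop out.
        selected_freq = (selected_counts.get(barcode, 0) + pseudocount) / (
            selected_total + pseudocount)
        input_freq = (input_counts.get(barcode, 0) + pseudocount) / (
            input_total + pseudocount)
        per_position[position].append(math.log2(selected_freq / input_freq))

    return {pos: sum(vals) / len(vals) for pos, vals in per_position.items()}
```

A positive score suggests that variants at that position are over-represented in the selected phenotype; repeating the calculation per assay gives one score per position per phenotype.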
- the saturation mutagenesis library is then purified and screened for drug candidates as in Example 2. All potential drug candidates are then assayed as in Example 1.
- Identified gatekeeper mutations imply the location on the protein structure that drug candidates bind to and interact with, and indicate mechanisms of action. Any drug candidates that specifically interact with residues responsible for specific targeted functions of the protein, but not all functions (such as NF-kB, but not IFN functions of STING), are putative inhibitors of a specific protein function. These drug candidates are further validated by quantifying enriched variants at different stages of cancer progression.
- Example 4 methodology to increase the barcode capture efficiency
- the STING saturation mutagenesis library was expanded to include 92.4% of the complete library (as compared to the 48 transmembrane domain positions initially targeted as a proof-of-concept), which demonstrated the feasibility of scaling up the technology.
- the barcode capture efficiency was increased (from ⁇ 10%) to nearly 70% with two independent strategies (FIG. 12), which exceeded performance of most feature-encoded perturbation strategies published to date.
- the two approaches for optimization of DNA Barcode Capture are shown in FIG. 11.
- a 10X capture sequence (CS1) allowed the dBC to be detected using 10X feature extraction by acting as a priming site during first strand DNA synthesis.
- a high capture rate of the dBC is vital to link individual mutations with their transcriptional profiles.
- Detection of the dBC was optimized either at the cDNA stage (FIG. 11A) or with a U6 promoter to specifically increase the abundance of small non-coding RNA (sncRNA) containing the dBC (FIG. 11B).
- a constant region (CR) of DNA was added upstream of the dBC during cloning.
- an additional primer that bound to the CR was spiked into the first round PCR amplification during the standard 10X protocol.
- Strategy B can be more appropriate if the transgene requires lower expression to mimic physiological effects, as expression of the dBC is semi-independent from the transgene. While no transcriptional effects from the expression of the dBC as a sncRNA were observed, this should be validated with each library constructed using Strategy B, as the dBC is partially random. Exemplary constructs used in the two approaches are illustrated in FIGS. 13-14. Using these methods, we were again able to capture a known mutant that inactivates the DMXAA response.
- Example 5 vector features and methodology
- Lentiviral Vectors were originally derived from the human immunodeficiency virus type-1 (HIV-1) lentivirus. These vectors are incapable of self-replication and are therefore generally considered safe. This safety has been improved with subsequent generations of Lentivector systems that have modified the original long-terminal-repeats (LTR) that are essential for viral packaging and integration into the host genome. Due to their critical role in transduction, LTR modifications have primarily been various truncations to remove undesired promoter activities, rather than changes to the nucleotide sequence itself.
- the LTR sequences in lentivectors natively contain a BsaI restriction site (GGTCTC).
- GGA golden gate assembly
- BsaI-based GGA being one of the highest-efficiency cloning protocols available
- BsmBI Type-IIS enzymes
- LTR Truncated Sequence: GGGTTTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAAC CCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTC TGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAAT CTCTAGCA (SEQ ID NO: 2)
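A quick way to confirm that a vector element is compatible with Golden Gate assembly is to scan it for the BsaI (GGTCTC) and BsmBI (CGTCTC) recognition sequences named in this document. The sketch below is an illustration only: the function names are hypothetical, and scanning the reverse-complement strand as well as the given strand is an assumption about how such a check would normally be done.

```python
# Recognition sequences as given in this document.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_type_iis_sites(seq):
    """List (enzyme, strand, 0-based start) for every recognition site found.

    Whitespace inside the sequence is ignored so that sequences copied with
    line-break spaces can be pasted in directly.
    """
    seq = "".join(seq.upper().split())
    hits = []
    for enzyme, site in SITES.items():
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[2])
```

Passing the truncated LTR sequence (SEQ ID NO: 2) to `find_type_iis_sites` would show whether any such site remains on either strand.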
- the SatSeq expression system was modular and designed to allow for modifications to the vector backbone without re-synthesis of the mutagenesis library. This included changing out the lentiviral backbone for an Adeno-associated Viral (AAV) vector, or a piggyBac system.
- the essential structure of the expression system was a promoter (which could be modified to control expression levels), a selectable marker (BSD was used here for Blasticidin selection, but any selectable gene could be used), a P2A domain (this linked expression of the marker to expression of the transgene to ensure that the transgene of interest was not silenced), a transgene of interest (e.g., STING mutagenesis library), a DNA Barcode, and a WPRE element.
- FIG. 25 shows a schematic of the SatSeq expression system.
- Inserted items should include (but are not limited to) a second DNA barcode (or Unique Clonal Identifier), a U6 promoter to drive expression of the DNA barcodes independently from the transgene expression, a Ribozyme such as the Tornado system to circularize DNA barcodes for increased yield, etc.
- Inserted items should include a Constant Region which was 5’ of the DNA barcodes for increased capture and detection.
- the Constant Region used herein was 5’-AGAACCTTGCGGGTAAATC-3’ (SEQ ID NO: 5). This sequence was designed to act as a good PCR priming site, and to be compatible with the 10X Capture Sequence 1 (5’-GCTTTAAGGCCGGTCCTAGCAA-3’ (SEQ ID NO: 6)).
- the Constant Region 1 can be adapted if needed. However, the sequence used should be designed such that it does not form primer dimers with Nextera Rd1 (5’-GCAGCGTCAGATGTGTATAAGAGACAG-3’ (SEQ ID NO: 7)), TruSeq Rd1 (5’-CTACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 8)), or TSO (5’-AAGCAGTGGTATCAACGCAGAG-3’ (SEQ ID NO: 9)) to maintain compatibility with 10X scRNAseq. If using a platform other than 10X scRNAseq, Constant Region 1 should be adapted to be compatible with the downstream cDNA processing of the particular protocol of interest, in the same manner as Constant Region 1 is designed for use with 10X.
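When adapting the Constant Region, a first-pass screen for primer-dimer risk can look for long stretches of perfect antiparallel complementarity against the platform oligos listed above. The sketch below is a crude heuristic and an assumption on our part: it ignores thermodynamics, mismatches, and 3' end effects, and the `max_run` threshold of 5 is an arbitrary illustrative value. The oligo sequences are copied from this document; the function names are hypothetical.

```python
PLATFORM_OLIGOS = {
    "Nextera Rd1": "GCAGCGTCAGATGTGTATAAGAGACAG",  # SEQ ID NO: 7
    "TruSeq Rd1": "CTACACGACGCTCTTCCGATCT",        # SEQ ID NO: 8
    "TSO": "AAGCAGTGGTATCAACGCAGAG",               # SEQ ID NO: 9
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_complementary_run(primer_a, primer_b):
    """Longest gap-free stretch of primer_a that can pair with primer_b.

    Computed as the longest common substring between primer_a and the reverse
    complement of primer_b (dynamic programming over both sequences).
    """
    target = reverse_complement(primer_b)
    best = 0
    previous = [0] * (len(target) + 1)
    for base_a in primer_a.upper():
        current = [0] * (len(target) + 1)
        for j, base_t in enumerate(target, 1):
            if base_a == base_t:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def screen_constant_region(candidate, oligos=PLATFORM_OLIGOS, max_run=5):
    """Return {oligo name: run length} for oligos exceeding max_run."""
    runs = {name: longest_complementary_run(candidate, seq)
            for name, seq in oligos.items()}
    return {name: run for name, run in runs.items() if run > max_run}
```

An empty result from `screen_constant_region` only means no long perfect duplex was found; a dedicated thermodynamic tool would still be needed before ordering a new Constant Region.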
- a list of 10,000 16-nucleotide positional barcodes was generated that encode the known location of the mutated amino acid for each DNA fragment synthesized. These DNA barcodes all shared the same characteristics: each was a Levenshtein distance of at least 4 from every other barcode in the set; each had a GC content between 40% and 60%; the maximum homodimer melting temperature (Tm) was less than 40 degrees Celsius; and none contained a BsaI (GGTCTC) or BsmBI (CGTCTC) restriction site.
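The barcode constraints listed above lend themselves to a simple rejection-sampling sketch. The code below is an assumption about how such a set could be built, not the procedure used in the source: it draws random 16-mers and keeps those that pass the GC, restriction-site, and Levenshtein filters. The homodimer Tm filter is deliberately omitted because it needs a thermodynamic model, and checking restriction sites on both strands is our assumption. All function names are hypothetical.

```python
import random

# BsaI and BsmBI recognition sequences plus their reverse complements.
FORBIDDEN = ("GGTCTC", "GAGACC", "CGTCTC", "GAGACG")


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_sequence_filters(seq):
    """GC content within 40-60% and no forbidden site on either strand."""
    return (0.40 <= gc_fraction(seq) <= 0.60
            and not any(site in seq for site in FORBIDDEN))


def design_barcodes(n, length=16, min_distance=4, seed=0, max_attempts=100_000):
    """Greedily collect n barcodes satisfying all filters."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_attempts):
        if len(accepted) == n:
            return accepted
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not passes_sequence_filters(candidate):
            continue
        if all(levenshtein(candidate, kept) >= min_distance for kept in accepted):
            accepted.append(candidate)
    if len(accepted) == n:
        return accepted
    raise RuntimeError("could not find enough barcodes within max_attempts")
```

The pairwise distance check is quadratic in the number of accepted barcodes, so a set of 10,000 would call for a faster edit-distance implementation or an indexed search; the sketch is meant for small sets.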
- PacBio Revio allowed us to sequence up to 20K bp with a depth of ⁇ 10 million reads. This was sufficient to match the CDS sequence to DNA barcodes for >1 million unique clones within a population.
- the recommended starting concentration for amplicons ⁇ 3kb was 3-10 ng/µl.
- the polymerase isn’t able to fully saturate the DNA templates.
- Alternative sequencing approaches could also be used provided that they offer high sequencing depth (>5 million reads) and long sequencing read length (>1.5kb).
- the resulting sequencing reads allowed us to directly quantify rates of recombination during lentiviral integration by quantifying reads that contained a position barcode with mutations at the expected amino acid positions. Direct quantification of lentivector recombination rates has not been reported previously. This sequencing was used to re-assign unique 36-nt barcodes to the correct protein coding sequence when artifacts were observed.
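The barcode-versus-mutation check can be sketched as a simple concordance count over long reads. This is a hypothetical illustration: it assumes each read has already been parsed into its positional barcode and the set of amino acid positions that differ from wild type, and it treats any read whose mutated positions are not exactly the barcode's expected position as discordant. The function name and input layout are assumptions.

```python
def recombination_rate(reads, expected_position):
    """Count discordant reads among reads carrying a known positional barcode.

    reads: iterable of (positional_barcode, mutated_positions) pairs, where
        mutated_positions is a collection of amino acid positions that differ
        from the wild-type coding sequence in that read.
    expected_position: dict mapping positional barcode -> the single position
        that barcode was designed to mark.

    Returns (discordant_reads, total_reads_with_known_barcode).
    """
    discordant = 0
    total = 0
    for barcode, mutated_positions in reads:
        if barcode not in expected_position:
            continue  # unknown barcode: likely a sequencing or synthesis error
        total += 1
        if set(mutated_positions) != {expected_position[barcode]}:
            discordant += 1
    return discordant, total
```

Dividing the two returned counts gives an upper-bound estimate of the recombination rate, since sequencing errors and synthesis errors also produce discordant reads.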
- any polynucleotide and polypeptide sequences which reference an accession number correlating to an entry in a public database, such as those maintained by The Institute for Genomic Research (TIGR) on the world wide web at tigr.org and/or the National Center for Biotechnology Information (NCBI) on the World Wide Web at ncbi.nlm.nih.gov.
Abstract
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363471801P | 2023-06-08 | 2023-06-08 | |
| US63/471,801 | 2023-06-08 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2024254506A2 (fr) | 2024-12-12 |
| WO2024254506A3 (fr) | 2025-01-09 |
Family
ID=91853296
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| PCT/US2024/033082 | Pending | WO2024254506A2 (fr) | 2023-06-08 | 2024-06-07 | SatSeq: modular system for coupling saturation mutagenesis, DNA barcoding, and deep sequencing |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024254506A2 (fr) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4861719A (en) | 1986-04-25 | 1989-08-29 | Fred Hutchinson Cancer Research Center | DNA constructs for retrovirus packaging cell lines |
| US5278056A (en) | 1988-02-05 | 1994-01-11 | The Trustees Of Columbia University In The City Of New York | Retroviral packaging cell lines and process of using same |
| WO1994019478A1 (fr) | 1993-02-22 | 1994-09-01 | The Rockefeller University | Production of high-titer helper-free retroviruses by transient transfection |
| WO1995014785A1 (fr) | 1993-11-23 | 1995-06-01 | Rhone-Poulenc Rorer S.A. | Composition for the in vivo production of therapeutic products |
| WO1996022378A1 (fr) | 1995-01-20 | 1996-07-25 | Rhone-Poulenc Rorer S.A. | Cells for the production of recombinant adenoviruses |
| US5882877A (en) | 1992-12-03 | 1999-03-16 | Genzyme Corporation | Adenoviral vectors for gene therapy containing deletions in the adenoviral genome |
| US6013516A (en) | 1995-10-06 | 2000-01-11 | The Salk Institute For Biological Studies | Vector and method of use for nucleic acid delivery to non-dividing cells |
-
2024
- 2024-06-07 WO PCT/US2024/033082 patent/WO2024254506A2/fr active Pending
Non-Patent Citations (4)
| Title |
|---|
| GLIGORIJEVIC ET AL., NATURE COMMUNICATIONS, vol. 12, 2021, pages 3168 |
| LIN ET AL., SCIENCE, vol. 379, 2023, pages 6637 |
| MARX, NATURE METHODS, vol. 20, 2023, pages 6 - 11 |
| SAMBROOK: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024254506A3 (fr) | 2025-01-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Heredia et al. | Mapping interaction sites on human chemokine receptors by deep mutational scanning | |
| Lentini et al. | DALRD3 encodes a protein mutated in epileptic encephalopathy that targets arginine tRNAs for 3-methylcytosine modification | |
| KR20230005984A (ko) | Compositions, systems, and methods for producing, identifying, and characterizing effector domains for activating and silencing gene expression | |
| Wang et al. | Molecular mechanisms governing Pcdh-γ gene expression: evidence for a multiple promoter and cis-alternative splicing model | |
| EP2816351A2 (fr) | Methods and systems for annotating biomolecular sequences | |
| Levi et al. | mRNA association by aminoacyl tRNA synthetase occurs at a putative anticodon mimic and autoregulates translation in response to tRNA levels | |
| US20070082337A1 (en) | Methods of identifying putative gene products by interspecies sequence comparison and biomolecular sequences uncovered thereby | |
| Orellana et al. | RNA‐Seq highlights high clonal variation in monoclonal antibody producing CHO cells | |
| WO2018005691A1 (fr) | Efficient genetic screening method | |
| EA038600B1 (ru) | Cell-based cross assays and their use | |
| Cai et al. | Aging-associated lncRNAs are evolutionarily conserved and participate in NFκB signaling | |
| Wu et al. | Overexpression of the microtubule-binding protein CLIP-170 induces a+ TIP network superstructure consistent with a biomolecular condensate | |
| Irimia et al. | Evolutionarily conserved A-to-I editing increases protein stability of the alternative splicing factor Nova1 | |
| Masuho et al. | Molecular deconvolution platform to establish disease mechanisms by surveying GPCR signaling | |
| Cowan et al. | Development of multiplexed orthogonal base editor (MOBE) systems | |
| US20230274792A1 (en) | System and method for prime editing efficiency prediction using deep learning | |
| Huang et al. | Never a dull enzyme, RNA polymerase II | |
| WO2024254506A2 (fr) | SatSeq: modular system for coupling saturation mutagenesis, DNA barcoding, and deep sequencing | |
| Cirulli et al. | Revealing variants in SARS-CoV-2 interaction domain of ACE2 and loss of function intolerance through analysis of> 200,000 exomes | |
| Kotthoff et al. | Conserved C‐terminal motifs in odorant receptors instruct their cell surface expression and cAMP signaling | |
| Resa‐Infante et al. | Alternative interaction sites in the influenza A virus nucleoprotein mediate viral escape from the importin‐α7 mediated nuclear import pathway | |
| Palangat et al. | The RPB2 flap loop of human RNA polymerase II is dispensable for transcription initiation and elongation | |
| CN113195716A (zh) | Nucleic acid library, peptide library, and uses thereof | |
| Mattioli et al. | Beyond the gene: decoding alternative isoforms | |
| Chander et al. | Mechanisms of very long abortive transcript release during promoter escape |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| WWE | Wipo information: entry into national phase |
Ref document number: 2024739836 Country of ref document: EP |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| ENP | Entry into the national phase |
Ref document number: 2024739836 Country of ref document: EP Effective date: 20260108 |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24739836 Country of ref document: EP Kind code of ref document: A2 |