WO2024042479A1 - Cas12 protein, CRISPR-Cas system and uses thereof
- Publication number
- WO2024042479A1 (PCT/IB2023/058394)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- seq
- cell
- sequence
- protein
- cas12
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/70—Vectors or expression systems specially adapted for E. coli
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/85—Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
Definitions
- the present disclosure relates to a Cas12 protein, a CRISPR-Cas system and uses thereof.
- the Cas12 protein and CRISPR-Cas system are used for gene targeting or gene editing.
- CRISPR-Cas: clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins
- CRISPR-Cas12a belongs to class II of the CRISPR-Cas system and is an alternative to the widely used CRISPR-Cas9.
- the further studies showed that each subtype of the CRISPR-Cas system itself is also diverse, and some of them are highly controversial in taxonomy.
- the disclosure provides an engineered, non-naturally occurring Cas12 protein, wherein the Cas12 protein comprises an amino acid sequence selected from SEQ ID NOs: 1-35, or a homologue thereof having at least 70% sequence identity.
- the Cas12 protein comprises an amino acid sequence having at least 75%, 80%, 85%, 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.
- the Cas12 protein comprises an amino acid sequence having at least 90%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.
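The identity thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identical residues over all columns that are not gaps in both sequences. That denominator is an assumption for illustration; alignment tools differ in how they define percent identity, and the function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumption: columns where both sequences carry a gap are ignored, and
    every other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the given identity threshold (default 70%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, a candidate would be screened against the 70% floor with `meets_threshold(candidate, reference)` and against a stricter tier with `threshold=90.0`.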
- the amino acid sequence of the Cas12 protein lacks 25-40 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. Furthermore, the amino acid sequence of the Cas12 protein lacks 15-30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks at least 28 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
- the Cas12 protein has an amino acid sequence selected from SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14, or a homologue thereof having at least 70% sequence identity.
- the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.
- the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.
- the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13.
- the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.
- the Cas12 protein further comprises a promoter sequence, an enhancer sequence, and/or a termination region sequence.
- the Cas12 protein based on any one of SEQ ID NOs: 1-35 comprises an M amino acid residue at its N-terminus.
- the Cas12 protein further comprises one or more of a nuclear localization signal sequence, a cell penetrating peptide sequence, an affinity tag and/or a fusion base editor protein.
- the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152.
- the disclosure provides an engineered, non-naturally occurring cell comprising the Cas12 protein of any one of above.
- the cell is a eukaryotic cell or a prokaryotic cell.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
- the cell is a mammalian cell or a human cell or a plant cell.
- the disclosure provides a kit comprising the Cas12 protein of any one of above.
- the disclosure provides an engineered, non-naturally occurring Cas12 polynucleotide encoding the Cas12 protein of any one of above.
- the polynucleotide is a ribonucleotide sequence or a deoxyribonucleotide sequence, or analogs thereof; preferably the polynucleotide is mRNA, and the polynucleotide further comprises a 5’ cap sequence and a poly-A tail sequence.
- the polynucleotide is codon optimized for expression in a cell of interest. In some embodiments, the polynucleotide is codon optimized for expression in a eukaryotic cell. In some embodiments, the polynucleotide has at least 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 111-120.
- the polynucleotide has the sequence selected from SEQ ID NOs: 111-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115 or SEQ ID NOs: 117-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
- the cell is a mammalian cell, preferably a human cell.
- the polynucleotide has at least 70% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.
- the polynucleotide has at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 98% or 99% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.
- the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above, for use as a nuclease, preferably for use as a double-strand DNA cleavage nuclease or nickase.
- the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above, for use in gene editing.
- the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above, for use in a method of therapy, treatment, prevention, diagnosis or detection of disease.
- the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above, for use as a medicament.
- the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above, for use in a method of therapeutic treatment of a patient.
- the disclosure provides an engineered vector comprising the Cas12 polynucleotide of any one of above.
- the vector is an expression vector. In some embodiments, the vector is an inducible, conditional, or constitutive expression vector.
- the disclosure provides a vector system comprising one or more vectors of any one of above.
- one or more vectors comprise a polynucleotide according to any one of above and one or more polynucleotides which are on the same or a different vector encoding a gRNA.
- the disclosure provides an engineered cell comprising the Cas12 polynucleotide of any one of above, or comprising the vector of any one of above, or comprising the vector system of any one of above.
- the cell expresses the Cas12 protein. In some embodiments, the cell transiently expresses or non-transiently expresses the modified CRISPR-Cas12 protein. In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.
- the disclosure provides a reagent kit comprising the Cas12 protein of any one of above, or comprising the Cas12 polynucleotide of any one of above, or comprising the vector of any one of above, or comprising the vector system of any one of above.
- the disclosure provides a pharmaceutical composition comprising the Cas12 protein of any one of above or the polynucleotide of any one of above or the vector of any one of above or the vector system of any one of above, formulated for delivery by AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), gammaretrovirus, LV (lentivirus), eCIS (extracellular contractile injection system), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, LNPs (lipid nanoparticles), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.
- the disclosure provides an engineered, non-naturally occurring CRISPR-Cas system comprising: a) the Cas12 protein of any one of above or the polynucleotide encoding the Cas12 protein; b) at least one engineered guide sequence or one or more engineered nucleic acids encoding the at least one engineered guide sequence, wherein the guide sequence comprises a direct repeat sequence capable of binding the Cas12 protein and a spacer sequence capable of hybridizing to a target nucleotide sequence.
- the system comprises at least one guide sequence which is capable of hybridizing at least one target sequence or different regions of one target sequence.
- the guide sequence hybridizes to one or more target sequences in a prokaryotic cell or in a eukaryotic cell.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a nonhuman primate cell, and a human cell.
- the eukaryotic cell comprises a mammalian cell.
- the mammalian cell comprises a human cell.
- the eukaryotic cell comprises a plant cell.
- the target sequence is a DNA. In some embodiments, the target sequence is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
- the direct repeat sequence comprises a stem-loop structure comprising a first stem nucleotide strand which comprises 4-6 nucleotides; a second stem nucleotide strand which comprises 4-6 nucleotides, wherein the first and second stem nucleotide strands can hybridize with each other; and a loop nucleotide strand arranged between the first and second stem nucleotide strands, wherein the loop nucleotide strand comprises 4 or 5 nucleotides.
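The stem-loop constraints stated above (two stems of 4-6 nucleotides that can hybridize, separated by a loop of 4 or 5 nucleotides) can be checked with a short sketch. It assumes strict Watson-Crick pairing between stems of equal length and ignores G-U wobble pairs, which is stricter than the wording "can hybridize" requires; the function name `is_valid_stem_loop` is hypothetical, and the sequences in the usage note are illustrative only, not SEQ ID NOs: 153-156.

```python
# RNA Watson-Crick pairs only (assumption: no G-U wobble pairing).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _to_rna(seq: str) -> str:
    """Upper-case the sequence and write DNA-style T as U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string made of A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def is_valid_stem_loop(stem1: str, loop: str, stem2: str) -> bool:
    """Check the length and pairing constraints of a stem-loop.

    stem1 and stem2 are given 5' to 3'; they pair when stem2 equals the
    reverse complement of stem1.
    """
    stem1, loop, stem2 = _to_rna(stem1), _to_rna(loop), _to_rna(stem2)
    if any(base not in COMPLEMENT for base in stem1 + loop + stem2):
        return False
    if not (4 <= len(stem1) <= 6 and 4 <= len(stem2) <= 6):
        return False
    if len(loop) not in (4, 5):
        return False
    return stem2 == reverse_complement(stem1)
```

Under these assumptions, `is_valid_stem_loop("UCUAC", "UGUU", "GUAGA")` accepts a 5-nucleotide stem with a 4-nucleotide loop, while a 3-nucleotide loop or a mismatched stem is rejected.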
- the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to any one of SEQ ID NOs:153-156.
- the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the spacer sequence is between 10 and 40 nucleotides in length, preferably the spacer sequence is between 15 and 30 nucleotides in length, or between 18 and 25 nucleotides in length.
- an mRNA or a DNA encodes the Cas12 protein.
- the polynucleotide encoding the Cas12 protein is operably linked to a promoter.
- the promoter is a constitutive promoter, a tissue-specific promoter or an inducible promoter.
- the polynucleotide encoding the Cas12 protein operably linked to a promoter is in a vector.
- the vector is selected from the group consisting of a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.
- the system further comprises a donor template nucleic acid, wherein the donor template nucleic acid is a DNA, an RNA or a DNA-RNA hybrid.
- the targeting of the target sequence by the Cas12 protein and guide sequence results in a modification of the target sequence.
- the modification of the target sequence is a cleavage event or a nicking event.
- the target sequence is 3’ of a Protospacer Adjacent Motif (PAM), wherein the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 70% sequence identity; the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 70% sequence identity; or the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 70% sequence identity.
- the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 80% sequence identity. In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 85% sequence identity. In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 90% sequence identity.
- the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 95% sequence identity. In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 98% sequence identity.
- the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 80% sequence identity.
- the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 85% sequence identity.
- the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 90% sequence identity.
- the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 95% sequence identity.
- the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 98% sequence identity.
- the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 80% sequence identity. In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 85% sequence identity. In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 90% sequence identity.
- the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 95% sequence identity. In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 98% sequence identity.
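The degenerate PAM patterns named above (TTTR, TYYN, TTTD) use standard IUPAC nucleotide codes, and the target sequence lies 3’ of the PAM. A minimal sketch of scanning one DNA strand for candidate sites follows. The function names are hypothetical, the 20-nucleotide protospacer length is an assumed default inside the 18-25 range mentioned for the spacer, and only the given strand is scanned.

```python
# IUPAC codes needed for the PAM patterns in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "D": "AGT",   # not C
    "N": "ACGT",  # any base
}


def pam_matches(pam_pattern: str, site: str) -> bool:
    """True if a concrete DNA site satisfies a degenerate PAM pattern."""
    if len(pam_pattern) != len(site):
        return False
    return all(
        base in IUPAC[code]
        for code, base in zip(pam_pattern.upper(), site.upper())
    )


def find_targets(dna: str, pam_pattern: str, spacer_len: int = 20):
    """List (pam_start, pam, protospacer) for every PAM hit on this strand.

    The protospacer is taken immediately 3' of the PAM, as described in the
    text. spacer_len=20 is an illustrative assumption.
    """
    dna = dna.upper()
    k = len(pam_pattern)
    hits = []
    for i in range(len(dna) - k - spacer_len + 1):
        pam = dna[i:i + k]
        if pam_matches(pam_pattern, pam):
            hits.append((i, pam, dna[i + k:i + k + spacer_len]))
    return hits
```

A fuller search would also scan the reverse-complement strand; that step is omitted here to keep the sketch short.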
- the disclosure provides a delivery system, wherein the system of any one of above is presented in a form selected from the group consisting of AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), gammaretrovirus, LV (lentivirus), eCIS (extracellular contractile injection system), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, LNPs (lipid nanoparticles), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.
- the disclosure provides an engineered cell comprising the system of any one of above.
- the cell is a eukaryotic cell or a prokaryotic cell.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
- the cell is a mammalian cell or a human cell or a plant cell.
- the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of above, or the delivery system of above, for use in a method of therapy, treatment, prevention, diagnosis or detection of disease.
- the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of above, the delivery system of above or the cell of any one of above for use as a medicament.
- the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of above, the delivery system of above or the cell of any one of above for use in a method of therapeutic treatment of a patient.
- the disclosure provides a method of modifying or targeting a target DNA locus, the method comprising: delivering to said locus a CRISPR-Cas system of any one of above or a delivery system of above.
- said modifying or targeting a target locus comprises inducing a DNA strand break. In some embodiments, said modifying or targeting a target locus comprises inducing a DNA double strand break. In some embodiments, said modifying or targeting a target locus comprises altering gene expression of one or more genes. In some embodiments, said modifying or targeting a target locus comprises epigenetic modification of said target DNA locus. In some embodiments, the method is a method of modifying a cell, a cell line, or an organism by manipulation of one or more target sequences at genomic loci of interest.
- the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell. In some embodiments, the method is in vitro or in vivo.
- the disclosure provides a method of targeting and cleaving a double-stranded target DNA, the method comprising: contacting the double-stranded target DNA with a system of any one of above.
- cleaving the target DNA or target sequence results in the formation of an indel or the insertion of a nucleotide sequence. In some embodiments, cleaving the target DNA or target sequence comprises cleaving the target DNA or target sequence at two sites, and results in the deletion or inversion of a sequence between the two sites.
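The two-site outcome described above (deletion or inversion of the sequence between two cut sites) can be illustrated on a plain string model of one DNA strand. This is a deliberately simplified sketch: it assumes blunt cuts at integer positions and perfect rejoining, whereas real cleavage and repair are more complex. The function names are hypothetical.

```python
def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string made of A, C, G, T."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def delete_between(dna: str, cut1: int, cut2: int) -> str:
    """Remove the segment between two cut positions and rejoin the flanks."""
    lo, hi = sorted((cut1, cut2))
    return dna[:lo] + dna[hi:]


def invert_between(dna: str, cut1: int, cut2: int) -> str:
    """Reinsert the segment between two cut positions in reverse orientation."""
    lo, hi = sorted((cut1, cut2))
    return dna[:lo] + reverse_complement(dna[lo:hi]) + dna[hi:]
```

With cuts at positions 3 and 6 of `AAACCGTTT`, deletion leaves the two flanks joined and inversion replaces `CCG` with its reverse complement `CGG`.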
- the disclosure provides an isolated eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to a method or via use of a composition or via use of a system of any one of the preceding contents.
- the disclosure provides a system for detecting the presence of a nucleic acid target sequence in an in vitro sample, comprising: a Cas12 protein of any one of above; at least one guide polynucleotide comprising a guide sequence capable of binding the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence; wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct once activated by the target sequence.
- the disclosure provides a method for detecting target nucleic acids in samples comprising: contacting one or more samples with a Cas12 protein of any one of above; at least one guide polynucleotide comprising a guide sequence designed to have a degree of complementarity with the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence, wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct once activated by the target sequences; and detecting a signal from cleavage of the non-target sequence, thereby detecting the one or more target nucleic acid sequences in the sample.
- the disclosure provides an engineered, non-naturally occurring sgRNA, wherein the sgRNA comprises, in a tandem arrangement: a direct repeat sequence; and a spacer sequence which is capable of hybridizing to a sequence of the target nucleic acid to be manipulated; wherein the direct repeat sequence has at least 90% sequence identity to any one of SEQ ID NOs: 153-156, and the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182.
- the tandem arrangement of the direct repeat sequence and spacer sequence is in a 5’ to 3’ orientation.
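The tandem arrangement described above, direct repeat followed by spacer in the 5’ to 3’ direction, combined with the spacer length range of 10 to 40 nucleotides given for the CRISPR-Cas system, can be sketched as a small assembly helper. The function name `build_guide` is hypothetical, and the sequences in the usage note are placeholders, not SEQ ID NOs: 153-182.

```python
def build_guide(direct_repeat: str, spacer: str,
                min_len: int = 10, max_len: int = 40) -> str:
    """Concatenate direct repeat and spacer, 5' to 3', as an RNA string.

    The default 10-40 nt spacer bounds follow the broadest range stated in
    the text; narrower ranges (15-30 or 18-25) can be passed explicitly.
    """
    dr = direct_repeat.upper().replace("T", "U")
    sp = spacer.upper().replace("T", "U")
    if not (min_len <= len(sp) <= max_len):
        raise ValueError(
            f"spacer length {len(sp)} outside {min_len}-{max_len} nt"
        )
    # Direct repeat first, spacer second: the 5' to 3' tandem order.
    return dr + sp
```

For instance, `build_guide("AAUUUCUACUGUUGUAGAU", "ACGT" * 5, 18, 25)` returns the placeholder repeat followed by a 20-nucleotide spacer written in RNA letters, while a 5-nucleotide spacer raises `ValueError`.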
- the direct repeat sequence has at least 90% or 95% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the spacer sequence has at least 95% sequence identity to any one of SEQ ID NOs: 157-181.
- the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175 -182.
- the disclosure provides a DNA polynucleotide molecule encoding the sgRNA of above.
- the disclosure provides a DNA expression vector comprising the DNA polynucleotide molecule of above.
- the vector further comprises one or more regulatory element(s) operably linked to sequences encoding the sgRNA.
- at least one regulatory element is capable of directing expression of the sgRNA within the cell.
- the disclosure provides a delivery vector carrying one or more sgRNA of any one of above.
- the disclosure provides an engineered, non-naturally occurring direct repeat sequence, wherein the direct repeat sequence comprises a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 153-156, or a variant thereof. In some embodiments, the direct repeat sequence has at least 95% or 98% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156.
- the disclosure provides an engineered, non-naturally occurring spacer sequence, wherein the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.
- the disclosure provides a DNA polynucleotide molecule encoding the spacer sequence of above.
- the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.
- the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use as a medicament.
- the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a method of therapeutic treatment of a patient.
- FIG.1A shows the phylogenetic tree of the Cas12 protein (GEBx0013, GEBx0014, GEBx0017, GEBx0018, GEBx0020, GEBx0021, GEBx0024, GEBx0027, GEBx0029, GEBx0032, GEBx0037, GEBx0047, GEBx0049, GEBx0063-72) constructed by IQTREE;
- FIG.1B shows the phylogenetic tree of the Cas12 protein (GEBx0030, GEBx0039) constructed by IQTREE;
- FIG.1C shows the phylogenetic tree of the Cas12 protein (GEBx0030, GEBx0039) constructed by IQTREE;
- FIG.1D shows the phylogenetic tree of the Cas12 protein (GEBx0074-GEBx0080) constructed by IQTREE;
- FIG.2A, FIG.2B, FIG.2C, and FIG.2D show the domain arrangement of the Cas12 proteins;
- FIG.3A, FIG.3B and FIG.3C show the amino acid sequence comparison between GEBx0019 (SEQ ID NO: 5), GEBx0022 (SEQ ID NO: 8) and GEBx0033 (SEQ ID NO: 14) and the reference nucleases (AsCpf1, LbCpf1 and FnCpf1), wherein FIG.3A shows the comparison of their WED.3 domains, with the difference indicated by the line box; FIG.3B shows the comparison of their PI domains, with the difference indicated by the line box; FIG.3C shows the comparison of their NUC domains, with the difference indicated by the line box;
- FIG.4 shows the structure of AsCpf1 and the models of GEBx0019, GEBx0022 and GEBx0033, with the PI domains shown in dark grey;
- FIG.5A shows the percentages of shared amino acid sequence between GEBx0030, GEBx0029 and the reference Cas12a nucleases (AsCas12a, LbCas12a, and FnCas12a); the multiple sequence alignment was performed by MUSCLE, while the identity of each sequence was automatically calculated by GeneDoc;
- FIG.5B shows the sequence identity distance matrix between GEBx0074-GEBx0080 in this disclosure and the reference sequences;
- FIG.6 shows the secondary structure of the crRNA utilized by the Cas12 protein;
- FIG.7 shows the secondary structure of the crRNA utilized by GEBx0019, GEBx0022, GEBx0033, GEBx0030, GEBx0039, GEBx0074-GEBx0080;
- FIG.8 shows the schematic of the construction of pEAST-Blunt E2 vector harbored with the Casl2 protein CDS;
- FIG.9A shows the SDS-PAGE results of different chromatographic fractions of the GEBx0032 protein; the position of the target band is indicated by arrows;
- FIG.9B shows the SDS-PAGE analysis results of the purified GEBx0033 protein; the position of the target band is indicated by arrows;
- FIG.9C shows the SDS-PAGE results of different chromatographic fractions of the GEBx0037 protein; the position of the target band is indicated by arrows;
- FIG.9D shows the SDS-PAGE results of different chromatographic fractions of the GEBx0013 protein; the position of the target band is indicated by arrows;
- FIG.9E shows the SDS-PAGE results of different chromatographic fractions of the GEBx0018 protein; the position of the target band is indicated by arrows;
- FIG.10 shows the heatmap of the PAM requirement of GEBx0033;
- FIG.11A shows the in vitro cleavage result of GEBx0033;
- FIG.11B shows the in vitro cleavage results of GEBx0032 and GEBx0037;
- FIG.11C shows the in vitro cleavage results of GEBx0013 and GEBx0018;
- FIG.12 shows the bar graph of the cleavage efficiency;
- FIG.13 shows the sequence alignment between TnpB, Cas12f and GEBx0013/0047/0063/0064/0070 of this disclosure; the zinc finger domain region and the conserved 4-Cys zinc finger in Cas12f and TnpB are marked with an arrow and a star respectively, indicating that GEBx0013/0047/0063/0064/0070 do not have the zinc finger structure at their C terminus;
- FIG.14A shows the PAM preference of GEBx0047 in the HEK293 cell line;
- FIG.14B is a statistical curve depicting the relationship between the number of aligned sites and the cumulative number of aligned reads for GEBx0047;
- FIG.14C shows the PAM preference of GEBx0070 in the HEK293 cell line;
- FIG.14D is a statistical curve depicting the relationship between the number of aligned sites and the cumulative number of aligned reads for GEBx0070;
- FIG.14E shows the PAM preference of GEBx0013 in the HEK293 cell line;
- FIG.14F shows the PAM preference of GEBx0018 in the HEK293 cell line;
- FIG.14G shows the PAM preference of GEBx0032 in the HEK293 cell line;
- FIG.15 shows the schematic of the pCasX and pgRNA plasmids harboring the Cas nuclease CDS and the guide RNA respectively;
- FIG.16A shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with MYOD1 targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture;
- FIG.16B shows the editing efficiency (Indel) of human HEK293T cells following forward co-transfection of the pCasX plasmid harboring the GEBx0063 or GEBx0064 CDS and the pgRNA plasmid harboring different lengths of the MYOD1-TTTG-T1 spacer respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture;
- FIG.17 shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with VEGFA targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture;
- FIG.18 shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with IL1RN targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture;
- FIG.19 shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with DNMT1-1 targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture.
- FIG.20A shows the indel activity of GEBx0047 across 22 targets with a TTTG PAM in the HEK293T cell line;
- FIG.20B shows the indel events and partial allele plots achieved by GEBx0047 at the RNF2-TTTG-T1 locus;
- FIG.21A shows the indel activity of GEBx0063 across 22 targets with a TTTG PAM in the HEK293T cell line;
- FIG.21B shows the indel events and partial allele plots achieved by GEBx0063 at the TTR-TTTG-T3 locus;
- FIG.22A shows the indel activity of GEBx0064 across 22 targets with a TTTG PAM in the HEK293T cell line;
- FIG.22B shows the indel events and partial allele plots achieved by GEBx0064 at the TTR-TTTG-T3 locus;
- FIG.23A shows the indel activity of GEBx0070 across 22 targets with a TTTG PAM in the HEK293T cell line;
- FIG.23B shows the indel events and partial allele plots achieved by GEBx0070 at the HBB-TTTG-T2 locus.
- nucleic acids or polypeptide sequences refers to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, as measured using a sequence comparison algorithm such as BLAST, BLAST 2.0 or FASTA with the default parameters described below.
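As an illustration of what a "specified percentage" of identical residues means, the following minimal Python sketch computes percent identity from two sequences that have already been aligned (for example by BLAST or MUSCLE). It is not the method used in this disclosure: the function names are hypothetical, the example sequences are invented, and the choice of denominator (all aligned columns except those where both sequences have a gap) is an assumption, since different tools normalize identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns where both sequences carry a gap are ignored;
    a gap opposite a residue counts as a compared, non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # uninformative column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair reaches 'at least threshold %' identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Invented toy alignment, not taken from any SEQ ID NO.
    print(round(percent_identity("MKT-AY", "MKTLAF"), 1))
    print(meets_threshold("MKT-AY", "MKTLAF", 70.0))
```

Under these assumptions, a candidate homologue would satisfy an "at least 70% sequence identity" criterion only if `meets_threshold(..., 70.0)` returns true for its alignment against the reference sequence.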
- the terms “recognized”, “recognizing”, or “recognition” in this context refer to the capability of the Cas12 protein to form a functional complex with a gRNA at a DNA target site to which the gRNA hybridizes (i.e. to which the guide sequence of the gRNA hybridizes) and which is flanked by the PAM sequence, wherein the Cas12 protein is capable of performing its natural function, i.e. DNA cleavage.
- DNA cleavage precludes the Cas12 protein from being a catalytically inactive Cas12 protein.
- an inactivated Cas12 protein e.g. a dead Cas12 protein
- a complex between the Cas12 protein, gRNA and cognate target may nevertheless be formed if the required PAM sequence is present, but this does not result in DNA cleavage.
- sample may contain whole cells and/or live cells and/or cell debris.
- the sample may contain (or be derived from) a “bodily fluid”.
- the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof.
- Samples include cell cultures, bodily fluids, and cell cultures from bodily fluids. Bodily fluids may be obtained from a mammalian organism, for example by puncture, or other collecting or sampling procedures.
- subject refers to a vertebrate, preferably a mammal, more preferably a human.
- Mammals include, but are not limited to murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
- gene refers to a nucleic acid sequence (used interchangeably with polynucleotide or nucleotide sequence) that encodes a chimeric molecule as described herein. This definition includes various sequence polymorphisms, mutations, and/or sequence variants wherein such alterations do not substantially affect the function of the encoded chimeric molecule.
- the term “gene” may include not only coding sequences but also regulatory regions such as promoters, enhancers, and termination regions. The term further can include all introns and other DNA sequences spliced from the mRNA transcript, along with variants resulting from alternative splice sites. Gene sequences encoding the molecule can be DNA or RNA that directs the expression of the chimeric molecule.
- nucleic acid sequences may be a DNA strand sequence that is transcribed into RNA or an RNA sequence that is translated into protein.
- the nucleic acid sequences include both the full-length nucleic acid sequences as well as non-full-length sequences derived from the full-length protein.
- the sequences can also include degenerate codons of the native sequence or sequences that may be introduced to provide codon preference in a specific cell type. Portions of complete gene sequences are referenced throughout the disclosure as is understood by one of ordinary skill in the art.
- Encoding refers to the property of specific sequences of nucleotides in a gene, such as a cDNA, or an mRNA, to serve as templates for synthesis of other macromolecules such as a defined sequence of amino acids.
- a gene codes for a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system.
- a polynucleotide encoding a protein includes all nucleotide sequences that are degenerate versions of each other and that code for the same amino acid sequence or amino acid sequences of substantially similar form and function.
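The idea that degenerate versions of a polynucleotide encode the same amino acid sequence can be sketched in a few lines of Python. The codon table below is only a small subset of the standard genetic code, and both coding sequences are invented for illustration; neither is taken from this disclosure.

```python
# Small subset of the standard genetic code (DNA codons -> one-letter amino acids).
CODON_TABLE = {
    "ATG": "M",              # methionine (start)
    "AAA": "K", "AAG": "K",  # lysine, two synonymous codons
    "CTG": "L", "TTA": "L",  # leucine, two of its six codons
    "TAA": "*", "TGA": "*",  # stop codons
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon using CODON_TABLE.

    Raises KeyError for codons outside the illustrative subset and
    ValueError if the length is not a multiple of three.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    version_1 = "ATGAAACTGTAA"  # hypothetical sequence
    version_2 = "ATGAAGTTATGA"  # degenerate version with synonymous codons
    print(translate(version_1), translate(version_2))
    print(translate(version_1) == translate(version_2))
```

The two nucleotide sequences differ at several positions yet translate to the same peptide, which is the sense in which a codon-optimized sequence still encodes the same protein.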
- polynucleotide refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
- Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown.
- this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
- Polynucleotide sequences encoding more than one portion of an expressed chimeric molecule can be operably linked to each other and relevant regulatory sequences. For example, there can be a functional linkage between a regulatory sequence and an exogenous nucleic acid sequence resulting in expression of the latter.
- a first nucleic acid sequence can be operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence.
- a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence.
- operably linked DNA sequences are contiguous and, where necessary or helpful, join coding regions into the same reading frame.
- non-naturally occurring or “engineered” are used interchangeably and indicate the involvement of the hand of man.
- the terms, when referring to nucleic acid molecules or polypeptides, mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which it is naturally associated in nature and as found in nature. In all aspects and embodiments, whether they include these terms or not, it will be understood that these terms may be optional and thus may be included or not included.
- the terms “non-naturally occurring” and “engineered” may be used interchangeably and so can therefore be used alone or in combination and one or other may replace mention of both together.
- “engineered” is preferred in place of “non-naturally occurring” or “non-naturally occurring and/or engineered” or “engineered, non-naturally occurring”.
- “Homologue” of a protein as used herein is a protein of the same species which performs the same or a similar function as the protein it is a homologue of. Homologous proteins may but need not be structurally related, or are only partially structurally related. “Homologue” of a protein as used herein also includes sequences having one or more additions, deletions, stop positions, or substitutions, as compared to a sequence disclosed herein. The homologue protein as used herein performs the same or a similar function as the Cas12 protein disclosed herein.
- affinity tag facilitates the purification of recombinant modified proteins, for example GST, FLAG or hexahistidine sequences.
- fusion base editor protein refers to proteins that enable the direct conversion or editing of bases.
- sgRNA sgRNA
- crRNA gRNA (guide RNA)
- gRNA guide RNA
- cleavage event refers to a DNA break in a target nucleic acid created by a nuclease of a CRISPR system described herein.
- the cleavage event is a double-stranded DNA break.
- the cleavage event is a single-stranded DNA break.
- a “stem-loop structure” refers to a nucleic acid having a secondary structure that includes a region of nucleotides that are known or predicted to form a double strand (stem portion) that is linked on one side by a region of predominantly single-stranded nucleotides (loop portion).
- the terms “hairpin” and “fold-back” structures are also used herein to refer to stem-loop structures. Such structures are well known in the art and these terms are used consistently with their known meanings in the art.
- a stem-loop structure does not require exact basepairing.
- the stem may include one or more base mismatches.
- the base-pairing may be exact, i.e., not include any mismatches.
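To make the distinction between exact and inexact base-pairing concrete, the following Python sketch counts mismatches in the stem of a candidate stem-loop. It is a simplified illustration under stated assumptions: the stem and loop lengths are supplied by the caller, only Watson-Crick pairs are treated as matches (G-U wobble pairs are counted as mismatches), and the RNA sequences are invented rather than taken from any crRNA in this disclosure.

```python
# Watson-Crick pairs only; treating G-U wobble as a mismatch is an assumption.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_mismatches(rna: str, stem_len: int, loop_len: int) -> int:
    """Count non-pairing positions in a 5'-stem / loop / 3'-stem layout.

    The first stem_len bases are paired against the stem_len bases that
    follow the loop, read in reverse (antiparallel) order.
    """
    rna = rna.upper()
    if len(rna) < 2 * stem_len + loop_len:
        raise ValueError("sequence too short for the requested stem and loop")
    five_prime_arm = rna[:stem_len]
    three_prime_arm = rna[stem_len + loop_len: 2 * stem_len + loop_len]
    return sum(
        1
        for x, y in zip(five_prime_arm, reversed(three_prime_arm))
        if (x, y) not in WATSON_CRICK
    )


if __name__ == "__main__":
    exact = "GGCUAUUUUUAGCC"      # hypothetical hairpin with a perfectly paired stem
    mismatched = "GGCUAUUUUUAGCA" # same hairpin with one stem mismatch
    print(stem_mismatches(exact, stem_len=5, loop_len=4))
    print(stem_mismatches(mismatched, stem_len=5, loop_len=4))
```

A structure with zero mismatches corresponds to exact base-pairing, while a small non-zero count corresponds to a stem that still forms but includes one or more base mismatches.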
- donor template nucleic acid refers to a nucleic acid molecule that can be used by one or more cellular proteins to alter the structure of a target nucleic acid after a CRISPR enzyme described herein has altered a target nucleic acid.
- the donor template nucleic acid is a double-stranded nucleic acid.
- the donor template nucleic acid is a single-stranded nucleic acid.
- the donor template nucleic acid is linear.
- the donor template nucleic acid is circular (e.g., a plasmid).
- the donor template nucleic acid is an exogenous nucleic acid molecule.
- the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., a chromosome).
- targeting refers to the ability of a complex including a CRISPR-associated protein and an RNA guide to preferentially or specifically bind to, e.g., hybridize to, a specific target nucleic acid compared to other nucleic acids that do not have the same or similar sequence as the target nucleic acid.
- target nucleic acid refers to a specific nucleic acid substrate that contains a nucleic acid sequence complementary to the entirety or a part of the spacer in an RNA guide.
- the target nucleic acid comprises a gene or a sequence within a gene.
- the target nucleic acid comprises a noncoding region (e.g., a promoter).
- the target nucleic acid is single-stranded.
- the target nucleic acid is double-stranded.
- target sequence refers to a specific nucleic acid that contains a nucleic acid sequence complementary to the entirety or a part of the spacer in an RNA guide.
- the target sequence comprises a gene or a sequence within a gene.
- the target sequence comprises a noncoding region (e.g., a promoter).
- the target sequence is single-stranded.
- the target sequence is double-stranded.
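Because a target sequence must both match the spacer and be flanked by the PAM, candidate sites can be enumerated by a simple scan. The Python sketch below is illustrative only: it assumes the PAM lies immediately 5' of the protospacer on the scanned strand (as is typical for Cas12a-type nucleases), uses the TTTG PAM mentioned for the figures above, assumes a 20-nt protospacer, and scans a single strand of an invented DNA sequence. The function name is hypothetical.

```python
from typing import List, Tuple


def find_protospacers(
    dna: str, pam: str = "TTTG", spacer_len: int = 20
) -> List[Tuple[int, str]]:
    """Return (PAM start index, protospacer) for every PAM occurrence
    that is followed by a full-length protospacer on the given strand.

    Assumptions: exact PAM match, PAM 5' of the protospacer, one strand only.
    """
    dna = dna.upper()
    hits: List[Tuple[int, str]] = []
    start = dna.find(pam)
    while start != -1:
        proto_start = start + len(pam)
        protospacer = dna[proto_start: proto_start + spacer_len]
        if len(protospacer) == spacer_len:
            hits.append((start, protospacer))
        start = dna.find(pam, start + 1)  # allow overlapping PAM matches
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: 2 nt, a TTTG PAM, a 20-nt protospacer, 2 nt.
    example = "CC" + "TTTG" + "ACGTACGTACGTACGTACGT" + "AA"
    for position, protospacer in find_protospacers(example):
        print(position, protospacer)
```

A fuller search would also scan the reverse complement and accept degenerate PAM positions; those steps are omitted here to keep the sketch short.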
- Cas12 enzyme Cas12 protein
- Cas12 effector protein Cas12
- Cas12 Cas12 effector protein
- AsCas12a is also named AsCpf1; LbCas12a is also named LbCpf1; FnCas12a is also named FnCpf1.
- AA is the abbreviation of amino acid.
- Metagenomic sequencing samples were selected from public databases and downloaded, and the sequencing reads were assembled with assembly tools. To search for potential Cas protein sequences, known Cas sequences were downloaded as references and then analyzed. In total, 35 novel Cas12 proteins were identified. Information on the 35 novel Cas12 proteins is shown in Table 1.
- GEBx0047 MYG000001629
- GEBx0070 CALXXI010000001.1_4
- GEBx0063 CALUP0010000002.1_102
- GEBx0064 CALUQJ010000016.1_32
- the phylogenetic tree was constructed by IQTREE (FIG.1A) to visualize the relatedness of the orthologs at the primary amino-acid level using 102 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- the branches of the tree corresponding to the Cas12 proteins disclosed in this invention are marked with a circle while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- GEBx0029 and GEBx0072 are closely related and form a representative cluster; GEBx0047 and GEBx0048 are closely related and form a representative cluster; GEBx0069, GEBx0070, GEBx0071, GEBx0073, and GEBx0066 are closely related and form a representative cluster, within which GEBx0073 and GEBx0066 form a representative cluster and GEBx0073 is unique; GEBx0013, GEBx0014, GEBx0018, GEBx0024, and GEBx0049 form a representative cluster; GEBx0020, GEBx0027, GEBx0063, GEBx0065, and GEBx0064 form a representative cluster, and so on.
- the Cas12 proteins share less than 70% identity with existing Cas proteins; some share less than 60% or even less than 50% identity with existing Cas proteins.
- the phylogenetic tree was constructed by IQTREE (FIG.1B) to visualize the relatedness of the orthologs at the primary amino-acid level using 88 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- the branches of the tree corresponding to GEBx0030 and GEBx0039 in this study are marked with a circle while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. Each has its own branch, suggesting that they are evolutionarily distinct.
- the multiple sequence alignment was performed by MUSCLE while the identity of each sequence was automatically calculated by GeneDoc.
- the results are shown in FIG.5A.
- GEBx0030 and GEBx0039 share less than 40% identity with three reference sequences.
- the sizes of GEBx0030 (1222 aa) and GEBx0039 (1171 aa) are much smaller than that of a typical Cas12a.
- the phylogenetic tree was constructed by IQTREE (FIG.1C) to visualize the relatedness of the orthologs at the primary amino-acid level using 73 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- NCBI National Center for Biotechnology Information
- the branches of the tree corresponding to GEBx0019, GEBx0022 and GEBx0033 in this study are marked with a circle while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- the tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. Each has its own branch, suggesting that they are evolutionarily distinct.
- The amino acid sequences were aligned and the results are shown in FIG.3A, FIG.3B, FIG.3C and FIG.4.
- GEBx0019, GEBx0022 and GEBx0033, together with AsCpf1, have a relatively larger WED.3 domain (165 AA) than LbCpf1 and FnCpf1 (130 AA).
- the comparison of the PI domain and NUC domain between GEBx0019, GEBx0022, GEBx0033 and AsCpf1 in FIG.3B and FIG.3C shows that GEBx0019, GEBx0022 and GEBx0033 have smaller PI and NUC domains than AsCpf1.
- the phylogenetic tree was constructed by IQTREE (FIG.1D) to visualize the relatedness of the orthologs at the primary amino-acid level using 120 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- the branches of the tree corresponding to the Cas12 proteins disclosed in this invention are marked with a circle while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- the tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters.
- GEBx0075 and GEBx0077 are closely related and form a representative cluster
- GEBx0079 and GEBx0080 are closely related and form a representative cluster
- GEBx0076 forms a representative cluster
- GEBx0074 and GEBx0078 are closely related and form a representative cluster.
- the Cas12 proteins share less than 70% identity with existing Cas proteins; some share less than 60% or even less than 50% identity with existing Cas proteins.
- the disclosure provides an engineered, non-naturally occurring Cas12 protein, wherein the Cas12 protein comprises an amino acid sequence selected from SEQ ID NOs: 1-35, or a homologue thereof having at least 70% sequence identity.
- the Cas12 protein comprises an amino acid sequence having at least 75%, 80%, 85%, 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.
- the Cas12 protein comprises an amino acid sequence having at least 90%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.
- the amino acid sequence of the Cas12 protein has at least 70% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 75% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 80% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 82% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to any one of SEQ ID NOs: 1-35.
- the amino acid sequence of the Cas12 protein has at least 87% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 92% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to any one of SEQ ID NOs: 1-35.
- the amino acid sequence of the Cas12 protein has at least 99% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has 100% sequence identity to any one of SEQ ID NOs: 1-35.
- the “100% sequence identity” means the amino acid sequence of the CRISPR-Cas12 protein is selected from one of the SEQ ID NOs: 1-35.
- the amino acid sequence of the Cas12 protein lacks 25-40 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. Furthermore, the amino acid sequence of the Cas12 protein lacks 15-30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks at least 28 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks at least 28 amino acids in the PI domain and at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks 28 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks 30 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks 35 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks 40 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks 26 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 20 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
- the amino acid sequence of the Cas12 protein lacks 17 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 25 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
- the Cas12 protein has an amino acid sequence selected from SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14, or a homologue thereof having at least 70% sequence identity. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In some embodiments, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.
- “at least 70%” can include 70%, 72%, 75%, 78%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%” can include 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 95%”
- the amino acid sequence of the Cas12 protein has at least 70% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 75% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 80% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 82% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.
- the amino acid sequence of the Cas12 protein has at least 85% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 87% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 92% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.
- the amino acid sequence of the Cas12 protein has at least 95% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 99% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has 100% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. The “100% sequence identity” means the amino acid sequence of the CRISPR-Cas12 protein is selected from one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.
- the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13.
- “at least 70%” can include 70%, 72%, 75%, 78%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%”can include 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 95%”
- the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13.
- the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 1. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 1. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 1. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 1.
- the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 4. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 4. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 4. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 4.
- the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 13.
- the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.
- “at least 70%” can include 70%, 72%, 75%, 78%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%”can include 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 95%”
- the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.
- the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.
- the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 17.
- the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 19.
- the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 20.
- the amino acid sequence of the Casl2 protein has at least 80% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 85% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 90% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 95% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 98% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein is set forth in SEQ ID NO: 26.
- the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20.
- the Cas12 proteins provided by this disclosure may include not only coding sequences but also regulatory regions such as promoters, enhancers, and termination regions.
- the Casl2 protein further comprises promoter sequence, enhancer sequence, and/or termination region sequence. In some embodiments, the Casl2 protein, based on any one of SEQ ID NOs: 1-35, comprises an M amino acid residue at its N-terminus.
- the Casl2 protein further comprises one or more of a nuclear localization signal sequence, a cell penetrating peptide sequence, an affinity tag and/or a fusion base editor protein.
- the Casl2 protein comprises one or more nuclear localization signal(s) NLS(s).
- the NLS(s) can be located at an end or at another portion of the peptide.
- the NLS(s) located at each end or at another portion of the Cas12 amino acid sequence can be the same or different.
- the NLS of the N-terminal end and the NLS of the C-terminal end are the same.
- the NLS of the N-terminal end and the NLS of the C-terminal end are different.
- the N-terminal end of the Cas12 amino acid sequence comprises one NLS and the C-terminal end of the Cas12 amino acid sequence comprises one NLS.
- NLS is fused to a peptide or non-peptide moiety that allows proteins to enter or localize to a tissue, a cell, or a region of a cell.
- the NLS may be an SV40 (simian virus 40) NLS, a c-Myc NLS, or another suitable monopartite NLS.
- the NLS may be fused to an N-terminal and/or a C-terminal of the Casl2 protein.
- an affinity tag is added for purification of the fusion polypeptide by affinity chromatography.
- the NLS located at the N-terminus is set forth in SEQ ID NO: 183 (MAPKKKRKV).
- the NLS located at the C-terminus is set forth in SEQ ID NO: 80 (KRPAATKKAGQAKKKK).
- the FLAG sequence located at the N-terminus is set forth in SEQ ID NO: 81 (DYKDDDDK).
- In SEQ ID NOs: 148-152, the combination of SEQ ID NO: 183 and SEQ ID NOs: 80-81 is just an example. Other available sequences and different combinations can also be chosen for the NLS sequences and the FLAG sequence.
- the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152. In some embodiments, the Cas12 protein comprises an amino acid sequence set forth in any one of SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152. Each of SEQ ID NOs: 74-78 and SEQ ID NOs: 148-152 comprises the Cas12 protein provided in this disclosure, the NLSs located at the N-terminus and the C-terminus, and the FLAG sequence located at the N-terminus.
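The tagged constructs described above can be illustrated with a minimal sketch that joins the quoted tag sequences to a core protein sequence. The helper name `build_fusion`, the placeholder core sequence, and in particular the order of the N-terminal NLS and the FLAG tag are assumptions made for illustration; the text only states which terminus each element sits at, and the actual layouts are those of SEQ ID NOs: 74-78 and 148-152.

```python
# Tag sequences quoted in the text above.
NLS_N = "MAPKKKRKV"          # SEQ ID NO: 183, N-terminal NLS
NLS_C = "KRPAATKKAGQAKKKK"   # SEQ ID NO: 80, C-terminal NLS
FLAG = "DYKDDDDK"            # SEQ ID NO: 81, N-terminal FLAG tag

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def build_fusion(core, n_parts=(NLS_N, FLAG), c_parts=(NLS_C,)):
    """Join N-terminal parts, the core protein, and C-terminal parts.

    The default part order (NLS first, then FLAG) is an assumption;
    pass different tuples to model other combinations.
    """
    core = core.strip().upper()
    unknown = set(core) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return "".join(n_parts) + core + "".join(c_parts)


# "MKT" is a hypothetical placeholder, not a sequence from the disclosure.
fusion = build_fusion("MKT")
print(fusion)
```

Passing other tuples for `n_parts` and `c_parts` models the "other available sequences and different combinations" mentioned in the text.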
- the polynucleotide has at least 85% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 148.
- the polynucleotide has at least 90% sequence identity to SEQ ID NO: 148. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 148. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 148. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 148. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 149. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 149.
- the polynucleotide has at least 95% sequence identity to SEQ ID NO: 149. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 149. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 149. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 150. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 150. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 150.
- the polynucleotide has at least 98% sequence identity to SEQ ID NO: 150. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 150. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 151.
- the polynucleotide has the sequence set forth in SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 152.
- the disclosure provides an engineered, non-naturally occurring cell comprising the Casl2 protein of any one of above.
- the cell is a eukaryotic cell or a prokaryotic cell.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
- the cell is a mammalian cell or a human cell or a plant cell.
- the cell may be a eukaryotic cell or a prokaryotic cell.
- the cell is a eukaryotic cell.
- the cell is a vertebrate, mammalian, rodent, goat, pig, bird, chicken, turkey, cow, horse, sheep, fish, primate, or human cell.
- the cell is a mammalian cell.
- the cell is a human cell.
- the cell is a somatic cell, a germ cell, or a prenatal cell.
- the cell is a zygotic cell, a blastocyst cell, an embryonic cell, a stem cell, a mitotically competent cell, or a meiotically competent cell.
- the cell is not part of a human embryo. In one embodiment, the cell is a somatic cell. In one embodiment, the cell is a T cell, a CD8+ T cell, a naive T cell, a central memory T cell, an effector memory T cell, a CD4+ T cell, a stem cell memory T cell, a helper T cell, a regulatory T cell, a cytotoxic T cell, a natural killer T cell, a hematopoietic stem cell, a long term hematopoietic stem cell, a short term hematopoietic stem cell, a multipotent progenitor cell, a lineage restricted progenitor cell, a lymphoid progenitor cell, a myeloid progenitor cell, a common myeloid progenitor cell, an erythroid progenitor cell, a megakaryocyte erythroid progenitor cell, a retinal cell, a photoreceptor cell, a
- the cell is a T cell, a Hematopoietic Stem Cell, a retinal cell, a cochlear hair cell, a pulmonary epithelial cell, a muscle cell, a neuron, a mesenchymal stem cell, an induced pluripotent stem (iPS) cell, or an embryonic stem cell.
- the cell is a plant cell.
- the disclosure provides a kit comprising the engineered, non-naturally occurring Casl2 protein of any one of above.
- the reagent kit can comprise other components, for example, a solution or a buffer.
- the kit may further comprise other suitable excipients, such as buffers or reagents, for facilitating the application of the kit.
- the kit may be applied in various applications, such as medical applications including therapy and diagnosis, research, and the like.
- the Casl2 protein and the kit of the present invention may be used in the preparation of a medicament for treatment and/or in the preparation of an agent for research study.
- the disclosure provides an engineered, non-naturally occurring Casl2 polynucleotide encoding the Casl2 protein of any one of above.
- the polynucleotides may be in the form of RNA or DNA, which includes cDNA, genomic DNA, and synthetic DNA.
- a polynucleotide may be double stranded or single stranded, and if single stranded, may be the coding strand or the non-coding (anti-sense) strand.
- a coding polynucleotide may have a coding sequence identical to a coding sequence known in the art or may have a different coding sequence, which, as the result of the redundancy or degeneracy of the genetic code, or by splicing, can encode the same polypeptide.
- the polynucleotide may include not only coding sequences but also regulatory regions such as promoters, enhancers, and termination regions.
- the term further can include all introns and other DNA sequences spliced from the mRNA transcript, along with variants resulting from alternative splice sites.
- These nucleic acid sequences may be a DNA strand sequence that is transcribed into RNA or an RNA sequence that is translated into protein.
- the nucleic acid sequences include both the full-length nucleic acid sequences as well as non-full-length sequences derived from the full-length protein.
- the sequences can also include degenerate codons of the native sequence or sequences that may be introduced to provide codon preference in a specific cell type.
- the polypeptide sequences are referenced throughout the disclosure as is understood by one of ordinary skill in the art.
- the polynucleotide is a ribonucleotide sequence or a deoxyribonucleotide sequence or an analog thereof; preferably the polynucleotide is mRNA, and the polynucleotide further comprises a 5’ cap sequence and a poly-A tail sequence.
- the polynucleotide is codon optimized for expression in a cell of interest. In some embodiments, the polynucleotide is codon optimized for expression in a eukaryotic cell; preferably the polynucleotide has at least 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 111-120.
- the polynucleotide has the sequence selected from SEQ ID NOs: 111-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115 or SEQ ID NOs: 117-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115. In some embodiments, the polynucleotide has a sequence identity to any one of SEQ ID NOs: 111-120.
- the polynucleotide has at least 95% sequence identity to SEQ ID NO: 112 or SEQ ID NO: 117. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 112 or SEQ ID NO: 117. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 112 or SEQ ID NO: 117. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118.
- the polynucleotide has at least 95% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119.
- the polynucleotide has at least 95% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120.
- the polynucleotide has at least 95% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 115 or SEQ ID NO: 120.
- “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 98%” can include 98%, 99% or 100%; and so on.
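The identity thresholds listed above can be made concrete with a small sketch that scores two already-aligned sequences. This is an illustration under stated assumptions, not the disclosure's method: the text does not name an alignment program or an identity convention, so the function below assumes the alignment was produced elsewhere (gaps written as `-`) and uses one common convention, identical columns divided by alignment length. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: matches / alignment columns, where columns
    that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold):
    """True when identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy sequences, not taken from the sequence listing.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(meets_threshold("ACGTACGTAC", "ACGTACGTAA", 90))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages, so the convention should be fixed before comparing a value against a claimed threshold.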
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
- the cell is a mammalian cell, preferably a human cell.
- the polynucleotide has at least 70% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.
- SEQ ID NOs: 36-70 are the polynucleotide sequences encoding SEQ ID NOs: 1-35.
- SEQ ID NOs: 83-87 are the polynucleotide sequences encoding SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152.
- SEQ ID NOs: 83-87 all include NLS sequences and a FLAG sequence.
- the polynucleotide encoding the Casl2 protein will change accordingly to correspond with the promoter sequence, enhancer sequence, and/or termination region sequence comprising the Casl2 protein.
- “ATG” is added at the 5’ end of the polynucleotide to encode an M (methionine) amino acid residue at the N-terminus of the Cas12 protein.
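A minimal sketch of the start-codon step: prepend `ATG` to a DNA coding sequence so that the encoded protein begins with methionine. The helper name `ensure_start_codon` is hypothetical, and skipping the addition when the sequence already starts with `ATG` is an assumption; the text does not say how sequences that already begin with a start codon are handled.

```python
DNA_BASES = set("ACGT")


def ensure_start_codon(cds):
    """Return the coding sequence with ATG at its 5' end.

    Assumption: a sequence that already begins with ATG is left
    unchanged rather than receiving a second start codon.
    """
    seq = cds.strip().upper()
    if not seq or set(seq) - DNA_BASES:
        raise ValueError("expected a non-empty DNA sequence of A/C/G/T")
    return seq if seq.startswith("ATG") else "ATG" + seq


# Toy coding sequences, not taken from the sequence listing.
print(ensure_start_codon("gctaaa"))
print(ensure_start_codon("ATGGCT"))
```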
- the polynucleotide has at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 98% or 99% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.
- the nucleic acids of SEQ ID NOs: 36-70 are the non-human codon optimized sequences.
- the disclosure provides the engineered, non-naturally occurring Casl2 protein as described herein above, or the Casl2 polynucleotide as described herein above for use as nuclease.
- the disclosure provides the engineered, non-naturally occurring Casl2 protein for use in gene editing.
- the disclosure provides the engineered, non-naturally occurring Casl2 protein for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.
- the disclosure provides the engineered, non-naturally occurring Casl2 protein for use as a medicament.
- the disclosure provides the engineered, non-naturally occurring Casl2 protein for use in a method of therapeutic treatment of a patient.
- the disclosure provides an engineered vector comprising the Casl2 polynucleotide of any one of above.
- the invention involves vectors.
- a “vector” is a tool that allows or facilitates the transfer of an entity from one environment to another. It is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment.
- a vector is capable of replication when associated with the proper control elements.
- the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
- Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.
- plasmid refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques.
- Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses (AAVs)).
- Viral vectors also include polynucleotides carried by a virus for transfection into a host cell.
- Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors).
- Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome.
- certain vectors are capable of directing the expression of genes to which they are operatively-linked. Such vectors are referred to herein as “expression vectors”.
- Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.
- Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively-linked to the nucleic acid sequence to be expressed.
- “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
- the vector is an expression vector. In some embodiments, the vector is an inducible, conditional, or constitutive expression vector.
- the disclosure provides a vector system comprising one or more vectors of any one of above.
- one or more vectors comprise a polynucleotide according to any one of above and one or more polynucleotides which are on the same or a different vector encoding a gRNA.
- the disclosure provides an engineered cell comprising the Casl2 polynucleotide of any one of above, or comprising the vector of any one of above, or comprising the vector system of any one of above.
- the cell is expressing the Casl2 protein. In some embodiments, the cell transiently expresses or non-transiently expresses the modified CRISPR-Casl2 protein. In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.
- the disclosure provides a reagent kit comprising the Casl2 protein of any one of above, or comprising the Casl2 polynucleotide of any one of above, or comprising the vector of any one of above, or comprising the vector system of any one of above.
- the disclosure provides a pharmaceutical composition
- a pharmaceutical composition comprising the Cas12 protein of any one of above or the polynucleotide of any one of above or the vector of any one of above or the vector system of any one of above formulated for delivery by AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), Gammaretrovirus, LV (lentivirus), eCIS (extracellular Contractile Injection System), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, lipid nanoparticles (LNPs), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.
- Gammaretrovirus refers to a genus of the retroviridae family.
- exemplary gammaretroviruses include mouse stem cell virus, murine leukemia virus, feline leukemia virus, feline sarcoma virus, and avian reticuloendotheliosis viruses.
- the CRISPR-Cas12 system described below or the pharmaceutical composition described above, or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof, can be delivered by various delivery systems such as vectors, e.g., plasmids and viral delivery vectors, such as adeno-associated viruses (AAV), lentiviruses, adenoviruses, and other viral vectors, or by methods such as nucleofection or electroporation of ribonucleoprotein complexes consisting of Type V-I effectors and their cognate RNA guide or guides.
- the proteins and one or more RNA guides can be packaged into one or more vectors, e.g., plasmids or viral vectors.
- the nucleic acids encoding any of the components of the CRISPR systems described herein can be delivered to the bacteria using a phage.
- exemplary phages include, but are not limited to, T4 phage, Mu, λ phage, T5 phage, T7 phage, T3 phage, Φ29, M13, MS2, Qβ, and ΦX174.
- the vectors, e.g., plasmids or viral vectors, are delivered to the tissue of interest by, e.g., intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration.
- Such delivery may be either via a single dose or multiple doses.
- the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
- the delivery is via adeno-associated viruses (AAV), e.g., AAV2, AAV8, or AAV9, which can be administered in a single dose containing at least 1×10⁵ particles (also referred to as particle units, pu) of adenoviruses or adeno-associated viruses.
- the dose is at least about 1×10⁶ particles, at least about 1×10⁷ particles, at least about 1×10⁸ particles, or at least about 1×10⁹ particles of the adeno-associated viruses.
- the smaller size of the Casl2 proteins described herein enables greater versatility in packaging the effector and RNA guides with the appropriate control sequences (e.g., promoters) required for efficient and cell-type specific expression.
- the delivery is via a recombinant adeno-associated virus (rAAV) vector.
- a modified AAV vector may be used for delivery.
- Modified AAV vectors can be based on one or more of several capsid types, including AAV1, AAV2, AAV5, AAV6, AAV8, and AAV8.2.
- Exemplary AAV vectors and techniques that may be used to produce rAAV particles are known in the art (see, e.g., Aponte-Ubillus et al. (2016) Appl. Microbiol. Biotechnol. 102(3): 1045-54; Zhong et al. (2012) J. Genet. Syndr. Gene Ther. SI: 008; West et al. (1987) Virology 160: 38-47; Tratschin et al. (1985) Mol. Cell. Biol. 5: 3251-110), each of which is incorporated by reference.
- the delivery is via plasmids.
- the dosage can be a sufficient number of plasmids to elicit a response.
- suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg.
- Plasmids will generally include (i) a promoter; (ii) a sequence encoding a nucleic acid-targeting CRISPR enzyme, operably linked to the promoter; (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii).
- the plasmids can also encode the RNA components of a CRISPR-Cas system, but one or more of these may instead be encoded on different vectors.
- the frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or a person skilled in the art.
- the lipid nanoparticles (LNPs) can be made from different materials and can take different forms.
- the LNP may comprise: a cationic lipid at a molar ratio between 35% and 45%, a polyethylene glycol (PEG) conjugated (PEGylated) lipid at a molar ratio between 0.25% and 2.75%, a cholesterol-based lipid at a molar ratio between 20% and 35%, and a helper lipid at a molar ratio between 25% and 35%, wherein all the molar ratios are relative to the total lipid content of the LNP.
- LNPs can be made in different sizes, such as an average diameter of 30-200 nm or 80-150 nm.
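The molar-ratio ranges above lend themselves to a simple consistency check. The sketch below is illustrative only: the dictionary keys and the function name `check_lnp` are hypothetical, the range bounds are treated as inclusive, and the requirement that the four listed lipids sum to 100% assumes the LNP contains no other lipid species, which the text does not state.

```python
# Molar-ratio ranges (percent of total lipid) quoted in the text above.
LNP_RANGES = {
    "cationic_lipid": (35.0, 45.0),
    "peg_lipid": (0.25, 2.75),
    "cholesterol_lipid": (20.0, 35.0),
    "helper_lipid": (25.0, 35.0),
}


def check_lnp(composition, tol=1e-6):
    """Return a list of problems; an empty list means the recipe fits."""
    problems = []
    for name, (low, high) in LNP_RANGES.items():
        value = composition.get(name)
        if value is None:
            problems.append(f"missing {name}")
        elif not low <= value <= high:
            problems.append(f"{name}={value}% outside {low}-{high}%")
    # Assumption: only these four lipid classes are present.
    total = sum(composition.get(name, 0.0) for name in LNP_RANGES)
    if abs(total - 100.0) > tol:
        problems.append(f"molar ratios sum to {total}%, expected 100%")
    return problems


# Hypothetical recipe chosen to fall inside every range.
recipe = {
    "cationic_lipid": 40.0,
    "peg_lipid": 1.5,
    "cholesterol_lipid": 28.5,
    "helper_lipid": 30.0,
}
print(check_lnp(recipe))
```

Note that the ranges constrain each other: with the helper lipid at its 35% maximum and the PEGylated lipid at 2.75%, the cationic and cholesterol-based lipids must together supply the remaining 62.25%.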
- the delivery is via liposomes or lipofection formulations and the like, and can be prepared by methods known to those skilled in the art. Such methods are described, for example, in WO 2016205764 and U.S. Pat. Nos. 5,593,972; 5,589,466; and 5,580,859; each of which is incorporated herein by reference in its entirety.
- the delivery is via nanoparticles or exosomes.
- exosomes have been shown to be particularly useful in the delivery of RNA.
- a cell penetrating peptide (CPP) is linked to the CRISPR enzymes.
- the CRISPR enzymes and/or RNA guides are coupled to one or more CPPs to transport them inside cells effectively (e.g., plant protoplasts).
- the CRISPR enzymes and/or RNA guide(s) are encoded by one or more circular or non-circular DNA molecules that are coupled to one or more CPPs for cell delivery.
- the disclosure provides an engineered, non-naturally occurring CRISPR-Cas system comprising: a) the Cas12 protein of any one of above or the polynucleotide encoding the Cas12 protein; b) at least one engineered guide sequence or one or more engineered nucleic acids encoding the at least one engineered guide sequence, wherein the guide sequence comprises a direct repeat sequence capable of binding the Cas12 protein and a spacer sequence capable of hybridizing to a target nucleotide sequence.
- the engineered Cas12 protein complexes with the guide sequence to form a CRISPR complex, wherein in the CRISPR complex the nucleic acid molecule targets one or more polynucleotide loci.
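As an illustration of a guide sequence built from a direct repeat and a spacer, the sketch below concatenates the two parts as RNA. The 5'-direct repeat-spacer-3' order is an assumption borrowed from the usual Cas12a crRNA layout; this passage does not state the orientation. The function name `build_guide` is hypothetical and the spacer is an arbitrary placeholder, while the direct repeat is the sequence given in the disclosure as SEQ ID NO: 153.

```python
RNA_BASES = set("ACGU")

# Direct repeat given in the disclosure as SEQ ID NO: 153.
DIRECT_REPEAT_153 = "AAUUUCUACUAUUGUAGAU"


def build_guide(direct_repeat, spacer):
    """Join a direct repeat and a spacer into one guide RNA string.

    Assumed layout: 5'-[direct repeat][spacer]-3'. DNA-style input is
    accepted and T is rewritten as U.
    """
    parts = {
        "direct repeat": direct_repeat.strip().upper().replace("T", "U"),
        "spacer": spacer.strip().upper().replace("T", "U"),
    }
    for label, seq in parts.items():
        if not seq or set(seq) - RNA_BASES:
            raise ValueError(f"{label} must be a non-empty A/C/G/U sequence")
    return parts["direct repeat"] + parts["spacer"]


# Arbitrary 20-nt placeholder spacer, not a target from the disclosure.
guide = build_guide(DIRECT_REPEAT_153, "acgtacgtacgtacgtacgt")
print(guide)
```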
- the direct repeat sequence and the spacer sequence are heterologous.
- “Heterologous”, as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively.
- the system comprises at least one guide sequence which is capable of hybridizing at least one target sequence or different regions of one target sequence.
- the guide sequence hybridizes to one or more target sequences in a prokaryotic cell or in a eukaryotic cell.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a nonhuman primate cell, and a human cell.
- the eukaryotic cell comprises a mammalian cell.
- the mammalian cell comprises a human cell.
- the eukaryotic cell comprises a plant cell.
- the target sequence is a DNA. In some embodiments, the target sequence is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
- the direct repeat sequence comprises a stem-loop structure comprising a first stem nucleotide strand which comprises 4-6 nucleotides; a second stem nucleotide strand which comprises 4-6 nucleotides, wherein the first and second stem nucleotide strands can hybridize with each other; and a loop nucleotide strand arranged between the first and second stem nucleotide strands, wherein the loop nucleotide strand comprises 4 or 5 nucleotides.
- the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to any one of SEQ ID NOs: 153-156.
- the direct repeat sequence is selected from SEQ ID NOs: 153-156 for the Cas12 protein comprising an amino acid sequence selected from SEQ ID NOs: 1-4, SEQ ID NOs: 6-7, SEQ ID NOs: 9-11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NOs: 17-28, or a homologue thereof.
- the direct repeat sequence is shown as SEQ ID NO: 153; in some embodiments, the direct repeat sequence is shown as SEQ ID NO: 154; in some embodiments, the direct repeat sequence is shown as SEQ ID NO: 155; in some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156.
- the direct repeat sequence is shown as SEQ ID NO: 153 (5’-AAUUUCUACUAUUGUAGAU-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 1, wherein UAUU is the loop nucleotide (A of FIG.6, FIG.7).
- the direct repeat sequence is shown as SEQ ID NO: 154 (5’-AAUCCGUAACUUUGCAUUUGCAAAA-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 9, wherein AUUU is the loop nucleotide (B of FIG.6).
- the direct repeat sequence is shown as SEQ ID NO: 155 (5’-AAUUUCUACUAUCGUAGAU-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 13, wherein UAUC is the loop nucleotide (C of FIG.6).
- the direct repeat sequence is shown as SEQ ID NO: 156 (5’-AAUUUCUACUGUUGUAGAU-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 15, wherein UGUU is the loop nucleotide (D of FIG.6).
- the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 2. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 6. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 10. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 17.
- the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 19. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 20. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 26. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 154 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 2.
- the direct repeat sequence is shown as SEQ ID NO: 154 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 3. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 154 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 9. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 154 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 20. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 155 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 4.
- the direct repeat sequence is shown as SEQ ID NO: 155 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 10. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 155 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 13. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 155 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 28. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 1.
- the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 15. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 17. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 19.
- the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 20. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 23. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 26. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 28. And so on.
- the direct repeat sequence is shown as SEQ ID NO: 153 (5’-AAUUUCUACUAUUGUAGAU-3’), wherein UAUU is the loop nucleotide.
- the direct repeat sequence set forth in SEQ ID NO: 153 is used by the Cas12 protein comprising an amino acid sequence selected from any one of SEQ ID NO: 5, SEQ ID NO: 8, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, or SEQ ID NOs: 29-35.
- the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein comprises an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein comprises an amino acid sequence having at least 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein comprises an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein comprises an amino acid sequence having at least 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence is set forth in SEQ ID NO: 153. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence is set forth in SEQ ID NO: 156.
- the Cas12 protein is set forth in SEQ ID NO: 1 and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein is set forth in SEQ ID NO: 17 and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein is set forth in SEQ ID NO: 19 and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the Cas12 protein is set forth in SEQ ID NO: 20 and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 26 and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 17 and the direct repeat sequence is set forth in SEQ ID NO: 153.
- the Cas12 protein is set forth in SEQ ID NO: 17 and the direct repeat sequence is set forth in SEQ ID NO: 156. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 19 and the direct repeat sequence is set forth in SEQ ID NO: 153. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 20 and the direct repeat sequence is set forth in SEQ ID NO: 153. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 26 and the direct repeat sequence is set forth in SEQ ID NO: 153.
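The identity thresholds recited in the embodiments above can be illustrated with a minimal sketch. It assumes a simple ungapped, position-by-position comparison of equal-length sequences; the disclosure does not state which alignment method is used to compute identity, and the function name `percent_identity` is hypothetical. The sequences are SEQ ID NOs: 153, 155 and 156 as listed above.

```python
# Direct repeat sequences as listed in this disclosure.
DIRECT_REPEATS = {
    153: "AAUUUCUACUAUUGUAGAU",
    155: "AAUUUCUACUAUCGUAGAU",
    156: "AAUUUCUACUGUUGUAGAU",
}


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences.

    Assumption: no alignment step is performed; real comparisons
    normally rely on a pairwise alignment tool.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    reference = DIRECT_REPEATS[153]
    for seq_id, seq in DIRECT_REPEATS.items():
        print(seq_id, round(percent_identity(reference, seq), 1))
```

Under this simplified measure, SEQ ID NO: 155 and SEQ ID NO: 156 each differ from SEQ ID NO: 153 at one of 19 positions, so each would satisfy an "at least 90%" threshold relative to SEQ ID NO: 153 but not an "at least 95%" threshold.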
- a “stem-loop structure” refers to a nucleic acid having a secondary structure that includes a region of nucleotides that are known or predicted to form a double strand (stem portion) that is linked on one side by a region of predominantly single-stranded nucleotides (loop portion).
- the terms “hairpin” and “fold-back” structures are also used herein to refer to stem-loop structures. Such structures are well known in the art and these terms are used consistently with their known meanings in the art. As is known in the art, a stem-loop structure does not require exact base-pairing.
- the stem may include one or more base mismatches.
- the base-pairing may be exact, i.e., not include any mismatches.
- the predicted stem-loop structures of the direct repeats are illustrated in FIG.6 and FIG.7. In FIG.6 and FIG.7, “N” is merely illustrative and does not represent the actual number of nucleotides.
- the direct repeat sequence in FIG.7 is the same as the direct repeat sequence in A of FIG.6.
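As a rough check of the stem-loop description above, the following sketch counts how many consecutive base pairs flank the stated loop in each direct repeat. It is an illustration under assumptions: only Watson-Crick pairs are counted (G-U wobble pairs and mismatches, which the text allows, are ignored), the loop is located by its first occurrence in the repeat, and the helper name `stem_length` is hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# (direct repeat, loop) as listed in this disclosure, keyed by SEQ ID NO.
REPEATS = {
    153: ("AAUUUCUACUAUUGUAGAU", "UAUU"),
    154: ("AAUCCGUAACUUUGCAUUUGCAAAA", "AUUU"),
    155: ("AAUUUCUACUAUCGUAGAU", "UAUC"),
    156: ("AAUUUCUACUGUUGUAGAU", "UGUU"),
}


def stem_length(repeat: str, loop: str) -> int:
    """Count consecutive Watson-Crick pairs extending outward from the loop."""
    start = repeat.find(loop)
    if start < 0:
        raise ValueError("loop not found in repeat")
    left = start - 1            # last nucleotide of the first stem strand
    right = start + len(loop)   # first nucleotide of the second stem strand
    pairs = 0
    while left >= 0 and right < len(repeat) and (repeat[left], repeat[right]) in PAIRS:
        pairs += 1
        left -= 1
        right += 1
    return pairs


if __name__ == "__main__":
    for seq_id, (repeat, loop) in REPEATS.items():
        print(seq_id, loop, stem_length(repeat, loop))
```

For all four listed repeats this count falls within the 4-6 nucleotide stem range and the 4-nucleotide loop agrees with the "4 or 5 nucleotides" range given for the loop strand.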
- the Cas12 protein has nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, or nucleic acid binding activity.
- the Cas12 protein has endonuclease activity, nickase activity, and/or exonuclease activity.
- the Cas12 protein may be a deactivated or inactivated Cas12 protein (e.g. “dead” Cas12 protein), wherein catalytic activity is partially or (substantially) completely lost, as described herein elsewhere.
- Loss of catalytic activity in this context means that the Cas12 protein is not capable of cleaving DNA (e.g. not capable of inducing double strand breaks, or only capable of inducing single strand breaks, such as a nickase).
- the Cas12 protein may be used to reduce off-target effects, as defined herein elsewhere.
- the Cas12 protein may also be part of a fusion protein, as defined herein elsewhere.
- the Cas12 protein may also be described to include a destabilization domain, as defined herein elsewhere.
- the Cas12 protein may also be a split Cas12 protein, as defined herein elsewhere.
- the Cas12 protein may also be an inducible Cas12 protein, as defined herein elsewhere.
- the Cas12 protein may also be part of a self-inactivating system (SIN), as defined herein elsewhere.
- the Cas12 protein may also be part of a synergistic activator system (SAM) as defined herein elsewhere.
- the Cas12 polypeptide according to the disclosure as described herein is comprised in a fusion protein with a functional domain.
- said functional domain comprises a (transcriptional) activator domain, a (transcriptional) repressor domain, a recombinase, a transposase, a histone remodeler, a DNA methyltransferase, a cryptochrome, a light inducible/controllable domain, or a chemically inducible/controllable domain.
- the Cas12 polypeptide according to the disclosure as described herein is not capable of inducing a DNA double strand break.
- the Cas12 polypeptide according to the disclosure as described herein is a nickase.
- the Cas12 polypeptide according to the disclosure as described herein is a catalytically inactive Cas12 polypeptide.
- the Cas12 polypeptide according to the disclosure as described herein is not capable of inducing a DNA single strand break.
- the Cas12 protein is a dead Cas12 protein that is catalytically inactive.
- the Casl2 protein is a nickase having a catalytically inactive.
- a vector encodes a Cas12 protein that lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence.
- the Cas12 protein is considered to lack all DNA cleavage activity when the DNA cleavage activity of the enzyme is no more than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or less of the DNA cleavage activity.
- the Cas12 protein may be used as a generic DNA binding protein with or without fusion to a functional domain.
- the Cas12 enzyme may be fused to a protein, e.g., a TAG, and/or an inducible/controllable domain such as a chemically inducible/controllable domain.
- the Cas12 in the disclosure may be a chimeric Cas12 protein; e.g., a Cas12 having enhanced function by being a chimera. Chimeric Cas12 proteins may be new Cas proteins containing fragments from more than one naturally occurring Cas.
- the Cas12 protein has enhanced on-target activity without higher off-target cutting, or is used for making super-cutting nickases, or is combined with a mutation that renders the Cas protein dead to yield a super binder.
- the Cas12 enzyme provided in this disclosure can recognize a short motif in the vicinity of a target DNA, called a Protospacer Adjacent Motif (PAM).
- the Cas12 enzyme can recognize the canonical PAM comprising or consisting of 5'-TTTN-3' and the non-canonical sequences, wherein N denotes any nucleotide.
- the canonical PAM may be TTTA, TTTT, TTTG, or TTTC.
- the spacer sequence is between 10 and 40 nucleotides in length, preferably the spacer sequence is between 15 and 30 nucleotides in length, or between 18 and 25 nucleotides in length.
- an mRNA or a DNA encodes the Cas12 protein.
- the polynucleotide encoding the Cas12 protein is operably linked to a promoter, and the promoter is a constitutive promoter, a tissue-specific promoter or an inducible promoter.
- the polynucleotide encoding the Cas12 protein operably linked to a promoter is in a vector.
- the vector is selected from the group consisting of a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.
- the system further comprises a donor template nucleic acid, wherein the donor template nucleic acid is a DNA, an RNA, or a DNA-RNA hybrid.
- the targeting of the target sequence by the Cas12 protein and guide sequence results in a modification of the target sequence.
- the modification of the target sequence is a cleavage event or a nicking event.
- the target sequence is 3’ of a Protospacer Adjacent Motif (PAM), wherein: the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 70% sequence identity; the PAM sequence is TYYN (Y is T or C; N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 70% sequence identity; or the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 70% sequence identity.
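The PAM definitions above use degenerate nucleotide codes (R, Y, D, N). The sketch below shows one way such motifs could be expanded and used to list candidate target sequences located immediately 3’ of a PAM. It is illustrative only: the function names, the default 20-nucleotide protospacer length (chosen from within the 18-25 range mentioned above), and the single-strand scan are assumptions, not part of the disclosure.

```python
import re

# Degenerate codes used in the PAM definitions above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "D": "[AGT]", "N": "[ACGT]",
}


def pam_regex(pam: str) -> str:
    """Expand a degenerate PAM such as 'TTTR' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_protospacers(dna: str, pam: str, spacer_len: int = 20):
    """Return (pam_start, pam_found, protospacer) tuples for one strand.

    The protospacer is taken immediately 3' of each PAM occurrence;
    overlapping PAM matches are reported via a lookahead.
    """
    sequence = dna.upper()
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    hits = []
    for match in pattern.finditer(sequence):
        start = match.start()
        target_start = start + len(pam)
        protospacer = sequence[target_start:target_start + spacer_len]
        if len(protospacer) == spacer_len:  # skip PAMs too close to the end
            hits.append((start, match.group(1), protospacer))
    return hits


if __name__ == "__main__":
    # Hypothetical DNA fragment, used only to exercise the functions.
    fragment = "CCTTTA" + "ACGTACGTACGTACGTACGT" + "CC"
    for pam in ("TTTR", "TYYN", "TTTD"):
        print(pam, find_protospacers(fragment, pam))
```

A complete search would also scan the reverse complement strand; that step is omitted here for brevity.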
- the disclosure provides a delivery system, wherein the system of any one of the above is present in a delivery vehicle selected from the group consisting of AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), gammaretrovirus, LV (lentivirus), eCIS (extracellular contractile injection system), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, lipid nanoparticles (LNPs), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.
- the disclosure provides an engineered cell comprising the system of any one of above.
- the cell is a eukaryotic cell or a prokaryotic cell.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
- the cell is a mammalian cell or a human cell or a plant cell.
- the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of the above, or the delivery system of the above, for use in a method of therapy, treatment, prevention, diagnosis, or detection of disease.
- the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of the above, the delivery system of the above, or the cell of any one of the above for use as a medicament.
- the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of the above, the delivery system of the above, or the cell of any one of the above for use in a method of therapeutic treatment of a patient.
- the disclosure provides a method of modifying or targeting a target DNA locus, the method comprising delivering to said locus a CRISPR-Cas system of any one of the above or a delivery system of the above.
- said modifying or targeting a target locus comprises inducing a DNA strand break. In some embodiments, said modifying or targeting a target locus comprises inducing a DNA double strand break or a DNA single strand break. In some embodiments, said modifying or targeting a target locus comprises altering gene expression of one or more genes. In some embodiments, said modifying or targeting a target locus comprises epigenetic modification of said target DNA locus. In some embodiments, the method is a method of modifying a cell, a cell line, or an organism by manipulation of one or more target sequences at genomic loci of interest.
- the cell is a eukaryotic cell or a prokaryotic cell.
- the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.
- the cell is a mammalian cell or a human cell or a plant cell.
- the method is in vitro or in vivo.
- the disclosure provides a method of targeting and cleaving a double-stranded target DNA, the method comprising: contacting the double-stranded target DNA with a system of any one of the above.
- cleaving the target DNA or target sequence results in the formation of an indel or the insertion of a nucleotide sequence. In some embodiments, cleaving the target DNA or target sequence comprises cleaving the target DNA or target sequence at two sites, and results in the deletion or inversion of a sequence between the two sites.
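The two-site outcome described above (deletion or inversion of the sequence between the two cleavage sites) can be sketched on a plain DNA string. This is a simplified illustration: cut positions are treated as blunt 0-based indices, whereas actual cleavage and repair are more complex, and the function names and example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def delete_between(seq: str, cut1: int, cut2: int) -> str:
    """Remove the segment between two cut positions."""
    a, b = sorted((cut1, cut2))
    return seq[:a] + seq[b:]


def invert_between(seq: str, cut1: int, cut2: int) -> str:
    """Re-insert the segment between two cut positions in inverted orientation."""
    a, b = sorted((cut1, cut2))
    return seq[:a] + reverse_complement(seq[a:b]) + seq[b:]


if __name__ == "__main__":
    locus = "AAAACCGTAGGGG"  # hypothetical locus
    print(delete_between(locus, 4, 9))
    print(invert_between(locus, 4, 9))
```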
- the disclosure provides an isolated eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to a method or via use of a composition or via use of a system of any one of the preceding contents.
- the cleavage efficiency of the Cas12 protein on double-stranded DNA (dsDNA) is verified.
- the cleavage ratio is 2%-100%.
- in an in vitro cleavage efficiency assay, the cleavage ratio is less than 10%.
- in an in vitro cleavage efficiency assay, the cleavage ratio is less than 5%.
- in an in vitro cleavage efficiency assay, the cleavage ratio is less than 15%.
- in an in vitro cleavage efficiency assay, the cleavage ratio can be less than 20%.
- in an in vitro cleavage efficiency assay, the cleavage ratio is more than 30%. In some embodiments, in an in vitro cleavage efficiency assay, the cleavage ratio is more than 40%. In some embodiments, in an in vitro cleavage efficiency assay, the cleavage ratio is more than 50%. In some embodiments, in an in vitro cleavage efficiency assay, the cleavage ratio is more than 60%. In some embodiments, in an in vitro cleavage efficiency assay, the cleavage ratio is more than 70%. In some embodiments, in an in vitro cleavage efficiency assay, the cleavage ratio is more than 80%.
- the cleavage ratio is more than 90%.
- the cleavage ratio is 50%-100%.
- the cleavage ratio is 60%-100%.
- the cleavage ratio is 70%-90%.
- the cleavage ratio is 80%-90%.
- the cleavage ratio is 80%-95%.
- the cleavage ratio is 85%-95%.
- the cleavage ratio is 85%-98%.
- the cleavage ratio is 60%-90%.
- the cleavage ratio can be 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, 18%, 20%, 25%, 30%, 35%, 40%, 50%, 55%, 58%, 60%, 65%, 70%, 72%, 73%, 75%, 78%, 80%, 82%, 85%, 87%, 88%, 90%, 92%, 95%, 97%, 98%, 99%, 100% and so on.
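The text above does not state how the cleavage ratio is computed. A common convention for gel-based in vitro assays, assumed here purely for illustration, is the cleaved fraction of total band intensity. The function name and the intensity values are hypothetical.

```python
from typing import Sequence


def cleavage_ratio(uncleaved: float, cleaved_fragments: Sequence[float]) -> float:
    """Percent cleaved, assuming ratio = cleaved / (cleaved + uncleaved).

    Inputs are band intensities in arbitrary units; this definition is an
    assumption and is not taken from the disclosure.
    """
    cleaved = sum(cleaved_fragments)
    total = uncleaved + cleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cleaved / total


if __name__ == "__main__":
    # Hypothetical intensities: one substrate band and two product bands.
    print(cleavage_ratio(20.0, [50.0, 30.0]))
```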
- the test of the genome cleavage activity in mammalian cells shows that the gene editing efficiency is 50%-95%.
- the gene editing efficiency can be 50%, 55%, 58%, 60%, 65%, 67%, 70%, 72%, 73%, 75%, 78%, 80%, 82%, 85%, 87%, 88%, 90%, 92%, 95% and so on.
- the Cas12 protein also shows lower off-target activity, and no off-target events are detected for some Cas12 proteins.
- a Cas12 protein system is engineered to provide and take advantage of collateral non-specific cleavage of nucleic acids, such as ssDNA.
- a Cas12 protein system is engineered to provide and take advantage of collateral non-specific cleavage of ssDNA. Accordingly, engineered Cas12 protein systems provide platforms for nucleic acid detection, transcriptome manipulation, and inducing cell death. Cas12 protein is developed for use as a mammalian transcript knockdown and binding tool. Cas12 protein is capable of robust collateral cleavage of RNA and ssDNA when activated by sequence-specific targeted DNA binding.
- Cas12 protein is provided or expressed in an in vitro system or in a cell, transiently or stably, and targeted or triggered to non-specifically cleave cellular nucleic acids.
- Cas12 protein is engineered to knock down ssDNA, for example viral ssDNA.
- Cas12 protein is engineered to knock down RNA. The system can be devised such that the knockdown is dependent on a target DNA present in the cell or in vitro system, or triggered by the addition of a target nucleic acid to the system or cell.
- the Cas12 protein system is engineered to non-specifically cleave RNA in a subset of cells distinguishable by the presence of an aberrant DNA sequence, for instance where cleavage of the aberrant DNA might be incomplete or ineffectual.
- SHERLOCK highly sensitive and specific nucleic acid detection platform
- engineered Cas12 protein systems are optimized for DNA or RNA endonuclease activity and can be expressed in mammalian cells and targeted to effectively knock down reporter molecules or transcripts in cells.
- the collateral effect of engineered Cas12 protein with isothermal amplification provides a CRISPR-based diagnostic providing rapid DNA or RNA detection with high sensitivity and single-base mismatch specificity.
- the Cas12 protein-based molecular detection platform is used to detect specific strains of virus, distinguish pathogenic bacteria, genotype human DNA, and identify cell-free tumor DNA mutations.
- reaction reagents can be lyophilized for cold-chain independence and long-term storage, and readily reconstituted on paper for field applications.
- the ability to rapidly detect nucleic acids with high sensitivity and single-base specificity on a portable platform may aid in disease diagnosis and monitoring, epidemiology, and general laboratory tasks. Although methods exist for detecting nucleic acids, they have trade-offs among sensitivity, specificity, simplicity, cost, and speed.
- the disclosure provides a system for detecting the presence of a nucleic acid target sequence in an in vitro sample, comprising: a Cas12 protein of any one of the above; at least one guide polynucleotide comprising a guide sequence capable of binding the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence; wherein the Cas12 protein, when activated by the target sequence, exhibits collateral cleavage activity on RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct.
- the system further comprises nucleic acid amplification reagents to amplify the trigger sequence.
- the amplification reagents are isothermal amplification reagents.
- the amplification reagents are reagents for nucleic acid sequence-based amplification (NASBA), recombinase polymerase amplification (RPA), loop-mediated isothermal amplification (LAMP), strand displacement amplification (SDA), helicase-dependent amplification (HDA), or nicking enzyme amplification reaction (NEAR).
- the target sequence is a target RNA sequence and the system further comprises a DNA polymerase and a primer designed to bind the target RNA sequence and further comprises a DNA polymerase promoter.
- the disclosure provides a method for detecting target nucleic acids in samples comprising: contacting one or more samples with a Cas12 protein of any one of the above; at least one guide polynucleotide comprising a guide sequence designed to have a degree of complementarity with the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence, wherein the Cas12 protein, when activated by the target sequences, exhibits collateral cleavage activity on RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct; and detecting a signal from cleavage of the non-target sequence, thereby detecting the one or more target nucleic acid sequences in the sample.
- the disclosure provides an engineered, non-naturally occurring sgRNA, wherein the sgRNA comprises, in a tandem arrangement:
- a direct repeat sequence; and a spacer sequence which is capable of hybridizing to a sequence of the target nucleic acid to be manipulated; wherein the direct repeat sequence has at least 90% sequence identity to any one of SEQ ID NOs: 153-156, and the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182.
- tandem arrangement of the direct repeat sequence and spacer sequence is in a 5’ to 3’ orientation.
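The tandem arrangement described above (direct repeat followed by spacer, 5’ to 3’) can be sketched as a simple string assembly with a spacer length check. The 10-40 nucleotide bound follows the spacer length range stated earlier in this section; the function name `build_guide` and the example spacer are hypothetical and are not taken from SEQ ID NOs: 157-182.

```python
DIRECT_REPEAT_153 = "AAUUUCUACUAUUGUAGAU"  # SEQ ID NO: 153 as listed above


def build_guide(direct_repeat: str, spacer: str,
                min_len: int = 10, max_len: int = 40) -> str:
    """Assemble a guide RNA as direct repeat + spacer in 5'-to-3' order.

    The spacer may be given as DNA or RNA; T is converted to U.
    """
    rna_spacer = spacer.upper().replace("T", "U")
    if not min_len <= len(rna_spacer) <= max_len:
        raise ValueError("spacer length outside the allowed range")
    if set(rna_spacer) - set("ACGU"):
        raise ValueError("spacer contains characters other than A, C, G, U/T")
    return direct_repeat + rna_spacer


if __name__ == "__main__":
    # Hypothetical 20-nt spacer used only for illustration.
    print(build_guide(DIRECT_REPEAT_153, "ACGTACGTACGTACGTACGT"))
```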
- the direct repeat sequence has at least 90% or 95% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- the spacer sequence has at least 95% sequence identity to any one of SEQ ID NOs: 157-181.
- the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.
- the sgRNA interacts with the Cas12 protein, such as GEBx0047, GEBx0063, GEBx0064, or GEBx0070.
- the disclosure provides a DNA polynucleotide molecule encoding the sgRNA of above.
- the disclosure provides a DNA expression vector comprising the DNA polynucleotide molecule of above.
- the vector further comprises one or more regulatory element(s) operably linked to sequences encoding the sgRNA.
- at least one regulatory element is capable of directing expression of the sgRNA within the cell.
- the disclosure provides a delivery vector carrying one or more sgRNA of any one of above.
- the disclosure provides an engineered, non-naturally occurring direct repeat sequence, wherein the direct repeat sequence comprises a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 153-156, or a variant thereof. In some embodiments, the direct repeat sequence has at least 95% or 98% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156.
- the disclosure provides an engineered, non-naturally occurring spacer sequence, wherein the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.
- the disclosure provides a DNA polynucleotide molecule encoding the spacer sequence of above.
- the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.
- the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use as a medicament.
- the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a method of therapeutic treatment of a patient.
- the method further comprises contacting the one or more samples with reagents for amplifying one or more target sequences.
- the amplification reagents are isothermal amplification reagents.
- the amplification reagents are reagents for nucleic acid sequence-based amplification (NASBA), recombinase polymerase amplification (RPA), loop-mediated isothermal amplification (LAMP), strand displacement amplification (SDA), helicase-dependent amplification (HDA), or nicking enzyme amplification reaction (NEAR).
- the target sequence is a target RNA sequence and the system further comprises a DNA polymerase and a primer designed to bind the target RNA sequence and further comprises a DNA polymerase promoter.
- the masking construct suppresses generation of a detectable positive signal until cleaved or deactivated, or masks a detectable positive signal, or generates a detectable negative signal until the masking construct is deactivated or cleaved.
- the masking construct comprises: a. a silencing RNA that suppresses generation of a gene product encoded by a reporting construct, wherein the gene product generates the detectable positive signal when expressed; b. a ribozyme that generates the negative detectable signal, and wherein the positive detectable signal is generated when the ribozyme is deactivated; c. a ribozyme that converts a substrate to a first color and wherein the substrate converts to a second color when the ribozyme is deactivated; d. an aptamer and/or a polynucleotide-tethered inhibitor; e. a polynucleotide to which a detectable ligand and a masking component are attached; f. a nanoparticle held in aggregate by bridge molecules wherein at least a portion of the bridge molecules comprises a polynucleotide, and wherein the solution undergoes a color shift when the nanoparticle is dispersed in solution; g. a quantum dot or fluorophore linked to one or more quencher molecules by a linking molecule, wherein at least a portion of the linking molecule comprises a polynucleotide; h. a polynucleotide in complex with an intercalating agent, wherein the intercalating agent changes absorbance upon cleavage of the polynucleotide; or i. two fluorophores tethered by a polynucleotide that undergo a shift in fluorescence when released from the polynucleotide.
- EXAMPLE 1 A method of metagenomic analysis for the proteins
- Metagenomic sequence data from public databases are searched using Hidden Markov Models generated from known Cas protein sequences, including class II type V Cas effector proteins.
- CRISPR-Cas proteins identified by the search are aligned to known proteins to identify potential active sites. From hundreds of candidate sequences, this metagenomic workflow results in the delineation of the Cas12 proteins described above and shown in FIG.1A-FIG.1D.
- The phylogenetic tree was constructed by IQTREE (FIG.1A) to visualize the relatedness of the orthologs at the primary amino-acid level using 103 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- The branches of the tree corresponding to the Cas12 proteins provided by this disclosure are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- Structure models of the Cas12 proteins were built using the crystal structures of AsCpf1/LbCpf1/FnCpf1 (PDB ID: 5XH7/5XUZ/6I1K) as templates on the SWISS-MODEL web server with its default parameters.
- The constructed models were used for domain determination, and the results are shown in FIG.2A and FIG.2B.
- The Cas12 proteins all contain the REC1, RuvC, and NUC-terminal ends.
- The phylogenetic tree was constructed by IQTREE (FIG.1C) to visualize the relatedness of the orthologs at the primary amino-acid level using 73 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- The branches of the tree corresponding to GEBx0019, GEBx0022 and GEBx0033 in this study are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- The tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. Each has its own branch, suggesting that they are evolutionarily distinct.
- Structure modeling of GEBx0019, GEBx0022 and GEBx0033 was performed with SWISS-MODEL, developed by the Computational Structural Biology Group at the SIB Swiss Institute of Bioinformatics and the Biozentrum of the University of Basel; the template used for all three proteins was AsCpf1 (PDB ID: 5XH7).
- The modeled structures are shown in FIG.4.
- The phylogenetic tree was constructed by IQTREE (FIG.1B) to visualize the relatedness of the orthologs at the primary amino-acid level using 88 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- The branches of the tree corresponding to GEBx0030 and GEBx0039 in this study are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- GEBx0030 and GEBx0039 are representatives of unique Cas12 clusters. Each has its own branch, suggesting that they are evolutionarily distinct.
- The engineered GEBx0030 and GEBx0039 were compared with the reference nucleases (AsCas12a, LbCas12a, and FnCas12a), as shown in FIG.5A. FIG.5A shows that GEBx0030 and GEBx0039 share low identity with the three reference sequences (below 30% for GEBx0030 and 25% for GEBx0039). In addition, GEBx0030 (1222 aa) and GEBx0039 (1171 aa) are much smaller than a typical Cas12a. These features suggest that GEBx0030 and GEBx0039 are independent of the existing Cas12a family.
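- The identity comparison against the reference nucleases can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by an external tool, and it assumes identity is counted only over columns where neither sequence has a gap; the disclosure does not state which convention was used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns containing a gap ('-') in either row are skipped,
    so the denominator is the number of residue-to-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment rows (hypothetical, not taken from the sequence listing)
toy_query = "MKT-LLVAG"
toy_reference = "MRTALL-AG"
toy_identity = percent_identity(toy_query, toy_reference)
```

- Other denominators (alignment length, or the shorter sequence length) give different percentages, so a threshold such as "below 30%" should be read together with the convention used.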
- FIG.2C shows that GEBx0030 and GEBx0039 contain the conserved RuvC domain, like other type V Cas proteins.
- The phylogenetic tree was constructed by IQTREE (FIG.1D) to visualize the relatedness of the orthologs at the primary amino-acid level using 120 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents.
- The branches of the tree corresponding to the Cas12 proteins provided by this disclosure are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star.
- The engineered GEBx0074-GEBx0080 were compared with the reference nucleases (AsCpf1, LbCpf1, and FnCpf1), as shown in FIG.5B. FIG.5B shows that GEBx0074-GEBx0080 share low identity with the three reference sequences (below 50%). These features suggest that GEBx0074-GEBx0080 are independent of the existing Cas12a family.
- The domain arrangement of GEBx0074-GEBx0080 is shown in FIG.2D.
- The amino acid sequences of the Cas12 proteins are shown in SEQ ID NOs: 1-35.
- Detailed information is shown in Table 2.
- RNA folding of the active single crRNA sequence located at the CRISPR array of the Cas12 proteins is computed using the RNAfold webserver developed by Lorenz et al., 2011. The folded crRNAs are shown in FIG.6 and FIG.7.
- N represents the target-specific sequence; the number of N shown is illustrative only and does not represent the actual number of nucleotides.
- The complete amino acid sequences of the Cas12 proteins are shown in SEQ ID NOs: 74-78.
- The amino acid sequences of the nuclear localization signals (NLSs) are shown in SEQ ID NOs: 79-80.
- The FLAG-tag sequences are shown in SEQ ID NOs: 81-82.
- The complete amino acid sequences of the other Cas12 proteins with NLSs and FLAG sequence are constructed in the same way as those of GEBx0013, GEBx0018, GEBx0032, GEBx0033, and GEBx0037.
- DNA fragments encoding the Cas12 proteins are synthesized by GenScript and assembled by Gibson assembly into the pEASY-Blunt E2 expression plasmid (shown in FIG.8).
- Some of the DNA fragments encoding the Cas12 proteins are shown in Table 6; the others can be designed by reference to it.
- The nucleotide sequences of the Cas12 proteins are synthesized commercially (e.g., by Ruibiotech).
- Cas12 proteins are expressed as FLAG-tagged fusion proteins from an inducible T7 promoter (pEASY-Blunt E2 expression plasmid) in a protease-deficient E. coli B strain.
- Cells expressing the FLAG-tagged proteins are lysed by sonication.
- The supernatant was loaded on a Ni2+-charged HisTrap HP column (GE Healthcare) and eluted with a linear gradient of increasing imidazole concentration (from 0 to 500 mM) in a buffer of 20 mM Tris-HCl (pH 7.5 at 25°C) and 0.5 M NaCl on an AKTA Pure25 FPLC (Inscinstech).
- The eluate was resolved by SDS-PAGE on BeyoGel Plus PAGE (Beyotime) and stained with Feto SDS-PAGE staining buffer (H&Z lifescience). Purity was determined by densitometry of the protein band with ImageLab software (Bio-Rad). Purified endonucleases were dialyzed into a storage buffer composed of 20 mM CH3COONa, 500 mM NaCl, 0.1 mM EDTA, 0.1 mM TCEP, 50% glycerol, pH 6.0, and stored at -80°C.
- The result of SDS-PAGE gel electrophoresis of GEBx0032 is shown in FIG.9A.
- The purity of GEBx0032 protein reached ~50% after Ni particle affinity chromatography.
- The optimal elution condition was 200 mM imidazole (40% NiB buffer).
- The purity of GEBx0033 protein can reach 70% after Ni particle affinity chromatography.
- The optimal elution condition was 250 mM imidazole (50% NiB buffer).
- The result of SDS-PAGE gel electrophoresis of GEBx0037 is shown in FIG.9C.
- The purity of GEBx0037 protein reached ~50% after Ni particle affinity chromatography.
- The optimal elution condition was 250 mM imidazole (50% NiB buffer).
- The results of SDS-PAGE gel electrophoresis of GEBx0013 and GEBx0018 are shown in FIG.9D and FIG.9E.
- The purity of GEBx0013 protein can reach 90% after Ni particle affinity chromatography.
- The optimal elution condition was 150 mM imidazole (30% NiB buffer).
- The purity of GEBx0018 protein can reach 45% after Ni particle affinity chromatography.
- The optimal elution condition was 100 mM imidazole (20% NiB buffer).
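- The pairing of imidazole concentration with NiB percentage in the elution conditions above is consistent with a two-buffer linear mix. The sketch below assumes the low-imidazole buffer contains 0 mM and NiB contains 500 mM imidazole, matching the stated 0 to 500 mM gradient; the buffer compositions and function names are assumptions, not taken from the disclosure.

```python
NIA_IMIDAZOLE_MM = 0.0    # assumed imidazole in the low-imidazole buffer
NIB_IMIDAZOLE_MM = 500.0  # assumed imidazole in NiB (top of the stated gradient)


def percent_nib(target_mm: float,
                nia_mm: float = NIA_IMIDAZOLE_MM,
                nib_mm: float = NIB_IMIDAZOLE_MM) -> float:
    """Percentage of NiB needed to reach a target imidazole concentration."""
    if not nia_mm <= target_mm <= nib_mm:
        raise ValueError("target concentration is outside the gradient range")
    return 100.0 * (target_mm - nia_mm) / (nib_mm - nia_mm)


def imidazole_mm(percent_b: float,
                 nia_mm: float = NIA_IMIDAZOLE_MM,
                 nib_mm: float = NIB_IMIDAZOLE_MM) -> float:
    """Imidazole concentration (mM) obtained at a given NiB percentage."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    return nia_mm + (nib_mm - nia_mm) * percent_b / 100.0


# Elution conditions quoted in the text, as (imidazole mM, % NiB) pairs
reported_conditions = [(200.0, 40.0), (250.0, 50.0), (150.0, 30.0), (100.0, 20.0)]
```

- Under these assumptions each reported pair is reproduced, for example 200 mM corresponds to 40% NiB.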
- EXAMPLE 4 PAM Sequence identification/confirmation for the endonucleases described herein.
- A commercially available cell-free TXTL system developed from an all-Escherichia coli (E. coli) lysate (myTXTL, Arbor Biosciences) is used to rapidly express putative endonucleases from a plasmid (pEASY-Blunt E2) and targeting or non-targeting gRNAs.
- PAM sequences are determined by sequencing plasmids containing randomly generated potential PAM sequences that could be cleaved by the nucleases.
- An E. coli codon-optimized nucleotide sequence encoding the nuclease is transcribed and translated in vitro from a PCR fragment under the control of a T7 promoter.
- A synthetic crRNA encoding the repeat-spacer sequence is added to the system.
- Successful expression of the endonuclease in the TXTL system, followed by complexing with crRNA, provides active in vitro CRISPR nuclease complexes.
- A library of target DNA fragments containing a protospacer sequence preceded by 8N mixed bases (potential PAM sequence) is incubated with the output of the TXTL reaction. After 1 hour of incubation, the reaction is stopped, and the DNA is recovered using a DNA clean-up kit.
- Adaptor sequences are blunt-end ligated to DNA with active PAM sequences that has been cleaved by the endonuclease, whereas DNA that has not been cleaved is inaccessible for ligation.
- DNA segments comprising active PAM sequences are then amplified by PCR with primers specific to the library and the adaptor sequence. The PCR amplification products are resolved on a gel to identify amplicons that map to cleavage events.
- The amplified segments of the cleavage reaction are also used as a template for preparation of an NGS library or as a substrate for Sanger sequencing. Sequencing the resulting library, which is a subset of the starting 8N library, reveals sequences with PAM activity compatible with the CRISPR complex.
- The PAM sequences are collected into seqLogo (see e.g., Huber et al. Nat Methods. 2015 Feb; 12(2): 115-21) representations.
- The seqLogo shows the 8 bp upstream of the spacer, labelled as positions 0-7.
- For PAM testing with a processed RNA construct, the same procedure is repeated except that an in vitro transcribed RNA is added along with the plasmid library and the minimal CRISPR array template is omitted.
- A library of target DNA fragments containing a protospacer sequence preceded by 8N mixed bases was synthesized for the in vitro PAM depletion assay.
- A 10 µL mixture containing 0.417 µM PAM library dsDNA (5’-TACACGACGCTCTTCCGATNNNNNNNNgagaagtcattcaataaggccactAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3’, SEQ ID NO: 88), 4.17 µM purified Cas12 protein, 5 µM corresponding guide RNA harboring the 5’-gagaagUcaUUcaaUaggccac-3’ (SEQ ID NO: 89) spacer and 1 µL NEBuffer™ 2.1 was incubated at 37°C for 2 hours.
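- As an illustration of how the 8N region of this library could be tallied from sequencing reads, the following minimal sketch locates the constant flanks of the library oligonucleotide and counts the 8 nt between them. The flank sequences are taken from the library sequence above; the function names, the exact-match strategy, and the example read are assumptions for illustration and are not the analysis pipeline of the disclosure.

```python
import re
from collections import Counter

# Constant regions flanking the 8N stretch, copied from the library oligo.
UPSTREAM = "TACACGACGCTCTTCCGAT"
PROTOSPACER = "GAGAAGTCATTCAATAAGGCCACT"
PAM_LENGTH = 8

# Exact-match pattern: upstream flank, 8 unambiguous bases, protospacer.
PAM_PATTERN = re.compile(
    UPSTREAM + "([ACGT]{%d})" % PAM_LENGTH + PROTOSPACER
)

# Number of distinct sequences an 8N stretch can encode.
LIBRARY_COMPLEXITY = 4 ** PAM_LENGTH


def extract_pam(read):
    """Return the 8 nt candidate PAM from a read, or None if flanks are absent."""
    match = PAM_PATTERN.search(read.upper())
    return match.group(1) if match else None


def count_pams(reads):
    """Count candidate PAM occurrences across an iterable of reads."""
    counts = Counter()
    for read in reads:
        pam = extract_pam(read)
        if pam is not None:
            counts[pam] += 1
    return counts


# Hypothetical read carrying the candidate PAM "TTTCACGT"
example_read = "AA" + UPSTREAM + "TTTCACGT" + PROTOSPACER.lower() + "AGATCGG"
```

- In a depletion-style readout, counts from a nuclease-treated sample would be compared with counts from an untreated control; that comparison, and any tolerance for sequencing errors in the flanks, is left out of this sketch.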
- GEBx0033 shows a preference for the 5’-TTTV-3’ PAM, wherein V is A, C or G.
- Target DNAs containing the protospacer sequence (5’-agaagtcattcaataaggccac-3’, SEQ ID NO: 195) and the PAM sequence (5’-TTTV-3’, V is A, C or G) are constructed by DNA synthesis.
- A single representative PAM is chosen for testing when the PAM has degenerate bases.
- The target DNAs comprise 515 bp of linear DNA derived from a plasmid via PCR amplification with a PAM and protospacer located 700 bp from one end. Successful cleavage results in fragments of ~200 and ~300 bp.
- The target DNA, in vitro transcribed single RNA, and purified recombinant protein are combined in a cleavage buffer (NEBuffer 2.1) with an excess of protein and RNA and are incubated for 5 minutes to 3 hours, usually 1 hour.
- the reaction is stopped via addition of RNase A and incubation at 60 minutes.
- the reaction is then resolved on a 2% TAE agarose gel and the fraction of cleaved target DNA is quantified in ImageLab software.
- the cleavage efficiency is represented by cutting ratio.
- The cutting ratio is calculated by gray value analysis using the following formula:
- $$\text{cutting ratio (\%)} = 100 \times \left(1 - \sqrt{1 - \frac{b + c}{a + b + c}}\right)$$ where “a” represents the gray value of the uncut band, and “b” and “c” represent the gray values of the two shorter fragments produced by cleavage.
- cutting ratio can be also called cleavage ratio.
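- The cutting ratio formula above can be evaluated directly from the three band gray values. The following is a minimal sketch; the function name and the example gray values are hypothetical.

```python
import math


def cutting_ratio(a: float, b: float, c: float) -> float:
    """Cutting ratio (%) from band gray values.

    a: gray value of the uncut band
    b, c: gray values of the two cleavage-product bands
    Implements 100 * (1 - sqrt(1 - (b + c) / (a + b + c))).
    """
    if min(a, b, c) < 0:
        raise ValueError("gray values must be non-negative")
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one band must have a positive gray value")
    cleaved_fraction = (b + c) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - cleaved_fraction))


# Hypothetical lane: 75 units uncut, 15 and 10 units in the two product bands
example_ratio = cutting_ratio(75.0, 15.0, 10.0)
```

- With no product bands the function returns 0, and with no uncut band it returns 100, which bounds the reported percentages.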
- the crRNA is shown as SEQ ID NO: 90; for GEBx0032, the crRNA is shown as SEQ ID NO: 91; for GEBx0037, the crRNA is shown as SEQ ID NO: 92.
- For dsDNA cleavage, 0.417 pmol substrate (515 bp DNA) was incubated with 4.17 pmol Cas12 protein and 5 pmol crRNA in 1x reaction buffer (Novoprotein) at 37°C for 120 min.
- The reaction volume for dsDNA cleavage was 20 µL.
- The reaction was then quenched with 1 µL Proteinase K (20 mg/ml) at 50°C for 10 min.
- The cleavage products were mixed with 6x gel loading dye (NEB) and analyzed by 2% agarose gel electrophoresis, and the fraction of cleaved target DNA was quantified in ImageLab software.
- the PAM sequence was TTTC.
- The nucleotide sequence of the 515 bp DNA (dsDNA) is shown as SEQ ID NO: 93.
- The result of electrophoresis is shown in FIG.11A-FIG.11C.
- GEBx0033 exhibited a high cleavage efficiency of over 85%.
- GEBx0032 and GEBx0037 exhibited cleavage activity on the dsDNA, and the cleavage efficiency of GEBx0032 is over 85%.
- GEBx0013 and GEBx0018 exhibited cleavage activity on the dsDNA, with cleavage efficiencies of 87% and 67%, respectively.
- GFP reporter plasmids containing a target DNA sequence with spacer sequences and potential PAM sequences (determined e.g., as in Example 4) are constructed by DNA synthesis and cloning. A single representative PAM is chosen for testing when the PAM has degenerate bases. The target site is located in the EF1a promoter region, which drives GFP expression.
- GFP fluorescence is measured with an Infinite M200 Plate Reader (Tecan) using excitation and emission wavelengths of 488 nm and 533 nm, respectively. The reactions are incubated for 16 hours at 29°C and the resulting fluorescence data are analyzed using end-point and time-course analyses. The reported production of GFP is calculated using a linear standard calibration curve developed from recombinant GFP. For the plate reader used for our experiments, the raw fluorescence values were divided by the conversion factor 9212.61/pmol.
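- The conversion from raw fluorescence to GFP amount described above is a single division by the stated calibration factor. A minimal sketch follows; the function name and the example readings are hypothetical, and the factor applies only to the plate reader and calibration described in the text.

```python
# Calibration factor quoted in the text: raw fluorescence units per pmol GFP.
FLUORESCENCE_PER_PMOL = 9212.61


def fluorescence_to_pmol(raw_fluorescence: float) -> float:
    """Convert a raw plate-reader value to pmol GFP using the linear factor."""
    if raw_fluorescence < 0:
        raise ValueError("raw fluorescence must be non-negative")
    return raw_fluorescence / FLUORESCENCE_PER_PMOL


# Hypothetical time course of raw readings converted to pmol GFP
raw_readings = [0.0, 4606.305, 9212.61]
gfp_pmol = [fluorescence_to_pmol(value) for value in raw_readings]
```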
- EXAMPLE 7 PAM determination in mammalian cell line
- HEK293T cells were cultured in DMEM media supplemented with 10% fetal bovine serum (Gibco™).
- A volume of 450 µL of cells at a density of 100,000 cells/well was mixed with a 50 µL mixture containing Lipofectamine™ 3000 (ThermoFisher Scientific, Cat.
- The basic method of GUIDE-seq library preparation is described by Nikolay et al. (Nat. Protoc. 2021).
- The extracted DNA samples were first sheared using the KAPA Frag Kit (Cat# KK8602, Roche). Fragmented DNA was purified and then phosphorylated using T4 Polynucleotide Kinase (Cat#M0201S, NEB).
- An SS5 adapter (generated by annealing 10 µM SS5TOP oligo with 10 µM SS5BTM oligo) was ligated to the fragmented DNA using the Quick Ligation™ Kit (Cat#M2200S, NEB), followed by two steps of off-target PCR to add chemistry for sequencing.
- Off-target PCR1 was performed using Platinum™ Taq DNA Polymerase (Cat#15966005, Invitrogen) with GSP1 (a mixture of GSP1-Top and GSP1-BoT) and Y_XX oligos.
- Off-target PCR2 was performed using Platinum™ Taq DNA Polymerase with GSP2 (a mixture of GSP2-TopA/B/C and GSP1-BoTA/B/C), Y_XX (same as in PCR1) and i753_XX oligos.
- The DNA product of each step described above was purified using SPRI Select (Cat#B23318, Beckman Coulter).
- The final library was quantified by qPCR and sequenced on an Illumina NextSeq 1000.
- The reads were aligned to a reference genome after eliminating those with low quality scores.
- The Q30 rate was more than 0.9.
- The read length was between 130 bp and 140 bp.
- The reads were mapped to the reference genome (yielding BAM files), and reads that overlapped the target region of interest were selected.
- PS bond or linkage refers to a bond where a sulfur is substituted for one nonbridging phosphate oxygen in a phosphodiester linkage, for example in the bonds between nucleotide bases.
- The PAM preference of Cas12 proteins including GEBx0013, GEBx0018, GEBx0032, GEBx0047, GEBx0063, GEBx0064, and GEBx0070 was tested.
- The human codon-optimized nucleic acid sequences of GEBx0013, GEBx0047, GEBx0063, GEBx0064, and GEBx0070, and the versions thereof with NLS and FLAG sequences, are shown in Table 9.
- The nucleic acid sequences of GEBx0047, GEBx0063, GEBx0064, and GEBx0070 referred to in the PAM determination are set forth in SEQ ID NOs: 112-115.
- The nucleic acid sequences of GEBx0013, GEBx0018, and GEBx0032 referred to in the PAM determination are set forth in SEQ ID NOs: 83-85.
- The results are shown in FIG.14A, FIG.14C, FIG.14E, FIG.14F, and FIG.14G.
- As shown in FIG.14A, GEBx0047 recognizes a PAM having a sequence TYYN (Y is T or C, N is A, T, C, or G).
- GEBx0070 recognizes a PAM having a sequence TTTD (D is A, G, or T).
- GEBx0013 recognizes a PAM having a sequence TTTR (R is A or G).
- GEBx0018 recognizes a PAM having a sequence TTTR (R is A or G).
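- The degenerate PAMs listed above (TYYN, TTTD, TTTR, and the TTTV mentioned earlier) use standard IUPAC nucleotide codes. The sketch below shows one way to test whether a concrete 4 nt sequence satisfies such a PAM and to enumerate the sequences a PAM covers; the function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pattern: str, sequence: str) -> bool:
    """True if a concrete DNA sequence satisfies a degenerate PAM pattern."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )


def expand_pam(pattern: str) -> list:
    """List every concrete sequence covered by a degenerate PAM pattern."""
    choices = [IUPAC[code] for code in pattern.upper()]
    return ["".join(bases) for bases in product(*choices)]
```

- For example, `expand_pam("TTTV")` enumerates the three sequences TTTA, TTTC and TTTG, while TYYN covers sixteen sequences, so a TYYN PAM is considerably less restrictive than TTTR.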
- FIG.14B and FIG.14D describe the fidelity of the Cas12 proteins, and FIG.14E, FIG.14F, and FIG.14G contain the statistical curves, wherein ‘perfect match’ (PM) denotes sequences with 0 mismatches and ‘mismatch’ (MM) denotes sequences with one or more mismatches.
- A PM curve with a steeper slope indicates high fidelity of the Cas12 protein, as more perfectly matched reads are aligned to fewer sites.
- An MM curve with a steeper slope and longer tail indicates lower fidelity of the Cas12 protein, as more reads with one or more mismatches are mapped to a greater variety of sites.
- The figures show that GEBx0013, GEBx0018, GEBx0032, GEBx0047, and GEBx0070 all have high fidelity.
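- The PM/MM distinction used for the fidelity curves can be sketched as a mismatch count between each detected site and the intended target. This is a minimal illustration assuming equal-length, ungapped comparison; the function names, the target sequence, and the read counts are hypothetical and do not reproduce the analysis behind the figures.

```python
def count_mismatches(site: str, target: str) -> int:
    """Number of positions at which a detected site differs from the target."""
    if len(site) != len(target):
        raise ValueError("site and target must have equal length")
    return sum(1 for s, t in zip(site.upper(), target.upper()) if s != t)


def classify_site(site: str, target: str) -> str:
    """'PM' for a perfect match (0 mismatches), otherwise 'MM'."""
    return "PM" if count_mismatches(site, target) == 0 else "MM"


def reads_by_class(sites, target: str) -> dict:
    """Sum read counts per class from (site_sequence, read_count) pairs."""
    totals = {"PM": 0, "MM": 0}
    for sequence, reads in sites:
        totals[classify_site(sequence, target)] += reads
    return totals


# Hypothetical target and detected sites with read counts
toy_target = "ACGTACGTAC"
toy_sites = [("ACGTACGTAC", 900), ("ACGTTCGTAC", 60), ("ACCTACGTAA", 40)]
toy_totals = reads_by_class(toy_sites, toy_target)
```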
- HEK293T cells were cultured in DMEM media supplemented with 10% fetal bovine serum (Gibco™).
- Cells were counted and plated (450 µL) in 24-well plates at a density of 150,000 cells/well 24 hours prior to transfection.
- Cells were co-transfected with a total volume of 50 µL of lipoplex mixture containing pCasX plasmid (~1 µL, 400 ng), pgRNA plasmid (~1 µL, 100 ng), Lipofectamine™ 2000 (1 µL, Thermo Fisher Scientific, Cat. 11668019) and Opti-MEM; the cells were then cultured at 37°C and 5% CO2.
- The pgRNA plasmid and pCasX plasmid are shown in FIG.15.
- The nucleotide sequences (SEQ ID NOs: 111-115) encoding GEBx0013/0047/0063/0064/0070 are codon-optimized for expression in mammalian cells.
- These nucleotide sequences further comprise 3’ and 5’ nuclear localization signals (NLSs) and FLAG-tag sequences, and are shown in Table 8 (SEQ ID NOs: 116-120; lowercase letters represent the NLSs and FLAG-tag sequences).
- The crRNA fractions, including the direct repeat sequence and the spacer sequence (MYOD1, VEGFA, IL1RN, and DNMT1-1, SEQ ID NOs: 121-124, based on the PAM 5’-TTTG-3’), are shown in Table 10.
- The direct repeat (DR) sequence is the same in each case and is set forth in SEQ ID NO: 96.
- NGS was utilized to identify the presence of insertions and deletions introduced by gene editing.
- Primers for NGS flanking the target areas within the MYOD1/VEGFA/IL1RN/DNMT1 genes were designed. Additional PCR was performed following the manufacturer’s protocols (Illumina) to add chemistry for sequencing. The amplicons were sequenced on an Illumina iSeq 100 instrument. The reads were aligned to a reference genome after eliminating those with low quality scores. The Q30 rate was more than 0.9. The read length was between 130 bp and 140 bp.
- The reads were mapped to the reference genome (yielding BAM files), reads that overlapped the target region of interest were selected, and the number of wild-type reads versus the number of reads containing an insertion, substitution, or deletion was calculated.
- The number of reads mapped to the reference genome was more than 1000.
- The editing efficiency (e.g., the “editing percentage” or “percent editing” or “indel frequency”) is defined as the total number of sequence reads with insertions/deletions (“indels”) or substitutions over the total number of sequence reads, including wild type.
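- The acceptance criteria and the editing-efficiency definition above can be combined in a short sketch. The thresholds (Q30 rate above 0.9, read length of 130 to 140 bp, more than 1000 mapped reads) come from the text; the function names, argument layout, and example counts are assumptions for illustration.

```python
def passes_qc(q30_rate: float, read_lengths, mapped_reads: int,
              min_q30: float = 0.9, min_len: int = 130, max_len: int = 140,
              min_mapped: int = 1000) -> bool:
    """Check a sample against the sequencing criteria stated in the text."""
    lengths_ok = all(min_len <= n <= max_len for n in read_lengths)
    return q30_rate > min_q30 and lengths_ok and mapped_reads > min_mapped


def editing_efficiency(edited_reads: int, wild_type_reads: int) -> float:
    """Percent of reads carrying an indel or substitution, out of all reads."""
    total = edited_reads + wild_type_reads
    if total == 0:
        raise ValueError("no reads overlap the target region")
    return 100.0 * edited_reads / total


# Hypothetical sample: 150 edited and 850 wild-type reads at the target
example_efficiency = editing_efficiency(150, 850)
```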
- EXAMPLE 9 Testing of Genome Cleavage Activity of the CRISPR-Cas12 Complexes in Mammalian Cells
- HEK293T cells were cultured in DMEM media supplemented with 10% fetal bovine serum (Gibco™).
- A volume of 200 µL of cells at a density of 50,000 cells/well was seeded 24 hours pre-transfection.
- Cells were transfected with a lipoplex containing Lipofectamine™ 3000 (0.4 µL/well), P3000 (2 µL/well), pgRNA/pCasX plasmid (125 ng/well and 375 ng/well, respectively) and Opti-MEM up to 25 µL/well per the manufacturer’s protocol. Plated cells were allowed to settle and adhere for 72 hours in a tissue culture incubator at 37°C and 5% CO2 atmosphere.
- For NGS, 50 ng of total genomic DNA was used as input for a two-step PCR using the KAPA HiFi HotStart ReadyMix Kit (Roche). First-step PCR (PCR 1) resulted in a ~200 bp product, followed by indexing PCR (PCR 2), yielding final fragments flanked by the Illumina sequencing barcodes for subsequent NextSeq or iSeq sequencing (Illumina, San Diego, CA, USA). PCR 1 reactions were carried out as follows: 98°C for 5 min, then 20 cycles of [98°C for 20 sec; 60°C for 20 sec; 72°C for 20 sec], followed by a final extension at 72°C for 3 min.
- The indexing PCR 2 reactions were carried out as follows: 98°C for 5 min, then 15 cycles of [98°C for 20 sec; 62°C for 20 sec; 72°C for 20 sec], followed by a final extension at 72°C for 3 min.
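- The two cycling programs above can be written down as structured data, which makes the difference between them (annealing temperature and cycle number) explicit and allows the programmed hold time to be totalled. The class and field names are hypothetical, and the totals exclude ramp times between steps.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time in seconds


@dataclass(frozen=True)
class Program:
    initial: Step
    cycles: int
    cycle_steps: Tuple[Step, ...]
    final: Step

    def programmed_seconds(self) -> int:
        """Total hold time, ignoring ramping between temperatures."""
        per_cycle = sum(step.seconds for step in self.cycle_steps)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


# PCR 1: 98 C 5 min; 20 x (98 C 20 s, 60 C 20 s, 72 C 20 s); 72 C 3 min
PCR1 = Program(Step(98, 300), 20,
               (Step(98, 20), Step(60, 20), Step(72, 20)), Step(72, 180))

# PCR 2: 98 C 5 min; 15 x (98 C 20 s, 62 C 20 s, 72 C 20 s); 72 C 3 min
PCR2 = Program(Step(98, 300), 15,
               (Step(98, 20), Step(62, 20), Step(72, 20)), Step(72, 180))
```

- Under these values the programmed hold time is 28 minutes for PCR 1 and 23 minutes for PCR 2.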
- PCR 2 products were purified with SPRI beads and quantified with the VAHTS Library Quantification Kit for Illumina (Vazyme, Cat. NQ101) on a StepOnePlus Real-time PCR system (Thermo Fisher Scientific).
- The amplicons were sequenced on an Illumina iSeq 100 or NextSeq instrument.
- The reads were aligned to a reference genome after eliminating those with low quality scores. The Q30 rate was more than 0.9.
- The read length was between 130 bp and 140 bp.
- The reads were mapped to the reference genome (yielding BAM files), reads that overlapped the target region of interest were selected, and the number of wild-type reads versus the number of reads containing an insertion, substitution, or deletion was calculated.
- The number of reads mapped to the reference genome was more than 1000.
- Total editing frequency was calculated as: [count of reads with any insertions or deletions] divided by [count of total reads].
- Out-of-frame frequency was calculated as: [count of reads with insertions or deletions whose length is indivisible by 3] divided by [count of edited reads].
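- Reading each frequency as a fraction (edited reads over total reads, and out-of-frame reads over edited reads, consistent with the editing-efficiency definition given earlier), the two quantities can be sketched from per-read net indel lengths. The input representation and function name are assumptions; substitutions and reads with compensating insertions and deletions are not captured by this simplified representation.

```python
def editing_frequencies(net_indel_lengths):
    """Return (total_editing_frequency, out_of_frame_frequency).

    net_indel_lengths: one integer per read; 0 for an unedited read,
    positive for a net insertion, negative for a net deletion.
    """
    lengths = list(net_indel_lengths)
    if not lengths:
        raise ValueError("no reads supplied")
    edited = [n for n in lengths if n != 0]
    # An indel shifts the reading frame when its length is not a multiple of 3.
    out_of_frame = [n for n in edited if n % 3 != 0]
    total_frequency = len(edited) / len(lengths)
    out_of_frame_frequency = len(out_of_frame) / len(edited) if edited else 0.0
    return total_frequency, out_of_frame_frequency


# Hypothetical sample: six wild-type reads and four edited reads
toy_reads = [0, 0, 0, 0, 0, 0, 1, -3, -2, 6]
toy_total, toy_out_of_frame = editing_frequencies(toy_reads)
```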
- GEBx0063 and GEBx0064 were tested on the MYOD1-TTTG-T1 target in the HEK293T cell line.
- pCasX plasmids harboring the GEBx0063 or GEBx0064 CDS were co-transfected with pgRNA plasmids harboring MYOD1-TTTG-T1 spacers of different lengths (17 nt - 25 nt, SEQ ID NOs: 184-188, Table 11A).
- The result, shown in FIG.16B, demonstrates that a MYOD1-TTTG-T1 spacer length of about 21 nt-25 nt is suitable and that a more suitable spacer length for GEBx0064 is 25 nt.
- FIG.20A-FIG.23B indicate the human cell genome editing efficiency of GEBx0047/0063/0064/0070 at 22 additional loci.
- The DR sequences and spacers of the pgRNA plasmids referred to are shown in Table 11B.
- GEBx0047 shows the highest editing efficiency at the RNF2-TTTG-T1 locus (3.98%, FIG.20A and FIG.20B); GEBx0063 shows more than 10% gene editing efficiency at five loci and has the highest efficiency at the TTR-TTTG-T3 locus (14.77%, FIG.21A and FIG.21B); GEBx0064 shows more than 10% gene editing efficiency at three loci and has the highest efficiency at the TTR-TTTG-T3 locus (11.78%, FIG.22A and FIG.22B); GEBx0070 shows more than 5% gene editing efficiency at eight loci and has the highest efficiency at the HBB-TTTG-T2 locus (7.41%, FIG.23A and FIG.23B).
- The amino acid sequences of the Cas12 proteins with NLSs (SEQ ID NO: 183 and SEQ ID NO: 80) and FLAG sequence (SEQ ID NO: 81) are shown in Table 12 (GEBx0013, GEBx0047, GEBx0063, GEBx0064, and GEBx0070, SEQ ID NOs: 148-152), corresponding to SEQ ID NOs: 116-120.
- The disclosure also provides an engineered, non-naturally occurring direct repeat sequence, wherein the direct repeat sequence comprises a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 153-156, or a variant thereof. In some embodiments, the direct repeat sequence has at least 95% or 98% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156.
- The disclosure provides an engineered, non-naturally occurring spacer sequence, wherein the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.
- The disclosure also provides an engineered, non-naturally occurring sgRNA, wherein the sgRNA comprises, in a tandem arrangement:
- a direct repeat sequence; and a spacer sequence which is capable of hybridizing to a sequence of the target nucleic acid to be manipulated; wherein the direct repeat sequence has at least 90% sequence identity to any one of SEQ ID NOs: 153-156, and the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182 (Table 13).
- The tandem arrangement of the direct repeat sequence and spacer sequence is in a 5’ to 3’ orientation.
- The direct repeat sequence has at least 90% or 95% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156.
- The spacer sequence has at least 95% sequence identity to any one of SEQ ID NOs: 157-181.
- The spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 157.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 164.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 168.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 169.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 171.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 175.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 176.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 177.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 178.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 179.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 180.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 181.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 182.
- sgRNA comprises SEQ ID NO: 156 and SEQ ID NO: 157.
- sgRNA comprises SEQ ID NO: 156 and SEQ ID NO: 158.
- sgRNA comprises SEQ ID NO: 156 and SEQ ID NO: 159.
- sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 160.
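- The tandem 5’ to 3’ arrangement of a direct repeat followed by a spacer, as described for the sgRNA above, can be sketched as a simple concatenation. The sequences used here are short hypothetical placeholders, not the sequences of SEQ ID NOs: 153-182, and the function names are illustrative only.

```python
def to_rna(sequence: str) -> str:
    """Upper-case a nucleotide sequence and write it in the RNA alphabet."""
    rna = sequence.upper().replace("T", "U")
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError("unexpected characters: %s" % sorted(unexpected))
    return rna


def assemble_guide(direct_repeat: str, spacer: str) -> str:
    """Join direct repeat and spacer in 5' to 3' order (repeat first)."""
    return to_rna(direct_repeat) + to_rna(spacer)


# Hypothetical placeholder sequences (NOT from the sequence listing)
PLACEHOLDER_DIRECT_REPEAT = "ACGTACGTACGT"
PLACEHOLDER_SPACER = "GATTACAGATTACAGATTACA"  # 21 nt

example_guide = assemble_guide(PLACEHOLDER_DIRECT_REPEAT, PLACEHOLDER_SPACER)
```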
Abstract
The present invention relates to an engineered, non-naturally occurring Cas12 protein, a CRISPR-Cas system, and uses thereof. The invention provides novel engineered, non-naturally occurring Cas12 proteins comprising an amino acid sequence selected from SEQ ID NOs: 1-35, or a homolog thereof having at least 70% sequence identity. These Cas12 proteins are expected to enable broader application of CRISPR-Cas systems for gene editing and gene targeting.
Applications Claiming Priority (16)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNPCT/CN2022/114894 | 2022-08-25 | ||
| CN2022114894 | 2022-08-25 | ||
| CN2022115244 | 2022-08-26 | ||
| CNPCT/CN2022/115249 | 2022-08-26 | ||
| CNPCT/CN2022/115244 | 2022-08-26 | ||
| CN2022115249 | 2022-08-26 | ||
| CN2022120125 | 2022-09-21 | ||
| CNPCT/CN2022/120125 | 2022-09-21 | ||
| CNPCT/CN2022/125447 | 2022-10-14 | ||
| CN2022125447 | 2022-10-14 | ||
| CNPCT/CN2022/132014 | 2022-11-15 | ||
| CN2022132014 | 2022-11-15 | ||
| CNPCT/CN2023/088765 | 2023-04-17 | ||
| CN2023088765 | 2023-04-17 | ||
| CNPCT/CN2023/094273 | 2023-05-15 | ||
| CN2023094273 | 2023-05-15 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024042479A1 (fr) | 2024-02-29 |
Family
ID=90012664
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2023/058394 (Ceased; WO2024042479A1 (fr)) | 2022-08-25 | 2023-08-24 | Cas12 protein, CRISPR-Cas system and uses thereof |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024042479A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025242229A1 (fr) * | 2024-05-24 | 2025-11-27 | 广州瑞风生物科技有限公司 | Cas12 protein and use thereof |
| WO2026056174A1 (fr) * | 2024-09-14 | 2026-03-19 | 上海尧唐生物科技股份有限公司 | CRISPR-Cas system and use thereof |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110799525A (zh) * | 2017-04-21 | 2020-02-14 | 通用医疗公司 | Variants of CPF1 (Cas12a) with altered PAM specificity |
| CN111676246A (zh) * | 2020-06-30 | 2020-09-18 | 西南大学 | Silkworm CRISPR/Cas12a-mediated gene editing vector and application thereof |
| US20210009974A1 (en) * | 2019-01-04 | 2021-01-14 | Mammoth Biosciences, Inc. | Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection |
| WO2022092317A1 (fr) * | 2020-10-30 | 2022-05-05 | 国立大学法人東京大学 | Modified Cas12f protein |
Non-Patent Citations (2)
| Title |
|---|
| HUANG HONGXIN, HUANG GUANJIE, TAN ZHIHONG, HU YONGFEI, SHAN LIN, ZHOU JIAJIAN, ZHANG XIN, MA SHUFENG, LV WEIQI, HUANG TAO, LIU YUC: "Engineered Cas12a-Plus nuclease enables gene editing with enhanced activity and specificity", BMC BIOLOGY, vol. 20, no. 1, 1 December 2022 (2022-12-01), XP093004963, DOI: 10.1186/s12915-022-01296-1 * |
| LUAN TIAN, GONG JUN, LUAN HUI, LIU WEN-YU, YANG QIN, ZHU YAO, WANG CHUN-LAI, LIU SI-GUO, ZHANG WAN-JIANG; LI GANG: "Establishment of a visual method for detection of Actinobacillus pleuropneumoniae based on CRISPR-Cas12a", ZHONGGUO YUFANG SHOUYI XUEBAO - CHINESE JOURNAL OF PREVENTIVE VETERINARY MEDICINE, ZHONGGUO NONGYE KEXUEYUAN * HA'ERBIN SHOUYI YANJIUSUO, CN, vol. 43, no. 8, 1 August 2021 (2021-08-01), CN , pages 843 - 847, XP093143828, ISSN: 1008-0589, DOI: 10.3969/j.issn.1008-0589.202012051 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025242229A1 (fr) * | 2024-05-24 | 2025-11-27 | 广州瑞风生物科技有限公司 | Cas12 protein and use thereof |
| WO2026056174A1 (fr) * | 2024-09-14 | 2026-03-19 | 上海尧唐生物科技股份有限公司 | CRISPR-Cas system and use thereof |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| AU2021231074B2 (en) | | Class II, type V CRISPR systems |
| CN114375334B (zh) | | Engineered CasX systems |
| JP7830129B2 (ja) | | Targeted gene editing constructs and methods of using the same |
| EP3004339B1 (fr) | | New compact scaffold of Cas9 in the type II CRISPR system |
| CA3111432A1 (fr) | | Novel CRISPR enzymes and systems |
| CN107794272A (zh) | | A highly specific CRISPR genome editing system |
| WO2024089629A1 (fr) | | Cas12 protein, CRISPR-Cas system and uses thereof |
| EP4349979A1 (fr) | | Modified Cas12i nuclease, effector protein and use thereof |
| WO2020180699A1 (fr) | | Novel CRISPR DNA-targeting enzymes and systems |
| EP4159853A1 (fr) | | Genome editing system and method |
| WO2024042479A1 (fr) | | Cas12 protein, CRISPR-Cas system and uses thereof |
| CA3221684A1 (fr) | | CRISPR-transposon systems for DNA modification |
| WO2023030340A1 (fr) | | Novel design of guide RNA and uses thereof |
| CN121358850A (zh) | | Cas enzyme and system and application thereof |
| CN116162609A9 (zh) | | Cas13 protein, CRISPR-Cas system and application thereof |
| WO2024121790A2 (fr) | | Cas12 protein, CRISPR-Cas system and uses thereof |
| JP2024501892A (ja) | | Novel nucleic acid-guided nucleases |
| AU2021329295B2 (en) | | Nuclease-mediated nucleic acid modification |
| CN116949037A (zh) | | Composition for editing target nucleic acid and method for editing target nucleic acid |
| WO2026067648A1 (fr) | | Type II Cas protein and uses thereof |
| RU2832109C2 (ru) | | Constructs for targeted gene editing and methods of using them |
| US20250163392A1 (en) | | Nucleic acid-guided nickase fusion proteins |
| Lusi et al. | | Programmed Manipulation of RNA Targets By Human Argonaute 2 |
| WO2025201316A1 (fr) | | CRISPR-Cas system |
| WO2025096916A1 (fr) | | Multisite editing in living cells |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23856806; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 23856806; Country of ref document: EP; Kind code of ref document: A1 |