WO2023078384A1 - Isolated Cas13 protein and its application - Google Patents

Isolated Cas13 protein and its application

Info

Publication number
WO2023078384A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
grna
seq
protein
nucleic acid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/129825
Other languages
English (en)
French (fr)
Inventor
梁峻彬
梁兴祥
孙阳
徐辉
彭志琴
司凯威
皇甫德胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Synsorbio Technology Co Ltd
Guangzhou Reforgene Medicine Co Ltd
Original Assignee
Zhejiang Synsorbio Technology Co Ltd
Guangzhou Reforgene Medicine Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Synsorbio Technology Co Ltd, Guangzhou Reforgene Medicine Co Ltd
Priority to EP22889405.1A (EP4428232A4)
Priority to CN202280073436.6A (CN118510892A)
Publication of WO2023078384A1
Priority to US18/652,819 (US20240279630A1)
Anticipated expiration
Legal status: Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/10 Cells modified by introduction of foreign genetic material
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/1136 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing against growth factors, growth regulators, cytokines, lymphokines or hormones
    • C12N15/1138 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing against receptors or cell surface proteins
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6816 Hybridisation assays characterised by the detection means
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C12N2320/00 Applications; Uses
    • C12N2320/10 Applications; Uses in screening processes

Definitions

  • the invention relates to the technical field of gene editing, in particular to an isolated Cas13 protein and its application.
  • CRISPR-Cas13 is an RNA-targeting and RNA-editing system derived from a bacterial immune system that protects bacteria against viral attack.
  • the system is broadly similar to the CRISPR-Cas9 system, but whereas CRISPR-Cas9 targets DNA, the Cas13 protein can be targeted to cleave RNA.
  • CRISPR-Cas13 belongs to type VI of the class 2 CRISPR-Cas systems; it contains a single effector protein, Cas13, which forms a crRNA-guided, RNA-targeting effector complex when assembled with CRISPR RNA (crRNA).
  • Cas13 proteins have two distinct ribonuclease activities: one processes pre-crRNA to form the mature type VI interference complex; the other is the RNase activity provided by two higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domains.
  • the HEPN domains mediate cleavage of RNA, such as ssRNA; when the target RNA contains folded structure, Cas13 generally cleaves preferentially in non-base-paired ssRNA regions.
  • CRISPR-Cas13 can be divided into multiple subtypes (A, B, C, and D) according to phylogeny.
  • the invention discloses an isolated Cas13 protein whose amino acid sequence comprises a sequence having ≥50% sequence identity with any one of the sequences shown in SEQ ID NO:1-SEQ ID NO:7 and SEQ ID NO:60.
  • the above-mentioned Cas13 protein was obtained by the inventors after repeated screening and testing among numerous proteins. It has activities such as forming a complex with gRNA, forming a complex with gRNA that binds to a target nucleic acid, being guided to a target nucleic acid by gRNA, and/or targeting or modifying the target nucleic acid.
  • the amino acid sequence of the above-mentioned Cas13 protein can also comprise a sequence having ≥50%, ≥60%, ≥70%, ≥75%, ≥80%, ≥85%, ≥90%, ≥92%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, ≥99.5%, or 100% sequence identity with the sequence shown in any one of SEQ ID NO:1-SEQ ID NO:7 and SEQ ID NO:60; that is, the sequence shown may be only a part of the amino acid sequence of the Cas13 protein, and the Cas13 protein may also include other functional or non-functional domains.
  • the amino acid sequence of the above-mentioned Cas13 protein can also be a sequence having ≥50%, ≥60%, ≥70%, ≥75%, ≥80%, ≥85%, ≥90%, ≥92%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, ≥99.5%, or 100% sequence identity with the sequence shown in any one of SEQ ID NO:1-SEQ ID NO:7 and SEQ ID NO:60; that is, the protein consisting of such an amino acid sequence is the Cas13 protein.
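The identity thresholds above can be checked computationally. The following is a minimal sketch only: it assumes the two sequences have already been aligned by an external tool (equal length, `-` for gaps) and defines identity as matching columns divided by aligned columns. The text does not specify an alignment algorithm or an identity formula, so the function names and that definition are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (hypothetical helper).

    Assumes both strings come from one pairwise alignment, so they have
    equal length and use '-' for gaps. Columns that are gaps in both
    sequences are ignored; a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the identity reaches the chosen threshold (e.g. 50, 90, 99.5)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The toy inputs used to exercise this sketch are not sequences from the application; real comparisons would use the SEQ ID NO sequences and an established aligner.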
  • the Cas13 protein can form a complex with gRNA.
  • the Cas13 protein can be guided to the target nucleic acid by gRNA. It can be understood that after the Cas13 protein is guided to the target nucleic acid by the gRNA, the target nucleic acid can optionally be targeted or modified, or neither targeted nor modified. For example, in some cases, after the Cas13 protein is guided to the target nucleic acid by the gRNA, it may not target or modify the target nucleic acid (for example, it does not cut the target nucleic acid), and those skilled in the art can make use solely of its ability to recognize the target nucleic acid, so that the target nucleic acid is bound but not cleaved.
  • after the Cas13 protein is guided to the target nucleic acid by the gRNA, it can target or modify the target nucleic acid (for example, cleave the target nucleic acid), for example cleaving the target mRNA and thereby reducing the translation level.
  • the Cas13 protein can be guided to the target nucleic acid by gRNA, and target or modify the target nucleic acid.
  • one or more of the following activities can be produced: cutting one or more target nucleic acids, visualizing or detecting one or more target nucleic acids, labeling one or more target nucleic acids, transporting one or more target nucleic acids, masking one or more target nucleic acids, binding one or more target nucleic acids, increasing the transcription and/or translation level of a gene corresponding to a target nucleic acid, and decreasing the transcription and/or translation level of a gene corresponding to a target nucleic acid.
  • said targeting said target nucleic acid is cleaving said target nucleic acid or binding said target nucleic acid.
  • said targeting said target nucleic acid is binding said target nucleic acid.
  • the combination may be the combination caused by base pairing between the gRNA guide sequence and the target sequence.
  • said targeting said target nucleic acid is cleaving said target nucleic acid.
  • the target nucleic acid is RNA.
  • the RNA is selected from mRNA, miRNA, rRNA, tRNA, snRNA and structural RNA.
  • the Cas13 protein can be guided to the target nucleic acid by gRNA and then optionally cut or not cut the target nucleic acid.
  • the target nucleic acid is mRNA.
  • the Cas13 protein can be guided to the target nucleic acid by a gRNA and then optionally cut or not cut the target nucleic acid.
  • the target nucleic acid is PTBP1 (Polypyrimidine Tract Binding Protein 1) mRNA, AQP1 (Aquaporin 1) mRNA, VEGFA (Vascular Endothelial Growth Factor A) mRNA, VEGFR1 (Vascular Endothelial Growth Factor Receptor 1) mRNA or VEGFR2 (Vascular Endothelial Growth Factor Receptor 2) mRNA.
  • the target nucleic acid is PTBP1 (Polypyrimidine Tract Binding Protein 1) mRNA or AQP1 (Aquaporin 1) mRNA. That is, the protein can be used to knock down the level of AQP1 mRNA, thereby reducing aqueous humor production and lowering intraocular pressure, for treating diseases such as glaucoma; or to knock down the level of PTBP1 mRNA, so as to achieve transdifferentiation of astrocytes into neurons in the brain, for treating diseases such as Parkinson's disease.
  • the target nucleic acid is VEGFA mRNA, VEGFR1 mRNA or VEGFR2 mRNA, and knocking down the mRNA level can be used to treat age-related macular degeneration (AMD).
  • the Cas13 protein is derived from the same Kingdom, Phylum, Class, Order, Family, Genus or Species as the source of a protein whose amino acid sequence comprises any one of SEQ ID NO:1-SEQ ID NO:7 and SEQ ID NO:60.
  • the protein comprising the sequence shown in SEQ ID NO:1 corresponds to Cas13m.1
  • the protein comprising the sequence shown in SEQ ID NO:2 corresponds to Cas13m.2
  • the protein comprising the sequence shown in SEQ ID NO:3 corresponds to Cas13m.3
  • the protein comprising the sequence shown in SEQ ID NO:4 corresponds to Cas13m.4
  • the protein comprising the sequence shown in SEQ ID NO:5 corresponds to Cas13m.5
  • the protein comprising the sequence shown in SEQ ID NO:6 corresponds to CasRfg.1
  • the protein comprising the sequence shown in SEQ ID NO:7 corresponds to CasRfg.2
  • the protein comprising the sequence shown in SEQ ID NO:60 corresponds to Cas13m.6.
  • the Cas13m.1 protein is derived from Cytophagales bacterium
  • the Cas13m.2 protein is derived from the bacterium comprising the genome numbered CNA0011077 in the CNGB database
  • the Cas13m.3 protein is derived from Bacteroidetes bacterium
  • the Cas13m.4 protein is derived from the bacterium comprising the genome numbered CNA0007373 in the CNGB database
  • the Cas13m.5 protein is derived from Bacteroidetes bacterium
  • the Cas13m.6 protein is derived from Prevotellaceae bacterium
  • the CasRfg.1 protein is derived from the bacterium comprising the genome numbered GCA_003940745.1 in the NCBI database
  • the CasRfg.2 protein was derived from the bacterium comprising the genome indicated
  • the Cas13 protein is derived from:
  • the present invention adopts the above threshold: a species whose genome has an average nucleotide identity (ANI) value ≥95% with an above-mentioned genome is considered the same species, and its Cas13 protein, being homologous to and functionally similar to the protein claimed in the present invention, falls within the scope of the present invention.
  • the isolated Cas13 protein is from a species whose genome has an ANI value ≥95% with the genome numbered GCA_003940745.1, GCA_013298125.1, GCA_902762805.1 or GCA_013298545.1 in the NCBI database, or numbered CNA0011077, CNA0007373 or CNA0009477 in the CNGB database.
  • the isolated Cas13 protein is from a species whose genome has an ANI value ≥95% with the genome numbered GCA_013298125.1, GCA_902762805.1 or GCA_013298545.1 in the NCBI database, or numbered CNA0011077, CNA0007373 or CNA0009477 in the CNGB database.
  • the isolated Cas13 protein is from a bacterium comprising the genome numbered GCA_003940745.1, GCA_013298125.1, GCA_902762805.1 or GCA_013298545.1 in the NCBI database, or numbered CNA0011077, CNA0007373 or CNA0009477 in the CNGB database.
  • the isolated Cas13 protein is from sewage WW isolate, bin5.concoct.b16b17b19.071, RUG10805 or bin17.concoct.ball.095 isolate.
  • the present invention also discloses an isolated Cas13 protein, the Cas13 protein comprising the amino acid sequence shown in the following motifs 1-15:
  • Motif 1 L-x(3)-R-N-x-Y-[ST]-H (SEQ ID NO:84)
  • Motif 2 R-x(3)-K-x-[VI]-N-G-F-G-R (SEQ ID NO:85)
  • Motif 3 P-Y-[IV]-T-x(5)-Y-x-[IV]-x(2)-N-x-I-G-L (SEQ ID NO:86)
  • Motif 4 P-x-L-x(2)-D-x(3)-[NK]
  • Motif 5 P-x-[AC]-x-L-S-x(2)-[ED]-[LF]-P-A-x(2)-F (SEQ ID NO:87)
  • Motif 6 [LI]-P-x-K-L
  • Motif 7 [KT]-x-[AL]-x(2)-[KVE]-[IL]
  • Motif 8 A-[DRK]-x-L-x(2)-[DS]-[MI]-[MV]-x-[FW]-Q-P (SEQ ID NO:88)
  • Motif 9 K-L-T-x(2)-N (SEQ ID NO:89)
  • Motif 10 F-x-[HR]-[AF]-x(5)-[QR]
  • Motif 11 I-x-L-P-x-G-[LM]-F-x(3)-I (SEQ ID NO:90)
  • Motif 12 [LI]-I-x(2)-[YWF]-F
  • Motif 13 I-x(3)-I
  • Motif 14 [DN]-[TN]-E-x(2)-[IL]-[KR]-[VR]-Y-[KR]-x-Q-D (SEQ ID NO:91)
  • A, F, C, U, D, N, E, Q, G, H, L, I, K, O, M, P, R, S, T, V, W, Y are standard amino acid codes
  • "x" is any amino acid, and the number in parentheses after x gives the number of consecutive x positions
  • "[]" encloses the alternative amino acid codes allowed at one position
  • "-" is a separator.
  • the Cas13 protein includes motifs 1-15 in sequence from the N-terminus to the C-terminus.
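The motif notation and the N-terminus to C-terminus ordering can be illustrated with a short sketch. It converts a motif written in the notation above into a Python regular expression and then checks whether a list of motifs occurs in a sequence in the given order without overlap. The helper names and the toy test sequences are hypothetical and are not part of the application; the greedy left-to-right scan relies on every converted pattern having a fixed length, which holds for this notation.

```python
import re
from typing import Sequence


def motif_to_regex(motif: str) -> str:
    """Translate e.g. 'L-x(3)-R-N-x-Y-[ST]-H' into 'L.{3}RN.Y[ST]H'."""
    parts = []
    for element in motif.split("-"):
        # An element is 'x', one residue letter, or a bracketed set,
        # optionally followed by a repeat count in parentheses.
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported motif element: {element!r}")
        token, count = m.groups()
        pattern = "." if token == "x" else token
        if count:
            pattern += "{" + count + "}"
        parts.append(pattern)
    return "".join(parts)


def motifs_in_order(protein: str, motifs: Sequence[str]) -> bool:
    """True if each motif matches after the end of the previous match."""
    pos = 0
    for motif in motifs:
        match = re.compile(motif_to_regex(motif)).search(protein, pos)
        if match is None:
            return False
        pos = match.end()
    return True


MOTIF_1 = "L-x(3)-R-N-x-Y-[ST]-H"  # motif 1 as written in the text
MOTIF_9 = "K-L-T-x(2)-N"           # motif 9 as written in the text
```

A call such as `motifs_in_order(sequence, [MOTIF_1, MOTIF_9])` only tests presence and order; it says nothing about protein function.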
  • the motif 1 is selected from motif 16, the motif 2 is selected from motif 17, the motif 3 is selected from motif 18, the motif 4 is selected from motif 19, the motif 5 is selected from motif 20, the motif 6 is selected from motif 21, the motif 7 is selected from motif 22, the motif 8 is selected from motif 23, the motif 9 is selected from motif 24, the motif 10 is selected from motif 25, the motif 11 is selected from motif 26, the motif 12 is selected from motif 27, the motif 13 is selected from motif 28, the motif 14 is selected from motif 29, and the motif 15 is selected from motif 30.
  • amino acid sequence shown in the motif 16-30 is as follows:
  • the amino acid sequence of the Cas13 protein comprises a sequence having ≥50%, ≥60%, ≥70%, ≥75%, ≥80%, ≥85%, ≥90%, ≥92%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, ≥99.5%, or 100% sequence identity with the sequence shown in any one of SEQ ID NO:2, SEQ ID NO:3 and SEQ ID NO:60.
  • the amino acid sequence of the Cas13 protein is obtained by conservative amino acid substitution, on the basis of a wild-type sequence, at any amino acid residue outside motif 1-motif 15, the wild-type sequence including the sequences shown in SEQ ID NO:2, SEQ ID NO:3 and SEQ ID NO:60.
  • the amino acid sequence of the Cas13 protein comprises any one of SEQ ID NO: 2, SEQ ID NO: 3, and SEQ ID NO: 60.
  • one or more amino acid residues (such as catalytic residues) in the amino acid sequence of the Cas13 protein can be mutated, so that it completely or partially loses its nuclease activity under the guidance of gRNA.
  • the RxxxxH motif of the RNase HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domain is mutated to inactivate the HEPN domain.
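Locating candidate RxxxxH sites in a sequence is a simple pattern search. The sketch below is illustrative only: it reports every position where an R is followed, five residues later, by an H, and offers a hypothetical helper that replaces both residues with alanine. The text states only that the motif is mutated; the choice of R-to-A and H-to-A substitutions is an assumption made here for illustration, and a pattern hit alone does not establish that a site lies in a HEPN domain.

```python
import re
from typing import List


def find_rx4h_sites(protein: str) -> List[int]:
    """1-based positions of the R in every R-x(4)-H match, overlaps included."""
    # A lookahead is used so that overlapping candidates are all reported.
    return [m.start() + 1 for m in re.finditer(r"(?=R.{4}H)", protein)]


def mutate_site_to_alanine(protein: str, r_position: int) -> str:
    """Replace the R and H of one R-x(4)-H site with A (assumed substitution)."""
    i = r_position - 1
    if i < 0 or i + 5 >= len(protein):
        raise ValueError("position does not leave room for an R-x(4)-H site")
    if protein[i] != "R" or protein[i + 5] != "H":
        raise ValueError("no R-x(4)-H site starts at this position")
    return protein[:i] + "A" + protein[i + 1:i + 5] + "A" + protein[i + 6:]
```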
  • nuclease activity can be reduced by mutation or modification, for example by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or 100%.
  • the Cas13 protein can form a complex with gRNA.
  • the Cas13 protein can be guided to the target nucleic acid by gRNA.
  • the Cas13 protein is derived from the same kingdom, phylum, class, order, family, genus or species as the source of a protein comprising the sequence shown in any one of SEQ ID NO:1-SEQ ID NO:7 and SEQ ID NO:60.
  • the Cas13 protein is non-native.
  • the Cas13 protein of the present invention can be modified, e.g., linked to a modifying moiety (e.g., another polypeptide, oligopeptide or other molecule). Often, modification of a protein does not adversely affect its desired activity (e.g., gRNA-binding activity, endonuclease activity, gRNA-guided binding to a specific site on the target nucleic acid, or gRNA-guided binding to a specific site on the target nucleic acid and cleavage of the target nucleic acid). Accordingly, the present invention is also intended to include such modified proteins.
  • the Cas13 protein of the present invention can be functionally linked (by chemical coupling, covalent linkage, gene fusion, non-covalent linkage or other means) to one or more modification moieties.
  • the invention discloses a conjugate, comprising the above-mentioned Cas13 protein, and a modified part (ie, a heterologous functional part) that modifies the Cas13 protein.
  • the modified portion of the conjugate is selected from another polypeptide, oligopeptide, detectable label, pharmaceutical agent, other molecule, and any combination thereof.
  • the modified moiety is selected from the group consisting of: localization tags that provide subcellular localization, tags that facilitate tracking, isolation or purification, translation activation domains, translation inhibition domains, nuclease domains, deaminase domains, methylase domains, demethylase domains and splicing-regulating domains (e.g., domains that regulate RNA splicing).
  • the localization tag providing subcellular localization is selected from the group consisting of nuclear localization signal (NLS) and nuclear export signal (NES) sequences.
  • examples of NLS include, but are not limited to, NLS sequences derived from: the SV40 virus large T antigen; nucleoplasmin; c-myc; hnRNPA1 (the M9 NLS); the IBB domain of importin-α; myoma T protein; human p53; mouse c-abl IV; influenza virus NS1; hepatitis virus delta antigen; mouse Mx1 protein; human poly(ADP-ribose) polymerase; and the steroid hormone receptor (human) glucocorticoid receptor.
  • the conjugate comprises one or more nuclear localization signals (NLS). In some of these embodiments, the conjugate comprises one or more nuclear export signals (NES). In some of these embodiments, the conjugate comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nuclear localization signals.
  • the nuclear export signal comprises at least four hydrophobic residues.
  • the tag that facilitates tracking, separation or purification is selected from: epitope tags, fluorescent proteins (such as green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, etc.), His tags (e.g., 6×His tag), hemagglutinin (HA) tag, FLAG tag, Myc tag, glutathione S-transferase (GST) tag and maltose binding protein (MBP) tag.
  • the translation activation domain is selected from the group consisting of domains of eIF4E and other translation initiation factors, yeast poly(A)-binding protein, and GLD2.
  • the translation inhibition domain is selected from the group consisting of Pumilio protein, deadenylase (such as deadenylase CAF1) and Argonaute protein.
  • the nuclease domain is selected from the group consisting of: FokI, PIN endonuclease domain, NYN domain, SMR domain from SOT1 and RNase domain from staphylococcal nuclease.
  • the deaminase domain is from cytidine deaminase or adenosine deaminase.
  • the deaminase domain is selected from: PPR (pentatricopeptide repeat) protein, ADAR family protein, APOBEC family protein.
  • the methylase domain is from m6A methyltransferase.
  • the demethylase domain is from RNA demethylase ALKBH5.
  • the regulatory splicing domain is selected from: SRSF1, hnRNP A1, RBM4.
  • the conjugate comprises the above-mentioned Cas13 protein, and one or more modification parts. In some of these embodiments, the conjugate is composed of the above-mentioned Cas13 protein and one or more modification parts. In some of these embodiments, the conjugate consists of the above-mentioned Cas13 protein, one or more modified parts, and a linker for connecting the Cas13 protein and the modified part. In some cases, the multiple modifying moieties may be the same or different.
  • the conjugate includes or does not include a linker for connecting the Cas13 protein and the modified part.
  • the conjugate comprises a Cas13 protein, a modification part and a linker connecting the Cas13 protein and the modification part.
  • the conjugate consists of a Cas13 protein, a modification part and a linker connecting the Cas13 protein and the modification part.
  • the conjugate does not comprise a linker for connecting the Cas13 protein and the modified part. In some of these embodiments, the conjugate is formed by directly connecting the Cas13 protein and the modified part, including direct connection through a covalent bond.
  • the linker can be amino acid, amino acid sequence or other chemical groups. In some of these embodiments, the linker can be amino acid, amino acid derivative, PEG (polyethylene glycol).
  • the linker is a linear polypeptide formed by linking one or more amino acid residues through peptide bonds, and the amino acid residues may be natural or non-natural, for example, may be modified.
  • linkers include those comprising one or more (e.g., 1, 2, 3, 4 or 5) amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA, or Ava), or PEG, etc.
  • as non-limiting examples, subcellular localization signals (such as NLS or NES), tags (such as HA tags or FLAG tags), etc. used as linkers are also within the scope of the present invention.
  • the conjugate can interact with gRNA.
  • the conjugate can form a complex with gRNA.
  • the conjugate can form a complex with the gRNA that binds to the target nucleic acid.
  • the conjugate can be guided to the target nucleic acid by a gRNA.
  • the conjugate can be guided to the target nucleic acid by gRNA, and target or modify the target nucleic acid.
  • the target nucleic acid can optionally be targeted or modified, or not targeted and modified.
  • the target nucleic acid can be targeted or modified, for example, cleaving the target mRNA and thus reducing the level of translation.
  • the modification part can be connected at the amino terminus, near the amino terminus, at the carboxyl terminus and/or near the carboxyl terminus of the Cas13 protein. In some of these embodiments, the modified part is connected to the amino terminus and/or carboxyl terminus of the Cas13 protein. In some of these embodiments, the modification part is connected near the amino terminus or carboxyl terminus of the Cas13 protein. In some of these embodiments, when the modified part is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50 or more amino acids of the amino terminus or carboxyl terminus along the polypeptide chain, the modified moiety is considered to be near the amino terminus or near the carboxyl terminus.
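The "near the terminus" criterion amounts to a distance check along the polypeptide chain. A minimal sketch follows, assuming 1-based residue numbering and a window of 50 residues; the window is just one of the values listed above, chosen here as an illustrative default, and the function name is hypothetical.

```python
def is_near_terminus(position: int, protein_length: int, window: int = 50) -> bool:
    """True if a 1-based attachment position lies within `window` residues
    of either the amino terminus or the carboxyl terminus."""
    if not 1 <= position <= protein_length:
        raise ValueError("position must lie within the protein")
    distance_from_n = position - 1
    distance_from_c = protein_length - position
    return min(distance_from_n, distance_from_c) <= window
```

Passing a smaller `window` (for example 5 or 10) models the stricter readings of the same criterion.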
  • the conjugate comprises one or more nuclear localization signals and/or nuclear export signals of sufficient strength to drive detectable accumulation of the conjugate inside and/or outside the nucleus of a eukaryotic cell. Detection of accumulation of the Cas13 protein or conjugate at a specific site of the cell can be performed by any suitable technique.
  • the conjugate is non-natural.
  • the present invention also discloses a gRNA that can form a complex with the above-mentioned Cas13 protein or the above-mentioned conjugate.
  • the above-mentioned gRNA can guide the above-mentioned Cas13 protein or conjugate to the target nucleic acid.
  • the gRNA can guide the Cas13 protein or conjugate to a target nucleic acid, and target or modify the target nucleic acid.
  • the gRNA directs the complex to a target nucleic acid, and the complex then targets or modifies the target nucleic acid.
  • targeting the target nucleic acid means cleaving the target nucleic acid or binding the target nucleic acid.
  • the gRNA comprises a guide sequence and a direct repeat sequence; the guide sequence can be complementary to the target nucleic acid, and the direct repeat sequence can interact with the Cas13 protein or with the conjugate.
  • the guide sequence can be complementary (completely complementary or partially complementary) to the target nucleic acid, and the direct repeat sequence can interact with the Cas13 protein or with the conjugate.
  • when the gRNA is combined with a Cas13m protein (Cas13m.1-Cas13m.6) of the present invention, a protein having ≥50% sequence identity with the Cas13m protein, or a conjugate comprising the same, the direct repeat sequence of the gRNA is located at the 3' end of the guide sequence.
  • the direct repeat sequence of the gRNA is located at the 5' end of the guide sequence.
  • the gRNA comprises a guide sequence and a direct repeat sequence, and the secondary structure of the direct repeat sequence includes, connected in sequence: a first stem of complementary base pairs, a non-complementary bulge, a second stem of complementary base pairs, and a non-complementary loop.
  • the gRNA has the following characteristics: a. the first stem consists of 4-7 base pairs, b. one of the non-complementary bulges is 2-6 nucleotides in length, c. the second stem consists of 4-7 base pairs, and/or d. the non-complementary loop (excluding the complementary base pair at the junction of the loop and the stem) is 5-8 nucleotides in length.
  • one of the sequences of the first stem is selected from: GUUG, GUUGU, GUUGUA, GUUGUUA.
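  • As an illustration only, the structural ranges listed above can be checked against a dot-bracket secondary-structure string of the kind that RNA folding tools report. The following Python sketch is not part of the disclosure: the function names, the regular expression and the example string are hypothetical, the pattern assumes the structure begins at the first stem and reads only the 5' side of the hairpin, and it requires all four features at once even though the text joins them with "and/or".

```python
import re

# Ranges taken from the text: stems 4-7 bp, bulge 2-6 nt, loop 5-8 nt.
RANGES = {"stem1": (4, 7), "bulge": (2, 6), "stem2": (4, 7), "loop": (5, 8)}

# 5' side of a stem / bulge / stem / loop hairpin in dot-bracket notation
# (assumption: the string starts directly with the first stem).
_PATTERN = re.compile(r"^(\(+)(\.+)(\(+)(\.+)\)")


def dr_features(dot_bracket):
    """Return the element lengths read from the 5' side, or None if the shape differs."""
    match = _PATTERN.match(dot_bracket)
    if match is None:
        return None
    names = ("stem1", "bulge", "stem2", "loop")
    return {name: len(group) for name, group in zip(names, match.groups())}


def matches_described_shape(dot_bracket):
    """True if every element length falls inside the ranges quoted in the text."""
    feats = dr_features(dot_bracket)
    if feats is None:
        return False
    return all(lo <= feats[key] <= hi for key, (lo, hi) in RANGES.items())


# Hypothetical structure string, not one of the SEQ ID NO sequences.
example = "(((((...(((((......)))))..)))))"
print(dr_features(example))
print(matches_described_shape(example))
```

  • A stricter check would also verify the 3' side of each stem; this sketch deliberately stops at the first closing bracket.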
  • the gRNA comprises a guide sequence and a direct repeat sequence, and the direct repeat sequence is selected from sequences having ≥90% sequence identity, or sequences having ≥95% sequence identity, with the sequence shown in any one of SEQ ID NO:15-SEQ ID NO:21 and SEQ ID NO:62.
  • the direct repeat sequence is selected from any one of SEQ ID NO:15-SEQ ID NO:21, SEQ ID NO:62.
  • the gRNA comprises a guide sequence and a direct repeat sequence
  • the length of the guide sequence is ⁇ 10nt (10 nucleotides), ⁇ 11nt, ⁇ 12nt, ⁇ 13nt, ⁇ 14nt, ⁇ 15nt, ⁇ 16nt, ⁇ 17nt, ⁇ 18nt, ⁇ 19nt, ⁇ 20nt, ⁇ 21nt, ⁇ 22nt, ⁇ 23nt, ⁇ 24nt, ⁇ 25nt, ⁇ 26nt, ⁇ 27nt, ⁇ 28nt, ⁇ 29nt, ⁇ 30nt, ⁇ 31nt, ⁇ 32nt , ⁇ 33nt, ⁇ 34nt, ⁇ 35nt, ⁇ 40nt, ⁇ 50nt or ⁇ 60nt.
  • the gRNA comprises a guide sequence and a direct repeat sequence
  • the length of the guide sequence is ⁇ 10nt (10 nucleotides), ⁇ 11nt, ⁇ 12nt, ⁇ 13nt, ⁇ 14nt, ⁇ 15nt, ⁇ 16nt, ⁇ 17nt, ⁇ 18nt, ⁇ 19nt, ⁇ 20nt, ⁇ 21nt, ⁇ 22nt, ⁇ 23nt, ⁇ 24nt, ⁇ 25nt, ⁇ 26nt, ⁇ 27nt, ⁇ 28nt, ⁇ 29nt, ⁇ 30nt, ⁇ 31nt, ⁇ 32nt , ⁇ 33nt, ⁇ 34nt, ⁇ 35nt, ⁇ 40nt, ⁇ 50nt or ⁇ 60nt.
  • the gRNA comprises a guide sequence and a direct repeat sequence, and the length of the guide sequence ranges from 10nt-60nt, 10nt-50nt, 10nt-40nt, 12nt-35nt, 15nt-35nt, 15nt-30nt, 20nt-35nt, 20nt-30nt, 25nt-35nt or 25nt-30nt.
  • the gRNA comprises a guide sequence and a direct repeat sequence
  • the length of the direct repeat sequence is ⁇ 10nt, ⁇ 15nt, ⁇ 20nt, ⁇ 25nt, ⁇ 30nt, ⁇ 35nt, ⁇ 40nt, ⁇ 45nt , ⁇ 50nt, ⁇ 60nt, ⁇ 70nt, ⁇ 80nt, ⁇ 90nt, ⁇ 100nt, ⁇ 150nt, ⁇ 200nt or ⁇ 300nt.
  • the gRNA comprises a guide sequence and a direct repeat sequence
  • the length of the direct repeat sequence is ⁇ 10nt, ⁇ 15nt, ⁇ 20nt, ⁇ 25nt, ⁇ 30nt, ⁇ 35nt, ⁇ 40nt, ⁇ 45nt , ⁇ 50nt, ⁇ 60nt, ⁇ 70nt, ⁇ 80nt, ⁇ 90nt, ⁇ 100nt, ⁇ 150nt, ⁇ 200nt or ⁇ 300nt.
  • the gRNA comprises a guide sequence and a direct repeat sequence, and the length of the direct repeat sequence ranges from 10nt-300nt, 10nt-200nt, 10nt-100nt, 15nt-80nt, 15nt-50nt, 15nt-40nt, 15nt-35nt or 20nt-40nt.
  • the direct repeat sequence is located at the 3' end of the guide sequence. In some of these embodiments, the direct repeat sequence is located at the 5' end of the guide sequence.
  • the target nucleic acid is PTBP1 (Polypyrimidine Tract Binding Protein 1) mRNA, AQp1 (Aquaporin 1) mRNA, VEGFA mRNA, VEGFR1 mRNA or VEGFR2 mRNA.
  • the target nucleic acid is PTBP1 mRNA or AQp1 mRNA. In some of these embodiments, the target nucleic acid is VEGFA mRNA, VEGFR1 mRNA or VEGFR2 mRNA.
  • the invention also discloses a composition comprising:
  • the gRNA comprises a guide sequence
  • the guide sequence can be complementary to a target nucleic acid
  • the target nucleic acid is PTBP1 mRNA or AQp1 mRNA.
  • the nucleic acid is DNA. In some of these embodiments, the nucleic acid is RNA.
  • the invention also discloses a nucleic acid, comprising:
  • the nucleotide sequence is used for expression in prokaryotic cells or eukaryotic cells.
  • the nucleic acid is DNA. In some of these embodiments, the nucleic acid is RNA.
  • the invention also discloses a vector, characterized in that the vector comprises:
  • there are one or more nucleotide sequences encoding the Cas13 protein, and one or more nucleotide sequences encoding the conjugate.
  • the vector comprises a regulatory element.
  • the regulatory element can regulate the expression of the nucleotide sequence.
  • the regulatory element is a promoter and/or an enhancer. In some of these embodiments, the regulatory element is a promoter.
  • the vector is selected from: cloning vector and expression vector. In some of these embodiments, the vector is a plasmid or a viral vector.
  • the vector can express the Cas13 protein or conjugate of the present invention in cells. In some of these embodiments, the vector can express the Cas13 protein or conjugate of the present invention in eukaryotic cells. In some of these embodiments, the vector can express the Cas13 protein or conjugate of the present invention in human cells.
  • the vector is a non-natural vector.
  • the present invention also discloses a delivery composition, including a delivery vehicle and at least one selected from the following: the above-mentioned Cas13 protein, conjugate, gRNA, composition, nucleic acid, and vector.
  • the delivery vehicle is selected from at least one of delivery particles, delivery vesicles, and viral vectors.
  • the present invention also discloses a cell, comprising at least one of the above-mentioned Cas13 protein, conjugate, gRNA, composition, nucleic acid, and vector.
  • the cells are eukaryotic cells.
  • the target nucleic acid is derived from animal cells, plant cells or microbial cells.
  • no animal or plant can be produced from the cells.
  • no animals or plants can be produced from said eukaryotic cells.
  • the eukaryotic cells comprise stem cells and stem cell lines.
  • the stem cells are not embryonic stem cells and the stem cell line is not an embryonic stem cell line.
  • the target nucleic acid in these cells has been targeted or modified.
  • the present invention also discloses a method for targeting or modifying a target nucleic acid, comprising delivering to the target nucleic acid at least one selected from the following: the above-mentioned Cas13 protein, conjugate, gRNA, composition, nucleic acid, vector, and cell.
  • the delivery occurs ex vivo, in vitro or in vivo.
  • the method of targeting or modifying a target nucleic acid is used to modify a cell, cell line or organism by altering the target nucleic acid.
  • the target nucleic acid is derived from animal cells, plant cells or microbial cells.
  • the target nucleic acid is PTBP1 (Polypyrimidine Tract Binding Protein 1) mRNA, AQp1 (Aquaporin 1) mRNA, VEGFA mRNA, VEGFR1 mRNA or VEGFR2 mRNA.
  • the target nucleic acid is PTBP1 (Polypyrimidine Tract Binding Protein 1) mRNA or AQp1 (Aquaporin 1) mRNA.
  • the target nucleic acid is VEGFA mRNA, VEGFR1 mRNA or VEGFR2 mRNA.
  • the methods for targeting or modifying target nucleic acids do not include methods for diagnosis and treatment of diseases.
  • the methods for targeting or modifying target nucleic acids include methods for diagnosing and treating diseases.
  • the present invention also discloses the use of the above-mentioned Cas13 protein, conjugate, gRNA, composition, nucleic acid, vector, and cell in the preparation of medicines for diagnosing, preventing, or treating diseases in subjects.
  • the subject is a human individual.
  • the present invention also discloses a method for diagnosing, preventing, or treating a disease, comprising administering to a subject an effective amount of the Cas13 protein, conjugate, gRNA, composition, nucleic acid, vector, or cell.
  • the present invention also discloses a nucleic acid detection method, which is characterized in that it includes the step of making the following a and b form a complex and bind to the target nucleic acid to be detected:
  • the method comprises forming a complex between the above-mentioned conjugate and the gRNA, and binding the complex to the target nucleic acid; the conjugate comprises a detectable label, and when the complex binds, cleaves or modifies the target nucleic acid, the signal of the detectable label changes, so that the content of the target nucleic acid in the sample to be tested can be analyzed by observing that signal change.
  • the detectable label includes: a fluorescent group, a chromogenic agent, an imaging agent or a radioactive isotope.
  • the present invention has the following beneficial effects:
  • the isolated Cas13 protein of the present invention is a new Cas13 enzyme that can be used in a CRISPR/Cas system. Experiments have verified that, when exerting its Cas13 nuclease activity, the Cas13 protein of the present invention edits both exogenous reporter genes and endogenous genes with good efficiency.
  • Figure 1 is a schematic comparison of the locus structure of the Cas13 protein in Example 1 with the published locus structures of each Cas13 subtype.
  • Figure 2 shows the positions of the RxxxxH motifs within the amino acid chains of the Cas13 protein in Example 1 and of Cas13 proteins of the other subtypes.
  • Figure 3 is a schematic diagram of the cluster analysis of the Cas13 protein and Cas13 proteins of other subtypes in Example 1, wherein A is the cluster analysis of Cas13m.1-Cas13m.5 and B is the cluster analysis of Cas13m.1-Cas13m.6.
  • Figure 4 is a schematic diagram of the RNA secondary structures, analyzed with RNAfold, of the direct repeat sequences corresponding to the Cas13 protein in Example 1.
  • Figure 5 shows the predicted three-dimensional structures of the Cas13m.2, Cas13m.3 and Cas13m.6 proteins in Example 1.
  • Figure 6 is a schematic diagram of the Cas13 protein superimposition in Example 1.
  • Figure 7 shows the GFP fluorescence detected by flow cytometry in Example 4.
  • Figure 8 is a schematic diagram of the mRNA changes of the endogenous target genes AQp1 and PTBP1 detected by qPCR in Example 5.
  • Figures 9A, 9B and 9C are screenshots of the multiple sequence alignment of the Cas13m proteins and PbuCas13b in Example 9.
  • Figure 10 is the overlay diagram in Example 9, wherein A-N respectively show how motifs 1-15 of Cas13m.6 overlap the corresponding sequences of PbuCas13b after Cas13m.6 and PbuCas13b are superimposed.
  • Figure 11 shows the test results for the collateral cleavage effect in Example 11.
  • a protein or polypeptide referred to as a "Cas13 protein", or as having "Cas enzymatic activity" or "Cas endonuclease activity", is a CRISPR-associated (Cas) polypeptide or protein encoded by a CRISPR-associated (Cas) gene. When complexed or functionally combined with one or more guide RNAs (gRNAs), the Cas13 protein or polypeptide can be guided to a target sequence in a target nucleic acid, and in some cases it subsequently targets or modifies the target nucleic acid.
  • the Cas endonuclease recognizes, targets or modifies a specific target site (the target sequence or a nucleotide sequence near the target sequence) in the target nucleic acid, for example, a target site in an RNA molecule (such as a coding RNA, e.g., mRNA).
  • HEPN domain has the meaning generally recognized in the art.
  • the HEPN domain has been proven to be an RNase domain and has the ability to bind and cleave target RNA molecules.
  • the target RNA can be any suitable form of RNA, including but not limited to coding RNA and non-coding RNA.
  • the previously discovered CRISPR class 2 type VI effector proteins, including, for example, Cas13a, Cas13b, Cas13c, Cas13d, Cas13e, and Cas13f, all contain two HEPN domains, and their HEPN domains have a conserved RxxxxH motif, which is characteristic of the HEPN domain.
  • "gRNA" and "guide RNA" are used interchangeably and have the meaning generally understood by those skilled in the art. A gRNA generally refers to an RNA molecule (or a group of RNA molecules collectively) that can bind to the Cas13 protein and help guide or target the Cas13 protein to a specific location (target sequence) within a target nucleic acid/target polynucleotide (e.g., a DNA or mRNA molecule). gRNAs contain guide sequences and direct repeat (DR) sequences.
  • the gRNA may contain one or more modifications (e.g., base modification, backbone modification, modification of internucleoside bonds, etc.) to provide the same function as the unmodified gRNA, or to provide new or enhanced features to the gRNA (e.g., improved stability).
  • "guide sequence" and "targeting domain" are used interchangeably and refer to a contiguous nucleotide sequence in a gRNA that is partially or completely complementary to a target sequence in a target nucleic acid and can hybridize to that target sequence through base pairing facilitated by the Cas13 protein.
  • the complete complementarity of the guide sequence and the target sequence according to the invention is not required, as long as there is sufficient complementarity to cause hybridization and promote the formation of a CRISPR/Cas complex.
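  • To make the notion of complete versus partial complementarity concrete, the following Python sketch scores a guide sequence against an equal-length RNA target site. It is an illustration under stated assumptions, not a method from the disclosure: the function names and the toy sequences are hypothetical, only Watson-Crick pairs (A-U, G-C) are counted, and no threshold for "sufficient" complementarity is implied.

```python
# Watson-Crick RNA pairing only; G-U wobble pairs are ignored in this sketch.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def complementarity(guide, target):
    """Fraction of guide positions that pair with an equal-length target site.

    Both sequences are written 5'->3'; pairing is antiparallel, so the guide
    is compared with the reverse complement of the target site.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target site must have the same length")
    expected = reverse_complement(target)
    matches = sum(g == e for g, e in zip(guide.upper(), expected))
    return matches / len(guide)


# Hypothetical toy sequences (far shorter than a real guide).
target_site = "AUGC"
perfect_guide = reverse_complement(target_site)
print(perfect_guide, complementarity(perfect_guide, target_site))
print(complementarity("GCAA", target_site))
```

  • A score of 1.0 corresponds to complete complementarity; any lower value corresponds to partial complementarity in the sense used above.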
  • DR sequences can only be found through experimental screening in the CRISPR locus structure of prokaryotes (such as bacteria and archaea).
  • the size of the direct repeat sequence is usually within tens of bp. Some of its fragments are reverse complementary to each other, so that a secondary structure such as a stem-loop (often called a hairpin) forms inside the RNA molecule, while other fragments are unstructured.
  • the direct repeat sequence is a constant part of the gRNA molecule, which contains a strong secondary structure, which facilitates the interaction between the Cas13 protein and the gRNA molecule.
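  • The idea that reverse-complementary fragments within a direct repeat can pair into a stem-loop can be sketched as a simple string search. The Python below is a hedged illustration only: the function names, the default stem and loop sizes, and the toy sequence are assumptions, only perfect Watson-Crick stems are detected, and real structure prediction relies on thermodynamic folding tools rather than this kind of search.

```python
# Watson-Crick RNA pairing only.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_stem_loop(seq, stem_len=4, min_loop=3):
    """Return (start of 5' arm, start of 3' arm) for the first perfect stem, else None.

    The 3' arm must be the reverse complement of the 5' arm and must begin at
    least `min_loop` nucleotides after the 5' arm ends.
    """
    last_start = len(seq) - 2 * stem_len - min_loop
    for i in range(last_start + 1):
        arm = seq[i:i + stem_len]
        j = seq.find(revcomp(arm), i + stem_len + min_loop)
        if j != -1:
            return i, j
    return None


# Hypothetical toy sequence: GUUG ... CAAC flanking a 5-nt loop.
toy = "GUUGAAAAACAAC"
print(find_stem_loop(toy))
```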
  • "target nucleic acid" refers to a polynucleotide containing a target sequence; the terms "target nucleic acid" and "target polynucleotide" are often used interchangeably herein.
  • a target nucleic acid may comprise any polynucleotide, such as DNA (target DNA) or RNA (target RNA).
  • target nucleic acid refers to the nucleic acid that gRNA guides the Cas13 protein to target or modify.
  • target nucleic acid may be any polynucleotide, endogenous or exogenous to a cell (eg, a eukaryotic cell).
  • a "target nucleic acid" may be a polynucleotide present in eukaryotic cells, or it may be a sequence (or a portion thereof) encoding a gene product (e.g., a protein) or a non-coding sequence (or a portion thereof).
  • a "target nucleic acid" may include one or more disease-associated genes and polynucleotides and signaling biochemical pathway-associated genes and polynucleotides.
  • a "disease-associated" gene or polynucleotide refers to any gene or polynucleotide that produces a transcription or translation product at an abnormal level or in an abnormal form in cells derived from a disease-affected tissue, as compared with non-disease control tissues or cells.
  • the target nucleic acid is a coding RNA.
  • the target nucleic acid is a non-coding RNA.
  • the target nucleic acid includes mRNA, miRNA, rRNA, tRNA, snRNA, and structural RNA.
  • the target nucleic acid is mRNA.
  • the target nucleic acid is an entire mRNA molecule.
  • the target nucleic acid is DNA. In certain instances, the target nucleic acid is an entire chromosomal DNA molecule.
  • Target RNA means a specific sequence or its reverse complement that one wishes to bind, target or modify using the CRISPR system.
  • target sequence refers to a short sequence in a target nucleic acid molecule, which can be complementary (completely complementary or partially complementary) to the guide sequence of a gRNA molecule.
  • the target sequence is often tens of bp in length, for example, may be about 10 bp, about 20 bp, about 30 bp, about 40 bp, about 50 bp, or about 60 bp.
  • "targeting" is defined to include one or more of: cleaving one or more target nucleic acids, visualizing or detecting one or more target nucleic acids, labeling one or more target nucleic acids, transporting one or more target nucleic acids, masking one or more target nucleic acids, binding one or more target nucleic acids, increasing the transcription and/or translation levels of genes corresponding to the target sequences, and reducing the transcription and/or translation levels of genes corresponding to the target sequences.
  • "modification" is defined to include one or more of: nucleic acid base substitution, nucleic acid base deletion, nucleic acid base insertion, nucleic acid methylation, nucleic acid demethylation, and deamination of nucleic acids.
  • "cleavage/cleaving" refers to the breaking of covalent bonds (e.g., covalent phosphodiester bonds) in the ribosyl phosphodiester backbone of a polynucleotide, including but not limited to: breaking a single-stranded polynucleotide, breaking either single strand of a double-stranded polynucleotide containing two complementary single strands, and breaking both single strands of a double-stranded polynucleotide containing two complementary single strands.
  • the Cas13 protein or conjugate of the present invention can be fused or associated with one or more heterologous functional parts (for example, through a fusion protein, a linker peptide, etc.).
  • Cas13 mutants that completely or partially lose nuclease activity are fused with heterologous functional parts.
  • These functional domains can have various activities such as methylase activity, demethylase activity, deaminase activity, translation activation activity, translation inhibition activity, RNA cleavage activity, nucleic acid binding activity, base editing activity, and switching activity (e.g. light-induced).
  • the heterologous functional moieties may include, but are not limited to: localization signals (such as a nuclear localization signal NLS or a nuclear export signal NES), markers or detection markers (such as fluorescent dyes, e.g., FITC or DAPI), targeting moieties, epitope tags (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc.), deaminases or deamination domains (e.g., ADAR1, ADAR2, APOBEC, AID or TAD), methylases, demethylases, ssRNA cleavage active domains, dsRNA cleavage active domains, DNA or RNA ligases, or any combination of the above.
  • the Cas13 protein of the present invention can be fused with deaminase, combined with gRNA and used to target target RNA, so as to realize single base editing of target RNA molecules.
  • the heterologous functional moiety can be a detectable label.
  • the conjugate containing the Cas13 nuclease cuts or modifies the target nucleic acid, and the presence of the target nucleic acid in the sample to be tested is analyzed by observing the presence of a detectable label.
  • the detectable label is, for example, a fluorophore, a chromogenic agent, an imaging agent or a radioactive isotope.
  • Methods for measuring the binding of Cas13 proteins or conjugates to target nucleic acids include, but are not limited to, chromatin immunoprecipitation assays, gel mobility shift assays, reporter gene assays, microplate capture and detection assays.
  • methods of measuring cleavage or modification of a target nucleic acid are known in the art, including in vitro or in vivo cleavage assays.
  • complex is used interchangeably with “CRISPR/Cas complex”.
  • complex refers to the ribonucleoprotein complex formed by the combination of gRNA and Cas13 protein.
  • the ribonucleoprotein complex can recognize (and sometimes further cut or modify) the target sequence complementary to the guide sequence of the gRNA or the target nucleic acid in which it is located.
  • "non-natural" means "engineered", that is, involving human intervention.
  • the term means that said nucleic acid molecule or said polypeptide is at least substantially free of at least one other component with which it is naturally associated and with which it is found in nature.
  • the term may indicate that a nucleic acid molecule or polypeptide has a sequence that does not occur in nature.
  • conjugate refers to a modified Cas13 protein.
  • the conjugate comprises a Cas13 protein part and a modified part.
  • Modified parts can be proteins or polypeptides (or any functional fragments thereof), oligopeptides, and other small molecules (including but not limited to sugar molecules).
  • the conjugate can be a fusion protein.
  • As used herein, the term "sequence identity" (identity or percent identity) refers to the degree of matching between the sequences of two polypeptides or of two nucleic acids. When a position in both sequences being compared is occupied by the same base or amino acid monomer subunit (for example, a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. "Percent sequence identity" between two sequences is the number of matching positions shared by the two sequences divided by the number of positions being compared, × 100%. For example, two sequences have 60% sequence identity if 6 out of 10 positions of the two sequences match.
  • comparisons are made when two sequences are aligned to yield maximum sequence identity.
  • alignments can be performed using published and commercially available alignment algorithms and programs such as, but not limited to, Clustal Ω, MAFFT, Probcons, T-Coffee, Probalign and BLAST, chosen at the reasonable discretion of one of ordinary skill in the art.
  • Those skilled in the art can determine appropriate parameters for aligning sequences, including, for example, any algorithms needed to achieve optimal alignment over the full length of the sequences being compared, or to achieve optimal alignment of parts of the sequences being compared.
  • Sequence identity relates to sequence similarity. Identity or similarity comparisons can be made by visual alignment (by eye), more usually with the aid of sequence comparison programs. These computer programs can calculate the percent (%) identity or similarity between two or more sequences and can also calculate sequence identities shared by two or more amino acid or nucleic acid sequences.
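  • The percent identity calculation defined above (matching positions divided by compared positions, × 100%) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular alignment program: the function name and example sequences are hypothetical, the inputs are assumed to be already aligned to equal length, and the handling of gaps (columns with a gap in only one sequence count as compared but not matching) is an assumption, since programs differ on this point.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; a column with a gap
    in only one sequence is counted as a compared, non-matching position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no positions to compare")
    return 100.0 * matches / compared


# Hypothetical sequences mirroring the example in the text: 6 of 10 positions match.
print(percent_identity("ACDEFGHIKL", "ACDEFGXXXX"))
```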
  • "polypeptide", "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acids of any length.
  • the polymer may be linear or branched, it may contain modified amino acids, and it may be interrupted by non-amino acids. These terms also encompass amino acid polymers that have been modified; such as disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation to a labeling component.
  • amino acid as used herein includes natural and/or unnatural or synthetic amino acids, including glycine and D and L optical isomers, as well as amino acid analogs and peptidomimetics.
  • domain refers to a portion of a protein sequence that can exist and function independently of the rest of the protein chain.
  • the term "vector” refers to a nucleic acid delivery vehicle into which a polynucleotide can be inserted.
  • when the vector enables expression of the protein encoded by the inserted polynucleotide, the vector is called an expression vector.
  • the vector can enter the host cell through transformation, transduction or transfection, so that the genetic material elements it carries can be expressed in the host cell.
  • Vectors are well known to those skilled in the art, including but not limited to: plasmids; cosmids; phagemids; artificial chromosomes, such as yeast artificial chromosomes (YAC) or bacterial artificial chromosomes (BAC); phages such as lambda phage; and animal viruses.
  • Animal viruses that can be used as vectors include, but are not limited to: retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpesviruses (such as herpes simplex virus), poxviruses, baculoviruses, papillomaviruses, and papovaviruses (e.g., SV40).
  • a vector may contain various expression-controlling elements, including but not limited to: promoter sequence, transcription initiation sequence, enhancer sequence, selection element and reporter gene.
  • the vector may also contain an origin
  • Vectors include, but are not limited to: nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that contain one or more free ends or no free ends (e.g., circular); nucleic acid molecules that contain DNA, RNA, or both; and other types of polynucleotides known in the art.
  • Certain vectors are capable of autonomous replication in the host cell into which they are introduced. Other vectors are integrated into the genome of the host cell after introduction into the host cell and thereby replicate along with the host genome. Furthermore, certain vectors are capable of directing the expression of genes to which they are operably linked.
  • Such vectors are referred to herein as "expression vectors."
  • Vectors that are used in, and produce expression in, eukaryotic cells may be referred to herein as “eukaryotic expression vectors.”
  • Common expression vectors employed in recombinant DNA techniques are often in the form of plasmids.
  • a vector can be introduced into a host cell to thereby produce transcripts, proteins, or peptides, including proteins, conjugates, isolated nucleic acids, complexes, compositions, etc., as described herein.
  • a recombinant expression vector may comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vector comprises one or more regulatory elements, which may be selected based on the host cell used for expression and which are operably linked to the nucleic acid sequence to be expressed.
  • "operably linked" is intended to mean that a Cas protein coding sequence or a gRNA coding sequence in a vector is linked to one or more regulatory elements in a manner that allows expression of the nucleotide sequence (for example, in an in vitro transcription/translation system, or in the host cell when the vector is introduced into the host cell).
  • the promoter 1 is placed upstream of the Cas13 protein coding sequence, and when the vector is introduced into the host cell, the transcription of the Cas13 gene can be initiated under the drive of the promoter 1.
  • "regulatory element" is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences). Regulatory elements include those that direct the nucleotide sequence to be expressed continuously in many types of host cells and those that direct the nucleotide sequence to be expressed only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters can direct expression primarily in a desired tissue of interest such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver, pancreas), or specific cell types (e.g., lymphocytes).
  • the vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5, or more pol I promoters), or a combination thereof.
  • pol III promoters include, but are not limited to, the U6 and H1 promoters.
  • pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with an RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with a CMV enhancer), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter.
  • regulatory element also encompasses enhancer elements such as WPRE, CMV enhancer, SV40 enhancer, and intronic sequences between ex
  • the vector can be introduced into a host cell to express the Cas13 protein, conjugate or CRISPR complex of the present invention.
  • promoter has the meaning known to those skilled in the art, which refers to a non-coding nucleotide sequence located upstream of a gene and capable of promoting the expression of downstream genes.
  • a constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell under most or all physiological conditions of the cell.
  • An inducible promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the gene product being produced in the cell substantially only when an inducer corresponding to the promoter is present in the cell.
  • a tissue-specific promoter is a nucleotide sequence that, when operably linked to a polynucleotide that encodes or defines a gene product, results in the gene product being produced in the cell substantially only when the cell is of the tissue type corresponding to the promoter.
  • the term "host cell" refers to cells into which vectors can be introduced, including but not limited to: prokaryotic cells such as Escherichia coli or Bacillus subtilis, fungal cells such as yeast cells or Aspergillus, or animal cells such as fibroblasts, CHO cells, COS cells, NS0 cells, HeLa cells, BHK cells, HEK 293 cells or other human cells.
  • "expression" refers to the process of transcription from a DNA template into a polynucleotide (such as mRNA or another RNA transcript) and/or the subsequent translation of the transcribed mRNA into a peptide, polypeptide or protein. Transcripts and encoded polypeptides may be collectively referred to as "gene products" or "gene expression products." "Expression" of a gene or nucleic acid as used herein encompasses not only cellular gene expression, but also transcription and translation of one or more nucleic acids in a cloning system or in any other context.
  • linker refers to a group that connects a protein and a modification moiety.
  • the group may be an amino acid, amino acid sequence or other chemical group.
  • a “linker” refers to a linear polypeptide formed by linking one or more amino acid residues through peptide bonds, and the amino acid residues may be natural or non-natural, for example, may be modified.
  • the linker of the present invention may be an artificially synthesized amino acid sequence, or a naturally occurring polypeptide sequence, such as a polypeptide having a hinge region function. Such linker polypeptides are well known in the art.
  • linkers may be newly discovered or well known in the art, examples of which include, but are not limited to, linkers comprising one or more (e.g., 1, 2, 3, 4 or 5) amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA or Ava), or PEG, etc.
  • the gRNA of the present invention may contain one or more modifications (for example, base modification, backbone modification, etc.) to provide the same function as the unmodified gRNA, or to provide new or enhanced features to the gRNA (for example, improved stability).
  • suitable gRNAs containing modifications include gRNAs containing modified backbones or non-natural internucleoside linkages.
  • gRNA modifications include, for example, phosphorothioate modification, 2'-O-methyl modification, 2'-O-methoxyethyl (MOE) modification, 2'-deoxy modification, phosphorothioate internucleotide linkage, phosphonoacetate (PACE) internucleotide linkage, thiophosphonoacetate (thioPACE) internucleotide linkage, locked nucleic acid (LNA), or substitution of a cyclohexenyl ring for the furanose ring.
  • the furanose ring or the furanose ring and the internucleotide bond of the gRNA of the present invention can be replaced by a non-furanose group.
  • One such nucleic acid (which has been shown to have excellent hybridization properties) is known as peptide nucleic acid (PNA).
  • the sugar backbone of the polynucleotide is replaced by an amide-containing backbone.
  • the furanose ring in the gRNA molecule can also be replaced by a cyclohexenyl ring, called cyclohexenyl nucleic acid (CeNA).
  • the gRNA of the present invention may also include base modifications or substitutions.
  • the gRNA of the invention may comprise unmodified or natural bases (e.g., the purine bases adenine A and guanine G and the pyrimidine bases thymine T, cytosine C and uracil U).
  • the gRNA of the present invention may contain modified bases, including, for example, other synthetic and natural bases such as 5-methylcytosine, 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, other derivatives of adenine and guanine, 5-uracil (pseudouracil), 4-thiouracil, other derivatives of cytosine, other derivatives of uracil, and derivatives of thymine.
  • the modification can be at any position in the gRNA molecular structure.
  • the 5' end or 3' end of the gRNA may have additional nucleotides connected to the guide sequence.
  • the 5' end may contain 2 additional guanine nucleotides for increased targeting specificity.
  • the terms “delivery particle”, “delivery particle system” and “particle” are used interchangeably.
  • the particles are used to deliver the Cas13 protein, conjugate, gRNA, complex, nucleic acid, composition, etc. of the present invention.
  • Several types of delivery particle systems and/or formulations are known for use in a diverse range of biomedical applications.
  • particles are defined as small objects that behave as integral units with respect to their transport and properties.
  • Particles are further classified according to diameter.
  • the size of the coarse particles is between 2500-10000 nanometers.
  • the size of the fine particles is between 100-2500 nanometers.
  • the size of ultrafine particles or nanoparticles is generally between 1-100 nanometers.
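  • As an illustration only, the three diameter ranges above can be written as a small lookup. The function name `classify_particle` is hypothetical, and because the stated ranges share their endpoints (100 nm and 2500 nm), assigning a boundary value to the smaller class is an assumption made here, not something the text specifies.

```python
def classify_particle(diameter_nm: float) -> str:
    """Classify a particle by diameter in nanometers, using the ranges in the text.

    Assumption: a value lying exactly on a shared boundary (100 or 2500 nm)
    is assigned to the smaller size class.
    """
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    if 1 <= diameter_nm <= 100:
        return "ultrafine (nanoparticle)"
    if 100 < diameter_nm <= 2500:
        return "fine"
    if 2500 < diameter_nm <= 10000:
        return "coarse"
    return "outside the stated ranges"


if __name__ == "__main__":
    for d in (50, 500, 5000):
        print(d, "nm ->", classify_particle(d))
```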
  • Particle characterization (including, for example, characterizing morphology, size, etc.) can be performed using a variety of different conventional techniques.
  • Delivery particle systems within the scope of the present invention may be provided in any form, including but not limited to: liposomes (including, for example, immunoliposomes), virosomes (including, for example, artificial virosomes), extracellular vesicles (including, for example, exosomes, microvesicles and apoptotic bodies), particles (e.g., nanoparticles), microvesicles, gene guns, electroporation, sonoporation, calcium phosphate-mediated transfection, cationic transfection, dendritic transfection, heat shock transfection, nucleofection, magnetofection, lipofection, piercing transfection, optical transfection, nucleic acid uptake enhanced by proprietary reagents, and microinjection.
  • exosomes are endogenous nanovesicles that transport certain substances, including but not limited to RNA and proteins.
  • Liposomes are spherical vesicular structures consisting of a unilamellar or multilamellar lipid bilayer surrounding an inner aqueous compartment, together with a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes have received considerable attention as drug delivery vehicles because they are biocompatible and nontoxic, can deliver both hydrophilic and lipophilic drug molecules, protect their contents from degradation by plasma enzymes, and transport their cargo across biological membranes and the blood-brain barrier. Liposomes can be made from several different types of lipids; however, phospholipids are most commonly used to generate liposomes as drug carriers. Several other additives can be added to liposomes in order to modify their structure and properties. Delivery or administration according to the invention can be performed using liposomes.
  • the cells of the present invention include, but are not limited to: prokaryotic cells such as Escherichia coli cells, and eukaryotic cells such as yeast cells, insect cells, plant cells and animal cells (such as mammalian cells, such as mouse cells, human cells, etc., such as human stem cells, human stem cell lines such as human hematopoietic stem cells, hematopoietic progenitor cells, etc.).
  • eukaryotic cell includes, but is not limited to, eg, host cells, cell lines, and cell progeny.
  • the host cells, cell lines and cell progeny can optionally be in vitro, ex vivo or in vivo.
  • drug means a drug that is administered to a subject.
  • medicament means a drug that confers some beneficial effect when administered to a subject.
  • beneficial effects include achievement of diagnostic certainty; amelioration of the disease, symptom, disorder, or pathological condition; reduction or prevention of onset of the disease, symptom, disorder, or pathological condition; and generally combating the disease, symptom, disorder, or pathological condition.
  • the term "subject" includes, but is not limited to, various animals, such as mammals, for example bovines, equines, ovines, porcines, canines, felines, lagomorphs, rodents (e.g., mice or rats), non-human primates (e.g., macaques or cynomolgus monkeys) or humans.
  • the subject (e.g., a human) has a disorder (e.g., a disorder caused by a defect in a disease-associated gene).
  • an effective amount refers to an amount of an agent sufficient to achieve a beneficial or desired result.
  • a therapeutically effective amount may vary depending on one or more of the subject receiving treatment and the disease condition, the weight and age of the subject, the severity of the disease condition, the mode of administration, etc., and can be readily determined by one of ordinary skill in the art.
  • the term also applies to a dose that provides an image for detection by any of the imaging methods described herein.
  • the specific dosage may vary depending on one or more of: the particular agent chosen, the dosing regimen followed, whether it is given in combination with other compounds, the time of administration, the tissue to be imaged, and the physical delivery system carrying it.
  • administering to an individual can occur in vitro, ex vivo, or in vivo.
  • Conservative substitution refers to the replacement (i.e., substitution) of an amino acid with another amino acid having similar properties.
  • the properties include, but are not limited to, the ionicity, hydrophobicity and molecular weight of the molecule.
  • substitutions can be, for example, (1) substitutions between aromatic amino acids (Phe, Trp, Tyr), (2) substitutions between non-polar aliphatic amino acids (Gly, Ala, Val, Leu, Met, Ile, Pro), (3) substitutions between uncharged polar amino acids (Ser, Thr, Cys, Asn, Gln), (4) substitutions between basic amino acids (Lys, Arg, His), or (5) substitutions between acidic amino acids (Asp, Glu).
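  • The five groups listed above can be expressed as a simple membership check. This is a minimal sketch; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and treating "both residues in the same listed group" as the test is an assumption based only on the examples given.

```python
# Amino acid groups copied from the five example categories in the text.
CONSERVATIVE_GROUPS = {
    "aromatic": {"Phe", "Trp", "Tyr"},
    "non-polar aliphatic": {"Gly", "Ala", "Val", "Leu", "Met", "Ile", "Pro"},
    "uncharged polar": {"Ser", "Thr", "Cys", "Asn", "Gln"},
    "basic": {"Lys", "Arg", "His"},
    "acidic": {"Asp", "Glu"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if both residues fall in the same listed group.

    Identical residues are not a substitution, so they return False.
    """
    if original == replacement:
        return False
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


if __name__ == "__main__":
    print(is_conservative("Lys", "Arg"))  # both basic
    print(is_conservative("Asp", "Lys"))  # acidic vs basic
```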
  • Embodiment 1 Screening of Cas13 protein
  • the Cas13 protein of the present invention is obtained by the following method:
  • the protein sequences encoded within 10 kb upstream and downstream of the CRISPR array were compared with known Cas13 proteins, and comparison results with an E-value greater than 1e-5 were filtered out.
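  • The E-value cutoff in this screening step can be sketched as a filter over alignment hits. The `Hit` record, its field names and the example identifiers below are hypothetical; the text does not state which alignment tool or output format was used at this step, so only the threshold itself (hits with an E-value greater than 1e-5 are discarded) is taken from the description.

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 1e-5  # hits with an E-value greater than this are discarded


class Hit(NamedTuple):
    """One alignment between a candidate protein and a known Cas13 (illustrative)."""
    query_id: str
    subject_id: str
    evalue: float


def keep_significant(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep only hits whose E-value does not exceed the cutoff."""
    return [hit for hit in hits if hit.evalue <= cutoff]


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example = [
        Hit("candidate_A", "known_Cas13_X", 3e-40),
        Hit("candidate_B", "known_Cas13_Y", 0.2),
    ]
    print(keep_significant(example))
```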
  • Cas13 protein Cas13m.1 (SEQ ID NO: 1), Cas13m.2 (SEQ ID NO: 2), Cas13m.3 (SEQ ID NO: 3), Cas13m.4 (SEQ ID NO:4), Cas13m.5 (SEQ ID NO:5), Cas13m.6 (SEQ ID NO:60), CasRfg.1 (SEQ ID NO:6), CasRfg.2 (SEQ ID NO:7).
  • the amino acid sequence of the above-mentioned Cas13 protein is shown in Table 1 below.
  • the two RxxxxH (x represents any amino acid residue) motifs in each Cas13 protein are underlined.
  • some Cas13 protein sequences such as Cas13m.1, Cas13m.3
  • the amino acid sequences of the 5 proteins Cas13m.1-Cas13m.5 or the 6 proteins Cas13m.1-Cas13m.6 were compared, and the positions corresponding to the RxxxxH motifs of other proteins in the comparison results were identified as
  • the RxxxxH motif identified as the catalytic active center of the Cas13m.1 and Cas13m.3 proteins is also underlined in the above table.
  • the comparison results also show that the six proteins Cas13m.1-Cas13m.6 contain RNxYxH and RNxxxH motifs sequentially from N-terminus to C-terminus, and x is independently selected from naturally occurring amino acid residues.
  • CasRfg.1 and CasRfg.2 proteins contain RxxxxH motifs and RNxxxH motifs sequentially from N-terminus to C-terminus.
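  • The motif patterns named above (RxxxxH, RNxYxH and RNxxxH, where x is any naturally occurring amino acid residue) can be located in a one-letter protein sequence with regular expressions. This is a minimal sketch: the helper `find_motif` and the toy sequence are hypothetical and do not reproduce any SEQ ID NO of this application; positions are reported 1-based.

```python
import re

# One-letter codes of the 20 standard amino acids, standing in for "x".
AA = "[ACDEFGHIKLMNPQRSTVWY]"

MOTIFS = {
    "RxxxxH": f"R{AA}{{4}}H",
    "RNxYxH": f"RN{AA}Y{AA}H",
    "RNxxxH": f"RN{AA}{{3}}H",
}


def find_motif(sequence: str, motif: str):
    """Return (1-based position, matched text) for every occurrence of a motif.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile(f"(?=({MOTIFS[motif]}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence for illustration only (not a real Cas13 sequence).
    toy = "MKRNAYLHGG" + "A" * 10 + "RNQLEHKK"
    hits = find_motif(toy, "RxxxxH")
    print(hits)
    # Distance between the start positions of the first and last motif.
    print(hits[-1][0] - hits[0][0])
```

  • The same position list gives the spacing between the two motifs of a protein, which is the quantity discussed for Figure 2.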
  • the source of the genome sequence of the above-mentioned Cas13 protein is shown in Table 2 below.
  • the natural (wild type) DNA coding sequence of the above-mentioned Cas13 protein is as follows:
  • the wild-type DNA coding sequence of Cas13 protein Cas13m.1 is shown in SEQ ID NO:8;
  • the wild-type DNA coding sequence of Cas13 protein Cas13m.2 is shown in SEQ ID NO:9;
  • the wild-type DNA coding sequence of Cas13 protein Cas13m.3 is shown in SEQ ID NO:10;
  • the wild-type DNA coding sequence of Cas13 protein Cas13m.4 is shown in SEQ ID NO:11;
  • the wild-type DNA coding sequence of Cas13 protein Cas13m.5 is shown in SEQ ID NO:12;
  • the wild-type DNA coding sequence of Cas13 protein Cas13m.6 is shown in SEQ ID NO:61;
  • the wild-type DNA coding sequence of Cas13 protein CasRfg.1 is shown in SEQ ID NO:13;
  • the wild-type DNA coding sequence of Cas13 protein CasRfg.2 is shown in SEQ ID NO:14.
  • the locus structures of the above-mentioned Cas13 proteins are shown in Figure 1, which compares the locus structures of the Cas13 proteins of the present invention with those of each subtype of previously disclosed Cas13. In the figure, CRISPR denotes the CRISPR array (containing the DNA sequence corresponding to the DR sequence), and Cas13e.1 and Cas13f.1 are derived from the Chinese patent with publication number CN112410377A. It can be seen from the figure that the locus structures of Cas13m.1-Cas13m.5 have basically the same characteristics, and that the locus structures of Cas13m.1-Cas13m.6 have basically the same characteristics.
  • Cas13 protein | Corresponding direct repeat sequence | Sequence number
    Cas13m.1 | GUUGUUACAGCCCUUAGUUUGUAGGGUAAUGACAAC | SEQ ID NO:15
    Cas13m.2 | GUUGUAGAUGACCUCGUUUUGGAGGGGAAACACAAC | SEQ ID NO:16
    Cas13m.3 | GUUGUAGAAGCCGUUCAUUCGGGACGGUAUGACAAC | SEQ ID NO:17
    Cas13m.4 | GUUGUAAAUACCCACGUUUUGGUGGGCUAAUACAAC | SEQ ID NO:18
    Cas13m.5 | GUUGUGUGUGCCUUUCAAAUUGAAGGCGUUCCCAAC | SEQ ID NO:19
    Cas13m.6 | GUUGUAGAAGCCUAUCGUUAGGAUAGGUAUGACAAC | SEQ ID NO:62
    CasRfg.1 | AUGACUAUACCAGCAAUGGCUGGAUUAAAAC | SEQ ID NO:20
  • Fig. 2 shows the positions of the RxxxxH motifs of each Cas13 protein of the present invention in the amino acid chain. The two RxxxxH motifs of the Cas13m.1, Cas13m.2, Cas13m.3, Cas13m.4, Cas13m.5 and Cas13m.6 proteins are clearly far apart: the distance between the two RxxxxH motifs is 923 aa in Cas13m.1, as much as 1061 aa in Cas13m.3, 1011 aa in Cas13m.5, and 1011 aa in Cas13m.6.
  • the RNA secondary structures corresponding to the direct repeat sequences of the Cas13m.1-Cas13m.6, CasRfg.1, and CasRfg.2 proteins of the present invention were predicted using RNAfold, as shown in Figure 4. It can be seen from the figure that the DR sequences corresponding to Cas13m.1-Cas13m.6 have a conserved secondary structure.
  • the direct repeat sequences corresponding to Cas13m.1, Cas13m.2, Cas13m.3, Cas13m.4 and Cas13m.5 clearly share a conserved secondary structure. In the figure, A is a schematic diagram of the conserved secondary structure, which comprises a complementary paired first stem (stem 1), a non-complementary bulge, a complementary paired second stem (stem 2), and a non-complementary loop; stem 1 and stem 2 each contain complementary paired bases. B-F are the secondary structures of the direct repeat sequences corresponding to Cas13m.1, Cas13m.2, Cas13m.3, Cas13m.4 and Cas13m.5, respectively, wherein stem 1 contains 4 base pairs (5'-GUUG-3'), or 5 base pairs (5'-GUUGU-3'), or 6 base pairs (5'-GUUGUA-3'), or
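  • The stem 1 pairing described for these direct repeats can be checked directly on the sequences listed for SEQ ID NO:15-19 and 62 by counting how many consecutive Watson-Crick pairs form between the 5' end and the 3' end. This is a simplified sketch, not an RNAfold prediction: the function `terminal_stem_length` is hypothetical, and the choice not to count G-U wobble pairs is an assumption.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately not counted (assumption).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Direct repeat sequences as listed for the Cas13m proteins.
DIRECT_REPEATS = {
    "Cas13m.1": "GUUGUUACAGCCCUUAGUUUGUAGGGUAAUGACAAC",
    "Cas13m.2": "GUUGUAGAUGACCUCGUUUUGGAGGGGAAACACAAC",
    "Cas13m.3": "GUUGUAGAAGCCGUUCAUUCGGGACGGUAUGACAAC",
    "Cas13m.4": "GUUGUAAAUACCCACGUUUUGGUGGGCUAAUACAAC",
    "Cas13m.5": "GUUGUGUGUGCCUUUCAAAUUGAAGGCGUUCCCAAC",
    "Cas13m.6": "GUUGUAGAAGCCUAUCGUUAGGAUAGGUAUGACAAC",
}


def terminal_stem_length(rna: str) -> int:
    """Count consecutive Watson-Crick pairs between the 5' and 3' ends."""
    left, right = 0, len(rna) - 1
    pairs = 0
    while left < right and (rna[left], rna[right]) in WATSON_CRICK:
        pairs += 1
        left += 1
        right -= 1
    return pairs


if __name__ == "__main__":
    for name, dr in DIRECT_REPEATS.items():
        print(name, terminal_stem_length(dr))
```

  • Under these assumptions the counts fall between 4 and 6 for all six repeats, consistent with the stem 1 lengths stated above.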
  • CasRfg.1: the CasRfg.1 protein was compared with the Cas13 proteins included in NCBI using BLASTp; among the Cas13 subtypes, the comparison with Cas13c gave the lowest E-value. Combined with the phylogenetic tree analysis in Figure 3, CasRfg.1 was classified as a Cas13c subtype.
  • CasRfg.2: the CasRfg.2 protein was compared with the Cas13 proteins included in NCBI using BLASTp; among the Cas13 subtypes, the comparison with Cas13d gave the lowest E-value. Combined with the phylogenetic tree analysis in Figure 3, CasRfg.2 was classified as a Cas13d subtype.
  • Embodiment 2 Preparation, separation and purification of Cas13 protein
  • the constructed recombinant vectors were named as Cas13m.1-pET28a, Cas13m.2-pET28a, Cas13m.3-pET28a, Cas13m.4-pET28a, Cas13m.5-pET28a, CasRfg.1-pET28a and CasRfg.2-pET28a.
  • the recombinant vectors are used to express, respectively, the Cas13m.1 recombinant protein (sequence shown in SEQ ID NO:22), Cas13m.2 recombinant protein (sequence shown in SEQ ID NO:23), Cas13m.3 recombinant protein (sequence shown in SEQ ID NO:24), Cas13m.4 recombinant protein (sequence shown in SEQ ID NO:25), Cas13m.5 recombinant protein (sequence shown in SEQ ID NO:26), CasRfg.1 recombinant protein (sequence shown in SEQ ID NO:27), and CasRfg.2 recombinant protein (sequence shown in SEQ ID NO:28).
  • the recombinant Cas13 series protein structure is His tag-NLS-Cas13-SV40NLS-nucleoplasmin NLS.
  • positive clones with the correct sequence were cultured overnight, and the plasmid was extracted and transformed into the expression strain Rosetta (DE3), which was spread on an LB plate containing kanamycin sulfate and cultured overnight at 37°C.
  • the recombinant Cas13 series protein structure contains the NLS sequence, and the 6 His at the N-terminal are used as purification tags, and the above-mentioned Cas13 series recombinant proteins are purified by IMAC (Ni Sepharose 6 Fast Flow, CYTIVA). SDS-PAGE electrophoresis of each purified recombinant protein shows a band in the 100-250 kDa interval.
  • Embodiment 3 Preparation, isolation and purification of Cas13m.6
  • the structure is His tag-NLS-Cas13-SV40NLS-nucleoplasmin NLS.
  • the final purified Cas13m.6 recombinant protein was subjected to SDS-PAGE electrophoresis, showing a band in the 100-250kDa interval.
  • EGFP enhanced green fluorescent protein
  • the spacer sequence targeting EGFP is: tgccgttcttctgcttgtcggccatgatat (SEQ ID NO: 30).
  • the exogenous EGFP expression vector sequence is shown in SEQ ID NO:31
  • the Cas13m.2 verification vector sequence is shown in SEQ ID NO:32
  • the Cas13m.3 verification vector sequence is shown in SEQ ID NO:33
  • the Cas13m.5 verification vector sequence is shown in SEQ ID NO:34
  • the CasRfg.2 verification vector sequence is shown in SEQ ID NO:35.
  • Both the Cas13m.1 verification vector and the Cas13m.4 verification vector have the same backbone sequence as the Cas13m.3 verification vector, and only the Cas13 protein coding sequence and the coding sequence of the DR sequence have been replaced accordingly.
  • the CasRfg.1 verification vector has the same backbone sequence as the CasRfg.2 verification vector, and only the Cas13 protein coding sequence and the coding sequence of the DR sequence have been replaced accordingly.
  • the above verification vector contains codon-optimized Cas13 protein coding sequence, which can express Cas13 protein linked with NLS, and can also express gRNA targeting EGFP containing the corresponding DR sequence of Cas13.
  • the guide sequence of the gRNA corresponds to the above-mentioned spacer sequence (SEQ ID NO: 30).
  • the vector to be verified is transfected into 293T cells
  • the plasmid expressing the exogenous gene EGFP (EGFP for short) and each of the above-mentioned verification vector plasmids were co-transfected into 293T cells in a 24-well plate at a ratio of 1:2 (300 ng:600 ng).
  • the transfection method is as follows:
  • 293T cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2×10⁵ cells were seeded per well of a 24-well plate in 500 μL.
  • the complex was added to 293T cells and mixed, and detected by flow cytometry after 48 hours.
  • the GFP fluorescence of the EGFP group is a
  • the GFP fluorescence of other groups is x.
  • the blank control group did not participate in the comparison.
  • the GFP fluorescence intensity of 293T group is 1073.55
  • the GFP fluorescence intensity of EGFP group is 8052219.55.
  • the Cas13 proteins significantly down-regulated the expression of EGFP, which shows that, guided by the gRNA, they can effectively reduce the mRNA level and exert editing activity in eukaryotic cells.
  • Cas13m.2 and Cas13m.3 down-regulated the expression of EGFP the most.
  • codon-optimized Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, and CasRx (a Cas13d) expression vectors carrying a universal gRNA backbone expression cassette were synthesized by a reagent company, namely Cas13m.2-BsaI (sequence shown in SEQ ID NO:36), Cas13m.3-BsaI (sequence shown in SEQ ID NO:37), Cas13m.5-BsaI (sequence shown in SEQ ID NO:38), CasRfg.2-BsaI (sequence shown in SEQ ID NO:39), and CasRx-BpiI (sequence shown in SEQ ID NO:40).
  • the endogenous sites selected for the experiment are AQp1 and PTBP1, wherein the 293T cell line (293T-AQp1 cells) with high expression of AQp1 was used for the verification of AQp1, and the 293T cell line was used for the verification of PTBP1.
  • the 293T cell line highly expressing AQp1 was constructed as follows: the vector Lv-AQp1-T2a-GFP, which overexpresses the AQp1 gene and the EGFP gene, was constructed; its sequence is shown in SEQ ID NO:41.
  • AQp1 is separated from EGFP by a 2A peptide.
  • lentivirus packaged with the Lv-AQp1-T2a-GFP plasmid was transduced into 293T cells to form a cell line stably overexpressing the AQp1 gene.
  • the guide sequence of the gRNA targeting AQp1 was selected as:
  • the guide sequence of the gRNA targeting PTBP1 was selected as:
  • fragments targeting the target sites were obtained by the primer annealing method, and the primers are as follows:
  • R CAACCCACGACCCTCTTTGTCTTC (SEQ ID NO: 50)
  • R CAACCCACGACCCTCTTTGTCTTC (SEQ ID NO: 50)
  • R CAACCCACGACCCTCTTTGTCTTC (SEQ ID NO: 50)
  • R AAAACCACGACCCTCTTTGTCTTC (SEQ ID NO: 52)
  • the primer annealing reaction system is as follows, incubate in the PCR instrument at 95°C for 5 minutes, then immediately take it out and incubate on ice for 5 minutes, so that the primers anneal to each other to form double-stranded DNA with sticky ends:
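  • The relationship between a reverse primer and its guide can be checked computationally. In the sketch below, the reverse primers CAACCCACGACCCTCTTTGTCTTC (SEQ ID NO: 50) and AAAACCACGACCCTCTTTGTCTTC (SEQ ID NO: 52) are compared with the AQp1 guide sequence GAAGACAAAGAGGGUCGUGG (SEQ ID NO: 42) given elsewhere in this description: each primer ends with the reverse complement of the guide. Reading the remaining four 5' nucleotides as the sticky-end overhang is an interpretation, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def split_reverse_primer(primer: str, guide_rna: str):
    """Split a reverse primer into (5' overhang, reverse complement of the guide).

    Raises ValueError if the primer does not end with the reverse complement
    of the DNA form of the guide.
    """
    guide_dna = guide_rna.upper().replace("U", "T")
    insert = reverse_complement(guide_dna)
    primer = primer.upper()
    if not primer.endswith(insert):
        raise ValueError("primer does not match the guide sequence")
    return primer[: len(primer) - len(insert)], insert


if __name__ == "__main__":
    guide = "GAAGACAAAGAGGGUCGUGG"  # SEQ ID NO: 42
    for primer in ("CAACCCACGACCCTCTTTGTCTTC", "AAAACCACGACCCTCTTTGTCTTC"):
        print(split_reverse_primer(primer, guide))
```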
  • the CasRx-BpiI plasmid was digested with BpiI endonuclease; the annealed product and the backbone purified and recovered after digestion were joined by T4 ligation; and after transformation into Escherichia coli, positive clones were selected and the verification vector plasmids targeting the endogenous gene mRNA were extracted for verification in cell experiments.
  • the vector to be verified is transfected into 293T cells and 293T-AQp1 cells
  • the Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, and CasRx targeting PTBP1 plasmids obtained in the previous step were transfected into 293T cells in a 24-well plate at 800ng.
  • the negative control group was transfected with CasRx-BpiI plasmid.
  • the transfection method is as follows:
  • the complex was added to the cells and mixed, and detection was performed after 72 hours using the QuantStudio™ 5 Real-Time PCR System, 96-well.
  • RNA samples at 72 hours after transfection were extracted using the SteadyPure Universal RNA Extraction Kit AG21017 kit for RNA extraction, and the mRNA concentration was detected using an ultramicro spectrophotometer.
  • the mRNA product was reverse transcribed using the Evo M-MLV Mix Kit with gDNA Clean for qPCR AG11728 Reverse Transcription Kit, and the reverse transcription product was detected using the SYBR Green Premix Pro Taq HS qPCR Kit (Low Rox Plus) AG11720qPCR Kit.
  • the primers used in qPCR are as follows:
  • the reaction system was configured according to the instructions of SYBR Green Premix Pro Taq HS qPCR Kit (Low Rox Plus) AG11720, and the QuantStudio TM 5 Real-Time PCR System, 96-well was used for detection.
  • ΔCt = Ct(AQp1) − Ct(GAPDH) or Ct(PTBP1) − Ct(GAPDH);
  • ΔΔCt = ΔCt (sample to be verified, such as the Cas13m.2 group) − ΔCt (negative control group);
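  • The two definitions above can be combined into a relative expression value. The sketch below assumes the widely used 2^(−ΔΔCt) conversion; the final conversion formula is not shown at this point in the text, so its use here is an assumption. The function names are hypothetical and the Ct values in the usage example are invented for illustration, not measured data.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCt = Ct(target gene) - Ct(reference gene, e.g. GAPDH)."""
    return ct_target - ct_reference


def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Relative target mRNA amount versus the negative control.

    Assumes the 2^(-ΔΔCt) method, where
    ΔΔCt = ΔCt(sample) - ΔCt(negative control).
    """
    ddct = delta_ct(ct_target_sample, ct_reference_sample) - delta_ct(
        ct_target_control, ct_reference_control
    )
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Invented Ct values, for illustration only.
    print(relative_expression(26.0, 18.0, 24.0, 18.0))
```

  • A value below 1 indicates less target mRNA in the sample than in the negative control group.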
  • Cas13m.2, Cas13m.3, Cas13m.5, and CasRfg.2 all had the effect of down-regulating the expression of AQp1 and PTBP1.
  • Cas13m.2 and Cas13m.3 have better editing activity than CasRx in down-regulating the expression of genes AQp1 and PTBP1.
  • Cas13m.5 and CasRfg.2 also have significant editing activity.
  • the endogenous site selected in this experiment is AQp1, and the 293T cell line highly expressing AQp1 in the previous examples was used to verify AQp1.
  • the guide sequence of the gRNA targeting AQp1 is GAAGACAAAGAGGGUCGUGG (SEQ ID NO: 42).
  • the verification vector used is as follows
  • the verification vectors of Cas13m.2, Cas13m.3, Cas13m.5, and CasRfg.2 targeting the endogenous gene AQp1 mRNA were constructed in Example 5. The verification vectors Cas13m.2-r, Cas13m.3-r, Cas13m.5-r, and CasRfg.2-r targeting the endogenous gene AQp1 mRNA, which have an adjusted gRNA structure (the positions of the guide sequence and the direct repeat sequence are reversed) and whose sequences other than the gRNA coding sequence are identical to those of the Cas13m.2, Cas13m.3, Cas13m.5, and CasRfg.2 verification vectors, were synthesized by a reagent company.
  • the vector to be verified is transfected into 293T cells and 293T-AQp1 cells
  • the control plasmid (the CasRx verification vector plasmid targeting the endogenous gene AQp1 mRNA from Example 5 above) was transfected into 293T-AQp1 cells in a 24-well plate at 800 ng.
  • the negative control group was transfected with the CasRx-BpiI plasmid in Example 5 above.
  • the transfection method is as follows:
  • the complex was added to the cells and mixed, and detection was performed after 72 hours using the QuantStudio™ 5 Real-Time PCR System, 96-well.
  • RNA concentration was detected using an ultramicro spectrophotometer.
  • the RNA product was reverse transcribed using the Evo M-MLV Mix Kit with gDNA Clean for qPCR AG11728 Reverse Transcription Kit, and the reverse transcription product was detected using the SYBR Green Premix Pro Taq HS qPCR Kit (Low Rox Plus) AG11720qPCR Kit.
  • the primers used for qPCR include the primer pair for detecting AQp1 shown in SEQ ID NO:56-57, and the primer pair for detecting internal reference GAPDH shown in SEQ ID NO:58-59.
  • ΔΔCt = ΔCt (sample to be verified, such as the Cas13m.2 group) − ΔCt (negative control group);
  • the amount of AQp1 mRNA is calculated as shown in Table 7 below:
  • the qPCR results showed that the editing activity of Cas13m.2-r, Cas13m.3-r, Cas13m.5-r, and CasRfg.2-r was significantly reduced after changing the relative position of the direct repeat sequence and the guide sequence.
  • Example 7 The editing activity of Cas13m.6 on exogenous genes in cells, and the activity comparison between Cas13m and published proteins
  • this example uses the same method as Example 4 to carry out the experiment.
  • a Cas13m.6 verification vector targeting EGFP was prepared, and the full-length sequence is shown in SEQ ID NO: 105 (7690bp).
  • a search of NCBI showed that NCBI has disclosed two Cas13 proteins, namely the C13-38 protein (GenBank: MBQ9236733.1) and the C13-40 protein (NCBI Reference Sequence: WP_025000926.1), together with the corresponding DR sequences.
  • the inventors compared the gene editing activity of Cas13m with C13-38 and C13-40.
  • the C13-38 sequence is:
  • the DR sequence corresponding to C13-38 is:
  • the C13-40 sequence is:
  • the DR sequence corresponding to C13-40 is:
  • the C13-38 verification vector and the C13-40 verification vector were constructed by referring to the above method; their backbone sequences are the same as that of the Cas13m.3 verification vector in Example 4, and only the Cas13 protein coding sequence and the coding sequence of the DR sequence were replaced accordingly.
  • the above-mentioned Cas13m.6 verification vector, C13-38 verification vector, and C13-40 verification vector all contain the Cas13 protein coding sequence, can express the Cas13 protein linked with NLS, and can also express the gRNA targeting EGFP (in the gRNAs corresponding to C13-38 and C13-40, the guide sequence is located at the 5' end of the DR sequence).
  • the guide sequence of the gRNA corresponds to the above-mentioned spacer sequence (SEQ ID NO: 30).
  • the above vectors were all synthesized by reagent companies using conventional methods.
  • the vector to be verified is transfected into 293T cells
  • the plasmid expressing the exogenous gene EGFP and the Cas13 verification vector plasmid were co-transfected into 293T cells at a ratio of 300 ng:600 ng.
  • the experiment of this embodiment was repeated three times.
  • the result data is the average value of three tests, and the results are shown in Table 9 below.
  • this example uses the same method as Example 5.
  • the expression vector Cas13m.6-BsaI was constructed, and its sequence is shown in SEQ ID NO:77.
  • the endogenous sites selected for the experiment were AQp1 and PTBP1, wherein the aforementioned 293T cell line (293T-AQp1 cells) with high expression of AQp1 was used for verification of AQp1, and the 293T cell line was used for verification of PTBP1.
  • the guide sequence of the gRNA targeting AQp1 was selected as:
  • the guide sequence targeting PTBP1 was selected as:
  • fragments targeting the target sites were obtained by the primer annealing method, and the primers are as follows:
  • Cas13m.6 group caccGTGGTTGGAGAACTGGATGTAGATGGGCTG (SEQ ID NO: 44)
  • CasRx panel SEQ ID NO:46 and SEQ ID NO:48.
  • Cas13m.6 group CACCGagggcagaaccgatgctgatgaagac (SEQ ID NO: 69)
  • CasRx group aaacagggcagaaccgatgctgatgaagac (SEQ ID NO: 71)
  • the Cas13m.6-BsaI vector was digested with BsaI and joined to the annealed product by T4 ligation.
  • the CasRx-BpiI plasmid was digested with BpiI and joined to the annealed product by T4 ligation.
  • the verified vectors targeting endogenous genes AQp1 and PTBP1 were obtained.
  • the vector to be verified is transfected into 293T cells and 293T-AQp1 cells
  • the verification vector was transfected into 293T cells and 293T-AQp1 cells respectively.
  • the negative control group was transfected with CasRx-BpiI plasmid.
  • Embodiment 9 Identify the key amino acid residues of Cas13m protein
  • Cas13m.2, Cas13m.3, and Cas13m.6 showed the highest level of knockdown, followed by Cas13m.5.
  • Cas13m.2 and Cas13m.3 showed a higher level of knockdown than Cas13m.1, Cas13m.4 and Cas13m.5.
  • the three-dimensional spatial positions of the motifs 1-15 of the Cas13m protein are very similar to those of the corresponding sequences of PbuCas13b. Taking Cas13m.6 as an example, it is overlaid with PbuCas13b.
  • A-N of Figure 10 shows the overlapping of motifs 1-15 of Cas13m.6 and the corresponding sequence of PbuCas13b.
  • motif 1-motif 13 can make Cas13m.2, Cas13m.3, and Cas13m.6 proteins interact with their corresponding DR sequences, and motifs 14 and 15 are catalytic active centers.
  • if homologous proteins or mutants of Cas13m.2, Cas13m.3 or Cas13m.6 contain motif 1-motif 15, they are also expected to display target nucleic acid binding activity or endonuclease activity, especially when amino acid residues outside motif 1-motif 15 are conservatively substituted relative to the wild-type sequence.
  • the above-mentioned homologous proteins or mutants can have ≥50% sequence identity with the Cas13m.2, Cas13m.3 or Cas13m.6 protein (for example ≥60%, ≥70%, ≥80%, ≥85%, ≥90%, >95%, >96%, >97%, >98%, >99%, or >99.5% sequence identity).
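  • As a rough illustration of how a sequence identity percentage can be computed, the sketch below counts identical positions in two sequences that have already been aligned to the same length (gaps written as "-"). The function name `percent_identity` is hypothetical, and dividing by the number of alignment columns is only one of several conventions; the text does not say which tool or convention is used, so this is an assumption. The short example strings are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identity = identical non-gap positions / alignment columns * 100,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Invented toy alignment, for illustration only.
    print(percent_identity("ACD-", "ACDE"))
```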
  • the source of the above-mentioned homologous protein can also be of the same Kingdom, Phylum, Class, Order, Family, Genus or Species as the source of the Cas13m.2, Cas13m.3 or Cas13m.6 protein.
  • Cas13 proteins conjugates comprising the same, nucleic acids encoding these proteins or conjugates, vectors containing these nucleic acids, and methods of using these proteins/nucleic acids having such consensus motifs.
  • the PTBP1 gene was selected as the target gene for off-target verification.
  • control shRNA1 and shRNA2 use the 21 nt intercepted from the head and tail of the target sequence used by Cas13 as the target respectively, as follows:
  • shRNA1 target site GCCCATTCATCCATCCAGTTCTC (SEQ ID NO: 73)
  • shRNA2 target site CAGCCCATTCATCATCCAGTTC (SEQ ID NO:74)
  • the primers used to construct the control vector are shown in Table 12 below:
  • shRNA1 PTBP1-g3-shRNA-1F/PTBP1-g3-shRNA-1R
  • shRNA2 PTBP1-g3-shRNA-2F/PTBP1-g3-shRNA-2R to obtain annealed products.
  • the vector pAAV-CMV-EGFP was digested with BsaI and NotI to obtain a linearized backbone; the backbone was ligated to the annealed products of shRNA1 and shRNA2 respectively and then transformed into Escherichia coli to obtain the control vectors shRNA1 and shRNA2 (which express shRNA1 and shRNA2 respectively under the U6 promoter).
  • the vector CasRx-blank was constructed by conventional methods. CasRx-blank is based on the aforementioned CasRx-BpiI plasmid, with the coding sequence GGGTCTTCGAGAAGACCT (SEQ ID NO: 103) of the gRNA guide sequence replaced by GATCAACATTAAATGTGAGCGAGT (SEQ ID NO: 104) (the encoded gRNA can target Escherichia coli (E. coli) LacZ).
  • Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, and CasRx targeting PTBP1 plasmids obtained in Example 5 (respectively named as Cas13m.2-PTBP1, Cas13m.3-PTBP1, Cas13m.5 -PTBP1, CasRfg.2-PTBP1, CasRx-PTBP1 plasmids).
  • the sequence of the backbone vector pAAV-CMV-EGFP is shown in SEQ ID NO:79.
  • the vector to be verified is transfected into 293T cells
  • 500ng of the plasmid to be verified was transfected into 293T cells in a 24-well plate.
  • the transfection method is as follows:
  • 293T cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2×10^5 cells were seeded per well of a 24-well plate in 500 µL.
  • RNA was sent to the reagent company for RNA sequencing.
  • RNA-Seq was performed on the sample; the multiple fastq files obtained by sequencing were aligned to the reference genome of the target species with HISAT2 or STAR, giving multiple BAM files after alignment.
  • kallisto, RSEM or HTSeq was used to detect transcripts and quantify the expression level of each gene.
  • DEGs: differentially expressed genes
  • Cas13m.2, Cas13m.3, and Cas13m.5 have fewer potential off-target genes and have less impact on cellular gene expression profiles. Therefore, this characteristic of Cas13m.2, Cas13m.3, and Cas13m.5 will make them have better safety and lower toxicity when used for disease treatment.
  • the gRNA molecular sequence obtained by in vitro transcription is as follows:
  • RNaseAlert is a novel RNA substrate that is labeled with a fluorescent reporter molecule (fluorophore) at one end and a quencher at the other end.
  • the physical proximity of the quencher suppresses the fluorescence of the fluorophore to very low levels.
  • when RNase is present, the RNA substrate is cleaved and the fluorophore and quencher are separated.
  • the fluorophore emits a green fluorescent signal at 520 nm when excited by 490 nm light.
  • when Cas13 has collateral cleavage activity (that is, non-specific RNA cleavage activity activated by target RNA), the RNaseAlert substrate is also cleaved and emits a detectable green fluorescent signal.
  • the relative fluorescence intensities of Cas13m.2, Cas13m.3 and Cas13m.6 were all lower than 10 and did not increase with time; no collateral cleavage activity was observed.

Abstract

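The invention discloses an isolated Cas13 protein and uses thereof. The amino acid sequence of the isolated Cas13 protein comprises a sequence having ≥50% sequence identity to the sequence shown in any one of SEQ ID NO: 1-7 and SEQ ID NO: 60. The Cas13 protein is a Cas13 enzyme with endonuclease activity that can be used in CRISPR/Cas systems to target and modify target nucleic acids, expanding the enzymes and systems available for CRISPR-Cas editing.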

Description

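Isolated Cas13 protein and uses thereof
This application claims priority to Chinese patent application 2021113061499, filed on 2021/11/5, and Chinese patent application 2022105188261, filed on 2022/5/13. The full text of the above Chinese patent applications is incorporated herein by reference.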
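Technical Field
The invention relates to the technical field of gene editing, and in particular to an isolated Cas13 protein and uses thereof.
Background
CRISPR-Cas13 is an RNA-targeting and editing system based on a bacterial immune system that protects bacteria from viral attack. It is broadly similar to the CRISPR-Cas9 system, but unlike the DNA-targeting CRISPR-Cas9 system, Cas13 proteins target and cleave RNA.
CRISPR-Cas13 is the type VI system of Class 2 of the CRISPR-Cas systems. It contains a single effector protein, Cas13, which assembles with CRISPR RNA (crRNA) into a crRNA-guided RNA-targeting effector complex. Many Cas13 proteins have two distinct ribonuclease activities: one RNase activity processes pre-crRNA to form the mature type VI interference complex, while the other is provided by two Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) domains. The HEPN domains help cleave RNA, for example ssRNA, and when the target RNA contains folded structure, Cas13 generally cleaves preferentially in non-base-paired ssRNA regions.
Based on phylogeny, CRISPR-Cas13 can currently be divided into multiple subtypes (A, B, C and D).
The field has long sought new Cas13 proteins. Although thousands of Cas13 proteins have been found to date, few of them are active. For example, no RNA-targeting or RNA-modifying activity has been reported for most of them. The literature also indicates that once Cas13 is activated by target recognition, it cleaves RNA indiscriminately and induces dormancy or cell death.
Developing Cas13 proteins with RNA-targeting/modifying activity remains a challenge.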
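Summary of the Invention
In view of this, there is a need to address the above problems by providing an isolated Cas13 protein that has activity in binding, targeting and/or modifying target RNA.
The invention discloses an isolated Cas13 protein whose amino acid sequence comprises a sequence having ≥50% sequence identity to the sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 7 and SEQ ID NO: 60.
The above Cas13 proteins were obtained by the inventors after repeated screening and testing among numerous proteins. They have activities such as forming a complex with a gRNA, forming a complex with a gRNA and binding a target nucleic acid, being guided by a gRNA to a target nucleic acid, and/or targeting or modifying a target nucleic acid.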
可以理解的,上述Cas13蛋白的氨基酸序列也可以包含与SEQ ID NO:1-SEQ ID NO:7、SEQ ID NO:60任一项所示序列具有≥50%、≥60%、≥70%、≥75%、≥80%、≥85%、≥90%、≥92%、≥95%、≥96%、≥97%、≥98%、≥99%、≥99.5%或100%的序列同一性的序列,即上述序列所示氨基酸序列仅为所述Cas13蛋白的氨基酸序列的一部分,该Cas13蛋白还可包含其它功能或非功能结构域。上述Cas13蛋白的氨基酸序列也可以是与SEQ ID NO:1-SEQ ID NO:7、SEQ ID NO:60任一项所示序列具有≥50%、≥60%、≥70%、≥75%、≥80%、≥85%、≥90%、≥92%、≥95%、≥96%、≥97%、≥98%、≥99%、≥99.5%或100%的序列同一性的序列,即上述序列所示氨基酸序列构成的蛋白即为所述Cas13蛋白。
在其中一些实施例中,所述Cas13蛋白可与gRNA形成复合物。
在其中一些实施例中,所述Cas13蛋白可被gRNA引导至靶核酸。可以理解的,当Cas13蛋白被gRNA引导至靶核酸之后,可选地,可以靶向或修饰所述靶核酸,也可以不靶向且也不修饰所述靶核酸。例如,在一些情况下,所述Cas13蛋白被所述gRNA引导至靶核酸后,可以不靶向且不修饰所述靶核酸(例如不切割所述靶核酸),本领域技术人员可以仅利用其识别所述靶核酸的能力,如使靶核酸被结合但不被切割。在一些情况下,Cas13蛋白被gRNA引导至靶核酸后,可以靶向或修饰所述靶核酸(例如切割所述靶核酸),例如切割靶mRNA并因此降低翻译水平。
在其中一些实施例中,所述Cas13蛋白可被gRNA引导至靶核酸,并靶向或修饰所述靶核酸。
可以理解的,当该Cas13蛋白靶向靶核酸时,可以产生以下的一种或多种活性:切割一种或多种靶核酸,可视化或检测一种或多种靶核酸,标记一种或多种靶核酸,运输一种或多种靶核酸,掩蔽一种或多种靶核酸,结合一种或多种靶核酸,提高靶核酸对应基因的转录和/或翻译水平,和降低靶核酸对应基因的转录和/或翻译水平。
在其中一些实施例中,所述靶向所述靶核酸是切割所述靶核酸或结合所述靶核酸。
在其中一些实施例中,所述靶向所述靶核酸是结合所述靶核酸。所述结合可以是gRNA指导序列与靶序列碱基互补配对所导致的结合。
在其中一些实施例中,所述靶向所述靶核酸是切割所述靶核酸。
在其中一些实施例中,所述靶核酸是RNA。在其中一些实施例中,所述RNA任选自mRNA、miRNA、rRNA、tRNA、snRNA和结构RNA。所述Cas13蛋白可以被gRNA引导至所述靶核酸,然后可选地切割或不切割该靶核酸。
在其中一些实施例中,所述靶核酸是mRNA。在一些情况下,当靶核酸是mRNA时, 所述Cas13蛋白可以被gRNA引导至所述靶核酸,然后可选地切割或不切割靶核酸。
在其中一些实施例中,所述靶核酸是PTBP1(Polypyrimidine Tract Binding Protein 1)mRNA、AQp1(Aquaporin 1)mRNA、VEGFA(Vascular Endothelial Growth Factor A)mRNA、VEGFR1(Vascular endothelial growth factor receptor 1)mRNA或VEGFR2(Vascular endothelial growth factor receptor-2)mRNA。
在其中一些实施例中,所述靶核酸是PTBP1(Polypyrimidine Tract Binding Protein 1)mRNA或AQp1(Aquaporin 1)mRNA。也即用于敲低AQp1 mRNA水平,从而减少房水生成,使眼压降低,用于治疗青光眼等疾病;或者用于敲低PTBP1 mRNA水平,从而实现脑部星形胶质细胞向神经元的转分化,用于治疗帕金森氏症等疾病。在其中一些实施例中,所述靶核酸是VEGFA mRNA、VEGFR1 mRNA或VEGFR2 mRNA,通过敲低mRNA水平可以用于治疗年龄相关性黄斑变性(AMD)。
应理解,在此前的发展中,人们已经发现了很多基因/蛋白调控靶点与人畜/植物疾病、动植物性状等等具有相关性,基于本发明所建立的CRISPR系统对于此类靶点的结合、靶向或修饰均是可行的。
在其中一些实施例中,所述Cas13蛋白来源于:与具有包含SEQ ID NO:1-SEQ ID NO:7、SEQ ID NO:60任一项所示序列的氨基酸序列的蛋白来源相同的界(Kingdom)、门(Phylum)、纲(Class)、目(Order)、科(Family)、属(Genus)或种(Species)。
其中,包含SEQ ID NO:1所示序列的蛋白对应为Cas13m.1,包含SEQ ID NO:2所示序列的蛋白对应为Cas13m.2,包含SEQ ID NO:3所示序列的蛋白对应为Cas13m.3,包含SEQ ID NO:4所示序列的蛋白对应为Cas13m.4,包含SEQ ID NO:5所示序列的蛋白对应为Cas13m.5,包含SEQ ID NO:6所示序列的蛋白对应为CasRfg.1,包含SEQ ID NO:7所示序列的蛋白对应为CasRfg.2,包含SEQ ID NO:60所示序列的蛋白对应为Cas13m.6。
在其中一些实施例中,Cas13m.1蛋白来源于噬纤维菌目(Cytophagales bacterium),Cas13m.2蛋白来源于包含CNGB数据库中编号CNA0011077所示基因组的细菌,Cas13m.3蛋白来源于拟杆菌门(Bacteroidetes bacterium),Cas13m.4蛋白来源于包含CNGB数据库中编号CNA0007373所示基因组的细菌,Cas13m.5蛋白来源于拟杆菌门(Bacteroidetes bacterium),Cas13m.6蛋白来源于普雷沃氏菌科(Prevotellaceae bacterium),CasRfg.1蛋白来源于包含NCBI数据库中编号GCA_003940745.1所示基因组的细菌,CasRfg.2蛋白来源于包含CNGB数据库中编号CNA0009477所示基因组的细菌。
在其中一些实施例中,所述Cas13蛋白来源于:
1)污水宏基因组、Cytophagales bacterium、或拟杆菌属细菌;
2)具有与NCBI数据库中编号为GCA_003940745.1、GCA_013298125.1、 GCA_902762805.1或GCA_013298545.1所示基因组,或CNGB数据库中编号为CNA0011077、CNA0007373或CNA0009477所示基因组ANI值≥95%基因组的物种;
3)与污水WW分离株、bin5.concoct.b16b17b19.071、RUG10805或bin17.concoct.ball.095分离株基因组ANI值≥95%基因组的物种。
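Average nucleotide identity (ANI) is an index that evaluates, at the nucleic acid level, the similarity of all orthologous protein-coding genes between two genomes. For bacteria/archaea, a threshold of ANI = 95% is generally used to decide whether two genomes belong to the same species (Richter M, Rosselló-Móra R. Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci U S A. 2009 Nov 10;106(45):19126-31). The invention therefore uses this threshold: species whose genomes have an ANI ≥95% with the above genomes are considered the same species, and the Cas13 proteins therein are homologous to, and functionally similar to, the proteins claimed in the invention and fall within the scope of the invention.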
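The ≥95% ANI criterion can be illustrated with a minimal sketch. It assumes that per-gene percent identities and aligned lengths for orthologous gene pairs have already been obtained from some alignment tool; the length-weighted average used here is a simplification chosen for illustration, and the function names and input values are hypothetical rather than taken from the patent.

```python
def average_nucleotide_identity(pairs):
    """Length-weighted mean identity over orthologous gene pairs.

    pairs: list of (percent_identity, aligned_length_nt) tuples.
    This weighting is an illustrative assumption, not the patent's procedure.
    """
    pairs = list(pairs)
    total_length = sum(length for _, length in pairs)
    if total_length == 0:
        raise ValueError("no aligned positions supplied")
    weighted = sum(identity * length for identity, length in pairs)
    return weighted / total_length


def same_species(ani, threshold=95.0):
    """Apply the ANI >= 95% species criterion cited in the text."""
    return ani >= threshold


# Hypothetical values for two genome comparisons.
close_pair = average_nucleotide_identity([(98.0, 1000), (92.0, 1000)])
distant_pair = average_nucleotide_identity([(90.0, 500), (94.0, 1500)])
print(close_pair, same_species(close_pair))
print(distant_pair, same_species(distant_pair))
```

Dedicated ANI tools fragment and align whole genomes, so a real analysis would not start from hand-entered pairs as this sketch does.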
在其中一些实施例中,所述分离的Cas13蛋白来自包含与NCBI数据库中编号为GCA_003940745.1、GCA_013298125.1、GCA_902762805.1或GCA_013298545.1所示基因组,或CNGB数据库中编号为CNA0011077、CNA0007373或CNA0009477所示基因组的ANI值≥95%的基因组的物种(species)。
在其中一些实施例中,所述分离的Cas13蛋白来自包含与NCBI数据库中编号为GCA_013298125.1、GCA_902762805.1或GCA_013298545.1所示基因组,或CNGB数据库中编号为CNA0011077、CNA0007373或CNA0009477所示基因组的ANI值≥95%的基因组的物种(species)。
在其中一些实施例中,所述分离的Cas13蛋白来自包含NCBI数据库中编号为GCA_003940745.1、GCA_013298125.1、GCA_902762805.1或GCA_013298545.1所示基因组,或CNGB数据库中编号为CNA0011077、CNA0007373或CNA0009477所示基因组的细菌。
在其中一些实施例中,所述分离的Cas13蛋白来自污水WW分离株、bin5.concoct.b16b17b19.071、RUG10805或bin17.concoct.ball.095分离株。
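The invention also discloses an isolated Cas13 protein comprising the amino acid sequences shown in Motifs 1-15 below:
Motif 1: L-x(3)-R-N-x-Y-[ST]-H (SEQ ID NO: 84)
Motif 2: R-x(3)-K-x-[VI]-N-G-F-G-R (SEQ ID NO: 85)
Motif 3: P-Y-[IV]-T-x(5)-Y-x-[IV]-x(2)-N-x-I-G-L (SEQ ID NO: 86)
Motif 4: P-x-L-x(2)-D-x(3)-[NK]
Motif 5: P-x-[AC]-x-L-S-x(2)-[ED]-[LF]-P-A-x(2)-F (SEQ ID NO: 87)
Motif 6: [LI]-P-x-K-L
Motif 7: [KT]-x-[AL]-x(2)-[KVE]-[IL]
Motif 8: A-[DRK]-x-L-x(2)-[DS]-[MI]-[MV]-x-[FW]-Q-P (SEQ ID NO: 88)
Motif 9: K-L-T-x(2)-N (SEQ ID NO: 89)
Motif 10: F-x-[HR]-[AF]-x(5)-[QR]
Motif 11: I-x-L-P-x-G-[LM]-F-x(3)-I (SEQ ID NO: 90)
Motif 12: [LI]-I-x(2)-[YWF]-F
Motif 13: I-x(3)-I
Motif 14: [DN]-[TN]-E-x(2)-[IL]-[KR]-[VR]-Y-[KR]-x-Q-D (SEQ ID NO: 91)
Motif 15: R-N-[SA]-[FA]-x-H-x(2)-Y (SEQ ID NO: 92)
Here A, F, C, U, D, N, E, Q, G, H, L, I, K, O, M, P, R, S, T, V, W and Y are standard amino acid codes; "x" is any amino acid; the number in parentheses after x gives the number of consecutive x; "[]" encloses alternative amino acid codes, of which one is chosen; and "-" is a separator.
In some embodiments, the Cas13 protein comprises Motifs 1-15 in order from the N-terminus to the C-terminus.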
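The motif notation defined above ("x" for any residue, "x(n)" for n consecutive residues, "[]" for alternatives, "-" as separator) maps directly onto regular expressions. The following is a minimal sketch of that conversion; the function names and the test sequence are hypothetical, and only the motif strings come from the text.

```python
import re

# One motif element: "x", a single residue letter, or a bracketed set,
# optionally followed by a repeat count in parentheses.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'L-x(3)-R-N-x-Y-[ST]-H' into a regex."""
    parts = []
    for element in motif.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised motif element: {element!r}")
        token, count = match.groups()
        token = "." if token == "x" else token  # "x" means any amino acid
        parts.append(token + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_motif(motif: str, protein: str) -> list:
    """Return 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in re.finditer(motif_to_regex(motif), protein)]


MOTIF_1 = "L-x(3)-R-N-x-Y-[ST]-H"    # SEQ ID NO: 84
MOTIF_15 = "R-N-[SA]-[FA]-x-H-x(2)-Y"  # SEQ ID NO: 92

print(motif_to_regex(MOTIF_1))
print(motif_to_regex(MOTIF_15))
# Made-up peptide used only to exercise the function.
print(find_motif(MOTIF_1, "MLAAARNGYSHK"))
```

To check the N-to-C ordering of Motifs 1-15 in a candidate protein, one could run `find_motif` for each motif and confirm that the start positions increase; that step is left out here for brevity.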
In some embodiments, Motif 1 is selected from Motif 16, Motif 2 from Motif 17, Motif 3 from Motif 18, Motif 4 from Motif 19, Motif 5 from Motif 20, Motif 6 from Motif 21, Motif 7 from Motif 22, Motif 8 from Motif 23, Motif 9 from Motif 24, Motif 10 from Motif 25, Motif 11 from Motif 26, Motif 12 from Motif 27, Motif 13 from Motif 28, Motif 14 from Motif 29, and Motif 15 from Motif 30.
The amino acid sequences of Motifs 16-30 are as follows:
Motif 16: L-[RVY]-[EYH]-[LYC]-R-N-[VFM]-Y-[ST]-H (SEQ ID NO: 93)
Motif 17: R-[ST]-[IVL]-[SQ]-K-[NAE]-[VI]-N-G-F-G-R (SEQ ID NO: 94)
Motif 18: P-Y-[IV]-T-[DN]-[HW]-[HR]-[AT]-[KAT]-Y-[LN]-[IV]-[HS]-[NSA]-N-[RH]-I-G-L (SEQ ID NO: 95)
Motif 19: P-[END]-L-[TKD]-[PIT]-D-[GKE]-[AGN]-[RDG]-[NK]
Motif 20: P-[TMK]-[AC]-[WYS]-L-S-[IV]-[FY]-[ED]-[LF]-P-A-[LM]-[ALV]-F-[LY]-[LCM]-[HY]-[LI]-[YR] (SEQ ID NO: 96)
Motif 21: [SNG]-[QE]-[LI]-P-[RED]-K-L
Motif 22: [KT]-[WHK]-[AL]-[AQE]-[SQE]-[KVE]-[IL]
Motif 23: A-[DRK]-[FY]-L-[AM]-[HTR]-[DS]-[MI]-[MV]-[FRE]-[FW]-Q-P (SEQ ID NO: 97)
Motif 24: [CG]-[NGK]-[ND]-K-L-T-[GS]-[LAQ]-N (SEQ ID NO: 98)
Motif 25: F-[ALV]-[HR]-[AF]-[NS]-[QSR]-[NSM]-[KR]-[WY]-[QR]
Motif 26: [KA]-[SPV]-I-[ELM]-L-P-[RD]-G-[LM]-F-[ET]-[ST]-[YH]-I (SEQ ID NO: 99)
Motif 27: [LI]-I-x(2)-[YWF]-F-x(5)-[DQ]-x(2)-Q-[PT]-F-Y-[DR] (SEQ ID NO: 100)
Motif 28: I-[RAL]-[KQ]-[KD]-I
Motif 29: [DN]-[TN]-E-[KTR]-[ED]-[IL]-[KR]-[VR]-Y-[KR]-[ILT]-Q-D (SEQ ID NO: 101)
Motif 30: R-N-[SA]-[FA]-[AG]-H-[NL]-[SRT]-Y-[PK] (SEQ ID NO: 102)
在其中一些实施例中,所述Cas13蛋白的氨基酸序列包含与SEQ ID NO:2、SEQ ID NO:3、SEQ ID NO:60任一项所示序列具有≥50%、≥60%、≥70%、≥75%、≥80%、≥85%、≥90%、≥92%、≥95%、≥96%、≥97%、≥98%、≥99%、≥99.5%或100%序列同一性的序列。进一步地,在其中一些实施例中,所述Cas13蛋白的氨基酸序列为除基序1-基序15以外的任意氨基酸残基,在野生型序列基础上进行氨基酸保守性替换,所述野生型序列包括SEQ ID NO:2、SEQ ID NO:3、SEQ ID NO:60所示序列。在其中一些实施例中,所述Cas13蛋白的氨基酸序列包含如SEQ ID NO:2、SEQ ID NO:3、SEQ ID NO:60任一项所示序列。
在其中一些实施例中,可将所述Cas13蛋白的氨基酸序列中一个或多个氨基酸残基(如催化残基)突变,使得其完全或部分丧失在gRNA引导下的核酸酶活性。例如对RNA酶的HEPN(较高等真核生物和原核生物核苷酸,higher eukaryotes and pro-karyotes nucleotide,HEPN)结构域的RxxxxH基序进行突变,使得HEPN结构域失活。这样的变化的蛋白尽管降低或丧失了核酸酶的活性、不进行靶核酸的切割,但是其仍然可以靠近和结合至靶核酸。例如可以与其他结构域融合用于对靶核酸的单碱基转换、翻译激活或翻译抑制。
在一些实施方式中,可通过突变或修饰来降低核酸酶活性,如相比野生型Cas13蛋白的核酸酶活性降低至少10%、至少20%、至少30%、至少40%、至少50%、至少60%、至少70%、至少80%、至少90%、至少95%、至少97%或100%。
在其中一些实施例中,所述Cas13蛋白可与gRNA形成复合物。
在其中一些实施例中,所述Cas13蛋白可被gRNA引导至靶核酸。
在其中一些实施例中,所述Cas13蛋白来源于:与包含SEQ ID NO:1-SEQ ID NO:7、SEQ ID NO:60任一项所示序列的蛋白来源相同的的界、门、纲、目、科、属或种。
在其中一些实施例中,所述Cas13蛋白是非天然的。
本发明的Cas13蛋白可进行修饰,例如被连接至修饰部分(例如另一个多肽、寡肽或其他分子)。通常,蛋白的修饰不会对该蛋白的期望活性(例如,与gRNA结合的活性、核酸内切酶活性、在gRNA引导下与靶核酸特定位点结合的活性、在gRNA引导下与靶核酸特定位点结合并切割靶核酸的活性)产生不利影响。因此,本发明还意欲包括此类修饰的 蛋白。例如,可以将本发明的Cas13蛋白功能性连接(通过化学偶合、共价连接、基因融合、非共价连接或其它方式)一个或多个修饰部分。
本发明公开了一种缀合物,包含上述的Cas13蛋白,以及修饰该Cas13蛋白的修饰部分(即异源功能部分)。
在其中一些实施例中,所述缀合物的修饰部分选自另一个多肽、寡肽、可检测的标记、药用试剂、其他分子及其任意组合。
在其中一些实施例中,所述修饰部分选自:提供亚细胞定位的定位标签,有助于追踪、分离或纯化的标签,翻译激活结构域,翻译抑制结构域,核酸酶结构域,脱氨酶结构域,甲基化酶结构域,去甲基化酶结构域和调控剪接结构域(例如调控RNA剪接)。
在其中一些实施例中,所述提供亚细胞定位的定位标签选自:核定位信号(NLS)和核输出信号(NES)序列。NLS的非限制性实例包括但不限于来源于以下各项的NLS序列:SV40病毒大T抗原的NLS序列;来自核质蛋白的NLS序列;c-myc NLS序列;hRNPA1 M9 NLS序列;输入蛋白-α的IBB结构域的NLS序列;肌瘤T蛋白的NLS序列;人p 53的NLS序列;小鼠c-abl IV的NLS序列;流感病毒NS1的NLS序列;肝炎病毒δ抗原的NLS序列;小鼠Mx1蛋白的NLS序列;人聚(ADP-核糖)聚合酶的NLS序列;类固醇激素受体(人)糖皮质激素的NLS序列。
在其中一些实施例中,所述缀合物包含一个或多个核定位信号(NLS)。在其中一些实施例中,所述缀合物包含一个或多个核输出信号(NES)。在其中一些实施例中,所述缀合物包含1、2、3、4、5、6、7、8、9、10个或更多个核定位信号。
在其中一些实施例中,核输出信号包括至少四个疏水残基。
在其中一些实施例中,所述有助于追踪、分离或纯化的标签选自:表位标签,荧光蛋白(例如绿色荧光蛋白(GFP),YFP,RFP,CFP,mCherry,tdTomato等),HIS标签(例如6×His标签),血凝素(HA)标签,FLAG标签,Myc标签,谷胱甘肽S-转移酶(GST)标签以及麦芽糖结合蛋白(MBP)标签。
在其中一些实施例中,所述翻译激活结构域选自:eIF4E和其他翻译起始因子、酵母poly(A)-结合蛋白和GLD2的结构域。
在其中一些实施例中,所述翻译抑制结构域选自:Pumilio蛋白、脱腺苷酶(例如脱腺苷酶CAF1)和Argonaute蛋白。
在其中一些实施例中,所述核酸酶结构域选自:FokⅠ、PIN核酸内切酶结构域、NYN结构域、来自SOT1的SMR结构域和来自葡萄球菌核酸酶的RNA酶结构域。
在其中一些实施例中,所述脱氨酶结构域来自胞苷脱氨酶和腺苷脱氨酶。
在其中一些实施例中,所述脱氨酶结构域选自:PPR蛋白质(Pentatricopeptide repeat)、ADAR家族蛋白质、APOBEC家族蛋白质。
在其中一些实施例中,所述甲基化酶结构域来自m6A甲基化转移酶。
在其中一些实施例中,所述去甲基化结构域来自RNA去甲基化酶ALKBH5。
在其中一些实施例中,所述调控剪接结构域选自:SRSF1、hnRNP A1、RBM4。
在其中一些实施例中,所述缀合物包含上述的Cas13蛋白,以及一个或多个修饰部分。在其中一些实施例中,所述缀合物由上述的Cas13蛋白,以及一个或多个修饰部分组成。在其中一些实施例中,所述缀合物由上述的Cas13蛋白、一个或多个修饰部分,以及用于连接所述Cas13蛋白和修饰部分的接头组成。在一些情况下,所述多个修饰部分可以相同,也可以不同。
在其中一些实施例中,所述缀合物包含或不包含用于连接所述Cas13蛋白和所述修饰部分的接头。
在其中一些实施例中,所述缀合物包含Cas13蛋白,修饰部分以及连接所述Cas13蛋白和所述修饰部分的接头。
在其中一些实施例中,所述缀合物由Cas13蛋白,修饰部分以及连接所述Cas13蛋白和所述修饰部分的接头组成。
在其中一些实施例中,所述缀合物不包含用于连接所述Cas13蛋白和修饰部分的接头。在其中一些实施例中,所述缀合物由所述Cas13蛋白和所述修饰部分直接连接,包括通过共价键直接连接。
在其中一些实施例中,所述接头可以是氨基酸、氨基酸序列或其他化学基团。在其中一些实施例中,所述接头可以是氨基酸、氨基酸衍生物、PEG(聚乙二醇)。
在其中一些实施例中,接头是由1个或多个氨基酸残基通过肽键连接形成的线性多肽,所述氨基酸残基可以是天然的或非天然的,例如可以是经过修饰的。
接头的实例包括包含一个或多个(例如,1个,2个,3个,4个或5个)氨基酸(如Glu或Ser)或氨基酸衍生物(如,Ahx、β-Ala、GABA或Ava)的接头,或PEG等。
与修饰部分相同的结构作为接头的技术方案也在本发明的范围之内。非限制性实例例如,亚细胞定位信号(如NLS或NES)、标签(如HA标签、Flag标签)等作为接头也在本发明的范围之内。
在其中一些实施例中,所述缀合物可与gRNA相互作用。
在其中一些实施例中,所述缀合物可与gRNA形成复合物。
在其中一些实施例中,所述缀合物可与gRNA形成复合物,所述复合物结合至靶核酸。
在其中一些实施例中,所述缀合物可以被gRNA引导至靶核酸。
在其中一些实施例中,所述缀合物可以被gRNA引导至靶核酸,并靶向或修饰所述靶核酸。可以理解的,当所述缀合物被gRNA引导至靶核酸之后,可选地,可以靶向或修饰所述靶核酸,也可以不靶向且也不修饰所述靶核酸。例如,在一些情况下,所述缀合物被gRNA引导至靶核酸后,可以不靶向且不修饰所述靶核酸;例如不切割所述靶核酸,本领域技术人员可以仅利用其结合所述靶核酸的能力。在一些情况下,所述缀合物被gRNA引导至靶核酸后,可以靶向或修饰所述靶核酸,例如切割靶mRNA并因此降低翻译水平。
在其中一些实施例中,所述修饰部分可以连接于所述Cas13蛋白的氨基末端、氨基末端附近、羧基末端和/或羧基末端附近。在其中一些实施例中,所述修饰部分连接于所述Cas13蛋白的氨基末端和/或羧基末端。在其中一些实施例中,所述修饰部分连接于所述Cas13蛋白的氨基末端附近、或羧基末端附近。在其中一些实施例中,当所述修饰部分沿着多肽链在距氨基末端或羧基末端约1、2、3、4、5、10、15、20、25、30、40、50个或更多个氨基酸内时,该修饰部分被认为在氨基末端附近或羧基末端附近。
在其中一些实施例中,所述缀合物包含具有足以驱动所述缀合物在真核细胞的核中和/或之外以可检测的量积累的强度的一个或多个核定位信号和/或核输出信号。检测所述Cas13蛋白或缀合物在细胞特定部位的积累量可通过任何合适的技术进行。
在其中一些实施例中,所述缀合物是非天然的。
本发明还公开了一种gRNA,可与上述的Cas13蛋白或上述的缀合物形成复合物。
可以理解的,上述gRNA能将上述Cas13蛋白或缀合物引导至靶核酸。在其中一些实施例中,所述gRNA能将所述Cas13蛋白或缀合物引导至靶核酸,并靶向或修饰所述靶核酸。在其中一些实施例中,所述gRNA将所述复合物引导至靶核酸,并且随后该复合物靶向或修饰所述靶核酸。在其中一些实施例中,所述靶向所述靶核酸是切割所述靶核酸或结合所述靶核酸。
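In some embodiments, the gRNA comprises a guide sequence and a direct repeat sequence; the guide sequence can be complementary to the target nucleic acid, and the direct repeat sequence can interact with the Cas13 protein or with the conjugate.
In some embodiments, the guide sequence can be complementary (fully or partially) to the target nucleic acid, and the direct repeat sequence can interact with the Cas13 protein or with the conjugate.
In some embodiments, when the gRNA is used together with a Cas13m protein of the invention (Cas13m.1 to Cas13m.6), a protein having ≥50% sequence identity to the Cas13m protein, or a conjugate comprising it, the direct repeat sequence of the gRNA is located at the 3' end of the guide sequence.
In some embodiments, when the gRNA is used together with the CasRfg.1 or CasRfg.2 protein of the invention, a protein having ≥50% sequence identity to CasRfg.1 or CasRfg.2, or a conjugate comprising it, the direct repeat sequence of the gRNA is located at the 5' end of the guide sequence.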
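The orientation rule stated above (direct repeat 3' of the guide for Cas13m proteins, 5' of the guide for CasRfg proteins) can be sketched as a small helper. The two direct repeat sequences are copied from Table 3 of this document; the function name, the 10-60 nt length check (the broadest guide range the description lists) and the guide sequence are illustrative assumptions, not a construct from the patent.

```python
# Direct repeat (DR) sequences as listed in Table 3 of this document.
DIRECT_REPEATS = {
    "Cas13m.1": "GUUGUUACAGCCCUUAGUUUGUAGGGUAAUGACAAC",   # SEQ ID NO: 15
    "CasRfg.2": "GGUUUUACACCCGUGUAAAACUACACAGUUCUAAAAC",  # SEQ ID NO: 21
}


def assemble_grna(protein: str, guide: str) -> str:
    """Join guide and DR in the orientation described for each protein family."""
    guide = guide.upper().replace("T", "U")  # accept DNA-style input
    if not 10 <= len(guide) <= 60:
        raise ValueError("guide length outside the 10-60 nt range")
    direct_repeat = DIRECT_REPEATS[protein]
    if protein.startswith("Cas13m"):
        return guide + direct_repeat   # DR at the 3' end of the guide
    if protein.startswith("CasRfg"):
        return direct_repeat + guide   # DR at the 5' end of the guide
    raise ValueError(f"no orientation rule for {protein!r}")


# Hypothetical 24-nt guide used only for illustration.
example_guide = "ACGUACGUACGUACGUACGUACGU"
print(assemble_grna("Cas13m.1", example_guide))
print(assemble_grna("CasRfg.2", example_guide))
```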
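In some embodiments, the gRNA comprises a guide sequence and a direct repeat sequence, and the secondary structure of the direct repeat sequence comprises, connected in order: a base-paired first stem, a non-complementary bulge, a base-paired second stem, and a non-complementary loop.
Further, in some embodiments, the gRNA has the following features: a. the first stem consists of 4-7 base pairs, b. one strand of the non-complementary bulge is 2-6 nucleotides long, c. the second stem consists of 4-7 base pairs, and/or d. the non-complementary loop (not counting the base pair at the junction between loop and stem) is 5-8 nucleotides long.
In some embodiments, one strand of the first stem is selected from: GUUG, GUUGU, GUUGUA and GUUGUUA.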
在其中一些实施例中,所述gRNA包含指导序列和同向重复序列,所述同向重复序列选自与SEQ ID NO:15-SEQ ID NO:21、SEQ ID NO:62中任一项所示序列具有≥90%序列同一性的序列,或具有≥95%序列同一性的序列。
在其中一些实施例中,所述同向重复序列选自SEQ ID NO:15-SEQ ID NO:21、SEQ ID NO:62中的任一项所示序列。
在其中一些实施例中,所述gRNA包含指导序列和同向重复序列,所述指导序列长度为≥10nt(10个核苷酸)、≥11nt、≥12nt、≥13nt、≥14nt、≥15nt、≥16nt、≥17nt、≥18nt、≥19nt、≥20nt、≥21nt、≥22nt、≥23nt、≥24nt、≥25nt、≥26nt、≥27nt、≥28nt、≥29nt、≥30nt、≥31nt、≥32nt、≥33nt、≥34nt、≥35nt、≥40nt、≥50nt或≥60nt。
在其中一些实施例中,所述gRNA包含指导序列和同向重复序列,所述指导序列长度为≤10nt(10个核苷酸)、≤11nt、≤12nt、≤13nt、≤14nt、≤15nt、≤16nt、≤17nt、≤18nt、≤19nt、≤20nt、≤21nt、≤22nt、≤23nt、≤24nt、≤25nt、≤26nt、≤27nt、≤28nt、≤29nt、≤30nt、≤31nt、≤32nt、≤33nt、≤34nt、≤35nt、≤40nt、≤50nt或≤60nt。
在其中一些实施例中,所述gRNA包含指导序列和同向重复序列,所述指导序列长度范围为10nt-60nt、10nt-50nt、10nt-40nt、12nt-35nt、15nt-35nt、15nt-30nt、20nt-35nt、20nt-30nt、25nt-35nt或25nt-30nt。
在其中一些实施例中,所述gRNA包含指导序列和同向重复序列,所述同向重复序列长度为≥10nt、≥15nt、≥20nt、≥25nt、≥30nt、≥35nt、≥40nt、≥45nt、≥50nt、≥60nt、≥70nt、≥80nt、≥90nt、≥100nt、≥150nt、≥200nt或≥300nt。
在其中一些实施例中,所述gRNA包含指导序列和同向重复序列,所述同向重复序列长度为≤10nt、≤15nt、≤20nt、≤25nt、≤30nt、≤35nt、≤40nt、≤45nt、≤50nt、≤60nt、≤70nt、≤80nt、≤90nt、≤100nt、≤150nt、≤200nt或≤300nt。
在其中一些实施例中,所述gRNA包含指导序列和同向重复序列,所述同向重复序列 长度范围为10nt-300nt、10nt-200nt、10nt-100nt、15nt-80nt、15nt-50nt、15nt-40nt、15nt-35nt或20nt-40nt。
在其中一些实施例中,所述同向重复序列位于所述指导序列的3'端。在其中一些实施例中,所述同向重复序列位于所述指导序列的5'端。
在其中一些实施例中,所述靶核酸是PTBP1(Polypyrimidine Tract Binding Protein 1)mRNA、AQp1(Aquaporin 1)mRNA、VEGFA mRNA、VEGFR1 mRNA或VEGFR2 mRNA。
在其中一些实施例中,所述靶核酸是PTBP1 mRNA或AQp1 mRNA。在其中一些实施例中,所述靶核酸是VEGFA mRNA、VEGFR1 mRNA或VEGFR2 mRNA。
本发明还公开了一种组合物,包括:
1)上述的Cas13蛋白、上述的缀合物、编码上述Cas13蛋白的核酸、或编码上述缀合物的核酸;
以及
2)上述的gRNA或编码所述gRNA的核酸。
在其中一些实施例中,所述gRNA包含指导序列,所述指导序列可与靶核酸互补,所述靶核酸为PTBP1 mRNA或AQp1 mRNA。
在其中一些实施例中,所述核酸为DNA。在其中一些实施例中,所述核酸为RNA。
本发明还公开了一种核酸,包括:
1)编码上述的Cas13蛋白的核苷酸序列或编码上述的缀合物的核苷酸序列;
和/或
2)编码上述的gRNA的核苷酸序列。
在其中一些实施例中,所述核苷酸序列用于在原核细胞或真核细胞中进行表达。
在其中一些实施例中,所述核酸为DNA。在其中一些实施例中,所述核酸为RNA。
本发明还公开了一种载体,其特征在于,所述载体包含:
1)编码上述Cas13蛋白的核苷酸序列或编码上述缀合物的核苷酸序列;
和/或
2)编码上述的gRNA的核苷酸序列。
在其中一些实施例中,所述编码Cas13蛋白的核苷酸序列为一种或几种,所述编码缀合物的核苷酸序列为一种或几种。
在其中一些实施例中,所述载体包含调节元件。
在其中一些实施例中,所述调节元件可以调控所述核苷酸序列的表达。
在其中一些实施例中,所述调节元件为启动子和/或增强子。在其中一些实施例中,所述调节元件为启动子。
在其中一些实施例中,所述载体任选自:克隆载体、表达载体。在其中一些实施例中,所述载体是质粒或病毒载体。
在其中一些实施例中,所述载体能够在细胞内表达本发明的Cas13蛋白或缀合物。在其中一些实施例中,所述载体能够在真核细胞内表达本发明的Cas13蛋白或缀合物。在其中一些实施例中,所述载体能够在人细胞内表达本发明的Cas13蛋白或缀合物。
在其中一些实施例中,所述载体是非天然载体。
本发明还公开了一种递送组合物,包括递送载体,以及选自以下的至少一种:上述的Cas13蛋白、缀合物、gRNA、组合物、核酸、载体。
在其中一些实施例中,所述递送载体选自:递送粒子、递送囊泡、病毒载体中的至少一种。
本发明还公开了一种细胞,包含:上述的Cas13蛋白、缀合物、gRNA、组合物、核酸、载体中的至少一种。
在其中一些实施例中,所述细胞为真核细胞。
在其中一些实施例中,所述靶核酸来源于动物细胞、植物细胞或微生物细胞。
在其中一些实施例中,由所述细胞不能产生动物或植物。
在其中一些实施例中,由所述真核细胞不能产生动物或植物。
在其中一些实施例中,所述真核细胞包含干细胞和干细胞系。在其中一些实施例中,所述干细胞不是胚胎干细胞,并且所述干细胞系不是胚胎干细胞系。
在其中一些实施例中,对于含有本发明的Cas13蛋白、缀合物、gRNA、复合物、分离的核酸、载体、组合物、递送组合物的细胞,这些细胞内的靶核酸已被靶向或修饰。
本发明还公开了一种靶向或修饰靶核酸的方法,包括向所述靶核酸递送选自以下的至少一种:上述的Cas13蛋白、缀合物、gRNA、组合物、核酸、载体、细胞。在其中一些实施例中,所述递送发生于离体、体外或体内。在其中一些实施例中,所述靶向或修饰靶核酸的方法用于通过改变靶核酸来修饰细胞、细胞系或生物体。在其中一些实施例中,所述靶核酸来源于动物细胞、植物细胞或微生物细胞。
在其中一些实施例中,所述靶核酸是PTBP1(Polypyrimidine Tract Binding Protein 1)mRNA、AQp1(Aquaporin 1)mRNA、VEGFA mRNA、VEGFR1 mRNA或VEGFR2 mRNA。
在其中一些实施例中,所述靶核酸是PTBP1(Polypyrimidine Tract Binding Protein 1)mRNA或AQp1(Aquaporin 1)mRNA。在其中一些实施例中,所述靶核酸是VEGFA mRNA、VEGFR1 mRNA或VEGFR2 mRNA。
在其中一些实施例中,所述靶向或修饰靶核酸的方法不包括疾病的诊断和治疗方法。
在其中另一些实施例中,所述靶向或修饰靶核酸的方法包括疾病的诊断和治疗方法。
本发明还公开了上述的Cas13蛋白、缀合物、gRNA、组合物、核酸、载体、细胞在制备用于诊断、预防或治疗受试者中疾病的药物中的用途。在其中一些实施例中,所述受试者为人类个体。
本发明还公开了一种将所述的Cas13蛋白、缀合物、gRNA、组合物、核酸、载体、细胞以有效量施用于受试者以诊断、预防或治疗疾病的方法。
本发明还公开了一种核酸检测方法,其特征在于,包括使以下a和b形成复合物并与待测靶核酸结合的步骤:
a.所述的Cas13蛋白或所述的缀合物,
b.所述的gRNA。
在其中一些实施例中,所述方法包括使上述缀合物与所述gRNA形成复合物,并与靶核酸结合;所述缀合物包含可检测标记,所述复合物结合、切割或修饰靶核酸致使所述可检测标记信号变化,通过观测可检测标记的信号变化情况分析待测样品中靶核酸的含量。进一步地,所述可检测标记包括:荧光基团、显色剂、显影剂或放射性同位素。
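Compared with the prior art, the invention has the following beneficial effects:
The isolated Cas13 proteins of the invention are new Cas13 enzymes that can be used in CRISPR/Cas systems. Experiments verified that, when exerting their Cas13 nuclease activity, the Cas13 proteins of the invention edit both exogenous reporter genes and endogenous genes with good efficiency.
Brief Description of the Drawings
Figure 1 is a schematic comparison of the locus structures of the Cas13 proteins of Example 1 and of the published Cas13 subtypes.
Figure 2 shows the positions of the RxxxxH motifs in the amino acid chains of the Cas13 proteins of Example 1 and of Cas13 proteins of other subtypes.
Figure 3 is a schematic of the cluster analysis of the Cas13 proteins of Example 1 and of Cas13 proteins of other subtypes, where A is the cluster analysis for Cas13m.1-Cas13m.5 and B is the cluster analysis for Cas13m.1-Cas13m.6.
Figure 4 is a schematic of the RNAfold analysis of the RNA secondary structures of the direct repeat sequences corresponding to the Cas13 proteins of Example 1.
Figure 5 shows the predicted three-dimensional structures of the Cas13m.2, Cas13m.3 and Cas13m.6 proteins of Example 1.
Figure 6 is a schematic of the superposition of the Cas13 proteins of Example 1.
Figure 7 shows the flow cytometry results for GFP fluorescence in Example 4.
Figure 8 is a schematic of the qPCR-measured mRNA changes of the endogenous target genes AQp1 and PTBP1 in Example 5.
Figures 9A, 9B and 9C are screenshots of the multiple sequence alignment of the Cas13m proteins with PbuCas13b in Example 9.
Figure 10 shows the superpositions of Example 9, where A-N respectively show the overlap of Motifs 1-15 of Cas13m.6 with the corresponding sequences of PbuCas13b after Cas13m.6 is superimposed on PbuCas13b.
Figure 11 shows the results of the collateral-effect test in Example 11.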
具体实施方式
为了便于理解本发明,下面将参照相关附图对本发明进行更全面的描述。附图中给出了本发明的较佳实施例。但是,本发明可以以许多不同的形式来实现,并不限于本文所描述的实施例。相反地,提供这些实施例的目的是使对本发明的公开内容的理解更加透彻全面。
除非另有定义,本文所使用的所有的技术和科学术语与属于本发明的技术领域的技术人员通常理解的含义相同。本文中在本发明的说明书中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本发明。本文所使用的术语“和/或”包括一个或多个相关的所列项目的任意的和所有的组合。
如本文中所使用,被称为“Cas13蛋白”或具有“Cas酶活性”或“Cas核酸内切酶活性”的蛋白或多肽涉及由CRISPR相关(Cas)基因编码的CRISPR相关(Cas)多肽或蛋白,当与一种或多种向导RNA(guide RNA,gRNA)复合或功能性组合时,该Cas13蛋白或多肽能够被引导至靶核酸中的靶序列,有时随后还可以靶向或修饰靶核酸。通过gRNA指导,Cas核酸内切酶识别、靶向或修饰靶核酸中的特异性靶位点(靶序列或靶序列附近的核苷酸序列),例如可以是在RNA(例如编码RNA,例如mRNA)分子中的靶位点。
如本文中所使用,术语“HEPN结构域”具有本领域通常认为的含义。HEPN结构域已被证明为RNase结构域,并具有结合与切割靶RNA分子的能力。所述靶RNA可以是RNA的任何合适形式,包括但不限于编码RNA和非编码RNA。此前已发现的CRISPR第2类VI型效应蛋白都含有两个HEPN结构域,包括例如Cas13a、Cas13b、Cas13c、Cas13d、Cas13e和Cas13f,其HEPN结构域具有保守的RxxxxH基序,该基序是HEPN结构域的特征。
如本文中所使用的,术语“gRNA”和“向导RNA”可互换使用,其具有本领域技术人员通常理解的含义,gRNA通常是指能与Cas13蛋白结合并有助于将Cas13蛋白引导至/靶向至靶核酸/靶多核苷酸(例如DNA或mRNA分子)内的特定位置(靶序列)的RNA分子(或一组RNA分子的总称)。gRNA含有指导序列和同向重复(DR)序列。gRNA可以包含一个或多个修饰(例如,碱基修饰、骨架修饰、核苷间键的修饰等),以提供与未修饰gRNA相同的功能,或对gRNA提供新的或增强的特征(例如,改进的稳定性)。
如本文中使用的,术语“指导序列”与“靶向结构域”可互换使用,是指gRNA中的连续 核苷酸序列,其与靶核酸中的靶序列具有部分或完全互补性,并且可以通过由Cas13蛋白促进的碱基配对与靶核酸中的靶序列杂交。本发明所述的指导序列与靶序列的完全互补性不是必需的,只要存在足够互补性以引起杂交并且促进一种CRISPR/Cas复合物的形成即可。
合适的同向重复(direct repeat,DR)序列可由原核生物(如细菌、古细菌)的CRISPR基因座结构中,进行实验筛选才能寻找得到。同向重复序列的大小通常在数十bp内,其部分片段互为反向互补,即意味着RNA分子内部形成了一个二级结构,例如茎环结构(常称为发卡结构),其他片段则体现为非结构化。同向重复序列是gRNA分子的恒定部分,其含有强二级结构,这有利于Cas13蛋白和gRNA分子之间的相互作用。
如本文中使用的,术语“靶核酸”、“靶RNA”或“靶多核苷酸”是指含有靶序列的多核苷酸,在本文中经常可互换使用。靶核酸可以包含任何多核苷酸,如DNA(靶DNA)或RNA(靶RNA)。“靶核酸”是指gRNA引导Cas13蛋白进行靶向或修饰的核酸。术语“靶核酸”可以是对细胞(例如,真核细胞)而言任何内源或外源的多核苷酸。例如,“靶核酸”可以是一种存在于真核细胞中的多核苷酸,也可以是一个编码基因产物(例如,蛋白质)的序列(或其一部分)或一个非编码序列(或其一部分)。在某些情况下,“靶核酸”可以包括一个或多个疾病相关基因和多核苷酸以及信号传导生化途径相关基因和多核苷酸。“疾病相关”基因或多核苷酸是指与非疾病对照的组织或细胞相比,在来源于疾病影响的组织的细胞中以异常水平或以异常形式产生转录或翻译产物的任何基因或多核苷酸。在某些情况下,所述靶核酸是编码RNA。在某些情况下,所述靶核酸是非编码RNA。在某些情况下,所述靶核酸包括mRNA、miRNA、rRNA、tRNA、snRNA和结构RNA。在某些情况下,所述靶核酸为mRNA。在某些情况下,所述靶核酸为整个mRNA分子。在某些情况下,所述靶核酸为DNA。在某些情况下,所述靶核酸为整个染色体DNA分子。如本文中使用的“靶RNA”、“靶核酸”和“靶”表示人们希望使用CRISPR系统来结合、靶向或修饰的特定序列或其反向互补序列。
如本文中使用的,术语“靶序列”是指靶核酸分子中的一小段序列,其可与gRNA分子的指导序列互补(完全互补或部分互补)。靶序列的长度经常为数十bp,例如,可以为约10bp、约20bp、约30bp、约40bp、约50bp或约60bp。
如本文中使用的,术语“靶向”定义为包括以下的一种或多种:切割一种或多种靶核酸,可视化或检测一种或多种靶核酸,标记一种或多种靶核酸,运输一种或多种靶核酸,掩蔽一种或多种靶核酸,结合一种或多种靶核酸,提高靶序列对应基因的转录和/或翻译水平,和降低靶序列对应基因的转录和/或翻译水平。
如本文中使用的,术语“修饰”定义为包括以下的一种或多种:核酸碱基置换,核酸碱 基缺失,核酸碱基插入,将核酸甲基化,将核酸去甲基化,和将核酸去胺基化。
如本文中所使用的,术语“切割”(cleavage/cleaving)是指使多核苷酸的核糖基磷酸二酯主链中的共价键(例如共价磷酸二酯键)断裂,包括但不限于:使单链多核苷酸断裂,使含两条互补单链的双链多核苷酸的任一条单链断裂,使含两条互补单链的双链多核苷酸的两条单链都断裂。
例如,本领域技术人员可以理解的,可以将本发明的Cas13蛋白或缀合物与一个或多个异源功能部分融合或缔合(例如通过融合蛋白、接头肽等)。例如将完全或部分丧失核酸酶活性的Cas13突变体与异源功能部分融合。这些功能域可以具有各种活性,例如甲基化酶活性、脱甲基酶活性、脱氨酶活性、翻译激活活性、翻译抑制活性、RNA切割活性、核酸结合活性、碱基编辑活性,以及切换活性(如光诱导)。所述异源功能部分可包括但不限于:定位信号(例如核定位信号NLS、核输出信号NES)、标记或检测标记(如FITC或DAPI这种荧光染料)、靶向部分、抗原决定簇标签(例如Hismyc、V5、FLAG、HA、VSV-G、Trx等)、脱氨酶或脱氨基域(例如ADAR1,ADAR2,APOBEC,AID或TAD)、甲基化酶、脱甲基酶、ssRNA裂解活性域、dsRNA裂解活性域、DNA或RNA连接酶,或以上任意的组合。
例如,可以将本发明的Cas13蛋白与脱氨酶融合,与gRNA组合后用于靶向靶RNA,实现对靶RNA分子的单碱基编辑。
例如,所述异源功能部分可以为可检测标记。当CRISPR-CAS复合物与靶核酸接触或结合时,含有Cas13核酸酶的缀合物切割或修饰靶核酸,通过观测可检测标记的存在情况来分析待测样品中靶核酸存在情况。所述可检测标记如荧光基团、显色剂、显影剂或放射性同位素。
测量Cas13蛋白或缀合物与靶核酸的结合的方法是本领域已知的,包括但不限于染色质免疫沉淀测定、凝胶迁移率变动测定、报告基因测定、微孔板捕获和检测测定。类似地,测量靶核酸的切割或修饰的方法在本领域中是已知的,包括体外或体内切割测定。
如本文中所使用的,术语“复合物”与“CRISPR/Cas复合物”可互换使用。术语“复合物”是指,gRNA与Cas13蛋白结合所形成的核糖核蛋白复合体。该核糖核蛋白复合体能够识别(有时还可进一步切割或修饰)与该gRNA的指导序列互补的靶序列或其所在的靶核酸。
如本文中所使用,术语“非天然的”意为“改造的”,表示涉及人工。当提及核酸分子或多肽时,该术语表示所述核酸分子或所述多肽至少基本上不含至少一种在自然界中天然地与它们关联和被发现时与它们关联的其它组分。此外,该术语可以表示核酸分子或多肽具有在自然界中不存在的序列。
如本文中所使用,术语“缀合物”表示修饰的Cas13蛋白。所述缀合物包含Cas13蛋白 部分和修饰部分。修饰部分可以为蛋白质或多肽(或它们的任意功能性片段)、寡肽、其他小分子(包括但不限于糖分子)。所述缀合物可以为融合蛋白。
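As used herein, the term "sequence identity" (identity or percent identity) refers to the degree of match between the sequences of two polypeptides or of two nucleic acids. When a position in both compared sequences is occupied by the same base or amino acid monomer subunit (for example, a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. The "percent sequence identity" (percent identity) between two sequences is a function of the number of matching positions shared by the two sequences divided by the number of positions compared × 100%. For example, if 6 of 10 positions in two sequences match, the two sequences have 60% sequence identity. Typically, the comparison is made after the two sequences have been aligned to give maximum sequence identity. Such alignment can be performed with published and commercially available alignment algorithms and programs, such as but not limited to Clustal Ω, MAFFT, Probcons, T-Coffee, Probalign and BLAST, which a person of ordinary skill in the art can reasonably select. A person skilled in the art can determine suitable parameters for aligning sequences, including any algorithm needed to achieve a good or optimal alignment over the full length of the compared sequences, and any algorithm needed to achieve a good or optimal alignment over a local region of the compared sequences.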
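The definition above (matching positions divided by compared positions, × 100%) can be written out as a short sketch. It assumes the two sequences have already been aligned by one of the listed programs and are supplied as equal-length strings with "-" for gaps; counting every alignment column as a compared position is an assumption made here, since conventions for gap columns differ between tools.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    A column matches when both sequences carry the same non-gap symbol.
    All columns are counted as compared positions (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Worked example from the text: 6 matching positions out of 10.
print(percent_identity("ACDEFGHIKL", "ACDEFGAAAA"))
```

The made-up pair above shares its first six residues, which reproduces the 60% example given in the definition.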
序列同一性与序列相似性有关。可以通过直观比对(肉眼)、更通常地借助于序列比较程序来进行同一性或相似性比较。这些计算机程序可以计算在两个或更多个序列之间的同一性或相似性的百分比(%)并且还可以计算由两个或更多个氨基酸或核酸序列共享的序列同一性。
术语“多肽”、“肽”和“蛋白质”在本文可互换地使用,是指具有任何长度的氨基酸的聚合物。所述聚合物可为直链型或分支型,其可包含经过修饰的氨基酸,并且其可由非氨基酸中断。这些术语还涵盖已经被修饰的氨基酸聚合物;这些修饰例如二硫键形成、糖基化、脂化(lipidation)、乙酰化、磷酸化或任何其他操纵,如与标记组分的缀合。如本文使用的术语“氨基酸”包括天然的和/或非天然的或者合成的氨基酸,包括甘氨酸以及D和L旋光异构体、以及氨基酸类似物和肽模拟物。
如本文使用的,术语“结构域”或“蛋白结构域”是指可以独立于该蛋白质链的其余部分而存在并且起作用的蛋白质序列的一部分。
如本文中所使用的,术语“载体”是指,可将多聚核苷酸插入其中的一种核酸运载工具。当载体能使插入的多核苷酸编码的蛋白获得表达时,载体称为表达载体。载体可以通过转化、转导或者转染的方式进入宿主细胞,使其携带的遗传物质元件在宿主细胞中获得表达。载体是本领域技术人员公知的,包括但不限于:质粒;柯斯质粒;噬菌粒;人工染色体,例如酵母人工染色体(YAC)或细菌人工染色体(BAC);噬菌体如λ噬菌体及动物病毒等。可用作载体的动物病毒包括但不限于:逆转录酶病毒(包括慢病毒)、腺病毒、腺相关病毒、疱 疹病毒(如单纯疱疹病毒)、痘病毒、杆状病毒、乳头瘤病毒、乳头多瘤空泡病毒(如SV40)。一种载体可以含有多种控制表达的元件,包括但不限于:启动子序列、转录起始序列、增强子序列、选择元件及报告基因。另外,载体还可含有复制起始位点。载体包括但不限于:单链、双链或部分双链的核酸分子;包含一个或多个游离端、不包含游离端(例如,环状)的核酸分子;包含DNA、RNA或二者的核酸分子;以及本领域已知的其他种类的多核苷酸。某些载体能够在它们被引入至其中的宿主细胞中自主复制。其他载体在引入到宿主细胞后被整合到宿主细胞的基因组中,并且从而随着宿主基因组一起复制。此外,某些载体能够引导它们可操作地连接的基因的表达。此类载体在此被称为“表达载体”。用于真核细胞并且在真核细胞中产生表达的载体可以在此称之为“真核表达载体”。在重组DNA技术中采用的常见表达载体常常是质粒形式。
一种载体可以被引入到宿主细胞中而由此产生转录物、蛋白质、或肽,包括如本文所述的蛋白、缀合物、分离的核酸、复合物、组合物等。
重组表达载体可以包含处于适用于在宿主细胞中表达核酸的形式的本发明的核酸,这意味着重组表达载体包含一个或多个调节元件,这些调节元件可以基于用于表达的宿主细胞来选择,可操作地连接至待表达的核酸序列。
如本文中所使用的,术语“可操作地连接”旨在意指载体中Cas蛋白编码序列或gRNA编码序列以允许核苷酸序列表达(例如,在体外转录/翻译系统中或当该载体被引入到宿主细胞时在宿主细胞中表达)的方式连接至一个或多个调节元件。例如在载体中,将启动子1置于Cas13蛋白编码序列的上游,当该载体被引入到宿主细胞时,在启动子1的驱动下可以启动Cas13基因的转录。
如本文中所使用的,术语“调节元件”旨在包括启动子、增强子、内部核糖体进入位点(IRES)以及其他表达控制元件(例如,转录终止信号,诸如多聚腺苷酸化信号和聚U序列)。调节元件包括引导核苷酸序列在许多类型的宿主细胞中连续表达的那些元件和引导核苷酸序列仅在某些宿主细胞中表达的那些元件(例如,组织特异性调节序列)。组织特异性启动子可以引导主要在希望的感兴趣的组织诸如肌肉、神经元、骨骼、皮肤、血液、特定器官(例如,肝脏、胰脏)、或特定细胞类型(例如,淋巴细胞)中的表达。调节元件还可以时间依赖性方式诸如细胞周期依赖性或发育阶段依赖性方式引导表达,这可以是或也可以不是组织特异性或细胞类型特异性的。在其中一些实施例中,载体包含一个或多个pol III启动子(例如,1、2、3、4、5、或更多个pol III启动子)、一个或多个pol II启动子(例如,1、2、3、4、5、或更多个pol II启动子)、一个或多个pol I启动子(例如,1、2、3、4、5、或更多个pol I启动子)、或其组合。pol III启动子的实例包括但不限于,U6和H1启动子。pol II启动子的实例包括但不限于,逆转录病毒劳斯氏肉瘤病毒(Rous sarcoma  virus)(RSV)LTR启动子(任选地具有RSV增强子)、巨细胞病毒(CMV)启动子(任选地具有CMV增强子)、SV40启动子、二氢叶酸还原酶启动子、β-肌动蛋白启动子、磷酸甘油激酶(PGK)启动子、以及EF1α启动子。术语“调节元件”还涵盖增强子元件,诸如WPRE,CMV增强子,SV40增强子,以及兔β-球蛋白的外显子2与3之间的内含子序列。本领域技术人员将了解的是,表达载体的设计可以取决于如有待转化的宿主细胞的选择、所希望的表达水平等因素。载体可以引入到宿主细胞中从而表达本发明所述的Cas13蛋白、缀合物或CRISPR复合物。
如本文中所使用的,术语“启动子”具有本领域技术人员公知的含义,其是指一段位于基因的上游,能启动下游基因表达的非编码核苷酸序列。组成型(constitutive)启动子是这样的核苷酸序列:当其与编码或者限定基因产物的多核苷酸可操作地连接时,在细胞的大多数或者所有生理条件下,其导致细胞中基因产物的产生。诱导型启动子是这样的核苷酸序列,当可操作地与编码或者限定基因产物的多核苷酸相连时,基本上只有当对应于所述启动子的诱导物在细胞中存在时,其导致所述基因产物在细胞内产生。组织特异性启动子是这样的核苷酸序列:当可操作地与编码或者限定基因产物的多核苷酸相连时,基本上只有当细胞是该启动子对应的组织类型的细胞时,其才导致在细胞中产生基因产物。
如本文中所使用的,术语“宿主细胞”是指,可用于导入载体的细胞,其包括但不限于:如大肠杆菌或枯草菌等的原核细胞,如酵母细胞或曲霉菌等的真菌细胞,或者如纤维原细胞、CHO细胞、COS细胞、NSO细胞、HeLa细胞、BHK细胞、HEK 293细胞或其他人细胞等的动物细胞。
如本文使用的,术语“表达”(expression或expressing)是指从DNA模板转录成多核苷酸(如转录成mRNA或其他RNA转录物)的过程和/或转录的mRNA随后借此翻译成肽、多肽或蛋白质的过程。转录物和编码的多肽可以总称为“基因产物”或“基因表达产物”。如本文使用的基因或核酸的“表达”不仅涵盖细胞基因表达,而且涵盖在克隆系统中或在任何其他背景下的一个或多个核酸的转录和翻译。
如本文中所使用的,术语“接头”是指连接蛋白和修饰部分的基团。所述基团可以是氨基酸、氨基酸序列或其他化学基团。例如可以是氨基酸(如,Glu或Ser)、氨基酸衍生物、PEG(聚乙二醇)。在一些情况下,“接头”是指由1个或多个氨基酸残基通过肽键连接形成的线性多肽,所述氨基酸残基可以是天然的或非天然的,例如可以是经过修饰的。本发明的接头可以为人工合成的氨基酸序列,或天然存在的多肽序列,例如具有铰链区功能的多肽。此类接头多肽是本领域众所周知的。这类接头可以是新发现的或本领域熟知的,其实例包括但不限于包含一个或多个(例如,1个,2个,3个,4个或5个)氨基酸(如,Glu或Ser)或氨基酸衍生物(如,Ahx、β-Ala、GABA或Ava)的接头,或PEG等。
本发明gRNA可以包含一个或多个修饰(例如,碱基修饰、骨架修饰等),以提供与未修饰gRNA相同的功能,或对gRNA提供新的或增强的特征(例如,改进的稳定性)。含有修饰的适合的gRNA的实例包括含有修饰的骨架或非天然的核苷间键的gRNA。gRNA修饰包括例如,硫代磷酸酯修饰、2'-O-甲基修饰、2'-O-甲氧基乙基(MOE)修饰、2'-脱氧修饰、硫代磷酸酯核苷酸间连接、膦酰基乙酸酯(PACE)核苷酸间连接、硫代膦酰基乙酸酯(硫代PACE)核苷酸间连接、锁核酸(LNA)或环己烯基替代呋喃糖环。
本发明gRNA的呋喃糖环或呋喃糖环和核苷酸间键可被非呋喃糖基团替代。一种这样的核酸(已显示出具有优良杂交性质)称为肽核酸(PNA)。在PNA中,多核苷酸的糖骨架被含酰氨的骨架替代。gRNA分子中的呋喃糖环也可被环己烯基环替代,称为环己烯基核酸(CeNA)。另一种修饰包括锁核酸(LNA),其中2'-羟基连接至糖环的4'-碳原子从而形成2'-C、4'-C-氧基亚甲基键,从而形成双环糖部分。
本发明gRNA还可包括碱基修饰或取代。本发明gRNA可包含未修饰或天然碱基(例如嘌呤碱基腺嘌呤A和鸟嘌呤G以及嘧啶碱基胸腺嘧啶T、胞嘧啶C和尿嘧啶U)。本发明gRNA可包含修饰的碱基,例如包括其它合成和天然的碱基如5-甲基胞嘧啶、5-羟甲基胞嘧啶、黄嘌呤、次黄嘌呤、2-氨基腺嘌呤、腺嘌呤和鸟嘌呤的其他衍生物、5-尿嘧啶(假尿嘧啶)、4-硫尿嘧啶、胞嘧啶的其他衍生物、尿嘧啶的其他衍生物、胸腺嘧啶的衍生物。
所述修饰可以在gRNA分子结构的任意位置。
所述gRNA的5'端或3'端可有额外的核苷酸与指导序列相连接。非限制性示例例如5'末端可以包含2个附加的鸟嘌呤核苷酸,用于提高靶向特异性。
本文中所使用的,术语“递送粒子”、“递送粒子系统”与“粒子”可互换使用。所述粒子用于递送本发明的Cas13蛋白、缀合物、gRNA、复合物、核酸、组合物等。已知若干种类型的递送粒子系统和/或配制品可用于不同范围的生物医学应用中。总的来说,粒子被限定为关于其转运和特性以整体单位表现的小物体。根据直径将粒子进一步分类。粗粒子的大小介于2500-10000纳米之间。细粒子的大小介于100-2500纳米之间。超细粒子或纳米粒子的大小大体上介于1-100纳米之间。可使用多种不同的常规技术进行粒子表征(包括例如表征形貌、尺寸等)。
本发明范围内的递送粒子系统可以任何形式提供,包括但不限于:脂质体(包括例如免疫脂质体)、病毒体(包括例如人工病毒体)、细胞外囊泡(包括例如外泌体、微囊泡和凋亡小体)、粒子(例如纳米粒子)、微泡、基因枪、电穿孔、声孔效应、磷酸钙介导的转染、阳离子转染、树枝状转染、热激转染、核转染、磁转染、脂转染、刺穿转染、光学转染、专有剂增强的核酸摄取、微注射。
如本文中所使用的,术语“外泌体”(exosomes)是转运某些物质(包括但不限于RNA和 蛋白质)的内源性纳米囊泡。
如本文中所使用的术语“脂质体”:脂质体是球形囊泡结构,其由围绕内部水性区室的单层或多层脂质双层以及相对不可渗透的外部亲脂性磷脂双层构成。脂质体作为药物递送载体受到了相当的重视,因为它们是生物相容、无毒的,可以递送亲水性和亲脂性药物分子,保护它们的内容物免于被血浆酶降解,并且转运跨过生物膜和血脑屏障。可以由几种不同类型的脂质制造脂质体;然而,磷脂最常用来产生作为药物载体的脂质体。可以将几种其他的添加剂添加到脂质体中,以便修饰其结构和特性。可以用脂质体进行根据本发明的递送或给药。
本发明所述细胞包括但不限于:原核细胞例如大肠杆菌细胞,以及真核细胞例如酵母细胞、昆虫细胞、植物细胞和动物细胞(如哺乳动物细胞,例如小鼠细胞、人类细胞等,例如人干细胞、人干细胞系,例如人造血干细胞、造血祖细胞等)。
术语真核细胞包括但不限于例如宿主细胞、细胞系和细胞子代。在其中一些实施例中,所述宿主细胞、细胞系和细胞子代可以是任选自体外、离体或体内的。
术语“药物”、“药剂”、“治疗剂”或“能够用于治疗的试剂”是可互换地使用的,并且是指在给予受试者时赋予某种有益影响的分子或化合物。该有益影响包括诊断确定的实现;改善疾病、症状、障碍、或病理学病况;减少或预防疾病、症状、障碍或病理学病况的发作;以及总体上对抗疾病、症状、障碍或病理学病况。
如本文中所使用的,术语“受试者”包括但不限于各种动物,例如哺乳动物,例如牛科动物、马科动物、羊科动物、猪科动物、犬科动物、猫科动物、兔科动物、啮齿类动物(例如,小鼠或大鼠)、非人灵长类动物(例如,猕猴或食蟹猴)或人。在某些实施方式中,所述受试者(例如人)患有病症(例如,疾病相关基因缺陷所导致的病症)。
术语“有效量”或“治疗有效量”是指一种药剂的足以实现有益或希望的结果的量。治疗有效量可依赖于接收治疗的受试者和疾病病状、受试者的重量和年龄、疾病病况的严重度、给药方式等中一项或多个而改变,并可以由本领域普通技术人员容易地确定。该术语也适用通过此处描述的显像方法中的任一项提供一种检测用图像的一个剂量。具体剂量可依赖于以下中一个或多个而变化:所选择的具体药剂、所遵循的给药方案、是否与其他化合物组合给予、给予时间、待显像的组织、以及携带它的物理递送系统。
如本文中所使用的,术语向个体“施用......”这一过程可以发生于体外、离体或体内。
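As used herein, the term "conservative replacement" (Conservative Replacement or Conservative Substitution) refers to replacement (that is, substitution) between amino acids with similar properties. The properties include, but are not limited to, ionic character, hydrophobicity and molecular weight. Thus the substitution may be, for example, (1) between aromatic amino acids (Phe, Trp, Tyr), (2) between non-polar aliphatic amino acids (Gly, Ala, Val, Leu, Met, Ile, Pro), (3) between uncharged polar amino acids (Ser, Thr, Cys, Asn, Gln), (4) between basic amino acids (Lys, Arg, His), or (5) between acidic amino acids (Asp, Glu).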
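The five example groups above can be encoded as a lookup to test whether a given substitution is conservative in this sense. This is a minimal sketch: the groups are the ones listed in the text (which presents them only as examples), and the function name is hypothetical.

```python
# Example groups from the definition, in one-letter codes.
CONSERVATIVE_GROUPS = (
    set("FWY"),      # aromatic: Phe, Trp, Tyr
    set("GAVLMIP"),  # non-polar aliphatic: Gly, Ala, Val, Leu, Met, Ile, Pro
    set("STCNQ"),    # uncharged polar: Ser, Thr, Cys, Asn, Gln
    set("KRH"),      # basic: Lys, Arg, His
    set("DE"),       # acidic: Asp, Glu
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two different residues fall in the same example group."""
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


print(is_conservative("F", "Y"))  # aromatic to aromatic
print(is_conservative("K", "D"))  # basic to acidic
```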
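Example 1: Screening of Cas13 proteins
The Cas13 proteins of the invention were obtained as follows:
1. Annotation of CRISPR arrays and genes
Software was used to predict the proteins encoded across the whole genomes of microbial genomes from the NCBI GenBank and CNGB databases (several million proteins), and CRISPRCasFinder was then used to predict the CRISPR arrays in the genomes; default parameter settings were used for this initial screen.
2. Preliminary protein screening
Using 95% protein sequence similarity as the criterion, redundant proteins were removed by clustering, and proteins with 100% sequence identity to another protein and 100% self-coverage were removed. Proteins shorter than 800 aa (amino acids) or longer than 1400 aa were also filtered out to avoid interference from overly long or overly short proteins, leaving several hundred thousand proteins.
3. Obtaining CRISPR-associated proteins
Protein sequences within 10 kb upstream or downstream of a CRISPR array were aligned against known Cas13 proteins, and alignments with an e-value greater than 1×10^-5 were filtered out.
The results were then aligned against the NCBI NR database and the EBI patent database; Cas13 proteins with sequence identity ≥95% and self-coverage ≥90% were filtered out, and after selection by the inventors about 100 candidate proteins were obtained.
Experimental validation finally yielded the Cas13 proteins of the invention: Cas13m.1 (SEQ ID NO: 1), Cas13m.2 (SEQ ID NO: 2), Cas13m.3 (SEQ ID NO: 3), Cas13m.4 (SEQ ID NO: 4), Cas13m.5 (SEQ ID NO: 5), Cas13m.6 (SEQ ID NO: 60), CasRfg.1 (SEQ ID NO: 6) and CasRfg.2 (SEQ ID NO: 7).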
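The numeric filters in the screening steps above (length 800-1400 aa, within 10 kb of a CRISPR array, e-value no greater than 1e-5 against known Cas13, and exclusion of hits with ≥95% identity and ≥90% self-coverage to already published sequences) can be sketched as a single predicate. The record fields, function name and candidate values are hypothetical; the real pipeline also involved clustering, database searches and manual selection, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    length_aa: int
    distance_to_array_bp: int          # distance to nearest CRISPR array
    evalue_vs_known_cas13: float       # best alignment e-value
    identity_to_published_pct: float   # best hit in NR / patent databases
    coverage_of_self_pct: float        # self-coverage of that best hit


def passes_screen(c: Candidate) -> bool:
    """Apply the thresholds stated in steps 2 and 3 of the screening."""
    if not 800 <= c.length_aa <= 1400:
        return False
    if c.distance_to_array_bp > 10_000:
        return False
    if c.evalue_vs_known_cas13 > 1e-5:
        return False
    already_known = (
        c.identity_to_published_pct >= 95 and c.coverage_of_self_pct >= 90
    )
    return not already_known


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("novel_hit", 1120, 2500, 1e-20, 41.0, 88.0),
    Candidate("too_short", 650, 1200, 1e-30, 35.0, 80.0),
    Candidate("published", 1150, 3000, 1e-50, 99.0, 100.0),
]
print([c.name for c in candidates if passes_screen(c)])
```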
The amino acid sequences of the above Cas13 proteins are shown in Table 1 below.
Table 1. Amino acid sequences of the Cas13 proteins
[Table 1 is supplied as images in the original publication: Figure PCTCN2022129825-appb-000001 to Figure PCTCN2022129825-appb-000007]
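In the above sequences, the two RxxxxH motifs (x denotes any amino acid residue) in each Cas13 protein are underlined. In the sequences of some Cas13 proteins (such as Cas13m.1 and Cas13m.3), more than one stretch matches the RxxxxH form. A multiple sequence alignment of the amino acid sequences of the five proteins Cas13m.1-Cas13m.5, or of the six proteins Cas13m.1-Cas13m.6, was therefore performed with the online MAFFT v7.487 program (E-INS-i algorithm, default settings otherwise), and the positions aligned with the RxxxxH motifs of the other proteins were taken as the catalytic-center RxxxxH motifs of Cas13m.1 and Cas13m.3; these are likewise underlined in the table above. The alignment also showed that the six proteins Cas13m.1-Cas13m.6 contain, in order from the N-terminus to the C-terminus, an RNxYxH motif and an RNxxxH motif, where each x is independently any naturally occurring amino acid residue.
In addition, the CasRfg.1 and CasRfg.2 proteins contain, in order from the N-terminus to the C-terminus, an RxxxxH motif and an RNxxxH motif.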
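The point that a sequence can contain more than two stretches of the RxxxxH form, so that alignment is needed to pick the catalytic pair, can be seen with a simple scan. This sketch lists every RxxxxH occurrence, including overlapping ones, and the spacing between consecutive hits; the function name and the short test peptide are made up for illustration and are not one of the patent's proteins.

```python
import re

# Lookahead so that overlapping RxxxxH occurrences are all reported.
RXXXXH = re.compile(r"(?=(R.{4}H))")


def find_rxxxxh(protein: str):
    """Return (0-based start, matched 6-mer) for every RxxxxH occurrence."""
    return [(m.start(), m.group(1)) for m in RXXXXH.finditer(protein)]


def spacings(hits):
    """Distances in residues between the starts of consecutive hits."""
    starts = [start for start, _ in hits]
    return [b - a for a, b in zip(starts, starts[1:])]


# Made-up peptide containing an RNxYxH-like and an RNxxxH-like stretch.
peptide = "MKRNAYSHAAARNSFGHK"
hits = find_rxxxxh(peptide)
print(hits)
print(spacings(hits))
```

A scan like this only enumerates candidates; deciding which two stretches are the HEPN catalytic motifs still relies on the multiple sequence alignment described above.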
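The sources of the genomic sequences of the above Cas13 proteins are shown in Table 2 below.
Table 2. Sources of the genomic sequences of the Cas13 proteins
[Table 2 is supplied as images in the original publication: Figure PCTCN2022129825-appb-000008 and Figure PCTCN2022129825-appb-000009]
Note: NCBI, US National Center for Biotechnology Information; CNGB, China National GeneBank.
The natural (wild-type) DNA coding sequences of the above Cas13 proteins are as follows:
The wild-type DNA coding sequence of Cas13m.1 is shown in SEQ ID NO: 8;
The wild-type DNA coding sequence of Cas13m.2 is shown in SEQ ID NO: 9;
The wild-type DNA coding sequence of Cas13m.3 is shown in SEQ ID NO: 10;
The wild-type DNA coding sequence of Cas13m.4 is shown in SEQ ID NO: 11;
The wild-type DNA coding sequence of Cas13m.5 is shown in SEQ ID NO: 12;
The wild-type DNA coding sequence of Cas13m.6 is shown in SEQ ID NO: 61;
The wild-type DNA coding sequence of CasRfg.1 is shown in SEQ ID NO: 13;
The wild-type DNA coding sequence of CasRfg.2 is shown in SEQ ID NO: 14.
The locus structures of the above Cas13 proteins are shown in Figure 1, which also compares the locus structures of the Cas13 proteins of the invention with those of the published Cas13 subtypes. In the figure, CRISPR denotes the CRISPR array (the DNA sequence containing the corresponding DR sequences), and Cas13e.1 and Cas13f.1 are from the Chinese patent with publication number CN112410377A. The figure shows that the locus structures of Cas13m.1-Cas13m.5 share essentially the same features, as do the locus structures of Cas13m.1-Cas13m.6.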
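Table 3 below lists the direct repeat (DR) sequences corresponding to the above Cas13 proteins:
Table 3. Direct repeat (DR) sequences corresponding to the Cas13 proteins
Cas13 protein  Corresponding direct repeat sequence  Sequence number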
Cas13m.1 GUUGUUACAGCCCUUAGUUUGUAGGGUAAUGACAAC SEQ ID NO:15
Cas13m.2 GUUGUAGAUGACCUCGUUUUGGAGGGGAAACACAAC SEQ ID NO:16
Cas13m.3 GUUGUAGAAGCCGUUCAUUCGGGACGGUAUGACAAC SEQ ID NO:17
Cas13m.4 GUUGUAAAUACCCACGUUUUGGUGGGCUAAUACAAC SEQ ID NO:18
Cas13m.5 GUUGUGUGUGCCUUUCAAAUUGAAGGCGUUCCCAAC SEQ ID NO:19
Cas13m.6 GUUGUAGAAGCCUAUCGUUAGGAUAGGUAUGACAAC SEQ ID NO:62
CasRfg.1 AUGACUAUACCAGCAAUGGCUGGAUUAAAAC SEQ ID NO:20
CasRfg.2 GGUUUUACACCCGUGUAAAACUACACAGUUCUAAAAC SEQ ID NO:21
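Figure 2 shows the positions of the RxxxxH motifs along the amino acid chain of each Cas13 protein of the invention. In Cas13m.1, Cas13m.2, Cas13m.3, Cas13m.4, Cas13m.5, and Cas13m.6, the two RxxxxH motifs are markedly far apart: except in Cas13m.1, the spacing is essentially 920 aa or more. The two motifs are 923 aa apart in Cas13m.2, as much as 1061 aa apart in Cas13m.3, 1011 aa apart in Cas13m.5, and 1011 aa apart in Cas13m.6.
Using the online MAFFT version 7 (E-INS-i algorithm), a phylogenetic tree was built from the newly identified Cas13 proteins of the invention (Cas13m.1-Cas13m.5, or Cas13m.1-Cas13m.6) and the previously identified Cas13 subtypes (Cas13a, Cas13b, Cas13c, Cas13d, Cas13e, and Cas13f). Some of these protein sequences are published in NCBI; Cas13e and Cas13f are taken from patent publication CN112410377A. The Cas13m.1-Cas13m.5 (or Cas13m.1-Cas13m.6) proteins of the invention form their own cluster on the tree, and the other subtypes Cas13a/b/c/d/e/f likewise each form a separate cluster. See panels A and B of Figure 3.
RNAfold was used to predict the RNA secondary structures of the direct repeat sequences corresponding to Cas13m.1-Cas13m.6, CasRfg.1, and CasRfg.2 of the invention; the results are shown in Figure 4. The figure shows that the DR sequences corresponding to Cas13m.1-Cas13m.6 have a conserved secondary structure.
The RNAfold predictions were then analyzed further. As shown in Figure 4, the direct repeat sequences corresponding to Cas13m.1, Cas13m.2, Cas13m.3, Cas13m.4, and Cas13m.5 all clearly share a conserved secondary structure. Panel A is a schematic of this conserved structure, which comprises a base-paired first stem (stem 1), a non-complementary bulge, a base-paired second stem (stem 2), and a non-complementary loop; stem 1 and stem 2 each contain complementary base pairs. Panels B-F show the secondary structures of the direct repeat sequences corresponding to Cas13m.1, Cas13m.2, Cas13m.3, Cas13m.4, and Cas13m.5, respectively, in which stem 1 contains 4 base pairs (5'-GUUG-3'), 5 base pairs (5'-GUUGU-3'), 6 base pairs (5'-GUUGUA-3'), or 7 base pairs (5'-GUUGUUA-3'). The direct repeat sequence corresponding to Cas13m.6 has the same common structural features. Panels G and H show the secondary structures of the direct repeat sequences corresponding to CasRfg.1 and CasRfg.2, respectively.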
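The stem 1 lengths quoted above can be cross-checked against the Table 3 sequences with a very simple pairing walk from the two ends of each DR. The sketch below is not RNAfold and performs no energy minimization; it only counts how many terminal positions can pair (Watson-Crick or G-U wobble) before the first mismatch, under the assumption that stem 1 is formed by the 5' and 3' ends of the DR. The function name is hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def terminal_stem_length(rna):
    """Count consecutive base pairs formed between the 5' and 3' ends."""
    i, j = 0, len(rna) - 1
    length = 0
    while i < j and (rna[i], rna[j]) in PAIRS:
        length += 1
        i += 1
        j -= 1
    return length


# DR sequences copied from Table 3 (SEQ ID NO:15-19).
DIRECT_REPEATS = {
    "Cas13m.1": "GUUGUUACAGCCCUUAGUUUGUAGGGUAAUGACAAC",
    "Cas13m.2": "GUUGUAGAUGACCUCGUUUUGGAGGGGAAACACAAC",
    "Cas13m.3": "GUUGUAGAAGCCGUUCAUUCGGGACGGUAUGACAAC",
    "Cas13m.4": "GUUGUAAAUACCCACGUUUUGGUGGGCUAAUACAAC",
    "Cas13m.5": "GUUGUGUGUGCCUUUCAAAUUGAAGGCGUUCCCAAC",
}

for name, dr in DIRECT_REPEATS.items():
    n = terminal_stem_length(dr)
    print(name, n, dr[:n])
```

With this counting rule the terminal stems come out as GUUGUUA, GUUGU, GUUGU, GUUGUA, and GUUG, which matches the four stem 1 variants listed in the text; a thermodynamic folding program remains the appropriate tool for the bulge, stem 2, and loop.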
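The three-dimensional structures of the Cas13m proteins were predicted with the protein structure prediction program AlphaFold v2.0 and are shown in Figure 5, where panels A, B, and C are Cas13m.2, Cas13m.3, and Cas13m.6, respectively. Although the two RxxxxH motifs (marked in dark color) of Cas13m.2, Cas13m.3, and Cas13m.6 are far apart along the amino acid chain, they lie very close to each other in space.
The proteins were then superimposed with PyMOL V2.5.1; the results are shown in Figure 6, where panel A is the superposition of Cas13m.2 and Cas13m.3 and panel B is the superposition of Cas13m.3 and Cas13m.6. Cas13m.2 and Cas13m.3 have similar three-dimensional structures (RMSD=2.402), as do Cas13m.3 and Cas13m.6 (RMSD=2.368).
BLASTp was used to compare the CasRfg.1 protein with the Cas13 proteins deposited in NCBI; the e-value against Cas13c was the lowest among the Cas13 subtypes. Taken together with the phylogenetic analysis in Figure 3, CasRfg.1 was assigned to the Cas13c subtype. BLASTp was likewise used to compare the CasRfg.2 protein with the Cas13 proteins deposited in NCBI; the e-value against Cas13d was the lowest among the Cas13 subtypes. Taken together with the phylogenetic analysis in Figure 3, CasRfg.2 was assigned to the Cas13d subtype.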
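Example 2: Preparation, isolation, and purification of Cas13 proteins
(I) Vector construction
1. The pET28a vector plasmid was double-digested with BamHI and XhoI, and the linearized vector was recovered by gel extraction after agarose gel electrophoresis. A synthetic DNA fragment containing the coding sequence of the recombinant protein (comprising a protein sequence of Example 1 together with nuclear localization sequences) was inserted into the cloning region of pET28a by homologous recombination. The reaction mixture was transformed into Stbl3 competent cells, which were plated on LB plates containing kanamycin sulfate and cultured overnight at 37°C; clones were then picked and verified by sequencing.
The resulting recombinant vectors were named Cas13m.1-pET28a, Cas13m.2-pET28a, Cas13m.3-pET28a, Cas13m.4-pET28a, Cas13m.5-pET28a, CasRfg.1-pET28a, and CasRfg.2-pET28a.
The recombinant vectors were used to express, respectively, the Cas13m.1 recombinant protein (SEQ ID NO:22), the Cas13m.2 recombinant protein (SEQ ID NO:23), the Cas13m.3 recombinant protein (SEQ ID NO:24), the Cas13m.4 recombinant protein (SEQ ID NO:25), the Cas13m.5 recombinant protein (SEQ ID NO:26), the CasRfg.1 recombinant protein (SEQ ID NO:27), and the CasRfg.2 recombinant protein (SEQ ID NO:28).
The architecture of the recombinant Cas13 proteins is His tag-NLS-Cas13-SV40 NLS-nucleoplasmin NLS.
2. Sequence-verified positive clones were cultured overnight; the plasmids were extracted and transformed into the expression strain Rosetta(DE3), which was plated on LB plates containing kanamycin sulfate and cultured overnight at 37°C.
(II) Protein expression
1. A single colony was inoculated into 5 ml of LB medium containing kanamycin sulfate and cultured overnight at 37°C.
2. The culture was transferred at a ratio of 1:100 into 500 ml of LB medium containing kanamycin sulfate and grown at 37°C with shaking at 220 rpm to an OD of 0.6. IPTG was added to a final concentration of 0.2 mM, and expression was induced at 16°C for 24 h.
3. The cells were collected by centrifugation, rinsed with 15 ml of PBS, and collected again by centrifugation. Lysis buffer was added and the cells were disrupted by sonication. Centrifugation at 10,000 g for 30 min gave a supernatant containing the recombinant protein, which was passed through a 0.45 μm filter and was then ready for column purification.
(III) Protein purification
The recombinant Cas13 protein architecture contains NLS sequences. Using the six N-terminal His residues as the purification tag, the recombinant Cas13 proteins were purified by IMAC (Ni Sepharose 6 Fast Flow, Cytiva). On SDS-PAGE, each purified recombinant protein appeared as a single band in the 100-250 kDa range.
Example 3: Preparation, isolation, and purification of Cas13m.6
The recombinant vector Cas13m.6-pET28a (SEQ ID NO:83) was constructed by the same method as in Example 2 and transformed into the expression strain BL21-CodonPlus(DE3)-RIPL. The Cas13m.6 recombinant protein (architecture: His tag-NLS-Cas13-SV40 NLS-nucleoplasmin NLS) was then expressed and purified by the same method as above. On SDS-PAGE, the purified Cas13m.6 recombinant protein appeared as a single band in the 100-250 kDa range.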
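Example 4: Editing activity on an exogenous gene in cells
1. Synthesis of EGFP-targeting vectors to be verified
EGFP (enhanced green fluorescent protein) was used as the exogenous reporter gene; its nucleic acid sequence (720 bp) is shown in SEQ ID NO:29.
The spacer sequence targeting EGFP is: tgccgttcttctgcttgtcggccatgatat (SEQ ID NO:30).
The sequence of the exogenous EGFP expression vector is shown in SEQ ID NO:31, that of the Cas13m.2 verification vector in SEQ ID NO:32, that of the Cas13m.3 verification vector in SEQ ID NO:33, that of the Cas13m.5 verification vector in SEQ ID NO:34, and that of the CasRfg.2 verification vector in SEQ ID NO:35.
The Cas13m.1 and Cas13m.4 verification vectors have the same backbone sequence as the Cas13m.3 verification vector; only the Cas13 protein coding sequence and the sequence encoding the DR were replaced accordingly. The CasRfg.1 verification vector has the same backbone sequence as the CasRfg.2 verification vector; only the Cas13 protein coding sequence and the sequence encoding the DR were replaced accordingly.
Each verification vector contains a codon-optimized Cas13 protein coding sequence and can express both the Cas13 protein fused to NLS and an EGFP-targeting gRNA containing the DR sequence corresponding to that Cas13. The guide sequence of the gRNA corresponds to the spacer sequence above (SEQ ID NO:30).
All of the above vectors were synthesized by a reagent company using conventional methods.
2. Transfection of 293T cells with the vectors to be verified
The plasmid expressing the exogenous gene EGFP (abbreviated EGFP) was co-transfected with each of the verification vector plasmids above at a ratio of 1:2 (300 ng:600 ng) into 293T cells in 24-well plates.
The transfection procedure was as follows:
293T cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2×10^5 cells in 500 μL per well were seeded in 24-well plates.
For each transfection sample, the complexes were prepared as follows:
a. For each well of the 24-well plate to which cells had been added, dilute the plasmid DNA described above in 50 μL of serum-free Opti-MEM I Reduced Serum Medium (Thermo, 25200056) and mix gently;
b. Mix the Lipofectamine 2000 (Thermo, 11668019) gently before use, then for each well dilute 1.8 μL of Lipofectamine 2000 in 50 μL of Opti-MEM I medium. Incubate at room temperature for 5 minutes. Note: proceed to step c within 25 minutes;
c. After the 5-minute incubation, combine the diluted DNA with the diluted Lipofectamine 2000. Mix gently and incubate at room temperature for 20 minutes (the solution may appear cloudy). The complexes are stable for 6 hours at room temperature.
The complexes were added to the 293T cells and mixed; after 48 h, the cells were analyzed by flow cytometry.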
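3. Flow cytometric measurement of EGFP down-regulation by the Cas13 proteins
The cells and plasmids used are described in Table 4 below:
Table 4. Transfection groups
Group EGFP vector transfected EGFP-targeting Cas13 verification vector transfected Description
293T / / Cell control
EGFP * / Control transfected with EGFP only
CasRfg.1 * * Verification vector
Cas13m.1 * * Verification vector
Cas13m.2 * * Verification vector
Cas13m.3 * * Verification vector
Cas13m.4 * * Verification vector
Cas13m.5 * * Verification vector
CasRfg.2 * * Verification vector
Note: * indicates that the item is present; / indicates that the item is absent.
The 293T cells from step 2, 48 h after transfection, were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and centrifuged at 300 g for 5 min, and the supernatant was discarded. The cells from each well were resuspended in 500 μL of PBS. Cell debris was excluded by gating on FSC-A and SSC-A, and EGFP fluorescence was then measured by flow cytometry.
The Mean-FITC-A value of the FITC channel was recorded, and the degree of down-regulation was calculated with the following formula:
Down-regulation (%) = (a-x) ÷ a × 100,
where a is the GFP fluorescence of the EGFP group and x is the GFP fluorescence of the other group.
The blank control group is not included in the comparison.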
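The formula above can be written as a small helper. This is only a sketch of the arithmetic; the function and argument names are hypothetical, and the numbers in the usage lines are made-up values chosen for easy checking, not measurements from this application.

```python
def knockdown_percent(egfp_only_mfi, sample_mfi):
    """Down-regulation (%) = (a - x) / a * 100.

    egfp_only_mfi: mean FITC-A fluorescence of the EGFP-only group (a).
    sample_mfi:    mean FITC-A fluorescence of the group being compared (x).
    """
    if egfp_only_mfi <= 0:
        raise ValueError("reference fluorescence must be positive")
    return (egfp_only_mfi - sample_mfi) / egfp_only_mfi * 100.0


# Illustrative values only.
print(knockdown_percent(200.0, 50.0))   # sample at one quarter of the reference
print(knockdown_percent(200.0, 200.0))  # no change relative to the reference
```

By construction the EGFP-only group scores 0, which is why it appears as 0.00 in the results table.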
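The experiment in this example was repeated three times. The down-regulation results are shown in Table 5 below and in Figure 7; the values are the means of the three runs.
Table 5. GFP fluorescence results measured by flow cytometry
Group Down-regulation (%)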
EGFP 0.00
Cas13m.1 46
Cas13m.2 67.31
Cas13m.3 76.82
Cas13m.4 59.73
Cas13m.5 50.08
CasRfg.1 33
CasRfg.2 39.19
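Note: based on the means of the three runs, the GFP fluorescence intensity of the 293T group was 1073.55 and that of the EGFP group was 8052219.55.
The table shows that all of the above Cas13 proteins significantly down-regulate EGFP expression, demonstrating that, when guided by gRNA, they effectively reduce mRNA levels in eukaryotic cells and thus exert editing activity. Cas13m.2 and Cas13m.3 produced the largest down-regulation of EGFP expression.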
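Example 5: Verification of endogenous gene editing efficiency
1. Construction of editing vectors targeting the endogenous genes AQp1 and PTBP1
Codon-optimized expression vectors for Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, and CasRx (a Cas13d), each carrying a universal gRNA scaffold expression cassette, were synthesized by a reagent company: Cas13m.2-BsaI (SEQ ID NO:36), Cas13m.3-BsaI (SEQ ID NO:37), Cas13m.5-BsaI (SEQ ID NO:38), CasRfg.2-BsaI (SEQ ID NO:39), and CasRx-BpiI (SEQ ID NO:40).
The endogenous targets chosen were AQp1 and PTBP1. AQp1 was tested in a 293T cell line with high AQp1 expression (293T-AQp1 cells), and PTBP1 was tested in the 293T cell line.
Construction of the 293T cell line with high AQp1 expression: the vector Lv-AQp1-T2a-GFP, which overexpresses the AQp1 gene and the EGFP gene, was constructed; its sequence is shown in SEQ ID NO:41. AQp1 and EGFP are separated by a 2A peptide. The Lv-AQp1-T2a-GFP plasmid was packaged into lentivirus, which was used to transduce 293T cells, giving a cell line that stably overexpresses the AQp1 gene.
The guide sequence of the gRNA targeting AQp1 was:
GAAGACAAAGAGGGUCGUGG (SEQ ID NO:42)
The guide sequence of the gRNA targeting PTBP1 was:
GUGGUUGGAGAACUGGAUGUAGAUGGGCUG (SEQ ID NO:43)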
Fragments targeting the target sites were obtained by primer annealing, using the following primers:
PTBP1-targeting groups:
Cas13m.2 group:
F: CACCGTGGTTGGAGAACTGGATGTAGATGGGCTG (SEQ ID NO:44)
R: CAACCAGCCCATCTACATCCAGTTCTCCAACCAC (SEQ ID NO:45)
Cas13m.3 group:
F: CACCGTGGTTGGAGAACTGGATGTAGATGGGCTG (SEQ ID NO:44)
R: CAACCAGCCCATCTACATCCAGTTCTCCAACCAC (SEQ ID NO:45)
Cas13m.5 group:
F: CACCGTGGTTGGAGAACTGGATGTAGATGGGCTG (SEQ ID NO:44)
R: CAACCAGCCCATCTACATCCAGTTCTCCAACCAC (SEQ ID NO:45)
CasRfg.2 group:
F: AAACGTGGTTGGAGAACTGGATGTAGATGGGCTG (SEQ ID NO:46)
R: AAAACAGCCCATCTACATCCAGTTCTCCAACCAC (SEQ ID NO:47)
CasRx group:
F: AAACGTGGTTGGAGAACTGGATGTAGATGGGCTG (SEQ ID NO:46)
R: CTTGCAGCCCATCTACATCCAGTTCTCCAACCAC (SEQ ID NO:48)
AQp1-targeting groups:
Cas13m.2 group:
F: CACCGAAGACAAAGAGGGTCGTGG (SEQ ID NO:49)
R: CAACCCACGACCCTCTTTGTCTTC (SEQ ID NO:50)
Cas13m.3 group:
F: CACCGAAGACAAAGAGGGTCGTGG (SEQ ID NO:49)
R: CAACCCACGACCCTCTTTGTCTTC (SEQ ID NO:50)
Cas13m.5 group:
F: CACCGAAGACAAAGAGGGTCGTGG (SEQ ID NO:49)
R: CAACCCACGACCCTCTTTGTCTTC (SEQ ID NO:50)
CasRfg.2 group:
F: AAACGAAGACAAAGAGGGTCGTGG (SEQ ID NO:51)
R: AAAACCACGACCCTCTTTGTCTTC (SEQ ID NO:52)
CasRx group:
F: AAACGAAGACAAAGAGGGTCGTGG (SEQ ID NO:51)
R: CTTGCCACGACCCTCTTTGTCTTC (SEQ ID NO:53)
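The primer pairs listed above follow a regular pattern: the forward oligo is a 4-nt overhang followed by the DNA form of the guide, and the reverse oligo is a different 4-nt overhang followed by the reverse complement of the guide. The sketch below reproduces that pattern; the function names are hypothetical, and the overhangs are simply read off the listed primers (CACC/CAAC for the Cas13m vectors, AAAC/AAAA for CasRfg.2, AAAC/CTTG for CasRx) rather than derived from the vector sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def annealing_oligos(guide_rna, fwd_overhang, rev_overhang):
    """Return (forward, reverse) oligos for cloning a guide by annealing."""
    guide_dna = guide_rna.upper().replace("U", "T")
    forward = fwd_overhang + guide_dna
    reverse = rev_overhang + reverse_complement(guide_dna)
    return forward, reverse


PTBP1_GUIDE = "GUGGUUGGAGAACUGGAUGUAGAUGGGCUG"  # SEQ ID NO:43
AQP1_GUIDE = "GAAGACAAAGAGGGUCGUGG"             # SEQ ID NO:42

# Cas13m vectors (BsaI sites): overhangs CACC / CAAC, as in SEQ ID NO:44/45.
print(annealing_oligos(PTBP1_GUIDE, "CACC", "CAAC"))
# CasRx vector (BpiI sites): overhangs AAAC / CTTG, as in SEQ ID NO:51/53.
print(annealing_oligos(AQP1_GUIDE, "AAAC", "CTTG"))
```

After annealing, the two 4-nt overhangs remain single-stranded and provide the sticky ends that match the digested vector.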
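The primer annealing reaction mixture is shown below. It was incubated at 95°C for 5 minutes in a PCR instrument, then immediately removed and incubated on ice for 5 minutes, so that the primers annealed to each other to form double-stranded DNA with sticky ends:
Figure PCTCN2022129825-appb-000010
The synthesized Cas13m-BsaI and CasRfg-BsaI plasmids were digested with BsaI, and the annealed products were each ligated (T4 ligation) to the corresponding purified, recovered backbone. After transformation into E. coli, positive clones were selected and the verification vector plasmids targeting the endogenous gene mRNA were extracted for verification in cell experiments. The CasRx-BpiI plasmid was digested with BpiI, and the annealed products were ligated (T4 ligation) to the purified, recovered backbone. After transformation into E. coli, positive clones were selected and the verification vector plasmids targeting the endogenous gene mRNA were extracted for verification in cell experiments.
2. Transfection of 293T cells and 293T-AQp1 cells with the vectors to be verified
The AQp1-targeting Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, and CasRx plasmids obtained in the previous step (the verification vector plasmids targeting the endogenous gene mRNA) were each transfected at 800 ng into 293T-AQp1 cells in 24-well plates. The negative control group was transfected with the CasRx-BpiI plasmid.
The PTBP1-targeting Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, and CasRx plasmids obtained in the previous step were each transfected at 800 ng into 293T cells in 24-well plates. The negative control group was transfected with the CasRx-BpiI plasmid.
The transfection procedure was as follows:
1) Cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2×10^5 cells in 500 μL per well were seeded in 24-well plates.
2) For each transfection sample, the complexes were prepared as follows:
a. For each well of the 24-well plate to which cells had been added, dilute the plasmid DNA described above in 50 μL of serum-free Opti-MEM I Reduced Serum Medium (Thermo, 25200056) and mix gently;
b. Mix the Lipofectamine 2000 (Thermo, 11668019) gently before use, then for each well dilute 1.8 μL of Lipofectamine 2000 in 50 μL of Opti-MEM I medium. Incubate at room temperature for 5 minutes. Note: proceed to step c within 25 minutes;
c. After the 5-minute incubation, combine the diluted DNA with the diluted Lipofectamine 2000. Mix gently and incubate at room temperature for 20 minutes (the solution may appear cloudy). Note: the complexes are stable for 6 hours at room temperature.
The complexes were added to the cells and mixed; after 72 h, detection was carried out on a QuantStudio™ 5 Real-Time PCR System (96-well).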
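3. qPCR measurement of changes in target gene mRNA
1) Experimental method
RNA was extracted from the cells 72 h after transfection with the SteadyPure Universal RNA Extraction Kit (AG21017), and the RNA concentration was measured with a micro-volume spectrophotometer. The RNA was reverse-transcribed with the Evo M-MLV Mix Kit with gDNA Clean for qPCR (AG11728), and the reverse transcription products were analyzed with the SYBR Green Premix Pro Taq HS qPCR Kit (Low Rox Plus) (AG11720).
The qPCR primers were as follows:
PTBP1: ATTGTCCCAGATATAGCCGTTG (SEQ ID NO:54)
GCTGTCATTTCCGTTTGCTG (SEQ ID NO:55)
AQp1: GCTCTTCTGGAGGGCAGTGG (SEQ ID NO:56)
CAGTGTGACAGCCGGGTTGAG (SEQ ID NO:57)
Internal reference GAPDH: CCATGGGGAAGGTGAAGGTC (SEQ ID NO:58)
GAAGGGGTCATTGATGGCAAC (SEQ ID NO:59)
The reaction mixtures were prepared according to the instructions of the SYBR Green Premix Pro Taq HS qPCR Kit (Low Rox Plus) (AG11720) and run on a QuantStudio™ 5 Real-Time PCR System (96-well).
2) Calculation method
Changes in the target RNA were calculated by relative quantification, i.e. the 2^(-ΔΔCt) method, as follows:
ΔCt = Ct(AQp1) - Ct(GAPDH) or Ct(PTBP1) - Ct(GAPDH);
ΔΔCt = ΔCt(sample to be verified, e.g. the Cas13m.2 group) - ΔCt(negative control group);
relative expression = 2^(-ΔΔCt).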
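The three steps of the 2^(-ΔΔCt) calculation can be sketched as a single function. The function and argument names are hypothetical, and the Ct values in the usage lines are invented round numbers for illustration, not data from this example.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative target mRNA level by the 2^(-ddCt) method.

    ct_target_*: Ct of the target gene (e.g. AQp1 or PTBP1).
    ct_ref_*:    Ct of the internal reference gene (GAPDH).
    *_sample:    group under test; *_control: negative control group.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Illustrative Ct values only: the target appears 3 cycles later in the sample.
print(relative_expression(25.0, 18.0, 22.0, 18.0))
# The negative control compared with itself is 1 by definition.
print(relative_expression(22.0, 18.0, 22.0, 18.0))
```

A shift of three cycles corresponds to an eightfold reduction, which is why the negative control row of the results table is fixed at 1.00 and knockdown appears as values below 1.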
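The experiment in this example was repeated three times. The relative mRNA expression of AQp1 and PTBP1 calculated as above is shown in Table 6 below and in Figure 8; the values are the means of the three runs:
Table 6. Relative expression of the target mRNA calculated by the 2^(-ΔΔCt) method
Group AQp1 mRNA level PTBP1 mRNA level
Negative control 1.00 1.00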
CasRx 0.05 0.60
Cas13m.2 0.03 0.49
Cas13m.3 0.02 0.46
Cas13m.5 0.27 0.70
CasRfg.2 0.36 0.78
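The qPCR results show that Cas13m.2, Cas13m.3, Cas13m.5, and CasRfg.2 all down-regulate the expression of AQp1 and PTBP1. Cas13m.2 and Cas13m.3 down-regulate AQp1 and PTBP1 more strongly than CasRx and thus have very good editing activity. Cas13m.5 and CasRfg.2 also show significant editing activity.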
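Example 6: Order in which the DR sequence and the guide sequence are joined
This example tests how the order of the DR sequence and the guide sequence within the gRNA molecule affects editing efficiency.
1. Construction of editing vectors targeting the endogenous gene AQp1
The endogenous target chosen for this experiment was AQp1, tested in the 293T cell line with high AQp1 expression described in the preceding example.
The guide sequence of the gRNA targeting AQp1 was GAAGACAAAGAGGGUCGUGG (SEQ ID NO:42).
The verification vectors used were as follows:
Name gRNA structure (5' to 3')
Cas13m.2 guide sequence-direct repeat sequence
Cas13m.3 guide sequence-direct repeat sequence
Cas13m.5 guide sequence-direct repeat sequence
CasRfg.2 direct repeat sequence-guide sequence
Cas13m.2-r direct repeat sequence-guide sequence
Cas13m.3-r direct repeat sequence-guide sequence
Cas13m.5-r direct repeat sequence-guide sequence
CasRfg.2-r guide sequence-direct repeat sequence
The Cas13m.2, Cas13m.3, Cas13m.5, and CasRfg.2 verification vectors targeting the mRNA of the endogenous gene AQp1 had already been constructed in Example 5. The Cas13m.2-r, Cas13m.3-r, Cas13m.5-r, and CasRfg.2-r verification vectors targeting AQp1 mRNA, in which the gRNA structure is rearranged (the positions of the guide sequence and the direct repeat sequence are swapped), were synthesized by a reagent company; apart from the gRNA coding sequence, they are identical to the Cas13m.2, Cas13m.3, Cas13m.5, and CasRfg.2 verification vectors.
2. Transfection of 293T cells and 293T-AQp1 cells with the vectors to be verified
The Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, Cas13m.2-r, Cas13m.3-r, Cas13m.5-r, and CasRfg.2-r verification vectors targeting AQp1 mRNA, together with the control plasmid (the CasRx verification vector plasmid targeting AQp1 mRNA from Example 5), were each transfected at 800 ng into 293T-AQp1 cells in 24-well plates. The negative control group was transfected with the CasRx-BpiI plasmid of Example 5.
The transfection procedure was as follows:
1. Cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2×10^5 cells in 500 μL per well were seeded in 24-well plates.
2. For each transfection sample, the complexes were prepared as follows:
a. For each well of the 24-well plate to which cells had been added, dilute the plasmid DNA described above in 50 μL of serum-free Opti-MEM I Reduced Serum Medium (Thermo, 25200056) and mix gently;
b. Mix the Lipofectamine 2000 (Thermo, 11668019) gently before use, then for each well dilute 1.8 μL of Lipofectamine 2000 in 50 μL of Opti-MEM I medium. Incubate at room temperature for 5 minutes. Note: proceed to step c within 25 minutes;
c. After the 5-minute incubation, combine the diluted DNA with the diluted Lipofectamine 2000. Mix gently and incubate at room temperature for 20 minutes (the solution may appear cloudy). Note: the complexes are stable for 6 hours at room temperature.
The complexes were added to the cells and mixed; after 72 h, detection was carried out on a QuantStudio™ 5 Real-Time PCR System (96-well).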
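3. qPCR measurement of changes in target gene RNA
RNA was extracted from the cells 72 h after transfection with the SteadyPure Universal RNA Extraction Kit (AG21017), and the RNA concentration was measured with a micro-volume spectrophotometer. The RNA was reverse-transcribed with the Evo M-MLV Mix Kit with gDNA Clean for qPCR (AG11728), and the reverse transcription products were analyzed with the SYBR Green Premix Pro Taq HS qPCR Kit (Low Rox Plus) (AG11720).
The qPCR primers were the AQp1 primer pair shown in SEQ ID NO:56-57 and the primer pair for the internal reference GAPDH shown in SEQ ID NO:58-59.
The reaction mixtures were prepared according to the instructions of the SYBR Green Premix Pro Taq HS qPCR Kit (Low Rox Plus) (AG11720) and run on a QuantStudio™ 5 Real-Time PCR System (96-well).
The qPCR results are as follows:
The relative expression of the target RNA was calculated by relative quantification, i.e. the 2^(-ΔΔCt) method, as follows:
ΔCt = Ct(AQp1) - Ct(GAPDH);
ΔΔCt = ΔCt(sample to be verified, e.g. the Cas13m.2 group) - ΔCt(negative control group);
relative expression = 2^(-ΔΔCt).
The AQp1 mRNA levels calculated in this way are shown in Table 7 below:
Table 7. Relative expression of the target RNA calculated by the 2^(-ΔΔCt) method
Group AQp1 mRNA level
Negative control 1.00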
Cas13m.2 0.04
Cas13m.3 0.04
Cas13m.5 0.36
CasRfg.2 0.30
Cas13m.2-r 0.83
Cas13m.3-r 0.78
Cas13m.5-r 0.74
CasRfg.2-r 0.67
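The qPCR results show that the editing activity of Cas13m.2-r, Cas13m.3-r, Cas13m.5-r, and CasRfg.2-r, in which the relative positions of the direct repeat sequence and the guide sequence were swapped, is markedly reduced.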
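Example 7: Editing activity of Cas13m.6 on an exogenous gene in cells, and comparison of Cas13m activity with previously published proteins
Unless otherwise stated, the experiments in this example used the same methods as Example 4.
1. Synthesis of EGFP-targeting vectors to be verified
A Cas13m.6 verification vector targeting EGFP was prepared; its full-length sequence (7690 bp) is shown in SEQ ID NO:105.
A search of NCBI found that two Cas13 proteins have been published there, namely the C13-38 protein (GenBank: MBQ9236733.1) and the C13-40 protein (NCBI Reference Sequence: WP_025000926.1), together with their corresponding DR sequences.
The inventors compared the gene editing activity of Cas13m with that of C13-38 and C13-40.
The C13-38 sequence is:
Figure PCTCN2022129825-appb-000011
Figure PCTCN2022129825-appb-000012
The DR sequence corresponding to C13-38 is:
5'-GUUUUCAUACCUAUCCAAACGAUAGGCUUCUAAAAC-3' (SEQ ID NO: 64)
The C13-40 sequence is:
Figure PCTCN2022129825-appb-000013
The DR sequence corresponding to C13-40 is:
5'-GUUGUUUUUACCUUUCAAACAGAAGGCAGAUACAACA-3' (SEQ ID NO: 66)
The C13-38 and C13-40 verification vectors were constructed by the method described above. Their backbone sequences are identical to that of the Cas13m.3 verification vector of Example 4; only the Cas13 protein coding sequence and the sequence encoding the DR were replaced accordingly.
The Cas13m.6, C13-38, and C13-40 verification vectors all contain a Cas13 protein coding sequence and can express both the Cas13 protein fused to NLS and an EGFP-targeting gRNA (for C13-38 and C13-40, the guide sequence of the gRNA is located at the 5' end of the DR sequence). The guide sequence of the gRNA corresponds to the spacer sequence above (SEQ ID NO:30). All of the above vectors were synthesized by a reagent company using conventional methods.
2. Transfection of 293T cells with the vectors to be verified
The plasmid expressing the exogenous gene EGFP and a Cas13 verification vector plasmid (the Cas13m.6, C13-38, or C13-40 verification vector, or one of the other Cas13m verification vectors prepared in Example 4) were transfected into 293T cells at 300 ng:600 ng.
The cells and plasmids used are described in Table 8 below:
Table 8. Experimental groups
Figure PCTCN2022129825-appb-000014
Note: * indicates that the item is present; a blank indicates that the item is absent.
3. Flow cytometric measurement of EGFP down-regulation by the Cas13 proteins
Let a be the GFP fluorescence of the EGFP group and x the GFP fluorescence of the other group.
Down-regulation (%) = (a-x) ÷ a × 100
The experiment in this example was repeated three times. The values are the means of the three runs and are shown in Table 9 below.
Table 9. GFP fluorescence results measured by flow cytometry
Group Down-regulation (%)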
EGFP 0.00
Cas13m.2 61.8 *#
Cas13m.3 67.2 *#
Cas13m.5 42.3 *#
Cas13m.6 65.5 *#
C13-38 4.2
C13-40 2.9
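Note: * indicates a significant difference from the C13-38 group (P<0.01); # indicates a significant difference from the C13-40 group (P<0.01).
The results show that EGFP was down-regulated by 65.5% in the Cas13m.6 group. The Cas13m.6 protein therefore significantly down-regulates EGFP expression, demonstrating that, when guided by gRNA, it effectively reduces mRNA levels in eukaryotic cells and thus exerts editing activity.
The results also show that the editing activity of the Cas13m.2, Cas13m.3, Cas13m.5, and Cas13m.6 proteins is significantly higher than that of C13-38 and C13-40.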
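Example 8: Verification of the endogenous gene editing efficiency of Cas13m.6
Unless otherwise stated, this example used the same methods as Example 5.
1. Construction of editing vectors targeting the endogenous genes AQp1 and PTBP1
The expression vector Cas13m.6-BsaI was constructed; its sequence is shown in SEQ ID NO:77.
The endogenous targets chosen were AQp1 and PTBP1. AQp1 was tested in the 293T cell line with high AQp1 expression described above (293T-AQp1 cells), and PTBP1 was tested in the 293T cell line.
The guide sequence of the gRNA targeting AQp1 was:
AGGGCAGAACCGATGCTGATGAAGAC (SEQ ID NO:68)
The guide sequence targeting PTBP1 was:
GUGGUUGGAGAACUGGAUGUAGAUGGGCUG (SEQ ID NO:43)
Fragments targeting the target sites were obtained by primer annealing, using the following primers:
Targeting PTBP1:
Cas13m.6 group: caccGTGGTTGGAGAACTGGATGTAGATGGGCTG (SEQ ID NO:44)
caacCAGCCCATCTACATCCAGTTCTCCAACCAC (SEQ ID NO:45)
CasRx group: SEQ ID NO:46 and SEQ ID NO:48.
Targeting AQp1:
Cas13m.6 group: CACCGagggcagaaccgatgctgatgaagac (SEQ ID NO:69)
CAACgtcttcatcagcatcggttctgccctc (SEQ ID NO:70)
CasRx group: aaacagggcagaaccgatgctgatgaagac (SEQ ID NO:71)
CTTGgtcttcatcagcatcggttctgccct (SEQ ID NO:72)
The Cas13m.6-BsaI vector was digested with BsaI and ligated (T4 ligation) to the annealed products. The CasRx-BpiI plasmid was digested with BpiI and ligated (T4 ligation) to the annealed products. This gave the verification vectors targeting the endogenous genes AQp1 and PTBP1.
2. Transfection of 293T cells and 293T-AQp1 cells with the vectors to be verified
The verification vectors were transfected into 293T cells and 293T-AQp1 cells, respectively. The negative control group was transfected with the CasRx-BpiI plasmid.
3. qPCR measurement of changes in target gene mRNA
The mRNA levels of AQp1 and PTBP1 were measured by qPCR and calculated by the 2^(-ΔΔCt) method. The experiment in this example was repeated three times; the values are the means of the three runs and are shown in Table 10 below.
Table 10. Relative expression of the target mRNA
Group AQp1 mRNA level PTBP1 mRNA level
Negative control 1.00 1.00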
CasRx 0.03 0.59
Cas13m.6 0.01 0.51
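The qPCR results show that Cas13m.6 significantly down-regulates the expression of AQp1 and PTBP1, with an effect slightly better than that of CasRx.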
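Example 9: Identification of key amino acid residues of the Cas13m proteins
In the AQp1 and PTBP1 knockdown experiments, Cas13m.2, Cas13m.3, and Cas13m.6 showed the highest level of knockdown, followed by Cas13m.5. In the exogenous EGFP knockdown experiment, Cas13m.2 and Cas13m.3 showed a higher level of knockdown than Cas13m.1, Cas13m.4, and Cas13m.5.
The crystal structure of PbuCas13b has been reported (Slaymaker, Ian M., et al. "High-resolution structure of Cas13b and biochemical characterization of RNA targeting and cleavage." Cell reports 26.13 (2019): 3741-3751.). That article identifies the amino acid residues of PbuCas13b that interact with the crRNA, as well as the catalytic residues of its HEPN domains.
Because the Cas13m proteins lie relatively close to Cas13b on the phylogenetic tree, the inventors performed a multiple sequence alignment of the Cas13m proteins of this application with PbuCas13b (online MAFFT v7.504, E-INS-i algorithm, all other parameters at their defaults). The results are shown in Figures 9A-9C. At the positions corresponding to the above key residues of PbuCas13b (the residues that interact with the crRNA and the catalytic residues of the HEPN domains), conserved motifs (motifs 1-15) of the highly active Cas13m proteins (Cas13m.2, Cas13m.3, Cas13m.6) were identified. They are written in the commonly used Prosite format and listed in Table 11 below. Motifs 1-15 occur more frequently in Cas13m.2, Cas13m.3, and Cas13m.6; motifs 16-30 are more narrowly specified versions of motifs 1-15.
Table 11. Conserved motifs from the multiple sequence alignment
Figure PCTCN2022129825-appb-000015
Figure PCTCN2022129825-appb-000016
Note: a single-letter code denotes a highly conserved amino acid residue; x denotes any amino acid; [] denotes that the position is occupied by any one of the amino acid codes listed inside the brackets.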
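To make the notation concrete, a Prosite-style pattern of the kind described in the note can be converted into a regular expression and searched for in a protein sequence. The sketch below handles only the subset of the notation used here (single letters, x, x(n), and bracketed alternatives); the full PROSITE syntax has further elements that are not covered. The function name and the toy peptide are hypothetical; the two patterns are motif 1 and motif 15 as recited in claim 8.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Convert a simple Prosite-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = match.groups()
        token = "." if residue == "x" else residue  # x = any amino acid
        parts.append(token + (f"{{{count}}}" if count else ""))
    return "".join(parts)


MOTIF_1 = "L-x(3)-R-N-x-Y-[ST]-H"
MOTIF_15 = "R-N-[SA]-[FA]-x-H-x(2)-Y"

print(prosite_to_regex(MOTIF_1))
print(prosite_to_regex(MOTIF_15))

# Toy peptide (hypothetical) containing one instance of motif 1.
toy = "AALKKKRNAYSHAA"
hit = re.search(prosite_to_regex(MOTIF_1), toy)
print(hit.group(0) if hit else None)
```

A match of this kind only shows that a sequence fits the written pattern; whether the segment sits at the aligned position is a separate question answered by the alignment.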
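The three-dimensional structures of the Cas13m proteins and PbuCas13b were predicted with AlphaFold v2.0, and the proteins were then superimposed with PyMOL V2.5.1.
The three-dimensional structure of PbuCas13b predicted by AlphaFold is very close (RMSD=2.122) to that of the protein in the reported crystal structure of the PbuCas13b-gRNA complex (NDB: 6DTD, https://www.rcsb.org/structure/6dtd). This indicates that the conformation of PbuCas13b does not differ greatly before and after gRNA binding, so a comparison of the three-dimensional structures of Cas13m and PbuCas13b as proteins is meaningful, and the comparison need not be restricted to their Cas13-gRNA complexes.
The positions of motifs 1-15 in the three-dimensional structure of the Cas13m proteins are very similar to the positions of the corresponding sequences in PbuCas13b. Taking Cas13m.6 as an example and superimposing it on PbuCas13b, panels A-N of Figure 10 show the overlap between motifs 1-15 of Cas13m.6 and the corresponding sequences of PbuCas13b.
It can be predicted that motifs 1-13 enable the Cas13m.2, Cas13m.3, and Cas13m.6 proteins to interact with their respective DR sequences, and that motifs 14 and 15 form the catalytic center. Those skilled in the art will appreciate that homologs or mutants of Cas13m.2, Cas13m.3, or Cas13m.6 that contain motifs 1-15 can also be expected to show target nucleic acid binding activity or endonuclease activity, particularly when the amino acid residues outside motifs 1-15 differ from the wild-type sequence only by conservative amino acid substitutions. Such homologs or mutants may have ≥50% sequence identity to the Cas13m.2, Cas13m.3, or Cas13m.6 protein (for example ≥60%, ≥70%, ≥80%, ≥85%, ≥90%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or ≥99.5% sequence identity). Such homologs may also originate from the same kingdom, phylum, class, order, family, genus, or species as the Cas13m.2, Cas13m.3, or Cas13m.6 protein.
Accordingly, provided herein are Cas13 proteins having these shared motifs (as well as conjugates comprising them, nucleic acids encoding these proteins or conjugates, vectors containing these nucleic acids, and methods of using these proteins/nucleic acids).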
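Example 10: Off-target testing
1. Construction of control vectors
The PTBP1 gene was chosen as the target gene for off-target verification in this experiment.
The control shRNAs, shRNA1 and shRNA2, target 21-nt segments taken from the two ends of the target sequence used for Cas13, as follows:
shRNA1 target site: GCCCATCTACATCCAGTTCTC (SEQ ID NO:73)
shRNA2 target site: CAGCCCATCTACATCCAGTTC (SEQ ID NO:74)
The primers used to construct the control vectors are listed in Table 12 below:
Table 12. Primer sequences
Figure PCTCN2022129825-appb-000017
The primers above were annealed in pairs, PTBP1-g3-shRNA-1F/PTBP1-g3-shRNA-1R for shRNA1 and PTBP1-g3-shRNA-2F/PTBP1-g3-shRNA-2R for shRNA2, to give the annealed products.
The vector pAAV-CMV-EGFP was double-digested with BsaI and NotI to give a linearized backbone. The backbone was ligated to the annealed products of shRNA1 and shRNA2, respectively, and transformed into E. coli to give the control vectors shRNA1 and shRNA2 (which express shRNA1 and shRNA2, respectively, under the control of the U6 promoter).
The vector CasRx-blank was constructed by conventional methods. CasRx-blank is derived from the CasRx-BpiI plasmid described above by replacing the sequence encoding the gRNA guide sequence, GGGTCTTCGAGAAGACCT (SEQ ID NO:103), with GATCAACATTAAATGTGAGCGAGT (SEQ ID NO:104) (the encoded gRNA targets LacZ of E. coli). The PTBP1-targeting Cas13m.2, Cas13m.3, Cas13m.5, CasRfg.2, and CasRx plasmids constructed in Example 5 were also used (named the Cas13m.2-PTBP1, Cas13m.3-PTBP1, Cas13m.5-PTBP1, CasRfg.2-PTBP1, and CasRx-PTBP1 plasmids, respectively).
The sequence of the backbone vector pAAV-CMV-EGFP is shown in SEQ ID NO:79.
2. Transfection of 293T cells with the vectors to be verified
Each plasmid to be verified was transfected at 500 ng into 293T cells in 24-well plates.
The transfection procedure was as follows:
1) 293T cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2×10^5 cells in 500 μL per well were seeded in 24-well plates.
2) For each transfection sample, the complexes were prepared as follows:
a. For each well of the 24-well plate to which cells had been added, dilute the plasmid DNA described above in 50 μL of serum-free Opti-MEM I Reduced Serum Medium (Thermo, 25200056) and mix gently;
b. Mix the Lipofectamine 2000 (Thermo, 11668019) gently before use, then for each well dilute 1.8 μL of Lipofectamine 2000 in 50 μL of Opti-MEM I medium. Incubate at room temperature for 5 minutes. Note: proceed to step c within 25 minutes;
c. After the 5-minute incubation, combine the diluted DNA with the diluted Lipofectamine 2000. Mix gently and incubate at room temperature for 20 minutes (the solution may appear cloudy). Note: the complexes are stable for 6 hours at room temperature.
RNA was extracted from the cells 72 h after transfection with the SteadyPure Universal RNA Extraction Kit (AG21017), and the RNA concentration was measured with a micro-volume spectrophotometer. The extracted RNA was sent to a reagent company for RNA sequencing.
3. Off-target analysis
The samples were subjected to paired-end 150 bp (PE150) RNA-Seq. The resulting fastq files were each aligned to the reference genome of the target species with HISAT2 or STAR to give aligned BAM files. Transcript and gene expression levels were obtained with kallisto, RSEM, or HTSeq.
Differential expression between groups was analyzed with DESeq2, limma-voom, and edgeR. Genes satisfying p.adj<0.05, |log2FoldChange|>=0.75, and baseMean>2.5 were taken as differentially expressed genes (DEGs). Table 13 below lists the number of differentially expressed genes in each experimental group relative to the CasRx-blank group:
Table 13. Number of differentially expressed genes
Figure PCTCN2022129825-appb-000018
The data in the table show that, compared with CasRx, shRNA1, and shRNA2, Cas13m.2, Cas13m.3, and Cas13m.5 have fewer potential off-target genes and a smaller effect on the gene expression profile of the cells. This property should give Cas13m.2, Cas13m.3, and Cas13m.5 better safety and lower toxicity when used for disease treatment.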
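The three DEG thresholds stated in this example can be applied to a differential-expression result table as a simple filter. The sketch below assumes the results have already been exported as rows with the fields `padj`, `log2FoldChange`, and `baseMean` (the column names used by DESeq2); the function names and the three example rows are hypothetical and are not data from this application.

```python
def is_deg(padj, log2_fold_change, base_mean):
    """Apply the thresholds: p.adj < 0.05, |log2FC| >= 0.75, baseMean > 2.5."""
    return padj < 0.05 and abs(log2_fold_change) >= 0.75 and base_mean > 2.5


def count_degs(rows):
    """Count rows (dicts with padj, log2FoldChange, baseMean) that pass."""
    return sum(
        1 for row in rows
        if is_deg(row["padj"], row["log2FoldChange"], row["baseMean"])
    )


# Hypothetical example rows, for illustration only.
example_rows = [
    {"gene": "geneA", "padj": 0.001, "log2FoldChange": -1.20, "baseMean": 150.0},
    {"gene": "geneB", "padj": 0.200, "log2FoldChange": -2.00, "baseMean": 300.0},
    {"gene": "geneC", "padj": 0.010, "log2FoldChange": 0.30, "baseMean": 80.0},
]
print(count_degs(example_rows))
```

Only the first row passes all three thresholds; the second fails on adjusted p-value and the third on fold change. Because the comparison `padj < 0.05` is false for a missing value encoded as NaN, such genes are not counted.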
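Example 11: Collateral cleavage testing
1. Preparation of gRNA
gRNAs targeting PTBP1 were transcribed with a T7 in vitro transcription kit (T7 High Yield RNA Transcription Kit, Vazyme, TR101-01).
The sequences of the gRNA molecules obtained by in vitro transcription are as follows: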
Cas13m.2-PTBP1:
5’-GUGGUUGGAGAACUGGAUGUAGAUGGGCUGGUUGUAGAUGACCUCGUUUUGGAGGGGAAACACAAC-3’(SEQ ID NO:80)
Cas13m.3-PTBP1:
5’-GUGGUUGGAGAACUGGAUGUAGAUGGGCUGGUUGUAGAAGCCGUUCAUUCGGGACGGUAUGACAAC-3’(SEQ ID NO:81)
Cas13m.6-PTBP1:
5’-GUGGUUGGAGAACUGGAUGUAGAUGGGCUGGUUGUAGAAGCCUAUCGUUAGGAUAGGUAUGACAAC-3’(SEQ ID NO:82)
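The three gRNAs listed above are each the PTBP1 guide sequence (SEQ ID NO:43) followed directly by the direct repeat of the corresponding Cas13m protein from Table 3, i.e. the guide-DR order used for the Cas13m proteins in Example 6. The sketch below assembles a gRNA in either order; the function name and the `guide_first` flag are hypothetical, and the sequences are copied from the text.

```python
def build_grna(guide, direct_repeat, guide_first=True):
    """Join guide and DR in 5'->3' order: guide-DR (default) or DR-guide."""
    return guide + direct_repeat if guide_first else direct_repeat + guide


PTBP1_GUIDE = "GUGGUUGGAGAACUGGAUGUAGAUGGGCUG"         # SEQ ID NO:43
DR_CAS13M2 = "GUUGUAGAUGACCUCGUUUUGGAGGGGAAACACAAC"    # SEQ ID NO:16
DR_CAS13M3 = "GUUGUAGAAGCCGUUCAUUCGGGACGGUAUGACAAC"    # SEQ ID NO:17

grna_m2 = build_grna(PTBP1_GUIDE, DR_CAS13M2)
grna_m3 = build_grna(PTBP1_GUIDE, DR_CAS13M3)
print(grna_m2)
print(grna_m3)

# The reversed (DR-guide) arrangement, as in the "-r" constructs of Example 6.
print(build_grna(PTBP1_GUIDE, DR_CAS13M2, guide_first=False))
```

The first two printed strings reproduce SEQ ID NO:80 and SEQ ID NO:81 as listed above.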
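2. Measurement of collateral cleavage
RNaseAlert is an RNA substrate labeled with a fluorescent reporter (fluorophore) at one end and a quencher at the other. The physical proximity of the quencher suppresses the fluorescence of the fluorophore to a very low level. When an RNase is present, however, the RNA substrate is cleaved and the fluorophore and quencher separate. The fluorophore then emits a green fluorescence signal at 520 nm when excited with 490 nm light.
If a Cas13 protein has collateral activity (i.e. non-specific RNA cleavage activity activated by the target RNA), the RNaseAlert substrate is also cleaved and emits a detectable green fluorescence signal.
The main equipment and materials were as follows:
Figure PCTCN2022129825-appb-000019
Kit, IDT, 11-02-01-02
RNase Inhibitor, Murine, NEB, M0314L
Figure PCTCN2022129825-appb-000020
96-well solid black polystyrene microplate, Corning, 3915
Microplate reader, BioTek, SLXFA.
The following RNaseAlert collateral cleavage reaction was prepared:
Figure PCTCN2022129825-appb-000021
Note: RNA was extracted from 293T cells according to the instructions of the SteadyPure Universal RNA Extraction Kit. The Cas13m recombinant proteins purified in Example 2 were used. No Cas13 protein or gRNA was added to the reactions of the RNase group (positive control with added RNase) or the blank group (blank control).
The reactions were incubated at 37°C for 1 h, and fluorescence at 520 nm was read on the microplate reader every 30 min.
The results are shown in Table 14 below and in Figure 11:
Table 14. Collateral cleavage test results
Figure PCTCN2022129825-appb-000022
The relative fluorescence intensity of Cas13m.2, Cas13m.3, and Cas13m.6 was below 10 in all cases and did not increase over time; no collateral activity was observed.
It has been reported (Koonin, Eugene V., and Kira S. Makarova. "Evolutionary plasticity and functional versatility of CRISPR systems." PLoS biology 20.1 (2022): e3001481.) that once Cas13 is activated by target recognition, it cleaves RNA indiscriminately and induces dormancy or cell death. The results of this example indicate that the absence of collateral activity in Cas13m.2, Cas13m.3, and Cas13m.6 should give them better safety and lower toxicity when used for disease treatment.
The technical features of the examples described above may be combined in any way. For brevity, not every possible combination of these technical features has been described; however, any combination of these technical features that involves no contradiction should be regarded as falling within the scope of this specification.
The examples described above represent only several embodiments of the invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, all of which fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.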

Claims (42)

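  1. An isolated Cas13 protein, characterized in that the amino acid sequence of the Cas13 protein comprises a sequence having ≥50% sequence identity to the sequence shown in any one of SEQ ID NO:1-7 and SEQ ID NO:60.
  2. The Cas13 protein according to claim 1, characterized in that the Cas13 protein can form a complex with a gRNA.
  3. The Cas13 protein according to claim 2, characterized in that the Cas13 protein can be guided to a target nucleic acid by the gRNA.
  4. The Cas13 protein according to claim 3, characterized in that the Cas13 protein can be guided to a target nucleic acid by the gRNA and can target or modify the target nucleic acid.
  5. The Cas13 protein according to claim 3, characterized in that the target nucleic acid is RNA.
  6. The Cas13 protein according to claim 5, characterized in that the target nucleic acid is PTBP1 mRNA, AQp1 mRNA, VEGFA mRNA, VEGFR1 mRNA, or VEGFR2 mRNA.
  7. The Cas13 protein according to claim 1, characterized in that the Cas13 protein originates from the same kingdom, phylum, class, order, family, genus, or species as a protein whose amino acid sequence comprises the sequence shown in any one of SEQ ID NO:1-SEQ ID NO:7 and SEQ ID NO:60.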
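  8. An isolated Cas13 protein, characterized in that the amino acid sequence of the Cas13 protein comprises the amino acid sequences of the following motifs 1-15:
    Motif 1: L-x(3)-R-N-x-Y-[ST]-H;
    Motif 2: R-x(3)-K-x-[VI]-N-G-F-G-R;
    Motif 3: P-Y-[IV]-T-x(5)-Y-x-[IV]-x(2)-N-x-I-G-L;
    Motif 4: P-x-L-x(2)-D-x(3)-[NK];
    Motif 5: P-x-[AC]-x-L-S-x(2)-[ED]-[LF]-P-A-x(2)-F;
    Motif 6: [LI]-P-x-K-L;
    Motif 7: [KT]-x-[AL]-x(2)-[KVE]-[IL];
    Motif 8: A-[DRK]-x-L-x(2)-[DS]-[MI]-[MV]-x-[FW]-Q-P;
    Motif 9: K-L-T-x(2)-N;
    Motif 10: F-x-[HR]-[AF]-x(5)-[QR];
    Motif 11: I-x-L-P-x-G-[LM]-F-x(3)-I;
    Motif 12: [LI]-I-x(2)-[YWF]-F;
    Motif 13: I-x(3)-I;
    Motif 14: [DN]-[TN]-E-x(2)-[IL]-[KR]-[VR]-Y-[KR]-x-Q-D;
    Motif 15: R-N-[SA]-[FA]-x-H-x(2)-Y;
    wherein A, F, C, U, D, N, E, Q, G, H, L, I, K, O, M, P, R, S, T, V, W, and Y are standard amino acid codes; "x" is any amino acid; the number in parentheses after x gives the number of consecutive x; "[]" encloses alternative amino acid codes, one of which is selected; and "-" is a separator.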
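  9. The Cas13 protein according to claim 8, characterized in that the Cas13 protein comprises the sequence shown in any one of SEQ ID NO:2, SEQ ID NO:3, and SEQ ID NO:60, or a sequence having 50% or more sequence identity to the sequence shown in any one of SEQ ID NO:2, SEQ ID NO:3, and SEQ ID NO:60.
  10. The Cas13 protein according to claim 8, characterized in that any amino acid residue in the amino acid sequence of the Cas13 protein other than the amino acids defined by motifs 1-15 is a conservative amino acid substitution relative to a wild-type sequence, the wild-type sequence including the sequences shown in SEQ ID NO:2, SEQ ID NO:3, and SEQ ID NO:60.
  11. The Cas13 protein according to claim 8, characterized in that the Cas13 protein can form a complex with a gRNA.
  12. The Cas13 protein according to claim 8, characterized in that the Cas13 protein can be guided to a target nucleic acid by a gRNA.
  13. The Cas13 protein according to claim 8, characterized in that the Cas13 protein originates from the same kingdom, phylum, class, order, family, genus, or species as a protein comprising the sequence shown in any one of SEQ ID NO:1-SEQ ID NO:7 and SEQ ID NO:60.
  14. A conjugate, characterized in that it comprises the Cas13 protein according to any one of claims 1-13 and a modifying moiety that modifies the Cas13 protein.
  15. The conjugate according to claim 14, characterized in that the modifying moiety is selected from: a localization tag providing subcellular localization, a tag that facilitates tracking, isolation, or purification, a translation activation domain, a translation repression domain, a nuclease domain, a deaminase domain, a methylase domain, a demethylase domain, and a splicing regulation domain.
  16. The conjugate according to claim 15, characterized in that
    the localization tag providing subcellular localization is selected from: a nuclear localization signal and a nuclear export signal;
    the tag that facilitates tracking, isolation, or purification is selected from: an epitope tag, a fluorescent protein, a HIS tag, a hemagglutinin (HA) tag, a FLAG tag, a Myc tag, a glutathione S-transferase (GST) tag, and a maltose-binding protein (MBP) tag;
    the translation activation domain is selected from: domains of eIF4E and other translation initiation factors, of yeast poly(A)-binding protein, and of GLD2;
    the translation repression domain is selected from: Pumilio proteins, deadenylases, and Argonaute proteins;
    the nuclease domain is selected from: FokI, a PIN endonuclease domain, an NYN domain, the SMR domain from SOT1, and the RNase domain from staphylococcal nuclease;
    the deaminase domain is from a cytidine deaminase or an adenosine deaminase;
    the methylase domain is from an m6A methyltransferase;
    the demethylase domain is from the RNA demethylase ALKBH5;
    the splicing regulation domain is selected from: SRSF1, hnRNP A1, and RBM4.
  17. The conjugate according to claim 14, characterized in that the conjugate does or does not comprise a linker connecting the Cas13 protein and the modifying moiety.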
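  18. A gRNA, characterized in that it can form a complex with the Cas13 protein according to any one of claims 1-13 or the conjugate according to any one of claims 14-17.
  19. The gRNA according to claim 18, characterized in that it comprises a guide sequence and a direct repeat sequence, the guide sequence being complementary to a target nucleic acid and the direct repeat sequence being capable of interacting with the Cas13 protein or the conjugate.
  20. The gRNA according to claim 18, characterized in that the gRNA comprises a guide sequence and a direct repeat sequence, and the secondary structure of the direct repeat sequence comprises, connected in order: a base-paired first stem, a non-complementary bulge, a base-paired second stem, and a non-complementary loop.
  21. The gRNA according to claim 20, characterized in that
    a. the first stem consists of 4-7 base pairs,
    b. one strand of the non-complementary bulge is 2-6 nucleotides long,
    c. the second stem consists of 4-7 base pairs,
    and/or
    d. the sequence of the non-complementary loop is 5-8 nucleotides long.
  22. The gRNA according to claim 18, characterized in that one strand of the first stem is selected from: GUUG, GUUGU, GUUGUA, and GUUGUUA.
  23. The gRNA according to claim 19, characterized in that the direct repeat sequence is selected from any one of SEQ ID NO:15-SEQ ID NO:21 and SEQ ID NO:62, or from sequences having 90% or more sequence identity to the sequence shown in any one of SEQ ID NO:15-SEQ ID NO:21 and SEQ ID NO:62.
  24. The gRNA according to claim 19, characterized in that the gRNA comprises a guide sequence and a direct repeat sequence, and the guide sequence is 10 nt-60 nt long.
  25. The gRNA according to claim 19, characterized in that the target nucleic acid is PTBP1 mRNA, AQp1 mRNA, VEGFA mRNA, VEGFR1 mRNA, or VEGFR2 mRNA.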
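  26. A composition, characterized in that it comprises:
    1) the Cas13 protein according to any one of claims 1-13, the conjugate according to any one of claims 14-17, a nucleic acid encoding the Cas13 protein according to any one of claims 1-13, or a nucleic acid encoding the conjugate according to any one of claims 14-17;
    and
    2) the gRNA according to any one of claims 18-25 or a nucleic acid encoding the gRNA according to any one of claims 18-25.
  27. The composition according to claim 26, characterized in that the gRNA comprises a guide sequence, the guide sequence is complementary to a target nucleic acid, and the target nucleic acid is PTBP1 mRNA or AQp1 mRNA.
  28. A vector, characterized in that the vector comprises:
    1) a nucleotide sequence encoding the Cas13 protein according to any one of claims 1-13 or a nucleotide sequence encoding the conjugate according to any one of claims 14-17;
    and/or
    2) a nucleotide sequence encoding the gRNA according to any one of claims 18-25.
  29. The vector according to claim 28, characterized in that the vector comprises a regulatory element capable of regulating the expression of the nucleotide sequence.
  30. The vector according to claim 29, characterized in that the regulatory element is a promoter.
  31. A delivery composition, characterized in that it comprises a delivery vehicle and at least one selected from:
    the Cas13 protein according to any one of claims 1-13, the conjugate according to any one of claims 14-17, the gRNA according to any one of claims 18-25, the composition according to claim 26 or 27, and the vector according to any one of claims 28-30.
  32. The delivery composition according to claim 31, characterized in that the delivery vehicle is at least one selected from: a delivery particle, a delivery vesicle, and a viral vector.
  33. A cell, characterized in that it comprises at least one of: the Cas13 protein according to any one of claims 1-13, the conjugate according to any one of claims 14-17, the gRNA according to any one of claims 18-25, the composition according to claim 26 or 27, and the vector according to any one of claims 28-30.
  34. The cell according to claim 33, characterized in that the cell is a eukaryotic cell.
  35. A method for targeting or modifying a target nucleic acid, characterized in that at least one selected from the following is delivered to the target nucleic acid: the Cas13 protein according to any one of claims 1-13, the conjugate according to any one of claims 14-17, the gRNA according to any one of claims 18-25, the composition according to claim 26 or 27, the vector according to any one of claims 28-30, and the cell according to claim 33 or 34.
  36. The method according to claim 35, characterized in that the target nucleic acid is derived from an animal cell, a plant cell, or a microbial cell.
  37. The method according to claim 35, characterized in that the target nucleic acid is PTBP1 mRNA, AQp1 mRNA, VEGFA mRNA, VEGFR1 mRNA, or VEGFR2 mRNA.
  38. Use of the Cas13 protein according to any one of claims 1-13, the conjugate according to any one of claims 14-17, the gRNA according to any one of claims 18-25, the composition according to claim 26 or 27, the vector according to any one of claims 28-30, or the cell according to claim 33 or 34 in the preparation of a medicament for diagnosing, preventing, or treating a disease in a subject.
  39. A nucleic acid detection method, characterized in that it comprises the step of allowing the following a and b to form a complex and bind to a target nucleic acid to be detected:
    a. the Cas13 protein according to any one of claims 1-13 or the conjugate according to any one of claims 14-17,
    b. the gRNA according to any one of claims 18-25.
  40. The method according to claim 39, characterized in that it comprises allowing the conjugate according to any one of claims 14-17 and the gRNA according to any one of claims 18-25 to form a complex and bind to the target nucleic acid to be detected;
    the conjugate comprises a detectable label; binding, cleavage, or modification of the target nucleic acid by the complex causes a change in the signal of the detectable label; and the content of the target nucleic acid in the sample to be tested is analyzed by observing the change in the signal of the detectable label.
  41. A method for diagnosing, preventing, or treating a disease, comprising administering to a subject in need thereof an effective amount of the Cas13 protein according to any one of claims 1-13, the conjugate according to any one of claims 14-17, the gRNA according to any one of claims 18-25, the composition according to claim 26 or 27, the vector according to any one of claims 28-30, or the cell according to claim 33 or 34.
  42. A pharmaceutical composition for diagnosing, preventing, or treating a disease, the pharmaceutical composition comprising the Cas13 protein according to any one of claims 1-13, the conjugate according to any one of claims 14-17, the gRNA according to any one of claims 18-25, the composition according to claim 26 or 27, the vector according to any one of claims 28-30, or the cell according to claim 33 or 34.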
PCT/CN2022/129825 2021-11-05 2022-11-04 分离的Cas13蛋白及其应用 Ceased WO2023078384A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP22889405.1A EP4428232A4 (en) 2021-11-05 2022-11-04 Isolated Cas13 protein and its use
CN202280073436.6A CN118510892A (zh) 2021-11-05 2022-11-04 分离的Cas13蛋白及其应用
US18/652,819 US20240279630A1 (en) 2021-11-05 2024-05-02 Isolated cas13 protein and use thereof

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111306149.9 2021-11-05
CN202111306149 2021-11-05
CN202210518826.1A CN116083398B (zh) 2021-11-05 2022-05-13 分离的Cas13蛋白及其应用
CN202210518826.1 2022-05-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/652,819 Continuation US20240279630A1 (en) 2021-11-05 2024-05-02 Isolated cas13 protein and use thereof

Publications (1)

Publication Number Publication Date
WO2023078384A1 true WO2023078384A1 (zh) 2023-05-11

Family

ID=86199736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129825 Ceased WO2023078384A1 (zh) 2021-11-05 2022-11-04 分离的Cas13蛋白及其应用

Country Status (4)

Country Link
US (1) US20240279630A1 (zh)
EP (1) EP4428232A4 (zh)
CN (2) CN116083398B (zh)
WO (1) WO2023078384A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024078645A3 (zh) * 2023-12-28 2024-11-07 广州瑞风生物科技有限公司 Cas蛋白及其应用
WO2025061113A1 (zh) * 2023-09-19 2025-03-27 广州瑞风生物科技有限公司 Cas12蛋白及其应用

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114075572B (zh) * 2021-11-16 2024-07-26 珠海中科先进技术研究院有限公司 一种与门基因电路及获取该与门基因电路的方法
WO2024230837A1 (zh) * 2023-05-11 2024-11-14 广州瑞风生物科技有限公司 指导rna、基因编辑系统及其应用
WO2025103411A1 (zh) * 2023-11-14 2025-05-22 广州瑞风生物科技有限公司 Cas蛋白、CRISPR-Cas系统及其应用
CN117230043B (zh) * 2023-11-14 2024-04-12 广州瑞风生物科技有限公司 Cas13蛋白及其应用
CN119661651B (zh) * 2024-12-31 2025-10-14 南京农业大学 抗菌肽cr19及其应用

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112410377A (zh) 2020-02-28 2021-02-26 中国科学院脑科学与智能技术卓越创新中心 VI-E型和VI-F型CRISPR-Cas系统及用途
CN113348245A (zh) * 2018-07-31 2021-09-03 博德研究所 新型crispr酶和系统

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2018251801B2 (en) * 2017-04-12 2024-11-07 Massachusetts Institute Of Technology Novel type VI CRISPR orthologs and systems
KR102761791B1 (ko) * 2017-06-26 2025-02-05 더 브로드 인스티튜트, 인코퍼레이티드 표적화된 핵산 편집을 위한 crispr/cas-아데닌 데아미나아제 기반 조성물, 시스템 및 방법
EP3931313A2 (en) * 2019-01-04 2022-01-05 Mammoth Biosciences, Inc. Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection
CN115175996A (zh) * 2019-09-20 2022-10-11 博德研究所 新颖vi型crispr酶和系统
WO2021175230A1 (zh) * 2020-03-02 2021-09-10 中国科学院分子细胞科学卓越创新中心 一种分离的Cas13蛋白
CN113337488B (zh) * 2020-03-02 2024-04-19 中国科学院分子细胞科学卓越创新中心 一种分离的Cas13蛋白
CN112522271B (zh) * 2020-12-23 2023-06-27 广州瑞风生物科技有限公司 一种sgRNA及其应用

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RICHTER MROSSELLO-MORA R: "Shifting the genomic gold standard for the prokaryotic species definition", PROC NATL ACAD SCI U S A., vol. 106, no. 45, 10 November 2009 (2009-11-10), pages 19126 - 31
See also references of EP4428232A4
SLAYMAKER, IAN M. ET AL.: "High-resolution structure of Cas13b and biochemical characterization of RNAtargeting and cleavage", CELL REPORTS, vol. 26, no. 13, 2019, pages 3741 - 3751, XP055754078, DOI: 10.1016/j.celrep.2019.02.094

Also Published As

Publication number Publication date
CN116083398B (zh) 2024-01-05
EP4428232A4 (en) 2026-01-28
US20240279630A1 (en) 2024-08-22
CN116083398A (zh) 2023-05-09
CN118510892A (zh) 2024-08-16
EP4428232A1 (en) 2024-09-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889405

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280073436.6

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2022889405

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022889405

Country of ref document: EP

Effective date: 20240605