WO2025051140A1 - Novel CRISPR-Casσ enzymes and systems - Google Patents

Novel CRISPR-Casσ enzymes and systems

Info

Publication number
WO2025051140A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
nucleic acid
protein
cell
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/116773
Other languages
English (en)
French (fr)
Inventor
赖锦盛
杨志佳
余镁霞
陈建
辛蓓蓓
滕云鹏
周跃恒
赵海铭
宋伟彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University
Priority to EP24861983.5A (EP4631979A1)
Priority to US19/011,407 (US20250179534A1)
Publication of WO2025051140A1
Legal status: Pending

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/86Viral vectors
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14Hydrolases (3)
    • C12N9/16Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813Hybridisation assays
    • C12Q1/6816Hybridisation assays characterised by the detection means
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • C07K2319/01Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • C07K2319/70Fusion polypeptide containing domain for protein-protein interaction
    • C07K2319/71Fusion polypeptide containing domain for protein-protein interaction containing domain for transcriptional activation, e.g. VP16
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00Structure or type of the nucleic acid
    • C12N2310/10Type of nucleic acid
    • C12N2310/20Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • The present invention relates to the field of nucleic acid editing, in particular to the technical field of clustered regularly interspaced short palindromic repeats (CRISPR). Specifically, the present invention relates to Cas effector proteins, fusion proteins comprising such proteins, and nucleic acid molecules encoding them. The present invention also relates to complexes and compositions for nucleic acid editing (e.g., gene or genome editing), comprising the proteins or fusion proteins of the present invention, or nucleic acid molecules encoding them. The present invention also relates to methods for nucleic acid editing (e.g., gene or genome editing) that use the proteins or fusion proteins of the present invention.
  • CRISPR/Cas technology is a widely used gene editing technology in which an RNA guides specific binding to a target sequence in the genome and the DNA is cut to produce a double-strand break; the cell's own non-homologous end joining or homologous recombination repair is then exploited for site-specific gene editing.
  • the CRISPR/Cas9 system is the most commonly used Type II CRISPR system, which recognizes the PAM motif of 3’-NGG and performs blunt-end cutting on the target sequence.
  • the CRISPR/Cas Type V system is a class of CRISPR systems newly discovered in the past two years. It recognizes a 5’-TTN PAM motif and performs sticky-end cutting on the target sequence; examples include Cpf1, C2c1, CasX, and CasY.
  • the different CRISPR/Cas currently available have different advantages and disadvantages.
  • Cas9, C2c1, and CasX all require two RNAs to form the guide RNA, while Cpf1 requires only one guide RNA and can be used for multiplex gene editing.
  • CasX has a size of 980 amino acids, while the common Cas9, C2c1, CasY, and Cpf1 are usually around 1,300 amino acids in size.
  • the PAM sequences of Cas9, Cpf1, CasX, and CasY are relatively complex and diverse, while C2c1 recognizes the rigorous 5’-TTN, so its target site is easier to predict than other systems, thereby reducing potential off-target effects.
  • the inventors of this application unexpectedly discovered a new type of RNA-guided nuclease. Based on this discovery, the inventors developed a new CRISPR/Cas system and a gene editing method based on the system.
  • the present invention provides a protein having the amino acid sequence shown in any one of SEQ ID NOs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, or an ortholog, homolog, variant or functional fragment thereof; wherein the ortholog, homolog, variant or functional fragment substantially retains the biological function of the sequence from which it is derived.
  • the biological functions of the above sequences include, but are not limited to, the activity of binding to guide RNA, the endonuclease activity, and the activity of binding to and cutting a specific site of the target sequence under the guidance of the guide RNA.
  • the ortholog, homolog or variant has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared to the sequence from which it is derived.
  • the ortholog, homolog or variant has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared to the sequence shown in any one of SEQ ID NOs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, and substantially retains the biological function of the sequence from which it is derived (e.g., the activity of binding to guide RNA, endonuclease activity, and the activity of binding to and cutting a specific site of the target sequence under the guidance of the guide RNA).
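The identity thresholds listed above can be illustrated with a minimal Python sketch. The text does not say which alignment method or which denominator is used to compute sequence identity, so this sketch assumes the two sequences are already aligned (gaps written as `-`) and counts identical columns over all columns that are not gap-only. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
# Identity thresholds enumerated in the text (percent).
IDENTITY_THRESHOLDS = (60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identity = identical columns / columns that are not gap-only.
    Other definitions (e.g. dividing by the shorter sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # ignore columns that are gaps in both sequences
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def highest_threshold_met(identity: float):
    """Return the largest listed threshold that `identity` reaches, or None."""
    met = [t for t in IDENTITY_THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # Toy sequences for illustration only.
    identity = percent_identity("MKTAYIAK", "MKTAYLAK")
    print(identity, highest_threshold_met(identity))
```

Meeting an identity threshold is only half of the stated condition: the variant must also substantially retain the biological function, which has to be established experimentally and cannot be checked by sequence comparison.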
  • the protein is an effector protein in a CRISPR/Cas system.
  • the protein of the invention comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • the protein of the present invention can be derivatized, for example, connected to another molecule (e.g., another polypeptide or protein).
  • the derivatization (e.g., labeling) of the protein will not adversely affect the desired activity of the protein (e.g., activity binding to a guide RNA, endonuclease activity, activity binding to and cutting a specific site of a target sequence under the guidance of a guide RNA). Therefore, the protein of the present invention is also intended to include such derivatized forms.
  • the protein of the present invention can be functionally linked (by chemical coupling, gene fusion, non-covalent linkage or other means) to one or more other molecular groups, such as another protein or polypeptide, a detection reagent, a pharmaceutical agent, etc.
  • the protein of the present invention can be linked to other functional units.
  • it can be linked to a nuclear localization signal (NLS) sequence to improve the ability of the protein of the present invention to enter the nucleus.
  • it can be linked to a targeting moiety to make the protein of the present invention targeted.
  • it can be linked to a detectable label to facilitate detection of the protein of the present invention.
  • it can be linked to an epitope tag to facilitate expression, detection, tracing and/or purification of the protein of the present invention.
  • the present invention provides a conjugate comprising a protein as described above and a modifying moiety.
  • the modifying moiety is selected from another protein or polypeptide, a detectable label, or any combination thereof.
  • the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or a SID domain), a nuclease domain (e.g., Fok1), a domain having an activity selected from the following: nucleotide deaminase, methylase activity, demethylase, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof.
  • the conjugates of the invention comprise one or more NLS sequences, such as the NLS of the large T antigen of the SV40 virus.
  • the NLS sequence is as shown in SEQ ID NO:53.
  • the NLS sequence is located at, near, or close to the end (e.g., N-terminus or C-terminus) of the protein of the invention.
  • the NLS sequence is located at, near, or close to the C-terminus of the protein of the invention.
  • the conjugate of the present invention comprises an epitope tag.
  • epitope tags are well known to those skilled in the art, and examples thereof include but are not limited to His, V5, FLAG, HA, Myc, VSV-G, Trx, etc., and those skilled in the art know how to select a suitable epitope tag according to the desired purpose (e.g., purification, detection or tracing).
  • the conjugates of the present invention comprise a reporter gene sequence.
  • reporter genes are well known to those skilled in the art, and examples thereof include, but are not limited to, GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP, and the like.
  • the conjugates of the present invention comprise a domain capable of binding to a DNA molecule or an intracellular molecule, such as maltose binding protein (MBP), the DNA binding domain (DBD) of Lex A, the DBD of GAL4, etc.
  • the conjugates of the invention comprise a detectable label, such as a fluorescent dye, e.g., FITC or DAPI.
  • the protein of the invention is coupled, conjugated or fused to the modifying moiety, optionally via a linker.
  • the modifying moiety is directly linked to the N-terminus or C-terminus of the protein of the invention.
  • the modifying moiety is linked to the N-terminus or C-terminus of the protein of the present invention via a linker.
  • linkers are well known in the art, and examples thereof include but are not limited to linkers comprising one or more (e.g., 1, 2, 3, 4 or 5) amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA or Ava), or PEG, etc.
  • the present invention provides a fusion protein comprising the protein of the present invention and another protein or polypeptide.
  • the fusion protein of the present invention comprises one or more NLS sequences, such as the NLS of the large T antigen of the SV40 virus.
  • the NLS sequence is located at, near, or close to the end (e.g., N-terminus or C-terminus) of the protein of the present invention.
  • the NLS sequence is shown in SEQ ID NO:53.
  • the NLS sequence is located at, near, or close to the C-terminus of the protein of the present invention.
  • the fusion proteins of the invention comprise an epitope tag.
  • the fusion proteins of the invention comprise a reporter gene sequence.
  • the fusion proteins of the invention comprise a domain capable of binding to a DNA molecule or an intracellular molecule.
  • the protein of the invention is fused to the additional protein or polypeptide, optionally via a linker.
  • the additional protein or polypeptide is directly linked to the N-terminus or C-terminus of the protein of the invention.
  • the additional protein or polypeptide is linked to the N-terminus or C-terminus of the protein of the invention via a linker.
  • the fusion protein of the present invention has an amino acid sequence as shown in any one of SEQ ID NO:54-66.
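The arrangement described above (effector protein, optional linker, NLS at the N-terminus or C-terminus) can be sketched as simple string assembly. This is an illustration under stated assumptions: `PKKKRKV` is the commonly cited SV40 large T antigen NLS, but the text does not show SEQ ID NO:53, so it may differ; the protein string and the single-serine linker are placeholders, and `fuse_with_nls` is a hypothetical helper, not part of the application.

```python
# Commonly cited SV40 large T antigen NLS (assumption: may not match SEQ ID NO:53).
SV40_NLS = "PKKKRKV"


def fuse_with_nls(protein: str, nls: str = SV40_NLS, linker: str = "", terminus: str = "C") -> str:
    """Return the amino acid sequence of a protein fused to an NLS.

    terminus="C" appends linker + NLS after the protein;
    terminus="N" places NLS + linker before the protein.
    An empty linker models a direct fusion.
    """
    if terminus == "C":
        return protein + linker + nls
    if terminus == "N":
        return nls + linker + protein
    raise ValueError("terminus must be 'N' or 'C'")


if __name__ == "__main__":
    effector = "MAAA"  # placeholder, not a real Cas sequence
    print(fuse_with_nls(effector, linker="S"))    # C-terminal NLS via a one-residue linker
    print(fuse_with_nls(effector, terminus="N"))  # direct N-terminal fusion
```

A real construct would also have to handle the start codon and reading frame at the DNA level, which this protein-level sketch ignores.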
  • the protein, conjugate or fusion protein of the present invention is not limited by the production method, for example, it can be produced by genetic engineering methods (recombinant technology) or by chemical synthesis methods.
  • the present invention provides an isolated nucleic acid molecule comprising or consisting of a sequence selected from the following:
  • sequence described in any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived, and the biological function of the sequence refers to the activity as a direct repeat sequence in the CRISPR-Cas system.
  • the isolated nucleic acid molecule is a direct repeat sequence in a CRISPR-Cas system.
  • the nucleic acid molecule comprises or consists of a sequence selected from the group consisting of:
  • the isolated nucleic acid molecule is RNA.
  • the present invention provides a complex comprising:
  • a protein component selected from the group consisting of a protein, a conjugate or a fusion protein of the present invention, and any combination thereof;
  • a nucleic acid component comprising, in the 5' to 3' direction, an isolated nucleic acid molecule as described above and a guide sequence capable of hybridizing to a target sequence;
  • the protein component and the nucleic acid component are combined with each other to form a complex.
  • the guide sequence is linked to the 3' end of the nucleic acid molecule.
  • the guide sequence comprises the complement of the target sequence.
  • the nucleic acid component is a guide RNA in a CRISPR-Cas system.
  • the nucleic acid molecule is RNA.
  • the complex does not comprise a trans-activating crRNA (tracrRNA).
  • the guide sequence is at least 5, at least 10, at least 15, at least 20, at least 25, at least 30 nucleotides in length. In certain embodiments, the guide sequence is 10-30, or 15-25, or 15-22, or 19-25, or 19-22 nucleotides in length.
  • the isolated nucleic acid molecule is 55-70 nucleotides in length, such as 55-65 nucleotides, such as 60-65 nucleotides, such as 62-65 nucleotides, such as 63-64 nucleotides. In some embodiments, the isolated nucleic acid molecule is 15-30 nucleotides in length, such as 15-25 nucleotides, such as 20-25 nucleotides, such as 22-24 nucleotides, such as 23 nucleotides.
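The layout of the nucleic acid component described above (direct repeat at the 5' end, guide sequence linked to its 3' end, guide complementary to the target) can be sketched as follows. This is a minimal illustration, not the applicants' procedure: the direct repeat is a 23-nt placeholder of `N` characters because the actual repeat sequences are not reproduced in this text, the 10-30 nt guide length is one of the ranges stated above, and the function names are hypothetical.

```python
# Map each DNA base to the RNA base that pairs with it.
_DNA_TO_PAIRING_RNA = str.maketrans("ACGT", "UGCA")

# Placeholder direct repeat (assumption): 23 nt, one of the lengths mentioned in the text.
PLACEHOLDER_DIRECT_REPEAT = "N" * 23


def guide_from_target(target_dna: str) -> str:
    """Return the RNA guide (5'->3') that base pairs with a DNA target given 5'->3'."""
    return target_dna.upper().translate(_DNA_TO_PAIRING_RNA)[::-1]


def build_guide_rna(direct_repeat: str, guide: str, min_len: int = 10, max_len: int = 30) -> str:
    """Concatenate direct repeat (5') and guide (3'), enforcing a guide length range."""
    if not min_len <= len(guide) <= max_len:
        raise ValueError(f"guide length {len(guide)} outside {min_len}-{max_len} nt")
    return direct_repeat + guide


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGT"  # toy 20-nt target strand
    guide = guide_from_target(target)
    print(build_guide_rna(PLACEHOLDER_DIRECT_REPEAT, guide))
```

Because no tracrRNA is part of the complex, the single concatenated RNA is the whole nucleic acid component in this sketch.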
  • the present invention provides an isolated nucleic acid molecule comprising:
  • nucleotide sequence described in any one of (i)-(iii) is codon optimized for expression in prokaryotes. In certain embodiments, the nucleotide sequence described in any one of (i)-(iii) is codon optimized for expression in eukaryotic cells.
  • the present invention further provides a vector comprising the isolated nucleic acid molecule as described in the sixth aspect.
  • the vector of the present invention may be a cloning vector or an expression vector.
  • the vector of the present invention is, for example, a plasmid, a cosmid, a phage, etc.
  • the vector is capable of expressing the protein of the present invention, the fusion protein, the isolated nucleic acid molecule as described in the fourth aspect, or the complex as described in the fifth aspect in a subject (e.g., a mammal, such as a human).
  • the present invention also provides a host cell comprising an isolated nucleic acid molecule or vector as described above.
  • host cells include, but are not limited to, prokaryotic cells such as Escherichia coli cells, and eukaryotic cells such as yeast cells, insect cells, plant cells and animal cells (such as mammalian cells, such as mouse cells, human cells, etc.).
  • the cell of the present invention can also be a cell line, such as 293T cells.
  • compositions and carrier compositions are provided.
  • the present invention further provides a composition comprising:
  • a first component selected from the group consisting of: a protein, a conjugate, a fusion protein, a nucleotide sequence encoding the protein or fusion protein of the present invention, and any combination thereof;
  • a second component which is a nucleotide sequence comprising a guide RNA, or a nucleotide sequence encoding the nucleotide sequence comprising a guide RNA;
  • the guide RNA comprises a direct repeat sequence and a guide sequence from the 5' to the 3' direction, and the guide sequence can hybridize with the target sequence;
  • the guide RNA is capable of forming a complex with the protein, conjugate or fusion protein described in (i).
  • the direct repeat sequence is an isolated nucleic acid molecule as defined in the fourth aspect.
  • the guide sequence is linked to the 3' end of the direct repeat sequence. In certain embodiments, the guide sequence comprises a complementary sequence to the target sequence.
  • the composition does not comprise trans-activating crRNA (tracrRNA).
  • the composition is non-naturally occurring or modified. In certain embodiments, at least one component in the composition is non-naturally occurring or modified. In certain embodiments, the first component is non-naturally occurring or modified; and/or, the second component is non-naturally occurring or modified.
  • when the target sequence is DNA, the target sequence is located at the 3' end of the protospacer adjacent motif (PAM), and the PAM has the sequence 5'-NTN, wherein each N is independently selected from A, G, T or C; for example, the sequence of the PAM is ATG, GTG, ATA and/or GTA.
  • when the target sequence is RNA, the target sequence has no PAM restriction.
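The PAM rule for DNA targets (a 5'-NTN motif with the target immediately at its 3' end) can be turned into a simple site scan. The sketch below is an illustration under assumptions: it scans only the strand it is given, reads the PAM on that same strand, and uses a caller-chosen protospacer length; the text does not specify the strand convention, and `find_ntn_sites` is a hypothetical helper.

```python
# Example PAMs named in the text; all of them fit the 5'-NTN pattern.
EXAMPLE_PAMS = {"ATG", "GTG", "ATA", "GTA"}


def find_ntn_sites(dna: str, protospacer_len: int = 20, allowed_pams=None):
    """List (pam_start, pam, protospacer) for each 5'-NTN PAM followed by a full-length target.

    The protospacer is the stretch immediately 3' of the PAM on the scanned strand.
    If `allowed_pams` is given, only those PAM sequences are reported.
    """
    dna = dna.upper()
    sites = []
    for i in range(len(dna) - 3 - protospacer_len + 1):
        pam = dna[i:i + 3]
        if pam[1] != "T" or any(base not in "ACGT" for base in pam):
            continue  # middle base must be T; N means any of A, C, G, T
        if allowed_pams is not None and pam not in allowed_pams:
            continue
        sites.append((i, pam, dna[i + 3:i + 3 + protospacer_len]))
    return sites


if __name__ == "__main__":
    # Toy sequence; a short protospacer length keeps the output readable.
    for site in find_ntn_sites("GGATGACGTACC", protospacer_len=5):
        print(site)
```

A fuller search would also scan the reverse complement strand; for RNA targets no PAM filter applies, so every window of the chosen length is a candidate.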
  • the target sequence is a DNA or RNA sequence from a prokaryotic cell or a eukaryotic cell. In certain embodiments, the target sequence is a non-naturally occurring DNA or RNA sequence.
  • the target sequence is present in a cell. In certain embodiments, the target sequence is present in a cell nucleus or in the cytoplasm (e.g., an organelle). In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a prokaryotic cell.
  • the protein is connected to one or more NLS sequences.
  • the conjugate or fusion protein comprises one or more NLS sequences.
  • the NLS sequence is connected to the N-terminus or C-terminus of the protein.
  • the NLS sequence is fused to the N-terminus or C-terminus of the protein.
  • the present invention also provides a composition comprising one or more vectors, wherein the one or more vectors comprise:
  • a first nucleic acid comprising a nucleotide sequence encoding a protein or fusion protein of the present invention; optionally, the first nucleic acid is operably linked to a first regulatory element;
  • a second nucleic acid comprising a nucleotide sequence encoding a guide RNA; optionally the second nucleic acid is operably linked to a second regulatory element;
  • the first nucleic acid and the second nucleic acid are present on the same or different vectors
  • the guide RNA comprises a direct repeat sequence and a guide sequence from the 5' to the 3' direction, and the guide sequence can hybridize with the target sequence;
  • the guide RNA is capable of forming a complex with the effector protein or fusion protein described in (i).
  • the direct repeat sequence is an isolated nucleic acid molecule as defined in the fourth aspect.
  • the guide sequence is linked to the 3' end of the direct repeat sequence. In certain embodiments, the guide sequence comprises a complementary sequence to the target sequence.
  • the composition does not comprise trans-activating crRNA (tracrRNA).
  • the composition is non-naturally occurring or modified. In certain embodiments, at least one component in the composition is non-naturally occurring or modified.
  • the first regulatory element is a promoter, such as an inducible promoter.
  • the second regulatory element is a promoter, such as an inducible promoter.
  • when the target sequence is DNA, the target sequence is located at the 3' end of the protospacer adjacent motif (PAM), and the PAM has the sequence 5'-NTN, wherein each N is independently selected from A, G, T or C; for example, the sequence of the PAM is ATG, GTG, ATA and/or GTA.
  • when the target sequence is RNA, the target sequence has no PAM restriction.
  • the target sequence is a DNA or RNA sequence from a prokaryotic cell or a eukaryotic cell. In certain embodiments, the target sequence is a non-naturally occurring DNA or RNA sequence.
  • the target sequence is present in a cell. In certain embodiments, the target sequence is present in a cell nucleus or in the cytoplasm (e.g., an organelle). In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a prokaryotic cell.
  • the protein is connected to one or more NLS sequences.
  • the conjugate or fusion protein comprises one or more NLS sequences.
  • the NLS sequence is connected to the N-terminus or C-terminus of the protein.
  • the NLS sequence is fused to the N-terminus or C-terminus of the protein.
  • a type of vector is a plasmid, which refers to a circular double-stranded DNA loop in which other DNA fragments can be inserted, for example, by standard molecular cloning techniques.
  • another type of vector is a viral vector, in which virally derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., a retrovirus, a replication-defective retrovirus, an adenovirus, a replication-defective adenovirus, or an adeno-associated virus).
  • the viral vector also includes a polynucleotide carried by a virus for transfection into a host cell.
  • Some vectors can replicate autonomously in the host cell into which they are introduced.
  • Other vectors (e.g., non-episomal mammalian vectors)
  • some vectors can direct the expression of the genes that are operably connected to them.
  • Such a vector is referred to as an "expression vector" herein.
  • the expression vectors commonly used in recombinant DNA technology are typically in plasmid form.
  • Recombinant expression vectors may contain the nucleic acid molecules of the present invention in a form suitable for nucleic acid expression in a host cell, which means that these recombinant expression vectors contain one or more regulatory elements selected based on the host cell to be used for expression, which are operably linked to the nucleic acid sequence to be expressed.
  • the protein, conjugate, fusion protein of the present invention, the isolated nucleic acid molecule as described in the fourth aspect, the complex of the present invention, the isolated nucleic acid molecule as described in the sixth aspect, the vector as described in the seventh aspect, and the composition as described in the ninth aspect and the tenth aspect can be delivered by any method known in the art.
  • Such methods include, but are not limited to, electroporation, lipofection, nucleofection, microinjection, sonoporation, gene gun, calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, magnetofection, impalefection, optical transfection, agent-enhanced nucleic acid uptake, and delivery via liposomes, immunoliposomes, viral particles, artificial virions, etc.
  • the present invention provides a delivery composition comprising a delivery vehicle and one or more selected from the following: a protein, a conjugate, a fusion protein of the present invention, an isolated nucleic acid molecule as described in the fourth aspect, a complex of the present invention, an isolated nucleic acid molecule as described in the sixth aspect, a vector as described in the seventh aspect, and a composition as described in the ninth aspect and the tenth aspect.
  • the delivery vehicle is a particle.
  • the delivery vehicle is selected from a lipid particle, a sugar particle, a metal particle, a protein particle, a liposome, an exosome, a microvesicle, a gene gun, or a viral vector (e.g., a replication-defective retrovirus, a lentivirus, an adenovirus, or an adeno-associated virus).
  • the present invention provides a kit comprising one or more of the components described above.
  • the kit comprises one or more components selected from the following: a protein, a conjugate, a fusion protein of the present invention, an isolated nucleic acid molecule as described in the fourth aspect, a complex of the present invention, an isolated nucleic acid molecule as described in the sixth aspect, a vector as described in the seventh aspect, a composition as described in the ninth aspect and the tenth aspect.
  • the kit of the present invention comprises the composition as described in the ninth aspect. In certain embodiments, the kit further comprises instructions for using the composition.
  • the kit of the present invention comprises the composition as described in the tenth aspect. In certain embodiments, the kit further comprises instructions for using the composition.
  • kits of the present invention may be provided in any suitable container.
  • the kit further comprises one or more buffers.
  • the buffer can be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer and combinations thereof.
  • the buffer is alkaline. In certain embodiments, the buffer has a pH from about 7 to about 10.
  • the kit further comprises one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and regulatory elements.
  • the kit comprises a homologous recombination template polynucleotide.
  • the present invention provides a method for modifying a target gene, comprising: contacting the complex described in the fifth aspect, the composition described in the ninth aspect, or the composition described in the tenth aspect with the target gene, or delivering it to a cell containing the target gene, wherein the target sequence is present in the target gene.
  • the method is used to modify a target gene in vitro or ex vivo. In certain embodiments, the method is not a method of treating a human or animal by therapy. In certain embodiments, the method does not include the step of modifying human germline genetic characteristics.
  • the target gene is present in a cell.
  • the cell is a prokaryotic cell.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell.
  • the cell is a human cell.
  • the cell is selected from a non-human primate, a bovine, a porcine, or a rodent cell.
  • the cell is a non-mammalian eukaryotic cell, such as a poultry or fish cell.
  • the cell is a plant cell, such as a cell of a cultivated plant (such as cassava, corn, sorghum, wheat or rice), algae, tree or vegetable.
  • the target gene is present in a nucleic acid molecule (e.g., a plasmid) in vitro. In certain embodiments, the target gene is present in a plasmid.
  • the method results in a target sequence break (e.g., a DNA double-strand break or an RNA single-strand break).
  • the break results in a reduction in transcription of a target gene.
  • the method further comprises: contacting the editing template (e.g., exogenous nucleic acid) with the target gene, or delivering it to a cell comprising the target gene.
  • the method repairs the broken target gene by homologous recombination with an editing template (e.g., exogenous nucleic acid), wherein the repair results in a mutation, including insertion, deletion, or substitution of one or more nucleotides of the target gene.
  • the mutation results in one or more amino acid changes in a protein expressed from a gene comprising the target sequence.
  • the modification further comprises inserting an editing template (e.g., an exogenous nucleic acid) into the break.
  • the protein, protein truncation, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
  • the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, and viral vectors (such as replication-defective retroviruses, lentiviruses, adenoviruses, or adeno-associated viruses).
  • the methods are used to modify a cell, cell line, or organism by altering one or more target sequences in a target gene or a nucleic acid molecule encoding a target gene product.
  • the present invention provides a method for changing the expression of a gene product, comprising: contacting the complex as described in the fifth aspect, the composition as described in the ninth aspect, or the composition as described in the tenth aspect with a nucleic acid molecule encoding the gene product, or delivering it to a cell containing the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule.
  • the method is used to change the expression of a gene product in vitro or ex vivo. In certain embodiments, the method is not a method for treating a human or animal by therapy. In certain embodiments, the method does not include the step of modifying human germline genetic characteristics.
  • the nucleic acid molecule is present in a cell.
  • the cell is a prokaryotic cell.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell.
  • the cell is a human cell.
  • the cell is selected from a non-human primate, a bovine, a porcine, or a rodent cell.
  • the cell is a non-mammalian eukaryotic cell, such as a poultry or fish cell.
  • the cell is a plant cell, such as a cell of a cultivated plant (such as cassava, corn, sorghum, wheat or rice), algae, tree or vegetable.
  • the expression of the gene product is altered (e.g., enhanced or reduced). In certain embodiments, the expression of the gene product is enhanced. In certain embodiments, the expression of the gene product is reduced.
  • the gene product is a protein.
  • the protein, protein truncation, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
  • the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, and viral vectors (such as replication-defective retroviruses, lentiviruses, adenoviruses, or adeno-associated viruses).
  • the methods are used to modify a cell, cell line, or organism by altering one or more target sequences in a target gene or a nucleic acid molecule encoding a target gene product.
  • the present invention relates to use of the protein as described in the first aspect, the conjugate as described in the second aspect, the fusion protein as described in the third aspect, the isolated nucleic acid molecule as described in the fourth aspect, the complex as described in the fifth aspect, the isolated nucleic acid molecule as described in the sixth aspect, the vector as described in the seventh aspect, the composition as described in the ninth aspect, the composition as described in the tenth aspect, or the kit of the present invention in preparing a preparation for nucleic acid editing (e.g., in vitro or ex vivo nucleic acid editing).
  • the nucleic acid to be edited is present in a cell.
  • the cell is a prokaryotic cell or a eukaryotic cell.
  • the nucleic acid to be edited is present in an in vitro nucleic acid molecule (e.g., a plasmid).
  • the nucleic acid editing includes gene or genome editing, such as modifying a gene, knocking out a gene, changing the expression of a gene product, repairing a mutation, and/or inserting a polynucleotide.
  • the gene or genome editing does not include a step of modifying human germline genetic characteristics.
  • the use is not a method for treating a human or animal by therapy.
  • the use further comprises repairing the edited target sequence by homologous recombination with an exogenous template polynucleotide, wherein the repair can produce a mutation of the target sequence, including insertion, deletion or substitution of one or more nucleotides.
  • the present invention relates to use of the protein as described in the first aspect, the conjugate as described in the second aspect, the fusion protein as described in the third aspect, the isolated nucleic acid molecule as described in the fourth aspect, the complex as described in the fifth aspect, the isolated nucleic acid molecule as described in the sixth aspect, the vector as described in the seventh aspect, the composition as described in the ninth aspect, the composition as described in the tenth aspect, or the kit of the present invention in preparing a preparation, wherein the preparation is used for: (i) in vitro or ex vivo DNA detection; (ii) editing a target sequence in a target locus to modify an organism or non-human organism (e.g., a prokaryotic organism).
  • the present invention also provides a method for detecting whether a target nucleic acid is present in a sample, comprising the following steps:
  • the DNA probe emits a detectable signal after being cleaved
  • the sequence of the target nucleic acid is obtained from the genome of a tumor cell.
  • the target nucleic acid is single-stranded or double-stranded.
  • the sequence of the target nucleic acid is a DNA or RNA sequence from a prokaryotic or eukaryotic cell; or, the sequence of the target nucleic acid is a non-naturally occurring DNA or RNA sequence.
  • the detectable signal is determined by one or more methods selected from the group consisting of imaging-based detection, sensor-based detection, color detection, gold nanoparticle-based detection, fluorescence polarization, colloidal phase transition/dispersion, electrochemical detection, and semiconductor-based sensing.
  • the modification introduced into the cell by the method of the present invention can alter the cell and its progeny so as to improve the production of their biological products (such as antibodies, starch, ethanol or other desired cell outputs). In some cases, the modification introduced into the cell by the method of the present invention can cause the cell and its progeny to include changes that alter the biological product produced.
  • the present invention also relates to a cell obtained by the method as described above, or a progeny thereof, wherein the cell contains a modification not present in its wild type form.
  • the present invention also relates to cell products of the cells as described above or their progeny.
  • the present invention also relates to an in vitro, ex vivo or in vivo cell or cell line or their progeny, which comprises: the protein as described in the first aspect, the conjugate as described in the second aspect, the fusion protein as described in the third aspect, the isolated nucleic acid molecule as described in the fourth aspect, the complex as described in the fifth aspect, the isolated nucleic acid molecule as described in the sixth aspect, the vector as described in the seventh aspect, the composition as described in the ninth aspect, the composition as described in the tenth aspect, the kit or delivery composition of the present invention.
  • the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is a non-human mammalian cell, such as a cell of a non-human primate, a cow, a sheep, a pig, a dog, a monkey, a rabbit, or a rodent (such as a rat or a mouse). In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of poultry (such as a chicken), a fish, or a shellfish (such as a clam or a shrimp).
  • the cell is a stem cell or a stem cell line.
  • As used herein, the terms "CRISPR-CRISPR-associated (Cas) system", "CRISPR-Cas system", and "CRISPR system" are used interchangeably and refer to transcripts or other elements associated with the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of the Cas genes.
  • Such transcripts or other elements may include sequences encoding Cas effector proteins and guide RNAs including CRISPR RNA (crRNA), as well as trans-activating crRNA (tracrRNA) sequences contained in the CRISPR-Cas9 system, or other sequences or transcripts from the CRISPR locus.
  • As used herein, the terms "Cas effector protein" and "Cas effector enzyme" are used interchangeably and refer to any protein greater than 800 amino acids in length that is present in the CRISPR-Cas system. In some cases, such proteins refer to proteins identified from the Cas locus.
  • a guide RNA may comprise, or consist essentially of, or consist of a direct repeat sequence and a guide sequence (also referred to as a spacer in the context of an endogenous CRISPR system).
  • a guide sequence is any polynucleotide sequence that has sufficient complementarity to a target sequence to hybridize with the target sequence and guide the specific binding of a CRISPR/Cas complex to the target sequence.
  • when optimally aligned, the degree of complementarity between a guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining optimal alignment is within the capabilities of a person of ordinary skill in the art. For example, there are publicly and commercially available alignment algorithms and programs such as, but not limited to, ClustalW, Smith-Waterman in MATLAB, Bowtie, Geneious, Biopython, and SeqMan.
  • the guide sequence is at least 5, at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides in length. In some cases, the guide sequence is no more than 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length. In certain embodiments, the guide sequence is 10-30, or 15-25, or 15-22, or 19-25, or 19-22 nucleotides in length.
  • the direct repeat sequence is at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, or at least 70 nucleotides in length.
  • CRISPR/Cas complex refers to a ribonucleoprotein complex formed by the binding of a guide RNA or mature crRNA to a Cas protein, which comprises a guide sequence that hybridizes to a target sequence and binds to the Cas protein.
  • the ribonucleoprotein complex is capable of recognizing and cleaving a polynucleotide that can hybridize to the guide RNA or mature crRNA.
  • a "target sequence" refers to a polynucleotide that a guide sequence is designed to target, such as a sequence complementary to the guide sequence, wherein hybridization between the target sequence and the guide sequence promotes the formation of a CRISPR/Cas complex. Complete complementarity is not required, as long as there is enough complementarity to cause hybridization and promote the formation of a CRISPR/Cas complex.
  • the target sequence can comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located in the nucleus or cytoplasm of a cell.
  • the target sequence may be located in an organelle of a eukaryotic cell, such as a mitochondrion or a chloroplast.
  • a sequence or template that can be used to recombine into a target locus containing the target sequence is referred to as an "editing template” or “editing polynucleotide” or “editing sequence”.
  • the editing template is an exogenous nucleic acid.
  • the recombination is homologous recombination.
  • the expression "target sequence" or "target polynucleotide" can refer to any polynucleotide that is endogenous or exogenous to a cell (e.g., a eukaryotic cell).
  • the target polynucleotide can be a polynucleotide present in the nucleus of a eukaryotic cell.
  • the target polynucleotide can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA).
  • As used herein, the term "protospacer adjacent motif" (PAM) refers to a specific motif sequence, adjacent to the target sequence, that is recognized by the Cas protein.
  • the motif sequence refers to the PAM sequence.
  • the term "wild type" has the meaning generally understood by those skilled in the art: it refers to the typical form of an organism, strain, gene, or characteristic as it exists in nature, as distinguished from mutant or variant forms, which can be isolated from a source in nature and has not been intentionally modified by man.
  • As used herein, the terms "non-naturally occurring" and "engineered" are used interchangeably and indicate human intervention. When these terms are used to describe a nucleic acid molecule or polypeptide, they mean that the nucleic acid molecule or polypeptide is at least substantially free from at least one other component with which it is associated in nature or as found in nature.
  • an "orthologue" of a protein as described herein refers to a protein of a different species that performs the same or a similar function as the protein of which it is an orthologue.
  • identity is used to refer to the matching of sequences between two polypeptides or between two nucleic acids.
  • when a position in both sequences being compared is occupied by the same base or amino acid monomer subunit (e.g., a position in each of the two DNA molecules is occupied by adenine, or a position in each of the two polypeptides is occupied by lysine), the molecules are identical at that position.
  • the "percent identity" between two sequences is a function of the number of matching positions shared by the two sequences divided by the number of positions compared × 100. For example, if 6 out of 10 positions in two sequences match, then the two sequences have 60% identity.
  • the DNA sequences CTGACT and CAGGTT share 50% identity (3 out of a total of 6 positions match).
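The percent identity calculation defined above can be sketched in a few lines of Python. This is a minimal illustration that assumes the two sequences are already aligned and of equal length, with no gap handling; the function name `percent_identity` is hypothetical and is not part of the source.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: no gaps; every position is compared one-to-one.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions occupied by the same base or amino acid.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# The example from the text: 3 of 6 positions match.
print(percent_identity("CTGACT", "CAGGTT"))  # 50.0
```

For sequences of different lengths, an alignment step has to come first, because the result depends on which positions are paired.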
  • two sequences are compared when they are aligned to produce maximum identity.
  • Such an alignment can be achieved by using, for example, the method of Needleman et al. (1970) J. Mol. Biol. 48: 443-453, which can be conveniently performed by a computer program such as the Align program (DNAstar, Inc.).
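As a rough illustration of the global alignment approach of Needleman et al., the sketch below fills a dynamic-programming score matrix and traces back one optimal alignment. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; they are not the parameters of the Align program or of any other tool named in the text.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Percent identity can then be computed over the aligned columns. Real tools use substitution matrices and affine gap penalties, so their alignments may differ from this sketch.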
  • the percent identity between two amino acid sequences can also be determined using the algorithm of E. Meyers and W. Miller (Comput.
  • the term "vector” refers to a nucleic acid delivery vehicle into which a polynucleotide can be inserted.
  • when a vector can express the protein encoded by the inserted polynucleotide, the vector is called an expression vector.
  • the vector can be introduced into a host cell by transformation, transduction or transfection so that the genetic material elements it carries are expressed in the host cell.
  • Vectors are well known to those skilled in the art, and include but are not limited to: plasmids; phagemids; cosmids; artificial chromosomes, such as yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC) or P1-derived artificial chromosomes (PAC); bacteriophages, such as lambda phage or M13 phage; and animal viruses.
  • Animal viruses that can be used as vectors include but are not limited to retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpes viruses (such as herpes simplex viruses), poxviruses, baculoviruses, papillomaviruses, and papovaviruses (such as SV40).
  • a vector can contain a variety of elements that control expression, including but not limited to promoter sequences, transcription initiation sequences, enhancer sequences, selection elements and reporter genes.
  • the vector may also contain a replication initiation site.
  • the term "host cell" refers to cells into which a vector can be introduced, including but not limited to prokaryotic cells such as Escherichia coli or Bacillus subtilis, fungal cells such as yeast cells or Aspergillus, insect cells such as S2 Drosophila cells or Sf9, or animal cells such as fibroblasts, CHO cells, COS cells, NS0 cells, HeLa cells, BHK cells, HEK 293 cells or human cells.
  • a vector can be introduced into a host cell to thereby produce a transcript, protein, or peptide, including a protein, fusion protein, isolated nucleic acid molecule, etc. as described herein (e.g., a CRISPR transcript, such as a nucleic acid transcript, protein, or enzyme).
  • regulatory element is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences), which are described in detail in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, California (1990).
  • regulatory elements include those sequences that direct constitutive expression of a nucleotide sequence in many types of host cells and those sequences that direct the nucleotide sequence to be expressed only in certain host cells (e.g., tissue-specific regulatory sequences).
  • Tissue-specific promoters can primarily direct expression in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, a specific organ (e.g., liver, pancreas), or a special cell type (e.g., lymphocytes).
  • regulatory elements can also direct expression in a timing-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue or cell type specific.
  • the term "regulatory element" encompasses enhancer elements such as WPRE; CMV enhancer; R-U5' fragment in the LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988); SV40 enhancer; and intron sequences between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), pp. 1527-31, 1981).
  • promoter has a meaning well known to those skilled in the art, and refers to a non-coding nucleotide sequence located upstream of a gene that can initiate expression of a downstream gene.
  • a constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of a gene product in a cell under most or all physiological conditions of the cell.
  • An inducible promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell essentially only when an inducer corresponding to the promoter is present in the cell.
  • a tissue-specific promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of a gene product in the cell essentially only when the cell is a cell of the tissue type corresponding to the promoter.
  • operably linked is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
  • complementarity refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence by means of traditional Watson-Crick or other non-traditional types.
  • the percentage of complementarity represents the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 are 50%, 60%, 70%, 80%, 90%, and 100% complementary).
  • “Complete complementarity” means that all consecutive residues of a nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in a second nucleic acid sequence.
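The percentage of complementarity described above can be illustrated with a short sketch. It makes several simplifying assumptions that are not in the source: both strands are written 5' to 3' and have the same length, they pair antiparallel with no bulges or gaps, and only Watson-Crick pairs (A-T, A-U, G-C) are counted. The names are hypothetical.

```python
# Watson-Crick pairs only; non-traditional pairings are ignored in this sketch.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of residues in strand_a that pair with strand_b.

    Both strands are given 5'->3'; strand_b is read in reverse so that the
    two strands are compared in antiparallel orientation.
    """
    if len(strand_a) != len(strand_b) or not strand_a:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        (x, y) in WATSON_CRICK
        for x, y in zip(strand_a.upper(), reversed(strand_b.upper()))
    )
    return 100.0 * paired / len(strand_a)


print(percent_complementarity("AAAAACCCCC", "GGGGGTTTTT"))  # 100.0
print(percent_complementarity("AAAAACCCCC", "GGGGGAAAAA"))  # 50.0
```

The second call corresponds to the "5 out of 10 are 50%" case in the text: only the five C residues find a G partner.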
  • stringent conditions for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes to the target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are typically sequence-dependent and vary depending on many factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in Tijssen (1993), Laboratory Techniques In Biochemistry And Molecular Biology-Hybridization With Nucleic Acid Probes, Part I, Chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assay", Elsevier, New York.
  • hybridization refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. Hydrogen bonding can occur by means of Watson-Crick base pairing, Hoogsteen binding or in any other sequence-specific manner.
  • the complex may include two strands forming a duplex, three or more strands forming a multi-stranded complex, a single self-hybridizing strand or any combination of these.
  • Hybridization reactions can constitute a step in a more extensive process (such as the initiation of PCR, or the cleavage of a polynucleotide by an enzyme). A sequence that can hybridize with a given sequence is referred to as the "complement" of the given sequence.
  • linker refers to a linear polypeptide formed by connecting multiple amino acid residues through peptide bonds.
  • the linker of the present invention can be an artificially synthesized amino acid sequence, or a naturally occurring polypeptide sequence, such as a polypeptide having a hinge region function.
  • Such linker polypeptides are well known in the art (see, for example, Holliger, P. et al. (1993) Proc. Natl. Acad. Sci. USA 90: 6444-6448; Poljak, R. J. et al. (1994) Structure 2: 1121-1123).
  • the term "subject” includes, but is not limited to, various animals, such as mammals, such as bovines, equines, ovines, swine, canines, felines, lagomorphs, rodents (e.g., mice or rats), non-human primates (e.g., macaques or cynomolgus monkeys) or humans.
  • the subject (e.g., a human) suffers from a disorder (e.g., a disorder caused by a disease-related gene defect).
  • the Cas protein and system of the present invention have significant advantages.
  • the Cas effector protein of the present invention is smaller in molecular size than the Cas9, C2c1, CasY and Cpf1 proteins, so its transfection efficiency is better than that of those proteins, and the delivery efficiency in eukaryotic cells can be improved.
  • viral vectors such as AAV vectors, etc.
  • it can be used for delivery to eukaryotic cells (such as mammalian cells, human cells, mouse cells, etc.), and can be applied to research and/or clinical applications.
  • the Cas effector protein of the present invention can cleave DNA in eukaryotic organisms and, compared with FnCpf1, whose PAM has been reported to be 5'-TTN, the Cas protein of the present invention also has a wider range of PAM recognition sites, which is 4 times larger than that of Cas9 or Cas12a.
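To make the link between PAM permissiveness and the number of candidate target sites concrete, the sketch below counts PAM occurrences on one strand of a DNA sequence. The 5'-TTN pattern is the FnCpf1 PAM mentioned in the text; the shorter `TN` pattern is purely hypothetical and is not the PAM of the protein described here. The example sequence and all function names are likewise assumptions for illustration.

```python
import re

# IUPAC nucleotide codes used to expand a PAM into a regular expression.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pam_sites(sequence: str, pam: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) PAM match.

    Only the given strand is scanned; the reverse strand is ignored here.
    """
    pattern = "".join(IUPAC[code] for code in pam.upper())
    # A lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


seq = "TTTACGTTG"
print(find_pam_sites(seq, "TTN"))  # [0, 1, 6]
print(find_pam_sites(seq, "TN"))   # [0, 1, 2, 6, 7]
```

A less restrictive motif matches at more positions, which is the sense in which a wider PAM recognition range enlarges the set of targetable sites.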
  • FIG. 1 shows the PAM structure and analysis results in Example 3.
  • FIG. 2 shows the results of verifying in vitro cleavage activity for the PAM in Example 3.
  • FIG. 3 shows the in vivo verification results for the PAM in Escherichia coli in Example 3.
  • FIG. 4 shows the editing activity detection results in human cells in Example 4.
  • the experiments and methods described in the embodiments are basically carried out according to conventional methods well known in the art and described in various references.
  • the conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA used in the present invention can be found in Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel et al., eds., (1987)); METHODS IN ENZYMOLOGY) series (Academic Press): PCR 2: A PRACTICAL APPROACH (M.J.
  • LB liquid medium: 10 g tryptone, 5 g yeast extract, and 10 g NaCl, made up to 1 L and sterilized. If antibiotics are needed, add them after the medium has cooled, to a final concentration of 50 µg/mL.
  • Chloroform/isoamyl alcohol: add 10 mL of isoamyl alcohol to 240 mL of chloroform and mix well.
  • RNP buffer: 100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl2, 100 µg/mL BSA, pH 7.9.
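The recipes above are stated per batch, so scaling them to another volume is simple arithmetic. The sketch below scales the LB recipe and computes how much antibiotic stock gives the stated 50 µg/mL final concentration. The 250 mL batch size and the 50 mg/mL stock concentration are assumptions for illustration, as are the function names.

```python
# LB recipe from the text, in grams per litre.
LB_GRAMS_PER_LITER = {"tryptone": 10.0, "yeast extract": 5.0, "NaCl": 10.0}


def scale_recipe(grams_per_liter: dict, volume_ml: float) -> dict:
    """Grams of each component needed for volume_ml of medium."""
    return {name: g * volume_ml / 1000.0 for name, g in grams_per_liter.items()}


def stock_volume_ul(final_ug_per_ml: float, volume_ml: float, stock_mg_per_ml: float) -> float:
    """Microlitres of stock needed (C1*V1 = C2*V2).

    1 mg/mL equals 1 ug/uL, so ug needed divided by mg/mL gives uL.
    """
    return final_ug_per_ml * volume_ml / stock_mg_per_ml


print(scale_recipe(LB_GRAMS_PER_LITER, 250))
# {'tryptone': 2.5, 'yeast extract': 1.25, 'NaCl': 2.5}
print(stock_volume_ul(50, 250, 50))  # 250.0 (uL of an assumed 50 mg/mL stock)
print(240 / 10)  # 24.0, i.e. chloroform:isoamyl alcohol is 24:1 by volume
```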
  • Prokaryotic expression vectors pUC19 and pACYCDuet-1 were purchased from Beijing Quanshijin Biotechnology Co., Ltd.
  • Escherichia coli competent cells TSC-E03 were purchased from Beijing Qingke Biotechnology Co., Ltd.
  • Example 1: Obtaining the Casσ gene and Casσ guide RNA
  • Annotation of CRISPR loci and genes: Prodigal was used to annotate the microbial genome and metagenomic data from the NCBI and JGI databases to obtain all proteins, and Piler-CR was used to annotate the CRISPR loci. All parameters were default.
  • Protein filtering: Redundancy among the annotated proteins was removed based on sequence identity, discarding proteins with completely identical sequences.
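The redundancy-removal step can be sketched as an exact-match filter. This is a minimal illustration, assuming the annotated proteins are held in a dictionary of identifier to sequence and that the first occurrence of each sequence is the one kept; neither detail is stated in the source, and the function name is hypothetical.

```python
def remove_identical_proteins(proteins: dict[str, str]) -> dict[str, str]:
    """Keep one representative per exact amino acid sequence.

    Sequences are compared case-insensitively, ignoring a trailing stop
    symbol ('*'). The first identifier seen for a sequence is retained.
    """
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for protein_id, sequence in proteins.items():
        key = sequence.upper().rstrip("*")
        if key not in seen:
            seen.add(key)
            kept[protein_id] = sequence
    return kept


annotated = {"p1": "MKTAYIAK", "p2": "MKTAYIAK*", "p3": "MSLLTEVE"}
print(list(remove_identical_proteins(annotated)))  # ['p1', 'p3']
```

Because only a set of sequences already seen is stored, the filter runs in a single pass over the input.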
  • Each CRISPR locus was extended by 10 kb upstream and downstream to identify the non-redundant proteins in the region adjacent to the CRISPR locus.
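Selecting the proteins in the CRISPR-adjacent region amounts to an interval query. The sketch below extends a locus by 10 kb on each side and returns the genes on the same contig that overlap the window. Counting any overlap (rather than requiring a gene to lie fully inside the window) is an assumption, as are the 1-based inclusive coordinates and all names.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    name: str


def flank_window(locus: Feature, flank: int = 10_000) -> tuple[int, int]:
    """Locus coordinates extended by `flank` bp on each side (clipped at 1)."""
    return max(1, locus.start - flank), locus.end + flank


def proteins_near(locus: Feature, genes: list[Feature], flank: int = 10_000) -> list[str]:
    """Names of genes on the same contig that overlap the extended window."""
    low, high = flank_window(locus, flank)
    return [
        g.name for g in genes
        if g.contig == locus.contig and g.start <= high and g.end >= low
    ]


crispr = Feature("contig1", 15_000, 16_000, "CRISPR_1")
genes = [
    Feature("contig1", 4_000, 4_900, "g1"),    # ends before the window
    Feature("contig1", 4_500, 6_000, "g2"),    # overlaps the upstream edge
    Feature("contig1", 26_000, 27_000, "g3"),  # touches the downstream edge
    Feature("contig2", 15_000, 15_500, "g4"),  # different contig
]
print(proteins_near(crispr, genes))  # ['g2', 'g3']
```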
  • Clustering of CRISPR-related proteins: BLASTP was used to perform all-against-all pairwise alignment of the non-redundant CRISPR-related proteins, and alignments were output using an E-value cutoff of 1E-10. MCL was then used to cluster the BLASTP output and identify CRISPR-related protein families.
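The hand-off from BLASTP to MCL can be sketched as a filter over tabular BLAST output. The example assumes the standard 12-column tabular format (`-outfmt 6`, with the E-value in column 11), treats the cutoff as inclusive, drops self-hits, and uses a capped negative log10 of the E-value as the edge weight. These choices and the function names are assumptions, not details from the source. The resulting label/label/weight lines are in the form MCL reads with its `--abc` option.

```python
import math


def filter_blast_hits(lines, max_evalue: float = 1e-10):
    """Return (query, subject, evalue) for hits passing the E-value cutoff."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject:
            continue  # self-hits carry no clustering information
        if evalue <= max_evalue:
            edges.append((query, subject, evalue))
    return edges


def to_abc(edges, cap: float = 200.0):
    """Format edges as 'label<TAB>label<TAB>weight' lines."""
    rows = []
    for query, subject, evalue in edges:
        # An E-value of 0 has no logarithm, so it receives the cap.
        weight = cap if evalue == 0.0 else min(cap, -math.log10(evalue))
        rows.append(f"{query}\t{subject}\t{weight:.2f}")
    return rows


blast_lines = [
    "p1\tp2\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-50\t400",
    "p1\tp1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "p1\tp3\t30.0\t80\t56\t2\t5\t84\t9\t88\t1e-3\t40",
]
print(to_abc(filter_blast_hits(blast_lines)))  # ['p1\tp2\t50.00']
```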
  • CRISPR-enriched protein families: BLASTP was used to align the proteins of each CRISPR-related protein family against a non-redundant protein database lacking CRISPR-related proteins, and alignments were output using an E-value cutoff of 1E-10. If the homologous protein found in a non-CRISPR-related protein database is less than 100%, it means that the protein of this family is enriched in the CRISPR region. In this way, the CRISPR-enriched protein families were identified.
  • CRISPR-enriched protein families were annotated using the Pfam database, NR database, and Cas proteins collected from NCBI to obtain new CRISPR/Cas protein families. MAFFT was used to perform multiple sequence alignments on each CRISPR/Cas family protein, and then JPred and HHpred were used to perform conserved domain analysis to identify protein families containing the RuvC domain.
  • Cas ⁇ -1 to Cas ⁇ -13 a new Cas effector protein, which was named Cas ⁇ -1 to Cas ⁇ -13, with protein sequences as shown in SEQ ID NO: 1-13 and nucleotide sequences encoding proteins as shown in SEQ ID NO: 14-26.
  • the prototype direct repeat sequences (repeat sequences contained in pre-crRNA) corresponding to Cas ⁇ -1 to Cas ⁇ -13 are shown in SEQ ID NO: 27-39.
  • The CRISPR/Casσ sequence fragments were synthesized by Beijing Tsingke Biotechnology Co., Ltd., constructed into the protein expression vector pET-30a(+), and confirmed by first-generation sequencing. Based on the sequencing results, the recombinant plasmids pET-30a+CRISPR/Casσ are described as follows:
  • the recombinant plasmid pET-30a+CRISPR/Casσ-1 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 67.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2742 are the nucleotide sequence of Casσ-1
  • positions 2743 to 2802 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-2 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 68.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2901 are the nucleotide sequence of Casσ-2
  • positions 2902 to 2961 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-3 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 69.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2700 are the nucleotide sequence of Casσ-3
  • positions 2701 to 2856 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-4 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 70.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 1977 are the nucleotide sequence of Casσ-4
  • positions 1978 to 2037 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-5 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 71.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2877 are the nucleotide sequence of Casσ-5
  • positions 2878 to 2937 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-6 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 72.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2796 are the nucleotide sequence of Casσ-6
  • positions 2797 to 2856 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-7 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 73.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2901 are the nucleotide sequence of Casσ-7
  • positions 2902 to 2961 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-8 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 74.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2784 are the nucleotide sequence of Casσ-8
  • positions 2785 to 2844 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-9 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 75.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2757 are the nucleotide sequence of Casσ-9
  • positions 2758 to 2817 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-10 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 76.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2559 are the nucleotide sequence of Casσ-10
  • positions 2560 to 2619 are the nucleoplasmin NLS signal peptide.
  • the recombinant plasmid pET-30a+CRISPR/Casσ-11 contains an expression cassette, and the expression cassette sequence is shown in SEQ ID NO: 77.
  • positions 1 to 27 are the nucleotide sequence of SV40-NLS
  • positions 28 to 96 are the nucleotide sequence of 3×FLAG
  • positions 97 to 2958 are the nucleotide sequence of Casσ-11
  • positions 2959 to 3018 are the nucleoplasmin NLS signal peptide.
  • Supernatant B was purified on a nickel column from GE (see the column instructions for the purification steps), and the Casσ-1 to Casσ-13 proteins were then quantified with a protein quantification kit from Thermo Fisher Scientific.
  • The structure of the transcription template is: (1) T7 promoter + prototype direct repeat sequence of Casσ-1 to Casσ-13 (SEQ ID NO: 27-39) + guide sequence (SEQ ID NO: 81).
  • Primer 5.0 software was used to design the primers, ensuring that the forward primer and the reverse primer had at least 18 bp of overlapping sequence.
  • Take a new RNase-free 1.5 mL centrifuge tube, transfer the supernatant from the previous step into it, taking care not to aspirate the gel, add isopropanol equal in volume to the supernatant and one-tenth volume of sodium acetate solution, mix with a pipette tip, and place in a -20 °C freezer for 1 hour or overnight;
  • The eukaryotic expression vector containing the Casσ-1 gene and the PCR product containing the U6 promoter and guide RNA (containing the prototype direct repeat sequence shown in SEQ ID NO: 27 and the eukaryotic editing guide sequence shown in SEQ ID NO: 82) were introduced into human HeLa cells by liposome transfection and cultured at 37 °C and 5% carbon dioxide for 72 hours. DNA was extracted from all cells, and a 700 bp sequence containing the target site was amplified. The PCR product was ligated into the B-simple vector for first-generation sequencing. The sequencing was performed by Thermo Fisher Scientific. The sequencing results were aligned to the AAVS1 gene in the human genome, which showed that Casσ-1 can perform double-stranded DNA editing at the target site, resulting in base deletions (Figure 4).
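As a cross-check on the expression-cassette coordinates listed above, the following minimal sketch computes the length of each segment in nucleotides and codons from the 1-based inclusive positions quoted for pET-30a+CRISPR/Casσ-1. The function names and the dictionary layout are illustrative assumptions and do not come from the application.

```python
# Segment boundaries of the Cas-sigma-1 cassette (SEQ ID NO: 67), copied from
# the list above as 1-based inclusive (start, end) positions.
CASSETTE_1 = {
    "SV40-NLS": (1, 27),
    "3xFLAG": (28, 96),
    "Cas-sigma-1": (97, 2742),
    "nucleoplasmin NLS": (2743, 2802),
}


def segment_lengths(segments):
    """Return {name: (nucleotides, codons)}.

    codons is None when the segment length is not a multiple of three.
    """
    lengths = {}
    for name, (start, end) in segments.items():
        nt = end - start + 1  # inclusive coordinates
        lengths[name] = (nt, nt // 3 if nt % 3 == 0 else None)
    return lengths


def is_contiguous(segments):
    """True if every segment starts right after the previous one ends."""
    spans = list(segments.values())
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(spans, spans[1:]))


if __name__ == "__main__":
    for name, (nt, codons) in segment_lengths(CASSETTE_1).items():
        print(f"{name}: {nt} nt, {codons} codons")
    print("contiguous:", is_contiguous(CASSETTE_1))
```

Run over the other cassettes, the same arithmetic gives a 60-nt nucleoplasmin NLS segment in every listed case except Casσ-3, where positions 2701 to 2856 span 156 nt; that entry may be worth checking against SEQ ID NO: 69.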

Abstract

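The invention relates to Cas effector proteins, fusion proteins comprising such proteins, and nucleic acid molecules encoding them. It also relates to complexes and compositions for nucleic acid editing (e.g., gene or genome editing) that comprise said proteins or fusion proteins, or nucleic acid molecules encoding them, and to methods for nucleic acid editing (e.g., gene or genome editing).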

Description

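Novel CRISPR-Casσ enzymes and systems
Cross-reference to related applications
This application claims priority to Chinese patent application 202311132967.0, filed on September 4, 2023, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of nucleic acid editing, in particular to clustered regularly interspaced short palindromic repeats (CRISPR) technology. Specifically, the invention relates to Cas effector proteins, fusion proteins comprising such proteins, and nucleic acid molecules encoding them. The invention also relates to complexes and compositions for nucleic acid editing (e.g., gene or genome editing) that comprise the proteins or fusion proteins of the invention, or nucleic acid molecules encoding them. The invention further relates to methods for nucleic acid editing (e.g., gene or genome editing) that use the proteins or fusion proteins of the invention.
Background
CRISPR/Cas is a widely used gene editing technology. Guided by RNA, it binds specifically to a target sequence in the genome and cleaves the DNA to produce a double-strand break, which is then repaired by the organism's non-homologous end joining or homologous recombination to achieve site-directed gene editing.
The CRISPR/Cas9 system is the most commonly used type II CRISPR system; it recognizes a 3'-NGG PAM motif and makes a blunt-end cut in the target sequence. The type V CRISPR/Cas systems are a class of CRISPR systems discovered in the last two years; they have a 5'-TTN motif and make a sticky-end cut in the target sequence, examples being Cpf1, C2c1, CasX and CasY. However, the existing CRISPR/Cas systems each have their own advantages and drawbacks. For example, Cas9, C2c1 and CasX all require two RNAs to form the guide RNA, whereas Cpf1 requires only one guide RNA and can be used for multiplex gene editing. CasX is 980 amino acids in size, whereas the common Cas9, C2c1, CasY and Cpf1 are typically about 1300 amino acids. In addition, the PAM sequences of Cas9, Cpf1, CasX and CasY are rather complex and varied, whereas C2c1 recognizes a strict 5'-TTN, so its target sites are easier to predict than those of other systems, which reduces potential off-target effects.
In short, since all currently available CRISPR/Cas systems are limited by certain shortcomings, developing a more robust new CRISPR/Cas system with good all-round performance is of great significance for the development of biotechnology.
Summary of the invention
After extensive experimentation and repeated exploration, the inventors of the present application unexpectedly discovered a novel RNA-guided endonuclease. Based on this discovery, the inventors developed a new CRISPR/Cas system and gene editing methods based on that system.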
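Cas effector protein
Accordingly, in a first aspect, the invention provides a protein having the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, or an ortholog, homolog, variant or functional fragment thereof, wherein the ortholog, homolog, variant or functional fragment substantially retains the biological function of the sequence from which it is derived.
In the present invention, the biological function of the above sequences includes, but is not limited to, the activity of binding to a guide RNA, endonuclease activity, and the activity of binding to and cleaving a specific site of a target sequence under the guidance of a guide RNA.
In certain embodiments, the ortholog, homolog or variant has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence from which it is derived.
In certain embodiments, the ortholog, homolog or variant has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, and substantially retains the biological function of the sequence from which it is derived (e.g., the activity of binding to a guide RNA, endonuclease activity, and the activity of binding to and cleaving a specific site of a target sequence under the guidance of a guide RNA).
In certain embodiments, the protein is an effector protein in a CRISPR/Cas system.
In certain embodiments, the protein of the invention comprises, or consists of, a sequence selected from the following:
(i) the sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13;
(ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 amino acid substitutions, deletions or additions) compared with the sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13; or
(iii) a sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in any one of SEQ ID NO: 1-13.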
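Derivatized proteins
The protein of the invention may be derivatized, for example linked to another molecule (e.g., another polypeptide or protein). In general, derivatization (e.g., labeling) of the protein does not adversely affect its desired activity (e.g., the activity of binding to a guide RNA, endonuclease activity, and the activity of binding to and cleaving a specific site of a target sequence under the guidance of a guide RNA). The protein of the invention is therefore also intended to include such derivatized forms. For example, the protein of the invention may be functionally linked (by chemical coupling, genetic fusion, non-covalent association or otherwise) to one or more other molecular groups, such as another protein or polypeptide, a detection reagent, a pharmaceutical agent, and the like.
In particular, the protein of the invention may be linked to other functional units. For example, it may be linked to a nuclear localization signal (NLS) sequence to improve the ability of the protein to enter the cell nucleus. For example, it may be linked to a targeting moiety so that the protein has targeting capability. For example, it may be linked to a detectable label to facilitate detection of the protein. For example, it may be linked to an epitope tag to facilitate expression, detection, tracing and/or purification of the protein.
Conjugates
Accordingly, in a second aspect, the invention provides a conjugate comprising a protein as described above and a modifying moiety.
In certain embodiments, the modifying moiety is selected from an additional protein or polypeptide, a detectable label, or any combination thereof.
In certain embodiments, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or SID domain), a nuclease domain (e.g., Fok1), a domain having an activity selected from the following: nucleotide deaminase, methylase activity, demethylase, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof.
In certain embodiments, the conjugate of the invention comprises one or more NLS sequences, such as the NLS of the SV40 virus large T antigen. In certain exemplary embodiments, the NLS sequence is as shown in SEQ ID NO: 53. In certain embodiments, the NLS sequence is located at, near or close to a terminus (e.g., the N-terminus or C-terminus) of the protein of the invention. In certain exemplary embodiments, the NLS sequence is located at, near or close to the C-terminus of the protein of the invention.
In certain embodiments, the conjugate of the invention comprises an epitope tag. Such epitope tags are well known to those skilled in the art; examples include, but are not limited to, His, V5, FLAG, HA, Myc, VSV-G and Trx, and those skilled in the art know how to select a suitable epitope tag for the desired purpose (e.g., purification, detection or tracing).
In certain embodiments, the conjugate of the invention comprises a reporter gene sequence. Such reporter genes are well known to those skilled in the art; examples include, but are not limited to, GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP and BFP.
In certain embodiments, the conjugate of the invention comprises a domain capable of binding to a DNA molecule or an intracellular molecule, such as maltose binding protein (MBP), the DNA binding domain (DBD) of Lex A, or the DBD of GAL4.
In certain embodiments, the conjugate of the invention comprises a detectable label, for example a fluorescent dye such as FITC or DAPI.
In certain embodiments, the protein of the invention is coupled, conjugated or fused to the modifying moiety, optionally via a linker.
In certain embodiments, the modifying moiety is linked directly to the N-terminus or C-terminus of the protein of the invention.
In certain embodiments, the modifying moiety is linked to the N-terminus or C-terminus of the protein of the invention via a linker. Such linkers are well known in the art; examples include, but are not limited to, linkers comprising one or more (e.g., 1, 2, 3, 4 or 5) amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA or Ava), or PEG.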
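Fusion proteins
In a third aspect, the invention provides a fusion protein comprising the protein of the invention and an additional protein or polypeptide.
In certain embodiments, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or SID domain), a nuclease domain (e.g., Fok1), a domain having an activity selected from the following: nucleotide deaminase, methylase activity, demethylase, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof.
In certain embodiments, the fusion protein of the invention comprises one or more NLS sequences, such as the NLS of the SV40 virus large T antigen. In certain embodiments, the NLS sequence is located at, near or close to a terminus (e.g., the N-terminus or C-terminus) of the protein of the invention. For example, the NLS sequence is as shown in SEQ ID NO: 53. In certain exemplary embodiments, the NLS sequence is located at, near or close to the C-terminus of the protein of the invention.
In certain embodiments, the fusion protein of the invention comprises an epitope tag.
In certain embodiments, the fusion protein of the invention comprises a reporter gene sequence.
In certain embodiments, the fusion protein of the invention comprises a domain capable of binding to a DNA molecule or an intracellular molecule.
In certain embodiments, the protein of the invention is fused to the additional protein or polypeptide, optionally via a linker.
In certain embodiments, the additional protein or polypeptide is linked directly to the N-terminus or C-terminus of the protein of the invention.
In certain embodiments, the additional protein or polypeptide is linked to the N-terminus or C-terminus of the protein of the invention via a linker.
In certain exemplary embodiments, the fusion protein of the invention has the amino acid sequence shown in any one of SEQ ID NO: 54-66.
The protein, conjugate or fusion protein of the invention is not limited by the way it is produced; for example, it may be produced by genetic engineering (recombinant technology) or by chemical synthesis.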
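Direct repeat sequence
In a fourth aspect, the invention provides an isolated nucleic acid molecule comprising, or consisting of, a sequence selected from the following:
(i) the sequence shown in any one of SEQ ID NO: 27-39;
(ii) a sequence having one or more base substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions) compared with the sequence shown in any one of SEQ ID NO: 27-39;
(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in any one of SEQ ID NO: 27-39;
(iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i)-(iii); or
(v) the complement of the sequence described in any one of (i)-(iii);
and the sequence described in any one of (ii)-(v) substantially retains the biological function of the sequence from which it is derived, the biological function of the sequence being its activity as a direct repeat sequence in a CRISPR-Cas system.
In certain embodiments, the isolated nucleic acid molecule is a direct repeat sequence in a CRISPR-Cas system.
In certain embodiments, the nucleic acid molecule comprises, or consists of, a sequence selected from the following:
(a) the nucleotide sequence shown in any one of SEQ ID NO: 27-39;
(b) a sequence that hybridizes under stringent conditions to the sequence described in (a); or
(c) the complement of the sequence described in (a).
In certain embodiments, the isolated nucleic acid molecule is RNA.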
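CRISPR/Cas complex
In a fifth aspect, the invention provides a complex comprising:
(i) a protein component selected from the protein, conjugate or fusion protein of the invention, and any combination thereof; and
(ii) a nucleic acid component comprising, in the 5' to 3' direction, the isolated nucleic acid molecule described above and a guide sequence capable of hybridizing to a target sequence,
wherein the protein component and the nucleic acid component bind to each other to form the complex.
In certain embodiments, the guide sequence is linked to the 3' end of the nucleic acid molecule.
In certain embodiments, the guide sequence comprises the complement of the target sequence.
In certain embodiments, the nucleic acid component is a guide RNA in a CRISPR-Cas system.
In certain embodiments, the nucleic acid molecule is RNA.
In certain embodiments, the complex does not comprise a trans-acting crRNA (tracrRNA).
In certain embodiments, the guide sequence is at least 5, at least 10, at least 15, at least 20, at least 25, or at least 30 nucleotides in length. In certain embodiments, the guide sequence is 10-30, or 15-25, or 15-22, or 19-25, or 19-22 nucleotides in length.
In certain embodiments, the isolated nucleic acid molecule is 55-70 nucleotides in length, such as 55-65 nucleotides, such as 60-65 nucleotides, such as 62-65 nucleotides, such as 63-64 nucleotides. In certain embodiments, the isolated nucleic acid molecule is 15-30 nucleotides in length, such as 15-25 nucleotides, such as 20-25 nucleotides, such as 22-24 nucleotides, such as 23 nucleotides.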
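Encoding nucleic acids, vectors and host cells
In a sixth aspect, the invention provides an isolated nucleic acid molecule comprising:
(i) a nucleotide sequence encoding the protein or fusion protein of the invention;
(ii) a nucleotide sequence encoding the isolated nucleic acid molecule of the fourth aspect; or
(iii) a nucleotide sequence comprising (i) and (ii).
In certain embodiments, the nucleotide sequence described in any one of (i)-(iii) is codon-optimized for expression in prokaryotic cells. In certain embodiments, the nucleotide sequence described in any one of (i)-(iii) is codon-optimized for expression in eukaryotic cells.
In a seventh aspect, the invention also provides a vector comprising the isolated nucleic acid molecule of the sixth aspect. The vector of the invention may be a cloning vector or an expression vector. In certain embodiments, the vector of the invention is, for example, a plasmid, a cosmid, a phage, and the like. In certain embodiments, the vector is capable of expressing, in a subject (e.g., a mammal, such as a human), the protein or fusion protein of the invention, the isolated nucleic acid molecule of the fourth aspect, or the complex of the fifth aspect.
In an eighth aspect, the invention also provides a host cell comprising the isolated nucleic acid molecule or vector described above. Such host cells include, but are not limited to, prokaryotic cells such as E. coli cells, and eukaryotic cells such as yeast cells, insect cells, plant cells and animal cells (such as mammalian cells, e.g., mouse cells, human cells, etc.). The cell of the invention may also be a cell line, such as 293T cells.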
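Compositions and vector compositions
In a ninth aspect, the invention also provides a composition comprising:
(i) a first component selected from the protein, conjugate or fusion protein of the invention, a nucleotide sequence encoding said protein or fusion protein, and any combination thereof; and
(ii) a second component, which is a nucleotide sequence comprising a guide RNA, or a nucleotide sequence encoding said nucleotide sequence comprising a guide RNA;
wherein the guide RNA comprises, in the 5' to 3' direction, a direct repeat sequence and a guide sequence, the guide sequence being capable of hybridizing to a target sequence;
and the guide RNA is capable of forming a complex with the protein, conjugate or fusion protein described in (i).
In certain embodiments, the direct repeat sequence is an isolated nucleic acid molecule as defined in the fourth aspect.
In certain embodiments, the guide sequence is linked to the 3' end of the direct repeat sequence. In certain embodiments, the guide sequence comprises the complement of the target sequence.
In certain embodiments, the composition does not comprise a trans-acting crRNA (tracrRNA).
In certain embodiments, the composition is non-naturally occurring or modified. In certain embodiments, at least one component of the composition is non-naturally occurring or modified. In certain embodiments, the first component is non-naturally occurring or modified; and/or the second component is non-naturally occurring or modified.
In certain embodiments, when the target sequence is DNA, the target sequence is located at the 3' end of a protospacer adjacent motif (PAM), and the PAM has the sequence 5'-NTN, wherein each N is independently selected from A, G, T or C; for example, the PAM sequence is ATG, ATG, GTG, ATA, ATA, GTA, GTA and/or GTG.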
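The 5'-NTN PAM rule described above can be illustrated with a minimal sketch that scans one DNA strand for an NTN motif and reports the window immediately 3' of it as a candidate target. The function name, the default 20-nt window (chosen from within the 19-22 nt guide lengths mentioned in this document) and the restriction to the given strand are illustrative assumptions; this is not the screening procedure used in the application.

```python
import re


def find_ntn_sites(dna, target_len=20):
    """List (pam_start, pam, downstream_window) for every 5'-NTN-3' motif
    that is followed by at least target_len unambiguous bases.

    pam_start is a 0-based index into dna. Only the strand as given is
    scanned; the reverse complement would need a second call.
    """
    dna = dna.upper()
    # The lookahead makes matches zero-width, so overlapping PAMs are all found.
    pattern = re.compile(r"(?=([ACGT]T[ACGT])([ACGT]{%d}))" % target_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    for start, pam, window in find_ntn_sites("GATGCCCCATTTACGT", target_len=4):
        print(start, pam, window)
```

Because one position in four carries a T on average, almost any sequence yields many candidate sites; ranking or filtering them is outside the scope of this sketch.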
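In certain embodiments, when the target sequence is RNA, the target sequence is not subject to a PAM restriction.
In certain embodiments, the target sequence is a DNA or RNA sequence from a prokaryotic or eukaryotic cell. In certain embodiments, the target sequence is a non-naturally occurring DNA or RNA sequence.
In certain embodiments, the target sequence is present in a cell. In certain embodiments, the target sequence is present in the cell nucleus or in the cytoplasm (e.g., in an organelle). In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a prokaryotic cell.
In certain embodiments, the protein is linked to one or more NLS sequences. In certain embodiments, the conjugate or fusion protein comprises one or more NLS sequences. In certain embodiments, the NLS sequence is linked to the N-terminus or C-terminus of the protein. In certain embodiments, the NLS sequence is fused to the N-terminus or C-terminus of the protein.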
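In a tenth aspect, the invention also provides a composition comprising one or more vectors, the one or more vectors comprising:
(i) a first nucleic acid comprising a nucleotide sequence encoding the protein or fusion protein of the invention, the first nucleic acid optionally being operably linked to a first regulatory element; and
(ii) a second nucleic acid comprising a nucleotide sequence encoding a guide RNA, the second nucleic acid optionally being operably linked to a second regulatory element;
wherein:
the first nucleic acid and the second nucleic acid are present on the same vector or on different vectors;
the guide RNA comprises, in the 5' to 3' direction, a direct repeat sequence and a guide sequence, the guide sequence being capable of hybridizing to a target sequence;
and the guide RNA is capable of forming a complex with the effector protein or fusion protein described in (i).
In certain embodiments, the direct repeat sequence is an isolated nucleic acid molecule as defined in the fourth aspect.
In certain embodiments, the guide sequence is linked to the 3' end of the direct repeat sequence. In certain embodiments, the guide sequence comprises the complement of the target sequence.
In certain embodiments, the composition does not comprise a trans-acting crRNA (tracrRNA).
In certain embodiments, the composition is non-naturally occurring or modified. In certain embodiments, at least one component of the composition is non-naturally occurring or modified.
In certain embodiments, the first regulatory element is a promoter, such as an inducible promoter.
In certain embodiments, the second regulatory element is a promoter, such as an inducible promoter.
In certain embodiments, when the target sequence is DNA, the target sequence is located at the 3' end of a protospacer adjacent motif (PAM), and the PAM has the sequence 5'-NTN, wherein each N is independently selected from A, G, T or C; for example, the PAM sequence is ATG, ATG, GTG, ATA, ATA, GTA, GTA and/or GTG.
In certain embodiments, when the target sequence is RNA, the target sequence is not subject to a PAM restriction.
In certain embodiments, the target sequence is a DNA or RNA sequence from a prokaryotic or eukaryotic cell. In certain embodiments, the target sequence is a non-naturally occurring DNA or RNA sequence.
In certain embodiments, the target sequence is present in a cell. In certain embodiments, the target sequence is present in the cell nucleus or in the cytoplasm (e.g., in an organelle). In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a prokaryotic cell.
In certain embodiments, the protein is linked to one or more NLS sequences. In certain embodiments, the conjugate or fusion protein comprises one or more NLS sequences. In certain embodiments, the NLS sequence is linked to the N-terminus or C-terminus of the protein. In certain embodiments, the NLS sequence is fused to the N-terminus or C-terminus of the protein.
In certain embodiments, one type of vector is a plasmid, which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, for example by standard molecular cloning techniques. Another type of vector is a viral vector, in which virally derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors) are capable of autonomous replication in the host cell into which they are introduced. Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into that cell and are thereby replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as "expression vectors". Common expression vectors used in recombinant DNA techniques are usually in the form of plasmids.
A recombinant expression vector may comprise the nucleic acid molecule of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vector includes one or more regulatory elements, selected on the basis of the host cell to be used for expression, that are operably linked to the nucleic acid sequence to be expressed.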
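Delivery and delivery compositions
The protein, conjugate or fusion protein of the invention, the isolated nucleic acid molecule of the fourth aspect, the complex of the invention, the isolated nucleic acid molecule of the sixth aspect, the vector of the seventh aspect, and the compositions of the ninth and tenth aspects may be delivered by any method known in the art. Such methods include, but are not limited to, electroporation, lipofection, nucleofection, microinjection, sonoporation, biolistics (gene gun), calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, nucleofection, magnetofection, lipofection, impalefection, optical transfection, agent-enhanced uptake of nucleic acids, and delivery via liposomes, immunoliposomes, virosomes, artificial virions, and the like.
Accordingly, in another aspect, the invention provides a delivery composition comprising a delivery vehicle and one or more selected from the following: the protein, conjugate or fusion protein of the invention, the isolated nucleic acid molecule of the fourth aspect, the complex of the invention, the isolated nucleic acid molecule of the sixth aspect, the vector of the seventh aspect, and the compositions of the ninth and tenth aspects.
In certain embodiments, the delivery vehicle is a particle.
In certain embodiments, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns or viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses).
Kits
In another aspect, the invention provides a kit comprising one or more of the components described above. In certain embodiments, the kit comprises one or more components selected from the following: the protein, conjugate or fusion protein of the invention, the isolated nucleic acid molecule of the fourth aspect, the complex of the invention, the isolated nucleic acid molecule of the sixth aspect, the vector of the seventh aspect, and the compositions of the ninth and tenth aspects.
In certain embodiments, the kit of the invention comprises the composition of the ninth aspect. In certain embodiments, the kit further comprises instructions for using the composition.
In certain embodiments, the kit of the invention comprises the composition of the tenth aspect. In certain embodiments, the kit further comprises instructions for using the composition.
In certain embodiments, the components contained in the kit of the invention may be provided in any suitable container.
In certain embodiments, the kit further comprises one or more buffers. The buffer may be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer and combinations thereof. In certain embodiments, the buffer is alkaline. In certain embodiments, the buffer has a pH of from about 7 to about 10.
In certain embodiments, the kit further comprises one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and a regulatory element. In certain embodiments, the kit comprises a homologous recombination template polynucleotide.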
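Methods and uses
In another aspect, the invention provides a method of modifying a target gene, comprising: contacting the complex of the fifth aspect, the composition of the ninth aspect or the composition of the tenth aspect with the target gene, or delivering it into a cell containing the target gene; the target sequence being present in the target gene.
In certain embodiments, the method is used to modify the target gene in vitro or ex vivo. In certain embodiments, the method is not a method of treating a human or animal by therapy. In certain embodiments, the method does not include a step of modifying the germline genetic identity of a human being.
In certain embodiments, the target gene is present in a cell. In certain embodiments, the cell is a prokaryotic cell. In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is selected from non-human primate, bovine, porcine or rodent cells. In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a poultry or fish cell. In certain embodiments, the cell is a plant cell, such as a cell of a cultivated plant (e.g., cassava, maize, sorghum, wheat or rice), an alga, a tree or a vegetable.
In certain embodiments, the target gene is present in a nucleic acid molecule in vitro (e.g., a plasmid). In certain embodiments, the target gene is present in a plasmid.
In certain embodiments, the method results in a break in the target sequence (e.g., a DNA double-strand break or an RNA single-strand break). In certain embodiments, the break results in reduced transcription of the target gene.
In certain embodiments, the method further comprises: contacting an editing template (e.g., an exogenous nucleic acid) with the target gene, or delivering it into a cell containing the target gene. In such embodiments, the method repairs the broken target gene by homologous recombination with the editing template (e.g., an exogenous nucleic acid), wherein the repair results in a mutation comprising an insertion, deletion or substitution of one or more nucleotides of the target gene. In certain embodiments, the mutation results in one or more amino acid changes in the protein expressed from the gene comprising the target sequence.
Accordingly, in certain embodiments, the modification further comprises inserting an editing template (e.g., an exogenous nucleic acid) into the break.
In certain embodiments, the protein, truncated protein, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
In certain embodiments, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes and viral vectors (such as replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses).
In certain embodiments, the method is used to modify a cell, cell line or organism by altering one or more target sequences in a target gene or in a nucleic acid molecule encoding a target gene product.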
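In another aspect, the invention provides a method of altering the expression of a gene product, comprising: contacting the complex of the fifth aspect, the composition of the ninth aspect or the composition of the tenth aspect with a nucleic acid molecule encoding the gene product, or delivering it into a cell containing the nucleic acid molecule, the target sequence being present in the nucleic acid molecule.
In certain embodiments, the method is used to alter the expression of the gene product in vitro or ex vivo. In certain embodiments, the method is not a method of treating a human or animal by therapy. In certain embodiments, the method does not include a step of modifying the germline genetic identity of a human being.
In certain embodiments, the nucleic acid molecule is present in a cell. In certain embodiments, the cell is a prokaryotic cell. In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is selected from non-human primate, bovine, porcine or rodent cells. In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a poultry or fish cell. In certain embodiments, the cell is a plant cell, such as a cell of a cultivated plant (e.g., cassava, maize, sorghum, wheat or rice), an alga, a tree or a vegetable.
In certain embodiments, the nucleic acid molecule is present in a nucleic acid molecule in vitro (e.g., a plasmid). In certain embodiments, the nucleic acid molecule is present in a plasmid.
In certain embodiments, the expression of the gene product is altered (e.g., increased or decreased). In certain embodiments, the expression of the gene product is increased. In certain embodiments, the expression of the gene product is decreased.
In certain embodiments, the gene product is a protein.
In certain embodiments, the protein, truncated protein, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
In certain embodiments, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes and viral vectors (such as replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses).
In certain embodiments, the method is used to modify a cell, cell line or organism by altering one or more target sequences in a target gene or in a nucleic acid molecule encoding a target gene product.
In another aspect, the invention relates to the use of the protein of the first aspect, the conjugate of the second aspect, the fusion protein of the third aspect, the isolated nucleic acid molecule of the fourth aspect, the complex of the fifth aspect, the isolated nucleic acid molecule of the sixth aspect, the vector of the seventh aspect, the composition of the ninth aspect, the composition of the tenth aspect, or the kit of the invention in the preparation of a formulation for nucleic acid editing (e.g., in vitro or ex vivo nucleic acid editing).
In certain embodiments, the nucleic acid to be edited is present in a cell. In certain embodiments, the cell is a prokaryotic or eukaryotic cell. In certain embodiments, the nucleic acid to be edited is present in a nucleic acid molecule in vitro (e.g., a plasmid).
In certain embodiments, the nucleic acid editing includes gene or genome editing, such as modifying a gene, knocking out a gene, altering the expression of a gene product, repairing a mutation, and/or inserting a polynucleotide. In certain embodiments, the gene or genome editing does not include a step of modifying the germline genetic identity of a human being. In certain embodiments, the use is not a method of treating a human or animal by therapy.
In certain embodiments, the use further comprises repairing the edited target sequence by homologous recombination with an exogenous template polynucleotide, wherein the repair may produce a mutation of the target sequence comprising an insertion, deletion or substitution of one or more nucleotides.
In another aspect, the invention relates to the use of the protein of the first aspect, the conjugate of the second aspect, the fusion protein of the third aspect, the isolated nucleic acid molecule of the fourth aspect, the complex of the fifth aspect, the isolated nucleic acid molecule of the sixth aspect, the vector of the seventh aspect, the composition of the ninth aspect, the composition of the tenth aspect, or the kit of the invention in the preparation of a formulation for: (i) in vitro or ex vivo DNA detection; (ii) modifying an organism or a non-human organism (e.g., a prokaryote) by editing a target sequence in a target locus.
In certain embodiments, the formulation is used for the detection of single-stranded or double-stranded DNA (e.g., detection of single-stranded or double-stranded DNA in prokaryotic cells).
In certain embodiments, the DNA detection is used to detect tumors, viruses or bacteria. Without being bound by theory, it is believed that, because Casσ cleaves single-stranded DNA non-specifically after recognizing target DNA, when target DNA (e.g., a tumor-specific marker, or a virus- or bacterium-specific marker) is present, adding a detectable single-stranded DNA and monitoring its non-specific cleavage allows the detection of tumors, of viruses such as Ebola, avian influenza and African swine fever viruses, or of bacteria.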
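In another aspect, the invention also provides a method of detecting whether a target nucleic acid is present in a sample, comprising the following steps:
(1) contacting the sample with a labeled DNA probe and any one of the following: the complex of the invention, the compositions of the ninth and tenth aspects, or the kit of the invention;
wherein the guide sequence contained in the complex, composition or kit is capable of hybridizing to the target nucleic acid, and the DNA probe does not hybridize to the guide sequence;
in certain embodiments, the DNA probe emits a detectable signal after being cleaved;
(2) detecting the detectable signal produced by cleavage of the DNA probe by the protein or truncated protein contained in the complex, composition or kit, thereby determining whether the target nucleic acid is present in the sample.
In certain embodiments, one end of the DNA probe (e.g., the 5' end) is labeled with a fluorophore and the other end (e.g., the 3' end) is labeled with a quencher.
In certain embodiments, the sequence of the target nucleic acid is obtained from a pathogen. In certain embodiments, the pathogen is selected from viruses, bacteria, fungi, protozoa, parasites, or any combination thereof.
In certain embodiments, the sequence of the target nucleic acid is obtained from the genome of a tumor cell.
The target nucleic acid detected in the present application may be DNA or RNA. Accordingly, in certain embodiments, the method further comprises a step of contacting the sample with reagents for reverse transcription. In certain embodiments, the reagents for reverse transcription are selected from a reverse transcriptase, oligonucleotide primers, dNTPs, or any combination thereof.
In certain embodiments, the target nucleic acid is single-stranded or double-stranded. In certain embodiments, the sequence of the target nucleic acid is a DNA or RNA sequence from a prokaryotic or eukaryotic cell; alternatively, the sequence of the target nucleic acid is a non-naturally occurring DNA or RNA sequence.
In certain embodiments, the detectable signal is measured by one or more methods selected from the following: imaging-based detection, sensor-based detection, color detection, gold nanoparticle-based detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection and semiconductor-based sensing.
In certain embodiments, the method further comprises a step of amplifying the target nucleic acid in the sample.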
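Cells and cell progeny
In some cases, the modification introduced into a cell by the method of the invention may cause the cell and its progeny to be altered so as to improve the production of their biological products (such as antibodies, starch, ethanol or other desired cellular outputs). In some cases, the modification introduced into a cell by the method of the invention may cause the cell and its progeny to include an alteration that changes the biological product produced.
Accordingly, in another aspect, the invention also relates to a cell obtained by the method described above, or its progeny, wherein the cell contains a modification that is not present in its wild type.
The invention also relates to a cell product of the cell or its progeny described above.
The invention also relates to an in vitro, ex vivo or in vivo cell or cell line, or progeny thereof, comprising: the protein of the first aspect, the conjugate of the second aspect, the fusion protein of the third aspect, the isolated nucleic acid molecule of the fourth aspect, the complex of the fifth aspect, the isolated nucleic acid molecule of the sixth aspect, the vector of the seventh aspect, the composition of the ninth aspect, the composition of the tenth aspect, or the kit or delivery composition of the invention.
In certain embodiments, the cell is a prokaryotic cell.
In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is a non-human mammalian cell, such as a cell of a non-human primate, cow, sheep, pig, dog, monkey, rabbit or rodent (e.g., rat or mouse). In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of a poultry bird (e.g., chicken), a fish or a crustacean (e.g., clam, shrimp). In certain embodiments, the cell is a plant cell, such as a cell of a monocotyledonous or dicotyledonous plant, or of a cultivated plant or food crop such as cassava, maize, sorghum, soybean, wheat, oat or rice, or of an alga, a tree or a production plant, fruit or vegetable (e.g., trees such as citrus trees and nut trees; nightshade plants, cotton, tobacco, tomato, grape, coffee, cocoa, etc.).
In certain embodiments, the cell is a stem cell or stem cell line.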
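Definitions of terms
In the present invention, unless otherwise stated, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Furthermore, the procedures of molecular genetics, nucleic acid chemistry, chemistry, molecular biology, biochemistry, cell culture, microbiology, cell biology, genomics and recombinant DNA used herein are all routine procedures widely used in the respective fields. For a better understanding of the invention, definitions and explanations of relevant terms are provided below.
In the present invention, the expression "Casσ" refers to a Cas effector protein first discovered and identified by the present inventors, which has an amino acid sequence selected from the following:
(i) the sequence shown in any one of SEQ ID NO: 1-13;
(ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 amino acid substitutions, deletions or additions) compared with the sequence shown in any one of SEQ ID NO: 1-13; or
(iii) a sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in any one of SEQ ID NO: 1-13.
The Casσ of the invention is an endonuclease that binds to and cleaves a specific site of a target sequence under the guidance of a guide RNA.
As used herein, the terms "clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system" and "CRISPR system" are used interchangeably and have the meaning commonly understood by those skilled in the art; such a system generally comprises transcripts or other elements involved in the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of said Cas genes. Such transcripts or other elements may comprise a sequence encoding a Cas effector protein and a guide RNA comprising a CRISPR RNA (crRNA), as well as the trans-acting crRNA (tracrRNA) sequence contained in the CRISPR-Cas9 system, or other sequences or transcripts from a CRISPR locus. In the Casσ-based CRISPR system described in the present invention, no tracrRNA sequence is required.
As used herein, the terms "Cas effector protein" and "Cas effector enzyme" are used interchangeably and refer to any protein longer than 800 amino acids present in a CRISPR-Cas system. In some cases, such a protein refers to a protein identified from a Cas locus.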
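As used herein, the terms "guide RNA" and "mature crRNA" are used interchangeably and have the meaning commonly understood by those skilled in the art. In general, a guide RNA may comprise, or consist essentially of, or consist of, a direct repeat sequence and a guide sequence (also called a spacer in the context of an endogenous CRISPR system). In some cases, a guide sequence is any polynucleotide sequence that has sufficient complementarity to a target sequence to hybridize to the target sequence and direct sequence-specific binding of the CRISPR/Cas complex to the target sequence. In certain embodiments, when optimally aligned, the degree of complementarity between a guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the optimal alignment is within the ability of a person of ordinary skill in the art. For example, there are published and commercially available alignment algorithms and programs, such as but not limited to ClustalW, the Smith-Waterman algorithm in matlab, Bowtie, Geneious, Biopython and SeqMan.
In some cases, the guide sequence is at least 5, at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides in length. In some cases, the guide sequence is no more than 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length. In certain embodiments, the guide sequence is 10-30, or 15-25, or 15-22, or 19-25, or 19-22 nucleotides in length.
In some cases, the direct repeat sequence is at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, or at least 70 nucleotides in length. In some cases, the direct repeat sequence is no more than 70, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 50, 45, 40, 35, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length. In certain embodiments, the direct repeat sequence is 55-70 nucleotides in length, such as 55-65 nucleotides, such as 60-65 nucleotides, such as 62-65 nucleotides, such as 63-64 nucleotides. In certain embodiments, the direct repeat sequence is 15-30 nucleotides in length, such as 15-25 nucleotides, such as 20-25 nucleotides, such as 22-24 nucleotides, such as 23 nucleotides.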
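As used herein, the term "CRISPR/Cas complex" refers to a ribonucleoprotein complex formed by the binding of a guide RNA or mature crRNA to a Cas protein, which comprises a guide sequence that hybridizes to a target sequence and is bound to the Cas protein. The ribonucleoprotein complex is capable of recognizing and cleaving a polynucleotide that can hybridize to the guide RNA or mature crRNA.
Thus, in the context of forming a CRISPR/Cas complex, a "target sequence" refers to a polynucleotide targeted by a guide sequence designed to have targeting capability, for example a sequence having complementarity to the guide sequence, wherein hybridization between the target sequence and the guide sequence promotes the formation of the CRISPR/Cas complex. Full complementarity is not required, provided there is sufficient complementarity to cause hybridization and promote the formation of a CRISPR/Cas complex. A target sequence may comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located in the nucleus or cytoplasm of a cell. In some cases, the target sequence may be located in an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast. A sequence or template that can be used for recombination into a target locus containing the target sequence is referred to as an "editing template", "editing polynucleotide" or "editing sequence". In certain embodiments, the editing template is an exogenous nucleic acid. In certain embodiments, the recombination is homologous recombination.
In the present invention, the expression "target sequence" or "target polynucleotide" may be any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, the target polynucleotide may be a polynucleotide present in the nucleus of a eukaryotic cell. The target polynucleotide may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, it is believed that the target sequence should be associated with a protospacer adjacent motif (PAM). The precise sequence and length requirements for the PAM differ depending on the Cas effector enzyme used, but a PAM is typically a sequence of 2-5 base pairs adjacent to the protospacer (that is, the target sequence). Those skilled in the art are able to identify the PAM sequence for use with a given Cas effector protein. Herein, "the specific motif sequence recognized by the Cas protein" or "motif sequence" refers to the PAM sequence.
In some cases, the target sequence or target polynucleotide may include a number of disease-associated genes and polynucleotides as well as signaling biochemical pathway-associated genes and polynucleotides. Non-limiting examples of such target sequences or target polynucleotides include those listed in U.S. provisional patent applications 61/736,527 and 61/748,427, filed on December 12, 2012 and January 2, 2013 respectively, and in international application PCT/US2013/074667, filed on December 12, 2013, all of which are incorporated herein by reference.
In some cases, examples of target sequences or target polynucleotides include sequences associated with a signaling biochemical pathway, such as a signaling biochemical pathway-associated gene or polynucleotide. Examples of target polynucleotides include disease-associated genes or polynucleotides. A "disease-associated" gene or polynucleotide refers to any gene or polynucleotide that yields transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissue compared with tissues or cells of a non-disease control. Where altered expression correlates with the occurrence and/or progression of the disease, it may be a gene that is expressed at an abnormally high level, or it may be a gene that is expressed at an abnormally low level. A disease-associated gene also refers to a gene possessing one or more mutations or genetic variations that are directly responsible for, or are in linkage disequilibrium with one or more genes responsible for, the etiology of a disease. The transcribed or translated products may be known or unknown, and may be at a normal or abnormal level.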
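As used herein, the term "wild type" has the meaning commonly understood by those skilled in the art; it denotes the typical form of an organism, strain or gene, or the characteristic that distinguishes it, as it occurs in nature, from mutant or variant forms; it can be isolated from a source in nature and has not been intentionally modified by man.
As used herein, the terms "non-naturally occurring" and "engineered" are used interchangeably and indicate human involvement. When used to describe nucleic acid molecules or polypeptides, the terms mean that the nucleic acid molecule or polypeptide is at least substantially free from at least one other component with which it is associated in nature or as found in nature.
As used herein, the term "orthologue" or "ortholog" has the meaning commonly understood by those skilled in the art. By way of further guidance, an "ortholog" of a protein as described herein refers to a protein belonging to a different species that performs the same or a similar function as the protein of which it is an ortholog.
As used herein, the term "identity" refers to the degree of sequence matching between two polypeptides or between two nucleic acids. When a position in both of the two sequences being compared is occupied by the same base or amino acid monomer subunit (for example, a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. The "percent identity" between two sequences is a function of the number of matching positions shared by the two sequences divided by the number of positions compared × 100. For example, if 6 of 10 positions in two sequences match, the two sequences have 60% identity. For example, the DNA sequences CTGACT and CAGGTT share 50% identity (3 of the 6 positions match). In general, the comparison is made when two sequences are aligned to give maximum identity. Such an alignment can be achieved, for example, by the method of Needleman et al. (1970) J. Mol. Biol. 48: 443-453, which can be conveniently carried out by a computer program such as the Align program (DNAstar, Inc.). The percent identity between two amino acid sequences can also be determined using the algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4: 11-17 (1988)), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. In addition, the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch (J. Mol. Biol. 48: 444-453 (1970)) algorithm, which has been incorporated into the GAP program in the GCG software package (available at www.gcg.com), using either a Blossum 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6 or 4 and a length weight of 1, 2, 3, 4, 5 or 6.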
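The percent-identity arithmetic in the definition above can be reproduced with a short sketch. It assumes the two sequences have already been aligned to equal length and contain no gap characters; a real comparison would first run an alignment such as Needleman-Wunsch, which is not implemented here. The function name is illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Identity = matching positions / compared positions * 100.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # The example pair from the definition: 3 of 6 positions match.
    print(percent_identity("CTGACT", "CAGGTT"))
```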
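As used herein, the term "vector" refers to a nucleic acid vehicle into which a polynucleotide can be inserted. When a vector enables the expression of the protein encoded by the inserted polynucleotide, the vector is called an expression vector. A vector can be introduced into a host cell by transformation, transduction or transfection, so that the genetic material elements it carries are expressed in the host cell. Vectors are well known to those skilled in the art and include, but are not limited to: plasmids; phagemids; cosmids; artificial chromosomes, such as yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC) or P1-derived artificial chromosomes (PAC); phages such as λ phage or M13 phage; and animal viruses. Animal viruses that can be used as vectors include, but are not limited to, retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpesviruses (such as herpes simplex virus), poxviruses, baculoviruses, papillomaviruses and papovaviruses (such as SV40). A vector may contain a variety of elements that control expression, including but not limited to promoter sequences, transcription initiation sequences, enhancer sequences, selection elements and reporter genes. In addition, a vector may contain an origin of replication.
As used herein, the term "host cell" refers to a cell into which a vector can be introduced, including but not limited to prokaryotic cells such as Escherichia coli or Bacillus subtilis, fungal cells such as yeast cells or Aspergillus, insect cells such as S2 Drosophila cells or Sf9, or animal cells such as fibroblasts, CHO cells, COS cells, NSO cells, HeLa cells, BHK cells, HEK 293 cells or human cells.
Those skilled in the art will understand that the design of an expression vector may depend on factors such as the choice of the host cell to be transformed and the level of expression desired. A vector can be introduced into a host cell to thereby produce transcripts, proteins or peptides, including those encoded by the proteins, fusion proteins, isolated nucleic acid molecules and the like described herein (e.g., CRISPR transcripts, such as nucleic acid transcripts, proteins or enzymes).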
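As used herein, the term "regulatory element" is intended to include promoters, enhancers, internal ribosome entry sites (IRES) and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences); for a detailed description see Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, California (1990). In some cases, regulatory elements include those sequences that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver, pancreas), or particular cell types (e.g., lymphocytes). In some cases, regulatory elements may also direct expression in a temporally dependent manner (such as in a cell cycle-dependent or developmental stage-dependent manner), which may or may not also be tissue or cell-type specific. In some cases, the term "regulatory element" encompasses enhancer elements such as WPRE; CMV enhancers; the R-U5' segment in the LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988); the SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA, Vol. 78(3), pp. 1527-31, 1981).
As used herein, the term "promoter" has the meaning well known to those skilled in the art and refers to a non-coding nucleotide sequence located upstream of a gene that can initiate expression of the downstream gene. A constitutive promoter is a nucleotide sequence which, when operably linked to a polynucleotide that encodes or specifies a gene product, causes the gene product to be produced in a cell under most or all physiological conditions of the cell. An inducible promoter is a nucleotide sequence which, when operably linked to a polynucleotide that encodes or specifies a gene product, causes the gene product to be produced in a cell substantially only when an inducer corresponding to the promoter is present in the cell. A tissue-specific promoter is a nucleotide sequence which, when operably linked to a polynucleotide that encodes or specifies a gene product, causes the gene product to be produced in a cell substantially only when the cell is a cell of the tissue type corresponding to the promoter.
As used herein, the term "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in a host cell when the vector is introduced into the host cell).
As used herein, the term "complementarity" refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence by either traditional Watson-Crick or other non-traditional types of pairing. A percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9 or 10 out of 10 being 50%, 60%, 70%, 80%, 90% and 100% complementary). "Perfectly complementary" means that all the contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. "Substantially complementary" as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.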
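To make the percent-complementarity definition above concrete, the sketch below counts Watson-Crick pairs between two equal-length strands that are both written 5' to 3' and paired antiparallel. Counting only Watson-Crick pairs (no wobble or other non-traditional pairs), requiring equal lengths and the function name are simplifying assumptions for illustration.

```python
# Watson-Crick pairs, covering DNA (T) and RNA (U) on either strand.
WC_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(strand, other):
    """Percent of positions in `strand` that pair with `other`.

    Both strands are given 5'->3'; `other` is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if len(strand) != len(other) or not strand:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        (x, y) in WC_PAIRS
        for x, y in zip(strand.upper(), reversed(other.upper()))
    )
    return 100.0 * paired / len(strand)


if __name__ == "__main__":
    guide_rna = "ACGUACGUAC"   # hypothetical 10-nt guide segment
    target_dna = "GTACGTACGT"  # its reverse complement, written as DNA
    print(percent_complementarity(guide_rna, target_dna))
```

With two of the ten positions mismatched, the same function returns 80.0, matching the "8 out of 10 being 80%" reading of the definition.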
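As used herein, "stringent conditions" for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes to the target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in Tijssen (1993), Laboratory Techniques In Biochemistry And Molecular Biology - Hybridization With Nucleic Acid Probes, Part I, Chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assay", Elsevier, New York.
As used herein, the term "hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR or the cleavage of a polynucleotide by an enzyme. A sequence capable of hybridizing to a given sequence is referred to as the "complement" of the given sequence.
As used herein, the term "expression" refers to the process by which a polynucleotide is transcribed from a DNA template (such as into mRNA or another RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides or proteins. Transcripts and encoded polypeptides may be collectively referred to as "gene products". If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.
As used herein, the term "linker" refers to a linear polypeptide formed by a plurality of amino acid residues joined by peptide bonds. The linker of the invention may be an artificially synthesized amino acid sequence or a naturally occurring polypeptide sequence, such as a polypeptide having the function of a hinge region. Such linker polypeptides are well known in the art (see, e.g., Holliger, P. et al. (1993) Proc. Natl. Acad. Sci. USA 90: 6444-6448; Poljak, R.J. et al. (1994) Structure 2: 1121-1123).
As used herein, the term "treatment" refers to treating or curing a disorder, delaying the onset of symptoms of a disorder, and/or delaying the progression of a disorder.
As used herein, the term "subject" includes, but is not limited to, various animals, for example mammals, such as bovines, equines, ovines, porcines, canines, felines, leporids, rodents (e.g., mice or rats), non-human primates (e.g., rhesus or cynomolgus monkeys) or humans. In certain embodiments, the subject (e.g., a human) has a disorder (e.g., a disorder caused by a defect in a disease-associated gene).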
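Beneficial effects of the invention
Compared with the prior art, the Cas proteins and systems of the invention have significant advantages. For example, the Cas effector proteins of the invention are smaller in molecular size than the Cas9, C2c1, CasY and Cpf1 proteins, and are therefore superior to them in transfection efficiency and can improve delivery efficiency in eukaryotic cells. For example, where viral vectors (such as AAV vectors) are used, they can be delivered into eukaryotic cells (such as mammalian cells, human cells and mouse cells) and applied in research and/or clinical applications. Moreover, the Cas effector proteins of the invention can cleave DNA in eukaryotic organisms, and, compared with the previously reported FnCpf1, whose PAM is 5'-TTN, the Cas proteins of the invention also have a broader range of PAM recognition sites, expanded 4-fold compared with Cas9 or Cas12a.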
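One way to read the four-fold figure, assuming it compares the share of all three-nucleotide sequences that satisfy each PAM pattern when the four bases are equally likely, is sketched below. The patterns NGG (Cas9) and TTN (FnCpf1/Cas12a) are the ones quoted in this document; the interpretation, the uniform base frequencies and the function name are assumptions for illustration, not the application's own calculation.

```python
from itertools import product

# Minimal IUPAC table: N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def pam_fraction(pattern):
    """Fraction of all DNA k-mers (k = len(pattern)) matching the pattern."""
    hits = sum(
        all(base in IUPAC[symbol] for base, symbol in zip(kmer, pattern))
        for kmer in product("ACGT", repeat=len(pattern))
    )
    return hits / 4 ** len(pattern)


if __name__ == "__main__":
    for name, pattern in [("Cas-sigma", "NTN"), ("Cas9", "NGG"), ("Cas12a", "TTN")]:
        print(name, pattern, pam_fraction(pattern))
    print("NTN vs NGG:", pam_fraction("NTN") / pam_fraction("NGG"))
```

Under these assumptions, NTN matches 16 of the 64 possible trinucleotides, whereas NGG and TTN each match 4, which is consistent with a four-fold difference.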
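Embodiments of the invention are described in detail below with reference to the drawings and examples, but those skilled in the art will understand that the following drawings and examples serve only to illustrate the invention and do not limit its scope. The various objects and advantages of the invention will become apparent to those skilled in the art from the drawings and the following detailed description of the preferred embodiments.
Brief description of the drawings
Figure 1 shows the PAM structure and analysis results in Example 3.
Figure 2 shows the results of the in vitro verification of PAM cleavage activity in Example 3.
Figure 3 shows the results of the in vivo verification of the PAM in E. coli in Example 3.
Figure 4 shows the results of the editing activity assay in human cells in Example 4.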
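Sequence information
Information on some of the sequences involved in the invention is provided in Table 1 below.
Table 1: Description of sequences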
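Detailed description of embodiments
The invention is now described with reference to the following examples, which are intended to illustrate the invention rather than to limit it.
Unless otherwise specified, the experiments and methods described in the examples were carried out essentially according to conventional methods well known in the art and described in various references. For example, the conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA used in the invention can be found in Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel et al., eds. (1987)); the METHODS IN ENZYMOLOGY series (Academic Press, Inc.): PCR 2: A PRACTICAL APPROACH (M.J. MacPherson, B.D. Hames and G.R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)).
In addition, where specific conditions are not stated in the examples, conventional conditions or the conditions recommended by the manufacturer were used. Reagents or instruments whose manufacturer is not stated are all conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the invention by way of illustration and are not intended to limit the scope claimed. All publications and other references mentioned herein are incorporated by reference in their entirety.
The sources of some of the reagents used in the following examples are as follows:
LB liquid medium: 10 g tryptone, 5 g yeast extract, 10 g NaCl, made up to 1 L and sterilized. If antibiotics are needed, add them after the medium has cooled, to a final concentration of 50 μg/mL.
Chloroform/isoamyl alcohol: add 10 mL of isoamyl alcohol to 240 mL of chloroform and mix well.
RNP buffer: 100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl2, 100 μg/mL BSA, pH 7.9.
The prokaryotic expression vectors pUC19 and pACYCDuet-1 were purchased from Beijing TransGen Biotech Co., Ltd.
Competent E. coli TSC-E03 cells were purchased from Beijing Tsingke Biotechnology Co., Ltd.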
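Example 1. Identification of the Casσ genes and Casσ guide RNAs
1. Annotation of CRISPR loci and genes: Prodigal was used to annotate genes in microbial genome and metagenome data from the NCBI and JGI databases, yielding the full set of proteins; in parallel, Piler-CR was used to annotate CRISPR loci. Default parameters were used for both tools.
2. Protein filtering: the annotated proteins were de-duplicated by sequence identity, removing proteins with completely identical sequences.
3. Identification of CRISPR-associated proteins: each CRISPR locus was extended by 10 kb upstream and downstream, and the non-redundant proteins within this CRISPR-adjacent interval were identified.
4. Clustering of CRISPR-associated proteins: BLASTP was used for all-against-all pairwise alignment of the non-redundant CRISPR-associated proteins, reporting alignments with E-value < 1E-10. The BLASTP output was clustered with MCL to obtain CRISPR-associated protein families.
5. Identification of CRISPR-enriched protein families: BLASTP was used to align the proteins of the CRISPR-associated protein families against the non-redundant protein database from which the CRISPR-associated proteins had been removed, reporting alignments with E-value < 1E-10. If the homologous proteins found for a family in this non-CRISPR-associated protein database amount to less than 100%, the proteins of that family are considered enriched in CRISPR regions; CRISPR-enriched protein families were identified in this way.
6. Annotation of protein function and domains: the CRISPR-enriched protein families were annotated using the Pfam database, the NR database and Cas proteins collected from NCBI, yielding new CRISPR/Cas protein families. Mafft was used for multiple sequence alignment of the proteins in each CRISPR/Cas family, followed by conserved-domain analysis with JPred and HHpred to identify protein families containing a RuvC domain.
On this basis, the inventors obtained a novel class of Cas effector proteins, named Casσ-1 to Casσ-13. Their protein sequences are shown in SEQ ID NO: 1-13, and the nucleotide sequences encoding them are shown in SEQ ID NO: 14-26. The prototype direct repeat sequences (the repeat sequences contained in the pre-crRNA) corresponding to Casσ-1 to Casσ-13 are shown in SEQ ID NO: 27-39.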
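Steps 2 to 5 of the screening pipeline above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the inventors' scripts: the function names, the in-memory data layout, and the rule that a gene must lie entirely inside the ±10 kb window are all hypothetical, and the BLASTP input is assumed to be in the standard 12-column tabular format, where the E-value is the 11th column.

```python
from typing import Dict, Iterable, List, Tuple

FLANK_BP = 10_000    # step 3: extend each CRISPR array by 10 kb on both sides
MAX_EVALUE = 1e-10   # steps 4-5: keep BLASTP alignments with E-value < 1E-10


def deduplicate(proteins: Dict[str, str]) -> Dict[str, str]:
    """Step 2: keep the first ID seen for each exactly identical sequence."""
    first_id_for_seq: Dict[str, str] = {}
    for protein_id, seq in proteins.items():
        first_id_for_seq.setdefault(seq, protein_id)
    return {pid: seq for seq, pid in first_id_for_seq.items()}


def genes_near_array(
    array_start: int,
    array_end: int,
    genes: Iterable[Tuple[str, int, int]],
    contig_length: int,
) -> List[str]:
    """Step 3: IDs of genes lying entirely within the flanked CRISPR interval.

    Requiring full containment is an assumption of this sketch.
    """
    lo = max(0, array_start - FLANK_BP)
    hi = min(contig_length, array_end + FLANK_BP)
    return [gid for gid, start, end in genes if start >= lo and end <= hi]


def filter_blast_hits(lines: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Steps 4-5: parse tabular BLASTP lines and keep non-self hits below the cutoff."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue < MAX_EVALUE:
            hits.append((query, subject, evalue))
    return hits
```

The filtered pairs would then be passed to a clustering tool such as MCL; that step is left out here because it runs as an external program.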
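Example 2. Sequence structure of the Casσ genes
1. The CRISPR/Casσ sequence fragments were synthesized by Beijing Tsingke Biotech Co., Ltd., cloned into the protein expression vector pET-30a(+), and confirmed by Sanger sequencing. Based on the sequencing results, the recombinant plasmids pET-30a+CRISPR/Casσ are described as follows:
(1) The recombinant plasmid pET-30a+CRISPR/Casσ-1 contains an expression cassette whose sequence is shown in SEQ ID NO: 67. In SEQ ID NO: 67, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2742 are the nucleotide sequence of Casσ-1, and positions 2743 to 2802 encode the nucleoplasmin NLS.
(2) The recombinant plasmid pET-30a+CRISPR/Casσ-2 contains an expression cassette whose sequence is shown in SEQ ID NO: 68. In SEQ ID NO: 68, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2901 are the nucleotide sequence of Casσ-2, and positions 2902 to 2961 encode the nucleoplasmin NLS.
(3) The recombinant plasmid pET-30a+CRISPR/Casσ-3 contains an expression cassette whose sequence is shown in SEQ ID NO: 69. In SEQ ID NO: 69, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2700 are the nucleotide sequence of Casσ-3, and positions 2701 to 2856 encode the nucleoplasmin NLS.
(4) The recombinant plasmid pET-30a+CRISPR/Casσ-4 contains an expression cassette whose sequence is shown in SEQ ID NO: 70. In SEQ ID NO: 70, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 1977 are the nucleotide sequence of Casσ-4, and positions 1978 to 2037 encode the nucleoplasmin NLS.
(5) The recombinant plasmid pET-30a+CRISPR/Casσ-5 contains an expression cassette whose sequence is shown in SEQ ID NO: 71. In SEQ ID NO: 71, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2877 are the nucleotide sequence of Casσ-5, and positions 2878 to 2937 encode the nucleoplasmin NLS.
(6) The recombinant plasmid pET-30a+CRISPR/Casσ-6 contains an expression cassette whose sequence is shown in SEQ ID NO: 72. In SEQ ID NO: 72, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2796 are the nucleotide sequence of Casσ-6, and positions 2797 to 2856 encode the nucleoplasmin NLS.
(7) The recombinant plasmid pET-30a+CRISPR/Casσ-7 contains an expression cassette whose sequence is shown in SEQ ID NO: 73. In SEQ ID NO: 73, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2901 are the nucleotide sequence of Casσ-7, and positions 2902 to 2961 encode the nucleoplasmin NLS.
(8) The recombinant plasmid pET-30a+CRISPR/Casσ-8 contains an expression cassette whose sequence is shown in SEQ ID NO: 74. In SEQ ID NO: 74, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2784 are the nucleotide sequence of Casσ-8, and positions 2785 to 2844 encode the nucleoplasmin NLS.
(9) The recombinant plasmid pET-30a+CRISPR/Casσ-9 contains an expression cassette whose sequence is shown in SEQ ID NO: 75. In SEQ ID NO: 75, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2757 are the nucleotide sequence of Casσ-9, and positions 2758 to 2817 encode the nucleoplasmin NLS.
(10) The recombinant plasmid pET-30a+CRISPR/Casσ-10 contains an expression cassette whose sequence is shown in SEQ ID NO: 76. In SEQ ID NO: 76, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2559 are the nucleotide sequence of Casσ-10, and positions 2560 to 2619 encode the nucleoplasmin NLS.
(11) The recombinant plasmid pET-30a+CRISPR/Casσ-11 contains an expression cassette whose sequence is shown in SEQ ID NO: 77. In SEQ ID NO: 77, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2958 are the nucleotide sequence of Casσ-11, and positions 2959 to 3018 encode the nucleoplasmin NLS.
(12) The recombinant plasmid pET-30a+CRISPR/Casσ-12 contains an expression cassette whose sequence is shown in SEQ ID NO: 78. In SEQ ID NO: 78, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 3099 are the nucleotide sequence of Casσ-12, and positions 3100 to 3159 encode the nucleoplasmin NLS.
(13) The recombinant plasmid pET-30a+CRISPR/Casσ-13 contains an expression cassette whose sequence is shown in SEQ ID NO: 79. In SEQ ID NO: 79, counting from the 5' end, positions 1 to 27 are the nucleotide sequence of SV40-NLS, positions 28 to 96 are the nucleotide sequence of 3×FLAG, positions 97 to 2559 are the nucleotide sequence of Casσ-13, and positions 2560 to 2619 encode the nucleoplasmin NLS.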
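The cassette coordinates listed above are 1-based and inclusive, so the length of each element is end minus start plus one. The sketch below (names and data layout are hypothetical) shows how such coordinates could be checked for contiguity and for coding segments that are a multiple of three; it uses three of the thirteen cassettes with the numbers exactly as printed. With those numbers the final segment comes out at 60 nt for Casσ-1 and Casσ-4 and at 156 nt for Casσ-3; the code only reports the lengths and does not decide which value is intended.

```python
from typing import Dict, List, Tuple

# (label, first position, last position); positions are 1-based and inclusive.
Segment = Tuple[str, int, int]

CASSETTES: Dict[str, List[Segment]] = {
    "Casσ-1": [("SV40-NLS", 1, 27), ("3xFLAG", 28, 96),
               ("CDS", 97, 2742), ("nucleoplasmin NLS", 2743, 2802)],
    "Casσ-3": [("SV40-NLS", 1, 27), ("3xFLAG", 28, 96),
               ("CDS", 97, 2700), ("nucleoplasmin NLS", 2701, 2856)],
    "Casσ-4": [("SV40-NLS", 1, 27), ("3xFLAG", 28, 96),
               ("CDS", 97, 1977), ("nucleoplasmin NLS", 1978, 2037)],
}


def segment_lengths(segments: List[Segment]) -> Dict[str, int]:
    """Length in nucleotides of each labelled segment."""
    return {label: end - start + 1 for label, start, end in segments}


def is_contiguous(segments: List[Segment]) -> bool:
    """True if every segment starts right after the previous one ends."""
    return all(nxt[1] == prev[2] + 1 for prev, nxt in zip(segments, segments[1:]))


for name, segments in CASSETTES.items():
    lengths = segment_lengths(segments)
    in_frame = lengths["CDS"] % 3 == 0
    print(name, lengths, "contiguous:", is_contiguous(segments), "CDS in frame:", in_frame)
```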
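Example 3. Identification of the PAM and DNA cleavage mode of the CRISPR/Casσ system
I. In vitro expression and purification of Casσ proteins
The Casσ proteins were expressed and purified in vitro as follows:
1. The nucleotide sequences shown in SEQ ID NO: 67-79 were synthesized.
2. The recombinant plasmids pET-30a+CRISPR/Casσ-1 to 13 were each introduced into E. coli TSC-E03 to obtain recombinant strains, designated TSC-E03-CRISPR/Casσ-1 to 13. A single colony of each of TSC-E03-CRISPR/Casσ-1 to 13 was inoculated into 100 mL LB liquid medium (containing 50 μg/mL kanamycin) and cultured with shaking at 37 °C and 200 rpm for 12 h to obtain a culture.
3. The culture was inoculated at a volume ratio of 1:100 into 50 mL LB liquid medium (containing 50 μg/mL kanamycin) and cultured with shaking at 37 °C and 200 rpm until the OD600 reached 0.6. IPTG was then added to a final concentration of 1 mM, and the culture was shaken at 18 °C and 220 rpm for 14 h. The cells were pelleted by centrifugation at 4 °C and 7000 rpm for 10 min.
5. The cell pellet was resuspended in 100 mL of 100 mM Tris-HCl buffer, pH 8.0, and disrupted by sonication (600 W; cycles of 4 s on and 6 s off, 20 min in total), then centrifuged at 4 °C and 10000 rpm for 10 min, and supernatant A was collected.
6. Supernatant A was centrifuged at 4 °C and 12000 rpm for 10 min, and supernatant B was collected.
7. Supernatant B was purified on a nickel column from GE (for the purification steps, refer to the column instructions), and the Casσ-1 to Casσ-13 proteins were then quantified with a protein quantification kit from Thermo Fisher.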
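II. Transcription and purification of the Casσ guide RNAs:
1. A guide RNA transcription template was designed for each protein. The template has the structure: T7 promoter + prototype direct repeat of Casσ-1 to Casσ-13 (SEQ ID NO: 27-39) + guide sequence (SEQ ID NO: 81). Primers were designed with Primer 5.0 software, ensuring that the forward primer and the reverse primer overlap by at least 18 bp.
2. Prepare the following reaction system, mix by gentle pipetting, centrifuge briefly, and anneal slowly in a PCR instrument. The PCR system is as follows:
3. Purify the template with the MinElute PCR Purification Kit as follows:
1) Add 5 volumes of Buffer PB to the PCR product, place a MinElute column in a 2 mL collection tube and apply the mixture to the column, let stand at room temperature for 2 min, and centrifuge at 12000 g for 1 min;
2) Discard the flow-through, add 750 μL Buffer PE (make sure ethanol has been added to the buffer before use), and centrifuge at 12000 g for 2 min;
3) Discard the flow-through, add 350 μL Buffer PE, and centrifuge at 12000 g for 1 min; discard the flow-through and centrifuge the empty column at 12000 g for 2 min;
4) Transfer the MinElute column to a new 1.5 mL centrifuge tube and let it stand with the lid open at 65 °C for 2 min;
5) Add 20 μL pre-warmed Buffer EB, let stand for 2 min, and centrifuge at 12000 g for 1 min; to improve recovery, the eluate may be passed through the MinElute column 2-3 times;
6) Measure the concentration with a Nanodrop and store at -20 °C until use.
4. Purification of the guide RNA: extract with phenol:chloroform:isoamyl alcohol (25:24:1) to remove DNase I from the reaction;
1) Add 80 μL RNase-free H2O to the transcription reaction to bring the volume to 100 μL;
2) Take a 2 mL Phase Lock Gel (PLG) Heavy tube and centrifuge it at 15000 g for 2 min; add 100 μL phenol:chloroform:isoamyl alcohol (25:24:1) and the 100 μL of DNase I-digested RNA, flick the Phase Lock tube gently 5-10 times to mix, then centrifuge at 15 °C and 16000 g for 12 min;
3) Transfer the supernatant from the previous step to a new RNase-free 1.5 mL centrifuge tube, taking care not to aspirate the gel; add an equal volume of isopropanol and one-tenth volume of sodium acetate solution, mix by pipetting, and leave at -20 °C for 1 h or overnight;
4) Centrifuge at 4 °C and 16000 g for 30 min and discard the supernatant; add pre-chilled 75% ethanol and resuspend the pellet by pipetting; centrifuge at 4 °C and 16000 g for 12 min and discard the supernatant; leave in a fume hood for 2-3 min to let residual ethanol evaporate from the RNA, then add 100 μL RNase-free H2O and mix by pipetting.
5. Measure the concentration of the purified crRNA with a Nanodrop, dilute all samples to 250 ng/μL, aliquot into 200 μL PCR tubes, and store at -80 °C until use.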
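Step 1 of this section describes a transcription template of the form T7 promoter + direct repeat + guide sequence, assembled from two primers that overlap by at least 18 bp. The following is a minimal sketch of that layout, not the inventors' design procedure: the T7 promoter string is the commonly used minimal promoter with two initiating G bases (an assumption, since the document does not give it), the direct repeat and guide sequences are made-up placeholders rather than SEQ ID NO: 27-39 or 81, and splitting the template around its midpoint is a hypothetical choice.

```python
T7_PROMOTER = "TAATACGACTCACTATAGG"  # assumption: common minimal T7 promoter + GG
MIN_OVERLAP = 18                      # minimum primer overlap stated in step 1

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_template(direct_repeat: str, guide: str) -> str:
    """Sense strand of the template: T7 promoter + direct repeat + guide."""
    return T7_PROMOTER + direct_repeat + guide


def split_into_primers(template: str, overlap: int = MIN_OVERLAP):
    """Split a template into a forward and a reverse primer sharing `overlap` bp."""
    if overlap < MIN_OVERLAP:
        raise ValueError(f"overlap must be at least {MIN_OVERLAP} bp")
    if len(template) < overlap:
        raise ValueError("template is shorter than the requested overlap")
    left = len(template) // 2 - overlap // 2  # start of the shared region
    forward = template[: left + overlap]
    reverse = reverse_complement(template[left:])
    return forward, reverse


# Placeholder sequences for illustration only.
template = build_template("GTTGCATCAGCCTAACGGATTCAGAC", "ACGTTGCAAGCTTGGATCCA")
forward_primer, reverse_primer = split_into_primers(template)
print(forward_primer)
print(reverse_primer)
```

Annealing the two primers and filling in the single-stranded ends regenerates the full double-stranded template, which is why the shared region has to be long enough to anneal stably.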
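III. In vitro cleavage by Casσ proteins and PAM depletion:
1. Establishing the double-stranded DNA cleavage system:
(1) Prepare the following reaction system, mix by gentle pipetting and centrifuge briefly. Incubate at 37 °C for 15 min. The DNA cleavage reaction system is as follows:
(2) Add 300 ng substrate DNA (3 μL at 100 ng/μL), mix by gentle pipetting and centrifuge briefly. Incubate at 37 °C for 8 h;
(3) Add RNase and incubate at 37 °C for 15 min to fully digest the RNA in the reaction;
(4) Add proteinase K and incubate at 55 °C for 15 min to digest the Casσ-1 to 13 proteins;
(5) Analyze by agarose gel electrophoresis.
The gel results showed that Casσ-1 can efficiently cleave double-stranded DNA.
2. Identification of PAM sites:
(1) Prepare the reaction system as in step 1 above, replacing the substrate DNA with a plasmid library carrying 8 random bases immediately upstream of the target site, and incubate at 37 °C for 8 h. The negative control contained Casσ but no crRNA. Three replicates were run for each protein;
(2) After the reaction, the samples were column-purified, and the purified products were used as templates to construct next-generation sequencing libraries. The library construction system and method are the same as the library construction method in step 2 of the E. coli in vivo PAM library depletion procedure; the workflow is as follows:
(each sample corresponds to one reverse (R) primer and multiple forward (F) primers); prepare the following reagents:
Place the prepared reaction system in a PCR instrument and run the following program:
Each sample was sequenced to 1 G;
(3) The number of occurrences of each PAM sequence was counted in the experimental group and in the control group and normalized to the total number of PAM sequences in the respective group. A PAM sequence was considered significantly depleted when log2(normalized control value / normalized experimental value) exceeded 3.5, and the significantly depleted PAM sequences were extracted from all PAM sequences in this way. Weblogo was then used to derive the consensus of the significantly depleted PAM sequences, giving the PAM motif of Casσ (Figure 1).
(4) Verification of the PAM motif: the PAM library depletion experiment gave the PAM motif of Casσ-1. To verify the stringency of this motif, a TTT PAM was tested in vivo for Casσ-1 editing activity. First, a sequence consisting of the T7 promoter, a 26-nt target carrying the corresponding PAM site, and the T7 terminator was integrated into the pET30a-Casσ-1 vector, which was then co-transformed with the pACYCDuet-1 plasmid and plated on kanamycin plus chloramphenicol plates for selection. Double-resistant single colonies were picked and grown with shaking; at an OD of 1.0 they were induced with IPTG for 12 hours, after which the pre-induction and post-induction cultures were serially diluted and spotted for observation. If the chloramphenicol resistance gene is edited, growth on chloramphenicol plates is poor. The results (Figures 2 and 3) show that the CRISPR/Casσ system efficiently edits only target sequences carrying a specific PAM (for example, TTT) and has no editing activity on other target sequences (for example, CCC), which confirms the accuracy of PAM recognition by Casσ-1. These results demonstrate that Casσ-1 has a broad PAM recognition pattern (namely NTN, where each N is independently A, G, T or C), which makes target site selection easier for Casσ-1.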
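The depletion criterion in step (3) can be written out as a short calculation. The sketch below follows the rule as stated (normalize each group by its own total, then keep PAMs whose log2 control/experimental ratio exceeds 3.5). The pseudocount that avoids division by zero for PAMs absent from one group is an assumption added here, the function names are hypothetical, and the counts in the usage example are invented for illustration rather than taken from the experiments.

```python
import math
from typing import Dict

LOG2_THRESHOLD = 3.5  # cutoff stated in step (3)


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Divide each PAM count by the total count of its group."""
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()}


def depleted_pams(
    control_counts: Dict[str, int],
    treated_counts: Dict[str, int],
    pseudocount: float = 1.0,  # assumption: not part of the described method
) -> Dict[str, float]:
    """Return {PAM: log2(control / treated)} for PAMs above the threshold."""
    pams = set(control_counts) | set(treated_counts)
    control = normalize({p: control_counts.get(p, 0) + pseudocount for p in pams})
    treated = normalize({p: treated_counts.get(p, 0) + pseudocount for p in pams})
    scores = {p: math.log2(control[p] / treated[p]) for p in pams}
    return {p: s for p, s in scores.items() if s > LOG2_THRESHOLD}


# Invented counts: one PAM is strongly depleted in the treated sample.
control = {"ATG": 1000, "CCC": 1000}
treated = {"ATG": 10, "CCC": 1990}
print(depleted_pams(control, treated))
```

The depleted sequences returned by such a function are what would be passed to a logo tool such as Weblogo to visualize the motif.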
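Example 4. Analysis of Casσ cleavage activity in a human cell line
A eukaryotic expression vector containing the Casσ-1 gene and a PCR product containing the U6 promoter and the guide RNA (comprising the prototype direct repeat shown in SEQ ID NO: 27 and the guide sequence for eukaryotic editing shown in SEQ ID NO: 82) were introduced into human HeLa cells by lipofection, and the cells were cultured at 37 °C with 5% CO2 for 72 h. DNA was extracted from all cells, and a 700 bp sequence containing the target site was amplified. The PCR product was ligated into the B-simple vector and Sanger-sequenced (sequencing performed by Thermo Fisher). Aligning the sequencing results to the AAVS1 gene of the human genome showed that Casσ-1 edits double-stranded DNA at the target site, resulting in base deletions (Figure 4).
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings that have been disclosed, various modifications and changes can be made to the details, and that such changes all fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.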
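Example 4 reads out editing as base deletions in Sanger reads aligned to the reference. The sketch below illustrates one simple way to locate a single contiguous deletion by trimming the prefix and suffix shared between a read and the reference amplicon. It is an illustration under strong assumptions (the read spans the whole amplicon, has no sequencing errors, and differs from the reference by exactly one deletion), not the alignment procedure used in the example; the function name and sequences are hypothetical.

```python
from typing import Optional, Tuple


def call_simple_deletion(reference: str, read: str) -> Optional[Tuple[int, int]]:
    """Return the deleted interval of `reference` as 0-based half-open (start, end).

    Returns None if the read is not shorter than the reference or cannot be
    explained by a single contiguous deletion.
    """
    if len(read) >= len(reference):
        return None
    # Longest common prefix of read and reference.
    prefix = 0
    while prefix < len(read) and reference[prefix] == read[prefix]:
        prefix += 1
    # Longest common suffix, limited to the part of the read after the prefix.
    suffix = 0
    while suffix < len(read) - prefix and reference[-1 - suffix] == read[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(read):
        return None  # mismatches or a more complex edit
    return prefix, len(reference) - suffix


# Made-up amplicon and read; the read lacks reference bases 5..8.
reference = "ACGTACGTTTGACCA"
read = "ACGTATGACCA"
print(call_simple_deletion(reference, read))
```

When the deleted bases sit in a repeat, several placements describe the same read equally well; this sketch returns the right-most one because the prefix is extended first.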

Claims (29)

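  1. A protein having the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, or an ortholog, homolog, variant or functional fragment thereof; wherein the ortholog, homolog, variant or functional fragment substantially retains the biological function of the sequence from which it is derived;
    for example, the ortholog, homolog or variant has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence from which it is derived;
    for example, the ortholog, homolog or variant has at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, and substantially retains the biological function of the sequence from which it is derived;
    for example, the protein is an effector protein in a CRISPR/Cas system.
  2. The protein of claim 1, comprising, or consisting of, a sequence selected from the following:
    (i) the sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13;
    (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 amino acid substitutions, deletions or additions) compared with the sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13; or
    (iii) a sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13;
    for example, the protein has the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13.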
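  3. A conjugate comprising the protein of claim 1 or 2 and a modifying moiety;
    for example, the modifying moiety is selected from an additional protein or polypeptide, a detectable label, and any combination thereof;
    for example, the modifying moiety is linked to the N-terminus or C-terminus of the protein, optionally via a linker;
    for example, the modifying moiety is fused to the N-terminus or C-terminus of the protein;
    for example, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or SID domain), a nuclease domain (e.g., Fok1), a domain having an activity selected from: nucleotide deaminase, methylase activity, demethylase, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof;
    for example, the conjugate comprises an epitope tag;
    for example, the conjugate comprises an NLS sequence;
    for example, the NLS sequence is as shown in SEQ ID NO: 53;
    for example, the NLS sequence is located at, near or close to a terminus of the protein (e.g., the N-terminus or C-terminus).
  4. A fusion protein comprising the protein of claim 1 or 2 and an additional protein or polypeptide;
    for example, the additional protein or polypeptide is linked to the N-terminus or C-terminus of the protein, optionally via a linker;
    for example, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or SID domain), a nuclease domain (e.g., Fok1), a domain having an activity selected from: nucleotide deaminase, methylase activity, demethylase, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof;
    for example, the fusion protein comprises an epitope tag;
    for example, the fusion protein comprises an NLS sequence;
    for example, the NLS sequence is as shown in SEQ ID NO: 53;
    for example, the NLS sequence is located at, near or close to a terminus of the protein (e.g., the N-terminus or C-terminus);
    for example, the fusion protein has the amino acid sequence shown in any one of SEQ ID NO: 54-66.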
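  5. An isolated nucleic acid molecule comprising, or consisting of, a sequence selected from the following:
    (i) the sequence shown in any one of SEQ ID NO: 27-39;
    (ii) a sequence having one or more base substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions) compared with the sequence shown in any one of SEQ ID NO: 27-39;
    (iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in any one of SEQ ID NO: 27-39;
    (iv) a sequence that hybridizes under stringent conditions to the sequence of any one of (i)-(iii); or
    (v) the complement of the sequence of any one of (i)-(iii);
    and wherein the sequence of any one of (ii)-(v) substantially retains the biological function of the sequence from which it is derived;
    for example, the nucleic acid molecule comprises one or more stem-loops or optimized secondary structures;
    for example, the sequence of any one of (ii)-(v) retains the secondary structure of the sequence from which it is derived;
    for example, the nucleic acid molecule comprises, or consists of, a sequence selected from the following:
    (a) the nucleotide sequence shown in any one of SEQ ID NO: 27-39;
    (b) a sequence that hybridizes under stringent conditions to the sequence of (a); or
    (c) the complement of the sequence of (a);
    for example, the isolated nucleic acid molecule is RNA;
    for example, the isolated nucleic acid molecule is a direct repeat sequence in a CRISPR/Cas system.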
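  6. A complex comprising:
    (i) a protein component selected from: the protein of claim 1 or 2, the conjugate of claim 3, the fusion protein of claim 4, and any combination thereof; and
    (ii) a nucleic acid component comprising, in the 5' to 3' direction, the isolated nucleic acid molecule of claim 5 and a guide sequence capable of hybridizing to a target sequence,
    wherein the protein component and the nucleic acid component bind to each other to form the complex.
  7. The complex of claim 6, wherein the guide sequence is linked to the 3' end of the nucleic acid molecule;
    for example, the guide sequence comprises the complement of the target sequence;
    for example, the nucleic acid component is a guide RNA in a CRISPR/Cas system;
    for example, the nucleic acid molecule is RNA;
    for example, the complex does not comprise a trans-acting crRNA (tracrRNA);
    preferably, the complex targets a third component, the third component being a double-stranded polynucleotide containing the target sequence, the target sequence being adjacent to a motif sequence recognized by the protein component;
    preferably, the target sequence is located 3' of the motif sequence.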
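  8. An isolated nucleic acid molecule comprising:
    (i) a nucleotide sequence encoding the protein of claim 1 or claim 2, or the fusion protein of claim 4;
    (ii) a nucleotide sequence encoding the isolated nucleic acid molecule of claim 5; and/or
    (iii) a nucleotide sequence comprising (i) and (ii);
    for example, the nucleotide sequence of any one of (i)-(iii) is codon-optimized for expression in a prokaryotic cell or a eukaryotic cell.
  9. A vector comprising the isolated nucleic acid molecule of claim 8.
  10. A host cell comprising the isolated nucleic acid molecule of claim 8 or the vector of claim 9.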
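  11. A composition comprising:
    (i) a first component selected from: the protein of claim 1 or 2, the conjugate of claim 3, the fusion protein of claim 4, a nucleotide sequence encoding the protein or fusion protein, and any combination thereof; and
    (ii) a second component, which is a nucleotide sequence comprising a guide RNA, or a nucleotide sequence encoding the nucleotide sequence comprising the guide RNA;
    wherein the guide RNA comprises, in the 5' to 3' direction, a direct repeat sequence and a guide sequence, the guide sequence being capable of hybridizing to a target sequence;
    the guide RNA is capable of forming a complex with the protein, conjugate or fusion protein of (i);
    for example, the direct repeat sequence is the isolated nucleic acid molecule as defined in claim 5;
    for example, the guide sequence is linked to the 3' end of the direct repeat sequence;
    for example, the guide sequence comprises the complement of the target sequence;
    for example, the composition does not comprise a trans-acting crRNA (tracrRNA);
    for example, the composition is non-naturally occurring or modified;
    for example, at least one component of the composition is non-naturally occurring or modified;
    for example, the first component is non-naturally occurring or modified; and/or the second component is non-naturally occurring or modified.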
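  12. A composition comprising one or more vectors, the one or more vectors comprising:
    (i) a first nucleic acid comprising a nucleotide sequence encoding the protein of claim 1 or 2 or the fusion protein of claim 4; optionally, the first nucleic acid is operably linked to a first regulatory element; and
    (ii) a second nucleic acid comprising a nucleotide sequence encoding a guide RNA; optionally, the second nucleic acid is operably linked to a second regulatory element;
    wherein:
    the first nucleic acid and the second nucleic acid are present on the same vector or on different vectors;
    the guide RNA comprises, in the 5' to 3' direction, a direct repeat sequence and a guide sequence, the guide sequence being capable of hybridizing to a target sequence;
    the guide RNA is capable of forming a complex with the effector protein or fusion protein of (i);
    for example, the direct repeat sequence is the isolated nucleic acid molecule as defined in claim 5;
    for example, the guide sequence is linked to the 3' end of the direct repeat sequence;
    for example, the guide sequence comprises the complement of the target sequence;
    for example, the composition does not comprise a trans-acting crRNA (tracrRNA);
    for example, the composition is non-naturally occurring or modified;
    for example, at least one component of the composition is non-naturally occurring or modified;
    for example, the first regulatory element is a promoter, such as an inducible promoter;
    for example, the second regulatory element is a promoter, such as an inducible promoter.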
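  13. The composition of claim 11 or 12, wherein, when the target sequence is DNA, the target sequence is located 3' of a protospacer adjacent motif (PAM), and the PAM has the sequence 5'-NTN, wherein each N is independently selected from A, G, T or C;
    for example, the sequence of the PAM is ATG, ATG, GTG, ATA, ATA, GTA, GTA and/or GTG.
  14. The composition of any one of claims 11-13, wherein the target sequence is a DNA or RNA sequence from a prokaryotic cell or a eukaryotic cell; or the target sequence is a non-naturally occurring DNA or RNA sequence.
  15. The composition of any one of claims 11-14, wherein the target sequence is present in a cell; or the target sequence is present in a nucleic acid molecule in vitro (e.g., a plasmid);
    for example, the target sequence is present in the nucleus or in the cytoplasm (e.g., in an organelle); for example, the cell is a eukaryotic cell; for example, the cell is a prokaryotic cell.
  16. The composition of any one of claims 11-15, wherein the protein is linked to one or more NLS sequences, or the conjugate or fusion protein comprises one or more NLS sequences;
    for example, the NLS sequence is linked to the N-terminus or C-terminus of the protein;
    for example, the NLS sequence is fused to the N-terminus or C-terminus of the protein.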
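  17. A kit comprising one or more components selected from: the protein of claim 1 or 2, the conjugate of claim 3, the fusion protein of claim 4, the isolated nucleic acid molecule of claim 5, the complex of claim 6 or 7, the isolated nucleic acid molecule of claim 8, the vector of claim 9, the host cell of claim 10, and the composition of any one of claims 11-16;
    for example, the kit comprises the composition of claim 11 or 12 and instructions for using the composition.
  18. A delivery composition comprising a delivery vehicle and one or more selected from: the protein of claim 1 or 2, the conjugate of claim 3, the fusion protein of claim 4, the isolated nucleic acid molecule of claim 5, the complex of claim 6 or 7, the isolated nucleic acid molecule of claim 8, the vector of claim 9, the host cell of claim 10, and the composition of any one of claims 11-16;
    for example, the delivery vehicle is a particle; for example, the delivery vehicle is selected from a lipid particle, a sugar particle, a metal particle, a protein particle, a liposome, an exosome, a microvesicle, a gene gun or a viral vector (e.g., a replication-defective retrovirus, a lentivirus, an adenovirus or an adeno-associated virus).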
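  19. A method of modifying a target gene, comprising: contacting the complex of claim 6 or 7 or the composition of any one of claims 11-16 with the target gene, or delivering it into a cell containing the target gene; wherein the target sequence is present in the target gene;
    for example, the target gene is present in a cell, or the target gene is present in a nucleic acid molecule in vitro (e.g., a plasmid);
    for example, the cell is a prokaryotic cell; for example, the cell is a eukaryotic cell; for example, the cell is selected from an animal cell (e.g., a mammalian cell, such as a human cell) and a plant cell;
    for example, the modification refers to a break in the target sequence, such as a double-strand break in DNA or a single-strand break in RNA;
    for example, the modification further comprises inserting an exogenous nucleic acid into the break.
  20. A method of altering the expression of a gene product, comprising: contacting the complex of claim 6 or 7 or the composition of any one of claims 11-16 with a nucleic acid molecule encoding the gene product, or delivering it into a cell containing the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule;
    for example, the nucleic acid molecule is present in a cell, or the nucleic acid molecule is present in a nucleic acid molecule in vitro (e.g., a plasmid);
    for example, the cell is a prokaryotic cell; for example, the cell is a eukaryotic cell; for example, the cell is selected from an animal cell (e.g., a mammalian cell, such as a human cell) and a plant cell;
    for example, the expression of the gene product is altered (e.g., increased or decreased); for example, the gene product is a protein.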
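  21. The method of any one of claims 18-20, wherein the protein, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle;
    for example, the delivery vehicle is selected from a lipid particle, a sugar particle, a metal particle, a protein particle, a liposome, an exosome and a viral vector (such as a replication-defective retrovirus, a lentivirus, an adenovirus or an adeno-associated virus).
  22. The method of any one of claims 18-21, which is used to alter one or more target sequences in a target gene or in a nucleic acid molecule encoding a target gene product so as to modify a cell, cell line or organism.
  23. A cell, or progeny thereof, obtained by the method of any one of claims 18-22, wherein the cell contains a modification that is not present in its wild type.
  24. A cell product of the cell or progeny thereof of claim 23.
  25. An in vitro, ex vivo or in vivo cell or cell line, or progeny thereof, the cell or cell line or progeny thereof comprising: the protein of claim 1 or 2, the conjugate of claim 3, the fusion protein of claim 4, the isolated nucleic acid molecule of claim 5, the complex of claim 6 or 7, the isolated nucleic acid molecule of claim 8, the vector of claim 9, or the composition of any one of claims 11-16;
    for example, the cell is a prokaryotic cell or a eukaryotic cell.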
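  26. Use of the protein of claim 1 or 2, the conjugate of claim 3, the fusion protein of claim 4, the isolated nucleic acid molecule of claim 5, the complex of claim 6 or 7, the isolated nucleic acid molecule of claim 8, the vector of claim 9, the composition of any one of claims 11-16 or the kit of claim 18 in the preparation of a formulation, the formulation being for nucleic acid editing (e.g., in vitro or ex vivo nucleic acid editing);
    for example, the nucleic acid editing comprises gene or genome editing;
    for example, the gene or genome editing comprises modifying a gene, knocking out a gene, altering the expression of a gene product, repairing a mutation, and/or inserting a polynucleotide.
  27. Use of the protein of claim 1 or 2, the conjugate of claim 3, the fusion protein of claim 4, the isolated nucleic acid molecule of claim 5, the complex of claim 6 or 7, the isolated nucleic acid molecule of claim 8, the vector of claim 9, the composition of any one of claims 11-16 or the kit of claim 17 in the preparation of a formulation, the formulation being for: (i) in vitro or ex vivo DNA detection; and/or (ii) editing a target sequence in a target locus to modify an organism or a non-human organism.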
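  28. A method of detecting whether a target nucleic acid is present in a sample, comprising the following steps:
    (1) contacting the sample with a labeled DNA probe and any one of the following components: the complex of claim 6 or 7, the composition of any one of claims 11-16 or the kit of claim 17;
    wherein the guide sequence contained in the complex, composition or kit is capable of hybridizing to the target nucleic acid, and the DNA probe does not hybridize to the guide sequence; preferably, the DNA probe emits a detectable signal after being cleaved;
    (2) detecting the detectable signal generated by cleavage of the DNA probe by the protein or truncated protein contained in the complex, composition or kit, thereby determining whether the target nucleic acid is present in the sample;
    preferably, one end of the DNA probe (e.g., the 5' end) is labeled with a fluorophore and the other end (e.g., the 3' end) is labeled with a quencher.
  29. The method of claim 28, wherein the sequence of the target nucleic acid is a sequence obtained from a pathogen; preferably, the pathogen is selected from a virus, a bacterium, a fungus, a protozoan, a parasite or any combination thereof;
    preferably, the sequence of the target nucleic acid is obtained from the genome of a tumor cell;
    optionally, the method further comprises the step of contacting the sample with reagents for reverse transcription; preferably, the reagents for reverse transcription are selected from a reverse transcriptase, an oligonucleotide primer, dNTPs or any combination thereof;
    preferably, the target nucleic acid is single-stranded or double-stranded; preferably, the sequence of the target nucleic acid is a DNA or RNA sequence from a prokaryotic cell or a eukaryotic cell; or the sequence of the target nucleic acid is a non-naturally occurring DNA or RNA sequence;
    preferably, the detectable signal is measured by one or more methods selected from: imaging-based detection, sensor-based detection, color detection, gold nanoparticle-based detection, fluorescence polarization, colloidal phase change/dispersion, electrochemical detection and semiconductor-based sensing;
    preferably, the method further comprises the step of amplifying the target nucleic acid in the sample.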
PCT/CN2024/116773 2023-09-04 2024-09-04 Novel CRISPR-Casσ enzyme and system Pending WO2025051140A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP24861983.5A EP4631979A1 (en) 2023-09-04 2024-09-04 New crispr-cas sigma enzyme and system
US19/011,407 US20250179534A1 (en) 2023-09-04 2025-01-06 Novel CRISPR-Cas sigma enzyme and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202311132967.0 2023-09-04
CN202311132967 2023-09-04
