IE85503B1

IE85503B1 - Hepatitis C virus protease

Info

Publication number: IE85503B1
Application number: IE2006/0594A
Authority: IE
Inventors: Houghton Michael; Choo Qui-Lim; Kuo George
Original assignee: Novartis Vaccines & Diagnostics Inc
Filing date: 1991-04-04
Publication date: 2010-05-12

Abstract

This invention relates to the molecular biology and virology of the hepatitis C virus (HCV). More specifically, this invention relates to a novel composition comprising a purified hepatitis C virus (HCV) protease, encoded in the NS3 domain of the HCV genome, or truncations thereof having protease activity. The invention further relates to a fusion protein comprising a suitable fusion partner fused to an HCV protease encoded in the NS3 domain of the HCV genome. Finally, the invention relates to an expression vector for producing an HCV protease in a host cell.

Description

PATENTS ACT, 1992

HEPATITIS C VIRUS PROTEASE

CHIRON CORPORATION

Technical Field

This invention relates to the molecular biology and virology of the hepatitis C virus (HCV). More specifically, this invention relates to a novel composition comprising a purified hepatitis C virus (HCV) protease, encoded in the NS3 domain of the HCV genome, or truncations thereof having protease activity. The invention further relates to a fusion protein comprising a suitable fusion partner fused to an HCV protease encoded in the NS3 domain of the HCV genome. Finally, the invention relates to an expression vector for producing an HCV protease in a host cell.

Background of the Invention

Non-A, Non-B hepatitis (NANBH) is a transmissible disease (or family of diseases) that is believed to be virally induced, and is distinguishable from other forms of virus-associated liver disease, such as those caused by hepatitis A virus (HAV), hepatitis B virus (HBV), delta hepatitis virus (HDV), cytomegalovirus (CMV) or Epstein-Barr virus (EBV). Epidemiologic evidence suggests that there may be three types of NANBH: the water-borne epidemic type; the blood or needle associated type; and the sporadically occurring (community acquired) type. However, the number of causative agents is unknown. Recently, however, a new viral species, hepatitis C virus (HCV), has been identified as the primary (if not only) cause of blood-associated NANBH (BB-NANBH). See for example, PCT WO89/046699; U.S. Patent Application Serial No. 7/456,637, filed 21 December 1989; and U.S. Patent Application Serial No. /456,637, filed 21 December 1989, incorporated herein by reference.

Hepatitis C appears to be the major form of transfusion-associated hepatitis in a number of countries, including the United States and Japan. There is also evidence implicating HCV in induction of hepatocellular carcinoma. Thus, a need exists for an effective method for treating HCV infection: currently, there is none.

Many viruses, including adenoviruses, baculoviruses, comoviruses, picornaviruses, retroviruses, and togaviruses, rely on specific, virally-encoded proteases for processing polypeptides from their initial translated form into mature, active proteins. In the case of picornaviruses, all of the viral proteins are believed to arise from cleavage of a single polyprotein (B.D. Korant, CRC Crit Rev Biotech (1988) 8:149-57).

S. Pichuantes et al, in "Viral Proteinases As Targets For Chemotherapy" (Cold Spring Harbor Laboratory Press, 1989) pp. 215-22, disclosed expression of a viral protease found in HIV-1. The HIV protease was obtained in the form of a fusion protein, by fusing DNA encoding an HIV protease precursor to DNA encoding human superoxide dismutase (hSOD), and expressing the product in E. coli. Transformed cells expressed products of 36 and 10 kDa (corresponding to the hSOD-protease fusion protein and the protease alone), suggesting that the protease was expressed in a form capable of autocatalytic proteolysis.

T.J. McQuade et al, Science (1990) 247:454-56 disclosed preparation of a peptide mimic capable of specifically inhibiting the HIV-1 protease. In HIV, the protease is believed responsible for cleavage of the initial p55 gag precursor transcript into the core structural proteins (p17, p24, p8, and p7). Adding 1 µM inhibitor to HIV-infected peripheral blood lymphocytes in culture reduced the concentration of processed HIV p24 by about 70%. Viral maturation and levels of infectious virus were reduced by the protease inhibitor.

Miller et al. (1990), Proc. Natl. Acad. Sci. USA, vol. 87, no. 6, pages 2057-2061 disclose amino acid similarities between several groups of viruses, inter alia pestiviruses and flaviviruses.

EP 0318216 discloses part of the HCV genomic sequence. EP 0318216 does not mention a protease, a NS3 region of the genome or the possible location of a protease-encoding sequence in the NS3 region of the genome.

EP 0419182 discloses the sequence of the NS3 region of HCV. However, there is no disclosure in EP 0419182 indicating that the NS3 region of HCV has protease activity.

Gorbalenya et al. (1989), Nucleic Acids Research, 17 (10), pages 3889-3897 disclose proteases from different pestiviruses and flaviviruses. It is reported that these proteases contain at least an N-proximal protease domain and a C-proximal RNA helicase domain.

EP 0 388 232 discloses the identification of different polypeptides derived from the genome of HCV. A protease activity is not described for any of the polypeptides disclosed in EP 0 388 232.

Disclosure of the Invention

We have now invented a composition comprising a purified HCV protease, HCV protease fusion proteins, truncated and altered HCV proteases, and cloning and expression vectors therefor.

Brief Description of the Drawings

Figure 1 shows the sequence of HCV protease.

Figure 2 shows the polynucleotide sequence and deduced amino acid sequence of the clone C20c.

Figure 3 shows the polynucleotide sequence and deduced amino acid sequence of the clone C26d.

Figure 4 shows the polynucleotide sequence and deduced amino acid sequence of the clone C3h.

Figure 5 shows the polynucleotide sequence and deduced amino acid sequence of the clone C7f.

Figure 6 shows the polynucleotide sequence and deduced amino acid sequence of the clone C31.

Figure 7 shows the polynucleotide sequence and deduced amino acid sequence of the clone C35.

Figure 8 shows the polynucleotide sequence and deduced amino acid sequence of the clone C33c.

Figure 9 schematically illustrates assembly of the vector C7fC20cC300C200.

Figure 10 shows the sequence of vector cf1SODp600.

Modes of Carrying Out the Invention

A. Definitions

The terms "Hepatitis C Virus" and "HCV" refer to the viral species that is the major etiological agent of BB-NANBH, the prototype isolate of which is identified in PCT WO89/046699; EPO publication 318,216; USSN 7/355,008, filed 18 May 1989; and USSN 7/456,637, the disclosures of which are incorporated herein by reference. "HCV" as used herein includes the pathogenic strains capable of causing hepatitis C, and attenuated strains or defective interfering particles derived therefrom. The HCV genome is comprised of RNA. It is known that RNA-containing viruses have relatively high rates of spontaneous mutation, reportedly on the order of 10^-3 to 10^-4 per incorporated nucleotide (Fields & Knipe, "Fundamental Virology" (1986, Raven Press, N.Y.)). As heterogeneity and fluidity of genotype are inherent characteristics of RNA viruses, there will be multiple strains/isolates, which may be virulent or avirulent, within the HCV species.

Information on several different strains/isolates of HCV is disclosed herein, particularly strain or isolate CDC/HCV1 (also called HCV1).

Information from one strain or isolate, such as a partial genomic sequence, is sufficient to allow those skilled in the art using standard techniques to isolate new strains/isolates and to identify whether such new strains/isolates are HCV. For example, several different strains/isolates are described below. These strains, which were obtained from a number of human sera (and from different geographical areas), were isolated utilizing the information from the genomic sequence of HCV1.

The information provided herein suggests that HCV may be distantly related to the Flaviviridae. The Flavivirus family contains a large number of viruses which are small, enveloped pathogens of man. The morphology and composition of Flavivirus particles are known, and are discussed in M.A. Brinton, in "The Viruses: The Togaviridae And Flaviviridae" (series eds. Fraenkel-Conrat and Wagner, vol. eds. Schlesinger and Schlesinger, Plenum Press, 1986), pp. 327-374. Generally, with respect to morphology, Flaviviruses contain a central nucleocapsid surrounded by a lipid bilayer. Virions are spherical and have a diameter of about 40-50 nm. Their cores are about 25-30 nm in diameter. Along the outer surface of the virion envelope are projections measuring about 5-10 nm in length with terminal knobs about 2 nm in diameter. Typical examples of the family include Yellow Fever virus, West Nile virus, and Dengue Fever virus. They possess positive-stranded RNA genomes (about 11,000 nucleotides) that are slightly larger than that of HCV and encode a polyprotein precursor of about [...] amino acids. Individual viral proteins are cleaved from this precursor polypeptide.

The genome of HCV appears to be single-stranded RNA containing about 10,000 nucleotides. The genome is positive-stranded, and possesses a continuous translational open reading frame (ORF) that encodes a polyprotein of about 3,000 amino acids. In the ORF, the structural proteins appear to be encoded in approximately the first quarter of the N-terminal region, with the majority of the polyprotein attributed to non-structural proteins. When compared with all known viral sequences, small but significant co-linear homologies are observed with the non-structural proteins of the Flavivirus family, and with the pestiviruses (which are now also considered to be part of the Flavivirus family).

A schematic alignment of possible regions of a flaviviral polyprotein (using Yellow Fever Virus as an example), and of a putative polyprotein encoded in the major ORF of the HCV genome, is shown in Figure 1.

Possible domains of the HCV polyprotein are indicated in the figure. The Yellow Fever Virus polyprotein contains, from the amino terminus to the carboxy terminus, the nucleocapsid protein (C), the matrix protein (M), the envelope protein (E), and the non-structural proteins 1, 2 (a+b), 3, 4 (a+b), and 5 (NS1, NS2, NS3, NS4, and NS5).

Based upon the putative amino acids encoded in the nucleotide sequence of HCV1, a small domain at the extreme N-terminus of the HCV polyprotein appears similar both in size and high content of basic residues to the nucleocapsid protein (C) found at the N-terminus of flaviviral polyproteins. The non-structural proteins 2, 3, 4, and 5 (NS2-5) of HCV and of yellow fever virus (YFV) appear to have counterparts of similar size and hydropathicity, although the amino acid sequences diverge. However, the region of HCV which would correspond to the regions of YFV polyprotein which contain the M, E, and NS1 proteins not only differs in sequence, but also appears to be quite different in size and hydropathicity. Thus, while certain domains of the HCV genome may be referred to herein as, for example, NS1, or NS2, it should be understood that these designations are for convenience of reference only; there may be considerable differences between the HCV family and flaviviruses that have yet to be appreciated.

Due to the evolutionary relationship of the strains or isolates of HCV, putative HCV strains and isolates are identifiable by their homology at the polypeptide level. With respect to the isolates disclosed herein, new HCV strains or isolates are expected to be at least about 40% homologous, some more than about 70% homologous, and some even more than about 80% homologous; some may be more than about 90% homologous at the polypeptide level. The techniques for determining amino acid sequence homology are known in the art. For example, the amino acid sequence may be determined directly and compared to the sequences provided herein.

Alternatively the nucleotide sequence of the genomic material of the putative HCV may be determined (usually via a cDNA intermediate), the amino acid sequence encoded therein can be determined, and the corresponding regions compared.
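The comparison described above can be sketched in a few lines. This is only an illustration: it assumes the two polypeptide sequences have already been aligned to equal length (with "-" marking gaps), it reports simple percent identity as one possible reading of "homologous", and the function name `percent_identity` and both example fragments are hypothetical rather than taken from the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap positions at which two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical one-letter fragments, used only to exercise the function.
reference = "GAGTRTIASPKG"
candidate = "GAGSKTLAGP-G"
identity = percent_identity(reference, candidate)
print(f"{identity:.1f}% identical over aligned positions")
```

A candidate isolate would then be compared against thresholds such as the 40%, 70%, 80% and 90% levels mentioned above.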

The term "HCV protease" refers to an enzyme derived from HCV which exhibits proteolytic activity, specifically the polypeptide encoded in the NS3 domain of the HCV genome. At least one strain of HCV contains a protease believed to be substantially encoded by or within the following sequence:

Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro 10
Ala Asp Gly Met Val Ser Lys Gly Trp Arg 20
Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln 30
Gln Thr Arg Gly Leu Leu Gly Cys Ile Ile 40
Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln 50
Val Glu Gly Glu Val Gln Ile Val Ser Thr 60
Ala Ala Gln Thr Phe Leu Ala Thr Cys Ile 70
Asn Gly Val Cys Trp Thr Val Tyr His Gly 80
Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys 90
Gly Pro Val Ile Gln Met Tyr Thr Asn Val 100
Asp Gln Asp Leu Val Gly Trp Pro Ala Ser 110
Gln Gly Thr Arg Ser Leu Thr Pro Cys Thr 120
Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr 130
Arg His Ala Asp Val Ile Pro Val Arg Arg 140
Arg Gly Asp Ser Arg Gly Ser Leu Leu Ser 150
Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser 160
Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly 170
His Ala Val Gly Ile Phe Arg Ala Ala Val 180
Cys Thr Arg Gly Val Ala Lys Ala Val Asp 190
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr 200
Met Arg 202

The above N and C termini are putative, the actual termini being defined by expression and processing in an appropriate host of a DNA construct encoding the entire NS3 domain. It is understood that this sequence may vary from strain to strain, as RNA viruses like HCV are known to exhibit a great deal of variation. Further, the actual N and C termini may vary, as the protease is cleaved from a precursor polyprotein: variations in the protease amino acid sequence can result in cleavage from the polyprotein at different points. Thus, the amino- and carboxy-termini may differ from strain to strain of HCV. The first amino acid shown above corresponds to residue 60 in Figure 1.

However, the minimum sequence necessary for activity can be determined by routine methods. The sequence may be truncated at either end by treating an appropriate expression vector with an exonuclease (after cleavage at the 5' or 3' end of the coding sequence) to remove any desired number of base pairs.

The resulting coding polynucleotide is then expressed and the sequence determined. In this manner the activity of the resulting product may be correlated with the amino acid sequence: a limited series of such experiments (removing progressively greater numbers of base pairs) determines the minimum internal sequence necessary for protease activity. We have found that the sequence may be substantially truncated, particularly at the carboxy terminus, apparently with full retention of protease activity. It is presently believed that a portion of the protein at the carboxy terminus may exhibit helicase activity. However, helicase activity is not required of the HCV proteases of the invention. The amino terminus may also be truncated to a degree without loss of protease activity.
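The logic of such a truncation series can be sketched as follows. This is a toy illustration under stated assumptions: the function names are hypothetical, the truncation is modelled at the residue level rather than as an exonuclease digest, and `toy_assay` merely stands in for an experimental activity measurement (here it requires His, Asp and Ser to be present).

```python
from typing import Callable, List, Optional, Sequence


def c_terminal_truncations(residues: Sequence[str], step: int) -> List[List[str]]:
    """Return progressively shorter constructs, each `step` residues shorter."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [list(residues[:n]) for n in range(len(residues), 0, -step)]


def shortest_active(constructs: List[List[str]],
                    is_active: Callable[[List[str]], bool]) -> Optional[List[str]]:
    """Return the shortest construct that still scores as active, if any."""
    active = [c for c in constructs if is_active(c)]
    return min(active, key=len) if active else None


def toy_assay(construct: List[str]) -> bool:
    """Stand-in for a laboratory assay: active only if all three residues remain."""
    return {"His", "Asp", "Ser"} <= set(construct)


# Hypothetical 12-residue polypeptide, not taken from the specification.
toy_protein = ["Arg", "Arg", "Gly", "His", "Ala", "Asp",
               "Gly", "Ser", "Leu", "Val", "Thr", "Met"]
constructs = c_terminal_truncations(toy_protein, step=2)
minimal = shortest_active(constructs, toy_assay)
print(len(minimal), "residues retained in the shortest active construct")
```

In practice each construct would be expressed and assayed; the sketch only shows how activity results map back onto the retained sequence.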

The amino acids underlined above are believed to be the residues necessary for catalytic activity, based on sequence homology to putative flavivirus serine proteases. Table 1 shows the alignment of the three serine protease catalytic residues for HCV protease and the protease obtained from Yellow Fever Virus, West Nile Fever virus, Murray Valley Fever virus, and Kunjin virus.

Although the other four flavivirus protease sequences exhibit higher homology with each other than with HCV, a degree of homology is still observed with HCV. This homology, however, was not sufficient for indication by currently available alignment software. The indicated amino acids are numbered His79, Asp103, and Ser161 in the sequence listed above (His139, Asp163, and Ser221 in Figure 1).
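The residue numbering can be checked mechanically against the listed sequence. The sketch below is illustrative only: it reads positions 79 and 161 as His and Ser, as the text names them, and the offset of 60 between the two numbering schemes is inferred from the paired numbers quoted above (for example His79 and His139) rather than stated as such in the specification.

```python
# Three-letter sequence as listed in the definition of "HCV protease",
# with positions 79 and 161 read as His and Ser (as named in the text).
SEQUENCE = """
Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro
Ala Asp Gly Met Val Ser Lys Gly Trp Arg
Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln
Gln Thr Arg Gly Leu Leu Gly Cys Ile Ile
Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln
Val Glu Gly Glu Val Gln Ile Val Ser Thr
Ala Ala Gln Thr Phe Leu Ala Thr Cys Ile
Asn Gly Val Cys Trp Thr Val Tyr His Gly
Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys
Gly Pro Val Ile Gln Met Tyr Thr Asn Val
Asp Gln Asp Leu Val Gly Trp Pro Ala Ser
Gln Gly Thr Arg Ser Leu Thr Pro Cys Thr
Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr
Arg His Ala Asp Val Ile Pro Val Arg Arg
Arg Gly Asp Ser Arg Gly Ser Leu Leu Ser
Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser
Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly
His Ala Val Gly Ile Phe Arg Ala Ala Val
Cys Thr Arg Gly Val Ala Lys Ala Val Asp
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr
Met Arg
"""
residues = SEQUENCE.split()


def residue_at(position: int) -> str:
    """Return the residue at a 1-based position in the listed sequence."""
    return residues[position - 1]


# Catalytic residues proposed from sequence homology.
catalytic = {79: "His", 103: "Asp", 161: "Ser"}
checks = {pos: residue_at(pos) == name for pos, name in catalytic.items()}

# Assumed offset between the listing and the Figure 1 numbering.
FIGURE_1_OFFSET = 60
figure_1_numbering = {pos: pos + FIGURE_1_OFFSET for pos in catalytic}

for pos, name in catalytic.items():
    print(f"{name}{pos} -> {name}{figure_1_numbering[pos]} in Figure 1: match={checks[pos]}")
```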

TABLE 1: Alignment of Active Residues by Sequence

Protease His Asp Ser

Alternatively, one can make catalytic residue assignments based on structural homology. Table 2 shows alignment of HCV against the catalytic sites of several well-characterized serine proteases based on structural considerations: protease A from Streptomyces griseus, α-lytic protease, bovine trypsin, chymotrypsin, and elastase (M. James et al, Can J Biochem (1978) [...]). The HCV residues identified are numbered His79, Asp125, and Ser161 in the sequence listed above.

TABLE 2: Alignment of Active Residues by Structure

The most direct manner to verify the residues essential to the active site is to replace each residue individually with a residue of equivalent steric size. This is easily accomplished by site-specific mutagenesis and similar methods known in the art. If replacement of a particular residue with a residue of equivalent size results in loss of activity, the essential nature of the replaced residue is confirmed.

"HCV protease analogs" refer to polypeptides which vary from the full length protease sequence by deletion, alteration and/or addition to the amino acid sequence of the native protease. HCV protease analogs include the truncated proteases described above, as well as HCV protease muteins and fusion proteins comprising HCV protease, truncated protease, or protease muteins.

Alterations to form HCV protease muteins are preferably conservative amino acid substitutions, in which an amino acid is replaced with another naturally-occurring amino acid of similar character. For example, the following substitutions are considered "conservative": Gly ↔ Ala; Lys ↔ Arg; Val ↔ Ile ↔ Leu; Asn ↔ Gln; Asp ↔ Glu; and Phe ↔ Trp ↔ Tyr.

Nonconservative changes are generally substitutions of one of the above amino acids with an amino acid from a different group (e.g., substituting Asn for Glu), or substituting Cys, Met, His, or Pro for any of the above amino acids. Substitutions involving common amino acids are conveniently performed by site specific mutagenesis of an expression vector encoding the desired protein, and subsequent expression of the altered form. One may also alter amino acids by synthetic or semi-synthetic methods. For example, one may convert cysteine or serine residues to selenocysteine by appropriate chemical treatment of the isolated protein. Alternatively, one may incorporate uncommon amino acids in standard in vitro protein synthetic methods. Typically, the total number of residues changed, deleted or added to the native sequence in the muteins will be no more than about 20, preferably no more than about 10, and most preferably no more than about 5.
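The substitution rules above lend themselves to a small classifier. The sketch below is an illustration only: the groups are those read from the list of conservative substitutions in the text, any change outside a shared group is treated as nonconservative, only substitutions (not deletions or additions) are counted, and the function names and example sequences are hypothetical.

```python
from typing import List, Sequence, Tuple

# Groups of mutually "conservative" residues, as read from the text.
CONSERVATIVE_GROUPS = [
    {"Gly", "Ala"},
    {"Lys", "Arg"},
    {"Val", "Ile", "Leu"},
    {"Asn", "Gln"},
    {"Asp", "Glu"},
    {"Phe", "Trp", "Tyr"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and belong to the same group."""
    return original != replacement and any(
        original in group and replacement in group for group in CONSERVATIVE_GROUPS
    )


def classify_substitutions(native: Sequence[str],
                           mutein: Sequence[str]) -> List[Tuple[int, str, str, str]]:
    """List (position, old, new, kind) for every substituted position."""
    if len(native) != len(mutein):
        raise ValueError("this sketch handles substitutions only")
    changes = []
    for position, (old, new) in enumerate(zip(native, mutein), start=1):
        if old != new:
            kind = "conservative" if is_conservative(old, new) else "nonconservative"
            changes.append((position, old, new, kind))
    return changes


# Hypothetical five-residue example.
native = ["Gly", "Lys", "Val", "Glu", "Cys"]
mutein = ["Ala", "Arg", "Val", "Asn", "Ser"]
changes = classify_substitutions(native, mutein)
within_typical_limit = len(changes) <= 20  # "no more than about 20" in the text
for change in changes:
    print(change)
```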

The term fusion protein generally refers to a polypeptide comprising an amino acid sequence drawn from two or more individual proteins. In the present invention, "fusion protein" is used to denote a polypeptide comprising the HCV protease, truncate, mutein or a functional portion thereof, fused to a non-HCV protein or polypeptide ("fusion partner"). Fusion proteins are most conveniently produced by expression of a fused gene, which encodes a portion of one polypeptide at the 5' end and a portion of a different polypeptide at the 3' end, where the different portions are joined in one reading frame which may be expressed in a suitable host. It is presently preferred (although not required) to position the HCV protease or analog at the carboxy terminus of the fusion protein, and to employ a functional enzyme fragment at the amino terminus. As the HCV protease is normally expressed within a large polyprotein, it is not expected to include cell transport signals (e.g., export or secretion signals). Suitable functional enzyme fragments are those polypeptides which exhibit a quantifiable activity when expressed fused to the HCV protease. Exemplary enzymes include, without limitation, β-galactosidase (β-gal), β-lactamase, horseradish peroxidase (HRP), glucose oxidase (GO), human superoxide dismutase (hSOD), urease, and the like. These enzymes are convenient because the amount of fusion protein produced can be quantified by means of simple colorimetric assays. Alternatively, one may employ antigenic proteins or fragments, to permit simple detection and quantification of fusion proteins using antibodies specific for the fusion partner. The presently preferred fusion partner is hSOD.

B. General Methods

The practice of the present invention generally employs conventional techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See for example J. Sambrook et al, "Molecular Cloning: A Laboratory Manual" (1989); "DNA Cloning", Vol. I and II (D.N. Glover ed. 1985); "Oligonucleotide Synthesis" (M.J. Gait ed, 1984); "Nucleic Acid Hybridization" (B.D. Hames & S.J. Higgins eds. 1984); "Transcription And Translation" (B.D. Hames & S.J. Higgins eds. 1984); "Animal Cell Culture" (R.I. Freshney ed. 1986); "Immobilized Cells And Enzymes" (IRL Press, 1986); B. Perbal, "A Practical Guide To Molecular Cloning" (1984); the series, "Methods In [...]

Both prokaryotic and eukaryotic host cells are useful for expressing desired coding sequences when appropriate control sequences compatible with the designated host are used. Among prokaryotic hosts, E. coli is most frequently used. Expression control sequences for prokaryotes include promoters, optionally containing [...] coli; if desired, other prokaryotic hosts such as strains of Bacillus or Pseudomonas may be used, with corresponding control sequences.

Eukaryotic hosts include without limitation yeast and mammalian cells in culture systems. Yeast expression hosts include Saccharomyces, Kluyveromyces, Pichia, and the like. Saccharomyces cerevisiae and Saccharomyces carlsbergensis and K. lactis are the most commonly used yeast hosts, and are convenient fungal hosts. Yeast-compatible vectors carry markers which permit selection of successful transformants by [...]

Particularly useful control systems are those which comprise the glyceraldehyde-3 phosphate dehydrogenase (GAPDH) promoter or alcohol dehydrogenase (ADH) regulatable promoter, terminators also derived from GAPDH, and if secretion is desired, a leader sequence derived from yeast α-factor (see U.S. Pat. No. 4,870,008, incorporated herein by reference).

A presently preferred expression system employs the ubiquitin leader as the fusion partner. Copending application USSN 7/390,599 filed 7 August 1989 disclosed vectors for high expression of yeast ubiquitin fusion proteins. Yeast ubiquitin provides a 76 amino acid polypeptide which is automatically cleaved from the fused protein upon expression. The ubiquitin amino acid sequence is as follows:

Gln Ile Phe Val Lys Thr Leu Thr Gly Lys
Thr Ile Thr Leu Glu Val Glu Ser Ser Asp
Thr Ile Asp Asn Val Lys Ser Lys Ile Gln
Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln
Arg Leu Ile Phe Ala Gly Lys Gln Leu Glu
Asp Gly Arg Thr Leu Ser Asp Tyr Asn Ile
Gln Lys Glu Ser Thr Leu His Leu Val Leu
Arg Leu Arg Gly Gly

[...] synthesizer. [...] to a sequence encoding the HCV protease or a fragment thereof.

In addition, the transcriptional regulatory region and the transcriptional initiation region which are operably linked may be such that they are not naturally associated in the wild-type organism. These systems are described in detail in EPO 120,551, published October 3, 1984; EPO 116,201, published August 22, 1984; and EPO 164,556, published December 18, 1985, all of which are commonly owned with the present invention, and are hereby incorporated herein by reference in full.

Mammalian cell lines available as hosts for expression are known in the art and include many immortalized cell lines available from the American Type Culture Collection (ATCC), including HeLa cells, Chinese hamster ovary (CHO) cells, baby hamster kidney (BHK) cells, and a number of other cell lines. Suitable promoters for mammalian cells are also known in the art and include viral promoters such as that from Simian Virus 40 (SV40) (Fiers et al, Nature (1978) 273:113), Rous sarcoma virus (RSV), adenovirus (ADV), and bovine papilloma virus (BPV). Mammalian cells may also require terminator sequences and poly-A addition sequences.

Enhancer sequences which increase expression may also be included, and sequences which promote amplification of the gene may also be desirable (for example methotrexate resistance genes). These sequences are known in the art.

Vectors suitable for replication in mammalian [...] Expression of the HCV polypeptide then occurs in recombinant vaccinia virus.

In order to detect whether or not the HCV polypeptide is expressed from the vaccinia vector, BSC 1 cells may be infected with the recombinant vector and grown on microscope slides under conditions which allow expression. The cells may then be acetone-fixed, and immunofluorescence assays performed using serum which is known to contain anti-HCV antibodies to a polypeptide(s) encoded in the region of the HCV genome from which the HCV segment in the recombinant expression vector was derived.

Other systems for expression of eukaryotic or viral genomes include insect cells and vectors suitable for use in these cells. These systems are known in the art, and include, for example, insect expression transfer vectors derived from the baculovirus Autographa californica nuclear polyhedrosis virus (AcNPV), which is a helper-independent, viral expression vector.

Expression vectors derived from this system usually use the strong viral polyhedrin gene promoter to drive expression of heterologous genes. Currently the most commonly used transfer vector for introducing foreign genes into AcNPV is pAc373 (see PCT WO89/046699 and USSN 7/456,637). Many other vectors known to those of skill in the art have also been designed for improved expression. These include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and introduces a BamHI cloning site 32 bp downstream from the ATT; see Luckow and Summers, Virol (1989) 17:31). AcNPV transfer vectors for high level expression of nonfused foreign proteins are described in copending applications PCT WO89/046699 and USSN 7/456,637. A unique BamHI site is located following position -8 with respect to the translation initiation codon ATG of the polyhedrin gene. There are no cleavage sites for SmaI, PstI, BglII, XbaI or SstI. Good expression of nonfused foreign proteins usually requires foreign genes that ideally have a short leader sequence containing suitable translation initiation signals preceding an ATG start signal. The plasmid also contains the polyhedrin polyadenylation signal and the ampicillin-resistance (amp) gene and origin of replication for selection and propagation in E. coli.
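A statement such as "a unique BamHI site and no sites for the other enzymes" can be checked by scanning a sequence for recognition sequences. The sketch below is illustrative only: the recognition sequences are the standard ones for these enzymes (SstI recognizes the same sequence as SacI), all are palindromic so scanning one strand suffices, and the example DNA string is made up and is not the sequence of pAc373.

```python
from typing import Dict, List

# Standard recognition sequences (5' to 3') for the enzymes named in the text.
RECOGNITION_SITES: Dict[str, str] = {
    "BamHI": "GGATCC",
    "SmaI": "CCCGGG",
    "PstI": "CTGCAG",
    "BglII": "AGATCT",
    "XbaI": "TCTAGA",
    "SstI": "GAGCTC",
}


def find_sites(dna: str, site: str) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    dna = dna.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


# Hypothetical fragment used only to exercise the scan.
example_dna = "ATTAAAGGATCCGCTAGCATGAAA"
site_map = {enzyme: find_sites(example_dna, seq)
            for enzyme, seq in RECOGNITION_SITES.items()}
unique_cutters = [enzyme for enzyme, hits in site_map.items() if len(hits) == 1]
non_cutters = [enzyme for enzyme, hits in site_map.items() if not hits]
print("unique:", unique_cutters, "absent:", non_cutters)
```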

Methods for the introduction of heterologous DNA into the desired site in the baculovirus are known in the art. (See Summers and Smith, Texas Agricultural Experiment Station Bulletin No. 1555; Smith et al, Mol Cell Biol (1983) 3:2156-2165; and Luckow and Summers, Virol (1989) 17:31). For example, the heterologous DNA can be inserted into a gene such as the polyhedrin gene by homologous recombination, or into a restriction enzyme site engineered into the desired baculovirus gene. The inserted sequences may be those which encode all or varying segments of the polyprotein, or other ORFs which encode viral polypeptides. For example, the insert could encode the following numbers of amino acid segments from the polyprotein: amino acids 1-1078; amino acids 332-662; amino acids 406-662; amino acids 156-328, and amino acids 199-328.

The signals for post-translational modifications, such as signal peptide cleavage, proteolytic cleavage, and phosphorylation, appear to be recognized by insect cells. The signals required for secretion and nuclear accumulation also appear to be conserved between the invertebrate cells and vertebrate cells. Examples of the signal sequences from vertebrate cells which are effective in invertebrate cells are known in the art, for example, the human interleukin-2 signal (IL2s) which signals for secretion from the cell, is recognized and properly removed in insect cells.

Transformation may be by any known method for introducing polynucleotides into a host cell, including, for example, packaging the polynucleotide in a virus and [...] are known in the art. Site-specific DNA cleavage is performed by treating with suitable restriction enzymes under conditions which generally are specified by the manufacturer of these commercially available enzymes. In general, about 1 µg of plasmid or DNA sequence is cleaved by 1 unit of enzyme in about 20 µL buffer solution by incubation for 1-2 hr at 37°C. After incubation with the restriction enzyme, protein is removed by phenol/chloroform extraction and the DNA recovered by precipitation with ethanol. The cleaved fragments may be separated using polyacrylamide or agarose gel electrophoresis techniques, according to the general procedures [...] Sticky-ended cleavage fragments may be blunt ended using E. coli DNA polymerase I (Klenow fragment) with the appropriate deoxynucleotide triphosphates (dNTPs) present in the mixture. Treatment with S1 nuclease may also be used, resulting in the hydrolysis of any single stranded DNA portions.

Ligations are carried out under standard buffer and temperature conditions using T4 DNA ligase and ATP; sticky end ligations require less ATP and less ligase than blunt end ligations. When vector fragments are used as part of a ligation mixture, the vector fragment is often treated with bacterial alkaline phosphatase (BAP) or calf intestinal alkaline phosphatase to remove the 5'-phosphate, thus preventing religation of the vector.

Alternatively, restriction enzyme digestion of unwanted fragments can be used to prevent ligation.

Ligation mixtures are transformed into suitable cloning hosts, such as E. coli, and successful transformants selected using the markers incorporated (e.g., antibiotic resistance), and screened for the correct construction.

Synthetic oligonucleotides may be prepared using an automated oligonucleotide synthesizer as described by Warner, DNA (1984) 3:401. If desired, the synthetic strands may be labeled with 32P by treatment with polynucleotide kinase in the presence of 32P-ATP under standard reaction conditions.

DNA sequences, including those isolated from cDNA libraries, may be modified by known techniques, for example by site directed mutagenesis (see e.g., Zoller, Nuc Acids Res (1982) 10:6487). Briefly, the DNA to be modified is packaged into phage as a single stranded sequence, and converted to a double stranded DNA with DNA polymerase, using as a primer a synthetic oligonucleotide complementary to the portion of the DNA to be modified, where the desired modification is included in the primer sequence. The resulting double stranded DNA is transformed into a phage-supporting host bacterium.

Cultures of the transformed bacteria which contain copies of each strand of the phage are plated in agar to obtain plaques. Theoretically, 50% of the new plaques contain phage having the mutated sequence, and the remaining 50% have the original sequence. Replicates of the plaques are hybridized to labeled synthetic probe at temperatures and conditions which permit hybridization with the correct strand, but not with the unmodified sequence.

The sequences which have been identified by hybridization are recovered and cloned.

DNA libraries may be probed using the procedure of Grunstein and Hogness, Proc Nat Acad Sci USA (1975) 72:3961. Briefly, in this procedure the DNA to be probed is immobilized on nitrocellulose filters, denatured, and prehybridized with a buffer containing 0-50% formamide, 0.75 M NaCl, 75 mM Na citrate, 0.02% (wt/v) each of bovine serum albumin, polyvinylpyrrolidone, and Ficoll®, 50 mM NaH2PO4 (pH 6.5), 0.1% SDS, and 100 µg/mL carrier denatured DNA. The percentage of formamide in the buffer, as well as the time and temperature conditions of the prehybridization and subsequent hybridization steps, depend on the stringency required. Oligomeric probes which require lower stringency conditions are generally used with low percentages of formamide, lower temperatures, and longer hybridization times. Probes containing more than 30 or 40 nucleotides, such as those derived from cDNA or genomic sequences, generally employ higher temperatures, e.g., about 40-42°C, and a high percentage formamide, e.g., 50%. Following prehybridization, 5'-32P-labeled oligonucleotide probe is added to the buffer, and the filters are incubated in this mixture under hybridization conditions. After washing, the treated filters are subjected to autoradiography to show the location of the hybridized probe; DNA in corresponding locations on the original agar plates is used as the source of the desired DNA.

The enzyme-linked immunosorbent assay (ELISA) can be used to measure either antigen or antibody concentrations. This method depends upon conjugation of an enzyme to either an antigen or an antibody, and uses the bound enzyme activity as a quantitative label. To measure antibody, the known antigen is fixed to a solid phase (e.g., a microtiter dish, plastic cup, dipstick, plastic bead, or the like), incubated with test serum dilutions, washed, incubated with anti-immunoglobulin labeled with an enzyme, and washed again. Enzymes suitable for labeling are known in the art, and include, for example, horseradish peroxidase (HRP). Enzyme activity bound to the solid phase is usually measured by adding a specific substrate, and determining product formation or substrate utilization colorimetrically. The enzyme activity bound is a direct function of the amount of antibody bound.

To measure antigen, a known specific antibody is fixed to the solid phase, the test material containing antigen is added, after an incubation the solid phase is washed, and a second enzyme-labeled antibody is added.

After washing, substrate is added, and enzyme activity is measured colorimetrically, and related to antigen concentration.

Proteases of the invention may be assayed for activity by cleaving a substrate which provides detectable cleavage products. As the HCV protease is believed to cleave itself from the genomic polyprotein, one can employ this autocatalytic activity both to assay expression of the protein and determine activity. For example, if the protease is joined to its fusion partner so that the HCV protease N-terminal cleavage signal (Arg-Arg) is included, the expression product will cleave itself into fusion partner and active HCV protease. One may then assay the products, for example by western blot, to verify that the proteins produced correspond in size to the separate fusion partner and protease proteins. It is presently preferred to employ small peptide p-nitrophenyl esters or methylcoumarins, as cleavage may then be followed by spectrophotometric or fluorescent assays. Following the method described by E.D. Matayoshi et al, Science (1990) 247:231-35, one may attach a fluorescent label to one end of the substrate and a quenching molecule to the other end: cleavage is then determined by measuring the resulting increase in fluorescence. If a suitable enzyme or antigen has been employed as the fusion partner, the quantity of protein produced may easily be determined. Alternatively, one may exclude the HCV protease N-terminal cleavage signal (preventing self-cleavage) and add a separate cleavage substrate, such as a fragment of the HCV NS3 domain including the native processing signal or a synthetic analog.

In the absence of this protease activity, the HCV polyprotein should remain in its unprocessed form, and thus render the virus noninfectious. Thus, the protease is useful for assaying pharmaceutical agents for control of HCV, as compounds which inhibit the protease activity sufficiently will also inhibit viral infectivity. Such inhibitors may take the form of organic compounds, particularly compounds which mimic the cleavage site of HCV recognized by the protease. Three of the putative cleavage sites of the HCV polyprotein have the following amino acid sequences:

Val-Ser-Ala-Arg-Arg // Gly-Arg-Glu-Ile-Leu-Leu-Gly
Ala-Ile-Leu-Arg-Arg // His-Val-Gly-Pro-
Val-Ser-Cys-Gln-Arg // Gly-Tyr-

These sites are characterized by the presence of two basic amino acids immediately before the cleavage site, and are similar to the cleavage sites recognized by other flavivirus proteases. Thus, suitable protease inhibitors may be prepared which mimic the basic/basic/small neutral motif of the HCV cleavage sites, but substituting a nonlabile linkage for the peptide bond cleaved in the natural substrate. Suitable inhibitors include peptide trifluoromethyl ketones, peptide boronic acids, peptide α-ketoesters, peptide difluoroketo compounds, peptide aldehydes, peptide diketones, and the like. For example, the peptide aldehyde N-acetyl-phenylalanyl-glycinaldehyde is a potent inhibitor of the protease papain. One may conveniently prepare and assay large mixtures of peptides using the methods disclosed in U.S. Patent application Serial No. 7/189,318, filed 2 May 1983 (published as PCT WO89/10931), incorporated herein by reference. This application teaches methods for generating mixtures of peptides up to hexapeptides having all possible amino acid sequences, and further teaches assay methods for identifying those peptides capable of binding to proteases.

Other protease inhibitors may be proteins, particularly antibodies and antibody derivatives.

Recombinant expression systems may be used to generate quantities of protease sufficient for production of monoclonal antibodies (MAbs) specific for the protease.

Suitable antibodies for protease inhibition will bind to the protease in a manner reducing or eliminating the enzymatic activity, typically by obscuring the active site. Suitable MAbs may be used to generate derivatives, such as Fab fragments, chimeric antibodies, altered antibodies, univalent antibodies, and single domain antibodies, using methods known in the art.

Protease inhibitors are screened using methods of the invention. In general, a substrate is employed which mimics the enzyme's natural substrate, but which provides a quantifiable signal when cleaved. The signal is preferably detectable by colorimetric or fluorometric means; however, other methods such as HPLC or silica gel chromatography, GC-MS, nuclear magnetic resonance, and the like may also be useful. After optimum substrate and enzyme concentrations are determined, a candidate protease inhibitor is added to the reaction mixture at a range of concentrations. The assay conditions ideally should resemble the conditions under which the protease is to be inhibited in vivo, i.e., under physiologic pH, temperature, ionic strength, etc. Suitable inhibitors will exhibit strong protease inhibition at concentrations which do not raise toxic side effects in the subject.

Inhibitors which compete for binding to the protease active site may require concentrations equal to or greater than the substrate concentration, while inhibitors capable of binding irreversibly to the protease active site may be added in concentrations on the order of the enzyme concentration.

In a presently preferred embodiment, an inactive protease mutein is employed rather than an active enzyme. It has been found that replacing a critical residue within the active site of a protease (e.g., replacing the active site Ser of a serine protease) does not significantly alter the structure of the enzyme, and thus preserves the binding specificity.

The altered enzyme still recognizes and binds to its proper substrate, but fails to effect cleavage. Thus, in one method of the invention an inactivated HCV protease is immobilized, and a mixture of candidate inhibitors added. Inhibitors that closely mimic the enzyme's preferred recognition sequence will compete more successfully for binding than other candidate inhibitors.

The poorly-binding candidates may then be separated, and the identity of the strongly-binding inhibitors determined. For example, HCV protease may be prepared substituting Ala for Ser221 (Fig. 1), providing an enzyme capable of binding the HCV protease substrate, but incapable of cleaving it. The resulting protease mutein is then bound to a solid support, for example Sephadex® beads, and packed into a column. A mixture of candidate protease inhibitors in solution is then passed through the column and fractions collected. The last fractions to elute will contain the strongest-binding compounds, and provide the preferred protease inhibitor candidates.

Protease inhibitors may be administered by a variety of methods, such as intravenously, orally, intramuscularly, intraperitoneally, bronchially, intranasally, and so forth. The preferred route of administration will depend upon the nature of the inhibitor. Inhibitors prepared as organic compounds may often be administered orally (which is generally preferred) if well absorbed.

Protein-based inhibitors (such as most antibody derivatives) must generally be administered by parenteral routes.

C. Examples

The examples presented below are provided as a further guide to the practitioner of ordinary skill in the art, and are not to be construed as limiting the invention in any way.

Example 1 (Preparation of HCV cDNA) A genomic library of HCV cDNA was prepared as described in PCT WO89/046699 and USSN 7/456,637. This library, ATCC accession no. 40394, has been deposited as set forth below. coli D1210 cells. These cells, named Cf1/51 in E. coli, were deposited as set forth below and have an ATCC accession no. of 67967. First, DNA isolated from pSODCF1 was treated with BamHI and EcoRI, and the following linker was ligated into the linear DNA created by the restriction enzymes:

GAT CCT GGA ATT CTG ATA A
     GA CCT TAA GAC TAT TTT AA

After cloning, the plasmid containing the insert was isolated.

Plasmid containing the insert was restricted with EcoRI. excised with EcoRI, and ligated into this EcoRI lin- Recombinant bacteria from one clone were induced to express the SOD-HCVS_l_1 polypeptide by growing the bacteria in the presence of IPTG.

Three separate expression vectors, pcf1AB, pcf1CD, and pcf1EF, were created by ligating three new linkers, AB, CD, and EF, to a BamHI-EcoRI fragment derived by digesting to completion the vector pSODCF1 with EcoRI and BamHI, followed by treatment with alkaline phosphatase. The linkers were created from six oligomers, A, B, C, D, E, and F. Each oligomer was phosphorylated by treatment with kinase in the presence of ATP prior to annealing to its complementary oligomer. The sequences of the synthetic linkers were the following:

Name   DNA Sequence (5' to 3')
A      GATC CTG AAT TCC TGA TAA
B      GAC TTA AGG ACT ATT TTA A
C      GATC CGA ATT CTG TGA TAA
D      GCT TAA GAC ACT ATT TTA A
E      GATC CTG GAA TTC TGA TAA
F      GAC CTT AAG ACT ATT TTA A

Each of the three linkers destroys the original EcoRI site, and creates a new EcoRI site within the linker, but within a different reading frame. Thus, the HCV cDNA EcoRI fragments isolated from the clones, when inserted into the expression vector, were in three different reading frames.
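The pairing and reading-frame claims for the three linkers can be checked mechanically. The following is a minimal sketch in Python, not part of the original procedure. It assumes that the second oligomer of each pair (B, D, F) is printed aligned beneath the first, so that it reads as the base-by-base complement of the duplex region, with a 4-nt GATC overhang on the top strand and a 4-nt TTAA overhang on the bottom strand. The helper names are hypothetical.

```python
# Sketch: check linker pairing and EcoRI reading frames (illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ECORI_SITE = "GAATTC"

# Oligomer sequences as printed in the table, spaces removed.
LINKERS = {
    "AB": ("GATC CTG AAT TCC TGA TAA", "GAC TTA AGG ACT ATT TTA A"),
    "CD": ("GATC CGA ATT CTG TGA TAA", "GCT TAA GAC ACT ATT TTA A"),
    "EF": ("GATC CTG GAA TTC TGA TAA", "GAC CTT AAG ACT ATT TTA A"),
}
LINKERS = {k: (t.replace(" ", ""), b.replace(" ", "")) for k, (t, b) in LINKERS.items()}


def pairs_as_printed(top: str, bottom: str) -> bool:
    """True if the bottom oligomer complements the top one position by position.

    Assumption: the first 4 nt of the top strand (GATC) and the last 4 nt of
    the bottom strand (TTAA) are single-stranded overhangs.
    """
    return top[4:].translate(COMPLEMENT) == bottom[:-4]


def ecori_frame(top: str) -> int:
    """Offset (mod 3) of the new EcoRI site from the start of the top oligomer."""
    return top.index(ECORI_SITE) % 3


frames = {name: ecori_frame(top) for name, (top, _) in LINKERS.items()}
for name, (top, bottom) in LINKERS.items():
    print(name, "paired:", pairs_as_printed(top, bottom), "frame:", frames[name])
```

Under these assumptions each pair anneals as printed, and the EcoRI site falls at a different offset modulo 3 in each linker, which is consistent with the statement that inserted fragments are read in three different frames.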

The HCV cDNA fragments in the designated λgt11 clones were excised by digestion with EcoRI; each fragment was inserted into pcf1AB, pcf1CD, and pcf1EF.

These expression constructs were then transformed into D1210 E. coli cells, the transformants cloned, and polypeptides expressed as described in part B below. placed in an individual 100 mm Petri dish containing mL of 50 mM Tris HCl, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 3% (w/v) BSA, 40 µg/mL lysozyme, and 0.1 ng/mL DNase. The plates were agitated gently for at least 3 hours at room temperature. The filters were rinsed in TBST (50 mM Tris HCl, pH 8.0, 150 mM NaCl, 0.005% Tween® 20). After incubation, the cell residues were rinsed and incubated for one hour in TBS (TBST without Tween®) containing 10% sheep serum. The filters were then incubated with pretreated sera in TBS from individuals with NANBH, which included 3 chimpanzees; 8 patients with chronic NANBH whose sera were positive with respect to antibodies to HCV C100-3 polypeptide (also called C100); 8 patients with chronic NANBH whose sera were negative for anti-C100 antibodies; a convalescent patient whose serum was negative for anti-C100 antibodies; and 6 patients with community-acquired NANBH, including one whose sera was strongly positive with respect to anti-C100 antibodies, and one whose sera was marginally positive with respect to anti-C100 antibodies. The sera, diluted in TBS, was pretreated by preabsorption with hSOD for at least 30 minutes at 37°C. After incubation, the filters were washed twice for 30 min with TBST. The expressed proteins which bound antibodies in the sera were labeled by incubation for 2 hours with 125I-labeled sheep anti-human antibody. After washing, the filters were washed twice for 30 min with TBST, dried, and autoradiographed.

Example 3 (Cloning of Full-Length SOD-Protease Fusion Proteins) The nucleotide sequences of the HCV cDNAs used below were determined essentially as described above, except that the cDNAs excised from these phages were substituted for the cDNA isolated from clone 51. Clone C33c was isolated using a hybridization probe having the following sequence:

5' ATC AGG ACC GGG GTG AGA ACA ATT ACC ACT 3'

The sequence of the HCV cDNA in clone C33c is shown in Figure 8, which also shows the amino acids encoded therein.

Clone 35 was isolated by screening with a synthetic polynucleotide having the sequence:

5' AAG CCA CCG TGT GCG CTA GGG CTC AAG CCC 3'

Approximately 1 in 50,000 clones hybridized with the probe. The polynucleotide and deduced amino acid sequences for C35 are shown in Figure 7.

Clone C31 is shown in Figure 6, which also shows the amino acids encoded therein. A C200 cassette was constructed by ligating together a 718 bp fragment obtained by digestion of clone C33c DNA with EcoRI and HinfI, a 179 bp fragment obtained by digestion of clone C31 DNA with HinfI and BglI, and a 377 bp fragment obtained by digesting clone C35 DNA with BglI and EcoRI.

The construct of ligated fragments was inserted into the EcoRI site of pBR322, yielding the plasmid pBR322-C200.

(B) C7f+C20c: Clone 7f was isolated using a probe having the sequence:

5'-AGC AGA CAA GGG GCC TCC TAG GGT GCA TAA T-3'

The sequence of HCV cDNA in clone 7f and the amino acids encoded therein are shown in Figure 5.

Clone C20c is isolated using a probe having the following sequence:

5'-TGC ATC AAT GGG GTG TGC TGG-3'

The sequence of HCV cDNA in clone C20c, and the amino acids encoded therein, are shown in Figure 2.

Clones 7f and C20c were digested with EcoRI and SfaNI to form 400 bp and 260 bp fragments, respectively.

The fragments were then cloned into the EcoRI site of pBR322 to form the vector C7f+C20c, and transformed into HB101 cells.

(C) C300: Clone 8h was isolated using a probe based on the sequence of nucleotides in clone 33c. The nucleotide sequence of the probe was 5'-AGA GAC AAC CAT GAG GTC CCC GGT GTT C-3'.

The sequence of the HCV cDNA in clone 8h, and the amino acids encoded therein, are shown in Figure 4.

Clone C26d is isolated using a probe having the following sequence:

5'-CTG TTG TGC CCC GCG GCA GCC-3'

The sequence and amino acid translation of clone C26d are shown in Figure 3.

Clones C26d and C33c (see part A above) were transformed into the methylation minus E. coli strain GM48. Clone C26d was digested with EcoRII and DdeI to provide a 100 bp fragment. Clone C33c was digested with EcoRII and EcoRI to provide a 700 bp fragment. Clone C8h was digested with EcoRI and DdeI to provide a 208 bp fragment. These three fragments were then ligated into the EcoRI site of pBR322, and transformed into E. coli HB101, to provide the vector C300.

(D) Preparation of Full Length Clones: A 600 bp fragment was obtained from C7f+C20c by digestion with EcoRI and NaeI, and ligated to a 945 bp NaeI/EcoRI fragment from C300, and the construct inserted into the EcoRI site of pGEM4Z (commercially available from Promega) to form the vector C7fC20cC300.
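As a bookkeeping aid, the fragment sizes quoted for each ligation can be summed to give the approximate insert size of each intermediate construct. The sketch below is illustrative only: the fragment sizes are those stated in the text, which are themselves approximate gel estimates, and the dictionary and function names are hypothetical.

```python
# Sketch: approximate insert sizes from the fragment sizes quoted in the text.
FRAGMENTS_BP = {
    "C200": [718, 179, 377],    # C33c EcoRI/HinfI + C31 HinfI/BglI + C35 BglI/EcoRI
    "C7f+C20c": [400, 260],     # 7f and C20c EcoRI/SfaNI fragments
    "C300": [100, 700, 208],    # C26d + C33c + C8h fragments
    "C7fC20cC300": [600, 945],  # EcoRI/NaeI fragment + NaeI/EcoRI fragment
}


def insert_size_bp(construct: str) -> int:
    """Sum of the ligated fragment lengths for one construct."""
    return sum(FRAGMENTS_BP[construct])


for name in FRAGMENTS_BP:
    print(f"{name}: about {insert_size_bp(name)} bp")
```

Because the individual sizes are rounded, the sums should be read as estimates for planning digests and checking gels rather than as exact sequence lengths.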

C7fC20cC300 was digested with NdeI and EcoRI to provide an 892 bp fragment, which was ligated with a 1160 bp fragment obtained by digesting C200 with NdeI and EcoRI.

The resulting construct was inserted into the EcoRI site of pBR322 to provide the vector C7fC20cC300C200. Construction of this vector is illustrated schematically in Figure 9.

Example 4 (Preparation of E. coli Expression Vectors) This vector contains a full-length HCV protease coding sequence fused to a functional hSOD leader. The vector C7fC20cC300C200 was cleaved with EcoRI to provide a 2000 bp fragment, which was then ligated into the EcoRI site of plasmid cf1CD (Example 2A). The resulting vector encodes amino acids 1-151 of hSOD, and amino acids 946-1630 of HCV (numbered from the beginning of the polyprotein, corresponding to amino acids 1-686 in Figure 1). The vector was labeled cf1SODp600 (sometimes referred to as P600), and was transformed into E. coli D1210 cells. These cells, ATCC accession no. 68275, were deposited as set forth below.

(B) P190: A truncated SOD-protease fusion polynucleotide was prepared by excising a 600 bp EcoRI/NaeI fragment from C7f+C20c, blunting the fragment with Klenow fragment, and ligating the blunted fragment into the Klenow-blunted EcoRI site of cf1EF (Example 2A). This polynucleotide encodes a fusion protein having amino acids 1-151 of hSOD, and amino acids 1-199 of HCV protease.

(C) P300: A longer truncated SOD-protease fusion polynucleotide was prepared by excising an 892 bp EcoRI/NdeI fragment from C7fC20cC300, blunting the fragment with Klenow fragment, and ligating the blunted fragment into the Klenow-blunted EcoRI site of cf1EF. This polynucleotide encodes a fusion protein having amino acids 1-151 of hSOD, and amino acids 1-299 of HCV protease.

(D) P500: A longer truncated SOD-protease fusion polynucleotide was prepared by excising a 1550 bp EcoRI/EcoRI fragment from C7fC20cC300, and ligating the fragment into the EcoRI site of cf1CD to form P500. This polynucleotide encodes a fusion protein having amino acids 1-151 of hSOD, and amino acids 946-1457 of HCV protease (amino acids 1-513 in Figure 1).

coding sequence fused to the FLAG sequence, Hopp et al. (1988) Biotechnology 6: 1204-1210. PCR was used to produce a HCV protease gene with special restriction ends for cloning ease. Plasmid p500 was digested with EcoRI and NdeI to yield a 900 bp fragment. This fragment and two primers were used in a polymerase chain reaction to introduce a unique BglII site at amino acid 1009 and a stop codon with a SalI site at amino acid 1262 of the HCV-1, as shown in Figure 17 of WO 90/11089, published 4 October 1990. The sequence of the primers is as follows:

5' CCC GAG CAA GAT CTC CCG GCC C 3'
5' CCC GGC TGC ATA AGC AGT CGA CTT GGA 3'

After 30 cycles of PCR, the reaction was digested with BglII and SalI, and the 710 bp fragment was isolated.
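A quick way to confirm that each PCR primer carries the restriction site it is meant to introduce is to search the primer for the recognition sequence. The sketch below is illustrative and uses only the two primer sequences printed above and the standard BglII (AGATCT) and SalI (GTCGAC) recognition sequences. The labels "first" and "second" and the function name are hypothetical, and no claim is made about which strand each primer anneals to.

```python
# Sketch: locate engineered restriction sites in the two PCR primers.
SITES = {"BglII": "AGATCT", "SalI": "GTCGAC"}  # both sites are palindromic

PRIMERS = {
    "first": "CCC GAG CAA GAT CTC CCG GCC C".replace(" ", ""),
    "second": "CCC GGC TGC ATA AGC AGT CGA CTT GGA".replace(" ", ""),
}


def find_sites(seq: str) -> dict:
    """Map enzyme name to the 0-based position of its site, if present."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for label, seq in PRIMERS.items():
    print(label, find_sites(seq))

# In the second primer as printed, a TAA triplet lies just before the SalI
# site, consistent with the description of a stop codon followed by SalI.
second = PRIMERS["second"]
stop_before_sali = 0 <= second.find("TAA") < second.find(SITES["SalI"])
print("TAA precedes SalI site:", stop_before_sali)
```

Because both recognition sequences are palindromic, the search gives the same answer whichever strand the primer is written on.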

This fragment was annealed and ligated to the following duplex:

MetAspTyrLysAspAspAspAspLysGlyArgGlu
CATGGACTACAAAGACGATGACGATAAAGGCCGGGA
    CTGATGTTTCTGCTACTGCTATTTCCGGCCCTCTAG

The duplex encodes the FLAG sequence, an initiator methionine, and a 5' NcoI restriction site. The resulting NcoI/SalI fragment was ligated into a derivative of pCF1.
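The duplex can be checked against the peptide printed above it by translating the top strand from its ATG and confirming that the bottom strand is its complement. This is a minimal sketch: the codon table contains only the codons that occur in this duplex, the bottom strand is assumed to be printed aligned beneath the top strand (read in the opposite polarity), and the names are hypothetical.

```python
# Sketch: translate the FLAG duplex top strand and verify strand pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

TOP = "CATGGACTACAAAGACGATGACGATAAAGGCCGGGA"
BOTTOM = "CTGATGTTTCTGCTACTGCTATTTCCGGCCCTCTAG"  # as printed, aligned under TOP[4:]

# Partial codon table: only the codons used in this duplex.
CODONS = {
    "ATG": "Met", "GAC": "Asp", "GAT": "Asp", "TAC": "Tyr",
    "AAA": "Lys", "GGC": "Gly", "CGG": "Arg",
}


def translate_from_atg(seq: str) -> list:
    """Translate complete codons starting at the first ATG."""
    start = seq.index("ATG")
    return [CODONS[seq[i:i + 3]] for i in range(start, len(seq) - 2, 3)]


peptide = translate_from_atg(TOP)
print("-".join(peptide))

# Duplex region: TOP without its 4-nt CATG overhang pairs with BOTTOM
# without its trailing 4-nt overhang.
strands_pair = TOP[4:].translate(COMPLEMENT) == BOTTOM[:-4]
print("strands complementary:", strands_pair)
```

The final two nucleotides of the top strand (GA) form an incomplete codon; the printed peptide shows Glu at that position, which would be completed by the first base of the ligated fragment.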

This construct is then transformed into E. coli D1210 cells and expression of the protease is induced by the addition of IPTG.

The FLAG sequence was fused to the HCV protease to facilitate purification. A calcium dependent monoclonal antibody, which binds to the FLAG encoded peptide, is used to purify the fusion protein without harsh eluting conditions.

Example 5 (E. coli Expression of SOD-Protease Fusion Proteins) (A) E. coli D1210 cells were transformed with cf1SODp600 and grown in Luria broth containing 100 µg/mL ampicillin to an OD of 0.3-0.5. IPTG was then added to a concentration of 2 mM, and the cells cultured to a final OD of 0.9 to 1.3. The cells were then lysed, and the lysate analyzed by Western blot using anti-HCV sera, as described in USSN 7/456,637.

The results indicated the occurrence of cleavage, as no full length product (theoretical Mr 93 kDa) was evident on the gel. Bands corresponding to the hSOD fusion partner and the separate HCV protease appeared at relative molecular weights of about 34, 53, and 66 kDa. The 34 kDa band corresponds to the hSOD partner (about 20 kDa) with a portion of the NS3 domain, while the 53 and 66 kDa bands correspond to HCV protease with varying degrees of (possibly bacterial) processing.

(B) E. coli D1210 cells were transformed with P500 and grown in Luria broth containing 100 µg/mL ampicillin to an OD of 0.3-0.5. IPTG was then added to a concentration of 2 mM, and the cells cultured to a final OD of 0.8 to 1.0. The cells were then lysed, and the lysate analyzed as described above.

The results again indicated the occurrence of cleavage, as no full length product (theoretical Mr kDa) was evident on the gel. Bands corresponding to the hSOD fusion partner and the truncated HCV protease appeared at molecular weights of about 34 and 45 kDa, respectively.

(C) E. coli D1210 cells were transformed with vectors P300 and P190 and grown as described above.

The results from P300 expression indicated the occurrence of cleavage, as no full length product (theoretical Mr 51 kDa) was evident on the gel. A band corresponding to the hSOD fusion partner appeared at a relative molecular weight of about 34 kDa. The corresponding HCV protease band was not visible, as this region of the NS3 domain is not recognized by the sera employed to detect the products. However, appearance of the hSOD band at 34 kDa rather than 51 kDa indicates that cleavage occurred.

The P190 expression product appeared only as the full (encoded) length product without cleavage, forming a band at about 40 kDa, which corresponds to the theoretical molecular weight for the uncleaved product.

This may indicate that the minimum essential sequence for HCV protease extends to the region between amino acids 199 and 299.
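The theoretical molecular weights quoted for the uncleaved fusion proteins can be approximated from residue counts. The sketch below is a rough estimate, not the method used in the original work: it assumes an average residue mass of about 110 Da (a common rule of thumb), takes the hSOD portion as 151 residues, and counts HCV residues from the ranges stated in Example 4. All names are hypothetical.

```python
# Sketch: rough Mr estimate for uncleaved SOD-protease fusions.
AVG_RESIDUE_DA = 110.0  # assumption: rule-of-thumb average residue mass
HSOD_RESIDUES = 151     # hSOD amino acids 1-151, per Example 4

# HCV residues per construct, from the ranges given in the text.
HCV_RESIDUES = {
    "P600": 1630 - 946 + 1,  # polyprotein amino acids 946-1630
    "P500": 1457 - 946 + 1,  # polyprotein amino acids 946-1457
    "P300": 299,             # protease amino acids 1-299
    "P190": 199,             # protease amino acids 1-199
}


def fusion_mass_kda(construct: str) -> float:
    """Estimated mass in kDa of the uncleaved hSOD fusion."""
    total = HSOD_RESIDUES + HCV_RESIDUES[construct]
    return total * AVG_RESIDUE_DA / 1000.0


for name in HCV_RESIDUES:
    print(f"{name}: about {fusion_mass_kda(name):.1f} kDa")
```

With this approximation the estimates fall within a few kDa of the theoretical values quoted above for P600, P300, and P190; a sequence-based calculation would be needed for anything more precise.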

Example 6 (Purification of E. coli Expressed Protease) The HCV protease and fragments expressed in Example 5 may be purified as follows: The bacterial cells in which the polypeptide was expressed are subjected to osmotic shock and mechanical disruption, the insoluble fraction containing the protease is isolated and subjected to differential extraction with an alkaline-NaCl solution, and the polypeptide in the extract purified by chromatography on columns of S-Sepharose® and Q-Sepharose®.

The crude extract resulting from osmotic shock and mechanical disruption is prepared by suspending 1 g of the packed cells in 10 mL of a solution containing 0.02 M Tris HCl, pH 7.5, 10 mM EDTA, 20% sucrose, and incubating for 10 minutes on ice. The cells are then pelleted by centrifugation at 4,000 x g for 15 min at 4°C. After the supernatant is removed, the cell pellets are resuspended in 10 mL of Buffer A1 (0.01 M Tris HCl, pH 7.5, 1 mM EDTA, 14 mM β-mercaptoethanol, "BME"), and incubated on ice for 10 minutes.

The cells are again pelleted at 4,000 x g for 15 minutes at 4°C. After removal of the clear supernatant (periplasmic fraction I), the cell pellets are resuspended in Buffer A1, incubated on ice for 10 minutes, and again centrifuged at 4,000 x g for 15 minutes at 4°C. The clear supernatant (periplasmic fraction II) is removed, and the cell pellet resuspended in 5 mL of Buffer A2 (0.02 M Tris HCl, pH 7.5, 14 mM BME, 1 mM EDTA, 1 mM PMSF). In order to disrupt the cells, the suspension (5 mL) and 7.5 mL of Dyno-mill lead-free acid washed glass beads (0.10-0.15 mm diameter) (available from Glen-Mills, Inc.) are placed in a Falcon tube and vortexed at top speed for two minutes, followed by cooling for at least 2 min on ice. The vortexing-cooling procedure is repeated another four times. After vortexing, the slurry is filtered through a sintered glass funnel using low suction, the glass beads washed twice with Buffer A2, and the filtrate and washes combined.

The insoluble fraction of the crude extract is collected by centrifugation at 20,000 x g for 15 min at 4°C, washed twice with 10 mL Buffer A2, and resuspended in 5 mL of MILLI-Q water.

A fraction containing the HCV protease is isolated from the insoluble material by adding to the suspension NaOH (2 M) and NaCl (2 M) to yield a final concentration of 20 mM each, vortexing the mixture for 1 minute, centrifuging it at 20,000 x g for 20 min at 4°C, and retaining the supernatant.
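The volume of each 2 M stock needed to reach 20 mM in the 5 mL suspension follows from a simple mass balance. The sketch below is illustrative: it assumes that equal volumes of the two stocks are added to the 5 mL suspension and that volumes are additive. The function name and parameters are hypothetical.

```python
# Sketch: volume of each 2 M stock needed for 20 mM final in a 5 mL sample.
def stock_volume_ml(sample_ml: float, stock_mM: float, final_mM: float,
                    n_stocks: int = 2) -> float:
    """Volume v (mL) of each stock so that every solute ends at final_mM.

    Mass balance with n stocks added in equal volume v:
        stock_mM * v = final_mM * (sample_ml + n_stocks * v)
    which rearranges to
        v = final_mM * sample_ml / (stock_mM - n_stocks * final_mM)
    """
    return final_mM * sample_ml / (stock_mM - n_stocks * final_mM)


v = stock_volume_ml(sample_ml=5.0, stock_mM=2000.0, final_mM=20.0)
print(f"add about {v * 1000:.0f} microliters of each 2 M stock")
```

Ignoring the added volume gives 50 microliters per stock (5 mL x 20 mM / 2000 mM); the correction for the extra volume changes this by only about 2%.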

The partially purified protease is then purified by SDS-PAGE. The protease may be identified by western blot, and the band excised from the gel. The protease is then eluted from the band, and analyzed to confirm its amino acid sequence. N-terminal sequences may be analyzed using an automated amino acid sequencer, while C-terminal sequences may be analyzed by automated amino acid sequencing of a series of tryptic fragments.

Example 7 (Preparation of Yeast Expression Vector) (A) P650 (SOD/Protease Fusion) This vector contains HCV sequence, which includes the wild-type full-length HCV protease coding sequence, fused at the 5' end to a SOD coding sequence.

Two fragments, a 441 bp EcoRI/BglII fragment from clone 11b and a 1471 bp BglII/EcoRI fragment from expression vector P500, were used to reconstruct a wild-type, full-length HCV protease coding sequence. These two fragments were ligated together with an EcoRI digested pS356 vector to produce an expression cassette. The expression cassette encodes the ADH2/GAPDH hybrid yeast promoter, human SOD, the HCV protease, and a GAPDH transcription terminator. The resulting vector was digested with BamHI and a 4052 bp fragment was isolated. This fragment was ligated to the BamHI digested pAB24 vector to produce p650. p650 expresses a polyprotein containing, from its amino terminal end, amino acids 1-154 of hSOD, an oligopeptide -Asn-Leu-Gly-Ile-Arg-, and amino acids 819 to 1458 of HCV-1, as shown in Figure 17 of WO 90/11089, published 4 October 1990.

Clone 11b was isolated from the genomic library of HCV cDNA, ATCC accession no. 40394, as described above in Example 3A, using a hybridization probe having the following sequence:

5' CAC CTA TGT TTA TAA CCA TCT CAC TCC TOT 3'.

This procedure is also described in EPO Pub. No. 318 216, Example IV.A.17.

The vector pS3EF, which is a pBR322 derivative, contains the ADH2/GAPDH hybrid yeast promoter upstream of the human superoxide dismutase gene, an adaptor, and a downstream yeast effective transcription terminator. A similar expression vector containing these control elements and the superoxide dismutase gene is described in Cousens et al. (1987) Gene 61: 265, and in copending application EPO 196,056, published October 1, 1986. pS3EF, however, differs from that in Cousens et al. in that the heterologous proinsulin gene and the immunoglobulin hinge are deleted, and Gln154 of SOD is followed by an adaptor sequence which contains an EcoRI site. The sequence of the adaptor is:

5' AAT TTG GGA ATT CCA TAA TTA ATT AAG 3'
3' AC CCT TAA GGT ATT AAT TAA TTC AGCT 5'

The EcoRI site facilitates the insertion of heterologous sequences. Once inserted into pS3EF, a SOD fusion is expressed which contains an oligopeptide that links SOD to the heterologous sequences. pS3EF is exactly the same as pS356 except that pS356 contains a different adaptor.

The sequence of the adaptor is shown below:

5' AAT TTG GGA ATT CCA TAA TGA G 3'
3' AC CCT TAA GGT ATT ACT CAG CT 5'

pS356, ATCC accession no. 67683, is deposited as set forth below.

Plasmid pAB24 is a yeast shuttle vector, which contains pBR322 sequences, the complete 2µ sequence for DNA replication in yeast (Broach (1981) in: Molecular Biology of the Yeast Saccharomyces, Vol. 1, p. 445, Cold Spring Harbor Press) and the yeast LEU2d gene derived from plasmid pC1/1, described in EPO Pub. No. 116 201.

Plasmid pAB24 was constructed by digesting YEp24 with EcoRI and re-ligating the vector to remove the partial 2 micron sequences. The resulting plasmid, YEp24deltaRI, was linearized with ClaI and ligated with the complete 2 micron plasmid which had been linearized with ClaI. The resulting plasmid, pCBou, was then digested with XbaI, and the 8605 bp vector fragment was gel isolated. This isolated XbaI fragment was ligated with a 4460 bp XbaI fragment containing the LEU2d gene isolated from pC1/1; the orientation of LEU2d gene is in the same direction as the URA3 gene.

S. cerevisiae, 21503 (pAB24-GAP-env2), accession no. 20827, is deposited with the American Type Culture Collection as set forth below. The plasmid pAB24-GAP-env2 can be recovered from the yeast cells by known techniques. The GAP-env2 expression cassette can be removed by digesting pAB24-GAP-env2 with BamHI. pAB24 is recovered by religating the vector without the BamHI insert.

The plates were incubated The transformants were further selected on leu- plates with 8% glucose putatively for high numbers of the p650 plasmid. Colonies from the leu- plates were inoculated into leu- medium with 3% glucose.

These cultures were shaken at 30°C for 2 days and then diluted 1/20 into YEPD medium with 2% glucose and shaken for 2 more days at 30°C.

S. cerevisiae JSC310 contains DM15 DNA, described in EPO Pub. No. 340 986, published 8 November 1989. This DM15 DNA enhances gpgg regulated expression of heterologous proteins. pDM15, accession no. 40453, is deposited with the American Type Culture Collection as set forth below.

Example 9 (Yeast Ubiquitin Expression of Mature HCV Protease) Mature HCV protease is prepared by cleaving vector C7fC20cC300C200 with EcoRI to obtain a 2 Kb coding sequence, and inserting the sequence with the appropriate linkers into a ubiquitin expression vector, such as that described in WO 88/02406, published 7 April 1988, or USSN 7/390,599 filed 7 August 1989, incorporated herein by reference. Mature HCV protease is recovered upon expression of the vector in suitable hosts, particularly yeast. Specifically, the yeast expression protocol described in Example 8 is used to express a ubiquitin/HCV protease vector.

Example 10 (Preparation of an In-Vitro Expression Vector) Four synthetic DNA fragments were annealed and ligated together to create an EcoRI/SacI Yellow Fever leader, which was ligated to an EcoRI/SacI digested pGEM®-3Z vector from Promega®. The sequences of the four fragments are listed below:

YFK-1: 5' AAT TCG TAA ATC CTG TGT GCT AAT TGA GGT GCA TTG GTC TGC AAA TCG AGT TGC TAG GCA ATA AAC ACA TT 3'
YFK-2: 5' TAT TGC CTA GCA ACT CGA TTT GCA GAC CAA TGC ACC TCA ATT AGC ACA CAG GAT TTA CG 3'
YFK-3: 5' TGG ATT AAT TTT AAT CGT TCG TTG AGC GAT TAG CAG AGA ACT GAC CAG AAC ATG TCT GAG CT 3'
YFK-4: 5' CAG ACA TGT TCT GGT CAG TTC TCT GCT AAT CGC TCA ACG AAC GAT TAA AAT TAA TCC AAA TGT GTT 3'
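The way the four YFK oligonucleotides fit together can be checked by reverse-complementing the two lower-strand fragments and comparing them with the upper strand. The sketch below is illustrative and assumes that YFK-1 followed by YFK-3 forms the top strand and that YFK-4 followed by YFK-2 forms the bottom strand, all written 5' to 3' as listed. The variable and function names are hypothetical.

```python
# Sketch: check how the four YFK oligonucleotides anneal into one duplex.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def clean(seq: str) -> str:
    """Remove the triplet spacing used in the printed sequences."""
    return seq.replace(" ", "")


YFK1 = clean("AAT TCG TAA ATC CTG TGT GCT AAT TGA GGT GCA TTG GTC TGC "
             "AAA TCG AGT TGC TAG GCA ATA AAC ACA TT")
YFK2 = clean("TAT TGC CTA GCA ACT CGA TTT GCA GAC CAA TGC ACC TCA ATT "
             "AGC ACA CAG GAT TTA CG")
YFK3 = clean("TGG ATT AAT TTT AAT CGT TCG TTG AGC GAT TAG CAG AGA ACT "
             "GAC CAG AAC ATG TCT GAG CT")
YFK4 = clean("CAG ACA TGT TCT GGT CAG TTC TCT GCT AAT CGC TCA ACG AAC "
             "GAT TAA AAT TAA TCC AAA TGT GTT")

top = YFK1 + YFK3      # assumed top strand, 5' to 3'
bottom = YFK4 + YFK2   # assumed bottom strand, 5' to 3'

left_overhang = top[:4]    # single-stranded 5' end of the top strand
right_overhang = top[-4:]  # single-stranded 3' end of the top strand
duplex_ok = revcomp(bottom) == top[4:-4]

print("left overhang:", left_overhang)
print("right overhang:", right_overhang)
print("bottom strand pairs with top strand:", duplex_ok)
```

Under these assumptions the assembled leader has an AATT 5' overhang (EcoRI-compatible) and an AGCT 3' overhang (SacI-compatible), and YFK-4 bridges the junction between YFK-1 and YFK-3, which is what holds the two halves together before ligation.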

For in-vitro translation of the HCV protease, the new pGEM®-3Z/Yellow Fever leader vector was digested with BamHI and blunted with Klenow.

(B) PvuII Construct from p6000 A clone p6000 was constructed from sequences available from the genomic library of HCV cDNA, ATCC accession no. 40394. The HCV encoding DNA sequence of p6000 is identical to nucleotide -275 to nucleotide 6372 of Figure 17 of WO 90/11089, published 4 October 1990. p6000 was digested with PvuII, and from the digest, a 2,864 bp fragment was isolated. This 2,864 bp fragment was ligated to the prepared pGEM®-3Z/Yellow Fever leader vector fragment, described above.

Example 11 (In-Vitro Expression of HCV Protease) (A) Transcription The pGEM®-3Z/Yellow Fever leader/PvuII vector was linearized with XbaI and transcribed using the materials and protocols from Promega's Riboprobe® Gemini II Core System.

(B) Translation The RNA produced by the above protocol was translated using Promega's rabbit reticulocyte lysate, minus methionine, canine pancreatic microsomal membranes, as well as other necessary materials and instructions from Promega.

De osited Bi 0 c e i l : The following materials were deposited with the American Type Culture Collection (ATCC), 12301 Parklawn Dr., Rockville, Maryland: Name Qepgsit Date Accession no.

E. coli D1210, 23 Mar 1990 68275 cflSODp600 Cfl/51 in E. coli 11 May 1989 67967 D1210 Bacteriophage X-gtll 01 Dec 1987 40394 cDNA library E. coli-H3101-, pS356 29 Apr 1988 67683 plasmid DNA, pDM1S 05 May 1933 404 -44..

S. cerevisiae, 21503 (pAB24-GAP-env2)   23 Dec 1986

The above materials have been deposited with the ATCC under the accession numbers indicated. These deposits will be maintained under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure. [...] materials, and no such license is granted hereby.

SEQUENCE LISTING

(1) GENERAL INFORMATION:
  (i) APPLICANT:
    (A) NAME: Chiron Corporation
    (B) STREET: 4560 Horton Street
    (C) CITY: Emeryville
    (D) STATE: California
    (E) COUNTRY: US
    (F) POSTAL CODE (ZIP): 94608-2916
  (ii) TITLE OF INVENTION: Hepatitis C Virus Protease
  (iii) NUMBER OF SEQUENCES:
  (iv) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)
  CURRENT APPLICATION DATA:
    APPLICATION NUMBER: EP 91908105.?

(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
Arg Arg Gly Arg Gly Gly Gln 50 Leu Gly Thr Thr 130 Leu Glu Thr Arg Val 100 Glu Ile Leu Leu Gly Pro Thr 85 Val 135 Ile 40 Gln Gly Ser Leu Thr Ile Val $ Thr Ser Val Cys Lys 90 Asp Gly Tyr Ala Thr Gly Thr Ala Thr Val Pro Val Pro Ala Ser Asp Arg 140 Met Gln Arg Ala Tyr Ile Ser 110 Val Gln Asp Gln His Gln 95 Ser Thr Lys Thr Gly 80 Met Gly Arg Gly Ser Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn Leu Glu Thr Thr Met Arg 195 200

(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
Cys Trp Thr Val Tyr His Gly Ala Gly
1               5

(2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
Asp Gln Asp Leu Gly Trp Pro Ala Pro
1               5

(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
Leu Lys Gly Ser Ser Gly Gly Pro Leu
1               5

(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
Phe His Thr Met Trp His Val Thr Arg
1               5

(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
Lys Glu Asp Leu Val Ala Tyr Gly Gly
1               5

(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
Pro Ser Gly Thr Ser Gly Ser Pro Ile
1               5

(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
Phe His Thr Leu Trp His Thr Thr Lys
1               5

(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
Lys Glu Asp Arg Leu Cys Tyr Gly Gly
1               5

(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
Pro Thr Gly Thr Ser Gly Ser Pro Ile
1               5

(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
Phe His Thr Leu Trp His Thr Thr Arg
1               5

(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
Lys Glu Asp Arg Val Thr Tyr Gly Gly
1               5

(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
Pro Ile Gly Thr Ser Gly Ser Pro Ile
1               5

(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
Phe His Thr Leu Trp His Thr Thr Lys
1               5

(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
Lys Glu Asp Arg Leu Cys Tyr Gly Gly
1               5

(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 9 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
Pro Thr Gly Thr Ser Gly Ser Pro Ile
1               5

(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
Thr Ala Gly His Cys
1               5

(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
Asn Asn Asp Tyr Gly Ile Ile
1               5

(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
Gly Asp Ser Gly Gly Ser Leu
1               5

(2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
Thr Ala Gly His Cys
1               5

(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
Gly Asn Asp Arg Ala Trp Val
1               5

(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
Gly Asp Ser Gly Gly Ser Trp
1               5

(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
Ser Ala Ala His Cys
1               5

(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
Asn Asn Asp Ile Met Leu Ile
1               5

(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
Gly Asp Ser Gly Gly Pro Val
1               5

(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
Thr Ala Ala His Cys
1               5

(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
Asn Asn Asp Ile Thr Leu Leu
1               5

(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
Gly Asp Ser Gly Gly Pro Leu
1               5

(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
Thr Ala Ala His Cys
1               5

(2) INFORMATION FOR SEQ ID NO:30:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
Gly Tyr Asp Ile Ala Leu Leu
1               5

(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
Gly Asp Ser Gly Gly Pro Leu
1               5

(2) INFORMATION FOR SEQ ID NO:32:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
Thr Val Tyr His Gly

(2) INFORMATION FOR SEQ ID NO:33:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
Ser Ser Asp Leu Tyr Leu Val

(2) INFORMATION FOR SEQ ID NO:34:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
Gly Ser Ser Gly Gly Pro Leu
1

(2) INFORMATION FOR SEQ ID NO:35:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 75 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
Gln Ile Phe Val Lys Thr Leu Thr Gly Lys Thr Ile Thr Leu Glu Val
1               5                   10                  15
Glu Ser Ser Asp Thr Ile Asp Asn Val Lys Ser Lys Ile Gln Asp Lys
            20                  25                  30
Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu Ile Phe Ala Gly Lys Gln
        35                  40                  45
Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr Asn Ile Gln Lys Glu Ser
    50                  55                  60
Thr Leu His Leu Val Leu Arg Leu Arg Gly Gly
65                  70                  75

(2) INFORMATION FOR SEQ ID NO:36:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 28 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
Val Ser Ala Arg Arg Gly Arg Glu Ile Leu Leu Gly Ala Ile Leu Arg
1               5                   10                  15
Arg His Val Gly Pro Val Ser Cys Gln Arg Gly Tyr
            20                  25

(2) INFORMATION FOR SEQ ID NO:37:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 38 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
GATCCTGGAA TTCTGATAAG ACCTTAAGAC TATTTTAA    38

(2) INFORMATION FOR SEQ ID NO:38:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
GATCCTGAAT TCCTGATAA    19

(2) INFORMATION FOR SEQ ID NO:39:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
GACTTAAGGA CTATTTTAA    19

(2) INFORMATION FOR SEQ ID NO:40:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
GATCCGAATT CTGTGATAA

(2) INFORMATION FOR SEQ ID NO:41:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:
GCTTAAGACA CTATTTTAA

(2) INFORMATION FOR SEQ ID NO:42:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base pairs (B) TYPE: nucleic acid
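Entries in a listing of this kind can be checked mechanically. The sketch below is illustrative only: it converts two of the three-letter peptide entries (SEQ ID NO:5 and SEQ ID NO:28) to one-letter code, and tests whether the oligonucleotides of SEQ ID NO:40 and SEQ ID NO:41 are complementary over 15 bases. Reading those two entries as the strands of one linker, with SEQ ID NO:41 written 3' to 5' and offset by four bases, is an assumption made for this example and is not stated in the listing; the helper names are hypothetical.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_one_letter(three_letter):
    """Convert a space-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split())


def complement(dna):
    """Base-by-base complement (not reversed); spaces are ignored."""
    return dna.replace(" ", "").upper().translate(COMPLEMENT)


seq5 = to_one_letter("Phe His Thr Met Trp His Val Thr Arg")  # SEQ ID NO:5
seq28 = to_one_letter("Gly Asp Ser Gly Gly Pro Leu")         # SEQ ID NO:28

top = "GATCCGAATT CTGTGATAA".replace(" ", "")     # SEQ ID NO:40
bottom = "GCTTAAGACA CTATTTTAA".replace(" ", "")  # SEQ ID NO:41

# Assumed pairing: top bases 5-19 against bottom bases 1-15, leaving a
# four-base overhang at each end of the duplex.
paired = complement(top[4:]) == bottom[:15]
print(seq5, seq28, paired)
```

A KeyError from to_one_letter is a quick way to spot a residue code that has been corrupted in transcription.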

Claims

1. A composition comprising a purified hepatitis C virus (HCV) protease, encoded in the NS3 domain of the HCV genome, or truncations thereof having protease activity.
2. A composition according to claim 1 wherein said protease comprises a partial internal amino acid sequence as shown in SEQ ID NO: 63.
3. A composition according to claim 1 or 2 wherein said protease comprises a partial internal amino acid sequence as shown in SEQ ID NO: 64.
4. A composition according to any one of claims 1 to 3 wherein said protease comprises the amino acid sequence as shown in SEQ ID NO: 69.
5. A composition according to any one of claims 1 to 3 wherein said protease comprises a partial internal amino acid sequence as shown in SEQ ID NO: 65.
6. A composition according to any one of the preceding claims wherein said protease comprises a histidine, aspartate and serine residue at positions corresponding to amino acid 139, 163 and 221 respectively of the amino acid sequence shown in SEQ ID NO: 69, or equivalent positions.
7. A composition comprising a purified protease derived from the hepatitis C virus as defined in claim 1 wherein said protease has an amino acid sequence as shown in SEQ ID NO: 67.
8. A composition comprising a purified protease derived from the hepatitis C virus as defined in claim 1 wherein said protease has an amino acid sequence as shown in SEQ ID NO: 66.
9. A fusion protein comprising a suitable fusion partner fused to a protease as defined in any one of the preceding claims.
10. A fusion protein according to claim 9 wherein said fusion partner is selected from human superoxide dismutase, ubiquitin, yeast α-factor, IL-2S, β-galactosidase, β-lactamase, horseradish peroxidase, glucose oxidase and urease.
11. A composition comprising a polynucleotide encoding a protease as defined in any one of claims 1 to 8.
12. A composition comprising a polynucleotide encoding a fusion protein as defined in claim 9 or claim 10.
13. An expression vector for producing an HCV protease in a host cell, which vector comprises:
(a) a polynucleotide encoding a protease as defined in claim 11;
(b) transcriptional and translational regulatory sequences functional in said host cell, operably linked to said polynucleotide; and
(c) a selectable marker.
14. A vector according to claim 13 which comprises a sequence encoding a fusion partner, linked to said polynucleotide to form a fusion protein upon expression.
15. A vector according to claim 14 wherein said fusion partner is selected from human superoxide dismutase, ubiquitin, yeast α-factor, IL-2S, β-galactosidase, β-lactamase, horseradish peroxidase, glucose oxidase and urease.