EP0821737A1 - Nucleotide sequence of the Haemophilus influenzae Rd genome, fragments thereof, and uses thereof - Google Patents

Nucleotide sequence of the Haemophilus influenzae Rd genome, fragments thereof, and uses thereof

Info

Publication number
EP0821737A1
Authority
EP
European Patent Office
Prior art keywords
fragments
genome
sequence
nucleotide sequence
seq
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP96912845A
Other languages
English (en)
French (fr)
Other versions
EP0821737A4 (de
Inventor
Robert D. Fleischmann
Mark D. Adams
Owen White
Hamilton O. Smith
J. Craig Venter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Human Genome Sciences Inc
Johns Hopkins University
Original Assignee
Human Genome Sciences Inc
Johns Hopkins University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/476,102 external-priority patent/US6355450B1/en
Priority claimed from US08/487,429 external-priority patent/US6468765B1/en
Application filed by Human Genome Sciences Inc, Johns Hopkins University filed Critical Human Genome Sciences Inc
Publication of EP0821737A1 publication Critical patent/EP0821737A1/de
Publication of EP0821737A4 publication Critical patent/EP0821737A4/de
Withdrawn legal-status Critical Current

Classifications

    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K14/00Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/285Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Pasteurellaceae (F), e.g. Haemophilus influenza

Definitions

  • the present invention relates to the field of molecular biology.
  • the present invention discloses compositions comprising the nucleotide sequence of Haemophilus influenzae, fragments thereof and usage in industrial fermentation and pharmaceutical development.
  • the complete genome sequence from a free living cellular organism has never been determined.
  • the first mycobacterium sequence should be completed by 1996, while E. coli and S. cerevisiae are expected to be completed before 1998. These are being done by random and/or directed sequencing of overlapping cosmid clones. No one has attempted to determine sequences of the order of a megabase or more by a random shotgun approach.
  • H. influenzae is a small (approximately 0.4 x 1 micron) non-motile, non-spore forming, gram-negative bacterium whose only natural host is human. It is a resident of the upper respiratory mucosa of children and adults and causes otitis media and respiratory tract infections mostly in children. The most serious complication is meningitis, which produces neurological sequelae in up to 50% of affected children.
  • Six H. influenzae serotypes (a through f) have been identified based on immunologically distinct capsular polysaccharide antigens. A number of non-typeable strains are also known. Serotype b accounts for the majority of human disease. Interest in the medically important aspects of H. influenzae biology has focused particularly on those genes which determine virulence characteristics of the organism.
  • a number of the genes responsible for the capsular polysaccharide have been mapped and sequenced (Kroll et al. , Mol. Microbiol. 5(6): 1549-1560 (1991)).
  • OMP outer membrane protein
  • the lipooligosaccharide (LOS) component of the outer membrane and the genes of its synthetic pathway are under intensive study (Weiser et al., J. Bacteriol. 172:3304-3309 (1990)). While a vaccine has been available since 1984, the study of outer membrane components is motivated to some extent by the need for improved vaccines. Recently, the catalase gene was characterized and sequenced as a possible virulence-related gene (Bishni et al., in press). Elucidation of the H. influenzae genome will enhance the understanding of how H. influenzae causes invasive disease and how best to combat infection.
  • H. influenzae possesses a highly efficient natural DNA transformation system which has been intensively studied in the non-encapsulated (R), serotype d strain (Kahn and Smith, J. Membrane Biology 87:89-103 (1984)). At least 16 transformation-specific genes have been identified and sequenced. Of these, four are regulatory (Redfield, J. Bacteriol. 173:5612-5618 (1991), and Chandler, Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992)), at least two are involved in recombination processes (Barouki and Smith, J. Bacteriol. 163(2):629-634 (1985)), and at least seven are targeted to the membranes and periplasmic space (Tomb et al.
  • H. influenzae Rd transformation shows a number of interesting features including sequence-specific DNA uptake, rapid uptake of several double-stranded DNA molecules per competent cell into a membrane compartment called the transformasome, linear translocation of a single strand of the donor DNA into the cytoplasm, and synapsis and recombination of the strand with the chromosome by a single-strand displacement mechanism.
  • The H. influenzae Rd transformation system is the most thoroughly studied of the gram-negative systems and is distinct in a number of ways from the gram-positive systems.
  • The size of the H. influenzae Rd genome has been determined by pulsed-field agarose gel electrophoresis of restriction digests to be approximately 1.9 Mb, making its genome approximately 40% the size of E. coli (Lee and Smith, J. Bacteriol. 170:4402-4405 (1988)).
  • the restriction map of H. influenzae is circular (Lee et al., J. Bacteriol. 171:3016-3024 (1989), and Redfield and Lee, "Haemophilus influenzae Rd", pp. 2110-2112, In O'Brien, S.J. (ed), Genetic Maps: Locus Maps of Complex Genomes, Cold Spring Harbor Press, New York).
  • Various genes have been mapped to restriction fragments by Southern hybridization probing of restriction digest DNA bands. This map will be valuable in verification of the assembly of a complete genome sequence from randomly sequenced fragments.
  • GenBank currently contains about 100 kb of non-redundant H. influenzae DNA sequences. About half are from serotype b and half from R
  • the present invention is based on the sequencing of the Haemophilus influenzae Rd genome.
  • the primary nucleotide sequence which was generated is provided in SEQ ID NO: 1.
  • the present invention provides the generated nucleotide sequence of the Haemophilus influenzae Rd genome or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan.
  • the sequence of the present invention is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence depicted in SEQ ID NO: 1.
  • the present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1.
  • the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence which is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media.
  • Such media includes, but is not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
  • the present invention further provides systems, particularly computer-based systems, which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Haemophilus influenzae Rd genome.
  • the fragments of the Haemophilus influenzae Rd genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Haemophilus influenzae Rd in a sample, hereinafter, diagnostic fragments (DFs).
  • ORFs open reading frames
  • EMFs expression modulating fragments
  • UMFs uptake modulating fragments
  • DFs diagnostic fragments
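The notion of an ORF as a peptide-encoding fragment can be illustrated with a deliberately naive scan. This is a sketch under stated assumptions, not the method used for the genome itself (coding regions there were predicted with Genemark): the function name `find_orfs`, the ATG-only start rule, the `min_codons` cutoff, and the coordinate convention are all hypothetical choices made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Naive six-frame ORF scan (illustrative assumption, not Genemark).

    Returns (strand, start, end) tuples. Coordinates are 0-based, half-open,
    and relative to the strand that was scanned; for "-" they refer to the
    reverse-complemented string. Only ATG is treated as a start codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Count codons from the start codon up to, not including, the stop.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

A real gene finder would also consider alternative start codons, codon usage statistics, and the circular chromosome; none of that is modeled here.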
  • Each of the ORF fragments of the Haemophilus influenzae Rd genome disclosed in Tables 1(a) and 2, and the EMF found 5' to the ORF, can be used in numerous ways as polynucleotide reagents.
  • the sequences can be used as diagnostic probes or diagnostic amplification primers for the presence of a specific microbe in a sample, for the production of commercially important pharmaceutical agents, and to selectively control gene expression.
  • the present invention further includes recombinant constructs comprising one or more fragments of the Haemophilus influenzae Rd genome of the present invention.
  • the recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Haemophilus influenzae Rd has been inserted.
  • the present invention further provides host cells containing any one of the isolated fragments of the Haemophilus influenzae Rd genome of the present invention.
  • the host cells can be a higher eukaryotic host such as a mammalian cell, a lower eukaryotic cell such as a yeast cell, or can be a procaryotic cell such as a bacterial cell.
  • the present invention is further directed to isolated proteins encoded by the ORFs of the present invention.
  • a variety of methodologies known in the art can be utilized to obtain any one of the proteins of the present invention.
  • the amino acid sequence can be synthesized using commercially available peptide synthesizers.
  • the protein is purified from bacterial cells which naturally produce the protein.
  • proteins of the present invention can alternatively be purified from cells which have been altered to express the desired protein.
  • the invention further provides methods of obtaining homologs of the fragments of the Haemophilus influenzae Rd genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention.
  • Using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
  • the invention further provides antibodies which selectively bind one of the proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.
  • the invention further provides hybridomas which produce the above-described antibodies.
  • a hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.
  • the present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.
  • kits which contain the necessary reagents to carry out the above-described assays.
  • the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: (a) a first container comprising one of the antibodies, or one of the DFs of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of bound antibodies or hybridized DFs.
  • the present invention further provides methods of obtaining and identifying agents capable of binding to a protein encoded by one of the ORFs of the present invention.
  • agents include antibodies (described above), peptides, carbohydrates, pharmaceutical agents and the like.
  • Such methods comprise the steps of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and
  • The complete genomic sequence of H. influenzae will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Haemophilus influenzae Rd genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Haemophilus researchers and for immediate commercial value for the production of proteins or to control gene expression.
  • a specific example concerns PHA synthase. It has been reported that polyhydroxybutyrate is present in the membranes of H. influenzae Rd and that the amount correlates with the level of competence for transformation.
  • the PHA synthase that synthesizes this polymer has been identified and sequenced in a number of bacteria, none of which are evolutionarily close to H. influenzae.
  • This gene has yet to be isolated from H. influenzae by use of hybridization probes or PCR techniques.
  • the genomic sequence of the present invention allows the identification of the gene by utilizing search means described below.
  • sequenced genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to do comparative genomic and molecular phylogeny.
  • Figure 1 restriction map of the Haemophilus influenzae Rd genome.
  • Figure 2 Block diagram of a computer system 102 that can be used to implement the computer-based systems of present invention.
  • Figure 3 A comparison of experimental coverage of up to approximately 4000 random sequence fragments assembled with AutoAssembler (squares) as compared to Lander-Waterman prediction for a 2.5 Mb genome (triangles) and a 1.6 Mb genome (circles) with a 460 bp average sequence length and a 25 bp overlap.
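The Lander-Waterman prediction referred to in Figure 3 can be sketched numerically. The following is a minimal sketch that uses the standard Lander-Waterman approximations (fraction covered 1 - e^(-c), expected contigs N·e^(-cσ)); the function name `lander_waterman` and the choice of returned quantities are assumptions for illustration, and the figure itself may plot a different derived quantity. The 460 bp read length, 25 bp overlap, and the 1.6 Mb and 2.5 Mb genome sizes are taken from the figure legend.

```python
import math


def lander_waterman(n_fragments, genome_bp, read_bp=460, overlap_bp=25):
    """Lander-Waterman estimates for random shotgun sequencing.

    Returns (coverage_depth, fraction_of_genome_covered, expected_contigs).
    """
    # Redundancy of coverage: total bases sequenced divided by genome length.
    c = n_fragments * read_bp / genome_bp
    # Fraction of a read that is not needed to detect an overlap.
    sigma = 1.0 - overlap_bp / read_bp
    fraction_covered = 1.0 - math.exp(-c)
    expected_contigs = n_fragments * math.exp(-c * sigma)
    return c, fraction_covered, expected_contigs


if __name__ == "__main__":
    for genome_bp in (1_600_000, 2_500_000):
        for n in (1000, 2000, 4000):
            c, covered, contigs = lander_waterman(n, genome_bp)
            print(f"G={genome_bp} N={n} c={c:.2f} "
                  f"covered={covered:.3f} contigs={contigs:.0f}")
```

For the same number of fragments, the smaller genome reaches a higher predicted coverage, which is the kind of contrast the two theoretical curves in the figure are drawn to show.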
  • Figure 4 Data flow and computer programs used to manage, assemble, edit, and annotate the H. influenzae genome. Both Macintosh and Unix platforms are used to handle the AB 373 sequence data files (Kerlavage et al. , Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington
  • Factura is a Macintosh program designed for automatic vector sequence removal and end trimming of sequence files.
  • the program esp runs on a Macintosh platform and parses the feature data extracted from the sequence files by Factura to the Unix based H. influenzae relational database. Assembly is accomplished by retrieving a specific set of sequence files and their associated features using stp, an X-windows graphical interface and control program which can retrieve sequences from the H. influenzae database using user-defined or standard SQL queries.
  • the sequence files were assembled using TIGR Assembler, an assembly engine designed at TIGR for rapid and accurate assembly of thousands of sequence fragments.
  • TIGR Editor is a graphical interface which can parse the aligned sequence files from TIGR Assembler output and display the alignment and associated electropherograms for contig editing. Identification of putative coding regions was performed with Genemark (Borodovsky and McIninch, Computers Chem. 17(2):123 (1993)), a Markov and Bayes modeled program for predicting gene locations, trained on an H. influenzae sequence data set. Peptide searches were performed against the three reading frames of each Genemark predicted coding region using blaze (Brutlag et al., Computers Chem. 17:203 (1993)) run on a Maspar MP-2 massively parallel computer with 4096 microprocessors.
  • Results from each frame were combined into a single output file by mblzt.
  • Optimal protein alignments were obtained using the program praze which extends alignments across potential frameshifts.
  • the output was inspected using a custom graphic viewing program, gbyob, that interacts directly with the H. influenzae database.
  • the alignments were further used to identify potential frameshift errors and were targeted for additional editing.
  • Figure 5 A circular representation of the H. influenzae Rd chromosome illustrating the location of each predicted coding region containing a database match as well as selected global features of the genome.
  • Outer perimeter The location of the unique NotI restriction site (designated as nucleotide 1), the RsrII sites, and the SmaI sites.
  • Outer concentric circle The location of each identified coding region for which a gene identification was made. Each coding region location is coded as to role according to the color code in Fig. 6.
  • Second concentric circle Regions of high G/C content (> 42%, red; > 40%, blue) and high A/T content (> 66%, black; > 64%, green). High G/C content regions are specifically associated with the 6 ribosomal operons and the mu-like prophage.
  • Third concentric circle Coverage by lambda clones (blue). Over 300 lambda clones were sequenced from each end to confirm the overall structure of the genome and identify the 6 ribosomal operons.
  • Fourth concentric circle The locations of the 6 ribosomal operons (green), the tRNAs (black) and the cryptic mu-like prophage (blue).
  • Fifth concentric circle Simple tandem repeats. The locations of the following repeats are shown: CTGGCT, GTCT, ATT, AATGGC, TTGA, TTGG, TTTA, TTATC, TGAC, TCGTC, AACC, TTGC, CAAT, CCAA.
  • the putative origin of replication is illustrated by the outward pointing arrows (green) originating near base 603,000. Two potential termination sequences are shown near the opposite midpoint of the circle (red).
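The G/C and A/T thresholds described for the second concentric circle of Figure 5 can be sketched as a windowed scan. This is a minimal sketch only: the window size, the non-overlapping step, and the function name `classify_windows` are assumptions, since the text does not state how the regions were computed. Only the four thresholds (42%, 40%, 66%, 64%) come from the legend.

```python
def classify_windows(seq, window=1000, step=1000):
    """Label fixed-size windows by G/C or A/T content.

    Returns a list of (start, gc_fraction, label) tuples, where label is
    None for windows that cross none of the Figure 5 thresholds.
    Window and step sizes are illustrative assumptions.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        at = (chunk.count("A") + chunk.count("T")) / window
        if gc > 0.42:
            label = "G/C > 42%"
        elif gc > 0.40:
            label = "G/C > 40%"
        elif at > 0.66:
            label = "A/T > 66%"
        elif at > 0.64:
            label = "A/T > 64%"
        else:
            label = None
        results.append((start, gc, label))
    return results
```

Ambiguity codes such as N count toward neither class in this sketch, so a window rich in ambiguous bases will tend to fall below every threshold.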
  • Figure 7 A comparison of the region of the H. influenzae chromosome containing the 8 genes of the fimbrial gene cluster present in H. influenzae type b and the same region in H. influenzae Rd. The region is flanked by the pepN and purE genes in both organisms. However, in the non-infectious Rd strain the 8 genes of the fimbrial gene cluster have been excised. A 172 bp spacer region is located in this region in the Rd strain and continues to be flanked by the pepN and purE genes.
  • Figure 8 Hydrophobicity analysis of five predicted channel-proteins.
  • the predicted coding region sequences were analyzed by the Kyte-
  • the present invention is based on the sequencing of the Haemophilus influenzae Rd genome.
  • the primary nucleotide sequence which was generated is provided in SEQ ID NO: 1.
  • the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system.
  • The sequence provided in SEQ ID NO: 1 is oriented relative to a unique Not I restriction endonuclease site found in the Haemophilus influenzae Rd genome. A skilled artisan will readily recognize that this start/stop point was chosen for convenience and does not reflect a structural significance.
  • the present invention provides the nucleotide sequence of SEQ ID NO: 1, or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan.
  • the sequence is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence provided in SEQ ID NO: 1.
  • a "representative fragment of the nucleotide sequence depicted in SEQ ID NO: 1" refers to any portion of SEQ ID NO: 1 which is not presently represented within a publicly available database.
  • Preferred representative fragments of the present invention are Haemophilus influenzae open reading frames, expression modulating fragments, uptake modulating fragments, and fragments which can be used to diagnose the presence of Haemophilus influenzae Rd in a sample. A non-limiting identification of such preferred representative fragments is provided in Tables 1(a) and 2.
  • the nucleotide sequence information provided in SEQ ID NO: 1 was obtained by sequencing the Haemophilus influenzae Rd genome using a megabase shotgun sequencing method. Using three parameters of accuracy discussed in the Examples below, the present inventors have calculated that the sequence in SEQ ID NO: 1 has a maximum accuracy of 99.98%. Thus, the nucleotide sequence provided in SEQ ID NO: 1 is a highly accurate, although not necessarily a 100% perfect, representation of the nucleotide sequence of the Haemophilus influenzae Rd genome.
  • nucleotide sequence editing software is publicly available.
  • Applied Biosystems' (AB) AutoAssembler™ can be used as an aid during visual inspection of nucleotide sequences.
  • Even if all of the very rare sequencing errors in SEQ ID NO: 1 were corrected, the resulting nucleotide sequence would still be at least 99.9% identical to the nucleotide sequence in SEQ ID NO: 1.
  • nucleotide sequences of the genomes from different strains of Haemophilus influenzae differ slightly. However, the nucleotide sequence of the genomes of all Haemophilus influenzae strains will be at least 99.9% identical to the nucleotide sequence provided in SEQ ID NO: 1.
  • the present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 in a form which can be readily used, analyzed and interpreted by the skilled artisan.
  • Methods for determining whether a nucleotide sequence is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 are routine and readily available to the skilled artisan.
  • the well known FASTA algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988))
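The 99.9% identity criterion can be made concrete with a small calculation over an existing pairwise alignment. This is a sketch under stated assumptions: it does not implement the FASTA algorithm, it expects two already-aligned strings of equal length with "-" for gaps, and its definition of identity (identical columns divided by all non-empty alignment columns) is one common convention that a given alignment program may not share. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pairwise alignment.

    Gapped columns count as mismatches; columns that are gaps in both
    sequences are ignored. This convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no usable columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a, aligned_b, threshold=99.9):
    """True if the alignment is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this convention, one difference per 1000 aligned positions sits exactly at the 99.9% boundary, and two differences per 1000 fall below it.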
  • the nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 may be "provided" in a variety of mediums to facilitate use thereof.
  • provided refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1.
  • Such a manufacture provides the Haemophilus influenzae Rd genome or a subset thereof (e.g., a Haemophilus Influenzae Rd open reading frame
  • a nucleotide sequence of the present invention can be recorded on computer readable media.
  • computer readable media refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
  • recorded refers to a process for storing information on computer readable medium.
  • a skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention.
  • a variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention.
  • the choice of the data storage structure will generally be based on the means chosen to access the stored information.
  • sequence information of the present invention can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like.
  • a skilled artisan can readily adapt any number of data processor structuring formats (e.g. text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
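The two storage options named above, an ASCII text file and a database application, can be sketched side by side. This is an illustration only: SQLite stands in for the database products mentioned in the text (DB2, Sybase, Oracle), the FASTA-style ">name" header line is an assumed file layout, and the table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def write_ascii(path, name, seq, width=60):
    """Record a sequence as a plain ASCII file with a one-line header."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def read_ascii(path):
    """Read back (name, sequence) from a file written by write_ascii."""
    with open(path, encoding="ascii") as handle:
        lines = handle.read().splitlines()
    return lines[0][1:], "".join(lines[1:])


def store_in_database(conn, name, seq):
    """Record a sequence in a relational table (SQLite as a stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(name TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.execute("INSERT OR REPLACE INTO sequences VALUES (?, ?)", (name, seq))
    conn.commit()


def fetch_from_database(conn, name):
    """Return the stored sequence for `name`, or None if absent."""
    row = conn.execute(
        "SELECT residues FROM sequences WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fragment.txt")
        write_ascii(path, "example_fragment", "ACGT" * 40)
        print(read_ascii(path)[0])
```

Which structure is preferable depends, as the text notes, on how the stored information will be accessed: a flat file suits sequential reading, while a table suits lookup by name.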
  • By providing the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 in computer readable form, a skilled artisan can routinely access the sequence information for a variety of purposes.
  • Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium.
  • the examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem.
  • ORFs open reading frames
  • Such ORFs are protein encoding fragments within the Haemophilus influenzae Rd genome and are useful in producing commercially important proteins such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
  • the present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the Haemophilus influenzae Rd genome.
  • a computer-based system refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention.
  • the minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means.
  • the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means.
  • data storage means refers to memory which can store nucleotide sequence information of the present invention or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.
  • search means refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the Haemophilus influenzae Rd genome which match a particular target sequence or target motif.
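A search means in its simplest form compares a target sequence against the stored sequence and reports where it matches. The following is a minimal sketch assuming exact string matching on both strands; it is not BLAST or any of the programs named in the text, and the function name `find_target` and the 0-based coordinate convention are illustrative assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_target(genome, target):
    """Find exact occurrences of `target` on either strand of `genome`.

    Returns a sorted list of (start, strand) tuples with 0-based start
    positions on the forward strand. A target equal to its own reverse
    complement is reported once per strand at the same position.
    """
    genome = genome.upper()
    hits = []
    for strand, query in (("+", target.upper()), ("-", reverse_complement(target))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            # Advance by one so overlapping occurrences are also found.
            start = genome.find(query, start + 1)
    return sorted(hits)
```

Exact matching only finds identical stretches; the homology search programs the text refers to tolerate mismatches and gaps, which this sketch does not attempt.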
  • a variety of known algorithms are disclosed publicly, and a variety of commercially available software for conducting search means is available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI).
  • a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids.
  • a skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database.
  • the most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues.
  • searches for commercially important fragments of the Haemophilus influenzae Rd genome such as sequence fragments involved in gene expression and protein processing, may be of shorter length.
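The relationship between target length and chance occurrence can be put in rough numbers. This is a back-of-the-envelope sketch assuming independent, equally frequent residues, which real genomes do not satisfy; the function name and the default genome size of 1.9 Mb (the approximate figure quoted earlier for H. influenzae Rd) are used only for illustration.

```python
def expected_random_hits(target_len, genome_bp=1_900_000, alphabet=4, strands=2):
    """Expected number of chance occurrences of one specific target.

    Assumes a random sequence with uniform residue frequencies, so each
    position matches with probability alphabet ** -target_len.
    """
    positions = max(genome_bp - target_len + 1, 0)
    return strands * positions / alphabet ** target_len


if __name__ == "__main__":
    for length in (6, 10, 15, 30):
        print(length, expected_random_hits(length))
```

Under these assumptions a six-nucleotide target is expected to occur by chance many hundreds of times in a genome of this size, while a 30-nucleotide target is expected essentially never, which is consistent with the preference for longer targets.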
  • a target structural motif refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif.
  • target motifs include, but are not limited to, enzymic active sites and signal sequences.
  • Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
  • a variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention.
  • a preferred format for an output means ranks fragments of the Haemophilus influenzae Rd genome possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
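A ranked output of the kind described can be sketched with a simple scoring rule. This is an illustration under stated assumptions: the score is the best ungapped percent identity of the target at any offset within a fragment, which is a crude stand-in for the homology scores produced by real search programs, and the function names are hypothetical.

```python
def best_ungapped_identity(fragment, target):
    """Best percent identity of `target` at any ungapped offset in `fragment`."""
    fragment, target = fragment.upper(), target.upper()
    if not target or len(fragment) < len(target):
        return 0.0
    best = 0
    for offset in range(len(fragment) - len(target) + 1):
        window = fragment[offset:offset + len(target)]
        matches = sum(1 for a, b in zip(window, target) if a == b)
        best = max(best, matches)
    return 100.0 * best / len(target)


def rank_fragments(fragments, target):
    """Rank named fragments by decreasing similarity to the target.

    `fragments` maps a fragment name to its sequence. Ties are broken
    alphabetically by name so the output order is deterministic.
    """
    scored = [(name, best_ungapped_identity(seq, target))
              for name, seq in fragments.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Presenting the score alongside each fragment name gives both the ranking and the degree of similarity in one listing.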
  • comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Haemophilus influenzae Rd genome.
  • software which implements the BLAST and BLAZE algorithms was used to identify open reading frames within the Haemophilus influenzae Rd genome.
  • any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.
  • Figure 2 provides a block diagram of a computer system 102 that can be used to implement the present invention.
  • the computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114.
  • the removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc.
  • a removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114.
  • the computer system 102 includes appropriate software for reading the control logic and/or the data from the removable medium storage device 114 once inserted in the removable medium storage device 114.
  • a nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116.
  • Software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 108 during execution.
  • Another embodiment of the present invention is directed to isolated fragments of the Haemophilus influenzae Rd genome.
  • the fragments of the Haemophilus influenzae Rd genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Haemophilus influenzae Rd in a sample, hereinafter diagnostic fragments (DFs).
  • ORFs open reading frames
  • EMFs expression modulating fragments
  • UMFs uptake modulating fragments
  • DFs diagnostic fragments
  • an "isolated nucleic acid molecule" or an "isolated fragment of the Haemophilus influenzae Rd genome" refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition.
  • purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to, methods which separate constituents of a solution based on charge, solubility, or size.
  • Haemophilus influenzae Rd DNA can be mechanically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Haemophilus influenzae Rd library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF provided in Table 1(a) can then be generated using nucleotide sequence information provided in SEQ ID NO: 1. PCR cloning can then be used to isolate the ORF from the lambda DNA library. PCR cloning is well known in the art. Thus, given the availability of SEQ ID NO: 1, Table 1(a) and Table 2, it would be routine to isolate any ORF or other nucleic acid fragment of the present invention.
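The step of generating primers that flank an ORF from known sequence coordinates can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are 1-based and inclusive with start less than stop, the primer length of 20 is an arbitrary default, and no melting temperature or specificity checks are made. The function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_primers(genome: str, start: int, stop: int, primer_len: int = 20):
    """Return (forward, reverse) primers immediately flanking an ORF.

    `start` and `stop` are 1-based inclusive positions with start <= stop.
    The forward primer is the `primer_len` bases just upstream of the ORF;
    the reverse primer is the reverse complement of the `primer_len` bases
    just downstream, so both primers read 5' to 3' towards the ORF.
    """
    if not (1 <= start <= stop <= len(genome)):
        raise ValueError("ORF coordinates fall outside the sequence")
    upstream_begin = start - 1 - primer_len
    downstream_end = stop + primer_len
    if upstream_begin < 0 or downstream_end > len(genome):
        raise ValueError("not enough flanking sequence for the primers")
    forward = genome[upstream_begin:start - 1]
    reverse = reverse_complement(genome[stop:downstream_end])
    return forward, reverse


if __name__ == "__main__":
    # Toy sequence: 8 bases upstream, a 9-base ORF, 8 bases downstream.
    toy = "TTTTACGT" + "ATGGCTTAA" + "GGCATTTT"
    print(flanking_primers(toy, start=9, stop=17, primer_len=4))
```

For an ORF listed in the opposite orientation, the same function could be applied after swapping the coordinates, with the roles of the two primers exchanged.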
  • the isolated nucleic acid molecules of the present invention include, but are not limited to single stranded and double stranded DNA, and single stranded RNA.
  • an "open reading frame," ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein.
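The definition of an ORF as a run of triplets without termination codons can be illustrated with a short scan of the three forward reading frames. This is a sketch only and is not the method used to build Tables 1(a), 1(b) and 2; the minimum length, the function name, and the toy sequence are assumptions, and the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq: str, min_codons: int = 3):
    """Return (frame, start, end) for every run of at least `min_codons`
    consecutive codons containing no stop codon.

    Positions are 0-based and half-open, so seq[start:end] is the run.
    Only the given strand is scanned; pass the reverse complement to
    examine the opposite strand.
    """
    seq = seq.upper()
    results = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    results.append((frame, run_start, pos))
                run_start = pos + 3  # next run begins after the stop codon
            pos += 3
        # Close a run that reaches the end of the sequence.
        if (pos - run_start) // 3 >= min_codons:
            results.append((frame, run_start, pos))
    return results


if __name__ == "__main__":
    for frame, start, end in stop_free_stretches("ATGGCTGCATAAGG"):
        print(frame, start, end)
```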
  • Tables 1(a), 1(b) and 2 identify ORFs in the Haemophilus influenzae Rd genome.
  • Table 1(a) indicates the location of ORFs within the Haemophilus influenzae genome which encode the recited protein based on homology matching with protein sequences from the organism appearing in parentheticals (see the fourth column of Table 1(a)).
  • the first column of Table 1(a) provides the "GeneID" of a particular ORF. This information is useful for two reasons. First, the complete map of the Haemophilus influenzae Rd genome provided in Figures 6(A)-6(D) refers to the ORFs according to their GeneID numbers. Second, Table 1(b) uses the GeneID numbers to indicate which ORFs were provided previously in a public database.
  • the second and third columns in Table 1(a) indicate an ORF's position in the nucleotide sequence provided in SEQ ID NO: 1.
  • ORFs may be oriented in opposite directions in the Haemophilus influenzae genome. This is reflected in columns 2 and 3.
  • the fifth column of Table 1(a) indicates the percent identity of the protein encoded for by an ORF to the corresponding protein from the organism appearing in parentheticals in the fourth column.
  • the sixth column of Table 1(a) indicates the percent similarity of the protein encoded for by an ORF to the corresponding protein from the organism appearing in parentheticals in the fourth column.
  • the concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were "similar" (i.e., possessed similar biochemical characteristics).
  • the seventh column in Table 1(a) indicates the length of the amino acid homology match.
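The worked example of 70% identity and 80% similarity for two 10-residue polypeptides can be reproduced with a few lines of code. The grouping of "similar" amino acids below is an illustrative assumption, not the scoring scheme used for Table 1(a), and the two peptide sequences are hypothetical.

```python
# Illustrative similarity groups (an assumption for this sketch only).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def are_similar(a: str, b: str) -> bool:
    """True if two residues are identical or share a similarity group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_and_similarity(p: str, q: str):
    """Return (percent identity, percent similarity) for two aligned,
    equal-length polypeptide sequences."""
    if not p or len(p) != len(q):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(a == b for a, b in zip(p, q))
    similar = sum(are_similar(a, b) for a, b in zip(p, q))
    return 100.0 * identical / len(p), 100.0 * similar / len(p)


if __name__ == "__main__":
    # The peptides differ at positions 1, 3 and 5; only the position 5
    # substitution (I to L) is between residues in the same group.
    first = "MKTAIGSPQD"
    second = "GKPALGSPQD"
    print(percent_identity_and_similarity(first, second))  # (70.0, 80.0)
```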
  • Table 2 provides ORFs of the Haemophilus influenzae Rd genome which encode polypeptide sequences which did not elicit a "homology match" with a known protein sequence from another organism. Further details concerning the algorithms and criteria used for homology searches are provided in the Examples below.
  • a skilled artisan can readily identify ORFs in the Haemophilus influenzae Rd genome other than those listed in Tables 1(a), 1(b) and 2, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF, in addition to those ascertainable using the computer-based systems of the present invention.
  • an "expression modulating fragment," EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.
  • a sequence is said to "modulate the expression of an operably linked sequence" when the expression of the sequence is altered by the presence of the EMF.
  • EMFs include, but are not limited to, promoters, and promoter modulating sequences (inducible elements).
  • One class of EMFs are fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event.
  • EMF sequences can be identified within the Haemophilus influenzae Rd genome by their proximity to the ORFs provided in Tables 1(a), 1(b) and 2.
  • an "intergenic segment" refers to the fragments of the Haemophilus genome which are between two ORFs herein described.
  • EMFs can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention.
  • An EMF trap vector contains a cloning site 5' to a marker sequence.
  • a marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions.
  • an EMF will modulate the expression of an operably linked marker sequence.
  • a sequence which is suspected as being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector.
  • the vector is then transformed into an appropriate host using known procedures and the phenotype of the transformed host is examined under appropriate conditions.
  • an "uptake modulating fragment," UMF, means a series of nucleotide molecules which mediate the uptake of a linked DNA fragment into a cell.
  • UMFs can be readily identified using known UMFs as a target sequence or target motif with the computer-based systems described above.
  • The presence and activity of a UMF can be confirmed by attaching the suspected UMF to a marker sequence. The resulting nucleic acid molecule is then incubated with an appropriate host under appropriate conditions and the uptake of the marker sequence is determined. As described above, a UMF will increase the frequency of uptake of a linked marker sequence.
  • a review of DNA uptake in Haemophilus is provided by Goodgall, S.H., et al., J. Bact. 172:5924-5928 (1990).
  • a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize to Haemophilus influenzae sequences. DFs can be readily identified by identifying unique sequences within the Haemophilus influenzae Rd genome, or by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity. The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof.
  • Allelic and species variations can be routinely determined by comparing the sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 with a sequence from another isolate of the same species.
  • the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated.
  • Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequence both strands).
  • error screening can be performed by sequencing corresponding polynucleotides of Haemophilus influenzae origin isolated by using part or all of the fragments in question as a probe or primer.
  • Each of the ORFs of the Haemophilus influenzae Rd genome disclosed in Tables 1(a), 1(b) and 2, and the EMF found 5' to the ORF, can be used in numerous ways as polynucleotide reagents.
  • the sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe, such as Haemophilus influenzae Rd, in a sample. This is especially the case with the fragments or ORFs of Table 2, which will be highly selective for Haemophilus influenzae.
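The idea of finding candidate diagnostic fragments by identifying sequences unique to the target genome can be sketched with a simple k-mer comparison. This is an illustration under stated assumptions, not a validated probe design method: the word length k, the function names, and the toy sequences are hypothetical, and a real screen would still need the hybridization or amplification testing described in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_diagnostic_kmers(target: str, background, k: int = 20):
    """Return k-mers of `target` that occur on neither strand of any
    sequence in `background`, sorted alphabetically."""
    seen = set()
    for seq in background:
        seq = seq.upper()
        seen |= kmers(seq, k)
        seen |= kmers(reverse_complement(seq), k)
    return sorted(kmers(target, k) - seen)


if __name__ == "__main__":
    # Toy sequences with a short k so the result is easy to inspect.
    print(candidate_diagnostic_kmers("ACGTTGCA", ["ACGTAAAA"], k=4))
```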
  • fragments of the present invention can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA.
  • Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense - Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)).
  • Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide.
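How sequence information feeds into antisense design can be shown with a small sketch that derives a DNA oligonucleotide complementary to a chosen region of an mRNA. The 20 to 40 base length limit comes from the text; the function name, the choice of region, and the toy mRNA are assumptions, and no check of secondary structure or specificity is attempted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligonucleotide, written 5' to 3', complementary to
    mrna[start:start + length] (0-based start position)."""
    if not 20 <= length <= 40:
        raise ValueError("length should be 20 to 40 bases")
    # Work in the DNA alphabet so the oligo is reported with T, not U.
    region = mrna.upper().replace("U", "T")[start:start + length]
    if len(region) < length:
        raise ValueError("region extends past the end of the mRNA")
    return region.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    toy_mrna = "AUGGCUAAAGGUCCAUUGACCGAUUAA"  # hypothetical transcript
    print(antisense_oligo(toy_mrna, start=0, length=20))
```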
  • the present invention further provides recombinant constructs comprising one or more fragments of the Haemophilus influenzae Rd genome of the present invention.
  • the recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Haemophilus influenzae Rd genome has been inserted, in a forward or reverse orientation.
  • the vector may further comprise regulatory sequences, including for example, a promoter, operably linked to the ORF.
  • the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF or UMF.
  • Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia).
  • Promoter regions can be selected from any desired gene using CAT (chloramphenicol transferase) vectors or other vectors with selectable markers.
  • CAT chloramphenicol transferase
  • Two appropriate vectors are pKK232-8 and pCM7.
  • Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc.
  • Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
  • the present invention further provides host cells containing any one of the isolated fragments of the Haemophilus influenzae Rd genome of the present invention, wherein the fragment has been introduced into the host cell using known transformation methods.
  • the host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a procaryotic cell, such as a bacterial cell.
  • Introduction of the recombinant construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis, L. et al.
  • the host cells containing one of the fragments of the Haemophilus influenzae Rd genome of the present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF.
  • the present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention.
  • nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence.
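Whether two nucleotide fragments are degenerate variants of one another, that is, different in sequence but encoding an identical polypeptide, can be checked by translating both. The sketch below builds the standard genetic code table and compares translations; the function names and the toy sequences are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
                "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR"
                "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(a: str, b: str) -> bool:
    """True if the two sequences differ but encode the same polypeptide."""
    return a.upper() != b.upper() and translate(a) == translate(b)


if __name__ == "__main__":
    original = "ATGGCTAAATAA"  # Met-Ala-Lys-stop
    variant = "ATGGCCAAGTAA"   # synonymous changes in codons 2 and 3
    print(translate(original), is_degenerate_variant(original, variant))
```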
  • Preferred nucleic acid fragments of the present invention are the ORFs depicted in Table 1(a) which encode proteins.
  • the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Fragments are useful, for example, in generating antibodies against the native polypeptide.
  • the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein.
  • One skilled in the art can readily follow known methods for isolating polypeptides and proteins in order to obtain one of the isolated polypeptides or proteins of the present invention.
  • polypeptides and proteins of the present invention can alternatively be purified from cells which have been altered to express the desired polypeptide or protein.
  • a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level.
  • hosts that can be used to express the ORFs of the present invention include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis.
  • the most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.
  • Recombinant means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems.
  • Microbial refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems.
  • recombinant microbial defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.
  • Nucleotide sequence refers to a heteropolymer of deoxyribonucleotides.
  • DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Haemophilus influenzae Rd genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
  • Recombinant expression vehicle or vector refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence.
  • the expression vehicle can comprise a transcriptional unit comprising an assembly of (1) a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, (2) a structural or coding sequence which is transcribed into mRNA and translated into protein, and (3) appropriate transcription initiation and termination sequences.
  • Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell.
  • when recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
  • Recombinant expression system means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally.
  • the cells can be prokaryotic or eukaryotic.
  • Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.
  • Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters.
  • Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., in Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, New York (1989), the disclosure of which is hereby incorporated by reference.
  • recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g. , the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence.
  • Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase
  • the heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium.
  • the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
  • Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter.
  • the vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and to, if desirable, provide amplification within the host.
  • Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis,
  • useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017).
  • cloning vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.
  • the selected promoter is derepressed by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period.
  • Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.
  • mammalian cell culture systems can also be employed to express recombinant protein.
  • mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.
  • Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences.
  • DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites, may be used to provide the required nontranscribed genetic elements.
  • Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.
  • the present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described.
  • substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences.
  • sequences having equivalent biological activity, and equivalent expression characteristics are considered substantially equivalent.
  • truncation of the mature sequence should be disregarded.
  • the invention further provides methods of obtaining homologs from other strains of Haemophilus influenzae, of the fragments of the Haemophilus influenzae Rd genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention.
  • a sequence or protein of Haemophilus influenzae is defined as a homolog of a fragment of the Haemophilus influenzae Rd genome or a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Haemophilus influenzae Rd genome of the present invention or a protein encoded by one of the ORFs of the present invention.
  • by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
  • nucleic acid molecules or proteins are said to "share significant homology" if the two contain regions which possess greater than 85% sequence (amino acid or nucleic acid) homology.
  • Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding
  • sequences which are greater than 75% homologous to the primer will be amplified.
  • sequences which are greater than 40-50% homologous to the primer will also be amplified.
  • when using DNA probes derived from SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in SSPC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.
  • Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same.
  • the most preferred organisms for isolating homologs are bacteria which are closely related to Haemophilus influenzae Rd.
  • Each ORF provided in Table 1(a) was assigned to one of 102 biological role categories adapted from Riley, M., Microbiology Reviews 57(4):862 (1993). This allows the skilled artisan to determine a use for each identified coding sequence. Table 1(a) further provides an identification of the type of polypeptide which is encoded for by each ORF. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide.
  • Such identifications permit one skilled in the art to use the Haemophilus influenzae ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite.
  • for a review of enzymes used within the commercial industry see Biochemical Engineering and Biotechnology Handbook 2nd, eds. Macmillan Publ. Ltd., NY (1991) and Biocatalysts in Organic Syntheses, ed. J. Tramper et al., Elsevier Science Publishers, Amsterdam, The Netherlands (1985).
  • Open reading frames encoding proteins that mediate the catalytic reactions involved in intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used for industrial biosynthesis. These include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis.
  • the various metabolic pathways present in Haemophilus can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Table 1(a).
  • a number of the proteins encoded by the identified ORFs in Table 1(a) are particularly involved in the degradation of intermediary metabolites as well as non-macromolecular metabolism.
  • Some of the enzymes identified include amylases, glucose oxidases, and catalase.
  • Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a number of industrial processes including the processing of flax and other vegetable fibers, the extraction, clarification and depectinization of fruit juices, the extraction of vegetable oil, and the maceration of fruits and vegetables to give unicellular fruits.
  • GOD ketogulonic acid
  • the main sweetener used in the world today is sugar which comes from sugar beets and sugar cane.
  • the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology,
  • Another class of commercially usable proteins of the present invention are the microbial lipases identified in Table 1 (see Macrae et al., Philosophical
  • A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of the washing procedures.
  • Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids.
  • the advantage of using microbial based enzyme systems is that the amino transferase enzymes catalyze the stereo-selective synthesis of only L-amino acids and generally possess uniformly high catalytic rates.
  • a description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods of Enzymology 136:479 (1987).
  • Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair, and recombination.
  • enzymes involved in nucleic acid synthesis, repair, and recombination include the Hinc II, Hind III, and Hinf I restriction endonucleases.
  • Table 1(a) identifies a wide array of enzymes, such as restriction enzymes, ligases, gyrases and methylases, which have immediate use in the biotechnology industry.
  • proteins of the present invention can be used in a variety of procedures and methods known in the art which are currently applied to other proteins.
  • the proteins of the present invention can further be used to generate an antibody which selectively binds the protein.
  • Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.
  • the invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies.
  • a hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.
  • the protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity.
  • Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen with a heterologous protein (such as globulin or β-galactosidase) or through the inclusion of an adjuvant during immunization.
  • spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.
  • any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).
  • Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures known in the art (Campbell,
  • antibody-containing antisera is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.
  • the present invention further provides the above-described antibodies in detectably labelled form.
  • Antibodies can be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labelling are well known in the art, for example see (Sternberger, L.A. et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E.A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J.W., J. Immunol. Meth. 13:215 (1976)).
  • The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Haemophilus influenzae Rd genome is expressed.
  • The present invention further provides the above-described antibodies immobilized on a solid support.
  • Solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, and acrylic resins such as polyacrylamide and latex beads.
  • Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D.M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986);
  • The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.
  • The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or homolog thereof, in a test sample, using one of the DFs or antibodies of the present invention.
  • Such methods comprise incubating a test sample with one or more of the antibodies or one or more of the DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.
  • Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay.
  • One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, T., An
  • Test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine.
  • The test sample used in the above-described method will vary based on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.
  • Kits are provided which contain the necessary reagents to carry out the assays of the present invention.
  • The invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: (a) a first container comprising one of the DFs or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of a bound DF or antibody.
  • A compartmentalized kit includes any kit in which reagents are contained in separate containers.
  • Such containers include small glass containers, plastic containers or strips of plastic or paper.
  • Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another.
  • Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody or DF.
  • Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or in the alternative, if the primary antibody is labelled, the enzymatic or antibody binding reagents which are capable of reacting with the labelled antibody.
  • The present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the fragments and the Haemophilus genome herein described.
  • Said method comprises the steps of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Haemophilus genome; and (b) determining whether the agent binds to said protein or said fragment.
  • The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents.
  • The agents can be selected and screened at random or rationally selected or designed using protein modeling techniques.
  • Agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.
  • Agents may be rationally selected or designed.
  • An agent is said to be "rationally selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides (for example, see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," In Synthetic Peptides, A User's Guide, W.H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989)), or pharmaceutical agents, or the like.
  • One class of agents of the present invention can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention.
  • Agents can be randomly screened or rationally designed/selected.
  • DNA binding agents are agents which contain base residues which hybridize or form a triple helix by binding to DNA or RNA.
  • Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.
  • Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix - see Lee et al., Nucl. Acids Res. 3:173 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense - Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)).
  • Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide and other DNA binding agents.
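  • As a minimal illustration of how sequence information feeds antisense design, the sketch below derives an oligonucleotide complementary to a chosen window of a target sequence and enforces the 20 to 40 base range mentioned above. The function names and the 30-base target are hypothetical; real design would also weigh secondary structure, specificity and backbone chemistry, none of which is modelled here.

```python
# Sketch only: names and the example target are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_oligo(target: str, start: int, length: int = 20) -> str:
    """Return a DNA oligo complementary to target[start:start + length].

    The 20-40 base limit follows the range quoted in the text.
    """
    if not 20 <= length <= 40:
        raise ValueError("oligo length should be 20 to 40 bases")
    region = target[start:start + length]
    if len(region) < length:
        raise ValueError("target region runs past the end of the sequence")
    return reverse_complement(region)


# Hypothetical 30-base target, written as the sense strand in DNA letters.
target = "ATGGCTAAACGTTTGACCGATCTGGAAGCA"
oligo = antisense_oligo(target, start=0, length=20)
print(oligo)
```

  • Taking the reverse complement of the oligo returns the original target window, which is a quick sanity check on any candidate.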
  • Agents which bind to a protein encoded by one of the ORFs of the present invention can be used as a diagnostic agent, in the control of bacterial infection by modulating the activity of the protein encoded by the ORF.
  • Agents which bind to a protein encoded by one of the ORFs of the present invention can be formulated using known techniques to generate a pharmaceutical composition for use in controlling Haemophilus growth and infection.
  • The present invention further provides pharmaceutical agents which can be used to modulate the growth of Haemophilus influenzae, or another related organism, in vivo or in vitro.
  • A "pharmaceutical agent" is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition.
  • The pharmaceutical agents of the present invention refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.
  • A pharmaceutical agent is said to "modulate the growth of Haemophilus sp., or a related organism, in vivo or in vitro," when the agent reduces the rate of growth, rate of division, or viability of the organism in question.
  • The pharmaceutical agents of the present invention can modulate the growth of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth by binding to an important protein, thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more susceptible to the body's natural immune system.
  • The agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine.
  • The development and use of a vaccine based on outer membrane components, such as the LPS, are well known in the art.
  • A "related organism" is a broad term which refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention.
  • Such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine.
  • Related organisms do not need to be bacterial but may be fungal or viral pathogens.
  • The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes.
  • The pharmaceutical compositions are administered in an amount which is effective for treating and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 10 µg/kg body weight and in most cases they will be administered in an amount not in excess of about 8 mg/kg body weight per day. In most cases, the dosage is from about 10 µg/kg to about 1 mg/kg body weight daily, taking into account the routes of administration, symptoms, etc.
  • The agents of the present invention can be used in native form or can be modified to form a chemical derivative.
  • A molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half-life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc.
  • A change in the immunological character of the functional derivative is measured by a competitive type immunoassay. Changes in immunomodulation activity are measured by the appropriate assay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers are assayed by methods well known to the ordinarily skilled artisan.
  • The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled.
  • The preferred method is to administer the agent by injection.
  • The administration may be by continuous infusion, or by single or multiple injections.
  • The dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered.
  • The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.
  • Compositions of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.
  • The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.
  • The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose.
  • The agent(s) are provided in advance of any symptoms indicative of the organism's growth.
  • The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection.
  • The agent(s) are provided at (or shortly after) the onset of an indication of infection.
  • The therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.
  • The agents of the present invention are administered to the mammal in a pharmaceutically acceptable form and in a therapeutically effective concentration.
  • A composition is said to be "pharmacologically acceptable" if its administration can be tolerated by a recipient patient.
  • Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant.
  • An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.
  • The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle.
  • a pharmaceutically acceptable carrier vehicle e.g., water, alcohol, and water.
  • Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton PA (1980)).
  • To form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.
  • Controlled release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention.
  • The controlled delivery may be exercised by selecting appropriate macromolecules (for example polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate) and the concentration of macromolecules as well as the methods of incorporation in order to control release.
  • Another approach is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers.
  • Microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization (for example, hydroxymethylcellulose or gelatine microcapsules and poly(methylmethacrylate) microcapsules, respectively), or in colloidal drug delivery systems, for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules, or in macroemulsions.
  • The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention.
  • Associated with such containers can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
  • The agents of the present invention may be employed in conjunction with other therapeutic compounds.
  • The present invention further provides the first demonstration that a sequence of greater than one megabase can be sequenced using a random shotgun approach.
  • This procedure, described in detail in the examples that follow, has eliminated the up-front cost of isolating and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols.
  • The overall strategy for a shotgun approach to whole genome sequencing is outlined in Table 3.
  • The total gap length is $Le^{-m}$, where $m = nw/L$ is the fold coverage obtained from $n$ fragments of average length $w$ in a genome of length $L$.
  • The average gap size is $L/n$.
  • 5X coverage would leave about 128 gaps averaging about 100 bp in size.
  • The treatment is essentially that of Lander and Waterman, Genomics 2:231 (1988).
  • Table 4 illustrates the coverage for a 1.9 Mb genome with an average fragment size of 460 bp.
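  • The coverage arithmetic can be sketched in a few lines. This is a minimal illustration of the Lander and Waterman style estimate under the usual idealisation of uniformly random fragments: with fold coverage m = n·w/L, the chance that a base is unsequenced is e^(-m), the expected total gap length is L·e^(-m), the expected number of gaps is n·e^(-m), and the average gap size is therefore L/n. The function and key names are illustrative, and the output should be expected to land on the order of the figures quoted above rather than reproduce them exactly, since the inputs behind Table 4 are not restated here.

```python
import math


def reads_for_coverage(genome_len: float, read_len: float, coverage: float) -> int:
    """Number of random fragments needed to reach a given fold coverage."""
    return math.ceil(coverage * genome_len / read_len)


def shotgun_stats(genome_len: float, read_len: float, n_reads: int) -> dict:
    """Idealised random-shotgun estimates (uniform, independent fragments)."""
    m = n_reads * read_len / genome_len      # fold coverage
    p0 = math.exp(-m)                        # P(a given base is unsequenced)
    return {
        "coverage": m,
        "p_unsequenced": p0,
        "total_gap_len": genome_len * p0,    # L * e^-m
        "expected_gaps": n_reads * p0,       # n * e^-m
        "avg_gap_size": genome_len / n_reads,  # L / n
    }


# Genome size and fragment length as stated for Table 4.
L, w = 1.9e6, 460
n = reads_for_coverage(L, w, 5)
stats = shotgun_stats(L, w, n)
for key, value in stats.items():
    print(f"{key}: {value:.4g}")
```

  • Dividing the total gap length by the expected number of gaps recovers L/n, which is why the average gap size depends only on the number of fragments and not on their length.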
  • H. influenzae Rd KW20 DNA was prepared by phenol extraction. A mixture (3.3 ml) containing 600 µg DNA, 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 30% glycerol was sonicated (Branson Model 450 Sonicator) at the lowest energy setting for 1 min. at 0° using a 3 mm probe. The DNA was ethanol precipitated and redissolved in 500 µl TE buffer. To create blunt ends, a 100 µl aliquot was digested for 10 min at 30° in 200 µl BAL31 buffer with 5 units BAL31 nuclease (New England BioLabs).
  • The DNA was phenol-extracted, ethanol-precipitated, redissolved in 100 µl TE buffer, electrophoresed on a 1.0% low melting agarose gel, and the 1.6-2.0 kb size fraction was excised, phenol-extracted, and redissolved in 20 µl TE buffer.
  • A two-step ligation procedure was used to produce a plasmid library in which 97% of clones carried an insert, of which >99% were single inserts.
  • The first ligation mixture (50 µl) contained 2 µg of DNA fragments, 2 µg SmaI/BAP pUC18 DNA (Pharmacia), and 10 units T4 ligase (GIBCO/BRL), and incubation was at 14° for 4 hr.
  • The DNA was dissolved in 20 µl TE buffer and electrophoresed on a 1.0% low melting agarose gel.
  • A ladder of ethidium bromide-stained linear bands, identified by size as insert (i), vector (v), v+i, v+2i, v+3i, ... was visualized by 360 nm UV light, and the v+i DNA was excised and recovered in 20 µl TE.
  • The v+i DNA was blunt-ended by T4 polymerase treatment for 5 min.
  • Cells were used to prevent rearrangements, deletions, and loss of clones by restriction.
  • Transformed cells were plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells. Plating occurred as follows: A 100 µl aliquot of Epicurian Coli SURE II Supercompetent Cells
  • Agar/L Agar/L).
  • The 5 ml bottom layer is supplemented with 0.4 ml ampicillin (50 mg/ml)/100 ml SOB agar.
  • The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4/100 ml SOB agar.
  • The 15 ml top layer was poured just prior to plating. Our titer was approximately 100 colonies/10 µl aliquot of transformation.
  • Figure 3 illustrates that there was essentially no deviation of the actual assembly data from the ideal plot, indicating that we had constructed close to an ideal random library with minimal contamination from double insert chimeras and free of vector.
  • High quality double stranded DNA plasmid templates (19,687) were prepared using a "boiling bead" method developed in collaboration with Advanced Genetic Technology Corp. (Gaithersburg, MD) (Adams et al.,
  • Plasmid preparation was performed in a 96-well format for all stages of DNA preparation from bacterial growth through final DNA purification. Template concentration was determined using Hoechst Dye and a Millipore Cytofluor. DNA concentrations were not adjusted, but low-yielding templates were identified where possible and not sequenced. Templates were also prepared from two H. influenzae lambda genomic libraries. An amplified library was constructed in vector Lambda GEM-12 (Promega) and an unamplified library was constructed in Lambda DASH II (Stratagene). In particular, for the unamplified lambda library, H. influenzae Rd KW20 DNA (>100 kb) was partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°. The digested DNA was phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at 2 V/cm for 7 hours. Fragments from 15 to 25 kb were excised and recovered in a final volume of 6 µl. One µl of fragments was used with 1 µl of DASH II vector (Stratagene) in the recommended ligation reaction. One µl of the ligation mixture was used per packaging reaction following the recommended protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage were plated directly without amplification from the packaging mixture
  • The amplified library was prepared essentially as above except the lambda GEM-12 vector was used. After packaging, about 3.5x10* pfu were plated on the restrictive NM539 host. The lysate was harvested in 2 ml of SM buffer and stored frozen in 7% dimethylsulfoxide. The phage titer was approximately 1x10^9 pfu/ml.
  • Liquid lysates (10 ml) were prepared from randomly selected plaques and template was prepared on an anion-exchange resin (Qiagen). Sequencing reactions were carried out on plasmid templates using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer
  • The sequencing success rate was 84% for M13-21 sequences, 83% for M13RP1 sequences and 65% for dye-terminator reactions.
  • The average usable read length was 485 bp for M13-21 sequences, 444 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.
  • Table 5 summarizes the high-throughput sequencing phase of the invention.
  • Random reverse sequencing reactions were done based on successful forward sequencing reactions.
  • Some M13RP1 sequences were obtained in a semi-directed fashion: M13-21 sequences pointing outward at the ends of contigs were chosen for M13RP1 sequencing in an effort to specifically order contigs.
  • The semi-directed strategy was effective, and clone-based ordering formed an integral part of assembly and gap closure (see below).
  • The sequencing consisted of using eight ABI Catalyst robots and fourteen AB 373 Automated DNA Sequencers.
  • The Catalyst robot is a publicly available sophisticated pipetting and temperature control robot which has been developed specifically for DNA sequencing reactions.
  • The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the Taq thermostable DNA polymerase, fluorescently-labelled sequencing primers, and reaction buffer. Reaction mixes and templates were combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (e.g., one primer synthesis) steps were performed including denaturation, annealing of primer and template, and extension of DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevented evaporation without the need for an oil overlay.
  • The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labelled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling.
  • AB currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.
  • Electrophoresis was run overnight following the manufacturer's protocols, and the data was collected for twelve hours.
  • The AB 373 performs automatic lane tracking and base-calling.
  • The lane-tracking was confirmed visually.
  • Each sequence electropherogram (or fluorescence lane trace) was inspected visually and assessed for quality. Trailing sequences of low quality were removed and the sequence itself was loaded via software to a Sybase database (archived daily to an 8 mm tape). Leading vector polylinker sequence was removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373 were around 400 bp and depended mostly on the quality of the template used for the sequencing reaction. All of the ABI 373 Sequencers were converted to Stretch Liners, which provided a longer electrophoresis path prior to fluorescence detection, thus increasing the average number of usable bases to 500-600 bp.
  • An assembly engine (TIGR Assembler) was developed for the rapid and accurate assembly of thousands of sequence fragments.
  • The AB AutoAssembler™ was modified (and named TIGR Editor) to provide a graphical interface to the electropherogram for the purpose of editing data associated with the aligned sequence file output of TIGR Assembler.
  • The TIGR Assembler simultaneously clusters and assembles fragments of the genome.
  • The algorithm builds a hash table of 10 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements.
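  • The hash-table step can be illustrated with a small k-mer index. This is a sketch of the general technique, not the TIGR Assembler code: every 10-base subsequence is mapped to the set of fragments containing it, and any two fragments sharing a 10-mer become a candidate overlap. The function names and toy fragments are hypothetical, and reverse-complement matches are ignored for brevity.

```python
from collections import defaultdict
from itertools import combinations


def build_kmer_index(fragments: dict, k: int = 10) -> dict:
    """Map every k-base subsequence to the set of fragment names containing it."""
    index = defaultdict(set)
    for name, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def potential_overlaps(fragments: dict, k: int = 10) -> dict:
    """Count distinct shared k-mers for each pair of fragments sharing any."""
    shared = defaultdict(int)
    for names in build_kmer_index(fragments, k).values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Toy fragments: the last 13 bases of f1 equal the first 13 bases of f2.
fragments = {
    "f1": "ACGTACGGTTCAGGCTAACGT",
    "f2": "TTCAGGCTAACGTTGGCATCA",
    "f3": "G" * 12 + "C" * 8,
}
overlaps = potential_overlaps(fragments)
print(overlaps)
```

  • A fragment that appears in unusually many candidate pairs is the kind of fragment the text flags as likely to fall in a repetitive element.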
  • TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content.
  • The current contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm (Waterman, M.S., Methods in Enzymology 164:765 (1988)) which provides for optimal gapped alignments.
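  • For orientation, a textbook Smith-Waterman local alignment score can be computed as below. This is the standard dynamic-programming recurrence with a linear gap penalty, not the modified version used by TIGR Assembler, and the scoring values are arbitrary assumptions chosen for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, end_in_a, end_in_b) for the best local alignment.

    end_in_a and end_in_b are 1-based positions where the alignment ends.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


# Toy contig end and fragment start sharing the 8-base stretch CCCCGGGG.
contig = "AAAACCCCGGGG"
fragment = "CCCCGGGGTTTT"
result = smith_waterman(contig, fragment)
print(result)
```

  • An alignment that ends at the last base of the contig and starts at the first base of the fragment, as in this toy case, is the pattern expected when the fragment extends the contig.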
  • The current contig is extended by the fragment only if strict criteria for the quality of the match are met.
  • The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element.
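  • One way to picture the adaptive match criteria is as a small record of thresholds that is loosened or tightened before each acceptance test. The sketch below is an assumption-laden illustration: the three fields mirror the criteria named in the text, but every numeric default and the size of each adjustment are hypothetical, since the text does not state the values used.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MatchCriteria:
    min_overlap: int = 40             # hypothetical default, in bases
    max_unmatched_end: int = 15       # hypothetical default, in bases
    min_percent_match: float = 97.0   # hypothetical default


def adjust(criteria: MatchCriteria, low_coverage: bool, possible_repeat: bool) -> MatchCriteria:
    """Raise thresholds near a possible repeat, lower them where coverage is thin."""
    if possible_repeat:
        return replace(
            criteria,
            min_overlap=criteria.min_overlap * 2,
            min_percent_match=min(100.0, criteria.min_percent_match + 2.0),
        )
    if low_coverage:
        return replace(
            criteria,
            min_overlap=criteria.min_overlap // 2,
            min_percent_match=criteria.min_percent_match - 2.0,
        )
    return criteria


def accept(overlap_len: int, unmatched_end: int, percent_match: float,
           criteria: MatchCriteria) -> bool:
    """True if an alignment satisfies all three match criteria."""
    return (overlap_len >= criteria.min_overlap
            and unmatched_end <= criteria.max_unmatched_end
            and percent_match >= criteria.min_percent_match)


base = MatchCriteria()
strict = adjust(base, low_coverage=False, possible_repeat=True)
relaxed = adjust(base, low_coverage=True, possible_repeat=False)
print(accept(50, 5, 98.0, base), accept(50, 5, 98.0, strict), accept(25, 5, 96.0, relaxed))
```

  • In this sketch a possible repeat takes precedence over low coverage, on the reasoning that a false join inside a repeat is costlier than a missed extension; that ordering is a design choice of the example, not something the text specifies.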
  • TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library). Assembly of 24,304 sequence fragments of H. influenzae required 30 hours of CPU time using one processor on a SPARCenter 2000 with 512 Mb of RAM. This process resulted in approximately 210 contigs.
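  • The paired-end constraint described above reduces to two checks: orientation and distance. The following sketch assumes a simple placement record (contig coordinates plus strand) that is not taken from the text, and uses the 1.6-2.0 kb insert size fraction of the plasmid library as the allowed range.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    start: int      # leftmost contig coordinate covered by the read
    end: int        # rightmost contig coordinate covered by the read (exclusive)
    forward: bool   # True if the read runs left to right along the contig


def mates_consistent(a: Placement, b: Placement, min_size: int, max_size: int) -> bool:
    """Check that two reads from one template face each other at a plausible distance."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    # The left read must run forward and the right read backward,
    # so that the two reads point toward one another.
    if not (left.forward and not right.forward):
        return False
    span = right.end - left.start  # implied template (insert) length
    return min_size <= span <= max_size


fwd = Placement(start=100, end=560, forward=True)
rev = Placement(start=1400, end=1860, forward=False)
print(mates_consistent(fwd, rev, 1600, 2000))
```

  • A pair that fails either check suggests a misassembly or a chimeric clone, while a pair whose reads sit in different contigs (not handled here) is the information later used to order contigs.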
  • Asm_align uses a number of relationships to identify and align contigs that are adjacent to each other.
  • The 140 contigs were placed into 42 groups totaling 42 physical gaps (no template DNA for the region) and 98 sequence gaps (template available for gap closure).
  • Oligonucleotide primers were designed and synthesized from the end of each contig group. These primers were then available for use in one or more of the strategies outlined below:
  • chromosomal DNA digested with one frequent cutter (AseI) and five less frequent cutters (BglII, EcoRI, PstI, XbaI, and PvuII). The DNA from each digest was fractionated on a 0.7% agarose gel and transferred to Nytran Plus nylon membranes (Schleicher & Schuell). Hybridization was carried out for 16 hours at 40°. To remove non-specific signals, each blot was sequentially washed at room temperature with increasingly stringent conditions up to 0.1X SSC + 0.5% SDS. Blots were exposed to a PhosphorImager cassette (Molecular Dynamics) for several hours and hybridization patterns were visually compared. Adjacent contigs identified in this manner were targeted for specific
  • The two lambda libraries constructed from H. influenzae genomic DNA were probed with oligonucleotides designed from the ends of contig groups (Kirkness et al., Genomics 10:985 (1991)).
  • The positive plaques were then used to prepare templates and the sequence was determined from each end of the lambda clone insert. These sequence fragments were searched using grasta against a database of all contigs. Two contigs that matched the sequence from the opposite ends of the same lambda clone were ordered.
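  • The ordering rule in the preceding paragraph, that two contigs hit by opposite ends of one lambda clone must be neighbours, can be expressed as a small grouping step. The record layout below (clone identifier, end label, matching contig) and all identifiers are hypothetical; the search that produces the hits is assumed to have been run already.

```python
from collections import defaultdict


def link_contigs(end_hits):
    """Group clone-end hits and report contig pairs joined by a single clone.

    end_hits: iterable of (clone_id, end_label, contig_id), where end_label is
    "left" or "right". Returns {(contig_a, contig_b): [clone_id, ...]}.
    """
    by_clone = defaultdict(dict)
    for clone, end, contig in end_hits:
        by_clone[clone][end] = contig

    links = defaultdict(list)
    for clone, ends in by_clone.items():
        if "left" in ends and "right" in ends:
            c1, c2 = ends["left"], ends["right"]
            if c1 != c2:  # both ends in one contig gives no ordering information
                links[tuple(sorted((c1, c2)))].append(clone)
    return dict(links)


hits = [
    ("lam1", "left", "c7"),
    ("lam1", "right", "c12"),
    ("lam2", "left", "c12"),
    ("lam2", "right", "c12"),
    ("lam3", "left", "c3"),   # only one end matched a contig
]
links = link_contigs(hits)
print(links)
```

  • Each reported pair also names the clone that spans the gap, which is the clone that would then serve as template for closing it.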
  • The lambda clone then provided the template for closure of the sequence gap between the adjacent contigs.
  • The lambda clones were especially valuable for solving repeat structures.
  • Standard and long range (XL) PCR reactions were performed as follows. Standard PCR was performed in the following manner. Each reaction contained a 37 µl cocktail: 16.5 µl H2O, 3 µl 25 mM MgCl2, 8 µl of a dNTP mix (1.25 mM each dNTP), 4.5 µl 10X PCR core buffer II (Perkin Elmer), 25 ng H. influenzae Rd KW20 genomic DNA.
  • The appropriate two primers (4 µl, 3.2 pmole/µl) were added to each reaction.
  • A hot start was performed at 95° for 5 min followed by a 75° hold.
  • AmpliTaq DNA polymerase (Perkin Elmer), 0.3 µl in 4.3 µl H2O and 0.5 µl 10X PCR core buffer II, was added to each reaction.
  • The PCR profile was 25 cycles of 94°/45 sec.
  • A hot start was performed at 94° for 1 minute.
  • rTth polymerase, 2.0 µl (4 U/reaction) in 2.8 µl 3.3X PCR buffer II, was added to each reaction.
  • The PCR profile was 18 cycles of 94°/15 sec., denature; 62°/8 min., anneal and extend; followed by 12 cycles of 94°/15 sec., denature; 62°/8 min. (increase 15 sec./cycle), anneal and extend; 72°/10 min., final extension.
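  • As a quick check on the length of the long range profile, the programmed hold times can be summed. The sketch reads the profile as 15 sec. denaturation and 8 min. anneal/extend per cycle, and assumes the 15 sec. increment applies from the first of the 12 later cycles; ramp times and the hot start are not counted. The function name and parameters are illustrative.

```python
def xl_pcr_hold_seconds(first_cycles: int = 18, second_cycles: int = 12,
                        denature_s: int = 15, extend_s: int = 8 * 60,
                        increment_s: int = 15, final_s: int = 10 * 60) -> int:
    """Total programmed hold time, in seconds, for the two-stage XL PCR profile."""
    total = first_cycles * (denature_s + extend_s)
    for cycle in range(1, second_cycles + 1):
        # Assumption: the extension grows by increment_s starting with cycle 1.
        total += denature_s + extend_s + increment_s * cycle
    return total + final_s


seconds = xl_pcr_hold_seconds()
print(seconds, round(seconds / 3600, 2))
```

  • Under these assumptions the holds add up to a little over four and a half hours per run.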
  • All reactions were performed in a 96-well format on a Perkin Elmer GeneAmp PCR System 9600.
  • confirmation of the overall genome assembly.
  • The lambda clones provided closure for 23 physical gaps.
  • Lambda clones were also useful for solving repeat structures. Repeat structures identified in the genome were small enough to be spanned by a single clone from the random insert library, except for the six ribosomal RNA operons and one repeat (2 copies) which was 5,340 bp in length.
  • Oligonucleotide probes were designed from the unique flanks at the beginning of each repeat and hybridized to the lambda libraries. Positive plaques were identified for each flank and the sequence fragments from the ends of each clone were used to correctly orient the repeats within the genome.
  • The ability to distinguish and assemble the six ribosomal RNA (rRNA) operons of H. influenzae (16S subunit-23S subunit-5S subunit) was a test of our overall strategy to sequence and assemble a complex genome which might contain a significant number of repeat regions. The high degree of sequence similarity and the length of the six operons caused the assembly process to cluster all the underlying sequences into a few indistinguishable contigs.
  • Flanking sequences were designed from these six flanking regions and used to probe the two lambda libraries. For each of the six rRNA operons at least one positive plaque was identified which completely spanned the rRNA operon and contained unique flanking sequence at the 16S and 5S ends. These plaques provided the templates for obtaining the unique sequence for each of the six rRNA operons.
  • restriction fragments from the sequence-derived map matched those from the physical map in size and relative order ( Figure 5).
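A sequence-derived restriction map of a circular chromosome can be sketched in a few lines. The example below is an illustration under stated assumptions, not the procedure used for Figure 5: the function name `circular_digest` is hypothetical, the cut is placed at the first base of each recognition site for simplicity, and the EcoRI site GAATTC is an arbitrary example because the enzymes are not named in this passage.

```python
def circular_digest(sequence, site):
    """Return fragment lengths, in map order, for a circular sequence cut at every site."""
    seq = sequence.upper()
    site = site.upper()
    n = len(seq)
    # Append the first len(site)-1 bases so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    cuts = []
    start = wrapped.find(site)
    while start != -1:
        cuts.append(start)
        start = wrapped.find(site, start + 1)
    if not cuts:
        return [n]  # the circle stays uncut
    # Distance from each cut to the next one, wrapping around the origin.
    # A single cut gives a distance of 0, which is replaced by the full length n.
    return [(cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n for i in range(len(cuts))]

# Toy 21 bp circle with two sites.
print(circular_digest("AAGAATTCTTTGAATTCCCCC", "GAATTC"))
```

The resulting list of sizes, read in order, is what would be compared with the fragment sizes and order of a physical map.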
  • Each contig was edited visually by reassembling overlapping 10 kb sections of contigs using the AB AutoAssembler™ and the Fast Data Finder™ hardware.
  • AutoAssembler™ provides a graphical interface to electropherogram data for editing.
  • The electropherogram data was used to assign the most likely base at each position. Where a discrepancy could not be resolved or a clear assignment made, the automatic base calls were left unchanged.
  • Individual sequence changes were written to the electropherogram files and a replication protocol (crash) was used to maintain the synchrony of sequence data between the H. influenzae database and the electropherogram files.
  • alignment containing the frameshift.
  • Apparent frameshifts were used to indicate areas of the sequence which may require further editing.
  • Frameshifts were not corrected in cases where clear electropherogram data disagreed with a frameshift.
  • Frameshift editing was performed with TIGR Editor. The rRNA and other repeat regions precluded complete assembly of the circular genome with TIGR Assembler. Final assembly of the genome was accomplished using comb_asm, which splices together contigs based on short overlaps.
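Splicing two contigs on a short overlap can be illustrated as below. This is a generic exact suffix-prefix join written for this example. It is not the comb_asm program, whose internals are not described here. The function name `splice` and the `min_overlap` default are assumptions.

```python
def splice(left, right, min_overlap=3):
    """Join two contigs when a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):  # try the longest overlap first
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None

print(splice("ACGTTGCA", "TGCAGGT"))  # overlap is "TGCA"
```

A real assembler would also have to tolerate base-call errors in the overlap; the exact-match test here is the simplest possible stand-in.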
  • The accuracy of the H. influenzae genome sequence is difficult to quantitate because there is very little previously determined H. influenzae sequence and most of these sequences are from other strains. There are, however, three parameters of accuracy that can be applied to the data.
  • The H. influenzae Rd genome is a circular chromosome of 1,830,121 bp.
  • The G/C content of the genome was examined with several window lengths to look for global structural features. With a window of 5,000 bp, the G/C content is relatively even except for 7 large G/C-rich regions and several A/T-rich regions (Fig. 5).
  • The G/C rich regions correspond to six rRNA operons and the location of a cryptic mu-like prophage.
  • Genes for several proteins with similarity to proteins encoded by bacteriophage mu are located at approximately position 1.56-1.59 Mbp of the genome. This area of the genome has a markedly higher G/C content than average for H. influenzae (~50% G/C compared to ~38% for the rest of the genome). No significance has yet been ascertained for the source or importance of the A/T rich regions.
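The windowed G/C scan described above can be sketched as follows. The function names `gc_windows` and `flag_gc_rich` are hypothetical, and the 0.45 threshold is an assumption chosen only because it lies between the ~38% background and the ~50% regions mentioned in the text. Windows here do not overlap unless a smaller `step` is given.

```python
def gc_windows(sequence, window=5000, step=None):
    """Yield (start, gc_fraction) for successive windows along a sequence."""
    seq = sequence.upper()
    step = step or window  # default: adjacent, non-overlapping windows
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, gc / window

def flag_gc_rich(sequence, window=5000, threshold=0.45):
    """Return the start positions of windows at or above the G/C threshold."""
    return [start for start, frac in gc_windows(sequence, window) if frac >= threshold]

# Toy sequence: an A/T block, a G/C block, then another A/T block (20 bp each).
toy = "AT" * 10 + "GC" * 10 + "AT" * 10
print(flag_gc_rich(toy, window=20))
```

On a real genome the flagged windows would be the candidates to compare against rRNA operon and prophage coordinates.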
  • The minimal origin of replication (oriC) in E. coli is a 245 bp region defined by three copies of a thirteen base pair repeat containing a GATC core sequence at one end and four copies of a nine base pair repeat containing a TTAT core sequence at the other end.
  • The GATC sites are methylation targets and control replication while the TTAT sites provide the binding sites for DnaA, the first step in the replication process (Genes V, B. Lewin Ed. (Oxford University Press, New York, 1994), chap. 18-19).
  • An approximately 281 bp sequence (602,483 - 602,764) whose limits are defined by these same core sequences appears to define the origin of replication in H. influenzae Rd.
  • Termination of E. coli replication is marked by two 23 bp termination sequences located ~100 kb on either side of the midway point at which the two replication forks meet. Two potential termination sequences sharing a 10 bp core sequence with the E. coli termination sequence were identified in H. influenzae at coordinates 1,375,949-1,375,958 and
  • Each rRNA operon contains three rRNA subunits and a variable spacer region in the order: 16S subunit - spacer region - 23S subunit - 5S subunit.
  • The subunit lengths are 1539 bp, 2653 bp, and 116 bp, respectively.
  • The G/C content of the three ribosomal subunits (50%) is higher than the genome as a whole.
  • The G/C content of the spacer region (38%) is consistent with the remainder of the genome.
  • The nucleotide sequence of the three rRNA subunits is 100% identical in all six ribosomal operons.
  • The rRNA operons can be grouped into two classes based on the spacer region between the 16S and 23S sequences.
  • The shorter of the two spacer regions is 478 bp in length (rrnB, rrnE, and rrnF) and contains the gene for tRNA Glu.
  • The longer spacer is 723 bp in length (rrnA, rrnC, and rrnD) and contains the genes for tRNA Ile and tRNA Ala.
  • The two sets of spacer regions are also 100% identical across each group of three operons.
  • tRNA genes are also present at the 16S and 5S ends of two of the rRNA operons.
  • The genes for tRNA Arg, tRNA His, and tRNA Pro are located at the 16S end of rrnE while the genes for tRNA Trp, and tRNA Asp are located at the 5S end of rrnA.
  • The predicted coding regions of the H. influenzae genome were initially defined by evaluating their coding potential with the program GeneMark (Borodovsky and McIninch, Computers Chem. 17(2):123 (1993)) using codon frequency matrices derived from 122 H. influenzae coding sequences in GenBank.
  • The predicted coding region sequences (plus 300 bp of flanking sequence) were used in searches against a database of non-redundant bacterial proteins (NRBP) created specifically for the annotation. Redundancy was removed from NRBP at two stages. All DNA coding sequences were extracted from GenBank (release 85), and sequences from the same species were searched against each other. Sequences having > 97% similarity over regions > 100 nucleotides were combined. In addition, the sequences were translated and used in protein comparisons with all sequences in Swiss-Prot (release 30). Sequences belonging to the same species and having > 98% similarity over 33 amino acids were combined.
  • NRBP is composed of 21,445 sequences extracted from 23,751 GenBank sequences and 11,183 Swiss-Prot sequences from 1,099 different species. A total of 1,749 predicted coding regions were identified. Searches of the H. influenzae predicted coding regions were performed using an algorithm that translates the query DNA sequence in the three plus-strand reading frames for searching against NRBP, identifies the protein sequences that match the query, and aligns the protein-protein matches using praze, a modified Smith-Waterman (Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444
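The first step of the search described above, translating a query in the three plus-strand reading frames, can be sketched as below. This is an illustration only and not the search program itself. The codon table is the standard genetic code written in the usual TCAG ordering. The function names are hypothetical, and codons containing characters other than A, C, G or T are rendered as "X" by assumption.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons are generated in TTT, TTC, TTA, TTG, TCT, ... order to line up with AMINO_ACIDS.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def three_frame_translations(dna):
    """Return the translations of the three plus-strand reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]

print(three_frame_translations("ATGGCCTAA"))
```

Each of the three protein strings would then be compared against the protein database, and the matching proteins aligned to the query.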
  • H. influenzae gene was assigned to one of 102 biological role categories adapted from Riley (Riley, M., Microbiology Reviews 57(4):862 (1993)). Assignments were made by linking the protein sequence of the predicted coding regions with the Swiss-Prot sequences in the Riley database. Of the 1,749 predicted coding regions, 724 have no role assignment. Of these, no database match was found for 384, while 340 matched "hypothetical proteins" in the database. Role assignments were made for 1,025 of the predicted coding regions. A compilation of all the predicted coding regions, their unique identifiers, a three letter gene identifier, percent identity, percent similarity, and amino acid match length are presented in
  • An annotated complete genome map of H. influenzae Rd is presented in Figures 6(A)-(D). The map places each predicted coding region on the H. influenzae chromosome, indicates its direction of transcription and color codes its role assignment. Role assignments are also represented in Figure 5.
  • A survey of the genes and their chromosomal organization in H. influenzae Rd makes possible a description of the metabolic processes H. influenzae requires for survival as a free living organism, the nutritional requirements for its growth in the laboratory, and the characteristics which make it unique from other organisms specifically as it relates to its pathogenicity and virulence.
  • The genome would be expected to have complete complements of certain classes of genes known to be essential for life. For example, there is a one-to-one correspondence of published E. coli ribosomal protein sequences to potential homologs in the H. influenzae database. Likewise, as shown in Table 1(a), an aminoacyl tRNA-synthetase is present in the genome for each amino acid. Finally, the location of tRNA genes was mapped onto the genome. There are 54 identified tRNA genes, including representatives of all 20 amino acids.
  • In order to survive as a free living organism, H. influenzae must produce energy in the form of ATP via fermentation and/or electron transport.
  • As a facultative anaerobe, H. influenzae Rd is known to ferment glucose, fructose, galactose, ribose, xylose and fucose (Dorocicz et al., J. Bacteriol. 175:7142 (1993)).
  • The genes identified in Table 1(a) indicate that transport systems are available for the uptake of these sugars via the phosphoenolpyruvate-phosphotransferase system (PTS), and via non-PTS mechanisms.
  • Hpr (ptsI and ptsH) of the PTS system were identified as well as the glucose specific crr gene.
  • The ptsH, ptsI, and crr genes constitute the pts operon.
  • A complete PTS system for fructose was identified.
  • Genes encoding the complete glycolytic pathway and for the production of fermentative end products were identified. Growth utilizing anaerobic respiratory mechanisms was found by identifying genes encoding functional electron transport systems using inorganic electron acceptors such as nitrates, nitrites, and dimethylsulfoxide. Genes encoding three enzymes of the tricarboxylic acid (TCA) cycle appear to be absent from the genome. Citrate synthase, isocitrate dehydrogenase, and aconitase were not found by searching the predicted coding regions or by using the E. coli enzymes as peptide queries against the entire genome in translation. This provides an explanation for the very high level of glutamate (1 g/L) which is required in defined culture media
  • Glutamate can be directed into the TCA cycle via conversion to alpha-ketoglutarate by glutamate dehydrogenase.
  • Glutamate presumably serves as the source of carbon for biosynthesis of amino acids using precursors which branch from the TCA cycle.
  • Functional electron transport systems are available for the production of ATP using oxygen as a terminal electron acceptor.
  • H. influenzae Rd possesses a highly efficient natural DNA transformation system (Kahn and Smith, J. Membrane Biol. 758:155 (1984)).
  • comA to comF comprise an operon which is under positive control by a 22-bp palindromic competence regulatory element (CRE) about one helix turn upstream of the promoter.
  • The rec-2 transformation gene is also controlled by this element. It is now possible to locate additional copies of CRE in the genome and discover potential transformation genes under CRE control. In addition, it may now be possible to discover other global regulatory elements with an ease not previously possible.
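Locating further copies of a regulatory element in a finished genome amounts to a motif scan on both strands. The sketch below is a generic illustration: the CRE sequence is not given in this passage, so the palindrome GAATTC is used purely as a stand-in, and the function names and the mismatch allowance are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_sites(genome, motif, max_mismatches=0):
    """Return (position, strand) for every window within max_mismatches of the motif."""
    genome, motif = genome.upper(), motif.upper()
    targets = {"+": motif}
    rc = reverse_complement(motif)
    if rc != motif:  # a perfect palindrome reads the same on both strands
        targets["-"] = rc
    hits = []
    for i in range(len(genome) - len(motif) + 1):
        window = genome[i:i + len(motif)]
        for strand, target in targets.items():
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((i, strand))
    return hits

# Stand-in palindrome; one exact copy and one copy with a single mismatch.
print(find_sites("TTGAATTCAAGAGTTCTT", "GAATTC", max_mismatches=1))
```

Genes lying just downstream of each hit would then be the candidates for control by the element.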
  • The regulator protein is generally a transcription factor which, when activated by the sensor, turns on or off expression of a specific set of genes (for review, see Albright et al., Ann. Rev. Genet. 23:311
  • representatives of the NtrC-class of regulators were found. This class of proteins interacts directly with the sigma-54 subunit of RNA polymerase, which is not present in H. influenzae. All of the regulator proteins fall into the OmpR subclass (Albright et al., Ann. Rev. Genet. 23:311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26:71 (1992)).
  • H. influenzae are adjacent to one another and presumably form an operon.
  • The nar and arc genes are not located adjacent to one another.
  • The non-pathogenic H. influenzae Rd strain varies significantly from the pathogenic serotype b strains. Many of the differences between these two strains appear in factors affecting infectivity. For example, the eight genes which make up the fimbrial gene cluster (van Ham et al., Mol. Microbiol. 13:673 (1994)) involved in adhesion of bacteria to host cells are now shown to be absent in the Rd strain.
  • The pepN and purE genes which flank the fimbrial cluster in H. influenzae type b strains are adjacent to one another in the Rd strain (Fig.
  • were matched by an H. influenzae sequence whereas only 38% of the hypothetical proteins were matched. Proteins are annotated as hypothetical based on a lack of matches with any other known protein (Yura et al., Nucleic Acids Research 20:3305 (1992); Burland et al., Genomics 16:551 (1993)). At least two potential explanations can be offered for the over representation of hypothetical proteins among those without matches: some of the hypothetical proteins are not, in fact, translated (at least in the annotated frame), or these are E. coli-specific proteins that are unlikely to be found in any species except those most closely related to E. coli, for example Salmonella typhimurium.
  • A total of 384 predicted coding regions did not display significant similarity with a six-frame translation of GenBank release 87. These unidentified coding regions were compared to one another with fasta. Several novel gene families were identified. For example, two predicted coding regions without database matches (HI0591, HI0852) share 75% identity over almost their entire lengths (139 and 143 amino acid residues respectively).
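A percent identity figure such as the one quoted above is computed from an alignment. The sketch below assumes the two sequences have already been aligned (for example by fasta) and use "-" for gaps. It counts identity over columns where both sequences have a residue, which is only one of several conventions in use. The function name and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns in which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Toy aligned peptides: 6 identical positions out of 8.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))
```

Dividing by the full alignment length or by the shorter sequence length instead would give slightly different values, so the convention should be stated alongside any reported figure.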
  • GSDB Sequence DataBase
  • Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art.
  • The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml.
  • Monoclonal or polyclonal antibody to the protein can then be prepared as follows:
  • Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued.
  • Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. Basic Methods in Molecular Biology Elsevier, New York. Section 21-2 (1989).
  • Polyclonal antiserum containing antibodies to heterogenous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate or excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
  • Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Wier, D., ed, Blackwell
  • Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
  • Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, Washington,
  • Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.
  • PCR primers are preferably at least 15 bases, and more preferably at least 18 bases in length.
  • The primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same.
  • The PCR primers and amplified DNA of this Example find use in the Examples that follow.
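The primer criteria above (minimum length, similar G/C ratio, similar melting temperature) can be checked with a short script. The sketch uses the common rule of thumb Tm = 2(A+T) + 4(G+C) in °C for short oligonucleotides, which is an approximation and is not taken from this document. The function names, the 4° tolerance and the two primer sequences are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in deg C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def primers_compatible(fwd, rev, min_len=15, max_tm_gap=4):
    """True if both primers meet the length minimum and their Tm values are close."""
    return (len(fwd) >= min_len and len(rev) >= min_len
            and abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_gap)

# Hypothetical 18-base primer pair.
fwd, rev = "ATGACCGTTAAGCTGGCA", "TTAGCGCATCGGTAACCT"
print(gc_fraction(fwd), wallace_tm(fwd), wallace_tm(rev), primers_compatible(fwd, rev))
```

Nearest-neighbor thermodynamic models give more accurate melting temperatures; the simple rule is used here only to make the G/C-to-Tm relationship concrete.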
  • A fragment of the Haemophilus influenzae Rd genome provided in Tables 1(a) or 2 is introduced into an expression vector using conventional technology.
  • Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art.
  • Commercially available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and
  • Codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767, incorporated herein by this reference.
  • The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Haemophilus genome fragment. Since the ORF lacks a poly A sequence because of the bacterial origin of the ORF, this sequence can be added to the construct by, for example, splicing out the poly A sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus.
  • The position of the LTRs in the construct allows efficient stable transfection.
  • The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene.
  • The Haemophilus DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Haemophilus DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding Haemophilus DNA 3' primer, taking care to ensure that the Haemophilus DNA is positioned such that it is followed by the poly A sequence.
  • The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A sequence and digested with BglII.
  • The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri). The protein is preferably released into the supernatant.
  • The protein may additionally be retained within the cell or expression may be restricted to the cell surface.
  • The Haemophilus DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as a chimeric with, for example, β-globin.
  • Antibody to β-globin is used to purify the chimeric.
  • Corresponding protease cleavage sites engineered between the β-globin gene and the Haemophilus DNA are then used to separate the two polypeptide fragments from one another after translation.
  • One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene). This vector encodes rabbit β-globin.
  • Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression.
  • Polypeptide may additionally be produced from either construct using in vitro translation systems such as In vitro Express™ Translation Kit (Stratagene).
  • HI1704 lsg locus hypothetical protein (GB:M94855_3)
  • HI1705 lsg locus hypothetical protein (GB:M94855_2)
  • Random small insert and large insert library construction: Randomly sheared genomic DNA on the order of 2 kb and 15-20 kb respectively
  • Gap closure, physical gaps: Order all contigs (fingerprints, peptide links, lambda clones, PCR) and provide templates for closure
  • Annotation: Identification and description of all predicted coding regions (putative identifications, starts and stops, role assignments, operons, regulatory regions)
  • ADDRESSEE Sterne, Kessler, Goldstein & Fox, P.L.L.C.
  • TAGAATTTAA TTTACGCTCT AAAAATGAAC AAGGGATCAC TAAAAATAAT TTAAAACAAT 540
  • CAACAAGCCA AAACTCGTAC AAATATGACC GCACTTCGCT ATAAAGAACA CGGCTTGTGG 720
  • CAAAAACAAT TATCGGATTT ATTTACGATT ATTTATACCT CAGGCACAAC GGGAGAGCCT 1200
  • CTTAATGTGA CAGATCAGGA TATTTCACTT TCTTTTTTAC CATTCTCTCA TATTTTTGAA 1320
  • TCAATCGGCA CACTGATGCC AAAAGCGGAA GTGAAAATTG GGGAAAATAA TGAAATCCTT 1860
  • CACATGTTCT AGCATATCCA GATCATTAAA ATTATCGCCA AATGCAATCA CTTCATTAGT 2640
  • TTGTACGCTA TAAATTGGTT CGAGATTTTG GTTCAAAATA AGCGCACCAC TAAATGCAAC 3060
  • GTTACTGCTA AATCCGTTGC GGCAGAAATC GTGACTAACA TTTCCACCCC GCAAGCCACT 4140
  • ATCTAGATCA CAATATTTGT TGAAGCTAAT GTCATTTGAT TATGGATTTA TCGTTAAAAA 4920
  • AACACTATAC CAACGCATCA TTCTTAATTG ATGAAGGTTT CAAATTTGAA GATGGTTTAT 6060
  • GGTTTAACAC CAGCACCACA AGCGAGAGAT CATAAAGTTG AAATCGCAAA ATTAATTGAT 8280
  • AAAGTGCGGT CAATTTTTCC GTTGAATTTC AAGGTCAAGC GGATTTAATC GTTTATTATT 12000
  • ATCCCCATCA CAATAATGGG GATTTTTATT ATGCGTATAA ATTTTACCGC ATTTTATTCA 12840
  • TCCCCCATCA CAAAATACTG CCCCTCTGGC ACAAGCCATT CTGCAGTTTG CATTCCCTCT 14640

Landscapes

  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Communicable Diseases (AREA)
  • Peptides Or Proteins (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
EP96912845A 1995-04-21 1996-04-22 Nucleotide sequence of the Haemophilus influenzae Rd genome, fragments thereof, and uses thereof Withdrawn EP0821737A4 (de)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US476102 1983-03-17
US42678795A 1995-04-21 1995-04-21
US426787 1995-04-21
US08/476,102 US6355450B1 (en) 1995-04-21 1995-06-07 Computer readable genomic sequence of Haemophilus influenzae Rd, fragments thereof, and uses thereof
US487429 1995-06-07
US08/487,429 US6468765B1 (en) 1995-04-21 1995-06-07 Selected Haemophilus influenzae Rd polynucleotides and polypeptides
PCT/US1996/005320 WO1996033276A1 (en) 1995-04-21 1996-04-22 NUCLEOTIDE SEQUENCE OF THE HAEMOPHILUS INFLUENZAE Rd GENOME, FRAGMENTS THEREOF, AND USES THEREOF

Publications (2)

Publication Number Publication Date
EP0821737A1 true EP0821737A1 (de) 1998-02-04
EP0821737A4 EP0821737A4 (de) 2005-01-19

Family

ID=27411524

Family Applications (1)

Application Number Title Priority Date Filing Date
EP96912845A Withdrawn EP0821737A4 (de) 1995-04-21 1996-04-22 Nucleotide sequence of the Haemophilus influenzae Rd genome, fragments thereof, and uses thereof

Country Status (5)

Country Link
EP (1) EP0821737A4 (de)
JP (1) JPH11501520A (de)
AU (1) AU5552396A (de)
CA (1) CA2218741A1 (de)
WO (1) WO1996033276A1 (de)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6676948B2 (en) * 1994-08-25 2004-01-13 Washington University Haemophilus adherence and penetration proteins
GB9607993D0 (en) * 1996-04-18 1996-06-19 Smithkline Beecham Plc Novel compounds
JP4469026B2 (ja) * 1996-10-31 2010-05-26 Human Genome Sciences, Inc. Streptococcus pneumoniae antigens and vaccines
CA2289116A1 (en) 1997-05-06 1998-11-12 Human Genome Sciences, Inc. Enterococcus faecalis polynucleotides and polypeptides
EP0896061A3 (de) * 1997-08-08 2000-07-26 Smithkline Beecham Corporation RpoA gene from Staphylococcus aureus
US6548633B1 (en) 1998-12-22 2003-04-15 Genset, S.A. Complementary DNA's encoding proteins with signal peptides
AU764441B2 (en) 1998-02-09 2003-08-21 Genset S.A. cDNAs encoding secreted proteins
GB9805792D0 (en) * 1998-03-18 1998-05-13 Glaxo Group Ltd Bacterial polypeptide family
GB9808423D0 (en) * 1998-04-22 1998-06-17 Glaxo Group Ltd Bacterial polypeptide family
GB9808350D0 (en) * 1998-04-22 1998-06-17 Glaxo Group Ltd Bacterial polypeptide family
GB9808366D0 (en) * 1998-04-22 1998-06-17 Glaxo Group Ltd Bacterial polypeptide family
GB9808363D0 (en) * 1998-04-22 1998-06-17 Glaxo Group Ltd Bacterial polypeptide family
GB9808866D0 (en) 1998-04-24 1998-06-24 Smithkline Beecham Biolog Novel compounds
RU2227043C2 (ru) * 1998-05-01 2004-04-20 Chiron Corporation Neisseria meningitidis antigens and compositions
EP2261338A3 (de) 1998-05-01 2012-01-04 Novartis Vaccines and Diagnostics, Inc. Antigens and compositions against Neisseria meningitidis
CA2341765A1 (en) * 1998-08-24 2000-03-02 Mount Sinai Hospital Trna binding domain
GB9902880D0 (en) * 1999-02-09 1999-03-31 Smithkline Beecham Biolog Novel compounds
GB9904183D0 (en) * 1999-02-24 1999-04-14 Smithkline Beecham Biolog Novel compounds
GB9914945D0 (en) * 1999-06-25 1999-08-25 Smithkline Beecham Biolog Novel compounds
WO2001011033A2 (en) * 1999-08-04 2001-02-15 Abbott Laboratories Identification of genes essential for the survival of haemophilus influenzae through genome scanning by transposition mutagenesis
EP1136557A1 (de) * 2000-03-21 2001-09-26 De Staat Der Nederlanden Vertegenwoordigd Door De Minister Van Welzijn, Volksgezondheid En Cultuur Proteins and polypeptides from Haemophilus influenzae involved in paracytosis, nucleic acid sequences encoding them, and uses thereof
WO2002018601A2 (en) * 2000-08-25 2002-03-07 Abbott Laboratories Essential bacteria genes and genome scanning in haemophilus influenzae for the identification of 'essential genes'
WO2002028889A2 (en) * 2000-10-02 2002-04-11 Shire Biochem Inc. Haemophilus influenzae antigens and corresponding dna fragments
AU2007202270B8 (en) * 2000-10-02 2011-10-13 Id Biomedical Corporation Of Quebec Haemophilus influenzae antigens and corresponding DNA fragments
GB0025171D0 (en) * 2000-10-13 2000-11-29 Smithkline Beecham Biolog Novel compounds
GB0025169D0 (en) * 2000-10-13 2000-11-29 Smithkline Beecham Biolog Novel compounds
GB0025488D0 (en) * 2000-10-17 2000-11-29 Smithkline Beecham Biolog Novel compounds
GB0025998D0 (en) * 2000-10-24 2000-12-13 Smithkline Beecham Biolog Novel compounds
WO2002046215A2 (en) * 2000-12-08 2002-06-13 Glaxosmithkline Biologicals S.A. Basb221 polypeptides and polynucleotides encoding basb221 polypeptides
GB0103866D0 (en) * 2001-02-16 2001-04-04 Smithkline Beecham Biolog Novel compounds
WO2002077020A2 (en) * 2001-03-22 2002-10-03 Isis Innovation Limited Virulence genes in h. influenzae
WO2002088361A2 (en) * 2001-04-30 2002-11-07 Glaxosmithkline Biologicals S.A. Haemophilus influenzae antigens
CN1659280A (zh) 2002-04-09 2005-08-24 雀巢制品公司 Genome of the La1 Lactobacillus strain
CN103589650A (zh) 2005-03-18 2014-02-19 Microbia, Inc. Production of carotenoids in oleaginous yeast and fungi
CA2871088A1 (en) * 2005-06-16 2006-12-28 Lauren O. Bakaletz Genes of an otitis media isolate of nontypeable haemophilus influenzae
WO2008042338A2 (en) 2006-09-28 2008-04-10 Microbia, Inc. Production of carotenoids in oleaginous yeast and fungi

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DATABASE EMBL SEQUENCE DATABASE EBI, HINXTON, UK; 23 July 1991 (1991-07-23), MASKELL D.J.: "Haemophilus influenzae lic3 locus, containing galE and adk genes for UDP-galactose 4-epimerase" XP002293203 Database accession no. X57315 -& DATABASE EMBL SEQUENCE DATABASE EBI, HINXTON, UK; 1 May 1992 (1992-05-01), "Y350 HAEIN" XP002293204 Database accession no. P24326 -& D,J. MASKELL ET AL.: "Molecular analysis of a complex locus from Haemophilus influenzae involved in phase-variable lipopolysaccharide biosynthesis" MOL. MICROBIOLOGY,, vol. 5, no. 5, 1991, pages 1013-1022, XP008034312 BLACKWELL SCIENTIFIC, OXFORD, GB , ISSN 0950-382X *
DATABASE EMBL SEQUENCE DATABASE EBI, HINXTON, UK; 8 October 1994 (1994-10-08), HANSEN E.J.: "Haemophils influenzae biopolymer transport protein (exbB and exbD) complete coding sequence" XP002293180 Database accession no. U08209 -& DATABASE EMBL SEQUENCE DATABASE EBI, HINXTON, UK; 1 November 1995 (1995-11-01), "EXBD_HAEIN" XP002293194 Database accession no. P43009 -& JAROSIK G P ET AL: "Cloning and sequencing of the Haemophilus influenzae exbB and exbD genes" GENE, ELSEVIER BIOMEDICAL PRESS. AMSTERDAM, NL, vol. 152, no. 1, 11 January 1995 (1995-01-11), pages 89-92, XP004042594 ISSN: 0378-1119 *
See also references of WO9633276A1 *

Also Published As

Publication number Publication date
CA2218741A1 (en) 1996-10-24
WO1996033276A1 (en) 1996-10-24
JPH11501520A (ja) 1999-02-09
EP0821737A4 (de) 2005-01-19
AU5552396A (en) 1996-11-07

Similar Documents

Publication Publication Date Title
EP0821737A1 (de) Nucleotide sequence of the Haemophilus influenzae Rd genome, fragments thereof, and uses thereof
AU745787B2 (en) Enterococcus faecalis polynucleotides and polypeptides
KR101914245B1 (ko) Compositions comprising bacterial strains
KR100923598B1 (ko) Surface proteins of Streptococcus pyogenes
JPH09322781A (ja) Staphylococcus aureus polynucleotides and sequences
AU2016357553A1 (en) Compositions comprising bacterial strains
EP0941335A2 (de) Streptococcus pneumoniae polynucleotides and sequences
KR101986442B1 (ko) Biomarkers for rheumatoid arthritis and uses thereof
RU2673715C2 (ru) Vaccine against Haemophilus parasuis serotype 4
KR102191537B1 (ko) Selection of lactic acid bacteria that prevent bone loss in mammals and uses thereof
JPH09252787A (ja) Nucleotide sequence of the Mycoplasma genitalium genome or fragments thereof, and uses thereof
AU2022256122A1 (en) Novel Proteins From Anaerobic Fungi And Uses Thereof
WO1998058943A1 (en) Borrelia burgdorferi polynucleotides and sequences
AU2016295176A1 (en) Genetic testing for predicting resistance of gram-negative proteus against antimicrobial agents
KR20200038970A (ko) Compositions comprising bacterial strains
EP1337552A2 (de) Light-driven energy generation using proteorhodopsin
KR102797387B1 (ko) SLAM polynucleotides and polypeptides and uses thereof
AU2018256922B2 (en) Targeted gene disruption methods and immunogenic compositions
KR20220135669A (ko) Novel Bacillus subtilis strain producing surfactin and uses thereof
KR20190059562A (ko) Novel Bacillus subtilis having γPGA activity and uses thereof
AU777190B2 (en) Streptococcus pneumoniae polynucleotides and sequences
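KR20060060389A (ko) Genome sequence of Zymomonas mobilis ZM4 and novel genes involved in ethanol production
KR20230172913A (ko) Novel bacteriophage isolated from livestock farms for specific control of pathogenic Escherichia coli or Shigella species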
AU1546202A (en) Enterococcus faecalis polynucleotides and polypeptides

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19971120

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RIC1 Information provided on ipc code assigned before grant

Ipc: 7A 61K 39/102 B

Ipc: 7C 12Q 1/68 B

Ipc: 7C 07K 16/12 B

Ipc: 7C 07K 14/285 B

Ipc: 7C 12N 15/31 A

A4 Supplementary search report drawn up and despatched

Effective date: 20041206

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20050429