WO2024059750A2 - Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation


Info

Publication number
WO2024059750A2
Authority
WO
WIPO (PCT)
Prior art keywords
peptide
ovarian cancer
structures
peptide structure
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2023/074251
Other languages
English (en)
Other versions
WO2024059750A3 (fr)
Inventor
Chirag DHAR
Prasanna Ramachandran
Tomislav CAVAL
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Venn Biosciences Corp
Original Assignee
Venn Biosciences Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Venn Biosciences Corp
Priority to EP23866505.3A (patent/EP4587839A2/fr)
Publication of WO2024059750A2 (patent/WO2024059750A2/fr)
Publication of WO2024059750A3 (patent/WO2024059750A3/fr)
Anticipated expiration (legal status)
Ceased (current legal status)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6842 Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding

Definitions

  • Embodiments of the present disclosure generally relate to methods and systems for analyzing peptide structures for diagnosing and/or treating ovarian cancer. More particularly, embodiments of the present disclosure relate to analyzing quantification data for a set of peptide structures detected in a biological sample obtained from a subject for use in diagnosing and/or treating the subject, the set of peptide structures being associated with ovarian cancer.
  • Protein glycosylation and other post-translational modifications play vital roles in virtually all aspects of human physiology. Unsurprisingly, faulty or altered protein glycosylation often accompanies various disease states. The identification of aberrant glycosylation provides opportunities for early detection, intervention, and treatment of affected subjects.
  • Current biomarker identification methods, such as those developed in the fields of proteomics and genomics, can be used to detect indicators of certain diseases, such as cancer, and to differentiate certain types of cancer from other, non-cancerous diseases.
  • glycoproteomic analyses have not previously been used to successfully identify disease processes.
  • Glycoprotein analysis is fraught with challenges on several levels.
  • a single glycan composition in a peptide can contain a large number of isomeric structures due to different glycosidic linkages, branching patterns, and/or multiple monosaccharides having the same mass.
  • the presence of multiple glycans that share the same peptide backbone can lead to assay signals from various glycoforms, lowering their individual abundances compared to aglycosylated peptides. Accordingly, the development of algorithms that can identify glycan structures on peptide fragments remains elusive.
  • EOC Epithelial ovarian cancer
  • the majority of EOC cases are diagnosed at a late stage (stage III or IV), with 5-year survival rates between about 15% and 40%. Diagnosing early-stage EOC is impeded by initial clinical signs and symptoms that are generally nonspecific and commonly missed, such as pelvic pain, urinary urgency/frequency, abdominal bloating, early satiety, loss of appetite, and weight loss.
  • An approach that is non-invasive, accurate, and reliable and that enables early diagnosis is needed.
  • An approach enabling early diagnosis may help reduce negative health outcomes in patients with ovarian cancer, reduce the under-treatment of ovarian cancer, and/or reduce the over-treatment of benign disease.
  • more strategic treatments can be provided with a diagnostic test that can assess whether a subject has early stage or late stage ovarian cancer.
  • a method for diagnosing a subject with respect to an ovarian cancer disease state includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data can be analyzed using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on at least one peptide structure selected from one of a group of peptide structures identified in Tables 3B, 3C, or 3D.
  • a diagnosis output can be generated based on the disease indicator.
  • the disease indicator can include a score.
  • the method of generating the diagnosis output can include determining that the score falls above a selected threshold and generating the diagnosis output based on the score falling above the selected threshold, wherein the diagnosis output includes a classification of late stage ovarian cancer disease state.
  • the method of generating the diagnosis output can include determining that the score falls below a selected threshold and generating the diagnosis output based on the score falling below the selected threshold, wherein the diagnosis output includes a classification of early stage ovarian cancer disease state.
  • the score may include a probability score and the selected threshold is 0.5. Alternatively, the selected threshold may fall within a range between 0.30 and 0.65.
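The scoring and thresholding logic above can be sketched in Python. This is a minimal sketch, assuming the disease indicator is the probability output of a logistic regression model (consistent with the logistic regression embodiment described here). The function names, feature names, and coefficients are hypothetical. The document does not say how a score exactly equal to the threshold is handled, so the sketch assigns it to late stage as an assumption.

```python
import math
from typing import Mapping


def disease_indicator(features: Mapping[str, float],
                      coefficients: Mapping[str, float],
                      intercept: float) -> float:
    """Logistic-regression probability score from peptide structure features.

    The coefficient names and values are hypothetical placeholders, not the
    fitted model described in this document.
    """
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    # Numerically stable logistic (sigmoid) function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stage_call(score: float, threshold: float = 0.5) -> str:
    """Turn a probability score into an early- or late-stage classification."""
    if not 0.30 <= threshold <= 0.65:
        raise ValueError("threshold outside the 0.30-0.65 range described above")
    # Assumption: a score exactly equal to the threshold is reported as late stage.
    return "late stage" if score >= threshold else "early stage"
```

For example, `stage_call(disease_indicator({"GP_A": 1.0, "GP_B": 0.5}, {"GP_A": 1.2, "GP_B": -0.8}, 0.0))` produces a score of about 0.69 and returns `"late stage"`. Rejecting thresholds outside 0.30 to 0.65 is a design choice for this sketch, not a requirement stated in the document.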
  • the analyzing the peptide structure data can include analyzing the peptide structure data using a binary classification model.
  • the peptide structure of the at least one peptide structure can include a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D as defined in Table 5.
  • the method can include training the supervised machine learning model using training data, wherein the training data comprises a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects, wherein the plurality of subject diagnoses includes a diagnosis for any subject of the plurality of subjects determined to have early stage or late stage ovarian cancer.
  • the method can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the classification of early stage ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the classification of late stage ovarian cancer disease state; identifying a training group of peptide structures based on the differential expression analysis for use as prognostic markers for the ovarian cancer disease state; and forming the training data based on the training group of peptide structures identified.
  • the training of the supervised machine learning model can include reducing the training group of peptide structures to a final group of peptide structures identified in Tables 3B, 3C, or 3D.
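The differential expression step can be illustrated with a small sketch. The document does not specify the statistical test or the cutoffs, so this sketch makes three assumptions: it uses a log2 fold change of mean late-stage over mean early-stage abundance, a Welch's t statistic, and a hypothetical |log2 FC| >= 1 cutoff. The peptide structure names are placeholders.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for mean(b) - mean(a); each group needs >= 2 values."""
    diff = mean(b) - mean(a)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def differential_expression(early: Dict[str, List[float]],
                            late: Dict[str, List[float]],
                            min_abs_log2fc: float = 1.0) -> List[Tuple[str, float, float]]:
    """Compare late- vs early-stage abundances for each peptide structure.

    Returns (name, log2 fold change late/early, Welch t) for structures whose
    |log2 fold change| reaches the cutoff, largest effect first. Abundances
    must be positive for the fold change to be defined.
    """
    hits = []
    for name in sorted(early.keys() & late.keys()):
        log2fc = math.log2(mean(late[name]) / mean(early[name]))
        if abs(log2fc) >= min_abs_log2fc:
            hits.append((name, log2fc, welch_t(early[name], late[name])))
    hits.sort(key=lambda h: abs(h[1]), reverse=True)
    return hits
```

A real analysis would usually add p-values and a multiple-testing correction (for example, with SciPy). These are omitted here to keep the sketch within the standard library.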
  • each peptide structure profile of the plurality of peptide structure profiles can include a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure.
  • the plurality of peptide structure profiles can include a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure.
  • the supervised machine learning model can include a logistic regression model.
  • the first group of peptide structures in Tables 3B, 3C, or 3D is used to distinguish between the ovarian cancer disease state being late stage or early stage.
  • the quantification data for a peptide structure of the set of peptide structures can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the peptide structure data can be generated using multiple reaction monitoring mass spectrometry (MRM-MS), wherein the using of the MRM-MS includes ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
  • MRM-MS multiple reaction monitoring mass spectrometry
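The MRM-MS selection logic can be sketched as a filter over detection events. This is a simplified model. It assumes each detected product ion is recorded together with the precursor m/z that passed the first mass filter. The `Transition` type, the +/- 0.7 m/z window, and the example m/z values are hypothetical. On the instrument, ionization, Q1 filtering, collision-induced fragmentation, and Q3 detection happen physically rather than in software. `precursor_mz` uses the standard [M + zH]z+ relation with a proton mass of 1.007276 Da.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

PROTON_MASS = 1.007276  # Da


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion formed during ionization."""
    return (neutral_mass + charge * PROTON_MASS) / charge


@dataclass(frozen=True)
class Transition:
    """Hypothetical MRM transition: precursor (Q1) and product (Q3) m/z."""
    name: str
    q1_mz: float
    q3_mz: float


def transition_signal(events: Iterable[Tuple[float, float, float]],
                      t: Transition, tol: float = 0.7) -> float:
    """Sum intensities of (precursor m/z, product m/z, intensity) events that
    pass the Q1 mass filter and match the Q3 product ion within +/- tol."""
    total = 0.0
    for prec, prod, intensity in events:
        # Q1: keep only ions whose precursor m/z falls in the filter window
        # Q3: keep only fragments matching the target product ion
        if abs(prec - t.q1_mz) <= tol and abs(prod - t.q3_mz) <= tol:
            total += intensity
    return total
```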
  • the method can include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the method of classifying early and late stage ovarian cancer can be implemented after the subject has already been diagnosed as having ovarian cancer.
  • the subject can be initially diagnosed for having ovarian cancer using one or more biomarkers in Tables 1, 2, or 3.
  • the generating the diagnosis output can include generating a report identifying that the biological sample evidences the early stage or late stage ovarian cancer disease state.
  • a treatment output can be generated based on at least one of the diagnosis output or the disease indicator.
  • the treatment output can include at least one of an identification of a treatment to treat the subject or a treatment plan.
  • the treatment can include at least one of surgery, radiation therapy, a targeted drug therapy, chemotherapy, immunotherapy, hormone therapy, or neoadjuvant therapy.
  • the group of peptide structures in Tables 3B, 3C, or 3D is listed in order of relative significance to the disease indicator.
  • the method can further include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the method can further include generating the peptide structure data from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
  • a method of training a model to diagnose a subject with respect to an ovarian cancer disease state having a malignant pelvic tumor is described.
  • the method can include receiving quantification data for a panel of peptide structures for a plurality of samples for a plurality of subjects.
  • the plurality of subjects includes a first portion diagnosed with a classification of early stage ovarian cancer disease state and a second portion diagnosed with a classification of late stage ovarian cancer disease state.
  • the quantification data can include a plurality of peptide structure profiles for the plurality of subjects and training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state, wherein the group of peptide structures is identified in Tables 3B, 3C, or 3D.
  • the machine learning model can include a logistic regression model.
  • the method of training the model can further include identifying an initial plurality of peptide structure profiles, filtering the initial plurality of peptide structure profiles by a coefficient of variation to generate a plurality of peptide structure profiles for use in training the machine learning model.
  • the filtering can be performed to exclude peptide structure profiles having the coefficient of variation at or above 20%.
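Here is a minimal sketch of the coefficient-of-variation filter. It assumes CV is computed as sample standard deviation divided by mean over replicate measurements of each peptide structure, for example repeated injections of a pooled quality-control sample. The document does not say which replicates are used. Profiles at or above 20% are excluded, matching the cutoff above. The structure names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def cv_filter(replicates: Dict[str, List[float]], max_cv: float = 0.20) -> Dict[str, float]:
    """Keep peptide structures whose coefficient of variation (sample SD / mean)
    is below max_cv; structures at or above the cutoff are excluded.
    Returns the retained structures mapped to their CV."""
    kept = {}
    for name, values in replicates.items():
        m = mean(values)
        if m <= 0:
            continue  # CV is undefined for non-positive means; dropped (assumption)
        cv = stdev(values) / m
        if cv < max_cv:
            kept[name] = cv
    return kept
```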
  • the training of the machine learning model can include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Tables 3B, 3C, or 3D.
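One way to realize the LASSO reduction is L1-penalized logistic regression. The sketch below fits it with proximal gradient descent (ISTA) in plain Python so that it is self-contained. This is an assumption about the implementation, not the document's method. The penalty strength `lam`, the learning rate, and the iteration count are illustrative. In practice, a library routine such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would typically be used. Features whose coefficients shrink exactly to zero are dropped, and the survivors form the reduced group of peptide structures.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(w: float, t: float) -> float:
    # Proximal operator of t * |w|: shrinks toward zero, exactly zero inside [-t, t]
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lam: float = 0.05, lr: float = 0.5,
                   n_iter: int = 2000) -> Tuple[List[float], float]:
    """Fit an L1-penalized logistic regression by proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += r * xi[j] / n
            grad_b += r / n
        # Gradient step on the log-loss, then L1 shrinkage of the weights
        w = [_soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(p)]
        b -= lr * grad_b
    return w, b


def selected_features(names: Sequence[str], w: Sequence[float]) -> List[str]:
    """Peptide structures retained (non-zero coefficient) after LASSO shrinkage."""
    return [name for name, wj in zip(names, w) if wj != 0.0]
```

Features are usually standardized before fitting so that the L1 penalty treats relative abundances and concentrations on a comparable scale. The value of `lam` would normally be chosen by cross-validation.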
  • the quantification data for the panel of peptide structures for the plurality of subjects diagnosed with the plurality of ovarian cancer disease states can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the trained model can use a relative abundance for a first portion of the first group of peptide structures and a concentration for a second portion of the second group of peptide structures.
  • Each peptide structure profile of the plurality of peptide structure profiles includes a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure.
  • the plurality of peptide structure profiles can include a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure.
  • a composition can include at least one of peptide structures identified in Tables 3B, 3C, or 3D.
  • a method for diagnosing a subject with respect to an ovarian cancer disease state is described. The method can include analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether a biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on a group of glycopeptide structures.
  • the group of glycopeptide structures can include tri-antennary or tetra-antennary sialic acid moieties, wherein a portion of the glycopeptide structures of the group are fucosylated. A diagnosis is then output based on the disease indicator.
  • the group of glycopeptide structures can include at least one, at least three, at least five, or at least 10 glycopeptide structures identified in Tables 3B, 3C, or 3D.
  • the peptide structure data was generated with a mass spectrometer using the biological sample obtained from the subject.
  • the method can further include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the peptide structure data can be generated from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
  • the use of the MRM-MS can include ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
  • a system comprising one or more data processors is described according to various embodiments.
  • the system comprises a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of any of the methods described herein.
  • a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of any one of the methods described according to various embodiments.
  • a system is described according to various embodiments.
  • the system comprises one or more data processors and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of any one or more of the methods described herein.
  • a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of any one or more of the methods described herein.
  • the peptide structure data is listed in Table 3D and the detected product ion comprises a first product ion having an m/z value listed in Table 4C.
  • the at least one peptide structure comprises a peptide sequence and a glycan structure, wherein the glycan structure is attached to a linking site position in the peptide sequence in accordance with one of Tables 1, 2, 3, 3B, 3C, and 3D.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a symbol structure in accordance with the glycan structure GL number according to Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a composition in accordance with the glycan structure GL number, Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • a rightmost N-acetylgalactosamine (open square) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 3 and 5.
  • a bottommost N-acetylglucosamine (dark square) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, 3D, and 5.
  • a composition comprising one or more peptide structures from Tables 1, 2, 3, 3B, 3C, and 3D.
  • the at least one peptide structure comprises a peptide sequence and a glycan structure, wherein the glycan structure is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, and 3D.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a symbol structure in accordance with the glycan structure GL number according to Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a composition in accordance with the glycan structure GL number, Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • a rightmost N-acetylgalactosamine (GalNAc) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 3 and 5.
  • a bottommost N-acetylglucosamine (GlcNAc) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, 3D, and 5.
  • the peptide sequence can be one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171.
  • the peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171 in Table 3D as defined in Table 5.
  • the peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 159-162, 166, and 171 in Table 3D as defined in Table 5.
  • the glycan structure corresponding to the peptide sequence of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171 includes a fucose and the fucose is in an outer arm orientation.
  • a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171 in Table 3D as defined in Table 5, wherein a fucose of the glycan structure comprises an outer arm orientation.
  • the at least one peptide structure is selected from one of a group of peptide structures identified in Table 3D.
  • a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131-134, 137, 139, 140, 143, 151, 165-167 in Table 3D as defined in Table 5.
  • the glycan structure corresponding to the peptide sequence of SEQ ID NOS: 131, 137, and 143 includes a fucose, and the fucose is in an outer arm orientation.
  • the outer arm orientation of the fucose comprises the fucose being linked to an N-acetylglucosamine by an α-(1-3/4) linkage.
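The distinction between outer-arm and core fucosylation can be expressed as a small classification rule. This sketch assumes each fucose is annotated with its acceptor residue and linkage position. The record type and the acceptor labels are hypothetical. "Outer arm" follows the definition above (α-(1-3/4) to an antennary GlcNAc). "Core" fucose is the standard α-(1-6) linkage to the Asn-linked GlcNAc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FucoseLinkage:
    """Hypothetical annotation of one fucose residue in a glycan structure."""
    acceptor: str   # e.g. "core GlcNAc", "antenna GlcNAc", "Gal" (illustrative labels)
    position: int   # acceptor carbon: 3 for alpha-(1-3), 4 for alpha-(1-4), 6 for alpha-(1-6)


def fucose_orientation(link: FucoseLinkage) -> str:
    """Classify a fucose as 'outer arm', 'core', or 'other'."""
    if link.acceptor == "antenna GlcNAc" and link.position in (3, 4):
        return "outer arm"   # alpha-(1-3/4) to an antennary GlcNAc, as defined above
    if link.acceptor == "core GlcNAc" and link.position == 6:
        return "core"        # alpha-(1-6) to the Asn-linked GlcNAc
    return "other"           # anything this sketch does not classify
```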
  • a method of treating ovarian cancer in an individual comprising administering to the individual an ovarian cancer therapy, wherein the individual has been determined to be responsive to the ovarian cancer therapy via a trained machine learning classifier that distinguishes between responsive and non-responsive individuals who have received the ovarian cancer therapy, based at least in part on a group of peptide structures identified in Tables 3B, 3C, or 3D.
  • Figure 1 is a schematic diagram of an exemplary workflow 100 for the detection of peptide structures associated with a disease state for use in diagnosis and/or treatment in accordance with one or more embodiments.
  • Figure 2A is a schematic diagram of a preparation workflow in accordance with one or more embodiments.
  • Figure 2B is a schematic diagram of data acquisition in accordance with one or more embodiments.
  • Figure 3 is a block diagram of an analysis system in accordance with one or more embodiments.
  • Figure 4 is a block diagram of a computer system in accordance with various embodiments.
  • Figure 5 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments based on Tables 1 or 2.
  • Figure 6 is a flowchart of a process for diagnosing a subject with respect to ovarian cancer disease state in accordance with one or more embodiments based on Table 3.
  • Figure 6B is a flowchart of a process for diagnosing a subject with respect to ovarian cancer disease state in accordance with one or more embodiments based on Table 3B.
  • Figure 7 is a flowchart of a process for training a model to diagnose a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Figure 8 is a table describing the distribution of the samples acquired in this exemplary retrospective analysis in accordance with one or more embodiments.
  • Figure 9 is a plot diagram illustrating the results of a principal component analysis performed to assess the segregation between healthy, benign pelvic tumor, and EOC samples across first and second principal components in accordance with one or more embodiments.
  • Figure 10 is a plot diagram illustrating the results of a principal component analysis performed to assess segregation between healthy, benign pelvic tumor, early EOC, late EOC, and missing (undocumented) samples.
  • Figure 11 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • ROC receiver operating characteristic
  • Figure 12 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • Figure 13 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • Figure 14 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • Figures 15A to 15E are a plurality of charts illustrating the upregulation of fucosylated biomarkers having tri- or tetra-antennary sialic acids from stages 1/2 to 3/4 of ovarian cancer and the downregulation of non-fucosylated biomarkers having tri- or tetra-antennary sialic acids from stages 1/2 to 3/4 of ovarian cancer.
  • Figure 16 is an illustration of a diagram showing the probability distributions for early stage v. late stage ovarian cancer using training data set and the testing data set in accordance with one or more embodiments using the biomarkers of Table 3C.
  • Figure 17 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict early stage v. late stage ovarian cancer in accordance with one or more embodiments.
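The ROC diagrams summarize how well a model's scores separate two groups. As a sketch (not the document's evaluation code), the area under the ROC curve can be computed as the probability that a randomly chosen positive sample (for example, late stage) scores higher than a randomly chosen negative one, with ties counted as one half. The pairwise loop is O(n·m) and is meant for illustration only.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 for the positive class (e.g. late stage), 0 for the negative class.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```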
  • Figure 18 is a graph illustrating the fold changes for a plurality of glycopeptides with tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
  • Figure 19A is a graph illustrating the fold changes for pairs of glycopeptides with tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
  • Figure 19B is a graph illustrating the fold changes for triplets of glycopeptides with tri- and tetra-antennary glycans that were either non-fucosylated, mono-fucosylated, or di-fucosylated. Both the mono-fucosylated and di-fucosylated markers have median fold changes (FCs) above 1, suggesting a correlation of these markers with malignant EOC.
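The per-group median fold change, as summarized for Figure 19B, can be computed as shown below. The sketch assumes each marker already has a fold change between the compared groups. The marker records and values in the test data are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def median_fc_by_fucose(markers: List[Tuple[str, int, float]]) -> Dict[int, float]:
    """Group (marker name, number of fucoses, fold change) records by fucose
    count and return the median fold change for each group, ordered by count."""
    groups: Dict[int, List[float]] = {}
    for _name, n_fuc, fc in markers:
        groups.setdefault(n_fuc, []).append(fc)
    return {n_fuc: median(fcs) for n_fuc, fcs in sorted(groups.items())}
```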
  • Figure 20 is an illustration of a diagram showing the probability distributions for early stage v. late stage ovarian cancer using training data set and the testing data set in accordance with one or more embodiments using the biomarkers of Table 3D.
  • Figures 21A to 21E are graphs of the relative abundance of five distinct types of fucosylated glycopeptides in benign tumors, early stage EOC, and late stage EOC.
  • Figure 22 is a representative mass spectrum showing breakdown fragments of 3 glycans and 4 glycan aggregates that indicate the presence of glycans with an outer arm fucose orientation.
  • glycoproteomics is an emerging field that can be used in the overall diagnosis and/or treatment of subjects with various types of diseases.
  • Glycoproteomics aims to determine the positions, identities, and quantities of glycans and glycosylated proteins in a given sample (e.g., blood sample, cell, tissue, etc.).
  • Protein glycosylation is one of the most common and most complex forms of post-translational protein modification, and can affect protein structure, conformation, and function.
  • glycoproteins may play crucial roles in important biological processes such as cell signaling, host-pathogen interactions, and immune response and disease. Glycoproteins may therefore be important to diagnosing different types of diseases.
  • protein glycosylation provides useful information about cancer and other diseases.
  • analysis of protein glycosylation may be difficult as the glycan typically cannot be traced back to the protein site of origin with currently available methodologies.
  • Glycoprotein analysis can be challenging in general due to several reasons. For example, a single glycan composition in a peptide may contain a large number of isomeric structures because of different glycosidic linkages, branching, and many monosaccharides having the same mass.
  • This information can be used to distinguish the disease state from other states, diagnose a subject as having or not having the disease state, determine a likelihood that a subject has the disease state, determine whether a subject has one of early stage (stages 1 and 2) or late stage (stages 3 and 4) EOC, or a combination thereof.
  • Such analysis may be useful in diagnosing an ovarian cancer disease state for a subject (e.g., a negative diagnosis or a positive diagnosis for the ovarian cancer disease state).
  • Samples can be collected and analyzed at different time points to compare a subject's ovarian cancer disease state over time.
  • The negative diagnosis may include a healthy state or a benign tumor state (referred to throughout as “benign”).
  • An example of the positive diagnosis includes the subject suffering from a form of ovarian cancer (e.g., epithelial ovarian cancer (EOC)).
  • A diagnosis can also assess the malignancy status of a previously identified pelvic (or adnexal) tumor (or mass).
  • A machine learning model is trained to analyze peptide structure data and generate a disease indicator that provides information relating to one or more diseases.
  • The peptide structure data comprises quantification metrics (e.g., abundance or concentration data) for peptide structures.
  • A peptide structure may be defined by an aglycosylated peptide sequence (e.g., a peptide or peptide fragment of a larger parent protein) or by a glycosylated peptide sequence.
  • A glycosylated peptide sequence may be a peptide sequence having a glycan structure attached to a linking site (e.g., an amino acid residue) of the peptide sequence, for example, via a particular atom of the amino acid residue.
  • Non-limiting examples of glycosylated peptides include N-linked glycopeptides and O-linked glycopeptides.
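N-linked glycans are attached to asparagine (N) residues. In general biochemistry (this rule is not stated in the present document), the consensus motif, or sequon, for N-glycosylation is N-X-S/T, where X is any amino acid except proline. The sketch below lists candidate N-linked linking sites in a peptide sequence under that assumption. Because a sequon is not always occupied, the result is a list of candidates only.

```python
import re


def n_glycosylation_sequons(sequence: str) -> list[int]:
    """Return 0-based positions of Asn (N) residues in an N-X-S/T motif where X != P.

    A zero-width lookahead is used so that overlapping motifs (e.g. "NNT") are all found.
    """
    return [match.start() for match in re.finditer(r"(?=N[^P][ST])", sequence)]


# Hypothetical sequence: the N at index 2 (N-G-S) and at index 8 (N-N-T) qualify;
# the N at index 5 is followed by P and is excluded.
print(n_glycosylation_sequons("AANGSNPTNNT"))
```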
  • An ovarian cancer disease state may include any condition that can be diagnosed as cancer occurring in the ovaries. Many malignant pelvic tumors are ovarian cancers. Certain peptide structures that are associated with an ovarian cancer disease state may be more relevant to that disease state than other peptide structures that are also associated with it.
  • Analyzing the abundance of peptide sequences and glycosylated peptide sequences in a biological sample may provide a more accurate way to distinguish a positive ovarian cancer disease state (e.g., a state including the presence of ovarian cancer) from a negative ovarian cancer disease state (e.g., a healthy state, a benign tumor state, an absence of ovarian cancer, etc.).
  • This type of peptide structure analysis may be more conducive to generating accurate diagnoses as compared to glycoprotein analysis that focuses on analyzing glycoproteins that are too large to be resolved via mass spectrometry. Further, with glycoproteins, there may be too many potential proteoforms to consider.
  • Analysis of peptide structure data in the manner described by the various embodiments herein may be more conducive to generating accurate diagnoses than glycomic analysis, which provides little to no information about which proteins, and which amino acid residue sites, various glycan structures attach to.
  • Ovarian cancer treated with surgical resection may recur as a result of metastasis.
  • There is therefore a need for tests that can diagnose metastatic ovarian cancer and monitor the progression of the disease (e.g., assessing early stage vs. late stage ovarian cancer).
  • Such a test may be based on either ELISA or mass spectrometry.
  • In stage 1, the cancer is confined to the ovaries and has not spread to the abdomen, pelvis, or lymph nodes, or to distant sites.
  • In stage 2, the cancer has spread from one or both ovaries to other areas of the pelvis, but has not spread to nearby lymph nodes or distant sites.
  • Stages 1 and 2 are considered early stage.
  • In stage 3, the cancer has spread to nearby lymph nodes and/or other parts of the abdomen, but not to distant sites.
  • In stage 4, the cancer has spread beyond the abdomen. Stages 3 and 4 are considered late stage.
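The staging convention above reduces to a simple mapping from stage to the early/late grouping used for EOC elsewhere in this document. The following is a minimal sketch. The function names and the 0/1 label encoding for an early-vs-late classifier are illustrative choices and do not come from the source.

```python
def stage_group(stage: int) -> str:
    """Map an ovarian cancer stage (1-4) to the 'early' or 'late' grouping."""
    if stage in (1, 2):
        return "early"  # confined to the ovaries (1) or spread within the pelvis (2)
    if stage in (3, 4):
        return "late"  # spread to nearby lymph nodes/abdomen (3) or beyond the abdomen (4)
    raise ValueError(f"unknown stage: {stage!r}")


def stage_label(stage: int) -> int:
    """Hypothetical binary label for an early-vs-late classifier: 0 = early, 1 = late."""
    return 0 if stage_group(stage) == "early" else 1


print([stage_group(s) for s in (1, 2, 3, 4)])
```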
  • Mass spectrometry measurements showed that fucosylated glycopeptides are associated with metastatic ovarian cancer.
  • These glycopeptides carried tri- and tetra-antennary N-glycans on certain proteins.
  • Various proteins, such as AGP1, AGP2, APOC3, FETUA, HPT, CLUS, A2MG, TRFE, VTNC, IGJ, and CFAH, can be captured on an ELISA plate from patient samples, followed by lectin-based detection (using four lectins: LCA, AAL, PHA-E, and PHA-L).
  • Mass spectrometry can be used to analyze serum for various glycoproteins and/or glycopeptides to differentiate between benign and malignant adnexal masses.
  • A distinct signature was found among the circulating N-glycoproteins that allows differentiation between late stage (metastatic disease of stage III/IV) and early stage (stage I/II) epithelial ovarian cancer (EOC).
  • Using Qiagen's Ingenuity Pathway Analysis package on these data, the signature markers were predicted to be downstream of cytokine signaling.
  • The markers also suggest the presence of the sialyl Lewis X (sLex) epitope on N-glycans of certain liver-derived circulating glycoproteins.
  • The methods, systems, and compositions provided by the embodiments described herein may enable earlier and more accurate diagnosis of ovarian cancer in a subject than currently available diagnostic modalities (e.g., imaging, biochemical tests) used to determine whether surgical intervention is indicated.
  • Various currently available non-invasive tests for distinguishing between benign and malignant pelvic tumors rely on detection of the biomarker cancer antigen 125 (CA125).
  • However, serum CA125 is not elevated in over 20% of ovarian carcinomas, and it is elevated in a variety of other malignant and non-malignant conditions.
  • The term “plurality” may mean 2, 3, 4, 5, 6, 7, 8, 9, 10, or more.
  • The term “set of” means one or more. For example, a set of items includes one or more items.
  • The phrase “at least one of,” when used with a list of items, means that different combinations of one or more of the listed items may be used and that only one of the items in the list may be needed.
  • The item may be a particular object, thing, step, operation, process, or category.
  • “At least one of” means that any combination of items or number of items may be used from the list, but not all of the items in the list may be required.
  • “At least one of item A, item B, or item C” means item A; item A and item B; item B; item C; item A, item B, and item C; item B and item C; or item A and item C.
  • “At least one of item A, item B, or item C” may also mean, for example and without limitation, two of item A, one of item B, and ten of item C; four of item B and seven of item C; or some other suitable combination.
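Ignoring multiplicity, the phrase “at least one of item A, item B, or item C” covers every non-empty combination of the listed items. The short Python sketch below enumerates those combinations as an illustration of the combinatorics only. The multiplicity examples given above (e.g., two of item A) are not modeled.

```python
from itertools import combinations


def at_least_one_of(items: list[str]) -> list[frozenset[str]]:
    """All non-empty combinations of the listed items (2**n - 1 of them for n items)."""
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


options = at_least_one_of(["item A", "item B", "item C"])
print(len(options))  # three singles, three pairs, and all three together
```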
  • “Substantially” means sufficient to work for the intended purpose. The term “substantially” thus allows for minor, insignificant variations from an absolute or perfect state, dimension, measurement, result, or the like such as would be expected by a person of ordinary skill in the field but that do not appreciably affect overall performance. When used with respect to numerical values or parameters or characteristics that can be expressed as numerical values, “substantially” means within ten percent.
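Read numerically, “substantially” gives a tolerance of ten percent. The following is a minimal sketch. It assumes that the ten percent is measured relative to the magnitude of the reference value, since the text does not say which value the percentage is taken of.

```python
def is_substantially(value: float, reference: float, tolerance: float = 0.10) -> bool:
    """True if `value` is within `tolerance` (default 10%) of `reference`.

    Assumption: the tolerance is relative to |reference|.
    """
    return abs(value - reference) <= tolerance * abs(reference)


print(is_substantially(95.0, 100.0), is_substantially(111.0, 100.0))
```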
  • “Amino acid” generally refers to any organic compound that includes an amino group (e.g., -NH2), a carboxyl group (-COOH), and a side chain group (R) that varies depending on the specific amino acid. Amino acids can be linked by peptide bonds.
  • “Alkylation” generally refers to the transfer of an alkyl group from one molecule to another.
  • Alkylation is used to react with reduced cysteines to prevent disulfide bonds from re-forming after reduction has been performed.
  • “Linking site” or “glycosylation site,” as used herein, generally refers to the location where a sugar molecule of a glycan or glycan structure is directly bound (e.g., covalently bound) to an amino acid of a peptide, a polypeptide, or a protein.
  • The linking site may be an amino acid residue, and a glycan structure may be linked via an atom of the amino acid residue.
  • Types of glycosylation can include N-linked glycosylation, O-linked glycosylation, C-linked glycosylation, S-linked glycosylation, and glycation.
  • “Biological sample” generally refers to a specimen taken by sampling so as to be representative of the source of the specimen, typically a subject.
  • A biological sample can be representative of an organism as a whole or of a specific tissue, cell type, or category or sub-category of interest.
  • Biological samples may include, but are not limited to, synovial fluid, whole blood, blood serum, blood plasma, urine, sputum, tissue, saliva, tears, spinal fluid, tissue section(s) obtained by biopsy, cell(s) that are placed in or adapted to tissue culture, sweat, mucus, fecal material, gastric fluid, abdominal fluid, amniotic fluid, cyst fluid, peritoneal fluid, pancreatic juice, breast milk, lung lavage, marrow, gastric acid, bile, semen, pus, aqueous humor, transudate, and the like, including derivatives, portions, and combinations of the foregoing.
  • Biological samples include, but are not limited to, blood and/or plasma.
  • Biological samples include, but are not limited to, urine or stool.
  • Biological samples include, but are not limited to, saliva. Biological samples include, but are not limited to, tissue dissections and tissue biopsies. Biological samples include, but are not limited to, any derivative or fraction of the aforementioned biological samples.
  • The biological sample can include a macromolecule.
  • The biological sample can include a small molecule.
  • The biological sample can include a virus.
  • The biological sample can include a cell or derivative of a cell.
  • The biological sample can include an organelle.
  • The biological sample can include a cell nucleus.
  • The biological sample can include a rare cell from a population of cells.
  • The biological sample can include any type of cell, including without limitation prokaryotic cells, eukaryotic cells, bacterial, fungal, plant, mammalian, or other animal cell types, mycoplasmas, normal tissue cells, tumor cells, or any other cell type, whether derived from single-cell or multicellular organisms.
  • The biological sample can include a constituent of a cell.
  • The biological sample can include nucleotides (e.g., ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof.
  • The biological sample can include a matrix (e.g., a gel or polymer matrix) comprising a cell or one or more constituents from a cell (e.g., a cell bead), such as DNA, RNA, organelles, proteins, or any combination thereof, from the cell.
  • The biological sample may be obtained from a tissue of a subject.
  • The biological sample can include a hardened cell. Such hardened cells may or may not include a cell wall or cell membrane.
  • The biological sample can include one or more constituents of a cell but not other constituents of the cell. Examples of such constituents include a nucleus or an organelle.
  • The biological sample may include a live cell.
  • The live cell can be capable of being cultured.
  • “Biomarker” generally refers to any measurable substance taken as a sample from a subject whose presence is indicative of some phenomenon. Non-limiting examples of such phenomena include a disease state, a condition, or exposure to a compound or environmental condition. In various embodiments described herein, biomarkers may be used for diagnostic purposes (e.g., to diagnose a health state or a disease state). The term “biomarker” can be used interchangeably with the term “marker.”
  • “Digesting” a peptide generally refers to a biological process that employs enzymes to break specific peptide bonds between amino acids.
  • Digesting a peptide includes contacting the peptide with a digesting enzyme, e.g., trypsin, to produce fragments of the peptide (e.g., of a glycopeptide).
  • A protease enzyme is used to digest a glycopeptide.
  • “Protease enzyme” refers to an enzyme that performs proteolysis, the breakdown of large peptides into smaller polypeptides or individual amino acids.
  • Examples of proteases include, but are not limited to, one or more of a serine protease, threonine protease, cysteine protease, aspartate protease, glutamic acid protease, metalloprotease, asparagine peptide lyase, and any combination of the foregoing.
  • Enzymatic digestion in preparation for mass spectrometry may use trypsin digestion protocols. Proteins may instead be digested using other proteases in preparation for mass spectrometry if access to trypsin cleavage sites is limited.
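Trypsin digestion can be simulated in silico when planning which peptides are expected in a run. The sketch below applies the commonly used simplified specificity rule for trypsin: cleave on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). This rule and the `missed_cleavages` parameter are general conventions assumed here rather than details from the source, and real digests can deviate from the rule.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """In-silico trypsin digest of a protein sequence (one-letter amino acid codes).

    Cleaves after K or R unless the next residue is P. With missed_cleavages > 0,
    peptides spanning up to that many uncleaved sites are also returned.
    """
    # Zero-width split at positions preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        for end in range(start, min(start + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[start : end + 1]))
    return peptides


# The R in "GGRP" is followed by P, so no cleavage occurs there.
print(tryptic_digest("AAKGGRPCCKDD"))
```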
  • “Disease state” generally refers to a condition that affects the structure or function of an organism.
  • Causes of disease states may include pathogens, immune system dysfunction, cell damage caused by aging, and cell damage caused by other factors (e.g., trauma and cancer).
  • Disease states can include any state of a disease, whether symptomatic or asymptomatic.
  • Disease states can include disease stages of a disease progression. Disease states can cause minor, moderate, or severe disruptions in the structure or function of an organism (e.g., a subject).
  • “Fragment” generally refers to an ion fragmentation process that occurs in an MRM-MS instrument. Fragmenting may produce various fragments having the same mass but different charges; e.g., some biomarkers described herein produce more than one product m/z.
  • “Glycan” or “polysaccharide,” as used herein, both generally refer to a carbohydrate residue of a glycoconjugate, such as the carbohydrate portion of a glycopeptide, glycoprotein, glycolipid, or proteoglycan. Glycans can include monosaccharides.
  • “Glycopeptide” or “glycopolypeptide,” as used herein, generally refers to a peptide or polypeptide comprising at least one glycan residue.
  • Glycopeptides comprise carbohydrate moieties (e.g., one or more glycans) covalently attached to a side chain (i.e., R group) of an amino acid residue.
  • “Glycopeptide fragment,” “glycosylated peptide fragment,” or “glycopeptide,” as used herein, generally refers to a glycosylated peptide (or glycopeptide) having an amino acid sequence that is the same as part (but not all) of the amino acid sequence of the glycosylated protein from which the glycosylated peptide is obtained, e.g., by ion fragmentation within an MRM-MS instrument.
  • MRM refers to multiple reaction monitoring.
  • “Glycopeptide fragments” or “fragments of a glycopeptide” refer to the fragments produced directly by a mass spectrometer, optionally after the glycoprotein has been digested enzymatically to produce the glycopeptides.
  • “Glycoprotein” generally refers to a protein having at least one glycan residue bonded to it.
  • A glycoprotein is a protein with at least one oligosaccharide chain covalently bonded to it. Examples of glycoproteins include, but are not limited to, the peptide structures including glycan molecules shown in the various Tables presented herein.
  • A glycopeptide, as used herein, refers to a fragment of a glycoprotein unless specified otherwise.
  • “Liquid chromatography” generally refers to a technique used to separate a sample into its components. Liquid chromatography can be used to separate, identify, and quantify components.
  • “Mass spectrometry” generally refers to an analytical technique used to identify molecules. In various embodiments described herein, mass spectrometry can be involved in the characterization and sequencing of proteins.
  • “m/z” or “mass-to-charge ratio,” as used herein, generally refers to an output value from a mass spectrometry instrument.
  • m/z can represent the relationship between the mass of a given ion and the number of elementary charges that it carries.
  • The “m” in m/z stands for mass, and the “z” stands for charge (the number of elementary charges).
  • m/z can be displayed on an x-axis of a mass spectrum.
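The relationship between an ion's mass, its charge, and the reported m/z value can be made concrete with a short example. The sketch below assumes positive-mode ions formed by adding z protons to a neutral molecule ([M+zH]z+). This is a common case but not a universal one, and the neutral mass used is hypothetical. The example also shows why one species can appear at more than one m/z, as noted for product ions above.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of [M + zH]^z+ for a neutral monoisotopic mass M (Da) and charge z > 0."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical glycopeptide with a neutral mass of 4000.0 Da observed at several charge states.
for z in (2, 3, 4):
    print(z, round(mz(4000.0, z), 4))
```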
  • The term “patient,” as used herein, generally refers to a mammalian subject.
  • The mammal can be a human or an animal, including, but not limited to, an equine, porcine, canine, feline, ungulate, or primate animal.
  • In some embodiments, the individual is a human.
  • The methods and uses described herein are useful for both medical and veterinary purposes.
  • A “patient” is a human subject unless specified otherwise.
  • “Peptide” generally refers to amino acids linked by peptide bonds.
  • Peptides can include amino acid chains of between 10 and 50 residues.
  • Peptides can include amino acid chains shorter than 10 residues, including oligopeptides, dipeptides, tripeptides, and tetrapeptides.
  • Peptides can include chains longer than 50 residues, which may be referred to as “polypeptides” or “proteins.”
  • The term “peptide” is meant to include glycopeptides unless stated otherwise.
  • “Protein,” “polypeptide,” and “peptide” may be used interchangeably herein and generally refer to a molecule including at least three amino acid residues. Proteins can include polymer chains made of amino acid sequences linked together by peptide bonds. Proteins may be digested in preparation for mass spectrometry using trypsin digestion protocols, or using other proteases if access to trypsin cleavage sites is limited.
  • “Peptide structure” generally refers to peptides or portions thereof, or glycopeptides or portions thereof.
  • A peptide structure can include any molecule comprising at least two amino acids in sequence.
  • “Reduction” generally refers to the gain of an electron by a substance.
  • A sugar can bind directly to a protein, thereby reducing the amino acid to which it binds. Such reduction reactions can occur in glycosylation. In various embodiments, reduction may be used to break disulfide bonds between two cysteines.
  • “Sample” generally refers to a sample from a subject of interest and may include a biological sample of the subject.
  • The sample may include a cell sample.
  • The sample may include a cell line or cell culture sample.
  • The sample can include one or more cells.
  • The sample can include one or more microbes.
  • The sample may include a nucleic acid sample or protein sample.
  • The sample may also include a carbohydrate sample or a lipid sample.
  • The sample may be derived from another sample.
  • The sample may include a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate.
  • The sample may include a fluid sample, such as a blood sample, urine sample, or saliva sample.
  • The sample may include a skin sample.
  • The sample may include a cheek swab.
  • The sample may include a plasma or serum sample.
  • The sample may include a cell-free sample.
  • A cell-free sample may include extracellular polynucleotides.
  • The sample may originate from blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, or tears.
  • The sample may originate from red blood cells or white blood cells.
  • The sample may originate from feces, spinal fluid, CNS fluid, gastric fluid, amniotic fluid, cyst fluid, peritoneal fluid, marrow, bile, other body fluids, tissue obtained from a biopsy, skin, or hair.
  • “Sequence” generally refers to a biological sequence including one-dimensional monomers that can be assembled to generate a polymer.
  • Sequences include nucleotide sequences (e.g., ssDNA, dsDNA, and RNA), amino acid sequences (e.g., proteins, peptides, and polypeptides), and carbohydrates (e.g., compounds with the general formula Cm(H2O)n).
  • The term “subject,” as used herein, generally refers to an animal, such as a mammal (e.g., a human) or an avian (e.g., a bird), or another organism, such as a plant.
  • The subject can include a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian, or a human.
  • Animals may include, but are not limited to, farm animals, sport animals, and pets.
  • A subject can include a healthy or asymptomatic individual, an individual who has or is suspected of having a disease (e.g., cancer) or a predisposition to the disease, and/or an individual who is in need of therapy or suspected of needing therapy.
  • A subject can be a patient.
  • A subject can include a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses). However, in the context of diagnosing ovarian cancer, the subject is female unless explicitly specified otherwise.
  • A subject may be one who has been previously identified as having a disease or a condition, and who optionally has already undergone, or is undergoing, a therapeutic intervention for the disease or condition.
  • A subject can also be one who has not been previously diagnosed as having a disease or a condition.
  • A subject can be one who exhibits one or more risk factors for a disease or a condition, one who does not exhibit disease risk factors, or one who is asymptomatic for a disease or a condition.
  • A subject can also be one who is suffering from or at risk of developing a disease or a condition.
  • “Training data” generally refers to data that can be input into models, statistical models, algorithms, and any other system or process able to use existing data to make predictions.
  • A “model” may include one or more algorithms, one or more mathematical techniques, one or more machine learning algorithms, or a combination thereof.
  • Machine learning may be the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world. Machine learning uses algorithms that can learn from data without relying on rules-based programming.
  • A machine learning algorithm may include a parametric model, a nonparametric model, a deep learning model, a neural network, a linear discriminant analysis model, a quadratic discriminant analysis model, a support vector machine, a random forest algorithm, a nearest neighbor algorithm, a combined discriminant analysis model, a k-means clustering algorithm, a supervised model, an unsupervised model, a logistic regression model, a multivariable regression model, a penalized multivariable regression model, or another type of model.
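As one concrete instance of the model types listed above, the sketch below trains a logistic regression model in plain Python using batch gradient descent. The model turns peptide structure abundances into a probability that could serve as a disease indicator. The feature values and labels are hypothetical toy data, and the sketch does not reflect the specific model, features, or training procedure of the embodiments.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class (e.g., malignant) for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


# Hypothetical normalized abundances of two peptide structures; 1 = malignant, 0 = benign.
X = [[0.2, 1.0], [0.3, 0.9], [0.4, 1.1], [1.5, 0.2], [1.7, 0.3], [1.6, 0.1]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, [1.6, 0.2]), 3), round(predict_proba(w, b, [0.3, 1.0]), 3))
```

In practice, such a model would be evaluated on held-out data (such as the testing data set mentioned for Figure 20) rather than on points resembling the training set.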
  • An “artificial neural network” or “neural network” may refer to mathematical algorithms or computational models that mimic an interconnected group of artificial nodes or neurons that processes information based on a connectionist approach to computation.
  • Neural networks, which may also be referred to as neural nets, can employ one or more layers of nonlinear units to predict an output for a received input.
  • Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
  • A reference to a “neural network” may be a reference to one or more neural networks.
  • A neural network may process information in two modes: when it is being trained, it is in training mode, and when it puts what it has learned into practice, it is in inference (or prediction) mode.
  • Neural networks learn through a feedback process (e.g., backpropagation) that allows the network to adjust the weight factors (modifying its behavior) of the individual nodes in the intermediate hidden layers so that its outputs match the outputs of the training data.
  • A neural network learns by being fed training data (learning examples) and eventually learns how to reach the correct output, even when it is presented with a new range or set of inputs.
  • A neural network may include, for example and without limitation, at least one of a Feedforward Neural Network (FNN), a Recurrent Neural Network (RNN), a Modular Neural Network (MNN), a Convolutional Neural Network (CNN), a Residual Neural Network (ResNet), an Ordinary Differential Equations Neural Network (neural-ODE), or another type of neural network.
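The layer-by-layer computation described above, in which each hidden layer's output becomes the next layer's input, can be sketched as an inference-mode forward pass through a small feedforward network. The layer sizes, weights, and activation choices below are hypothetical, and training by backpropagation is not shown.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b), with W given as one row per output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Inference-mode forward pass: each layer's output is the next layer's input."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical parameters: 3 inputs -> 2 hidden units (ReLU) -> 1 output unit (sigmoid).
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]

output = forward([1.0, 2.0, 3.0], layers)
print(output)
```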
  • A “target glycopeptide analyte” may refer to a peptide structure (e.g., glycosylated or aglycosylated/non-glycosylated), a fraction of a peptide structure, a sub-structure (e.g., a glycan or a glycosylation site) of a peptide structure, a product of one or more of the above-listed structures and sub-structures, associated detection molecules (e.g., a signal molecule, label, or tag), or an amino acid sequence that can be measured by mass spectrometry.
  • A “peptide data set” may be used interchangeably with “peptide structure data” and can refer to any data of or relating to a peptide from a resulting mass spectrometry run.
  • A peptide data set can comprise data obtained from a sample or biological sample using mass spectrometry.
  • A peptide data set can comprise data relating to an external standard, data relating to an internal standard, and data relating to a target glycopeptide analyte of a sample.
  • A peptide data set can result from analysis originating from a single run.
  • The peptide data set can include raw abundances and mass-to-charge ratios for one or more peptides.
  • A “transition” may refer to or identify a peptide structure.
  • A transition can refer to the specific pair of m/z values associated with a precursor ion and a product or fragment ion.
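Because a transition is simply a precursor/product m/z pair tied to a peptide structure, it maps naturally onto a small record type. The sketch below shows one hypothetical representation. The field names, the identifier, the m/z values, and the ±0.5 m/z matching tolerance are illustrative assumptions and are not values from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """A monitored precursor -> product ion pair identifying one peptide structure."""

    peptide_structure: str  # hypothetical identifier for the peptide structure
    precursor_mz: float
    product_mz: float

    def matches(self, precursor: float, product: float, tol: float = 0.5) -> bool:
        """True if an observed precursor/product pair lies within `tol` m/z of this transition."""
        return (abs(precursor - self.precursor_mz) <= tol
                and abs(product - self.product_mz) <= tol)


transition = Transition("PROTEIN_SITE_GLYCAN", precursor_mz=1001.01, product_mz=366.14)
print(transition.matches(1001.2, 366.0), transition.matches(1003.0, 366.14))
```

Making the record frozen lets transitions serve as dictionary keys, for example when mapping each transition to its measured abundance.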
  • A “non-glycosylated endogenous peptide” (NGEP) may refer to a peptide structure that does not comprise a glycan molecule.
  • An NGEP and a target glycopeptide analyte can originate from the same subject.
  • An NGEP and a target glycopeptide analyte may be derived from the same protein sequence.
  • The NGEP and the target glycopeptide analyte may be derived from or include the same peptide sequence.
  • An NGEP can be labeled with an isotope in preparation for mass spectrometry analysis.
  • “Abundance” may refer to a quantitative value generated using mass spectrometry.
  • The quantitative value may relate to the amount of a particular peptide structure.
  • The quantitative value may comprise an amount of an ion produced using mass spectrometry.
  • The quantitative value may be expressed as an m/z value. In other embodiments, the quantitative value may be expressed in atomic mass units.
  • “Relative abundance” may refer to a comparison of two or more abundances.
  • The comparison may comprise comparing one peptide structure to a total number of peptide structures.
  • The comparison may comprise comparing one peptide glycoform (e.g., one of two identical peptides differing by one or more glycans) to a set of peptide glycoforms.
  • The comparison may comprise comparing the number of ions having a particular m/z ratio to the total number of ions detected.
  • A relative abundance can be expressed as a ratio. In other embodiments, a relative abundance can be expressed as a percentage.
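For the glycoform comparison described above, relative abundance reduces to dividing each abundance by the total for the set. The following is a minimal sketch that uses hypothetical glycoform names and raw abundances.

```python
def relative_abundance(abundances: dict[str, float], as_percentage: bool = False) -> dict[str, float]:
    """Each entry's abundance divided by the total abundance of the set (ratio or percent)."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    scale = 100.0 if as_percentage else 1.0
    return {name: scale * value / total for name, value in abundances.items()}


# Hypothetical raw abundances for three glycoforms of the same peptide.
glycoforms = {"glycoform_A": 600.0, "glycoform_B": 300.0, "glycoform_C": 100.0}
print(relative_abundance(glycoforms))
print(relative_abundance(glycoforms, as_percentage=True))
```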
  • An “internal standard” may refer to something that can be contained (e.g., spiked in) in the same sample as a target glycopeptide analyte undergoing mass spectrometry analysis. Internal standards can be used for calibration purposes. Additionally, internal standards can be used in the systems and methods described herein. In some aspects, an internal standard can be selected based on similarity in m/z and/or retention time, and can be a “surrogate” if a specific standard is too costly or unavailable. Internal standards can be heavy labeled or non-heavy labeled.
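One common way to use an internal standard is to express the analyte's raw abundance as a ratio to the standard's abundance measured in the same run. When the standard's concentration is known, that ratio can then be scaled into a concentration estimate. The sketch below assumes single-point calibration with a response factor of 1 unless another value is supplied. These are illustrative assumptions, not the normalization method of the embodiments.

```python
def normalized_abundance(analyte_abundance: float, standard_abundance: float) -> float:
    """Analyte raw abundance divided by the internal standard's raw abundance (same run)."""
    if standard_abundance <= 0:
        raise ValueError("internal standard abundance must be positive")
    return analyte_abundance / standard_abundance


def estimated_concentration(normalized: float, standard_concentration: float,
                            response_factor: float = 1.0) -> float:
    """Single-point estimate: (A_analyte / A_standard) * C_standard / response factor."""
    return normalized * standard_concentration / response_factor


ratio = normalized_abundance(5000.0, 2500.0)  # hypothetical raw abundances
print(ratio, estimated_concentration(ratio, standard_concentration=10.0))
```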
  • FIG. 1 is a schematic diagram of an exemplary workflow 100 for the detection of peptide structures associated with a disease state for use in diagnosis and/or treatment in accordance with one or more embodiments.
  • Workflow 100 may include various operations including, for example, sample collection 102, sample intake 104, sample preparation and processing 106, data analysis 108, and output generation 110.
  • Sample collection 102 may include, for example, obtaining a biological sample 112 of one or more subjects, such as subject 114.
  • Biological sample 112 may take the form of a specimen obtained via one or more sampling methods.
  • Biological sample 112 may be representative of subject 114 as a whole or of a specific tissue, cell type, or other category or sub-category of interest.
  • Biological sample 112 may be obtained in any of a number of different ways.
  • Biological sample 112 includes whole blood sample 116 obtained via a blood draw.
  • Biological sample 112 includes set of aliquoted samples 118 that includes, for example, a serum sample, a plasma sample, a blood cell sample (e.g., a white blood cell (WBC) sample or a red blood cell (RBC) sample), another type of sample, or a combination thereof.
  • Biological samples 112 may include nucleotides (e.g., ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof.
  • A single run can analyze a sample (e.g., a sample including a peptide analyte), an external standard (e.g., an NGEP of a serum sample), and an internal standard.
  • The abundance or raw abundance of the external standard, the internal standard, and the target glycopeptide analyte can be determined by mass spectrometry in the same run.
  • External standards may be analyzed prior to analyzing samples.
  • The external standards can be run independently between the samples.
  • External standards can be analyzed after every 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more experiments.
  • External standard data can be used in some or all of the normalization systems and methods described herein.
  • Blank samples may be processed to prevent column fouling.
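The ordering of external standards, samples, and blanks in a batch can be sketched as a run queue. The function below is hypothetical. It places an external standard before the first sample and then, after every `every` samples, inserts a blank followed by another external standard. The interval and the placement of blanks are assumptions chosen for illustration.

```python
def build_run_queue(samples: list[str], every: int = 10,
                    standard: str = "EXT_STD", blank: str = "BLANK") -> list[str]:
    """Interleave external standards and blanks with samples in injection order."""
    if every < 1:
        raise ValueError("every must be at least 1")
    queue = [standard]  # external standard analyzed before any samples
    for count, sample in enumerate(samples, start=1):
        queue.append(sample)
        if count % every == 0 and count < len(samples):
            queue.extend([blank, standard])  # blank injection, then re-run the standard
    return queue


print(build_run_queue(["S1", "S2", "S3", "S4", "S5"], every=2))
```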
  • Sample intake 104 may include one or more various operations such as, for example, aliquoting, registering, processing, storing, thawing, and/or other types of operations.
  • Sample intake 104 includes aliquoting whole blood sample 116 to form a set of aliquoted samples that can then be sub-aliquoted to form set of samples 120.
  • Sample preparation and processing 106 may include, for example, one or more operations to form set of peptide structures 122.
  • Set of peptide structures 122 may include various fragments of unfolded proteins that have undergone digestion and may be ready for analysis.
  • Sample preparation and processing 106 may include, for example, data acquisition 124 based on set of peptide structures 122.
  • Data acquisition 124 may include use of, for example but not limited to, a liquid chromatography/mass spectrometry (LC/MS) system.
  • Data analysis 108 may include, for example, peptide structure analysis 126.
  • In some embodiments, data analysis 108 also includes output generation 110.
  • In other embodiments, output generation 110 may be considered a separate operation from data analysis 108.
  • Output generation 110 may include, for example, generating final output 128 based on the results of peptide structure analysis 126. Final output 128 may be used for research, diagnosis, and/or treatment.
  • Final output 128 comprises one or more outputs.
  • Final output 128 may take various forms.
  • Final output 128 may be a report that includes, for example, a diagnosis output, a treatment output (e.g., a treatment design output, a treatment plan output, or a combination thereof), analyzed data (e.g., relativized and normalized data), or a combination thereof.
  • The report can comprise a target glycopeptide analyte concentration as a function of the NGEP concentration value and the normalized abundance.
  • Final output 128 may be an alert (e.g., a visual alert, an audible alert, etc.), a notification (e.g., a visual notification, an audible notification, an email notification, etc.), an email output, or a combination thereof.
  • Final output 128 may be sent to remote system 130 for processing.
  • Remote system 130 may include, for example, a computer system, a server, a processor, a cloud computing platform, cloud storage, a laptop, a tablet, a smartphone, some other type of mobile computing device, or a combination thereof.
  • Workflow 100 may optionally exclude one or more of the operations described herein and/or may optionally include one or more steps or operations other than those described herein (e.g., in addition to and/or instead of those described herein). Accordingly, workflow 100 may be implemented in any of a number of different ways for use in the research, diagnosis, and/or treatment of a disease state.
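Because workflow 100 is an ordered chain of stages, any of which may be omitted, its architecture can be sketched as a list of stage functions passed through a shared state. This is a minimal sketch under that assumption: the stage names mirror sample intake 104, sample preparation and processing 106, data analysis 108, and output generation 110, but the stub bodies, the `run_workflow` helper, and its `skip` parameter are hypothetical placeholders that only record which stages ran.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

State = Dict[str, Any]
Stage = Tuple[str, Callable[[State], State]]


def stub(name: str) -> Callable[[State], State]:
    """Placeholder stage that only records that it ran (no real processing)."""
    def stage(state: State) -> State:
        return {**state, "history": state.get("history", []) + [name]}
    return stage


WORKFLOW_100: List[Stage] = [
    ("sample_intake_104", stub("sample_intake_104")),
    ("sample_preparation_and_processing_106", stub("sample_preparation_and_processing_106")),
    ("data_analysis_108", stub("data_analysis_108")),
    ("output_generation_110", stub("output_generation_110")),
]


def run_workflow(stages: Iterable[Stage], state: State, skip: Iterable[str] = ()) -> State:
    """Run stages in order, omitting any stage whose name is in `skip`."""
    skipped = set(skip)
    for name, fn in stages:
        if name not in skipped:
            state = fn(state)
    return state


result = run_workflow(WORKFLOW_100, {"sample_id": "S1"}, skip=["sample_intake_104"])
print(result["history"])
```

Returning a new dictionary from each stage keeps stages independent, so a stage can be removed or replaced without affecting the others.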
  • Figures 2A and 2B are schematic diagrams of a workflow for sample preparation and processing 106 in accordance with one or more embodiments. Figures 2A and 2B are described with continuing reference to Figure 1. Sample preparation and processing 106 may include, for example, preparation workflow 200 shown in Figure 2A and data acquisition 124 shown in Figure 2B.
  • Figure 2A is a schematic diagram of preparation workflow 200 in accordance with one or more embodiments.
  • Preparation workflow 200 may be used to prepare a sample, such as a sample of set of samples 120 in Figure 1, for analysis via data acquisition 124. For example, this analysis may be performed via mass spectrometry (e.g., LC-MS).
  • Preparation workflow 200 may include denaturation and reduction 202, alkylation 204, and digestion 206. Every stage of the preparation workflow can introduce inconsistency between different samples and different experiments, necessitating the improved normalization systems and methods described herein.
  • Polymers such as proteins, in their native form, can fold to include secondary, tertiary, and/or other higher order structures.
  • Such higher order structures may functionalize proteins to complete tasks (e.g., enable enzymatic activity) in a subject.
  • Higher order structures of polymers may be maintained via various interactions between side chains of amino acids within the polymers. Such interactions can include ionic bonding, hydrophobic interactions, hydrogen bonding, and disulfide linkages between cysteine residues.
  • Such polymers (e.g., peptide/protein molecules) can be unfolded.
  • Unfolding a polymer may include denaturing the polymer, which may include, for example, linearizing the polymer.
  • Denaturation and reduction 202 can be used to disrupt higher order structures (e.g., secondary, tertiary, quaternary, etc.) of one or more proteins (e.g., polypeptides and peptides) in a sample (e.g., one of set of samples 120 in Figure 1).
  • Denaturation and reduction 202 includes, for example, a denaturation procedure and a reduction procedure.
  • The denaturation procedure may be performed using, for example, thermal denaturation, where heat is used as a denaturing agent. Thermal denaturation can disrupt ionic bonding, hydrophobic interactions, and/or hydrogen bonding.
  • The denaturation procedure may include using one or more denaturing agents.
  • The denaturation procedure may include using temperature.
  • The denaturation procedure may include using one or more denaturing agents in combination with heat.
  • These one or more denaturing agents may include, for example, but are not limited to, any number of chaotropic salts (e.g., urea, guanidine), surfactants (e.g., sodium dodecyl sulfate (SDS), beta octyl glucoside, Triton X-100), or a combination thereof.
  • Such denaturing agents may be used in combination with heat when the sample preparation workflow further includes a cleanup procedure.
  • The resulting one or more denatured (e.g., unfolded, linearized) proteins may then undergo further processing in preparation for analysis.
  • A reduction procedure may be performed in which one or more reducing agents are applied.
  • A reducing agent can produce an alkaline pH.
  • A reducing agent may take the form of, for example, without limitation, dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or some other reducing agent.
  • The reducing agent may reduce (e.g., cleave) the disulfide linkages between cysteine residues of the one or more denatured proteins to form one or more reduced proteins.
  • The one or more reduced proteins resulting from denaturation and reduction 202 may undergo a process to prevent the reformation of disulfide linkages between, for example, the cysteine residues of the one or more reduced proteins.
  • This process may be implemented using alkylation 204 to form one or more alkylated proteins.
  • Alkylation 204 may be used to add an acetamide group to the sulfur on each cysteine residue to prevent disulfide linkages from reforming.
  • An acetamide group can be added by reacting one or more alkylating agents with a reduced protein.
  • The one or more alkylating agents may include, for example, one or more acetamide salts.
  • The alkylating agent may take the form of, for example, iodoacetamide (IAA), 2-chloroacetamide, some other type of acetamide salt, or some other type of alkylating agent.
  • Alkylation 204 may include a quenching procedure. The quenching procedure may be performed using one or more reducing agents (e.g., one or more of the reducing agents described above).
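Because alkylation adds an acetamide (carbamidomethyl) group to each cysteine, every cysteine in a peptide structure shifts its expected mass by a fixed amount, which matters when targeting that peptide by mass. The sketch below computes a peptide's monoisotopic neutral mass with or without that fixed modification and converts it to a precursor m/z. It uses standard monoisotopic residue masses and assumes iodoacetamide-style carbamidomethylation (+57.021464 Da); the function names are hypothetical, and glycan masses for glycopeptide structures are deliberately left out.

```python
# Standard monoisotopic amino acid residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # added once per peptide (termini)
CARBAMIDOMETHYL = 57.021464  # acetamide group added to each alkylated Cys
PROTON = 1.007276


def peptide_mass(sequence: str, alkylated: bool = True) -> float:
    """Monoisotopic neutral mass; glycans and other modifications are ignored."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if alkylated:
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return mass


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


print(round(peptide_mass("C", alkylated=False), 4))  # 121.0198
print(round(peptide_mass("C"), 4))                   # 178.0412
```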
  • The one or more alkylated proteins formed via alkylation 204 can then undergo digestion 206 in preparation for analysis (e.g., mass spectrometry analysis).
  • Digestion 206 of a protein may include cleaving the protein at or around one or more cleavage sites (e.g., site 205 which may be one or more amino acid residues).
  • An alkylated protein may be cleaved at the carboxyl side of lysine or arginine residues. This type of cleavage may break the protein into various segments, which include one or more peptide structures (e.g., glycosylated or aglycosylated).
  • Digestion 206 may be performed using one or more proteolysis catalysts.
  • An enzyme can be used in digestion 206.
  • In some embodiments, the enzyme takes the form of trypsin.
  • One or more other types of enzymes (e.g., proteases) may also be used.
  • These other enzymes include, but are not limited to, LysC, LysN, AspN, GluC, and ArgC.
  • Digestion 206 may be performed using tosyl phenylalanyl chloromethyl ketone (TPCK)-treated trypsin, one or more engineered forms of trypsin, one or more other formulations of trypsin, or a combination thereof.
  • Digestion 206 may be performed in multiple steps, each involving the use of one or more digestion agents. For example, a secondary digestion, tertiary digestion, etc. may be performed.
  • In some embodiments, trypsin is used to digest serum samples.
  • In some embodiments, trypsin/LysC cocktails are used to digest plasma samples.
  • Digestion 206 may further include a quenching procedure.
  • The quenching procedure may be performed by acidifying the sample (e.g., to a pH < 3).
  • Formic acid may be used to perform this acidification.
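The cleavage rule stated above for trypsin (cut on the carboxyl side of lysine or arginine) can be applied in silico to predict which peptide structures a protein may yield. A minimal sketch, assuming only that simple rule: it does not apply the commonly used refinement of skipping cleavage before proline, and the `missed_cleavages` parameter is a hypothetical addition for illustration.

```python
import re
from typing import List


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Split a protein sequence after every K or R.

    Peptides spanning up to `missed_cleavages` uncut sites are also returned.
    """
    if missed_cleavages < 0:
        raise ValueError("missed_cleavages must be >= 0")
    # Zero-width split after each K/R; drop the empty tail produced when the
    # sequence itself ends in K or R.
    fragments = [f for f in re.split(r"(?<=[KR])", sequence.upper()) if f]
    peptides = []
    for start in range(len(fragments)):
        stop_max = min(start + missed_cleavages + 1, len(fragments))
        for stop in range(start + 1, stop_max + 1):
            peptides.append("".join(fragments[start:stop]))
    return peptides


print(tryptic_digest("MKWVTFISLLRAK"))
print(tryptic_digest("MKWVTFISLLRAK", missed_cleavages=1))
```

Splitting on a zero-width lookbehind keeps each K or R attached to the end of its peptide, matching cleavage on the carboxyl side.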
  • Preparation workflow 200 may further include post-digestion procedure 207.
  • Post-digestion procedure 207 may include, for example, a cleanup procedure.
  • The cleanup procedure may include, for example, the removal of unwanted components in the sample that result from digestion 206.
  • Such unwanted components may include, but are not limited to, inorganic ions, surfactants, etc.
  • Post-digestion procedure 207 may further include a procedure for the addition of heavy-labeled peptide internal standards.
  • Although preparation workflow 200 has been described with respect to a sample created or taken from biological sample 112 that is blood-based (e.g., a whole blood sample, a plasma sample, a serum sample, etc.), sample preparation workflow 200 may be similarly implemented for other types of samples (e.g., tears, urine, tissue, interstitial fluids, sputum, etc.) to produce set of peptide structures 122.
  • Figure 2B is a schematic diagram of data acquisition 124 in accordance with one or more embodiments.
  • Data acquisition 124 can commence following preparation workflow 200 described in Figure 2A.
  • Data acquisition 124 can comprise targeted quantification 208, quality control 210, and peak integration and normalization 212.
  • Targeted quantification 208 of peptides and glycopeptides can incorporate the use of liquid chromatography-mass spectrometry (LC/MS) instrumentation.
  • Tandem MS (e.g., LC-MS/MS) may be used.
  • LC/MS can combine the physical separation capabilities of liquid chromatography (LC) with the mass analysis capabilities of mass spectrometry (MS).
  • This technique allows digested peptides separated on the LC column to be fed into the MS ion source through an interface.
  • Any LC/MS device can be incorporated into the workflow described herein.
  • An instrument or instrument system suited for identification and targeted quantification 208 may include, for example, a Triple Quadrupole LC/MS™.
  • Targeted quantification 208 may be performed using multiple reaction monitoring mass spectrometry (MRM-MS).
  • In various embodiments, identification of a particular protein or peptide and an associated quantity can be assessed. In various embodiments described herein, identification of a particular glycan and an associated quantity can be assessed. In various embodiments described herein, particular glycans can be matched to a glycosylation site on a protein or peptide and their abundances measured.
  • Targeted quantification 208 includes using a collision energy specific to the appropriate fragmentation so that an abundant product ion is consistently observed.
  • Glycopeptide structures may require a lower collision energy than aglycosylated peptide structures.
  • The source voltage and gas temperature may be lowered as compared to generic proteomic analysis.
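Choosing a collision energy that consistently yields an abundant product ion amounts to an optimization over a collision-energy ramp, done separately for each transition. A minimal sketch, assuming the product-ion abundance has already been measured at each candidate energy; the function names, the (precursor m/z, product m/z) tuple layout, and the ramp numbers are hypothetical, not values from the source.

```python
from typing import Dict, Tuple

# (precursor m/z, product m/z) identifies one MRM transition (hypothetical layout).
Transition = Tuple[float, float]


def best_collision_energy(abundance_by_ce: Dict[float, float]) -> float:
    """Return the collision energy giving the largest product-ion abundance.

    Ties resolve to the first energy in the dictionary's iteration order.
    """
    if not abundance_by_ce:
        raise ValueError("no collision energies were tested")
    return max(abundance_by_ce, key=abundance_by_ce.__getitem__)


def optimize_transitions(
    ramps: Dict[Transition, Dict[float, float]],
) -> Dict[Transition, float]:
    """Pick an optimal collision energy independently for each transition."""
    return {transition: best_collision_energy(ramp) for transition, ramp in ramps.items()}


# Hypothetical ramp data for one transition.
ramp = {10.0: 1.2e4, 15.0: 3.4e4, 20.0: 2.1e4}
print(best_collision_energy(ramp))  # 15.0
```

Optimizing each transition on its own, rather than using one instrument-wide energy, lets glycopeptide transitions settle at the lower energies they may require.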
  • Quality control 210 procedures can be put in place to optimize data quality.
  • Measures can be put in place so that only deviations from an expected value that fall within acceptable ranges are allowed.
  • Statistical models may be employed (e.g., using Westgard rules).
  • Quality control 210 may include, for example, assessing the retention time and abundance of representative peptide structures (e.g., glycosylated and/or aglycosylated) and spiked-in internal standards, in either every sample or in each quality control sample (e.g., pooled serum digest).
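Westgard rules flag a quality control run when control results drift too far from their expected value. The sketch below checks four commonly used rules (1-3s, 2-2s, R-4s, 4-1s) against a series of QC measurements, such as the abundance or retention time of a representative peptide structure in a pooled serum digest. It assumes the expected mean and standard deviation come from historical QC data; the function name and the choice of rule subset are illustrative, and R-4s is simplified to consecutive results.

```python
from typing import List, Sequence


def westgard_violations(values: Sequence[float], mean: float, sd: float) -> List[str]:
    """Return the names of Westgard rules violated by a series of QC results."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = [(v - mean) / sd for v in values]
    pairs = list(zip(z, z[1:]))
    violations = []
    # 1-3s: any single result beyond +/-3 SD.
    if any(abs(x) > 3 for x in z):
        violations.append("1-3s")
    # 2-2s: two consecutive results beyond 2 SD on the same side.
    if any((a > 2 and b > 2) or (a < -2 and b < -2) for a, b in pairs):
        violations.append("2-2s")
    # R-4s (simplified): consecutive results beyond 2 SD on opposite sides.
    if any((a > 2 and b < -2) or (a < -2 and b > 2) for a, b in pairs):
        violations.append("R-4s")
    # 4-1s: four consecutive results beyond 1 SD on the same side.
    windows = [z[i:i + 4] for i in range(len(z) - 3)]
    if any(all(x > 1 for x in w) or all(x < -1 for x in w) for w in windows):
        violations.append("4-1s")
    return violations


print(westgard_violations([10.0, 10.1, 9.9], mean=10.0, sd=0.1))  # []
print(westgard_violations([10.25, 10.22], mean=10.0, sd=0.1))    # ['2-2s']
```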
  • Peak integration and normalization 212 may be performed to process the data that has been generated and transform the data into a format for analysis.
  • Peak integration and normalization 212 may include converting abundance data for various product ions that were detected for a selected peptide structure into a single quantification metric (e.g., a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, a normalized concentration, etc.) for that peptide structure.
  • Peak integration and normalization 212 may be performed using one or more of the techniques described in U.S. Patent Publication No. 2020/0372973A1 and/or U.S. Patent Publication No. 2020/0240996A1, the disclosures of which are incorporated by reference herein in their entireties.
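One way to collapse several product-ion signals into a single quantification metric is to integrate each transition's chromatographic peak, sum the areas, and express the total relative to an internal standard's peak area. This is a minimal sketch under those assumptions (trapezoidal integration, unweighted summation, a ratio to one internal standard). It is not the technique of the publications cited above, and the function names and ion labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Trace = Tuple[Sequence[float], Sequence[float]]  # (retention times, intensities)


def peak_area(times: Sequence[float], intensities: Sequence[float]) -> float:
    """Trapezoidal area under one extracted-ion chromatogram."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    return sum(
        (times[i + 1] - times[i]) * (intensities[i] + intensities[i + 1]) / 2.0
        for i in range(len(times) - 1)
    )


def relative_quantity(product_ion_traces: Dict[str, Trace], internal_standard_area: float) -> float:
    """Summed product-ion peak areas for one peptide structure, relative to the internal standard."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    total = sum(peak_area(t, y) for t, y in product_ion_traces.values())
    return total / internal_standard_area


traces = {
    "y5": ([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]),
    "y7": ([0.0, 1.0, 2.0], [0.0, 6.0, 0.0]),
}
print(relative_quantity(traces, internal_standard_area=8.0))  # 2.0
```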
  • Figure 3 is a block diagram of an analysis system 300 in accordance with one or more embodiments.
  • Analysis system 300 can be used to both detect and analyze various peptide structures that have been associated with various disease states.
  • Analysis system 300 is one example of an implementation for a system that may be used to perform data analysis 108 in Figure 1. Thus, analysis system 300 is described with continuing reference to workflow 100 as described in Figures 1, 2A, and/or 2B.
  • Analysis system 300 may include computing platform 302 and data store 304. In some embodiments, analysis system 300 also includes display system 306. Computing platform 302 may take various forms. In one or more embodiments, computing platform 302 includes a single computer (or computer system) or multiple computers in communication with each other. In other examples, computing platform 302 takes the form of a cloud computing platform.
  • Data store 304 and display system 306 may each be in communication with computing platform 302.
  • Data store 304, display system 306, or both may be considered part of or otherwise integrated with computing platform 302.
  • Computing platform 302, data store 304, and display system 306 may be separate components in communication with each other, but in other examples, some combination of these components may be integrated together. Communication between these different components may be implemented using any number of wired communications links, wireless communications links, optical communications links, or a combination thereof.
  • Analysis system 300 includes, for example, peptide structure analyzer 308, which may be implemented using hardware, software, firmware, or a combination thereof.
  • Peptide structure analyzer 308 may be implemented using computing platform 302.
  • Peptide structure analyzer 308 receives peptide structure data 310 for processing.
  • Peptide structure data 310 may be, for example, the peptide structure data that is output from sample preparation and processing 106 in Figures 1, 2A, and 2B. Accordingly, peptide structure data 310 may correspond to set of peptide structures 122 identified for biological sample 112 and may thereby correspond to biological sample 112.
  • Peptide structure data 310 can be sent as input into peptide structure analyzer 308, retrieved from data store 304 or some other type of storage (e.g., cloud storage), accessed from cloud storage, or obtained in some other manner. In some cases, peptide structure data 310 may be retrieved from data store 304 in response to (e.g., directly or indirectly based on) receiving user input entered by a user via an input device.
  • Peptide structure analyzer 308 includes model 312 that is configured to receive peptide structure data 310 for processing.
  • Model 312 may be implemented in any of a number of different ways. Model 312 may be implemented using any number of models, functions, equations, algorithms, and/or other mathematical techniques.
  • Model 312 may include machine learning system 314, which may itself comprise any number of machine learning models and/or algorithms.
  • Machine learning system 314 may include, but is not limited to, at least one of a deep learning model, a neural network, a linear discriminant analysis model, a quadratic discriminant analysis model, a support vector machine, a random forest algorithm, a nearest neighbor algorithm (e.g., a k-Nearest Neighbors algorithm), a combined discriminant analysis model, a k-means clustering algorithm, an unsupervised model, a multivariable regression model, a penalized multivariable regression model, or another type of model.
  • Model 312 may include a machine learning system 314 that comprises any number or combination of the models or algorithms described above.
  • Model 312 analyzes peptide structure data 310 to generate disease indicator 316, which indicates whether the biological sample is positive for an ovarian cancer disease state based on set of peptide structures 318 identified as being associated with the ovarian cancer disease state.
  • Peptide structure data 310 may include quantification data for the plurality of peptide structures. Quantification data for a peptide structure can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • Peptide structure data 310 may include a set of quantification metrics for each peptide structure of a plurality of peptide structures.
  • A quantification metric for a peptide structure may be selected as one of a relative quantity, an adjusted quantity, a normalized quantity, a relative abundance, an adjusted abundance, and a normalized abundance.
  • A quantification metric for a peptide structure may be selected from one of a relative concentration, an adjusted concentration, and a normalized concentration.
  • The quantification metrics used may be normalized abundances. In this manner, peptide structure data 310 may provide abundance information about the plurality of peptide structures with respect to biological sample 112.
  • Disease indicator 316 may take various forms.
  • Disease indicator 316 may include a classification that indicates whether or not the subject is positive for the ovarian cancer disease state.
  • Disease indicator 316 can include a score 320.
  • Score 320 indicates whether the ovarian cancer disease state is present or not.
  • Score 320 may be a probability score that indicates how likely it is that biological sample 112 evidences the presence of the ovarian cancer disease state.
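As one concrete instance of model 312, a logistic regression can turn the normalized abundances of set of peptide structures 318 into a probability-like score 320. A minimal sketch, assuming the model has already been trained: the function name is hypothetical, the weights and intercept are placeholders rather than values from the source, and any of the other models listed above could be substituted.

```python
import math
from typing import Mapping


def disease_score(
    normalized_abundances: Mapping[str, float],
    weights: Mapping[str, float],
    intercept: float,
) -> float:
    """Logistic score in [0, 1] from per-peptide-structure quantification metrics."""
    missing = set(weights) - set(normalized_abundances)
    if missing:
        raise KeyError(f"missing peptide structures: {sorted(missing)}")
    linear = intercept + sum(w * normalized_abundances[ps] for ps, w in weights.items())
    # Numerically stable logistic function (avoids overflow for large |linear|).
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Placeholder coefficients for two peptide structures named in the tables.
weights = {"PS-5": 0.8, "PS-12": -0.4}
print(disease_score({"PS-5": 1.0, "PS-12": 2.0}, weights, intercept=0.0))  # 0.5
```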
  • A peptide structure of set of peptide structures 318 may comprise a glycosylated peptide structure, or glycopeptide structure, that is defined by a peptide sequence and a glycan structure attached to a linking site of the peptide sequence.
  • The peptide structure may be a glycopeptide or a portion of a glycopeptide.
  • A peptide structure of set of peptide structures 318 may comprise an aglycosylated peptide structure that is defined by a peptide sequence.
  • The peptide structure may be a peptide or a portion of a peptide and may be referred to as a quantification peptide.
  • Set of peptide structures 318 may be identified as being those most predictive or relevant to the ovarian cancer disease state based on training of model 312.
  • Set of peptide structures 318 may include at least one, at least two, or at least three peptide structures from a first group of peptide structures (peptide structures PS-1 through PS-10) identified in Table 1 in Section VI.A., or at least one, at least two, or at least three peptide structures from a second group of peptide structures (peptide structures PS-5 and PS-11 through PS-34) identified in Table 2 in Section VI.A.
  • Set of peptide structures 318 may include at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures identified in Table 1 below in Section VI.
  • Set of peptide structures 318 may include at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures identified in Table 2 below in Section VI.A.
  • Set of peptide structures 318 may include at least peptide structure PS-5, which is identified in both Table 1 and Table 2.
  • The number of peptide structures selected from Table 1 for inclusion in set of peptide structures 318 may be based on, for example, a desired level of accuracy.
  • Set of peptide structures 318 may include at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures identified in Table 3 below in Section VI.A.
  • Set of peptide structures 318 may include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, or all 61 of the peptide structures listed in Tables 1, 2, and 3.
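The membership criteria above can be checked mechanically. Table 1 contributes PS-1 through PS-10 and Table 2 contributes PS-5 plus PS-11 through PS-34, so the two tables share only PS-5 and together contain 10 + 25 - 1 = 34 distinct peptide structures. A minimal sketch of that bookkeeping; Table 3's identifiers are not listed in this section, so it is omitted, and the helper name `meets_minimum` is hypothetical.

```python
from typing import Iterable, Set

TABLE_1: Set[str] = {f"PS-{i}" for i in range(1, 11)}              # PS-1..PS-10
TABLE_2: Set[str] = {"PS-5"} | {f"PS-{i}" for i in range(11, 35)}  # PS-5, PS-11..PS-34


def meets_minimum(panel: Iterable[str], table: Set[str], k: int) -> bool:
    """True if at least k peptide structures in the panel come from the table."""
    return len(set(panel) & table) >= k


print(len(TABLE_1), len(TABLE_2), len(TABLE_1 & TABLE_2), len(TABLE_1 | TABLE_2))
# 10 25 1 34

panel = {"PS-5", "PS-12", "PS-13"}
print(meets_minimum(panel, TABLE_1, 3) or meets_minimum(panel, TABLE_2, 3))  # True
```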
  • Machine learning system 314 may take the form of binary classification model 322.
  • Binary classification model 322 may include, for example, but is not limited to, a regression model.
  • Binary classification model 322 may include, for example, a penalized multivariable regression model that is trained to identify set of peptide structures 318 from a plurality of (or panel of) peptide structures identified in various subjects.
  • Binary classification model 322 may be trained to identify weight coefficients for peptide structures, and those peptide structures having non-zero weights or weight coefficients above a selected threshold (e.g., an absolute weight coefficient above 0.0, 0.01, 0.05, 0.1, 0.15, 0.2, etc.) may be selected for inclusion in set of peptide structures 318.
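Selecting set of peptide structures 318 from a trained penalized model reduces to thresholding the absolute weight coefficients. In practice the weights might come from, for example, scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`, whose L1 penalty drives many coefficients to exactly zero; that library choice is an assumption, and the sketch below simply takes already-fitted weights as input. The function name and the example weights are hypothetical.

```python
from typing import Dict, List


def select_peptide_structures(weights: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Peptide structures whose |weight| exceeds the threshold, largest first."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    kept = [ps for ps, w in weights.items() if abs(w) > threshold]
    return sorted(kept, key=lambda ps: abs(weights[ps]), reverse=True)


fitted = {"PS-1": 0.30, "PS-2": 0.0, "PS-3": -0.52, "PS-4": 0.01}
print(select_peptide_structures(fitted))                  # ['PS-3', 'PS-1', 'PS-4']
print(select_peptide_structures(fitted, threshold=0.05))  # ['PS-3', 'PS-1']
```

Using the absolute value keeps peptide structures whose abundance is negatively associated with the disease state, which a signed threshold would discard.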
  • Peptide structure analyzer 308 may generate final output 128 based on disease indicator 316 output by model 312. In other embodiments, final output 128 may be an output generated by model 312.
  • Final output 128 may include disease indicator 316.
  • Final output 128 may include diagnosis output 324, treatment output 326, or both.
  • Diagnosis output 324 may include, for example, a diagnosis for the ovarian cancer disease state.
  • The diagnosis can include a positive diagnosis or a negative diagnosis for the ovarian cancer disease state.
  • Generating diagnosis output 324 may include comparing score 320 to selected threshold 328 to determine the diagnosis.
  • Selected threshold 328 may be, for example, without limitation, a score between 0.30 and 0.65 (e.g., 0.4, 0.5, 0.6, etc.).
  • A score 320 above 0.5 may indicate the presence of the ovarian cancer disease state and be output in diagnosis output 324 as a positive diagnosis.
  • A score 320 below 0.5 may indicate that the ovarian cancer disease state is not present and be output in diagnosis output 324 as a negative diagnosis.
  • A negative diagnosis may indicate that the subject is healthy.
  • A negative diagnosis may indicate that a detected pelvic tumor (or mass) is benign.
  • A biopsy may be recommended. For example, a biopsy of the subject may be performed in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state.
  • Peptide structure analyzer 308 (or another system implemented on computing platform 302) may generate a report recommending that a biopsy be performed for the subject in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state.
  • Peptide structure analyzer 308 may send final output 128 to remote system 130 over one or more wireless, wired, and/or optical communications links, and remote system 130 may generate a report recommending that a biopsy be performed for the subject in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state.
  • The biopsy may be used to confirm the diagnosis and to determine whether or not to administer treatment and/or how quickly to administer treatment.
  • When disease indicator 316 and/or diagnosis output 324 indicate a negative diagnosis for the ovarian cancer disease state (e.g., a benign pelvic tumor), the report that is generated by peptide structure analyzer 308, remote system 130, or some other system implemented on computing platform 302 may recommend a period of monitoring for the subject.
  • A negative diagnosis indication by disease indicator 316 and/or diagnosis output 324 may thus help prevent unnecessary treatment or overtreatment of the subject.
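The thresholding and reporting logic above can be sketched as a small function. It assumes the default threshold of 0.5 and treats a score exactly equal to the threshold as negative, a tie-breaking choice the text does not specify. The class name, function name, and recommendation strings are hypothetical paraphrases of the report contents described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisOutput:
    positive: bool
    recommendation: str


def diagnose(score: float, threshold: float = 0.5) -> DiagnosisOutput:
    """Map a disease score to a diagnosis and a follow-up recommendation."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score > threshold:
        return DiagnosisOutput(True, "biopsy recommended to confirm the diagnosis")
    return DiagnosisOutput(False, "period of monitoring recommended")


print(diagnose(0.82))
print(diagnose(0.31, threshold=0.4).positive)  # False
```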
  • Treatment output 326 may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • The treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • Final output 128 may be sent to remote system 130 for processing in some examples. In other embodiments, final output 128 may be displayed on graphical user interface 330 in display system 306 for viewing by a human operator.
  • Figure 4 is a block diagram of a computer system in accordance with various embodiments.
  • Computer system 400 may be an example of one implementation for computing platform 302 described above in Figure 3.
  • Computer system 400 can include a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information.
  • Computer system 400 can also include a memory, which can be a random-access memory (RAM) 406 or other dynamic storage device, coupled to bus 402 for storing instructions to be executed by processor 404. Memory also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404.
  • Computer system 400 can further include a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404.
  • A storage device 410, such as a magnetic disk or optical disk, can be provided and coupled to bus 402 for storing information and instructions.
  • Computer system 400 can be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT), liquid crystal display (LCD), or light-emitting diode (LED) display, for displaying information to a computer user.
  • An input device 414 can be coupled to bus 402 for communicating information and command selections to processor 404.
  • A cursor control 416, such as a mouse, a joystick, a trackball, a gesture input device, a gaze-based input device, or cursor direction keys, can be used for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412.
  • This input device 414 typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • Input devices 414 allowing for three-dimensional (e.g., x, y, and z) cursor movement are also contemplated herein.
  • Results can be provided by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in RAM 406.
  • Such instructions can be read into RAM 406 from another computer-readable medium or computer-readable storage medium, such as storage device 410.
  • Execution of the sequences of instructions contained in RAM 406 can cause processor 404 to perform the processes described herein.
  • Hard-wired circuitry can be used in place of or in combination with software instructions to implement the present teachings.
  • Implementations of the present teachings are not limited to any specific combination of hardware circuitry and software.
  • The term "computer-readable medium" (e.g., data store, data storage, storage device, data storage device, etc.) or "computer-readable storage medium" refers to any media that participates in providing instructions to processor 404 for execution.
  • Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
  • Non-volatile media can include, but are not limited to, optical, solid-state, and magnetic disks, such as storage device 410.
  • Volatile media can include, but are not limited to, dynamic memory, such as RAM 406.
  • Transmission media can include, but are not limited to, coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 402.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.
  • Instructions or data can be provided as signals on transmission media included in a communications apparatus or system to provide sequences of one or more instructions to processor 404 of computer system 400 for execution.
  • A communication apparatus may include a transceiver having signals indicative of instructions and data.
  • The instructions and data are configured to cause one or more processors to implement the functions outlined in the disclosure herein.
  • Representative examples of data communications transmission connections can include, but are not limited to, telephone modem connections, wide area networks (WAN), local area networks (LAN), infrared data connections, NFC connections, optical communications connections, etc.
  • The methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware, firmware, software, or any combination thereof.
  • The processing unit may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
  • the methods of the present teachings may be implemented as firmware and/or a software program and applications written in conventional programming languages such as C, C++, Python, etc. If implemented as firmware and/or software, the embodiments described herein can be implemented on a non-transitory computer-readable medium in which a program is stored for causing a computer to perform the methods described above. It should be understood that the various engines described herein can be provided on a computer system, such as computer system 400, whereby processor 404 would execute the analyses and determinations provided by these engines, subject to instructions provided by any one of, or a combination of, the memory components RAM 406, ROM 408, or storage device 410 and user input provided via input device 414.
  • VI. Exemplary Methodologies Relating to Diagnosis based on Peptide Structure Data Analysis
  • Figure 5 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 500 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • Process 500 may be used to generate a final output that includes at least a diagnosis output for the subject.
  • Step 502 includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data may be, for example, an implementation of peptide structure data 310 in Figure 3.
  • the peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures.
  • the quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures.
  • a quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 1 or Table 2, with the peptide sequence being one of SEQ ID NOS: 11-19 in Table 1 or one of SEQ ID NOS: 14, 15, and 31-46 in Table 2, the SEQ ID NOS being defined in Table 5 below.
  • Step 504 includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences an ovarian cancer disease state based on at least three peptide structures selected from a first group of peptide structures identified in Table 1 (below) or a second group of peptide structures identified in Table 2 (below).
  • the first and second groups of peptide structures are associated with the ovarian cancer disease state.
  • the first group of peptide structures is listed in Table 1 in order of relative significance to the disease indicator.
  • the second group of peptide structures is listed in Table 2 in order of relative significance to the disease indicator.
  • the first group of peptide structures in Table 1 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and a healthy state.
  • the first group of peptide structures may be used to predict the probability of EOC for use in clinically screening patients.
  • the first group of peptide structures in Table 1 may also be peptide structures that have been determined relevant to distinguishing between ovarian cancer (e.g., EOC) and a benign tumor state (e.g., a benign pelvic tumor).
  • the first group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC.
  • the second group of peptide structures in Table 2 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and the benign tumor state (e.g., a benign pelvic tumor).
  • the second group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC. In this manner, the second group of peptide structures may predict malignancy of an identified pelvic tumor.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 in Table 1.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-5 and PS-11 through PS-34 in Table 2.
  • the at least 3 peptide structures include at least PS-5, which is present in both Table 1 and Table 2.
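  • The "at least 3 peptide structures" requirement can be sketched as a simple selection check. This is a minimal illustration, not the claimed method: the panel sets are rebuilt only from the identifiers named in the text, treating PS-1 to PS-10 as the Table 1 panel and PS-5 plus PS-11 through PS-34 as the Table 2 panel (consistent with PS-5 appearing in both tables), and the function name `select_panel_features` is hypothetical.

```python
# Hypothetical panel definitions rebuilt from the PS identifiers named in the text.
TABLE1_PANEL = frozenset(f"PS-{i}" for i in range(1, 11))  # PS-1 to PS-10
TABLE2_PANEL = frozenset({"PS-5"} | {f"PS-{i}" for i in range(11, 35)})  # PS-5, PS-11 to PS-34


def _ps_number(ps_id: str) -> int:
    """Numeric part of an identifier such as 'PS-12' (used only for sorting)."""
    return int(ps_id.split("-")[1])


def select_panel_features(available, panel, minimum=3):
    """Return the panel members that were quantified, sorted by PS number.

    Raises ValueError when fewer than `minimum` panel members are available,
    mirroring the "at least 3 peptide structures" requirement.
    """
    chosen = sorted(set(available) & panel, key=_ps_number)
    if len(chosen) < minimum:
        raise ValueError(f"need at least {minimum} panel structures, got {len(chosen)}")
    return chosen


# PS-40 is ignored because it is not in the Table 1 panel.
print(select_panel_features(["PS-5", "PS-2", "PS-9", "PS-40"], TABLE1_PANEL))
# ['PS-2', 'PS-5', 'PS-9']
print(sorted(TABLE1_PANEL & TABLE2_PANEL))  # ['PS-5']
```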
  • step 504 may be implemented using a binary classification model (e.g., a regression model).
  • the regression model may be, for example, a penalized multivariable regression model.
  • the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 3 peptide structures; the weight coefficient of a corresponding peptide structure of the at least 3 peptide structures may indicate the relative significance of the corresponding peptide structure to the disease indicator.
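  • As an illustration of how weight coefficients and an intercept could be learned by a penalized multivariable regression used as a binary classifier, the sketch below fits an L2-penalized (ridge) logistic regression by batch gradient descent in plain Python. The penalty type, learning rate, epoch count, and the name `fit_penalized_logistic` are assumptions: the text does not say which penalty was used (L1 or elastic-net would be equally plausible), and an established implementation such as scikit-learn's `LogisticRegression` (whose `penalty` parameter accepts L1, L2, or elastic-net) would normally be preferred in practice.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_penalized_logistic(X, y, l2=0.1, lr=0.1, epochs=2000):
    """Minimize mean log-loss + (l2 / 2) * ||w||^2 by batch gradient descent.

    X: list of feature rows (one per sample, e.g. quantification metrics per
    peptide structure). y: list of 0/1 labels (1 = disease state present).
    Returns (weights, intercept); the intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        # Step on the averaged log-loss gradient plus the L2 penalty gradient (l2 * w).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Each learned weight plays the role of a peptide structure's weight coefficient, and its magnitude (on comparably scaled inputs) reflects that structure's relative significance to the disease indicator.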
  • step 504 may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 3 peptide structures.
  • the weighted value for a peptide structure of the at least 3 peptide structures may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure.
  • the disease indicator may be computed using the peptide structure profile.
  • the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
  • the peptide structure profile for a given peptide structure may include a corresponding feature (e.g., relative abundance, concentration, or site occupancy) for that peptide structure.
  • the relative abundance may be a normalized relative abundance; the concentration may be a normalized concentration.
  • two peptide structure profiles may be computed for the same peptide structure, each profile corresponding to a different feature.
  • a first peptide structure profile may include a relative abundance for a corresponding peptide structure and a second peptide structure profile may include a concentration for the same corresponding peptide structure.
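  • The peptide structure profile and logit described above reduce to a weighted sum. The sketch below keys each quantification metric by a (peptide structure, feature) pair so that two profiles, for example relative abundance and concentration, can coexist for the same peptide structure. The weight coefficients and intercept would come from model training; the values in the example are placeholders, not values from the disclosure, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# A feature key pairs a peptide structure ID with the feature it measures.
FeatureKey = Tuple[str, str]


def peptide_structure_profile(
    quant: Dict[FeatureKey, float], weights: Dict[FeatureKey, float]
) -> Dict[FeatureKey, float]:
    """Weighted value per feature key: quantification metric x weight coefficient."""
    missing = [key for key in weights if key not in quant]
    if missing:
        raise KeyError(f"no quantification data for {missing}")
    return {key: quant[key] * coef for key, coef in weights.items()}


def disease_logit(quant, weights, intercept: float) -> float:
    """Logit = sum of the weighted values in the profile + the trained intercept."""
    return intercept + sum(peptide_structure_profile(quant, weights).values())


# Placeholder inputs: PS-1 contributes two profiles (relative abundance and concentration).
quant = {
    ("PS-1", "relative_abundance"): 2.0,
    ("PS-1", "concentration"): 0.5,
    ("PS-5", "relative_abundance"): 1.0,
}
weights = {
    ("PS-1", "relative_abundance"): 0.4,
    ("PS-1", "concentration"): -1.2,
    ("PS-5", "relative_abundance"): 0.3,
}
print(round(disease_logit(quant, weights, intercept=-0.1), 6))  # 0.4
```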
  • the disease indicator comprises a probability that the biological sample is positive for the ovarian cancer disease state and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) the ovarian cancer disease state when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) the ovarian cancer disease state when the disease indicator is not greater than the selected threshold.
  • the selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
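  • Converting the logit into a probability-type disease indicator and applying the selected threshold can be sketched as follows. The logistic (sigmoid) transform is the standard link for logistic regression and is an assumption here; the text states only that the indicator may be a logit and may comprise a probability. A probability exactly equal to the threshold is reported as negative, matching the "not greater than the selected threshold" wording; the function names are hypothetical.

```python
import math


def probability_from_logit(logit: float) -> float:
    """Logistic transform of a logit into a probability in (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def call_sample(probability: float, threshold: float = 0.5) -> str:
    """Return 'positive' only when the probability is strictly greater than the threshold.

    The text gives 0.5 as one embodiment and other thresholds between 0.30 and 0.65.
    """
    return "positive" if probability > threshold else "negative"


p = probability_from_logit(0.4)           # about 0.599
print(call_sample(p))                     # positive
print(call_sample(0.5))                   # negative (not greater than 0.5)
print(call_sample(0.45, threshold=0.40))  # positive
```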
  • Step 506 includes generating a final output based on the disease indicator.
  • the final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3.
  • the diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator.
  • the diagnosis may be, for example, “positive” for the ovarian cancer disease state if the biological sample evidences the ovarian cancer disease state based on the disease indicator.
  • the diagnosis may be, for example, “negative” if the biological sample does not evidence the ovarian cancer disease state based on the disease indicator.
  • a negative diagnosis may mean that the biological sample has a non-ovarian cancer state.
  • the negative diagnosis for the ovarian cancer disease state can include at least one of a healthy state, a benign tumor state, or some other non-malignant state.
  • Generating the diagnosis output in step 506 may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the ovarian cancer disease state.
  • step 506 can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the ovarian cancer disease state.
  • the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
  • the final output in step 506 may include a treatment output if the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state.
  • the treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • Table 1 below lists a first group of peptide structures associated with malignant pelvic tumors (e.g., ovarian cancer such as EOC).
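  • One or more features (e.g., relative abundance, concentration, site occupancy) of these peptide structures may be used in the supervised machine learning model described above to generate the disease indicator.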
  • the first group of peptide structures is listed in Table 1 in order with respect to relative significance to the disease indicator.
  • the quantification metrics for peptide structure PS-9, peptide structure PS-10, or a combination of the two may form one input.
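  • One way to let PS-9, PS-10, or both feed a single model input is sketched below. Summing the two quantification metrics is an assumption made purely for illustration; the text does not say how the two are combined, and `combined_input` is a hypothetical name.

```python
from typing import Dict, Iterable


def combined_input(quant: Dict[str, float], members: Iterable[str] = ("PS-9", "PS-10")) -> float:
    """Collapse the metrics of several peptide structures into one model input.

    Uses whichever members were quantified; summation is an illustrative choice.
    """
    present = [quant[m] for m in members if m in quant]
    if not present:
        raise KeyError("none of the member peptide structures were quantified")
    return sum(present)


print(combined_input({"PS-9": 1.5, "PS-10": 2.0}))  # 3.5
print(combined_input({"PS-10": 2.0}))               # 2.0
```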
  • Table 1 also identifies check markers CK-1 and CK-2, which may also be used by the model.
  • Table 2 below lists a second group of peptide structures associated with malignant pelvic tumors (e.g., ovarian cancer such as EOC).
  • One or more features (e.g., relative abundance, concentration, site occupancy) of these peptide structures may be used in the supervised machine learning model described above to generate a disease indicator that predicts the probability of malignancy (e.g., in the context of triaging to distinguish between malignant and benign pelvic tumors).
  • the second group of peptide structures is listed in Table 2 in order with respect to relative significance to the disease indicator.
  • Table 2 also identifies check markers CK-3 and CK-4, which may also be used by the model.
  • Figure 6 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 600 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • Process 600 may be used to generate a final output that includes at least a diagnosis output for the subject.
  • Step 602 includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data may be, for example, an implementation of peptide structure data 310 in Figure 3.
  • the peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures.
  • the quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures.
  • a quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3, with the peptide sequence being one of SEQ ID NOS: 11, 14, 15, 31, 32, 33, 34, 37, 38, 40, 42, 44, 45, 46, 53-65 in Table 3, the SEQ ID NOS being defined in Table 5 below.
  • Step 604 includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that predicts whether the biological sample evidences a malignant pelvic tumor or benign pelvic tumor based on at least three peptide structures selected from a group of peptide structures identified in Table 3.
  • the group of peptide structures is listed in Table 3 in order of relative significance to the disease indicator, which may be a probability score.
  • the group of peptide structures is associated with malignancy (e.g., EOC).
  • the group of peptide structures in Table 3 includes peptide structures that have been determined relevant to distinguishing between a malignant and benign nature of a pelvic tumor.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • step 604 may be implemented using a binary classification model (e.g., a regression model).
  • the regression model may be, for example, a penalized multivariable regression model.
  • the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 3 peptide structures; the weight coefficient of a corresponding peptide structure of the at least 3 peptide structures may indicate the relative significance of the corresponding peptide structure to the disease indicator.
  • step 604 may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 3 peptide structures.
  • the weighted value for a peptide structure of the at least 3 peptide structures may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure.
  • the disease indicator may be computed using the peptide structure profile. For example, the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
  • the disease indicator comprises a probability that the biological sample evidences malignancy (e.g., EOC), and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) malignancy when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) malignancy when the disease indicator is not greater than the selected threshold.
  • the selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
  • Step 606 includes generating a final output based on the disease indicator.
  • the final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3.
  • the diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator.
  • the diagnosis may be, for example, “positive” for an ovarian cancer disease state (e.g., EOC) if the biological sample evidences malignancy based on the disease indicator.
  • the diagnosis may be, for example, “negative” if the biological sample does not evidence malignancy based on the disease indicator.
  • a negative diagnosis may mean that the biological sample evidences a benign status (or a non-ovarian cancer state).
  • Generating the diagnosis output in step 606 may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the ovarian cancer disease state.
  • step 606 can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the ovarian cancer disease state.
  • the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
  • the final output in step 606 may include a treatment output if the disease indicator predicts malignancy and/or the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state.
  • the treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • VI.B.2 Exemplary Methodology: Based on Table 3B
  • Figure 6B is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 600B may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • Process 600B may be used to generate a final output that includes at least a diagnosis output for the subject, such as, for example, early stage EOC or late stage EOC.
  • Step 602B includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data may be, for example, an implementation of peptide structure data 310 in Figure 3.
  • the peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures.
  • the quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures.
  • a quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3B, with the peptide sequence being one of SEQ ID NOS: 14, 18, 32, 33, 37, 39, 42, 45, 54, 56, 60, 68-77 in Table 3B, the SEQ ID NOS being defined in Table 5 below.
  • the glycopeptides of Table 3B are part of glycoproteins that are further described in Table 6, and the glycan portion of the glycopeptides is described in
  • Step 604B includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that predicts whether the biological sample evidences early stage or late stage EOC based on at least one peptide structure selected from a group of peptide structures identified in Table 3B.
  • the group of peptide structures is associated with the early stage or late stage EOC.
  • the group of peptide structures in Table 3B includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) and late stage (stages 3 and 4) EOC.
  • the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, or all 36 of the peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, and PS-62 to PS-90 identified in Table 3B.
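  • The panel lists above mix single identifiers with ranges, which makes them easy to miscount. A small helper that expands the notation (a hypothetical utility, not part of the disclosure) confirms that the Table 3 list covers 38 peptide structures and the Table 3B list covers 36, matching the stated totals.

```python
import re

# "PS-n" or a range "PS-a to PS-b" / "PS-a through PS-b"; tolerates spacing such as "PS- 29".
_ITEM = re.compile(r"PS-\s*(\d+)(?:\s+(?:to|through)\s+PS-\s*(\d+))?")


def expand_ps_list(text: str) -> set:
    """Expand single identifiers and inclusive ranges into a set of 'PS-n' strings."""
    ids = set()
    for match in _ITEM.finditer(text):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        ids.update(f"PS-{i}" for i in range(start, end + 1))
    return ids


TABLE3_LIST = ("PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, "
               "PS-31, PS-32, and PS-35 to PS-61")
TABLE3B_LIST = "PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, and PS-62 to PS-90"

print(len(expand_ps_list(TABLE3_LIST)))   # 38
print(len(expand_ps_list(TABLE3B_LIST)))  # 36
```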
  • step 604B may be implemented using a binary classification model (e.g., a regression model).
  • the regression model may be, for example, a penalized multivariable regression model.
  • the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 1 peptide structure; the weight coefficient of a corresponding peptide structure of the at least 1 peptide structure may indicate the relative significance of the corresponding peptide structure to the disease indicator.
  • step 604B may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 1 peptide structure.
  • the weighted value for a peptide structure of the at least 1 peptide structure may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure.
  • the disease indicator may be computed using the peptide structure profile. For example, the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
  • the disease indicator comprises a probability that the biological sample evidences malignancy (e.g., EOC), and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) malignancy when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) malignancy when the disease indicator is not greater than the selected threshold.
  • the selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
  • Step 606B includes generating a final output based on the disease indicator.
  • the final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3.
  • the diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator.
  • the diagnosis may be, for example, early stage or late stage based on the disease indicator.
  • An early stage diagnosis may mean that the biological sample evidences a stage 1 or 2 EOC.
  • a late stage diagnosis may mean that the biological sample evidences a stage 3 or 4 EOC.
  • Generating the diagnosis output in step 606B may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the late stage ovarian cancer disease state.
  • step 606B can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the late stage ovarian cancer disease state.
  • the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
  • the final output in step 606B may include a treatment output if the disease indicator predicts malignancy and/or the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state.
  • the treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • Fold changes (FCs) above 1 corresponded to markers that correlate with metastatic ovarian cancer, and FCs below 1 corresponded to markers that correlate with non-metastatic ovarian cancer.
  • the Wilcoxon matched-pairs signed-rank test was used to compare the two groups, and the resulting p value (p < 0.0001) showed a statistically significant difference between the non-fucosylated and fucosylated markers.
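  • For readers reproducing this kind of comparison, the Wilcoxon matched-pairs signed-rank statistic can be sketched in plain Python as below. This is a generic textbook implementation with a two-sided normal approximation (no tie correction to the variance and no continuity correction), not the authors' analysis code; `scipy.stats.wilcoxon` is the usual library alternative and can compute exact p-values for small samples. Each pair would presumably be the non-fucosylated and fucosylated member of the same doublet.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon matched-pairs signed-rank test (normal approximation).

    x, y: paired measurements of equal length. Pairs with zero difference are
    dropped and tied absolute differences receive average ranks.
    Returns (w_plus, w_minus, p_value).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend the block of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p_value
```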
  • Figures 19A and 19B illustrate that the same set of markers, in the doublet/triplet analysis for fucosylation, revealed a strong association with either metastatic or non-metastatic ovarian cancer.
  • Doublet analysis refers to monitoring the fold change between a non-fucosylated and a fucosylated glycopeptide that were tri- or tetra-antennary for sialic acid and had the same peptide sequence and glycan linking site.
  • Triplet analysis refers to monitoring the fold change among a non-fucosylated, a fucosylated, and a di-fucosylated glycopeptide that were tri- or tetra-antennary for sialic acid and had the same peptide sequence and glycan linking site.
  • Figures 15A-15E show that the fucosylated biomarkers (those with a 1 as the second-to-last digit of the Peptide Structure (PS) Name) show a relatively upward trend from stage 1/2 to stage 3/4.
  • Figures 15A-15E show that the non-fucosylated biomarkers (those with a 0 as the second-to-last digit of the Peptide Structure (PS) Name) show a relatively downward trend from stage 1/2 to stage 3/4.
  • the glycan numbers 6513, 7613, and 7614 are examples of fucosylated glycans having tri- or tetra-antennary sialic acids.
  • the glycan numbers 6503, 7603, and 7604 are examples of non-fucosylated glycans having tri- or tetra-antennary sialic acids.
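  • The doublet and triplet groupings can be sketched by decoding the GL number digit by digit (Hex, HexNAc, Fuc, NeuAc, as in the KNG1_294_6503 example later in this section) and grouping peptide structures that share protein, linking site, and composition apart from fucose. The four-digit GL format, reading "tri- or tetra-antennary sialic acids" as NeuAc of 3 or 4, and the fold-change orientation (metastatic over non-metastatic, so values above 1 track metastatic disease) are assumptions; all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Composition(NamedTuple):
    hexose: int
    hexnac: int
    fucose: int
    neuac: int


def decode_gl(gl: str) -> Composition:
    """Read a four-digit GL number such as '6513' as Hex, HexNAc, Fuc, NeuAc counts."""
    if len(gl) != 4 or not gl.isdigit():
        raise ValueError(f"expected a four-digit GL number, got {gl!r}")
    return Composition(*(int(c) for c in gl))


def fucose_series(ps_names):
    """Group PROTEIN_SITE_GL names that differ only in fucose count.

    Keeps tri- or tetra-sialylated glycans (NeuAc 3 or 4) and returns
    {(protein, site, hex, hexnac, neuac): {fucose_count: name}} for groups with at
    least a non-fucosylated and a mono-fucosylated member (a doublet; a triplet
    also contains a di-fucosylated member).
    """
    groups = defaultdict(dict)
    for name in ps_names:
        parts = name.split("_")
        if len(parts) != 3:
            continue  # skips non-glycosylated names and names with extra tokens
        protein, site, gl = parts
        try:
            comp = decode_gl(gl)
        except ValueError:
            continue
        if comp.neuac in (3, 4):
            key = (protein, site, comp.hexose, comp.hexnac, comp.neuac)
            groups[key][comp.fucose] = name
    return {k: v for k, v in groups.items() if {0, 1} <= set(v)}


def fold_change(metastatic, non_metastatic):
    """Ratio of group means; > 1 means higher mean abundance in the metastatic group."""
    return mean(metastatic) / mean(non_metastatic)


print(decode_gl("6503"))  # Composition(hexose=6, hexnac=5, fucose=0, neuac=3)
print(all(decode_gl(g).fucose > 0 for g in ("6513", "7613", "7614")))  # True
```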
  • VI.B.3 Exemplary Methodology: Based on Table 3C
  • process 600B may be implemented using Table 3C instead of Table 3B.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3C, with the peptide sequence being one of SEQ ID NOS: 101-125 in Table 3C.
  • the group of peptide structures in Table 3C includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) and late stage (stages 3 and 4) EOC.
  • the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures corresponding to SEQ ID NOS: 101-125 identified in Table 3C.
  • VI.B.4 Exemplary Methodology: Based on Table 3D
  • process 600B may be implemented using Table 3D instead of Table 3B.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131-134, 137, 139, 140, 143, 151, and 165-167 in Table 3D.
  • the group of peptide structures in Table 3D includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) and late stage (stages 3 and 4) EOC.
  • the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, or all 50 of the peptide structures corresponding to SEQ ID NOS: 126-175 identified in Table 3D.
  • PS Peptide Structure
  • KNG1_294_6503 is a reference code consisting of the protein name (e.g., KNG1), followed by the glycan linking site position in the protein (e.g., the number 294, which is preceded by an underscore and represents a sequential amino acid position in protein KNG1), and followed by the glycan structure GL number (e.g., the number 6503, which is preceded by an underscore and represents the glycan composition Hex(6)HexNAc(5)Fuc(0)NeuAc(3)).
  • the Peptide Structure (PS) Name contains a prefix that is an abbreviation of the protein (which may include a combination of letters and numbers) and corresponds to the Protein Abbreviation of Table 6.
  • the term Linking Site Pos. in Protein Sequence is a number that refers to the sequential position of an amino acid of the corresponding protein in which a glycan is attached.
  • the amino acid position in the protein sequence is defined by the sequential numbering of amino acids of the corresponding protein, as given by the UniProt ID of that protein.
  • the term Linking Site Pos. in Peptide Sequence is a number that refers to the sequential position of the amino acid of the corresponding peptide to which a glycan is attached.
  • the amino acid position of the peptide sequence is defined by the sequentially numbered order of amino acids for the peptide sequence.
  • Glycan Structure GL No. is a number that corresponds to a symbol structure and a composition of the glycan as indicated in Table 7.
  • the term AGP12 for SEQ ID NOs: 68-69 represents that the glycopeptide is a fragment of either AGP1 or AGP2.
  • the Glycan Linking Site Pos. in Protein Sequence column should be used for identification of the peptide.
  • if the number following the second underscore in the Peptide Structure (PS) NAME is inconsistent with the Glycan Structure GL NO column, then the Glycan Structure GL NO column should be used for identification of the glycan portion of the glycopeptide. If the Peptide Structure (PS) NAME does not contain any numbers, then the peptide is non-glycosylated. In some instances of the Peptide Structure (PS) NAME, the prefix is followed by a number with the notation MC, which indicates a missed cleavage at the position in the peptide sequence given by that number.
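The naming convention above can be illustrated with a small parser. This is a sketch under stated assumptions, not part of the disclosure: it assumes names of the form PROTEIN_SITE_GLNO (as in KNG1_294_6503), treats a name with no numeric fields after the prefix as a non-glycosylated peptide, and does not parse the MC (missed cleavage) notation because its exact format is not specified here. The names `parse_ps_name` and `PeptideStructureName` are hypothetical, and the table columns remain authoritative whenever they disagree with a decoded name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeptideStructureName:
    """Fields decoded from a PS name such as 'KNG1_294_6503' (assumed format)."""
    protein: str           # protein abbreviation prefix (Table 6)
    site: Optional[int]    # glycan linking site position in the protein sequence
    gl_no: Optional[int]   # Glycan Structure GL NO (Table 7)

    @property
    def glycosylated(self) -> bool:
        return self.gl_no is not None


def parse_ps_name(name: str) -> PeptideStructureName:
    """Split a PS name assumed to follow PROTEIN_SITE_GLNO.

    A name with no underscore-separated numeric fields is treated as a
    non-glycosylated peptide. Missed-cleavage (MC) annotations are not handled.
    """
    protein, *fields = name.split("_")
    if not fields:
        return PeptideStructureName(protein, None, None)
    if len(fields) != 2 or not all(f.isdigit() for f in fields):
        raise ValueError(f"unrecognized PS name: {name!r}")
    site, gl_no = (int(f) for f in fields)
    return PeptideStructureName(protein, site, gl_no)


print(parse_ps_name("KNG1_294_6503"))
# PeptideStructureName(protein='KNG1', site=294, gl_no=6503)
```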
  • Figure 7 is a flowchart of a process for training a model to diagnose a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 700 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • process 700 may be one example of an implementation for training the model used in the process 500 in Figures 5, 6, or 6B.
  • Step 702 includes receiving quantification data for a panel of peptide structures for a plurality of subjects.
  • the plurality of subjects may include a first portion diagnosed with a negative diagnosis of an ovarian cancer disease state and a second portion diagnosed with a positive diagnosis of the ovarian cancer disease state.
  • the plurality of subjects may include a first portion having early-stage EOC and a second portion having late-stage EOC.
  • the quantification data comprises an initial plurality of peptide structure profiles for the plurality of subjects.
  • a peptide structure profile in the initial plurality of peptide structure profiles may include a feature associated with a corresponding peptide structure.
  • the feature may be relative abundance, concentration, site occupancy, or some other quantification-based feature.
  • the initial plurality of peptide structure profiles may include one, two, three, or more profiles for a given peptide structure.
  • Step 704 includes training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state (e.g., the first group of peptide structures is identified in Table 1, the second group of peptide structures is identified in Table 2, the third group of peptide structures is identified in Table 3).
  • the first, second, and third groups of peptide structures are listed in Tables 1, 2, and 3, respectively, with respect to relative significance to diagnosing the biological sample as evidencing malignancy (e.g., EOC).
  • Step 704 can include training the machine learning model using a portion of the quantification data corresponding to a training group of peptide structures included in the plurality of peptide structures.
  • Step 704 can include training a machine learning model using the quantification data to assess a biological sample with respect to the staging of the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state such as a group of peptide structures identified in Table 3B, 3C, or 3D.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 1 above.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 2 above.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Tables 3B, 3C, or 3D above.
  • Training data can be used for training the supervised machine learning model.
  • the training data can include a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects.
  • the plurality of subject diagnoses can include a positive diagnosis for any subject of the plurality of subjects determined to have the ovarian cancer disease state and a negative diagnosis for any subject of the plurality of subjects determined not to have the ovarian cancer disease state.
  • the machine learning model can include a binary classification model.
  • Some binary classification models can include logistic regression models.
  • Some logistic regression models can include LASSO (L1-penalized) regression models.
  • An alternative or additional step in process 700 can include filtering the initial plurality of peptide structure profiles by coefficient of variation to generate the plurality of peptide structure profiles used to train the machine learning model. As one example, only those peptide structure profiles having a low coefficient of variation (<20%) were included in the plurality of peptide structure profiles used for training.
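A coefficient-of-variation filter of this kind might look like the sketch below. It assumes the CV is computed per feature over repeated measurements (for example, replicate injections of a pooled reference digest); the text does not say which measurements the CV is taken over, and `filter_by_cv` and the feature names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def filter_by_cv(profiles, max_cv=20.0):
    """Keep only features whose CV is below max_cv percent.

    profiles: {feature name: list of repeated measurements of that feature}
    """
    return {
        feature: values
        for feature, values in profiles.items()
        if coefficient_of_variation(values) < max_cv
    }


# Hypothetical measurements for two features.
profiles = {
    "feature_a": [100.0, 102.0, 98.0, 101.0],  # CV of about 1.7%
    "feature_b": [10.0, 30.0, 5.0, 25.0],      # CV of about 68%
}
print(sorted(filter_by_cv(profiles)))  # ['feature_a']
```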
  • An alternative or additional step in process 700 can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the positive diagnosis for the ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the negative diagnosis for the ovarian cancer disease state.
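The disclosure does not name the statistic used for the differential expression analysis. As one assumed choice, the sketch below reports a log2 fold change and a Welch t statistic for a single feature, comparing positive-diagnosis and negative-diagnosis subjects; `differential_expression` is a hypothetical helper and the values are made up.

```python
import math
import statistics


def differential_expression(positive, negative):
    """Return (log2 fold change, Welch t statistic) for one feature.

    positive, negative: values of the feature for subjects with the positive
    and negative diagnosis. Values must be > 0 for the fold change.
    """
    mean_pos, mean_neg = statistics.fmean(positive), statistics.fmean(negative)
    var_pos, var_neg = statistics.variance(positive), statistics.variance(negative)
    # Welch's standard error does not assume equal variances in the two groups.
    se = math.sqrt(var_pos / len(positive) + var_neg / len(negative))
    return math.log2(mean_pos / mean_neg), (mean_pos - mean_neg) / se


log2_fc, t_stat = differential_expression([4.0, 4.4, 3.6, 4.0], [2.0, 2.2, 1.8, 2.0])
print(round(log2_fc, 3), t_stat > 0)  # 1.0 True
```

A p-value, and a multiple-testing correction across features, would normally accompany the t statistic; that step is omitted from this sketch.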
  • An alternative or additional step in process 700 can include identifying a first portion of the plurality of samples for subjects with benign pelvic tumors and malignant pelvic tumors and a second portion of the plurality of samples for subjects with a healthy status.
  • An alternative or additional step in process 700 can include generating a training set of peptide structure profiles for 80% of the first portion and a test set of peptide structure profiles for a remaining 20% of the first portion and the second portion.
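The 80/20 split described above can be sketched as follows; `split_train_test`, the fixed seed, and the sample IDs are assumptions for illustration. As written in the text, all healthy-status samples (the second portion) go to the test set.

```python
import random


def split_train_test(first_portion, second_portion, train_fraction=0.8, seed=0):
    """Split the first portion 80/20; append the whole second portion to the test set.

    first_portion: sample IDs from subjects with benign or malignant pelvic tumors.
    second_portion: sample IDs from subjects with a healthy status.
    """
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(first_portion)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:] + list(second_portion)
    return train, test


tumor_ids = [f"tumor_{i}" for i in range(10)]   # hypothetical IDs
healthy_ids = ["healthy_0", "healthy_1"]
train, test = split_train_test(tumor_ids, healthy_ids)
print(len(train), len(test))  # 8 4
```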
  • the machine learning model is a supervised machine learning model that is trained to determine weight coefficients for a panel of peptide structures such that a first portion of the weight coefficients for a first portion of the panel of peptide structures are non-zero and a second portion of the weight coefficients for a second portion of the panel of peptide structures are zero (or, alternatively, substantially close to zero so as to not be statistically significant).
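The split into zero and non-zero weight coefficients is what an L1 (LASSO) penalty produces. Below is a minimal, dependency-free sketch of L1-penalized logistic regression fitted by proximal gradient descent (ISTA); it is not the authors' implementation, and the hyperparameters `lam`, `lr`, and `n_iter` as well as the toy data are illustrative assumptions. In practice a library routine such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would typically be used instead.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrinks toward zero, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """L1-penalized (LASSO) logistic regression via proximal gradient descent.

    X: list of feature rows (one per subject); y: list of 0/1 diagnoses.
    The intercept is not penalized. Returns (intercept, weights).
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * d
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label   # predicted minus observed
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then soft-thresholding for the L1 penalty.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(d)]
    return b, w


# Toy data: feature 0 tracks the diagnosis; feature 1 carries no label information.
X = [[1.0, 1.0], [1.0, -1.0], [0.8, 1.0], [0.8, -1.0],
     [-1.0, 1.0], [-1.0, -1.0], [-0.8, 1.0], [-0.8, -1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
intercept, weights = fit_l1_logistic(X, y)
print(weights[0] > 0, weights[1])  # True 0.0
```

The uninformative feature keeps a weight of exactly zero, which corresponds to a peptide structure being dropped from the final panel.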
  • the final output generated in step 506 in Figure 5 or in step 606 in Figure 6 may include a treatment output.
  • the treatment output may identify one or more treatment types for a subject based on the disease indicator and/or diagnosis output generated via process 500 in Figure 5 or process 600 in Figure 6, respectively.
  • the treatment output may include, for example, a treatment plan.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • Being able to accurately predict malignancy via the process 500 in Figure 5 and/or the process 600 in Figure 6 may allow treatment for malignant pelvic tumors (e.g., EOC) to be started earlier without requiring, in many or most cases, further invasive testing such as a biopsy.
  • a patient biological sample is obtained from a subject.
  • the biological sample may be processed (e.g., via digestion and fragmentation) such that one or more peptide structures of interest are detected. For example, detection and quantification may be performed for one or more peptide structures from Table 1, Table 2, Table 3, Table 3B, Table 3C, and/or Table 3D.
  • the quantification data that is generated for these peptide structures may be input into a trained binary classification model to generate a disease indicator, which may be, for example, a probability score.
  • a determination may be made as to whether the disease indicator (e.g., score) is above or below a selected threshold (e.g., 0.5). If the disease indicator is above the selected threshold, the biological sample may be classified as evidencing malignant pelvic tumor.
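Scoring and thresholding a new sample might look like the sketch below, assuming the trained binary classifier is a logistic model with an intercept and one weight per quantified peptide structure feature. The function names, weights, and feature values are hypothetical; only the 0.5 example threshold and the "above the threshold" rule come from the text.

```python
import math


def disease_indicator(intercept, weights, features):
    """Probability score from a fitted logistic model (hypothetical parameters)."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(score, threshold=0.5):
    """Classify as evidencing a malignant pelvic tumor only if score is above threshold."""
    return "malignant" if score > threshold else "not malignant"


score = disease_indicator(intercept=-1.0, weights=[0.8, 0.0, 1.5], features=[2.0, 3.1, 0.4])
print(round(score, 3), classify(score))  # 0.769 malignant
```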
  • this classification may further include a classification that the subject is in need of treatment. If the subject is in need of treatment based on the classification, treatment is administered. For example, a therapeutically effective amount of a therapeutic agent is administered to the patient, where the therapeutic agent is selected from a chemotherapeutic agent, an immunotherapeutic agent, a hormone therapy, a targeted therapeutic agent, a neoadjuvant therapy, or a combination thereof.
  • compositions comprising one or more of the peptide structures listed in Table 1, in Table 2, in Table 3, in Table 3B, in Table 3C, or in Table 3D.
  • a composition comprises a plurality of the peptide structures listed in Table 1, a plurality of the peptide structures listed in Table 2, or a plurality of the peptide structures listed in Table 3.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 of the peptide structures listed in Tables 1, 2, 3, 3B, 3C, and 3D.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or all 36 of the peptide structures listed in Table 3B. In one or more embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or all 25 of the glycopeptide structures listed in Table 3C.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or all 50 of the glycopeptide structures listed in Table 3D.
  • a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 11-19, 31-46, 53-65, 68-77, 101-125, and 126-175 listed in Tables 1, 2, 3, 3B, 3C, and 3D.
  • a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 131-134, 137, 139, 140, 143, 151, 165-167 listed in Table 3D.
  • compositions comprising one or more precursor ions having a defined charge and/or defined mass-to-charge (m/z) ratio, as listed in Tables 4, 4B, and 4C.
  • compositions comprising one or more product ions having a defined mass-to-charge (m/z) ratio, which product ions are produced by converting a peptide structure described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D) into a gas phase ion in a mass spectrometry system.
  • Conversion of the peptide structure into a gas phase ion can take place using any of a variety of techniques, including, but not limited to, matrix assisted laser desorption ionization (MALDI); electron ionization (EI); electrospray ionization (ESI); atmospheric pressure chemical ionization (APCI); and/or atmospheric pressure photo ionization (APPI).
  • MALDI matrix assisted laser desorption ionization
  • EI electron ionization
  • ESI electrospray ionization
  • APCI atmospheric pressure chemical ionization
  • APPI atmospheric pressure photo ionization
  • compositions comprising one or more product ions produced from one or more of the peptide structures described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D).
  • a composition comprises a set of the product ions listed in Table 4, 4B, or 4C having an m/z ratio selected from the list provided for each peptide structure in Table 4, 4B, or 4C.
  • a composition comprises at least one of peptide structures PS-1 to PS-10 identified in Table 1. In some embodiments, a composition comprises at least one of peptide structures PS-11 to PS-34 and PS-5 identified in Table 2. In some embodiments, a composition comprises at least one of peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3. In some embodiments, a composition comprises at least one of peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, and PS-62 to PS-90 identified in Table 3B.
  • a composition comprises at least one of the peptide structures of SEQ ID NOS 101-125 identified in Table 3C. In some embodiments, a composition comprises at least one of peptide structures PS-91 to PS-140 identified in Table 3D. In some embodiments, a composition comprises at least one of peptide structures PS-96 to PS-99, PS-102, PS-104, PS-105, PS-108, PS-116, and PS-130 to PS-132 identified in Table 3D. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 identified in Table 1.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-11 to PS-34 and PS-5 identified in Table 2.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • the at least 3 peptide structures additionally include at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all 7 of the remaining peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, or all 36 of the peptide structures PS-4, PS-8, PS-
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures of SEQ ID NOS 101-125 identified in Table 3C.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, or all 12 of the peptide structures of SEQ ID NOS 131-134, 137, 139, 140, 143, 151, 165-167 identified in Table 3D.
  • a composition comprises a peptide structure or a product ion.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 11-19, as identified in Table 5, corresponding to peptide structures PS-1 to PS-10 in Table 1.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 14, 15, 31-46, as identified in Table 5, corresponding to various ones of peptide structures PS-5 and PS-11 to PS-34 in Table 2.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 11, 14, 15, 31, 32, 33, 34, 37, 38, 40, 42, 44, 45, 46, and 53-65, as identified in Table 5, corresponding to various ones of peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 in Table 3.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 14, 18, 32, 33, 37, 39, 42, 45, 54, 56, 60, 68-77, as identified in Table 5, corresponding to various ones of peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 in Table 3B.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 101-125, corresponding to various ones of peptide structures in Table 3C or product ions in Table 4B.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 126-175, as identified in Table 5, corresponding to various ones of peptide structures PS-91 to PS-140 in Table 3D.
  • the product ion is selected as one from a group consisting of product ions identified in Tables 4, 4B, and 4C including product ions falling within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C and characterized as having a precursor ion having an m/z ratio within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C.
  • a first range for the product ion m/z ratio may be ±0.5.
  • a second range for the product ion m/z ratio may be ±0.8.
  • a third range for the product ion m/z ratio may be ±1.0.
  • a first range for the precursor ion m/z ratio may be ±1.0; a second range for the precursor ion m/z ratio may be ±1.5.
  • a composition may include a product ion having an m/z ratio that falls within at least one of the first range (±0.5), the second range (±0.8), or the third range (±1.0) of the product ion m/z ratio identified in Tables 4, 4B, and 4C, and characterized as having a precursor ion having an m/z ratio that falls within at least one of a first range (±0.5), a second range (±1.0), or a third range (±1.0) of the precursor ion m/z ratio identified in Tables 4, 4B, and 4C.
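Checking whether an observed precursor/product pair falls within these windows is a simple tolerance test. The sketch below assumes symmetric ± windows and defaults to the first product-ion range (±0.5) and a ±1.0 precursor range; `Transition`, `matches`, and the m/z values are hypothetical placeholders, not entries from Tables 4, 4B, or 4C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """A precursor/product ion pairing; real values would come from Tables 4, 4B, 4C."""
    precursor_mz: float
    product_mz: float


def matches(t: Transition, precursor_obs: float, product_obs: float,
            precursor_tol: float = 1.0, product_tol: float = 0.5) -> bool:
    """True when both observed m/z values fall inside the +/- tolerance windows."""
    return (abs(precursor_obs - t.precursor_mz) <= precursor_tol
            and abs(product_obs - t.product_mz) <= product_tol)


# Hypothetical transition values, for illustration only.
tr = Transition(precursor_mz=900.0, product_mz=500.0)
print(matches(tr, 900.8, 500.3))   # True
print(matches(tr, 900.8, 500.7))   # False: product ion is 0.7 away (> 0.5)
```

Passing `product_tol=0.8` or `product_tol=1.0` applies the second or third product-ion range instead.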
  • Table 4B Mass Spectrometry-Related Characteristics for the Peptide Structures associated with Ovarian Cancer (e.g., EOC) - in accordance with Table 3C
  • Table 4C Mass Spectrometry-Related Characteristics for the Peptide Structures associated with Ovarian Cancer (e.g., EOC) - in accordance with Table 3D
  • Tables 4, 4B, and 4C show various parameters associated with the identification of the peptide and glycopeptides using LC and MRM-MS.
  • the retention time (RT) represents the amount of time, in minutes, for the peptide to elute from the chromatography column.
  • the collision energy represents the energy applied to the peptide to create fragments (i.e., product ions), for example, in the second quadrupole of the triple quadrupole MS.
  • the first precursor m/z represents a ratio value associated with an ionized form having a first precursor charge for the peptide or glycopeptide.
  • the second precursor m/z represents a ratio value associated with an ionized form having a second precursor charge for the peptide or glycopeptide.
  • the first precursor ion is associated with a first product ion having an m/z ratio that was formed from a collision, and the second precursor ion is associated with a second product ion having an m/z ratio that was formed from a collision.
  • the first precursor and the second precursor may be the same, but the associated first and second product m/z ratios are different.
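Because one peptide or glycopeptide can be observed at more than one charge state, the first and second precursor m/z values can differ even though the molecule is the same. Assuming protonated positive ions, [M + zH]^z+, as is typical for electrospray, the relationship is m/z = (M + z × 1.007276) / z, where M is the neutral monoisotopic mass and 1.007276 Da is the proton mass. The function name and the mass used below are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def precursor_mz(neutral_mass, charge):
    """m/z of the protonated ion [M + zH]^z+ for a neutral monoisotopic mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same hypothetical glycopeptide (M = 2000.0 Da) at two precursor charges.
for z in (2, 3):
    print(z, round(precursor_mz(2000.0, z), 4))
# 2 1001.0073
# 3 667.6739
```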
  • Table 5 defines the peptide sequences for SEQ ID NOS: 11-19, 31-46, 53-65, 68-77, and 126-175 from at least one of Tables 1, 2, 3, 3B, 3C, and 3D. Table 5 further identifies a corresponding protein SEQ ID NO. for each peptide sequence.
  • Table 6 identifies the proteins of SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67 from at least one of Tables 1, 2, 3, 3B, 3C, and 3D.
  • Table 6 identifies a corresponding protein abbreviation and protein name for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67.
  • Table 6 identifies a corresponding UniProt ID and protein sequence for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67.
  • Table 7 identifies and defines the glycan symbol structures included in Tables 1, 2, 3, 3B, 3C, and 3D.
  • Table 7 identifies a coded representation of the composition for each glycan structure included in Tables 1, 2, 3, 3B, 3C, and 3D.
  • the 4-digit GL NO. is a designation that represents, in order, the number of hexoses, the number of HexNAcs, the number of fucoses, and the number of neuraminic acids. It should be noted that glycan structure GL NO 1102 is an O-glycan and the remaining glycans of Table 7 are N-glycans.
  • Table 7 illustrates the symbol structure and composition of the detected glycan moieties that correspond to glycopeptides of Tables 1, 2, 3, 3B, 3C, and 3D based on the Glycan GL NO.
  • the term Symbol Structure illustrates a geometric linking structure of the carbohydrates, where the bottommost carbohydrate, such as N-acetylglucosamine, is bound to the designated amino acid for an N-linked glycan, and the rightmost carbohydrate, such as N-acetylgalactosamine, is bound to the designated amino acid for an O-linked glycan.
  • Glycan Structure GL NO 1102 is an O-linked glycan, and the rest of the glycans in Table 7 are N-linked glycans.
  • N-linked glycans have a glycan attached to the amino acid asparagine and O-linked glycans have a glycan attached to either a serine or a threonine.
  • the identity of the various monosaccharides is illustrated by the Legend section located at the end of Table 7.
  • the abbreviations of the Legend are Glc, which represents glucose and is indicated by a dark circle; Gal, which represents galactose and is indicated by an open circle; Man, which represents mannose and is indicated by a circle with intermediate grey shading; Fuc, which represents fucose and is indicated by a dark triangle; Neu5Ac, which represents N-acetylneuraminic acid and is indicated by a dark diamond; GlcNAc, which represents N-acetylglucosamine and is indicated by a dark square; GalNAc, which represents N-acetylgalactosamine and is indicated by an open square; and ManNAc, which represents N-acetylmannosamine and is indicated by a square with intermediate grey shading.
  • Composition refers to the number of various classes of carbohydrates that make up the glycan.
  • the quantity for each class of carbohydrate is depicted as a number in parenthesis to the right of an abbreviation that corresponds to the class of the carbohydrate.
  • the abbreviations for these classes are Hex, HexNAc, Fuc, and NeuAc, which respectively correspond to hexose, N-acetylhexosamine, fucose, and N-acetylneuraminic acid.
  • hexose sugars include glucose, galactose, and mannose; N-acetylhexosamine sugars include N-acetylglucosamine, N-acetylgalactosamine, and N-acetylmannosamine.
  • the terms Neu5Ac, NeuAc, and N-acetylneuraminic acid may be referred to as sialic acid.
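Because the 4-digit GL NO encodes the counts in Hex, HexNAc, Fuc, NeuAc order (for example, 6503 is Hex(6)HexNAc(5)Fuc(0)NeuAc(3)), the Composition notation can be generated mechanically. The sketch below does that and also sums standard monoisotopic residue masses for the four classes; the helper names are hypothetical, and the scheme only works while every count is a single digit.

```python
# Standard monoisotopic residue masses (Da) for the four monosaccharide classes,
# listed in the same order as the digits of a GL NO.
RESIDUE_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791, "NeuAc": 291.09542}


def decode_gl_no(gl_no):
    """Map a 4-digit GL NO to counts of Hex, HexNAc, Fuc, and NeuAc."""
    digits = f"{gl_no:04d}"
    if len(digits) != 4:
        raise ValueError("a GL NO has exactly four digits")
    return {name: int(d) for name, d in zip(RESIDUE_MASS, digits)}


def composition_string(gl_no):
    """Render the Composition notation, e.g. Hex(6)HexNAc(5)Fuc(0)NeuAc(3)."""
    return "".join(f"{name}({count})" for name, count in decode_gl_no(gl_no).items())


def glycan_mass(gl_no):
    """Mass the glycan adds to the peptide: the sum of its residue masses."""
    return sum(RESIDUE_MASS[name] * count for name, count in decode_gl_no(gl_no).items())


print(composition_string(6503))     # Hex(6)HexNAc(5)Fuc(0)NeuAc(3)
print(composition_string(1102))     # Hex(1)HexNAc(1)Fuc(0)NeuAc(2)
print(round(glycan_mass(6503), 4))  # 2861.0
```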
  • in some instances, two symbol structures are provided for one Glycan Structure GL NO, for example, Glycan Structure GL NO 3510.
  • the identity of a peptide that references a Glycan Structure GL NO having two symbol structures could be either one of the two possibilities based on the MRM of the LC-MS analysis.
  • a bracket symbol is used as part of the Symbol Structure to indicate that the precise bonding linkage is not exactly known, but that the linking line segment is attached to one of the carbohydrates immediately adjacent to the bracket.
  • the fucose of Glycan Structure GL NO 3510 could have either a core fucose or an outer-arm fucose linkage.
  • the fucose orientation of either core or outer-arm linkage can be specified.
  • glycan symbol structure can illustrate an antennary format in the form of branches.
  • Glycan Structure GL NOs 6513 and 7604 show tri-antennary and tetra-antennary sialic acid formats, respectively.
  • kits comprising one or more compositions, each comprising one or more peptide structures of the disclosure that can be used as assay standards, and instructions for use.
  • Kits in accordance with one or more embodiments described herein may include a label indicating the intended use of the contents of the kit.
  • the term label, as used herein with respect to a kit, includes any writing or recorded material supplied on or with a kit, or that otherwise accompanies a kit.
  • the peptide structures and the transitions produced therefrom, as described herein, may be useful for diagnosing and treating an ovarian cancer disease state.
  • a transition includes a precursor ion and at least one product ion grouping.
  • the peptide structures in Tables 1, 2, 3, 3B, 3C, and 3D, as well as their corresponding precursor ion and product ion groupings in Tables 4, 4B, and 4C, can be used in mass spectrometry-based analyses to diagnose and facilitate treatment of diseases, such as, for example, ovarian cancer (e.g., EOC).
  • aspects of the disclosure include methods for analyzing one or more peptide structures, as described herein.
  • the methods involve processing a sample from a patient to generate a prepared sample that can be inputted into a mass spectrometry system (e.g., a reaction monitoring mass spectrometry system).
  • processing the sample can comprise performing one or more of a denaturation procedure, a reduction procedure, an alkylation procedure, and a digestion procedure.
  • the denaturation and reduction procedures may be implemented in a manner similar to, for example, denaturation and reduction 202 in Figure 2.
  • the alkylation procedure may be implemented in a manner similar to, for example, alkylation procedure 204 in Figure 2.
  • the digestion procedure may be implemented in a manner similar to, for example, digestion procedure 206 in Figure 2.
  • the methods for analyzing one or more peptide structures involve detecting a set of product ions generated by a reaction monitoring mass spectrometry system in which one or more product ions may correspond to each of the one or more peptide structures that have been inputted into the mass spectrometry system.
  • each peptide structure can be converted into a set of product ions having a defined m/z ratio, as provided in Tables 4, 4B, and 4C, or an m/z ratio within an identified m/z range of that ratio, as provided in Tables 4, 4B, and 4C.
  • the methods involve generating quantification (e.g., abundance) data for the one or more product ions detected using the reaction monitoring mass spectrometry system.
  • the methods further comprise generating a diagnosis output using the quantification data and a model that has been trained using supervised or unsupervised machine learning.
  • the reaction monitoring mass spectrometry system may include multiple/selected reaction monitoring mass spectrometry (MRM/SRM-MS) to detect the one or more product ions and generate the quantification data.
  • Figure 8 is a table describing the distribution of the samples acquired in this exemplary retrospective analysis in accordance with one or more embodiments.
  • serum samples were acquired from a commercial biobank for 151 women with benign pelvic masses, 145 women with malignant epithelial ovarian cancer (EOC), and 55 healthy controls. Information on stage of EOC was available in 98 of the 145 patients with EOC. All samples were obtained prior to therapeutic intervention. Information on the benign or malignant nature of tumors was based on histopathological analysis of tissue specimens.
  • Sample processing used pooled human serum/plasma (e.g., glycoprotein standards purified from human serum/plasma) for assay normalization, dithiothreitol (DTT), iodoacetamide (IAA), sequencing-grade trypsin, LC-MS-grade water and acetonitrile, and LC-MS-grade formic acid. Serum samples were treated with DTT and IAA to reduce disulfide bonds and to inhibit cysteine proteases, respectively, followed by digestion with trypsin at 37°C for 18 hours. The digestion was quenched by adding formic acid to each sample to a final concentration of 1% (v/v).
  • DTT dithiothreitol
  • IAA iodoacetamide
  • LC-MS analysis included separating digested serum samples over an Agilent ZORBAX Eclipse Plus C18 column (2.1 mm i.d. × 150 mm, 1.8 µm particle size) using an Agilent 1290 Infinity UHPLC system.
  • mobile phase A consisted of 3% acetonitrile and 0.1% formic acid in water (v/v), and mobile phase B of 90% acetonitrile and 0.1% formic acid in water (v/v), with the flow rate set at 0.5 mL/minute.
  • the binary solvent composition was set at 100% mobile phase A at the beginning of the run, linearly shifting to 20% B at 20 minutes, 30% B at 40 minutes, and 44% B at 47 minutes.
  • the column was flushed with 100% B and equilibrated with 100% A for a total run time of 70 minutes.
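The solvent program above can be expressed as a piecewise-linear function of time. This sketch covers only the 0 to 47 minute separation segment, because the flush and equilibration timings within the 70-minute run are not given; it also converts %B into an approximate total acetonitrile percentage from the stated compositions of mobile phases A (3%) and B (90%). The function names are hypothetical.

```python
# (time in minutes, % mobile phase B) breakpoints from the text.
GRADIENT = [(0.0, 0.0), (20.0, 20.0), (40.0, 30.0), (47.0, 44.0)]
ACN_IN_A, ACN_IN_B = 3.0, 90.0  # % acetonitrile in mobile phases A and B


def percent_b(t_min):
    """Linearly interpolate %B between breakpoints over the 0-47 min separation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("outside the separation gradient (flush/equilibration not modelled)")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_acetonitrile(t_min):
    """Approximate total acetonitrile content of the A/B mixture at time t."""
    fraction_b = percent_b(t_min) / 100.0
    return ACN_IN_A * (1.0 - fraction_b) + ACN_IN_B * fraction_b


print(percent_b(30.0), percent_acetonitrile(30.0))  # 25.0 24.75
```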
  • samples were injected into an Agilent 6495B triple quadrupole MS operated in dynamic multiple reaction monitoring (dMRM) mode.
  • the MRM transitions comprised 513 glycopeptide structures which were normalized by comparing them with the abundance of 71 non-glycosylated peptide structures, representing each of 71 proteins from which the glycopeptides monitored were derived.
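The normalization described above can be sketched as a ratio of each glycopeptide's abundance to that of the non-glycosylated peptide from the same protein. This is an assumption about the exact arithmetic (the text says only that glycopeptide abundances were compared with those of the non-glycosylated peptides), and the function name and values are hypothetical.

```python
def relative_abundance(glyco_abundance, peptide_abundance, protein_of):
    """Divide each glycopeptide abundance by its protein's non-glycosylated peptide abundance.

    glyco_abundance: {glycopeptide name: abundance}
    peptide_abundance: {protein abbreviation: abundance of the non-glycosylated peptide}
    protein_of: maps a glycopeptide name to its protein abbreviation
    """
    return {
        name: value / peptide_abundance[protein_of(name)]
        for name, value in glyco_abundance.items()
    }


# Hypothetical abundances; the protein is taken from the PS name prefix.
glyco = {"KNG1_294_6503": 50.0}
peptides = {"KNG1": 200.0}
print(relative_abundance(glyco, peptides, lambda name: name.split("_")[0]))
# {'KNG1_294_6503': 0.25}
```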
  • Samples were injected in an order randomized with respect to underlying phenotype, and reference pooled serum digests were injected interspersed with the study samples.
VIII.A.3. Data Analysis
  • This subset included 976 features, with each feature being a concentration, relative abundance, or site occupancy for a corresponding peptide structure and where some peptide structures correspond with multiple features.
  • A given peptide structure may be associated with one, two, or three of the 976 features.
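Because a single peptide structure can contribute up to three features, a feature vector can be built by flattening a per-structure record into named columns. The sketch below shows one possible layout; the structure IDs, values, and the `structure|kind` naming scheme are hypothetical.

```python
FEATURE_KINDS = ("concentration", "relative_abundance", "site_occupancy")

# Hypothetical per-sample measurements: each peptide structure reports only
# the feature kinds that apply to it (one, two, or three).
sample = {
    "PS-1": {"relative_abundance": 0.21},
    "PS-2": {"concentration": 3.4, "relative_abundance": 0.08},
    "PS-3": {"concentration": 1.1, "relative_abundance": 0.40, "site_occupancy": 0.92},
}


def flatten_features(measurements):
    """Return {feature_name: value} with one entry per (peptide structure, feature kind)."""
    features = {}
    for structure, values in measurements.items():
        unknown = set(values) - set(FEATURE_KINDS)
        if unknown:
            raise ValueError(f"{structure}: unexpected feature kinds {sorted(unknown)}")
        for kind in FEATURE_KINDS:
            if kind in values:
                features[f"{structure}|{kind}"] = values[kind]
    return features


flat = flatten_features(sample)
print(len(flat))  # 6 features from 3 peptide structures
```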
  • Figure 9 is a plot diagram illustrating the results of a principal component analysis performed to assess the segregation between healthy, benign pelvic tumor, and EOC samples across first and second principal components in accordance with one or more embodiments.
  • EOC samples segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
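A two-component principal component analysis of the kind summarized for Figure 9 can be sketched in plain Python using power iteration with deflation on the sample covariance matrix. This is a generic PCA sketch, not the authors' pipeline: feature scaling, missing-value handling, and the toy data are assumptions, and a library SVD-based PCA would normally be used in practice.

```python
import math


def pca_2d(rows, iters=500):
    """First two principal components via power iteration with deflation.

    Returns (eigenvalues, components, scores); component signs are arbitrary.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]

    eigenvalues, components = [], []
    for _ in range(min(2, d)):
        start = [1.0 / (j + 1) for j in range(d)]  # fixed, non-uniform start vector
        norm = math.sqrt(sum(x * x for x in start))
        v = [x / norm for x in start]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        eigenvalues.append(lam)
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]

    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return eigenvalues, components, scores


# Toy data (rows = samples, columns = features); not from the study.
data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
eigvals, comps, scores = pca_2d(data)
print(eigvals)  # close to [8/3, 2/3]
```

Plotting the two score columns, colored by group (healthy, benign, EOC), gives a plot of the kind described for Figures 9 and 10.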
  • Figure 10 is a plot diagram illustrating the results of a principal component analysis performed to assess segregation between healthy, benign pelvic tumor, early EOC, late EOC, and missing (undocumented) samples.
  • EOC samples, in particular late-stage EOC samples, segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
  • FIG. 11 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • The multivariable model may be used to accurately and reliably identify malignant EOC and distinguish such malignancy from a healthy status.
  • This diagnostic power may be used to reduce the need for unnecessary invasive testing.
  • Such diagnostic information can be used to identify patients with EOC earlier, which may lead to earlier treatment and improved treatment recommendations and treatment plans.
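The ROC diagrams referenced here summarize how sensitivity and false-positive rate trade off as the decision threshold on the model's probability varies. As a minimal sketch (labels, scores, and function names are hypothetical), the ROC points can be obtained by sweeping the threshold, and the AUC computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points from sweeping the decision threshold from high to low; label 1 = positive."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]          # hypothetical: 1 = EOC, 0 = comparison group
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities
print(roc_points(labels, scores), roc_auc(labels, scores))
```

The pairwise form of the AUC equals the area under the step-wise ROC curve, which makes it a convenient cross-check for plotted curves.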
  • FIG. 12 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • The predicted probabilities for benign pelvic tumor, healthy, missing (undocumented), stage 1 EOC, stage 2 EOC, stage 3 EOC, and stage 4 EOC samples increased with cancer stage, and the probability distributions were similar across the training and test sets.
  • Applying the multivariable model to healthy patients, who were not used in training, resulted in few misclassifications and a spread nearly equivalent to that of the benign pelvic tumor cases.
  • Such results indicate that the glycoproteomic signature of the 10 peptide structures (PS-1 to PS-10) robustly predicts malignancy and severity of disease.
  • Table 8 below provides the fold changes, FDRs, and p-values for the 10 peptide structures PS-1 to PS-10 (same as those in Table 1 above) based on differential expression analysis (DEA).
  • Table 8 Peptide Structure Markers for Regression Model to distinguish between Epithelial Ovarian Cancer and Healthy State
  • FIG. 13 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • The multivariable model may be used to accurately and reliably triage pelvic tumors, distinguishing those that are malignant from those that are benign.
  • This diagnostic power may be used to reduce the need for invasive testing (e.g., biopsy) before treatment is administered.
  • Such diagnostic information can be used to improve treatment recommendations and treatment plans (e.g., earlier treatment in the case of malignant EOC) and to reduce indications for unnecessary treatment (e.g., no indication for surgery when the pelvic tumor is benign).
  • FIG. 14 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • The predicted probabilities for benign pelvic tumor, healthy, missing (undocumented), stage 1 EOC, stage 2 EOC, stage 3 EOC, and stage 4 EOC samples increased with cancer stage, and the probability distributions were similar across the training and test sets.
  • Applying the multivariable model to healthy patients, who were not used in training, resulted in few misclassifications and a spread nearly equivalent to that of the benign pelvic tumor cases.
  • Such results indicate that the glycoproteomic signature of the 25 peptide structures in the LASSO regression model robustly predicts malignancy and severity of disease.
  • Table 9 below provides the fold changes, FDRs, and p-values for the 25 peptide structures PS-5 and PS-11 to PS-34 (same as those in Table 2 above) based on differential expression analysis (DEA).
  • Table 10 below provides the fold changes, FDRs, and p-values for the 36 peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 (same as those in Table 3B above) based on differential expression analysis (DEA).
  • The peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, and PS-62 to PS-90 are ordered in Table 10 by their relative significance to the probability score generated by the model, based on p-values.
  • Table 10 Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3B.
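Tables 8 to 10D report fold changes, p-values, and FDRs, and markers are ordered by p-value. The text does not state which FDR procedure or fold-change definition was used, so the sketch below assumes the Benjamini-Hochberg procedure and a log2 ratio of group means, both common choices in differential expression analysis. Marker names and values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def log2_fold_change(case_values, control_values):
    """log2 of the ratio of group means (assumed definition)."""
    case_mean = sum(case_values) / len(case_values)
    control_mean = sum(control_values) / len(control_values)
    return math.log2(case_mean / control_mean)


pvals = {"PS-a": 0.01, "PS-b": 0.04, "PS-c": 0.03, "PS-d": 0.005}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
# Order markers by ascending p-value (most significant first).
for name in sorted(pvals, key=pvals.get):
    print(name, pvals[name], round(qvals[name], 4))
```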
  • Table 10B below provides the fold changes, FDRs, and p-values for the 25 peptide structures denoted by SEQ ID NO 101-125 (in accordance with Table 3C above) using differential expression analysis (DEA).
  • Table 10B Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3C.
  • Table 10C below provides the fold changes, FDRs, and p-values for the 50 peptide structures denoted by SEQ ID NO 126-175 (in accordance with Table 3D above) using differential expression analysis (DEA).
  • Table 10D below provides the fold changes, FDRs, and p-values for the 12 peptide structures denoted by SEQ ID NO 131-134, 137, 139, 140, 143, 151, 165-167 (in accordance with Table 3D above) using differential expression analysis (DEA).
  • Table 10D Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using 12 of the biomarkers of Table 3D.
  • The markers from Table 3B were used to train a regularized regression model (e.g., LASSO regression model).
  • Coefficients for the regularized regression model are provided in Table 11.
  • A probability for one of the states can be determined by multiplying the concentration of each biomarker in the sample by its respective coefficient (from one column), summing these products, and adding the intercept to yield the logit of the probability score.
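  • The logit of the probability, to which the inverse logit function can be applied to recover the probability, is given by equation 1 (eq. 1):

$$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \qquad (1)$$

  • where $p$ is the probability score, $\beta_0$ is the intercept, $x_i$ is the concentration of biomarker $i$ in the sample, $\beta_i$ is the coefficient of biomarker $i$ (Table 11), $n$ is the number of biomarkers having a unique PS-ID No., and $i$ is the index number of each biomarker.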
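Equation 1 can be evaluated directly: compute the linear predictor from the intercept and the coefficient-weighted concentrations, then apply the inverse logit. The coefficients, intercept, and marker names below are hypothetical placeholders, not values from Table 11.

```python
import math


def probability_score(concentrations, coefficients, intercept):
    """Inverse logit of intercept + sum(coefficient_i * concentration_i), as in eq. 1."""
    missing = set(coefficients) - set(concentrations)
    if missing:
        raise KeyError(f"missing concentrations for {sorted(missing)}")
    logit = intercept + sum(coef * concentrations[ps] for ps, coef in coefficients.items())
    # Numerically stable logistic function (avoids overflow for large |logit|).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Hypothetical coefficients and concentrations; real values would come from Table 11.
coefs = {"PS-A": 1.5, "PS-B": -0.5}
sample = {"PS-A": 2.0, "PS-B": 2.0}
print(probability_score(sample, coefs, intercept=-2.0))  # 0.5
```

Biomarkers whose coefficient is zero drop out of the sum, which is why a LASSO model only needs the non-zero-coefficient markers at prediction time.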
  • The markers from Table 3C were used to train a regularized regression model (e.g., LASSO regression model).
  • Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11B.
  • A probability for one of the states can be determined by multiplying the concentration of each biomarker in the sample by its respective coefficient (from one column), summing these products, and adding the intercept to yield the logit of the probability score (see equation 1).
  • FIG. 17 illustrates a receiver operating characteristic (ROC) curve and the area under the curve (AUC) for the regularized regression model (e.g., LASSO regression model) for early-stage and late-stage ovarian cancer samples, using the training data and the testing data.
  • Table 12 shows the accuracy, sensitivity, specificity and precision for the training data set and the testing data set.
  • Table 13 shows the training accuracy and testing accuracy for the early stage and late stage cohort for ovarian cancer.
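The metrics in Tables 12 and 13 follow from the confusion matrix of predicted versus true labels. The sketch below assumes label 1 is the positive class (e.g., late-stage EOC) and a 0.5 probability threshold; the threshold actually used is not stated in this section, and the data are hypothetical.

```python
def confusion_counts(labels, probabilities, threshold=0.5):
    """Count (TP, FP, TN, FN), treating label 1 as the positive class."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and precision from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
    }


counts = confusion_counts([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6])
print(counts, classification_metrics(*counts))
```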
  • The markers from Table 3D were used to train a regularized regression model (e.g., LASSO regression model).
  • Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11C.
  • A probability for one of the states can be determined by multiplying the concentration of each biomarker in the sample by its respective coefficient (from one column), summing these products, and adding the intercept to yield the logit of the probability score (see equation 1).
  • Predicted probabilities were generated for early-stage and late-stage ovarian cancer samples, showing a stratification between the two cohorts, as illustrated in Figure 20.
  • A predicted probability can be generated for classifying early-stage and late-stage ovarian cancer samples using the markers with non-zero coefficients, such as SEQ ID NOs 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171.
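Markers end up with exactly zero coefficients because of the L1 penalty in LASSO. As an illustration only, the sketch below fits an L1-penalized logistic regression by proximal gradient descent (ISTA) in plain Python; the penalty strength, step size, iteration count, and toy data are assumptions, and the authors' actual training setup (software, tuning, cross-validation) is not described here. A library implementation would normally be used in practice.

```python
import math


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(value, threshold):
    # Proximal operator of the L1 norm: shrinks toward zero and sets small values exactly to 0.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.1, iters=3000):
    """Minimize mean log-loss + lam * ||w||_1 (intercept not penalized). Returns (b, w)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(iters):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Gradient step on the smooth loss, then the L1 proximal step on the weights.
        b -= step * grad_b / n
        w = [_soft_threshold(w[j] - step * grad_w[j] / n, step * lam) for j in range(d)]
    return b, w


# Toy data: feature 0 tracks the label, feature 1 is balanced noise.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]
b, w = fit_lasso_logistic(X, y, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(w, selected)
```

Here the uninformative feature keeps a coefficient of exactly zero and is dropped, mirroring how only markers with non-zero coefficients are retained.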
  • A logistic regression model was used with the glycopeptides of Table 3D having one or more sialic acids and zero or more fucose residues, for the early-stage and late-stage EOC cohorts.
  • Glycopeptides that included fucose were found to be associated with EOC.
  • Glycopeptides that included fucose and also carried tri- or tetra-antennary glycan structures were found to be more strongly associated with EOC.
  • Figures 21A to 21E show that the relative abundance of tri- and tetra-antennary glycan structures increased from benign tumors to early-stage EOC to late-stage EOC, i.e., with progression of EOC.
  • The numbers 6512, 6513, 7612, 7613, and 7614 correspond to the five distinct glycans attached to the glycopeptides.
  • The three leftmost bar graphs (Figures 21A to 21C) represent glycopeptides with tetra-antennary glycans with varying degrees of sialylation.
  • The two rightmost bar graphs (Figures 21D and 21E) represent glycopeptides with tri-antennary glycans carrying two or three sialic acids.
  • Figures 21A to 21E also show statistical comparisons between the benign and late-stage cohorts (highest horizontal bar), the early-stage and late-stage cohorts (middle horizontal bar), and the benign and early-stage cohorts (lowest horizontal bar).
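The four-digit glycan labels can be decoded programmatically to apply the selection described above (one or more sialic acids, zero or more fucoses) and to group glycans by antennarity. This sketch rests on two assumptions not stated in this section: that the digits encode hexose, HexNAc, fucose, and NeuAc counts in that order, and that HexNAc counts of 4, 5, and 6 correspond to bi-, tri-, and tetra-antennary complex glycans without a bisecting GlcNAc. Both should be checked against the tables.

```python
from typing import NamedTuple


class GlycanComposition(NamedTuple):
    hexose: int
    hexnac: int
    fucose: int
    neuac: int


# Assumed mapping from HexNAc count to antennarity (no bisecting GlcNAc).
ANTENNARITY_BY_HEXNAC = {4: "bi", 5: "tri", 6: "tetra"}


def parse_glycan_code(code: str) -> GlycanComposition:
    """Decode a 4-digit code under the assumed Hex-HexNAc-Fuc-NeuAc digit order."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit glycan code, got {code!r}")
    return GlycanComposition(*(int(c) for c in code))


def antennarity(comp: GlycanComposition) -> str:
    return ANTENNARITY_BY_HEXNAC.get(comp.hexnac, "other")


def passes_filter(comp: GlycanComposition, min_sialic=1, min_fucose=0) -> bool:
    """Selection used above: one or more sialic acids and zero or more fucoses."""
    return comp.neuac >= min_sialic and comp.fucose >= min_fucose


for code in ("6512", "6513", "7612", "7613", "7614"):
    comp = parse_glycan_code(code)
    print(code, antennarity(comp), comp.neuac, comp.fucose, passes_filter(comp))
```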
  • Table 14 shows the accuracy, sensitivity, and specificity for the training data set and the testing data set.
  • A specific subset of tri- and tetra-antennary fucosylated N-glycopeptides was identified that can be used to differentiate between early-stage and late-stage ovarian cancer.
  • The fucose in this subset of tri- and tetra-antennary fucosylated N-glycopeptides was found to be in an outer-arm position. Fucose can be bound to a glycan either in a core position or in an outer-arm position.
  • Core fucosylation is a modification of the N-glycan core structure, consisting of α1,6-fucosylation of the GlcNAc residue linked to asparagine, and is catalyzed by FUT8.
  • By contrast, an outer-arm fucose is attached to the antennae of complex-type N-glycans by an α(1-3/4) linkage to GlcNAc residues or by an α(1-2) linkage to galactose.
  • Figure 22 is a representative mass spectrum, with m/z on the X-axis and intensity (and therefore abundance) on the Y-axis. Arrows indicate the breakdown products showing that the fucose is on the outer arm (purple diamond: sialic acid; yellow circle: galactose; blue square: N-acetylglucosamine; red triangle: fucose; green circle: mannose). Notably, there is a 4 glycan breakdown fragment composed of sialic acid, galactose, N-acetylglucosamine, and fucose (m/z 803.294).
  • In this fragment, the sialic acid is connected to galactose, galactose is connected to N-acetylglucosamine, and N-acetylglucosamine is connected to fucose.
  • The 4 glycan breakdown fragment represents a single antennary branch carrying a fucose in an outer-arm position, produced when the intact glycan was cleaved at a linkage between a mannose and an N-acetylglucosamine.
  • A 3 glycan breakdown fragment includes galactose, N-acetylglucosamine, and fucose (m/z 512.198).
  • In this fragment, galactose is connected to N-acetylglucosamine, and N-acetylglucosamine is connected to fucose.
  • The presence of the 4 glycan breakdown fragment and the 3 glycan breakdown fragment, as shown in Figure 22, indicates outer-arm fucosylation.
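The reported fragment m/z values can be cross-checked from standard monoisotopic monosaccharide residue masses, assuming the fragments are singly protonated oxonium ions (an assumption; the charge state is not stated). The function name and residue keys are illustrative.

```python
# Standard monoisotopic residue masses (Da) of the monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.052824,     # hexose, e.g., galactose or mannose
    "HexNAc": 203.079373,  # e.g., N-acetylglucosamine
    "Fuc": 146.057909,     # deoxyhexose (fucose)
    "NeuAc": 291.095417,   # N-acetylneuraminic (sialic) acid
}
PROTON = 1.007276


def oxonium_mz(residues, charge=1):
    """m/z of a glycan oxonium fragment made of the given residues (assumed protonated)."""
    neutral = sum(RESIDUE_MASS[r] for r in residues)
    return (neutral + charge * PROTON) / charge


print(round(oxonium_mz(["NeuAc", "Hex", "HexNAc", "Fuc"]), 3))
print(round(oxonium_mz(["Hex", "HexNAc", "Fuc"]), 3))
```

The computed values (about 803.293 and 512.197) agree with the reported 803.294 and 512.198 to within about 0.001, consistent with the compositions assigned to the two fragments.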
  • SEQ ID NOS. 131, 137, 143, 155, 158, 159, 162, 166, and 171 correspond to glycopeptides that each have a non-zero coefficient along with one fucose.
  • SEQ ID NOs 131, 137, 143, 155, 159, 162, 166, and 171 each correspond to a glycopeptide with outer-arm fucosylation.
  • Glycopeptide biomarkers with outer-arm fucosylation can provide better prediction of ovarian cancer disease states.
  • A predicted probability can be generated for early-stage and late-stage ovarian cancer samples, showing a stratification in predicted probabilities between the two cohorts.
  • A predicted probability can be generated for classifying early-stage and late-stage ovarian cancer samples using the markers.
  • Glycopeptides of Table 3D were found to be associated with EOC.
  • Glycopeptides that included fucose and also carried tri- or tetra-antennary glycan structures were found to be more strongly associated with EOC.
  • Table 15 shows the accuracy, sensitivity, and specificity for the training data set and the testing data set.
  • The performance in Table 15 is better than that in Table 14, indicating that the subset of biomarkers with predominantly tri- and tetra-antennary glycans generated a better model for distinguishing early-stage from late-stage EOC.
  • A validation study was conducted using both retrospective patient samples and samples collected prospectively in the ongoing Clinical Validation of the InterVenn Ovarian CAncer Liquid Biopsy (VOCAL) study. Samples included those from patients with malignant EOC and from patients with benign pelvic tumors. Samples were processed similarly to the Exemplary Retrospective Analysis described in Section VII. A above.
  • A logistic regression model was built, identifying a panel of 38 peptide structures (the same as those in Table 3 above). This panel had an overall predictive accuracy of over 86% for predicting malignant versus benign status of pelvic tumors.
  • Table 10 below provides the fold changes and p-values for the 38 peptide structures also identified in Table 3 above, based on differential expression analysis (DEA). In both Table 3 and Table 10, these peptide structures are ordered by relative significance to the probability score generated by the model, based on p-values. More significant peptide structures have lower p-values and less significant peptide structures have higher p-values; that is, relative significance to the probability score decreases with increasing p-value.
IX. Additional Considerations
  • Some embodiments of the present disclosure include a system including one or more data processors.
  • The system includes a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.
  • Some embodiments of the present disclosure include a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.

Abstract

The invention relates to a method and a system for diagnosing an ovarian cancer disease state in a subject. Peptide structure data corresponding to a biological sample obtained from the subject are received. The peptide structure data are analyzed using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences the ovarian cancer disease state, based on at least one peptide structure selected from a group of peptide structures identified in Table 3B, 3C, or 3D. The group of peptide structures in Table 3B, 3C, or 3D comprises a group of peptide structures associated with the ovarian cancer disease state. A diagnostic output is generated based on the disease indicator.
PCT/US2023/074251 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation Ceased WO2024059750A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP23866505.3A EP4587839A2 (fr) 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202263376053P 2022-09-16 2022-09-16
US63/376,053 2022-09-16
US202363489712P 2023-03-10 2023-03-10
US63/489,712 2023-03-10
US202363517859P 2023-08-04 2023-08-04
US63/517,859 2023-08-04

Publications (2)

Publication Number Publication Date
WO2024059750A2 (fr) 2024-03-21
WO2024059750A3 WO2024059750A3 (fr) 2024-06-13

Family

ID=90275934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/074251 Ceased WO2024059750A2 (fr) 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation

Country Status (2)

Country Link
EP (1) EP4587839A2 (fr)
WO (1) WO2024059750A2 (fr)

Also Published As

Publication number Publication date
WO2024059750A3 (fr) 2024-06-13
EP4587839A2 (fr) 2025-07-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23866505

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 2023866505

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2023866505

Country of ref document: EP

Effective date: 20250416

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23866505

Country of ref document: EP

Kind code of ref document: A2

WWP Wipo information: published in national office

Ref document number: 2023866505

Country of ref document: EP