WO2023080586A1 - Method for diagnosing cancer using position-specific sequence frequency and size of cell-free nucleic acid fragments - Google Patents
- Publication number
- WO2023080586A1 (PCT/KR2022/016868)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- cancer
- sequence
- acid fragments
- size
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
- G16B35/20—Screening of libraries
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Definitions
- The present invention relates to a method for diagnosing cancer using the position-specific relative sequence frequency and size of cell-free nucleic acid fragments. More specifically, it relates to a method in which nucleic acids are extracted from a biological sample, sequence information is obtained, the relative frequency of sequences at each position of the nucleic acid fragments and the size of the fragments are derived from the aligned reads, and the derived values are input to a trained artificial intelligence model for analysis.
- Cancer diagnosis in clinical practice is usually confirmed by performing a tissue biopsy after a medical history, physical examination, and clinical evaluation. Cancer diagnosis by clinical tests is possible only when the number of cancer cells is 1 billion or more and the diameter of the cancer is 1 cm or more. In this case, the cancer cells already have the ability to metastasize, and at least half of them have already metastasized.
- Because tissue biopsy is invasive, it causes considerable inconvenience to patients, and it often cannot be performed while a cancer patient is being treated.
- Tumor markers used in cancer screening monitor substances produced directly or indirectly by cancer. However, more than half of tumor marker results are normal even when cancer is present, and results are often positive even when there is no cancer, so their accuracy is limited.
- Liquid biopsy, which uses a patient's body fluid, has recently become widely used as a test for cancer diagnosis and follow-up.
- Liquid biopsy is a non-invasive diagnostic technique that is attracting attention as an alternative to conventional invasive diagnostic and examination methods.
- The gradient boosting algorithm is a predictive model that can perform regression or classification, and it belongs to the boosting family of ensemble methods.
- The gradient boosting algorithm performs very well on tabular data (data laid out in an X-Y grid, as in a spreadsheet) and is regarded as one of the best-performing machine learning algorithms for such data.
- The present inventors made diligent efforts to solve the above problems and to develop a highly sensitive and accurate artificial intelligence-based cancer diagnosis method.
- As a result, the present invention was completed by confirming that cancer can be diagnosed with high sensitivity and accuracy when an optimal combination of position-specific sequence relative frequencies and sizes of nucleic acid fragments is selected and analyzed with a trained artificial intelligence model.
- An object of the present invention is to provide a method for diagnosing cancer using the relative frequency and size of sequence by position of cell-free nucleic acid fragments.
- Another object of the present invention is to provide a device for diagnosing cancer using the relative frequency and size of sequences for each position of cell-free nucleic acid fragments.
- Another object of the present invention is to provide a computer readable storage medium containing instructions configured to be executed by a processor for diagnosing cancer by the above method.
- The present invention provides a method for providing information for cancer diagnosis using cell-free nucleic acid, comprising: (a) obtaining sequence information by extracting nucleic acids from a biological sample; (b) aligning the obtained sequence information (reads) to a standard chromosome sequence database (reference genome database); (c) deriving, from the aligned reads, the sequence relative frequency at each position of the nucleic acid fragments and the size of the nucleic acid fragments; and (d) inputting the derived sequence relative frequency and size information into an artificial intelligence model trained to diagnose cancer and comparing the output value with a cut-off value to determine the presence of cancer,
- wherein the artificial intelligence model is trained to distinguish between normal samples and cancer samples based on the position-specific sequence relative frequency and the size information of the nucleic acid fragments.
- The present invention also provides a cancer diagnosis device comprising: a decoding unit that extracts nucleic acids from a biological sample and decodes sequence information; an alignment unit that aligns the decoded sequence to a standard chromosome sequence database; a nucleic acid fragment analysis unit that derives, from the aligned sequences, the sequence relative frequency at each position of the nucleic acid fragments and the size of the nucleic acid fragments; and a cancer diagnosis unit that inputs the derived position-specific sequence relative frequency and fragment size information into a trained artificial intelligence model, analyzes it, and compares the result with a reference value to determine the presence or absence of cancer.
- The present invention also provides a computer-readable storage medium comprising instructions configured to be executed by a processor that provides information for diagnosing cancer through the following steps: (a) obtaining sequence information by extracting nucleic acids from a biological sample; (b) aligning the obtained sequence information (reads) to a standard chromosome sequence database (reference genome database); (c) deriving, from the aligned reads, the sequence relative frequency at each position of the nucleic acid fragments and the size of the nucleic acid fragments; and (d) inputting the derived sequence relative frequency and size information into an artificial intelligence model trained to diagnose cancer and comparing the output value with a cut-off value to determine the presence of cancer,
- wherein the artificial intelligence model in step (d) is trained to distinguish between normal samples and cancer samples based on the position-specific sequence relative frequency and the size information of the nucleic acid fragments.
- The present invention also provides a method for diagnosing cancer using cell-free nucleic acid, comprising: (a) obtaining sequence information by extracting nucleic acids from a biological sample; (b) aligning the obtained sequence information (reads) to a standard chromosome sequence database (reference genome database); (c) deriving, from the aligned reads, the sequence relative frequency at each position of the nucleic acid fragments and the size of the nucleic acid fragments; and (d) inputting the derived sequence relative frequency and size information into an artificial intelligence model trained to diagnose cancer and comparing the output value with a cut-off value to determine the presence of cancer,
- wherein the artificial intelligence model is trained to distinguish between normal samples and cancer samples based on the position-specific sequence relative frequency and the size information of the nucleic acid fragments.
- FIG. 1 is an overall flowchart of the method for diagnosing cancer using the position-specific relative sequence frequency and size of cell-free nucleic acid fragments according to the present invention.
- FIG. 2 is an example of a process of selecting nucleic acid fragment sizes whose relative frequency differs with statistical significance between healthy people and cancer patients in one embodiment of the present invention.
- FIG. 3 is a graph showing the statistics of relative frequency by nucleic acid fragment size identified in one embodiment of the present invention, together with the size distribution of the selected nucleic acid fragments.
- FIG. 4 is a heatmap visualization of the FESS table produced in one embodiment of the present invention.
- The left panel of FIG. 5 is an enlarged view of the portion indicated by the dotted line in FIG. 4, and the two right panels show the results of statistical analysis of the relative frequencies of nucleotide sequences at each position.
- FIG. 6 shows the result of calculating the relative frequency of each of the bases A, T, G, and C at the positions of the selected nucleic acid fragments in one embodiment of the present invention and statistically confirming the similarity between the bases.
- (A) is the result of confirming the performance of the machine learning model constructed in one embodiment of the present invention with accuracy and AUC, and (B) is a confusion matrix.
- FIG. 9 is a graph showing statistical values of relative frequency by size of nucleic acid fragments identified in an embodiment of the present invention and size distribution of selected nucleic acid fragments at different positions and bases.
- The upper panel shows accuracy.
- The lower panel shows AUC (area under the curve).
- Terms such as first, second, A, and B may be used to describe various elements, but the elements are not limited by these terms, which are used only to distinguish one element from another. For example, without departing from the scope of the technology described below, a first element may be referred to as a second element, and similarly, the second element may be referred to as a first element.
- The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.
- Two or more of the components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided function.
- Each component described below may perform some or all of the functions of other components in addition to its own main function, and some of the main functions of each component may of course be performed exclusively by another component.
- The processes constituting the method may occur in an order different from the stated order unless a specific order is clearly described in context. That is, the processes may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.
- In the present invention, sequencing data obtained from a sample are aligned to a reference genome, and the sequence relative frequency at each position of the nucleic acid fragments and the size of the fragments are derived from the aligned sequence information. The aim was to confirm that cancer can be diagnosed with high sensitivity and accuracy when this position-specific sequence relative frequency and fragment size information is input to a trained artificial intelligence model and the resulting XPI value is analyzed.
- That is, after an optimal combination of position-specific sequence relative frequencies and fragment sizes was derived, it was used to train an artificial intelligence model that calculates the XPI value, and a method for diagnosing cancer by comparing the XPI value with a reference value was developed (FIG. 1).
- The present invention relates to a method for providing information for cancer diagnosis using cell-free nucleic acid, comprising steps (a) to (d) set out above,
- wherein the artificial intelligence model is trained to distinguish between normal samples and cancer samples based on the position-specific sequence relative frequency and the size information of the nucleic acid fragments.
- Any fragment of nucleic acid extracted from a biological sample can be used as the nucleic acid fragment without limitation. It may preferably be a fragment of cell-free nucleic acid or intracellular nucleic acid, but is not limited thereto.
- The nucleic acid fragment can be obtained by any method known to those skilled in the art, preferably by direct sequencing, next-generation sequencing, sequencing after non-specific whole genome amplification, or probe-based sequencing, but is not limited thereto.
- The cancer may be a solid cancer or a hematological cancer. It may preferably be selected from the group consisting of non-Hodgkin lymphoma, acute myeloid leukemia, acute lymphocytic leukemia, multiple myeloma, head and neck cancer, lung cancer, glioblastoma, colon/rectal cancer, pancreatic cancer, breast cancer, ovarian cancer, melanoma, prostate cancer, liver cancer, thyroid cancer, gastric cancer, gallbladder cancer, bile duct cancer, bladder cancer, small intestine cancer, cervical cancer, cancer of unknown primary site, kidney cancer, esophageal cancer, neuroblastoma, and mesothelioma, and may more preferably be neuroblastoma, but is not limited thereto.
- the step (a) is
- In the step of obtaining sequence information in step (a), the sequence information of the isolated cell-free DNA may be obtained through whole genome sequencing at a depth of 1 million to 100 million reads, but is not limited thereto.
- The biological sample refers to any material, biological fluid, tissue, or cell obtained or derived from an individual, for example, whole blood, leukocytes, peripheral blood mononuclear cells, leukocyte buffy coat, blood (including plasma and serum), sputum, tears, mucus, nasal washes, nasal aspirates, breath, urine, semen, saliva, peritoneal washings, pelvic fluids, cystic fluid, meningeal fluid, amniotic fluid, glandular fluid, pancreatic fluid, lymph fluid, pleural fluid, nipple aspirate, bronchial aspirate, synovial fluid, joint aspirate, organ secretions, cells, cell extract, hair, oral cells, placenta cells, cerebrospinal fluid, and mixtures thereof, but is not limited thereto.
- The term "reference group" means a group that can serve as a basis for comparison, like a standard sequence database, and refers to a group of people who do not currently have a specific disease or condition.
- The standard nucleotide sequence in the standard chromosome sequence database of the reference group may be a reference chromosome registered with a public health institution such as NCBI.
- the nucleic acid in step (a) may be cell-free DNA, more preferably circulating tumor DNA, but is not limited thereto.
- The next-generation sequencer can be used with any sequencing method known in the art. Sequencing of nucleic acids isolated by selection methods is typically performed using next-generation sequencing (NGS).
- Next-generation sequencing includes any sequencing method that determines the nucleotide sequence of either individual nucleic acid molecules or clonally expanded proxies for individual nucleic acid molecules in a highly parallel manner (e.g., 10^5 or more molecules are sequenced simultaneously).
- The relative abundance of a nucleic acid species in a library can be estimated by counting the relative number of occurrences of its cognate sequence in the data generated by the sequencing experiment. Next-generation sequencing methods are known in the art and are described, for example, in Metzker, M. (2010) Nature Reviews Genetics 11:31-46, incorporated herein by reference.
- In one embodiment, next-generation sequencing determines the nucleotide sequence of individual nucleic acid molecules (e.g., the HeliScope Gene Sequencing system from Helicos BioSciences and the PacBio RS system from Pacific Biosciences).
- In another embodiment, the sequencing is massively parallel short-read sequencing (e.g., the Solexa sequencer from Illumina Inc., San Diego, CA), which yields more bases of sequence per sequencing unit than other sequencing methods that yield fewer but longer reads.
- In another embodiment, the sequencing method determines the nucleotide sequence of clonally expanded proxies for individual nucleic acid molecules (e.g., the Solexa sequencer from Illumina Inc., San Diego, CA; 454 Life Sciences, Branford, CT; and Ion Torrent).
- Other methods or machines for next-generation sequencing are provided by, but not limited to, 454 Life Sciences (Branford, CT), Applied Biosystems (Foster City, CA; SOLiD sequencer), Helicos BioSciences Corporation (Cambridge, MA), and emulsion and microfluidic sequencing technology nanodroplets (e.g., GnuBio droplets).
- Genome Sequencer FLX system from Roche/454
- Illumina/Solexa Genome Analyzer GA
- Life/APG's Support Oligonucleotide Ligation Detection (SOLiD) system
- Polonator's G.007 system
- Helicos BioSciences' HeliScope Gene Sequencing system
- Oxford Nanopore Technologies' PromethION, GridION, and MinION systems
- Pacific Biosciences' PacBio RS system
- The sequence alignment in step (b) includes computational methods or approaches, implemented as computer algorithms, that identify where in the genome a read sequence (e.g., a short read from next-generation sequencing) most likely originated by evaluating the similarity between the read sequence and the reference sequence. A variety of algorithms can be applied to the sequence alignment problem. Some algorithms are relatively slow but allow relatively high specificity; these include, for example, dynamic programming-based algorithms. Dynamic programming solves a complex problem by breaking it down into simpler steps. Other approaches are relatively more efficient but typically less thorough; these include, for example, heuristic algorithms and probabilistic methods designed for large-scale database searches.
- Candidate screening reduces the search space for sequence alignment from the whole genome to a shorter list of possible alignment positions.
- Sequence alignment then aligns the read against the sequences provided by the candidate screening step. This can be done using a global alignment (e.g., Needleman-Wunsch alignment) or a local alignment (e.g., Smith-Waterman alignment).
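As a minimal illustration of the dynamic programming behind the global alignment mentioned above, the following Python sketch computes a Needleman-Wunsch alignment score. The function name `needleman_wunsch_score` and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions and do not come from this document; practical aligners such as BWA rely on indexed heuristics rather than filling a full matrix.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

Only two rows of the score matrix are kept, which is enough when the score alone is needed; recovering the alignment itself would require the full matrix or a traceback structure.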
- Most alignment algorithms can be characterized as one of three types according to their indexing method: algorithms based on hash tables (e.g., BLAST, ELAND, SOAP), suffix trees (e.g., Bowtie, BWA), and merge sort (e.g., Slider). Short read sequences are typically used for alignment.
- The alignment in step (b) may be performed using the BWA algorithm and the hg19 reference sequence, but is not limited thereto.
- The alignment algorithm may be BWA-ALN, BWA-SW, or Bowtie2, but is not limited thereto.
- The length of the sequence information (reads) in step (b) is 5 to 5,000 bp, and the number of reads used may be 5,000 to 5 million, but is not limited thereto.
- The method may further comprise, before step (c), selecting reads whose mapping quality score is equal to or greater than a reference value. The reference value may be any value that can confirm the quality of the aligned nucleic acid fragments, preferably 50 to 70, and more preferably 60, but is not limited thereto.
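The mapping quality selection described above could be sketched as follows. The `AlignedRead` record and its field names are hypothetical stand-ins for whatever the aligner output provides; only the default threshold of 60 is taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical minimal record for one aligned read."""
    name: str
    mapq: int  # mapping quality score reported by the aligner


def filter_by_mapq(reads: Iterable[AlignedRead],
                   min_mapq: int = 60) -> List[AlignedRead]:
    """Keep only reads whose mapping quality is at or above the reference value."""
    return [read for read in reads if read.mapq >= min_mapq]


reads = [AlignedRead("r1", 60), AlignedRead("r2", 37), AlignedRead("r3", 0)]
kept = filter_by_mapq(reads)
```

With the default reference value, only `r1` is retained; lowering `min_mapq` to 30 would also keep `r2`.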
- the size of the nucleic acid fragment in step (c) is the number of bases from the 5' end to the 3' end of the nucleic acid fragment.
- Any nucleic acid fragment size that can distinguish between healthy people and cancer patients can be used in step (c) without limitation. The size is preferably 90 to 250 bp, and more preferably is selected from the group consisting of 127-129 bp, 137-139 bp, 148-150 bp, 156-158 bp, and 181-183 bp, but is not limited thereto.
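A small sketch of the size definition and size selection described above. It assumes that fragment ends are given as inclusive genomic coordinates, which the source does not specify; the function names are illustrative, while the size ranges are the ones listed in the text.

```python
# Size ranges (bp, inclusive) listed in the text as more preferred.
SELECTED_SIZE_RANGES = [(127, 129), (137, 139), (148, 150), (156, 158), (181, 183)]


def fragment_size(five_prime: int, three_prime: int) -> int:
    """Number of bases from the 5' end to the 3' end (inclusive coordinates assumed)."""
    return abs(three_prime - five_prime) + 1


def is_selected_size(size: int, ranges=SELECTED_SIZE_RANGES) -> bool:
    """True if the fragment size falls inside any of the selected ranges."""
    return any(low <= size <= high for low, high in ranges)
```

Taken together, the five ranges cover 15 distinct fragment sizes.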
- Reverse strand: 3'-ATGACTGAAACCTTA-5' (SEQ ID NO: 2)
- The position-specific sequence relative frequency of the nucleic acid fragments in step (c) may be the number of nucleic acid fragments having an A, T, G, or C base at each position, among nucleic acid fragments of the same size, normalized by the total number of nucleic acid fragments.
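The normalization described above might look like the following sketch. It assumes that each fragment is available as a 5'-to-3' string and that the denominator is the number of fragments of the same size; the text says only "total number of nucleic acid fragments", so that choice, like the function name, is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def position_base_frequencies(fragments: Iterable[str], n_positions: int = 10
                              ) -> Dict[Tuple[int, int, str], float]:
    """Map (size, position, base) to a relative frequency.

    Positions are 1-based from the 5' end. Counts are normalized by the
    number of fragments of the same size (an assumed denominator).
    """
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for seq in fragments:
        size = len(seq)
        totals[size] += 1
        for pos in range(1, min(n_positions, size) + 1):
            counts[(size, pos)][seq[pos - 1]] += 1
    return {
        (size, pos, base): counts[(size, pos)][base] / totals[size]
        for (size, pos) in counts
        for base in "ATGC"
    }


# Toy input: three 4-base fragments and one 3-base fragment.
freqs = position_base_frequencies(["TACG", "TTCG", "AACG", "GGG"])
```

In this toy input, two of the three 4-base fragments start with T, so the entry for `(4, 1, "T")` is 2/3.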
- The position of the nucleic acid fragment in step (c) may be 1 to 10 bases from the 5' end of the nucleic acid fragment.
- The position-specific sequence relative frequency in step (c) may be the frequency of A, T, G, and C bases at positions 1 to 5 from the 5' end of the nucleic acid fragment and the frequency of A bases at positions 6 to 10.
- The position-specific sequence relative frequencies and fragment sizes in step (c) may be one or more selected from those shown in Table 3, preferably those ranked Top 1 to Top 5 in Table 7, more preferably those ranked up to Top 50 in Table 7, and most preferably those ranked up to Top 375.
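To make the combination of size, position, and base concrete, the sketch below enumerates one feature per (size, position, base) using the preferred size ranges and the position rule stated above (all four bases at positions 1 to 5, base A only at positions 6 to 10). The function name and the tuple layout are illustrative assumptions, and the tables referred to in the text are not reproduced here.

```python
SIZE_RANGES = [(127, 129), (137, 139), (148, 150), (156, 158), (181, 183)]


def enumerate_features():
    """List (size, position, base) feature keys under the stated assumptions."""
    sizes = [s for low, high in SIZE_RANGES for s in range(low, high + 1)]
    features = []
    for size in sizes:
        for pos in range(1, 11):
            # Positions 1-5 use all four bases; positions 6-10 use A only.
            bases = "ATGC" if pos <= 5 else "A"
            features.extend((size, pos, base) for base in bases)
    return features


features = enumerate_features()
```

Under these assumptions the enumeration yields 15 sizes times 25 position-base pairs, that is 375 features. This equals the "Top 375" figure, although the source does not state that the number arises this way, so the correspondence should be read as an observation rather than a confirmed derivation.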
- The position of the nucleic acid fragment is defined relative to the 5' end of the nucleic acid fragment.
- Counting from the 5' end, positions on the forward strand of SEQ ID NO: 1 take the values For1, For2, ..., For15, and positions on the reverse strand are defined in the same way.
- The For1 value of SEQ ID NO: 1 is T.
- The Rev1 value of the reverse strand is A.
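The position naming can be checked against the reverse strand printed in this document (SEQ ID NO: 2, written 3' to 5'). The forward strand is not reproduced in this text, so the sketch below derives it as the base-by-base complement of the reverse strand; that derivation and the helper name `end_positions` are assumptions.

```python
REVERSE_3_TO_5 = "ATGACTGAAACCTTA"  # SEQ ID NO: 2 as printed, 3' -> 5'

_COMPLEMENT = str.maketrans("ATGC", "TACG")


def end_positions(seq_5_to_3: str, prefix: str) -> dict:
    """Label each base by its 1-based distance from the 5' end."""
    return {f"{prefix}{i}": base for i, base in enumerate(seq_5_to_3, start=1)}


# Reading the reverse strand from its own 5' end means reversing the printed string.
reverse_5_to_3 = REVERSE_3_TO_5[::-1]
# Assumed forward strand: the complement of the printed reverse strand, read 5' -> 3'.
forward_5_to_3 = REVERSE_3_TO_5.translate(_COMPLEMENT)

forward = end_positions(forward_5_to_3, "For")
reverse = end_positions(reverse_5_to_3, "Rev")
```

With this derivation `forward["For1"]` is T and `reverse["Rev1"]` is A, in agreement with the values stated above.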
- the frequency of nucleotide sequences for each position of a nucleic acid fragment can be calculated in the following process.
- Any model that can be trained to distinguish between healthy people and cancer patients can be used as the artificial intelligence model in step (d) without limitation; it is preferably a machine learning model.
- The artificial intelligence model may be selected from the group consisting of AdaBoost, random forest, CatBoost, Light Gradient Boosting Machine (LightGBM), and XGBoost, but is not limited thereto.
- the loss function may be expressed by Equation 1 below.
- the binary classification means that an artificial intelligence model learns to determine the presence or absence of cancer.
- When the artificial intelligence model is XGBoost, training may be performed including the following steps:
- The training data are used to train the XGBoost model.
- The validation data are used to verify hyper-parameter tuning.
- The test data are used to evaluate performance after the optimal model has been produced.
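A three-way split of this kind can be sketched with the standard library alone. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration; the source does not give the split ratios.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_frac: float = 0.6,
                  val_frac: float = 0.2, seed: int = 0
                  ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle samples and split them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train, val, test = split_dataset(range(100))
```

In practice a stratified split that preserves the ratio of cancer to normal samples in each subset is often preferable; that refinement is omitted here for brevity.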
- the hyper-parameter tuning process is a process of optimizing the values of various parameters (maximum depth of learner tree, number of learner trees, learning rate, etc.) constituting the XGBoost model.
- The hyper-parameter tuning process may use Bayesian optimization and grid search techniques.
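The grid search part of this tuning can be illustrated as an exhaustive loop over parameter combinations. In the sketch below, `toy_validation_loss` is a made-up stand-in for "train a model with these hyper-parameters and measure its validation loss", and the parameter names and grid values are illustrative assumptions rather than values from the source.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(grid: Dict[str, Iterable],
                validation_loss: Callable[[dict], float]) -> Tuple[dict, float]:
    """Try every combination in the grid and return the one with the lowest loss."""
    names = list(grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = validation_loss(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(params: dict) -> float:
    """Hypothetical loss surface with its minimum at depth 4, rate 0.1, 200 trees."""
    return ((params["max_depth"] - 4) ** 2
            + (params["learning_rate"] - 0.1) ** 2
            + abs(params["n_estimators"] - 200) / 1000)


grid = {"max_depth": [2, 4, 6],
        "learning_rate": [0.01, 0.1, 0.3],
        "n_estimators": [100, 200]}
best_params, best_loss = grid_search(grid, toy_validation_loss)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added parameter; Bayesian optimization instead chooses the next combination to try based on the results so far.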
- The training process optimizes the internal parameters (weights) of the XGBoost model using the predetermined hyper-parameters. When the validation loss starts to increase relative to the training loss, the model is judged to be overfitting, and training may be stopped before that point.
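The stopping rule described above is commonly implemented as early stopping on the validation loss. The sketch below works on a pre-recorded list of validation losses; the `patience` parameter (how many non-improving rounds to tolerate) and the function name are assumptions that the source does not mention.

```python
from typing import Sequence


def early_stopping_round(val_losses: Sequence[float], patience: int = 3) -> int:
    """Return the index of the round whose model should be kept.

    Scanning stops once the validation loss has failed to improve for
    `patience` consecutive rounds.
    """
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            break  # validation loss has stopped improving: likely overfitting
    return best_round


losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.60, 0.50]
stop_at = early_stopping_round(losses, patience=3)
```

Here the scan stops after three rounds without improvement and keeps round 3, so the later value of 0.50 is never reached; a larger patience trades extra training time for the chance of finding such late improvements.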
- The output value that the artificial intelligence model in step (d) produces from the input position-specific sequence relative frequency and fragment size information may be any specific score or real number, and is preferably an XPI (XGBoost Probability Index) value, but is not limited thereto.
- The XGBoost Probability Index is the output of the artificial intelligence model rescaled to a range of 0 to 1 and expressed as a probability.
- Training uses the sigmoid function so that the XPI value approaches 1 in the case of cancer. For example, when neuroblastoma samples and normal samples are input, the model is trained so that the XPI value of a neuroblastoma sample is close to 1 and that of a normal sample is close to 0.
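The role of the sigmoid function can be shown in a few lines: it maps any real-valued raw score to a value between 0 and 1. The function below is a generic, numerically stable logistic function and is not code from the source; how the raw score is produced by the model is outside this sketch.

```python
import math


def sigmoid(raw_score: float) -> float:
    """Logistic function mapping a real-valued score into the range 0 to 1."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    # For negative inputs, use the equivalent form that avoids overflow in exp().
    z = math.exp(raw_score)
    return z / (1.0 + z)
```

Large positive scores map close to 1 and large negative scores close to 0, with a score of 0 mapping to exactly 0.5.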
- The artificial intelligence model is trained so that its output is close to 1 if there is cancer and close to 0 if there is no cancer. An output of 0.5 or more was judged to indicate cancer and an output of 0.5 or less to indicate no cancer, and performance was measured on that basis (training, validation, and test accuracy).
- The reference value of 0.5 can be changed at any time. For example, to reduce false positives, the reference value can be set higher than 0.5 so that the criterion for judging cancer is stricter; to reduce false negatives, the reference value can be set lower so that the criterion for judging cancer is somewhat weaker.
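The effect of moving the reference value can be sketched as follows. The XPI values and labels are made-up illustrative numbers, not data from the source, and treating a value exactly equal to the cut-off as cancer is an assumption, since the text leaves that boundary case open.

```python
from typing import List, Sequence


def call_cancer(xpi_values: Sequence[float], cutoff: float = 0.5) -> List[bool]:
    """True where the XPI value meets or exceeds the cut-off (boundary handling assumed)."""
    return [x >= cutoff for x in xpi_values]


def false_positive_count(xpi_values: Sequence[float], labels: Sequence[bool],
                         cutoff: float) -> int:
    """Count samples called cancer whose true label is not cancer."""
    calls = call_cancer(xpi_values, cutoff)
    return sum(1 for called, is_cancer in zip(calls, labels)
               if called and not is_cancer)


# Hypothetical XPI values and true labels (True = cancer).
xpi = [0.95, 0.62, 0.55, 0.10]
labels = [True, True, False, False]
fp_default = false_positive_count(xpi, labels, cutoff=0.5)
fp_strict = false_positive_count(xpi, labels, cutoff=0.6)
```

In this toy example raising the cut-off from 0.5 to 0.6 removes the single false positive without losing either true cancer call; on real data a stricter cut-off generally also costs some sensitivity.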
- The reference value can be determined by applying the trained artificial intelligence model to unseen data (labeled data that were not used for training) and checking the resulting XPI values.
- The present invention also relates to a cancer diagnosis device comprising:
- a decoding unit that extracts nucleic acids from a biological sample and decodes sequence information;
- an alignment unit that aligns the decoded sequence to a standard chromosome sequence database;
- a nucleic acid fragment analysis unit that derives, from the aligned sequences, the sequence relative frequency at each position of the nucleic acid fragments and the size of the nucleic acid fragments; and
- a cancer diagnosis unit that inputs the derived position-specific sequence relative frequency and fragment size information into a trained artificial intelligence model, analyzes it, and compares the result with a reference value to determine whether or not there is cancer.
- The decoding unit may include a nucleic acid injection unit into which nucleic acid extracted in an independent device is injected, and a sequence information analysis unit that analyzes the sequence information of the injected nucleic acid. It is preferably an NGS analysis device, but is not limited thereto.
- The decoding unit may receive and decode sequence information data generated in an independent device.
- the present invention relates to a computer-readable storage medium comprising instructions configured to be executed by a processor that provides information for diagnosing cancer,
- wherein the artificial intelligence model in step (d) is trained to distinguish between normal samples and cancer samples based on the sequence relative frequency for each position of the nucleic acid fragments and the size information of the nucleic acid fragments.
- the present invention relates to a method for diagnosing cancer using cell-free nucleic acid comprising the following steps:
- the artificial intelligence model is characterized in that it is trained to distinguish between normal samples and cancer samples based on the sequence relative frequency for each position of the nucleic acid fragments and the size information of the nucleic acid fragments.
- a method according to the present disclosure may be implemented using a computer.
- a computer includes one or more processors coupled to a chip set.
- a memory, a storage device, a keyboard, a graphics adapter, a pointing device, and a network adapter are connected to the chipset.
- the performance of the chipset is enabled by a memory controller hub and an I/O controller hub.
- the memory may be coupled directly to the processor instead of to the chipset.
- a storage device is any device capable of holding data, including a hard drive, compact disk read-only memory (CD-ROM), DVD, or other memory device. The memory holds the data and instructions used by the processor.
- the pointing device may be a mouse, track ball or other type of pointing device, and is used in combination with a keyboard to transmit input data to a computer system.
- the graphics adapter presents images and other information on a display.
- the network adapter connects the computer system to a local area network or a long-distance communication network.
- the computer used herein is not limited to the above configuration; it may lack some components, may include additional components, and may also be part of a storage area network (SAN). The computer of the present application may be configured to be suitable for executing the modules of a program that carries out the method according to the present invention.
- a module herein may mean a functional and structural combination of hardware for implementing the technical idea according to the present application and software for driving the hardware.
- the module may mean a logical unit of predetermined code and the hardware resources for executing that code and, as is apparent to those skilled in the art, does not necessarily mean physically connected code or a single type of hardware.
- the storage medium includes any medium that stores or transmits data in a form readable by a device such as a computer.
- a computer readable medium may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, and other electrical, optical, or acoustic signal transmission media.
- the position of the nucleic acid fragment was defined based on the 5' end of the nucleic acid fragment.
- since the reads obtained in Example 1 are paired-end sequencing reads 100 bp in length, positions For1, For2, ..., For100 were assigned from the 5' end on the forward strand, and positions Rev1, Rev2, ..., Rev100 were assigned from the 5' end on the reverse strand.
- nucleic acid fragments were assembled from the read pairs using the bamtobed -bedpe option of the bedtools program.
- all nucleic acid fragments were divided into groups of fragments having the same size, for example a group with a nucleic acid fragment size of 101, a group with a size of 150, ..., a group with a size of 200, and so on.
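The grouping step can be sketched as follows. The example parses BEDPE records of the kind written by `bedtools bamtobed -bedpe` (ten tab-separated columns, with the two mates' coordinates in columns 1 to 6 and the read name in column 7) and groups fragments by size. Taking the fragment size as the span between the outermost mate coordinates is an assumption of this sketch, and the helper names `fragment_size` and `group_by_size` are hypothetical; the text names only the tool.

```python
from collections import defaultdict


def fragment_size(bedpe_line):
    """Fragment size from one BEDPE record (0-based, half-open coordinates).

    Returns None when the mates are on different chromosomes or a mate is
    unmapped (BEDPE reports '.' and -1 for an unmapped mate).
    """
    fields = bedpe_line.rstrip("\n").split("\t")
    chrom1, start1, end1 = fields[0], int(fields[1]), int(fields[2])
    chrom2, start2, end2 = fields[3], int(fields[4]), int(fields[5])
    if chrom1 != chrom2 or start1 < 0 or start2 < 0:
        return None
    # Assumed definition: distance between the outermost mate coordinates.
    return max(end1, end2) - min(start1, start2)


def group_by_size(bedpe_lines):
    """Map fragment size -> list of read names having that size."""
    groups = defaultdict(list)
    for line in bedpe_lines:
        size = fragment_size(line)
        if size is not None:
            groups[size].append(line.rstrip("\n").split("\t")[6])
    return dict(groups)


# Made-up records, for illustration only.
records = [
    "chr1\t100\t200\tchr1\t120\t220\tread1\t60\t+\t-",
    "chr1\t1000\t1100\tchr1\t1050\t1150\tread2\t60\t+\t-",
    "chr1\t10\t110\tchr2\t50\t150\tread3\t60\t+\t-",
]
size_groups = group_by_size(records)
```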
- the number of A, T, G, and C bases for each nucleic acid fragment position in each group was counted. For example, counting the number of bases for each position of a nucleic acid fragment in a population in which the size of the nucleic acid fragment is 120 can be summarized as shown in Table 1 below.
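A minimal sketch of the counting step, assuming each fragment is available as a tuple of its size, its forward read, and its reverse read (both written 5' to 3'). The tuple layout and the function name `count_bases_by_position` are assumptions for illustration; the labels follow the For1, ..., Rev100 naming used in the text.

```python
from collections import Counter, defaultdict


def count_bases_by_position(fragments, read_length=100):
    """Count A/T/G/C at every read position, separately per fragment size.

    fragments: iterable of (size, forward_read, reverse_read).
    Returns {size: Counter({"For1_A": n, ..., "Rev100_C": n})}.
    """
    tables = defaultdict(Counter)
    for size, forward_read, reverse_read in fragments:
        for strand, read in (("For", forward_read), ("Rev", reverse_read)):
            for position, base in enumerate(read[:read_length], start=1):
                if base in ("A", "T", "G", "C"):  # skip N and other symbols
                    tables[size][f"{strand}{position}_{base}"] += 1
    return tables


# Made-up, shortened reads, for illustration only.
toy_fragments = [
    (120, "ACG", "TTG"),
    (120, "AGG", "TAG"),
    (150, "CCN", "GGG"),
]
tables = count_bases_by_position(toy_fragments)
```

Each per-size counter plays the role of one count table for that fragment size.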
- the position and nucleotide sequence of the nucleic acid fragment to be analyzed was fixed as (For1_A), and the following analysis was performed.
- the X-axis of FIG. 3 represents the size of the nucleic acid fragment
- the Y-axis represents the -log10(p) value.
- the larger the Y-axis value the greater the difference between the healthy person and the neuroblastoma patient.
- the (For1_A) frequency difference between healthy individuals and neuroblastoma patients widened markedly (the -log10(p) value rose to a peak and then fell) at intervals of about 10 in nucleic acid fragment size.
- a total of 15 nucleic acid fragment sizes were selected by choosing the sizes that showed a common -log10(p) peak in both datasets (127-129, 137-139, 148-150, 156-158, 181-183).
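The excerpt does not say which statistical test produced the p-values, so the following is only a stand-in: an exact two-sided permutation test on the difference of group means, applied to the For1_A relative frequency at each fragment size and reported as -log10(p). The function name `neg_log10_p` and the toy frequencies are assumptions for illustration.

```python
import math
from itertools import combinations


def neg_log10_p(healthy, patients):
    """-log10 of an exact two-sided permutation p-value for the mean difference."""
    pooled = list(healthy) + list(patients)
    n, m = len(healthy), len(patients)
    total = sum(pooled)

    def abs_mean_diff(first_group):
        s = sum(pooled[i] for i in first_group)
        return abs(s / n - (total - s) / m)

    observed = abs_mean_diff(range(n))
    # Enumerate every way of relabelling n of the pooled samples as "healthy".
    splits = list(combinations(range(n + m), n))
    extreme = sum(1 for idx in splits if abs_mean_diff(idx) >= observed - 1e-12)
    return -math.log10(extreme / len(splits))


# Made-up For1_A relative frequencies per fragment size: (healthy, patients).
by_size = {
    127: ([0.30, 0.31, 0.32, 0.33], [0.40, 0.41, 0.42, 0.43]),
    130: ([0.30, 0.41, 0.32, 0.43], [0.40, 0.31, 0.42, 0.33]),
}
scores = {size: neg_log10_p(h, p) for size, (h, p) in by_size.items()}
```

Sizes whose score forms a local peak in both datasets would then be the candidates for selection.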
- since the data obtained in Example 1 are 100 PE data, a total of 200 nucleic acid fragment positions, For1 to For100 and Rev1 to Rev100, are available for analysis.
- Figure 4 is a visualization of FESS_Table_120 in Table 2 in a heatmap format. A position-dependent difference in the relative frequencies of the A, T, G, and C bases was observed only at part of the ends (For1 to For10, Rev1 to Rev10), indicated by dotted lines. Toward the end of the read (~100), the relative frequencies of the A, T, G, and C bases simply repeat.
- the relative frequencies of A, T, G, and C at For1 differ significantly from those at For2, whereas the relative frequencies of A, T, G, and C at For11, at For99, and at For100 are almost the same, with no significant difference.
- (For1_A and Rev1_A), (For1_T and Rev1_T), (For1_G and Rev1_G), and (For1_C and Rev1_C) have similar relative frequency values, and in the same way (For2_A and Rev2_A), (For2_T and Rev2_T), (For2_G and Rev2_G), and (For2_C and Rev2_C) have similar relative frequency values.
- the relative frequency of four types of base sequences A, T, G, and C can be calculated.
- relative frequencies of For1_A, For1_T, For1_G, and For1_C can be calculated at the location of For1.
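As a small worked example of the relative frequency at one position: following the normalization stated in the claims (the number of fragments carrying a given base at a position divided by the total number of fragments of that size), the counts at For1 become four frequencies. The function name `relative_frequencies` and the counts are made up for illustration.

```python
def relative_frequencies(position_counts, n_fragments):
    """Divide per-base counts at one position by the number of fragments."""
    if n_fragments <= 0:
        raise ValueError("n_fragments must be positive")
    return {name: count / n_fragments for name, count in position_counts.items()}


# Made-up counts at For1 among 100 fragments of one size.
for1 = relative_frequencies(
    {"For1_A": 30, "For1_T": 20, "For1_G": 25, "For1_C": 25}, 100
)
```

When every fragment has a called base at the position, the four values sum to 1.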
- additional selection was performed by confirming the similarity between nucleotide sequences at the same position. Base sequence selection by location was performed in the healthy population in the following manner.
- at positions For1 to For5, all four bases A, T, G, and C were selected; at positions For6 to For10, only the A base was selected as the representative value among A, T, C, and G.
- nucleic acid fragment sizes: 127, 128, 129, 137, 138, 139, 148, 149, 150, 156, 157, 158, 181, 182, 183 (15 sizes in total).
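The selection above can be checked by enumerating the features. Five positions with four bases plus five positions with one base give 25 features per fragment size, and 15 sizes give 375, which matches the feature count used for model training if only forward-strand positions are counted; that reading, and the `size_position_base` naming scheme, are assumptions of this sketch.

```python
SELECTED_SIZES = [127, 128, 129, 137, 138, 139, 148, 149, 150,
                  156, 157, 158, 181, 182, 183]


def selected_features(sizes=SELECTED_SIZES):
    """List hypothetical feature names such as '127_For1_A'."""
    # For1 to For5: all four bases.
    per_size = [f"For{p}_{b}" for p in range(1, 6) for b in "ATGC"]
    # For6 to For10: the A base only.
    per_size += [f"For{p}_A" for p in range(6, 11)]
    return [f"{size}_{name}" for size in sizes for name in per_size]


features = selected_features()
```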
- using the relative frequency values of the 375 features selected in Example 2 as inputs, a machine learning model for classifying healthy people and patients with neuroblastoma was trained.
- XGBoost was used as the machine learning algorithm.
- the entire sample was divided into training, validation, and test data sets, and the training data set was used for model learning, the validation data set for hyper-parameter tuning, and the test data set for final model performance evaluation.
- the number of samples for each set is as follows.
- the hyper-parameter tuning process is a process of optimizing the values of various parameters (maximum depth of the learner tree, number of learner trees, learning rate, etc.) that make up the XGBoost model.
- Bayesian optimization and grid search were used for hyper-parameter tuning; when the validation loss started to increase relative to the training loss, the model was judged to be overfitting and training was stopped.
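The stopping rule can be pictured with a generic early-stopping check on a validation-loss curve. This is a simplified sketch rather than the procedure actually used: it watches the validation loss alone, and the `patience` parameter and function name `best_round` are assumptions introduced for the example.

```python
def best_round(val_losses, patience=3):
    """Return the index of the best round seen before training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive rounds after the best round so far.
    """
    best_index, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_index, best_loss = i, loss
        elif i - best_index >= patience:
            break  # validation loss keeps rising: treat as overfitting
    return best_index


# Made-up validation losses, for illustration only.
curve = [0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.30]
stopped_at = best_round(curve, patience=3)
ran_to_end = best_round(curve, patience=10)
```

With a short patience the later improvement at the last round is never reached, which is the usual cost of stopping early.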
- the performance of the several models obtained through hyper-parameter tuning was compared on the validation data set; the model with the best validation performance was chosen as the best model, and its final performance was evaluated on the test data set.
- when the vector of relative frequency values for the 375 features calculated from an arbitrary sample is input to the XGBoost model created through the above process, the probability that the sample comes from a healthy person or from a patient with neuroblastoma is calculated; this probability value was defined as the XGBoost Probability Index (XPI).
- if the XPI value calculated from a sample was greater than 0.5, the sample was judged to be from a neuroblastoma patient; if it was less than 0.5, it was judged to be from a healthy person.
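A minimal sketch of how a raw model score could be turned into an XPI and a call, assuming the model emits a real-valued margin that is passed through the sigmoid function. The names `xpi_from_margin` and `classify` are hypothetical, and a value of exactly 0.5, which the sentence above leaves open, is treated here as cancer, as the claims do.

```python
import math


def xpi_from_margin(margin):
    """Numerically stable sigmoid: maps any real margin to (0, 1)."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def classify(xpi, cutoff=0.5):
    """Label a sample from its XPI using the reference value."""
    return "neuroblastoma" if xpi >= cutoff else "healthy"


calls = [classify(xpi_from_margin(m)) for m in (2.0, -2.0)]
```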
- the performance of the XPI value output by the machine learning model built in Example 3 was tested. All samples were divided into Train, Validation, and Test groups; the model was built using the Train samples, and its performance was then verified using the Validation and Test samples.
- the X axis of FIG. 8 represents group (True label) information of actual samples
- the Y axis represents the XPI value calculated by the machine learning model; healthy persons (Normal) and neuroblastoma patients (NBT) are shown in order from the left.
- the XPI distribution confirmed that, in all of the Train, Validation, and Test data sets, healthy samples had the highest probability of being healthy and neuroblastoma patient samples had the highest probability of being neuroblastoma patients.
- a learning model was built in Example 3 using the features selected in Example 2; the importance of each feature when the XGB model was trained is shown in Table 6 below.
- the top 3 rows of Table 7 are the results of measuring performance by the Accuracy (ACC) method, and the bottom 3 rows are the results of measuring performance by the AUC method.
- the composition of the train, valid, and test sets that measure ACC and AUC performance is the same.
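For readers unfamiliar with the two metrics, the sketch below computes accuracy at a fixed cut-off and AUC as the fraction of (patient, healthy) pairs in which the patient scores higher, counting ties as one half. The scores and labels are made up; this is not the evaluation code behind Table 7.

```python
def accuracy(scores, labels, cutoff=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up XPI values and true labels (1 = patient, 0 = healthy).
xpi_values = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
true_labels = [1, 1, 1, 0, 0, 0]
acc = accuracy(xpi_values, true_labels)
area = auc(xpi_values, true_labels)
```

Accuracy depends on the chosen cut-off, while AUC summarizes ranking quality across all cut-offs, which is why the two are reported side by side.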
- the method for diagnosing cancer using the relative frequency and size of sequences for each position of cell-free nucleic acid fragments obtains the optimal position-specific sequence relative frequency and size information of nucleic acid fragments and analyzes it with an AI algorithm, and is therefore useful in that it shows high sensitivity and accuracy even when the read coverage is low.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Data Mining & Analysis (AREA)
- Public Health (AREA)
- Organic Chemistry (AREA)
- Molecular Biology (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Zoology (AREA)
- Biochemistry (AREA)
- Wood Science & Technology (AREA)
- Pathology (AREA)
- Genetics & Genomics (AREA)
- Biomedical Technology (AREA)
- Immunology (AREA)
- Library & Information Science (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Microbiology (AREA)
- General Engineering & Computer Science (AREA)
- Primary Health Care (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Hospice & Palliative Care (AREA)
Claims (14)
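- A method of providing information for diagnosing cancer using cell-free nucleic acid, comprising the following steps: (a) extracting nucleic acid from a biological sample and obtaining sequence information; (b) aligning the obtained sequence information (reads) to a standard chromosomal sequence database (reference genome database); (c) deriving the sequence relative frequency for each position of the nucleic acid fragments and the size of the nucleic acid fragments using the aligned sequence information (reads); and (d) inputting the derived sequence relative frequency and size information into an artificial intelligence model trained to diagnose cancer, and comparing the analyzed output value with a reference value (cut-off value) to determine the presence or absence of cancer, wherein the artificial intelligence model is trained to distinguish normal samples from cancer samples based on the sequence relative frequency for each position of the nucleic acid fragments and the size information of the nucleic acid fragments.
- A method for diagnosing cancer using cell-free nucleic acid, comprising the following steps: (a) extracting nucleic acid from a biological sample and obtaining sequence information; (b) aligning the obtained sequence information (reads) to a standard chromosomal sequence database (reference genome database); (c) deriving the sequence relative frequency for each position of the nucleic acid fragments and the size of the nucleic acid fragments using the aligned sequence information (reads); and (d) inputting the derived sequence relative frequency and size information into an artificial intelligence model trained to diagnose cancer, and comparing the analyzed output value with a reference value (cut-off value) to determine the presence or absence of cancer, wherein the artificial intelligence model is trained to distinguish normal samples from cancer samples based on the sequence relative frequency for each position of the nucleic acid fragments and the size information of the nucleic acid fragments.
- The method according to claim 1 or 2, wherein step (a) is performed by a method comprising the following steps: (a-i) obtaining nucleic acid from a biological sample; (a-ii) removing proteins, fats, and other residues from the collected nucleic acid using a salting-out method, a column chromatography method, or a beads method to obtain purified nucleic acid; (a-iii) preparing a single-end sequencing or paired-end sequencing library from the purified nucleic acid or from nucleic acid randomly fragmented by enzymatic digestion, pulverization, or a hydroshear method; (a-iv) reacting the prepared library in a next-generation sequencer; and (a-v) obtaining sequence information (reads) of the nucleic acid from the next-generation sequencer.
- The method according to claim 1, wherein the size of the nucleic acid fragments in step (c) is selected from the group consisting of 127-129 bp, 137-139 bp, 148-150 bp, 156-158 bp, and 181-183 bp.
- The method according to claim 1, wherein the sequence relative frequency for each position of the nucleic acid fragments in step (c) is a value obtained by normalizing, among nucleic acid fragments of the same size, the number of nucleic acid fragments having an A, T, G, or C base detected at each position by the total number of nucleic acid fragments.
- The method according to claim 5, wherein the position of the nucleic acid fragment in step (c) is 1 to 10 bases from the 5' end of the nucleic acid fragment.
- The method according to claim 5, wherein the sequence relative frequency for each position of the nucleic acid fragments in step (c) is the frequency of the A, T, G, and C bases at positions 1 to 5 from the 5' end of the nucleic acid fragment, and the frequency of the A base at positions 6 to 10.
- The method according to claim 1, wherein the sequence relative frequency for each position of the nucleic acid fragments and the size of the nucleic acid fragments in step (c) are any one or more selected from those listed in Table 3.
- The method according to claim 1, wherein the artificial intelligence model in step (d) is selected from the group consisting of AdaBoost, Random forest, Catboost, Light Gradient Boosting Model, and XGBoost.
- The method according to claim 1, wherein the result value output by the artificial intelligence model in step (e) upon analyzing the input sequence relative frequency and size information is an XPI (XGBoost Probability Index) value.
- The method according to claim 1, wherein the reference value in step (d) is 0.5, and a value of 0.5 or more is determined to indicate cancer.
- A cancer diagnosis device comprising: a decoding unit that extracts nucleic acid from a biological sample and decodes sequence information; an alignment unit that aligns the decoded sequences to a standard chromosomal sequence database; a nucleic acid fragment analyzer that derives, based on the aligned sequences, the sequence relative frequency for each position of the nucleic acid fragments and the size of the nucleic acid fragments; and a cancer diagnostic unit that inputs the derived sequence relative frequency for each position of the nucleic acid fragments and the size information of the nucleic acid fragments into a trained artificial intelligence model, analyzes them, and compares the result with a reference value to determine the presence or absence of cancer.
- A computer-readable storage medium comprising instructions configured to be executed by a processor for diagnosing cancer, through the steps of: (a) extracting nucleic acid from a biological sample and obtaining sequence information; (b) aligning the obtained sequence information (reads) to a standard chromosomal sequence database (reference genome database); (c) deriving the sequence relative frequency for each position of the nucleic acid fragments and the size of the nucleic acid fragments using the aligned sequence information (reads); and (d) inputting the derived sequence relative frequency and size information into an artificial intelligence model trained to diagnose cancer, and comparing the analyzed output value with a reference value (cut-off value) to determine the presence or absence of cancer, wherein the artificial intelligence model of step (d) is trained to distinguish normal samples from cancer samples based on the sequence relative frequency for each position of the nucleic acid fragments and the size information of the nucleic acid fragments.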
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/706,960 US20250131985A1 (en) | 2021-11-03 | 2022-11-01 | Method for diagnosing cancer by using sequence frequency and size at each position of cell-free nucleic acid fragment |
| JP2024526514A JP7805453B2 (ja) | 2021-11-03 | 2022-11-01 | Cancer diagnosis method using sequence frequency and size by position of cell-free nucleic acid fragments {Method for detecting cancer using fragment end sequence frequency and size by position of cell-free nucleic acid} |
| EP22890317.5A EP4428864A4 (en) | 2021-11-03 | 2022-11-01 | METHOD FOR DIAGNOSING CANCER USING A FREQUENCY AND SEQUENCE SIZE AT EACH POSITION OF AN ACELLULAR NUCLEIC ACID FRAGMENT |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020210149466A KR20230064172A (ko) | 2021-11-03 | 2021-11-03 | Method for diagnosing cancer by using sequence frequency and size at each position of cell-free nucleic acid fragment |
| KR10-2021-0149466 | 2021-11-03 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023080586A1 (ko) | 2023-05-11 |
Family
ID=86241775
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2022/016868 Ceased WO2023080586A1 (ko) | Method for diagnosing cancer by using sequence frequency and size at each position of cell-free nucleic acid fragment | 2021-11-03 | 2022-11-01 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20250131985A1 (ko) |
| EP (1) | EP4428864A4 (ko) |
| JP (1) | JP7805453B2 (ko) |
| KR (1) | KR20230064172A (ko) |
| WO (1) | WO2023080586A1 (ko) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118127167A (zh) * | 2024-05-06 | 2024-06-04 | 奥明星程(杭州)生物科技有限公司 | 确定生物体病状的基因标志物、检测模型的构建方法和检测装置 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20220160806A (ko) * | 2021-05-28 | 2022-12-06 | 주식회사 지씨지놈 | 세포유리 핵산단편 말단 서열 모티프 빈도 및 크기를 이용한 암 진단 및 암 종 예측방법 |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20190036494A (ko) * | 2017-09-27 | 2019-04-04 | 이화여자대학교 산학협력단 | Dna 복제수 변이 기반의 암 종 예측 방법 |
| KR20190085667A (ko) * | 2018-01-11 | 2019-07-19 | 주식회사 녹십자지놈 | 무세포 dna를 포함하는 샘플에서 순환 종양 dna를 검출하는 방법 및 그 용도 |
| KR102061800B1 (ko) | 2017-07-18 | 2020-02-11 | 사회복지법인 삼성생명공익재단 | 기계 학습을 이용한 난소암의 예후 예측 방법, 장치 및 프로그램 |
| KR102108050B1 (ko) | 2019-10-21 | 2020-05-07 | 가천대학교 산학협력단 | 증강 컨볼루션 네트워크를 통한 유방암 조직학 이미지 분류 방법 및 그 장치 |
| WO2020125709A1 (en) | 2018-12-19 | 2020-06-25 | The Chinese University Of Hong Kong | Cell-free dna end characteristics |
| US10975431B2 (en) | 2018-05-18 | 2021-04-13 | The Johns Hopkins University | Cell-free DNA for assessing and/or treating cancer |
| KR20210081547A (ko) | 2019-12-24 | 2021-07-02 | 연세대학교 산학협력단 | 면역 항암 요법의 치료 반응에 관한 정보 제공 방법 및 이를 이용한 디바이스 |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021503922A (ja) * | 2017-11-28 | 2021-02-15 | グレイル, インコーポレイテッドGrail, Inc. | ターゲットシーケンシングのためのモデル |
| GB2627085B (en) * | 2019-11-06 | 2024-11-13 | Univ Leland Stanford Junior | Methods and systems for analysing nucleic acid molecules |
| JP7763764B2 (ja) * | 2020-01-31 | 2025-11-04 | ガーダント ヘルス, インコーポレイテッド | 標的バリアントがクローンレベルで存在しないことの有意性モデリング |
| EP4118653B1 (en) * | 2020-03-11 | 2024-07-17 | Guardant Health, Inc. | Methods for classifying genetic mutations detected in cell-free nucleic acids as tumor or non-tumor origin |
| Date | Country | Application Number | Publication Number | Status |
|---|---|---|---|---|
| 2021-11-03 | KR | KR1020210149466A | KR20230064172A (ko) | Pending |
| 2022-11-01 | US | US18/706,960 | US20250131985A1 (en) | Pending |
| 2022-11-01 | JP | JP2024526514A | JP7805453B2 (ja) | Active |
| 2022-11-01 | WO | PCT/KR2022/016868 | WO2023080586A1 (ko) | Ceased |
| 2022-11-01 | EP | EP22890317.5A | EP4428864A4 (en) | Pending |
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102061800B1 (ko) | 2017-07-18 | 2020-02-11 | 사회복지법인 삼성생명공익재단 | 기계 학습을 이용한 난소암의 예후 예측 방법, 장치 및 프로그램 |
| KR20190036494A (ko) * | 2017-09-27 | 2019-04-04 | 이화여자대학교 산학협력단 | Dna 복제수 변이 기반의 암 종 예측 방법 |
| KR20190085667A (ko) * | 2018-01-11 | 2019-07-19 | 주식회사 녹십자지놈 | 무세포 dna를 포함하는 샘플에서 순환 종양 dna를 검출하는 방법 및 그 용도 |
| US10975431B2 (en) | 2018-05-18 | 2021-04-13 | The Johns Hopkins University | Cell-free DNA for assessing and/or treating cancer |
| WO2020125709A1 (en) | 2018-12-19 | 2020-06-25 | The Chinese University Of Hong Kong | Cell-free dna end characteristics |
| KR20210113237A (ko) * | 2018-12-19 | 2021-09-15 | 더 차이니즈 유니버시티 오브 홍콩 | 무 세포 dna 말단 특성 |
| KR102108050B1 (ko) | 2019-10-21 | 2020-05-07 | 가천대학교 산학협력단 | 증강 컨볼루션 네트워크를 통한 유방암 조직학 이미지 분류 방법 및 그 장치 |
| KR20210081547A (ko) | 2019-12-24 | 2021-07-02 | 연세대학교 산학협력단 | 면역 항암 요법의 치료 반응에 관한 정보 제공 방법 및 이를 이용한 디바이스 |
Non-Patent Citations (7)
| Title |
|---|
| CRISTIANO STEPHEN; LEAL ALESSANDRO; PHALLEN JILLIAN; FIKSEL JACOB; ADLEFF VILMOS; BRUHM DANIEL C.; JENSEN SARAH ØSTRUP; MEDIN: "Genome-wide cell-free DNA fragmentation in patients with cancer", NATURE, NATURE PUBLISHING GROUP UK, LONDON, vol. 570, no. 7761, 29 May 2019 (2019-05-29), London, pages 385 - 389, XP036814426, ISSN: 0028-0836, DOI: 10.1038/s41586-019-1272-6 * |
| DAPING YU ET AL., THORACIC CANCER, vol. 11, 2020, pages 95 - 102 |
| METZKER, M., NATURE BIOTECHNOLOGY REVIEWS, vol. 11, 2010, pages 31 - 46 |
| PEIYONG JIANG ET AL., CANCER DISCOVERY, vol. 10, 2020, pages 664 - 673 |
| See also references of EP4428864A4 |
| WAN NATHAN, DAVID WEINBERG, TZU-YU LIU, KATHERINE NIEHAUS, ERIC A. ARIAZI, DANIEL DELUBAC, AJAY KANNAN, BRANDON WHITE, MITCH BAILE: "Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA", BMC CANCER, vol. 19, 23 August 2019 (2019-08-23), pages 832, XP093062357 * |
| ZHOU, XIONGHUI ET AL., BIORXIV, 16 July 2020 (2020-07-16), pages 201350 |
Also Published As
| Publication number | Publication date |
|---|---|
| US20250131985A1 (en) | 2025-04-24 |
| EP4428864A4 (en) | 2025-10-29 |
| KR20230064172A (ko) | 2023-05-10 |
| JP7805453B2 (ja) | 2026-01-23 |
| EP4428864A1 (en) | 2024-09-11 |
| JP2024544749A (ja) | 2024-12-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2021154060A1 (en) | Method of predicting disease, gene or protein related to queried entity and prediction system built by using the same | |
| WO2021107676A1 (ko) | 인공지능 기반 염색체 이상 검출 방법 | |
| WO2022114631A1 (ko) | 인공지능 기반 암 진단 및 암 종 예측방법 | |
| WO2022203437A1 (ko) | 인공지능 기반 무세포 dna의 종양 유래 변이 검출 방법 및 이를 이용한 암 조기 진단 방법 | |
| Dotan et al. | Effect of tokenization on transformers for biological sequences | |
| WO2023080586A1 (ko) | 세포유리 핵산단편 위치별 서열 빈도 및 크기를 이용한 암 진단 방법 | |
| WO2018143540A1 (ko) | 인공신경망을 이용한 위암의 예후 예측 방법, 장치 및 프로그램 | |
| WO2023033329A1 (ko) | 질환 연관 유전자 변이 분석을 통한 질환별 위험 유전자 변이 정보 생성 장치 및 그 방법 | |
| WO2022250512A1 (ko) | 조직 특이적 조절지역의 무세포 dna 분포를 이용한 인공지능 기반 암 조기진단 방법 | |
| WO2022098086A1 (ko) | 비기능성 전사체를 이용한 parp 저해제 또는 dna 손상 약물 감수성 판정방법 | |
| WO2021125744A1 (en) | Method and system for providing interpretation information on pathomics data | |
| WO2020022733A1 (ko) | 전장유전체 시퀀싱 기반의 염색체 이상 검출 방법 및 그 용도 | |
| Hu et al. | Predicting Gram-positive bacterial protein subcellular localization based on localization motifs | |
| WO2022250513A1 (ko) | 세포유리 핵산단편 말단 서열 모티프 빈도 및 크기를 이용한 암 진단 및 암 종 예측방법 | |
| WO2023075402A1 (ko) | 메틸화된 무세포 핵산을 이용한 암 진단 및 암 종 예측방법 | |
| WO2024096538A1 (ko) | 간암 진단용 dna 메틸화 마커 및 이의 용도 | |
| WO2024080783A1 (ko) | 인공지능 기술을 이용하여 pmhc에 대응되는 tcr 정보를 생성하기 위한 방법 및 장치 | |
| WO2025154893A1 (ko) | 세포 샘플 내 진양성 변이를 검출하기 위한 기계학습 모델을 학습시키는 방법 및 장치 | |
| WO2024117792A1 (ko) | 세포유리 핵산단편 말단 서열 모티프 빈도 및 크기를 이용한 암 진단 및 암 종 예측방법 | |
| WO2022250514A1 (ko) | 세포유리 핵산과 이미지 분석기술 기반의 암 진단 및 암 종 예측 방법 | |
| WO2023244046A1 (en) | Method for diagnosing cancer and predicting type of cancer based on single nucleotide variant in cell-free dna | |
| WO2021034034A1 (ko) | 핵산 단편간 거리 정보를 이용한 염색체 이상 검출 방법 | |
| WO2017099414A1 (ko) | 암 진단용 마이크로rna 바이오마커 발굴 방법 및 그 이용 | |
| WO2023191503A1 (ko) | 단일 세포 전사체 분석을 통한 암 미세 환경 내 세포 클러스터의 표적 후보 추천 방법, 그 장치 및 프로그램 | |
| Emam et al. | Detection of mammalian coding sequences using a hybrid approach of chaos game representation and machine learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22890317; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2024526514; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022890317; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022890317; Country of ref document: EP; Effective date: 20240603 |
| | WWP | Wipo information: published in national office | Ref document number: 18706960; Country of ref document: US |


















