WO2002088903A2 - Prediction method - Google Patents

Prediction method

Info

Publication number
WO2002088903A2
Authority
WO
WIPO (PCT)
Prior art keywords
event
time
relationship
disease
formulae
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2002/013715
Other languages
English (en)
Other versions
WO2002088903A3 (fr)
Inventor
Bradford E. Billet
Sreekanth Thumrugoti
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HEURISTICS USA Ltd
Original Assignee
HEURISTICS USA Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/846,734 external-priority patent/US20030009290A1/en
Priority claimed from US09/846,601 external-priority patent/US20020194148A1/en
Priority claimed from US09/846,733 external-priority patent/US20030028351A1/en
Priority claimed from US09/846,605 external-priority patent/US20030036890A1/en
Priority claimed from US09/846,606 external-priority patent/US20030018514A1/en
Application filed by HEURISTICS USA Ltd
Priority to AU2002309621A (patent/AU2002309621A1/en)
Publication of WO2002088903A2 (patent/WO2002088903A2/fr)
Publication of WO2002088903A3 (patent/WO2002088903A3/fr)
Anticipated expiration (legal status)
Ceased (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the present invention relates to the field of methods for predicting the occurrence of identifiable events, using numerical modeling based on past occurrences.
  • the method is best implemented using computer aided numerical processing.
  • the invention has wide applicability in predicting important events in science, medicine, meteorology, sociology, disease control, manufacturing and other areas.
  • the system can be used in forecasting vector-borne and other kinds of serious or fatal disease, as well as the demand for beneficial and life-saving drugs; forecasting agricultural pests and agricultural diseases for use in the chemical and pesticide manufacturing industries; assisting the pharmaceutical industries in the design, testing, synthesis and manufacture of new therapeutic molecules and compounds; increasing the speed of microprocessors; optimizing power grid operations by forecasting demand and equipment failures, and minimizing transmission and distribution losses; forecasting customer behavior for so-called Customer Relationship Management; forecasting the failure of critical equipment to allow for timely service and repair; forecasting the behavior of customers for e-commerce sites; and forecasting interest rates for banks and other financial institutions.
  • event "B" is directly produced by the operation of cause "A" over some necessary short or long period of time.
  • event “B” is produced by several causes. These several causes may produce event “B” by their simple additive effect, or only if occurring in a particular sequence, or only if occurring at precise relative times, or only upon some combination of the foregoing.
  • a causative event may in fact be the absence of a certain event. In other words, a causative event can be the absence at an appropriate time of a blocking event.
  • Another approach, related to the method of the present invention, focuses on empirical models, in which the model is fitted to the data with less regard for the scientific underpinnings of the causative mechanism.
  • these empirical numerical models are less appealing than numeric models derived from an understanding of the causative mechanisms.
  • Empirical numeric models may seem "unscientific” by tying events to causative variables without an understanding of the causation mechanism.
  • they are largely ineffective in predicting events that have never occurred in the past, because for such events there is no database from which to construct the empirical model (although this might be addressed in part by using extrapolation or projection techniques).
  • a numeric model that is empirically derived may erroneously fail to consider certain important causative factors simply because these factors were not present at the past occurrences upon which the model is based, or were present but were not recognized in the empirical modeling as causative factors. Empirical numeric modeling is very useful despite these limitations.
  • the current forecasting tools depend on extracting knowledge from large databases and interpreting this knowledge to forecast future events. This process of extracting knowledge is sometimes called data mining. There are two principal approaches to this process: verification/user-driven data mining, and data-driven data mining.
  • a user formulates a theory about a possible relationship in a database and converts this hypothesis into a query. For example, a user might hypothesize about the relationship between industrial sales of color copiers and customers' specific industries. He or she would generate a query against a data warehouse and segment the results into a report. Typically, the generated information provides a good overview.
  • the hunch is that a company's industry correlates with the number of copiers it buys or leases.
  • the quality of the extracted information depends on the user's interpretation of the results, and is thus subject to error.
  • Multi-factor analyses identify the relationships among factors that influence the outcome of copier sales. Pearson product-moment correlation measures the strength and direction of the relationship between each database field and the dependent variable.
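As a minimal sketch of the Pearson product-moment correlation mentioned above, the following computes r between one database field and a dependent variable. The function name `pearson_r` and the sample values (employee counts, copiers bought) are hypothetical illustrations, not data from the source.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences.

    Returns a value in [-1, 1]: the sign gives the direction of the linear
    relationship and the magnitude gives its strength. A constant sequence
    has zero variance, so r is undefined and ZeroDivisionError is raised.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical database field (employees per customer) vs. dependent variable (copiers bought).
employees = [10, 50, 120, 300, 800]
copiers = [1, 2, 4, 7, 15]
print(round(pearson_r(employees, copiers), 3))
```

In a multi-factor analysis this would simply be repeated for each database field against the same dependent variable.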
  • One of the problems with this approach, aside from its resource intensity, is that the techniques tend to focus on tasks in which all the attributes have continuous or ordinal values. Many of the attributes are also parametric. The methodologies followed include the following:
  • a linear classifier for instance, assumes that a relationship is expressible as a linear combination of the attribute values.
  • Neural networks do qualify as true automatic data mining tools because they autonomously interrogate the data for patterns.
  • Neural networks often require extensive care and feeding - they can only work with preprocessed numeric, normalized, scaled data. They also need a fair amount of tuning such as the setting of a stopping criterion, learning rates, hidden nodes, momentum coefficients, and weights. And their results are not always comprehensible.
  • symbolic classifiers are examples. These use machine learning technology, and hold great potential as data mining tools for corporate data warehouses. These tools do not require any manual intervention in order to perform their analysis. Their strength is their ability to automatically identify key relationships in a database - to discover rather than confirm trends or patterns in data and to present solutions in usable business formats. They can also handle the type of real-world business data that statistical and neural systems have to "scrub" and scale.
  • symbolic classifiers are also known as rule-induction programs or decision-tree generators. They use statistical or machine-learning algorithms such as ID3, C4.5, AC2, CART, CHAID, CN2, or modifications of these algorithms. Symbolic classifiers split a database into classes that differ as much as possible in their relation to a selected output. That is, the tool partitions a database according to the results of statistical tests conducted on an output by the algorithm instead of by the user.
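To make the partitioning idea concrete, here is a minimal sketch of the entropy-based information-gain criterion used by ID3-style rule induction (C4.5 uses a normalized variant, the gain ratio). It is not the source's method; the toy records and the `industry`, `size` and `buyer` fields are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(records, attribute, target):
    """Reduction in the entropy of `target` obtained by splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    n = len(records)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


def best_split(records, attributes, target):
    """The attribute whose split best separates the target classes (one '20 Questions' question)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


records = [
    {"industry": "legal", "size": "small", "buyer": "yes"},
    {"industry": "legal", "size": "large", "buyer": "yes"},
    {"industry": "retail", "size": "small", "buyer": "no"},
    {"industry": "retail", "size": "large", "buyer": "no"},
]
print(best_split(records, ["industry", "size"], "buyer"))  # → industry
```

A full decision-tree generator would apply `best_split` recursively to each resulting subset until the classes are sufficiently pure.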
  • Machine learning algorithms use the data - not the user's hypotheses - to automate the stratification process.
  • this type of data mining tool requires a "dependent variable" or outcome, such as copier sales, which should be a field in the database.
  • the rest is automatic.
  • the tool's algorithm tests a multitude of hypotheses in an effort to discover the factors or combination of factors (e.g., business type, location, number of employees) that have the most influence on the outcome.
  • the algorithm engages in a kind of "20 Questions" game.
  • the algorithm asks a series of questions about the values of each record. Its goal is to classify each sample into either a buyer or non-buyer group.
  • the tool processes every field in every record in the database until it sufficiently splits the buyers from the non-buyers and learns the main differences between them. Once the tool has learned the crucial attributes, it can rank them in order of importance, and a user can then exclude attributes that have little or no effect on targeting potential new customers. Most data mining tools generate their findings in the form of "if-then" rules. Symbolic classifiers do have some advantages; for example, they do not require an intensive data-preparation effort, which is a convenience to end users who freely mix numeric, categorical and date variables.
  • numeric sequences that ultimately are chosen may not be the best ones available for correlating the chosen variables with real-life occurrences.
  • a better method is desired for establishing numeric sequences predictive of real-world events based on historic data. The present invention includes such a method.
  • the present invention is a new paradigm in forecasting technologies: data-driven software that recognizes patterns and extends them. Current forecasting models all try to interpret historical data by establishing relationships and extracting hidden knowledge, and base their predictions on these interpretations.
  • the mathematical model of the present invention selects from its library a pattern that matches the historical data and extends it into the future to make the forecast. This results in several important advantages. There are no assumptions about relationships.
  • the input data need not be distributed.
  • the user need not originate queries but can instead perform autonomous data discovery. The method completely automates the data analysis for extracting hidden knowledge and does not require any human intervention. It discovers trends and presents solutions in usable business formats, and it can handle real-world business data directly without any need to scrub the data.
  • the pattern library component of the present invention is very large. It uses both horizontal and vertical pattern recognizing methods.
  • the horizontal patterns identify the interrelationships among various parameters (such as the price of an item and a customer's decision to purchase it), the knowledge that can be extracted from them, and their relationship to the eventual event.
  • the vertical patterns project these parametric values into the future.
  • NSS Number Sequences
  • NDNS N-Dimensional Numeric Space
  • the system utilizes a numeric "space” constructed of "n" dimensions, wherein "n" is typically much more than three.
  • the x-axis represents time in suitable increments, another axis represents a number indicative of the event being predicted, and other axes represent parameters that correlate with the event.
  • the number of parameter axes is equal to the number of parameters correlated with the event.
  • the "space" is ordinary three-dimensional space wherein one axis represents time, a second axis represents a numeric scale indicative of the occurrence of the event, and the third axis represents the parameter that correlates with or is a function of the first two.
  • the variables are thus plotted on multi-dimensional x-y axes with an integrated paradigm for said axes.
  • a three-dimensional plot of this space is easy to visualize, in which a "strand" or other geometric figure shows the interrelationship among these three variables.
  • This "strand” which is also called NSS is similar to the double helix of DNA. It consists of two strings of numbers. One string of numbers represents historical data of the event and a selected parameter.
  • the second string represents the corresponding patterns selected from the pattern library.
  • a relationship between these two strings of the strand is then established. Once this strand or other geometric figure has been established from historic data, it can be mathematically characterized, or "modeled." It can then be projected or extrapolated into the time region beyond the historic data, i.e., the future. This is possible because the pattern string is of infinite length: using the relationship already established between the pattern string and the historical data, the string representing future data is drawn. The same concept applies when more than one parameter is correlated with time and the predicted event, although the concept is then impossible to visualize, since it involves a "space" of more than three dimensions.
  • the method in a preferred embodiment utilizes software written in the Java brand programming language. Such software is platform-independent and can be used on most machines. Six modules may be used: data reader, diary of events, iterative generator, forecaster, communicator and optimizer.
  • the data reader facilitates the input of data from one or more databases such as ORACLE, SYBASE or INGRES brand databases, or others.
  • the report is made to a FOXPRO brand or flat data file.
  • the data reader may also utilize web-based software to allow access to data from remote servers over a network such as the Internet.
  • the diary of events module establishes a relationship between factors that are causative or otherwise correlated with the predicted event by reading data from the data reader module and employing a pattern recognition tool.
  • the iterative generator works in tandem with the diary of events module to generate an n-dimensional numeric space (sometimes referred to as "NDNS") and a set of corresponding numeric sequence strands (sometimes referred to as "NSS") using a set of interrelated formulae.
  • NDNS n-dimensional numeric space
  • NSS set of corresponding numeric sequence strands
  • the forecaster module utilizes the iterative generator to produce predictions of future events. Such predictions can be short-term or long-term or both. As additional events that are the subject of the predictive system occur, the historic database can be updated to tune the system for better future predictions.
  • the communication module is used to transmit or otherwise communicate predictions to appropriate persons. For example, a system used for predicting disease outbreaks transmits predictions to the appropriate health authorities, a system used for predicting flooding transmits predictions to the appropriate rescue or aid groups, and a system monitoring power grids, machinery or other mechanized devices can communicate a warning before a failure.
  • the optimizer module of the method assists the users in improving upon the forecasted results.
  • the optimizer contains a built-in simulator. This simulator gives the user the opportunity to perform "what-if" analysis: the user can change the values of various parameters (theoretically) and see how these changes affect the forecasted results.
  • This module also gives the user the opportunity to fix the time and intensity of an event, from which the method calculates and recommends feasible ranges for the various parametric values. Once the user takes the necessary steps to keep the parametric values within the ranges recommended by the method, the occurrence of the forecasted event can be arrested, deferred or intensified as required. Once this has occurred, one can begin using the above process to forecast values for each of the parameters. Knowing the values, or weights, of each of the parameters is key to the optimization process. These weights may change based on the interrelationship of the other inputs.
  • the program can query the desired result and work backwards to notify the user which parameter values must change in order to increase or decrease the projected outcome, based upon the ND, NS and NSS integrated paradigm, utilizing synchronization and discrepancies.
  • the invention constructs both the numeric sequence strands and the numeric sequence values in an integrated multidimensional paradigm. Once these keys are known for an event, the invention uses the mathematical formulas in reverse to obtain the optimum result by recommending changes in the input values, such as price or delivery times, to increase sales in the case of ice cream.
  • the methodology of the present invention has broad application in predicting the occurrence of events for which past data is available.
  • Applications include, for example, the forecasting of vector-borne diseases, so that preventive measures can be taken and to allow predictions of the demand for treatments such as pharmaceuticals.
  • the method can be used to forecast the incidence of agricultural blights or pests and the corresponding demand for pesticides or other chemical treatments.
  • the method assists in designing new drugs in the form of particular molecules or compounds by predicting their efficiency, and also in implementing their manufacture.
  • the method is also usable in designing microprocessors optimized for speed, efficiency, low cost or ease of manufacture. In the area of utility service, the system accurately predicts customer demand in order to optimize power grid operations.
  • the system can be used to predict equipment failures, so that appropriate equipment maintenance and replacement can be undertaken on a timely basis.
  • the system can be used in Customer Relationship Management and in forecasting customer behavior at e-commerce sites or other sales sites. The system can even be used by banks and other financial institutions to predict interest rates.
  • the system can also be used in numeric processing. Traditionally, computers perform numeric processing by emphasizing classic arithmetic calculations, which can be very processor-intensive.
  • the present invention instead allows a processor to recognize patterns in numeric processing and to substitute these patterns for the step of arithmetic computation.
Detailed Description of the Invention
  • the invention involves identifying a set of formulae (or numeric sequence strands) and then establishing a very large number of patterns from combinations of those formulae. These patterns are created independently of the time variable and of the data. The patterns are then applied to data sets and repeatedly compared with the calculated values until an acceptable relationship can be discerned. That relationship can then be extended into the future to predict the future occurrence of the event in question.
  • This process can be visualized as encompassing a set of numeric sequences which produce numeric sequence "strands.”
  • a set of Numeric Sequences is developed.
  • the Numeric Sequences are functions of elapsed time. Each can therefore be plotted in two dimensions, such as with the Numeric Sequence on the y axis and elapsed time on the x axis. Additionally, it is also plotted on multi-dimensional x-y axes with an integrated paradigm for said axes. Many formulae can be used for these Numeric Sequences which are functions of elapsed time, but it has been found that formulae corresponding to patterns in nature are the ones that are effective in the invention.
  • NSS Numeric Sequence Strands
  • NSS Extending NSS into future, i.e., beyond historical data for forecasting.
  • Historical data on the event intensity is collected at the available frequency.
  • the effectiveness of the forecast increases with larger data sets.
  • the forecast also becomes more accurate as the historical period for which data is collected increases.
  • Calculating Numeric Sequence (NS) values: NS values are calculated using a set of NS formulae. In these calculations the initial Elapsed Time (ET) value is zero, and in each iteration ET is incremented by the finest available interval between collected historical data points. This interval is called the Time Interval (TI).
  • TI Time Interval
  • the precision of the forecast depends on this TI duration. The finer this TI duration, the more precise the forecast. It is important to note that except for this time interval, this method does not use historical data for the calculation of NS.
  • Elapsed Time is plotted on x-axis.
  • TI is the unit of measurement.
  • NS values are plotted on the y-axes. Because there is more than one NS value, there is more than one y-axis; this can be viewed as a number of two-dimensional planes superimposed on one another. For a given ET there is a set of NS values NS1, NS2, ..., NS36, and this set of NS values is called a pattern. A new pattern therefore arises each time ET is increased by TI, and because ET can be extended infinitely, the set of patterns can be very large.
  • Synchronization establishes an arithmetic relationship between these two strings, the historical data and the NS values. For the given time interval, the method determines whether there is any relationship between the numbers in the two strings. If there is none, all the NS values are recalculated with the ET value used for calculating NS offset by one unit, and the search for a relationship is repeated. Because the set of NS value patterns is extremely large, a relationship between the two strings is eventually found. This process is called synchronization.
  • Extending the NSS into the future, i.e., beyond the historical data, for forecasting: a synchronized NSS can be extended into the future. Because a direct relationship between the NS values and the event values has been established, it can be used to predict the event. This is possible because NS values can be extended infinitely into the future.
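The source does not disclose its NS formulae, so the sketch below uses hypothetical stand-ins (sine waves with periods 2, 3, ..., 37 TI) purely to illustrate the data structure described above: ET starts at zero, advances by TI, and the set of NS values at each ET forms one pattern. The names `ns_formulae`, `pattern_at` and `generate_patterns` are illustrative assumptions; as in the text, only TI, not the historical data, enters the calculation.

```python
import math


def ns_formulae(n_sequences=36):
    """Hypothetical NS formulae: one sinusoid per sequence, each with a different period.

    The real formulae are not specified in the source; these only stand in for them.
    """
    return [
        (lambda et, p=period: math.sin(2 * math.pi * et / p))
        for period in range(2, 2 + n_sequences)
    ]


def pattern_at(et, formulae):
    """The pattern at elapsed time `et`: the tuple (NS1, NS2, ..., NSk)."""
    return tuple(f(et) for f in formulae)


def generate_patterns(n_steps, ti=1, start_et=0, formulae=None):
    """Yield (ET, pattern) pairs, starting at `start_et` and advancing ET by TI each step."""
    formulae = formulae or ns_formulae()
    for step in range(n_steps):
        et = start_et + step * ti
        yield et, pattern_at(et, formulae)


for et, pattern in generate_patterns(3):
    print(et, [round(v, 3) for v in pattern[:3]])
```

Because the generator can run for any number of steps, the pattern set is unbounded, which is the property the synchronization step relies on.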
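The source does not say which "arithmetic relationship" is sought, so this sketch assumes a linear one (event ≈ a · NS + b, accepted when every residual is within a tolerance). It tries every NS string at the current offset and, if none fits, offsets ET by one TI and retries, as described above. The names `fit_line` and `synchronize`, the tolerance, the offset limit and the two toy formulae are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit ys ≈ a * xs + b; returns (a, b, max_abs_residual) or None."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None  # a constant NS string cannot carry a relationship
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    resid = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
    return a, b, resid


def synchronize(history, formulae, ti=1, tol=1e-6, max_offset=10_000):
    """Find an ET offset (in TI units) at which some NS string is linearly related to `history`.

    Returns (offset, formula_index, a, b), or None if no offset up to `max_offset` works.
    """
    for offset in range(max_offset):
        ets = [(offset + i) * ti for i in range(len(history))]
        for k, f in enumerate(formulae):
            fit = fit_line([f(et) for et in ets], history)
            if fit is not None and fit[2] <= tol:
                return offset, k, fit[0], fit[1]
    return None


# Hypothetical usage: the history was generated from the second formula, shifted by 4 TI.
formulae = [lambda et: et % 5, lambda et: math.sin(2 * math.pi * et / 7)]
history = [3 * math.sin(2 * math.pi * (i + 4) / 7) + 5 for i in range(12)]
result = synchronize(history, formulae)
print(result)
```

A bounded `max_offset` is a practical safeguard added here; the text itself simply repeats the offset until a relationship appears.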
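Once a strand is synchronized, extending it is just a matter of evaluating the same NS formula at later ET values and applying the established relationship. This sketch assumes the linear relationship event ≈ a · NS(ET) + b from a prior synchronization step; the function `extend_forecast` and the example strand are hypothetical.

```python
import math


def extend_forecast(formula, a, b, offset, n_history, horizon, ti=1):
    """Project a synchronized NSS beyond the historical data.

    `formula`, `a`, `b` and `offset` are assumed to come from a prior synchronization
    step that found event ≈ a * formula(ET) + b over the first `n_history` points.
    """
    future = []
    for step in range(n_history, n_history + horizon):
        et = (offset + step) * ti
        future.append(a * formula(et) + b)
    return future


def wave(et):
    """Hypothetical NS formula used by the synchronized strand."""
    return math.sin(2 * math.pi * et / 7)


# Hypothetical strand: event = 3 * sin(2π·ET/7) + 5, offset 4 TI, 12 historical points.
forecast = extend_forecast(wave, 3.0, 5.0, offset=4, n_history=12, horizon=7)
print([round(v, 2) for v in forecast])
```

Since NS formulae are defined for any ET, the horizon is limited only by how far the established relationship can be trusted.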
  • NSS Building NSS for each one of the parameters.
  • the above process is used for forecasting the values of each of the parameters that have a bearing on the occurrence of the event; that is, one NSS is built for each parameter.
  • the method builds an NSS for each of the parameters solely for the following reason. The method searches all the forecasted parametric values for asynchronous ones and, based on those values, corrects the event-intensity forecast. By intentionally changing one or more of the forecasted parametric values, an event can be prevented from occurring, or its intensity can be dramatically decreased or increased. This is the principal benefit of the method. This process is called optimization and is discussed below.
  • the optimizer module of the method assists the users in improving upon the forecasted results.
  • the optimizer contains a built-in simulator. This simulator gives the user the opportunity to perform "what-if" analysis. When historical data are processed, patterns among the values of the various parameters are collected, and these patterns are analyzed by the optimizer module. During this process, the optimizer module frames rules for verifying the validity of the data for each parameter, both in isolation and in combination with the values of other parameters. Rules for verifying the boundary values of each parameter are also part of this rule set. The verifier is the sub-module that holds all these rules.
  • the optimizer module gives the user a facility to change the values of various parameters (theoretically). Whenever the user makes such changes in the parametric value set, the verifier module validates them and rejects any changes that are inconsistent with the historical data. At this stage, the parameters whose values were changed become asynchronous with the rest of the data. This causes a discrepancy in the forecasted event, i.e., the simulator shows that the forecasted event does not occur at the predicted intensity at the predicted time, and thus helps users improve upon the forecasted results.
  • the optimizer also works in a fully automated mode. In this mode it lets the user enter a desired range for both the intensity and the period of occurrence of the event. Once these values are entered, it reconstructs the range of each of the parametric values for the given time intervals. If the values of all these critical parameters are then kept within the stipulated ranges, the user may be able to realize a desired event at a desired time.
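A minimal sketch of the verifier's boundary rules, assuming each rule is simply the minimum-to-maximum range a parameter took in the historical data. The source also mentions rules on parameters in combination, which this sketch does not attempt; the `BoundaryRule` class, the functions `frame_rules` and `verify`, and the weather parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoundaryRule:
    """Hypothetical verifier rule: a parameter must stay within its historical range."""
    name: str
    low: float
    high: float

    def check(self, value):
        return self.low <= value <= self.high


def frame_rules(history):
    """Derive one boundary rule per parameter from historical records (dicts)."""
    names = history[0].keys()
    return {
        n: BoundaryRule(n, min(r[n] for r in history), max(r[n] for r in history))
        for n in names
    }


def verify(changes, rules):
    """Split proposed parameter changes into accepted and rejected ones.

    A change is rejected if its parameter has no rule or violates its boundary.
    """
    accepted, rejected = {}, {}
    for name, value in changes.items():
        rule = rules.get(name)
        if rule is not None and rule.check(value):
            accepted[name] = value
        else:
            rejected[name] = value
    return accepted, rejected


history = [
    {"temperature": 24.0, "rainfall": 110.0},
    {"temperature": 29.5, "rainfall": 180.0},
    {"temperature": 31.0, "rainfall": 95.0},
]
rules = frame_rules(history)
accepted, rejected = verify({"temperature": 30.0, "rainfall": 400.0}, rules)
print(accepted, rejected)  # → {'temperature': 30.0} {'rainfall': 400.0}
```

Only the accepted changes would then be passed to the simulator to show how the forecasted event shifts.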
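The source does not give the model used to reconstruct parameter ranges, so this sketch assumes a hypothetical linear model, intensity = intercept + Σ wᵢ · pᵢ, and solves for the range of one parameter that keeps the predicted intensity inside the user's desired band while the others are held fixed. It handles only the intensity range, not the desired period; the weights echo the ice cream price/delivery example but are invented.

```python
def parameter_range(weights, intercept, fixed, target, desired_low, desired_high):
    """Range of `target` that keeps predicted intensity within [desired_low, desired_high].

    Assumes (hypothetically) a linear model: intensity = intercept + sum(w_i * p_i).
    All parameters except `target` are held at the values in `fixed`.
    """
    w = weights[target]
    if w == 0:
        raise ValueError(f"{target!r} has no effect under this model")
    rest = intercept + sum(weights[n] * v for n, v in fixed.items() if n != target)
    # Dividing by a negative weight flips the inequality, so sort the two bounds.
    lo, hi = sorted(((desired_low - rest) / w, (desired_high - rest) / w))
    return lo, hi


# Hypothetical model: sales fall by 2 per unit of price and 0.5 per day of delivery time.
weights = {"price": -2.0, "delivery_days": -0.5}
low, high = parameter_range(weights, 100.0, {"delivery_days": 4}, "price", 70.0, 80.0)
print(low, high)  # → 9.0 14.0
```

Keeping the price between 9 and 14 would, under this assumed model, keep sales within the desired 70 to 80 band.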
  • the invention differs from prior art systems in that these Numeric Sequences are initially formulated without regard to the occurrence of the events at issue. They are instead raw patterns and numerous combinations of patterns relating to elapsed time. Only after these patterns and combinations are established in raw form is there any attempt to time them to the occurrence of the events at issue.
  • The historic data collected are appropriate for the predicted event that is the subject of the system. For example, in the case of disease outbreak, the past data would likely include the actual occurrences of the disease outbreak, along with data pertaining to causative or correlative factors such as weather, the prevalence and characteristics of carriers, and lifestyle and hygiene factors.
  • Disease can be quantified as diagnosed cases per population number, such as cases per thousand individuals, or in other desired quantifications such as deaths per population number.
  • the method chosen for quantifying the input data should correspond to the desired prediction; if the prediction that is desired is deaths per 1000 population, then the input data should similarly be expressed in deaths per 1000 population.
  • Weather can be expressed in temperature, humidity and rainfall per chosen period. Variables that are not ordinarily expressed in number can be quantified arbitrarily; for example, seasons of the year can be expressed numerically as 1, 2, 3 or 4. Gender can be expressed as 1 or 2, and occupations of individuals can be assigned numeric codes. Each variable that appears to cause or correlate with the predicted event is preferably quantified and input as part of the historic data 12.
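A minimal sketch of the arbitrary numeric quantification described above: fixed codebooks for seasons (1 to 4) and gender (1 or 2), and first-appearance numbering for open-ended categories such as occupations. The codebook contents and the function names `encode` and `build_codebook` are hypothetical.

```python
SEASON_CODES = {"winter": 1, "spring": 2, "summer": 3, "autumn": 4}
GENDER_CODES = {"female": 1, "male": 2}


def build_codebook(values):
    """Number categories 1, 2, 3, ... in order of first appearance (arbitrary but consistent)."""
    codebook = {}
    for v in values:
        codebook.setdefault(v, len(codebook) + 1)
    return codebook


def encode(values, codebook):
    """Replace each categorical value with its numeric code; unknown values raise KeyError."""
    return [codebook[v] for v in values]


seasons = encode(["summer", "winter", "summer"], SEASON_CODES)
occupations = ["farmer", "teacher", "farmer", "nurse"]
occupation_codes = encode(occupations, build_codebook(occupations))
print(seasons, occupation_codes)
```

Whatever coding is chosen, the same codebook must be reused for every record so that a code always means the same category.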
  • Each item of historic data 12 is matched with the time at which it occurred.
  • the time scale begins with the earliest historic data at "0" and proceeds to the most recent available historic data.
  • the time increment is chosen as appropriate for the data. If the data has a time precision that is no finer than weekly, for example, the time increment could be one week. If the most precise data is expressed to a precision of seconds or tenths of a second, then similar precision is appropriate for the time increment. Of course, this can produce relatively large numbers on the time scale; a time scale expressed in seconds will reach the number of seconds in a year at the point of one year elapsed. The computational issues presented in manipulating these large numbers are easily handled by modern numeric processors.
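The time scale described above can be sketched as follows: each timestamp is converted to elapsed time in whole increments, with the earliest record at 0. The function name `elapsed_units` is hypothetical, and flooring partial increments is an assumption the source does not state.

```python
from datetime import datetime, timedelta


def elapsed_units(timestamps, increment):
    """Map timestamps to elapsed time in whole `increment` units, earliest record at 0.

    Partial increments are floored (timedelta // timedelta returns an int).
    """
    start = min(timestamps)
    return [(t - start) // increment for t in timestamps]


# Weekly data: the earliest date becomes 0 and the others count whole weeks.
dates = [datetime(2001, 1, 15), datetime(2001, 1, 1), datetime(2001, 1, 8)]
print(elapsed_units(dates, timedelta(weeks=1)))

# A seconds-based scale reaches 31,536,000 after one (non-leap) year.
print(elapsed_units([datetime(2001, 1, 1), datetime(2002, 1, 1)], timedelta(seconds=1)))
```

Python integers are arbitrary-precision, so the large values of a fine-grained scale pose no overflow issue here.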
  • data can be collected and input with the aid of the Global Positioning System ("GPS") or geographic information systems ("GIS").
  • GPS Global Positioning System
  • GIS geographic information systems
  • Data to be associated with geographic positions can be entered on site using traditional methods, and the position associated with such data is entered automatically, or at the simple press of a button, by GPS equipment that determines the geographic position and enters it in the database.
  • GIS technology can be used, which inherently associates location with other variables.
  • the data reader 14 facilitates the input of data from popular databases such as ORACLE, SYBASE or INGRES brand databases. Preferably, the input historic data are set out in a FOXPRO brand or flat data file for use by the system.
  • the data reader can be equipped with web-enabled software of the kind known in the field to access data from remote servers via a network such as the Internet or a private network.
  • the software may include a graphical user interface that allows the user to specify the fields for which the prediction models are required.
  • the diary of events module 16 works with the iterative generator to produce an n-dimensional numeric space ("NDNS") and a set of numeric sequence strands ("NSS") within that space.
  • the variables are "normalized,” meaning that they are graded on a finite scale such as 1-100 (including decimal fractions).
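One way to grade a variable on a 1-100 scale is linear min-max scaling, sketched below. The text only says variables are graded on a finite scale; min-max scaling and the midpoint rule for constant variables are assumptions.

```python
def normalize(values, low=1.0, high=100.0):
    """Linearly rescale values onto [low, high] (min-max scaling).

    Min-max scaling is an assumed choice; the text only requires a finite
    scale such as 1-100, with decimal fractions allowed.
    """
    vmin, vmax = min(values), max(values)
    if vmin == vmax:
        # A constant variable carries no information; map it to the midpoint.
        return [(low + high) / 2.0] * len(values)
    span = vmax - vmin
    return [low + (v - vmin) * (high - low) / span for v in values]
```

For example, normalize([0, 5, 10]) gives [1.0, 50.5, 100.0]. Note that the scale depends on the observed minimum and maximum, so new extreme values would shift earlier grades.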
  • the numeric values of the many variables at several, but fewer than all, of the instances at which the predicted event occurred in the past are then ascertained, and these values are used to calculate the numeric sequences NS using formulae. For example, if the predicted event occurred five times in the historic data, then numeric sequences NS could be calculated for two, three or four of those instances.
  • the goal is to calculate the numeric sequence values such that they fall within a small range or band at the event times ET1, ET2, ET3, etc. This is done by initially calculating them with the earliest time T equal to zero. If the numeric sequences calculated using the data corresponding to these times fall within the chosen range or band, then one can proceed to the next step. If they do not, the initial time is offset by one time increment and the calculation is repeated. If the time increment for the collected data is one week and the calculations are performed in seconds, then the initial time is offset by the number of seconds in a week, i.e. 7 × 24 × 60 × 60 = 604,800.
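The offset search described above can be sketched as a simple loop. The patent's numeric-sequence formulae are not given at this point, so ns_value is a hypothetical stand-in, and testing "within a band" as max minus min being at most band_width is an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple


def find_offset(
    ns_value: Callable[[float], float],
    event_times: Sequence[float],
    band_width: float,
    increment: float,
    max_steps: int = 1000,
) -> Optional[Tuple[float, list]]:
    """Search for an initial-time offset that puts all event NS values in a band.

    ns_value is a hypothetical stand-in for the numeric-sequence formulae:
    it maps an offset-adjusted time to a numeric-sequence value.
    Returns (offset, values), or None if no offset within max_steps works.
    """
    for step in range(max_steps + 1):
        offset = step * increment  # T = 0 first, then one increment at a time
        values = [ns_value(t + offset) for t in event_times]
        if max(values) - min(values) <= band_width:
            return offset, values
    return None


def toy_ns(t):
    """Toy NS function for illustration only."""
    return 0.0 if t >= 3 else float(t)


result = find_offset(toy_ns, [0, 1, 5], band_width=0.5, increment=1)
```

With toy_ns, offsets 0, 1 and 2 leave the values spread too widely, and offset 3 brings all three into the band, so result is (3, [0.0, 0.0, 0.0]). For weekly data computed in seconds, increment would be 7 * 24 * 60 * 60.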
  • This method has been successful in predicting the breakdown of a centrifuge used in chemical/pharmaceutical industries, batch failures in the bulk drug industry, the quality of manufactured bottles in the glass industry, batch failures in the paper industry due to various quality problems and the growth of virus on different cultures under laboratory conditions.
  • the next step is to calculate the numeric sequence values for the complete period for which data are available.
  • the numeric sequence values are calculated for each time increment, including each time at which the predicted event occurred. If the values calculated for each time at which the predicted event occurred fall within the selected band or range, the process proceeds to the next step. If not, the process returns to the step of offsetting the time value by one increment and recomputes the numeric sequence values. Each iteration refines the accuracy and efficiency of the system. These additional iterations take place with respect to past data collected at the outset, and also with respect to data that is subsequently collected as events occur.
  • the process then moves to the next step, which is to predict future occurrences of the predicted event based on the occurrence and values of the variables in the numeric sequence.
  • Discrepancies may occur in the operation of the process, and are addressed as follows. Occasionally, there is a large discrepancy between the predicted event and the occurrence of the actual event in the historic data. For example, there may be a substantial difference between the number of actual cases of a disease per population group and the number of predicted cases per population group. In that event, the process looks for a substantial aberration in the value of one of the factors in the input data; it may be, for example, that the amount of rainfall in the historic data corresponding to the discrepant prediction was extremely high or extremely low. The system can overcome this problem by building n-dimensional numeric sequences and synchronizing them for each of the parameters that has a bearing on the occurrence of the event.
  • Example 1
  • This actual example uses the present invention to successfully predict the outbreak of Japanese Encephalitis ("JE") in India.
  • the system can be used to predict the outbreak of AIDS, tuberculosis or other identified disease.
  • a principal vector of JE is known to be the mosquito Culex tritaeniorhynchus.
  • Other vectors include the Cx. vishnui group, Cx. pseudovishnui, Cx. bitaeniorhynchus, Cx. gelidus, Anopheles subpictus, An. hyrcanus, An. barbirostris and Mansonia annulifera.
  • the incubation period for JE is 9-12 days in mosquitoes and is 5-15 days in man.
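The two incubation ranges above combine into a window for human onset after a mosquito is infected. The sketch below does that arithmetic; it assumes the mosquito bites as soon as its own incubation ends, so the "latest" date holds only under that assumption. The function name onset_window is hypothetical.

```python
from datetime import date, timedelta

# Incubation ranges quoted in the text, in days.
MOSQUITO_INCUBATION = (9, 12)  # in the mosquito
HUMAN_INCUBATION = (5, 15)     # in man


def onset_window(mosquito_infected: date):
    """Earliest and latest human onset dates for one transmission chain.

    Assumes the infective bite happens as soon as the mosquito's incubation
    ends; a later bite would push onset later than the returned 'latest'.
    """
    earliest = MOSQUITO_INCUBATION[0] + HUMAN_INCUBATION[0]  # 14 days
    latest = MOSQUITO_INCUBATION[1] + HUMAN_INCUBATION[1]    # 27 days
    return (mosquito_infected + timedelta(days=earliest),
            mosquito_infected + timedelta(days=latest))
```

For a mosquito infected on 1 July 2000, onset_window returns 15 July to 28 July 2000.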
  • the system of the present invention was used to build an n-dimensional numeric space based on the actual data.
  • Time T was taken on the x-axis.
  • the values of each of the effective parameters were taken on the y-axis.
  • the number of cases was marked on the z-axis. For each parameter there was one such space.
  • a strand (not a straight line) connects all the events (in this case number of cases).
  • a strand extender projects the existing strand into the future to forecast the number of cases that may occur in future.
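The text does not say how the strand extender projects a strand. As a swapped-in technique, the sketch below extends a strand by Lagrange polynomial extrapolation through its last few points; using a cubic (order=3) loosely echoes the cubic-in-T form of the MNLN expression quoted later, but that link is an assumption.

```python
def extend_strand(times, values, future_times, order=3):
    """Project a strand forward by polynomial (Lagrange) extrapolation.

    Fits the unique polynomial of degree `order` through the last
    order + 1 points and evaluates it at each future time. This is an
    illustrative substitute, not the patent's strand extender.
    """
    xs = list(times)[-(order + 1):]
    ys = list(values)[-(order + 1):]
    if len(xs) < order + 1:
        raise ValueError("need at least order + 1 points")
    forecasts = []
    for x in future_times:
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0  # Lagrange basis polynomial L_i evaluated at x
            for k, xk in enumerate(xs):
                if k != i:
                    basis *= (x - xk) / (xi - xk)
            total += yi * basis
        forecasts.append(total)
    return forecasts
```

For points taken from y = T³ at T = 1, 2, 3, 4, extend_strand predicts 125 at T = 5. Polynomial extrapolation degrades quickly far beyond the data, so it suits short horizons only.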
  • the data reader module facilitates inputting data from any one of the popular databases, including ORACLE, SYBASE and INGRES, into FOXPRO or a flat data file.
  • Web-enabled software such as the Data Reader can also access data from remote servers.
  • the Graphical User Interface (GUI) of this software enables the user to specify the fields for which the prediction models are required.
  • the diary of events module establishes the relationship between the causative factors and the disease by reading data from the data reader.
  • the pattern recognition tool set of the software establishes relationships between various parameters and the events that occurred.
  • the iterative module works in tandem with the diary of events module. Based on the data (the longer the data set period, the more accurate the prediction), it generates both the n-dimensional numeric space (NDNS) and the corresponding numeric sequence strands (NSS). Being iterative, it generates and regenerates these NDNS/NSS combinations until it obtains a satisfactory result. It uses genetic algorithms to generate both the NDNS and the NSS.
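The text says only that genetic algorithms generate the NDNS and NSS; it gives no encoding, operators or fitness function. The sketch below is a generic real-valued genetic algorithm under assumed choices (truncation selection, one-point crossover, Gaussian mutation, elitism). The fitness function band_spread and the EVENT_VARIABLES data are hypothetical, chosen to echo the "values fall within a band" criterion.

```python
import random


def genetic_search(fitness, n_genes, pop_size=30, generations=60,
                   mutation_scale=0.3, seed=0):
    """Minimal real-valued genetic algorithm; lower fitness is better.

    Elitism copies the best candidate unchanged into each generation,
    so the best fitness recorded in history never increases.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-2.0, 2.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: max(2, pop_size // 2)]  # truncation selection
        children = [population[0]]  # elitism
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            # One-point crossover followed by Gaussian mutation.
            child = [g + rng.gauss(0.0, mutation_scale) for g in a[:cut] + b[cut:]]
            children.append(child)
        population = children
    population.sort(key=fitness)
    return population[0], history


# Hypothetical fitness: spread of a weighted NS value across three event
# times (rows of normalized variables), plus a penalty keeping the weights
# summing to 1 so the trivial all-zero solution is excluded.
EVENT_VARIABLES = [[20.0, 55.0], [35.0, 40.0], [50.0, 25.0]]


def band_spread(weights):
    values = [sum(w * x for w, x in zip(weights, row)) for row in EVENT_VARIABLES]
    return (max(values) - min(values)) + abs(sum(weights) - 1.0)


best, history = genetic_search(band_spread, n_genes=2)
```

For this toy data, weights of (0.5, 0.5) give every event the same value of 37.5, so the ideal fitness is 0; the search approaches it without any guarantee of reaching it exactly.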
  • based on the data (the longer the data set period, the more accurate the prediction), the iterative module builds the required logic into a software tool called Forecaster, which generates predictions of the occurrence of future events.
  • Predictions can be both long-term and short-term.
  • the self-learning algorithms in the iterative module continuously improve the precision and accuracy of the predictions it generates. In short, and this is important, the system is self-learning: the more it is used, the more accurate it becomes.
  • The numeric sequence NSI is used in this model. NSI is:
  • BB = (7*SV-20*CV+(7*SZ-4+34*CZ+6*C2Z)*SQ+(38*SZ+6*S2Z-7*CZ)*CQ-5*SZ
  • CC = (36*SV+13*CV+(-68*SZ-11*S2Z-2+13*CZ)*SQ+(15*SZ-8+60*CZ+10*C2Z
  • MNLN = (D(1,J)+D(2,J)*T+D(3,J)*T*T+D(4,J)*T*T*T)
  • ANN = ANN+AA-BB/ECC
  • ANN = ANN+SA-SB/ECC
  • TRLONG = UU+ASN
  • NS3 = Y*SIND(T)-2*C*SIND(A)+4*C*Y*SIND(A)*COSD(T)
  • NS3 = (E-0.5*Y*Y*SIND(T+T)-1.25*C*C*SIND(A+A))
  • NS3 = NS3/0.0174532925/15.
  • NS4 = ASIND(SIND(B)*COSD(E)+COSD( )*SIND(E)*SIND(A))
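The NS3 and MNLN expressions above can be transcribed into Python as a sketch. Several readings are assumptions: SIND and COSD are taken to be degree-argument sine and cosine (as in some Fortran dialects), and the first NS3 line is taken to be the quantity named E in the second. Read that way, NS3 has the same form as the common approximation to the astronomical equation of time (Y ≈ tan²(ε/2), C an orbital eccentricity, A a mean anomaly, T twice a mean longitude), but the excerpt does not say so. NS4 is omitted because one of its arguments is missing in the source.

```python
import math


def sind(x):
    """Sine of an angle in degrees (assumed meaning of SIND)."""
    return math.sin(math.radians(x))


def cosd(x):
    """Cosine of an angle in degrees (assumed meaning of COSD)."""
    return math.cos(math.radians(x))


def ns3(y, c, a, t):
    """Transcription of the three extracted NS3 lines.

    Assumption: the first NS3 line is the quantity named E in the second.
    The excerpt does not define Y, C, A or T.
    """
    e = y * sind(t) - 2 * c * sind(a) + 4 * c * y * sind(a) * cosd(t)
    value = e - 0.5 * y * y * sind(t + t) - 1.25 * c * c * sind(a + a)
    # 0.0174532925 is pi/180: if value is an angle in radians, this turns it
    # into degrees, and the further division by 15 turns degrees into hours.
    return value / 0.0174532925 / 15.0


def mnln(d, j, t):
    """Cubic D(1,J) + D(2,J)*T + D(3,J)*T**2 + D(4,J)*T**3 via Horner's rule.

    d is a list of four coefficient rows and j is a 0-based column index
    (the Fortran-style original is 1-based).
    """
    c0, c1, c2, c3 = (d[row][j] for row in range(4))
    return ((c3 * t + c2) * t + c1) * t + c0
```

For example, mnln([[1], [2], [3], [4]], 0, 2.0) evaluates 1 + 2·2 + 3·4 + 4·8 = 49.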
  • the system generates heuristics for accurately assessing the geographical location of an outbreak of any vector-borne disease. It also demarcates the endemic area (in sq. km) in which people are prone to infection during a specific outbreak. This is very important; not only does the system predict disease outbreaks, but it predicts with some precision the locations of the
  • the communication module of the system is also capable of informing all the concerned authorities and agencies about the impending outbreak and its magnitude.
  • the communication module requires a reliable public switched telephone network (PSTN) connection; if Internet access is available, it uses the Internet.
  • the predictions for the years 2000, 2001 and 2002 are given in Table 4.
  • the phase-wise forecasts for the years 2000, 2001 and 2002 are given in Table 5.
  • Table 4 Forecast of J.E. incidence in Kurnool district for the years 2000, 2001 and 2002
  • Table 5 Phasewise Forecast of J.E. in Kurnool District, Andhra Pradesh for the years 2000, 2001 and 2002
  • the system forecasts the following:
  • The vector control measures needed to reduce the number of JE cases significantly should be taken in Phase I. This widens the gap between man-vector contact and transmission.
  • Measures must be taken to avoid the presence of reservoirs such as pigs and donkeys in the environment, so that multiplication of the JE virus is reduced, which in turn brings down the rate of transmission of the JE virus to human beings.
  • Phase III, i.e. the intrinsic incubation period in human beings, is too late to control JE; proper vaccination will help reduce the number of deaths among positive cases in that period.
  • the predicted event, or an occurrence associated with the event, is a parameter. Numeric sequences are plotted against elapsed time in two-dimensional space, and these two-dimensional plots are overlaid on multi-dimensional x-y axes within an integrated paradigm.
  • the system is effective in forecasting the occurrence of crop and livestock blight and disease. Armed with relatively accurate forecasts, farmers can take preventive measures such as applying pesticides or altering planting techniques or timing, or changing crops. Moreover, such forecasts can be used to increase the production of pesticides, to store alternative food supplies or to hedge commodities.
  • Peptide drugs lack oral activity because they are digested, and they often lack selectivity because they react with many receptors. It is therefore necessary to transform active peptide compounds into active non-peptide drug compounds, which can be a difficult task.
  • the invention helps find active compounds by using relationships between chemical structures and their biological activities. By establishing patterns between chemical structures and their biological activities, the invention can speed up the iterative process of drug discovery, in which new compounds bring new information.
  • the invention can be used for developing new design techniques that must infer a cavity from available active leads.
  • a useful approach is to build a receptor-surface model (a model for the receptor site) and to construct compounds inside this model that fit sterically and complement the putative receptor interactions.
  • the forecaster model of the invention can predict which of the possible models will fit, thus hastening the development of new molecules.
  • Used in conjunction with available traditional software, the invention can be a powerful tool for new drug design. It can be used to fit molecules into the active site of a receptor by identifying and matching complementary polar and hydrophobic groups. As empirical functional software, the invention can be used to prioritize hits.
  • In customer relationship management (CRM), the system has wide applicability. By tracking purchasing patterns for individual customers and groups of customers, and generating suitable NSS indicators, the system can predict with surprising accuracy a given customer's purchases or interests over a future time period. This allows vendors to present to a customer the particular types of goods and services that the customer is interested in purchasing, at the time when that interest is ripe.
  • the present system can apply a set of numeric sequence strands to such data to generate relatively reliable predictions of what an individual customer is likely to purchase during a given future period and the probable volume of those purchases. It also indicates customers' price sensitivity, the general types of goods and services each customer is likely to be interested in, and the cost/benefit of focused marketing to individual customers.
  • the system can also use historic data to optimize the layout of an e-commerce site, through the on-screen positioning of captions and product and service names and through appropriate color selection, and to formulate mailing lists.
  • the present invention addresses this by developing an archive of data in the form of history cards for pieces of equipment, containing service details and performance data, and then processing this data through appropriate numeric sequence strands. This can be used to evaluate and predict the performance of specific equipment not only in isolation but also in relation to other, interrelated equipment. For example, as any car owner knows, the performance and reliability of parts and equipment are often related to the performance and reliability of associated parts and equipment.
  • the vibration produced by a failing motor can stress the motor mounts, and a poorly tightened screw can put undue strain on the other screws in an assembly.
  • a replacement part can have unforeseen impacts on other parts, and even the replacement procedure itself can affect other elements.
  • the present system can consider all these variables and parameters in predicting the need for service and maintenance.
  • the operation of a power grid can be optimized by forecasting consumer demand, by predicting equipment failure, and by forecasting transmission and distribution losses. All of these can be derived with considerable accuracy from past data and appropriately tailored numeric sequence strands.
  • the method can predict hourly demand on a per-unit basis well in advance. This allows utility companies to optimize power procurement from feeder units.
  • the lead time available through this method allows utilities to take necessary actions to eliminate load mismatches.
  • the forecasting of equipment failures allows utilities to shift from time-based maintenance, i.e. maintenance conforming to a time schedule regardless of actual need, to event-driven maintenance, i.e. maintenance performed when actually needed.
  • the pattern of the solution begins with 1 in the units place; each next digit is obtained by doubling the current digit and adding the carry digit.
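The problem this "solution" belongs to is not included in this excerpt, so the sketch below simply applies the stated rule: start with 1 in the units place, then produce each digit to the left by doubling the current digit, adding the carry, keeping the units digit and carrying the tens. The function name doubling_digits is hypothetical.

```python
def doubling_digits(count, start=1):
    """Generate digits right to left by the doubling-with-carry rule.

    Each new digit is (2 * current digit + carry) mod 10, and the tens
    part of that sum becomes the next carry.
    """
    digits = [start]
    current, carry = start, 0
    while len(digits) < count:
        total = 2 * current + carry
        current, carry = total % 10, total // 10
        digits.append(current)
    return digits


digits = doubling_digits(18)
number = "".join(str(d) for d in reversed(digits))
```

After 18 digits, number reads 052631578947368421, which is the repeating block of 1/19. The next digit generated is 1 again with no carry, so the rule cycles with a period of 18.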
  • Periodicity is the number of elements in the repeating unit of a sequence.
  • the sequence 0, 3, 8, 5, 0, 3, 8, 5, ..., for instance, has a periodicity of four.
  • a classical algorithm must observe at least as many elements as there are in the period, whereas the pattern library of this method does much better: it identifies all the possible repeating sequences, and a single pattern-search operation then identifies the value of the sequence to which the answer corresponds. This is the beauty of the system.
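For comparison with the pattern-library approach, here is a sketch of the kind of classical periodicity check the text contrasts it with: scan candidate periods and keep the smallest one consistent with every observed element. The function name smallest_period is hypothetical.

```python
def smallest_period(seq):
    """Smallest p such that seq[i] == seq[i + p] for every observed i.

    If no repetition has been observed yet, this returns len(seq): a
    classical check cannot confirm a period it has not seen repeat.
    """
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n
```

smallest_period([0, 3, 8, 5, 0, 3, 8, 5]) returns 4, matching the example above, while a sequence shorter than one full period, such as [1, 2, 3], simply returns its own length.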
  • the system can also perform optimization using the paradigm of the forecasting technologies.
  • the optimization can be used throughout all industries, including but not limited to the pharmaceutical industry (in designing, testing, synthesizing and manufacturing new therapeutic molecules and compounds), in increasing computer processors, in optimizing power grid operations and consumption, in consumer conservation of energy, in optimizing manufacturing processes as well as customer relationship management, and in inventory control.
  • the users can conduct operations more efficiently and effectively whether in marketing, manufacturing or sales of any products or services or in any other business that uses processes or that has customers.
  • This invention establishes, through prediction modeling, the relationships and interplay between datasets, and creates and draws from the internal patterns of its software library. It creates a pattern so that the user can identify the proposed outcome, which can be predetermined, so that a change in the data or in the inputs can change the final or actual outcome.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a prediction method and system that use pattern recognition and extension software. Models according to the present invention select from a library the patterns that match historical data and project those patterns into the future to make predictions that can be used by various forecasting technologies.
PCT/US2002/013715 2001-04-30 2002-04-30 Predictive method Ceased WO2002088903A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002309621A AU2002309621A1 (en) 2001-04-30 2002-04-30 Predictive method

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US09/846,605 2001-04-30
US09/846,734 US20030009290A1 (en) 2001-04-30 2001-04-30 Predictive method
US09/846,601 US20020194148A1 (en) 2001-04-30 2001-04-30 Predictive method
US09/846,733 2001-04-30
US09/846,734 2001-04-30
US09/846,606 2001-04-30
US09/846,733 US20030028351A1 (en) 2001-04-30 2001-04-30 Predictive method
US09/846,605 US20030036890A1 (en) 2001-04-30 2001-04-30 Predictive method
US09/846,601 2001-04-30
US09/846,606 US20030018514A1 (en) 2001-04-30 2001-04-30 Predictive method

Publications (2)

Publication Number Publication Date
WO2002088903A2 true WO2002088903A2 (fr) 2002-11-07
WO2002088903A3 WO2002088903A3 (fr) 2003-05-15

Family

ID=27542257

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/013715 Ceased WO2002088903A2 (fr) 2001-04-30 2002-04-30 Predictive method

Country Status (2)

Country Link
AU (1) AU2002309621A1 (fr)
WO (1) WO2002088903A2 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006045004A2 (fr) 2004-10-18 2006-04-27 Bioveris Corporation Systems and methods for obtaining, storing, processing and using immunological information about an individual or a population
EP2074565A4 (fr) * 2006-09-29 2010-12-15 Nortel Networks Ltd Method and system for predicting the adoption of services such as telecommunication services
TWI560634B (en) * 2011-05-13 2016-12-01 Univ Nat Taiwan Science Tech Generating method for transaction modes with indicators for option
CN112257962A (zh) * 2020-11-16 2021-01-22 南方电网科学研究院有限责任公司 Method and device for predicting line loss in a transformer district
US10964415B2 (en) 2006-04-27 2021-03-30 Wellstat Vaccines, Llc Automated systems and methods for obtaining, storing, processing and utilizing immunologic information of an individual or population for various uses
CN113515891A (zh) * 2021-06-04 2021-10-19 浙江永联民爆器材有限公司 Method for predicting and optimizing the quality of emulsion explosives
CN118245822A (zh) * 2024-05-21 2024-06-25 北京弘象科技有限公司 Method, apparatus, device and medium for optimizing similar-ensemble forecasts

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
OSOBA ET AL.: 'A BSP performance prediction model for parallel multigrid algorithms' IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS vol. 1, December 2000, pages 403 - 406, XP010535735 *
PINTO ET AL.: 'ULSI technology development by predictive simulation' TECHNICAL DIGEST., INTERNATIONAL ELECTRON DEVICES MEETING December 1993, pages 701 - 704, XP010118307 *
POH ET AL.: 'Heat transfer and flow issues in manifold microchannel heat sinks: a CFD approach' PROCEEDINGS OF THE 2ND ELECTRONICS PACKAGING TECHNOLOGY CONFERENCE December 1988, pages 246 - 250, XP010328991 *
SACHS ET AL.: 'Electrokinetics and electromechanics in controlled release from ionizable gels: theory & experiments' PROCEEDINGS OF THE 16TH ANNUAL INTERNATIONAL CONFERENCE OF IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY vol. 2, 1994, pages 754 - 755, XP010145854 *
YEH A.: 'Abstract: predicting the likely behaviors of complex system' PROCEEDINGS OF THE 4TH CONFERENCE ON AI FOR APPLICATIONS March 1988, pages 430 - 435, XP010011937 *

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006045004A2 (fr) 2004-10-18 2006-04-27 Bioveris Corporation Systems and methods for obtaining, storing, processing and using immunological information about an individual or a population
EP1817708A4 (fr) * 2004-10-18 2014-08-27 Wellstat Vaccines Llc Systems and methods for obtaining, storing, processing and using immunological information about an individual or a population
US10964415B2 (en) 2006-04-27 2021-03-30 Wellstat Vaccines, Llc Automated systems and methods for obtaining, storing, processing and utilizing immunologic information of an individual or population for various uses
EP2074565A4 (fr) * 2006-09-29 2010-12-15 Nortel Networks Ltd Method and system for predicting the adoption of services such as telecommunication services
TWI560634B (en) * 2011-05-13 2016-12-01 Univ Nat Taiwan Science Tech Generating method for transaction modes with indicators for option
CN112257962A (zh) * 2020-11-16 2021-01-22 南方电网科学研究院有限责任公司 Method and device for predicting line loss in a transformer district
CN113515891A (zh) * 2021-06-04 2021-10-19 浙江永联民爆器材有限公司 Method for predicting and optimizing the quality of emulsion explosives
CN113515891B (zh) * 2021-06-04 2024-02-20 浙江永联民爆器材有限公司 Method for predicting and optimizing the quality of emulsion explosives
CN118245822A (zh) * 2024-05-21 2024-06-25 北京弘象科技有限公司 Method, apparatus, device and medium for optimizing similar-ensemble forecasts

Also Published As

Publication number Publication date
WO2002088903A3 (fr) 2003-05-15
AU2002309621A1 (en) 2002-11-11

Similar Documents

Publication Publication Date Title
US20030036890A1 (en) Predictive method
Cox Fuzzy modeling and genetic algorithms for data mining and exploration
US7117208B2 (en) Enterprise web mining system and method
Nayak et al. Impact of data normalization on stock index forecasting
Farah et al. Bayesian emulation and calibration of a dynamic epidemic model for A/H1N1 influenza
Zhao et al. Data mining applications with R
US20020194148A1 (en) Predictive method
US7627432B2 (en) System and method for computing analytics on structured data
AU2001291248A1 (en) Enterprise web mining system and method
WO2002027529A2 (fr) Enterprise web mining system and method
CN113706251B (zh) Model-based product recommendation method and apparatus, computer device and storage medium
US20210125031A1 (en) Method and system for generating aspects associated with a future event for a subject
Farooq et al. Interpretable multi-horizon time series forecasting of cryptocurrencies by leverage temporal fusion transformer
Gupta et al. K-Means clustering based high order weighted probabilistic fuzzy time series forecasting method
Ewani et al. Smart city and future of urban planning based on predictive analysis by adoption of information technology
US12450535B2 (en) Multi-layer micro model analytics framework in information processing system
US20030018514A1 (en) Predictive method
Li et al. Hyperimts: Hypergraph neural network for irregular multivariate time series forecasting
Chang et al. An interval-valued time series forecasting scheme with probability distribution features for electric power generation prediction
WO2002088903A2 (fr) Predictive method
WO2021077227A1 (fr) Method and system for generating aspects associated with a future event for a subject
US20030028351A1 (en) Predictive method
US20030009290A1 (en) Predictive method
Loureiro et al. Predicting multiple domain queue waiting time via machine learning
Song et al. Uncovering characteristic response paths of a population

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP