EP4162409A1 - Procédé de génération d'un système d'aide à la décision et systèmes associés - Google Patents
Procédé de génération d'un système d'aide à la décision et systèmes associésInfo
- Publication number
- EP4162409A1 EP4162409A1 EP21731114.1A EP21731114A EP4162409A1 EP 4162409 A1 EP4162409 A1 EP 4162409A1 EP 21731114 A EP21731114 A EP 21731114A EP 4162409 A1 EP4162409 A1 EP 4162409A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- function
- neural
- neural network
- subnetwork
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
Definitions
- the present invention relates to a method for generating a decision support system.
- the present invention also relates to decision support systems associated with the generation method.
- Decision support and more specifically multi-criteria decision support, is a vast field whose applications belong to many and varied fields, such as air traffic control, border surveillance or medicine.
- Such problems include in particular the problems of classification (classifying alternatives in classes each corresponding to a level of satisfaction), sorting (sorting the alternatives in the order of the decision-maker's preferences), choice (selection of the best alternative among all of them). alternatives) and evaluation (providing an overall score for an alternative)
- the models developed are capable of reproducing as closely as possible the decision-making strategies of the decision-maker, and of providing him with indicators on the proposed choice in order to justify their exits.
- such models are called preference models.
- preference model parameters require a sustained interaction between the decision maker (end user who will be helped by the model) and the designers of the model.
- the designers ask the expert to provide so-called "preferential" information which will depend on the preference model and the possible sets of parameters.
- the questions are usually artificially determined by the designers in order to extract as much information as possible.
- Such data is, for example, derived from a collection of feedback from a system operator on recommendations made to him.
- This data can be marred by uncertainty (noise on the data for example), and contain errors (in the label of the data for example).
- the present description proposes a method for generating a multicriteria decision support system, the generation method comprising the supply of an initial problem and of training data solving the initial problem for particular cases.
- the initial problem being a problem of evaluating the quality of the existing system or to be created
- the initial problem is a problem chosen among the choice of the best alternative among a set of alternatives, the distribution of alternatives among classes of preferences, ranking alternatives in order of preference, and providing an alternative evaluation score.
- the generation method further comprises the transcription of the initial problem in the form of a neural network and of a set of constraints to be respected by the neural network, in order to obtain a transcribed neural network, the learning of the network. of neurons transcribed using the training data, to obtain a learned neural network solving the initial problem, the determination of the function performed by the learned neural network, and the physical implementation of the determined function to obtain the decision support system.
- the generation method comprises one or more of the following characteristics when this is technically possible:
- the transcribed neural network comprises a set of neural subnetworks, the retranscription step comprising the formulation of the set of constraints to be respected by the neural network in the form of sub-constraints to be respected by each subnetwork neurons.
- each neural sub-network comprises hidden layers, the number of hidden layers being less than or equal to 5, preferably less than or equal to 3.
- the sub-constraints to be respected by a neural sub-network are chosen from the list made up of the monotony of the variation in the output of the neural sub-network as a function of the inputs of the neural sub-network, of the output of the neural subnet is between a minimum and a maximum value, the output of the neural subnet being equal to the minimum value when all the inputs of the neural subnet are equal to the minimum value, and the output of the subnet - neural network being equal to the maximum value when all the inputs of the neural subnetwork are equal to the maximum value and each subnetwork is suitable for implementing weights, a constraint being that the weights are positive and that the sum of the weight is equal to 1.
- the transcribed neural network comprises a set of neural subnetworks arranged according to a tree structure, each neural subnetwork being a first neural subnetwork or a second neural subnetwork, each first neural subnetwork neurons performing a respective aggregation function, the aggregation function preferably being an aggregation function of variables chosen from the list consisting of a weighted sum of the variables, a Choquet integral, an integral of Choquet 2-additive, of a weighted sum of combinations of min and max functions between at most k variables, for k at least equal to 2, of a multi-linear model, of a generalized additive independence function, and the ordered weighted average, and each second subnetwork of neurons performing a respective marginal utility function, the marginal utility function preferably being a monotonic function or a function having three parts, one pr first monotonic part, a second part which is constant and a third monotonic part, the monotony of the first part being different from the monotony of the third part.
- - learning includes a first learning with the set of transcription constraints making it possible to learn an intermediate neural network, a second learning of the set of constraints by fixing the neural network to the intermediate neural network, to obtain a learned set of constraints, and an adjustment of the learned neural network as a function of the difference between the set of constraints of the transcription and the learned set of constraints, to obtain an adjusted neural network, the learned neural network being the adjusted neural network.
- - learning includes the use of at least one technique chosen from the list consisting of batch gradient descent, stochastic gradient descent and mini-batch gradient descent.
- the present description also relates to a decision support system, in particular a multicriteria decision support system, generated by implementing a generation method as described above.
- the present description also relates to a decision support system, in particular a multicriteria decision support system, the support system comprising a physical implementation of a neural network comprising a set of arranged neural subnetworks.
- each neural subnetwork being a first neural subnetwork or a second neural subnetwork, each first neural subnetwork performing a respective aggregation function, the aggregation function being, preferably, an aggregation function of variables chosen from the list consisting of a weighted sum of the variables, a Choquet integral, a 2-additive Choquet integral, a weighted sum of combinations of functions min and max between at most k variables, for k at least equal to 2, of a multi-linear model, of a generalized additive independence function, and of the ordered weighted average, and each second subnetwork of real neurons has reading a respective marginal utility function, the utility function preferably being a monotonic function or a function having three parts, a first monotonic part, a second part which is
- FIG. 1 a schematic representation of a computer and a computer program product suitable for implementing a process for generating a decision support system
- FIG. 3 a schematic representation of an example of a neural network used in the generation method of Figure 2
- FIG. 5 a schematic representation of another example of a neural subnetwork that can be used in the generation method of Figure 2.
- a computer 10 and a computer program product 12 are shown in Figure 1.
- the interaction between the computer 10 and the computer program product 12 allows the implementation of a method for generating a decision support system.
- the generation process is thus a computer implemented process.
- Calculator 10 is a desktop computer.
- the computer 10 is a computer mounted on a rack, a laptop computer, a tablet, a personal digital assistant (PDA) or a smartphone.
- PDA personal digital assistant
- the computer 10 can be seen as a system and can be designated as such hereinafter.
- the computer is adapted to operate in real time and / or is in an on-board system, in particular in a vehicle such as an airplane.
- the computer 10 comprises a calculation unit 14, a user interface 16 and a communication device 18.
- the computing unit 14 is an electronic circuit designed to manipulate and / or transform data represented by electronic or physical quantities in registers of the computer 10 and / or memories into other similar data corresponding to physical data in them. memories of registers or other types of display devices, transmission devices or storage devices.
- the computing unit 14 includes a single-core or multi-core processor (such as a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, and a digital signal processor ( DSP)), a programmable logic circuit (such as an application-specific integrated circuit (ASIC), an in-situ programmable gate array (FPGA), a programmable logic device (PLD), and programmable logic arrays (PLA)), a state machine, a logic gate, and discrete hardware components.
- a single-core or multi-core processor such as a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, and a digital signal processor ( DSP)
- a programmable logic circuit such as an application-specific integrated circuit (ASIC), an in-situ programmable gate array (FPGA), a programmable logic device (PLD), and programmable logic arrays (PLA)
- ASIC application-specific integrated circuit
- FPGA in-situ programmable
- the calculation unit 14 comprises a data processing unit 20 suitable for processing data, in particular by performing calculations, memories 22 suitable for storing data and a reader 24 suitable for reading a computer readable medium.
- User interface 16 includes an input device 26 and an output device 28.
- the input device 26 is a device allowing the user of the system 10 to enter information or commands into the system 10.
- the input device 26 is a keyboard.
- input device 26 is a pointing device (such as a mouse, touchpad, and graphics tablet), voice recognition device, eye tracker, or haptic device (motion analysis).
- the output device 28 is a graphical user interface, i.e. a display unit designed to provide information to the user of the computer 10.
- the output device 28 is a display screen allowing a visual presentation of the output.
- the output device is a printer, augmented and / or virtual display unit, speaker, or other sound generating device for presenting the output in sound form, a unit producing sound. vibrations and / or odors or a unit adapted to produce an electrical signal.
- the input device 26 and the output device 28 are the same component forming human-machine interfaces, such as an interactive screen.
- the communication device 18 allows one-way or two-way communication between the components of the computer 10.
- the communication device 18 is a communication system by bus or an input / output interface.
- the presence of the communication device 18 allows that, in certain embodiments, the components of the computing unit 14 are distant from each other.
- the computer program product 12 includes a computer readable medium 32.
- the computer readable medium 32 is a tangible device readable by the reader 24 of the computing unit 14.
- the computer readable medium 32 is not a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, such as light pulses or electronic signals.
- Such a computer readable storage medium 32 is, for example, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device or any combination of. these.
- the computer readable storage medium 32 is a mechanically encoded device, such as punch cards or relief structures in a groove, floppy disk, hard disk, ROM. (ROM), random access memory (RAM), programmable read-only erasable memory (EROM), electrically erasable and readable memory (EEPROM), magneto-optical disc, static random access memory (SRAM), compact disc ( CD-ROM), Digital Versatile Disk (DVD), USB flash drive, floppy disk, flash memory, solid-state disk (SSD), or PC card such as a PCMCIA memory card.
- a computer program is stored on the computer readable storage medium 32.
- the computer program includes one or more stored program instruction sequences.
- Such program instructions when they are executed by the data processing unit 20, result in the execution of steps of the generation method.
- the form of program instructions is a form of source code, a computer-executable form, or any form intermediate between a source code and a computer-executable form, such as the form resulting from the conversion of the source code through an interpreter. , an assembler, a compiler, a linker or a locator.
- the program instructions are microcode, firmware instructions, state definition data, integrated circuit configuration data (eg, VHDL), or object code.
- Program instructions are written in any combination of one or more languages, for example, an object-oriented programming language (FORTRAN, C ++, JAVA, HTML), a procedural programming language (eg C language).
- object-oriented programming language e.g., C ++
- JAVA JAVA
- HTML JAVA
- C language e.g., C language
- the program instructions are downloaded from an external source via a network, as is notably the case for applications.
- the computer program product comprises a computer readable data medium on which the program instructions are stored or a data medium signal on which the program instructions are encoded.
- the computer program product 12 comprises instructions which can be loaded into the data processing unit 20 and adapted to cause the execution of the generation method when they are executed by the data processing unit. data 20.
- the execution is entirely or partially carried out either on the computer 10, that is to say a single computer, or in a system distributed between several computers (in particular via the use of the computer). cloud computing).
- FIG. 2 is a flowchart illustrating an example of the implementation of the method for generating a decision support system, in particular a multicriteria decision support system. .
- the generation process is a process of creating or manufacturing a decision support system.
- the generation method makes it possible to obtain a decision support system making it possible to provide a solution to the problem, the system being able to be used by a decision maker.
- the job of the designer of the decision support system is to transform the problem data into a physical system which is a decision support system.
- the decision support system is often a calculator 10 or part of it.
- Such a decision support system can be used in many industrial applications.
- the decision support system is a system for evaluating the performance of an existing physical system.
- the decision support system is a system for evaluating the functioning of an air traffic tracking system.
- the criteria relate to measures of tracking quality, such as aircraft location error.
- the decision support system is then suitable for providing an overall assessment, or a class (level of service quality).
- the decision support system is a system to help evaluate the functioning of a transport infrastructure, such as a train line or a metro line.
- a transport infrastructure such as a train line or a metro line.
- the decision support system makes it possible to offer the best solution among all the possible solutions, such as changing the train schedule or creating loops on the line.
- the decision support system is a design system for a physical system to be created.
- the decision support system is a design system of an assembly of several parts, like a radar.
- the decision support system then seeks to identify from the preferences of the radar designer the best compromise between several criteria such as measurements of the radar's overall quality, its performance, its weight or its cost.
- the decision support system aims to help the decision maker find the best solution to the initial problem.
- the decision support system is thus a physical implementation of a decision model adapted to the situation targeted by the initial problem, the decision model taking inputs to obtain an output.
- the initial problem is defined in two phases.
- the first phase consists of a discussion with the decision maker. It is a question of characterizing completely and from a point of view "high level" the problem to be solved. During such a phase, three points are determined, namely the form of the criteria, the nature of the problem and the available data.
- the first point determined is the form of the criteria on which each of the alternatives will be evaluated.
- the form of a criterion is the variation of the monotonicity of the model with respect to the criterion.
- monotony is the direction in which the exit evolves from one of the entry criteria, all other things being equal.
- M (a) M (a1, ..., an), with a1 to an the values of the alternative a on the n criteria.
- M is increasing (respectively decreasing, or increasing then decreasing, or decreasing then increasing) with respect to the criterion 1
- F which, to a real x, associates M (x, a2, ..., an) is increasing (respectively decreasing, or increasing then decreasing, or decreasing then increasing).
- the overall satisfaction is increasing with respect to the performance criterion such as the range of the radar (the farther the radar sees, the better, all other things being equal), and overall satisfaction is decreasing in relation to electricity consumption (the less electricity the radar consumes, the better, all other things being equal).
- the first point consists in asking if the output of the model is increasing, decreasing, increasing then decreasing, or decreasing then increasing compared to the value on this criterion.
- the way of representing the criteria is modified in order to have entries compatible with the decision model.
- An example of modification is the elimination of superfluous or too noisy criteria to be usable or the application of a transformation on certain criteria in order to ensure the monotony of the output of the decision model.
- the second point determined is the nature of the problem to be solved.
- the initial problem is one of the following: choosing the best alternative from a set of alternatives, distributing the alternatives among preference classes, ranking the alternatives in order of preference, and providing an evaluation score of an alternative.
- the nature of the problem to be solved is therefore the choice, the distribution, the storage or the evaluation.
- the problem of distributing alternatives among preference classes is a tidying problem often referred to as the sorting problem with reference to English terminology.
- the problem of ranking the alternatives in order of preference is an automatic ranking problem often referred to as the ranking problem with reference to English terminology.
- the problem of providing a review score for an alternative or choosing the best alternative from a set of alternatives is a problem of giving each alternative a satisfaction score.
- Such a problem is often referred to as a scoring problem with reference to English terminology.
- the third point is to get the training data.
- the training data are data from sensors and are therefore measurements.
- Learning data is most often heterogeneous in that the learning data comes from several sources (several sensors here) and has different natures.
- training data can be represented as pairs.
- a pair is a pair (x, y) where x is an alternative and y is the score that the alternative is supposed to have (expected score).
- a pair is a pair (x, k) where x is an alternative and k the index of a class of preferences (for example "good”, “bad” and “average”).
- a pair is a pair (x1, x2) where the two elements are alternatives, with x1 a preferred alternative to the alternative x2 by the decision maker.
- obtaining the training data in some cases involves the use of preprocessing to ensure that even if the data is represented in a different way, it represents reality in the same way and uses the data. same criteria for evaluating alternatives.
- Heterogeneity can arise from the fact that this learning data may be collected at different times and come from different people - each person providing a type of data.
- the decision model itself is defined.
- the designer now focuses on defining the model itself.
- the expert will build with the model designer the hierarchy of criteria, define the artificial criteria resulting from the aggregations, choose if he wants the aggregation class at each of the nodes and the nature of the functions. utilities applied to each of the criteria, if necessary.
- the decision maker may have a priori certainties. For example, the decision maker may consider knowing in advance the utility function of part of the native criteria, the parameters of some of the aggregations (or constraints on some of these parameters) for example. It is then possible to implement such constraints effectively, and to fix these utility functions so that they are not learned, and instead are defined "by hand" according to the wishes of the decision maker.
- Hierarchy can indeed correspond to a logical organization according to a decision maker - the intermediate aggregation nodes between the criteria and the final aggregation node then correspond to concepts that are meaningful to a decision maker.
- the generation method comprises a supply step E50, a transcription step E52, a learning step E54, a determination step E56 and a physical implementation step E58.
- the initial problem is supplied as well as the training data.
- the system receives the information which has, most often, been elaborated by interactions between the decision maker and the designer although these are not obligatory.
- the system thus knows the initial problem and the hierarchy of criteria while having a set of training data.
- the initial problem is transcribed in the form of a neural network and a set of constraints to be observed by the neural network.
- the transcription step E52 thus aims to convert the initial problem and the hierarchy of criteria into a neural network and a set of constraints to be observed by the neural network.
- the neural network has a specific architecture of which FIG. 3 illustrates a particular example.
- the transcribed neural network comprises a set of neural subnetworks.
- the neural network comprises a set of neural subnetworks arranged according to a specific structure, each neural subnetwork being a first neural subnetwork or a second neural subnetwork.
- a first neural subnet is an aggregation subnet.
- a first neural network is a subnetwork implementing an aggregation function.
- An aggregation function A is a function defined by a utility vector and returning an actual aggregation value.
- N ⁇ 1, ..., n ⁇ be the set of criteria (attributes in the vocabulary of the field of decision support).
- X is the domain of the i-th criterion.
- a decision model is a function U: X -> [0,1], which is generally called a utility function inducing a total order on X.
- the aggregation function belongs to the Choquet family of integrals.
- the Choquet integral is a generalization of the weighted sum, which also takes into account interactions between criteria.
- the Choquet integral is parameterized by a fuzzy measure m which is used to assign to each of the coalitions of criteria a certain weight (where a weighted sum has only one weight per singleton).
- a fuzzy measure m over a set L / is a function of 2 N in the set M satisfying a normalization condition and a monotonicity condition.
- a Q B ⁇ N m (A) £ m (B) £ 1
- the aggregation function A is a Choquet integral which is a 2-additive integral.
- W j denote the weight of the i-th utility function u
- Max denotes the weight of the maximum interaction between the i-th utility function ut and the j-th utility function uj.
- the first neural subnetwork comprises an n-dimensional input layer then a hidden layer and an output layer.
- the neurons of the hidden layer each perform one function among: identity (if the neuron has only one input, the neuron returns its input unchanged), min-pooling (the neuron has two inputs and returns the value of the smallest of its inputs), or max-pooling (the neuron has two inputs and returns the value of the largest of its inputs).
- the aggregation function A is a weighted sum of the utility functions.
- the aggregation function A is an ordered weighted average.
- OWA Orderered weighted averaging
- the aggregation function A is a generalized additive independence function.
- Such a function is often called the GAI function, the acronym GAI referring to the English name of "Generalized Additive Independence".
- the aggregation function A is a multi-linear function.
- Such a function is written mathematically according to an expression similar to the preceding expression for the case of a 2-additive Choquet integral, the only modification being to replace the min and max functions by a product of the variables, and to add neurons for each of the missing subsets.
- the use of this model requires the use of multi-dimensional utility functions which can be obtained by a multi-linear interpolation, or by a logit function integrating a multi-linear function of the different input variables.
- each aggregation function may be specific to a first subnet.
- the aggregation function is preferably an aggregation function of variables chosen from the list consisting of a weighted sum of the variables, a Choquet integral, a 2-additive Choquet integral, a weighted sum of combinations of min and max functions between at most k variables, for k being an integer at least equal to 2, a multi-linear model, a generalized additive independence function, and the ordered weighted mean.
- a second sub-network implements a utility function such as those used in the framework of decomposable models in multicriteria decision support.
- the role of a utility function is to take as input the raw value on one of the criteria, and to output the satisfaction provided by this value on said criterion, independently of the values of the alternative on all the others. criteria. This means that a utility function takes as input a real and outputs another real.
- a second subnet has as inputs the raw values of the alternatives on each of the criteria, and gives as output the marginal satisfaction that such a value gives to a decision maker on this particular criterion.
- the marginal utility function is normalized between 0 and 1.
- the minimum value of the marginal utility function is equal to 0 and the maximum value of the marginal utility function is equal to 1. In some cases, the minimum value and / or the maximum value is reached at limits.
- a function of marginal utility u is a function of X, in the interval [0,1] is a function ensuring a correspondence between the i-th attribute domain X, on l 'interval [0,1]
- FIG. 5 An example of such a second neural subnetwork is illustrated in FIG. 5 with a hidden layer comprising 3 nodes.
- marginal utility functions come in two different forms.
- the marginal utility function is monotonic.
- the marginal utility function is a single-plateau function.
- This second form is often referred to as a "single-tray".
- a marginal utility function is in the second form when the marginal utility function exhibits a single monotonicity change. Otherwise formulated, the marginal utility function has only two portions: a first portion on which the function is monotonic in one direction and a second portion on which the function is monotonic in a different direction from the first portion.
- each marginal utility function being able to be specific to a second subnetwork of neurons. neurons.
- the marginal utility function is preferably a monotonic function or a function having three parts, a first monotonic part, a second part which is constant and a third part monotonic, the monotony of the first part being different from the monotony of the third part.
- the neural network is thus an assembly of subnetworks, more exactly an assembly of subnetworks, each subnetwork being chosen from among the first subnetwork and the second subnetwork.
- the neural network is an assembly according to a specific structure.
- the neural network comprises a set of neural subnetworks arranged in a tree structure, each neural subnetwork being a first neural subnetwork or a second neural subnetwork.
- Tree structure is a tree structure when there is a unique path from a particular vertex to all other vertices, and each non-leaf vertex has at least two child nodes.
- the tree structure is a structure whose leaves are the raw inputs of the neural network (most often the data available from the sensors when the decision support system is used in real conditions) and the root is the output of the neural network.
- the neural network comprises as input second subnetworks of neurons followed by a first subnetwork of neurons.
- the described neural network has three second subnetworks followed by a first neural subnetwork.
- the outputs of the second subnetworks are three marginal utility functions which are connected by the first neural subnetwork which implements a Choquet integral.
- the neural network comprises several first subnetworks.
- such a structure allows the neural network to represent the hierarchy of criteria.
- the structure of the neural network is designed in collaboration with the decision maker.
- the structure of the neural network is chosen by the designer.
- the designer can rely on obvious relationships between variables, tests on several models to determine the most suitable for the data, or analyzes on the variables aimed at revealing particular relationships that may be exploited (reduction of dimension, strong positive or negative correlations between two variables for example).
- the structure of the neural network is a parameter of the neural network which is learned during the learning step E54.
- the neural network in tree structure is an aggregation of input criteria to generate new criteria which will, again, be aggregated into a new criterion and so on, until reaching an overall score. for the considered alternative.
- Such a type of neural network thus implements a decision model by small successive aggregations, which makes it possible to represent certain complex decision strategies, while pruning superfluous terms and parameters which may appear in a global aggregation of all the criteria.
- the neural network comprises only a first sub-network.
- the neural network implements a classical shockistic regression model.
- the neural network comprises as input a layer of a second neural network performing utility functions. This results in a model class that allows the use of such functions.
- a Choquet integral may not be a model compatible with some raw data (that is to say upstream of the application of a marginal utility).
- overall satisfaction must decrease in relation to electricity consumption, all other things being equal. This means that, without the application of a decreasing marginal utility on the electrical consumption, the Choquet integral will not be able to aggregate this criterion in a satisfactory manner.
- u which represents the satisfaction with the electrical consumption p
- the fact that the utilities all have values in [0,1] allows important properties of the model: the commensurability of the criteria (ability to compare the satisfactions on several distinct criteria) in particular. Indeed, it is hard to compare the satisfaction brought by criteria which live in different scales (for example the radar range between 10km and 1000km, and the electric consumption between 1 kW and 1000kW), and which can be in different units .
- the utilities therefore allow a renormalization of all these elements on the same satisfaction scale [0,1]
- the neural network has, as an input, other subnetworks for determining the values of the criteria from the input data.
- each subnet has a number of hidden layers less than or equal to 3.
- each sub-network is arranged in the form of layers with an input layer grouping the input neurons and an output layer grouping the output neuron (s). All middle layers are hidden layers that link only with neurons in the layers immediately downstream and neurons in the layer immediately upstream.
- each subnet has a number of hidden layers less than or equal to 5.
- each subnet is a multi-layered perceptron.
- the neural network is a set of simple neural networks, each representing a family of aggregation functions or utility functions, interconnectable in order to obtain multicriteria decision support models.
- the set of constraints to be observed by the neural network is obtained implicitly. For example, by choosing an aggregation function, we impose the constraints of this integration function on the model; for example, an aggregation function which is a Choquet integral will be increasing by its inputs, continuous, bounded, and idempotent.
- the set of constraints is provided explicitly.
- a decision maker can decide to further constrain aggregation functions or utility functions, for example by forcing certain parameters to be taller than others. This then requires additional constraints to be placed on the aggregation function.
- the set of constraints thus includes constraints of monotony, derivability, idempotence and continuity.
- the E52 transcription step comprises the formulation of the set of constraints to be respected by the neural network in the form of sub-constraints to be respected by each neural subnetwork.
- the sub-constraints to be respected by a neural subnetwork are chosen from the list consisting of:
- the output of the neural subnetwork is between a minimum value and a maximum value, the output of the neural subnetwork being equal to the minimum value when all the inputs of the neural subnetwork are equal to the minimum value, and the output of the neural subnetwork being equal to the maximum value when all the inputs of the neural subnetwork are equal to the maximum value, and
- each sub-network is suitable for implementing weights, a constraint being that the weights are positive and that the sum of the weights is equal to 1.
- each sub-network forms an autonomous computing unit independent of the other sub-networks.
- a transcribed neural network such a neural network being a complete network ready to be used for training in order to solve the initial problem.
- the learning step E54 a learning of the neural network transcribed using the learning data is implemented.
- the network being constructed and the data being compatible with the implementation of a learning, it is possible to implement a training of the network with the learning data.
- the parameters of the neural network are the weights connecting the neurons of the neural networks and the biases brought to the input value of certain neurons.
- the parameters of the neural network include the structure of the neural network. Therefore, even if information about the neural network is not available, it is still possible to learn the neural network, although it may be less precise.
- Learning involves using at least one technique from a list of batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.
- Each of these techniques is a technique or algorithm for learning the parameters of each neural subnet, especially since each subnet is a multi-layered perceptron.
- these techniques consist of reading the training data several times, propagating each of the points forward in the neural network, and propagating the gradients backward (from the output to the input) to readjust the parameters of the network. neurons.
- the batch gradient descent technique is also referred to as the batch gradient descent algorithm.
- Performing such a technique involves adjusting the parameters to minimize a cost function, which quantifies the error between the estimated response and the correct response on training data.
- the parameters are iteratively modified by subtracting the gradient from the cost function which is calculated by composition of linear operators or of point non-linearities differentiable by taking an average over the whole of the batch, that is to say the whole calculated parameters.
- the implementation of such a technique thus involves successive multiplications of Jacobian matrices.
- Stochastic gradient descent (sometimes also called the stochastic gradient algorithm) is an (iterative) gradient descent technique used for the minimization of an objective function that is written as a sum of differentiable functions.
- the stochastic gradient descent is less computationally intensive in that it is performed on a single example per iteration, the example being chosen randomly from the database.
- the gradient descent by mini-batch is a compromise between the two previous techniques by choosing a mini-batch randomly at each iteration and by calculating the gradients on this mini-batch.
- the use of the aforementioned techniques leads to learning a neuron network which does not exhibit a coherence property.
- it is proposed to impose constraints to be respected locally, that is to say at the level of each sub-network which are the sub-constraints obtained previously.
- the training consists in learning a fuzzy measure satisfying the training data and respecting the two preceding conditions, namely the normalization condition and the monotonicity condition.
- the two learning sub-conditions are obtained by carrying out frequent renormalizations without the use of regularization.
- the first subnetwork formally implements a 2-additive Choquet integral, with all the properties that are expected from it (monotony, idempotence, continuity for example).
- the meaning of monotony is fixed by the initial problem, the decision maker knowing such a meaning.
- a marginal utility function (monotonic, or a monotonic part of a simple-plateau type marginal utility function) is learned as a weighted sum of sigmoids.
- a sigmoid is written as the ratio of 1 to the sum of 1 with an exponential.
- weighted sum is normalized such that the sum of the weights equals 1, and the weights are all positive.
- Learning for such a marginal utility function then consists of learning each weight and the two constants for each of the sigmoids with two constraints on the weights. According to the first constraint, each weight is positive and according to the second constraint, the sum of the weights is equal to 1.
- the weights of the last layer towards the exit are made positive by the use of hidden variables whose weights are the image by a positive and increasing mathematical function, and in which the sum of said weights is made equal to 1 by a renormalization at each iteration.
- the marginal utility function is monotonic (ie according to the first form) and normalized.
- the marginal utility function is a plateau-type function.
- a transformation is applied to the learning data to only have to learn the plateaus.
- the training then consists in learning four values x1, x2, x3 and x4 such that x1 ⁇ x2 ⁇ x3 ⁇ x4 and such that the marginal utility function has the value 0 on the intervals] - 00 ; x1] and [x4; 00 [, has a value of 1 on [x2; x3] and let a linear interpolation over the intervals [x1; x2] and [x3; x4].
- the learning of a marginal utility function according to the second form is carried out by learning of weighted sigmoid sums in a sense of monotony (as previously for the case of a marginal utility function according to the first form) then in the direction of reverse monotony.
- the threshold x * value is also learned, such that the first weighted sum is applied to the left of this point, and the second weighted sum is applied to the right of this point.
- the value in x * is necessarily 0 (in the case of a valley) or 1 (in the case of a plateau). This is guaranteed by renormalizations.
- Such learning also makes it possible to obtain a marginal utility function which is derivable.
- Such a property makes it possible in particular to facilitate learning since the problems of disappearance or explosion of the gradient during learning are avoided.
- learning involves, at each iteration, a propagation phase and a backpropagation phase.
- the propagation phase is a forward propagation phase consisting of the injection into the sheets of values for each criterion.
- the values will then go from subnet to subnet, over successive aggregations, until they reach the root, or exit node, which will give us the result.
- the back-propagation phase consists of comparing the result obtained with the expected result.
- the backpropagation phase uses a so-called “loss” function which characterizes the difference between prediction and truth.
- the gradient of this function L is then calculated at the output, then propagated through the entire network, up to the leaves.
- each of the sub-networks calculates the gradients of the loss with respect to its own parameters and updates them before transmitting the gradients to the upstream sub-networks.
- error function to optimize depends on the type of data (root mean square error for regression, for example, or logistic error for binary classification). Any type of differentiable error function can thus be minimized, at least locally.
- a Siamese architecture is preferred (the network N is duplicated, in order to obtain two "clone” networks N1 and N2 identical to the network N).
- the error is then an increasing function of (N2 (x2) -N1 (x1)), (for example, an arctangent function, that is here atan (N2 (x2) -N1 (x1)).
- the gradients are calculated on the first network clone N1, and the second network clone N2, and are then summed to obtain the total gradient, which is applied to the first network clone N1, then the first network clone N1 is copied again to obtain the new network clone N2.
- the learning step E54 provides a learned neural network solving the initial problem.
- the learned neural network performs a function. During the determination step E56, the function performed by the learned neural network is determined.
- an explicit form is determined in the form of a compact mathematical formula, which keeps the same parameters (and therefore corresponds to exactly the same function that the neural network represents).
- the determined function is the function corresponding to the compact mathematical formula.
- the network learns the weights w ,, w iLmin and w max . Once the network is learned, the weights are extracted (ignoring all the rest of the parameters), and the weights are used to parameterize an explicit 2-additive Choquet integral function (see previous equation). The determined function is thus completely characterized.
- the network is therefore only a medium for learning such a function, while keeping its explicit parameters.
- the parameters can, after learning, be saved, stored, or used as they are to configure a function of the same type.
- the determined function is physically implemented to obtain the decision support system.
- the neural network is implemented on an FPGA.
- the generation method thus makes it possible to obtain an evaluation system implementing the function performed by the learned neural network with a simplified implementation (consuming few resources and memory) to make it compatible with an on-board implementation.
- the evaluation system will make it possible to respond to the initial problem and to assist the decision-maker, in particular when the alternatives are modified too frequently for all of them to be studied by a human (aspect implemented in real time) or by the emission of an alert when an acceptable alternative most of the time is no longer acceptable (case of a monitoring application).
- the generation method thus makes it possible to easily and quickly learn a complex preference model respecting strong formal constraints and corresponding to known classes of preference models, or resulting from successive aggregations of models belonging to said classes.
- a preference model is thus a model suitable for assisting a decision-maker in making a decision on a given problem, in particular because it remains transparent and easy to interpret.
- the generation method is a method of neural learning of a preference model, which means that the generation method advantageously uses a representation of a problem in the form of a neural network.
- the method makes it possible to modify the neural network to easily switch from one type of problem to another. For example, it is easy to go from a regression problem to a classification problem, or even preference learning from pairs of labeled alternatives, depending on the input data, their nature, and the expected nature of the output.
- learning a neural network is a well parallelizable task. It is thus possible to implement the learning step E54 with a hardware architecture suitable for operations carried out in parallel.
- the learning step E54 is advantageously executed on processors such as CPUs or GPUs.
- a CPU is a processor, the acronym CPU coming from the English term “Central Processing Unit” literally meaning central processing unit while a GPU is a graphics processor, the acronym GPU coming from the English term “Graphics Processing Unit” literally meaning graphics unit treatment.
- the learning step E54 is carried out on a calculation farm.
- the generation process makes it possible to take advantage of the best of two worlds, that of multi-criteria decision support with its rigor on the one hand, and that of machine learning with its statistical approach. 'somewhere else.
- the generation process is thus part of the field of hybrid artificial intelligence, a field that combines statistical methods of machine learning and strong expert constraints integrated into these learning models.
- the generation process exploits the advantages provided by the use of neural networks, namely high modularity and the absence of the need for manual computation of complex gradients, while guaranteeing strong constraints on the learned model, at the both thanks to the particular architecture of the network, and thanks to the standardizations and procedures mentioned above.
- the method therefore exploits the ability of multilayer perceptrons to regress model parameters from data, and the formal guarantees offered by multicriteria decision support aggregation models to learn subtle, but highly constrained models. .
- the method thus makes it possible to generate models adapted to multi-criteria decision support, which offers the guarantees on the model that decision-makers may require in operational managers.
- the method also provides the possibility of working on noisy or even erroneous data.
- the method also makes use of the fact that the neural network is divided into two types of subnets.
- the methods of the prior art involve calculating a local gradient for each of the possible configurations, which in practice leads to limiting the aggregation so that the calculation can be carried out in practice.
- the gradients are calculated only locally, i.e. aggregation by aggregation.
- the training data are propagated from subnetwork to subnetwork during the forward propagation and the gradients are then propagated from subnetwork to subnetwork during the backpropagation.
- a simple gradient is thus calculated independently of the complexity of the network as a whole and this calculated gradient is then propagated to the upstream sub-networks.
- the subnets are defined in such a way as to ensure that at any stage of the E54 learning stage the subnetwork formally complies with all the constraints that must be observed for the function that the subnetwork performs. This avoids the use of regularizations or verification calculations. Constraints are met locally for each subnet in an easy manner, and the fact that the constraints are met by each learned subnet ensures that the entire learned network meets the constraints globally.
- the method also has the advantage of great modularity since each subnet can be changed without changing the entire network. For example, it is possible to replace a first subnet with another first subnet corresponding to a different aggregation function. You can also replace one marginal utility module with another. It is also possible to rearrange the subnets. However, in this case, it is advisable to re-train the network (or at least, the sub-part of the network which has been modified).
- the learning step E54 includes validation of the model.
- the designer then presents the model to be validated to the decision-maker, i.e. the function that the neural network learns
- Such a presentation is, for example, implemented by using indicators used in the field of decision support.
- indicators used in the field of decision support.
- Examples of such indicators are Shapley values or interaction indices.
- the designer presents a graph plotting the Shapley values of each criterion.
- the Shapley value of each of the criteria represents the relative importance of the criterion compared to the other criteria. Knowing a fuzzy measure m over a set of criteria, the Shapley value of criterion i is calculated as:
- the interaction index characterizes the strength of an interaction between two criteria (by its absolute value). The sign of the interaction index indicates whether it is redundancy (negative) or synergy (positive).
- the interaction index between criteria i and j is written as:
- indicators from machine learning such as sensitivity analyzes, Sobol indices or cross-validation errors are used.
- Sobol indices are alternatives to Shapley values for calculating the part of variance expressed by each of the criteria (another way of defining the importance of each of the criteria).
- Cross-validation is a process aimed at quantifying the performance of the system: a network is trained on 80% of the data, and tested on the remaining 20% (so as to evaluate it on data that has not been seen beforehand). 'coaching). The errors on these data (in addition to their average value, which is already an indicator of the performance of the model), provide information on the model. In particular, if all the examples of a certain zone of space X are badly classified, this can denote a weakness of the model in this precise zone, which can illustrate a bad marginal utility, or a bad choice of aggregation function. .
- the decision maker determines whether the model is valid with regard to his experience of the real situation.
- the designer determines how to make the appropriate corrections to obtain a valid model.
- Such a correction is, according to a first example, a manual correction, that is to say a forced modification of certain parameter values of the learned neural network.
- the presented process (evaluation and correction) can be iterated as many times as necessary until the decision maker is satisfied with the model obtained.
- validation includes the use of quantitative indicators such as cross-validation error in place of decision maker satisfaction or in addition to decision maker satisfaction through subjective and quantitative assessment weighting.
- the training includes performing a supervised training technique.
- training of the hierarchy of criteria is carried out.
- Such an embodiment is particularly relevant when the decision maker is not available.
- the learning comprises a first learning with the set of constraints of the transcription allowing to learn an intermediate neural network, a second learning of the set of constraints by fixing the neural network to the intermediate neural network, to obtain a learned set of constraints, and an adjustment of the learned neural network as a function of the difference between the set of constraints of the transcription and the learned set of constraints, to obtaining an adjusted neural network, the learned neural network being the adjusted neural network.
- a first technique is to learn a single aggregation layer. Such a layer, however, is over-parametered, and presents greater risks of over-training than a model designed with an expert, but is most likely to adequately represent the data.
- a second technique involves studying the data using various unsupervised learning algorithms. For example, interactions can be seen on correlation / covariance matrices, utility senses on partial dependence graphs, artificial criteria can be the main components of a Karhunen-Loève transformation. More advanced techniques, such as variational autoencoders, can also be applied.
- a third technique consists of learning the hierarchy via so-called structure learning techniques (more often referred to by the corresponding English name of "structure learning”).
- structure learning techniques more often referred to by the corresponding English name of "structure learning”.
- One of the preferred techniques, specific to Choquet integrals consists of starting from a "flat" model, with a single aggregation and learning until convergence. Once this has been done, observing the values of interactions between criteria makes it possible to group together the criteria which interact in the same way with all the other criteria, thus generating a new tree.
- Several candidate trees are thus created, and trained. It can then be kept only a certain number of better candidates (those who do not degrade performance), and continue learning until a chosen stopping criterion (for example, if no new candidate improves the performance).
- the generation method comprises the application of a statistical technique to robustify the model implemented by the neural network or of characteristic modification techniques (analysis by principal component, auto-encoder, etc. ).
- the neural network comprises stages of pre-processing on the inputs or post-processing on the outputs.
- the neural network comprises one or more deep learning subnetworks.
- networks learning representation or selecting characteristics would thus make it possible to extract, from data initially incompatible with the model (an image for example), new characteristics (or new criteria) which, they would serve as a support for the decisions taken by the decision models described above.
- the first sub-network (s) in this case serve as an “output block” and therefore make decisions on the information pre-processed by the upstream deep networks.
- a special case is the computation of marginal utility functions which can be applied as a "rescaling" of native criteria, within the framework of utility models, in order to provide modified variables more suited to the following aggregations. We can thus learn in parallel the marginal utilities and the parameters of the aggregation functions by stochastic descent of the gradient.
- the generation method may include a combination of the above features when they are technically compatible.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FR2005894A FR3111216A1 (fr) | 2020-06-05 | 2020-06-05 | Procédé de génération d'un système d'aide à la décision et systèmes associés |
| PCT/EP2021/064992 WO2021245227A1 (fr) | 2020-06-05 | 2021-06-04 | Procédé de génération d'un système d'aide à la décision et systèmes associés |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4162409A1 true EP4162409A1 (fr) | 2023-04-12 |
Family
ID=76355493
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP21731114.1A Pending EP4162409A1 (fr) | 2020-06-05 | 2021-06-04 | Procédé de génération d'un système d'aide à la décision et systèmes associés |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20230206036A1 (fr) |
| EP (1) | EP4162409A1 (fr) |
| FR (1) | FR3111216A1 (fr) |
| WO (1) | WO2021245227A1 (fr) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12347421B2 (en) | 2020-06-25 | 2025-07-01 | PolyN Technology Limited | Sound signal processing using a neuromorphic analog signal processor |
| US20210406662A1 (en) * | 2020-06-25 | 2021-12-30 | PolyN Technology Limited | Analog hardware realization of trained neural networks for voice clarity |
| US20210406661A1 (en) | 2020-06-25 | 2021-12-30 | PolyN Technology Limited | Analog Hardware Realization of Neural Networks |
| US20230360119A1 (en) * | 2022-05-04 | 2023-11-09 | Jpmorgan Chase Bank, N.A. | System and method for generating grouped shapley values |
-
2020
- 2020-06-05 FR FR2005894A patent/FR3111216A1/fr active Pending
-
2021
- 2021-06-04 EP EP21731114.1A patent/EP4162409A1/fr active Pending
- 2021-06-04 US US18/008,298 patent/US20230206036A1/en active Pending
- 2021-06-04 WO PCT/EP2021/064992 patent/WO2021245227A1/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| FR3111216A1 (fr) | 2021-12-10 |
| WO2021245227A1 (fr) | 2021-12-09 |
| US20230206036A1 (en) | 2023-06-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2021245227A1 (fr) | Procédé de génération d'un système d'aide à la décision et systèmes associés | |
| Chen et al. | Variational knowledge graph reasoning | |
| Pandey et al. | Variational methods for conditional multimodal deep learning | |
| Savitha et al. | A meta-cognitive learning algorithm for an extreme learning machine classifier | |
| US11941493B2 (en) | Discovering and resolving training conflicts in machine learning systems | |
| US11823076B2 (en) | Tuning classification hyperparameters | |
| CA3061717A1 (fr) | Systeme et procede pour un reseau neuronal convolutif pour la classification multi-label avec annotations partielles | |
| CN111247532A (zh) | 利用多任务学习进行特征提取 | |
| CN107004162A (zh) | 量子深度学习 | |
| US20250166236A1 (en) | Segmentation free guidance in diffusion models | |
| JP7188856B2 (ja) | 動的な画像解像度評価 | |
| US20240273682A1 (en) | Conditional diffusion model for data-to-data translation | |
| Shivhare et al. | Software effort estimation using machine learning techniques | |
| FR3144362A1 (fr) | Système et procédé de recommandation utilisant un apprentissage de données multivariées par filtrage collaboratif | |
| Hu | Deep learning for ranking response surfaces with applications to optimal stopping problems | |
| CN115879634A (zh) | 一种气象环境因素与农业干旱因果推断方法 | |
| Chang | Concept-oriented deep learning: Generative concept representations | |
| KR102110316B1 (ko) | 뉴럴 네트워크를 이용한 변분 추론 방법 및 장치 | |
| Xiang et al. | Computation of cnn’s sensitivity to input perturbation | |
| US12026474B2 (en) | Techniques for generating natural language descriptions of neural networks | |
| US20250308083A1 (en) | Reference image structure match using diffusion models | |
| Thinh | Qos prediction for web services based on Restricted Boltzmann Machines | |
| US20250045892A1 (en) | Variational inferencing by a diffusion model | |
| Glüsenkamp | Unifying supervised learning and VAEs: coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions | |
| US20230401437A1 (en) | Reasoning with real-valued first order logic and probability intervals |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
| 17P | Request for examination filed |
Effective date: 20221202 |
|
| AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
| DAV | Request for validation of the european patent (deleted) | ||
| DAX | Request for extension of the european patent (deleted) | ||
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
| 17Q | First examination report despatched |
Effective date: 20260304 |