WO2019128529A1 - URL attack detection method, apparatus, and electronic device - Google Patents
URL attack detection method, apparatus, and electronic device
- Publication number
- WO2019128529A1 (application PCT/CN2018/116100)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- url
- access request
- training
- features
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1433—Vulnerability analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Definitions
- the present specification relates to the field of computer applications, and in particular, to a URL attack detection method, apparatus, and electronic device.
- This specification proposes a URL attack detection method, which includes:
- the URL attack detection model is a machine learning model trained based on an Isolation Forest machine learning algorithm
- the method further includes:
- the features of the plurality of dimensions are respectively extracted from the information carried by the plurality of URL access request samples; wherein the plurality of URL access request samples are not marked with the sample tags.
- the URL attack detection model is obtained by training the plurality of training samples based on an Isolation Forest machine learning algorithm.
- the URL attack detection model includes M random binary trees trained based on the Isolation Forest machine learning algorithm
- the training based on the Isolation Forest machine learning algorithm is performed on the plurality of training samples to obtain the URL attack detection model, including:
- randomly selecting a classification feature as a root node for each training sample subset, and randomly selecting a classification threshold for each training sample subset within the value interval formed by the maximum value and the minimum value of the classification feature;
- the training samples in each leaf node are used as a new subset of training samples, and the above classification process is iteratively executed until the training samples in the obtained leaf nodes can no longer be classified further.
- the extracted feature is input into a preset URL attack detection model for predictive calculation, and the risk score of the URL access request is obtained, including:
- the information includes: domain name information, and/or a URL parameter; the features of the several dimensions include: features extracted from the domain name information carried in the URL access request; and/or features extracted from the URL parameter carried in the URL access request.
- the feature includes a combination of the following features: a total number of characters, a total number of letters, a total number of digits, a total number of symbols, a number of different characters, a number of different letters, a number of different digits, and a number of different symbols.
- the present specification also proposes a URL attack detecting device, the device comprising:
- the first extraction module extracts features of several dimensions from the information carried in the URL access request
- the calculating module inputs the extracted feature into a preset URL attack detection model for predictive calculation, and obtains a risk score of the URL access request; wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm;
- a determining module determines whether the URL access request is a URL attack request based on the risk score.
- the device further includes:
- the second extraction module extracts features of the plurality of dimensions from the information carried by the plurality of URL access request samples, wherein the plurality of URL access request samples are not marked with the sample tags.
- the training module trains the plurality of training samples based on an Isolation Forest machine learning algorithm to obtain the URL attack detection model.
- the URL attack detection model includes M random binary trees trained based on the Isolation Forest machine learning algorithm
- the training module:
- randomly selects a classification feature as a root node for each training sample subset, and randomly selects a classification threshold for each training sample subset within the value interval formed by the maximum value and the minimum value of the classification feature;
- the training samples in each leaf node are used as a new subset of training samples, and the above classification process is iteratively executed until the training samples in the obtained leaf nodes can no longer be classified further.
- the computing module is:
- the information includes: domain name information, and/or a URL parameter; the features of the several dimensions include: features extracted from the domain name information carried in the URL access request; and/or features extracted from the URL parameter carried in the URL access request.
- the feature includes a combination of the following features: a total number of characters, a total number of letters, a total number of digits, a total number of symbols, a number of different characters, a number of different letters, a number of different digits, and a number of different symbols.
- the present specification also proposes an electronic device comprising:
- a memory for storing machine executable instructions
- the processor is caused to:
- the URL attack detection model is a machine learning model trained based on an Isolation Forest machine learning algorithm
- the technical solution provided by the embodiments of the present specification performs attack detection on a URL access request by inputting the features extracted from the request into a URL attack detection model trained based on the Isolation Forest machine learning algorithm, so that potential URL attacks can be discovered in advance and abnormal URL accesses can be guarded against in a timely manner.
- FIG. 1 is a flowchart of a URL attack detection method according to an embodiment of the present disclosure
- FIG. 2 is a flowchart of constructing a training sample set and training an Isolation Forest model according to an embodiment of the present specification
- FIG. 3 is a hardware structural diagram of an electronic device carrying a URL attack detecting apparatus according to an embodiment of the present disclosure
- FIG. 4 is a logic block diagram of the URL attack detecting apparatus according to an embodiment of the present disclosure.
- This specification aims to propose a technical solution in which machine learning training based on the Isolation Forest machine learning algorithm is performed on URL access request samples that are not marked with risk tags to construct a URL attack detection model, and the model is then used to perform attack detection on normal URL access requests in order to discover potential URL attacks.
- a number of URL access request samples may be prepared in advance; none of these URL access request samples are tagged with a risk tag. Then, the URL access request samples may be subjected to data segmentation, and features of the plurality of dimensions are extracted from the information carried in the URL access request samples;
- the foregoing information may specifically include domain name information and a URL parameter.
- the URL access request sample may be subjected to data segmentation to extract the domain name information carried in the sample (for example, the primary domain name and the corresponding domain name suffix) and the URL parameter (for example, the URL parameter name and the corresponding parameter value), and features of several dimensions are then extracted from the extracted domain name information and URL parameter.
- the features may be normalized, and then the normalized features are used as modeling features to construct the training samples.
- these training samples can be trained based on the Isolation Forest machine learning algorithm to construct a URL attack detection model.
- the Isolation Forest machine learning algorithm can be used to perform binary-tree classification on the training samples to construct multiple random binary trees.
- features may be extracted in the same manner from a URL access request that needs attack detection, and a prediction sample is constructed based on the extracted features.
- the constructed prediction sample is input into the URL attack detection model to perform prediction calculation, and the risk score of the URL access request is obtained; the risk score can then be used to determine whether the URL access request is a URL attack request.
- attack detection on URL access requests can discover potential URL attacks in advance, which helps guard against abnormal URL accesses in a timely manner.
- FIG. 1 shows a URL attack detection method according to an embodiment of the present specification, which performs the following steps:
- Step 102: Extract features of several dimensions from information carried in the URL access request.
- Step 104: Input the extracted features into a preset URL attack detection model for predictive calculation, and obtain a risk score of the URL access request; wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm.
- Step 106: Determine, according to the risk score, whether the URL access request is a URL attack request.
- the modeler can collect a large number of unmarked URL access requests in advance as unmarked samples, build a training sample set from the collected unmarked samples, and then perform unsupervised machine learning training on the training sample set based on the Isolation Forest machine learning algorithm to build the above URL attack detection model.
- FIG. 2 is a flowchart of constructing a training sample set and training an Isolation Forest model according to the present specification.
- the collected unmarked original URL access request samples may be separately segmented into data, and the information carried in the URL access request samples may be extracted.
- the information carried in the URL access request refers to information that can be extracted therefrom and can reflect whether the URL access request is risky.
- the foregoing information may specifically include a URL parameter, domain name information, and the like carried in the URL access request.
- the URL parameter may include a URL parameter name (ParamName) and a corresponding parameter value (ParamValue).
- the domain name information may include a primary domain name and a domain name suffix corresponding to the primary domain name.
- the original URL access request sample may be segmented, and the URL parameter name (ParamName) and the corresponding parameter value (ParamValue) carried in the URL access request sample may be extracted;
- the original URL access request sample may also be subjected to data segmentation to extract the primary domain name carried in the URL access request and the domain name suffix corresponding to the primary domain name.
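The segmentation step can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: the function name `split_url` and the returned keys are hypothetical, and treating the last host label as the domain name suffix is a simplifying assumption (multi-label suffixes such as `co.uk` would need a public-suffix list).

```python
from urllib.parse import urlsplit, parse_qsl

def split_url(url):
    """Split a URL into domain name information and (ParamName, ParamValue) pairs."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Assumption: last label = domain name suffix, label before it = primary domain name.
    suffix = labels[-1] if len(labels) > 1 else ""
    primary = labels[-2] if len(labels) > 1 else host
    # parse_qsl yields the URL parameters as (name, value) tuples in order.
    params = parse_qsl(parts.query, keep_blank_values=True)
    return {"primary_domain": primary, "suffix": suffix, "params": params}
```

For a hypothetical request such as `http://www.example.com/path?id=1&name=abc`, this yields the primary domain name `example`, the suffix `com`, and the parameter pairs `id=1` and `name=abc`.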
- information that commonly appears in known URL attack requests can be selected from the extracted information to construct the machine learning model. That is, the information that best characterizes URL attack requests is selected to participate in the modeling.
- features of several dimensions may be extracted from the information as modeling features.
- the information extracted from the URL access request sample may be either the domain name information or the URL parameter carried in the URL access request sample, or both the domain name information and the URL parameter may be used as the information at the same time.
- the features extracted by the modeler from the information may include the following three cases:
- the finally extracted features may only include features of several dimensions extracted from the domain name information carried in the URL access request sample.
- the finally extracted features may only include features of several dimensions extracted from the URL parameter carried in the URL access request sample.
- if the modeler uses both the URL parameter and the domain name information carried in the URL access request sample as the information, both participate in the modeling, and the finally extracted features may include features of several dimensions extracted from both the URL parameter and the domain name information.
- the features extracted from the information are not specifically limited in the present specification; in practical applications, any form of feature that characterizes the information carried in URL attack requests and its regularity can be selected as a modeling feature.
- those skilled in the art who participate in modeling can extract features of several dimensions from the information based on experience, then try to model based on these features, and evaluate the modeling results.
- the features of several dimensions with the highest contribution to the model are selected as modeling features.
- the features extracted from the information may include eight dimensions: the total number of characters, the total number of letters, the total number of digits, the total number of symbols, the number of different characters, the number of different letters, the number of different digits, and the number of different symbols of the information.
- the finally extracted features may include eight dimensions of the domain name information: the total number of characters, the total number of letters, the total number of digits, the total number of symbols, the number of different characters, the number of different letters, the number of different digits, and the number of different symbols;
- the finally extracted features may include eight dimensions of the URL parameter: the total number of characters, the total number of letters, the total number of digits, the total number of symbols, the number of different characters, the number of different letters, the number of different digits, and the number of different symbols;
- the finally extracted features may include the total number of characters, the total number of letters, the total number of digits, the total number of symbols, the number of different characters, the number of different letters, the number of different digits, and the number of different symbols of the URL parameter, together with the corresponding features of the domain name information (the total number of characters, the total number of letters, the total number of digits, and so on).
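The eight count features listed above can be computed from any piece of extracted information (a domain name or a URL parameter string). The following is a minimal Python sketch; the function name `count_features` is hypothetical, and counting every character that is neither a letter nor a digit as a "symbol" (including spaces) is an assumption.

```python
def count_features(text):
    """Return the eight count features for one string of extracted information."""
    letters = [c for c in text if c.isalpha()]
    digits = [c for c in text if c.isdigit()]
    # Assumption: a "symbol" is any character that is neither a letter nor a digit.
    symbols = [c for c in text if not (c.isalpha() or c.isdigit())]
    return [
        len(text),          # total number of characters
        len(letters),       # total number of letters
        len(digits),        # total number of digits
        len(symbols),       # total number of symbols
        len(set(text)),     # number of different characters
        len(set(letters)),  # number of different letters
        len(set(digits)),   # number of different digits
        len(set(symbols)),  # number of different symbols
    ]
```

When both the domain name information and the URL parameter are used, the two eight-element lists can simply be concatenated into one feature vector.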
- the features of these dimensions may also be normalized.
- the value ranges of different features are normalized to a uniform numerical interval, thereby eliminating the influence of differing value ranges on the modeling accuracy.
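The text does not name a specific normalization method. As one possible sketch, assuming min-max scaling to the interval [0, 1] (the function name `min_max_normalize` is hypothetical), each feature column could be rescaled as follows:

```python
def min_max_normalize(rows):
    """Scale every feature column of `rows` (a list of feature lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information, so it is mapped to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized
```

The same column minima and maxima would need to be reused for prediction samples so that training and prediction features share one scale.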
- a corresponding feature vector may be created as a training sample for each URL access request sample based on the features extracted from the information carried by that sample; wherein the dimension of the feature vector is the same as the number of dimensions of the extracted features.
- a target matrix can be created based on the feature vector constructed for each URL access request sample; for example, assuming that a total of N URL access request samples are collected and an M-dimensional feature is extracted from each URL access request sample, the target matrix may specifically be an N*M-dimensional matrix.
- the target matrix created is the training sample set that ultimately participates in the machine learning model training.
- once the training sample set has been built, the training samples can be trained based on the Isolation Forest machine learning algorithm to construct the above URL attack detection model.
- the Isolation Forest algorithm is an algorithm for mining abnormal data samples from the original data set by constructing multiple random binary trees.
- the so-called random binary tree refers to a binary tree constructed based on randomly generated classification features and randomly generated classification threshold values corresponding to the values of the classification features. That is, when constructing a random binary tree, the classification features used and the classification thresholds corresponding to the values of the classification features are randomly generated.
- training the constructed training sample set with the Isolation Forest algorithm to build the URL anomaly detection model is the process of classifying the training samples in the training sample set with the Isolation Forest algorithm to construct M random binary trees.
- before modeling the training sample set based on the Isolation Forest algorithm, the modeler needs to configure the parameters of the algorithm: the number M of random binary trees to be constructed, and the number N of training samples to be sampled from the training sample set when constructing a single random binary tree.
- the values of the above M and N can be set from engineering experience or based on the actual requirements of the modeling party; for example, by default the number of random binary trees that the Isolation Forest algorithm needs to construct is 100, and the number of training samples sampled for each random binary tree is 256.
- the modeler can train the constructed training sample set by running the Isolation Forest algorithm on a computing platform built for this purpose (such as a server cluster) to build the final URL anomaly detection model.
- M times of uniform sampling can be performed on the training sample set based on the above-described N values configured by the modeler.
- uniform sampling means that the same number of training samples is drawn from the training sample set in each of the M sampling rounds.
- M training sample subsets can be constructed based on the sampled training samples, and then the training samples in each training sample subset are separately classified to construct M random binary trees.
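The sampling step can be sketched as follows, assuming sampling without replacement within each round; the function name `draw_subsets` and its parameters are hypothetical, with `num_trees` playing the role of M and `subset_size` the role of N.

```python
import random

def draw_subsets(training_set, num_trees=100, subset_size=256, rng=random):
    """Draw one equally sized training sample subset per random binary tree."""
    # Cap the subset size so that small training sets can still be sampled.
    size = min(subset_size, len(training_set))
    return [rng.sample(training_set, size) for _ in range(num_trees)]
```

The defaults of 100 trees and 256 samples per tree mirror the example values mentioned in the text.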
- a feature can be randomly selected from the features of the training sample subset as the classification feature and used as the root node; the maximum value and the minimum value of the classification feature in the current training sample subset are then determined, and a classification threshold is randomly selected for the training sample subset within the value interval formed by the maximum value and the minimum value.
- the first-level classification may then be performed on the training sample subset: the value of the classification feature of each training sample in the subset is compared with the classification threshold, and the training samples are classified, based on the comparison result, into training samples whose value of the classification feature is greater than the classification threshold and training samples whose value is smaller than the classification threshold.
- the training samples in the subset whose value of the classification feature is smaller than the classification threshold may be classified into the left branch of the binary tree and used as the left leaf node of the root node; the training samples whose value of the classification feature is greater than the classification threshold are classified into the right branch of the binary tree and used as the right leaf node of the root node.
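A single level of this classification can be sketched as below. The function name `split_once` is hypothetical, and sending samples whose value equals the threshold to the right branch is an assumption, since the text only describes the "smaller than" and "greater than" cases.

```python
import random

def split_once(samples, rng=random):
    """Pick a random feature and threshold, then split samples into left/right."""
    feature = rng.randrange(len(samples[0]))          # random classification feature
    values = [s[feature] for s in samples]
    threshold = rng.uniform(min(values), max(values))  # random value in [min, max]
    left = [s for s in samples if s[feature] < threshold]
    # Assumption: ties (value == threshold) go to the right branch.
    right = [s for s in samples if s[feature] >= threshold]
    return feature, threshold, left, right
```

Because both the feature and the threshold are random, an unusual sample that lies far from the rest tends to end up alone on one side after only a few such splits.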
- the second level classification for the above training sample subset can be continued.
- the training samples in each of the two leaf nodes obtained may be used as a new training sample subset, and the above classification process is then performed iteratively on each new training sample subset until the training samples in the obtained leaf nodes can no longer be classified further;
- a classification feature and a classification threshold can be randomly selected for each new training sample subset, and the training samples in each new subset are then classified into training samples whose value of the classification feature is greater than the classification threshold and training samples whose value is smaller than the classification threshold; the two classes are used as the next-level leaf nodes of the leaf node at the previous level, and so on, and the process stops when the training samples in the next-level leaf nodes can no longer be classified further; for example, when only one training sample remains in a leaf node, or when the training samples in a leaf node are identical, the training samples in that leaf node can no longer be classified further.
- the classification features randomly selected for the root node and for the child nodes of each level need to be kept different; for example, in one implementation, after a feature is selected as the classification feature of a node in a random binary tree, the feature can be removed, so that classification features for other nodes are randomly selected from the remaining features.
- the stop condition of the iterative classification of the Isolation Forest algorithm shown above may by default be that the training samples in the obtained leaf nodes can no longer continue to be classified.
- the modeling party can also configure, in the Isolation Forest algorithm, a maximum binary tree depth (i.e., the maximum number of layers from the root node);
- in this case, the above stop condition may also be that the algorithm stops immediately when the depth of the random binary tree obtained through the above iterative classification process reaches the configured maximum binary tree depth (the training samples in the leaf nodes obtained at this time may still be classifiable).
- Shown above is a process of iteratively classifying training samples in one of the training sample subsets to construct a single random binary tree.
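The iterative classification and its two stop conditions can be sketched recursively as below. This is a simplified illustration rather than the patent's implementation: the names `build_tree` and `tree_size` and the nested-dictionary node layout are hypothetical, ties go to the right branch by assumption, and the optional rule that a feature is not reused at other nodes is not implemented.

```python
import random

def build_tree(samples, rng=random, depth=0, max_depth=None):
    """Recursively split samples until they cannot be separated further."""
    # Stop: at most one sample left, or all remaining samples are identical.
    if len(samples) <= 1 or all(s == samples[0] for s in samples):
        return {"size": len(samples)}
    # Optional stop: the configured maximum binary tree depth is reached.
    if max_depth is not None and depth >= max_depth:
        return {"size": len(samples)}
    # Only features that still vary in this subset can separate it.
    varying = [f for f in range(len(samples[0]))
               if min(s[f] for s in samples) < max(s[f] for s in samples)]
    feature = rng.choice(varying)
    values = [s[feature] for s in samples]
    threshold = rng.uniform(min(values), max(values))
    left = [s for s in samples if s[feature] < threshold]
    right = [s for s in samples if s[feature] >= threshold]
    return {"feature": feature, "threshold": threshold,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}

def tree_size(node):
    """Count the training samples stored in all leaf nodes below `node`."""
    if "size" in node:
        return node["size"]
    return tree_size(node["left"]) + tree_size(node["right"])
```

Each leaf records how many training samples ended up in it, which is the quantity a depth correction would need when the tree is stopped early by `max_depth`.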
- the M random binary trees obtained in this way are the finally constructed URL anomaly detection model.
- information may be extracted from the URL access request that needs attack detection using the same feature extraction method as shown in FIG. 2, and the extracted information may be filtered; features of several dimensions (consistent with the features used in the training phase of the model) are extracted from the filtered information, a prediction sample is constructed based on the extracted features, and the prediction sample is input into the above URL attack detection model for prediction calculation to obtain the risk score of the URL access request.
- the path depth h(x) of the prediction sample in each random binary tree needs to be estimated
- the value corresponding to the classification feature of the root node in the prediction sample may be first determined, and then the first-level leaf node where the prediction sample is located is searched based on the value. After finding the first-level leaf node, the value corresponding to the classification feature of the first-level leaf in the prediction sample may be determined, and then, according to the value, the second-level leaf node where the prediction sample is located may continue to be searched. By analogy, it can be traversed step by step until it stops when the leaf node corresponding to the prediction sample is found.
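The level-by-level search can be sketched as a simple loop. The node layout (dictionaries with `feature`, `threshold`, `left`, `right`, and a `size` on leaves), the function name `path_edges`, and the small hand-built tree are all hypothetical and serve only to make the example self-contained.

```python
def path_edges(tree, sample):
    """Walk from the root to the leaf holding `sample`; return (edges, leaf size)."""
    node, edges = tree, 0
    while "feature" in node:  # internal nodes carry a classification feature
        if sample[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
        edges += 1
    return edges, node["size"]

# A hand-built tree over one feature: values >= 50 are isolated at the first split.
example_tree = {
    "feature": 0, "threshold": 50.0,
    "left": {"feature": 0, "threshold": 2.5,
             "left": {"size": 2}, "right": {"size": 1}},
    "right": {"size": 1},
}
```

In this toy tree an outlying value such as 100.0 reaches its leaf after one edge, while an ordinary value such as 1.0 needs two.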
- the path depth h(x) finally obtained at this time can be characterized by the following formula:
- C(n) is the correction value and can be characterized by the following formula:
- H(n-1) can be estimated by ln(n-1)+0.5772156649, where the constant is the Euler constant.
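The formulas themselves are not reproduced in this text. Assuming they follow the standard Isolation Forest definitions, which match the symbols described here, they would read as below; the meanings of e (the number of edges traversed from the root node to the leaf node) and n (the number of training samples in that leaf node) are taken from the standard algorithm and are assumptions with respect to this document.

```latex
h(x) = e + C(n), \qquad
C(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(n-1) \approx \ln(n-1) + 0.5772156649
```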
- the average value of the path depth of the prediction sample across the random binary trees can then be computed, and the average value is normalized so that the calculation result is quantized to between 0 and 1, which gives the risk score of the URL access request;
- Score(x) represents the final risk score of the prediction sample x
- E(h(x)) represents the average of the path depth h(x) of the prediction sample across the random binary trees
- the remaining two terms of the formula are the number of training samples used to build a single random binary tree and the average path length of a binary tree built from that many training samples; the latter is used to normalize the calculation result in the above formula.
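Assuming the score formula referred to here is the standard Isolation Forest anomaly score, it can be written as below. The symbol ψ for the number of training samples per random binary tree is borrowed from the usual notation and is an assumption, and C(·) is the correction value defined earlier.

```latex
\mathrm{Score}(x) = 2^{-\frac{E(h(x))}{C(\psi)}}
```

Under this form, a prediction sample that is isolated after very few splits (small E(h(x))) scores close to 1, and one whose average path depth equals C(ψ) scores exactly 0.5.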
- after the risk score is obtained, whether the URL access request is a URL attack request may be further determined based on the risk score;
- the risk score may be compared with a preset risk threshold to determine the specific type of the URL access request; if the risk score is greater than or equal to the preset risk threshold, the URL access request is a URL attack request; conversely, if the risk score is less than the preset risk threshold, the URL access request is a normal URL access request.
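The scoring and threshold comparison can be sketched as follows, assuming the standard Isolation Forest score 2^(-E(h(x))/C(ψ)) and the harmonic-number estimate ln(n-1) + 0.5772156649 given in the text. The function names and the example threshold values in the usage note are hypothetical.

```python
import math

EULER_CONSTANT = 0.5772156649

def correction(n):
    """Correction value C(n): average path length of a tree built from n samples."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_CONSTANT  # estimate of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def risk_score(path_depths, subset_size):
    """Normalize the mean path depth over all trees into a score in (0, 1]."""
    mean_depth = sum(path_depths) / len(path_depths)
    return 2.0 ** (-mean_depth / correction(subset_size))

def is_attack(score, risk_threshold):
    """A request is flagged when its score reaches the preset risk threshold."""
    return score >= risk_threshold
```

For example, with 256 samples per tree, a request whose path depths average about 2 scores well above one whose depths average about 12, so a threshold chosen between the two scores (the value would be picked by the modeler) separates them.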
- the URL access request is attack-detected by inputting the feature extracted from the URL access request to the URL attack detection model trained based on the Isolation Forest machine learning algorithm for predictive calculation:
- the Isolation Forest algorithm is an unsupervised machine learning algorithm
- the training samples required to train the model no longer need to be marked with sample tags, which saves the modeling party the considerable labor cost of labeling the training samples.
- the present specification also provides an embodiment of a URL attack detecting apparatus.
- the embodiment of the URL attack detecting device of the present specification can be applied to an electronic device.
- the device embodiment may be implemented by software, or may be implemented by hardware or a combination of hardware and software.
- the processor of the electronic device in which the apparatus is located reads the corresponding computer program instructions from the non-volatile memory into the memory.
- FIG. 3 is a hardware structure diagram of the electronic device in which the URL attack detecting device of the present specification is located; in addition to the processor, the memory, the network interface, and the non-volatile memory shown in FIG. 3, the electronic device in which the device is located may also include other hardware according to its actual function, and details are not described herein.
- FIG. 4 is a block diagram of a URL attack detecting apparatus shown in an exemplary embodiment of the present specification.
- the URL attack detection device 40 can be applied to the electronic device shown in FIG. 3, and includes: a first extraction module 401, a calculation module 402, and a determination module 403.
- the first extraction module 401 extracts features of several dimensions from information carried in the URL access request;
- the calculating module 402 is configured to input the extracted feature into a preset URL attack detection model for predictive calculation, and obtain a risk score of the URL access request; wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm;
- the determining module 403 determines whether the URL access request is a URL attack request based on the risk score.
- the device 40 further includes:
- the second extraction module 404 (not shown in FIG. 4) extracts features of several dimensions from the information carried by the plurality of URL access request samples respectively; wherein the plurality of URL access request samples are not marked with sample tags.
- a building module 405 constructs a number of training samples based on the extracted features
- a training module 406 trains the plurality of training samples based on an Isolation Forest machine learning algorithm to obtain the URL attack detection model.
- the URL attack detection model includes M random binary trees trained based on the Isolation Forest machine learning algorithm
- the training module 406 is the training module 406
- a classification feature as a root node for each training sample subset, and a subset of each training sample in a value interval formed by the maximum value and the minimum value of the classification feature Randomly selecting a classification threshold;
- the training samples in each leaf node are used as a new subset of training samples, and the above classification process is iteratively executed until the training samples in the obtained leaf nodes are not reclassable.
- the calculation module 403 calculates the calculation module 403:
- the information includes: domain name information, and/or a URL parameter;
- the features of the several dimensions include: features extracted from the domain name information carried in the URL access request; and/or features extracted from the URL parameters carried in the URL access request.
- the feature includes a combination of a plurality of features: total number of characters, total number of letters, total number of digits, total number of symbols, number of different characters, number of different letters, number of different numbers, number of different symbols.
- for the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the relevant description of the method embodiment.
- the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, i.e. they may be located in one place, or may be distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the present specification. Those of ordinary skill in the art can understand and implement them without creative effort.
- the system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
- a typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
- the present specification also provides an embodiment of an electronic device.
- the electronic device includes a processor and a memory for storing machine executable instructions; wherein the processor and the memory are typically interconnected by an internal bus.
- the device may also include an external interface to enable communication with other devices or components.
- by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the URL attack detection, the processor is caused to:
- the URL attack detection model is a machine learning model trained based on an Isolation Forest machine learning algorithm
- by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the URL attack detection, the processor is further caused to:
- extract features of several dimensions from the information carried by each of a plurality of URL access request samples; wherein none of the URL access request samples is labeled with a sample tag.
- obtain the URL attack detection model by training on the plurality of training samples based on an Isolation Forest machine learning algorithm.
- the URL attack detection model includes M random binary trees trained based on the Isolation Forest machine learning algorithm
- by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the URL attack detection, the processor is further caused to:
- randomly select, from the features of the several dimensions, a classification feature as a root node for each training sample subset, and randomly select a classification threshold for each training sample subset within the value interval formed by the maximum value and the minimum value of the classification feature;
- take the training samples in each leaf node as a new training sample subset, and iteratively execute the above classification process until the training samples in the obtained leaf nodes can no longer be classified.
- by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the URL attack detection, the processor is further caused to:
- the information includes: domain name information, and/or a URL parameter;
- the features of the several dimensions include: features extracted from the domain name information carried in the URL access request; and/or features extracted from the URL parameters carried in the URL access request.
- the extracted features of the several dimensions include a combination of several of the following features: the total number of characters of the information, the total number of letters of the information, the total number of digits of the information, the total number of symbols of the information, the number of distinct characters of the information, the number of distinct letters of the information, the number of distinct digits of the information, and the number of distinct symbols of the information.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Computer Hardware Design (AREA)
- Signal Processing (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Computer And Data Communications (AREA)
Abstract
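A URL attack detection method, comprising: extracting features of several dimensions from information carried in a URL access request (102); inputting the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm (104); and determining, based on the risk score, whether the URL access request is a URL attack request (106).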
Description
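This specification relates to the field of computer applications, and in particular to a URL attack detection method and apparatus, and an electronic device.
In Internet application scenarios, a large number of URL access requests for web addresses are generated every day. Among them are URL attacks that malicious actors launch through illegitimate URL access requests; common examples include Trojan attacks, SQL injection attacks, and cross-site scripting (XSS) attacks. Such illegitimate URL access requests usually differ in some way from ordinary ones. Therefore, while building an online system, using security measures to quickly identify and detect URL attacks launched by illegitimate users is a problem that cannot be ignored.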
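Summary
This specification proposes a URL attack detection method, the method comprising:
extracting features of several dimensions from information carried in a URL access request;
inputting the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm; and
determining, based on the risk score, whether the URL access request is a URL attack request.
Optionally, the method further comprises:
extracting features of several dimensions from the information carried in each of several URL access request samples, wherein none of the URL access request samples is labeled with a sample label;
constructing several training samples based on the extracted features; and
training on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model.
Optionally, the URL attack detection model comprises M random binary trees trained based on the Isolation Forest machine learning algorithm;
training on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model comprises:
constructing M training sample subsets from training samples uniformly sampled from the several training samples;
randomly selecting, from the features of the several dimensions, one classification feature as a root node for each training sample subset, and randomly selecting a classification threshold for each training sample subset within the value interval formed by the maximum and minimum values of the classification feature;
classifying, in each training sample subset, the training samples whose value of the classification feature is greater than the classification threshold and the training samples whose value of the classification feature is less than the classification threshold as respective leaf nodes of the root node; and
taking the training samples in each leaf node as a new training sample subset and iteratively performing the above classification process until the training samples in the resulting leaf nodes can no longer be classified.
Optionally, inputting the extracted features into the preset URL attack detection model for prediction calculation to obtain the risk score of the URL access request comprises:
constructing a prediction sample based on the extracted features;
traversing each random binary tree from its root node, based on the value of each feature in the prediction sample, to find the leaf node corresponding to the prediction sample; and
calculating the average of the path depths of the found leaf nodes across the random binary trees, and normalizing the average to obtain the risk score of the URL access request.
Optionally, the information comprises domain name information and/or URL parameters; the features of several dimensions comprise features extracted from the domain name information carried in the URL access request, and/or features extracted from the URL parameters carried in the URL access request.
Optionally, the features comprise a combination of several of the following: total number of characters, total number of letters, total number of digits, total number of symbols, number of distinct characters, number of distinct letters, number of distinct digits, and number of distinct symbols.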
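This specification also proposes a URL attack detection apparatus, the apparatus comprising:
a first extraction module, which extracts features of several dimensions from information carried in a URL access request;
a calculation module, which inputs the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm; and
a determination module, which determines, based on the risk score, whether the URL access request is a URL attack request.
Optionally, the apparatus further comprises:
a second extraction module, which extracts features of several dimensions from the information carried in each of several URL access request samples, wherein none of the URL access request samples is labeled with a sample label;
a construction module, which constructs several training samples based on the extracted features; and
a training module, which trains on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model.
Optionally, the URL attack detection model comprises M random binary trees trained based on the Isolation Forest machine learning algorithm;
the training module:
constructs M training sample subsets from training samples uniformly sampled from the several training samples;
randomly selects, from the features of the several dimensions, one classification feature as a root node for each training sample subset, and randomly selects a classification threshold for each training sample subset within the value interval formed by the maximum and minimum values of the classification feature;
classifies, in each training sample subset, the training samples whose value of the classification feature is greater than the classification threshold and the training samples whose value of the classification feature is less than the classification threshold as respective leaf nodes of the root node; and
takes the training samples in each leaf node as a new training sample subset and iteratively performs the above classification process until the training samples in the resulting leaf nodes can no longer be classified.
Optionally, the calculation module:
constructs a prediction sample based on the extracted features;
traverses each random binary tree from its root node, based on the value of each feature in the prediction sample, to find the leaf node corresponding to the prediction sample; and
calculates the average of the path depths of the found leaf nodes across the random binary trees, and normalizes the average to obtain the risk score of the URL access request.
Optionally, the information comprises domain name information and/or URL parameters; the features of several dimensions comprise features extracted from the domain name information carried in the URL access request, and/or features extracted from the URL parameters carried in the URL access request.
Optionally, the features comprise a combination of several of the following: total number of characters, total number of letters, total number of digits, total number of symbols, number of distinct characters, number of distinct letters, number of distinct digits, and number of distinct symbols.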
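This specification also proposes an electronic device, comprising:
a processor; and
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of URL attack detection, the processor is caused to:
extract features of several dimensions from information carried in a URL access request;
input the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm; and
determine, based on the risk score, whether the URL access request is a URL attack request.
In the technical solution provided by the embodiments of this specification, features extracted from a URL access request are input into a URL attack detection model trained with the Isolation Forest machine learning algorithm for prediction calculation, so that attack detection is performed on the URL access request. Potential URL attacks can thus be discovered in advance, which helps to apply timely security protection against potentially abnormal URL access.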
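FIG. 1 is a flowchart of a URL attack detection method according to an embodiment of this specification;
FIG. 2 is a flowchart of constructing a training sample set and training an Isolation Forest model according to an embodiment of this specification;
FIG. 3 is a hardware structure diagram of an electronic device hosting a URL attack detection apparatus according to an embodiment of this specification;
FIG. 4 is a logical block diagram of the URL attack detection apparatus according to an embodiment of this specification.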
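This specification aims to propose a technical solution that performs machine learning training, based on the Isolation Forest machine learning algorithm, on URL access request samples none of which is labeled with a risk label, so as to build a URL attack detection model, and then uses that model to perform attack detection on ordinary URL access requests in order to discover potential URL attacks.
In implementation, several URL access request samples can be prepared in advance, none of which is labeled with a risk label. These samples can then be split, and features of several dimensions can be extracted from the information they carry.
For example, in practical applications the information may include domain name information and URL parameters. In that case, the URL access request samples can be split to extract the domain name information they carry (such as the main domain name and the corresponding domain suffix) and the URL parameters (such as the URL parameter names and the corresponding parameter values), and features of several dimensions can then be extracted from the extracted domain name information and URL parameters.
Further, after features of several dimensions have been extracted from each URL access request sample, the features can be normalized, and the normalized features can be used as modeling features to construct training samples.
Once the training samples have been constructed, they can be trained on with the Isolation Forest machine learning algorithm to build the URL attack detection model; for example, the Isolation Forest algorithm can be used to classify the training samples with binary trees, building multiple random binary trees.
Finally, once the URL attack detection model has been trained, features of several dimensions can be extracted in the same way from the information carried in a URL access request that needs attack detection. A prediction sample is constructed from the extracted features and input into the URL attack detection model for prediction calculation to obtain the risk score of the URL access request, and the risk score can then be used to determine whether the URL access request is a URL attack request.
In the above technical solution, features extracted from a URL access request are input into a URL attack detection model trained with the Isolation Forest machine learning algorithm for prediction calculation, so that attack detection is performed on the URL access request. Potential URL attacks can thus be discovered in advance, which helps to apply timely security protection against potentially abnormal URL access.
This specification is described below through specific embodiments and in combination with specific application scenarios.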
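Referring to FIG. 1, FIG. 1 shows a URL attack detection method provided by an embodiment of this specification, which performs the following steps:
Step 102: extract features of several dimensions from information carried in a URL access request;
Step 104: input the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm;
Step 106: determine, based on the risk score, whether the URL access request is a URL attack request.
In this specification, the modeling party can collect a large number of unlabeled URL access requests in advance as unlabeled samples, construct a training sample set from them, and then perform unsupervised machine learning training on the training sample set based on the Isolation Forest machine learning algorithm to build the URL attack detection model.
Referring to FIG. 2, FIG. 2 is a flowchart, shown in this specification, of constructing a training sample set and training an Isolation Forest model.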
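As shown in FIG. 2, the collected unlabeled raw URL access request samples can first each be split to extract the information they carry.
Here, the information carried in a URL access request refers to information from which features reflecting whether the URL access request is risky can be extracted.
In one illustrated implementation, the information may include the URL parameters and the domain name information carried in the URL access request. The URL parameters may include URL parameter names (ParamName) and the corresponding parameter values (ParamValue); the domain name information may include the main domain name and the domain suffix corresponding to it.
For example, taking the case where the information is the URL parameters carried in the URL access request, the raw URL access request samples can be split to extract the URL parameter names (ParamName) and the corresponding parameter values (ParamValue) they carry.
As another example, taking the case where the information is the domain name information carried in the URL access request, the raw URL access request samples can be split to extract the main domain name and the corresponding domain suffix carried in each request. Once the information carried in the URL access request samples has been extracted, the part of it that is relatively common in known URL attack requests can be screened out for building the machine learning model; that is, the information that best characterizes URL attack requests is selected to participate in modeling.
For example, taking the case where the information is the URL parameters carried in the URL access request, special URL parameters that appear only in a few individual URL access requests do not truly reflect the characteristics of URL attack requests, so these URL parameters can be filtered out.
As another example, taking the case where the information is the domain name information carried in the URL access request, special information that appears only in a few individual URL access requests does not truly reflect the characteristics of URL attack requests, and including it in modeling would interfere with the model's results, so this information can be filtered out.
Further, features of several dimensions can be extracted from the screened information to serve as modeling features.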
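The splitting and filtering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the specification: the function names `split_request` and `filter_rare_params` and the `min_count` cutoff are hypothetical, and the domain suffix is naively taken to be the last dot-separated label of the host (a production system would more likely consult a public suffix list).

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def split_request(url):
    """Split a URL into ((main domain, suffix), [(ParamName, ParamValue), ...])."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive assumption: the suffix is the last dot-separated label of the host.
    main, _, suffix = host.rpartition(".")
    params = parse_qsl(parts.query, keep_blank_values=True)
    return (main, suffix), params


def filter_rare_params(samples, min_count=2):
    """Drop parameters whose name occurs in fewer than min_count requests."""
    # Count each parameter name at most once per request.
    counts = Counter(
        name for _, params in samples for name in {n for n, _ in params}
    )
    return [
        (domain, [(n, v) for n, v in params if counts[n] >= min_count])
        for domain, params in samples
    ]
```

For instance, given the requests `http://a.com/?id=1&x=9` and `http://b.com/?id=2`, the parameter `x` appears in only one request and is dropped, while `id` is kept in both.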
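It should be noted that, when modeling, the modeling party may use either the domain name information or the URL parameters carried in the URL access request samples as the extracted information, or may use both the domain name information and the URL parameters.
Accordingly, the features the modeling party extracts from the information fall into the following three cases:
In one case, if the modeling party uses the domain name information carried in the URL access request samples as the information, the finally extracted features may include only features of several dimensions extracted from that domain name information;
In another case, if the modeling party uses the URL parameters carried in the URL access request samples as the information, the finally extracted features may include only features of several dimensions extracted from those URL parameters;
In the third case, if the modeling party uses both the URL parameters and the domain name information carried in the URL access request samples as the information, both participate in modeling, and the finally extracted features may include features of several dimensions extracted from each of them. The features extracted from this information are not specifically limited in this specification; in practical applications, any form of feature that can characterize the properties and regularities of the information carried in URL attack requests may be selected as a modeling feature.
For example, in practical applications, those skilled in the art who take part in modeling can extract features of several dimensions from this information based on experience, attempt modeling with them, and evaluate the modeling results, so as to select the dimensions that contribute most to the model as modeling features.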
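In one illustrated implementation, the features extracted from this information may include 8 dimensions: the total number of characters, the total number of letters, the total number of digits, the total number of symbols, the number of distinct characters, the number of distinct letters, the number of distinct digits, and the number of distinct symbols of the information.
For example, if the modeling party uses the domain name information carried in the URL access request samples as the information, the finally extracted features may include these 8 dimensions computed over the domain name information: total characters, total letters, total digits, total symbols, distinct characters, distinct letters, distinct digits, and distinct symbols;
If the modeling party uses the URL parameters carried in the URL access request samples as the information, the finally extracted features may include the same 8 dimensions computed over the URL parameters: total characters, total letters, total digits, total symbols, distinct characters, distinct letters, distinct digits, and distinct symbols;
If the modeling party uses both the URL parameters and the domain name information carried in the URL access request samples as the information, the finally extracted features may include 16 dimensions: the 8 dimensions above computed over the URL parameters, plus the same 8 dimensions computed over the domain name information.
It should be noted that, in practical applications, those skilled in the art may combine the above 8 basic dimensions as modeling features, or further select several of the 8 basic dimensions and combine them as modeling features; this is not specifically limited in this specification.
Of course, the 8 dimensions shown above are only exemplary; in practical applications, those skilled in the art may also extract features of other dimensions from this information as modeling features, which are not enumerated one by one in this specification.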
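As an illustration of the 8 basic dimensions listed above, the following Python sketch computes them for a single string (for example a parameter value or a domain name). The function name `char_features` is hypothetical, and treating every character that is neither a letter nor a digit as a "symbol" is an assumption; the specification does not define the character classes precisely.

```python
def char_features(text):
    """Return the 8 character-count features of a string, in the order listed above."""
    letters = [c for c in text if c.isalpha()]
    digits = [c for c in text if c.isdigit()]
    # Assumption: anything that is neither a letter nor a digit counts as a symbol.
    symbols = [c for c in text if not (c.isalpha() or c.isdigit())]
    return [
        len(text), len(letters), len(digits), len(symbols),
        len(set(text)), len(set(letters)), len(set(digits)), len(set(symbols)),
    ]
```

When both URL parameters and domain name information are used, the 16-dimensional vector could be formed by concatenating the two 8-element lists.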
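Continuing with FIG. 2, after features of several dimensions have been extracted from the screened information, the value ranges of different features may not be uniform. The features can therefore be normalized so that their value ranges are mapped to one common numeric interval, which removes the effect that differing value ranges would otherwise have on modeling accuracy.
After the extracted features have been normalized, a corresponding feature vector can be created for each URL access request sample, based on the features extracted from the information it carries, to serve as a training sample; the dimension of the feature vector is the same as the number of extracted feature dimensions.
Once a feature vector has been constructed for each URL access request sample, a target matrix can be created from these feature vectors; for example, assuming N URL access request samples have been collected in total and M-dimensional features are extracted from each sample, the target matrix can be an N*M matrix.
The target matrix created in this way is the training sample set that finally participates in training the machine learning model.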
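The normalization step and the resulting N*M target matrix can be sketched as follows. The specification does not say which normalization is used, so per-column min-max scaling to the interval [0, 1] is an assumption here, and the function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(rows):
    """Scale each feature column of an N*M list-of-lists to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column carries no information; map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Each row of the returned matrix is one training sample (feature vector), and each column is one feature dimension.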
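Continuing with FIG. 2, once the training sample set has been constructed, the training samples can be trained on with the Isolation Forest machine learning algorithm to build the URL attack detection model. The Isolation Forest algorithm mines abnormal data samples from the original data set by building multiple random binary trees. A random binary tree is a binary tree built from randomly generated classification features and randomly generated classification thresholds corresponding to the values of those features; that is, when a random binary tree is built, both the classification features used and the classification thresholds are generated randomly.
The process of training on the constructed training sample set with the Isolation Forest algorithm to build the URL attack detection model is the process of classifying the training samples in the training sample set with the Isolation Forest algorithm to build M random binary trees.
In the initial state, before training on the training sample set with the Isolation Forest algorithm, the modeling party needs to configure the algorithm's parameters: the number M of random binary trees to build, and the number N of training samples to draw from the training sample set when building a single random binary tree (these M and N are distinct from the matrix dimensions N and M used in the example above).
The values of M and N can be engineering empirical values, or can be customized according to the modeling party's actual needs; for example, by default the Isolation Forest algorithm builds 100 random binary trees, and each random binary tree samples 256 training samples.
After completing the parameter configuration of the Isolation Forest algorithm, the modeling party can run the algorithm on a computing platform it has set up (such as a server cluster) to train on the constructed training sample set and build the final URL attack detection model.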
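The process of classifying the training samples in the training sample set with the Isolation Forest algorithm to build random binary trees is described in detail below.
First, based on the N value configured by the modeling party, uniform sampling can be performed M times on the training sample set. Here, uniform sampling means that in each of the M sampling rounds the same number of training samples is drawn from the training sample set.
After the uniform sampling is complete, M training sample subsets can be constructed from the sampled training samples, and the training samples in each subset can then be classified separately to build M random binary trees.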
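The sampling step above can be sketched as follows. The function name `draw_subsets`, the fixed `seed`, and sampling without replacement inside each subset are assumptions made for the illustration; the defaults of 100 trees and 256 samples per tree are the values mentioned in the text.

```python
import random


def draw_subsets(training_set, num_trees=100, subsample_size=256, seed=0):
    """Draw num_trees subsets, each with the same number of training samples."""
    rng = random.Random(seed)
    # If the training set is smaller than the configured size, use all of it.
    size = min(subsample_size, len(training_set))
    return [rng.sample(training_set, size) for _ in range(num_trees)]
```

Each returned subset would then be used to grow one random binary tree.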
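Further, when the training samples in one training sample subset are classified to build a random binary tree, one feature can first be randomly selected, from the features of several dimensions that make up the training samples, as the classification feature for that subset and used as the root node; the current maximum and minimum values of the classification feature within the subset are determined, and a classification threshold is then randomly selected for the subset within the value interval formed by that maximum and minimum.
Once the classification feature serving as the root node and the classification threshold have been selected, first-level classification can be performed on the training sample subset: the value of the classification feature of each training sample in the subset is compared with the classification threshold, and, based on the comparison, the training samples in the subset are divided into two classes, those whose value of the classification feature is greater than the classification threshold and those whose value is less than the classification threshold. The two classes of training samples are used as the leaf nodes of the root node.
For example, in implementation, the training samples in the subset whose value of the classification feature is less than the classification threshold can be assigned to the left branch of the binary tree, as the left leaf node of the root node; the training samples whose value of the classification feature is greater than the classification threshold can be assigned to the right branch of the binary tree, as the right leaf node of the root node.
At this point the first-level classification of the training sample subset is complete.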
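Further, after the first-level classification is complete, second-level classification of the training sample subset can be carried out.
The training samples in the two leaf nodes obtained so far can each be used as a new training sample subset, and the above classification process is then performed iteratively on the new subsets until the training samples in the resulting leaf nodes can no longer be classified;
For example, in the same way, a classification feature and a classification threshold can be randomly selected for each new training sample subset, and the training samples in each new subset are divided into those whose value of the classification feature is greater than the classification threshold and those whose value is less than the classification threshold. The two classes are used as next-level leaf nodes of the previous-level leaf node, and so on, until, after some level of classification, the training samples in the resulting next-level leaf nodes can no longer be divided; for instance, a leaf node containing only one training sample, or a leaf node whose training samples are all identical, indicates that its training samples cannot be classified any further.
It should be noted that the classification features randomly selected for the root node and for the child nodes at each level need to differ from one another; for example, in one implementation, once a feature has been selected as the classification feature of a node in the random binary tree, that feature can be removed, and when classification features are subsequently selected for other nodes, the random selection is made among the remaining features.
In addition, the default stopping condition for the iterative classification of the Isolation Forest algorithm shown above is that the training samples in the resulting leaf nodes can no longer be classified. In practical applications, when configuring the algorithm's parameters, the modeling party can also configure a maximum binary tree depth for the resulting random binary trees (that is, the maximum number of node levels counted from the root node). In that case, the stopping condition can also be that the algorithm stops immediately once the depth of the random binary tree obtained through the iterative classification reaches the configured maximum depth (the training samples in the resulting leaf nodes may then still be classifiable).
The above describes the process of iteratively classifying the training samples in one training sample subset to build a single random binary tree.
Similarly, the above classification process can be repeated for every training sample subset, so that M random binary trees are finally built from the M training sample subsets. At this point training on the training sample set is complete, and the resulting M random binary trees are the finally built URL attack detection model.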
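The construction of a single random binary tree described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the specification's implementation: the function name `build_tree` and the dictionary node layout are hypothetical; samples equal to the threshold are sent to the right branch; and the rule that nodes use different classification features is read here as "a feature is not reused along the same root-to-leaf path", which is only one possible interpretation.

```python
import random


def build_tree(samples, available=None, depth=0, max_depth=None, rng=random):
    """Recursively build one random binary tree from a list of feature vectors."""
    if available is None:
        available = set(range(len(samples[0]))) if samples else set()
    # Only unused features that still vary in this node can split its samples.
    candidates = [
        f for f in sorted(available)
        if min(s[f] for s in samples) < max(s[f] for s in samples)
    ]
    depth_reached = max_depth is not None and depth >= max_depth
    if len(samples) <= 1 or not candidates or depth_reached:
        return {"size": len(samples)}  # leaf: samples cannot be split further

    feature = rng.choice(candidates)
    low = min(s[feature] for s in samples)
    high = max(s[feature] for s in samples)
    left = []
    while not left:  # redraw in the very unlikely case that threshold == low
        threshold = rng.uniform(low, high)
        left = [s for s in samples if s[feature] < threshold]
    right = [s for s in samples if s[feature] >= threshold]

    remaining = available - {feature}
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, remaining, depth + 1, max_depth, rng),
        "right": build_tree(right, remaining, depth + 1, max_depth, rng),
    }
```

Calling `build_tree` once per training sample subset would yield the M trees that make up the model. Note that under the no-reuse reading a tree can be at most as deep as the number of feature dimensions, so leaves may still hold several samples.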
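In this specification, once the URL attack detection model has been trained, the same feature extraction approach shown in FIG. 2 can be applied to a URL access request that needs attack detection: information is extracted from the request, the extracted information is screened, and features of several dimensions (consistent with the features used in the model training phase) are extracted from the screened information. A prediction sample is then constructed from the extracted features and input into the URL attack detection model for prediction calculation to obtain the risk score of the URL access request.
The process of scoring the risk of a URL access request with the trained URL attack detection model is described in detail below.
When computing the risk score of the constructed prediction sample, the path depth h(x) of the prediction sample in each random binary tree first needs to be estimated;
Specifically, based on the value of each feature in the prediction sample, each random binary tree can be traversed from its root node from top to bottom to find the leaf node corresponding to the prediction sample in that tree;
For example, the value in the prediction sample that corresponds to the classification feature of the root node can be determined first, and the first-level leaf node containing the prediction sample is found based on that value. After the first-level leaf node is found, the value in the prediction sample that corresponds to the classification feature of that first-level leaf node is determined, and the second-level leaf node containing the prediction sample is found based on that value, and so on, level by level, until the leaf node corresponding to the prediction sample is found.
After the leaf node corresponding to the prediction sample is found, the total number e of edges traversed from the root node to that leaf node during the traversal of the random binary tree can be recorded, together with the number n of training samples in that leaf node.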
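The traversal described above can be sketched as follows. The function name `find_leaf` and the dictionary node layout (internal nodes with `feature`, `threshold`, `left`, `right`; leaves with `size`) are hypothetical choices for this illustration, as is sending values equal to the threshold to the right branch.

```python
def find_leaf(tree, sample):
    """Walk one random binary tree; return (edges traversed e, leaf sample count n)."""
    node, edges = tree, 0
    while "size" not in node:  # internal nodes carry a feature and a threshold
        go_left = sample[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
        edges += 1
    return edges, node["size"]
```

For a small hand-built tree whose root splits on feature 0 at 0.5 and whose right child splits on feature 1 at 0.2, the sample `[0.1, 0.9]` reaches the left leaf after one edge, while `[0.9, 0.1]` needs two edges.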
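The resulting path depth h(x) can be expressed by the following formula:
h(x)=e+C(n)
where C(n) is a correction value, which can be expressed by the following formula:
C(n)=2H(n-1)-2(n-1)/n
where H(n-1) can be estimated as ln(n-1)+0.5772156649; the constant here is Euler's constant.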
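A small numeric sketch of the path-depth estimate follows, using the standard Isolation Forest correction term C(n) = 2H(n-1) - 2(n-1)/n with the logarithmic estimate of H given above. The function names are hypothetical, and returning 0 for n <= 1 (where ln(n-1) is undefined) is an assumption made so that a leaf holding a single sample adds no correction.

```python
import math

EULER_CONSTANT = 0.5772156649


def correction(n):
    """C(n): average path length of a binary tree built from n samples."""
    if n <= 1:
        return 0.0  # assumption: nothing left to split, so no correction
    harmonic = math.log(n - 1) + EULER_CONSTANT  # estimate of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_depth(edges, leaf_size):
    """h(x) = e + C(n) for one random binary tree."""
    return edges + correction(leaf_size)
```

As a worked value, C(256) comes out at roughly 10.24, so a sample isolated after 3 edges in a leaf of size 1 has h(x) = 3, well below the average depth expected for 256 samples.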
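After the path depth h(x) of the prediction sample in each random binary tree has been estimated with the above formulas, the average of these path depths across the random binary trees can be computed and then normalized, mapping the result to a value between 0 and 1, to obtain the risk score of the URL access request;
The final risk score can be expressed by the following formula:
Score(x)=2^(-E{h(x)}/C(ψ))
where Score(x) is the final risk score of the prediction sample x; E{h(x)} is the average of the path depths h(x) of the prediction sample across the random binary trees; ψ is the number of training samples used for a single random binary tree; and C(ψ) is the average path length of a binary tree built from ψ training samples, which is used in the formula to normalize the result.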
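The scoring step can be sketched as follows, assuming the standard Isolation Forest score Score(x) = 2^(-E{h(x)}/C(ψ)). The function names are hypothetical; `correction` repeats the C(n) estimate so that the block stands on its own.

```python
import math

EULER_CONSTANT = 0.5772156649


def correction(n):
    """C(n) = 2H(n-1) - 2(n-1)/n, with H(n-1) estimated as ln(n-1) + Euler's constant."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_CONSTANT) - 2.0 * (n - 1) / n


def risk_score(path_depths, subsample_size):
    """Normalize the mean path depth over all trees into a score in (0, 1]."""
    mean_depth = sum(path_depths) / len(path_depths)
    return 2.0 ** (-mean_depth / correction(subsample_size))
```

Samples that are isolated after only a few splits have a short mean path depth and therefore a score close to 1, while a mean depth equal to C(ψ) gives exactly 0.5.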
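After the risk score of the URL access request has been predicted by the URL attack detection model, the risk score can be used to determine whether the URL access request is a URL attack request;
For example, in one implementation, the risk score can be compared with a preset risk threshold to determine the specific type of the URL access request: if the risk score is greater than or equal to the preset risk threshold, the URL access request is a URL attack request; otherwise, if the risk score is less than the preset risk threshold, the URL access request is a normal URL access request.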
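The threshold comparison above amounts to a single check. In the sketch below the function name `is_url_attack` and the default threshold of 0.6 are purely illustrative assumptions; the specification leaves the preset risk threshold to the implementer.

```python
def is_url_attack(risk_score, risk_threshold=0.6):
    """Return True when the score reaches the preset risk threshold."""
    # The 0.6 default is a hypothetical value chosen only for illustration.
    return risk_score >= risk_threshold
```

In practice the threshold would be tuned against traffic where the acceptable false-positive rate is known.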
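As can be seen from the above embodiments, in this specification features extracted from a URL access request are input into a URL attack detection model trained with the Isolation Forest machine learning algorithm for prediction calculation, so that attack detection is performed on the URL access request:
On the one hand, potential URL attacks can be discovered in advance in this way, which helps to apply timely security protection against potentially abnormal URL access.
On the other hand, because the Isolation Forest algorithm is an unsupervised machine learning algorithm, the training samples needed to train the model no longer have to be labeled, so the modeling party is spared the substantial labor cost of labeling training samples.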
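Corresponding to the above method embodiment, this specification also provides an embodiment of a URL attack detection apparatus. The embodiment of the URL attack detection apparatus of this specification can be applied to an electronic device. The apparatus embodiment may be implemented in software, or in hardware or a combination of software and hardware. Taking software implementation as an example, the apparatus, as a logical entity, is formed when the processor of the electronic device in which it resides reads the corresponding computer program instructions from non-volatile storage into memory and runs them. At the hardware level, FIG. 3 shows a hardware structure diagram of the electronic device in which the URL attack detection apparatus of this specification resides; in addition to the processor, memory, network interface, and non-volatile storage shown in FIG. 3, the electronic device in the embodiment may also include other hardware depending on its actual function, which is not described further here.
FIG. 4 is a block diagram of a URL attack detection apparatus according to an exemplary embodiment of this specification.
Referring to FIG. 4, the URL attack detection apparatus 40 can be applied to the electronic device shown in FIG. 3 and includes a first extraction module 401, a calculation module 402, and a determination module 403.
The first extraction module 401 extracts features of several dimensions from information carried in a URL access request;
the calculation module 402 inputs the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm;
the determination module 403 determines, based on the risk score, whether the URL access request is a URL attack request.
In this example, the apparatus 40 further includes:
a second extraction module 404 (not shown in FIG. 4), which extracts features of several dimensions from the information carried in each of several URL access request samples, wherein none of the URL access request samples is labeled with a sample label;
a construction module 405 (not shown in FIG. 4), which constructs several training samples based on the extracted features;
a training module 406 (not shown in FIG. 4), which trains on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model.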
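In this example, the URL attack detection model includes M random binary trees trained based on the Isolation Forest machine learning algorithm;
the training module 406:
constructs M training sample subsets from training samples uniformly sampled from the several training samples;
randomly selects, from the features of the several dimensions, one classification feature as a root node for each training sample subset, and randomly selects a classification threshold for each training sample subset within the value interval formed by the maximum and minimum values of the classification feature;
classifies, in each training sample subset, the training samples whose value of the classification feature is greater than the classification threshold and the training samples whose value of the classification feature is less than the classification threshold as respective leaf nodes of the root node; and
takes the training samples in each leaf node as a new training sample subset and iteratively performs the above classification process until the training samples in the resulting leaf nodes can no longer be classified.
In this example, the calculation module 402:
constructs a prediction sample based on the extracted features;
traverses each random binary tree from its root node, based on the value of each feature in the prediction sample, to find the leaf node corresponding to the prediction sample;
calculates the average of the path depths of the found leaf nodes across the random binary trees, and normalizes the average to obtain the risk score of the URL access request.
In this example, the information includes domain name information and/or URL parameters; the features of several dimensions include features extracted from the domain name information carried in the URL access request, and/or features extracted from the URL parameters carried in the URL access request.
In this example, the features include a combination of several of the following: total number of characters, total number of letters, total number of digits, total number of symbols, number of distinct characters, number of distinct letters, number of distinct digits, and number of distinct symbols.
For details of how the functions and roles of the modules in the above apparatus are implemented, see the implementation of the corresponding steps in the above method, which is not repeated here.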
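Since the apparatus embodiment basically corresponds to the method embodiment, the relevant parts of the description of the method embodiment can be referred to. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
The systems, apparatuses, modules, or units set out in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Corresponding to the above method embodiment, this specification also provides an embodiment of an electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions, wherein the processor and the memory are usually connected to each other through an internal bus. In other possible implementations, the device may also include an external interface so that it can communicate with other devices or components.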
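In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of URL attack detection, the processor is caused to:
extract features of several dimensions from information carried in a URL access request;
input the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm;
determine, based on the risk score, whether the URL access request is a URL attack request.
In this example, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of URL attack detection, the processor is further caused to:
extract features of several dimensions from the information carried in each of several URL access request samples, wherein none of the URL access request samples is labeled with a sample label;
construct several training samples based on the extracted features;
train on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model.
In this embodiment, the URL attack detection model includes M random binary trees trained based on the Isolation Forest machine learning algorithm;
by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of URL attack detection, the processor is further caused to:
construct M training sample subsets from training samples uniformly sampled from the several training samples;
randomly select, from the features of the several dimensions, one classification feature as a root node for each training sample subset, and randomly select a classification threshold for each training sample subset within the value interval formed by the maximum and minimum values of the classification feature;
classify, in each training sample subset, the training samples whose value of the classification feature is greater than the classification threshold and the training samples whose value of the classification feature is less than the classification threshold as respective leaf nodes of the root node; and
take the training samples in each leaf node as a new training sample subset and iteratively perform the above classification process until the training samples in the resulting leaf nodes can no longer be classified.
In this example, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of URL attack detection, the processor is further caused to:
construct a prediction sample based on the extracted features;
traverse each random binary tree from its root node, based on the value of each feature in the prediction sample, to find the leaf node corresponding to the prediction sample;
calculate the average of the path depths of the found leaf nodes across the random binary trees, and normalize the average to obtain the risk score of the URL access request.
In this example, the information includes domain name information and/or URL parameters; the features of several dimensions include features extracted from the domain name information carried in the URL access request, and/or features extracted from the URL parameters carried in the URL access request.
In this example, the extracted features of several dimensions include a combination of several of the following: the total number of characters, the total number of letters, the total number of digits, the total number of symbols, the number of distinct characters, the number of distinct letters, the number of distinct digits, and the number of distinct symbols of the information.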
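Other embodiments of this specification will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of this specification is limited only by the appended claims.
The above are only preferred embodiments of this specification and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.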
Claims (13)
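- A URL attack detection method, the method comprising: extracting features of several dimensions from information carried in a URL access request; inputting the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm; and determining, based on the risk score, whether the URL access request is a URL attack request.
- The method according to claim 1, further comprising: extracting features of several dimensions from the information carried in each of several URL access request samples, wherein none of the URL access request samples is labeled with a sample label; constructing several training samples based on the extracted features; and training on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model.
- The method according to claim 2, wherein the URL attack detection model comprises M random binary trees trained based on the Isolation Forest machine learning algorithm; and training on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model comprises: constructing M training sample subsets from training samples uniformly sampled from the several training samples; randomly selecting, from the features of the several dimensions, one classification feature as a root node for each training sample subset, and randomly selecting a classification threshold for each training sample subset within the value interval formed by the maximum and minimum values of the classification feature; classifying, in each training sample subset, the training samples whose value of the classification feature is greater than or equal to the classification threshold and the training samples whose value of the classification feature is less than the classification threshold as respective leaf nodes of the root node; and taking the training samples in each leaf node as a new training sample subset and iteratively performing the above classification process until the training samples in the resulting leaf nodes can no longer be classified.
- The method according to claim 3, wherein inputting the extracted features into the preset URL attack detection model for prediction calculation to obtain the risk score of the URL access request comprises: constructing a prediction sample based on the extracted features; traversing each random binary tree from its root node, based on the value of each feature in the prediction sample, to find the leaf node corresponding to the prediction sample; and calculating the average of the path depths of the found leaf nodes across the random binary trees, and normalizing the average to obtain the risk score of the URL access request.
- The method according to claim 1, wherein the information comprises domain name information and/or URL parameters; and the features of several dimensions comprise features extracted from the domain name information carried in the URL access request, and/or features extracted from the URL parameters carried in the URL access request.
- The method according to claim 5, wherein the features comprise a combination of several of the following: total number of characters, total number of letters, total number of digits, total number of symbols, number of distinct characters, number of distinct letters, number of distinct digits, and number of distinct symbols.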
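- A URL attack detection apparatus, the apparatus comprising: a first extraction module, which extracts features of several dimensions from information carried in a URL access request; a calculation module, which inputs the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm; and a determination module, which determines, based on the risk score, whether the URL access request is a URL attack request.
- The apparatus according to claim 7, further comprising: a second extraction module, which extracts features of several dimensions from the information carried in each of several URL access request samples, wherein none of the URL access request samples is labeled with a sample label; a construction module, which constructs several training samples based on the extracted features; and a training module, which trains on the several training samples based on the Isolation Forest machine learning algorithm to obtain the URL attack detection model.
- The apparatus according to claim 8, wherein the URL attack detection model comprises M random binary trees trained based on the Isolation Forest machine learning algorithm; and the training module: constructs M training sample subsets from training samples uniformly sampled from the several training samples; randomly selects, from the features of the several dimensions, one classification feature as a root node for each training sample subset, and randomly selects a classification threshold for each training sample subset within the value interval formed by the maximum and minimum values of the classification feature; classifies, in each training sample subset, the training samples whose value of the classification feature is greater than the classification threshold and the training samples whose value of the classification feature is less than the classification threshold as respective leaf nodes of the root node; and takes the training samples in each leaf node as a new training sample subset and iteratively performs the above classification process until the training samples in the resulting leaf nodes can no longer be classified.
- The apparatus according to claim 9, wherein the calculation module: constructs a prediction sample based on the extracted features; traverses each random binary tree from its root node, based on the value of each feature in the prediction sample, to find the leaf node corresponding to the prediction sample; and calculates the average of the path depths of the found leaf nodes across the random binary trees, and normalizes the average to obtain the risk score of the URL access request.
- The apparatus according to claim 7, wherein the information comprises domain name information and/or URL parameters; and the features of several dimensions comprise features extracted from the domain name information carried in the URL access request, and/or features extracted from the URL parameters carried in the URL access request.
- The apparatus according to claim 11, wherein the features comprise a combination of several of the following: total number of characters, total number of letters, total number of digits, total number of symbols, number of distinct characters, number of distinct letters, number of distinct digits, and number of distinct symbols.
- An electronic device, comprising: a processor; and a memory for storing machine-executable instructions; wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of URL attack detection, the processor is caused to: extract features of several dimensions from information carried in a URL access request; input the extracted features into a preset URL attack detection model for prediction calculation to obtain a risk score of the URL access request, wherein the URL attack detection model is a machine learning model trained based on the Isolation Forest machine learning algorithm; and determine, based on the risk score, whether the URL access request is a URL attack request.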
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP18893619.9A EP3651043B1 (en) | 2017-12-28 | 2018-11-19 | Url attack detection method and apparatus, and electronic device |
| ES18893619T ES2878330T3 (es) | 2017-12-28 | 2018-11-19 | Método y aparato de detección de ataques de URL y dispositivo electrónico |
| PL18893619T PL3651043T3 (pl) | 2017-12-28 | 2018-11-19 | Sposób i aparat do wykrywania atakowania URL oraz urządzenie elektroniczne |
| SG11202001369TA SG11202001369TA (en) | 2017-12-28 | 2018-11-19 | Url attack detection method and apparatus, and electronic device |
| US16/802,147 US10785241B2 (en) | 2017-12-28 | 2020-02-26 | URL attack detection method and apparatus, and electronic device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711463325.3 | 2017-12-28 | ||
| CN201711463325.3A CN108229156A (zh) | 2017-12-28 | 2017-12-28 | Url攻击检测方法、装置以及电子设备 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/802,147 Continuation US10785241B2 (en) | 2017-12-28 | 2020-02-26 | URL attack detection method and apparatus, and electronic device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019128529A1 true WO2019128529A1 (zh) | 2019-07-04 |
Family
ID=62645792
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/116100 Ceased WO2019128529A1 (zh) | 2017-12-28 | 2018-11-19 | Url攻击检测方法、装置以及电子设备 |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US10785241B2 (zh) |
| EP (1) | EP3651043B1 (zh) |
| CN (1) | CN108229156A (zh) |
| ES (1) | ES2878330T3 (zh) |
| PL (1) | PL3651043T3 (zh) |
| SG (1) | SG11202001369TA (zh) |
| TW (1) | TWI706273B (zh) |
| WO (1) | WO2019128529A1 (zh) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111371794A (zh) * | 2020-03-09 | 2020-07-03 | 北京金睛云华科技有限公司 | 阴影域检测模型、检测模型建立方法、检测方法及系统 |
| CN111970272A (zh) * | 2020-08-14 | 2020-11-20 | 上海境领信息科技有限公司 | 一种apt攻击操作识别方法 |
| CN112398875A (zh) * | 2021-01-18 | 2021-02-23 | 北京电信易通信息技术股份有限公司 | 视频会议场景下基于机器学习的流数据安全漏洞探测方法 |
| CN114499917A (zh) * | 2021-10-25 | 2022-05-13 | 中国银联股份有限公司 | Cc攻击检测方法及cc攻击检测装置 |
Families Citing this family (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108229156A (zh) * | 2017-12-28 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Url攻击检测方法、装置以及电子设备 |
| CN108366071B (zh) * | 2018-03-06 | 2020-06-23 | 阿里巴巴集团控股有限公司 | Url异常定位方法、装置、服务器及存储介质 |
| CN108769079A (zh) * | 2018-07-09 | 2018-11-06 | 四川大学 | 一种基于机器学习的Web入侵检测技术 |
| CN110912861B (zh) * | 2018-09-18 | 2022-02-15 | 北京数安鑫云信息技术有限公司 | 一种深度追踪团伙攻击行为的ai检测方法和装置 |
| CN109714341A (zh) * | 2018-12-28 | 2019-05-03 | 厦门服云信息科技有限公司 | 一种Web恶意攻击识别方法、终端设备及存储介质 |
| US11368486B2 (en) * | 2019-03-12 | 2022-06-21 | Fortinet, Inc. | Determining a risk probability of a URL using machine learning of URL segments |
| CN110398375B (zh) * | 2019-07-16 | 2021-10-19 | 广州亚美信息科技有限公司 | 车辆冷却系统工作状态的监测方法、装置、设备和介质 |
| CN111162961B (zh) * | 2019-12-05 | 2021-12-31 | 任子行网络技术股份有限公司 | 发现移动应用主控服务器的方法、系统及可读存储介质 |
| US12488058B2 (en) * | 2020-06-02 | 2025-12-02 | Zscaler, Inc. | Phishing detection of uncategorized URLs using heuristics and scanning |
| CN113032774B (zh) * | 2019-12-25 | 2024-06-07 | 中移动信息技术有限公司 | 异常检测模型的训练方法、装置、设备及计算机存储介质 |
| US11748629B2 (en) * | 2020-01-21 | 2023-09-05 | Moxa Inc. | Device and method of handling anomaly detection |
| US11768945B2 (en) * | 2020-04-07 | 2023-09-26 | Allstate Insurance Company | Machine learning system for determining a security vulnerability in computer software |
| CN114257565B (zh) * | 2020-09-10 | 2023-09-05 | 中国移动通信集团广东有限公司 | 挖掘潜在威胁域名的方法、系统和服务器 |
| KR102682746B1 (ko) * | 2021-05-18 | 2024-07-12 | 한국전자통신연구원 | 비휘발성 메모리 공격 취약점 탐지 장치 및 방법 |
| WO2022251462A1 (en) * | 2021-05-27 | 2022-12-01 | Google Llc | Unsupervised anomaly detection with self-trained classification |
| CN113361597B (zh) * | 2021-06-04 | 2023-07-21 | 北京天融信网络安全技术有限公司 | 一种url检测模型的训练方法、装置、电子设备和存储介质 |
| TWI774582B (zh) | 2021-10-13 | 2022-08-11 | 財團法人工業技術研究院 | 惡意超文本傳輸協定請求的偵測裝置和偵測方法 |
| CN114416972B (zh) * | 2021-12-10 | 2022-10-14 | 厦门市世纪网通网络服务有限公司 | 一种基于密度改善不平衡样本的dga域名检测方法 |
| CN114338593B (zh) * | 2021-12-23 | 2023-07-04 | 上海观安信息技术股份有限公司 | 利用地址解析协议进行网络扫描的行为检测方法及装置 |
| CN114553496B (zh) * | 2022-01-28 | 2022-11-15 | 中国科学院信息工程研究所 | 基于半监督学习的恶意域名检测方法及装置 |
| CN114443338B (zh) * | 2022-01-28 | 2025-04-11 | 北京轩宇空间科技有限公司 | 面向稀疏负样本的异常检测方法、模型构建方法及装置 |
| US12592958B2 (en) * | 2022-03-30 | 2026-03-31 | Intel Corporation | Flexible deterministic finite automata (DFA) tokenizer for AI-based malicious traffic detection |
| CN116015721A (zh) * | 2022-11-30 | 2023-04-25 | 国网浙江省电力有限公司杭州供电公司 | 一种违规外联检测方法、系统、电子设备及介质 |
| CN116248340B (zh) * | 2022-12-26 | 2026-02-17 | 北京百度网讯科技有限公司 | 接口攻击的检测方法、装置、电子设备及存储介质 |
| CN116861128A (zh) * | 2023-07-17 | 2023-10-10 | 广州百蕴启辰科技有限公司 | 一种基于模拟访问的网站风险评估方法、装置及可存储介质 |
| CN116962059A (zh) * | 2023-08-01 | 2023-10-27 | 中国电信股份有限公司技术创新中心 | Web攻击检测方法、装置、设备和非瞬态存储介质 |
| CN117494185B (zh) * | 2023-10-07 | 2024-05-14 | 联通(广东)产业互联网有限公司 | 数据库访问控制方法及装置、系统、设备、存储介质 |
| CN119538274A (zh) * | 2024-10-12 | 2025-02-28 | 杭州高新区(滨江)区块链与数据安全研究院 | 数据防护方法、装置、设备及存储介质 |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110185425A1 (en) * | 2010-01-22 | 2011-07-28 | National Taiwan University Of Science & Technology | Network attack detection devices and methods |
| CN104537303A (zh) * | 2014-12-30 | 2015-04-22 | 中国科学院深圳先进技术研究院 | 一种钓鱼网站鉴别系统及鉴别方法 |
| CN104735074A (zh) * | 2015-03-31 | 2015-06-24 | 江苏通付盾信息科技有限公司 | 一种恶意url检测方法及其实现系统 |
| CN107346388A (zh) * | 2017-07-03 | 2017-11-14 | 四川无声信息技术有限公司 | Web攻击检测方法及装置 |
| CN107577945A (zh) * | 2017-09-28 | 2018-01-12 | 阿里巴巴集团控股有限公司 | Url攻击检测方法、装置以及电子设备 |
| CN107992741A (zh) * | 2017-10-24 | 2018-05-04 | 阿里巴巴集团控股有限公司 | 一种模型训练方法、检测url的方法及装置 |
| CN108111489A (zh) * | 2017-12-07 | 2018-06-01 | 阿里巴巴集团控股有限公司 | Url攻击检测方法、装置以及电子设备 |
| CN108229156A (zh) * | 2017-12-28 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Url攻击检测方法、装置以及电子设备 |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8306942B2 (en) * | 2008-05-06 | 2012-11-06 | Lawrence Livermore National Security, Llc | Discriminant forest classification method and system |
| US8521667B2 (en) * | 2010-12-15 | 2013-08-27 | Microsoft Corporation | Detection and categorization of malicious URLs |
| US9491187B2 (en) * | 2013-02-15 | 2016-11-08 | Qualcomm Incorporated | APIs for obtaining device-specific behavior classifier models from the cloud |
| US9178901B2 (en) * | 2013-03-26 | 2015-11-03 | Microsoft Technology Licensing, Llc | Malicious uniform resource locator detection |
| US9904893B2 (en) * | 2013-04-02 | 2018-02-27 | Patternex, Inc. | Method and system for training a big data machine to defend |
| US9635050B2 (en) * | 2014-07-23 | 2017-04-25 | Cisco Technology, Inc. | Distributed supervised architecture for traffic segregation under attack |
| CN106341377A (zh) * | 2015-07-15 | 2017-01-18 | 威海捷讯通信技术有限公司 | 一种Web服务器免受攻击的方法及装置 |
| CN105357221A (zh) * | 2015-12-04 | 2016-02-24 | 北京奇虎科技有限公司 | 识别钓鱼网站的方法及装置 |
| US9838407B1 (en) * | 2016-03-30 | 2017-12-05 | EMC IP Holding Company LLC | Detection of malicious web activity in enterprise computer networks |
| AU2017281232B2 (en) * | 2016-06-22 | 2020-02-13 | Invincea, Inc. | Methods and apparatus for detecting whether a string of characters represents malicious activity using machine learning |
| CN106789888B (zh) * | 2016-11-18 | 2020-08-04 | 重庆邮电大学 | 一种多特征融合的钓鱼网页检测方法 |
| JP6782679B2 (ja) * | 2016-12-06 | 2020-11-11 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | 情報処理装置、情報処理方法及びプログラム |
| CN106960358A (zh) * | 2017-01-13 | 2017-07-18 | 重庆小富农康农业科技服务有限公司 | 一种基于农村电子商务大数据深度学习的金融欺诈行为量化检测系统 |
| US10909471B2 (en) * | 2017-03-24 | 2021-02-02 | Microsoft Technology Licensing, Llc | Resource-efficient machine learning |
| US11521108B2 (en) * | 2018-07-30 | 2022-12-06 | Microsoft Technology Licensing, Llc | Privacy-preserving labeling and classification of email |
- 2017
  - 2017-12-28 CN CN201711463325.3A patent/CN108229156A/zh active Pending
- 2018
  - 2018-10-18 TW TW107136689A patent/TWI706273B/zh not_active IP Right Cessation
  - 2018-11-19 WO PCT/CN2018/116100 patent/WO2019128529A1/zh not_active Ceased
  - 2018-11-19 PL PL18893619T patent/PL3651043T3/pl unknown
  - 2018-11-19 SG SG11202001369TA patent/SG11202001369TA/en unknown
  - 2018-11-19 EP EP18893619.9A patent/EP3651043B1/en active Active
  - 2018-11-19 ES ES18893619T patent/ES2878330T3/es active Active
- 2020
  - 2020-02-26 US US16/802,147 patent/US10785241B2/en active Active
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110185425A1 (en) * | 2010-01-22 | 2011-07-28 | National Taiwan University Of Science & Technology | Network attack detection devices and methods |
| CN104537303A (zh) * | 2014-12-30 | 2015-04-22 | 中国科学院深圳先进技术研究院 | Phishing website identification system and identification method |
| CN104735074A (zh) * | 2015-03-31 | 2015-06-24 | 江苏通付盾信息科技有限公司 | Malicious URL detection method and system for implementing the same |
| CN107346388A (zh) * | 2017-07-03 | 2017-11-14 | 四川无声信息技术有限公司 | Web attack detection method and device |
| CN107577945A (zh) * | 2017-09-28 | 2018-01-12 | 阿里巴巴集团控股有限公司 | URL attack detection method, apparatus, and electronic device |
| CN107992741A (zh) * | 2017-10-24 | 2018-05-04 | 阿里巴巴集团控股有限公司 | Model training method, and method and device for detecting URLs |
| CN108111489A (zh) * | 2017-12-07 | 2018-06-01 | 阿里巴巴集团控股有限公司 | URL attack detection method, apparatus, and electronic device |
| CN108229156A (zh) * | 2017-12-28 | 2018-06-29 | 阿里巴巴集团控股有限公司 | URL attack detection method, apparatus, and electronic device |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3651043A4 * |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
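| CN111371794A (zh) * | 2020-03-09 | 2020-07-03 | 北京金睛云华科技有限公司 | Shadow domain detection model, detection model building method, detection method, and system |
| CN111371794B (zh) * | 2020-03-09 | 2022-01-18 | 北京金睛云华科技有限公司 | Shadow domain detection model, detection model building method, detection method, and system |
| CN111970272A (zh) * | 2020-08-14 | 2020-11-20 | 上海境领信息科技有限公司 | APT attack operation identification method |
| CN112398875A (zh) * | 2021-01-18 | 2021-02-23 | 北京电信易通信息技术股份有限公司 | Machine-learning-based method for detecting security vulnerabilities in streaming data in video conference scenarios |
| CN114499917A (zh) * | 2021-10-25 | 2022-05-13 | 中国银联股份有限公司 | CC attack detection method and CC attack detection device |
| CN114499917B (zh) * | 2021-10-25 | 2024-01-09 | 中国银联股份有限公司 | CC attack detection method and CC attack detection device |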
Also Published As
| Publication number | Publication date |
|---|---|
| PL3651043T3 (pl) | 2021-10-04 |
| EP3651043B1 (en) | 2021-04-14 |
| TW201931187A (zh) | 2019-08-01 |
| ES2878330T3 (es) | 2021-11-18 |
| US20200195667A1 (en) | 2020-06-18 |
| US10785241B2 (en) | 2020-09-22 |
| CN108229156A (zh) | 2018-06-29 |
| TWI706273B (zh) | 2020-10-01 |
| SG11202001369TA (en) | 2020-03-30 |
| EP3651043A1 (en) | 2020-05-13 |
| EP3651043A4 (en) | 2020-07-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
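| TWI706273B (zh) | | Uniform resource locator (URL) attack detection method, apparatus, and electronic device |
| TWI673625B (zh) | | Uniform resource locator (URL) attack detection method, apparatus, and electronic device |
| CN111382434B (zh) | | System and method for detecting malicious files |
| CN107577945B (zh) | | URL attack detection method, apparatus, and electronic device |
| CN114730339A (zh) | | Detecting unknown malicious content in computer systems |
| CN113469366B (zh) | | Method, apparatus, and device for identifying encrypted traffic |
| JP2020505707A (ja) | | Continuous learning for intrusion detection |
| US11206277B1 (en) | | Method and apparatus for detecting abnormal behavior in network |
| CN114637993A (zh) | | Malicious code package detection method and apparatus, computer device, and storage medium |
| Rasheed et al. | | Adversarial attacks on featureless deep learning malicious URLs detection |
| CN114091019B (zh) | | Methods and apparatus for dataset construction, malware identification, and identification model construction |
| CN108156127B (zh) | | Network attack pattern determination apparatus, determination method, and computer-readable storage medium thereof |
| CN116962009A (zh) | | Network attack detection method and device |
| CN120910866A (zh) | | Method and device for discovering model security vulnerabilities based on manipulation of endogenous mechanisms of large models |
| CN111783088B (zh) | | Malicious code family clustering method, apparatus, and computer device |
| Zhu et al. | | Effective phishing website detection based on improved BP neural network and dual feature evaluation |
| CN114398887B (zh) | | Text classification method, apparatus, and electronic device |
| CN116911294A (zh) | | Method, apparatus, device, and medium for identifying sensitive fields |
| CN110197066B (zh) | | Virtual machine monitoring method and monitoring system in a cloud computing environment |
| CN114282209A (zh) | | Threat data generation method, system, and storage medium |
| CN107770129A (zh) | | Method and apparatus for detecting user behavior |
| CN120768695B (zh) | | Training method and apparatus for an alarm information processing model, and electronic device |
| CN118368097B (zh) | | Intelligence detection method, apparatus, device, and storage medium |
| Alghofaili et al. | | Web-based attacks detection using deep learning techniques: a comprehensive review |
| CN116582361B (zh) | | Vulnerability attack detection method, apparatus, device, and storage medium |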
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18893619; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2018893619; Country of ref document: EP; Effective date: 20200205 |
| | NENP | Non-entry into the national phase | Ref country code: DE |