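WO2016145676A1 - Big data processing method based on deep learning model satisfying k-degree sparse constraint - Google Patents

Big data processing method based on deep learning model satisfying k-degree sparse constraint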

Info

Publication number
WO2016145676A1
WO2016145676A1 PCT/CN2015/075473 CN2015075473W WO2016145676A1 WO 2016145676 A1 WO2016145676 A1 WO 2016145676A1 CN 2015075473 W CN2015075473 W CN 2015075473W WO 2016145676 A1 WO2016145676 A1 WO 2016145676A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
deep learning
learning model
input
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2015/075473
Other languages
English (en)
French (fr)
Inventor
盛益强
王劲林
邓浩江
尤佳莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Shanghai 3Ntv Network Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Shanghai 3Ntv Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Shanghai 3Ntv Network Technology Co Ltd
Priority to US15/557,469 (US11048998B2)
Priority to JP2017548139A (JP6466590B2)
Priority to EP15885059.4A (EP3282401A4)
Publication of WO2016145676A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning

Definitions

  • The present invention relates to the fields of artificial intelligence and big data, and more particularly to a big data processing method based on a deep learning model that satisfies a K-degree constraint.
  • In 2006, Hinton et al. proposed a layer-by-layer initialization training method for deep belief networks, which was the starting point of deep learning research.
  • This method ended a situation, lasting several decades, in which deep neural networks were hard to train and performed poorly.
  • Since then, deep learning algorithms have replaced traditional algorithms and have been widely used in image recognition, speech recognition, and natural language understanding.
  • Deep learning simulates the hierarchical abstraction of the human brain and maps low-level data layer by layer to obtain more abstract features. Because it can extract features automatically from big data and achieve good results through training on massive numbers of samples, it has received extensive attention.
  • The rapid growth of big data and the research breakthroughs of deep learning are complementary.
  • The rapid growth of big data requires a method to efficiently process massive data.
  • The training of deep learning models requires massive sample data. In short, big data allows deep learning to reach its full performance.
  • However, existing deep learning models still have many serious problems: the models are hard to scale, parameter optimization is difficult, training takes too long, and inference is inefficient.
  • A 2013 review paper by Bengio summarizes the challenges and difficulties facing deep learning, including how to scale existing deep learning models and apply them to larger data sets, how to reduce the difficulty of parameter optimization, how to avoid expensive inference and sampling, and how to disentangle the factors of variation.
  • The object of the present invention is to overcome the above problems of existing neural network deep learning models in big data applications by proposing a deep learning model that satisfies a K-degree sparse constraint. By constraining the forward out-degree of the neuron nodes in each layer, the model simplifies its structure, improves its training speed and generalization ability, and eases the difficulty of parameter optimization. Applied to big data processing, the model can reduce the difficulty of big data processing and increase its speed.
  • The present invention proposes a big data processing method based on a deep learning model satisfying a K-degree sparse constraint, the method comprising:
  • Step 1) Construct a deep learning model satisfying the K-degree sparse constraint from unlabeled training samples by the gradient pruning method;
  • the K-degree sparse constraint includes a node K-degree sparse constraint and a hierarchical K-degree sparse constraint;
  • the node K-degree sparse constraint means that the forward out-degree of every node in the model does not exceed K;
  • the value range of K is (1, N/H], where N is the number of all nodes in the deep learning model and H is the number of hidden layers of the model;
  • the hierarchical K-degree sparse constraint means that the sum of the forward out-degrees of all nodes of the h-th layer is smaller than the sum of the forward out-degrees of all nodes of the (h-1)-th layer;
  • Step 2) Input the updated training samples into the deep learning model satisfying the K-degree sparse constraint and optimize the weight parameters of each layer of the model, thereby obtaining an optimized deep learning model satisfying the K-degree sparse constraint;
  • Step 3) Input the big data to be processed into the optimized deep learning model satisfying the K-degree sparse constraint, and finally output the processing result.
  • the value of the K is:
  • d in is the dimension of the model input
  • d out is the dimension of the model output
  • H is the number of layers of the model hidden layer
  • [] is the rounding symbol.
  • step 1) in the method further includes:
  • the deep learning model includes an input layer, H hidden layers, and an output layer.
  • the input layer to the output layer includes a total of H+2 layers; the input layer is numbered 0, the first hidden layer is numbered 1, and so on.
  • the output layer is numbered H+1;
  • Step 103) Input the unlabeled training sample set into the h-th layer and, while minimizing the cost function of the h-th layer and the (h+1)-th layer, adjust the connection weights between the h-th layer and the (h+1)-th layer and the bias weights of the (h+1)-th layer nodes;
  • Step 104) When a connection weight is smaller than the first threshold, decide whether to delete the connection by means of a probability function of the change in reconstruction error;
  • If the weight of a connection decays below the first threshold, the samples are reconstructed in two cases, with and without the current connection, to obtain the change in reconstruction error $\Delta E_r$, and the probability function $\min[1, \exp(-\Delta E_r / E_r)]$ of this change is used to decide whether to delete the current connection;
  • Step 105) Determine whether the forward out-degrees of all nodes in the h-th layer are all less than K; if so, go to step 106); otherwise, go to step 103);
  • Step 106) If h > 0, determine whether the sum of the forward out-degrees of all nodes in the h-th layer is smaller than the sum of the forward out-degrees of all nodes in the (h-1)-th layer; if so, go to step 107); otherwise, go to step 103);
  • Step 107) Determine whether the change in the cost function is less than the second threshold; if so, go to step 108); otherwise, go to step 103);
  • Step 108) Determine whether h > H holds; if so, the process of step 1) ends; otherwise, go to step 102);
  • step 2) in the method is:
  • The method of the invention can overcome shortcomings of existing neural network models such as excessively long training time and difficult parameter optimization; improve the scalability, generalization ability, and execution speed of existing neural network models such as deep feedforward neural networks and deep belief networks; and ease the difficulty of unsupervised learning and parameter optimization, thereby reducing the difficulty of applying deep learning algorithms to big data processing.
  • FIG. 1 is a schematic diagram of a non-hierarchical K-degree sparse network and its node degree sparse constraint
  • FIG. 2 is a schematic diagram of a hierarchical K-degree sparse network and its hierarchical sparse constraints
  • FIG. 3 is a flow chart of the big data processing method of the present invention based on a deep learning model satisfying the K-degree sparse constraint.
  • A non-hierarchical K-degree sparse network is one in which all nodes satisfy the node K-degree sparse constraint. The node K-degree sparse constraint means deleting unnecessary connections between nodes until the forward out-degree $K_i$ of every node does not exceed K, where K is a set parameter. Forward refers to the direction from input to output; if there are hidden layers, it is the direction from the input through the hidden layers to the output.
  • A trained hierarchical K-degree sparse network is one in which all layers satisfy the hierarchical K-degree sparse constraint.
  • The hierarchical K-degree sparse constraint means that the layer forward out-degree of the hidden layers, that is, the sum of the forward out-degrees of the nodes of a single hidden layer, decreases monotonically from input to output.
  • As a special case of the hierarchical K-degree sparse constraint, if the nodes within each layer all have the same forward out-degree, then the product of the number of nodes in a layer and that forward out-degree decreases monotonically from input to output.
  • A node K-degree sparse network is a neural network model that satisfies $k_i \le k$.
  • A hierarchical K-degree sparse network is a neural network model that satisfies the hierarchical constraint [formula image not reproduced].
  • A hierarchical upper-limit K-degree sparse network is a neural network model that satisfies the corresponding upper-limit condition [formula image not reproduced].
  • the mathematical language is used below to describe a neural network model that satisfies the K-degree sparse constraint.
  • $x_j = f(\sum_i w_{ij} x_i + b_j)$, where $x_i \in X$, $x_j \in X$
  • $x_j$ is the output of any node
  • f is the activation function of the node
  • $b_j$ is the bias weight of the node
  • $w_{ij}$ is the input weight connected to the node; weights with a value of zero are allowed.
  • The forward direction of the entire neural network model is defined as the direction from the external input to the output.
  • The output of any node is fed forward to $K_i$ nodes, with $K_i \le K$.
  • K is a hyperparameter, usually somewhat smaller than the N of the fully connected case, or even much smaller, in order to achieve sparsity. The value range of K is (1, N/H], where N is the number of all nodes in the deep learning model and H is the number of hidden layers of the model. Preferably, the value of K is [formula image not reproduced], where:
  • $d_{in}$ is the dimension of the model input
  • $d_{out}$ is the dimension of the model output
  • H is the number of hidden layers of the model
  • [] is the rounding symbol.
  • $K^{(h)}$ is the maximum forward out-degree of the nodes of the h-th hidden layer. $K^{(h)}$ may differ from one hidden layer to another, but the value of K remains unchanged.
  • the present invention provides a big data processing method based on a deep learning model satisfying a K-degree sparse constraint, the method comprising:
  • Step 1) Construct a deep learning model satisfying the K-degree sparse constraint with the unlabeled training samples by the gradient pruning method
  • the step 1) further includes:
  • the deep learning model includes an input layer, H hidden layers, and an output layer.
  • the input layer to the output layer includes a total of H+2 layers; the input layer is numbered 0, the first hidden layer is numbered 1, and so on.
  • the output layer is numbered H+1;
  • Step 103) Input the unlabeled training sample set into the h-th layer and, while minimizing the cost function of the h-th layer and the (h+1)-th layer, adjust the connection weights between the h-th layer and the (h+1)-th layer and the bias weights of the (h+1)-th layer nodes;
  • Step 104) When a connection weight is smaller than the first threshold, decide whether to delete the connection by means of a probability function of the change in reconstruction error;
  • If the weight of a connection decays below the first threshold, the samples are reconstructed in two cases, with and without the current connection, to obtain the change in reconstruction error $\Delta E_r$, and the probability function $\min[1, \exp(-\Delta E_r / E_r)]$ of this change is used to decide whether to delete the current connection;
  • Step 105) Determine whether the forward out-degrees of all nodes in the h-th layer are all less than K; if so, go to step 106); otherwise, go to step 103);
  • Step 106) If h > 0, determine whether the sum of the forward out-degrees of all nodes in the h-th layer is smaller than the sum of the forward out-degrees of all nodes in the (h-1)-th layer; if so, go to step 107); otherwise, go to step 103);
  • Step 107) Determine whether the change in the cost function is less than the second threshold; if so, go to step 108); otherwise, go to step 103);
  • Step 108) Determine whether h > H holds; if so, the process of step 1) ends; otherwise, go to step 102).
  • Step 2) input the updated training sample into the deep learning model satisfying the K-degree sparse constraint, and optimize the weight parameters of each layer of the model; and then obtain an optimized deep learning model satisfying the K-degree sparse constraint;
  • Step 3) Input the big data to be processed into the optimized deep learning model satisfying the K-degree sparse constraint, and finally output the processing result.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

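The present invention proposes a big data processing method based on a deep learning model satisfying a K-degree sparse constraint. The method comprises: step 1) constructing, by a gradient pruning method, a deep learning model satisfying the K-degree sparse constraint from unlabeled training samples, where the K-degree sparse constraint comprises a node K-degree sparse constraint and a hierarchical K-degree sparse constraint; step 2) inputting updated training samples into the deep learning model satisfying the K-degree sparse constraint and optimizing the weight parameters of each layer of the model, thereby obtaining an optimized deep learning model satisfying the K-degree sparse constraint; and step 3) inputting the big data to be processed into the optimized deep learning model for processing and finally outputting the processing result. The method of the present invention can reduce the difficulty of big data processing and increase its speed.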

Description

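Big data processing method based on a deep learning model satisfying a K-degree sparse constraint
Technical Field
The present invention relates to the fields of artificial intelligence and big data, and in particular to a big data processing method based on a deep learning model satisfying a K-degree constraint.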
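Background
With the rapid development of network technology, the volume and diversity of data are growing quickly, while the complexity of data processing algorithms is hard to improve, so processing big data efficiently has become an urgent problem. Existing methods, which rely on personal experience and manual work to describe data, label data, select features, extract features, and process data, can hardly keep up with the rapid growth of big data. The rapid development of artificial intelligence, and in particular the research breakthroughs in deep learning algorithms, points to a direction worth exploring for solving big data processing problems.
In 2006, Hinton et al. proposed a layer-by-layer initialization training method for deep belief networks. This was the starting point of deep learning research, and it ended a situation that had lasted for decades in which deep neural networks were hard to train and performed poorly. Since then, deep learning algorithms have replaced traditional algorithms and been widely applied in fields such as image recognition, speech recognition, and natural language understanding. Deep learning simulates the hierarchical abstraction of the human brain and maps low-level data layer by layer to obtain more abstract features. Because it can extract features automatically from big data and achieve good results through training on massive numbers of samples, it has received wide attention. In fact, the rapid growth of big data and the breakthroughs in deep learning complement each other: the growth of big data calls for a method that processes massive data efficiently, while training deep learning models requires massive sample data. In short, big data allows deep learning to reach its full performance.
However, existing deep learning models still have many serious problems, for example: the models are hard to scale, parameter optimization is difficult, training takes too long, and inference is inefficient. A 2013 review paper by Bengio summarized the challenges and difficulties facing deep learning, including how to scale existing deep learning models and apply them to larger data sets, how to reduce the difficulty of parameter optimization, how to avoid expensive inference and sampling, and how to disentangle the factors of variation.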
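Summary of the Invention
The object of the present invention is to overcome the above problems of existing neural network deep learning models in big data applications by proposing a deep learning model that satisfies a K-degree sparse constraint. By constraining the forward out-degree of the neuron nodes in each layer, the model simplifies its structure, improves its training speed and generalization ability, and eases the difficulty of parameter optimization. Applied to big data processing, the model can reduce the difficulty of big data processing and increase its speed.
To achieve the above object, the present invention proposes a big data processing method based on a deep learning model satisfying a K-degree sparse constraint, the method comprising:
Step 1) constructing, by a gradient pruning method, a deep learning model satisfying the K-degree sparse constraint from unlabeled training samples; the K-degree sparse constraint comprises a node K-degree sparse constraint and a hierarchical K-degree sparse constraint; the node K-degree sparse constraint means that the forward out-degree of every node in the model does not exceed K; the value range of K is (1, N/H], where N is the number of all nodes in the deep learning model and H is the number of hidden layers of the model; the hierarchical K-degree sparse constraint means that the sum of the forward out-degrees of all nodes in the h-th layer is less than the sum of the forward out-degrees of all nodes in the (h-1)-th layer;
Step 2) inputting the updated training samples into the deep learning model satisfying the K-degree sparse constraint and optimizing the weight parameters of each layer of the model, thereby obtaining an optimized deep learning model satisfying the K-degree sparse constraint;
Step 3) inputting the big data to be processed into the optimized deep learning model satisfying the K-degree sparse constraint for processing, and finally outputting the processing result.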
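In the above technical solution, the value of K is:
Figure PCTCN2015075473-appb-000001
where $d_{in}$ is the dimension of the model input, $d_{out}$ is the dimension of the model output, H is the number of hidden layers of the model, and [] is the rounding symbol.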
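In the above technical solution, step 1) of the method further comprises:
Step 101) numbering the layers of the deep learning model in order from the input layer to the output layer, and setting h = -1;
The deep learning model is assumed to comprise an input layer, H hidden layers, and an output layer, for a total of H+2 layers from the input layer to the output layer; the input layer is numbered 0, the first hidden layer is numbered 1, and so on, and the output layer is numbered H+1;
Step 102) setting h = h+1 and initializing the parameters of the h-th layer and the (h+1)-th layer;
Step 103) inputting the unlabeled training sample set
Figure PCTCN2015075473-appb-000002
into the h-th layer and, while minimizing the cost function of the h-th layer and the (h+1)-th layer, adjusting the connection weights between the h-th layer and the (h+1)-th layer and the bias weights of the nodes of the (h+1)-th layer;
Step 104) when a connection weight is smaller than a first threshold, deciding whether to delete the connection by means of a probability function of the change in reconstruction error;
If the weight of a connection decays below the first threshold, the samples are reconstructed in two cases, with and without the current connection, to obtain the change in reconstruction error $\Delta E_r$, and the probability function $\min[1, \exp(-\Delta E_r / E_r)]$ of this change is used to decide whether to delete the current connection;
Step 105) determining whether the forward out-degrees of all nodes in the h-th layer are all less than K; if so, proceeding to step 106); otherwise, returning to step 103);
Step 106) if h > 0, determining whether the sum of the forward out-degrees of all nodes in the h-th layer is less than the sum of the forward out-degrees of all nodes in the (h-1)-th layer; if so, proceeding to step 107); otherwise, returning to step 103);
Step 107) determining whether the change in the cost function is less than a second threshold; if so, proceeding to step 108); otherwise, returning to step 103);
Step 108) determining whether h > H holds; if so, the procedure of step 1) ends; otherwise, returning to step 102);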
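The deletion rule of step 104) can be illustrated with a minimal Python sketch. The text does not say which error is used as $E_r$ or how the sign of $\Delta E_r$ is defined, so this sketch assumes that $E_r$ is the reconstruction error with the connection kept and that $\Delta E_r$ is the error without the connection minus the error with it. The function names `deletion_probability` and `should_delete` are hypothetical.

```python
import math
import random


def deletion_probability(err_with, err_without):
    """Probability min[1, exp(-dE/E)] of deleting a low-weight connection.

    Assumptions (not stated in the source): E is the reconstruction error
    with the connection kept (must be > 0), and dE = err_without - err_with.
    """
    delta = err_without - err_with
    if delta <= 0:
        # Removing the connection does not hurt reconstruction: always delete.
        return 1.0
    return min(1.0, math.exp(-delta / err_with))


def should_delete(err_with, err_without, rng=random.random):
    """Draw a uniform number in [0, 1) and delete if it falls below the probability."""
    return rng() < deletion_probability(err_with, err_without)
```

Under these assumptions a connection whose removal leaves the error unchanged or lower is always deleted, while one whose removal doubles the error is deleted with probability exp(-1), roughly 0.37.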
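In the above technical solution, the specific process of step 2) of the method is:
inputting the updated training samples into the deep learning model satisfying the K-degree sparse constraint; when the input training samples are an unlabeled sample set
Figure PCTCN2015075473-appb-000003
the output obtained from the input samples is turned back into an input, reverse reconstruction is performed from the output layer to the input layer under the K-degree constraint, the reconstruction error $E_r$ is computed, and the weights of each layer are adjusted by gradient descent or conjugate gradient descent until the error is less than a critical value; when the input training samples are a labeled sample set
Figure PCTCN2015075473-appb-000004
the output is compared with
Figure PCTCN2015075473-appb-000005
to compute the training error $E_t$, and the forward weights are adjusted by gradient descent or conjugate gradient descent until the error is less than the critical value.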
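The supervised branch of step 2) can be sketched for a single layer as follows. The source does not specify the activation or the cost function, so this sketch assumes a linear layer with squared error and plain gradient descent; the binary `mask` that records surviving connections and the name `masked_gradient_step` are hypothetical.

```python
import numpy as np


def masked_gradient_step(W, b, mask, x, y, lr=0.1):
    """One gradient-descent step on a linear layer whose pruned weights stay zero.

    W[i, j] is the weight from input i to output j; mask[i, j] is 1 where the
    connection survived pruning and 0 where it was deleted (assumed encoding).
    Returns the updated weights, updated biases, and the loss before the update.
    """
    pred = x @ (W * mask) + b           # forward pass through surviving connections
    err = pred - y                      # training error per output
    grad_W = np.outer(x, err) * mask    # no gradient flows to deleted connections
    W_new = (W - lr * grad_W) * mask    # pruned entries remain exactly zero
    b_new = b - lr * err
    loss = 0.5 * float(err @ err)
    return W_new, b_new, loss
```

Multiplying both the gradient and the updated weights by the mask keeps the forward out-degrees fixed during fine-tuning, so the K-degree constraint reached in step 1) is not violated by the weight updates.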
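The method of the present invention can overcome shortcomings of existing neural network models such as excessively long training time and difficult parameter optimization, improve the scalability, generalization ability, and execution speed of existing neural network models such as deep feedforward neural networks and deep belief networks, and ease the difficulty of unsupervised learning and parameter optimization, thereby reducing the difficulty of applying deep learning algorithms to big data processing.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a non-hierarchical K-degree sparse network and its node degree sparse constraint;
FIG. 2 is a schematic diagram of a hierarchical K-degree sparse network and its hierarchical degree sparse constraint;
FIG. 3 is a flow chart of the big data processing method of the present invention based on a deep learning model satisfying the K-degree sparse constraint.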
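Detailed Description
The concepts involved in the present invention are explained first.
As shown in FIG. 1, a non-hierarchical K-degree sparse network is one in which all nodes satisfy the node K-degree sparse constraint. The node K-degree sparse constraint means deleting unnecessary connections between nodes until the forward out-degree $K_i$ of every node does not exceed K, where K is a set parameter. Forward means the direction from input to output; if there are hidden layers, it is the direction from the input through the hidden layers to the output.
As shown in FIG. 2, a trained hierarchical K-degree sparse network is one in which all layers satisfy the hierarchical K-degree sparse constraint. The hierarchical K-degree sparse constraint means that the layer forward out-degree of the hidden layers, that is, the sum of the forward out-degrees of the nodes of a single hidden layer, decreases monotonically from input to output. As a special case of the hierarchical K-degree sparse constraint, if the nodes within each layer all have the same forward out-degree, then the product of the number of nodes in a layer and that forward out-degree decreases monotonically from input to output.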
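In addition, as some simple variations of the above K-degree sparse network, a node K-degree sparse network is a neural network model satisfying $k_i \le k$, a hierarchical K-degree sparse network is a neural network model satisfying
Figure PCTCN2015075473-appb-000006
a node upper-limit K-degree sparse network is a neural network model satisfying $k_i = k$, a hierarchical upper-limit K-degree sparse network is a neural network model satisfying
Figure PCTCN2015075473-appb-000007
and an ideal upper-limit K-degree sparse network is a neural network model satisfying both $k_i = k$ and
Figure PCTCN2015075473-appb-000008
The method of the present invention can be applied directly to all of them.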
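A neural network model satisfying the K-degree sparse constraint is described below in mathematical terms.
Suppose the neural network model has N nodes in total. The network is first formed in a fully connected manner, with the weights of absent connections set to zero, as shown in FIG. 1. Then the output $x_j$ of any node and the input set $X = \{x_i, i = 1 \ldots N\}$ of that node satisfy the following rule:
$x_j = f(\sum_i w_{ij} x_i + b_j)$, where $x_i \in X$, $x_j \in X$
Here $x_j$ is the output of any node, f is the activation function of that node, $b_j$ is the bias weight of that node, and $w_{ij}$ is the input weight connected to that node; weights with a value of zero are allowed.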
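The node rule above can be written as one matrix operation. The following is a minimal NumPy sketch, assuming the N-by-N weight matrix stores $w_{ij}$ at `W[i, j]` and encodes a missing connection as an exact zero; the function name `forward_node_outputs` and the default `tanh` activation are illustrative choices, not taken from the source.

```python
import numpy as np


def forward_node_outputs(x, W, b, f=np.tanh):
    """Evaluate x_j = f(sum_i w_ij * x_i + b_j) for every node j at once.

    x: shape (N,) current node values.
    W: shape (N, N), W[i, j] is the weight from node i to node j;
       a zero entry means the connection is absent.
    b: shape (N,) bias weights.
    """
    # (x @ W)[j] equals sum_i x[i] * W[i, j], matching the rule in the text.
    return f(x @ W + b)
```

Storing absent connections as zeros means sparsity shows up only in the pattern of `W`, which is what the out-degree constraints in the following paragraphs count.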
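The forward direction of the whole neural network model is now defined as the direction from the external input to the output, as shown in FIG. 1. For a K-degree sparse network, the output of any node is fed forward to $K_i$ nodes:
$K_i \le K$
Here K is a hyperparameter, usually somewhat smaller than the N of the fully connected case, or even much smaller, in order to achieve sparsity. The value range of K is (1, N/H], where N is the number of all nodes in the deep learning model and H is the number of hidden layers of the model. Preferably, the value of K is:
Figure PCTCN2015075473-appb-000009
where $d_{in}$ is the dimension of the model input, $d_{out}$ is the dimension of the model output, H is the number of hidden layers of the model, and [] is the rounding symbol.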
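$K_i$ is the forward out-degree of the i-th node, where $i = 1 \ldots N$. For a hierarchical K-degree sparse network, as shown in FIG. 2, the hierarchical K-degree sparse constraint must also be satisfied:
Figure PCTCN2015075473-appb-000010
Here
Figure PCTCN2015075473-appb-000011
is the forward out-degree of any node in the j-th hidden layer, and
Figure PCTCN2015075473-appb-000012
is the forward out-degree of any node in the (j+1)-th hidden layer.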
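The constraint itself appears only as a figure in this text. Read against the verbal definition given earlier (the sum of the forward out-degrees of a layer's nodes decreases from one layer to the next), one consistent way to write it is shown below. This is an assumption reconstructed from the surrounding definitions, not a transcription of the figure; $K_i^{(j)}$ denotes the forward out-degree of node i in the j-th hidden layer, and the sums run over the nodes of the respective layer.

```latex
\sum_{i} K_i^{(j)} \;>\; \sum_{i} K_i^{(j+1)}
```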
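For the h-th hidden layer, the output of any node is fed forward to $K_i^{(h)}$ nodes, with:
$K_i^{(h)} \le K^{(h)} \le K$
where $K^{(h)}$ is the maximum forward out-degree of the nodes of the h-th hidden layer. $K^{(h)}$ may differ from one hidden layer to another, but the value of K remains unchanged.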
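Both constraints can be checked directly on the weight matrices. The sketch below assumes one matrix per pair of adjacent layers, ordered from input to output, with `W[i, j]` holding the weight from node i of one layer to node j of the next and zeros marking deleted connections. All function names are hypothetical.

```python
import numpy as np


def forward_out_degrees(W):
    """Forward out-degree of each source node: nonzero entries in its row of W."""
    return (W != 0).sum(axis=1)


def satisfies_node_constraint(weights, K):
    """Node K-degree constraint: no node feeds forward to more than K nodes."""
    return all(forward_out_degrees(W).max() <= K for W in weights)


def satisfies_layer_constraint(weights):
    """Hierarchical constraint: per-layer out-degree sums strictly decrease."""
    totals = [int(forward_out_degrees(W).sum()) for W in weights]
    return all(later < earlier for earlier, later in zip(totals, totals[1:]))


def k_in_range(K, num_nodes, num_hidden):
    """Check that K lies in the half-open interval (1, N/H] given in the text."""
    return 1 < K <= num_nodes / num_hidden
```

For example, a 3-node layer in which every node keeps two outgoing connections has an out-degree sum of 6, so the next layer must keep fewer than 6 connections in total for the hierarchical constraint to hold.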
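As shown in FIG. 3, the present invention provides a big data processing method based on a deep learning model satisfying a K-degree sparse constraint, the method comprising:
Step 1) constructing, by a gradient pruning method, a deep learning model satisfying the K-degree sparse constraint from unlabeled training samples;
Step 1) further comprises:
Step 101) numbering the layers of the deep learning model in order from the input layer to the output layer, and setting h = -1;
The deep learning model is assumed to comprise an input layer, H hidden layers, and an output layer, for a total of H+2 layers from the input layer to the output layer; the input layer is numbered 0, the first hidden layer is numbered 1, and so on, and the output layer is numbered H+1;
Step 102) setting h = h+1 and initializing the parameters of the h-th layer and the (h+1)-th layer;
Step 103) inputting the unlabeled training sample set
Figure PCTCN2015075473-appb-000013
into the h-th layer and, while minimizing the cost function of the h-th layer and the (h+1)-th layer, adjusting the connection weights between the h-th layer and the (h+1)-th layer and the bias weights of the nodes of the (h+1)-th layer;
Step 104) when a connection weight is smaller than a first threshold, deciding whether to delete the connection by means of a probability function of the change in reconstruction error;
If the weight of a connection decays below the first threshold, the samples are reconstructed in two cases, with and without the current connection, to obtain the change in reconstruction error $\Delta E_r$, and the probability function $\min[1, \exp(-\Delta E_r / E_r)]$ of this change is used to decide whether to delete the current connection;
Step 105) determining whether the forward out-degrees of all nodes in the h-th layer are all less than K; if so, proceeding to step 106); otherwise, returning to step 103);
Step 106) if h > 0, determining whether the sum of the forward out-degrees of all nodes in the h-th layer is less than the sum of the forward out-degrees of all nodes in the (h-1)-th layer; if so, proceeding to step 107); otherwise, returning to step 103);
Step 107) determining whether the change in the cost function is less than a second threshold; if so, proceeding to step 108); otherwise, returning to step 103);
Step 108) determining whether h > H holds; if so, the procedure of step 1) ends; otherwise, returning to step 102).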
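The control flow of steps 101) to 108) can be sketched as a loop over layer pairs. This is only an outline under stated assumptions: training, pruning, degree counting, and cost measurement are passed in as caller-supplied callables (all hypothetical), step 105) is implemented as "does not exceed K" to match the definition of the node constraint, and the loop stops after the last layer pair (h equal to the number of hidden layers) because no layer exists beyond the output layer. A `max_rounds` guard, which is not in the source, prevents an endless loop.

```python
def build_layerwise(num_hidden, K, train, prune, out_degrees, cost_change,
                    tol=1e-3, max_rounds=100):
    """Layer-by-layer construction loop; returns the out-degree sum per layer."""
    layer_sums = []
    h = -1                                        # step 101
    while True:
        h += 1                                    # step 102
        for _ in range(max_rounds):
            train(h)                              # step 103: minimize layer cost
            prune(h)                              # step 104: delete weak connections
            degrees = out_degrees(h)
            if max(degrees) > K:                  # step 105: node constraint
                continue
            if h > 0 and sum(degrees) >= layer_sums[h - 1]:
                continue                          # step 106: hierarchical constraint
            if cost_change(h) >= tol:             # step 107: convergence check
                continue
            break
        else:
            raise RuntimeError(f"layer {h} did not satisfy the constraints")
        layer_sums.append(sum(degrees))
        if h >= num_hidden:                       # step 108 (assumed stop condition)
            return layer_sums
```

Passing the per-layer operations as callables keeps the sketch independent of any particular network representation; a real implementation would bind them to its own weight matrices and cost function.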
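Step 2) inputting the updated training samples into the deep learning model satisfying the K-degree sparse constraint and optimizing the weight parameters of each layer of the model, thereby obtaining an optimized deep learning model satisfying the K-degree sparse constraint;
The updated training samples are input into the deep learning model satisfying the K-degree sparse constraint; when the input training samples are an unlabeled sample set
Figure PCTCN2015075473-appb-000014
the output obtained from the input samples is turned back into an input, reverse reconstruction is performed from the output layer to the input layer under the K-degree constraint, the reconstruction error $E_r$ is computed, and the weights of each layer are adjusted by gradient descent or conjugate gradient descent until the error is less than a critical value; when the input training samples are a labeled sample set
Figure PCTCN2015075473-appb-000015
the output is compared with
Figure PCTCN2015075473-appb-000016
to compute the training error $E_t$, and the forward weights are adjusted by gradient descent or conjugate gradient descent until the error is less than the critical value.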
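Step 3) inputting the big data to be processed into the optimized deep learning model satisfying the K-degree sparse constraint for processing, and finally outputting the processing result.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions of the technical solution of the present invention do not depart from its spirit and scope, and all of them should be covered by the claims of the present invention.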

Claims (4)

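  1. A big data processing method based on a deep learning model satisfying a K-degree sparse constraint, the method comprising:
    Step 1) constructing, by a gradient pruning method, a deep learning model satisfying the K-degree sparse constraint from unlabeled training samples; the K-degree sparse constraint comprises a node K-degree sparse constraint and a hierarchical K-degree sparse constraint; the node K-degree sparse constraint means that the forward out-degree of every node in the model does not exceed K; the value range of K is (1, N/H], where N is the number of all nodes in the deep learning model and H is the number of hidden layers of the model; the hierarchical K-degree sparse constraint means that the sum of the forward out-degrees of all nodes in the h-th layer is less than the sum of the forward out-degrees of all nodes in the (h-1)-th layer;
    Step 2) inputting the updated training samples into the deep learning model satisfying the K-degree sparse constraint and optimizing the weight parameters of each layer of the model, thereby obtaining an optimized deep learning model satisfying the K-degree sparse constraint;
    Step 3) inputting the big data to be processed into the optimized deep learning model satisfying the K-degree sparse constraint for processing, and finally outputting the processing result.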
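  2. The big data processing method based on a deep learning model satisfying a K-degree sparse constraint according to claim 1, characterized in that the value of K is:
    Figure PCTCN2015075473-appb-100001
    where $d_{in}$ is the dimension of the model input, $d_{out}$ is the dimension of the model output, H is the number of hidden layers of the model, and [] is the rounding symbol.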
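  3. The big data processing method based on a deep learning model satisfying a K-degree sparse constraint according to claim 1, wherein step 1) of the method further comprises:
    Step 101) numbering the layers of the deep learning model in order from the input layer to the output layer, and setting h = -1;
    The deep learning model is assumed to comprise an input layer, H hidden layers, and an output layer, for a total of H+2 layers from the input layer to the output layer; the input layer is numbered 0, the first hidden layer is numbered 1, and so on, and the output layer is numbered H+1;
    Step 102) setting h = h+1 and initializing the parameters of the h-th layer and the (h+1)-th layer;
    Step 103) inputting the unlabeled training sample set
    Figure PCTCN2015075473-appb-100002
    into the h-th layer and, while minimizing the cost function of the h-th layer and the (h+1)-th layer, adjusting the connection weights between the h-th layer and the (h+1)-th layer and the bias weights of the nodes of the (h+1)-th layer;
    Step 104) when a connection weight is smaller than a first threshold, deciding whether to delete the connection by means of a probability function of the change in reconstruction error;
    If the weight of a connection decays below the first threshold, the samples are reconstructed in two cases, with and without the current connection, to obtain the change in reconstruction error $\Delta E_r$, and the probability function $\min[1, \exp(-\Delta E_r / E_r)]$ of this change is used to decide whether to delete the current connection;
    Step 105) determining whether the forward out-degrees of all nodes in the h-th layer are all less than K; if so, proceeding to step 106); otherwise, returning to step 103);
    Step 106) if h > 0, determining whether the sum of the forward out-degrees of all nodes in the h-th layer is less than the sum of the forward out-degrees of all nodes in the (h-1)-th layer; if so, proceeding to step 107); otherwise, returning to step 103);
    Step 107) determining whether the change in the cost function is less than a second threshold; if so, proceeding to step 108); otherwise, returning to step 103);
    Step 108) determining whether h > H holds; if so, the procedure of step 1) ends; otherwise, returning to step 102).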
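  4. The big data processing method based on a deep learning model satisfying a K-degree sparse constraint according to claim 3, wherein the specific process of step 2) of the method is:
    inputting the updated training samples into the deep learning model satisfying the K-degree sparse constraint; when the input training samples are an unlabeled sample set
    Figure PCTCN2015075473-appb-100003
    the output obtained from the input samples is turned back into an input, reverse reconstruction is performed from the output layer to the input layer under the K-degree constraint, the reconstruction error $E_r$ is computed, and the weights of each layer are adjusted by gradient descent or conjugate gradient descent until the error is less than a critical value; when the input training samples are a labeled sample set
    Figure PCTCN2015075473-appb-100004
    the output is compared with
    Figure PCTCN2015075473-appb-100005
    to compute the training error $E_t$, and the forward weights are adjusted by gradient descent or conjugate gradient descent until the error is less than the critical value.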
PCT/CN2015/075473 2015-03-13 2015-03-31 基于满足k度稀疏约束的深度学习模型的大数据处理方法 Ceased WO2016145676A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US15/557,469 US11048998B2 (en) 2015-03-13 2015-03-31 Big data processing method based on deep learning model satisfying k-degree sparse constraint
JP2017548139A JP6466590B2 (ja) 2015-03-13 2015-03-31 K次数スパース制約を満たす深層学習モデルに基づくビッグデータの処理方法
EP15885059.4A EP3282401A4 (en) 2015-03-13 2015-03-31 Big data processing method based on deep learning model satisfying k-degree sparse constraint

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510112645.9 2015-03-13
CN201510112645.9A CN106033555A (zh) 2015-03-13 2015-03-13 基于满足k度稀疏约束的深度学习模型的大数据处理方法

Publications (1)

Publication Number Publication Date
WO2016145676A1 true WO2016145676A1 (zh) 2016-09-22

Family

ID=56918252

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/075473 Ceased WO2016145676A1 (zh) 2015-03-13 2015-03-31 基于满足k度稀疏约束的深度学习模型的大数据处理方法

Country Status (5)

Country Link
US (1) US11048998B2 (zh)
EP (1) EP3282401A4 (zh)
JP (1) JP6466590B2 (zh)
CN (1) CN106033555A (zh)
WO (1) WO2016145676A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316024A (zh) * 2017-06-28 2017-11-03 北京博睿视科技有限责任公司 基于深度学习的周界报警算法
JP2018097467A (ja) * 2016-12-09 2018-06-21 国立大学法人電気通信大学 プライバシ保護データ提供システム及びプライバシ保護データ提供方法
CN110209943A (zh) * 2019-06-04 2019-09-06 成都终身成长科技有限公司 一种单词推送方法、装置及电子设备
CN110533170A (zh) * 2019-08-30 2019-12-03 陕西思科锐迪网络安全技术有限责任公司 一种图形化编程的深度学习神经网络搭建方法

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106685546A (zh) * 2016-12-29 2017-05-17 深圳天珑无线科技有限公司 一种无线人体感知的方法及服务器
CN109284826A (zh) * 2017-07-19 2019-01-29 阿里巴巴集团控股有限公司 神经网络处理方法、装置、设备及计算机可读存储介质
US11461628B2 (en) * 2017-11-03 2022-10-04 Samsung Electronics Co., Ltd. Method for optimizing neural networks
CN108694502B (zh) * 2018-05-10 2022-04-12 清华大学 一种基于XGBoost算法的机器人制造单元自适应调度方法
CN108985382B (zh) * 2018-05-25 2022-07-15 清华大学 基于关键数据通路表示的对抗样本检测方法
CN109033521B (zh) * 2018-06-25 2021-04-20 中南大学 一种新建铁路限制坡度优化决策方法
WO2020020088A1 (zh) * 2018-07-23 2020-01-30 第四范式(北京)技术有限公司 神经网络模型的训练方法和系统以及预测方法和系统
CN110751261B (zh) 2018-07-23 2024-05-28 第四范式(北京)技术有限公司 神经网络模型的训练方法和系统以及预测方法和系统
US11468330B2 (en) 2018-08-03 2022-10-11 Raytheon Company Artificial neural network growth
CN110390326A (zh) * 2019-06-14 2019-10-29 华南理工大学 一种基于集聚交叉熵损失函数的序列识别方法
CN110287031B (zh) * 2019-07-01 2023-05-09 南京大学 一种减少分布式机器学习通信开销的方法
US10984507B2 (en) 2019-07-17 2021-04-20 Harris Geospatial Solutions, Inc. Image processing system including training model based upon iterative blurring of geospatial images and related methods
US11068748B2 (en) 2019-07-17 2021-07-20 Harris Geospatial Solutions, Inc. Image processing system including training model based upon iteratively biased loss function and related methods
US11417087B2 (en) 2019-07-17 2022-08-16 Harris Geospatial Solutions, Inc. Image processing system including iteratively biased training model probability distribution function and related methods
CN110458337B (zh) * 2019-07-23 2020-12-22 内蒙古工业大学 一种基于c-gru的网约车供需预测方法
CN110543918B (zh) * 2019-09-09 2023-03-24 西北大学 一种基于正则化与数据增广的稀疏数据处理方法
CN111260121B (zh) * 2020-01-12 2022-04-29 桂林电子科技大学 一种基于深度瓶颈残差网络的城市范围的人流量预测方法
CN111429175B (zh) * 2020-03-18 2022-05-27 电子科技大学 稀疏特征场景下进行点击转化预测的方法
CN112100912A (zh) * 2020-09-08 2020-12-18 长春工程学院 基于满足k度稀疏约束的深度学习模型的数据处理方法
CN112631215B (zh) * 2020-12-10 2022-06-24 东北大学 工业过程运行指标智能预报方法、装置、设备及存储介质
CN112464541B (zh) * 2020-12-18 2024-05-24 浙江工业大学 一种考虑多尺度不确定性的混合复合材料铺层方法
CN112765623B (zh) * 2021-01-15 2022-08-02 浙江科技学院 基于相位恢复算法的光学多图像认证和加密方法
CN113052306B (zh) * 2021-03-04 2022-04-22 华南理工大学 一种基于堆叠式宽度学习模型的在线学习芯片
CN116776247A (zh) * 2023-06-21 2023-09-19 哈尔滨工程大学 一种基于深度学习的雷达辐射源结构反演方法
CN120822621B (zh) * 2025-09-17 2025-11-25 吉林大学 一种稀疏城市数据的主动感知方法、设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489033A (zh) * 2013-09-27 2014-01-01 南京理工大学 融合自组织映射与概率神经网络的增量式学习方法
CN103530689A (zh) * 2013-10-31 2014-01-22 中国科学院自动化研究所 一种基于深度学习的聚类方法
CN103838836A (zh) * 2014-02-25 2014-06-04 中国科学院自动化研究所 基于判别式多模态深度置信网多模态数据融合方法和系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014511677A (ja) * 2011-03-22 2014-05-19 コーネル・ユニバーシティー 鑑別困難な良性甲状腺病変と悪性甲状腺病変との識別法
US8700552B2 (en) * 2011-11-28 2014-04-15 Microsoft Corporation Exploiting sparseness in training deep neural networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489033A (zh) * 2013-09-27 2014-01-01 南京理工大学 融合自组织映射与概率神经网络的增量式学习方法
CN103530689A (zh) * 2013-10-31 2014-01-22 中国科学院自动化研究所 一种基于深度学习的聚类方法
CN103838836A (zh) * 2014-02-25 2014-06-04 中国科学院自动化研究所 基于判别式多模态深度置信网多模态数据融合方法和系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3282401A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018097467A (ja) * 2016-12-09 2018-06-21 国立大学法人電気通信大学 プライバシ保護データ提供システム及びプライバシ保護データ提供方法
CN107316024A (zh) * 2017-06-28 2017-11-03 北京博睿视科技有限责任公司 基于深度学习的周界报警算法
CN107316024B (zh) * 2017-06-28 2021-06-29 北京博睿视科技有限责任公司 基于深度学习的周界报警算法
CN110209943A (zh) * 2019-06-04 2019-09-06 成都终身成长科技有限公司 一种单词推送方法、装置及电子设备
CN110533170A (zh) * 2019-08-30 2019-12-03 陕西思科锐迪网络安全技术有限责任公司 一种图形化编程的深度学习神经网络搭建方法

Also Published As

Publication number Publication date
EP3282401A1 (en) 2018-02-14
CN106033555A (zh) 2016-10-19
JP6466590B2 (ja) 2019-02-06
JP2018511871A (ja) 2018-04-26
US11048998B2 (en) 2021-06-29
US20180068216A1 (en) 2018-03-08
EP3282401A4 (en) 2018-05-30

Similar Documents

Publication Publication Date Title
WO2016145676A1 (zh) 基于满足k度稀疏约束的深度学习模型的大数据处理方法
Wang et al. Research on Web text classification algorithm based on improved CNN and SVM
CN108470320B (zh) 一种基于cnn的图像风格化方法及系统
CN104751228B (zh) 用于语音识别的深度神经网络的构建方法及系统
WO2016145675A1 (zh) 一种基于分段的两级深度学习模型的大数据处理方法
CN113706545A (zh) 一种基于双分支神经判别降维的半监督图像分割方法
CN112070209A (zh) 基于w距离的稳定可控图像生成模型训练方法
CN107832292B (zh) 一种基于神经网络模型的图像到汉语古诗的转换方法
CN114627282B (zh) 目标检测模型的建立方法、应用方法、设备、装置及介质
CN110263324A (zh) 文本处理方法、模型训练方法和装置
CN112036512A (zh) 基于网络裁剪的图像分类神经网络架构搜索方法和装置
CN104751227B (zh) 用于语音识别的深度神经网络的构建方法及系统
CN113469367A (zh) 一种联邦学习方法、装置及系统
CN107330446A (zh) 一种面向图像分类的深度卷积神经网络的优化方法
CN111242155A (zh) 一种基于多模深度学习的双模态情感识别方法
CN108171319A (zh) 网络连接自适应深度卷积模型的构建方法
CN117058156B (zh) 一种半监督医学图像分割方法
Miao et al. Evolving convolutional neural networks by symbiotic organisms search algorithm for image classification
CN119046730A (zh) 基于预训练语言模型和深度提示的文本图节点分类方法
CN111401261A (zh) 基于gan-cnn框架的机器人手势识别方法
CN114444506A (zh) 一种融合实体类型的关系三元组抽取方法
CN118503788A (zh) 一种多源跨网络节点分类方法、设备及介质
CN114612761A (zh) 一种面向图像识别的网络架构搜索方法
CN110727871A (zh) 基于卷积分解深度模型的多模态数据采集及综合分析平台
CN116543289B (zh) 一种基于编码器-解码器及Bi-LSTM注意力模型的图像描述方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15885059

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 15557469

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2017548139

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2015885059

Country of ref document: EP