WO2019052301A1 - Video classification method, information processing method, and server - Google Patents
Video classification method, information processing method, and server
- Publication number
- WO2019052301A1 (application PCT/CN2018/100733, CN2018100733W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video frame
- video
- feature
- feature sequence
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/735—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
Definitions
- the present invention relates to the field of computer technologies, and in particular, to video classification techniques.
- existing video classification methods mainly extract features from each video frame of a labeled video, convert the frame-level features into video-level features by feature averaging, and finally pass the video-level features to a classification network for classification.
- the embodiment of the present invention provides a video classification method, an information processing method, and a server.
- in the process of classifying a video, the change of the video's features along the time dimension is also considered, so that the video content is better expressed, which improves the accuracy and the effect of video classification.
- the first aspect of the present invention provides a method for video classification, which is performed by a computer device, and includes:
- time feature sampling rule is a correspondence between a time feature and a video frame feature sequence
- processing, by the first neural network model, the at least one video frame feature sequence to obtain a feature expression result corresponding to the at least one video frame feature sequence;
- the first neural network model is a recurrent neural network model;
- a second aspect of the present invention provides a method of information processing, which is performed by a computer device, and includes:
- time feature sampling rule is a correspondence between a time feature and a video frame feature sequence
- processing, by the first neural network model, the at least one video frame feature sequence to obtain a feature expression result corresponding to the at least one video frame feature sequence;
- the first neural network model is a recurrent neural network model;
- a third aspect of the present invention provides a server, including:
- a first acquiring module configured to acquire a video to be processed, where the to-be-processed video includes multiple video frames, and each video frame corresponds to a time feature;
- a second acquiring module configured to sample the to-be-processed video acquired by the first acquiring module according to a time feature sampling rule, and acquire at least one video frame feature sequence, where the time feature sampling rule is a correspondence between a time feature and a video frame feature sequence;
- the first neural network model is a recurrent neural network model
- a second input module configured to process, by using a second neural network model, the feature expression result corresponding to the at least one video frame feature sequence obtained by the first input module, to obtain a prediction result corresponding to the at least one video frame feature sequence, where the prediction result is used to determine a category of the to-be-processed video.
- a fourth aspect of the present invention provides a server, including: a memory, a processor, and a bus system;
- the memory is used to store a program
- the processor is configured to execute the program in the memory, and specifically includes the following steps:
- time feature sampling rule is a correspondence between a time feature and a video frame feature sequence
- processing, by the first neural network model, the at least one video frame feature sequence to obtain a feature expression result corresponding to the at least one video frame feature sequence;
- the first neural network model is a recurrent neural network model;
- the bus system is configured to connect the memory and the processor to cause the memory and the processor to communicate.
- a fifth aspect of the invention provides a computer readable storage medium for storing program code for performing the method of the above aspects.
- a sixth aspect of the invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the methods described in the various aspects above.
- a method for information processing is provided.
- a computer device acquires a video to be processed, where the video to be processed includes multiple video frames and each video frame corresponds to a time feature, and the video to be processed is then sampled according to a time feature sampling rule.
- the first neural network model is a recurrent neural network model
- the computer device processes, by the second neural network model, the feature expression result corresponding to the at least one video frame feature sequence to obtain a prediction result corresponding to the at least one video frame feature sequence, and the prediction result is used to determine a category of the to-be-processed video.
- the feature change of the video in the time dimension is also considered, so that the video content can be better expressed, the accuracy of the video classification is improved, and the effect of the video classification is improved.
- FIG. 1 is a schematic structural diagram of information processing in an embodiment of the present invention.
- FIG. 2 is a schematic diagram of an embodiment of a method for processing information according to an embodiment of the present invention
- FIG. 3 is a schematic diagram of a video to be processed in an embodiment of the present invention.
- FIG. 4 is a schematic diagram of a convolutional neural network having an inception structure according to an embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of a first neural network model according to an embodiment of the present invention.
- FIG. 6 is a schematic structural diagram of a second neural network model according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of an embodiment of a server according to an embodiment of the present invention.
- FIG. 8 is a schematic diagram of another embodiment of a server according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of another embodiment of a server according to an embodiment of the present invention.
- FIG. 10 is a schematic diagram of another embodiment of a server according to an embodiment of the present invention.
- FIG. 11 is a schematic diagram of another embodiment of a server according to an embodiment of the present invention.
- FIG. 12 is a schematic diagram of another embodiment of a server according to an embodiment of the present invention.
- FIG. 13 is a schematic diagram of another embodiment of a server according to an embodiment of the present invention.
- FIG. 14 is a schematic diagram of another embodiment of a server according to an embodiment of the present invention.
- FIG. 15 is a schematic structural diagram of a server according to an embodiment of the present invention.
- the embodiment of the present invention provides a video classification method, an information processing method, and a server.
- in the process of classifying a video, the change of the video's features along the time dimension is also considered, so that the video content is better expressed, which improves the accuracy and the effect of video classification.
- the solution is mainly used to provide a video content classification service: a background computer device performs feature extraction, time series modeling, and feature compression, and finally classifies the video features with a hybrid expert model, thereby automatically classifying and labeling videos on the computer device.
- this solution can be deployed on video websites to add keywords to the videos on those websites. It can also be used for fast content search and matching, and for personalized video recommendation.
- FIG. 1 is a schematic structural diagram of information processing according to an embodiment of the present invention.
- a computer device acquires a to-be-processed video.
- the to-be-processed video includes multiple video frames, each video frame corresponds to a time feature, and different time features can be denoted by t.
- the computer device processes each video frame in the to-be-processed video with a convolutional neural network to obtain the time feature corresponding to each video frame, and then determines, from these per-frame time features, the time feature sequence of the to-be-processed video, that is, a frame-level deep learning expression.
- the computer device can sample the to-be-processed video according to the time feature sampling rule, where the time feature sampling rule refers to sampling the video features along the time dimension at different frame rates to acquire at least one video frame feature sequence; these video frame feature sequences correspond to different time scales.
- the computer device inputs the video frame feature sequence corresponding to different time scales into the bidirectional recurrent neural network, and obtains a feature expression result corresponding to at least one video frame feature sequence, and the feature expression result is a video feature expression on a time scale.
- the computer device inputs all the feature expression results into the second neural network, that is, the hybrid expert model, and obtains a prediction result corresponding to each video frame feature sequence; the category of the to-be-processed video can be determined from these prediction results, and the to-be-processed video is classified accordingly.
- An embodiment of the method of information processing in an example includes:
- FIG. 3 is a schematic diagram of a video to be processed according to an embodiment of the present invention.
- the video to be processed includes multiple video frames, as shown in FIG. 3. Each picture is a video frame, and each video frame corresponds to a time feature.
- the video to be processed has a playback duration, so each video frame corresponds to a different playback moment; if the time feature of the first video frame in the to-be-processed video is "1", then the time feature of the second video frame is "2", and so on, and the time feature of the T-th video frame is "T".
- the video to be processed is sampled according to a time feature sampling rule, and at least one video frame feature sequence is obtained, where the time feature sampling rule is a correspondence between a time feature and a video frame feature sequence;
- the server needs to perform sampling processing on the to-be-processed video according to the temporal feature sampling rule.
- the time feature sampling rule includes a preset relationship between time features and video frame feature sequences; in practice, a single video frame feature sequence may be acquired, or at least two video frame feature sequences at different time scales may be acquired; for video frame feature sequences at different time scales, the number of time features corresponding to each video frame feature differs, and accordingly the lengths of the sequences also differ.
- for example, a video to be processed has a total of 1000 video frames, and the 1000 video frames correspond to time features 1 to 1000, respectively.
- if the time feature sampling rule is that each time feature corresponds to one video frame feature, then the 1000 time features of the video to be processed correspond to 1000 video frame features, and the video frame feature sequence composed of these 1000 video frame features has length 1000.
- if the time feature sampling rule is that every 100 time features correspond to one video frame feature, then the 1000 time features of the to-be-processed video correspond to 10 video frame features, and the video frame feature sequence composed of these 10 video frame features has length 10.
- the at least one video frame feature sequence is processed by the first neural network model to obtain a feature expression result corresponding to the at least one video frame feature sequence; wherein each video frame feature sequence has a feature expression corresponding to each result;
- the server may input the video frame feature sequences corresponding to different time scales into the first neural network model, where the first neural network model is a recurrent neural network model; the first neural network model then processes the input at least one video frame feature sequence recursively and outputs the feature expression result of each video frame feature sequence.
- different time scales correspond to different video frame feature sequence lengths. As described in step 102, assume the total length of the video is T. If each time feature corresponds to one video frame feature, the video frame feature sequence length is T/1; if every 10 time features correspond to one video frame feature, the length is T/10.
- the second neural network model processes the feature expression result corresponding to the at least one video frame feature sequence to obtain a prediction result corresponding to the at least one video frame feature sequence, where the prediction result is used to determine a category of the to-be-processed video.
- Each of the video frame feature sequences respectively has a prediction result.
- the server may separately input the feature expression result corresponding to each video frame feature sequence into the second neural network model; after the second neural network model processes each input feature expression result, it outputs the prediction result corresponding to that feature expression result. Finally, the server can determine the category of the video to be processed based on the prediction results.
- the categories of the videos to be processed may include “sports”, “news”, “music”, “anime” and “games”, etc., which are not limited herein.
- a server acquires a video to be processed, where the video to be processed includes multiple video frames and each video frame corresponds to a time feature; the server then samples the video to be processed according to a time feature sampling rule and obtains at least one video frame feature sequence, where the time feature sampling rule is a correspondence between time features and video frame feature sequences.
- the server inputs the at least one video frame feature sequence into the first neural network model, where the first neural network model is a recurrent neural network model.
- the server inputs the feature expression result corresponding to the at least one video frame feature sequence into the second neural network model to obtain the prediction result corresponding to each video frame feature sequence, and the prediction result is used to determine the category of the video to be processed.
- the feature change of the video in the time dimension is also considered, so that the video content can be better expressed, the accuracy of the video classification is improved, and the effect of the video classification is improved.
- the method may further include:
- a time feature sequence of the video to be processed is determined according to a temporal feature corresponding to each video frame, wherein the time feature sequence is used for sampling.
- the server may process each video frame in the to-be-processed video by using a convolutional neural network (CNN) having an inception structure, and then extract the time feature corresponding to each video frame.
- the server determines the time feature sequence of the video to be processed from the time features of the video frames. Assuming the first video frame of the video to be processed is 1, the second video frame is 2, and so on, and the last video frame is T, the time feature sequence of the video to be processed has length T (seconds).
- FIG. 4 is a schematic diagram of a convolutional neural network having an inception structure according to an embodiment of the present invention.
- the inception structure includes convolutions of three different sizes, namely a 1×1 convolutional layer, a 3×3 convolutional layer, and a 5×5 convolutional layer, together with a 3×3 max pooling layer; the last fully connected layer is removed and replaced with a global average pooling layer (which reduces the feature map size to 1×1).
- the convolutional neural network may process each video frame in the to-be-processed video to obtain the time feature corresponding to each video frame, and these time features make up the time feature sequence of the entire video to be processed.
- each time window includes at least one video frame in the to-be-processed video
- a video frame feature sequence corresponding to each time window is extracted from the time feature sequence.
- At least one time window is first defined according to a temporal feature sampling rule to perform multi-scale video frame feature sequence sampling.
- the time window size can be defined manually. The more video frames a time window contains, the coarser the granularity. The content within each time window is averaged so that it becomes the content of "one frame".
- a method for extracting video frame feature sequences at different time scales is described: at least one time window is first determined according to the time feature sampling rule, where each time window includes at least one video frame of the to-be-processed video, and the video frame feature sequence corresponding to each time window is then extracted from the time feature sequence.
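The window-based sampling described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the window sizes (1, 5, 10), and the choice to average a trailing partial window are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def sample_by_time_window(
    frame_features: Sequence[Sequence[float]], window: int
) -> List[List[float]]:
    """Average every `window` consecutive frame features into one feature.

    Assumption: a trailing partial window is averaged over the frames it has.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    sampled = []
    for start in range(0, len(frame_features), window):
        chunk = frame_features[start:start + window]
        dim = len(chunk[0])
        # Element-wise mean of the frames inside this time window.
        sampled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return sampled


def multi_scale_sequences(
    frame_features: Sequence[Sequence[float]], windows: Sequence[int] = (1, 5, 10)
) -> Dict[int, List[List[float]]]:
    """Build one video frame feature sequence per (hypothetical) window size."""
    return {w: sample_by_time_window(frame_features, w) for w in windows}


# Ten toy 2-dimensional frame features standing in for CNN outputs.
frames = [[float(t), 2.0 * t] for t in range(1, 11)]
scales = multi_scale_sequences(frames)
for w, seq in scales.items():
    print(w, len(seq))
```

A larger window yields a shorter, coarser sequence, which matches the remark that more frames per window means larger granularity.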
- processing the at least one video frame feature sequence by using the first neural network model to obtain a feature expression result corresponding to each video frame feature sequence may include:
- FIG. 5 is a schematic structural diagram of a first neural network model according to an embodiment of the present invention.
- the entire first neural network model includes two parts, namely a forward recurrent neural network and a backward recurrent neural network; each video frame feature sequence is input to the forward recurrent neural network, which outputs a corresponding first expression result.
- each video frame feature sequence is also input to the backward recurrent neural network, which outputs a corresponding second expression result.
- a recurrent neural network based on gated recurrent units can be used for time series modeling of the video frame feature sequence, so that information at different time scales is better characterized.
- the first neural network model can also be used for video feature compression.
- feature compression and expression are performed from the forward and backward to the time center point position of the video to be processed, respectively, using the bidirectional recurrent neural network. In this way, the operability of the solution is improved.
- calculating the feature expression result corresponding to the at least one video frame feature sequence according to the first expression result and the second expression result may include:
- the feature expression corresponding to the at least one video frame feature sequence is calculated by the following formula:
- $h$ denotes the feature expression result of a video frame feature sequence
- $x_t$ denotes the video frame feature sequence at time $t$
- GRU() denotes processing by a gated recurrent unit (GRU) neural network
- $T$ denotes the total duration of the video to be processed
- $t$ denotes an integer from 1 to $T$.
- the bidirectional recurrent neural network can be used to perform feature compression and expression from the forward and backward directions toward the temporal center point of the video, respectively. Specifically, consider a video frame feature sequence $x_t$, $t \in [1, T]$, at a given scale.
- the forward recurrent neural network is:
- the backward recurrent neural network is:
- GRU() is the gated recurrent unit function, whose specific form is:
- $\sigma_g$ denotes the sigmoid function
- $\sigma_h$ denotes the hyperbolic tangent function
- $W_z$, $W_r$, $W_h$, $U_z$, $U_r$, and $U_h$ are linear transformation parameter matrices, where different subscripts denote different "gates"
- $b_z$, $b_r$, and $b_h$ are offset (bias) parameter vectors, and $\circ$ denotes the composite operation (the element-wise product in the standard GRU formulation).
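One formulation consistent with the symbols defined above is sketched here. It assumes the standard GRU equations and assumes that both directions stop at the temporal midpoint $T/2$, as the description of compressing "to the time center point" suggests. The superscripts $f$ and $b$ (forward, backward) and the gate names $z_t$ (update) and $r_t$ (reset) are notation introduced for this sketch.

```latex
\begin{aligned}
h &= \left[\, h^{f}_{T/2},\; h^{b}_{T/2} \,\right] \\
h^{f}_{t} &= \mathrm{GRU}\!\left(x_t,\, h^{f}_{t-1}\right), \quad t \in [1,\, T/2] \\
h^{b}_{t} &= \mathrm{GRU}\!\left(x_t,\, h^{b}_{t+1}\right), \quad t \in [T,\, T/2] \\
z_t &= \sigma_g\!\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma_g\!\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
h_t &= z_t \circ h_{t-1} + (1 - z_t) \circ \sigma_h\!\left(W_h x_t + U_h \left(r_t \circ h_{t-1}\right) + b_h\right)
\end{aligned}
```

In the last three lines $h_{t-1}$ stands for the previous hidden state in the direction being processed, so for the backward network it is read as $h^{b}_{t+1}$.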
- the embodiment of the present invention specifically describes how to calculate the feature expression result corresponding to each video frame feature sequence according to the first expression result and the second expression result.
- the prediction result can be calculated by using the relevant formula, which provides a feasible way for the realization of the scheme, thereby improving the feasibility and operability of the scheme.
- processing, by the second neural network model, the feature expression result corresponding to the at least one video frame feature sequence to obtain a prediction result corresponding to the at least one video frame feature sequence may include:
- FIG. 6 is a schematic structural diagram of a second neural network model according to an embodiment of the present invention.
- the entire second neural network model includes two parts, namely a first sub-model and a second sub-model; the first sub-model may be referred to as the "gate expression", and the second sub-model may be referred to as the "activation expression".
- the feature expression result corresponding to each video frame feature sequence is input to the "gate expression", and then the corresponding third expression result is output.
- the feature expression result corresponding to each video frame feature sequence is input to the "activation expression", and then the corresponding fourth expression result is output.
- Each third expression result and each fourth expression result are multiplied and then added to obtain a prediction result of the video frame feature sequence.
- the second neural network model may be further used to classify the feature expression result.
- the feature expression result is passed through nonlinear transformations to obtain the gate expression and the activation expression, respectively; the two expressions are then multiplied and summed to obtain the final feature expression used for classification, which helps improve classification accuracy.
- the third expression result and the fourth expression result are calculated according to the fifth embodiment corresponding to FIG.
- the prediction result corresponding to the at least one video frame feature sequence may include:
- the prediction result corresponding to the at least one video frame feature sequence is calculated by the following formula:
- $\mathrm{label} = \sum_{n=1}^{N} g_n a_n$;
- $g_n = \sigma_g(W_g h + b_g),\ n \in [1, N]$;
- $a_n = \sigma_a(W_a h + b_a),\ n \in [1, N]$;
- label denotes the prediction result of a video frame feature sequence
- $g_n$ denotes the third expression result
- $a_n$ denotes the fourth expression result
- $\sigma_g$ denotes the softmax function
- $\sigma_a$ denotes the sigmoid function
- $h$ denotes the feature expression result of the video frame feature sequence.
- $W_g$ and $b_g$ denote the parameters of the first sub-model
- $W_a$ and $b_a$ denote the parameters of the second sub-model
- $N$ denotes the total number of computation paths obtained by nonlinearly transforming the feature expression result
- $n$ denotes an integer from 1 to $N$.
- after the two paths of expressions are obtained, they are multiplied and the products are summed to obtain the prediction result of a video frame feature sequence.
- the embodiment of the present invention specifically describes how to calculate the prediction result corresponding to each video frame feature sequence according to the third expression result and the fourth expression result.
- the prediction result can be calculated by using the relevant formula, which provides a feasible way for the realization of the scheme, thereby improving the feasibility and operability of the scheme.
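The gate-and-activation computation above can be illustrated with a small pure-Python sketch. It assumes a score for a single category, with the softmax taken across the N gate outputs; the helper names and the toy parameters are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the N gate outputs."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def affine(W: Sequence[Sequence[float]], h: Sequence[float], b: Sequence[float]) -> List[float]:
    """Compute W h + b for an N x d matrix W and a length-d vector h."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]


def predict_label(h, W_g, b_g, W_a, b_a) -> float:
    """label = sum_n g_n * a_n, with g = softmax(W_g h + b_g), a = sigmoid(W_a h + b_a)."""
    g = softmax(affine(W_g, h, b_g))               # third expression result (gate)
    a = [sigmoid(v) for v in affine(W_a, h, b_a)]  # fourth expression result (activation)
    return sum(g_n * a_n for g_n, a_n in zip(g, a))


# Toy example: d = 2 feature dimensions, N = 3 paths, hypothetical parameters.
h = [1.0, -2.0]
W_g = [[0.5, 0.1], [0.0, 0.3], [-0.2, 0.4]]
b_g = [0.0, 0.1, -0.1]
W_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_a = [0.0, 0.0, 0.0]
label = predict_label(h, W_g, b_g, W_a, b_a)
print(label)
```

Because the gate weights sum to 1 and every activation lies in (0, 1), the resulting score also lies in (0, 1), which is consistent with reading it as a per-category likelihood.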
- the second neural network model may further include:
- the videos to be processed are classified according to the category of the video to be processed.
- the server may further calculate the category of the to-be-processed video according to the prediction result corresponding to each video frame feature sequence and the weight value corresponding to each video frame feature sequence, and classify the to-be-processed video according to the classification result.
- the prediction result is represented by a “0 and 1” code of length 5.
- the code with the prediction result of 1 is 00001
- the code with the prediction result of 3 is 00100, and so on. If a pending video contains both prediction result 1 and prediction result 3, the pending video is represented as 00101.
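The "0 and 1" code above can be reproduced with a short sketch, in which category k sets the k-th position counted from the right. The function name `encode_categories` is a hypothetical helper, not something named in the source.

```python
NUM_CATEGORIES = 5  # code length used in the example above

def encode_categories(categories, length=NUM_CATEGORIES):
    """Return a '0'/'1' string in which category k sets the k-th bit from the right."""
    bits = ["0"] * length
    for k in categories:
        if not 1 <= k <= length:
            raise ValueError(f"category {k} is outside 1..{length}")
        bits[length - k] = "1"
    return "".join(bits)

print(encode_categories([1]))     # 00001
print(encode_categories([3]))     # 00100
print(encode_categories([1, 3]))  # 00101
```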
- each prediction result is not more than 1, and the prediction result can represent the possibility that the video to be processed belongs to this category.
- {0.01, 0.02, 0.9, 0.005, 1.0} is a reasonable prediction result; reading the positions from right to left, as in the code above, the probability that the pending video belongs to the first category is 1.0 (100%), and the probability that it belongs to the second category is 0.005 (0.5%).
- the probability of belonging to the third category is 0.9 (90%), the probability of belonging to the fourth category is 0.02 (2%), and the probability of belonging to the fifth category is 0.01 (1%).
- the prediction results are combined using preset weight values, for example with a weighted sum; each weight value is learned by linear regression and represents the importance of the corresponding video frame feature sequence, and the weight values sum to 1, such as {0.1, 0.4, 0.5}.
- the prediction result of video frame feature sequence 1 is {0.01, 0.02, 0.9, 0.005, 1.0}
- the prediction result of video frame feature sequence 2 is {0.02, 0.01, 0.9, 0.000, 0.9}
- the prediction result of video frame feature sequence 3 is {0.2, 0.3, 0.8, 0.01, 0.7}
- the category of the video to be processed is expressed as:
- the probability that the pending video belongs to the third category is the largest, followed by the first category, so the pending video can be preferentially displayed in the video list of the third category.
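The weighted combination in this example can be checked with a few lines of Python. This is a sketch under two assumptions: the "weighting algorithm" is a plain weighted sum, and sequences 1, 2 and 3 take the weights 0.1, 0.4 and 0.5 in that order. The names `fuse` and `ranked_categories` are hypothetical.

```python
weights = [0.1, 0.4, 0.5]  # one weight per video frame feature sequence, summing to 1
predictions = [
    [0.01, 0.02, 0.9, 0.005, 1.0],  # video frame feature sequence 1
    [0.02, 0.01, 0.9, 0.000, 0.9],  # video frame feature sequence 2
    [0.2, 0.3, 0.8, 0.01, 0.7],     # video frame feature sequence 3
]

def fuse(predictions, weights):
    """Weighted sum of the per-sequence predictions, position by position."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    size = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions))
            for i in range(size)]

fused = fuse(predictions, weights)
print([round(v, 4) for v in fused])  # [0.109, 0.156, 0.85, 0.0055, 0.81]

# Categories are numbered from the right, as in the 0/1 code of the example.
order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
ranked_categories = [len(fused) - i for i in order]
print(ranked_categories[:2])  # [3, 1]
```

Under these assumptions the third category scores highest (0.85) and the first category comes next (0.81), which agrees with the ranking stated above.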
- the server may further calculate the category of the video to be processed according to the prediction result corresponding to each video frame feature sequence and the weight value corresponding to each video frame feature sequence, and finally classify the video to be processed according to that category.
- FIG. 7 is a schematic diagram of an embodiment of a server according to an embodiment of the present invention.
- the server 20 includes:
- the first obtaining module 201 is configured to acquire a video to be processed, where the to-be-processed video includes multiple video frames, and each video frame corresponds to one time feature;
- the second obtaining module 202 is configured to sample, according to a time feature sampling rule, the to-be-processed video acquired by the first obtaining module 201, and acquire at least one video frame feature sequence, where the time feature sampling rule is a correspondence between time features and video frame feature sequences;
- the first input module 203 is configured to process, by using the first neural network model, the at least one video frame feature sequence acquired by the second obtaining module 202, to obtain a feature expression result corresponding to the at least one video frame feature sequence;
- the first neural network model is a recurrent neural network model;
- the second input module 204 is configured to process, by using the second neural network model, the feature expression result corresponding to the at least one video frame feature sequence obtained by the first input module 203, to obtain a prediction result corresponding to the at least one video frame feature sequence, where the prediction result is used to determine a category of the to-be-processed video.
- in this embodiment, the first obtaining module 201 acquires a to-be-processed video, where the to-be-processed video includes multiple video frames and each video frame corresponds to one time feature. The second obtaining module 202 samples, according to the time feature sampling rule, the to-be-processed video acquired by the first obtaining module 201 and acquires at least one video frame feature sequence, where the time feature sampling rule is a correspondence between time features and video frame feature sequences. The first input module 203 processes, by using the first neural network model, the at least one video frame feature sequence acquired by the second obtaining module 202 to obtain a feature expression result corresponding to the at least one video frame feature sequence, where the first neural network model is a recurrent neural network model. The second input module 204 processes, by using the second neural network model, the feature expression result obtained by the first input module 203 to obtain a prediction result corresponding to the at least one video frame feature sequence, where the prediction result is used to determine the category of the to-be-processed video.
- a server acquires a video to be processed, where the video to be processed includes multiple video frames and each video frame corresponds to one time feature. The server then samples the video to be processed according to a time feature sampling rule and acquires at least one video frame feature sequence, where the time feature sampling rule is a correspondence between time features and video frame feature sequences. The server inputs the at least one video frame feature sequence into the first neural network model to obtain the feature expression result corresponding to each video frame feature sequence, and finally inputs those feature expression results into the second neural network model to obtain the prediction result corresponding to each video frame feature sequence, where the prediction result is used to determine the category of the video to be processed.
- in this way, the time features of the video are also considered during classification, so that the video content can be better expressed, the accuracy of the video classification is improved, and the effect of the video classification is improved.
- the server 20 further includes:
- the processing module 205 is configured to process, by using a convolutional neural network (CNN), each video frame in the to-be-processed video, to obtain the time feature corresponding to each video frame;
- the determining module 206 is configured to determine a time feature sequence of the to-be-processed video according to a time feature corresponding to the video frame processed by the processing module 205, where the time feature sequence is used for sampling.
- the convolutional neural network may also process each video frame in the to-be-processed video to obtain the time feature corresponding to each video frame; these time features make up the time feature sequence of the entire to-be-processed video.
- the second obtaining module 202 includes:
- a determining unit 2021 configured to determine, according to the temporal feature sampling rule, at least one time window, where each time window includes at least one video frame in the to-be-processed video;
- the extracting unit 2022 is configured to extract, from the temporal feature sequence, a video frame feature sequence corresponding to each time window determined by the determining unit 2021.
- a method for extracting video frame feature sequences at different scales is described: at least one time window is first determined according to the time feature sampling rule, where each time window includes at least one video frame in the to-be-processed video, and the video frame feature sequence corresponding to each time window is then extracted from the time feature sequence.
- in this way, video frame feature sequences at different scales can be obtained, yielding multiple different samples for feature training, which helps improve the accuracy of the video classification result.
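The multi-scale extraction described above can be illustrated with a small sketch. The text does not state how the frame features inside one time window are combined, so averaging them is an assumption here, as are the window lengths (1, 2, 4) and the function names.

```python
def window_sequence(time_features, window):
    """Average consecutive, non-overlapping windows of per-frame features.

    Averaging is an assumption; the source only says a video frame feature
    sequence is extracted for each time window.
    """
    if window < 1:
        raise ValueError("window must be at least 1 frame")
    sequence = []
    for start in range(0, len(time_features), window):
        chunk = time_features[start:start + window]
        dim = len(chunk[0])
        sequence.append([sum(f[d] for f in chunk) / len(chunk)
                         for d in range(dim)])
    return sequence

def multi_scale_sequences(time_features, windows=(1, 2, 4)):
    """Return one video frame feature sequence per (hypothetical) window length."""
    return {w: window_sequence(time_features, w) for w in windows}

# Toy time feature sequence: 8 frames with 2-dimensional features.
frames = [[float(t), float(2 * t)] for t in range(8)]
sequences = multi_scale_sequences(frames)
for window, seq in sequences.items():
    print(window, len(seq))
```

A larger window gives a shorter, coarser sequence, so the same video is presented to the first neural network model at several time scales.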
- the first input module 203 includes:
- a first acquiring unit 2031 configured to input the at least one video frame feature sequence into a forward recurrent neural network in the first neural network model, to obtain a first expression result;
- a second acquiring unit 2032 configured to input the at least one video frame feature sequence into a backward recurrent neural network in the first neural network model, to obtain a second expression result;
- the first calculating unit 2033 is configured to calculate the feature expression result corresponding to the at least one video frame feature sequence according to the first expression result acquired by the first acquiring unit 2031 and the second expression result acquired by the second acquiring unit 2032.
- a recurrent neural network based on gated recurrent units can be used to model the video frame feature sequence as a time series, so that information at different time scales is better represented.
- the first neural network model can also be used for video feature compression.
- using the bidirectional recurrent neural network, feature compression and expression are performed from the forward direction and from the backward direction, each toward the time center point of the video to be processed. In this way, the operability of the solution is improved.
- the first calculating unit 2033 includes:
- the first calculating subunit 20331 is configured to calculate a feature expression result corresponding to the at least one video frame feature sequence by using the following formula:
- h represents a feature expression result of a video frame feature sequence
- Representing the first expression result the Representing the second expression result
- the x t represents the video frame feature sequence at the time t
- GRU() represents processing by a gated recurrent unit (GRU) neural network
- T represents the total time of the video to be processed
- t represents an integer from 1 to T.
- the embodiment of the present invention specifically describes how to calculate the feature expression result corresponding to each video frame feature sequence according to the first expression result and the second expression result.
- the prediction result can be calculated by using the relevant formula, which provides a feasible way for the realization of the scheme, thereby improving the feasibility and operability of the scheme.
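The formula itself is not reproduced in this text, so the following is only a sketch of one reading of the description: a forward GRU runs from the first time step to the time center point, a backward GRU runs from the last time step back to the center point, and h is the concatenation of the two final states. The GRU cell uses the standard update-gate and reset-gate equations; the class name, random initialisation and dimensions are all assumptions.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class GRUCell:
    """Minimal standard GRU cell with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W = {k: mat(hidden_dim, input_dim) for k in "zrh"}
        self.U = {k: mat(hidden_dim, hidden_dim) for k in "zrh"}
        self.b = {k: [0.0] * hidden_dim for k in "zrh"}

    def _pre(self, k, x, h):
        return [wx + uh + b for wx, uh, b in
                zip(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k])]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]   # update gate
        r = [sigmoid(v) for v in self._pre("r", x, h)]   # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in self._pre("h", x, gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def bidirectional_expression(xs, forward, backward):
    """Assumed reading: run both directions to the center and concatenate."""
    T = len(xs)
    if T < 2:
        raise ValueError("need at least two time steps")
    mid = T // 2
    h_f = [0.0] * forward.hidden_dim
    for t in range(0, mid):                 # time steps 1 .. T/2 (1-based)
        h_f = forward.step(xs[t], h_f)      # first expression result
    h_b = [0.0] * backward.hidden_dim
    for t in range(T - 1, mid - 2, -1):     # time steps T .. T/2 (1-based)
        h_b = backward.step(xs[t], h_b)     # second expression result
    return h_f + h_b                        # h: concatenation of both results

# Toy video frame feature sequence: T = 6 steps of 3-dimensional features.
xs = [[0.1 * t, 0.2, -0.1 * t] for t in range(6)]
h = bidirectional_expression(xs, GRUCell(3, 4, seed=1), GRUCell(3, 4, seed=2))
print(len(h))  # 8
```

Stopping both passes at the center point keeps each recurrent chain about half as long as the full sequence, which is one way to read the statement that the first neural network model is also used for feature compression.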
- the second input module 204 includes:
- the third obtaining unit 2041 is configured to input a feature expression result corresponding to each video frame feature sequence to a first sub-model in the second neural network model to obtain a third expression result;
- a fourth obtaining unit 2042 configured to input a feature expression result corresponding to each video frame feature sequence to a second sub-model in the second neural network model, to obtain a fourth expression result;
- a second calculating unit 2043 configured to calculate, according to the third expression result acquired by the third obtaining unit 2041 and the fourth expression result acquired by the fourth obtaining unit 2042, the prediction result corresponding to each video frame feature sequence.
- the second neural network model may be further used to classify the feature expression result.
- the feature expression result can be passed through two nonlinear transformations to obtain a gate expression and an activation expression respectively; the two expressions are then multiplied and summed to obtain the final feature expression used for classification, which helps improve classification accuracy.
- the second calculating unit 2043 includes:
- the second calculating subunit 20431 is configured to calculate a prediction result corresponding to each video frame feature sequence by using the following formula:
- $g_n = \sigma_g(W_g h + b_g),\ n \in [1, N]$;
- $a_n = \sigma_a(W_a h + b_a),\ n \in [1, N]$;
- lable represents the prediction result of a video frame feature sequence
- $g_n$ represents the third expression result
- $a_n$ represents the fourth expression result
- $\sigma_g$ represents the softmax function
- $\sigma_a$ represents the sigmoid function
- $h$ represents the feature expression result of the video frame feature sequence
- $W_g$ and $b_g$ represent the parameters of the first sub-model
- $W_a$ and $b_a$ represent the parameters of the second sub-model
- $N$ represents the total number of calculations obtained by nonlinearly transforming the feature expression result
- $n$ represents an integer from 1 to $N$.
- this embodiment thus specifies how the prediction result corresponding to each video frame feature sequence is calculated from the third expression result and the fourth expression result.
- the prediction result can be calculated by using the relevant formula, which provides a feasible way for the realization of the scheme, thereby improving the feasibility and operability of the scheme.
- the server 20 further includes:
- the calculation module 207 is configured to, after the second input module 204 processes the feature expression result corresponding to the at least one video frame feature sequence by using the second neural network model to obtain the prediction result corresponding to the at least one video frame feature sequence, calculate the category of the to-be-processed video according to the prediction result corresponding to the at least one video frame feature sequence and the weight value corresponding to the at least one video frame feature sequence;
- the classification module 208 is configured to classify the to-be-processed video according to the category of the to-be-processed video calculated by the calculation module 207.
- the server may further calculate the category of the video to be processed according to the prediction result corresponding to each video frame feature sequence and the weight value corresponding to each video frame feature sequence, and finally classify the video to be processed according to that category.
- FIG. 15 is a schematic structural diagram of a server according to an embodiment of the present invention.
- the server 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors), memory 332, and one or more storage media 330 storing applications 342 or data 344 (for example, one or more mass storage devices).
- the memory 332 and the storage medium 330 may be short-term storage or persistent storage.
- the program stored on storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations in the server.
- the central processor 322 can be configured to communicate with the storage medium 330 to perform a series of instruction operations in the storage medium 330 on the server 300.
- the server 300 may also include one or more power sources 326, one or more wired or wireless network interfaces 350, one or more input and output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
- the steps performed by the server in the above embodiment may be based on the server structure shown in FIG.
- the CPU 322 included in the server has the following functions:
- time feature sampling rule is a correspondence between a time feature and a video frame feature sequence
- processing, by the first neural network model, the at least one video frame feature sequence to obtain a feature expression result corresponding to the at least one video frame feature sequence;
- the first neural network model is a recurrent neural network model;
- the CPU 322 is further configured to perform the following steps:
- the CPU 322 is specifically configured to perform the following steps:
- each time window includes at least one video frame in the to-be-processed video
- the CPU 322 is specifically configured to perform the following steps:
- the CPU 322 is specifically configured to perform the following steps:
- h represents a feature expression result of a video frame feature sequence
- Representing the first expression result the Representing the second expression result
- the x t represents the video frame feature sequence at the time t
- GRU() represents processing by a gated recurrent unit (GRU) neural network
- T represents the total time of the video to be processed
- t represents an integer from 1 to T.
- the CPU 322 is specifically configured to perform the following steps:
- the CPU 322 is specifically configured to perform the following steps:
- $g_n = \sigma_g(W_g h + b_g),\ n \in [1, N]$;
- $a_n = \sigma_a(W_a h + b_a),\ n \in [1, N]$;
- lable represents the prediction result of a video frame feature sequence
- $g_n$ represents the third expression result
- $a_n$ represents the fourth expression result
- $\sigma_g$ represents the softmax function
- $\sigma_a$ represents the sigmoid function
- $h$ represents the feature expression result of the video frame feature sequence
- $W_g$ and $b_g$ represent the parameters of the first sub-model
- $W_a$ and $b_a$ represent the parameters of the second sub-model
- $N$ represents the total number of calculations obtained by nonlinearly transforming the feature expression result
- $n$ represents an integer from 1 to $N$.
- the CPU 322 is further configured to perform the following steps:
- the to-be-processed video is classified according to the category of the to-be-processed video.
- the embodiment of the present invention further provides a storage medium for storing program code, which is used to execute any one of the information processing methods described in the foregoing embodiments.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave).
- the computer-readable storage medium can be any available medium that the computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
- the available media can be magnetic media (for example, a floppy disk, hard disk, or magnetic tape), optical media (for example, a digital versatile disc (DVD)), or semiconductor media (for example, a solid state disk (SSD)).
- the disclosed system, apparatus, and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the unit is only a logical function division.
- there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
- the technical solution of the present invention which is essential or contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
- a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
- the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Library & Information Science (AREA)
- Algebra (AREA)
- Image Analysis (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
Claims (16)
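- A video classification method, performed by a computer device, comprising: acquiring a to-be-processed video, wherein the to-be-processed video includes multiple video frames, and each video frame corresponds to one time feature; sampling the to-be-processed video according to a time feature sampling rule, and acquiring at least one video frame feature sequence, wherein the time feature sampling rule is a correspondence between time features and video frame feature sequences; processing the at least one video frame feature sequence by using a first neural network model, to obtain a feature expression result corresponding to the at least one video frame feature sequence, the first neural network model being a recurrent neural network model; processing the feature expression result corresponding to the at least one video frame feature sequence by using a second neural network model, to obtain a prediction result corresponding to the at least one video frame feature sequence; and determining a category of the to-be-processed video according to the prediction result corresponding to the at least one video frame feature sequence.
- An information processing method, performed by a computer device, comprising: acquiring a to-be-processed video, wherein the to-be-processed video includes multiple video frames, and each video frame corresponds to one time feature; sampling the to-be-processed video according to a time feature sampling rule, and acquiring at least one video frame feature sequence, wherein the time feature sampling rule is a correspondence between time features and video frame feature sequences; processing the at least one video frame feature sequence by using a first neural network model, to obtain a feature expression result corresponding to the at least one video frame feature sequence, the first neural network model being a recurrent neural network model; and processing the feature expression result corresponding to the at least one video frame feature sequence by using a second neural network model, to obtain a prediction result corresponding to the at least one video frame feature sequence, wherein the prediction result is used to determine a category of the to-be-processed video.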
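- The method according to claim 2, wherein after the acquiring a to-be-processed video, the method further comprises: processing each video frame in the to-be-processed video by using a convolutional neural network (CNN), to obtain the time feature corresponding to each video frame; and determining a time feature sequence of the to-be-processed video according to the time feature corresponding to each video frame, wherein the time feature sequence is used for sampling.
- The method according to claim 3, wherein the sampling the to-be-processed video according to a time feature sampling rule, and acquiring at least one video frame feature sequence comprises: determining at least one time window according to the time feature sampling rule, wherein each time window includes at least one video frame in the to-be-processed video; and extracting, from the time feature sequence, the video frame feature sequence corresponding to each time window.
- The method according to claim 2, wherein the processing the at least one video frame feature sequence by using a first neural network model, to obtain a feature expression result corresponding to each video frame feature sequence comprises: inputting the at least one video frame feature sequence into a forward recurrent neural network in the first neural network model, to obtain a first expression result; inputting the at least one video frame feature sequence into a backward recurrent neural network in the first neural network model, to obtain a second expression result; and calculating, according to the first expression result and the second expression result, the feature expression result corresponding to the at least one video frame feature sequence.
- The method according to claim 2, wherein the processing the feature expression result corresponding to the at least one video frame feature sequence by using a second neural network model, to obtain a prediction result corresponding to the at least one video frame feature sequence comprises: inputting the feature expression result corresponding to the at least one video frame feature sequence into a first sub-model in the second neural network model, to obtain a third expression result; inputting the feature expression result corresponding to the at least one video frame feature sequence into a second sub-model in the second neural network model, to obtain a fourth expression result; and calculating, according to the third expression result and the fourth expression result, the prediction result corresponding to the at least one video frame feature sequence.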
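- The method according to claim 7, wherein the calculating, according to the third expression result and the fourth expression result, the prediction result corresponding to the at least one video frame feature sequence comprises: calculating the prediction result corresponding to the at least one video frame feature sequence by using the following formulas: $g_n = \sigma_g(W_g h + b_g),\ n \in [1, N]$; $a_n = \sigma_a(W_a h + b_a),\ n \in [1, N]$; wherein lable represents the prediction result of a video frame feature sequence, $g_n$ represents the third expression result, $a_n$ represents the fourth expression result, $\sigma_g$ represents the softmax function, $\sigma_a$ represents the sigmoid function, $h$ represents the feature expression result of the video frame feature sequence, $W_g$ and $b_g$ represent the parameters of the first sub-model, $W_a$ and $b_a$ represent the parameters of the second sub-model, $N$ represents the total number of calculations obtained by nonlinearly transforming the feature expression result, and $n$ represents an integer from 1 to $N$.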
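- The method according to any one of claims 1 to 8, wherein after the processing the feature expression result corresponding to the at least one video frame feature sequence by using a second neural network model, to obtain a prediction result corresponding to the at least one video frame feature sequence, the method further comprises: calculating the category of the to-be-processed video according to the prediction result corresponding to the at least one video frame feature sequence and a weight value corresponding to the at least one video frame feature sequence; and classifying the to-be-processed video according to the category of the to-be-processed video.
- A server, comprising: a first obtaining module, configured to acquire a to-be-processed video, wherein the to-be-processed video includes multiple video frames, and each video frame corresponds to one time feature; a second obtaining module, configured to sample, according to a time feature sampling rule, the to-be-processed video acquired by the first obtaining module, and acquire at least one video frame feature sequence, wherein the time feature sampling rule is a correspondence between time features and video frame feature sequences; a first input module, configured to process, by using a first neural network model, the at least one video frame feature sequence acquired by the second obtaining module, to obtain a feature expression result corresponding to the at least one video frame feature sequence, the first neural network model being a recurrent neural network model; and a second input module, configured to process, by using a second neural network model, the feature expression result corresponding to the at least one video frame feature sequence obtained by the first input module, to obtain a prediction result corresponding to the at least one video frame feature sequence, wherein the prediction result is used to determine a category of the to-be-processed video.
- The server according to claim 10, further comprising: a calculation module, configured to, after the second input module processes the feature expression result corresponding to the at least one video frame feature sequence by using the second neural network model to obtain the prediction result corresponding to the at least one video frame feature sequence, calculate the category of the to-be-processed video according to the prediction result corresponding to the at least one video frame feature sequence and a weight value corresponding to the at least one video frame feature sequence; and a classification module, configured to classify the to-be-processed video according to the category of the to-be-processed video calculated by the calculation module.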
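- A server, comprising a memory, a processor, and a bus system, wherein the memory is configured to store a program; the processor is configured to execute the program in the memory, specifically including the following steps: acquiring a to-be-processed video, wherein the to-be-processed video includes multiple video frames, and each video frame corresponds to one time feature; sampling the to-be-processed video according to a time feature sampling rule, and acquiring at least one video frame feature sequence, wherein the time feature sampling rule is a correspondence between time features and video frame feature sequences; processing the at least one video frame feature sequence by using a first neural network model, to obtain a feature expression result corresponding to the at least one video frame feature sequence, the first neural network model being a recurrent neural network model; and processing the feature expression result corresponding to the at least one video frame feature sequence by using a second neural network model, to obtain a prediction result corresponding to the at least one video frame feature sequence, wherein the prediction result is used to determine a category of the to-be-processed video; and the bus system is configured to connect the memory and the processor, so that the memory and the processor communicate with each other.
- The server according to claim 12, wherein the processor is specifically configured to perform the following steps: inputting the at least one video frame feature sequence into a forward recurrent neural network in the first neural network model, to obtain a first expression result; inputting the at least one video frame feature sequence into a backward recurrent neural network in the first neural network model, to obtain a second expression result; and calculating, according to the first expression result and the second expression result, the feature expression result corresponding to the at least one video frame feature sequence.
- The server according to claim 12, wherein the processor is specifically configured to perform the following steps: inputting the feature expression result corresponding to the at least one video frame feature sequence into a first sub-model in the second neural network model, to obtain a third expression result; inputting the feature expression result corresponding to the at least one video frame feature sequence into a second sub-model in the second neural network model, to obtain a fourth expression result; and calculating, according to the third expression result and the fourth expression result, the prediction result corresponding to the at least one video frame feature sequence.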
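- A computer-readable storage medium, configured to store program code, the program code being used to perform the method according to any one of claims 2 to 9.
- A computer program product, comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 2 to 9.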
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020515067A JP7127120B2 (ja) | 2017-09-15 | 2018-08-16 | ビデオ分類の方法、情報処理の方法及びサーバー、並びにコンピュータ可読記憶媒体及びコンピュータプログラム |
| KR1020197032023A KR102392943B1 (ko) | 2017-09-15 | 2018-08-16 | 비디오 분류 방법, 정보 처리 방법 및 서버 |
| EP18855424.0A EP3683723A4 (en) | 2017-09-15 | 2018-08-16 | VIDEO CLASSIFICATION PROCEDURES, INFORMATION PROCESSING PROCEDURES AND SERVER |
| US16/558,015 US10956748B2 (en) | 2017-09-15 | 2019-08-30 | Video classification method, information processing method, and server |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710833668.8 | 2017-09-15 | ||
| CN201710833668.8A CN109508584B (zh) | 2017-09-15 | 2017-09-15 | 视频分类的方法、信息处理的方法以及服务器 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/558,015 Continuation US10956748B2 (en) | 2017-09-15 | 2019-08-30 | Video classification method, information processing method, and server |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019052301A1 (zh) | 2019-03-21 |
Family
ID=65723493
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/100733 Ceased WO2019052301A1 (zh) | 2017-09-15 | 2018-08-16 | 视频分类的方法、信息处理的方法以及服务器 |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US10956748B2 (zh) |
| EP (1) | EP3683723A4 (zh) |
| JP (1) | JP7127120B2 (zh) |
| KR (1) | KR102392943B1 (zh) |
| CN (2) | CN109508584B (zh) |
| MA (1) | MA50252A (zh) |
| WO (1) | WO2019052301A1 (zh) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111428660A (zh) * | 2020-03-27 | 2020-07-17 | 腾讯科技(深圳)有限公司 | 视频剪辑方法和装置、存储介质及电子装置 |
| KR20200140589A (ko) * | 2019-06-07 | 2020-12-16 | 국방과학연구소 | 순환 신경망을 이용한 코덱 분류 시스템 및 코덱 분류 방법 |
| CN113010735A (zh) * | 2019-12-20 | 2021-06-22 | 北京金山云网络技术有限公司 | 一种视频分类方法、装置、电子设备及存储介质 |
Families Citing this family (29)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11636681B2 (en) * | 2018-11-21 | 2023-04-25 | Meta Platforms, Inc. | Anticipating future video based on present video |
| JP7352369B2 (ja) * | 2019-03-29 | 2023-09-28 | 株式会社日立システムズ | 予測モデル評価システム、予測モデル評価方法 |
| CN111782734B (zh) * | 2019-04-04 | 2024-04-12 | 华为技术服务有限公司 | 数据压缩、解压方法和装置 |
| CN110162669B (zh) * | 2019-04-04 | 2021-07-02 | 腾讯科技(深圳)有限公司 | 视频分类处理方法、装置、计算机设备及存储介质 |
| CN110263216B (zh) * | 2019-06-13 | 2022-01-28 | 腾讯科技(深圳)有限公司 | 一种视频分类的方法、视频分类模型训练的方法及装置 |
| CN111144508A (zh) * | 2019-12-30 | 2020-05-12 | 中国矿业大学(北京) | 煤矿副井轨道运输自动控制系统与控制方法 |
| CN111190600B (zh) * | 2019-12-31 | 2023-09-19 | 中国银行股份有限公司 | 基于gru注意力模型的前端代码自动生成的方法及系统 |
| CN111104930B (zh) * | 2019-12-31 | 2023-07-11 | 腾讯科技(深圳)有限公司 | 视频处理方法、装置、电子设备及存储介质 |
| CN111209439B (zh) * | 2020-01-10 | 2023-11-21 | 北京百度网讯科技有限公司 | 视频片段检索方法、装置、电子设备及存储介质 |
| CN111259779B (zh) * | 2020-01-13 | 2023-08-01 | 南京大学 | 一种基于中心点轨迹预测的视频动作检测方法 |
| CN111209883B (zh) * | 2020-01-13 | 2023-08-04 | 南京大学 | 一种基于多源运动特征融合的时序自适应视频分类方法 |
| US11354906B2 (en) * | 2020-04-13 | 2022-06-07 | Adobe Inc. | Temporally distributed neural networks for video semantic segmentation |
| CN111489378B (zh) * | 2020-06-28 | 2020-10-16 | 腾讯科技(深圳)有限公司 | 视频帧特征提取方法、装置、计算机设备及存储介质 |
| CN111737521B (zh) * | 2020-08-04 | 2020-11-24 | 北京微播易科技股份有限公司 | 一种视频分类方法和装置 |
| DE102020212515A1 (de) * | 2020-10-02 | 2022-04-07 | Robert Bosch Gesellschaft mit beschränkter Haftung | Verfahren und Vorrichtung zum Trainieren eines maschinellen Lernsystems |
| US12058321B2 (en) * | 2020-12-16 | 2024-08-06 | Tencent America LLC | Method and apparatus for video coding |
| US11943271B2 (en) * | 2020-12-17 | 2024-03-26 | Tencent America LLC | Reference of neural network model by immersive media for adaptation of media for streaming to heterogenous client end-points |
| CN114764859A (zh) * | 2020-12-30 | 2022-07-19 | Tcl科技集团股份有限公司 | 视频特征提取方法、装置、终端设备及存储介质 |
| CN113204992B (zh) * | 2021-03-26 | 2023-10-27 | 北京达佳互联信息技术有限公司 | 视频质量确定方法、装置、存储介质及电子设备 |
| CN115205723A (zh) * | 2021-04-13 | 2022-10-18 | 影石创新科技股份有限公司 | 视频精彩片段的检测方法、装置、计算机设备和存储介质 |
| CN113349791B (zh) * | 2021-05-31 | 2024-07-16 | 平安科技(深圳)有限公司 | 异常心电信号的检测方法、装置、设备及介质 |
| CN113204655B (zh) * | 2021-07-02 | 2021-11-23 | 北京搜狐新媒体信息技术有限公司 | 多媒体信息的推荐方法、相关装置及计算机存储介质 |
| CN113779472B (zh) * | 2021-07-30 | 2024-10-01 | 淘宝(中国)软件有限公司 | 内容审核方法、装置及电子设备 |
| KR102430989B1 (ko) | 2021-10-19 | 2022-08-11 | 주식회사 노티플러스 | 인공지능 기반 콘텐츠 카테고리 예측 방법, 장치 및 시스템 |
| CN114443896B (zh) * | 2022-01-25 | 2023-09-15 | 百度在线网络技术(北京)有限公司 | 数据处理方法和用于训练预测模型的方法 |
| CN114611584B (zh) * | 2022-02-21 | 2024-07-02 | 上海市胸科医院 | Cp-ebus弹性模式视频的处理方法、装置、设备与介质 |
| CN114764898A (zh) * | 2022-03-31 | 2022-07-19 | 阿依瓦(北京)技术有限公司 | 一种视频分析方法及装置 |
| CN115424172A (zh) * | 2022-08-25 | 2022-12-02 | 高新兴科技集团股份有限公司 | 一种交通事件检测方法、装置、设备及存储介质 |
| KR102806775B1 (ko) * | 2023-12-27 | 2025-05-16 | 한국과학기술원 | 자연어와 스케치 기반의 비디오 편집 시스템 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104331442A (zh) * | 2014-10-24 | 2015-02-04 | 华为技术有限公司 | 视频分类方法和装置 |
| US8990132B2 (en) * | 2010-01-19 | 2015-03-24 | James Ting-Ho Lo | Artificial neural networks based on a low-order model of biological neural networks |
| CN104966104A (zh) * | 2015-06-30 | 2015-10-07 | 孙建德 | 一种基于三维卷积神经网络的视频分类方法 |
| CN106503723A (zh) * | 2015-09-06 | 2017-03-15 | 华为技术有限公司 | 一种视频分类方法及装置 |
Family Cites Families (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100656373B1 (ko) | 2005-12-09 | 2006-12-11 | 한국전자통신연구원 | Method and apparatus for identifying harmful video by applying per-time-interval priorities and a discrimination policy |
| CN103544498B (zh) * | 2013-09-25 | 2017-02-08 | 华中科技大学 | Video content detection method and system based on adaptive sampling |
| US10762894B2 (en) * | 2015-03-27 | 2020-09-01 | Google Llc | Convolutional neural networks |
| JP6556509B2 (ja) | 2015-06-16 | 2019-08-07 | Cyberdyne株式会社 | Photoacoustic imaging apparatus and light source unit |
| CN104951965B (zh) * | 2015-06-26 | 2017-04-19 | 深圳市腾讯计算机系统有限公司 | Advertisement delivery method and apparatus |
| US9697833B2 (en) * | 2015-08-25 | 2017-07-04 | Nuance Communications, Inc. | Audio-visual speech recognition with scattering operators |
| CN105550699B (zh) * | 2015-12-08 | 2019-02-12 | 北京工业大学 | Video recognition and classification method based on CNN fusion of spatio-temporal salient information |
| JP6517681B2 (ja) * | 2015-12-17 | 2019-05-22 | 日本電信電話株式会社 | Video pattern learning apparatus, method, and program |
| US11055537B2 (en) * | 2016-04-26 | 2021-07-06 | Disney Enterprises, Inc. | Systems and methods for determining actions depicted in media contents based on attention weights of media content frames |
| CN106131627B (zh) * | 2016-07-07 | 2019-03-26 | 腾讯科技(深圳)有限公司 | Video processing method, apparatus, and system |
| US10402697B2 (en) * | 2016-08-01 | 2019-09-03 | Nvidia Corporation | Fusing multilayer and multimodal deep neural networks for video classification |
| CN106779467A (zh) * | 2016-12-31 | 2017-05-31 | 成都数联铭品科技有限公司 | Enterprise industry classification system based on automatic information screening |
| US11263525B2 (en) * | 2017-10-26 | 2022-03-01 | Nvidia Corporation | Progressive modification of neural networks |
| US10334202B1 (en) * | 2018-02-28 | 2019-06-25 | Adobe Inc. | Ambient audio generation based on visual information |
| US20190286990A1 (en) * | 2018-03-19 | 2019-09-19 | AI Certain, Inc. | Deep Learning Apparatus and Method for Predictive Analysis, Classification, and Feature Detection |
| US10860858B2 (en) * | 2018-06-15 | 2020-12-08 | Adobe Inc. | Utilizing a trained multi-modal combination model for content and text-based evaluation and distribution of digital video content to client devices |
| US10418957B1 (en) * | 2018-06-29 | 2019-09-17 | Amazon Technologies, Inc. | Audio event detection |
| US10699129B1 (en) * | 2019-11-15 | 2020-06-30 | Fudan University | System and method for video captioning |
2017
- 2017-09-15: CN application CN201710833668.8A, published as CN109508584B (zh), status: Active
- 2017-09-15: CN application CN201910834142.0A, published as CN110532996B (zh), status: Active

2018
- 2018-08-16: JP application JP2020515067A, published as JP7127120B2 (ja), status: Active
- 2018-08-16: KR application KR1020197032023A, published as KR102392943B1 (ko), status: Active
- 2018-08-16: MA application MA050252A, published as MA50252A (fr), status: Unknown
- 2018-08-16: EP application EP18855424.0A, published as EP3683723A4 (en), status: Pending
- 2018-08-16: WO application PCT/CN2018/100733, published as WO2019052301A1 (zh), status: Ceased (not active)

2019
- 2019-08-30: US application US16/558,015, published as US10956748B2 (en), status: Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8990132B2 (en) * | 2010-01-19 | 2015-03-24 | James Ting-Ho Lo | Artificial neural networks based on a low-order model of biological neural networks |
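| CN104331442A (zh) * | 2014-10-24 | 2015-02-04 | 华为技术有限公司 | Video classification method and apparatus |
| CN104966104A (zh) * | 2015-06-30 | 2015-10-07 | 孙建德 | Video classification method based on a three-dimensional convolutional neural network |
| CN106503723A (zh) * | 2015-09-06 | 2017-03-15 | 华为技术有限公司 | Video classification method and apparatus |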
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3683723A4 |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
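| KR20200140589A (ko) * | 2019-06-07 | 2020-12-16 | 국방과학연구소 | Codec classification system and codec classification method using a recurrent neural network |
| KR102255312B1 (ko) * | 2019-06-07 | 2021-05-25 | 국방과학연구소 | Codec classification system and codec classification method using a recurrent neural network |
| CN113010735A (zh) * | 2019-12-20 | 2021-06-22 | 北京金山云网络技术有限公司 | Video classification method and apparatus, electronic device, and storage medium |
| CN113010735B (zh) * | 2019-12-20 | 2024-03-08 | 北京金山云网络技术有限公司 | Video classification method and apparatus, electronic device, and storage medium |
| CN111428660A (zh) * | 2020-03-27 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Video clipping method and apparatus, storage medium, and electronic apparatus |
| CN111428660B (zh) * | 2020-03-27 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Video clipping method and apparatus, storage medium, and electronic apparatus |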
Also Published As
| Publication number | Publication date |
|---|---|
| CN109508584A (zh) | 2019-03-22 |
| MA50252A (fr) | 2020-07-22 |
| EP3683723A1 (en) | 2020-07-22 |
| CN109508584B (zh) | 2022-12-02 |
| JP2020533709A (ja) | 2020-11-19 |
| CN110532996A (zh) | 2019-12-03 |
| EP3683723A4 (en) | 2021-06-23 |
| CN110532996B (zh) | 2021-01-22 |
| KR20190133040A (ko) | 2019-11-29 |
| JP7127120B2 (ja) | 2022-08-29 |
| US20190384985A1 (en) | 2019-12-19 |
| US10956748B2 (en) | 2021-03-23 |
| KR102392943B1 (ko) | 2022-04-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
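| WO2019052301A1 (zh) | | Video classification method, information processing method, and server |
| CN113378784B (zh) | | Method for training a video tag recommendation model and method for determining video tags |
| CN110162669B (zh) | | Video classification processing method and apparatus, computer device, and storage medium |
| CN111428088B (zh) | | Video classification method and apparatus, and server |
| CN111798879B (zh) | | Method and apparatus for generating video |
| WO2020221278A1 (zh) | | Video classification method, training method for its model, apparatus, and electronic device |
| US11625433B2 (en) | | Method and apparatus for searching video segment, device, and medium |
| US11948359B2 (en) | | Video processing method and apparatus, computing device and medium |
| CN112559800B (zh) | | Method and apparatus for processing video, electronic device, medium, and product |
| WO2020177673A1 (zh) | | Video sequence selection method, computer device, and storage medium |
| CN116977774B (zh) | | Image generation method, apparatus, device, and medium |
| WO2020108396A1 (zh) | | Video classification method and server |
| CN108629224A (zh) | | Information presentation method and apparatus |
| US20210117687A1 (en) | | Image processing method, image processing device, and storage medium |
| WO2020103674A1 (zh) | | Method and apparatus for generating natural language description information |
| CN113434716B (zh) | | Cross-modal information retrieval method and apparatus |
| US20200321026A1 (en) | | Method and apparatus for generating video |
| CN112883731A (zh) | | Content classification method and apparatus |
| CN113743277A (zh) | | Short video classification method and system, device, and storage medium |
| CN106611015A (zh) | | Tag processing method and apparatus |
| CN110163052A (zh) | | Video action recognition method and apparatus, and machine device |
| CN113742525A (zh) | | Self-supervised video hash learning method and system, electronic device, and storage medium |
| CN114330239A (zh) | | Text processing method and apparatus, storage medium, and electronic device |
| CN114881196A (zh) | | Student network processing method and apparatus, and electronic device |
| CN113811893A (zh) | | Connection weight learning for guided architecture evolution |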
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18855424; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 20197032023; Country of ref document: KR; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2020515067; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2018855424; Country of ref document: EP; Effective date: 20200415 |






