WO2019184654A1 - Method and device for training neural network model, and method and device for generating time-lapse photography video - Google Patents
- Publication number
- WO2019184654A1 (PCT/CN2019/076724)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- training
- time
- lapse
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2193—Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- The present application relates to the field of artificial intelligence technologies, and in particular to a training method for a neural network model, a method for generating a time-lapse photography video, and a corresponding apparatus, system, device, storage medium, and computer program product.
- Time-lapse photography, also known as time-lapse video, is a shooting technique that compresses time. A series of photos is taken and later assembled into a video, so that a process lasting minutes, hours, days, or even years is played back within a short period of time.
- Methods for generating time-lapse photography videos are still at the academic research stage and are mainly implemented with neural network models. However, the video content they generate is blurry and unrealistic and does not meet the needs of users, so these methods have not been widely applied.
- The embodiments of the present application provide a training method for a neural network model and a method for generating a time-lapse photography video, which can generate time-lapse photography videos that are clear, smooth, and highly realistic, and that meet the needs of users.
- the present application also provides corresponding apparatus, systems, devices, storage media, and computer program products.
- a training method for a neural network model, applied to a server comprising:
- the training sample includes a training video and a corresponding image set, the image set including a first preset number of frames of the first frame image or the tail frame image of the training video;
- the neural network model comprises a basic network for modeling the content of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video;
- the basic network is a first generative adversarial network that takes an image set including the first preset number of frames of the same image as input and outputs a basic time-lapse photography video;
- the optimized network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
- a method for generating a time-lapse video is applied to an electronic device, including:
- a training device for a neural network model comprising:
- an acquiring module configured to acquire a training sample, where the training sample includes a training video and a corresponding image set, and the image set includes a first preset number of frames of the first frame image or the tail frame image of the training video;
- a training module configured to train, according to the training sample, a neural network model that satisfies a training end condition, where the neural network model includes a basic network for modeling the content of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video; the basic network is a first generative adversarial network that takes an image set including the first preset number of frames of the same image as input and outputs a basic time-lapse photography video; the optimized network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
- a device for generating a time-lapse video comprising:
- a first generating module configured to generate, according to the specified image, an image set including the first preset number of frames of the specified image;
- a second generating module configured to perform content modeling and motion state modeling on the image set by using a pre-trained neural network model, and obtain the time-lapse photography video output by the neural network model;
- the neural network model is trained by the training method of the neural network model described in the present application.
- a time-lapse photography video generation system includes:
- a terminal and a server, where the terminal and the server interact through a network;
- the server is configured to receive a specified image sent by the terminal; generate, according to the specified image, an image set including the first preset number of frames of the specified image; perform content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain the time-lapse photography video output by the neural network model; and send the time-lapse photography video to the terminal; wherein the neural network model is trained by the training method of the neural network model described in the present application.
- An electronic device comprising:
- the memory is used to store a computer program
- the processor is configured to invoke and execute a computer program in the memory to implement a training method of the neural network model described in the present application, or to implement a method for generating a time-lapse video as described in the present application.
- a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method of the neural network model as described above, and/or the steps of the method for generating a time-lapse photography video as described above.
- a computer program product comprising instructions which, when run on a computer, cause the computer to perform a training method of a neural network model as described herein, or to perform a method of generating a time-lapse video as described herein.
- The present application provides a method for generating a time-lapse photography video based on a model with a dual-network structure. The dual-network structure includes a basic network for modeling the content of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video. The basic network is a first generative adversarial network that takes an image set including a first preset number of frames of the specified image as input and outputs a basic time-lapse photography video. The optimized network is a second generative adversarial network that takes the output of the basic network as input, models the motion state of the time-lapse photography video, and outputs an optimized time-lapse photography video.
- The image set corresponding to each training video is generated from that training video, and the image set includes a first preset number of frames of the first frame image or the tail frame image of the training video. The neural network model composed of the basic network and the optimized network is trained with the training videos and their corresponding image sets.
- Once trained, the model can be used to generate time-lapse photography videos.
- In use, the time-lapse photography video output by the neural network model is obtained.
- The method continuously refines the time-lapse photography video through multi-stage generative adversarial networks and predicts reasonable future frames or historical frames by modeling both the content and the motion state, thereby generating the time-lapse photography video from coarse to fine.
- On the one hand, the method preserves the realism of the content and the plausibility of the motion information, so that the generated time-lapse photography video is more realistic and more natural. On the other hand, the model used in the method is a cascaded dual-network structure, which is easy to implement and simplify and can be applied in cloud or offline scenarios.
- FIG. 1 is a flowchart of a method for generating a time-lapse video according to an embodiment of the present application
- FIG. 2 is a flowchart of a training method of a neural network model according to an embodiment of the present application
- FIG. 3 is a flowchart of another training method of a neural network model according to an embodiment of the present application.
- FIG. 4 is a flowchart of a training method of a basic network according to an embodiment of the present application.
- FIG. 5 is a structural diagram of a basic network according to an embodiment of the present application.
- FIG. 6 is a flowchart of a training method of an optimized network according to an embodiment of the present application.
- FIG. 7 is a structural diagram of an optimized network according to an embodiment of the present application.
- FIG. 8 is a structural diagram of a system for generating a time-lapse video according to an embodiment of the present application.
- FIG. 9 is a signaling flowchart of a method for generating a time-lapse video according to an embodiment of the present application.
- FIG. 10 is a structural diagram of a device for generating a time-lapse video according to an embodiment of the present application.
- FIG. 11 is a structural diagram of a training apparatus for a neural network model according to an embodiment of the present application.
- FIG. 12 is a structural diagram of another training apparatus for a neural network model according to an embodiment of the present application.
- FIG. 13 is a hardware structural diagram of an electronic device according to an embodiment of the present application.
- FIG. 1 is a flowchart of a method for generating a time-lapse video according to an embodiment of the present application. As shown in Figure 1, the method includes:
- Step S11: acquiring a specified image.
- One implementation is to select a photo from an album as the specified image: in response to a selection instruction, the terminal obtains the selected photo in the album as the specified image. Another implementation is to capture an image in real time as the specified image: in response to a photographing instruction, the terminal obtains the captured photo as the specified image.
- the server receives a time-lapse photography generation request sent by the terminal, where the time-lapse photography generation request carries the specified image, and the server may acquire the specified image from the time-lapse photography generation request.
- Step S12: generating, according to the specified image, an image set including a first preset number of frames of the specified image.
- This embodiment provides two ways to generate an image set.
- One implementation is to copy the specified image until the number of specified images reaches the first preset number, and then generate the image set from the first preset number of frames of the specified image. Another implementation is to acquire the specified image repeatedly from the data source, for example by acquiring the same image multiple times from the album, until the number of specified images reaches the first preset number, and then generate the image set from the first preset number of frames of the specified image.
- The first preset number may be 32, that is, the image set includes 32 frames of the specified image. It should be noted that the present application does not limit the specific value of the first preset number. The value may be adjusted according to actual needs, and changing it does not depart from the protection scope of the present application.
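- The copying approach can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `make_image_set`, the NumPy representation, the (frames, height, width, channels) layout, and the 128x128 image size are all assumptions.

```python
import numpy as np

def make_image_set(image: np.ndarray, num_frames: int = 32) -> np.ndarray:
    """Stack `num_frames` copies of one image into a (frames, H, W, C) array."""
    if image.ndim != 3:
        raise ValueError("expected an image of shape (height, width, channels)")
    # Repeating along a new leading axis yields independent copies of the image.
    return np.repeat(image[np.newaxis, ...], num_frames, axis=0)

# Hypothetical 128x128 RGB image standing in for the specified image.
specified_image = np.zeros((128, 128, 3), dtype=np.uint8)
image_set = make_image_set(specified_image)  # first preset number = 32
```

- Every frame of `image_set` is identical, which matches the requirement that the image set contain the first preset number of frames of the same specified image.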
- Step S13: performing content modeling and motion state modeling on the image set by using a pre-trained neural network model, and obtaining the time-lapse photography video output by the neural network model.
- the neural network model comprises a basic network and an optimized network.
- The basic network is used for content modeling of the time-lapse photography video. It is a generative adversarial network that takes an image set including the first preset number of frames of the specified image as input and outputs a basic time-lapse photography video; for convenience of expression, it is referred to as the first generative adversarial network.
- The optimized network is used to model the motion state of the time-lapse photography video. It is a generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video; for convenience of expression, it is referred to as the second generative adversarial network.
- A generative adversarial network is a network based on the idea of a two-player zero-sum game. The network includes a generative model (also called a generator) and a discriminative model (also called a discriminator). The generative model captures the distribution of the sample data and, from noise obeying a certain distribution (such as a uniform or Gaussian distribution), generates sample data resembling the real training data. The discriminative model is a binary classifier that estimates the probability that a sample comes from the real training data rather than from the generated data: if the sample comes from the real training data, it outputs a high probability; otherwise, it outputs a low probability.
- The goal of the generative model is to generate sample data indistinguishable from the real training data, so that the discriminative model cannot tell them apart; the goal of the discriminative model is to detect the sample data produced by the generative model.
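- The opposing goals of the two models can be illustrated with a small numeric sketch. It assumes the standard cross-entropy form of the generative adversarial objective, which this text does not spell out at this point; the function names and the probabilities are hypothetical.

```python
import math

def discriminator_loss(p_real: float, p_fake: float) -> float:
    """Binary cross-entropy for one real and one generated sample.

    p_real: probability the discriminator assigns to a real sample being real.
    p_fake: probability the discriminator assigns to a generated sample being real.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake: float) -> float:
    """Minimax generator term: it decreases as the discriminator is fooled."""
    return math.log(1.0 - p_fake)

# A discriminator that separates real from generated data has a low loss ...
confident = discriminator_loss(p_real=0.9, p_fake=0.1)
# ... while one that cannot tell them apart has a higher loss.
confused = discriminator_loss(p_real=0.5, p_fake=0.5)
```

- The generator is rewarded when `p_fake` rises, while the discriminator is rewarded when it falls, which is the zero-sum relationship described above.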
- The electronic device is deployed with the neural network model. The image set generated from the specified image is input to the neural network model; the basic network performs content modeling to generate a basic time-lapse photography video, which is then input to the optimized network. The optimized network models the motion state of the time-lapse photography video and outputs the optimized time-lapse photography video, which is the final output time-lapse photography video.
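- The cascaded data flow can be sketched as below. The two stages are passed in as plain callables; `generate_time_lapse`, `fake_basic`, and `fake_optimized` are hypothetical stand-ins used only to make the order of operations visible, not trained networks.

```python
from typing import Callable, List

Video = List[str]  # placeholder type: a list of frame labels

def generate_time_lapse(image_set: Video,
                        basic_network: Callable[[Video], Video],
                        optimized_network: Callable[[Video], Video]) -> Video:
    """Run the two stages in sequence: content first, then motion refinement."""
    basic_video = basic_network(image_set)   # stage 1: content modeling
    return optimized_network(basic_video)    # stage 2: motion state modeling

# Stand-in stages that only tag each frame, so the data flow can be inspected.
def fake_basic(frames: Video) -> Video:
    return [frame + "|content" for frame in frames]

def fake_optimized(frames: Video) -> Video:
    return [frame + "|motion" for frame in frames]

result = generate_time_lapse(["frame"] * 3, fake_basic, fake_optimized)
```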
- The video generated by this method may represent the future or the past, depending mainly on the neural network model used. If the model predicts future frames, implementing forward prediction, a video representing the future is generated; if the model predicts historical frames, implementing backward prediction, a video representing the past is generated.
- For example, a photo of a flower bud can be used as the specified image. The specified image is copied to obtain the first preset number of frames of the specified image, thereby generating an image set, and the image set is then input to a neural network model capable of predicting future frames, which outputs a time-lapse photography video in which the flower bud gradually opens until it is in full bloom.
- Alternatively, a photo of the flower in full bloom may be used as the specified image. The specified image is copied to obtain the first preset number of frames of the specified image, thereby generating an image set, and the image set is then input to a neural network model capable of predicting historical frames. This model predicts the frames preceding full bloom and can therefore output a time-lapse photography video of the flower going from the bud state to full bloom.
- The embodiment of the present application provides a method for generating a time-lapse photography video, which uses a pre-trained neural network model to perform content modeling and motion state modeling on an image set including a first preset number of frames of the specified image, thereby generating the time-lapse photography video.
- The method continuously refines the time-lapse photography video through multi-stage generative adversarial networks and ensures that reasonable future frames or historical frames are predicted by modeling both the content and the motion state, so that the time-lapse photography video is generated gradually from coarse to fine.
- On the one hand, the method preserves the realism of the content and the plausibility of the motion information, so that the generated time-lapse photography video is more realistic and more natural. On the other hand, the model used in the method is a cascaded dual-network structure, which is easy to implement and simplify and can be applied in cloud or offline scenarios.
- Step S13 uses a neural network model that has been pre-trained for generating time-lapse photography videos. It can be understood that this neural network model needs to be trained in advance. The training process is described below.
- FIG. 2 is a flowchart of a training method of a neural network model according to an embodiment of the present application. As shown in Figure 2, the method includes:
- Step S21: acquiring a training sample.
- The training sample includes a training video and its corresponding image set, the image set including a first preset number of frames of the first frame image or the tail frame image of the training video. It should be noted that a neural network model is usually trained with batches of training samples, and the images included in the image sets of a batch are either all first frame images of the training videos or all tail frame images of the training videos.
- The training video is a time-lapse photography video. Specifically, each time-lapse photography video acquired in advance is preprocessed to generate a plurality of qualified, independent, and non-overlapping training videos.
- A large number of time-lapse photography videos can be crawled from the Internet in advance by setting keywords. The crawled videos are generally large and can be segmented into small video clips.
- During this process, unsuitable training data is removed, such as small video clips in which the picture is still, the black borders are very large, the picture is very dark, or the picture zooms in and out rapidly.
- The remaining video clips are then cut into qualified, independent, and non-overlapping training videos, one training video for every first preset number of frames. For example, if a video clip includes 128 frames and the first preset number is 32, the clip yields 4 training videos of 32 frames each.
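- The segmentation in this example can be sketched as follows. The function name is hypothetical, and discarding leftover frames that do not fill a complete training video is an assumption; the text does not say how a remainder is handled.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")

def split_into_training_videos(clip: Sequence[Frame],
                               frames_per_video: int = 32) -> List[List[Frame]]:
    """Cut a clip into consecutive, non-overlapping chunks of equal length."""
    if frames_per_video <= 0:
        raise ValueError("frames_per_video must be positive")
    # Assumption: frames that do not fill a whole training video are dropped.
    usable = len(clip) - len(clip) % frames_per_video
    return [list(clip[start:start + frames_per_video])
            for start in range(0, usable, frames_per_video)]

clip = list(range(128))  # frame indices stand in for the 128 frames
training_videos = split_into_training_videos(clip)
```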
- Each training video includes the first preset number of frames. The first preset number may be 32, a size that is convenient for training. Of course, the first preset number may be set according to actual needs; the present application does not limit its specific value, and changing the value does not depart from the protection scope of the present application.
- The training sample may be obtained as follows: a training video is acquired first, a first frame image or a tail frame image is extracted from the training video, and the image set corresponding to the training video is generated; the training video and its corresponding image set are then used as a training sample.
- The image set corresponding to the training video may be generated in two ways. One way is to copy the extracted image until the number of images reaches the first preset number, and generate the image set from the first preset number of frames. The other way is to extract the first frame image or the tail frame image repeatedly until the first preset number of frames is obtained, and generate the image set from them.
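- Pairing a training video with its image set can be sketched as below, using the copying approach. The function name, the (frames, height, width, channels) array layout, and the random stand-in video are assumptions made for illustration.

```python
import numpy as np

def make_training_sample(video: np.ndarray, use_first_frame: bool = True):
    """Return (image_set, video), where image_set repeats one frame of the video.

    video: array of shape (frames, height, width, channels).
    """
    frame = video[0] if use_first_frame else video[-1]
    # Copy the extracted frame until the set has as many frames as the video.
    image_set = np.repeat(frame[np.newaxis, ...], video.shape[0], axis=0)
    return image_set, video

# Hypothetical 32-frame training video filled with random pixels.
rng = np.random.default_rng(0)
training_video = rng.integers(0, 256, size=(32, 64, 64, 3), dtype=np.uint8)

first_set, _ = make_training_sample(training_video, use_first_frame=True)
tail_set, _ = make_training_sample(training_video, use_first_frame=False)
```

- Choosing the first frame corresponds to training a model that predicts future frames, and choosing the tail frame to one that predicts historical frames.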
- Step S23: training, according to the training sample, a neural network model that satisfies the training end condition.
- The neural network model includes a basic network for modeling the content of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video. The basic network is a first generative adversarial network that takes an image set including the first preset number of frames of the same image as input and outputs a basic time-lapse photography video; the optimized network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
- The basic network and the optimized network are both generative adversarial networks. The basic network performs content modeling on the image set including the first preset number of frames of the same image, thereby generating a basic time-lapse photography video. On this basis, the optimized network models the motion state of the basic time-lapse photography video and continues to refine it, generating an optimized time-lapse photography video that is more realistic and more natural.
- FIG. 3 is a flowchart of another training method of a neural network model according to an embodiment of the present application. As shown in FIG. 3, the method includes:
- Step S31: training, according to the training sample, a first generative adversarial network that satisfies the training end condition, as the basic network;
- The training sample includes the training video and the image set corresponding to the training video. The basic network takes the image set as input and outputs a basic time-lapse photography video by performing content modeling, with the aim of generating a basic time-lapse photography video close to the training video. The parameters of the first generative adversarial network can therefore be adjusted based on the degree of similarity between the generated video and the training video, and the network is optimized by continuously adjusting the parameters. When the training end condition is met, the first generative adversarial network is used as the basic network.
- The training end condition may be set according to actual requirements, for example, the loss function of the first generative adversarial network being in a converged state, or the loss function of the first generative adversarial network being less than a preset value.
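- The two example end conditions can be checked with a helper such as the one below. This is only a sketch: the function name, the preset value of 0.05, and the definition of convergence as a loss that varies by less than a tolerance over a recent window are assumptions, since the text leaves these choices open.

```python
from typing import Sequence

def training_finished(losses: Sequence[float],
                      preset_value: float = 0.05,
                      window: int = 5,
                      tolerance: float = 1e-3) -> bool:
    """Return True if the latest loss is below the preset value or has converged."""
    if not losses:
        return False
    # Condition 1: the loss is less than a preset value.
    if losses[-1] < preset_value:
        return True
    # Condition 2 (assumed definition): the loss barely changes over a window.
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tolerance
```

- A training loop would call this helper after each parameter update and stop adjusting the network once it returns `True`.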
- the training process of the basic network will be described in detail below, and will not be described here.
- Step S32: inputting the image set corresponding to the training video into the basic network to obtain the basic time-lapse photography video output by the basic network;
- The basic network takes an image set including the first preset number of frames of the same image as input and outputs a basic time-lapse photography video. The image set corresponding to the training video is input into the basic network to obtain the basic time-lapse photography video output by the basic network.
- Step S33: training, according to the basic time-lapse photography video and the training video, a second generative adversarial network that satisfies the training end condition, as the optimized network.
- The optimized network is used to further refine the basic time-lapse photography video and can be obtained by training a generative adversarial network.
- The basic time-lapse photography video and the training video can be used as training samples, with the basic time-lapse photography video as input and the optimized time-lapse photography video as output, the aim being to generate an optimized time-lapse photography video close to the training video.
- The parameters of the second generative adversarial network can be adjusted based on the degree of similarity between the generated optimized time-lapse photography video and the training video, and the network is optimized by continuously adjusting the parameters. When the training end condition is satisfied, the second generative adversarial network is used as the optimized network.
- The training end condition may be set according to actual needs, for example, the loss function of the second generative adversarial network being in a converged state, or the loss function of the second generative adversarial network being less than a preset value.
- The training process of the optimized network will be described in detail below and is not described here. After the basic network and the optimized network are trained, they are cascaded to form the neural network model for generating time-lapse photography videos.
- FIG. 4 is a flowchart of a training method of a basic network according to an embodiment of the present application. As shown in FIG. 4, the method includes:
- Step S41: inputting the image set to the first generator to obtain the basic time-lapse photography video output by the first generator;
- The basic network includes a first generator and a first discriminator. The first generator is configured to generate the basic time-lapse photography video, and the first discriminator is configured to determine whether the basic time-lapse photography video is a real video. If the first discriminator judges it to be a real video, this indicates that the basic time-lapse photography video generated by the first generator is highly realistic and natural.
- The first generator can be composed of an encoder and a decoder. The encoder can include a specified number of convolutional layers, and the decoder can include the same specified number of deconvolution layers, so that the first generator as a whole has a symmetric structure.
- The specified number can be set according to actual needs. As an example, it can be 6.
- Each convolutional layer is connected to its symmetric deconvolution layer by a skip connection, so that the features of the encoder can be better utilized.
- The first discriminator has the same structure as the encoder in the first generator, except that its output layer is a binary classification layer. It should be noted that the number of convolutional layers in the first discriminator can be adjusted according to actual needs, which is not limited in this application.
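- The symmetric layout of the first generator can be sketched with simple bookkeeping. The sketch pairs each convolutional layer with its mirror-image deconvolution layer and traces the spatial size through the layers; the stride of 2 and the input size of 128 are assumptions, since the text fixes only the example layer count of 6.

```python
from typing import List, Tuple

def skip_connections(num_layers: int = 6) -> List[Tuple[int, int]]:
    """Pair encoder layer i with the decoder layer that mirrors it (1-based)."""
    return [(enc, num_layers - enc + 1) for enc in range(1, num_layers + 1)]

def encoder_sizes(input_size: int, num_layers: int = 6, stride: int = 2) -> List[int]:
    """Spatial size after each convolutional layer (assumed stride per layer)."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(sizes[-1] // stride)
    return sizes

def decoder_sizes(bottleneck_size: int, num_layers: int = 6, stride: int = 2) -> List[int]:
    """Spatial size after each deconvolution layer, undoing the encoder."""
    sizes = [bottleneck_size]
    for _ in range(num_layers):
        sizes.append(sizes[-1] * stride)
    return sizes

down = encoder_sizes(128)      # sizes going down through the encoder
up = decoder_sizes(down[-1])   # sizes going back up through the decoder
```

- Under these assumptions the decoder restores the input resolution, and `skip_connections()` lists which encoder and decoder layers face each other across the symmetric structure.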
- FIG. 5 is a structural diagram of a basic network according to an embodiment of the present application.
- The basic network includes a first generator 51 and a first discriminator 52, where x represents a first frame image or a tail frame image, and X represents the image set formed from the first frame image or from the tail frame image.
- Y represents the training video, and Y1 represents the basic time-lapse photography video output by the first generator.
- Step S42: input the basic time-lapse photography video and the training video corresponding to the image set to the first discriminator, and calculate the loss of the first generative adversarial network by using its loss function.
- The discriminator, that is, the first discriminator, is used to distinguish the video generated by the generator from the real video.
- The first discriminator has a structure similar to that of the encoder in the first generator; the main difference is that its output layer is a binary classification layer. The basic time-lapse photography video output by the first generator and the training video are input to the first discriminator.
- The first discriminator calculates the loss of the first generative adversarial network based on the basic time-lapse photography video and the training video.
- The basic network is trained by adjusting the network parameters to reduce the loss of the first generative adversarial network.
- The loss of the first generative adversarial network includes at least an adversarial loss, which can be calculated by a formula whose symbols are defined as follows:
- L_adv denotes the adversarial loss
- E denotes expectation
- D_1 denotes the function corresponding to the first discriminator
- G_1 denotes the function corresponding to the first generator
- X denotes the four-dimensional matrix corresponding to the image set
- Y denotes the four-dimensional matrix corresponding to the training video (the one corresponding to the image set). The four dimensions of each four-dimensional matrix are the length and width of the image, the number of channels of the image (for example, 3 if the image is in the RGB color mode), and the number of frames.
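The formula itself does not appear in this text. A standard generative adversarial objective that is consistent with the symbol definitions above, offered as an assumption rather than as the application's exact expression, would be:

```latex
L_{adv} = \min_{G_1} \max_{D_1} \;
  \mathbb{E}\left[\log D_1(Y)\right]
  + \mathbb{E}\left[\log\left(1 - D_1\left(G_1(X)\right)\right)\right]
```

Here $D_1(\cdot)$ is read as the probability that the first discriminator assigns to its input being a real video, and $G_1(X)$ is the basic time-lapse photography video (Y1 in FIG. 5).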
- A content loss function based on the L1 norm is also defined, with the following symbols:
- L_con(G_1) represents the content loss
- G_1 represents the function corresponding to the first generator
- X represents the four-dimensional matrix corresponding to the image set
- Y represents the four-dimensional matrix corresponding to the training video (the one corresponding to the image set);
- ||·||_1 denotes taking the L1 norm.
- The loss of the first generative adversarial network may be the sum of the adversarial loss and the content loss based on the L1 norm.
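A minimal numeric sketch of the generator-side loss described above follows. It assumes that the content loss is the L1 norm of the difference between the generated video and the training video, that the adversarial term takes the standard log(1 - D_1(G_1(X))) form, and that the two are added without weights. Videos are flattened to plain lists, and all function names are hypothetical.

```python
import math
from typing import List


def l1_content_loss(generated: List[float], target: List[float]) -> float:
    """L1 norm of the element-wise difference between two flattened videos."""
    if len(generated) != len(target):
        raise ValueError("videos must have the same number of elements")
    return sum(abs(g - t) for g, t in zip(generated, target))


def generator_adversarial_loss(d_fake: List[float], eps: float = 1e-12) -> float:
    """Batch mean of log(1 - D_1(G_1(X))); the generator tries to make this small."""
    return sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)


def first_gan_generator_loss(generated: List[float],
                             target: List[float],
                             d_fake: List[float]) -> float:
    """Unweighted sum of the adversarial term and the L1 content term."""
    return generator_adversarial_loss(d_fake) + l1_content_loss(generated, target)
```

For example, `l1_content_loss([1.0, 2.0, 3.0], [1.0, 0.0, 5.0])` evaluates to `4.0`, the sum of the absolute differences 0, 2 and 2.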
- Step S43: update the parameters of the first generator and the first discriminator respectively, based on the loss of the first generative adversarial network.
- The gradient values of each layer are calculated from the loss of the first generative adversarial network, and the parameters (such as weights and biases) of the first generator and the first discriminator are then updated.
- Training of the first generative adversarial network is realized by continuously updating the parameters of the first generator and the first discriminator. When the training end condition is satisfied, for example when the loss of the first generative adversarial network has converged or is less than a preset value,
- the first generative adversarial network can be determined as the basic network.
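The training end condition can be sketched as a small check on the loss history. The text gives only "converged or less than a preset value"; the window size and tolerance used below to decide convergence are assumptions made for illustration.

```python
from typing import Sequence


def training_should_end(losses: Sequence[float],
                        preset_value: float,
                        window: int = 5,
                        tolerance: float = 1e-4) -> bool:
    """Return True when the latest loss is below preset_value or the loss has flattened.

    Convergence is approximated (an assumption) as the last `window` losses
    varying by less than `tolerance`.
    """
    if not losses:
        return False
    if losses[-1] < preset_value:
        return True
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tolerance
```

The same check could be applied to the second generative adversarial network, whose end condition is stated in the same terms.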
- FIG. 6 is a flowchart of a training method of the optimized network according to an embodiment of the present application. As shown in FIG. 6, the method includes:
- Step S61: obtain, according to the basic time-lapse photography video, an optimized time-lapse photography video by using the second generator in the second generative adversarial network.
- The optimized network includes a second generator and a second discriminator. The second generator is configured to model motion information according to the basic time-lapse photography video and obtain the optimized time-lapse photography video, and the second discriminator is configured to determine whether the optimized time-lapse photography video is a real video. If the second discriminator judges it to be a real video, the optimized time-lapse photography video generated by the second generator has high authenticity and looks natural.
- The second generator in the optimized network includes an encoder and a decoder, where the encoder can be composed of M convolution layers and the decoder of M deconvolution layers, so that the encoder-decoder as a whole exhibits a symmetric structure.
- M is a positive integer.
- Selected convolution layers may be connected to their symmetric deconvolution layers by skip connections, so that the features of the encoder can be better utilized. Which convolution layer (or layers) is connected to its symmetric deconvolution layer by a skip connection can be determined from experimental results after a certain amount of experimentation, and is not limited in this application.
- The number of convolution layers and deconvolution layers (that is, M) and the parameter configuration of each layer can be adjusted according to actual needs; for example, M can be equal to 6. This is not limited in this application, provided that the resolution of the input and output images is the same. That is, in the second generator of the optimized network, increasing or decreasing the number of convolution and deconvolution layers does not depart from the protection scope of the present application. By comparison, the second generator of the optimized network has a structure similar to that of the first generator of the basic network: apart from the removal of several skip connections, the rest of the structure is the same.
- The second discriminator of the optimized network has the same structure as the first discriminator of the basic network, and details are not described herein again.
- FIG. 7 is a structural diagram of an optimized network according to an embodiment of the present application.
- The optimized network includes a second generator 71 and a second discriminator 72. Y1' represents the basic time-lapse photography video output by the trained basic network, Y represents the training video, and Y2 represents the optimized time-lapse photography video output by the second generator.
- Step S62: obtain a discrimination result for the optimized time-lapse photography video by using the second discriminator in the second generative adversarial network.
- The second discriminator can judge the authenticity of the optimized time-lapse photography video generated by the second generator according to the optimized time-lapse photography video and the training video, thereby obtaining a discrimination result. If the similarity between the optimized time-lapse photography video and the training video reaches a preset level, the optimized time-lapse photography video is determined to be a real video, that is, it has high authenticity.
- Step S63: generate the loss of the second generative adversarial network according to the optimized time-lapse photography video, the basic time-lapse photography video, the training video, and the discrimination result.
- The optimized network is trained by adjusting its parameters to reduce the loss of the second generative adversarial network.
- The loss includes at least a ranking loss, which is determined based on the respective motion features of the optimized time-lapse photography video, the basic time-lapse photography video, and the training video.
- The loss of the second generative adversarial network may be determined according to its content loss, adversarial loss, and ranking loss. On this basis, in some possible implementations, the loss function of the optimized network is defined with the following symbols:
- L_stage1 represents the loss of the optimized network
- L_adv represents the adversarial loss
- L_con [that is, L_con(G_1)] represents the content loss
- λ represents a preset constant
- L_rank represents the (total) ranking loss
- The second discriminator in the second generative adversarial network may be used to extract features of the optimized time-lapse photography video, the basic time-lapse photography video, and the training video respectively, and a Gram matrix corresponding to each of the three videos is calculated from these features, where a Gram matrix is used to represent the motion state between video frames;
- the ranking loss can then be determined based on the respective Gram matrices of the optimized time-lapse photography video, the basic time-lapse photography video, and the training video.
- The ranking loss function uses the following symbols:
- L_rank(Y_1, Y, Y_2) represents the (total) ranking loss
- L_rank(Y_1, Y, Y_2; l) represents the ranking loss function of a single layer (that is, a single feature layer)
- l represents the sequence number of the feature layer in the second discriminator
- Y_1 represents the four-dimensional matrix corresponding to the basic time-lapse photography video
- Y represents the four-dimensional matrix corresponding to the training video (the one corresponding to the image set)
- Y_2 represents the four-dimensional matrix corresponding to the optimized time-lapse photography video, and Σ denotes summation.
- l indicates which feature layer is specifically selected
- g(Y; l) represents the Gram matrix extracted at layer l.
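The ranking loss formula does not appear in this text, so the sketch below should be read as one plausible instance rather than the application's definition. It assumes a contrastive form, -log(e^(-a) / (e^(-a) + e^(-b))), where a is the L1 distance between the Gram matrices of the optimized video and the training video and b is the L1 distance between those of the optimized video and the basic video. The Gram matrix normalization and all function names are likewise assumptions; each feature map is given as a list of channels, and each channel as a flat list of activations.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Channel-by-channel inner products, divided by the activations per channel."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def l1_distance(a: Matrix, b: Matrix) -> float:
    """Sum of absolute element-wise differences between two matrices."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def _softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def ranking_loss_layer(f_base: Matrix, f_real: Matrix, f_refined: Matrix) -> float:
    """Single-layer ranking loss under the assumed contrastive form."""
    g1, g, g2 = gram_matrix(f_base), gram_matrix(f_real), gram_matrix(f_refined)
    d_real = l1_distance(g2, g)    # optimized video vs. training video
    d_base = l1_distance(g2, g1)   # optimized video vs. basic video
    # -log(e^-d_real / (e^-d_real + e^-d_base)) == log(1 + e^(d_real - d_base))
    return _softplus(d_real - d_base)


def ranking_loss(layers: Sequence[Tuple[Matrix, Matrix, Matrix]]) -> float:
    """Total ranking loss: sum of the single-layer losses over the selected layers."""
    return sum(ranking_loss_layer(fb, fr, f2) for fb, fr, f2 in layers)
```

Under this form the loss is small when the optimized video's Gram matrices are close to those of the training video and far from those of the basic video, which matches the role the text gives the ranking loss.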
- Step S64: optimize the network parameters of the second generative adversarial network according to its loss, until a second generative adversarial network that satisfies the training end condition is obtained as the optimized network.
- The gradient values of each layer are calculated from the loss of the optimized network, and the parameters (such as weights and biases) of the second generator and the second discriminator are updated.
- Training of the second generative adversarial network is realized by continuously updating the parameters of the second generator and the second discriminator. When the training end condition is satisfied, for example when the loss of the second generative adversarial network has converged or is less than the preset value,
- the second generative adversarial network can be determined as the optimized network.
- The first generator and the first discriminator are trained alternately: when the first generator is trained, the first discriminator is fixed; when the first discriminator is trained, the first generator is fixed. Similarly, the second generator and the second discriminator are trained alternately: when the second generator is trained, the second discriminator is fixed and the ranking loss is minimized, so as to ensure that the optimized time-lapse photography video output by the second generator is closer to the real video.
- When the second discriminator is trained, the second generator is fixed and the ranking loss is maximized, which amplifies the difference between the optimized time-lapse photography video output by the second generator and the real video and benefits the subsequent further training of the optimized network.
- The optimized network trained in this embodiment can further optimize the video output by the basic network that has been trained to convergence, mainly by improving the motion information.
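The alternating schedule can be sketched with a toy parameter container. The `Net` class, the gradient callables, the learning rate, and the order of the two steps within a round are all assumptions for illustration; in a real setup the gradients would come from backpropagating the losses described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Grads = Dict[str, float]


@dataclass
class Net:
    """Toy stand-in for a generator or a discriminator."""
    params: Dict[str, float]
    frozen: bool = False

    def apply_gradients(self, grads: Grads, lr: float) -> None:
        if self.frozen:
            return  # a fixed network ignores updates
        for name, g in grads.items():
            self.params[name] -= lr * g


def alternating_round(gen: Net, disc: Net,
                      gen_grads: Callable[[Net, Net], Grads],
                      disc_grads: Callable[[Net, Net], Grads],
                      lr: float = 0.1) -> None:
    """One round: update the discriminator with the generator fixed, then swap."""
    gen.frozen, disc.frozen = True, False
    disc.apply_gradients(disc_grads(gen, disc), lr)
    gen.frozen, disc.frozen = False, True
    gen.apply_gradients(gen_grads(gen, disc), lr)
```

Because the update is a descent step, a discriminator that should maximize a term such as the ranking loss would be given the negated gradient of that term by its `disc_grads` callable.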
- FIG. 8 is a structural diagram of a system for generating a time-lapse video according to an embodiment of the present application. As shown in Figure 8, the system includes:
- A terminal 81 and a server 82, which interact through a network.
- The server 82 is configured to receive a specified image sent by the terminal, generate, according to the specified image, an image set including a first preset number of frames of the specified image, perform content modeling and motion state modeling on the image set by using a pre-trained neural network model to obtain the time-lapse photography video output by the neural network model, and send the time-lapse photography video to the terminal, where the neural network model is trained by the training method of the neural network model described above.
- The operations of the server 82 may also include the steps of the training process, described above, of the neural network model for generating time-lapse photography video.
- The terminal 81 may be a mobile smart device 811 such as a smartphone, or a local computer device 812 such as a computer.
- The user only needs to upload a specified image through the local terminal. Based on the specified image, the remote server outputs the predicted time-lapse photography video through the neural network model for generating time-lapse photography video and sends it to the local terminal, so that the user can easily create a time-lapse photography video, which effectively improves the user experience.
- The technical solution does not require the local terminal to run the neural network model for generating time-lapse photography video, so a time-lapse photography video can be created without occupying the running resources of the local terminal, which effectively saves those resources.
- FIG. 9 is a signaling flowchart of a method for generating a time-lapse video according to an embodiment of the present disclosure. As shown in FIG. 9, the signaling process includes:
- Step S91: the local terminal sends the specified image to the remote server.
- Step S92: the remote server copies the specified image and generates an image set including a first preset number of frames of the specified image.
- Step S93: the remote server inputs the image set to the neural network model for generating time-lapse photography video.
- Step S94: the content of the specified image in the image set is reconstructed by the neural network model, and a time-lapse photography video is output.
- The remote server sends the output time-lapse photography video to the local terminal.
- The method continuously optimizes the time-lapse photography video through a multi-stage generative adversarial network, and predicts reasonable future frames or historical frames by modeling the content and the motion state, so that the time-lapse photography video is generated step by step from coarse to fine.
- The method preserves the authenticity of the content and the plausibility of the motion information, so that the generated time-lapse photography video has high authenticity and looks natural.
- The technical solution does not require the local terminal to run the neural network model for generating time-lapse photography video, so a time-lapse photography video can be created without occupying the running resources of the local terminal, which effectively saves those resources.
- Because the training process of the neural network model for generating time-lapse photography video requires substantial system resources, the training process is preferably performed at the remote server end.
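The server side of this signaling flow can be sketched as a single handler. The names `make_image_set` and `handle_request` and the default of 32 frames are hypothetical; `model` stands for whatever trained neural network model the server holds, passed in as a callable.

```python
from typing import Callable, List

Image = List[float]
Video = List[Image]


def make_image_set(image: Image, num_frames: int) -> Video:
    """Step S92: copy the specified image into an image set of num_frames frames."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return [list(image) for _ in range(num_frames)]


def handle_request(image: Image,
                   model: Callable[[Video], Video],
                   num_frames: int = 32) -> Video:
    """Steps S92 to S94 on the server; the return value is what is sent to the terminal."""
    image_set = make_image_set(image, num_frames)
    return model(image_set)
```

Each frame is an independent copy, so a model that modifies frames in place cannot accidentally alter the image received from the terminal.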
- the present application discloses a device for generating a time-lapse video.
- FIG. 10 is a structural diagram of a device for generating a time-lapse video according to an embodiment of the present disclosure.
- the device can be applied to a local terminal, or a remote server end in a time-lapse photography video generation system.
- The device 1000 includes:
- an obtaining module 1010, configured to acquire a specified image;
- a first generating module 1020, configured to generate, according to the specified image, an image set including a first preset number of frames of the specified image;
- a second generating module 1030, configured to perform content modeling and motion state modeling on the image set by using a pre-trained neural network model, to obtain the time-lapse photography video output by the neural network model.
- The neural network model is trained by the training method of the neural network model described above.
- When the electronic device is a terminal device and the neural network model is deployed in the terminal device, the obtaining module 1010 is specifically configured to:
- acquire, in response to a photographing instruction, the captured photograph as the specified image.
- When the electronic device is a server,
- the obtaining module 1010 is specifically configured to:
- acquire the specified image from a time-lapse photography generation request.
- The device for generating time-lapse photography video provided by the embodiments of the present application first acquires a specified image, generates from it a specified image set including a first preset number of specified images, and then uses the pre-trained neural network model to perform content modeling and motion state modeling on the specified image set, obtaining the time-lapse photography video output by the neural network model.
- The device continuously optimizes the time-lapse photography video through a multi-stage generative adversarial network, predicts reasonable future frames by modeling the content and the motion state, and generates the time-lapse photography video step by step from coarse to fine.
- On the one hand, the device preserves the authenticity of the content and the plausibility of the motion information, so that the generated time-lapse photography video has high authenticity and looks natural; on the other hand, the model used by the device is a cascaded dual network structure, which is easy to implement and simplify and can be applied to cloud or offline scenarios.
- FIG. 11 is a structural diagram of a training apparatus for a neural network model according to an embodiment of the present application. As shown in FIG. 11, the device 1100 includes:
- an obtaining module 1110, configured to acquire a training sample, where the training sample includes a training video and its corresponding image set, and the image set includes a first preset number of frames of the first frame image or the tail frame image of the training video;
- a training module 1120, configured to train, according to the training sample, a neural network model that satisfies a training end condition, where the neural network model includes a basic network for content modeling of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video. The basic network is a first generative adversarial network that takes as input an image set including a first preset number of frames of the same image and outputs a basic time-lapse photography video; the optimized network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
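The training sample handled by the obtaining module 1110 can be sketched as a pair of an image set and its training video. The function name and the `use_tail` flag are hypothetical; the text says only that the image set holds a first preset number of frames of the first frame image or of the tail frame image.

```python
from typing import List, Tuple

Image = List[float]
Video = List[Image]


def make_training_sample(video: Video, num_frames: int,
                         use_tail: bool = False) -> Tuple[Video, Video]:
    """Pair a training video with an image set built from its first or tail frame."""
    if not video:
        raise ValueError("training video must contain at least one frame")
    key_frame = video[-1] if use_tail else video[0]
    image_set = [list(key_frame) for _ in range(num_frames)]
    return image_set, video
```

Building the image set from the first frame corresponds to predicting future frames, while building it from the tail frame corresponds to predicting historical frames.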
- FIG. 12 is a structural diagram of another training apparatus for a neural network model according to an embodiment of the present application.
- The device 1100 includes the modules described for FIG. 11 and its corresponding embodiments, and the training module 1120 specifically includes:
- a first training sub-module 1121, configured to train, according to the training sample, a first generative adversarial network that satisfies the training end condition as the basic network;
- an obtaining sub-module 1122, configured to input the image set corresponding to the training video to the basic network and obtain the basic time-lapse photography video output by the basic network;
- a second training sub-module 1123, configured to train, according to the basic time-lapse photography video and the training video, a second generative adversarial network that satisfies the training end condition as the optimized network.
- The second training sub-module 1123 is specifically configured so that the loss of the second generative adversarial network is determined according to the motion features respectively corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video.
- The device further includes a determining module, configured to determine the loss of the second generative adversarial network based on the Gram matrices respectively corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video, where a Gram matrix is used to represent the motion state between video frames.
- The obtaining module 1110 is specifically configured to use the training video and its corresponding image set as a training sample.
- The present application provides a method for generating a neural network model based on a dual network structure, where the dual network structure specifically includes a basic network for content modeling of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video.
- The basic network is a first generative adversarial network that takes as input a video including a first preset number of frames of a specified frame image and outputs the basic time-lapse photography video. The optimized network is a second generative adversarial network that takes the output of the basic network as input, models the motion state of the time-lapse photography video, and outputs the optimized time-lapse photography video. After multiple training videos are acquired, an image set corresponding to each training video is generated from it, where the image set includes a first preset number of frames of the first frame image or the tail frame image of the training video.
- The neural network model composed of the basic network and the optimized network is trained with the training videos and their corresponding image sets, and can be used to generate time-lapse photography video once the training end condition is met.
- The neural network model trained by the device continuously optimizes the time-lapse photography video through a multi-stage generative adversarial network, and ensures the prediction of reasonable future frames or historical frames by modeling the content and the motion state, so that the time-lapse photography video is generated step by step from coarse to fine.
- On the one hand, this preserves the authenticity of the content and the plausibility of the motion information, so that the generated time-lapse photography video has high authenticity and looks natural; on the other hand, the neural network model trained by the device is a cascaded dual network structure, which is easy to implement and simplify and can be applied to cloud or offline scenarios.
- Corresponding to the method for generating a time-lapse photography video provided by the embodiments of the present application, the present application discloses an electronic device, which may be a local terminal (such as a local computer or a mobile terminal) or a remote server.
- FIG. 13 is a hardware structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 13, the electronic device includes:
- a processor 1, a communication interface 2, a memory 3, and a communication bus 4;
- the processor 1, the communication interface 2, and the memory 3 communicate with each other through the communication bus 4;
- the processor 1 is configured to call and execute a program stored in the memory;
- the memory 3 is configured to store the program.
- The program may include program code, and the program code includes computer operation instructions. In the embodiments of the present application, the program may include both, or either one, of the following: a program corresponding to the training method of the neural network model for generating time-lapse photography video, and a program corresponding to the method for generating time-lapse photography video.
- The processor 1 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
- the memory 3 may include a high speed RAM memory and may also include a non-volatile memory such as at least one disk memory.
- the program may be used to perform the steps of any implementation manner of the method for generating a time-lapse video provided by the embodiment of the present application.
- The embodiments of the present application further provide a storage medium that stores a computer program. When the computer program is executed by a processor, the steps of the training method of the neural network model described in the foregoing embodiments, and/or the steps of the method for generating a time-lapse photography video, are performed.
- In the first way, the user produces the video on the local terminal. The operations performed by the local terminal include:
- acquiring a specified image, which may be a picture of the sky photographed by the user on the spot or an existing picture of the sky selected by the user; copying the specified image to generate an image set including a first preset number of frames of the specified image; inputting the image set to the neural network model for generating time-lapse photography video; and performing content modeling and motion state modeling by the neural network model, reconstructing the content of the specified image, and outputting the optimized time-lapse photography video.
- In this way, the neural network model for generating time-lapse photography video is preset in the local terminal, that is, the local terminal can generate a time-lapse photography video independently.
- In the second way, the user operates on the local terminal and obtains the time-lapse photography video with the help of the remote server. The specific process is as follows:
- The local terminal sends the specified image to the remote server.
- The specified image may be a picture of the sky taken by the user on the spot, or an existing picture of the sky selected by the user.
- The user only needs to send the picture of the sky to the remote server through the local terminal. The remote server is preset with the neural network model for generating time-lapse photography video; it generates the time-lapse photography video predicted from the picture of the sky
- and then sends the video to the user's local terminal.
- The present application provides a training method of a neural network model, and a method and a device for generating a time-lapse photography video.
- The technical solution provided by the present application generates a time-lapse photography video based on a neural network model with a dual network structure, where the dual network structure specifically includes a basic network for content modeling of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video.
- The basic network is a first generative adversarial network that takes as input a video including a first preset number of frames of a specified frame image and outputs the basic time-lapse photography video. The optimized network is a second generative adversarial network that takes the output of the basic network as input, models the motion state of the time-lapse photography video, and outputs the optimized time-lapse photography video.
- After multiple training videos are acquired, an image set corresponding to each training video is generated from it, where the image set includes a first preset number of frames of the first frame image or the tail frame image of the training video. The neural network model composed of the basic network and the optimized network is trained with the training videos and their corresponding image sets,
- and can be used to generate time-lapse photography video once the training end condition is met.
- The technical solution continuously optimizes the time-lapse photography video through a multi-stage generative adversarial network, and ensures the prediction of reasonable future frames by modeling the content and the motion state, so that the time-lapse photography video is generated step by step from coarse to fine.
- On the one hand, the authenticity of the content and the plausibility of the motion information are preserved, so that the generated time-lapse photography video has high authenticity and looks natural;
- on the other hand, the model used is a cascaded dual network structure, which is easy to implement and simplify and can be applied to cloud or offline scenarios.
- the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, a software module executed by a processor, or a combination of both.
- the software modules can be located in random access memory (RAM), memory, read only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, or any other form of storage medium known in the art.
Abstract
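A training method for a neural network model, and a method and device for generating a time-lapse photography video. The method for generating a time-lapse photography video includes: acquiring a specified image; generating, according to the specified image, an image set including a first preset number of frames of the specified image; and performing, according to the image set, content modeling and motion state modeling on the image set by means of a pre-trained neural network model, to obtain the time-lapse photography video output by the neural network model. The neural network model includes a basic network for content modeling of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video, and is obtained by acquiring training samples and training on them, where the training samples include training videos and their corresponding image sets. The time-lapse photography video is continuously optimized through a multi-stage generative adversarial network, and reasonable prediction is ensured by modeling the content and the motion state, so that the time-lapse photography video is generated step by step from coarse to fine.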
Description
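This application claims priority to Chinese Patent Application No. 201810253848.3, filed with the Chinese Patent Office on March 26, 2018 and entitled "Training of Neural Network Model, and Method and Device for Generating Time-Lapse Photography Video", which is incorporated herein by reference in its entirety.
This application relates to the field of artificial intelligence technologies, and in particular to a training method for a neural network model, a method for generating a time-lapse photography video, and a corresponding apparatus, system, device, storage medium, and computer program product.
Time-lapse photography, also called time-lapse video recording, is a shooting technique that compresses time. A set of photos is taken and later strung together, so that a process lasting minutes, hours, or even days or years is compressed into a short period and played back as a video. At present, methods for generating time-lapse photography video are still at the academic research stage and are mainly implemented by neural network models. However, the videos generated by such methods are blurry and lack realism, which makes it difficult to meet user needs, so these methods have not been widely applied.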
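Summary
In view of this, the embodiments of the present application provide a training method for a neural network model and a method for generating a time-lapse photography video, which can generate clear, smooth, and highly realistic time-lapse photography videos, meet user needs, and have broad application prospects. The present application further provides a corresponding apparatus, system, device, storage medium, and computer program product.
To achieve the foregoing objective, the embodiments of the present application provide the following technical solutions: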
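A training method for a neural network model, applied to a server, including:
acquiring a training sample, where the training sample includes a training video and an image set corresponding to the training video, and the image set includes a first preset number of frames of the first frame image or the tail frame image of the training video;
training, according to the training sample, to obtain a neural network model that satisfies a training end condition, where the neural network model includes a basic network for content modeling of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video;
where the basic network is a first generative adversarial network that takes as input an image set including a first preset number of frames of the same image and outputs a basic time-lapse photography video; and
the optimized network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.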
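A method for generating a time-lapse photography video, applied to an electronic device, including:
acquiring a specified image;
generating, according to the specified image, an image set including a first preset number of frames of the specified image; and
performing, according to the image set, content modeling and motion state modeling on the image set by means of a pre-trained neural network model, to obtain the time-lapse photography video output by the neural network model, where the neural network model is trained by the training method of the neural network model described in the present application.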
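A training apparatus for a neural network model, including:
an obtaining module, configured to acquire a training sample, where the training sample includes a training video and an image set corresponding to the training video, and the image set includes a first preset number of copies of the first frame image or the tail frame image of the training video; and
a training module, configured to train, according to the training sample, to obtain a neural network model that satisfies a training end condition, where the neural network model includes a basic network for content modeling of the time-lapse photography video and an optimized network for modeling the motion state of the time-lapse photography video; the basic network is a first generative adversarial network that takes as input an image set including a first preset number of frames of the same image and outputs a basic time-lapse photography video; and the optimized network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.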
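A time-lapse photography video generating apparatus, including:
an obtaining module, configured to obtain a specified image;
a first generating module, configured to generate, according to the specified image, an image set that includes a first preset number of copies of the specified image; and
a second generating module, configured to perform, according to the image set, content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model, where the neural network model is trained through the neural network model training method described in this application.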
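A time-lapse photography video generating system, including:
a terminal and a server that interact through a network;
where the server is configured to receive a specified image sent by the terminal, generate, according to the specified image, an image set that includes a first preset number of copies of the specified image, perform content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model, and send the time-lapse photography video to the terminal, where the neural network model is trained through the neural network model training method described in this application.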
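An electronic device, including:
a processor and a memory, where
the memory is configured to store a computer program; and
the processor is configured to invoke and execute the computer program in the memory to implement the neural network model training method described in this application, or to implement the time-lapse photography video generating method described in this application.
A storage medium storing a computer program that, when executed by a processor, implements the steps of the foregoing training method for the neural network model used to generate time-lapse photography videos, and/or the steps of the foregoing time-lapse photography video generating method.
A computer program product including instructions that, when run on a computer, cause the computer to perform the neural network model training method described in this application, or the time-lapse photography video generating method described in this application.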
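As can be seen from the foregoing technical solutions, compared with the prior art, this application provides a method for generating a time-lapse photography video with a model based on a dual-network structure. The dual-network structure includes a basic network for modeling the content of the time-lapse photography video and an optimization network for modeling its motion state. The basic network is a first generative adversarial network that takes as input a video consisting of a first preset number of frames of a specified frame image and outputs a basic time-lapse photography video. The optimization network is a second generative adversarial network that takes the output of the basic network as input, models the motion state of the time-lapse photography video, and outputs an optimized time-lapse photography video. After multiple training videos are obtained, an image set corresponding to each training video is generated from it; the image set includes a first preset number of copies of the first-frame image or the last-frame image of the training video. The neural network model composed of the basic network and the optimization network is trained with the training videos and their corresponding image sets, and once the training end condition is satisfied, the model can be used to generate time-lapse photography videos.
Specifically, a specified image is obtained, a specified image set including a first preset number of copies of the specified image is generated from it, and the pre-trained neural network model then performs content modeling and motion state modeling on the specified image set to obtain the time-lapse photography video output by the model. The method continually optimizes the time-lapse photography video through multi-stage generative adversarial networks, and uses content modeling and motion state modeling to ensure that reasonable future frames or historical frames are predicted, so that the video is generated progressively from coarse to fine. On the one hand, the method preserves the realism of the content and the plausibility of the motion information, so the generated video is highly realistic and looks natural. On the other hand, the model is a cascaded dual-network structure that is easy to implement and simplify, and it can be applied in cloud or offline scenarios.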
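To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings needed for describing them are briefly introduced below. The drawings described below are merely embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a time-lapse photography video generating method according to an embodiment of this application;
FIG. 2 is a flowchart of a neural network model training method according to an embodiment of this application;
FIG. 3 is a flowchart of another neural network model training method according to an embodiment of this application;
FIG. 4 is a flowchart of a basic network training method according to an embodiment of this application;
FIG. 5 is a structural diagram of a basic network according to an embodiment of this application;
FIG. 6 is a flowchart of an optimization network training method according to an embodiment of this application;
FIG. 7 is a structural diagram of an optimization network according to an embodiment of this application;
FIG. 8 is a structural diagram of a time-lapse photography video generating system according to an embodiment of this application;
FIG. 9 is a signaling flowchart of a time-lapse photography video generating method according to an embodiment of this application;
FIG. 10 is a structural diagram of a time-lapse photography video generating apparatus according to an embodiment of this application;
FIG. 11 is a structural diagram of a neural network model training apparatus according to an embodiment of this application;
FIG. 12 is a structural diagram of another neural network model training apparatus according to an embodiment of this application;
FIG. 13 is a hardware structural diagram of an electronic device according to an embodiment of this application.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
To make the foregoing objectives, features, and advantages of this application easier to understand, this application is described in further detail below with reference to the accompanying drawings and specific implementations.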
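Embodiments
An embodiment of this application provides a time-lapse photography video generating method that can be applied to an electronic device. The electronic device may be a local terminal, a cloud server, or a time-lapse photography video generating system composed of a terminal and a server. Referring to FIG. 1, FIG. 1 is a flowchart of a time-lapse photography video generating method according to an embodiment of this application. As shown in FIG. 1, the method includes:
Step S11: obtain a specified image.
When the method is implemented by a terminal, the specified image can be obtained in two ways. In the first, a photo is selected from an album: in response to a selection instruction, the terminal obtains the selected photo in the album as the specified image. In the second, an image is shot in real time: in response to a shooting instruction, the terminal obtains the shot photo as the specified image. When the method is implemented by a server, the server receives a time-lapse photography generation request sent by the terminal. The request carries the specified image, and the server obtains the specified image from the request.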
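Step S12: generate, according to the specified image, an image set that includes a first preset number of frames of the specified image.
This embodiment provides two ways to generate the image set. In the first, the specified image is copied until the number of copies reaches the first preset number, and the image set is generated from these frames. In the second, the specified image is obtained repeatedly from the data source, for example by fetching the same image from the album multiple times, until the number of copies reaches the first preset number, and the image set is generated from these frames.
Optionally, the first preset number may be 32, that is, the image set includes 32 copies of the specified image. This application does not limit the specific value of the first preset number, which may be adjusted as needed; changing the value does not depart from the protection scope of this application.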
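Step S13: perform, according to the image set, content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model.
The neural network model includes a basic network and an optimization network. The basic network models the content of the time-lapse photography video. It is a generative adversarial network that takes an image set including a first preset number of frames of the specified image as input and outputs a basic time-lapse photography video; for ease of description, it is referred to as the first generative adversarial network. The optimization network models the motion state of the time-lapse photography video. It is a generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video; for ease of description, it is referred to as the second generative adversarial network.
A generative adversarial network is a network based on the idea of a two-player zero-sum game. It includes a generative model (also called a generator) and a discriminative model (also called a discriminator). The generative model captures the distribution of the sample data and uses noise that follows some distribution (such as a uniform or Gaussian distribution) to generate sample data resembling the real training data. The discriminative model is a binary classifier that estimates the probability that a sample comes from the real training data rather than from the generated data: if the sample comes from the real training data, it outputs a high probability; otherwise, it outputs a low probability. The goal of the generative model is to generate sample data that matches the real training data so closely that the discriminative model cannot tell them apart, and the goal of the discriminative model is to detect the sample data produced by the generative model.
In this embodiment, the neural network model is deployed on the electronic device, and the specified image set is input to the model. The basic network of the model performs content modeling and generates a basic time-lapse photography video. The basic time-lapse photography video is then input to the optimization network, which models the motion state and outputs an optimized time-lapse photography video. The optimized time-lapse photography video is the final output.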
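Steps S12 and S13 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of this application: frames are plain Python lists, and `base_network`, `refine_network`, `toy_base`, and `toy_refine` are hypothetical names standing in for the trained basic network and optimization network.

```python
from typing import Callable, List

Frame = List[float]   # hypothetical: one frame flattened to a list of pixel values
Video = List[Frame]   # hypothetical: a video as a list of frames


def build_image_set(image: Frame, first_preset_number: int = 32) -> Video:
    """Step S12: replicate the specified image until the preset number is reached."""
    if first_preset_number <= 0:
        raise ValueError("first_preset_number must be positive")
    # Copy the pixels so that the frames do not share mutable storage.
    return [list(image) for _ in range(first_preset_number)]


def generate_time_lapse(image_set: Video,
                        base_network: Callable[[Video], Video],
                        refine_network: Callable[[Video], Video]) -> Video:
    """Step S13: cascade the two networks."""
    base_video = base_network(image_set)   # content modeling (coarse result)
    return refine_network(base_video)      # motion state modeling (refined result)


# Toy stand-ins for the trained networks (assumptions, for illustration only).
def toy_base(video: Video) -> Video:
    # Brighten each frame a little more than the previous one.
    return [[p + t for p in frame] for t, frame in enumerate(video)]


def toy_refine(video: Video) -> Video:
    # Leave the frames unchanged.
    return [list(frame) for frame in video]


image_set = build_image_set([1.0, 2.0], first_preset_number=4)
video = generate_time_lapse(image_set, toy_base, toy_refine)
```

The only point of the sketch is the data flow: one image becomes a constant "video", and the second network consumes exactly what the first one produces.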
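It should be noted that the video generated by the method of this embodiment may represent the future or the past, depending mainly on the neural network model used. If the model predicts future frames (forward prediction), the generated video represents the future; if the model predicts historical frames (backward prediction), the generated video represents the past.
For ease of understanding, a simple example follows. If a user wants to generate a time-lapse photography video of a flower blooming, a photo of the flower as a bud can be used as the specified image. The specified image is copied to obtain a first preset number of frames, from which the image set is generated. The image set is then input to a neural network model that predicts future frames, and the model outputs a time-lapse video of the flower opening gradually from bud to full bloom.
In some possible implementations, if a user wants to generate a time-lapse photography video of a flower blooming, a photo of the flower in full bloom can also be used as the specified image. The specified image is copied to obtain a first preset number of frames, from which the image set is generated. The image set is then input to a neural network model that predicts historical frames. Because the model can predict the frames that precede full bloom, it can likewise output a time-lapse video of the flower opening gradually from bud to full bloom.
The training process of the neural network model is described below and is not detailed here.
The embodiments of this application provide a time-lapse photography video generating method in which a pre-trained neural network model performs content modeling and motion state modeling on an image set including a first preset number of frames of a specified image to generate the time-lapse photography video. The method continually optimizes the time-lapse photography video through multi-stage generative adversarial networks, and uses content modeling and motion state modeling to ensure that reasonable future frames or historical frames are predicted, so that the video is generated progressively from coarse to fine. On the one hand, the method preserves the realism of the content and the plausibility of the motion information, so the generated video is highly realistic and looks natural. On the other hand, the model is a cascaded dual-network structure that is easy to implement and simplify, and it can be applied in cloud or offline scenarios.
Specifically, step S13 in the foregoing embodiment uses a neural network model that has already been trained to generate time-lapse photography videos. Such a model must be trained in advance, and its training process is described below.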
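Referring to FIG. 2, FIG. 2 is a flowchart of a neural network model training method according to an embodiment of this application. As shown in FIG. 2, the method includes:
Step S21: obtain training samples.
Each training sample includes a training video and an image set corresponding to the training video, and the image set includes a first preset number of frames of the first-frame image or the last-frame image of the training video. It should be noted that the neural network model is usually trained on batches of training samples, and within a batch the image sets are either all built from the first-frame images of the training videos or all built from the last-frame images.
The training videos are time-lapse photography videos. Specifically, time-lapse photography videos obtained in advance are preprocessed to produce multiple qualified training videos that are independent and non-overlapping.
Optionally, a large number of time-lapse photography videos can be crawled from the Internet in advance by setting keywords. The crawled videos are usually long and can be split into short video clips. During this process, unsuitable training data is removed, such as clips in which the picture is static, the black borders are large, the picture is very dark, or the picture zooms in and out quickly. After these clips are removed, every first preset number of frames of the remaining clips forms one training video, which yields qualified, independent, and non-overlapping training videos. For example, if a video clip includes 128 frames and the first preset number is 32, the clip yields 4 training videos of 32 frames each. Each training video includes a first preset number of frames. The first preset number may be 32, a size that is convenient for training; it can also be set as needed, and this application does not limit its specific value, so changing the value does not depart from the protection scope of this application.
In this embodiment, the training samples can be obtained as follows. A training video is obtained first, the first-frame image or the last-frame image is then extracted from it, an image set corresponding to the training video is generated, and the training video together with its image set is used as a training sample. After the first-frame image or the last-frame image is extracted, the image set can be generated in two ways. In the first, the extracted image is copied until the number of images reaches the first preset number, and the image set is generated from these frames. In the second, the first-frame image (or the last-frame image) is extracted multiple times to obtain a first preset number of frames, from which the image set is generated.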
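Step S22: train, according to the training samples, a neural network model that satisfies a training end condition.
The neural network model includes a basic network for modeling the content of the time-lapse photography video and an optimization network for modeling the motion state of the time-lapse photography video. The basic network is a first generative adversarial network that takes an image set including a first preset number of frames of the same image as input and outputs a basic time-lapse photography video. The optimization network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
In this embodiment, both the basic network and the optimization network are generative adversarial networks. The basic network performs content modeling on an image set including a first preset number of frames of the same image and generates a basic time-lapse photography video. On this basis, the optimization network performs motion state modeling on the basic time-lapse photography video for further optimization, generating an optimized time-lapse photography video that is more realistic and more natural.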
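The preparation of training samples described for step S21 can be sketched as follows. This is a hedged illustration rather than the application's own code: the function names are hypothetical, integers stand in for real frames, and dropping trailing frames that do not fill a whole training video is an assumption the text does not state.

```python
from typing import List, Sequence, Tuple


def split_into_training_videos(clip: Sequence, first_preset_number: int = 32) -> List[list]:
    """Cut a clip into consecutive, non-overlapping training videos."""
    n = first_preset_number
    # Assumption: trailing frames that do not fill a whole video are dropped.
    usable = len(clip) - len(clip) % n
    return [list(clip[i:i + n]) for i in range(0, usable, n)]


def make_training_sample(training_video: Sequence,
                         use_first_frame: bool = True) -> Tuple[list, list]:
    """Pair a training video with an image set built from its first or last frame."""
    frame = training_video[0] if use_first_frame else training_video[-1]
    # The chosen frame is repeated once per frame of the training video.
    image_set = [frame] * len(training_video)
    return image_set, list(training_video)


clip = list(range(128))                       # frame indices stand in for real frames
training_videos = split_into_training_videos(clip)
image_set, target = make_training_sample(training_videos[1])
```

With the 128-frame clip from the example in the text, the split produces 4 training videos of 32 frames, and each image set has the same length as its target video.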
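The process in step S22 of training a neural network model that satisfies the training end condition according to the training samples is described in detail below. Referring to FIG. 3, FIG. 3 is a flowchart of another neural network model training method according to an embodiment of this application. As shown in FIG. 3, the method includes:
Step S31: train, according to the training samples, a first generative adversarial network that satisfies the training end condition, and use it as the basic network.
The training samples include the training videos and their corresponding image sets. The basic network takes an image set as input and outputs a basic time-lapse photography video through content modeling, with the goal that the generated basic time-lapse photography video be close to the training video. The parameters of the first generative adversarial network can therefore be adjusted based on the similarity between the generated video and the training video. Through repeated parameter adjustment, the first generative adversarial network is optimized, and when the training end condition is satisfied, the first generative adversarial network is used as the basic network.
The training end condition can be set as needed. For example, it may be that the loss function of the first generative adversarial network has converged, or that the loss function of the first generative adversarial network is smaller than a preset value. The training process of the basic network is described in detail below.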
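Step S32: obtain, according to the image set corresponding to the training video, the basic time-lapse photography video output by the basic network.
The basic network takes an image set including a first preset number of frames of the same image as input and outputs a basic time-lapse photography video. Inputting the image set corresponding to the training video into the basic network yields the basic time-lapse photography video output by the basic network.
Step S33: train, according to the basic time-lapse photography video and the training video, a second generative adversarial network that satisfies the training end condition, and use it as the optimization network.
The optimization network further optimizes the basic time-lapse photography video and can be obtained by training a generative adversarial network. In a specific implementation, the basic time-lapse photography video and the training video are used as training samples, the basic time-lapse photography video is the input, the optimized time-lapse photography video is the output, and the goal is that the generated optimized time-lapse photography video be close to the training video. The parameters of the second generative adversarial network can therefore be adjusted based on the similarity between the generated optimized time-lapse photography video and the training video. Through repeated parameter adjustment, the second generative adversarial network is optimized, and when the training end condition is satisfied, the second generative adversarial network is used as the optimization network.
The training end condition can be set as needed. For example, it may be that the loss function of the second generative adversarial network has converged, or that the loss function of the second generative adversarial network is smaller than a preset value. The training process of the optimization network is described in detail below. After the basic network and the optimization network are trained, cascading them gives the neural network model that generates time-lapse photography videos.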
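The two example training end conditions (loss below a preset value, or loss converged) can be sketched as a small check. The function name, the default threshold, and the way convergence is detected (the spread of the most recent losses) are assumptions made for illustration; the text does not specify them.

```python
from typing import Sequence


def training_finished(loss_history: Sequence[float],
                      preset_value: float = 0.05,
                      window: int = 5,
                      tolerance: float = 1e-3) -> bool:
    """Return True when either example end condition holds."""
    if not loss_history:
        return False
    # Condition 1: the latest loss is smaller than the preset value.
    if loss_history[-1] < preset_value:
        return True
    # Condition 2 (assumed test for convergence): the last `window` losses
    # differ from each other by less than `tolerance`.
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) < tolerance
```

The same check could be applied first to the first generative adversarial network and then to the second, since both stages use the same kind of end condition.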
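The training process of the basic network in step S31 is described in detail below. Referring to FIG. 4, FIG. 4 is a flowchart of a basic network training method according to an embodiment of this application. As shown in FIG. 4, the method includes:
Step S41: input the image set to the first generator to obtain the basic time-lapse photography video output by the first generator.
In this embodiment, the basic network includes a first generator and a first discriminator. The first generator generates the basic time-lapse photography video, and the first discriminator judges whether the basic time-lapse photography video is a real video. If the first discriminator judges it to be a real video, the basic time-lapse photography video generated by the first generator is highly realistic and looks natural.
The first generator may consist of an encoder and a decoder. In a specific implementation, the encoder includes a specified number of convolutional layers and the decoder may include the same specified number of deconvolutional layers, so that the encoder-decoder as a whole has a symmetric structure. The specified number can be set as needed; as an example, it may be 6. Each convolutional layer is connected to its symmetric deconvolutional layer through a skip connection, which makes better use of the encoder features. The first generator of the basic network outputs video frames with the same resolution as the original input image.
The first discriminator judges both the video generated by the first generator (the predicted video) and the training video described above (the real video), to ensure that the first generator generates videos that are closer to real ones. Apart from its output layer, which is a binary classification layer, the discriminator has the same structure as the encoder in the first generator. The number of convolutional layers in the first discriminator can be adjusted as needed, and this application does not limit it.
Referring to FIG. 5, FIG. 5 is a structural diagram of a basic network according to an embodiment of this application. As shown in FIG. 5, the basic network includes a first generator 51 and a first discriminator 52, where x denotes the first-frame image or the last-frame image, X denotes the image set formed from the first-frame image or from the last-frame image, Y denotes the training video, and Y1 denotes the basic time-lapse photography video output by the first generator.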
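The symmetric encoder-decoder layout of the first generator can be sketched as a shape plan. The text only states the number of layers (for example 6), the skip connections between symmetric layers, and the unchanged output resolution; the stride-2 halving per convolution and the pairing of layers by equal output size are assumptions added here, and `encoder_decoder_plan` is a hypothetical helper.

```python
from typing import List, Tuple


def encoder_decoder_plan(input_size: int,
                         num_layers: int = 6) -> Tuple[List[int], List[int], List[Tuple[int, int]]]:
    """Return encoder sizes, decoder sizes, and (conv, deconv) skip pairs."""
    sizes = [input_size]
    for _ in range(num_layers):
        if sizes[-1] % 2:
            raise ValueError("input_size must be divisible by 2 ** num_layers")
        sizes.append(sizes[-1] // 2)      # assumption: each convolution halves the size
    encoder_out = sizes[1:]               # size after convolution 1..num_layers
    decoder_out = sizes[-2::-1]           # size after deconvolution 1..num_layers
    # Assumption: convolution i is linked to the deconvolution whose output has the same size.
    skips = [(i, num_layers - i) for i in range(1, num_layers)]
    return encoder_out, decoder_out, skips


enc, dec, skips = encoder_decoder_plan(128)
```

Under these assumptions a 128-pixel side shrinks to 2 in the encoder and is restored to 128 by the decoder, which matches the statement that the output frames keep the input resolution.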
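Step S42: input the basic time-lapse photography video and the training video corresponding to the image set to the first discriminator, and calculate the loss of the first generative adversarial network through its loss function.
In this embodiment, to ensure that the generator generates highly realistic videos, a discriminator (the first discriminator) judges both the video generated by the generator and the real video. The first discriminator has a structure similar to that of the encoder in the first generator; the main difference is that its output layer is a binary classification layer. The basic time-lapse photography video output by the first generator and the training video are input to the first discriminator, which calculates the first generative adversarial loss from them.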
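In this embodiment, the basic network is trained by adjusting the network parameters to reduce the loss of the first generative adversarial network. This loss includes at least an adversarial loss, which can be calculated by the following formula:
$$L_{adv} = \min_{G_1} \max_{D_1} \; E[\log D_1(Y)] + E[\log(1 - D_1(G_1(X)))] \quad (1)$$
where $L_{adv}$ denotes the adversarial loss, $E$ denotes expectation, $D_1$ denotes the function corresponding to the first discriminator, $G_1$ denotes the function corresponding to the first generator, $X$ denotes the four-dimensional matrix corresponding to the image set, and $Y$ denotes the four-dimensional matrix corresponding to the training video (the one corresponding to the image set). The four dimensions of the matrix are the height of the image, the width of the image, the number of channels of the image (3 if the image is in RGB color mode), and the number of frames.
Here, $\min_{G_1} \max_{D_1}$ means the following: when the adversarial loss of the first generator is calculated, the function $D_1$ of the first discriminator is held constant (a fixed value) and the function $G_1$ of the first generator is chosen to minimize the expression; when the adversarial loss of the first discriminator is calculated, the function $G_1$ of the first generator is held constant (a fixed value) and the function $D_1$ of the first discriminator is chosen to maximize it.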
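To ensure that the video content generated by the first generator is sufficiently realistic, a content loss function based on the L1 norm is also set:
$$L_{con}(G_1) = \lVert Y - G_1(X) \rVert_1 \quad (2)$$
where $L_{con}(G_1)$ denotes the content loss, $G_1$ denotes the function corresponding to the first generator, $X$ denotes the four-dimensional matrix corresponding to the image set, $Y$ denotes the four-dimensional matrix corresponding to the training video (the one corresponding to the image set), and $\lVert \cdot \rVert_1$ denotes the L1 norm.
That is, the loss of the first generative adversarial network may be the sum of the adversarial loss and the L1-norm-based content loss.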
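The loss of the first generative adversarial network, as the sum of an adversarial term and an L1 content term, can be sketched in plain Python. This is a hedged illustration: the adversarial term is assumed to take the standard form E[log D1(Y)] + E[log(1 - D1(G1(X)))], the discriminator outputs are passed in as lists of probabilities, videos are nested lists, and all function names are hypothetical.

```python
import math
from typing import Iterator, Sequence


def adversarial_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Sample estimate of E[log D1(Y)] + E[log(1 - D1(G1(X)))] (assumed standard form)."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def _flatten(x) -> Iterator[float]:
    # Walk a nested list (for example height x width x channels x frames).
    if isinstance(x, (int, float)):
        yield float(x)
    else:
        for item in x:
            yield from _flatten(item)


def l1_content_loss(real_video, generated_video) -> float:
    """||Y - G1(X)||_1: sum of absolute element-wise differences."""
    real, fake = list(_flatten(real_video)), list(_flatten(generated_video))
    if len(real) != len(fake):
        raise ValueError("videos must have the same number of elements")
    return sum(abs(a - b) for a, b in zip(real, fake))


def base_network_loss(d_real, d_fake, real_video, generated_video) -> float:
    """Sum of the adversarial term and the L1 content term."""
    return adversarial_value(d_real, d_fake) + l1_content_loss(real_video, generated_video)


content = l1_content_loss([[1, 2], [3, 4]], [[1, 1], [5, 4]])
undecided = adversarial_value([0.5], [0.5])   # discriminator cannot tell real from generated
confident = adversarial_value([0.9], [0.1])   # discriminator separates them well
```

A discriminator that separates real from generated videos pushes the adversarial value up, while the generator tries to push it down; the content term is zero only when the generated video equals the training video.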
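Step S43: update the parameters of the first generator and of the first discriminator based on the loss of the first generative adversarial network.
Specifically, the gradient of each layer is calculated from the loss of the first generative adversarial network, and the parameters (such as weights and biases) of the first generator and the first discriminator are updated accordingly. The first generative adversarial network is trained by continually updating these parameters. When the training end condition is satisfied, for example when the loss of the first generative adversarial network has converged or is smaller than a preset value, the first generative adversarial network is determined as the basic network.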
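The training process of the optimization network in step S33 is described in detail below. Referring to FIG. 6, FIG. 6 is a flowchart of an optimization network training method according to an embodiment of this application. As shown in FIG. 6, the method includes:
Step S61: obtain an optimized time-lapse photography video from the basic time-lapse photography video through the second generator in the second generative adversarial network.
The optimization network includes a second generator and a second discriminator. The second generator models the motion information based on the basic time-lapse photography video to obtain the optimized time-lapse photography video, and the second discriminator judges whether the optimized time-lapse photography video is a real video. If the second discriminator judges it to be a real video, the optimized time-lapse photography video generated by the second generator is highly realistic and looks natural.
Similar to the basic network, the second generator in the optimization network includes an encoder and a decoder. The encoder may consist of M convolutional layers and the decoder of M deconvolutional layers, so that the encoder-decoder as a whole has a symmetric structure, where M is a positive integer. In addition, selected convolutional layers can be connected to their symmetric deconvolutional layers through skip connections, which makes better use of the encoder features. Which convolutional layer or layers are connected in this way can be decided by choosing the best result after a certain amount of experimentation, and this application does not limit it.
It should be noted that the number of convolutional layers and deconvolutional layers (that is, M) and the parameter configuration of each layer can be adjusted as needed. For example, M may be 6. This application does not limit these values, provided that the input and output image resolutions remain the same. In other words, increasing or decreasing the number of convolutional and deconvolutional layers in the second generator of the optimization network does not depart from the protection scope of this application. A comparison shows that the second generator of the optimization network has a structure similar to that of the first generator of the basic network: apart from the removal of several skip connections, the structures are the same.
The second discriminator of the optimization network has the same structure as the first discriminator of the basic network, and details are not repeated here.
Referring to FIG. 7, FIG. 7 is a structural diagram of an optimization network according to an embodiment of this application. As shown in FIG. 7, the optimization network includes a second generator 71 and a second discriminator 72, where Y1' denotes the basic time-lapse photography video output by the trained basic network, Y denotes the training video, and Y2 denotes the optimized time-lapse photography video output by the second generator.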
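Step S62: obtain a discrimination result from the optimized time-lapse photography video through the second discriminator in the second generative adversarial network.
Similar to the first discriminator, the second discriminator judges, based on the optimized time-lapse photography video and the training video, how realistic the optimized time-lapse photography video generated by the second generator is, and thereby obtains a discrimination result. If the similarity between the optimized time-lapse photography video and the training video reaches a preset degree, the optimized time-lapse photography video is judged to be a real video, that is, it is highly realistic.
Step S63: generate the loss of the second generative adversarial network according to the optimized time-lapse photography video, the basic time-lapse photography video, the training video, and the discrimination result.
Similar to the basic network, the optimization network is trained by adjusting parameters to reduce the loss of the second generative adversarial network. The loss includes at least a ranking loss, which is determined from the motion features corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video.
Optionally, the loss of the second generative adversarial network can be determined from the content loss, the adversarial loss, and the ranking loss of the second generative adversarial network. On this basis, in some possible implementations, the loss function of the optimization network may be:
the sum of three terms: the product of a preset constant and the ranking loss function, the adversarial loss function, and the L1-norm-based content loss function.
The loss function of the optimization network is expressed as:
$$L_{stage1} = L_{adv} + \lambda \cdot L_{rank} + L_{con} \quad (3)$$
where $L_{stage1}$ denotes the loss of the optimization network, $L_{adv}$ denotes the adversarial loss, $L_{con}$ [that is, $L_{con}(G_1)$] denotes the content loss, $\lambda$ denotes the preset constant, and $L_{rank}$ denotes the (total) ranking loss. The adversarial loss function and the L1-norm-based content loss function have been described above and are not repeated here. The ranking loss function is described below.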
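In some possible implementations, the second discriminator in the second generative adversarial network can be used to extract the features of the optimized time-lapse photography video, the basic time-lapse photography video, and the training video. From these features, the Gram matrix corresponding to each of the three videos is calculated; the Gram matrix represents the motion state between video frames. The ranking loss can then be determined from the Gram matrices corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video. The ranking loss function is:
$$L_{rank}(Y_1, Y, Y_2) = \sum_{l} L_{rank}(Y_1, Y, Y_2; l) \quad (4)$$
where $L_{rank}(Y_1, Y, Y_2)$ denotes the (total) ranking loss, $L_{rank}(Y_1, Y, Y_2; l)$ denotes the ranking loss function of a single layer (that is, a single feature layer), $l$ denotes the index of a feature layer in the second discriminator, $Y_1$ denotes the four-dimensional matrix corresponding to the basic time-lapse photography video, $Y$ denotes the four-dimensional matrix corresponding to the training video (the one corresponding to the image set), $Y_2$ denotes the four-dimensional matrix corresponding to the optimized time-lapse photography video, and $\sum$ denotes summation. Optionally, $l$ (that is, which feature layers are selected) can be decided by choosing the best result after a certain amount of experimentation.
Optionally, the single-layer ranking loss function $L_{rank}(Y_1, Y, Y_2; l)$ is given by formula (5) [formula not recoverable from this text],
where $g(Y; l)$ denotes the Gram matrix extracted at layer $l$.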
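A Gram-matrix-based ranking loss and the three-term loss of the optimization network can be sketched as follows. The text does not give the single-layer expression, so the contrastive form used here, -log(exp(-d_real) / (exp(-d_real) + exp(-d_base))) with L1 distances between Gram matrices, is an assumption chosen only because it decreases as the optimized video moves toward the training video and away from the basic video. The feature layout, the normalization of the Gram matrix, and all function names are likewise hypothetical.

```python
import math
from typing import List, Sequence

Features = Sequence[Sequence[float]]   # hypothetical layout: channels x flattened positions


def gram_matrix(features: Features) -> List[List[float]]:
    """Channel-by-channel inner products, normalized by the number of positions (assumption)."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def l1_distance(g1, g2) -> float:
    return sum(abs(a - b) for r1, r2 in zip(g1, g2) for a, b in zip(r1, r2))


def ranking_loss_single_layer(feat_base, feat_real, feat_refined) -> float:
    """Assumed contrastive form for one feature layer."""
    g_base, g_real, g_refined = (gram_matrix(f) for f in (feat_base, feat_real, feat_refined))
    d_real = l1_distance(g_refined, g_real)   # should become small
    d_base = l1_distance(g_refined, g_base)   # should become large
    # Equal to -log(exp(-d_real) / (exp(-d_real) + exp(-d_base))).
    return math.log1p(math.exp(d_real - d_base))


def ranking_loss(layers) -> float:
    """Total ranking loss: sum over layers of (base, real, refined) feature triples."""
    return sum(ranking_loss_single_layer(*triple) for triple in layers)


def optimization_network_loss(adv: float, rank: float, con: float, lam: float = 1.0) -> float:
    """Adversarial loss + preset constant * ranking loss + content loss."""
    return adv + lam * rank + con


real = [[1.0, 0.0], [0.0, 1.0]]
base = [[1.0, 1.0], [1.0, 1.0]]
good = ranking_loss_single_layer(base, real, real)   # refined features equal the real ones
bad = ranking_loss_single_layer(base, real, base)    # refined features equal the basic ones
```

In this toy case the loss is small when the refined features match the real ones and large when they merely reproduce the basic video, which is the ordering the ranking loss is meant to enforce.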
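Step S64: optimize the network parameters of the second generative adversarial network according to the loss of the second generative adversarial network until a second generative adversarial network that satisfies the training end condition is obtained, and use it as the optimization network.
Specifically, the gradient of each layer is calculated from the loss of the optimization network, and the parameters (such as weights and biases) of the second generator and the second discriminator are updated accordingly. The second generative adversarial network is trained by continually updating these parameters. When the training end condition is satisfied, for example when the loss of the second generative adversarial network has converged or is smaller than a preset value, the second generative adversarial network is determined as the optimization network.
In the foregoing embodiments, the first generator and the first discriminator are trained alternately: the first discriminator is fixed while the first generator is trained, and the first generator is fixed while the first discriminator is trained. Similarly, the second generator and the second discriminator are trained alternately. While the second generator is trained, the second discriminator is fixed and the ranking loss is minimized, so that the optimized time-lapse photography video output by the second generator is closer to the real video (more similar to it) and farther from (more different from) the video input to the second generator, that is, the video output by the basic network that has been trained to convergence. While the second discriminator is trained, the second generator is fixed and the ranking loss is maximized, which amplifies the difference between the optimized time-lapse photography video output by the second generator and the real video and benefits the subsequent training of the optimization network.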
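The alternating schedule can be sketched as a loop that updates one side while the other stays fixed. The strict one-to-one alternation and the callback-style interface are assumptions made for illustration; the text only states that the generator and the discriminator are trained in turn.

```python
from typing import Callable, List


def alternate_training(num_iterations: int,
                       update_generator: Callable[[], None],
                       update_discriminator: Callable[[], None]) -> List[str]:
    """Alternate updates and return the order in which they were applied."""
    schedule = []
    for step in range(num_iterations):
        if step % 2 == 0:
            update_discriminator()   # generator parameters stay fixed in this step
            schedule.append("D")
        else:
            update_generator()       # discriminator parameters stay fixed in this step
            schedule.append("G")
    return schedule


# Hypothetical update callbacks that only count how often they are called.
counts = {"G": 0, "D": 0}
schedule = alternate_training(
    6,
    update_generator=lambda: counts.__setitem__("G", counts["G"] + 1),
    update_discriminator=lambda: counts.__setitem__("D", counts["D"] + 1),
)
```

In a real setup each callback would compute the stage loss and apply a gradient step only to its own network.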
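The optimization network trained in this embodiment can further optimize the video output by the basic network that has been trained to convergence, mainly by optimizing the motion information.
The foregoing describes specific implementations of the time-lapse photography video generating method and the neural network model training method provided in the embodiments of this application. Correspondingly, this application further provides a time-lapse photography video generating system. Referring to FIG. 8, FIG. 8 is a structural diagram of a time-lapse photography video generating system according to an embodiment of this application. As shown in FIG. 8, the system includes:
a terminal 81 and a server 82 that interact through a network;
where the server 82 is configured to receive a specified image sent by the terminal, generate, according to the specified image, an image set that includes a first preset number of frames of the specified image, perform content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model, and send the time-lapse photography video to the terminal, where the neural network model is trained through the neural network model training method described above.
It can be understood that the operations of the server 82 may further include the steps of the training process, described above, of the neural network model used to generate time-lapse photography videos.
Optionally, the terminal 81 may be a mobile smart device 811 such as a smartphone, or a local computer device 812 such as a computer.
With the technical solution provided in this embodiment, a user only needs to upload a specified image through the local terminal. The remote server then outputs a predicted time-lapse photography video based on the specified image through the neural network model used to generate time-lapse photography videos, and sends it to the local terminal. The user can thus make a time-lapse photography video easily, which effectively improves the user experience.
In addition, the technical solution does not require the local terminal to run the neural network model used to generate time-lapse photography videos, so a time-lapse photography video can be made without occupying the running resources of the local terminal, which effectively saves those resources.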
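Corresponding to the time-lapse photography video generating system provided in the embodiments of this application, a signaling flow of a time-lapse photography video generating method is described. Referring to FIG. 9, FIG. 9 is a signaling flowchart of a time-lapse photography video generating method according to an embodiment of this application. As shown in FIG. 9, the signaling flow includes:
Step S91: the local terminal sends a specified image to the remote server.
Step S92: the remote server copies the specified image to generate an image set that includes a first preset number of frames of the specified image.
Step S93: the remote server inputs the image set to the neural network model used to generate time-lapse photography videos.
Step S94: the neural network model reconstructs the content of the specified image in the image set and outputs a time-lapse photography video.
When the specified image is the first-frame image, content modeling can be performed on the frames that follow it to reconstruct the content of the images; when the specified image is the last-frame image, content modeling can be performed on the frames that precede it to reconstruct the content of the images. The time-lapse photography video is generated in this way.
Step S95: the remote server sends the output time-lapse photography video to the local terminal.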
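The signaling flow of steps S91 to S95 can be sketched with an in-process stand-in for the remote server. The class name, the dictionary-based request and response format, and the toy model are all hypothetical; a real system would exchange these messages over a network.

```python
from typing import Callable, Dict, List


class RemoteServer:
    """Hypothetical stand-in for the remote server of steps S91 to S95."""

    def __init__(self, model: Callable[[List[list]], List[list]],
                 first_preset_number: int = 32):
        self.model = model
        self.first_preset_number = first_preset_number

    def handle_request(self, request: Dict[str, list]) -> Dict[str, List[list]]:
        image = request["specified_image"]                                   # S91: image received
        image_set = [list(image) for _ in range(self.first_preset_number)]   # S92: copy the image
        video = self.model(image_set)                                        # S93-S94: run the model
        return {"time_lapse_video": video}                                   # S95: reply to the terminal


# Toy model (assumption): shift the pixel values a little more in each frame.
server = RemoteServer(
    model=lambda frames: [[p + t for p in f] for t, f in enumerate(frames)],
    first_preset_number=3,
)
response = server.handle_request({"specified_image": [0, 10]})
```

The terminal side only builds the request and reads the response, which is why it needs no copy of the model.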
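The method continually optimizes the time-lapse photography video through multi-stage generative adversarial networks, and uses content modeling and motion state modeling to ensure that reasonable future frames or historical frames are predicted, so that the video is generated progressively from coarse to fine. The method preserves the realism of the content and the plausibility of the motion information, so the generated time-lapse photography video is highly realistic and looks natural.
In addition, the technical solution does not require the local terminal to run the neural network model used to generate time-lapse photography videos, so a time-lapse photography video can be made without occupying the running resources of the local terminal, which effectively saves those resources.
It should also be noted that training the neural network model used to generate time-lapse photography videos requires substantial system resources, so the training process is preferably performed on the remote server side.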
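To describe the technical solutions provided in this application more fully, and corresponding to the time-lapse photography video generating method provided in the embodiments of this application, this application discloses a time-lapse photography video generating apparatus.
Referring to FIG. 10, FIG. 10 is a structural diagram of a time-lapse photography video generating apparatus according to an embodiment of this application. The apparatus can be applied to a local terminal, or to the remote server side of a time-lapse photography video generating system. As shown in FIG. 10, the apparatus 1000 includes:
an obtaining module 1010, configured to obtain a specified image;
a first generating module 1020, configured to generate, according to the specified image, an image set that includes a first preset number of frames of the specified image; and
a second generating module 1030, configured to perform, according to the image set, content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model, where the neural network model is trained through the foregoing neural network model training method.
Optionally, the electronic device is a terminal device on which the neural network model is deployed, and the obtaining module 1010 is specifically configured to:
obtain, in response to a selection instruction, the selected photo in an album as the specified image; or
obtain, in response to a shooting instruction, the shot photo as the specified image.
Optionally, the electronic device is a server, and the obtaining module 1010 is specifically configured to:
receive a time-lapse photography generation request sent by a terminal device, the request carrying the specified image; and
obtain the specified image from the time-lapse photography generation request.
The time-lapse photography video generating apparatus provided in the embodiments of this application first obtains a specified image, generates from it a specified image set including a first preset number of copies of the specified image, and then uses the pre-trained neural network model to perform content modeling and motion state modeling on the specified image set to obtain the time-lapse photography video output by the model. The apparatus continually optimizes the time-lapse photography video through multi-stage generative adversarial networks, and uses content modeling and motion state modeling to ensure that reasonable future frames are predicted, so that the video is generated progressively from coarse to fine. On the one hand, the apparatus preserves the realism of the content and the plausibility of the motion information, so the generated video is highly realistic and looks natural. On the other hand, the model used by the apparatus is a cascaded dual-network structure that is easy to implement and simplify, and it can be applied in cloud or offline scenarios.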
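Optionally, referring to FIG. 11, FIG. 11 is a structural diagram of a neural network model training apparatus according to an embodiment of this application. As shown in FIG. 11, the apparatus 1100 includes:
an obtaining module 1110, configured to obtain training samples, each training sample including a training video and an image set corresponding to the training video, the image set including a first preset number of copies of the first-frame image or the last-frame image of the training video; and
a training module 1120, configured to train, according to the training samples, a neural network model that satisfies a training end condition, the neural network model including a basic network for modeling the content of a time-lapse photography video and an optimization network for modeling the motion state of the time-lapse photography video, where the basic network is a first generative adversarial network that takes an image set including a first preset number of frames of the same image as input and outputs a basic time-lapse photography video, and the optimization network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
Optionally, referring to FIG. 12, FIG. 12 is a structural diagram of another neural network model training apparatus according to an embodiment of this application. As shown in FIG. 12, the apparatus 1100 includes the modules described in FIG. 11 and its corresponding embodiment, and the training module 1120 specifically includes:
a first training submodule 1121, configured to train, according to the training samples, a first generative adversarial network that satisfies the training end condition, and use it as the basic network;
an obtaining submodule 1122, configured to obtain, according to the image set corresponding to the training video, the basic time-lapse photography video output by the basic network; and
a second training submodule 1123, configured to train, according to the basic time-lapse photography video and the training video, a second generative adversarial network that satisfies the training end condition, and use it as the optimization network.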
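Optionally, the second training submodule 1123 is specifically configured to:
obtain an optimized time-lapse photography video from the basic time-lapse photography video through the second generator in the second generative adversarial network;
obtain a discrimination result from the optimized time-lapse photography video through the second discriminator in the second generative adversarial network;
generate the loss of the second generative adversarial network according to the optimized time-lapse photography video, the basic time-lapse photography video, the training video, and the discrimination result, the loss including at least a ranking loss that is determined from the motion features corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video; and
optimize the network parameters of the second generative adversarial network according to the loss of the second generative adversarial network until a second generative adversarial network that satisfies the training end condition is obtained, and use it as the optimization network.
Optionally, the apparatus further includes a determining module, configured to determine the loss of the second generative adversarial network in the following manner:
extracting, with the second discriminator in the second generative adversarial network, the features of the optimized time-lapse photography video, the basic time-lapse photography video, and the training video, and calculating from these features the Gram matrix corresponding to each of the three videos, the Gram matrix representing the motion state between video frames;
determining the ranking loss according to the Gram matrices corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video; and
determining the loss of the second generative adversarial network according to the content loss, the adversarial loss, and the ranking loss of the second generative adversarial network.
Optionally, the obtaining module 1110 is specifically configured to:
obtain a training video;
extract the first-frame image or the last-frame image from the training video;
copy the first-frame image or the last-frame image to generate an image set corresponding to the training video; and
use the training video and its corresponding image set as a training sample.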
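As can be seen from the above, this application provides a method for generating a neural network model based on a dual-network structure. The dual-network structure includes a basic network for modeling the content of the time-lapse photography video and an optimization network for modeling its motion state. The basic network is a first generative adversarial network that takes as input a video consisting of a first preset number of frames of a specified frame image and outputs a basic time-lapse photography video. The optimization network is a second generative adversarial network that takes the output of the basic network as input, models the motion state of the time-lapse photography video, and outputs an optimized time-lapse photography video. After multiple training videos are obtained, an image set corresponding to each training video is generated from it; the image set includes a first preset number of copies of the first-frame image or the last-frame image of the training video. The neural network model composed of the basic network and the optimization network is trained with the training videos and their corresponding image sets, and once the training end condition is satisfied, the model can be used to generate time-lapse photography videos. The neural network model trained by the apparatus continually optimizes the time-lapse photography video through multi-stage generative adversarial networks, and uses content modeling and motion state modeling to ensure that reasonable future frames or historical frames are predicted, so that the video is generated progressively from coarse to fine. On the one hand, this preserves the realism of the content and the plausibility of the motion information, so the generated video is highly realistic and looks natural. On the other hand, the neural network model trained by the apparatus is a cascaded dual-network structure that is easy to implement and simplify, and it can be applied in cloud or offline scenarios.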
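To describe the technical solutions provided in this application more fully, and corresponding to the time-lapse photography video generating method provided in the embodiments of this application, this application discloses an electronic device. The electronic device may be a local terminal (such as a local computer or a mobile terminal), a remote server, or the like.
Referring to FIG. 13, FIG. 13 is a hardware structural diagram of an electronic device according to an embodiment of this application. As shown in FIG. 13, the electronic device includes:
a processor 1, a communication interface 2, a memory 3, and a communication bus 4;
where the processor 1, the communication interface 2, and the memory 3 communicate with each other through the communication bus 4;
the processor 1 is configured to invoke and execute the program stored in the memory; and
the memory 3 is configured to store the program.
The program may include program code, and the program code includes computer operation instructions. In the embodiments of this application, the program may include both the program corresponding to the training method for the neural network model used to generate time-lapse photography videos and the program corresponding to the time-lapse photography video generating method, or either one of the two.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application.
The memory 3 may include a high-speed RAM and may further include a non-volatile memory, for example at least one disk memory.
The program may be specifically used to:
obtain a specified image;
generate, according to the specified image, an image set that includes a first preset number of frames of the specified image; and
perform, according to the image set, content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model, where the neural network model is trained through the foregoing neural network model training method.
Optionally, the program may further be used to perform the steps of any implementation of the time-lapse photography video generating method provided in the embodiments of this application.
In addition, an embodiment of this application further provides a storage medium storing a computer program that, when executed by a processor, performs the steps of the neural network model training method and/or the steps of the time-lapse photography video generating method described in the foregoing embodiments.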
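A practical application scenario of this application is briefly described below. Suppose a user wants to make a time-lapse photography video whose scene is the changing sky. With the technical solution provided in this application, the user can make it in two ways.
In the first way, the user makes the video on the local terminal. The operations performed by the local terminal include:
obtaining a specified image provided by the user, which may be a picture of the sky shot by the user on the spot or an existing picture of the sky selected by the user; copying the specified image to generate an image set that includes a first preset number of copies of the specified image; inputting the image set to the neural network model used to generate time-lapse photography videos; and performing content modeling and motion state modeling through the neural network model, reconstructing the content of the specified image, and outputting an optimized time-lapse photography video.
In this way, the neural network model used to generate time-lapse photography videos is preset on the local terminal, that is, the local terminal can generate the time-lapse photography video independently.
In the second way, the user operates the local terminal and obtains the time-lapse photography video with the help of the remote server. The specific flow is as follows:
the local terminal sends the specified image to the remote server, where the specified image may be a picture of the sky shot by the user on the spot or an existing picture of the sky selected by the user; and
the remote server copies the specified image to generate an image set that includes a first preset number of copies of the specified image, inputs the image set to the neural network model used to generate time-lapse photography videos, performs content modeling and motion state modeling through the neural network model, reconstructs the content of the specified image, and outputs an optimized time-lapse photography video.
In this way, the user only needs to send the picture of the sky to the remote server through the local terminal. The neural network model used to generate time-lapse photography videos is preset on the remote server, which generates the time-lapse photography video predicted from the picture of the sky and then sends it to the user's local terminal.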
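As can be seen from the foregoing technical solutions, compared with the prior art, this application provides a neural network model training method and a time-lapse photography video generating method and device. The technical solution generates time-lapse photography videos with a neural network model based on a dual-network structure. The dual-network structure includes a basic network for modeling the content of the time-lapse photography video and an optimization network for modeling its motion state. The basic network is a first generative adversarial network that takes as input a video consisting of a first preset number of frames of a specified frame image and outputs a basic time-lapse photography video. The optimization network is a second generative adversarial network that takes the output of the basic network as input, models the motion state of the time-lapse photography video, and outputs an optimized time-lapse photography video. After multiple training videos are obtained, an image set corresponding to each training video is generated from it; the image set includes a first preset number of copies of the first-frame image or the last-frame image of the training video. The neural network model composed of the basic network and the optimization network is trained with the training videos and their corresponding image sets, and once the training end condition is satisfied, the model can be used to generate time-lapse photography videos.
When a time-lapse photography video is generated, a specified image is first obtained, a specified image set including a first preset number of copies of the specified image is generated from it, and the pre-trained neural network model then performs content modeling and motion state modeling on the specified image set to obtain the time-lapse photography video output by the model.
The technical solution continually optimizes the time-lapse photography video through multi-stage generative adversarial networks, and uses content modeling and motion state modeling to ensure that reasonable future frames are predicted, so that the video is generated progressively from coarse to fine. On the one hand, it preserves the realism of the content and the plausibility of the motion information, so the generated video is highly realistic and looks natural. On the other hand, because the model is a cascaded dual-network structure, it is easy to implement and simplify and can be applied in cloud or offline scenarios.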
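Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or smart device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or smart device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or smart device that includes the element.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. The apparatuses, systems, smart devices, and storage media disclosed in the embodiments correspond to the methods disclosed in the embodiments, so they are described briefly; for the relevant details, refer to the description of the methods.
A person skilled in the art may further realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
The steps of the methods or algorithms described with reference to the embodiments disclosed herein may be implemented directly by hardware, by a software module executed by a processor, or by a combination of the two. The software module may reside in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, or any other form of storage medium known in the art.
The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use this application. Various modifications to these embodiments will be obvious to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.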
Claims (14)
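- A neural network model training method, applied to a server, including: obtaining training samples, each training sample including a training video and an image set corresponding to the training video, the image set including a first preset number of frames of the first-frame image or the last-frame image of the training video; and training, according to the training samples, a neural network model that satisfies a training end condition, the neural network model including a basic network for modeling the content of a time-lapse photography video and an optimization network for modeling the motion state of the time-lapse photography video; where the basic network is a first generative adversarial network that takes an image set including a first preset number of frames of the same image as input and outputs a basic time-lapse photography video; and the optimization network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
- The method according to claim 1, where the training, according to the training samples, a neural network model that satisfies a training end condition includes: training, according to the training samples, a first generative adversarial network that satisfies the training end condition, and using it as the basic network; obtaining, according to the image set corresponding to the training video, the basic time-lapse photography video output by the basic network; and training, according to the basic time-lapse photography video and the training video, a second generative adversarial network that satisfies the training end condition, and using it as the optimization network.
- The method according to claim 2, where the training, according to the basic time-lapse photography video and the training video, a second generative adversarial network that satisfies the training end condition, and using it as the optimization network includes: obtaining an optimized time-lapse photography video from the basic time-lapse photography video through the second generator in the second generative adversarial network; obtaining a discrimination result from the optimized time-lapse photography video through the second discriminator in the second generative adversarial network; generating the loss of the second generative adversarial network according to the optimized time-lapse photography video, the basic time-lapse photography video, the training video, and the discrimination result, the loss including at least a ranking loss that is determined from the motion features corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video; and optimizing the network parameters of the second generative adversarial network according to the loss of the generative adversarial network until a second generative adversarial network that satisfies the training end condition is obtained, and using it as the optimization network.
- The method according to claim 1, where the loss of the second generative adversarial network is determined in the following manner: extracting, with the second discriminator in the second generative adversarial network, the features of the optimized time-lapse photography video, the basic time-lapse photography video, and the training video, and calculating from these features the Gram matrix corresponding to each of the three videos, the Gram matrix representing the motion state between video frames; determining the ranking loss according to the Gram matrices corresponding to the optimized time-lapse photography video, the basic time-lapse photography video, and the training video; and determining the loss of the second generative adversarial network according to the content loss, the adversarial loss, and the ranking loss of the second generative adversarial network.
- The method according to claim 1, where the obtaining training samples includes: obtaining a training video; extracting the first-frame image or the last-frame image from the training video; copying the first-frame image or the last-frame image to generate an image set corresponding to the training video; and using the training video and its corresponding image set as a training sample.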
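- A time-lapse photography video generating method, applied to an electronic device, including: obtaining a specified image; generating, according to the specified image, an image set that includes a first preset number of frames of the specified image; and performing, according to the image set, content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model; where the neural network model is trained through the method according to any one of claims 1 to 5.
- The method according to claim 1, where the electronic device is a terminal device on which the neural network model is deployed, and the obtaining a specified image includes: obtaining, in response to a selection instruction, the selected photo in an album as the specified image; or obtaining, in response to a shooting instruction, the shot photo as the specified image.
- The method according to claim 1, where the electronic device is a server, and the obtaining a specified image includes: receiving a time-lapse photography generation request sent by a terminal device, the request carrying the specified image; and obtaining the specified image from the time-lapse photography generation request.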
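- A neural network model training apparatus, including: an obtaining module, configured to obtain training samples, each training sample including a training video and an image set corresponding to the training video, the image set including a first preset number of copies of the first-frame image or the last-frame image of the training video; and a training module, configured to train, according to the training samples, a neural network model that satisfies a training end condition, the neural network model including a basic network for modeling the content of a time-lapse photography video and an optimization network for modeling the motion state of the time-lapse photography video; where the basic network is a first generative adversarial network that takes an image set including a first preset number of frames of the same image as input and outputs a basic time-lapse photography video; and the optimization network is a second generative adversarial network that takes the output of the basic network as input and outputs an optimized time-lapse photography video.
- A time-lapse photography video generating apparatus, including: an obtaining module, configured to obtain a specified image; a first generating module, configured to generate, according to the specified image, an image set that includes a first preset number of frames of the specified image; and a second generating module, configured to perform, according to the image set, content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model; where the neural network model is trained through the method according to any one of claims 1 to 5.
- A time-lapse photography video generating system, including: a terminal and a server that interact through a network; where the server is configured to receive a specified image sent by the terminal, generate, according to the specified image, an image set that includes a first preset number of frames of the specified image, perform content modeling and motion state modeling on the image set through a pre-trained neural network model to obtain a time-lapse photography video output by the neural network model, and send the time-lapse photography video to the terminal; where the neural network model is trained through the method according to any one of claims 1 to 5.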
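- An electronic device, including: a memory and a processor; where the memory is configured to store a computer program; and the processor is configured to invoke and execute the computer program in the memory to implement the neural network model training method according to any one of claims 1 to 5, or to implement the time-lapse photography video generating method according to any one of claims 6 to 8.
- A storage medium storing a computer program that, when executed by a processor, implements the neural network model training method according to any one of claims 1 to 5, or implements the time-lapse photography video generating method according to any one of claims 6 to 8.
- A computer program product including instructions that, when run on a computer, cause the computer to perform the neural network model training method according to any one of claims 1 to 5, or the time-lapse photography video generating method according to any one of claims 6 to 8.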
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020568019A JP7026262B2 (ja) | 2018-03-26 | 2019-03-01 | ニューラルネットワークモデルのトレーニング、タイムラプス撮影ビデオの生成方法及び装置 |
| EP19778365.7A EP3779891A4 (en) | 2018-03-26 | 2019-03-01 | METHOD AND DEVICE FOR TRAINING A NEURONAL NETWORK MODEL AND METHOD AND DEVICE FOR GENERATING PHOTOGRAPHIC VIDEO AT TIME INTERVALS |
| US16/892,587 US11429817B2 (en) | 2018-03-26 | 2020-06-04 | Neural network model training method and device, and time-lapse photography video generating method and device |
| US17/864,730 US12001959B2 (en) | 2018-03-26 | 2022-07-14 | Neural network model training method and device, and time-lapse photography video generating method and device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810253848.3A CN110363293A (zh) | 2018-03-26 | 2018-03-26 | 神经网络模型的训练、延时摄影视频的生成方法及设备 |
| CN201810253848.3 | 2018-03-26 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/892,587 Continuation US11429817B2 (en) | 2018-03-26 | 2020-06-04 | Neural network model training method and device, and time-lapse photography video generating method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019184654A1 true WO2019184654A1 (zh) | 2019-10-03 |
Family
ID=68060891
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2019/076724 Ceased WO2019184654A1 (zh) | 2018-03-26 | 2019-03-01 | 神经网络模型的训练、延时摄影视频的生成方法及设备 |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US11429817B2 (zh) |
| EP (1) | EP3779891A4 (zh) |
| JP (1) | JP7026262B2 (zh) |
| CN (2) | CN110363293A (zh) |
| WO (1) | WO2019184654A1 (zh) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111178401A (zh) * | 2019-12-16 | 2020-05-19 | 上海航天控制技术研究所 | 一种基于多层对抗网络的空间目标分类方法 |
| US20210319258A1 (en) * | 2019-05-07 | 2021-10-14 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for training classification task model, device, and storage medium |
| CN114615421A (zh) * | 2020-12-07 | 2022-06-10 | 华为技术有限公司 | 图像处理方法及电子设备 |
| CN116596776A (zh) * | 2023-04-14 | 2023-08-15 | 阿里巴巴达摩院(杭州)科技有限公司 | 图像处理、天气图像修复以及图像数据处理方法 |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110363293A (zh) * | 2018-03-26 | 2019-10-22 | 腾讯科技(深圳)有限公司 | 神经网络模型的训练、延时摄影视频的生成方法及设备 |
| US11854245B2 (en) * | 2018-04-27 | 2023-12-26 | Carnegie Mellon University | Generative adversarial networks having ranking loss |
| TWI732370B (zh) * | 2019-12-04 | 2021-07-01 | 財團法人工業技術研究院 | 神經網路模型的訓練裝置和訓練方法 |
| CN111882825B (zh) * | 2020-06-18 | 2021-05-28 | 闽江学院 | 一种基于类脑电波数据的疲劳预测方法及装置 |
| US12327188B2 (en) | 2020-10-16 | 2025-06-10 | Adobe Inc. | Direct regression encoder architecture and training |
| US11463652B2 (en) * | 2020-12-29 | 2022-10-04 | TCL Research America Inc. | Write-a-movie: visualize your story from script |
| CN113792853B (zh) * | 2021-09-09 | 2023-09-05 | 北京百度网讯科技有限公司 | 字符生成模型的训练方法、字符生成方法、装置和设备 |
| CN113747072B (zh) * | 2021-09-13 | 2023-12-12 | 维沃移动通信有限公司 | 拍摄处理方法和电子设备 |
| US11689601B1 (en) * | 2022-06-17 | 2023-06-27 | International Business Machines Corporation | Stream quality enhancement |
| US12477159B2 (en) | 2023-03-22 | 2025-11-18 | Samsung Electronics Co., Ltd. | Cache-based content distribution network |
| US20240331088A1 (en) * | 2023-03-31 | 2024-10-03 | Samsung Electronics Co., Ltd. | Lightweight rendering system with on-device resolution improvement |
| CN116933100B (zh) * | 2023-07-21 | 2026-03-27 | 广东电网有限责任公司 | 一种光伏出力典型场景获取方法、装置、设备及存储介质 |
| CN117291252B (zh) * | 2023-11-27 | 2024-02-20 | 浙江华创视讯科技有限公司 | 稳定视频生成模型训练方法、生成方法、设备及存储介质 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7289127B1 (en) * | 2005-04-25 | 2007-10-30 | Apple, Inc. | Multi-conic gradient generation |
| CN102737369A (zh) * | 2011-03-31 | 2012-10-17 | 卡西欧计算机株式会社 | 图像处理装置及图像处理方法 |
| CN106779073A (zh) * | 2016-12-27 | 2017-05-31 | 西安石油大学 | 基于深度神经网络的媒体信息分类方法及装置 |
| CN107624243A (zh) * | 2015-05-08 | 2018-01-23 | 微软技术许可有限责任公司 | 通过帧选择的实时超延时视频创建 |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5817858B2 (ja) | 2014-01-30 | 2015-11-18 | カシオ計算機株式会社 | 画像処理装置、画像処理方法、及びプログラム |
| KR102527811B1 (ko) | 2015-12-22 | 2023-05-03 | 삼성전자주식회사 | 타임랩스 영상을 생성하는 장치 및 방법 |
| US11144761B2 (en) * | 2016-04-04 | 2021-10-12 | Xerox Corporation | Deep data association for online multi-class multi-object tracking |
| JP6758950B2 (ja) | 2016-06-27 | 2020-09-23 | キヤノン株式会社 | 撮像装置、その制御方法とプログラム |
| US10805338B2 (en) * | 2016-10-06 | 2020-10-13 | Cisco Technology, Inc. | Analyzing encrypted traffic behavior using contextual traffic data |
| CN107730458A (zh) * | 2017-09-05 | 2018-02-23 | 北京飞搜科技有限公司 | 一种基于生成式对抗网络的模糊人脸重建方法及系统 |
| CN107679465B (zh) * | 2017-09-20 | 2019-11-15 | 上海交通大学 | 一种基于生成网络的行人重识别数据生成和扩充方法 |
| CN110363293A (zh) * | 2018-03-26 | 2019-10-22 | 腾讯科技(深圳)有限公司 | 神经网络模型的训练、延时摄影视频的生成方法及设备 |
- 2018
  - 2018-03-26 CN CN201810253848.3A patent/CN110363293A/zh active Pending
  - 2018-03-26 CN CN201910853402.9A patent/CN110555527A/zh active Pending
- 2019
  - 2019-03-01 JP JP2020568019A patent/JP7026262B2/ja active Active
  - 2019-03-01 EP EP19778365.7A patent/EP3779891A4/en active Pending
  - 2019-03-01 WO PCT/CN2019/076724 patent/WO2019184654A1/zh not_active Ceased
- 2020
  - 2020-06-04 US US16/892,587 patent/US11429817B2/en active Active
- 2022
  - 2022-07-14 US US17/864,730 patent/US12001959B2/en active Active
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210319258A1 (en) * | 2019-05-07 | 2021-10-14 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for training classification task model, device, and storage medium |
| US12536782B2 (en) * | 2019-05-07 | 2026-01-27 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for training classification task model, device, and storage medium |
| CN111178401A (zh) * | 2019-12-16 | 2020-05-19 | 上海航天控制技术研究所 | 一种基于多层对抗网络的空间目标分类方法 |
| CN111178401B (zh) * | 2019-12-16 | 2023-09-12 | 上海航天控制技术研究所 | 一种基于多层对抗网络的空间目标分类方法 |
| CN114615421A (zh) * | 2020-12-07 | 2022-06-10 | 华为技术有限公司 | 图像处理方法及电子设备 |
| CN114615421B (zh) * | 2020-12-07 | 2023-06-30 | 华为技术有限公司 | 图像处理方法及电子设备 |
| CN116596776A (zh) * | 2023-04-14 | 2023-08-15 | 阿里巴巴达摩院(杭州)科技有限公司 | 图像处理、天气图像修复以及图像数据处理方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| US11429817B2 (en) | 2022-08-30 |
| JP7026262B2 (ja) | 2022-02-25 |
| CN110555527A (zh) | 2019-12-10 |
| EP3779891A4 (en) | 2021-12-22 |
| US20200293833A1 (en) | 2020-09-17 |
| US12001959B2 (en) | 2024-06-04 |
| JP2021515347A (ja) | 2021-06-17 |
| EP3779891A1 (en) | 2021-02-17 |
| US20220366193A1 (en) | 2022-11-17 |
| CN110363293A (zh) | 2019-10-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019184654A1 (zh) | 神经网络模型的训练、延时摄影视频的生成方法及设备 | |
| Ignatov et al. | Replacing mobile camera isp with a single deep learning model | |
| CN111327824B (zh) | 拍摄参数的选择方法、装置、存储介质及电子设备 | |
| JP6267224B2 (ja) | 最良の写真を検出及び選択する方法及びシステム | |
| CN111047543B (zh) | 图像增强方法、装置和存储介质 | |
| WO2022133382A1 (en) | Semantic refinement of image regions | |
| CN109948734B (zh) | 图像聚类方法、装置及电子设备 | |
| CN111327887A (zh) | 电子装置及其操作方法,以及处理电子装置的图像的方法 | |
| CN110166684A (zh) | 图像处理方法、装置、计算机可读介质及电子设备 | |
| US20250252537A1 (en) | Enhancing images from a mobile device to give a professional camera effect | |
| CN112069338B (zh) | 图片处理方法、装置、电子设备及存储介质 | |
| CN108259767B (zh) | 图像处理方法、装置、存储介质及电子设备 | |
| CN114638375A (zh) | 视频生成模型训练方法、视频生成方法及装置 | |
| WO2017177559A1 (zh) | 一种图像管理方法和装置 | |
| CN110727810A (zh) | 图像处理方法、装置、电子设备及存储介质 | |
| Otsuka et al. | Self-supervised reversed image signal processing via reference-guided dynamic parameter selection | |
| CN111726592B (zh) | 获取图像信号处理器的架构的方法和装置 | |
| CN116645282A (zh) | 基于大数据的数据处理方法及系统 | |
| CN113129252A (zh) | 一种图像评分方法及电子设备 | |
| WO2023229591A1 (en) | Real scene super-resolution with raw images for mobile devices | |
| CN111310516A (zh) | 一种行为识别方法和装置 | |
| CN111031390B (zh) | 一种输出大小固定序列行列式点过程视频概要方法 | |
| CN116128775A (zh) | 三维查找表训练方法以及视频增强方法 | |
| Liu et al. | Feature pyramid boosting network for rendering natural bokeh | |
| CN110276760B (zh) | 一种图像场景分割方法、终端及存储介质 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19778365; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2020568019; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2019778365; Country of ref document: EP; Effective date: 20201026 |
