WO2024120396A1 - Video encoding method, apparatus, electronic device, and storage medium - Google Patents
- Publication number
- WO2024120396A1 (PCT/CN2023/136507)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- quality level
- frame
- coding
- encoding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H04N19/146: Adaptive coding of digital video signals, controlled by the data rate or code amount at the encoder output
- H04N19/14: Adaptive coding of digital video signals, controlled by coding unit complexity, e.g. amount of activity or edge presence estimation
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/08: Neural networks; learning methods
- H04N19/124: Adaptive coding of digital video signals; quantisation
- H04N19/136: Adaptive coding of digital video signals, controlled by incoming video signal characteristics or properties
- H04N19/154: Adaptive coding of digital video signals, controlled by measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
- H04N19/172: Adaptive coding of digital video signals, the coding unit being an image region that is a picture, frame or field
- H04N19/179: Adaptive coding of digital video signals, the coding unit being a scene or a shot
Definitions
- the embodiments of the present disclosure relate to the field of image processing technology, and in particular to a video encoding method, device, electronic device, and storage medium.
- a graphics processing unit is a processor in mobile phones, personal computers and other terminal devices that is responsible for performing image processing tasks.
- the graphics processor has powerful digital computing and parallel processing capabilities. In application scenarios such as video encoding and decoding, the use of the graphics processor can effectively improve the quality and efficiency of image encoding and decoding.
- Some graphics processors include a hardware-based video encoder, also called a hardware video editor, such as an NVENC unit.
- the hardware video editor can encode data in a YUV/RGB format into a video that complies with the H.264/HEVC standard, thereby achieving efficient video encoding.
- the embodiments of the present disclosure provide a video encoding method, apparatus, electronic device, and storage medium to overcome the problem of unreasonable bit rate of encoded video caused by encoding at a fixed quality level.
- an embodiment of the present disclosure provides a video encoding method, including:
- an embodiment of the present disclosure provides a video encoding device, including:
- a parameter acquisition module used for extracting video coding features of the initial video data based on a hardware video editor, wherein the video coding features represent the complexity of the video content of the initial video data;
- a parameter optimization module configured to process the video encoding features of the initial video data based on a pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents a picture quality level when encoding the initial video in a fixed quality dynamic bit rate mode;
- the encoding module is used to encode the initial video data at the target quality level using the hardware video editor to generate a target video.
- an electronic device including:
- a processor and a memory communicatively connected to the processor;
- the memory stores computer-executable instructions;
- the processor executes the computer-executable instructions stored in the memory to implement the video encoding method as described in the first aspect and various possible designs of the first aspect.
- An embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored. When a processor executes the computer-executable instructions, the video encoding method described in the first aspect and various possible designs of the first aspect is implemented.
- an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video encoding method as described in the first aspect and various possible designs of the first aspect.
- The video encoding method, device, electronic device and storage medium extract video encoding features of initial video data based on a hardware video editor, wherein the video encoding features represent the complexity of the video content of the initial video data; process the video encoding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode; and encode the initial video data at the target quality level using the hardware video editor to generate a target video.
- the video encoding features corresponding to the initial video data are obtained, and then the target image quality level matching the video encoding features is obtained through a pre-trained predictive neural network model, and then the hardware video editor is used to encode the video at the target quality level, so that the bit rate of the generated target video is adapted to the video content, avoiding the problem of too high or too low bit rate, improving the video quality, and avoiding bit rate waste.
- FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
- FIG. 2 is a first flowchart of a video encoding method provided by an embodiment of the present disclosure;
- FIG. 3 is a flowchart of the specific implementation steps of step S101 in the embodiment shown in FIG. 2;
- FIG. 4 is a flowchart of the specific implementation steps of step S1013 in the embodiment shown in FIG. 3;
- FIG. 5 is a schematic diagram of a process for generating video coding features provided by an embodiment of the present disclosure;
- FIG. 6 is a schematic diagram of a data structure of a video coding feature provided by an embodiment of the present disclosure;
- FIG. 7 is a second flowchart of a video encoding method provided by an embodiment of the present disclosure;
- FIG. 8 is a schematic diagram of a process for generating an evaluation value corresponding to a current quality level of a current frame provided by an embodiment of the present disclosure;
- FIG. 9 is a schematic diagram of the steps for training a prediction neural network model;
- FIG. 10 is a structural block diagram of a video encoding device provided by an embodiment of the present disclosure;
- FIG. 11 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure;
- FIG. 12 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
- the video encoding method provided by the embodiment of the present disclosure can be applied to various application scenarios that require video encoding, such as video editing, previewing, and playback. More specifically, for example, it is applied to video editing software, video editing cloud platforms, and live broadcast software. Exemplarily, the method provided by the embodiment of the present disclosure can be applied to terminal devices, such as smart phones, tablet computers, personal computers, etc.; it can also be applied to cloud servers.
- Figure 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
- the terminal device runs the video editing software to edit the original video, such as adding video effects, adding audio tracks, subtitles, etc., to generate video data, and the video data may include multiple video frames and editing information corresponding to each frame.
- The terminal device encodes the video data by calling the hardware video editor in the graphics processor, generates a target video (the finished film) with the editing effects applied for playback, and thereby completes the video editing workflow. The hardware video editor is, for example, an NVENC unit.
- Variable bit rate (VBR) encoding is also known as dynamic bit rate encoding or non-fixed bit rate encoding.
- VBR: variable bit rate
- CQ-VBR: constant quality variable (dynamic) bit rate
- The quality level (cq value) is usually set based on the user's experience, which often leads to an unreasonable setting: either the video bit rate is too high (producing a large video file and wasting storage and network resources) or the bit rate is too low (producing low video quality and degrading the viewing experience).
- the disclosed embodiment provides a video encoding method, which solves the above-mentioned problem by automatically generating a reasonable target quality level (cq value) and performing video encoding based on the target quality level.
- FIG. 2 is a flow chart of a video encoding method provided by an embodiment of the present disclosure.
- the method of this embodiment can be applied to an electronic device provided with a hardware video editor, such as a terminal device or a server.
- the terminal device is used as the execution subject for introduction.
- the video encoding method includes:
- Step S101 Extract video coding features of the initial video data based on a hardware video editor, where the video coding features represent the complexity of the video content of the initial video data.
- the terminal device first obtains initial video data, which may be data generated based on a video editing application, a live broadcast application, etc., for example, video data in YUV/RGB format, as well as video effects, added audio tracks, subtitles, etc.
- the initial video data is the data to be encoded, and after encoding the initial video data, a corresponding playable video can be generated.
- the initial video data is processed by calling the hardware video editor.
- the initial video data is used as an input parameter, and the application interface of the hardware video editor is called, and then the corresponding processing function is run to process the initial video data to obtain the video encoding features corresponding to the initial video data.
- the video encoding features characterize the video content complexity of the initial video data through the video parameters of the initial video data and the encoding parameters corresponding to the video parameters.
- the video encoding features are obtained after processing based on the video parameters and the encoding parameters, wherein the video parameters are information characterizing the initial video data, such as video height, width (i.e., resolution), frame rate (fps), etc.; the encoding parameters are parameters used by the hardware video editor to encode the initial video data, such as frame encoding size (size), the type of each frame corresponding to the video data (including I frame, P frame, B frame), image distortion, image fineness, etc.
- As shown in FIG. 3, the specific implementation steps of step S101 include:
- Step S1011 Obtain candidate quality levels of initial video data.
- The candidate quality level can be a preset default value. More specifically, it can be a value in the range [18, 35], such as 25. The smaller the quality level value, the higher the picture quality of the encoded video: the video is clearer and, correspondingly, the video file is larger.
- Step S1012 Input the video parameters and the candidate quality levels of the initial video data into the hardware video editor to obtain the encoding parameters of the initial video data.
- the video parameters and the alternative quality level of the initial video data are used as input quantities and input into the hardware video editor to obtain the encoding parameters output by the hardware video editor, such as frame encoding size (size), the type of each frame corresponding to the video data (including I frame, P frame, B frame), image distortion, image fineness, etc.
- The image distortion can be represented by the Sum of Absolute Transformed Differences (SATD) of each frame.
- The image fineness can be represented by the Quantization Parameter (QP) of each frame.
- SATD: Sum of Absolute Transformed Differences
- QP: Quantization Parameter
- Step S1013 Generate corresponding video encoding features based on the encoding parameters.
- the encoding parameters obtained by inputting video parameters and alternative quality levels into a hardware video editor are equivalent to the pre-encoding of the initial video data by the hardware video editor, that is, the hardware video editor predicts the corresponding encoding parameters based on the initial video data, but does not perform actual encoding.
- video encoding features are generated in combination with the video parameters and the alternative quality levels.
- the video encoding features can express the complexity of the video content of the initial video data.
- Based on the video encoding features, a matching quality level can then be determined for encoding, so that the picture quality matches the complexity of the video content and both wasted bit rate and an overly low bit rate are avoided.
- the initial video data includes multiple video frames, as shown in FIG4 , and the specific implementation steps of step S1013 include:
- Step S1013A Obtain encoding parameters corresponding to each video frame.
- Step S1013B According to the coding parameters corresponding to each video frame, a coding feature mean and a coding feature variance value are obtained, wherein the coding feature mean is the average value of the coding parameters corresponding to each video frame; and the coding feature variance value is the variance value of the coding parameters corresponding to each video frame.
- Step S1013C Generate video coding features according to the video parameters, the candidate quality levels, and the coding feature mean and coding feature variance values corresponding to each video frame.
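Steps S1013A to S1013C can be sketched as follows. This is a minimal illustration, not the patented implementation: the `FrameCodingParams` container and the `coding_feature_stats` helper are hypothetical names, and the use of population variance is an assumption, since the text does not say which variance estimator is used.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class FrameCodingParams:
    """Per-frame coding parameters from a pre-encode pass (hypothetical container)."""
    size: int    # frame coding size
    satd: float  # sum of absolute transformed differences
    qp: float    # quantization parameter


def coding_feature_stats(frames):
    """Return (means, variances) of each coding parameter over the given frames."""
    means, variances = {}, {}
    for name in ("size", "satd", "qp"):
        values = [getattr(frame, name) for frame in frames]
        means[name] = fmean(values)
        # Assumption: population variance; the source does not specify the estimator.
        variances[name] = pvariance(values)
    return means, variances


# Made-up numbers for two frames, only to show the call.
frames = [FrameCodingParams(1000, 50.0, 24.0), FrameCodingParams(3000, 70.0, 26.0)]
means, variances = coding_feature_stats(frames)
```

The resulting means and variances would then be combined with the video parameters and the candidate quality level to form the video coding feature.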
- When the initial video data includes multiple video frames, the video parameters and coding parameters corresponding to each video frame are obtained to form the frame coding feature of that frame. The video content complexity of the initial video data is then judged from the average level of, and the variation among, the multiple frame coding features, which yields the video coding features of the initial video data.
- FIG5 is a schematic diagram of a process for generating video coding features provided by an embodiment of the present disclosure, and the above process is introduced in conjunction with FIG5. As shown in FIG5, the initial video data includes N video frames, and N is an integer greater than 1.
- First, the video parameters of the Mth frame are obtained, such as the height, width, and frame rate of the Mth frame.
- The video parameters and the candidate quality level are then input into the interface of the hardware video editor, which returns the coding parameters of the Mth frame, such as the type of the Mth frame (shown as Type in the figure), the frame coding size (shown as Size in the figure), the sum of absolute transformed differences (shown as SATD in the figure), and the quantization parameter (shown as QP in the figure).
- Next, the mean and variance of the coding parameters corresponding to the Mth frame are calculated to obtain the coding feature mean and the coding feature variance value.
- Here, the mean and variance of the coding parameters corresponding to the Mth frame refer to the mean and variance of the coding parameters of each video frame in the set consisting of the Mth frame and at least one adjacent video frame before the Mth frame (shown as the Lth frame in the figure, where L is an integer less than M and greater than or equal to 1).
- The coding feature mean and the coding feature variance value are obtained by calculating the corresponding mean and variance over this set.
- the video coding feature corresponding to the Mth frame is obtained, and the video coding feature represents the video content complexity of the video segment corresponding to the Lth frame to the Mth frame in the initial video data.
- the video coding feature represents the video content complexity of the video segment before the Mth frame in the initial video data.
- FIG6 is a schematic diagram of a data structure of a video coding feature provided by an embodiment of the present disclosure.
- The video coding feature includes 21 data fields, shown as fields #1 to #21. Each field is computed over the 1st to Mth frames of the initial video data:
- Field #1 indicates the image height
- Field #2 indicates the image width
- Field #3 indicates the frame rate
- Field #4 indicates the candidate quality level
- Field #5 indicates the number of I frames
- Field #6 indicates the number of P frames
- Field #7 indicates the number of B frames
- Field #8 indicates the average size of the I frames
- Field #9 indicates the average size of the P frames
- Field #10 indicates the average size of the B frames
- Field #11 indicates the mean SATD of the I frames
- Field #12 indicates the mean SATD of the P frames
- Field #13 indicates the mean SATD of the B frames
- Field #14 indicates the average quantization parameter of all frames
- Field #15 indicates the size variance of the I frames
- Field #16 indicates the size variance of the P frames
- Field #17 indicates the size variance of the B frames
- Field #18 indicates the SATD variance of the I frames
- Field #19 indicates the SATD variance of the P frames
- Field #20 indicates the SATD variance of the B frames
- Field #21 indicates the quantization parameter variance of all frames.
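The 21-field layout of FIG. 6 can be illustrated with a short sketch. The names `FrameRecord` and `build_video_coding_feature` are hypothetical, and two choices are assumptions not stated in the text: population variance is used, and a statistic for a frame type that does not occur is set to 0.0.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class FrameRecord:
    """Coding parameters of one frame (hypothetical container)."""
    frame_type: str  # "I", "P" or "B"
    size: int
    satd: float
    qp: float


def _mean(values):
    # Assumption: 0.0 when no frame of the given type is present.
    return fmean(values) if values else 0.0


def _var(values):
    # Assumption: population variance, 0.0 for an empty group.
    return pvariance(values) if values else 0.0


def build_video_coding_feature(height, width, fps, cq, frames):
    """Assemble fields #1 to #21 in the order listed for FIG. 6."""
    by_type = {t: [f for f in frames if f.frame_type == t] for t in "IPB"}
    feature = [height, width, fps, cq]                               # fields 1-4
    feature += [len(by_type[t]) for t in "IPB"]                      # fields 5-7
    feature += [_mean([f.size for f in by_type[t]]) for t in "IPB"]  # fields 8-10
    feature += [_mean([f.satd for f in by_type[t]]) for t in "IPB"]  # fields 11-13
    feature.append(_mean([f.qp for f in frames]))                    # field 14
    feature += [_var([f.size for f in by_type[t]]) for t in "IPB"]   # fields 15-17
    feature += [_var([f.satd for f in by_type[t]]) for t in "IPB"]   # fields 18-20
    feature.append(_var([f.qp for f in frames]))                     # field 21
    return feature


# Made-up frame statistics, only to show the call.
frames = [
    FrameRecord("I", 4000, 90.0, 22.0),
    FrameRecord("P", 1000, 40.0, 26.0),
    FrameRecord("P", 3000, 60.0, 28.0),
]
feature = build_video_coding_feature(1080, 1920, 30.0, 25, frames)
```

A flat list keeps the field order explicit, which matters if the vector is later fed to a model that expects a fixed input layout.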
- Step S102 Processing the video encoding features of the initial video data based on the pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode.
- The quality level is the cq value used in the constant quality variable bit rate (CQ-VBR) mode.
- the video coding features of the initial video data are processed by a pre-trained predictive neural network model to predict a quality level that matches the complexity of the video content it represents, i.e., the target quality level.
- Step S103 Encode the initial video data at a target quality level using a hardware video editor to generate a target video.
- the hardware video editor is called to process the initial video data with the target quality level as a parameter, so as to generate a corresponding video piece for playback, namely, the target video.
- the target quality level may be a level sequence including multiple level identifiers, and the level identifier represents a specific quality level (i.e., cq value).
- Each level identifier in the level sequence corresponds to one or more video frames in the initial video data. In a more specific possible implementation, each level identifier corresponds to one video frame. When the initial video data is encoded at the target quality level using the hardware video editor, each video frame in the initial video data is obtained in sequence (in parallel or serially), and, based on the level identifier corresponding to each video frame, the hardware video editor is called to encode that video frame in the fixed quality dynamic bit rate mode. Each encoded video frame can thus have its own picture quality level, which makes encoding more accurate and improves encoding efficiency.
- The specific implementation process of calling the hardware video editor to encode in the fixed quality dynamic bit rate mode is prior art and is not repeated here.
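The per-frame use of a level sequence in step S103 might look like the sketch below. It assumes the one-level-per-frame variant described above. The `encode_frame` callback stands in for the call into the hardware video editor and is hypothetical, as is `fake_encode_frame`, which only formats a string so that the example runs without any encoder.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Encoded = TypeVar("Encoded")


def encode_with_level_sequence(
    frames: Sequence[Frame],
    levels: Sequence[int],
    encode_frame: Callable[[Frame, int], Encoded],
) -> List[Encoded]:
    """Encode each frame at the quality level (cq value) paired with it."""
    if len(frames) != len(levels):
        raise ValueError("expected exactly one quality level per frame")
    return [encode_frame(frame, cq) for frame, cq in zip(frames, levels)]


def fake_encode_frame(frame: str, cq: int) -> str:
    """Stand-in for the real encoder call; returns a label instead of a bitstream."""
    return f"{frame}@cq{cq}"


encoded = encode_with_level_sequence(["f0", "f1", "f2"], [23, 25, 28], fake_encode_frame)
```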
- the video coding features of the initial video data are extracted based on a hardware video editor, and the video coding features characterize the complexity of the video content of the initial video data; the video coding features of the initial video data are processed based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level characterizes the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode; the initial video data is encoded at the target quality level using the hardware video editor to generate a target video.
- the video coding features corresponding to the initial video data are obtained, and then the target image quality level matching the video coding features is obtained through a pre-trained predictive neural network model, and then the hardware video editor is used to perform video encoding at the target quality level, so that the bit rate of the generated target video is adapted to the video content, avoiding the problem of too high or too low bit rate, improving the video quality, and avoiding bit rate waste.
- FIG. 7 is a second flow chart of a video encoding method provided by an embodiment of the present disclosure. Based on the embodiment shown in FIG. 2 , this embodiment further refines the implementation process of steps S101 and S102.
- the video encoding method includes:
- Step S201 Acquire a quality level sequence representing the candidate quality levels, where the quality level sequence is an ordered set of multiple quality levels.
- There may be multiple candidate quality levels, and they are characterized by a preset quality level sequence; that is, the quality level sequence is one way of implementing the candidate quality levels.
- The quality level sequence can be an enumerated data structure such as an array, a matrix, a set of key-value pairs, or a structure, or it can be a set of values defined by a function; these are not described one by one here.
- Step S202 Acquire the current frame of the initial video data.
- Step S203 Obtain the video parameters of the current frame and the current quality level in the quality level sequence.
- Step S204 Input the video parameters of the current frame and the current quality level into the hardware video editor to obtain the encoding parameters corresponding to the current quality level of the current frame.
- The matching target quality level corresponding to each video frame in the initial video data is obtained by looping over two dimensions (the video frame dimension and the quality level dimension), so that each frame is encoded based on its own target quality level, thereby realizing dynamic encoding of the frames in the video and improving encoding efficiency and encoding quality.
- The video parameters can be obtained directly and are not described again. Thereafter, each quality level in the quality level sequence is input in turn, together with the video parameters, into the hardware video editor to obtain the encoding parameters corresponding to the current quality level of the current frame.
- the encoding parameters of the first frame of the initial video data can be obtained by calling the corresponding interface of the hardware video editor based on the preset default quality level and the video parameters.
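The two-dimensional loop over frames and quality levels can be sketched as below. The `pre_encode` callback is a hypothetical stand-in for the hardware video editor interface, and `fake_pre_encode` returns made-up numbers so the example is runnable. The sketch assumes the quality level sequence spans the [18, 35] range mentioned earlier, and it leaves out the special handling of the first frame at the default quality level.

```python
from typing import Callable, Dict, Sequence, Tuple

CodingParams = Dict[str, float]


def collect_coding_params(
    num_frames: int,
    quality_levels: Sequence[int],
    pre_encode: Callable[[int, int], CodingParams],
) -> Dict[Tuple[int, int], CodingParams]:
    """Query coding parameters for every (frame index, quality level) pair."""
    results: Dict[Tuple[int, int], CodingParams] = {}
    for frame_index in range(num_frames):   # video frame dimension
        for cq in quality_levels:           # quality level dimension
            results[(frame_index, cq)] = pre_encode(frame_index, cq)
    return results


def fake_pre_encode(frame_index: int, cq: int) -> CodingParams:
    """Stand-in with made-up numbers: larger cq yields a smaller frame."""
    return {"size": 4000.0 - 100.0 * cq + 10.0 * frame_index, "qp": float(cq)}


quality_levels = list(range(18, 36))  # assumed sequence covering [18, 35]
params = collect_coding_params(3, quality_levels, fake_pre_encode)
```

Keying the results by `(frame_index, cq)` keeps both loop dimensions visible to whatever later step selects the target quality level for each frame.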
- Step S205 Generate a first feature corresponding to the current quality level of the current frame by using the video parameters of the current frame, the current quality level of the current frame and the encoding parameters corresponding to the current quality level;
- the above video parameters, current quality level, and encoding parameters are combined to generate a frame encoding feature corresponding to the current quality level, that is, the first feature. That is, the first feature includes the video parameters, the current quality level, and the encoding parameters.
- F represents the first feature of the current frame
- h represents the image height
- w represents the image width
- fps represents the image frame rate.
- the above h, w, and fps are video parameters.
- cq represents the current quality level
- type represents the type of the current frame
- size represents the size of the current frame
- satd represents the sum of absolute transformed differences (SATD) of the current frame
- qp represents the quantization parameter of the current frame.
- the above type, size, satd, and qp are encoding parameters.
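The expression for F itself is not reproduced in this text, so the sketch below assumes the simplest reading: F is a flat record that concatenates the video parameters, the current quality level, and the encoding parameters in the order in which the symbols are listed. The class `FirstFeature`, the helper `make_first_feature`, and the field name `frame_type` (for type) are hypothetical.

```python
from typing import NamedTuple


class FirstFeature(NamedTuple):
    """Assumed layout of the first feature F of one frame at one quality level."""
    h: int            # image height
    w: int            # image width
    fps: float        # frame rate
    cq: int           # current quality level
    frame_type: str   # type of the current frame: "I", "P" or "B"
    size: int         # size of the current frame
    satd: float       # sum of absolute transformed differences
    qp: float         # quantization parameter


def make_first_feature(video_params: dict, cq: int, coding_params: dict) -> FirstFeature:
    """Combine video parameters, quality level and encoding parameters into F."""
    return FirstFeature(
        h=video_params["h"],
        w=video_params["w"],
        fps=video_params["fps"],
        cq=cq,
        frame_type=coding_params["type"],
        size=coding_params["size"],
        satd=coding_params["satd"],
        qp=coding_params["qp"],
    )


# Made-up values, only to show the call.
F = make_first_feature(
    {"h": 1080, "w": 1920, "fps": 30.0},
    25,
    {"type": "P", "size": 1800, "satd": 52.5, "qp": 27.0},
)
```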
- Step S206 Obtain a second feature corresponding to the target quality level of the preceding frames of the current frame, where the preceding frames are a first preset number of adjacent video frames before the current frame.
- Step S207 Generate candidate coding features corresponding to the current quality level of the current frame according to the first feature and the second feature.
- the second feature generated by the previous frame corresponding to the current frame based on the target quality level is obtained.
- the second feature is similar to the first feature and is data used to characterize the frame coding feature of the previous frame.
- the previous frame corresponding to the current frame is the first preset number of video frames adjacent to the current frame, for example, 30 video frames before the current frame.
- the specific implementation of the previous frame can refer to the embodiment corresponding to FIG5, that is, the video frame set directly from the Lth frame to the Mth frame is the previous frame.
- the current frame is the second frame of the initial video data
- the previous frame of the current frame is the first frame of the initial video data
- the second feature corresponding to the first frame is the frame coding feature generated based on the default quality level.
- the specific process is not repeated.
- the average value and variance value of the coding parameters in the frame coding features corresponding to the current frame and the previous frame are calculated to obtain the alternative coding feature corresponding to the second frame, that is, the candidate coding feature corresponding to the current quality level of the current frame.
- the current frame is the third frame and subsequent video frames of the initial video data
- the target quality level is the optimized quality level
- the frame coding feature corresponding to the previous frame of the current frame, i.e., the second feature, is generated based on the target quality level corresponding to the previous frame.
- an alternative coding feature corresponding to the current quality level and characterizing the complexity of the video content is generated based on the second feature and the first feature. Since the second feature of the previous frame is generated based on the optimized target quality level, the alternative coding feature generated by the second feature can more accurately express the complexity of the video content, thereby improving the accuracy of the alternative coding feature, and ultimately improving the accuracy of the target quality level obtained based on the alternative coding feature, thereby improving the video coding efficiency.
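One possible reading of how the first feature and the second features are merged is sketched below: the encoding parameters of the preceding frames (at their target quality levels) and of the current frame (at the current quality level) are pooled, and their per-parameter mean and variance are appended to the video parameters and the quality level. The function name `candidate_feature`, the output layout, and the use of population variance are assumptions.

```python
from statistics import fmean, pvariance


def candidate_feature(video_params, cq, current_enc, preceding_enc):
    """Build a candidate coding feature for one quality level.

    video_params  -- e.g. (h, w, fps)
    cq            -- current quality level
    current_enc   -- encoding parameters of the current frame at cq
    preceding_enc -- encoding parameters of the preceding frames,
                     each taken at that frame's target quality level
    """
    window = list(preceding_enc) + [list(current_enc)]
    columns = list(zip(*window))                # one tuple per encoding parameter
    means = [fmean(c) for c in columns]         # coding feature mean
    variances = [pvariance(c) for c in columns] # coding feature variance
    return list(video_params) + [cq] + means + variances
```

Population variance is used here so that the sketch also works when the window holds a single frame; the disclosure does not say which variance estimator is intended.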
- Step S208 Input the candidate coding features into the prediction neural network model to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on a fusion of video multi-method evaluation, and the second evaluation value represents a video bit rate.
- Step S209 If the current quality level is at the end of the quality level sequence, continue to execute step S210; otherwise, return to execute step S203.
- the candidate coding features obtained in step S207 are input into the prediction neural network model, and the first evaluation value and the second evaluation value output by the prediction neural network model can be obtained.
- the first evaluation value represents the video quality evaluation value based on the fusion of video multi-method evaluation
- the second evaluation value represents the video bit rate.
- Video Multimethod Assessment Fusion (VMAF) is a video quality assessment metric used to measure the perceived quality of streaming video in a large-scale environment. It addresses the problem that traditional metrics cannot reflect video quality across multiple scenes and multiple features.
- the specific implementation method of VMAF is an existing technology.
- the video bit rate can be obtained from the quantization parameters in the encoding parameters, which will not be repeated here.
- through the prediction neural network model, the input video coding features (alternative coding features) can be mapped to the corresponding VMAF indicators and video bit rate.
- FIG8 is a schematic diagram of a process for generating an evaluation value corresponding to the current quality level of the current frame provided by an embodiment of the present disclosure.
- the video frame traversal is first performed, and then the quality level traversal is performed for each video frame.
- the current quality level is obtained.
- the alternative coding features corresponding to the current quality level are obtained, and the alternative coding features are input into the predictive neural network model.
- the predictive neural network model outputs a first evaluation value and a second evaluation value.
- the first evaluation value and the second evaluation value are saved together with the corresponding alternative coding features (and/or the current quality level) as a set of alternative coding feature-evaluation value mapping data.
- if the current quality level is at the end of the quality level sequence, step S210 is executed to select the target quality level from the multiple candidate quality levels; otherwise, the process returns to step S203, advances to the next quality level (updating the current quality level), and repeats the above process until all the quality levels in the quality level sequence have been traversed.
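The quality level traversal for a single frame can be sketched as follows. The prediction model is represented by a plain callable that returns a (first evaluation value, second evaluation value) pair; `evaluate_quality_levels`, `make_candidate_feature` and `predict` are hypothetical names, and the record layout is an assumption.

```python
def evaluate_quality_levels(quality_levels, make_candidate_feature, predict):
    """Traverse the quality level sequence for one frame.

    make_candidate_feature(cq) -> candidate coding feature for level cq
    predict(feature)           -> (vmaf, bitrate) predicted by the model
    Returns one mapping record per quality level.
    """
    records = []
    for cq in quality_levels:                  # quality level dimension
        feature = make_candidate_feature(cq)
        vmaf, bitrate = predict(feature)       # first and second evaluation values
        records.append(
            {"cq": cq, "feature": feature, "vmaf": vmaf, "bitrate": bitrate}
        )
    return records
```

Keeping the quality level next to its evaluation values mirrors the saved feature-to-evaluation mapping data, so the later selection step needs no further calls to the model.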
- Step S210 Generate a target quality level of the current frame based on the first evaluation value and the corresponding second evaluation value corresponding to each candidate coding feature of the current frame.
- each candidate coding feature corresponds to a quality level.
- the quality levels in the quality level sequence are evaluated based on the first evaluation value and the second evaluation value corresponding to each quality level in the quality level sequence of the current frame to obtain an optimal quality level, that is, the target quality level.
- Exemplarily, the specific implementation steps of step S210 include:
- Step S2101 According to the first evaluation value corresponding to each candidate coding feature, a first target coding feature is obtained, where the first target coding feature is a candidate coding feature whose first evaluation value is greater than a first threshold.
- Step S2102 Determine a second target coding feature according to the second evaluation value of the first target coding feature, where the second target feature is a video coding feature with the smallest second evaluation value among the first target coding features.
- Step S2103 Obtain a target quality level according to the quality level corresponding to the second target feature.
- the first evaluation value and the second evaluation value respectively represent the video quality and bit rate after the video is encoded, wherein the higher the first evaluation value, the higher the video quality of the video, and the higher the second evaluation value, the higher the bit rate, that is, the larger the video volume.
- in order to improve the efficiency of video encoding, it is necessary to select a quality level that reduces the bit rate to the greatest extent while meeting the preset video quality requirements.
- the candidate coding feature with the first evaluation value (i.e., video quality) greater than the first threshold is determined as the first target coding feature; then, from the first target coding feature, the video coding feature with the smallest second evaluation value (i.e., the smallest bit rate) is selected, and then the quality level corresponding to the video coding feature is obtained as the target quality level.
- the above process can be implemented by the candidate coding feature-evaluation value mapping data saved in the previous step, and the specific process will not be repeated.
- a matching target quality level is obtained, so that the target video encoded based on the target quality level can meet the preset video quality requirements.
- the bit rate can be reduced, the video volume can be compressed, and the video encoding efficiency can be improved.
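Steps S2101 to S2103 amount to a filter followed by a minimum search. A minimal sketch, assuming each saved record is a dictionary with the hypothetical keys `cq`, `vmaf` and `bitrate`:

```python
def select_target_quality_level(records, vmaf_threshold):
    """Pick the quality level with the lowest predicted bit rate among
    those whose predicted quality exceeds the threshold.

    Returns None when no level passes the threshold; the fallback for
    that case is not described in the text, so it is left to the caller.
    """
    # S2101: keep candidates whose first evaluation value exceeds the threshold
    eligible = [r for r in records if r["vmaf"] > vmaf_threshold]
    if not eligible:
        return None
    # S2102: among them, take the smallest second evaluation value
    best = min(eligible, key=lambda r: r["bitrate"])
    # S2103: its quality level is the target quality level
    return best["cq"]
```

For example, with predicted (quality, bit rate) pairs of (95, 900), (92, 600) and (85, 400) for levels 20, 25 and 30 and a threshold of 90, level 25 is selected because it is the cheapest level that still clears the threshold.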
- Step S211 Encode the current frame at a target quality level using a hardware video editor to generate a target frame for constituting a target video.
- Step S212 If the current frame is not the last frame of the initial video data, the next frame of the current frame is set as the new current frame, and the process returns to step S202.
- the current frame is encoded based on the target quality level to obtain a target frame with better video quality and lower bit rate; at the same time, if the current frame is not the last frame of the initial video data, then return to step S202 and continue to perform the above processing on the next video frame until all video frames are traversed and the corresponding target frames are generated, thereby forming a target video. Since each video frame is dynamically encoded using different target quality levels, the video volume can be reduced while improving the video quality, thereby improving the video encoding efficiency.
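The two-dimensional loop (frames outside, quality levels inside) can be sketched end to end as below. The encoder and the model are passed in as callables, so nothing here reflects a real hardware encoder interface; `encode_video`, `probe`, `predict` and `encode` are hypothetical names. Encoding the first frame at the default quality level and falling back to the default level when no candidate passes the threshold are both assumptions.

```python
from collections import deque
from statistics import fmean, pvariance


def encode_video(frames, video_params, quality_levels, default_cq,
                 probe, predict, encode, vmaf_threshold, window=30):
    """probe(frame, video_params, cq) -> encoding parameters (list of numbers)
    predict(feature) -> (vmaf, bitrate); encode(frame, cq) -> target frame."""
    history = deque(maxlen=window)  # encoding parameters of preceding frames
    output, targets = [], []
    for index, frame in enumerate(frames):            # video frame dimension
        best = None
        if index > 0:
            for cq in quality_levels:                 # quality level dimension
                enc = probe(frame, video_params, cq)
                cols = list(zip(*history, enc))       # pool preceding + current
                feature = [*video_params, cq,
                           *map(fmean, cols), *map(pvariance, cols)]
                vmaf, bitrate = predict(feature)
                if vmaf > vmaf_threshold and (best is None or bitrate < best[0]):
                    best = (bitrate, cq, enc)
        if best is None:
            # First frame, or no level met the threshold (assumed fallback).
            target_cq = default_cq
            target_enc = probe(frame, video_params, default_cq)
        else:
            _, target_cq, target_enc = best
        output.append(encode(frame, target_cq))
        history.append(target_enc)  # becomes a second feature for later frames
        targets.append(target_cq)
    return output, targets
```

The bounded `deque` keeps only the most recent `window` frames, which corresponds to the first preset number of preceding frames (30 in the example given earlier).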
- before step S208, a step of training the prediction neural network model is also included.
- the step of training the prediction neural network model includes:
- Step S2001 obtaining original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels;
- Step S2002 Based on the quality level sequence, the original video data is processed in sequence using a hardware video editor to obtain video coding features corresponding to each quality level;
- Step S2003 Calculate the first evaluation value and the second evaluation value corresponding to each video coding feature;
- Step S2004 Generate training samples according to each video encoding feature and the corresponding first evaluation value and the corresponding second evaluation value, and train a preset neural network model based on the training samples to obtain a prediction neural network model.
- the original video data is a video data sample
- the quality level sequence is a preset parameter.
- the first evaluation value and the second evaluation value corresponding to each video encoding feature are obtained and used as sample labels, and the original video data and the quality level sequence are used as the original sample.
- training samples are generated to achieve efficient and high-quality training of the neural network model, so that the model can converge quickly, thereby improving the prediction accuracy and efficiency of the model.
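Steps S2001 to S2004 up to the point of training can be sketched as a nested loop that pairs each video coding feature with its two labels. Feature extraction and the measurement of both evaluation values are supplied as callables; `build_training_samples`, `extract_features`, `measure_vmaf` and `measure_bitrate` are hypothetical names, and the model fitting itself is omitted because the network architecture is not described.

```python
def build_training_samples(videos, quality_levels,
                           extract_features, measure_vmaf, measure_bitrate):
    """Return a list of (feature, (vmaf, bitrate)) training samples.

    extract_features(video, cq) -> video coding feature at level cq
    measure_vmaf(video, cq)     -> first evaluation value (label)
    measure_bitrate(video, cq)  -> second evaluation value (label)
    """
    samples = []
    for video in videos:
        for cq in quality_levels:          # one sample per quality level
            feature = extract_features(video, cq)
            label = (measure_vmaf(video, cq), measure_bitrate(video, cq))
            samples.append((feature, label))
    return samples
```

The resulting pairs can then be fed to whatever regression model is chosen as the preset neural network model.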
- FIG10 is a structural block diagram of a video encoding device provided by an embodiment of the present disclosure. For ease of explanation, only the parts related to the embodiment of the present disclosure are shown.
- the video encoding device 3 includes:
- a parameter acquisition module 31 is used to extract video coding features of the initial video data based on a hardware video editor, where the video coding features represent the complexity of the video content of the initial video data;
- a parameter optimization module 32 is used to process the video encoding features of the initial video data based on the pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode;
- the encoding module 33 is used to encode the initial video data at a target quality level using a hardware video editor to generate a target video.
- the parameter acquisition module 31 is specifically used to: obtain alternative quality levels of initial video data; input video parameters and alternative quality levels of the initial video data into a hardware video editor to obtain encoding parameters of the initial video data; and generate corresponding video encoding features based on the encoding parameters.
- the initial video data includes multiple video frames.
- when the parameter acquisition module 31 generates corresponding video coding features based on the coding parameters, it is specifically used to: obtain the coding parameters corresponding to each video frame; obtain the coding feature mean and the coding feature variance value according to the coding parameters corresponding to each video frame, wherein the coding feature mean is the average value of the coding parameters corresponding to each video frame, and the coding feature variance value is the variance value of the coding parameters corresponding to each video frame; and generate video coding features according to the video parameters, alternative quality levels, and the coding feature mean and coding feature variance values corresponding to each video frame.
- when the parameter acquisition module 31 inputs the video parameters and the alternative quality levels of the initial video data into the hardware video editor to obtain the encoding parameters of the initial video data, it is specifically used to: loop through the following steps until a preset condition is met: obtain the current frame of the initial video data; obtain the video parameters of the current frame and the quality level of the current frame; input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and set the next frame of the current frame as the new current frame.
- the parameter acquisition module 31 is further used to: generate a frame encoding feature corresponding to the current frame using the video parameters of the current frame, the quality level of the current frame and the encoding parameters; obtain a frame encoding feature of a preceding frame corresponding to the current frame, the preceding frame being a first preset number of adjacent video frames before the current frame, wherein the frame encoding feature of the preceding frame is generated based on the target quality level corresponding to the preceding frame;
- the parameter acquisition module 31 is specifically used to generate video coding features corresponding to the current frame according to the frame coding features corresponding to the current frame and the frame coding features of the previous frame.
- the encoding parameters include at least one of the following: frame type, frame size, image distortion, and image refinement.
- the video coding features of the initial video data include at least two alternative coding features, each alternative coding feature corresponds to a different quality level
- the parameter optimization module 32 is specifically used to: input each alternative coding feature into the prediction neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on a fusion of video multi-method evaluation, and the second evaluation value represents a video bit rate; based on the first evaluation value and the corresponding second evaluation value corresponding to each alternative coding feature, generate a target quality level.
- when the parameter optimization module 32 generates a target quality level based on the first evaluation value and the corresponding second evaluation value corresponding to each alternative coding feature, it is specifically used to: obtain a first target coding feature according to the first evaluation value corresponding to each alternative coding feature, the first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determine a second target coding feature according to the second evaluation value of the first target coding feature, the second target feature being a video coding feature with the smallest second evaluation value among the first target coding features; and obtain the target quality level according to the quality level corresponding to the second target feature.
- before processing the video coding features of the initial video data based on the pre-trained prediction neural network model to obtain the target quality level corresponding to the initial video data, the parameter optimization module 32 is further used to: obtain the original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels; based on the quality level sequence, sequentially use the hardware video editor to process the original video data to obtain video coding features corresponding to each quality level; calculate the first evaluation value and the second evaluation value corresponding to each video coding feature; and generate training samples according to each video coding feature and the corresponding first evaluation value and the corresponding second evaluation value, and train a preset neural network model based on the training samples to obtain a prediction neural network model.
- the parameter acquisition module 31, parameter optimization module 32 and encoding module 33 are connected in sequence.
- the video encoding device 3 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
- FIG11 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG11 , the electronic device 4 includes:
- the memory 42 stores computer executable instructions
- the processor 41 executes the computer-executable instructions stored in the memory 42 to implement the video encoding method in the embodiments shown in FIG. 2 to FIG. 9 .
- processor 41 and the memory 42 are connected via a bus 43 .
- referring to FIG. 12, it shows a schematic diagram of the structure of an electronic device 900 suitable for implementing the embodiment of the present disclosure
- the electronic device 900 may be a terminal device or a server.
- the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle terminals (such as vehicle navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
- the electronic device shown in FIG. 12 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.
- the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random access memory (RAM) 903.
- Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903.
- the processing device 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
- An input/output (I/O) interface 905 is also connected to the bus 904.
- the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 908 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 909.
- the communication device 909 may allow the electronic device 900 to communicate with other devices wirelessly or by wire to exchange data.
- although FIG. 12 shows an electronic device 900 having various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may alternatively be implemented or provided.
- an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
- the computer program can be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902.
- when the computer program is executed by the processing device 901, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
- the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
- the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above.
- Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
- This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
- the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
- the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
- the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
- the computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device executes the method shown in the above embodiment.
- Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
- the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
- the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
- each square box in the flow chart or block diagram can represent a module, a program segment or a part of a code, and the module, the program segment or a part of the code contains one or more executable instructions for realizing the specified logical function.
- the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two square boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
- each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments of the present disclosure may be implemented by software or hardware.
- the name of a unit does not limit the unit itself in some cases.
- the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
- exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
- a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a video encoding method including:
- the method of extracting video encoding features of initial video data based on a hardware video editor includes: obtaining alternative quality levels of the initial video data; inputting video parameters of the initial video data and the alternative quality levels into the hardware video editor to obtain encoding parameters of the initial video data; and generating corresponding video encoding features based on the encoding parameters.
- the initial video data includes multiple video frames
- the generating corresponding video coding features based on the coding parameters includes: obtaining the coding parameters corresponding to each of the video frames; obtaining the coding feature mean and the coding feature variance value according to the coding parameters corresponding to each of the video frames, wherein the coding feature mean value is the average value of the coding parameters corresponding to each of the video frames, and the coding feature variance value is the variance value of the coding parameters corresponding to each of the video frames; and generating the video coding feature according to the video parameters, the alternative quality level, and the coding feature mean value and coding feature variance value corresponding to each of the video frames.
- the video parameters of the initial video data and the alternative quality level are input into the hardware video editor to obtain the encoding parameters of the initial video data, including: looping the following steps until a preset condition is met: acquiring a current frame of the initial video data; obtaining the video parameters of the current frame and the quality level of the current frame; inputting the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and setting the next frame of the current frame as a new current frame.
- after obtaining the encoding parameters corresponding to the current frame, the method also includes: generating frame encoding features corresponding to the current frame by combining the video parameters of the current frame, the quality level of the current frame and the encoding parameters; obtaining frame encoding features of a previous frame corresponding to the current frame, the previous frame being a first preset number of adjacent video frames before the current frame, wherein the frame encoding features of the previous frame are generated based on a target quality level corresponding to the previous frame; generating corresponding video encoding features based on the encoding parameters includes: generating video encoding features corresponding to the current frame according to the frame encoding features corresponding to the current frame and the frame encoding features of the previous frame.
- the encoding parameters include at least one of the following: frame type, frame size, image distortion, and image fineness.
- the video coding features of the initial video data include at least two alternative coding features, each of the alternative coding features corresponds to a different quality level, and the video coding features of the initial video data are processed based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, including: inputting each of the alternative coding features into the predictive neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on a fusion of video multi-method evaluation, and the second evaluation value represents a video bit rate; based on the first evaluation value corresponding to each of the alternative coding features and the corresponding second evaluation value, the target quality level is generated.
- generating the target quality level based on the first evaluation value and the corresponding second evaluation value corresponding to each of the candidate coding features includes: According to the first evaluation value corresponding to each of the alternative coding features, a first target coding feature is obtained, and the first target coding feature is an alternative coding feature whose first evaluation value is greater than a first threshold; according to the second evaluation value of the first target coding feature, a second target coding feature is determined, and the second target feature is a video coding feature among the first target coding features with the smallest second evaluation value; according to the quality level corresponding to the second target feature, the target quality level is obtained.
- before the video coding features of the initial video data are processed based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, the method further includes: acquiring original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels; based on the quality level sequence, sequentially processing the original video data using the hardware video editor to obtain the video coding features corresponding to each of the quality levels; calculating the first evaluation value and the second evaluation value corresponding to each of the video coding features; and generating training samples according to each of the video coding features and the corresponding first and second evaluation values, and training a preset neural network model based on the training samples to obtain the predictive neural network model.
- a video encoding device including:
- a parameter acquisition module used for extracting video coding features of the initial video data based on a hardware video editor, wherein the video coding features represent the complexity of the video content of the initial video data;
- a parameter optimization module configured to process the video encoding features of the initial video data based on a pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents a picture quality level when encoding the initial video in a fixed quality dynamic bit rate mode;
- an encoding module used to encode the initial video data at the target quality level using the hardware video editor to generate a target video.
- the parameter acquisition module is specifically used to: obtain alternative quality levels of the initial video data; input the video parameters of the initial video data and the alternative quality levels into the hardware video editor to obtain encoding parameters of the initial video data; and generate corresponding video encoding features based on the encoding parameters.
- the initial video data includes a plurality of video frames, and when generating the corresponding video encoding features based on the encoding parameters, the parameter acquisition module is specifically used to: obtain the encoding parameters corresponding to each of the video frames; obtain an encoding feature mean and an encoding feature variance according to the encoding parameters corresponding to each of the video frames, wherein the encoding feature mean is the average of the encoding parameters corresponding to the video frames and the encoding feature variance is the variance of those encoding parameters; and generate the video encoding features according to the video parameters, the alternative quality level, the encoding feature mean and the encoding feature variance.
- when inputting the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data, the parameter acquisition module is specifically used to: repeat the following steps until a preset condition is met: obtain the current frame of the initial video data; obtain the video parameters of the current frame and the quality level of the current frame; input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and set the next frame after the current frame as the new current frame.
- the parameter acquisition module is further used to: generate frame coding features corresponding to the current frame based on the video parameters of the current frame, the quality level of the current frame and the encoding parameters; obtain frame coding features of a preceding frame corresponding to the current frame, the preceding frame being a first preset number of adjacent video frames before the current frame, wherein the frame coding features of the preceding frame are generated based on a target quality level corresponding to the preceding frame; when generating the corresponding video coding features based on the encoding parameters, the parameter acquisition module is specifically used to: generate the video coding features corresponding to the current frame based on the frame coding features corresponding to the current frame and the frame coding features of the preceding frame.
- the encoding parameters include at least one of the following: frame type, frame size, image distortion, and image fineness.
- the video coding features of the initial video data include at least two alternative coding features, each corresponding to a different quality level, and the parameter optimization module is specifically used to: input each of the alternative coding features into the predictive neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on video multi-method assessment fusion, and the second evaluation value represents a video bit rate; and generate the target quality level based on the first evaluation value and the second evaluation value corresponding to each of the alternative coding features.
- when generating the target quality level based on the first evaluation value and the second evaluation value corresponding to each of the alternative coding features, the parameter optimization module is specifically used to: obtain a first target coding feature according to the first evaluation value corresponding to each of the alternative coding features, the first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determine a second target coding feature according to the second evaluation value of the first target coding feature, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtain the target quality level according to the quality level corresponding to the second target coding feature.
- the parameter optimization module is also used to: obtain original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels; based on the quality level sequence, sequentially process the original video data using the hardware video editor to obtain video coding features corresponding to each of the quality levels; calculate the first evaluation value and the second evaluation value corresponding to each of the video coding features; generate training samples according to each of the video coding features and the corresponding first evaluation value and the corresponding second evaluation value, and train a preset neural network model based on the training samples to obtain the predictive neural network model.
- an electronic device comprising: a processor, and a memory communicatively connected to the processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the video encoding method as described in the first aspect and various possible designs of the first aspect.
- a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video encoding method described in the first aspect and various possible designs of the first aspect.
- an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video encoding method as described in the first aspect and various possible designs of the first aspect.
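The selection rule in the items above (keep the alternative coding features whose first evaluation value exceeds a first threshold, then take the one with the smallest second evaluation value) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `Candidate`, `select_target_quality_level` and the `predict` callable standing in for the predictive neural network model are hypothetical names, and returning `None` when no candidate passes the threshold is an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One alternative coding feature and the quality level it was extracted at."""
    quality_level: int
    features: Tuple[float, ...]


def select_target_quality_level(
    candidates: Sequence[Candidate],
    predict: Callable[[Tuple[float, ...]], Tuple[float, float]],
    first_threshold: float,
) -> Optional[int]:
    """Return the quality level of the lowest-bit-rate candidate that is good enough.

    `predict` is a stand-in for the predictive model (hypothetical interface):
    it maps a feature vector to (first evaluation value, second evaluation value),
    i.e. (predicted quality score, predicted bit rate).
    """
    best_level: Optional[int] = None
    best_bitrate = float("inf")
    for cand in candidates:
        quality, bitrate = predict(cand.features)
        # Step 1: keep only candidates whose quality score exceeds the threshold.
        if quality <= first_threshold:
            continue
        # Step 2: among those, remember the one with the smallest bit rate.
        if bitrate < best_bitrate:
            best_level, best_bitrate = cand.quality_level, bitrate
    # Assumption: None signals that no candidate passed the threshold.
    return best_level
```

A caller would build one `Candidate` per alternative quality level and pass the trained model's inference function as `predict`.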
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
F = {h, w, fps, cq, type, size, satd, gp}.
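As a rough illustration of how a feature set like F could be assembled for a whole clip, the sketch below combines the video parameters and quality level with the mean and variance of the per-frame encoding parameters, as the claims describe. The field names are copied from F as written; their meaning is not spelled out here, so treating `h`, `w`, `fps`, `cq` as clip-level values and `type`, `size`, `satd`, `gp` as per-frame values is an assumption. The function name, the numeric encoding of the frame type, and the use of population variance are likewise illustrative choices.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence

# Per-frame fields, named after the symbols in F (interpretation is an assumption).
FRAME_KEYS = ("type", "size", "satd", "gp")


def build_video_features(
    h: int,
    w: int,
    fps: float,
    cq: int,
    frames: Sequence[Dict[str, float]],
) -> List[float]:
    """Concatenate clip-level parameters with mean/variance of per-frame parameters."""
    if not frames:
        raise ValueError("at least one frame is required")
    features = [float(h), float(w), float(fps), float(cq)]
    for key in FRAME_KEYS:
        values = [float(frame[key]) for frame in frames]
        features.append(fmean(values))      # encoding feature mean
        features.append(pvariance(values))  # encoding feature variance (population)
    return features
```

Calling this once per alternative quality level would give one candidate feature vector per level, each of length 12 under these assumptions.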
Claims (13)
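- A video encoding method, comprising: extracting video coding features of initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data; processing the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the picture quality level used when encoding the initial video in a fixed-quality dynamic-bit-rate mode; and encoding the initial video data at the target quality level using the hardware video editor to generate a target video.
- The method according to claim 1, wherein extracting the video coding features of the initial video data based on the hardware video editor comprises: obtaining an alternative quality level of the initial video data; inputting video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain encoding parameters of the initial video data; and generating corresponding video coding features based on the encoding parameters.
- The method according to claim 2, wherein the initial video data includes a plurality of video frames, and generating the corresponding video coding features based on the encoding parameters comprises: obtaining the encoding parameters corresponding to each of the video frames; obtaining an encoding feature mean and an encoding feature variance according to the encoding parameters corresponding to each of the video frames, wherein the encoding feature mean is the average of the encoding parameters corresponding to the video frames, and the encoding feature variance is the variance of the encoding parameters corresponding to the video frames; and generating the video coding features according to the video parameters, the alternative quality level, and the encoding feature mean and encoding feature variance corresponding to the video frames.
- The method according to claim 2, wherein inputting the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data comprises: repeating the following steps until a preset condition is met: obtaining the current frame of the initial video data; obtaining the video parameters of the current frame and the quality level of the current frame; inputting the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and setting the next frame after the current frame as the new current frame.
- The method according to claim 4, wherein after obtaining the encoding parameters corresponding to the current frame, the method further comprises: generating a frame coding feature corresponding to the current frame from the video parameters of the current frame, the quality level of the current frame and the encoding parameters; and obtaining frame coding features of preceding frames corresponding to the current frame, the preceding frames being a first preset number of adjacent video frames before the current frame, wherein the frame coding features of the preceding frames are generated based on the target quality levels corresponding to the preceding frames; and generating the corresponding video coding features based on the encoding parameters comprises: generating the video coding features corresponding to the current frame according to the frame coding feature corresponding to the current frame and the frame coding features of the preceding frames.
- The method according to claim 2, wherein the encoding parameters include at least one of the following: frame type, frame size, image distortion, and image fineness.
- The method according to claim 1, wherein the video coding features of the initial video data include at least two alternative coding features, each alternative coding feature corresponding to a different quality level, and processing the video coding features of the initial video data based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data comprises: inputting each of the alternative coding features into the predictive neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on video multi-method assessment fusion, and the second evaluation value represents a video bit rate; and generating the target quality level based on the first evaluation value and the second evaluation value corresponding to each of the alternative coding features.
- The method according to claim 7, wherein generating the target quality level based on the first evaluation value and the second evaluation value corresponding to each of the alternative coding features comprises: obtaining a first target coding feature according to the first evaluation value corresponding to each of the alternative coding features, the first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determining a second target coding feature according to the second evaluation value of the first target coding feature, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtaining the target quality level according to the quality level corresponding to the second target coding feature.
- The method according to claim 7, wherein before processing the video coding features of the initial video data based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, the method further comprises: acquiring original video data and a quality level sequence, the quality level sequence including at least two different quality levels; based on the quality level sequence, sequentially processing the original video data using the hardware video editor to obtain the video coding features corresponding to each of the quality levels; calculating the first evaluation value and the second evaluation value corresponding to each of the video coding features; and generating training samples according to each of the video coding features and the corresponding first and second evaluation values, and training a preset neural network model based on the training samples to obtain the predictive neural network model.
- A video encoding device, comprising: a parameter acquisition module, used to extract video coding features of initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data; a parameter optimization module, used to process the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the picture quality level used when encoding the initial video in a fixed-quality dynamic-bit-rate mode; and an encoding module, used to encode the initial video data at the target quality level using the hardware video editor to generate a target video.
- An electronic device, comprising: a processor, and a memory communicatively connected to the processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the video encoding method according to any one of claims 1 to 9.
- A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video encoding method according to any one of claims 1 to 9.
- A computer program product, comprising a computer program which, when executed by a processor, implements the video encoding method according to any one of claims 1 to 9.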
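The per-frame loop of claims 4 and 5 can be pictured as a sliding window over frame coding features: for each frame, one feature is probed per candidate quality level, each is combined with the features already fixed for the preceding frames, a level is chosen, and the chosen feature joins the window. The sketch below is only one way to read those claims. `probe` (standing in for a pass through the hardware video editor), `pick_level` (standing in for the model-based selection) and `window` (the "first preset number") are hypothetical, and the claims do not say how the first frames, which have fewer predecessors, are handled; here their combined vectors are simply shorter.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]


def choose_levels_per_frame(
    frames: Iterable[Vector],                      # video parameters of each frame
    candidate_levels: Sequence[int],
    probe: Callable[[Vector, int], Vector],        # -> encoding parameters (hypothetical)
    pick_level: Callable[[Dict[int, List[float]]], int],
    window: int = 2,
) -> Iterator[int]:
    """Yield the chosen quality level for each frame, in order."""
    # Frame coding features of the preceding frames, at their chosen levels.
    history: Deque[Vector] = deque(maxlen=window)
    for video_params in frames:
        prefix = [x for feat in history for x in feat]
        frame_feats: Dict[int, Vector] = {}
        combined: Dict[int, List[float]] = {}
        for level in candidate_levels:
            enc_params = probe(video_params, level)
            # Frame coding feature = video params + quality level + encoding params.
            frame_feats[level] = (*video_params, float(level), *enc_params)
            # Video coding feature = preceding frames' features + current feature.
            combined[level] = prefix + list(frame_feats[level])
        target = pick_level(combined)
        # Only the feature produced at the chosen level is kept for later frames.
        history.append(frame_feats[target])
        yield target
```

The loop ends when the frame iterator is exhausted, which plays the role of the "preset condition" in this sketch.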
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2024573687A JP2025522455A (ja) | 2022-12-07 | 2023-12-05 | Video encoding method, apparatus, electronic device and storage medium |
| EP23899971.8A EP4525442A4 (en) | 2022-12-07 | 2023-12-05 | VIDEO CODING METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIA |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211567316.XA CN118200571A (zh) | 2022-12-07 | 2022-12-07 | Video encoding method, apparatus, electronic device and storage medium |
| CN202211567316.X | 2022-12-07 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024120396A1 (zh) | 2024-06-13 |
Family
ID=91378574
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/136507 Ceased WO2024120396A1 (zh) | 2022-12-07 | 2023-12-05 | Video encoding method, apparatus, electronic device and storage medium |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4525442A4 (zh) |
| JP (1) | JP2025522455A (zh) |
| CN (1) | CN118200571A (zh) |
| WO (1) | WO2024120396A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118573870A (zh) * | 2024-08-01 | 2024-08-30 | 北京宏远智控技术有限公司 | Video encoding method, apparatus, device and storage medium |
| CN119011777A (zh) * | 2024-08-12 | 2024-11-22 | 珠海城市管道燃气有限公司 | Gas emergency repair handling method and system |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
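| KR101868270B1 (ko) * | 2017-02-28 | 2018-06-15 | 재단법인 다차원 스마트 아이티 융합시스템 연구단 | Content-aware video encoding method, controller and system based on single-pass consistent quality control |
| CN111246209A (zh) * | 2020-01-20 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Adaptive encoding method, apparatus, electronic device and computer storage medium |
| CN112399176A (zh) * | 2020-11-17 | 2021-02-23 | 深圳大学 | Video encoding method, apparatus, computer device and storage medium |
| CN113329226A (zh) * | 2021-05-28 | 2021-08-31 | 北京字节跳动网络技术有限公司 | Data generation method, apparatus, electronic device and storage medium |
| CN114845106A (zh) * | 2021-02-01 | 2022-08-02 | 北京大学深圳研究生院 | Video encoding method, apparatus, storage medium and electronic device |
| CN115209150A (zh) * | 2022-09-16 | 2022-10-18 | 沐曦科技(成都)有限公司 | Video encoding parameter acquisition method, apparatus, network model and electronic device |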
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111083473B (zh) * | 2019-12-28 | 2022-03-08 | 杭州当虹科技股份有限公司 | Content-adaptive video encoding method based on machine learning |
| CN111263154B (zh) * | 2020-01-22 | 2022-02-11 | 腾讯科技(深圳)有限公司 | Video data processing method, apparatus and storage medium |
| BR112023019978A2 (pt) * | 2021-05-28 | 2023-11-21 | Deepmind Tech Ltd | Training rate control neural networks through reinforcement learning |
| US12206914B2 (en) * | 2021-06-12 | 2025-01-21 | Google Llc | Methods, systems, and media for determining perceptual quality indicators of video content items |
| CN114554211B (zh) * | 2022-01-14 | 2025-01-28 | 百果园技术(新加坡)有限公司 | Content-adaptive video encoding method, apparatus, device and storage medium |
2022
- 2022-12-07 CN CN202211567316.XA patent/CN118200571A/zh active Pending
2023
- 2023-12-05 EP EP23899971.8A patent/EP4525442A4/en active Pending
- 2023-12-05 JP JP2024573687A patent/JP2025522455A/ja active Pending
- 2023-12-05 WO PCT/CN2023/136507 patent/WO2024120396A1/zh not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4525442A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4525442A4 (en) | 2025-10-29 |
| EP4525442A1 (en) | 2025-03-19 |
| JP2025522455A (ja) | 2025-07-15 |
| CN118200571A (zh) | 2024-06-14 |
Similar Documents
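| Publication | Title | Publication Date |
|---|---|---|
| CN112437345B (zh) | Video variable-speed playback method, apparatus, electronic device and storage medium | |
| WO2024120396A1 (zh) | Video encoding method, apparatus, electronic device and storage medium | |
| US11997314B2 (en) | Video stream processing method and apparatus, and electronic device and computer-readable medium | |
| CN111385576B (zh) | Video encoding method, apparatus, mobile terminal and storage medium | |
| CN114257815B (zh) | Video transcoding method, apparatus, server and medium | |
| US11785195B2 (en) | Method and apparatus for processing three-dimensional video, readable storage medium and electronic device | |
| CN108174290A (zh) | Method and apparatus for processing video | |
| WO2024104307A1 (zh) | Live video stream rendering method, apparatus, device, storage medium and product | |
| WO2021143273A1 (zh) | Live stream sampling method, apparatus and electronic device | |
| CN118334594A (zh) | Target area security monitoring method, apparatus, electronic device and readable medium | |
| CN115442617A (zh) | Video processing method and apparatus based on video encoding | |
| JP7411785B2 (ja) | Interpolation filtering method and apparatus for intra prediction, computer program and electronic device | |
| US11190774B1 (en) | Screen content encoding mode evaluation including intra-block evaluation of multiple potential encoding modes | |
| CN115761090B (zh) | Special effect rendering method, apparatus, device, computer-readable storage medium and product | |
| CN115022629B (zh) | Method and apparatus for determining the optimal encoding mode for cloud gaming video | |
| WO2025209165A1 (zh) | Prediction unit (PU) mode selection method and apparatus, electronic device and storage medium | |
| CN109495793B (zh) | Bullet-screen comment writing method, apparatus, device and medium | |
| CN114125443A (zh) | Video bit rate control method, apparatus and electronic device | |
| CN119227770B (zh) | Multimodal large language model quantization method, apparatus, device, storage medium and product | |
| CN117956157B (zh) | Video encoding method, apparatus, electronic device and computer storage medium | |
| CN115396672B (zh) | Bitstream storage method, apparatus, electronic device and computer-readable medium | |
| US20250133213A1 (en) | Video encoding method and apparatus, electronic device and storage medium | |
| JP7345638B2 (ja) | Video decoding or encoding method and apparatus, computer program, and electronic device | |
| CN121000875A (zh) | Three-dimensional Gaussian processing method, apparatus, device, medium and program product | |
| CN119676530A (zh) | Video description generation method, apparatus, electronic device and storage medium | |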
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23899971; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023899971; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023899971; Country of ref document: EP; Effective date: 20241210 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024573687; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |