WO2024120396A1 - Video encoding method, apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2024120396A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
quality level
frame
coding
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/136507
Other languages
English (en)
French (fr)
Inventor
梅元刚
韩超诣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to JP2024573687A (patent JP2025522455A)
Priority to EP23899971.8A (patent EP4525442A4)
Publication of WO2024120396A1
Anticipated expiration
Legal status: Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H04N19/179 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Definitions

  • the embodiments of the present disclosure relate to the field of image processing technology, and in particular to a video encoding method, device, electronic device, and storage medium.
  • a graphics processing unit is a processor in mobile phones, personal computers and other terminal devices that is responsible for performing image processing tasks.
  • the graphics processor has powerful digital computing and parallel processing capabilities. In application scenarios such as video encoding and decoding, the use of the graphics processor can effectively improve the quality and efficiency of image encoding and decoding.
  • Some graphics processors include a hardware-based video encoder, also called a hardware video editor, such as an Nvenc unit.
  • the hardware video editor can encode data in a YUV/RGB format into a video that complies with the H.264/HEVC standard, thereby achieving efficient video encoding.
  • the embodiments of the present disclosure provide a video encoding method, apparatus, electronic device, and storage medium to overcome the problem of unreasonable bit rate of encoded video caused by encoding at a fixed quality level.
  • an embodiment of the present disclosure provides a video encoding method, including:
  • an embodiment of the present disclosure provides a video encoding device, including:
  • a parameter acquisition module used for extracting video coding features of the initial video data based on a hardware video editor, wherein the video coding features represent the complexity of the video content of the initial video data;
  • a parameter optimization module configured to process the video encoding features of the initial video data based on a pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents a picture quality level when encoding the initial video in a fixed quality dynamic bit rate mode;
  • the encoding module is used to encode the initial video data at the target quality level using the hardware video editor to generate a target video.
  • an electronic device including:
  • a processor and a memory communicatively connected to the processor
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the video encoding method as described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the video encoding method described in the first aspect and various possible designs of the first aspect is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video encoding method as described in the first aspect and various possible designs of the first aspect.
  • the video encoding method, device, electronic device and storage medium extract video encoding features of initial video data based on a hardware video editor, wherein the video encoding features represent the complexity of the video content of the initial video data; process the video encoding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode; and encode the initial video data at the target quality level using the hardware video editor to generate a target video.
  • the video encoding features corresponding to the initial video data are obtained, and then the target image quality level matching the video encoding features is obtained through a pre-trained predictive neural network model, and then the hardware video editor is used to encode the video at the target quality level, so that the bit rate of the generated target video is adapted to the video content, avoiding the problem of too high or too low bit rate, improving the video quality, and avoiding bit rate waste.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
  • FIG. 2 is a first flowchart of a video encoding method provided by an embodiment of the present disclosure;
  • FIG. 3 is a flowchart of specific implementation steps of step S101 in the embodiment shown in FIG. 2;
  • FIG. 4 is a flowchart of specific implementation steps of step S1013 in the embodiment shown in FIG. 3;
  • FIG. 5 is a schematic diagram of a process for generating video coding features provided by an embodiment of the present disclosure;
  • FIG. 6 is a schematic diagram of a data structure of a video coding feature provided by an embodiment of the present disclosure;
  • FIG. 7 is a second flowchart of a video encoding method provided by an embodiment of the present disclosure;
  • FIG. 8 is a schematic diagram of a process for generating an evaluation value corresponding to a current quality level of a current frame provided by an embodiment of the present disclosure;
  • FIG. 9 is a schematic diagram of the steps for training a prediction neural network model;
  • FIG. 10 is a structural block diagram of a video encoding device provided by an embodiment of the present disclosure;
  • FIG. 11 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure;
  • FIG. 12 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
  • the video encoding method provided by the embodiment of the present disclosure can be applied to various application scenarios that require video encoding, such as video editing, previewing, and playback. More specifically, for example, it is applied to video editing software, video editing cloud platforms, and live broadcast software. Exemplarily, the method provided by the embodiment of the present disclosure can be applied to terminal devices, such as smart phones, tablet computers, personal computers, etc.; it can also be applied to cloud servers.
  • Figure 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • the terminal device runs the video editing software to edit the original video, such as adding video effects, adding audio tracks, subtitles, etc., to generate video data, and the video data may include multiple video frames and editing information corresponding to each frame.
  • the terminal device encodes the video data by calling the hardware video editor in the graphics processor, generates a target video (film) with editing effects for playback, and completes the video editing workflow, wherein the hardware video editor is, for example, an Nvenc unit.
  • variable bit rate (VBR) encoding is also known as dynamic bit rate encoding or non-fixed bit rate encoding; its constant quality mode is referred to as constant quality dynamic bit rate (CQ-VBR).
  • the quality level (cq value) is usually set based on the user's experience, which often yields an unreasonable setting: the video bit rate is either too high (producing a large video file and wasting storage and network resources) or too low (producing low picture quality and degrading the viewing experience).
  • the disclosed embodiment provides a video encoding method, which solves the above-mentioned problem by automatically generating a reasonable target quality level (cq value) and performing video encoding based on the target quality level.
  • FIG. 2 is a flow chart of a video encoding method provided by an embodiment of the present disclosure.
  • the method of this embodiment can be applied to an electronic device provided with a hardware video editor, such as a terminal device or a server.
  • the terminal device is used as the execution subject for introduction.
  • the video encoding method includes:
  • Step S101 extracting video coding features of initial video data based on a hardware video editor, where the video coding features represent the complexity of video content of the initial video data.
  • the terminal device first obtains initial video data, which may be data generated based on a video editing application, a live broadcast application, etc., for example, video data in YUV/RGB format, as well as video effects, added audio tracks, subtitles, etc.
  • the initial video data is the data to be encoded, and after encoding the initial video data, a corresponding playable video can be generated.
  • the initial video data is processed by calling the hardware video editor.
  • the initial video data is used as an input parameter, and the application interface of the hardware video editor is called, and then the corresponding processing function is run to process the initial video data to obtain the video encoding features corresponding to the initial video data.
  • the video encoding features characterize the video content complexity of the initial video data through the video parameters of the initial video data and the encoding parameters corresponding to the video parameters.
  • the video encoding features are obtained after processing based on the video parameters and the encoding parameters, wherein the video parameters are information characterizing the initial video data, such as video height, width (i.e., resolution), frame rate (fps), etc.; the encoding parameters are parameters used by the hardware video editor to encode the initial video data, such as frame encoding size (size), the type of each frame corresponding to the video data (including I frame, P frame, B frame), image distortion, image fineness, etc.
  • as shown in FIG. 3, the specific implementation steps of step S101 include:
  • Step S1011 Obtain candidate quality levels of initial video data.
  • the candidate quality level can be a preset default value. More specifically, it can be a value in the range [18, 35], such as 25. The smaller the quality level value, the higher the picture quality of the encoded video: the video is clearer and, correspondingly, the video size is larger.
  • Step S1012 Input the video parameters and the candidate quality levels of the initial video data into the hardware video editor to obtain the encoding parameters of the initial video data.
  • the video parameters and the alternative quality level of the initial video data are used as input quantities and input into the hardware video editor to obtain the encoding parameters output by the hardware video editor, such as frame encoding size (size), the type of each frame corresponding to the video data (including I frame, P frame, B frame), image distortion, image fineness, etc.
  • the image distortion can be represented by the Sum of Absolute Transformed Differences (SATD) of each frame
  • the image fineness can be represented by the quantization parameter (QP) of each frame.
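  • to make the SATD measure concrete, the sketch below computes it for a single 4x4 residual block using an unnormalized Hadamard transform. The block size, the lack of normalization, and the names `H4`, `matmul` and `satd_4x4` are assumptions for illustration only; the source does not state how the hardware video editor computes SATD internally.

```python
# 4x4 Hadamard matrix (unnormalized). It is symmetric, so H4^T == H4.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def satd_4x4(original, predicted):
    """Sum of absolute values of the Hadamard-transformed residual block."""
    residual = [[o - p for o, p in zip(row_o, row_p)]
                for row_o, row_p in zip(original, predicted)]
    transformed = matmul(matmul(H4, residual), H4)
    return sum(abs(v) for row in transformed for v in row)
```

  • a block that is predicted perfectly gives an SATD of 0, and larger values indicate a residual that is harder to code, which is why the measure can serve as a proxy for image distortion.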
  • Step S1013 Generate corresponding video encoding features based on the encoding parameters.
  • the encoding parameters obtained by inputting video parameters and alternative quality levels into a hardware video editor are equivalent to the pre-encoding of the initial video data by the hardware video editor, that is, the hardware video editor predicts the corresponding encoding parameters based on the initial video data, but does not perform actual encoding.
  • video encoding features are generated in combination with the video parameters and the alternative quality levels.
  • the video encoding features can express the complexity of the video content of the initial video data.
  • once the complexity of the video content is known, a quality level that matches it can be determined for encoding, so that the picture quality matches the complexity of the video content and a wasted or excessively low bit rate is avoided.
  • the initial video data includes multiple video frames, as shown in FIG4 , and the specific implementation steps of step S1013 include:
  • Step S1013A Obtain encoding parameters corresponding to each video frame.
  • Step S1013B According to the coding parameters corresponding to each video frame, a coding feature mean and a coding feature variance value are obtained, wherein the coding feature mean is the average value of the coding parameters corresponding to each video frame; and the coding feature variance value is the variance value of the coding parameters corresponding to each video frame.
  • Step S1013C Generate video coding features according to the video parameters, the candidate quality levels, and the coding feature mean and coding feature variance values corresponding to each video frame.
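  • steps S1013A and S1013B can be sketched as follows. This is a minimal illustration, assuming that each frame's coding parameters arrive as a dictionary with the hypothetical keys `size`, `satd` and `qp`, and assuming population variance; the source specifies neither the data layout nor the variance convention.

```python
from statistics import fmean, pvariance


def coding_feature_stats(frames, keys=("size", "satd", "qp")):
    """Return ({key: mean}, {key: variance}) over per-frame coding parameters."""
    means = {k: fmean(f[k] for f in frames) for k in keys}
    variances = {k: pvariance([f[k] for f in frames]) for k in keys}
    return means, variances
```

  • the mean captures the average level of each coding parameter, and the variance captures how much it changes between frames; both feed into the video coding features of step S1013C.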
  • the initial video data includes multiple video frames
  • the frame coding features corresponding to the video frames in the initial video data are obtained by respectively obtaining the video parameters and coding parameters corresponding to the video frames in the initial video data, and then judging the video content complexity of the initial video data according to the average level and change between the multiple frame coding features, and then obtaining the video coding features of the initial video data.
  • FIG5 is a schematic diagram of a process for generating video coding features provided by an embodiment of the present disclosure, and the above process is introduced in conjunction with FIG5. As shown in FIG5, the initial video data includes N video frames, and N is an integer greater than 1.
  • for the Mth frame, the video parameters of the Mth frame, such as its height, width, and frame rate, are obtained
  • the video parameters and the candidate quality level are input into the interface of the hardware video editor
  • the hardware video editor returns the coding parameters of the Mth frame, such as the type of the Mth frame (shown as Type in the figure), the frame coding size (shown as Size in the figure), the sum of absolute transformed differences (shown as SATD in the figure), and the quantization parameter (shown as QP in the figure).
  • the mean and variance of the coding parameters corresponding to the Mth frame are calculated to obtain the coding feature mean and coding feature variance values.
  • the mean and variance of the coding parameters corresponding to the Mth frame refer to the mean and variance of the coding parameters of each video frame in the set consisting of at least one adjacent video frame before the Mth frame (shown as the Lth frame in the figure, L is an integer less than M and greater than or equal to 1) and the Mth frame.
  • the corresponding mean and variance are calculated to obtain the specific calculation process of the coding feature mean and the coding feature variance value.
  • the video coding feature corresponding to the Mth frame is obtained, and the video coding feature represents the video content complexity of the video segment corresponding to the Lth frame to the Mth frame in the initial video data.
  • the video coding feature represents the video content complexity of the video segment before the Mth frame in the initial video data.
  • FIG6 is a schematic diagram of a data structure of a video coding feature provided by an embodiment of the present disclosure.
  • the video coding feature includes 21 data fields, which are shown as fields #1 to #21. Among them:
  • the #1 field indicates the image height of the 1st to Mth frames in the initial video data
  • the #2 field indicates the image width of the 1st to Mth frames in the initial video data
  • the #3 field indicates the frame rate of the 1st to Mth frames in the initial video data
  • the #4 field indicates the candidate quality level of the 1st to Mth frames in the initial video data
  • the #5 field indicates the number of I frames from the 1st frame to the Mth frame in the initial video data
  • the #6 field indicates the number of P frames from the 1st frame to the Mth frame in the initial video data
  • the #7 field indicates the number of B frames from the 1st frame to the Mth frame in the initial video data
  • the #8 field indicates the average size of the I frames from the 1st frame to the Mth frame in the initial video data
  • the #9 field indicates the average size of the P frames from the 1st frame to the Mth frame in the initial video data
  • the #10 field indicates the average size of the B frames from the 1st frame to the Mth frame in the initial video data
  • the #11 field indicates the mean SATD of the I frames from the 1st frame to the Mth frame in the initial video data
  • the #12 field indicates the mean SATD of the P frames from the 1st frame to the Mth frame in the initial video data
  • the #13 field indicates the mean SATD of the B frames from the 1st frame to the Mth frame in the initial video data
  • the #14 field indicates the average quantization parameter of the 1st frame to the Mth frame in the initial video data
  • the #15 field indicates the size variance of the I frames from the 1st frame to the Mth frame in the initial video data
  • the #16 field indicates the size variance of the P frames from the 1st frame to the Mth frame in the initial video data
  • the #17 field indicates the size variance of the B frames from the 1st frame to the Mth frame in the initial video data
  • the #18 field indicates the SATD variance of the I frames from the 1st frame to the Mth frame in the initial video data
  • the #19 field indicates the SATD variance of the P frames from the 1st frame to the Mth frame in the initial video data
  • the #20 field indicates the SATD variance of the B frames from the 1st frame to the Mth frame in the initial video data
  • the #21 field indicates the quantization parameter variance of the 1st frame to the Mth frame in the initial video data.
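  • the 21-field layout can be illustrated with the following sketch, which assembles the fields in the order listed above from per-frame coding parameters. The dictionary keys, the function name `build_video_coding_feature`, the use of population variance, and the choice of 0.0 for a frame type that does not occur are all assumptions; the source only names the fields.

```python
from statistics import fmean, pvariance


def _mean(xs):
    return fmean(xs) if xs else 0.0  # assumption: 0.0 when a frame type is absent


def _var(xs):
    return pvariance(xs) if xs else 0.0


def build_video_coding_feature(height, width, fps, cq, frames):
    """Assemble fields #1-#21 from per-frame dicts with keys
    'type' ('I', 'P' or 'B'), 'size', 'satd' and 'qp'."""
    by_type = {t: [f for f in frames if f["type"] == t] for t in "IPB"}
    qps = [f["qp"] for f in frames]
    feature = [height, width, fps, cq]                                  # fields 1-4
    feature += [len(by_type[t]) for t in "IPB"]                         # fields 5-7
    feature += [_mean([f["size"] for f in by_type[t]]) for t in "IPB"]  # fields 8-10
    feature += [_mean([f["satd"] for f in by_type[t]]) for t in "IPB"]  # fields 11-13
    feature.append(_mean(qps))                                          # field 14
    feature += [_var([f["size"] for f in by_type[t]]) for t in "IPB"]   # fields 15-17
    feature += [_var([f["satd"] for f in by_type[t]]) for t in "IPB"]   # fields 18-20
    feature.append(_var(qps))                                           # field 21
    return feature
```

  • a fixed-length vector of this kind is convenient as input to a prediction model, because its length does not depend on how many frames were summarized.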
  • Step S102 Processing the video encoding features of the initial video data based on the pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode.
  • the quality level, i.e., the cq value, is the picture quality level used when encoding in the constant quality variable bit rate (CQ-VBR) mode.
  • the video coding features of the initial video data are processed by a pre-trained predictive neural network model to predict a quality level that matches the complexity of the video content it represents, i.e., the target quality level.
  • Step S103 Encode the initial video data at a target quality level using a hardware video editor to generate a target video.
  • the hardware video editor is called to process the initial video data with the target quality level as a parameter, so as to generate a corresponding video piece for playback, namely, the target video.
  • the target quality level may be a level sequence including multiple level identifiers, and the level identifier represents a specific quality level (i.e., cq value).
  • each level identifier in the level sequence corresponds to one or more video frames in the initial video data; in a more specific possible implementation, each level identifier corresponds to a video frame, and when the initial video data is encoded at the target quality level using a hardware video editor, each video frame in the initial video data is obtained in sequence (in parallel or serially), and based on the level identifier corresponding to each video frame, the hardware video editor is called to encode the corresponding video frame at a fixed quality dynamic bit rate, so that each encoded video frame has a different picture quality level, thereby achieving more accurate encoding and improving encoding efficiency.
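  • the one-identifier-per-frame case can be sketched as below. The callback `encode_frame(frame, cq)` is a hypothetical stand-in for the call into the hardware video editor; the source does not name the real interface, and this sketch processes frames serially only.

```python
def encode_with_level_sequence(frames, levels, encode_frame):
    """Encode each frame at its own cq value.

    `encode_frame(frame, cq)` is a hypothetical callback standing in for the
    hardware encoder call.
    """
    if len(frames) != len(levels):
        raise ValueError("expected one quality level per frame")
    return [encode_frame(frame, cq) for frame, cq in zip(frames, levels)]
```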
  • the specific implementation process of calling the hardware video editor to encode at a fixed quality dynamic bit rate is a prior art and will not be repeated here.
  • the video coding features of the initial video data are extracted based on a hardware video editor, and the video coding features characterize the complexity of the video content of the initial video data; the video coding features of the initial video data are processed based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level characterizes the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode; the initial video data is encoded at the target quality level using the hardware video editor to generate a target video.
  • the video coding features corresponding to the initial video data are obtained, and then the target image quality level matching the video coding features is obtained through a pre-trained predictive neural network model, and then the hardware video editor is used to perform video encoding at the target quality level, so that the bit rate of the generated target video is adapted to the video content, avoiding the problem of too high or too low bit rate, improving the video quality, and avoiding bit rate waste.
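  • the overall flow of steps S101 to S103 can be summarized in a few lines. The three callables are hypothetical placeholders for the feature extraction, the pre-trained prediction model, and the hardware encoding call; none of their names or signatures come from the source.

```python
def encode_video(video, extract_features, predict_cq, encode):
    """Run steps S101-S103 with caller-supplied stand-ins for each stage."""
    features = extract_features(video)   # S101: content-complexity features
    target_cq = predict_cq(features)     # S102: model predicts the cq value
    return encode(video, target_cq)      # S103: encode at that quality level
```

  • keeping the three stages behind separate callables mirrors the module split of the device embodiment (parameter acquisition, parameter optimization, encoding).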
  • FIG. 7 is a second flow chart of a video encoding method provided by an embodiment of the present disclosure. Based on the embodiment shown in FIG. 2 , this embodiment further refines the implementation process of steps S101 and S102.
  • the video encoding method includes:
  • Step S201 Acquire a quality level sequence representing candidate quality levels, where the quality level sequence is a set of multiple quality levels arranged in order.
  • the alternative quality levels may include multiple ones, and the multiple alternative quality levels are characterized by a preset quality level sequence, that is, the quality level sequence is an implementation method of the alternative quality levels.
  • the quality level sequence can be an enumerated data structure such as an array, a matrix, a key-value pair, a structure, or a set of numbers expressed in a function manner, which will not be described one by one here.
  • Step S202 Acquire the current frame of the initial video data.
  • Step S203 obtaining video parameters of the current frame and the current quality level in the quality level sequence
  • Step S204 input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain encoding parameters corresponding to the current quality level of the current frame.
  • the matching target quality level corresponding to each video frame in the initial video data is obtained by looping in two dimensions (the video frame dimension and the quality level dimension), so that each frame is encoded at its own target quality level, thereby realizing dynamic encoding of the frames in the video and improving encoding efficiency and encoding quality.
  • the video parameters can be obtained directly, which is not repeated here; thereafter, the video parameters and each of the multiple quality levels in the quality level sequence are input into the hardware video editor in sequence to obtain the encoding parameters corresponding to the current quality level of the current frame.
  • the encoding parameters of the first frame of the initial video data can be obtained by calling the corresponding interface of the hardware video editor based on the preset default quality level and the video parameters.
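  • the two-dimensional loop over frames and quality levels in steps S202 to S204 can be sketched as follows. The callback `pre_encode(video_params, cq)` and the `video_params` key are hypothetical; they stand in for the pre-encoding interface of the hardware video editor, which the source does not name.

```python
def collect_encoding_params(frames, quality_levels, pre_encode):
    """Return {(frame_index, cq): params} for every frame and candidate level.

    `pre_encode(video_params, cq)` is a hypothetical stand-in for the hardware
    encoder's pre-encoding interface.
    """
    results = {}
    for index, frame in enumerate(frames):   # video frame dimension
        for cq in quality_levels:            # quality level dimension
            results[(index, cq)] = pre_encode(frame["video_params"], cq)
    return results
```

  • with N frames and K candidate levels this makes N x K pre-encoding calls, so the length of the quality level sequence directly controls the cost of the search.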
  • Step S205 Generate a first feature corresponding to the current quality level of the current frame by using the video parameters of the current frame, the current quality level of the current frame and the encoding parameters corresponding to the current quality level.
  • The above video parameters, current quality level, and encoding parameters are combined to generate the frame encoding feature corresponding to the current quality level, that is, the first feature. In other words, the first feature includes the video parameters, the current quality level, and the encoding parameters.
  • F represents the first feature of the current frame
  • h represents the image height
  • w represents the image width
  • fps represents the image frame rate.
  • the above h, w, and fps are video parameters.
  • cq represents the current quality level
  • type represents the type of the current frame
  • size represents the size of the current frame
  • satd represents the sum of absolute transformed differences (SATD) of the current frame
  • gp represents the quantization parameter of the current frame.
  • the above type, size, satd, and gp are encoding parameters.
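  • The formula that these symbols belong to does not appear in this text. One plausible reading, assuming the first feature is simply the listed quantities concatenated into a vector (the ordering of the components is an assumption), would be:

```latex
F = \left[\, h,\ w,\ fps,\ cq,\ type,\ size,\ satd,\ gp \,\right]
```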
  • Step S206 obtaining a second feature corresponding to a target quality level of a preceding frame corresponding to the current frame, where the preceding frame is a first preset number of adjacent video frames before the current frame.
  • Step S207 Generate candidate coding features corresponding to the current quality level of the current frame according to the first feature and the second feature.
  • The second feature, which is generated for the previous frame corresponding to the current frame based on the target quality level of that previous frame, is obtained.
  • The second feature is similar to the first feature and is data used to characterize the frame coding feature of the previous frame.
  • The previous frame corresponding to the current frame is the first preset number of video frames adjacent to and before the current frame, for example, the 30 video frames before the current frame.
  • For the specific implementation of the previous frame, refer to the embodiment corresponding to FIG5; that is, the set of video frames from the Lth frame to the Mth frame is the previous frame.
  • Exemplarily, when the current frame is the second frame of the initial video data, the previous frame of the current frame is the first frame of the initial video data, and the second feature corresponding to the first frame is the frame coding feature generated based on the default quality level. The specific process is not repeated.
  • The average value and variance value of the coding parameters in the frame coding features (the first feature and the second feature) are calculated to obtain the candidate coding feature corresponding to the second frame, that is, the candidate coding feature corresponding to the current quality level of the current frame.
  • When the current frame is the third or a later video frame of the initial video data, the frame coding feature corresponding to the previous frame of the current frame, i.e., the second feature, is generated based on the target quality level (i.e., the optimized quality level) corresponding to the previous frame.
  • an alternative coding feature corresponding to the current quality level and characterizing the complexity of the video content is generated based on the second feature and the first feature. Since the second feature of the previous frame is generated based on the optimized target quality level, the alternative coding feature generated by the second feature can more accurately express the complexity of the video content, thereby improving the accuracy of the alternative coding feature, and ultimately improving the accuracy of the target quality level obtained based on the alternative coding feature, thereby improving the video coding efficiency.
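  • As a minimal sketch of steps S206 and S207, the following assumes that the candidate coding feature is formed by taking the per-parameter mean and variance over the preceding frames and the current frame, and concatenating them with the video parameters and the current quality level. The function name `candidate_feature`, the tuple layout `(type, size, satd, gp)`, the numeric encoding of the frame type, and the use of population variance are all illustrative assumptions, not details given in the text.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def candidate_feature(
    video_params: Sequence[float],        # (h, w, fps)
    cq: float,                            # current quality level
    current: Sequence[float],             # (type, size, satd, gp) of the current frame
    previous: Sequence[Sequence[float]],  # same layout for each preceding frame
) -> List[float]:
    """Combine the first feature with the second features via mean and variance."""
    frames = [*previous, current]
    columns = list(zip(*frames))          # one column per encoding parameter
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]  # assumption: population variance
    return [*video_params, cq, *means, *variances]


# Hypothetical values: one preceding frame and the current frame.
feature = candidate_feature(
    video_params=(1080, 1920, 30),
    cq=25,
    current=(1, 100.0, 10.0, 30.0),
    previous=[(0, 200.0, 20.0, 28.0)],
)
print(feature)
```

  • Because the preceding frames contribute parameters produced at their already selected target quality levels, only the current frame's entry changes as the quality level sequence is traversed.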
  • Step S208 Input the candidate coding features into the prediction neural network model to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on a fusion of video multi-method evaluation, and the second evaluation value represents a video bit rate.
  • Step S209 If the current quality level is at the end of the quality level sequence, continue to execute step S210; otherwise, return to execute step S203.
  • the candidate coding features obtained in step S207 are input into the prediction neural network model, and the first evaluation value and the second evaluation value output by the prediction neural network model can be obtained.
  • the first evaluation value represents the video quality evaluation value based on the fusion of video multi-method evaluation
  • the second evaluation value represents the video bit rate.
  • Video Multimethod Assessment Fusion (VMAF) is a video quality assessment indicator used to measure the perceived quality of streaming video in large-scale environments. It addresses the problem that traditional indicators cannot reflect video quality across multiple scenes and multiple features.
  • The specific implementation of Video Multimethod Assessment Fusion is existing technology.
  • The video bit rate can be obtained from the quantization parameters in the encoding parameters, which will not be repeated here.
  • The input video coding features (alternative coding features) can be mapped to the corresponding Video Multimethod Assessment Fusion indicator and video bit rate.
  • FIG8 is a schematic diagram of a process for generating an evaluation value corresponding to the current quality level of the current frame provided by an embodiment of the present disclosure.
  • the video frame traversal is first performed, and then the quality level traversal is performed for each video frame.
  • the current quality level is obtained.
  • the alternative coding features corresponding to the current quality level are obtained, and the alternative coding features are input into the predictive neural network model.
  • the predictive neural network model outputs a first evaluation value and a second evaluation value.
  • the first evaluation value and the second evaluation value are saved together with the corresponding alternative coding features (and/or the current quality level) as a set of alternative coding feature-evaluation value mapping data.
  • step S210 is executed to select the target quality level from multiple candidate quality levels; if the current quality level is not at the end of the quality level sequence, return to step S203, loop to the next set of quality levels (update the current quality level), and repeat the above process until all the quality levels in the quality level sequence are traversed.
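  • The quality level traversal described above (steps S203 to S209) can be sketched as a simple loop. The callables `build_feature` and `predict` are hypothetical stand-ins for the feature construction and the prediction neural network model; their names and signatures are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def evaluate_quality_levels(
    quality_levels: Sequence[int],
    build_feature: Callable[[int], List[float]],
    predict: Callable[[List[float]], Tuple[float, float]],
) -> List[Dict]:
    """Traverse the quality level sequence and save feature-evaluation mapping data."""
    results = []
    for cq in quality_levels:
        feature = build_feature(cq)          # steps S203 to S207
        quality, bitrate = predict(feature)  # step S208: first and second evaluation values
        results.append(
            {"cq": cq, "feature": feature, "quality": quality, "bitrate": bitrate}
        )
    return results                           # step S209: end of the sequence reached


# Stub callables standing in for the encoder query and the trained model.
mapping = evaluate_quality_levels(
    quality_levels=[20, 30],
    build_feature=lambda cq: [float(cq)],
    predict=lambda f: (100.0 - f[0], 1000.0 - 10.0 * f[0]),
)
print(mapping)
```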
  • Step S210 Generate a target quality level of the current frame based on the first evaluation value and the corresponding second evaluation value corresponding to each candidate coding feature of the current frame.
  • each candidate coding feature corresponds to a quality level.
  • the quality levels in the quality level sequence are evaluated based on the first evaluation value and the second evaluation value corresponding to each quality level in the quality level sequence of the current frame to obtain an optimal quality level, that is, the target quality level.
  • Exemplarily, the specific implementation steps of step S210 include:
  • Step S2101 According to the first evaluation value corresponding to each candidate coding feature, a first target coding feature is obtained, where the first target coding feature is a candidate coding feature whose first evaluation value is greater than a first threshold.
  • Step S2102 Determine a second target coding feature according to the second evaluation value of the first target coding feature, where the second target feature is a video coding feature with the smallest second evaluation value among the first target coding features.
  • Step S2103 Obtain a target quality level according to the quality level corresponding to the second target feature.
  • the first evaluation value and the second evaluation value respectively represent the video quality and bit rate after the video is encoded, wherein the higher the first evaluation value, the higher the video quality of the video, and the higher the second evaluation value, the higher the bit rate, that is, the larger the video volume.
  • In order to improve the efficiency of video encoding, it is necessary to select a quality level that reduces the bit rate to the greatest extent while meeting the preset video quality requirement.
  • the candidate coding feature with the first evaluation value (i.e., video quality) greater than the first threshold is determined as the first target coding feature; then, from the first target coding feature, the video coding feature with the smallest second evaluation value (i.e., the smallest bit rate) is selected, and then the quality level corresponding to the video coding feature is obtained as the target quality level.
  • the above process can be implemented by the candidate coding feature-evaluation value mapping data saved in the previous step, and the specific process will not be repeated.
  • a matching target quality level is obtained, so that the target video encoded based on the target quality level can meet the preset video quality requirements.
  • the bit rate can be reduced, the video volume can be compressed, and the video encoding efficiency can be improved.
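  • Steps S2101 to S2103 amount to a filter followed by a minimum search, which the sketch below illustrates. The dictionary keys and the function name are hypothetical, and the fallback used when no quality level exceeds the threshold is an assumption; the text does not say what happens in that case.

```python
from typing import Dict, List


def select_target_quality_level(results: List[Dict], quality_threshold: float) -> int:
    """Pick the quality level with the lowest predicted bit rate among those
    whose predicted quality exceeds the threshold."""
    # S2101: keep candidates whose first evaluation value exceeds the first threshold.
    eligible = [r for r in results if r["quality"] > quality_threshold]
    if not eligible:
        # Assumption (not in the source): fall back to the best predicted quality.
        return max(results, key=lambda r: r["quality"])["cq"]
    # S2102 and S2103: smallest second evaluation value, then its quality level.
    return min(eligible, key=lambda r: r["bitrate"])["cq"]


# Hypothetical predicted values for three quality levels.
predictions = [
    {"cq": 20, "quality": 95.0, "bitrate": 900.0},
    {"cq": 25, "quality": 92.0, "bitrate": 700.0},
    {"cq": 30, "quality": 85.0, "bitrate": 500.0},
]
target = select_target_quality_level(predictions, quality_threshold=90.0)
print(target)
```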
  • Step S211 Encode the current frame at a target quality level using a hardware video editor to generate a target frame for constituting a target video.
  • Step S212 If the current frame is not the last frame of the initial video data, the next frame of the current frame is set as the new current frame, and the process returns to step S202.
  • the current frame is encoded based on the target quality level to obtain a target frame with better video quality and lower bit rate; at the same time, if the current frame is not the last frame of the initial video data, then return to step S202 and continue to perform the above processing on the next video frame until all video frames are traversed and the corresponding target frames are generated, thereby forming a target video. Since each video frame is dynamically encoded using different target quality levels, the video volume can be reduced while improving the video quality, thereby improving the video encoding efficiency.
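  • The outer, frame-by-frame loop (steps S202, S211 and S212) can be sketched as follows. The callables `choose_level` and `encode_frame` are hypothetical placeholders for the quality level selection and for the hardware encoding call, and encoding the first frame at the default quality level is an assumption drawn from the handling of the first frame described earlier.

```python
from typing import Any, Callable, List, Sequence, Tuple


def encode_video(
    frames: Sequence[Any],
    choose_level: Callable[[Any, List[Tuple[Any, int]]], int],
    encode_frame: Callable[[Any, int], Any],
    default_level: int,
) -> List[Any]:
    """Encode each frame at its own target quality level."""
    history: List[Tuple[Any, int]] = []  # (frame, target level) of earlier frames
    target_frames = []
    for index, frame in enumerate(frames):                     # S202 / S212
        if index == 0:
            level = default_level                              # assumption for the first frame
        else:
            level = choose_level(frame, history)               # S203 to S210
        target_frames.append(encode_frame(frame, level))       # S211
        history.append((frame, level))
    return target_frames


# Stubs: raise the level by one per frame and return (frame, level) as the "encoded" frame.
encoded = encode_video(
    frames=["f0", "f1", "f2"],
    choose_level=lambda frame, history: history[-1][1] + 1,
    encode_frame=lambda frame, level: (frame, level),
    default_level=23,
)
print(encoded)
```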
  • Before step S208, a step of training the prediction neural network model is also included.
  • the step of training the prediction neural network model includes:
  • Step S2001 Obtain original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels;
  • Step S2002 Based on the quality level sequence, process the original video data in sequence using a hardware video editor to obtain video coding features corresponding to each quality level;
  • Step S2003 Calculate the first evaluation value and the second evaluation value corresponding to each video coding feature;
  • Step S2004 Generate training samples according to each video encoding feature and the corresponding first evaluation value and the corresponding second evaluation value, and train a preset neural network model based on the training samples to obtain a prediction neural network model.
  • the original video data is a video data sample
  • the quality level sequence is a preset parameter.
  • the first evaluation value and the second evaluation value corresponding to each video encoding feature are obtained, and the first evaluation value and the second evaluation value are used as sample labels, and the original video data and the level sequence are used as the original sample.
  • training samples are generated to achieve efficient and high-quality training of the neural network model, so that the model can converge quickly, thereby improving the prediction accuracy and efficiency of the model.
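  • A minimal sketch of how training samples might be assembled in steps S2001 to S2004 is given below. `extract_feature` and `measure` are hypothetical callables standing in for the hardware feature extraction and for the computation of the two evaluation values; the actual model and training procedure are not specified here and are left out.

```python
from typing import Any, Callable, List, Sequence, Tuple

Sample = Tuple[List[float], Tuple[float, float]]


def build_training_samples(
    videos: Sequence[Any],
    quality_levels: Sequence[int],
    extract_feature: Callable[[Any, int], List[float]],
    measure: Callable[[Any, int], Tuple[float, float]],
) -> List[Sample]:
    """Pair each video coding feature with its (quality, bit rate) labels."""
    samples: List[Sample] = []
    for video in videos:
        for cq in quality_levels:                    # S2002: one pass per quality level
            feature = extract_feature(video, cq)
            quality, bitrate = measure(video, cq)    # S2003: sample labels
            samples.append((feature, (quality, bitrate)))
    return samples                                   # S2004: input to model training


# Stub callables with made-up relationships, for illustration only.
samples = build_training_samples(
    videos=["clip_a"],
    quality_levels=[20, 30],
    extract_feature=lambda video, cq: [float(len(video)), float(cq)],
    measure=lambda video, cq: (100.0 - cq, 1000.0 - 10.0 * cq),
)
print(samples)
```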
  • FIG10 is a structural block diagram of a video encoding device provided by an embodiment of the present disclosure. For ease of explanation, only the parts related to the embodiment of the present disclosure are shown.
  • the video encoding device 3 includes:
  • a parameter acquisition module 31 is used to extract video coding features of the initial video data based on a hardware video editor, where the video coding features represent the complexity of the video content of the initial video data;
  • a parameter optimization module 32 is used to process the video encoding features of the initial video data based on the pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the image quality level when encoding the initial video in a fixed quality dynamic bit rate mode;
  • the encoding module 33 is used to encode the initial video data at a target quality level using a hardware video editor to generate a target video.
  • the parameter acquisition module 31 is specifically used to: obtain alternative quality levels of initial video data; input video parameters and alternative quality levels of the initial video data into a hardware video editor to obtain encoding parameters of the initial video data; and generate corresponding video encoding features based on the encoding parameters.
  • the initial video data includes multiple video frames.
  • When generating the corresponding video coding features based on the coding parameters, the parameter acquisition module 31 is specifically used to: obtain the coding parameters corresponding to each video frame; obtain the coding feature mean value and the coding feature variance value according to the coding parameters corresponding to each video frame, wherein the coding feature mean value is the average value of the coding parameters corresponding to each video frame, and the coding feature variance value is the variance value of the coding parameters corresponding to each video frame; and generate the video coding features according to the video parameters, the alternative quality levels, and the coding feature mean value and coding feature variance value corresponding to each video frame.
  • When inputting the video parameters and the alternative quality levels of the initial video data into the hardware video editor to obtain the encoding parameters of the initial video data, the parameter acquisition module 31 is specifically used to loop through the following steps until a preset condition is met: acquire the current frame of the initial video data; obtain the video parameters of the current frame and the quality level of the current frame; input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and set the next frame of the current frame as the new current frame.
  • the parameter acquisition module 31 is further used to: generate a frame encoding feature corresponding to the current frame using the video parameters of the current frame, the quality level of the current frame and the encoding parameters; obtain a frame encoding feature of a preceding frame corresponding to the current frame, the preceding frame being a first preset number of adjacent video frames before the current frame, wherein the frame encoding feature of the preceding frame is generated based on the target quality level corresponding to the preceding frame;
  • the parameter acquisition module 31 is specifically used to generate video coding features corresponding to the current frame according to the frame coding features corresponding to the current frame and the frame coding features of the previous frame.
  • the encoding parameters include at least one of the following: frame type, frame size, image distortion, and image refinement.
  • the video coding features of the initial video data include at least two alternative coding features, each alternative coding feature corresponds to a different quality level
  • the parameter optimization module 32 is specifically used to: input each alternative coding feature into the prediction neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on a fusion of video multi-method evaluation, and the second evaluation value represents a video bit rate; based on the first evaluation value and the corresponding second evaluation value corresponding to each alternative coding feature, generate a target quality level.
  • When generating a target quality level based on the first evaluation value and the corresponding second evaluation value corresponding to each alternative coding feature, the parameter optimization module 32 is specifically used to: obtain a first target coding feature according to the first evaluation value corresponding to each alternative coding feature, the first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determine a second target coding feature according to the second evaluation value of the first target coding feature, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtain the target quality level according to the quality level corresponding to the second target coding feature.
  • Before processing the video coding features of the initial video data based on the pre-trained prediction neural network model to obtain the target quality level corresponding to the initial video data, the parameter optimization module 32 is further used to: obtain the original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels; based on the quality level sequence, sequentially use the hardware video editor to process the original video data to obtain video coding features corresponding to each quality level; calculate the first evaluation value and the second evaluation value corresponding to each video coding feature; and generate training samples according to each video coding feature and the corresponding first evaluation value and the corresponding second evaluation value, and train a preset neural network model based on the training samples to obtain a prediction neural network model.
  • the parameter acquisition module 31, parameter optimization module 32 and encoding module 33 are connected in sequence.
  • the video encoding device 3 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
  • FIG11 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG11 , the electronic device 4 includes:
  • the memory 42 stores computer executable instructions
  • the processor 41 executes the computer-executable instructions stored in the memory 42 to implement the video encoding method in the embodiments shown in FIG. 2 to FIG. 9 .
  • The processor 41 and the memory 42 are connected via a bus 43.
  • Referring to FIG. 12, it shows a schematic diagram of the structure of an electronic device 900 suitable for implementing the embodiment of the present disclosure.
  • the electronic device 900 may be a terminal device or a server.
  • the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle terminals (such as vehicle navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 12 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.
  • the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random access memory (RAM) 903.
  • a processing device e.g., a central processing unit, a graphics processing unit, etc.
  • RAM random access memory
  • Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903.
  • the processing device 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 908 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 909.
  • the communication device 909 may allow the electronic device 900 to communicate with other devices wirelessly or by wire to exchange data.
  • Although FIG. 12 shows an electronic device 900 having various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902.
  • When the computer program is executed by the processing device 901, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
  • This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to execute the method shown in the above embodiment.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
  • each square box in the flow chart or block diagram can represent a module, a program segment or a part of a code, and the module, the program segment or a part of the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two square boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented by software or hardware.
  • the name of a unit does not limit the unit itself in some cases.
  • For example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a video encoding method including:
  • the method of extracting video encoding features of initial video data based on a hardware video editor includes: obtaining alternative quality levels of the initial video data; inputting video parameters of the initial video data and the alternative quality levels into the hardware video editor to obtain encoding parameters of the initial video data; and generating corresponding video encoding features based on the encoding parameters.
  • the initial video data includes multiple video frames
  • Generating the corresponding video coding features based on the coding parameters includes: obtaining the coding parameters corresponding to each of the video frames; obtaining the coding feature mean value and the coding feature variance value according to the coding parameters corresponding to each of the video frames, wherein the coding feature mean value is the average value of the coding parameters corresponding to each of the video frames, and the coding feature variance value is the variance value of the coding parameters corresponding to each of the video frames; and generating the video coding feature according to the video parameters, the alternative quality level, and the coding feature mean value and coding feature variance value corresponding to each of the video frames.
  • the video parameters of the initial video data and the alternative quality level are input into the hardware video editor to obtain the encoding parameters of the initial video data, including: looping the following steps until a preset condition is met: acquiring a current frame of the initial video data; obtaining the video parameters of the current frame and the quality level of the current frame; inputting the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and setting the next frame of the current frame as a new current frame.
  • After obtaining the encoding parameters corresponding to the current frame, the method further includes: generating frame encoding features corresponding to the current frame by combining the video parameters of the current frame, the quality level of the current frame and the encoding parameters; and obtaining frame encoding features of a previous frame corresponding to the current frame, the previous frame being a first preset number of adjacent video frames before the current frame, wherein the frame encoding features of the previous frame are generated based on a target quality level corresponding to the previous frame. Generating corresponding video encoding features based on the encoding parameters includes: generating video encoding features corresponding to the current frame according to the frame encoding features corresponding to the current frame and the frame encoding features of the previous frame.
  • the encoding parameters include at least one of the following: frame type, frame size, image distortion, and image fineness.
  • the video coding features of the initial video data include at least two alternative coding features, each of the alternative coding features corresponds to a different quality level, and the video coding features of the initial video data are processed based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, including: inputting each of the alternative coding features into the predictive neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on a fusion of video multi-method evaluation, and the second evaluation value represents a video bit rate; based on the first evaluation value corresponding to each of the alternative coding features and the corresponding second evaluation value, the target quality level is generated.
  • generating the target quality level based on the first evaluation value and the corresponding second evaluation value corresponding to each of the alternative coding features includes: obtaining a first target coding feature according to the first evaluation value corresponding to each of the alternative coding features, the first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determining a second target coding feature according to the second evaluation value of the first target coding feature, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtaining the target quality level according to the quality level corresponding to the second target coding feature.
  • before the video coding features of the initial video data are processed based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, the method further includes: acquiring original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels; based on the quality level sequence, sequentially processing the original video data using the hardware video editor to obtain the video coding features corresponding to each of the quality levels; calculating the first evaluation value and the second evaluation value corresponding to each of the video coding features; generating training samples according to each of the video coding features and the corresponding first evaluation value and the corresponding second evaluation value, and training a preset neural network model based on the training samples to obtain the predictive neural network model.
  • a video encoding device including:
  • a parameter acquisition module used for extracting video coding features of the initial video data based on a hardware video editor, wherein the video coding features represent the complexity of the video content of the initial video data;
  • a parameter optimization module configured to process the video encoding features of the initial video data based on a pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents a picture quality level when encoding the initial video in a fixed quality dynamic bit rate mode;
  • the encoding module is used to encode the initial video data at the target quality level using the hardware video editor to generate a target video.
  • the parameter acquisition module is specifically used to: obtain alternative quality levels of the initial video data; input the video parameters of the initial video data and the alternative quality levels into the hardware video editor to obtain encoding parameters of the initial video data; and generate corresponding video encoding features based on the encoding parameters.
  • the initial video data includes a plurality of video frames, and when generating the corresponding video encoding features based on the encoding parameters, the parameter acquisition module is specifically used to: obtain the encoding parameters corresponding to each of the video frames; obtain the encoding feature mean and the encoding feature variance value according to the encoding parameters corresponding to each of the video frames, wherein the encoding feature mean is the average value of the encoding parameters corresponding to each of the video frames, and the encoding feature variance value is the variance value of the encoding parameters corresponding to each of the video frames; and generate the video encoding features according to the video parameters, the alternative quality levels, and the encoding feature mean and the encoding feature variance value corresponding to each of the video frames.
  • when the parameter acquisition module inputs the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data, it is specifically used to: loop the following steps until a preset condition is met: obtain the current frame of the initial video data; obtain the video parameters of the current frame and the quality level of the current frame; input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; set the next frame of the current frame as the new current frame.
  • the parameter acquisition module is further used to: generate frame coding features corresponding to the current frame based on the video parameters of the current frame, the quality level of the current frame and the encoding parameters; obtain frame coding features of a preceding frame corresponding to the current frame, the preceding frame being a first preset number of adjacent video frames before the current frame, wherein the frame coding features of the preceding frame are generated based on a target quality level corresponding to the preceding frame; when generating the corresponding video coding features based on the encoding parameters, the parameter acquisition module is specifically used to: generate the video coding features corresponding to the current frame based on the frame coding features corresponding to the current frame and the frame coding features of the preceding frame.
  • the encoding parameters include at least one of the following: frame type, frame size, image distortion, and image fineness.
  • the video coding features of the initial video data include at least two alternative coding features, each of the alternative coding features corresponds to a different quality level, and the parameter optimization module is specifically used to: input each of the alternative coding features into the prediction neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality evaluation value based on a fusion of video multi-method evaluation, and the second evaluation value represents a video bit rate; based on the first evaluation value corresponding to each of the alternative coding features and the corresponding second evaluation value, generate the target quality level.
  • when generating the target quality level based on the first evaluation value and the corresponding second evaluation value corresponding to each of the alternative coding features, the parameter optimization module is specifically used to: obtain a first target coding feature according to the first evaluation value corresponding to each of the alternative coding features, the first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determine a second target coding feature according to the second evaluation value of the first target coding feature, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtain the target quality level according to the quality level corresponding to the second target coding feature.
  • the parameter optimization module is also used to: obtain original video data and a quality level sequence, wherein the quality level sequence includes at least two different quality levels; based on the quality level sequence, sequentially process the original video data using the hardware video editor to obtain video coding features corresponding to each of the quality levels; calculate the first evaluation value and the second evaluation value corresponding to each of the video coding features; generate training samples according to each of the video coding features and the corresponding first evaluation value and the corresponding second evaluation value, and train a preset neural network model based on the training samples to obtain the predictive neural network model.
  • an electronic device comprising: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the video encoding method as described in the first aspect and various possible designs of the first aspect.
  • a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video encoding method as described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video encoding method as described in the first aspect and various possible designs of the first aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

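Embodiments of the present disclosure provide a video encoding method and apparatus, an electronic device and a storage medium. Video coding features of initial video data are extracted based on a hardware video editor, the video coding features representing the video content complexity of the initial video data. The video coding features are processed by a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, where the target quality level represents the picture quality level used when the initial video is encoded in constant-quality variable-bit-rate mode. The hardware video editor then encodes the initial video data at the target quality level to generate a target video. The bitrate of the generated target video is thereby matched to the video content, which avoids bitrates that are too high or too low, improves picture quality and avoids wasted bitrate.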

Description

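Video encoding method and apparatus, electronic device and storage medium
This application claims priority to Chinese invention patent application No. 202211567316.X, filed on December 7, 2022 and entitled "Video encoding method and apparatus, electronic device and storage medium", the entire contents of which are incorporated herein by reference.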
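Technical Field
Embodiments of the present disclosure relate to the field of image processing technology, and in particular to a video encoding method and apparatus, an electronic device and a storage medium.
Background
A graphics processing unit (GPU) is the processor responsible for image processing tasks in terminal devices such as mobile phones and personal computers. GPUs offer strong numerical computing and parallel processing capability, and in application scenarios such as video encoding and decoding, using a GPU can effectively improve the quality and efficiency of image encoding and decoding.
Some GPUs contain a hardware-based video encoder, also referred to as a hardware video editor, for example the Nvenc unit. A hardware video editor can encode data in YUV/RGB format into video conforming to the H.264/HEVC standards, achieving efficient video encoding.
In practice, a parameter representing the quality level must be set when the hardware video editor is invoked for video encoding. In the prior art, however, encoding is usually performed with a fixed quality level chosen from experience, which results in an unreasonable bitrate for the encoded video.
Summary
Embodiments of the present disclosure provide a video encoding method and apparatus, an electronic device and a storage medium, to overcome the problem of an unreasonable encoded-video bitrate caused by encoding with a fixed quality level.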
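In a first aspect, an embodiment of the present disclosure provides a video encoding method, including:
extracting video coding features of initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data; processing the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the picture quality level when the initial video is encoded in constant-quality variable-bit-rate mode; and encoding the initial video data at the target quality level using the hardware video editor to generate a target video.
In a second aspect, an embodiment of the present disclosure provides a video encoding apparatus, including:
a parameter acquisition module, configured to extract video coding features of initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data;
a parameter optimization module, configured to process the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the picture quality level when the initial video is encoded in constant-quality variable-bit-rate mode;
an encoding module, configured to encode the initial video data at the target quality level using the hardware video editor to generate a target video.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a processor, and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the video encoding method described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video encoding method described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the video encoding method described in the first aspect and the various possible designs of the first aspect.
In the video encoding method and apparatus, electronic device and storage medium provided by this embodiment, video coding features of initial video data are extracted based on a hardware video editor, the video coding features representing the video content complexity of the initial video data; the video coding features are processed by a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the picture quality level when the initial video is encoded in constant-quality variable-bit-rate mode; and the hardware video editor encodes the initial video data at the target quality level to generate a target video. Because a hardware video editor can conveniently output encoding parameters, this property is used to obtain the video coding features of the initial video data. The pre-trained predictive neural network model then yields a target quality level matching those features, and the hardware video editor encodes at that level. The bitrate of the generated target video is thereby matched to the video content, which avoids bitrates that are too high or too low, improves picture quality and avoids wasted bitrate.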
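Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
Figure 2 is a first schematic flowchart of the video encoding method provided by an embodiment of the present disclosure;
Figure 3 is a flowchart of the specific implementation steps of step S101 in the embodiment shown in Figure 2;
Figure 4 is a flowchart of the specific implementation steps of step S1013 in the embodiment shown in Figure 3;
Figure 5 is a schematic diagram of a process for generating a video coding feature provided by an embodiment of the present disclosure;
Figure 6 is a schematic diagram of the data structure of a video coding feature provided by an embodiment of the present disclosure;
Figure 7 is a second schematic flowchart of the video encoding method provided by an embodiment of the present disclosure;
Figure 8 is a schematic diagram of a process for generating the evaluation values corresponding to the current quality level of the current frame provided by an embodiment of the present disclosure;
Figure 9 is a schematic flowchart of the steps for training the predictive neural network model;
Figure 10 is a structural block diagram of the video encoding apparatus provided by an embodiment of the present disclosure;
Figure 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
Figure 12 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.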
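Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
The application scenarios of the embodiments of the present disclosure are explained below:
The video encoding method provided by the embodiments of the present disclosure can be applied to any scenario that requires video encoding, such as video editing, preview and playback, and more specifically to video editing software, cloud video editing platforms and live-streaming software. Exemplarily, the method can be applied to terminal devices such as smartphones, tablets and personal computers, and also to cloud servers. Figure 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure, taking video editing software running on a terminal device as an example. As shown in Figure 1, the terminal device runs the video editing software and edits the original video, for example by adding video effects, audio tracks or subtitles, to generate video data, which may include multiple video frames and the editing information corresponding to each frame. The terminal device then invokes the hardware video editor in the GPU to encode the video data, generating a playable target video (the finished cut) that carries the editing effects, which completes the video editing workflow. The hardware video editor is, for example, the Nvenc unit.
In the prior art, when a hardware video editor is invoked to process video data, parameters must be set to control the specific encoding mode. Variable bit rate (VBR) encoding, also called dynamic bitrate or non-constant bitrate encoding, is a commonly used mode that adjusts the bitrate dynamically according to the video content, so that the bitrate of the encoded video varies with image complexity. Its encoding efficiency is therefore relatively high: the volume of static scenes is compressed, while moving scenes show fewer mosaic artifacts, balancing video size and quality. Constant-quality (CQ) variable bit rate (CQ-VBR) is an encoding method built on VBR that uses a quality level to control the picture quality (bitrate) during VBR encoding, giving finer control over the size and quality of the encoded video and further improving the flexibility and practicality of encoding control.
In practice, however, when a hardware video editor is invoked to encode in CQ-VBR mode, the quality level (cq value) is usually set according to the user's experience, so it is often set unreasonably. This leads to a bitrate that is too high (the video is too large and wastes storage and network resources) or too low (the picture quality is poor and the viewing experience suffers).
Embodiments of the present disclosure provide a video encoding method that solves the above problem by automatically generating a reasonable target quality level (cq value) and encoding the video based on that target quality level.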
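Referring to Figure 2, Figure 2 is a first schematic flowchart of the video encoding method provided by an embodiment of the present disclosure. The method of this embodiment can be applied in an electronic device equipped with a hardware video editor, such as a terminal device or a server. This embodiment is described with a terminal device as the executing entity. Exemplarily, the video encoding method includes:
Step S101: Extract video coding features of the initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data.
Exemplarily, before step S101, the terminal device first acquires the initial video data, which may be data generated by a video editing application, a live-streaming application or the like, for example video data in YUV/RGB format together with data such as video effects, added audio tracks and subtitles. The initial video data is the data to be encoded; after it is encoded, a corresponding playable video can be generated.
The hardware video editor is then invoked to process the initial video data. Specifically, for example, the application interface of the hardware video editor is called with the initial video data as the input parameter, and the corresponding processing function is run on the initial video data to obtain its video coding features. The video coding features represent the video content complexity of the initial video data through the video parameters of the initial video data and the encoding parameters corresponding to those video parameters. In one possible implementation, the video coding features are obtained by processing the video parameters and the encoding parameters. The video parameters are information describing the initial video data, for example video height and width (that is, resolution) and frame rate (fps). The encoding parameters describe the parameters used by the hardware video editor when encoding the initial video data, for example encoded frame size (size), the type of each frame of the video data (I frame, P frame or B frame), image distortion and image fineness.
Further, the video parameters corresponding to the video coding features are obtained directly from the video information of the initial video data, which is not described further. The encoding parameters corresponding to the video coding features are obtained by taking the video parameters as input and calling the interface provided by the hardware video editor for generating encoding parameters, so the encoding parameters correspond to the initial video data. In one possible implementation, as shown in Figure 3, the specific implementation steps of step S101 include: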
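Step S1011: Obtain an alternative quality level for the initial video data.
Exemplarily, as explained in the earlier discussion of application scenarios and the prior art, a quality level must be set when the hardware video editor encodes in CQ-VBR mode. The alternative quality level may be a preset default value; more specifically, it may be a value in the interval [18, 35], for example 25. The smaller the quality level, the higher the picture quality level and the clearer the encoded video, and correspondingly the larger the video.
Step S1012: Input the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data.
Further, by calling the interface provided by the hardware video editor with the video parameters of the initial video data and the alternative quality level as inputs, the encoding parameters output by the hardware video editor are obtained, for example encoded frame size (size), the type of each frame of the video data (I frame, P frame or B frame), image distortion and image fineness. Exemplarily, image distortion can be expressed by the sum of absolute transformed differences (SATD) of each frame, and image fineness by the quantization parameter (QP) of each frame. The calculation of the SATD and the quantization parameter is prior art and can be performed by functions provided by the driver of the hardware video editor, so it is not repeated here.
Step S1013: Generate the corresponding video coding features based on the encoding parameters.
Exemplarily, the encoding parameters obtained by inputting the video parameters and the alternative quality level into the hardware video editor amount to a pre-encoding pass over the initial video data: the hardware video editor predicts the corresponding encoding parameters from the initial video data without actually encoding it. The video coding feature is then generated from the encoding parameters together with the video parameters and the alternative quality level. This feature reflects the complexity of the video content of the initial video data, and a matching quality level can subsequently be determined from it for encoding, so that picture quality is matched to content complexity and wasted or insufficient bitrate is avoided.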
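Further, in one possible implementation, the initial video data includes multiple video frames, and as shown in Figure 4, the specific implementation steps of step S1013 include:
Step S1013A: Obtain the encoding parameters corresponding to each video frame.
Step S1013B: From the encoding parameters corresponding to each video frame, obtain the coding feature mean value and the coding feature variance value, where the coding feature mean value is the average of the encoding parameters corresponding to the video frames and the coding feature variance value is the variance of the encoding parameters corresponding to the video frames.
Step S1013C: Generate the video coding feature from the video parameters, the alternative quality level, and the coding feature mean value and coding feature variance value corresponding to the video frames.
Exemplarily, the initial video data includes multiple video frames. The video parameters and encoding parameters of the video frames are obtained to derive the frame coding feature of each frame, and the video content complexity of the initial video data is then judged from the average level and the variation across multiple frame coding features, yielding the video coding feature of the initial video data. Figure 5 is a schematic diagram of a process for generating a video coding feature provided by an embodiment of the present disclosure, and the process is described below with reference to it. As shown in Figure 5, the initial video data includes N video frames, where N is an integer greater than 1. Taking the M-th frame as an example (M is an integer greater than 1 and less than N), the video parameters of the M-th frame, such as its height, width and frame rate, are obtained first. These video parameters and the alternative quality level are input to the interface of the hardware video editor, which returns the encoding parameters of the M-th frame, for example its type (shown as Type in the figure), encoded frame size (Size), sum of absolute transformed differences (SATD) and quantization parameter (QP). The mean and variance of the encoding parameters corresponding to the M-th frame are then calculated to obtain the coding feature mean value and coding feature variance value. Specifically, these are the mean and variance of the encoding parameters of the video frames in the set formed by the M-th frame and at least one adjacent video frame before it (shown as the L-th frame in the figure, where L is an integer less than M and greater than or equal to 1). The detailed calculation of the mean and variance for a particular implementation of the encoding parameters is not repeated here. Finally, the video parameters, the alternative quality level, the coding feature mean value and the coding feature variance value are combined to obtain the video coding feature corresponding to the M-th frame, which represents the video content complexity of the segment from the L-th frame to the M-th frame of the initial video data. In one possible case L=1, and the video coding feature then represents the video content complexity of the segment from the first frame up to the M-th frame.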
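Further, Figure 6 is a schematic diagram of the data structure of a video coding feature provided by an embodiment of the present disclosure. Referring to Figures 5 and 6, exemplarily, the video coding feature contains 21 data fields in total, shown in the figure as fields #1 to #21, where:
Field #1 is the image height of frames 1 to M of the initial video data;
Field #2 is the image width of frames 1 to M of the initial video data;
Field #3 is the frame rate of frames 1 to M of the initial video data;
Field #4 is the alternative quality level of frames 1 to M of the initial video data;
Field #5 is the number of I frames among frames 1 to M of the initial video data;
Field #6 is the number of P frames among frames 1 to M of the initial video data;
Field #7 is the number of B frames among frames 1 to M of the initial video data;
Field #8 is the average size of the I frames among frames 1 to M of the initial video data;
Field #9 is the average size of the P frames among frames 1 to M of the initial video data;
Field #10 is the average size of the B frames among frames 1 to M of the initial video data;
Field #11 is the average SATD of the I frames among frames 1 to M of the initial video data;
Field #12 is the average SATD of the P frames among frames 1 to M of the initial video data;
Field #13 is the average SATD of the B frames among frames 1 to M of the initial video data;
Field #14 is the average quantization parameter of frames 1 to M of the initial video data;
Field #15 is the size variance of the I frames among frames 1 to M of the initial video data;
Field #16 is the size variance of the P frames among frames 1 to M of the initial video data;
Field #17 is the size variance of the B frames among frames 1 to M of the initial video data;
Field #18 is the SATD variance of the I frames among frames 1 to M of the initial video data;
Field #19 is the SATD variance of the P frames among frames 1 to M of the initial video data;
Field #20 is the SATD variance of the B frames among frames 1 to M of the initial video data;
Field #21 is the quantization parameter variance of frames 1 to M of the initial video data.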
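The 21-field layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FrameStats` and `build_video_feature` are hypothetical names, the per-frame values are assumed to have already been returned by the hardware video editor, population variance is used, and an empty frame-type group contributes 0.0. None of these choices is specified by the source.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class FrameStats:
    """Per-frame encoding parameters (hypothetical container)."""
    frame_type: str  # "I", "P" or "B"
    size: int        # encoded frame size
    satd: float      # sum of absolute transformed differences
    qp: float        # quantization parameter


def _mean(values):
    return fmean(values) if values else 0.0


def _var(values):
    # Population variance; returning 0.0 for an empty group is an assumption.
    return pvariance(values) if values else 0.0


def build_video_feature(height, width, fps, cq, frames):
    """Assemble the 21 fields for frames 1..M in the order listed above."""
    by_type = {t: [f for f in frames if f.frame_type == t] for t in "IPB"}
    sizes = {t: [f.size for f in by_type[t]] for t in "IPB"}
    satds = {t: [f.satd for f in by_type[t]] for t in "IPB"}
    qps = [f.qp for f in frames]

    feature = [height, width, fps, cq]               # fields 1-4
    feature += [len(by_type[t]) for t in "IPB"]      # fields 5-7
    feature += [_mean(sizes[t]) for t in "IPB"]      # fields 8-10
    feature += [_mean(satds[t]) for t in "IPB"]      # fields 11-13
    feature.append(_mean(qps))                       # field 14
    feature += [_var(sizes[t]) for t in "IPB"]       # fields 15-17
    feature += [_var(satds[t]) for t in "IPB"]       # fields 18-20
    feature.append(_var(qps))                        # field 21
    return feature
```

A flat list keeps the field order explicit, which matters if the vector is later fed to a model that expects fixed positions.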
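Step S102: Process the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, where the target quality level represents the picture quality level when the initial video is encoded in CQ-VBR mode.
Exemplarily, after the video coding features of the initial video data are obtained, a matching target quality level must be determined for them. Specifically, the quality level is the cq value, which represents the picture quality level when the initial video is encoded in constant-quality variable-bit-rate (CQ-VBR) mode and is one of the parameters required when invoking the hardware video editor. In this step, the pre-trained predictive neural network model processes the video coding features of the initial video data and predicts a quality level that matches the video content complexity they represent, namely the target quality level.
Step S103: Encode the initial video data at the target quality level using the hardware video editor to generate the target video.
Exemplarily, once the target quality level is obtained, the hardware video editor is invoked with the target quality level as a parameter to process the initial video data, generating the corresponding finished video for playback, namely the target video.
In one possible implementation, the target quality level may be a level sequence containing multiple level identifiers, each identifying a specific quality level (cq value). Each level identifier in the sequence corresponds to one or more video frames of the initial video data. In a more specific possible implementation, each level identifier corresponds to one video frame. When the hardware video editor encodes the initial video data at the target quality level, each video frame of the initial video data is obtained in turn (in parallel or serially), and the hardware video editor is invoked to encode that frame in CQ-VBR mode according to its level identifier, so that each encoded frame can have its own picture quality level. This gives more precise encoding and improves encoding efficiency. The specific process of invoking the hardware video editor to encode in CQ-VBR mode is prior art and is not repeated here.
In this embodiment, video coding features of the initial video data are extracted based on a hardware video editor, the video coding features representing the video content complexity of the initial video data; the video coding features are processed by a pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, where the target quality level represents the picture quality level when the initial video is encoded in CQ-VBR mode; and the hardware video editor encodes the initial video data at the target quality level to generate the target video. Because a hardware video editor can conveniently output encoding parameters, this property is used to obtain the video coding features of the initial video data. The pre-trained predictive neural network model then yields a target quality level matching those features, and the hardware video editor encodes at that level. The bitrate of the generated target video is thereby matched to the video content, which avoids bitrates that are too high or too low, improves picture quality and avoids wasted bitrate.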
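Referring to Figure 7, Figure 7 is a second schematic flowchart of the video encoding method provided by an embodiment of the present disclosure. Building on the embodiment shown in Figure 2, this embodiment further details the implementation of steps S101 and S102. The video encoding method includes:
Step S201: Obtain a quality level sequence representing the alternative quality levels, the quality level sequence being an ordered set of multiple quality levels.
Exemplarily, with reference to the description of the alternative quality level in the embodiment shown in Figure 2, in this embodiment there may be multiple alternative quality levels, represented by a preset quality level sequence; that is, the quality level sequence is one implementation of the alternative quality levels. For example, the quality level sequence is cq_data=[15:40], an ordered arrangement of the quality levels from 15 to 40, and these ordered levels are the alternative quality levels. It will be understood that the quality level sequence can be implemented in many ways: as an enumerated data structure such as an array, matrix, key-value pairs or a struct, or as a set of numbers expressed by a function. These are not listed one by one here.
Step S202: Obtain the current frame of the initial video data.
Step S203: Obtain the video parameters of the current frame and the current quality level in the quality level sequence;
Step S204: Input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current quality level of the current frame.
Exemplarily, starting from steps S202 and S203, the method loops over two dimensions (the video frame dimension and the quality level dimension) to obtain a matching target quality level for each video frame of the initial video data. Each frame is then encoded at its own target quality level, achieving frame-by-frame dynamic encoding of the video and improving encoding efficiency and quality.
Specifically, starting from the first frame of the initial video data, each frame in turn becomes the current frame until the last frame is reached. For each current frame, the video parameters can be obtained directly, which is not described further. The quality levels in the quality level sequence are then input, together with the video parameters, into the hardware video editor one after another to obtain the encoding parameters corresponding to the current quality level of the current frame. The encoding parameters of the first frame of the initial video data can be obtained from a preset default quality level combined with the video parameters by calling the corresponding interface of the hardware video editor. The specific implementation has been described in the embodiments above and is not repeated here.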
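Step S205: Generate the first feature corresponding to the current quality level of the current frame from the video parameters of the current frame, the current quality level of the current frame and the encoding parameters corresponding to the current quality level;
Further, after the video parameters of the current frame, the current quality level of the current frame and the encoding parameters corresponding to the current quality level are obtained, they are combined to generate the frame coding feature corresponding to the current quality level, namely the first feature. In other words, the first feature contains the video parameters, the current quality level and the encoding parameters. For example, the video parameters include image height, width, resolution and frame rate; the encoding parameters include encoded frame size, the type of each frame of the video data (I frame, P frame or B frame), image distortion and image fineness. Merging the video parameters of the current frame, the current quality level of the current frame and the encoding parameters corresponding to that quality level gives the first feature:
F={h,w,fps,cq,type,size,satd,qp}.
Here F is the first feature of the current frame, h is the image height, w is the image width and fps is the image frame rate; h, w and fps are the video parameters. cq is the current quality level, type is the type of the current frame, size is the size of the current frame, satd is the SATD of the current frame and qp is the quantization parameter of the current frame; type, size, satd and qp are the encoding parameters.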
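As a small illustration, the first feature F can be modelled as a named tuple whose fields follow the formula above, with the quantization parameter written as `qp`. The class name `FrameFeature` and the helper `make_frame_feature` are hypothetical, and the dictionaries stand in for whatever the video metadata and the hardware video editor actually return.

```python
from typing import NamedTuple


class FrameFeature(NamedTuple):
    """First feature F = {h, w, fps, cq, type, size, satd, qp} of one frame."""
    h: int        # image height (video parameter)
    w: int        # image width (video parameter)
    fps: float    # frame rate (video parameter)
    cq: int       # current quality level
    type: str     # frame type: "I", "P" or "B" (encoding parameter)
    size: int     # encoded frame size (encoding parameter)
    satd: float   # sum of absolute transformed differences (encoding parameter)
    qp: float     # quantization parameter (encoding parameter)


def make_frame_feature(video_params: dict, cq: int, enc_params: dict) -> FrameFeature:
    """Merge video parameters, quality level and encoding parameters into F."""
    return FrameFeature(**video_params, cq=cq, **enc_params)
```

For example, `make_frame_feature({"h": 1080, "w": 1920, "fps": 30.0}, 25, {"type": "P", "size": 4096, "satd": 1234.0, "qp": 27.0})` yields one F for quality level 25.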
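Step S206: Obtain the second feature corresponding to the target quality level of the preceding frames of the current frame, the preceding frames being a first preset number of adjacent video frames before the current frame.
Step S207: Generate the alternative coding feature corresponding to the current quality level of the current frame from the first feature and the second feature.
Further, after the first feature of the current frame is obtained, the second feature, generated from the target quality level of the preceding frames of the current frame, is obtained. The second feature is similar to the first feature: it is data representing the frame coding feature of a preceding frame. Specifically, the preceding frames of the current frame are a first preset number of adjacent video frames before the current frame, for example the 30 video frames before it. For a specific implementation, see the embodiment corresponding to Figure 5: the set of video frames between the L-th frame and the M-th frame constitutes the preceding frames. More specifically, when the current frame is the second frame of the initial video data, its preceding frame is the first frame, and the second feature of the first frame is the frame coding feature generated from the default quality level; the details are not repeated. Then, based on the frame coding features of these frames (the second feature of the first frame and the first feature of the second frame), the mean and variance of the encoding parameters in the frame coding features are calculated to obtain the alternative coding feature of the second frame, that is, the alternative coding feature corresponding to the current quality level of the current frame. For the specific implementation, see the detailed description of obtaining the video coding feature in the embodiment shown in Figure 5, which is not repeated here. When the current frame is the third or a later frame of the initial video data, because the frames are processed as the current frame one after another, the target quality level (the optimized quality level) of the preceding frame (for example the second frame) has already been obtained by the time the third frame is processed. In that case the frame coding feature of the preceding frame, namely the second feature, is generated from the target quality level of that preceding frame.
In this embodiment, the second feature corresponding to the target quality level of the preceding frames of the current frame is obtained, and the alternative coding feature that represents video content complexity for the current quality level is generated from the second feature and the first feature. Because the second feature of the preceding frames is generated from the optimized target quality level, the alternative coding feature built from it reflects the video content complexity more accurately. This improves the accuracy of the alternative coding feature, and ultimately the accuracy of the target quality level derived from it and the efficiency of video encoding.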
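Step S208: Input the alternative coding feature into the predictive neural network model to obtain a corresponding first evaluation value and a corresponding second evaluation value, where the first evaluation value represents a video quality score based on Video Multimethod Assessment Fusion and the second evaluation value represents the video bitrate.
Step S209: If the current quality level is at the end of the quality level sequence, continue to step S210; otherwise, return to step S203.
Further, for each current frame in the loop, as the current quality level changes from round to round, the generated first feature changes with it, and so does the alternative coding feature generated for the current quality level. In the round for each current quality level, the alternative coding feature obtained in step S207 is input into the predictive neural network model, which outputs the first evaluation value and the second evaluation value. The first evaluation value represents a video quality score based on Video Multimethod Assessment Fusion, and the second evaluation value represents the video bitrate.
Video Multimethod Assessment Fusion (VMAF) is a video quality metric used to measure the perceived quality of streamed video at scale, and it addresses the inability of traditional metrics to reflect videos with varied scenes and characteristics. The specific implementation of VMAF is prior art, and the video bitrate can be obtained from the quantization parameter among the encoding parameters, so neither is described further here. Through the pre-trained predictive neural network model, the input video coding feature (the alternative coding feature) can be mapped to the corresponding VMAF score and video bitrate. Figure 8 is a schematic diagram of a process for generating the evaluation values corresponding to the current quality level of the current frame provided by an embodiment of the present disclosure. As shown in Figure 8, the video frames of the initial video data are traversed first, and for each video frame the quality levels are traversed. When the traversal reaches the current frame, the current quality level is obtained, the alternative coding feature corresponding to the current quality level is obtained following the steps of the embodiments above, and this feature is input into the predictive neural network model, which outputs the first evaluation value and the second evaluation value. The two evaluation values are then saved together with the corresponding alternative coding feature (and/or the current quality level) as one set of alternative-coding-feature-to-evaluation-value mapping data. After that, if the current quality level is at the end of the quality level sequence, all quality levels in the sequence have been traversed for the current frame, and step S210 is executed to select the target quality level from the multiple alternative quality levels. If the current quality level is not at the end of the sequence, the method returns to step S203, moves to the next quality level (updating the current quality level) and repeats the above process until every quality level in the sequence has been traversed.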
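Step S210: Generate the target quality level of the current frame based on the first evaluation value and the corresponding second evaluation value of each alternative coding feature of the current frame.
Because each alternative coding feature is generated from a quality level, each alternative coding feature corresponds to one quality level. After the first and second evaluation values for every quality level in the quality level sequence of the current frame are obtained, the quality levels in the sequence are evaluated on the basis of those values to find the best one, namely the target quality level.
Exemplarily, the specific implementation steps of step S210 include:
Step S2101: From the first evaluation values of the alternative coding features, obtain the first target coding features, a first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold.
Step S2102: From the second evaluation values of the first target coding features, determine the second target coding feature, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features.
Step S2103: Obtain the target quality level from the quality level corresponding to the second target coding feature.
Exemplarily, the first and second evaluation values represent the video quality and the bitrate of the encoded video respectively. The higher the first evaluation value, the higher the video quality; the higher the second evaluation value, the higher the bitrate and therefore the larger the video. To improve encoding efficiency, a quality level must be chosen that reduces the bitrate as much as possible while still meeting the preset video quality requirement.
To this end, in this embodiment the alternative coding features whose first evaluation value (video quality) is greater than the first threshold are first determined to be first target coding features. From these, the video coding feature with the smallest second evaluation value (lowest bitrate) is selected, and the quality level corresponding to that feature is taken as the target quality level. This process can be carried out using the alternative-coding-feature-to-evaluation-value mapping data saved in the earlier steps, and the details are not repeated.
In this embodiment, the VMAF indicator and the bitrate indicator are combined to obtain a matching target quality level, so that the target video encoded at the target quality level meets the preset video quality requirement while its bitrate is reduced. This compresses the video size and improves encoding efficiency.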
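Steps S2101 to S2103 amount to a threshold filter followed by a minimum search. The sketch below is a minimal illustration: the function name `select_target_cq` and the tuple layout are hypothetical, and returning `None` when no quality level clears the threshold is an assumption, since the source does not say what happens in that case.

```python
def select_target_cq(candidates, vmaf_threshold):
    """Pick the quality level with the lowest predicted bitrate among
    those whose predicted VMAF exceeds the threshold.

    candidates: iterable of (cq, predicted_vmaf, predicted_bitrate).
    """
    # S2101: keep only levels whose first evaluation value clears the threshold.
    passing = [c for c in candidates if c[1] > vmaf_threshold]
    if not passing:
        return None  # assumption: the fallback is not specified in the source
    # S2102: among those, take the smallest second evaluation value (bitrate).
    best = min(passing, key=lambda c: c[2])
    # S2103: its quality level is the target quality level.
    return best[0]
```

With illustrative predictions `[(20, 96.0, 5000), (25, 93.5, 3200), (30, 90.5, 2100), (35, 84.0, 1400)]` and a threshold of 90, level 35 is rejected on quality and level 30 wins on bitrate.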
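Step S211: Encode the current frame at the target quality level using the hardware video editor to generate a target frame that forms part of the target video.
Step S212: If the current frame is not the last frame of the initial video data, set the next frame after the current frame as the new current frame and return to step S202.
Exemplarily, after the target quality level is obtained, the current frame is encoded at that level to produce a target frame with better video quality and a lower bitrate. If the current frame is not the last frame of the initial video data, the method returns to step S202 and applies the same processing to the next video frame, until all video frames have been traversed and a target frame has been generated for each, together forming the target video. Because each video frame is dynamically encoded at its own target quality level, video quality can be improved while the video size is reduced, improving encoding efficiency.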
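The two nested loops of steps S202 to S212 can be sketched as follows. This is a simplified illustration, not the patented implementation: `probe` stands in for the hardware video editor's pre-encoding interface and `predict` for the trained model, both hypothetical callables supplied by the caller. The aggregated feature is reduced to means and variances of three parameters rather than the full 21 fields, the actual encoding call of step S211 is omitted, and the fallback to `default_cq` when no level passes the threshold is an assumption.

```python
from statistics import fmean, pvariance


def aggregate(group, cq):
    """Mean and variance of each encoding parameter over a frame window."""
    feature = {"cq": cq}
    for key in ("size", "satd", "qp"):
        values = [params[key] for params in group]
        feature[key + "_mean"] = fmean(values)
        feature[key + "_var"] = pvariance(values)
    return feature


def choose_cq_per_frame(frames, cq_levels, probe, predict,
                        vmaf_threshold, default_cq=25, window=30):
    """Return one target quality level per frame."""
    history = []   # encoding parameters of earlier frames at their target cq
    targets = []
    for frame in frames:                      # outer loop: video frames
        if not history:
            target = default_cq               # first frame uses the preset default
        else:
            scored = []
            for cq in cq_levels:              # inner loop: quality level sequence
                group = history[-window:] + [probe(frame, cq)]
                vmaf, bitrate = predict(aggregate(group, cq))
                scored.append((cq, vmaf, bitrate))
            passing = [s for s in scored if s[1] > vmaf_threshold]
            # Falling back to default_cq when nothing passes is an assumption.
            target = min(passing, key=lambda s: s[2])[0] if passing else default_cq
        targets.append(target)
        # The preceding-frame feature is rebuilt at the chosen target level.
        history.append(probe(frame, target))
    return targets
```

Keeping `history` at the chosen target level mirrors the point made above that the preceding-frame features come from optimized quality levels rather than from the level currently being tested.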
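Optionally, depending on specific needs, the method further includes, before step S208, a step of training the predictive neural network model. Exemplarily, as shown in Figure 9, training the predictive neural network model includes:
Step S2001: Obtain original video data and a quality level sequence, the quality level sequence including at least two different quality levels;
Step S2002: Based on the quality level sequence, process the original video data with the hardware video editor at each level in turn to obtain the video coding feature corresponding to each quality level;
Step S2003: Calculate the first evaluation value and the second evaluation value corresponding to each video coding feature;
Step S2004: Generate training samples from each video coding feature and its corresponding first and second evaluation values, and train a preset neural network model on the training samples to obtain the predictive neural network model.
Exemplarily, the original video data is a sample of video data and the quality level sequence is a preset parameter. By invoking the hardware video editor with the original video data and the quality level sequence as input parameters, the corresponding encoding parameters output by the hardware video editor are obtained, from which the video coding features are derived. The generation of the video coding features is described in the embodiments above and is not repeated here. Then, using the VMAF calculation method and the encoding parameters output by the hardware video editor, the first and second evaluation values corresponding to each video coding feature are obtained. The first and second evaluation values serve as sample labels and the original video data and level sequence as the raw samples, from which the training samples are generated. The preset neural network model is trained on the training samples until convergence to obtain the predictive neural network model. The details are not repeated.
In this embodiment, the ability of the hardware video editor to output encoding parameters quickly and conveniently (without actually encoding) is combined with the VMAF indicator to generate training samples. This enables efficient, high-quality training of the neural network model, lets the model converge quickly, and improves the accuracy and effectiveness of its predictions.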
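The sample-generation part of steps S2001 to S2004 can be sketched as a double loop over videos and quality levels. The names below are hypothetical: `extract_feature` stands in for the feature extraction through the hardware video editor, and `measure` for whatever computes the VMAF score and bitrate labels. The model training itself is left out because the source does not specify the network or the framework.

```python
def build_training_samples(videos, cq_levels, extract_feature, measure):
    """Return a list of (feature, (vmaf, bitrate)) training pairs."""
    samples = []
    for video in videos:
        for cq in cq_levels:
            feature = extract_feature(video, cq)         # S2002: coding feature at this level
            vmaf, bitrate = measure(video, cq)           # S2003: first and second evaluation values
            samples.append((feature, (vmaf, bitrate)))   # S2004: feature paired with its labels
    return samples
```

Each sample pairs one coding feature with two regression targets, which fits a model with a two-value output such as the one queried in step S208.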
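Corresponding to the video encoding method of the embodiments above, Figure 10 is a structural block diagram of the video encoding apparatus provided by an embodiment of the present disclosure. For ease of explanation, only the parts relevant to the embodiments of the present disclosure are shown. Referring to Figure 10, the video encoding apparatus 3 includes:
a parameter acquisition module 31, configured to extract video coding features of the initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data;
a parameter optimization module 32, configured to process the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, where the target quality level represents the picture quality level when the initial video is encoded in CQ-VBR mode;
an encoding module 33, configured to encode the initial video data at the target quality level using the hardware video editor to generate the target video.
In one embodiment of the present disclosure, the parameter acquisition module 31 is specifically configured to: obtain an alternative quality level for the initial video data; input the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data; and generate the corresponding video coding features based on the encoding parameters.
In one embodiment of the present disclosure, the initial video data includes multiple video frames, and when generating the corresponding video coding features based on the encoding parameters, the parameter acquisition module 31 is specifically configured to: obtain the encoding parameters corresponding to each video frame; obtain the coding feature mean value and the coding feature variance value from the encoding parameters corresponding to each video frame, where the coding feature mean value is the average of the encoding parameters corresponding to the video frames and the coding feature variance value is the variance of the encoding parameters corresponding to the video frames; and generate the video coding feature from the video parameters, the alternative quality level, and the coding feature mean value and coding feature variance value corresponding to the video frames.
In one embodiment of the present disclosure, when inputting the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data, the parameter acquisition module 31 is specifically configured to loop over the following steps until a preset condition is met: obtain the current frame of the initial video data; obtain the video parameters of the current frame and the quality level of the current frame; input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and set the next frame after the current frame as the new current frame.
In one embodiment of the present disclosure, after the encoding parameters corresponding to the current frame are obtained, the parameter acquisition module 31 is further configured to: generate the frame coding feature corresponding to the current frame from the video parameters of the current frame, the quality level of the current frame and the encoding parameters; and obtain the frame coding features of the preceding frames of the current frame, the preceding frames being a first preset number of adjacent video frames before the current frame, where the frame coding features of the preceding frames are generated based on the target quality levels corresponding to the preceding frames;
when generating the corresponding video coding features based on the encoding parameters, the parameter acquisition module 31 is specifically configured to: generate the video coding feature corresponding to the current frame from the frame coding feature corresponding to the current frame and the frame coding features of the preceding frames.
In one embodiment of the present disclosure, the encoding parameters include at least one of the following: frame type, frame size, image distortion and image fineness.
In one embodiment of the present disclosure, the video coding features of the initial video data include at least two alternative coding features, each corresponding to a different quality level, and the parameter optimization module 32 is specifically configured to: input each alternative coding feature in turn into the predictive neural network model to obtain a corresponding first evaluation value and a corresponding second evaluation value, where the first evaluation value represents a video quality score based on Video Multimethod Assessment Fusion and the second evaluation value represents the video bitrate; and generate the target quality level based on the first evaluation value and the corresponding second evaluation value of each alternative coding feature.
In one embodiment of the present disclosure, when generating the target quality level based on the first evaluation value and the corresponding second evaluation value of each alternative coding feature, the parameter optimization module 32 is specifically configured to: obtain the first target coding features from the first evaluation values of the alternative coding features, a first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determine the second target coding feature from the second evaluation values of the first target coding features, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtain the target quality level from the quality level corresponding to the second target coding feature.
In one embodiment of the present disclosure, before the video coding features of the initial video data are processed based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, the parameter optimization module 32 is further configured to: obtain original video data and a quality level sequence, the quality level sequence including at least two different quality levels; based on the quality level sequence, process the original video data with the hardware video editor at each level in turn to obtain the video coding feature corresponding to each quality level; calculate the first evaluation value and the second evaluation value corresponding to each video coding feature; and generate training samples from each video coding feature and its corresponding first and second evaluation values, and train a preset neural network model on the training samples to obtain the predictive neural network model.
The parameter acquisition module 31, the parameter optimization module 32 and the encoding module 33 are connected in sequence. The video encoding apparatus 3 provided by this embodiment can carry out the technical solutions of the method embodiments above; its implementation principles and technical effects are similar and are not repeated here.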
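Figure 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in Figure 11, the electronic device 4 includes:
a processor 41, and a memory 42 communicatively connected to the processor 41;
the memory 42 stores computer-executable instructions;
the processor 41 executes the computer-executable instructions stored in the memory 42 to implement the video encoding method of the embodiments shown in Figures 2 to 9.
Optionally, the processor 41 and the memory 42 are connected via a bus 43.
For related explanations, refer to the descriptions and effects of the corresponding steps in the embodiments of Figures 2 to 9, which are not repeated here.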
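Referring to Figure 12, it shows a schematic structural diagram of an electronic device 900 suitable for implementing the embodiments of the present disclosure. The electronic device 900 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (Portable Android Device, PAD), portable multimedia players (PMP) and vehicle-mounted terminals (for example vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 12 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Figure 12, the electronic device 900 may include a processing apparatus 901 (for example a central processing unit or a graphics processing unit), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. The RAM 903 also stores the various programs and data required for the operation of the electronic device 900. The processing apparatus 901, the ROM 902 and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following apparatuses can be connected to the I/O interface 905: input apparatuses 906 such as a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer and gyroscope; output apparatuses 907 such as a liquid crystal display (LCD), speaker and vibrator; storage apparatuses 908 such as magnetic tape and hard disk; and a communication apparatus 909. The communication apparatus 909 allows the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 12 shows an electronic device 900 with various apparatuses, it should be understood that not all of the illustrated apparatuses need to be implemented or present. More or fewer apparatuses may alternatively be implemented or present.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network through the communication apparatus 909, installed from the storage apparatus 908, or installed from the ROM 902. When the computer program is executed by the processing apparatus 901, the functions defined in the methods of the embodiments of the present disclosure are performed.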
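It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example (but not limited to), an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. A computer-readable signal medium, in the present disclosure, may include a data signal propagated in baseband or as part of a carrier wave that carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The computer-readable medium may be included in the electronic device described above, or it may exist separately without being installed in the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the embodiments above.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not in some cases limit the unit itself; for example, a first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".
The functions described above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD) and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.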
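In a first aspect, according to one or more embodiments of the present disclosure, a video encoding method is provided, including:
extracting video coding features of initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data; processing the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the picture quality level when the initial video is encoded in constant-quality variable-bit-rate mode; and encoding the initial video data at the target quality level using the hardware video editor to generate a target video.
According to one or more embodiments of the present disclosure, extracting the video coding features of the initial video data based on the hardware video editor includes: obtaining an alternative quality level for the initial video data; inputting the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data; and generating the corresponding video coding features based on the encoding parameters.
According to one or more embodiments of the present disclosure, the initial video data includes multiple video frames, and generating the corresponding video coding features based on the encoding parameters includes: obtaining the encoding parameters corresponding to each video frame; obtaining the coding feature mean value and the coding feature variance value from the encoding parameters corresponding to each video frame, wherein the coding feature mean value is the average of the encoding parameters corresponding to the video frames and the coding feature variance value is the variance of the encoding parameters corresponding to the video frames; and generating the video coding feature from the video parameters, the alternative quality level, and the coding feature mean value and coding feature variance value corresponding to the video frames.
According to one or more embodiments of the present disclosure, inputting the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data includes looping over the following steps until a preset condition is met: obtaining the current frame of the initial video data; obtaining the video parameters of the current frame and the quality level of the current frame; inputting the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and setting the next frame after the current frame as the new current frame.
According to one or more embodiments of the present disclosure, after the encoding parameters corresponding to the current frame are obtained, the method further includes: generating the frame coding feature corresponding to the current frame from the video parameters of the current frame, the quality level of the current frame and the encoding parameters; and obtaining the frame coding features of the preceding frames of the current frame, the preceding frames being a first preset number of adjacent video frames before the current frame, wherein the frame coding features of the preceding frames are generated based on the target quality levels corresponding to the preceding frames. Generating the corresponding video coding features based on the encoding parameters includes: generating the video coding feature corresponding to the current frame from the frame coding feature corresponding to the current frame and the frame coding features of the preceding frames.
According to one or more embodiments of the present disclosure, the encoding parameters include at least one of the following: frame type, frame size, image distortion and image fineness.
According to one or more embodiments of the present disclosure, the video coding features of the initial video data include at least two alternative coding features, each corresponding to a different quality level, and processing the video coding features of the initial video data based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data includes: inputting each alternative coding feature in turn into the predictive neural network model to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality score based on Video Multimethod Assessment Fusion and the second evaluation value represents the video bitrate; and generating the target quality level based on the first evaluation value and the corresponding second evaluation value of each alternative coding feature.
According to one or more embodiments of the present disclosure, generating the target quality level based on the first evaluation value and the corresponding second evaluation value of each alternative coding feature includes: obtaining the first target coding features from the first evaluation values of the alternative coding features, a first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determining the second target coding feature from the second evaluation values of the first target coding features, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtaining the target quality level from the quality level corresponding to the second target coding feature.
According to one or more embodiments of the present disclosure, before the video coding features of the initial video data are processed based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, the method further includes: obtaining original video data and a quality level sequence, the quality level sequence including at least two different quality levels; based on the quality level sequence, processing the original video data with the hardware video editor at each level in turn to obtain the video coding feature corresponding to each quality level; calculating the first evaluation value and the second evaluation value corresponding to each video coding feature; and generating training samples from each video coding feature and its corresponding first and second evaluation values, and training a preset neural network model on the training samples to obtain the predictive neural network model.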
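In a second aspect, according to one or more embodiments of the present disclosure, a video encoding apparatus is provided, including:
a parameter acquisition module, configured to extract video coding features of initial video data based on a hardware video editor, the video coding features representing the video content complexity of the initial video data;
a parameter optimization module, configured to process the video coding features of the initial video data based on a pre-trained predictive neural network model to obtain a target quality level corresponding to the initial video data, wherein the target quality level represents the picture quality level when the initial video is encoded in constant-quality variable-bit-rate mode;
an encoding module, configured to encode the initial video data at the target quality level using the hardware video editor to generate a target video.
According to one or more embodiments of the present disclosure, the parameter acquisition module is specifically configured to: obtain an alternative quality level for the initial video data; input the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data; and generate the corresponding video coding features based on the encoding parameters.
According to one or more embodiments of the present disclosure, the initial video data includes multiple video frames, and when generating the corresponding video coding features based on the encoding parameters, the parameter acquisition module is specifically configured to: obtain the encoding parameters corresponding to each video frame; obtain the coding feature mean value and the coding feature variance value from the encoding parameters corresponding to each video frame, wherein the coding feature mean value is the average of the encoding parameters corresponding to the video frames and the coding feature variance value is the variance of the encoding parameters corresponding to the video frames; and generate the video coding feature from the video parameters, the alternative quality level, and the coding feature mean value and coding feature variance value corresponding to the video frames.
According to one or more embodiments of the present disclosure, when inputting the video parameters of the initial video data and the alternative quality level into the hardware video editor to obtain the encoding parameters of the initial video data, the parameter acquisition module is specifically configured to loop over the following steps until a preset condition is met: obtain the current frame of the initial video data; obtain the video parameters of the current frame and the quality level of the current frame; input the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame; and set the next frame after the current frame as the new current frame.
According to one or more embodiments of the present disclosure, after the encoding parameters corresponding to the current frame are obtained, the parameter acquisition module is further configured to: generate the frame coding feature corresponding to the current frame from the video parameters of the current frame, the quality level of the current frame and the encoding parameters; and obtain the frame coding features of the preceding frames of the current frame, the preceding frames being a first preset number of adjacent video frames before the current frame, wherein the frame coding features of the preceding frames are generated based on the target quality levels corresponding to the preceding frames. When generating the corresponding video coding features based on the encoding parameters, the parameter acquisition module is specifically configured to: generate the video coding feature corresponding to the current frame from the frame coding feature corresponding to the current frame and the frame coding features of the preceding frames.
According to one or more embodiments of the present disclosure, the encoding parameters include at least one of the following: frame type, frame size, image distortion and image fineness.
According to one or more embodiments of the present disclosure, the video coding features of the initial video data include at least two alternative coding features, each corresponding to a different quality level, and the parameter optimization module is specifically configured to: input each alternative coding feature in turn into the predictive neural network model to obtain a corresponding first evaluation value and a corresponding second evaluation value, wherein the first evaluation value represents a video quality score based on Video Multimethod Assessment Fusion and the second evaluation value represents the video bitrate; and generate the target quality level based on the first evaluation value and the corresponding second evaluation value of each alternative coding feature.
According to one or more embodiments of the present disclosure, when generating the target quality level based on the first evaluation value and the corresponding second evaluation value of each alternative coding feature, the parameter optimization module is specifically configured to: obtain the first target coding features from the first evaluation values of the alternative coding features, a first target coding feature being an alternative coding feature whose first evaluation value is greater than a first threshold; determine the second target coding feature from the second evaluation values of the first target coding features, the second target coding feature being the video coding feature with the smallest second evaluation value among the first target coding features; and obtain the target quality level from the quality level corresponding to the second target coding feature.
According to one or more embodiments of the present disclosure, before the video coding features of the initial video data are processed based on the pre-trained predictive neural network model to obtain the target quality level corresponding to the initial video data, the parameter optimization module is further configured to: obtain original video data and a quality level sequence, the quality level sequence including at least two different quality levels; based on the quality level sequence, process the original video data with the hardware video editor at each level in turn to obtain the video coding feature corresponding to each quality level; calculate the first evaluation value and the second evaluation value corresponding to each video coding feature; and generate training samples from each video coding feature and its corresponding first and second evaluation values, and train a preset neural network model on the training samples to obtain the predictive neural network model.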
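In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: a processor, and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the video encoding method according to the first aspect and the various possible designs of the first aspect.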
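In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, the computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video encoding method according to the first aspect and the various possible designs of the first aspect.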
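In a fifth aspect, an embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the video encoding method according to the first aspect and the various possible designs of the first aspect.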
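The above description is merely of preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.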
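In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.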
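Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.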

Claims (13)

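  1. A video encoding method, including:
    extracting a video encoding feature of initial video data based on a hardware video editor, the video encoding feature representing the video content complexity of the initial video data;
    processing the video encoding feature of the initial video data based on a pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, where the target quality level represents the picture quality level used when encoding the initial video in a constant-quality, variable-bitrate mode;
    encoding the initial video data at the target quality level using the hardware video editor, to generate a target video.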
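  2. The method according to claim 1, where extracting the video encoding feature of the initial video data based on the hardware video editor includes:
    acquiring a candidate quality level for the initial video data;
    inputting the video parameters of the initial video data and the candidate quality level into the hardware video editor to obtain encoding parameters of the initial video data;
    generating the corresponding video encoding feature based on the encoding parameters.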
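  3. The method according to claim 2, where the initial video data includes a plurality of video frames, and generating the corresponding video encoding feature based on the encoding parameters includes:
    acquiring the encoding parameters corresponding to each video frame;
    obtaining an encoding feature mean and an encoding feature variance from the encoding parameters corresponding to the video frames, where the encoding feature mean is the average of the encoding parameters corresponding to the video frames, and the encoding feature variance is the variance of the encoding parameters corresponding to the video frames;
    generating the video encoding feature from the video parameters, the candidate quality level, and the encoding feature mean and encoding feature variance corresponding to the video frames.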
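  4. The method according to claim 2, where inputting the video parameters of the initial video data and the candidate quality level into the hardware video editor to obtain the encoding parameters of the initial video data includes:
    executing the following steps in a loop until a preset condition is reached:
    acquiring the current frame of the initial video data;
    obtaining the video parameters of the current frame and the quality level of the current frame;
    inputting the video parameters of the current frame and the quality level of the current frame into the hardware video editor to obtain the encoding parameters corresponding to the current frame;
    setting the frame following the current frame as the new current frame.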
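  5. The method according to claim 4, further including, after the encoding parameters corresponding to the current frame are obtained:
    generating a frame encoding feature corresponding to the current frame from the video parameters of the current frame, the quality level of the current frame, and the encoding parameters;
    acquiring the frame encoding features of the preceding frames corresponding to the current frame, the preceding frames being a first preset number of video frames immediately before the current frame, where the frame encoding features of the preceding frames are generated based on the target quality levels corresponding to the preceding frames;
    where generating the corresponding video encoding feature based on the encoding parameters includes:
    generating the video encoding feature corresponding to the current frame from the frame encoding feature corresponding to the current frame and the frame encoding features of the preceding frames.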
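  6. The method according to claim 2, where the encoding parameters include at least one of the following:
    frame type, frame size, image distortion, and image fineness.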
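  7. The method according to claim 1, where the video encoding feature of the initial video data includes at least two candidate encoding features, each corresponding to a different quality level, and processing the video encoding feature of the initial video data based on the pre-trained prediction neural network model to obtain the target quality level corresponding to the initial video data includes:
    inputting the candidate encoding features into the prediction neural network model in turn to obtain a corresponding first evaluation value and a corresponding second evaluation value, where the first evaluation value represents a video quality score based on video multi-method assessment fusion, and the second evaluation value represents the video bitrate;
    generating the target quality level based on the first evaluation value and the second evaluation value corresponding to each candidate encoding feature.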
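  8. The method according to claim 7, where generating the target quality level based on the first evaluation value and the second evaluation value corresponding to each candidate encoding feature includes:
    obtaining first target encoding features according to the first evaluation value corresponding to each candidate encoding feature, the first target encoding features being the candidate encoding features whose first evaluation value is greater than a first threshold;
    determining a second target encoding feature according to the second evaluation values of the first target encoding features, the second target encoding feature being the video encoding feature with the smallest second evaluation value among the first target encoding features;
    obtaining the target quality level according to the quality level corresponding to the second target encoding feature.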
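  9. The method according to claim 7, further including, before processing the video encoding feature of the initial video data based on the pre-trained prediction neural network model to obtain the target quality level corresponding to the initial video data:
    acquiring original video data and a quality level sequence, the quality level sequence including at least two different quality levels;
    processing the original video data with the hardware video editor for each quality level in the sequence in turn, to obtain the video encoding feature corresponding to each quality level;
    calculating the first evaluation value and the second evaluation value corresponding to each video encoding feature;
    generating training samples from each video encoding feature and its corresponding first and second evaluation values, and training a preset neural network model on the training samples to obtain the prediction neural network model.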
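  10. A video encoding apparatus, including:
    a parameter acquisition module, configured to extract a video encoding feature of initial video data based on a hardware video editor, the video encoding feature representing the video content complexity of the initial video data;
    a parameter optimization module, configured to process the video encoding feature of the initial video data based on a pre-trained prediction neural network model to obtain a target quality level corresponding to the initial video data, where the target quality level represents the picture quality level used when encoding the initial video in a constant-quality, variable-bitrate mode;
    an encoding module, configured to encode the initial video data at the target quality level using the hardware video editor, to generate a target video.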
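  11. An electronic device, including: a processor, and a memory communicatively connected to the processor;
    the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory to implement the video encoding method according to any one of claims 1 to 9.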
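  12. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video encoding method according to any one of claims 1 to 9.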
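  13. A computer program product, including a computer program which, when executed by a processor, implements the video encoding method according to any one of claims 1 to 9.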
PCT/CN2023/136507 2022-12-07 2023-12-05 视频编码方法、装置、电子设备及存储介质 Ceased WO2024120396A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2024573687A JP2025522455A (ja) 2022-12-07 2023-12-05 ビデオ符号化方法、装置、電子機器及び記憶媒体
EP23899971.8A EP4525442A4 (en) 2022-12-07 2023-12-05 VIDEO CODING METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIA

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211567316.XA CN118200571A (zh) 2022-12-07 2022-12-07 视频编码方法、装置、电子设备及存储介质
CN202211567316.X 2022-12-07

Publications (1)

Publication Number Publication Date
WO2024120396A1 true WO2024120396A1 (zh) 2024-06-13

Family

ID=91378574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/136507 Ceased WO2024120396A1 (zh) 2022-12-07 2023-12-05 视频编码方法、装置、电子设备及存储介质

Country Status (4)

Country Link
EP (1) EP4525442A4 (zh)
JP (1) JP2025522455A (zh)
CN (1) CN118200571A (zh)
WO (1) WO2024120396A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118573870A (zh) * 2024-08-01 2024-08-30 北京宏远智控技术有限公司 一种视频编码方法、装置、设备以及存储介质
CN119011777A (zh) * 2024-08-12 2024-11-22 珠海城市管道燃气有限公司 燃气抢险处理方法和系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101868270B1 (ko) * 2017-02-28 2018-06-15 재단법인 다차원 스마트 아이티 융합시스템 연구단 싱글 패스 일관 화질 제어를 기반으로 하는 컨텐츠 인식 비디오 인코딩 방법, 컨트롤러 및 시스템
CN111246209A (zh) * 2020-01-20 2020-06-05 北京字节跳动网络技术有限公司 自适应编码方法、装置、电子设备及计算机存储介质
CN112399176A (zh) * 2020-11-17 2021-02-23 深圳大学 一种视频编码方法、装置、计算机设备及存储介质
CN113329226A (zh) * 2021-05-28 2021-08-31 北京字节跳动网络技术有限公司 数据的生成方法、装置、电子设备及存储介质
CN114845106A (zh) * 2021-02-01 2022-08-02 北京大学深圳研究生院 视频编码方法、装置和存储介质及电子设备
CN115209150A (zh) * 2022-09-16 2022-10-18 沐曦科技(成都)有限公司 一种视频编码参数获取方法、装置、网络模型及电子设备

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111083473B (zh) * 2019-12-28 2022-03-08 杭州当虹科技股份有限公司 一种基于机器学习的内容自适应视频编码方法
CN111263154B (zh) * 2020-01-22 2022-02-11 腾讯科技(深圳)有限公司 一种视频数据处理方法、装置及存储介质
BR112023019978A2 (pt) * 2021-05-28 2023-11-21 Deepmind Tech Ltd Treinamento de redes neurais de controle de taxa por meio de aprendizado por reforço
US12206914B2 (en) * 2021-06-12 2025-01-21 Google Llc Methods, systems, and media for determining perceptual quality indicators of video content items
CN114554211B (zh) * 2022-01-14 2025-01-28 百果园技术(新加坡)有限公司 内容自适应视频编码方法、装置、设备和存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4525442A4

Also Published As

Publication number Publication date
EP4525442A4 (en) 2025-10-29
EP4525442A1 (en) 2025-03-19
JP2025522455A (ja) 2025-07-15
CN118200571A (zh) 2024-06-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23899971

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023899971

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023899971

Country of ref document: EP

Effective date: 20241210

WWE Wipo information: entry into national phase

Ref document number: 2024573687

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE