WO2022228375A1 - Video encoding method, apparatus and electronic device - Google Patents
- Publication number
- WO2022228375A1 (PCT/CN2022/088950)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- bits
- encoding
- frame
- ratio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
- H04N19/115—Selection of the code volume for a coding unit prior to coding
- H04N19/124—Quantisation
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
- H04N19/15—Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
- H04N19/177—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
- H04N19/184—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
Definitions
- The present application belongs to the field of communication technologies, and in particular relates to a video encoding method, apparatus and electronic device.
- Video coding is a data compression method for digital video. Its goal is to remove the redundancy in the original video images, reduce storage and transmission costs, and improve the quality of the encoded video.
- A video can be encoded using the JVT-G012 rate control algorithm of the H.264/AVC video coding standard.
- The JVT-G012 rate control algorithm implements three-level rate control at the GOP (Group of Pictures) level, the frame level, and the macroblock level, and its control functions are comprehensive.
- However, the JVT-G012 rate control algorithm allocates bits evenly to the P-frame images in a picture group. That is, it does not consider frame-level coding complexity when allocating bits, which may cause the peak signal-to-noise ratio (PSNR) curve of the frame images in the GOP to fluctuate and lower the average PSNR of the entire video sequence. As a result, the quality of the encoded video is poor.
- The purpose of the embodiments of the present application is to provide a video encoding method, apparatus and electronic device that can solve the problem that coding complexity is not considered at the frame level, which lowers the average PSNR of the coded video and thereby results in poor quality of the coded video.
- In a first aspect, an embodiment of the present application provides a video encoding method, the method comprising: determining, according to a first ratio, a first number of bits, and a first number, a second number of bits for encoding a first image; and encoding the first image based on the second number of bits; wherein the first ratio is the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images, the first image is the first uncoded frame image in the target image group, and the M frames of second images are coded images in the target image group; the first number of bits is the number of bits remaining in the target image group, the first number is the number of uncoded images in the target image group, and M is an integer greater than 1.
- An embodiment of the present application provides a video encoding apparatus, the apparatus comprising a determination module and an encoding module; the determination module is used for determining, according to a first ratio, a first number of bits, and a first number, a second number of bits for encoding a first image; the encoding module is used for encoding the first image based on the second number of bits determined by the determination module; wherein the first ratio is the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images, the first image is the first uncoded frame image in the target image group, and the M frames of second images are coded images in the target image group; the first number of bits is the number of bits remaining in the target image group, the first number is the number of uncoded images in the target image group, and M is an integer greater than 1.
- An embodiment of the present application provides an electronic device, the electronic device comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
- An embodiment of the present application provides a readable storage medium storing a program or an instruction which, when executed by a processor, implements the steps of the method according to the first aspect.
- An embodiment of the present application provides a chip, the chip comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the method according to the first aspect.
- In the embodiments of the present application, the second number of bits for encoding the first image may be determined according to the first ratio, the first number of bits, and the first number, and the first image is encoded based on the second number of bits; wherein the first ratio is the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images, the first image is the first uncoded frame image in the target image group, the M frames of second images are coded images in the target image group, the first number of bits is the number of bits remaining in the target image group, the first number is the number of uncoded images in the target image group, and M is an integer greater than 1.
- Because the first ratio indicates the relative coding complexity between the first image and the M frames of coded second images in the target image group, the video encoding method provided by the embodiments of the present application can allocate bits to the image to be coded according to the relative coding complexity between the image to be coded and the coded images, the number of bits remaining in the target image group, and the number of frames remaining in the target image group. Coding bits can therefore be saved on images with low coding complexity and spent on images with high coding complexity, so that, while the average coding rate stays close to the target code rate, the fluctuation of the PSNR curve of the frame images in the image group is reduced, thereby improving the quality of the encoded video.
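The allocation idea described above can be sketched as follows. This is only an illustrative reading, not the formula of the embodiments: it assumes that the "actual coding complexity of M frames" is taken as the mean over those frames, and that the second number of bits is the first ratio multiplied by an even share of the remaining bits. The function and parameter names are hypothetical.

```python
from statistics import mean


def allocate_frame_bits(predicted_complexity, coded_complexities,
                        remaining_bits, remaining_frames):
    """Sketch: bit budget for the first uncoded frame of a GOP.

    Assumption (not stated in the text): the M coded frames are
    summarised by the mean of their actual complexities.
    """
    if len(coded_complexities) < 2:
        raise ValueError("M must be an integer greater than 1")
    if remaining_frames < 1:
        raise ValueError("there must be at least one uncoded frame")
    # First ratio: predicted complexity of the frame to be coded,
    # relative to the complexity of the frames already coded.
    first_ratio = predicted_complexity / mean(coded_complexities)
    # Scale the even share of the remaining bits by that ratio.
    return first_ratio * remaining_bits / remaining_frames


# A frame predicted to be 20% more complex than the coded ones
# receives 20% more than the even share of 10000 bits.
budget = allocate_frame_bits(12.0, [10.0, 10.0], 60000, 6)
```

With a ratio below 1 the same function returns less than the even share, which is how bits saved on simple frames become available to complex ones.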
- FIG. 1 is a basic framework diagram of rate control in video coding.
- FIG. 3 is a general structure diagram of the rate control algorithm.
- FIG. 4 is a flowchart of a video encoding method provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of a video encoding apparatus provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of the present application.
- FIG. 7 is a schematic hardware diagram of an electronic device provided by an embodiment of the present application.
- The terms “first”, “second” and the like in the description and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances, so that embodiments of the application can be practiced in sequences other than those illustrated or described herein.
- The objects distinguished by “first”, “second”, etc. are usually of one type, and the number of objects is not limited.
- For example, the first object may be one or more than one.
- In addition, “and/or” in the description and claims indicates at least one of the connected objects, and the character “/” generally indicates that the associated objects are in an “or” relationship.
- Basic unit (BU): a set of one or more macroblocks (MB).
- The number of MBs contained in a BU should evenly divide the number of MBs contained in a frame of image. For example, in a video sequence in QCIF format, if a frame of image contains 99 MBs, then a BU of the image can contain 99, 33, 11, 9, 3, or 1 MB, so that the image contains 1, 3, 9, 11, 33, or 99 BUs, respectively.
- One BU can be one MB, one slice, one field, or one frame of picture.
- All macroblocks in one basic unit are coded using the same quantization parameter (QP).
- A larger BU is usually selected; for example, all MBs in one line of an image form one BU, or a whole frame of image is used as one BU.
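The divisibility constraint on BU size can be checked with a few lines of Python. The helper name `valid_bu_sizes` is hypothetical; the 99-MB frame is the QCIF example from the text.

```python
def valid_bu_sizes(mbs_per_frame):
    """Return every BU size (in macroblocks) that tiles a frame exactly."""
    return [n for n in range(1, mbs_per_frame + 1) if mbs_per_frame % n == 0]


# QCIF example: 99 macroblocks per frame.
sizes = valid_bu_sizes(99)
# Number of BUs per frame for each valid BU size.
bus_per_frame = {size: 99 // size for size in sizes}
```

Here `sizes` holds the six BU sizes listed above, and `bus_per_frame` pairs each size with the resulting BU count.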
- Traffic round-trip model: used to calculate the target bits allocated to the current frame image, that is, the number of bits allocated to the current frame image.
- $N$ represents the number of images included in a GOP in a video sequence, and $N$ is an integer greater than 1;
- $A(n_{i,j})$ is the actual number of bits generated by coding image $n_{i,j}$;
- $u(n_{i,j-1})$ is the instantaneous channel bandwidth before coding image $n_{i,j}$;
- $F_r$ is the encoding frame rate;
- $B_s$ is the buffer size;
- the maximum occupancy of the buffer is determined by different profiles and levels;
- $a_0$ is a constant, usually with the value 8;
- Buffer: also known as a buffer register; it temporarily stores data sent by peripherals (such as encoders) so that the data can be transmitted at the channel bandwidth.
- The buffer referred to in the embodiments of the present application is the storage area of this buffer register.
- MAD linear prediction model: used to predict the MAD of the jth frame image from the actual MAD of the (j-1)th frame image, or to predict the MAD of the basic unit at the corresponding position in the jth frame image from the MAD of a basic unit in the (j-1)th frame image, where j is an integer greater than 1.
- The MAD linear prediction model can be expressed by the following formula (2):
- $MAD_{cb} = a_1 \cdot MAD_{pb} + a_2$;
- $a_1$ and $a_2$ are the two parameters of the MAD linear prediction model. Their initial values are set to 1 and 0 respectively, and they are updated after each BU is encoded. It should be noted that $a_1$ and $a_2$ can be updated according to the difference between the predicted MAD value and the actual MAD value; the specific method can be determined according to actual use requirements and is not specifically limited here.
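A minimal sketch of formula (2) and of one possible parameter update follows. The text leaves the update method open, so the ordinary least-squares refit used here is an assumption chosen for illustration, and the function names are hypothetical.

```python
def predict_mad(prev_mad, a1=1.0, a2=0.0):
    """Formula (2): predicted MAD of the current unit from the previous one."""
    return a1 * prev_mad + a2


def refit_parameters(prev_mads, actual_mads):
    """Assumed update rule: least-squares fit of actual = a1 * prev + a2.

    Falls back to the initial values (1, 0) when there is too little
    data or the previous MADs are all identical.
    """
    n = len(prev_mads)
    if n < 2 or n != len(actual_mads):
        return 1.0, 0.0
    mean_x = sum(prev_mads) / n
    mean_y = sum(actual_mads) / n
    sxx = sum((x - mean_x) ** 2 for x in prev_mads)
    if sxx == 0:
        return 1.0, 0.0
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(prev_mads, actual_mads))
    a1 = sxy / sxx
    a2 = mean_y - a1 * mean_x
    return a1, a2


# With the initial parameters the prediction equals the previous MAD.
first_guess = predict_mad(4.0)
# After observing a few (previous, actual) pairs, refit the line.
a1, a2 = refit_parameters([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
```

Other update rules (for example a sliding window over recent units) fit the same interface.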
- MAD of the image: the mean absolute difference between the YUV value (e.g., the Y value) of the current frame image and the YUV value (e.g., the Y value) of the previous frame image (which should be a P-frame image or an I-frame image).
- Y in YUV represents brightness (luminance, or luma); U and V represent chrominance (chroma) and describe the color and saturation of the image.
- The MAD of a basic unit BU is the mean absolute difference between the YUV value of one BU and the YUV value of another BU, where the other BU lies in the frame preceding the frame of the one BU (for example, the one BU is in the jth frame image and the other BU is in the (j-1)th frame image), the coordinate information of the one BU in the jth frame image is the same as that of the other BU in the (j-1)th frame image, the jth and (j-1)th frame images belong to the same image group, and j is an integer greater than 1.
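As a small illustration of the definition above, the MAD of two co-located sample blocks can be computed as below. The sketch assumes the luma (Y) samples of each block are supplied as flat lists of equal length; the function name is hypothetical.

```python
def mean_abs_diff(current_samples, previous_samples):
    """Mean absolute difference between two co-located sample blocks."""
    if not current_samples or len(current_samples) != len(previous_samples):
        raise ValueError("blocks must be non-empty and of equal size")
    total = sum(abs(c - p) for c, p in zip(current_samples, previous_samples))
    return total / len(current_samples)


# Y samples of a tiny 2x2 block in frame j and in frame j-1.
mad = mean_abs_diff([10, 20, 30, 40], [12, 18, 30, 44])
```

The same function applies to a whole frame or to a single BU, since only the set of samples changes.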
- The transmission bandwidth available to a video signal is usually limited.
- Rate control selects appropriate encoding parameters, such as the quantization parameter QP, and encodes the corresponding image according to them, so that the encoded bit rate of the video signal satisfies the bandwidth limitation while the coding distortion is kept as small as possible.
- Rate control is a typical rate-distortion optimization problem with multiple constraints and multiple objectives.
- $N$ is the number of images contained in a video sequence;
- $D_i$ is the coding distortion of the ith frame image in the video sequence;
- $R_i$ is the number of encoded bits of the ith frame image in the video sequence;
- the quantity solved for is the set of optimal encoding parameters (i.e., quantization parameters QP) of the frame images in the video sequence, namely the optimal encoding parameter of the first frame image, of the second frame image, ..., and of the Nth frame image;
- $R_c$ is the target number of coding bits of the video sequence.
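The constrained problem that these symbols describe can be written compactly as below. This is a sketch of the standard rate-distortion formulation under assumed notation: $Q_i$ for the encoding parameter of frame $i$ and $Q_i^*$ for its optimum are symbols introduced here for illustration and do not appear in the text.

```latex
\[
(Q_1^{*}, Q_2^{*}, \ldots, Q_N^{*})
  = \operatorname*{arg\,min}_{(Q_1, \ldots, Q_N)} \sum_{i=1}^{N} D_i(Q_i)
  \quad \text{subject to} \quad
  \sum_{i=1}^{N} R_i(Q_i) \le R_c
\]
```

Read this as: choose one quantization parameter per frame so that total distortion is minimal while the total number of coded bits does not exceed the target $R_c$.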
- A video sequence can be encoded by an encoder, and the encoded bit stream usually needs to be transmitted over a communication channel. Most communication channels in practical applications are constant bit rate (CBR) channels, whereas most of the code streams output by encoders are variable bit rate (VBR) code streams. Therefore, in order to transmit a VBR code stream effectively over a CBR channel, a buffer can be placed at the output of the encoder.
- FIG. 2 is a schematic diagram of a buffer.
- "a" in Figure 2 represents the encoded bit stream output from the video encoder to the buffer;
- $B_s$ represents the buffer size;
- $B_c$ (that is, the filled area in Figure 2) is the number of bits in the buffer waiting to be sent;
- $C_b$ is the channel bandwidth;
- $F_r$ is the encoding frame rate;
- $C_b/F_r$ is the amount of data transmitted by the communication channel during the time in which the encoder encodes one frame of image.
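The interplay of these quantities can be illustrated with a short simulation: each coded frame adds its bits to the buffer, and the channel removes $C_b/F_r$ bits per frame interval. Clamping the occupancy to the range from 0 to $B_s$ is a simplifying assumption of this sketch (a real encoder would react to overflow, for example by skipping a frame); the function name and sample numbers are hypothetical.

```python
def simulate_buffer(frame_bits, channel_bandwidth, frame_rate,
                    buffer_size, initial_occupancy=0.0):
    """Track buffer occupancy Bc after each coded frame."""
    drained_per_frame = channel_bandwidth / frame_rate  # Cb / Fr
    occupancy = initial_occupancy
    trace = []
    for bits in frame_bits:
        # Bits in from the encoder, bits out over the channel,
        # clamped to the physical limits of the buffer (assumption).
        occupancy = min(max(occupancy + bits - drained_per_frame, 0.0),
                        buffer_size)
        trace.append(occupancy)
    return trace


# One large I frame followed by two small P frames on a
# 300 kbit/s channel at 30 frames per second.
trace = simulate_buffer([30000, 5000, 5000], 300000, 30, 40000)
```

The trace rises after the large frame and then falls as the smaller frames are coded, matching the behaviour of $B_c$ over a GOP.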
- Bit allocation means allocating limited resources to picture units such as groups of pictures, frames, and macroblocks.
- Quantization parameter estimation means estimating, from the resource allocated to a picture unit (that is, its number of bits), the optimal coding parameter for that resource, so as to minimize the distortion of the encoded video.
- On the one hand, the rate control algorithm requires that the encoded code stream be suitable for transmission on a band-limited channel (such as a CBR channel); on the other hand, it requires the best possible video quality under the limited channel transmission bandwidth.
- Video quality is usually judged on two aspects: one is the average PSNR of all frames in the entire sequence, where a video sequence with a higher average PSNR has better quality; the other is the change of the PSNR curve over the encoding of the video sequence, where a video sequence with a smoother PSNR curve has better quality.
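Both criteria can be computed from per-frame PSNR values, as in the sketch below. The PSNR definition is the standard one for 8-bit samples; using the population standard deviation as the measure of how smooth the PSNR curve is constitutes an assumption of this example, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized sample lists."""
    if not original or len(original) != len(decoded):
        raise ValueError("sample lists must be non-empty and of equal size")
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def sequence_quality(frame_psnrs):
    """Return (average PSNR, fluctuation) for a sequence of frames."""
    return mean(frame_psnrs), pstdev(frame_psnrs)


average, fluctuation = sequence_quality([30.0, 32.0, 34.0])
```

A higher `average` and a lower `fluctuation` both indicate better perceived quality under the two criteria above.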
- The above two problems are studied at three levels: the GOP layer, the frame layer and the BU layer.
- "Three-layer, two-step" rate control is usually performed in units of GOPs, as shown in Figure 3.
- A GOP usually starts with an I frame coded with intra-frame prediction, followed by several P frames and/or B frames coded with inter-frame prediction.
- The I frame is the key frame of the GOP and uses intra-frame compression.
- The picture of the I frame is preserved completely, and only the data of this frame is needed to decode it.
- A P frame is a forward-predicted frame, also known as a difference frame, and uses inter-frame compression. An encoded P frame represents the difference between the current frame and the preceding I frame or P frame; to decode a P frame, the previously buffered picture of the preceding P frame or I frame is superimposed with the difference information carried by this frame to reconstruct the picture of the current frame.
- A B frame is a bidirectional difference frame; that is, an encoded B frame records the differences between the current frame and both the preceding and the following frames. In other words, decoding a B frame requires not only the previously buffered picture but also the decoded following picture; the picture of the current frame is reconstructed from the preceding and following frames together with the encoded data of the current frame.
- The amount of data generated by encoding an I frame is much larger than that generated by encoding a P frame or a B frame. Therefore, after an I frame is encoded the buffer occupancy $B_c$ reaches a high level, and $B_c$ gradually decreases while the P frames and B frames that follow the I frame are encoded. After all images in one GOP are encoded, the buffer occupancy can return to the level it had before the GOP was encoded.
- The rate control algorithm allocates coding resources from top to bottom and determines the quantization parameter QP according to the number of available coded bits.
- The main task of GOP layer rate control is to allocate the number of coded bits for the entire GOP. The allocation is based on the number of frames contained in the current GOP, the occupancy of the encoder output buffer, and the channel bandwidth. The QP of the I frame that starts the GOP must then be calculated; calculating the I-frame QP is the process of dividing coding resources between the intra-predicted frame and the inter-predicted frames.
- The I-frame QP of each GOP is calculated from the average QP of all P frames in the previous GOP; for the first GOP, the I-frame QP can be selected empirically.
- Frame layer rate control is an important link in video coding; both GOP layer rate control and BU layer rate control are carried out around frame layer rate control.
- In frame layer rate control, the coded bits are allocated as target bits among the P frames within the GOP, and the QP of the current frame is then estimated from the number of allocated bits.
- In BU layer rate control, the main task is to make the actual number of bits generated by encoding match the target number of bits by setting an appropriate QP for each MB in the frame.
- The following takes the JVT-F086 and JVT-G012 rate control algorithms recommended for H.264/AVC video coding as examples to illustrate conventional rate control methods.
- The JVT-F086 rate control algorithm is based on the MPEG-2 TM5 rate model; it allocates bits according to the buffer state and tries to ensure that the buffer neither overflows nor underflows.
- Before encoding a frame of image, the algorithm first estimates the number of bits required to encode the frame, then presets a QP according to the buffer feedback and encodes the frame image with that QP. Based on the actual encoding result of the current frame image, it then judges whether the preset QP needs to be adjusted. If so, the QP is adjusted and the frame image is re-encoded with the adjusted QP. That is, in the JVT-F086 rate control algorithm every frame requires a decision on whether to assign a new QP and re-encode the frame with it, so the computational complexity of JVT-F086 is relatively high.
- The JVT-F086 rate control algorithm controls the bit rate in terms of buffer fullness. It controls the buffer well and the buffer occupancy changes smoothly, but the quality of the encoded video fluctuates greatly.
- The JVT-G012 rate control algorithm inherits the idea of the MPEG-4 VM8 rate control algorithm, follows the quadratic rate-distortion model, and can adjust the model parameters in time according to the characteristics of the source.
- The key technologies of the JVT-G012 rate control algorithm include the Traffic round-trip model, the MAD linear prediction model and the quadratic rate-distortion model.
- The JVT-G012 rate control algorithm allocates target coding bits to the current frame according to the predefined bit rate, frame rate, buffer fullness and buffer target line, then uses linear tracking theory to predict the MAD of the current frame image, and finally uses the rate-distortion model to calculate the QP of the current frame image.
- The JVT-G012 rate control algorithm uses MAD prediction to solve the QP paradox and, unlike the JVT-F086 rate control algorithm, needs to encode each frame only once, so its computational complexity is low. Further, the JVT-G012 rate control algorithm implements three-level rate control at the GOP layer, the frame layer, and the macroblock layer, and its control functions are relatively comprehensive.
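The last step of that pipeline, turning a bit target and a predicted MAD into a quantizer, can be sketched as follows. The text names the quadratic rate-distortion model without writing it out, so this example assumes its commonly used form, bits = c1·MAD/Qstep + c2·MAD/Qstep², ignores header bits, and maps Qstep to QP with the usual H.264 relation in which Qstep is 1.0 at QP 4 and doubles every 6 QP. The coefficient names `c1` and `c2` and both function names are hypothetical.

```python
import math


def qstep_from_target(target_bits, mad, c1, c2):
    """Solve target = c1*MAD/Q + c2*MAD/Q**2 for the step size Q."""
    if target_bits <= 0 or mad <= 0:
        raise ValueError("target_bits and mad must be positive")
    discriminant = (c1 * mad) ** 2 + 4 * target_bits * c2 * mad
    if c2 == 0 or discriminant < 0:
        # Degenerate case: fall back to the first-order model.
        return c1 * mad / target_bits
    # Positive root of target*Q**2 - c1*MAD*Q - c2*MAD = 0.
    return (c1 * mad + math.sqrt(discriminant)) / (2 * target_bits)


def qp_from_qstep(qstep):
    """Approximate H.264 QP for a step size, clamped to 0..51."""
    qp = round(4 + 6 * math.log2(qstep))
    return max(0, min(51, qp))


qstep = qstep_from_target(target_bits=2000, mad=10, c1=100, c2=100)
qp = qp_from_qstep(qstep)
```

A smaller bit target or a larger predicted MAD yields a larger step size and therefore a higher QP.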
- $T_r(n_{i,0})$ represents the number of available (remaining) bits of the ith GOP after encoding the 0th frame of the ith GOP, namely:
- $u(n_{i,1})$ represents the available channel transmission rate before encoding the first frame image of the ith GOP;
- $N_i$ is the number of image frames included in the ith GOP;
- $B_s$ is the buffer size;
- $B_c(n_{i-1,N_i})$ represents the actual occupancy of the buffer after encoding the (i-1)th GOP;
- $F_r$ represents the encoding frame rate.
- $T_r(n_{i,j})$ represents the remaining available bits in the ith GOP after coding image $n_{i,j}$;
- $u(n_{i,j})$ represents the available channel transmission rate before encoding image $n_{i,j}$, and $u(n_{i,j-1})$ represents the available channel transmission rate before encoding image $n_{i,j-1}$;
- $A(n_{i,j})$ is the actual number of encoded bits of image $n_{i,j}$;
- i is a positive integer;
- j is an integer greater than 1.
- The above formula (5) can be simplified into formula (6):
- The process of allocating the number of bits of the ith GOP is the process of performing GOP layer rate control on the ith GOP.
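The GOP-level bookkeeping can be sketched as below, under stated assumptions. The initial budget is taken as the channel's per-frame throughput times the number of frames in the GOP, minus a buffer correction that the caller supplies (its exact dependence on $B_s$ and $B_c$ is not reproduced in this sketch). For a constant-rate channel, the remaining budget then simply decreases by the actual bits of each coded frame. All function and parameter names are hypothetical.

```python
def gop_initial_budget(channel_rate, frame_rate, gop_frames,
                       buffer_correction=0.0):
    """Bits available to a GOP before any of its frames is coded."""
    return channel_rate / frame_rate * gop_frames - buffer_correction


def gop_remaining_after(budget, coded_frame_bits):
    """Constant-rate case: subtract the bits each coded frame used."""
    remaining = []
    for bits in coded_frame_bits:
        budget -= bits
        remaining.append(budget)
    return remaining


# 300 kbit/s at 30 fps with 15 frames per GOP.
budget = gop_initial_budget(300000, 30, 15)
remaining = gop_remaining_after(budget, [40000, 8000, 7000])
```

The last entry of `remaining` is the budget still available to the uncoded frames of the GOP.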
- The initial quantization parameter of the ith GOP needs to be determined.
- The initial quantization parameters of the I frame and the first P frame of the GOPs other than the first GOP in the video sequence can be calculated by the following formula (7):
- $QP_{st}(i)$ represents the initial quantization parameter of the ith GOP;
- $Sum_{PQP}(i-1)$ represents the sum of the quantization parameters of all P frames in the (i-1)th GOP;
- $N_{(i-1)p}$ represents the number of P frames in the (i-1)th GOP;
- $T_r(n_{i-1,N_i})$ represents the number of bits available in the (i-1)th GOP after encoding the last frame of the (i-1)th GOP;
- $T_r(n_{i,0})$ represents the number of bits available in the ith GOP after encoding the 0th frame of the ith GOP;
- $N_{i-1}$ represents the number of image frames contained in the (i-1)th GOP.
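As an illustration of how these quantities might combine, the sketch below uses the form of the initial-QP rule usually quoted for JVT-G012: the average P-frame QP of the previous GOP, lowered by 1, by a term proportional to the bits the previous GOP left unused, and by a term proportional to GOP length. The constants 8 and 15 come from that commonly quoted form and are an assumption here; formula (7) of this document may differ. The function name is hypothetical.

```python
def gop_initial_qp(sum_p_qp, num_p_frames, prev_gop_leftover_bits,
                   gop_budget_bits, prev_gop_frames):
    """Assumed JVT-G012-style starting QP for a GOP other than the first."""
    average_p_qp = sum_p_qp / num_p_frames
    return (average_p_qp
            - 1
            - 8 * prev_gop_leftover_bits / gop_budget_bits
            - prev_gop_frames / 15)


# Previous GOP: 10 P frames with QPs summing to 280, 15 frames in total,
# and no bits left over.
qp_start = gop_initial_qp(280, 10, 0, 100000, 15)
```

In practice the result would be rounded and clamped to the valid QP range before use.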
- Frame-level rate control includes two stages: a pre-encoding stage and a post-encoding stage.
- the main task of this stage is to calculate quantization parameters for all coded frames, including P frames and B frames. Since a B frame is usually not used as a reference frame, its QP can be obtained by simple linear interpolation of the QPs of the adjacent frames; a P frame serves as the reference frame of subsequent frames, so its QP must be calculated accurately. Therefore, the quantization parameters of the two frame types are calculated separately.
- the quantization parameter of the ith B frame is determined according to the following two cases:
- Tbl(n_{i,j}) is the target buffer level of the jth P frame image in the ith GOP, and the two complexity terms are the average coding complexity of P frames and of B frames, respectively
- u(n i,j ) represents the available channel transmission rate before encoding the jth frame image of the ith GOP
- B s is the buffer size
- N p(i- 1) is the number of P frames in the i-1th GOP.
- the coding complexity of the image can be calculated by formula (13):
- the number of bits allocated for the jth frame of the ith GOP is determined by the target occupancy of the buffer, the encoding frame rate, the available channel bandwidth and the actual occupancy of the buffer:
- γ is a constant whose value is 0.25 when B frames are inserted in the GOP and 0.75 otherwise
- u(n_{i,j}) represents the available channel transmission rate before encoding the jth frame image of the ith GOP
- F r is the encoding frame rate
- Tbl(n i,j ) is the target buffer level of the jth frame image in the ith GOP
- Bc(n_{i,j}) represents the actual occupancy of the buffer after encoding the jth frame image in the ith GOP.
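The buffer-based allocation described above can be sketched as follows. The formula is not reproduced in this text, so the form used here (per-frame channel budget plus a constant times the gap between target buffer level and actual occupancy) is an assumption based on the quantities listed; the constant takes the two values stated above (0.25 with B frames, 0.75 otherwise). Function and parameter names are hypothetical.

```python
def buffer_based_target_bits(channel_rate, frame_rate,
                             target_buffer_level, buffer_occupancy,
                             has_b_frames=False):
    """Frame bit budget derived from the buffer state (assumed form).

    bits = u / F_r + constant * (Tbl - B_c)
    """
    # Value of the constant as stated in the text.
    constant = 0.25 if has_b_frames else 0.75
    per_frame_channel_bits = channel_rate / frame_rate
    return per_frame_channel_bits + constant * (target_buffer_level - buffer_occupancy)


# Buffer is 10 000 bits below its target level, so the frame gets extra bits.
print(buffer_based_target_bits(300_000, 30, 50_000, 40_000))
print(buffer_based_target_bits(300_000, 30, 50_000, 40_000, has_b_frames=True))
```

When the buffer is fuller than its target level, the second term becomes negative and the frame receives fewer bits than the channel delivers per frame.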
- the bits remaining after encoding the jth frame image in the ith GOP are also taken into account:
- N p,r (j-1) and N b,r (j-1) respectively represent the number of remaining uncoded P frames and B frames in the current GOP.
- the MAD value of the current frame is predicted from the actual MAD of the previous frame through a linear prediction model, and then the quantization parameter of image n_{i,j} (the jth frame image in the ith GOP) is calculated according to the quadratic rate-distortion model, where i and j are both positive integers.
- f(n_{i,j}) is the number of bits allocated to the jth frame image in the ith GOP, d1 and d2 are constants, and MAD_predict(n_{i,j}) is the predicted MAD value
- the quantization step size calculated from the rate-distortion model can then be converted into the quantization parameter QP.
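As an illustration of this step, the sketch below solves a quadratic rate-distortion model of the assumed form f = d1 × MAD / Q + d2 × MAD / Q² for the quantization step Q and converts it to a QP. The model form is an assumption (the formula is not reproduced in this text, and any header-bit term is left out). The conversion uses the H.264 relation that the step size doubles every 6 QP, with a step of 0.625 at QP 0. Function names are hypothetical.

```python
import math


def solve_quantization_step(target_bits, mad_predicted, d1, d2):
    """Solve f = d1*MAD/Q + d2*MAD/Q**2 for the positive root Q."""
    if target_bits <= 0 or mad_predicted <= 0:
        raise ValueError("target_bits and mad_predicted must be positive")
    a = d1 * mad_predicted
    b = d2 * mad_predicted
    discriminant = a * a + 4.0 * target_bits * b
    if b == 0 or discriminant < 0:
        # Fall back to the first-order model f = d1*MAD/Q.
        return a / target_bits
    return (a + math.sqrt(discriminant)) / (2.0 * target_bits)


def qstep_to_qp(qstep, qp_min=0, qp_max=51):
    """Approximate H.264 mapping: Qstep = 0.625 * 2**(QP / 6)."""
    if qstep <= 0:
        raise ValueError("qstep must be positive")
    qp = round(6.0 * math.log2(qstep / 0.625))
    return max(qp_min, min(qp_max, qp))


# Hypothetical model parameters and bit budget.
q = solve_quantization_step(target_bits=10.0, mad_predicted=10.0, d1=1.0, d2=2.0)
print(q, qstep_to_qp(q))
```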
- the difference between the quantization parameters of two adjacent frames should not be greater than 2, so the quantization parameter of image n_{i,j} is adjusted to
- Q_pp is the quantization parameter of the previous frame image n_{i,j-1} in the ith GOP; the quantization parameter of image n_{i,j} is finally limited to:
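The two limiting steps can be sketched as a pair of clamps. The maximum change of 2 between adjacent frames comes from the text; the final range is not reproduced here, so the bounds 0 and 51 (the H.264 QP range) are an assumption, and the function name is hypothetical.

```python
def limit_qp(qp, prev_qp, max_delta=2, qp_min=0, qp_max=51):
    """Keep QP within max_delta of the previous frame, then within the legal range."""
    # Smoothness constraint between adjacent frames.
    qp = min(prev_qp + max_delta, max(prev_qp - max_delta, qp))
    # Assumed final range limit.
    return min(qp_max, max(qp_min, qp))


print(limit_qp(35, prev_qp=30))  # request too high, pulled back towards 30
print(limit_qp(20, prev_qp=30))  # request too low, pulled up towards 30
```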
- the parameters in the linear prediction model and the parameters in the quadratic rate distortion model may be updated according to the error between the predicted MAD value of the image n i, j and the actual MAD value of the image n i, j.
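A minimal sketch of a linear MAD predictor of the form MAD_predict = a1 × MAD_previous + a2 is shown below. The text does not say how the parameters are updated, so the least-squares refit over a sliding window, the initial values a1 = 1 and a2 = 0, the window length and all names are assumptions made for illustration.

```python
class LinearMadPredictor:
    """Predict the MAD of the current frame from the actual MAD of the previous one."""

    def __init__(self, a1=1.0, a2=0.0, window=20):
        self.a1 = a1
        self.a2 = a2
        self.window = window
        self.samples = []  # (previous actual MAD, current actual MAD) pairs

    def predict(self, prev_actual_mad):
        return self.a1 * prev_actual_mad + self.a2

    def update(self, prev_actual_mad, actual_mad):
        """Refit a1 and a2 by least squares once a frame's actual MAD is known."""
        self.samples.append((prev_actual_mad, actual_mad))
        self.samples = self.samples[-self.window:]
        n = len(self.samples)
        sx = sum(x for x, _ in self.samples)
        sy = sum(y for _, y in self.samples)
        sxx = sum(x * x for x, _ in self.samples)
        sxy = sum(x * y for x, y in self.samples)
        denom = n * sxx - sx * sx
        if n >= 2 and abs(denom) > 1e-12:
            self.a1 = (n * sxy - sx * sy) / denom
            self.a2 = (sy - self.a1 * sx) / n


predictor = LinearMadPredictor()
predictor.update(1.0, 3.0)
predictor.update(2.0, 5.0)
print(predictor.predict(3.0))
```

The parameters are left unchanged until at least two distinct samples exist, so the predictor falls back to "same MAD as the previous frame" at the start of a sequence.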
- the predicted occupancy of the buffer (also called the new occupancy of the buffer) is determined by the number of bits A(n_{i,j}) actually generated by image n_{i,j}, the current occupancy of the buffer, and the amount of data that the channel can transmit within the time the encoder takes to encode one frame.
- a frame skipping technique needs to be adopted to avoid excessive new occupancy or even overflow of the buffer.
- the number of skipped frames N post is initialized to 0, and then increases continuously until the following conditions are met:
- the condition is expressed in terms of the predicted occupancy of the buffer after encoding the image; j represents the sequence number of the frame at which frame skipping starts, and j+N_post represents the image frame that needs to be discarded.
- the occupancy of the buffer can be calculated by the following formula (22):
- j represents the sequence number of the frame at which frame skipping starts
- l is a positive integer
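The frame-skipping loop can be sketched as follows. Neither the stopping condition nor formula (22) is reproduced in this text, so the sketch assumes that each skipped frame lets the channel drain u / F r bits from the buffer and that skipping stops once the predicted occupancy is no more than a fraction of the buffer size. The 0.8 threshold and all names are assumptions.

```python
def frames_to_skip(buffer_occupancy, actual_bits, channel_rate, frame_rate,
                   buffer_size, threshold=0.8):
    """Return (N_post, predicted occupancy after skipping)."""
    drain_per_frame = channel_rate / frame_rate
    if drain_per_frame <= 0:
        raise ValueError("channel_rate and frame_rate must be positive")
    # New occupancy: current occupancy + bits just produced - bits sent meanwhile.
    occupancy = buffer_occupancy + actual_bits - drain_per_frame
    n_post = 0
    while occupancy > threshold * buffer_size:
        n_post += 1                   # discard one more frame
        occupancy -= drain_per_frame  # the channel keeps draining the buffer
    return n_post, occupancy


print(frames_to_skip(75_000, 30_000, 300_000, 30, 100_000))
print(frames_to_skip(10_000, 20_000, 300_000, 30, 100_000))
```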
- the JVT-G012 rate control algorithm does not consider the differences in coding complexity between P frames when performing frame-level bit allocation. That is, it assumes that every P frame in the same GOP has the same coding complexity and allocates coding resources evenly to each P frame. However, in an actual video sequence, the coding complexity of each frame varies with the magnitude and amount of motion it contains. An even allocation strategy not only causes the PSNR curve of the frames within the GOP to fluctuate, but also lowers the average PSNR of the entire sequence, reducing the quality of the entire encoded video.
- an embodiment of the present application proposes a video coding method based on coding complexity, which optimizes the step of calculating the number of bits of P frames in the frame-level rate control of the JVT-G012 method.
- bit allocation at the frame layer is performed according to coding complexity: coding bits are saved from low-complexity frames and used for high-complexity frames, which reduces the fluctuation of the PSNR curve of the frames in the group of pictures while keeping the average encoding bit rate close to the target bit rate, thereby improving the quality of the encoded video.
- An embodiment of the present application provides a video encoding method. As shown in FIG. 4, the method may include the following steps 101 and 102. The method is described below by taking a video encoding apparatus as the execution subject as an example.
- Step 101 The video encoding apparatus determines the second number of bits for encoding the first image according to the first ratio, the first number of bits and the first number.
- Step 102 The video encoding apparatus encodes the first image based on the second number of bits.
- the first ratio is the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images.
- the first image is the first uncoded frame image in the target image group, and the M frames of second images are coded images in the target image group; the first number of bits is the number of bits remaining in the target image group, the first number is the number of uncoded images in the target image group, and M may be an integer greater than 1.
- the second number of bits is the number of bits configured by the video encoding apparatus for the first image, that is, the second number of bits is the target number of bits of the first image.
- the first ratio may be used to represent the relative coding complexity of the first image to be coded relative to the M frames of the coded second images in the target image group.
- the first image, the M second images and the first number are determined according to the encoding progress of the target image group.
- for example, the target image group includes 10 frames of images, namely image 1, image 2, image 3, image 4, image 5, image 6, image 7, image 8, image 9 and image 10, and image 3 is the most recently encoded image
- in this case, the first image is image 4
- after image 4 is encoded, image 5 becomes the first uncoded frame image in the target image group, so the video encoding apparatus can regard image 5 as the new first image and re-execute the above steps 101 and 102, and so on, until the encoding of image 10 is completed.
- the video encoding apparatus can then proceed to encode the next GOP.
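The way the first image, the second images and the first number follow from the encoding progress can be sketched with the 10-image example above. The sketch assumes that the M second images are all images of the group coded so far; the function name and return layout are hypothetical.

```python
def next_image_state(gop, last_encoded=None):
    """Return (first_image, second_images, first_number), or None when the GOP is done.

    gop          -- images of the target image group in coding order
    last_encoded -- most recently encoded image, or None if nothing is coded yet
    """
    coded_count = 0 if last_encoded is None else gop.index(last_encoded) + 1
    if coded_count == len(gop):
        return None  # every image is coded; move on to the next GOP
    first_image = gop[coded_count]         # first uncoded image
    second_images = gop[:coded_count]      # images already coded (assumed to be the M second images)
    first_number = len(gop) - coded_count  # number of uncoded images
    return first_image, second_images, first_number


gop = list(range(1, 11))  # image 1 ... image 10
print(next_image_state(gop, last_encoded=3))
print(next_image_state(gop, last_encoded=4))
```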
- the first ratio indicates the relative coding complexity between the first image and the M coded frames of second images in the target image group. The method can therefore determine the number of bits for encoding the image to be encoded according to the relative coding complexity between the image to be encoded and the coded images in the target image group, the number of bits remaining in the target image group, and the number of frames remaining in the target image group. Coding bits can thus be saved from images with low coding complexity in the target GOP and spent on images with high coding complexity, so that, on the premise that the average coding rate stays close to the target rate, the fluctuation of the PSNR curve of the frames in the group is reduced and the quality of the encoded video is improved.
- step 101 may be specifically implemented by the following steps 101a and 101b.
- Step 101a the video encoding apparatus determines a weighting parameter corresponding to the first ratio according to the first ratio.
- the target picture group is the ith GOP in the video to be encoded
- the first picture is the jth frame picture in the target picture group
- the first ratio is MAD ratio ( ni, j )
- a and b are two encoding parameters set according to the available channel resources (for example, the available channel transmission rate before encoding the first image) and the encoding complexity of the target image group; a represents the average coding complexity of the target group of pictures, and b is the magnitude of the adjustment to the weighting parameter W_MAD(n_{i,j}).
- W_MAD(n_{i,j}) = min{S_high, max{S_low, W_MAD(n_{i,j})}};
- S high represents the upper boundary of the adjustment range of the buffer, which is used to prevent high-complexity frames from excessively occupying coding resources.
- S low represents the lower boundary of the adjustment range of the buffer, which is used to avoid video quality degradation caused by too few coding resources occupied by low-complexity frames.
- Step 101b The video encoding apparatus determines the second number of bits for encoding the first image according to the weighting parameter, the first number of bits and the first number.
- Tr(n_{i,j}) is the number of bits remaining in the target picture group before encoding the first picture (the first number of bits)
- G(n_{i,j}) is the number of uncoded pictures in the target picture group before encoding the first picture (the first number)
- W_MAD(n_{i,j}) is the weighting parameter corresponding to the first ratio.
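Steps 101a and 101b can be sketched together. The clamp to the interval between S low and S high follows the expression given above. The allocation itself (formula (25)) is not reproduced in this text, so the form used here, weighting parameter × remaining bits / remaining frames, is an assumption, as are the example bounds 0.5 and 2.0 and the names.

```python
def complexity_based_bits(remaining_bits, uncoded_frames, w_mad,
                          s_low=0.5, s_high=2.0):
    """Bits for the first image from its relative coding complexity (assumed form)."""
    if uncoded_frames <= 0:
        raise ValueError("uncoded_frames must be positive")
    # W_MAD = min{S_high, max{S_low, W_MAD}}
    w_mad = min(s_high, max(s_low, w_mad))
    # An even share of the remaining budget, scaled by the weighting parameter.
    return w_mad * remaining_bits / uncoded_frames


print(complexity_based_bits(280_000, 7, w_mad=1.5))  # complex frame: more than the even share
print(complexity_based_bits(280_000, 7, w_mad=3.0))  # weight is capped at s_high
print(complexity_based_bits(280_000, 7, w_mad=0.1))  # weight is raised to s_low
```

With a weighting parameter of 1 this reduces to the even allocation, which makes the difference from the average strategy easy to see.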
- the weighting parameter corresponding to the first ratio, which represents the relative coding complexity of the first image and the coded images in the target image group, can be determined first, and then the number of bits for encoding the first image can be determined according to the weighting parameter, the number of bits remaining in the target image group and the number of uncoded images in the target image group. That is, the number of bits for encoding an image is determined based on the relative coding complexity between the frames in the group of images. Therefore, compared with evenly allocating bits, the video encoding method provided by the embodiments of the present application can better suppress the fluctuation of video quality between frames after encoding.
- the number of bits for encoding the image to be encoded (for example, the above-mentioned first image) may be determined according to the relative encoding complexity between the image to be encoded and the encoded images, the number of remaining bits, the number of remaining frames, and the buffer status, so as to avoid buffer overflow and underflow.
- step 101 may be specifically implemented by the following step 101c.
- Step 101c The video encoding apparatus determines the second number of bits for encoding the first image according to the first ratio, the first number of bits, the first number and the target parameter.
- the target parameters include the estimated occupancy of the buffer, the actual occupancy of the buffer, the encoding frame rate, and the available channel transmission rate before encoding the first image.
- the transmission rate of the available channels before encoding each frame is the same.
- since the second number of bits for encoding the first image is determined according to the first ratio, the first number of bits, the first number, and the target parameter, fluctuations in inter-frame encoding quality can be suppressed, and overflow or underflow of the buffer occupancy can be avoided. In this way, the quality of the encoded video can be further improved.
- step 101c may be specifically implemented through the following steps A, B and C.
- Step A The video encoding apparatus determines the third number of bits according to the first ratio, the first number of bits and the first number.
- in step A, the number of bits for encoding the first image is determined based on the relative encoding complexity among the images in the target image group.
- the video encoding apparatus may first determine the weighting parameter corresponding to the first ratio according to the first ratio, and then determine the third number of bits according to the weighting parameter, the first number of bits, and the first number, as in formula (25). For details, refer to the related descriptions of step 101a and step 101b, which are not repeated here.
- Step B The video encoding apparatus determines the fourth bit number according to the target parameter.
- the fourth number of bits is the number of bits for encoding the first image determined based on the buffer occupancy of the encoder.
- that is, the fourth number of bits for encoding the current frame can be determined from the perspective of the buffer occupancy of the encoder.
- u(n i,j ) represents the available channel transmission rate before encoding the first image
- F r is the encoding frame rate
- γ1 is a constant whose value is 0.75.
- Step C the video encoding apparatus performs weighted summation of the third bit number and the fourth bit number to obtain the second bit number.
- Tc(n i,j ) is the number of bits of the encoded first image determined from the perspective of relative encoding complexity (refer to the above formula (25) for details)
- β1 is a weighting parameter that determines how much each of the two aspects contributes when the number of bits for encoding the image is determined.
- β1 is a constant whose value range is
- the third number of bits for encoding the first image can be determined from the perspective of relative coding complexity, the fourth number of bits can be determined from the perspective of buffer occupancy, and the weighted sum of the third number of bits and the fourth number of bits is used as the final number of bits for encoding the first image. This not only improves the quality of high-complexity images after encoding, but also smooths the PSNR curve of the frames in the target image group and reduces its fluctuation, so that the average PSNR of the entire encoded video sequence, and hence the quality of the encoded video, can be improved.
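Step C can be sketched as a weighted sum of the two candidate budgets. The text only states that the third and fourth numbers of bits are weighted and summed; treating the weights as `weight` and `1 - weight` (a convex combination) is an assumption, as are the names and the example numbers.

```python
def final_target_bits(complexity_bits, buffer_bits, weight):
    """Second number of bits as a weighted sum of the third and fourth numbers of bits."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    return weight * complexity_bits + (1.0 - weight) * buffer_bits


# Third number of bits (complexity view) and fourth number of bits (buffer view).
print(final_target_bits(complexity_bits=60_000, buffer_bits=20_000, weight=0.5))
print(final_target_bits(complexity_bits=60_000, buffer_bits=20_000, weight=1.0))
```

A weight close to 1 follows the complexity-based budget and favours even quality; a weight close to 0 follows the buffer-based budget and favours buffer safety.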
- step 102 may be specifically implemented by the following steps 102a and 102b.
- Step 102a the video encoding apparatus determines a quantization parameter of the first image (hereinafter referred to as the target quantization parameter) through a quadratic rate-distortion model, based on the second number of bits and the predicted coding complexity of the first image.
- Step 102b the video encoding apparatus encodes the first image according to the target quantization parameter.
- the predicted coding complexity of the first image is represented by the predicted MAD value of the first image, which is predicted by the linear prediction model from the actual MAD value of the previous frame image of the first image (hereinafter referred to as the third image); the target quantization parameter is then obtained through the quadratic rate-distortion model according to the predicted coding complexity of the first image and the actual coding complexity of the third image.
- the target quantization parameter can be predicted by the following formula (28).
- f(n i,j ) is the number of bits to encode the first image
- d 1 and d 2 are the parameters of the quadratic rate-distortion model
- d 1 and d 2 are both constants
- MAD_predict(n_{i,j}) represents the predicted coding complexity of the first image.
- Q pp is the quantization parameter of the third image (which can be obtained after encoding the third image).
- the target quantization parameter is finally limited to:
- depending on the basic unit (BU) used in encoding, the method by which the video encoding apparatus encodes the first image according to the target quantization parameter may differ.
- the video encoding apparatus may directly use the target quantization parameter to encode the first image.
- when the basic unit BU is at least one macroblock, and the number of macroblocks in the BU is less than the number of macroblocks included in one frame of image, the video encoding apparatus needs to perform basic unit layer (that is, BU layer) rate control.
- the method for performing the BU layer rate control by the video coding apparatus will be exemplarily described below.
- the main object of the rate control of the BU layer is the P frame in the GOP.
- the BU layer rate control algorithm can include the following five steps:
- Step 1 Calculate the target number of bits of the BU to be coded, that is, allocate a number of bits to the BU to be coded.
- let the number of bits remaining in the ith frame image be f_rb(n_{i,j}), and the number of remaining BUs be N_ub;
- the initial values of f rb ( ni,j ) and N ub are f( ni,j ) and N unit
- f( ni,j ) is the total number of bits allocated to the ith frame image
- N unit is the number of all BUs in the i-th frame image
- the number of bits allocated to the first uncoded BU in the i-th frame image is f rb /N ub .
- Step 2 Calculate the estimated header bit number m h of the c th BU in the ith frame image, where c is a positive integer, and the c th BU is the first uncoded BU in the ith frame image.
- Step 3 Calculate the residual coefficient coding bit number R i (c) of the c-th BU in the i-th frame image:
- Step 4 According to the MAD linear prediction model, the MAD value of the c-th BU in the i-th frame image (that is, the predicted MAD value of the c-th BU) is predicted from the MAD value of the target BU, where the target BU is the already-encoded BU in the (i-1)th frame image whose position corresponds to that of the c-th BU in the i-th frame image; then, according to the predicted MAD value of the c-th BU, the quadratic rate-distortion model is used to calculate the quantization step size for encoding, where the quadratic rate-distortion model is:
- σ_i(c) is the predicted MAD value of the cth BU
- Q step,i (j) is the quantization step size calculated by the binomial rate-distortion model.
- the quantization step size can be converted into a quantization parameter QP, which can be specifically determined according to actual usage requirements.
- Step 5 According to the calculated quantization parameter, rate-distortion-optimized coding is performed on all macroblocks in the c-th BU; after the coding is completed, the remaining bits of the i-th frame image, the parameters of the MAD linear prediction model, and the parameters of the quadratic rate-distortion model are updated. For details, reference may be made to the relevant descriptions in the foregoing embodiments.
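Step 1 of the BU-layer procedure above, together with the remaining-bits update of Step 5, can be sketched as a small simulation: each BU to be coded receives f rb / N ub, and after it is coded the remaining bits and the remaining BU count are updated. The function name and the list of actual bit counts are hypothetical.

```python
def bu_target_bits(frame_bits, n_units, actual_bits_per_bu):
    """Return the target bits handed to each BU, given what each BU actually used."""
    remaining_bits = frame_bits   # f_rb, initialised to f(n_i,j)
    remaining_units = n_units     # N_ub, initialised to N_unit
    targets = []
    for actual_bits in actual_bits_per_bu[:n_units]:
        # Step 1: even share of what is left for the first uncoded BU.
        targets.append(remaining_bits / remaining_units)
        # Step 5 (bookkeeping part): update the frame's remaining bits and BU count.
        remaining_bits -= actual_bits
        remaining_units -= 1
    return targets


# The first BU overspends, so the later BUs receive smaller targets.
print(bu_target_bits(12_000, 4, [6_000, 2_000, 2_000, 2_000]))
```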
- the video encoding method provided in the embodiment of the present application may further include the following step 103.
- Step 103 The video coding apparatus determines the first ratio according to the predicted coding complexity of the first image and the average coding complexity of the M frames of second images.
- the predicted coding complexity of the first image is represented by the predicted MAD value of the first image
- the average coding complexity of the M frames of second images may be represented by the average MAD value of the M second images.
- the first ratio MAD ratio (j) can be calculated by the following formula (26):
- MAD ratio (j) is the MAD ratio value of the jth P frame in the current GOP
- MAD predict (j) is the MAD value of the jth P frame predicted by the MAD linear prediction model
- MAD actual (o) is the actual MAD value calculated after encoding the oth frame in the current GOP (eg, the target group of pictures).
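Formula (26) is not reproduced in this text. Based on the definitions above, a plausible reading is that the first ratio is the predicted MAD of the current frame divided by the mean of the actual MAD values of the frames already coded in the GOP. The sketch below implements that reading; it is an assumption, and the names are hypothetical.

```python
def mad_ratio(predicted_mad, actual_mads):
    """Predicted MAD of the current frame relative to the mean actual MAD so far."""
    if not actual_mads:
        raise ValueError("at least one coded frame is needed")
    average_mad = sum(actual_mads) / len(actual_mads)
    if average_mad <= 0:
        raise ValueError("average MAD must be positive")
    return predicted_mad / average_mad


# Three frames already coded; the next frame is predicted to be twice as complex.
print(mad_ratio(6.0, [4.0, 2.0, 3.0]))
```

A ratio above 1 marks a frame that is more complex than the GOP average so far, and a ratio below 1 marks a simpler one.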
- when the first ratio of an image is determined, the average coding complexity of the coded images in the GOP where the image is located is used as the reference, so the encoded video quality of the images in the same GOP can be kept closer to each other; this reduces the fluctuation of the peak signal-to-noise ratio (PSNR) curve of the frames in the same GOP and improves the quality of the encoded video.
- the execution body may be a video coding apparatus, or a control module in the video coding apparatus for executing the video coding method.
- the video encoding device provided by the embodiments of the present application is described by taking the video encoding method performed by the video encoding device as an example.
- FIG. 5 is a schematic structural diagram of a possible structure for implementing a video encoding apparatus provided by an embodiment of the present application.
- the video encoding apparatus 50 may include: a determination module 51 and an encoding module 52 .
- the determination module 51 can be used to determine the second number of bits for encoding the first image according to the first ratio, the first number of bits and the first number; the encoding module 52 can be used to encode the first image based on the second number of bits determined by the determination module 51; wherein the first ratio may be the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images, the first image is the first uncoded frame image in the target image group, and the M frames of second images are coded images in the target image group; the first number of bits is the number of bits remaining in the target image group, the first number is the number of uncoded images in the target image group, and M is an integer greater than 1.
- the determining module 51 may be specifically configured to determine the weighting parameter corresponding to the first ratio according to the first ratio, and to determine the second number of bits for encoding the first image according to the weighting parameter, the first number of bits and the first number.
- the determining module 51 may be specifically configured to determine the second number of bits for encoding the first image according to the first ratio, the first number of bits, the first number and the target parameter; wherein the target parameter includes: the estimated occupancy of the buffer, the actual occupancy of the buffer, the encoding frame rate and the available channel transmission rate before encoding the first image.
- the above-mentioned determination module 51 may include a first determination sub-module and a processing sub-module; the first determination sub-module may be configured to determine the third number of bits according to the first ratio, the first number of bits and the first number, and to determine the fourth number of bits according to the target parameter; the processing sub-module may be configured to perform a weighted summation of the third number of bits and the fourth number of bits determined by the first determination sub-module to obtain the second number of bits.
- the encoding module 52 may include a second determination submodule and an encoding submodule;
- the second determination submodule can be used to determine the quantization parameter of the first image through a quadratic rate-distortion model, based on the second number of bits and the predicted coding complexity of the first image;
- the encoding sub-module may be configured to encode the first image according to the quantization parameter determined by the second determining sub-module.
- the determining module 51 may also be configured to, before determining the second number of bits for encoding the first image according to the first ratio, the first number of bits and the first number, determine the first ratio according to the predicted coding complexity of the first image and the average coding complexity of the M frames of second images.
- the first ratio indicates the relative coding complexity between the first image and the M coded frames of second images in the target image group. That is, the video encoding apparatus provided by the embodiment of the present application can determine the number of bits for encoding the image to be encoded according to the relative coding complexity between the image to be encoded and the coded images in the target image group, the number of bits remaining in the target image group, and the number of frames remaining in the target image group. Coding bits can therefore be saved from images with low coding complexity in the target GOP and spent on images with high coding complexity, so that, on the premise that the average coding rate stays close to the target rate, the fluctuation of the PSNR curve of the frames in the picture group is reduced and the quality of the encoded video is improved.
- the video encoding apparatus in this embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal.
- the apparatus may be a mobile electronic device or a non-mobile electronic device.
- the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
- the non-mobile electronic device may be a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of the present application.
- the video encoding apparatus in this embodiment of the present application may be an apparatus having an operating system.
- the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
- the video encoding apparatus provided in the embodiments of the present application can implement each process implemented by the method embodiments in FIG. 1 to FIG. 4 , and to avoid repetition, details are not repeated here.
- an embodiment of the present application further provides an electronic device 200, including a processor 202, a memory 201, and a program or instruction stored in the memory 201 and executable on the processor 202. When the program or instruction is executed by the processor 202, each process of the above video encoding method embodiments is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
- the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.
- FIG. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
- the electronic device 1000 includes but is not limited to components such as a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
- the electronic device 1000 may also include a power source (such as a battery) for supplying power to the various components; the power source may be logically connected to the processor 1010 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
- the structure of the electronic device shown in FIG. 7 does not constitute a limitation on the electronic device.
- the electronic device may include more or fewer components than shown, combine some components, or arrange the components differently, which will not be repeated here.
- the processor 1010 may be configured to determine the second number of bits for encoding the first image according to the first ratio, the first number of bits and the first number, and to encode the first image based on the second number of bits; wherein the first ratio may be the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images, the first image is the first uncoded frame image in the target image group, and the M frames of second images are coded images in the target image group; the first number of bits is the number of bits remaining in the target image group, the first number is the number of uncoded images in the target image group, and M is an integer greater than 1.
- the processor 1010 may be specifically configured to determine the weighting parameter corresponding to the first ratio according to the first ratio, and to determine the second number of bits for encoding the first image according to the weighting parameter, the first number of bits and the first number.
- the processor 1010 may be specifically configured to determine the second number of bits for encoding the first image according to the first ratio, the first number of bits, the first number and the target parameter; wherein the target parameter includes: the estimated occupancy of the buffer, the actual occupancy of the buffer, the encoding frame rate and the available channel transmission rate before encoding the first image.
- the processor 1010 may be configured to determine the third number of bits according to the first ratio, the first number of bits and the first number; determine the fourth number of bits according to the target parameter; and perform a weighted summation of the third number of bits and the fourth number of bits to obtain the second number of bits.
- the processor 1010 may be configured to determine the quantization parameter of the first image through a quadratic rate-distortion model, based on the second number of bits and the predicted coding complexity of the first image, and to encode the first image according to this quantization parameter.
- the processor 1010 may be further configured to, before determining the second number of bits for encoding the first image according to the first ratio, the first number of bits, and the first number, determine the first ratio according to the predicted coding complexity of the first image and the average coding complexity of the M frames of second images.
- the first ratio indicates the relative coding complexity between the first image and the M coded frames of second images in the target image group. That is, the electronic device provided by the embodiment of the present application can determine the number of bits for encoding the image to be encoded according to the relative coding complexity between the image to be encoded and the coded images in the target image group, the number of bits remaining in the target image group, and the number of frames remaining in the target image group. Coding bits can therefore be saved from images with low coding complexity in the target GOP and spent on images with high coding complexity, so that, on the premise that the average coding rate stays close to the target rate, the fluctuation of the PSNR curve of the frames in the picture group is reduced and the quality of the encoded video is improved.
- the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042. The graphics processor 10041 processes image data of still pictures or video obtained by an image capture device (such as a camera).
- the display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
- the user input unit 1007 includes a touch panel 10071 and other input devices 10072 .
- the touch panel 10071 is also called a touch screen.
- the touch panel 10071 may include two parts, a touch detection device and a touch controller.
- Other input devices 10072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
- Memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems.
- the processor 1010 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 1010.
- the embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium.
- the processor is the processor in the electronic device described in the foregoing embodiments.
- the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
- An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the above video encoding method embodiments.
- the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip, or the like.
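
The frame-level bit allocation and quantization steps described above for the processor 1010 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the text states which inputs each step uses but not the formulas, so the clamped weighting of the first ratio, the buffer-based estimate of the fourth number of bits (modeled loosely on common H.264 frame-layer rate control), the constants `w_min`, `w_max`, `beta` and `gamma`, the model coefficients `x1` and `x2`, and all function names are assumptions made for illustration.

```python
import math


def first_ratio(predicted_complexity, encoded_complexities):
    """First ratio: predicted complexity of the frame to encode divided by
    the average complexity of the M already-encoded frames of the GOP."""
    if len(encoded_complexities) < 2:
        raise ValueError("M must be an integer greater than 1")
    mean_complexity = sum(encoded_complexities) / len(encoded_complexities)
    return predicted_complexity / mean_complexity


def allocate_bits(ratio, remaining_bits, remaining_frames,
                  channel_rate, frame_rate, buffer_target, buffer_actual,
                  w_min=0.5, w_max=2.0, beta=0.5, gamma=0.5):
    """Second number of bits for the frame to encode (hypothetical formulas)."""
    # Assumption: the weighting parameter is the first ratio clamped to
    # [w_min, w_max], so one frame cannot drain or starve the GOP budget.
    weight = min(max(ratio, w_min), w_max)
    # Third number of bits: complexity-weighted share of the remaining budget.
    third_bits = weight * remaining_bits / remaining_frames
    # Fourth number of bits (assumption): per-frame channel budget corrected
    # by the gap between estimated and actual buffer occupancy.
    fourth_bits = channel_rate / frame_rate + gamma * (buffer_target - buffer_actual)
    # Weighted sum of the third and fourth numbers of bits.
    return max(beta * third_bits + (1.0 - beta) * fourth_bits, 0.0)


def quant_step(target_bits, complexity, x1, x2):
    """Solve the quadratic rate-distortion model
    R = x1 * C / Q + x2 * C / Q**2 for the quantization step Q.
    Assumes target_bits > 0, complexity > 0, x1 >= 0 and x2 >= 0."""
    if target_bits <= 0:
        raise ValueError("target_bits must be positive")
    a = x2 * complexity
    b = x1 * complexity
    if a == 0:
        return b / target_bits  # model degenerates to first order
    # Positive root of a*u**2 + b*u - R = 0 with u = 1/Q.
    inv_q = (-b + math.sqrt(b * b + 4.0 * a * target_bits)) / (2.0 * a)
    return 1.0 / inv_q


if __name__ == "__main__":
    ratio = first_ratio(12.0, [8.0, 10.0, 6.0])
    bits = allocate_bits(ratio, remaining_bits=90000, remaining_frames=9,
                         channel_rate=300000, frame_rate=30,
                         buffer_target=20000, buffer_actual=24000)
    print(ratio, bits, quant_step(bits, 12.0, x1=5000.0, x2=20000.0))
```

The sketch returns a quantization step rather than a quantization parameter, because the mapping from step size to quantization parameter depends on the codec and is not given in the text.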
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Algebra (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Claims (17)
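- A video encoding method, the method comprising: determining, according to a first ratio, a first number of bits, and a first number, a second number of bits for encoding a first image; and encoding the first image based on the second number of bits; wherein the first ratio is the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images, the first image is the first unencoded frame in a target group of pictures, and the M frames of second images are encoded images in the target group of pictures; the first number of bits is the number of bits remaining in the target group of pictures, the first number is the number of unencoded images in the target group of pictures, and M is an integer greater than 1.
- The method according to claim 1, wherein the determining, according to a first ratio, a first number of bits, and a first number, a second number of bits for encoding a first image comprises: determining, from the first ratio, a weighting parameter corresponding to the first ratio; and determining the second number of bits for encoding the first image according to the weighting parameter, the first number of bits, and the first number.
- The method according to claim 1 or 2, wherein the determining, according to a first ratio, a first number of bits, and a first number, a second number of bits for encoding a first image comprises: determining the second number of bits for encoding the first image according to the first ratio, the first number of bits, the first number, and a target parameter; wherein the target parameter comprises: an estimated occupancy of a buffer, an actual occupancy of the buffer, an encoding frame rate, and an available channel transmission rate before the first image is encoded.
- The method according to claim 3, wherein the determining the second number of bits for encoding the first image according to the first ratio, the first number of bits, the first number, and a target parameter comprises: determining a third number of bits according to the first ratio, the first number of bits, and the first number; determining a fourth number of bits according to the target parameter; and performing a weighted summation of the third number of bits and the fourth number of bits to obtain the second number of bits.
- The method according to claim 1, wherein the encoding the first image based on the second number of bits comprises: determining a quantization parameter of the first image through a quadratic rate-distortion model based on the second number of bits and the predicted coding complexity of the first image; and encoding the first image according to the quantization parameter.
- The method according to claim 1, wherein before the determining, according to a first ratio, a first number of bits, and a first number, a second number of bits for encoding a first image, the method further comprises: determining the first ratio according to the predicted coding complexity of the first image and the average coding complexity of the M frames of second images.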
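- A video encoding apparatus, the apparatus comprising a determining module and an encoding module; the determining module is configured to determine, according to a first ratio, a first number of bits, and a first number, a second number of bits for encoding a first image; the encoding module is configured to encode the first image based on the second number of bits determined by the determining module; wherein the first ratio is the ratio of the predicted coding complexity of the first image to the actual coding complexity of M frames of second images, the first image is the first unencoded frame in a target group of pictures, and the M frames of second images are encoded images in the target group of pictures; the first number of bits is the number of bits remaining in the target group of pictures, the first number is the number of unencoded images in the target group of pictures, and M is an integer greater than 1.
- The apparatus according to claim 7, wherein the determining module is specifically configured to determine, from the first ratio, a weighting parameter corresponding to the first ratio; and determine the second number of bits for encoding the first image according to the weighting parameter, the first number of bits, and the first number.
- The apparatus according to claim 7 or 8, wherein the determining module is specifically configured to determine the second number of bits for encoding the first image according to the first ratio, the first number of bits, the first number, and a target parameter; wherein the target parameter comprises: an estimated occupancy of a buffer, an actual occupancy of the buffer, an encoding frame rate, and an available channel transmission rate before the first image is encoded.
- The apparatus according to claim 9, wherein the determining module comprises a first determining submodule and a processing submodule; the first determining submodule is configured to determine a third number of bits according to the first ratio, the first number of bits, and the first number, and determine a fourth number of bits according to the target parameter; the processing submodule is configured to perform a weighted summation of the third number of bits and the fourth number of bits determined by the first determining submodule to obtain the second number of bits.
- The apparatus according to claim 7, wherein the encoding module comprises a second determining submodule and an encoding submodule; the second determining submodule is configured to determine a quantization parameter of the first image through a quadratic rate-distortion model based on the second number of bits and the predicted coding complexity of the first image; the encoding submodule is configured to encode the first image according to the quantization parameter determined by the second determining submodule.
- The apparatus according to claim 7, wherein the determining module is further configured to, before determining the second number of bits for encoding the first image according to the first ratio, the first number of bits, and the first number, determine the first ratio according to the predicted coding complexity of the first image and the average coding complexity of the M frames of second images.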
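- An electronic device, comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the video encoding method according to any one of claims 1 to 6.
- A readable storage medium, storing a program or instruction, wherein the program or instruction, when executed by a processor, implements the steps of the video encoding method according to any one of claims 1 to 6.
- A computer software product, wherein the computer software product is executed by at least one processor to implement the video encoding method according to any one of claims 1 to 6.
- An electronic device, configured to perform the video encoding method according to any one of claims 1 to 6.
- A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instruction to implement the video encoding method according to any one of claims 1 to 6.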
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22794845.2A EP4333433A4 (en) | 2021-04-26 | 2022-04-25 | VIDEO CODING METHOD AND APPARATUS, AND ELECTRONIC DEVICE |
| JP2023564189A JP7682297B2 (ja) | 2021-04-26 | 2022-04-25 | Video encoding method, apparatus and electronic device |
| KR1020237034885A KR20230155002A (ko) | 2021-04-26 | 2022-04-25 | Video coding method, apparatus and electronic device |
| US18/485,487 US12413738B2 (en) | 2021-04-26 | 2023-10-12 | Video encoding method and apparatus and electronic device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110454418.XA CN113286145B (zh) | 2021-04-26 | 2021-04-26 | Video encoding method, apparatus and electronic device |
| CN202110454418.X | 2021-04-26 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/485,487 Continuation US12413738B2 (en) | 2021-04-26 | 2023-10-12 | Video encoding method and apparatus and electronic device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022228375A1 (zh) | 2022-11-03 |
Family
ID=77275740
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2022/088950 Ceased WO2022228375A1 (zh) | 2021-04-26 | 2022-04-25 | Video encoding method, apparatus and electronic device |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US12413738B2 (zh) |
| EP (1) | EP4333433A4 (zh) |
| JP (1) | JP7682297B2 (zh) |
| KR (1) | KR20230155002A (zh) |
| CN (1) | CN113286145B (zh) |
| WO (1) | WO2022228375A1 (zh) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113286145B (zh) | 2021-04-26 | 2022-07-22 | 维沃移动通信有限公司 | Video encoding method, apparatus and electronic device |
| CN116248882B (zh) * | 2023-02-26 | 2025-11-14 | 翱捷科技股份有限公司 | I-frame image-block-level rate control method and apparatus |
| CN116708934B (zh) * | 2023-05-16 | 2024-03-22 | 深圳东方凤鸣科技有限公司 | Video encoding processing method and apparatus |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050175091A1 (en) * | 2004-02-06 | 2005-08-11 | Atul Puri | Rate and quality controller for H.264/AVC video coder and scene analyzer therefor |
| US20050175092A1 (en) * | 2004-02-06 | 2005-08-11 | Atul Puri | H.264/AVC coder incorporating rate and quality controller |
| CN101188755A (zh) * | 2007-12-14 | 2008-05-28 | 宁波中科集成电路设计中心有限公司 | Method for VBR rate control of real-time video signals during AVS encoding |
| CN101895759A (zh) * | 2010-07-28 | 2010-11-24 | 南京信息工程大学 | H.264 rate control method |
| CN101895758A (zh) * | 2010-07-23 | 2010-11-24 | 南京信息工程大学 | H.264 rate control method based on frame complexity |
| CN103051897A (zh) * | 2012-12-26 | 2013-04-17 | 南京信息工程大学 | H264 video encoding rate control method |
| CN108200431A (zh) * | 2017-12-08 | 2018-06-22 | 重庆邮电大学 | Frame-layer bit allocation method for video encoding rate control |
| CN113286145A (zh) * | 2021-04-26 | 2021-08-20 | 维沃移动通信有限公司 | Video encoding method, apparatus and electronic device |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| FR2748623B1 (fr) * | 1996-05-09 | 1998-11-27 | Thomson Multimedia Sa | Variable bit rate encoder |
| US6522693B1 (en) * | 2000-02-23 | 2003-02-18 | International Business Machines Corporation | System and method for reencoding segments of buffer constrained video streams |
| JP3889552B2 (ja) * | 2000-06-09 | 2007-03-07 | パイオニア株式会社 | Code amount allocation apparatus and method |
| CN1206864C (zh) * | 2002-07-22 | 2005-06-15 | 中国科学院计算技术研究所 | Rate control method combined with rate-distortion optimization, and apparatus therefor |
| US7133448B2 (en) * | 2002-11-07 | 2006-11-07 | Silicon Integrated Systems Corp. | Method and apparatus for rate control in moving picture video compression |
| US7095784B2 (en) * | 2003-04-14 | 2006-08-22 | Silicon Intergrated Systems Corp. | Method and apparatus for moving picture compression rate control using bit allocation with initial quantization step size estimation at picture level |
| US7373004B2 (en) * | 2003-05-23 | 2008-05-13 | Silicon Integrated Systems Corp. | Apparatus for constant quality rate control in video compression and target bit allocator thereof |
| US7254176B2 (en) * | 2003-05-23 | 2007-08-07 | Silicon Integrated Systems Corp. | Apparatus for variable bit rate control in video compression and target bit allocator thereof |
| JP4908943B2 (ja) | 2006-06-23 | 2012-04-04 | キヤノン株式会社 | Image encoding apparatus and image encoding method |
| JP2010062999A (ja) * | 2008-09-05 | 2010-03-18 | Toshiba Corp | Moving picture encoding apparatus, moving picture encoding method, and computer program |
| JP5871602B2 (ja) * | 2011-12-19 | 2016-03-01 | キヤノン株式会社 | Encoding apparatus |
| CN102752591B (zh) * | 2012-06-14 | 2014-09-17 | 南京信息工程大学 | H.264 rate control method based on a comprehensive factor |
| US10819997B2 (en) * | 2016-01-20 | 2020-10-27 | Arris Enterprises Llc | Encoding video data according to target decoding device decoding complexity |
| US10560696B2 (en) * | 2018-06-25 | 2020-02-11 | Tfi Digital Media Limited | Method for initial quantization parameter optimization in video coding |
| CN111432222B (zh) * | 2019-12-30 | 2022-05-31 | 鹏城实验室 | Live video encoding method, terminal and storage medium suitable for traffic scenes |

- 2021
  - 2021-04-26: CN, application CN202110454418.XA, patent CN113286145B (zh), status: Active
- 2022
  - 2022-04-25: KR, application KR1020237034885A, patent KR20230155002A (ko), status: Pending
  - 2022-04-25: JP, application JP2023564189A, patent JP7682297B2 (ja), status: Active
  - 2022-04-25: EP, application EP22794845.2A, patent EP4333433A4 (en), status: Pending
  - 2022-04-25: WO, application PCT/CN2022/088950, patent WO2022228375A1 (zh), status: Ceased
- 2023
  - 2023-10-12: US, application US18/485,487, patent US12413738B2 (en), status: Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4333433A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4333433A4 (en) | 2024-09-25 |
| US20240040127A1 (en) | 2024-02-01 |
| JP2024514348A (ja) | 2024-04-01 |
| KR20230155002A (ko) | 2023-11-09 |
| CN113286145A (zh) | 2021-08-20 |
| JP7682297B2 (ja) | 2025-05-23 |
| US12413738B2 (en) | 2025-09-09 |
| CN113286145B (zh) | 2022-07-22 |
| EP4333433A1 (en) | 2024-03-06 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22794845; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 20237034885; Country of ref document: KR; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 1020237034885; Country of ref document: KR |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023564189; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 202317072284; Country of ref document: IN |
| | WWE | WIPO information: entry into national phase | Ref document number: 2022794845; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022794845; Country of ref document: EP; Effective date: 20231127 |








