CN117061825B - Method and device for detecting bad frames of streaming media video and computer equipment

Info

Publication number: CN117061825B
Application number: CN202311316549.7A
Authority: CN
Inventors: 王曜; 刘琦; 许亦; 贺国超; name withheld on request
Original assignee: Shenzhen Yuntian Changxiang Information Technology Co ltd
Current assignee: Shenzhen Yuntian Changxiang Information Technology Co ltd
Priority date: 2023-10-12
Filing date: 2023-10-12
Publication date: 2024-01-26
Anticipated expiration: 2043-10-12
Also published as: CN117061825A

Abstract

The invention discloses a method, a device and computer equipment for detecting bad frames of streaming media video. The method comprises the following steps: acquiring a streaming media video; obtaining first local areas of a video frame by means of a division mapping relation; carrying out feature information quantity statistics on the first local areas, and determining a target area among the first local areas according to the statistics result; and determining bad frames among the plurality of video frames through a detection model according to the target areas of the video frames. By constructing the detection model, the method determines the bad frames before the video frames are rendered, which avoids invalid processing, saves hardware computing resources and shortens rendering time. The method also divides each video frame according to its target features.

Description

Method and device for detecting bad frames of streaming media video and computer equipment

Technical Field

The present invention relates to the field of video frame processing technologies, and in particular, to a method and apparatus for detecting bad frames of streaming media video, and a computer device.

Background

Streaming media is used to play video and cloud video on terminals such as televisions, mobile phones and notebook computers in all aspects of people's work and life. Accordingly, people's requirements for video playback quality, including definition, smoothness and real-time performance, keep rising. In many streaming media scenarios, such as cloud rendering for cloud games, rendering is performed in the cloud, the encoded video stream obtained by rendering is transmitted to the terminal side, and the terminal side decodes the received stream. In this way, the terminal side can obtain high-quality rendered content for video playback.

In the prior art, when super-resolution preprocessing is performed on video frames to maintain the rendering effect, all video frames are super-resolved indiscriminately. Bad frames mixed in among them are therefore processed as well, which produces invalid processing, wastes hardware computing resources and prolongs rendering time.

Disclosure of Invention

The invention aims to provide a method, a device and computer equipment for detecting bad frames of streaming media video, which are used for solving the technical problems that invalid processing is generated in the prior art, hardware operation resources are wasted and rendering time is prolonged.

In order to solve the technical problems, the invention specifically provides the following technical scheme:

In a first aspect, the present invention provides a method for detecting bad frames of streaming media video, comprising the following steps:

acquiring a streaming media video, wherein the streaming media video comprises a plurality of video frames;

obtaining first local areas of each video frame by means of a division mapping relation, wherein the first local areas correspond to the area division result of the video frame;

carrying out feature information quantity statistics on the first local areas, and determining a target area among the first local areas according to the statistics result, wherein the target area corresponds to a local image area of the video frame that contains features of the photographed target object;

and determining bad frames among the plurality of video frames through a detection model according to the target areas of the video frames, wherein the detection model is a neural network.
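The four steps above can be read as a simple data flow. The following is a minimal sketch of that flow only, not the patented implementation: the function name `detect_bad_frames`, the callables passed to it and the toy stand-ins at the bottom are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]   # grayscale frame as rows of pixel values
Region = List[List[float]]  # a sub-image cut from a frame


def detect_bad_frames(
    frames: Sequence[Frame],
    divide: Callable[[Frame], List[Region]],
    information: Callable[[Region], float],
    threshold: float,
    model: Callable[[List[Region]], bool],
) -> List[bool]:
    """Return one flag per frame; True marks a bad frame."""
    flags = []
    for frame in frames:
        regions = divide(frame)  # first local areas
        # Target areas: regions whose information quantity reaches the threshold.
        targets = [r for r in regions if information(r) >= threshold]
        flags.append(model(targets))  # the detection model decides
    return flags


# Toy stand-ins (assumptions), only to show how the pieces connect.
def split_rows(frame: Frame) -> List[Region]:
    half = len(frame) // 2
    return [frame[:half], frame[half:]]


def value_range(region: Region) -> float:
    return max(map(max, region)) - min(map(min, region))


def no_target_is_bad(targets: List[Region]) -> bool:
    return len(targets) == 0


flat = [[5, 5], [5, 5]]
textured = [[0, 9], [5, 5]]
print(detect_bad_frames([flat, textured], split_rows, value_range, 1, no_target_is_bad))
```

In a real system the `model` callable would be the trained neural network, and `divide` and `information` would be the division and statistics steps described in the preferred embodiments.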

As a preferred embodiment of the present invention, the determining of the first local area includes:

determining the dividing number m of the first local area through the dividing mapping relation;

and carrying out equal-area division on the video frame according to the division number m to obtain m first local areas.
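As an illustration of equal-area division, the sketch below cuts a frame into m horizontal strips. The strip shape is an assumption made here for simplicity, since the text only requires the m first local areas to have equal area; the name `divide_equal_area` is hypothetical.

```python
from typing import List

Frame = List[List[int]]


def divide_equal_area(frame: Frame, m: int) -> List[Frame]:
    """Split a frame into m horizontal strips of equal area.

    Assumption: horizontal strips are used because the source does not fix
    the region shape; the frame height must be a multiple of m so that the
    areas are exactly equal.
    """
    height = len(frame)
    if m <= 0 or height % m != 0:
        raise ValueError("frame height must be a positive multiple of m")
    strip = height // m
    return [frame[k * strip:(k + 1) * strip] for k in range(m)]
```

A grid of equal tiles would satisfy the same equal-area requirement; only the slicing would change.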

As a preferred embodiment of the present invention, the construction of the division mapping relation includes:

setting the division number m of the first local areas, dividing the video frame into m first local areas of equal area, and calculating the image discreteness among the m first local areas, wherein the image discreteness is measured by a variance formula in which δ denotes the image discreteness, S_k denotes the image matrix of the k-th first local area, S_E denotes the mean image matrix of the m first local areas, x_{E,i} denotes the pixel value of the i-th pixel in the mean image matrix S_E, x_{k,i} denotes the pixel value of the i-th pixel in the image matrix S_k of the k-th first local area, N is the number of pixels in the image matrix, and i and k are counting indices; solving for the value of m that maximizes the image discreteness among the first local areas, and dividing the video frame according to that value of m, so that the valid pixels representing features of the photographed target object are concentrated in the same first local area and the invalid pixels representing features other than the photographed target object are concentrated in the same first local area.
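The search for m can be sketched as follows. Because the exact variance formula is not reproduced in this text, the code assumes one plausible form built from the listed symbols, namely δ = (1/(m·N)) Σ_k Σ_i (x_{k,i} − x_{E,i})², where S_E is the pixel-wise mean of the m regions. This form, the horizontal-strip layout and all function names are assumptions for illustration.

```python
from typing import List, Sequence

Frame = List[List[float]]


def strips(frame: Frame, m: int) -> List[List[float]]:
    """Flatten each of m equal-height strips into a pixel list (assumed layout)."""
    rows = len(frame) // m
    return [
        [p for row in frame[k * rows:(k + 1) * rows] for p in row]
        for k in range(m)
    ]


def discreteness(regions: Sequence[Sequence[float]]) -> float:
    """Assumed variance form: mean squared deviation of each region S_k
    from the pixel-wise mean image S_E."""
    m, n = len(regions), len(regions[0])
    mean_image = [sum(r[i] for r in regions) / m for i in range(n)]  # S_E
    total = sum((r[i] - mean_image[i]) ** 2 for r in regions for i in range(n))
    return total / (m * n)


def best_division_number(frame: Frame, candidates: Sequence[int]) -> int:
    """Pick the candidate m whose division maximizes the discreteness."""
    usable = [m for m in candidates if len(frame) % m == 0]
    return max(usable, key=lambda m: discreteness(strips(frame, m)))
```

The candidate list bounds the search; the text does not say how the range of m is chosen, so it is left to the caller here.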

As a preferred embodiment of the present invention, the determining of the target area includes:

carrying out feature information quantity statistics on each first local area of the video frame by using the histogram to obtain feature information quantity of each first local area;

comparing the characteristic information amount of the first partial region with a preset threshold, wherein,

when the characteristic information quantity of the first local area is larger than or equal to a preset threshold value, the first local area is marked as a target area;

and when the characteristic information quantity of the first local area is smaller than a preset threshold value, the first local area is marked as a non-target area.
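The thresholding rule above can be sketched as follows. The text only says that the feature information quantity is obtained from a histogram; using the Shannon entropy of the grey-level histogram is an assumption made here, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def histogram_information(pixels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the grey-level histogram; an assumed measure."""
    counts = Counter(pixels)  # histogram: grey level -> number of pixels
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mark_regions(regions: Sequence[Sequence[int]], threshold: float) -> List[Tuple[int, bool]]:
    """Return (region index, is_target) using the 'greater than or equal' rule."""
    return [(k, histogram_information(r) >= threshold) for k, r in enumerate(regions)]
```

A uniform region has zero entropy and is marked as a non-target area for any positive threshold, which matches the intent of separating background from the photographed target object.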

As a preferred aspect of the present invention, determining a bad frame of a plurality of video frames includes:

inputting all target areas of the video frame into a detection model, and outputting classification labels of the video frame by the detection model;

the classification labels include bad frame labels and non-bad frame labels.

As a preferred embodiment of the present invention, the construction of the detection model includes:

selecting a group of video frames as sample video frames from a streaming media video whose photographed target object is known, and acquiring all target areas in the sample video frames;

comparing all target areas in each sample video frame with standard images of the features of the photographed target object, wherein,

if all target areas in the sample video frame are consistent with the standard images of the features of the photographed target object, the sample video frame is given the non-bad frame label;

if all target areas in the sample video frame are inconsistent with the standard images of the features of the photographed target object, the sample video frame is given the bad frame label;

training a neural network on all target areas of the sample video frames and the classification labels of the sample video frames to obtain the detection model;

the model expression of the detection model is as follows:

Label=CNN(g);

in the formula, Label is the classification label, g is all target areas of a sample video frame, and CNN is a neural network.
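To make the expression Label=CNN(g) concrete, the sketch below runs a deliberately tiny forward pass: one convolution, ReLU, global average pooling and a sigmoid. The architecture, the kernel, the weight and the bias are all hypothetical and untrained; the text does not specify the network structure, so this only illustrates the input (all target areas) and the output (a classification label).

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Plain 2-D 'valid' cross-correlation, as used by CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def cnn_label(target_areas: Sequence[Image], kernel: Image, weight: float, bias: float) -> str:
    """Label = CNN(g): conv -> ReLU -> global average pool -> sigmoid -> label."""
    if not target_areas:
        raise ValueError("at least one target area is required")
    pooled = []
    for area in target_areas:
        feature_map = conv2d_valid(area, kernel)
        activated = [max(0.0, v) for row in feature_map for v in row]  # ReLU
        pooled.append(sum(activated) / len(activated))  # global average pooling
    score = sum(pooled) / len(pooled)  # merge the evidence of all target areas
    probability = 1.0 / (1.0 + math.exp(-(weight * score + bias)))
    return "bad frame" if probability >= 0.5 else "non-bad frame"
```

In practice the kernel, weight and bias would be learned from the labelled sample video frames, typically with a deep learning framework rather than hand-written loops.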

As a preferred embodiment of the present invention, the consistency is quantified using image similarity.

As a preferred embodiment of the present invention, all target areas in the sample video frame have the same specification as the standard image of the features of the photographed target object.
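The labelling of sample video frames by image similarity can be sketched as follows. The similarity measure (one minus the normalized mean absolute difference of 8-bit pixels), the cut-off of 0.9 and the function names are assumptions; the sketch also assumes that each target area has the same size as its standard image. The text defines labels only for the cases where all target areas are consistent or all are inconsistent, so the mixed case returns None here.

```python
from typing import List, Optional, Sequence

Image = List[List[int]]


def similarity(a: Image, b: Image) -> float:
    """Assumed measure: 1 - mean absolute difference / 255, for same-sized 8-bit images."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return 1.0 - sum(diffs) / (255.0 * len(diffs))


def label_sample(
    targets: Sequence[Image],
    standards: Sequence[Image],
    min_similarity: float = 0.9,  # hypothetical cut-off for "consistent"
) -> Optional[str]:
    """Label a sample frame from its target areas and the matching standard images."""
    consistent = [similarity(t, s) >= min_similarity for t, s in zip(targets, standards)]
    if all(consistent):
        return "non-bad frame"
    if not any(consistent):
        return "bad frame"
    return None  # mixed case: the source text does not say how to label it
```

The labelled pairs of target areas and labels produced this way would then serve as the training set for the detection model.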

In a second aspect of the present invention, the present invention provides a device for detecting bad frames of streaming video, including:

the data acquisition module is used for acquiring streaming media video, wherein the streaming media video comprises a plurality of video frames;

the data processing module is used for obtaining first local areas of the video frame by means of a division mapping relation;

carrying out feature information quantity statistics on the first local areas, and determining a target area among the first local areas according to the statistics result;

and determining, according to the target areas of the video frames, bad frames among the plurality of video frames through a detection model;

and the data storage module is used for storing the detection model.
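The division of responsibilities among the three modules might be organized as in the sketch below. The class names mirror the module names in the text, but their methods, fields and the in-memory storage are hypothetical; the sketch only shows how the processing module could fetch the detection model from the storage module.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Frame = List[List[float]]
Model = Callable[[List[Frame]], bool]  # returns True for a bad frame


@dataclass
class DataStorageModule:
    """Stores detection models by name (in memory, for illustration)."""
    models: Dict[str, Model] = field(default_factory=dict)

    def save(self, name: str, model: Model) -> None:
        self.models[name] = model

    def load(self, name: str) -> Model:
        return self.models[name]


@dataclass
class DataAcquisitionModule:
    """Provides the frames of a streaming media video (here: an in-memory list)."""
    frames: Sequence[Frame]

    def acquire(self) -> List[Frame]:
        return list(self.frames)


@dataclass
class DataProcessingModule:
    """Divides frames, selects target areas and applies the stored model."""
    storage: DataStorageModule
    divide: Callable[[Frame], List[Frame]]
    information: Callable[[Frame], float]
    threshold: float

    def bad_frame_flags(self, frames: Sequence[Frame], model_name: str) -> List[bool]:
        model = self.storage.load(model_name)
        flags = []
        for frame in frames:
            regions = self.divide(frame)
            targets = [r for r in regions if self.information(r) >= self.threshold]
            flags.append(model(targets))
        return flags
```

Keeping the model in a separate storage module lets the processing logic stay unchanged when the detection model is retrained or replaced.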

In a third aspect, the invention provides a computer device comprising at least one processor; and

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to cause the computer device to perform the method for detecting bad frames of streaming media video.

In a fourth aspect, the invention provides a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the method for detecting bad frames of streaming media video.

Compared with the prior art, the invention has the following beneficial effects:

by constructing the detection model, the method determines the bad frames before the video frames are rendered, which avoids invalid processing, saves hardware computing resources and shortens rendering time. The method also divides each video frame according to its target features, concentrating the valid pixels that represent features of the photographed target object in the same local area, so that not all features of the video frame need to be detected. Detection is thereby targeted, which improves detection efficiency and accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.

FIG. 1 is a flow chart of a method for detecting bad frames of streaming media video according to an embodiment of the present invention;

FIG. 2 is a block diagram of a bad frame detection device for streaming media video according to an embodiment of the present invention;

FIG. 3 is an internal structure diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in FIG. 1, in a first aspect, the present invention provides a method for detecting bad frames of streaming media video, comprising the following steps:

acquiring a streaming media video, wherein the streaming media video comprises a plurality of video frames;

obtaining first local areas of each video frame by means of a division mapping relation, wherein the first local areas correspond to the area division result of the video frame;

carrying out feature information quantity statistics on the first local areas, and determining a target area among the first local areas according to the statistics result, wherein the target area corresponds to a local image area of the video frame that contains features of the photographed target object;

and determining bad frames among the plurality of video frames through a detection model according to the target areas of the video frames, wherein the detection model is a neural network.

To obtain the best video rendering effect, the invention performs super-resolution processing on the video frames before rendering, which raises their resolution so that the rendered video frames have high resolution.

Furthermore, before the super-resolution processing, the invention detects the video frames, picks out the so-called bad frames and does not super-resolve them. Because rendering a bad frame is pointless, this avoids invalid or meaningless super-resolution processing, which saves hardware computing resources and reduces rendering time.

Furthermore, when detecting bad frames, the invention uses area segmentation and information quantity statistics to mark the image area of the video frame that represents features of the photographed target object, namely the target area, and performs bad frame detection on the target area only. This reduces the amount of pixel data to be processed during bad frame detection and improves detection efficiency.

The target area is the effective picture content of the video frame, that is, the content that shows the photographed target object to the audience. Quality detection of image features in the target area is therefore effective quality detection, whereas quality detection of the background features and noise features contained in non-target areas is ineffective, redundant detection. For this reason the invention performs bad frame detection on the local areas of the video frame that represent features of the photographed target object (the target areas). The detection is thus highly targeted, video frames that do not show features of the photographed target object are filtered out, and the accuracy of bad frame detection is improved.

The invention aims to concentrate the local image content that represents features of the photographed target object into one or a few local areas, so that these areas mainly contain the photographed target object while the remaining local areas mainly contain the background. To this end it uses the degree of difference among the local areas obtained by segmentation: the higher the image variance, the greater the difference in the content represented by the pixels of the different local areas, which is the desired result. Specifically:

the determining of the first local area includes:

determining the division number m of the first local areas by means of the division mapping relation;

and dividing the video frame into equal areas according to the division number m to obtain m first local areas.

The construction of the division mapping relation comprises the following steps:

setting the division number m of the first local areas, dividing the video frame into m first local areas of equal area, and calculating the image discreteness among the m first local areas, wherein the image discreteness is measured by a variance formula in which δ denotes the image discreteness, S_k denotes the image matrix of the k-th first local area, S_E denotes the mean image matrix of the m first local areas, x_{E,i} denotes the pixel value of the i-th pixel in the mean image matrix S_E, x_{k,i} denotes the pixel value of the i-th pixel in the image matrix S_k of the k-th first local area, N is the number of pixels in the image matrix, and i and k are counting indices; solving for the value of m that maximizes the image discreteness among the first local areas, and dividing the video frame according to that value of m, so that the valid pixels representing features of the photographed target object are concentrated in the same first local area and the invalid pixels representing features other than the photographed target object are concentrated in the same first local area.

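The determination of the target area comprises the following steps:

carrying out feature information quantity statistics on each first local area of the video frame by using a histogram to obtain the feature information quantity of each first local area;

comparing the feature information quantity of each first local area with a preset threshold, wherein,

when the feature information quantity of the first local area is greater than or equal to the preset threshold, the first local area is marked as a target area;

and when the feature information quantity of the first local area is smaller than the preset threshold, the first local area is marked as a non-target area.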

By constructing the detection model, the invention determines the bad frames before the video frames are rendered, which avoids invalid processing, saves hardware computing resources and reduces rendering time. Specifically:

determining a bad frame among the plurality of video frames comprises:

inputting all target areas of the video frame into the detection model, the detection model outputting the classification label of the video frame;

the classification labels include bad frame labels and non-bad frame labels.

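The construction of the detection model comprises the following steps:

selecting a group of video frames as sample video frames from a streaming media video whose photographed target object is known, and acquiring all target areas in the sample video frames;

comparing all target areas in each sample video frame with standard images of the features of the photographed target object, wherein,

if all target areas in the sample video frame are consistent with the standard images of the features of the photographed target object, the sample video frame is given the non-bad frame label;

if all target areas in the sample video frame are inconsistent with the standard images of the features of the photographed target object, the sample video frame is given the bad frame label;

training a neural network on all target areas of the sample video frames and the classification labels of the sample video frames to obtain the detection model;

the model expression of the detection model is:

Label=CNN(g);

in the formula, Label is the classification label, g is all target areas of a sample video frame, and CNN is a neural network.

The consistency is quantified using image similarity.

All target areas in the sample video frame have the same specification as the standard image of the features of the photographed target object.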

As shown in FIG. 2, in a second aspect, the present invention provides a device for detecting bad frames of streaming media video, including:

the data acquisition module, which is used for acquiring a streaming media video, wherein the streaming media video comprises a plurality of video frames;

the data processing module, which is used for obtaining first local areas of the video frame by means of a division mapping relation;

carrying out feature information quantity statistics on the first local areas, and determining a target area among the first local areas according to the statistics result;

and determining, according to the target areas of the video frames, bad frames among the plurality of video frames through a detection model;

and the data storage module, which is used for storing the detection model.

In a third aspect, as shown in FIG. 3, the invention provides a computer device comprising at least one processor; and

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to cause the computer device to perform the method for detecting bad frames of streaming media video.

In a fourth aspect, the invention provides a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the method for detecting bad frames of streaming media video.

Compared with the prior art, the invention has the following beneficial effects: by constructing the detection model, bad frames are determined before the video frames are rendered, which avoids invalid processing, saves hardware computing resources and shortens rendering time.

The bad frame detection is applied to a method for rendering streaming media video frames, which performs super-resolution rendering in a multi-factor fusion manner and specifically comprises the following steps:

evaluating the quality of the video frames to obtain high-quality video frames and low-quality video frames;

performing super-resolution processing on the high-quality video frames to obtain super-resolved high-quality video frames;

performing video frame quality compensation on the low-quality video frames according to the super-resolved high-quality video frames to obtain super-resolved low-quality video frames;

and rendering the super-resolved high-quality video frames and the super-resolved low-quality video frames to obtain the super-resolution rendering result of the video frames.
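The order of the four rendering steps can be sketched as follows. The function `render_with_quality_split` and the callables it receives are hypothetical placeholders: the text does not specify how super-resolution, compensation or rendering are implemented, so they are passed in as parameters.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

F = TypeVar("F")  # whatever type represents a frame


def render_with_quality_split(
    frames: Sequence[F],
    is_bad: Callable[[F], bool],
    super_resolve: Callable[[F], F],
    compensate: Callable[[F, List[F]], F],
    render: Callable[[F], F],
) -> List[F]:
    """Super-resolve high-quality frames, compensate low-quality ones, render in order."""
    # Step 1: quality evaluation (bad frame = low quality).
    flags = [is_bad(f) for f in frames]
    # Step 2: super-resolution only for high-quality (non-bad) frames.
    processed: Dict[int, F] = {
        i: super_resolve(f) for i, f in enumerate(frames) if not flags[i]
    }
    references = [processed[i] for i in sorted(processed)]
    # Step 3: low-quality frames are compensated from the super-resolved ones.
    for i, f in enumerate(frames):
        if flags[i]:
            processed[i] = compensate(f, references)
    # Step 4: render every frame in its original order.
    return [render(processed[i]) for i in range(len(frames))]
```

The expensive `super_resolve` call is never applied to a frame flagged as bad, which is where the saving in computing resources described in the text would come from.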

To improve the effect of super-resolution processing, that is, to highlight the features of important areas in the video frames, suppress noise and obtain the best resolution improvement, the invention uses the idea of multi-factor fusion and applies several attention models to the super-resolution processing, including a channel attention model, a spatial attention model and a multi-head self-attention model. The advantages of the three models are fused so that they complement one another and the resolution improvement is optimized.

When fusing the advantages of the channel attention model, the spatial attention model and the multi-head self-attention model, the invention uses a neural network to determine the fusion weights. The optimal fusion weights are thus determined objectively and automatically, so that the advantages of the three models are exploited to the greatest extent in the fusion and the resolution improvement is optimized.
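One common way to fuse several model outputs with learned weights is a softmax-normalized weighted sum, sketched below. This is an assumption about the fusion form, not a statement of the patented method: the text says only that a neural network determines the fusion weights. Here the three raw scores (`logits`) stand in for that network's output, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention_outputs(
    channel: Sequence[float],
    spatial: Sequence[float],
    self_attn: Sequence[float],
    logits: Sequence[float],
) -> List[float]:
    """Weighted sum of three equally sized feature vectors.

    `logits` holds three raw scores, assumed to come from a weight-predicting
    network, for the channel, spatial and multi-head self-attention outputs.
    """
    w_c, w_s, w_m = softmax(logits)
    return [w_c * c + w_s * s + w_m * m for c, s, m in zip(channel, spatial, self_attn)]
```

With equal scores the three outputs contribute equally; training the weight-predicting network would shift the balance toward whichever attention model helps most.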

Furthermore, before the super-resolution processing, the invention detects the video frames and picks out the video frames of low video quality (commonly called bad frames), which are not super-resolved. Because rendering a bad frame is pointless, detecting the video frames avoids invalid or meaningless super-resolution processing.

Detecting the video frames and picking out the low-quality video frames (bad frames) before the super-resolution processing comprises the following steps:

obtaining first local areas of the video frame by means of the division mapping relation, wherein the first local areas correspond to the area division result of the video frame, that is, the plurality of divided areas of the video frame;

carrying out feature information quantity statistics on the first local areas, and determining a target area among the first local areas according to the statistics result;

and determining, according to the target areas of the video frames, the bad frames (low-quality video frames) among the plurality of video frames through the detection model, wherein the detection model is a neural network.

Further, determining a bad frame among the plurality of video frames includes:

inputting all target areas of the video frame into the detection model, the detection model outputting the classification label of the video frame;

the classification labels comprise bad frame labels and non-bad frame labels;

video frames with the bad frame label are treated as low-quality video frames, and video frames with the non-bad frame label are treated as high-quality video frames.

In the multi-factor fusion rendering process, subsequent steps such as video frame super-resolution processing, video frame quality compensation and video frame rendering can thus be avoided for low-quality video frames, which reduces invalid steps and ensures the accuracy of the rendering effect.

The above embodiments are only exemplary embodiments of the present application and are not intended to limit the present application, the scope of which is defined by the claims. Various modifications and equivalent arrangements may be made to the present application by those skilled in the art, which modifications and equivalents are also considered to be within the scope of the present application.

Claims

1. The method for detecting the bad frames of the streaming media video is characterized by comprising the following steps:

according to the target area of the video frames, determining bad frames in a plurality of video frames through a detection model, wherein the detection model is a neural network;

setting the division number m of the first local areas, dividing the video frame into m first local areas of equal area, and calculating the image discreteness among the m first local areas, wherein the image discreteness is measured by a variance formula in which δ denotes the image discreteness, S_k denotes the image matrix of the k-th first local area, S_E denotes the mean image matrix of the m first local areas, x_{E,i} denotes the pixel value of the i-th pixel in the mean image matrix S_E, x_{k,i} denotes the pixel value of the i-th pixel in the image matrix S_k of the k-th first local area, N is the number of pixels in the image matrix, and i and k are counting indices; solving for the value of m that maximizes the image discreteness among the first local areas, and dividing the video frame according to that value of m, so that the valid pixels representing features of the photographed target object are concentrated in the same first local area and the invalid pixels representing features other than the photographed target object are concentrated in the same first local area.

2. The method for detecting bad frames of streaming media video according to claim 1, wherein the method comprises the following steps:

the determining of the first local area includes:

3. The method for detecting bad frames of streaming media video according to claim 1, wherein the method comprises the following steps: the determining of the target area includes:

4. The method for detecting bad frames of streaming media video according to claim 3, wherein the method comprises the following steps:

determining a bad frame of a plurality of video frames, comprising:

the classification labels include bad frame labels and non-bad frame labels.

5. The method for detecting bad frames of streaming media video according to claim 4, wherein the method comprises the following steps:

the construction of the detection model comprises the following steps:

the model expression of the detection model is as follows: Label=CNN(g); in the formula, Label is the classification label, g is all target areas of a sample video frame, and CNN is a neural network.

6. The method for detecting bad frames of streaming media video according to claim 5, wherein the method comprises the steps of:

the consistency is quantified using image similarity.

7. The method for detecting bad frames of streaming media video according to claim 5, wherein the method comprises the steps of:

and all target areas in the sample video frame are consistent with the specification of a standard image of the characteristic of the shot target object.

8. The bad frame detection device of the streaming media video is characterized by comprising the following components:

the data processing module is used for obtaining a first local area for the video frame through dividing the mapping relation, wherein the first local area corresponds to an area division result of the video frame;

the data storage module is used for storing the detection model;

9. A computer device, characterized by comprising at least one processor; and

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to cause a computer device to perform the method of any of claims 1-7.