WO2020224428A1 - Method for embedding information in a video, computer device, and storage medium - Google Patents

Method for embedding information in a video, computer device, and storage medium

Publication number
WO2020224428A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
implanted
detected
model
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/085939
Other languages
English (en)
French (fr)
Inventor
高琛琼
殷泽龙
谢年华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to EP20802358.0A (patent EP3968627B1, en)
Priority to JP2021532214A (patent JP7146091B2, ja)
Publication of WO2020224428A1 (zh)
Priority to US17/394,579 (patent US11785174B2, en)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272 Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 Two-dimensional [2D] image generation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 Two-dimensional [2D] image generation
    • G06T11/60 Creating or editing images; Combining images with text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/63 Control of cameras or camera modules by using electronic viewfinders
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • H04N23/661 Transmitting camera control signals through networks, e.g. control via the Internet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/64 Circuits for processing colour signals
    • H04N9/68 Circuits for processing colour signals for controlling the amplitude of colour signals, e.g. automatic chroma control circuits
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70 Circuitry for compensating brightness variation in the scene
    • H04N23/71 Circuitry for evaluating the brightness variation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628 Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation

Definitions

  • This application relates to graphics and image technology, and in particular to a method for embedding information in a video, a computer device and a storage medium.
  • Video is the current mainstream information carrier. With the development of the Internet, especially the mobile Internet, the speed of video transmission has increased rapidly, making video an important channel for information transmission.
  • Video information implantation refers to superimposing various information, such as promotional information, including images, text, or a combination of the two, in the background of the video without affecting the main content of the video (for example, foreground content).
  • The main content of the video (such as the characters in the video and the special effects added in post-production) is presented in the form of foreground content.
  • The information therefore needs to be integrated into the background content of the video.
  • Related technologies lack an effective solution for this.
  • The embodiments of the present application provide a method, a computer device, and a storage medium for embedding information in a video, which can efficiently integrate information into the background content of the video.
  • An embodiment of the present application provides a method for embedding information in a video, including:
  • overlaying the information to be implanted, after the template has been applied to it, on the implanted area in the frame to be detected, so that the foreground is highlighted relative to the information to be implanted.
  • An embodiment of the present application provides a device for embedding information in a video, including:
  • a model building module is used to build a model that conforms to the pixel distribution characteristics of the implanted area in the reference frame, and to control the update of the model based on the frame to be detected subsequent to the reference frame;
  • a template generating module configured to identify the background and foreground of the implanted area in the frame to be detected based on the model, and generate a template for occluding the background and revealing the foreground;
  • a template application module configured to apply the template to the information to be implanted, so as to shield the content in the information to be implanted that would obscure the foreground;
  • an information covering module configured to overlay the information to be implanted, after the template has been applied to it, on the implanted area in the frame to be detected, so that the foreground is highlighted relative to the information to be implanted.
  • the device further includes:
  • a parameter initialization module configured to initialize, for each pixel of the implanted area in the reference frame, at least one sub-model corresponding to the pixel and a weight corresponding to each sub-model;
  • a weight mixing module configured to mix the sub-models constructed for each pixel, based on the initialized weights, to form the model corresponding to the pixel.
  • the device further includes:
  • a weight retention module configured to reduce the rate at which the model is fitted to the implanted area in the to-be-detected frame in response to the implanted area in the to-be-detected frame being blocked by the foreground;
  • a fitting acceleration module configured to increase the rate at which the model is fitted to the implanted area in the frame to be detected, in response to the implanted area in the frame to be detected not being blocked by the foreground and the illumination of the implanted area in the frame to be detected having changed.
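  • The two modules above amount to a rule for choosing the model's fitting (learning) rate per frame. The following is a minimal sketch of such a rule, not the method of the application: the function name `adjust_learning_rate` and the values `slow_factor`, `fast_factor`, and `max_rate` are hypothetical and chosen only for illustration.

```python
def adjust_learning_rate(base_rate, occluded, illumination_changed,
                         slow_factor=0.1, fast_factor=5.0, max_rate=1.0):
    """Pick the fitting rate for the current frame (all factors are assumed values)."""
    if occluded:
        # Foreground covers the area: fit slowly so the model keeps describing the background.
        return base_rate * slow_factor
    if illumination_changed:
        # Area is visible but lighting changed: fit faster so the model adapts quickly.
        return min(base_rate * fast_factor, max_rate)
    # Nothing special happened: keep the normal rate.
    return base_rate
```

  • Slowing the fit while the area is occluded keeps foreground pixels from being absorbed into the background model, while speeding it up after a lighting change shortens the time during which background pixels are misclassified.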
  • the device further includes:
  • a parameter update module configured to, in response to a pixel of the implanted area in the frame to be detected matching at least one sub-model in the corresponding model, update the parameters of the matched sub-model and keep the parameters of the unmatched sub-models in the corresponding model unchanged.
  • the device further includes:
  • the first matching module is configured to match the color value of each pixel in the implanted area in the frame to be detected with the sub-model in the model corresponding to the pixel;
  • the recognition module is used for recognizing the pixels that are successfully matched as the pixels of the background, and the pixels that are not matched as the pixels of the foreground.
  • the device further includes:
  • a filling module configured to fill binary ones in an empty template at the positions corresponding to the pixels identified as background in the implanted area in the frame to be detected, and to fill binary zeros, in the template already filled with binary ones, at the positions corresponding to the pixels identified as foreground.
  • the device further includes:
  • the arithmetic module is used to multiply the information to be implanted with the binary number filled in each position in the template.
  • the device further includes:
  • the second matching module is configured to match the features extracted from the implanted region in the reference frame of the video with the features extracted from the frame to be detected in response to the video being formed using a motion lens;
  • the area determining module is configured to determine that the frame to be detected includes the implanted area corresponding to the implanted area in the reference frame in response to the successful matching.
  • the device further includes:
  • the area transformation module is used to respond to the video being formed with a motion lens
  • a template inverse transformation module configured to apply the inverse of the transformation to the template before the template is applied to the information to be implanted, so that the position of each binary number in the transformed template coincides with the position of the corresponding pixel in the implanted area in the frame to be detected.
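  • As an illustration of the inverse transformation of the template, the sketch below assumes that the implanted area of the frame to be detected was mapped into reference-frame coordinates by a 3x3 homography, and that the template was computed in those reference coordinates. The helper names `project` and `template_to_frame`, the nearest-neighbour sampling, and the choice to fill out-of-range positions with 0 are all assumptions made for this example.

```python
def project(h, x, y):
    """Apply a 3x3 homography h (nested lists) to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


def template_to_frame(template_ref, h_frame_to_ref, width, height):
    """Bring a template from reference coordinates back to frame coordinates.

    For every frame position we look up where it landed in the reference
    picture and copy the binary value found there (nearest neighbour).
    """
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = project(h_frame_to_ref, x, y)
            col, row = round(u), round(v)
            if 0 <= row < len(template_ref) and 0 <= col < len(template_ref[0]):
                out[y][x] = template_ref[row][col]
            # Positions that fall outside the template stay 0 (assumed default).
    return out


# Example: the frame is shifted one pixel to the left relative to the reference.
shift = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
template_ref = [[1, 0, 1], [1, 1, 0]]
template_frame = template_to_frame(template_ref, shift, width=3, height=2)
```

  • Sampling the template through the frame-to-reference mapping avoids explicitly inverting the matrix while still placing each binary value at the position of its pixel in the frame to be detected.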
  • the device further includes:
  • a region positioning module configured to, in response to the video being formed using a static lens, locate the region at the corresponding position in the frame to be detected based on the position of the implanted area in the reference frame, so as to determine the implanted area in the frame to be detected.
  • the device further includes:
  • a first determining module configured to determine that the implanted area in the frame to be detected is blocked by the foreground, in response to the first color space distribution of the implanted area in the frame to be detected and the first color space distribution of the implanted area in the reference frame meeting a first difference condition;
  • a second determining module configured to determine that the illumination of the implanted area in the frame to be detected has changed, in response to the second color space distribution of the implanted area in the frame to be detected and the second color space distribution of the implanted area in the reference frame meeting a second difference condition.
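  • The text above does not fix the color space, the distance measure, or the thresholds used by the difference conditions. The sketch below is one possible reading, under the assumption that a distribution is a normalized histogram of one color channel and that a difference condition is met when the total variation distance between two histograms exceeds a threshold. All names and numeric values are hypothetical.

```python
def normalized_histogram(values, bins=8, max_value=256):
    """Histogram of integer channel values in [0, max_value), normalized to sum to 1.

    Assumes `values` is non-empty.
    """
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    total = len(values)
    return [c / total for c in counts]


def distribution_distance(hist_a, hist_b):
    """Total variation distance between two normalized histograms (0 = identical, 1 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / 2


def meets_difference_condition(reference_values, current_values, threshold):
    """True when the two distributions differ by more than the (assumed) threshold."""
    ref_hist = normalized_histogram(reference_values)
    cur_hist = normalized_histogram(current_values)
    return distribution_distance(ref_hist, cur_hist) > threshold


# Example: half of the area changes from value 100 to value 200.
reference_area = [100] * 16
current_area = [100] * 8 + [200] * 8
changed = meets_difference_condition(reference_area, current_area, threshold=0.3)
```

  • In practice the two conditions would be evaluated on different color-space distributions, one sensitive to occlusion and one sensitive to lighting; which channels serve which purpose is not stated in this part of the text.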
  • An embodiment of the present application provides a computer device, including:
  • a memory configured to store executable instructions; and
  • a processor configured to implement the method provided in the embodiments of the present application when executing the executable instructions stored in the memory.
  • An embodiment of the present application provides a storage medium that stores executable instructions for causing a processor to execute, to implement the method provided in the embodiment of the present application.
  • the embodiment of the present application provides a computer program product, and the computer program product stores a computer program, which is used to implement the method provided in the embodiment of the present application when it is loaded and executed by a processor.
  • FIG. 1A is a schematic diagram of processing an image using a mask in an embodiment of the present application;
  • FIG. 1B is a schematic diagram of an application scenario provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an optional structure of a device provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the implementation process of a method for embedding information in a video according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the implementation process of constructing and updating a model in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of another implementation process of the method for embedding information in a video according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of another implementation process of the method for embedding information in a video according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of another implementation process of embedding information in a video according to an embodiment of the present application;
  • FIG. 8A is a schematic diagram of the effect of embedding information in a video formed by using a static lens according to an embodiment of the present application;
  • FIG. 8B is a schematic diagram of another effect of embedding information in a video formed by using a static lens according to an embodiment of the present application;
  • FIG. 8C is a schematic diagram of the effect of embedding information in a video formed by using a motion lens according to an embodiment of the present application;
  • FIG. 8D is a schematic diagram of another effect of embedding information in a video formed by using a motion lens according to an embodiment of the present application;
  • FIG. 9A is a schematic diagram of using morphology to improve the mask in an embodiment of the present application;
  • FIG. 9B is another schematic diagram of using morphology to improve the mask in an embodiment of the present application.
  • The term "first/second/third" is only used to distinguish similar objects and does not represent a specific order of the objects. Understandably, where permitted, the specific order or sequence of "first/second/third" can be interchanged, so that the embodiments of the present application described herein can be implemented in a sequence other than those illustrated or described herein.
  • A mask, also called a filter or a template, is an image used to shield some or all of the pixels in the image to be processed, so that a specific part of the image is highlighted.
  • The mask can be a two-dimensional matrix array, and sometimes a multi-value image is also used.
  • The image mask is mainly used to shield certain areas of the image.
  • For example, the 3*3 image shown at 101 in FIG. 1A is combined with the 3*3 mask shown at 102 in FIG. 1A to obtain the result image shown at 103 in FIG. 1A.
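  • The masking operation described above can be sketched as an element-wise product of an image with a binary mask of the same size. The pixel and mask values below are made up for illustration and are not the values shown in FIG. 1A; `apply_mask` is a hypothetical helper name.

```python
def apply_mask(image, mask):
    """Multiply each pixel by the binary value at the same position in the mask."""
    return [[pixel * keep for pixel, keep in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]


# Illustrative 3*3 image and 3*3 binary mask (values are assumptions).
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
result = apply_mask(image, mask)
```

  • Positions where the mask holds 1 keep their pixel value, and positions where it holds 0 are shielded.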
  • A static lens, that is, a fixed shot (Fixed Shot, FS), is a shot with a fixed camera position, lens optical axis, and focal length.
  • The objects in a video shot with a static lens can be static or dynamic (moving in and out of the picture), but the frame enclosing the picture does not move, that is, the picture range and the field of view stay the same.
  • A motion lens is a shot taken with various movements (such as changes in camera position, optical axis, and focal length).
  • The frame enclosing the picture in a video shot with a motion lens can change, that is, the picture range and the field of view can change, for example in the distance, size, and angle of the image.
  • Background, that is, the scene behind the subject in the video picture, which can express the space-time environment where a character or event is located, such as the buildings, walls, and ground behind a character.
  • Foreground, that is, the content of the video picture that is closer to the lens than the background; it is the main subject of the video, such as a person standing in front of a building.
  • Background subtraction, that is, manually setting a fixed threshold, subtracting the original background area from the new area in the video that potentially contains foreground, and comparing the difference with the threshold to determine whether the background is occluded by the foreground, and then forming a mask for the occluded part.
  • This solution's separation of foreground and background relies on manually selected thresholds, so the degree of automation is low and frequent adjustments are required; when the colors of the foreground and the background are similar, the subtraction does not separate them completely and the accuracy is low.
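  • The weakness of background subtraction with a fixed threshold is easy to reproduce. The following minimal sketch works on single-channel pixel values; the function name `subtraction_mask`, the convention that 1 marks unoccluded background, and the sample values are assumptions made for this example.

```python
def subtraction_mask(background, frame, threshold):
    """Return 1 where the frame still looks like background, 0 where it looks occluded."""
    return [[1 if abs(cur - ref) <= threshold else 0
             for ref, cur in zip(background_row, frame_row)]
            for background_row, frame_row in zip(background, frame)]


background = [[100, 100, 100]]
# The middle pixel is covered by a dark object, the right pixel by an object
# whose color (110) is close to the background color (100).
frame = [[100, 30, 110]]
mask = subtraction_mask(background, frame, threshold=20)
```

  • With the threshold set to 20, the dark object is detected but the similarly colored one is not, so its pixel is wrongly kept as background. Lowering the threshold fixes this case at the cost of more false detections from noise, which is why the threshold needs frequent manual adjustment.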
  • Gaussian mixture background modeling of static shots is to model the background of static shots with no occlusion, and use the model for subsequent image frames to determine whether the background is occluded by the foreground to form a mask for the occluded part.
  • This solution can only be used for video shot with a static lens. For video shot with a motion lens, the background is easily misrecognized as foreground, and the accuracy is also low.
  • Trajectory classification is to calibrate the target point of interest in the initial frame, use the motion tracking model to obtain the trajectory of the feature points in the implanted information, and to distinguish the foreground and background based on the trajectory.
  • This solution is sensitive to noise in the image frame, and its accuracy depends on the motion tracking model. If the selected motion tracking model is not suitable, the accuracy of distinguishing foreground from background is greatly affected.
  • The embodiments of the present application provide a method for embedding information in a video that builds the model from the video sequence combined with full-pixel statistics. For video shot with a static lens, the background is selected and modeled automatically, the learning rate is updated automatically on subsequent frames to optimize the model, and statistical features are used to determine whether there is occlusion and to form a mask. For video shot with a motion lens, a transformation is used to map each frame to the standard picture of the reference frame for pixel statistical modeling, and the result is then mapped back to the picture of the subsequent frame to obtain the occlusion mask. No motion tracking model is required, and the method offers high real-time performance, a wide application range, strong robustness, and automatic, efficient use.
  • The devices provided in the embodiments of the present application can be implemented as mobile terminals with wireless communication capabilities, such as mobile phones, tablet computers, and notebook computers, and can also be implemented as terminals that are inconvenient to move.
  • The device provided in the embodiments of the present application can also be implemented as a server, and the server can be a single server, a server cluster composed of multiple servers, a cloud computing center, etc., which is not limited herein.
  • FIG. 1B is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • The terminal 400 is connected to the server 200 through a network 300.
  • The network 300 may be a wide area network, a local area network, or a combination of the two, and may use wireless links for data transmission.
  • The implanted information may be an advertisement, and the video may be a video recorded by the terminal.
  • The terminal 400 may send the video and the information to be implanted to the server 200 to request the server 200 to implant the information in the video.
  • The server 200 uses the method of implanting information in a video provided in the embodiments of the present application to add the information to be implanted into the video, encapsulates the result to obtain an encapsulated video file, and finally sends the encapsulated video file to the terminal 400.
  • The terminal 400 can then publish the video embedded with the advertisement.
  • After the terminal 400 has recorded the video and determined the information to be implanted, the terminal 400 may also itself use the method of implanting information in a video provided by the embodiments of the present application to add the information to be implanted to each frame of the video, encapsulate the result to obtain a video file, and then publish the video embedded with the advertisement through an APP for watching videos. It should be noted that, to limit the amount of computation on the terminal and keep implantation efficient, the terminal generally performs information implantation itself only for relatively short videos.
  • The video may also be a video stored in the server 200.
  • In this case, the terminal 400 may send the information to be implanted and the identification information of the video to the server 200 to request the server 200 to add the information to be implanted into the video corresponding to the identification information.
  • The server 200 determines the corresponding video file based on the identification information, embeds the information to be implanted into the video file, encapsulates the result to obtain an encapsulated video file, and then sends the encapsulated video file to the terminal 400.
  • The device provided in the embodiments of the present application may be implemented in hardware or in a combination of software and hardware.
  • The following describes various exemplary implementations of the device provided in the embodiments of the present application.
  • FIG. 2 is a schematic diagram of an optional structure of a server 200 provided in an embodiment of the present application.
  • The server 200 may be a desktop server, a server cluster composed of multiple servers, a cloud computing center, etc.
  • An exemplary structure of the device when implemented as a server can be foreseen from this. Therefore, the structure described here should not be regarded as a limitation. For example, some components described below may be omitted, or components not described below may be added to adapt to the special needs of some applications.
  • The server 200 shown in FIG. 2 includes: at least one processor 210, a memory 240, at least one network interface 220, and a user interface 230. The components in the server 200 are coupled together through the bus system 250. It can be understood that the bus system 250 is used to implement connection and communication between these components. In addition to the data bus, the bus system 250 also includes a power bus, a control bus, and a status signal bus. However, for clarity of description, the various buses are all marked as the bus system 250 in FIG. 2.
  • The user interface 230 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch panel or a touch screen, etc.
  • The memory 240 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
  • The non-volatile memory may be a read only memory (ROM, Read Only Memory).
  • The volatile memory may be a random access memory (RAM, Random Access Memory).
  • The memory 240 in the embodiments of the present application can store data to support the operation of the server 200.
  • Examples of these data include any computer programs used to operate on the server 200, such as operating systems and application programs.
  • The operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks.
  • Application programs can include various applications.
  • The method provided by the embodiments of the present application may be directly embodied as a combination of software modules executed by the processor 210.
  • The software modules may be located in a storage medium, and the storage medium is located in the memory 240.
  • The processor 210 reads the executable instructions included in the software modules in the memory 240 and, in combination with the necessary hardware (for example, the processor 210 and other components connected to the bus system 250), completes the method provided in the embodiments of the present application.
  • The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
  • The method of the embodiments of the present application will be described in conjunction with the foregoing exemplary application and implementation of the apparatus of the embodiments of the present application.
  • The method provided in the embodiments of the present application is applied to an execution device, and the execution device may be a server or a terminal. That is to say, the method provided in the embodiments of the present application may be executed by the server, or may be executed by the terminal.
  • The server can be a desktop server, a server cluster composed of multiple servers, a cloud computing center, etc.
  • The terminal can be a mobile terminal with wireless communication capabilities, such as a mobile phone, a tablet computer, or a notebook computer, and can also be implemented as a terminal with computing functions that is inconvenient to move, such as a desktop computer.
  • FIG. 3 is a schematic diagram of the implementation flow of the method for embedding information in a video according to an embodiment of the present application, which will be described in conjunction with the steps shown in FIG. 3.
  • Step S101: Construct a model that conforms to the pixel distribution characteristics of the implanted area in the reference frame, and control the update of the model based on the frames to be detected that follow the reference frame.
  • The reference frame may be an image in which the information has already been implanted, and the area where the information is implanted is the implanted area.
  • The reference frame and the implanted area can be set manually, or can be selected automatically using technologies such as machine learning and deep learning.
  • The reference frame may be an image frame in the video that includes at least an implanted area, where the implanted area is implanted with the information to be implanted and the information to be implanted is not blocked. For example, it may be the image frame in which the implanted area first appears in the video, with the information to be implanted embedded in the implanted area and not blocked.
  • The reference frame may be the image frame in which a complete advertisement area (for example, a specific area on a wall or the ground that is large enough to display the advertisement completely) appears for the first time in the video.
  • The reference frame may be an image frame in which a target object related to the information to be implanted appears, or an image frame in which keywords related to the information to be implanted appear in the displayed captions.
  • For example, if the information to be implanted is an advertisement for a certain brand of air conditioner, an image frame in which an air conditioner appears in the video may be used as the reference frame, or an image frame in which keywords such as "cold" and "hot" appear may be used as the reference frame.
  • The implanted area can be delineated manually, for example, as an area in the upper right corner or in the upper middle of the image frame. It can also be a specific area that is recognized automatically, such as the ground, a wall, the sky, or another related area. It should be noted that the implanted area in the reference frame is not obstructed by the foreground, so that the pixel distribution of the implanted area can be fully learned when the model is initialized.
  • Constructing a model that conforms to the pixel distribution characteristics of the implanted area in the reference frame means constructing a model for each pixel in the implanted area.
  • For example, the model may be a Gaussian mixture model for each pixel in the implanted area.
  • In this case, the Gaussian mixture model predefined for each pixel is initialized according to that pixel of the implanted area in the reference frame. The Gaussian mixture model includes multiple Gaussian modes (in some embodiments, a Gaussian mode is also called a mode, a Gaussian component, or a sub-model); the parameters of the Gaussian modes are initialized, and the parameters that will be used later are computed.
  • Each pixel in the implanted area of each subsequent frame to be detected is then processed to determine whether it matches a certain mode (that is, a Gaussian mode). If it matches, the pixel is assigned to that mode and the weight of the mode is updated according to the new pixel value. If it does not match, a new Gaussian mode is established from the pixel, its parameters are initialized, and it replaces the mode with the smallest weight in the original model.
  • a certain pattern ie Gaussian mode
  • Step S102: Recognize the background and the foreground of the implanted region in the frame to be detected based on the model, and generate a template for occluding the background and revealing the foreground.
  • In step S102, each pixel in the implanted area in the frame to be detected may be matched in turn against each mode in the corresponding model. If a pixel has a matching mode, the pixel is considered a background pixel; if no mode matches the pixel, the pixel is considered a foreground pixel.
  • the recognition result can be used to generate a template for occluding the background and revealing the foreground. Further, when a pixel is recognized as the background, the corresponding value of the pixel in the template can be set to 1, and if the pixel is recognized as the foreground, the corresponding value of the pixel in the template is set to 0. It should be noted that 0 and 1 here are binary numbers, that is, the template is a mask composed of binary 0 and 1.
  • Step S103: Apply the template to the information to be implanted, so as to shield the content in the information to be implanted that would obscure the foreground.
  • In step S103, the information to be implanted and the template may be multiplied.
  • Multiplying the information to be implanted by the template means multiplying each pixel of the information to be implanted by the binary number at the corresponding position in the template.
  • In the template, the value corresponding to a background pixel is 1 and the value corresponding to a foreground pixel is 0. Therefore, when the information to be implanted is multiplied by the template, the content that would obscure the foreground is shielded, while the content that does not obscure the foreground is unaffected.
  • Step S104: Cover the implantation area in the frame to be detected with the information to be implanted after the template has been applied, so that the foreground is highlighted relative to the information to be implanted.
  • Because the information to be implanted that is overwritten onto the implanted area in the frame to be detected has already had the template applied, it does not obscure the foreground part of the frame to be detected, thereby bringing a better viewing experience.
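  • Steps S103 and S104 can be illustrated with a minimal NumPy sketch. It assumes that the implanted area, the information to be implanted, and the template all have the same height and width, and that "covering" keeps the frame's own pixels wherever the template is 0. The function name `apply_template_and_cover` and the array layout are illustrative assumptions, not taken from the application.

```python
import numpy as np

def apply_template_and_cover(frame_region, info, template):
    """Sketch of steps S103/S104 (layout and names are assumptions).

    frame_region: HxWx3 uint8 pixels of the implanted area in the frame to be detected
    info:         HxWx3 uint8 information to be implanted
    template:     HxW array of 0/1, where 1 marks background and 0 marks foreground
    """
    mask = template[..., None].astype(frame_region.dtype)
    # Step S103: multiply the information by the template, zeroing foreground positions.
    masked_info = info * mask
    # Step S104: cover the area; original pixels are kept where the template is 0.
    return masked_info + frame_region * (1 - mask)
```

  • With this composition, a position marked 1 in the template shows the implanted information, and a position marked 0 keeps the original foreground pixel.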
  • When the method provided by the embodiments of this application is used to implant information in a video, a model is first constructed for each pixel based on the pixel distribution characteristics of the implanted area in the reference frame, and the parameters of the model can be updated based on each pixel in the implanted area in the frame to be detected. Then, based on the foreground pixels and background pixels of the implanted area in the frame to be detected, a template that blocks the background but not the foreground is generated and applied to the information to be implanted. Finally, the information to be implanted after the template has been applied is overlaid on the implanted area in the frame to be detected.
  • Because the generated template blocks the background but not the foreground, applying it to the information to be implanted shields the content that would otherwise block the foreground. Therefore, after the information is implanted in the frame to be detected, the foreground part of the frame to be detected is not blocked, thereby ensuring the video viewing experience.
  • Step S101 can be implemented through the steps shown in FIG. 4:
  • Step S1011: For each pixel of the implanted area in the reference frame, initialize at least one sub-model corresponding to the pixel and a weight corresponding to the at least one sub-model.
  • The model is constructed at pixel granularity; that is, a model is constructed for each pixel, and the model of a pixel may correspond to one sub-model or to multiple sub-models.
  • For example, the model of a pixel may be a Gaussian mixture model that includes two or more sub-models, generally three to five.
  • A sub-model may be a Gaussian probability distribution function, and initializing a sub-model means at least initializing its parameters, such as the mean, variance, and weight.
  • When a sub-model is initialized, its parameters may be set to preset values.
  • During initialization, the variance is generally set as large as possible and the weight as small as possible. This is because the initialized Gaussian model is inaccurate: its range must be narrowed continually and its parameter values updated to obtain the most likely Gaussian model.
  • The variance is set large so that as many pixels as possible match the sub-models, which yields a model that accurately represents how the color values of the pixel are distributed across the frames of the video.
  • The model may also be a single Gaussian model, in which case only one sub-model is required and its parameters may be the mean, the variance, and the like. Because the single Gaussian model is suitable only for scenes with a single, constant background, a Gaussian mixture model is usually constructed for subsequent processing.
  • Step S1012: Mix the sub-models constructed for each pixel based on the initialized weights to form the model corresponding to the pixel.
  • For example, when a pixel has three sub-models, step S1012 can be implemented by formula (1-1):
  • $F_m = K_1 F_1 + K_2 F_2 + K_3 F_3$ (1-1);
  • where $F_m$ is the model corresponding to the pixel, $F_1$, $F_2$, and $F_3$ are its sub-models, and $K_1$, $K_2$, and $K_3$ are the corresponding weights.
  • A simple mathematical transformation can also be performed on formula (1-1) to form the model corresponding to the pixel.
  • Through step S1011 and step S1012, the construction of a model conforming to the pixel distribution characteristics of the implanted area in the reference frame is completed.
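  • As an illustration of the weighted mixing in formula (1-1), the following sketch evaluates a mixture of Gaussian sub-models for a single scalar (grayscale) pixel value. Treating each sub-model as a one-dimensional Gaussian density, and the names `gaussian_pdf` and `mixture_pdf`, are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian sub-model at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, means, stds):
    """Weighted sum of sub-models, in the spirit of formula (1-1)."""
    return sum(k * gaussian_pdf(x, m, s) for k, m, s in zip(weights, means, stds))
```

  • The sketch works for any number of sub-models, since it simply sums over whatever weights and parameters are passed in.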
  • Step S1013: Determine whether the implanted area in the frame to be detected is blocked by the foreground.
  • The first color space distribution of the implanted area in the frame to be detected and that of the implanted area in the reference frame may be obtained first. The degree of difference between the two distributions is then determined, and whether the implanted area in the frame to be detected is occluded by the foreground is decided by checking whether the degree of difference satisfies the first difference condition.
  • If the first color space distributions of the two implanted areas satisfy the first difference condition, the difference between them is large, which indicates that the implanted area in the frame to be detected is blocked by the foreground, and step S1014 is entered. If they do not satisfy the first difference condition, the difference is small, which indicates that the implanted area in the frame to be detected is not blocked by the foreground, and step S1015 is entered.
  • The first color space distribution may be a Red Green Blue (RGB) space distribution.
  • The first color space distribution of the implanted area can be obtained by computing the RGB histogram of the implanted area. For example, the 256 gray levels can be divided into 32 intervals, and the number of pixels of the implanted area that fall into each of the 32 intervals is counted to obtain the RGB histogram.
  • The first difference condition may be used to indicate the maximum degree of difference in the first color space distribution between the implanted area of the reference frame and the implanted area of the frame to be detected when it is determined that there is no occlusion in the implanted area of the frame to be detected. For example, assuming that there are M intervals in total, the first difference condition may be that, in 30%*M of the intervals, the difference in the number of pixels falls outside a number threshold range. With 32 intervals, for instance, the first difference condition may be that the difference in the number of pixels exceeds 10 in at least 9 intervals.
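  • The histogram comparison can be sketched as follows, using the example figures from the text (32 intervals, 30% of the intervals, a count difference above 10). Computing one histogram per color channel and declaring the condition met when any channel qualifies is an assumption of this sketch, as are the function names.

```python
import numpy as np

def channel_histogram(region, bins=32):
    """Per-channel histogram of an HxWx3 uint8 region over `bins` equal intervals."""
    return np.stack([
        np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
        for c in range(region.shape[-1])
    ])

def satisfies_difference_condition(ref_region, cur_region, bins=32,
                                   bin_fraction=0.3, count_threshold=10):
    """True if enough intervals differ strongly between the two implanted areas."""
    diff = np.abs(channel_histogram(ref_region, bins) - channel_histogram(cur_region, bins))
    min_bins = int(bin_fraction * bins)  # 9 intervals when bins == 32
    # Assumption: the condition is met if any single channel has enough differing intervals.
    return bool(((diff > count_threshold).sum(axis=1) >= min_bins).any())
```

  • The same comparison could be applied to an HSV histogram for the illumination check; only the color conversion before the call would differ.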
  • Step S1014: In response to the implanted area in the frame to be detected being occluded by the foreground, decelerate the fitting of the model to the implanted area in the frame to be detected, so that the weights of the sub-models in the model remain unchanged.
  • Decelerating the fitting of the model to the implanted area in the frame to be detected means that the rate of fitting the model to the implanted area in the frame to be detected is reduced.
  • the learning rate related to the fitting speed in the model can be set to 0 to keep the weight of the sub-model in the model unchanged.
  • Step S1015: Determine whether the illumination condition of the implanted area in the frame to be detected has changed.
  • The second color space distribution of the implanted area in the frame to be detected and that of the implanted area in the reference frame may be obtained first. The degree of difference between the two distributions is then determined, and whether the illumination condition of the implanted area in the frame to be detected has changed is decided by checking whether the degree of difference satisfies the second difference condition.
  • The second difference condition may be used to indicate the maximum degree of difference in the second color space distribution between the implanted area of the reference frame and the implanted area of the frame to be detected when it is determined that the illumination condition of the implanted area in the frame to be detected changes.
  • In response to the second color space distributions of the two implanted areas satisfying the second difference condition, it is determined that the illumination condition of the implanted area in the frame to be detected has changed, and step S1016 is entered. In response to the second color space distributions not satisfying the second difference condition, it is determined that the illumination condition has not changed; in this case the original learning rate is maintained and the weights are updated.
  • The second color space distribution may be a Hue Saturation Value (HSV) space distribution.
  • Step S1016 Accelerate the fitting of the model to the implanted area in the frame to be detected.
  • the fitting of the model to the implanted area in the frame to be detected is accelerated, that is, the rate at which the model is fitted to the implanted area in the frame to be detected is increased.
  • The prerequisite for performing step S1016 is that the implanted area in the frame to be detected is not occluded by the foreground and that its illumination has changed. To avoid recognizing the new illumination as foreground, the fitting must be accelerated so that the model fits the implanted area of the frame to be detected as soon as possible and continues to represent the pixel distribution characteristics of the implanted area. For example, for the model of each pixel in the implanted area, the learning rate related to the fitting speed in the model can be set to -1.
  • Through step S1013 to step S1016, the weight of each sub-model in the model is updated. The parameters of the sub-models then need to be updated as well.
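  • The branching in steps S1013 to S1016 can be summarized in a short sketch. The occlusion case uses the learning rate 0 given in the text. The text gives -1 as an example value for the accelerated case without spelling out its convention here, so the sketch leaves the accelerated and default rates as caller-supplied parameters; `base_rate`, `fast_rate`, and their default values are hypothetical.

```python
def select_learning_rate(occluded, illumination_changed,
                         base_rate=0.01, fast_rate=0.1):
    """Choose the learning rate for one frame to be detected (values are assumptions)."""
    if occluded:
        # Step S1014: freeze the sub-model weights while the foreground occludes the area.
        return 0.0
    if illumination_changed:
        # Step S1016: fit the new illumination faster so it is not mistaken for foreground.
        return fast_rate
    # Illumination unchanged: keep the original learning rate.
    return base_rate
```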
  • Step S1017: Determine whether each pixel of the implanted area in the frame to be detected matches a sub-model in the corresponding model.
  • For any pixel in the implanted area, if the deviation between the color value of the pixel and the mean of a sub-model in the model of the pixel is within a certain threshold, the pixel is considered to match the sub-model.
  • For example, the threshold may be related to the standard deviation, and may be 2.5 times the standard deviation of the sub-model. If a pixel matches at least one sub-model in the model, step S1018 is entered; if a pixel does not match any sub-model in the model, step S1019 is entered.
  • Step S1018 in response to the pixel points of the implanted area in the frame to be detected matching at least one sub-model in the corresponding model, update the parameters of the matched sub-model.
  • Step S1019 In response to the pixel points of the implanted area in the frame to be detected that do not match any of the sub-models in the corresponding model, a new sub-model is initialized based on the pixels, and the sub-model with the smallest weight is replaced.
  • Through steps S1017 to S1019, the update of the sub-model parameters is completed.
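  • A minimal sketch of steps S1017 to S1019 for one grayscale pixel is given below. The 2.5 standard deviation threshold comes from the text; the update factor `rho = alpha / weight`, the renormalization after replacement, and the initial values `init_std` and `init_weight` are common simplifications assumed for the example and are not stated in the application.

```python
import numpy as np

def update_pixel_model(x, weights, means, stds, alpha,
                       init_std=30.0, init_weight=0.05):
    """Update one pixel's sub-models in place; return True if x matched a sub-model."""
    matched = np.abs(x - means) <= 2.5 * stds      # step S1017
    if matched.any():                              # step S1018
        i = int(np.argmax(matched))                # first matching sub-model
        hit = np.zeros_like(weights)
        hit[i] = 1.0
        weights[:] = (1 - alpha) * weights + alpha * hit
        # Assumed simplification of the update factor; alpha == 0 leaves the model frozen.
        rho = min(1.0, alpha / weights[i]) if weights[i] > 0 else 0.0
        means[i] = (1 - rho) * means[i] + rho * x
        stds[i] = np.sqrt((1 - rho) * stds[i] ** 2 + rho * (x - means[i]) ** 2)
        return True
    j = int(np.argmin(weights))                    # step S1019
    means[j], stds[j], weights[j] = x, init_std, init_weight
    weights /= weights.sum()                       # assumed renormalization
    return False
```

  • The return value doubles as the background/foreground decision for the pixel: a match means background, no match means foreground.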
  • The updated model is then used to generate the template that occludes the background and reveals the foreground, so that information implanted in the implanted area of the frame to be detected blends better with the background without blocking the foreground.
  • step S102 can be implemented through the following steps:
  • Step S1021: Match the color value of each pixel in the implanted area in the frame to be detected against each sub-model in the model corresponding to the pixel.
  • In step S1021, the color value of each pixel in the implanted area in the frame to be detected may be compared with each sub-model corresponding to the pixel. When the deviation between the color value of a pixel and the mean of at least one sub-model is within a certain threshold, the pixel matches that sub-model.
  • Step S1022: Identify the pixels that are successfully matched as background pixels, and the pixels that fail to match as foreground pixels.
  • Because the implanted area in the reference frame is not blocked by the foreground, it can be regarded as background, and the model is constructed from the pixel distribution characteristics of that area. Therefore, if a pixel in the implanted area in the frame to be detected matches a sub-model in the model corresponding to the pixel, the pixel is determined to be a background pixel; if it matches none of the sub-models, it is determined to be a foreground pixel.
  • Step S1023: For the pixels identified as background in the implanted area in the frame to be detected, fill a binary one at the corresponding positions in the empty template.
  • Step S1024: For the pixels identified as foreground in the implanted area in the frame to be detected, fill a binary zero at the corresponding positions in the template filled with binary ones.
  • Through steps S1021 to S1024, a binary template is generated.
  • In this template, the position corresponding to a pixel identified as background is 1, and the position corresponding to a pixel identified as foreground is 0. Therefore, multiplying this template by the information to be implanted yields the information to be implanted after the template is applied.
  • In that result, the pixel value at every position identified as foreground is 0, while the pixel value at every position identified as background remains unchanged. In this way, when the information to be implanted after the template is applied covers the implanted area in the frame to be detected, the foreground is not occluded and is highlighted relative to the implanted information.
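  • Steps S1021 to S1024 can be sketched in vectorized form for a grayscale implanted area. The array layout (one set of K sub-model means and standard deviations per pixel) and the name `build_template` are assumptions; the factor 2.5 is the example threshold mentioned for sub-model matching.

```python
import numpy as np

def build_template(region, means, stds, k=2.5):
    """Return an HxW template with 1 for background pixels and 0 for foreground pixels.

    region:      HxW pixel values of the implanted area in the frame to be detected
    means, stds: HxWxK per-pixel sub-model parameters
    """
    # Steps S1021/S1022: a pixel is background if it matches at least one sub-model.
    matched = np.abs(region[..., None] - means) <= k * stds
    template = np.zeros(region.shape, dtype=np.uint8)  # start from an empty template
    template[matched.any(axis=-1)] = 1                 # step S1023: background positions
    return template                                    # step S1024: foreground stays 0
```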
  • Before step S101, it is also necessary to determine the implantation area in the frame to be detected. If the video is formed using a static lens, the frame range and the field of view in the video are unchanged. In this case, the implanted area in the frame to be detected can be determined by locating, in the frame to be detected, the area at the position corresponding to the implanted area in the reference frame.
  • Determining the implanted area in the frame to be detected can also be achieved through the following steps:
  • Step 21 Match the feature extracted from the implanted region in the reference frame of the video with the feature extracted from the frame to be detected.
  • In step 21, feature points can first be extracted from the implanted area in the reference frame and from the frame to be detected, and the feature points extracted from the implanted area in the reference frame are then matched with the feature points in the frame to be detected.
  • The extracted feature points may be Oriented FAST and Rotated BRIEF (ORB) feature points, which combine Features from Accelerated Segment Test (FAST) keypoints with Binary Robust Independent Elementary Features (BRIEF) descriptors, or Scale-Invariant Feature Transform (SIFT) feature points.
  • Step 22 In response to the successful matching, it is determined that the frame to be detected includes an implanted area corresponding to the implanted area in the reference frame.
  • Successful matching of the feature points of the implanted area in the reference frame with the feature points in the frame to be detected can mean that all of the feature points are matched, or that a portion of them, for example 80%, are matched.
  • Through step 21 and step 22, the implanted area in the frame to be detected is tracked by matching the feature points of the implanted area in the reference frame with the feature points in the frame to be detected. Compared with tracking implemented by motion tracking, this approach offers better real-time performance, a wider range of application, and stronger robustness, and it can be used automatically and efficiently.
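  • The matching decision in steps 21 and 22 can be sketched with plain NumPy, assuming that binary descriptors (for example 32-byte ORB-style descriptors) have already been extracted by some feature detector. The brute-force Hamming matching, the `max_distance` value, and the function names are assumptions of the sketch; the 80% ratio is the example from the text.

```python
import numpy as np

def match_ratio(ref_desc, cur_desc, max_distance=40):
    """Fraction of reference descriptors that have a close match in the current frame.

    ref_desc: NxB uint8 packed binary descriptors from the reference implanted area
    cur_desc: MxB uint8 packed binary descriptors from the frame to be detected
    """
    xor = ref_desc[:, None, :] ^ cur_desc[None, :, :]
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)  # Hamming distances, shape N x M
    return float((dist.min(axis=1) <= max_distance).mean())

def contains_implanted_area(ref_desc, cur_desc, min_ratio=0.8):
    """Step 22: treat the match as successful when enough feature points are matched."""
    return match_ratio(ref_desc, cur_desc) >= min_ratio
```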
  • If the video is formed using a moving lens, the position, optical axis, and focal length of the lens may change, so the position of the implanted area changes from one image frame of the video to the next.
  • In this case, before step S1013, the following step needs to be performed:
  • Step 31 Transform the implanted area in the frame to be detected so that the position of each pixel in the transformed implanted area is consistent with the position of the corresponding pixel in the implanted area in the reference frame.
  • Step 31 can be implemented by first tracking the implanted area (that is, the background area used for implanting information) to generate a homography matrix H, and then transforming the implanted area in the frame to be detected into the reference frame according to the homography matrix H, so that the position of each pixel in the transformed implanted area is consistent with the position of the corresponding pixel in the implanted area in the reference frame. Further, it can be implemented according to formula (2-1), in which $x_t$, $y_t$ represent a pixel in the current frame and $x_0$, $y_0$ represent the corresponding pixel in the reference frame.
  • In step S1013 and the steps that follow it, the implanted area that has undergone the homography transformation is what is actually used. Accordingly, the transformed implanted area is also used in step S102 when identifying the background and foreground of the implanted area in the frame to be detected and when generating the template for blocking the background and revealing the foreground.
  • Correspondingly, before the template is applied, the inverse of the transformation also needs to be performed on the template, so that the position of each binary number in the transformed template is consistent with the position of the corresponding pixel of the implanted area in the frame to be detected.
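  • The forward and inverse mappings can be sketched on point coordinates. The sketch assumes that H maps homogeneous coordinates of the frame to be detected, $(x_t, y_t, 1)$, to those of the reference frame, $(x_0, y_0, 1)$, up to scale; the direction of H and the name `apply_homography` are assumptions rather than a restatement of formula (2-1).

```python
import numpy as np

def apply_homography(H, points):
    """Map Nx2 points through the 3x3 homography H and return Nx2 points."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # divide by the scale term
```

  • Positions in the template, which is expressed in reference-frame coordinates, can be carried back to the frame to be detected with `apply_homography(np.linalg.inv(H), template_points)`.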
  • In the embodiments of this application, the pixel distribution characteristics of each pixel in the implanted area in the frame to be detected are used to fit the background pixel distribution of the implanted area in the reference frame. A Gaussian mixture model is constructed, learned, and updated automatically, and a template that shields the background and displays the foreground is determined from the occlusion detection result, which prevents the implanted information from blocking the foreground.
  • For a video formed using a moving lens, a transformation is used to map the pixel positions of the implanted area in the frame to be detected to positions consistent with the implanted area in the reference frame. Occlusion detection is likewise performed on the pixels of the implanted area in the frame to be detected to generate a template, and the template is then inversely transformed to form a template that shields the background and displays the foreground, which ensures that the foreground is not blocked after the information is implanted.
  • FIG. 5 is a schematic diagram of another implementation process of the method for embedding information in a video according to an embodiment of the present application. As shown in Figure 5, the method includes:
  • Step S401 The terminal obtains the video to be processed and the information to be implanted.
  • The video to be processed may be a video recorded by the terminal, a video downloaded by the terminal from a server, or a video sent to the terminal by another terminal.
  • The information to be implanted may be image information to be implanted, which may be advertisement image information or publicity information.
  • The video to be processed may be a video file that includes many image frames.
  • The video to be processed may also refer to identification information of the video to be processed; for example, it may include the title, starring actors, and other information of the video to be processed.
  • Step S402 The terminal sends an implantation request carrying at least the video and the information to be implanted to the server.
  • the implantation request may also include the identification of the reference frame and the information of the implantation area in the reference frame.
  • the implantation request may include the frame number of the reference frame and the coordinates of the four vertices of the implantation area in the reference frame.
  • Step S403 the server determines the reference frame and the implantation area in the reference frame based on the received implantation request.
  • The received implantation request may be parsed to obtain the set reference frame and the implantation area set in the reference frame.
  • Alternatively, the image frames of the video file can be analyzed by means of image recognition to determine a reference frame that meets the information implantation conditions and the implantation area in that reference frame.
  • The information implantation conditions may include at least one of the following: the type of the implantation area (for example, wall or floor), the size of the implantation area (for example, a width and height that fit the information to be implanted), the color of the implantation area (for example, forming a certain contrast with the information to be implanted), and the exposure time of the implantation area (that is, its cumulative duration of appearance in the video).
  • step S404 the server constructs a model that conforms to the pixel distribution characteristics of the implanted area in the reference frame, and controls the update of the model based on the frame to be detected subsequent to the reference frame.
  • Step S405 The server recognizes the background and the foreground of the implanted region in the frame to be detected based on the model, and generates a template for blocking the background and revealing the foreground.
  • Step S406 The server applies the template to the information to be implanted, so as to shield the content in the information to be implanted that would obscure the foreground.
  • Step S407: The server covers the implantation area in the frame to be detected with the information to be implanted after the template has been applied, so that the foreground is highlighted relative to the information to be implanted.
  • step S404 to step S407 can be understood with reference to the description of similar steps above.
  • step S408 the server encapsulates the video after the information is implanted, and sends the encapsulated video to the terminal.
  • Before the server implants information in the image frames of the video, it first splits the video into individual image frames and then implants information in each image frame. After the information is implanted, the image frames, audio, subtitles, and so on need to be combined into a whole in order to obtain a normal video file.
  • the server may also publish the video with the embedded information in the video-watching application.
  • Step S409 The terminal publishes the video with the information implanted.
  • it can be published in the application for watching the video, or sent to other terminals, for example, it can be published in a friend group of an instant messaging application.
  • In the method provided by this embodiment, when the terminal wants to implant information in a video, it sends the video to be processed and the information to be implanted to the server, and the server builds a model according to the pixel distribution characteristics of the implanted area in the reference frame. Since the implanted area in the reference frame is not occluded by the foreground of the video, the background and foreground pixels of the implanted area in each subsequent frame to be detected can be identified based on the constructed model, and a template that blocks the background but not the foreground can then be generated.
  • After the template is applied to the information to be implanted, the content that would obscure the foreground is shielded, so that after the information is implanted in the frame to be detected, the foreground part of the frame is not occluded, thereby ensuring the viewing experience of the video.
  • the embodiment of the present application further provides a method for embedding information in a video.
  • the method includes two stages in the implementation process: a background modeling learning stage and an occlusion prediction stage.
  • FIG. 6 is a schematic diagram of another implementation process of the method for embedding information in a video according to an embodiment of the application. As shown in FIG. 6, the method includes:
  • Step S501 Obtain a background picture.
  • Step S502 Perform Gaussian mixture modeling according to the background image.
  • The background modeling process is completed through step S501 and step S502.
  • Step S503 framing the video.
  • Step S504 Obtain a picture to be predicted.
  • Step S505 Inversely transform the image to be predicted based on background modeling to obtain an inversely transformed picture.
  • Step S506 Perform forward transformation on the inverse transformed picture to obtain an occlusion mask.
  • The flow shown in FIG. 6 constructs an adaptive Gaussian mixture model for background modeling. Based on the initial frame of the business opportunity for implanting the video advertisement, the background model is adapted to subsequent frames, with the learning rate selected adaptively and the model optimized through iterative updates.
  • Fig. 7 is a schematic diagram of another implementation process of embedding information in a video according to an embodiment of the application. As shown in Fig. 7, in this embodiment, information can be embedded in a video through the following steps:
  • Step S601 Deframe the video.
  • the input video is divided into frames through image processing technology, and the video is split into each frame as the picture to be predicted.
  • Step S602 Locate the initial frame of the business opportunity (that is, the frame where the advertisement is to be implanted), and the corresponding implantation area.
  • The initial frame of the business opportunity and the corresponding implantation area can be set manually.
  • Alternatively, image recognition technology based on neural networks can be used to determine the initial frame of the business opportunity, the implantation area, and a specific location (for example, the middle area, consistent with the size of the advertisement), where the specific location corresponds to the implantation area.
  • Step S603 According to the image of the implanted area in the initial frame of the business opportunity, initialize the Gaussian mixture model corresponding to each pixel of the implanted area.
  • Step S604 the subsequent frame (that is, the subsequent frame of the video including the implanted area) is processed as follows:
  • Step S6041 Compare the distribution characteristics of the implanted area of the subsequent frame with the implanted area of the initial frame to determine whether occlusion occurs; when occlusion occurs, the learning rate is updated.
  • Step S6042 Adjust the learning rate according to whether there is a change in illumination.
  • step S6043 the background/foreground pixels are recognized, and the model is updated based on the recognition result and the updated learning rate, and the mask is further determined.
  • the weight of the model is updated according to the updated learning rate; for the parameter, the mean and standard deviation of the unmatched pattern remain unchanged, and the mean and standard deviation of the matched pattern are updated according to the updated learning rate and weight. If no pattern matches, the pattern with the smallest weight is replaced.
  • The modes are arranged in descending order of $\omega/\alpha^2$, so that modes with a high weight and a low standard deviation come first, where $\omega$ is the weight and $\alpha$ is the learning rate.
  • step S6044 the information to be implanted after applying the mask is implanted into the implantation area of the subsequent frame.
  • step S6044 can be understood with reference to the description of similar steps above.
  • Step S605, step S604 is repeated, and after all subsequent frame processing is completed, the image frame is encapsulated.
  • the embedded advertisement will not occlude the foreground part of the image frame, thereby bringing a better viewing experience.
  • steps S601 to S603 correspond to the background modeling learning part
  • steps S604 to S605 correspond to the occlusion prediction part.
  • In the background modeling learning part, the reference frame of the video advertisement implantation item (that is, the initial frame of the business opportunity, which includes the implanted area) may be obtained for background modeling, and a Gaussian Mixture Model (GMM) is initialized for the prior implanted area (the specific area in the background of the reference frame that is used to implant the advertisement, namely the implantation area).
  • The implanted area in the initial frame of the business opportunity satisfies the condition that it is not blocked by the foreground. Therefore, when the model is initialized, the pixel distribution of the implanted area can be fully learned.
  • The Gaussian mixture model represents the color value of each pixel using K modes (in some embodiments, a mode may also be called a Gaussian mode, a Gaussian component, or a sub-model), where K is usually between 3 and 5.
  • the Gaussian mixture model represents the color value X presented by the pixel as a random variable, and the color value of the pixel in each frame of the video is the sampling value of the random variable X.
  • the color value of each pixel in the scene can be represented by a mixed distribution composed of K Gaussian components, that is, the probability that the pixel j in the image takes the value x j at time t is:
  • represents the Gaussian probability density function
  • d is the dimension of x j .
  • the covariance matrix is defined as:
  • represents the standard deviation
  • I represents the identity matrix
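For reference, the conventional form of a per-pixel Gaussian mixture density that is consistent with the definitions above can be written as follows. This is a sketch under assumptions: the symbols ω (mode weight), μ (mode mean), σ (mode standard deviation), and η (Gaussian density) are assumed notation and are not taken from the source text.

```latex
P(x_j) = \sum_{i=1}^{K} \omega_{j,t}^{i}\,
         \eta\!\left(x_j;\ \mu_{j,t}^{i},\ \Sigma_{j,t}^{i}\right)

\eta(x;\mu,\Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
         \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)

\Sigma_{j,t}^{i} = \left(\sigma_{j,t}^{i}\right)^{2} I
```

With an isotropic covariance of this kind, each mode is described by one mean vector and one scalar standard deviation, which is what the matching test on standard deviations later in the text relies on.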
  • the initialization of the Gaussian mixture model may be the initialization of various parameters.
  • One initialization method is as follows: in the initialization phase, if a high initialization speed for the Gaussian mixture parameters is not required, then, given that each color channel of a pixel ranges over [0, 255], the K Gaussian components can be initialized directly:
  • the mean is the color value of the pixel;
  • the variance is a preset empirical value.
  • Another initialization method is to initialize, in the first frame of the image, only the first Gaussian component corresponding to each pixel: its mean is set to the color value of the current pixel and its weight is set to 1.
  • the means and weights of the Gaussian components other than the first are initialized to zero.
  • the variance is a preset empirical value.
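The second initialization method can be sketched as follows. This is a minimal illustration for a single-channel pixel; the function name `init_pixel_gmm`, the dictionary layout, and the default `init_std=30.0` are hypothetical choices, since the source only says the variance is a preset empirical value.

```python
def init_pixel_gmm(color, k=3, init_std=30.0):
    """Initialize K modes for one pixel (single channel, illustrative).

    The first mode takes the pixel's color as its mean and weight 1;
    every other mode starts with mean 0 and weight 0.
    init_std is an assumed stand-in for the "preset empirical value".
    """
    modes = [{"weight": 1.0, "mean": float(color), "std": init_std}]
    modes += [{"weight": 0.0, "mean": 0.0, "std": init_std}
              for _ in range(k - 1)]
    return modes


modes = init_pixel_gmm(120, k=3)
```

A full model would hold one such list per pixel of the implantation area, and per color channel or with vector-valued means for RGB input.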
  • For a video formed by using a static lens, step S6041 may be implemented as follows:
  • For the implanted area of each subsequent frame after the initial frame of the business opportunity, compare the RGB color space distribution of that implanted area with that of the initial implanted area (i.e., the implanted area of the initial frame), and determine whether there is occlusion based on the difference in RGB distribution. That is, determine whether the advertisement implanted in the implantation area of the initial frame of the business opportunity would block foreground that appears in the implantation area in subsequent frames, for example the "baby style" text in FIG. 8B. If the difference in RGB distribution satisfies the difference condition, the background of the implanted area is considered to be occluded by the foreground.
  • Whether the difference in RGB distribution satisfies the difference condition can be judged by comparing histogram distributions. For example, the gray levels 0-255 can be divided into 16 intervals, and the distribution of the pixels of each frame over the 16 intervals can be counted and compared. If the difference between the histogram distribution of the implanted area in the subsequent frame and that of the initial implanted area exceeds a certain threshold, the difference in RGB distribution satisfies the difference condition, and the background of the implanted area in the subsequent frame is considered to be occluded by the foreground.
  • Conversely, if the difference between the histogram distribution of the implanted area in the subsequent frame and that of the initial implanted area does not exceed the threshold, the difference in RGB distribution does not satisfy the difference condition, and the background of the implanted area in the subsequent frame is considered not to be occluded by the foreground.
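The histogram comparison described above can be sketched as follows for one color channel. The function names, the normalized absolute-difference measure, and the `threshold=0.2` default are assumptions for illustration; the source does not specify how the histogram difference is measured or what threshold is used.

```python
def histogram(values, bins=16, max_value=256):
    """Count how many integer values in 0..255 fall into each interval."""
    counts = [0] * bins
    width = max_value // bins
    for v in values:
        counts[min(v // width, bins - 1)] += 1
    return counts


def is_occluded(ref_values, cur_values, threshold=0.2, bins=16):
    """Return True if the two histograms differ by more than the threshold.

    The difference is the summed absolute bin difference divided by twice
    the pixel count, so it lies in [0, 1] when both regions have the same
    number of pixels (an assumed normalization).
    """
    ref_hist = histogram(ref_values, bins)
    cur_hist = histogram(cur_values, bins)
    diff = sum(abs(r - c) for r, c in zip(ref_hist, cur_hist))
    return diff / (2.0 * max(len(ref_values), 1)) > threshold


reference = [200] * 100               # plain bright wall
similar = [205] * 100                 # same wall, slightly different values
occluded = [200] * 50 + [20] * 50     # half the area covered by a dark object
```

For RGB input the same check would be run per channel, or on a joint histogram, before deciding whether the learning rate should be frozen.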
  • If there is occlusion, the updated learning rate is set to 0 (that is, the weights of the modes in the model are not updated with the subsequent frame); if there is no occlusion, the original learning rate can be maintained.
  • step S6042 may be implemented as follows:
  • Because HSV can reflect changes in illumination, if the illumination of the background changes, the learning rate can be adjusted to -1 to increase the weight of the modes that conform to the new illumination, so that the new illumination is not recognized as foreground; if there is no illumination change, the original learning rate can be maintained.
  • step S6043 may be implemented by identifying the pixel type of each pixel in the implanted area of the subsequent frame, updating the model, and then determining the mask.
  • For each pixel, the color value X_t in the subsequent frame is compared with the pixel's current K modes (i.e., K Gaussian components). If the deviation from the mean of at least one mode is within 2.5σ of that mode (that is, 2.5 times its standard deviation), the mode is considered to match the pixel, and the pixel belongs to the background of the video; if no mode matches, the pixel belongs to the foreground.
  • After each pixel is determined to be foreground or background, the mask is determined and then improved by morphology.
  • If a pixel belongs to the background of the video, the corresponding value of the pixel in the mask is 1; if the pixel belongs to the foreground of the video, the corresponding value of the pixel in the mask is 0.
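The matching rule and the resulting binary mask can be sketched as follows for single-channel pixels. The names `matches_background` and `build_mask` and the dictionary layout of a mode are hypothetical; only the 2.5-standard-deviation test and the 1/0 mask values come from the text.

```python
def matches_background(value, modes, k_sigma=2.5):
    """True if the value lies within k_sigma standard deviations of any mode mean."""
    return any(abs(value - m["mean"]) <= k_sigma * m["std"] for m in modes)


def build_mask(region, models):
    """Build a mask with 1 for background pixels and 0 for foreground pixels.

    region: 2D list of pixel values in the implantation area.
    models: 2D list of the same shape; each entry is that pixel's list of modes.
    """
    return [
        [1 if matches_background(v, modes) else 0
         for v, modes in zip(row, model_row)]
        for row, model_row in zip(region, models)
    ]


wall = [{"mean": 100.0, "std": 4.0}]          # one background mode per pixel
mask = build_mask([[105, 140]], [[wall, wall]])
```

Here the first pixel deviates by 5 (within 2.5 × 4 = 10) and is treated as background, while the second deviates by 40 and is treated as foreground.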
  • Morphological improvement of the mask is mainly used to repair errors made by the modes when judging foreground and occlusion, including eliminating holes in the mask and connecting breaks, so that noise does not appear in the video foreground exposed after the occlusion processing.
  • FIG. 9A and FIG. 9B are schematic diagrams of using morphology to improve the mask. As shown in FIG. 9A, the holes in the white area in 901 can be eliminated by morphology, forming a completely connected area as shown in 902. As shown in FIG. 9B, the breaks in 911 can likewise be connected by morphology, forming a complete connected area as shown in 912.
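One common way to remove small holes from a binary mask is morphological closing (dilation followed by erosion). The source does not name the specific operator, so the sketch below is an assumption: it uses a 3x3 window, clips the window at the image border, and uses hypothetical function names.

```python
def _window(mask, y, x):
    """Values in the 3x3 window around (y, x), clipped at the border."""
    h, w = len(mask), len(mask[0])
    return [mask[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its window is 1."""
    return [[max(_window(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def erode(mask):
    """A pixel stays 1 only if every pixel in its window is 1."""
    return [[min(_window(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def close_mask(mask):
    """Morphological closing: fill small holes, then restore the outline."""
    return erode(dilate(mask))


mask = [[1] * 5 for _ in range(5)]
mask[2][2] = 0                      # a one-pixel hole in the white area
closed = close_mask(mask)
```

In practice an image library would normally be used for this step; the pure-Python version is only meant to make the operation explicit.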
  • Updating the model may be updating the weight of the model according to the updated learning rate.
  • the mean and standard deviation of a pixel's unmatched modes remain unchanged, and only the mean and standard deviation of the matched mode are updated. If no mode matches the pixel, a new mode is initialized based on the pixel and replaces the mode with the smallest weight; the modes are arranged in descending order of ω/α², so that modes with the highest weight and the smallest standard deviation come first.
  • ω is the weight and α is the learning rate.
  • if x_j matches the i-th mode, the i-th mode is updated with x_j, and the remaining modes are left unchanged.
  • the update involves two rates:
  • the learning rate of the model;
  • the learning rate of the parameters, which reflects the convergence speed of the model parameters.
  • the weight of each mode needs to be normalized.
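The update rules themselves are not reproduced in the text. As a hedged sketch, the block below uses the conventional online update for a Gaussian mixture background model on a single-channel pixel; using one rate `alpha` for both the weights and the parameters, and the `init_std` and `init_weight` values for a replacement mode, are simplifying assumptions rather than the source's formulas.

```python
def update_pixel_gmm(modes, x, alpha=0.05, k_sigma=2.5,
                     init_std=30.0, init_weight=0.05):
    """Update one pixel's modes with a new value x; return True on a match."""
    matched = None
    for i, m in enumerate(modes):
        if abs(x - m["mean"]) <= k_sigma * m["std"]:
            matched = i
            break

    if matched is None:
        # No mode matches: replace the mode with the smallest weight.
        weakest = min(range(len(modes)), key=lambda i: modes[i]["weight"])
        modes[weakest] = {"weight": init_weight, "mean": float(x),
                          "std": init_std}
    else:
        # Raise the matched mode's weight and decay the others.
        for i, m in enumerate(modes):
            hit = 1.0 if i == matched else 0.0
            m["weight"] = (1.0 - alpha) * m["weight"] + alpha * hit
        # Only the matched mode's mean and standard deviation change.
        m = modes[matched]
        m["mean"] = (1.0 - alpha) * m["mean"] + alpha * x
        var = (1.0 - alpha) * m["std"] ** 2 + alpha * (x - m["mean"]) ** 2
        m["std"] = var ** 0.5

    # Normalize the weights so that they sum to 1.
    total = sum(m["weight"] for m in modes)
    for m in modes:
        m["weight"] /= total
    return matched is not None


modes = [{"weight": 0.7, "mean": 100.0, "std": 10.0},
         {"weight": 0.3, "mean": 200.0, "std": 10.0}]
was_background = update_pixel_gmm(modes, 104, alpha=0.1)
```

Setting `alpha` to 0 leaves the weights of the existing modes untouched on a match, which corresponds to freezing the model while the area is occluded.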
  • after the parameter update, to determine which modes in the pixel's Gaussian mixture model are generated by the background, the modes are sorted in descending order of ω/α² and the first B modes are selected as the distribution of the background; B satisfies the following formula, in which the parameter Q represents the proportion of the background;
  • the larger one indicates that the pixel value has a larger variance and a higher probability of occurrence, which exactly reflects the characteristics of the background pixel value of the scene.
  • FIG. 8A and FIG. 8B are schematic diagrams of the effect of embedding information into a video formed by using a static lens according to an embodiment of the application.
  • the image shown in FIG. 8A may be a frame that precedes the image shown in FIG. 8B (that is, a frame before the "baby style" text appears in the video).
  • as shown in FIG. 8A, the wall area 801 in the image frame does not show the "baby style" text. If the wall area is used as the advertisement placement area, the foreground "baby style" text appears in a subsequent frame, namely the image frame shown in FIG. 8B.
  • If the embedded advertisement were overlaid directly as a layer, the "baby style" text would be obscured.
  • With the solution of this embodiment, the "baby style" text floats above the advertisement; that is, the embedded advertisement 811 does not block the foreground content of the video, which preserves the integrity of the original video's foreground content at the advertisement placement location.
  • step S604 For a video formed by a moving lens, when step S604 is implemented, the following steps need to be performed before step S6041:
  • Step 71: track the subsequent frames that include the implanted area.
  • Template matching is performed with a feature tracking technique (the template consists of feature points, such as feature points found with the ORB method), or the SIFT method is used, to track the subsequent frames that include the implanted area.
  • The implanted area (that is, the background area used for implanting information) is tracked to generate the homography matrix H.
  • Since background modeling models each pixel, the pixel positions of the implanted area in the initial frame of the business opportunity (the reference frame) and in the subsequent frames must correspond one to one; if the camera moves, the pixel positions of the implanted area in the initial frame of the business opportunity and in the current frame no longer correspond, so the implanted area of the current frame is transformed with H.
  • x t , y t represent a pixel in the current frame
  • x 0 , y 0 represent the pixel corresponding to the pixel in the initial frame of the business opportunity.
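Mapping a pixel position through a homography can be sketched as follows. The function name is hypothetical, and the direction of H (current frame to reference frame) is an assumption based on the surrounding description; the matrices are toy values, not estimated from real frames.

```python
def apply_homography(H, x, y):
    """Map (x, y) through the 3x3 matrix H using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w


# Toy case: the camera has panned, so the area shifted by (+5, -3) pixels.
H = [[1.0, 0.0, -5.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.0, 1.0]]
H_inv = [[1.0, 0.0, 5.0],
         [0.0, 1.0, -3.0],
         [0.0, 0.0, 1.0]]

x0, y0 = apply_homography(H, 10.0, 10.0)      # current frame -> reference frame
xt, yt = apply_homography(H_inv, x0, y0)      # reference frame -> current frame
```

The forward mapping aligns the current frame's implanted area with the per-pixel models built on the reference frame; the inverse mapping is what moves a mask computed in reference coordinates back onto the current frame.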
  • the implementation of steps S6041 and S6042 is similar to that of steps S6041 and S6042 for a video formed by a static lens, and can be understood with reference to the description of similar steps above.
  • in step S6043, it is likewise necessary to identify the pixel type of each pixel in the implanted area of the subsequent frame, so as to update the model and determine the mask.
  • the mask is inversely transformed to the position of the subsequent frame using the homography matrix H, as shown in formula (3-11):
  • the advertisement is implanted in the implantation area of the subsequent frame; for image frames judged to be occluded, the corresponding mask is applied to the implantation area; the video is then encapsulated.
  • FIG. 8C and FIG. 8D are schematic diagrams of the effect of embedding information in a video formed by a dynamic lens according to an embodiment of the application.
  • FIG. 8C is a frame in which the character has not yet appeared. If the ground is used as the advertisement placement area 821 at this time, the image frame after the advertisement placement is as shown in FIG. 8C. In subsequent frames, if the embedded advertisement "Hello Qin Pro" were overlaid directly as a layer, the legs of the character appearing in the area would be blocked. After applying the solution for embedding information in a video provided by this embodiment, as shown in FIG. 8D, the legs of the character are displayed on top of the implanted advertisement, so that the advertisement implantation area 831 does not block the foreground of the video.
  • the software module in the apparatus 240 may include:
  • the model construction module 241 is configured to construct a model that conforms to the pixel distribution characteristics of the implanted area in the reference frame, and control the update of the model based on the subsequent to-be-detected frames of the reference frame;
  • the template generation module 242 is configured to identify the background and foreground of the implanted region in the frame to be detected based on the model, and generate a template for occluding the background and revealing the foreground;
  • the template application module 243 is configured to apply the information to be implanted to the template, so as to shield the content in the information to be implanted that would obstruct the foreground;
  • the information covering module 244 is used for covering the information to be implanted after applying the template to the implantation area in the frame to be detected, so that the foreground is highlighted relative to the information to be implanted.
  • the device further includes:
  • the parameter initialization module is configured to correspond to each pixel of the implanted area in the reference frame, and initialize at least one sub-model corresponding to the pixel and the weight corresponding to the at least one sub-model;
  • the weight mixing module is used to mix the sub-models constructed corresponding to each pixel based on the initialized weights to form a model corresponding to the pixel.
  • the device further includes:
  • the weight holding module is configured to, in response to the implanted area in the frame to be detected being blocked by the foreground, reduce the rate at which the model is fitted to the implanted area in the frame to be detected, the weights of the sub-models in the model remaining unchanged;
  • the fitting acceleration module is configured to, in response to the implanted area in the frame to be detected not being blocked by the foreground and the illumination of the implanted area in the frame to be detected having changed, increase the rate at which the model is fitted to the implanted area in the frame to be detected.
  • the device further includes:
  • the parameter update module is configured to, in response to a pixel of the implanted area in the frame to be detected matching at least one sub-model in the corresponding model, update the parameters of the matched sub-model and keep the parameters of the unmatched sub-models in the corresponding model unchanged.
  • the device further includes:
  • the first matching module is configured to match the color value of each pixel of the implanted area in the frame to be detected with the sub-model in the model corresponding to the pixel;
  • the recognition module is used for recognizing the pixels that are successfully matched as the pixels of the background, and the pixels that are not matched as the pixels of the foreground.
  • the device further includes:
  • the filling module is configured to, for the pixels identified as background in the implanted area in the frame to be detected, fill binary ones in the corresponding positions in the empty template, and
  • for the pixels identified as foreground in the implanted area in the frame to be detected, fill binary zeros in the corresponding positions in the template filled with binary ones.
  • the device further includes:
  • the arithmetic module is used to multiply the information to be implanted with the binary number filled in each position in the template.
  • the device further includes:
  • the second matching module is configured to match the features extracted from the implanted region in the reference frame of the video with the features extracted from the frame to be detected in response to the video being formed using a motion lens;
  • the area determining module is configured to determine that the frame to be detected includes the implanted area corresponding to the implanted area in the reference frame in response to the successful matching.
  • the device further includes:
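  • the area transformation module is configured to, in response to the video being formed with a motion lens and before the update of the model is controlled based on the subsequent to-be-detected frames of the reference frame, transform the implanted area in the frame to be detected so that the position of each pixel in the transformed implanted area is consistent with the position of the corresponding pixel in the implanted area in the reference frame;
  • the template inverse transformation module is configured to, before the template is applied to the information to be implanted, apply the inverse of the transformation to the template, so that the position of each binary number in the transformed template is consistent with the position of the corresponding pixel in the implanted area in the frame to be detected.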
  • the device further includes:
  • the region positioning module is configured to, in response to the video being formed by using a static lens, locate the region at the corresponding position in the frame to be detected based on the position of the implanted region in the reference frame, so as to determine the implanted region in the frame to be detected.
  • the device further includes:
  • the first determining module is configured to determine that the implanted area in the frame to be detected is blocked by the foreground, in response to the first color space distribution of the implanted area in the frame to be detected and the first color space distribution of the implanted area in the reference frame satisfying a first difference condition;
  • the second determining module is configured to determine that the illumination of the implanted area in the frame to be detected has changed, in response to the second color space distribution of the implanted area in the frame to be detected and the second color space distribution of the implanted area in the reference frame satisfying a second difference condition.
  • the method provided by the embodiment of the application can be directly executed by the processor 410 in the form of a hardware decoding processor, for example, by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), programmable logic devices (PLD), complex programmable logic devices (CPLD), or field-programmable gate arrays (FPGA).
  • the embodiment of the present application provides a storage medium storing executable instructions.
  • When the executable instructions are executed by a processor, they cause the processor to execute the method provided in the embodiments of the present application, for example, the methods shown in FIG. 3 to FIG. 6.
  • the storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disk, or a CD-ROM; it may also be any of various devices including one or any combination of the foregoing memories.
  • executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they can be deployed in any form, including as an independent program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • executable instructions may, but do not necessarily, correspond to files in a file system. They may be stored as part of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (for example, files storing one or more modules, subroutines, or code parts).
  • executable instructions can be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices that are distributed across multiple sites and interconnected by a communication network.
  • the embodiment of the application can construct a model based on the pixel distribution characteristics of the implanted area in the reference frame, perform occlusion detection on the implanted area in the frame to be detected, and update the model parameters based on the occlusion detection result.
  • the implanted area of the frame to be detected is fitted to the background pixel distribution of the implanted area in the reference frame, so that the implanted information can be better integrated into the background of the video without obstructing the foreground, thereby bringing a better viewing experience.
  • the feature points are used to determine the implantation area, and the pixels of the implantation area in the frame to be detected are mapped by a transformation to positions consistent with the reference frame, without the need for motion tracking, which gives better real-time performance and greater robustness.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

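This application provides a method for implanting information in a video, a computer device, and a storage medium. The method includes: constructing a model that conforms to the pixel distribution characteristics of the implantation area in a reference frame, and controlling the update of the model based on the frames to be detected that follow the reference frame; identifying the background and the foreground of the implantation area in the frame to be detected based on the model, and generating a template for occluding the background and exposing the foreground; applying the template to the information to be implanted, so as to mask the content of the information to be implanted that would occlude the foreground; and overlaying the information to be implanted, after the template has been applied, onto the implantation area in the frame to be detected, so that the foreground is highlighted relative to the information to be implanted. With this application, information can be efficiently integrated into the background content of a video.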

Description

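Method for implanting information in a video, computer device, and storage medium
This application claims priority to Chinese patent application No. 201910385878.4, filed on May 9, 2019 and entitled "Method, apparatus, and storage medium for implanting information in a video", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to graphics and image technology, and in particular to a method for implanting information in a video, a computer device, and a storage medium.
Background
Video is currently a mainstream information carrier. With the development of the Internet, and the mobile Internet in particular, the speed at which video spreads has risen rapidly, making video an important channel for disseminating information. Implanting information in a video means superimposing various kinds of information, such as promotional information including images, text, or a combination of the two, on the background of the video without interfering with viewing of the video's main content (for example, the foreground content).
The main content of a video (for example, the people in the video, or special effects added in post-production) is presented as foreground content. So that users can always see the main content when the video is played, the information needs to be integrated into the background content of the video. The related art lacks an effective solution.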
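Summary
The embodiments of this application provide a method for implanting information in a video, a computer device, and a storage medium, which can efficiently integrate information into the background content of a video.
The technical solutions of the embodiments of this application are implemented as follows:
An embodiment of this application provides a method for implanting information in a video, including:
constructing a model that conforms to the pixel distribution characteristics of the implantation area in a reference frame, and controlling the update of the model based on the frames to be detected that follow the reference frame;
identifying the background and the foreground of the implantation area in the frame to be detected based on the model, and generating a template for occluding the background and exposing the foreground;
applying the template to the information to be implanted, so as to mask the content of the information to be implanted that would occlude the foreground;
overlaying the information to be implanted, after the template has been applied, onto the implantation area in the frame to be detected, so that the foreground is highlighted relative to the information to be implanted.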
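An embodiment of this application provides an apparatus for implanting information in a video, including:
a model construction module, configured to construct a model that conforms to the pixel distribution characteristics of the implantation area in a reference frame, and to control the update of the model based on the frames to be detected that follow the reference frame;
a template generation module, configured to identify the background and the foreground of the implantation area in the frame to be detected based on the model, and to generate a template for occluding the background and exposing the foreground;
a template application module, configured to apply the template to the information to be implanted, so as to mask the content of the information to be implanted that would occlude the foreground;
an information covering module, configured to overlay the information to be implanted, after the template has been applied, onto the implantation area in the frame to be detected, so that the foreground is highlighted relative to the information to be implanted.
In a possible implementation, the apparatus further includes:
a parameter initialization module, configured to initialize, for each pixel of the implantation area in the reference frame, at least one sub-model corresponding to the pixel and the weight corresponding to the at least one sub-model;
a weight mixing module, configured to mix the sub-models constructed for each pixel based on the initialized weights, so as to form the model corresponding to the pixel.
In a possible implementation, the apparatus further includes:
a weight holding module, configured to reduce, in response to the implantation area in the frame to be detected being occluded by the foreground, the rate at which the model is fitted to the implantation area in the frame to be detected;
a fitting acceleration module, configured to increase, in response to the implantation area in the frame to be detected not being occluded by the foreground and the illumination of the implantation area in the frame to be detected having changed, the rate at which the model is fitted to the implantation area in the frame to be detected.
In a possible implementation, the apparatus further includes:
a parameter update module, configured to, in response to a pixel of the implantation area in the frame to be detected matching at least one sub-model in the corresponding model, update the parameters of the matched sub-model and keep the parameters of the unmatched sub-models in the corresponding model unchanged.
In a possible implementation, the apparatus further includes:
a first matching module, configured to match the color value of each pixel of the implantation area in the frame to be detected against the sub-models in the model corresponding to the pixel;
a recognition module, configured to identify the pixels that match successfully as pixels of the background and the pixels that fail to match as pixels of the foreground.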
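In a possible implementation, the apparatus further includes:
a filling module, configured to fill, for the pixels identified as background in the implantation area in the frame to be detected, binary ones in the corresponding positions in the empty template, and
to fill, for the pixels identified as foreground in the implantation area in the frame to be detected, binary zeros in the corresponding positions in the template filled with binary ones.
In a possible implementation, the apparatus further includes:
an arithmetic module, configured to multiply the information to be implanted by the binary number filled in each position in the template.
In a possible implementation, the apparatus further includes:
a second matching module, configured to match, in response to the video being formed with a motion lens, the features extracted from the implantation area in the reference frame of the video against the features extracted from the frame to be detected;
an area determining module, configured to determine, in response to a successful match, that the frame to be detected includes an implantation area corresponding to the implantation area in the reference frame.
In a possible implementation, the apparatus further includes:
an area transformation module, configured to, in response to the video being formed with a motion lens,
transform the implantation area in the frame to be detected before the update of the model is controlled based on the frames to be detected that follow the reference frame, so that the position of each pixel in the transformed implantation area is consistent with the position of the corresponding pixel in the implantation area in the reference frame;
a template inverse transformation module, configured to apply the inverse of the transformation to the template before the template is applied to the information to be implanted, so that the position of each binary number in the transformed template is consistent with the position of the corresponding pixel in the implantation area in the frame to be detected.
In a possible implementation, the apparatus further includes:
an area positioning module, configured to locate, in response to the video being formed with a static lens, the area at the corresponding position in the frame to be detected based on the position of the implantation area in the reference frame, so as to determine the implantation area in the frame to be detected.
In a possible implementation, the apparatus further includes:
a first determining module, configured to determine that the implantation area in the frame to be detected is occluded by the foreground, in response to the first color space distribution of the implantation area in the frame to be detected and the first color space distribution of the implantation area in the reference frame satisfying a first difference condition;
a second determining module, configured to determine that the illumination of the implantation area in the frame to be detected has changed, in response to the second color space distribution of the implantation area in the frame to be detected and the second color space distribution of the implantation area in the reference frame satisfying a second difference condition.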
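An embodiment of this application provides a computer device, including:
a memory, configured to store executable instructions;
a processor, configured to implement the method provided by the embodiments of this application when executing the executable instructions stored in the memory.
An embodiment of this application provides a storage medium storing executable instructions which, when executed by a processor, implement the method provided by the embodiments of this application.
An embodiment of this application provides a computer program product storing a computer program which, when loaded and executed by a processor, implements the method provided by the embodiments of this application.
The embodiments of this application have the following beneficial effects:
A model is constructed based on the pixel distribution characteristics of the implantation area in the reference frame, the background and the foreground of the implantation area in the frame to be detected are identified according to the model, and a template that occludes the background and exposes the foreground is generated. After the template is applied to the information to be implanted, the content of the implanted information that would occlude the foreground can be filtered out. This ensures that the information implanted in the video does not occlude the foreground of the video, so that the implanted information blends better into the background of the video and the viewing experience is improved.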
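Brief Description of the Drawings
FIG. 1A is a schematic diagram of processing an image with a mask according to an embodiment of this application;
FIG. 1B is a schematic diagram of an application scenario provided by an embodiment of this application;
FIG. 2 is an optional schematic structural diagram of the apparatus provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of an implementation of the method for implanting information in a video according to an embodiment of this application;
FIG. 4 is a schematic flowchart of constructing and updating the model in an embodiment of this application;
FIG. 5 is a schematic flowchart of another implementation of the method for implanting information in a video according to an embodiment of this application;
FIG. 6 is a schematic flowchart of yet another implementation of the method for implanting information in a video according to an embodiment of this application;
FIG. 7 is a schematic diagram of yet another implementation process of implanting information in a video according to an embodiment of this application;
FIG. 8A is a schematic diagram of the effect of implanting information in a video formed with a static lens according to an embodiment of this application;
FIG. 8B is another schematic diagram of the effect of implanting information in a video formed with a static lens according to an embodiment of this application;
FIG. 8C is a schematic diagram of the effect of implanting information in a video formed with a dynamic lens according to an embodiment of this application;
FIG. 8D is another schematic diagram of the effect of implanting information in a video formed with a dynamic lens according to an embodiment of this application;
FIG. 9A is a schematic diagram of improving a mask with morphology according to an embodiment of this application;
FIG. 9B is another schematic diagram of improving a mask with morphology according to an embodiment of this application.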
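Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting this application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.
In the following description, "some embodiments" describes a subset of all possible embodiments; it can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where no conflict arises.
In the following description, the terms "first\second\third" merely distinguish similar objects and do not denote a particular ordering of the objects. It can be understood that, where permitted, "first\second\third" may be interchanged in a specific order or sequence, so that the embodiments of this application described here can be implemented in an order other than that illustrated or described here.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit this application.
Before the embodiments of this application are described in further detail, the nouns and terms involved in the embodiments are explained; the following explanations apply to them.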
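1) Mask, also called a filter or template: an image used to mask (some or all of) the pixels in an image to be processed, so that part of a specific image is highlighted.
A mask may be a two-dimensional matrix array; multi-valued images are sometimes used as well. An image mask is mainly used to mask certain regions of an image. Each pixel in the original image is ANDed with the binary number (also called the mask code) at the same position in the mask, for example 1&1=1 and 1&0=0.
For example, a 3*3 image as shown at 101 in FIG. 1A is operated on with a 3*3 mask as shown at 102 in FIG. 1A, yielding the result image shown at 103 in FIG. 1A.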
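The masking operation can be illustrated with a small sketch. The pixel and mask values below are hypothetical and are not the values shown in FIG. 1A; the function name `apply_mask` is likewise an assumption.

```python
def apply_mask(image, mask):
    """Keep a pixel where the mask holds 1 and zero it where the mask holds 0."""
    return [
        [pixel if bit == 1 else 0 for pixel, bit in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = [[1, 0, 1],
        [0, 1, 0],
        [1, 0, 1]]
result = apply_mask(image, mask)
```

For a mask that contains only 0 and 1, this is equivalent to multiplying each pixel by the mask value at the same position.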
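2) Static lens, i.e., fixed shot (Fixed Shot, FS): a shot in which the camera position, the optical axis of the lens, and the focal length are all fixed. The objects in a static-lens video (real objects such as people, virtual objects such as animated characters) may be static or dynamic (entering and leaving the picture), but the frame to which the picture is attached does not move; that is, the picture range and the field-of-view area remain constant.
3) Motion lens: a shot captured using various kinds of motion (for example, changes in camera position, optical axis, or focal length). The frame to which the picture of a motion-lens video is attached may change; that is, the picture range and the field-of-view area may change, for example in imaging distance, size, and angle.
4) Background: the scenery behind the subject in the video picture, which can show the spatio-temporal environment in which a person or event is located, for example the buildings, walls, or ground behind a person.
5) Foreground: the content in the video picture that is closer to the lens than the background; it is the subject presented by the video, for example a person standing in front of a building.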
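To better understand the method for implanting information in a video provided in the embodiments of this application, the solutions in the related art to the occlusion problem of implanting information in a video are first analyzed.
Background subtraction: a fixed threshold is set manually, the new region of the video that includes potential foreground is subtracted from the original background region, and the result is compared with the threshold to determine whether the background is occluded by the foreground, so as to form a mask for the occluded part. In this solution the distinction between foreground and background depends on a manually chosen threshold, so the degree of automation is low and frequent adjustment is required; when the foreground and background are similar in color, the subtraction between them is incomplete and accuracy is low.
Gaussian mixture background modeling for static lenses: for a static lens, a background without occlusion is selected for modeling, and the model is used on subsequent image frames to judge whether the background is occluded by the foreground, so as to form a mask for the occluded part. This solution can only be used for fixed-lens videos; for a motion-lens video, the background is easily identified as foreground, so accuracy is likewise low.
Trajectory classification: target points of interest are marked in the initial frame, a motion tracking model is used to obtain the trajectories of feature points in the implanted information, and foreground and background are distinguished based on the trajectories. This solution is sensitive to noise in the image frames, and its accuracy depends on the motion tracking model. If the chosen motion tracking model is unsuitable, the accuracy of distinguishing foreground from background is greatly affected.
To address the technical problems in the above solutions, the embodiments of this application provide a method for implanting information in a video that models by combining the video sequence with full-pixel statistics. For a static-lens video, the background for modeling is selected automatically, the learning rate is updated automatically on subsequent frames to optimize the model, and statistical features are used to determine whether occlusion exists and to form the mask. For a motion-lens video, a transformation maps each frame onto the standard picture of the reference frame for pixel-statistics modeling, and the result is then mapped back to the picture of the subsequent frame to obtain the occlusion mask, without the need for a motion tracking model. The method offers good real-time performance, a wide range of application, strong robustness, and automatic, efficient use.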
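The following describes exemplary applications of the apparatus that implements the embodiments of this application. The apparatus may be implemented as a mobile terminal with wireless communication capability, such as a mobile phone, tablet computer, or notebook computer, or as a computing device that is not easily moved, such as a desktop computer. The apparatus may also be implemented as a server; the server may be a single server, a server cluster composed of multiple servers, a cloud computing center, or the like, which is not limited here.
Referring to FIG. 1B, FIG. 1B is a schematic diagram of an application scenario provided by an embodiment of this application. To support an exemplary application, the terminal 400 is connected to the server 200 through the network 300. The network 300 may be a wide area network, a local area network, or a combination of the two, and uses wireless links for data transmission.
When the terminal 400 wants to implant information in a video (the information may be an advertisement, and the video may be one recorded with the terminal), the terminal 400 may send the video and the information to be implanted to the server 200 and request the server 200 to implant the information in the video. After receiving the video and the information to be implanted, the server 200 uses the method for implanting information in a video provided by the embodiments of this application to add the information to be implanted to each frame of the video and encapsulates the result to obtain an encapsulated video file, which it finally sends to the terminal 400. The terminal 400 may then publish the video with the implanted advertisement.
In some embodiments, after the terminal 400 has recorded the video and determined the information to be implanted, the terminal 400 itself may use the method for implanting information in a video provided by the embodiments of this application to add the information to be implanted to each frame of the video, encapsulate the result to obtain a video file, and then publish the video with the implanted advertisement through a video-viewing app. It should be noted that, to reduce the terminal's computational load and improve implantation efficiency, a terminal generally performs information implantation itself only for relatively short videos.
In some embodiments, when the terminal 400 wants to implant information in a video and the video is stored in the server 200, the terminal 400 may send the information to be implanted and the identification information of the video to the server 200, to request the server 200 to add the information to be implanted to the video corresponding to the identification information. The server 200 determines the corresponding video file based on the identification information, implants the information to be implanted into the video file, and finally encapsulates it to obtain an encapsulated video file, which it then sends to the terminal 400.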
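The apparatus provided by the embodiments of this application may be implemented in hardware or in a combination of software and hardware. Various exemplary implementations of the apparatus are described below.
Referring to FIG. 2, FIG. 2 is an optional schematic structural diagram of the server 200 provided by an embodiment of this application. The server 200 may be a desktop server, or a server cluster composed of multiple servers, a cloud computing center, or the like. From the structure of the server 200, the exemplary structure of the apparatus when implemented as a server can be foreseen; the structure described here should therefore not be regarded as limiting. For example, some of the components described below may be omitted, or components not described below may be added to meet the special requirements of certain applications.
The server 200 shown in FIG. 2 includes at least one processor 210, a memory 240, at least one network interface 220, and a user interface 230. The components of the server 200 are coupled together through a bus system 250. It can be understood that the bus system 250 is used to implement connection and communication among these components. In addition to a data bus, the bus system 250 includes a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all labeled as the bus system 250 in FIG. 2.
The user interface 230 may include a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, touch screen, or the like.
The memory 240 may be volatile memory or non-volatile memory, and may also include both. The non-volatile memory may be read-only memory (ROM, Read Only Memory). The volatile memory may be random access memory (RAM, Random Access Memory). The memory 240 described in the embodiments of this application is intended to include any suitable type of memory.
The memory 240 in the embodiments of this application can store data to support the operation of the server 200. Examples of such data include any computer program to be run on the server 200, such as an operating system and application programs. The operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, used to implement various basic services and to process hardware-based tasks. The application programs may include various applications.
As an example in which the method provided by the embodiments of this application is implemented by a combination of software and hardware, the method may be directly embodied as a combination of software modules executed by the processor 210. The software modules may be located in a storage medium, and the storage medium is located in the memory 240. The processor 210 reads the executable instructions included in the software modules in the memory 240 and, in combination with the necessary hardware (for example, the processor 210 and the other components connected to the bus 250), completes the method provided by the embodiments of this application.
As an example, the processor 210 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component; the general-purpose processor may be a microprocessor or any conventional processor.
The method that implements the embodiments of this application is described below in connection with the foregoing exemplary applications and implementations of the apparatus. The method provided by the embodiments of this application is applied to an execution device, which may be a server or a terminal; that is, the method may be executed by a server or by a terminal. The server may be a desktop server, or a server cluster composed of multiple servers, a cloud computing center, or the like. The terminal may be a mobile terminal with wireless communication capability, such as a mobile phone, tablet computer, or notebook computer, or a computing device that is not easily moved, such as a desktop computer.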
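Referring to FIG. 3, FIG. 3 is a schematic flowchart of an implementation of the method for implanting information in a video according to an embodiment of this application; the description follows the steps shown in FIG. 3.
Step S101: construct a model that conforms to the pixel distribution characteristics of the implantation area in the reference frame, and control the update of the model based on the frames to be detected that follow the reference frame.
Here, the reference frame may be a frame of the image after information has been implanted, and the region where the implanted information is located is the implantation area. The reference frame and the implantation area may be set manually, or may be selected automatically using techniques such as machine learning or deep learning.
The reference frame may be an image frame of the video that at least includes the implantation area, with the information to be implanted placed in the implantation area and not occluded. For example, it may be the image frame in which the implantation area first appears in the video, with the information to be implanted placed in it and not occluded. Taking an advertisement as the information to be implanted, the reference frame may be the image frame in which a complete advertising region (for example, a specific region of a wall or the ground that is large enough to display the advertisement in full) first appears in the video.
For example, the reference frame may be an image frame in which a target object related to the information to be implanted appears, or an image frame in whose displayed subtitles a keyword related to the information to be implanted appears. If the information to be implanted is an advertisement for a certain brand of air conditioner, an image frame of the video in which an air conditioner appears may be used as the reference frame, or an image frame in which keywords such as "cold" or "hot" appear may be used as the reference frame.
The implantation area may be delimited manually, for example a region in the upper right corner of the image frame or a region in the upper middle of the image frame; it may of course also be a specific region identified automatically, such as the ground, a wall, or the sky. It should be noted that the implantation area in the reference frame must not be occluded by the foreground, so that the pixel distribution of the implantation area can be learned completely when the model is initialized.
To construct a model that conforms to the pixel distribution characteristics of the implantation area in the reference frame, a model is constructed for each pixel in the implantation area, for example a Gaussian mixture model for each pixel. In this case, step S101 is implemented by first initializing, from each pixel of the implantation area in the reference frame, the Gaussian mixture model predefined for that pixel. The model includes multiple Gaussian modes (in some embodiments, a Gaussian mode may also be called a mode, Gaussian component, or sub-model); the parameters of the Gaussian modes are initialized, and the parameters that will be used later are computed. Each pixel of the implantation area in every subsequent frame to be detected is then processed to judge whether the pixel matches a mode (i.e., a Gaussian mode). If it matches, the pixel is assigned to that mode and the weight of the mode is updated according to the new pixel value; if it does not match, a Gaussian mode is created from the pixel, its parameters are initialized, and it replaces the mode with the smallest weight among the existing modes.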
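Step S102: identify the background and the foreground of the implantation area in the frame to be detected based on the model, and generate a template for occluding the background and exposing the foreground.
Here, step S102 may be implemented by matching each pixel of the implantation area in the frame to be detected, in turn, against the modes in the corresponding model. If a pixel has a matching mode, the pixel is considered a background pixel; if no mode matches the pixel, the pixel is considered a foreground pixel.
After each pixel of the implantation area has been identified as foreground or background, the identification result can be used to generate the template for occluding the background and exposing the foreground. Specifically, when a pixel is identified as background, the value corresponding to the pixel in the template may be set to 1; if the pixel is identified as foreground, the value corresponding to the pixel in the template is set to 0. It should be noted that the 0 and 1 here are binary numbers; that is, the template is a mask composed of binary 0s and 1s.
Step S103: apply the template to the information to be implanted, so as to mask the content of the information to be implanted that would occlude the foreground.
Here, step S103 may be implemented by multiplying the information to be implanted by the template. In this embodiment and the other embodiments, multiplying the information to be implanted by the template may mean multiplying the information to be implanted by the binary number filled in each position in the template. One implementation is to multiply each pixel of the information to be implanted by the binary number at the corresponding position in the template. Because the value corresponding to a background pixel in the template is 1 and the value corresponding to a foreground pixel is 0, multiplying the information to be implanted by the template masks the content of the information to be implanted that would occlude the foreground, without affecting the content that does not occlude the foreground.
Step S104: overlay the information to be implanted, after the template has been applied, onto the implantation area in the frame to be detected, so that the foreground is highlighted relative to the information to be implanted.
Here, because the template has already been applied to the information to be implanted in step S103 and the content that would occlude the foreground has been masked, overlaying the information to be implanted, after the template has been applied, onto the implantation area in the frame to be detected does not occlude the foreground of the frame to be detected, which improves the viewing experience.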
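Steps S103 and S104 can be sketched together for single-channel pixels. The multiplication by the template follows the text; how the masked information is then combined with the frame is not spelled out, so keeping the original frame pixel wherever the template holds 0 is an assumption, as is the function name `implant`.

```python
def implant(frame_region, info, template):
    """Overlay info onto the implantation area without covering the foreground.

    template value 1 (background): the information pixel is shown.
    template value 0 (foreground): the original frame pixel is kept.
    """
    out = []
    for frame_row, info_row, tmpl_row in zip(frame_region, info, template):
        out.append([i * m + f * (1 - m)
                    for f, i, m in zip(frame_row, info_row, tmpl_row)])
    return out


frame_region = [[1, 2],
                [3, 4]]      # pixels of the implantation area in the frame
info = [[9, 9],
        [9, 9]]              # information to be implanted
template = [[1, 0],
            [0, 1]]          # 1 = background, 0 = foreground
composited = implant(frame_region, info, template)
```

In this toy case the two foreground pixels (values 2 and 3) remain visible, and the information appears only over the background positions.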
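When the method provided by the embodiments of this application is used to implant information in a video, a model is first constructed for each pixel based on the pixel distribution characteristics of the implantation area in the reference frame, and the parameters of the model can be updated according to the pixels of the implantation area in the frame to be detected. A template that occludes the background without occluding the foreground is then generated from the foreground pixels and background pixels of the implantation area in the frame to be detected, the template is applied to the information to be implanted, and the information to be implanted, after the template has been applied, is finally overlaid onto the implantation area in the frame to be detected. Because the generated template occludes the background but not the foreground, applying it to the information to be implanted masks the content that would occlude the foreground, so that implanting the information in the frame to be detected does not occlude the foreground of that frame, which preserves the viewing experience of the video.
In some embodiments, referring to FIG. 4, FIG. 4 is a schematic flowchart of constructing and updating the model in an embodiment of this application; step S101 may be implemented through the steps shown in FIG. 4:
Step S1011: for each pixel of the implantation area in the reference frame, initialize at least one sub-model corresponding to the pixel and the weight corresponding to the at least one sub-model.
In the embodiments of this application, the model is constructed at the granularity of a pixel; that is, one model is constructed for each pixel, and the model of a pixel may correspond to at least one sub-model. In other words, the model of a pixel may correspond to one sub-model or to multiple sub-models.
For example, the model of a pixel may be a Gaussian mixture model that includes two or more sub-models, generally three to five. A sub-model may be a Gaussian probability distribution function, and initializing a sub-model at least means initializing its parameters, which may include the mean, variance, and weight. The parameters of a sub-model may be initialized by setting them to preset values. In the initialization process, the variance is generally set as large as possible and the weight as small as possible. This is because the initialized Gaussian model is not accurate; its range must be narrowed continually and its parameter values updated to arrive at the most probable Gaussian model. The variance is set large so that as many pixels as possible match the sub-model, yielding a model that accurately represents the distribution of the pixel's color value across the frames of the video.
In some embodiments, the model may also be a single Gaussian model, in which case only one sub-model is needed, and its parameters may be the mean, variance, and so on. Because a single Gaussian model is suited to scenes with a single, unchanging background, a Gaussian mixture model is usually constructed for the subsequent processing.
Step S1012: mix the sub-models constructed for each pixel based on the initialized weights, so as to form the model corresponding to the pixel.
Here, assuming each pixel has three sub-models F_1, F_2, and F_3 with corresponding weights K_1, K_2, and K_3, step S1012 can be implemented by formula (1-1):
F_m = K_1*F_1 + K_2*F_2 + K_3*F_3   (1-1);
where F_m is the model corresponding to the pixel.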
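Formula (1-1) can be illustrated with three one-dimensional Gaussian sub-models. The weights, means, and standard deviations below are hypothetical values chosen only to show the weighted sum; the function names are likewise assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian sub-model at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """F_m(x) = K_1*F_1(x) + K_2*F_2(x) + ... for (weight, mean, std) triples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


# (K_i, mean, standard deviation) for three sub-models of one pixel
components = [(0.5, 100.0, 10.0), (0.3, 150.0, 20.0), (0.2, 30.0, 15.0)]
density_near_mode = mixture_pdf(100.0, components)
density_far_away = mixture_pdf(250.0, components)
```

A color value close to one of the means receives a much higher density than a value far from all of them, which is the property the later background test relies on.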
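In some embodiments, a simple mathematical transformation may also be applied to formula (1-1) to form the model corresponding to the pixel.
Steps S1011 and S1012 complete the construction of a model that conforms to the pixel distribution characteristics of the implantation area in the reference frame.
Step S1013: judge whether the implantation area in the frame to be detected is occluded by the foreground.
In some embodiments, the first color space distribution of the implantation area in the frame to be detected and the first color space distribution of the implantation area in the reference frame may first be obtained, and the degree of difference between the two distributions determined; whether the implantation area in the frame to be detected is occluded by the foreground is then determined by judging whether the degree of difference satisfies the first difference condition.
For example, because the implantation area in the reference frame is not occluded by the foreground, if the first color space distribution of the implantation area in the frame to be detected and that of the implantation area in the reference frame satisfy the first difference condition, the two differ greatly, which indicates that the implantation area in the frame to be detected is occluded by the foreground, and the process goes to step S1014. If the two distributions do not satisfy the first difference condition, the difference is small, which indicates that the implantation area in the frame to be detected is not occluded by the foreground, and the process goes to step S1015.
In some embodiments, the first color space distribution may be a red-green-blue (Red Green Blue, RGB) space distribution. Obtaining the first color space distribution of the implantation area may mean obtaining its RGB histogram; for example, the 256 gray levels may be divided into 32 intervals, and the distribution of the pixels of the implantation area over these 32 intervals counted to obtain the RGB histogram.
The first difference condition may represent the maximum degree of difference between the first color space distributions of the implantation areas of the reference frame and of the frame to be detected at which the implantation area in the frame to be detected is still determined to be free of occlusion. For example, assuming there are M intervals in total, the first difference condition may be that, in 30%*M of the intervals, the difference in pixel counts falls outside the count threshold range. With 32 intervals, for instance, the first difference condition may be that the difference in pixel counts exceeds 10 in at least 9 intervals.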
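The example condition above (with 32 intervals, at least 9 intervals whose pixel counts differ by more than 10) can be sketched as follows. The function name and parameter names are hypothetical, and rounding 30%*M down to a whole number of intervals is an assumption that reproduces the "9 of 32" figure in the text.

```python
def first_difference_condition(ref_hist, cur_hist,
                               bin_ratio=0.3, count_threshold=10):
    """True if enough intervals differ by more than count_threshold pixels."""
    required = int(bin_ratio * len(ref_hist))   # 30% of 32 intervals -> 9
    differing = sum(1 for r, c in zip(ref_hist, cur_hist)
                    if abs(r - c) > count_threshold)
    return differing >= required


ref_hist = [100] * 32
cur_hist = [120] * 9 + [100] * 23     # 9 intervals changed by 20 pixels each
occluded = first_difference_condition(ref_hist, cur_hist)
```

When the function returns True the area would be treated as occluded and the process would continue with step S1014; otherwise it would continue with the illumination check in step S1015.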
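Step S1014: in response to the implantation area in the frame to be detected being occluded by the foreground, slow down the fitting of the model to the implantation area in the frame to be detected, with the weights of the sub-models in the model unchanged.
Slowing down the fitting of the model to the implantation area in the frame to be detected means reducing the rate at which the model is fitted to that area. For example, for the model of each pixel in the implantation area, the learning rate related to fitting speed in the model may be set to 0, so that the weights of the sub-models in the model remain unchanged. When the implantation area in the frame to be detected is occluded by the foreground, slowing the fitting reduces the speed at which the model learns pixel changes in the implantation area, which avoids later misidentifying the foreground as background.
Step S1015: judge whether the illumination of the implantation area in the frame to be detected has changed.
In some embodiments, the second color space distribution of the implantation area in the frame to be detected and the second color space distribution of the implantation area in the reference frame may first be obtained, and the degree of difference between the two distributions determined; whether the illumination of the implantation area in the frame to be detected has changed is then determined by judging whether the degree of difference satisfies the second difference condition. The second difference condition may represent the maximum degree of difference between the second color space distributions of the implantation areas of the reference frame and of the frame to be detected at which the illumination of the implantation area in the frame to be detected is determined to have changed.
For example, in response to the second color space distribution of the implantation area in the frame to be detected and that of the implantation area in the reference frame satisfying the second difference condition, it is determined that the illumination of the implantation area in the frame to be detected has changed, and the process goes to step S1016. In response to the two distributions not satisfying the second difference condition, it is determined that the illumination of the implantation area in the frame to be detected has not changed; in this case the original learning rate is kept and the weights are updated.
In some embodiments, the second color space distribution may be a hue-saturation-value (Hue Saturation Value, HSV) space distribution. The implementation of step S1015 can be understood with reference to step S1013.
Step S1016: speed up the fitting of the model to the implantation area in the frame to be detected.
Speeding up the fitting of the model to the implantation area in the frame to be detected means increasing the rate at which the model is fitted to that area. In some embodiments, the precondition for executing step S1016 is that the implantation area in the frame to be detected is not occluded by the foreground and its illumination has changed. To avoid identifying the new illumination as foreground, the fitting speed must be increased so that the model fits the implantation area of the frame to be detected as quickly as possible and continues to represent the pixel distribution characteristics of the implantation area. For example, for the model of each pixel in the implantation area, this may be achieved by setting the learning rate related to fitting speed in the model to -1.
Steps S1013 to S1016 complete the update of the weights of the sub-models in the model; the parameters of the sub-models still need to be updated.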
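Step S1017: judge whether each pixel of the implantation area in the frame to be detected matches a sub-model in the corresponding model.
In some embodiments, for any pixel in the implantation area, if the deviation between the color value of the pixel and the mean of any sub-model in the pixel's model is less than a certain threshold, the pixel is considered to match that sub-model. In practice, for example, the threshold may be related to the standard deviation and may be 2.5 times the standard deviation of the sub-model. If a pixel matches at least one sub-model in the model, the process goes to step S1018; if a pixel matches none of the sub-models in the model, the process goes to step S1019.
Step S1018: in response to a pixel of the implantation area in the frame to be detected matching at least one sub-model in the corresponding model, update the parameters of the matched sub-model.
For the sub-models in the pixel's corresponding model that do not match the pixel, the parameters of those sub-models are kept unchanged.
Step S1019: in response to a pixel of the implantation area in the frame to be detected matching none of the sub-models in the corresponding model, initialize a new sub-model based on the pixel and replace the sub-model with the smallest weight.
Here, steps S1017 to S1019 complete the update of the sub-model parameters. When the sub-model parameters are updated, occlusion detection is performed on each pixel of the implantation area in the frame to be detected, that is, each pixel is determined to be foreground or background; the sub-model parameters are updated and the template for occluding the background and exposing the foreground is generated according to the occlusion detection result, so that information implanted in the implantation area of the frame to be detected blends better with the background and does not occlude the foreground.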
在一些实施例中,步骤S102可以通过以下步骤实现:
步骤S1021,将所述待检测帧中植入区域的每个像素点的颜色值,与所述像素点对应模型中的各个子模型进行匹配。
这里,步骤S1021在实现时,可以是将待检测帧中植入区域的各个像素点的颜色值,与像素点对应的各个子模型进行比较,一个像素点的颜色值与至少一个子模型的均值的偏差在一定阈值范围内时,表明该子模型与像素点匹配。
步骤S1022,将匹配成功的像素点识别为所述背景的像素点,将匹配失败的像素点识别为所述前景的像素点。
这里,由于参考帧中的植入区域是不会遮挡前景的区域,因此可以是背景,并且在构建模型时,是基于参考帧中植入区域的像素分布特性构建的,那么如果待检测帧中植入区 域中的像素点与像素点对应的模型中的一个子模型匹配,那么就确定像素点为背景像素点;如果待检测帧中植入区域的像素点与像素点对应的模型中的任一个子模型都不匹配,那么就确定像素点为前景像素点。
步骤S1023,对应所述待检测帧中植入区域中被识别为背景的像素点,在空的所述模板中对应的位置填充二进制一。
步骤S1024,对应所述待检测帧中植入区域中被识别为前景的像素点,在填充二进制一的所述模板中对应的位置填充二进制零。
通过步骤S1021至步骤S1024,就生成了二值化的模板,由于在模板中,对于识别为背景的像素点,其对应的模板位置为1,对于识别为前景的像素点,其对应的模板位置为0,因此在将此模板与待植入信息进行乘法运算之后,得到应用模板之后的待植入信息,在应用模板之后的待植入信息中识别为前景的像素点的像素值为0,而识别为背景的像素点的像素值不变。这样在将应用后的待植入信息覆盖待检测帧中的植入区域时,能够保证前景没有被遮挡,并且是相对于植入信息突出显示的。
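The template generation and multiplication can be sketched with nested lists standing in for single-channel images. The function names are hypothetical, and the final `overlay` step (keeping the original frame pixel wherever the template is 0) is an assumption about how "overlaying" lets the foreground show through; the text itself only specifies the template values and the multiplication.

```python
def build_template(is_background):
    """Fill 1 for background pixels and 0 for foreground pixels."""
    template = [[None] * len(row) for row in is_background]  # empty template
    for y, row in enumerate(is_background):
        for x, background in enumerate(row):
            template[y][x] = 1 if background else 0
    return template


def apply_template(info, template):
    """Multiply the information to be embedded by the template, element by element."""
    return [[p * t for p, t in zip(info_row, t_row)]
            for info_row, t_row in zip(info, template)]


def overlay(region, masked_info, template):
    """Assumed compositing step: embedded pixel on background, original pixel on foreground."""
    return [[m if t == 1 else r for r, m, t in zip(r_row, m_row, t_row)]
            for r_row, m_row, t_row in zip(region, masked_info, template)]
```

A color image would apply the same template to each channel.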
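In some examples, before or after step S101, the embedding region in the frame to be detected also needs to be determined. If the video is formed with a static shot, the picture range and field of view in the video do not change. In that case, the embedding region in the frame to be detected may be determined by locating, based on the position of the embedding region in the reference frame, the region at the corresponding position in the frame to be detected.
In some embodiments, if the video is formed with a moving shot, determining the embedding region in the frame to be detected may be implemented through the following steps:
Step 21: match features extracted from the embedding region in the reference frame of the video against features extracted from the frame to be detected.
Here, step 21 may be implemented by first extracting feature points from the embedding region in the reference frame, then extracting feature points from the frame to be detected, and then matching the two sets of feature points.
Further, the extracted feature points may be ORB (Oriented FAST and Rotated BRIEF) feature points, where FAST stands for Features from Accelerated Segment Test and BRIEF for Binary Robust Independent Elementary Features, or Scale-Invariant Feature Transform (SIFT) feature points. In some embodiments, other types of feature points may also be extracted.
Step 22: in response to a successful match, determine that the frame to be detected includes an embedding region corresponding to the embedding region in the reference frame.
Here, in this embodiment, a successful match between the feature points of the embedding region in the reference frame and the feature points in the frame to be detected may mean that all feature points match, or that a portion of them match, for example 80% of the feature points.
In response to a successful match, the frame to be detected contains an embedding region corresponding to the embedding region in the reference frame, and information can be embedded. In response to no successful match, the frame to be detected contains no such embedding region; embedding information in this case might occlude a large area of the foreground in the frame to be detected, so information cannot be embedded.
In steps 21 and 22, the embedding region in the frame to be detected is tracked by matching the feature points of the embedding region in the reference frame against the feature points in the frame to be detected. Compared with implementations based on motion tracking, this offers better real-time performance, wider applicability, stronger robustness, and automatic, efficient use.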
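In some embodiments, if the video is formed with a moving shot, the camera position, optical axis, and focal length may all change, so the position of the embedding region changes across the image frames of the video. In this case, the following step also needs to be performed before step S1013:
Step 31: transform the embedding region in the frame to be detected so that the position of each pixel in the transformed embedding region is consistent with the position of the corresponding pixel of the embedding region in the reference frame.
Here, step 31 may be implemented by first tracking the embedding region (that is, the background region used for embedding information) to generate a homography matrix H, and then transforming the embedding region in the frame to be detected to the reference frame according to H, so that the position of each pixel in the transformed embedding region is consistent with the position of the corresponding pixel of the embedding region in the reference frame. Further, this may be implemented according to formula (2-1):
$$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x_t \\ y_t \\ 1 \end{bmatrix} \tag{2-1}$$
where $(x_t, y_t)$ denotes a pixel in the current frame and $(x_0, y_0)$ denotes the corresponding pixel in the reference frame.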
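Applying a homography to a single pixel position can be sketched as below. This assumes H is a 3x3 matrix given as nested lists that maps current-frame coordinates to reference-frame coordinates; the function name is hypothetical. The division by the third homogeneous coordinate is the standard way to turn the product back into pixel coordinates.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through the 3x3 homography H (nested lists)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Normalize the homogeneous result so the third coordinate becomes 1.
    return xh / w, yh / w
```

Warping a whole region amounts to applying this mapping to every pixel position and resampling; in practice a library routine would normally be used for that.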
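If the video is formed with a moving shot, the model update based on the embedding region in the frame to be detected actually uses the embedding region after the homography transformation. Accordingly, the homography-transformed embedding region is also used in step S102 when identifying the background and foreground of the embedding region in the frame to be detected and when generating the template for occluding the background and revealing the foreground. Correspondingly, before step S103, the template needs to undergo the inverse of that transformation so that the position of each binary number in the transformed template is consistent with the position of the corresponding pixel of the embedding region in the frame to be detected.
In the embodiments of this application, for a video formed with a static shot, the pixel distribution characteristics of each pixel of the embedding region in the frame to be detected are used to fit the background pixel distribution of the embedding region in the reference frame; Gaussian mixture modeling is used, the model is learned and updated automatically, and a template that masks the background and shows the foreground is determined from the occlusion detection result, which prevents the embedded information from occluding the foreground. For a moving shot, a transformation maps the pixel positions of the embedding region in the frame to be detected to positions consistent with the embedding region in the reference frame; occlusion detection is likewise performed on the pixels of the embedding region in the frame to be detected, a template is generated, and the template is then inversely transformed to form a template that masks the background and shows the foreground, which ensures that the foreground is not occluded after the information is embedded.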
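An embodiment of this application further provides a method for embedding information in a video. FIG. 5 is a schematic flowchart of yet another implementation of the method according to an embodiment of this application. As shown in FIG. 5, the method includes:
Step S401: the terminal obtains the video to be processed and the information to be embedded.
Here, the video to be processed may be a video recorded by the terminal, a video downloaded by the terminal from a server, or a video sent to the terminal by another terminal. The information to be embedded may be picture information, which may be advertisement picture information, public notice information, or the like.
In the embodiments of this application, the video to be processed may be a video file that includes many image frames. In some embodiments, the video to be processed may instead refer to identification information of the video, which may include, for example, the title and leading actors of the video.
Step S402: the terminal sends to the server an embedding request that carries at least the video and the information to be embedded.
In some embodiments, the embedding request may further include an identifier of the reference frame and information about the embedding region in the reference frame.
Taking a rectangular embedding region as an example, the embedding request may include the frame number of the reference frame and the coordinates of the four vertices of the embedding region in the reference frame.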
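One possible shape for such an embedding request is sketched below. Every field name and value here is hypothetical; the text only states that the request carries the video, the information to be embedded, and optionally the reference frame number and the four vertex coordinates of a rectangular embedding region.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class EmbedRequest:
    video_id: str                            # identifies the video to be processed
    info_id: str                             # identifies the information to be embedded
    reference_frame: int                     # frame number of the reference frame
    region_vertices: List[Tuple[int, int]]   # four (x, y) corners of the region


request = EmbedRequest(
    video_id="video-001",
    info_id="ad-001",
    reference_frame=120,
    region_vertices=[(100, 50), (300, 50), (300, 150), (100, 150)],
)
# Serialize the request for sending to the server.
payload = json.dumps(asdict(request))
```

Carrying identifiers rather than raw media in the request is a design assumption made for brevity.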
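Step S403: the server determines the reference frame and the embedding region in the reference frame based on the received embedding request.
In some embodiments, the received embedding request may be parsed to obtain the specified reference frame and the embedding region set in it. In other embodiments, the image frames of the video file may be analyzed through image recognition to determine a reference frame, and an embedding region in it, that meet the information embedding conditions.
As examples, the information embedding conditions may include at least one of the following: the type of the embedding region (for example, a wall or the ground), the size of the embedding region (for example, width and height, to fit the information to be embedded), the color of the embedding region (for example, forming a certain contrast with the information to be embedded), and the exposure time of the embedding region (that is, the cumulative duration for which it appears in the video).
Step S404: the server builds a model that conforms to the pixel distribution characteristics of the embedding region in the reference frame, and controls the update of the model based on frames to be detected that follow the reference frame.
Step S405: the server identifies the background and foreground of the embedding region in the frame to be detected based on the model, and generates a template for occluding the background and revealing the foreground.
Step S406: the server applies the template to the information to be embedded, to mask the content of the information to be embedded that would occlude the foreground.
Step S407: overlay the information to be embedded, with the template applied, on the embedding region in the frame to be detected, so that the foreground is highlighted relative to the information to be embedded.
Here, the implementation of steps S404 to S407 can be understood with reference to the description of similar steps above.
Step S408: the server packages the video with the embedded information and sends the packaged video to the terminal.
In some embodiments, before embedding information in the image frames of the video, the server first splits the video into individual image frames and then embeds information in each of them. After embedding, to obtain a normal video file, the image frames, audio, subtitles, and so on need to be brought together so that they form a whole.
In some embodiments, after packaging the video with the embedded information, the server may also publish it in an application for watching videos.
Step S409: the terminal publishes the video with the embedded information.
In some embodiments, the video may be published in an application for watching videos, or sent to other terminals, for example published in a friend group of an instant messaging application.
In the method provided by the embodiments of this application, when the terminal wants to embed information in a video, it sends the video to be processed and the information to be embedded to the server, and the server builds a model from the pixel distribution characteristics of the embedding region in the reference frame. Because the embedding region in the reference frame does not occlude the foreground of the video, the model can be used to identify background and foreground among the pixels of the embedding region in subsequent frames to be detected, and to generate a template that occludes the background but not the foreground. After the template is applied to the information to be embedded, the content that would occlude the foreground is masked, so that embedding the information in the frame to be detected does not occlude its foreground, which preserves the viewing experience.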
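An exemplary application of the embodiments of this application in a practical scenario is described below.
An embodiment of this application further provides a method for embedding information in a video, which is implemented in two stages: a background modeling and learning stage and an occlusion prediction stage. FIG. 6 is a schematic flowchart of still another implementation of the method according to an embodiment of this application. As shown in FIG. 6, the method includes:
Step S501: obtain a background picture.
Step S502: perform Gaussian mixture modeling on the background picture.
Steps S501 and S502 complete the background modeling process.
Step S503: split the video into frames.
Step S504: obtain a picture to be predicted.
Step S505: based on the background model, inversely transform the picture to be predicted to obtain an inversely transformed picture.
Step S506: apply the forward transformation to the inversely transformed picture to obtain the occlusion mask.
The flow shown in FIG. 6 constructs an adaptive Gaussian mixture model for background modeling: starting from the initial opportunity frame for video advertisement embedding, frames are adaptively selected from the subsequent frames for background modeling, and the learning rate is adaptively selected for iterative updates to optimize the model.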
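FIG. 7 is a schematic diagram of still another implementation process of embedding information in a video according to an embodiment of this application. As shown in FIG. 7, in this embodiment information may be embedded in a video through the following steps:
Step S601: split the video into frames.
Here, the input video is split into frames using image processing techniques, and each frame is treated as a picture to be predicted.
Step S602: locate the initial opportunity frame (that is, the frame in which the advertisement is to be embedded) and the corresponding embedding region.
Here, the initial opportunity frame and the corresponding embedding region may be set manually. In some embodiments, a frame in the video that includes a specific object or region (for example, the ground or a wall) may be identified automatically as the initial opportunity frame. Further, neural-network-based image recognition may be used to determine the initial opportunity frame, the embedding region, and a specific position (for example, a central area matching the advertisement size), the specific position corresponding to the embedding region.
Step S603: initialize, from the image of the embedding region in the initial opportunity frame, the Gaussian mixture model corresponding to each pixel of the embedding region.
Step S604: process each subsequent frame (that is, each later frame of the video that includes the embedding region) as follows:
Step S6041: compare the distribution characteristics of the embedding region of the subsequent frame with those of the embedding region of the initial frame to determine whether occlusion occurs; when occlusion occurs, update the learning rate.
Step S6042: adjust the learning rate according to whether an illumination change occurs.
Step S6043: identify background and foreground pixels, update the model using the identification result and the updated learning rate, and then determine the mask.
In practice, foreground/background identification may be performed first, that is, determining whether a pixel conforms to at least one mode in its corresponding model: if it does, the pixel is a background pixel; if it conforms to none, the pixel is a foreground pixel. The model is then updated, which involves updating the weights and parameters (mean and standard deviation) of the modes.
For the weights, the weights of the modes are updated according to the updated learning rate. For the parameters, the mean and standard deviation of unmatched modes stay unchanged, while those of the matched mode are updated according to the updated learning rate and weight. If no mode matches, the mode with the smallest weight is replaced. The modes are sorted in descending order of ω/σ², so that modes with a large weight and a small standard deviation rank first. Here ω is the weight and σ is the standard deviation.
Step S6044: embed the information to be embedded, with the mask applied, in the embedding region of the subsequent frame.
The implementation of step S6044 can be understood with reference to the description of similar steps above.
Step S605: repeat step S604; after all subsequent frames have been processed, package the image frames.
Here, when the packaged video is played, the embedded advertisement does not occlude the foreground of the image frames, which gives a better viewing experience.
Among the above steps, steps S601 to S603 correspond to the background modeling and learning part, and steps S604 and S605 to the occlusion prediction part. The two parts are interleaved: the model is first built from the initial frame; occlusion prediction (judgment) is then performed on the subsequent frames, and the model continues to be updated according to the prediction results.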
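In some embodiments, step S603 may be implemented by taking the reference frame of the video advertisement embedding project (that is, the initial opportunity frame that includes the embedding region) as the basis for background modeling, and initializing a Gaussian mixture model (GMM) for the prior embedding region (the specific region, within the background of the reference frame of the video, that is used for embedding the advertisement).
The embedding region in the initial opportunity frame satisfies the condition that it is not occluded by the foreground in that frame, so the pixel distribution of the embedding region can be learned completely when the model is initialized.
A Gaussian mixture model is built for each pixel of the embedding region. The model represents the pixel's color value as a superposition of K modes (in some embodiments also called Gaussian modes, Gaussian components, or sub-models), with K usually between 3 and 5. The model treats the color value X presented by the pixel as a random variable, so the color value of the pixel in each frame of the video is a sample of X.
In Gaussian background maintenance, the color value of each pixel in the scene can be represented by a mixture distribution of K Gaussian components; that is, the probability that pixel j in the image takes the value $x_j$ at time t is:
$$P(x_j) = \sum_{i=1}^{K} \omega_{j,t}^{i}\, \eta\!\left(x_j;\ \mu_{j,t}^{i},\ \Sigma_{j,t}^{i}\right)$$
where $\omega_{j,t}^{i}$ denotes the weight of the i-th Gaussian component in the Gaussian mixture model of pixel j at time t, which satisfies:
$$\sum_{i=1}^{K} \omega_{j,t}^{i} = 1$$
Here $\mu_{j,t}^{i}$ and $\Sigma_{j,t}^{i}$ denote the mean and covariance of the i-th Gaussian component, respectively, and η denotes the Gaussian probability density function:
$$\eta\!\left(x_j;\ \mu_{j,t}^{i},\ \Sigma_{j,t}^{i}\right) = \frac{1}{(2\pi)^{d/2}\,\left|\Sigma_{j,t}^{i}\right|^{1/2}} \exp\!\left[-\frac{1}{2}\left(x_j - \mu_{j,t}^{i}\right)^{\top} \left(\Sigma_{j,t}^{i}\right)^{-1} \left(x_j - \mu_{j,t}^{i}\right)\right]$$
where d is the dimension of $x_j$. For the RGB color space, the channels can be regarded as mutually independent, and the covariance matrix is defined as:
$$\Sigma_{j,t}^{i} = \left(\sigma_{j,t}^{i}\right)^{2} I$$
where σ denotes the standard deviation and I the identity matrix.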
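With the isotropic covariance above, the mixture probability can be evaluated without any matrix operations, because the determinant reduces to σ raised to the power 2d and the quadratic form to a squared distance divided by σ². The sketch below illustrates this; the function names and the `(weight, mean, sigma)` tuple layout are assumptions.

```python
import math


def gaussian_density(x, mean, sigma):
    """Density of an isotropic Gaussian with covariance sigma^2 * I at point x."""
    d = len(x)
    squared_distance = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    # |Sigma|^(1/2) equals sigma^d when Sigma = sigma^2 * I.
    normalizer = (2 * math.pi) ** (d / 2) * sigma ** d
    return math.exp(-0.5 * squared_distance / sigma ** 2) / normalizer


def mixture_probability(x, components):
    """Weighted sum of component densities; components are (weight, mean, sigma) tuples."""
    return sum(weight * gaussian_density(x, mean, sigma)
               for weight, mean, sigma in components)
```

For an RGB pixel, `x` and `mean` would be 3-tuples and `d` would be 3.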
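Initializing the Gaussian mixture model means initializing each of its parameters. In the embodiments of this application, either of the following two initialization methods may be used:
One method: in the initialization stage, if the speed of initializing the mixture parameters is not critical, then, since each color channel of a pixel ranges over [0, 255], the K Gaussian components may be directly initialized with a large variance $\sigma_{init}^{2}$, each Gaussian component is given the weight $\omega_{init} = 1/K$, and the color value of each pixel in the first frame is used to initialize the means of the K Gaussian components in the mixture model. That is, the mean is the pixel's color value and the variance is a preset empirical value.
The other method: for the first frame, the first Gaussian component of each pixel is initialized with its mean set to the color value of the current pixel and its weight set to 1, and the means and weights of all other Gaussian components are initialized to zero. The variance is a preset empirical value.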
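The two initialization methods can be contrasted in a short sketch. The dictionary layout, the function names, and the default `init_sigma` of 30 are assumptions; the text only says the variance is a preset empirical value.

```python
def init_equal_weights(pixel, k=3, init_sigma=30.0):
    """First method: K components, each with weight 1/K and mean equal to the pixel color."""
    return [{"weight": 1.0 / k, "mean": tuple(pixel), "sigma": init_sigma}
            for _ in range(k)]


def init_first_component(pixel, k=3, init_sigma=30.0):
    """Second method: only the first component is active (weight 1, mean = pixel color)."""
    zero_mean = tuple(0 for _ in pixel)
    model = [{"weight": 1.0, "mean": tuple(pixel), "sigma": init_sigma}]
    # Remaining components start with zero mean and zero weight.
    model += [{"weight": 0.0, "mean": zero_mean, "sigma": init_sigma}
              for _ in range(k - 1)]
    return model
```

In both cases the weights of a pixel's components sum to 1, as the mixture definition requires.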
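In some embodiments, subsequent frames are processed differently depending on the type of shot used to form the video.
For a video formed with a static shot, step S6041 may be implemented as follows:
For the embedding region of each frame following the initial opportunity frame, the RGB color space distribution of the embedding region of the subsequent frame is compared with that of the initial embedding region (the embedding region of the initial frame), and whether occlusion exists is determined from the difference between the RGB distributions. In other words, it is determined whether the advertisement embedded in the embedding region of the initial opportunity frame would occlude foreground that appears in the embedding region in subsequent frames, for example the caption “婴儿式” in FIG. 8B. If the difference between the RGB distributions satisfies the difference condition, the background of the embedding region is considered to be occluded by the foreground. Whether the difference satisfies the difference condition can be judged by comparing histograms; for example, the gray levels 0-255 may be divided into 16 bins, and the distribution of each frame's pixels over the 16 bins is counted and compared. If the histogram of the embedding region of the subsequent frame differs from that of the initial embedding region by more than a certain threshold, the difference condition is satisfied and the background of the embedding region of the subsequent frame is considered to be occluded by the foreground. Conversely, if the difference does not exceed the threshold, the difference condition is not satisfied and the background of the embedding region of the subsequent frame is considered not to be occluded by the foreground.
When occlusion is determined to exist, the updated learning rate is set to 0 (that is, the subsequent frame is not used to update the weights of the modes in the model); if there is no occlusion, the original learning rate may be kept.
In some embodiments, step S6042 may be implemented as follows:
For the embedding region of each frame following the initial opportunity frame, the HSV distribution of the embedding region of the subsequent frame is compared with that of the initial embedding region (the embedding region of the initial frame), and whether the background illumination has changed is determined from the difference between the HSV distributions. Whether the difference satisfies the difference condition can be judged by comparing histograms in the HSV color space. If the difference condition is satisfied, a change in background illumination is considered to exist, and the updated learning rate is set to -1. Because HSV reflects illumination changes, when the background illumination changes, setting the learning rate to -1 increases the weights of the modes that conform to the new illumination, which avoids the new illumination being identified as foreground; if no illumination change has occurred, the original learning rate may be kept.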
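An illumination check along these lines can be sketched with the standard-library `colorsys` module. Several simplifications are assumed here and are not taken from the text: only the V (value) channel is compared, the histograms are compared by a normalized absolute difference, both regions contain the same number of pixels, and the threshold of 0.5 is arbitrary.

```python
import colorsys


def value_histogram(rgb_pixels, bins=16):
    """Histogram of the HSV value channel for a list of (r, g, b) pixels in 0..255."""
    counts = [0] * bins
    for r, g, b in rgb_pixels:
        _, _, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


def illumination_changed(ref_pixels, cur_pixels, bins=16, threshold=0.5):
    """Return True if the value histograms differ by more than `threshold`.

    The distance is half the summed absolute bin difference divided by the
    pixel count, so it ranges from 0 (identical) to 1 (disjoint).
    """
    ref_hist = value_histogram(ref_pixels, bins)
    cur_hist = value_histogram(cur_pixels, bins)
    total = max(len(ref_pixels), 1)
    distance = sum(abs(r - c) for r, c in zip(ref_hist, cur_hist)) / (2 * total)
    return distance > threshold
```

A fuller version might also compare the hue and saturation histograms to tell a lighting change apart from a change in scene content.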
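In some embodiments, step S6043 may be implemented by identifying the type of each pixel in the embedding region of the subsequent frame, updating the model, and then determining the mask.
For the same pixel in the initial opportunity frame and a subsequent frame, its color value $X_t$ in the subsequent frame (at time t) is compared with the pixel's current K modes (that is, K Gaussian components). If its deviation from the mean of at least one mode is within 2.5σ of that mode (2.5 times the standard deviation), the mode is considered to match the pixel and the pixel belongs to the background of the video; if no mode matches, the pixel belongs to the foreground.
Once each pixel has been determined to be foreground or background, the mask is determined and refined morphologically.
If a pixel belongs to the background of the video, its value in the mask is 1; if it belongs to the foreground, its value in the mask is 0.
In the embodiments of this application, morphological refinement of the mask mainly repairs errors made by the modes in judging foreground and occlusion, including eliminating holes in the mask and connecting breaks, so that no noise appears in the revealed video foreground after occlusion handling. FIG. 9A and FIG. 9B are schematic diagrams of refining a mask morphologically. As shown in FIG. 9A, morphological processing eliminates the holes in the white region of 901, forming the fully connected region shown in 902. As shown in FIG. 9B, it joins the breaks in 911, likewise forming the connected, complete region shown in 912.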
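Hole filling and break joining of this kind are commonly done with a morphological closing (dilation followed by erosion). The sketch below is one way to do it on a binary mask stored as nested lists; the 3x3 neighborhood, the border handling (windows are clipped at the image edge), and the function names are assumptions, and the text does not say which morphological operations are actually used.

```python
def _morph(mask, op):
    """Apply `op` (max for dilation, min for erosion) over each 3x3 neighborhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(window)
    return out


def dilate(mask):
    return _morph(mask, max)


def erode(mask):
    return _morph(mask, min)


def close_mask(mask):
    """Morphological closing: fills small holes and joins narrow breaks in the 1-region."""
    return erode(dilate(mask))
```

Whether closing should be applied to the 1-region or to the 0-region depends on which one corresponds to the white area in FIG. 9A, which this sketch does not assume.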
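The model may be updated by updating its weights according to the updated learning rate. The mean and standard deviation of the modes that a pixel does not match stay unchanged; only those of the matched mode are updated. If no mode matches the pixel, a new mode is initialized based on the pixel and replaces the mode with the smallest weight. The modes are sorted in descending order of ω/σ², so that modes with a large weight and a small standard deviation rank first. Here ω is the weight and σ is the standard deviation.
In practice, if $x_j$ matches the i-th mode, the i-th mode is updated by $x_j$ and the other modes remain unchanged. The update is as follows:
$$\omega_{j,t+1}^{i} = (1-\alpha)\,\omega_{j,t}^{i} + \alpha$$
$$\mu_{j,t+1}^{i} = (1-\rho)\,\mu_{j,t}^{i} + \rho\, x_j$$
$$\left(\sigma_{j,t+1}^{i}\right)^{2} = (1-\rho)\left(\sigma_{j,t}^{i}\right)^{2} + \rho\left(x_j - \mu_{j,t+1}^{i}\right)^{\top}\left(x_j - \mu_{j,t+1}^{i}\right)$$
$$\rho = \frac{\alpha}{\omega_{j,t+1}^{i}}$$
where α is the learning rate of the model and ρ is the learning rate of the parameters, which reflects how fast the mode parameters converge. If $x_j$ matches none of the pixel's K modes, the modes ranked last in the Gaussian mixture model are replaced by a new mode whose mean is $x_j$ and whose standard deviation and weight are initialized to $\sigma_{init}$ and $\omega_{init}$. The remaining modes keep the same mean and variance, and their weights are updated according to formula (3-8):
$$\omega_{j,t+1}^{i} = (1-\alpha)\,\omega_{j,t}^{i} \tag{3-8}$$
After the update, the weights of the modes need to be normalized. Once the parameters have been updated, to determine which modes in the pixel's Gaussian mixture model are produced by the background, the modes are sorted in descending order of ω/σ² and the first B modes are selected as the background distribution, where B satisfies the following formula and the parameter Q denotes the proportion of the background:
$$B = \arg\min_{b}\left(\sum_{i=1}^{b} \omega_{j,t+1}^{i} > Q\right)$$
A larger value of this ratio indicates that the pixel value has a smaller variance and a higher probability of occurrence, which is characteristic of the pixel values of the scene background.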
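The weight update, normalization, and background-mode selection can be sketched as follows. This assumes the ranking key is the weight divided by the squared standard deviation, as the remark that modes with a large weight and small standard deviation rank first suggests; the function names and the default Q of 0.7 are hypothetical.

```python
def update_weights(weights, matched_index, alpha):
    """Update and renormalize the mode weights of one pixel.

    If a mode matched, only its weight is raised; if none matched
    (matched_index is None), every weight decays by (1 - alpha).
    """
    if matched_index is None:
        updated = [(1 - alpha) * w for w in weights]
    else:
        updated = list(weights)
        updated[matched_index] = (1 - alpha) * updated[matched_index] + alpha
    total = sum(updated)
    return [w / total for w in updated]


def select_background_modes(weights, sigmas, q=0.7):
    """Return indices of the first B modes, ranked by weight / sigma^2, whose weights sum past q."""
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] / sigmas[i] ** 2,
                   reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative > q:
            break
    return chosen
```

With weights 0.6, 0.3, 0.1 and Q of 0.7, the first mode alone does not exceed Q, so the two highest-ranked modes are treated as background.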
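The above can be regarded as the implementation process of embedding information in a video formed with a static shot. FIG. 8A and FIG. 8B are schematic diagrams of the effect of embedding information in a video formed with a static shot according to an embodiment of this application. The image shown in FIG. 8A may be a frame preceding the image shown in FIG. 8B (that is, a frame before the video explains “婴儿式”). As shown in FIG. 8A, the wall region 801 in the image frame does not yet display “婴儿式”. If the wall region is used as the advertisement embedding region, the foreground “婴儿式” appears in a subsequent frame, namely the image frame shown in FIG. 8B; if the advertisement were simply overlaid as a layer, the “婴儿式” text would be occluded.
After the solution for embedding information in a video provided by the embodiments of this application is applied, as shown in FIG. 8B, the three characters “婴儿式” float above the advertisement; that is, the embedded advertisement 811 does not occlude the foreground content of the video, which preserves the integrity of the foreground content of the original video at the advertisement embedding position.
For a video formed with a moving shot, the following steps also need to be performed before step S6041 when implementing step S604:
Step 71: track the subsequent frames that include the embedding region.
Subsequent frames that include the embedding region are tracked through template matching based on feature tracking (a template of feature points, for example feature points found with the ORB method) or through the SIFT method.
For subsequent frames of the video, the embedding region (that is, the background region used for embedding information) first needs to be tracked to generate the homography matrix H. Because background modeling models each pixel individually, the positions of the pixels of the embedding region in the initial opportunity frame (reference frame) and in the subsequent frame need to be put in one-to-one correspondence; if the camera moves, the pixel positions of the embedding region in the initial opportunity frame and in the current frame do not correspond.
To put the pixel positions of the embedding region in the initial opportunity frame and in the subsequent frame in one-to-one correspondence, the embedding region of the current frame may be transformed back to the initial frame according to the homography matrix H using formula (3-10):
$$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x_t \\ y_t \\ 1 \end{bmatrix} \tag{3-10}$$
where $(x_t, y_t)$ denotes a pixel in the current frame and $(x_0, y_0)$ denotes the corresponding pixel in the initial opportunity frame.
For a video formed with a moving shot, steps S6041 and S6042 are implemented similarly to the static-shot case and can be understood with reference to the description of similar steps above.
Step S6043 likewise requires identifying the type of each pixel in the embedding region of the subsequent frame in order to update the model and determine the mask. The difference is that, after the mask is determined, the homography matrix H is used to inversely transform the mask to its position in the subsequent frame, as shown in formula (3-11):
$$\begin{bmatrix} x_t \\ y_t \\ 1 \end{bmatrix} \sim H^{-1} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \tag{3-11}$$
The advertisement is embedded in the embedding region of the subsequent frame, the corresponding mask is applied in the embedding region for image frames judged to be occluded, and the video is packaged.
FIG. 8C and FIG. 8D are schematic diagrams of the effect of embedding information in a video formed with a moving shot according to an embodiment of this application. FIG. 8C is a frame in which the person has not yet appeared; if the ground is used as the advertisement embedding region 821, the image frame after the advertisement is embedded is as shown in FIG. 8C. In a subsequent frame, if the embedded advertisement “Hello秦Pro” were overlaid directly as a layer, it would occlude the legs of the person appearing in the region. After the solution provided by this embodiment is applied, as shown in FIG. 8D, the person's legs are displayed on top of the embedded advertisement, so that the advertisement embedding region 831 does not occlude the foreground of the video.
The method for embedding information in a video provided by the embodiments of this application combines the video sequence with full-pixel statistical modeling. For a static shot, background modeling is selected automatically, the learning rate is updated automatically on subsequent frames to optimize the model, and statistical features are used to determine the occlusion mask. For a moving shot, a transformation maps the frame to a standard picture for pixel statistical modeling, and the result is mapped back to the sequence frame to obtain the occlusion mask, with no motion tracking model required. This handles occluding objects finely during video advertisement embedding so that the embedded advertisement looks more native, and it offers good real-time performance, wide applicability, strong robustness, and automatic, efficient use.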
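An exemplary structure of the software modules is described below. In some embodiments, as shown in FIG. 2, the software modules in apparatus 240 may include:
a model construction module 241, configured to build a model that conforms to the pixel distribution characteristics of the embedding region in the reference frame, and to control the update of the model based on frames to be detected that follow the reference frame;
a template generation module 242, configured to identify, based on the model, the background and foreground of the embedding region in the frame to be detected, and to generate a template for occluding the background and revealing the foreground;
a template application module 243, configured to apply the template to the information to be embedded, to mask the content of the information to be embedded that would occlude the foreground;
an information overlay module 244, configured to overlay the information to be embedded, with the template applied, on the embedding region in the frame to be detected, so that the foreground is highlighted relative to the information to be embedded.
In some embodiments, the apparatus further includes:
a parameter initialization module, configured to initialize, for each pixel of the embedding region in the reference frame, at least one sub-model corresponding to the pixel and the weight corresponding to the at least one sub-model;
a weight mixing module, configured to mix the sub-models built for each pixel based on the initialized weights, to form the model corresponding to the pixel.
In some embodiments, the apparatus further includes:
a weight maintenance module, configured to, in response to the embedding region in the frame to be detected being occluded by the foreground, reduce the rate at which the model fits the embedding region in the frame to be detected, with the weights of the sub-models in the model unchanged;
a fitting acceleration module, configured to, in response to the embedding region in the frame to be detected not being occluded by the foreground and its illumination condition having changed, increase the rate at which the model fits the embedding region in the frame to be detected.
In some embodiments, the apparatus further includes:
a parameter update module, configured to, in response to a pixel of the embedding region in the frame to be detected matching at least one sub-model in the corresponding model, update the parameters of the matched sub-model and keep the parameters of the unmatched sub-models in the corresponding model unchanged.
In some embodiments, the apparatus further includes:
a first matching module, configured to match the color value of each pixel of the embedding region in the frame to be detected against the sub-models in the model corresponding to the pixel;
an identification module, configured to identify pixels that match successfully as background pixels and pixels that fail to match as foreground pixels.
In some embodiments, the apparatus further includes:
a filling module, configured to fill, for the pixels of the embedding region in the frame to be detected that are identified as background, binary one into the corresponding positions of the empty template, and
to fill, for the pixels of the embedding region in the frame to be detected that are identified as foreground, binary zero into the corresponding positions of the template filled with binary ones.
In some embodiments, the apparatus further includes:
an operation module, configured to multiply the information to be embedded by the binary number filled in each position of the template.
In some embodiments, the apparatus further includes:
a second matching module, configured to, in response to the video being formed with a moving shot, match features extracted from the embedding region in the reference frame of the video against features extracted from the frame to be detected;
a region determination module, configured to, in response to a successful match, determine that the frame to be detected includes an embedding region corresponding to the embedding region in the reference frame.
In some embodiments, the apparatus further includes:
a region transformation module, configured to, in response to the video being formed with a moving shot,
transform the embedding region in the frame to be detected, before the update of the model is controlled based on the frames to be detected that follow the reference frame, so that the position of each pixel in the transformed embedding region is consistent with the position of the corresponding pixel of the embedding region in the reference frame;
a template inverse transformation module, configured to apply the inverse of the transformation to the template before the template is applied to the information to be embedded, so that the position of each binary number in the transformed template is consistent with the position of the corresponding pixel of the embedding region in the frame to be detected.
In some embodiments, the apparatus further includes:
a region locating module, configured to, in response to the video being formed with a static shot, locate, based on the position of the embedding region in the reference frame, the region at the corresponding position in the frame to be detected, to determine the embedding region in the frame to be detected.
In some embodiments, the apparatus further includes:
a first determination module, configured to, in response to the first color space distribution of the embedding region in the frame to be detected and that of the embedding region in the reference frame satisfying the first difference condition, determine that the embedding region in the frame to be detected is occluded by the foreground;
a second determination module, configured to, in response to the second color space distribution of the embedding region in the frame to be detected and that of the embedding region in the reference frame satisfying the second difference condition, determine that the illumination condition of the embedding region in the frame to be detected has changed.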
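As an example of implementing the method provided by the embodiments of this application in hardware, the method may be performed directly by the processor 410 in the form of a hardware decoding processor, for example by one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
An embodiment of this application provides a storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the method provided by the embodiments of this application, for example the methods shown in FIG. 3 to FIG. 6.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or a CD-ROM, or may be any device that includes one or any combination of the above memories.
In some embodiments, the executable instructions may take the form of a program, software, a software module, a script, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to a file in a file system. They may be stored as part of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subprograms, or code sections).
As an example, the executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
In summary, with the embodiments of this application, occlusion detection can be performed on the embedding region in the frame to be detected using the model built from the pixel distribution characteristics of the embedding region in the reference frame, and the model parameters are updated based on the occlusion detection result. This lets the embedding region of the frame to be detected fit the background pixel distribution of the embedding region in the reference frame, so that the embedded information blends better into the background of the video without occluding the foreground, giving a better viewing experience. In addition, for a video formed with a moving shot, the embedding region is determined using feature points, and a transformation maps the pixels of the embedding region in the frame to be detected to positions consistent with the reference frame, with no need for motion tracking, which gives better real-time performance and stronger robustness.
The above are merely embodiments of this application and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and scope of this application falls within the scope of protection of this application.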

Claims (14)

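  1. A method for embedding information in a video, characterized in that the method is applied to an execution device and comprises:
    building a model that conforms to the pixel distribution characteristics of an embedding region in a reference frame, and controlling the update of the model based on frames to be detected that follow the reference frame;
    identifying, based on the model, the background and foreground of the embedding region in the frame to be detected, and generating a template for occluding the background and revealing the foreground;
    applying the template to information to be embedded, to mask the content of the information to be embedded that would occlude the foreground;
    overlaying the information to be embedded, with the template applied, on the embedding region in the frame to be detected, so that the foreground is highlighted relative to the information to be embedded.
  2. The method according to claim 1, characterized in that the building a model that conforms to the pixel distribution characteristics of an embedding region in a reference frame comprises:
    initializing, for each pixel of the embedding region in the reference frame, at least one sub-model corresponding to the pixel and the weight corresponding to the at least one sub-model;
    mixing the sub-models built for each pixel based on the initialized weights, to form the model corresponding to the pixel.
  3. The method according to claim 1, characterized in that the controlling the update of the model based on frames to be detected that follow the reference frame comprises:
    in response to the embedding region in the frame to be detected being occluded by the foreground, reducing the rate at which the model fits the embedding region in the frame to be detected;
    in response to the embedding region in the frame to be detected not being occluded by the foreground and the illumination condition of the embedding region in the frame to be detected having changed, increasing the rate at which the model fits the embedding region in the frame to be detected.
  4. The method according to claim 1, characterized in that the controlling the update of the model based on frames to be detected that follow the reference frame comprises:
    in response to a pixel of the embedding region in the frame to be detected matching at least one sub-model in the corresponding model, updating the parameters of the matched sub-model and keeping the parameters of the unmatched sub-models in the corresponding model unchanged.
  5. The method according to claim 1, characterized in that the identifying, based on the model, the background and foreground of the embedding region in the frame to be detected comprises:
    matching the color value of each pixel of the embedding region in the frame to be detected against the sub-models in the model corresponding to the pixel;
    identifying pixels that match successfully as pixels of the background, and pixels that fail to match as pixels of the foreground.
  6. The method according to claim 1, characterized in that the generating a template for occluding the background and revealing the foreground comprises:
    for the pixels of the embedding region in the frame to be detected that are identified as background, filling binary one into the corresponding positions of the empty template, and
    for the pixels of the embedding region in the frame to be detected that are identified as foreground, filling binary zero into the corresponding positions of the template filled with binary ones.
  7. The method according to claim 1, characterized in that the applying the template to information to be embedded comprises:
    multiplying the information to be embedded by the binary number filled in each position of the template.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    in response to the video being formed with a moving shot, matching features extracted from the embedding region in the reference frame of the video against features extracted from the frame to be detected;
    in response to a successful match, determining that the frame to be detected includes an embedding region corresponding to the embedding region in the reference frame.
  9. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    in response to the video being formed with a moving shot,
    before controlling the update of the model based on the frames to be detected that follow the reference frame, transforming the embedding region in the frame to be detected so that the position of each pixel in the transformed embedding region is consistent with the position of the corresponding pixel of the embedding region in the reference frame;
    before applying the template to the information to be embedded, applying the inverse of the transformation to the template so that the position of each binary number in the transformed template is consistent with the position of the corresponding pixel of the embedding region in the frame to be detected.
  10. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    in response to the video being formed with a static shot, locating, based on the position of the embedding region in the reference frame, the region at the corresponding position in the frame to be detected, to determine the embedding region in the frame to be detected.
  11. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    in response to the first color space distribution of the embedding region in the frame to be detected and the first color space distribution of the embedding region in the reference frame satisfying a first difference condition, determining that the embedding region in the frame to be detected is occluded by the foreground;
    in response to the second color space distribution of the embedding region in the frame to be detected and the second color space distribution of the embedding region in the reference frame satisfying a second difference condition, determining that the illumination condition of the embedding region in the frame to be detected has changed.
  12. A computer device, characterized by comprising:
    a memory, configured to store executable instructions;
    a processor, configured to implement the method according to any one of claims 1 to 11 when executing the executable instructions stored in the memory.
  13. A storage medium, characterized in that the storage medium stores executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 11.
  14. A computer program product, characterized in that the computer program product stores a computer program which, when loaded and executed by a processor, implements the method according to any one of claims 1 to 11.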
PCT/CN2020/085939 2019-05-09 2020-04-21 Method for implanting information into video, computer device and storage medium Ceased WO2020224428A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20802358.0A EP3968627B1 (en) 2019-05-09 2020-04-21 Method for implanting information into video, computer device and storage medium
JP2021532214A JP7146091B2 (ja) 2019-05-09 2020-04-21 Method for embedding information in video, computer device and computer program
US17/394,579 US11785174B2 (en) 2019-05-09 2021-08-05 Method for implanting information into video, computer device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910385878.4 2019-05-09
CN201910385878.4A CN110121034B (zh) 2019-05-09 2019-05-09 Method, apparatus, device and storage medium for implanting information into video

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/394,579 Continuation US11785174B2 (en) 2019-05-09 2021-08-05 Method for implanting information into video, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020224428A1 true WO2020224428A1 (zh) 2020-11-12

Family

ID=67522038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085939 Ceased WO2020224428A1 (zh) 2019-05-09 2020-04-21 Method for implanting information into video, computer device and storage medium

Country Status (5)

Country Link
US (1) US11785174B2 (zh)
EP (1) EP3968627B1 (zh)
JP (1) JP7146091B2 (zh)
CN (1) CN110121034B (zh)
WO (1) WO2020224428A1 (zh)

Also Published As

Publication number Publication date
US11785174B2 (en) 2023-10-10
JP2022531639A (ja) 2022-07-08
CN110121034B (zh) 2021-09-07
CN110121034A (zh) 2019-08-13
EP3968627A1 (en) 2022-03-16
US20210368112A1 (en) 2021-11-25
EP3968627A4 (en) 2022-06-29
EP3968627B1 (en) 2025-10-29
JP7146091B2 (ja) 2022-10-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20802358

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021532214

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020802358

Country of ref document: EP

Effective date: 20211209

WWG Wipo information: grant in national office

Ref document number: 2020802358

Country of ref document: EP