WO2020038243A1 - Video summary generation method, device, computing device, and storage medium - Google Patents
- Publication number
- WO2020038243A1 (PCT/CN2019/100051)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- target
- video
- target tracking
- tracking sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/785—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/47—Detecting features for summarising video content
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8549—Creating video summaries, e.g. movie trailer
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- The present application relates to the field of communication technologies, and in particular to a method, a device, a computing device, and a storage medium for generating a video summary.
- Video summarization is a technology that condenses the main content of an original video.
- By analyzing video content, video summarization reduces the cost of video storage, classification, and indexing, and improves the efficiency, availability, and accessibility of video. It is a development of content-based video analysis technology.
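- An embodiment of the present application provides a method for generating a video summary, and the method includes:
- obtaining a target screening condition for generating a video summary;
- searching for structured image data in a video database according to the target screening condition, to obtain structured image data that meets the target screening condition, the structured image data being image data stored in a structured manner;
- performing video synthesis on the structured image data that meets the target screening condition to generate a video summary.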
- An embodiment of the present application provides a device for generating a video summary.
- The device includes:
- a first obtaining unit configured to obtain a target screening condition for generating a video summary;
- a searching unit configured to search for structured image data in a video database according to the target screening condition, to obtain structured image data that meets the target screening condition, the structured image data being image data stored in a structured manner;
- a generating unit configured to perform video synthesis on the structured image data that meets the target screening condition to generate a video summary.
- An embodiment of the present application further provides a computing device, including a processor and a memory for storing processor-executable instructions, where the processor is configured to execute the video summary generating method according to the embodiments of the present application.
- An embodiment of the present application further provides a storage medium that stores a plurality of instructions, the instructions being suitable for loading by a processor to execute the video summary generating method described in the embodiments of the present application.
- FIG. 1 is a schematic diagram of an embodiment of a video digest generating system according to an embodiment of the present application
- FIG. 2A is a schematic diagram of an embodiment of a video abstract generating method according to an embodiment of the present application.
- FIG. 2B is a specific flowchart of step S102 according to an embodiment of the present application.
- FIG. 2C is a specific flowchart of the steps of processing a video to be processed and storing the processing result in the video database according to an embodiment of the present application.
- FIG. 2D is a specific flowchart of step S121 according to an embodiment of the present application.
- FIG. 2E is a specific flowchart of step S213 according to an embodiment of the present application.
- FIG. 2F is a specific flowchart of step S312 according to an embodiment of the present application.
- FIG. 2G is a specific flowchart of step S413 according to an embodiment of the present application.
- FIG. 2H is a specific flowchart of step S214 according to an embodiment of the present application.
- FIG. 2I is a specific flowchart of step S123 according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of an embodiment of synthesizing an effective image of each frame image by using a single-channel grayscale image of each frame image and a local feature map in each frame image according to an embodiment of the present application;
- FIG. 4 is a schematic diagram of another embodiment of a video digest generating method according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of an embodiment of an application scenario for generating a video digest according to an embodiment of the present application
- FIG. 6A is a schematic diagram of an embodiment of a device for generating a video digest according to an embodiment of the present application
- FIG. 6B is a schematic structural diagram of a search unit in an embodiment of the present application.
- FIG. 6C is a schematic structural diagram of an attribute analysis unit in an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of a server device according to an embodiment of the present application.
- The computer-executed steps and operations referred to herein include operations performed by a computer processing unit on electronic signals that represent data in a structured form. Such an operation transforms the data or maintains it at a location in the computer's memory system, which can reconfigure or otherwise alter the operation of the computer in a manner well known to those skilled in the art.
- The data structures in which the data is maintained are physical locations of the memory that have specific characteristics defined by the data format.
- Those skilled in the art will understand that the various steps and operations described below can also be implemented in hardware.
- A module can be considered a software object executing on the computing system.
- The different components, modules, engines, and services described herein can be considered implementation objects on the computing system.
- The devices and methods described herein can be implemented in software and, of course, can also be implemented in hardware, both of which fall within the protection scope of this application.
- Video summarization is a technique that condenses the main content of an original video. As the requirements for video data processing and the volume of video data continue to grow, people need summaries of long videos that can be browsed quickly in order to make better use of them. With video summarization, content-based video retrieval can make full use of audio and video information rather than text alone. The main role of a video summary is to facilitate storing, viewing, and searching video. A video summary is much shorter than the original video material, which saves storage time and space, yet it retains the main points of the original content. That is, browsing or searching a video summary takes less time than browsing the original video.
- The processing of video content is relatively simple, and the video data is not structured.
- As a result, rapid screening and retrieval of video content cannot be achieved, which limits the usage and application scenarios.
- Embodiments of the present application provide a method, a device, a computing device, and a storage medium for generating a video abstract.
- FIG. 1 is a schematic diagram of a video digest generating system provided by an embodiment of the present application.
- The video summary generating system includes a server device 101.
- The server device 101 may be a single server, a server cluster consisting of several servers, or a cloud computing service center.
- the video abstract generating system has a function of generating a video abstract.
- the video abstract generating system may include a video abstract generating device 102.
- the video abstract generating device 102 may be specifically integrated in the server device 101.
- The server device 101 is the server device in FIG. 1. It is mainly used to obtain a target screening condition for generating a video summary; to search for structured image data in a video database according to the target screening condition, obtaining structured image data that meets the target screening condition, where the structured image data is image data stored in a structured manner; and to perform video synthesis on the structured image data that meets the target screening condition to generate a video summary.
- The video summary generating system may further include one or more first terminal devices 103. A first terminal device 103 may serve as an image acquisition device, such as a camera, or a personal computer (PC), notebook computer, smartphone, PAD, or tablet computer equipped with a camera, which can capture images and convert them into a computer-readable form such as video. Only one first terminal device 103 is shown in FIG. 1; in actual applications, one or more first terminal devices 103 may be deployed as required.
- The video summary generating system may further include a memory 104 for storing a video database, and the video database stores video data.
- The video data may be video data captured by one or more first terminal devices 103, such as surveillance video data captured by one or more surveillance cameras, or may be other film and television video data.
- The video data includes structured image data in units of target tracking sequences, which users can use to retrieve video content and generate video summaries.
- The video summary generating system may further include a second terminal device 105 for displaying the video summary that it receives from the server device 101. The second terminal device 105 may be a smart terminal device such as a personal computer (PC) or notebook computer, or a smart mobile terminal device such as a smartphone, PAD, or tablet computer.
- The scene schematic diagram of the video summary generating system shown in FIG. 1 is only an example.
- The video summary generating system and the scene described in the embodiments of the present application are intended to illustrate the technical solutions of the embodiments more clearly and do not constitute a limitation on them.
- Those of ordinary skill in the art will appreciate that, as video summary generating systems evolve and new business scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
- The present application provides a video summary generating method, which includes: obtaining a target screening condition for generating a video summary; searching for structured image data in a video database according to the target screening condition, to obtain structured image data that meets the target screening condition, the structured image data being image data stored in a structured manner; and performing video synthesis on the structured image data that meets the target screening condition to generate a video summary.
- The video summary generating method is executed by a server device and includes the following steps:
- A screening condition may be selected from preset screening options to obtain the target screening condition.
- The screening options may be set according to actual application requirements, for example: color options (such as red, black, or no restriction), object category options (such as people or vehicles and, more specifically, men or women, cars or bicycles), and target trajectory direction options (such as a trajectory direction from south to north).
- If the user makes no selection, the target screening condition is the default screening condition. For example, if the user does not select anything in the screening options and the return value is empty, all options are selected by default.
- The target screening condition includes the keywords selected in the screening options, and the keywords may include one or more target keywords.
- For example, keywords such as red, man, and car mean that image data with the target attribute characteristics red, man, and car needs to be found in the video database.
- The target screening condition may also include other settings for generating the video summary, which can be set according to actual application requirements.
- For example, the target screening condition may include a target synthesis density, which indicates the number of targets in each frame of the generated video summary.
- Three levels of target synthesis density (high, medium, and low) may be set, and each level corresponds to one target synthesis density.
- When the target synthesis density is at the low level, each frame of the generated video summary contains 3 targets; at the medium level, 6 targets; and at the high level, 9 targets.
- The target screening condition may also include a screening time range, such as 2018.3.1 to 2018.3.2, which may of course further specify hours, minutes, or seconds.
- When the user chooses to screen video content, the video summary generating device obtains the screening conditions selected by the user in the screening options, thereby obtaining the target screening condition.
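The screening condition described above can be sketched as a small data structure. This is only an illustration: the names `ScreeningCondition`, `build_condition`, and `DENSITY_TO_TARGETS`, and the choice of "medium" as the default density, are assumptions and do not come from the application; only the 3/6/9 targets per frame and the empty-selection default follow the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

# Targets per summary frame for each density level (values from the text).
DENSITY_TO_TARGETS = {"low": 3, "medium": 6, "high": 9}


@dataclass
class ScreeningCondition:
    """Hypothetical container for a target screening condition."""
    keywords: Tuple[str, ...] = ()   # empty tuple means "no restriction"
    density: str = "medium"          # assumed default level
    time_range: Optional[Tuple[datetime, datetime]] = None

    def targets_per_frame(self) -> int:
        return DENSITY_TO_TARGETS[self.density]


def build_condition(selected_keywords=None, density="medium", time_range=None):
    """Fall back to the default (match everything) when nothing is selected."""
    keywords = tuple(selected_keywords) if selected_keywords else ()
    if density not in DENSITY_TO_TARGETS:
        raise ValueError(f"unknown density level: {density!r}")
    return ScreeningCondition(keywords, density, time_range)


cond = build_condition(
    ["red", "man"],
    density="low",
    time_range=(datetime(2018, 3, 1), datetime(2018, 3, 2)),
)
default_cond = build_condition()  # the user selected nothing
```

Keeping the density as a named level and resolving it to a count only when needed mirrors the three-level option the text describes.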
- the structured image data is structured stored image data.
- the video database stores structured image data of video images, for example, the video database stores structured image data in units of target tracking sequences.
- FIG. 2B is a specific flowchart of step S102 according to the embodiment of the present application.
- Searching for structured image data in the video database according to the target screening condition, to obtain the structured image data that meets the target screening condition, may include:
- The video database stores structured image data in units of target tracking sequences.
- A target tracking sequence is the frame sequence that contains a tracked target, obtained by tracking that target in the video to be processed.
- the data of the target tracking sequence includes identification information of the target tracking sequence, attribute information of the target tracking sequence, and each foreground image in the target tracking sequence.
- the filtered target tracking sequence is a target tracking sequence in the video database that has the same attribute information as the keywords in the target filtering condition.
- S124 Obtain the structured image data of the filtered target tracking sequence from the video database to obtain structured image data that meets the target screening conditions.
- Structured image data in units of target tracking sequences is stored in the video database.
- Structured data, also called row data, is logically expressed and implemented as a two-dimensional table structure. It follows data format and length specifications and is mainly stored and managed in relational databases.
- In the embodiments of the present application, image data is stored in a structured data format, that is, as structured image data.
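As a rough illustration of storing and searching target tracking sequences in a relational table, the sketch below uses Python's built-in `sqlite3` module. The table names, column names, and sample rows are hypothetical, and giving each sequence a single color is a simplification; the application does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per target tracking sequence: identification plus attribute information.
conn.execute("""
    CREATE TABLE tracking_sequence (
        sequence_id TEXT PRIMARY KEY,
        color       TEXT,
        category    TEXT,
        direction   TEXT
    )""")
# One row per foreground image belonging to a sequence.
conn.execute("""
    CREATE TABLE foreground_image (
        sequence_id TEXT REFERENCES tracking_sequence(sequence_id),
        frame_index INTEGER,
        image_path  TEXT
    )""")
conn.executemany(
    "INSERT INTO tracking_sequence VALUES (?, ?, ?, ?)",
    [("seq-1", "red", "man", "south_to_north"),
     ("seq-2", "black", "car", "north_to_south"),
     ("seq-3", "red", "car", "south_to_north")])
conn.executemany(
    "INSERT INTO foreground_image VALUES (?, ?, ?)",
    [("seq-1", 0, "fg/seq-1/0.png"),
     ("seq-3", 0, "fg/seq-3/0.png"),
     ("seq-3", 1, "fg/seq-3/1.png")])

ALLOWED_COLUMNS = {"color", "category", "direction"}


def find_sequences(conn, **keywords):
    """Return ids of sequences whose attributes equal every given keyword."""
    unknown = set(keywords) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unsupported attributes: {sorted(unknown)}")
    # No keywords means the default condition: match every sequence.
    where = " AND ".join(f"{col} = ?" for col in keywords) or "1"
    rows = conn.execute(
        f"SELECT sequence_id FROM tracking_sequence WHERE {where} "
        "ORDER BY sequence_id", tuple(keywords.values()))
    return [row[0] for row in rows]


def foreground_images(conn, sequence_id):
    """Return the stored foreground image paths of one sequence, in frame order."""
    rows = conn.execute(
        "SELECT image_path FROM foreground_image WHERE sequence_id = ? "
        "ORDER BY frame_index", (sequence_id,))
    return [row[0] for row in rows]


matches = find_sequences(conn, color="red", category="car")
```

Column names are checked against a fixed whitelist before being placed in the SQL text, while the keyword values are always passed as bound parameters.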
- A video database is set up separately.
- The video data stored in the video database in the embodiments of the present application may be surveillance video data captured by one or more surveillance cameras. That is, the video database may be a surveillance video database, such as a community, intersection, garage, or mall surveillance video database.
- The video database may also be another film and television video database. The video database in the embodiments of the present application may therefore be any video database on which video content retrieval needs to be performed, which is not specifically limited herein.
- The method may further include a step of processing the video to be processed and storing the processing result in the video database.
- FIG. 2C is a flowchart of processing the video to be processed and storing the processing results in the video database. As shown in FIG. 2C, the embodiment of the present application may further include:
- S121 Perform attribute analysis on the video to be processed to determine a target tracking sequence from the video to be processed, and obtain structured image data of each target tracking sequence in the video to be processed;
- The video to be processed may be stored in the video database. If the video database is a surveillance video database, the video to be processed may be the video data newly added within a certain period, for example within a day, an hour, or half a day, which can be set according to actual scene requirements.
- FIG. 2D shows a specific flowchart of step S121. As shown in FIG. 2D, it may specifically include:
- S212 Perform foreground extraction on the image to be processed to obtain a foreground image of each frame of the image to be processed;
- S213 Perform attribute analysis using the foreground image of the image to be processed to obtain the attribute analysis result of each target tracking sequence in the image to be processed;
- The target box in the current frame of the image to be processed is compared with the target box in the previous frame, and the target with the strongest feature response (that is, the target whose image in the target box best matches the image in the target box of the previous frame) is used as the tracking target and is tracked.
- When the target has been tracked to completion or the preset number of tracking frames (for example, 1000 frames) is reached, tracking of the target is complete.
- The frame sequence containing the tracking target in the image to be processed is the target tracking sequence.
- an attribute analysis result is obtained, and the attribute analysis result may include attribute information of each target tracking sequence.
- Obtaining the images to be processed from the video to be processed may mean obtaining every frame of the video. Because surveillance video may contain footage that changes little for a long time, the images to be processed may instead be the key frame images of the video, which improves subsequent processing efficiency.
- In that case, obtaining the images to be processed may specifically include: performing key frame detection on the video to be processed to obtain its key frames, and using the key frames as the images to be processed.
- An existing key frame extraction algorithm can be used for key frame detection. With key frame detection, when the video to be processed contains many repeated images that change little, only one key frame is selected from them, or none at all (for example, when a surveillance image contains no target).
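The application leaves the key frame extraction algorithm open. One simple possibility, shown here purely as an assumption, is to keep a frame only when it differs enough from the last kept frame. The function name `select_key_frames` and the threshold value are illustrative, and this sketch does not cover the case of dropping frames that contain no target.

```python
import numpy as np


def select_key_frames(frames, threshold=10.0):
    """Keep a frame only if it differs enough from the last kept frame.

    frames: iterable of equally shaped uint8 arrays.
    threshold: assumed mean absolute pixel difference that counts as a change.
    """
    key_indices = []
    last_kept = None
    for index, frame in enumerate(frames):
        current = frame.astype(np.float32)
        if last_kept is None or np.mean(np.abs(current - last_kept)) > threshold:
            key_indices.append(index)
            last_kept = current
    return key_indices


static = np.zeros((4, 4), dtype=np.uint8)
changed = np.full((4, 4), 200, dtype=np.uint8)
# Three unchanged frames collapse into one key frame; the change starts a new one.
key_frames = select_key_frames([static, static, static, changed, changed])
```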
- Background modeling may be performed in advance so that foreground extraction can be performed on the image to be processed to obtain a foreground image of each frame.
- In this way, foreground extraction can be implemented quickly to obtain the foreground image of each frame in the image to be processed.
- The process may specifically include: converting each frame of the image to be processed into a single-channel grayscale image; extracting a local feature map of a preset type from each frame; and determining the foreground image of each frame based on its single-channel grayscale image and its local feature map.
- Determining the foreground image of each frame may include: synthesizing an effective image of each frame from its single-channel grayscale image and its local feature map; and matching the effective image of each frame against a preset Gaussian mixture model to obtain the foreground image of each frame.
- The input of the Gaussian mixture model is a multi-channel image (d), whose channels correspond to different data sources (b) and (c).
- A video frame (a) in the image to be processed is usually a color image, that is, an RGB (Red, Green, Blue) three-channel image. Colors are obtained by mixing red, green, and blue in different proportions, so a color image contains three monochrome images representing the red, green, and blue channels respectively.
- The RGB three-channel image (a) of each frame is compressed into a single-channel grayscale image (b), which serves as one channel of the multi-channel image (d) input to the Gaussian mixture model. Local feature maps (c), such as texture and shape features, extracted from the RGB three-channel image (a) of each frame serve as the other channels of the multi-channel image (d).
- The Gaussian mixture model separates foreground information from background information in the effective image. At the same time, it gradually updates itself so that the background information it maintains stays consistent with the latest background.
- Specifically, the Gaussian mixture model is updated each time it has been used to obtain a frame of foreground image (at which point the corresponding background image is also determined). Separating an image into foreground and background and then updating the Gaussian mixture model are common technical means in this field, so the details are not repeated here.
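The following sketch illustrates the idea of stacking a grayscale channel and a local feature channel into an effective image and matching it against a self-updating background model. It makes several assumptions that are not in the application: the local feature is a plain gradient magnitude, and the background model is a single running Gaussian per pixel and channel rather than a full Gaussian mixture. All names, learning rates, and thresholds are illustrative.

```python
import numpy as np


def to_gray(rgb):
    """Compress an H x W x 3 RGB image (a) into one grayscale channel (b)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights


def gradient_magnitude(gray):
    """Stand-in local feature map (c): magnitude of the intensity gradient."""
    gy, gx = np.gradient(gray)
    return np.hypot(gx, gy)


def effective_image(rgb):
    """Stack (b) and (c) into the multi-channel effective image (d)."""
    gray = to_gray(rgb)
    return np.stack([gray, gradient_magnitude(gray)], axis=-1)


class RunningGaussianBackground:
    """One Gaussian per pixel and channel (a simplification of a mixture)."""

    def __init__(self, first_image, learning_rate=0.05, k=2.5, min_var=25.0):
        self.mean = first_image.astype(np.float32).copy()
        self.var = np.full_like(self.mean, min_var)
        self.lr, self.k, self.min_var = learning_rate, k, min_var

    def apply(self, image):
        image = image.astype(np.float32)
        diff2 = (image - self.mean) ** 2
        # Foreground where any channel lies more than k standard deviations away.
        foreground = np.any(diff2 > (self.k ** 2) * self.var, axis=-1)
        # Update only background pixels so that targets are not absorbed.
        update = ~foreground[..., None]
        new_mean = self.mean + self.lr * (image - self.mean)
        new_var = np.maximum(self.var + self.lr * (diff2 - self.var), self.min_var)
        self.mean = np.where(update, new_mean, self.mean)
        self.var = np.where(update, new_var, self.var)
        return foreground


background = np.full((8, 8, 3), 50, dtype=np.uint8)
frame = background.copy()
frame[2:5, 2:5] = 255  # a bright square standing in for a moving target
model = RunningGaussianBackground(effective_image(background))
mask = model.apply(effective_image(frame))
```

Because the gradient channel also responds at the edges of the square, the mask in this toy example includes a thin border around the target in addition to the target itself.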
- The obtained foreground image can be processed further. For example, dilating the foreground segmentation image, extracting its contours, and refilling them reduces holes and defects in the foreground image, so that the extracted foreground images are of better quality.
- The background of each frame image can be saved accordingly for subsequent use.
- When the video to be processed is a surveillance video, the surveillance camera is generally fixed and so is its shooting angle.
- The background of images shot this way is relatively fixed; for example, if the camera always faces the same intersection, the background of the captured video is always that intersection.
- The background image can therefore be saved at a preset time interval, or in proportion to a preset number of foreground frames (that is, each background image corresponds to a preset number of foreground frames in each target tracking sequence).
- For example, a background image is saved every 30 minutes of the video to be processed, and the foreground images of the target tracking sequences in that period correspond to that background image; or one background frame is saved for every 1000 foreground frames, so that one background frame corresponds to 1000 foreground frames. In this way, a corresponding background image can be saved for each target tracking sequence.
- step S213 may include:
- S311 Perform target tracking on the foreground image of the image to be processed to obtain a target tracking sequence in the image to be processed;
- S312 Perform attribute analysis on each target tracking sequence to obtain an attribute analysis result of each target tracking sequence, and the attribute analysis result includes attribute information of each target tracking sequence.
- the foreground image of the image to be processed is used for attribute analysis to obtain the attribute analysis results of each target tracking sequence.
- For target tracking, a preset target tracking algorithm can be used, such as the KCF (High-Speed Tracking with Kernelized Correlation Filters) algorithm or an improved version of the KCF algorithm.
- The improved KCF algorithm uses target detection (here, the foreground extraction based on background modeling in this embodiment may be used, or another detection technique may be substituted) to detect target boxes globally in the current frame. The boxes detected globally, together with the boxes detected locally by the KCF algorithm, are compared with the target box of the previous frame, and the one with the strongest feature response (that is, the box whose image best matches the target box of the previous frame) is selected as the tracking target.
- The target is then tracked to obtain the target tracking sequence in the image to be processed. In the embodiments of the present application, tracking of a target is complete when the target has been tracked to completion or a preset number of tracking frames (for example, 1000 frames) is reached.
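The selection and stopping logic described above can be sketched as follows. This is not KCF: box overlap (intersection over union) is used as a stand-in for the correlation filter's feature response, and the "target lost" threshold `min_score` is an assumption. Only the idea of picking the best-matching candidate box and stopping at a preset frame count (1000 in the text) comes from the application.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(initial_box, candidates_per_frame, max_frames=1000, min_score=0.1):
    """Follow one target and return its box in each tracked frame.

    candidates_per_frame: for each later frame, the candidate boxes
    (global detections plus the local tracker's proposal).
    """
    sequence = [initial_box]
    for candidates in candidates_per_frame:
        if len(sequence) >= max_frames:
            break  # preset number of tracking frames reached
        best = max(candidates, key=lambda box: iou(box, sequence[-1]), default=None)
        if best is None or iou(best, sequence[-1]) < min_score:
            break  # assumed stopping rule: no candidate matches the target
        sequence.append(best)
    return sequence


frames = [
    [(1, 0, 11, 10), (50, 50, 60, 60)],
    [(2, 0, 12, 10), (50, 50, 60, 60)],
    [(80, 80, 90, 90)],  # the target is no longer visible
]
sequence = track((0, 0, 10, 10), frames)
```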
- The attribute analysis result may include the attribute information of each target tracking sequence, and this attribute information may be extracted according to actual needs.
- For example, the attribute information of a target tracking sequence may include its color attribute information, object category attribute information, or target trajectory direction attribute information.
- step S312 includes:
- S412 Count the number of pixels corresponding to each color in each foreground image according to the colors corresponding to the pixels of each foreground image;
- S413 Determine color attribute information of each target tracking sequence according to the number of pixels corresponding to each color in each foreground image.
- The preset mapping relationship between pixel values and colors may be the RGB (Red, Green, Blue) color model, the HSV (Hue, Saturation, Value) color model, the YUV color model, or the CMYK (Cyan, Magenta, Yellow, Black) color model, among others. Because HSV is a relatively intuitive color model that is widely used in image processing, the HSV color model can be adopted as the preset mapping. After the color corresponding to each pixel of each foreground image in each target tracking sequence has been determined according to the preset mapping, the number of pixels of each color in each foreground image can be counted. For example, foreground image A may be determined to contain 30 red pixels and 40 black pixels.
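A minimal sketch of the pixel-to-color mapping and counting step, using the standard-library `colorsys` module for the HSV conversion. The hue, saturation, and value thresholds and the coarse color names are assumptions chosen for illustration; the application does not specify them.

```python
import colorsys
from collections import Counter


def color_name(r, g, b):
    """Map an 8-bit RGB pixel to a coarse color name via HSV (assumed thresholds)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if v < 0.2:
        return "black"
    if s < 0.2:
        return "white" if v > 0.8 else "gray"
    hue = h * 360.0
    if hue < 30 or hue >= 330:
        return "red"
    if hue < 90:
        return "yellow"
    if hue < 150:
        return "green"
    if hue < 270:
        return "blue"
    return "other"


def count_colors(pixels):
    """Count how many pixels of a foreground image fall under each color name."""
    return Counter(color_name(*pixel) for pixel in pixels)


# Foreground image A from the text: 30 red and 40 black pixels (plus 5 white).
foreground_a = [(255, 0, 0)] * 30 + [(0, 0, 0)] * 40 + [(255, 255, 255)] * 5
counts = count_colors(foreground_a)
```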
- FIG. 2G shows a specific flowchart of step S413.
- S413 may include:
- the first preset ratio and the second preset ratio can be set according to actual needs. For example, the first preset ratio is 30% and the second preset ratio is 90%.
- For example, suppose the target foreground image includes 100 pixels, of which 30 are red, 50 are black, 10 are white, and 10 are other colors. With a first preset ratio of 30%, red and black are the attribute colors of the target foreground image. After the attribute colors of all foreground images in the target tracking sequence have been determined, if 90% of the foreground images in the target tracking sequence have the red attribute, the target tracking sequence is determined to have the red attribute.
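The two-threshold rule can be written down directly. The sketch below assumes that "reaches" means greater than or equal to; the function names are hypothetical, and the default ratios are the example values from the text (30% and 90%).

```python
def image_attribute_colors(color_counts, first_ratio=0.30):
    """Return the colors whose pixel share reaches the first preset ratio."""
    total = sum(color_counts.values())
    if total == 0:
        return set()
    return {color for color, n in color_counts.items() if n / total >= first_ratio}

def sequence_has_color(per_image_counts, color, first_ratio=0.30, second_ratio=0.90):
    """A sequence has a color attribute if enough of its foreground images have it."""
    if not per_image_counts:
        return False
    hits = sum(
        1 for counts in per_image_counts
        if color in image_attribute_colors(counts, first_ratio)
    )
    return hits / len(per_image_counts) >= second_ratio

# The 100-pixel example from the text.
target_image = {"red": 30, "black": 50, "white": 10, "other": 10}
attribute_colors = image_attribute_colors(target_image)

# A sequence of 10 foreground images, 9 of which have the red attribute.
other_image = {"black": 90, "white": 10}
sequence = [target_image] * 9 + [other_image]
is_red_sequence = sequence_has_color(sequence, "red")
```

Because one foreground image can have several attribute colors (red and black here), a tracking sequence can also end up with more than one color attribute.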
- FIG. 2F illustrates a specific flowchart of step S312.
- step S312 may specifically include:
- The object classification neural network model may be an existing object classification neural network model, or it may be obtained by training an object classification neural network on a preset number of foreground images and the corresponding object category information in those foreground images. This technology is currently relatively mature and is not described in detail here.
- The object category attribute information of each foreground image can be obtained, such as person (which can be further subdivided into man or woman, adult or child, and so on), bus, car, or non-motor vehicle.
- The categories corresponding to the object category attribute information can be set according to actual needs. For example, people can be divided into men or women, or into adults or children, and a target may carry both kinds of classification at once, for example man and adult. This is not specifically limited here.
- the attribute analysis results can also include attribute information of other types of target tracking sequences, which can be set according to actual needs.
- The attribute analysis results may also include the target trajectory direction attribute information of each target tracking sequence.
- In that case, performing attribute analysis on each target tracking sequence to obtain the attribute analysis result of each target tracking sequence may further include: determining the trajectory direction angle of the target according to the position information of the target in each foreground image of the target tracking sequence, for example determining that the target moves from south to north.
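One way to picture the trajectory direction angle is to take the displacement between the first and last target positions in a sequence. The sketch below is an assumption-laden illustration: the source does not say how the angle is computed, the function name is hypothetical, and the mapping of image axes to compass directions (image top is north, image right is east) is assumed purely for the example.

```python
import math

# Compass-style labels for eight 45-degree sectors, starting at a heading of due east.
DIRECTION_LABELS = [
    "west to east", "southwest to northeast", "south to north",
    "southeast to northwest", "east to west", "northeast to southwest",
    "north to south", "northwest to southeast",
]

def trajectory_direction(positions):
    """Return (angle_in_degrees, label) for a time-ordered list of (x, y) centers.

    Assumes image coordinates (y grows downward) with the top of the image
    facing north. Returns None if the target did not move.
    """
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    dx = x1 - x0
    dy = y0 - y1                      # flip y so that "up" is positive
    if dx == 0 and dy == 0:
        return None
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    sector = int((angle + 22.5) // 45) % 8
    return angle, DIRECTION_LABELS[sector]

# A target whose center moves straight up the image.
angle, label = trajectory_direction([(5, 100), (5, 60), (5, 10)])
```

Using only the endpoints ignores curved paths; averaging per-frame displacements would be an equally plausible choice.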
- FIG. 2H shows a specific flowchart of step S214 after the attribute analysis result is obtained.
- the process may specifically include:
- S612 Obtain data of each target tracking sequence according to identification information of each target tracking sequence in the image to be processed, attribute information of each target tracking sequence, and each foreground image in each target tracking sequence;
- S613 Save the data of each target tracking sequence to a preset structured target attribute data structure, and obtain structured image data of each target tracking sequence.
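The record assembled in S612 and S613 can be pictured as a small typed structure. This is only a sketch: the class and field names are hypothetical, and the actual layout of the preset structured target attribute data structure is not reproduced here. The three groups of data (identification information, attribute information, and foreground images) are the ones named in S612.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime

@dataclass
class ForegroundFrame:
    """One foreground image of a tracking sequence (hypothetical layout)."""
    timestamp: datetime      # time information of this foreground frame
    image_ref: str           # assumed: a path or key pointing at the stored image

@dataclass
class TrackingSequenceRecord:
    """Structured data of one target tracking sequence (hypothetical layout)."""
    sequence_id: str                          # identification information
    attributes: dict                          # e.g. color, category, direction
    foregrounds: list = field(default_factory=list)

def build_record(sequence_id, attributes, foregrounds):
    """S612: gather the data of one tracking sequence into a single record."""
    return TrackingSequenceRecord(sequence_id, dict(attributes), list(foregrounds))

record = build_record(
    "1",
    {"color": "red", "category": "man", "direction": "south to north"},
    [ForegroundFrame(datetime(2018, 3, 1, 9, 0, 0), "seq1/frame0001.png")],
)
row = asdict(record)   # S613: a plain dict that a storage interface could persist
```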
- The identification information may be allocated according to a preset rule, for example as a number or a letter that serves as the identity (ID) of the target tracking sequence: in "target tracking sequence 1" or "target tracking sequence A", 1 or A is the ID of the target tracking sequence.
- the identification number of the target tracking sequence may also be based on the time corresponding to the target tracking sequence.
- Saving the data of each target tracking sequence to a preset structured target attribute data structure to obtain the structured image data of each target tracking sequence may include: calling a preset structured data storage interface to save the data of each target tracking sequence into the preset structured target attribute data structure.
- An example of the target attribute data structure is shown in Table 1 below:
- After the data of each target tracking sequence is saved to the preset structured target attribute data structure, the resulting structured image data of each target tracking sequence may further include the background image corresponding to that target tracking sequence, and each foreground frame in the target tracking sequence carries its own time information.
- the structured image data of each target tracking sequence can be stored in a video database.
- the structured storage of the video can be realized according to the above-mentioned operation steps, and the corresponding video content can be filtered in the video database according to the filtering conditions.
- FIG. 2I illustrates a specific flowchart of step S123.
- the step of determining the filtered target tracking sequence in the video database according to the target screening conditions may include:
- the target tracking sequence that is screened out is a target tracking sequence in the video database that has the same attribute information as the keywords in the target screening conditions.
- For example, if the target filtering conditions include the keywords "red" and "man" and the filtering time range is 2018.3.1 to 2018.3.2, the video database is searched, within the period from 2018.3.1 to 2018.3.2, for target tracking sequences that have the "red" and "man" attribute information; these are the screened-out target tracking sequences.
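A keyword and time-range filter of this kind can be sketched over in-memory records. The record layout (`id`, `attributes`, `start`, `end`) and the function name are hypothetical, and treating the end date as inclusive up to the last second of 2018.3.2 is an assumption; a real system would run the equivalent query against the video database.

```python
from datetime import datetime

def filter_sequences(records, keywords, start, end):
    """Return the IDs of tracking sequences that carry every keyword and
    overlap the requested time range."""
    wanted = set(keywords)
    selected = []
    for record in records:
        has_all_keywords = wanted <= record["attributes"]
        overlaps_range = record["start"] <= end and record["end"] >= start
        if has_all_keywords and overlaps_range:
            selected.append(record["id"])
    return selected

records = [
    {"id": "1", "attributes": {"red", "man"},
     "start": datetime(2018, 3, 1, 9, 0), "end": datetime(2018, 3, 1, 9, 1)},
    {"id": "2", "attributes": {"black", "man"},
     "start": datetime(2018, 3, 1, 10, 0), "end": datetime(2018, 3, 1, 10, 2)},
    {"id": "3", "attributes": {"red", "man"},
     "start": datetime(2018, 3, 5, 8, 0), "end": datetime(2018, 3, 5, 8, 1)},
]

hits = filter_sequences(
    records, ["red", "man"],
    datetime(2018, 3, 1), datetime(2018, 3, 2, 23, 59, 59),
)
```

Sequence 2 is rejected on the color keyword and sequence 3 on the time range, so only sequence 1 is screened out.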
- Once the target tracking sequences have been screened out, their structured image data can be retrieved, which yields the structured image data that meets the target screening conditions.
- each background image in the multiple background images may correspond to a foreground image with a preset number of frames in each target tracking sequence;
- Performing video synthesis on the structured image data that meets the target screening conditions, and generating a video summary may specifically include:
- the target composition density refers to the number of targets in each frame of the generated video summary. For example, if the target composition density is 3, then the generated video summary includes 3 targets in each frame of the image.
- For example, if the to-be-processed video contains 30 target tracking sequences and the target composition density is 3, the sequences are allocated to 3 synthesis queues, that is, each synthesis queue is allocated 10 target tracking sequences.
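The allocation of tracking sequences to synthesis queues can be sketched as below. The source only states the outcome (30 sequences at density 3 give 10 sequences per queue); the round-robin order used here is an assumption, and the function name is hypothetical.

```python
def allocate_queues(sequence_ids, density):
    """Distribute tracking sequences over `density` synthesis queues.

    Round-robin assignment is an illustrative choice; any scheme that
    balances the queues would match the description.
    """
    if density <= 0:
        raise ValueError("target composition density must be positive")
    queues = [[] for _ in range(density)]
    for index, sequence_id in enumerate(sequence_ids):
        queues[index % density].append(sequence_id)
    return queues

# 30 tracking sequences, target composition density 3.
queues = allocate_queues(list(range(1, 31)), 3)
queue_sizes = [len(q) for q in queues]
```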
- Every m foreground images in each synthesis queue correspond to one of the multiple background images, where m is the number of foreground images corresponding to one background image, for example 1000. That is, for every 1000 foreground images in each synthesis queue the background stays the same, and once more than 1000 foreground images have been composited, the background image is switched.
- The specific synthesis method may be as follows: the first foreground image of each of the N synthesis queues is pasted onto the corresponding background image to form the first frame of the result video; the new first foreground image of each of the N synthesis queues is then pasted onto the corresponding background image to form the second frame of the result video, and so on.
- When the (m+1)-th composite image is synthesized, the next background image is used. In this way all composite images of the final video summary are obtained, and the video summary can be generated from them.
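The frame-by-frame compositing order can be sketched as a plan that lists, for each output frame, which background to use and which foreground images to paste. The sketch works on labels rather than pixel data and the function name is hypothetical; it assumes each queue has already been flattened into a time-ordered list of foreground images.

```python
from itertools import zip_longest

def plan_summary_frames(queues, m):
    """Return a list of (background_index, foregrounds) pairs, one per frame.

    Frame k takes the k-th foreground image from every queue; the background
    index advances by one after every m frames.
    """
    frames = []
    for k, group in enumerate(zip_longest(*queues)):
        foregrounds = [fg for fg in group if fg is not None]  # exhausted queues drop out
        frames.append((k // m, foregrounds))
    return frames

# Three synthesis queues (density 3) and a small m so the switch is visible.
queues = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2"]]
plan = plan_summary_frames(queues, m=2)
```

With m = 1000, frames 1 to 1000 share the first background and the 1001st frame is the first to use the next one, which is the switching behavior described above.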
- The method in the embodiments of the present application may further include: obtaining a new filtering condition for generating a video summary; determining newly screened-out target tracking sequences in the video database according to the new filtering condition; obtaining the structured image data of the newly screened-out target tracking sequences from the video database to obtain new structured image data; and performing video synthesis on the new structured image data to generate a new video summary.
- The embodiments of the present application obtain a target filtering condition for generating a video summary; search a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition; and perform video synthesis on the structured image data that meets the target filtering condition to generate a video summary.
- Because the structured image data of the video images is stored in the video database, when a user retrieves related video, the related video information can be quickly filtered from the structured image data to generate a video summary. This greatly facilitates the user's locating of target video content and greatly expands the application scenarios of video summaries.
- the following describes a method for generating a video digest in an embodiment of the present application with reference to a specific application scenario.
- FIG. 4 is another schematic flowchart of a video digest generating method provided by an embodiment of the present application, which is executed by a server device.
- the method flow may include:
- the server device obtains a target filtering condition for generating a video summary.
- a filtering condition may be selected from preset filtering options to obtain a target filtering condition.
- The filtering options include a color option 501, a target synthesis density option 502, and a target type option 503. In this example, the color option is set to "unlimited" (that is, the target color is not restricted), the target synthesis density option is set to the "low" level (meaning that each frame of the generated video summary contains 3 targets), and the target type option is set to "person".
- the content selected by this filtering option constitutes the target filtering condition.
- the server device determines the filtered target tracking sequence in the video database according to the target screening condition.
- The server device searches the video database for the target tracking sequences with the "unrestricted color" and "person" attribute information according to the target filtering condition, and obtains the filtered target tracking sequences. It is assumed that 30 target tracking sequences are selected.
- the server device obtains the structured image data of the filtered target tracking sequence from the video database to obtain structured image data that meets the target screening conditions.
- the server device can obtain the structured image data of the filtered target tracking sequence to obtain structured image data, where the structured image data is structured stored image data.
- the server device performs video synthesis on the structured image data that meets the target screening condition, and generates a video summary.
- The specific synthesis method may be as follows: the first foreground image of each of the three synthesis queues is pasted onto the corresponding background image to form the first frame of the result video; the new first foreground image of each of the three synthesis queues is then pasted onto the corresponding background image to form the second frame of the result video, and so on. When the 1001st composite image is synthesized, the next background image is used. Compositing in this way yields all the composite images of the final video summary, from which the video summary can be generated. As shown in FIG. 5, each frame of the video summary includes 3 target persons, and the time information corresponding to each target is marked in each frame.
- the original video may be displayed on the interface for displaying the video summary, or the original video may not be displayed, and it may be specifically set according to actual needs.
- In the embodiments of the present application, the server device obtains a target filtering condition for generating a video summary; the server device determines the filtered target tracking sequences in a video database according to the target filtering condition, the video database storing structured image data in units of target tracking sequences; the server device obtains the structured image data of the filtered target tracking sequences from the video database to obtain structured image data that meets the target screening conditions; and the server device performs video synthesis on the structured image data to generate a video summary.
- Because the structured image data of the video images is stored in the video database in units of target tracking sequences, when a user retrieves relevant video, the server device can quickly filter the relevant video information from the structured image data to generate a video summary. This greatly facilitates the user's locating of target video content and greatly expands the application scenarios of video summaries.
- the embodiment of the present application further provides a device based on the video abstract generating method described above.
- the meanings of the nouns are the same as in the video abstract generation method described above. For specific implementation details, refer to the description in the method embodiment.
- FIG. 6A is a schematic structural diagram of a video digest generating device according to an embodiment of the present application.
- the video digest generating device may include a first obtaining unit 601, a finding unit 602, and a generating unit 603, as follows:
- a first obtaining unit 601, configured to obtain a target filtering condition for generating a video summary
- a searching unit 602 configured to search for structured image data in a video database according to a target filtering condition to obtain structured image data, where the structured image data is structured stored image data;
- a generating unit 603 is configured to perform video synthesis on the structured image data that meets the target screening condition to generate a video summary.
- FIG. 6B is a schematic structural diagram of a search unit 602 in the embodiment of the present application.
- the search unit 602 includes a determination subunit 6021 and an acquisition subunit 6022, as follows:
- a determining subunit 6021 configured to determine a filtered target tracking sequence in a video database according to the target screening condition, and the video database stores structured image data in units of the target tracking sequence;
- An obtaining subunit 6022 is configured to obtain the structured image data of the filtered target tracking sequence from a video database to obtain structured image data that meets the target screening conditions.
- the device further includes a second acquisition unit 604, an attribute analysis unit 605, and a storage unit 606, as follows:
- a second acquiring unit 604 configured to acquire a video to be processed
- An attribute analysis unit 605 configured to perform attribute analysis on the video to be processed to determine a target tracking sequence from the video to be processed, and obtain structured image data of each target tracking sequence in the video to be processed;
- the storage unit 606 is configured to store structured image data of each target tracking sequence in the video to be processed in the video database.
- FIG. 6C is a schematic structural diagram of an attribute analysis unit 605 in the embodiment of the present application.
- The attribute analysis unit 605 includes an acquisition subunit 6051, a foreground extraction subunit 6052, and an attribute analysis subunit 6053, as follows:
- An acquisition subunit 6051 configured to acquire a to-be-processed image in a to-be-processed video
- a foreground extraction sub-unit 6052 configured to perform foreground extraction on an image to be processed to obtain a foreground image of each frame of the image to be processed;
- An attribute analysis subunit 6053 configured to perform attribute analysis using a foreground image of the image to be processed to obtain an attribute analysis result of each target tracking sequence in the image to be processed;
- the storage unit 606 is further configured to save the attribute analysis result into a preset structured target attribute data structure to obtain structured image data of each target tracking sequence.
- the attribute analysis subunit 6053 is specifically configured to:
- Perform target tracking on the foreground image of the image to be processed to obtain the target tracking sequences in the image to be processed;
- Attribute analysis is performed on each target tracking sequence to obtain an attribute analysis result of each target tracking sequence, and the attribute analysis result includes attribute information of each target tracking sequence.
- the attribute analysis result includes color attribute information of each target tracking sequence, and the attribute analysis subunit 6053 is specifically configured to:
- the color attribute information of each target tracking sequence is determined.
- the attribute analysis subunit 6053 is specifically configured to:
- For each target tracking sequence, if the number of pixels corresponding to a target color in a target foreground image reaches a first preset ratio of the total number of pixels in the target foreground image, determine that the target foreground image has the color attribute of the target color;
- If the foreground images having the color attribute of the target color reach a second preset ratio of the foreground images in the target tracking sequence, determine that the target tracking sequence has the color attribute of the target color.
- the attribute analysis result includes object category attribute information of each foreground image in each target tracking sequence, and the attribute analysis subunit 6053 is specifically configured to:
- object category classification is performed for each foreground image in each target tracking sequence, and object category attribute information of each target tracking sequence is obtained.
- the foreground extraction subunit 6052 is specifically configured to:
- the foreground image of each frame image in the image to be processed is determined.
- the foreground extraction subunit 6052 is specifically configured to:
- the effective image of each frame is matched with a preset mixed Gaussian model to obtain the foreground image of each frame of the image to be processed.
- the storage unit 606 is specifically configured to:
- the data of each target tracking sequence is respectively stored in a preset structured target attribute data structure to obtain the structured image data of each target tracking sequence in the image to be processed.
- the obtaining subunit 6051 is specifically configured to:
- a key frame is used as the to-be-processed image.
- the determining subunit 6021 is specifically configured to:
- a target tracking sequence having the same attribute information as the keywords is determined in the video database, and a target tracking sequence that is screened out is obtained.
- the video database further includes multiple background images corresponding to each target tracking sequence, and each background image in the multiple background images corresponds to a foreground image with a preset number of frames in each target tracking sequence;
- the generating unit 603 is specifically configured to:
- the foreground images of the corresponding target tracking sequences in the N synthesis queues are sequentially pasted onto the corresponding background images to generate a video summary.
- the above units may be implemented as independent entities, or may be arbitrarily combined, and implemented as the same or several entities.
- For the specific implementation of the above units, refer to the foregoing method embodiments; details are not described herein again.
- In the embodiments of the present application, the first obtaining unit 601 obtains a target filtering condition for generating a video summary; the search unit 602 searches the video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition, the structured image data being image data stored in a structured manner; and the generating unit 603 performs video synthesis on the structured image data that meets the target screening condition to generate a video summary.
- Because the structured image data of the video images is stored in the video database, when a user retrieves related video, the related video information can be quickly filtered from the structured image data to generate a video summary. This greatly facilitates the user's locating of target video content and greatly expands the application scenarios of video summaries.
- An embodiment of the present application further provides a computing device.
- the computing device may be a server device.
- FIG. 7 illustrates a schematic structural diagram of a server device involved in the embodiments of the present application. Specifically:
- The server device may include components such as a processor 701 with one or more processing cores, a memory 702 of one or more computer-readable storage media, a power supply 703, and an input unit 704.
- The server device structure shown in FIG. 7 does not constitute a limitation on the server device, which may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently. Specifically:
- The processor 701 is the control center of the server device. It connects the various parts of the entire server device by using various interfaces and lines, and, by running or executing the software programs and/or modules stored in the memory 702 and calling the data stored in the memory 702, performs the various functions of the server device and processes data, thereby monitoring the server device as a whole.
- The processor 701 may include one or more processing cores. The processor 701 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like.
- the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 701.
- the memory 702 may be used to store software programs and modules.
- the processor 701 executes various functional applications and data processing by running the software programs and modules stored in the memory 702.
- The memory 702 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, an application program required for at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through the use of the server device, and the like.
- the memory 702 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Accordingly, the memory 702 may further include a memory controller to provide the processor 701 access to the memory 702.
- the server device further includes a power source 703 for supplying power to various components.
- The power source 703 can be logically connected to the processor 701 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
- The power source 703 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
- the server device may further include an input unit 704, which may be used to receive inputted numeric or character information, and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
- the server device may further include a display unit and the like, and details are not described herein again.
- The processor 701 in the server device loads the executable files corresponding to the processes of one or more application programs into the memory 702 according to the following instructions, and the processor 701 runs the application programs stored in the memory 702, thereby realizing various functions, as follows:
- Obtain a target filtering condition for generating a video summary; search a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition, the structured image data being image data stored in a structured manner; and perform video synthesis on the structured image data that meets the target screening conditions to generate a video summary.
- an embodiment of the present application provides a storage medium in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the video abstract generating methods provided in the embodiments of the present application.
- the instruction can perform the following steps:
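- Obtain a target filtering condition for generating a video summary; search a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition; and perform video synthesis on the structured image data that meets the target screening conditions to generate a video summary.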
- the storage medium may include a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.
- Because the instructions stored in the storage medium can execute the steps in any one of the video abstract generating methods provided in the embodiments of the present application, they can implement the capabilities of any one of those methods.
- The beneficial effects achieved are detailed in the previous embodiments and are not repeated here.
Abstract
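The embodiments of the present application disclose a video summary generation method, apparatus, computing device, and storage medium. The method in the embodiments includes: obtaining, according to selected filtering conditions, a target filtering condition for generating a video summary; searching a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition, the structured image data being data of target tracking sequences stored in a structured manner; and performing video synthesis on the structured image data to generate a video summary.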
Description
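This application claims priority to Chinese Patent Application No. 201810955587.X, entitled "Video summary generation method, apparatus, and storage medium", filed with the Chinese Patent Office on August 21, 2018, the entire contents of which are incorporated herein by reference.
This application relates to the field of communication technologies, and in particular to a video summary generation method, apparatus, computing device, and storage medium.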
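Background
Video summarization, known in English as Video Abstract, is a technology that condenses the main content of an original video. As the demands on video data processing keep rising and the volume of video data keeps growing, people need to build a summary of a long video so that they can browse it quickly and make better use of it. Video summarization makes it possible to use not only text but also audio and video information in content-based video retrieval. The problem it addresses is how to represent video data effectively and access it quickly: by analyzing video content, it reduces the cost of storing, classifying, and indexing video, and improves the efficiency, usability, and accessibility of video. It is a development of content-based video analysis technology.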
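Technical Content
The embodiments of the present application provide a video summary generation method, the method including:
obtaining a target filtering condition for generating a video summary;
searching a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition, the structured image data being image data stored in a structured manner;
performing video synthesis on the structured image data that meets the target filtering condition to generate a video summary.
The embodiments of the present application provide a video summary generation apparatus, the apparatus including:
a first obtaining unit, configured to obtain a target filtering condition for generating a video summary;
a search unit, configured to search a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition, the structured image data being image data stored in a structured manner;
a generating unit, configured to perform video synthesis on the structured image data that meets the target filtering condition to generate a video summary.
The embodiments of the present application further provide a computing device, including a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the video summary generation method described in the embodiments of the present application.
The embodiments of the present application further provide a storage medium storing a plurality of instructions, the instructions being suitable for loading by a processor to execute the video summary generation method described in the embodiments of the present application.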
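To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an embodiment of the video summary generation system provided by the embodiments of the present application;
FIG. 2A is a schematic diagram of an embodiment of the video summary generation method provided by the embodiments of the present application;
FIG. 2B is a specific flowchart of step S102 described in the embodiments of the present application;
FIG. 2C is a specific flowchart of the steps of processing a to-be-processed video and storing the processing result in the video database in the embodiments of the present application;
FIG. 2D is a specific flowchart of step S121 described in the embodiments of the present application;
FIG. 2E is a specific flowchart of step S213 described in the embodiments of the present application;
FIG. 2F is a specific flowchart of step S312 described in the embodiments of the present application;
FIG. 2G is a specific flowchart of step S413 described in the embodiments of the present application;
FIG. 2H is a specific flowchart of step S214 described in the embodiments of the present application;
FIG. 2I is a specific flowchart of step S123 described in the embodiments of the present application;
FIG. 3 is a schematic diagram of an embodiment in which the single-channel grayscale image of each frame and the local feature map of each frame are combined into the effective image of that frame;
FIG. 4 is a schematic diagram of another embodiment of the video summary generation method provided by the embodiments of the present application;
FIG. 5 is a schematic diagram of an embodiment of a video summary generation application scenario provided by the embodiments of the present application;
FIG. 6A is a schematic diagram of an embodiment of the video summary generation apparatus provided by the embodiments of the present application;
FIG. 6B is a schematic structural diagram of the search unit in the embodiments of the present application;
FIG. 6C is a schematic structural diagram of the attribute analysis unit in the embodiments of the present application;
FIG. 7 is a schematic structural diagram of the server device provided by the embodiments of the present application.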
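Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
In the following description, specific embodiments of the present application are described with reference to steps and symbols executed by one or more computers, unless otherwise stated. These steps and operations are therefore referred to several times as being computer-executed; computer execution as used herein includes operations by a computer processing unit on electronic signals that represent data in a structured form. Such an operation transforms the data or maintains it at a location in the computer's memory system, which can reconfigure or otherwise alter the operation of the computer in a manner well known to those skilled in the art. The data structure in which the data is maintained is a physical location in the memory that has specific properties defined by the data format. However, although the principles of the present application are described in the above terms, this is not meant as a limitation, and those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The term "module" as used herein may be regarded as a software object executed on the computing system. The different components, modules, engines, and services described herein may be regarded as objects implemented on the computing system. In some embodiments, the apparatus and method described herein are implemented in software; they may of course also be implemented in hardware, and both fall within the protection scope of the present application.
Video summarization is a technology that condenses the main content of an original video. As the demands on video data processing keep rising and the volume of video data keeps growing, people need to build a summary of a long video so that they can browse it quickly and make better use of it. Video summarization makes it possible to use not only text but also audio and video information in content-based video retrieval. A video summary mainly serves to ease storage and the browsing or searching of video: compared with the original footage, a video summary is much shorter, which saves storage time and space, and because it preserves the key points of the original content, browsing or searching the summary takes the user less time than browsing the original video.
In the related art, video content is processed in a relatively simple way and is not given a structured data representation, so video content cannot be filtered and retrieved quickly, which limits the usage and application scenarios.
The embodiments of the present application provide a video summary generation method, apparatus, computing device, and storage medium.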
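Referring to FIG. 1, FIG. 1 is a schematic diagram of the video summary generation system provided by the embodiments of the present application. As shown in FIG. 1, the video summary generation system includes a server device 101, which may be a single server, a server cluster composed of several servers, or a cloud computing service center.
In the embodiments of the present application, the video summary generation system has the function of generating video summaries. Specifically, the system may include a video summary generation apparatus 102, which may be integrated in the server device 101, that is, the server device in FIG. 1. The server device 101 is mainly configured to: obtain a target filtering condition for generating a video summary; search a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition, the structured image data being image data stored in a structured manner; and perform video synthesis on the structured image data that meets the target filtering condition to generate a video summary.
The video summary generation system may further include one or more first terminal devices 103. A first terminal device 103 may serve as an image acquisition device, for example a camera, or a personal computer (PC), laptop, smartphone, PAD, or tablet equipped with a camera; it can capture images and convert them into a computer-readable form such as video. FIG. 1 shows only one first terminal device 103; it should be noted that in practice one or more first terminal devices 103 may be deployed as needed.
The video summary generation system may further include a memory 104 for storing the video database. The video database stores video data, which may be video data captured by one or more first terminal devices 103, for example surveillance video captured by one or more surveillance cameras, or other film and television video data. The video data includes structured image data organized in units of target tracking sequences, which users can search by video content to generate video summaries.
The video summary generation system may further include a second terminal device 105, configured to display the video summary generated by the server device 101 and received from it. The second terminal device 105 may be a smart terminal device such as a personal computer (PC) or laptop, or a smart mobile terminal device such as a smartphone, PAD, or tablet.
It should be noted that the scenario of the video summary generation system shown in FIG. 1 is only an example. The system and scenario described in the embodiments of the present application are intended to explain the technical solutions of the embodiments more clearly and do not limit the technical solutions provided by the embodiments. Those of ordinary skill in the art will recognize that, as video summary generation systems evolve and new business scenarios emerge, the technical solutions provided by the embodiments of the present application remain applicable to similar technical problems.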
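A detailed description is given below with reference to specific embodiments.
This embodiment is described from the perspective of the video summary generation apparatus, which may be integrated in a server device.
The present application provides a video summary generation method, including: obtaining a target filtering condition for generating a video summary; searching a video database for structured image data according to the target filtering condition to obtain structured image data that meets the target filtering condition, the structured image data being image data stored in a structured manner; and performing video synthesis on the structured image data that meets the target filtering condition to generate a video summary.
In some embodiments, referring to FIG. 2A, the video summary generation method in the embodiments of the present application is executed by a server device and includes the following steps:
S101. Obtain a target filtering condition for generating a video summary;
In the embodiments of the present application, when a user needs to filter video content in a preset video database, the user may select filtering conditions from preset filtering options to obtain the target filtering condition. The filtering options may be set according to actual application needs, for example: a color option (such as red, black, or unrestricted), an object category option (such as person or vehicle, or more specifically man or woman, car or bicycle), and a target trajectory direction option (such as a trajectory direction from south to north).
In some embodiments, if the user makes no selection among the filtering options, the target filtering condition is a default filtering condition; for example, if the user selects nothing and the returned value is empty, all filtering options are selected by default.
After the user selects filtering conditions from the preset filtering options, the target filtering condition is obtained. It includes the keywords corresponding to the selected options, and these may comprise one or more target keywords, for example red, man, and car, indicating that image data having target attribute features such as red, man, and car is to be searched for in the video database.
It can be understood that the target filtering condition may also include other settings for generating the video summary, which may be set according to actual application requirements. For example, the target filtering condition may include a target synthesis density, which is the number of targets in each frame of the generated video summary. In some embodiments of the present application, the target synthesis density may be offered at three levels, high, medium, and low, each corresponding to one density: at the low level, each frame of the generated video summary contains 3 targets; at the medium level, 6 targets; and at the high level, 9 targets. As another example, the target filtering condition may include a filtering time range, such as 2018.3.1 to 2018.3.2, which may of course further include hour, minute, or second information.
In the embodiments of the present application, when the user chooses to filter video content, the video summary generation apparatus obtains the filtering conditions the user selected from the filtering options, and thereby obtains the target filtering condition.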
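Building the target filtering condition in S101 can be sketched as follows. The names `FILTER_OPTIONS`, `DENSITY_LEVELS`, and `build_target_condition` are hypothetical, and the option values are illustrative picks from the examples in the text. The sketch assumes that an option the user leaves unselected defaults to all of its values, and that the three density levels map to 3, 6, and 9 targets per frame.

```python
# Illustrative option values taken from the examples in the text.
FILTER_OPTIONS = {
    "color": ["red", "black"],
    "category": ["man", "woman", "car", "bicycle"],
    "direction": ["south to north"],
}

# Target synthesis density per level: targets shown in each summary frame.
DENSITY_LEVELS = {"low": 3, "medium": 6, "high": 9}

def build_target_condition(selection=None, density_level="low", time_range=None):
    """Turn the user's selections into a target filtering condition.

    An option with no selection defaults to "select all", as described for
    the default filtering condition.
    """
    selection = selection or {}
    accepted = {
        option: set(selection.get(option) or choices)
        for option, choices in FILTER_OPTIONS.items()
    }
    return {
        "accepted": accepted,
        "density": DENSITY_LEVELS[density_level],
        "time_range": time_range,
    }

default_condition = build_target_condition()
red_men = build_target_condition(
    {"color": ["red"], "category": ["man"]},
    density_level="medium",
    time_range=("2018.3.1", "2018.3.2"),
)
```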
S102、根据目标筛选条件在视频数据库中查找结构化图像数据,得到符合目标筛选条件的结构化图像数据,所述结构化图像数据为结构化存储的图像数据。
具体的,该视频数据库中保存有视频图像的结构化图像数据,例如视频数据库中保存有以目标跟踪序列为单位的结构化图像数据。此时,图2B为本申请实施例所述的步骤S102的具体流程图,参见图2B,根据目标筛选条件在视频数据库中查找结构化图像数据,得到符合目标筛选条件的结构化图像数据可以包括:
S123,根据目标筛选条件在视频数据库中确定筛选出的目标跟踪序列。
在一些实施例中,如前所述,视频数据库中保存有以目标跟踪序列为单位的结构化图像数据。而目标跟踪序列是对待处理视频中的目标进行跟踪,当跟踪结束或达到预设数量帧后,得到的待处理视频中包含该跟踪目标的帧序列。
在一些实施例中,目标跟踪序列的数据包括所述目标跟踪序列的标识信息、所述目标跟踪序列的属性信息和所述目标跟踪序列中每个前景图像。
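In some embodiments, the filtered target tracking sequences are those target tracking sequences in the video database whose attribute information is identical to the keywords in the target filter condition.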
S124,从所述视频数据库中获取所述筛选出的目标跟踪序列的结构化图像数据,以得到符合目标筛选条件的结构化图像数据。
本申请实施例中,会在该视频数据库中保存有以目标跟踪序列为单位的结构化图像数据,结构化数据也称作行数据,是由二维表结构来逻辑表达和实现的数据,严格地遵循数据格式与长度规范,主要通过关系型数据库进行存储和管理。本申请实施例中,图像数据以结构化数据的格式存储,即结构化图像数据。
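Generally, a separate video database is set up to hold video data from a particular source. In the embodiments of the present application, the video data held in the video database may be surveillance video captured by one or more surveillance cameras; that is, the video database may be a surveillance video database such as a residential-community, road-intersection, garage, or shopping-mall surveillance video database. It can be understood that in other embodiments of the present application the video database may also be another film or television video database. The video database in the embodiments of the present application may therefore be any video database whose video content needs to be searched, which is not specifically limited here.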
对于视频数据库来说,会时时更新存储新的视频数据,因此对于待存入该视频数据库中的待处理视频,会进行一些处理,以便后续视频内容检索时方便生成视频摘要及查找。因此本申请实施例中在所述获取用于生成视频摘要的目标筛选条件之前,还可以包括对待处理视频处理并将处理结果存储到该视频数据库的步骤,图2C示出了对待处理视频进行处理并将处理结果存储到视频数据库的流程图。如图2C所示,本申请实施例中还可以包括:
S120,获取待处理视频;
S121,对待处理视频进行属性分析,以从待处理视频中确定目标跟踪序列,并获取待处理视频中各目标跟踪序列的结构化图像数据;
S122,将各目标跟踪序列的结构化图像数据存储在视频数据库中。
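The to-be-processed video may be video data that is to be stored in the video database. Assuming the video database is a surveillance video database, the to-be-processed video may be the video data newly added within a certain period, for example within one day, one hour, or half a day, which may be set according to the needs of the actual scenario.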
具体的,图2D示出了步骤S121的具体流程图。如图2D所示,具体可以包括:
S211,获取待处理视频中的待处理图像;
S212,对待处理图像进行前景提取,得到所述待处理图像中每帧图像的前景图像;
S213,利用待处理图像的前景图像进行属性分析,以获取待处理图像中各目标跟踪序列的属性分析结果;
在一些实施例中,将待处理图像中的当前帧中的待选目标框和上一帧目标框进行比较,取其中特征响应最强(即待选目标框中的图像与上一帧目标框图像匹配度最高)的目标作为跟踪目标,对该目标进行跟踪。当某个跟踪目标跟踪结束,或者达到预设数量(例如1000帧)的跟踪帧数后,即完成对该目标的跟踪,此时得到的待处理图像中包含该跟踪目标的帧序列为目标跟踪序列。
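In some embodiments, after the target tracking sequences in the to-be-processed images are obtained, attribute analysis is performed on each target tracking sequence to obtain the attribute analysis result, which may include the attribute information of each target tracking sequence.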
S214,将属性分析结果保存到预设的结构化的目标属性数据结构中,得到各目标跟踪序列的结构化图像数据。
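Obtaining the to-be-processed images in the to-be-processed video may mean obtaining every frame of the video. However, surveillance video may contain long stretches of footage with little change, so to improve the efficiency of subsequent processing, obtaining the to-be-processed images may instead mean obtaining the key frames of the video. In that case, obtaining the to-be-processed images may include: performing key frame detection on the to-be-processed video to obtain its key frames; and using the key frames as the to-be-processed images. An existing key frame extraction algorithm may be used for the key frame detection. With key frame detection, only one key frame is selected from a large number of repetitive images with little change, or none is selected (for example, when a stretch of surveillance footage contains no target at all).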
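The text leaves the key frame detector open ("an existing key frame extraction algorithm"). As one possible stand-in, and not the method of the application, the sketch below keeps a frame only when it differs enough from the last kept frame. Frames are modeled as flat lists of grayscale values; the function names and the threshold of 10.0 are assumptions made for illustration. This simple version always keeps the first frame, so it does not cover the "select none" case mentioned in the text.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames, threshold=10.0):
    """Return the indices of frames that differ from the last kept frame by more than threshold."""
    key_indices = []
    last_kept = None
    for index, frame in enumerate(frames):
        if last_kept is None or mean_abs_diff(frame, last_kept) > threshold:
            key_indices.append(index)
            last_kept = frame
    return key_indices


# Five tiny 4-pixel frames: two near-identical dark frames, two near-identical
# bright frames, then a dark frame again.
frames = [[0] * 4, [1] * 4, [50] * 4, [51] * 4, [0] * 4]
key_indices = select_key_frames(frames)
```

Here the near-duplicate frames at indices 1 and 3 are skipped, which is the effect the text attributes to key frame detection on largely static surveillance footage.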
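In the embodiments of the present application, background modeling may be performed in advance so that foreground extraction can be performed on the to-be-processed images to obtain the foreground image of each frame. Once the background has been modeled, this foreground extraction can be carried out quickly. The process may include: converting each frame of the to-be-processed images into a single-channel grayscale image; extracting a local feature map of a preset type from each frame; and determining the foreground image of each frame according to the frame's single-channel grayscale image and local feature map. Further, the determining step may include: combining the single-channel grayscale image and the local feature map of each frame into an effective image of that frame; and matching the effective image of each frame against a preset Gaussian mixture model, to obtain the foreground image of each frame.
An example is described below. As shown in FIG. 3, in background modeling the input of the Gaussian mixture model is a multi-channel image (d), whose channels correspond to different data sources (b) and (c). A video frame (a) in the to-be-processed images is usually a color image, that is, an RGB (Red, Green, Blue) three-channel image (colors are mixed from red, green, and blue in different proportions, so a color image contains three monochrome images representing the red, green, and blue channels). In one embodiment, the RGB three-channel image (a) of each frame is compressed into a single-channel grayscale image (b), which serves as one channel of the multi-channel image (d) input to the Gaussian mixture model, while the local feature map (c) extracted from the RGB three-channel image (a) (for example, texture or shape features) serves as the other channels of (d). Together they are stacked into one multi-channel image (d), which is the effective image of the frame and the input of the Gaussian mixture model.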
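The construction of the multi-channel "effective image" (d) from a grayscale channel (b) and a local feature channel (c) can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not fix the grayscale formula or the local feature, so the common luminance weights (0.299, 0.587, 0.114) and a horizontal-gradient magnitude are used here as stand-ins, and all function names are hypothetical. Matching against the Gaussian mixture model is not shown.

```python
def to_gray(rgb_frame):
    """Compress an RGB frame (rows of (r, g, b) tuples) into one grayscale channel."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_frame]


def horizontal_gradient(gray):
    """Stand-in local feature: absolute difference to the left neighbour (0 in the first column)."""
    return [[0 if x == 0 else abs(row[x] - row[x - 1]) for x in range(len(row))]
            for row in gray]


def effective_image(rgb_frame):
    """Stack the grayscale channel and the feature channel into a two-channel image."""
    gray = to_gray(rgb_frame)
    feature = horizontal_gradient(gray)
    return [[(g, f) for g, f in zip(gray_row, feature_row)]
            for gray_row, feature_row in zip(gray, feature)]


# A 2x2 frame: a white-to-black edge on the first row, a flat dark-gray second row.
frame = [[(255, 255, 255), (0, 0, 0)],
         [(10, 10, 10), (10, 10, 10)]]
eff = effective_image(frame)
```

Each pixel of `eff` now carries both its brightness and a local structure value, which is the kind of multi-source input the text describes feeding to the background model.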
需要说明的,在利用每帧图像的有效图像与预设的混合高斯模型进行匹配,得到待处理图像中每帧图像的前景图像的过程中,混合高斯模型在从有效图像中分离前景和背景信息的同时,也会缓慢的逐渐更新自己,使得其保存和维护的背景信息与最新的背景信息保持一致。具体的,即在利用混合高斯获得一帧前景图像(此时相应的背景图像也确定了)后更新混合高斯模型,由于在分离图像得到前景图像和背景图像的过程中,更新混合高斯模型为本领域技术常用技术手段,具体细节此处不再赘述。
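After the foreground image of each frame is obtained through background modeling, the foreground image may be processed further. For example, dilating the foreground segmentation map and then extracting and filling its contours can further reduce holes and gaps in the foreground image, so that the extracted foreground image is of better quality.
During foreground extraction, each frame consists of a foreground and a background, so once the foreground image of a frame has been extracted, the background image of that frame is also determined and may be saved for later use. If the to-be-processed video is a surveillance video, the surveillance camera is generally fixed and so is its shooting angle, so the background of the captured images is relatively stable; for example, a camera that always faces a certain intersection always produces video whose background is that intersection. The background image may therefore be saved at a preset time interval, or in a preset ratio to the number of foreground frames (that is, each background image corresponds to a preset number of foreground frames in each target tracking sequence). For example, one background image is saved every 30 minutes of the to-be-processed video, and the foreground images of a target tracking sequence within that period correspond to that background image; or one background frame is saved for every 1000 foreground frames saved, so that one background frame corresponds to 1000 foreground frames. In this way a corresponding background image can be saved for each target tracking sequence.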
另外,图2E示出了步骤S213的具体流程图。如图2E所示,步骤S213可以包括:
S311,对待处理图像的前景图像进行目标跟踪,以获取待处理图像中的目标跟踪序列;
S312,对每个目标跟踪序列进行属性分析,以获取各目标跟踪序列的属性分析结果,该属性分析结果中包括每个目标跟踪序列的属性信息。
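A preset target tracking algorithm may be used to track targets in the foreground images of the to-be-processed images, for example the KCF algorithm (High-speed tracking with kernelized correlation filters) or an improved KCF algorithm. In the improved KCF algorithm, an object detection technique (the foreground extraction based on background modeling in this embodiment, or another detection technique) is applied to the whole current frame to obtain target boxes. These target boxes, together with the candidate target boxes that the KCF algorithm detects locally, are compared with the target box of the previous frame, and the one with the strongest feature response (that is, the candidate box whose image best matches the image in the previous frame's target box) is taken as the tracking target and tracked, which yields the target tracking sequences in the to-be-processed images. In the embodiments of the present application, tracking of a target is complete when tracking of that target ends or when a preset number of tracked frames (for example, 1000 frames) is reached.
After the target tracking sequences in the to-be-processed images are obtained, attribute analysis is performed on each target tracking sequence to obtain the attribute analysis result, which may include the attribute information of each target tracking sequence. The attribute information may be extracted according to actual needs; for example, it may include color attribute information, object category attribute information, or target trajectory direction attribute information of the target tracking sequence.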
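The candidate selection step of the improved tracker (pool the locally detected candidates with the globally detected boxes, then keep the one that best matches the previous frame's box) can be sketched as below. Note the substitution: the text scores candidates by feature response of the image content, whereas this sketch uses box overlap (intersection over union) purely as a simple, self-contained stand-in for that score. Boxes are `(x1, y1, x2, y2)` tuples, and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def pick_tracked_box(prev_box, local_candidates, global_detections):
    """Keep the candidate with the highest match score against the previous box."""
    candidates = list(local_candidates) + list(global_detections)
    return max(candidates, key=lambda box: iou(prev_box, box))


def track(first_box, per_frame_candidates, max_frames=1000):
    """Build one tracking sequence; stop when the target is lost or max_frames is reached."""
    sequence = [first_box]
    for local_candidates, global_detections in per_frame_candidates:
        if len(sequence) >= max_frames:
            break
        if not local_candidates and not global_detections:
            break  # no candidate at all: treat tracking as finished
        sequence.append(pick_tracked_box(sequence[-1], local_candidates, global_detections))
    return sequence


per_frame = [
    ([(1, 1, 11, 11)], [(20, 20, 30, 30)]),   # local candidate wins
    ([(3, 3, 13, 13)], [(2, 2, 12, 12)]),     # global detection wins
]
sequence = track((0, 0, 10, 10), per_frame)
```

The `max_frames` cut-off corresponds to the preset number of tracked frames (for example, 1000) after which the text treats tracking of a target as complete.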
当属性分析结果中包括每个目标跟踪序列的颜色属性信息时,图2F示出了步骤S312的具体流程图。如图2F所示,步骤S312包括:
S411,根据预设的像素值与颜色的映射关系,确定每个目标跟踪序列中每个前景图像的像素点对应的颜色;
S412,根据每个前景图像的像素点对应的颜色,统计各颜色在每个前景图像中对应的像素点个数;
S413,根据各颜色在每个前景图像中对应的像素点个数,确定每个目标跟踪序列的颜色属性信息。
该预设的像素值与颜色的映射关系可以是RGB(Red,Green,Blue)颜色模型、HSV(Hue,Saturation,Value)颜色模型、YUV颜色模型或CMYK(Cyan,Magenta,Yellow,Black)颜色模型等。由于HSV是一种比较直观的颜色模型,所以在许多图像处理领域中应用比较广泛,因此,预设的像素值与颜色的映射关系可以采用HSV颜色模型。在根据预设的像素值与颜色的映射关系,确定每个目标跟踪序列中每个前景图像的像素点对应的颜色之后,可以根据每个前景图像的像素点对应的颜色,统计各颜色在每个前景图像中对应的像素点个数。例如,对前景图像A中,确定红色像素点30个,黑色像素点40个。
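Steps S411 and S412 (map each foreground pixel to a color name through the HSV model, then count pixels per color) might look like the sketch below. It uses `colorsys.rgb_to_hsv` from the Python standard library; the hue, saturation, and value boundaries and the set of color names are illustrative assumptions, since the text does not specify the mapping table.

```python
import colorsys
from collections import Counter


def color_name(r, g, b):
    """Map one RGB pixel (0-255 per channel) to a coarse color name via HSV."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < 0.2:
        return "black"
    if s < 0.2:
        return "white" if v > 0.8 else "gray"
    degrees = h * 360
    if degrees < 30 or degrees >= 330:
        return "red"
    if degrees < 90:
        return "yellow"
    if degrees < 150:
        return "green"
    if degrees < 270:
        return "blue"  # cyan and blue are lumped together in this coarse table
    return "magenta"


def count_colors(pixels):
    """Count how many pixels of a foreground image fall under each color name."""
    return Counter(color_name(*pixel) for pixel in pixels)


# Foreground image A from the text: 30 red pixels and 40 black pixels.
pixels = [(255, 0, 0)] * 30 + [(0, 0, 0)] * 40
counts = count_colors(pixels)
```

The resulting per-image counts are the input that step S413 uses to decide the color attribute of the whole tracking sequence.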
进一步的,图2G示出了步骤S413的具体流程图。如图2G所示,S413可以包括:
S511,对每个目标跟踪序列,若目标颜色在目标前景图像中对应的像素点的个数达到目标前景图像中总像素点个数的第一预设比例,则确定该目标前景图像具有所述目标颜色的颜色属性;
S512,若目标跟踪序列中第二预设比例的前景图像具有目标颜色的颜色属性,则确定该目标跟踪序列具有目标颜色的颜色属性。
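The first preset ratio and the second preset ratio may be set according to actual needs, for example 30% and 90% respectively. Suppose a target foreground image of some target tracking sequence contains 100 pixels: 30 red, 50 black, 10 white, and 10 of other colors. Red pixels then account for 30/100 = 30% of the pixels in the target foreground image, black pixels for 50/100 = 50%, and white pixels for 10/100 = 10%. Because the proportions of red and black pixels both reach the first preset ratio of 30%, red and black are determined to be attribute colors of the target foreground image. After the attribute colors of all foreground images in the target tracking sequence are determined, if 90% of the foreground images in the sequence have the red attribute, the target tracking sequence is determined to have the red attribute.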
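The two-threshold rule of steps S511 and S512 can be written out directly. In this sketch the function names are hypothetical, "reaches" is read as "greater than or equal to", and the default ratios of 30% and 90% are the example values from the text.

```python
from collections import Counter


def frame_colors(counts, first_ratio=0.30):
    """S511: colors whose pixel share in one foreground image reaches first_ratio."""
    total = sum(counts.values())
    return {color for color, n in counts.items() if n / total >= first_ratio}


def sequence_colors(per_frame_counts, first_ratio=0.30, second_ratio=0.90):
    """S512: colors held by at least second_ratio of the foreground images in a sequence."""
    hits = Counter()
    for counts in per_frame_counts:
        hits.update(frame_colors(counts, first_ratio))
    n_frames = len(per_frame_counts)
    return {color for color, k in hits.items() if k / n_frames >= second_ratio}


# The worked example from the text: 100 pixels, 30 red, 50 black, 10 white, 10 other.
example_frame = {"red": 30, "black": 50, "white": 10, "other": 10}
# A sequence of 10 foreground images: 9 like the example, 1 entirely black.
sequence = [example_frame] * 9 + [{"black": 100}]
attribute_colors = sequence_colors(sequence)
```

With these inputs, red appears in 9 of 10 images (90%) and black in all 10, so both reach the second ratio, while white never reaches the first ratio in any image.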
当属性分析结果中包括每个目标跟踪序列中每个前景图像的物体类别属性信息时,图2F示出了步骤S312的具体流程图。如图2F所示,步骤S312具体可以包括:
S414,利用预设的物体分类神经网络模型,对每个目标跟踪序列中每个前景图像进行物体类别分类,得到每个目标跟踪序列的物体类别属性信息。
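The object classification neural network model may be an existing model, or may be obtained by training an object classification neural network on a preset number of foreground images and the object category information of those images. This technique is relatively mature and is not described in detail here.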
利用预设的物体分类神经网络模型,对每个目标跟踪序列中每个前景图像可以得到每个前景图像中的物体分类属性信息,例如人(进一步还可以是男人或女人,大人或小孩等详细分类),大巴车,小轿车或非机动车等。物体分类属性信息对应的分类具体可以根据实际需要进行设置,例如,人可以分成男人或女人,也可以分成大人或小孩,还可以同时具有该两种分类,即男人和大人,此处不做具体限定。
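It should be noted that the above describes only some examples of attribute information that the attribute analysis result may include, such as the color attribute information and the object category attribute information of a target tracking sequence. It can be understood that the attribute analysis result may also include other types of attribute information, which may be set according to actual needs. For example, the attribute analysis result may further include the target trajectory direction attribute information of each target tracking sequence, in which case the step of performing attribute analysis on each target tracking sequence may further include: determining the trajectory direction angle of the target according to the position of the target in each foreground image of the target tracking sequence, for example determining that the target moves from south to north.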
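Deriving a trajectory direction angle from the target's positions could be done as in the sketch below. Several things here are assumptions rather than details from the text: only the first and last positions are used, positions are image coordinates with y growing downward, and "up" in the image is taken to be north (which depends on how the camera is mounted). The function name is hypothetical.

```python
import math


def trajectory_direction(positions):
    """Return (angle in degrees, compass heading) from the first to the last position.

    The angle is measured counter-clockwise from east, assuming image "up" is north.
    """
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    dx = x1 - x0
    dy = y0 - y1  # flip the sign because image y grows downward
    angle = math.degrees(math.atan2(dy, dx)) % 360
    headings = ["east", "north", "west", "south"]
    heading = headings[int(((angle + 45) % 360) // 90)]
    return angle, heading


# Target centers in three successive foreground images: moving up the image.
positions = [(100, 400), (102, 300), (101, 200)]
angle, heading = trajectory_direction(positions)
```

Under the stated orientation assumption, a heading of "north" corresponds to the "from south to north" trajectory option used as an example in the text.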
本申请实施例中,图2H示出了在得到属性分析结果之后步骤S214的具体流程图。如图2H所示,将所述属性分析结果保存到预设的结构化的目标属性数据结构中,得到各目标跟踪序列的结构化图像数据,该过程具体可以包括:
S611,为待处理图像中的每个目标跟踪序列分配标识信息;
S612,根据待处理图像中每个目标跟踪序列的标识信息、每个目标跟踪序列的属性信息和每个目标跟踪序列中每个前景图像,得到每个目标跟踪序列的数据;
S613,将每个目标跟踪序列的数据分别保存到预设的结构化的目标属性数据结构中,得到每个目标跟踪序列的结构化图像数据。
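Specifically, identification information may be assigned to each target tracking sequence in the to-be-processed images according to a preset rule, for example by numeric or alphabetic numbering. The identification information may be an identity (ID) of the target tracking sequence, such as target tracking sequence 1 or target tracking sequence A, where 1 or A is the ID. As another example, the ID of a target tracking sequence may be assigned according to the time corresponding to that sequence.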
其中,将每个目标跟踪序列的数据分别保存到预设的结构化的目标属性数据结构中,得到每个目标跟踪序列的结构化图像数据可以包括调用预设结构化数据存储接口将每个目标跟踪序列的数据分别保存到预设的结构化的目标属性数据结构中。该目标属性数据结构一个示例具体如下表1所示:
表1
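It should be noted that the structured image data of each target tracking sequence, obtained by saving the data of each target tracking sequence into the preset structured target attribute data structure, may further include the background images corresponding to that target tracking sequence. In addition, each foreground frame in a target tracking sequence carries the time information of that foreground frame.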
在上述得到每个目标跟踪序列的结构化图像数据之后,即可将每个目标跟踪序列的结构化图像数据存储在视频数据库中。这样对于新的需要处理的视频,就可以按照上述操作步骤实现视频的结构化存储,在视频数据库中可以根据筛选条件筛选相应视频内容。
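Because the text describes structured data as two-dimensional table data managed mainly by a relational database, the storage step can be illustrated with SQLite from the Python standard library. This is a sketch under assumptions, not the data structure of Table 1: the table names, column names, and the choice of one column per attribute are all hypothetical, and a real sequence may carry several colors rather than one.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables: one row per tracking sequence, one row per foreground frame."""
    conn.executescript("""
        CREATE TABLE tracking_sequence (
            seq_id     TEXT PRIMARY KEY,
            color      TEXT,
            category   TEXT,
            direction  TEXT,
            start_time TEXT
        );
        CREATE TABLE foreground_frame (
            seq_id      TEXT REFERENCES tracking_sequence(seq_id),
            frame_index INTEGER,
            captured_at TEXT,
            image       BLOB,
            PRIMARY KEY (seq_id, frame_index)
        );
    """)


def store_sequence(conn, seq_id, attrs, frames):
    """Save one sequence's ID, attribute information, and timestamped foreground images."""
    conn.execute(
        "INSERT INTO tracking_sequence VALUES (?, ?, ?, ?, ?)",
        (seq_id, attrs.get("color"), attrs.get("category"),
         attrs.get("direction"), attrs.get("start_time")),
    )
    conn.executemany(
        "INSERT INTO foreground_frame VALUES (?, ?, ?, ?)",
        [(seq_id, i, captured_at, image) for i, (captured_at, image) in enumerate(frames)],
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)
store_sequence(
    conn, "A",
    {"color": "red", "category": "man", "direction": "south_to_north",
     "start_time": "2018-03-01T08:00:00"},
    [("2018-03-01T08:00:00", b"\x00"), ("2018-03-01T08:00:01", b"\x01")],
)
```

Storing attributes as indexed columns is what lets a later query select sequences by keyword without touching the original video.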
因此,本申请实施例中,在获取目标筛选条件后,即可根据目标筛选条件在视频数据库中确定筛选出的目标跟踪序列,进一步的,图2I示出了步骤S123的具体流程图。如图2I所示,根据目标筛选条件在视频数据库中确定筛选出的目标跟踪序列的步骤可以包括:
S621,获取目标筛选条件中的关键词;
S622,在视频数据库中确定具有与关键词相同属性信息的目标跟踪序列,得到筛选出的目标跟踪序列。
即筛选出的目标跟踪序列为视频数据库中具有与目标筛选条件中的关键词相同属性信息的目标跟踪序列。
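For example, suppose the target filter condition includes the keywords "red" and "man" and a filter time range of 2018.3.1~2018.3.2. The video database is then searched, according to the keywords "red" and "man", for target tracking sequences that fall within the period 2018.3.1~2018.3.2 and have the attribute information "red" and "man", which yields the filtered target tracking sequences.
S124, obtain the structured image data of the filtered target tracking sequences from the video database, to obtain the structured image data that satisfies the target filter condition.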
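The keyword and time-range match of steps S621 and S622 can be sketched over in-memory records. The record layout (`id`, `attributes`, `time`) and the function name are assumptions for illustration; a deployed system would presumably run an equivalent query against the video database. Both ends of the time range are treated as inclusive here, which the text does not specify.

```python
from datetime import datetime


def filter_sequences(sequences, keywords, start, end):
    """Return IDs of sequences that carry every keyword and fall inside [start, end]."""
    return [
        seq["id"]
        for seq in sequences
        if keywords <= seq["attributes"] and start <= seq["time"] <= end
    ]


sequences = [
    {"id": "A", "attributes": {"red", "man"}, "time": datetime(2018, 3, 1, 9, 0)},
    {"id": "B", "attributes": {"black", "man"}, "time": datetime(2018, 3, 1, 10, 0)},
    {"id": "C", "attributes": {"red", "man"}, "time": datetime(2018, 3, 5, 9, 0)},
]
selected = filter_sequences(
    sequences,
    keywords={"red", "man"},
    start=datetime(2018, 3, 1),
    end=datetime(2018, 3, 2, 23, 59, 59),
)
```

Sequence B fails the keyword test and sequence C falls outside the time range, so only sequence A is selected. An empty keyword set matches every sequence, which corresponds to the "no restriction" default.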
在得到筛选出的目标跟踪序列之后,即可获取筛选出的目标跟踪序列的结构化图像数据,得到符合所述目标筛选条件的结构化图像数据。
S103、对所述符合所述目标筛选条件的结构化图像数据进行视频合成,生成视频摘要。
由于视频数据库中还包括与每个目标跟踪序列对应的多个背景图像,因此多个背景图像中每个背景图像可以对应每个目标跟踪序列中预设帧数的前景图像;此时,对所述符合所述目标筛选条件的结构化图像数据进行视频合成,生成视频摘要具体可以包括:
(1)获取目标筛选条件中的目标合成密度。
该目标合成密度即表示在生成的视频摘要中每帧图像中目标的个数,例如,目标合成密度为3,则生成的视频摘要中每帧图像中包括3个目标。
(2)新建与目标合成密度对应的N个合成队列,N为正整数。
其中,N即等于该目标合成密度,例如目标合成密度为3,则N=3,新建3个合成队列。
(3)将筛选出的目标跟踪序列中的目标跟踪序列平均分配到N个合成队列。
Suppose there are 30 filtered target tracking sequences. With a target synthesis density of 3, each synthesis queue is assigned 10 target tracking sequences.
(4)将N个合成队列中对应的目标跟踪序列的前景图像依次贴合到对应的背景图像上,以生成视频摘要。
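Specifically, when the foreground images of the target tracking sequences in the N synthesis queues are pasted onto the plurality of background images, every m consecutive foreground images in each synthesis queue are pasted onto one of the background images, where m is the number of foreground images corresponding to one background image, for example 1000. That is, the first 1000 foreground images of each synthesis queue are all composited onto the same background image, and the background image is switched once 1000 is exceeded. Each composite image of the video summary may be produced as follows: the first foreground image of each of the N synthesis queues is pasted onto the corresponding background image to form the first frame of the resulting video; the new first foreground image of each of the N synthesis queues is then pasted onto the corresponding background image to form the second frame; and so on. When the (m+1)-th composite image is produced, the next background image is used. All composite images of the video summary are obtained in this way, and the video summary is generated from them.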
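Steps (1) to (4) can be sketched as queue bookkeeping, leaving the actual pixel pasting aside. In this simplified illustration, foreground images and backgrounds are represented by string labels, sequences are dealt to the queues round-robin (one way to distribute them evenly), and a single shared list of backgrounds is assumed, with the background switching after every m composite frames. The function names are hypothetical.

```python
def assign_to_queues(sequences, n):
    """Distribute tracking sequences evenly (round-robin) over n synthesis queues.

    Each queue ends up as one flat list of foreground images, in play order.
    """
    queues = [[] for _ in range(n)]
    for i, sequence in enumerate(sequences):
        queues[i % n].extend(sequence)
    return queues


def compose(queues, backgrounds, m):
    """Pair the t-th foreground of every queue with the background for frame t.

    Returns a list of (background, [foregrounds]) tuples, one per summary frame.
    """
    length = max(len(queue) for queue in queues)
    frames = []
    for t in range(length):
        background = backgrounds[min(t // m, len(backgrounds) - 1)]
        foregrounds = [queue[t] for queue in queues if t < len(queue)]
        frames.append((background, foregrounds))
    return frames


# Six sequences of two foreground images each, density N = 3, m = 2 images per background.
sequences = [[f"s{i}f{j}" for j in range(2)] for i in range(6)]
queues = assign_to_queues(sequences, 3)
summary_frames = compose(queues, ["bg0", "bg1"], m=2)
```

Each summary frame shows one foreground image from each of the N queues, so the target synthesis density directly controls how many targets appear together; with the values from the text (30 sequences, N = 3), each queue would receive 10 sequences.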
本申请实施例中,由于对视频进行结构化存储,在用户重新选择筛选条件后,无需重新分析原始视频,可直接根据新的筛选条件读取视频数据库中结构化图像数据快速合成相应视频摘要。因此本申请实施例中方法还可以包括:获取生成视频摘要的新的筛选条件;根据新的筛选条件在视频数据库中确定新的筛选出的目标跟踪序列;从视频数据库中获取新的筛选出的目标跟踪序列的结构化图像数据,得到新的结构化图像数据;对新的结构化图像数据进行视频合成,生成新的视频摘要。具体的,该生成新的视频摘要的方式可以参考上述实施例中描述的方式,此处不再赘述。
本申请实施例通过获取用于生成视频摘要的目标筛选条件;根据目标筛选条件在视频数据库中查找结构化图像数据,得到符合所述目标筛选条件的结构化图像数据;对所述符合所述目标筛选条件的结构化图像数据进行视频合成,生成视频摘要。本申请实施例中由于视频数据库中保存视频图像的结构化图像数据,在用户检索相关视频时,可从结构化图像数据中快速筛选相关视频信息,生成视频摘要,一方面,极大的方便了用户对视频目标内容的定位,并极大的拓展了视频摘要的应用场景,另一方面,加快了视频摘要的生成过程,提高了服务器设备的运行效率。
下面结合一具体应用场景对本申请实施例中视频摘要生成方法进行描述。
请参阅图4,图4为本申请实施例提供的视频摘要生成方法的另一流程示意图,由服务器设备执行,该方法流程可以包括:
201、服务器设备获取用于生成视频摘要的目标筛选条件。
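In some embodiments, when the user needs to filter video content in a preset video database, the user may select filter conditions from preset filter options to obtain the target filter condition.
FIG. 5 shows a specific example scenario in an embodiment of the present application. In this example, the filter options include a color option 501, a target synthesis density option 502, and a target type option 503. "Unlimited" is selected for the color option (the target color is not restricted), the "low" level is selected for the target synthesis density option (each frame of the generated video summary contains 3 targets), and "person" is selected for the target type option. The selected options together constitute the target filter condition.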
202、服务器设备根据目标筛选条件在视频数据库中确定筛选出的目标跟踪序列。
服务器设备根据该目标筛选条件在视频数据库中查找具有“不限制颜色”、“人”属性信息的目标跟踪序列,即得到筛选出的目标跟踪序列,假设筛选出的目标跟踪序列有30个。
203、服务器设备从视频数据库中获取筛选出的目标跟踪序列的结构化图像数据,以得到符合目标筛选条件的结构化图像数据。
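After the filtered target tracking sequences are obtained, the server device can obtain their structured image data, yielding the structured image data that satisfies the target filter condition, the structured image data being image data stored in a structured manner.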
204、服务器设备对所述符合所述目标筛选条件的结构化图像数据进行视频合成,生成视频摘要。
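Because the target filter condition in FIG. 5 specifies the "low" target synthesis density, each frame of the generated video summary contains 3 targets. In this case, the server device performs video synthesis on the structured image data to generate the video summary as follows. Three synthesis queues corresponding to the target synthesis density are created, and the filtered target tracking sequences are distributed evenly among them, so that each synthesis queue is assigned 30/3 = 10 target tracking sequences. Assume each background image corresponds to 1000 foreground images. Each composite image of the video summary may then be produced as follows: the first foreground image of each of the 3 synthesis queues is pasted onto the corresponding background image to form the first frame of the resulting video; the new first foreground image of each of the 3 synthesis queues is pasted onto the corresponding background image to form the second frame; and so on. When the 1001st composite image is produced, the next background image is used. All composite images of the video summary are obtained in this way, and the video summary is generated from them. As shown in FIG. 5, each frame of the generated video summary contains 3 target persons, and the time information of each target is marked in each frame.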
需要说明的是,如图5所示,本申请实施例中可以在显示视频摘要的界面中显示原始视频,也可以不显示原始视频,具体可以根据实际需要进行设置。
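In this embodiment of the present application, the server device obtains a target filter condition for generating a video summary; determines the filtered target tracking sequences in the video database according to the target filter condition, the video database holding structured image data in units of target tracking sequences; obtains the structured image data of the filtered target tracking sequences from the video database, to obtain the structured image data that satisfies the target filter condition; and performs video synthesis on that structured image data to generate the video summary. Because the video database holds the structured image data of video images in units of target tracking sequences, when a user searches for relevant video the server device can quickly filter the relevant video information from the structured image data and generate a video summary. This makes it much easier for the user to locate target content in the video and greatly broadens the application scenarios of video summaries.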
为便于更好的实施本申请实施例提供的视频摘要生成方法,本申请实施例还提供一种基于上述视频摘要生成方法的装置。其中名词的含义与上述视频摘要生成方法中相同,具体实现细节可以参考方法实施例中的说明。
请参阅图6A,图6A为本申请实施例提供的视频摘要生成装置的结构示意图,其中该视频摘要生成装置可以包括第一获取单元601、查找单元602和生成单元603,具体如下:
第一获取单元601,用于获取用于生成视频摘要的目标筛选条件;
查找单元602,用于根据目标筛选条件在视频数据库中查找结构化图像数据,得到结构化图像数据,所述结构化图像数据为结构化存储的图像数据;
生成单元603,用于对所述符合所述目标筛选条件的结构化图像数据进行视频合成,生成视频摘要。
在一些实施例中,图6B为本申请实施例中查找单元602的结构示意图,如图6B所示,该查找单元602包括确定子单元6021和获取子单元6022,具体如下:
确定子单元6021,用于根据目标筛选条件在视频数据库中确定筛选出的目标跟踪序列,该视频数据库中保存有以目标跟踪序列为单位的结构化图像数据;
获取子单元6022,用于从视频数据库中获取所述筛选出的目标跟踪序列的结构化图像数据,得到符合所述目标筛选条件的结构化图像数据。
在一些实施例中,该装置还包括第二获取单元604、属性分析单元605和存储单元606,具体如下:
第二获取单元604,用于获取待处理视频;
属性分析单元605,用于对待处理视频进行属性分析,以从待处理视频中确定目标跟踪序列,并获取待处理视频中各目标跟踪序列的结构化图像数据;
存储单元606,用于将待处理视频中各目标跟踪序列的结构化图像数据存储在所述视频数据库中。
在一些实施例中,图6C为本申请实施例中属性分析单元605的结构示意图,如图6C所示,该属性分析单元605包括获取子单元6051、前景提取子单元6052、属性分析子单元6053,具体如下:
获取子单元6051,用于获取待处理视频中的待处理图像;
前景提取子单元6052,用于对待处理图像进行前景提取,得到待处理图像中每帧图像的前景图像;
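Attribute analysis subunit 6053, configured to perform attribute analysis using the foreground images of the to-be-processed images, to obtain the attribute analysis result of each target tracking sequence in the to-be-processed images;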
存储单元606,还用于将属性分析结果保存到预设的结构化的目标属性数据结构中,得到各目标跟踪序列的结构化图像数据。
在一些实施例中,该属性分析子单元6053具体用于:
对待处理图像的前景图像进行目标跟踪,以获取待处理图像中的目标跟踪序列;
对每个目标跟踪序列进行属性分析,以获取各目标跟踪序列的属性分析结果,所述属性分析结果中包括每个目标跟踪序列的属性信息。
在一些实施例中,该属性分析结果中包括每个目标跟踪序列的颜色属性信息,该属性分析子单元6053具体用于:
根据预设的像素值与颜色的映射关系,确定每个目标跟踪序列中每个前景图像的像素点对应的颜色;
根据每个前景图像的像素点对应的颜色,统计各颜色在每个前景图像中对应的像素点个数;
根据各颜色在每个前景图像中对应的像素点个数,确定每个目标跟踪序列的颜色属性信息。
在一些实施例中,该属性分析子单元6053具体用于:
对每个目标跟踪序列,若目标颜色在目标前景图像中对应的像素点的个数达到所述目标前景图像中总像素点个数的第一预设比例,则确定该目标前景图像具有目标颜色的颜色属性;
若目标跟踪序列中第二预设比例的前景图像具有所述目标颜色的颜色属性,则确定该目标跟踪序列具有目标颜色的颜色属性。
在一些实施例中,该属性分析结果中包括每个目标跟踪序列中每个前景图像的物体类别属性信息,所述属性分析子单元6053具体用于:
利用预设的物体分类神经网络模型,对每个目标跟踪序列中每个前景图像进行物体类别分类,得到每个目标跟踪序列的物体类别属性信息。
在一些实施例中,该前景提取子单元6052具体用于:
将待处理图像中每帧图像转换为单通道灰度图;
提取待处理图像中每帧图像中预设类型的局部特征图;
根据每帧图像的单通道灰度图及每帧图像中的局部特征图,确定待处理图像中每帧图像的前景图像。
在一些实施例中,该前景提取子单元6052具体用于:
将每帧图像的单通道灰度图及每帧图像中的局部特征图,合成每帧图像的有效图像;
利用每帧图像的有效图像与预设的混合高斯模型进行匹配,得到待处理图像中每帧图像的前景图像。
在一些实施例中,该存储单元606具体用于:
为待处理图像中的每个目标跟踪序列分配标识信息;
根据待处理图像中每个目标跟踪序列的标识信息、每个目标跟踪序列的属性信息和每个目标跟踪序列中每个前景图像,得到每个目标跟踪序列的数据;
将每个目标跟踪序列的数据分别保存到预设的结构化的目标属性数据结构中,得到待处理图像中每个目标跟踪序列的结构化图像数据。
在一些实施例中,该获取子单元6051具体用于:
对待处理视频进行关键帧检测,得到待处理视频中的关键帧;
将关键帧作为所述待处理图像。
在一些实施例中,该确定子单元6021具体用于:
获取目标筛选条件中的关键词;
在视频数据库中确定具有与关键词相同属性信息的目标跟踪序列,得到筛选出的目标跟踪序列。
在一些实施例中,该视频数据库中还包括与每个目标跟踪序列对应的多个背景图像,该多个背景图像中每个背景图像对应每个目标跟踪序列中预设帧数的前景图像;该生成单元603具体用于:
获取目标筛选条件中的目标合成密度;
新建与目标合成密度对应的N个合成队列,N为正整数;
将筛选出的目标跟踪序列中的目标跟踪序列平均分配到N个合成队列;
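paste the foreground images of the target tracking sequences in the N synthesis queues, in order, onto the corresponding background images, to generate the video summary.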
具体实施时,以上各个单元可以作为独立的实体来实现,也可以进行任意组合,作为同一或若干个实体来实现,以上各个单元的具体实施可参见前面的方法实施例,在此不再赘述。
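In this embodiment of the present application, the first obtaining unit 601 obtains a target filter condition for generating a video summary; the search unit 602 searches the video database for structured image data according to the target filter condition, to obtain structured image data that satisfies the target filter condition, the structured image data being image data stored in a structured manner; and the generating unit 603 performs video synthesis on the structured image data that satisfies the target filter condition, to generate the video summary. Because the video database holds the structured image data of video images, when a user searches for relevant video the relevant video information can be quickly filtered from the structured image data to generate a video summary. This makes it much easier for the user to locate target content in the video and greatly broadens the application scenarios of video summaries.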
本申请实施例还提供一种计算设备,该计算设备可以是服务器设备,如图7所示,其示出了本申请实施例所涉及的服务器设备的结构示意图,具体来讲:
该服务器设备可以包括一个或者一个以上处理核心的处理器701、一个或一个以上计算机可读存储介质的存储器702、电源703和输入单元704等部件。本领域技术人员可以理解,图7中示出的服务器设备结构并不构成对服务器设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。其中:
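The processor 701 is the control center of the server device. It connects all parts of the server device through various interfaces and lines, and performs the functions of the server device and processes data by running or executing the software programs and/or modules stored in the memory 702 and invoking the data stored in the memory 702, thereby monitoring the server device as a whole. In some embodiments, the processor 701 may include one or more processing cores, and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 701.
The memory 702 may be configured to store software programs and modules, and the processor 701 performs various functional applications and data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area may store data created through use of the server device, and the like. In addition, the memory 702 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 702 may further include a memory controller to provide the processor 701 with access to the memory 702.
The server device further includes a power supply 703 that supplies power to the components. The power supply 703 may be logically connected to the processor 701 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 703 may further include any components such as one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.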
该服务器设备还可包括输入单元704,该输入单元704可用于接收输入的数字或字符信息,以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。
尽管未示出,服务器设备还可以包括显示单元等,在此不再赘述。具体在本实施例中,服务器设备中的处理器701会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行文件加载到存储器702中,并由处理器701来运行存储在存储器702中的应用程序,从而实现各种功能,如下:
获取用于生成视频摘要的目标筛选条件;根据目标筛选条件在视频数据库中查找结构化图像数据,得到符合所述目标筛选条件的结构化图像数据,所述结构化图像数据为结构化存储的图像数据;对所述符合所述目标筛选条件的结构化图像数据进行视频合成,生成视频摘要。
本领域普通技术人员可以理解,上述实施例的各种方法中的全部或部分步骤可以通过指令来完成,或通过指令控制相关的硬件来完成,该指令可以存储于一计算机可读存储介质中,并由处理器进行加载和执行。
为此,本申请实施例提供一种存储介质,其中存储有多条指令,该指令能够被处理器进行加载,以执行本申请实施例所提供的任一种视频摘要生成方法中的步骤。例如,该指令可以执行如下步骤:
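Obtain a target filter condition for generating a video summary; search a video database for structured image data according to the target filter condition, to obtain structured image data that satisfies the target filter condition, the structured image data being image data stored in a structured manner; and perform video synthesis on the structured image data that satisfies the target filter condition, to generate the video summary.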
以上各个操作的具体实施可参见前面的实施例,在此不再赘述。
其中,该存储介质可以包括:只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)、磁盘或光盘等。
由于该存储介质中所存储的指令,可以执行本申请实施例所提供的任一种视频摘要生成方法中的步骤,因此,可以实现本申请实施例所提供的任一种视频摘要生成方法所能实现的有益效果,详见前面的实施例,在此不再赘述。
以上对本申请实施例所提供的一种视频摘要生成方法、装置和存储介质进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。
Claims (18)
- 一种视频摘要生成方法,由服务器设备执行,所述方法包括:获取用于生成视频摘要的目标筛选条件;根据所述目标筛选条件在视频数据库中查找结构化图像数据,得到符合所述目标筛选条件的结构化图像数据,所述结构化图像数据为结构化存储的图像数据;对所述符合所述目标筛选条件的结构化图像数据进行视频合成,生成视频摘要。
- 根据权利要求1所述的视频摘要生成方法,其中,所述根据所述目标筛选条件在视频数据库中查找结构化图像数据,得到符合所述目标筛选条件的结构化图像数据,包括:根据所述目标筛选条件在视频数据库中确定筛选出的目标跟踪序列,所述视频数据库中保存有以目标跟踪序列为单位的结构化图像数据;从所述视频数据库中获取所述筛选出的目标跟踪序列对应的结构化图像数据,以得到符合所述目标筛选条件的结构化图像数据。
- 根据权利要求2所述的视频摘要生成方法,其中,在所述获取用于生成视频摘要的目标筛选条件之前,所述方法还包括:获取待处理视频;对所述待处理视频进行属性分析,以从所述待处理视频中确定目标跟踪序列,并获取所述待处理视频中各目标跟踪序列的结构化图像数据;将所述各目标跟踪序列的结构化图像数据存储在所述视频数据库中。
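- The video summary generation method according to claim 3, wherein the performing attribute analysis on the to-be-processed video, to determine target tracking sequences from the to-be-processed video and obtain the structured image data of each target tracking sequence in the to-be-processed video, comprises: obtaining to-be-processed images in the to-be-processed video; performing foreground extraction on the to-be-processed images, to obtain a foreground image of each frame of the to-be-processed images; performing attribute analysis using the foreground images of the to-be-processed images, to obtain an attribute analysis result of each target tracking sequence in the to-be-processed images; and saving the attribute analysis result into a preset structured target attribute data structure, to obtain the structured image data of each target tracking sequence.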
- 根据权利要求4所述的视频摘要生成方法,其中,所述利用所述待处理图像的前景图像进行属性分析,以获取所述待处理图像中各目标跟踪序列的属性分析结果,包括:对所述待处理图像的前景图像进行目标跟踪,以获取所述待处理图像中的目标跟踪序列;对每个目标跟踪序列进行属性分析,以获取所述属性分析结果,所述属性分析结果中包括所述目标跟踪序列的属性信息。
- 根据权利要求5所述的视频摘要生成方法,其中,所述属性分析结果中包括每个目标跟踪序列的颜色属性信息,所述对每个目标跟踪序列进行属性分析,以获取所述属性分析结果,包括:根据预设的像素值与颜色的映射关系,确定每个目标跟踪序列中每个前景图像的像素点对应的颜色;根据所述每个前景图像的像素点对应的颜色,统计各颜色在每个前景图像中对应的像素点个数;根据各颜色在每个前景图像中对应的像素点个数,确定每个目标跟踪序列的颜色属性信息。
- 根据权利要求6所述的视频摘要生成方法,其中,所述根据各颜色在每个前景图像中对应的像素点个数,确定每个目标跟踪序列的颜色属性信息,包括:对每个目标跟踪序列,若目标颜色在目标前景图像中对应的像素点的个数达到所述目标前景图像中总像素点个数的第一预设比例,则确定该目标前景图像具有所述目标颜色的颜色属性;若目标跟踪序列中第二预设比例的前景图像具有所述目标颜色的颜色属性,则确定该目标跟踪序列具有所述目标颜色的颜色属性。
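- The video summary generation method according to claim 5, wherein the attribute analysis result includes object category attribute information of each foreground image in each target tracking sequence, and the performing attribute analysis on each target tracking sequence to obtain the attribute analysis result comprises: classifying the object category of each foreground image in each target tracking sequence using a preset object classification neural network model, to obtain object category attribute information of each target tracking sequence.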
- 根据权利要求5所述的视频摘要生成方法,其中,所述属性分析结果中包括每个目标跟踪序列中每个前景图像的目标方向轨迹属性信息,所述对每个目标跟踪序列进行属性分析,以获取所述属性分析结果,包括:根据每个目标跟踪序列中各前景图像中目标的位置信息,确定目标的轨迹方向角。
- 根据权利要求5所述的方法,其中,所述将所述属性分析结果保存到预设的结构化的目标属性数据结构中,得到所述各目标跟踪序列的结构化图像数据,包括:为所述待处理图像中的每个目标跟踪序列分配标识信息;根据所述待处理图像中每个目标跟踪序列的标识信息、每个目标跟踪序列的属性信息和所述每个目标跟踪序列中每个前景图像,得到每个目标跟踪序列的数据;将每个目标跟踪序列的数据分别保存到预设的结构化的目标属性数据结构中,得到待处理图像中的所述各目标跟踪序列的结构化图像数据。
- 根据权利要求4所述的视频摘要生成方法,其中,所述对所述待处理图像进行前景提取,得到所述待处理图像中每帧图像的前景图像,包括:将所述待处理图像中每帧图像转换为单通道灰度图;提取所述待处理图像中每帧图像中预设类型的局部特征图;根据所述每帧图像的单通道灰度图及每帧图像中的局部特征图,确定所述待处理图像中每帧图像的前景图像。
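- The video summary generation method according to claim 11, wherein the determining the foreground image of each frame of the to-be-processed images according to the single-channel grayscale image of each frame and the local feature map of each frame comprises: combining the single-channel grayscale image and the local feature map of each frame into an effective image of that frame; and matching the effective image of each frame against a preset Gaussian mixture model, to obtain the foreground image of each frame of the to-be-processed images.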
- 根据权利要求4所述的视频摘要生成方法,其中,所述获取所述待处理视频中的待处理图像,包括:对所述待处理视频进行关键帧检测,得到所述待处理视频中的关键帧;将所述关键帧作为所述待处理图像。
- 根据权利要求2所述的视频摘要生成方法,其中,所述目标跟踪序列的数据包括所述目标跟踪序列的属性信息;其中,所述根据所述目标筛选条件在视频数据库中确定筛选出的目标跟踪序列,包括:获取所述目标筛选条件中的关键词;在所述视频数据库中确定具有与所述关键词相同属性信息的目标跟踪序列,得到所述筛选出的目标跟踪序列。
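- The video summary generation method according to claim 2, wherein the video database further includes a plurality of background images corresponding to each target tracking sequence, each of the plurality of background images corresponding to a preset number of foreground image frames in each target tracking sequence; and the performing video synthesis on the structured image data that satisfies the target filter condition to generate the video summary comprises: obtaining a target synthesis density in the target filter condition; creating N synthesis queues corresponding to the target synthesis density, N being a positive integer; distributing the filtered target tracking sequences evenly among the N synthesis queues; and pasting the foreground images of the target tracking sequences in the N synthesis queues, in order, onto the corresponding background images, to generate the video summary.
- A video summary generation apparatus, comprising: a first obtaining unit, configured to obtain a target filter condition for generating a video summary; a search unit, configured to search a video database for structured image data according to the target filter condition, to obtain structured image data that satisfies the target filter condition, the structured image data being image data stored in a structured manner; and a generating unit, configured to perform video synthesis on the structured image data that satisfies the target filter condition, to generate the video summary.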
- 一种计算设备,包括处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置执行权利要求1-15任一项所述的视频摘要生成方法。
- 一种存储介质,所述存储介质存储有多条指令,所述指令适于处理器进行加载,以执行权利要求1至15任一项所述视频摘要生成方法。
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP19852152.8A EP3843418B1 (en) | 2018-08-21 | 2019-08-09 | Video abstract generating method and apparatus, computing device, and storage medium |
| US17/016,638 US11347792B2 (en) | 2018-08-21 | 2020-09-10 | Video abstract generating method, apparatus, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810955587.XA CN110166851B (zh) | 2018-08-21 | 2018-08-21 | 一种视频摘要生成方法、装置和存储介质 |
| CN201810955587.X | 2018-08-21 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/016,638 Continuation US11347792B2 (en) | 2018-08-21 | 2020-09-10 | Video abstract generating method, apparatus, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020038243A1 true WO2020038243A1 (zh) | 2020-02-27 |
Family
ID=67645152
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2019/100051 Ceased WO2020038243A1 (zh) | 2018-08-21 | 2019-08-09 | 一种视频摘要生成方法、装置、计算设备和存储介质 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11347792B2 (zh) |
| EP (1) | EP3843418B1 (zh) |
| CN (1) | CN110166851B (zh) |
| WO (1) | WO2020038243A1 (zh) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114394100A (zh) * | 2022-01-12 | 2022-04-26 | 深圳力维智联技术有限公司 | 一种无人巡逻车控制系统及无人车 |
| CN114596516A (zh) * | 2020-12-04 | 2022-06-07 | 北京三星通信技术研究有限公司 | 目标跟踪方法、装置、电子设备及计算机可读存储介质 |
| CN116244364A (zh) * | 2023-02-24 | 2023-06-09 | 浪潮软件集团有限公司 | 一种信创平台的视频图像数据动态转换数据结构的方法 |
| US12093414B1 (en) * | 2019-12-09 | 2024-09-17 | Amazon Technologies, Inc. | Efficient detection of in-memory data accesses and context information |
| US12456205B2 (en) | 2020-12-04 | 2025-10-28 | Samsung Electronics Co., Ltd. | Method and apparatus with object tracking using dynamic field of view |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113569605B (zh) * | 2021-01-17 | 2024-07-16 | 南京大学 | 视频信息处理方法、装置、电子设备及存储介质 |
| US12432362B2 (en) | 2021-09-02 | 2025-09-30 | Samsung Electronics Co., Ltd. | Encoding and decoding video data |
| CN114613355B (zh) * | 2022-04-07 | 2023-07-14 | 抖音视界有限公司 | 视频处理方法、装置、可读介质及电子设备 |
| CN114973072B (zh) * | 2022-05-12 | 2025-09-02 | 桂林电子科技大学 | 一种基于cnn的视频关键帧提取方法及系统 |
| CN115633222B (zh) * | 2022-09-30 | 2025-02-11 | 北京达佳互联信息技术有限公司 | 视频生成方法、装置、电子设备及存储介质 |
| CN116994176B (zh) * | 2023-07-18 | 2025-07-15 | 西北工业大学 | 一种基于多维语义信息的视频关键数据提取方法 |
| CN116701707B (zh) * | 2023-08-08 | 2023-11-10 | 成都市青羊大数据有限责任公司 | 一种教育大数据管理系统 |
| CN117061189B (zh) * | 2023-08-26 | 2024-01-30 | 上海六坊信息科技有限公司 | 一种基于数据加密的数据包传输方法及系统 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103617234A (zh) * | 2013-11-26 | 2014-03-05 | 公安部第三研究所 | 主动式视频浓缩装置及方法 |
| US20150104149A1 (en) * | 2013-10-15 | 2015-04-16 | Electronics And Telecommunications Research Institute | Video summary apparatus and method |
| CN104754248A (zh) * | 2013-12-30 | 2015-07-01 | 浙江大华技术股份有限公司 | 一种获取目标快照的方法及装置 |
| CN106354816A (zh) * | 2016-08-30 | 2017-01-25 | 东软集团股份有限公司 | 一种视频图像处理方法及装置 |
| CN106937120A (zh) * | 2015-12-29 | 2017-07-07 | 北京大唐高鸿数据网络技术有限公司 | 基于对象的监控视频浓缩方法 |
| CN107291910A (zh) * | 2017-06-26 | 2017-10-24 | 图麟信息科技(深圳)有限公司 | 一种视频片段结构化查询方法、装置及电子设备 |
Family Cites Families (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6965379B2 (en) * | 2001-05-08 | 2005-11-15 | Koninklijke Philips Electronics N.V. | N-view synthesis from monocular video of certain broadcast and stored mass media content |
| US7113185B2 (en) * | 2002-11-14 | 2006-09-26 | Microsoft Corporation | System and method for automatically learning flexible sprites in video layers |
| WO2004105035A1 (en) * | 2003-05-26 | 2004-12-02 | Koninklijke Philips Electronics N.V. | System and method for generating audio-visual summaries for audio-visual program content |
| CN101561932B (zh) * | 2009-05-12 | 2012-01-11 | 北京交通大学 | 一种动态复杂背景下的实时运动目标检测方法和装置 |
| US20120173577A1 (en) * | 2010-12-30 | 2012-07-05 | Pelco Inc. | Searching recorded video |
| CN102708182B (zh) * | 2012-05-08 | 2014-07-02 | 浙江捷尚视觉科技有限公司 | 一种快速视频浓缩摘要方法 |
| CN102930061B (zh) * | 2012-11-28 | 2016-01-06 | 安徽水天信息科技有限公司 | 一种基于运动目标检测的视频摘要方法 |
| WO2016018796A1 (en) * | 2014-07-28 | 2016-02-04 | Flir Systems, Inc. | Systems and methods for video synopses |
| CN104244113B (zh) * | 2014-10-08 | 2017-09-22 | 中国科学院自动化研究所 | 一种基于深度学习技术的视频摘要生成方法 |
| CN104394353B (zh) * | 2014-10-14 | 2018-03-09 | 浙江宇视科技有限公司 | 视频浓缩方法及装置 |
| CN104581437B (zh) * | 2014-12-26 | 2018-11-06 | 中通服公众信息产业股份有限公司 | 一种视频摘要生成及视频回溯的方法及系统 |
| CN104717573B (zh) * | 2015-03-05 | 2018-04-13 | 广州市维安电子技术有限公司 | 一种视频摘要的生成方法 |
| CN105187801B (zh) * | 2015-09-17 | 2021-07-27 | 桂林远望智能通信科技有限公司 | 一种摘要视频的生成系统及方法 |
| CN105262932B (zh) * | 2015-10-20 | 2018-06-29 | 深圳市华尊科技股份有限公司 | 一种视频处理的方法及终端 |
| US10728194B2 (en) * | 2015-12-28 | 2020-07-28 | Facebook, Inc. | Systems and methods to selectively combine video streams |
| CN106101578B (zh) * | 2016-06-27 | 2019-08-16 | 上海小蚁科技有限公司 | 图像合成方法和设备 |
| KR102588524B1 (ko) * | 2016-08-01 | 2023-10-13 | 삼성전자주식회사 | 전자 장치 및 그의 동작 방법 |
| US10720182B2 (en) * | 2017-03-02 | 2020-07-21 | Ricoh Company, Ltd. | Decomposition of a video stream into salient fragments |
| CN107943837B (zh) * | 2017-10-27 | 2022-09-30 | 江苏理工学院 | 一种前景目标关键帧化的视频摘要生成方法 |
-
2018
- 2018-08-21 CN CN201810955587.XA patent/CN110166851B/zh active Active
-
2019
- 2019-08-09 EP EP19852152.8A patent/EP3843418B1/en active Active
- 2019-08-09 WO PCT/CN2019/100051 patent/WO2020038243A1/zh not_active Ceased
-
2020
- 2020-09-10 US US17/016,638 patent/US11347792B2/en active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3843418A4 * |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12093414B1 (en) * | 2019-12-09 | 2024-09-17 | Amazon Technologies, Inc. | Efficient detection of in-memory data accesses and context information |
| CN114596516A (zh) * | 2020-12-04 | 2022-06-07 | 北京三星通信技术研究有限公司 | 目标跟踪方法、装置、电子设备及计算机可读存储介质 |
| US12456205B2 (en) | 2020-12-04 | 2025-10-28 | Samsung Electronics Co., Ltd. | Method and apparatus with object tracking using dynamic field of view |
| CN114394100A (zh) * | 2022-01-12 | 2022-04-26 | 深圳力维智联技术有限公司 | 一种无人巡逻车控制系统及无人车 |
| CN114394100B (zh) * | 2022-01-12 | 2024-04-05 | 深圳力维智联技术有限公司 | 一种无人巡逻车控制系统及无人车 |
| CN116244364A (zh) * | 2023-02-24 | 2023-06-09 | 浪潮软件集团有限公司 | 一种信创平台的视频图像数据动态转换数据结构的方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| US11347792B2 (en) | 2022-05-31 |
| CN110166851B (zh) | 2022-01-04 |
| CN110166851A (zh) | 2019-08-23 |
| US20200409996A1 (en) | 2020-12-31 |
| EP3843418A4 (en) | 2021-10-13 |
| EP3843418A1 (en) | 2021-06-30 |
| EP3843418B1 (en) | 2024-06-05 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19852152; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2019852152; Country of ref document: EP; Effective date: 20210322 |
