WO2016143067A1 - Image analysis device - Google Patents
Image analysis device
- Publication number
- WO2016143067A1 (PCT/JP2015/056984)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- moving object
- video
- data
- block
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Definitions
- the present invention relates to a technique for detecting a moving object shown in an image.
- Video data obtained from a surveillance camera has a very large data size in an uncompressed state. For this reason, the hard disk capacity required for storing video data increases. Therefore, video data is encoded by an encoding algorithm such as MPEG-4 or H.264. Then, video data whose data size has been reduced by encoding is stored in the hard disk.
- the observer detects the moving object shown in the video by viewing the reproduced video.
- People, animals, and automobiles are examples of moving objects.
- When video data is accumulated over a long period of time, it is not practical to detect a moving object visually.
- Patent Document 1 discloses a technique in which an arithmetic device performs decoding processing for decoding encoded video data and image analysis processing for searching for a monitoring target.
- In this technique, a single arithmetic device performs both the decoding processing for the encoded video data and the image analysis processing for searching for the monitoring target.
- For this reason, the search for the monitoring target is inefficient.
- In addition, because the amount of image analysis processing is very large, a highly functional arithmetic device is required.
- An object of the present invention is to make it possible to specify a moving object block showing moving objects from video data.
- The video analysis apparatus of the present invention includes: a parameter acquisition unit that acquires a discrete cosine transform coefficient for each frequency component for each block of a plurality of blocks obtained by dividing the video represented by the video data; and a moving object block specifying unit that specifies, from the plurality of blocks, a block in which the discrete cosine transform coefficient of at least one high-frequency component, that is, a frequency component higher than a frequency threshold, is not zero, as a moving object block in which a moving object is shown.
- a moving object block showing a moving object can be specified from video data.
- FIG. 1 is a configuration diagram of a monitoring system 100 according to Embodiment 1.
- Functional configuration diagram of the accumulation server 120 in Embodiment 1.
- Functional configuration diagram of the monitoring camera 110 in Embodiment 1.
- Functional configuration diagram of the video encoding device 200 in Embodiment 1.
- Functional configuration diagram of the video analysis device 300 in Embodiment 1.
- Functional configuration diagram of the analysis execution determination unit 320 in Embodiment 1.
- Functional configuration diagram of the moving object region detection unit 330 in Embodiment 1.
- Functional configuration diagram of the target object detection unit 340 in Embodiment 1.
- Configuration diagram of the encoding parameter 133 in Embodiment 1.
- Relationship diagram between the monitoring camera 110 and the subject 102 in Embodiment 1.
- Relationship diagram between the area of the video 103 and the subject area 104 in Embodiment 1.
- Hardware configuration diagram of the video analysis device 300 and the video encoding device 200 in Embodiment 1.
- Flowchart of the video encoding method in Embodiment 1.
- Flowchart of the video analysis method in Embodiment 1.
- Flowchart of the analysis execution determination processing (S230) in Embodiment 1.
- Configuration diagram of the monitoring system 100 in Embodiment 2.
- Functional configuration diagram of the monitoring camera 110 and the model server 150 in Embodiment 2.
- Embodiment 1. A monitoring system 100 for monitoring a moving object will be described with reference to the figures.
- the monitoring system 100 is a system for monitoring the monitoring space 101.
- the monitoring system 100 includes a monitoring camera 110 and a storage server 120.
- the monitoring camera 110 is installed obliquely above the monitoring space 101, continuously images the monitoring space 101, and generates video data for each shooting. Then, the monitoring camera 110 generates encoded data 130 and analysis result data 140 using the video data for each shooting.
- the accumulation server 120 accumulates the encoded data 130 and the analysis result data 140 for each shooting.
- the video data represents a video showing the monitoring space 101.
- An image includes a plurality of pixels, and each pixel has color information.
- a video consists of continuous images, and the images are also called frames.
- Video data is also referred to as image data or frame data.
- the color information is expressed in RGB (Red, Green, Blue) format.
- the encoded data 130 represents an encoded video.
- the analysis result data 140 includes information related to the object shown in the video of the monitoring space 101.
- the target object is a moving object to be detected.
- the accumulation server 120 includes a data receiving unit 121, an encoded data management unit 122, an analysis result data management unit 123, and a server storage unit 129.
- the data receiving unit 121 receives the encoded data 130 and the analysis result data 140 transmitted from the monitoring camera 110.
- the encoded data management unit 122 accumulates the encoded data 130 in the server storage unit 129 in time series.
- the analysis result data management unit 123 accumulates the analysis result data 140 in the server storage unit 129 in time series.
- the server storage unit 129 accumulates the encoded data 130 and the analysis result data 140 in time series.
- the analysis result data management unit 123 may generate an analysis result table in which the analysis result data 140 is associated with the encoded data 130 and store the analysis result table in the server storage unit 129.
- the supervisor can search the analysis result data 140 of the video represented by the encoded data 130 by referring to the analysis result table.
- the monitoring camera 110 includes a data reception unit 111, a video encoding device 200, a video analysis device 300, a data transmission unit 112, an encoded data storage unit 113, and a model storage unit 114.
- the monitoring camera 110 includes an imaging device, which is hardware.
- the imaging device is also called an image sensor.
- the data receiving unit 111 receives video data 119 output from the image sensor.
- the video encoding device 200 generates encoded data 130, DCT coefficient data 131, and motion vector data 132 using the video data 119.
- DCT is an abbreviation for discrete cosine transform.
- the DCT coefficient data 131 and the motion vector data 132 will be described later.
- the video analysis device 300 generates the analysis result data 140 using the video data 119, the DCT coefficient data 131, the motion vector data 132, and the model database 400.
- the model database 400 will be described later.
- the encoded data storage unit 113 stores encoded data 130, DCT coefficient data 131, and motion vector data 132.
- the DCT coefficient data 131 includes discrete cosine transform coefficients for each frequency component for each block of a plurality of blocks obtained by dividing the video represented by the video data 119.
- a block is also called a macroblock.
- the frequency component is also called a spatial frequency component.
- the motion vector data 132 includes a motion vector for each block of a plurality of blocks obtained by dividing the video represented by the video data 119. The motion vector represents the motion of the subject shown in the block.
- the model storage unit 114 stores the model database 400.
- the model database 400 associates different object models with each of a plurality of divided regions obtained by dividing the video represented by the video data 119 in the vertical direction.
- the object model is data representing the characteristics of the object.
- the vertical direction means the vertical direction of the image.
- Each object model included in the model database 400 includes size information indicating a size range.
- Each object model included in the model database 400 includes feature information representing features other than size.
- the video encoding device 200 includes a video data reception unit 210, a motion vector calculation unit 220, a motion compensation prediction unit 230, a difference calculation unit 231, a superimposition unit 232, and a previous data storage unit 290.
- the video encoding apparatus 200 includes a DCT unit 240, a quantization unit 241, an entropy encoding unit 242, an inverse quantization unit 243, and an inverse DCT unit 244.
- the video encoding device 200 includes an encoded data management unit 250.
- the video data receiving unit 210 receives the video data 119.
- the motion vector calculation unit 220 generates motion vector data 132 using the video data 119 and the previous data 291.
- the previous data 291 represents the previous video.
- the motion compensation prediction unit 230 generates prediction data 239 using the motion vector data 132 and the previous data 291.
- the prediction data 239 is video data generated by prediction.
- the difference calculation unit 231 generates difference data 238 using the video data 119 and the prediction data 239.
- the difference data 238 represents the difference between the current video represented by the video data 119 and the predicted video represented by the prediction data 239.
- the superimposition unit 232 uses the prediction data 239 and the difference data 247 to generate video data representing the current video.
- the generated video data is used as the previous data 291 when the next video data 119 is received.
- the previous data storage unit 290 stores the previous data 291.
- the DCT unit 240 performs discrete cosine transform (DCT) on the difference data 238 to generate DCT coefficient data 131.
- the quantization unit 241 performs quantization on the DCT coefficient data 131 and generates quantization coefficient data 249.
- the quantization coefficient data 249 includes a quantization coefficient for each frequency component for each video block.
- the quantization coefficient is a quantized DCT coefficient.
- the entropy encoding unit 242 performs entropy encoding on the quantized coefficient data 249 to generate encoded data 130.
- the inverse quantization unit 243 performs inverse quantization on the quantized coefficient data 249 to generate DCT coefficient data 248.
- the DCT coefficient data 248 includes DCT coefficients for each frequency component.
- the inverse DCT unit 244 performs inverse DCT on the DCT coefficient data 248 to generate difference data 247.
- the difference data 247 represents the difference between the current video and the predicted video.
- the encoded data management unit 250 stores the encoded data 130, the DCT coefficient data 131, and the motion vector data 132 in the encoded data storage unit 113 of the monitoring camera 110. After storing these data in the encoded data storage unit 113, the encoded data management unit 250 outputs a completion notification.
- the video analysis device 300 includes a video data reception unit 310, a completion notification reception unit 311, an analysis execution determination unit 320, and a video data storage unit 390.
- the video analysis device 300 includes a moving object region detection unit 330, a target object detection unit 340, a color information generation unit 350, and an analysis result generation unit 360.
- the video data receiving unit 310 receives the video data 119.
- Video data representing the first video is referred to as first video data, and video data representing the second video is referred to as second video data.
- the second video is a video shot at a time different from the time when the first video was shot.
- the first video data is the video data 119 and the second video data is the previous data 118.
- the previous data 118 is the video data 119 accepted last time.
- the video data storage unit 390 stores video data 119 and previous data 118.
- the completion notification receiving unit 311 receives a completion notification output from the video encoding device 200.
- the analysis execution determination unit 320 determines whether to perform video analysis based on the video data 119 and the previous data 118. When it is determined that the video analysis is to be performed, the moving object region detection unit 330, the target object detection unit 340, the color information generation unit 350, and the analysis result generation unit 360 operate.
- the moving object region detection unit 330 uses the video data 119 to generate moving object region data 339 representing the moving object region.
- the moving object area is an area in which a moving object is shown in the video represented by the video data 119.
- the target object detection unit 340 generates the target object detection data 349 using the moving object region data 339.
- the object detection data 349 includes area information indicating the range of the object area.
- the target object area is a moving object area in which the target object is shown.
- the color information generation unit 350 generates color information data 359 using the object detection data 349.
- the color information data 359 includes color information representing the characteristics of the target object.
- color information is generated based on the color information of each pixel constituting the moving object region.
- the analysis result generation unit 360 generates the analysis result data 140 using the object detection data 349 and the color information data 359.
- the analysis result data 140 includes area information indicating the range of the object area and color information indicating the color characteristics of the object.
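- As an illustration of how color information could be derived from the pixels of a moving object region, the following is a minimal sketch. It assumes that the color feature is the mean RGB value of the region; the source only states that the color information is based on the color information of each pixel, so the averaging and the function name `mean_color` are assumptions made for this example.

```python
def mean_color(frame, left, top, width, height):
    """Average the RGB values of every pixel inside a rectangular region.

    frame is a list of rows, and each pixel is an (R, G, B) tuple.
    Using the mean as the color feature is an assumption of this sketch.
    """
    totals = [0, 0, 0]
    for y in range(top, top + height):
        for x in range(left, left + width):
            for channel, value in enumerate(frame[y][x]):
                totals[channel] += value
    pixel_count = width * height
    return tuple(total / pixel_count for total in totals)


frame = [
    [(255, 0, 0), (0, 0, 0)],
    [(0, 255, 0), (255, 255, 0)],
]
region_color = mean_color(frame, left=0, top=0, width=2, height=2)
```

- A histogram or a dominant color would serve the same role; the mean is used here only because it is the shortest feature to compute.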
- the analysis execution determination unit 320 includes a change amount calculation unit 321, a change amount determination unit 322, and an analysis instruction unit 323.
- the change amount calculation unit 321 calculates the change amount 329 between the first video and the second video based on the pixel value of each pixel constituting the first video and the pixel value of each pixel constituting the second video.
- the change amount determination unit 322 determines whether the change amount 329 is larger than the change amount threshold.
- the change amount threshold is set by the user.
- the analysis instruction unit 323 instructs the moving object region detection unit 330 to start video analysis.
- the moving object region detection unit 330 includes a parameter acquisition unit 331, a moving object block specification unit 332, and a moving object region specification unit 333.
- the parameter acquisition unit 331 acquires DCT coefficient data 131 and motion vector data 132.
- the moving object block specifying unit 332 generates moving object block data 338 using the DCT coefficient data 131 and the motion vector data 132.
- the moving object block data 338 indicates the position of the moving object block in which the moving object is shown.
- the moving object block is a block specified from a plurality of blocks obtained by dividing the current video. In the first embodiment, one or more moving object blocks are specified.
- the block specified as the moving object block is a block in which the discrete cosine transform coefficient of the frequency component of at least one of the high frequency components is not zero.
- the high frequency is a frequency higher than the frequency threshold.
- the frequency threshold is set by the user.
- a block identified as a moving object block is a block in which the discrete cosine transform coefficient of at least one of the high frequency components is not zero and the magnitude of the motion vector is equal to or greater than the motion threshold.
- the motion threshold is set by the user.
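- The block criterion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that each block's DCT coefficients are an 8 x 8 list indexed as `coeffs[v][u]`, that the "frequency" of a component is measured as `u + v`, and that motion vectors are `(dx, dy)` pairs. All function names are hypothetical.

```python
import math


def has_high_frequency_component(coeffs, freq_threshold):
    """True if any coefficient above the frequency threshold is nonzero.

    Measuring frequency as u + v is an assumption of this sketch.
    """
    for v, row in enumerate(coeffs):
        for u, value in enumerate(row):
            if u + v > freq_threshold and value != 0:
                return True
    return False


def is_moving_object_block(coeffs, motion_vector, freq_threshold, motion_threshold):
    """Combine the DCT condition with the motion vector magnitude condition."""
    dx, dy = motion_vector
    return (has_high_frequency_component(coeffs, freq_threshold)
            and math.hypot(dx, dy) >= motion_threshold)


def specify_moving_object_blocks(dct_data, mv_data, freq_threshold, motion_threshold):
    """dct_data and mv_data are dicts keyed by (block_row, block_col)."""
    return sorted(
        position for position, coeffs in dct_data.items()
        if is_moving_object_block(coeffs, mv_data[position],
                                  freq_threshold, motion_threshold)
    )


flat = [[0] * 8 for _ in range(8)]
flat[0][0] = 50                      # only the DC component is nonzero
edge = [[0] * 8 for _ in range(8)]
edge[5][6] = 3                       # one high-frequency component is nonzero
dct_data = {(0, 0): flat, (0, 1): edge, (1, 1): edge}
mv_data = {(0, 0): (4, 0), (0, 1): (3, 4), (1, 1): (0, 1)}
moving_blocks = specify_moving_object_blocks(dct_data, mv_data, 6, 2.0)
```

- Because both inputs already exist as by-products of encoding, this check needs no decoding and no pixel-level analysis.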
- the moving object region specifying unit 333 generates moving object region data 339 indicating the moving object region using the moving object block data 338.
- the moving object area is an area specified from the current video based on the block positions of one or more moving object blocks.
- the area specified as the moving object area is an area including at least one moving object block.
- a first moving object region that includes only the first moving object block is the first moving object block itself.
- a first moving object region that includes the first moving object block and the second moving object block is a rectangular region containing both blocks.
- a first moving object region that includes the first moving object block, the second moving object block, and the third moving object block is a rectangular region containing all three blocks.
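- The rectangular region containing a group of moving object blocks can be sketched as a bounding-box computation. The sketch assumes that blocks are identified by `(block_row, block_col)` indices and that a block is 16 x 16 pixels; how blocks are grouped into one region is not covered here, and the function name is hypothetical.

```python
def moving_object_region(blocks, block_size=16):
    """Return (left, top, width, height) in pixels of the smallest rectangle
    that contains every block in `blocks`, given as (block_row, block_col)."""
    if not blocks:
        raise ValueError("at least one moving object block is required")
    rows = [row for row, _ in blocks]
    cols = [col for _, col in blocks]
    top, left = min(rows), min(cols)
    height = (max(rows) - top + 1) * block_size
    width = (max(cols) - left + 1) * block_size
    return (left * block_size, top * block_size, width, height)


single = moving_object_region([(2, 3)])
merged = moving_object_region([(2, 3), (2, 4), (3, 4)])
```

- With a single block the region is the block itself; with three blocks arranged in an L shape the region grows to the enclosing 2 x 2 block rectangle.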
- the target object detection unit 340 includes a model acquisition unit 341 and a moving object determination unit 342.
- the model acquisition unit 341 acquires the matching model 348 from the model database 400 based on the moving object region data 339.
- the collation model 348 is an object model associated with a divided area to which the moving object area belongs, among the object models included in the model database 400.
- the moving object determination unit 342 determines whether the moving object shown in the moving object area is the target object based on the moving object area data 339 and the matching model 348.
- the moving object determination unit 342 determines that the moving object shown in the moving object region is the target object on the condition that the size of the moving object region is included in the range indicated by the size information included in the matching model 348.
- the moving object determination unit 342 determines that the moving object shown in the moving object region is the target object on the condition that the similarity between the feature represented by the feature information included in the matching model 348 and the feature of the moving object region is higher than the similarity threshold.
- the similarity threshold is set by the user.
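- The determination conditions above can be sketched as follows. The sketch assumes that both the size condition and the similarity condition must hold, and that similarity is the cosine similarity between feature vectors; the source does not specify the similarity measure, so that choice, the `ObjectModel` class, and the function names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectModel:
    object_id: str
    min_size: tuple   # (min_width, min_height) in pixels
    max_size: tuple   # (max_width, max_height) in pixels
    features: list    # feature vector learned for the object


def cosine_similarity(a, b):
    """Cosine similarity; chosen here only as an example measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_target(region_size, region_features, model, similarity_threshold):
    """True if the region fits the model's size range and resembles its features."""
    width, height = region_size
    in_size_range = (model.min_size[0] <= width <= model.max_size[0]
                     and model.min_size[1] <= height <= model.max_size[1])
    similar = cosine_similarity(region_features, model.features) > similarity_threshold
    return in_size_range and similar


person = ObjectModel("person", (16, 32), (64, 128), [1.0, 0.0, 1.0])
detected = is_target((32, 64), [0.9, 0.1, 1.1], person, 0.9)
```

- Checking the size range first is cheap and can reject most regions before any feature comparison is needed.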
- the encoding parameter 133 includes motion vector data 132 and DCT coefficient data 131.
- the motion vector data 132 includes a motion vector of each macro block obtained by dividing the video.
- One macroblock is an area of n × n pixels.
- For example, one macroblock is an area of 16 × 16 pixels.
- the DCT coefficient data 131 includes DCT coefficients of 8 × 8 frequency components for each macroblock.
- the relationship between the monitoring camera 110 and the subject 102 will be described.
- the subjects 102A to 102D appear larger as they are closer to the monitoring camera 110, and appear smaller as they are farther from the monitoring camera 110.
- the subject 102A, the subject 102B, the subject 102C, and the subject 102D appear larger in this order.
- the area of the video 103 will be described.
- the vertical direction means the vertical direction of the video 103.
- the video 103 is divided into three divided areas. Of the three divided areas, the lower divided area is the first area, the middle divided area is the second area, and the upper divided area is the third area.
- the lower a divided area is, the larger its vertical size.
- each of the subject areas 104A to 104E is a rectangular area in which the subject is shown.
- the subject area 104A and the subject area 104B in the first area are large, and the subject area 104E in the third area is small.
- the subject area 104C extends over the first area and the second area.
- the model database 400 includes a first area database 410, a second area database 420, and a third area database 430.
- the first area database 410 is a model database for the first area, the second area database 420 is a model database for the second area, and the third area database 430 is a model database for the third area.
- Each database includes object models of a plurality of objects.
- the first area database 410 includes a first object model 411 and a second object model 412, and the second area database 420 includes a first object model 421 and a second object model 422.
- the object model is a learning model constructed by machine learning.
- Each object model includes size information, feature information, and an object identifier.
- the object identifier identifies the object.
- the type and name of the target are examples of the target identifier.
- the first object model 411 includes size information 411A, feature information 411B, and an object identifier 411C. The same applies to the other object models.
- the size information indicates the range of the size (width, height) of the rectangular area in which the object is reflected.
- the feature information indicates a feature amount representing the feature of the target object, excluding the size of the target object. Shape and brightness are examples of the characteristics of the object. Persons, animals, and cars are examples of objects.
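- Selecting the area database for a moving object region can be sketched as a lookup by vertical position. The sketch assumes that y coordinates grow downward, that a region belongs to the divided area containing its vertical center (the source does not say how a region spanning two areas is assigned), and that areas are numbered from the bottom as in the description. The boundary values and names are hypothetical.

```python
from bisect import bisect_right


def divided_area_number(region_top, region_height, boundaries):
    """Return 1 for the lowest divided area, 2 for the next one up, and so on.

    boundaries holds the ascending y coordinates that separate the areas.
    Assigning by the region's vertical center is an assumption of this sketch.
    """
    center_y = region_top + region_height / 2
    areas_above = bisect_right(boundaries, center_y)
    return len(boundaries) + 1 - areas_above


def matching_models(region_top, region_height, boundaries, model_database):
    """Look up the object models associated with the region's divided area."""
    return model_database[divided_area_number(region_top, region_height, boundaries)]


# A 480-pixel-high frame: third area 0-100, second area 100-250, first area 250-480.
boundaries = [100, 250]
model_database = {
    1: ["large person model"],
    2: ["medium person model"],
    3: ["small person model"],
}
models_for_far_subject = matching_models(20, 40, boundaries, model_database)
```

- Keeping a separate set of models per divided area means that each region is compared only against models whose size range suits its distance from the camera.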
- the video analysis device 300 is a computer including hardware such as a processor 901, an auxiliary storage device 902, a memory 903, a communication device 904, an input interface 905, and an output interface 906.
- the processor 901 is connected to other hardware via a signal line 910.
- the input interface 905 is connected to the input device 907 via a cable 911.
- the output interface 906 is connected to the output device 908 via the cable 912.
- the processor 901 is an IC that performs processing, and controls other hardware.
- An example of the processor 901 is a CPU, DSP, or GPU.
- IC is an abbreviation for Integrated Circuit.
- CPU is an abbreviation for Central Processing Unit, DSP is an abbreviation for Digital Signal Processor, and GPU is an abbreviation for Graphics Processing Unit.
- the auxiliary storage device 902 stores data.
- An example of the auxiliary storage device 902 is a ROM, a flash memory, and an HDD.
- ROM is an abbreviation for Read Only Memory, and HDD is an abbreviation for Hard Disk Drive.
- the memory 903 stores data.
- An example of the memory 903 is a RAM. RAM is an abbreviation for Random Access Memory.
- the communication device 904 includes a receiver 9041 that receives data and a transmitter 9042 that transmits data.
- An example of the communication device 904 is a communication chip or a NIC.
- NIC is an abbreviation for Network Interface Card.
- the input interface 905 is a port to which a cable 911 is connected, and an example of the port is a USB terminal.
- USB is an abbreviation for Universal Serial Bus.
- the output interface 906 is a port to which the cable 912 is connected, and the USB terminal and the HDMI terminal are examples of ports.
- HDMI (registered trademark) is an abbreviation for High-Definition Multimedia Interface.
- the input device 907 inputs data, instructions and requests.
- An example of the input device 907 is a mouse, a keyboard, and a touch panel.
- the output device 908 outputs data, results and responses.
- An example of the output device 908 is a display or a printer.
- An example of a display is an LCD. LCD is an abbreviation for Liquid Crystal Display.
- the auxiliary storage device 902 stores an OS.
- OS is an abbreviation for Operating System.
- the auxiliary storage device 902 stores a program that realizes the function of the “unit” provided in the video analysis device 300 except for the “storage unit” provided in the video analysis device 300.
- At least a part of the OS is loaded into the memory 903, and the processor 901 executes a program that realizes the function of “unit” while executing the OS.
- a program that realizes the function of “unit” is loaded into the memory 903, read into the processor 901, and executed by the processor 901.
- the video analysis apparatus 300 may include a plurality of processors 901, and the plurality of processors 901 may execute a program that realizes the function of “unit” in cooperation with each other.
- Data, information, signal values, variable values, and the like indicating the processing results of “unit” are stored in the memory 903, the auxiliary storage device 902, a register in the processor 901, or a cache memory in the processor 901.
- The “unit” may be implemented as “circuitry”. “Unit” may be read as “circuit”, “process”, “procedure”, or “processing”. “Circuit” and “circuitry” are concepts that include processing circuits such as the processor 901, a logic IC, a GA, an ASIC, and an FPGA. GA is an abbreviation for Gate Array, ASIC is an abbreviation for Application Specific Integrated Circuit, and FPGA is an abbreviation for Field-Programmable Gate Array.
- the hardware configuration of the video encoding device 200 is the same as the hardware configuration of the video analysis device 300.
- the operation of the video analysis device 300 corresponds to a video analysis method.
- the video analysis method corresponds to the processing procedure of the video analysis program.
- the operation of the video encoding device 200 corresponds to a video encoding method.
- the video encoding method corresponds to the processing procedure of the video encoding program.
- A video encoding method will be described with reference to its flowchart.
- S110 is a video data receiving process.
- the video data receiving unit 210 receives the video data 119.
- S120 is a motion vector calculation process.
- the motion vector calculation unit 220 generates motion vector data 132 using the video data 119 and the previous data 291 as follows.
- the motion vector calculation unit 220 divides the current video represented by the video data 119 into a plurality of macroblocks.
- the motion vector calculation unit 220 compares the current video with the previous video represented by the previous data 291 for each macroblock.
- the motion vector calculation unit 220 calculates a motion vector for each macroblock based on the comparison result.
- the motion vector calculation unit 220 generates motion vector data 132 including a motion vector for each macroblock.
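- The per-macroblock comparison in S120 can be sketched as block matching. The source does not say how the comparison is performed, so this example assumes an exhaustive search that minimizes the sum of absolute differences (SAD) within a small search range. Frames are lists of rows of luminance values, and the returned vector `(dx, dy)` is the offset into the previous frame where the best match was found; these conventions and names are assumptions.

```python
def block_sad(cur, prev, top, left, dx, dy, n):
    """Sum of absolute differences between a current block and a shifted previous block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[top + y][left + x] - prev[top + y + dy][left + x + dx])
    return total


def motion_vector(cur, prev, top, left, n, search):
    """Exhaustive search for the offset (dx, dy) with the smallest SAD."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = block_sad(cur, prev, top, left, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + n <= height
                      and 0 <= left + dx and left + dx + n <= width)
            if not inside:
                continue  # skip candidates that fall outside the previous frame
            cost = block_sad(cur, prev, top, left, dx, dy, n)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best


prev = [[y * 8 + x for x in range(8)] for y in range(8)]
# The current frame is the previous frame shifted one pixel to the right.
cur = [[prev[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
vector = motion_vector(cur, prev, top=2, left=2, n=4, search=1)
```

- Production encoders use faster search patterns and sub-pixel refinement; the exhaustive search is used here only because it is the easiest to verify.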
- S130 is a motion compensation prediction process.
- the motion compensation prediction unit 230 generates prediction data 239 using the motion vector data 132 and the previous data 291 as follows.
- the motion compensation prediction unit 230 divides the previous video represented by the previous data 291 into a plurality of macroblocks.
- the motion compensation prediction unit 230 moves the previous video according to the motion vector included in the motion vector data 132 for each macroblock. Thus, a prediction video is generated.
- the motion compensation prediction unit 230 generates prediction data 239 representing a prediction video.
- S131 is a difference calculation process.
- the difference calculation unit 231 generates difference data 238 using the video data 119 and the prediction data 239 as follows.
- the difference calculation unit 231 calculates a difference between the current video represented by the video data 119 and the predicted video represented by the prediction data 239.
- the difference calculation unit 231 generates difference data 238 representing the difference video.
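- Steps S130 and S131 can be sketched together as follows. The example assumes that a motion vector `(dx, dy)` is the offset into the previous frame from which a block is copied, that blocks without a vector are copied unchanged, and that out-of-frame coordinates are clamped to the frame edge. Those conventions and the function names are assumptions for illustration.

```python
def predict_frame(prev, vectors, n):
    """Build the predicted frame by copying each block from its shifted position.

    vectors maps (block_row, block_col) to (dx, dy); n is the block size.
    """
    height, width = len(prev), len(prev[0])
    pred = [row[:] for row in prev]  # blocks without a vector stay unchanged
    for (block_row, block_col), (dx, dy) in vectors.items():
        top, left = block_row * n, block_col * n
        for y in range(n):
            for x in range(n):
                src_y = min(max(top + y + dy, 0), height - 1)
                src_x = min(max(left + x + dx, 0), width - 1)
                pred[top + y][left + x] = prev[src_y][src_x]
    return pred


def difference_frame(cur, pred):
    """Per-pixel difference between the current frame and the predicted frame."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]


prev = [[y * 4 + x for x in range(4)] for y in range(4)]
pred = predict_frame(prev, {(0, 1): (-1, 0)}, n=2)
residual = difference_frame(pred, pred)
```

- When the prediction is accurate the difference is close to zero almost everywhere, which is what makes the later DCT and quantization stages compress well.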
- S140 is a discrete cosine transform process.
- the DCT unit 240 generates the DCT coefficient data 131 as follows.
- the DCT unit 240 divides the difference video represented by the difference data 238 into a plurality of macro blocks.
- the DCT unit 240 calculates DCT coefficients for each frequency component by performing discrete cosine transform (DCT) for each macroblock.
- the DCT unit 240 generates DCT coefficient data 131 including DCT coefficients for each frequency component for each macroblock.
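- The discrete cosine transform in S140 can be sketched with a direct two-dimensional DCT-II. This is a straightforward, unoptimized form with orthonormal scaling; the scaling convention and the function name `dct_2d` are assumptions, since the source does not state which variant the encoder uses.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows.

    Returns coeffs[v][u], where u is the horizontal and v the vertical frequency.
    """
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[v][u] = alpha(u) * alpha(v) * total
    return coeffs


flat_block = [[10] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat_block)
```

- A uniform block produces only a DC coefficient, while a block containing an edge or texture spreads energy into higher frequency components. That property is what the moving object block criterion relies on.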
- S141 is a quantization process.
- the quantization unit 241 generates quantization coefficient data 249 as follows.
- the quantization unit 241 calculates a quantization coefficient for each frequency component by quantizing the DCT coefficient for each frequency component included in the DCT coefficient data 131 for each macroblock.
- the quantization unit 241 generates quantization coefficient data 249 including a quantization coefficient for each frequency component for each macroblock.
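- Quantization in S141 can be sketched as dividing each DCT coefficient by a step size and rounding. The single uniform step used here is an assumption; real encoders such as H.264 use quantization parameters and scaling matrices. `dequantize` shows the matching inverse operation, and both names are hypothetical.

```python
def quantize(coeffs, step):
    """Map each DCT coefficient to an integer level using one uniform step."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def dequantize(levels, step):
    """Approximate the original coefficients from the quantized levels."""
    return [[level * step for level in row] for row in levels]


coeffs = [[80.0, 3.9], [-12.2, 0.4]]
levels = quantize(coeffs, step=8)
restored = dequantize(levels, step=8)
```

- Small coefficients collapse to zero and cannot be recovered, so the restored values differ from the originals. This loss is the reason the encoder reconstructs its reference frame from the quantized data rather than from the original difference.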
- S142 is an entropy encoding process.
- the entropy encoding unit 242 generates encoded data 130 as follows.
- the entropy encoding unit 242 calculates an entropy code for each frequency component by entropy encoding the quantization coefficient for each frequency component included in the quantization coefficient data 249 for each macroblock.
- the entropy encoding unit 242 generates encoded data 130 including an entropy code for each frequency component for each macroblock.
- S150 is an encoded data management process.
- the encoded data management unit 250 stores the encoded data 130, the DCT coefficient data 131, and the motion vector data 132 in the encoded data storage unit 113 of the monitoring camera 110.
- S160 is an inverse quantization process and an inverse discrete cosine transform process.
- the inverse quantization unit 243 and the inverse DCT unit 244 generate the difference data 247 as follows.
- the inverse quantization unit 243 calculates the DCT coefficient for each frequency component by inversely quantizing the quantization coefficient for each frequency component included in the quantization coefficient data 249 for each macroblock.
- the inverse quantization unit 243 generates DCT coefficient data 248 including DCT coefficients for each frequency component for each macroblock.
- the inverse DCT unit 244 performs inverse discrete cosine transform (inverse DCT) on the DCT coefficient for each frequency component included in the DCT coefficient data 248 for each macroblock. Thereby, a difference image is generated.
- the inverse DCT unit 244 generates difference data 247 representing the difference video.
- S170 is a superimposition process.
- the superimposition unit 232 stores the video data generated as follows in the previous data storage unit 290 as the previous data 291 for the next video data 119.
- the superimposing unit 232 superimposes the difference video represented by the difference data 247 on the video represented by the prediction data 239 that is video data generated by the prediction. As a result, the current video after encoding is generated.
- the superimposing unit 232 generates video data representing the current video after encoding.
- A video analysis method will be described with reference to its flowchart.
- S210 is a video data receiving process.
- the video data receiving unit 310 receives the video data 119 and stores the video data 119 in the video data storage unit 390.
- S220 is a completion notification acceptance process.
- the completion notification receiving unit 311 receives a completion notification.
- S230 is an analysis execution determination process. If there is an instruction for video analysis, the process proceeds to S240. If there is no instruction for video analysis, the processing of the video analysis method ends.
- In step S231, the change amount calculation unit 321 acquires the video data 119 and the previous data 118 from the video data storage unit 390.
- the video data 119 represents the current video
- the previous data 118 represents the previous video.
- S232 is a change amount calculation process.
- the change amount calculation unit 321 calculates the change amount 329 as follows.
- the change amount calculation unit 321 compares the current video and the previous video for each pixel, and calculates a difference in pixel value for each pixel.
- the change amount calculation unit 321 calculates a value obtained by summing the calculated pixel value differences as the change amount 329.
- S233 is a change amount determination process.
- the change amount determination unit 322 determines whether the change amount 329 is larger than the change amount threshold. When the amount of change 329 is large, there is a high possibility that the moving object is reflected in the video. When the change amount 329 is larger than the change amount threshold, the process proceeds to S234. If the change amount 329 is not greater than the change amount threshold value, the analysis execution determination process (S230) ends.
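The change amount calculation (S232) and the threshold decision (S233) can be sketched as below. The description only says that per-pixel differences are summed; taking the absolute value of each difference is an assumption made here so that positive and negative changes do not cancel. The function names are hypothetical.

```python
def change_amount(current, previous):
    """Sum of per-pixel differences between two frames of equal size.
    Assumption: absolute differences are used."""
    return sum(
        abs(c - p)
        for cur_row, prev_row in zip(current, previous)
        for c, p in zip(cur_row, prev_row)
    )

def should_analyze(current, previous, change_threshold):
    """Return True only when the change amount exceeds the threshold."""
    return change_amount(current, previous) > change_threshold

current_frame = [[10, 20], [30, 40]]
previous_frame = [[10, 25], [20, 40]]
amount = change_amount(current_frame, previous_frame)
```

Because the comparison is strict, a change amount exactly equal to the threshold does not trigger video analysis, matching the "not greater than" branch of S233.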
- S234 is an analysis instruction process.
- the analysis instruction unit 323 instructs the moving object region detection unit 330 to start video analysis.
- S240 is a moving object region detection process.
- the moving object region detection unit 330 generates moving object region data 339.
- S241 is a parameter acquisition process.
- the parameter acquisition unit 331 acquires the DCT coefficient data 131 and the motion vector data 132 from the encoded data storage unit 113 of the monitoring camera 110.
- S242 to S246 are moving object block specifying processes.
- the moving object block specifying unit 332 selects one macroblock in the order of the block positions.
- the moving object block specifying unit 332 operates as follows.
- the moving object block specifying unit 332 acquires the DCT coefficient for each frequency component of the selected macroblock from the DCT coefficient data 131.
- the moving object block specifying unit 332 selects a DCT coefficient for a high frequency component from DCT coefficients for each frequency component.
- the high frequency component is a frequency component having a frequency higher than the frequency threshold.
- the moving object block specifying unit 332 determines whether at least one of the DCT coefficients of the high frequency components is non-zero. This is because a macroblock that shows a moving object has non-zero DCT coefficients from the low frequency components through to the high frequency components.
- by contrast, in a macroblock that does not show a moving object, a non-zero DCT coefficient may exist in a low frequency component but does not exist in a high frequency component. If at least one of the DCT coefficients of the high frequency components is not zero, the process proceeds to S244. If all of the DCT coefficients of the high frequency components are zero, the process proceeds to S246.
- the moving object block specifying unit 332 operates as follows.
- the moving object block specifying unit 332 acquires the motion vector of the selected macroblock from the motion vector data 132.
- the moving object block specifying unit 332 determines whether the motion vector is larger than the motion threshold. This is because a macroblock showing a moving object has a large motion vector. If the motion vector is larger than the motion threshold, the process proceeds to S245. If the motion vector is not larger than the motion threshold, the process proceeds to S246.
- the moving object block specifying unit 332 registers the block position of the selected macro block in the moving object block data 338 as the block position of the moving object block.
- the moving object block specifying unit 332 determines whether there is an unselected macroblock that has not been selected in S242. If there is an unselected macroblock, the process returns to S242. If there is no unselected macroblock, the process proceeds to S247.
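The loop over S242 to S246 can be sketched as follows. Several details are assumptions made for illustration: a coefficient at index (u, v) is treated as high frequency when u + v exceeds the frequency threshold, the size of a motion vector is its Euclidean length, and the data is held in dictionaries keyed by block position. None of these choices is specified in the description.

```python
import math

def has_high_frequency_energy(coeffs, frequency_threshold):
    """True if any coefficient above the frequency threshold is non-zero.
    Assumption: a coefficient at (u, v) counts as high frequency when u + v > threshold."""
    return any(
        value != 0
        for u, row in enumerate(coeffs)
        for v, value in enumerate(row)
        if u + v > frequency_threshold
    )

def find_moving_blocks(dct_by_block, mv_by_block, frequency_threshold, motion_threshold):
    """Return the positions of blocks that pass both the DCT test and the motion test."""
    moving = []
    for position in sorted(dct_by_block):        # select blocks in block-position order
        if not has_high_frequency_energy(dct_by_block[position], frequency_threshold):
            continue                              # all high-frequency coefficients are zero
        dx, dy = mv_by_block[position]
        if math.hypot(dx, dy) > motion_threshold:
            moving.append(position)               # register as a moving object block
    return moving

def make_block(high_value):
    """Hypothetical 4 x 4 coefficient block with a DC term and one high-frequency term."""
    coeffs = [[0] * 4 for _ in range(4)]
    coeffs[0][0] = 5
    coeffs[3][3] = high_value
    return coeffs

dct_by_block = {(0, 0): make_block(0), (0, 1): make_block(2), (1, 0): make_block(2)}
mv_by_block = {(0, 0): (4, 0), (0, 1): (3, 4), (1, 0): (0, 0)}
moving_blocks = find_moving_blocks(dct_by_block, mv_by_block,
                                   frequency_threshold=3, motion_threshold=2)
```

Only block (0, 1) passes both tests here: block (0, 0) has no high-frequency coefficient, and block (1, 0) has a zero motion vector.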
- S247 is the moving object region specifying process.
- the moving object region specifying unit 333 generates moving object region data 339.
- the moving object region specifying unit 333 selects one moving object block in the order of the block positions based on the moving object block data 338.
- the moving object area specifying unit 333 registers the selected moving object block as the moving object area in the moving object area data 339.
- the registration of the moving object area means registration of an area identifier for identifying the moving object area and area information indicating the range of the moving object area.
- the moving object region specifying unit 333 determines whether at least one of the adjacent blocks adjacent to the selected moving object block is a moving object block based on the moving object block data 338. If the selected moving object block is not a macro block located at the edge of the video, eight macro blocks located around the selected moving object block are adjacent macro blocks. If at least one of the adjacent blocks is a moving object block, the process proceeds to S2473. If no adjacent block is a moving object block, the process proceeds to S2474.
- the moving object region specifying unit 333 updates the moving object region including the selected moving object block to the rectangular region including the adjacent block.
- the adjacent block is located at the edge portion of the updated moving object region.
- the moving object region specifying unit 333 determines whether there is an unselected moving object block that has not been selected in S2471. If there is an unselected moving object block, the process returns to S2471. When there is no unselected moving object block, the moving object region specifying process (S247) ends.
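The grouping of adjacent moving object blocks into rectangular regions can be sketched as below. Note that this is not the literal S2471 to S2474 loop: it uses a flood fill over the eight neighboring blocks and then takes the bounding rectangle of each group, which is a swapped-in technique that produces the same kind of region. The dictionary layout of a region is hypothetical.

```python
def find_moving_regions(moving_blocks):
    """Group 8-connected moving blocks and return one bounding rectangle per group.
    Block positions are (row, column) pairs in units of macroblocks."""
    remaining = set(moving_blocks)
    regions = []
    for start in sorted(moving_blocks):
        if start not in remaining:
            continue                      # already absorbed into an earlier region
        remaining.discard(start)
        stack = [start]
        top, left = start
        bottom, right = start
        while stack:
            row, col = stack.pop()
            top, bottom = min(top, row), max(bottom, row)
            left, right = min(left, col), max(right, col)
            # Visit the eight surrounding blocks.
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbor = (row + d_row, col + d_col)
                    if neighbor in remaining:
                        remaining.discard(neighbor)
                        stack.append(neighbor)
        regions.append({"id": len(regions), "top": top, "left": left,
                        "bottom": bottom, "right": right})
    return regions

regions = find_moving_regions([(0, 0), (0, 1), (1, 1), (5, 5)])
```

In this example the three touching blocks form one region spanning rows 0 to 1 and columns 0 to 1, while the isolated block at (5, 5) becomes a region of its own.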
- S250 is a target object detection process.
- the target object detection unit 340 generates target object detection data 349.
- S251, S252 and S258 are model acquisition processes.
- the model acquisition unit 341 selects one unselected moving object region from the moving object region data 339.
- S252 to S256 are executed for each selected moving object region.
- the model acquisition unit 341 acquires the database for the divided area to which the moving object area belongs from the model database 400.
- a database for each divided region of the plurality of divided regions is acquired.
- S253 to S257 are moving object determination processing.
- the moving object determination unit 342 selects one unselected object model from the acquired database.
- the selected object model is referred to as a matching model.
- the moving object determination unit 342 determines whether the size of the moving object region is included in the range indicated by the size information included in the matching model.
- the size of the moving object region is calculated based on the area information of the moving object region. If the size of the moving object region is included in the range indicated by the size information included in the matching model, the process proceeds to S254-2. If the size of the moving object region is not included in the range indicated by the size information included in the matching model, the process proceeds to S255.
- the moving object determination unit 342 calculates the similarity between the moving object region and the matching model as follows.
- the moving object determination unit 342 calculates the feature amount of the moving object region based on the pixel value of each pixel constituting the moving object region.
- the moving object determination unit 342 calculates the difference between the feature value of the moving object region and the feature value indicated by the feature information included in the matching model as the similarity.
- In step S254-3, the moving object determination unit 342 determines whether the similarity between the moving object region and the matching model is greater than the maximum similarity. The initial value of the maximum similarity is zero. If the similarity is greater than the maximum similarity, the process proceeds to S254-4. If the similarity is not greater than the maximum similarity, the process proceeds to S255.
- the moving object determination unit 342 updates the maximum similarity to the similarity between the moving object region and the matching model, and stores the model identifier of the matching model.
- the moving object determination unit 342 determines whether there is an object model that has not been selected in S253. If there is an unselected object model, the process returns to S253. If there is no unselected object model, the process proceeds to S256.
- the moving object determination unit 342 determines whether the maximum similarity is greater than the similarity threshold. If the maximum similarity is greater than the similarity threshold, the process proceeds to S257. If the maximum similarity is not greater than the similarity threshold, the process proceeds to S258.
- the moving object determination unit 342 registers the target object region in the target object detection data 349 as follows. Registration of the target object region means registration of the area identifier of the target object region and the area information of the target object region. The moving object determination unit 342 acquires the object identifier from the matching model with the maximum similarity. The moving object determination unit 342 registers the moving object region as the target object region in the target object detection data 349, in association with the object identifier.
- the model acquisition unit 341 determines whether there is a moving object region that has not been selected in S251. If there is an unselected moving object region, the process returns to S251. If there is no unselected moving object region, the target object detection process (S250) ends.
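The matching loop in S253 to S257 can be sketched as follows. The feature amount (mean pixel value) and the similarity measure (`1 / (1 + |difference|)`, so that a larger value means a closer match) are assumptions chosen for illustration; the description does not specify either. The model dictionary keys are likewise hypothetical.

```python
def best_matching_model(region_pixels, region_size, models, similarity_threshold):
    """Return the object identifier of the best matching model, or None.
    Assumptions: the feature is the mean pixel value and the similarity
    is 1 / (1 + absolute feature difference)."""
    feature = sum(region_pixels) / len(region_pixels)
    max_similarity = 0.0                      # initial value of the maximum similarity
    best_identifier = None
    for model in models:
        low, high = model["size_range"]
        if not (low <= region_size <= high):
            continue                          # size outside the model's range
        similarity = 1.0 / (1.0 + abs(feature - model["feature"]))
        if similarity > max_similarity:
            max_similarity = similarity
            best_identifier = model["object_id"]
    if max_similarity > similarity_threshold:
        return best_identifier                # region is registered as a target object
    return None

models = [
    {"object_id": "person", "size_range": (2, 6), "feature": 120.0},
    {"object_id": "car", "size_range": (8, 40), "feature": 118.0},
]
pixels = [118, 122, 121, 119]
detected = best_matching_model(pixels, region_size=4, models=models,
                               similarity_threshold=0.5)
```

Checking the size range before computing the similarity mirrors the order of the steps in the description and avoids the feature calculation for models that cannot match.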
- S260 is a color information generation process.
- the color information generation unit 350 generates the color information data 359 as follows.
- the color information generation unit 350 generates the color information of each target object region registered in the target object detection data 349, based on the color information of each pixel constituting that target object region.
- the color information generation unit 350 registers the color information for each target object region in the color information data 359.
- the color information of the target region indicates red.
- a color threshold value to be compared with the R value, the G value, and the B value, a pixel number threshold value to be compared with the number of pixels, a color palette table in which each color is associated with an RGB value, and the like may be used. For example, when the number of pixels representing the same color is larger than the pixel number threshold, the color information indicating that color is used as the color information of the target object region.
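One way to combine a color palette table with a pixel number threshold is sketched below. The palette contents, the nearest-color rule based on squared RGB distance, and the function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical color palette table: color name -> RGB value.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def nearest_color(pixel):
    """Map an (R, G, B) pixel to the palette color with the smallest squared distance."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[name])),
    )

def region_color(pixels, pixel_count_threshold):
    """Return the dominant color name if enough pixels share it, else None."""
    counts = Counter(nearest_color(pixel) for pixel in pixels)
    name, count = counts.most_common(1)[0]
    return name if count > pixel_count_threshold else None

region_pixels = [(250, 10, 10), (240, 30, 20), (200, 40, 40), (10, 10, 240)]
color = region_color(region_pixels, pixel_count_threshold=2)
```

Three of the four pixels map to red, so the region is labeled red when the pixel number threshold is 2, and no color is assigned when the threshold is 3 or more.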
- S270 is an analysis result generation process.
- the analysis result generation unit 360 generates the analysis result data 140 using the object detection data 349 and the color information data 359.
- the analysis result data 140 includes the shooting date and time of the video represented by the video data 119, the frame number for identifying the video, the range of the moving object region in which the target object is shown, the object identifier for identifying the target object, and color information indicating the color characteristics of the target object.
- An object reflected in the video can be detected without decoding the encoded video data 119.
- the processing amount of the video analysis can be reduced.
- video analysis may be performed on all the video data 119, or video analysis may be performed on the video data 119 selected by the user.
- A mode in which the monitoring camera 110 does not include the model storage unit 114 will be described with reference to FIGS. 21 and 22. However, description that overlaps with Embodiment 1 is omitted.
- the monitoring system 100 includes a model server 150 that stores a model database 400.
- the monitoring camera 110 obtains a necessary object model by accessing the model database 400 of the model server 150.
- a model storage unit 114 that stores the model database 400 is provided in the model server 150 instead of the monitoring camera 110.
- the video analysis apparatus 300 acquires a necessary target object model by accessing the model database 400 of the model server 150.
- Each embodiment is an example of a preferred embodiment and is not intended to limit the technical scope of the present invention. Each embodiment may be partially implemented.
- the processing procedure described using the flowchart and the like is an example of the processing procedure of the video analysis device, the video analysis method, the video analysis program, the video encoding device, the video encoding method, and the video encoding program.
- 100 surveillance system, 101 surveillance space, 102 subject, 103 video, 104 subject area, 110 surveillance camera, 111 data reception unit, 112 data transmission unit, 113 encoded data storage unit, 114 model storage unit, 119 video data, 120 accumulation server, 121 data receiving unit, 122 encoded data management unit, 123 analysis result data management unit, 129 server storage unit, 130 encoded data, 131 DCT coefficient data, 132 motion vector data, 133 encoding parameter, 140 analysis result data, 150 model server, 200 video encoding device, 210 video data reception unit, 220 motion vector calculation unit, 230 motion compensation prediction unit, 231 difference calculation unit, 232 superposition unit, 239 prediction data, 240 DCT unit, 241 quantization unit, 242 entropy encoding unit, 243 inverse quantization unit, 244 inverse DCT unit, 247 difference data, 248 DCT coefficient data, 249 quantized coefficient data, 250 encoded data management unit, 290 previous data storage unit, 300 video analysis device, 310 video data reception unit, 311 completion notification
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
According to the invention, a moving object region detection unit (330) acquires, for each of a plurality of blocks into which a video represented by video data (119) is divided, a discrete cosine transform coefficient for each frequency component. From among the plurality of blocks, the moving object region detection unit determines a block for which the discrete cosine transform coefficient of at least one of the high frequency components, whose frequencies are higher than a frequency threshold, is non-zero to be a moving object block in which a moving object is shown. The moving object region detection unit detects, from the video, a moving object region that includes the moving object block.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2015/056984 WO2016143067A1 (fr) | 2015-03-10 | 2015-03-10 | Dispositif d'analyse d'image |
| JP2017504482A JP6415689B2 (ja) | 2015-03-10 | 2015-03-10 | 映像解析装置 |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2015/056984 WO2016143067A1 (fr) | 2015-03-10 | 2015-03-10 | Dispositif d'analyse d'image |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016143067A1 (fr) | 2016-09-15 |
Family
ID=56878568
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2015/056984 Ceased WO2016143067A1 (fr) | 2015-03-10 | 2015-03-10 | Dispositif d'analyse d'image |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP6415689B2 (fr) |
| WO (1) | WO2016143067A1 (fr) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019082318A1 (fr) * | 2017-10-25 | 2019-05-02 | 株式会社ソシオネクスト | Dispositif, système et procédé de traitement d'images vidéo |
| WO2021112234A1 (fr) * | 2019-12-06 | 2021-06-10 | 京セラ株式会社 | Système de traitement d'informations, dispositif de traitement d'informations et procédé de traitement d'informations |
| JP2021092826A (ja) * | 2019-12-06 | 2021-06-17 | 京セラ株式会社 | 情報処理システム、情報処理装置、および情報処理方法 |
| JP2021099629A (ja) * | 2019-12-20 | 2021-07-01 | 京セラ株式会社 | 情報処理システム、情報処理装置及び情報処理方法 |
| JP2021103349A (ja) * | 2019-12-24 | 2021-07-15 | 京セラ株式会社 | 情報処理システム、情報処理装置及び情報処理方法 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH10247247A (ja) * | 1997-03-04 | 1998-09-14 | Oki Electric Ind Co Ltd | 移動体抽出装置 |
| JP2001250118A (ja) * | 2000-03-06 | 2001-09-14 | Kddi Corp | 動画像内の移動物体検出追跡装置 |
| JP2001339693A (ja) * | 2000-05-26 | 2001-12-07 | Toshiba Corp | 動画像再生表示装置 |
2015
- 2015-03-10: WO application PCT/JP2015/056984 (WO2016143067A1, fr), not active, Ceased
- 2015-03-10: JP application JP2017504482A (JP6415689B2, ja), Active
Non-Patent Citations (2)
| Title |
|---|
| AKIO YONEYAMA ET AL.: "Detection of moving objects from MPEG video stream", IEICE TECHNICAL REPORT, vol. 96, no. 385, 22 November 1996 (1996-11-22), pages 63 - 70 * |
| AKIO YONEYAMA ET AL.: "Moving Object Detection from MPEG Video Stream", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D-II, vol. J81-D-II, no. 8, 25 August 1998 (1998-08-25), pages 1776 - 1786 * |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019082318A1 (fr) * | 2017-10-25 | 2019-05-02 | 株式会社ソシオネクスト | Dispositif, système et procédé de traitement d'images vidéo |
| CN111279388A (zh) * | 2017-10-25 | 2020-06-12 | 株式会社索思未来 | 动态图像处理装置、动态图像处理系统、以及动态图像处理方法 |
| JPWO2019082318A1 (ja) * | 2017-10-25 | 2020-11-19 | 株式会社ソシオネクスト | 動画像処理装置、動画像処理システム、及び動画像処理方法 |
| WO2021112234A1 (fr) * | 2019-12-06 | 2021-06-10 | 京セラ株式会社 | Système de traitement d'informations, dispositif de traitement d'informations et procédé de traitement d'informations |
| JP2021092826A (ja) * | 2019-12-06 | 2021-06-17 | 京セラ株式会社 | 情報処理システム、情報処理装置、および情報処理方法 |
| JP7316203B2 (ja) | 2019-12-06 | 2023-07-27 | 京セラ株式会社 | 情報処理システム、情報処理装置、および情報処理方法 |
| US12430896B2 (en) | 2019-12-06 | 2025-09-30 | Kyocera Corporation | Information processing system, information processing device, and information processing method that performs at least any one of plural kinds of image processing on a taken image |
| JP2021099629A (ja) * | 2019-12-20 | 2021-07-01 | 京セラ株式会社 | 情報処理システム、情報処理装置及び情報処理方法 |
| JP7517819B2 (ja) | 2019-12-20 | 2024-07-17 | 京セラ株式会社 | 情報処理システム、情報処理装置、情報処理方法及びプログラム |
| JP2021103349A (ja) * | 2019-12-24 | 2021-07-15 | 京セラ株式会社 | 情報処理システム、情報処理装置及び情報処理方法 |
| JP7381330B2 (ja) | 2019-12-24 | 2023-11-15 | 京セラ株式会社 | 情報処理システム、情報処理装置及び情報処理方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2016143067A1 (ja) | 2017-06-29 |
| JP6415689B2 (ja) | 2018-10-31 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP6415689B2 (ja) | 映像解析装置 | |
| US11164328B2 (en) | Object region detection method, object region detection apparatus, and non-transitory computer-readable medium thereof | |
| US20120275524A1 (en) | Systems and methods for processing shadows in compressed video images | |
| KR102500265B1 (ko) | 블록의 모션 벡터에 기초한 이미지의 블록의 분산 결정 | |
| KR102261669B1 (ko) | 인공신경망 기반 객체영역 검출방법, 장치 및 이에 대한 컴퓨터 프로그램 | |
| US9712828B2 (en) | Foreground motion detection in compressed video data | |
| US12047582B2 (en) | Image encoding/decoding method and device using symmetric motion vector difference (SMVD), and method for transmitting bitstream | |
| JP2007142521A (ja) | 動きベクトル算出装置および動きベクトル算出方法 | |
| US10034016B2 (en) | Coding apparatus, computer system, coding method, and computer product | |
| CN103051891A (zh) | 确定数据流内分块预测编码的视频帧的块的显著值的方法和装置 | |
| US9736477B2 (en) | Performing video encoding mode decision based on motion activity | |
| EP1480170A1 (fr) | Méthode et appareil de traitement d'images | |
| KR102345258B1 (ko) | 객체영역 검출방법, 장치 및 이에 대한 컴퓨터 프로그램 | |
| US20220337842A1 (en) | Image encoding/decoding method and device for performing bdof, and method for transmitting bitstream | |
| US20160057429A1 (en) | Coding apparatus, method, computer product, and computer system | |
| CN119277063A (zh) | 帧内预测方法、装置及设备 | |
| CN115861073A (zh) | 一种图像拼接方法、装置、设备和存储介质 | |
| US10397566B2 (en) | Image coding apparatus, image coding method, and program | |
| US11659194B2 (en) | Method of adjusting bitrate of image and image capturing apparatus | |
| US20150181221A1 (en) | Motion detecting apparatus, motion detecting method and program | |
| US11956441B2 (en) | Identifying long term reference frame using scene detection and perceptual hashing | |
| US9471992B2 (en) | Moving image processing apparatus, moving image processing method, and computer product | |
| KR102594803B1 (ko) | 압축영상에 대한 신택스 기반의 동일인 검색 방법 | |
| JP2007158855A (ja) | 動きベクトル検出装置および動きベクトル検出方法 | |
| KR20220157832A (ko) | 움직임이 존재하는 프레임의 검출방법 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15884559; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2017504482; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 15884559; Country of ref document: EP; Kind code of ref document: A1 |