EP3782114A1 - Hardware and system for bounding box generation for an image processing pipeline - Google Patents
Hardware and system for bounding box generation for an image processing pipeline
Info
- Publication number
- EP3782114A1 (application EP19788663.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- bounding boxes
- image
- pixel
- bounding box
- bounding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
Definitions
- the present invention relates to an image processing system and, more
- Image processing is used for a variety of implementations, including tracking and surveillance applications.
- bounding boxes are used to identify an object and, ideally, track that object across image frames and scenery.
- Bounding boxes can be formed by boxing connected components. For example, the work of Walczyk et al. described performing connected components labeling of a binary image (see,“Comparative Study on Connected Components
- This disclosure provides a system for bounding box generation.
- the system includes one or more processors and a memory.
- the memory has executable instructions, such that upon execution of the instructions, the one or more processors perform several operations, such as receiving an image, the image comprised of pixels having a one-bit value per pixel; generating bounding boxes around connected components in the image, the connected components having pixel coordinate and pixel count information; generating a ranking score for each bounding box based on the pixel coordinate and pixel count information; filtering the bounding boxes to remove bounding boxes that exceed a
- the processor is a field programmable gate array (FPGA).
- generating the bounding box further includes operations of grouping contiguous pixels in the image; and merging connected pixels as connected components, with the bounding box formed of a box that encompasses the connected components.
- controlling the device includes causing a video platform to move to maintain at least one of the remaining bounding boxes within a field of view of the video platform.
- the present invention also includes a computer program product and a computer implemented method.
- the computer program product includes computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors, such that upon execution of the instructions, the one or more processors perform the operations listed herein.
- the computer implemented method includes an act of causing a computer to execute such instructions and perform the resulting operations.
- FIG. 1 is a block diagram depicting the components of a system according to various embodiments of the present invention.
- FIG. 2 is an illustration of a computer program product embodying an aspect of the present invention
- FIG. 3 is a flowchart illustrating relationships between variables and arrays during preparation according to various embodiments of the present invention
- FIG. 4 is an illustration of a search block used to find pixels labeled values according to various embodiments of the present invention
- FIG. 5 is a flowchart illustrating a search/label process according to various embodiments of the present invention.
- FIG. 6A is an illustration depicting a partial image and corresponding
- FIG. 6B is an illustration depicting a partial image and corresponding
- FIG. 6C is an illustration of the full image as partially depicted in FIGs. 6A and 6B, and corresponding labelling according to various embodiments of the present invention
- FIG. 7 is a flowchart illustrating a merge region according to various
- FIG. 8 is a flowchart illustrating state transitions according to various
- FIG. 9A is a flowchart illustrating State 1 according to various embodiments of the present invention.
- FIG. 9B is an example of State 2 code according to various embodiments of the present invention.
- FIG. 10 is a flowchart illustrating an Incrementer according to various
- FIG. 11 is a flowchart illustrating State 2 according to various embodiments of the present invention.
- FIG. 12 is an illustration of a current label module according to various embodiments of the present invention.
- FIG. 13 is a flowchart illustrating State 3 according to various embodiments of the present invention.
- FIG. 14 is a flowchart illustrating States 4, 5, and 6 according to various conditions
- FIG. 15 is a flowchart illustrating State 7 and a Recall operation according to various embodiments of the present invention.
- FIG. 16 is a flowchart illustrating State 7 and a Ranked operation according to various embodiments of the present invention.
- FIG. 17 is a flowchart illustrating State 7 and the Ranked operation and Rank module according to various embodiments of the present invention.
- FIG. 18 is an illustration depicting an example input image, with one-bit value for each pixel location according to various embodiments of the present invention.
- FIG. 19 is an illustration depicting the image with resulting bounding boxes after being passed through the Bounding Box process and filtered according to various embodiments of the present invention.
- FIG. 20 is a block diagram depicting control of a device according to various embodiments.
DETAILED DESCRIPTION
- the present invention relates to an image processing system and, more
- any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C.
- the first is a system for image processing.
- the system is typically in the form of a computer system operating software or in the form of a “hard-coded” instruction set or as a field programmable gate array (FPGA).
- the second principal aspect is a method, typically in the form of software, operated using a data processing system (computer).
- the third principal aspect is a computer program product.
- the computer program product generally represents computer-readable instructions stored on a non-transitory computer-readable medium such as an optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape.
- Other, non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories.
- FIG. 1 A block diagram depicting an example of a system (i.e., computer system
- the computer system 100 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm.
- certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by one or more processors of the computer system 100. When executed, the instructions cause the computer system 100 to perform specific actions and exhibit specific behavior, such as described herein.
- the computer system 100 may include an address/data bus 102 that is
- processors configured to communicate information.
- one or more data processing units such as a processor 104 (or processors) are coupled with the address/data bus 102.
- the processor 104 is configured to process information and instructions.
- the processor 104 is a microprocessor.
- the processor 104 may be a different type of processor such as a parallel processor, application-specific integrated circuit (ASIC), programmable logic array (PLA), complex programmable logic device (CPLD), or a field programmable gate array (FPGA).
- the computer system 100 is configured to utilize one or more data storage units.
- the computer system 100 may include a volatile memory unit 106 (e.g., random access memory (“RAM”), static RAM, dynamic RAM, etc.) coupled with the address/data bus 102, wherein a volatile memory unit 106 is configured to store information and instructions for the processor 104.
- the computer system 100 further may include a non-volatile memory unit 108 (e.g., read-only memory (“ROM”), programmable ROM (“PROM”), erasable programmable ROM
- the computer system 100 may execute instructions retrieved from an online data storage unit such as in “Cloud” computing.
- the computer system 100 also may include one or more interfaces, such as an interface 110, coupled with the address/data bus 102. The one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems.
- the communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.
- the computer system 100 may include an input device 112
- the input device 112 is coupled with the address/data bus 102, wherein the input device 112 is configured to communicate information and command selections to the processor 104.
- the input device 112 is an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys.
- the input device 112 may be an input device other than an alphanumeric input device.
- the computer system 100 may include a cursor control device 114 coupled with the address/data bus 102, wherein the cursor control device 114 is configured to communicate user input information and/or command selections to the processor 104.
- the cursor control device 114 is implemented using a device such as a mouse, a track-ball, a track pad, an optical tracking device, or a touch screen.
- the cursor control device 114 is directed and/or activated via input from the input device 112, such as in response to the use of special keys and key sequence commands associated with the input device 112.
- the cursor control device 114 is configured to be directed or guided by voice commands.
- the computer system 100 further may include one or more
- a storage device 116 coupled with the address/data bus 102.
- the storage device 116 is configured to store information and/or computer executable instructions.
- the storage device 116 is a storage device such as a magnetic or optical disk drive (e.g., hard disk drive (“HDD”), floppy diskette, compact disk read only memory (“CD-ROM”), digital versatile disk (“DVD”)).
- a display device 118 is coupled with the address/data bus 102, wherein the display device 118 is configured to display video and/or graphics.
- the display device 118 may include a cathode ray tube (“CRT”), liquid crystal display (“LCD”), field emission display (“FED”), plasma display, or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.
- the computer system 100 presented herein is an example computing
- the non-limiting example of the computer system 100 is not strictly limited to being a computer system.
- the computer system 100 represents a type of data processing analysis that may be used in accordance with various aspects described herein.
- other computing systems may also be implemented.
- the spirit and scope of the present technology is not limited to any single data processing environment.
- one or more operations of various aspects of the present technology are controlled or implemented using computer-executable instructions, such as program modules, being executed by a computer.
- program modules include routines, programs, objects, components and/or data structures that are configured to perform particular tasks or implement particular abstract data types.
- an aspect provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computer-storage media including memory-storage devices.
- An illustrative diagram of a computer program product (i.e., storage device) embodying the present invention is depicted in FIG. 2.
- the computer program product is depicted as floppy disk 200 or an optical disk 202 such as a CD or DVD.
- the computer program product generally represents computer-readable instructions stored on any compatible non-transitory computer-readable medium.
- “instructions” as used with respect to this invention generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable, software modules.
- Non-limiting examples of “instruction” include computer program code (source or object code) and “hard-coded” electronics (i.e., computer operations coded into a computer chip).
- The “instruction” is stored on any non-transitory computer-readable medium, such as in the memory of a computer or on a floppy disk, a CD-ROM, or a flash drive. In either event, the instructions are encoded on a non-transitory computer-readable medium.
- the present disclosure provides a system and corresponding hardware
- the system is implemented on a Field Programmable Gate Array
- the system also computes, based on the height and width of the bounding box and the number of contained object pixels, a ranking score used for subsequent filtering of objects based on size and aspect ratio.
- the process is designed to provide this bounding box information while both minimizing FPGA resources and achieving sufficient throughput to keep up with a desired input image frame rate (e.g., 30 frames per second).
- While performing the connected components labeling of the binary image, the system also simultaneously records the bounding box coordinates and the number of detected object pixels for each bounding box. This additional information is useful for subsequent ranking and filtering of objects based on size and aspect ratio, and is gathered without a significant amount of extra computation time and hardware resources.
- the design of the invention is optimized to simultaneously minimize FPGA utilization and computation time.
- SWAP low size, weight and power
- the invention is able to improve mission responsiveness and reduce the amount of raw sensor data that must be transmitted over constrained communication bandwidths.
- the system and process can be used for both active safety and autonomous driving applications. By performing object detection in low-power, low-cost hardware near the camera, the automobile can more rapidly and robustly detect obstacles in the road, and thus provide more timely warnings to a driver or more prompt automated responses to obstacles in autonomous vehicles. Further details are provided below.
- the bounding box is a method by which the system accepts a matrix of single bit data as an input image, which it uses as a basis to create an array of boxes.
- Each“box” will contain coordinates for two x locations, two y locations, and a valid pixel count.
- the bounding box process can be implemented using any suitable software product.
- the bounding box was implemented in Matlab.
- the software-based bounding box design can be divided into three sections: Preparation, Search/Label, and Merge Regions. Each section will later be translated for implementation in the hardware design.
(3.2.1) Preparation
- the preparation process is a simple instantiation and initialization of the variables and arrays to their default values.
- Variables initialized would be the region count, previous y location, and current y location.
- Arrays initialized would be the “Image”, “Labeled Image”, “Merge to Region”, and “Bounding Box Data”. Region count will be used as a “ticket” value to label pixels found.
- Previous/Current Y locations are used as references to decrease the size of the Label Image matrix.
- a reduced Label Image total size allows a digital circuit to use less hardware resources when implementing the Bounding Box method in such a circuit. Due to the label algorithm search pattern using previous locations, a larger image size must be accommodated by placing extra blank pixels around the image, thereby increasing the size of width and height by 2.
- Label Image has a height of 2 with a width set to the Image width.
- the Merge to Region is a single dimension array set to a length of the max region count, which is a set limit to how many labels are maximally desired to be distributed. This limit is to reflect the limited resources used during a digital implementation.
- the Bounding Box Data is two dimensional having a width of 5 holding: two x locations, two y locations, and a valid pixel count.
- the Bounding Box Data height is set to the maximum region count.
- FIG. 3 depicts a relationship between each matrix and variables during their instantiation of the Preparation process.
- Once the variables are created, they must be initialized to their correct starting values.
- the system initializes an array with the image dimensions (i.e., size) plus a pixel border (i.e., blank pixels which pad the dimensions) 316 to form the image 318.
- Both Merge To Region 300 and Label Image 302 have all their values initialized to the max region count 304.
- Label Image 302 proceeds to change 312 the values such that Y Previous is set to 0 and Y Current is set to 1.
- Bounding Box Data 306 has its minimum x and y values 314 set to the max size 308 of the Image width and height respectively.
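The Preparation step described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `prepare`, the `Box` record, the dictionary layout, and the choice to start the maximum coordinates at 0 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One row of Bounding Box Data: two x, two y, and a pixel count."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    pixel_count: int


def prepare(image, max_regions=256):
    """Pad the image and set every working array to its default value."""
    height, width = len(image), len(image[0])
    # One blank pixel on every side, so width and height each grow by 2.
    blank_row = [0] * (width + 2)
    padded = [blank_row[:]] + [[0] + list(r) + [0] for r in image] + [blank_row[:]]
    return {
        "image": padded,
        # Two rows only; max_regions doubles as the "no label yet" value.
        "label_image": [[max_regions] * (width + 2) for _ in range(2)],
        "merge_to_region": [max_regions] * max_regions,
        # Minimums start at the padded size so any real pixel is smaller.
        # Starting the maximums at 0 is an assumption of this sketch.
        "boxes": [Box(width + 2, height + 2, 0, 0, 0) for _ in range(max_regions)],
        "region_count": 1,
        "y_previous": 0,
        "y_current": 1,
    }
```

Keeping the label buffer at two rows is what lets the later hardware version avoid storing a full-frame label image.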
- the system proceeds with the Search/Label process, as shown in FIG. 5.
- the system will search the image and label found pixels (from top to bottom of the image, or any other ordering as predetermined). Pixels are binary (0 or 1), so a pixel is “found” when it has a value of 1.
- FIG. 4 depicts an example of how the search is conducted. Using the current pixel location in the image as (X, Y), the system first checks to see if (X,Y) has a valid pixel.
- the system proceeds to check if (X-1, Y-1), (X, Y-1), (X+1, Y-1), and/or (X-1, Y) have already been labeled. As shown in FIG. 5, the process continues until the image has been searched through 500 (e.g., to its bottom, or top, etc.), at which point the searching/labelling is done 502. Alternatively, assume that this is the first pixel found; thus, no locations currently have a label (e.g., the image has not been read through to its edge 504 and there is a valid pixel at (x,y) 506, and none of the adjacent pixels have been labeled 508).
- this pixel shall be labelled 508 with the region count of 1 and the region count 510 is increased.
- the system will index 510 the Bounding Box Data array, set the pixel count to 1, and store the current x, y location as both the min and max values. Since the found box is only one pixel in size, the max and min values for the x and y locations would be equal. Now, assume that the next pixel is also valid. This then creates the condition that (X-1, Y-1), (X, Y-1),
- After completing evaluation of one pixel location, the system implements box 526 to increase the “X” index to move closer to the edge of the image. After arriving at the edge of the image, as seen in box 504, the system implements box 524, which increases the “Y” dimension and swaps the values for Y Current and Y Previous.
- Y Current and Y Previous are used to maintain a small Label Image Array, hence the Y Current and Y Previous swap will keep the data needed for the next evaluation while opening a new set to be overwritten.
- the process then proceeds to determine if the Bounding Box Data is updated by comparing the new data“(X, Y)” with the Max/Min, X and Y values stored. If the stored Max/Min X and Y values are smaller/larger than the new data, then the system will update the Bounding Box Data with the new X and Y locations.
- the system must update 514 some of the values in the Bounding Box Data by increasing Xmax, since the connected pixel increases the box size by 1 in the x direction, and by increasing the pixel count by 1.
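The Search/Label pass can be sketched in Python along the following lines. This is a hedged illustration only: the names `search_and_label` and `find_root` are hypothetical, coordinates are zero-based in the unpadded image, and the sketch resolves chained merge reminders by following them to the lowest label, which is a software convenience assumed here rather than something the text specifies.

```python
def find_root(merge_to, label, unlabeled):
    """Follow merge reminders down to the lowest label of a region."""
    while merge_to[label] != unlabeled:
        label = merge_to[label]
    return label


def search_and_label(image, max_regions=256):
    """Label connected pixels row by row, tracking one box per label.

    Returns (boxes, merge_to, region_count); boxes[label] is
    [x_min, y_min, x_max, y_max, pixel_count] and index 0 is unused.
    """
    height, width = len(image), len(image[0])
    unlabeled = max_regions              # sentinel larger than any label
    labels = [[unlabeled] * (width + 2) for _ in range(2)]  # two padded rows
    y_prev, y_cur = 0, 1
    boxes = [[width, height, -1, -1, 0] for _ in range(max_regions)]
    merge_to = [unlabeled] * max_regions
    region_count = 0
    for y in range(height):
        for x in range(width):
            col = x + 1                  # skip the blank border column
            if not image[y][x]:
                labels[y_cur][col] = unlabeled   # clear stale row data
                continue
            # (X-1, Y-1), (X, Y-1), (X+1, Y-1) and (X-1, Y)
            neighbors = (labels[y_prev][col - 1], labels[y_prev][col],
                         labels[y_prev][col + 1], labels[y_cur][col - 1])
            roots = {find_root(merge_to, n, unlabeled)
                     for n in neighbors if n != unlabeled}
            if roots:
                label = min(roots)
                for other in roots - {label}:
                    merge_to[other] = label      # reminder for Merge Regions
            else:
                if region_count + 1 >= max_regions:
                    # Overflow handling is an assumption of this sketch.
                    raise OverflowError("label limit reached")
                region_count += 1
                label = region_count
            labels[y_cur][col] = label
            box = boxes[label]
            box[0], box[1] = min(box[0], x), min(box[1], y)
            box[2], box[3] = max(box[2], x), max(box[3], y)
            box[4] += 1
        y_prev, y_cur = y_cur, y_prev    # reuse the older row for the next line
    return boxes, merge_to, region_count
```

For a U-shaped object, the two vertical arms receive labels 1 and 2, and the bottom row records that label 2 should later be merged into label 1.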
- lines 600 can be used to cut or
- A and C would contain all those components labeled as 1 in this example.
- B would contain all those components labeled as 2 and still labeled as 2 in this example.
- D would contain all those components that would have been labeled as 2 but are instead now labeled as 1.
- the last step is the Merge Regions process, as shown in FIG. 7.
- reminders to update as found in the Merge To Region array are placed into the appropriate bounding boxes.
- a new array is created 700 which will keep track of the valid Bounding Box Data.
- the valid data array will start with everything as “true” up to the region count and anything above as “false” up to the max region count (with the same length as the Bounding Box Data).
- the stored Bounding Box Data values may be merged into a central location leaving one or more sets of data invalid.
- the system starts by retracing the Bounding Box Data from the current region count (previously used to count how many labels were given out), counting down to 1 in a for loop.
- a for loop is a process that is repeated until it is finished. There are different types of loops, but typically a for loop performs some action until it hits an ending condition. Usually the loop counts down or up to reach the condition that ends the process and exits the loop.
- the Bounding Box Data will contain the Xmin, Ymin, Xmax, Ymax, and pixel count information; therefore, to update 706, the process looks at the Bounding Box Data in two indexed locations (i.e., (1) that of the current index and (2) that of the value stored in Merge To Region at that same index). Using these two locations, the min values are compared to see which is smaller, the max values are compared to see which is larger, and the pixel count values are combined.
- the information is then stored back into the Bounding Box Data indexed by the Merge To Region value indexed by the for loop which will contain the lower label.
- the Bounding Box Data is fully updated in that location with the most current information on min, max, and pixel count values. This process is repeated until there is a region count of one 710, which is the lowest label and it is not possible to have another location to merge into.
- the Merge Regions process terminates 712.
- the output would now be the values stored as the Bounding Box Data along with the validation array, which identifies which parts of the Bounding Box Data contain boxes with regions that have been unmerged or have the most up-to-date merged information.
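The Merge Regions loop can be sketched as below. It is a minimal illustration under stated assumptions: `merge_regions` and its `no_merge` argument are hypothetical names, each box is a list `[x_min, y_min, x_max, y_max, pixel_count]` indexed by label with index 0 unused, and a Merge To Region entry equal to `no_merge` is taken to mean that the label has no merge target.

```python
def merge_regions(boxes, merge_to, region_count, no_merge):
    """Fold each merged label's box into its lower-numbered target.

    Modifies `boxes` in place and returns (boxes, valid).
    """
    # Everything up to the region count starts valid, the rest invalid.
    valid = [1 <= i <= region_count for i in range(len(boxes))]
    # Count down from the highest label; label 1 has nowhere to merge.
    for label in range(region_count, 1, -1):
        target = merge_to[label]
        if target == no_merge:
            continue
        src, dst = boxes[label], boxes[target]
        boxes[target] = [min(src[0], dst[0]), min(src[1], dst[1]),
                         max(src[2], dst[2]), max(src[3], dst[3]),
                         src[4] + dst[4]]       # pixel counts are combined
        valid[label] = False                    # data now lives at the target
    return boxes, valid
```

Counting down matters: when label 3 merges into 2 and 2 merges into 1, label 2 has already absorbed label 3 by the time it is folded into label 1.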
- the present disclosure also provides a digital hardware
- Creating the digital hardware implementation requires reducing the bounding box to some known range of values. For illustrative purposes, the implementation is described with respect to an image being 512 pixels wide by 256 pixels long, with each pixel containing a one-bit value.
- the design is to be controlled from an outside module such that the bounding box module will receive a start signal and will need to be allowed to index the necessary image locations per request.
- additional filtering was also implemented and the provided results were reduced to the top 15 ranked boxes. All bounding boxes are found and stored in memory; however, the module will provide specifically 15 which are ranked in a manner as described in further detail below.
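Reducing the stored boxes to the top 15 can be illustrated with a short Python sketch. The ranking formula is not given in this passage, so the score function is left as a parameter; `top_ranked` and the example `fill_ratio` score are hypothetical and serve only to show the selection step.

```python
import heapq


def top_ranked(boxes, valid, score, keep=15):
    """Return up to `keep` valid boxes with the highest score."""
    candidates = [box for box, ok in zip(boxes, valid) if ok]
    return heapq.nlargest(keep, candidates, key=score)


def fill_ratio(box):
    """Hypothetical score: fraction of the box covered by object pixels."""
    x_min, y_min, x_max, y_max, pixel_count = box
    return pixel_count / ((x_max - x_min + 1) * (y_max - y_min + 1))
```

All boxes remain in memory; only the selection returned to the caller is limited to `keep` entries.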
- the hardware can be summarized into three stages, that of Preparation, Search/Label, and Merge Regions.
- In this hardware implementation there is one additional stage, referred to as Recall and Rank.
- the translation in the hardware also requires that functions finish in a certain clock cycle. That being the case, the algorithm has been broken up into different states.
- the large Bounding Box array is stored in block random access memory (BRAM).
- Flip-flops are a type of register that stores bits. They are found in the fabric of the FPGA; to reduce the number required to store the created data, the data is placed into BRAM (which is another component in the FPGA). This further requires that the algorithm be broken up into states, some of which are used to hide indexing and receiving information from the BRAM.
(3.3.1) Preparation
- the hardware Preparation section translates to both the instantiation and some of the initialization of the variables needed in the algorithms.
- the limitation of hardware requires the sizes of the variables to be labelled.
- the process is illustrated in FIG. 8, in which some boxes represent an array 800 while the remaining boxes each represent a value.
- the largest limitation is the number of labels that can be provided. In one example, the number of labels was reduced to 256, which in turn sets the range of many of the arrays.
- a Label Image array 802 is included for an input image 801, which, in this example, will be 255 dimensions tall and
- Bounding Box BRAM array 808 As in the software case above, the information 810 contained will be that of Xmax, Ymax, Xmin, Ymin, and Pixel Count.
- the hardware implementation includes Xsize and Ysize which is a simple (Xmax - Xmin or Ymax - Ymin) calculation.
- Because the Bounding Box is in BRAM, the search/label functionality is split to allow for reads to BRAM.
- the Search/Label software design can be further separated into different sections of the states to take advantage of the clock delays. Therefore, the Search/Label is broken up into three states with a special incrementer stage.
- State 1 will contain the condition for finding a new pixel or finding no pixel at a current search location. If no hard stop conditions exist 900, and if a current valid pixel is found 902, it is desirable to determine if any of the pixel’s neighbors (as seen in FIG. 4) 904 have valid pixels. If none of the neighbors have a valid pixel, then the Label Image Array and the Bounding Box BRAM are updated 906 and the incrementer 908 is activated 918.
- the write command to Bounding Box BRAM will take one clock cycle, but since the process increments (via the incrementer 908) the write address so as to never rewrite an area in BRAM, and State 1 does not read the BRAM, the process can return to State 1 910 without any issue, thereby avoiding unnecessary wait states.
- the FPGA/Hardware is made up of processes running in parallel every clock cycle and will confirm actions at the end of every clock cycle. It should also be noted that the BRAM operates with some clock cycle delays. The BRAM is a place the system will write to and read
- Label Image must reset the data stored at that pixel location to the max region count. This ensures that when the system compares Label Image locations (see FIG. 12), the lowest location is up-to-date and contains only valid data relating to that section of the image.
- FIG. 10 illustrates the incrementer 908 process, showing the decision between proceeding to State 1 910 or State 4 914.
- the system increments the “X” index to move closer to the edge of the image 1002 and proceeds to State 1 910.
- If the process has read to the edge of the image 1000, it resets the “X” index to 1 1004 and determines whether the process is at the end of the image 1006. If not, it increments the “Y” index to move closer to the end of the image, swaps the values for “Y Current” and “Y Previous” 1008, and proceeds to State 1 910. If so, it resets “Y Current” and “Y Previous” to their starting values, locks in the valid region count to the current value of the region count 1010, and proceeds to State 4 914.
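The incrementer's decision logic can be sketched as a small Python function. This is an illustration under assumptions: indices are taken to be 1-based (since X resets to 1), the starting values are assumed to be Y Previous = 0 and Y Current = 1 as in the software Preparation step, the state names are placeholder strings, and latching the valid region count is left out for brevity.

```python
def increment(x, y, y_current, y_previous, width, height):
    """Advance the search position and choose the next state.

    Returns (x, y, y_current, y_previous, next_state).
    """
    if x < width:
        # Not yet at the edge of the image: step along the row.
        return x + 1, y, y_current, y_previous, "STATE_1"
    x = 1  # edge reached: reset the X index
    if y < height:
        # Move to the next row and swap the two label-row selectors.
        return x, y + 1, y_previous, y_current, "STATE_1"
    # End of the image: restore the starting row selectors and begin merging.
    return x, y, 1, 0, "STATE_4"
```

With the 512 by 256 example image, the row selectors swap 255 times before the transition to State 4.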
- State 2 uses another module referred to as the Current Label Module 1100, shown in further detail in FIG. 12.
- the clock cycle delay is used to perform the current label assignment 1112 in preparation for this stage (note discussion above regarding FPGA clock cycles and BRAM read/write delays).
- Because the Bounding Box BRAM 1102 takes one clock cycle to read, it is necessary to send a read signal along with the current label 1104 to read the address which contains the data to be combined. BRAM needs a signal to indicate that the process is going to read, along with an address. The “current label” 1104 will be the read address. Note the comments above regarding the “Search/Label” section functions; here the system is combining labels to later be merged.
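The one-clock-cycle read delay is the reason the algorithm needs extra states, and a toy Python model makes the timing concrete. The class `SyncBram` is hypothetical, and its read-before-write behavior on a same-cycle access to one address is an assumption of the sketch, not a statement about any particular FPGA memory primitive.

```python
class SyncBram:
    """Toy memory whose read data appears one clock tick after the request."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.read_data = None          # output register, valid after tick()
        self._read_addr = None
        self._write = None

    def request_read(self, addr):
        """Present a read enable and address for the coming clock edge."""
        self._read_addr = addr

    def request_write(self, addr, value):
        """Present a write enable, address, and data for the coming edge."""
        self._write = (addr, value)

    def tick(self):
        """Clock edge: latch the requested read, then commit the write."""
        if self._read_addr is not None:
            self.read_data = self.mem[self._read_addr]
        if self._write is not None:
            addr, value = self._write
            self.mem[addr] = value
        self._read_addr = None
        self._write = None
```

A state that issues `request_read` cannot use `read_data` until after the next `tick`, which is the gap the wait states cover.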
- State 2 will contain only the section on which to set the Merge To Region.
- State 2 exists to set the “Merge To Region” 1108 array with data which will later be used in State 3a 1110 and during the “Merge Region” phase.
- the update to the Bounding Box BRAM 1102 can be performed.
- The update includes combining the data read out from the calculated current label location with the current information.
- the digital implementation also adds another section to calculate the size for both the X and Y direction. This will later be used as a filter for unwanted bounding boxes. Another addition is to filter out unwanted boxes that do not meet a certain range of pixel size 1308.
- a lower bound and upper bound pixel count is applied to validate Bounding Box BRAM 1102 stores (with pixel counts being in a valid pixel range designated as valid 1310, while pixel counts outside the valid pixel range are designated as invalid 1312).
- the valid regions array is used to determine what information stored in BRAM should be tested in a later stage by assigning a value of “1” to addresses which house valid sets of data.
- the valid regions array will later be cycled through to see if the system should read data stored in BRAM from that address location during the Recall and Rank phase.
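The size calculation and pixel-count filter can be sketched in Python as follows. The function name `validate_regions`, the inclusive bounds, and the list-based box layout `[x_min, y_min, x_max, y_max, pixel_count]` are assumptions for illustration; the size formula (max minus min) is the one stated in the text.

```python
def validate_regions(boxes, region_count, min_pixels, max_pixels):
    """Compute (Xsize, Ysize) per label and flag labels whose pixel
    count lies inside [min_pixels, max_pixels]."""
    sizes = [(0, 0)] * len(boxes)
    valid = [0] * len(boxes)           # 1 marks an address worth reading later
    for label in range(1, region_count + 1):
        x_min, y_min, x_max, y_max, pixel_count = boxes[label]
        sizes[label] = (x_max - x_min, y_max - y_min)
        if min_pixels <= pixel_count <= max_pixels:
            valid[label] = 1
    return sizes, valid
```

During Recall and Rank, only addresses flagged with 1 would then be read back from memory.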
- the system activates the incrementer 908 and sends the process 1304 to State 1. However, if the incrementer 908 detects that the process is on the last pixel location it will instead send the process 1306 back to State 4 as seen in FIG. 10.
- Region[region count] 1402) from the Bounding Box BRAM 1102 and the system writes back to the Bounding Box BRAM 1102 location. If not, the system continues searching through the Merge To Region by decreasing region count 1422 and decreasing the valid regions count 1420 based on data created during state 6 1426 1430 eventually implemented by 1418.
- Due to the BRAM 1102 reads, the process must return to State 4 to cover the one-clock-cycle delay of reading and to request a different address. For clarity, State 4 is broken into two parts: State 4a 1404, to determine if there is a need to merge, and State 4b 1406, which includes the clock cycle wait and the read of the next address.
- State 5 1408 is very simple since it is known that BRAM 1102 has valid data from State 4a 1404. Thus, data that has arrived must be saved 1410 so that the data can later be used to compare against the address read in State 4b 1406.
- State 6 1412 will then compare 1414 both sets of data received from the two Bounding Box BRAM 1102 reads, then store the merged information back into the Bounding Box BRAM 1102 at the lower address. Since the write will take place during State 4a 1404, the system will have correctly written before activating the read of the next address. Arriving in State 6 indicates that the current region count is going to be merged; the system therefore invalidates that region in the valid regions array and decreases the valid region count 1424.
- a filter can be added to filter out regions (unwanted boxes) that do not meet a certain range of pixel count 1426, with pixel counts being in a valid pixel range designated as valid 1428, while pixel counts outside the valid pixel range are designated as invalid 1430.
- Moving from State 4a to State 7 is handled by starting a read 1508 from BRAM as a valid region 1510 at address zero 1512 and by forcing one clock cycle wait 1514 before returning to State 7 1516.
- the idea is to constantly read addresses from the Bounding Box BRAM 1102 such that the process is just one clock cycle behind 1500 each read.
- State 7 1416 will filter based on whether the read from Bounding Box BRAM 1102 is valid 1502 and meets the additional filtering criteria. Delay clock cycles are included 1504 before returning to State 0 1506 to account for the additional filtering delay.
- Box 1518 depicts the end of the search for valid regions, at which point the system must be forced to stop the constant reading of addresses seen in 1520.
- the valid BRAM read 1508 and bounding box data 1102 are filtered 1604 to identify a valid rank. Additionally, the result read and valid rank have a one-clock delay added 1606 1608. If the data is found to be a valid rank 1602, then the data is further ranked by the rank modules 1600.
- FIG. 17 depicts a flowchart of State 7, focusing on the ranked operation in each of the individual rank modules.
- Each rank module first starts by determining 1700 if the rank associated with the bounding box or region is valid or if a reset rank has been issued. If the rank is an invalid rank, indicated by a valid rank value of 0, then the module will pass along an invalid rank command and “0” values for the associated data. If the rank is a reset rank 1704, then the system empties the data 1706 stored in the ranks. Emptying the data 1706 stored in the ranks refers to the locally stored ranked data.
- the rank modules each create a rank number by dividing the pixel count by the maximum of the Xsize and Ysize. That rank number and the valid rank transfer 1708 between the rank modules such that the highest rank numbers stay at the top, with the lower rank numbers bumped down to open ranks or out of the saved area entirely. This process proceeds by first determining 1710 if the incoming rank is greater than the upper rank of the currently stored data.
- the incoming rank is then set 1712 as the upper rank of currently stored data, with the previously stored upper rank data then being set 1714 as the lower data stored and removed 1716 from the rank module. If the incoming rank is less than the upper rank of currently stored data, then it is determined 1718 if the incoming rank is greater than the lower rank of the previously stored data. If not, then the incoming ranked data is removed 1722 from the rank module. Alternatively, if the incoming rank is greater than the lower rank of the previously stored data, then the incoming rank data is set 1720 as the lower ranked stored data and passed out of the rank module 1716.
- Bound Box Implementation described above Adhering to the algorithm needs and as shown in FIG. 18, the known image 1800 was 512x256 pixels in size, with single bit pixels (i.e., one-bit value for each pixel location). Passing the image 1800 through the filter, the system identified 220 labeled locations, which were later merged into 182 unique Bounding Boxes. Through ranking, the system filtered the bounding boxes down to the top 15 “ranked” locations (as shown in FIG. 19). Thus, it was shown that the Bounding Box process of the present disclosure was effective in identifying objects in the image and generating a bounding box around each such object. Based on that, the Bounding Box process described herein can be implemented on consecutive frames in a video image to operate as an efficient and effective movement tracker in any desired setting.
- a processor 2000 may be used to control a device 2002 (e.g., a mobile device display, a virtual reality display, an augmented reality display, a computer monitor, a motor, a machine, a drone, a camera, etc.) based on the bounding box generation.
- the control of the device 2002 may be used to transform the localization of an object into a still image or video representing the object.
- the device 2002 may be controlled to cause the device to move or otherwise initiate a physical action based on the discrimination and localization.
- a drone or other autonomous vehicle may be controlled to move to an area where the object is determined to be located based on the imagery.
- a camera may be controlled to track an identified object by maintaining a moving bounding box within a field of view.
- actuators or motors are activated to cause the camera (or sensor) to move to maintain the bounding box within the field of view so that an operator or other system can identify and track the object.
- the device can be an autonomous vehicle, such as an unmanned aerial vehicle (UAV), that includes a camera and the bounding box design described herein.
- the UAV can be caused to maneuver to follow the object such that the bounding box remains within the field of view of the UAV.
- rotors and other components of the UAV are actuated to cause the UAV to track and follow the object.
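
The filtering and ranking steps described above (a lower and upper bound on pixel count, a rank number equal to the pixel count divided by the larger of the X and Y sizes, and a chain of rank modules that keeps the highest ranks and bumps lower ones down or out) can be illustrated with a minimal software sketch. This is not the disclosed hardware: the `Box` fields, the function names, and the simplification that each rank module holds exactly one entry are assumptions made only for illustration, and the clocked State 7 pipeline and BRAM reads are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical bounding box record: inclusive pixel corners plus pixel count."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    pixel_count: int

    @property
    def x_size(self) -> int:
        return self.x_max - self.x_min + 1

    @property
    def y_size(self) -> int:
        return self.y_max - self.y_min + 1


def rank_score(box: Box) -> float:
    """Rank number: pixel count divided by the maximum of Xsize and Ysize."""
    return box.pixel_count / max(box.x_size, box.y_size)


def is_valid(box: Box, lo: int, hi: int) -> bool:
    """Pixel-count filter: valid only inside the [lo, hi] range (bounds assumed inclusive)."""
    return lo <= box.pixel_count <= hi


def top_ranked(boxes: List[Box], k: int, lo: int, hi: int) -> List[Box]:
    """Keep the k highest-ranked valid boxes using a chain of k single-entry slots."""
    slots: List[Optional[Box]] = [None] * k
    for box in boxes:
        if not is_valid(box, lo, hi):
            continue  # invalid regions never enter the rank chain
        incoming = box
        for i in range(k):
            if slots[i] is None:
                # Open rank: store the entry and stop passing it along.
                slots[i] = incoming
                break
            if rank_score(incoming) > rank_score(slots[i]):
                # Incoming entry outranks the stored one: swap, and bump
                # the previously stored entry down to the next slot.
                slots[i], incoming = incoming, slots[i]
        # An entry still being passed along after the last slot is dropped.
    return [b for b in slots if b is not None]
```

Calling `top_ranked(boxes, k=15, lo=..., hi=...)` would correspond to keeping the top 15 ranked locations mentioned in the test above, with the pixel-count bounds left as application-specific parameters that the text does not specify.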
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201862659129P | 2018-04-17 | 2018-04-17 | |
| PCT/US2019/018049 WO2019203920A1 (fr) | 2018-04-17 | 2019-02-14 | Hardware and system of bounding box generation for image processing pipeline |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP3782114A1 (fr) | 2021-02-24 |
| EP3782114A4 (fr) | 2022-01-05 |
Family
ID=68240216
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP19788663.3A Withdrawn EP3782114A4 (fr) | 2018-04-17 | 2019-02-14 | Hardware and system of bounding box generation for image processing pipeline |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP3782114A4 (fr) |
| CN (1) | CN111801703A (fr) |
| WO (1) | WO2019203920A1 (fr) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11798269B2 (en) * | 2021-03-16 | 2023-10-24 | Kneron (Taiwan) Co., Ltd. | Fast non-maximum suppression algorithm for object detection |
Family Cites Families (26)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5848184A (en) * | 1993-03-15 | 1998-12-08 | Unisys Corporation | Document page analyzer and method |
| US6263113B1 (en) * | 1998-12-11 | 2001-07-17 | Philips Electronics North America Corp. | Method for detecting a face in a digital image |
| US6351559B1 (en) * | 1998-12-22 | 2002-02-26 | Matsushita Electric Corporation Of America | User-enclosed region extraction from scanned document images |
| US6763137B1 (en) * | 2000-09-14 | 2004-07-13 | Canon Kabushiki Kaisha | Recognition and clustering of connected components in bi-level images |
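| JP4692115B2 (ja) * | 2005-07-11 | 2011-06-01 | ソニー株式会社 | Image processing apparatus and imaging apparatus |
| CN101551859B (zh) * | 2008-03-31 | 2012-01-04 | 夏普株式会社 | Image discrimination device and image retrieval device |
| CN101567048B (zh) * | 2008-04-21 | 2012-06-06 | 夏普株式会社 | Image discrimination device and image retrieval device |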
| US8326077B2 (en) * | 2008-10-31 | 2012-12-04 | General Instrument Corporation | Method and apparatus for transforming a non-linear lens-distorted image |
| US8867820B2 (en) * | 2009-10-07 | 2014-10-21 | Microsoft Corporation | Systems and methods for removing a background of an image |
| US20120206567A1 (en) * | 2010-09-13 | 2012-08-16 | Trident Microsystems (Far East) Ltd. | Subtitle detection system and method to television video |
| US8629913B2 (en) * | 2010-09-30 | 2014-01-14 | Apple Inc. | Overflow control techniques for image signal processing |
| US8457356B2 (en) * | 2010-10-21 | 2013-06-04 | SET Corporation | Method and system of video object tracking |
| WO2012138828A2 (fr) * | 2011-04-08 | 2012-10-11 | The Trustees Of Columbia University In The City Of New York | Kalman filter approach to augment object tracking |
| US8917764B2 (en) * | 2011-08-08 | 2014-12-23 | Ittiam Systems (P) Ltd | System and method for virtualization of ambient environments in live video streaming |
| US9530221B2 (en) * | 2012-01-06 | 2016-12-27 | Pelco, Inc. | Context aware moving object detection |
| TWI463420B (zh) * | 2012-03-13 | 2014-12-01 | Tatung Co | Image processing method for connected component labeling |
| EP3686754A1 (fr) * | 2013-07-30 | 2020-07-29 | Kodak Alaris Inc. | System and method for creating navigable views of ordered images |
| US10163217B2 (en) * | 2014-02-17 | 2018-12-25 | General Electric Company | Method and system for processing scanned images |
| GB2525223B (en) * | 2014-04-16 | 2020-07-15 | Advanced Risc Mach Ltd | Graphics processing systems |
| EP3101592A1 (fr) * | 2015-06-02 | 2016-12-07 | Thomson Licensing | Method and apparatus for scoring the aesthetics of an image |
| US9594984B2 (en) * | 2015-08-07 | 2017-03-14 | Google Inc. | Business discovery from imagery |
| CN119645078A (zh) * | 2015-09-15 | 2025-03-18 | 深圳市大疆创新科技有限公司 | System and method for controlling a movable object to track a target |
| WO2017123920A1 (fr) * | 2016-01-14 | 2017-07-20 | RetailNext, Inc. | Detecting, tracking and counting objects in videos |
| US10482681B2 (en) * | 2016-02-09 | 2019-11-19 | Intel Corporation | Recognition-based object segmentation of a 3-dimensional image |
| CN107491456A (zh) * | 2016-06-13 | 2017-12-19 | 阿里巴巴集团控股有限公司 | Image ranking method and device |
| US10565255B2 (en) * | 2016-08-24 | 2020-02-18 | Baidu Usa Llc | Method and system for selecting images based on user contextual information in response to search queries |
2019
- 2019-02-14: EP application EP19788663.3A (patent EP3782114A4, fr), not active, Withdrawn
- 2019-02-14: WO application PCT/US2019/018049 (patent WO2019203920A1, fr), not active, Ceased
- 2019-02-14: CN application CN201980016252.4A (patent CN111801703A, zh), active, Pending
Also Published As
| Publication number | Publication date |
|---|---|
| EP3782114A4 (fr) | 2022-01-05 |
| CN111801703A (zh) | 2020-10-20 |
| WO2019203920A1 (fr) | 2019-10-24 |
Similar Documents
| Publication | Title |
|---|---|
| EP3855351B1 (fr) | Locating element detection method, apparatus, device and medium |
| US12154309B2 (en) | Joint training of neural networks using multi-scale hard example mining |
| CN113159016B (zh) | Text position locating method and system, and model training method and system |
| US11543830B2 (en) | Unsupervised real-to-virtual domain unification for end-to-end highway driving |
| EP4307219A1 (fr) | Three-dimensional target detection method and apparatus |
| US9665542B2 (en) | Determining median value of an array on vector SIMD architectures |
| US20210063577A1 (en) | Robot relocalization method and apparatus and robot using the same |
| US12130860B2 (en) | Visual feature database construction method, visual positioning method and apparatus, and storage medium |
| KR102073162B1 (ko) | Deep learning-based small object detection technique |
| EP4056952A1 (fr) | Map fusion method, apparatus, device and storage medium |
| US10262229B1 (en) | Wide-area salient object detection architecture for low power hardware platforms |
| US12266147B2 (en) | Hand posture estimation method, apparatus, device, and computer storage medium |
| US11182908B2 (en) | Dense optical flow processing in a computer vision system |
| US20200082544A1 (en) | Computer vision processing |
| US20210166393A1 (en) | Pixel-wise Hand Segmentation of Multi-modal Hand Activity Video Dataset |
| JP2017091103A (ja) | Coarse-to-fine search method and image processing apparatus |
| CN105051756A (zh) | Haar calculation system, image classification system, associated methods and associated computer program products |
| Hirabayashi et al. | GPU implementations of object detection using HOG features and deformable models |
| JP2013196454A (ja) | Image processing device, image processing method and image processing program |
| US20230386231A1 (en) | Method for detecting three-dimensional objects in relation to autonomous driving and electronic device |
| KR20220095169A (ko) | Method of operating a device for three-dimensional object detection, and the device |
| US20190258888A1 (en) | Hardware and system of bounding box generation for image processing pipeline |
| US20220277595A1 (en) | Hand gesture detection method and apparatus, and computer storage medium |
| CN110651475A (zh) | Hierarchical data organization for dense optical flow |
| CN115482523A (zh) | Small object detection method and system with a lightweight multi-scale attention mechanism |
Legal Events
| Code | Title | Description |
|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| 17P | Request for examination filed | Effective date: 20200828 |
| AK | Designated contracting states | Kind code of ref document: A1. Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the european patent | Extension state: BA ME |
| DAV | Request for validation of the european patent (deleted) | |
| DAX | Request for extension of the european patent (deleted) | |
| A4 | Supplementary search report drawn up and despatched | Effective date: 20211208 |
| RIC1 | Information provided on ipc code assigned before grant | Ipc: G06F 15/76 20060101ALI20211202BHEP; Ipc: G06T 1/60 20060101ALI20211202BHEP; Ipc: G06T 1/20 20060101AFI20211202BHEP |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20220716 |