WO2012073894A1 - Object detection method and object detection apparatus using the method - Google Patents
Object detection method and object detection apparatus using the method
- Publication number
- WO2012073894A1 (application PCT/JP2011/077404)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hog feature
- detected
- bins
- target image
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2115—Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7747—Organisation of the process, e.g. bagging or boosting
Definitions
- the present invention relates to an object detection method for detecting whether or not a person or a specific object is present in a captured image, and an object detection apparatus using the method.
- ITS Intelligent Transport System
- IT Information Technology
- HOG Histograms of Oriented Gradients
- the HOG feature amount is a feature amount that can represent the shape of an object in an image. It is determined from the luminance information of each pixel of the image and is a histogram-like feature amount obtained from the direction and magnitude of the luminance gradient in a local region (cell) of the image.
- Non-Patent Document 1 describes a technique using this HOG feature amount and an SVM (Support Vector Machine). In this method, a cell (block) of a certain size is moved through the image while the HOG feature amount of the block is sequentially calculated, to detect whether or not a person exists in the image.
- Non-Patent Document 2 describes a method using a Joint feature representing a co-occurrence between a plurality of HOG feature amounts as a method for detecting an object using the HOG feature amount.
- Non-Patent Document 3 describes a method of calculating a plurality of HOG feature amounts by changing the block size.
- the present invention has been made in view of such circumstances, and its object is to provide an object detection method, and an object detection apparatus using the method, that calculate a plurality of HOG feature amounts having different numbers of bins for each position of a local region in an image and construct a reference for detecting an object.
- the object detection method that meets the above object calculates a HOG feature amount (A) indicating a luminance gradient for a detection target image, and detects the presence or absence of the detected object in the detection target image based on a HOG feature amount (B) indicating a luminance gradient calculated in advance for a sample image in which the detected object is imaged.
- the method includes a step of calculating a plurality of HOG feature amounts (B) having different numbers of bins for each of a plurality of local regions in the sample image and obtaining a feature amount pattern indicating the presence of the detected object; a step of constructing, based on the feature amount pattern, a discriminator for determining the presence or absence of the detected object in the detection target image; and a step in which the discriminator determines the presence or absence of the detected object in the detection target image.
- in the object detection method, when the detected object is a person, it is preferable that the discriminator detect the whole body, upper body, and lower body of the detected object in the detection target image based on the plurality of HOG feature amounts (A) having different numbers of bins, detect the orientation of each, and determine the direction of the entire detected object. Here, the direction detected most often among the directions of the whole body, the upper body, and the lower body can be taken as the direction of the entire detected object.
- the bin effective for obtaining the feature amount pattern by a learning algorithm is selected from a plurality of bins of each HOG feature amount (B).
- the learning algorithm is preferably AdaBoost.
- the object detection apparatus calculates a HOG feature amount (A) indicating a luminance gradient for a detection target image, and detects the presence or absence of the detected object in the detection target image based on a HOG feature amount (B) indicating a luminance gradient calculated in advance for a sample image in which the detected object is imaged.
- in this apparatus, a plurality of HOG feature amounts (B) having different numbers of bins are calculated for each of a plurality of local regions in the sample image, a feature amount pattern indicating the presence of the detected object is obtained from the plurality of HOG feature amounts (B), and a plurality of HOG feature amounts (A) having different numbers of bins are calculated for each of a plurality of local regions in the detection target image.
- in the object detection apparatus, when the detected object is a person, it is preferable that the discriminator detect the whole body, upper body, and lower body of the detected object in the detection target image based on the plurality of HOG feature amounts (A) having different numbers of bins, detect the orientation of each, and determine the direction of the entire detected object. Here, the direction detected most often among the directions of the whole body, the upper body, and the lower body can be taken as the direction of the entire detected object.
- the calculation means selects, by a learning algorithm, the bins effective for obtaining the feature amount pattern from the plurality of bins of each HOG feature amount (B).
- the learning algorithm is AdaBoost.
- in the object detection method and the object detection apparatus, a discriminator that determines the presence or absence of the detected object in the detection target image is constructed based on a feature amount pattern, indicating the presence of the detected object, obtained by calculating a plurality of HOG feature amounts (B) having different numbers of bins for each of a plurality of local regions in a sample image. The discriminator then determines the presence or absence of the detected object based on a plurality of HOG feature amounts (A) having different numbers of bins calculated for each of a plurality of local regions in the detection target image.
- bins (components) of a HOG feature amount (B) that are not suitable as a criterion for detecting the presence of the detected object are not used. Instead, other bins (components) of the same histogram, or bins (components) of other HOG feature amounts (B) with different numbers of bins, that are suitable as that criterion are used. A feature amount composed of the components effective for object detection can thus be extracted, which increases the accuracy of determining the presence of the detected object.
- the object to be detected is a person
- the discriminator detects the object in the detection target image based on a plurality of HOG feature quantities (A) having different numbers of bins.
- when the bins effective for obtaining the feature amount pattern are selected by a learning algorithm from the plurality of bins of each HOG feature amount (B), the feature amount pattern on which the discriminator is built can be obtained from bins suitable as a reference for detecting the detected object, and the detected object can be detected reliably.
- FIG. 1 is a block diagram of an object detection apparatus according to an embodiment of the present invention. The remaining figures are, in order: an explanatory drawing of the object detection method according to the embodiment; a further explanatory drawing of the same method; a flowchart showing the preparation phase of the method; a flowchart showing the determination phase of the method; and an explanatory drawing of the learning images and evaluation images used in the experimental example and Comparative Examples 1 and 2.
- an object detection method and an object detection apparatus 10 using the method according to an embodiment of the present invention calculate a HOG feature amount (A) indicating a luminance gradient for a detection target image, and detect the presence or absence of the detected object in the detection target image on the basis of a HOG feature amount (B) indicating a luminance gradient calculated in advance for a sample image 20 in which the detected object is imaged.
- an object detection apparatus 10 mainly includes a camera 11, a computer 12 (for example, a microcomputer), and a display 13, and is mounted on a vehicle, for example.
- the computer 12 is signal-connected to the camera 11, determines whether or not the detected object P (see FIG. 2) is present in the image captured by the camera 11 and, if so, determines its direction, and displays the determination result on the display 13.
- the computer 12 is provided with a CPU 14 for performing information processing, a hard disk 15 on which various programs are mounted, and a memory 16 accessible by the CPU 14.
- the hard disk 15 holds a calculation means 17 that calculates HOG feature amounts indicating luminance gradients for images captured by the camera 11, and a discriminator 18 that is constructed by the calculation means 17 and determines whether or not the detected object P is present in an image.
- there are two types of "image" here: the sample image 20 shown in FIG. 2, in which the detected object P is imaged, and the detection target image, for which it is determined whether or not the detected object P has been imaged.
- the calculating means 17 and the discriminator 18 are programs stored in the hard disk 15. Further, the computer 12 is provided with interfaces 11a, 13a, and 15a for signal-connecting the circuit on which the CPU 14 is mounted to the camera 11, the display 13, and the hard disk 15, respectively.
- the computing unit 17 can calculate a plurality of HOG feature quantities having different numbers of bins for a portion in a cell (local region) 19 having a certain size shown in FIG.
- the “HOG feature value” here refers to both the HOG feature value (A) calculated for the detection target image and the HOG feature value (B) calculated for the sample image 20 (the same applies hereinafter).
- the HOG feature amount is a feature amount obtained by histogramating the luminance gradient with the luminance gradient direction of the cell 19 as the horizontal axis and the magnitude (intensity) of the luminance gradient as the vertical axis.
- the range of luminance gradient directions is divided into a plurality of direction areas, and the magnitude of the luminance gradient corresponding to each direction area is indicated by the height of the corresponding histogram bin.
- the luminance gradient becomes large at the position where the contour of the object (the boundary between the object and the background) is located. Therefore, the shape of the object in the image can be detected by obtaining the HOG feature amount.
- a pattern of HOG feature amounts (B) indicating the detected object P (a person in this embodiment), hereinafter simply called the "feature amount pattern", is learned in advance from the sample images 20, and based on the learned feature amount pattern a discriminator 18 can be constructed for determining whether or not the detected object exists in the detection target image.
- the size of the image captured by the camera 11 is 30 × 60 pixels as shown in FIG. 2, and the size of the cell 19 is 5 × 5 pixels.
- the calculation means 17 calculates the luminance gradient magnitude m using Equation 1 and the luminance gradient direction θ using Equation 2.
- fx(x, y) in Equations 1 and 2 is the luminance difference in the X-axis direction (left-right direction) shown in FIG. 2, and fy(x, y) is the luminance difference in the Y-axis direction (up-down direction) shown in FIG. 2.
- when the luminance of the pixel located at coordinates (x, y) in the image is I(x, y), fx(x, y) can be obtained by Equation 3 and fy(x, y) by Equation 4.
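Equations 1 to 4 themselves are not reproduced in this text. As a reading aid only, the commonly used HOG definitions are sketched below; that the patent's own equations take exactly this form, in particular the central-difference form of fx and fy, is an assumption.

```latex
% Assumed standard HOG definitions (not copied from the patent text)
\begin{align}
m(x, y)      &= \sqrt{f_x(x, y)^2 + f_y(x, y)^2} \\
\theta(x, y) &= \tan^{-1}\!\left(\frac{f_y(x, y)}{f_x(x, y)}\right) \\
f_x(x, y)    &= I(x + 1, y) - I(x - 1, y) \\
f_y(x, y)    &= I(x, y + 1) - I(x, y - 1)
\end{align}
```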
- the calculation means 17 moves the cell 19 in the image and, each time the cell 19 moves, calculates the magnitude m and the direction θ of the luminance gradient of the area in the cell 19 based on the luminance of each pixel in the cell 19.
- the cell 19 moves one pixel at a time in the X-axis direction from one end of the image to the other. When it reaches the other end, it moves to a position shifted by one pixel in the Y-axis direction and then again moves one pixel at a time in the X-axis direction.
- the calculation means 17 continues to move the cell 19 until the cell 19 has moved through the entire area of the image, and each time the cell 19 moves, it calculates N (N ≥ 2) HOG feature amounts having different numbers of bins.
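The cell scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the unsigned 0° to 180° direction range, the central differences with border clamping, and the magnitude-weighted voting are all assumptions made for the example. Only the 30 × 60 image, the 5 × 5 cell, the one-pixel step, and the bin counts 3, 5, 7, and 9 come from the text.

```python
import math

def cell_hog(lum, x0, y0, cell=5, n_bins=9):
    """Gradient-direction histogram for one cell, weighted by gradient magnitude.

    lum is a list of rows of luminance values; (x0, y0) is the cell's top-left pixel.
    Directions are folded into [0, 180) degrees (assumption: unsigned gradients).
    """
    h, w = len(lum), len(lum[0])
    hist = [0.0] * n_bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # Central differences, clamped at the image border (assumption).
            fx = lum[y][min(x + 1, w - 1)] - lum[y][max(x - 1, 0)]
            fy = lum[min(y + 1, h - 1)][x] - lum[max(y - 1, 0)][x]
            m = math.hypot(fx, fy)
            theta = math.degrees(math.atan2(fy, fx)) % 180.0
            # Clamp guards against theta landing exactly on 180.0 through rounding.
            b = min(int(theta / (180.0 / n_bins)), n_bins - 1)
            hist[b] += m
    return hist

def multi_bin_hog(lum, cell=5, bin_counts=(3, 5, 7, 9)):
    """Slide the cell one pixel at a time (X first, then Y) and compute
    one histogram per bin count at every cell position."""
    h, w = len(lum), len(lum[0])
    feats = []
    for y0 in range(h - cell + 1):
        for x0 in range(w - cell + 1):
            feats.append({n: cell_hog(lum, x0, y0, cell, n) for n in bin_counts})
    return feats

# Synthetic 30 x 60 image (30 columns, 60 rows) with a purely horizontal ramp.
image = [[float(x) for x in range(30)] for _ in range(60)]
features = multi_bin_hog(image)
print(len(features))  # 1456 cell positions: (30 - 5 + 1) * (60 - 5 + 1)
```

The count of 1456 positions matches the position of the last cell given later for the sample image 20.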
- for the sample image 20 in which a person as the detected object P is imaged (here, "person" refers to an object belonging to the human category among various objects; the same applies below), the calculation means 17 calculates N HOG feature amounts (B) having different numbers of bins for each of the arrangement positions of the cell 19 in the sample image 20.
- a feature amount pattern indicating the presence of a person in the sample image 20 is obtained from the calculated N HOG feature amounts (B), and a discriminator 18 is constructed from the feature amount pattern.
- the calculation means 17 calculates N HOG feature amounts (A) having different numbers of bins for each of the arrangement positions of the cell 19 in the detection target image, and gives the calculated HOG feature amounts (A) to the discriminator 18.
- the discriminator 18 determines the presence or absence of a person in the detection target image based on the HOG feature amounts (A) given from the calculation means 17. When the discriminator 18 detects that a person is present in the detection target image, it detects the whole body, upper body, and lower body of the person based on the HOG feature amounts (A), detects the orientation of each, and determines the orientation of the entire person in the detection target image.
- the method for determining the presence or absence of a person in a detection target image and the direction of the person is divided into a preparation phase, in which the discriminator 18 for determining the presence or absence and the direction of a person in the detection target image is constructed, and a determination phase, in which the presence or absence of a person in the detection target image and the direction of the person are determined.
- in the preparation phase, a plurality of HOG feature amounts (B) having different numbers of bins are calculated for each of a plurality of different arrangement positions of the cell 19 in the sample image 20, a feature amount pattern indicating the presence of a person is obtained, and the discriminator 18 is constructed. In the determination phase, a step is performed in which the discriminator 18 determines the presence or absence of the person and the direction of the person.
- the HOG feature value calculated by the calculation means 17 in the preparation phase and the determination phase is derived by determining the number of bins constituting the HOG feature value and determining the element (size) of each bin.
- when the first position of the cell 19 for calculating the HOG feature amount in the image is called the position of the first cell 19, the position of the cell 19 after its (k − 1)th movement is called the position of the k-th cell 19, where k is an integer greater than or equal to 2.
- the element a kb i of the i-th bin of the HOG feature quantity B b at the position of the k-th cell 19 can be obtained by Expression 5.
- b is a subscript (index, ID) indicating the correspondence between a kb i and B b .
- in the preparation phase, a plurality of sample images 20 in which a person appears are captured by the camera 11, input to the hard disk 15 of the computer 12 via a storage medium (for example, a USB memory) or a communication network, and stored in the hard disk 15 as a database (step S1).
- in the sample images 20, a person facing the camera 11, a person facing left, and a person facing right are imaged.
- for each of the plurality of sample images 20 stored in the hard disk 15, a plurality of HOG feature amounts (B) having different numbers of bins are calculated (step S2). The numbers of bins are determined in advance; in this embodiment, HOG feature amounts (B) with 3, 5, 7, and 9 bins are calculated.
- in step S2, as shown in FIG. 4, the cell 19 moves in the sample image 20 (step S2-1). Each time the cell 19 moves, the calculation means 17 determines, from the predetermined numbers of bins, the number of bins for which the HOG feature amount (B) is to be calculated (step S2-2), and calculates the HOG feature amount (B) for that number of bins (step S2-3).
- the calculating means 17 calculates HOG feature values (B) with different numbers of bins while the cells 19 are fixed in the sample image 20, and calculates HOG feature values (B) with all the predetermined numbers of bins. After the completion, the cell 19 is moved (step S2-4).
- the calculation of the HOG feature amounts (B) is performed sequentially for each sample image 20 until the cell 19 has moved from the start point of the sample image 20 (the position of the first cell 19) to the end point (the position of the 1456th cell 19) (step S2-5).
- the calculation means 17 uses AdaBoost, which is one of the learning algorithms, to select from the plurality of bins of each calculated HOG feature amount (B) the bins effective for obtaining the feature amount pattern (i.e., the bins suitable as a reference for detecting the detected object) (step S3). For example, when detecting a person facing right, the HOG feature amounts with 3, 5, 7, and 9 bins are calculated for all cells in the image using a sample image of a person facing right. Feature selection is then performed using the AdaBoost algorithm, and, for example, the HOG feature amounts of the bins effective for indicating the front side of the head of a person facing right are extracted.
- as an example of the result of selection by AdaBoost, it is shown that the first and third bins are selected from the HOG feature amount with 3 bins; the first, third, and fourth bins from the HOG feature amount with 5 bins; the first, second, third, fifth, and seventh bins from the HOG feature amount with 7 bins; and the first, second, fifth, sixth, and eighth bins from the HOG feature amount with 9 bins.
- the initial HOG feature amount of the cell including the front side of the head of the person facing right has (3 + 5 + 7 + 9) = 24 components.
- after feature selection by AdaBoost, the HOG feature amount of this cell can be composed of only the (2 + 3 + 5 + 5) = 15 components effective for detecting a person facing right (object detection).
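The reduction from 24 to 15 components for this cell can be illustrated as follows. This is a sketch only: the dictionary layout, the function name `select_components`, and the placeholder histogram values are hypothetical. The selected bin positions (1-based) are those given in the example above.

```python
# Selected bins per bin count, 1-based, as listed in the example for this cell.
selected_bins = {
    3: [1, 3],
    5: [1, 3, 4],
    7: [1, 2, 3, 5, 7],
    9: [1, 2, 5, 6, 8],
}

def select_components(hog_by_bins, selected):
    """Concatenate only the selected bins of each histogram into one feature vector."""
    out = []
    for n in sorted(hog_by_bins):
        hist = hog_by_bins[n]
        out.extend(hist[i - 1] for i in selected.get(n, []))
    return out

# Placeholder histograms for one cell: bin i of each histogram holds the value i.
cell_feature = {n: [float(i) for i in range(1, n + 1)] for n in (3, 5, 7, 9)}

initial_size = sum(len(hist) for hist in cell_feature.values())
reduced = select_components(cell_feature, selected_bins)
print(initial_size, len(reduced))  # 24 15
```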
- similarly, HOG feature amounts with 3, 5, 7, and 9 bins are calculated for cells including other parts of the person in the sample image, for example the upper part of the rear side of the body, the middle part of the front side of the body, the buttocks, the upper part of the front of the leg, the middle part of the rear of the leg, and the lower part of the front of the leg, and feature selection is performed using the AdaBoost algorithm.
- as a result, a HOG feature amount consisting only of the bins effective for indicating each part of the person facing right is obtained from the HOG feature amounts with 3, 5, 7, and 9 bins.
- bins of HOG feature amounts that are not suitable as a criterion for detecting the presence of a person facing right are not used. Instead, other bins (components) of the same histogram, or bins (components) of other HOG feature amounts (B) with different numbers of bins, that are suitable as that criterion are used. A feature amount composed of the components effective for detecting a person facing right can thus be extracted, and the accuracy of determining the presence of the detected object can be increased.
- in step S3, in the same way, a HOG feature amount consisting only of the bins effective for indicating each part of a person facing the front is obtained from the HOG feature amounts with 3, 5, 7, and 9 bins of a person facing the front, and a HOG feature amount consisting only of the bins effective for indicating each part of a person facing left is obtained from the HOG feature amounts with 3, 5, 7, and 9 bins of a person facing left.
- instead of AdaBoost, for example, a data compression method such as PCA (Principal Component Analysis) or another algorithm can be used for selecting the effective bins.
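How a boosting algorithm can act as a bin selector is sketched below with a small discrete AdaBoost over one-feature threshold stumps: every feature (bin) that some stump picks is treated as "effective". This is an illustrative stand-in, not the patent's procedure; the function name, the stump form, the number of rounds, and the toy data are all assumptions.

```python
import math

def adaboost_select(X, y, rounds=5):
    """Discrete AdaBoost with single-feature threshold stumps.

    X: list of feature vectors (e.g. concatenated HOG bins), y: labels in {+1, -1}.
    Returns the sorted indices of the features used by at least one stump.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    chosen = set()
    for _ in range(rounds):
        best = None  # (weighted error, feature index, threshold, polarity)
        for j in range(d):
            for t in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, row, yi in zip(w, X, y)
                              if (pol if row[j] >= t else -pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, t, pol)
        err, j, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.add(j)
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * (pol if row[j] >= t else -pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return sorted(chosen)

# Toy data: only feature 1 separates the two classes; features 0 and 2 do not.
X_demo = [[0.2, 0.9, 0.5], [0.8, 0.8, 0.1], [0.5, 0.7, 0.9],
          [0.3, 0.1, 0.4], [0.9, 0.2, 0.6], [0.4, 0.3, 0.2]]
y_demo = [1, 1, 1, -1, -1, -1]
picked = adaboost_select(X_demo, y_demo, rounds=3)
print(picked)  # [1]
```

In the setting of the text, each column of `X` would correspond to one bin of one of the 3-, 5-, 7-, or 9-bin histograms of a cell, and the selected indices would identify the bins kept for the feature amount pattern.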
- a ′ kb i is an element after normalization
- ε is a coefficient for avoiding the denominator becoming zero.
- the calculation means 17 obtains, by SVM learning, which is one of the pattern identification methods, a feature amount pattern indicating that a person is present in the image based on the sample images 20, and uses this feature amount pattern to construct in the hard disk 15 a discriminator 18 for 1) determination of the presence or absence of a person (whole-body detection determination), 2) a process of mechanically separating the upper and lower bodies from the person's whole body, and 3) determination of the direction of the person (step S5).
- using the sample images 20 of people, the AdaBoost algorithm yields a plurality of HOG feature amounts that form the feature amount pattern indicating the presence of a person in the image (the HOG feature amounts selected for detection of a person facing the front, those selected for detection of a person facing left, and those selected for detection of a person facing right). Combining these HOG feature amounts forms the person detecting means (a part of the discriminator 18).
- the detection means has a function of outputting a positive output value when it is determined to detect the whole body of a person, and a negative output value when it is determined that no person is present.
- the output value (person detection determination threshold value) when the detection means determines the whole body detection of the person can be determined. Accordingly, an output value when the detection target image is determined using the detection unit is obtained, and the obtained output value is compared with the human detection determination threshold, and it is determined that there is a person when the output value is equal to or higher than the human detection determination threshold. When the output value is less than the human detection determination threshold, determination means (a part of the discriminator 18) for determining that there is no person is formed.
- the upper and lower bodies of the detected person can be specified by mechanically dividing the whole body into two parts. Therefore, by forming a dividing means (a part of the discriminator 18) that divides the HOG feature amounts selected for whole-body detection into HOG feature amounts effective for detecting the upper body and HOG feature amounts effective for detecting the lower body, output values corresponding to detection of the upper body and of the lower body can be obtained.
- 3) Determination of the person's orientation: the output values of the determination means when the whole body of a person is detected and the directions of the person in the sample images 20 (front, left, and right) are learned by SVM. In the 1 vs. 1 method, a threshold for deciding between rightward and frontward, a threshold for deciding between frontward and leftward, and a threshold for deciding between leftward and rightward are determined. In the 1 vs. rest method, a threshold for deciding between rightward and non-rightward (i.e., frontward or leftward), a threshold for deciding between leftward and non-leftward (i.e., frontward or rightward), and a threshold for deciding between frontward and non-frontward (i.e., rightward or leftward) are determined. The thresholds of the 1 vs. 1 method and the 1 vs. rest method are obtained in the same way for the person's upper body and lower body.
- the output values for the whole body, the upper body, and the lower body of the person are compared with the thresholds of the 1 vs. 1 method and the 1 vs. rest method to decide rightward or frontward, frontward or leftward, leftward or rightward, rightward or non-rightward, leftward or non-leftward, and frontward or non-frontward. A body direction detection means (a part of the discriminator 18) is formed that aggregates all of these results and determines the most frequent direction as the direction of the entire person (the entire detected object).
- the discriminator 18 can be constructed in the hard disk 15 by recording in the hard disk 15 programs that express the functions of the detection means, determination means, division means, and body direction detection means. In step S5, the discriminator 18 is constructed and the preparation phase ends.
- in the determination phase, based on the HOG feature amounts (A) calculated by the calculation means 17 for the detection target image, the discriminator 18 determines whether or not a person is present in the detection target image and determines the direction of the person.
- the calculation means 17 calculates a plurality of HOG feature amounts (A) having different numbers of bins at each position of the cell 19 while moving the cell 19 in the detection target image (step S1′).
- the discriminator 18 determines whether or not a person is present in the detection target image based on the HOG feature amounts (A) of the detection target image given from the calculation means 17 (step S2′).
- the determination result is displayed on the display 13 (step S3 ′).
- when the discriminator 18 determines that a person is present in the detection target image, it detects the whole body, upper body, and lower body of the person based on the HOG feature amounts (A) of the detection target image, detects the orientation of each, and determines the orientation of the entire person (step S4′). The determination result of the presence of the person and the direction of the person is then displayed on the display 13 (step S5′).
- in step S4′, the discriminator 18 applies voting by the 1 vs. 1 method and the 1 vs. rest method to the whole body of the person detected in the detection target image and to the upper body and lower body of that whole body. In the 1 vs. 1 method, three determinations (votes) are made: rightward or frontward, frontward or leftward, and leftward or rightward. In the 1 vs. rest method, three determinations (votes) are made: rightward or non-rightward, leftward or non-leftward, and frontward or non-frontward.
- based on the resulting 18 determinations (six for each of the whole body, the upper body, and the lower body), the direction with the largest number of votes is determined as the direction of the whole person in the detection target image.
- Each vote is performed by always selecting either positive or negative.
- The positive and negative classes of the 1 vs. 1 method are as follows.
- For example, suppose that in the 1 vs. 1 method the rightward-or-frontward judgment is negative, the frontward-or-leftward judgment is positive, and the leftward-or-rightward judgment is positive, and that in the 1 vs. rest method the rightward-or-not-rightward judgment is positive, the leftward-or-not-leftward judgment is negative, and the frontward-or-not-frontward judgment is negative.
- The 1 vs. 1 method then yields rightward, and the 1 vs. rest judgments yield rightward, not leftward (i.e., rightward or frontward), and not frontward (i.e., rightward or leftward), respectively.
- The cumulative number of votes is 5 for rightward, 2 for frontward, and 1 for leftward, so the body direction detection means outputs rightward.
- The 500 images of people facing left ($I_L$) consist of 100 ($I_L^{(1)}$) + 100 ($I_L^{(2)}$) + 100 ($I_L^{(3)}$) + 100 ($I_L^{(4)}$) + 100 ($I_L^{(5)}$).
- The 500 images of people facing right ($I_R$) consist of 100 ($I_R^{(1)}$) + 100 ($I_R^{(2)}$) + 100 ($I_R^{(3)}$) + 100 ($I_R^{(4)}$) + 100 ($I_R^{(5)}$).
- each of the 500 images is divided into 400 learning images and 100 evaluation images in five ways shown as Case 1 to Case 5.
- A discriminator (body direction detection means) for determining whether a person faces right or front is obtained by the following procedure.
- (1) For Case 1, AdaBoost is run on 300 images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)}$ and 300 images $I_F^{(2)} + I_F^{(3)} + I_F^{(4)}$, taken from the learning images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)} + I_L^{(5)}$ and $I_F^{(2)} + I_F^{(3)} + I_F^{(4)} + I_F^{(5)}$ respectively, to obtain weak classifiers (a plurality of HOG features selected for detecting people facing right and a plurality of HOG features selected for detecting people facing front). Strong classifiers are then obtained (a combination of the weak classifiers selected for detecting people facing right and a combination of those selected for detecting people facing front). Using the strong classifiers, the remaining learning images, 100 each of $I_L^{(5)}$ and $I_F^{(5)}$ for 200 in total, are mapped into the feature space, and the discriminant plane (classification plane) that separates their distributions most effectively is obtained by SVM learning. This gives the body direction detection means for determining whether a person faces right or front.
- (2) Using the discriminant plane obtained by SVM learning (the body direction detection means for determining right or front), the identification rate (recognition rate: the proportion of correctly classified images) is obtained on the evaluation images $I_L^{(1)} + I_F^{(1)}$ (200 images).
- (3) Steps (1) and (2) are performed for Cases 2 to 5, and the identification rate is obtained for each.
- (4) The average of the identification rates of Cases 1 to 5 is obtained and taken as the performance of the body direction detection means for determining right or front.
- Similarly, a discriminator (body direction detection means) for determining whether a person faces front or left and one for determining whether a person faces right or left are obtained by procedures (1) to (4).
- A discriminator for determining whether a person faces right or not right (front or left) is obtained by the following procedure.
- (5) For Case 1, AdaBoost is run on 300 images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)}$ and 600 images $I_F^{(2)} + I_F^{(3)} + I_F^{(4)} + I_R^{(2)} + I_R^{(3)} + I_R^{(4)}$, taken from the learning images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)} + I_L^{(5)}$ and $I_F^{(2)} + I_F^{(3)} + I_F^{(4)} + I_F^{(5)} + I_R^{(2)} + I_R^{(3)} + I_R^{(4)} + I_R^{(5)}$ respectively, to obtain weak classifiers (a plurality of HOG features selected for detecting people facing right, and pluralities of HOG features selected for detecting people facing front and people facing left). Strong classifiers are then obtained (a combination of the weak classifiers selected for detecting people facing right, and combinations of those selected for detecting people facing front and people facing left). Using the strong classifiers, the 300 learning images $I_L^{(5)} + I_F^{(5)} + I_R^{(5)}$ are mapped into the feature space, and the discriminant plane (classification plane) that separates their distributions most effectively is obtained by SVM learning.
- (6) Using the discriminant plane obtained by SVM learning (the body direction detection means for determining right or not right), the identification rate (recognition rate: the proportion of correctly classified images) is obtained on the evaluation images $I_L^{(1)} + I_F^{(1)} + I_R^{(1)}$ (300 images).
- The original HOG features of the evaluation images were calculated, and the identification rate (recognition rate) obtained when the person's direction was determined with each body direction detection means was measured.
- The original HOG features are calculated by setting cells over the entire image, but the number of bins in the density histogram of each cell is constant.
- The average of the identification rates obtained with the body direction detection means was 66.5%.
- The HOG features using a human mask are calculated by defining a human shape as a mask and setting cells on the image within the range of the mask, but the number of bins in the density histogram of each cell is constant (Nakajima, Tan, Ishikawa, Morie, "Detection of person and body direction using HOG features and human masks", Journal of the Institute of Image Electronics Engineers, Vol. 39, pp. 1104-1111, 2010).
- The average of the identification rates obtained with the body direction detection means was 85.9%.
- the size of the cell is not limited to 5 ⁇ 5 pixels, but can be other sizes.
- The object detection device that performs the preparation phase and the object detection device that performs the determination phase may be different devices. The discriminator created on the preparation-phase device can be stored on the hard disks of multiple determination-phase devices, which then perform the detection.
- In the HOG of Non-Patent Document 1, edge directions are examined one location at a time at a plurality of important locations in the image, whereas CO-HOG (Co-occurrence HOG) examines edge directions at two locations simultaneously, which improves recognition accuracy.
- However, the number of histogram bins in CO-HOG is fixed at a single value, as in the original HOG. An improved CO-HOG, in which the HOG of the present invention is applied to the conventional CO-HOG technique (so that a plurality of HOG features with different numbers of bins are calculated at each of the important locations in the image), can therefore improve the recognition rate further.
- An object detection method for constructing a reference for detecting an object by calculating a plurality of HOG feature quantities having different numbers of bins for each position of a local region in an image and an object detection apparatus using the method.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
Description
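Detecting objects in images captured outdoors is difficult because of changes in lighting conditions, occlusion (overlap between moving bodies in the image), and similar factors, but because it helps reduce traffic accidents it is being actively researched around the world.
Non-Patent Document 1 describes a technique that uses the HOG feature together with an SVM (Support Vector Machine). In this method, a cell (block) of fixed size is moved through the image, the HOG feature of the block portion is calculated at each position in turn, and whether a person is present in the image is detected.
As other object detection methods using HOG features, Non-Patent Document 2 describes a method using Joint features that express co-occurrence among multiple HOG features, and Non-Patent Document 3 describes a technique that calculates multiple HOG features while varying the block size.
The applicant therefore focused on whether object detection accuracy could be improved by departing from the conventional approach of building the discriminator that detects the presence or absence of an object from HOG features with a single number of bins.
The present invention has been made in view of these circumstances. Its object is to provide an object detection method that calculates a plurality of HOG features with different numbers of bins for each position of a local region in an image and constructs a reference for detecting an object, and an object detection device using the method.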
Here, the direction detected most often among the directions detected for the whole body, upper body, and lower body can be determined to be the direction of the detected object as a whole.
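As shown in FIGS. 1 to 5, the object detection method according to one embodiment of the present invention, and the object detection device 10 using the method, calculate HOG features (A) indicating the luminance gradient of a detection target image, and detect the presence or absence of the object to be detected in the detection target image by reference to HOG features (B), also indicating the luminance gradient, calculated in advance for sample images 20 in which the object to be detected was captured.
These are described in detail below.
The computer 12 is signal-connected to the camera 11. It determines whether the object to be detected P (see FIG. 2) is present in an image captured by the camera 11, determines its direction if it is present, and displays the determination result on the display 13.
The hard disk 15 holds the calculation means 17, which calculates HOG features indicating the luminance gradient of an image captured by the camera 11, and the discriminator 18, which is constructed by the calculation means 17 and determines the presence or absence of the object P in the image. There are two kinds of "image" here: the sample image 20 shown in FIG. 2, in which the object P was captured, and the detection target image, for which it is determined whether the object P was captured. Both are captured by the camera 11 (hereinafter, "image" alone refers to both the sample image 20 and the detection target image).
The calculation means 17 and the discriminator 18 are programs stored on the hard disk 15. The computer 12 is also provided with interfaces 11a, 13a, and 15a that signal-connect the circuit carrying the CPU 14 to the camera 11, the display 13, and the hard disk 15, respectively.
As shown in FIG. 3, the HOG feature is a feature in which the luminance gradient of the cell 19 is turned into a histogram, with the luminance gradient direction on the horizontal axis and the luminance gradient magnitude (strength) on the vertical axis. The range of directions from 0° to 180° is divided into a plurality of direction regions, and the magnitude of the luminance gradient corresponding to each direction region is shown as the height of a histogram bin.
Because the luminance gradient becomes large where the contour of an object (the boundary between the object and the background) lies in the image, the shape of an object in the image can be detected by obtaining HOG features. Accordingly, a pattern of HOG features (B) representing the object P (a person in this embodiment), hereinafter also called simply the "feature pattern", can be learned in advance from the sample images 20, and a discriminator 18 for determining whether the object is present in the detection target image can be constructed from the learned feature pattern.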
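Based on the luminance of each pixel in the cell 19, the calculation means 17 calculates the luminance gradient magnitude m by Formula 1 and the luminance gradient direction θ by Formula 2.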
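Formulas 1 and 2 are not reproduced in this text, so the following is only a sketch under an assumption: it uses the central-difference form commonly used with HOG, where fx and fy are luminance differences between neighbouring pixels, m = sqrt(fx² + fy²), and θ is the arctangent of fy/fx folded into 0° to 180°. The function names `gradient` and `cell_histogram` are hypothetical and chosen for illustration.

```python
import math

def gradient(img, x, y):
    """Return (magnitude, unsigned orientation in degrees) at pixel (x, y).

    img is a list of rows of luminance values; (x, y) must not be on the border.
    Central differences are an assumption, not taken from the source formulas.
    """
    fx = img[y][x + 1] - img[y][x - 1]
    fy = img[y + 1][x] - img[y - 1][x]
    m = math.hypot(fx, fy)
    theta = math.degrees(math.atan2(fy, fx)) % 180.0  # fold into [0, 180)
    return m, theta

def cell_histogram(img, x0, y0, size, n_bins):
    """Accumulate gradient magnitudes of a size x size cell into n_bins
    orientation bins that together cover 0 to 180 degrees."""
    hist = [0.0] * n_bins
    width = 180.0 / n_bins
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            m, theta = gradient(img, x, y)
            hist[min(int(theta // width), n_bins - 1)] += m
    return hist
```

Calling `cell_histogram` on the same cell with `n_bins` set to 3, 5, 7, and 9 in turn gives the several histograms per cell position that the description relies on.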
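The cell 19 moves one pixel at a time from one end of the image in the X-axis direction to the other. On reaching the other end, it moves to the position at the first end in the X direction shifted by one pixel in the Y direction, and then again moves one pixel at a time in the X-axis direction.
The calculation means 17 keeps moving the cell 19 until it has traversed the entire image, and each time the cell 19 moves it calculates N HOG features (N ≥ 2) with different numbers of bins.
From the start to the end of the movement of the cell 19, the calculation means 17 therefore calculates N HOG features for each of the 1456 placement positions of the cell 19 (1456 = (30 − 5 + 1) × (60 − 5 + 1)), or 1456N HOG features in total. In this embodiment, four HOG features (N = 4), with 3, 5, 7, and 9 bins, are calculated for each placement position of the cell 19 in the image.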
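The count of 1456 placement positions can be checked with a short sketch. It assumes a 30 × 60 pixel image and a 5 × 5 pixel cell, which is what the expression (30 − 5 + 1) × (60 − 5 + 1) implies; `cell_positions` is a hypothetical helper name.

```python
def cell_positions(width, height, cell=5, step=1):
    """Top-left corners of every cell placement, scanned row by row
    (one pixel at a time along X, then one pixel down in Y)."""
    return [(x, y)
            for y in range(0, height - cell + 1, step)
            for x in range(0, width - cell + 1, step)]

BIN_COUNTS = (3, 5, 7, 9)  # N = 4 in the embodiment

positions = cell_positions(30, 60)
n_features = len(positions) * len(BIN_COUNTS)  # 1456 * N
```

With N = 4 this gives 1456 × 4 = 5824 HOG features per image.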
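The calculation means 17 calculates N HOG features (A) with different numbers of bins for each placement position of the cell 19 in the detection target image, and gives the calculated HOG features (A) to the discriminator 18.
When the discriminator 18 detects that a person is present in the detection target image, it detects the person's whole body, upper body, and lower body in the detection target image based on the HOG features (A) calculated by the calculation means 17, detects the direction of each of the detected whole body, upper body, and lower body, and determines the direction of the person as a whole in the detection target image.
The method for determining the presence and direction of a person in the detection target image is divided into a preparation phase, in which the discriminator 18 for making that determination is constructed, and a determination phase, in which the discriminator 18 determines the presence and direction of a person in the detection target image.
The preparation phase has two steps. In the first, N HOG features (B) with different numbers of bins are calculated for each of a plurality of different placement positions of the cell 19 in the sample images 20 (that is, a plurality of local regions), a plurality of HOG features with different numbers of bins (for example, about as many as the HOG features (B)) are calculated for each of a plurality of different cell placement positions in images containing no person (not shown), and a feature pattern indicating the presence of a person is obtained. In the second, the discriminator 18 that determines the presence or absence of a person in the detection target image is constructed from the feature pattern.
In the determination phase, the discriminator 18 determines the presence and direction of a person in the detection target image based on the N HOG features (A) with different numbers of bins calculated for each of a plurality of different placement positions of the cell 19 in the detection target image.
Let the first position of the cell 19 at which HOG features are calculated be the position of the 1st cell 19, and let the position of the cell 19 after the (k − 1)-th movement be the position of the k-th cell 19 (k is an integer of 2 or more). The element $a^{kb}_{i}$ of the i-th bin of the HOG feature with $B_b$ bins at the position of the k-th cell 19 can then be obtained by Formula 5.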
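In the preparation phase, first, as shown in FIG. 4, a plurality of sample images 20 each containing a person are captured by the camera 11, and the sample images 20 are input, via a storage medium (for example, a USB memory) or a communication network, to the hard disk 15 of the computer 12 and stored there as a database (step S1).
Each sample image 20 shows a person facing the front, a person facing left, or a person facing right relative to the camera 11.
In step S2, as shown in FIG. 4, the cell 19 moves through the sample image 20 (step S2-1). Each time the cell 19 moves, the calculation means 17 chooses, from the predetermined bin counts, the number of bins for which the HOG feature (B) is to be calculated (step S2-2), and calculates the HOG feature (B) with that number of bins (step S2-3).
The calculation means 17 calculates the HOG features (B) for the different bin counts while the cell 19 is held fixed in the sample image 20, and moves the cell 19 only after the HOG features (B) for all predetermined bin counts have been calculated (step S2-4).
This calculation of HOG features (B) is performed in sequence for each sample image 20, from the start point of the sample image 20 (the position of the 1st cell 19) until the cell 19 has moved to the end position (the position of the 1456th cell 19) (step S2-5).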
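For example, to detect a person facing right, sample images of people facing right are used: HOG features with 3, 5, 7, and 9 bins are calculated for every cell in the image, and feature selection is performed with the AdaBoost algorithm to extract, for example, the HOG feature bins that are effective for representing the front of the head of a right-facing person. FIG. 3 shows that the 1st and 3rd bins were selected for the 3-bin feature; the 1st, 3rd, and 4th for the 5-bin feature; the 1st, 2nd, 3rd, 5th, and 7th for the 7-bin feature; and the 1st, 2nd, 5th, 6th, and 8th for the 9-bin feature. As a result, whereas the HOG feature of the cell containing the front of the head of the right-facing person initially had (3 + 5 + 7 + 9) components, AdaBoost allows the HOG feature of this cell to consist of only the (2 + 3 + 5 + 5) components that are effective for detecting a right-facing person (object detection).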
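The reduction from (3 + 5 + 7 + 9) = 24 components to (2 + 3 + 5 + 5) = 15 can be illustrated as follows. This sketch does not run AdaBoost; it only applies the selection result quoted above for the FIG. 3 example, and `SELECTED` and `select_components` are hypothetical names.

```python
BIN_COUNTS = (3, 5, 7, 9)

# 1-based bin indices reported as selected for each histogram (FIG. 3 example).
SELECTED = {
    3: (1, 3),
    5: (1, 3, 4),
    7: (1, 2, 3, 5, 7),
    9: (1, 2, 5, 6, 8),
}

def select_components(histograms):
    """histograms maps a bin count to the list of bin values for one cell.
    Returns the concatenation of only the selected bins."""
    out = []
    for b in BIN_COUNTS:
        out.extend(histograms[b][i - 1] for i in SELECTED[b])
    return out
```

In a full pipeline the index sets would come from the feature-selection step and would differ from cell to cell; here they are fixed to the single example in the text.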
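Similarly, HOG features with 3, 5, 7, and 9 bins are calculated for the cells containing other parts in the sample image, such as the upper back of the torso, the middle front of the torso, the buttocks, the upper front of the leg, the middle back of the leg, and the lower front of the leg, and feature selection is performed with the AdaBoost algorithm. This yields, from the 3-, 5-, 7-, and 9-bin HOG features, HOG features made up only of the bins effective for representing each part of a right-facing person.
In this way, within the 3-, 5-, 7-, and 9-bin HOG features, bins that are unsuitable as a reference for detecting a right-facing person are not used. Instead, other bins (components) of the same histogram, or bins (components) of other HOG features (B) with a different number of bins, that are suitable as such a reference are used. A feature made up of components effective for detecting a right-facing person can thus be extracted, and the accuracy of determining the presence or absence of the object can be increased.
The same processing is performed with sample images of front-facing people to detect a front-facing person and with sample images of left-facing people to detect a left-facing person. This yields, from the 3-, 5-, 7-, and 9-bin HOG features of front-facing people, HOG features made up only of the bins effective for representing each part of a front-facing person, and, from those of left-facing people, HOG features made up only of the bins effective for representing each part of a left-facing person.
Providing this step S3 allows only the bins that are effective for determining the presence and direction of a person in the image to serve as the basis for calculating the feature pattern. Besides AdaBoost, a data compression method such as PCA (Principal Component Analysis) or another algorithm can be used to select the effective bins.
After step S3 is complete, each element $a^{kb}_{i}$ of the bins of the HOG features (B) selected by AdaBoost is normalized by Formula 6 (step S4).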
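After step S4, the calculation means 17 uses SVM learning, a pattern classification technique, to obtain from the sample images 20 a feature pattern indicating that a person is present in an image. From this feature pattern it constructs, on the hard disk 15, the discriminator 18, which performs 1) determination of the presence or absence of a person (whole-body detection), 2) the mechanical division of the person's whole body into upper body and lower body, and 3) determination of the person's direction (step S5).
Items 1), 2), and 3) are described separately below.
1) Determination of the presence or absence of a person
Using the sample images 20, the AdaBoost algorithm selects a plurality of HOG features that form the feature pattern indicating that a person is present in an image (HOG features selected for detecting front-facing people, HOG features selected for detecting left-facing people, and HOG features selected for detecting right-facing people), and these HOG features are combined to form the detection means (part of the discriminator 18) that detects a person.
The detection means outputs a positive value when it detects a person's whole body and a negative value when it determines that no person is present. SVM learning is therefore performed on the results in which the determination means judged the sample images 20 to contain a person (positive output values) and the results in which the detection means judged images without a person to contain no person (negative output values), which determines the output value at which the detection means judges a whole body to be detected (the person detection threshold).
Accordingly, a determination means (part of the discriminator 18) is formed that obtains the output value produced when the detection target image is judged by the detection means, compares it with the person detection threshold, and determines that a person is present when the output value is at or above the threshold and that no person is present when it is below the threshold.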
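The decision rule of the determination means reduces to a single comparison. The sketch below assumes the output value and the learned threshold are plain numbers; how the threshold is learned is outside its scope, and `person_present` is a hypothetical name.

```python
def person_present(output_value, threshold):
    """Return True when the detection means' output is at or above the
    person detection threshold, False when it is below."""
    return output_value >= threshold
```

Note that an output exactly equal to the threshold counts as "person present", matching the "at or above" wording.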
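2) Detection of the person's upper body and lower body
When a person's whole body has been detected, the upper body and lower body of the detected person can each be identified by mechanically dividing the whole body into upper and lower halves. Accordingly, by forming a division means (part of the discriminator 18) that divides the HOG features selected for whole-body detection into the HOG features effective for detecting the upper body and those effective for detecting the lower body, the output values corresponding to detection of the upper body and of the lower body can be obtained.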
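One way to picture the mechanical split is to assign each selected feature to the upper or lower half according to where its cell lies. The sketch assumes the split is at the vertical midpoint of the detected whole-body region and that a cell belongs to the half containing its centre; neither detail is stated in the text, and `split_features` is a hypothetical name.

```python
def split_features(features, top, height, cell=5):
    """Split selected features into upper-body and lower-body groups.

    features: list of (x, y, value), where (x, y) is the cell's top-left corner.
    top, height: vertical extent of the detected whole-body region in pixels.
    """
    mid = top + height / 2.0
    upper = [f for f in features if f[1] + cell / 2.0 <= mid]
    lower = [f for f in features if f[1] + cell / 2.0 > mid]
    return upper, lower
```

Every feature lands in exactly one group, so the two halves together still cover the whole-body feature set.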
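3) Determination of the person's direction
SVM learning is performed on the output value of the determination means when a person's whole body is detected and the direction of the person in the sample image 20 (front, left, or right). For the 1 vs. 1 method, this determines the thresholds for the rightward-or-frontward, frontward-or-leftward, and leftward-or-rightward judgments. For the 1 vs. rest method, it determines the thresholds for judging rightward or not rightward (that is, frontward or leftward), leftward or not leftward (that is, frontward or rightward), and frontward or not frontward (that is, rightward or leftward). The thresholds for the 1 vs. 1 and 1 vs. rest methods are obtained in the same way for the upper body and the lower body.
When a person is judged to be present in the detection target image, the output values for the whole body, upper body, and lower body are each compared with the thresholds of the 1 vs. 1 and 1 vs. rest judgments to decide rightward or frontward, frontward or leftward, leftward or rightward, rightward or not rightward, leftward or not leftward, and frontward or not frontward. A body direction detection means (part of the discriminator 18) is formed that superimposes all of these results and takes the most frequent direction as the direction of the person as a whole (the detected object as a whole).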
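The discriminator 18 can be constructed on the hard disk 15 by recording on the hard disk 15 programs that implement the functions of the detection means, determination means, division means, and body direction detection means.
Step S5 constructs the discriminator 18 and ends the preparation phase.
As shown in FIG. 5, the calculation means 17 moves the cell 19 through the detection target image, calculates a plurality of HOG features (A) with different numbers of bins at each movement of the cell 19, and gives them to the discriminator 18 (step S1').
The discriminator 18 determines whether a person is present in the detection target image based on the HOG features (A) of the detection target image given by the calculation means 17 (step S2').
If, on the other hand, the discriminator 18 determines that a person is present in the detection target image, it detects the whole body, upper body, and lower body of the person based on the HOG features (A) of the detection target image, detects the direction of each of the detected whole body, upper body, and lower body, and determines the direction of the person as a whole (step S4').
The fact that a person is present and the determined direction of that person are then displayed on the display 13 (step S5').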
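a) Front-facing: positive; right-facing: negative
b) Right-facing: positive; left-facing: negative
c) Left-facing: positive; front-facing: negative
The positive and negative classes of the 1 vs. rest method are as follows.
d) Front-facing: positive; right-facing or left-facing (not front-facing): negative
e) Right-facing: positive; front-facing or left-facing (not right-facing): negative
f) Left-facing: positive; right-facing or front-facing (not left-facing): negative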
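The vote counting over these six judgments can be sketched as below. Two points are assumptions rather than statements from the text: a negative 1 vs. rest judgment is counted as one vote for each of the two remaining directions, and ties are broken by whichever direction `Counter.most_common` returns first. The names `tally` and `overall_direction` are hypothetical.

```python
from collections import Counter

DIRECTIONS = ("front", "right", "left")

# (positive, negative) classes of the three 1 vs. 1 judgments listed above.
ONE_VS_ONE = (("front", "right"), ("right", "left"), ("left", "front"))

def tally(one_vs_one, one_vs_rest):
    """Count votes for one body part.

    one_vs_one: three booleans (True = positive) in ONE_VS_ONE order.
    one_vs_rest: dict mapping a direction to True (positive) or False.
    """
    votes = Counter({d: 0 for d in DIRECTIONS})
    for (pos, neg), is_pos in zip(ONE_VS_ONE, one_vs_one):
        votes[pos if is_pos else neg] += 1
    for d, is_pos in one_vs_rest.items():
        if is_pos:
            votes[d] += 1
        else:
            # Assumed: "not d" votes for both other directions.
            for other in DIRECTIONS:
                if other != d:
                    votes[other] += 1
    return votes

def overall_direction(parts):
    """parts: (one_vs_one, one_vs_rest) pairs for whole, upper, and lower body."""
    total = Counter()
    for ovo, ovr in parts:
        total.update(tally(ovo, ovr))
    return total.most_common(1)[0][0], total
```

For instance, `tally((False, True, True), {"front": False, "right": True, "left": False})` counts 5 votes for right, 2 for left, and 1 for front; passing three such parts to `overall_direction` returns "right".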
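The 500 images of left-facing people ($I_L$) consist of 100 ($I_L^{(1)}$) + 100 ($I_L^{(2)}$) + 100 ($I_L^{(3)}$) + 100 ($I_L^{(4)}$) + 100 ($I_L^{(5)}$).
The 500 images of front-facing people ($I_F$) consist of 100 ($I_F^{(1)}$) + 100 ($I_F^{(2)}$) + 100 ($I_F^{(3)}$) + 100 ($I_F^{(4)}$) + 100 ($I_F^{(5)}$).
The 500 images of right-facing people ($I_R$) consist of 100 ($I_R^{(1)}$) + 100 ($I_R^{(2)}$) + 100 ($I_R^{(3)}$) + 100 ($I_R^{(4)}$) + 100 ($I_R^{(5)}$).
Next, as shown in FIG. 6, each set of 500 images is divided into 400 learning images and 100 evaluation images in the five ways shown as Case 1 to Case 5.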
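(1) For Case 1, AdaBoost is run on 300 images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)}$ and 300 images $I_F^{(2)} + I_F^{(3)} + I_F^{(4)}$, taken from the learning images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)} + I_L^{(5)}$ and $I_F^{(2)} + I_F^{(3)} + I_F^{(4)} + I_F^{(5)}$ respectively, to obtain weak classifiers (a plurality of HOG features selected for detecting right-facing people and a plurality of HOG features selected for detecting front-facing people). Strong classifiers are then obtained (a combination of the weak classifiers selected for detecting right-facing people and a combination of those selected for detecting front-facing people). Using the strong classifiers, the remaining learning images, 100 each of $I_L^{(5)}$ and $I_F^{(5)}$ for 200 in total, are mapped into the feature space, and the discriminant plane (classification plane) that separates their distributions most effectively is obtained by SVM learning. This gives the body direction detection means that determines whether a person faces right or front.
(2) Using the discriminant plane obtained by SVM learning (the body direction detection means for determining right or front), the identification rate (recognition rate: the proportion of correctly classified images) is obtained on the evaluation images $I_L^{(1)} + I_F^{(1)}$ (200 images).
(3) Steps (1) and (2) are performed for Cases 2 to 5, and the identification rate is obtained for each.
(4) The average of the identification rates of Cases 1 to 5 is obtained and taken as the performance of the body direction detection means that determines right or front.
Similarly, a discriminator (body direction detection means) that determines front or left and a discriminator (body direction detection means) that determines right or left are obtained by procedures (1) to (4) above.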
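The five-way division into learning and evaluation images is a five-fold split over the groups (1) to (5). The sketch below assumes that Case k holds out group k for evaluation; the text confirms this only for Case 1 (evaluation on group 1, learning on groups 2 to 5), and `make_cases` is a hypothetical name.

```python
def make_cases(n_groups=5):
    """For each case, list the group used for evaluation and the groups
    used for learning. Groups are numbered 1..n_groups, 100 images each."""
    cases = []
    for k in range(1, n_groups + 1):
        learning = [g for g in range(1, n_groups + 1) if g != k]
        cases.append({"case": k, "evaluation": k, "learning": learning})
    return cases

cases = make_cases()
case1 = cases[0]
adaboost_groups = case1["learning"][:-1]  # groups 2, 3, 4 (300 images per direction)
svm_group = case1["learning"][-1]         # group 5 (100 images per direction)
```

In Case 1 the learning groups are further divided as in step (1): three groups feed AdaBoost and the fourth is mapped into the feature space for SVM learning. How that sub-division is made in Cases 2 to 5 is not stated in the text.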
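(5) For Case 1, AdaBoost is run on 300 images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)}$ and 600 images $I_F^{(2)} + I_F^{(3)} + I_F^{(4)} + I_R^{(2)} + I_R^{(3)} + I_R^{(4)}$, taken from the learning images $I_L^{(2)} + I_L^{(3)} + I_L^{(4)} + I_L^{(5)}$ and $I_F^{(2)} + I_F^{(3)} + I_F^{(4)} + I_F^{(5)} + I_R^{(2)} + I_R^{(3)} + I_R^{(4)} + I_R^{(5)}$ respectively, to obtain weak classifiers (a plurality of HOG features selected for detecting right-facing people, and pluralities of HOG features selected for detecting front-facing people and left-facing people). Strong classifiers are then obtained (a combination of the weak classifiers selected for detecting right-facing people, and combinations of those selected for detecting front-facing people and left-facing people). Using the strong classifiers, the 300 learning images $I_L^{(5)} + I_F^{(5)} + I_R^{(5)}$ are mapped into the feature space, and the discriminant plane (classification plane) that separates their distributions most effectively is obtained by SVM learning.
(6) Using the discriminant plane obtained by SVM learning (the body direction detection means that determines right or not right), the identification rate (recognition rate: the proportion of correctly classified images) is obtained on the evaluation images $I_L^{(1)} + I_F^{(1)} + I_R^{(1)}$ (300 images).
(7) Steps (5) and (6) are performed for Cases 2 to 5, and the identification rate is obtained for each.
(8) The average of the identification rates of Cases 1 to 5 is obtained and taken as the performance of the body direction detection means that determines right or not right.
Similarly, a discriminator (body direction detection means) that determines front or not front and a discriminator (body direction detection means) that determines left or not left are obtained by procedures (5) to (7) above.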
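The identification rate (recognition rate) is defined as: identification rate = (number of correctly identified images) / (total number of evaluation images).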
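As a minimal sketch of this definition and of the averaging over Cases 1 to 5, assuming predicted and true directions are given as equal-length lists of labels (`identification_rate` and `average_rate` are hypothetical names):

```python
def identification_rate(predicted, actual):
    """Correctly identified images divided by all evaluation images."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def average_rate(rates):
    """Mean identification rate over the evaluated cases."""
    return sum(rates) / len(rates)
```

For example, three correct labels out of four evaluation images give a rate of 0.75.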
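When the whole body, upper body, and lower body of the person in the evaluation images were detected based on HOG features (A) with 3, 5, 7, and 9 bins, the direction of each was detected, and the direction of the person as a whole was determined, the overall performance was an identification rate (recognition rate) of 90.1%.
As in the experimental example, learning images and evaluation images were prepared for Cases 1 to 5. The HOG features of Non-Patent Document 1 (that is, the original HOG features of Dalal and Triggs) were calculated for the learning images, AdaBoost was run to obtain weak classifiers, strong classifiers were then obtained, and SVM learning was performed. This gave body direction detection means for each of the judgments right or front, front or left, left or right, right or not right, left or not left, and front or not front. The original HOG features of the evaluation images were then calculated, and the identification rate (recognition rate) obtained when the person's direction was determined with each body direction detection means was measured. The original HOG features are calculated by setting cells over the entire image, but the number of bins in the density histogram of each cell is constant.
The average of the identification rates obtained with the body direction detection means was 66.5%.
As in the experimental example, learning images and evaluation images were prepared for Cases 1 to 5. HOG features were calculated for the learning images using a human mask, AdaBoost was run to obtain weak classifiers, strong classifiers were then obtained, and SVM learning was performed. This gave body direction detection means for each of the judgments right or front, front or left, left or right, right or not right, left or not left, and front or not front. HOG features were then calculated for the evaluation images using the human mask, and the identification rate (recognition rate) obtained when the person's direction was determined with each body direction detection means was measured. The HOG features using a human mask are calculated by defining a human shape as a mask and setting cells on the image within the range of the mask, but the number of bins in the density histogram of each cell is constant (Nakajima, Tan, Ishikawa, Morie, "Detection of person and body direction using HOG features and human masks", Journal of the Institute of Image Electronics Engineers, Vol. 39, pp. 1104-1111, 2010).
The average of the identification rates obtained with the body direction detection means was 85.9%.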
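For example, the size of the cell is not limited to 5 × 5 pixels and can be another size.
The object detection device that performs the preparation phase and the object detection device that performs the determination phase may also be different devices. The discriminator created on the preparation-phase device can be stored on the hard disks of multiple determination-phase devices, which then detect the object.
In the HOG of Non-Patent Document 1, edge directions are examined one location at a time at a plurality of important locations in the image, whereas CO-HOG (Co-occurrence HOG) examines edge directions at two locations simultaneously, which raises recognition accuracy. However, the number of histogram bins is fixed at a single value, as in the original HOG. Using an improved CO-HOG technique, in which the HOG of the present invention is applied to the conventional CO-HOG technique (so that a plurality of HOG features with different numbers of bins are calculated at each of the important locations in the image), therefore makes a further improvement in the recognition rate possible.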
Claims (8)
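- An object detection method in which HOG features (A) indicating the luminance gradient of a detection target image are calculated, and the presence or absence of an object to be detected in the detection target image is detected by reference to HOG features (B), indicating the luminance gradient, calculated in advance for sample images in which the object to be detected was captured, the method comprising:
a step of calculating a plurality of the HOG features (B) with different numbers of bins for each of a plurality of local regions in the sample images, and obtaining a feature pattern indicating the presence of the object to be detected;
a step of constructing, based on the feature pattern, a discriminator that determines the presence or absence of the object to be detected in the detection target image; and
a step of determining, by the discriminator, the presence or absence of the object to be detected in the detection target image based on a plurality of the HOG features (A) with different numbers of bins calculated for each of a plurality of local regions in the detection target image.
- The object detection method according to claim 1, wherein the object to be detected is a person, and the discriminator detects the whole body, upper body, and lower body of the object to be detected in the detection target image based on the plurality of HOG features (A) with different numbers of bins, detects the direction of each of the detected whole body, upper body, and lower body, and determines the direction of the object to be detected as a whole.
- The object detection method according to claim 1 or 2, wherein the bins effective for obtaining the feature pattern are selected by a learning algorithm from the plurality of bins of each of the HOG features (B).
- The object detection method according to claim 3, wherein the learning algorithm is AdaBoost.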
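- An object detection device in which HOG features (A) indicating the luminance gradient of a detection target image are calculated, and the presence or absence of an object to be detected in the detection target image is detected by reference to HOG features (B), indicating the luminance gradient, calculated in advance for sample images in which the object to be detected was captured, the device comprising:
calculation means that calculates a plurality of the HOG features (B) with different numbers of bins for each of a plurality of local regions in the sample images, obtains from the plurality of HOG features (B) a feature pattern indicating the presence of the object to be detected, and also calculates a plurality of the HOG features (A) with different numbers of bins for each of a plurality of local regions in the detection target image; and
a discriminator that is constructed by the calculation means based on the feature pattern and determines the presence or absence of the object to be detected in the detection target image,
wherein the discriminator determines the presence or absence of the object to be detected in the detection target image based on the plurality of HOG features (A) with different numbers of bins calculated by the calculation means.
- The object detection device according to claim 5, wherein the object to be detected is a person, and the discriminator detects the whole body, upper body, and lower body of the object to be detected in the detection target image based on the plurality of HOG features (A) with different numbers of bins, detects the direction of each of the detected whole body, upper body, and lower body, and determines the direction of the object to be detected as a whole.
- The object detection device according to claim 5 or 6, wherein the calculation means selects, by a learning algorithm, the bins effective for obtaining the feature pattern from the plurality of bins of each of the HOG features (B).
- The object detection device according to claim 7, wherein the learning algorithm is AdaBoost.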
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP11844177.3A EP2648159A4 (en) | 2010-11-29 | 2011-11-28 | Object detecting method and object detecting device using same |
| JP2012546862A JP5916134B2 (ja) | 2010-11-29 | 2011-11-28 | 物体の検出方法及びその方法を用いた物体の検出装置 |
| US13/990,005 US8908921B2 (en) | 2010-11-29 | 2011-11-28 | Object detection method and object detector using the method |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2010265402 | 2010-11-29 | ||
| JP2010-265402 | 2010-11-29 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012073894A1 true WO2012073894A1 (ja) | 2012-06-07 |
Family
ID=46171826
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2011/077404 Ceased WO2012073894A1 (ja) | 2010-11-29 | 2011-11-28 | 物体の検出方法及びその方法を用いた物体の検出装置 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US8908921B2 (ja) |
| EP (1) | EP2648159A4 (ja) |
| JP (1) | JP5916134B2 (ja) |
| WO (1) | WO2012073894A1 (ja) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2014056572A (ja) * | 2012-09-11 | 2014-03-27 | Sharp Laboratories Of America Inc | 勾配方位のヒストグラムによるテンプレート・マッチング |
| KR20140081254A (ko) * | 2012-12-21 | 2014-07-01 | 한국전자통신연구원 | 사람 검출 장치 및 방법 |
| JP2015201151A (ja) * | 2014-04-04 | 2015-11-12 | 国立大学法人豊橋技術科学大学 | 三次元モデル検索システム、及び三次元モデル検索方法 |
| JP2020035338A (ja) * | 2018-08-31 | 2020-03-05 | 国立大学法人岩手大学 | 物体検出方法及び物体検出装置 |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5502149B2 (ja) * | 2012-06-29 | 2014-05-28 | 本田技研工業株式会社 | 車両周辺監視装置 |
| GB201219261D0 (en) * | 2012-10-26 | 2012-12-12 | Jaguar Cars | Vehicle access system and method |
| US9064157B2 (en) * | 2012-11-30 | 2015-06-23 | National Chung Shan Institute Of Science And Technology | Pedestrian detection systems and methods |
| SG11201704076YA (en) * | 2014-11-18 | 2017-06-29 | Agency Science Tech & Res | Method and device for traffic sign recognition |
| TWI622000B (zh) * | 2015-09-29 | 2018-04-21 | 新加坡商雲網科技新加坡有限公司 | 行人偵測系統及方法 |
| CN107025458B (zh) * | 2016-01-29 | 2019-08-30 | 深圳力维智联技术有限公司 | 人车分类方法和装置 |
| WO2017132846A1 (en) * | 2016-02-02 | 2017-08-10 | Nokia Technologies Oy | Adaptive boosting machine learning |
| US10169647B2 (en) | 2016-07-27 | 2019-01-01 | International Business Machines Corporation | Inferring body position in a scan |
| DE102018209306A1 (de) * | 2018-06-12 | 2019-12-12 | Conti Temic Microelectronic Gmbh | Verfahren zur Detektion von Kennleuchten |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2009181220A (ja) * | 2008-01-29 | 2009-08-13 | Chube Univ | 物体検出装置 |
| JP2009301104A (ja) * | 2008-06-10 | 2009-12-24 | Chube Univ | 物体検出装置 |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9183227B2 (en) * | 2008-09-19 | 2015-11-10 | Xerox Corporation | Cross-media similarity measures through trans-media pseudo-relevance feedback and document reranking |
| IT1394181B1 (it) * | 2009-05-07 | 2012-06-01 | Marchesini Group Spa | Metodo di segmentazione basato sulle caratteristiche per segmentare una pluralita' di articoli duplicati disposti alla rinfusa e gruppo che attua tale metodo per alimentare una macchina confezionatrice |
| US8639042B2 (en) * | 2010-06-22 | 2014-01-28 | Microsoft Corporation | Hierarchical filtered motion field for action recognition |
2011
- 2011-11-28 US US13/990,005 patent/US8908921B2/en not_active Expired - Fee Related
- 2011-11-28 WO PCT/JP2011/077404 patent/WO2012073894A1/ja not_active Ceased
- 2011-11-28 JP JP2012546862A patent/JP5916134B2/ja not_active Expired - Fee Related
- 2011-11-28 EP EP11844177.3A patent/EP2648159A4/en not_active Withdrawn
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2009181220A (ja) * | 2008-01-29 | 2009-08-13 | Chube Univ | 物体検出装置 |
| JP2009301104A (ja) * | 2008-06-10 | 2009-12-24 | Chube Univ | 物体検出装置 |
Non-Patent Citations (5)
| Title |
|---|
| NAKASHIMA, Y. ET AL.: "Detecting a human body direction using a feature selection method, Control Automation and Systems (ICCAS)", 2010 INTERNATIONAL CONFERENCE, 27 October 2010 (2010-10-27), pages 1424 - 1427, XP008169178 * |
| See also references of EP2648159A4 * |
| TAKASHI HADA: "Jitsukankyo Gazo kara no Satsuei Ichi no Tokutei", ITE TECHNICAL REPORT, vol. 34, no. 34, 13 September 2010 (2010-09-13), XP008169135 * |
| YUKI NAKAJIMA ET AL.: "HOG Tokuchoryo o Mochiita Hito no Shintai Hoko Kenshutsu", DAI 23 KAI BIOMEDICAL FUZZY SYSTEM GAKKAI NENJI TAIKAI KOEN RONBUNSHU, 9 October 2010 (2010-10-09), XP008169364 * |
| YUUKI NAKASHIMA ET AL.: "On Detecting a Human Body Direction Using an Image Information", SICE ANNUAL CONFERENCE 2010, 18 August 2010 (2010-08-18), pages 1521 - 1522, XP031776180 * |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2014056572A (ja) * | 2012-09-11 | 2014-03-27 | Sharp Laboratories Of America Inc | 勾配方位のヒストグラムによるテンプレート・マッチング |
| KR20140081254A (ko) * | 2012-12-21 | 2014-07-01 | 한국전자통신연구원 | 사람 검출 장치 및 방법 |
| KR101724658B1 (ko) * | 2012-12-21 | 2017-04-10 | 한국전자통신연구원 | 사람 검출 장치 및 방법 |
| JP2015201151A (ja) * | 2014-04-04 | 2015-11-12 | 国立大学法人豊橋技術科学大学 | 三次元モデル検索システム、及び三次元モデル検索方法 |
| JP2020035338A (ja) * | 2018-08-31 | 2020-03-05 | 国立大学法人岩手大学 | 物体検出方法及び物体検出装置 |
| JP7201211B2 (ja) | 2018-08-31 | 2023-01-10 | 国立大学法人岩手大学 | 物体検出方法及び物体検出装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| US8908921B2 (en) | 2014-12-09 |
| US20130251206A1 (en) | 2013-09-26 |
| JP5916134B2 (ja) | 2016-05-11 |
| EP2648159A4 (en) | 2018-01-10 |
| JPWO2012073894A1 (ja) | 2014-05-19 |
| EP2648159A1 (en) | 2013-10-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP5916134B2 (ja) | 物体の検出方法及びその方法を用いた物体の検出装置 | |
| CN103605953B (zh) | 基于滑窗搜索的车辆兴趣目标检测方法 | |
| CN101981582B (zh) | 用于检测对象的方法和装置 | |
| KR101395094B1 (ko) | 개체 검출 방법 및 시스템 | |
| Charfi et al. | Definition and performance evaluation of a robust SVM based fall detection solution | |
| Xu et al. | Detection of sudden pedestrian crossings for driving assistance systems | |
| US8385599B2 (en) | System and method of detecting objects | |
| CN105205486B (zh) | 一种车标识别方法及装置 | |
| US9443137B2 (en) | Apparatus and method for detecting body parts | |
| CN104680124A (zh) | 检测行人的影像处理装置及其方法 | |
| CN105608441B (zh) | 一种车型识别方法及系统 | |
| CN102915435B (zh) | 一种基于人脸能量图的多姿态人脸识别方法 | |
| CN106570490B (zh) | 一种基于快速聚类的行人实时跟踪方法 | |
| CN110910421B (zh) | 基于分块表征和可变邻域聚类的弱小运动目标检测方法 | |
| CN106682641A (zh) | 基于fhog‑lbph特征的图像行人识别方法 | |
| CN114743264B (zh) | 拍摄行为检测方法、装置、设备及存储介质 | |
| Ohn-Bar et al. | Fast and robust object detection using visual subcategories | |
| Masmoudi et al. | Vision based system for vacant parking lot detection: Vpld | |
| WO2012046426A1 (ja) | 物体検出装置、物体検出方法および物体検出プログラム | |
| KR101542206B1 (ko) | 코아스-파인 기법을 이용한 객체 추출과 추적 장치 및 방법 | |
| JP2011248525A (ja) | 物体の検出装置及びその検出方法 | |
| Mitsui et al. | Object detection by joint features based on two-stage boosting | |
| CN108319906B (zh) | 基于车载红外视频的行人检测方法及系统 | |
| JP4548181B2 (ja) | 障害物検出装置 | |
| Kataoka et al. | Extended feature descriptor and vehicle motion model with tracking-by-detection for pedestrian active safety |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11844177 Country of ref document: EP Kind code of ref document: A1 |
|
| ENP | Entry into the national phase |
Ref document number: 2012546862 Country of ref document: JP Kind code of ref document: A |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 13990005 Country of ref document: US |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2011844177 Country of ref document: EP |



