WO2016019709A1

WO2016019709A1 - A processing device and method for face detection

Info

Publication number: WO2016019709A1
Application number: PCT/CN2015/071466
Authority: WO
Inventors: Vijayachandran MARIAPPAN; Rahul Arvind JADHAV; Puneet Balmukund SHARMA
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2014-08-07
Filing date: 2015-01-23
Publication date: 2016-02-11
Anticipated expiration: 2017-02-07
Also published as: US20170161549A1; CN106462736A; EP3167407A4; CN106462736B; US10296782B2; EP3167407A1

Abstract

Disclosed is a processing device (102) and method (200) for faster face detection. A method (200) for detecting the presence of at least one face in at least one image is disclosed. The method (200) comprises creating (202) an image patch map based on a plurality of face patches identified for at least one window in said at least one image; estimating (204) a bounding box; and searching (206) within said bounding box to detect the presence of said at least one face in said at least one image. The present invention discloses the use of any classifier working on top of any feature representation to identify face patches, followed by a masking system to identify bounding boxes. The present invention is applicable in areas including, but not limited to, handheld terminals/devices, HD video surveillance, and camera auto-focus based on face detection.

Description

A PROCESSING DEVICE AND METHOD FOR FACE DETECTION

TECHNICAL FIELD

The present invention relates to the field of image processing, and in particular, to a face-detection processing method and a processing device for face detection.

BACKGROUND

Face detection is an important research direction in the field of computer vision because of its wide range of potential applications, such as video surveillance, human-computer interaction, face recognition, security authentication, and face image database management. Face detection aims to determine whether there are any faces within a given image and, if one or more faces are present, to return the location and extent of each face in the image.

Today, high definition cameras are an affordable commodity and are widely used in all types of applications, for instance video surveillance. Video analytics in the form of face detection has to keep pace with the high-resolution output of these cameras, and thus the performance of these algorithms is extremely critical for the overall performance of the analytics.

Face detection algorithms are commonly employed in smart phones and biometric devices to detect faces and later recognize them. Many smart phones today offer a feature that unlocks the phone by matching the user's face. This application requires a fast face detection algorithm at its core. An exemplary output of a face detection engine is shown in figure 1.

The prior art discloses a face detection framework that is essentially an AdaBoost-based cascaded classifier subsystem and has produced excellent accuracy with real-time performance. AdaBoost, short for "Adaptive Boosting", is a machine learning meta-algorithm which may be used in conjunction with many other types of learning algorithms to improve their performance. This performance, however, is directly tied to the resolution of the image/video frame.

The general process of a face detection algorithm is shown in figure 2. The modules of prior-art face detection algorithms include, but are not limited to:

Feature representation module: Any face detection (FD) system uses some sort of feature representation which can identify facial features and correlate them in a way such that the overall output can be judged as a face or a non-face. Examples of feature representations are Haar features, as disclosed in the prior-art document by Viola-Jones; Local Binary Patterns (LBP), as disclosed in the prior-art document “Face description with local binary patterns: Application to face recognition,” by Ahonen et al.; and the Modified Census Transform (MCT), as disclosed in the prior-art document “Face detection with the modified census transform” by Froba et al. These are alternative representations (in place of pixel intensity) which usually have better invariance to illumination and to slight changes in pose/expression.
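
As an illustration of one such representation, the sketch below computes a Modified Census Transform in the spirit of Froba et al.: each pixel of a 3x3 neighbourhood is compared with the neighbourhood mean, giving a 9-bit code per pixel. The function name `mct`, the row-major bit ordering, and the choice to skip border pixels are assumptions made for this sketch, not details taken from the text.

```python
from typing import List, Sequence


def mct(image: Sequence[Sequence[float]]) -> List[List[int]]:
    """Modified Census Transform of a 2-D grayscale image.

    For every interior pixel, each of the 9 pixels in its 3x3 neighbourhood
    (centre included) is compared with the neighbourhood mean; pixels brighter
    than the mean contribute a 1-bit. Bits are packed row-major, top-left
    pixel first (an arbitrary choice for this sketch). Border pixels are
    skipped, so the output is (h - 2) x (w - 2).
    """
    h, w = len(image), len(image[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            neighbourhood = [image[y + dy][x + dx]
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mean = sum(neighbourhood) / 9.0
            code = 0
            for value in neighbourhood:
                code = (code << 1) | (1 if value > mean else 0)
            row.append(code)
        codes.append(row)
    return codes
```

Because only comparisons with the local mean are kept, a global brightness and contrast change (for example `2 * p + 5` applied to every pixel `p`) leaves the codes unchanged, which is the kind of illumination invariance described above.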

Classifier module: A classifier provides a way to correlate multiple features. Examples are the cascaded AdaBoost classifier, as disclosed in Viola-Jones, and Support Vector Machines (SVM), as disclosed in “Pose estimation for category specific multi-view object localization” by Ozuysal et al.

Search space generator module: Given an image/video frame, a face can be present at any “location” and at any “scale”. Thus the FD logic has to search (using a sliding window approach) for a possible face “at all locations” and “at all scales”. This usually results in scanning hundreds of thousands of windows, even in a low-resolution image.
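
To make the scale of this search concrete, the sketch below counts the windows a sliding-window detector visits over an image pyramid. The 24x24 window matches the face template size used later in this document; the 1.25 scale factor between pyramid levels and the function name `count_windows` are assumptions chosen for illustration.

```python
def count_windows(img_w: int, img_h: int, win: int = 24, step: int = 1,
                  scale_factor: float = 1.25) -> int:
    """Count sliding-window positions over all pyramid scales.

    At each scale a win x win window is placed every `step` pixels in x and y;
    the image is then shrunk by `scale_factor` until the window no longer fits.
    """
    total = 0
    w, h = float(img_w), float(img_h)
    while w >= win and h >= win:
        nx = (int(w) - win) // step + 1
        ny = (int(h) - win) // step + 1
        total += nx * ny
        w /= scale_factor
        h /= scale_factor
    return total
```

For a 640x480 frame, the full-resolution scale alone contributes 617 x 457 = 281,969 windows, and every smaller scale adds more.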

There are also various bounding-box-based algorithms that try to identify the bounding box within which a face may be present. The face detection classifier then has to search only within this bounding box, which improves the speed of detection dramatically. The estimated bounding box and the face box are shown in figure 3.

However, a face is not guaranteed to be found within the estimated bounding box. Secondly, the estimated bounding box might not be centered on the face.

The sliding window approach, disclosed in the Viola-Jones document cited above, is the most common technique for generating the search space in object detection. A classifier is evaluated at every location, and an object is detected when the classifier response is above a preset threshold. Cascades, disclosed in the Viola-Jones prior art, speed up detection by rejecting the background quickly and spending more time on object-like regions. Even with cascades, scanning with a fine grid spacing is still computationally expensive.
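
The early-rejection behaviour of a cascade can be sketched as follows: each stage scores the window, and the window is discarded as soon as one stage falls below its threshold, so most background windows never reach the later stages. The stage representation (a scoring function paired with a threshold) and the name `cascade_classify` are hypothetical; real cascades such as Viola-Jones build each stage from boosted weak classifiers.

```python
from typing import Callable, Sequence, Tuple, TypeVar

W = TypeVar("W")  # whatever represents a window (pixels, features, ...)


def cascade_classify(window: W,
                     stages: Sequence[Tuple[Callable[[W], float], float]]) -> bool:
    """Return True only if the window passes every stage of the cascade.

    Each stage is (score_fn, threshold). Evaluation stops at the first stage
    whose score is below its threshold, so later, more expensive stages run
    only on object-like windows.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected early; remaining stages are never run
    return True
```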

To increase the scanning speed, one approach is to train a classifier with perturbed training data so that it handles small shifts in the object location. However, this significantly increases the number of weak classifiers required in the overall model, since the training data will be noisy (unaligned/perturbed).

Another simple approach is to increase the grid spacing, which decreases the number of windows being evaluated. Unfortunately, as the grid spacing is increased, the number of detections decreases rapidly.

As shown by the blue line in the graph of figure 4, the accuracy of the regular full-face classifier drops exponentially as the grid spacing increases.

Another prior-art document, “Face Bounding Box for faster face detection,” by Venkatesh et al., discloses a technique to reduce the number of missed detections while increasing the grid spacing when using the sliding window approach for object detection.

The disclosed technique trains a classifier Cpatch using a decision tree; this Cpatch classifier is evaluated on a regular grid, while the main classifier Cobject is placed at the locations predicted by Cpatch. The left-hand side (LHS) figure shows a sample face with different patch locations shown as differently colored rectangles. A patch is of size wp x hp, and all the patches are given as input to the decision tree. The leaf nodes of the decision tree correspond to the identified patches. The right-hand side (RHS) figure shows the patches identified at the leaf nodes and the corresponding offsets for the full face.

The core idea of this technique is to build a decision tree using very lightweight and simple features, such as pixel intensity values, and then use this Cpatch classifier as a pre-processing step. The actual Cobject classifier works only on the output of the Cpatch classifier. Thus, if the Cpatch classifier is able to remove the bulk of the windows, the Cobject classifier has relatively little work to do, resulting in improved performance. The face bounding box for faster face detection technique, as disclosed in Venkatesh et al., is shown in figure 5.

There are other approaches based on skin color segmentation to speed up face detection algorithms. These techniques identify the portions of the image where skin color is found and then apply face detection only to those pockets/sub-windows.

However, the techniques discussed above result in a loss of accuracy. In figure 4, the line for the technique disclosed in Venkatesh et al. shows that it improves the accuracy, but the accuracy is still lower than desirable: for example, at 6x6 grid spacing the accuracy is shown to be ~80%, which is down by almost 15-18% from the peak. Even though the disclosed and available techniques are used for accurate face detection, they still have a significant drawback: the amount of time spent in the detection process, and the difficulty of reducing the processing time while maintaining a high accuracy rate. Further, existing image processing or face detection algorithms require high-end processing, and accordingly require advanced, higher-cost hardware. Furthermore, because these algorithms require high-end processing, they also increase CPU usage.

In view of the drawbacks and limitations discussed above, there exists a need for an efficient face detection technique that offers higher detection accuracy and less processing time, and that works on low-cost hardware with low CPU usage.

SUMMARY

This summary is provided to introduce concepts related to a processing device and method for faster face detection, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.

The above-described problems are addressed and a technical solution is achieved in the present invention by providing face-detection processing methods and processing devices for faster face detection.

The invention, in various embodiments, addresses deficiencies in the prior art by providing face-detection processing methods and processing devices for faster face detection.

In one implementation, in view of the difficulties discussed above, the objective of this invention is to provide an image processing method which is able to detect the presence of at least one face in at least one image; which will not require a large memory capacity; which will be capable of performing high-speed processing in real time or offline; which can be produced at a low cost; and which can detect specified patterns with certainty, with a very small probability of false positives.

In one implementation, a face detection method that may be used even on lower-end hardware is disclosed. This technique keeps the CPU usage of the face detection method very low and can thus be employed on low-cost hardware.

In one implementation, an efficient technique is disclosed to estimate the bounding box for the faces in the image, such that the subsequent full-face classifier can be applied within the bounding box only.

In one implementation, the technique disclosed in the present invention involves sliding the search window at higher pixel shifts, such that the total number of windows scanned is greatly reduced.

In one implementation, a mechanism is disclosed to locate face patches when the sliding pixel shifts are increased, without impacting the output of the overall face classifier. In a regular FD system, the grid spacing used is 1x1, i.e., a 1-pixel shift in each (x, y) direction. The present invention achieves a grid spacing of 6x6, i.e., the sliding window is shifted by 6 pixels in both the x and y directions. This achieves an overall reduction/window compression of 36:1, i.e., in ideal scenarios the performance increase can be ~36 (6x6) times.
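
The 36:1 figure follows from counting windows on a regular grid: with a spacing of s pixels in both directions, roughly 1/s^2 of the windows are evaluated. The helper below (a hypothetical name) counts 24x24 windows placed every `step` pixels at a single scale; for a 1920x1080 frame the exact ratio between 1x1 and 6x6 spacing comes out slightly under 36 because of edge effects.

```python
def windows_at_spacing(img_w: int, img_h: int, win: int = 24, step: int = 1) -> int:
    """Number of win x win windows whose top-left corners lie on a grid
    with `step`-pixel spacing in both x and y (single scale)."""
    return ((img_w - win) // step + 1) * ((img_h - win) // step + 1)


dense = windows_at_spacing(1920, 1080, step=1)   # 1897 * 1057 = 2,005,129
sparse = windows_at_spacing(1920, 1080, step=6)  # 317 * 177 = 56,109
print(dense, sparse, round(dense / sparse, 2))   # 2005129 56109 35.74
```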

In one implementation, a specific consideration is given to maintain the accuracy of the present technique even at higher pixel shifts.

In one implementation, a technique is disclosed that identifies face patches, rather than the full face, to estimate the bounding box at higher pixel shifts, and then uses this bounding box to search for the presence of a face.

Accordingly, in one implementation, a method (200) for detecting the presence of at least one face in at least one image is disclosed. The method (200) comprises creating (202) an image patch map based on a plurality of face patches identified for at least one window in said at least one image; estimating (204) a bounding box; and searching (206) within said bounding box to detect the presence of said at least one face in said at least one image.

In one implementation, a processing device (102) is disclosed. The processing device (102) comprises a memory (108) storing instructions which, when executed by one or more processors (104), cause the one or more processors (104) to perform operations comprising: creating (202) an image patch map based on a plurality of face patches identified for at least one window in at least one image; estimating (204) a bounding box; and searching (206) within said bounding box to detect the presence of at least one face in said at least one image. The memory (108), a non-transitory computer readable storage medium storing said instructions, and said one or more processors (104) are part of the processing device (102).

In one implementation, a processing device (102) is disclosed. The processing device (102) comprises one or more storages (402) capable of storing one or more images and other data, and a face detector (404). The processing device (102) is configured to perform operations comprising: creating (202) an image patch map based on a plurality of face patches identified for at least one window in said one or more images; estimating (204) a bounding box; and searching (206) within said bounding box to detect the presence of at least one face in said one or more images.

In one implementation, creating (202) said image patch map comprises identifying (302) said plurality of face patches for said at least one window, wherein said at least one window is a detected face region of said at least one face; training (304) a patch classifier using said plurality of face patches identified; evaluating (306) said trained patch classifier; and applying (308) said patch classifier on said windows using a pre-defined grid spacing, thereby creating (202) said image patch map.

In one implementation, the advantages provided by the present invention may include, but are not limited to:

The present invention speeds up face detection multi-fold without impacting accuracy.

The present invention may be used in real-time systems even with high definition videos/images.

The present invention is suitable for generic object detection and is not constrained to the face domain.

These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.

Corresponding reference characters indicate corresponding parts throughout the several views. The exemplification set out herein illustrates preferred embodiments of the invention, in one form, and such exemplification is not to be construed as limiting the scope of the invention in any manner.

Figure 1 illustrates an output of a face detection engine (prior art), in accordance with an embodiment of the present subject matter.

Figure 2 illustrates a flow of a face detection algorithm (prior art), in accordance with an embodiment of the present subject matter.

Figure 3 illustrates an estimated bounding box and the face box (prior art), in accordance with an embodiment of the present subject matter.

Figure 4 illustrates a graph showing the impact of grid spacing on detection accuracy (prior art), in accordance with an embodiment of the present subject matter.

Figure 5 illustrates a face bounding box for faster face detection (prior art), in accordance with an embodiment of the present subject matter.

Figure 6 illustrates a detection flow chart for a bounding-box-based approach (prior art), in accordance with an embodiment of the present subject matter.

Figure 7 illustrates operations, executed by one or more processors (104), to detect the presence of at least one face in at least one image, in accordance with an embodiment of the present subject matter.

Figure 8 illustrates a method (200) for detecting the presence of at least one face in at least one image, in accordance with an embodiment of the present subject matter.

Figure 9 illustrates a method for creating (202) said image patch map, in accordance with an embodiment of the present subject matter.

Figure 10 illustrates a special purpose processing device (102) for detecting the presence of at least one face in at least one image, in accordance with an embodiment of the present subject matter.

Figure 11 illustrates face patch classification in the present invention, in accordance with an embodiment of the present subject matter.

Figure 12 illustrates face patch examples in the present invention, in accordance with an embodiment of the present subject matter.

Figure 13 illustrates a face patch masking operation in the present invention, in accordance with an embodiment of the present subject matter.

Figure 14 illustrates a subsequent localized search within the bounding box in the present invention, in accordance with an embodiment of the present subject matter.

Figure 15 illustrates a flow chart for face detection in the present invention, in accordance with an embodiment of the present subject matter.

It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.

DETAILED DESCRIPTION

In order to make the aforementioned objectives, technical solutions and advantages of the present application more comprehensible, embodiments are described below with accompanying figures.

The objects, advantages and other novel features of the present invention will be apparent to those skilled in the art from the following detailed description when read in conjunction with the appended claims and accompanying drawings.

Processing devices and methods for faster face detection are described. The disclosed technique uses a patch-based approach to identify face patches and then applies a full-face classifier in the bounding box.

The present technique is characterized by the way the patches are formed, the features used to train on the patches, and the way the bounding box is defined, as compared to the cited prior art and other techniques available in the art.

In one implementation, the present technique may be divided into three major steps, as shown in figure 15:

Applying patch classifier step: The patch classifier is applied to windows sampled at a grid spacing of 6x6.

Estimate bounding box step: Using the image patch map from step 1, a mask is applied to check how many of the patches around the window actually map to the expected face patches.

Searching within bounding box step: Once the 36x36 bounding box is found, the method searches within that bounding box using an aggressive grid spacing of 1x1, i.e., a 1-pixel shift in each (x, y) direction.

In one implementation, the area surrounding the face box is also considered when forming the patches. Figure 11 shows the different patches used, each in a different color. In one implementation, the face template size is 24x24, and the patches are formed using a 36x36 area centered on the 24x24 face area. This area size was chosen considering the worst-case scenarios for 6x6 grid spacing. In one example, the face box is the actual area occupied by a face/object to be identified in the image. In one implementation, the face box may be obtained by any known face detector or detection technique available in the art.
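
The text does not spell out the exact patch geometry. One reading that is consistent with the numbers given (a 36x36 area, a 24x24 template, a 6x6 grid, 9 patch types, and a 3x3 mask) is that each patch type is a 24x24 window at one of the 3x3 offsets 0, 6, or 12 pixels inside the 36x36 area. The sketch below enumerates those offsets under that assumption only; it is not the patent's stated definition, and `patch_offsets` is a hypothetical name.

```python
from typing import Dict, Tuple

REGION = 36  # side of the area considered around a face (from the text)
FACE = 24    # face template side (from the text)
STEP = 6     # coarse grid spacing (from the text)


def patch_offsets(region: int = REGION, face: int = FACE,
                  step: int = STEP) -> Dict[int, Tuple[int, int]]:
    """Map patch type -> (dx, dy) offset of a face-sized window inside the region.

    ASSUMPTION: each patch type is the face-sized window at one grid offset
    inside the region, numbered row-major from 1 to match the 3x3 mask.
    """
    positions = range(0, region - face + 1, step)  # 0, 6, 12 for the defaults
    offsets = {}
    patch_type = 1
    for dy in positions:
        for dx in positions:
            offsets[patch_type] = (dx, dy)
            patch_type += 1
    return offsets
```

With the default values this yields 9 patch types, and type 5, at offset (6, 6), is the window exactly centered on the 24x24 face inside the 36x36 area.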

In one implementation, the patch classifier Cpatch is trained as a decision tree using the 9 different types of patch samples shown in figure 11. The leaf nodes of the tree identify the patch type. A Modified Census Transform (MCT) technique may be used for feature representation, rather than the simple binary tests used in the earlier approach. In the decision tree, the nodes are split using a one-vs-all approach, i.e., one patch versus the rest of the patches. Further, non-face samples need not be used in training, since the goal of the Cpatch classifier is to identify the face patch accurately, not to distinguish between face patches and non-faces.
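
The text specifies only a decision tree over MCT features with one-vs-all node splits. The sketch below shows one way such a split could be chosen: for every candidate patch type and every candidate test "MCT code at feature position i equals c", it measures the Gini impurity of the binary label "this patch type vs all other patch types" and keeps the best test. The equality test, the Gini criterion, and the function names are assumptions for illustration and are not stated in the source.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# (MCT codes sampled from a patch, patch type 1..9); only face patches are used.
Sample = Tuple[Sequence[int], int]


def gini(positives: int, total: int) -> float:
    """Gini impurity of a group under a binary (one type vs rest) label."""
    if total == 0:
        return 0.0
    p = positives / total
    return 2.0 * p * (1.0 - p)


def best_one_vs_all_split(
        samples: List[Sample]) -> Optional[Tuple[float, int, int, int]]:
    """Choose the (patch type, feature index, MCT code) test that best separates
    one patch type from all the others at a tree node.

    A sample goes left when features[index] == code. Returns
    (weighted_gini, patch_type, index, code); ties keep the first found.
    """
    if not samples:
        return None
    n = len(samples)
    n_features = len(samples[0][0])
    best = None
    for target in sorted({label for _, label in samples}):
        total_pos = sum(1 for _, label in samples if label == target)
        for i in range(n_features):
            left_counts = Counter(f[i] for f, _ in samples)
            left_pos_counts = Counter(f[i] for f, label in samples if label == target)
            for code, left_n in left_counts.items():
                left_pos = left_pos_counts[code]
                right_n, right_pos = n - left_n, total_pos - left_pos
                score = (left_n * gini(left_pos, left_n)
                         + right_n * gini(right_pos, right_n)) / n
                if best is None or score < best[0]:
                    best = (score, target, i, code)
    return best
```

Growing a full tree would apply this recursively to the left and right subsets until each leaf holds a single patch type, so that a leaf identifies the patch type as the text describes.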

In one implementation, evaluating the Cpatch classifier means applying the trained Cpatch classifier to the image. In the present invention, the application of the classifier and the bounding box estimation differ significantly from the prior-art approaches. The Cpatch classifier is applied to all the windows with a grid spacing of 6x6. As every window yields a patch type, an image patch map is created from the patch types identified for all windows at a grid spacing of 6x6. The image patch types and their formation are shown in figure 12. In one example, said image patch map may include, but is not limited to, different patch location information arranged in different rectangles. A patch may be of size wp×hp, as shown in figure 11, and all the patches are given as input to the decision tree. The leaf nodes of the decision tree correspond to the identified patches. In one example, the image patch map may include an arrangement of pixel locations of different patches. Further, the image patch map may be obtained by any of the existing techniques in the prior art.
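
Applying a trained patch classifier on the coarse grid can be sketched as follows. `classify_patch` stands in for the trained Cpatch classifier (a hypothetical callable that returns a patch type), and the 24x24 window size is the face template size given in the text; one map entry is produced per 6x6 grid position, with rows following y and columns following x.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]


def build_patch_map(image: Image,
                    classify_patch: Callable[[List[List[int]]], int],
                    win: int = 24, step: int = 6) -> List[List[int]]:
    """Run the patch classifier on every win x win window whose top-left
    corner lies on a `step`-pixel grid and record the patch type it returns.

    Entry [gy][gx] of the result belongs to the window at pixel
    (x, y) = (gx * step, gy * step).
    """
    h, w = len(image), len(image[0])
    patch_map = []
    for y in range(0, h - win + 1, step):
        row = []
        for x in range(0, w - win + 1, step):
            window = [list(r[x:x + win]) for r in image[y:y + win]]
            row.append(classify_patch(window))
        patch_map.append(row)
    return patch_map
```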

In one implementation, once the image patch map is obtained, a matrix mask of [1, 2, 3; 4, 5, 6; 7, 8, 9] is applied to it to check how many patches around the face have matched. In one implementation, a tolerance of 4 may be considered, i.e., if 4 or more types in the mask match, the 36x36 area is chosen as a possible face bounding box. The face patch masking operation is shown in figure 13.
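
The masking step can be sketched as below: the 3x3 mask is slid over the image patch map, the cells whose patch type equals the mask entry are counted, and the 36x36 area is kept as a candidate bounding box when at least 4 cells match. The mapping from map cells to pixels (cell [gy][gx] corresponds to the window with top-left corner (6*gx, 6*gy), and a matching 3x3 block is taken to start at the top-left corner of the 36x36 area) is an assumption of this sketch, as is the name `find_bounding_boxes`.

```python
from typing import List, Sequence, Tuple

MASK = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]


def find_bounding_boxes(patch_map: Sequence[Sequence[int]], step: int = 6,
                        region: int = 36, tolerance: int = 4,
                        mask: Sequence[Sequence[int]] = MASK
                        ) -> List[Tuple[int, int, int, int]]:
    """Return candidate face bounding boxes as (x, y, width, height) in pixels.

    A box is reported wherever at least `tolerance` cells of a 3x3 block
    of the patch map hold exactly the patch type the mask expects there.
    """
    rows, cols = len(patch_map), len(patch_map[0])
    mh, mw = len(mask), len(mask[0])
    boxes = []
    for gy in range(rows - mh + 1):
        for gx in range(cols - mw + 1):
            matches = sum(1 for j in range(mh) for i in range(mw)
                          if patch_map[gy + j][gx + i] == mask[j][i])
            if matches >= tolerance:
                boxes.append((gx * step, gy * step, region, region))
    return boxes
```

Neighbouring grid positions can both pass the tolerance, so overlapping candidate boxes may occur; the text does not say whether or how these are merged.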

In one implementation, after the face bounding box is estimated, a local search within that bounding box is performed; in the worst case, the number of 24x24 windows searched in the bounding box is 36. In figure 14, the left-hand side (LHS) shows the estimated bounding box and the right-hand side (RHS) shows the localized search performed within the bounding box to identify the face.
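
The worst case of 36 windows corresponds to trying every 1-pixel offset within one 6-pixel grid cell. The sketch below assumes (the placement is not stated in the text) that these offsets are taken around the centre patch of the 36x36 box, i.e. 24x24 window corners at offsets 6 to 11 pixels from the box corner, so that every window lies entirely inside the box. `is_face` is a hypothetical stand-in for the full-face classifier, called with the window's top-left pixel coordinates.

```python
from typing import Callable, List, Tuple


def local_search(box_x: int, box_y: int, is_face: Callable[[int, int], bool],
                 first: int = 6, count: int = 6) -> List[Tuple[int, int]]:
    """Evaluate the full-face classifier on count x count windows at 1x1
    spacing inside a bounding box and return the top-left corners of hits.

    With the defaults this is 6 * 6 = 36 evaluations; the 24x24 windows start
    at offsets 6..11 from the box corner, so they stay within a 36x36 box.
    """
    hits = []
    for dy in range(first, first + count):
        for dx in range(first, first + count):
            x, y = box_x + dx, box_y + dy
            if is_face(x, y):
                hits.append((x, y))
    return hits
```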

In one implementation, the present invention uses a 6x6 grid spacing. The technical advantages of this grid spacing include, but are not limited to, the following:

The number of classified patches: With 6x6 spacing, there are 9 possible patches with an overlap area of 3 pixels. With 4x4 spacing there would be more possible patches, and with 8x8 spacing fewer, depending on the overlapped area. This would result in too many or too few leaf nodes for the CART/random forest classifier used for patch classification.

Background area covered: The chosen grid size may result in some background area being covered in the test/train images. Usually, the pixels from around the eyes to just below the lips are used for a 24x24 face image. For the bounding box, the area is extended without actually zooming in on the face image, so some background areas, such as the ears, hair, and chin, come into the picture. If an 8x8 grid spacing is used, more background area comes into the picture, which has an adverse effect on the patch classifier output.

While aspects of the described processing devices and methods for faster face detection may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.

While illustrative embodiments of the present invention are described below, it will be appreciated that the present invention may be practiced without the specified details, and that numerous implementation-specific decisions may be made to the invention described herein to achieve the developer's specific goals, such as compliance with system-related and business-related constraints, which will vary from one system to another. While such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. For example, selected aspects are shown in block diagram form, rather than in detail, in order to avoid obscuring or unduly limiting the present invention. Such descriptions and representations are used by those skilled in the art to describe and convey the substance of their work to others skilled in the art. The present invention will now be described with reference to the drawings described below.

Referring now to figure 1, an output of a face detection engine (prior art) is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 2, a flow of a face detection algorithm (prior art) is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 3, an estimated bounding box and the face box (prior art) are shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 4, a graph showing the impact of grid spacing on detection accuracy (prior art) is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 5, a face bounding box for faster face detection (prior art) is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 6, a detection flow chart for a bounding-box-based approach (prior art) is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 7 (100), operations executed by one or more processors (104) to detect the presence of at least one face in at least one image are shown, in accordance with an embodiment of the present subject matter.

In one implementation, said image patch map is created using the steps of: identifying (302) said plurality of face patches for said at least one window, wherein said at least one window is a detected face region of said at least one face; training (304) a patch classifier using said plurality of face patches identified; evaluating (306) said trained patch classifier; and applying (308) said patch classifier on said windows using a pre-defined grid spacing, thereby creating (202) said image patch map.

In one implementation, said plurality of face patches are identified (302) using said at least one window surrounding a face box, wherein said at least one window is sized to hold said plurality of face patches and is centered on said face template.

In one implementation, said at least one window size is preferably of 36x36, and said face template size is preferably of 24x24.

In one implementation, said pre-defined grid spacing is preferably of size 6x6.

In one implementation, said bounding box is estimated by applying (308) a matrix mask on said image patch map to check whether at least one face patch from said plurality of face patches is mapped to said at least one face.

In one implementation, searching (206) within said bounding box is a localized searching (206) and is characterized by using an aggressive grid spacing of size 1x1.

In one implementation, the instructions stored in the non-transitory computer readable storage medium (108) implement a patch-based approach in which a plurality of face patches are identified for at least one window, and a full-face classifier is then applied in said bounding box.

In one implementation, training (304) said patch classifier is characterized by the use of a decision tree, trained on at least one face patch from said plurality of face patches, to identify at least one patch type, and by a one-vs-all approach, which considers one face patch versus the rest of the face patches.

In one implementation, it is understood that evaluating the patch classifier means evaluating the trained classifier on the target image, i.e., the input image received by the device, in which a face is to be detected. In one implementation, there are two sets of images: one from which the classifier learns that a particular structure is a face, and another to which this classifier is applied. Evaluation usually refers to the application of the trained classifier to the target image in which a face is to be detected.

Referring now to figure 8, a method (200) for detecting the presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.

The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the described processing device 102.

At step 202, an image patch map is created. In one implementation, the image patch map is created based on a plurality of face patches identified for at least one window in said at least one image.

At step 204, a bounding box is estimated. In one implementation, the bounding box is estimated by applying (308) a matrix mask on said image patch map to check whether at least one face patch from said plurality of face patches is mapped to said at least one face.

At step 206, said bounding box is searched to detect presence of said at least one face in said at least one image. In one implementation, searching (206) within said bounding box is a localized searching (206) and is characterized by using an aggressive grid spacing of size 1x1.

Referring now to figure 9, a method for creating (202) said image patch map is shown, in accordance with an embodiment of the present subject matter.

At step 302, said plurality of face patches are identified for said at least one window. In one implementation, said at least one window is a detected face region of said at least one face. In one implementation, said plurality of face patches are identified (302) using said at least one window surrounding a face box, wherein said at least one window is sized to hold said plurality of face patches and is centered on said face template. In one implementation, said at least one window size is preferably 36x36, and said face template size is preferably 24x24.

At step 304, patch classifiers are trained using the identified plurality of face patches.

At step 306, the trained patch classifiers are evaluated.

At step 308, the patch classifiers are applied on said at least one window using a pre-defined grid spacing, thereby creating (202) said image patch map. In one implementation, said pre-defined grid spacing is preferably of size 6x6.
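The grid-based application of the patch classifier can be sketched as follows. `classify_patch` is a hypothetical stand-in for the trained patch classifier that returns a patch type, or `None` for a non-face window; the 24x24 window and the 6x6 grid spacing follow the preferred values above. Because the window moves 6 pixels at a time along each axis, roughly 36 times fewer windows are evaluated than with a 1x1 spacing.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = Sequence[Sequence[int]]       # grayscale image stored as rows of pixel values
PatchType = Optional[Tuple[int, int]]  # predicted patch type, or None for "not a face patch"
# Hypothetical trained patch classifier: (image, x, y, size) -> PatchType.
PatchClassifier = Callable[[Image, int, int, int], PatchType]


def build_patch_map(
    image: Image,
    classify_patch: PatchClassifier,
    template: int = 24,
    step: int = 6,  # the pre-defined 6x6 grid spacing
) -> List[List[PatchType]]:
    """Evaluate the patch classifier on a coarse grid and record one patch type per cell.

    Cell [r][c] of the returned map corresponds to the template-sized window whose
    top-left corner is (c * step, r * step) in the image.
    """
    height, width = len(image), len(image[0])
    rows = (height - template) // step + 1
    cols = (width - template) // step + 1
    return [
        [classify_patch(image, c * step, r * step, template) for c in range(cols)]
        for r in range(max(rows, 0))
    ]
```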

Referring now to figure 10, a special purpose processing device (102) for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.

In one implementation, said pre-defined grid spacing is preferably of size 6x6.

In one implementation, said bounding box is estimated by applying (308) a matrix mask on said image patch map to check whether at least one face patch from said plurality of face patches is mapped to said at least one face. The bounding box is estimated based on a threshold, wherein said bounding box is preferably of size 36x36, and said threshold is based on said at least one face patch mapped with said at least one face. In one example, the threshold allows a tolerance of 4, i.e., if 4 or more patch types in the mask match, a bounding box of 36x36 is chosen.
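A minimal sketch of the mask check with a tolerance of 4 follows. It rests on two assumptions not stated in the text: patch types are labeled by their (row, col) offset inside a 3x3 mask (9 types), and cell [r][c] of the patch map corresponds to the window whose top-left corner is (c * 6, r * 6) in the image. Under those assumptions, a 3x3 block of cells anchored at (r0, c0) covers exactly one 36x36 region, and the mask expects cell (r, c) of that block to carry patch type (r, c). The name `estimate_bounding_boxes` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

PatchType = Optional[Tuple[int, int]]  # (row, col) patch type, or None for "no face patch"


def estimate_bounding_boxes(
    patch_map: Sequence[Sequence[PatchType]],
    step: int = 6,       # grid spacing used to build the patch map
    mask_size: int = 3,  # assumed 3x3 mask, i.e. 9 patch types
    tolerance: int = 4,  # "4 or more types in the mask matching"
    box_side: int = 36,  # preferred bounding box size
) -> List[Tuple[int, int, int]]:
    """Slide the matrix mask over the patch map and return (x, y, side) candidate boxes."""
    rows = len(patch_map)
    cols = len(patch_map[0]) if rows else 0
    boxes = []
    for r0 in range(rows - mask_size + 1):
        for c0 in range(cols - mask_size + 1):
            # Count the mask cells whose detected patch type is the expected one.
            matches = sum(
                patch_map[r0 + r][c0 + c] == (r, c)
                for r in range(mask_size)
                for c in range(mask_size)
            )
            if matches >= tolerance:
                boxes.append((c0 * step, r0 * step, box_side))
    return boxes
```

With 9 patch types and a tolerance of 4, a region is still accepted when up to 5 patch positions are missed, which trades some false candidates for robustness before the dense localized search.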

In one implementation, the processing device (102) comprises a processor(s) (104) and a memory (108) coupled to the processor(s) (104). The memory (108) may have a plurality of instructions stored in it, which are executed by the processor (104) coupled to the memory (108).

In one embodiment, the computer system (102) may include at least one processor (104), an interface(s) (106) such as an I/O interface, and a memory (108). The at least one processor (104) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor (104) is configured to fetch and execute computer-readable instructions stored in the memory (108).

The I/O interface (106) may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface (106) may allow the computer system (102) to interact with a user directly or through client devices (not shown). Further, the I/O interface (106) may enable the computer system (102) to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface (106) can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface (106) may include one or more ports for connecting a number of devices to one another or to another server.

The memory (108) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory (108) may include, but is not limited to, the plurality of instructions. In one implementation, the memory may include a face detector (404), which comprises a plurality of instructions configured to perform operations to detect the presence of said at least one face in said one or more images. The operations may include, but are not limited to, creating (202) an image patch map based on a plurality of face patches identified for at least one window in said one or more images; estimating (204) a bounding box; and searching (206) within said bounding box to detect the presence of said at least one face in said one or more images.

In one implementation, the processing device (102) may include one or more storages (402) configured to store at least one image received from external devices or captured by said processing device (102).

Although the present subject matter is explained considering that the present system (102) is implemented as a processing device (102), it may be understood that the processing device (102) may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, as software on a server, and the like. It will be understood that the processing device (102) may be accessed by multiple users through one or more user devices, collectively referred to as users hereinafter, or through applications residing on the user devices. Examples of the processing device (102) may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.

Referring now to figure 11, face patch classification in the present invention is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 12, examples of face patches in the present invention are shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 13, a face patch masking operation in the present invention is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 14, a subsequent localized search within the bounding box in the present invention is shown, in accordance with an embodiment of the present subject matter.

Referring now to figure 15, a flow chart for face detection in the present invention is shown, in accordance with an embodiment of the present subject matter.

In one implementation, the patch classifier discussed in the above sections is derived using a decision tree or a random forest based classifier. However, it is well understood by the person skilled in the art that any other classifier may be used in their place.
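As a deliberately small stand-in for the decision tree or random forest, the sketch below trains one depth-1 decision tree (a decision stump) per patch type using a one-vs-all scheme: samples of one patch type are positives and all other samples are negatives. It is pure Python, the feature vectors are arbitrary numbers (for example, features derived from MCT), and the names `train_stump`, `train_one_vs_all` and `predict` are hypothetical. Including non-face samples as extra negatives is an assumption of this sketch.

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def train_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Fit a depth-1 decision tree: predict 1 when polarity * (x[f] - threshold) > 0."""
    best: Stump = (0, 0.0, 1)
    best_err = len(y) + 1
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                # Count training samples the stump gets wrong.
                err = sum(
                    (pol * (row[f] - thr) > 0) != bool(label)
                    for row, label in zip(X, y)
                )
                if err < best_err:
                    best, best_err = (f, thr, pol), err
    return best


def train_one_vs_all(
    X: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    patch_types: Sequence[Hashable],
) -> Dict[Hashable, Stump]:
    """Train one stump per patch type: that type (positive) versus every other sample."""
    return {t: train_stump(X, [int(l == t) for l in labels]) for t in patch_types}


def predict(models: Dict[Hashable, Stump], x: Sequence[float]) -> Optional[Hashable]:
    """Return the patch type with the largest positive margin, or None (non-face)."""
    best_type, best_score = None, 0.0
    for t, (f, thr, pol) in models.items():
        score = pol * (x[f] - thr)
        if score > best_score:
            best_type, best_score = t, score
    return best_type
```

A practical patch classifier would use deeper trees or a random forest; the stump only keeps the one-vs-all structure easy to see.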

Secondly, the present invention uses MCT as the feature representation. However, it is well understood by the person skilled in the art that any other feature type may be chosen, with a corresponding trade-off between accuracy and CPU performance.
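Assuming MCT here refers to the Modified Census Transform commonly used in face detection (the text does not expand the acronym), the feature can be sketched as follows: each pixel's 3x3 neighbourhood is compared with the neighbourhood mean, and the nine comparison results form a 9-bit code. The function name `modified_census_transform` and the raster bit order are choices made for this illustration.

```python
from typing import List, Sequence


def modified_census_transform(image: Sequence[Sequence[int]]) -> List[List[int]]:
    """Compute a 9-bit MCT code for every interior pixel of a grayscale image.

    Bit k (raster order over the 3x3 neighbourhood, centre included) is set when
    pixel k is brighter than the neighbourhood mean. Border pixels are skipped,
    so the output has size (H - 2) x (W - 2).
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            neigh = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            total = sum(neigh)
            code = 0
            for k, p in enumerate(neigh):
                # p > mean  <=>  9 * p > total, which avoids floating point.
                if 9 * p > total:
                    code |= 1 << k
            row.append(code)
        out.append(row)
    return out
```

Since not all nine pixels can exceed their own mean, the codes fall in the range 0 to 510, which keeps lookup tables indexed by the code small.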

Next, a simple mask is disclosed, as shown in figure 13, but it is well understood by the person skilled in the art that several other variations of this mask can be employed. One technique is to assign different weights to different patches. In such a weight-based approach to patch classification, every detected face patch has a corresponding weight assigned to it. The final output is the sum of the weights of the detected patches, compared against a threshold set to an empirical value that can be derived during the training phase.
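The weighted variant can be sketched in a few lines of Python. The per-patch weights and the threshold used here are hypothetical placeholders for values that would be derived during training, and patch types are assumed to be labeled by their (row, col) position inside the mask. The names `weighted_mask_score` and `is_face_region` are likewise hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

PatchType = Optional[Tuple[int, int]]  # (row, col) patch type, or None for "no face patch"


def weighted_mask_score(
    patch_map: Sequence[Sequence[PatchType]],
    r0: int,
    c0: int,
    weights: Dict[Tuple[int, int], float],
) -> float:
    """Sum the weights of the mask cells whose detected patch type is the expected one."""
    return sum(
        w for (r, c), w in weights.items() if patch_map[r0 + r][c0 + c] == (r, c)
    )


def is_face_region(
    patch_map: Sequence[Sequence[PatchType]],
    r0: int,
    c0: int,
    weights: Dict[Tuple[int, int], float],
    threshold: float,
) -> bool:
    """Accept the region anchored at (r0, c0) when the weighted sum reaches the threshold."""
    return weighted_mask_score(patch_map, r0, c0, weights) >= threshold
```

For example, giving the centre patch a weight of 3.0 and every other patch 1.0 makes a detected centre patch count as much as three off-centre patches; the simple count-based mask is the special case where all weights are 1.0.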

Thus, it is well understood by the person skilled in the art that the present invention encompasses the idea of using any classifier that works on top of any feature representation to identify face patches, and then using a masking system to identify bounding boxes.

Although implementations for a processing device and method for faster face detection have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for the processing device and method for faster face detection.

Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include:

Exemplary embodiments discussed above may provide certain applicable areas of the present invention. Though not required to practice aspects of the disclosure, these applications of the invention may include:

Handheld Terminals/devices: The face detection step is a precursor to any face recognition system. The technique described in this document makes it possible for even low-cost, low-power handheld terminals to have built-in face detection logic.

HD Video Surveillance: With high-definition cameras becoming commodity hardware, it becomes all the more important to process HD input video frames at higher speed on constrained hardware. The technique described here can improve detection speed many-fold.

Camera auto-focus based on face detection: Cameras with face detection notice when a face is in the frame and then set the autofocus and exposure settings to give priority to the face. Cameras usually have limited CPU capability, so it is all the more important to achieve face detection on HD frames with lower CPU utilization.

Finally, it should be understood that the above embodiments are only used to explain, but not to limit the technical solution of the present application. Despite the detailed description of the present application with reference to above preferred embodiments, it should be understood that various modifications, changes or equivalent replacements can be made by those skilled in the art without departing from the scope of the present application and covered in the claims of the present application.

Claims

A method (200) for detecting a presence of at least one face in at least one image, said method (200) comprising:

creating (202) an image patch map based on a plurality of face patches identified for at least one window in said at least one image;

estimating (204) a bounding box based on said image patch map created; and

searching (206) within said bounding box to detect presence of said at least one face in said at least one image.
The method (200) as claimed in claim 1, wherein creating (202) said image patch map is characterized by the steps of:

identifying (302) said plurality of face patches for said at least one window, wherein said at least one window is a detected face region of said at least one face;

training (304) a patch classifier using said plurality of face patches identified;

applying (308) said patch classifier on said at least one window using a pre-defined grid spacing, thereby creating (202) said image patch map.
The method (200) as claimed in claims 1 and 2, wherein said plurality of face patches are identified using a window of a window size surrounding a face box, wherein said window holds said plurality of face patches in a face template centered on a face template size, and said face box is obtained by a face detector.
The method (200) as claimed in claims 1 to 3, wherein said window size is preferably of 36x36, and said face template size is preferably of 24x24.
The method (200) as claimed in claims 1 to 4, wherein said pre-defined grid spacing is preferably of size 6x6.
The method (200) as claimed in claims 1 to 5, wherein estimating (204) said bounding box comprises:

applying (308) a matrix mask on said image patch map to check whether at least one face patch from said plurality of face patches is mapped to said at least one face, thereby estimating (204) said bounding box based on a threshold, wherein said bounding box is preferably of size 36x36, and said threshold is based on said at least one face patch mapped with said at least one face.
The method (200) as claimed in claims 1 to 6, wherein searching (206) within said bounding box is a localized searching (206) and is specifically performed by using a grid spacing of size 1x1.
The method (200) as claimed in claims 1 to 7, comprising the use of a patch-based approach for the plurality of face patches identified for at least one window, thereby applying (308) a patch classifier in said bounding box.
The method (200) as claimed in claims 1 to 6, wherein training (304) said patch classifier is characterized by the use of a decision tree using at least one face patch from said plurality of face patches to identify at least one patch type, and a one-vs-all approach, wherein the one-vs-all approach considers one face patch versus the rest of the face patches.
A processing device (102) for detecting a presence of at least one face in at least one image, the processing device (102) comprising:

a processor (104);

a memory (108) coupled to the processor for executing a plurality of instructions present in the memory (108), the execution of the instructions causing the processor (104) to perform operations comprising:

creating (202) an image patch map based on a plurality of face patches identified for at least one window in at least one image;

estimating (204) a bounding box based on said image patch map created; and

searching (206) within said bounding box to detect presence of at least one face in said at least one image.
The processing device (102) as claimed in claim 10, wherein creating (202) said image patch map is characterized by the steps of:

identifying (302) said plurality of face patches for said at least one window, wherein said at least one window is a detected face region of said at least one face;

training (304) a patch classifier using said plurality of face patches identified;

applying (308) said patch classifier on said windows using a pre-defined grid spacing, thereby creating (202) said image patch map.
The processing device (102) as claimed in claims 10 and 11, wherein said plurality of face patches are identified by using a window of a window size surrounding a face box for identifying (302) said plurality of face patches, wherein said window holds said plurality of face patches in a face template centered on said face template size, and said face box is obtained by a face detector.
The processing device (102) as claimed in claims 10 to 12, wherein said at least one window size is preferably of 36x36, and said face template size is preferably of 24x24.
The processing device (102) as claimed in claims 10 to 13, wherein said pre-defined grid spacing is preferably of size 6x6.
The processing device (102) as claimed in claims 10 to 14, wherein estimating (204) said bounding box is characterized by the steps of:

applying (308) a matrix mask on said image patch map to check whether at least one face patch from said plurality of face patches is mapped to said at least one face, thereby estimating (204) said bounding box based on a threshold, wherein said bounding box is preferably of size 36x36, and said threshold is based on said at least one face patch mapped with said at least one face.
The processing device (102) as claimed in claims 10 to 15, wherein searching (206) within said bounding box is a localized searching (206) and is specifically performed by using a grid spacing of size 1x1.
The processing device (102) as claimed in claims 10 to 16, comprising the use of a patch-based approach for the plurality of face patches identified for at least one window, thereby applying (308) a full face classifier in said bounding box.
The processing device (102) as claimed in claims 10 to 17, wherein training (304) said patch classifier is characterized by the use of a decision tree using at least one face patch from said plurality of face patches to identify at least one patch type, and a one-vs-all approach, wherein the one-vs-all approach considers one face patch versus the rest of the face patches.
The processing device (102) as claimed in claims 10 to 17, wherein said processing device (102) is configured to store at least one image which is used for detecting a presence of at least one face.
A processing device (102) comprising:

one or more storages (402) capable of storing one or more images and other data; and

a face detector (404) configured to perform operations comprising:

creating (202) an image patch map based on a plurality of face patches identified for at least one window in said one or more images;

estimating (204) a bounding box based on said image patch map created; and

searching (206) within said bounding box to detect presence of said at least one face in said one or more images.
The processing device (102) as claimed in claim 20, wherein creating (202) said image patch map is characterized by the steps of:

identifying (302) said plurality of face patches for said at least one window, wherein said at least one window is a detected face region of said at least one face;

training (304) a patch classifier using said plurality of face patches identified;

applying (308) said patch classifier on said windows using a pre-defined grid spacing, thereby creating (202) said image patch map.
The processing device (102) as claimed in claims 20 and 21, wherein said plurality of face patches are identified by using a window of a window size surrounding a face box for identifying (302) said plurality of face patches, wherein said window holds said plurality of face patches in a face template centered on said face template size, and said face box is obtained by a face detector.
The processing device (102) as claimed in claims 20 to 22, wherein said at least one window size is preferably of 36x36, and said face template size is preferably of 24x24.
The processing device (102) as claimed in claims 20 to 23, wherein said pre-defined grid spacing is preferably of size 6x6.
The processing device (102) as claimed in claims 20 to 24, wherein estimating (204) said bounding box is characterized by the steps of:

applying (308) a matrix mask on said image patch map to check whether at least one face patch from said plurality of face patches is mapped to said at least one face, thereby estimating (204) said bounding box based on a threshold, wherein said bounding box is preferably of size 36x36, and said threshold is based on said at least one face patch mapped with said at least one face.
The processing device (102) as claimed in claims 20 to 25, wherein searching (206) within said bounding box is a localized searching (206) and is specifically performed by using a grid spacing of size 1x1.
The processing device (102) as claimed in claims 20 to 26, comprising the use of a patch-based approach for the plurality of face patches identified for at least one window, thereby applying (308) a patch classifier in said bounding box.
The processing device (102) as claimed in claims 20 to 27, wherein training (304) said patch classifier is characterized by the use of a decision tree using at least one face patch from said plurality of face patches to identify at least one patch type, and a one-vs-all approach, wherein the one-vs-all approach considers one face patch versus the rest of the face patches.