WO2016183380A1 - Facial signature methods, systems and software - Google Patents

Facial signature methods, systems and software

Info

Publication number: WO2016183380A1
Authority: WO; WIPO (PCT)
Prior art keywords: user; camera; image; disparity; images
Prior art date: 2015-05-12
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Ceased

Application number

PCT/US2016/032213

Other languages

English (en)

French (fr)

Inventor

James A. MCCOMBE

Rolf Herken

Brian W. Smith

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Mine One GmbH

Original Assignee

Mine One GmbH

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2015-05-12

Filing date

2016-05-12

Publication date

2016-11-17

2016-03-21 Priority claimed from PCT/US2016/023433 external-priority patent/WO2016154123A2/en

2016-05-12 Application filed by Mine One GmbH filed Critical Mine One GmbH

2016-05-12 Priority to EP16793565.9A priority Critical patent/EP3295372A4/de

2016-05-12 Priority to US15/573,475 priority patent/US10853625B2/en

2016-11-17 Publication of WO2016183380A1 publication Critical patent/WO2016183380A1/en

2017-11-12 Anticipated expiration legal-status Critical

2020-11-30 Priority to US17/107,413 priority patent/US11995902B2/en

Status Ceased legal-status Critical Current

Classifications

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/758—Involving statistics of pixels or of feature values, e.g. histogram matching

Definitions

the present invention relates generally to methods, systems and computer program products ("software") for enabling a virtual three-dimensional visual experience (referred to herein as "V3D") in videoconferencing and other applications; for capturing, processing and displaying of images and image streams; and for generating a facial signature, based on images of a given human user's or subject's face, for enabling accurate, reliable identification or authentication of a human user or subject in a secure, difficult to forge manner.
V3D virtual three-dimensional visual experience
V3D virtual 3D experience
(10) generate a facial signature, based on images of a given human user's or subject's face, or face and head, for enabling accurate, reliable identification, authentication or matching of a human user or subject, in a secure, difficult to forge manner.
the present invention provides methods, systems, devices and computer software/program code products that enable the foregoing aspects and others.
Some embodiments and practices of the invention are collectively referred to herein as V3D.
Certain other embodiments and practices of the invention are collectively referred to as Facial Signature aspects of the invention.
Facial Signature aspects of the invention may utilize certain V3D aspects of the invention.
the present invention provides methods, systems, devices, and computer software/program code products for, among other aspects and possible applications, facilitating video communications and presentation of image and video content, and generating image input streams for a control system of autonomous vehicles; and for generating a facial signature, based on images of a human user's or subject's face, for enabling accurate, reliable identification or authentication of a human user or subject, in a secure, difficult to forge manner.
Methods, systems, devices, and computer software/program code products in accordance with the invention are suitable for implementation or execution in, or in conjunction with, commercially available computer graphics processor configurations and systems including one or more display screens for displaying images, cameras for capturing images, and graphics processors for rendering images for storage or for display, such as on a display screen, and for processing data values for pixels in an image representation.
the cameras, graphics processors and display screens can be of a form provided in commercially available smartphones, tablets and other mobile telecommunications devices, as well as in commercially available laptop and desktop computers, which may communicate using commercially available network architectures including client/server and client/network/cloud architectures.
digital processors which can include graphics processor units, including GPGPUs such as those commercially available on cellphones, smartphones, tablets and other commercially available telecommunications and computing devices, as well as in digital display devices and digital cameras.
GPGPUs graphics processor units
Those skilled in the art to which this invention pertains will understand the structure and operation of digital processors, GPGPUs and similar digital graphics processor units.
One aspect of the present invention relates to methods, systems and computer software/program code products that enable a first user to view a second user with direct virtual eye contact with the second user.
This aspect of the invention comprises capturing images of the second user, utilizing at least one camera having a view of the second user's face; generating a data representation, representative of the captured images; reconstructing a synthetic view of the second user, based on the representation; and displaying the synthetic view to the first user on a display screen used by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
Another aspect includes executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; wherein the data representation is representative of the captured images and the corresponding disparity values; the capturing, detecting, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen.
the capturing includes utilizing at least two cameras, each having a view of the second user's face; and executing a feature correspondence function comprises detecting common features between corresponding images captured by the respective cameras.
the capturing comprises utilizing at least one camera having a view of the second user's face, and which is an infra-red time-of-flight camera that directly provides depth
the data representation is representative of the captured images and corresponding depth information.
the capturing includes utilizing a single camera having a view of the second user's face; and executing a feature correspondence function comprises detecting common features between sequential images captured by the single camera over time and measuring a relative distance in image space between the common features, to generate disparity values.
the captured images of the second user comprise visual information of the scene surrounding the second user; and the capturing, detecting, generating, reconstructing and displaying are executed such that: (a) the first user is provided the visual impression of looking through his display screen as a physical window to the second user and the visual scene surrounding the second user, and (b) the first user is provided an immersive visual experience of the second user and the scene surrounding the second user.
Another practice of the invention includes executing image rectification to compensate for optical distortion of each camera and relative misalignment of the cameras.
executing image rectification comprises applying a 2D image space transform; and applying a 2D image space transform comprises utilizing a GPGPU processor running a shader program.
the cameras for capturing images of the second user are located at or near the periphery or edges of a display device used by the second user, the display device used by the second user having a display screen viewable by the second user and having a geometric center; and the synthetic view of the second user corresponds to a selected virtual camera location, the selected virtual camera location corresponding to a point at or proximate to the geometric center.
the cameras for capturing images of the second user are located at a selected position outside the periphery or edges of a display device used by the second user.
respective camera view vectors are directed in non-coplanar orientations.
the cameras for capturing images of the second user, or of a remote scene, are located in selected positions and positioned with selected orientations around the second user or the remote scene.
Another aspect includes estimating a location of the first user's head or eyes, thereby generating tracking information; and the reconstructing of a synthetic view of the second user comprises
camera shake effects are inherently eliminated, in that the capturing, detecting, generating, reconstructing and displaying are executed such that the first user has a virtual direct view through his display screen to the second user and the visual scene surrounding the second user; and scale and perspective of the image of the second user and objects in the visual scene surrounding the second user are accurately represented to the first user regardless of user view distance and angle.
This aspect of the invention provides, on the user's display screen, the visual impression of a frame without glass; a window into a 3D scene of the second user and the scene surrounding the second user.
the invention is adapted for implementation on a mobile telephone device.
the cameras for capturing images of the second user are located at or near the periphery or edges of a mobile telephone device used by the second user.
the invention is adapted for implementation on a laptop or desktop computer, and the cameras for capturing images of the second user are located at or near the periphery or edges of a display device of a laptop or desktop computer used by the second user.
the invention is adapted for implementation on computing or telecommunications devices comprising any of tablet computing devices, computer-driven television displays or computer-driven image projection devices, and wherein the cameras for capturing images of the second user are located at or near the periphery or edges of a computing or telecommunications device used by the second user.
One aspect of the present invention relates to methods, systems and computer software/program code products that enable a user to view a remote scene in a manner that gives the user a visual
This aspect of the invention includes capturing images of the remote scene, utilizing at least two cameras each having a view of the remote scene; executing a feature correspondence function by detecting common features between
the capturing, detecting, generating, reconstructing and displaying being executed such that: (a) the user is provided the visual impression of looking through his display screen as a physical window to the remote scene, and (b) the user is provided an immersive visual experience of the remote scene.
the capturing of images includes using at least one color camera. In another practice of the invention, the capturing includes using at least one infrared structured light emitter.
the capturing comprises utilizing a view vector rotated camera configuration wherein the locations of first and second cameras define a line; and the line defined by the first and second camera locations is rotated by a selected amount from a selected horizontal or vertical axis; thereby increasing the number of valid feature correspondences identified in typical real-world settings by the feature correspondence function.
the first and second cameras are positioned relative to each other along epipolar lines.
disparity values are rotated back to a selected horizontal or vertical orientation along with the captured images.
the synthetic view is rotated back to a selected horizontal or vertical orientation.
the capturing comprises exposure cycling, comprising dynamically adjusting the exposure of the cameras on a frame-by-frame basis to improve disparity estimation in regions outside the exposed region viewed by the user; wherein a series of exposures are taken, including exposures lighter than and exposures darker than a visibility-optimal exposure, disparity values are calculated for each exposure, and the disparity values are integrated into an overall disparity solution over time, so as to improve disparity estimation.
the exposure cycling comprises dynamically adjusting the exposure of the cameras on a frame-by-frame basis to improve disparity estimation in regions outside the exposed region viewed by the user; wherein a series of exposures are taken, including exposures lighter than and exposures darker than a visibility-optimal exposure, disparity values are calculated for each exposure, and the disparity values are integrated in a disparity histogram, the disparity histogram being converged over time, so as to improve disparity estimation.
a further aspect of the invention comprises analyzing the quality of the overall disparity solution on respective dark, mid-range and light pixels to generate variance information used to control the exposure settings of the cameras, thereby to form a closed loop between the quality of the disparity estimate and the set of exposures requested from the cameras.
Another aspect includes analyzing variance of the disparity histograms on respective dark, mid-range and light pixels to generate variance information used to control the exposure settings of the cameras, thereby to form a closed loop between the quality of the disparity estimate and the set of exposures requested from the cameras.
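By way of a non-limiting illustration, the closed loop between disparity quality and requested exposures might be sketched in Python as follows. The brightness thresholds (85 and 170 on a 0 to 255 scale), the variance limit, the exposure step and the names `band_of`, `mean` and `next_exposures` are assumptions made for this sketch; the source text does not specify them.

```python
from statistics import pvariance

def band_of(luma):
    """Classify a 0..255 luma value as dark, mid or light (thresholds assumed)."""
    if luma < 85:
        return "dark"
    if luma > 170:
        return "light"
    return "mid"

def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def next_exposures(pixels, base_ev=0.0, step=1.0, max_variance=1.0):
    """Choose the exposure offsets to request for the next cycle.

    pixels: iterable of (luma, disparity_samples) pairs, where
    disparity_samples are the disparities observed for that pixel over time.
    The visibility-optimal exposure (base_ev) is always requested; a lighter
    exposure is added when dark pixels show noisy disparity, and a darker
    exposure is added when light pixels do.
    """
    variances = {"dark": [], "mid": [], "light": []}
    for luma, samples in pixels:
        if len(samples) > 1:
            variances[band_of(luma)].append(pvariance(samples))
    exposures = [base_ev]
    if mean(variances["dark"]) > max_variance:
        exposures.append(base_ev + step)  # lighter exposure for shadow regions
    if mean(variances["light"]) > max_variance:
        exposures.append(base_ev - step)  # darker exposure for highlight regions
    return exposures
```

In this sketch the variance of per-pixel disparity samples stands in for the variance of the disparity histograms; a fuller implementation would presumably read the spread directly from each histogram.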
the feature correspondence function includes evaluating and combining vertical- and horizontal-axis correspondence information.
the feature correspondence function further comprises applying, to image pixels containing a disparity solution, a coordinate transformation, to a unified coordinate system.
the unified coordinate system can be the un-rectified coordinate system of the captured images.
Another aspect of the invention includes using at least three cameras arranged in a substantially "L"-shaped configuration, such that a pair of cameras is presented along a first axis and a second pair of cameras is presented along a second axis substantially perpendicular to the first axis.
the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence.
the refining can include retaining a disparity solution over a time interval, and continuing to integrate disparity solution values for each image frame over the time interval, so as to converge on an improved disparity solution by sampling over time.
the feature correspondence function comprises filling unknowns in a correspondence information set with historical data obtained from previously captured images.
the filling of unknowns can include the following: if a given image feature is detected in an image captured by one of the cameras, and no corresponding image feature is found in a corresponding image captured by another of the cameras, then utilizing data for a pixel corresponding to the given image feature, from a corresponding, previously captured image.
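The filling of unknowns described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that a per-pixel disparity set is held as a list in which `None` marks a pixel with no correspondence; the function name `fill_unknowns` is hypothetical.

```python
def fill_unknowns(current, previous):
    """Fill missing correspondences in the current frame from the previous one.

    current, previous: equal-length lists of disparity values, with None
    marking pixels for which no corresponding feature was found. A pixel that
    is unknown in both frames stays unknown.
    """
    if len(current) != len(previous):
        raise ValueError("frames must have the same number of pixels")
    return [prev if cur is None else cur for cur, prev in zip(current, previous)]
```

A pixel that has a value in the current frame always keeps it; historical data is only consulted where the current frame has a gap.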
the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence.
This aspect of the invention can include constructing a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel.
the disparity histogram functions as a Probability Density Function (PDF)
one axis of the disparity histogram indicates a given disparity range
a second axis of the histogram indicates the number of pixels in a kernel surrounding the central pixel in question that are voting for the given disparity range.
votes indicated by the disparity histogram are initially generated utilizing a Sum of Square Differences (SSD) method, which can comprise executing an SSD method with a relatively small kernel to produce a fast dense disparity map in which each pixel has a selected disparity that represents the lowest error;
SSD Sum of Square Differences
processing a plurality of pixels to accumulate into the disparity histogram a tally of the number of votes for a given disparity in a relatively larger kernel surrounding the pixel in question.
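The two stages described above (a small-kernel SSD pass followed by vote accumulation over a larger kernel) might be sketched as follows. To stay short, the sketch works on a single pair of 1D scanlines rather than full 2D images, clamps samples at the image border, and breaks ties toward the lower disparity; these simplifications, the kernel sizes and all function names are assumptions of the sketch, not details taken from the source.

```python
def ssd_cost(left, right, x, d, half=1):
    """Sum of squared differences between a window centred at x in the left
    scanline and the window shifted by disparity d in the right scanline."""
    cost = 0
    for k in range(-half, half + 1):
        xl = min(max(x + k, 0), len(left) - 1)       # clamp at the border
        xr = min(max(x + k - d, 0), len(right) - 1)
        cost += (left[xl] - right[xr]) ** 2
    return cost

def fast_disparity_map(left, right, max_disp, half=1):
    """Small-kernel pass: each pixel gets the disparity with the lowest SSD."""
    return [
        min(range(max_disp + 1), key=lambda d: ssd_cost(left, right, x, d, half))
        for x in range(len(left))
    ]

def vote_histogram(disp_map, x, max_disp, vote_half=3):
    """Larger-kernel pass: tally how many neighbours of pixel x selected each
    disparity. Index i of the result holds the votes for disparity i."""
    hist = [0] * (max_disp + 1)
    for k in range(-vote_half, vote_half + 1):
        if 0 <= x + k < len(disp_map):
            hist[disp_map[x + k]] += 1
    return hist
```

For example, if the left scanline is the right scanline shifted by two pixels, the textured pixels all select disparity 2 and the histogram around them concentrates its votes in bin 2.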
Another aspect of the invention includes transforming the disparity histogram into a Cumulative Distribution Function (CDF) from which the width of a corresponding interquartile range can be determined, thereby to establish a confidence level in the corresponding disparity solution.
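As a minimal sketch of this step, the histogram can be normalized into a CDF and the interquartile width read off as the distance between the bins at which the CDF first reaches 0.25 and 0.75. The function names and the use of whole-bin indices for the quartiles are assumptions of the sketch.

```python
def histogram_cdf(hist):
    """Convert a vote histogram into a cumulative distribution over its bins."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram has no votes")
    cdf, running = [], 0
    for votes in hist:
        running += votes
        cdf.append(running / total)
    return cdf

def interquartile_width(hist):
    """Width, in bins, between the first and third quartiles of the CDF.

    A small width means the votes agree on one disparity (high confidence);
    a large width means the votes are spread out (low confidence).
    """
    cdf = histogram_cdf(hist)
    q1 = next(i for i, c in enumerate(cdf) if c >= 0.25)
    q3 = next(i for i, c in enumerate(cdf) if c >= 0.75)
    return q3 - q1
```

A histogram with all votes in one bin yields a width of 0, while a flat histogram over four bins yields a width of 2.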
CDF Cumulative Distribution Function
a further aspect includes maintaining a count of the number of statistically significant modes in the histogram, thereby to indicate modality. In accordance with the invention, modality can be used as an input to the above-described reconstruction aspect, to control application of a stretch vs. slide reconstruction method.
Still another aspect of the invention includes maintaining a disparity histogram over a selected time interval and accumulating samples into the histogram, thereby to compensate for camera noise or other sources of motion or error.
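One possible way to count modes and use the count to select between stretch and slide reconstruction is sketched below. The significance test used here (a local peak holding at least 20 percent of all votes) is an assumption chosen for illustration; the source does not state how statistical significance is determined, and the names `count_modes` and `reconstruction_method` are hypothetical.

```python
def count_modes(hist, min_fraction=0.2):
    """Count local peaks that hold at least min_fraction of all votes."""
    total = sum(hist)
    modes = 0
    for i, votes in enumerate(hist):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i + 1 < len(hist) else 0
        # Strictly above the left neighbour so a flat plateau counts once.
        if total and votes >= min_fraction * total and votes > left and votes >= right:
            modes += 1
    return modes

def reconstruction_method(hist):
    """Multi-modal histograms suggest a depth boundary, so slide; otherwise stretch."""
    return "slide" if count_modes(hist) >= 2 else "stretch"
```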
Another aspect includes generating fast disparity estimates for multiple independent axes; and then combining the corresponding, respective disparity histograms to produce a statistically more robust disparity solution.
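A simple way to combine per-axis histograms, assuming they share the same disparity bins, is to add their votes bin by bin. The source does not state the combination rule, so the summation below and the name `combine_histograms` are assumptions of this sketch.

```python
def combine_histograms(*hists):
    """Add vote histograms from independent axes bin by bin."""
    if not hists:
        raise ValueError("at least one histogram is required")
    bins = len(hists[0])
    if any(len(h) != bins for h in hists):
        raise ValueError("all histograms must cover the same disparity bins")
    return [sum(column) for column in zip(*hists)]
```

When one axis is ambiguous on its own, votes from the other axis can tip the combined histogram toward the disparity both axes support.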
Another aspect includes evaluating the interquartile range of a CDF of a given disparity histogram to produce an interquartile result; and if the interquartile result is indicative of an area of poor sampling signal to noise ratio, due to camera over- or underexposure, then controlling camera exposure based on the interquartile result to improve a poorly sampled area of a given disparity histogram.
Yet another practice of the invention includes testing for only a small set of disparity values using a small-kernel SSD method to generate initial results; populating a corresponding disparity histogram with the initial results; and then, using histogram votes to drive further SSD testing within a given range to improve disparity resolution over time.
Another aspect includes extracting sub-pixel disparity information from the disparity histogram, the extracting including the following: where the histogram indicates a maximum-vote disparity range and an adjacent, runner-up disparity range, calculating a weighted average disparity value based on the ratio between the number of votes for each of the adjacent disparity ranges.
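The sub-pixel extraction can be written as a vote-weighted average of the winning bin and its stronger neighbour. The sketch below assumes unit-width bins whose index equals the disparity they represent; the function name is hypothetical.

```python
def subpixel_disparity(hist):
    """Weighted average of the maximum-vote bin and its adjacent runner-up."""
    best = max(range(len(hist)), key=hist.__getitem__)
    neighbours = [i for i in (best - 1, best + 1) if 0 <= i < len(hist)]
    if not neighbours:
        return float(best)
    runner_up = max(neighbours, key=hist.__getitem__)
    votes_best, votes_runner = hist[best], hist[runner_up]
    if votes_best + votes_runner == 0:
        return float(best)
    return (best * votes_best + runner_up * votes_runner) / (votes_best + votes_runner)
```

With 6 votes for disparity 2 and 2 votes for disparity 3, the estimate is (2*6 + 3*2) / 8 = 2.25.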
the feature correspondence function comprises weighting toward a center pixel in a Sum of Squared Differences (SSD) approach, wherein the weighting includes applying a higher weight to the center pixel for which a disparity solution is sought, and a lesser weight outside the center pixel, the lesser weight being proportional to the distance of a given kernel sample from the center.
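A center-weighted SSD cost might look like the following sketch, again on 1D scanlines. The falloff `1 / (1 + distance)` is only one plausible reading of a weight that depends on distance from the center; it, the border clamping and the function name are assumptions of the sketch.

```python
def weighted_ssd(left, right, x, d, half=2):
    """SSD cost in which samples nearer the centre pixel count for more."""
    cost = 0.0
    for k in range(-half, half + 1):
        xl = min(max(x + k, 0), len(left) - 1)       # clamp at the border
        xr = min(max(x + k - d, 0), len(right) - 1)
        weight = 1.0 / (1 + abs(k))                  # 1 at the centre, less outside
        cost += weight * (left[xl] - right[xr]) ** 2
    return cost
```

With this weighting, an intensity mismatch at the center pixel contributes its full squared error, while the same mismatch two samples away contributes one third of it.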
the feature correspondence function comprises optimizing generation of disparity values on GPGPU computing structures.
GPGPU computing structures are commercially available, and are contained in commercially available forms of smartphones and tablet computers.
generating a data representation includes generating a data structure representing 2D coordinates of a control point in image space, and containing a disparity value treated as a pixel velocity in screen space with respect to a given movement of a given view vector; and using the disparity value in combination with a movement vector to slide a pixel in a given source image in selected directions, in 2D, to enable a reconstruction of 3D image movement.
each camera generates a respective camera stream; and the data structure representing 2D coordinates of a control point further contains a sample buffer index, stored in association with the control point coordinates, which indicates which camera stream to sample in association with the given control point.
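The control point data structure and its use as a per-pixel velocity can be illustrated as follows. The field names, the `ControlPoint` class and the `slide` helper are hypothetical; only the contents (2D coordinates, a disparity value and a sample buffer index) come from the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlPoint:
    x: float                  # 2D coordinates in image space
    y: float
    disparity: float          # treated as pixel velocity per unit of view movement
    sample_buffer_index: int  # which camera stream to sample for this point

def slide(cp, move_x, move_y):
    """Return the control point's 2D position after the view vector moves by
    (move_x, move_y): displacement is the movement scaled by disparity."""
    return (cp.x + cp.disparity * move_x, cp.y + cp.disparity * move_y)
```

Points with larger disparity (nearer objects) move further for the same view movement, which is what produces the impression of parallax.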
Another aspect includes determining whether a given pixel should be assigned a control point.
a related practice of the invention includes assigning control points along image edges, wherein assigning control points along image edges comprises executing computations enabling identification of image edges.
generating a data representation includes flagging a given image feature with a reference count indicating how many samples reference the given image feature, thereby to differentiate a uniquely referenced image feature, and a sample corresponding to the uniquely referenced image feature, from repeatedly referenced image features; and using the reference count, extracting unique samples, so as to enable a reduction in bandwidth requirements.
generating a data representation further includes using the reference count to encode and transmit a given sample exactly once, even if a pixel or image feature corresponding to the sample is repeated in multiple camera views, so as to enable a reduction in bandwidth requirements.
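The reference-count idea can be sketched as below, under the assumption that each camera view has already been reduced to a list of feature identifiers. The identifiers, the payload format and the function name are hypothetical.

```python
from collections import Counter

def encode_unique_samples(views):
    """Count references per feature and emit each feature exactly once.

    views: one list of feature identifiers per camera view.
    Returns (reference_counts, payload), where payload lists
    (feature_id, camera_index) for the first view in which each feature occurs.
    """
    reference_counts = Counter(fid for view in views for fid in view)
    payload, seen = [], set()
    for camera_index, view in enumerate(views):
        for fid in view:
            if fid not in seen:
                seen.add(fid)
                payload.append((fid, camera_index))
    return reference_counts, payload
```

Two views that share two of their three features yield four transmitted samples rather than six.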
Yet another aspect of the invention includes estimating a location of the first user's head or eyes, thereby generating tracking information; wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and wherein 3D image reconstruction is executed by warping a 2D image by utilizing the control points, by sliding a given pixel along a head movement vector at a displacement rate proportional to disparity, based on the tracking information and disparity values.
the disparity values are acquired from the feature correspondence function or from a control point data stream.
reconstructing a synthetic view comprises utilizing the tracking information to control a 2D crop box, such that the synthetic view is reconstructed based on the view origin, and then cropped and scaled so as to fill the first user's display screen view window; and the minima and maxima of the crop box are defined as a function of the first user's head location with respect to the display screen, and the dimensions of the display screen view window.
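One way to define the crop box as a function of head location and screen dimensions is to project the screen edges from the eye position onto a plane behind the screen, as one would for a physical window. This geometric model, the `scene_depth` parameter and the function name are assumptions of the sketch; the source states only that the minima and maxima depend on the head location and the view window dimensions.

```python
def crop_box(head, screen_w, screen_h, scene_depth):
    """Return (x_min, y_min, x_max, y_max) of the crop box.

    head: (hx, hy, hz) eye position relative to the screen centre, with hz > 0
    the distance in front of the screen. scene_depth is the assumed distance of
    the reconstructed view plane behind the screen. All values share one unit.
    """
    hx, hy, hz = head
    if hz <= 0:
        raise ValueError("head must be in front of the screen (hz > 0)")
    scale = (hz + scene_depth) / hz  # ray parameter at the view plane

    def project(edge, eye):
        # Extend the ray from the eye through a screen edge to the view plane.
        return eye + scale * (edge - eye)

    return (
        project(-screen_w / 2, hx),
        project(-screen_h / 2, hy),
        project(screen_w / 2, hx),
        project(screen_h / 2, hy),
    )
```

In this model, moving the head to the right shifts the crop box to the left, and moving closer to the screen widens it, as happens when looking through a window.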
reconstructing a synthetic view comprises executing a 2D warping reconstruction of a selected view based on selected control points, wherein the 2D warping reconstruction includes designating a set of control points, respective control points corresponding to respective, selected pixels in a source image; sliding the control points in selected directions in 2D space, wherein the control points are slid proportionally to respective disparity values; and interpolating data values for pixels between the selected pixels corresponding to the control points; so as to create a synthetic view of the image from a selected new perspective in 3D space.
the invention can further include rotating the source image and control point coordinates such that rows or columns of image pixels are parallel to the vector between the original source image center and the new view vector defined by the selected new perspective.
a related practice of the invention further includes rotating the source image and control point coordinates so as to align the view vector to image scanlines; iterating through each scanline and each control point for a given scanline, generating a line element beginning and ending at each control point in 2D image space, with the addition of the corresponding disparity value multiplied by the corresponding view vector magnitude with the corresponding x-axis coordinate; assigning a texture coordinate to the beginning and ending points of each generated line element, equal to their respective, original 2D location in the source image; and interpolating texture coordinates linearly along each line element; thereby to create a resulting image in which image data between the control points is linearly stretched.
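The per-scanline stretch described above might be sketched as follows for a single scanline that is already aligned with the view vector. The handling of folded line elements (skipped), the linear sampling of the source row and all names are assumptions of the sketch.

```python
import math

def sample(row, t):
    """Linearly interpolate the source row at fractional position t."""
    t = min(max(t, 0.0), len(row) - 1.0)
    i = min(int(t), len(row) - 2)
    f = t - i
    return row[i] * (1 - f) + row[i + 1] * f

def warp_scanline(row, control_points, view_magnitude):
    """Stretch one scanline between slid control points.

    row: source pixel values (at least two).
    control_points: (x, disparity) pairs sorted by x.
    Each point moves to x + disparity * view_magnitude and keeps its original
    x as its texture coordinate; texture coordinates are interpolated linearly
    along each line element. Pixels not covered by any element stay None.
    """
    moved = [(x + d * view_magnitude, x) for x, d in control_points]
    out = [None] * len(row)
    for (x0, t0), (x1, t1) in zip(moved, moved[1:]):
        if x1 <= x0:
            continue  # element folded over or collapsed; skipped in this sketch
        first = max(0, math.ceil(x0))
        last = min(len(row) - 1, math.floor(x1))
        for px in range(first, last + 1):
            a = (px - x0) / (x1 - x0)
            out[px] = sample(row, t0 + a * (t1 - t0))
    return out
```

With a view magnitude of zero the scanline is reproduced unchanged; with a non-zero magnitude, the image data between control points is stretched or compressed linearly.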
the invention can also include rotating the resulting image back by the inverse of the rotation applied to align the view vector with the scanlines.
Another practice of the invention includes linking the control points between scanlines, as well as along scanlines, to create polygon elements defined by the control points, across which interpolation is executed.
reconstructing a synthetic view further comprises, for a given source image, selectively sliding image foreground and image background independently of each other.
sliding is utilized in regions of large disparity or depth change.
a determination of whether to utilize sliding includes evaluating a disparity histogram to detect multi-modal behavior indicating that a given control point is on an image boundary for which allowing foreground and background to slide independent of each other presents a better solution than interpolating depth between foreground and background; wherein the disparity histogram functions as a Probability Density Function (PDF) of disparity for a given pixel in which higher values indicate a higher probability of the corresponding disparity range being valid for the given pixel.
PDF Probability Density Function
reconstructing a synthetic view includes using at least one Sample Integration Function Table (SIFT), the SIFT comprising a table of sample integration functions for one or more pixels in a desired output resolution of an image to be displayed to the user, wherein a given sample integration function maps an input view origin vector to at least one known, weighted 2D image sample location in at least one input image buffer.
SIFT Sample Integration Function Table
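To make the SIFT concept concrete, the sketch below models the table as a dictionary from output pixel to a function that, given a view origin vector, returns weighted sample locations as `(buffer_index, x, y, weight)` tuples. The particular sample integration function shown (a single sample offset in proportion to disparity), the nearest-pixel lookup and all names are assumptions of the sketch rather than details from the source.

```python
def make_sliding_function(x, y, disparity, buffer_index=0):
    """Build a sample integration function for one output pixel.

    The returned function maps a view origin (vx, vy) to one weighted sample
    whose source location shifts against the view movement by the disparity.
    """
    def integrate(view_origin):
        vx, vy = view_origin
        return [(buffer_index, x - disparity * vx, y - disparity * vy, 1.0)]
    return integrate

def render_pixel(table, pixel, view_origin, buffers):
    """Evaluate the table entry for a pixel and blend its weighted samples."""
    total = weight_sum = 0.0
    for buf, sx, sy, weight in table[pixel](view_origin):
        image = buffers[buf]
        iy = min(max(round(sy), 0), len(image) - 1)      # nearest pixel, clamped
        ix = min(max(round(sx), 0), len(image[0]) - 1)
        total += weight * image[iy][ix]
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

Because the table is built once per output resolution, rendering a new view only requires evaluating each entry with the current view origin.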
displaying the synthetic view to the first user on a display screen used by the first user includes displaying the synthetic view to the first user on a 2D display screen; and updating the display in real-time, based on the tracking information, so that the display appears to the first user to be a window into a 3D scene responsive to the first user's head or eye location.
Displaying the synthetic view to the first user on a display screen used by the first user can include displaying the synthetic view to the first user on a binocular stereo display device; or, among other alternatives, on a lenticular display that enables auto-stereoscopic viewing.
One aspect of the present invention relates to methods, systems and computer software/program code products that facilitate self-portraiture of a user utilizing a handheld device to take the self-portrait, the handheld mobile device having a display screen for displaying images to the user.
This aspect includes providing at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a self-portrait setup time during which the user is setting up the self-portrait; capturing images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; estimating a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generating a data representation, representative of the captured images; reconstructing a synthetic view of the user, based on the generated data representation and the generated tracking information; displaying to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback.
the capturing, estimating, generating, reconstructing and displaying are executed such that in the self-portrait the user can appear to be looking directly into the camera, even if the camera does not have a direct eye contact gaze vector to the user.
One aspect of the present invention relates to methods, systems and computer software/program code products that facilitate composition of a photograph of a scene, by a user utilizing a handheld device to take the photograph, the handheld device having a display screen on a first side for displaying images to the user, and at least one camera on a second, opposite side of the handheld device, for capturing images.
This aspect includes capturing images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; estimating a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generating a data representation, representative of the captured images; reconstructing a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has a selected correspondence to the user's viewpoint relative to the handheld device and the scene; and displaying to the user, on the display screen during the setup time, the synthetic view of the scene; thereby enabling the user, while setting up the photograph, to frame the scene to be photographed, with selected scale and perspective within the display frame, with realtime visual feedback.
the user can control the scale and perspective of the synthetic view by changing the position of the handheld device relative to the position of the user's head.
estimating a location of the user's head or eyes relative to the handheld device includes using at least one camera on the first, display side of the handheld device, having a view of the user's head or eyes during photograph setup time.
the invention enables the features described herein to be provided in a manner that fits within the form factors of modern mobile devices such as tablets and smartphones, as well as the form factors of laptops, PCs, computer-driven televisions, computer-driven projector devices, and the like, does not dramatically alter the economics of building such devices, and is viable within current or near current communications network/connectivity architectures.
One aspect of the present invention relates to methods, systems and computer software/program code products for displaying images to a user utilizing a binocular stereo head-mounted display (HMD).
This aspect includes capturing at least two image streams using at least one camera attached or mounted on or proximate to an external portion or surface of the HMD, the captured image streams containing images of a scene; generating a data representation, representative of captured images contained in the captured image streams; reconstructing two synthetic views, based on the representation; and displaying the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
Another aspect of the present invention relates to methods, systems and computer
The image content can include pre-recorded image content, which can be stored, transmitted, broadcast, downloaded, streamed or otherwise made available.
This aspect includes capturing or generating at least two image streams using at least one camera, the captured image streams containing images of a scene; generating a data representation, representative of captured images contained in the captured image streams; reconstructing two synthetic views, based on the representation; and displaying the synthetic views to a user, via the HMD, the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
The data representation can be pre-recorded, and stored, transmitted, broadcast, downloaded, streamed or otherwise made available.
Another aspect of the invention includes tracking the location or position of the user's head or eyes to generate a motion vector usable in the reconstructing of synthetic views.
The motion vector can be used to modify the respective view origins, during the reconstructing of synthetic views, so as to produce intermediate image frames to be interposed between captured image frames in the captured image streams; and interposing the intermediate image frames between the captured image frames so as to reduce apparent latency.
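As a rough illustration of how a motion vector could shift a view origin for an intermediate frame, the sketch below assumes a constant-velocity model between two tracked head positions. The model, the function names `motion_vector` and `predict_view_origin`, and the units are assumptions made for the example, not details given in the text.

```python
def motion_vector(prev_pos, curr_pos, dt):
    """Velocity (units per second) from two tracked head positions that were
    measured dt seconds apart. Positions are (x, y, z) tuples."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return tuple((c - p) / dt for p, c in zip(prev_pos, curr_pos))


def predict_view_origin(curr_pos, velocity, lookahead):
    """Extrapolate the view origin `lookahead` seconds past the latest
    tracked position, assuming the head keeps moving at `velocity`."""
    return tuple(c + v * lookahead for c, v in zip(curr_pos, velocity))


# Two tracker samples 0.1 s apart; an intermediate frame is rendered for a
# view origin 0.05 s after the latest sample.
velocity = motion_vector((0.0, 0.0, 0.5), (0.01, 0.0, 0.5), 0.1)
origin = predict_view_origin((0.01, 0.0, 0.5), velocity, 0.05)
```

Each intermediate frame would be reconstructed from the most recent captured data but rendered from the extrapolated origin, which is how the interposed frames can track head motion between captures.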
At least one camera is a panoramic camera, night-vision camera, or thermal imaging camera.
One aspect of the invention relates to methods, systems and computer software/program code products for generating an image data stream for use by a control system of an autonomous vehicle.
This aspect includes capturing images of a scene around at least a portion of the vehicle, the capturing comprising utilizing at least one camera having a view of the scene; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; calculating corresponding depth information based on the disparity values; and generating from the images and corresponding depth information an image data stream for use by the control system.
The capturing can comprise utilizing at least two cameras, each having a view of the scene; and executing a feature correspondence function comprises detecting common features between corresponding images captured by the respective cameras.
The capturing can include using a single camera having a view of the scene; and executing a feature correspondence function comprises detecting common features between sequential images captured by the single camera over time and measuring a relative distance in image space between the common features, to generate disparity values.
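The step from disparity values to depth information can be sketched with the standard pinhole relation for a rectified two-camera pair, depth = focal length x baseline / disparity. The text does not state which formula it uses, so this relation, the function names, and the example numbers are assumptions for illustration.

```python
def disparity(x_first, x_second):
    """Distance in image space (pixels) between the horizontal positions of
    one common feature in two rectified images."""
    return x_first - x_second


def depth_from_disparity(d, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair.

    focal_px:   focal length in pixels (assumed known from calibration).
    baseline_m: distance between the two camera centres in metres.
    A non-positive disparity is treated as a point at infinity.
    """
    if d <= 0:
        return float("inf")
    return focal_px * baseline_m / d


# A feature seen at x=420 in one image and x=400 in the other.
d = disparity(420.0, 400.0)
z = depth_from_disparity(d, focal_px=800.0, baseline_m=0.1)
```

For the single-camera case, the same relation applies only if the camera motion between the sequential images is known, with that motion playing the role of the baseline.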
One aspect of the present invention relates to methods, systems and computer software/program code products that enable video capture and processing, including: (1) capturing images of a scene, the capturing comprising utilizing at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis; and (2) executing a feature correspondence function, by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising:
Each level can be assigned a level number, and each successively higher level can be characterized by a lower image resolution.
The downsampling can include reducing image resolution via low-pass filtering.
The downsampling can include using a weighted summation of a kernel in level [n-1] to produce a pixel value in level [n], and the normalized kernel center position remains the same across all levels.
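The level-to-level downsampling described above can be sketched as follows. The sketch assumes a factor-of-two reduction and a 2x2 kernel with equal weights (a simple low-pass box filter); the text does not specify the kernel size or weights, and the names `downsample` and `build_pyramid` are hypothetical.

```python
def downsample(image, kernel=((0.25, 0.25), (0.25, 0.25))):
    """Halve the resolution of a 2D image (list of rows).

    Each output pixel in level n is a weighted sum of a 2x2 block of pixels
    in level n-1. The default weights form a box filter and sum to 1.
    """
    height, width = len(image) // 2, len(image[0]) // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            value = sum(
                kernel[j][i] * image[2 * y + j][2 * x + i]
                for j in range(2)
                for i in range(2)
            )
            row.append(value)
        out.append(row)
    return out


def build_pyramid(image, levels):
    """Return [level 0, level 1, ...], each level at half the resolution of
    the one before it."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


image = [[0, 4, 8, 12] for _ in range(4)]
pyramid = build_pyramid(image, 3)
```

Because the same kernel is applied at every level, its normalized center stays at the same relative position in each level, which keeps pixel positions comparable across the pyramid.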
The FDDE votes for every image level are included in the disparity solution.
Another aspect of the invention includes generating a multi-level histogram comprising a set of initially independent histograms at different levels of resolution.
Each histogram bin in a given level represents votes for disparity determined by the FDDE at that level.
Each histogram bin in a given level has an associated disparity uncertainty range, and the disparity uncertainty range represented by each histogram bin is a selected multiple wider than the disparity uncertainty range of a bin in the preceding level.
A further aspect of the invention includes applying a sub-pixel shift to the disparity values at each level during downsampling, to negate rounding error effect. In a related aspect, applying a sub-pixel shift comprises applying a half pixel shift to only one of the images in a stereo pair at each level of downsampling. In a further aspect, applying a sub-pixel shift is implemented inline, within the weights of the filter kernel utilized to implement the downsampling from level to level.
Another aspect of the invention includes executing histogram integration, the histogram
A related aspect includes, during summation, modifying the weighting of each level to control the amplitude of the effect of lower levels in overall voting, by applying selected weighting coefficients to selected levels.
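One way to picture the weighted summation across levels is sketched below. It assumes that level 0 is the finest histogram, that each bin at level n covers `factor ** n` bins of level 0 (the "selected multiple" is taken to be 2 here), and that integration is a per-bin weighted sum. These choices and the name `integrate_histograms` are illustrative assumptions, not the method as specified.

```python
def integrate_histograms(levels, weights, factor=2):
    """Combine per-level disparity histograms into one finest-level histogram.

    levels[n][b] holds the votes of bin b at level n; that bin is assumed to
    cover finest-level bins b*factor**n through (b+1)*factor**n - 1.
    weights[n] scales the contribution of level n to the overall vote.
    """
    combined = [0.0] * len(levels[0])
    for n, (histogram, weight) in enumerate(zip(levels, weights)):
        span = factor ** n
        for i in range(len(combined)):
            b = i // span  # coarse bin whose range contains fine bin i
            if b < len(histogram):
                combined[i] += weight * histogram[b]
    return combined


levels = [[1, 5, 2, 0], [3, 1], [2]]
weights = [1.0, 0.5, 0.25]
combined = integrate_histograms(levels, weights)
best_bin = max(range(len(combined)), key=lambda i: combined[i])
```

Lowering the coefficient of a level reduces how much its wide, uncertain bins can sway the final choice of disparity range.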
Yet another aspect of the invention includes inferring a sub-pixel disparity solution from the disparity histogram, by calculating a sub-pixel offset based on the number of votes for the maximum vote disparity range and the number of votes for an adjacent, runner-up disparity range.
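The text does not give the formula for the sub-pixel offset. The sketch below shows one plausible weighting, chosen only for illustration: the offset toward the runner-up bin is the runner-up's share of the combined votes, so it is 0 when the runner-up has no votes and 0.5 when the two bins tie. The name `subpixel_disparity` is hypothetical, and the result is expressed in bin units.

```python
def subpixel_disparity(votes):
    """Estimate a fractional disparity (in bin units) from a vote histogram.

    Assumed rule: shift from the winning bin toward its better-supported
    neighbour by runner_up_votes / (winner_votes + runner_up_votes).
    """
    best = max(range(len(votes)), key=lambda i: votes[i])
    neighbours = [i for i in (best - 1, best + 1) if 0 <= i < len(votes)]
    if not neighbours:
        return float(best)
    runner_up = max(neighbours, key=lambda i: votes[i])
    total = votes[best] + votes[runner_up]
    if total == 0:
        return float(best)
    offset = votes[runner_up] / total  # always between 0 and 0.5
    return best + offset * (runner_up - best)


estimate = subpixel_disparity([0, 6, 2, 0])
```

Other interpolation rules (for example a parabola fit through three bins) would also fit the description; the choice here is only meant to make the idea concrete.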
A summation stack can be maintained in a memory unit.
One aspect of the present invention relates to methods, systems and computer software/program code products that enable capturing of images using at least two stereo camera pairs, each pair being arranged along a respective camera pair axis, and for each camera pair axis: executing image capture utilizing the camera pair to generate image data; executing rectification and undistorting transformations to transform the image data into RUD image space; iteratively downsampling to produce multiple, successively lower resolution levels; executing FDDE calculations for each level to compile FDDE votes for each level; gathering FDDE disparity range votes into a multi-level histogram; determining the highest ranked disparity range in the multi-level histogram; and processing the multi-level histogram disparity data to generate a final disparity result.
One aspect of the present invention relates to methods, systems and computer software/program code products that enable video capture and processing, including: (1) capturing images of a scene, the capturing comprising utilizing at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair; and (2) executing a feature
the feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, the feature correspondence function further comprising: generating a disparity solution based on the disparity values; and applying an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution in response to violation of the injective constraint, wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain.
Applying an injective constraint comprises: maintaining a reference count for each pixel in the co-domain, and checking whether the reference count for the pixels in the co-domain exceeds "1", and if the count exceeds "1" then designating a violation and responding to the violation with a selected error correction approach.
The selected error correction approach can include any of (a) first come, first served, (b) best match wins, (c) smallest disparity wins, or (d) seek alternative candidates.
The first come, first served approach can include assigning priority to the first element in the domain to claim an element in the co-domain, and if a second element in
The best match wins approach can include: comparing the actual image matching error or corresponding histogram vote count between the two possible candidate elements in the domain against the contested element in the co-domain, and designating as winner the domain candidate with the best match.
The smallest disparity wins approach can include: if there is a contest between candidate elements in the domain for a given co-domain element wherein each candidate element has a corresponding disparity, selecting the domain candidate with the smallest disparity and designating as invalid the others.
The seek alternative candidates approach can include: selecting and testing the next best domain candidate, based on a selected criterion, and iterating the selecting and testing until the violation is eliminated or a computational time limit is reached.
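The reference counting and the "best match wins" policy described above can be sketched for a single scanline as follows. The mapping from a domain pixel `x` with disparity `d` to co-domain pixel `x - d`, the use of `None` for an invalidated match, and the function names are assumptions made for this example.

```python
from collections import Counter


def reference_counts(disparities):
    """Count how many domain pixels reference each co-domain pixel.

    disparities[x] is the disparity chosen for domain pixel x, or None if
    that pixel has no valid match. A count above 1 marks a violation.
    """
    return Counter(x - d for x, d in enumerate(disparities) if d is not None)


def enforce_injective(disparities, match_errors):
    """Resolve violations with a 'best match wins' policy.

    match_errors[x] is the image matching error of domain pixel x (lower is
    better). Losing candidates are set to None in the returned list.
    """
    result = list(disparities)
    owner = {}  # co-domain pixel -> domain pixel that currently claims it
    for x, d in enumerate(disparities):
        if d is None:
            continue
        target = x - d
        rival = owner.get(target)
        if rival is None:
            owner[target] = x
        elif match_errors[x] < match_errors[rival]:
            result[rival] = None  # the earlier claimant loses the contest
            owner[target] = x
        else:
            result[x] = None  # the current candidate loses the contest
    return result


disparities = [0, 1, 2, 0]  # pixels 0, 1 and 2 all point at co-domain pixel 0
fixed = enforce_injective(disparities, [0.5, 0.1, 0.3, 0.2])
```

Swapping the comparison for "smaller disparity" or "earlier claimant always keeps the pixel" would give the smallest disparity wins and first come, first served policies in the same framework.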
One aspect of the present invention relates to methods, systems and computer software/program code products that enable video capture in which a first user is able to view a second user with direct virtual eye contact with the second user, including: (1) capturing images of the second user, the capturing comprising utilizing at least one camera having a view of the second user's face; (2) executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; (3) generating a
The location estimating comprises: (a) passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane; (b) utilizing an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and (5) reconstructing a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and wherein the location estimating comprises: (a) passing a captured image of the first user, the
A related aspect of the invention includes downsampling the captured image before passing it to the 2D facial feature detector. Another aspect includes interpolating image data from video frame to video frame, based on the time that has passed from a given video frame from a previous video frame. Another aspect includes converting image data to luminance values.
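Two of the pre-processing steps mentioned above can be sketched briefly. The luminance conversion uses the ITU-R BT.601 luma weights, which is a common choice but is not named in the text. The time-based interpolation is shown as a plain linear blend between two samples, which is likewise an assumption. Both function names are hypothetical.

```python
def to_luminance(r, g, b):
    """Convert one RGB pixel to a luminance value using BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def interpolate(prev_value, curr_value, prev_time, curr_time, query_time):
    """Linearly interpolate a value between two video frames according to
    how much time has passed since the previous frame."""
    span = curr_time - prev_time
    if span <= 0:
        return curr_value
    t = (query_time - prev_time) / span
    return prev_value + t * (curr_value - prev_value)


grey = to_luminance(200, 120, 40)
# Frames 40 ms apart; estimate the value 10 ms after the previous frame.
blended = interpolate(10.0, 20.0, 0.0, 0.04, 0.01)
```

Working on a single luminance channel reduces the data passed to a detector to one value per pixel, and the same interpolation helper can be applied to tracked positions as well as to pixel values.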
One aspect of the present invention relates to methods, systems and computer software/program code products that enable video capture and processing, including: (1) capturing images of a scene, the capturing comprising utilizing at least three cameras having a view of the scene, the cameras being arranged in a substantially "L"-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axis, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera; (2) executing a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; (3) generating a data representation, representative of the captured images and the corresponding disparity values; and
A related aspect includes executing a stereo correspondence operation on the image data in a rectified, undistorted (RUD) image space, and storing resultant disparity data in a RUD space coordinate system.
The resultant disparity data is stored in a URUD space coordinate system.
Another aspect includes generating disparity histograms from the disparity data in either RUD or URUD space, and storing the disparity histograms in a unified URUD space coordinate system.
A further aspect includes applying a URUD to RUD coordinate transformation to obtain per-axis disparity values.
One aspect of the present invention relates to methods, systems and computer software/program code products that enable video capture and processing, including (1) capturing images of a scene, the capturing comprising utilizing at least one camera having a view of the scene; (2) executing a feature correspondence function by detecting common features between corresponding images captured by the at
The feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: (a) constructing a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and (b) optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; and, for each output pixel thread, maintaining a private disparity histogram, in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure.
The private disparity histogram is stored such that each pixel thread writes to and reads from the corresponding private disparity histogram on a dedicated portion of shared local memory in the GPU.
Shared local memory in the GPU is organized at least in part into memory words; the private disparity histogram is characterized by a series of histogram bins indicating the number of votes for a given disparity range; and if a maximum possible number of votes in the private disparity histogram is known, multiple histogram bins can be packed into a single word of the shared local memory, and accessed using bitwise GPU access operations.
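The bin packing described above can be illustrated on the CPU with ordinary integers standing in for 32-bit memory words. The sketch assumes a known maximum of 255 votes per bin, so each bin fits in 8 bits and four bins share one word; the bit widths, constants and function names are assumptions for the example, and real GPU code would express the same shifts and masks in a kernel language.

```python
BITS_PER_BIN = 8                     # enough for a known maximum of 255 votes
BINS_PER_WORD = 32 // BITS_PER_BIN   # four bins share one 32-bit word
BIN_MASK = (1 << BITS_PER_BIN) - 1


def read_bin(words, bin_index):
    """Extract the vote count of one histogram bin with a shift and a mask."""
    word, slot = divmod(bin_index, BINS_PER_WORD)
    return (words[word] >> (slot * BITS_PER_BIN)) & BIN_MASK


def add_votes(words, bin_index, votes=1):
    """Add votes to one packed bin, refusing to overflow into a neighbour."""
    word, slot = divmod(bin_index, BINS_PER_WORD)
    if read_bin(words, bin_index) + votes > BIN_MASK:
        raise OverflowError("vote count exceeds the packed bin capacity")
    words[word] += votes << (slot * BITS_PER_BIN)


# An 8-bin private histogram stored in two words.
histogram_words = [0, 0]
add_votes(histogram_words, 1)
add_votes(histogram_words, 1)
add_votes(histogram_words, 5, votes=3)
```

Knowing the maximum vote count in advance is what makes the packing safe: the overflow check can never fire in practice, so an increment is a single add on the shared word.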
One aspect of the invention includes a program product for use with a digital processing system, for enabling image capture and processing, the digital processing system comprising at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to: (1) capture images of the scene, utilizing the at least first and second cameras; and (2) execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising
The digital processing system comprises at least two stereo camera pairs, each pair being arranged along a respective camera pair axis, and the digital processor-executable program instructions further comprise instructions which when executed in the digital processing resource cause the digital processing resource to execute, for each camera pair axis, the following: (1) image capture utilizing the camera pair to generate image data; (2) rectification and undistorting transformations to transform the image data into RUD image space; (3) iteratively
Another aspect of the invention includes a program product for use with a digital processing system, the digital processing system comprising at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to: (1) capture images of the scene, utilizing the at least first and second cameras; and (2) execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: (a) generating a disparity solution based on the disparity values; and (b) applying an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the
The digital processor-executable program instructions further comprise instructions which when executed in the digital processing resource cause the digital processing resource to: maintain a reference count for each pixel in the co-domain, and check whether the reference count for the pixels in the co-domain exceeds "1", and if the count exceeds "1" then designate a violation and respond to the violation with a selected error correction approach.
Another aspect of the invention includes a program product for use with a digital processing system, for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising at least one camera having a view of the second user's face, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to: (1) capture images of the second user, utilizing the at least one camera; (2) execute a feature
the correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; (3) generate a data representation, representative of the captured images and the corresponding disparity values; (4) estimate a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and (5) reconstruct a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; wherein the 3D location estimating comprises: (a) passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of
Yet another aspect of the invention includes a program product for use with a digital processing system, for enabling capture and processing of images of a scene, the digital processing system
a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to: (1) capture images of the scene, utilizing the at least three cameras; (2) execute a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; (3) generate a data representation, representative of the captured images and the corresponding disparity values; and (4) utilize an unrectified, undistorted (URUD) image space to integrate disparity data for pixels between the first and second stereo axes, thereby to combine disparity data from the first and second axes, wherein the URUD space is an image space in which polynomial lens distortion has been removed
The digital processor-executable program instructions further comprise instructions which, when executed in the digital processing resource, cause the digital processing resource to execute a stereo correspondence operation on the image data in a rectified, undistorted (RUD) image space, and store resultant disparity data in a RUD space coordinate system.
RUD rectified, undistorted
Another aspect of the invention includes a program product for use with a digital processing system, for enabling image capture and processing, the digital processing system comprising at least one camera having a view of a scene, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to: (1)
The feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: (a) constructing a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and (b) optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; and for each output pixel thread, maintaining a private disparity histogram, in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure.
One aspect of the invention includes a digital processing system for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising: (1) at least one camera having a view of the second user's face; (2) a
a digital processing resource comprising at least one digital processor, the digital processing resource being operable to: (a) capture images of the second user, utilizing the at least one camera; (b) generate a data representation, representative of the captured images; (c) reconstruct a synthetic view of the second user, based on the representation; and (d) display the synthetic view to the first user on the display screen for use by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
Another aspect of the invention includes a digital processing system for enabling a first user to view a remote scene with the visual impression of being present with respect to the remote scene, the digital processing system comprising: (1) at least two cameras, each having a view of the remote scene; (2) a display screen for use by the first user; and (3) a digital processing resource comprising at least one digital processor, the digital processing resource being operable to: (a) capture images of the remote scene, utilizing the at least two cameras; (b) execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values; (c) generate a data representation, representative of the captured images and the corresponding disparity values; (d) reconstruct a synthetic view of the remote scene, based on the representation; and (e) display the synthetic view to the first user on the display screen; the capturing, detecting, generating, reconstructing and displaying being executed such that: the first user is provided the visual impression of
Another aspect of the invention includes a system operable in a handheld digital processing device, for facilitating self-portraiture of a user utilizing the handheld device to take the self-portrait, the system comprising: (1) a digital processor; (2) a display screen for displaying images to the user; and (3) at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a
the system being operable to: (a) capture images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; (b) estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; (c) generate a data representation, representative of the captured images; (d) reconstruct a synthetic view of the user, based on the generated data representation and the generated tracking information; and (e) display to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback.
One aspect of the invention includes a system operable in a handheld digital processing device, for facilitating composition of a photograph of a scene by a user utilizing the handheld device to take the photograph, the system comprising: (1) a digital processor; (2) a display screen on a first side of the handheld device for displaying images to the user; and (3) at least one camera on a second, opposite side of the handheld device, for capturing images; the system being operable to: (a) capture images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; (b) estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; (c) generate a data representation, representative of the captured images; (d) reconstruct a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has
Another aspect of the invention includes a system for enabling display of images to a user utilizing a binocular stereo head-mounted display (HMD), the system comprising: (1) at least one camera attached or mounted on or proximate to an external portion or surface of the HMD; and (2) a digital processing resource comprising at least one digital processor; the system being operable to: (a) capture at least two image streams using the at least one camera, the captured image streams containing images of a scene; (b) generate a data representation, representative of captured images contained in the captured image streams; (c) reconstruct two synthetic views, based on the representation; and (d) display the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the
Another aspect of the invention includes an image processing system for enabling the generation of an image data stream for use by a control system of an autonomous vehicle, the image processing system comprising: (1) at least one camera with a view of a scene around at least a portion of the vehicle; and (2) a digital processing resource comprising at least one digital processor; the system being operable to: (a) capture images of the scene around at least a portion of the vehicle, using the at least one camera; (b) execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; (c) calculate corresponding depth information based on the disparity values; and (d) generate from the images and corresponding depth information an image data stream for use by the control system.
Another aspect of the invention includes generating a facial signature, based on images of a human user's or subject's face, for enabling accurate, reliable identification: or authentication of a hitman subject or user of a system or resource, in a secure, difficult to forge manner.
The invention relates to methods, systems and computer software/program code products that enable generating a facial signature for use in identifying a given human user.
Generating a facial signature includes capturing images of the user's face, using at least one camera having a view of the user's face; executing an image rectification function to compensate for optical distortion and alignment of the at least one camera; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values and a feature correspondence data representation representative of the captured images and the corresponding disparity values; and utilizing the feature correspondence data representation to generate a facial signature data representation, the facial signature data representation being usable in accurately identifying the user or subject in a secure, difficult to forge manner.
The capturing can utilize at least two cameras, each having a view of the user's face; and the feature correspondence function can include detecting common features between corresponding images captured by the respective cameras.
The capturing can utilize at least one camera having a view of the user's face and which is an infra-red time-of-flight camera or structured light camera that directly provides depth information; and the feature correspondence data representation can be representative of the captured images and corresponding depth information.
The capturing utilizes a single camera having a view of the second user's face; and executing a feature correspondence function includes detecting common features between images captured by the single camera over time and measuring a relative distance in image space between the common features, to generate disparity values.
The identifying aspect of the invention uses stereo depth estimation to verify that human facial features are presented to the cameras at the correct distance ratios between the cameras or from the structured light or time-of-flight sensor.
The identifying takes into account the actual 3D coordinates of facial features with respect to other facial features, and the feature correspondence function or depth detection function includes computing distances between facial features from multiple perspectives.
The facial signature is a combination of 3D facial contour information and a 2D image from one or more of the cameras.
The 3D contour data can be stored in the facial signature data representation.
The facial signature is utilized as a security factor in an authentication system, either as the sole security factor or in combination with other security factors.
The other security factors can include a passcode, a fingerprint or other biometric information.
The 3D facial contour data is combined with a 2D image from one or more cameras in a conventional 2D face identification system to create a hybrid 3D/2D face identification system.
3D facial contour data is used solely to confirm that a face having credible 3D human facial proportions was presented to the cameras at an overlapping spatial location of the captured 2D image.
A further aspect of the invention includes using a 2D bounding rectangle, defining the 2D extent of the face location, to limit search space and limit calculations to a region defined by the rectangle, thereby increasing speed of recognition and reducing power consumption.
Still another aspect of the invention includes prompting the user to present multiple distinct facial poses or head positions, and utilizing a depth detection system to scan the multiple facial poses or head positions across a series of image frames, so as to increase protection against forgery of the facial signature.
Generating a unique facial signature further includes executing an enrollment phase, which includes prompting the user to present to the cameras a plurality of selected head movements or positions, or a series of selected facial poses, and collecting image frames from a plurality of head positions or facial poses for use in generating the unique facial signature representative of the user.
The invention further includes a matching phase, which includes using the cameras to capture, over an interval of time, a plurality of frames of 3D and 2D data representative of the user's face; correlating the captured data with the facial signature generated during the enrollment phase, thereby to generate a probability of match score; and comparing the probability of match score with a selected threshold value, thereby to confirm or deny an identity match.
The enrollment phase can include generating an enrolled facial signature containing data corresponding to multiple image scans of a user's face, the multiple image scans corresponding to a plurality of the user's head positions or facial poses; and the matching phase can include requiring at least a minimum number of captured image frames corresponding to different facial or head positions matching the multiple scans within the enrolled signature.
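By way of illustration only, the decision rule of the matching phase described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function name `confirm_identity`, the pose labels, and the representation of each captured frame as a (pose, probability-of-match score) pair are all hypothetical, and the correlation step that produces each score is assumed to have been performed elsewhere.

```python
def confirm_identity(frame_results, threshold, min_matching_poses):
    """Confirm or deny an identity match from per-frame match scores.

    frame_results: iterable of (pose_label, score) pairs, one per captured
        frame, where score is a probability-of-match value in [0, 1]
        (hypothetical representation, assumed for this sketch).
    threshold: selected threshold the score must reach for a frame to count.
    min_matching_poses: minimum number of distinct head positions or facial
        poses that must match the enrolled signature.
    """
    # Collect the distinct poses for which at least one frame met the threshold.
    matched_poses = {pose for pose, score in frame_results if score >= threshold}
    return len(matched_poses) >= min_matching_poses


# Usage: two distinct poses ("left" and "front") meet the 0.75 threshold.
frames = [("left", 0.90), ("left", 0.95), ("front", 0.80), ("right", 0.40)]
accepted = confirm_identity(frames, threshold=0.75, min_matching_poses=2)
```

Counting distinct poses rather than raw frames reflects the requirement that matches span different facial or head positions, so that many frames of a single static pose do not satisfy the check on their own.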
Another aspect of the invention relates to generating a histogram based facial signature representation, whereby a facial signature is represented as one or more histograms obtained from a summation of per-pixel disparity histograms within the feature correspondence calculation, or generated from depth data from a sensor capable of directly perceiving depth.
The histograms represent the normalized relative proportion of facial feature depths across a plane parallel to the user's face.
The X-axis of the histogram represents a given disparity or depth range.
The Y-axis of the histogram represents the normalized count of image samples that fall within the given range.
A conventional 2D face detector provides a face rectangle and location of basic facial features, and only samples within the face rectangle are accumulated into the histogram.
Another aspect includes rejecting samples falling outside the statistical majority of samples within the face rectangle.
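By way of illustration only, the accumulation of a normalized histogram from samples within the face rectangle might be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the half-open rectangle convention, and the bin parameters are hypothetical, and the use of a median-absolute-deviation test to define the "statistical majority" is an assumption made for this sketch, since the text does not specify a particular rejection rule.

```python
from statistics import median


def reject_outliers(samples, k=3.0):
    """Keep samples within k median-absolute-deviations of the median.

    The MAD rule is an assumption of this sketch, standing in for
    "samples falling outside the statistical majority".
    """
    if not samples:
        return []
    med = median(samples)
    mad = median(abs(s - med) for s in samples)
    if mad == 0:
        # No spread estimate is available, so no rejection is attempted.
        return list(samples)
    return [s for s in samples if abs(s - med) <= k * mad]


def face_histogram(disparity, face_rect, lo, hi, bins=8):
    """Build a normalized disparity histogram from the face rectangle only.

    disparity: 2D list of per-pixel disparity values, indexed [row][column].
    face_rect: (x0, y0, x1, y1), half-open, as from a 2D face detector.
    lo, hi: disparity range covered by the X-axis; bins: number of ranges.
    """
    x0, y0, x1, y1 = face_rect
    samples = [disparity[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    samples = reject_outliers(samples)
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        if lo <= s < hi:
            counts[min(bins - 1, int((s - lo) / width))] += 1
    total = sum(counts)
    # Y-axis: normalized count of samples falling within each range.
    return [c / total for c in counts] if total else [0.0] * bins


# Usage: the value 50 inside the rectangle is rejected as an outlier.
disparity_map = [
    [0, 0, 0, 0],
    [0, 10, 12, 0],
    [0, 11, 50, 0],
    [0, 0, 0, 0],
]
signature = face_histogram(disparity_map, (1, 1, 3, 3), lo=8, hi=16, bins=4)
```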
Another aspect includes projecting disparity and depth points into a canonical coordinate system defined by a plane constructed from the 3D coordinates of the basic facial features.
A histogram is accumulated over multiple captured image frames over a period of time.
Each set of samples of the captured image frames undergoes an affine transform to lie on a common facial plane, to enable multiple samples of facial depth relationships to be accumulated into a histogram.
Multiple samples of facial depth relationships are accumulated into a histogram across a series of facial positions or poses.
A candidate histogram is accumulated over multiple captured image frames over a period of time.
After a candidate histogram is accumulated, it is subtracted from a set of enrolled histograms to generate a vector distance constituting a degree-of-match score.
A further aspect of the invention includes comparing the degree-of-match score to a selected threshold to confirm or deny a match with each enrolled signature in a set of enrolled signatures.
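By way of illustration only, the subtraction of a candidate histogram from enrolled histograms and the comparison against a threshold might be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the dictionary of enrolled signatures are hypothetical, and the Euclidean norm is assumed as the vector distance, since the text does not name a specific metric. Because the score here is a distance, a smaller value indicates a closer match.

```python
import math


def degree_of_match(candidate, enrolled):
    """Vector distance between a candidate and an enrolled histogram.

    Both histograms are subtracted bin by bin; the Euclidean norm of the
    difference is used as the score (an assumption of this sketch).
    """
    if len(candidate) != len(enrolled):
        raise ValueError("histograms must have the same number of bins")
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(candidate, enrolled)))


def match_against_enrolled(candidate, enrolled_set, threshold):
    """Confirm or deny a match with each enrolled signature.

    enrolled_set: mapping of a signature name to its enrolled histogram.
    Returns a mapping of name to True (match) or False (no match).
    """
    return {
        name: degree_of_match(candidate, hist) <= threshold
        for name, hist in enrolled_set.items()
    }


# Usage with two hypothetical enrolled signatures.
enrolled = {
    "alice": [0.0, 0.5, 0.5, 0.0],
    "bob": [0.5, 0.5, 0.0, 0.0],
}
result = match_against_enrolled([0.0, 0.6, 0.4, 0.0], enrolled, threshold=0.2)
```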
The histogram representation is used in combination with conventional 2D face matching to provide an additional authentication factor.
The facial signature is utilized as a factor in an authentication process in which a human subject or user of a system or resource is successfully authenticated if selected criteria are met, and the facial signature aspect further includes updating the facial signature on every successful match, or on every nth successful match, where n is a selected integer.
Another aspect of the invention includes a program product for use with a digital processing system, for generating a facial signature for use in identifying a human user or subject, the digital processing system comprising at least one camera, having a view of the user's or subject's face, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium which when executed in the digital processing resource cause the digital processing resource to: capture images of the user's or subject's face, utilizing the at least one camera; execute an image rectification function to compensate for optical distortion and alignment of the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values and a feature correspondence data representation representative of the captured images and the corresponding disparity values; and utilize the feature correspondence data representation to generate a facial signature data representation, the facial signature data representation being usable to accurately identify the user or subject in a secure, difficult to forge manner.
The capturing can include using at least two cameras, each having a view of the user's or subject's face; and executing a feature correspondence function can include detecting common features between corresponding images captured by the respective cameras.
The capturing can include using at least one camera having a view of the user's or subject's face and which is an infra-red time-of-flight camera or structured light camera that directly provides depth information; and the feature correspondence data representation is representative of the captured images and corresponding depth information.
The capturing includes using a single camera having a view of the user's or subject's face; and executing a feature correspondence function includes detecting common features between images captured by the single camera over time and measuring a relative distance in image space between the common features, to generate disparity values.
The identifying of a human user or subject utilizes stereo depth estimation to verify that human facial features are presented to the cameras at the correct distance ratios between the cameras or from the structured light or time-of-flight sensor.
The identifying can take into account the actual 3D coordinates of facial features with respect to other facial features.
Another aspect of the invention includes a digital processing system for generating a facial signature for use in identifying a human user or subject, the digital processing system comprising at least one camera having a view of the user's or subject's face, and a digital processing resource comprising at least one digital processor, the digital processing resource being operable to: capture images of the user's or subject's face, utilizing the at least one camera; execute an image rectification function to compensate for optical distortion and alignment of the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values and a feature correspondence data representation representative of the captured images and the corresponding disparity values; and utilize the feature correspondence data representation to generate a facial signature data representation, the facial signature data representation being usable to accurately identify the user or subject, in a secure, difficult to forge manner.
The system can include at least two cameras having a view of the user's or subject's face; the capturing can include utilizing the at least two cameras; and executing a feature correspondence function can include detecting common features between corresponding images captured by the respective cameras.
The capturing includes using at least one camera having a view of the user's or subject's face and which is an infra-red time-of-flight camera, or structured light camera, that directly provides depth information; and the feature correspondence data representation is representative of the captured images and corresponding depth information.
The capturing includes utilizing a single camera having a view of the user's or subject's face; and executing a feature correspondence function includes detecting common features between images captured by the single camera over time and measuring a relative distance in image space between the common features, to generate disparity values.
The identifying of a human subject or user includes utilizing stereo depth estimation to verify that human facial features are presented to the cameras at the correct distance ratios between the cameras or from the structured light or time-of-flight sensor.
The identifying can take into account the actual 3D coordinates of facial features with respect to other facial features.
FIG. 1 shows a camera configuration useful in an exemplary practice of the invention.
FIGS. 2-6 are schematic diagrams illustrating exemplary practices of the invention.
FIG. 7 is a flowchart showing an exemplary practice of the invention.
FIG. 8 is a block diagram depicting an exemplary embodiment of the invention.
FIGS. 9-18 are schematic diagrams illustrating exemplary practices of the invention.
FIG. 19 is a graph in accordance with an aspect of the invention.
FIGS. 20-45 are schematic diagrams illustrating exemplary practices of the invention.
FIG. 46 is a graph in accordance with an aspect of the invention.
FIGS. 47-54 are schematic diagrams illustrating exemplary practices of the invention.
FIGS. 55-80 are flowcharts depicting exemplary practices of the invention.
FIG. 81 is a schematic flow diagram depicting processing of images to generate a Facial
FIGS. 82-83 show an exemplary image processed in accordance with an exemplary practice of the Facial Signature aspects of the invention, where FIG. 82 is an example of an image of a human user or subject captured by at least one camera, and FIG. 83 is an example of a representation of image data, corresponding to the image of FIG. 82, processed in accordance with an exemplary practice of the invention.
FIG. 84 shows a histogram representation corresponding to the image(s) of FIGS. 82-83, generated in accordance with an exemplary practice of the Facial Signature aspects of the invention.
FIGS. 85-88 are flowcharts depicting exemplary practices of the Facial Signature aspects of the invention.
V3D aims to address and radically improve the visual aspect of sensory engagement in teleconferencing and other video capture settings, while doing so with low latency.
The visual aspect of conducting a video conference is conventionally achieved via a camera pointing at each user, transmitting the video stream captured by each camera, and then projecting the video stream(s) onto the two-dimensional (2D) display of the other user in a different location.
Both users have a camera and display and thus is formed a full-duplex connection where both users can see each other and their respective environments.
The V3D of the present invention aims to deliver a significant enhancement to this particular aspect by creating a "portal" where each user would look "through" their respective displays as if there were a "magic" sheet of glass in a frame to the other side in the remote location.
This approach enables a number of important improvements for the user (assuming robust implementation):
Each user can form direct eye contact with the other.
Each user can move his or her head in any direction and look through the portal to the other side. They can even look "around" and see the
V3D aspects of the invention can be configured to deliver these advantages in a manner that fits within the highly optimized form factors of today's modern mobile devices, does not dramatically alter the economics of building such devices, and is viable within the current connectivity performance levels available to most users.
FIG. 1 shows a perspective view of an exemplary prototype device 10, which includes a display 12 and three cameras: a top right camera 14, a bottom right camera 16, and a bottom left camera 18. In connection with this example, there will next be described various aspects of the invention relating to the unique user experience provided by the V3D invention.
The V3D system of the invention enables immersive communication between people (and in various embodiments, between sites and places). In exemplary practices of the invention, each person can look "through" their screen, and see the other place. Eye contact is greatly improved. Perspective and scale are matched to the viewer's natural view. Device shaking is inherently eliminated.
Embodiments of the V3D system can be implemented in mobile configurations as well as traditional stationary devices.
FIGS. 2A-B, 3A-B, and 5A-B are images illustrating an aspect of the invention, in which the V3D system is used in conjunction with a smartphone 20, or like device.
Smartphone 20 includes a display 22, on which is displayed an image of a face 24.
The image may be, for example, part of a video/telephone conversation, in which a video image and sound conversation is being conducted with someone in a remote location, who is looking into the camera of their own smartphone.
FIGS. 2A and 2B illustrate a feature of the V3D system for improving eye contact.
FIG. 2A shows the face image prior to correction. It will be seen that the woman appears to be looking down, so that there can be no eye contact with the other user or participant.
FIG. 2B shows the face image after correction. It will be seen that in the corrected image, the woman appears to be making eye contact with the smartphone user.
FIGS. 3A-3B are a pair of diagrams illustrating the V3D system's "move left" (FIG. 3A) and "move right" (FIG. 3B) corrections.
FIGS. 4A-4B are a pair of diagrams of the light pathways 26a, 26b in the scene shown respectively on display 22 in FIGS. 3A-3B (shown from above, with the background at the top) leading from face 24 and surrounding objects to viewpoints 28a, 28b through the "window" defined by display 22.
FIGS. 5A-5B are a pair of diagrams illustrating the V3D system's "move in" (FIG. 5A) and "move out" (FIG. 5B) corrections.
FIGS. 6A-6B are a pair of diagrams of the light pathways 26c, 26d in the scene shown respectively on display 22 in FIGS. 3A-3B (shown from above, with the background at the top) leading from face 24 and surrounding objects to viewpoints 28c, 28d through the "window" defined by display 22.
Another embodiment of the invention utilizes the invention's ability to synthesize a virtual camera view of the user to aid in solving the problem of "where to look" when taking a self-portrait on a mobile device. This aspect of the invention operates by image-capturing the user per the overall V3D method of the invention described herein, tracking the position and orientation of the user's face, eyes or head, and by using a display, presenting an image of the user back to themselves with a synthesized virtual camera viewpoint as if the user were looking in a mirror.
Another embodiment of the invention makes it easier to compose a photograph using a rear-facing camera on a mobile device. It works like the overall V3D method of the invention described herein, except that the scene is captured through the rear-facing camera(s) and then, using the user's head location, a view is constructed such that the scale and perspective of the image matches the view of the user, such that the device display frame becomes like a picture frame. This results in a user experience where the photographer does not have to manipulate zoom controls or perform cropping, since they can simply frame the subject as they like within the frame of the display, and take the photo.
Another embodiment of the invention enables the creation of cylindrical or spherical panoramic photographs, by processing a series of photographs taken with a device using the camera(s) running the V3D system of the invention.
The user can then enjoy viewing the panoramic view thus created, with an immersive sense of depth.
The panorama can either be viewed on a 2D display with head tracking, a multi-view display or a binocular virtual reality (VR) headset with a unique perspective shown for each eye. If the binocular VR headset has a facility to track head location, the V3D system can re-project the view accurately.
FIG. 7 is a flow diagram illustrating the overall V3D digital processing pipeline 70, which includes the following aspects:
Image Capture: One or more images of a scene, which may include a human user, are collected instantaneously or over time via one or more cameras and fed into the system. Wide-angle lenses are generally preferred due to the ability to get greater stereo overlap between images, although this depends on the application and can in principle work with any focal length.
Image Rectification: In order to compensate for optical lens distortion from each camera and relative misalignment between the cameras in the multi-view system, image processing is performed to apply an inverse transform to eliminate distortion, and an affine transform to correct misalignment between the cameras. In order to perform efficiently and in real time, this process can be performed using a custom imaging pipeline or implemented using the shading hardware present in many conventional graphical processing units (GPUs) today, including GPU hardware present in devices such as iPhones and other commercially available smartphones. Additional detail and other variations of these operations will be discussed in greater detail herein.
Feature Correspondence: With the exception of using time-of-flight type sensors in the Image Capture phase that provide depth information directly, this process is used in order to extract parallax information present in the stereo images from the camera views. This process involves detecting common features between multi-view images and measuring their relative distance in image space to produce a disparity measurement. This disparity measurement can either be used directly or converted to actual depth based on knowledge of the camera field-of-view, relative positioning, sensor size and image resolution. Additional detail and other variations of these operations will be discussed in greater detail herein.
Reconstruction: Using the previously established representation, whether stored locally on the device or received over a network, a series of synthetic views into the originally captured scene can be generated. For example, in a video chat the physical image inputs may have come from cameras surrounding the head of the user in which no one view has a direct eye contact gaze vector to the user. Using reconstruction, a synthetic camera view, placed potentially within the bounds of the device display and enabling the visual appearance of eye contact, can be produced.
Head Tracking: Using the image capture data as an input, many different methods exist to establish an estimate of the viewer's head or eye location. This information can be used to drive the reconstruction and generate a synthetic view which looks valid from the user's established head location. Additional detail and various forms of these operations will be discussed in greater detail herein.
Display: Several types of display can be used with the V3D pipeline in different ways. The currently employed method involves a conventional 2D display combined with head tracking to update the display projection in real time so as to give the visual impression of being three-dimensional (3D) or a look into a 3D environment. However, binocular stereo displays (such as the commercially available Oculus Rift) can be employed, or still further, a lenticular type display can be employed, to allow auto-stereoscopic viewing.
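By way of illustration only, the conversion of a disparity measurement to actual depth noted in the Feature Correspondence aspect above might be sketched as follows. This is a minimal sketch under stated assumptions: it assumes an idealized, rectified pinhole stereo pair with a known baseline, derives the focal length in pixels from the horizontal field-of-view and image width, and uses the standard relation depth = focal length x baseline / disparity. The function names and parameter values are hypothetical.

```python
import math


def focal_length_pixels(image_width_px, horizontal_fov_deg):
    """Focal length in pixels from image width and horizontal field-of-view."""
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov)


def disparity_to_depth(disparity_px, baseline_m, focal_px):
    """Depth in metres for a rectified stereo pair.

    disparity_px: horizontal offset of the same feature between the views.
    baseline_m: distance between the two camera centres (relative positioning).
    Returns infinity for zero or negative disparity (feature at infinity
    or an invalid match).
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


# Usage: a 640-pixel-wide image with a 90 degree field-of-view gives a focal
# length of about 320 pixels; a 32-pixel disparity with a 0.1 m baseline
# then corresponds to a depth of about 1 m.
f_px = focal_length_pixels(640, 90.0)
depth_m = disparity_to_depth(32.0, 0.1, f_px)
```

Because depth is inversely proportional to disparity, small disparity errors on distant features translate into large depth errors, which is one reason the disparity value may be preferable to use directly in some stages of a pipeline.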
FIGS. 7 and 8 can also be used to enable the Facial Signature aspects of the invention, to enable a "signature" of a user's or subject's face, or face and head, to be generated from the Feature Correspondence phase for purposes such as user identification, authentication or matching.
FIG. 8 is a diagram of an exemplary V3D pipeline 80 configured in accordance with the invention, for immersive communication with eye contact.
The depicted pipeline is full-duplex, meaning that it allows simultaneous two-way communication in both directions.
Pipeline 80 comprises a pair of communication devices 81A-B (for example, commercially available smartphones such as iPhones) that are linked to each other through a network 82.
Each communication device includes a decoder end 83A-B for receiving and decoding communications from the other device and an encoder end 84A-B for encoding and sending communications to the other device 81A-B.
The decoder end 83A-B includes the following components:
The View Reconstruction module 833A-B receives data 835A-B from a Head Tracking Module 836A-B, which provides x-, y-, and z-coordinate data, with respect to the user's head, that is generated by cameras 841A-B.
The encoder end 84A-B comprises a multi-camera array that includes cameras 8 1 A-B, cameras 84 A-B, and additional camera(s) 842A-B. (As noted herein, it is possible to practice various aspects of the invention using only two cameras.)
The camera array provides data in the form of color camera streams 843A-B that are fed into a Color Image Redundancy Elimination module 844A-B and an Encode module.
The output of the camera array is also fed into a Passive Feature Disparity Estimation module 845A-B that provides disparity estimation data to the Color Image Redundancy Elimination module 846A-B and the Encode module 847A-B.
The encoded output of the device is then transmitted over network 82 to the Receive module 831A-B in the second device 81A-B.
The V3D system requires an input of images in order to capture the user and the world around the user.
The V3D system can be configured to operate with a wide range of input imaging devices.
Some devices, such as normal color cameras, are inherently passive and thus require extensive image processing to extract depth information, whereas non-passive systems can get depth directly, although they have the disadvantages of requiring reflected IR to work, and thus do not perform well in strongly naturally lit environments or large spaces.
Those skilled in the art will understand that a wide range of color cameras and other passive imaging devices, as well as non-passive image capture devices, are commercially available from a variety of manufacturers.
This descriptor is intended to cover the use of visible light or infrared specific cameras coupled with an active infrared emitter that beams one of many potential patterns onto the surfaces of objects, to aid in computing distance.
IR-Structured Light devices are known in the art.
This descriptor covers the use of time-of-flight cameras that work by emitting a pulse of light, and then measuring the time taken for reflected light to reach each of the camera's sensor elements. This is a more direct method of measuring depth, but has currently not reached the cost and resolution levels useful for significant consumer adoption. Using this type of sensor, in some practices of the invention the feature correspondence operation noted above could be omitted, since accurate depth information is already provided directly from the sensor.
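By way of illustration only, the depth computation underlying a pulsed time-of-flight measurement as described above might be sketched as follows. This is a minimal sketch of the general physical relation, not of any particular sensor's interface: the emitted pulse travels to the surface and back, so the distance is half the round-trip time multiplied by the speed of light. The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_depth_m(round_trip_time_s):
    """Distance to the reflecting surface from a measured round-trip time.

    The light covers the camera-to-surface distance twice, hence the
    division by two.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# Usage: a 10 nanosecond round trip corresponds to roughly 1.5 m.
depth = tof_depth_m(10e-9)
```

The example also indicates why such sensors are demanding to build: resolving depth differences of a few millimetres requires timing resolution on the order of tens of picoseconds per sensor element.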
The V3D system of the invention can be configured to operate with multiple cameras positioned in a fixed relative position as part of a device. It is also possible to use a single camera, by taking images over time and with accurate tracking, so that the relative position of the camera between frames can be estimated with sufficient accuracy. With sufficiently accurate positional data, feature correspondence algorithms such as those described herein could continue to be used.
A practice of the V3D invention relates to the positioning of the cameras within the multi-camera configuration, to significantly increase the number of valid feature correspondences between images captured in real world settings. This approach is based on three observations:
Many features in man-made indoor or urban environments consist of edges aligned in the three orthogonal axes (x, y, z).
Feature correspondence algorithms typically perform their search along horizontal or vertical epipolar lines in image space.
FIGS. 9, 10, and 11 show three exemplary sensor configurations 90, 100, 110.
FIG. 9 shows a handheld device 90 comprising a display screen 91 surrounded by a bezel 92.
Sensors 93, 94, and 95 are located at the corners of bezel 92, and define a pair of perpendicular axes: a first axis 96 between sensors 93 and 94, and a second axis 97 between cameras 94 and 95.
FIG. 10 shows a handheld device 100 comprising display 101, bezel 102, and sensors 103, 104, 105.
Each of sensors 103, 104, 105 is rotated by an angle θ relative to bezel 102.
The position of the sensors 103, 104, and 105 on bezel 102 has been configured so that the three sensors define a pair of perpendicular axes 106, 107.
FIG. 11 shows a handheld device 110 comprising display 111, bezel 112, and sensors 113, 114, 115.
The sensors 113, 114, 115 are not rotated.
The sensors 113, 114, 115 are positioned to define perpendicular axes 116, 117 that are angled with respect to bezel 112.
The data from sensors 113, 114, 115 are then rotated in software such that the correspondence continues to be performed along the epipolar lines.
V3D uses 3 sensors to enable vertical and horizontal cross correspondence.
The methods and practices described above are also applicable in a 2-camera stereo system.
FIGS. 12 and 13 next highlight advantages of a "rotated configuration" in accordance with the invention. In particular, FIG. 12A shows a "non-rotated" device configuration 120, with sensors 121, 122, 123 located in three corners, similar to configuration 90 shown in FIG. 9.
FIGS. 12A-12D are referred to collectively as "FIG. 12".
FIGS. 12A-12D show the respective scene image data collected at sensors 121, 122, 123.
Sensors 121 and 122 define a horizontal axis between them, and generate a pair of images with horizontally displaced viewpoints. For certain features, e.g., features H1, H2, there is a strong correspondence (i.e., the horizontally-displaced scene data provides a high level of certainty with respect to the correspondence of these features).
For other features, the correspondence is weak, as shown in FIG. 12 (i.e., the horizontally-displaced scene data provides a low level of certainty with respect to the correspondence of these features).
Sensors 122 and 123 define a vertical axis that is perpendicular to the axis defined by sensors 121 and 122. Again, for certain features, e.g., feature V1 in FIG. 12, there is strong correspondence. For other features, e.g., feature V2 in FIG. 12, the correspondence is weak.
FIG. 1 A shows a device configuration 130, similar to configuration 1.00 show in FIG. .10. with, sensors 131 , i 32, 133 positioned and rotated to define an angled horizontal axi and an angled vertical axis.
As shown in FIGS. 13B, 13C, and 13D, the use of an angled sensor configuration eliminates the weakly corresponding features shown in FIGS. 12B, 12C, and 12D.
As illustrated in FIGS. 12 and 13, a rotated configuration of sensors in accordance with an exemplary practice of the invention enables strong correspondence for certain scene features where the non-rotated configuration did not.
during the process of calculating feature correspondence, a feature is selected in one image and the other image is then scanned for a corresponding feature. During this process, several possible matches are often found, and various methods are used to establish which match (if any) has the highest likelihood of being the correct one.
the correspondence errors in the excessively dark or light areas of the image can cause large-scale visible errors in the image by causing the computing of radically incorrect disparity or depth estimates.
another practice of the invention involves dynamically adjusting the exposure of the multi-view camera system on a frame-by-frame basis in order to improve the disparity estimation in areas outside the exposed region viewed by the user.
exposures at darker and lighter settings surrounding the visibility-optimal exposure would be taken, have their disparity calculated, and then be integrated into the overall pixel histograms, which are retained and converged over time.
the dark and light images could be, but are not required to be, presented to the user and would serve only to improve the disparity estimation.
Another aspect of this approach is to analyze the variance of the disparity histograms on "dark" pixels, "mid-range" pixels and "light" pixels, and use this to drive the exposure setting of the cameras, thus forming a closed loop system between the quality of the disparity estimate and the set of exposures which are requested from the input multi-view camera system.
if the cameras are viewing a purely indoor environment, such as an interior room, with limited dynamic range due to indirect lighting, only one exposure may be needed.
An exemplary practice of the V3D system executes image rectification in real-time using the GPU hardware of the device on which it is operating, such as a conventional smartphone, to facilitate and improve an overall solution.
a search must be performed between two cameras arranged in a stereo configuration in order to detect the relative movement of features in the image due to parallax. This relative movement is measured in pixels and is referred to as "the disparity".
FIG. 14 shows an exemplary pair of unrectified and distorted (URD) source camera images 140A and 140B for left and right stereo.
the image pair includes a matched feature, i.e., the subject's right eye 141A, 141B.
the matching feature has largely been shifted horizontally, but there is also a vertical shift because of slight misalignment of the cameras and because of a polynomial term resulting from lens distortion.
the matching process can be optimized by measuring the lens distortion polynomial terms, and by inferring the affine transform that must be applied to the images so that they are rectified to appear perfectly horizontally aligned and co-planar. When this is done, what would otherwise be a freeform 2D search for a feature match is reduced to a search along the same horizontal row of the source image.
this is done in one step, in which the lens distortion and then affine transform
URD (Unrectified, Distorted) space: This is the image space in which the source camera images are captured. There is both polynomial distortion due to the lens shape and an affine transform that makes the image not perfectly co-planar and axis-aligned with the other stereo image. The number of URD images in the system is equal to the number of cameras in the system.
URUD (Unrectified, Undistorted) space: This is a space in which the polynomial lens distortion is removed from the image but the images remain unrectified.
the number of URUD images in the system is equal to the number of URD images and, therefore, cameras in the system.
RUD (Rectified, Undistorted) space: This is a space in which both the polynomial lens distortion is removed from the image and an affine transform is applied to make the image perfectly co-planar and axis-aligned with the other stereo image on the respective axis.
RUD images always exist in pairs. As such, for example, in a 3-camera system where the cameras are arranged in a substantially L-shaped configuration (having two axes intersecting at a selected point), there would be two stereo axes, and thus
FIG. 15 is a flow diagram 150 providing various examples of possible transforms in a 4-camera V3D system. Note that there are 4 stereo axes. Diagonal axes (not shown) would also be possible.
the typical transform when sampling the source camera images in a stereo correspondence system is to transform from RUD space (the desired space for feature correspondence on a stereo axis) to URD space (the source camera images).
RUD space the desired space for feature correspondence on a stereo axis
URD space the source camera images
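The RUD-to-URD sampling transform described above can be sketched as two chained steps: undo the rectifying affine transform (RUD to URUD), then re-apply the lens distortion (URUD to URD). This is a minimal illustration only. The function name `rud_to_urd`, the 2x3 affine layout, and the two-coefficient radial polynomial (`k1`, `k2`) are assumptions made for the example, not the calibration model of the described system.

```python
def rud_to_urd(x, y, affine, center, k1, k2=0.0):
    """Map a pixel coordinate in RUD space back to the URD source image.

    affine: hypothetical 2x3 matrix ((a, b, tx), (c, d, ty)) taking RUD
            coordinates to URUD coordinates (i.e. undoing rectification).
    center, k1, k2: assumed radial polynomial distortion model; a real
            calibration may use a different polynomial.
    """
    # Step 1: RUD -> URUD (remove the rectifying affine transform).
    (a, b, tx), (c, d, ty) = affine
    ux = a * x + b * y + tx
    uy = c * x + d * y + ty

    # Step 2: URUD -> URD (re-apply the polynomial lens distortion).
    cx, cy = center
    dx, dy = ux - cx, uy - cy
    r2 = dx * dx + dy * dy
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return cx + dx * scale, cy + dy * scale
```

Because both steps are evaluated per sampled coordinate, they can be composed into a single lookup when fetching source pixels, rather than producing intermediate images.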
an exemplary practice of the invention makes substantial use of the URUD image space to connect the stereo axes disparity values together.
FIG. 16 sets forth a flow diagram 160
FIGS. 17A-C are a series of images that illustrate the appearance and purpose of the various transforms on a single camera image.
This practice of the invention works by extending the feature correspondence algorithm to include one or more additional axes of correspondence and integrating the results to improve the quality of the solution.
FIGS. 18A-D illustrate an example of this approach.
FIG. 18A is a diagram of sensor configuration 180 having 3 cameras 181, 182, 183 in a substantially L-shaped configuration such that a stereo pair exists on both the horizontal axis 185 and vertical axis 186, with one camera in common between the axes, similar to the configuration 90 shown in FIG. 9.
Provided the overall system contains a suitable representation to integrate the multiple disparity solutions (one such representation being the "Disparity Histograms" practice of the invention discussed herein), this configuration will allow for uncertain correspondences in one stereo pair to be either corroborated or discarded through the additional information found by performing correspondence on the other axis.
certain features which have no correspondence on one axis may find a
FIGS. 18B, 18C, and 18D are depictions of three simultaneous images received respectively by sensors 181, 182, 183.
the three-image set is illustrative of all the points mentioned above.
Feature (A), i.e., the subject's nose, is found to correspond both on the horizontal stereo pair
correspondence helps eliminate correspondence errors by improving the signal-to-noise ratio, since the likelihood of the same erroneous correspondence being found in both axes is low.
Feature (B) i.e., the spool of twine
Feature (B) is found to correspond only on the horizontal stereo pair. Had the system only included a vertical pair, this feature would not have had a depth estimate because it is entirely out of view on the upper image.
Feature (C) i.e., the cushion on the couch, is only possible to correspond on the vertical axis. Had the system only included a horizontal stereo pair, the cushion would have been entirely occluded in the left image, meaning no valid disparity estimate could have been established.
the stereo pair on a particular axis will have undergone a calibration process such that the epipolar lines are aligned to the rows or columns of the images.
Each stereo axis will have its own unique camera alignment properties and hence the coordinate systems of the features will be incompatible. In order to integrate disparity information on pixels between multiple axes, the pixels containing the disparity solutions will need to undergo coordinate transformation to a unified coordinate system.
This aspect of the invention involves retaining a representation of disparity in the form of the error function or, as described elsewhere herein, the disparity histogram, and continuing to integrate disparity solutions for each frame in time to converge on a better solution through additional sampling.
This aspect of the invention is a variation of the correspondence refinement over time aspect. In cases where a given feature is detected but for which no correspondence can be found in another camera, if there was a prior solution for that pixel from a previous frame, this can be used instead.
Histogram-Based Disparity Representation Method
This aspect of the invention provides a representation to allow multiple disparity measuring techniques to be combined to produce a higher quality estimate of image disparity, potentially even over time. It also permits a more efficient method of estimating disparity, taking into account more global context in the images, without the significant cost of large per-pixel kernels and image differencing.
disparity estimation methods for a given pixel in an image in the stereo pair involve sliding a region of pixels (known as a kernel) surrounding the pixel in question from one image over the other in the stereo pair, computing the difference for each pixel in the kernel, and reducing this to a scalar value for each disparity being tested.
a region of pixels known as a kernel
FIG. 19 is a graph 190 of cumulative error for a 5x5 block of pixels for disparity values between 0 and 128 pixels. In this example, it can be seen that there is a single global minimum that is likely to be the best solution.
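The kernel-sliding error computation described above can be sketched as a sum-of-squared-differences (SSD) search along a single row of a rectified pair. This is an illustrative sketch only: the function names, the list-of-rows image layout, and the sign convention (a feature at `x` in the left image appears at `x - d` in the right image) are assumptions, and the caller is expected to keep the kernel inside both images.

```python
def ssd_error_curve(left, right, x, y, radius=2, max_disparity=16):
    """Cumulative squared error of a (2*radius+1)^2 kernel per tested disparity.

    left, right: rectified grayscale images as lists of rows (assumed layout).
    The kernel around (x, y) in `left` is compared with the kernel around
    (x - d, y) in `right`, so the search runs only along the same row.
    """
    errors = []
    for d in range(max_disparity + 1):
        total = 0.0
        for ky in range(-radius, radius + 1):
            for kx in range(-radius, radius + 1):
                diff = left[y + ky][x + kx] - right[y + ky][x + kx - d]
                total += diff * diff
        errors.append(total)
    return errors


def best_disparity(errors):
    """Index of the global minimum of the error curve."""
    return min(range(len(errors)), key=errors.__getitem__)
```

With `radius=2` this is the 5x5 block case; plotting `errors` against the disparity index gives a curve of the kind the text attributes to FIG. 19.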
FIGS. 20A-B and 21A-B are two horizontal stereo images. FIGS. 21A and 21B, which correspond to FIGS. 20A and 20B, show a selected kernel of pixels around the solution point for which we are trying to compute the disparity. It can be seen that for the kernel at its current size, the cumulative error function will have two minima, one representing the features that have a small disparity since they are in the image background, and one representing those on the wall, which are in the foreground and will have a larger disparity. In the ideal situation, the minima would flip from the background to the foreground disparity value as close to the edge of the wall as possible. In practice, due to the high intensity of the wall pixels, many of the background pixels snap to the disparity of the foreground, resulting in a serious quality issue forming a border near the wall.
Lack of Meaningful Units
the units of measure of "error", i.e., the Y-axis on the example graph, are unscaled and may not be compatible between multiple cameras, each with its own color and luminance response. This introduces difficulty in applying statistical methods or combining error estimates produced through other methods. For example, computing the error function from a different stereo axis would be incompatible in scale, and thus the terms could not be easily integrated to produce a better error function.
One practice of the disparity histogram solution method of the invention works by maintaining a histogram showing the relative likelihood of a particular disparity being valid for a given pixel.
the disparity histogram behaves as a probability density function (PDF) of disparity for a given pixel, higher values indicating a higher likelihood that the disparity range is the "truth".
PDF probability density function
FIG. 22 shows an example of a typical disparity histogram for a pixel.
the x-axis indicates a particular disparity range and the y-axis is the number of pixels in the kernel surrounding the central pixel that are "voting" for that given disparity range.
FIGS. 23 and 24 show a pair of images and associated histograms.
the votes can be generated by using a relatively fast and low-quality estimate of disparity produced using small kernels and standard SSD type methods.
the SSD method is used to produce a "fast dense disparity map" (FDDE), wherein each pixel has a selected disparity that is the lowest error. Then, the algorithm goes through each pixel, accumulating into the histogram a tally of the number of votes for a given disparity in a larger kernel surrounding the pixel.
FDDE fast dense disparity map
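The voting step described above can be sketched as a simple tally over a neighborhood of the FDDE. This is a minimal sketch under stated assumptions: `fdde` is taken to be a list of rows of integer disparities, one histogram bin is used per integer disparity, and the function name is hypothetical.

```python
def disparity_histogram(fdde, x, y, radius, num_bins):
    """Tally votes for each disparity in a kernel around pixel (x, y).

    fdde: per-pixel winning disparities (list of rows of ints), for example
          from a fast small-kernel SSD pass. Neighbors that fall outside the
          image, or whose disparity is outside the bin range, cast no vote.
    """
    hist = [0] * num_bins
    height, width = len(fdde), len(fdde[0])
    for ny in range(max(0, y - radius), min(height, y + radius + 1)):
        for nx in range(max(0, x - radius), min(width, x + radius + 1)):
            d = fdde[ny][nx]
            if 0 <= d < num_bins:
                hist[d] += 1
    return hist
```

Each vote is a single increment, so the cost of accumulation depends on the voting kernel size and not on how many disparities were tested to produce the FDDE.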
With a given disparity histogram, many forms of analysis can be performed to establish the most likely disparity for the pixel, confidence in the solution validity, and even identify cases where there are multiple highly likely solutions. For example, if there is a single dominant mode in the histogram, the x coordinate of that peak denotes the most likely disparity solution.
FIG. 25 shows an example of a bi-modal disparity histogram with 2 equally probable disparity possibilities.
FIG. 26 is a diagram of an example showing the disparity histogram and associated cumulative distribution function (CDF). The interquartile range is narrow, indicating high confidence.
CDF cumulative distribution function
FIG. 27 is a contrasting example showing a wide interquartile range in the CDF and thus a low confidence in any peak within that range.
the width of the interquartile range can be established. This range can then be used to establish a confidence level in the solution.
a narrow interquartile range indicates that the vast majority of the samples agree with the solution, whereas a wide interquartile range (as in FIG. 27) indicates that the solution confidence is low because many other disparity values could be the truth.
a count of the number of statistically significant modes in the histogram can be used to indicate "modality." For example, if there are two strong modes in the histogram (as in FIG. 25), it is highly likely that the point in question is right on the edge of a feature that demarks a background versus foreground transition in depth. This can be used to control the reconstruction later in the pipeline to control stretch versus slide (discussed in greater detail elsewhere herein).
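The three measurements discussed above (peak, interquartile range, and modality) can be sketched together as follows. The source does not define "statistically significant", so the `mode_fraction` threshold below is purely an assumption for illustration, as are the function name and the returned dictionary layout.

```python
def analyze_histogram(hist, mode_fraction=0.5):
    """Return the peak bin, interquartile range (in bins), and mode count.

    mode_fraction: hypothetical significance test; a local maximum counts as
    a mode if it holds at least this fraction of the peak's votes.
    """
    total = sum(hist)
    if total == 0:
        return None  # no votes, so no solution for this pixel
    peak = max(range(len(hist)), key=hist.__getitem__)

    def quantile_bin(q):
        # Walk the cumulative distribution until it reaches quantile q.
        running = 0
        for i, votes in enumerate(hist):
            running += votes
            if running >= q * total:
                return i
        return len(hist) - 1

    iqr = quantile_bin(0.75) - quantile_bin(0.25)

    modes = 0
    for i, votes in enumerate(hist):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i + 1 < len(hist) else 0
        is_local_max = votes > 0 and votes >= left and votes > right
        if is_local_max and votes >= mode_fraction * hist[peak]:
            modes += 1
    return {"peak": peak, "iqr": iqr, "modes": modes}
```

A result with `iqr` near zero and `modes == 1` corresponds to the high-confidence case, while `modes == 2` flags a likely foreground/background depth edge.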
the histogram is not biased by variation in image intensity at all, allowing for high quality disparity edges on depth discontinuities. In addition, this permits other methods of estimating disparity for the given pixel to be easily integrated into a combined histogram.
The computational cost of SSD is proportional to the square of its kernel size multiplied by the number of disparity values being tested for. Even though the small SSD kernel output is a noisy disparity solution, the subsequent voting, which is done by a larger kernel of the pixels to produce the histograms, filters out so much of the noise that it is, in practice, better than the SSD approach, even with very large kernels.
the histogram accumulation is only an addition function, need only be done once per pixel per frame, and does not increase in cost with additional disparity resolution.
Another useful practice of the invention involves testing only for a small set of disparity values with SSD, populating the histogram, and then using the histogram votes to drive further SSD testing within that range to improve disparity resolution over time.
each output pixel thread having a respective "private histogram" maintained in on-chip storage close to the computation units (e.g., GPUs).
This private histogram can be stored such that each pixel thread will be reading and writing to the histogram on a single dedicated bank of shared local memory on a modern programmable GPU.
multiple histogram bins can be packed into a single word of the shared local memory and accessed using bitwise operations.
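Packing several bins into one word can be sketched with shifts and masks. The 8-bit bin width, the 32-bit word size, the saturating behavior, and the function names are all assumptions chosen for illustration; the source does not state how many bits each bin uses.

```python
BITS_PER_BIN = 8                      # assumed bin width
BINS_PER_WORD = 32 // BITS_PER_BIN    # four bins per 32-bit word
BIN_MASK = (1 << BITS_PER_BIN) - 1


def add_vote(words, bin_index, votes=1):
    """Add votes to one packed bin, saturating at the bin's maximum value."""
    word, slot = divmod(bin_index, BINS_PER_WORD)
    shift = slot * BITS_PER_BIN
    current = (words[word] >> shift) & BIN_MASK
    updated = min(current + votes, BIN_MASK)
    cleared = words[word] & ~(BIN_MASK << shift) & 0xFFFFFFFF
    words[word] = cleared | (updated << shift)


def read_bin(words, bin_index):
    """Read the vote count stored in one packed bin."""
    word, slot = divmod(bin_index, BINS_PER_WORD)
    return (words[word] >> (slot * BITS_PER_BIN)) & BIN_MASK
```

Saturation keeps an overflowing bin from carrying into its neighbor, which matters when the voting kernel can contain more pixels than a bin can count.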
This practice of the invention is an extension of the disparity histogram aspect of the invention, and has proven to be a highly useful part of reducing error in the resulting disparity values, while still preserving important detail on depth discontinuities in the scene.
errors in the disparity values can come from many sources.
Multi-level disparity histograms reduce the contribution from several of these error sources, including:
the multi-level voting scheme applies that same concept, but across descending frequencies in the image space.
FIGS. 28A and 28B show an example of a horizontal stereo image pair
FIGS. 28C and 28D show, respectively, the resulting disparity data before and after application of the described multi-level histogram technique.
This aspect of the invention works by performing the image pattern matching (FDDE) at several successively low-pass filtered versions of the input stereo images.
level is used herein to define a level of detail in the image, where higher level numbers imply a lower level of detail.
the peak image frequencies at level[n] will be half that of level[n-1].
FIGS. 30A-E are a series of exemplary left and right multi-level input images.
Each level[n] is a downsampled version of level[n-1].
the downsampling kernel is a 2x2 kernel with equal weights of 0.25 for each pixel. The kernel remains centered at each successive level of downsampling.
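Building the successive levels can be sketched with a 2x2 box filter whose four equal weights sum to one (0.25 each). The function names and the list-of-rows image layout are assumptions for illustration, and odd trailing rows or columns are simply dropped in this sketch.

```python
def downsample_2x2(image):
    """Halve the resolution with a 2x2 box filter (four weights of 0.25)."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            0.25 * (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
                    + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1])
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_levels(image, num_levels):
    """levels[0] is the full-resolution image; levels[n] halves levels[n-1]."""
    levels = [image]
    for _ in range(num_levels - 1):
        levels.append(downsample_2x2(levels[-1]))
    return levels
```

Running the FDDE on every entry of `levels` yields the per-level votes that the multi-level histogram combines.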
the FDDE votes for every image level are included.
such as the white wooden beams on the cabinets shown in the background of the example of FIG. 30.
Level[0]: the full image resolution
several possible matches may be found by the FDDE image comparisons since each of the wooden beams looks rather similar to each other, given the limited kernel size used for the FDDE.
FIG. 31 depicts an example of an image pair and disparity histogram, illustrating an incorrect matching scenario and its associated disparity histogram (see the notation "Winning candidate: incorrect" in the histogram of FIG. 31).
FIG. 32 shows the same scenario, but with the support of 4 lower levels of FDDE votes in the histogram, resulting in a correct winning candidate (see the notation "Winning candidate: correct" in the histogram of FIG. 32).
the lower levels provide support for the true candidate at the higher levels.
a lower level i.e., a level characterized by reduced image resolution via low-pass filtering
the individual wooden beams shown in the image become less pronounced, and the overall form of the broader context of that image region begins to dominate the pattern matching.
FIG. 33 is a schematic diagram of an exemplary practice of the invention.
FIG. 33 depicts a processing pipeline showing the series of operations between the input camera images, through FDDE calculation and multi-level histogram voting, into a final disparity result.
multiple stereo axes e.g., 0 through n
With respect to multi-level disparity histogram representations in accordance with the invention, the following describes how the multi-level histogram is represented, and how to reliably integrate its results to locate the final most likely disparity solution.
FIG. 34 shows a logical representation of the multi-level histogram after votes have been placed at each level.
FIG. 35 shows a physical representation of the same multi-level histogram in numeric arrays in device memory, such as the digital memory units in a conventional smartphone architecture. In an exemplary practice of the invention, the multi-level histogram consists of a series of initially independent histograms at each level of detail. Each histogram bin in a given level represents the votes for a disparity found by the FDDE at that level. Since level[n] has a fraction of the resolution of level[n-1], each calculated disparity value represents a disparity uncertainty range that is wider by the inverse of that fraction. For example, in FIG. 34, each level is half the resolution of the one above it. As such, the disparity uncertainty range represented by each histogram bin is twice as wide as the level before it.
a significant detail to render the multi-level histogram integration correct involves applying a sub-pixel shift to the disparity values at each level during downsampling.
In FIG. 34, if we look at the votes in level[0], disparity bin 8, these represent votes for disparity values 8-9. At level[1], the disparity bins are twice as wide. As such, we want to ensure that the histograms remain centered under the level above. Level[1] shows that the same bin represents 7.5 through 9.5. This half-pixel offset is highly significant, because image error can cause the disparity to be rounded to the neighbor bin and then fail to receive support from the level below.
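The centering described above can be checked with a small helper that returns the disparity range covered by a bin, expressed in level[0] pixels. The bin indexing at each level and the extension of the pattern beyond level[1] are assumptions; the text gives only the level[0] and level[1] ranges.

```python
def bin_range(level, bin_index):
    """Disparity range (in level[0] pixels) covered by a bin at a given level.

    Assumes bins at level n are 2**n wide and share their center with the
    level[0] bins they sit under (the effect of the per-level half-pixel
    shift). Levels above 1 extrapolate the pattern given in the text.
    """
    width = 2 ** level
    center = width * bin_index + 0.5
    return center - width / 2, center + width / 2
```

For example, `bin_range(0, 8)` gives `(8.0, 9.0)` and `bin_range(1, 4)` gives `(7.5, 9.5)`, both centered on 8.5, matching the ranges quoted for FIG. 34.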
an exemplary practice of the invention applies a half pixel shift to only one of the images in the stereo pair at each level of downsampling. This can be done inline within the weights of the filter kernel used to do the downsampling between levels. While it is possible to omit the half pixel shift and use more complex weighting during multi-level histogram summation, it is very inefficient. Performing the half pixel shift during downsampling only involves modifying the filter weights and adding two extra taps, making it almost "free" from a computational standpoint.
FIG. 36 shows an example of per-level downsampling according to the invention, using a 2x2 box filter.
On the left is illustrated a method without a half pixel shift.
On the right of FIG. 36 is illustrated the modified filter with a half pixel shift, in accordance with an exemplary practice of the invention. Note that this half pixel shift should only be applied to one of the images in the stereo pair. This has the effect of disparity values remaining centered at each level in the multi-level histogram during voting, resulting in the configuration shown in FIG. 34.
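One way to fold a half pixel shift into the downsampling weights is sketched below. The exact weights of FIG. 36 are not reproduced in this text, so the values here are an assumption: shifting the horizontal box taps (0.5, 0.5) by half a source pixel with linear interpolation gives (0.25, 0.5, 0.25), which turns the 2x2 kernel (four taps) into a 3x2 kernel (six taps, i.e. two extra). The shift direction and border clamping are also assumptions.

```python
BOX_TAPS = ((0, 0.5), (1, 0.5))                    # (column offset, weight)
SHIFTED_TAPS = ((-1, 0.25), (0, 0.5), (1, 0.25))   # box moved half a pixel left


def downsample(image, taps):
    """Halve the resolution using the given horizontal taps and two rows."""
    height, width = len(image) // 2, len(image[0]) // 2
    last = len(image[0]) - 1
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            value = 0.0
            for dy in (0, 1):                      # vertical weights of 0.5 each
                for dx, w in taps:
                    sx = min(max(2 * x + dx, 0), last)   # clamp at the borders
                    value += 0.5 * w * image[2 * y + dy][sx]
            row.append(value)
        out.append(row)
    return out
```

Applying `downsample(left, BOX_TAPS)` to one image and `downsample(right, SHIFTED_TAPS)` to the other shifts only one image of the pair, as the text requires.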
FIG. 37 illustrates an exemplary practice of the invention, showing an example of the summation of the multi-level histogram to produce a combined histogram in which the peak can be found.
the histogram integration involves performing a recursive summation across all of the levels as shown in FIG. 37.
the peak disparity index and number of votes for that peak are needed and thus the combined histogram does not need to be actually stored in memory.
maintaining a summation stack can reduce summation operations and multi-level histogram memory access.
each level can be modified to control the amount of effect that the lower levels have in the overall voting.
the current value at level[n] gets added to two of the bins above it in level[n-1] with a weight of ½ each.
An exemplary practice of the invention, illustrated in FIGS. 39-40, builds on the disparity histograms and allows for a higher accuracy disparity estimate to be acquired without requiring any additional SSD steps to be performed, and for only a small amount of incremental math when selecting the optimal disparity from the histogram.
FIG. 38 is a disparity histogram for a typical pixel.
FIG. 39 is a histogram in a situation in which a sub-pixel disparity solution can be inferred from the disparity histogram. We can see that an equal number of votes exists in the 3rd and 4th bins. As such, we can say that the true disparity range lies between 3.5 and 4.5 with a center point of 4.0.
FIG. 40 is a histogram that reveals another case in which a sub-pixel disparity solution can be inferred. In this case, the 3rd bin is the peak with 10 votes, and its directly adjacent neighbor is at 5 votes. As such, we can state that the sub-pixel disparity is between these two and closer to the 3rd bin, ranging from 3.25 to 4.25, using the following equation:
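The equation itself is not reproduced in this text. One formula that is consistent with both worked examples (equal votes giving 3.5 to 4.5, and 10 versus 5 votes giving 3.25 to 4.25) shifts the peak bin toward its neighbor by `neighbor_votes / (2 * peak_votes)`. The sketch below uses that inferred formula; it is a reconstruction from the two examples, not necessarily the original equation, and the function and parameter names are hypothetical.

```python
def subpixel_range(peak_bin, peak_votes, neighbor_votes, neighbor_side=1):
    """Sub-pixel disparity range inferred from the peak and one neighbor.

    peak_bin: index of the peak bin, which covers [peak_bin, peak_bin + 1).
    neighbor_side: +1 if the strongest adjacent bin is the next higher bin,
                   -1 if it is the next lower bin (an assumed convention).
    The shift formula is inferred from the two worked examples in the text.
    """
    shift = neighbor_side * neighbor_votes / (2.0 * peak_votes)
    low = peak_bin + shift
    return low, low + 1.0
```

With this formula the shift never exceeds half a bin, because the neighbor cannot hold more votes than the peak.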
Another practice of the invention provides a further method of solving the problem where larger kernels in the SSD method tend to favor larger intensity differences within the overall kernel, rather than for the pixel being solved.
This method of the invention involves applying a higher weight to the center pixel with a decreasing weight proportional to the distance of the given kernel sample from the center. By doing this, the error function minimum will tend to be found closer to the valid solution for the pixel being solved.
Injective Constraint
Yet another aspect of the invention involves the use of an "injective constraint", as illustrated in FIGS. 41-45.
the goal is to produce the most correct results possible.
incorrect disparity values will get computed, especially if only using the FDDE data, produced via image comparison using SSD, SAD or one of the many image comparison error measurement techniques.
FIG. 41 shows an exemplary pair of stereoscopic images and the disparity data resulting from the FDDE using SAD with a 3x3 kernel.
Warmer colors represent closer objects.
a close look at FIG. 41 reveals occasional values which look obviously incorrect.
Some of the factors causing these errors include camera sensor noise, image color response differences between sensors and lack of visibility of common features between cameras.
one way of reducing these errors is by applying "constraints" to the solution which reduce the set of possible solutions to a more realistic set of possibilities.
solving the disparity across multiple stereo axes is a form of constraint by using the solution on one axis to reinforce or contradict that of another axis.
the disparity histograms are another form of constraint by limiting the set of possible solutions by filtering out spurious results in 2D space.
Multi-level histograms constrain the solution by ensuring agreement of the solution across multiple frequencies in the image.
the injective constraint aspect of the invention uses geometric rules about how features must correspond between images in the stereo pair to eliminate false disparity solutions. It maps these geometric rules on the concept of an injective function in set theory.
the domain and co-domain are pixels from each of the stereo cameras on an axis.
the references between the sets are the disparity values. For example, if every pixel in the domain (image A) had a disparity value of "0", then this means that a perfect bijection exists between the two images, since every pixel in the domain maps to the same pixel in the co-domain.
FIG. 42 shows an example of a bijection where every pixel in the domain maps to a unique pixel in the co-domain. In this case, the image features are all at infinite distance and thus do not appear to shift between the camera images.
FIG. 43 shows another example of a bijection. In this case all the image features are closer to the cameras, but are all at the same depth and hence shift together.
FIG. 44 shows an example of an image with a foreground and background. Note that the foreground moves substantially between images. This causes new features to be revealed in the co-domain that will have no valid reference in the domain. This is still an injective function, but not a bijection.
Best match wins: The actual image matching error or histogram vote count is compared between the two possible candidate elements in the domain against the contested element in the co-domain. The one with the best match wins.
Smallest disparity wins: During image reconstruction, errors caused by too small a disparity are typically less noticeable than errors with too high a disparity. As such, if there is a contest for a given co-domain element, select the one with the smallest disparity and invalidate the others.
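The two contest rules above can be sketched for a single image row as follows. This is an illustrative sketch: the function name, the use of `None` for an invalidated pixel, the sign convention (domain pixel `x` references co-domain pixel `x - d`), and the "higher score is better" convention (for example a histogram vote count) are all assumptions.

```python
def enforce_injective(disparities, scores=None):
    """Invalidate disparities so that no two domain pixels share a target.

    disparities: disparity per domain pixel on one row (None if invalid).
    scores: optional per-pixel match quality, higher is better. If given,
            "best match wins" is used; otherwise "smallest disparity wins".
    Returns a new list with the losing pixels set to None.
    """
    owner = {}                      # co-domain x -> domain x that holds it
    result = list(disparities)
    for x, d in enumerate(disparities):
        if d is None:
            continue
        target = x - d
        rival = owner.get(target)
        if rival is None:
            owner[target] = x       # uncontested so far
            continue
        if scores is not None:
            keep_new = scores[x] > scores[rival]       # best match wins
        else:
            keep_new = d < disparities[rival]          # smallest disparity wins
        if keep_new:
            owner[target] = x
            result[rival] = None
        else:
            result[x] = None
    return result
```

After this pass, every surviving disparity references a distinct co-domain pixel, so the mapping on that row is injective.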
An exemplary practice of the invention involves the use of a disparity value and a sample buffer index at 2D control points. This aspect works by defining a data structure representing a 2D coordinate in image space and containing a disparity value, which is treated as a "pixel velocity" in screen space with respect to a given movement of the view vector.
control points can contain a sample buffer index that indicates which of the camera streams to take the samples from. For example, a given feature may be visible in only one of the cameras, in which case we will want to change the source that the samples are taken from when reconstructing the final image.
This aspect of the invention is based on the observation that many of the samples in the multiple camera streams are of the same feature and are thus redundant. With a valid disparity estimate, it can be calculated that a feature is either redundant or is a unique feature from a specific camera, and
features/samples can be flagged with a reference count of how many of the views "reference" that feature.
Compression Method for Streaming with Video
a system in accordance with the invention can choose to only encode and transmit samples exactly one time. For example, if the system is capturing 4 camera streams to produce the disparity and control points and has produced reference counts, the system will be able to determine whether a pixel is repeated in all the camera views, or only visible in one. As such, the system need only transmit to the encoder the chunk of pixels from each camera that are actually unique. This allows for a bandwidth reduction in a video streaming session.
a system in accordance with the invention can establish an estimate of the viewer head or eye location and/or orientation. With this information and the disparity values acquired from feature correspondence or within the transmitted control point stream, the system can slide the pixels along the head movement vector at a rate that is proportional to the disparity. As such, the disparity forms the radius of a "sphere" of motion for a given feature.
This aspect allows a 3D reconstruction to be performed simply by warping a 2D image, provided the control points are positioned along important feature edges and have a sufficiently high quality disparity estimate.
no 3D geometry in the form of polygons or higher order surfaces is required.
a shortcut to estimate this behavior is to reconstruct the synthetic view based on the view origin and then crop the 2D image and scale it up to fill the view window before presentation, the minima and maxima of the crop box being defined as a function of the viewer head location with respect to the display and the display dimensions.
An exemplary practice of the V3D invention contains a hybrid 2D/3D head detection component that combines a fast 2D head detector with the 3D disparity data from the multi-view solver to obtain an accurate viewpoint position in 3D space relative to the camera system.
FIGS. 47A-B provide a flow diagram that illustrates the operation of the hybrid markerless head tracking system.
the system optionally converts to luminance and downsamples the image, and then passes it to a basic 2D facial feature detector.
the 2D feature detector uses the image to extract an estimate of the head and eye position as well as the face's rotation angle relative to the image plane. These extracted 2D feature positions are extremely noisy from frame to frame which, if taken alone as a 3D viewpoint, would not be sufficiently stable for the intended purposes of the invention. Accordingly, the 2D feature detection is used as a starting estimate of a head position.
the system uses this 2D feature estimate to extract 3D points from the disparity data that exists in the same coordinate system as the original 2D image.
the system first determines an average depth for the face by extracting 3D points via the disparity data for a small area located in the center of the face. This average depth is used to determine a reasonable valid depth range that would encompass the entire head.
the system uses the estimated center of the face, the face's rotation angle, and the depth range to determine a best-fit rectangle that includes the head. For both the horizontal and vertical axes, the system calculates multiple vectors that are perpendicular to the axis but spaced at different intervals. For each of these vectors, the system tests the 3D points starting from outside the head and working towards the inside, to the horizontal or vertical axis. When a 3D point is encountered that falls within the previously designated valid depth range, the system considers that a valid extent of the head rectangle.
the system can determine a best-fit rectangle for the head, from which the system then extracts all 3D points that lie within this best-fit rectangle and calculates a weighted average. If the number of valid 3D points extracted from this region passes a threshold in relation to the maximum number of possible 3D points in the region, then there is designated a valid 3D head position result.
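The depth-refinement steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the depth map is a plain 2D list (with `None` for pixels lacking disparity data), the face rotation angle is ignored, the average is unweighted, and the function name and the parameters `patch`, `margin`, `search` and `min_fill` are hypothetical.

```python
from statistics import mean

def estimate_head_position(depth, face_cx, face_cy, patch=1, margin=0.15,
                           search=4, min_fill=0.5):
    """Return (x, y, z) for the head, or None if too few valid 3D points."""
    h, w = len(depth), len(depth[0])

    # 1. Average depth over a small patch at the centre of the face.
    centre = [depth[y][x]
              for y in range(max(0, face_cy - patch), min(h, face_cy + patch + 1))
              for x in range(max(0, face_cx - patch), min(w, face_cx + patch + 1))
              if depth[y][x] is not None]
    if not centre:
        return None
    z0 = mean(centre)
    lo, hi = z0 - margin, z0 + margin  # valid depth range for the whole head

    def valid(x, y):
        d = depth[y][x]
        return d is not None and lo <= d <= hi

    # 2. Rectangle extents: the outermost in-range point over all scan lines,
    #    which is what scanning from outside towards the inside would find.
    x_lo, x_hi = max(0, face_cx - search), min(w - 1, face_cx + search)
    y_lo, y_hi = max(0, face_cy - search), min(h - 1, face_cy + search)
    xs = [x for x in range(x_lo, x_hi + 1)
          if any(valid(x, y) for y in range(y_lo, y_hi + 1))]
    ys = [y for y in range(y_lo, y_hi + 1)
          if any(valid(x, y) for x in range(x_lo, x_hi + 1))]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)

    # 3. Average the in-range points and apply the fill-ratio threshold.
    pts = [(x, y, depth[y][x]) for y in range(y0, y1 + 1)
           for x in range(x0, x1 + 1) if valid(x, y)]
    capacity = (x1 - x0 + 1) * (y1 - y0 + 1)
    if len(pts) / capacity < min_fill:
        return None
    n = len(pts)
    return (sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n)
```

Returning `None` when the fill ratio is too low mirrors the threshold test in the text, so a caller can fall back to the previous frame's position.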
FIG. 48 is a diagram depicting this technique for calculating the disparity extraction.
the system can interpolate from frame to frame based on the time delta that has passed since the previous frame.
This method of the invention works by taking one or more source images and a set of control points as described previously.
the control points denote “handles” on the image which we can then move around in 2D space and interpolate the pixels in between.
the system can therefore slide the control points around in 2D image space proportionally to their disparity value and create the appearance of an image taken from a different 3D perspective.
the following are details of how the interpolation can be accomplished in accordance with exemplary practices of the invention.
This implementation of 2D warping uses the line drawing hardware and texture filtering available on modern GPU hardware, such as in a conventional smartphone or other mobile device. It has the advantages of being easy to implement, fast to calculate, and avoiding the need to construct complex connectivity meshes between the control points in multiple dimensions. It works by first rotating the source images and control point coordinates such that the rows or columns of pixels are parallel to the vector between the original image center and the new view vector. For purposes of this explanation, assume the view vector is aligned to image scanlines. Next, the system iterates through each scanline and goes through all the control points for that scanline.
the system draws a line beginning and ending at each control point in 2D image space, but adds the disparity multiplied by the view vector magnitude to the x coordinate.
the system assigns a texture coordinate to the beginning and end points that is equal to their original 2D location in the source image.
the GPU will draw the line and will interpolate the texture coordinates linearly along the line. As such, image data between the control points will be stretched linearly. Provided control points are placed on edge features, the interpolation will not be visually obvious.
the result is a re-projected image, which is then rotated back by the inverse of the rotation originally applied to align the view vector with the scanlines.
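The per-scanline line drawing described above can be emulated on the CPU as a rough sketch. It assumes the view vector is already aligned with the scanline, so the rotation steps are omitted, and it stands in for GPU line rasterization and texture filtering with explicit linear interpolation. The function name and argument layout are hypothetical.

```python
import math

def warp_scanline(row, control_points, view_mag):
    """Re-project one scanline.

    row            -- list of at least two pixel intensities
    control_points -- [(x, disparity), ...] sorted by x
    view_mag       -- magnitude of the view vector along the scanline
    """
    out = [None] * len(row)
    # Each endpoint moves by disparity * view_mag but keeps its original x
    # as its texture coordinate.
    shifted = [(x + d * view_mag, x) for x, d in control_points]
    for (xa, ua), (xb, ub) in zip(shifted, shifted[1:]):
        if xb <= xa:
            continue  # folded or collapsed segment: skipped in this sketch
        first = max(0, math.ceil(xa))
        last = min(len(row) - 1, math.floor(xb))
        for x_out in range(first, last + 1):
            t = (x_out - xa) / (xb - xa)
            u = ua + t * (ub - ua)            # interpolated texture coordinate
            i = min(int(u), len(row) - 2)
            f = u - i
            out[x_out] = (1 - f) * row[i] + f * row[i + 1]  # linear filtering
    return out
```

With `view_mag` set to zero the control points do not move and the scanline is reproduced unchanged, which is a convenient sanity check. Pixels not covered by any segment stay `None`.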
This approach is related to the lines but works by linking control points not only along a scanline but also between scanlines. In certain cases, this may provide a higher quality interpolation than lines alone.
the determination of when it is appropriate to slide versus the default stretching behavior can be made by analyzing the disparity histogram and checking for multi-modal behavior. If two strong modes are present, this indicates the control point is on a boundary where it would be better to allow the foreground and background to move independently rather than interpolating depth between them.
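One way to test a disparity histogram for two strong modes is sketched below. The peak definition and the `min_fraction` threshold are assumptions chosen for illustration; the text does not specify how mode strength is measured.

```python
def is_multimodal(hist, min_fraction=0.25):
    """True if at least two local peaks each hold min_fraction of all votes."""
    total = sum(hist)
    if total == 0:
        return False
    peaks = [i for i, v in enumerate(hist)
             if v >= min_fraction * total
             and (i == 0 or v > hist[i - 1])
             and (i == len(hist) - 1 or v >= hist[i + 1])]
    return len(peaks) >= 2

def choose_interpolation(hist):
    """Pick 'slide' on a depth boundary, else the default 'stretch'."""
    return "slide" if is_multimodal(hist) else "stretch"
```

A control point whose histogram has votes split between a near and a far disparity would be treated as lying on a foreground/background boundary and allowed to slide.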
HMDs head-mounted stereo displays
the telecommunications devices can include known forms of cellphones, smartphones, and other known forms of mobile devices, tablet computers, desktop and laptop computers, and known forms of digital network components and server cloud network client
Computer software can encompass any set of computer-readable program instructions encoded on a non-transitory computer readable medium.
a computer readable medium can encompass any form of computer readable element, including, but not limited to, a computer hard disk, computer floppy disk, computer-readable flash drive, computer-readable RAM or ROM element or any other known means of encoding, storing or providing digital information, whether local to or remote from the cellphone, smartphone, tablet computer, PC, laptop, computer-driven television, or other digital processing device or system.
Various forms of computer readable elements and media are well known in the computing arts, and their selection is left to the implementer.
modules can be implemented using computer program modules and digital processing hardware elements, including memory units and other data storage units, and including commercially available processing units, memory units, computers, servers, smartphones and other computing and telecommunications devices.
modules include computer program instructions, objects, components, data structures, and the like that can be executed to perform selected tasks or achieve selected outcomes.
modules shown in the drawings and discussed in the description herein refer to computer-based or digital processor-based elements that can be implemented as software, hardware, firmware and/or other suitable components, taken separately or in combination, that provide the functions described herein, and which may be read from computer storage or memory, loaded into the memory of a digital processor or set of digital processors, connected via a bus.
data storage element can refer to any appropriate memory element usable for storing program instructions, machine readable files, databases, and other data structures.
the various digital processing, memory and storage elements described herein can be implemented to operate on a single computing device or system, such as a server or collection of servers, or they can be implemented and inter-operated on various devices across a network, whether in a server-client arrangement, server-cloud-client arrangement, or other configuration in which client devices can communicate with allocated resources, functions or applications programs, or with a server, via a communications network.
CM3-U3-13S2C-CS Three PointGrey Chameleon3 (CM3-U3-13S2C-CS) 1.3 Megapixel camera modules with 1/3" sensor size assembled on a polycarbonate plate with shutter synchronization circuit.
An Intel Core i7-4650U processor which includes on-chip the following:
FIGS. 49-54 depict system aspects of the invention, including digital processing devices and architectures in which the invention can be implemented.
FIG. 49 depicts a digital processing device, such as a commercially available smartphone, in which the invention can be implemented;
FIG. 50 shows a full-duplex, bi-directional practice of the invention between two users and their corresponding devices;
FIG. 51 shows the use of a system in accordance with the invention to enable a first user to view a remote scene;
FIG. 52 shows a one-to-many configuration in which multiple users (e.g., audience members) can view either simultaneously or asynchronously using a variety of different viewing elements in accordance with the invention;
FIG. 53 shows an embodiment of the invention in connection with generating an image data stream for the control system of an autonomous or self-driving vehicle; and
FIG. 54 shows the use of a head-mounted display (HMD) in connection with the invention, either in a pass-through mode to view an actual, external scene (shown on the right side of FIG. 54), or to view prerecorded image content.
HMD head-mounted display
the commercially available smartphone, tablet computer or other digital processing device 492 communicates with a conventional digital communications network 494 via a communications pathway 495 of known form (the collective combination of device 492, network 494 and communications pathway(s) 495 forming configuration 490), and the device 492 includes one or more digital processors 496, cameras 4910 and 4912, digital memory or storage element(s) 4914 containing, among other items, digital processor-readable and processor-executable computer program instructions (programs) 4916, and a display element 498.
the processor 496 can execute programs 4916 to carry out various operations, including operations in accordance with the present invention.
the full-duplex, bi-directional practice of the invention between two users and their corresponding devices includes first user and scene 503, second user and scene 505, smartphones, tablet computers or other digital processing devices 502, 504, network 506 and communications pathways 508, 5010.
the devices 502, 504 respectively include cameras 5012, 5014, 5022, 5024, displays 5016, 5026, processors 5018, 5028, and digital memory or storage elements 5020, 5030 (which may store processor-executable computer program instructions, and which may be separate from the processors).
the configuration 510 of FIG. 51 for enabling a first user 514 to view a remote scene 515 containing objects 5022 includes smartphone or other digital processing device 5038, which can contain cameras 5030, 5032, a display 5034, one or more processor(s) 5036 and storage 5038 (which can contain computer program instructions and which can be separate from processor 5036).
Configuration 510 also includes network 5024, communications pathways 5026, 5028, remote cameras 516, 518 with a view of the remote scene 515, processor(s) 5020, and digital memory or storage element(s) 5040 (which can contain computer program instructions, and which can be separate from processor 5020).
the one-to-many configuration 520 of FIG. 52 in which multiple users (e.g., audience members) using smartphones, tablet computers or other devices 526.1, 526.2, 526.3 can view a remote scene or remote first user 522, either simultaneously or asynchronously, in accordance with the invention, includes digital processing device 524, network 5212 and communications pathways 5214, 5216.1, 5216.2, 5216.3.
the smartphone or other digital processing device 524 used to capture images of the remote scene or first user 522, and the smartphones or other digital processing devices 526.1, 526.2, 526.3 used by respective viewers/audience members include respective cameras, digital processors, digital memory or storage element(s) (which may store computer program instructions executable by the respective processor, and which may be separate from the processor), and displays.
the embodiment or configuration 530 of the invention, illustrated in FIG. 53, for generating an image data stream for the control system 5312 of an autonomous or self-driving vehicle 532 can include camera(s) 5310 having a view of scene 534 containing objects 536, processor(s) 538 (which may include or have in communication therewith digital memory or storage elements for storing data and/or processor-executable computer program instructions) in communication with vehicle control system 5312.
vehicle control system 5312 may also include digital storage or memory element(s) 5314, which may include executable program instructions, and which may be separate from vehicle control system 5312.
HMD-related embodiment or configuration 540 of the invention can include the use of a head-mounted display (HMD) 542 in connection with the invention, either in a pass-through mode to view an actual, external scene 544 containing objects 545 (shown on the right side of FIG. 54), or to view prerecorded image content or data representation 5410.
HMD head-mounted display
the HMD 542, which can be a purpose-built HMD or an adaptation of a smartphone or other digital processing device, can be in communication with an external processor 546, external digital memory or storage element(s) 548 that can contain computer program instructions 549, and/or in communication with a source of prerecorded content or data representation 5410.
the HMD 542 shown in FIG. 54 includes cameras 5414 and 5416 which can have a view of actual scene 544; left and right displays 5418 and 5420 for respectively displaying to a user's left and right eyes 5424 and 5426; digital processor(s) 5412, and a head/eye/face tracking element 5422.
the tracking element 5422 can consist of a combination of hardware and software elements and algorithms, described in greater detail elsewhere herein, in accordance with the present invention.
the processor element(s) 5412 of the HMD can also contain, or have proximate thereto, digital memory or storage elements, which may store processor-executable computer program instructions.
digital memory or storage elements can contain digital processor-executable computer program instructions, which, when executed by a digital processor, cause the processor to execute operations in accordance with various aspects of the present invention.
FIGS. 55-80 are flowcharts illustrating method aspects and exemplary practices of the invention.
the methods depicted in these flowcharts are examples only; the organization, order and number of operations in the exemplary practices can be varied; and the exemplary practices and methods can be arranged or ordered differently, and include different functions, whether singly or in combination, while still being within the spirit and scope of the present invention. Items described below in parentheses are, among other aspects, optional in a given practice of the invention.
FIG. 55 is a flowchart of a V3D method 550 according to an exemplary practice of the invention, including the following operations:
FIG. 56 is a flowchart of another V3D method 560 according to an exemplary practice of the invention, including the following operations:
561: Capture images of remote scene; 562: Execute image rectification;
567: Display synthetic view to first user (on display screen used by first user);
FIG. 57 is a flowchart of a self-portraiture V3D method 570 according to an exemplary practice of the invention, including the following operations: 571: Capture images of user during setup time (use camera provided on or around periphery of display screen of user's handheld device with view of user's face during self-portrait setup time);
573: Generate data representation representative of captured images;
574: Reconstruct synthetic view of user, based on the generated data representation and generated tracking information;
575: Display to user the synthetic view of user (on the display screen during the setup time) (thereby enabling user, while setting up self-portrait, to selectively orient or position his gaze or head, or handheld device and its camera, with real-time visual feedback); 576: Execute capturing, estimating, generating, reconstructing and displaying such that, in self-portrait, user can appear to be looking directly into camera, even if camera does not have direct eye contact gaze vector to user.
FIG. 58 is a flowchart of a photo composition V3D method 580 according to an exemplary practice of the invention, including the following operations: 581: At photograph setup time, capture images of scene to be photographed (use camera provided on a side of user's handheld device opposite display screen side of user's device);
tracking information synthetic view reconstructed such that scale and perspective of synthetic view have selected correspondence to user's viewpoint relative to handheld device and scene;
585: Display to user the synthetic view of the scene (on display screen during setup time) (thereby enabling user, while setting up photograph, to frame scene to be photographed, with selected scale and perspective within display frame, with real-time visual feedback) (wherein user can control scale and perspective of synthetic view by changing position of handheld device relative to position of user's head).
FIG. 59 is a flowchart of an HMD-related V3D method 590 according to an exemplary practice of the invention, including the following operations:
captured image streams contain images of a scene
at least one camera is panoramic, night-vision, or thermal imaging camera
592: Execute feature correspondence function;
each of the synthetic views has respective view origin corresponding to respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of user's left and right eyes, so as to provide user with substantially natural visual experience of perspective, binocular stereo and occlusion exemplary practices of the scene, substantially as if user were directly viewing scene without an HMD.
FIG. 60 is a flowchart of another HMD-related V3D method 600 according to an exemplary practice of the invention, including the following operations:
601: Capture or generate at least two image streams (using at least one camera);
(wherein captured image streams can contain images of a scene); (wherein captured image streams can be pre-recorded image content); (wherein at least one camera is panoramic, night-vision, or thermal imaging); (wherein at least one IR TOF that directly provides depth); 602: Execute feature correspondence function;
603: Generate data representation representative of captured images contained in captured image streams;
605: Display synthetic views to the user, via HMD;
each of the synthetic views has respective view origin corresponding to respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of user's left and right eyes, so as to provide user with substantially natural visual experience of perspective, binocular stereo and occlusion exemplary practices of the scene, substantially as if user were directly viewing scene without an HMD.
FIG. 61 is a flowchart of a vehicle control system-related method 610 according to an exemplary practice of the invention, including the following operations:
611: Capture images of scene around at least a portion of vehicle (using at least one camera having a view of scene);
FIG. 62 is a flowchart of another V3D method 620 according to an exemplary practice of the invention, which can utilize a view vector rotated camera configuration and/or a number of the following operations:
cameras define a line; rotate the line defined by first and second camera locations by a selected amount from selected horizontal or vertical axis to increase number of valid feature
FIG. 63 is a flowchart of an exposure cycling method 630 according to an exemplary practice of the invention, including the following operations: 631: Dynamically adjust exposure of camera(s) on frame-by-frame basis to improve disparity estimation in regions outside exposed region: take series of exposures, including exposures lighter than and exposures darker than a visibility-optimal exposure; calculate disparity values for each exposure; and integrate disparity values into an overall disparity solution over time, to improve disparity estimation;
the overall disparity solution includes a disparity histogram into which disparity values are
the disparity histogram being converged over time, so as to improve disparity estimation.
disparity solution includes disparity histogram: analyze variance of disparity histograms on respective dark, mid-range and light pixels to generate variance information used to control exposure settings of camera(s), thereby to form a closed loop between quality of disparity estimate and set of exposures requested from camera(s).
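Integrating the disparity values from a series of exposures into per-pixel histograms might look like the sketch below. It covers only the vote accumulation, not the variance-driven exposure control loop, and it assumes each exposure yields a flat list of integer disparities with `None` where an over- or under-exposed region produced no estimate. The function name is hypothetical.

```python
from collections import Counter

def integrate_exposures(disparity_maps):
    """Combine disparity maps from several exposures by histogram voting.

    disparity_maps -- list of equally long lists, one per exposure;
                      entries are integer disparities or None.
    Returns the most-voted disparity per pixel (None if no votes at all).
    """
    n_pixels = len(disparity_maps[0])
    histograms = [Counter() for _ in range(n_pixels)]
    for dmap in disparity_maps:
        for i, d in enumerate(dmap):
            if d is not None:
                histograms[i][d] += 1  # one vote per exposure that resolved it
    return [h.most_common(1)[0][0] if h else None for h in histograms]
```

Because the histograms persist between calls in a real system, pixels that only a dark or only a light exposure can resolve still accumulate votes over time.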
FIG. 64 is a flowchart of an image rectification method 640 according to an exemplary practice of the invention, including the following operations:
FIGS. 65A-B show a flowchart of a feature correspondence method 650 according to an exemplary practice of the invention, which can include a number of the following operations:
6SS (Votes indicated by disparity histogram initially generated utilizing sum of square differences (SSD): executing SSD method with relatively small kernel to produce fast dense disparity map in which each pixel has selected disparity that represents lowest error; then, processing plurality of pixels to accumulate into disparity histogram a tally of number of votes for given disparity in relatively larger kernel surrounding pixel in question);
6515: (Test for only a small set of disparity values using small-kernel SSD method to generate initial results; populate corresponding disparity histogram with initial results; then use histogram votes to drive further SSD testing within given range to improve disparity resolution over time);
6516: (Extract sub-pixel disparity information from disparity histogram: where histogram indicates a maximum-vote disparity range and an adjacent, runner-up disparity range, calculate a weighted average disparity value based on ratio between number of votes for each of the adjacent disparity ranges);
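The sub-pixel extraction step can be illustrated with a short sketch. It assumes that bin `i` of the histogram represents integer disparity `i` and that the weighted average is taken directly over the vote counts of the winning bin and its stronger neighbour; the function name is hypothetical.

```python
def subpixel_disparity(hist):
    """Vote-weighted average of the winning bin and its adjacent runner-up."""
    best = max(range(len(hist)), key=hist.__getitem__)
    neighbours = [i for i in (best - 1, best + 1) if 0 <= i < len(hist)]
    runner = max(neighbours, key=hist.__getitem__, default=None)
    if runner is None or hist[runner] == 0:
        return float(best)  # no adjacent support: keep the integer disparity
    v_best, v_runner = hist[best], hist[runner]
    return (best * v_best + runner * v_runner) / (v_best + v_runner)
```

For a histogram with 6 votes at disparity 2 and 2 votes at disparity 3, the estimate is pulled a quarter of a bin towards 3.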
the feature correspondence function includes weighting toward a center pixel in a sum-of-squared differences (SSD)
the feature correspondence function includes optimising generation of disparity values on
FIG. 66 is a flowchart of a method 660 for generating a data representation, according to an exemplary practice of the invention, which can include a number of the following operations:
a disparity value treated as a pixel velocity in screen space with respect to a given movement of a given view vector; and utilize the disparity value in combination with movement vector to slide a pixel in a given source image in selected directions, in 2D, to enable a
each camera generates a respective camera stream; and the data structure contains a sample buffer index, stored in association with control point coordinates, that indicates which camera stream to sample in association with given control point);
FIGS. 67A-B show a flowchart of an image reconstruction method 670, according to an exemplary practice of the invention, which can include a number of the following operations:
sliding is utilized in regions of large disparity or depth change
integration functions for one or more pixels in a desired output resolution of an image to be displayed to the user maps an input view origin vector to at least one known, weighted 2D image sample location in at least one input image buffer).
FIG. 68 is a flowchart of a display method 680, according to an exemplary practice of the invention, which can include a number of the following operations:
FIG. 69 is a flowchart of a method 690 according to an exemplary practice of the invention, utilizing a multi-level disparity histogram, and which can also include the following: 691 : Capture images of scene, using at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis;
Each level is assigned a level number, and each successively higher level is characterized by a lower image resolution;
Each histogram bin in a given level represents votes for a disparity determined by the FDDE at that level;
Each histogram bin in a given level has an associated disparity uncertainty range, and the disparity uncertainty range represented by each histogram bin is a selected multiple wider than the disparity uncertainty range of a bin in the preceding level;
rounding error effect apply half pixel shift to only one of the images in a stereo pair at each level of downsampling
694.1: Apply sub-pixel shift implemented inline, within the weights of the filter kernel utilized to implement the downsampling from level to level; 695: Execute histogram integration, including executing a recursive summation across all the FDDE levels;
FIG. 70 is a flowchart of a method 700 according to an exemplary practice of the invention, utilizing RUD image space and including the following operations:
Capture images of scene using at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis, and for each camera pair axis, execute image capture using the camera pair to generate image data;
FIG. 71 is a flowchart of a method 710 according to an exemplary practice of the invention, utilizing an injective constraint aspect and including the following operations:
711: Capture images of a scene, using at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair;
a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values
the feature correspondence function including: generating a disparity solution based on the disparity values, and applying an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera, and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution, in response to violation of the injective constraint, and wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain.
FIG. 72 is a flowchart of a method 720 for applying an injective constraint, according to an exemplary practice of the invention, including the following operations:
FIG. 73 is a flowchart of a method 730 relating to error correction approaches based on injective constraint, according to an exemplary practice of the invention, including one or more of the following:
First-come, first-served: assign priority to the first element in the domain to claim an element in the co-domain, and if a second element in the domain claims the same co-domain element, invalidating that subsequent match and designating that subsequent match to be invalid;
Best match wins: compare the actual image matching error or corresponding histogram vote count between the two possible candidate elements in the domain against the contested element in the co-domain, and designate as winner the domain candidate with the best match;
Smallest disparity wins: if there is a contest between candidate elements in the domain for a given co-domain element, wherein each candidate element has a corresponding disparity, selecting the domain candidate with the smallest disparity and designating the others as invalid;
Seek alternative candidates: select and test the next best domain candidate, based on a selected criterion, and iterating the selecting and testing until the violation is eliminated or a computational time limit is reached.
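A minimal sketch of the injective constraint with the "best match wins" policy, for a single scanline, is given below. It assumes that a domain pixel at column `x` with disparity `d` references co-domain column `x - d` (a sign convention chosen for illustration) and that a lower matching error means a better match. The function name is hypothetical.

```python
def enforce_injective(disparities, errors):
    """Invalidate matches so no co-domain pixel is referenced twice.

    disparities -- per-pixel integer disparity for one scanline, or None
    errors      -- per-pixel matching error (lower is better)
    Returns a copy of disparities with losing matches set to None.
    """
    owner = {}                 # co-domain column -> domain column holding it
    result = list(disparities)
    for x, d in enumerate(disparities):
        if d is None:
            continue
        target = x - d         # assumed mapping into the second image
        rival = owner.get(target)
        if rival is None:
            owner[target] = x
        elif errors[x] < errors[rival]:
            result[rival] = None   # newcomer matches better: evict the rival
            owner[target] = x
        else:
            result[x] = None       # existing owner keeps the element
    return result
```

Dropping the `elif` branch turns this into the first-come, first-served policy, and comparing `d` values instead of errors gives the smallest-disparity policy.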
FIG. 74 is a flowchart of a head/eye/face location estimation method 740 according to an exemplary practice of the invention, including the following operations:
tracking information which can include the following:
744.1: Pass a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane;
2D two-dimensional
744.2: Use an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a best-fit rectangle that includes the head; 744.3: Extract from the best-fit rectangle all 3D points that lie within the best-fit rectangle, and calculate therefrom a representative 3D head position;
FIG. 75 is a flowchart of a method 750 providing further optional operations relating to the 3D location estimation shown in FIG. 74, according to an exemplary practice of the invention, including the following:
FIG. 76 is a flowchart of optional sub-operations 760 relating to 3D location estimation, according to an exemplary practice of the invention, which can include a number of the following:
FIG. 77 is a flowchart of a method 770 according to an exemplary practice of the invention, utilizing URUD image space and including the following operations:
Capture images of a scene using at least three cameras having a view of the scene, the cameras being arranged in a substantially ' T-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axes, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera;
FIG. 78 is a flowchart of a method 780 relating to optional operations in RUD/URUD image space according to an exemplary practice of the invention, including the following operations:
FIG. 79 is a flowchart of a method 790 relating to private disparity histograms according to an exemplary practice of the invention, including the following operations:
a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values, using a disparity histogram method to integrate data and determine correspondence, which can include:
FIG. 80 is a flowchart of a method 800 further relating to private disparity histograms according to an exemplary practice of the invention, including the following operations:
disparity histogram is characterized by a series of histogram bins indicating the number of votes for a given disparity range; and if a maximum possible number of votes in the private disparity histogram is known, multiple histogram bins can be packed into a single word of the shared local memory and accessed using bitwise GPU access operations.
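The bin-packing idea can be shown on the CPU with ordinary integer arithmetic. The sketch assumes 8 bits per bin, so four bins fit in a 32-bit word and the maximum vote count per bin must stay below 256; the bit width and function names are illustrative choices, and a GPU implementation would use its own bitwise or bit-field operations.

```python
BITS_PER_BIN = 8                    # assumed: max votes per bin < 2**8
BIN_MASK = (1 << BITS_PER_BIN) - 1
BINS_PER_WORD = 32 // BITS_PER_BIN  # four bins in one 32-bit word

def add_vote(word, bin_index, count=1):
    """Return the word with `count` votes added to the given packed bin."""
    return word + (count << (bin_index * BITS_PER_BIN))

def get_votes(word, bin_index):
    """Read the vote count of one packed bin with a shift and a mask."""
    return (word >> (bin_index * BITS_PER_BIN)) & BIN_MASK
```

Knowing the vote ceiling in advance is what makes the plain addition in `add_vote` safe: a bin that could overflow would carry into its neighbour.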
Identification, authentication or matching of a user or subject by the user's facial features can be useful in a wide range of settings. These may include controlling or limiting access to systems, enabling rapid or simplified access to systems or to a particular user account or user profile on a system, or other security purposes. Exemplary practices and embodiments of the invention enable such identification, authentication or matching, by generating a Facial Signature based on images of the user's or subject's face, or face and head, as described in greater detail below.
the digital processor elements of the embodiments of the invention depicted in the accompanying drawing figures can be employed to execute the Facial Signature functions of exemplary practices and embodiments of the invention described herein, including image capture, image rectification, feature correlation/disparity value processing, and Facial Signature generation functions.
the Facial Signature aspects of the invention can be executed on otherwise conventional processing elements and platforms provided by or associated with known forms of desktop computers, laptop computers, tablet computers, smartphones, and associated additional or peripheral hardware elements, such as cameras, suitably configured in accordance with exemplary practices of the invention.
the front-end aspects of the V3D processing pipeline described above, i.e., aspects of Image Capture, Image Rectification and Feature Correspondence, are employed, but instead of constructing a representation intended for 3D streaming of a scene for visualizing it from different views (see, e.g., FIGS. 7 and 8, depicting exemplary practices and embodiments of the V3D invention), the V3D process front-end can be configured to construct a "Facial Signature" for the purposes of subsequently identifying an individual person, or user of a system or resource, in a secure manner that is substantially more difficult to forge than a regular 2D facial image.
a "Facial Signature" for the purposes of subsequently identifying an individual person, or user of a system or resource
FIG. 85 is a flowchart of an exemplary practice of the Facial Signature aspects using V3D process elements of the invention, including capturing images of the user's or subject's face (851), executing image rectification to compensate for camera optical distortion and alignment (852), executing feature correspondence to produce disparity/depth values (853), eliminating the image background (854), and generating a facial signature data representation (855).
The enhanced level of security provided by the Facial Signature aspect of the invention is enabled in part because the depth stereo estimation of the V3D method of the invention described in this document requires all of the facial features to be presented to the camera(s) at the correct distance ratios between the cameras or from the structured light or time-of-flight sensor. Creating a forgery would require an accurate physical model of the face in the real world. By requiring multiple poses, the forger's challenge of constructing an accurate 3D model becomes highly impractical.
FIG. 81 illustrates an exemplary practice of the Facial Signature aspect of the invention, including obtaining images from the camera(s) (81.1), generating rectified images (81.2), executing disparity/depth estimation (81.3), executing background elimination (81.4), and combining with 2D color information (81.5a, 81.5b, 81.5c), which can occur using multiple poses of the human user/subject, as described in greater detail below.
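The text does not say how the background elimination step is carried out. As a minimal sketch, assuming the background is simply every pixel whose disparity falls below a cutoff (that is, everything farther from the cameras than the face), the step could look as follows. The function name `eliminate_background` and the parameter `min_disparity` are hypothetical and do not come from the source.

```python
from typing import List, Optional

# A disparity map as rows of per-pixel disparities; None marks "no value".
DisparityMap = List[List[Optional[float]]]


def eliminate_background(disparity: DisparityMap, min_disparity: float) -> DisparityMap:
    """Keep only pixels at least as near as the cutoff.

    A larger disparity means the point is nearer to the cameras, so pixels
    with disparity below `min_disparity` are treated as background and
    replaced by None. The cutoff rule is an illustrative assumption.
    """
    return [
        [d if d is not None and d >= min_disparity else None for d in row]
        for row in disparity
    ]


# Toy 3x3 disparity map: a near object (about 30) in front of a far wall (about 2).
disparity_map = [
    [2.0, 3.0, 2.5],
    [2.0, 30.0, 31.0],
    [1.5, 29.0, 32.0],
]
foreground = eliminate_background(disparity_map, min_disparity=20.0)
```

Only the near pixels survive, and the remaining values are what a later signature-generation stage would consume.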
The facial signature could be a combination of the 3D facial contour information and the regular 2D image from one or more of the cameras.
The Facial Signature aspect of the invention could either store the 3D contour data in the signature, or simply use the 2D image of the face but use the 3D facial contours just to confirm that the image(s) depict an actual human face with credible 3D proportions that was viewed by the cameras at the same location as the 2D image.
A method in accordance with the Facial Signature aspect can also include an enrollment phase in which the human user or subject would be requested, by the system, to move his or her head into different orientations, and, optionally, strike a number of alternative facial poses, such as "smile" or "wink", so that the system can establish a robust scan (or multiple scans) of the human subject's facial proportions.
An enrolled Facial Signature is generated from these scans.
A few seconds of images of the user's or subject's head can be captured in real-time, resulting in hundreds of individual captures, each slightly different, and then correlated with the facial signature to confirm a match.
Exemplary practices of the invention can be configured for a variety of purposes, including, but not limited to, the following:
The facial signature generated from the depth information extracted from the V3D front-end can be used to identify a specific individual. Such an identification would be more reliable than a
The facial signature aspects of the invention can be combined with other security factors, such as a fingerprint or a pass-code, to provide a high level of security for accessing a user account on a device or system, or for other authentication purposes.
In a hybrid configuration with a conventional 2D face identification system:
While the 3D contour data could alone be used to identify a face, combining it with the existing 2D image from one or more cameras would add further security.
Existing 2D facial detection systems return one or more rectangles or "boxes" defining the 2D extent of a human subject's face location.
A 2D facial detection operation could be executed, and then used to minimize the amount of processing required for the 3D calculation by limiting the calculations to within that box. Utilizing the 2D facial detection operation in this manner can reduce the system's power consumption, and reduce the time required for facial recognition on a given device.
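The saving described above can be illustrated with a small sketch that crops a frame to the detector's box before any depth calculation runs. The `(left, top, width, height)` box layout and both function names are assumptions made for this example. The 2D detector itself is outside the sketch.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels, assumed layout


def crop_to_face_box(image: List[List[float]], box: Box) -> List[List[float]]:
    """Return only the rows and columns of `image` that lie inside `box`."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


def fraction_of_pixels_processed(image_size: Tuple[int, int], box: Box) -> float:
    """Share of the full frame that the depth stage still has to process."""
    image_width, image_height = image_size
    _, _, width, height = box
    return (width * height) / (image_width * image_height)


# A 4x4 toy frame whose pixel values are just their index, for readability.
frame = [[float(4 * r + c) for c in range(4)] for r in range(4)]
face_pixels = crop_to_face_box(frame, (1, 1, 2, 2))

# A 320x360 face box inside a 1280x720 frame leaves one eighth of the pixels.
share = fraction_of_pixels_processed((1280, 720), (480, 180, 320, 360))
```

Feature correspondence cost grows with the number of pixels examined, so restricting it to the box is where the power and time reduction would come from.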
A user would train the device by generating a unique facial signature.
The system would request the user to present a series of desired head movements relative to the device, or a series of facial expressions, such as "smile" or "wink."
An enrolled Facial Signature is generated from these scans. By collecting a series of possible poses, the signature would have higher dimensionality than would a single pose.
Matching process:
The matching process could simply observe or capture images of the user for a few seconds and, with a sufficiently high-performance depth detection system, would capture many frames of 3D and 2D data. In accordance with the invention, this would be correlated with the facial signature captured during the enrollment process, and a probability of match score would be generated. This score would then be compared with a threshold to confirm or deny an identity match.
An exemplary practice of the facial signature method of the invention includes updating or evolving the facial signature itself on every successful match, or on every nth successful match, where n is a selected integer, in order to accommodate these changes.
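The text states when the signature is updated (every successful match, or every nth one) but not how. A minimal sketch, assuming the signature is a list of histogram bins and that an update blends the stored bins toward the newly matched candidate by a fixed weight, is shown below. The class name `EnrolledSignature`, the `blend` weight and the blending rule itself are hypothetical.

```python
from typing import List


class EnrolledSignature:
    """Stored signature that evolves on every nth successful match."""

    def __init__(self, histogram: List[float], update_every: int, blend: float = 0.1):
        self.histogram = list(histogram)
        self.update_every = update_every   # the integer n from the text
        self.blend = blend                 # assumed weight given to the new capture
        self.successful_matches = 0

    def record_successful_match(self, candidate: List[float]) -> bool:
        """Count a successful match; return True if the signature was updated."""
        self.successful_matches += 1
        if self.successful_matches % self.update_every != 0:
            return False
        # Weighted average of the stored bins and the matched candidate's bins.
        self.histogram = [
            (1.0 - self.blend) * old + self.blend * new
            for old, new in zip(self.histogram, candidate)
        ]
        return True


signature = EnrolledSignature([0.5, 0.5], update_every=2, blend=0.5)
first = signature.record_successful_match([1.0, 0.0])   # 1st match: no update
second = signature.record_successful_match([1.0, 0.0])  # 2nd match: update
```

Setting `update_every=1` gives the "every successful match" variant. Updating only after a successful match keeps rejected captures from altering the stored signature.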
One method of representing the facial signature is in the form of one or more combined histograms taken directly from the summation of per-pixel disparity histograms within the feature correspondence calculation, or generated from depth data from a sensor capable of directly perceiving depth. These combined histograms represent the normalized relative proportion of facial feature depths across a plane parallel to the user's face.
FIGS. 82-83 show an exemplary image processed in accordance with an exemplary practice of the Facial Signature aspects of the invention.
FIG. 82 is an example of an image of a human user or subject captured by at least one camera.
FIG. 83 is an example of a representation of image data, corresponding to the image of FIG. 82, processed in accordance with an exemplary practice of the invention.
FIG. 84 shows a histogram representation corresponding to the image(s) of FIGS. 82-83, generated in accordance with an exemplary practice of the Facial Signature aspects of the invention.
The X-axis of the histogram would represent a disparity (or depth) range.
The Y-axis would represent the normalized count of image samples that fell within that range.
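The axes described above can be made concrete with a short sketch that bins disparity samples and normalizes the counts. It works on a flat list of per-pixel disparity values rather than on the per-pixel disparity histograms mentioned earlier, which is a simplification. The function name and its parameters are hypothetical.

```python
from typing import Iterable, List


def disparity_histogram(disparities: Iterable[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram of disparity samples over the range [lo, hi).

    Bin i covers one slice of the disparity range (the X-axis); the returned
    value for bin i is the share of in-range samples in that slice (the Y-axis).
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    total = 0
    for d in disparities:
        if lo <= d < hi:
            # min() guards against floating-point rounding at the upper edge.
            index = min(bins - 1, int((d - lo) / width))
            counts[index] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


# Four in-range samples: three fall in [10, 15), one in [15, 20); 25 is ignored.
histogram = disparity_histogram([10, 11, 12, 18, 25], lo=10, hi=20, bins=2)
```

Because the counts are divided by the number of in-range samples, the result does not depend on how many pixels the face occupies in the frame.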
A conventional 2D face detector can be employed to provide a face rectangle and location of the basic facial features, such as eyes, nose and mouth. See, e.g., FIG. 83, which indicates, among other aspects, a rectangle surrounding the human subject's face.
When a candidate identification histogram is captured, it would be subtracted from the set of enrolled histograms and the vector distance would constitute a matching score. By comparing the matching score against a programmable threshold, access could be granted or denied.
This method could be used in isolation or paired in a hybrid configuration with conventional 2D image matching of the face to provide a further authentication factor.
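The histogram matching step can be sketched as follows. Two points are assumptions, since the text does not fix them: the vector distance is taken to be Euclidean, and the score against a set of enrolled histograms is taken to be the smallest distance to any member of the set. All function names are hypothetical.

```python
import math
from typing import Sequence


def vector_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean length of the bin-by-bin difference between two histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matching_score(candidate: Sequence[float], enrolled: Sequence[Sequence[float]]) -> float:
    """Distance from the candidate to the closest enrolled histogram (lower is better)."""
    return min(vector_distance(candidate, h) for h in enrolled)


def access_granted(candidate: Sequence[float], enrolled: Sequence[Sequence[float]], threshold: float) -> bool:
    """Grant access when the score does not exceed the programmable threshold."""
    return matching_score(candidate, enrolled) <= threshold


enrolled_histograms = [[0.5, 0.5], [0.75, 0.25]]
same_user = access_granted([0.75, 0.25], enrolled_histograms, threshold=0.1)
other_user = access_granted([1.0, 0.0], enrolled_histograms, threshold=0.1)
```

In a hybrid configuration, the boolean returned here would be one factor combined with the outcome of a separate 2D face match.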
FIGS. 85-88 are flowcharts illustrating method aspects and exemplary practices of the invention.
The methods depicted in these flowcharts are examples only; the organization, order and number of operations in the exemplary practices can be varied; and the exemplary practices and methods can be arranged or ordered differently, and include different functions, whether singly or in combination, while still being within the spirit and scope of the present invention. Items described below in parentheses are, among other aspects, optional in a given practice of the invention.
FIG. 85 is a flowchart of a method 850 for generating a facial signature data representation, according to an exemplary practice of the invention, which can include a number of the following operations:
FIG. 86 is a flowchart of further method aspects 860 for generating a facial signature data representation, according to an exemplary practice of the invention, which can include a number of the following operations:
(The method or system can utilize stereo depth estimation to verify that human facial features are presented to the camera(s) at correct distance ratios between camera(s) or from a structured light or time-of-flight sensor);
The feature correspondence function or depth detection function includes computing distances between facial features from multiple perspectives;
(The facial signature can be a combination of 3D facial contour information and 2D image data from one or more camera(s));
(3D contour data can be stored in the facial signature data representation);
(The facial signature generated in accordance with the invention can be utilized as a security factor in an authentication system, either alone or in combination with other security factors);
(3D facial contour data can be combined with 2D image data from one or more cameras in a conventional 2D face identification system, to create a hybrid 3D/2D face identification system);
(3D facial contour data can be used to confirm that a face having credible 3D human facial proportions was presented to the camera(s) at an overlapping spatial location of captured 2D image(s));
(A 2D bounding rectangle defining a 2D extent of the human user's or subject's face location can be used to limit search space and limit calculations to a region defined by the rectangle, thereby increasing speed of recognition and reducing power consumption);
The facial signature data representation can be a histogram-based facial signature data
FIG. 87 is a flowchart of method aspects 870 for generating a histogram-based facial signature data representation, according to an exemplary practice of the invention, which can include a number of the following operations:
(The facial signature is represented as one or more histograms obtained from a summation of per-pixel disparity histograms within the feature correspondence calculation, or generated from depth data from a sensor capable of directly perceiving depth);
(The histogram represents the normalized relative proportion of facial feature depths across a plane parallel to the user's or subject's face);
The Y-axis represents the normalized count of image samples that fall within a given range
A conventional 2D face detector can provide a face
870.6: (Disparity and depth points can be projected into a canonical coordinate system defined by a plane geometrically constructed from or defined by basic facial features such as eyes, nose, mouth);
870.7: (The histogram correlation can be used in combination with conventional 2D face matching to provide an additional authentication factor).
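Operation 870.6 can be illustrated with a sketch that builds a coordinate frame from three facial landmarks and expresses a 3D point in it. The particular construction is an assumption made for illustration: the origin is the midpoint between the eyes, the first axis runs from the left eye to the right eye, and the plane is the one through both eyes and the mouth. All function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Frame = Tuple[Vec3, Vec3, Vec3, Vec3]  # origin, x axis, y axis, plane normal


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    length = math.sqrt(dot(a, a))
    if length == 0.0:
        raise ValueError("landmarks are degenerate (coincident or collinear)")
    return (a[0] / length, a[1] / length, a[2] / length)


def face_frame(left_eye: Vec3, right_eye: Vec3, mouth: Vec3) -> Frame:
    """Orthonormal frame whose x and y axes span the eyes-and-mouth plane."""
    origin = tuple((l + r) / 2.0 for l, r in zip(left_eye, right_eye))
    x_axis = normalize(sub(right_eye, left_eye))
    normal = normalize(cross(x_axis, sub(mouth, origin)))
    y_axis = cross(normal, x_axis)  # unit length because normal and x_axis are orthonormal
    return origin, x_axis, y_axis, normal


def to_canonical(point: Vec3, frame: Frame) -> Vec3:
    """Coordinates of `point` in the face frame; the third value is the signed offset from the plane."""
    origin, x_axis, y_axis, normal = frame
    rel = sub(point, origin)
    return (dot(rel, x_axis), dot(rel, y_axis), dot(rel, normal))


# Landmarks on the plane z = 5 in camera space, with the nose tip one unit nearer the camera.
frame = face_frame((-1.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, -2.0, 5.0))
nose = to_canonical((0.0, -1.0, 4.0), frame)
```

Binning the third coordinate instead of raw disparity would make the resulting histogram less sensitive to how the head is turned or how far it is from the cameras.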
FIG. 88 is a flowchart of a facial signature method aspect 880, including enrollment and matching phases of an exemplary facial signature method in accordance with the invention, which can include a number of the following operations:
Capture images (using at least one camera) for the enrollment phase (can utilize and require a selected number (n) poses of the human user or subject);
Capture images (using at least one camera) for the matching phase (can utilize and require a selected number (n) poses of the human user or subject);

Landscapes

Engineering & Computer Science (AREA)
Theoretical Computer Science (AREA)
General Health & Medical Sciences (AREA)
Computer Vision & Pattern Recognition (AREA)
Health & Medical Sciences (AREA)
Physics & Mathematics (AREA)
General Physics & Mathematics (AREA)
Multimedia (AREA)
Human Computer Interaction (AREA)
Oral & Maxillofacial Surgery (AREA)
Artificial Intelligence (AREA)
Computing Systems (AREA)
Databases & Information Systems (AREA)
Evolutionary Computation (AREA)
Medical Informatics (AREA)
Software Systems (AREA)
Image Analysis (AREA)
Collating Specific Patterns (AREA)

PCT/US2016/032213 2015-03-21 2016-05-12 Facial signature methods, systems and software Ceased WO2016183380A1 (en)

Priority Applications (3)

Application Number	Priority Date	Filing Date	Title
EP16793565.9A EP3295372A4 (de)	2015-05-12	2016-05-12	Verfahren, systeme und software für gesichtssignatur
US15/573,475 US10853625B2 (en)	2015-03-21	2016-05-12	Facial signature methods, systems and software
US17/107,413 US11995902B2 (en)	2015-03-21	2020-11-30	Facial signature methods, systems and software

Applications Claiming Priority (4)

Application Number	Priority Date	Filing Date	Title
US201562160563P	2015-05-12	2015-05-12
US62/160,563		2015-05-12
USPCT/US2016/023433		2016-03-21
PCT/US2016/023433 WO2016154123A2 (en)	2015-03-21	2016-03-21	Virtual 3d methods, systems and software

Related Parent Applications (1)

Application Number	Title	Priority Date	Filing Date
PCT/US2016/023433 Continuation-In-Part WO2016154123A2 (en)	2015-03-21	2016-03-21	Virtual 3d methods, systems and software

Related Child Applications (4)

Application Number	Title	Priority Date	Filing Date
US15/560,019 A-371-Of-International US10551913B2 (en)	2015-03-21	2016-03-21	Virtual 3D methods, systems and software
US15/573,475 A-371-Of-International US10853625B2 (en)	2015-03-21	2016-05-12	Facial signature methods, systems and software
US16/749,989 Continuation US11106275B2 (en)	2015-03-21	2020-01-22	Virtual 3D methods, systems and software
US17/107,413 Continuation US11995902B2 (en)	2015-03-21	2020-11-30	Facial signature methods, systems and software

Publications (1)

Publication Number	Publication Date
WO2016183380A1 true WO2016183380A1 (en)	2016-11-17

Family

ID=57248607

Family Applications (1)

Application Number	Title	Priority Date	Filing Date
PCT/US2016/032213 Ceased WO2016183380A1 (en)	2015-03-21	2016-05-12	Facial signature methods, systems and software

Country Status (2)

Country	Link
EP (1)	EP3295372A4 (de)
WO (1)	WO2016183380A1 (de)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
EP3289971A1 (de) *	2016-08-29	2018-03-07	Panasonic Intellectual Property Management Co., Ltd.	Biometrische vorrichtung und biometrisches verfahren
CN109299729A (zh) *	2018-08-24	2019-02-01	四川大学	车辆检测方法及装置
CN109492571A (zh) *	2018-11-02	2019-03-19	北京地平线机器人技术研发有限公司	识别人体年龄的方法、装置及电子设备
WO2019180538A1 (en) *	2018-03-23	2019-09-26	International Business Machines Corporation	Remote user identity validation with threshold-based matching
US10762663B2 (en)	2017-05-16	2020-09-01	Nokia Technologies Oy	Apparatus, a method and a computer program for video coding and decoding
US10892901B1 (en)	2019-07-05	2021-01-12	Advanced New Technologies Co., Ltd.	Facial data collection and verification
WO2021004055A1 (zh) *	2019-07-05	2021-01-14	创新先进技术有限公司	人脸数据采集、验证的方法、设备及系统
US10997736B2 (en)	2018-08-10	2021-05-04	Apple Inc.	Circuit for performing normalized cross correlation
US11227405B2 (en) *	2017-06-21	2022-01-18	Apera Ai Inc.	Determining positions and orientations of objects
CN114937271A (zh) *	2022-05-11	2022-08-23	中维建通信技术服务有限公司	一种通信数据智能录入校对方法
US11960639B2 (en)	2015-03-21	2024-04-16	Mine One Gmbh	Virtual 3D methods, systems and software
CN118071359A (zh) *	2024-04-17	2024-05-24	交通银行股份有限公司江西省分行	一种金融虚拟身份验证方法及系统
US11995902B2 (en)	2015-03-21	2024-05-28	Mine One Gmbh	Facial signature methods, systems and software
US12169944B2 (en)	2015-03-21	2024-12-17	Mine One Gmbh	Image reconstruction for virtual 3D
CN119232998A (zh) *	2023-06-29	2024-12-31	荣耀终端有限公司	视频处理方法、装置、电子设备和存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
CN115393262B (zh) *	2022-05-26	2025-07-25	西北工业大学	基于Gegenbauer正交多项式的图像复制-旋转-移动伪造检测方法

Citations (8)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US20040240711A1 (en) *	2003-05-27	2004-12-02	Honeywell International Inc.	Face identification verification using 3 dimensional modeling
US20050063566A1 (en) *	2001-10-17	2005-03-24	Beek Gary A . Van	Face imaging system for recordal and automated identity confirmation
US20070183653A1 (en) *	2006-01-31	2007-08-09	Gerard Medioni	3D Face Reconstruction from 2D Images
US20120121142A1 (en) *	2009-06-09	2012-05-17	Pradeep Nagesh	Ultra-low dimensional representation for face recognition under varying expressions
US20130021490A1 (en) *	2011-07-20	2013-01-24	Broadcom Corporation	Facial Image Processing in an Image Capture Device
US20130070116A1 (en) *	2011-09-20	2013-03-21	Sony Corporation	Image processing device, method of controlling image processing device and program causing computer to execute the method
US20140050372A1 (en) *	2012-08-15	2014-02-20	Qualcomm Incorporated	Method and apparatus for facial recognition
US20150066764A1 (en) *	2013-09-05	2015-03-05	International Business Machines Corporation	Multi factor authentication rule-based intelligent bank cards

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US7242807B2 (en) *	2003-05-05	2007-07-10	Fish & Richardson P.C.	Imaging of biometric information based on three-dimensional shapes

2016
- 2016-05-12 WO PCT/US2016/032213 patent/WO2016183380A1/en not_active Ceased
- 2016-05-12 EP EP16793565.9A patent/EP3295372A4/de not_active Withdrawn

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US20050063566A1 (en) *	2001-10-17	2005-03-24	Beek Gary A . Van	Face imaging system for recordal and automated identity confirmation
US20040240711A1 (en) *	2003-05-27	2004-12-02	Honeywell International Inc.	Face identification verification using 3 dimensional modeling
US20070183653A1 (en) *	2006-01-31	2007-08-09	Gerard Medioni	3D Face Reconstruction from 2D Images
US20120121142A1 (en) *	2009-06-09	2012-05-17	Pradeep Nagesh	Ultra-low dimensional representation for face recognition under varying expressions
US20130021490A1 (en) *	2011-07-20	2013-01-24	Broadcom Corporation	Facial Image Processing in an Image Capture Device
US20130070116A1 (en) *	2011-09-20	2013-03-21	Sony Corporation	Image processing device, method of controlling image processing device and program causing computer to execute the method
US20140050372A1 (en) *	2012-08-15	2014-02-20	Qualcomm Incorporated	Method and apparatus for facial recognition
US20150066764A1 (en) *	2013-09-05	2015-03-05	International Business Machines Corporation	Multi factor authentication rule-based intelligent bank cards

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3295372A4 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US11960639B2 (en)	2015-03-21	2024-04-16	Mine One Gmbh	Virtual 3D methods, systems and software
US11995902B2 (en)	2015-03-21	2024-05-28	Mine One Gmbh	Facial signature methods, systems and software
US12169944B2 (en)	2015-03-21	2024-12-17	Mine One Gmbh	Image reconstruction for virtual 3D
US10413229B2 (en)	2016-08-29	2019-09-17	Panasonic Intellectual Property Management Co., Ltd.	Biometric device and biometric method
EP3289971A1 (de) *	2016-08-29	2018-03-07	Panasonic Intellectual Property Management Co., Ltd.	Biometrische vorrichtung und biometrisches verfahren
US10561358B2 (en)	2016-08-29	2020-02-18	Panasonic Intellectual Property Management Co., Ltd.	Biometric device and biometric method
US10762663B2 (en)	2017-05-16	2020-09-01	Nokia Technologies Oy	Apparatus, a method and a computer program for video coding and decoding
US11227405B2 (en) *	2017-06-21	2022-01-18	Apera Ai Inc.	Determining positions and orientations of objects
GB2585168A (en) *	2018-03-23	2020-12-30	Ibm	Remote user identity validation with threshold-based matching
US10839238B2 (en)	2018-03-23	2020-11-17	International Business Machines Corporation	Remote user identity validation with threshold-based matching
WO2019180538A1 (en) *	2018-03-23	2019-09-26	International Business Machines Corporation	Remote user identity validation with threshold-based matching
GB2585168B (en) *	2018-03-23	2021-07-14	Ibm	Remote user identity validation with threshold-based matching
US10997736B2 (en)	2018-08-10	2021-05-04	Apple Inc.	Circuit for performing normalized cross correlation
CN109299729B (zh) *	2018-08-24	2021-02-23	四川大学	车辆检测方法及装置
CN109299729A (zh) *	2018-08-24	2019-02-01	四川大学	车辆检测方法及装置
CN109492571A (zh) *	2018-11-02	2019-03-19	北京地平线机器人技术研发有限公司	识别人体年龄的方法、装置及电子设备
WO2021004055A1 (zh) *	2019-07-05	2021-01-14	创新先进技术有限公司	人脸数据采集、验证的方法、设备及系统
US10892901B1 (en)	2019-07-05	2021-01-12	Advanced New Technologies Co., Ltd.	Facial data collection and verification
CN114937271B (zh) *	2022-05-11	2023-04-18	中维建通信技术服务有限公司	一种通信数据智能录入校对方法
CN114937271A (zh) *	2022-05-11	2022-08-23	中维建通信技术服务有限公司	一种通信数据智能录入校对方法
CN119232998A (zh) *	2023-06-29	2024-12-31	荣耀终端有限公司	视频处理方法、装置、电子设备和存储介质
CN119232998B (zh) *	2023-06-29	2025-11-04	荣耀终端股份有限公司	视频处理方法、装置、电子设备和存储介质
CN118071359A (zh) *	2024-04-17	2024-05-24	交通银行股份有限公司江西省分行	一种金融虚拟身份验证方法及系统

Also Published As

Publication number	Publication date
EP3295372A1 (de)	2018-03-21
EP3295372A4 (de)	2019-06-12

Legal Events

Date

Code

Title

Description

2016-12-28

121

Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16793565

Country of ref document: EP

Kind code of ref document: A1

2017-11-14

NENP

Non-entry into the national phase

Ref country code: DE

2017-12-12

WWE

Wipo information: entry into national phase

Ref document number: 2016793565