WO2012121404A1 - User interface, device incorporating this interface and method of providing a user interface - Google Patents

Publication number
WO2012121404A1
Authority
WO
WIPO (PCT)
Prior art keywords
gui
cursor
user
user interface
display
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2012/056218
Other languages
English (en)
Inventor
Andrew Kay
Matti Pentti Taavetti Juvonen
Christopher James Brown
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp
Publication of WO2012121404A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/002 Specific input/output arrangements not covered by G06F3/01 - G06F3/16
    • G06F3/005 Input arrangements through a video camera
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 Cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV programme

Definitions

  • the present invention relates to a user interface for controlling a device.
  • the invention relates to a device incorporating such a method and to a method of providing a user interface. It may relate in particular to a TV set, a set-top box, a PVR, DVD or Blu-ray player, radio, hi-fi, multimedia player, internet multimedia device or home network controller.
  • the television remote control has not changed significantly since its invention. Meanwhile, televisions and other display devices have acquired new functionality.
  • One reason for the trend is the perceived need to provide a user experience that is both easier to use and richer, that is, affording a greater amount of control than previous methods.
  • TV user interfaces are often quite crude. Richer user interfaces are required due to the changing role of the television as a home 'media hub' providing functions such as web browsing and interactive television.
  • although gesture interfaces have been studied for decades, mainstream commercial interest in such interfaces is relatively new.
  • different companies have implemented a variety of interfaces with custom gestures for interacting with the device and a custom set of graphical objects, or 'widgets', displayed on screen.
  • These systems have a steep learning curve as the user must be taught a set of gestures specific to each implementation.
  • gesture interfaces require some method of recognising the user's hand position and reacting to it, but how this is done can vary considerably between implementations. These systems aim to find a compromise between low computational complexity and a sufficiently rich set of gestures.
  • a high-end computer vision system may be able to analyse images and provide a real-time three-dimensional model of the position of the user's arm, hand and fingers. Such a high-end system can recognise a large palette of subtle hand gestures and react to them, but it may be very expensive.
  • a simpler gesture-based system that is able only to track the position of the user's hand can distinguish only a few large gestures, which may limit the way in which the user interacts with the system.
  • gesture tracking systems have predominantly been of the latter, cheaper variety.
  • a typical set of gestures that these systems can recognise includes moving the hand to left, right, up or down to navigate through menus and occasionally a separate gesture to browse through lists of items such as TV channels or music albums.
  • a gesture-based system will also need a way to bring up the interface. For this purpose, such an interface may require yet another gesture, such as waving at the system.
  • the behaviour of the user determines a virtual cursor position on (or around) the display (which we abbreviate to simply cursor).
  • the cursor may indicate a region as precise as a pixel (as with a conventional computer mouse pointer), or a whole (virtual) object (such as a highlighted text character, virtual button, text box or picture).
  • the cursor indicates on the display a position or object that the user is interacting with, for example to select, move or modify.
  • a common problem for simple gesture-based systems is that of selection: in a system that can track a two-dimensional point, how does one implement the very common action of selecting an item?
  • the second common select action is that of dwelling, that is, holding the cursor still on the selection for a predetermined length of time. Holding the cursor still for long enough may require the user to exercise fine motor control, and may therefore be difficult. It may also be frustrating for the user as it introduces an enforced delay into the system.
  • the website introduces a number of different user interface elements.
  • the system is designed for a computer user interface. Making selections in the system requires a fine control of the cursor position, using a computer pointing device such as a mouse.
  • US patent publication 20080123937 A1 describes a system for controlling a user interface.
  • the system can alternate between 'point' and 'click' modes.
  • US patent number 5,594,469 (expired) describes a system for control using an open hand, described therein as 'the "how" position,' to enter the control mode, and to select and manipulate user interface elements by moving a cursor.
  • the user selects an item by dwelling, that is, keeping the cursor on top of it for a predetermined length of time.
  • the user moves the cursor by gesturing in the corresponding direction.
  • the system uses a separate 'exit' gesture to exit the control mode.
  • the system must be calibrated for the hand position.
  • US patent publication 20100277412 A1 describes a camera-based system for tracking objects, including optical methods to determine a finger position in the proximity of a mobile device, for the purpose of controlling that device.
  • US patent publication 20100079374 A1 describes a tracking system where the user holds a pointing device incorporating a camera. The position of the pointing device is calculated from the camera image. In this system, the user must always hold the special tracking device.
  • US patent publication 20060098873 A1 describes a system where an object is tracked using two cameras.
  • the object to be tracked is segmented from the background by subtracting the background from each image.
  • CN 1904806 proposes a "position input system" for use with a computer, to replace the current mouse or touch-sensitive screen.
  • the proposed position input system has a pair of CCD cameras which determine the mid-point of the user's eyes and the position of the user's finger, and these are used to estimate a target location at which the user's finger points.
  • the position input system requires an initial calibration process using a "standardisation template".
  • WO 02/007073 proposes a method of determining a direction in which a particular participant is pointing. It specifically addresses the problem that different people have different ways of pointing, which it asserts can lead to inaccuracies in the determination of the direction in which a person is pointing, in particular in a situation (such as a video conference) where there are a plurality of participants. It therefore proposes a calibration technique in which a user points at a known target, and a reference point is determined for that user from the position of the known target and the determined position of the user's finger. This reference point is then used in subsequent determinations of where that user is pointing.
  • a first aspect of the invention provides a user interface for a device, the device having a display or being networked with a device having a display, the interface comprising: a tracking system capable of locating a predetermined part of a user, in space; an object tracking system capable of locating an object in space when the object is pointing at the display; a unit for calculating a cursor position at which a user is pointing, using information from the face tracking system and the object tracking system; a GUI control unit to create a graphical user interface (GUI) using information from the cursor position, to overlay the GUI and the cursor on the display, and to control the GUI; wherein the GUI control unit is designed to initialise the GUI when the object points at a predetermined start target; and wherein the GUI control unit responds to the cursor passing over an element of the GUI.
  • GUI graphical user interface
  • a second aspect of the present invention provides a device having a user interface according to the first aspect.
  • a third aspect of the present invention provides a method of providing a user interface for a device, the device having a display or being networked with a device having a display, the method comprising: locating a predetermined part of a user in space; locating an object in space when the object is pointing at the display; calculating a cursor position at which a user is pointing, using information from the face tracking system and the object tracking system; creating a graphical user interface (GUI) using information from the cursor position, and overlaying the GUI and the cursor on the display; and controlling the GUI using a GUI control unit; wherein the GUI control unit is designed to initialise the GUI when the object points at a predetermined start target; and wherein the GUI control unit responds to the cursor passing over an element of the GUI.
  • the present invention comprises, at least, an electronic display device (on which a graphical user interface (GUI) is to be displayed), a tracking unit for determining a cursor position on the display based on the pointing action and position of a user and a GUI control unit to generate and control the GUI on the display using information from the tracking unit and also to control a related function of the system.
  • the GUI control unit places elements on the display, and may overlay them on top of another display function (such as the normal TV image). These elements may display information, or may be interactive, as is common in a GUI.
  • the invention may be a TV.
  • the display would normally be the familiar TV display, over which the GUI elements are occasionally overlaid, by the GUI control unit, when the user wishes to exercise control over the TV.
  • user interaction with the GUI would lead to, for example, changing of channel, selection of input device or control of parameters such as volume, contrast etc.
  • Interaction with the GUI could also, optionally, exercise control over other devices directly connected to the TV (such as PVR, DVD or Blu-ray player), or only indirectly connected (such as lighting, heating or other home devices).
  • the invention may also be realised as a set-top box, to be connected to a TV, making use of the TV's own display to show the GUI elements.
  • the invention may control the operation of the TV through some remote control mechanism; or it may also provide the video images to be displayed.
  • the TV itself may merely display images, in a somewhat dumb mode, and the invention itself may include its own tuning or playback components.
  • a GUI is initialised when the user points at one of a predetermined set of positions, called the start target, determined by the tracking system.
  • the user points at the camera which may be above or below the display (or elsewhere), but this is not essential.
  • the location of this initialisation pointing event is chosen to minimise inadvertent actions, so for example simply pointing at an actor on the screen in a TV drama ought not to initialise the GUI.
  • the position of a cursor is then controlled by the user via the tracking unit until the GUI control unit determines that the interaction is over, and the GUI is terminated until the next initialisation occurs.
  • the user controls the cursor position simply by pointing, in a natural way, at or near the display.
  • Graphical user interface (GUI) elements (widgets) are arranged on the display screen, and the user may interact with them by moving the cursor over them.
  • a confirmation region may appear beside it. If the cursor subsequently passes over a confirmation region the GUI control unit may possibly change the displayed elements, possibly change the state of the system (such as changing channel) or possibly terminate the GUI .
  • the GUI may be terminated (to allow the system to return to its normal function, such as TV viewing) when a final action, such as channel selection, is confirmed; or when an explicit cancel operation is selected; or after a predetermined timeout of no operations; or if the tracking unit detects the user is no longer pointing at (or near to) the display; or if the tracking unit fails to determine that there is a pointing user present.
  • the present invention is not a gesture-based interface. Instead, it is a style of interaction whereby the user controls the system by pointing. Pointing at the start target brings up a set of selections of which the user can select one by pointing at it. Every step is interactive: as the user selects an item a confirmation button appears next to it. To make the selection, the user can point at the button or 'slide' over it, which instantly confirms the selection. No separate click action is required to operate the system. Only the sequence of positions of the pointer is significant.
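The select-then-confirm interaction described above can be illustrated with a small state machine. This is only a sketch under stated assumptions: the names `Rect` and `ButtonWidget`, the rectangular regions, and the rule that the confirmation region disappears when the cursor leaves both regions are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned screen region (hypothetical helper)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


class ButtonWidget:
    """Button whose confirmation region appears once the cursor passes over it."""

    def __init__(self, button: Rect, confirm: Rect):
        self.button = button
        self.confirm = confirm
        self.armed = False  # True while the confirmation region is shown

    def update(self, px: float, py: float) -> bool:
        """Feed one cursor sample; return True when the selection is confirmed."""
        if self.button.contains(px, py):
            self.armed = True  # show the confirmation region beside the button
            return False
        if self.armed and self.confirm.contains(px, py):
            self.armed = False  # sliding over the region confirms at once
            return True
        # Assumption: moving anywhere else hides the confirmation region.
        self.armed = False
        return False


# Confirmation region placed directly to the right of the button.
widget = ButtonWidget(Rect(0, 0, 100, 40), Rect(100, 0, 40, 40))
```

Only the sequence of cursor positions matters here: no dwell timer and no click event is involved.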
  • the present invention is based on the concept of pointing which is universal in human societies (although with some regional variations in hand shape).
  • gesture-based interfaces vary between implementations and may require changing to accommodate cultural variations.
  • the present invention overcomes the problem of custom gestures that must be learned. Because it is based on pointing, the user interface is interactive and effectively tells the user what to do next. In contrast, gesture-based interfaces require training. Gesture-based interfaces may have extensive variations between implementations, so the training may not be transferable between implementations.
  • the present invention provides a user interface which requires no selection gesture and no 'dwell' delay, so it feels fast and responsive. Selection is easier than dwelling where the hand or other pointing device must be kept sufficiently still for a period of time. There is no separate start gesture; instead, the user may start the interface by pointing at a defined part of the system, typically at the camera. This has the advantage of being intuitive and easy to learn, but also easy to recognise.
  • the term finger is used as a shorthand for any convenient extremity, object or device that the user moves into the start region to act as or in place of a finger in the role of pointing. It may for example include a drinks can, a cup, an item of food, a pencil, a magazine, a book etc, or perhaps an object specially designed for this role (such as a conventional remote controller, with or without modification) or a novelty item such as a "magic wand", or whatever happens to be in the user's hand at the time.
  • Figure 1 shows a typical scenario for the use of the invention, and illustrates the relationship between pointing and the cursor position.
  • Figure 2 shows a possible architecture of the tracker subsystem.
  • Figure 3 shows the location of the start region.
  • Figure 4 shows a possible architecture of the finger finder unit.
  • Figure 5 shows a possible architecture of the finger detection unit.
  • Figure 6 shows details of finger template generation.
  • Figure 7 shows a set of possible states for a user interface using a simple button type graphical element (widget).
  • Figure 8 shows a set of possible states for a user interface using a button type widget, where the confirmation region appears in an alternative position.
  • Figure 9 shows a set of possible states for a user interface using a selection widget, where two confirmation regions appear alongside the widget.
  • Figure 10 shows a set of possible states for a user interface where the user selects an item from a hierarchical menu.
  • Figure 11 shows a set of possible states for a user interface where the user selects an item from a radial menu.
  • Figure 12 shows a set of possible states for a user interface where the user adjusts a value using a linear adjustment widget.
  • Figure 13 shows an example of clipping the cursor position to the visible screen area if the pointer points outside the visible screen area.
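Clipping the cursor to the visible screen area, as mentioned for Figure 13, might look like the following. It is a minimal sketch assuming pixel coordinates with the origin at the top-left corner; the function name `clip_cursor` is hypothetical.

```python
def clip_cursor(x: float, y: float, width: int, height: int) -> tuple:
    """Clamp a cursor position to the visible area [0, width-1] x [0, height-1]."""
    clipped_x = min(max(x, 0), width - 1)
    clipped_y = min(max(y, 0), height - 1)
    return clipped_x, clipped_y
```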
  • the pointer (which may be a finger or other object)
  • FIG. 1 illustrates a device including a user interface in accordance with an exemplary embodiment of the invention.
  • a display device 6 forming part of a TV set 7 is equipped with tracking sensors 8 and a tracking system (in this example a tracker processing unit 810) arranged to track at least one user 1 in a certain (imaginary) region of space 10, at least in front of the TV set at a reasonable viewing distance and viewing angle.
  • the TV set 7 incorporates a GUI control unit.
  • the tracker processing unit 810 determines the position of a user's face (or eye) 2 relative to the display surface, and also determines the presence in space of an object (such as a finger 3 of the user or a part (eg a tip) of a finger 3 of the user) in a start region.
  • the tracker processing unit 810 determines, at least some of the time, a cursor position 5 as the approximate point of intersection between the plane of the display 6 and an imaginary straight line 4 through the user's face (or eye) and finger. Note that the cursor position need not always be inside the displayable area.
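The cursor position is the intersection of the eye-to-finger line with the display plane, which is a standard ray-plane intersection. The sketch below assumes a coordinate system that the text does not specify: the display lies in the plane z = 0, the user sits at z > 0, and all coordinates share one unit. The function name `cursor_position` is hypothetical.

```python
def cursor_position(eye, finger):
    """Intersect the line through eye and finger with the display plane z = 0.

    eye, finger: (x, y, z) tuples, with z the distance in front of the display.
    Returns (x, y) in the display plane, or None if the finger is not
    closer to the display than the eye (no forward intersection).
    """
    ex, ey, ez = eye
    fx, fy, fz = finger
    dz = fz - ez
    if dz >= 0:
        return None  # line does not head towards the display
    t = -ez / dz  # parameter at which eye + t * (finger - eye) reaches z = 0
    return ex + t * (fx - ex), ey + t * (fy - ey)


# Eye 2 m from the screen, fingertip 0.5 m in front of the eye.
point = cursor_position((0.0, 0.0, 2.0), (0.1, 0.05, 1.5))
```

The result may lie outside the displayable area, consistent with the note above; a caller would decide whether to clip or ignore such positions.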
  • a GUI control unit 700 connected to or part of the TV set 7 draws GUI elements 9 on the display, possibly including a cursor at the cursor position 5.
  • the GUI control unit 700 and the tracker processing unit 810 may, as is obvious, be arranged in a physically different way, for example inside the TV casing, or in a box separate from the TV; together or separately and connected by a signalling medium.
  • FIG. 2 illustrates the preferred embodiment of the tracking subsystem, consisting of sensors 8 and processing unit 810.
  • a pair of sensors, in this example cameras 8, are arranged to face the region of space 10 in which the user 1 is likely to be, and are connected to a processing unit 810 (which naturally need not be in the same housing as the cameras, so long as it can receive the image data without significant delay).
  • the cameras are joined rigidly (but perhaps adjustably) so they cannot move relative to each other by accident.
  • the cameras should preferably be firmly attached relative to the display, for the same reason.
  • each frame captured from the camera views can be mapped by a rectification unit 801 into a common coordinate system in a standard process widely known as stereo rectification. This process is well described in chapter 12 of "Learning OpenCV - Computer Vision with the OpenCV Library" by Gary Bradski & Adrian Kaehler, published in 2008 by O'Reilly Media, Inc.
  • the rectification unit may additionally provide other services, such as partial removal of digital noise and automatic gain control for the cameras (to ensure that the image is neither under-exposed nor over-exposed, either of which would lead to poor tracking results).
  • the requirements for the cameras depend on the quality of the optics and sensor, on their spacing and alignment, on the number and position(s) of the user(s), on the lighting levels and the complexity of the surrounding visual field.
  • good results can be obtained with average quality 640x480 monochrome sensors sampling synchronised frames at up to 48Hz, with 6mm lenses, well aligned (that is, with near to parallel optical axes and near to parallel vertical axes), and with a baseline stereo separation of 12cm.
  • a disparity unit 802 does this by matching regions in the rectified camera views and calculating the disparity between corresponding object pixels in the two views. For example good results may be obtained using the OpenCV software library function cvFindStereoCorrespondenceBM as described in chapter 12 of "Learning OpenCV - Computer Vision with the OpenCV Library" by Gary Bradski & Adrian Kaehler, published in 2008 by O'Reilly Media, Inc.
  • a face finder unit 803 detects the position of any user face in the scene.
  • a finger finder unit 804 locates and tracks a finger over time.
  • a cursor position unit calculates the approximate position of the cursor in the plane of the display 6, or else reports that it cannot detect any pointing activity in the scene.
  • the rectification unit 801 is not strictly necessary; however, as is well known, it is commonly used because it allows a computationally much cheaper algorithm to be used in the disparity unit 802. It also allows greater accuracy in mapping image points to real points in 3D, since it accounts for camera distortion. However, as an alternative it would be easy, and cheaper, to correct the resulting finger and head coordinates after tracking using uncorrected image data.
  • Human face detection is a well understood problem in computer vision that can be solved in various ways.
  • a common method is to create beforehand a generalised face template from a number of face photographs, and search for regions of the frame that bear resemblance to the face template.
  • the template need not be a simple bitmap; one method is to represent the face as a weighted sum of simple signals, or wavelets, which can then be matched against regions of the frame.
  • Many of the face matching algorithms are illumination invariant, that is, robust against changes in illumination.
  • One suitable and very well known method for face detection is the method of Haar classifiers, described by Lienhart and Maydt in "An Extended Set of Haar-like Features for Rapid Object Detection" (ICIP02, pp. I:900-903, 2002) and implemented in the OpenCV machine vision library, available from http://sourceforge.net/projects/opencvlibrary/.
  • Some face recognition systems return a list of face candidates with coordinates and perhaps a confidence measure. If the face recognition algorithm is applied to both camera views independently, the results can be merged, thereby improving match confidence; a good estimate for face disparity can also be found in this case.
  • the face finder unit 803 will preferably predict face position when the face is temporarily obscured or tilted away, and keep track of several users in the vicinity. In this way the system can be made to handle several users at once, as will be explained later.
  • Figure 3 shows the geometry relevant to calculating the start region 44, which is the region of space in which the finger should be found in order for detection to occur and the GUI to start up.
  • the start region obviously depends on the position of the user's face.
  • the position of the finger depends on the amount of extension of the arm 41. It also depends on the choice of left or right arm, the choice of left or right eye (usually the dominant eye), the distance and elevation of the camera system 8 and the way the user is feeling. If the user is using another object (such as a can or book) to point with, this will be held in a similar position.
  • the allowed set of cursor starting positions 42 may be fairly large, and subtends an irregular cone-like shape 43 at the user's eye.
  • the cone-like shape 43 need not include the camera: this is simply a convenience in case the starting instruction for the user is "point at the camera".
  • the cameras may be below the display, but the user is told to point to the top of the display to start interaction. This does not significantly change the method of calculation of the start region. It is preferable, but not essential, to position each camera so that its image plane is roughly parallel to the near and far faces of the start region 44.
  • Figure 4 shows the finger finder unit 804 in more detail. It inputs rectified image data 811 from the rectification unit 801, disparity data 812 from the disparity unit 802 and face position data 813 from the face finder unit 803.
  • the finger finder can use continuous information from these sources; however, processing is significantly cheaper if the finger finder 804 requests information only for relevant parts of the scene.
  • the finger detection unit 814 needs disparity information only in a small area of the image near to the face region (the projection of the start region 44 onto the image plane as seen through the camera lens); thus the disparity unit need calculate only a much smaller region of disparity than the whole image.
  • the finger detection unit 814 uses the face position to determine the start region, and uses disparity information to perform the search.
  • a template generation unit 815 records information about the finger image and disparity, in the form of a template, for further tracking.
  • the finger finder unit 804 enters a tracking phase.
  • the tracking unit 816 searches each successive camera frame for the template stored earlier. So long as the tracking unit 816 continues to find the finger with sufficient frequency and confidence the finger finder unit 804 remains in tracking mode.
  • the found finger coordinates are passed to an optional filtering unit 817. The purpose of this unit is to remove unwanted effects such as jitter, and generally to condition the coordinates reported by the finger finder unit 804 as a whole. If the tracking unit 816 determines that tracking is lost the finger finder unit 804 returns to finger detection mode for the finger detection unit 814 to wait again for a finger in the start region 44.
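The alternation between finger detection mode and tracking mode can be sketched as a two-state machine. The class name `FingerFinderModes`, the confidence threshold and the number of tolerated missed frames are all assumptions chosen for illustration; the text only requires the finger to be found "with sufficient frequency and confidence".

```python
class FingerFinderModes:
    """Two-state sketch: 'detect' waits for a finger, 'track' follows it."""

    def __init__(self, min_confidence: float = 0.6, max_missed: int = 5):
        self.min_confidence = min_confidence  # assumed match-quality threshold
        self.max_missed = max_missed          # assumed tolerated bad frames
        self.mode = "detect"
        self.missed = 0

    def step(self, finger_in_start_region: bool, track_confidence: float) -> str:
        """Process one frame and return the mode after that frame."""
        if self.mode == "detect":
            if finger_in_start_region:
                # A template would be generated here before tracking starts.
                self.mode = "track"
                self.missed = 0
        elif track_confidence >= self.min_confidence:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed > self.max_missed:
                self.mode = "detect"  # tracking lost: wait for a finger again
        return self.mode
```

Tolerating a few poor frames before dropping back to detection is one way to avoid losing the interaction on a single noisy frame.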
  • a start region calculation unit 8141 uses the face position to estimate the start region.
  • the distance from the camera to the face may be estimated in one of several ways.
  • the preferred method is to consider the disparity between face positions in the left and right images and to use triangulation.
  • Other methods include estimating face distance by its apparent size (and assuming a "normal" face size), or by calculating some kind of maximum of the disparity in the face region of the disparity data 802. A combination of these methods may be used to improve the estimate.
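For rectified, parallel cameras, triangulation from disparity reduces to the standard relation Z = f·B/d, with focal length f in pixels, baseline B and disparity d in pixels. The sketch below uses the 12cm baseline mentioned earlier; the focal length of 800 pixels and the function names are assumptions for illustration only.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float = 0.12):
    """Distance along the optical axis, in metres, for a rectified stereo pair."""
    if disparity_px <= 0:
        return None  # zero or negative disparity: no finite depth estimate
    return focal_px * baseline_m / disparity_px


def face_distance(x_left_px: float, x_right_px: float, focal_px: float):
    """Estimate face distance from its horizontal position in each view."""
    return depth_from_disparity(x_left_px - x_right_px, focal_px)


# Face centre at x = 348 in the left view and x = 300 in the right view,
# with an assumed focal length of 800 pixels.
distance = face_distance(348.0, 300.0, 800.0)
```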
  • the blob detection unit 8142 looks for contiguous regions within the disparity information 812 which are inside the projection of the start region 44 onto the image plane.
  • the disparity information may be compared with threshold values to determine which pixels lie at about the right distance from the camera, and the pixels which pass the test are grouped into contiguous or nearly contiguous areas, which we call blobs. Grouping may be done using simple morphological operators (such as dilation) and simple graph connectivity algorithms, as is well known.
  • Each blob represents a possible candidate for the position of a finger.
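Thresholding the disparity data and grouping the surviving pixels can be sketched with a breadth-first flood fill. This version uses 4-connectivity on a plain list-of-lists disparity map and omits the dilation step, so it finds strictly contiguous blobs only; the function name `find_blobs` and the threshold parameters are hypothetical.

```python
from collections import deque


def find_blobs(disparity, d_min, d_max):
    """Return a list of blobs, each a list of (row, col) pixels whose
    disparity lies in [d_min, d_max] and which are 4-connected."""
    rows, cols = len(disparity), len(disparity[0])
    seen = [[False] * cols for _ in range(rows)]

    def in_range(r, c):
        return d_min <= disparity[r][c] <= d_max

    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not in_range(r, c):
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols \
                            and not seen[ny][nx] and in_range(ny, nx):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


grid = [
    [0, 0, 0, 0, 0],
    [0, 50, 52, 0, 0],
    [0, 51, 0, 0, 49],
    [0, 0, 0, 0, 50],
]
blobs = find_blobs(grid, 45, 55)
```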
  • a blob filtering unit 8143 examines all the blobs and chooses likely candidates to be the blob corresponding to a finger. A convenient way to do this is to compute a likelihood score for each blob, and discard blobs whose score lies below a threshold, determined by trial and error.
  • blobs which are too small may be due to noise in the disparity graph, or represent only part of a larger object, and so receive a low score.
  • Blobs which are too large may represent another person walking in front of the detected face (or perhaps a child on a lap), and may also receive a low score.
  • Higher scores may be given to blobs nearer to the centre of the start region, and so on.
  • the precise values and scoring functions may be worked out by simple geometry, but depend on the physical properties of the particular cameras, their arrangement in space, and viewer distance, so it is not possible to give universal equations.
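As the text notes, no universal scoring function exists, so the following is purely illustrative: a score that is zero for blobs outside an assumed area range and otherwise falls off linearly with the distance of the blob centroid from the centre of the start region. The function names and every numeric parameter are assumptions to be tuned for a particular camera arrangement.

```python
import math


def blob_score(pixels, region_centre, min_area, max_area, radius):
    """Score in [0, 1]; pixels is a list of (row, col) positions."""
    area = len(pixels)
    if area < min_area or area > max_area:
        return 0.0  # too small (noise) or too large (e.g. another person)
    centroid_row = sum(r for r, _ in pixels) / area
    centroid_col = sum(c for _, c in pixels) / area
    distance = math.hypot(centroid_row - region_centre[0],
                          centroid_col - region_centre[1])
    return max(0.0, 1.0 - distance / radius)


def filter_blobs(blobs, threshold, **score_args):
    """Keep only the blobs whose score reaches the threshold."""
    return [b for b in blobs if blob_score(b, **score_args) >= threshold]


square = [(10, 10), (10, 11), (11, 10), (11, 11)]
speck = [(0, 0)]
kept = filter_blobs([square, speck], 0.5, region_centre=(10.5, 10.5),
                    min_area=3, max_area=100, radius=20.0)
```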
  • a blob validation unit 8144 is optionally employed to remove false positives.
  • This unit may wait for several contiguous frames containing a successful blob before allowing the best one through.
  • a reasonable value might be three consistent blobs in a row, depending on the frame rate of the system.
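Waiting for several consecutive successful frames can be sketched with a simple streak counter. The class name `BlobValidator` is hypothetical, and this version only counts consecutive frames that contain a candidate; checking that successive blobs are spatially consistent with each other is left out and would be a natural extension.

```python
class BlobValidator:
    """Pass a blob through only after `required` consecutive successful frames."""

    def __init__(self, required: int = 3):
        self.required = required
        self.streak = 0

    def update(self, best_blob):
        """best_blob is the frame's best candidate, or None if there was none."""
        if best_blob is None:
            self.streak = 0  # any gap resets the count
            return None
        self.streak += 1
        return best_blob if self.streak >= self.required else None
```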
  • The finger detection unit 814 outputs the representation of the blob, preferably including data to determine which pixels form the blob, and the coordinates of the bounding box containing the blob. Note that this method will detect and locate any object of approximately the right size inside the start region, and does not require any particular hand shape or skin colour, or even that the hand is empty.
  • The template generation unit 815 receives successful blob data from the finger detection unit.
  • Different information must be stored as a template to represent the blob.
  • The template is just the region of the image data (from at least one of the cameras) which corresponds exactly to the blob.
  • The set of positions of pixels in the blob may be regarded as a mask, so that only pixels of interest to the template are selected by the mask.
  • Figure 6 shows the finger detection and template acquisition for easier understanding.
  • The rectified left camera view 61 and rectified right camera view 62 of course appear slightly different because each camera has a different perspective. Here they are shown in outline only, for clarity.
  • The disparity map 63 corresponding to a possible start region is calculated.
  • The large upper feature 64 corresponds to the face, and the large lower feature 65 corresponds to the hand and finger.
  • The fine contour lines within show small gradations of disparity. Other small features may correspond to noise, where the disparity unit has failed to make a good estimate of disparity, and it remains to construct a template without being overly confused by the noise.
  • The template generation unit 815 uses the remaining blob 67 as a mask and its position in the (in this case) left image to extract a template 68 containing only the finger and hand.
  • The tracking unit 816 considers each frame of input and tries to find the position of the template within the frame.
  • There are many possible tracking algorithms, as is well known. A simple method is to search for the best match with the template at each position in the image. If the best match is good enough then tracking succeeds and outputs the coordinates of the best match. Otherwise tracking fails.
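
The simple exhaustive search can be sketched as follows. This is an illustration only: it assumes greyscale images held as 2D lists, uses the sum of absolute differences as the match cost (the text does not name a particular measure), and ignores the blob mask. The function name `match_template` and the threshold `max_cost` are hypothetical.

```python
def match_template(image, template, max_cost):
    """Exhaustively search `image` for the best match with `template`.

    Returns the (x, y) of the top-left corner of the best match, or None
    if even the best match costs more than `max_cost` (tracking fails).
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_cost, best_pos = None, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # Sum of absolute differences between template and image patch.
            cost = sum(abs(image[y + j][x + i] - template[j][i])
                       for j in range(th) for i in range(tw))
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (x, y)
    if best_cost is not None and best_cost <= max_cost:
        return best_pos
    return None
```

In practice the search would normally be restricted to a window around the previous position rather than covering the whole frame.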
  • Many more efficient and suitable algorithms for tracking an object through a sequence of frames are available, including feature tracking using SIFT or SURF features, as is widely known.
  • Once a candidate location for the best tracking position is known it can be checked in the disparity map to confirm that it has the correct 3D position (using similar criteria as were used for blob candidate selection in the blob filtering unit 8143). Again, this requires only a small region of disparity, of the size of the template, and so is efficient to compute.
  • The template may be updated during tracking using subsequent frames, in order to take account of changes to the appearance of the finger over time (due for example to changes in lighting, hand shape, orientation or perspective).
  • This is easily done since the point of match is known, and the disparity information has been obtained at that point to check the match. This information can be used to acquire and mask a new template.
  • The tracker terminates tracking when it can no longer predict the finger position with sufficient confidence, for example if the block match score is low for too many successive frames, or if the disparity check fails. Additionally the tracker may terminate following a signal from the GUI, for example when the user finalises an interaction by selecting Cancel or OK. Once the tracker terminates, the finger detector may start searching again for a finger in the start region 44.
  • It has been found advantageous to apply an edge detection filter to both the template and the search region before running the search. This results in a more robust match of the finger position.
  • A suitable and well known edge detection filter for this purpose is the Canny filter.
  • The cursor is controlled by the user's finger, in a conventional pointing pose.
  • The locus of the pointer on the screen, that is, the cursor position, is computed by 3D geometry in the cursor position unit 805; in the preferred embodiment, the cursor lies at the intersection of the display and a line that extends from the user's eye through the finger onto the screen. In other words, the cursor appears in the position where the user is pointing, as the user would naively expect.
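
The eye-finger-display geometry reduces to a ray and plane intersection. The sketch below assumes a coordinate system registered with the display in which the display lies in the plane z = 0 and z increases toward the viewer; the function name `cursor_on_display` and this choice of axes are assumptions made for illustration.

```python
def cursor_on_display(eye, finger):
    """Intersect the eye-to-finger line with the display plane z = 0.

    `eye` and `finger` are (x, y, z) points in display-registered
    coordinates, with z > 0 in front of the display. Returns the (x, y)
    cursor position on the display plane, or None if the finger is not
    nearer to the display than the eye (no forward intersection).
    """
    ex, ey, ez = eye
    fx, fy, fz = finger
    dz = fz - ez
    if dz >= 0:
        return None
    # Parameter t at which eye + t * (finger - eye) reaches z = 0.
    t = -ez / dz
    return (ex + t * (fx - ex), ey + t * (fy - ey))
```

For example, an eye 2 units from the display and a finger 0.5 units in front of the eye give t = 4, so a finger offset of 0.1 sideways moves the cursor 0.4 units, which is why small errors in the tracked positions are magnified on the screen.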
  • The cursor position may be calibrated as follows: the user points to (at least) the four corners of the display in succession, while the computer records the corresponding face and finger coordinates for each corner. Subsequently in use the actual cursor position is determined by interpolation of these recorded calibration coordinates, for example, using the well known technique of barycentric coordinates for interpolation as described by Bradley in "The Algebra of Geometry: Cartesian, Areal and Projective Co-ordinates" (Bath: Highperception), scaled and offset depending on the relative face position.
  • This calibration, or its equivalent, may be performed at the factory in a case when the camera is bonded immovably to the display panel.
  • Latency is the difference in time between a user's action and the user seeing the movement of the cursor.
  • Latency can be reduced by using faster components, such as camera and processing units, and by careful implementation of the tracking algorithms.
  • Jitter, or apparently random fluctuations of the cursor position, leads to a poor user experience.
  • Vision based tracking algorithms are notoriously prone to jitter, due to noise in the camera sampling, and due to the complexity of visual scenes.
  • The amplitude of jitter is ideally significantly smaller than the size of the GUI elements such as selection boxes.
  • The well known Kalman filter, as described by Kalman in "A new approach to linear filtering and prediction problems" (Journal of Basic Engineering 82(1): 35-45), may be used to smooth the motion of the finger position without introducing latency.
  • A simple averaging filter may be used too, though this can add latency.
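
The simpler of the two options, an averaging filter, might look like the sketch below. The class name `AveragingFilter` and the `window` parameter are hypothetical; a Kalman filter would replace the plain mean with a predictive motion model.

```python
from collections import deque

class AveragingFilter:
    """Smooth 2D positions with a moving average over the last few samples."""

    def __init__(self, window=5):
        # Old samples fall out automatically once the window is full.
        self.samples = deque(maxlen=window)

    def update(self, x, y):
        """Add a new raw position and return the smoothed position."""
        self.samples.append((x, y))
        n = len(self.samples)
        return (sum(p[0] for p in self.samples) / n,
                sum(p[1] for p in self.samples) / n)
```

A longer window suppresses more jitter but makes the cursor lag further behind the finger, which is the latency cost the text refers to.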
  • Cursor position jitter can be reduced by directly filtering the output of the cursor position unit 805.
  • The inputs to the cursor position unit 805, i.e. the finger position and the head position, can also be filtered.
  • Since small amounts of jitter in the face detection position can result in large cursor movements on the display, it may be preferable to capture the face position at the time that a template is acquired, and use that same head position in calculation of the cursor position until the cursor is released, even if the user's head moves (or appears to move, due to inaccuracies in the face finding algorithm).
  • The finger finding unit may therefore compensate by assuming this intention to point, noting the detected finger position and the corresponding offset from the assumed position, and inverting the offset to give an adjustment which would compensate for the erroneous cursor position. This adjustment is then applied to the subsequent tracking, with the result that the cursor lies closer to where the user expects it should be.
  • The face finder unit 803 algorithm is adjusted to return the positions of all the faces in the camera's field of view. It must also keep a predictive model of the face positions, so that the different users are kept distinct, and tracked even if their faces are briefly not detected. However, if a face is not detected for more than a few seconds it should assume the user is no longer present (or has moved to a different location). The result of this step is that at any time there is a list of users with their probable face positions.
  • For each face position there is a corresponding start region 44; thus the finger detection unit 814 must check all of the start regions for a finger, until it finds a finger present. A template is generated for that finger, and the tracking unit 804 then tracks that template until tracking terminates, with the cursor position unit 805 using that tracked position and the corresponding user's face position to calculate the cursor position. The finger detection unit does not begin to detect a new finger until tracking terminates on the previous finger. In this way, only one user at a time controls the cursor.
  • The face finder unit may also incorporate a face recognition unit, so that different viewers may be distinguished and remembered from session to session.
  • Using a face recognition unit would allow only certain authorised viewers to control the display, or provide different capabilities for different viewers. This would be of use in a public space, where members of the public might have less authority over the display than previously registered officials; or for TV, where for example children may be given only restricted viewing controls.
  • Although the invention may be used with virtually no instruction, it may be advantageous to allow users to practise the kind of steady pointing they need to operate such an interface.
  • A face recognition unit could be used so that when a user not seen before by the system starts to operate the interface more help is given.
  • In this "beginner mode", instructions can be printed on the display, and fewer widgets could be available, with a method for enabling the normal, "advanced mode" after a suitable amount of familiarisation. It may be advantageous to present a "doodling mode" in which the user can practise pointing and controlling the cursor without any menu items to clutter the display.
  • A sensitivity zone can be defined.
  • The sensitivity zone is a region in front of the user within which all pointing actions take place and therefore within which the finger or pointing device should be tracked.
  • The sensitivity zone encompasses all spatial locations in which a pointing finger could lie whilst pointing either at the display or within an extra margin beyond the display (to allow for tracking and user inaccuracy). When the finger or pointing device leaves the sensitivity zone, the pointing action may be considered finished.
  • The extent of the sensitivity zone may depend on the 3D location of the detected face. Restricting tracking to this zone naturally reduces the amount of computation required (since the zone will generally contain fewer pixels than the entire captured image).
  • Any method can be used to obtain a two-dimensional picture along with depth information. Possible ways of doing this include, but are not limited to, special depth cameras, cameras or other sensors mounted on, beside, above or below the viewing area, ultrasonic sensing and so on. In the absence of complete depth information about the scene, certain things may be inferred from a single two-dimensional image using indirect metrics such as apparent eye separation for head position.
  • Colour cameras are not required but, if they are available, then colour information may be incorporated in the tracker. If the system is required to perform in a dark or dimly lit environment, then a camera sensitive to infrared light may be used, optionally with an infrared light source.
  • An embodiment of the invention makes use of the observation that the user's head moves less than the finger or pointing device.
  • Finding the face, potentially one of the most computationally expensive parts of the system, can be performed less often than finding the finger or pointing device.
  • The face location can be estimated from previous locations. It is also possible to switch from the robust but expensive generic face finding mode to a simpler template finding face tracker once the face has initially been found.
  • The positions of the face and finger or pointing device are filtered in order to reduce cursor jitter. Such filtering should be applied in such a way that the latency of the system is not increased. Predictive filters such as the Kalman filter can be used for this purpose, as is well known.
  • When the user points at the camera to activate the system, a cursor appears on the display.
  • The position of the cursor corresponds to the position of the user's head and finger. As the user moves the finger, the cursor follows the motion.
  • The cursor can appear on the display surface at its intersection with an imaginary line that extends from the user's eye through the pointing finger.
  • A 3D position of any object in the view of the cameras can be computed in a coordinate system that is registered with the display. It is therefore possible to compute in three dimensions the position of the face, and the position of the finger, and find the locus of the cursor on the screen without requiring a separate calibration step.
  • Figures 7-12 show some possible states for a user interface, including examples of graphical user interface elements or widgets that could be used by such an interface.
  • The figures show only the user interface layer. In a practical system, the user interface layer would typically be overlaid on other information displayed.
  • State (a) shows the state where no user interface elements are displayed. This is the normal state when the user is not interacting with the device. Typically the display would be filled with content the user is watching, such as a TV programme (but for clarity this is not shown in the diagram).
  • The cursor is shown as a typical 'mouse pointer' shape but may be of any desired shape or image.
  • A number of widgets may exist for various types of interaction.
  • A simple selection widget as shown in Figure 7 may be the simplest one.
  • When the user interface is activated, only the item region 71 is shown.
  • A confirmation region 72 appears beside it.
  • The location of this confirmation region may depend on a variety of factors including the position of other widgets and the direction from which the user approaches the widget.
  • When the user moves the cursor onto the confirmation region, the confirmation region is activated and the GUI control unit 700 may perform an action associated with the selection widget.
  • The range of cursor positions required to activate a confirmation region may correspond to the area of the confirmation region shown on the display. Alternatively, said range of cursor positions may be smaller than the area of the confirmation region in order to mitigate effects such as cursor jitter and unintended motion of the user's finger.
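
A reduced activation area can be sketched as a hit test against a rectangle shrunk by a margin on every side. The `Rect` type, the function `confirmation_activated` and the `margin` parameter are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        """True if the point lies inside the rectangle."""
        return (self.x <= px < self.x + self.w
                and self.y <= py < self.y + self.h)

    def shrunk(self, margin):
        """Return a copy inset by `margin` on every side (never negative)."""
        return Rect(self.x + margin, self.y + margin,
                    max(self.w - 2 * margin, 0), max(self.h - 2 * margin, 0))

def confirmation_activated(region, cursor, margin=0.0):
    """Activate only when the cursor is inside the shrunken region."""
    return region.shrunk(margin).contains(*cursor)
```

With `margin=0` the activation area equals the displayed region; a margin comparable to the expected jitter amplitude prevents a cursor hovering near the edge from triggering the action.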
  • The position of the confirmation region of an item is determined by the position and direction that the cursor entered the item.
  • The confirmation region may be placed opposite or nearly opposite to the point of entry, as shown in Figure 7, so that the cursor continuing its motion traverses the whole of the item before confirmation is decided.
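
One way to place the confirmation region opposite the point of entry is to look at the cursor's direction of travel as it enters the item. The sketch below is an assumption-laden illustration: rectangles are (x, y, w, h) tuples with y increasing downward, only the dominant axis of motion is considered, and `place_confirmation` and `size` are hypothetical names.

```python
def place_confirmation(item, prev, curr, size):
    """Place a confirmation region on the side of `item` opposite the entry.

    `item` is (x, y, w, h); `prev` and `curr` are the cursor positions just
    before and just after entering the item; `size` is the thickness of the
    confirmation region. Returns the region as (x, y, w, h).
    """
    x, y, w, h = item
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    if abs(dx) >= abs(dy):
        # Mostly horizontal motion: put the region on the far left or right.
        cx = x + w if dx >= 0 else x - size
        return (cx, y, size, h)
    # Mostly vertical motion: put the region on the far top or bottom.
    cy = y + h if dy >= 0 else y - size
    return (x, cy, w, size)
```

Because the region sits on the far side, a cursor that keeps moving in the same direction crosses the whole item before reaching it.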
  • The selection widget may have multiple confirmation regions, each of which selects a different state. This way it is possible to choose from a number of discrete options.
  • A simple example using two confirmation regions for 'Yes' and 'No' is shown in Figure 9.
  • The action of pointing at the camera brings up an on-screen display, an example of which is shown in Figures 10-12 as State (b). If the start target region lies above the display, then the user must move his or her finger downward (into the visible screen area) to make a selection.
  • The on-screen display can display any number of widgets and the user may select one by pointing at it. Once the cursor is on top of a widget, the on-screen display changes to reflect the selection.
  • State (c) is an example of the user selecting a widget by moving the cursor over it.
  • The user interface items may represent choices, such as which channel to watch; or variable parameters, such as volume control; or menus and submenus to allow navigation to display more sets of user interface items.
  • Some items operate by making a separate confirmation item appear. In these cases, the effect of the selection (such as channel changing) is not confirmed until the cursor passes over the confirmation item.
  • The user moves the cursor into the confirmation region. As soon as the cursor reaches the confirmation region, the selection is made.
  • The confirmation region is removed from the display as if the cursor never entered the selection item area in the first place. This allows the user to back out of a decision at a late stage, and also allows for some accidental errors in finger position to be rendered harmless.
  • The finger tracker can deliver cursor coordinates at only a certain rate; therefore the cursor may cross a widget from one side to the other without appearing at any point on top of that widget. To prevent this causing annoyance to the user it is possible to interpolate intermediate cursor positions between the actual tracked positions. These intermediate positions are then treated in exactly the same way as if they were actual cursor positions.
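
Interpolating intermediate cursor positions can be sketched with straight-line interpolation between two successive tracked positions. The function name `interpolate_path` and the `step` spacing are hypothetical; the text does not specify the interpolation scheme, so linear interpolation is an assumption.

```python
import math

def interpolate_path(p0, p1, step=4.0):
    """Return points from p0 (exclusive) to p1 (inclusive), at most `step` apart.

    Each returned point can be fed to the widgets as if it were a real
    tracked cursor position.
    """
    dist = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    n = max(1, math.ceil(dist / step))
    return [(p0[0] + (p1[0] - p0[0]) * i / n,
             p0[1] + (p1[1] - p0[1]) * i / n)
            for i in range(1, n + 1)]
```

Choosing `step` smaller than the narrowest widget ensures that a fast sweep cannot jump over a widget without at least one interpolated position landing on it.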
  • Figure 10 shows the use of a multi-level menu.
  • State (b) shows an initial state, to be entered when the GUI begins.
  • In State (c), the user selects a menu by moving the cursor over it. This causes a category menu to appear.
  • A category submenu appears.
  • The user may select another category, which will be reflected by the submenu.
  • A confirmation box appears.
  • The user may now select the item by moving the cursor over the confirmation box, as in State (f).
  • This style of interaction may be used with any depth of hierarchy.
  • One or more of the menu entries may perform a cancellation action, taking the user back to a previous state.
  • This works in the same way as a button widget or a menu entry, except that it is labelled with a label such as "CANCEL" to indicate its function and that when the cursor moves over it and then, optionally, its confirmation region, any pending actions are cancelled rather than confirmed.
  • A radial menu is represented as a number of sectors centred around a middle region.
  • The confirmation region may appear along the outer perimeter of the sector so that, when the cursor continues moving along a straight path, the item is first selected and the selection then confirmed.
  • Figure 11 shows the use of an example radial menu.
  • The user selects the menu by moving the cursor over the specified region, as in State (c).
  • This action brings up the radial menu, centred around the region, as shown in State (d).
  • The user may now move the cursor over a menu entry to select it, as in State (e).
  • A confirmation box now appears, the selection of which will confirm the menu selection as in State (f).
  • the radial menu may be used with any depth of hierarchy. If further levels of hierarchy are desired, they may appear as concentric menus around, or start new menus of the same or different type elsewhere on the display. If the cursor in State (e) moves not to the confirmation region but to a different item, the pending item becomes deselected, the new item becomes pending, and the previous confirmation item is replaced by another in the appropriate new position corresponding to the new pending item.
  • The radial menu may be fully visible, as pictured, or it may be partly visible, for example starting from an edge or a corner of the screen.
  • One or more of the menu entries may perform a cancellation action, optionally with a confirmation region, which takes the user back to a previous state.
  • One or more of the menu entries may be missing, leaving a gap in the radial menu. This gap may be used as an exit route for the cursor so that widgets may be accessed outside the radial menu.
  • Figure 12 shows the use of a linear adjustment widget, used to select a value from a range which may be either continuous or discrete.
  • A linear adjustment widget may be used to control the volume of the TV audio.
  • The user selects, in State (c), a menu entry which brings up the adjustment widget in State (d).
  • In State (e), the adjustment indicator 76 follows the cursor to give feedback.
  • The present value of the slider (such as volume level as a number) may be overlaid on or near the adjustment indicator.
  • The user confirms the adjustment by moving the cursor over a confirmation region as in State (f).
  • Linear adjustment sliders can also be used to make a selection from a set of discrete entries, particularly when the entries can be placed in an order that makes sense to the user.
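
The mapping from cursor position to slider value, for both the continuous and the discrete case, can be sketched as below. It assumes a horizontal slider; `slider_value` and its parameters (`left`, `width`, `lo`, `hi`, `steps`) are hypothetical names, and `steps` is assumed to be at least 2 when given.

```python
def slider_value(cursor_x, left, width, lo, hi, steps=None):
    """Map a cursor x coordinate to a value on a horizontal slider.

    `left` and `width` describe the slider on screen; `lo` and `hi` are the
    values at its two ends. If `steps` is given, the result snaps to the
    nearest of `steps` evenly spaced values (discrete entries).
    """
    # Fraction of the way along the slider, clamped to the slider's extent.
    frac = min(max((cursor_x - left) / width, 0.0), 1.0)
    if steps is None:
        return lo + frac * (hi - lo)
    index = round(frac * (steps - 1))
    return lo + index * (hi - lo) / (steps - 1)
```

The adjustment indicator would be drawn at the position corresponding to the returned value, so that in the discrete case it visibly snaps between entries.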
  • The adjustment widget is positioned in such a way that it replaces the menu entry, which ensures that the cursor is already in a suitable position.
  • An alternative is to position the widget away from the menu entry such that the user may observe the value first and decide whether it needs adjusting. If the value should be adjusted, the user can move the cursor over the adjustment widget and perform the adjustment normally. If the value should not be adjusted, the user can cancel the action.
  • A slider requires a certain amount of careful control from the user. For commonly used functions it is important to design the GUI to make it as easy for the user as is practical. We have found it advantageous to place a horizontal volume control slider at the bottom of the display. It is best if it is tall, say 10% of the height of the display, so that the user can easily keep the cursor within the slider region.
  • If the tracking unit is arranged to track positions of the user's finger in a region larger than that which corresponds to the visible screen area (that is to say, the sensitivity zone is sufficiently large), it may be desirable to limit, or clip, the cursor position such that it stays within the visible screen area.
  • Figure 13 shows an example of such clipping.
  • The visible screen area 81 is smaller than the area which can be tracked 82. If the user points in a position 83 such that the corresponding cursor position would lie outside the visible screen area, it may be desirable to clip the cursor position 84 such that the cursor, or a portion of the cursor, remains within the visible screen area.
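
Clipping amounts to clamping each coordinate to the screen bounds. The sketch assumes pixel coordinates with the origin at the top-left corner of the visible screen area; `clip_cursor` is a hypothetical name.

```python
def clip_cursor(x, y, width, height):
    """Clamp a cursor position to the visible screen area.

    The screen is assumed to span x in [0, width - 1] and
    y in [0, height - 1].
    """
    return (min(max(x, 0), width - 1), min(max(y, 0), height - 1))
```

To keep the whole cursor image visible rather than just its hot spot, the bounds could be inset by the cursor's size.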
  • It may be desirable to display other information, such as programme or channel names or alerts. These may appear within the user interface. They may disappear after a predefined timeout, or they may be cleared by the user by selecting them, optionally with a confirmation step.
  • Textual or other information may also appear as a result of the user making a selection. For example, placing the cursor over a channel name (but not confirming the selection) may display current and upcoming programme information on the screen.
  • One example where slider-type widgets could be used is a channel map, or electronic programme guide.
  • Such a function is implemented as a grid where the vertical axis represents channel selection and the horizontal axis represents time.
  • The user can use the arrow keys to navigate the channel map, and an 'OK' button to select a channel or record a programme.
  • The channel map is displayed on the screen with a vertical slider aligned alongside it. Adjusting the position of the vertical slider allows the user to choose a channel. As the channel selection is confirmed, the user may be shown a horizontal slider alongside the channel's programme information so that the user may select a programme. To see programme information for another channel, the user may return to the vertical slider.
  • The user may have several options to quit the on-screen display. Cancellation buttons may appear as selections on the menu system, or implicitly outside the visible screen area. Widgets may have their own 'return' or 'cancel' regions in addition to the confirmation regions. In addition, a predefined timeout may cancel the pointing sequence and quit the on-screen display. Additionally, if the tracker signals to the user interface that it can no longer track the pointer, the quit action may be taken.
  • The pending selection may be highlighted in some way, so that the user's attention is drawn to the effect of what would happen were the confirmation region to be selected.
  • A method of highlighting might be to temporarily change the colour of the pending selection. If the user's following actions are such as to back out of a pending selection, the pending selection highlight should be removed.
  • Some confirmation regions cause the pending selection or value to be accepted and also end the GUI session. However, when several selections or values are required it may be preferable not to end the GUI session without a further action from the user. For example, after adjusting "brightness" the user will perhaps like also to adjust "contrast", and so the GUI should not suddenly end when brightness has been confirmed.
  • The GUI control unit may respond to an interpolated course of the cursor passing over an element of the GUI.
  • The tracking system may be a face tracking system capable of locating a user's face, or a part thereof, in space.
  • The face tracking system may comprise a stereo camera.
  • The GUI control unit may generate a confirmation area that is displaced from the element of the GUI in the direction of travel of the cursor.
  • The GUI control unit may generate a confirmation area that is displaced from the element of the GUI in a direction different from the direction of travel of the cursor.
  • The GUI control unit may generate a confirmation area that is displaced from the element of the GUI in a direction different from a direction of travel of the cursor required to actuate the element of the GUI.
  • The GUI control unit may be adapted to initialise when the object tracking system detects an object in a first region of space, the first region of space defined on the basis of a determined location of the user's face and the start target.
  • The user interface may be adapted to derive a correction from a detected position of the object when the user points at the start target.
  • The unit for calculating the cursor position may be adapted to calculate the cursor position using the determined correction.
  • The user interface may be adapted to determine the first region of space.
  • The start target may be outside an active area of the display.
  • The object tracking system may be adapted to track an object within a second region of space, the second region of space determined on the basis of a determined location of the user's face.
  • The tracking system may be capable of locating a respective predetermined part of a plurality of users, the object tracking system may be capable of locating a plurality of objects in space, and the GUI control unit may be adapted to initialise when one of the objects points at the predetermined start target.
  • The GUI control unit may be adapted to initialise when the object tracking system detects an object in one of a plurality of first regions of space, each first region of space defined on the basis of a determined location of the face of a respective user and the start target.
  • The present invention provides a low cost and convenient user interface to an electronic product with a display.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Social Psychology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Position Input By Displaying (AREA)

Abstract

The invention relates to a user interface for a device, the device having a display or being networked with a device having a display, the interface comprising: a tracking system capable of locating a predetermined part of a user in space; an object tracking system capable of locating an object in space when the object points at the display; a unit for calculating a cursor position at which a user is pointing, using information from the tracking system and the object tracking system; a graphical user interface control unit for creating a graphical user interface (GUI) using information from the cursor position, displaying the GUI and the cursor on the display, and controlling the GUI; the GUI control unit being adapted to initialise when the object points at a predetermined start target; and the GUI control unit responding to the cursor passing over an element of the GUI.
PCT/JP2012/056218 2011-03-07 2012-03-06 User interface, device incorporating this interface and method of providing a user interface Ceased WO2012121404A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1103833.8A GB2488785A (en) 2011-03-07 2011-03-07 A method of user interaction with a device in which a cursor position is calculated using information from tracking part of the user (face) and an object
GB1103833.8 2011-03-07

Publications (1)

Publication Number Publication Date
WO2012121404A1 true WO2012121404A1 (fr) 2012-09-13

Family

ID=43923307

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/056218 Ceased WO2012121404A1 (fr) 2011-03-07 2012-03-06 User interface, device incorporating this interface and method of providing a user interface

Country Status (2)

Country Link
GB (1) GB2488785A (fr)
WO (1) WO2012121404A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074532A1 (en) * 2013-09-10 2015-03-12 Avigilon Corporation Method and apparatus for controlling surveillance system with gesture and/or audio commands
EP2921937A1 (fr) * 2014-03-17 2015-09-23 Omron Corporation Appareil multimédia, procédé de commande d'appareil multimédia, et programme de commande d'appareil multimédia
JP2018120331A (ja) * 2017-01-24 2018-08-02 株式会社ブループリント プログラム及び表示装置
WO2019110395A1 (fr) * 2017-12-08 2019-06-13 Sagemcom Broadband Sas Procede d'interaction avec un sous-titre affiche sur un ecran de television, dispositif, produit-programme d'ordinateur et support d'enregistrement pour la mise en œuvre d'un tel procede

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE536989C2 (sv) * 2013-01-22 2014-11-25 Crunchfish Ab Förbättrad återkoppling i ett beröringsfritt användargränssnitt
WO2017020954A1 (fr) * 2015-08-06 2017-02-09 Arcelik Anonim Sirketi Système de détection de mouvement multipoint et de surveillance d'utilisateur pour un dispositif d'affichage d'image
FR3076387B1 (fr) * 2018-01-04 2020-01-24 Airbus Operations Procede et systeme d'aide a la gestion de listes de commandes sur un aeronef.
CN114153349B (zh) * 2021-11-25 2024-11-12 深圳市鸿合创新信息技术有限责任公司 一种智能交互显示设备的光标移动控制方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008090640A (ja) * 2006-10-03 2008-04-17 Toyota Motor Corp Input device
JP2010045658A (ja) * 2008-08-14 2010-02-25 Sony Corp Information processing apparatus, information processing method, and information processing program
JP2010534895A (ja) * 2007-07-27 2010-11-11 ジェスチャー テック,インコーポレイテッド Enhanced camera-based input
JP2011039844A (ja) * 2009-08-12 2011-02-24 Shimane Prefecture Image recognition apparatus, operation determination method, and program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6531999B1 (en) * 2000-07-13 2003-03-11 Koninklijke Philips Electronics N.V. Pointing direction calibration in video conferencing and other camera-based system applications
CN100432897C (zh) * 2006-07-28 2008-11-12 上海大学 Non-contact position input system and method guided by the hand-eye relationship

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008090640A (ja) * 2006-10-03 2008-04-17 Toyota Motor Corp Input device
JP2010534895A (ja) * 2007-07-27 2010-11-11 ジェスチャー テック,インコーポレイテッド Enhanced camera-based input
JP2010045658A (ja) * 2008-08-14 2010-02-25 Sony Corp Information processing apparatus, information processing method, and information processing program
JP2011039844A (ja) * 2009-08-12 2011-02-24 Shimane Prefecture Image recognition apparatus, operation determination method, and program

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074532A1 (en) * 2013-09-10 2015-03-12 Avigilon Corporation Method and apparatus for controlling surveillance system with gesture and/or audio commands
US9766855B2 (en) * 2013-09-10 2017-09-19 Avigilon Corporation Method and apparatus for controlling surveillance system with gesture and/or audio commands
EP2921937A1 (fr) * 2014-03-17 2015-09-23 Omron Corporation Multimedia apparatus, method for controlling a multimedia apparatus, and program for controlling a multimedia apparatus
JP2018120331A (ja) * 2017-01-24 2018-08-02 株式会社ブループリント Program and display device
WO2019110395A1 (fr) * 2017-12-08 2019-06-13 Sagemcom Broadband Sas Method for interacting with a subtitle displayed on a television screen, device, computer program product and recording medium for implementing such a method
FR3074938A1 (fr) * 2017-12-08 2019-06-14 Sagemcom Broadband Sas Method for interacting with a subtitle displayed on a television screen, device, computer program product and recording medium for implementing such a method

Also Published As

Publication number Publication date
GB2488785A (en) 2012-09-12
GB201103833D0 (en) 2011-04-20

Similar Documents

Publication Publication Date Title
WO2012121404A1 (fr) User interface, device incorporating the same, and method for providing a user interface
US20250130707A1 (en) Devices, methods, and graphical user interfaces for content applications
US11262840B2 (en) Gaze detection in a 3D mapping environment
US8881051B2 (en) Zoom-based gesture user interface
US10120454B2 (en) Gesture recognition control device
KR101227610B1 (ko) Image recognition apparatus, operation determination method, and computer-readable recording medium on which a program therefor is recorded
WO2012121405A1 (fr) User interface, device having a user interface, and method for providing a user interface
US8693732B2 (en) Computer vision gesture based control of a device
US9459758B2 (en) Gesture-based interface with enhanced features
KR101815020B1 (ko) 인터페이스 제어 장치 및 방법
CN108369630A (zh) Gesture control system and method for smart home
US20200142495A1 (en) Gesture recognition control device
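EP2853986B1 (fr) Image processing device, image processing method, and program
JP2018517984A (ja) Apparatus and method for video zoom by selecting and tracking an image region
KR20120040211A (ko) Image recognition apparatus, operation determination method, and computer-readable medium
JP2003316510A (ja) Display device for displaying a point indicated on a display screen, and display program.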
US20160147294A1 (en) Apparatus and Method for Recognizing Motion in Spatial Interaction
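CN120726688A (zh) Dynamic gesture recognition method, apparatus, electronic device, chip, and medium
CN120034605B (zh) Dynamic display method and system for a display screen of a mobile communication device
CN121742718A (zh) Virtual operation interaction method and system based on an intelligent display device
KR20200121513A (ko) Motion recognition-based operating apparatus and method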

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12755661

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12755661

Country of ref document: EP

Kind code of ref document: A1