WO2021007320A1 - Object detection in point clouds - Google Patents

Object detection in point clouds

Info

Publication number
WO2021007320A1
Authority
WO
WIPO (PCT)
Prior art keywords
points
dimensional
proposal
location
locations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2020/041200
Other languages
English (en)
Inventor
Jonathon Shlens
Patrick An Phu NGUYEN
Benjamin James CAINE
Jiquan Ngiam
Wei Han
Brandon Chauloon YANG
Yuning CHAI
Pei Sun
Yin Zhou
Xi YI
Ouais ALSHARIF
Zhifeng Chen
Vijay VASUDEVAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Waymo LLC
Original Assignee
Waymo LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Waymo LLC
Priority to CN202080050276.4A (published as CN114080629A)
Priority to JP2022500800A (published as JP2022539843A)
Priority to EP20750916.7A (published as EP3980932B1)
Priority to KR1020227004263A (published as KR20220031685A)
Publication of WO2021007320A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional [3D] objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional [3D] objects
    • G06V20/647 Three-dimensional [3D] objects by matching two-dimensional images to three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Definitions

  • a system for detecting objects within point clouds obtains point cloud data representing a sensor measurement of a scene captured by one or more sensors and including three-dimensional points in the scene, and then determines multiple two-dimensional proposal locations based on the three-dimensional points in the scene.
  • the system generates, for each two-dimensional proposal location, a feature representation from three-dimensional points in the point cloud data that are near the two-dimensional proposal location.
  • the system then processes the feature representations of the two-dimensional proposal locations using an object detection neural network that is configured to generate an object detection output that identifies objects in the scene.
  • the system described in this specification can process point cloud data representing a sensor measurement of a scene captured by one or more sensors to generate an object detection output that identifies locations of one or more objects in the scene.
  • the one or more sensors can be sensors of an autonomous vehicle (e.g., LIDAR sensors), the scene can be a scene that is in the vicinity of the autonomous vehicle, and the object detection output can be used to make autonomous driving decisions for the vehicle, to display information to operators or passengers of the vehicle, or both.
  • the system implements a non-convolutional point-based network designed specifically for point cloud data that can generate accurate object detection outputs with minimal latency and at a relatively low computational cost.
  • the system is configured to leverage this capability to adapt the amount of computation that is dedicated to each spatial region in the scene to system priorities, resource availability, or both.
  • the system may dynamically alter the computational demand by tuning the number of proposals that are determined without having to alter or retrain the system’s point-based network.
  • This framework not only allows the system to be flexibly targeted across a range of computational priorities, but also enables the system to generate object proposals in a manner geared to maximize spatial coverage or match the density of point clouds. Given the need for accurate real-time information in autonomous vehicles and the nature of their surroundings, the system described in this specification may better fit the requirements of autonomous vehicle-based perception systems.
  • FIG. 2 shows a block diagram of an example perception subsystem.
  • This specification describes a system implemented as computer programs on one or more computers in one or more locations that processes point cloud data representing a sensor measurement of a scene captured by one or more sensors to generate an object detection output that identifies locations of one or more objects in the scene.
  • the one or more sensors can be sensors of an autonomous vehicle, e.g., a land, air, or sea vehicle, and the scene can be a scene that is in the vicinity of the autonomous vehicle.
  • the object detection output can then be used to make autonomous driving decisions for the vehicle, to display information to operators or passengers of the vehicle, or both.
  • the system receives point cloud data representing a sensor measurement of a scene captured by one or more sensors.
  • the point cloud data includes a set of three-dimensional points, i.e., a set of points corresponding to reflections identified by one or more scans of the scene by the one or more sensors, and optionally features generated by the one or more sensors for the three-dimensional points, e.g., LiDAR features.
  • Each three-dimensional point generally has x, y, and z coordinates (or three different coordinates in a different coordinate system).
  • the system determines, based on the three-dimensional points in the scene, a plurality of two-dimensional proposal locations.
  • the system samples a fixed number of two-dimensional locations from the locations of the three-dimensional points.
  • the system designates a pair of coordinates, e.g., (x,y), from the three coordinates representing the three-dimensional points and then samples a fixed number of two-dimensional proposal locations from among the designated coordinates, e.g., the (x,y) coordinates, of the three-dimensional points in the scene.
  • the system can sample the fixed number of two-dimensional proposal locations in any of a variety of data dependent but computationally efficient ways.
  • the system can sample the fixed number of two-dimensional proposal locations using farthest point sampling, in which individual points are selected sequentially such that the next point selected is maximally far away from all previous points selected.
  • the system can sample the fixed number of two-dimensional proposal locations using random uniform sampling, in which each two-dimensional proposal location is randomly sampled from a uniform distribution over the three-dimensional points, i.e., the (x,y) coordinates of each three-dimensional point are equally likely to be sampled.
  • the system generates, for each two-dimensional proposal location, a feature representation from three-dimensional points in the point cloud data that are near the two-dimensional proposal location.
  • the system can modify this phase of the object detection process based on the amount of computational resources available for the process or the latency requirements for the object detection process.
  • the system can adjust how many points are used for each two-dimensional proposal location to satisfy the resource or latency requirements, i.e., the system can adapt the object detector to different computational settings without needing to re-train any of the neural network layers that are used by the object detector.
  • the system can prioritize the points that have higher predictive priorities or that are in spatial regions that are likely to be relevant. For example, in the case of a self-driving vehicle, the system can prioritize points that are likely to be relevant to operation of the vehicle.
  • the system then processes the feature representations of the two-dimensional proposal locations using an object detection neural network that is configured to generate an object detection output that identifies objects in the scene.
  • FIG. 1 is a block diagram of an example on-board system 100.
  • the on-board system 100 is physically located on-board a vehicle 102.
  • the vehicle 102 in FIG. 1 is illustrated as an automobile, but the on-board system 100 can be located on-board any appropriate vehicle type.
  • the vehicle 102 can be a fully autonomous vehicle that makes fully-autonomous driving decisions or a semi-autonomous vehicle that aids a human operator.
  • the vehicle 102 can autonomously apply the brakes if a full-vehicle prediction indicates that a human driver is about to collide with a detected object, e.g., a pedestrian, a cyclist, another vehicle. While the vehicle 102 is illustrated in FIG.
  • the vehicle 102 can be any appropriate vehicle that uses sensor data to make fully-autonomous or semi-autonomous operation decisions.
  • the vehicle 102 can be a watercraft or an aircraft.
  • the on-board system 100 can include components additional to those depicted in FIG. 1 (e.g., a control subsystem or a user interface subsystem).
  • the on-board system 100 includes a sensor subsystem 120 which enables the on-board system 100 to "see" the environment in a vicinity of the vehicle 102.
  • the sensor subsystem 120 includes one or more sensors, some of which are configured to receive reflections of electromagnetic radiation from the environment in the vicinity of the vehicle 102.
  • the sensor subsystem 120 can include one or more laser sensors (e.g., LIDAR sensors) that are configured to detect reflections of laser light.
  • the sensor subsystem 120 can include one or more radar sensors that are configured to detect reflections of radio waves.
  • the sensor subsystem 120 can include one or more camera sensors that are configured to detect reflections of visible light.
  • the sensor subsystem 120 repeatedly (i.e., at each of multiple time points) uses raw sensor measurements, data derived from raw sensor measurements, or both to generate sensor data 122.
  • the raw sensor measurements indicate the directions, intensities, and distances travelled by reflected radiation.
  • a sensor in the sensor subsystem 120 can transmit one or more pulses of electromagnetic radiation in a particular direction and can measure the intensity of any reflections as well as the time that the reflection was received.
  • a distance can be computed by determining the time which elapses between transmitting a pulse and receiving its reflection.
  • Each sensor can continually sweep a particular space in angle, azimuth, or both. Sweeping in azimuth, for example, can allow a sensor to detect multiple objects along the same line of sight.
  • the sensor data 122 includes point cloud data that characterizes the latest state of an environment (i.e., an environment at the current time point) in the vicinity of the vehicle 102.
  • a point cloud is a collection of data points defined by a given coordinate system.
  • a point cloud can define the shape of some real or synthetic physical system, where each point in the point cloud is defined by three values representing respective coordinates in the coordinate system, e.g., (x, y, z) coordinates.
  • each point in the point cloud can be defined by more than three values, wherein three values represent coordinates in the coordinate system and the additional values each represent a property of the point of the point cloud, e.g., an intensity of the point in the point cloud.
  • Point cloud data can be generated, for example, by using LIDAR sensors or depth camera sensors that are on-board the vehicle 102.
  • each point in the point cloud can correspond to a reflection of laser light or other radiation transmitted in a particular direction by a sensor on-board the vehicle 102.
  • the on-board system 100 can provide the sensor data 122 generated by the sensor subsystem 120 to a perception subsystem 130 for use in generating perception outputs 132.
  • the perception subsystem 130 implements components that identify objects within a vicinity of the vehicle.
  • the components typically include one or more fully-learned machine learning models.
  • a machine learning model is said to be "fully-learned" if the model has been trained to compute a desired prediction when performing a perception task.
  • a fully-learned model generates a perception output based solely on being trained on training data rather than on human-programmed decisions.
  • the perception output 132 may be a classification output that includes a respective object score corresponding to each of one or more object categories, each object score representing a likelihood that the input sensor data characterizes an object belonging to the corresponding object category.
  • the on-board system 100 can provide the perception outputs 132 to a planning subsystem 140.
  • the planning subsystem 140 can use the perception outputs 132 to generate planning decisions which plan the future trajectory of the vehicle 102.
  • the planning decisions generated by the planning subsystem 140 can include, for example: yielding (e.g., to pedestrians identified in the perception outputs 132), stopping (e.g., at a "Stop" sign identified in the perception outputs 132), passing other vehicles identified in the perception outputs 132, adjusting vehicle lane position to accommodate a bicyclist identified in the perception outputs 132, slowing down in a school or construction zone, merging (e.g., onto a highway), and parking.
  • the planning decisions generated by the planning subsystem 140 can be provided to a control system of the vehicle 102.
  • the control system of the vehicle can control some or all of the operations of the vehicle by implementing the planning decisions generated by the planning system.
  • the control system of the vehicle 102 may transmit an electronic signal to a braking control unit of the vehicle.
  • the braking control unit can mechanically apply the brakes of the vehicle.
  • in order for the planning subsystem 140 to generate planning decisions which cause the vehicle 102 to travel along a safe and comfortable trajectory, the on-board system 100 must provide the planning subsystem 140 with high quality perception outputs 132.
  • Many approaches to classifying or detecting objects within point cloud data involve projecting point clouds into 2D planar images and processing such point clouds as if they are camera images, e.g., using image processing techniques such as those involving the use of convolutional neural network (CNN) architectures or convolutional operations, to detect objects in the resulting images.
  • the perception subsystem 130 may implement a non-convolutional object detector designed specifically for point cloud data that may better fit the requirements of autonomous vehicles.
  • the architecture and functionality of such an object detector are described in further detail below with reference to FIG. 2.
  • the perception subsystem 230 determines or selects a subset of neighboring points in the point cloud, featurizes these points, and regresses these points to object class and bounding box parameters.
  • the object location is predicted relative to the selected location and does not employ any global information, i.e., information for points that are outside the subset of neighboring points in the point cloud. This setup ensures that each spatial location may be processed by the perception subsystem 230 independently, which may enable computation of each location by the perception subsystem 230 to be parallelized to decrease inference latency.
  • each of the three-dimensional points in the scene has respective (x, y) coordinates, and the two-dimensional proposal locations 252 that are determined by the proposal location determination engine 250 correspond to the (x, y) coordinates where individual points reside in the point cloud.
  • the proposal location determination engine 250 may determine or sample a fixed number of two-dimensional proposal locations from among the (x, y) coordinates of the three-dimensional points in the scene.
  • for each two-dimensional proposal location included in the proposal locations 252, the featurizer 260 generates a feature representation from three-dimensional points in the point cloud data that are near the two-dimensional proposal location. As such, in some examples, the featurizer 260 generates feature representations 262 based on proposal locations 252 and based further on at least a portion of sensor data 222 or an abstraction thereof.
  • the featurizer 260 may further receive or otherwise access contextual data 242 and determine or select the fixed number of points for each two-dimensional proposal location included in the proposal locations 252 based on their distance from the proposal location and based further on the contextual data 242.
  • the featurizer 260 may initially sample a larger number of points (i.e., more than the fixed number that will be used to generate the feature representation) from the points that have (x, y) coordinates within the threshold radius, and then rank these points by their relative importance to operation of the self-driving vehicle based on the contextual data 242. The featurizer 260 may then select, as the determined fixed number of points, a subset of the points that have (x, y) coordinates within the threshold radius of the proposal location based at least in part on the ranking. For example, the featurizer 260 may rank the points based on distance from the vehicle or based on other information in the contextual data 242.
  • contextual data 242 may include data obtained or generated by the perception subsystem 230 for one or more previous frames, including sensor data 222 from one or more previous frames, proposal locations 252 from one or more previous frames, feature representations 262 from one or more previous frames, and/or perception output 232 from one or more previous frames.
  • data from previous frames may serve to provide the perception subsystem 230 with a relatively reliable estimate of where objects may be expected to be located.
  • the perception subsystem 230 may be able to allocate more computational resources to the regions in the scene in which objects are more likely to be located in the current frame and/or allocate fewer computational resources to the regions in the scene in which objects are less likely to be located in the current frame.
  • the featurizer 260 includes a featurizer neural network that may be leveraged to generate feature representations 262. More specifically, for a given proposal location, the featurizer 260 may process a featurizer input for the given proposal location using the featurizer neural network to generate a feature representation for the given proposal location.
  • the featurizer input that is applied to the featurizer neural network may include data indicating a fixed number of points that are determined or selected for the given proposal location.
  • the featurizer input that is applied to the featurizer neural network may include data indicating the re-centered points.
  • FIG. 3 is a block diagram of an example featurizer neural network 360.
  • the featurizer neural network 360 receives data 357 as input and generates, based at least in part on data 357, a set of feature representations 362.
  • the featurizer neural network 360 may be implemented as part of the featurizer 260 of the perception subsystem 230 as described herein with reference to FIG. 2.
  • data 357 and feature representations 362 may correspond to the featurizer input and feature representations 262 as described above with reference to FIG. 2, respectively.
  • the featurizer neural network 360 includes multiple layers 361A-361E (e.g., 5 layers).
  • each object detection output included in the perception output 232 corresponds to one of the proposal locations and one of the anchor offsets and identifies (i) a location of a possible object relative to a region of the scene that corresponds to the proposal location offset by the anchor offset and (ii) a likelihood that an object is located at the identified location.
  • different anchor offsets are associated with different projection weights, and to generate a respective feature vector for each anchor offset, the object detection neural network 270 projects each feature representation included in the feature representations 262 in accordance with projection weights associated with the anchor offset.
  • the featurizer neural network of the featurizer 260 and the object detection neural network 270 may be trained jointly on ground truth object detection outputs for point clouds in a set of training data.
  • the featurizer neural network of the featurizer 260 may correspond to the featurizer neural network 360 as described with reference to FIG. 3.
  • the loss function used for the training of these neural networks can be an object detection loss that measures the quality of object detection outputs generated by these neural networks relative to the ground truth object detection outputs, e.g., smoothed L1 losses for regressed values and cross entropy losses for classification outputs.
  • the perception subsystem 230 is further configured to remove points that are likely associated with ground reflections from obtained point cloud data.
  • FIG. 4 is a flow diagram of an example process 400 for detecting objects within point clouds.
  • the process 400 will be described as being performed by a system of one or more computers located in one or more locations.
  • an on-board system, e.g., the on-board system 100 of FIG. 1, or subsystems thereof, e.g., the perception subsystem 130 of FIG. 1 or the perception subsystem 230 of FIG. 2, appropriately programmed in accordance with this specification, can perform the process 400.
  • process 400 may be performed by other systems or system configurations.
  • a computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, subprograms, or portions of code.
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
  • Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.
  • a central processing unit will receive instructions and data from a read-only memory or a random-access memory or both.
  • the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
  • the central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices.
  • Machine learning models can be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework, a Microsoft Cognitive Toolkit framework, an Apache Singa framework, or an Apache MXNet framework.
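
The proposal-location step described in the definitions above (designating the (x, y) pair of each three-dimensional point, then drawing a fixed number of locations by farthest point sampling or random uniform sampling) can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the `start` and `seed` parameters, the use of Euclidean distance, the reading of "maximally far away from all previous points" as "largest distance to the nearest already-selected point", and sampling without replacement in the uniform case are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_to_xy(points: Sequence[Point3D]) -> List[Point2D]:
    """Keep only the designated (x, y) pair of each three-dimensional point."""
    return [(x, y) for x, y, _ in points]


def random_uniform_sampling(points: Sequence[Point3D], k: int, seed: int = 0) -> List[Point2D]:
    """Draw k proposal locations; every point is equally likely to be picked.

    Assumption: locations are drawn without replacement.
    """
    rng = random.Random(seed)
    return rng.sample(project_to_xy(points), k)


def farthest_point_sampling(points: Sequence[Point3D], k: int, start: int = 0) -> List[Point2D]:
    """Greedy farthest point sampling over the (x, y) coordinates.

    Each new location is the point whose distance to its nearest
    already-selected location is largest (an assumed interpretation).
    """
    xy = project_to_xy(points)
    if not 0 < k <= len(xy):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # min_dist[i]: distance from point i to its nearest already-chosen point.
    min_dist = [math.dist(p, xy[start]) for p in xy]
    while len(chosen) < k:
        nxt = max(range(len(xy)), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, xy[nxt])) for d, p in zip(min_dist, xy)]
    return [xy[i] for i in chosen]
```

In this sketch, changing `k` changes the number of proposals (and so the downstream computation) without touching any learned component, which is the adjustment the definitions describe.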
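
The featurizer input described in the definitions above (a fixed number of points within a threshold radius of a proposal location, optionally ranked by distance from the vehicle, then re-centered) can be sketched in a similar way. The helper names, the padding-by-repetition behavior when too few points are nearby, and re-centering only the x and y coordinates are hypothetical choices, not details taken from the specification.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def points_within_radius(points: Sequence[Point3D], proposal_xy: Point2D,
                         radius: float) -> List[Point3D]:
    """Return the points whose (x, y) lies within `radius` of the proposal."""
    px, py = proposal_xy
    return [p for p in points if math.hypot(p[0] - px, p[1] - py) <= radius]


def select_fixed_number(nearby: Sequence[Point3D], num_points: int,
                        vehicle_xy: Optional[Point2D] = None,
                        seed: int = 0) -> List[Point3D]:
    """Pick exactly `num_points` points (or none if `nearby` is empty).

    With `vehicle_xy`, points are ranked by distance from the vehicle and the
    closest are kept; otherwise they are drawn at random. If too few points
    are available, the selection is padded by repetition (an assumption).
    """
    if not nearby:
        return []
    if vehicle_xy is not None:
        vx, vy = vehicle_xy
        ranked = sorted(nearby, key=lambda p: math.hypot(p[0] - vx, p[1] - vy))
        picked = ranked[:num_points]
    else:
        rng = random.Random(seed)
        picked = rng.sample(list(nearby), min(num_points, len(nearby)))
    i = 0
    while len(picked) < num_points:
        picked.append(picked[i])
        i += 1
    return picked


def recenter(picked: Sequence[Point3D], proposal_xy: Point2D) -> List[Point3D]:
    """Express x and y relative to the proposal location; z is left unchanged."""
    px, py = proposal_xy
    return [(x - px, y - py, z) for x, y, z in picked]
```

Because each proposal location only looks at its own neighbors, calls for different proposal locations are independent of one another, which is consistent with the per-location parallelism the definitions mention.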
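
The sensor description above notes that a distance can be computed from the time that elapses between transmitting a pulse and receiving its reflection. As a worked relation, assuming the pulse travels at the speed of light $c$ and the measured interval $\Delta t$ covers the round trip to the reflecting surface and back (the symbols are introduced here for illustration and do not appear in the source):

```latex
d = \frac{c \, \Delta t}{2}
```

For example, a reflection received one microsecond after transmission corresponds to a distance of roughly 150 m.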
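
The training objective mentioned in the definitions above (smoothed L1 losses for regressed values and cross entropy losses for classification outputs) can be sketched for a single detection as follows. The `beta` threshold, the `box_weight` factor, and the way the two terms are summed are assumptions for illustration; the source does not specify how the terms are combined.

```python
import math
from typing import Sequence


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smoothed L1: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def cross_entropy(class_probs: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the ground truth class."""
    return -math.log(class_probs[label])


def detection_loss(box_pred: Sequence[float], box_target: Sequence[float],
                   class_probs: Sequence[float], label: int,
                   box_weight: float = 1.0) -> float:
    """Sum of the regression and classification terms for one detection.

    Assumption: the terms are added, with `box_weight` scaling the
    regression part.
    """
    regression = sum(smooth_l1(p, t) for p, t in zip(box_pred, box_target))
    return box_weight * regression + cross_entropy(class_probs, label)
```

The smoothed L1 term grows only linearly for large errors, so a single badly regressed bounding box parameter contributes less to the total than it would under a squared-error loss.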

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention relates to methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing point cloud data representing a sensor measurement of a scene captured by one or more sensors to generate an object detection output that identifies locations of one or more objects in the scene. When deployed in an on-board system of a vehicle, the generated object detection output can be used to make autonomous driving decisions for the vehicle with improved accuracy.
PCT/US2020/041200 2019-07-08 2020-07-08 Object detection in point clouds Ceased WO2021007320A1 (fr)

Priority Applications (4)

Application Number Publication Number Priority Date Filing Date Title
CN202080050276.4A CN114080629A (zh) 2019-07-08 2020-07-08 Object detection in point clouds
JP2022500800A JP2022539843A (ja) 2019-07-08 2020-07-08 Object detection in point clouds
EP20750916.7A EP3980932B1 (fr) 2019-07-08 2020-07-08 Object detection in point clouds
KR1020227004263A KR20220031685A (ko) 2019-07-08 2020-07-08 Object detection in point clouds

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962871669P 2019-07-08 2019-07-08
US62/871,669 2019-07-08

Publications (1)

Publication Number Publication Date
WO2021007320A1 (fr) 2021-01-14

Family

ID=71944315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/041200 Ceased WO2021007320A1 (fr) 2019-07-08 2020-07-08 Object detection in point clouds

Country Status (6)

Country Link
US (1) US11450120B2 (fr)
EP (1) EP3980932B1 (fr)
JP (1) JP2022539843A (fr)
KR (1) KR20220031685A (fr)
CN (1) CN114080629A (fr)
WO (1) WO2021007320A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220319054A1 (en) * 2021-03-01 2022-10-06 Waymo Llc Generating scene flow labels for point clouds using object labels

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021131652A (ja) * 2020-02-19 2021-09-09 株式会社トプコン Data structure, recording medium, program, and system
US11636592B2 (en) * 2020-07-17 2023-04-25 International Business Machines Corporation Medical object detection and identification via machine learning
CN112801036A (zh) * 2021-02-25 2021-05-14 同济大学 Target recognition method, training method, medium, electronic device, and automobile
US20220292813A1 (en) * 2021-03-10 2022-09-15 Acronis International Gmbh Systems and methods for detecting objects an image using a neural network trained by an imbalanced dataset
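CN113205116B (zh) * 2021-04-15 2024-02-02 江苏方天电力技术有限公司 Method for automatic extraction of shooting target points and flight path planning for unmanned aerial vehicle inspection of power transmission lines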
US12462572B2 (en) * 2021-06-22 2025-11-04 Grabtaxi Holdings Pte. Ltd. Method and system for gathering image training data for a machine learning model
WO2023003354A1 (fr) * 2021-07-20 2023-01-26 엘지전자 주식회사 Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
CN115965925B (zh) * 2023-03-03 2023-06-23 安徽蔚来智驾科技有限公司 Point cloud target detection method, computer device, storage medium, and vehicle
US12579814B2 (en) 2023-03-28 2026-03-17 Dell Products L.P. Computer vision-based energy usage management system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170220876A1 (en) * 2017-04-20 2017-08-03 GM Global Technology Operations LLC Systems and methods for visual classification with region proposals
US20180144496A1 (en) * 2015-04-24 2018-05-24 Oxford University Innovation Limited A method of detecting objects within a 3d environment
US20180188038A1 (en) * 2016-12-30 2018-07-05 DeepMap Inc. Detection of vertical structures based on lidar scanner data for high-definition maps for autonomous vehicles

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007069260A1 (en) * 2005-12-16 2007-06-21 Technion Research & Development Foundation Ltd. Method and apparatus for determining similarity between surfaces
US20130202197A1 (en) * 2010-06-11 2013-08-08 Edmund Cochrane Reeler System and Method for Manipulating Data Having Spatial Co-ordinates
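CN106407947B (en) * 2016-09-29 2019-10-22 百度在线网络技术(北京)有限公司 Target object recognition method and apparatus for driverless vehicles
CN110325818B (en) * 2017-03-17 2021-11-26 本田技研工业株式会社 Joint 3D object detection and orientation estimation via multimodal fusion
CN107748871B (en) * 2017-10-27 2021-04-06 东南大学 Three-dimensional face recognition method based on multi-scale covariance descriptors and locality-sensitive Riemannian kernel sparse classification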
US10970553B2 (en) * 2017-11-15 2021-04-06 Uatc, Llc Semantic segmentation of three-dimensional data
US10671860B2 (en) * 2018-02-20 2020-06-02 GM Global Technology Operations LLC Providing information-rich map semantics to navigation metric map
US20210232871A1 (en) * 2018-07-05 2021-07-29 Optimum Semiconductor Technologies Inc. Object detection using multiple sensors and reduced complexity neural networks
US11676005B2 (en) * 2018-11-14 2023-06-13 Huawei Technologies Co., Ltd. Method and system for deep neural networks using dynamically selected feature-relevant points from a point cloud
CN109543601A (en) * 2018-11-21 2019-03-29 电子科技大学 Unmanned vehicle target detection method based on multimodal deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHARLES R QI ET AL: "Frustum PointNets for 3D Object Detection from RGB-D Data", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 November 2017 (2017-11-22), XP080839554 *
TSUNG-YI LIN ET AL: "Feature Pyramid Networks for Object Detection", 9 December 2016, ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, XP080738158 *
ZHOU YIN ET AL: "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 4490 - 4499, XP033473359, DOI: 10.1109/CVPR.2018.00472 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220319054A1 (en) * 2021-03-01 2022-10-06 Waymo Llc Generating scene flow labels for point clouds using object labels
US12106528B2 (en) * 2021-03-01 2024-10-01 Waymo Llc Generating scene flow labels for point clouds using object labels

Also Published As

Publication number Publication date
US11450120B2 (en) 2022-09-20
KR20220031685A (ko) 2022-03-11
JP2022539843A (ja) 2022-09-13
CN114080629A (zh) 2022-02-22
EP3980932A1 (fr) 2022-04-13
US20210012089A1 (en) 2021-01-14
EP3980932B1 (fr) 2025-09-03

Similar Documents

Publication Publication Date Title
US11450120B2 (en) Object detection in point clouds
US11670038B2 (en) Processing point clouds using dynamic voxelization
JP7239703B2 (en) Object classification using out-of-region context
US12546882B2 (en) Camera-radar sensor fusion using local attention mechanism
KR102745062B1 (en) Agent trajectory prediction using anchor trajectories
US12373984B2 (en) Multi-modal 3-D pose estimation
RU2767955C1 (en) Methods and systems for computer-based determination of the presence of dynamic objects
EP4086817A1 (en) Training distilled machine learning models using a pre-trained feature extractor
US10963706B2 (en) Distributable representation learning for associating observations from multiple vehicles
CN114061581A (en) Ranking agents near autonomous vehicles by mutual importance
US11105924B2 (en) Object localization using machine learning
US11657268B1 (en) Training neural networks to assign scores
US20220355824A1 (en) Predicting near-curb driving behavior on autonomous vehicles
US11774596B2 (en) Streaming object detection within sensor data
US20240062386A1 (en) High throughput point cloud processing
US20240232647A9 (en) Efficient search for data augmentation policies
US12548248B2 (en) Late-to-early temporal fusion for point clouds
US12195013B2 (en) Evaluating multi-modal trajectory predictions for autonomous driving
US20250200751A1 (en) Training a point cloud processing model using a computer vision model

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20750916; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2022500800; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 2020750916; Country of ref document: EP; Effective date: 20220105
ENP Entry into the national phase. Ref document number: 20227004263; Country of ref document: KR; Kind code of ref document: A
WWR WIPO information: refused in national office. Ref document number: 1020227004263; Country of ref document: KR
WWG WIPO information: grant in national office. Ref document number: 2020750916; Country of ref document: EP