EP1654882A2 - Method for representing a sequence of images using 3D models, and corresponding devices and signal - Google Patents

Method for representing a sequence of images using 3D models, and corresponding devices and signal

Info

Publication number
EP1654882A2
EP1654882A2 EP04767398A EP04767398A EP1654882A2 EP 1654882 A2 EP1654882 A2 EP 1654882A2 EP 04767398 A EP04767398 A EP 04767398A EP 04767398 A EP04767398 A EP 04767398A EP 1654882 A2 EP1654882 A2 EP 1654882A2
Authority
EP
European Patent Office
Prior art keywords
mesh
images
model
dimensional
gop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP04767398A
Other languages
English (en)
French (fr)
Inventor
Raphaèle BALTER
Patrick Gioia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA
Publication of EP1654882A2

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating three-dimensional [3D] models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/179 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/23 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with coding of regions that are present throughout a whole video segment, e.g. sprites, background or mosaic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/27 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving both synthetic and natural picture components, e.g. synthetic natural hybrid coding [SNHC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/29 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/527 Global motion vector estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals

Definitions

  • The field of the invention is that of the coding of image sequences. More specifically, the invention relates to a technique for coding image sequences by a stream of three-dimensional (3D) models.
  • video coding by 3D models consists in representing a video sequence by one or more 3D textured models.
  • The information to be transmitted to an encoder of the image sequence consists of the 3D models, the texture images associated with them, and the parameters of the camera that filmed the sequence.
  • This type of coding therefore makes it possible to achieve lower bit rates than conventional coding techniques, according to which videos are generally represented by a set of pixels, which is much more expensive to transmit.
  • Certain techniques, called active techniques, require controlling the lighting of a real scene, and generally use laser technology or a large number of cameras in order to acquire several viewing angles and a large amount of depth information.
  • camera calibration, which consists of estimating the image formation parameters, i.e. the intrinsic parameters of the camera (focal length, etc.) and its extrinsic parameters (positions of the camera for the acquisition of the different shots of the sequence, etc.).
  • the mapping is generally managed manually, as described by N. M. Bove et al. in "Semiautomatic 3D-model extraction from uncalibrated 2-D camera views," Proceedings Visual Data Exploration and Analysis, 1995.
  • mapping is not managed manually.
  • this step consists in tracking, along the video sequence, the particular points or lines extracted during the previous step; linking of the different images; projective reconstruction of 3D points;
  • An autocalibration, carried out by fixing certain unknown factors to their default values and by applying the concept of the absolute conic, makes it possible to find the internal parameters of the camera, in order to pass to a metric representation.
  • the data is then merged into a common 3D model, using a method which concatenates the points which correspond on several images, to form two chains (a descending chain and a rising chain), from the disparity maps and the rotations calculated during calibration.
  • a multi-resolution approach is proposed.
  • a drawback of this technique is that the multi-resolution approach proposed for large objects requires having several videos of the same scene, in order to have access not only to an overview but also to the details.
  • this method is of semi-automatic type.
  • two images are selected, in order to obtain an initial reconstruction, by determining the projection matrices for intrinsic parameters and an approximate rotation matrix, and by triangulating.
  • the position of the cameras corresponding to the other views is then determined using epipolar geometry.
  • the structure is then refined using an extended Kalman filter for each point (described by M. Pollefeys in "Tutorial on 3D Modeling from Images," ECCV 2000, June 26, 2000, Dublin, Ireland).
  • a bundle adjustment is carried out.
  • the virtual 3D model is then obtained by elevating the triangular mesh on one of the images in the sequence, eliminating the points for which the depth is not available.
  • a disadvantage of this method is that it only works well on simple scenes, and is not suitable for complex scenes.
  • A dense motion estimation is carried out, based on the optical flow equation or on a deformable 2D mesh, in order to allow an estimation between distant images of the sequence (namely the key images which delimit the GOPs). These key images are selected in parallel and are used to support the estimation of the 3D model.
  • the robust calculation of the intrinsic and extrinsic parameters of the cameras is also carried out on the key images, and refined simultaneously with the 3D geometry, by a sliding-window bundle adjustment method.
  • the positions of the intermediate images are estimated by Dementhon's localization method (see in particular "Representation of video sequence: automatic extraction scheme for a stream of 3D models, applications to compression and virtual reality", University of Rennes 1, January 2002, by Franck Galpin) in order to be able to reconstruct the original sequence, as illustrated in Figure 1.
  • the initial sequence comprises a plurality of successive images I k , grouped into groups of images called GOPs.
  • the images I 0 to I 5 are grouped together in a first GOP referenced 1, which is associated with a 3D model M 0 .
  • the images I 5 to I 13 are collected in a second GOP referenced 2, with which a second model M 1 is associated.
  • FIG. 2a presents the evolution of the PSNR, FIGS.
  • the first curve (the highest in the figure) is the objective quality of the reconstructed sequence, obtained by reprojection of the 3D models according to the method of Franck Galpin in texture space, i.e. without taking into account the geometric distortions.
  • the two other curves in FIG. 2a indicate the objective quality for the reconstructed sequences obtained by the method of Franck Galpin and by the H264 coder in image space.
  • Although the performances obtained are similar for the Franck Galpin coder and the H26L coder, it will be noted that, from a visual point of view, the quality obtained is higher with the coder based on a stream of 3D models, in particular in terms of respect for details, absence of block effects, etc.
  • this coding technique based on a stream of 3D models makes it possible to achieve very low bit rates for satisfactory visual quality, as illustrated by FIGS. 3a to 3c, which respectively present the evolution of the PSNR, an image obtained using this technique, and a detail area of this image, for a bit rate of 16 kb/s.
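The PSNR curves mentioned above measure objective quality. As a rough sketch only, assuming 8-bit grayscale images stored as nested lists (the function name `psnr` and the peak value of 255 are assumptions made for illustration and do not come from the patent text), the measure could be computed as follows:

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are lists of rows of pixel intensities (an assumed layout).
    """
    squared_errors = [
        (a - b) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for a, b in zip(row_o, row_r)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value indicates a reconstruction closer to the original, which is how the curves of FIGS. 2a and 3a are to be read.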
  • a drawback of this technique of the prior art is that all of the 3D models obtained for a sequence of images are only partially redundant, which makes this technique unsuitable for free navigation applications in a scene.
  • This method is therefore not suited, or is ill-suited, to implementation on display terminals with very diverse processing capacities, or on variable-bit-rate transmission networks.
  • the invention particularly aims to overcome these drawbacks of the prior art.
  • an objective of the invention is to provide a technique for representing a sequence of images by 3D model which is suitable for any type of sequence of fixed or static images, or of scenes, including complex ones.
  • the invention aims to implement such a technique which allows the reconstruction of a scene on which no hypothesis is formulated, and which is acquired with a consumer device of which neither the characteristics nor the motion are known.
  • Another objective of the invention is to implement such a technique which makes it possible to obtain a sequence reproduced by reprojection of good visual quality, even when one moves away from the original trajectory of the camera used for the acquisition of the sequence.
  • the invention also aims to provide such a technique which is suitable for low and very low bit rates.
  • the invention also aims to implement such a technique which is particularly well suited to large scenes.
  • Another object of the invention is to provide such a technique which is suitable for coding and virtual navigation applications.
  • the invention also aims to implement such a technique which makes it possible to obtain scalable representations of image sequences, so as to allow transmission over networks of various bit rates, in particular with a view to portable applications.
  • Another objective of the invention is to provide such a technique, which allows, at the same rate, the representation of scenes of better visual quality than according to the technique of Franck Galpin described above.
  • the invention also aims to implement such a technique which allows, for the representation of a sequence of images of the same visual quality, a reduction in bit rate compared to the Franck Galpin technique described above.
  • the three-dimensional model associated with the GOP of level n is represented using an irregular mesh taking into account at least one vertex of at least the irregular mesh representing the three-dimensional model associated with the GOP of level n -1, said vertex being called common vertex.
  • the invention is based on a completely new and inventive approach to the representation of a sequence of images by 3D models. Indeed, as for the method proposed by Franck Galpin, the invention proposes an approach based, not on the extraction of a single 3D model for all the images of the sequence, but on the extraction of a stream of 3D models, each associated with a group of images, called GOP.
  • the invention proposes an inventive improvement of Franck Galpin's technique, by establishing a correspondence between the different 3D models associated with each of the GOPs in order, in particular, to increase their redundancy.
  • the invention therefore advantageously allows applications of the interactive navigation type.
  • Such a correspondence between successive 3D models is made possible by using an irregular mesh of images, which adapts particularly well to the singularities of the images.
  • the irregular mesh of a 3D model thus takes into account at least one singular vertex (and more generally the particular points or lines of the image) of the irregular mesh of the previous 3D model.
  • the invention therefore makes it possible, for equal visual quality, to reduce the transmission rate of the sequence of images, due to the redundancy between the different 3D models. It also makes it possible, for the same bit rate, to obtain a better visual quality of the representation of the sequence of images, thanks to the tracking of the singularities of the image between successive 3D models.
  • a basic model constructed from said vertices common to said at least two three-dimensional models is also associated with at least two consecutive three-dimensional models.
  • all the 3D models associated with the sequence correspond to the same basic mesh.
  • This basic mesh, or coarse mesh whose various 3D models constitute refinements, corresponds to the geometric structure common to all the 3D models which are associated with it.
  • one of said three-dimensional models is obtained from said associated basic model by transformation into wavelets, using a second set of wavelet coefficients.
  • the invention therefore allows a scalable transmission of the sequence of images, adaptable as a function of the characteristics of the network or of the display terminal.
  • the elements to be transmitted for a reconstruction of the sequence are, in addition to the parameters of the camera, the basic mesh on the one hand, and the wavelet coefficients making it possible to reconstruct the various 3D models on the other hand.
  • By transmitting a greater or lesser number of wavelet coefficients, a higher or lower quality of reconstruction is obtained, adapted to the bit rate of the transmission network or to the capacity of the display terminal.
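The scalability described above can be illustrated with a minimal sketch. The text only states that a greater or lesser number of coefficients is transmitted; keeping the largest-magnitude coefficients first is an assumption made here for illustration, as are the name `select_coefficients` and the dictionary layout.

```python
def select_coefficients(coeffs, budget):
    """Keep the `budget` largest-magnitude coefficients and zero the others.

    `coeffs` maps a vertex index to a scalar detail coefficient
    (a hypothetical layout, not the patent's bitstream format).
    """
    ranked = sorted(coeffs, key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:budget])
    return {i: (c if i in kept else 0.0) for i, c in coeffs.items()}
```

A terminal or network with less capacity would simply be served with a smaller `budget`, at the cost of a coarser reconstruction.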
  • said irregular mesh of level n is a two-dimensional irregular mesh of one of the images of said GOP of level n.
  • said mesh image is the first image of said GOP of level n.
  • each of said three-dimensional models is obtained by elevation of said irregular mesh representing it.
  • said irregular two-dimensional mesh is obtained by successive simplifications of a regular triangular mesh of said image.
  • said irregular two-dimensional mesh is obtained from a Delaunay mesh of predetermined points of interest of said image.
  • two successive GOPs have at least one common image.
  • the last image of a GOP is also the first image of the next GOP.
  • said vertices common to said levels n-1 and n are detected by estimation of movement between the first image of said GOP of level n-1 and the first image of said GOP of level n.
  • such a method comprises a step of storing said detected common vertices.
  • said irregular mesh representing said model associated with the GOP of level n also takes into account at least one vertex of at least the irregular mesh representing the model associated with the GOP of level n + 1.
  • said second set of wavelet coefficients is generated by applying at least one analysis filter to a semi-regular remeshing of said associated three-dimensional model.
  • a semi-regular mesh is a mesh whose vertices which do not have six neighbors are isolated on the mesh (that is to say that they are not neighbors between them).
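The definition of a semi-regular mesh given above translates directly into a test on the vertex adjacency. The following sketch assumes a closed mesh (no boundary vertices) described by a dictionary mapping each vertex to the set of its neighbors; the function name and this representation are illustrative assumptions.

```python
def is_semi_regular(neighbors):
    """Return True if no two vertices of valence other than six are adjacent.

    `neighbors` maps each vertex to the set of vertices it shares an edge with.
    """
    irregular = {v for v, adjacent in neighbors.items() if len(adjacent) != 6}
    # Every irregular vertex must be isolated from the other irregular vertices.
    return all(not (neighbors[v] & irregular) for v in irregular)
```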
  • said wavelets are second generation wavelets.
  • said wavelets belong to the group comprising: piecewise affine wavelets; polynomial wavelets; wavelets based on the Butterfly subdivision scheme.
  • the invention also relates to a signal representative of a sequence of images grouped into sets of at least two successive images, called GOPs, a three-dimensional textured mesh model being associated with each of said GOPs.
  • such a signal comprises: at least one field containing a basic model constructed from vertices common to at least two irregular meshes, each representing a three-dimensional model, said at least two three-dimensional models being associated with at least two successive GOPs; at least one field containing a set of wavelet coefficients making it possible to construct, by transformation into wavelets from said basic model, at least one three-dimensional model associated with one of said GOPs; at least one field containing at least one texture associated with one of said three-dimensional models; at least one field containing at least one camera position parameter.
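The four kinds of field listed above can be pictured as a simple container. This is only a sketch of one possible in-memory layout: the class names, field names and types below are hypothetical and do not describe the actual signal syntax, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]

@dataclass
class BaseModel:
    """Coarse mesh built from vertices common to successive 3D models."""
    vertices: List[Vertex]
    triangles: List[Tuple[int, int, int]]

@dataclass
class SequenceSignal:
    base_model: BaseModel                           # field 1: basic model
    wavelet_coefficients: Dict[int, List[float]]    # field 2: per-GOP refinement coefficients
    textures: Dict[int, bytes]                      # field 3: per-GOP texture image
    camera_positions: Dict[int, Tuple[float, ...]]  # field 4: per-image camera position parameters
```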
  • the invention also relates to a device for representing a sequence of images implementing the representation method described above.
  • the invention relates in particular to a device for representing a sequence of images grouped into sets of at least two successive images, called GOPs, a three-dimensional textured mesh model being associated with each of said GOPs.
  • such a device comprises: means for constructing said three-dimensional models, by transforming into wavelets at least one basic model, developed from vertices common to at least two irregular meshes representing two successive three-dimensional models; means for representing said images of the sequence from said three-dimensional models, at least one texture image and at least one camera position parameter.
  • the invention also relates to a device for coding a sequence of images grouped into sets of at least two successive images, called GOPs, a three-dimensional textured mesh model being associated with each of said GOPs.
  • such a coding device comprises means for coding a three-dimensional model associated with the GOP of level n, said three-dimensional model being represented using an irregular mesh taking account of at least one vertex of at least the irregular mesh representing the three-dimensional model associated with the GOP of level n-1.
  • FIGS. 2a to 2e already commented on in relation to the prior art, illustrate a comparison of the visual results obtained according to a technique of the H26L type on the one hand, and according to the coding technique of FIG. 1 on the other hand;
  • Figures 3a to 3c, already discussed in connection with the prior art, present the results obtained according to the technique of Figure 1 for a low bit rate of 16 kb/s;
  • FIG. 4 illustrates the general principle of the reconstruction of a video sequence from a 3D model;
  • FIG. 5 illustrates the general principle of the present invention, based on the extraction of a stream of 3D models, each associated with a basic model, common to one or more 3D models;
  • FIG. 6 presents the various wavelet coefficients used for the coding of the 3D models of FIG. 4;
  • FIG. 7 presents a block diagram of the different steps implemented according to the invention for coding the images of the sequence.
  • the general principle of the invention is based on the extraction of a stream of 3D models with which irregular meshes are associated, adapted to the content of the images of the sequence, and which take into account the correspondents of the vertices of the irregular mesh of the previous 3D model.
  • a sequence of images 45 is obtained, which is called the original sequence.
  • At least one 3D model 47 (a plurality of 3D models according to the invention) is constructed, from which a sequence of images 49 can be reconstructed (48), for display on a display terminal.
  • Each 3D model corresponds to a part of the original image sequence, that is to say to a GOP (Group of Pictures).
  • the 3D models considered are irregularly meshed elevation maps, under the constraint of taking into account the correspondents of the vertices of the previous model. This constraint makes it possible to guarantee precise correspondences between the vertices of successive models.
  • the transformations making it possible to pass from one model to another are decomposed into wavelets, which makes it possible to adapt the precision of the transformation to the bit rate, thanks to the natural scalability of the wavelets.
  • the invention is also based on the reconstruction of basic models, which are associated with one or more successive GOPs, as illustrated in FIG. 4.
  • The original sequence of images is made up of successive images I k. More particularly, FIG. 4 shows the images I 0, I 3, I 5, I 10, I 20, I 30, I 40, I 50, and I 60.
  • This sequence can be of any length, no restrictive hypothesis being necessary according to the present invention.
  • the sequence of images I k is divided into successive groups of images, called GOPs.
  • the first GOP 50 comprises the images referenced I 0 to I 5
  • the second GOP 51 comprises the images I 5 to I 20
  • a (k+1)th GOP 52 notably includes the images I 30 to I 40
  • a (k+2)th GOP 53 includes images I 40 to I 60.
  • the last image of a GOP is also the first image of the following GOP: thus, the image I 5 for example belongs to the first GOP 50 and to the second GOP 51 .
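The overlap between consecutive GOPs described above can be sketched as follows. Given the indices of the key images that delimit the GOPs, each GOP is the pair (first image, last image), and the last image of one GOP is the first image of the next. The function name is an illustrative assumption; the indices in the usage note are those of the first two GOPs of the example.

```python
def gops_from_key_images(key_indices):
    """Return (first, last) image indices of each GOP.

    Consecutive GOPs share one image: the key image that ends one GOP
    also starts the following one.
    """
    return list(zip(key_indices, key_indices[1:]))
```

For example, key images I 0, I 5 and I 20 yield the GOPs (0, 5) and (5, 20), with image I 5 belonging to both.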
  • For each of these GOPs 50 to 53, a 3D model M k is constructed.
  • the 3D model M 0 is associated with the GOP 50
  • the 3D model M 1 is associated with GOP 51, etc.
  • A set of basic models, denoted MB k, is also constructed, of which the 3D models M k constitute refinements. So, in Figure 4, the basic model MB 0 is associated with 3D models M 0 to M k, and the basic model MB 1 is associated with 3D models M k, M k+1 and following.
  • the basic mesh MB k may be valid for a variable number of GOPs, or even possibly for the whole sequence of images. Thanks to these basic models MB k, we can therefore express each estimated 3D model M k by the basic mesh corresponding to it on the one hand, and by a set of wavelet coefficients on the other hand.
  • the wavelet coefficients $t_0^{k,k+1}$ to $t_n^{k,k+1}$ are used to pass from a 3D model $M_k$ to the 3D model $M_{k+1}$.
  • the wavelet coefficients r 0 k to r k illustrate the transition from a 3D model M k to the associated basic model (in this case, the MB L model).
  • the first set of wavelet coefficients t k therefore defines the links between the different models M k, which makes it possible to switch from one to the other, and to generate intermediate models, either by linear interpolation between the correspondents, or implicitly thanks to the wavelets.
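The first option mentioned above, linear interpolation between corresponding vertices, can be sketched in a few lines. The sketch assumes that the two models list their corresponding vertices in the same order, which is precisely what the correspondence constraint is meant to allow; the function name and the parameter `alpha` are illustrative assumptions.

```python
def interpolate_model(vertices_k, vertices_k1, alpha):
    """Blend corresponding vertices of two successive models.

    alpha = 0 gives the first model, alpha = 1 the second; values in
    between give an intermediate model.
    """
    return [
        tuple((1 - alpha) * a + alpha * b for a, b in zip(p, q))
        for p, q in zip(vertices_k, vertices_k1)
    ]
```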
  • the second set of wavelets r k ensures progressive and efficient transmission (in terms of throughput) of the different models.
  • the technique of the invention can be adapted to all types of terminals, whatever their processing capacity, and to all types of transmission networks, whatever their bit rate.
  • the selection 72 of the key images K k delimiting the GOPs is carried out according to the algorithm developed by Franck Galpin et al. in "Sliding Adjustment for 3D Video Representation" EURASIP Journal on Applied Signal Processing 2002: 10 (see in particular paragraph 5.1. Criteria selection). This selection 72 of the GOP start and end images is therefore based on the validation of three criteria:
  • the first key image selected is the first image, I 0 of the original sequence.
  • a calibration 75 is also carried out, making it possible to determine all the intrinsic and extrinsic parameters of the camera used for the acquisition of the sequence of images, and in particular the position P k of the camera associated with the image I k.
  • the depth map Z k associated with the GOP k is estimated (74).
  • an irregular two-dimensional mesh 77 of the depth map Z k is produced, under the constraint of taking into account the correspondents of the vertices of the model associated with the previous GOP, contained in the image K k.
  • This 2D mesh can be calculated in two ways: by successive simplifications from a regular mesh of triangles of side 1 (i.e. all the points of the image);
  • this study is made bidirectional, by forcing the mesh of the current model to take into account the correspondents, not only the vertices of the previous model, but also vertices of the following model.
  • the 3D meshes M k corresponding to the geometry of the 3D models representing the GOPs, are obtained by elevation of the estimated 2D meshes, as illustrated by the block referenced 80.
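The elevation of a 2D mesh into a 3D mesh can be sketched as a back-projection of each mesh vertex using its depth. The pinhole camera model used below, with a focal length `focal` and a principal point (`cx`, `cy`), is an assumption made for illustration; the text does not specify the projection model, and the function name is hypothetical. The triangle connectivity of the 2D mesh is reused unchanged for the 3D mesh.

```python
def elevate_mesh(vertices_2d, depth, focal, cx, cy):
    """Lift 2D mesh vertices (u, v) to 3D points using a depth map.

    `depth[v][u]` is the depth at pixel column u, row v (assumed layout).
    Assumes an ideal pinhole camera with square pixels.
    """
    points = []
    for u, v in vertices_2d:
        z = depth[v][u]
        points.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    return points
```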
  • the advantage of expressing this transformation using wavelets is that one can adapt the precision of the transformation to the bit rate thanks to the natural scalability of the wavelets.
  • the wavelets used for the decomposition are second generation wavelets, that is to say that they can be defined on sets which have no vector space structure. In this case, with the notations in Figure 6, the wavelets are defined on the basic models MB 0, MB 1, etc.
  • the wavelet coefficients d are the solution of the following linear system:
  • $T d = c$
  • T depends on the type of wavelets used. Three schemes are favored according to the invention: the piecewise affine wavelets, the polynomial wavelets (in particular the Loop wavelets), and the wavelets based on the Butterfly subdivision scheme (J. Warren et al., "Multiresolution Analysis for Surfaces of Arbitrary Topological Type," ACM Transactions on Graphics, vol. 16, pp. 34-73, 1997).
  • P is a sub-matrix which represents only the subdivision scheme (Affine, Loop, Butterfly, ...) and where the sub-matrix Q is the geometric interpretation of the wavelet coefficients.
  • Q is chosen so that the wavelet coefficients have a zero moment.
  • P and Q can be arbitrary as long as T remains invertible.
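To make the roles of P and Q concrete, here is a deliberately simplified one-dimensional sketch. It assumes that T is formed by placing P and Q side by side, which the text implies but does not show, and it works on a polyline rather than on a surface mesh. P is midpoint (piecewise affine) subdivision and Q simply adds one detail value at each newly inserted vertex, so T is invertible and the system can be solved in closed form. This choice of Q does not enforce the zero-moment property mentioned above; all names are illustrative.

```python
def synthesize(coarse, detail):
    """Apply T = [P Q]: subdivide the coarse polyline, then add the details."""
    fine = []
    for i, d in enumerate(detail):
        fine.append(coarse[i])                            # old vertex kept as is
        fine.append((coarse[i] + coarse[i + 1]) / 2 + d)  # new midpoint plus detail
    fine.append(coarse[-1])
    return fine

def analyze(fine):
    """Solve T d = c for this particular T (expects an odd number of samples)."""
    coarse = fine[0::2]
    detail = [
        fine[2 * i + 1] - (fine[2 * i] + fine[2 * i + 2]) / 2
        for i in range(len(coarse) - 1)
    ]
    return coarse, detail
```

Applying `analyze` and then `synthesize` returns the original samples, which is the invertibility condition on T stated above.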
  • Figure 7 summarizes the approach just described for GOP k.
  • $C^{n,n+p}$ is the motion field between the images $I_n$ and $I_{n+p}$
  • C k is the motion field associated with GOP k
  • C (V) is the set of correspondents of the points of the set V found by the motion field
  • was the set of support points for the estimation of 3D information (vertices of the mesh used for the motion estimation having the highest scores with the Harris and Stephen detector and decimated regularly)
  • E k is the set of vertices of the 3D model associated with GOP k
  • K k is the image of the original sequence corresponding to the key image associated with GOP k
  • M k is the 3D model associated with GOP k
  • ⁇ k is the set of wavelet coefficients defining the transition transformation between M k and M k + 1 ,;
  • V k is the set of vertices of the mesh corresponding to the model M, ..
  • The encoder 81 receives as input the camera positions $P_k$ for the different images $I_k$ of the original sequence, the estimate $M_k$ of the textured 3D model, and the wavelet coefficients that transform the model $M_{k-1}$ into the model $M_k$. Simultaneously with the estimation of the 3D models $M_k$ of each GOP k, illustrated in FIG. 7, basic models $MB_j$ valid for several successive GOPs are reconstructed.
  • To this end, the set of feature points detected in the first image of GOP k is tracked along several images of the sequence. More precisely, the points corresponding to them are detected across several successive GOPs, until the number of corresponding points found in the analyzed image falls below a predetermined threshold.
  • This threshold must be chosen so that reconstruction (i.e., estimation of the fundamental matrix) remains possible; it is chosen, for example, to be 7.
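  • The stopping rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: points are identified by integer ids, a point that has been lost is not recovered later, and the function name `gops_covered` and the input layout are hypothetical. Only the example threshold of 7 comes from the text.

```python
MIN_CORRESPONDENTS = 7  # example threshold given in the text


def gops_covered(visible_per_gop, threshold=MIN_CORRESPONDENTS):
    """Return how many successive GOPs one basic model can span.

    visible_per_gop[i] is the set of point ids found in GOP i. The points
    of GOP 0 are the ones being tracked (hypothetical data layout).
    """
    if not visible_per_gop:
        return 0
    tracked = set(visible_per_gop[0])
    covered = 0
    for visible in visible_per_gop:
        tracked &= set(visible)  # assumption: a lost point stays lost
        if len(tracked) < threshold:
            break  # too few correspondents left to estimate the geometry
        covered += 1
    return covered


# Ten points detected at the start; 9, then 7, then 5 of them survive.
span = gops_covered([
    set(range(10)),
    set(range(1, 10)),
    set(range(3, 10)),
    set(range(5, 10)),
])
```

In this example the basic model spans the first three GOPs, because only five of the tracked points remain in the fourth.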
  • The coefficients $t_k$ of FIG. 6 are obtained as follows. The basic meshes from the same GOP are identical and, after subdivision, generate the same semi-regular mesh. Consequently, the coefficients $r_k$ are indexed by the same geometric vertices as k varies within the same GOP. For each intermediate k, we can therefore define a function that maps each of these vertices to the difference between the coefficients $r_k$ and $r_{k+1}$ at that vertex. This function is then decomposed, as before, into wavelet coefficients, which are the coefficients $t_k$.
  • The invention therefore makes it possible to transmit the geometry of the models associated with the original sequence at low cost, since the basic meshes are transmitted on the one hand and the wavelet coefficients associated with the different models on the other.
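  • The per-vertex difference function can be sketched as below, before any wavelet decomposition is applied to it. The sketch assumes that coefficients are 3D tuples stored in dictionaries keyed by vertex id, and that the difference is taken as "next minus current"; the sign convention, the function names and the sample values are all assumptions, since the text does not fix them.

```python
def coefficient_differences(r_k, r_next):
    """Map each shared vertex to r_next[v] - r_k[v] (assumed sign convention)."""
    if r_k.keys() != r_next.keys():
        raise ValueError("both coefficient sets must be indexed by the same vertices")
    return {v: tuple(b - a for a, b in zip(r_k[v], r_next[v])) for v in r_k}


def apply_differences(r_k, diff):
    """Rebuild the next coefficients from the current ones and the differences."""
    return {v: tuple(a + d for a, d in zip(r_k[v], diff[v])) for v in r_k}


# Two vertices of the shared semi-regular mesh (hypothetical sample values).
r_k = {0: (0.0, 0.0, 1.0), 1: (1.0, 0.0, 1.5)}
r_next = {0: (0.0, 0.5, 1.0), 1: (1.0, 0.0, 2.0)}

diff = coefficient_differences(r_k, r_next)
rebuilt = apply_differences(r_k, diff)
```

Because both sets are indexed by the same vertices, the difference is well defined at every vertex, and it is this function that would then be decomposed into wavelet coefficients.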
  • the possible applications within the framework of the invention are numerous.
  • the invention thus applies very particularly to the coding of images representing the same fixed scene (which can be a set of independent images or a video).
  • This type of representation achieves low and very low bit rates (typically on the order of 20 kbit/s), so portable applications can be envisaged.
  • the virtual sequence obtained by reprojection has all the features allowed by 3D, such as changing the illumination, stabilizing the sequence, free navigation, adding objects ...

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)
EP04767398A 2003-06-18 2004-06-18 Method for representing an image sequence using 3D models, and corresponding devices and signal Withdrawn EP1654882A2 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0307375A FR2856548A1 (fr) 2003-06-18 Method for representing an image sequence by 3D models, and corresponding signal and devices
PCT/FR2004/001542 WO2004114669A2 (fr) 2003-06-18 2004-06-18 Method for representing an image sequence by 3D models, and corresponding signal and devices

Publications (1)

Publication Number Publication Date
EP1654882A2 (de) 2006-05-10

Family

ID=33484549

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04767398A Withdrawn EP1654882A2 (de) 2003-06-18 2004-06-18 Method for representing an image sequence using 3D models, and corresponding devices and signal

Country Status (8)

Country Link
EP (1) EP1654882A2 (de)
JP (1) JP2006527945A (de)
KR (1) KR20060015755A (de)
CN (1) CN1806443A (de)
BR (1) BRPI0411506A (de)
CA (1) CA2528709A1 (de)
FR (1) FR2856548A1 (de)
WO (1) WO2004114669A2 (de)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100053153A1 (en) 2007-02-01 2010-03-04 France Telecom Method of coding data representative of a multidimensional texture, coding device, decoding method and device and corresponding signal and program
EP2147557B1 (de) * 2007-04-18 2012-04-18 Gottfried Wilhelm Leibniz Universität Hannover Skalierbare komprimierung zeitkonsistenter 3d-netzwerksequenzen
CN104243958B (zh) * 2014-09-29 2016-10-05 联想(北京)有限公司 三维网格数据的编码、解码方法以及编码、解码装置
EP3516872A4 (de) * 2016-09-21 2020-04-15 Kakadu R & D Pty Ltd Am sockel verankerte modelle und inferenz zur kompression und zum upsampling von video- und mehrfachansichtsbildern
GB2563895B (en) * 2017-06-29 2019-09-18 Sony Interactive Entertainment Inc Video generation method and apparatus
EP4064206B1 (de) * 2019-11-20 2026-02-25 Panasonic Intellectual Property Management Co., Ltd. Verfahren zur erzeugung eines dreidimensionalen modells und vorrichtung zur erzeugung eines dreidimensionalen modells
CN111862305B (zh) * 2020-06-30 2024-06-18 阿波罗智能技术(北京)有限公司 处理图像的方法、装置、电子设备、存储介质和程序产品
JP7701898B2 (ja) * 2022-07-09 2025-07-02 Kddi株式会社 メッシュ復号装置、メッシュ符号化装置、メッシュ復号方法及びプログラム
US12542926B2 (en) * 2022-08-12 2026-02-03 Tencent America LLC Motion field coding in dynamic mesh compression
WO2025154699A1 (ja) * 2024-01-16 2025-07-24 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 符号化方法、復号方法、符号化装置及び復号装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2004114669A2 *

Also Published As

Publication number Publication date
BRPI0411506A (pt) 2006-07-25
WO2004114669A3 (fr) 2005-03-10
CA2528709A1 (en) 2004-12-29
JP2006527945A (ja) 2006-12-07
FR2856548A1 (fr) 2004-12-24
CN1806443A (zh) 2006-07-19
KR20060015755A (ko) 2006-02-20
WO2004114669A2 (fr) 2004-12-29

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060118

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL HR LT LV MK

RIN1 Information on inventor provided before grant (corrected)

Inventor name: GIOIA, PATRICK

Inventor name: BALTER, RAPHAELE

17Q First examination report despatched

Effective date: 20060830

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20080610