CN103226830B

CN103226830B - Automatic matching and correction method for video texture projection in a three-dimensional virtual-real fusion environment

Info

Publication number: CN103226830B
Application number: CN201310148771.0A
Authority: CN
Inventors: 高鑫光; 兰江; 李胜; 汪国平
Original assignee: Peking University
Current assignee: Beijing Weishiwei Information Technology Co ltd
Priority date: 2013-04-25
Filing date: 2013-04-25
Publication date: 2016-02-10
Anticipated expiration: 2033-04-25
Also published as: CN103226830A

Abstract

The invention relates to an automatic matching and correction method for video texture projection in a three-dimensional virtual-real fusion environment, and to a method for fusing real video images with a virtual scene. The steps of the automatic matching and correction method are: building a virtual scene, acquiring video data, video texture fusion, and projector correction. Captured real video is fused with the virtual scene by texture projection onto complex scene surfaces such as terrain and buildings, which improves the expression and display of dynamic scene information in the virtual reality environment and enhances the sense of depth of the scene. Increasing the number of videos captured from different angles extends dynamic video texture coverage to large-scale virtual scenes, producing a dynamic, realistic virtual-real fusion of the virtual reality environment and the real scene. Color consistency preprocessing of the video frames eliminates obvious color jumps and improves the visual effect. The automatic correction algorithm proposed by the invention makes the fusion of virtual scene and real video more accurate.

Description

Automatic Matching and Correction Method for Video Texture Projection in 3D Virtual Reality Fusion Environment

Technical Field

The invention relates to virtual reality, and in particular to a method for fusing real video images with a virtual scene and correcting the result. It belongs to the technical fields of virtual reality, computer graphics, computer vision, and human-computer interaction.

Background

In virtual reality systems, the most common way to show surface detail on buildings or the ground is to use static pictures, usually applied by texture mapping. The drawback is that once the texture of a scene surface is set it never changes. Ignoring the changing elements of the model surface reduces the realism of the virtual environment and fails to give an immersive feeling. Replacing pictures with video is an intuitive remedy. Some existing systems include video, but mostly as pop-up windows that play the video in an existing player; this achieves global monitoring but does not truly fuse video and scene. Some research improves on this by constructing an additional plane in space and playing the video on that plane to enhance realism (see K. Kim, S. Oh, J. Lee, I. Essa. Augmenting Aerial Earth Maps with Dynamic Information. IEEE International Symposium on Mixed and Augmented Reality, Science and Technology Proceedings, 19-22 Oct. 2009, Orlando, Florida, USA; and Y. Wang, D. Bowman, D. Krum, E. Coelho, T. Smith-Jackson, D. Bailey, S. Peck, S. Anand, T. Kennedy, and Y. Abdrazakov. Effects of Video Placement and Spatial Context Presentation on Path Reconstruction Tasks with Contextualized Videos. IEEE Transactions on Visualization and Computer Graphics, Vol. 14, No. 6, November/December 2008). Although these methods bring video into the virtual environment, their applicability is very limited: the video can only be attached to large building facades or flat ground. For slightly more complex cases, such as building corners or uneven ground, the geometry cannot be approximated by a plane, and plane-based video playback no longer applies.

On the other hand, graphics and vision research has produced many mature matching algorithms based on color, texture, and features (edge direction, SIFT, HOG). These methods operate on two-dimensional images and are of limited use in three-dimensional space. Moreover, existing projector-correction algorithms are mostly keystone-correction algorithms for the projection area in a "projector-screen" system. For example, "Multi-projector image correction method and device", application number 201010500209.6, is limited to two-dimensional space: it obtains the independent images with non-overlapping areas captured by each camera, together with the correction parameters corresponding to each independent image, and corrects the video data of each camera according to its correction parameters. The correction only addresses images with no or only small overlapping areas. "Touchable true 3D display method based on multi-projector rotating-screen 3D images", application number 200810114457.X, obtains cross-sectional images at different angles from a three-dimensional spatial description so that a hand can directly touch the stereoscopic image, and also improves the contrast of the stereoscopic image. However, that application relies mainly on a rotating screen to make the 3D image touchable, which differs from the application scenario of the present work.

Neither the above patent applications nor the prior-art feature-matching methods offer much guidance for correcting a projector in three-dimensional space.

Summary of the Invention

The purpose of the invention is to use captured real video, by means of texture projection, to perform virtual scene fusion on complex scene surfaces such as terrain and buildings. This improves the expression and display of dynamic scene information in the virtual reality environment and enhances the sense of depth of the scene. By increasing the number of videos captured from different angles, dynamic video texture coverage can be extended to large-scale virtual scenes, producing a dynamic, realistic virtual-real fusion of the virtual reality environment and the real scene.

To achieve this purpose, the invention adopts the following technical solution:

An automatic matching and correction method for video texture projection in a three-dimensional virtual-real fusion environment, comprising the steps of:

1) From pre-acquired remote sensing imagery, build a terrain model whose surface carries a static texture image, and a virtual scene composed of multiple models containing 3D geometry and texture; acquire multiple real captured video streams and record the camera pose at capture time;

2) According to the camera pose at capture time, add to the virtual scene a virtual projector model and a projector viewing volume (frustum) corresponding to the camera parameters, and set the initial pose of the virtual projector model in the virtual scene from the camera pose;

3) Preprocess the video frames of the real captured video streams to obtain a dynamic video texture, and project the preprocessed video data into the virtual environment using projective texturing;

4) Fuse the static texture of the model surfaces in the virtual environment and/or the original remote sensing image texture of the terrain with the dynamic video texture to obtain the final texture value covering the scene surface;

5) Using the final texture value, render the image seen from the virtual projector as the viewpoint, match it against the corresponding image in the real captured video stream, and construct an energy function;

6) Use the optimal solution of the energy function to reset the initial pose of the projector in the virtual scene, completing the virtual projector correction.

Further, the texture fusion in step 4) proceeds as follows:

1) Reset the model-view matrix and projection matrix to move the virtual viewpoint to the projector viewpoint, draw the virtual scene, and obtain the depth values under the current projector viewpoint (held in the Z-buffer);

2) Reset the model-view matrix and projection matrix to move the viewpoint back to the virtual viewpoint, redraw the virtual scene, and obtain the real depth value corresponding to each point in the scene;

3) Draw the virtual scene under each projector viewpoint in turn, obtain the projective texture coordinates of each point in the scene by automatic texture coordinate generation, and compare the real depth value from step 2) with the Z-buffer depth value from step 1);

4) If the two are equal, use the projector's video texture; if not, use the scene model's own texture. Iterate by setting the texture combiner function until all projectors in the scene have been traversed, which yields the final texture value of each point in the scene.
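The depth comparison in steps 3) and 4) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the scene has already been sampled into per-point lists, and the names `fuse_textures`, `fuse_all_projectors`, and the tolerance `eps` are hypothetical. A small tolerance stands in for the exact equality test in the text, because stored and recomputed depths rarely match bit for bit.

```python
def fuse_textures(static_tex, video_tex, projector_depth, point_depth, eps=1e-3):
    """Choose a texture per scene point by comparing depths (hypothetical helper).

    All arguments are equal-length lists indexed by scene point:
    static_tex[i]      - the model's own colour at point i
    video_tex[i]       - the projector's video colour at point i, or None when
                         the point lies outside the projector's viewing volume
    projector_depth[i] - nearest depth stored in the projector's Z-buffer along
                         the ray through point i
    point_depth[i]     - depth of point i itself as seen from the projector
    """
    fused = []
    for s, v, zbuf, z in zip(static_tex, video_tex, projector_depth, point_depth):
        # The point receives video only if it is the surface nearest the projector.
        visible = v is not None and abs(z - zbuf) <= eps
        fused.append(v if visible else s)
    return fused


def fuse_all_projectors(static_tex, projectors):
    """Apply every projector in turn; a later projector replaces an earlier one."""
    result = list(static_tex)
    for video_tex, projector_depth, point_depth in projectors:
        result = fuse_textures(result, video_tex, projector_depth, point_depth)
    return result
```

With three points of which only the first is both inside the viewing volume and unoccluded, `fuse_textures(["s0", "s1", "s2"], ["v0", "v1", None], [0.5, 0.3, 0.7], [0.5, 0.6, 0.7])` keeps the video colour for the first point and the model's own colour for the other two.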

Further, in step 5), matching against the corresponding image in the real captured video stream and constructing the energy function with the pose as its independent variable proceeds as follows:

Step 1: reset the model-view matrix and projection matrix to move the viewpoint in the virtual scene to the projector, draw the scene to obtain an image of the virtual environment, segment the image with the mean-shift algorithm, and then binarize it;

Step 2: extract a key frame from the real captured video stream and binarize it using the method of step 1;

Step 3: compute the contour error within the viewing volume formed by the projector by XOR-ing the images from the first two steps pixel by pixel and counting the pixels whose result is 1; this count is the first part of the energy function;

Step 4: add local feature information using the SIFT operator: collect matching point pairs between the un-binarized images obtained in steps 1 and 2, and compute the error of the matched pairs through a key-point constraint process; this error is the second part of the energy function;

Step 5: assign different weights to the two parts of the energy function;

Step 6: solve for the optimal value of the energy function;

Step 7: replace the initial pose of the projector with the optimal solution.
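The two-part energy described in steps 3 to 5 can be sketched as below. This is a minimal sketch under stated assumptions: the masks are already binarized lists of 0/1 rows, the matched point pairs are supplied by the caller (the source obtains them with SIFT), and the key-point term is taken to be the sum of Euclidean distances between matched points, which the source does not specify. The function names and default weights are hypothetical.

```python
import math


def contour_error(virtual_mask, real_mask):
    """Count the pixels where two binary masks differ (pixel-wise XOR)."""
    return sum(
        a ^ b
        for row_v, row_r in zip(virtual_mask, real_mask)
        for a, b in zip(row_v, row_r)
    )


def keypoint_error(matches):
    """Sum of distances between matched points (assumed form of the key-point term).

    matches is a list of ((xv, yv), (xr, yr)) pairs: a point in the rendered
    virtual image and its matched point in the real key frame.
    """
    return sum(math.hypot(xv - xr, yv - yr) for (xv, yv), (xr, yr) in matches)


def energy(virtual_mask, real_mask, matches, w_contour=1.0, w_key=1.0):
    """Weighted sum of the contour part and the key-point part."""
    return (w_contour * contour_error(virtual_mask, real_mask)
            + w_key * keypoint_error(matches))
```

A perfectly aligned projector would give identical masks and coincident key points, so both terms, and therefore the energy, would be zero.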

Further, the optimal value of the energy function is found as follows:

First, simulated annealing is applied to the energy function to narrow its solution space to the neighborhood of the optimal solution; the downhill simplex algorithm is then used to compress this approximate solution space and obtain the optimal solution.
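The coarse-then-fine search can be illustrated with a small pure-Python sketch. Everything here is an assumption for illustration: the step size, cooling schedule, iteration counts, and the simplified downhill simplex variant (reflection, expansion, inside contraction, shrink) are not taken from the source, and a two-parameter toy function stands in for the six-parameter pose energy.

```python
import math
import random


def simulated_annealing(f, x0, step=1.0, t0=1.0, cooling=0.95, iters=500, seed=0):
    """Coarse global search: returns the best point visited."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(iters):
        cand = [xi + rng.uniform(-step, step) for xi in x]
        fc = f(cand)
        # Always accept improvements; accept worse points with Boltzmann probability.
        if fc < fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t = max(t * cooling, 1e-12)
    return best


def downhill_simplex(f, x0, scale=0.5, iters=300):
    """Local refinement with a simplified Nelder-Mead scheme."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += scale
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:  # shrink every vertex toward the best one
                simplex = [best] + [
                    [b + 0.5 * (vi - b) for b, vi in zip(best, v)]
                    for v in simplex[1:]
                ]
    return min(simplex, key=f)


def calibrate(f, x0):
    """Simulated annealing narrows the search, downhill simplex refines it."""
    return downhill_simplex(f, simulated_annealing(f, x0))
```

For example, `calibrate(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [5.0, 5.0])` starts far from the minimum and returns a point close to (1, -2). In the method described above, `f` would be the energy function evaluated at a candidate projector pose.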

Further, when the image is segmented with the mean-shift algorithm in steps 1 and 2, the color features of buildings and roads are used: pixels outside building or road regions are set to white and the regions corresponding to building models or roads are kept. The image is then binarized, with the building and road regions set to black.

Further, the video frames of the real captured video stream are preprocessed as follows:

The video data is decoded into individual video frames. One sample frame is extracted from each video stream, the SIFT operator is used to find feature-point matches between the sample frames, and color consistency processing is applied.

Further, the color consistency processing is:

1) Extract one sample frame from each of the two videos being matched, build a color histogram over all pixels in each frame, and apply histogram equalization and specification so that the two frames have the same color histogram distribution;

2) Apply to every frame in a video stream the same histogram equalization and specification as was applied to that stream's sample frame, making the whole video stream consistent;

3) Create a cache for the video frames, sized to hold about 50 frames (at a frame resolution of 1920*1080);

4) Load the frame data using a first-in-first-out (FIFO) list structure.
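Items 1) and 2) can be sketched with histogram specification on a single channel. This is an illustrative sketch only: it works on flat lists of integer intensity levels, and applying it to each color channel separately is an assumption, since the source does not say how the color histogram is organized. The function names are hypothetical.

```python
def cdf(values, levels=256):
    """Cumulative distribution of integer intensity levels in [0, levels)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, running, out = len(values), 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out


def match_lut(source, reference, levels=256):
    """Lookup table that reshapes the source histogram toward the reference."""
    src_cdf, ref_cdf = cdf(source, levels), cdf(reference, levels)
    lut, j = [], 0
    for s in src_cdf:
        # Both CDFs are non-decreasing, so j only ever moves forward.
        while j < levels - 1 and ref_cdf[j] < s:
            j += 1
        lut.append(j)
    return lut


def apply_lut(values, lut):
    """Apply a lookup table computed from sample frames to any frame of the stream."""
    return [lut[v] for v in values]
```

The lookup table is computed once from the two sample frames and then reused for every frame of the same stream, which mirrors item 2).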

Further, the real captured video streams are obtained over the HTTP protocol, the video data is decoded locally, and the video frames are saved in JPEG format.

Further, multi-resolution processing is applied to the video frames: for the same image, frames of different resolutions are loaded depending on the situation. Bilinear interpolation is performed pixel by pixel to decimate the image to one or more of 1/4, 1/16, and 1/64 of the original.
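A minimal sketch of the three decimation levels follows, assuming a single-channel image stored as a list of rows with even dimensions at every level. Sampling bilinearly at the center of each 2x2 block is the same as averaging the block, so each call to the hypothetical `halve` helper reduces the pixel count to 1/4.

```python
def halve(image):
    """Downsample a 2-D grid by 2 along each axis by averaging 2x2 blocks."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(image, levels=3):
    """Return [full, 1/4, 1/16, 1/64] resolutions (by pixel count) for levels=3."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(halve(pyramid[-1]))
    return pyramid
```

For an 8x8 input, `build_pyramid` yields grids of 64, 16, 4, and 1 pixels, matching the 1/4, 1/16, and 1/64 ratios in the text.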

Further, an alpha channel is added to the JPEG-format video images.

The invention also proposes a method for fusing real video images with a virtual scene, comprising the steps of:

1) From pre-acquired remote sensing imagery, build models whose surfaces carry static texture images, and a virtual scene; the spatial positions of the models in the virtual scene and the relative positions, orientations, and sizes between models are kept consistent with the real scene;

2) Acquire multiple real captured video streams and record the camera pose at capture time;

3) The method can be implemented on a digital-earth-based virtual reality platform in which each virtual projector has two coordinate representations: geographic positioning information and the Cartesian coordinates of the virtual reality scene. The longitude and latitude of the capture location on the Earth's surface are therefore converted into world coordinates in the Cartesian system of the virtual scene; a virtual projector model and its corresponding viewing volume are added to the virtual scene; and the initial pose of the virtual projector model in the world coordinate system of the virtual scene is set from the camera pose;
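The conversion from longitude and latitude to Cartesian world coordinates in step 3) might look like the sketch below. The source does not name the ellipsoid or the Cartesian frame used by its platform; the WGS84 ellipsoid and an Earth-centered, Earth-fixed (ECEF) frame are assumptions made here for illustration, and `geodetic_to_ecef` is a hypothetical name.

```python
import math

WGS84_A = 6378137.0             # semi-major axis in metres (assumed ellipsoid)
WGS84_F = 1.0 / 298.257223563   # flattening of the WGS84 ellipsoid


def geodetic_to_ecef(lat_deg, lon_deg, height_m=0.0):
    """Convert latitude, longitude, and height to ECEF x, y, z in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    e2 = WGS84_F * (2.0 - WGS84_F)                           # first eccentricity squared
    n = WGS84_A / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)   # prime vertical radius
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height_m) * math.sin(lat)
    return x, y, z
```

A point on the equator at zero longitude maps to the semi-major axis on the x-axis, which gives a quick sanity check. A real platform would then apply whatever offset and axis convention its scene uses.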

4) Preprocess the video frames of the real captured video streams to obtain a dynamic video texture, and project the preprocessed video data into the virtual environment using projective texturing;

5) Fuse the static texture of the models in the virtual environment and/or the original remote sensing image texture of the terrain with the dynamic video texture;

6) Apply texture fusion to regions where the coverage of different projectors intersects.

Beneficial Effects of the Invention

(a) The method copes with complex scene conditions and fuses video with the virtual scene. Video texture replaces the original terrain remote sensing texture and the coarse static image texture of the models, adding dynamic information to the virtual scene texture and improving the visual effect. Increasing the number of videos enlarges the covered area.

(b) A cache structure is provided for the video and a data pyramid is built, which improves display efficiency; data from two adjacent pyramid levels can replace each other.

(c) An automatic correction algorithm adjusts the initial pose of the virtual projector so that the fusion of virtual scene and real video is more accurate. Compared with the initial pose, the improvement shows in the value of the energy function: the more accurate the pose, the closer the energy value is to zero.

(d) Color consistency preprocessing of the video frames eliminates obvious color jumps and improves the visual effect.

Description of the Drawings

Fig. 1 is a flowchart of the specific operations in an embodiment of the automatic matching and correction method for video texture projection in a three-dimensional virtual-real fusion environment according to the invention;

Fig. 2a and Fig. 2b are schematic views of a scene without projected texture in an embodiment of the method;

Fig. 3a and Fig. 3b are schematic views of a scene with projected texture added in an embodiment of the method;

Fig. 4 is a schematic view of a scene in which the projector has not been corrected in an embodiment of the method;

Fig. 5 is a schematic view of a scene in which the projector has been corrected in an embodiment of the method.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the invention.

(1) Build the virtual scene. Pre-acquired remote sensing imagery is used to set the terrain texture. Models whose surfaces carry static texture images, and the virtual scene composed of those models, are built in virtual space. The spatial positions of the models and the relative positions, orientations, and sizes between models should match the real scene as closely as possible.

(2) Acquire video data. The data may come from surveillance cameras or from video shot on mobile devices. The camera parameters at capture time are acquired as well and are used for the initial positioning of the projector in virtual space. The video stream is decimated into single frames, and multi-resolution processing and color and illumination consistency processing are applied to the images.

(3) Video texture fusion. According to the camera longitude and latitude obtained in step (2), a virtual projector model and its viewing volume are added to the scene built in step (1), and the orientation of the virtual projector model is set from the camera pose obtained in step (2). At present, owing to limits of the implementation, at most 32 virtual projector models can be loaded simultaneously within the viewing volume of a single virtual viewpoint. Projective texturing projects the video data into the virtual environment, and the original remote sensing image texture of the models or the ground is fused with the dynamic video texture. Where different projectors have intersecting coverage, the fusion operation is applied to that region as well.

(4) Projector correction. The image under the virtual projector viewpoint is obtained and matched against the image in the corresponding real video. The algorithm of the invention computes the extent of the difference between buildings or roads in the virtual and real images, together with local feature differences, constructs an energy function, and solves for its optimal solution. The optimal solution is used to reset the projector in the virtual scene, completing the correction and improving the result.

The method of the invention is described in detail below from several aspects.

First, some concepts are defined:

Camera: a video source in real space, used to acquire video data.

Projector: a virtual model in the virtual scene, used to project video texture into the virtual scene.

Texture fusion: a model may carry textures from several sources, so the different texture color values at the same point must be fused to obtain the final color value.

Depth value: the value obtained after perspective transformation of any point in space, representing its distance from the virtual viewpoint along the Z direction.

Depth buffer (Z-buffer): a buffer of the same size as the color buffer, saved after the scene is rendered. Each element stores a depth value: the depth of the object surface nearest the viewpoint in the part of the 3D scene corresponding to that element.

Projective texturing: unlike traditional four-point texture mapping, the texture is applied to the virtual scene as a projection and fused with the buildings and/or terrain in the scene to serve as their final texture.

Technical solution (2) is implemented as follows. Video data is obtained over the conventional HTTP protocol and decoded locally into individual video frames, which are saved in JPEG format. One sample frame is extracted from each video stream, and the SIFT operator is used to find feature-point matches between the sample frames (the video sources can be pre-classified here by grouping sources that capture the same building, which shortens matching and improves preprocessing efficiency). Color consistency processing is then applied.

The color consistency processing works as follows. One sample frame is extracted from each of the two videos being matched, and a color histogram is built over all pixels in each frame. Histogram equalization and specification then give the two frames the same color histogram distribution. Because all frames in one video stream have approximately the same color histogram distribution as the stream's sample frame, every frame in the stream receives the same equalization and specification as its sample frame, which makes the whole stream consistent. The aim is for video textures in overlapping areas to have the same texture color, improving the blending between videos and avoiding visible jumps.

Because the decoded video frames are large and memory is scarce, a cache holding 50 video frames is created. The figure 50 is chosen as follows. While the video streams are acquired and parsed, the maximum resolution of a single frame is 1920*1080 and each pixel has 4 bytes, namely the RGB and Alpha channels, where Alpha determines the translucency of the image and ranges from 0 to 255 (0 means opaque, 255 means fully transparent). Reading one frame is estimated to consume 1 MB, so 30 video channels consume 30 MB; assuming 1 GB of memory, at most 50 frames can be cached. The cache uses a FIFO list structure. If the cache is full, incoming data is paused and waits. If, because of network problems, display runs faster than loading, the previous frame is shown again until new frame data arrives.

An optimization provided by the invention: another space-saving strategy is multi-resolution processing of the video frames. A pyramid is built for each image, and frames of different resolutions are loaded depending on the situation, which saves memory. Specifically, bilinear interpolation is performed pixel by pixel to decimate the image to 1/4, 1/16, and 1/64 of the original.

A further optimization provided by the invention: to improve efficiency and let the video frames be used directly by the texture projection algorithm, an Alpha channel is added to each video frame image to serve as the blending parameter between video and scene or between videos.
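The cache behavior described above (FIFO order, pausing input when full, repeating the previous frame when the buffer runs dry) can be sketched as follows. `FrameCache` and its method names are hypothetical, and signaling "pause" by returning `False` is an illustrative choice; the source does not describe the interface.

```python
from collections import deque


class FrameCache:
    """Bounded first-in-first-out buffer for decoded video frames."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        self.frames = deque()
        self.last_shown = None

    def push(self, frame):
        """Store a frame; return False when full so the loader can pause and retry."""
        if len(self.frames) >= self.capacity:
            return False
        self.frames.append(frame)
        return True

    def pop(self):
        """Return the oldest buffered frame, or repeat the previous one if empty."""
        if self.frames:
            self.last_shown = self.frames.popleft()
        return self.last_shown
```

As a side calculation, an uncompressed 1920*1080 frame at 4 bytes per pixel occupies 1920 × 1080 × 4 = 8,294,400 bytes (about 7.9 MiB), so the 1 MB per frame estimate in the text presumably refers to compressed frame data.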

技术方案（3）具体实现方案如下：The specific implementation plan of technical solution (3) is as follows:

首先通过模型视图矩阵和投影矩阵重置将虚拟视点变换至投影机视点下，清空虚拟视点下的深度缓冲区，并设置多边形偏移量和颜色掩码，绘制技术方案（1）中对应的虚拟场景，获得在当前投影机作为视点下的深度缓冲，并构成深度纹理；First, transform the virtual view point to the projector view point by resetting the model view matrix and projection matrix, clear the depth buffer under the virtual view point, and set the polygon offset and color mask, and draw the corresponding virtual view in technical solution (1). Scene, obtain the depth buffer under the current projector as the viewpoint, and form a depth texture;

其次，通过模型视图矩阵和投影矩阵重置将视点变回虚拟视点下，清空颜色和深度缓冲区，绘制技术方案（1）中对应的虚拟场景，包含其表面纹理，由此获得场景中每个点对应的真实深度值。Secondly, change the view point back to the virtual view point by resetting the model view matrix and projection matrix, clear the color and depth buffers, and draw the corresponding virtual scene in the technical solution (1), including its surface texture, thus obtaining each scene in the scene The real depth value corresponding to the point.

Finally, the model-view matrix and projection matrix are reset and the scene is drawn in turn under each projector viewpoint. The projected texture coordinates of each point in the scene are obtained by automatic texture coordinate generation, and the final texture value of each point is decided by comparing the real depth value with the Z-Buffer value obtained in the first and second steps: if the two are equal, the projector's video texture is used; if not, the scene model's own texture is used. This process is iterated until all projectors in the scene have been traversed.
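The per-point decision in the third pass can be illustrated with a CPU-side sketch. It is not the rendering code of the invention: the function name `choose_texture`, the list-based depth map, and the tolerance `eps` (standing in for the polygon offset) are all assumptions, and both depth values are assumed to be expressed in the projector's depth range.

```python
def choose_texture(depth_map, s, t, point_depth, video_texel, model_texel, eps=1e-3):
    """Pick the video texel only where the point is the surface nearest to the projector.

    depth_map   -- 2D list of depths rendered from the projector viewpoint (row-major)
    s, t        -- projected texture coordinates of the point, expected in [0, 1]
    point_depth -- depth of the point itself as seen from the projector
    eps         -- tolerance standing in for the polygon offset (assumed value)
    """
    if not (0.0 <= s <= 1.0 and 0.0 <= t <= 1.0):
        return model_texel  # outside the projector's viewing volume
    rows, cols = len(depth_map), len(depth_map[0])
    col = min(int(s * cols), cols - 1)
    row = min(int(t * rows), rows - 1)
    visible = abs(depth_map[row][col] - point_depth) <= eps
    return video_texel if visible else model_texel


depth_map = [[0.5, 0.5],
             [0.5, 0.9]]
lit = choose_texture(depth_map, 0.25, 0.25, 0.5, "video", "model")
occluded = choose_texture(depth_map, 0.25, 0.25, 0.8, "video", "model")
outside = choose_texture(depth_map, 1.5, 0.2, 0.5, "video", "model")
```

A point whose depth matches the stored Z-Buffer value receives the video texel. A point lying behind another surface, or outside the projector's viewing volume, keeps the model's own texture.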

Fusion between different videos is implemented by setting the texture combiner function. Because color consistency correction has already been completed during video frame preprocessing, the Replace mode is used here; that is, the later texture fragment replaces the existing value.

The specific implementation of technical solution (4) is as follows. Correcting the projector pose means correcting the projector's three-dimensional coordinates x, y, z in the virtual scene and its deflection angles φ, θ, γ in three directions.

The pose values recorded during video acquisition are used as the initial values of the virtual projector in the virtual scene. Because of limited device precision, however, these values do not make the projected texture fuse exactly with the virtual space, so an additional correction process is required. The present invention corrects the virtual projector by constructing an energy function with the pose information as its independent variables and solving for the optimal value of that energy function.

First, the model-view matrix and projection matrix are reset to move the viewpoint in the virtual scene to the projector, and the scene is drawn to obtain an image of the virtual environment. The image is segmented with the mean-shift algorithm. Using the color features of buildings and roads, the pixels of non-building and non-road areas are set to white so that only the areas corresponding to building models or roads are kept. The image is then binarized, with the building and road areas set to black.

Second, a key frame is extracted from the video and processed with a method similar to that of the first step: the areas corresponding to building models or roads are kept, and the image is binarized.

Third, the contour error is calculated within the projector's viewing volume.

The images obtained in the first two steps are XORed pixel by pixel, and the number of pixels whose result is 1 is counted. This count is the first part of the energy function.
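As a minimal sketch of this contour term, the function below counts the pixels on which two binary masks disagree. The name `contour_error` and the nested-list mask representation are assumptions made for illustration; here 1 marks a building or road pixel, although the XOR count is the same whichever value is used for the foreground.

```python
def contour_error(mask_a, mask_b):
    """Count pixels where two equally sized binary masks disagree (pixel-wise XOR)."""
    same_shape = len(mask_a) == len(mask_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    )
    if not same_shape:
        raise ValueError("masks must have the same resolution")
    return sum(a ^ b for row_a, row_b in zip(mask_a, mask_b) for a, b in zip(row_a, row_b))


# Binarized render from the projector viewpoint and binarized video key frame.
rendered = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
keyframe = [[0, 0, 1],
            [0, 1, 1],
            [0, 1, 0]]
error = contour_error(rendered, keyframe)  # two pixels differ
```

A perfectly aligned projector would give an error of 0, so a smaller count indicates a better pose.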

Fourth, for buildings whose appearance is axially symmetric, shape matching alone may produce wrong results, so features carrying local information must be added. The SIFT consistency operator is used to collect matching point pairs from the un-binarized images obtained in the first and second steps. The error value of the matching point pairs is computed through the key-point constraint process, and this value is the second part of the energy function.

Fifth, different weights are assigned to the two parts of the energy function; the present invention assigns the larger weight to the global contour error. This completes the construction of the energy function.
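One way to picture the resulting energy is as a weighted sum of the two error terms. The sketch below is an assumption-laden illustration: the text does not give the weights or the exact key-point error formula, so the values 0.7 and 0.3 and the use of the mean Euclidean distance between matched points are placeholders, and the function names are hypothetical.

```python
import math


def keypoint_error(pairs):
    """Mean Euclidean distance between matched points ((x1, y1), (x2, y2))."""
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def energy(contour_err, keypoint_err, w_contour=0.7, w_keypoint=0.3):
    """Weighted sum of the two terms; the contour term gets the larger weight."""
    return w_contour * contour_err + w_keypoint * keypoint_err


# Two matched pairs: one is 5 pixels apart, the other coincides.
pairs = [((0, 0), (3, 4)), ((10, 10), (10, 10))]
kp_err = keypoint_error(pairs)
total = energy(contour_err=2, keypoint_err=kp_err)
```

Because the contour term is a pixel count and the key-point term is a distance, a practical implementation would probably normalize both before weighting them.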

Sixth, the optimal value of the energy function is solved. The simulated annealing algorithm is first applied to the energy function to narrow its solution space to the neighborhood of the optimal solution. The downhill simplex algorithm is then used to compress this approximate solution space further and obtain the optimal solution.
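The two-stage search can be sketched in plain Python as follows. This is a simplified illustration, not the implementation of the invention: the cooling schedule, step sizes, iteration counts, and the `toy_energy` function with its `TARGET` pose are all assumed values, and the simplex routine is a reduced Nelder-Mead variant that uses only reflection, expansion, inside contraction, and shrinking.

```python
import math
import random


def simulated_annealing(f, x0, step=1.0, t0=1.0, cooling=0.95, iters=2000, seed=0):
    """Coarse global search; returns the best point visited."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(iters):
        cand = [xi + rng.uniform(-step, step) for xi in x]
        fc = f(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc < fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t = max(t * cooling, 1e-12)
    return best


def downhill_simplex(f, x0, scale=0.5, iters=500, tol=1e-10):
    """Local refinement with a simplified Nelder-Mead (downhill simplex) method."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (scale if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(c):  # point at centroid + c * (centroid - worst)
            return [centroid[j] + c * (centroid[j] - worst[j]) for j in range(n)]

        refl = along(1.0)
        if f(refl) < f(best):
            expd = along(2.0)
            simplex[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = along(-0.5)
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:  # shrink every vertex toward the best one
                simplex = [best] + [
                    [(p[j] + best[j]) / 2 for j in range(n)] for p in simplex[1:]
                ]
    return min(simplex, key=f)


TARGET = [1.0, -2.0, 0.5, 0.1, -0.2, 0.3]  # hypothetical pose (x, y, z, phi, theta, gamma)


def toy_energy(pose):
    """Stand-in for the real energy: a bowl with ripples that create local minima."""
    return sum(d * d + 0.5 * (1.0 - math.cos(3.0 * d))
               for d in (p - t for p, t in zip(pose, TARGET)))


start = [0.0] * 6
coarse = simulated_annealing(toy_energy, start, step=0.5)
refined = downhill_simplex(toy_energy, coarse, scale=0.1)
```

Both stages keep the best point they have seen, so the refined energy is never worse than the coarse one. In the real setting each energy evaluation requires rendering and segmenting an image, so caching the function values instead of re-evaluating them on every sort would matter.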

Seventh, the optimal solution replaces the initial projector pose values used in technical solution (3).

Following the video fusion and correction workflow, this embodiment can be implemented in the following steps:

1 Construct the virtual scene

Taking a virtual campus as an example, a data pyramid is built from existing terrain remote sensing data, and textures of different levels are bound to the terrain for different viewpoints. Models of the landmark buildings on campus are created and, guided by the terrain remote sensing data, manually placed at the corresponding positions so that the relative positions of the buildings match reality as closely as possible. See Figure 1, steps (2) remote sensing terrain data → (3) LOD processing → (4) terrain texture values → (6) model data → (7) spatial position calibration → (8) model textures.

2 Acquire video data

Unprocessed video streams are acquired from different video sources, such as campus surveillance cameras, cameras, or mobile phones. Each video stream is decomposed into single-frame images, multi-resolution processing is applied to create images at different resolutions, and an Alpha channel is added to each image to facilitate the later fusion between videos and between video and scene. In addition, the latitude and longitude, viewing angle, and direction of each video source are saved for the initial positioning of the projector in the virtual space. See Figure 1, steps (11) video frame extraction, multi-resolution construction, and adding an alpha channel to the image → (13) video stream → (14) projector texture.
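The multi-resolution step can be sketched as a simple image pyramid. The sketch assumes that 1/4, 1/16, and 1/64 refer to pixel count, so each level halves both width and height, and it works on a single-channel image stored as nested lists. Averaging each 2×2 block is used because it equals bilinear interpolation sampled at the centre of the block. The function names are hypothetical.

```python
def downsample_half(image):
    """Halve width and height by averaging each 2x2 block.

    Odd trailing rows or columns are dropped in this sketch.
    """
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, levels=3):
    """Return [original, 1/4, 1/16, 1/64 of the pixels] for levels=3."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(downsample_half(pyramid[-1]))
    return pyramid


# 8x8 grayscale test frame with values 0..63.
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(frame)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

A renderer could then pick a coarser level when the projected region covers few screen pixels, which is where the memory saving comes from.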

3 Video texture fusion

The video images are fused with the terrain and model textures through three scene-drawing passes. The first pass draws the objects from the projector's viewpoint to obtain the corresponding Z-Buffer. The second pass draws the scene from the virtual viewpoint to obtain the real depth value of each point in the scene. The third pass compares the depth values obtained in the first two passes to decide the texture value of each point in the scene. The fusion itself is achieved by setting different texture combiners. See Figure 1, steps (1) automatic texture coordinate generation → (5) comparison of Z-buffer values with real depth values → (9) multi-pass drawing of the virtual scene → (10) texture combiner function → (12) final image.

4 Projector correction

The viewpoint is placed at the projector and the scene is drawn to obtain an image of the virtual scene. The video stream of the corresponding real scene is located from the projector, and one key frame is extracted from it. Both images are then segmented. Many algorithms can be used, such as mean-shift, normalized cut, JSEG, and pixel affinity; the present invention uses the mean-shift algorithm. The image is divided into regions, the ground or building parts are extracted according to their color features, and irrelevant parts are discarded. The separated images are normalized: areas not occupied by the model are set to white, and areas occupied by the model are set to black. The two images are then XORed pixel by pixel; if their resolutions differ, a step that makes them consistent must be added first. The pixels whose result is 1 are counted, and this count is the first part of the energy function.

Contour matching is a global comparison, so it must be supplemented by local feature matching. The SIFT feature matching operator is used here to select several groups of feature points and compute their key-point error values, which are summed as the second part of the energy function.

The projector correction problem thus becomes the problem of finding the optimal solution of an energy function of several independent variables. The present invention uses a combination of the simulated annealing and downhill simplex algorithms: simulated annealing finds an approximate optimal solution, and the downhill simplex algorithm then optimizes the projector position within a small range. Once the optimal solution is found, it replaces the pose of the virtual projector, completing the correction. See Figure 1, steps (15) mean-shift-based image segmentation → (16) SIFT operator extracts local feature matching points → (17) virtual projector pose correction values → (18) building or road extraction and XOR operation → (19) key-point error → (20) downhill simplex → (21) simulated annealing → (22) construction of the energy function.

Claims

1. An automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment, the steps comprising:

1) Establish a surface model and a virtual scene with static texture images on the surface according to the pre-acquired remote sensing data images; obtain multiple real shooting video streams and record the camera pose information when shooting;

2) Adding to the virtual scene, according to the camera pose information at the time of shooting, a virtual projector model and a projector viewing volume corresponding to the camera parameters, and setting the initial pose value of the virtual projector model in the virtual scene according to the camera pose information;

3) performing video frame preprocessing on the images of the real captured video streams to obtain dynamic video textures, and projecting the preprocessed video data into the virtual environment using projection texture technology;

4) Fusing the static surface texture of the surface model in the virtual environment and/or the original remote sensing image texture of the surface with the dynamic video texture to obtain the final texture value of the scene surface coverage, including:

4-1) Resetting the model-view matrix and the projection matrix to transform the virtual viewpoint to the projector viewpoint, draw the virtual scene, and obtain the depth value under the current projector viewpoint;

4-2) Resetting the model-view matrix and the projection matrix to change the viewpoint back to the virtual viewpoint, redraw the virtual scene, and obtain the real depth value corresponding to each point in the scene;

4-3) Drawing the virtual scene sequentially under each projector viewpoint, obtaining the projected texture coordinates of each point in the scene through automatic texture coordinate generation, and comparing the real depth value obtained in step 4-2) with the depth value obtained in step 4-1);

4-4) If the two are equal, using the video texture of the projector; if they are not equal, using the texture of the scene model itself; and iterating, by setting the texture combiner function, until all projectors in the scene have been traversed, thereby obtaining the final texture value of each point in the scene;

5) Obtaining from the virtual projector model, according to the final texture values, an image with the virtual projector as the viewpoint, and matching it with the corresponding image in the real shooting video stream to construct an energy function;

6) Using the optimal solution of the energy function to reset the initial pose value of the projector in the virtual scene, thereby completing the calibration of the virtual projector.

2. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 1, characterized in that, in step 5), the matching with the corresponding image in the real shooting video stream and the construction of the energy function with the pose information as the independent variable are carried out as follows:

The first step is to reset the model-view matrix and projection matrix, move the viewpoint in the virtual scene to the projector, draw the scene to obtain an image of the virtual environment, segment the image with the mean-shift algorithm, and then binarize the image;

The second step is to extract a key frame from the real shooting video stream and binarize it using the method of the first step;

The third step is to calculate the contour error in the viewing volume area formed by the projector, perform XOR processing pixel by pixel on the images obtained in the first two steps, and count the number of pixels whose result is 1, which is the first part of the energy function;

The fourth step is to add features carrying local information using the SIFT consistency operator: the matching point pairs in the un-binarized images obtained in the first and second steps are collected, and the error value of the matching point pairs is obtained through the key-point constraint process; this error value is the second part of the energy function;

The fifth step is to assign different weights to the two parts of the energy function;

The sixth step is to solve for the optimal value of the energy function;

The seventh step is to replace the initial pose value of the projector with the optimal solution.

3. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 1 or 2, characterized in that the optimal value of the energy function is solved as follows:

First, the simulated annealing algorithm is applied to the energy function to narrow the solution space of the function to the approximate range of the optimal solution, and then the downhill simplex algorithm is used to compress the approximate solution space to obtain the optimal solution.

4. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 2, characterized in that, in the first and second steps, when the image is segmented by the mean-shift algorithm, the color features of buildings and roads are used to set the pixel values of non-building or non-road areas to white and retain the areas corresponding to the building model or road; the image is then binarized, and the building and road areas are set to black.

5. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 1, characterized in that the video frame preprocessing of the images of the real shooting video stream is carried out as follows:

The video data is decoded to obtain single video image frames, a sample frame is extracted from each video stream, the SIFT operator is used to find matching feature points in the sample frames, and color consistency processing is performed.

6. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 5, characterized in that the color consistency processing is:

1) Extracting a sample frame from each of the two videos to be matched, constructing the color histogram formed by all pixels in each frame, and making the two video frames have the same color histogram distribution through color histogram equalization and specification;

2) Applying to every frame of a video stream the same histogram equalization and specification as applied to its sample frame, thereby completing the consistency processing of the entire video stream;

3) Create a cache for video frames, the size of which can accommodate 50 video frames;

4) The video frame data is loaded using the list structure of the first-in-first-out FIFO.

7. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 5, characterized in that the real shooting video stream is obtained via the HTTP protocol, the video data is decoded locally, and the video frames are saved in JPEG format.

8. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 5, wherein the video image frames are subjected to multi-resolution processing, video frames of different resolutions of the same image are loaded according to different situations, and a bilinear interpolation operation is performed pixel by pixel to downsample the image to one or more of 1/4, 1/16, and 1/64 of the original image.

9. The automatic matching correction method for video texture projection in a three-dimensional virtual-real fusion environment as claimed in claim 7, characterized in that an Alpha channel is added to the JPEG format video images.

10. A real video image and virtual scene fusion method, the steps of which are:

1) Establishing a static texture image model and a virtual scene on the surface according to pre-acquired remote sensing data images; the spatial position of the model in the virtual scene and the relative position, orientation, and size between the models are consistent with the real scene;

2) Obtain multiple segments of real shooting video streams and record the pose information of the camera where the shooting takes place;

3) Converting the latitude and longitude coordinates of the shooting location on the earth's surface into the Cartesian world coordinates of the virtual scene and combining them, adding a virtual projector model and the viewing volume corresponding to the projector model to the virtual scene, and setting the initial pose value of the virtual projector model in the virtual scene, in the world coordinate system, according to the camera pose information;

4) performing video frame preprocessing on the images of the real shooting video stream to obtain dynamic video textures, and projecting the preprocessed video data into a virtual environment using projection texture technology;

5) Fusing the static texture of the model in the virtual environment and/or the original remote sensing image texture of the ground surface with the dynamic video texture, including:

5-1) Resetting the model-view matrix and the projection matrix to transform the virtual viewpoint to the projector viewpoint, draw the virtual scene, and obtain the depth value under the current projector viewpoint;

5-2) Resetting the model-view matrix and the projection matrix to change the viewpoint back to the virtual viewpoint, redraw the virtual scene, and obtain the real depth value corresponding to each point in the scene;

5-3) Drawing the virtual scene sequentially under each projector viewpoint, obtaining the projected texture coordinates of each point in the scene through automatic texture coordinate generation, and comparing the real depth value obtained in step 5-2) with the depth value obtained in step 5-1);

5-4) If the two are equal, using the video texture of the projector; if they are not equal, using the texture of the scene model itself; and iterating, by setting the texture combiner function, until all projectors in the scene have been traversed, thereby obtaining the final texture value of each point in the scene;

6) Texture fusion is used for overlapping coverage areas of different projectors in the virtual projector model.