TW512308B - Real-time lip dynamic simulation method with the voice as the driving mechanism - Google Patents

Real-time lip dynamic simulation method with the voice as the driving mechanism

Info

Publication number
TW512308B
Authority
TW
Taiwan
Prior art keywords
voice
parameters
sound
group
vector
Prior art date
Application number
TW90111865A
Other languages
Chinese (zh)
Inventor
Shiue-Wu Wang
Original Assignee
Inst Information Industry
Priority date
Filing date
Publication date
Application filed by Inst Information Industry
Priority to TW90111865A
Application granted
Publication of TW512308B

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a real-time lip dynamic simulation method with the voice as the driving mechanism, which uses a Gaussian mixture model and vector quantization as the basis for grouping voice and lip size. In the training stage, synchronized voice and lip data are obtained from video, and the voice is divided into consecutive, overlapping frames. Each frame is converted into a plurality of cepstral coefficients, and two parameters, width and height, are extracted from the lip region; together these form a vector. The resulting series of vectors is grouped by vector quantization, a Gaussian mixture model is used to describe each group, and the best description is found with the expectation-maximization algorithm. In the stage that maps voice to lip size, the voice is first divided into consecutive, overlapping frames, each frame is converted into a plurality of cepstral coefficients, and the probability of these coefficients appearing in each group is computed. The lip size for each voice segment is then calculated as a weighted average of the lip sizes corresponding to the groups, weighted by these probabilities.
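The framing and vector-composition steps described in the abstract can be sketched as follows. This is a minimal illustration in Python rather than the patent's implementation: the function names `split_into_frames` and `make_av_vector`, the frame length, and the hop size are all assumptions, since the text does not state them. Cepstral analysis itself is left out; the sketch only shows how overlapping frames are cut and how one frame's coefficients are joined with the lip width and height.

```python
def split_into_frames(samples, frame_len, hop):
    """Split a sequence of samples into consecutive, overlapping frames.

    frame_len and hop are illustrative parameters (not taken from the patent);
    choosing hop < frame_len makes neighbouring frames overlap.
    """
    if not 0 < hop <= frame_len:
        raise ValueError("expected 0 < hop <= frame_len")
    frames = []
    start = 0
    # Keep only full frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames


def make_av_vector(cepstral_coeffs, lip_width, lip_height):
    """Join one frame's cepstral coefficients with its lip width and height."""
    return list(cepstral_coeffs) + [lip_width, lip_height]


if __name__ == "__main__":
    frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
    print(len(frames), frames[1])
    print(len(make_av_vector([0.0] * 13, 3.0, 1.5)))
```

With 13 cepstral coefficients per frame, as in the example given in the description, each audio-visual vector has 15 components.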

Description

V. Description of the Invention

[Field of the Invention]
The present invention relates to the technical field of mouth-shape simulation, and in particular to a real-time dynamic mouth-shape simulation method that uses sound as the driving mechanism.

[Background of the Invention]
With the development of computer technology, matching the mouth shapes of animated characters to speech, in both 3D and 2D applications, has become an indispensable part of audiovisual entertainment such as films and computer games. In these applications, however, the mouth shape is generally matched to the sound by hand, and producing 30 seconds of mouth animation manually takes about 1.5 hours, which is slow and inefficient. Even where speech recognition is used to determine the mouth shape, the sound is first converted into the corresponding text and the mouth shape is then simulated according to the mouth size associated with that text. This approach is restricted to a single language, for example pure Chinese or pure English, and cannot handle mixed Chinese and English. Animations and films produced with these conventional mouth-shape simulation methods therefore consume a great deal of labor and time, and improvement is needed.

In view of this, the inventor sought a "real-time dynamic mouth-shape simulation method with sound as the driving mechanism" that solves the above problems, and after repeated research and experiment completed the present invention.

[Summary of the Invention]
The object of the present invention is to provide a real-time dynamic mouth-shape simulation method with sound as the driving mechanism, which achieves real-time synchronized dynamic simulation without complex speech recognition technology and is not restricted to a single language.

To achieve this object, the method of the present invention mainly comprises the following steps:
(A) dividing the sound of the input audio-video data into a plurality of consecutive, overlapping frames;
(B) converting each frame into a plurality of cepstral coefficients and obtaining two parameters, the width and height of the mouth shape, within each frame, wherein each frame is represented by an audio-visual vector composed of the corresponding cepstral coefficients and the mouth width and height parameters;
(C) using vector quantization to divide the audio-visual vectors into a plurality of groups, so that audio-visual vectors with similar energy and mouth size fall in the same group;
(D) using a Gaussian mixture model as the basis for representing each group; and
(E) for each group, setting initial values from the result of the vector quantization and using the expectation-maximization algorithm to obtain the parameters of the optimal Gaussian mixture model for the group, for use in simulating the subject's voice.

Because the invention is novel in design, industrially applicable, and offers improved effect, a patent application is filed in accordance with the law.

To help the examiner further understand the structure, features, and purpose of the invention, drawings and a detailed description of a preferred embodiment follow.

[Brief Description of the Drawings]
Fig. 1 is a flowchart of the training stage of the method.
Fig. 2 is a schematic diagram of the combination of training parameters obtained by the method.
Fig. 3 is a flowchart of the simulation stage of the method.

[Detailed Description of the Preferred Embodiment]
Fig. 1 shows the flow of the training stage. In the training stage, a camera records a trainer reading aloud several pre-designed passages of text in order to obtain the training parameters; Fig. 2 shows schematically the combination of training parameters to be obtained. First, the sound of the input audio-video data is divided into a plurality of consecutive, overlapping frames (step S11), and feature extraction converts each frame into a plurality of (for example, 13) cepstral coefficients, denoted $a$ (step S12). For each frame, a lip-tracking program obtains two parameters, the width and height of the mouth shape within the frame, denoted $v$ (step S13). For each frame, these 15 parameters together form an audio-visual feature vector (step S14) that represents the frame.

After a series of audio-visual vectors has been obtained, vector quantization divides them into N groups (step S15), so that audio-visual vectors with similar energy and mouth size fall in the same group. Each group has a converged center vector and a covariance matrix. In step S16, a Gaussian mixture model (GMM) is used as the basis for representing each group; that is, the GMM represents the probability distribution of the audio-visual vectors. A GMM is a weighted sum of K Gaussian functions:

$$p(o) = \sum_{i=1}^{K} w_i \, g[\mu_i, \Sigma_i](o)$$

where $w_i$ is the mixture weight and $g[\mu_i, \Sigma_i](o)$ is a Gaussian function with mean $\mu_i$ and covariance matrix $\Sigma_i$:

$$g[\mu_i, \Sigma_i](o) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (o - \mu_i)^{T} \Sigma_i^{-1} (o - \mu_i) \right\}$$

with $d$ the dimension of $o$.

In step S17, for each group $i$, the result of the vector quantization provides the initial values: the center vector is taken as the initial mean $\mu_i$, the converged covariance matrix is taken as the covariance matrix $\Sigma_i$ of group $i$, and the ratio of the number of audio-visual vectors in group $i$ to the total number of audio-visual vectors is taken as the initial mixture weight $w_i$. Starting from these initial values, the expectation-maximization algorithm obtains the parameters $\mu_i$, $\Sigma_i$, and $w_i$ of the optimal Gaussian mixture model for each group.

Fig. 3 shows the simulation stage. The subject's voice is first divided into a plurality of consecutive, overlapping frames, and each frame is converted into a plurality of (for example, 13) cepstral coefficients (step S32), which form the sound feature vector $a$. In step S33, according to the probability of $a$ appearing in each group, a weighted average is taken to obtain the current mouth size $\hat{v}$. To speed up the computation, N may be set equal to K, that is, the number of groups in the vector quantization is set equal to the number of Gaussian functions used to represent the GMM, and the solution is given by:

$$\hat{v} = \sum_{i=1}^{K} \frac{p_i(a)}{p_a(a)} \, v_i$$

where $p_a(a) = \sum_{i=1}^{K} p_i(a)$, $p_i(a) = w_i \, g[\mu_i, \Sigma_i](a)$, and $v_i$ is the mouth size corresponding to group $i$.

As described above, the method of the present invention uses grouping to relate sound to mouth size, applying the Gaussian mixture model and vector quantization to form a statistical grouping. When speech is input, the mouth size corresponding to the sound is computed from the probability that the sound falls in each group, and the mouth shape of a character can then be animated in real time in step with the sound. Mouth-shape simulation is thus achieved without complex speech recognition, the single-language restriction is removed, and real-time synchronized dynamic simulation is attained.

In summary, the present invention differs markedly from the prior art in purpose, means, and effect, and represents a significant advance in the design of mouth-shape simulation; the grant of a patent is respectfully requested. The embodiments above are given only for ease of explanation; the scope of rights claimed is defined by the claims and is not limited to the embodiments above.
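The initialization in step S17, where the vector-quantization result supplies the starting mixture weights, means, and covariances for the expectation-maximization algorithm, can be sketched as below. The sketch makes two assumptions that are not in the patent text: the quantizer is taken to be k-means-like, so that a group's converged center equals the mean of its members, and only the diagonal of each covariance matrix is computed to keep the example short. The function name `init_gmm_from_vq` and its arguments are hypothetical.

```python
def init_gmm_from_vq(vectors, labels, num_groups):
    """Derive initial GMM parameters from a vector-quantization result.

    vectors: list of equal-length feature vectors
    labels:  group index assigned to each vector by the quantizer
    Returns (weights, means, variances). Only per-dimension variances
    (the covariance diagonal) are computed, which is a simplification.
    """
    dim = len(vectors[0])
    weights, means, variances = [], [], []
    for g in range(num_groups):
        members = [v for v, lab in zip(vectors, labels) if lab == g]
        if not members:
            raise ValueError(f"group {g} is empty")
        n = len(members)
        # Group center: mean of the members (k-means-style assumption).
        mean = [sum(v[d] for v in members) / n for d in range(dim)]
        var = [sum((v[d] - mean[d]) ** 2 for v in members) / n
               for d in range(dim)]
        # Initial mixture weight: share of all vectors that fall in group g.
        weights.append(n / len(vectors))
        means.append(mean)
        variances.append(var)
    return weights, means, variances


if __name__ == "__main__":
    vecs = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
    print(init_gmm_from_vq(vecs, [0, 0, 1], 2))
```

These values would only seed the expectation-maximization iterations; the refinement loop itself is not shown here.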

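The simulation-stage mapping of step S33, a probability-weighted average of per-group mouth sizes, can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: each group's Gaussian is given a diagonal covariance over the cepstral features, and the dictionary keys `weight`, `mean`, `var`, and `lip` are hypothetical names chosen for the example.

```python
import math


def diag_gaussian(x, mean, var):
    """Density of a Gaussian with diagonal covariance (a simplification)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def estimate_lip_size(a, groups):
    """Probability-weighted average of the per-group lip sizes.

    Each group is a dict with hypothetical keys:
      "weight":      mixture weight of the group
      "mean", "var": audio part of the group's Gaussian
      "lip":         (width, height) associated with the group
    """
    scores = [g["weight"] * diag_gaussian(a, g["mean"], g["var"])
              for g in groups]
    total = sum(scores)
    if total == 0.0:
        raise ValueError("feature vector has zero probability in every group")
    width = sum(s * g["lip"][0] for s, g in zip(scores, groups)) / total
    height = sum(s * g["lip"][1] for s, g in zip(scores, groups)) / total
    return width, height


if __name__ == "__main__":
    groups = [
        {"weight": 0.5, "mean": [0.0], "var": [1.0], "lip": (2.0, 1.0)},
        {"weight": 0.5, "mean": [2.0], "var": [1.0], "lip": (4.0, 3.0)},
    ]
    print(estimate_lip_size([1.0], groups))
```

A feature vector midway between the two group means is equally likely under both, so the estimate falls midway between the two lip sizes; a vector closer to one mean pulls the estimate toward that group's lip size.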

Claims (9)

VI. Scope of the Patent Application

1. A real-time dynamic mouth-shape simulation method with sound as the driving mechanism, mainly comprising the following steps:
(A) dividing the sound of the input audio-video data into a plurality of consecutive, overlapping frames;
(B) converting each frame into a plurality of cepstral coefficients and obtaining two parameters, the width and height of the mouth shape, within each frame, wherein each frame is represented by an audio-visual vector composed of the corresponding cepstral coefficients and the mouth width and height parameters;
(C) using vector quantization to divide the audio-visual vectors into a plurality of groups, so that audio-visual vectors with similar energy and mouth size fall in the same group;
(D) using a Gaussian mixture model as the basis for representing each group; and
(E) for each group, setting initial values from the result of the vector quantization and using the expectation-maximization algorithm to obtain the parameters of the optimal Gaussian mixture model for the group, for use in simulating the subject's voice.

2. The method of claim 1, further comprising the following steps:
(F) dividing the subject's voice into a plurality of consecutive, overlapping frames, and converting each frame into a plurality of cepstral coefficients representing a sound feature vector; and
(G) according to the probability of the sound feature vector appearing in each group, taking a weighted average to obtain the mouth size corresponding to the subject's voice.

3. The method of claim 1, wherein in step (B), feature extraction is used to convert each frame into a plurality of cepstral coefficients.

4. The method of claim 1, wherein in step (B), a lip-tracking program obtains the two parameters of width and height of the mouth shape within the frame.

5. The method of claim 1, wherein each group has a converged center vector and covariance matrix.

6. The method of claim 1, wherein in step (D), the Gaussian mixture model represents the probability distribution of the audio-visual vectors.

7. The method of claim 6, wherein the Gaussian mixture model is a weighted sum of K Gaussian functions, given by:

$$p(o) = \sum_{i=1}^{K} w_i \, g[\mu_i, \Sigma_i](o)$$

where $w_i$ is the mixture weight and $g[\mu_i, \Sigma_i](o)$ is a Gaussian function with mean $\mu_i$ and covariance matrix $\Sigma_i$, which can be expressed as:

$$g[\mu_i, \Sigma_i](o) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (o - \mu_i)^{T} \Sigma_i^{-1} (o - \mu_i) \right\}$$

with $d$ the dimension of $o$.

8. The method of claim 7, wherein in step (E), for each group $i$, the center vector is taken as the initial mean $\mu_i$, the converged covariance matrix is taken as the covariance matrix $\Sigma_i$ of group $i$, and the ratio of the number of audio-visual vectors in group $i$ to the total number of audio-visual vectors is taken as the initial mixture weight $w_i$; these serve as the initial values for obtaining the parameters $\mu_i$, $\Sigma_i$, and $w_i$ of the optimal Gaussian mixture model for each group.

9. The method of claim 8, wherein the number of groups in the vector quantization is set equal to the number of Gaussian functions used to represent the Gaussian mixture model, so that the solution is obtained from the following formula:

$$\hat{v} = \sum_{i=1}^{K} \frac{p_i(a)}{p_a(a)} \, v_i$$

where $p_a(a) = \sum_{i=1}^{K} p_i(a)$, $p_i(a) = w_i \, g[\mu_i, \Sigma_i](a)$, $a$ denotes the cepstral coefficients, $v$ denotes the two parameters of width and height of the mouth shape, and $\hat{v}$ denotes the mouth size.
TW90111865A 2001-05-17 2001-05-17 Real-time lip dynamic simulation method with the voice as the driving mechanism TW512308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW90111865A TW512308B (en) 2001-05-17 2001-05-17 Real-time lip dynamic simulation method with the voice as the driving mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW90111865A TW512308B (en) 2001-05-17 2001-05-17 Real-time lip dynamic simulation method with the voice as the driving mechanism

Publications (1)

Publication Number Publication Date
TW512308B true TW512308B (en) 2002-12-01

Family

ID=27731279

Family Applications (1)

Application Number Title Priority Date Filing Date
TW90111865A TW512308B (en) 2001-05-17 2001-05-17 Real-time lip dynamic simulation method with the voice as the driving mechanism

Country Status (1)

Country Link
TW (1) TW512308B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9754193B2 (en) * 2013-06-27 2017-09-05 Hewlett-Packard Development Company, L.P. Authenticating a user by correlating speech and corresponding lip shape


Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees