WO2015192395A1 - Vocal speech quality scoring method and system - Google Patents
- Publication number
- WO2015192395A1 (international application PCT/CN2014/081156)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- speech
- output
- stream
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
Definitions
- The present application relates to the field of speech transmission testing, and more particularly to a vocal speech quality scoring method and system.
- Background of the invention: In the prior art, devices used for voice or video transmission must be inspected or tested for voice transmission, video transmission and other performance. Only when the results of these tests meet the required standards can the devices be considered usable, easy to use and of assured quality, and be put into production and practical use.
- The performance requirements on devices, and on devices used in combination, are constantly being updated.
- The known method for testing the audio transmission quality of building intercom systems is a complete end-to-end acoustic test for evaluating the speech transmission quality of building intercom products. It covers the testing and calculation of five technical parameters: loudness rating, frequency response, distortion, signal-to-noise ratio and sidetone masking rating.
- Testing of the audio signal at the output end focuses on evaluating its frequency response, distortion and amplitude characteristics.
- The sound source, i.e. the signal source 301, generates an audio signal.
- A microphone (e.g. a MIC) 304 faces the outlet of the output end.
- A central processing unit (CPU) in the microphone 304 samples the output signal, converting the audio signal into an electrical signal. The microphone 304 passes the electrical signal through an amplifier 305 (e.g. a signal amplifier) to a measuring device, and after measurement the result is output by the spectrum analyzer 306. The measuring device evaluates the sampled output signal against the corresponding technical parameters, including:
- Frequency response test: adjust the frequency of the audio signal source over the range 200-4000 Hz and record the CPU sample values at the MIC end (the audio response signal at the output end). This gives the effective value of the sound pressure (e.g. sound intensity, sound energy) at each frequency point set on the source. From these values the frequency response of the output end (e.g. the loudspeaker) is calculated (e.g. by Fourier transform or level-meter measurement);
- Distortion test: set the audio signal source to the frequency under test and record the CPU sample values at the MIC end. Obtain the spectrum of the sampled signal by spectral conversion (e.g. Fourier transform), and from that spectrum calculate the distortion of the loudspeaker at the corresponding source input frequency;
- Amplitude characteristic test: at a fixed frequency, vary the output amplitude of the audio signal source and record the CPU sample values at the MIC end. Obtain the spectrum of the sampled signal by spectral conversion (e.g. Fourier transform), and check whether the output under different input amplitudes corresponds to the input. For example, check whether the relationship is linear: when the input is raised by 10 dB, is the output also raised by 10 dB?
- The restoration degree of the output signal is then judged only indirectly from the above parameters. For example, the smaller the distortion and the flatter the frequency response (i.e. the closer the input-output relationship is to stable and linear), the better the output sound is judged to be.
- The restoration degree refers to the consistency between the spectral shape of the original frame input by the signal source and the spectral shape of the sampled frame at the output end. In audio measurement it is an important technical parameter for evaluating the overall performance of devices and systems, especially their output performance. However, the prior art lacks a direct test of the restoration degree of the output signal at the output end, so evaluations of device and system performance, especially output performance, are imprecise.
- The existing test methods also have the following defects:
  - Not all continuous frequencies of interest are tested, so the results are imprecise.
  - In real applications of devices and systems, speech signals are multi-frequency signals (e.g. the human voice is a superposition of N frequency components). The existing distortion test, however, uses single frequencies (e.g. pure 200 Hz or 400 Hz), which does not match real multi-frequency distortion. Judging the restoration degree from this distortion measurement is therefore also inaccurate.
  - As a result, existing audio test results are inaccurate or of low precision and differ considerably from actual subjective evaluation.
- The main object of the present application is to provide a vocal speech quality scoring method and system that improve the accuracy and precision of audio signal restoration testing. This in turn improves the overall precision of audio performance testing of devices, systems and the like.
- The object of the present application is achieved by the following technical solutions.
- The present application provides an audio signal restoration testing method involving a sound source part, a system under test, an audio signal collection device and an audio signal analysis device:
  - The sound source part generates a standard real-human speech signal, which is fed as the input speech signal from the sound source part into the system under test.
  - The input speech signal is transmitted through the system under test and output as the output speech signal from its output end.
  - The audio signal collection device collects the continuous speech signal output at the output end and sends the collected output speech signal to the audio signal analysis device.
  - The audio signal analysis device slices and analyzes the signal stream of the output speech signal to determine the restoration degree.
- The audio signal collection device converts the collected continuous speech signal into a corresponding digital signal stream and sends it to the audio signal analysis device for slicing and analysis.
- The audio signal analysis device obtains the real-human speech signal generated by the sound source part as the input speech signal. It slices the signal stream of the input speech signal at a fixed time interval into segments of equal duration, each segment (speech signal group) containing one or more speech signals. It then performs spectral conversion on each speech signal group to obtain its corresponding sound feature value.
- The audio signal analysis device synchronizes the signal stream of the output speech signal collected by the audio signal collection device with the corresponding signal stream of the input speech signal from the sound source part. It slices the collected continuous speech signal stream into segments at the same time interval used for the input stream, each speech signal group containing one or more speech signals. It then performs spectral conversion on each group to obtain its sound feature value.
- The audio signal analysis device extracts the sound feature value of a speech signal group segment in the signal stream of the input speech signal, and the sound feature value of the corresponding segment in the signal stream of the output speech signal. Based on the similarity principle, it calculates and analyzes a comparison score for the two corresponding segments;
- the comparison scores of all corresponding segment pairs in the input and output signal streams are aggregated statistically and/or averaged to determine the restoration degree.
- The present application also provides a vocal speech quality scoring method, comprising:
  - the sound source part generates a standard real-human speech signal, which is fed as the input speech signal from the sound source part into the system under test;
  - the input speech signal is transmitted through the system under test and output as the output speech signal from its output end;
  - the continuous speech signal output at the output end is collected;
  - the signal stream of the collected output speech signal is sliced and analyzed to determine the restoration degree.
- The method further comprises obtaining in advance the real-human speech signal generated by the sound source part as the input speech signal, and slicing the signal stream of the input speech signal at a fixed time interval into segments of equal duration.
- Each speech signal group contains one or more speech signals, and spectral conversion is performed on each speech signal group to obtain its corresponding sound feature value.
- Slicing and analyzing the signal stream of the collected output speech signal comprises: synchronizing the signal stream of the output speech signal collected by the audio signal collection device with the corresponding signal stream of the input speech signal from the sound source part; and slicing the collected continuous speech signal stream into segments at the same time interval used for the input stream.
- Each speech signal group contains one or more speech signals, and spectral conversion is performed on each speech signal group to obtain its corresponding sound feature value.
- Determining the restoration degree comprises: extracting the sound feature value of a speech signal group segment in the signal stream of the input speech signal, and that of the corresponding segment in the signal stream of the output speech signal; then calculating and analyzing a comparison score for the two corresponding segments based on the similarity principle;
- the comparison scores of all corresponding segment pairs in the input and output signal streams are aggregated statistically and/or averaged to determine the restoration degree.
- The method further comprises slicing the signal streams of the input and output speech signals at the same time interval, namely slicing each signal stream at 20 ms intervals into segments (speech signal groups) containing one or more speech signals.
- This application compares real human speech with the collected or sampled signal. This matches how the device and its system (e.g. a building intercom system) are actually used, and therefore yields test results consistent with practical application.
- Continuously emitted audio signals are sliced and processed as continuous frames. This covers testing of all continuous frequencies of interest (e.g. all frequencies in real human speech from 300 Hz to 3400 Hz), and the restoration degree of the audio output is judged directly from continuous slices (frames). As a result, the audio test results, and the judgment of audio output performance, of secure communication devices and systems are more accurate and precise.
- FIG. 1 is a structural block diagram of an embodiment of a vocal speech quality scoring system of the present application
- FIG. 2 is a flow chart of an embodiment of a vocal speech quality scoring method of the present application
- During testing, the sound source uses real human speech as the input speech signal, so the audio performance of the transmitted speech better matches the actual application of the device or system under test. Testing the sound output characteristics of the system under test (e.g. communication devices and systems with safety requirements, such as building intercom systems) therefore yields more accurate and precise results. Further, the continuous audio signal is sliced and processed continuously, then compared for similarity with the real-human input speech signal to obtain a restoration score. This determines the sound output performance of the system under test more precisely and accurately.
- Referring to FIG. 1, a block diagram of an embodiment of the vocal speech quality scoring system of the present application is shown.
- The detection system 100 of this embodiment may mainly include: a sound source part 110, a system under test 120, an audio signal collection device (sampling device) 130 and an audio signal analysis device 140.
- The sound source part 110 generates a specific speech signal, which may be standard human speech, for example the ITU-T Recommendation P.501 real-human speech signal.
- This real-human speech signal serves as the test speech for the audio transmission characteristics of the system under test 120.
- The sound source part 110 may feed the speech signal, as the input speech signal (e.g. signal source 301), into the input end of the system under test 120. The signal is transmitted through the system under test 120 and finally output as the measured output speech signal from the output end of the system under test 120 (e.g. loudspeaker or handset 303).
- The system under test 120 may be a building intercom system. It receives the input speech signal from the sound source part 110 and passes it through a power amplifier, the path under test and another power amplifier to its output end, which outputs the signal as the measured output speech signal.
- The path under test may be a call path that needs to be tested in the system under test (e.g. the building intercom system under test).
- The audio signal collection device 130 collects the speech signal output by the system under test 120, converts it, and sends it to the audio signal analysis device 140 for processing and analysis. For example, a microphone (e.g. MIC 304) is placed at the output end.
- The audio signal collection device 130 may include a MIC, a power amplifier, an audio signal collector, and the like.
- The MIC receives the speech signal transmitted through the system under test 120 and played by its output loudspeaker. Specifically, the continuous speech signal entering at the input end passes through the system under test 120 and is received by the MIC as the continuous output speech signal. It then passes through the power amplifier to the audio signal collector, which collects it and sends it to the audio signal analysis device 140.
- Alternatively, the MIC converts the output speech signal from the output end of the system under test 120 into an electrical signal. A processor such as a CPU in the MIC then performs A/D conversion and similar processing to form a digital signal, which is sent to the audio signal analysis device 140 for processing and analysis. Because the input speech of the sound source part 110 is continuous, the speech signal output by the system under test 120 is also continuous. The continuous digital signal corresponding to the continuous speech signal collected by the audio signal collection device 130 can therefore be sent on to the audio signal analysis device 140.
- The audio signal analysis device 140 receives either the continuous speech signal sent from the audio signal collector 130 or the corresponding continuous digital signal converted from it. It processes and analyzes this signal and can thereby determine the restoration degree of the speech signal.
- The audio signal analysis device 140 may include a built-in processor (e.g. a CPU) or a PC with analysis capability. When a continuous speech signal is received, the CPU converts it into a continuous digital signal, i.e. a digital signal stream of speech. When an already converted continuous digital signal is received, that signal is itself the digital signal stream of speech.
- The digital signal stream corresponding to the continuous speech signal is referred to as the signal stream of the speech signal.
- The signal stream of the speech signal is sliced, e.g. divided into N "slices" or N "frames" (N is a positive integer). All continuous signals of interest in the test are then processed and analyzed on the basis of these slices or frames.
- Based on the Haas effect, the human ear cannot resolve the order of signal levels and frequencies within a 20 millisecond (ms) period. The signal stream is therefore sliced at a fixed time interval, each slice/frame lasting 20 ms.
- Each slice/frame is spectrally converted, and its spectrum is compared with the spectrum of the corresponding slice of the input speech signal stream from the original sound source part 110 (i.e. the corresponding frame spectrum of the input stream). This comparison yields the restoration test result.
- The vocal speech quality scoring system and method of the present application are described more specifically below, with reference to the flow chart of an embodiment of the method shown in FIG. 2.
- The sound source part 110 generates a specific speech signal, which may be standard human speech, such as the ITU-T Recommendation P.501 real-human speech signal.
- This speech signal serves as the test speech signal for the audio transmission characteristics of the system under test 120.
- The sound source part 110 may feed it into the input end of the system under test 120. The signal is transmitted through the system and finally output as the measured output speech signal from the output end of the system under test 120.
- The real-human speech signal contains all intermodulation distortion. Using it as the input signal better matches the operating environment of the system under test, which makes the test more accurate and objective.
- The system under test 120 is, for example, a building intercom system. For the detailed implementation of this step, refer to the description of the sound source part 110 and the system under test 120 in the system.
- In step 220, the continuous speech signal output by the system under test 120 in response to the real-human speech signal is collected by the audio signal collection device 130. It is then sent to the audio signal analysis device 140 for analysis.
- In step 230, the audio signal analysis device 140 slices the collected audio (speech) signal and performs spectral conversion. It compares the result with the specific speech signal generated by the sound source part 110 to obtain the restoration test result.
- The processor (CPU) of the audio signal analysis device 140 may slice the digital signal of the standard speech signal in advance into 20 ms segments ("frames"). This divides the signal stream into N speech signal groups (i.e. N frame signals).
- Each speech signal group contains one or more speech signals (or signal parameters). The N frame signals, e.g. P1, P2, P3, ..., PN, are stored after slicing, and each frame, such as P1, consists of the digital signal within a 20 ms period.
- Each frame signal is converted into its corresponding spectrum and stored; after conversion, each frame has a corresponding sound feature value.
- When the audio signal analysis device 140 receives the collected signal stream, it synchronizes that stream with the input signal stream. It then likewise slices it into N speech signal groups of 20 ms each, i.e. N frames.
- Each frame signal, i.e. each speech signal group, contains one or more speech signals (or signal parameters). The N frame signals, e.g. p1, p2, p3, ..., pN, are stored after slicing, and each frame, such as p1, also consists of the digital signal within a 20 ms period.
- Each frame signal is converted into its corresponding spectrum and saved; after conversion, each frame has a corresponding sound feature value.
- The speech signal (e.g. the digital signal stream of the speech signal) can be spectrally converted by Fourier transform or other sound signal processing, which also yields the sound feature value of each slice.
- The output speech signal transmitted by the system under test 120 in response to the input speech signal is handled like the standard speech signal used as input. It undergoes the same slicing and spectral conversion, giving the sound feature value of each signal group, i.e. each frame signal. Next, the sound feature values of each frame in P1, P2, P3, ..., PN and of each frame in p1, p2, p3, ..., pN are extracted. A similarity calculation, based for example on the similarity principle, is performed for each pair of corresponding frames (i.e. between the sound feature values of P1 and p1) to determine the restoration degree.
- For example, the similarity between the feature values of P1 and p1 is calculated as a value from 0 to 1 (from no similarity to 100% similarity).
- This value can be multiplied by a factor such as 100 to give a percentage scale of 0 to 100 points, so that each frame receives a comparison score.
- For example, the feature values A, B, C, D of P1 are compared one-to-one with the feature values a, b, c', d of p1. C differs from c', so only 3 of the 4 are similar, giving 3/4 × 100 = 75 points.
- In another example, the feature values of P1 form the one-dimensional array [A, B, C, D] and those of p1 form the one-dimensional array [a, b, c, d]. In both arrays the elements (A to D, a to d) are numbered from small to large. Analysis by the similarity principle therefore finds the same arrangement trend, and the spectral comparison score of P1 and p1 is (1/2 + 1/2) × 100 = 100 points.
- Real human speech is used as the sound source signal during testing, which ensures that the system under test operates in its actual working environment. Since the signal is a human voice, the measured distortion includes all intermodulation distortion.
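The one-to-one feature comparison in the 75-point example and the final averaging step can be sketched as follows. The text does not define when two feature values count as "similar". This minimal sketch therefore assumes a relative tolerance, `rel_tol`, and that parameter, like the function names, is hypothetical.

```python
def match_score(ref_features, out_features, rel_tol=0.05):
    """Per-frame score (0-100): share of feature values that match one-to-one.

    Assumption: out value b 'matches' ref value a when |a - b| <= rel_tol * |a|.
    """
    if not ref_features or len(ref_features) != len(out_features):
        raise ValueError("feature arrays must be non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(ref_features, out_features)
        if abs(a - b) <= rel_tol * abs(a)
    )
    return 100.0 * matches / len(ref_features)


def overall_score(frame_scores):
    """Average of the per-frame scores, taken as the restoration degree."""
    return sum(frame_scores) / len(frame_scores)


# Like the A, B, C, D vs a, b, c', d example: the third value differs,
# so 3 of 4 match and the frame scores 75.
print(match_score([1.0, 2.0, 3.0, 4.0], [1.01, 2.0, 3.6, 3.98]))  # 75.0
```

A relative tolerance keeps the test scale-independent across feature values of very different magnitudes. A fixed absolute tolerance would be an equally plausible reading of the text.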
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Electromagnetism (AREA)
- Computer Networks & Wireless Communication (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
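A vocal speech quality scoring method and system. The method comprises the following steps:
- a sound source part (110) generates a standard real-human speech signal as the input speech signal, which is fed from the sound source part (110) into a system under test (120);
- the input speech signal is transmitted through the system under test (120) and output as the output speech signal from the output end of the system under test (120);
- the continuous output speech signal is collected;
- the signal stream of the collected output speech signal is sliced and analyzed to determine the restoration degree.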
Description
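Vocal speech quality scoring method and system

Technical Field
The present application relates to the field of speech transmission testing, and more specifically to a vocal speech quality scoring method and system.

Background
In the prior art, devices used for voice or video transmission must be inspected or tested to determine whether their voice transmission, video transmission and other performance meet the specified indicators. Only when the results of these inspections or tests meet all the specified standards can the devices be considered usable, easy to use and of assured quality, and only then can they be put into production and practical use. As voice and video transmission technologies keep advancing, the performance requirements on devices, and on devices used in combination, are also continually updated. Device inspection techniques are therefore continually improved to increase the accuracy and precision of inspection results.

Take as an example the building intercom system used for access control at residential building entrances. Inspecting or testing the audio transmission characteristics of its devices, and of the system as a whole, is the most important part of testing the entire intercom system. The known test method for the audio transmission quality of building intercom systems is a complete set of end-to-end acoustic tests for evaluating the speech transmission quality of building intercom products. It comprises the testing and calculation of five technical parameters: loudness rating, frequency response, distortion, signal-to-noise ratio and sidetone masking rating. In this existing method, testing of the audio signal at the output end focuses on evaluating its frequency response, distortion and amplitude characteristics.

For example, as shown in FIG. 3, the sound source (i.e. the signal source 301) generates an audio signal. The signal passes through the power amplifier 302, the audio signal path of the system under test and so on, and is output at the audio output end 303 of the system under test (e.g. the loudspeaker or handset under test). The outlet of the output end 303 is fixed on a sound-insulating baffle, which simulates the wall on which the device is mounted so that the test results are more accurate. Facing the outlet is a microphone (e.g. a MIC) 304, in which a central processing unit (CPU) samples the output signal and converts the audio signal into an electrical signal. The microphone 304 passes the electrical signal through an amplifier 305 (e.g. a signal amplifier) to a measuring device, and after measurement the result is output by a spectrum analyzer 306. The measuring device evaluates the sampled output signal against the corresponding technical parameters, including: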
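Frequency response test: adjust the frequency of the audio signal source over the range 200-4000 Hz and record the CPU sample values at the MIC end (the audio response signal at the output end). This gives the effective value of the sound pressure (e.g. sound intensity, sound energy) at each frequency point set on the source. From these values the frequency response of the output end (e.g. the loudspeaker) is calculated, for example by Fourier transform or level-meter measurement;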
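Distortion test: set the audio signal source to the frequency under test and record the CPU sample values at the MIC end. Obtain the spectrum of the sampled signal by spectral conversion (e.g. Fourier transform), and from that spectrum calculate the distortion of the loudspeaker at the corresponding source input frequency;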
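Amplitude characteristic test: at a fixed frequency, vary the output amplitude of the audio signal source and record the CPU sample values at the MIC end. Obtain the spectrum of the sampled signal by spectral conversion (e.g. Fourier transform), and observe whether the output under different input amplitudes corresponds to the input. For example, check whether the relationship is linear: when the input is raised by 10 dB, is the output also raised by 10 dB?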
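The three prior-art output tests above can be sketched in Python. This is a minimal illustration, not the measuring device's actual algorithm, and it rests on several assumptions:
- the MIC-end samples are already available as lists of floats;
- each recording spans an integer number of periods of the test tone, so a single-bin DFT isolates each component cleanly;
- distortion is expressed as total harmonic distortion (THD), i.e. the RMS of harmonics 2 to 10 relative to the fundamental.

The function names and the 0.5 dB linearity tolerance are hypothetical.

```python
import math


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of the freq_hz component via a single-bin DFT.

    Assumes an integer number of periods of freq_hz in the record,
    so contributions from other harmonically related tones cancel.
    """
    n = len(samples)
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(x * math.cos(w * k) for k, x in enumerate(samples))
    im = sum(x * math.sin(w * k) for k, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def level_db(amplitude, reference=1.0):
    return 20.0 * math.log10(amplitude / reference)


# Frequency response test: output level in dB at each source frequency.
# `recordings` maps each test frequency to the samples captured at the MIC end.
def frequency_response_db(recordings, fs_hz):
    return {f: level_db(tone_amplitude(x, f, fs_hz)) for f, x in recordings.items()}


# Distortion test: THD in percent (harmonic RMS over the fundamental).
def thd_percent(samples, fundamental_hz, fs_hz, max_harmonic=10):
    fundamental = tone_amplitude(samples, fundamental_hz, fs_hz)
    harmonics = [tone_amplitude(samples, h * fundamental_hz, fs_hz)
                 for h in range(2, max_harmonic + 1)
                 if h * fundamental_hz < fs_hz / 2]  # only harmonics below Nyquist
    return 100.0 * math.sqrt(sum(a * a for a in harmonics)) / fundamental


# Amplitude characteristic test: does each input step (e.g. +10 dB)
# produce the same step at the output, within tol_db?
def amplitude_is_linear(input_amps, output_amps, tol_db=0.5):
    for i in range(1, len(input_amps)):
        step_in = level_db(input_amps[i], input_amps[i - 1])
        step_out = level_db(output_amps[i], output_amps[i - 1])
        if abs(step_out - step_in) > tol_db:
            return False
    return True
```

As the background points out, each of these measurements uses one test tone at a time. That single-frequency nature is the limitation the real-speech method below is meant to address.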
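The restoration degree of the output signal is then judged only indirectly from the above parameters. For example, the smaller the distortion and the flatter the frequency response (i.e. the closer the input-output relationship is to stable and linear), the better the output sound is judged to be. Here the restoration degree means the consistency between the spectral shape of the original frame input by the signal source and the spectral shape of the sampled frame at the output end. In audio measurement it is an important technical parameter for evaluating the overall performance of devices and systems, especially their output performance. Because the prior art lacks a direct test of the restoration degree of the output signal at the output end, the evaluation of device and system performance, especially output performance, is imprecise.

In addition, the existing test approach has the following defects:
- Not all continuous frequencies of interest are tested, so the results are imprecise.
- In real applications of devices and systems, speech signals are multi-frequency signals (e.g. human speech is a superposition of N frequency components). The existing distortion test, however, is performed at single frequencies (e.g. pure 200 Hz or 400 Hz), which does not match real multi-frequency distortion. Judging the restoration degree from this distortion measurement is therefore also inaccurate.
- Ultimately, existing audio test results are inaccurate or of low precision and differ considerably from actual subjective evaluation.

Summary
In view of these defects of the prior art, the main object of the present application is to provide a vocal speech quality scoring method and system that improve the accuracy and precision of audio signal restoration testing. This further improves the overall precision of audio performance testing of devices, systems and the like. To overcome the technical defects of the prior art described above, the object of the present application is achieved by the following technical solutions.

The present application provides an audio signal restoration testing method involving a sound source part, a system under test, an audio signal collection device and an audio signal analysis device:
- The sound source part generates a standard real-human speech signal, which is fed as the input speech signal from the sound source part into the system under test.
- The input speech signal is transmitted through the system under test and output as the output speech signal from its output end.
- The audio signal collection device collects the continuous speech signal output at the output end and sends the collected output speech signal to the audio signal analysis device.
- The audio signal analysis device slices and analyzes the signal stream of the output speech signal to determine the restoration degree.

The audio signal collection device converts the collected continuous speech signal into a corresponding digital signal stream. It sends this stream to the audio signal analysis device for slicing and analysis.

The audio signal analysis device obtains the real-human speech signal generated by the sound source part as the input speech signal. It slices the signal stream of the input speech signal at a fixed time interval into segments of equal duration, each segment (speech signal group) containing one or more speech signals. It then performs spectral conversion on each speech signal group to obtain its corresponding sound feature value.

The audio signal analysis device also synchronizes the signal stream of the output speech signal collected by the audio signal collection device with the corresponding signal stream of the input speech signal from the sound source part. It slices the collected continuous speech signal stream into segments at the same time interval used for the input stream, each speech signal group containing one or more speech signals. It then performs spectral conversion on each speech signal group to obtain its corresponding sound feature value.

The audio signal analysis device extracts the sound feature value of a speech signal group segment in the signal stream of the input speech signal, and the sound feature value of the corresponding segment in the signal stream of the output speech signal. Based on the similarity principle, it calculates and analyzes a comparison score for the two corresponding segments. The comparison scores of all corresponding segment pairs in the input and output signal streams are then aggregated statistically and/or averaged to determine the restoration degree.

The present application also provides a vocal speech quality scoring method, comprising:
- the sound source part generates a standard real-human speech signal, which is fed as the input speech signal from the sound source part into the system under test;
- the input speech signal is transmitted through the system under test and output as the output speech signal from its output end;
- the continuous speech signal output at the output end is collected;
- the signal stream of the collected output speech signal is sliced and analyzed to determine the restoration degree.

The method further comprises: obtaining in advance the real-human speech signal generated by the sound source part as the input speech signal; slicing the signal stream of the input speech signal at a fixed time interval into segments of equal duration, each speech signal group containing one or more speech signals; and performing spectral conversion on each speech signal group to obtain its corresponding sound feature value.

Slicing and analyzing the signal stream of the collected output speech signal comprises: synchronizing the signal stream of the output speech signal collected by the audio signal collection device with the corresponding signal stream of the input speech signal from the sound source part; slicing the collected continuous speech signal stream into segments at the same time interval used for the input stream, each speech signal group containing one or more speech signals; and performing spectral conversion on each speech signal group to obtain its corresponding sound feature value.

Determining the restoration degree from this slicing and analysis comprises: extracting the sound feature value of a speech signal group segment in the signal stream of the input speech signal, and that of the corresponding segment in the signal stream of the output speech signal; calculating and analyzing a comparison score for the two corresponding segments based on the similarity principle; and aggregating statistically and/or averaging the comparison scores of all corresponding segment pairs to determine the restoration degree.

The method further comprises slicing the signal streams of the input and output speech signals at the same time interval, namely slicing each signal stream at 20 ms intervals into segments (speech signal groups) containing one or more speech signals.

This application compares real human speech with the collected or sampled signal. This matches how the device and its system (e.g. a building intercom system) are actually used, and therefore yields test results consistent with practical application. Continuously emitted audio signals are sliced and processed as continuous frames. This covers testing of all continuous frequencies of interest (e.g. all frequencies in real human speech in the range 300 Hz to 3400 Hz), and the restoration degree of the audio output is judged directly from continuous slices (frames). As a result, the audio test results, and the judgment of the audio output performance, of secure communication devices and systems are more accurate and precise.

Brief Description of the Drawings
The drawings described here provide a further understanding of the present application and form part of it. The exemplary embodiments of the present application and their description explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a structural block diagram of an embodiment of the vocal speech quality scoring system of the present application;
FIG. 2 is a flow chart of an embodiment of the vocal speech quality scoring method of the present application;
FIG. 3 is a schematic diagram of an example of measuring several evaluation parameters at the output end in an existing audio transmission quality test.

Detailed Description
The main idea of the present application is that, during testing, the sound source uses real human speech as the input speech signal. The audio performance of the transmitted speech then better matches the actual application of the device or system under test. Testing the sound output characteristics of the system under test (e.g. communication devices and systems with safety requirements, such as building intercom systems) therefore yields more accurate and precise results. Further, the continuous audio signal is sliced and processed continuously, then compared for similarity with the real-human input speech signal to obtain a restoration score. This determines the sound output performance of the system under test more precisely and accurately.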
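Slicing a signal stream at the 20 ms interval can be sketched as below. This minimal illustration assumes the stream is a list of samples at a known sampling rate (8 kHz is used only as an example). The function name `slice_frames` is hypothetical. Dropping a trailing partial frame is also an assumption, since the text does not say how a remainder shorter than 20 ms is handled.

```python
def slice_frames(stream, fs_hz, frame_ms=20):
    """Split a digital signal stream into consecutive, non-overlapping frames.

    Each frame holds frame_ms milliseconds of samples; a trailing partial
    frame is dropped (an assumption, not stated in the text).
    """
    frame_len = fs_hz * frame_ms // 1000  # e.g. 8000 Hz * 20 ms -> 160 samples
    n_frames = len(stream) // frame_len
    return [stream[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Hypothetical usage: 1 s of 8 kHz audio yields N = 50 frames of 160 samples.
frames = slice_frames([0.0] * 8000, fs_hz=8000)
print(len(frames), len(frames[0]))  # 50 160
```

Because the frame length is derived from the sampling rate, the input and output streams get identically timed frames as long as both are sampled at the same rate. That is what makes frame Pi comparable with frame pi.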
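To make the objects, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the present application.

Referring to FIG. 1, a structural diagram of an embodiment of the vocal speech quality scoring system of the present application is shown. This embodiment takes sampled testing of the audio transmission characteristics of a building intercom system as an example. The sound output performance of the system under test is determined by spectral analysis of the sampled signal together with restoration analysis against the sound source signal. The detection system 100 of this embodiment may mainly include: a sound source part 110, a system under test 120, an audio signal collection device (collector) 130 and an audio signal analysis device 140.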
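The sound source part 110 generates a specific speech signal, which may be standard human speech, for example the real-speech signals of ITU-T Recommendation P.501. This real speech signal serves as the test speech used to measure the audio transmission characteristics of the system under test 120. It can be fed from the sound source part 110 as the input speech signal (for example, signal source 301) into the input end of the system under test 120, transmitted through the system under test 120, and finally output as the measured output speech signal from the output end of the system under test 120 (for example, a loudspeaker or handset 303).

In this embodiment, the system under test 120 may be a building intercom system. It receives the input speech signal from the sound source part 110 and passes it through a power amplifier, the path under test, and another power amplifier to its output end, which outputs the input speech signal, as carried through the system under test 120, as the measured output speech signal. The path under test may be the call path that needs to be checked in the system under test (such as the building intercom system under test).

The audio signal acquisition device 130 captures the speech signal output by the system under test 120, converts it, and sends it to the audio signal analysis device 140 for processing and analysis. For example, a microphone (e.g., MIC 304) is placed at the output end. The audio signal acquisition device 130 may include a MIC, a power amplifier, an audio signal acquisition instrument, and so on. For example, the MIC picks up the speech signal that has been transmitted through the system under test 120 and is played by the loudspeaker at its output end. Specifically, the continuous speech signal entering at the input end passes through the system under test 120 and is received by the MIC as the continuous output speech signal at the output end; it then passes through the power amplifier to the audio signal acquisition instrument, which captures it and sends it to the audio signal analysis device 140.

Further, for example, when the MIC picks up the speech signal transmitted through the system under test 120 and played by the loudspeaker at its output end, the MIC may convert the output speech signal into an electrical signal, and a processor such as a CPU in the MIC performs A/D conversion and related processing to form a digital signal; the digital signal corresponding to the speech signal is then sent to the audio signal analysis device 140 for digital signal processing and analysis. Because the input speech from the sound source part 110 is continuous, the speech signal output by the system under test 120 is also continuous, so the continuous digital signal corresponding to the continuous speech signal captured by the audio signal acquisition device 130 can be sent to the audio signal analysis device 140.

The audio signal analysis device 140 receives the continuous speech signal from the audio signal collector 130, or the continuous digital signal into which it has been converted, processes and analyzes it, and thereby determines the restoration degree of the speech signal.

Specifically, the audio signal analysis device 140 may include a built-in processor (such as a CPU) or a PC with analysis and processing capability. When it receives a continuous speech signal, the CPU converts it into a continuous digital signal, i.e., a digital signal stream of the speech; when it receives an already-converted continuous digital signal, that signal is itself the digital signal stream of the speech. Here, the digital signal stream corresponding to the continuous speech signal is called the signal stream of the speech signal.

Further, the signal stream of the speech signal is sliced, for example divided into N "slices" or N "frames" (N being a positive integer), and all captured continuous signals of interest to the test are then processed and analyzed on the basis of these slices or frames. In one embodiment, based on the Haas effect, namely that the human ear cannot distinguish the order of signal levels or frequencies within a 20 millisecond (ms) period, the signal stream is sliced at a fixed time interval, each slice/frame lasting 20 ms. Each slice/frame is then converted to a spectrum, and the spectrum of each converted slice/frame is compared with the spectrum of the corresponding slice of the signal stream of the speech signal input by the original sound source part 110 (i.e., the spectrum of the corresponding frame of the input signal stream) to obtain the restoration-degree test/detection result.

The human voice speech quality scoring system and method of the present application are described in more detail below with reference to the flowchart in Fig. 2, which shows an embodiment of the human voice speech quality scoring method of the present application.

In step 210, the sound source part 110 generates a specific speech signal, which may be standard human speech, such as the ITU-T P.501 real-speech signals. This speech signal serves as the test speech signal for the audio transmission characteristics of the system under test 120. It can be fed from the sound source part 110 as the input speech signal into the input end of the system under test 120, transmitted through the system under test 120, and finally output from the output end of the system under test 120 as the measured output speech signal. Real human speech contains all intermodulation distortion, so using it as the input signal better matches the operating environment of the system under test and makes the test more accurate and objective. The system under test 120 is, for example, a building intercom system. For the specific implementation of this step, see the description of the sound source part 110 and the system under test 120 above.

In step 220, the continuous speech signal that the system under test 120 outputs in response to the real speech signal is captured by the audio signal acquisition device 130 and sent to the audio signal analysis device 140 for analysis. For the specific implementation of this step, see the description of the audio signal acquisition device 130 above.

In step 230, the audio signal analysis device 140 slices the captured audio (speech) signal, converts the slices to spectra, and compares them with the specific speech signal generated by the sound source part 110 to obtain the restoration-degree test result. For the specific implementation of this step, see the description of the audio signal analysis device 140 above.

The slicing of the audio signal is described further below in one implementation.

First, the processor (CPU) of the audio signal analysis device 140 may slice in advance the digital signal of the standard speech signal, i.e., the digital signal stream corresponding to the continuous input speech signal generated by the sound source part 110 at the input end, into 20 ms segments ("frames"), so that the signal stream is sliced into N segments of speech-signal groups (N frames). Each speech-signal group (frame) contains one or more speech signals (also called signal parameters). The N frames, for example P1, P2, P3, ..., PN, are stored, and each frame, such as P1, consists of the digital signal within a 20 ms period. Each frame of this signal stream is then converted into its corresponding spectrum and stored; after conversion, each frame has corresponding sound feature values.

Then, when the audio signal analysis device 140 receives the captured signal stream, that is, the signal stream captured at the output end under test and corresponding to the continuous input speech signal, it synchronizes this stream with the input signal stream and likewise slices it into N segments of speech-signal groups, i.e., N frames, of 20 ms each. Each frame (speech-signal group) contains one or more speech signals (also called signal parameters). The N frames, for example p1, p2, p3, ..., pN, are stored, and each frame, such as p1, also consists of the digital signal within a 20 ms period. Each frame is then converted into its corresponding spectrum and saved; after conversion, each frame has corresponding sound feature values.

The speech signal (e.g., the digital signal stream of the speech signal) can be converted to a spectrum by a Fourier transform or other sound-signal processing, which at the same time yields the sound feature values of each slice. The output speech signal, i.e., the input speech signal after transmission through the system under test 120, is treated exactly like the standard speech signal used as input: once the two are synchronized, both undergo the same slicing and spectral conversion, giving the sound feature values of each segment (frame).

Next, these sound feature values are extracted, i.e., the values for each frame of P1, P2, P3, ..., PN and for each frame of p1, p2, p3, ..., pN. Using a method such as the similarity principle or a similarity calculation, the sound feature values of each pair of corresponding frames, e.g., P1 and p1, are compared to determine the restoration degree. For example, a similarity calculation between the feature values of P1 and p1 gives a similarity value between 0 and 1 (0 meaning not similar, 1 meaning 100% similar). To make the result clearer, the value can be multiplied by a factor such as 100 to give a percentage score from 0 to 100, so that each frame receives a comparison score.

For example, matching the feature values A, B, C, D of P1 one by one against the feature values a, b, c', d of p1 shows that C differs from c' and only 3 are similar, giving 3/4 × 100 = 75 points.

As another example, the feature values of P1 form the one-dimensional array [A, B, C, D] and those of p1 form the one-dimensional array [a, b, c, d]. Both are one-dimensional arrays, and both A to D and a to d are ordered from smallest to largest, so by the similarity principle their arrangement trend and direction are similar (identical); the spectral comparison score of P1 and p1 is therefore (1/2 + 1/2) × 100 = 100 points.

Finally, the scores of all frames are determined from the per-frame scores, and from these the restoration degree, i.e., the degree to which the output sound reproduces the input sound (its similarity to the input). For example, the scores of all frames are aggregated into the average score of the output speech signal, and this average score is the restoration-degree score of the system under test 120. Further, the signal stream may also contain gap frames. Because gap frames carry no speech information, their scores can be discarded to reduce interference, and only the scores of frames belonging to the valid speech signal are considered; the scores of the valid frames are then averaged to give the average score of the output signal, which is the restoration-degree score of the system under test 120.

With the solution of the present application, the sound-source signal used during testing is real human speech, which keeps the system under test in its actual working environment during the test; and because the signal is real human speech, its distortion includes all intermodulation distortion. Further, slicing the signal covers testing of continuous frequencies, matches the signal conditions of real human speech, and more fully reveals the sound-output characteristics of the system under test. Therefore, this approach of testing the restoration degree directly on consecutive signal slices yields more accurate and precise test results for the system and equipment under test.

It should be noted that the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.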
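The per-frame comparison, 0 to 100 scaling, and averaging over non-gap frames described in the detailed description can be sketched as follows. The sketch assumes both streams are already synchronized and sliced into equal 20 ms frames (rows of a NumPy array). The patent leaves the feature values and the similarity measure open, so this sketch substitutes the FFT magnitude spectrum as the feature and cosine similarity as the measure; the gap-frame rule (an energy threshold on the reference frame) is likewise an assumption. `match_score` mirrors the element-wise matching example (3 of 4 features matching gives 75 points). All function names are hypothetical.

```python
import numpy as np


def frame_spectra(frames: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of each frame (row) via the real FFT."""
    return np.abs(np.fft.rfft(frames, axis=1))


def frame_scores(ref_frames: np.ndarray, out_frames: np.ndarray) -> np.ndarray:
    """Per-frame score on a 0-100 scale for frame pairs (P_i, p_i).

    ASSUMPTION: cosine similarity of magnitude spectra stands in for the
    unspecified "similarity calculation"; it lies in [0, 1] for non-negative
    magnitudes and is multiplied by 100 as in the text.
    """
    ref_spec, out_spec = frame_spectra(ref_frames), frame_spectra(out_frames)
    num = np.sum(ref_spec * out_spec, axis=1)
    den = np.linalg.norm(ref_spec, axis=1) * np.linalg.norm(out_spec, axis=1)
    similarity = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return 100.0 * similarity


def restoration_score(ref_frames: np.ndarray, out_frames: np.ndarray,
                      gap_energy: float = 1e-8) -> float:
    """Average the per-frame scores over non-gap frames only.

    ASSUMPTION: a frame counts as a gap frame when the mean energy of the
    reference frame is at or below gap_energy; the text does not say how
    gap frames are detected.
    """
    scores = frame_scores(ref_frames, out_frames)
    voiced = np.mean(ref_frames ** 2, axis=1) > gap_energy
    if not np.any(voiced):
        raise ValueError("no speech frames to score")
    return float(np.mean(scores[voiced]))


def match_score(ref_features, out_features, tol: float = 1e-3) -> float:
    """Element-wise feature matching, as in the 3/4 x 100 = 75 example."""
    ref = np.asarray(ref_features, dtype=float)
    out = np.asarray(out_features, dtype=float)
    matches = np.count_nonzero(np.abs(ref - out) <= tol)
    return 100.0 * matches / len(ref)


print(match_score([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.5, 4.0]))  # 75.0
```

Cosine similarity ignores overall gain, so a uniformly quieter but otherwise faithful output still scores 100; if loss of loudness should lower the restoration degree, a different measure would be needed.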
Claims
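1. A human voice speech quality scoring system, comprising at least: a sound source part, a system under test, an audio signal acquisition device, and an audio signal analysis device; wherein the sound source part generates a standard real human speech signal and feeds the real human speech signal, as an input speech signal, from the sound source part into the system under test; the input speech signal is transmitted through the system under test and output as an output speech signal from the output end of the system under test; the audio signal acquisition device captures the continuous speech signal output by the output end and transmits the captured output speech signal to the audio signal analysis device; and the audio signal analysis device slices and analyzes the signal stream of the output speech signal to determine a restoration degree.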
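2. The system according to claim 1, wherein the audio signal acquisition device is configured to convert the captured continuous speech signal into a corresponding digital signal stream and transmit it to the audio signal analysis device for slicing and analysis of the signal stream.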
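3. The system according to claim 1 or 2, wherein the audio signal analysis device is configured to: obtain the real human speech signal generated by the sound source part as the input speech signal; slice the signal stream of the input speech signal at a time interval so as to divide the signal stream into segments of equal duration, each speech-signal group containing one or more speech signals; and perform spectral conversion on each speech-signal group to obtain the sound feature values corresponding to each converted speech-signal group.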
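4. The system according to claim 3, wherein the audio signal analysis device is configured to: synchronize the signal stream of the output speech signal captured by the audio signal acquisition device with the signal stream of the input speech signal from the sound source part that corresponds to it; slice the signal stream of the captured continuous speech signal, at the same time interval used to slice the signal stream of the input speech signal, into segments of equal duration, each speech-signal group containing one or more speech signals; and perform spectral conversion on each speech-signal group to obtain the sound feature values corresponding to each converted speech-signal group.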
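5. The system according to claim 4, wherein the audio signal analysis device is configured to: extract the sound feature values corresponding to a segment of a speech-signal group in the signal stream of the input speech signal, and extract the sound feature values corresponding to the matching segment in the signal stream of the output speech signal; calculate and analyze, based on the similarity principle, the comparison score of the two corresponding segments; and perform statistics and/or averaging on the comparison scores of all corresponding speech-signal-group segments in the signal stream of the input speech signal and the corresponding signal stream of the output speech signal, to determine the restoration degree.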
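6. A human voice speech quality scoring method, comprising: generating, by a sound source part, a standard real human speech signal, and feeding the real human speech signal, as an input speech signal, from the sound source part into a system under test; transmitting the input speech signal through the system under test and outputting it as an output speech signal from the output end of the system under test; capturing the continuous speech signal output by the output end; and slicing and analyzing the signal stream of the captured output speech signal to determine a restoration degree.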
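7. The method according to claim 6, further comprising: obtaining in advance the real human speech signal generated by the sound source part as the input speech signal; slicing the signal stream of the input speech signal at a time interval so as to divide the signal stream into segments of equal duration, each speech-signal group containing one or more speech signals; and performing spectral conversion on each speech-signal group to obtain the sound feature values corresponding to each converted speech-signal group.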
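8. The method according to claim 7, wherein slicing and analyzing the signal stream of the captured output speech signal comprises: synchronizing the signal stream of the output speech signal captured by the audio signal acquisition device with the signal stream of the input speech signal from the sound source part that corresponds to it; slicing the signal stream of the captured continuous speech signal, at the same time interval used to slice the signal stream of the input speech signal, into segments of equal duration, each speech-signal group containing one or more speech signals; and performing spectral conversion on each speech-signal group to obtain the sound feature values corresponding to each converted speech-signal group.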
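9. The method according to claim 8, wherein slicing and analyzing the signal stream of the captured output speech signal to determine the restoration degree comprises: extracting the sound feature values corresponding to a segment of a speech-signal group in the signal stream of the input speech signal, and extracting the sound feature values corresponding to the matching segment in the signal stream of the output speech signal; calculating and analyzing, based on the similarity principle, the comparison score of the two corresponding segments; and performing statistics and/or averaging on the comparison scores of all corresponding speech-signal-group segments in the signal stream of the input speech signal and the corresponding signal stream of the output speech signal, to determine the restoration degree.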
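10. The method according to claim 8, further comprising: slicing the signal stream of the input speech signal and the signal stream of the output speech signal at the same time interval, namely slicing each signal stream at 20 ms intervals into segments of speech-signal groups, each containing one or more speech signals.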
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| ES14895085T ES2774018T3 (es) | 2014-06-17 | 2014-06-30 | Method and system for evaluating the sound quality of a human voice |
| EP14895085.0A EP3166239B1 (en) | 2014-06-17 | 2014-06-30 | Method and system for scoring human sound voice quality |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410269839.5A CN104050964A (zh) | 2014-06-17 | 2014-06-17 | Method and system for detecting the restoration degree of an audio signal |
| CN201410269839.5 | 2014-06-17 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015192395A1 (zh) | 2015-12-23 |
Family ID: 51503704
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2014/081156 Ceased WO2015192395A1 (zh) | 2014-06-17 | 2014-06-30 | Human voice speech quality scoring method and system |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP3166239B1 (zh) |
| CN (1) | CN104050964A (zh) |
| ES (1) | ES2774018T3 (zh) |
| WO (1) | WO2015192395A1 (zh) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
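| CN105810222A (zh) * | 2014-12-30 | 2016-07-27 | 研祥智能科技股份有限公司 | Defect detection method, apparatus and system for audio equipment |
| CN105989853B (zh) * | 2015-02-28 | 2020-08-18 | 科大讯飞股份有限公司 | Audio quality evaluation method and system |
| CN106161705B (zh) * | 2015-04-22 | 2020-01-07 | 小米科技有限责任公司 | Audio device testing method and apparatus |
| CN105450882B (zh) * | 2015-11-13 | 2018-07-20 | 公安部第三研究所 | Method for testing audio conversion characteristics |
| CN107465987A (zh) * | 2017-08-09 | 2017-12-12 | 广东思派康电子科技有限公司 | Test method and test system for the sound pickup system of finished voice-interaction speakers |
| CN107659888A (zh) * | 2017-08-21 | 2018-02-02 | 广州酷狗计算机科技有限公司 | Method, apparatus and storage medium for identifying pseudo-stereo audio |
| CN110473519B (zh) * | 2018-05-11 | 2022-05-27 | 北京国双科技有限公司 | Speech processing method and apparatus |
| CN109979434A (zh) * | 2019-04-30 | 2019-07-05 | 成都启英泰伦科技有限公司 | Method for testing the acoustic performance of local speech module products |
| CN111276161B (zh) * | 2020-03-05 | 2023-03-10 | 公安部第三研究所 | Speech quality scoring system and method |
| CN111312289B (zh) * | 2020-03-05 | 2023-03-10 | 公安部第三研究所 | Preprocessing method and system for audio testing |
| CN111294367B (zh) * | 2020-05-14 | 2020-09-01 | 腾讯科技(深圳)有限公司 | Audio signal post-processing method and apparatus, storage medium, and electronic device |
| CN112382313A (zh) * | 2020-12-02 | 2021-02-19 | 公安部第三研究所 | Audio communication quality evaluation system and method |
| CN115695657B (zh) * | 2022-10-28 | 2023-07-25 | 广州芯德通信科技股份有限公司 | Method, apparatus and system for testing the low-noise power supply of a voice gateway using its spectrum |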
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070027687A1 (en) * | 2005-03-14 | 2007-02-01 | Voxonic, Inc. | Automatic donor ranking and selection system and method for voice conversion |
| CN101645271A (zh) * | 2008-12-23 | 2010-02-10 | 中国科学院声学研究所 | Fast confidence computation method in a pronunciation quality assessment system |
| US20100318635A1 (en) * | 2008-03-07 | 2010-12-16 | Yuzo Senda | Content distributing system, feature amount distributing server, client, and content distributing method |
| CN103413558A (zh) * | 2013-08-08 | 2013-11-27 | 南京邮电大学 | Audio equipment testing method |
| CN103607669A (zh) * | 2013-10-12 | 2014-02-26 | 公安部第三研究所 | Method and system for detecting the audio transmission characteristics of a building intercom system |
| CN103730131A (zh) * | 2012-10-12 | 2014-04-16 | 华为技术有限公司 | Method and apparatus for speech quality assessment |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
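| JP2007047539A (ja) * | 2005-08-11 | 2007-02-22 | Sony Corp | Sound field correction device and sound field correction method |
| CN102214462B (zh) * | 2011-06-08 | 2012-11-14 | 北京爱说吧科技有限公司 | Method and system for pronunciation assessment |
| JP2013247456A (ja) * | 2012-05-24 | 2013-12-09 | Toshiba Corp | Acoustic processing device, acoustic processing method, acoustic processing program, and acoustic processing system |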
2014
- 2014-06-17 CN CN201410269839.5A patent/CN104050964A/zh active Pending
- 2014-06-30 WO PCT/CN2014/081156 patent/WO2015192395A1/zh not_active Ceased
- 2014-06-30 EP EP14895085.0A patent/EP3166239B1/en active Active
- 2014-06-30 ES ES14895085T patent/ES2774018T3/es active Active
Also Published As
| Publication number | Publication date |
|---|---|
| EP3166239B1 (en) | 2019-11-06 |
| EP3166239A4 (en) | 2018-04-18 |
| EP3166239A1 (en) | 2017-05-10 |
| ES2774018T3 (es) | 2020-07-16 |
| CN104050964A (zh) | 2014-09-17 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 14895085; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | REEP | Request for entry into the European phase | Ref document number: 2014895085; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2014895085; Country of ref document: EP |