WO2021164303A1 - Voice transmission method, system, apparatus, computer-readable storage medium, and device - Google Patents
- Publication number
- WO2021164303A1 (PCT/CN2020/124263, CN2020124263W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- encoded data
- packet loss
- current
- redundant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0001—Systems modifying transmission characteristics according to link quality, e.g. power backoff
- H04L1/0002—Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission rate
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0001—Systems modifying transmission characteristics according to link quality, e.g. power backoff
- H04L1/0009—Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the channel coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0001—Systems modifying transmission characteristics according to link quality, e.g. power backoff
- H04L1/0015—Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the adaptation strategy
- H04L1/0019—Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the adaptation strategy in which mode-switching is based on a statistical approach
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0001—Systems modifying transmission characteristics according to link quality, e.g. power backoff
- H04L1/0023—Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
- H04L1/0026—Transmission of channel quality indication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0041—Arrangements at the transmitter end
Definitions
- This application relates to the field of computer technology, in particular to a voice transmission method, system, device, computer-readable storage medium, and computer equipment.
- The Internet is an unreliable transmission network.
- The main problem faced by Internet-based voice transmission is resistance to packet loss: because the transmission network is unstable, packets are lost during transmission.
- An FEC (Forward Error Correction) redundant channel coding algorithm is usually used to generate redundant packets, which are sent to the receiving end together with the data packets. After receiving them, the receiving end recovers lost data packets from the redundant packets and the original packets, thereby resisting packet loss.
- FEC redundant coding relies on generating redundant packets to resist packet loss in the transmission network, which multiplies the required bandwidth and consumes excessive network bandwidth resources. The stronger the resistance to packet loss, the more network bandwidth is consumed. In bandwidth-constrained scenarios especially, this easily leads to problems such as network congestion, which in turn cause more packet loss.
- A voice transmission method, including:
- obtaining, according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data, the packet loss recovery capability corresponding to the current coded data;
- A voice transmission system, including a sending end and a receiving end, in which:
- the sending end is used to obtain the current coded data in the speech coding bitstream and, through a machine-learning-based packet loss recovery capability prediction model, to obtain the packet loss recovery capability corresponding to the current coded data according to the first speech coding feature parameter corresponding to the current coded data and the second speech coding feature parameter corresponding to the previous coded data of the current coded data;
- the sending end is also used to determine, according to the packet loss recovery capability, whether redundant coding is required; if so, to perform redundant coding on the current coded data to generate a corresponding redundant packet and then transmit the current coded data and the redundant packet to the receiving end; if not, to transmit the current coded data directly to the receiving end;
- the receiving end is used, when the current coded data is received, to perform voice decoding directly on the current coded data to obtain the corresponding voice signal; and, when the current coded data is not received but the redundant packet is received, to perform redundant decoding based on the redundant packet to obtain the current coded data and then perform voice decoding on it to obtain the corresponding voice signal; and
- the receiving end is also used, when neither the current coded data nor the redundant packet is received, to perform packet loss recovery processing for the current coded data to obtain a corresponding recovery packet, and to perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current coded data.
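- The three receiving-end cases above can be sketched as a single dispatch function. This is a minimal illustration, not the patent's implementation: the names `receive_frame`, `fec_decode`, `plc_recover`, and `voice_decode` are hypothetical, and the callables stand in for whatever codec, FEC scheme, and concealment algorithm a real system would use.

```python
from typing import Callable, Optional


def receive_frame(
    encoded: Optional[bytes],
    redundant: Optional[bytes],
    fec_decode: Callable[[bytes], bytes],
    plc_recover: Callable[[], bytes],
    voice_decode: Callable[[bytes], bytes],
) -> bytes:
    """Return the decoded voice signal for one frame (hypothetical sketch)."""
    if encoded is not None:
        # Case 1: the coded data arrived, so decode it directly.
        return voice_decode(encoded)
    if redundant is not None:
        # Case 2: coded data lost, redundant packet arrived: rebuild, then decode.
        return voice_decode(fec_decode(redundant))
    # Case 3: both lost: fall back to packet loss recovery (concealment).
    return voice_decode(plc_recover())
```

- The order of the checks mirrors the text: direct decoding is preferred, redundant decoding is used only when the coded data is missing, and packet loss recovery is the last resort.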
- A voice transmission device, including:
- an acquiring module, used to acquire the current coded data in the speech coding bitstream;
- a prediction module, used to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current coded data according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data;
- a redundant coding decision module, used to decide, according to the packet loss recovery capability, whether redundant coding is required; if so, to perform redundant coding on the current coded data to generate a corresponding redundant packet and then transmit the current coded data and the redundant packet to the receiving end; if not, to transmit the current coded data directly to the receiving end.
- One or more non-volatile computer-readable storage media storing computer-readable instructions.
- When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the above voice transmission method.
- A computer device, including a memory and one or more processors.
- The memory stores computer-readable instructions.
- When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the steps of the above voice transmission method.
- FIG. 1 is an application environment diagram of a voice transmission method in an embodiment;
- FIG. 2 is an application environment diagram of a voice transmission method in another embodiment;
- FIG. 3 is a schematic flowchart of a voice transmission method in an embodiment;
- FIG. 4 is a schematic block diagram of voice transmission using an FEC redundant coding mechanism in an embodiment;
- FIG. 5 is a schematic flowchart of the training steps of a packet loss recovery capability prediction model in an embodiment;
- FIG. 6 is a training block diagram of a packet loss recovery capability prediction model in an embodiment;
- FIG. 7 is a flowchart of a voice transmission method in an embodiment;
- FIG. 8 is a schematic flowchart of a voice transmission method in a specific embodiment;
- FIG. 9 is a structural block diagram of a voice transmission device in an embodiment;
- FIG. 10 is a structural block diagram of a computer device in an embodiment.
- Fig. 1 is an application environment diagram of a voice transmission method in an embodiment.
- the voice transmission method is applied to a voice transmission system.
- the voice transmission system includes a sending end 110 and a receiving end 120.
- the sending end 110 and the receiving end 120 are connected through a network.
- Both the sending end 110 and the receiving end 120 may be terminals.
- the terminal may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, and a notebook computer.
- the sending end 110 and the receiving end 120 may also be servers or server clusters.
- both the sending end 110 and the receiving end 120 run applications that support voice transmission.
- The server 130 can provide computing and storage capabilities for the application, and the sending end 110 and the receiving end 120 can each be connected to the server 130 through a network, so that voice transmission between the two ends can be realized through the server 130.
- the server 130 may be implemented as an independent server or a server cluster composed of multiple servers.
- The sending end 110 can obtain the current coded data in the speech coding bitstream; obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current coded data according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data; and determine, according to the packet loss recovery capability, whether redundant coding is required. If so, it performs redundant coding on the current coded data to generate a corresponding redundant packet and transmits the current coded data and the redundant packet to the receiving end 120; if not, it transmits the current coded data directly to the receiving end 120. This improves the overall utilization of network bandwidth while still ensuring the transmission network's resistance to packet loss.
- a voice transmission method is provided.
- The method is described mainly by taking its application to the sending end 110 in FIG. 1 or FIG. 2 as an example.
- the voice transmission method specifically includes the following steps S302 to S308:
- the speech coding code stream is the original code stream obtained after speech coding is performed on the speech signal.
- the speech coding code stream includes a set of coded data to be transmitted.
- the encoded data may be an encoded data frame obtained by encoding a voice signal by a voice encoder at the transmitting end according to a specific frame length, and the transmitting end may transmit the encoded data frame in the voice encoding code stream to the receiving end through the network.
- the encoded data may also be an encoded data packet synthesized from multiple encoded data frames, and the transmitting end may transmit the encoded data packet in the voice encoding code stream to the receiving end through the network.
- For example, the encoder at the transmitting end obtains a 60 ms voice signal, divides it into 4 frames with a frame length of 15 ms, and encodes them in order to obtain 4 coded data frames.
- The transmitting end can transmit the coded data frames to the receiving end in turn, or it can combine these 4 coded data frames into one coded data packet and then transmit it to the receiving end through the network.
- With the FEC redundant coding mechanism, the sending end applies FEC redundant coding directly to each coded data in the speech coding bitstream before transmitting the bitstream to the receiving end.
- The receiving end can receive each coded data and the corresponding redundant packets through the network, perform redundant decoding according to the redundant packets to recover any lost coded data, and then decode to obtain the voice signal.
- For example, the speech coding bitstream to be transmitted includes five coded data P1, P2, P3, P4, and P5.
- The sending end can perform redundant coding based on these five coded data to generate redundant packets.
- The number of redundant packets can be one or more. Here it is assumed that two redundant packets R1 and R2 are generated; P1, P2, P3, P4, P5 and R1, R2 are then packaged and sent to the receiving end.
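- To make the idea of recovering a lost packet from redundancy concrete, here is a minimal sketch using a single XOR parity packet. This is a deliberately simple stand-in and not the FEC code of the patent: it produces only one redundant packet (R1), can repair only one loss per group, and assumes all packets have equal length. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one redundant packet as the XOR of all data packets."""
    return reduce(xor_bytes, packets)


def recover_one(received: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Restore at most one missing packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity packet can repair only one loss")
    if not missing:
        return list(received)
    present = [p for p in received if p is not None]
    # XOR of the parity with every surviving packet leaves the lost packet.
    restored = reduce(xor_bytes, present, parity)
    out = list(received)
    out[missing[0]] = restored
    return out
```

- Even this simplest scheme adds one extra packet per group of five, which illustrates the bandwidth cost of redundancy; codes that tolerate more losses need more redundant packets.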
- The sending end can predict, in turn, the receiving end's packet loss recovery capability for each coded data in the speech coding bitstream before sending that coded data to the receiving end.
- the sending end can obtain the coded data in the speech code stream in turn, and the current coded data is the coded data currently to be transmitted to the receiving end.
- The term "current coded data" is used in this application to describe the coded data currently being processed by the sending end.
- The term "previous coded data" is used to describe coded data that precedes the current coded data in the speech coding bitstream.
- The previous coded data may be the single coded data immediately preceding the current coded data, or multiple preceding coded data, for example the two coded data immediately before the current coded data.
- The current coded data is a relative notion that changes as processing advances. For example, after the sending end processes the current coded data F(i), the next coded data F(i+1) in the speech coding bitstream is taken as the new current coded data, and F(i) becomes the previous coded data of the new current coded data F(i+1).
- the above-mentioned voice transmission method further includes: obtaining the original voice signal; dividing the original voice signal to obtain the original voice sequence; and sequentially performing voice encoding on the voice segments in the original voice sequence to obtain the voice code stream.
- For example, the original voice signal acquired by the sending end is a 2-second voice signal.
- This voice signal is divided in units of 20 milliseconds to obtain an original voice sequence composed of 100 voice segments. Each voice segment in the original voice sequence is then speech-encoded in turn to obtain the coded data corresponding to each segment, thereby generating the speech coding bitstream corresponding to the original voice signal.
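- The segmentation arithmetic in this example (2 s divided into 20 ms units gives 100 segments) can be checked with a short sketch. The 16 kHz sample rate, the function name, and the choice to drop a trailing partial segment are assumptions made for illustration; the source does not specify them.

```python
from typing import List, Sequence


def split_into_segments(
    samples: Sequence[int], sample_rate_hz: int, segment_ms: int
) -> List[Sequence[int]]:
    """Split a PCM sample sequence into fixed-length voice segments."""
    seg_len = sample_rate_hz * segment_ms // 1000  # samples per segment
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    full = len(samples) // seg_len  # assumption: a trailing partial segment is dropped
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(full)]


# 2 seconds of silence at an assumed 16 kHz sample rate.
signal = [0] * (2 * 16000)
segments = split_into_segments(signal, sample_rate_hz=16000, segment_ms=20)
```

- With these assumed values each segment holds 320 samples, and `len(segments)` is 100, matching the count in the text.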
- The above voice transmission method further includes: obtaining the speech coding feature parameters corresponding to the voice segments in the original voice sequence; performing speech coding on the corresponding voice segments according to the speech coding feature parameters to generate the corresponding coded data and thus obtain the speech coding bitstream; and buffering the speech coding feature parameters used for each coded data during speech coding.
- The sending end extracts the speech coding feature parameters of the voice segments in the original voice sequence and encodes the extracted parameters to generate the coded data corresponding to each voice segment. For example, the encoder at the sending end extracts the speech coding feature parameters of a voice segment through speech signal processing models (such as filters and feature extractors), then encodes these parameters (for example, with entropy coding) and packs them according to a certain data format to obtain the corresponding coded data.
- The sending end can generate the current coded data corresponding to the current voice segment jointly from the speech coding feature parameters of the current voice segment and those of the previous voice segment; it can also generate the current coded data jointly from the speech coding feature parameters of the current voice segment and those of the subsequent voice segment.
- The speech coding feature parameters may be parameters extracted by signal processing of the voice segment, such as line spectrum frequency (LSF), pitch (obtained by pitch detection), adaptive codebook gain (adaptive gain), and fixed codebook gain.
- When the sending end generates the coded data corresponding to each voice segment, it also buffers the speech coding feature parameters of that segment used during encoding, that is, the parameters used when generating each coded data, so that the packet loss recovery capability corresponding to each coded data can later be predicted from the buffered parameters.
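- One possible way to buffer the feature parameters and assemble the model input from the current and previous coded data is sketched below. The class names, the field layout, and the concatenation order (current first, then previous) are assumptions for illustration only; the source names the parameters but not a data structure.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass(frozen=True)
class FrameFeatures:
    """Speech coding feature parameters of one coded data (hypothetical layout)."""
    lsf: Tuple[float, ...]   # line spectrum frequencies
    pitch: float             # fundamental frequency estimate
    adaptive_gain: float     # adaptive codebook gain
    fixed_gain: float        # fixed codebook gain

    def as_vector(self) -> List[float]:
        return [*self.lsf, self.pitch, self.adaptive_gain, self.fixed_gain]


class FeatureBuffer:
    """Keeps the most recent feature parameters produced during encoding."""

    def __init__(self, history: int = 2) -> None:
        self._frames: Deque[FrameFeatures] = deque(maxlen=history)

    def push(self, features: FrameFeatures) -> None:
        self._frames.append(features)

    def model_input(self) -> Optional[List[float]]:
        """Concatenate current and previous features; None until two frames exist."""
        if len(self._frames) < 2:
            return None
        previous, current = self._frames[-2], self._frames[-1]
        return current.as_vector() + previous.as_vector()
```

- A bounded `deque` keeps memory constant: only as many frames as the prediction needs are retained, and older ones are discarded automatically.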
- the packet loss recovery capability is a prediction result that can reflect the voice quality of the recovered packet obtained by the receiving end after the loss of the current encoded data by performing packet loss recovery processing on the current encoded data.
- the prediction result indicates that the receiving end can well recover the lost current coded data or cannot well recover the lost current coded data.
- the packet loss recovery process is PLC (Packet Loss Concealment), and the packet loss recovery capability is the PLC's packet loss recovery capability.
- The receiving end's packet loss recovery capability is limited. For example, when adjacent or nearby coded data exhibit fundamental frequency (pitch) jumps, sudden LSF changes, and the like, the receiving end's ability to recover from packet loss is limited. In this case, enabling FEC redundant coding at the sending end can effectively improve resistance to packet loss and ensure voice quality at the receiving end. When the values of the speech coding feature parameters of adjacent coded data fluctuate little, the receiving end usually has good packet loss recovery capability, and the sending end does not need to enable FEC redundant coding.
- The packet loss recovery capability corresponding to the current coded data is related to its speech coding feature parameters.
- A machine learning model can be trained on a large number of training samples to learn how to predict, from the speech coding feature parameters, the packet loss recovery capability corresponding to a data packet.
- The sending end may obtain from the buffer the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data, and use the pre-trained packet loss recovery capability prediction model to predict the packet loss recovery capability corresponding to the current coded data according to the first and second voice coding feature parameters.
- Alternatively, the sending end may use the packet loss recovery capability prediction model to obtain the packet loss recovery capability corresponding to the current coded data according to the first voice coding feature parameter corresponding to the current coded data and the third voice coding feature parameter corresponding to the subsequent coded data of the current coded data.
- The term "subsequent coded data" is used to describe coded data that follows the current coded data in the speech coding bitstream.
- The subsequent coded data can be the next coded data after the current coded data, or multiple coded data after it, for example the two coded data immediately following the current coded data.
- Which coded data's speech coding feature parameters the sending end uses as input to the packet loss recovery capability prediction model depends on the algorithm rules adopted by the sending end for speech encoding, or equivalently those adopted by the receiving end for speech decoding, since the encoding and decoding rules correspond to each other. For example, if the sending end uses the speech coding feature parameters of the previous coded data when generating the current coded data, then those parameters need to be used as input to the prediction model when predicting the packet loss recovery capability corresponding to the current coded data. If the sending end uses the speech coding feature parameters of the subsequent coded data when generating the current coded data, then the parameters of the subsequent coded data need to be used as input to the prediction model.
- the predictive model of packet loss recovery capability is a computer model based on machine learning, which can be implemented using a neural network model. Machine learning models can learn from samples to have specific capabilities.
- the packet loss recovery capability prediction model is a model that is trained in advance and has the ability to predict packet loss recovery.
- The sending end can set the model structure of the machine learning model in advance to obtain an initial machine learning model, and then train the initial model on a large number of sample voice signals with simulated packet loss tests to obtain the model parameters.
- The sending end can obtain the previously trained model parameters, import them into the initial machine learning model to obtain the packet loss recovery capability prediction model, and use the model to predict the packet loss recovery capability corresponding to each coded data in the speech coding bitstream, so as to determine from the prediction whether to enable FEC redundant coding for the current coded data.
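- The per-frame decision described here can be sketched as follows. Several things are assumptions not stated in this passage: the model output is taken to be a predicted voice quality drop (larger means the receiving end would conceal the loss poorly), the decision is a simple comparison against a hypothetical threshold `drop_threshold`, and `predict_drop` and `make_redundant` are placeholder callables for the trained model and the FEC encoder.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def plan_packets(
    coded_frames: Sequence[bytes],
    model_inputs: Sequence[Sequence[float]],
    predict_drop: Callable[[Sequence[float]], float],
    make_redundant: Callable[[bytes], bytes],
    drop_threshold: float = 0.5,  # hypothetical value, would be tuned in practice
) -> List[Tuple[bytes, Optional[bytes]]]:
    """Pair each coded frame with a redundant packet only when one is needed."""
    if len(coded_frames) != len(model_inputs):
        raise ValueError("one feature vector is required per coded frame")
    plan: List[Tuple[bytes, Optional[bytes]]] = []
    for frame, features in zip(coded_frames, model_inputs):
        # Enable FEC only when the predicted quality drop is too large to conceal.
        needs_fec = predict_drop(features) > drop_threshold
        plan.append((frame, make_redundant(frame) if needs_fec else None))
    return plan
```

- Frames whose loss the receiving end is expected to conceal well are sent without redundancy, which is where the bandwidth saving comes from.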
- This training step can be executed by any computer device to obtain a trained packet loss recovery capability prediction model, which is then imported to the sending end that needs to perform voice transmission.
- The computer device may also be the sending end in FIG. 1 or FIG. 2; that is, the training step may also be executed directly by the sending end to obtain the trained packet loss recovery capability prediction model.
- the following uses computer equipment as the main body of execution to illustrate the training steps of the packet loss recovery capability prediction model, which specifically include:
- S502 Obtain a sample voice sequence in the training set.
- the computer device can obtain a large number of voice signals, and divide the voice signals to obtain a large number of voice signal sequences composed of voice segments, which are used as sample voice sequences for training the machine learning model.
- S504 Perform voice coding on the sample voice sequence to obtain a sample voice coding bitstream.
- The computer device extracts the speech coding feature parameters corresponding to each voice segment, generates the coded data corresponding to each voice segment according to the extracted parameters, and obtains the sample speech coding bitstream corresponding to each sample voice sequence.
- the computer equipment can buffer the speech coding characteristic parameters used by each coded data in the coding process.
- S506 Extract the first voice coding feature parameter used by the current coded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous coded data of the current coded data.
- The packet loss recovery capability corresponding to a coded data is related to its own speech coding feature parameters, and may also be related to the speech coding feature parameters of the previous coded data and/or the subsequent coded data. Therefore, during training, the computer device can use these speech coding feature parameters as the input of the machine learning model.
- the sending end may extract the first voice coding feature parameter corresponding to the currently processed current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data, as the input of the machine learning model.
- the previous coded data is the previous coded data of the currently coded data, and it may also be the previous multiple coded data of the currently coded data.
- Each training object is one piece of coded data; because each sample speech coding bitstream includes multiple coded data, each sample speech coding bitstream can be used for multiple training iterations.
- For example, the sending end can extract the speech coding feature parameters corresponding to the i-th coded data and to the (i-1)-th coded data in the sample speech coding bitstream S; it can also extract the speech coding feature parameters corresponding to the (i+1)-th coded data and to the i-th coded data in S.
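- The way one sample bitstream yields many training inputs can be sketched as below. The function name and the concatenation order (features of the i-th coded data followed by those of the (i-1)-th) are assumptions; the source only states that both sets of parameters are used as model input.

```python
from typing import List, Sequence, Tuple


def build_training_inputs(
    stream_features: Sequence[Sequence[float]],
) -> List[Tuple[int, List[float]]]:
    """For each coded data i >= 1, pair its features with those of coded data i-1."""
    return [
        (i, list(stream_features[i]) + list(stream_features[i - 1]))
        for i in range(1, len(stream_features))
    ]


# Three coded data in one sample bitstream, two feature values each (toy numbers).
inputs = build_training_inputs([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

- The first coded data has no predecessor, so a bitstream with N coded data yields N-1 inputs in this sketch.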
- S508 Directly decode the sample speech coding bitstream to obtain a first voice signal, and obtain a first voice quality score determined based on the first voice signal.
- the computer device may directly decode the encoded sample voice code stream obtained after encoding to obtain the first voice signal, and then use a voice quality testing tool to test the first voice quality score corresponding to the first voice signal. Since the first speech signal is obtained by directly decoding the sample speech coding code stream, there is no loss of coded data. Therefore, the obtained first speech signal is very close to the original sample speech sequence, which can be called a lossless speech signal.
- the corresponding first voice quality score may be referred to as a lossless voice quality score.
- The voice quality testing tool may be PESQ (Perceptual Evaluation of Speech Quality). PESQ evaluates the quality of a voice signal objectively according to a set of measurement standards, providing a fully quantifiable voice quality measurement method; these measurement standards agree well with human perception of voice quality.
- the first voice quality score obtained can be recorded as MOS_UNLOSS.
- S510 Obtain a second voice quality score, which is determined from the second voice signal obtained by performing simulated packet loss recovery processing on the current encoded data to produce a recovery packet and then decoding the recovery packet.
- the computer device can treat the current encoded data as a lost data packet, simulate the packet loss recovery processing that the decoder at the receiving end would perform on it to obtain the corresponding recovery packet, and decode the recovery packet to obtain the corresponding second voice signal. The other voice segments in the original sample voice sequence are then spliced with the second voice signal, and the spliced signal is scored to obtain the second voice quality score. Because the second voice signal is decoded from a recovery packet obtained under simulated packet loss, and the recovery packet differs from the lost current coded data, the second voice signal is also lossy relative to the voice segment corresponding to the current coded data. The second voice signal may therefore be called a lossy voice signal, and the second voice quality score may be called a lossy voice quality score, recorded as MOS_LOSS.
- S512 Determine the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first voice quality score and the second voice quality score.
- the true packet loss recovery capability corresponding to the current encoded data can be measured by the score difference between the first voice quality score and the second voice quality score; that is, MOS_UNLOSS - MOS_LOSS can be used as the true packet loss recovery capability corresponding to the current encoded data, which is the target output of the machine learning model.
- the true packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference: the smaller the difference, the better the voice quality of the recovery packet obtained by packet loss recovery after simulating the loss of the current encoded data, and the stronger the true packet loss recovery capability corresponding to the current encoded data; conversely, the larger the difference, the poorer the voice quality of that recovery packet, and the weaker the true packet loss recovery capability.
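The training target described in S512 reduces to a subtraction of two quality scores. The sketch below assumes the two MOS values have already been produced by a quality tool such as PESQ; the numeric scores are made-up illustration values, not measurements.

```python
def true_recovery_gap(mos_unloss: float, mos_loss: float) -> float:
    """Training target: MOS_UNLOSS - MOS_LOSS for one coded data.

    A smaller gap means concealment worked well, i.e. a stronger
    packet loss recovery capability.
    """
    return mos_unloss - mos_loss


# Hypothetical scores for two frames of the same sample sequence.
gap_easy = true_recovery_gap(mos_unloss=4.2, mos_loss=4.1)  # concealment nearly transparent
gap_hard = true_recovery_gap(mos_unloss=4.2, mos_loss=3.0)  # concealment clearly audible
print(gap_easy < gap_hard)
```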
- S514 Input the first voice coding feature parameter and the second voice coding feature parameter to the machine learning model, and output the predicted packet loss recovery capability corresponding to the current coded data through the machine learning model.
- the computer device can input the obtained first speech coding feature parameter and second speech coding feature parameter into the machine learning model, which outputs the predicted packet loss recovery capability corresponding to the current coded data through internal network processing. It should be noted that S514 may also be executed before step S508; this embodiment does not limit the execution order of this step.
- the computer device can construct a loss function from the true packet loss recovery capability and the predicted packet loss recovery capability output by the machine learning model, take the model parameters that minimize the loss function as the latest model parameters of the machine learning model, and continue with the next round of training on the sample voice sequence until the machine learning model converges or the number of training rounds reaches a preset number, thereby obtaining a trained packet loss recovery capability prediction model.
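As a rough illustration of this training loop, the sketch below fits a predictor by gradient descent until the loss stops improving or a preset number of rounds is reached. The linear model, the mean squared error loss, the learning rate, and the toy data are all assumptions made for brevity; the source describes a machine learning model with internal network processing and does not specify the loss function.

```python
from typing import List, Sequence, Tuple


def train_linear_predictor(
    inputs: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.1,
    max_rounds: int = 5000,
    tol: float = 1e-12,
) -> Tuple[List[float], float, float]:
    """Minimize mean squared error between predicted and true gaps."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    previous_loss = float("inf")
    loss = previous_loss
    for _ in range(max_rounds):  # stop after a preset number of rounds
        errors = [
            sum(w * x for w, x in zip(weights, row)) + bias - t
            for row, t in zip(inputs, targets)
        ]
        loss = sum(e * e for e in errors) / len(errors)
        if abs(previous_loss - loss) < tol:  # treated as convergence
            break
        previous_loss = loss
        # All gradients use the errors computed before this update.
        for j in range(len(weights)):
            grad = 2.0 * sum(e * row[j] for e, row in zip(errors, inputs)) / len(errors)
            weights[j] -= lr * grad
        bias -= lr * 2.0 * sum(errors) / len(errors)
    return weights, bias, loss


# Toy data: each row stands in for (first, second) feature parameters,
# each target for a measured MOS_UNLOSS - MOS_LOSS gap.
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.1, 0.6, 0.3, 0.8]
weights, bias, final_loss = train_linear_predictor(inputs, targets)
```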
- FIG. 6 is a schematic diagram of the framework for training a machine learning model to obtain the packet loss recovery capability prediction model in an embodiment.
- Figure 6 shows a schematic flow diagram of a single training process.
- the computer device obtains the sample voice sequence, and performs voice coding on the sample voice sequence to obtain a sample voice code stream.
- the sample voice encoding code stream is decoded directly, with no loss of the current encoded data, and PESQ is used to obtain MOS_UNLOSS; then the loss of the current encoded data is simulated, packet loss recovery processing is performed, the result is decoded, and PESQ is used to obtain MOS_LOSS.
- in step S304, obtaining the packet loss recovery capability corresponding to the current encoded data through the packet loss recovery capability prediction model based on machine learning, according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data, includes: inputting the first voice encoding feature parameter corresponding to the current encoded data and the second voice encoding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model; outputting, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between the first voice quality score determined by directly decoding the current coded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current coded data; and determining the packet loss recovery capability corresponding to the current encoded data according to the score difference, wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
- the packet loss recovery capability corresponding to the current encoded data can be predicted through the pre-trained packet loss recovery capability prediction model.
- the first speech encoding feature parameter corresponding to the current encoded data and the second speech encoding feature parameter corresponding to the previous encoded data are used as the input of the packet loss recovery capability prediction model, and the output of the packet loss recovery capability prediction model is the score difference for the current encoded data. The score difference reflects the quality of the packet loss recovery processing performed by the receiving end after the current encoded data is lost, that is, the size of the packet loss recovery capability, and the packet loss recovery capability is inversely related to the score difference.
- when the score difference is large, that is, when the packet loss recovery capability is less than the preset threshold, the voice signal obtained by the receiving end through packet loss recovery processing after the current encoded data is lost is of poor quality. When the score difference is small, that is, when the packet loss recovery capability is greater than the preset threshold, the quality of the voice signal obtained by the receiving end through packet loss recovery processing after the current encoded data is lost is within an acceptable range.
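The inverse relation between score difference and capability can be sketched as follows. The mapping `1 / (1 + gap)` is a hypothetical choice for illustration only; the source requires only that the capability decrease as the score difference grows.

```python
def capability_from_score_gap(score_gap: float) -> float:
    """Map a predicted MOS gap to a capability value in (0, 1].

    Hypothetical mapping: any strictly decreasing function would satisfy
    the inverse relation described in the text.
    """
    # A negative gap (recovered audio scored higher) is clamped to zero.
    return 1.0 / (1.0 + max(score_gap, 0.0))


strong = capability_from_score_gap(0.1)  # small gap, good concealment
weak = capability_from_score_gap(1.5)    # large gap, poor concealment
print(strong > weak)
```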
- step S306: determine whether redundant encoding processing is required according to the packet loss recovery capability; if so, perform step S308: perform redundant encoding according to the current encoded data to generate the corresponding redundant packet, and then transmit the current encoded data and the redundant packet to the receiving end; if not, perform step S310: transmit the current encoded data directly to the receiving end.
- after the sending end obtains the packet loss recovery capability corresponding to the current encoded data through the packet loss recovery capability prediction model, it decides whether to apply FEC redundant encoding to the current encoded data according to the predicted packet loss recovery capability.
- the packet loss recovery capability output by the packet loss recovery capability prediction model is a value within a numerical range; the sender can compare the packet loss recovery capability with a preset threshold and determine, according to the comparison result, whether the current encoded data needs to undergo redundant encoding processing.
- when the packet loss recovery capability is less than the preset threshold, redundant encoding is performed on the current encoded data to generate the corresponding redundant packet, and the current encoded data and the redundant packet are then transmitted to the receiving end. A packet loss recovery capability below the preset threshold means that, if the current encoded data is lost, the voice signal obtained by packet loss recovery processing at the receiving end is of poor quality. FEC redundant encoding is therefore needed to counter packet loss in the transmission network; that is, FEC redundant encoding is applied to the current encoded data to generate redundant packets, which are then transmitted to the receiving end.
- when the packet loss recovery capability is greater than the preset threshold, the current encoded data is directly transmitted to the receiving end. A packet loss recovery capability above the preset threshold means that, if the current encoded data is lost, the quality of the voice signal obtained by packet loss recovery processing at the receiving end is within the acceptable range, so for this encoded data the sender does not need to use FEC redundant coding as an anti-packet-loss strategy. The sender can transmit the current encoded data directly to the receiver; if the current encoded data is lost, the packet loss recovery algorithm built into the decoder of the receiving end performs packet loss recovery processing on it.
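The threshold comparison can be written as a small decision function. This is a sketch only: the threshold value and the capability values are hypothetical, and the handling of exact equality is an assumption because the text covers only the "less than" and "greater than" cases.

```python
from typing import List, Sequence


def needs_redundant_encoding(recovery_capability: float, threshold: float) -> bool:
    """Return True when FEC redundancy should be added for this coded data."""
    # Equality is not covered by the text; here it is treated as "no FEC".
    return recovery_capability < threshold


def plan_transmission(capabilities: Sequence[float], threshold: float) -> List[str]:
    """Label each coded data as 'fec' (add redundancy) or 'direct'."""
    return [
        "fec" if needs_redundant_encoding(c, threshold) else "direct"
        for c in capabilities
    ]


# Hypothetical predicted capabilities for three consecutive coded data.
plan = plan_transmission([0.9, 0.3, 0.6], threshold=0.5)
print(plan)
```

Only the coded data that the decoder is predicted to conceal poorly incurs the bandwidth cost of redundancy.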
- the packet loss recovery capability output by the packet loss recovery capability prediction model takes one of two values. When the packet loss recovery capability is the first value, the voice signal obtained by packet loss recovery processing at the receiving end after the current encoded data is lost is of poor quality, so the sender needs to perform FEC redundant encoding on the current encoded data before transmitting it to the receiver; when the packet loss recovery capability is the second value, the quality of the voice signal obtained by packet loss recovery processing at the receiving end after the current encoded data is lost is within the acceptable range, so the sender can transmit the current encoded data directly to the receiver, and if the current encoded data is lost, the packet loss recovery algorithm built into the decoder of the receiving end performs packet loss recovery processing on it.
- the first value can be 1, and the second value can be 0.
- the first value can be 0, and the second value can be 1.
- the voice coding stream to be transmitted includes P1, P2, P3, P4, and so on; assuming that the current coded data is P7 and the sender predicts that the packet loss recovery capability corresponding to P7 is weak, P7 can be added to the buffer queue of coded data that requires redundant encoding (at this time, the buffer queue may be empty, or it may already hold earlier coded data, such as P5).
- the sender can perform redundant encoding on the coded data in the buffer queue to generate redundant packets, then send the coded data in the buffer queue and the generated redundant packets to the receiving end and clear the buffer queue at the same time.
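A minimal sketch of such a buffer queue is shown below. The class name `RedundancyBuffer`, the flush-when-full trigger, and the `make_redundant` callback are hypothetical; the source states only that buffered coded data is redundantly encoded, sent together with the redundant packets, and then cleared.

```python
from typing import Callable, List, Optional, Tuple

Packet = bytes


class RedundancyBuffer:
    """Collects coded data that needs FEC and flushes it as one data group."""

    def __init__(
        self,
        capacity: int,
        make_redundant: Callable[[List[Packet]], List[Packet]],
    ) -> None:
        self.capacity = capacity
        self.make_redundant = make_redundant  # hypothetical FEC encoder callback
        self._queue: List[Packet] = []

    def add(self, packet: Packet) -> Optional[Tuple[List[Packet], List[Packet]]]:
        """Queue a packet; return (data, redundant) once the queue is full."""
        self._queue.append(packet)
        if len(self._queue) < self.capacity:
            return None
        data = list(self._queue)
        redundant = self.make_redundant(data)
        self._queue.clear()  # the queue is emptied when the group is sent
        return data, redundant


# Stand-in encoder that always returns a single placeholder redundant packet.
buffer = RedundancyBuffer(capacity=2, make_redundant=lambda pkts: [b"R1"])
first = buffer.add(b"P5")  # queue not yet full, nothing to send
group = buffer.add(b"P7")  # queue full: data group plus redundant packets
```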
- transmitting the current encoded data and the redundant packet to the receiving end includes: obtaining the packet loss status information fed back by the receiving end; determining the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generating redundant packets based on the current encoded data and then transmitting the current encoded data together with the redundant packets to the receiving end.
- the receiving end may determine the packet loss status information according to the received data packet, and feed back the packet loss status information to the sending end.
- the packet loss status information can be represented by the current packet loss rate.
- the receiver can encapsulate the packet loss rate into a control message and send it to the sender, and the sender parses the received control message to obtain the packet loss rate.
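One way a receiver might estimate the packet loss rate from the packets it has seen is sketched below. The source does not describe how the rate is computed; counting gaps in the sequence numbers over an observation window is an assumption, as is returning 0.0 when nothing has been received.

```python
from typing import Iterable


def packet_loss_rate(received_sequence_numbers: Iterable[int]) -> float:
    """Estimate loss as the share of missing sequence numbers in the window."""
    received = set(received_sequence_numbers)
    if not received:
        return 0.0  # nothing observed yet; hypothetical convention
    expected = max(received) - min(received) + 1
    return (expected - len(received)) / expected


# Sequence number 4 never arrived in this window.
rate = packet_loss_rate([1, 2, 3, 5, 6])
print(round(rate, 3))
```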
- the sending end can adjust the redundancy rate to achieve different levels of anti-packet-loss effect: a large redundancy rate can handle more consecutive packet losses, while a small redundancy rate can handle a small number of consecutive or sporadic packet losses. That is, the redundancy rate r takes a larger value under a high packet loss rate and a smaller value under a low packet loss rate.
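The dependence of r on the reported loss rate can be sketched as a clamped monotonic mapping. The gain and the bounds `r_min` and `r_max` are hypothetical tuning values; the source states only that r is larger under high packet loss and smaller under low packet loss.

```python
def redundancy_rate(
    packet_loss_rate: float,
    r_min: float = 0.1,
    r_max: float = 1.0,
    gain: float = 2.0,
) -> float:
    """Return a redundancy rate r that grows with the reported loss rate."""
    if not 0.0 <= packet_loss_rate <= 1.0:
        raise ValueError("packet_loss_rate must be in [0, 1]")
    # Scale the loss rate, then clamp to the allowed range.
    return min(r_max, max(r_min, gain * packet_loss_rate))


low = redundancy_rate(0.05)   # light, sporadic loss
high = redundancy_rate(0.30)  # heavy, bursty loss
print(low < high)
```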
- the voice transmission method further includes: when the receiving end receives the currently encoded data, directly performing voice decoding on the currently encoded data to obtain the voice signal corresponding to the current encoded data; when the receiving end does not receive the currently encoded data And when a redundant packet is received, the receiving end performs redundant decoding processing based on the redundant packet to obtain the current encoded data and then perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data.
- the sender adds encoded data P3, P4, P6, P7, P8, and P9 to the buffer queue after predicting their packet loss recovery capability (the length of the buffer queue can be set as needed, for example, 6), performs redundant encoding to generate redundant packets R1 and R2, and encapsulates the encoded data P3, P4, P6, P7, P8, P9 in the buffer queue together with the generated redundant packets R1 and R2 into a data group that is sent to the receiving end.
- the packet sequence numbers of the data packets in the data group can be consecutive, for example, 1, 2, 3, 4, 5, and 6 in sequence.
- the receiving end can directly perform voice decoding on the received P3, P4, and P6 to obtain the corresponding voice signals; at the same time, the receiving end can buffer P3, P4, and P6 for possible subsequent FEC redundant decoding. If no packet in the rest of the data group is lost, the buffer is cleared.
- when the receiving end receives P8 and P9, it can determine from the packet sequence numbers that P7 is lost. The receiving end then buffers P8 and P9 until R1 is received, at which point it can perform redundant decoding on the buffered P3, P4, P6, P8, P9 and R1 to recover the missing P7. When R2 subsequently arrives, it can be discarded directly.
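Recovering one lost packet from the surviving packets and a redundant packet can be demonstrated with a single XOR parity packet. This is a swapped-in technique chosen for brevity: the source does not name the FEC code, and a scheme that produces two redundant packets such as R1 and R2 would need a stronger erasure code. The sketch also assumes all packets have equal length.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length in this sketch")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one redundant packet as the XOR of every data packet."""
    return reduce(xor_bytes, packets)


def recover_missing(survivors: List[bytes], parity: bytes) -> bytes:
    """XOR the parity with all surviving packets to rebuild the lost one."""
    return reduce(xor_bytes, survivors, parity)


# Made-up four-byte payloads standing in for P3, P4, P6, P7, P8, P9.
group = [bytes([n] * 4) for n in (3, 4, 6, 7, 8, 9)]
r1 = make_parity(group)
survivors = group[:3] + group[4:]  # P7 was lost in transit
recovered_p7 = recover_missing(survivors, r1)
print(recovered_p7 == group[3])
```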
- the voice transmission method further includes:
- when the receiving end receives neither the current encoded data nor the redundant packets, the receiving end performs packet loss recovery processing on the current encoded data to obtain the recovery packet corresponding to the current encoded data, and performs voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.
- in the case of loss of P7, if the receiving end does not receive R1 and R2 within a certain period of time, it cannot recover P7 from the cached P3, P4, P6, P8, and P9 and needs to perform packet loss recovery processing on the current encoded data through the PLC algorithm built into the decoder. This usually relies on the decoding information of the previous data packet and uses pitch-synchronous repetition to approximate the current encoded data as a recovery packet, which is then decoded to obtain the voice signal.
- the receiving end also needs to perform packet loss recovery processing on the current encoded data through the PLC algorithm built into the decoder.
- the packet loss recovery capability prediction model based on machine learning is used, according to the first voice encoding feature parameter corresponding to the current encoded data and the second voice encoding feature parameter corresponding to the previous encoded data, to predict the receiving end's packet loss recovery capability for the current encoded data, and whether to perform redundant encoding on the current encoded data is then determined according to that capability. If so, the current encoded data is redundantly encoded to generate redundant packets, and the necessary network bandwidth resources are consumed to transmit the redundant packets to the receiving end.
- otherwise, the current encoded data does not need to be redundantly encoded and is transmitted directly to the receiving end, which avoids consuming too much network bandwidth and thereby effectively improves the overall utilization of network bandwidth while still ensuring the anti-packet-loss capability of the transmission network.
- FIG. 7 is a flowchart of a voice transmission method in an embodiment.
- the sending end obtains the original voice signal, and performs voice coding on the original voice signal to obtain a voice coded stream.
- the transmitting end predicts the packet loss recovery capability of the receiving end for each coded data in the speech encoding bitstream by using a packet loss recovery capability prediction model based on machine learning. Then, it is determined whether to enable FEC redundancy coding for the current coded data according to the predicted packet loss recovery capability.
- if redundant coding is enabled, the redundancy rate is set according to the packet loss status information fed back by the receiving end, redundant packets are generated from the current encoded data according to the redundancy rate, and the current encoded data and the redundant packets are transmitted to the receiving end. If it is decided not to enable redundant coding for the current coded data, the current coded data is directly transmitted to the receiving end.
- if the receiving end receives the current encoded data, it reconstructs the voice signal according to the normal decoding process. If the receiving end does not receive the current coded data but receives a redundant packet, it can perform FEC redundant decoding to obtain the current coded data, provided that the lost packet can be recovered through redundant decoding. If the receiving end receives neither the current encoded data nor the corresponding redundant packet within a certain period of time, the current encoded data is determined to be lost, and the receiving end can perform packet loss recovery processing on the current encoded data through the PLC algorithm built into the decoder and then decode the result to obtain the voice signal.
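The three receiver cases can be sketched as one dispatch function. The callbacks `fec_recover`, `decode`, and `conceal` are hypothetical stand-ins for FEC redundant decoding, the voice decoder, and the decoder's built-in PLC algorithm; strings are used in place of audio so the control flow is easy to follow.

```python
from typing import Callable, Optional

Packet = bytes
Signal = str  # stand-in for decoded audio in this sketch


def reconstruct(
    coded: Optional[Packet],
    fec_recover: Callable[[], Optional[Packet]],
    decode: Callable[[Packet], Signal],
    conceal: Callable[[], Packet],
) -> Signal:
    """Pick the decoding path for one coded data at the receiving end."""
    if coded is not None:
        return decode(coded)  # case 1: the packet arrived, decode normally
    recovered = fec_recover()
    if recovered is not None:
        return decode(recovered)  # case 2: rebuilt from redundant packets
    return decode(conceal())  # case 3: PLC builds a recovery packet


def decode(packet: Packet) -> Signal:
    return packet.decode("ascii")


arrived = reconstruct(b"P7", lambda: None, decode, lambda: b"PLC")
via_fec = reconstruct(None, lambda: b"P7", decode, lambda: b"PLC")
via_plc = reconstruct(None, lambda: None, decode, lambda: b"PLC")
```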
- FIG. 8 is a schematic flowchart of a voice transmission method in a specific embodiment. Referring to FIG. 8, the method includes the following steps:
- S806 Perform voice coding on the voice segments in the original voice sequence in sequence to obtain a voice coding bitstream.
- S810 Acquire current coded data in the speech coding bitstream.
- S814 Using the packet loss recovery capability prediction model, according to the first voice encoding feature parameter and the second voice encoding feature parameter, output the score difference between the first voice quality score determined by directly decoding the current encoded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current encoded data.
- S816 Determine the packet loss recovery capability corresponding to the current encoded data according to the difference in scores.
- a voice transmission system, which may be the voice transmission system shown in FIG. 1 or FIG. 2, includes a sending end 110 and a receiving end 120:
- the sending end 110 is used to obtain the current coded data in the voice coding bitstream and, through the packet loss recovery capability prediction model based on machine learning, to obtain the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data;
- the sending end 110 is also used to determine whether redundant encoding processing is required according to the packet loss recovery capability; if so, perform redundant encoding according to the current encoded data to generate a corresponding redundant packet, and then transmit the current encoded data and the redundant packet to the receiving end; if not, directly transmit the current coded data to the receiving end;
- the receiving end 120 is used to directly decode the current encoded data to obtain the voice signal corresponding to the current encoded data when the current encoded data is received; it is also used, when the current encoded data is not received and a redundant packet is received, to perform redundant decoding processing based on the redundant packet to obtain the current coded data and then perform voice decoding on the current coded data to obtain the voice signal corresponding to the current coded data;
- the receiving end 120 is also used, when neither the current encoded data nor the redundant packet is received, to perform packet loss recovery processing on the current encoded data to obtain a recovery packet corresponding to the current encoded data, and to perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current coded data.
- the sending end 110 is also used to obtain the original voice signal; divide the original voice signal to obtain the original voice sequence; and sequentially perform voice coding on the voice segments in the original voice sequence to obtain a voice code stream.
- the sending end 110 is also used to obtain the voice coding feature parameters corresponding to the voice segments in the original voice sequence; perform voice coding on the corresponding voice segments according to the voice coding feature parameters to generate the corresponding coded data and obtain the voice coding code stream; and buffer the voice coding feature parameters used by each coded data in the voice coding process.
- the sending end 110 is further configured to input the first speech coding feature parameter corresponding to the current coded data and the second speech coding feature parameter corresponding to the previous coded data of the current coded data into the packet loss recovery capability prediction model; output, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between the first voice quality score determined by directly decoding the current coded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current coded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference, wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
- the sending end 110 is also used to obtain the packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate redundant packets based on the current encoded data and then transmit the current encoded data and the redundant packets to the receiving end.
- the receiving end 120 is further configured to directly perform voice decoding on the currently encoded data when the receiving end receives the currently encoded data to obtain the voice signal corresponding to the currently encoded data.
- the receiving end 120 is further configured to, when the current encoded data is not received and a redundant packet is received, perform redundant decoding processing based on the redundant packet to obtain the current encoded data, and then perform voice decoding on the current coded data to obtain the voice signal corresponding to the current coded data.
- the receiving end 120 is further configured to, when neither the current encoded data nor the redundant packet is received, perform packet loss recovery processing on the current encoded data to obtain a recovery packet corresponding to the current encoded data, and perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.
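- the sending end 110 is also used to obtain the sample voice sequence in the training set; perform voice coding on the sample voice sequence to obtain the sample voice coding code stream; extract, from the sample voice code stream, the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data; obtain the first voice quality score determined from the first voice signal obtained by directly decoding the sample voice coding code stream; perform simulated packet loss recovery processing on the current encoded data to obtain a recovery packet, decode the recovery packet to obtain the second voice signal, and determine the second voice quality score based on the second voice signal; determine the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first voice quality score and the second voice quality score; input the first voice encoding feature parameter and the second voice encoding feature parameter into the machine learning model, which outputs the predicted packet loss recovery capability corresponding to the current encoded data; and then adjust the model parameters of the machine learning model according to the difference between the true packet loss recovery capability and the predicted packet loss recovery capability, continuing training until the model converges or a preset number of training rounds is reached.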
- before transmitting the current coded data to the receiving end, the sender uses a machine-learning-based packet loss recovery capability prediction model, according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data, to predict the receiving end's packet loss recovery capability for the current encoded data, and then determines whether to perform redundant encoding on the current encoded data according to that capability. If so, the current encoded data is redundantly encoded to generate redundant packets, and the necessary network bandwidth resources are consumed to transmit the redundant packets to the receiving end.
- otherwise, the current encoded data does not need to be redundantly encoded and is transmitted directly to the receiving end, which avoids excessive consumption of network bandwidth resources and thereby effectively improves the overall utilization of network bandwidth while still ensuring the anti-packet-loss capability of the transmission network.
- a voice transmission device 900, which can be implemented as all or a part of the sending end through software, hardware, or a combination of the two.
- the device includes an acquisition module 902, a prediction module 904, and a redundant coding decision module 906:
- the obtaining module 902 is used to obtain the current coded data in the speech coding code stream;
- the prediction module 904 is configured to obtain, through the packet loss recovery capability prediction model based on machine learning, the packet loss recovery capability corresponding to the current encoded data according to the first voice encoding feature parameter corresponding to the current encoded data and the second voice encoding feature parameter corresponding to the previous encoded data of the current encoded data;
- the redundant coding decision module 906 is used to decide whether to perform redundant coding processing according to the packet loss recovery capability; if so, perform redundant coding according to the current coded data to generate a corresponding redundant packet, and then transmit the current coded data and the redundant packet to the receiving end; if not, directly transmit the current coded data to the receiving end.
- the voice transmission device 900 further includes a voice coding module for obtaining the original voice signal; dividing the original voice signal to obtain the original voice sequence; and sequentially performing voice coding on the voice segments in the original voice sequence to obtain the voice coding code stream.
- the voice transmission device 900 further includes a voice coding module and a buffer module.
- the voice coding module is used to obtain the voice coding feature parameters corresponding to the voice segments in the original voice sequence, and to perform voice coding on the corresponding voice segments according to the voice coding feature parameters to generate the corresponding coded data and obtain the voice coding code stream; the buffer module is used to buffer the voice coding feature parameters used by each coded data in the voice coding process.
- the prediction module 904 is further configured to input the first speech coding feature parameter corresponding to the current coded data and the second speech coding feature parameter corresponding to the previous coded data of the current coded data into the packet loss recovery capability prediction model; output, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between the first voice quality score determined by directly decoding the current coded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current coded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference, wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
- the redundant encoding decision module 906 is further configured to, when the packet loss recovery capability is less than a preset threshold, obtain the packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate redundant packets based on the current encoded data and then transmit the current encoded data and the redundant packets to the receiving end.
- the voice transmission device 900 further includes a model training module, which is used to obtain the sample voice sequence in the training set; perform voice coding on the sample voice sequence to obtain the sample voice coding bitstream; extract the current sample voice coding bitstream.
- before transmitting the current encoded data to the receiving end, the voice transmission device 900 uses a machine-learning-based packet loss recovery capability prediction model, according to the first voice encoding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data, to predict the receiving end's packet loss recovery capability for the current encoded data, and then determines whether to perform redundant encoding on the current encoded data according to that capability. If so, the current encoded data is redundantly encoded to generate redundant packets, and the necessary network bandwidth resources are consumed to transmit the redundant packets to the receiving end.
- otherwise, the current encoded data does not need to be redundantly encoded and is transmitted directly to the receiving end, which avoids excessive consumption of network bandwidth resources and thereby effectively improves the overall utilization of network bandwidth while still ensuring the anti-packet-loss capability of the transmission network.
- Fig. 10 shows an internal structure diagram of a computer device in an embodiment.
- the computer device may specifically be the sending end 110 in FIG. 1.
- the computer device includes a processor, a memory, and a network interface connected through a system bus.
- the memory includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium of the computer device stores an operating system, and may also store computer-readable instructions.
- when the computer-readable instructions are executed by the processor, the processor can implement the voice transmission method.
- the internal memory may also store computer-readable instructions, and when the computer-readable instructions are executed by the processor, the processor can execute the voice transmission method.
- FIG. 10 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
- the specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
- the voice transmission device 900 provided in the present application may be implemented in a form of computer-readable instructions, and the computer-readable instructions may run on the computer device as shown in FIG. 10.
- the memory of the computer device can store various program modules that make up the voice transmission device 900, for example, the acquisition module 902, the prediction module 904, and the redundant coding decision module 906 shown in FIG. 9.
- the computer-readable instructions formed by each module cause the processor to execute the steps in the voice transmission method of each embodiment of the present application described in this specification.
- the computer device shown in FIG. 10 may execute step S302 through the acquiring module 902 in the voice transmission apparatus 900 shown in FIG. 9.
- the computer device may execute step S304 through the prediction module 904.
- the computer device can execute steps S306, S308, and S310 through the redundant coding decision module 906.
- a computer device is provided, including a memory and a processor; the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the above-mentioned voice transmission method.
- the steps of the voice transmission method here may be the steps in the voice transmission method of each of the foregoing embodiments.
- a computer-readable storage medium is provided, which stores computer-readable instructions that, when executed by a processor, cause the processor to execute the steps of the above-mentioned voice transmission method.
- the steps of the voice transmission method here may be the steps in the voice transmission method of each of the foregoing embodiments.
- a computer program product or computer-readable instruction is provided, which includes a computer-readable instruction stored in a computer-readable storage medium.
- the processor of the computer device reads the computer-readable instruction from the computer-readable storage medium, and the processor executes the computer-readable instruction, so that the computer device executes the steps in the foregoing method embodiments.
- a person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by instructing relevant hardware through computer-readable instructions.
- the computer-readable instructions can be stored in a non-volatile computer-readable storage medium.
- Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical storage.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM may be in various forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
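The sender-side decision described for the prediction module 904 and the redundant coding decision module 906 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `1 / (1 + d)` mapping from score difference to capability, the linear redundancy-rate rule, the threshold value, and the `toy_model` stand-in for the trained prediction model are all hypothetical. The text only requires that the capability be inversely related to the score difference and that the redundancy rate be derived from the packet loss status fed back by the receiving end.

```python
from typing import Callable, Dict, Sequence

# Hypothetical signature: a trained model maps the feature parameters of the
# current and previous encoded data to a predicted quality-score difference.
ScoreDiffModel = Callable[[Sequence[float], Sequence[float]], float]


def recovery_capability(score_diff: float) -> float:
    """Map a score difference to a capability in (0, 1].

    Assumption: any decreasing function would satisfy the inverse relation
    stated in the text; 1 / (1 + d) is chosen only for illustration.
    """
    return 1.0 / (1.0 + max(score_diff, 0.0))


def redundancy_rate(loss_rate: float, factor: float = 2.0,
                    max_rate: float = 1.0) -> float:
    """Illustrative rule: scale the reported loss rate and clamp it."""
    return min(max(loss_rate, 0.0) * factor, max_rate)


def plan_transmission(current_params: Sequence[float],
                      previous_params: Sequence[float],
                      model: ScoreDiffModel,
                      loss_rate: float,
                      threshold: float = 0.5) -> Dict[str, object]:
    """Decide whether the current encoded data needs a redundant packet."""
    capability = recovery_capability(model(current_params, previous_params))
    # Low capability means the receiver would conceal this loss poorly,
    # so bandwidth is spent on redundancy only in that case.
    needs_redundancy = capability < threshold
    rate = redundancy_rate(loss_rate) if needs_redundancy else 0.0
    return {"capability": capability,
            "redundant": needs_redundancy,
            "redundancy_rate": rate}


def toy_model(current: Sequence[float], previous: Sequence[float]) -> float:
    """Stand-in for the trained model: a larger parameter change between
    consecutive frames is treated as harder to recover."""
    return sum(abs(c - p) for c, p in zip(current, previous))
```

Under these assumptions, a frame whose parameters match the previous frame is sent without redundancy, while a frame with a large parameter jump is sent with a redundant packet whose rate follows the reported loss rate.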
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Computer Networks & Wireless Communication (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Detection And Prevention Of Errors In Transmission (AREA)
Claims (20)
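- A voice transmission method, executed by a computer, the method comprising: obtaining current encoded data in a voice coding bitstream; obtaining, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data; determining, according to the packet loss recovery capability, whether redundant encoding processing is required; if so, performing redundant encoding according to the current encoded data to generate a corresponding redundant packet, and then transmitting the current encoded data and the redundant packet to a receiving end; and if not, transmitting the current encoded data directly to the receiving end.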
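- The method according to claim 1, further comprising: obtaining an original voice signal; segmenting the original voice signal to obtain an original voice sequence; and performing voice coding on the voice segments in the original voice sequence in turn to obtain the voice coding bitstream.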
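- The method according to claim 1, further comprising: obtaining the voice coding feature parameters corresponding to each voice segment in an original voice sequence; performing voice coding on the corresponding voice segments according to the voice coding feature parameters, and obtaining the voice coding bitstream after generating the corresponding encoded data; and caching the voice coding feature parameters used for each piece of encoded data during the voice coding process.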
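- The method according to claim 1, wherein obtaining, through the machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data comprises: inputting the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model; outputting, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between a first voice quality score determined by directly decoding the current encoded data and a second voice quality score determined by decoding the current encoded data after packet loss recovery processing; and determining the packet loss recovery capability corresponding to the current encoded data according to the score difference; wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.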
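- The method according to claim 1, wherein performing redundant encoding according to the current encoded data to generate the corresponding redundant packet and then transmitting the current encoded data and the redundant packet to the receiving end comprises: obtaining packet loss status information fed back by the receiving end; determining the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generating the redundant packet from the current encoded data and then transmitting the current encoded data and the redundant packet to the receiving end.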
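- The method according to claim 1, further comprising: when the receiving end receives the current encoded data, directly performing voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data; and when the receiving end does not receive the current encoded data but receives the redundant packet, performing, by the receiving end, redundancy decoding processing based on the redundant packet to obtain the current encoded data, and then performing voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data.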
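- The method according to claim 1, further comprising: when the receiving end receives neither the current encoded data nor the redundant packet, performing, by the receiving end, packet loss recovery processing on the current encoded data to obtain a recovery packet corresponding to the current encoded data, and performing voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.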
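- The method according to any one of claims 1 to 7, wherein the packet loss recovery capability prediction model is determined through the following steps: obtaining a sample voice sequence in a training set; performing voice coding on the sample voice sequence to obtain a sample voice coding bitstream; extracting the first voice coding feature parameter used by the current encoded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous encoded data of the current encoded data; obtaining a first voice quality score determined based on a first voice signal, the first voice signal being obtained by directly decoding the sample voice coding bitstream; obtaining a second voice quality score determined based on a second voice signal, the second voice signal being obtained by performing simulated packet loss recovery processing on the current encoded data to obtain a recovery packet and decoding the recovery packet; determining the real packet loss recovery capability corresponding to the current encoded data according to the score difference between the first voice quality score and the second voice quality score; inputting the first voice coding feature parameter and the second voice coding feature parameter into a machine learning model, and outputting, through the machine learning model, the predicted packet loss recovery capability corresponding to the current encoded data; and adjusting the model parameters of the machine learning model according to the difference between the real packet loss recovery capability and the predicted packet loss recovery capability, and then returning to the step of obtaining a sample voice sequence in the training set to continue training until a training end condition is met.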
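- A voice transmission system, comprising a sending end and a receiving end, wherein: the sending end is configured to obtain current encoded data in a voice coding bitstream and to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data; the sending end is further configured to determine, according to the packet loss recovery capability, whether redundant encoding processing is required; if so, to perform redundant encoding according to the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; and if not, to transmit the current encoded data directly to the receiving end; the receiving end is configured to, when the current encoded data is received, directly perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data; and is further configured to, when the current encoded data is not received but the redundant packet is received, perform redundancy decoding processing based on the redundant packet to obtain the current encoded data, and then perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data; and the receiving end is further configured to, when neither the current encoded data nor the redundant packet is received, perform packet loss recovery processing on the current encoded data to obtain a recovery packet corresponding to the current encoded data, and perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.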
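- The system according to claim 9, wherein the sending end is further configured to obtain the voice coding feature parameters corresponding to each voice segment in an original voice sequence; to perform voice coding on the corresponding voice segments according to the voice coding feature parameters, and to obtain the voice coding bitstream after generating the corresponding encoded data; and to cache the voice coding feature parameters used for each piece of encoded data during the voice coding process.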
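- The system according to claim 9, wherein the sending end is further configured to input the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model; to output, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between a first voice quality score determined by directly decoding the current encoded data and a second voice quality score determined by decoding the current encoded data after packet loss recovery processing; and to determine the packet loss recovery capability corresponding to the current encoded data according to the score difference; wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.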
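- The system according to claim 9, wherein the sending end is further configured to obtain packet loss status information fed back by the receiving end; to determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, to generate the redundant packet from the current encoded data and then transmit the current encoded data and the redundant packet to the receiving end.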
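- A voice transmission apparatus, the apparatus comprising: an acquisition module, configured to obtain current encoded data in a voice coding bitstream; a prediction module, configured to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data; and a redundant coding decision module, configured to determine, according to the packet loss recovery capability, whether redundant encoding processing is required; if so, to perform redundant encoding according to the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to a receiving end; and if not, to transmit the current encoded data directly to the receiving end.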
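- The apparatus according to claim 13, wherein the voice transmission apparatus further comprises a voice coding module, configured to obtain an original voice signal; to segment the original voice signal to obtain an original voice sequence; and to perform voice coding on the voice segments in the original voice sequence in turn to obtain the voice coding bitstream.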
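- The apparatus according to claim 13, wherein the voice transmission apparatus further comprises a voice coding module and a cache module; the voice coding module is configured to obtain the voice coding feature parameters corresponding to each voice segment in an original voice sequence, to perform voice coding on the corresponding voice segments according to the voice coding feature parameters, and to obtain the voice coding bitstream after generating the corresponding encoded data; and the cache module is configured to cache the voice coding feature parameters used for each piece of encoded data during the voice coding process.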
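- The apparatus according to claim 13, wherein the prediction module is further configured to input the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model; to output, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between a first voice quality score determined by directly decoding the current encoded data and a second voice quality score determined by decoding the current encoded data after packet loss recovery processing; and to determine the packet loss recovery capability corresponding to the current encoded data according to the score difference; wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.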
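- The apparatus according to claim 13, wherein the redundant coding decision module is further configured so that, when the receiving end receives the current encoded data, voice decoding is performed directly on the current encoded data to obtain the voice signal corresponding to the current encoded data; and, when the receiving end does not receive the current encoded data but receives the redundant packet, the receiving end performs redundancy decoding processing based on the redundant packet to obtain the current encoded data, and then performs voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data.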
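- The apparatus according to any one of claims 13 to 17, wherein the redundant coding decision module is further configured so that, when the receiving end receives neither the current encoded data nor the redundant packet, the receiving end performs packet loss recovery processing on the current encoded data to obtain a recovery packet corresponding to the current encoded data, and performs voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.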
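- One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 8.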
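- A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 8.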
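The three receiving-end cases named in the claims (direct decoding, redundancy decoding, and packet loss recovery) can be sketched as a single dispatch function. This is only an illustrative sketch: `handle_frame` and its callable parameters `decode`, `redundancy_decode`, and `recover_lost` are hypothetical names standing in for the voice decoder, the redundancy decoder, and the packet loss recovery algorithm, none of which are specified at this level of detail in the text.

```python
from typing import Callable, Optional, Tuple


def handle_frame(encoded: Optional[bytes],
                 redundant_packet: Optional[bytes],
                 decode: Callable[[bytes], str],
                 redundancy_decode: Callable[[bytes], bytes],
                 recover_lost: Callable[[], bytes]) -> Tuple[str, str]:
    """Return (decoded signal, path taken) for one frame at the receiver.

    None models a packet that never arrived.
    """
    if encoded is not None:
        # Case 1: the current encoded data arrived, so decode it directly.
        return decode(encoded), "direct"
    if redundant_packet is not None:
        # Case 2: only the redundant packet arrived, so rebuild the
        # encoded data from it first and then decode.
        return decode(redundancy_decode(redundant_packet)), "redundancy"
    # Case 3: nothing arrived, so synthesize a recovery packet and decode it.
    return decode(recover_lost()), "recovery"
```

Only the third path depends on how well the lost frame can be recovered, which is why the sending end predicts that capability before deciding whether to spend bandwidth on a redundant packet.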
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022522692A JP7383138B2 (ja) | 2020-02-20 | 2020-10-28 | 音声伝送方法及びそのシステム、装置、コンピュータプログラム、並びにコンピュータ機器 |
| EP20920497.3A EP4012705B1 (en) | 2020-02-20 | 2020-10-28 | Speech transmission method, system, and apparatus, computer readable storage medium, and device |
| US17/685,242 US12451145B2 (en) | 2020-02-20 | 2022-03-02 | Speech transmission method, system and apparatus, computer-readable storage medium, and device |
| US19/356,962 US20260038511A1 (en) | 2020-02-20 | 2025-10-13 | Audio transmission based on packet loss recovery capability |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010104793.7 | 2020-02-20 | ||
| CN202010104793.7A CN112820306B (zh) | 2020-02-20 | 2020-02-20 | 语音传输方法、系统、装置、计算机可读存储介质和设备 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/685,242 Continuation US12451145B2 (en) | 2020-02-20 | 2022-03-02 | Speech transmission method, system and apparatus, computer-readable storage medium, and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021164303A1 (zh) | 2021-08-26 |
Family
ID=75852966
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2020/124263 Ceased WO2021164303A1 (zh) | 2020-02-20 | 2020-10-28 | 语音传输方法、系统、装置、计算机可读存储介质和设备 |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US12451145B2 (zh) |
| EP (1) | EP4012705B1 (zh) |
| JP (1) | JP7383138B2 (zh) |
| CN (1) | CN112820306B (zh) |
| WO (1) | WO2021164303A1 (zh) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114513418A (zh) * | 2022-04-21 | 2022-05-17 | 腾讯科技(深圳)有限公司 | 一种数据处理方法及相关设备 |
| CN117498892A (zh) * | 2024-01-02 | 2024-02-02 | 深圳旷世科技有限公司 | 基于uwb的音频传输方法、装置、终端及存储介质 |
| TWI907896B (zh) * | 2022-12-23 | 2025-12-11 | 弗勞恩霍夫爾協會 | 用於音訊編碼/解碼的錯誤恢復工具 |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| AU2021291010A1 (en) * | 2020-06-19 | 2023-01-19 | Rtx A/S | Low latency audio packet loss concealment |
| US20220052783A1 (en) * | 2020-08-12 | 2022-02-17 | Vmware, Inc. | Packet reconstruction and error correction for network endpoints |
| CN113192520B (zh) * | 2021-07-01 | 2021-09-24 | 腾讯科技(深圳)有限公司 | 一种音频信息处理方法、装置、电子设备及存储介质 |
| CN116073946A (zh) * | 2021-11-01 | 2023-05-05 | 中兴通讯股份有限公司 | 抗丢包方法、装置、电子设备及存储介质 |
| CN114978427B (zh) * | 2022-05-19 | 2024-04-19 | 腾讯科技(深圳)有限公司 | 数据处理方法、装置、程序产品、计算机设备和介质 |
| CN116033594A (zh) * | 2022-12-22 | 2023-04-28 | 深圳市潮流网络技术有限公司 | 数据传输方法、系统、终端设备及计算机可读存储介质 |
| US20240339117A1 (en) * | 2023-04-07 | 2024-10-10 | Apple Inc. | Low latency audio for immersive group communication sessions |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106937134A (zh) * | 2015-12-31 | 2017-07-07 | 深圳市潮流网络技术有限公司 | 一种数据传输的编码方法、编码发送装置及系统 |
| US20190051310A1 (en) * | 2017-08-10 | 2019-02-14 | Industry-University Cooperation Foundation Hanyang University | Method and apparatus for packet loss concealment using generative adversarial network |
| US20190080701A1 (en) * | 2014-03-04 | 2019-03-14 | Genesys Telecommunications Laboratories, Inc. | System and Method to Correct for Packet Loss in ASR Systems |
| CN109616129A (zh) * | 2018-11-13 | 2019-04-12 | 南京南大电子智慧型服务机器人研究院有限公司 | 用于提升语音丢帧补偿性能的混合多描述正弦编码器方法 |
| CN110265046A (zh) * | 2019-07-25 | 2019-09-20 | 腾讯科技(深圳)有限公司 | 一种编码参数调控方法、装置、设备及存储介质 |
Family Cites Families (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3006541B2 (ja) * | 1997-05-26 | 2000-02-07 | 日本電気株式会社 | 通信装置及び通信システム |
| KR100462024B1 (ko) | 2002-12-09 | 2004-12-17 | 한국전자통신연구원 | 부가 음성 데이터를 이용한 패킷 손실 복구 방법 및 이를이용한 송수신기 |
| US7668712B2 (en) * | 2004-03-31 | 2010-02-23 | Microsoft Corporation | Audio encoding and decoding with intra frames and adaptive forward error correction |
| JPWO2008007698A1 (ja) | 2006-07-12 | 2009-12-10 | パナソニック株式会社 | 消失フレーム補償方法、音声符号化装置、および音声復号装置 |
| US8010351B2 (en) * | 2006-12-26 | 2011-08-30 | Yang Gao | Speech coding system to improve packet loss concealment |
| EP2381580A1 (en) | 2007-04-13 | 2011-10-26 | Global IP Solutions (GIPS) AB | Adaptive, scalable packet loss recovery |
| US8352252B2 (en) * | 2009-06-04 | 2013-01-08 | Qualcomm Incorporated | Systems and methods for preventing the loss of information within a speech frame |
| CN102036061B (zh) * | 2009-09-30 | 2012-11-21 | 华为技术有限公司 | 视频数据传输处理、发送处理方法、装置和网络系统 |
| CN102143367B (zh) * | 2010-01-30 | 2013-01-30 | 华为技术有限公司 | 一种纠错校验方法、设备和系统 |
| CN102752184A (zh) * | 2011-04-20 | 2012-10-24 | 河海大学 | 用于实时多播业务的数据通信系统及其方法 |
| US9047863B2 (en) * | 2012-01-12 | 2015-06-02 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for criticality threshold control |
| CN103716718B (zh) * | 2013-12-16 | 2017-03-01 | 广州华多网络科技有限公司 | 数据包的传输方法及装置 |
| WO2016179382A1 (en) * | 2015-05-07 | 2016-11-10 | Dolby Laboratories Licensing Corporation | Voice quality monitoring system |
| US20170084280A1 (en) * | 2015-09-22 | 2017-03-23 | Microsoft Technology Licensing, Llc | Speech Encoding |
| EP3228037B1 (en) * | 2015-10-01 | 2018-04-11 | Telefonaktiebolaget LM Ericsson (publ) | Method and apparatus for removing jitter in audio data transmission |
| CN107592540B (zh) * | 2016-07-07 | 2020-02-11 | 腾讯科技(深圳)有限公司 | 一种视频数据处理方法及装置 |
| CN108011686B (zh) * | 2016-10-31 | 2020-07-14 | 腾讯科技(深圳)有限公司 | 信息编码帧丢失恢复方法和装置 |
| US10714098B2 (en) * | 2017-12-21 | 2020-07-14 | Dolby Laboratories Licensing Corporation | Selective forward error correction for spatial audio codecs |
| CN110087140B (zh) * | 2018-01-26 | 2022-07-05 | 腾讯科技(深圳)有限公司 | 一种传输流媒体数据的方法、装置、介质及设备 |
| US10475456B1 (en) * | 2018-06-04 | 2019-11-12 | Qualcomm Incorporated | Smart coding mode switching in audio rate adaptation |
| US10990812B2 (en) | 2018-06-20 | 2021-04-27 | Agora Lab, Inc. | Video tagging for video communications |
| CN109218083B (zh) * | 2018-08-27 | 2021-08-13 | 广州猎游信息科技有限公司 | 一种语音数据传输方法及装置 |
| US10784988B2 (en) * | 2018-12-21 | 2020-09-22 | Microsoft Technology Licensing, Llc | Conditional forward error correction for network data |
| CN109862440A (zh) * | 2019-02-22 | 2019-06-07 | 深圳市凯迪仕智能科技有限公司 | 音视频传输前向纠错方法、装置、计算机设备及存储介质 |
| US11509423B2 (en) * | 2019-09-09 | 2022-11-22 | Apple Inc. | Dynamic redundancy for multimedia content |
| CN112530444B (zh) * | 2019-09-18 | 2023-10-03 | 华为技术有限公司 | 音频编码方法和装置 |
| US11671448B2 (en) * | 2019-12-27 | 2023-06-06 | Paypal, Inc. | Phishing detection using uniform resource locators |
| CN111312264B (zh) * | 2020-02-20 | 2023-04-21 | 腾讯科技(深圳)有限公司 | 语音传输方法、系统、装置、计算机可读存储介质和设备 |
| US11715480B2 (en) * | 2021-03-23 | 2023-08-01 | Qualcomm Incorporated | Context-based speech enhancement |
| US11914599B2 (en) * | 2021-11-19 | 2024-02-27 | Hamilton Sundstrand Corporation | Machine learning intermittent data dropout mitigation |
| US20250182773A1 (en) * | 2023-12-01 | 2025-06-05 | Comcast Cable Communications, Llc | Methods and apparatuses for speech enhancement |
- 2020
  - 2020-02-20 CN CN202010104793.7A patent/CN112820306B/zh active Active
  - 2020-10-28 JP JP2022522692A patent/JP7383138B2/ja active Active
  - 2020-10-28 EP EP20920497.3A patent/EP4012705B1/en active Active
  - 2020-10-28 WO PCT/CN2020/124263 patent/WO2021164303A1/zh not_active Ceased
- 2022
  - 2022-03-02 US US17/685,242 patent/US12451145B2/en active Active
- 2025
  - 2025-10-13 US US19/356,962 patent/US20260038511A1/en active Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190080701A1 (en) * | 2014-03-04 | 2019-03-14 | Genesys Telecommunications Laboratories, Inc. | System and Method to Correct for Packet Loss in ASR Systems |
| CN106937134A (zh) * | 2015-12-31 | 2017-07-07 | 深圳市潮流网络技术有限公司 | 一种数据传输的编码方法、编码发送装置及系统 |
| US20190051310A1 (en) * | 2017-08-10 | 2019-02-14 | Industry-University Cooperation Foundation Hanyang University | Method and apparatus for packet loss concealment using generative adversarial network |
| CN109616129A (zh) * | 2018-11-13 | 2019-04-12 | 南京南大电子智慧型服务机器人研究院有限公司 | 用于提升语音丢帧补偿性能的混合多描述正弦编码器方法 |
| CN110265046A (zh) * | 2019-07-25 | 2019-09-20 | 腾讯科技(深圳)有限公司 | 一种编码参数调控方法、装置、设备及存储介质 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4012705A4 |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114513418A (zh) * | 2022-04-21 | 2022-05-17 | 腾讯科技(深圳)有限公司 | 一种数据处理方法及相关设备 |
| CN114513418B (zh) * | 2022-04-21 | 2022-06-24 | 腾讯科技(深圳)有限公司 | 一种数据处理方法及相关设备 |
| TWI907896B (zh) * | 2022-12-23 | 2025-12-11 | 弗勞恩霍夫爾協會 | 用於音訊編碼/解碼的錯誤恢復工具 |
| CN117498892A (zh) * | 2024-01-02 | 2024-02-02 | 深圳旷世科技有限公司 | 基于uwb的音频传输方法、装置、终端及存储介质 |
| CN117498892B (zh) * | 2024-01-02 | 2024-05-03 | 深圳旷世科技有限公司 | 基于uwb的音频传输方法、装置、终端及存储介质 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112820306B (zh) | 2023-08-15 |
| JP2022552382A (ja) | 2022-12-15 |
| EP4012705A1 (en) | 2022-06-15 |
| JP7383138B2 (ja) | 2023-11-17 |
| US20260038511A1 (en) | 2026-02-05 |
| EP4012705A4 (en) | 2022-12-28 |
| CN112820306A (zh) | 2021-05-18 |
| EP4012705B1 (en) | 2025-07-30 |
| US12451145B2 (en) | 2025-10-21 |
| US20220189491A1 (en) | 2022-06-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112820306B (zh) | 语音传输方法、系统、装置、计算机可读存储介质和设备 | |
| CN111312264B (zh) | 语音传输方法、系统、装置、计算机可读存储介质和设备 | |
| CN114333862B (zh) | 音频编码方法、解码方法、装置、设备、存储介质及产品 | |
| US20220180881A1 (en) | Speech signal encoding and decoding methods and apparatuses, electronic device, and storage medium | |
| US9275644B2 (en) | Devices for redundant frame coding and decoding | |
| JP5405659B2 (ja) | 消去されたスピーチフレームを再構成するためのシステムおよび方法 | |
| US20190198027A1 (en) | Audio frame loss recovery method and apparatus | |
| RU2432694C2 (ru) | Способ передачи данных в системе связи | |
| US20150207710A1 (en) | Call Quality Estimation by Lost Packet Classification | |
| US12501085B2 (en) | Semantic compression for compute offloading | |
| CN116580716B (zh) | 音频编码方法、装置、存储介质及计算机设备 | |
| CN113763973B (zh) | 音频信号增强方法、装置、计算机设备和存储介质 | |
| CN111371534B (zh) | 一种数据重传方法、装置、电子设备和存储介质 | |
| CN114842857B (zh) | 语音处理方法、装置、系统、设备及存储介质 | |
| HK40044533B (zh) | 语音传输方法、系统、装置、计算机可读存储介质和设备 | |
| HK40044533A (zh) | 语音传输方法、系统、装置、计算机可读存储介质和设备 | |
| HK40024144B (zh) | 语音传输方法、系统、装置、计算机可读存储介质和设备 | |
| HK40024144A (zh) | 语音传输方法、系统、装置、计算机可读存储介质和设备 | |
| HK40071960B (zh) | 音频编码方法、解码方法、装置、设备、存储介质及产品 | |
| HK40071960A (zh) | 音频编码方法、解码方法、装置、设备、存储介质及产品 | |
| CN121811890A (zh) | 基于三组分解的语音通信方法及装置、设备 | |
| CN121121429A (zh) | 模型传输方法、装置、设备、介质及程序产品 | |
| Han et al. | Error-Resilient Semantic Communication for Speech Transmission over Packet-Loss Networks | |
| Benamirouche et al. | Low complexity forward error correction for CELP-type speech coding over erasure channel transmission | |
| Shukla et al. | Enhanced Speech Compression in G. 723 Audio Codec Through Mahalanobis Distance-Based Error Concealment Technique |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20920497; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2020920497; Country of ref document: EP; Effective date: 20220307 |
| | ENP | Entry into the national phase | Ref document number: 2022522692; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | Wipo information: grant in national office | Ref document number: 2020920497; Country of ref document: EP |
| | WWG | Wipo information: grant in national office | Ref document number: 202237020222; Country of ref document: IN |