WO2021164303A1 - Voice transmission method, system, apparatus, computer-readable storage medium, and device - Google Patents

Voice transmission method, system, apparatus, computer-readable storage medium, and device

Info

Publication number
WO2021164303A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
encoded data
packet loss
current
redundant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/124263
Other languages
English (en)
French (fr)
Inventor
梁俊斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to JP2022522692A (patent JP7383138B2, ja)
Priority to EP20920497.3A (patent EP4012705B1, en)
Publication of WO2021164303A1 (patent WO2021164303A1, zh)
Priority to US17/685,242 (patent US12451145B2, en)
Anticipated expiration
Priority to US19/356,962 (patent US20260038511A1, en)
Ceased (current legal status)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005Correction of errors induced by the transmission channel, if related to the coding algorithm
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/167Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0002Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission rate
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0009Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the channel coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0015Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the adaptation strategy
    • H04L1/0019Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the adaptation strategy in which mode-switching is based on a statistical approach
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0023Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
    • H04L1/0026Transmission of channel quality indication
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0041Arrangements at the transmitter end

Definitions

  • This application relates to the field of computer technology, in particular to a voice transmission method, system, device, computer-readable storage medium, and computer equipment.
  • The Internet is an unreliable transmission network.
  • The main problem faced by Internet-based voice transmission is resistance to packet loss: because the transmission network is unstable, packets are lost during transmission.
  • To resist packet loss, an FEC (Forward Error Correction) redundant coding algorithm, which is a channel coding algorithm, is usually used to generate redundant packets. The redundant packets are sent to the receiving end together with the data packets, and the receiving end recovers any lost data packets from the redundant packets and the original packets it did receive.
  • FEC redundant coding resists packet loss in the transmission network by generating redundant packets, which inevitably multiplies the bandwidth used and consumes excessive network bandwidth. The stronger the packet loss resistance, the more bandwidth is consumed. In bandwidth-constrained scenarios in particular, this tends to cause network congestion and similar problems that lead to even more packet loss.
  • a voice transmission method including:
  • obtaining the packet loss recovery capability corresponding to the current coded data according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data;
  • A voice transmission system, including a sending end and a receiving end, in which:
  • the sending end is used to obtain the current coded data in the speech coding bitstream and, through a packet loss recovery capability prediction model based on machine learning, obtain the packet loss recovery capability corresponding to the current coded data according to the first speech coding feature parameter corresponding to the current coded data and the second speech coding feature parameter corresponding to the previous coded data of the current coded data;
  • the sending end is also used to determine whether redundant coding processing is required according to the packet loss recovery capability; if so, to perform redundant coding on the current coded data to generate a corresponding redundant packet and then transmit the current coded data and the redundant packet to the receiving end; if not, to transmit the current coded data directly to the receiving end;
  • the receiving end is used to, when it receives the current coded data, directly perform voice decoding on the current coded data to obtain the corresponding voice signal; and, when it does not receive the current coded data but does receive the redundant packet, to perform redundant decoding based on the redundant packet to obtain the current coded data and then perform voice decoding on it to obtain the corresponding voice signal; and
  • the receiving end is also used to, when it receives neither the current coded data nor the redundant packet, perform packet loss recovery processing on the current coded data to obtain a corresponding recovery packet, and perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current coded data.
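  • The three receiving-end cases described above can be sketched as a single dispatch function. This is a minimal illustration, not the patent's implementation: `receive_step`, `fec_decode`, `conceal`, and `voice_decode` are hypothetical names, and the redundant decoder is simplified to take only the redundant packet.

```python
from typing import Callable, Optional


def receive_step(
    encoded: Optional[bytes],
    redundant: Optional[bytes],
    fec_decode: Callable[[bytes], bytes],
    conceal: Callable[[], bytes],
    voice_decode: Callable[[bytes], str],
) -> str:
    """Return the voice signal for one encoded data unit (None means 'not received')."""
    if encoded is not None:
        # Case 1: the encoded data arrived, so decode it directly.
        return voice_decode(encoded)
    if redundant is not None:
        # Case 2: the encoded data was lost but the redundant packet arrived,
        # so rebuild the encoded data first and then decode it.
        return voice_decode(fec_decode(redundant))
    # Case 3: both were lost, so fall back to packet loss recovery (PLC)
    # and decode the resulting recovery packet.
    return voice_decode(conceal())


# Toy stand-ins (assumptions): reversing bytes plays the role of redundant decoding.
toy_decode = lambda data: "pcm:" + data.decode()
print(receive_step(None, b"1F", lambda r: r[::-1], lambda: b"plc", toy_decode))
```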
  • a voice transmission device includes:
  • the acquiring module is used to acquire the current encoded data in the speech encoding code stream
  • the prediction module is used to obtain, through the packet loss recovery capability prediction model based on machine learning, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
  • the redundant coding decision module is used to decide whether redundant coding processing is required according to the packet loss recovery capability; if so, to perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; if not, to transmit the current encoded data directly to the receiving end.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above-mentioned voice transmission method.
  • A computer device includes a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the above-mentioned voice transmission method.
  • FIG. 1 is an application environment diagram of a voice transmission method in an embodiment;
  • FIG. 2 is an application environment diagram of a voice transmission method in another embodiment;
  • FIG. 3 is a schematic flowchart of a voice transmission method in an embodiment;
  • FIG. 4 is a schematic block diagram of voice transmission using an FEC redundant coding mechanism in an embodiment;
  • FIG. 5 is a schematic flowchart of the training steps of a packet loss recovery capability prediction model in an embodiment;
  • FIG. 6 is a training block diagram of a packet loss recovery capability prediction model in an embodiment;
  • FIG. 7 is a flowchart of a voice transmission method in an embodiment;
  • FIG. 8 is a schematic flowchart of a voice transmission method in a specific embodiment;
  • FIG. 9 is a structural block diagram of a voice transmission device in an embodiment;
  • FIG. 10 is a structural block diagram of a computer device in an embodiment.
  • Fig. 1 is an application environment diagram of a voice transmission method in an embodiment.
  • the voice transmission method is applied to a voice transmission system.
  • the voice transmission system includes a sending end 110 and a receiving end 120.
  • the sending end 110 and the receiving end 120 are connected through a network.
  • Both the sending end 110 and the receiving end 120 may be terminals.
  • the terminal may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, and a notebook computer.
  • the sending end 110 and the receiving end 120 may also be servers or server clusters.
  • both the sending end 110 and the receiving end 120 run applications that support voice transmission.
  • The server 130 can provide computing and storage capabilities for the application, and the sending end 110 and the receiving end 120 can each connect to the server 130 through a network, so that voice transmission between the two ends is realized through the server 130.
  • the server 130 may be implemented as an independent server or a server cluster composed of multiple servers.
  • The sending end 110 can obtain the current encoded data in the voice coding bitstream; obtain, through the packet loss recovery capability prediction model based on machine learning, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data; and determine whether redundant coding processing is required according to the packet loss recovery capability. If so, it performs redundant coding on the current encoded data to generate a corresponding redundant packet and transmits the current encoded data and the redundant packet to the receiving end 120; if not, it transmits the current encoded data directly to the receiving end 120. This improves overall network bandwidth utilization while still preserving the transmission's resistance to packet loss.
  • a voice transmission method is provided.
  • This embodiment is described mainly using the example of the method being applied to the sending end 110 in FIG. 1 or FIG. 2.
  • the voice transmission method specifically includes the following steps S302 to S308:
  • the speech coding code stream is the original code stream obtained after speech coding is performed on the speech signal.
  • the speech coding code stream includes a set of coded data to be transmitted.
  • the encoded data may be an encoded data frame obtained by encoding a voice signal by a voice encoder at the transmitting end according to a specific frame length, and the transmitting end may transmit the encoded data frame in the voice encoding code stream to the receiving end through the network.
  • the encoded data may also be an encoded data packet synthesized from multiple encoded data frames, and the transmitting end may transmit the encoded data packet in the voice encoding code stream to the receiving end through the network.
  • For example, the encoder at the transmitting end obtains a 60 ms voice signal, divides it into 4 frames with a frame length of 15 ms, and encodes them in order to obtain 4 encoded data frames. The transmitting end can transmit the encoded data frames to the receiving end in turn, or it can combine the 4 encoded data frames into one encoded data packet and then transmit that packet to the receiving end through the network.
  • With the FEC redundant coding mechanism, the sender applies FEC redundant coding directly to each coded data in the speech coded stream before transmitting the stream to the receiver.
  • The receiving end receives the coded data and the corresponding redundant packets through the network, performs redundant decoding according to the redundant packets to obtain any lost coded data, and then decodes to obtain the voice signal.
  • For example, the voice code stream to be transmitted includes five coded data P1, P2, P3, P4, and P5. The sending end can perform redundant coding based on these five coded data to generate one or more redundant packets. Assuming that two redundant packets R1 and R2 are generated, P1, P2, P3, P4, P5 and R1, R2 are then packaged and sent to the receiving end.
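  • The P1 to P5 example can be sketched with the simplest possible redundancy scheme. The source does not say which FEC code produces R1 and R2, so this sketch swaps in a single XOR parity packet, which can rebuild at most one lost packet; a real FEC code that tolerates more losses would work differently. The function names and the equal-length-packet requirement are assumptions made for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one redundant packet as the XOR of all (equal-length) packets."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild at most one missing packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one lost packet")
    out = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR-ing the parity with every surviving packet leaves the lost one.
        out[missing[0]] = reduce(xor_bytes, present, parity)
    return out


packets = [b"P1", b"P2", b"P3", b"P4", b"P5"]
r1 = make_parity(packets)
restored = recover([b"P1", None, b"P3", b"P4", b"P5"], r1)
```

  • Here `restored` equals the original five packets even though P2 never arrived, at the cost of sending one extra packet for every five.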
  • Before sending each encoded data in the speech coded stream to the receiving end, the sending end can predict, in turn, the receiving end's packet loss recovery capability for that encoded data.
  • the sending end can obtain the coded data in the speech code stream in turn, and the current coded data is the coded data currently to be transmitted to the receiving end.
  • In this application, "current coded data" describes the coded data currently being processed by the sending end, and "previous coded data" describes coded data that precedes the current coded data in the speech code stream. The previous coded data may be the single coded data immediately before the current coded data, or multiple coded data before it, for example the two coded data immediately before it.
  • The current coded data is a relative, changing object. For example, after the sending end processes the current coded data F(i), the next coded data F(i+1) in the speech code stream becomes the new current coded data, and F(i) becomes the previous coded data of F(i+1).
  • the above-mentioned voice transmission method further includes: obtaining the original voice signal; dividing the original voice signal to obtain the original voice sequence; and sequentially performing voice encoding on the voice segments in the original voice sequence to obtain the voice code stream.
  • For example, the original voice signal acquired by the sender is a 2-second voice signal. It is divided in units of 20 milliseconds to obtain an original voice sequence of 100 voice segments, and each voice segment in the sequence is then speech-encoded in turn to obtain the corresponding encoded data, thereby generating the speech coding code stream corresponding to the original voice signal.
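  • The segmentation arithmetic in this example can be checked with a short sketch. The 16 kHz sample rate and the function name are assumptions; the source gives only the durations.

```python
from typing import List, Sequence


def split_into_segments(
    samples: Sequence[int], sample_rate: int, segment_ms: int
) -> List[Sequence[int]]:
    """Cut a sampled signal into consecutive segments of segment_ms milliseconds."""
    seg_len = sample_rate * segment_ms // 1000  # samples per segment
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


SAMPLE_RATE = 16000                    # assumed sample rate in Hz
signal = [0] * (2 * SAMPLE_RATE)       # 2 seconds of (silent) audio
segments = split_into_segments(signal, SAMPLE_RATE, 20)
print(len(segments))                   # number of 20 ms voice segments
```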
  • The above-mentioned voice transmission method further includes: obtaining the voice coding feature parameters corresponding to the voice segments in the original voice sequence; performing voice coding on the corresponding voice segments according to the voice coding feature parameters to generate the corresponding coded data and obtain the voice coding code stream; and buffering the voice coding feature parameters used by each coded data during voice coding.
  • The sending end extracts the speech coding feature parameters of each speech segment in the original speech sequence and encodes them to generate the encoded data for that segment. For example, the encoder at the sender extracts the speech coding feature parameters of a speech segment through speech signal processing models (such as filters and feature extractors), encodes these parameters (for example with entropy coding), and packs them in a given data format to obtain the corresponding encoded data.
  • The sender can generate the current encoded data for the current speech segment jointly from the speech coding feature parameters of the current speech segment and those of the previous speech segment, or jointly from the speech coding feature parameters of the current speech segment and those of the subsequent speech segment.
  • The speech coding feature parameters may be parameters obtained by signal processing of the speech segment, such as line spectral frequency (LSF), pitch, adaptive codebook gain, and fixed codebook gain.
  • When the sending end generates the encoded data corresponding to each speech segment, it also buffers the speech coding feature parameters of that segment, that is, the parameters used to generate the encoded data, so that the packet loss recovery capability corresponding to each encoded data can later be predicted from the buffered parameters.
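  • The buffering step can be pictured as an encoder wrapper that remembers the parameters it used for each segment. This is only a sketch: `CachingEncoder`, `extract`, and `pack` are hypothetical, and the toy "features" (mean and peak) stand in for real parameters such as LSF or codebook gains.

```python
from typing import Callable, Dict, List, Sequence

Features = List[float]


class CachingEncoder:
    """Encode segments one by one and remember the parameters used for each."""

    def __init__(
        self,
        extract: Callable[[Sequence[float]], Features],
        pack: Callable[[Features], bytes],
    ) -> None:
        self.extract = extract  # hypothetical feature extractor
        self.pack = pack        # hypothetical entropy-coding and packing step
        self.feature_cache: Dict[int, Features] = {}

    def encode(self, index: int, segment: Sequence[float]) -> bytes:
        params = self.extract(segment)
        self.feature_cache[index] = params  # buffered for later prediction
        return self.pack(params)


# Toy stand-ins: mean and peak as "features", repr() as "packing".
encoder = CachingEncoder(
    extract=lambda seg: [sum(seg) / len(seg), max(seg)],
    pack=lambda params: repr(params).encode(),
)
stream = [encoder.encode(i, seg) for i, seg in enumerate([[1.0, 3.0], [2.0, 6.0]])]
```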
  • The packet loss recovery capability is a prediction result that reflects the voice quality of the recovery packet the receiving end would obtain by performing packet loss recovery processing if the current encoded data were lost.
  • The prediction result indicates either that the receiving end can recover the lost current coded data well or that it cannot.
  • The packet loss recovery processing is PLC (Packet Loss Concealment), and the packet loss recovery capability is the PLC's packet loss recovery capability.
  • The receiving end's packet loss recovery capability is limited. For example, when adjacent or nearby coded data exhibit a pitch (fundamental frequency) jump, an abrupt LSF change, or a similar discontinuity, the receiving end's ability to recover lost packets is limited. In this case, enabling FEC redundant coding at the sending end effectively improves resistance to packet loss and protects voice quality at the receiving end. When the values of the voice coding feature parameters of adjacent coded data fluctuate smoothly, the receiving end usually recovers lost packets well, and the sender does not need to enable FEC redundant coding.
  • The packet loss recovery capability corresponding to the current encoded data is related to its speech encoding feature parameters. A machine learning model can be trained on a large number of training samples to learn how to predict, from the speech encoding feature parameters, the packet loss recovery capability corresponding to the encoded data.
  • The sending end may obtain from the buffer the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data, and use the pre-trained packet loss recovery capability prediction model to predict the packet loss recovery capability corresponding to the current coded data according to the first voice coding feature parameter and the second voice coding feature parameter.
  • The sender may also use the packet loss recovery capability prediction model to obtain the packet loss recovery capability corresponding to the current encoded data according to the first voice encoding feature parameter corresponding to the current encoded data and the third voice encoding feature parameter corresponding to the subsequent encoded data of the current encoded data.
  • The subsequent coded data describes coded data that follows the current coded data in the speech code stream. It may be the single coded data immediately after the current coded data, or multiple coded data after it, for example the two coded data immediately after it.
  • Which coded data's speech coding feature parameters the sender uses as input to the packet loss recovery capability prediction model depends on the algorithm rules the sender uses for speech encoding or the receiver uses for speech decoding; the encoding and decoding rules correspond to each other. For example, if the sending end generates the current coded data using the voice coding feature parameters of the previous coded data, then the parameters used by the previous coded data must be supplied as model input when predicting the packet loss recovery capability of the current coded data. If it generates the current coded data using the parameters of the subsequent coded data, then the parameters used by the subsequent coded data must be supplied as model input.
  • The packet loss recovery capability prediction model is a computer model based on machine learning, which can be implemented as a neural network model. A machine learning model acquires a specific capability by learning from samples.
  • The packet loss recovery capability prediction model is a model that is trained in advance and has the ability to predict packet loss recovery capability.
  • The sender can set the model structure of the machine learning model in advance to obtain the initial machine learning model, and then train the initial machine learning model through a large number of sample voices and packet loss simulation tests to obtain the model parameters of the machine learning model.
  • The sender can obtain the model parameters trained in advance, import them into the initial machine learning model to obtain the packet loss recovery capability prediction model, and use that model to predict the packet loss recovery capability corresponding to each encoded data in the speech coding bitstream, so as to decide, according to the predicted capability, whether to enable FEC redundant coding for the current encoded data.
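  • The sender-side decision described above can be sketched as a loop over the coded stream. Everything named here is hypothetical: `can_recover` stands in for the trained prediction model and is assumed to return a yes/no result, `fec_encode` stands in for the redundant coder, and always protecting the first coded data (which has no predecessor) is an assumption of this sketch, not something the source specifies.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]


def plan_transmission(
    stream: List[bytes],
    feature_cache: Dict[int, Features],
    can_recover: Callable[[Features, Features], bool],
    fec_encode: Callable[[bytes], bytes],
) -> List[Tuple[bytes, ...]]:
    """For each coded data F(i), send it alone or together with a redundant packet."""
    outgoing: List[Tuple[bytes, ...]] = []
    for i, current in enumerate(stream):
        if i == 0:
            # Assumption: no previous features exist yet, so protect F(0).
            needs_fec = True
        else:
            # Predict from the buffered features of F(i) and F(i-1).
            needs_fec = not can_recover(feature_cache[i], feature_cache[i - 1])
        if needs_fec:
            outgoing.append((current, fec_encode(current)))
        else:
            outgoing.append((current,))
    return outgoing


# Toy stand-in for the model: "recoverable" if the first feature barely changes.
toy_model = lambda cur, prev: abs(cur[0] - prev[0]) < 10.0
plan = plan_transmission(
    [b"F0", b"F1", b"F2"],
    {0: [100.0], 1: [102.0], 2: [180.0]},
    toy_model,
    lambda data: b"R" + data,
)
```

  • In this toy run only F(0) and F(2) carry a redundant packet; F(1) follows a stable predecessor and is sent alone, which is where the bandwidth saving comes from.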
  • The training step can be executed by any computer device to obtain a trained packet loss recovery capability prediction model, which is then imported to the sending end that needs to perform voice transmission. The computer device may also be the sending end in FIG. 1 or FIG. 2; that is, the training step may be executed directly by the sending end to obtain the trained model.
  • The following describes the training steps of the packet loss recovery capability prediction model with a computer device as the executing entity. The steps specifically include:
  • S502 Obtain a sample voice sequence in the training set.
  • the computer device can obtain a large number of voice signals, and divide the voice signals to obtain a large number of voice signal sequences composed of voice segments, which are used as sample voice sequences for training the machine learning model.
  • S504 Perform voice coding on the sample voice sequence to obtain a sample voice coding bitstream.
  • The computer device extracts the voice coding feature parameters corresponding to each voice segment, generates the encoded data for each voice segment from the extracted parameters, and thereby obtains the sample voice coding bitstream corresponding to each sample voice sequence.
  • the computer equipment can buffer the speech coding characteristic parameters used by each coded data in the coding process.
  • S506 Extract the first voice coding feature parameter used by the current coded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous coded data of the current coded data.
  • The packet loss recovery capability corresponding to the encoded data is related to its speech coding feature parameters, and may also be related to the speech coding feature parameters of the previous and/or subsequent encoded data. Therefore, during training, the computer device can use the speech coding feature parameters as the input of the machine learning model.
  • The sending end may extract the first voice coding feature parameter corresponding to the current coded data being processed and the second voice coding feature parameter corresponding to the previous coded data of the current coded data, as the input of the machine learning model.
  • The previous coded data may be the single coded data immediately before the current coded data, or multiple coded data before it.
  • Each training object is one coded data, and each sample voice code stream includes multiple coded data, so each sample voice code stream can be used for training multiple times.
  • For example, during training, the sending end can extract the speech coding feature parameters corresponding to the i-th and (i-1)-th coded data in the sample speech coding code stream S, and it can also extract the speech coding feature parameters corresponding to the (i+1)-th and i-th coded data in S.
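  • How one sample code stream yields several training inputs can be sketched as a sliding window over the buffered parameters. The function name, the `history` parameter, and the choice to concatenate the parameter vectors are assumptions; the source says only that the parameters of the current and previous coded data serve as model input.

```python
from typing import List, Sequence


def build_model_inputs(
    features: Sequence[Sequence[float]], history: int = 1
) -> List[List[float]]:
    """Concatenate the parameters of coded data i with those of its predecessors."""
    inputs: List[List[float]] = []
    for i in range(history, len(features)):
        row = list(features[i])              # parameters of the i-th coded data
        for k in range(1, history + 1):
            row.extend(features[i - k])      # parameters of the (i-k)-th coded data
        inputs.append(row)
    return inputs


# Three coded data with two toy parameters each.
params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairs = build_model_inputs(params)            # (i, i-1) windows
triples = build_model_inputs(params, history=2)
```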
  • S508 Obtain a first voice quality score determined based on the first voice signal after directly decoding the sample voice encoding code stream and obtaining the first voice signal.
  • the computer device may directly decode the encoded sample voice code stream obtained after encoding to obtain the first voice signal, and then use a voice quality testing tool to test the first voice quality score corresponding to the first voice signal. Since the first speech signal is obtained by directly decoding the sample speech coding code stream, there is no loss of coded data. Therefore, the obtained first speech signal is very close to the original sample speech sequence, which can be called a lossless speech signal.
  • the corresponding first voice quality score may be referred to as a lossless voice quality score.
  • The voice quality testing tool may be PESQ (Perceptual Evaluation of Speech Quality). PESQ evaluates the quality of a voice signal objectively according to a set of measurement standards, thereby providing a fully quantifiable voice quality measurement method; these measurement standards agree well with human perception of voice quality.
  • the first voice quality score obtained can be recorded as MOS_UNLOSS.
  • S510 Obtain a recovery packet by performing simulated packet loss recovery processing on the current encoded data, and after decoding the recovery packet and obtaining a second voice signal, a second voice quality score determined based on the second voice signal.
  • The computer device can treat the current encoded data as a lost data packet, simulate the decoder at the receiving end performing packet loss recovery processing on it to obtain the corresponding recovery packet, and decode the recovery packet to obtain the corresponding second voice signal. The second voice signal is then spliced with the other voice segments of the original sample voice sequence, and the result is scored to obtain the second voice quality score. Because the second voice signal is decoded from a recovery packet obtained under simulated packet loss, and the recovery packet differs from the lost current coded data, the second voice signal is also degraded relative to the voice segment corresponding to the current coded data. The second voice signal may therefore be called a lossy voice signal, and the resulting second voice quality score a lossy voice quality score, recorded as MOS_LOSS.
  • S512 Determine the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first voice quality score and the second voice quality score.
  • The true packet loss recovery capability corresponding to the current encoded data can be measured by the score difference between the first voice quality score and the second voice quality score. That is, MOS_UNLOSS - MOS_LOSS can be used as the true packet loss recovery capability corresponding to the current encoded data, which is the target output of the machine learning model.
  • The true packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference. The smaller the difference, the better the voice quality of the recovery packet obtained by packet loss recovery after simulated loss of the current encoded data, and the stronger the corresponding true packet loss recovery capability. Conversely, the larger the difference, the worse the voice quality of that recovery packet, and the weaker the corresponding true packet loss recovery capability.
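The training target described above is a plain difference of two quality scores. The sketch below assumes the two scores have already been produced by a quality tool such as PESQ; the function name `score_gap` and the numeric scores are hypothetical.

```python
from typing import List


def score_gap(mos_unloss: float, mos_loss: float) -> float:
    """Training target: MOS_UNLOSS - MOS_LOSS for one simulated lost frame.

    A small gap means concealment recovers the frame well (strong recovery
    capability); a large gap means it does not (weak recovery capability).
    """
    return mos_unloss - mos_loss


# Hypothetical scores: one lossless score for the whole bitstream, and one
# lossy score per frame, each obtained by simulating the loss of that frame.
MOS_UNLOSS = 4.0
mos_loss_per_frame = [3.5, 2.5, 4.0]
targets: List[float] = [score_gap(MOS_UNLOSS, m) for m in mos_loss_per_frame]
```

Here the second frame has the largest gap, so it is the frame whose loss the receiving end conceals worst.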
  • S514 Input the first voice coding feature parameter and the second voice coding feature parameter to the machine learning model, and output the predicted packet loss recovery capability corresponding to the current coded data through the machine learning model.
  • The computer device can input the obtained first voice coding feature parameter and second voice coding feature parameter into the machine learning model, which outputs the predicted packet loss recovery capability corresponding to the current coded data through its internal network processing.
  • It should be noted that S514 may also be executed before step S508; this embodiment does not limit the execution order of this step.
  • The computer device can construct a loss function from the true packet loss recovery capability and the predicted packet loss recovery capability obtained through the machine learning model. The model parameters that minimize the loss function are taken as the latest model parameters of the machine learning model, and the next round of training continues with the sample voice sequences. Training stops when the machine learning model converges or the number of training rounds reaches a preset number, yielding a trained packet loss recovery capability prediction model.
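The training procedure above can be illustrated with a deliberately small stand-in. The source does not specify the model architecture or the loss function, so the sketch below assumes a linear model trained by gradient descent on mean squared error; `predict`, `train`, the learning rate and the toy samples are all hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input features, true score gap)


def predict(weights: List[float], bias: float, x: Sequence[float]) -> float:
    """Linear stand-in for the machine learning model."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def train(samples: List[Sample], lr: float = 0.1,
          max_epochs: int = 2000, tol: float = 1e-12):
    """Minimise mean squared error until convergence or a preset epoch count."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * n, 0.0, 0.0
        for x, y in samples:
            err = predict(weights, bias, x) - y  # predicted minus true capability
            loss += err * err
            for j, v in enumerate(x):
                grad_w[j] += 2.0 * err * v
            grad_b += 2.0 * err
        m = len(samples)
        loss /= m
        if abs(prev_loss - loss) < tol:  # convergence check
            break
        prev_loss = loss
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias, loss


# Toy data: one feature from the current frame, one from the previous frame.
toy = [([0.0, 0.0], 0.2), ([1.0, 0.0], 0.7), ([0.0, 1.0], 1.2), ([1.0, 1.0], 1.7)]
w, b, final_loss = train(toy)
```

The two stopping conditions in `train` mirror the two mentioned in the text: the loss no longer changing, or the preset number of training rounds being reached.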
  • FIG. 6 is a schematic diagram of the framework for training a machine learning model to obtain a packet loss recovery capability prediction model in an embodiment.
  • Figure 6 shows a schematic flow diagram of a single training process.
  • the computer device obtains the sample voice sequence, and performs voice coding on the sample voice sequence to obtain a sample voice code stream.
  • On the one hand, the sample voice coding bitstream is decoded directly, with no loss of the current encoded data, and PESQ is used to obtain MOS_UNLOSS. On the other hand, the loss of the current encoded data is simulated, packet loss recovery processing is performed, the result is decoded, and PESQ is used to obtain MOS_LOSS.
  • In an embodiment, step S304, that is, obtaining the packet loss recovery capability corresponding to the current encoded data through the machine-learning-based packet loss recovery capability prediction model according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data, includes: inputting the first voice coding feature parameter and the second voice coding feature parameter into the packet loss recovery capability prediction model; outputting, through the packet loss recovery capability prediction model and according to the two feature parameters, the score difference between the first voice quality score determined by directly decoding the current coded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current coded data; and determining the packet loss recovery capability corresponding to the current encoded data according to the score difference, where the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
  • the packet loss recovery capability corresponding to the current encoded data can be predicted through the pre-trained packet loss recovery capability prediction model.
  • The first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data are used as the input of the packet loss recovery capability prediction model, and the output of the model is the score difference for the current encoded data.
  • The score difference reflects the quality of the packet loss recovery processing performed by the receiving end after the current encoded data is lost, that is, the size of the packet loss recovery capability. The packet loss recovery capability is inversely related to the score difference.
  • When the score difference is large, that is, when the packet loss recovery capability is less than the preset threshold, the voice signal obtained by the receiving end through packet loss recovery processing after the current encoded data is lost is of poor quality.
  • When the score difference is small, that is, when the packet loss recovery capability is greater than the preset threshold, the quality of the voice signal obtained by the receiving end through packet loss recovery processing after the current encoded data is lost is within an acceptable range.
  • S306 Determine whether redundant encoding processing is required according to the packet loss recovery capability. If so, step S308 is executed: redundant encoding is performed according to the current encoded data to generate a corresponding redundant packet, and the current encoded data and the redundant packet are then transmitted to the receiving end. If not, step S310 is executed: the current encoded data is transmitted directly to the receiving end.
  • After the sending end obtains the packet loss recovery capability corresponding to the current encoded data through the packet loss recovery capability prediction model, it decides, according to the predicted capability, whether to apply FEC redundant encoding to the current encoded data.
  • In an embodiment, the packet loss recovery capability output by the packet loss recovery capability prediction model is a value within a numerical range. The sender can compare the packet loss recovery capability with a preset threshold and determine from the comparison result whether the current encoded data needs to undergo redundant encoding processing.
  • When the packet loss recovery capability is less than the preset threshold, redundant encoding is performed on the current encoded data to generate a corresponding redundant packet, and the current encoded data and the redundant packet are then transmitted to the receiving end. A capability below the threshold means that, if the current encoded data were lost, the voice signal obtained by the receiving end through packet loss recovery processing would be of poor quality. FEC redundant encoding is therefore needed to counter packet loss in the transmission network: FEC redundant encoding is applied to the current encoded data, and the generated redundant packets are transmitted to the receiving end.
  • When the packet loss recovery capability is greater than the preset threshold, the current encoded data is transmitted directly to the receiving end. A capability above the threshold means that, if the current encoded data were lost, the voice signal obtained by the receiving end through packet loss recovery processing would be of acceptable quality, so the sender does not need FEC redundant encoding as an anti-packet-loss strategy for this encoded data. The sender can transmit the current encoded data directly to the receiver; if the current encoded data is lost, the packet loss recovery algorithm built into the decoder of the receiving end performs packet loss recovery processing on it.
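The threshold decision can be sketched as below. The source only states that capability is inversely related to the score difference, so the linear mapping in `capability_from_score_gap`, the `max_gap` of 3.5 and the threshold of 0.7 are assumptions chosen for illustration.

```python
def capability_from_score_gap(score_gap: float, max_gap: float = 3.5) -> float:
    """Map a predicted MOS gap to a capability in [0, 1].

    Assumed mapping: a larger gap gives a lower capability (inverse relation).
    """
    clipped = min(max(score_gap, 0.0), max_gap)
    return 1.0 - clipped / max_gap


def needs_redundancy(capability: float, threshold: float = 0.7) -> bool:
    """Apply FEC only when concealment alone is predicted to be insufficient."""
    return capability < threshold


# A frame whose loss would cost 2.0 MOS points is protected; one costing 0.3 is not.
protect_weak = needs_redundancy(capability_from_score_gap(2.0))
protect_strong = needs_redundancy(capability_from_score_gap(0.3))
```

The two-valued variant described in the text corresponds to a model that outputs the result of `needs_redundancy` directly rather than a continuous capability.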
  • In an embodiment, the packet loss recovery capability output by the packet loss recovery capability prediction model takes one of two values.
  • When the packet loss recovery capability is the first value, the voice signal obtained by the receiving end through packet loss recovery processing after the current encoded data is lost would be of poor quality, so the sender needs to apply FEC redundant encoding to the current encoded data before transmitting it to the receiver. When the packet loss recovery capability is the second value, the voice signal obtained by the receiving end through packet loss recovery processing after the current encoded data is lost would be of acceptable quality, so the sender can transmit the current encoded data directly to the receiver; if the current encoded data is lost, the packet loss recovery algorithm built into the decoder of the receiving end performs packet loss recovery processing on it.
  • the first value can be 1, and the second value can be 0.
  • the first value can be 0, and the second value can be 1.
  • For example, the voice coding bitstream to be transmitted includes P1, P2, P3, P4, and so on. Assuming the current coded data is P7 and the sender predicts that the packet loss recovery capability corresponding to P7 is weak, P7 can be added to the buffer queue of coded data that needs redundant encoding (at this time the buffer queue may be empty, or it may already hold earlier coded data, such as P5).
  • The sender can perform redundant encoding on the encoded data in the buffer queue to generate redundant packets, then send the encoded data in the buffer queue and the generated redundant packets to the receiving end, and clear the buffer queue at the same time.
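The buffer queue behaviour can be sketched as below, under two assumptions that the text does not state outright: the queue is flushed when it reaches a configured capacity, and frames predicted to recover well bypass the queue. `RedundancyQueue` and `checksum_stand_in` are hypothetical names, and the checksum is only a placeholder for a real FEC encoder.

```python
from typing import Callable, List


class RedundancyQueue:
    """Collects weakly recoverable frames and flushes them with redundant packets."""

    def __init__(self, capacity: int,
                 make_redundant: Callable[[List[bytes]], List[bytes]]) -> None:
        self.capacity = capacity
        self.make_redundant = make_redundant
        self.pending: List[bytes] = []

    def submit(self, frame: bytes, weak_recovery: bool) -> List[bytes]:
        """Return the packets to send now (possibly none)."""
        if not weak_recovery:
            return [frame]  # sent directly, no redundancy
        self.pending.append(frame)
        if len(self.pending) < self.capacity:
            return []  # keep buffering until the group is complete
        group = self.pending + self.make_redundant(self.pending)
        self.pending = []  # clear the queue once the group is sent
        return group


def checksum_stand_in(frames: List[bytes]) -> List[bytes]:
    """Placeholder for an FEC encoder: one byte summarising the group."""
    return [bytes([sum(b"".join(frames)) % 256])]


queue = RedundancyQueue(capacity=2, make_redundant=checksum_stand_in)
sent_p5 = queue.submit(b"P5", weak_recovery=True)
sent_p6 = queue.submit(b"P6", weak_recovery=False)
sent_p7 = queue.submit(b"P7", weak_recovery=True)
```

Buffering delays the queued frames until the group is complete, so a real sender would need to bound the queue length or flush on a timer.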
  • In an embodiment, performing redundant encoding according to the current encoded data to generate redundant packets and then transmitting the current encoded data and the redundant packets to the receiving end includes: obtaining the packet loss status information fed back by the receiving end; determining the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generating redundant packets from the current encoded data and then transmitting the current encoded data and the redundant packets to the receiving end.
  • the receiving end may determine the packet loss status information according to the received data packet, and feed back the packet loss status information to the sending end.
  • the packet loss status information can be represented by the current packet loss rate.
  • the receiver can encapsulate the packet loss rate into a message and send it to the sender, and the sender parses the received control message to obtain the packet loss rate.
  • The sending end can adjust the redundancy rate to achieve different levels of protection against packet loss. A large redundancy rate can handle longer runs of consecutive packet loss, while a small redundancy rate can handle short runs of consecutive loss or sporadic loss. That is, the value of the redundancy rate r is larger under a high packet loss rate and smaller under a low packet loss rate.
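A possible mapping from the fed-back packet loss rate to the redundancy rate is sketched below. The source only requires that r grow with the loss rate; the breakpoints, the three rate values, and the reading of r as the ratio of redundant packets to source packets are all assumptions.

```python
import math


def redundancy_rate(loss_rate: float) -> float:
    """Pick r from the reported loss rate (step function with assumed breakpoints)."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be within [0, 1]")
    if loss_rate < 0.05:
        return 0.1
    if loss_rate < 0.20:
        return 0.3
    return 0.5


def redundant_packet_count(num_source_packets: int, loss_rate: float) -> int:
    """Number of redundant packets for a group, rounding up so r > 0 adds at least one."""
    return math.ceil(redundancy_rate(loss_rate) * num_source_packets)


low = redundant_packet_count(6, loss_rate=0.01)
medium = redundant_packet_count(6, loss_rate=0.10)
high = redundant_packet_count(6, loss_rate=0.30)
```

With a group of six source packets, the medium setting yields two redundant packets, the same shape as the R1, R2 example in the text.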
  • the voice transmission method further includes: when the receiving end receives the currently encoded data, directly performing voice decoding on the currently encoded data to obtain the voice signal corresponding to the current encoded data; when the receiving end does not receive the currently encoded data And when a redundant packet is received, the receiving end performs redundant decoding processing based on the redundant packet to obtain the current encoded data and then perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data.
  • For example, after predicting the packet loss recovery capability, the sender adds the encoded data P3, P4, P6, P7, P8, and P9 to the buffer queue (the length of the buffer queue can be set as needed, for example, 6), performs redundant encoding to generate redundant packets R1 and R2, and encapsulates the encoded data P3, P4, P6, P7, P8, P9 in the buffer queue together with the generated redundant packets R1 and R2 into a data group, which is sent to the receiving end.
  • The packet sequence numbers of the data packets in the data group can be continuous, for example, 1, 2, 3, 4, 5, and 6 in sequence.
  • When the receiving end receives P3, P4, and P6, it can directly perform voice decoding on them to obtain the corresponding voice signals. At the same time, the receiving end can buffer P3, P4, and P6 for possible later FEC redundant decoding. If no packet in the group is subsequently lost, the buffer is cleared.
  • When the receiving end receives P8 and P9, it can determine from the packet sequence numbers that P7 has been lost. The receiving end then buffers P8 and P9 until R1 is received, and performs redundant decoding processing on the buffered P3, P4, P6, P8, P9 and R1 to obtain the missing P7. If R2 is received afterwards, it can be discarded directly.
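The recovery of a single missing packet from the buffered packets and one redundant packet can be illustrated with XOR parity. The source does not name the FEC code behind R1 and R2, so XOR parity is a swapped-in stand-in that recovers exactly one loss per group; the function names and the equal-length two-byte frames are assumptions.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length frames."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(frames: List[bytes]) -> bytes:
    """Sender side: a single redundant packet covering the whole group."""
    return reduce(xor_bytes, frames)


def recover_missing(received: Dict[int, bytes], parity: bytes,
                    group_size: int) -> Dict[int, bytes]:
    """Receiver side: rebuild the one missing frame from the others plus parity."""
    missing = [i for i in range(group_size) if i not in received]
    if len(missing) != 1:
        raise ValueError("XOR parity recovers exactly one missing frame")
    restored = reduce(xor_bytes, received.values(), parity)
    completed = dict(received)
    completed[missing[0]] = restored
    return completed


group = [b"P3", b"P4", b"P6", b"P7", b"P8", b"P9"]
parity = make_parity(group)
# Simulate the loss of P7 (index 3 within the group).
arrived = {i: f for i, f in enumerate(group) if i != 3}
repaired = recover_missing(arrived, parity, group_size=len(group))
```

A code that tolerates more than one loss per group, such as Reed-Solomon, would be needed for two redundant packets to be useful together.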
  • the voice transmission method further includes:
  • When the receiving end receives neither the current encoded data nor the redundant packets, the receiving end performs packet loss recovery processing on the current encoded data to obtain the recovery packet corresponding to the current encoded data, and performs voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.
  • In the foregoing example, when P7 is lost and the receiving end does not receive R1 or R2 within a certain period of time, the receiving end cannot recover P7 from the buffered P3, P4, P6, P8, and P9. It then needs to perform packet loss recovery processing on the current encoded data through the PLC algorithm built into the decoder. This is usually based on the decoding information of the previous data packet: pitch-synchronous repetition is used to produce an approximate substitute for the current encoded data as the recovery packet, and the recovery packet is then decoded to obtain the voice signal.
  • the receiving end also needs to perform packet loss recovery processing on the current encoded data through the PLC algorithm built into the decoder.
  • In the above voice transmission method, before the current encoded data is transmitted to the receiving end, the machine-learning-based packet loss recovery capability prediction model predicts, according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data, how well the receiving end can recover from the loss of the current encoded data. Whether to perform redundant encoding on the current encoded data is then decided according to this packet loss recovery capability. If so, the current encoded data is redundantly encoded to generate redundant packets, and the necessary network bandwidth is spent to transmit the redundant packets to the receiving end. If not, the current encoded data is not redundantly encoded and is transmitted directly to the receiving end, which avoids consuming too much network bandwidth. This effectively improves overall network bandwidth utilization while preserving the anti-packet-loss capability of the transmission network.
  • FIG. 7 is a flowchart of a voice transmission method in an embodiment.
  • the sending end obtains the original voice signal, and performs voice coding on the original voice signal to obtain a voice coded stream.
  • the transmitting end predicts the packet loss recovery capability of the receiving end for each coded data in the speech encoding bitstream by using a packet loss recovery capability prediction model based on machine learning. Then, it is determined whether to enable FEC redundancy coding for the current coded data according to the predicted packet loss recovery capability.
  • If redundant coding is enabled for the current coded data, the redundancy rate is set according to the packet loss status information fed back by the receiving end, redundant packets are generated from the current encoded data according to the redundancy rate, and the current encoded data and the redundant packets are transmitted to the receiving end. If it is decided not to enable redundant coding for the current coded data, the current coded data is transmitted directly to the receiving end.
  • If the receiving end receives the current encoded data, it reconstructs the voice signal through the normal decoding process. If the receiving end does not receive the current coded data but receives a redundant packet, it can perform FEC redundant decoding to obtain the current coded data, provided the lost packet can be recovered through redundant decoding. If the receiving end receives neither the current encoded data nor the corresponding redundant packet within a certain period of time, the current encoded data is determined to be lost, and the receiving end can perform packet loss recovery processing on it through the PLC algorithm built into the decoder and then decode to obtain the voice signal.
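The three receiver-side cases can be summarised as a small decision function. The enum `DecodePath` and the function `choose_path` are hypothetical names; the timeout handling and the actual decoders are left out of this sketch.

```python
from enum import Enum


class DecodePath(Enum):
    NORMAL = "normal decoding"
    FEC = "FEC redundant decoding, then decoding"
    PLC = "decoder-internal packet loss concealment"


def choose_path(frame_received: bool, fec_recoverable: bool) -> DecodePath:
    """Select how the receiver reconstructs the voice signal for one frame.

    fec_recoverable should be True only when enough redundant and buffered
    packets have arrived (before the wait timeout) to rebuild the frame.
    """
    if frame_received:
        return DecodePath.NORMAL
    if fec_recoverable:
        return DecodePath.FEC
    return DecodePath.PLC


paths = [
    choose_path(frame_received=True, fec_recoverable=False),
    choose_path(frame_received=False, fec_recoverable=True),
    choose_path(frame_received=False, fec_recoverable=False),
]
```

Concealment is the last resort, which is why the sender spends redundancy only on frames that concealment is predicted to handle poorly.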
  • FIG. 8 is a schematic flowchart of a voice transmission method in a specific embodiment. Referring to FIG. 8, the method includes the following steps:
  • S806 Perform voice coding on the voice segments in the original voice sequence in sequence to obtain a voice coding bitstream.
  • S810 Acquire current coded data in the speech coding bitstream.
  • S814 Using the packet loss recovery capability prediction model, according to the first voice coding feature parameter and the second voice coding feature parameter, output the score difference between the first voice quality score determined by directly decoding the current encoded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current encoded data.
  • S816 Determine the packet loss recovery capability corresponding to the current encoded data according to the difference in scores.
  • a voice transmission system may be the voice transmission system shown in FIG. 1 or FIG. 2, and includes a sending end 110 and a receiving end 120:
  • the sending end 110 is used to obtain the current coded data in the voice coding bitstream, and to obtain the packet loss recovery capability corresponding to the current coded data through the machine-learning-based packet loss recovery capability prediction model, according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data;
  • the sending end 110 is also used to determine whether redundant encoding processing is required according to the packet loss recovery capability; if so, perform redundant encoding according to the current encoded data to generate a corresponding redundant packet, and then transmit the current encoded data and the redundant packet to the receiving end; if not, directly transmit the current coded data to the receiving end;
  • the receiving end 120 is used to directly perform voice decoding on the current encoded data when the current encoded data is received, to obtain the voice signal corresponding to the current encoded data; it is also used, when the current encoded data is not received and a redundant packet is received, to perform redundant decoding processing based on the redundant packet to obtain the current coded data and then perform voice decoding on the current coded data to obtain the corresponding voice signal;
  • the receiving end 120 is also used to perform packet loss recovery processing on the current encoded data through the receiving end when the current encoded data and redundant packets are not received, to obtain a recovery packet corresponding to the current encoded data, and to perform voice decoding on the recovery packet to obtain The voice signal corresponding to the current coded data.
  • the sending end 110 is also used to obtain the original voice signal; divide the original voice signal to obtain the original voice sequence; and sequentially perform voice coding on the voice segments in the original voice sequence to obtain a voice code stream.
  • the sending end 110 is also used to obtain the voice coding feature parameters corresponding to the voice segments in the original voice sequence; perform voice coding on the corresponding voice segments according to the voice coding feature parameters to generate the corresponding coded data and obtain the voice coding bitstream; and buffer the voice coding feature parameters used by each coded data in the voice coding process.
  • the sending end 110 is further configured to input the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data into the packet loss recovery capability prediction model; output, through the packet loss recovery capability prediction model and according to the two feature parameters, the score difference between the first voice quality score determined by directly decoding the current coded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current coded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference, where the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
  • the sending end 110 is also used to obtain the packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate redundant packets from the current encoded data and then transmit the current encoded data and the redundant packets to the receiving end.
  • the receiving end 120 is further configured to directly perform voice decoding on the currently encoded data when the receiving end receives the currently encoded data to obtain the voice signal corresponding to the currently encoded data.
  • the receiving end 120 is further configured to, when the receiving end does not receive the current encoded data but receives a redundant packet, perform redundant decoding processing based on the redundant packet to obtain the current encoded data, and then perform voice decoding on the current coded data to obtain the voice signal corresponding to the current coded data.
  • the receiving end 120 is further configured to, when the receiving end receives neither the current encoded data nor the redundant packets, perform packet loss recovery processing on the current encoded data to obtain a recovery packet corresponding to the current encoded data, and perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.
  • the sending end 110 is also used to obtain the sample voice sequence in the training set; perform voice coding on the sample voice sequence to obtain the sample voice coding bitstream; extract the first voice coding feature parameter used by the current coded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous coded data of the current coded data; obtain the first voice quality score, determined from the first voice signal obtained by directly decoding the sample voice coding bitstream; obtain the second voice quality score, determined from the second voice signal obtained by performing simulated packet loss recovery processing on the current encoded data to obtain a recovery packet and decoding the recovery packet; determine the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first voice quality score and the second voice quality score; input the first voice coding feature parameter and the second voice coding feature parameter into the machine learning model, and output the predicted packet loss recovery capability corresponding to the current encoded data through the machine learning model; and adjust the model parameters of the machine learning model according to the true and the predicted packet loss recovery capability.
  • In the above voice transmission system, before transmitting the current coded data to the receiving end, the sender uses the machine-learning-based packet loss recovery capability prediction model to predict, according to the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data, how well the receiving end can recover from the loss of the current encoded data. Whether to perform redundant encoding on the current encoded data is then decided according to this packet loss recovery capability. If so, the current encoded data is redundantly encoded to generate redundant packets, and the necessary network bandwidth is spent to transmit the redundant packets to the receiving end. If not, the current encoded data is not redundantly encoded and is transmitted directly to the receiving end, which avoids consuming too much network bandwidth. This effectively improves overall network bandwidth utilization while preserving the anti-packet-loss capability of the transmission network.
  • a voice transmission device 900 which can be implemented as all or a part of the receiving end through software, hardware or a combination of the two.
  • the device includes an acquisition module 902, a prediction module 904, and a redundant coding decision module 906:
  • the obtaining module 902 is used to obtain the current coded data in the speech coding code stream
  • the prediction module 904 is configured to obtain, through the machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
  • the redundant coding decision module 906 is used to decide whether to perform redundant coding processing according to the packet loss recovery capability; if so, perform redundant coding according to the current coded data to generate a corresponding redundant packet, and then transmit the current coded data and the redundant packet to the receiving end; if not, transmit the current coded data directly to the receiving end.
  • the voice transmission device 900 further includes a voice coding module for obtaining the original voice signal; dividing the original voice signal to obtain the original voice sequence; and sequentially performing voice coding on the voice segments in the original voice sequence to obtain the voice coding bitstream.
  • the voice transmission device 900 further includes a voice coding module and a buffer module.
  • the voice coding module is used to obtain the voice coding feature parameters corresponding to the voice segments in the original voice sequence, and to perform voice coding on the corresponding voice segments according to the voice coding feature parameters to generate the corresponding coded data and obtain the voice coding bitstream; the buffer module is used to buffer the voice coding feature parameters used by each coded data in the voice coding process.
  • the prediction module 904 is further configured to input the first voice coding feature parameter corresponding to the current coded data and the second voice coding feature parameter corresponding to the previous coded data of the current coded data into the packet loss recovery capability prediction model; output, through the packet loss recovery capability prediction model and according to the two feature parameters, the score difference between the first voice quality score determined by directly decoding the current coded data and the second voice quality score determined by decoding after packet loss recovery processing is performed on the current coded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference, where the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
  • the redundant encoding decision module 906 is further configured to, when the packet loss recovery capability is less than a preset threshold, obtain the packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate a redundant packet from the current encoded data and then transmit the current encoded data and the redundant packet to the receiving end.
  • the voice transmission device 900 further includes a model training module, which is used to obtain the sample voice sequence in the training set; perform voice coding on the sample voice sequence to obtain the sample voice coding bitstream; extract the current sample voice coding bitstream.
  • the voice transmission device 900 Before transmitting the current encoded data to the receiving end, the voice transmission device 900 uses a machine learning-based packet loss recovery capability prediction model, according to the first voice encoding feature parameter corresponding to the current encoded data and the second corresponding to the previous encoded data. Speech coding feature parameters are used to predict the receiving end’s ability to recover from packet loss of the current encoded data, and then determine whether to perform redundant encoding on the current encoded data according to the packet loss recovery ability. If so, the current encoded data needs to be redundantly encoded to generate After the redundant packet, the necessary network bandwidth resources are consumed to transmit the redundant packet to the receiving end.
  • the current encoded data does not need to be redundantly encoded, and the current encoded data is directly transmitted to the receiving end to avoid excessive network consumption Bandwidth resources, thereby effectively improving the utilization of network bandwidth as a whole, while also ensuring the anti-packet loss ability of the transmission network.
  • Fig. 10 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may specifically be the sending end 110 in FIG. 1.
  • As shown in FIG. 10, the computer device includes a processor, a memory, and a network interface connected through a system bus.
  • The memory includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium of the computer device stores an operating system and may also store computer-readable instructions; when these instructions are executed by the processor, the processor can implement the voice transmission method.
  • The internal memory may also store computer-readable instructions, and when the computer-readable instructions are executed by the processor, the processor can execute the voice transmission method.
  • Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the voice transmission device 900 provided in the present application may be implemented in a form of computer-readable instructions, and the computer-readable instructions may run on the computer device as shown in FIG. 10.
  • the memory of the computer device can store various program modules that make up the voice transmission device 900, for example, the acquisition module 902, the prediction module 904, and the redundant coding decision module 906 shown in FIG. 9.
  • the computer-readable instructions formed by each module cause the processor to execute the steps in the voice transmission method of each embodiment of the present application described in this specification.
  • the computer device shown in FIG. 10 may execute step S302 through the acquiring module 902 in the voice transmission apparatus 900 shown in FIG. 9.
  • the computer device may execute step S304 through the prediction module 904.
  • the computer device can execute steps S306, S308, and S310 through the redundant coding decision module 906.
  • In one embodiment, a computer device is provided, including a memory and a processor; the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the processor executes the steps of the above-mentioned voice transmission method.
  • The steps of the voice transmission method here may be the steps in the voice transmission method of each of the foregoing embodiments.
  • In one embodiment, a computer-readable storage medium is provided, which stores computer-readable instructions; when the computer-readable instructions are executed by a processor, the processor executes the steps of the above-mentioned voice transmission method.
  • The steps of the voice transmission method here may be the steps in the voice transmission method of each of the foregoing embodiments.
  • In one embodiment, a computer program product or computer-readable instructions are provided; the computer program product or computer-readable instructions include computer-readable instructions that are stored in a computer-readable storage medium.
  • The processor of the computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device executes the steps in the foregoing method embodiments.
  • A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer-readable instructions that instruct the relevant hardware.
  • The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical storage.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM may be in various forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Detection And Prevention Of Errors In Transmission (AREA)

Abstract

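A voice transmission method, including: obtaining current encoded data in a voice coding bitstream; obtaining, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data; deciding, according to the packet loss recovery capability, whether redundant coding is needed; if so, performing redundant coding on the current encoded data to generate a corresponding redundant packet and then transmitting the current encoded data and the redundant packet to the receiving end; if not, transmitting the current encoded data directly to the receiving end.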

Description

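Speech transmission method, system, and apparatus, computer-readable storage medium, and device
This application claims priority to Chinese patent application No. 202010104793.7, filed with the China Patent Office on February 20, 2020 and entitled "语音传输方法、系统、装置、计算机可读存储介质和设备" (speech transmission method, system, and apparatus, computer-readable storage medium, and device), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a voice transmission method, system, and device, a computer-readable storage medium, and a computer device.
Background
The Internet is an unreliable transmission network, and the main problem facing Internet-based voice transmission is resistance to packet loss: because the transmission network is unstable, packets are lost in transit. To resist network packet loss, a channel coding algorithm such as FEC (Forward Error Correction) redundant coding is usually used to generate redundant packets, which are sent to the receiving end together with the data packets. The receiving end then uses the redundant packets and the original packets it received to recover the lost data packets, which provides resistance to packet loss.
However, FEC redundant coding resists packet loss by generating redundant packets, which inevitably multiplies the bandwidth required and consumes excessive network bandwidth. The stronger the packet loss resistance, the more bandwidth is consumed. In bandwidth-limited scenarios in particular, this easily causes problems such as network congestion, which in turn leads to even more packet loss.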
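Summary
A voice transmission method, including:
obtaining current encoded data in a voice coding bitstream;
obtaining, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
deciding, according to the packet loss recovery capability, whether redundant coding is needed;
if so, performing redundant coding on the current encoded data to generate a corresponding redundant packet and then transmitting the current encoded data and the redundant packet to the receiving end; and
if not, transmitting the current encoded data directly to the receiving end.
A voice transmission system, including a sending end and a receiving end, where:
the sending end is used to obtain current encoded data in a voice coding bitstream and to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
the sending end is further used to decide, according to the packet loss recovery capability, whether redundant coding is needed; if so, to perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; if not, to transmit the current encoded data directly to the receiving end;
the receiving end is used to, when the current encoded data is received, directly perform voice decoding on it to obtain the corresponding voice signal; and, when the current encoded data is not received but the redundant packet is received, to perform redundant decoding based on the redundant packet to obtain the current encoded data and then perform voice decoding on it to obtain the corresponding voice signal; and
the receiving end is further used to, when neither the current encoded data nor the redundant packet is received, perform packet loss recovery processing for the current encoded data to obtain a recovery packet corresponding to it, and perform voice decoding on the recovery packet to obtain the corresponding voice signal.
A voice transmission device, including:
an acquisition module, used to obtain current encoded data in a voice coding bitstream;
a prediction module, used to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data; and
a redundant coding decision module, used to decide, according to the packet loss recovery capability, whether redundant coding is needed; if so, to perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; if not, to transmit the current encoded data directly to the receiving end.
One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to execute the steps of the above voice transmission method.
A computer device, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processors, cause the one or more processors to execute the steps of the above voice transmission method.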
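Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of the voice transmission method in one embodiment;
FIG. 2 is a diagram of the application environment of the voice transmission method in another embodiment;
FIG. 3 is a schematic flowchart of the voice transmission method in one embodiment;
FIG. 4 is a schematic block diagram of voice transmission using the FEC redundant coding mechanism in one embodiment;
FIG. 5 is a schematic flowchart of the training steps of the packet loss recovery capability prediction model in one embodiment;
FIG. 6 is a training block diagram of the packet loss recovery capability prediction model in one embodiment;
FIG. 7 is a flow block diagram of the voice transmission method in one embodiment;
FIG. 8 is a schematic flowchart of the voice transmission method in a specific embodiment;
FIG. 9 is a structural block diagram of the voice transmission device in one embodiment;
FIG. 10 is a structural block diagram of the computer device in one embodiment.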
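Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here are only used to explain this application and do not limit it.
FIG. 1 is a diagram of the application environment of the voice transmission method in one embodiment. Referring to FIG. 1, the voice transmission method is applied to a voice transmission system. The system includes a sending end 110 and a receiving end 120, which are connected through a network. Both may be terminals; a terminal may be a desktop terminal or a mobile terminal, and a mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. In other embodiments, the sending end 110 and the receiving end 120 may also be servers or server clusters.
As shown in FIG. 2, in a specific application scenario, both the sending end 110 and the receiving end 120 run an application that supports voice transmission, and a server 130 can provide computing and storage capacity for that application. The sending end 110 and the receiving end 120 can each connect to the server 130 through the network, so that voice transmission between the two ends is carried out through the server 130. The server 130 can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, the sending end 110 can obtain the current encoded data in the voice coding bitstream; obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data; and decide, according to the packet loss recovery capability, whether redundant coding is needed. If so, it performs redundant coding on the current encoded data to generate a corresponding redundant packet and then transmits the current encoded data and the redundant packet to the receiving end 120; if not, it transmits the current encoded data directly to the receiving end 120. This effectively improves overall network bandwidth utilization while still ensuring the packet loss resistance of the transmission network.
As shown in FIG. 3, in one embodiment, a voice transmission method is provided. This embodiment is mainly illustrated by applying the method to the sending end 110 in FIG. 1 or FIG. 2. Referring to FIG. 3, the voice transmission method specifically includes the following steps S302 to S310: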
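S302: obtain the current encoded data in the voice coding bitstream.
The voice coding bitstream is the original bitstream obtained by performing voice coding on a voice signal, and it includes a group of encoded data to be transmitted. The encoded data may be an encoded data frame obtained when the voice encoder at the sending end encodes the voice signal with a specific frame length, and the sending end can transmit the encoded data frames in the voice coding bitstream to the receiving end through the network. The encoded data may also be an encoded data packet synthesized from multiple encoded data frames, and the sending end can transmit the encoded data packets in the voice coding bitstream to the receiving end through the network. For example, the encoder at the sending end obtains a 60 ms voice signal, divides it into 4 frames with a frame length of 15 ms, and encodes them in order to obtain 4 encoded data frames. The sending end can transmit the encoded data frames to the receiving end one by one, or it can combine the 4 encoded data frames into one encoded data packet and then transmit that packet to the receiving end through the network.
Usually, to resist packet loss in the transmission network, as shown in FIG. 4, the sending end directly applies FEC redundant coding to the encoded data in the voice coding bitstream before transmitting it to the receiving end. The receiving end receives the encoded data and the corresponding redundant packets through the network, performs redundant decoding according to the redundant packets to obtain any lost encoded data, and then decodes to obtain the voice signal. For example, suppose the voice coding bitstream to be transmitted includes five pieces of encoded data, P1, P2, P3, P4, and P5. The sending end can perform redundant coding on these 5 pieces of encoded data to generate redundant packets; there may be one or more redundant packets. Assuming that 2 redundant packets R1 and R2 are generated, P1, P2, P3, P4, and P5 are packaged together with R1 and R2 and sent to the receiving end.
In the embodiments provided by this application, after the sending end encodes the original voice information to obtain the voice coding bitstream and before it sends each piece of encoded data in the bitstream to the receiving end, the sending end can predict in turn the receiving end's packet loss recovery capability for each piece of encoded data. The sending end therefore obtains the encoded data in the voice coding bitstream one piece at a time, and the current encoded data is the encoded data that is currently to be transmitted to the receiving end.
It can be understood that "current encoded data" in this application describes the encoded data that the sending end is currently processing, and "previous encoded data" describes encoded data that precedes the current encoded data in the voice coding bitstream. The previous encoded data may be the one piece of encoded data immediately before the current encoded data, or several pieces before it, for example the two pieces before it. In addition, the current encoded data is a relative, changing object. For example, after the sending end finishes processing the current encoded data F(i), the next encoded data F(i+1) in the voice coding bitstream becomes the new current encoded data, and F(i) becomes the previous encoded data of F(i+1).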
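In one embodiment, the above voice transmission method further includes: obtaining an original voice signal; dividing the original voice signal to obtain an original voice sequence; and sequentially performing voice coding on the voice segments in the original voice sequence to obtain the voice coding bitstream.
For example, the original voice signal obtained by the sending end is a 2-second piece of speech. The signal is divided in units of 20 milliseconds to obtain an original voice sequence composed of 100 voice segments. Voice coding is then performed on each voice segment in the sequence in turn to obtain the encoded data corresponding to each segment, which produces the voice coding bitstream corresponding to the original voice signal.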
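The segmentation step in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_segments` and the 16 kHz sampling rate are assumptions, since the source gives only the 2-second duration and the 20 ms segment length.

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[float],
                        sample_rate_hz: int,
                        segment_ms: int) -> List[Sequence[float]]:
    """Split a signal into consecutive fixed-length segments.

    The last segment may be shorter if the signal length is not a
    multiple of the segment length.
    """
    segment_len = sample_rate_hz * segment_ms // 1000
    if segment_len <= 0:
        raise ValueError("segment length must be at least one sample")
    return [samples[i:i + segment_len]
            for i in range(0, len(samples), segment_len)]


# 2 seconds of silence at an assumed 16 kHz sampling rate.
signal = [0.0] * (2 * 16000)
segments = split_into_segments(signal, sample_rate_hz=16000, segment_ms=20)
```

With these inputs the sequence contains 100 segments of 320 samples each, matching the count in the example; each segment would then be passed to the voice encoder in order.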
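In one embodiment, the above voice transmission method further includes: obtaining the voice coding feature parameters corresponding to each voice segment in the original voice sequence; performing voice coding on each voice segment according to its voice coding feature parameters and generating the corresponding encoded data to obtain the voice coding bitstream; and buffering the voice coding feature parameters used by each piece of encoded data in the voice coding process.
Specifically, during voice coding, the sending end extracts the voice coding feature parameters of the voice segments in the original voice sequence and encodes the extracted parameters to generate the encoded data corresponding to each voice segment. For example, the encoder at the sending end extracts the voice coding feature parameters of a voice segment through voice signal processing models (such as filters and feature extractors), encodes these parameters (for example by entropy coding), and then packs them in a certain data format to obtain the corresponding encoded data. Note that the sending end can generate the current encoded data for the current voice segment jointly from the feature parameters of the current voice segment and those of the previous voice segment, or jointly from the feature parameters of the current voice segment and those of the following voice segment. The voice coding feature parameters may be parameters extracted from the voice segment by signal processing, such as the line spectral frequency (LSF), the pitch period, the adaptive codebook gain, and the fixed codebook gain.
Further, when the sending end generates the encoded data corresponding to each voice segment, it also buffers the voice coding feature parameters of each voice segment in the coding process, that is, the feature parameters used when generating each piece of encoded data, so that the packet loss recovery capability corresponding to each piece of encoded data can later be predicted from the buffered parameters.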
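One possible way to buffer the feature parameters is a small bounded cache keyed by the index of the encoded data. The sketch below is hypothetical throughout: `CodingFeatures`, `FeatureCache`, the field layout, and the capacity are illustrative names and choices, not structures defined by the source.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CodingFeatures:
    """Feature parameters named in the text; the field layout is assumed."""
    lsf: Tuple[float, ...]      # line spectral frequencies
    pitch_period: float
    adaptive_gain: float        # adaptive codebook gain
    fixed_gain: float           # fixed codebook gain


class FeatureCache:
    """Keeps the features of the most recent `capacity` encoded units."""

    def __init__(self, capacity: int = 8) -> None:
        self._capacity = capacity
        self._items: "OrderedDict[int, CodingFeatures]" = OrderedDict()

    def put(self, index: int, features: CodingFeatures) -> None:
        self._items[index] = features
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the oldest entry

    def pair_for(self, index: int) -> Optional[Tuple[CodingFeatures, CodingFeatures]]:
        """Return (current, previous) features, or None if either is missing."""
        current = self._items.get(index)
        previous = self._items.get(index - 1)
        if current is None or previous is None:
            return None
        return current, previous
```

The `pair_for` lookup returns exactly the two inputs the prediction step needs, the first (current) and second (previous) feature parameters; a variant that also uses following encoded data would need a wider lookup.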
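S304: obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data.
The packet loss recovery capability is a prediction result. It reflects the voice quality of the recovery packet that the receiving end would obtain by performing packet loss recovery processing after the current encoded data is lost. The prediction indicates whether the receiving end can or cannot recover the lost current encoded data well. The packet loss recovery processing is PLC (Packet Loss Concealment), and the packet loss recovery capability is the recovery capability of PLC.
When the voice coding feature parameters of the encoded data change abruptly, the packet loss recovery capability of the receiving end is limited. For example, when adjacent or nearby encoded data show a jump in fundamental frequency or an abrupt change in LSF, the receiving end's packet loss recovery capability is limited; in this case, enabling FEC redundant coding at the sending end can effectively improve resistance to packet loss and thus protect voice quality at the receiving end. When the values of the voice coding feature parameters of adjacent encoded data fluctuate relatively smoothly, the receiving end usually has good packet loss recovery capability, and the sending end does not need to enable FEC redundant coding. It follows that the packet loss recovery capability corresponding to the current encoded data is related to its voice coding feature parameters, and a machine learning model trained on a large number of samples can learn how to predict the packet loss recovery capability of a data packet from those parameters.
Specifically, the sending end can obtain the buffered first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data, and predict the packet loss recovery capability corresponding to the current encoded data from these two parameters through the pre-trained packet loss recovery capability prediction model.
In other embodiments, the sending end can obtain the packet loss recovery capability corresponding to the current encoded data through the prediction model according to the first voice coding feature parameter corresponding to the current encoded data and a third voice coding feature parameter corresponding to the following encoded data of the current encoded data. Alternatively, the capability can be obtained according to the second voice coding feature parameter and/or the third voice coding feature parameter. "Following encoded data" describes encoded data that comes after the current encoded data in the voice coding bitstream. It may be the one piece of encoded data immediately after the current encoded data, or several pieces after it, for example the two pieces after it.
It can be understood that which encoded data's voice coding feature parameters the sending end uses as input to the prediction model depends on the algorithm rules used by the sending end for voice encoding or by the receiving end for voice decoding; the encoding and decoding rules correspond to each other. For example, if the sending end needs the voice coding feature parameters of the previous encoded data when generating the current encoded data, then the feature parameters used by the previous encoded data must be used as input to the prediction model when predicting the packet loss recovery capability of the current encoded data. If the sending end needs the feature parameters used by the following encoded data when generating the current encoded data, then the feature parameters used by the following encoded data must be used as input to the prediction model.
The packet loss recovery capability prediction model is a computer model based on machine learning and can be implemented with a neural network model. A machine learning model can learn from samples and thereby acquire a specific ability. In this embodiment, the packet loss recovery capability prediction model is a pre-trained model that is able to predict packet loss recovery capability.
In one embodiment, the sending end can set the model structure of the machine learning model in advance to obtain an initial machine learning model, and then train the initial model through a large number of sample voices and packet loss simulation tests to obtain the model parameters. Then, when voice needs to be transmitted over the network, the sending end can obtain the pre-trained model parameters, import them into the initial machine learning model to obtain the packet loss recovery capability prediction model, and use that model to predict the packet loss recovery capability corresponding to each piece of encoded data in the voice coding bitstream, so as to decide, according to the predicted capability, whether to enable FEC redundant coding for the current encoded data.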
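FIG. 5 is a schematic flowchart of the training steps of the packet loss recovery capability prediction model in one embodiment. Note that the training steps can be executed by any computer device to obtain a trained prediction model, which is then imported into the sending end that needs to transmit voice. The computer device can also be the sending end in FIG. 1 or FIG. 2; that is, the training steps can be executed directly by the sending end to obtain the trained model. The training steps are described below with a computer device as the executing entity, and specifically include:
S502: obtain a sample voice sequence in the training set.
Specifically, the computer device can obtain a large number of voice signals and divide them into a large number of voice signal sequences composed of voice segments, which serve as the sample voice sequences used to train the machine learning model.
S504: perform voice coding on the sample voice sequence to obtain a sample voice coding bitstream.
Specifically, for each sample voice sequence, the computer device extracts the voice coding feature parameters corresponding to each voice segment and generates the encoded data corresponding to each voice segment according to the extracted parameters, which yields the sample voice coding bitstream corresponding to each sample voice sequence. The computer device can buffer the voice coding feature parameters used by each piece of encoded data in the coding process.
S506: extract the first voice coding feature parameter used by the current encoded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous encoded data of the current encoded data.
As mentioned above, the packet loss recovery capability corresponding to a piece of encoded data is related to its voice coding feature parameters and may also be related to the feature parameters of the previous and/or following encoded data. During training, the computer device can therefore use the voice coding feature parameters as input to the machine learning model. In this embodiment, the computer device can extract the first voice coding feature parameter corresponding to the current encoded data being processed and the second voice coding feature parameter corresponding to its previous encoded data as input to the machine learning model. As mentioned above, the previous encoded data may be the one piece of encoded data immediately before the current encoded data, or several pieces before it.
Note that each training step targets one piece of encoded data, while each sample voice coding bitstream includes multiple pieces of encoded data, so each sample voice coding bitstream can be used for multiple training steps. For example, during training, the computer device can extract the voice coding feature parameters corresponding to the i-th and the (i-1)-th encoded data in sample voice coding bitstream S, and it can also extract the feature parameters corresponding to the (i+1)-th and the i-th encoded data in S.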
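S508: obtain a first voice quality score determined from a first voice signal obtained by directly decoding the sample voice coding bitstream.
To obtain the target output of the machine learning model for this training step, the computer device needs to execute steps S508 to S512. The computer device can directly decode the sample voice coding bitstream obtained by encoding to get the first voice signal, and then use a voice quality test tool to measure the first voice quality score corresponding to that signal. Because the first voice signal is obtained by directly decoding the sample voice coding bitstream, with no encoded data lost, it is very close to the original sample voice sequence. It can be called a lossless voice signal, and the corresponding first voice quality score can be called a lossless voice quality score.
In one embodiment, the voice quality test tool can be PESQ (Perceptual Evaluation of Speech Quality). PESQ evaluates the quality of a voice signal objectively according to a set of metrics and thus provides a fully quantifiable measure of voice quality; these metrics agree closely with human perception of voice quality. The resulting first voice quality score can be denoted MOS_UNLOSS.
S510: obtain a second voice quality score determined from a second voice signal obtained by performing simulated packet loss recovery processing on the current encoded data to get a recovery packet and decoding the recovery packet.
Next, the computer device can treat the current encoded data as a lost data packet, simulate the decoder at the receiving end to perform packet loss recovery processing for the current encoded data and obtain the corresponding recovery packet, decode the recovery packet to obtain the corresponding second voice signal, and splice the other voice segments of the original sample voice sequence with this second voice signal before scoring the voice quality to obtain the second voice quality score. Because the second voice signal is obtained by decoding a recovery packet produced under simulated packet loss, and the recovery packet differs from the lost current encoded data, the second voice signal also differs from the voice segment corresponding to the current encoded data. The second voice signal can be called a lossy voice signal, and the second voice quality score can be called a lossy voice quality score, denoted MOS_LOSS.
S512: determine the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first voice quality score and the second voice quality score.
Specifically, the true packet loss recovery capability corresponding to the current encoded data can be measured by the score difference between the first and second voice quality scores; that is, MOS_UNLOSS-MOS_LOSS can be used as the true packet loss recovery capability corresponding to the current encoded data, which is the target output of the machine learning model. The true packet loss recovery capability is inversely related to this score difference: the smaller the difference, the better the voice quality of the recovery packet obtained by packet loss recovery after the simulated loss of the current encoded data, and the stronger the true packet loss recovery capability; conversely, the larger the difference, the worse the voice quality of the recovery packet, and the weaker the true packet loss recovery capability.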
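The target construction in S506 to S512 can be sketched as follows. The source only states that MOS_UNLOSS-MOS_LOSS is the training target; the helper names and the flat concatenation of the two feature vectors are illustrative assumptions, and the two MOS values are taken as already measured (for example by a PESQ tool) rather than computed here.

```python
from typing import List, Sequence, Tuple


def score_difference(mos_unloss: float, mos_loss: float) -> float:
    """Target output for one training step: MOS_UNLOSS - MOS_LOSS."""
    return mos_unloss - mos_loss


def build_training_sample(first_features: Sequence[float],
                          second_features: Sequence[float],
                          mos_unloss: float,
                          mos_loss: float) -> Tuple[List[float], float]:
    """Pair the model input (current + previous features) with its target.

    Concatenating the two feature vectors is an assumed input layout.
    """
    inputs = list(first_features) + list(second_features)
    return inputs, score_difference(mos_unloss, mos_loss)
```

A small target means concealment almost matches lossless decoding (strong recovery capability); a large target means concealment degrades quality noticeably (weak recovery capability).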
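S514: input the first voice coding feature parameter and the second voice coding feature parameter into the machine learning model, which outputs the predicted packet loss recovery capability corresponding to the current encoded data.
After obtaining the target output for this training step, the computer device can input the first and second voice coding feature parameters into the machine learning model, whose internal network processes them and outputs the predicted packet loss recovery capability corresponding to the current encoded data. Note that S514 can also be executed before step S508; this embodiment does not restrict the execution order of this step.
S516: adjust the model parameters of the machine learning model according to the difference between the true packet loss recovery capability and the predicted packet loss recovery capability, then return to the step of obtaining a sample voice sequence in the training set to continue training until the training end condition is met.
Specifically, the computer device can construct a loss function from the true packet loss recovery capability and the predicted packet loss recovery capability obtained through the machine learning model, take the model parameters that minimize the loss function as the latest model parameters of the machine learning model, and continue with the next training step on the sample voice sequences. Training ends when the machine learning model converges or the number of training steps reaches a preset number, which yields the trained packet loss recovery capability prediction model.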
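The training loop of S514 and S516 can be illustrated with a deliberately simple stand-in. The source says the model may be a neural network and does not specify the loss function; the sketch below instead assumes a linear model trained by stochastic gradient descent on squared error, purely to show the loop of predicting, comparing with the true value, adjusting parameters, and stopping on convergence or after a preset number of passes. All names and hyperparameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float,
            features: Sequence[float]) -> float:
    """Predicted capability target from a linear model (a stand-in model)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(samples: Sequence[Tuple[Sequence[float], float]],
          learning_rate: float = 0.05,
          max_epochs: int = 5000,
          tolerance: float = 1e-8) -> Tuple[List[float], float]:
    """Minimise squared error between predicted and true targets."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(max_epochs):            # preset maximum number of passes
        total_loss = 0.0
        for features, target in samples:
            error = predict(weights, bias, features) - target
            total_loss += error * error
            # Gradient step on the squared error of this one sample.
            for j in range(n_features):
                weights[j] -= learning_rate * error * features[j]
            bias -= learning_rate * error
        if total_loss / len(samples) < tolerance:   # convergence check
            break
    return weights, bias


# Toy samples: (features, true score difference); values are made up.
toy_samples = [([0.0, 0.0], 0.1), ([1.0, 0.0], 0.6),
               ([0.0, 1.0], 0.3), ([1.0, 1.0], 0.8)]
weights, bias = train(toy_samples)
```

In practice the stand-in would be replaced by whatever network structure the sending end was configured with; only the loop structure carries over.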
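FIG. 6 is a schematic framework diagram of training a machine learning model to obtain the packet loss recovery capability prediction model in one embodiment; it shows the flow of a single training step. The computer device obtains a sample voice sequence and performs voice coding on it to obtain a sample voice coding bitstream. It first decodes the sample voice coding bitstream directly, with the current encoded data not lost, and uses PESQ to obtain MOS_UNLOSS; it then simulates packet loss recovery processing with the current encoded data lost, decodes, and uses PESQ to obtain MOS_LOSS. The voice coding feature parameters of the current encoded data and of its previous encoded data are used as input to the machine learning model to obtain the predicted packet loss recovery capability; MOS_UNLOSS-MOS_LOSS is used as the target output of the machine learning model, that is, the true packet loss recovery capability; and the model parameters are then adjusted according to the predicted and true packet loss recovery capabilities to complete this training step.
In one embodiment, step S304 (obtaining, through the machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data) includes: inputting the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to its previous encoded data into the prediction model; outputting, through the prediction model and according to the two feature parameters, the score difference between the first voice quality score determined by directly decoding the current encoded data and the second voice quality score determined by decoding after packet loss recovery processing of the current encoded data; and determining the packet loss recovery capability corresponding to the current encoded data according to the score difference, where the packet loss recovery capability is inversely related to the score difference.
In this embodiment, before the sending end sends the current encoded data in the voice coding bitstream to the receiving end, it can predict the packet loss recovery capability corresponding to the current encoded data through the pre-trained prediction model. Specifically, the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data are used as input to the prediction model. The output of the model is the score difference between the first voice quality score determined by directly decoding the current encoded data and the second voice quality score determined by decoding after packet loss recovery processing. This score difference reflects the quality of the receiving end's packet loss recovery processing after the current encoded data is lost, that is, the size of the packet loss recovery capability, which is inversely related to the score difference. When the score difference is large, that is, when the packet loss recovery capability is less than a preset threshold, the voice signal obtained by packet loss recovery processing at the receiving end after loss of the current encoded data would be of poor quality. Conversely, when the score difference is small, that is, when the packet loss recovery capability is greater than the preset threshold, the quality of the voice signal obtained by packet loss recovery processing would be within an acceptable range.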
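S306: decide, according to the packet loss recovery capability, whether redundant coding is needed; if so, execute step S308: perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; if not, execute step S310: transmit the current encoded data directly to the receiving end.
Specifically, after the sending end obtains the packet loss recovery capability corresponding to the current encoded data through the prediction model, it decides according to the predicted capability whether to include the current encoded data in FEC redundant coding.
In one embodiment, the packet loss recovery capability output by the prediction model is a value within a numerical range. The sending end can compare the packet loss recovery capability with a preset threshold and determine from the comparison whether redundant coding is needed for the current encoded data.
Specifically, when the packet loss recovery capability is less than the preset threshold, redundant coding is performed on the current encoded data to generate a corresponding redundant packet, and the current encoded data and the redundant packet are then transmitted to the receiving end. A capability below the threshold indicates that the voice signal obtained by packet loss recovery processing at the receiving end after loss of the current encoded data would be of poor quality, so FEC redundant coding is needed to counter packet loss in the transmission network; that is, the current encoded data is included in FEC redundant coding to generate a redundant packet before transmission. When the packet loss recovery capability is greater than the preset threshold, the current encoded data is transmitted directly to the receiving end. A capability above the threshold indicates that the quality of the voice signal obtained by packet loss recovery processing would be within an acceptable range, so for this encoded data the sending end does not need FEC redundant coding as a packet loss resistance strategy and can transmit the current encoded data directly. If the current encoded data is then lost, the packet loss recovery algorithm built into the decoder at the receiving end can be used directly to perform packet loss recovery processing for it.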
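The threshold decision can be sketched as below. The source states only that the capability is inversely related to the predicted score difference and that a capability below the threshold triggers redundant coding. The specific mapping `1 / (1 + difference)`, the threshold value used in the usage note, and the choice to treat a capability exactly equal to the threshold as not needing redundancy are all assumptions made for illustration.

```python
def recovery_capability(score_difference: float) -> float:
    """Map a predicted score difference to a capability in (0, 1].

    Assumed mapping: any monotonically decreasing function would satisfy
    the inverse relation described in the text.
    """
    return 1.0 / (1.0 + max(score_difference, 0.0))


def needs_redundancy(capability: float, threshold: float) -> bool:
    """True when the receiver is predicted to conceal this loss poorly."""
    return capability < threshold
```

For instance, with a hypothetical threshold of 0.7, a predicted score difference of 1.0 gives a capability of 0.5 and enables FEC for that encoded data, while a difference of 0.1 gives roughly 0.91 and the data is sent without redundancy.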
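In one embodiment, the packet loss recovery capability output by the prediction model takes one of two values. When it is the first value, the voice signal obtained by packet loss recovery processing at the receiving end after loss of the current encoded data would be of poor quality, so the sending end needs to apply FEC redundant coding to the current encoded data before transmitting it to the receiving end. When it is the second value, the quality of the voice signal obtained by packet loss recovery processing would be within an acceptable range, so the sending end can transmit the current encoded data directly to the receiving end; if the current encoded data is lost, the packet loss recovery algorithm built into the decoder at the receiving end can be used directly to perform packet loss recovery processing for it. For example, the first value can be 1 and the second value 0; alternatively, the first value can be 0 and the second value 1.
For example, suppose the voice coding bitstream to be transmitted includes P1, P2, P3, P4, and so on, and the current encoded data is P7. If the sending end predicts that the packet loss recovery capability corresponding to P7 is weak, it can add P7 to the buffer queue of data that needs redundant coding (the queue may be empty at this point, or it may already hold earlier encoded data such as P5). If the buffer queue is not full, the sending end continues to predict the packet loss recovery capability of the subsequent encoded data and likewise adds subsequent encoded data with weak recovery capability to the queue. When the buffer queue is full, the sending end can perform redundant coding on the encoded data in the queue to generate redundant packets, send the encoded data in the queue together with the generated redundant packets to the receiving end, and clear the queue.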
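The buffer queue described above can be sketched as a small class. This is a minimal illustration: `RedundancyQueue` and the `make_redundant_packets` callback are hypothetical names, and the actual redundancy code is left to the caller because the source does not specify it.

```python
from typing import Callable, List, Optional, Tuple


class RedundancyQueue:
    """Collects encoded data predicted to be hard to conceal.

    When the queue is full, the whole group is handed to a caller-supplied
    redundancy encoder and the queue is cleared.
    """

    def __init__(self, capacity: int,
                 make_redundant_packets: Callable[[List[bytes]], List[bytes]]) -> None:
        self._capacity = capacity
        self._make = make_redundant_packets
        self._pending: List[bytes] = []

    def add(self, packet: bytes) -> Optional[Tuple[List[bytes], List[bytes]]]:
        """Queue a packet; return (group, redundant packets) once full."""
        self._pending.append(packet)
        if len(self._pending) < self._capacity:
            return None
        group = self._pending
        self._pending = []          # clear the queue after flushing
        return group, self._make(group)
```

A caller would invoke `add` only for encoded data whose predicted recovery capability is below the threshold, and transmit the returned group together with its redundant packets whenever the result is not `None`.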
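In one embodiment, performing redundant coding on the current encoded data to generate a corresponding redundant packet and then transmitting the current encoded data and the redundant packet to the receiving end includes: obtaining the packet loss status information fed back by the receiving end; determining the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generating a redundant packet from the current encoded data and then transmitting the current encoded data and the redundant packet to the receiving end.
Specifically, the receiving end can determine the packet loss status information from the data packets it has received and feed this information back to the sending end. The packet loss status information can be expressed as the current packet loss rate: the receiving end encapsulates the packet loss rate in a control message and sends it to the sending end, and the sending end parses the received control message to obtain the packet loss rate. The redundancy rate r can be the ratio of the number of redundant packets m to the sum of m and the number of pieces of encoded data n, that is, r=m/(m+n). By adjusting the redundancy rate, the sending end can achieve different degrees of packet loss resistance: a large redundancy rate can handle more consecutive packet losses, while a small redundancy rate can handle a few consecutive losses or sporadic losses. In other words, r is larger under a high packet loss rate and smaller under a low packet loss rate.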
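The relation r=m/(m+n) can be inverted to find how many redundant packets a target redundancy rate implies for a group of n pieces of encoded data: m = r·n/(1-r), rounded up. The sketch below assumes that rounding rule and these function names; how the sending end maps a reported packet loss rate to r is not specified in the source and is not modelled here.

```python
import math


def redundancy_rate(m: int, n: int) -> float:
    """r = m / (m + n) for m redundant packets and n data packets."""
    return m / (m + n)


def redundant_packet_count(n: int, r: float) -> int:
    """Smallest m whose redundancy rate is at least r (requires 0 <= r < 1)."""
    if not 0.0 <= r < 1.0:
        raise ValueError("redundancy rate must be in [0, 1)")
    # The small epsilon guards against floating-point round-off just above
    # an exact integer result.
    return max(0, math.ceil(r * n / (1.0 - r) - 1e-9))
```

For the group of 6 data packets and 2 redundant packets used in the running example, the rate is 2/(2+6) = 0.25, and requesting a rate of 0.25 for 6 data packets gives back 2 redundant packets.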
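In one embodiment, the voice transmission method further includes: when the receiving end receives the current encoded data, directly performing voice decoding on it to obtain the corresponding voice signal; and when the receiving end does not receive the current encoded data but receives the redundant packet, performing redundant decoding at the receiving end based on the redundant packet to obtain the current encoded data, and then performing voice decoding on it to obtain the corresponding voice signal.
For example, continuing the example above, after predicting packet loss recovery capability, the sending end adds the encoded data P3, P4, P6, P7, P8, and P9 to the buffer queue (the queue length can be set as needed, for example 6), performs redundant coding to generate redundant packets R1 and R2, and encapsulates the encoded data P3, P4, P6, P7, P8, and P9 from the queue together with R1 and R2 into one data group that is sent to the receiving end. To make it easy for the receiving end to determine whether packet loss has occurred, the sequence numbers of the data packets in the group can be consecutive, for example 1, 2, 3, 4, 5, 6 in order. If the receiving end receives P3, P4, and P6, the sequence numbers are consecutive and no loss has occurred, so the receiving end can directly perform voice decoding on the received P3, P4, and P6 to obtain the corresponding voice signals. At the same time, the receiving end can buffer P3, P4, and P6 for possible later FEC redundant decoding; if no packet in this group is subsequently lost, the buffer is cleared.
When the receiving end receives P8 and P9, it can determine from the sequence numbers that P7 has been lost. It then buffers P8 and P9 until R1 arrives, at which point it can perform redundant decoding on the buffered P3, P4, P6, P8, and P9 together with R1 to obtain the lost P7. When R2 subsequently arrives, it can simply be discarded.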
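The recovery of P7 in this example can be illustrated with the simplest possible redundancy code, a single XOR parity packet. This is a swapped-in technique chosen for brevity: the source does not say which FEC code produces R1 and R2. The sketch assumes all packets in a group have the same length and can repair at most one loss per group.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Redundant packet: XOR of every data packet in the group."""
    return reduce(xor_bytes, packets)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Fill in a single missing packet (marked None) using the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity packet can repair at most one loss")
    present = [p for p in received if p is not None]
    if not missing:
        return present
    # XOR of the parity with all surviving packets equals the lost packet.
    repaired = reduce(xor_bytes, present, parity)
    result = list(received)
    result[missing[0]] = repaired
    return result


# Placeholder payloads standing in for the encoded data in the example.
group = [b"P3__", b"P4__", b"P6__", b"P7__", b"P8__", b"P9__"]
parity = make_parity(group)
```

A production system would more likely use a code that tolerates several losses per group, which is what having two redundant packets R1 and R2 suggests; the single-parity code is shown only because it fits in a few lines.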
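In one embodiment, the voice transmission method further includes:
when the receiving end receives neither the current encoded data nor the redundant packet, performing packet loss recovery processing for the current encoded data at the receiving end to obtain a recovery packet corresponding to the current encoded data, and performing voice decoding on the recovery packet to obtain the corresponding voice signal.
Continuing the example above, if P7 is lost and the receiving end does not receive R1 or R2 within a certain time, it cannot recover P7 from the buffered P3, P4, P6, P8, and P9. It then needs to perform packet loss recovery processing for the current encoded data through the PLC algorithm built into the decoder. This usually uses the decoding information of the previous data packet and a pitch-synchronous repetition method to produce an approximate substitute for the current encoded data as the recovery packet, which is then decoded to obtain the voice signal. Note also that the receiving end can recover lost packets in a data group through redundant decoding only if: the number of pieces of encoded data received by the receiving end + the number of redundant packets received by the receiving end >= the number of pieces of encoded data in the data group. If this condition is not met, the receiving end likewise needs to perform packet loss recovery processing for the current encoded data through the PLC algorithm built into the decoder.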
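The receiver-side choice among direct decoding, redundant decoding, and PLC follows directly from the condition stated above. The sketch below is illustrative; the `Action` enum and the function signature are hypothetical names, and the timeout handling mentioned in the text is assumed to have already been resolved into the received counts.

```python
from enum import Enum


class Action(Enum):
    DECODE = "decode"                    # packet arrived: decode normally
    FEC_THEN_DECODE = "fec_then_decode"  # rebuild via redundant decoding
    PLC_THEN_DECODE = "plc_then_decode"  # conceal with the decoder's PLC


def choose_action(current_received: bool,
                  data_received: int,
                  redundant_received: int,
                  group_size: int) -> Action:
    """Pick the receiver's handling for one piece of encoded data.

    Redundant decoding is possible only when
    data_received + redundant_received >= group_size.
    """
    if current_received:
        return Action.DECODE
    if data_received + redundant_received >= group_size:
        return Action.FEC_THEN_DECODE
    return Action.PLC_THEN_DECODE
```

In the running example with 6 pieces of encoded data, losing P7 while receiving the other five plus R1 satisfies 5 + 1 >= 6 and leads to redundant decoding; losing P7 and both redundant packets gives 5 + 0 < 6 and falls back to PLC.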
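In the above voice transmission method, before the current encoded data is transmitted to the receiving end, a machine-learning-based packet loss recovery capability prediction model predicts the receiving end's packet loss recovery capability for the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data, and the decision on whether to perform redundant coding on the current encoded data is made according to that capability. If redundant coding is needed, the current encoded data is redundantly encoded to generate a redundant packet, and the necessary network bandwidth is consumed to transmit the redundant packet to the receiving end. Otherwise, the current encoded data does not need to be redundantly encoded and is transmitted directly to the receiving end, which avoids consuming excessive network bandwidth, effectively improves overall network bandwidth utilization, and still ensures the packet loss resistance of the transmission network.
FIG. 7 is a flow block diagram of the voice transmission method in one embodiment. Referring to FIG. 7, the sending end obtains the original voice signal and performs voice coding on it to obtain the voice coding bitstream. Next, the sending end uses the machine-learning-based prediction model to predict the receiving end's packet loss recovery capability for each piece of encoded data in the voice coding bitstream. It then decides, according to the predicted capability, whether to enable FEC redundant coding for the current encoded data. If so, it sets the redundancy rate according to the packet loss status information fed back by the receiving end, generates a redundant packet from the current encoded data according to that redundancy rate, and transmits the current encoded data and the redundant packet to the receiving end. If not, it transmits the current encoded data directly to the receiving end.
If the receiving end receives the current encoded data, it reconstructs the voice signal through the normal decoding process. If the receiving end does not receive the current encoded data but does receive the redundant packet, and the condition for recovering lost packets through redundant decoding is met, the receiving end can perform FEC redundant decoding to obtain the current encoded data. If the receiving end receives neither the current encoded data nor the corresponding redundant packet within a certain time, it determines that the current encoded data is lost, and it can perform packet loss recovery processing for the current encoded data through the PLC algorithm built into the decoder and then decode to obtain the voice signal.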
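FIG. 8 is a schematic flowchart of the voice transmission method in a specific embodiment. Referring to FIG. 8, the method includes the following steps:
S802: obtain the original voice signal.
S804: divide the original voice signal to obtain the original voice sequence.
S806: sequentially perform voice coding on the voice segments in the original voice sequence to obtain the voice coding bitstream.
S808: buffer the voice coding feature parameters used by each piece of encoded data in the voice coding process.
S810: obtain the current encoded data in the voice coding bitstream.
S812: input the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model.
S814: output, through the prediction model and according to the first and second voice coding feature parameters, the score difference between the first voice quality score determined by directly decoding the current encoded data and the second voice quality score determined by decoding after packet loss recovery processing of the current encoded data.
S816: determine the packet loss recovery capability corresponding to the current encoded data according to the score difference.
S818: when the packet loss recovery capability is less than the preset threshold, obtain the packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate a redundant packet from the current encoded data and then transmit the current encoded data and the redundant packet to the receiving end.
S820: when the packet loss recovery capability is greater than the preset threshold, transmit the current encoded data directly to the receiving end.
S822: when the receiving end receives the current encoded data, directly perform voice decoding on it to obtain the corresponding voice signal.
S824: when the receiving end does not receive the current encoded data but receives the redundant packet, perform redundant decoding at the receiving end based on the redundant packet to obtain the current encoded data, and then perform voice decoding on it to obtain the corresponding voice signal.
S826: when the receiving end receives neither the current encoded data nor the redundant packet, perform packet loss recovery processing for the current encoded data at the receiving end to obtain a recovery packet corresponding to the current encoded data, and perform voice decoding on the recovery packet to obtain the corresponding voice signal.
It should be understood that although the steps in the flowcharts of FIG. 3, FIG. 5, and FIG. 8 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they can be executed in other orders. Moreover, at least some of the steps in FIG. 3, FIG. 5, and FIG. 8 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but can be executed at different times, and they are not necessarily executed sequentially but can be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.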
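In one embodiment, a voice transmission system is provided. The system may be the voice transmission system shown in FIG. 1 or FIG. 2 and includes a sending end 110 and a receiving end 120:
The sending end 110 is used to obtain the current encoded data in the voice coding bitstream and to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
the sending end 110 is further used to decide, according to the packet loss recovery capability, whether redundant coding is needed; if so, to perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; if not, to transmit the current encoded data directly to the receiving end;
the receiving end 120 is used to, when the current encoded data is received, directly perform voice decoding on it to obtain the corresponding voice signal; and, when the current encoded data is not received but the redundant packet is received, to perform redundant decoding based on the redundant packet to obtain the current encoded data and then perform voice decoding on it to obtain the corresponding voice signal;
the receiving end 120 is further used to, when neither the current encoded data nor the redundant packet is received, perform packet loss recovery processing for the current encoded data to obtain a recovery packet corresponding to it, and perform voice decoding on the recovery packet to obtain the corresponding voice signal.
In one embodiment, the sending end 110 is further used to obtain the original voice signal; divide the original voice signal to obtain the original voice sequence; and sequentially perform voice coding on the voice segments in the original voice sequence to obtain the voice coding bitstream.
In one embodiment, the sending end 110 is further used to obtain the voice coding feature parameters corresponding to each voice segment in the original voice sequence; perform voice coding on each voice segment according to its voice coding feature parameters and generate the corresponding encoded data to obtain the voice coding bitstream; and buffer the voice coding feature parameters used by each piece of encoded data in the voice coding process.
In one embodiment, the sending end 110 is further used to input the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the prediction model; output, through the prediction model and according to the two feature parameters, the score difference between the first voice quality score determined by directly decoding the current encoded data and the second voice quality score determined by decoding after packet loss recovery processing of the current encoded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference, where the packet loss recovery capability is inversely related to the score difference.
In one embodiment, the sending end 110 is further used to obtain the packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate a redundant packet from the current encoded data and then transmit the current encoded data and the redundant packet to the receiving end.
In one embodiment, the receiving end 120 is further used to, when it receives the current encoded data, directly perform voice decoding on it to obtain the corresponding voice signal.
In one embodiment, the receiving end 120 is further used to, when it does not receive the current encoded data but receives the redundant packet, perform redundant decoding based on the redundant packet to obtain the current encoded data and then perform voice decoding on it to obtain the corresponding voice signal.
In one embodiment, the receiving end 120 is further used to, when it receives neither the current encoded data nor the redundant packet, perform packet loss recovery processing for the current encoded data to obtain a recovery packet corresponding to it, and perform voice decoding on the recovery packet to obtain the corresponding voice signal.
In one embodiment, the sending end 110 is further used to obtain a sample voice sequence in the training set; perform voice coding on the sample voice sequence to obtain a sample voice coding bitstream; extract the first voice coding feature parameter used by the current encoded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous encoded data of the current encoded data; obtain a first voice quality score determined from a first voice signal obtained by directly decoding the sample voice coding bitstream; obtain a second voice quality score determined from a second voice signal obtained by performing simulated packet loss recovery processing on the current encoded data to get a recovery packet and decoding the recovery packet; determine the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first and second voice quality scores; input the first and second voice coding feature parameters into a machine learning model, which outputs the predicted packet loss recovery capability corresponding to the current encoded data; and adjust the model parameters of the machine learning model according to the difference between the true and the predicted packet loss recovery capability, then return to the step of obtaining a sample voice sequence in the training set to continue training until the training end condition is met.
In the above voice transmission system, before transmitting the current encoded data to the receiving end, the sending end uses a machine-learning-based packet loss recovery capability prediction model to predict the receiving end's packet loss recovery capability for the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data, and decides according to that capability whether to perform redundant coding on the current encoded data. If so, the current encoded data is redundantly encoded to generate a redundant packet, and the necessary network bandwidth is consumed to transmit the redundant packet to the receiving end. Otherwise, the current encoded data does not need to be redundantly encoded and is transmitted directly to the receiving end, which avoids consuming excessive network bandwidth, effectively improves overall network bandwidth utilization, and still ensures the packet loss resistance of the transmission network.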
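In one embodiment, as shown in FIG. 9, a voice transmission device 900 is provided. The device can be implemented as all or part of the receiving end through software, hardware, or a combination of the two. The device includes an acquisition module 902, a prediction module 904, and a redundant coding decision module 906:
the acquisition module 902 is used to obtain the current encoded data in the voice coding bitstream;
the prediction module 904 is used to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
the redundant coding decision module 906 is used to decide, according to the packet loss recovery capability, whether redundant coding is needed; if so, to perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; if not, to transmit the current encoded data directly to the receiving end.
In one embodiment, the voice transmission device 900 further includes a voice coding module, used to obtain the original voice signal; divide the original voice signal to obtain the original voice sequence; and sequentially perform voice coding on the voice segments in the original voice sequence to obtain the voice coding bitstream.
In one embodiment, the voice transmission device 900 further includes a voice coding module and a buffer module. The voice coding module is used to obtain the voice coding feature parameters corresponding to each voice segment in the original voice sequence, perform voice coding on each voice segment according to its voice coding feature parameters, and generate the corresponding encoded data to obtain the voice coding bitstream; the buffer module is used to buffer the voice coding feature parameters used by each piece of encoded data in the voice coding process.
In one embodiment, the prediction module 904 is further used to input the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the prediction model; output, through the prediction model and according to the two feature parameters, the score difference between the first voice quality score determined by directly decoding the current encoded data and the second voice quality score determined by decoding after packet loss recovery processing of the current encoded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference, where the packet loss recovery capability is inversely related to the score difference.
In one embodiment, the redundant coding decision module 906 is further used to, when the packet loss recovery capability is less than the preset threshold, obtain the packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate a redundant packet from the current encoded data and then transmit the current encoded data and the redundant packet to the receiving end.
In one embodiment, the voice transmission device 900 further includes a model training module, used to obtain a sample voice sequence in the training set; perform voice coding on the sample voice sequence to obtain a sample voice coding bitstream; extract the first voice coding feature parameter used by the current encoded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous encoded data of the current encoded data; obtain a first voice quality score determined from a first voice signal obtained by directly decoding the sample voice coding bitstream; obtain a second voice quality score determined from a second voice signal obtained by performing simulated packet loss recovery processing on the current encoded data to get a recovery packet and decoding the recovery packet; determine the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first and second voice quality scores; input the first and second voice coding feature parameters into a machine learning model, which outputs the predicted packet loss recovery capability corresponding to the current encoded data; and adjust the model parameters of the machine learning model according to the difference between the true and the predicted packet loss recovery capability, then return to the step of obtaining a sample voice sequence in the training set to continue training until the training end condition is met.
Before transmitting the current encoded data to the receiving end, the above voice transmission device 900 uses a machine-learning-based packet loss recovery capability prediction model to predict the receiving end's packet loss recovery capability for the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data, and decides according to that capability whether to perform redundant coding on the current encoded data. If so, the current encoded data is redundantly encoded to generate a redundant packet, and the necessary network bandwidth is consumed to transmit the redundant packet to the receiving end. Otherwise, the current encoded data does not need to be redundantly encoded and is transmitted directly to the receiving end, which avoids consuming excessive network bandwidth, effectively improves overall network bandwidth utilization, and still ensures the packet loss resistance of the transmission network.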
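FIG. 10 shows the internal structure of the computer device in one embodiment. The computer device may specifically be the sending end 110 in FIG. 1. As shown in FIG. 10, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store computer-readable instructions which, when executed by the processor, cause the processor to implement the voice transmission method. The internal memory may also store computer-readable instructions which, when executed by the processor, cause the processor to execute the voice transmission method.
Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
In one embodiment, the voice transmission device 900 provided by this application can be implemented in the form of computer-readable instructions that can run on the computer device shown in FIG. 10. The memory of the computer device can store the program modules that make up the voice transmission device 900, for example the acquisition module 902, the prediction module 904, and the redundant coding decision module 906 shown in FIG. 9. The computer-readable instructions formed by these modules cause the processor to execute the steps in the voice transmission method of the embodiments of this application described in this specification.
For example, the computer device shown in FIG. 10 can execute step S302 through the acquisition module 902 in the voice transmission device 900 shown in FIG. 9. The computer device can execute step S304 through the prediction module 904. The computer device can execute steps S306, S308, and S310 through the redundant coding decision module 906.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the above voice transmission method. The steps of the voice transmission method here may be the steps in the voice transmission method of each of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, storing computer-readable instructions which, when executed by a processor, cause the processor to execute the steps of the above voice transmission method. The steps of the voice transmission method here may be the steps in the voice transmission method of each of the above embodiments.
In one embodiment, a computer program product or computer-readable instructions are provided; the computer program product or computer-readable instructions include computer-readable instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device executes the steps in the above method embodiments. A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions that instruct the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the patent scope of this application. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.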

Claims (20)

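  1. A voice transmission method, executed by a computer, the method including:
    obtaining current encoded data in a voice coding bitstream;
    obtaining, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
    deciding, according to the packet loss recovery capability, whether redundant coding is needed;
    if so, performing redundant coding on the current encoded data to generate a corresponding redundant packet and then transmitting the current encoded data and the redundant packet to a receiving end; and
    if not, transmitting the current encoded data directly to the receiving end.
  2. The method according to claim 1, wherein the method further includes:
    obtaining an original voice signal;
    dividing the original voice signal to obtain an original voice sequence; and
    sequentially performing voice coding on the voice segments in the original voice sequence to obtain the voice coding bitstream.
  3. The method according to claim 1, wherein the method further includes:
    obtaining the voice coding feature parameters corresponding to each voice segment in an original voice sequence;
    performing voice coding on each voice segment according to its voice coding feature parameters and generating the corresponding encoded data to obtain the voice coding bitstream; and
    buffering the voice coding feature parameters used by each piece of encoded data in the voice coding process.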
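  4. The method according to claim 1, wherein obtaining, through the machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data includes:
    inputting the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model;
    outputting, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between a first voice quality score determined by directly decoding the current encoded data and a second voice quality score determined by decoding after packet loss recovery processing of the current encoded data; and
    determining the packet loss recovery capability corresponding to the current encoded data according to the score difference;
    wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
  5. The method according to claim 1, wherein performing redundant coding on the current encoded data to generate a corresponding redundant packet and then transmitting the current encoded data and the redundant packet to the receiving end includes:
    obtaining packet loss status information fed back by the receiving end;
    determining the redundancy rate corresponding to the current encoded data according to the packet loss status information; and
    according to the redundancy rate, generating a redundant packet from the current encoded data and then transmitting the current encoded data and the redundant packet to the receiving end.
  6. The method according to claim 1, wherein the method further includes:
    when the receiving end receives the current encoded data, directly performing voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data; and
    when the receiving end does not receive the current encoded data but receives the redundant packet, performing redundant decoding at the receiving end based on the redundant packet to obtain the current encoded data, and then performing voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data.
  7. The method according to claim 1, wherein the method further includes:
    when the receiving end receives neither the current encoded data nor the redundant packet, performing packet loss recovery processing for the current encoded data at the receiving end to obtain a recovery packet corresponding to the current encoded data, and performing voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.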
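  8. The method according to any one of claims 1 to 7, wherein the packet loss recovery capability prediction model is determined through the following steps:
    obtaining a sample voice sequence in a training set;
    performing voice coding on the sample voice sequence to obtain a sample voice coding bitstream;
    extracting the first voice coding feature parameter used by the current encoded data in the sample voice coding bitstream and the second voice coding feature parameter used by the previous encoded data of the current encoded data;
    obtaining a first voice quality score determined from a first voice signal obtained by directly decoding the sample voice coding bitstream;
    obtaining a second voice quality score determined from a second voice signal obtained by performing simulated packet loss recovery processing on the current encoded data to get a recovery packet and decoding the recovery packet;
    determining the true packet loss recovery capability corresponding to the current encoded data according to the score difference between the first voice quality score and the second voice quality score;
    inputting the first voice coding feature parameter and the second voice coding feature parameter into a machine learning model, and outputting, through the machine learning model, the predicted packet loss recovery capability corresponding to the current encoded data; and
    adjusting the model parameters of the machine learning model according to the difference between the true packet loss recovery capability and the predicted packet loss recovery capability, then returning to the step of obtaining a sample voice sequence in the training set to continue training until a training end condition is met.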
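  9. A voice transmission system, including a sending end and a receiving end, wherein:
    the sending end is used to obtain current encoded data in a voice coding bitstream and to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data;
    the sending end is further used to decide, according to the packet loss recovery capability, whether redundant coding is needed; if so, to perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to the receiving end; if not, to transmit the current encoded data directly to the receiving end;
    the receiving end is used to, when the current encoded data is received, directly perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data; and, when the current encoded data is not received but the redundant packet is received, to perform redundant decoding based on the redundant packet to obtain the current encoded data and then perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data; and
    the receiving end is further used to, when neither the current encoded data nor the redundant packet is received, perform packet loss recovery processing for the current encoded data to obtain a recovery packet corresponding to the current encoded data, and perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.
  10. The system according to claim 9, wherein the sending end is further used to obtain the voice coding feature parameters corresponding to each voice segment in an original voice sequence; perform voice coding on each voice segment according to its voice coding feature parameters and generate the corresponding encoded data to obtain the voice coding bitstream; and buffer the voice coding feature parameters used by each piece of encoded data in the voice coding process.
  11. The system according to claim 9, wherein the sending end is further used to input the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model; output, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between a first voice quality score determined by directly decoding the current encoded data and a second voice quality score determined by decoding after packet loss recovery processing of the current encoded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference; wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
  12. The system according to claim 9, wherein the sending end is further used to obtain packet loss status information fed back by the receiving end; determine the redundancy rate corresponding to the current encoded data according to the packet loss status information; and, according to the redundancy rate, generate a redundant packet from the current encoded data and then transmit the current encoded data and the redundant packet to the receiving end.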
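  13. A voice transmission device, the device including:
    an acquisition module, used to obtain current encoded data in a voice coding bitstream;
    a prediction module, used to obtain, through a machine-learning-based packet loss recovery capability prediction model, the packet loss recovery capability corresponding to the current encoded data according to a first voice coding feature parameter corresponding to the current encoded data and a second voice coding feature parameter corresponding to the previous encoded data of the current encoded data; and
    a redundant coding decision module, used to decide, according to the packet loss recovery capability, whether redundant coding is needed; if so, to perform redundant coding on the current encoded data to generate a corresponding redundant packet and then transmit the current encoded data and the redundant packet to a receiving end; if not, to transmit the current encoded data directly to the receiving end.
  14. The device according to claim 13, wherein the voice transmission device further includes a voice coding module, used to obtain an original voice signal; divide the original voice signal to obtain an original voice sequence; and sequentially perform voice coding on the voice segments in the original voice sequence to obtain the voice coding bitstream.
  15. The device according to claim 13, wherein the voice transmission device further includes a voice coding module and a buffer module;
    the voice coding module is used to obtain the voice coding feature parameters corresponding to each voice segment in an original voice sequence, perform voice coding on each voice segment according to its voice coding feature parameters, and generate the corresponding encoded data to obtain the voice coding bitstream;
    the buffer module is used to buffer the voice coding feature parameters used by each piece of encoded data in the voice coding process.
  16. The device according to claim 13, wherein the prediction module is further used to input the first voice coding feature parameter corresponding to the current encoded data and the second voice coding feature parameter corresponding to the previous encoded data of the current encoded data into the packet loss recovery capability prediction model; output, through the packet loss recovery capability prediction model and according to the first voice coding feature parameter and the second voice coding feature parameter, the score difference between a first voice quality score determined by directly decoding the current encoded data and a second voice quality score determined by decoding after packet loss recovery processing of the current encoded data; and determine the packet loss recovery capability corresponding to the current encoded data according to the score difference; wherein the packet loss recovery capability corresponding to the current encoded data is inversely related to the score difference.
  17. The device according to claim 13, wherein the redundant coding decision module is further used to, when the receiving end receives the current encoded data, directly perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data; and, when the receiving end does not receive the current encoded data but receives the redundant packet, to perform redundant decoding through the receiving end based on the redundant packet to obtain the current encoded data and then perform voice decoding on the current encoded data to obtain the voice signal corresponding to the current encoded data.
  18. The device according to any one of claims 13 to 17, wherein the redundant coding decision module is further used to, when the receiving end receives neither the current encoded data nor the redundant packet, perform packet loss recovery processing for the current encoded data through the receiving end to obtain a recovery packet corresponding to the current encoded data, and perform voice decoding on the recovery packet to obtain the voice signal corresponding to the current encoded data.
  19. One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to execute the steps of the method according to any one of claims 1 to 8.
  20. A computer device, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processors, cause the one or more processors to execute the steps of the method according to any one of claims 1 to 8.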
PCT/CN2020/124263 2020-02-20 2020-10-28 语音传输方法、系统、装置、计算机可读存储介质和设备 Ceased WO2021164303A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022522692A JP7383138B2 (ja) 2020-02-20 2020-10-28 音声伝送方法及びそのシステム、装置、コンピュータプログラム、並びにコンピュータ機器
EP20920497.3A EP4012705B1 (en) 2020-02-20 2020-10-28 Speech transmission method, system, and apparatus, computer readable storage medium, and device
US17/685,242 US12451145B2 (en) 2020-02-20 2022-03-02 Speech transmission method, system and apparatus, computer-readable storage medium, and device
US19/356,962 US20260038511A1 (en) 2020-02-20 2025-10-13 Audio transmission based on packet loss recovery capability

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010104793.7 2020-02-20
CN202010104793.7A CN112820306B (zh) 2020-02-20 2020-02-20 Speech transmission method, system, and apparatus, computer-readable storage medium, and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/685,242 Continuation US12451145B2 (en) 2020-02-20 2022-03-02 Speech transmission method, system and apparatus, computer-readable storage medium, and device

Publications (1)

Publication Number Publication Date
WO2021164303A1 (zh) 2021-08-26

Family

ID=75852966

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124263 Ceased WO2021164303A1 (zh) 2020-02-20 2020-10-28 Speech transmission method, system, and apparatus, computer-readable storage medium, and device

Country Status (5)

Country Link
US (2) US12451145B2 (zh)
EP (1) EP4012705B1 (zh)
JP (1) JP7383138B2 (zh)
CN (1) CN112820306B (zh)
WO (1) WO2021164303A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
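CN114513418A (zh) * 2022-04-21 2022-05-17 腾讯科技(深圳)有限公司 Data processing method and related device
CN117498892A (zh) * 2024-01-02 2024-02-02 深圳旷世科技有限公司 UWB-based audio transmission method, apparatus, terminal, and storage medium
TWI907896B (zh) * 2022-12-23 2025-12-11 弗勞恩霍夫爾協會 Error recovery tool for audio encoding/decoding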

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2021291010A1 (en) * 2020-06-19 2023-01-19 Rtx A/S Low latency audio packet loss concealment
US20220052783A1 (en) * 2020-08-12 2022-02-17 Vmware, Inc. Packet reconstruction and error correction for network endpoints
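CN113192520B (zh) * 2021-07-01 2021-09-24 腾讯科技(深圳)有限公司 Audio information processing method and apparatus, electronic device, and storage medium
CN116073946A (zh) * 2021-11-01 2023-05-05 中兴通讯股份有限公司 Anti-packet-loss method and apparatus, electronic device, and storage medium
CN114978427B (zh) * 2022-05-19 2024-04-19 腾讯科技(深圳)有限公司 Data processing method and apparatus, program product, computer device, and medium
CN116033594A (zh) * 2022-12-22 2023-04-28 深圳市潮流网络技术有限公司 Data transmission method, system, terminal device, and computer-readable storage medium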
US20240339117A1 (en) * 2023-04-07 2024-10-10 Apple Inc. Low latency audio for immersive group communication sessions

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106937134A (zh) * 2015-12-31 2017-07-07 深圳市潮流网络技术有限公司 Encoding method for data transmission, encoding and sending apparatus, and system
US20190051310A1 (en) * 2017-08-10 2019-02-14 Industry-University Cooperation Foundation Hanyang University Method and apparatus for packet loss concealment using generative adversarial network
US20190080701A1 (en) * 2014-03-04 2019-03-14 Genesys Telecommunications Laboratories, Inc. System and Method to Correct for Packet Loss in ASR Systems
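CN109616129A (zh) * 2018-11-13 2019-04-12 南京南大电子智慧型服务机器人研究院有限公司 Hybrid multiple-description sinusoidal coder method for improving speech frame loss concealment performance
CN110265046A (zh) * 2019-07-25 2019-09-20 腾讯科技(深圳)有限公司 Encoding parameter regulation method, apparatus, device, and storage medium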

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3006541B2 (ja) * 1997-05-26 2000-02-07 日本電気株式会社 Communication apparatus and communication system
KR100462024B1 (ko) 2002-12-09 2004-12-17 한국전자통신연구원 Packet loss recovery method using additional speech data and transceiver using the same
US7668712B2 (en) * 2004-03-31 2010-02-23 Microsoft Corporation Audio encoding and decoding with intra frames and adaptive forward error correction
JPWO2008007698A1 (ja) 2006-07-12 2009-12-10 パナソニック株式会社 Lost frame compensation method, speech encoding apparatus, and speech decoding apparatus
US8010351B2 (en) * 2006-12-26 2011-08-30 Yang Gao Speech coding system to improve packet loss concealment
EP2381580A1 (en) 2007-04-13 2011-10-26 Global IP Solutions (GIPS) AB Adaptive, scalable packet loss recovery
US8352252B2 (en) * 2009-06-04 2013-01-08 Qualcomm Incorporated Systems and methods for preventing the loss of information within a speech frame
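CN102036061B (zh) * 2009-09-30 2012-11-21 华为技术有限公司 Video data transmission processing and sending processing methods, apparatus, and network system
CN102143367B (zh) * 2010-01-30 2013-01-30 华为技术有限公司 Error correction check method, device, and system
CN102752184A (zh) * 2011-04-20 2012-10-24 河海大学 Data communication system for real-time multicast services and method thereof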
US9047863B2 (en) * 2012-01-12 2015-06-02 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for criticality threshold control
CN103716718B (zh) * 2013-12-16 2017-03-01 广州华多网络科技有限公司 Data packet transmission method and apparatus
WO2016179382A1 (en) * 2015-05-07 2016-11-10 Dolby Laboratories Licensing Corporation Voice quality monitoring system
US20170084280A1 (en) * 2015-09-22 2017-03-23 Microsoft Technology Licensing, Llc Speech Encoding
EP3228037B1 (en) * 2015-10-01 2018-04-11 Telefonaktiebolaget LM Ericsson (publ) Method and apparatus for removing jitter in audio data transmission
CN107592540B (zh) * 2016-07-07 2020-02-11 腾讯科技(深圳)有限公司 Video data processing method and apparatus
CN108011686B (zh) * 2016-10-31 2020-07-14 腾讯科技(深圳)有限公司 Information coded frame loss recovery method and apparatus
US10714098B2 (en) * 2017-12-21 2020-07-14 Dolby Laboratories Licensing Corporation Selective forward error correction for spatial audio codecs
CN110087140B (zh) * 2018-01-26 2022-07-05 腾讯科技(深圳)有限公司 Method, apparatus, medium, and device for transmitting streaming media data
US10475456B1 (en) * 2018-06-04 2019-11-12 Qualcomm Incorporated Smart coding mode switching in audio rate adaptation
US10990812B2 (en) 2018-06-20 2021-04-27 Agora Lab, Inc. Video tagging for video communications
CN109218083B (zh) * 2018-08-27 2021-08-13 广州猎游信息科技有限公司 Voice data transmission method and apparatus
US10784988B2 (en) * 2018-12-21 2020-09-22 Microsoft Technology Licensing, Llc Conditional forward error correction for network data
CN109862440A (zh) * 2019-02-22 2019-06-07 深圳市凯迪仕智能科技有限公司 Forward error correction method and apparatus for audio and video transmission, computer device, and storage medium
US11509423B2 (en) * 2019-09-09 2022-11-22 Apple Inc. Dynamic redundancy for multimedia content
CN112530444B (zh) * 2019-09-18 2023-10-03 华为技术有限公司 Audio encoding method and apparatus
US11671448B2 (en) * 2019-12-27 2023-06-06 Paypal, Inc. Phishing detection using uniform resource locators
CN111312264B (zh) * 2020-02-20 2023-04-21 腾讯科技(深圳)有限公司 Speech transmission method, system, and apparatus, computer-readable storage medium, and device
US11715480B2 (en) * 2021-03-23 2023-08-01 Qualcomm Incorporated Context-based speech enhancement
US11914599B2 (en) * 2021-11-19 2024-02-27 Hamilton Sundstrand Corporation Machine learning intermittent data dropout mitigation
US20250182773A1 (en) * 2023-12-01 2025-06-05 Comcast Cable Communications, Llc Methods and apparatuses for speech enhancement

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190080701A1 (en) * 2014-03-04 2019-03-14 Genesys Telecommunications Laboratories, Inc. System and Method to Correct for Packet Loss in ASR Systems
CN106937134A (zh) * 2015-12-31 2017-07-07 深圳市潮流网络技术有限公司 Encoding method for data transmission, encoding and sending apparatus, and system
US20190051310A1 (en) * 2017-08-10 2019-02-14 Industry-University Cooperation Foundation Hanyang University Method and apparatus for packet loss concealment using generative adversarial network
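CN109616129A (zh) * 2018-11-13 2019-04-12 南京南大电子智慧型服务机器人研究院有限公司 Hybrid multiple-description sinusoidal coder method for improving speech frame loss concealment performance
CN110265046A (zh) * 2019-07-25 2019-09-20 腾讯科技(深圳)有限公司 Encoding parameter regulation method, apparatus, device, and storage medium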

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4012705A4

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
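CN114513418A (zh) * 2022-04-21 2022-05-17 腾讯科技(深圳)有限公司 Data processing method and related device
CN114513418B (zh) * 2022-04-21 2022-06-24 腾讯科技(深圳)有限公司 Data processing method and related device
TWI907896B (zh) * 2022-12-23 2025-12-11 弗勞恩霍夫爾協會 Error recovery tool for audio encoding/decoding
CN117498892A (zh) * 2024-01-02 2024-02-02 深圳旷世科技有限公司 UWB-based audio transmission method, apparatus, terminal, and storage medium
CN117498892B (zh) * 2024-01-02 2024-05-03 深圳旷世科技有限公司 UWB-based audio transmission method, apparatus, terminal, and storage medium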

Also Published As

Publication number Publication date
CN112820306B (zh) 2023-08-15
JP2022552382A (ja) 2022-12-15
EP4012705A1 (en) 2022-06-15
JP7383138B2 (ja) 2023-11-17
US20260038511A1 (en) 2026-02-05
EP4012705A4 (en) 2022-12-28
CN112820306A (zh) 2021-05-18
EP4012705B1 (en) 2025-07-30
US12451145B2 (en) 2025-10-21
US20220189491A1 (en) 2022-06-16

Similar Documents

Publication Publication Date Title
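CN112820306B (zh) Speech transmission method, system, and apparatus, computer-readable storage medium, and device
CN111312264B (zh) Speech transmission method, system, and apparatus, computer-readable storage medium, and device
CN114333862B (zh) Audio encoding method, decoding method, apparatus, device, storage medium, and product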
US20220180881A1 (en) Speech signal encoding and decoding methods and apparatuses, electronic device, and storage medium
US9275644B2 (en) Devices for redundant frame coding and decoding
JP5405659B2 (ja) Systems and methods for reconstructing an erased speech frame
US20190198027A1 (en) Audio frame loss recovery method and apparatus
RU2432694C2 (ru) Method of transmitting data in a communication system
US20150207710A1 (en) Call Quality Estimation by Lost Packet Classification
US12501085B2 (en) Semantic compression for compute offloading
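CN116580716B (zh) Audio encoding method and apparatus, storage medium, and computer device
CN113763973B (zh) Audio signal enhancement method and apparatus, computer device, and storage medium
CN111371534B (zh) Data retransmission method and apparatus, electronic device, and storage medium
CN114842857B (zh) Speech processing method, apparatus, system, device, and storage medium
HK40044533B (zh) Speech transmission method, system, and apparatus, computer-readable storage medium, and device
HK40044533A (zh) Speech transmission method, system, and apparatus, computer-readable storage medium, and device
HK40024144B (zh) Speech transmission method, system, and apparatus, computer-readable storage medium, and device
HK40024144A (zh) Speech transmission method, system, and apparatus, computer-readable storage medium, and device
HK40071960B (zh) Audio encoding method, decoding method, apparatus, device, storage medium, and product
HK40071960A (zh) Audio encoding method, decoding method, apparatus, device, storage medium, and product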
CN121811890A (zh) Voice communication method, apparatus, and device based on three-group decomposition
CN121121429A (zh) Model transmission method, apparatus, device, medium, and program product
Han et al. Error-Resilient Semantic Communication for Speech Transmission over Packet-Loss Networks
Benamirouche et al. Low complexity forward error correction for CELP-type speech coding over erasure channel transmission
Shukla et al. Enhanced Speech Compression in G. 723 Audio Codec Through Mahalanobis Distance-Based Error Concealment Technique

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application; Ref document number: 20920497; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase; Ref document number: 2020920497; Country of ref document: EP; Effective date: 20220307
ENP Entry into the national phase; Ref document number: 2022522692; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase; Ref country code: DE
WWG WIPO information: grant in national office; Ref document number: 2020920497; Country of ref document: EP
WWG WIPO information: grant in national office; Ref document number: 202237020222; Country of ref document: IN