Disclosure of Invention
The technical problem to be solved by the invention is how to display the content of the conversation between a customer service person and a customer on the computer used by the customer service person, so that the customer service person can refer to and copy it. To this end, the invention provides a customer service assistant tool service method based on speech-to-text conversion.
The invention solves the above technical problem through the following technical scheme, which comprises the following steps:
S1: Calling up the client
When a customer service person answers an incoming call through the foreground system, the assistant client is called up and the customer service person begins to talk with the customer;
S2: Speech acquisition
After starting, the assistant client begins to collect audio, which comprises the voice of the customer service person and the voice of the customer;
S3: Voice transmission
The assistant client sends the two collected voice streams to the assistant server in real time through a websocket interface;
S4: Speech conversion processing
The assistant server or the assistant client converts the collected voice to meet the websocket interface requirements of the ASR (automatic speech recognition) vendor;
S5: Speech recognition
The assistant server sends the processed audio data to the ASR service provider through a websocket interface for recognition, and receives the returned results in real time;
S6: Recognition result return
The assistant server returns the recognition result to the assistant client through the websocket interface;
S7: Display of recognition results
The display interface of the assistant client is presented in an independent window;
S8: Stop of display
After the call ends, the assistant client stops collecting audio data.
Further, in step S1, when the event of the customer service person answering an incoming call is triggered, the foreground system calls up the assistant client through JavaScript and passes it the current call information.
Further, in step S2, the assistant client calls the COM components provided by Windows and collects audio data using Core Audio, wherein the customer's voice is collected in loopback mode and the customer service person's voice is collected in capture mode.
Further, in step S4, the ASR service websocket interface requires PCM audio data at 8000 Hz, mono, 16 bits per sample, so the collected audio data needs to be transcoded into the format required by the ASR service websocket interface.
Further, when the incoming-call answering event is triggered, the assistant client is started, connects to the assistant server and sends a text message; after receiving the text message, the assistant server opens a websocket connection to the ASR service and creates an associated data structure that links the two channels.
Further, when the call end event is triggered, the assistant client sends an end mark to the assistant server, and the assistant server closes the websocket connection to the ASR service and releases the related resources.
The invention also provides a customer service assistant tool service system based on speech-to-text conversion, which serves customers using the above service method and comprises the following modules:
a client calling module, configured to call up the assistant client when the customer service person answers an incoming call through the foreground system and begins to talk with the customer;
a voice acquisition module, configured to collect audio after the assistant client is started, the audio comprising the voice of the customer service person and the voice of the customer;
a voice sending module, configured to have the assistant client send the two collected voice streams to the assistant server in real time through the websocket interface;
a voice conversion processing module, configured to have the assistant server or the assistant client convert the collected voice to meet the websocket interface requirements of the ASR vendor;
a voice recognition module, configured to have the processed audio data recognized by the ASR service provider and to receive the returned results in real time;
a recognition result returning module, configured to have the assistant server return the recognition result to the assistant client through the websocket interface;
a result display module, configured to present the display interface of the assistant client in an independent window;
a display stopping module, configured to stop collecting audio data after the call ends;
a central processing module, configured to send instructions to the other modules so that they complete the related actions;
wherein the client calling module, the voice acquisition module, the voice sending module, the voice conversion processing module, the voice recognition module, the recognition result returning module, the result display module and the display stopping module are all electrically connected with the central processing module.
Compared with the prior art, the invention has the following advantages. Because an assistant server is provided that receives the voice and returns the recognition result, the recognition result of the entire call can be recorded and stored at the assistant server, which facilitates other services such as quality inspection. Because the ASR service is called through an interface, the service provider can be replaced. The customer service person can copy the conversation text at any time or scroll back through the earlier conversation record. The method is therefore worth popularizing.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
Example one
This embodiment provides a technical scheme: a customer service assistant tool service method based on speech-to-text conversion, comprising the following steps:
S1: Calling up the client
When a customer service person answers an incoming call through the foreground system, the assistant client is called up and the customer service person begins to talk with the customer;
S2: Speech acquisition
After starting, the assistant client begins to collect audio, which comprises the voice of the customer service person and the voice of the customer;
S3: Voice transmission
The assistant client sends the two collected voice streams to the assistant server in real time through a websocket interface;
S4: Speech conversion processing
The assistant server or the assistant client converts the collected voice to meet the websocket interface requirements of the ASR (automatic speech recognition) vendor;
S5: Speech recognition
The assistant server sends the processed audio data to the ASR service provider through a websocket interface for recognition, and receives the returned results in real time;
S6: Recognition result return
The assistant server returns the recognition result to the assistant client through the websocket interface;
S7: Display of recognition results
The display interface of the assistant client is presented in an independent window;
S8: Stop of display
After the call ends, the assistant client stops collecting audio data.
In step S1, when the event of the customer service person answering an incoming call is triggered, the foreground system calls up the assistant client through JavaScript and passes it the current call information.
In step S2, the assistant client calls the COM components provided by Windows and collects audio data using Core Audio, wherein the customer's voice is collected in loopback mode and the customer service person's voice is collected in capture mode.
In step S4, the ASR service websocket interface requires PCM audio data at 8000 Hz, mono, 16 bits per sample, so the collected audio data is transcoded into the format required by the ASR service websocket interface.
When the incoming-call answering event is triggered, the assistant client, once started, connects to the assistant server and sends a text message; after receiving the text message, the assistant server opens a websocket connection to the ASR service and creates an associated data structure that links the two channels.
When the call end event is triggered, the assistant client sends an end mark to the assistant server, and the assistant server closes the websocket connection to the ASR service and releases the related resources.
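To illustrate what the conversion in step S4 involves, the following is a minimal Python sketch that turns 16-bit stereo PCM at 48000 Hz into 16-bit mono PCM at 8000 Hz. It is an illustration only and not the transcoder of the invention: the function name `downmix_and_decimate` is hypothetical, the input is assumed to be interleaved little-endian 16-bit samples, and block averaging is used as a crude stand-in for a proper resampling filter.

```python
import struct


def downmix_and_decimate(pcm: bytes, src_rate: int = 48000, dst_rate: int = 8000) -> bytes:
    """Convert interleaved 16-bit LE stereo PCM to 16-bit LE mono at dst_rate."""
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = src_rate // dst_rate  # 6 for 48000 -> 8000

    count = len(pcm) // 2  # number of complete 16-bit samples
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])

    # Average the left and right sample of each stereo frame to obtain mono.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count - 1, 2)]

    # Average each block of `factor` mono samples into one output sample.
    out = [
        sum(mono[i : i + factor]) // factor
        for i in range(0, len(mono) - factor + 1, factor)
    ]
    return struct.pack("<%dh" % len(out), *out)
```

Twelve stereo frames at 48000 Hz yield two mono samples at 8000 Hz, so one second of input (192000 bytes) shrinks to 16000 bytes.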
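The start and end handling described above can be sketched in Python as follows. The text does not specify the format of the text message or of the end mark, so the JSON fields `type` and `call_id`, the class `AssistantServer` and the callbacks `open_asr` and `close_asr` are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict


def make_start(call_id: str) -> str:
    """Text message sent by the assistant client after it connects (hypothetical format)."""
    return json.dumps({"type": "start", "call_id": call_id})


def make_end(call_id: str) -> str:
    """End mark sent by the assistant client when the call ends (hypothetical format)."""
    return json.dumps({"type": "end", "call_id": call_id})


class AssistantServer:
    def __init__(self, open_asr: Callable[[], Any], close_asr: Callable[[Any], None]):
        self._open_asr = open_asr    # opens a websocket connection to the ASR service
        self._close_asr = close_asr  # closes that connection
        self.sessions: Dict[str, Dict[str, Any]] = {}

    def on_text(self, client_conn: Any, text: str) -> None:
        msg = json.loads(text)
        call_id = msg["call_id"]
        if msg["type"] == "start":
            # Create the structure that links the client channel and the ASR channel.
            self.sessions[call_id] = {"client": client_conn, "asr": self._open_asr()}
        elif msg["type"] == "end":
            # Disconnect from the ASR service and release the session.
            session = self.sessions.pop(call_id, None)
            if session is not None:
                self._close_asr(session["asr"])
```

Passing the ASR connection functions in as callbacks keeps the sketch independent of any particular websocket library.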
This embodiment also provides a customer service assistant tool service system based on speech-to-text conversion, which serves customers using the above service method and comprises the following modules:
a client calling module, configured to call up the assistant client when the customer service person answers an incoming call through the foreground system and begins to talk with the customer;
a voice acquisition module, configured to collect audio after the assistant client is started, the audio comprising the voice of the customer service person and the voice of the customer;
a voice sending module, configured to have the assistant client send the two collected voice streams to the assistant server in real time through the websocket interface;
a voice conversion processing module, configured to have the assistant server or the assistant client convert the collected voice to meet the websocket interface requirements of the ASR vendor;
a voice recognition module, configured to have the processed audio data recognized by the ASR service provider and to receive the returned results in real time;
a recognition result returning module, configured to have the assistant server return the recognition result to the assistant client through the websocket interface;
a result display module, configured to present the display interface of the assistant client in an independent window;
a display stopping module, configured to stop collecting audio data after the call ends;
a central processing module, configured to send instructions to the other modules so that they complete the related actions;
wherein the client calling module, the voice acquisition module, the voice sending module, the voice conversion processing module, the voice recognition module, the recognition result returning module, the result display module and the display stopping module are all electrically connected with the central processing module.
Example two
The invention aims to convert the voice of both parties to a call into text displayed in the customer service system, and mainly involves technologies such as voice acquisition from the microphone and the sound card, audio conversion, and speech recognition services. In practice the system comprises an assistant client installed on the customer service person's computer, an assistant server deployed on a server, and a speech recognition (ASR) websocket interface provided by a third-party vendor.
As shown in fig. 1, the main process steps of this embodiment are as follows:
S1: Calling up the client
When a customer service person answers an incoming call through the foreground system (for example, in a browser), the assistant client is called up and the customer service person begins to talk with the customer;
S2: Speech acquisition
After starting, the assistant client begins to capture speech, comprising the voice of the customer service person input to the computer from the microphone and the voice of the customer output from the computer sound card to the headset;
S3: Voice transmission
The assistant client sends the two collected voice streams to the assistant server in real time through a websocket interface;
S4: Speech conversion processing
The assistant server is responsible for converting the voice to meet the websocket interface requirements of the ASR vendor;
for example, the Baidu ASR service interface requires PCM audio data at 8000 Hz, single channel, 16 bits per sample, and the audio data sent by the assistant client generally does not fully meet this requirement, so the assistant server needs to convert the data into a format that fully meets the interface requirement. After the audio format conversion is completed, the audio can be sent to the ASR interface at a certain rate;
S5: Speech recognition
The assistant server sends the processed audio data to the ASR service provider through a websocket interface, and receives the returned results in real time;
conventional speech recognition transcribes recorded speech files, whereas the customer service assistant needs real-time speech recognition, so the ASR service is required to provide real-time recognition. A real-time speech recognition interface is generally provided as a websocket interface: the assistant server sends audio data frames at a certain rate, and the ASR service returns temporary recognition results and final recognition results in real time;
S6: Recognition result return
The assistant server returns the recognition result to the assistant client through the websocket interface;
S7: Display of recognition results
The display interface of the assistant client can exist in a separate window, and the display form can be as shown in fig. 2;
the assistant client is a program that is installed on the customer service person's computer, runs independently, and provides a presentation interface;
S8: Stop of display
After the call ends, the assistant client stops collecting data.
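Step S5 notes that the ASR service returns both temporary and final recognition results. As a minimal sketch of how the assistant client might combine them for display in step S7, the Python class below keeps finalized lines and overwrites the current temporary line until its final version arrives. The class `Transcript`, the method `on_result` and the `is_final` flag are hypothetical; the actual result format depends on the ASR vendor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    lines: List[str] = field(default_factory=list)  # finalized sentences
    pending: Optional[str] = None                   # latest temporary result

    def on_result(self, text: str, is_final: bool) -> None:
        if is_final:
            # A final result replaces whatever temporary text was showing.
            self.lines.append(text)
            self.pending = None
        else:
            # A temporary result overwrites the previous temporary one.
            self.pending = text

    def render(self) -> str:
        shown = list(self.lines)
        if self.pending is not None:
            shown.append(self.pending)
        return "\n".join(shown)
```

Because finalized lines are never modified, the customer service person can copy or scroll back through them while new temporary text is still changing.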
The specific implementation principle of this embodiment is as follows:
Calling up the client: the assistant client registers itself in the registry when it is installed, so that the browser front end can call it up through JavaScript. When the event of the customer service person answering an incoming call is triggered, the front end calls up the assistant client; once started, the assistant client automatically connects to the assistant server and transmits the current call information.
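One common way for a browser page to start an installed Windows program is a custom URL protocol registered in the registry; whether the assistant client uses exactly this mechanism is an assumption. Under that assumption, the Python sketch below shows how the current call information could be encoded into such a URL and decoded again by the assistant client. The scheme name `cs-assistant` and the parameter names are hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit

SCHEME = "cs-assistant"  # hypothetical custom URL scheme


def build_launch_url(call_info: Dict[str, str]) -> str:
    """URL the front end would open to call up the assistant client."""
    return f"{SCHEME}://start?{urlencode(call_info)}"


def parse_launch_url(url: str) -> Dict[str, str]:
    """Recover the call information on the assistant client side."""
    parts = urlsplit(url)
    if parts.scheme != SCHEME:
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}
```

For example, `build_launch_url({"call_id": "c1"})` produces `cs-assistant://start?call_id=c1`, which the client parses back into the same dictionary.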
Audio acquisition: the assistant client calls the COM components provided by Windows and captures audio using Core Audio. The customer's voice is captured in loopback mode, and the voice recorded by the microphone is captured in capture mode.
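Because two streams are captured, the receiving side has to know which speaker each block of audio belongs to. The text does not say how the streams are distinguished, so the following Python sketch shows only one possible framing: a small header carrying a source tag and a sequence number in front of each PCM block. The names `Source`, `pack_frame` and `unpack_frame` and the header layout are assumptions.

```python
import struct
from enum import IntEnum
from typing import Tuple


class Source(IntEnum):
    AGENT = 0     # microphone, captured in capture mode
    CUSTOMER = 1  # sound card output, captured in loopback mode


# Hypothetical header: 1-byte source tag + 4-byte little-endian sequence number.
HEADER = struct.Struct("<BI")


def pack_frame(source: Source, seq: int, pcm: bytes) -> bytes:
    """Prefix a PCM block with its source tag and sequence number."""
    return HEADER.pack(int(source), seq) + pcm


def unpack_frame(frame: bytes) -> Tuple[Source, int, bytes]:
    """Split a received binary frame back into source, sequence number and PCM."""
    source, seq = HEADER.unpack_from(frame)
    return Source(source), seq, frame[HEADER.size:]
```

An alternative with the same effect would be to open one websocket connection per stream, which avoids the header at the cost of a second connection.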
Audio transcoding: the ASR service websocket interface generally requires PCM data at 8000 Hz, single channel, 16 bits per sample, whereas the encoding of the captured data depends on the specific hardware of the current computer (48000 Hz, dual channel is common), so transcoding is required. Transcoding can be performed at either the assistant client or the assistant server, and is implemented by calling FFmpeg, which is chosen for its high encoding and decoding efficiency. If transcoding is performed at the assistant client in order to save network traffic, FFmpeg is integrated into the assistant client; if it is performed at the assistant server, FFmpeg is integrated into the assistant server.
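As a hedged illustration of calling FFmpeg for this conversion, the Python sketch below builds a command line that reads raw PCM from standard input and writes 8000 Hz, single-channel, 16-bit PCM to standard output. The function name `build_ffmpeg_args` is hypothetical, and the assumption that the captured data is raw 16-bit little-endian PCM (`s16le`) may not hold on every machine; the source format parameter should be set to match the actual capture format.

```python
from typing import List


def build_ffmpeg_args(
    src_rate: int,
    src_channels: int,
    src_format: str = "s16le",  # assumed capture format; adjust to the real one
    dst_rate: int = 8000,
    dst_channels: int = 1,
) -> List[str]:
    """Build an FFmpeg command that transcodes raw PCM from stdin to stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        # Input: raw PCM described by format, sample rate and channel count.
        "-f", src_format, "-ar", str(src_rate), "-ac", str(src_channels),
        "-i", "pipe:0",
        # Output: raw 16-bit little-endian PCM in the format the ASR expects.
        "-f", "s16le", "-ar", str(dst_rate), "-ac", str(dst_channels),
        "pipe:1",
    ]
```

The resulting list can be passed to `subprocess.Popen` with `stdin` and `stdout` set to `subprocess.PIPE`, so that captured audio is written to the process and transcoded audio is read back from it.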
Interaction of the assistant client and the assistant server: the websocket is also used for interaction between the assistant client and the assistant server. When the incoming-call answering event is triggered and the assistant client is started, the assistant client first connects to the assistant server and sends a text message containing information such as the sampling rate, the sample bit depth and the number of channels of the audio collected by the current computer system. On receiving the text message, the assistant server opens a websocket connection to the ASR service and creates an associated data structure that links the two channels. The data structure comprises an audio data buffer, the websocket connection channel with the assistant client, and the websocket connection channel with the ASR service. The audio data buffer stores the transcoded audio data sent by the assistant client, and the assistant server takes data from the buffer when sending audio to the ASR service interface. The buffer is needed because the rate at which the assistant client sends data does not match the rate at which the assistant server sends data to the ASR service interface. Thereafter, the assistant client sends audio data continuously at certain time intervals, and the assistant server forwards the recognition results returned by the ASR service to the assistant client as text. When the call end event is triggered, the assistant client sends an end mark to the assistant server, and the assistant server closes the websocket connection to the ASR service and releases the related resources.
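The associated data structure and its buffer can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual implementation: the class name `CallSession`, its method names and the 40 ms frame duration are hypothetical, and the two websocket channels are represented by opaque objects.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

ASR_RATE = 8000        # Hz, as required by the ASR interface
BYTES_PER_SAMPLE = 2   # 16-bit samples, single channel
FRAME_MS = 40          # assumed frame duration; the real value is vendor-specific
FRAME_BYTES = ASR_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 640 bytes


@dataclass
class CallSession:
    client_ws: Any     # websocket channel with the assistant client
    asr_ws: Any        # websocket channel with the ASR service
    sample_rate: int   # reported by the client in its first text message
    sample_bits: int
    channels: int
    buffer: bytearray = field(default_factory=bytearray)

    def push(self, pcm: bytes) -> None:
        """Store transcoded audio as it arrives from the assistant client."""
        self.buffer.extend(pcm)

    def pop_frame(self) -> Optional[bytes]:
        """Take one fixed-size frame for the ASR service, or None if not ready."""
        if len(self.buffer) < FRAME_BYTES:
            return None
        frame = bytes(self.buffer[:FRAME_BYTES])
        del self.buffer[:FRAME_BYTES]
        return frame
```

The client may deliver audio in blocks of any size, while `pop_frame` always hands the ASR side frames of the same length, which is how the buffer decouples the two sending rates.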
To sum up, in the customer service assistant tool service method based on speech-to-text conversion according to this embodiment, an assistant server is provided that receives the voice and returns the recognition result, so the recognition result of the entire call can be recorded and stored at the assistant server, which facilitates other services such as quality inspection. Because the ASR service is called through an interface, the service provider can be replaced. The customer service person can copy the conversation text at any time or scroll back through the earlier conversation record. The method is therefore worth popularizing.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.