CN112802477A

CN112802477A - Customer service assistant tool service method and system based on voice-to-text conversion

Info

Publication number: CN112802477A
Application number: CN202011612549.8A
Authority: CN
Inventors: 张德昌; 丁常坤; 夏兵; 王江淮; 时代红
Original assignee: Kedaduochuang Cloud Technology Co ltd
Current assignee: Kedaduochuang Cloud Technology Co ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-05-14

Abstract

The invention discloses a customer service assistant tool service method and system based on voice-to-text conversion, belonging to the technical field of customer service. The method comprises the following steps: S1: calling up the client; S2: voice collection; S3: voice transmission; S4: voice conversion processing; S5: speech recognition; S6: return of the recognition result; S7: display of the recognition result; S8: stopping display. By setting up an assistant server that receives the voice and returns the recognition result, the recognition result of the whole call can be recorded and saved on the assistant server, which facilitates other services such as quality inspection; the ASR service is called through an interface, so the service provider can be changed; and customer service staff can copy the chat text at any time or review earlier chat records.

Description

Customer service assistant tool service method and system based on voice-to-text conversion

Technical Field

The invention relates to the technical field of customer service, in particular to a customer service assistant tool service method and system based on voice-to-text conversion.

Background

While answering a customer's call, voice customer service personnel often need to record information or confirm it with the customer. When the amount of information is large and no auxiliary means or tools are available, the personnel must rely entirely on memory. On the one hand this increases their burden, and on the other hand inaccurate information leads to repeated inquiries and confirmations with the customer.

Customer service personnel sometimes also need to enter what the customer says into related systems, such as a CRM system, during the conversation. If all of this content is typed in manually, it takes up more call time or distracts the personnel, and working efficiency suffers. Therefore, a customer service assistant tool service method based on voice-to-text conversion is provided.

Disclosure of Invention

The technical problem to be solved by the invention is how to display the content of the conversation between the customer service personnel and the customer on the computer used by the customer service personnel, so that they can refer to it and copy it. To this end, a customer service assistant tool service method based on voice-to-text conversion is provided.

The invention solves the above technical problem through the following technical scheme, which comprises the following steps:

S1: Calling up the client

When the customer service personnel receive an incoming call through the foreground system, the assistant client is called up and communication with the customer begins;

S2: Voice collection

After being started, the assistant client begins to collect audio, including the voice of the customer service personnel and the voice of the customer;

S3: Voice transmission

The assistant client sends the two collected voice streams to the assistant server in real time through a websocket interface;

S4: Voice conversion processing

The assistant server or the assistant client performs the corresponding conversion processing on the collected voice to meet the websocket interface requirements of the ASR (automatic speech recognition) service provider;

S5: Speech recognition

The assistant server sends the processed audio data to the ASR service provider through a websocket interface for recognition, and receives the returned results in real time;

S6: Return of the recognition result

The assistant server returns the recognition result to the assistant client through the websocket interface;

S7: Display of the recognition result

The recognition result is displayed on the assistant client's display interface in an independent window;

S8: Stopping display

After the call ends, the assistant client stops collecting audio data.

Further, in step S1, when the incoming-call answering event is triggered, the foreground system calls up the assistant client through JavaScript and transmits the current call information.

Further, in step S2, the assistant client calls the COM components provided by Windows and collects audio data using Core Audio, where the customer's voice is collected in loopback mode and the voice of the customer service personnel is collected in capture mode.

Furthermore, in step S4, the ASR service websocket interface requires 8000 Hz, mono, 16-bit PCM audio data, so the collected audio data needs to be transcoded into the format required by the ASR service websocket interface.

Furthermore, when the incoming-call answering event is triggered and the assistant client has started, the assistant client first connects to the assistant server and sends text information; after receiving the text information, the assistant server opens a websocket connection to the ASR service and creates a related data structure that associates the two channels.

Furthermore, when the call end event is triggered, the assistant client sends an end mark to the assistant server, and the assistant server disconnects the websocket connection with the ASR service and clears the relevant resources.

The invention also provides a customer service assistant tool service system based on voice-to-text conversion, which uses the above service method to serve customers and comprises:

the client calling module is used for receiving an incoming call through the foreground system, calling up the assistant client, and starting communication with the customer;

the voice acquisition module is used for collecting audio after the assistant client is started, the audio including the voice of the customer service personnel and the voice of the customer;

the voice sending module is used for sending the two collected voice streams from the assistant client to the assistant server in real time through the websocket interface;

the voice conversion processing module is used for performing, on the assistant server or the assistant client, the corresponding conversion processing on the collected voice to meet the websocket interface requirements of the ASR service provider;

the voice recognition module is used for having the processed audio data recognized by the ASR service provider and receiving the returned results in real time;

the recognition result returning module is used for returning the recognition result from the assistant server to the assistant client through the websocket interface;

the result display module is used for displaying the recognition result on the assistant client's display interface in an independent window;

the display stopping module is used for stopping collecting the audio data after the call is finished;

the central processing module is used for sending instructions to other modules to complete related actions;

the client calling module, the voice acquisition module, the voice sending module, the voice conversion processing module, the voice recognition module, the recognition result returning module, the result display module and the display stopping module are all electrically connected with the central processing module.

Compared with the prior art, the invention has the following advantages: in this customer service assistant tool service method based on voice-to-text conversion, an assistant server is set up to receive the voice and return the recognition result, so the recognition result of the whole call can be recorded and saved on the assistant server, which facilitates other services such as quality inspection; the ASR service is called through an interface, so the service provider can be replaced; and the customer service personnel can copy the chat text at any time or scroll back through earlier chat records. The method is therefore worth popularizing and using.

Drawings

FIG. 1 is a schematic diagram illustrating an interaction flow between an assistant client and an assistant server according to a second embodiment of the present invention;

FIG. 2 is a diagram of an example of an independent window interface according to a second embodiment of the present invention.

Detailed Description

The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.

Example one

This embodiment provides the following technical scheme: a customer service assistant tool service method based on voice-to-text conversion, comprising the following steps:

S1: Calling up the client

S2: Voice collection

S3: Voice transmission

S4: Voice conversion processing

S5: Speech recognition

S6: Return of the recognition result

S7: Display of the recognition result

S8: Stopping display

In step S1, when the incoming-call answering event is triggered, the foreground system calls up the assistant client through JavaScript and transmits the current call information.

In step S2, the assistant client calls the COM components provided by Windows and collects audio data using Core Audio, where the customer's voice is collected in loopback mode and the voice of the customer service personnel is collected in capture mode.

In step S4, the ASR service websocket interface requires 8000 Hz, mono, 16-bit PCM audio data, so the collected audio data needs to be transcoded into the format required by the ASR service websocket interface.

When an incoming-call answering event is triggered and the assistant client has started, the assistant client connects to the assistant server and sends text information; after receiving the text information, the assistant server opens a websocket connection to the ASR service and creates a related data structure that associates the two channels.

When a call ending event is triggered, the assistant client sends an ending mark to the assistant server, and the assistant server disconnects the websocket connection with the ASR service and clears related resources.
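The start and end of a call described above can be pictured as two small text messages exchanged over the websocket. The following is only a minimal sketch: the JSON layout, the field names (`type`, `call_id`, `sample_rate`, `bits_per_sample`, `channels`) and the helper functions are hypothetical and are not specified by this embodiment; the audio parameters are the ones the second embodiment lists as contents of the text information.

```python
import json


def make_start_message(call_id: str, sample_rate: int,
                       bits_per_sample: int, channels: int) -> str:
    """Text message the assistant client sends right after connecting (hypothetical layout)."""
    return json.dumps({
        "type": "start",
        "call_id": call_id,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
        "channels": channels,
    })


def make_end_message(call_id: str) -> str:
    """End mark the assistant client sends when the call end event fires (hypothetical layout)."""
    return json.dumps({"type": "end", "call_id": call_id})


def handle_control_message(raw: str, sessions: dict) -> str:
    """Server side: create or release per-call state and report the action taken."""
    msg = json.loads(raw)
    if msg["type"] == "start":
        # A real server would also open the websocket connection to the ASR service here.
        sessions[msg["call_id"]] = {
            "sample_rate": msg["sample_rate"],
            "bits_per_sample": msg["bits_per_sample"],
            "channels": msg["channels"],
        }
        return "opened"
    if msg["type"] == "end":
        # A real server would also close the ASR connection and free its buffers here.
        sessions.pop(msg["call_id"], None)
        return "closed"
    raise ValueError(f"unknown control message type: {msg['type']!r}")
```

Keeping the control messages as text and the audio as binary lets the server tell the two apart by websocket frame type alone, which is one plausible reason for sending "text information" first.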

This embodiment also provides a customer service assistant tool service system based on voice-to-text conversion, which uses the above service method to serve customers.

Example two

The invention aims to convert the voices of both parties to a call into text and display it in the customer service system. It mainly involves capturing voice from the microphone and the sound card, audio conversion, and speech recognition services. A practical system should include an assistant client installed on the customer service personnel's computer, an assistant server deployed on a server, and a speech recognition (ASR) websocket interface provided by a third-party vendor.

As shown in FIG. 1, the main process steps of this embodiment are as follows:

S1: Calling up the client

When the customer service personnel answer an incoming call through the foreground system (for example, in a browser), the assistant client is called up and communication with the customer begins;

S2: Voice collection

After starting, the assistant client begins to capture voice, including the voice of the customer service personnel, input to the computer from the microphone, and the voice of the customer, output from the computer's sound card to the headset;

S3: Voice transmission

The assistant client sends the two collected voice streams to the assistant server in real time through the websocket interface;

S4: Voice conversion processing

The assistant server is responsible for performing the corresponding conversion processing on the voice to meet the websocket interface requirements of the ASR vendor;

For example, a typical Baidu ASR service interface requires 8000 Hz, mono, 16-bit PCM audio data, and the audio data sent by the assistant client generally does not fully meet this requirement, so the assistant server needs to convert the data into a format that fully meets the interface requirement. After the audio format conversion is completed, the audio can be sent to the ASR interface at a certain frequency.
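To make the format requirement concrete, the arithmetic below works out how much data 8000 Hz, mono, 16-bit PCM produces and how large one audio frame would be. The text leaves the sending frequency unspecified; the 160 ms frame duration used here is purely an assumed value for illustration, and the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample // 8


def frame_size_bytes(sample_rate_hz: int, channels: int,
                     bits_per_sample: int, frame_ms: int) -> int:
    """Size in bytes of one frame covering frame_ms milliseconds of audio."""
    return pcm_bytes_per_second(sample_rate_hz, channels, bits_per_sample) * frame_ms // 1000


# Format required by the ASR interface: 8000 Hz, mono, 16-bit.
asr_rate = pcm_bytes_per_second(8000, 1, 16)    # 16000 bytes per second
asr_frame = frame_size_bytes(8000, 1, 16, 160)  # 2560 bytes per assumed 160 ms frame
```

Under these assumptions the assistant server would send one 2560-byte frame every 160 ms, that is, 6.25 frames per second per voice stream.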

S5: Speech recognition

The assistant server sends the processed audio data to the ASR service provider through the websocket interface and receives the returned results in real time;

The conventional speech recognition mode is transcription of speech files, whereas the customer service assistant needs real-time speech recognition, so the ASR service must provide real-time recognition. A real-time speech recognition interface is generally provided as a websocket interface: the assistant server sends audio data frames at a certain frequency, and the ASR service returns temporary recognition results and final recognition results in real time.
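The distinction between temporary and final recognition results affects how the text is assembled for display: a temporary result is replaced by the next result for the same utterance, while a final result is kept. The sketch below assumes each result arrives as a dictionary with a `text` string and an `is_final` flag; this message shape is hypothetical, and real ASR vendors use their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Accumulates final results and tracks the current temporary result."""
    finalized: list[str] = field(default_factory=list)
    pending: str = ""

    def apply(self, result: dict) -> None:
        """Apply one ASR result of the assumed form {"text": str, "is_final": bool}."""
        if result["is_final"]:
            # A final result closes the utterance and discards the temporary text.
            self.finalized.append(result["text"])
            self.pending = ""
        else:
            # A temporary result overwrites the previous temporary text.
            self.pending = result["text"]

    def lines(self) -> list[str]:
        """Lines to display: all final sentences plus the in-progress one, if any."""
        return self.finalized + ([self.pending] if self.pending else [])
```

With this arrangement the display can refresh on every result without duplicating text, because only final results are ever appended.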

S6: Return of the recognition result

S7: Display of the recognition result

The assistant client's display interface can exist in a separate window, and its form can be as shown in FIG. 2;

The assistant client is a program that needs to be installed on the customer service computer; it runs independently and can provide a display interface.

S8: Stopping display

After the call ends, the assistant client stops collecting data.

The specific implementation principle of this embodiment is as follows:

Calling up the client: when installed, the assistant client automatically registers itself in the registry, so that it can be called up from the browser front end through JavaScript. When the incoming-call answering event is triggered, the browser front end calls up the assistant client; once started, the assistant client automatically connects to the assistant server and transmits the current call information.
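One common way for a browser page to start an installed desktop program is a custom URL protocol registered in the Windows registry. The text does not say that this mechanism is used, so the sketch below rests on that assumption. The scheme name `cs-assistant`, the `start` host part and the parameter names `call_id` and `agent_id` are all hypothetical. The code shows how the assistant client might parse the launch URL it receives on startup; the builder stands in for what the browser-side JavaScript would produce.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SCHEME = "cs-assistant"  # hypothetical custom URL scheme registered at install time


def build_launch_url(call_id: str, agent_id: str) -> str:
    """URL the browser front end would open to call up the assistant client."""
    query = urlencode({"call_id": call_id, "agent_id": agent_id})
    return f"{SCHEME}://start?{query}"


def parse_launch_url(url: str) -> dict:
    """Extract the current call information from the launch URL."""
    parts = urlsplit(url)
    if parts.scheme != SCHEME:
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}
```

After parsing, the assistant client would have the call information it needs to send to the assistant server once connected.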

Audio acquisition: the assistant client calls the COM components provided by Windows and captures audio using Core Audio. The customer's voice is captured in loopback mode, and the voice recorded by the microphone is captured in capture mode.
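Because two streams are captured (loopback for the customer, capture for the microphone), each chunk sent over the websocket has to remain attributable to its speaker. The text does not describe how this is done; the sketch below shows one possible framing, with a hypothetical 5-byte header holding a source identifier and a sequence number in front of the PCM payload.

```python
import struct

AGENT = 0     # microphone audio, captured in capture mode
CUSTOMER = 1  # sound card output, captured in loopback mode

# Hypothetical header: 1-byte source id, 4-byte little-endian sequence number.
_HEADER = struct.Struct("<BI")


def pack_chunk(source: int, seq: int, pcm: bytes) -> bytes:
    """Prefix a PCM chunk with its source and sequence number."""
    if source not in (AGENT, CUSTOMER):
        raise ValueError(f"unknown audio source: {source}")
    return _HEADER.pack(source, seq) + pcm


def unpack_chunk(frame: bytes) -> tuple[int, int, bytes]:
    """Split a received binary frame into (source, sequence number, PCM payload)."""
    source, seq = _HEADER.unpack_from(frame)
    return source, seq, frame[_HEADER.size:]
```

An alternative with the same effect would be one websocket connection per stream; the header approach simply keeps both streams on a single connection.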

Audio transcoding: the ASR service websocket interface generally requires 8000 Hz, mono, 16-bit PCM data, whereas the encoding of the captured data depends on the specific hardware of the computer (48000 Hz, two-channel is common), so transcoding is required. The transcoding function can be implemented on either the assistant client or the assistant server. Audio transcoding is implemented by calling FFmpeg, which is used because of its high encoding and decoding efficiency. If transcoding is done on the assistant client, in order to save network traffic, FFmpeg is integrated into the assistant client; if transcoding is done on the assistant server, FFmpeg is integrated into the assistant server.
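As an illustration of the transcoding step, the sketch below builds an FFmpeg command line that reads raw PCM from standard input and writes 8000 Hz, mono, 16-bit PCM to standard output. The text does not say how FFmpeg is invoked (as a command-line process or as a linked library), so the command-line form is an assumption, as is the default input sample format `s16le`; the function name is hypothetical.

```python
def build_ffmpeg_args(src_rate_hz: int, src_channels: int,
                      src_format: str = "s16le") -> list[str]:
    """Command line that converts raw PCM on stdin to 8000 Hz mono 16-bit PCM on stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        # Input: headerless PCM, so format, rate and channel count must be stated.
        "-f", src_format, "-ar", str(src_rate_hz), "-ac", str(src_channels),
        "-i", "pipe:0",
        # Output: the format required by the ASR websocket interface.
        "-f", "s16le", "-ar", "8000", "-ac", "1",
        "pipe:1",
    ]
```

The resulting list can be passed to `subprocess.Popen` with `stdin` and `stdout` set to pipes. For the common 48000 Hz, two-channel, 16-bit case, transcoding before transmission reduces the data rate from 192000 to 16000 bytes per second, which is the network saving the client-side option refers to.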

Interaction between the assistant client and the assistant server: websocket is also used for the interaction between the assistant client and the assistant server. When an incoming-call answering event is triggered and the assistant client has started, the assistant client first connects to the assistant server and sends text information, which includes the sampling rate, the number of bits per sample, the number of channels, and similar properties of the audio collected on the current computer system. Upon receiving the text information, the assistant server opens a websocket connection to the ASR service and creates a related data structure that associates the two channels. The data structure includes an audio data buffer, the websocket connection channel to the assistant client, and the websocket connection channel to the ASR service. The audio data buffer stores the transcoded audio data received from the assistant client, and the assistant server takes data from this buffer when sending audio to the ASR service interface. The buffer is needed because the frequency at which the assistant client sends data does not match the frequency at which the assistant server sends data to the ASR service interface. Thereafter, the assistant client sends audio data continuously at certain time intervals, and the assistant server sends the recognition results returned by the ASR service to the assistant client as text. When the call end event is triggered, the assistant client sends an end mark to the assistant server, and the assistant server disconnects the websocket connection with the ASR service and clears the related resources.
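The data structure described above can be sketched as a small per-call session object that holds the two channels and a byte buffer. The buffer absorbs chunks of arbitrary size from the assistant client and hands out fixed-size frames for the ASR service. The class and method names are hypothetical, the channel fields are placeholders for real websocket connection objects, and the default frame size of 2560 bytes (160 ms of 8000 Hz, mono, 16-bit audio) is an assumed value.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CallSession:
    """Per-call state on the assistant server, associating the two channels."""
    client_channel: Any       # websocket connection to the assistant client (placeholder)
    asr_channel: Any          # websocket connection to the ASR service (placeholder)
    frame_bytes: int = 2560   # assumed ASR frame size
    buffer: bytearray = field(default_factory=bytearray, init=False)

    def on_client_audio(self, pcm: bytes) -> None:
        """Store transcoded audio as it arrives, whatever the chunk size."""
        self.buffer.extend(pcm)

    def next_asr_frame(self) -> Optional[bytes]:
        """Take one fixed-size frame for the ASR service, or None if not enough data yet."""
        if len(self.buffer) < self.frame_bytes:
            return None
        frame = bytes(self.buffer[:self.frame_bytes])
        del self.buffer[:self.frame_bytes]
        return frame

    def flush(self) -> bytes:
        """Return and clear whatever remains, for use when the end mark arrives."""
        rest = bytes(self.buffer)
        self.buffer.clear()
        return rest
```

A sender loop on the server would call `next_asr_frame()` at the ASR interface's own pace, which decouples it from the rate at which the assistant client delivers audio.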

In summary, in the customer service assistant tool service method based on voice-to-text conversion of this embodiment, an assistant server is set up to receive the voice and return the recognition result, so the recognition result of the whole call can be recorded and saved on the assistant server, which facilitates other services such as quality inspection; the ASR service is called through an interface, so the service provider can be replaced; and the customer service personnel can copy the chat text at any time or scroll back through earlier chat records. The method is therefore worth popularizing and using.
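The point that the service provider can be replaced, and that results for the whole call are kept on the assistant server, can be illustrated with a small interface. This is a simplified sketch: real ASR services are streaming and asynchronous, whereas here a provider is reduced to a single synchronous method. All class names are hypothetical, and the two providers are stand-ins that do no actual recognition.

```python
from typing import Protocol


class AsrProvider(Protocol):
    """What the assistant server needs from any ASR vendor adapter."""
    def recognize(self, pcm_frame: bytes) -> str: ...


class VendorAStub:
    """Stand-in for an adapter wrapping one vendor's websocket interface."""
    def recognize(self, pcm_frame: bytes) -> str:
        return f"A:{len(pcm_frame)}"


class VendorBStub:
    """Stand-in for an adapter wrapping a different vendor's interface."""
    def recognize(self, pcm_frame: bytes) -> str:
        return f"B:{len(pcm_frame)}"


class AssistantServer:
    """Forwards frames to whichever provider is configured and logs every result."""
    def __init__(self, provider: AsrProvider) -> None:
        self.provider = provider
        self.call_log: list[str] = []

    def handle_frame(self, pcm_frame: bytes) -> str:
        text = self.provider.recognize(pcm_frame)
        self.call_log.append(text)  # retained for later use, such as quality inspection
        return text
```

Swapping vendors then only means constructing `AssistantServer` with a different adapter; the assistant client and the stored call log are unaffected.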

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A customer service assistant tool service method based on voice-to-text conversion, characterized by comprising the following steps:

S1: Calling up the client

When the customer service personnel receive an incoming call through the foreground system, the assistant client is called up and communication with the customer begins;

S2: Voice collection

After the assistant client is started, it begins to collect audio, including the voice of the customer service personnel and the voice of the customer;

S3: Voice transmission

The assistant client sends the two collected voice streams to the assistant server in real time through the websocket interface;

S4: Voice conversion processing

The assistant server or the assistant client performs the corresponding conversion processing on the collected voice to meet the websocket interface requirements of the ASR service provider;

S5: Speech recognition

The assistant server sends the processed audio data to the ASR service provider through the websocket interface for recognition, and receives the returned results in real time;

S6: Return of the recognition result

S7: Display of the recognition result

The recognition result is displayed on the assistant client's display interface in an independent window;

S8: Stopping display

After the call ends, the assistant client stops collecting audio data.

2. The customer service assistant tool service method based on voice-to-text conversion according to claim 1, characterized in that: in step S1, when the incoming-call answering event is triggered, the foreground system calls up the assistant client through JavaScript and transmits the current call information.

3. The customer service assistant tool service method based on voice-to-text conversion according to claim 2, characterized in that: in step S2, the assistant client calls the COM components provided by Windows and collects audio data using Core Audio, wherein the customer's voice is collected in loopback mode and the voice of the customer service personnel is collected in capture mode.

4. The customer service assistant tool service method based on voice-to-text conversion according to claim 3, characterized in that: in step S4, the ASR service websocket interface requires 8000 Hz, mono, 16-bit PCM audio data, and the collected audio data is transcoded into the format required by the ASR service websocket interface.

5. The customer service assistant tool service method based on voice-to-text conversion according to claim 4, characterized in that: when an incoming-call answering event is triggered and the assistant client has started, the assistant client first connects to the assistant server and sends text information; after receiving the text information, the assistant server opens a websocket connection to the ASR service and creates a related data structure that associates the two channels.

6. The customer service assistant tool service method based on voice-to-text conversion according to claim 5, characterized in that: when the call end event is triggered, the assistant client sends an end mark to the assistant server, and the assistant server disconnects the websocket connection with the ASR service and cleans up the related resources.

7. A customer service assistant tool service system based on voice-to-text conversion, using the service method according to any one of claims 1 to 6 to serve customers, comprising:

The client calling module is used to receive an incoming call through the foreground system, call up the assistant client, and start communication with the customer;

The voice acquisition module is used to collect audio after the assistant client is started, including the voice of the customer service personnel and the voice of the customer;

The voice sending module is used to send the two collected voice streams from the assistant client to the assistant server in real time through the websocket interface;

The voice conversion processing module is used to perform, on the assistant server or the assistant client, the corresponding conversion processing on the collected voice to meet the websocket interface requirements of the ASR service provider;

The voice recognition module is used to have the processed audio data recognized by the ASR service provider and to receive the returned results in real time;

The recognition result returning module is used to return the recognition result from the assistant server to the assistant client through the websocket interface;

The result display module is used to display the recognition result on the assistant client's display interface in an independent window;

The display stopping module is used to stop collecting audio data after the call ends;

The central processing module is used to issue instructions to the other modules to complete the related actions;

The client calling module, voice acquisition module, voice sending module, voice conversion processing module, voice recognition module, recognition result returning module, result display module and display stopping module are all electrically connected with the central processing module.