WO2024106830A1 - Phonebook-based voiceprint operation method and electronic device supporting same - Google Patents
Phonebook-based voiceprint operation method and electronic device supporting same
- Publication number
- WO2024106830A1 (PCT application PCT/KR2023/017650)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- electronic device
- learning
- voiceprint
- voice data
- voiceprint model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition › G10L15/28—Constructional details of speech recognition systems › G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G10L17/00—Speaker identification or verification techniques › G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/00—Speaker identification or verification techniques › G10L17/04—Training, enrolment or model building
- G10L17/00—Speaker identification or verification techniques › G10L17/18—Artificial neural networks; Connectionist approaches
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility › G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation › G10L21/0208—Noise filtering
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 › G10L25/48—specially adapted for particular use › G10L25/51—for comparison or discrimination
- H—ELECTRICITY › H04—ELECTRIC COMMUNICATION TECHNIQUE › H04M—TELEPHONIC COMMUNICATION › H04M9/00—Arrangements for interconnection not involving centralised switching › H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
Definitions
- Audio operation functions may include music playback functions, video playback functions, and call functions.
- an echo may occur in which a signal transmitted from the other party is transmitted back to the other party through the microphone (MIC) of the user's terminal.
- to address this, electronic devices use echo cancellers, and noise suppression technology is used to remove unnecessary noise from the surrounding environment.
- even when an electronic device uses an echo canceller solution, it can reduce linear echo but is still affected by vibration or reverberation of the speaker or receiver, so residual echo (noise left after the linear echo has been removed) may occur and distorted speech may be transmitted to the other party.
- this description provides a method and device capable of improving sound quality when operating the audio operation function of an electronic device.
- it provides a method that distinguishes the voice of a specific speaker and removes ambient noise and residual echo other than the target speaker's voice.
- a method and device that can provide improved received-call sound quality can be provided; other examples will be mentioned below along with descriptions of various embodiments.
- the electronic device of the present disclosure may include a communication circuit that supports a call function, a memory that stores a phonebook, and a processor functionally connected to the communication circuit and the memory.
- the processor forms a call channel with the other party's electronic device in response to a call function activation request based on the identification information of the other party's electronic device registered in the phonebook; when voice data transmitted by the other party's electronic device is received, the processor may be set to link a voiceprint model generated based on that voice data to the identification information of the other party's electronic device, and to store the voiceprint model linked to that identification information in the phonebook.
- a phonebook-based voiceprint operation method includes: forming a call channel with the other party's electronic device in response to a call function activation request based on the identification information of the other party's electronic device registered in the phonebook of the electronic device; receiving voice data transmitted by the other party's electronic device; if the signal quality of the voice data is better than a specified reference value, processing the generation of a voiceprint model based on the voice data; linking the voiceprint model generated based on the voice data to the identification information of the other party's electronic device; and storing the voiceprint model linked to that identification information in the phonebook.
- This device is an example of the audio operation function of an electronic device, and supports improved call quality by utilizing differentiated artificial intelligence (AI) technology (deep learning) when making a call. For example, this device trains (or learns) the voice of a specific speaker through deep learning and, based on the trained speaker's voice, effectively removes other voices or noise, thereby improving call quality by eliminating ambient noise and residual echo without voice loss.
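The speaker-conditioned filtering idea above can be sketched as follows. This is an illustrative toy, not the patented implementation: the per-frame embeddings, the cosine-similarity comparison, and the 0.7 threshold are all assumptions standing in for a real speaker-aware DNN front end.

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def suppress_non_target(frames, frame_embeddings, target_embedding, threshold=0.7):
    """Pass frames attributed to the trained speaker; mute everything else.

    frames: audio frames (lists of samples); frame_embeddings: one embedding
    per frame, which a real system would produce with a speaker-aware DNN.
    """
    out = []
    for frame, emb in zip(frames, frame_embeddings):
        if cosine_similarity(emb, target_embedding) >= threshold:
            out.append(frame)                 # target speaker: keep
        else:
            out.append([0.0] * len(frame))    # other voice / noise: mute
    return out
```

In practice the suppression would be a soft spectral mask rather than hard muting, but the gating structure is the same.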
- This effect is one effect of the present disclosure, and descriptions of other effects will be mentioned along with the description of the examples.
- Figure 1 is a diagram illustrating an example of a system environment supporting a phonebook-based voiceprint operation function according to an embodiment.
- FIG. 2 is a diagram illustrating an example of the configuration of an electronic device according to an embodiment.
- FIG. 3 is a diagram illustrating an example of a configuration of a first type electronic device including an audio processor according to an embodiment.
- FIG. 4 is a diagram illustrating an example of a configuration of a second type electronic device including an audio processor according to an embodiment.
- Figure 5 is a diagram showing an example of a server device configuration according to an embodiment.
- FIG. 6 is a diagram illustrating an example of an electronic device operation method related to phonebook-based voiceprint modeling according to an embodiment.
- FIG. 7 is a diagram illustrating an example of a first electronic device operation method related to phonebook-based voiceprint model operation according to an embodiment.
- FIG. 8 is a diagram illustrating an example of a second electronic device operation method related to phonebook-based voiceprint model operation according to an embodiment.
- FIG. 9 is a diagram illustrating an example of a method of using a server device of an electronic device related to operation of a phonebook-based voiceprint model according to an embodiment.
- Figure 10 is a diagram illustrating an example of a screen interface related to voiceprint modeling according to an embodiment.
- FIG. 11 is a diagram illustrating another example of a screen interface related to voiceprint modeling according to an embodiment.
- FIG. 12 is a block diagram of an electronic device 1201 in a network environment 1200, according to various embodiments.
- this device eliminates the hassle of having to register the user's voice in advance through various procedures, or of conducting intentional learning, when training the voice of a specific speaker.
- the target to be learned or trained can be automatically or easily specified based on the electronic device user's phone book (or contact information), and voiceprint learning can be performed on the specified target's call voice.
- Figure 1 is a diagram illustrating an example of a system environment supporting a phonebook-based voiceprint operation function according to an embodiment.
- a system environment 10 supporting a phonebook-based voiceprint operation function may include a first electronic device 101 (or transmitting device) (or a user terminal, portable terminal, or portable electronic device), a second electronic device 102 (or receiving device) (or a user terminal, portable terminal, or portable electronic device), a server device 200 (or cloud server device), and a network 50.
- the first electronic device 101 and the second electronic device 102 are named based on which device originates the call; the first electronic device 101 and the second electronic device 102 may be the same kind of electronic device.
- the first electronic device 101 and the second electronic device 102 may each support both transmitting and receiving functions.
- the configuration of the server device 200 is additionally described in relation to supporting the voiceprint operation function of the first electronic device 101 and the second electronic device 102, but the present invention is not limited thereto.
- the server device 200 may be omitted from the system environment 10.
- the network 50 can form a communication channel in at least one of the following relationships: between the first electronic device 101 and the second electronic device 102, between the first electronic device 101 and the server device 200, and between the second electronic device 102 and the server device 200.
- the network 50 may include, for example, at least one of a wireless communication network element and a wired communication network element.
- the network 50 may include at least one of a mobile communication network including at least one base station, a base station controller, and a core system, and an Internet network connected to the mobile communication network.
- the network 50 may support the formation of a communication channel for a portable terminal based on the mobile communication network.
- the network 50 is a component that can transmit and receive signals or data by forming a communication channel between the server device 200 and the first electronic device 101 (or the second electronic device 102), or between the first electronic device 101 and the second electronic device 102, and is not limited to a specific communication method or communication equipment.
- the server device 200 forms a communication channel with the first electronic device 101 or the second electronic device 102 through the network 50, and can support phonebook-based voiceprint learning and voiceprint model creation functions in response to requests from the first electronic device 101 or the second electronic device 102. For example, the server device 200 prepares a communication channel for connection of the first electronic device 101 (or the second electronic device 102), and when the first electronic device 101 (or the second electronic device 102) is connected, information related to phonebook-based voiceprint learning and voiceprint model generation may be provided to it. The server device 200 may collect voice data corresponding to a specific device user from the first electronic device 101 (or the second electronic device 102) and perform voiceprint modeling on the collected voice data.
- the server device 200 may receive, from the first electronic device 101 (or the second electronic device 102), a voiceprint model being learned that corresponds to a specific device user together with newly collected voice data for that user, perform voiceprint model learning using the received voice data, and provide an updated voiceprint model to the first electronic device 101 (or the second electronic device 102).
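The round trip above (the device sends the in-progress model plus new audio; the server returns an updated model) can be sketched as below. This is a minimal stand-in: the "model" is just a running sample count with a completion flag, and the required_total of 5 is an assumed threshold, not anything specified by the disclosure.

```python
def server_update_voiceprint(model_in_training, new_voice_data, required_total=5):
    """Fold newly received voice samples into a partially trained model and
    report whether training is now complete.

    A real server would run DNN training on the audio and return updated
    network weights; here we only track how much data has been learned.
    """
    count = model_in_training.get("samples_learned", 0) + len(new_voice_data)
    return {"samples_learned": count, "complete": count >= required_total}
```

The device can keep calling this across calls until `complete` is true, then switch from learning to filtering.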
- the configuration of the server device 200 may be omitted.
- the first electronic device 101 may form a communication channel with the second electronic device 102 through the network 50.
- the first electronic device 101 may collect the user's voice and transmit it as first voice data (or transmitted voice data) to the second electronic device 102, and may receive second voice data (or received voice data) from the second electronic device 102.
- if the identification information of the user of the second electronic device 102 is stored in the phonebook, the first electronic device 101 may create a first voiceprint model of the user of the second electronic device 102 based on that identification information.
- the first electronic device 101 may map and store the generated first voiceprint model to the identification information of the user of the second electronic device 102 pre-stored in the phonebook.
- the first electronic device 101 may learn the first voiceprint model using the second voice data collected based on the identification information of the user of the second electronic device 102, until learning of the first voiceprint model is completed. When learning is completed, the first electronic device 101 uses the learned first voiceprint model to perform voice filtering on the second voice data when making a call with the second electronic device 102, thereby supporting a noise-improvement function.
- the first electronic device 101 may perform a process of generating a first voiceprint model through the server device 200.
- the first electronic device 101 may collect second voice data corresponding to the user identification information of the second electronic device 102 in real time, after completion of a call, or when at least a certain amount of second voice data has been collected during a call, and may transmit the second voice data to the server device 200 to request learning of the first voiceprint model.
- the first electronic device 101 may generate new identification information corresponding to the identification information of the user of the second electronic device 102 recorded in the phonebook, and may provide the new identification information, the second voice data, and the first voiceprint model being learned to the server device 200.
- the first electronic device 101 may receive, from the server device 200, a first voiceprint model updated through learning (e.g., an updated first voiceprint model whose learning is complete, or an updated first voiceprint model that requires additional learning), confirm the new identification information, and update the first voiceprint model by matching the new identification information with phonebook information (e.g., the identification information of the user of the second electronic device 102).
- the second electronic device 102 may form a communication channel with the first electronic device 101 through the network 50.
- the second electronic device 102 may receive a call connection request message from the first electronic device 101 and connect a call with the first electronic device 101 in response to a user operation.
- the second electronic device 102 may transmit second voice data input by the user to the first electronic device 101 and output first voice data received from the first electronic device 101.
- the second electronic device 102 may receive, from the first electronic device 101, a message requesting permission to perform first voiceprint modeling (or learning of a first voiceprint model) on the second voice data (or speech information) of the user of the second electronic device 102.
- the second voice data collected by the second electronic device 102 and transmitted to the first electronic device 101 may be used by the first electronic device 101 for first voiceprint modeling of the user of the second electronic device 102.
- the user of the second electronic device 102 may disallow first voiceprint modeling of his or her voice data (e.g., the second voice data); when a user input corresponding to the disallowance occurs, the second electronic device 102 may transmit a message corresponding to the disallowance of first voiceprint modeling to the first electronic device 101.
- the second electronic device 102 may use the first voice data transmitted by the first electronic device 101, automatically (or with the consent of the user of the first electronic device 101), to learn a second voiceprint model corresponding to the user identification information of the first electronic device 101 pre-stored in the phonebook.
- the second electronic device 102 may perform the process of second voiceprint modeling (or learning of a second voiceprint model) for the first voice data of the user of the first electronic device 101 through the server device 200.
- the second electronic device 102 may transmit, to the server device 200, at least a portion of the first voice data received from the first electronic device 101 together with the second voiceprint model being learned that is pre-stored in correspondence with the first electronic device 101. If there is no second voiceprint model being learned, the second electronic device 102 may transmit only the first voice data to the server device 200.
- for the purpose of protecting personal information, the second electronic device 102 generates new identification information corresponding to the first voice data (e.g., random information that excludes anything that can identify the electronic device user, such as the user's name, phone number, and email address), transmits it to the server device 200 together with the first voice data, and may receive from the server device 200 an updated second voiceprint model mapped to the new identification information (or an updated second voiceprint model whose learning is not yet complete).
- the second electronic device 102 may match the second voiceprint model received from the server device 200 (a model being learned or a model whose learning is complete) to the phonebook and store it.
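The privacy step above (a random identifier replaces the contact's name or number when talking to the server, then the returned model is re-matched on the device) can be sketched as follows. The `registry` dict and the `"contact:alice"` key are hypothetical; the disclosure only requires that the identifier carry no personal information.

```python
import secrets

def make_anonymous_id() -> str:
    """Random identifier derived from nothing personal (no name, number, or email)."""
    return secrets.token_hex(16)

# On-device registry so the updated model returned by the server can be
# re-matched to the right phonebook entry; it never leaves the device.
registry = {}
anon = make_anonymous_id()
registry[anon] = "contact:alice"  # hypothetical phonebook key

def rematch(registry, anon_id):
    """Map an identifier in the server's response back to a phonebook entry."""
    return registry.get(anon_id)
```

Because the identifier is random rather than derived from the contact's data, the server learns nothing about who the voice belongs to.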
- FIG. 2 is a diagram illustrating an example of the configuration of an electronic device according to an embodiment.
- the electronic device described in FIG. 2 may be at least one of the first electronic device 101 or the second electronic device 102 described previously in FIG. 1. Accordingly, hereinafter, the first electronic device 101 and the second electronic device 102 are collectively referred to by reference numeral 100.
- the electronic device 100 may include a communication circuit 110, an input/output device 120, a memory 130, a display 140, and a processor 150.
- the communication circuit 110 may form at least one communication channel in connection with supporting the communication function of the electronic device 100.
- the communication circuit 110 may form a communication channel with the server device 200 through the network 50.
- the communication circuit 110 may support at least one communication method among various communication methods such as 3G, 4G, LTE, or 5G.
- the communication circuit 110 may include a plurality of communication modules to support a plurality of communication methods.
- the communication circuit 110 may create a call channel (e.g., a voice call channel or a video call channel) with another electronic device in response to the control of the processor 150.
- the communication circuit 110 can transmit the user's voice data to another electronic device and receive voice data from that electronic device.
- the communication circuit 110 may transmit voice data received from another electronic device to the server device 200. If there is a voiceprint model being learned based on the other electronic device's voice data, the communication circuit 110 may transmit the received voice data and the voiceprint model being learned together to the server device 200, and may receive an updated voiceprint model from the server device 200.
- the input/output device 120 may include at least one input means supporting an input function of the electronic device 100 and at least one output means supporting an output function.
- the input means may include various means such as a touch pad, touch keys, physical keys, physical buttons, a voice input device, a jog shuttle, or a joystick.
- when the display 140 includes a touch screen that supports a touch function, the display 140 may be included in the input means.
- the input means may include, for example, at least one microphone 121.
- the output means may include at least one speaker 122 that outputs an audio signal.
- the output means may include a vibration module that outputs vibration of a specific pattern and an LED lamp that outputs light of a specific color.
- the memory 130 can store various data or programs necessary for operating the electronic device 100.
- the memory 130 may store an application that can form a call channel with another electronic device.
- the memory 130 may store the phone book 131 and at least one voiceprint model 132 matched to each user identification information registered in the phone book 131.
- the at least one voiceprint model 132 may be generated, for example, based on voice data of another electronic device user stored in the phonebook 131.
- the voiceprint model 132 may include a voiceprint model in training that has been learned from less than a predefined amount of voice data.
- at least one voiceprint model 132 may include a voiceprint model that has been trained by learning a predefined amount of voice data or more.
- User identification information registered in the phonebook 131 and the at least one voiceprint model 132 may be matched with each other.
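The matching between phonebook entries and voiceprint models described above can be represented with a simple data structure. This is an illustrative sketch only: the field names, the dataclass layout, and the 60-second completion threshold are assumptions, since the text only speaks of a "predefined amount" of learned voice data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

TRAINING_COMPLETE_SECONDS = 60.0  # assumed threshold for "learning complete"

@dataclass
class VoiceprintModel:
    seconds_learned: float = 0.0
    weights: List[float] = field(default_factory=list)  # stand-in for DNN parameters

    @property
    def training_complete(self) -> bool:
        return self.seconds_learned >= TRAINING_COMPLETE_SECONDS

@dataclass
class PhonebookEntry:
    name: str
    phone_number: str
    voiceprint: Optional[VoiceprintModel] = None  # matched model, if any

# Phonebook keyed by the contact's identification information.
phonebook = {
    "+82-10-0000-0000": PhonebookEntry("Alice", "+82-10-0000-0000",
                                       VoiceprintModel(seconds_learned=75.0)),
    "+82-10-1111-1111": PhonebookEntry("Bob", "+82-10-1111-1111"),  # no model yet
}
```

This shape directly supports the display behavior described next: entries with a model in training, a completed model, or no model at all can each be rendered differently.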
- the display 140 may provide various screens according to the operation of the electronic device 100.
- the display 140 may output at least one of: a search screen for the phonebook 131, a transmission screen for requesting a call connection with another electronic device user registered in the phonebook 131, and a reception screen for receiving a call connection request from another electronic device.
- the display 140 may output status information of at least one voiceprint model 132.
- the display 140 may separately output a voiceprint model that is being learned and a voiceprint model whose learning is complete, together with user identification information (e.g., at least one of name, phone number, email, and SNS address).
- the display 140 may output at least one of: a connection screen for the server device 200 that performs voiceprint modeling, a screen for transmitting voice data received from another electronic device to the server device 200, and a screen for transmitting a voiceprint model being learned and received voice data to the server device 200.
- the display 140 may output at least one of a screen for receiving an updated voiceprint model from the server device 200 and a screen for applying the received updated voiceprint model to the voiceprint model 132 stored in the memory 130.
- the processor 150 may perform signal transmission, processing, and storage control according to the operation of the electronic device 100. For example, the processor 150 may output a screen corresponding to the phonebook 131 (or call list) in response to a user input. The processor 150 may output user identification information registered in the phonebook 131 (or call list screen) and information corresponding to the voiceprint model 132 matched to each piece of user identification information. If there is no voiceprint model matched in the phonebook 131, the processor 150 may output the user identification information without a matched voiceprint model to the display 140. When specific user identification information is selected and a user input requesting a call connection is received, the processor 150 may perform an operation to connect a call channel with another electronic device. Additionally or alternatively, the processor 150 may output to the display 140 an object (e.g., a virtual button) for deciding whether to execute voiceprint modeling on a screen containing user identification information.
- the processor 150 may perform voiceprint model learning based on voice data acquired during a call with another electronic device. In this operation, if there is an existing voiceprint model being trained that matches the other electronic device's user identification information, the processor 150 may apply the currently received voice data of that device's user to the voiceprint model being trained, continuing its training.
- as a learning method for voiceprint modeling, at least one of various artificial neural network models that generate voiceprints from voice data may be used.
- the electronic device 100 may store at least one artificial neural network algorithm.
- the electronic device 100 may provide the acquired voice data to the server device 200 and generate or update a voiceprint model using an artificial neural network algorithm stored in the server device 200.
- the processor 150 may output a reception screen for receiving a call connection request message.
- the reception screen may output a virtual object that can determine whether to accept a call, and a virtual object that can determine whether to perform a voiceprint model based on voice data from another electronic device.
- the processor 150 may output a learning status (e.g., learning or completion of learning, remaining percentage until learning completion) for the voiceprint model of another electronic device requesting a call connection.
- FIG. 3 is a diagram illustrating an example of a configuration of a first type electronic device including an audio processor according to an embodiment.
- a first type electronic device 100_1 may include a communication circuit 110, a first type processor 151, and an input/output device 120 (e.g., a speaker 122 (or receiver) and a microphone 121).
- the communication circuit 110 may include, for example, at least one reception amplifier 110_2a, at least one transmission amplifier 110_2b, and an RF module 110_1.
- This communication circuit 110 may be the same or similar to the communication circuit shown in FIG. 2 above.
- the communication circuit 110 may further include at least one antenna for receiving a signal and at least one antenna for transmitting a signal.
- the at least one receiving amplifier 110_2a may be connected to at least one antenna of the first type electronic device 100_1.
- the at least one reception amplifier 110_2a may include at least one low noise amplifier.
- the transmission amplifier 110_2b may amplify the signal transmitted by the transmission processor 150b (tx solution processor) and transmit the amplified transmission signal to at least one antenna.
- the transmission amplifier 110_2b may receive a transmission signal from an echo canceller included in the audio processor.
- the RF module 110_1 may perform RF processing on a signal amplified through the at least one reception amplifier 110_2a. According to one embodiment, the RF module 110_1 may identify an abnormal Rx signal (or wireless reception signal). When an abnormal Rx signal is received, or when data with deteriorated sound quality due to a network problem is received, the RF module 110_1 may exclude the corresponding Rx signal. In this operation, the RF module 110_1 can check RF bit error information in the received Rx signal and determine whether the Rx signal is normal based on the check result. The RF module 110_1 may transmit the received Rx signal to the signal-to-noise ratio (SNR) detector 150a_1.
- the first type processor 151 may include an audio processor as an application processor.
- the first type processor 151 including the audio processor may include, for example, a receiving processor 150a (rx solution processor) and a transmitting processor 150b (tx solution processor).
- the receiving processor 150a may include an SNR detection unit 150a_1, a learning module 150a_2, a storage control module 150a_3, a noise suppression module 150a_4, and an audio output control module 150a_5.
- at least one of the SNR detection unit 150a_1, learning module 150a_2, storage control module 150a_3, noise suppression module 150a_4, and audio output control module 150a_5 may be implemented as a software module or as a hardware component.
- the SNR detection unit 150a_1 may check the SNR of the received voice data delivered from the RF module 110_1.
- the SNR detection unit 150a_1 may check whether the SNR of the received voice data is less than a predetermined value. If the SNR is less than the predetermined value, the SNR detection unit 150a_1 may pass the received voice data on for output only. If the SNR of the received voice data is at or above the pre-specified value, the SNR detection unit 150a_1 may transmit the received voice data to the learning module 150a_2 (e.g., a deep neural network (DNN) training module) for learning of the voiceprint model.
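The SNR gate just described (low-SNR audio is only played back; clean audio additionally feeds voiceprint training) can be sketched as follows. The 15 dB threshold is an assumed value; the text only says "pre-specified value".

```python
import math

SNR_LEARNING_THRESHOLD_DB = 15.0  # assumed; the disclosure says "pre-specified value"

def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)

def route_received_voice(signal_power: float, noise_power: float) -> str:
    """Route received call audio: noisy frames are output only, clean frames
    are also forwarded to the DNN training module."""
    if snr_db(signal_power, noise_power) >= SNR_LEARNING_THRESHOLD_DB:
        return "output_and_learn"
    return "output_only"
```

Gating on SNR keeps low-quality network audio from corrupting the voiceprint model, matching the "signal quality better than a specified reference value" condition in the claimed method.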
- the learning module 150a_2 (e.g., a speaker-aware DNN block) can capture received voice data transmitted through the SNR detector 150a_1 and learn (e.g., DNN training) from the captured voice data.
- the learning module 150a_2 can control the output of a virtual object, through a phonebook-related user interface (UI) (or display screen), for deciding whether to collect voice data so that learning can be performed in relation to creating a voiceprint model.
- when a call is connected to the other party using the phonebook, and voice data that may come from a person other than the user registered in the phonebook is received, the learning module 150a_2 may determine the similarity between the currently collected voice data and the voiceprint model previously stored and matched to the other party's phonebook entry, and then decide whether to apply the collected voice data to learning. For example, if the similarity between the currently collected voice data and the previously stored voiceprint model is greater than or equal to a threshold, the learning module 150a_2 performs learning of the voiceprint model based on the collected voice data. If the similarity is less than the threshold, the learning module 150a_2 does not perform learning and excludes the collected voice data from the model (data non-reflection processing). The learning module 150a_2 can process learning on received voice data in real time.
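The similarity-gated learning decision can be sketched with speaker embeddings and cosine similarity. The threshold, the embedding representation, and the function names are illustrative assumptions; the disclosure does not specify a particular similarity measure.

```python
import math

SIM_THRESHOLD = 0.7  # hypothetical similarity threshold

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def should_learn(voice_embedding, stored_voiceprint) -> bool:
    """Apply the collected voice data to training only when it is similar
    enough to the voiceprint already matched to the phonebook entry;
    otherwise perform data non-reflection processing (skip learning)."""
    return cosine_similarity(voice_embedding, stored_voiceprint) >= SIM_THRESHOLD
```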
- the learning module 150a_2 may perform learning (e.g., voiceprint modeling) on previously received voice data.
- the learning module 150a_2 may store voice data of the other party during a call, and perform learning based on the stored received voice data after the call is completed. In relation to this operation, the learning module 150a_2 may separately extract the voice data of the other party during a call.
- the learning module 150a_2 may learn the received voice data in at least one of the following cases: when voice data of a predefined length or duration is received, when voice data corresponding to predefined speech information (e.g., voice data containing a specific word or sentence) is received, when voice data exceeding a predefined amplitude (or volume) is received, or when voice data uttered at a speed exceeding a predefined rate is received.
- the conditions under which learning progresses may include at least one of the various conditions described above. The various conditions described above can be selected or adjusted by the user.
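The user-adjustable learning conditions above can be sketched as a single predicate. All threshold values, the keyword list, and the parameter names here are illustrative assumptions, not values given in the disclosure.

```python
def meets_learning_condition(duration_s, transcript, amplitude, speech_rate,
                             min_duration_s=3.0, keywords=("hello",),
                             min_amplitude=0.2, min_rate_wps=1.0) -> bool:
    """Learning proceeds when at least one of the conditions holds:
    sufficient length, a predefined word/sentence, sufficient amplitude
    (volume), or a sufficiently fast speech rate."""
    return (duration_s >= min_duration_s
            or any(k in transcript for k in keywords)
            or amplitude >= min_amplitude
            or speech_rate >= min_rate_wps)
```

Since the conditions are OR-combined and the defaults are keyword arguments, a user-selected subset can be expressed simply by tightening or loosening the individual thresholds.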
- the storage control module 150a_3 can control the voiceprint model learned through an artificial neural network to be stored in the memory 130.
- the storage control module 150a_3 may store voice information (or a voiceprint model) learned through a deep neural network (DNN) in the memory 130 (e.g., database storage) as an extract speaker profile.
- the storage control module 150a_3 may match the learned voiceprint model to user identification information registered in the phonebook 131 and store it.
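The matching of learned voiceprint models to phonebook identification information can be sketched as a small keyed store. The class and method names are assumptions for illustration; the disclosure only requires that each model be stored matched to the registered user identification information.

```python
class StorageControlModule:
    """Minimal sketch: store each learned voiceprint model matched to the
    user identification information registered in the phonebook."""

    def __init__(self):
        self.speaker_profiles = {}  # stands in for memory 130 / database storage

    def store(self, phonebook_id: str, voiceprint_model) -> None:
        """Match (or link) the model to the phonebook entry and store it."""
        self.speaker_profiles[phonebook_id] = voiceprint_model

    def lookup(self, phonebook_id: str):
        """Retrieve the extract speaker profile for a phonebook entry, if any."""
        return self.speaker_profiles.get(phonebook_id)
```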
- this device can identify the other party using the phonebook information that the user uses when making a call and, based on this, create and operate a voiceprint stored as the other party's unique extract speaker profile.
- the extract speaker profile block can be linked to the phonebook information to continuously learn from the other party's voice data and update the model until learning is complete.
- the noise suppression module 150a_4 may perform noise suppression processing on received voice data.
- the noise suppression module 150a_4 may transmit noise-suppressed data to the audio output control module 150a_5.
- the audio output control module 150a_5 may include at least one of Gain, Filter, Limiter, and DRC (dynamic range control) that can tune (e.g., adjust volume and tone) noise-suppressed data.
- the audio output control module 150a_5 can remove noise from received voice data based on the voiceprint model stored in the learned extract speaker profile (database storage).
- if a voiceprint model exists whose learning has been completed at or above a predefined certain ratio, the noise suppression module 150a_4 and the audio output control module 150a_5 may perform noise removal using that voiceprint model. If the learning level of the voiceprint model is less than the predefined certain ratio, only the specified noise removal operation can be performed without using the voiceprint model.
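The fallback logic above can be sketched as follows; the ratio threshold and the mode names are illustrative assumptions.

```python
LEARNING_RATIO_THRESHOLD = 0.8  # hypothetical "predefined certain ratio"

def select_noise_removal(voiceprint_model, learning_ratio: float) -> str:
    """Use voiceprint-based noise removal only when a model exists and its
    learning has progressed past the predefined ratio; otherwise fall back
    to the specified (generic) noise removal without the model."""
    if voiceprint_model is not None and learning_ratio >= LEARNING_RATIO_THRESHOLD:
        return "voiceprint_noise_removal"
    return "generic_noise_removal"
```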
- the transmission processor 150b may include an echo canceller 150b_1. According to one embodiment, additionally or alternatively, the transmission processor 150b may include at least one of Gain, Filter, Limiter, and DRC that can tune the transmission signal (e.g., adjust volume and timbre).
- the transmit processor 150b may include a Noise Suppressor (NS) block for removing background noise.
- the echo canceller 150b_1 may include an adaptive filter 150b_1a and an ambient noise removal module 150b_2b.
- the echo canceller 150b_1 stores an echo reference, which is the received (Rx) signal used as a reference for echo removal, and adapts the adaptive filter 150b_1a based on the stored echo reference value to linearly remove the echo.
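Linear echo removal with an adaptive filter is commonly implemented with a least-mean-squares (LMS) update; the sketch below, with assumed tap count and step size, shows the idea of estimating the echo from the stored Rx echo reference and subtracting it from the microphone signal. The disclosure does not name LMS specifically.

```python
def lms_echo_cancel(mic, echo_ref, taps=4, mu=0.05):
    """LMS adaptive-filter sketch: estimate the echo from the echo
    reference (the Rx signal) and subtract it from the mic signal."""
    w = [0.0] * taps
    out = []
    for n in range(len(mic)):
        # Tap-delay line over the echo reference
        x = [echo_ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wi * xi for wi, xi in zip(w, x))   # estimated echo
        e = mic[n] - y                              # echo-removed output
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # adapt weights
        out.append(e)
    return out
```

On a stationary echo path the residual decays toward zero as the filter converges, which is the "linear" portion of removal; residual echo and ambient noise are then handled by the voiceprint-based module.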
- the ambient noise removal module can remove at least some of the surrounding noise and residual echo by referring to the voice information (voiceprint model) through the extract speaker profile block in which the other person's voice has been learned.
- the above-described audio processor may be configured as a hardware processor. At least one component described as being included in the receiving processor 150a and the transmitting processor 150b included in the audio processor may be composed of at least one of a hardware processor or a software module.
- FIG. 4 is a diagram illustrating an example of a configuration of a second type electronic device including an audio processor according to an embodiment.
- the second type electronic device 100_2 may include a communication circuit 110, a second type processor 152, and an input/output device 120 (e.g., a speaker 122 (or receiver), a first microphone 121a, and a second microphone 121b).
- the communication circuit 110 may include at least one receiving antenna, at least one transmitting antenna, and an RF module.
- the communication circuit 110 (eg, RF module) may constitute at least a portion of a communication processor.
- the at least one receiving antenna may be disposed in a receiving path to receive voice data transmitted by another electronic device (or transmitted through a base station).
- the at least one transmission antenna may be disposed in a transmission path to transmit a signal generated for transmission.
- the RF module may perform RF reception processing (e.g., frequency filtering, frequency conversion) on a received voice signal, or may perform RF transmission processing (e.g., frequency conversion) on a transmitted voice signal.
- This communication circuit 110 may have a configuration corresponding to the communication circuit previously described in FIGS. 2 and 3.
- the second type processor 152 may include a channel decoder 152_1a, an Rx vocoder 152_2a, a reception processor 152a, a digital-to-analog converter (DAC) 152_3A, a learning module 152a_2, and a storage control module 152a_3 disposed in the reception path. At least some of the components of the second type processor 152 may be formed as software modules or hardware components.
- the DAC 152_3A may convert the audio processed data (eg, digital signal) transmitted from the receiving processor 152a into an analog signal and transmit it to the speaker 122.
- the channel decoder 152_1a is connected to the communication circuit 110 and can decode a channel through which voice data is transmitted among the received signals received through the communication circuit 110.
- the channel decoder 152_1a can check the channel through which voice data is transmitted through channel decoding, and transmit the voice data transmitted through the channel to the Rx vocoder 152_2a.
- the Rx vocoder 152_2a can synthesize the original voice of voice data and transmit the synthesized original voice to the receiving processor 152a.
- the Rx vocoder (152_2a) can transmit the synthesized original voice to the SNR detector (152A_1) of the receiving processor (152a).
- the SNR detection unit 152A_1 can detect the SNR for data received from the Rx vocoder 152_2a, as previously described with respect to the first type processor 151 of FIG. 3.
- the SNR detector 152A_1 may transmit the received voice data to the learning module 152a_2 when the SNR of the currently received voice data is greater than or equal to a pre-specified value (e.g., when the received voice data is good).
- the receiving processor 152a can perform at least one of gain processing, limiter processing, noise suppression processing, and DRC processing on the voice data and then transmit the data to the DAC 152_3A. If the received voice data is good, the learning module 152a_2 may receive the received voice data from the SNR detector 152A_1 and perform learning on the received voice data.
- Voice data learning of the learning module 152a_2 may be performed in the same or similar manner as the voiceprint modeling of the learning module 152a_2 previously described in FIG. 3.
- the learning module 152a_2 transmits the learned voiceprint model to the storage control module 152a_3, and the storage control module 152a_3 can store the learned voiceprint model in a certain area (e.g., the extract speaker profile) of the memory 130.
- the storage control module 152a_3 may match (or link) the other party's identification information registered in the phonebook and store it.
- the storage control module 152a_3 transfers the voiceprint model to the receiving processor 152a when the learning degree of a specific voiceprint model is greater than or equal to a specified value or when learning is completed, supporting its application to filtering of received voice data.
- the receiving processor 152a performs audio processing (e.g., at least one of gain, limiter, noise suppressing (NS), and dynamic range control (DRC)) on the received voice data, and converts the audio-processed voice data into a DAC ( 152_3A).
- the receiving processor 152a may receive a voiceprint model corresponding to the identification information of the other party currently on the phone from the storage control module 152a_3, and perform noise filtering or other voice filtering based on the received voiceprint model.
- the receiving processor 152a may perform the above-described voiceprint model application from the start of the call to the end of the call.
- the receiving processor 152a may perform filtering based on the voiceprint model when the SNR of the received voice data is less than a specified value.
- the receiving processor 152a checks the learning progress of the voiceprint model corresponding to the other party's identification information, and can perform filtering based on the voiceprint model when the learning progress exceeds a predefined certain value (e.g., when voice data of a certain length (or time) has been processed more than a predefined number of times, or when learning has been completed).
- when the learning of the voiceprint model is less than a predefined value, the receiving processor 152a proceeds with learning of the voiceprint model during a call, and when learning during the call is completed (or when the learning progress becomes greater than or equal to the predefined value), filtering based on the voiceprint model can be performed from that point on, or during the next call.
- the receiving processor 152a can output a pop-up asking whether to apply filtering based on the voiceprint model at the time learning is completed (or when the degree of learning progress exceeds a specified value) and perform filtering based on the voiceprint model in response to the user's selection.
- the second type processor 152 may include an analog-to-digital converter (ADC) 152_3B, an echo canceller 152b, a Tx encoder 152_2b, and a channel encoder 152_1b disposed in the transmission path. At least one of the echo canceller 152b, Tx encoder 152_2b, and channel encoder 152_1b may be formed as a software module or hardware configuration.
- the ADC (152_3B) is connected to the first microphone (121a) and can convert the user's voice signal collected by the first microphone (121a) into a digital signal and transmit it to the echo canceller (152b).
- the second microphone 121b is a digital microphone and can convert the received voice into a digital signal without a separate ADC process and transmit it to the echo canceller 152b.
- the first microphone 121a is described as an analog microphone and the second microphone 121b is described as a digital microphone, but the present invention is not limited thereto.
- the second type electronic device 100_2 may include a plurality of microphones, and all of the plurality of microphones may be analog microphones, or at least some of the microphones may be analog microphones and at least one remaining microphone may be a digital microphone.
- the echo canceller 152b may have the same or similar configuration as the echo canceller previously described in FIG. 3.
- the echo canceller 152b may include an adaptive filter 152b_1 and an ambient noise removal module 152b_2.
- when the echo canceller 152b receives user voice data from the microphones 121a and 121b, it may receive a voiceprint model corresponding to the other party's identification information from the storage control module 152a_3 and perform echo filtering based on the received voiceprint model.
- the echo canceller 152b may remove the echo based on the adaptive filter, remove ambient noise based on the voiceprint model, and then transmit the noise-removed signal to the Tx encoder 152_2b.
- the Tx encoder (152_2b) converts the received data into a packet that can be loaded on a channel, and the channel encoder (152_1b) can load the packet on a channel allocated for voice data transmission.
- At least some of the components included in the above-described digital signal processor may be configured as hardware processors.
- at least some of the components arranged in the digital signal processor may be configured as hardware components, and the remaining components may be configured as software modules.
- the electronic device (at least one of 100, 101, 102, 100_1, and 100_2) of the present disclosure that receives voice data learns from it (e.g., DNN training) and generates a voiceprint model. Based on this, noise can be removed to support a clearer call function, and continuous updating of the voiceprint model can be supported by accumulating voice data from each call and learning automatically (or according to user selection).
- the electronic device (at least one of 100, 101, 102, 100_1, and 100_2) of the present disclosure supports removing not only the echo formed by the other party's voice but also at least some of the surrounding noise and residual echo.
- the first type electronic device 100_1 and the second type electronic device 100_2 described in FIGS. 3 and 4 have been described as configurations related to removal of at least a portion of ambient noise and residual echo according to an embodiment of the present disclosure, and may be applied to at least one of the electronic devices (e.g., 101, 102, and 100) previously described in FIGS. 1 and 2. Additionally, electronic devices including the configurations described for the first type electronic device 100_1 and the second type electronic device 100_2 may also apply at least some of the phonebook-based voiceprint operation methods described below.
- each of the above-described configurations of the first type electronic device 100_1 and the second type electronic device 100_2 is presented as an example to support the phonebook-based voiceprint operation of this disclosure, and this disclosure is not limited thereto.
- at least some of the components described in the first type electronic device 100_1 and the second type electronic device 100_2 may be omitted, or the locations described (or shown) may be changed.
- at least part of the structure of Figure 3 may be omitted or replaced with at least part of the structure of Figure 4, and at least part of the structure of Figure 4 may also be omitted or replaced with at least part of the structure of Figure 3.
- at least one electronic device described in FIGS. 1 to 4 may further include at least some of the components of the electronic device in FIG. 12 described later.
- Figure 5 is a diagram showing an example of a server device configuration according to an embodiment.
- the server device 200 may include a server communication circuit 210, a server memory 230, and a server processor 250.
- the server communication circuit 210 may support the communication function of the server device 200.
- the server communication circuit 210 may form a communication channel with at least one electronic device 100 through the network 50.
- the server communication circuit 210 may transmit and receive, with the electronic device 100, various data or signals to support voiceprint modeling of the electronic device (at least one of 100, 101, 102, 100_1, and 100_2; hereinafter described with reference to 100).
- the server communication circuit 210 may receive, from the electronic device 100, voice data (e.g., the second voice data in FIG. 1) that the electronic device 100 received from the other electronic device for voiceprint modeling.
- the server communication circuit 210 may receive a voiceprint model being learned corresponding to the other electronic device from the electronic device 100 in response to the control of the server processor 250.
- the server communication circuit 210 can provide the electronic device 100 with at least one of a voiceprint model generated based on the received voice data, an updated voiceprint model in which the voiceprint model being learned is updated based on the received voice data, and a voiceprint model for which learning has been completed.
- the server memory 230 may store data or programs necessary for operating the server device 200. According to one embodiment, the server memory 230 may store voice data information 231 including received voice data received from the electronic device 100 (e.g., voice data the electronic device 100 received from the other electronic device) and identification information for identifying the other electronic device, voiceprint model learning information 232 created by performing learning based on the received voice data, and a learning algorithm 233 for voiceprint modeling.
- the voiceprint model learning information 232 may include a voiceprint model being learned or a voiceprint model that has completed learning corresponding to the learning result of the received voice data, and identification information for identifying the other party's electronic device.
- the identification information for identifying the other party's electronic device is information generated by the electronic device 100 randomly or according to a certain rule, and can protect personal information by preventing direct matching between the voiceprint model and the phone number or name of the other party's electronic device.
- the identification information for identifying the other party's electronic device may be replaced with the name, phone number, email, or SNS address information of the other party's electronic device.
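One way to generate such privacy-preserving identification information is a salted hash alias kept mapped to the phonebook entry only on the device; the server then sees only the alias. This is an illustrative sketch, not the rule the disclosure prescribes, and the digest length is an assumption.

```python
import hashlib
import secrets

def make_anonymous_id(phonebook_entry: str):
    """Generate a random alias for the counterpart so the server never
    sees a voiceprint model directly matched to a phone number or name.
    The device keeps the (alias -> entry) mapping; only the alias leaves it.
    Returns (alias, salt)."""
    salt = secrets.token_hex(8)  # fresh randomness per counterpart
    alias = hashlib.sha256((salt + phonebook_entry).encode()).hexdigest()[:16]
    return alias, salt
```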
- the server processor 250 may control the transmission and processing of signals for operating the server device 200 and storage of the processing results.
- the server processor 250 may include a voiceprint model learning unit 251 and a user information management unit 252 in relation to supporting the voiceprint modeling function according to an embodiment.
- the voiceprint model learning unit 251 may perform voiceprint modeling (or learning) on the received voice data received from the electronic device 100.
- the voiceprint model learning unit 251 may generate a voiceprint model by applying a learning algorithm (eg, deep neural network, DNN algorithm) to the received voice data.
- the voiceprint model learning unit 251 may transmit the generated voiceprint model to the electronic device 100.
- the voiceprint model learning unit 251 can set the completeness of the voiceprint model for the received voice data.
- the voiceprint model learning unit 251 may also provide the electronic device 100 with information corresponding to the degree of learning progress (e.g., 30%, 50%, ...).
- the voiceprint model learning unit 251 may predefine the number of learning times for voice data of a certain length or more or the length (or time) of voice data used for overall learning.
- the length of speech data associated with the completion of training of the voiceprint model can be statistically defined.
- the voice data length related to the completion of learning of the voiceprint model may be determined based on the degree of data distortion of the filtering result after filtering is performed by applying the learned voiceprint model to the actual received voice data.
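The completion criteria above (a predefined number of learning passes or a predefined total length of voice data) can be sketched as a progress function like the one reported to the device. Both threshold values and the function name are illustrative assumptions.

```python
REQUIRED_TRAIN_SECONDS = 120.0  # illustrative statistically-defined total length
REQUIRED_TRAIN_COUNT = 20       # illustrative number of learning passes

def learning_progress(total_seconds: float, train_count: int) -> float:
    """Progress toward completion in [0.0, 1.0], e.g. 0.3 -> '30%'.
    Learning is considered complete when either criterion is met."""
    by_time = total_seconds / REQUIRED_TRAIN_SECONDS
    by_count = train_count / REQUIRED_TRAIN_COUNT
    return min(1.0, max(by_time, by_count))
```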
- when the voiceprint model learning unit 251 receives a voiceprint model being learned along with received voice data from the electronic device 100, it can perform voiceprint modeling (or learning) by applying the received voice data to the received in-training voiceprint model.
- the voiceprint model learning unit 251 may store and manage a voiceprint model in progress (or a voiceprint model for which learning has not been completed) in the server memory 230. When learning of the voiceprint model is completed, the voiceprint model learning unit 251 may provide the learned voiceprint model to the electronic device 100 while deleting the related information (e.g., the learned voiceprint model) from the server memory 230.
- the user information management unit 252 may store and manage information related to the electronic device 100 that provided the received voice data. According to one embodiment, the user information management unit 252 may store at least part of the identification information of the electronic device 100, the identification information of the other electronic device for which the electronic device 100 requests voiceprint modeling, and the voiceprint model being learned through the received voice data provided by the electronic device 100. In this operation, in relation to personal information protection, the identification information of the other party's electronic device may be replaced with arbitrary information generated by the electronic device 100. When learning of a specific voiceprint model is completed, the user information management unit 252 may delete the identification information of the other party's electronic device related to that voiceprint model.
- the server processor 250 can receive voice data from the electronic device 100 in real time.
- the electronic device 100 may establish a call channel with the other electronic device and, upon receiving voice data from the other electronic device, provide the voice data to the server device 200.
- the server processor 250 may check the SNR of the received voice data, and if the SNR is less than a specified value, the received voice data may be discarded without applying it to voiceprint modeling.
- the server processor 250 may inform the electronic device 100 of at least part of the discard processing and reason for discarding.
- the SNR detector described in the electronic devices 100_1 and 100_2 described in FIGS. 3 and 4 may be omitted.
- the voice data extraction operation of the server processor 250 may be omitted.
- FIG. 6 is a diagram illustrating an example of an electronic device operation method related to phonebook-based voiceprint modeling according to an embodiment.
- the processor 150 of the electronic device 100 may perform a phonebook-based call connection in operation 601.
- the processor 150 may control the phonebook screen to be output on the display 140 in response to a phonebook execution request.
- the phonebook screen may include an item that can set the learning function to be turned on or off.
- when a user input requesting a call connection occurs after a specific item (or identification information of the other party's electronic device) is selected from the phonebook list (or the identification information list of the other party's electronic device) displayed on the phonebook screen, the processor 150 transmits a call connection request message to the other electronic device, and if the other electronic device accepts the call connection, the call can be connected. When the call is connected, the processor 150 may output a screen corresponding to the call connection on the display 140.
- the call connection screen includes at least one item related to the call function (e.g., call end button, speaker switch button, keypad selection button), and can turn on or off the learning function related to the voiceprint model. Items may be included.
- At least one of the phonebook screen and the call connection screen can output information indicating the learning level of the voiceprint model trained on the voice data of the other party's electronic device (e.g., text indicating the learning level or a progress bar corresponding to the learning level).
- the processor 150 may receive a voice signal from the other party's electronic device.
- the user of the other electronic device may make a speech and transmit his or her voice signal to the electronic device 100.
- the processor 150 may check whether the received voice signal is an abnormal signal. For example, the processor 150 can check whether there is network loss or whether the SNR of the received voice signal is appropriate. According to one embodiment, the processor 150 may determine a received voice signal to be an abnormal signal when the signal reception strength received from the network (or base station) is less than a specified value. Alternatively, the processor 150 may check the bit error rate of the received voice signal and, if the bit error rate is greater than a specified value, determine it to be an abnormal signal. Alternatively, the processor 150 may check the signal-to-noise ratio (SNR) of the received voice signal and determine it to be an abnormal signal if the SNR is less than a specified value.
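The three alternative abnormality checks above can be combined into one predicate. The specific threshold values and names below are illustrative assumptions; the disclosure only says each quantity is compared against a specified value.

```python
MIN_RX_STRENGTH_DBM = -110.0  # illustrative reception-strength floor
MAX_BIT_ERROR_RATE = 0.01     # illustrative BER ceiling
MIN_SNR_DB = 10.0             # illustrative SNR floor

def is_abnormal_rx(strength_dbm: float, ber: float, snr_db: float) -> bool:
    """A received voice signal is treated as abnormal when reception
    strength is too low, the bit error rate is too high, or the SNR
    is too low."""
    return (strength_dbm < MIN_RX_STRENGTH_DBM
            or ber > MAX_BIT_ERROR_RATE
            or snr_db < MIN_SNR_DB)
```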
- the processor 150 may check whether the learning function is turned on.
- the processor 150 may check whether an artificial intelligence (AI) training block is activated in relation to checking the learning function turn-on setting.
- the processor 150 can provide a voice recognition function related to the learning function settings; if the user does not wish to train on the voice data of the other electronic device's user, the above-described AI training block can be turned off, and if the user wishes to train on the voice data, the AI training block can be changed to the turn-on state.
- the processor 150 may output a pop-up window or object through which the settings of the learning function can be changed to the display 140. Alternatively, the processor 150 may output at least one of a notification sound or guidance information indicating that the learning function is turned on or off. According to one embodiment, when learning of the voiceprint model corresponding to the other party's electronic device is completed, the processor 150 can output at least one of a guide sound or guidance information informing that the learned voiceprint model is being applied according to the settings.
- operation 607 may be omitted.
- a function to turn off the learning function may be provided.
- the processor 150 may set the voiceprint modeling function of other users registered in the phonebook to be turned off by default, and for a counterpart with whom the number of call connections exceeds a specified number (or with whom call connections have lasted longer than a specified time), the learning function can be automatically switched to the turn-on state (or switched upon acceptance after user confirmation).
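The default-off, auto-enable behavior can be sketched as follows; the count and duration thresholds and the parameter names are illustrative assumptions.

```python
CALL_COUNT_THRESHOLD = 5        # illustrative "specified number of times"
CALL_SECONDS_THRESHOLD = 600.0  # illustrative "specified time"

def learning_enabled(call_count: int, total_call_seconds: float,
                     user_confirmed: bool = True) -> bool:
    """Voiceprint learning is off by default and switches on automatically
    (or after user confirmation) once the counterpart has been called more
    than a specified number of times or for longer than a specified time."""
    qualifies = (call_count > CALL_COUNT_THRESHOLD
                 or total_call_seconds > CALL_SECONDS_THRESHOLD)
    return qualifies and user_confirmed
```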
- the processor 150 may perform voice data learning. For example, the processor 150 can transfer the received voice data to a learning module (e.g., the learning module 150a_2 in FIG. 3 or the learning module 152a_2 in FIG. 4) (e.g., a DNN block) to perform voice training.
- the processor 150 may receive a learning algorithm from the memory 130 and perform learning on the received voice data using the received learning algorithm.
- the processor 150 may compare the similarity (or correlation) between the learning result and the previously stored voiceprint model to determine whether the similarity is greater than or equal to a reference value. For example, the processor 150 may compare the similarity between the voiceprint model generated as a result of learning the received voice data and the previously stored voiceprint model.
- the processor 150 may compare the received voice signal (e.g., the mother's voice signal) with the father's voice signal already stored in the phonebook (or in a storage area associated with the phonebook), and if the similarity is less than the reference value, determine that it is the voice signal of someone other than the father. If it is determined to be another person's voice signal, the processor 150 may process the currently received voice signal (e.g., the mother's voice signal) so that it is not applied to learning (e.g., learning the father's voiceprint model).
- the processor 150 may also perform comparison of voiceprint models generated from the voice signals instead of comparing the voice signals directly (e.g., comparing the in-training voiceprint model corresponding to the father's entry stored in the phonebook with the voiceprint model generated from the mother's voice signal).
- when the similarity between the voiceprint model generated from the received voice signal (e.g., the mother's voice signal) and the father's in-training voiceprint model (or a previously stored voice signal) stored in the phonebook is less than the reference value, the processor 150 can detect a user item whose similarity is higher than the reference value (e.g., a mother item previously stored in the phonebook) by comparing against the voice signals (or pre-stored voiceprint models) of other user items registered in the phonebook list.
- if another user item is detected, the processor 150 may apply the currently received voice signal (e.g., the mother's voice signal) to the pre-stored voiceprint model being learned for that item (e.g., the voiceprint model corresponding to the mother's voice signal). If no other user item is detected, the processor 150 may exclude the received voice signal (e.g., the mother's voice signal) from learning.
- even if another user item (e.g., the mother item stored in the phonebook) corresponding to the received voice signal (e.g., the mother's voice signal) is detected, the processor 150 may skip additional learning when learning of the corresponding voiceprint model is already completed. Additionally or alternatively, the processor 150 may filter the received voice signal (e.g., the mother's voice signal) using a voiceprint model (e.g., a fully learned voiceprint model) corresponding to the detected user item.
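The fallback search and the completed-model branch described above can be sketched together. All names, the threshold, the embedding representation, and the returned action labels are illustrative assumptions, not the disclosed implementation.

```python
def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def route_voice(embedding, entries, reference=0.75):
    """entries: {name: {"model": vector, "learning_done": bool}}.
    Returns ("learn", name), ("filter", name), or ("discard", None)."""
    best_name, best_sim = None, reference
    for name, entry in entries.items():
        sim = cosine_similarity(embedding, entry["model"])
        if sim >= best_sim:
            best_name, best_sim = name, sim
    if best_name is None:
        return ("discard", None)      # no other user item detected
    if entries[best_name]["learning_done"]:
        return ("filter", best_name)  # learning completed: filter instead
    return ("learn", best_name)       # apply to that item's model being learned
```

A list of family members of the calling device could be scanned first, matching the priority ordering described above.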
- when detecting another user item, the processor 150 may first perform voice signal comparison (or model comparison) with other user items (e.g., items registered as family members) that are highly related to the other electronic device (e.g., the father's portable communication device).
- the processor 150 may update the learning result in operation 613.
- the processor 150 may integrate the learning results for the received voice data into the previously stored voiceprint model being learned, and store the integrated voiceprint model in a speaker profile database (or in the memory 130).
- the learned and updated voiceprint model can be stored in synchronization with the other party's contact information.
- when an abnormal voice signal is received in operation 605, when the learning function is set to turn-off in operation 607, or when the similarity is less than the reference value in operation 611, the processor 150 may perform data non-reflection processing in operation 615. For example, the processor 150 may output the received voice data through a speaker after performing general audio processing, without applying it to learning.
- the processor 150 may check whether an event related to the end of learning occurs. For example, the processor 150 may check whether at least one of the following occurs: an event related to call termination, a user input turning off the learning function, or reception of a message from the other party's electronic device requesting that the learning function not be executed. If an event related to the end of learning occurs, the processor 150 ends the learning function; otherwise, the processor 150 branches to before operation 603 and re-performs the subsequent operations.
- FIG. 7 is a diagram illustrating an example of an electronic device operation method related to phonebook-based voiceprint model operation according to an embodiment.
- the processor 150 of the electronic device 100 may perform a phonebook-based call connection in operation 701.
- Operation 701 may be the same or similar to operation 601 described above.
- the phonebook-based call connection may include not only a call connected through the phonebook, but also a call in which the other party's phone number is entered through the virtual number keypad, provided that the phone number is registered in the phonebook.
- the processor 150 may receive voice (eg, voice data or voice signal) through a call channel connected to the other party's electronic device.
- the processor 150 may obtain a voiceprint model stored in correspondence with the identification information of the other party's electronic device from the memory 130.
- the time of acquiring the voiceprint model may be at least one of the time when a call connection is requested, the time when the other party's electronic device accepts the call connection, or the time when voice data is received from the other party's electronic device.
- the processor 150 may perform noise filtering on the received voice data based on the voiceprint model (or learning result). For example, the processor 150 may remove at least some of the ambient noise and residual echo of the received voice data.
- the processor 150 may perform a filtering operation based on a voiceprint model corresponding to the other party's electronic device. If the signal is not abnormal, the processor 150 may selectively perform voiceprint-model-based filtering to remove at least some of the ambient noise and residual echo. The following description assumes that filtering based on the voiceprint model is performed.
- the processor 150 may convert voice data from which at least some of the surrounding noise and residual echo has been removed into an analog signal, and output the converted analog signal through a speaker.
- the processor 150 may perform various operations depending on the degree of learning of the voiceprint model. For example, when the learning progress of the voiceprint model is less than a specified reference value, the processor 150 may output a guide sound or guidance information indicating that the learning progress is insufficient, without performing separate noise filtering. Alternatively, if the learning progress is less than the specified reference value, the processor 150 may perform noise filtering using the in-progress voiceprint model while informing the user that complete noise filtering cannot be performed.
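The progress-dependent branching above can be sketched minimally. The 0.8 reference value and the returned action labels are illustrative assumptions; the specification only requires a specified reference value.

```python
def filtering_action(progress, reference=0.8):
    """Map the voiceprint model's learning progress to the behaviors
    described above (guide the user vs. filter with the model)."""
    if progress < reference:
        # Insufficient learning: output guidance, optionally filtering
        # partially with the in-progress model.
        return "guide_or_partial_filter"
    return "voiceprint_filtering"
```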
- the processor 150 may perform noise filtering (e.g., removal of at least some of the ambient noise and residual echo) on the received voice data without separate guidance (or while indicating that voiceprint-model-based noise filtering is applied).
- the processor 150 may check whether an event related to call termination has occurred. If no such event occurs, the processor 150 may branch to before operation 703 and re-perform the subsequent operations. When a call-termination-related event occurs, the processor 150 releases the hardware or software resources used for the voiceprint model and terminates the call function.
- the processor 150 may perform noise filtering using a voiceprint model corresponding to the identification information of the other party's electronic device for the section in which the user's voice is collected.
- after echo removal using an adaptive filter (e.g., removal of the portion of the signal output through the speaker that is collected by the microphone), the processor 150 may additionally remove at least some of the remaining ambient noise and residual echo using the voiceprint model.
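The two-stage cleanup above can be sketched as follows: first cancel the echo of the far-end signal picked up by the microphone, then suppress what remains using the voiceprint model. The fixed echo gain and the binary speaker mask are stand-ins for a real adaptive filter (e.g., NLMS) and a real voiceprint match score; both are assumptions for illustration.

```python
def cancel_echo(mic, far_end, echo_gain=0.5):
    """Stage 1: subtract an estimate of the far-end (speaker output)
    signal from the microphone signal, per sample."""
    return [m - echo_gain * f for m, f in zip(mic, far_end)]

def suppress_residual(signal, speaker_mask, attenuation=0.1):
    """Stage 2: keep samples the voiceprint model attributes to the
    target speaker; attenuate the rest (residual echo / ambient noise)."""
    return [x if keep else attenuation * x
            for x, keep in zip(signal, speaker_mask)]
```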
- the learning operation of the voiceprint model described in FIG. 6 can be performed independently of the filtering operation using the voiceprint model described in FIG. 7.
- when connecting a call with the other party's electronic device stored in the phonebook, if learning of the voiceprint model corresponding to the other party's electronic device is not completed, the processor 150 does not perform a filtering operation using the voiceprint model but performs the voiceprint model learning operation described in FIG. 6.
- when learning of the voiceprint model is completed, the processor 150 does not perform the learning operation described in FIG. 6 but performs the filtering operation using the voiceprint model described in FIG. 7.
- if the voice signal received while performing a call connection with the other party's electronic device stored in the phonebook is determined not to belong to the other party (e.g., in operation 611), the processor 150 may, as described above, control additional learning or filtering operations depending on whether voiceprint model learning of another user item (e.g., the mother item) has been completed.
- Figure 8 is a diagram showing another example of an electronic device operation method related to phonebook-based voiceprint model operation according to an embodiment.
- the processor 150 of the electronic device 100 may check whether a call is received in operation 801. If there is no incoming call, the processor 150 may process the performance of a designated function in operation 803. For example, the processor 150 may execute a specific application in response to a user input, or control the output of a screen or audio signal corresponding to the currently executing application.
- the processor 150 may check whether identification information (e.g., phone number) of the other party's electronic device that requested a call connection exists in the phonebook. If the identification information is pre-registered in the phonebook, the processor 150 may output a call connection request screen including identification information (e.g., name, phone number) of the other party's electronic device. When a user input accepting the call connection is received, the processor 150 may form a call channel with the other party's electronic device. The processor 150 can receive voice data transmitted by the other party's electronic device through the call channel. If the identification information is not registered in the phonebook, the processor 150 may branch to operation 803 to perform a designated function (e.g., a call function corresponding to the user's call acceptance).
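The phonebook lookup branch above (operations 801 to 805) can be sketched as a small routing function. The return labels and the dictionary-based phonebook are illustrative assumptions.

```python
def handle_incoming_call(caller_number, phonebook):
    """Proceed toward voiceprint learning only for callers whose number
    is registered in the phonebook; otherwise branch to the designated
    function, as described above."""
    entry = phonebook.get(caller_number)
    if entry is None:
        return ("designated_function", None)
    # Registered caller: show the call connection request screen with
    # the stored identification information, then learn during the call.
    return ("offer_call_and_learn", entry)
```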
- the processor 150 may generate a voiceprint model by performing learning on the received voice data.
- the processor 150 may store the voiceprint model in the memory 130 by matching it with identification information registered in the phonebook.
- the processor 150 checks whether the call function is terminated, and if the call function is maintained, it branches to operation 807 and re-performs the following operations. When the call function ends, the processor 150 may end the voice data learning function.
- the electronic device 100 may output a screen related to voiceprint model learning and perform voiceprint modeling based on the voice data received from the other party, according to user settings.
- the processor 150 may support registration of phonebook information corresponding to the learned voiceprint model.
- the processor 150 may store voice data received during a call and, after the call ends, create a voiceprint model through learning.
- the processor 150 may generate new identification information (e.g., "noname") corresponding to the other party's electronic device during the call, match the voiceprint model generated based on voice data received during the call to the new identification information, and store it in the memory 130.
- the processor 150 may output a screen requesting change to new identification information on the display 140.
- although FIGS. 7 and 8 describe examples of electronic device operation methods related to voiceprint model operation separately, at least some of the operations described in FIG. 7 and at least some of the operations described in FIG. 8 may be mutually applied to, or excluded from, each example.
- the noise filtering configuration described in FIG. 7 can also be applied to operation 807 in FIG. 8.
- FIG. 9 is a diagram illustrating an example of a method of using a server device of an electronic device related to operation of a phonebook-based voiceprint model according to an embodiment.
- the processor 150 of the electronic device 100 may check whether call data of a phonebook-registered user is received in operation 901.
- the electronic device 100 may form a communication channel with another electronic device. If there is no reception of call data from the phonebook registered user, in operation 903, the processor 150 may process performance of the designated function.
- the processor 150 may support the performance of a specific user function (e.g., music playback function, web surfing function, video playback function) of the electronic device 100 in response to user input.
- the processor 150 may generate identification information corresponding to the registered user.
- the processor 150 may generate arbitrary identification information in relation to personal information protection, and store the registered user information by matching the generated arbitrary identification information with the registered user information.
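The privacy step above can be sketched as follows: generate arbitrary identification information for the server and keep the real-name mapping only on the device. Using `uuid4` here is an assumption; the specification only requires arbitrary identification information.

```python
import uuid

def make_anonymous_id(registered_name, local_map):
    """Create arbitrary identification information to send to the server
    device, storing the mapping back to the registered user only in
    the local map (personal-information protection)."""
    token = uuid.uuid4().hex
    local_map[token] = registered_name
    return token
```

The token, rather than the phonebook name, would then accompany the call data transmitted to the server device 200.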
- processor 150 may transmit identification information and call data (e.g., voice data) to designated server device 200.
- the processor 150 may form a communication channel with the server device 200 and transmit the identification information and call data to the server device 200 based on the communication channel.
- the processor 150 may obtain, store and manage server device 200 connection information in advance.
- the electronic device 100 may install an application related to voiceprint modeling, and when the learning function is turned on, the electronic device 100 may connect to the server device 200 by automatically executing the installed application when a call channel is established.
- the processor 150 may also transmit the voiceprint model being learned to the server device 200.
- the processor 150 may receive learning results from the designated server device 200 and update the received learning results in the memory 130.
- the processor 150 may match and store the voiceprint model corresponding to the received learning result in the phonebook.
- the processor 150 may verify the identification information of the other party's electronic device in the phonebook based on the identification information transmitted to the server device 200, and store the voiceprint model (or the results learned at the server device 200) in the memory 130 by matching it to the identified identification information of the other party's electronic device.
- if there is a previously stored learning model, the processor 150 may update it with the currently received voiceprint model.
- the processor 150 may check whether an event related to termination of the learning function for the voiceprint model occurs. For example, the processor 150 may end the voiceprint modeling operation using the server device 200 when an event related to termination of the call function or an event that turns the learning function to turn off occurs. If the end event does not occur, the processor 150 may branch to before operation 901 and re-perform the following operations.
- the voice data used to generate the above-described voiceprint model may include data received during a voice call or data received during a video call. Additionally, the voice data related to generating the voiceprint model may include voice data transmitted and received during a multi-party voice call (or multi-party video call).
- based on the phonebook information used for call connection and the pre-stored voiceprint models being learned that are mapped to the phonebook information, the processor 150 may perform similarity mapping of the voice data of the plurality of users transmitted and received, and use voice data whose similarity is greater than or equal to a predefined reference value to learn the corresponding voiceprint model.
- the processor 150 may process voice data whose similarity is less than the reference value so that it is not reflected in learning. As an example, during a multi-party call, the processor 150 may apply source separation technology to the sound source data, separate the sound source data of the plurality of speakers included in it, and then proceed with voiceprint model learning for the speakers whose learning has not been completed. In this operation, the processor 150 may use an already learned voiceprint model to filter voice data while the corresponding speaker is speaking, and process the remaining voice data to perform learning for voiceprint modeling.
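The multi-party handling above can be sketched as a dispatcher over separated tracks: each track is matched to a phonebook voiceprint, completed models filter their speaker's track, and incomplete ones receive the track as new training data. Matching via a precomputed similarity table is a simplification of real embedding comparison, and the 0.75 reference value is an assumption.

```python
def dispatch_tracks(track_similarities, models, reference=0.75):
    """track_similarities: {track_id: {name: similarity}} produced after
    source separation; models: {name: {"done": bool}}.
    Returns {track_id: ("train" | "filter" | "ignore", name_or_None)}."""
    actions = {}
    for track_id, sims in track_similarities.items():
        name = max(sims, key=sims.get)          # best-matching phonebook entry
        if sims[name] < reference:
            actions[track_id] = ("ignore", None)    # not reflected in learning
        elif models[name]["done"]:
            actions[track_id] = ("filter", name)    # learned model: filter only
        else:
            actions[track_id] = ("train", name)     # continue learning
    return actions
```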
- Figure 10 is a diagram illustrating an example of a screen interface related to voiceprint modeling according to an embodiment.
- the display 140 of the electronic device may display a first display screen 161 in relation to the operation of the call function. For example, the processor 150 may control the display 140 to output the first display screen 161.
- the first display screen 161 may include, for example, at least one of the user name 161_1 of the other party's electronic device, the phone number 161_2 of the other party's electronic device, a first object 161_3 that can request a call connection with the other party's electronic device (e.g., a call connection virtual button), a second object 161_4 that can transmit a text message to the other party's electronic device, and a third object 161_5 that can request a video call connection to the other party's electronic device (e.g., a video call connection virtual button). At least some of the user name 161_1, phone number 161_2, and first to third objects 161_3, 161_4, and 161_5 displayed on the first display screen 161 may be omitted.
- when the learning function is set to the turn-on state, the first display screen 161 may output a second voiceprint model object 161_7 indicating that the learning function is in the turn-on state and the degree of learning progress of the voiceprint model.
- the second voiceprint model object 161_7 may include, for example, a number, letter, or image corresponding to a ratio value indicating the degree of learning progress.
- the above-described first voiceprint model object 161_6 and the second voiceprint model object 161_7 may be toggled in response to user input (e.g., touch).
- the processor 150 may control the display 140 to output a call connection request screen (or video call connection request screen).
- the call connection request screen (or video call connection request screen) may output at least one of the first voiceprint model object 161_6 or the second voiceprint model object 161_7 depending on the learning function setting state.
- the first voiceprint model object 161_6 may be changed to the second voiceprint model object 161_7.
- Either the first voiceprint model object 161_6 or the second voiceprint model object 161_7 may be displayed on the first display screen 161 depending on the learning level and activation or deactivation state of the voiceprint model.
- the user can perform turn-on or turn-off settings of the voiceprint model learning function for the other party's electronic device registered in the phonebook through at least one of the screen for checking phonebook information, the screen for requesting a call connection, and the in-call screen.
- the processor 150 of the electronic device 100 may output one of the voiceprint model objects 161_6 and 161_7 in association with related items on a phonebook list screen where a plurality of phonebook items are displayed.
- the user can change the settings of the learning function and check the progress of the voiceprint model or whether learning has been completed.
- FIG. 11 is a diagram illustrating another example of a screen interface related to voiceprint modeling according to an embodiment.
- the processor 150 of an electronic device may support a video call connection in response to a user input.
- the electronic device may further include a camera for video calls.
- the processor 150 can provide a screen interface that can request a video call connection through the phonebook.
- the processor 150 may perform a video call connection with the other party's electronic device and perform learning for a voiceprint model based on the voice data transmitted and received.
- the electronic device 100 may store phonebook information 131 including a plurality of counterpart user identification information 131a, 131b, 131c, 131d, and 131e in the memory 130.
- Each of the user identification information (131a, 131b, 131c, 131d, 131e) may include voiceprint models through learning of previous voice data or may be mapped to voiceprint models.
- the processor 150 may control the video call screen 162 including the video area 162_1 and the learning display area 162_2 to be output on the display 140.
- the learning display area 162_2 may include a first learning display item 162a indicating the learning progress of the first user identification information 131a and a second learning display item 162b indicating the learning progress of the second user identification information 131b.
- the processor 150 may compare the voiceprint models generated through learning of the voice data received from the other electronic device with the voiceprint models being learned that are mapped to, and stored with, the user identification information 131a, 131b, 131c, 131d, and 131e, and thereby detect voiceprint models whose similarity is greater than or equal to a reference value.
- the processor 150 may respectively perform learning of a first voiceprint model mapped to the first user identification information 131a and a second voiceprint model mapped to the second user identification information 131b, based on voice data acquired during a video call.
- the processor 150 may obtain a first voiceprint model corresponding to the user identification information of the other party's electronic device, and use the first voiceprint model to detect voice data related to the first voiceprint model among the voice data transmitted and received during the video call.
- the processor 150 can perform noise filtering based on the first voiceprint model to process the user's voice from the other party's electronic device so that it can be heard more clearly.
- the processor 150 may create voiceprint models by learning the plurality of voice data, compare them with the voiceprint models mapped and stored in the phonebook, and detect voiceprint models whose similarity is greater than or equal to a predefined reference value. When the voiceprint models have been trained enough to be used for noise filtering, the processor 150 may perform noise filtering using each voiceprint model so that the user's voice corresponding to each voiceprint model can be heard more clearly. In this operation, the processor 150 may basically use the first voiceprint model corresponding to the user identification information of the other party's electronic device performing the video call.
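The "trained enough" selection above can be sketched minimally. Representing each model by a progress ratio and the 0.8 threshold are illustrative assumptions.

```python
def usable_models(models, min_progress=0.8):
    """Select the voiceprint models trained enough to be used for noise
    filtering during a multi-party video call.
    models: {name: {"progress": float}}."""
    return sorted(name for name, m in models.items()
                  if m["progress"] >= min_progress)
```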
- the processor 150 may execute an application corresponding to a video editor in response to a user input, and perform voiceprint modeling by linking the voice data included in the video file being edited in the executed video editor with the phonebook.
- the processor 150 may perform modeling on the voice data included in the video file, compare at least part of the modeled data with the voiceprint models pre-stored in connection with the phonebook to detect phonebook information corresponding to an identical or similar voiceprint model, and either match and store the detected phonebook information with the voice data included in the video file, or control the detected phonebook information to be output together on the video output screen.
- the processor 150 may distinguish, from the video file, speakers that match the voiceprint models or the other party's faces stored in the phonebook 131, and output their phonebook information (e.g., names) on the screen while the screen corresponding to the video file is displayed.
- the phonebook-based voiceprint operation method of the present invention can effectively remove received-voice noise or echo by utilizing the other party's voice learned (or trained) based on the phonebook. It can support selective learning when the user does not want learning (or training) of the other party's voice, or when the other party's voice collected as voice data is not clear (e.g., when there is a lot of noise in the voice due to the performance of the other party's transmit-side noise suppressor). Additionally, this device supports using the learned other party's voice for various audio functions.
- this device uses the other party's voiceprint (voice printing) information stored in the phonebook to serve as an auxiliary function to the echo canceller of the speech solution, and can effectively remove at least some of the residual echo or ambient noise.
- at least one object included in the screen interface described in FIG. 10 may be applied to the screen interface described in FIG. 11, and at least one object included in the screen interface described in FIG. 11 may be applied to the screen interface described in FIG. 10. Additionally, at least one object included in the screen interfaces described in FIGS. 10 and 11 may be omitted.
- the phonebook information 131 described in FIG. 11 may be added to the screen interface described in FIG. 10 .
- the activation or deactivation display of the screen interface described in FIG. 10 may be applied to at least one of the user identification information 131a, 131b, 131c, 131d, and 131e, the image area 162_1, and the learning display area 162_2 of FIG. 11.
- an electronic device includes a communication circuit supporting a call function, a memory for storing a phonebook, and a processor functionally connected to the communication circuit and the memory, wherein the processor is set to form a call channel with the other party's electronic device in response to a call function activation request based on the identification information of the other party's electronic device registered in the phonebook, generate a voiceprint model based on voice data when voice data transmitted by the other party's electronic device is received, link the voiceprint model to the identification information of the other party's electronic device, and store the voiceprint model linked to the identification information of the other party's electronic device in the phonebook.
- the processor controls a display to output the phonebook execution screen in response to a user input related to executing the phonebook, and is set to output, in an item displaying identification information of the other party's electronic device on the phonebook execution screen, an object that displays at least one of whether the voiceprint model is set for learning and the degree of learning progress of the voiceprint model.
- the processor controls to proceed with learning of the voiceprint model when the object is activated, and controls not to proceed with learning of the voiceprint model when the object is deactivated.
- the processor receives a user input related to selecting the object, and when the object is activated in response to receiving the user input, controls to perform learning of the voiceprint model using the voice data, When the object is deactivated in response to receiving the user input, learning of the voiceprint model using the voice data can be controlled to cancel or stop.
- the processor compares at least a portion of the generated voiceprint model with at least a portion of a voiceprint model being learned that is previously stored in the phonebook, and is set to proceed with learning of the voiceprint model being learned using the voice data when the similarity as a result of the comparison is greater than or equal to a preset reference value, and to cancel or stop learning of the voiceprint model using the voice data when the similarity as a result of the comparison is less than the preset reference value.
- the processor is set to generate the voiceprint model by performing learning on the voice data based on an artificial neural network algorithm stored in the memory.
- the processor is set to perform learning of the voiceprint model when the signal quality of the voice data is greater than a predefined value, and to stop or cancel the voiceprint model learning when the signal quality of the voice data is less than the predefined value.
- the processor is set to perform the voiceprint model learning when the signal-to-noise ratio of the voice data is greater than a predefined value or the bit error rate of the voice data is less than a predefined value, and to stop or cancel the voiceprint model learning when the signal-to-noise ratio of the voice data is less than the predefined value or the bit error rate of the voice data is greater than the predefined value.
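The signal-quality gate above can be sketched as a single predicate. The 15 dB and 10^-3 thresholds are illustrative assumptions; the specification only requires predefined values.

```python
def quality_gate(snr_db, bit_error_rate, min_snr_db=15.0, max_ber=1e-3):
    """Allow voiceprint model learning only when the signal-to-noise
    ratio is high enough and the bit error rate is low enough."""
    return snr_db > min_snr_db and bit_error_rate < max_ber
```

The same predicate could gate transmission of voice data to the server device, matching the server-side variant described above.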
- the processor transmits the voice data to a designated server device and receives a voiceprint model learned based on the voice data from the server device.
- the processor is set to transmit the voice data to the server device when the signal quality of the voice data is greater than a predefined value, and to stop or cancel the voice data transmission when the signal quality of the voice data is less than the predefined value.
- the processor is set to generate random identification information corresponding to the identification information of the other party's electronic device, and to transmit the voice data and a voiceprint model being learned that is stored in the phonebook to a server device, matched to the random identification information.
- the processor is set to remove an echo of a voice signal to be transmitted to the other electronic device using the voice data, and to remove at least some of the ambient noise and residual echo of the echo-removed voice signal using the voiceprint model.
- the processor is set to remove at least a portion of the ambient noise and residual echo using the voiceprint model when the learning progress of the voiceprint model is greater than a specified value or when learning of the voiceprint model is completed.
- a phonebook-based voiceprint operation method includes forming a call channel with the other party's electronic device in response to a call function activation request based on the identification information of the other party's electronic device registered in the phonebook of the electronic device, receiving voice data transmitted by the other party's electronic device, and, if the signal quality of the voice data is better than a specified reference value, processing the generation of a voiceprint model based on the voice data, wherein the processing operation includes linking the voiceprint model generated based on the voice data to the identification information of the other party's electronic device, and storing the voiceprint model linked to the identification information of the other party's electronic device in the phonebook.
- the method further includes receiving a user input related to executing the phonebook, and outputting the phonebook execution screen on a display in response to the user input, wherein the outputting includes outputting, in an item displaying identification information of the other party's electronic device on the phonebook execution screen, an object indicating at least one of whether the voiceprint model is set for learning and the learning progress of the voiceprint model.
- the method includes proceeding with learning of the voiceprint model when the object is in an activated state, and processing the voice data according to the call function, without proceeding with learning of the voiceprint model, when the object is in an inactivated state.
- the method includes receiving a user input related to selecting the object, performing learning of the voiceprint model using the voice data when the object is activated in response to receiving the user input, and canceling or stopping learning of the voiceprint model using the voice data and performing voice data processing according to the call function when the object is deactivated in response to receiving the user input.
- the method includes comparing at least a part of the generated voiceprint model with at least a part of a voiceprint model being learned that is pre-stored in the phonebook, proceeding with learning of the voiceprint model being learned using the voice data when the similarity resulting from the comparison is greater than a preset reference value, and canceling or stopping learning of the voiceprint model being learned using the voice data when the similarity resulting from the comparison is less than the preset reference value.
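A common realization of this same-speaker check is cosine similarity between the new and stored voiceprint embeddings; the 0.8 threshold below is an illustrative assumption standing in for the "preset reference value":

```python
import math

SIMILARITY_THRESHOLD = 0.8  # assumed "preset reference value"


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def should_continue_learning(new_vp, stored_vp, threshold=SIMILARITY_THRESHOLD):
    """Proceed with learning only if the new sample matches the model being
    learned (i.e., likely the same speaker); otherwise cancel or stop."""
    return cosine_similarity(new_vp, stored_vp) > threshold
```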
- the processing performs the voiceprint model learning when the signal-to-noise ratio of the voice data is greater than a predefined value or the bit error rate of the voice data is less than a predefined value, and stops or cancels the voiceprint model learning when the signal-to-noise ratio of the voice data is less than the predefined value or the bit error rate of the voice data is greater than the predefined value.
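Read literally, the embodiment gates learning on a disjunction of the two link-quality measures; the 15 dB and 1e-3 limits in this sketch are illustrative assumptions:

```python
def may_train(snr_db: float, bit_error_rate: float,
              snr_min: float = 15.0, ber_max: float = 1e-3) -> bool:
    """Perform voiceprint model learning only when the voice data is clean
    enough; mirrors the claim's 'SNR above threshold OR BER below threshold'."""
    return snr_db > snr_min or bit_error_rate < ber_max
```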
- the processing includes transmitting the voice data to the server device and receiving the voiceprint model from the server device when the signal quality of the voice data is higher than a predefined value, and stopping or canceling the voice data transmission when the signal quality of the voice data is below the predefined value.
- the method may further include at least some of the operations described with reference to FIGS. 6 to 9, at least some of the operations for outputting an object on the screen of the display described with reference to FIGS. 10 and 11, and at least some of the operations for processing an audio signal.
- the memory of the electronic device may store a video file including a voice signal (or audio information or voice data), and the processor or method of the electronic device may perform an operation of comparing the audio information included in the video file with the voiceprint models stored in the memory and outputting identifier information for identical or similar voiceprint models.
- the processor or method of the electronic device may distinguish speakers (or separate sources) in the voice signals included in the video using the voiceprint model, and perform speech-to-text (STT) conversion on the voice signals uttered by each speaker.
- the processor or method of the electronic device may distinguish speakers (or separate sources) and perform voiceprint modeling (or learning) on the voice signals uttered by each speaker.
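Speaker separation against enrolled voiceprints can be sketched as nearest-neighbor matching of segment embeddings; the 2-dimensional embeddings and contact names below are purely illustrative:

```python
import math


def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_segments(segments, voiceprints):
    """Assign each segment embedding to the enrolled speaker whose stored
    voiceprint is closest in cosine similarity."""
    return [max(voiceprints, key=lambda spk: _cos(seg, voiceprints[spk]))
            for seg in segments]
```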
- FIG. 12 is a block diagram of an electronic device 1201 in a network environment 1200, according to various embodiments.
- the electronic device 1201 may communicate with the electronic device 1202 through a first network 1298 (e.g., a short-range wireless communication network), or may communicate with at least one of the electronic device 1204 or the server 1208 through a second network 1299 (e.g., a long-distance wireless communication network). According to one embodiment, the electronic device 1201 may communicate with the electronic device 1204 through the server 1208.
- the electronic device 1201 may include a processor 1220, a memory 1230, an input module 1250, a sound output module 1255, a display module 1260, an audio module 1270, a sensor module 1276, an interface 1277, a connection terminal 1278, a haptic module 1279, a camera module 1280, a power management module 1288, a battery 1289, a communication module 1290, a subscriber identification module 1296, or an antenna module 1297.
- in some embodiments, at least one of these components (e.g., the connection terminal 1278) may be omitted, or one or more other components may be added to the electronic device 1201. In some embodiments, some of these components may be integrated into one component (e.g., the display module 1260).
- the processor 1220 may, for example, execute software (e.g., the program 1240) to control at least one other component (e.g., a hardware or software component) of the electronic device 1201 connected to the processor 1220, and may perform various data processing or computations. According to one embodiment, as at least part of the data processing or computation, the processor 1220 may store commands or data received from another component (e.g., the sensor module 1276 or the communication module 1290) in volatile memory 1232, process the commands or data stored in the volatile memory 1232, and store the resulting data in non-volatile memory 1234.
- the processor 1220 may include a main processor 1221 (e.g., a central processing unit or an application processor) or an auxiliary processor 1223 (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of, or together with, the main processor.
- for example, when the electronic device 1201 includes a main processor 1221 and an auxiliary processor 1223, the auxiliary processor 1223 may be set to use lower power than the main processor 1221 or to be specialized for a designated function. The auxiliary processor 1223 may be implemented separately from the main processor 1221 or as part of it.
- the auxiliary processor 1223 may, for example, control at least some of the functions or states related to at least one of the components of the electronic device 1201 (e.g., the display module 1260, the sensor module 1276, or the communication module 1290), on behalf of the main processor 1221 while the main processor 1221 is in an inactive (e.g., sleep) state, or together with the main processor 1221 while the main processor 1221 is in an active (e.g., application-executing) state.
- according to one embodiment, the auxiliary processor 1223 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera module 1280 or the communication module 1290).
- the auxiliary processor 1223 may include a hardware structure specialized for processing artificial intelligence models.
- Artificial intelligence models can be created through machine learning. For example, such learning may be performed in the electronic device 1201 itself, in which the artificial intelligence model is executed, or through a separate server (e.g., the server 1208).
- Learning algorithms may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but are not limited to the above examples.
- An artificial intelligence model may include multiple artificial neural network layers.
- An artificial neural network may be one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or deep Q-networks, or a combination of two or more of the above, but is not limited to the examples described above.
- artificial intelligence models may additionally or alternatively include software structures.
- the memory 1230 may store various data used by at least one component (eg, the processor 1220 or the sensor module 1276) of the electronic device 1201. Data may include, for example, input data or output data for software (e.g., program 1240) and instructions related thereto.
- Memory 1230 may include volatile memory 1232 or non-volatile memory 1234.
- the program 1240 may be stored as software in the memory 1230 and may include, for example, an operating system 1242, middleware 1244, or application 1246.
- the input module 1250 may receive commands or data to be used in a component of the electronic device 1201 (e.g., the processor 1220) from outside the electronic device 1201 (e.g., a user).
- the input module 1250 may include, for example, a microphone, mouse, keyboard, keys (eg, buttons), or digital pen (eg, stylus pen).
- the sound output module 1255 may output sound signals to the outside of the electronic device 1201.
- the sound output module 1255 may include, for example, a speaker or receiver. Speakers can be used for general purposes such as multimedia playback or recording playback.
- the receiver can be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part of it.
- the display module 1260 can visually provide information to the outside of the electronic device 1201 (eg, a user).
- the display module 1260 may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the device.
- the display module 1260 may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of force generated by the touch.
- the audio module 1270 can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module 1270 may acquire sound through the input module 1250, or output sound through the sound output module 1255 or an external electronic device (e.g., the electronic device 1202 (e.g., a speaker or headphones)) connected directly or wirelessly to the electronic device 1201.
- the sensor module 1276 may detect the operating state (e.g., power or temperature) of the electronic device 1201 or an external environmental state (e.g., a user state), and generate an electrical signal or data value corresponding to the detected state.
- the sensor module 1276 may include, for example, a gesture sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or a light sensor.
- the interface 1277 may support one or more designated protocols that can be used to connect the electronic device 1201 directly or wirelessly with an external electronic device (eg, the electronic device 1202).
- the interface 1277 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.
- connection terminal 1278 may include a connector through which the electronic device 1201 can be physically connected to an external electronic device (eg, the electronic device 1202).
- the connection terminal 1278 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (eg, a headphone connector).
- the haptic module 1279 can convert electrical signals into mechanical stimulation (e.g., vibration or movement) or electrical stimulation that the user can perceive through tactile or kinesthetic senses.
- the haptic module 1279 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.
- the camera module 1280 can capture still images and moving images.
- the camera module 1280 may include one or more lenses, image sensors, image signal processors, or flashes.
- the power management module 1288 can manage power supplied to the electronic device 1201. According to one embodiment, the power management module 1288 may be implemented as at least a part of, for example, a power management integrated circuit (PMIC).
- Battery 1289 may supply power to at least one component of electronic device 1201.
- the battery 1289 may include, for example, a non-rechargeable primary cell, a rechargeable secondary cell, or a fuel cell.
- the communication module 1290 may support establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 1201 and an external electronic device (e.g., the electronic device 1202, the electronic device 1204, or the server 1208), and communication through the established communication channel.
- Communication module 1290 operates independently of processor 1220 (e.g., an application processor) and may include one or more communication processors that support direct (e.g., wired) communication or wireless communication.
- the communication module 1290 may include a wireless communication module 1292 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 1294 (e.g., a local area network (LAN) communication module or a power line communication module).
- the corresponding communication module may communicate with the external electronic device 1204 through the first network 1298 (e.g., a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or the second network 1299 (e.g., a telecommunication network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., LAN or WAN)).
- the wireless communication module 1292 may use subscriber information (e.g., an International Mobile Subscriber Identity (IMSI)) stored in the subscriber identification module 1296 to identify and authenticate the electronic device 1201 within a communication network such as the first network 1298 or the second network 1299.
- the wireless communication module 1292 may support 5G networks and next-generation communication technologies after 4G networks, for example, new radio access technology (NR access technology).
- The NR access technology may support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and access by multiple terminals (massive machine type communications (mMTC)), or ultra-reliable and low-latency communications (URLLC).
- the wireless communication module 1292 may support high frequency bands (e.g., mmWave bands), for example, to achieve high data rates.
- the wireless communication module 1292 uses various technologies to secure performance in high frequency bands, for example, beamforming, massive MIMO (multiple-input and multiple-output), and full-dimensional multiplexing.
- the wireless communication module 1292 may support various requirements specified for the electronic device 1201, an external electronic device (e.g., the electronic device 1204), or a network system (e.g., the second network 1299). According to one embodiment, the wireless communication module 1292 may support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency for realizing URLLC.
- the antenna module 1297 may transmit or receive signals or power to or from the outside (e.g., an external electronic device).
- the antenna module 1297 may include an antenna including a radiator made of a conductor or a conductive pattern formed on a substrate (eg, PCB).
- the antenna module 1297 may include a plurality of antennas (eg, an array antenna).
- in this case, at least one antenna suitable for a communication method used in a communication network such as the first network 1298 or the second network 1299 may be selected from the plurality of antennas by, for example, the communication module 1290. Signals or power may be transmitted or received between the communication module 1290 and an external electronic device through the selected at least one antenna.
- other components eg, radio frequency integrated circuit (RFIC) may be additionally formed as part of the antenna module 1297.
- antenna module 1297 may form a mmWave antenna module.
- a mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first side (e.g., the bottom side) of the printed circuit board and capable of supporting a designated high frequency band (e.g., the mmWave band), and a plurality of antennas (e.g., an array antenna) disposed on or adjacent to a second side (e.g., the top or a side) of the printed circuit board and capable of transmitting or receiving signals in the designated high frequency band.
- at least some of the above components may be connected to each other through a communication method between peripheral devices (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and exchange signals (e.g., commands or data) with each other.
- commands or data may be transmitted or received between the electronic device 1201 and the external electronic device 1204 through the server 1208 connected to the second network 1299.
- Each of the external electronic devices 1202 or 1204 may be of the same or different type as the electronic device 1201.
- all or part of the operations performed in the electronic device 1201 may be executed in one or more of the external electronic devices 1202, 1204, or 1208.
- when the electronic device 1201 needs to perform a certain function or service automatically or in response to a request from a user or another device, the electronic device 1201 may, instead of executing the function or service on its own, request one or more external electronic devices to perform at least part of the function or service.
- One or more external electronic devices that have received the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device 1201.
- the electronic device 1201 may process the result as is or additionally and provide it as at least part of a response to the request.
- for this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used.
- the electronic device 1201 may provide an ultra-low latency service using, for example, distributed computing or mobile edge computing.
- the external electronic device 1204 may include an Internet of Things (IoT) device.
- Server 1208 may be an intelligent server using machine learning and/or neural networks.
- the external electronic device 1204 or server 1208 may be included in the second network 1299.
- the electronic device 1201 may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
- Electronic devices may be of various types.
- Electronic devices may include, for example, portable communication devices (e.g., smartphones), computer devices, portable multimedia devices, portable medical devices, cameras, wearable devices, or home appliances.
- Electronic devices according to embodiments of this document are not limited to the above-described devices.
- terms such as "first" and "second" may be used simply to distinguish one component from another, and do not limit the components in other respects (e.g., importance or order).
- when one (e.g., first) component is referred to as being "coupled" or "connected" to another (e.g., second) component, with or without the terms "functionally" or "communicatively", it means that the component can be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.
- the term "module" used in various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed part, a minimum unit of the part that performs one or more functions, or a portion thereof. For example, according to one embodiment, the module may be implemented in the form of an application-specific integrated circuit (ASIC).
- Various embodiments of the present document may be implemented as software (e.g., the program 1240) including one or more instructions stored in a storage medium (e.g., the built-in memory 1236 or the external memory 1238) that can be read by a machine (e.g., the electronic device 1201).
- For example, a processor (e.g., the processor 1220) of the machine (e.g., the electronic device 1201) may call at least one of the one or more stored instructions from the storage medium and execute it.
- the one or more instructions may include code generated by a compiler or code that can be executed by an interpreter.
- a storage medium that can be read by a device may be provided in the form of a non-transitory storage medium.
- 'non-transitory' only means that the storage medium is a tangible device and does not contain signals (e.g., electromagnetic waves); this term does not distinguish between cases where data is stored semi-permanently in the storage medium and cases where it is stored temporarily.
- Computer program products are commodities and can be traded between sellers and buyers.
- the computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) online through an application store (e.g., Play Store) or directly between two user devices (e.g., smartphones).
- at least a portion of the computer program product may be at least temporarily stored or temporarily created in a machine-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
- each component (e.g., module or program) of the above-described components may include a single entity or a plurality of entities, and some of the plurality of entities may be separately placed in another component.
- one or more of the components or operations described above may be omitted, or one or more other components or operations may be added.
- according to various embodiments, a plurality of components (e.g., modules or programs) may be integrated into one component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to how they were performed by the corresponding component of the plurality of components prior to the integration.
- operations performed by a module, program, or other component may be executed sequentially, in parallel, repeatedly, or heuristically, one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Telephone Function (AREA)
Abstract
The present specification discloses an electronic device and a phonebook-based voiceprint operation method thereof, characterized by: forming a call channel with a counterpart electronic device in response to a call function activation request based on identification information of the counterpart electronic device registered in a phonebook stored in a memory; when voice data transmitted by the counterpart electronic device is received, linking a voiceprint model generated on the basis of the voice data to the identification information of the counterpart electronic device; and setting the voiceprint model linked to the identification information of the counterpart electronic device to be stored in the phonebook. Various other embodiments identified from the specification are possible.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2022-0154215 | 2022-11-17 | ||
| KR20220154215 | 2022-11-17 | ||
| KR10-2022-0185184 | 2022-12-27 | ||
| KR1020220185184A KR20240072874A (ko) | 2022-11-17 | 2022-12-27 | Phonebook-based voiceprint operation method and electronic device supporting the same |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024106830A1 true WO2024106830A1 (fr) | 2024-05-23 |
Family
ID=91085111
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2023/017650 Ceased WO2024106830A1 (fr) | Phonebook-based voiceprint operation method and electronic device supporting the same |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024106830A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20060069689A (ko) * | 2004-12-18 | 2006-06-22 | Pantech&Curitel Co., Ltd. | Noise removal apparatus for a mobile communication terminal |
| JP2008244796A (ja) * | 2007-03-27 | 2008-10-09 | Ntt Docomo Inc | Voice authentication system |
| JP5262967B2 (ja) * | 2009-04-30 | 2013-08-14 | NEC Corporation | Mobile terminal, authentication method, and program |
| KR20180034507A (ko) * | 2015-07-23 | 2018-04-04 | Alibaba Group Holding Limited | Method, apparatus, and system for building a user voiceprint model |
| JP2020129094A (ja) * | 2019-02-12 | 2020-08-27 | Nippon Telegraph and Telephone Corporation | Learning data acquisition device, model learning device, methods thereof, and program |
- 2023-11-06: WO PCT/KR2023/017650 patent/WO2024106830A1/fr not_active Ceased
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2022055068A1 | Electronic device for identifying command contained in voice and operation method thereof | |
| WO2022231135A1 | Method for outputting audio signal, and electronic device for performing same | |
| WO2022030882A1 | Electronic device for processing audio data, and method for operating same | |
| WO2021201429A1 | Electronic device and method for controlling audio output of same | |
| WO2022055319A1 | Electronic device for outputting sound and method for operating same | |
| WO2021096281A1 | Voice input processing method and electronic device supporting same | |
| WO2022197151A1 | Electronic device for listening to external sound and operating method of electronic device | |
| WO2022030750A1 | Voice data processing method and electronic device for supporting same | |
| WO2024106830A1 | Phonebook-based voiceprint operation method and electronic device supporting the same | |
| WO2022186471A1 | Method for providing group call service, and electronic device supporting same | |
| WO2022177186A1 | Electronic device comprising speaker and microphone, and method for operating same | |
| WO2022186540A1 | Electronic device and method for processing recording and voice input in electronic device | |
| WO2022030880A1 | Method for processing voice signal, and apparatus using same | |
| WO2021221440A1 | Method for improving sound quality, and device therefor | |
| WO2022164023A1 | Audio data processing method and electronic device supporting same | |
| WO2026029471A1 | Method for providing call translation service and electronic device therefor | |
| KR20240072874A (ko) | Phonebook-based voiceprint operation method and electronic device supporting the same |
| WO2025164905A1 | Electronic device for obtaining voice signal and operation method thereof | |
| WO2024117508A1 | Electronic device and method for providing virtual space | |
| WO2024080745A1 | Method for analyzing user speech on basis of speech cache, and electronic device supporting same | |
| WO2024215063A1 | First electronic device for outputting sound, second electronic device for controlling same, and operation method of first electronic device | |
| WO2022186440A1 | Electronic device for processing user utterance and operation method thereof | |
| WO2022146033A1 | Electronic device and method for controlling voice output/input of electronic device | |
| WO2024014654A1 | Electronic device for performing call recording and operation method thereof | |
| WO2024014869A1 | Translation processing method and electronic device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23891897 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 23891897 Country of ref document: EP Kind code of ref document: A1 |