WO2021036954A1 - Intelligent voice playback method and device - Google Patents

Intelligent voice playback method and device

Info

Publication number
WO2021036954A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
electronic device
action
user
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/110623
Other languages
English (en)
French (fr)
Inventor
郁心迪
刘航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to US17/634,601 (US12070673B2)
Priority to EP20856532.5A (EP3991814B1)
Publication of WO2021036954A1 (zh)

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63B APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B71/00 Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06 Indicating or scoring devices for games or players, or for other sports activities
    • A63B71/0619 Displays, user interfaces and indicating devices, specially adapted for sport equipment, e.g. display mounted on treadmills
    • A63B71/0622 Visual, audio or audio-visual systems for entertaining, instructing or motivating the user
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/003 Repetitive work cycles; Sequence of movements
    • G09B19/0038 Sports
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63B APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B71/00 Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06 Indicating or scoring devices for games or players, or for other sports activities
    • A63B71/0619 Displays, user interfaces and indicating devices, specially adapted for sport equipment, e.g. display mounted on treadmills
    • A63B71/0622 Visual, audio or audio-visual systems for entertaining, instructing or motivating the user
    • A63B2071/0625 Emitting sound, noise or music
    • A63B2071/063 Spoken or verbal instructions
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63B APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B71/00 Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06 Indicating or scoring devices for games or players, or for other sports activities
    • A63B71/0619 Displays, user interfaces and indicating devices, specially adapted for sport equipment, e.g. display mounted on treadmills
    • A63B71/0622 Visual, audio or audio-visual systems for entertaining, instructing or motivating the user
    • A63B2071/0638 Displaying moving images of recorded environment, e.g. virtual environment
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63B APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B71/00 Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06 Indicating or scoring devices for games or players, or for other sports activities
    • A63B2071/0694 Visual indication, e.g. Indicia
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63B APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2220/00 Measuring of physical parameters relating to sporting activity
    • A63B2220/05 Image processing for measuring physical parameters
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63B APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2220/00 Measuring of physical parameters relating to sporting activity
    • A63B2220/80 Special sensors, transducers or devices therefor
    • A63B2220/807 Photo cameras
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/30 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing

Definitions

  • the embodiments of the present application relate to the field of electronic technology, and in particular to an intelligent voice playback method and device.
  • An intelligent fitness solution in the prior art is to play an animation of the coach's standard actions on a large-screen device to provide users with professional action guidance, and the users perform training with reference to the coach's standard actions.
  • the large-screen device can also play a set of preset voices in conjunction with the coach animation to assist the coach animation to guide users in fitness training.
  • the large-screen device mechanically repeatedly plays the same set of preset fixed voices, which will make the voice playback more monotonous and boring.
  • the user hears the same voice played repeatedly every time, which easily makes the user feel bored and lose interest, and therefore the user experience is poor.
  • The embodiments of the present application provide an intelligent voice playback method and device that can play different voices in real time according to the user's current training state and training actions, so as to provide the user with real-time voice feedback and real-time guidance for action improvement. The voice content is rich and varied, and the user experience is better.
  • an embodiment of the present application provides a voice playback method, the method includes: an electronic device displays an interface of a first application program, and the first application program is used for a user to perform sports training.
  • the electronic device collects images of the user's training actions.
  • the electronic device plays animations of standard actions and displays images of user training actions.
  • the electronic device determines the voice to be selected triggered by the first action unit in the user training action, and the voice to be selected includes multiple voices.
  • the first action unit is a training action or a part of a training action.
  • the electronic device selects a voice from the to-be-selected voices to play.
  • In this solution, the electronic device can determine, in real time, multiple candidate voices that match the user's motion state for the user's current action unit, and select different voices from the candidates to play. This gives the user real-time voice feedback; the voice content played is rich and varied and not prone to repetition, so the user experience is better.
  • the voice to be selected includes the main process voice and the non-main process voice.
  • the main process voice is used to make sports training proceed normally.
  • the non-main process voice includes one or more of action improvement voice, action evaluation voice, or training rhythm voice.
  • the main process voice is a high priority voice, and the non-main process voice is a low priority voice.
  • the electronic device selects a voice from the voices to be selected to play, including: the electronic device selects a voice from the voices to be selected to play according to the priority of the voice.
  • the voice that can be played by the electronic device includes multiple types.
  • The electronic device can play multiple different types of voice in real time, according to the priority of the voice and the user's current exercise status and exercise actions. The voice content played is therefore rich and diverse, which avoids the tedium caused by playing a fixed set of voices, increases the user's interest in training, and improves the user experience.
  • In a possible design, the to-be-selected voices include at least one high-priority main process voice, and the electronic device selecting a voice from the to-be-selected voices to play according to the priority of the voice includes: the electronic device plays each high-priority main process voice among the to-be-selected voices separately.
  • the main process voice is used to ensure the normal progress of sports training, the main process voice is more important and has a high priority, so the electronic device can play each main process voice in turn.
  • When the electronic device plays each main process voice, the method further includes: the electronic device stops playing the animation of the standard action.
  • The electronic device displays first graphic information, and the first graphic information corresponds to the main process voice.
  • After the electronic device detects that the user has adjusted to the state required by the first graphic information, the electronic device displays a completion identifier on the first graphic information. Then, the electronic device stops displaying the first graphic information and continues to play the animation of the standard action.
  • The first graphic information may be graphic information in the form of a task box.
  • When the electronic device plays the main process voice, it can display the first graphic information and pause the progress of the sports training. After the user makes the corresponding adjustment according to the main process voice or the first graphic information, the electronic device can stop displaying the first graphic information and continue the subsequent sports training.
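The pause-and-resume flow around a main process voice can be pictured as a small state machine. The following Python sketch is illustrative only: the class name `MainProcessPrompt`, the state names, and the method names are assumptions that do not appear in the application, and detection of the user's state is reduced to a boolean passed in by the caller.

```python
from enum import Enum, auto


class TaskBoxState(Enum):
    HIDDEN = auto()     # no first graphic information on screen
    SHOWN = auto()      # task box displayed, waiting for the user to adjust
    COMPLETED = auto()  # completion identifier displayed on the task box


class MainProcessPrompt:
    """Hypothetical model of one main process voice and its task box."""

    def __init__(self):
        self.animation_playing = True
        self.state = TaskBoxState.HIDDEN

    def start(self):
        # Playing the main process voice pauses the standard-action animation
        # and displays the first graphic information.
        self.animation_playing = False
        self.state = TaskBoxState.SHOWN

    def on_user_state(self, meets_requirement):
        # Mark the task box complete once the user reaches the required state.
        if self.state is TaskBoxState.SHOWN and meets_requirement:
            self.state = TaskBoxState.COMPLETED

    def dismiss(self):
        # Hide a completed task box and resume the animation; do nothing
        # while the user has not yet made the required adjustment.
        if self.state is TaskBoxState.COMPLETED:
            self.state = TaskBoxState.HIDDEN
            self.animation_playing = True
```

In this sketch the animation stays paused until the task box has been completed, which mirrors the ordering in the text: display, detect, mark complete, hide, resume.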
  • the to-be-selected voice may include at least one non-main process voice in addition to the above-mentioned at least one main process voice.
  • If the electronic device determines that the to-be-selected voices triggered by the first action unit include at least one main process voice and at least one non-main process voice, the electronic device plays the at least one high-priority main process voice.
  • the voice to be selected includes multiple low-priority non-main process voices.
  • The electronic device selecting one voice from the to-be-selected voices to play includes: the electronic device selects a first target voice from the multiple low-priority non-main process voices, and the electronic device plays the first target voice.
  • If the electronic device determines that the to-be-selected voices triggered by the first action unit include at least one low-priority non-main process voice and do not include a main process voice, the electronic device selects one voice from the at least one non-main process voice to play. This avoids the problem that playing multiple voices for one action unit causes the voice playback to fall out of step with subsequent action units.
  • the first target voice is the first voice triggered among multiple non-main process voices.
  • the electronic device selects the first voice to be triggered for playback according to the sequence of triggering.
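The priority rule and the first-triggered rule described above can be combined in a few lines. This is a minimal sketch under stated assumptions: the `Voice` record, the `Priority` enum, the `trigger_order` field, and the function name `select_voices` are hypothetical names introduced for illustration, not taken from the application.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    HIGH = 0  # main process voice
    LOW = 1   # non-main process voice


@dataclass(frozen=True)
class Voice:
    text: str
    priority: Priority
    trigger_order: int  # order in which the action unit triggered the voice


def select_voices(candidates):
    """Return the voices to play for one action unit, in playback order."""
    main = [v for v in candidates if v.priority == Priority.HIGH]
    if main:
        # Every main process voice is played, one after another.
        return sorted(main, key=lambda v: v.trigger_order)
    if not candidates:
        return []
    # Only non-main process voices remain: play the first one triggered.
    return [min(candidates, key=lambda v: v.trigger_order)]
```

For example, given one low-priority voice and two high-priority voices, the sketch returns only the two high-priority voices; given two low-priority voices, it returns the one with the smaller `trigger_order`.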
  • the first target voice is the first action improvement voice for the first error in the action unit.
  • the electronic device playing the first target voice includes: if in this exercise training, the electronic device has not played the first action improvement voice for the training action of the type of the action unit, then the electronic device plays the first action improvement voice.
  • The method further includes: if, in this exercise training, the electronic device has already played the first action improvement voice for the type of training action to which the action unit belongs, then the electronic device plays the action key points for the first error; or the electronic device selects a second target voice from the non-main process voices other than the first action improvement voice.
  • In this way, the electronic device no longer plays the same action improvement voice, which avoids the poor user experience caused by frequent action improvement voice prompts for the same error.
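The "play an improvement voice at most once per type of training action in one session" rule can be sketched with a set of already-played entries. The class name `ImprovementVoicePolicy`, the `choose` method, and the `fallback` parameter are assumptions made for this illustration; the fallback stands for whichever alternative a design uses (the action key points for the same error, or another non-main process voice).

```python
class ImprovementVoicePolicy:
    """Tracks which improvement voices were already played in this session."""

    def __init__(self):
        self._played = set()  # (action_type, improvement_voice) pairs

    def choose(self, action_type, improvement_voice, fallback=None):
        key = (action_type, improvement_voice)
        if key not in self._played:
            # First occurrence in this training session: play it and remember.
            self._played.add(key)
            return improvement_voice
        # Already played for this type of training action: return the
        # caller-supplied fallback instead. None means no fallback was given.
        return fallback
```

A new `ImprovementVoicePolicy` would be created for each training session, so that the same prompt can be played again in a later session.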
  • When the electronic device plays the first action improvement voice, the method further includes: the electronic device displays second graphic information, and the second graphic information corresponds to the first action improvement voice. After the electronic device detects that the user has adjusted the training action to the state required by the second graphic information, the electronic device displays a completion identifier on the second graphic information. Then, the electronic device stops displaying the second graphic information.
  • The second graphic information may be graphic information in the form of a task box.
  • The electronic device may display the second graphic information when playing the action improvement voice. After the user makes the corresponding adjustment according to the action improvement voice or the second graphic information, the electronic device may stop displaying the second graphic information.
  • In another possible design, the first target voice is a first action evaluation voice, and the electronic device playing the first target voice includes: if the electronic device determines, in a random manner, that the current mode is a first mode, the electronic device plays the first action evaluation voice.
  • The method further includes: if the electronic device determines, in a random manner, that the current mode is a second mode, the electronic device does not play the first action evaluation voice.
  • After the electronic device selects an action evaluation voice from the candidate voices, it can randomly determine whether to play it. This avoids playing action evaluation voices too frequently and too regularly, and makes their playback less predictable.
  • The action evaluation voice includes multiple levels, and each level includes multiple voices. The first action evaluation voice is an action evaluation voice of a first level, and the electronic device playing the first action evaluation voice includes: the electronic device randomly selects one voice from the first-level action evaluation voices to play.
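The two random steps described above (deciding whether to play at all, then picking one voice within a level) can be sketched as follows. The function name `pick_evaluation_voice`, the `play_probability` parameter and its default of 0.5, and the level names in the usage note are assumptions; the application does not state how the random mode is drawn or what the levels are called.

```python
import random


def pick_evaluation_voice(level, voices_by_level, play_probability=0.5, rng=None):
    """Return an evaluation voice to play, or None to stay silent."""
    rng = rng or random.Random()
    # Step 1: randomly choose between the "play" mode and the "skip" mode.
    if rng.random() >= play_probability:
        return None
    # Step 2: randomly pick one of the voices available for this level.
    return rng.choice(voices_by_level[level])
```

A call such as `pick_evaluation_voice("good", {"good": ["Nice work", "Well done"]})` would return one of the two strings on some calls and `None` on others, so the same repetition does not always trigger the same sentence.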
  • the electronic device playing the first target voice includes: if the electronic device determines that the voice triggered by another action unit before the first action unit has been played, the electronic device plays the first target voice. The method further includes: if the electronic device determines that the voice triggered by another action unit before the first action unit has not been played, the electronic device does not play the first target voice.
  • the electronic device will play one voice only after the other voice is played, and the voice playback will not be interrupted due to voice conflicts between different action units.
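The "do not interrupt a voice that is still playing" rule amounts to a simple busy check before playback. In this sketch the class name `VoicePlayer`, the `try_play` method, and the explicit `now` and `duration` arguments (in seconds) are assumptions chosen so the example needs no real audio output or clock.

```python
class VoicePlayer:
    """Hypothetical player that refuses to start while another voice plays."""

    def __init__(self):
        self._busy_until = 0.0  # time at which the current voice finishes

    def try_play(self, voice, duration, now):
        if now < self._busy_until:
            # A voice triggered by an earlier action unit is still playing,
            # so the new voice is dropped rather than interrupting it.
            return False
        self._busy_until = now + duration
        return True
```

With a two-second voice started at time 0, a request at time 1 is rejected and a request at time 2 is accepted.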
  • In another possible design, the to-be-selected voices include one low-priority non-main process voice, and the electronic device performing voice playback according to the priority of the voices among the to-be-selected voices includes: the electronic device plays the low-priority non-main process voice.
  • If the electronic device determines that the to-be-selected voices triggered by the first action unit include one low-priority non-main process voice, the electronic device plays that voice.
  • The method further includes: the electronic device determines, according to the progress of the sports training or the state of the user, that a main process voice is triggered; the electronic device plays the main process voice.
  • After the electronic device enters the first application program, it can first determine whether a main process voice is triggered, and play the triggered main process voice to ensure that the sports training can start normally.
  • The action improvement voice is used to guide the user to improve the training action.
  • The action evaluation voice is used to give a positive evaluation of the user's training actions.
  • The training rhythm voice is used to remind the user of the progress of the exercise training.
  • The main process voice includes one or more of a process voice, a position adjustment voice, a stance adjustment voice, or a humanized prompt voice.
  • The action improvement voice includes one or more of a frequency improvement voice, an amplitude improvement voice, or a posture improvement voice.
  • each type of voice can also include multiple dimensions of voice prompts.
  • the electronic device can give users voice prompts from different dimensions and angles, which can further enrich the voice content that can be played.
  • The voice content can be flexible and varied, and the voice prompts can be more specific and comprehensive, giving users a sense of freshness and interest and improving the user experience.
  • The electronic device determining the to-be-selected voices triggered by the first action unit in the user's training action includes: the electronic device determines, according to the state of the user during execution of the first action unit, the position adjustment voice, stance adjustment voice, or humanized prompt voice among the to-be-selected voices. The electronic device determines the non-main process voices among the to-be-selected voices according to the action in the first action unit.
  • the electronic device can determine the to-be-selected voice triggered by the first action unit according to the state of the user during the execution of the first action unit, the standard degree of the user's action, the number of actions, and other action information.
  • an embodiment of the present application provides a voice playback device, which is included in an electronic device.
  • the device has the function of realizing the behavior of the electronic device in any of the above aspects and possible design methods.
  • This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes at least one module or unit corresponding to the above-mentioned functions. For example, display module/unit, acquisition module/unit, determination module/unit, playback module/unit, etc.
  • An embodiment of the present application provides an electronic device, including: one or more processors; and a memory in which code is stored.
  • When the code is executed by the electronic device, the electronic device is caused to execute the voice playback method in any one of the possible designs in the foregoing aspects.
  • An embodiment of the present application provides a computer storage medium, including computer instructions that, when run on a mobile terminal, cause the mobile terminal to execute the voice playback method in any one of the possible designs of the foregoing aspects.
  • The embodiments of the present application provide a computer program product that, when run on a computer, causes the computer to execute the voice playback method in any one of the possible designs in the foregoing aspects.
  • FIG. 1A is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • FIG. 1B is a schematic diagram of a system provided by an embodiment of this application.
  • FIG. 2 is a schematic structural diagram of another electronic device provided by an embodiment of the application.
  • FIG. 3 is a flowchart of a voice playback provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of a set of interfaces provided by an embodiment of the application.
  • FIG. 5A is a schematic diagram of a voice prompt and interface display effect provided by an embodiment of this application.
  • FIG. 5B is a schematic diagram of an interface display effect provided by an embodiment of this application.
  • FIG. 5C is a schematic diagram of another interface display effect provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of another interface display effect provided by an embodiment of the application.
  • FIG. 7A is a schematic diagram of another interface display effect provided by an embodiment of this application.
  • FIG. 7B is a schematic diagram of another voice prompt and interface display effect provided by an embodiment of this application.
  • FIG. 7C is a schematic diagram of another voice prompt and interface display effect provided by an embodiment of this application.
  • FIG. 8A is a schematic diagram of another voice prompt and interface display effect provided by an embodiment of this application.
  • FIG. 8B is a schematic diagram of another interface display effect provided by an embodiment of this application.
  • FIG. 8C is a schematic diagram of another voice prompt and interface display effect provided by an embodiment of this application.
  • FIG. 9A is a schematic diagram of another interface display effect provided by an embodiment of the application.
  • FIG. 9B is a schematic diagram of another voice prompt and interface display effect provided by an embodiment of this application.
  • FIG. 10 is a sequence diagram of an action evaluation voice playback provided by an embodiment of this application.
  • FIG. 11 is a schematic diagram of another voice prompt and interface display effect provided by an embodiment of this application.
  • FIG. 12 is a schematic diagram of another interface display effect provided by an embodiment of this application.
  • FIG. 13 is a flowchart of another voice playback provided by an embodiment of the application.
  • The embodiment of the application provides an intelligent voice playback method for an intelligent exercise system. When the user performs exercise training by following the coach animation, the method can play different voices in real time according to the user's current exercise state and exercise actions, so as to provide the user with real-time voice feedback and real-time guidance for action improvement. The voice content is rich and varied, and the user experience is better.
  • The exercise system is used in scenarios where users perform AI exercise training with reference to coach animations.
  • For example, the user can perform AI fitness training, AI yoga training, AI fitness operation training, AI somatosensory games, or other AI sports training.
  • The smart voice playback method provided in the embodiments of the present application can be applied to an electronic device 01 having a screen 10, and in particular to an electronic device 01 with a large screen.
  • the electronic device 01 may also include a camera 20.
  • the camera 20 may be integrated in the electronic device 01, or may be an independent camera outside the main body of the electronic device 01, and connected to the main body of the electronic device 01 in a wired or wireless manner.
  • the electronic device 01 may also include an audio player 30.
  • the audio player 30 may be integrated in the electronic device 01, for example, it may be a speaker or a sound box.
  • the audio player 30 may also be an audio playback device connected to the main body of the electronic device 01 in a wired or wireless manner, for example, it may be a speaker or the like.
  • the camera 20 can be used to collect real-time motion images of the user.
  • the screen 10 of the electronic device 01 can be used to play coach animations and display images of the user's real-time exercise.
  • the electronic device 01 determines in real time the voice to be played for the user's current training state and training action, and performs voice playback through the audio player 30.
  • The electronic device 01 can be a TV, a desktop computer, a tablet computer, a notebook computer, a mobile phone, a smart screen, a projector, an ultra-mobile personal computer (UMPC), a netbook, or an augmented reality (AR)/virtual reality (VR) device, etc.
  • the embodiment of the present application does not limit the specific type of the electronic device 01.
  • the smart voice playback method provided in the embodiments of the present application can also be applied to the system as shown in FIG. 1B.
  • the system includes an electronic device 02 with a screen, and an electronic device 03 used in conjunction with the electronic device 02.
  • the electronic device 02 or the electronic device 03 may include a camera, and the camera is used to collect real-time motion images of the user.
  • the electronic device 02 or the electronic device 03 may include an audio player, and the audio player is used to play voice.
  • the electronic device 03 may be a mobile phone, a wearable device (such as a watch or a bracelet, etc.), a tablet computer or a notebook computer, etc.
  • a large-screen TV is used in conjunction with a mobile phone, and the screen of the TV is used to play coach animations and display images of users' real-time exercise.
  • the mobile phone used in conjunction with the TV can determine in real time the voice to be played for the user's current training state and training actions when the user is performing exercise training according to the coach animation, and the audio player on the mobile phone or the TV can play the voice.
  • FIG. 2 shows a schematic structural diagram of an electronic device 100 that adopts the intelligent voice playback method provided by an embodiment of the present application.
  • The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a power management module 140, an antenna, a wireless communication module 160, an audio module 170, a speaker 170A, a microphone 170C, a speaker interface 170B, a sensor module 180, buttons 190, an indicator 191, a camera 193, a display 192, etc.
  • the aforementioned sensor module 180 may include sensors such as a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, and an ambient light sensor.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that the processor 110 has just used or used cyclically. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a USB interface, etc.
  • the interface connection relationship between the modules illustrated in this embodiment is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection modes in the above-mentioned embodiments, or a combination of multiple interface connection modes.
  • the power management module 140 is used to connect to a power source.
  • the charging management module 140 may also be connected to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 receives power input, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 may also be provided in the processor 110.
  • the wireless communication function of the electronic device 100 can be realized by an antenna, a wireless communication module 160, and the like.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 may also receive the signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic waves to radiate through the antenna 2.
  • the antenna of the electronic device 100 and the wireless communication module 160 are coupled, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the electronic device 100 implements a display function through a GPU, a display screen 192, an application processor, and the like.
  • the GPU is a microprocessor for image processing and is connected to the display screen 192 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
  • the display screen 192 is used to display images, videos, and the like.
  • the display screen 192 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the display screen 192 may be used to display coach animations and real-time motion images of the user.
  • the electronic device 100 can implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 192, and an application processor.
  • the ISP is used to process the data fed back by the camera 193.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
  • the camera 193 may be provided at the upper edge of the display screen 192 of the electronic device 100.
  • the embodiment of the present application does not limit the position of the camera 193 on the electronic device 100.
  • the electronic device 100 may not include a camera, that is, the aforementioned camera 193 is not provided in the electronic device 100.
  • the electronic device 100 can be externally connected to the camera 193 through an interface (such as the USB interface 130).
  • the external camera 193 can be fixed on the electronic device 100 by an external fixing member (such as a camera bracket with a clip).
  • the external camera 193 may be fixed on the edge of the display 192 of the electronic device 100, such as the upper edge, by an external fixing member.
  • the camera 193 may be used to collect real-time motion images of the user.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • through the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, for example image recognition, face recognition, voice recognition, and text understanding.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
  • the processor 110 may execute instructions stored in the internal memory 121, and the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function).
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the processor 110 may determine different voices according to the user's current exercise state and exercise actions in real time when the user is performing exercise training according to the coach animation.
  • the electronic device 100 can implement audio functions through an audio module 170, a speaker 170A, a microphone 170C, a speaker interface 170B, and an application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • the speaker interface 170B is used to connect wired speakers.
  • the speaker interface 170B may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the speaker 170A, or the speaker interface 170B together with a connected speaker, can be used to play the different voices that the processor 110 determines in real time, based on the user's current exercise state and exercise actions, while the user performs exercise training according to the coach animation.
  • the button 190 includes a power-on button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
  • the indicator 191 may be an indicator light, which may be used to indicate that the electronic device 100 is in a power-on state, a standby state, or a power-off state. For example, when the indicator light is off, the electronic device 100 is in the power-off state; when the indicator light is green or blue, the electronic device 100 is in the power-on state; when the indicator light is red, the electronic device 100 is in the standby state.
  • the electronic device 100 is equipped with a remote control.
  • the remote controller is used to control the electronic device 100.
  • the remote control may include multiple buttons, such as a power button, a volume button, and other multiple selection buttons.
  • the buttons on the remote control can be mechanical buttons or touch buttons.
  • the remote controller can receive key input, generate key signal input related to user settings and function control of the electronic device 100, and send corresponding control signals to the electronic device 100 to control the electronic device 100.
  • the remote controller may send a control signal to the electronic device 100 through an infrared signal or the like.
  • the remote control may also include a battery storage cavity for installing batteries to supply power to the remote control.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. It may have more or fewer parts than shown in FIG. 2, may combine two or more parts, or may have a different part configuration.
  • the electronic device may also include components such as speakers.
  • the various components shown in FIG. 2 may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing or application specific integrated circuits.
  • the electronic device 100 may also include the above-mentioned components, may have more or fewer components than those shown in FIG. 2, may combine two or more components, or may have a different component configuration, which is not limited in the embodiment of this application.
  • the camera 193 in the electronic device 100 shown in FIG. 2 may be used to collect real-time motion images of the user.
  • the display screen 192 may be used to display coach animations and images of real-time movements of the user.
  • the processor 110 may determine, according to preset rules, different voices in real time based on the user's current exercise state and exercise actions while the user is performing exercise training according to the coach animation.
  • the speaker 170A may be used to play the voice determined by the processor 110. In this way, the electronic device can play different voices in real time according to the user's current exercise state and exercise actions while the user performs exercise training according to the coach animation, so as to provide the user with real-time voice feedback and guidance for action improvement. The voice content is rich and changeable, and the user experience is better.
  • the voices that can be played by the electronic device include main process voice and non-main process voice.
  • the non-main process voice may include one or more of the following types: action improvement voice, training rhythm voice, or action evaluation voice.
  • the electronic device can play a variety of different types of voice in real time for the user's current exercise state and exercise action, so the played voice content is rich and diverse and is not easily repeated. This avoids the monotony of playing a fixed set of voices, increases the user's interest in training, and improves the user experience.
  • each type of voice can also include multiple dimensions of voice prompts.
  • the electronic device can give users voice prompts from different dimensions and angles, which can further enrich the voice content that can be played.
  • the voice content can be flexible and changeable, and the voice prompts can be more specific and comprehensive, which gives the user a sense of freshness and interest and improves the user experience.
  • the main process voice can be used to introduce some relevant information of the current training course to the user, to ensure the normal progress of the training process, etc.
  • the main process voice may include one or more of multiple dimensions such as process voice, position adjustment voice, stance adjustment voice, or humanized prompt voice.
  • the process voice is used to introduce the relevant information of the current course progress, for example, it may include the voice used to introduce the name of the action and the number of groups, the voice prompt for the start of the action, and the voice prompt for the end of the action, etc.
  • the process voice can be "Welcome to the fitness system, the first session, hand-assisted squat, a total of 3 groups, each group of 5 actions".
  • the voice prompt at the end of the action can be "Yes, you have completed the first session of the training action" or "Wow, that's great. It seems that you have completely mastered the essentials of the action. Keep it up in the next action", etc.
  • the position adjustment voice can be used to prompt the user to stand in a specific position, so that the camera can collect the user's image and ensure that the training can be carried out normally.
  • the user's position adjustment voice may be "please stand in front of the screen” or "please move to the middle area of the screen” and so on.
  • the stance adjustment voice can be used to prompt the user to adjust the orientation of the stance, so that the camera can collect images of a specific part of the user to ensure that the training can proceed normally.
  • the stance adjustment voice may be "please stand sideways to the screen".
  • the humanized prompt voice can be used to give some humanized prompts to the user. For example, when the electronic device determines that the user is close to the corner of the table through the image collected by the camera, it can prompt the user “please stay away from the corner of the table to avoid injury”.
  • the electronic device can determine whether to play the process voice according to the progress of the training course.
  • the electronic device can determine whether to play the position adjustment voice or the stance adjustment voice according to the user's position and orientation.
  • the electronic device can determine whether to play the humanized prompt voice according to the user's current environment and other conditions.
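  • the voice types described above can be summarized as a small data structure. The following is a minimal sketch, not the implementation of this embodiment: the names `VoiceCategory`, `VoiceType`, and `voices_in` are hypothetical, and it assumes that the action improvement voice, the training rhythm voice, and the action evaluation voice form the non-main process voice, triggered by the user's training actions and action count.

```python
from enum import Enum


class VoiceCategory(Enum):
    MAIN_PROCESS = "main process voice"
    NON_MAIN_PROCESS = "non-main process voice"


class VoiceType(Enum):
    # Each member is (label, category, what the device inspects to trigger it).
    # The grouping and the trigger descriptions are illustrative assumptions.
    PROCESS = ("process voice", VoiceCategory.MAIN_PROCESS, "course progress")
    POSITION_ADJUSTMENT = ("position adjustment voice", VoiceCategory.MAIN_PROCESS, "user position")
    STANCE_ADJUSTMENT = ("stance adjustment voice", VoiceCategory.MAIN_PROCESS, "user orientation")
    HUMANIZED_PROMPT = ("humanized prompt voice", VoiceCategory.MAIN_PROCESS, "user environment")
    ACTION_IMPROVEMENT = ("action improvement voice", VoiceCategory.NON_MAIN_PROCESS, "training action")
    TRAINING_RHYTHM = ("training rhythm voice", VoiceCategory.NON_MAIN_PROCESS, "action count")
    ACTION_EVALUATION = ("action evaluation voice", VoiceCategory.NON_MAIN_PROCESS, "training action")

    def __init__(self, label, category, trigger_source):
        self.label = label
        self.category = category
        self.trigger_source = trigger_source


def voices_in(category):
    """Return all voice types that belong to the given category."""
    return [v for v in VoiceType if v.category is category]
```

  • for example, `voices_in(VoiceCategory.MAIN_PROCESS)` lists the four main process dimensions, and `VoiceType.HUMANIZED_PROMPT.trigger_source` records that this voice depends on the user's environment.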
  • the above-mentioned action improvement voice is used to guide the user to improve the training action and make the training action more standard.
  • the above-mentioned action evaluation voice is used to affirm, encourage, and praise the user's training actions.
  • the electronic device can determine the degree of matching between the user's training action and the standard action in the coach animation based on the collected user's image, combined with a preset detection algorithm.
  • the electronic device can guide the user to improve the standard level of the action through the action improvement voice, so that the user can learn the shortcomings and the direction of improvement in time from the real-time voice feedback, and adjust and improve the quality of the action in time.
  • the action improvement voice is a positive, guiding voice that helps the user improve the quality of the action and correct errors, rather than repeatedly pointing out that the user's training action is wrong or non-standard, which leads to a poor user experience. Moreover, even if the user is told that the training action is wrong or that a certain part of the action is not standard, the user usually does not know how to improve it.
  • the electronic device can play the action improvement voice in real time for the user's current exercise action, so as to provide real-time guidance that improves the quality of the current training action.
  • the action improvement voice can be short, easy-to-understand, colloquial language.
  • the action improvement voice may include one or more of multiple dimensions such as frequency improvement voice, amplitude improvement voice, or posture improvement voice.
  • for example, see Table 2 for examples of action improvement voice content.
  • the electronic device can encourage the user to be positive through the action evaluation voice, so as to increase the user's confidence and sense of accomplishment, and enhance the user's sense of interest.
  • the action evaluation voice may include multiple dimensions such as perfect level and awesome level.
  • the content of the action evaluation voice can be found in Table 3.
  • through the training rhythm voice, the electronic device can inform the user of the progress of the current training action, for example how many sets or actions have been performed and how many remain, which helps the user understand the progress of the course and adjust their physical and mental state. Or, when the user may be a little tired during the exercise, the electronic device can use the training rhythm voice to remind the user that only a few sets remain and the training will end soon, to help the user persevere.
  • the electronic device can guide, correct, and encourage the user in real time through voice during the user's exercise training process, and guide the user to smoothly perform the exercise training.
  • the method may include:
  • after the electronic device detects the user's operation to open the AI fitness APP, it opens the AI fitness APP and displays the APP interface.
  • the electronic device can detect the operation of the user instructing the electronic device to open the AI fitness APP through a remote control or a voice command. Or, if the screen of the electronic device is a touch screen, the electronic device can also detect a touch operation that the user instructs to open the AI fitness APP. After the electronic device detects the user's operation to open the AI fitness APP, it can open the AI fitness APP to start the fitness course.
  • an AI fitness APP icon 401 is displayed on the electronic device. After the electronic device detects that the user has clicked the AI fitness APP icon 401 through the remote control, it opens the AI fitness APP and displays the interface of the AI fitness APP shown in (b) in FIG. 4. Then, after the electronic device detects that the user selects a fitness course (for example, the introductory course 402), it starts the fitness course and displays the interface shown in (c) in FIG. 4.
  • if the electronic device determines, according to the course progress, to trigger the process voice in the main process voice, the electronic device plays the process voice.
  • the electronic device can play a process voice according to the course progress, so as to introduce the relevant information of the current training course to the user and explain the training actions that the user will perform.
  • the electronic device can play a process voice at the beginning of the course.
  • the process voice may be "Welcome to the fitness introductory course", or "Training is about to start, please be prepared!”, etc.
  • the electronic device may also display the text information of the process voice on the screen to further prompt the user visually.
  • if the electronic device determines, according to the user's state and preset condition 1, to trigger the position adjustment voice, the stance adjustment voice, or the humanized prompt voice in the main process voice, the electronic device plays the triggered voice.
  • the user's state includes the user's position, standing orientation, and environment.
  • the electronic device determines to trigger the position adjustment voice.
  • the electronic device determines to trigger the stance adjustment voice.
  • the electronic device determines to trigger a humanized prompt voice.
  • the electronic device can determine, according to the user's state and preset condition 1, whether to trigger the position adjustment voice, the stance adjustment voice, or the humanized prompt voice in the main process voice, so as to prompt the user to adjust position, orientation, and other states in time and ensure that the user can start fitness training normally.
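  • the check against preset condition 1 could look roughly like the sketch below. The concrete conditions are assumptions made only for illustration: the `UserState` fields, the "middle area" bounds, and the safe distance are hypothetical, and only the prompt texts are taken from the examples in this description. A list is returned because one user state may trigger more than one main process voice.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    in_frame: bool              # whether the camera captures the user at all
    center_x: float             # horizontal position in the frame, 0.0 (left) to 1.0 (right)
    orientation_ok: bool        # whether the user faces the way the current action requires
    obstacle_distance_m: float  # distance to the nearest obstacle, in metres


def main_process_voices(state, middle=(0.3, 0.7), safe_distance_m=0.5):
    """Return the main process voices triggered by the user's state.

    The thresholds are hypothetical stand-ins for preset condition 1.
    """
    if not state.in_frame:
        # Nothing else can be judged until the user is visible.
        return ["please stand in front of the screen"]
    voices = []
    if not (middle[0] <= state.center_x <= middle[1]):
        voices.append("please move to the middle area of the screen")
    if not state.orientation_ok:
        voices.append("please stand sideways to the screen")
    if state.obstacle_distance_m < safe_distance_m:
        voices.append("please stay away from the corner of the table to avoid injury")
    return voices
```

  • a user who is centred, correctly oriented, and clear of obstacles triggers no voice, so the course can proceed without interruption.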
  • the electronic device determines to trigger the main process voice "please stand in front of the screen".
  • when playing a main process voice such as the position adjustment voice, the stance adjustment voice, or the humanized prompt voice, the electronic device may also display first graphic information on the screen.
  • the first graphic information is graphic information corresponding to the voice of the main process.
  • the electronic device may display the first graphic information in the form of a task box on the side where the user's image screen is displayed, so that the user can learn the target that needs adjustment and improvement of the user status according to the first graphic information.
  • the first graphic information can be a more concise graphic representation for the main process voice.
  • the first graphic information in the form of the task box may refer to the prompt information 501 in FIG. 5A.
  • the electronic device can also display the first graphic information in the middle of the screen, so that the user can more easily see the related prompt content.
  • the first graphic information may further include the prompt information 502 in FIG. 5A.
  • the information 500 shown in FIG. 5A represents the main process voice played by the electronic device.
  • the first graphic information in the form of the task box on the electronic device can change accordingly, in response to the user's status adjustment.
  • the completion indicator may be displayed on the first graphic information.
  • the electronic device may update the cross before the prompt message 501 to a tick.
  • the tick can be the completion indicator.
  • the electronic device may also play a response voice (for example, a beep). Then, referring to FIG. 5C, the first graphic information on the screen disappears. The electronic device continues to play the coach animation, thereby continuing the subsequent courses.
  • if the electronic device detects that the user has not adjusted to the state required by the first graphic information, the first graphic information is continuously displayed. The first graphic information will not disappear, the course will not continue, and the coach animation will not play the follow-up content.
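  • the task box behaviour described above amounts to a gate on the course: the prompt stays on screen until the required state is reached. The sketch below models it as a list of interface events. The function name `task_box_events` and the event tuples are hypothetical; the sketch only mirrors the sequence of cross, tick, response voice, disappearance, and resumed coach animation.

```python
def task_box_events(prompt, observed_states, is_satisfied):
    """Return the UI events produced while waiting for the user to adjust.

    prompt          -- text of the first graphic information
    observed_states -- successive user states detected from the camera
    is_satisfied    -- predicate telling whether a state meets the prompt
    """
    # The prompt is first shown with a cross in front of it.
    events = [("show", "cross", prompt)]
    for state in observed_states:
        if is_satisfied(state):
            events.append(("show", "tick", prompt))      # completion indicator
            events.append(("play", "response voice"))    # for example, a beep
            events.append(("hide", prompt))              # task box disappears
            events.append(("resume", "coach animation"))
            return events
    # Requirement never met: the prompt stays displayed and the course is paused.
    return events
```

  • if no observed state satisfies the prompt, the returned list contains only the initial display event, which corresponds to the course not continuing.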
  • if the electronic device determines that multiple main process voices are triggered by the same state of the user, the electronic device can play each main process voice separately.
  • multiple main process voices may include "please stand in front of the screen", “please face the screen” and “return your head back to the screen”, etc.
  • the electronic device stops displaying the first graphic information and continues the subsequent courses.
  • the electronic device plays the coach animation and displays the image of the user's training action.
  • the electronic device can play the coach animation.
  • the coach animation is the animation of the coach's standard actions.
  • the user can refer to the coach animation for training actions.
  • the electronic device can also collect images of the user's training actions (also called fitness actions or sports actions) and display them on the screen of the electronic device. That is, the electronic device can display the coach animation and the image of the user's training action on the screen at the same time, so that the user can compare how well his or her own training action matches the standard action in the coach animation.
  • the interface displayed by the electronic device can be seen in FIG. 6.
  • the electronic device may also display preset bone nodes on the image of the user's training actions, so that the user can more accurately identify the key parts and the difference from the coach's standard actions based on the bone nodes, which makes it easier for the user to adjust and improve the action.
  • the bone node may be shown as a dot 601 in FIG. 6.
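  • the preset detection algorithm that compares the user's training action with the standard action is not specified here. As one possible illustration only, the sketch below derives joint angles from 2D bone node coordinates and scores how close the user's angles are to the coach's. The functions `joint_angle` and `matching_degree`, the joint names, and the 45-degree tolerance are all assumptions.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at bone node b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("two bone nodes coincide; the angle is undefined")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def matching_degree(user_angles, standard_angles, tolerance_deg=45.0):
    """Score in [0, 1]; 1.0 means every joint angle equals the standard one.

    Each joint loses score linearly with its angular error and contributes
    0 once the error reaches tolerance_deg (a hypothetical tolerance).
    """
    if not standard_angles or user_angles.keys() != standard_angles.keys():
        raise ValueError("both inputs must describe the same non-empty set of joints")
    scores = []
    for joint, standard in standard_angles.items():
        error = abs(user_angles[joint] - standard)
        scores.append(max(0.0, 1.0 - error / tolerance_deg))
    return sum(scores) / len(scores)
```

  • for instance, a knee angle could be computed as `joint_angle(hip, knee, ankle)` for both the user and the coach, and the resulting score could then be compared against preset thresholds.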
  • the electronic device can determine whether to trigger the main process voice, the action improvement voice, the action evaluation voice, and the training rhythm voice.
  • the method may further include:
  • the electronic device determines whether to play the process voice in the main process voice according to the subsequent course progress.
  • the electronic device can play a process voice at the beginning of each set of actions to give relevant instructions and prompts for the training actions to be performed next.
  • the process voice may prompt the name and group number of the group of actions, for example, "Next group of actions" or "3, 2, 1, start!", and so on.
  • the electronic device can play a process voice.
  • the process voice may be "You have completed all actions!".
  • the electronic device continuously monitors the state of the user, and determines whether to trigger the position adjustment voice, the stance adjustment voice, or the humanized prompt voice in the main process voice according to the preset condition 1.
  • the electronic device can continuously monitor the user's state, and determine whether to trigger the position adjustment voice, the stance adjustment voice, or the humanized prompt voice in the main process voice according to the preset condition 1, so as to prompt the user to adjust position, orientation, and other states in time and ensure that the user can perform fitness training normally.
  • the electronic device continuously detects the user's training action, and determines whether to trigger the action improvement voice according to the preset condition 2.
  • the preset condition 2 may include that the degree of matching between the user's training action and the standard action in the coach animation is relatively low.
  • the action improvement voice can be used to guide the user in real time for the current training action, and guide the user to make the action more standard.
  • the electronic device continuously detects the user's training action, and determines whether to trigger the action evaluation voice according to the preset condition 3.
  • the preset condition 3 may include that the degree of matching between the user's training action and the standard action in the coach animation is relatively high.
  • the action evaluation voice can be used to give the user positive encouragement for the current training action, thereby increasing the user's confidence and sense of accomplishment and enhancing the user's interest.
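  • preset conditions 2 and 3 can be read as two thresholds on the degree of matching. The following is a minimal sketch under that reading; the function name `action_feedback` and the threshold values 0.6 and 0.85 are hypothetical and would be tuned in practice.

```python
def action_feedback(matching_degree, improve_below=0.6, praise_above=0.85):
    """Decide which action-related voice a training action triggers.

    matching_degree -- score in [0, 1] against the standard action
    improve_below   -- hypothetical threshold for preset condition 2
    praise_above    -- hypothetical threshold for preset condition 3
    """
    if not 0.0 <= matching_degree <= 1.0:
        raise ValueError("matching_degree must be within [0, 1]")
    if matching_degree < improve_below:
        return "action improvement voice"
    if matching_degree > praise_above:
        return "action evaluation voice"
    # In between, neither condition is met and no action voice is triggered.
    return None
```

  • leaving a gap between the two thresholds means that an average action triggers neither voice, which keeps the feedback from becoming too frequent.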
  • the electronic device records the number of user training actions, and determines whether to trigger the training rhythm voice according to preset condition 4.
  • the preset condition 4 is that half of a set of actions has been performed, and the training rhythm voice is used to remind the user that half of the actions have been performed.
  • the preset condition 4 is that there are N times left in a group of actions, and the value of N is relatively small, for example, it can be 1, 2, or 5.
  • the training rhythm voice is used to remind the user that the group of actions is about to end and to encourage the user to hold on.
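  • the two forms of preset condition 4 can be combined in one counting check, sketched below. The function name `training_rhythm_voice`, the prompt wording, the default N of 2, the rounding up of "half" for odd counts, and the preference for the "about to end" reminder when both forms hold are all assumptions made for illustration.

```python
def training_rhythm_voice(done, total, n_remaining=2):
    """Return a training rhythm prompt for the current count, or None.

    done        -- number of actions already performed in this group
    total       -- number of actions in the group
    n_remaining -- hypothetical N: remind when exactly N actions are left
    """
    if total <= 0 or not 0 <= done <= total:
        raise ValueError("expected 0 <= done <= total and total > 0")
    remaining = total - done
    if remaining == 0:
        return None  # the group is finished; the process voice takes over
    if remaining == n_remaining:
        # Assumption: the "about to end" reminder wins if both forms hold.
        return f"Only {remaining} left, hold on!"
    if done == (total + 1) // 2:
        # "Half" is rounded up for odd totals (an assumption).
        return "Half done, keep going!"
    return None
```

  • with a group of 10 actions and N set to 2, the prompt is triggered after the 5th action and again after the 8th, and at no other count.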
  • the electronic device selectively plays the voice triggered by the same action unit.
  • through step 305 to step 309, the electronic device usually determines that multiple voices to be played are triggered by the state and actions of the user. To keep the voice playback timely, so that the electronic device is not still playing a voice related to the previous action when the user is already performing the next action, and to avoid playback conflicts among multiple voices, the electronic device can select one of the multiple to-be-played voices to play. This ensures that the played voice corresponds to, and matches in real time, the user's training action. In other words, the electronic device can selectively play the voice triggered by the same action unit.
  • the action unit is a unit preset by the electronic device for selective voice playback.
  • selective playback of the voices triggered by the same action unit means that the electronic device selectively plays the non-main process voices triggered by the actions in that action unit and the main process voices triggered by the user's state during execution of that action unit.
  • the action unit can be a training action, or the action unit can be a part of a training action.
  • a dumbbell lateral flexion training action can include a first action unit and a second action unit.
  • the first action unit includes holding the dumbbell in one hand and placing the other hand behind the head; the second action unit includes bending toward the side holding the dumbbell.
  • in a dumbbell lateral flexion training action, if the first action unit triggers 2 voices, the electronic device selectively plays from these 2 voices. If the second action unit triggers 3 voices, the electronic device selectively plays from these 3 voices.
  • one squat training action is one action unit. If one squat training action triggers 4 voices, the electronic device selectively plays from these 4 voices.
  • the electronic device can perform selective playback during the execution of the current action unit, and does not necessarily perform selective playback after the execution of the current action unit.
  • the user is usually required to follow the action points while squatting down, and there are basically no special requirements while the user stands up after squatting. Therefore, once the squatting-down phase is over, the electronic device can selectively play the voices triggered during that phase. The selected voice can then finish playing while the user stands up, so it does not conflict with the next training action.
  • one squat training action is one action unit, and the user performs a first squat, a second squat, and a third squat.
  • for the first squat, the electronic device selects voice 1 from the multiple triggered voices for playback. If voice 1 has not finished playing when the first squat is completed, voice 1 continues to play during the second squat, and the electronic device discards the voices triggered by the second squat. If voice 1 has finished playing by the third squat, the electronic device selectively plays from the multiple voices triggered by the third squat.
  • the voice triggered by the electronic device to the same action unit may include multiple voices of multiple types and multiple dimensions.
  • the electronic device performs selective voice playback from these voices, which makes the played content hard to repeat, unpredictable, and flexible, while the voice played corresponds to the user's training actions and state in real time.
  • the electronic device may select the first voice to be triggered for playback according to the sequence of triggering.
  • the electronic device may selectively play multiple voices triggered by the same action unit according to the priority of the voice. Among them, compared with low-priority voices, high-priority voices are more important. That is, the electronic device can preferentially play important voices.
  • the priority of the main process voice is higher than that of non-main process voices such as the action improvement voice, the training rhythm voice, and the action evaluation voice.
  • the priority of the main process voice is higher than that of the action improvement voice, and the priority of the action improvement voice is higher than that of the action evaluation voice and the training rhythm voice.
  • if the electronic device determines that multiple main process voices are triggered according to the user's state while the user performs the same action unit, then, because the main process voice serves a more important function and has a higher priority, the electronic device can play each main process voice separately to ensure that the user's fitness training proceeds normally. Specifically, the electronic device may play the main process voices one after another in the order in which they were triggered.
  • the electronic device determines that the main process voice "please stand in front of the screen” is triggered first, and then the main process voice "please stay away from the corner of the table” is triggered. At this time, referring to FIG. 7B, the electronic device may first play the main process voice "please stand in front of the screen", and display the first graphic information corresponding to the voice. After the electronic device detects that the user stands in front of the screen, referring to Figure 7C, the electronic device can play the main process voice "please stay away from the corner of the table” and display the first graphic information corresponding to the voice.
  • until the electronic device detects that the user has adjusted to the state required by the first graphic information according to the main process voice and the first graphic information, the first graphic information does not disappear, the course does not continue, and the coach animation does not play subsequent content. Therefore, playing each main process voice separately does not cause subsequent course actions to mismatch the voice playback and does not affect the timeliness of voice playback.
  • if the electronic device determines that at least one main process voice is triggered according to the user's state while the user performs the same action unit, and that at least one non-main process voice is triggered according to the training action, then, because the main process voice has a high priority, the electronic device can play the at least one main process voice and discard the non-main process voices. This keeps voice playback timely and matched with the user's training actions in real time, and prevents the electronic device from still playing a non-main process voice of the previous action while the user is performing the next action.
  • the first training action is dumbbell lateral flexion training
  • one dumbbell lateral flexion training action is one action unit.
  • the second training action is dumbbell compound press
  • one dumbbell compound press is a unit of action.
  • if the electronic device played all the voices triggered by the first training action, those voices might not have finished playing when the second training action is performed, which would result in a mismatch between voice playback and training actions and in poor timeliness of voice playback.
  • the electronic device determines that multiple non-main process voices are triggered according to the user's training actions while the user performs the same action unit. For this situation, in scheme 1, if priority division method 1 is adopted and the multiple non-main process voices have the same priority, the electronic device can select, from the multiple non-main process voices, the one that was triggered first and play it. Alternatively, the electronic device may randomly select one of the multiple non-main process voices to play.
  • the electronic device may also display second graphic information on the screen, where the second graphic information is the graphic information of the action improvement voice. For example, the electronic device may display the second graphic information on the side of the screen where the user's image is displayed.
  • the second graphic information may be a more concise graphic representation in the form of a task box.
  • one dumbbell lateral flexion training action is one action unit, and the electronic device determines that, while performing one dumbbell lateral flexion training action, the user has triggered the action improvement voices "Please bend to the dumbbell hand side" and "A larger lateral flexion can stimulate the external oblique muscles more effectively". These two voices have the same priority.
  • the electronic device can select one voice to play according to the trigger order or at random. For example, referring to FIG. 8A, the electronic device selects and plays the voice "Please bend to the dumbbell hand side", and at the same time displays the second graphic information corresponding to the voice on the screen, that is, the prompt message 801.
  • the task on the electronic device can be changed accordingly to respond to the user's status adjustment.
  • the completion mark may be displayed on the second graphic information.
  • the second graphic information disappears.
  • the cross in the prompt message 801 is updated to a tick.
  • the check mark can be the completion mark. Then, the electronic device stops displaying the prompt message 801.
  • the electronic device may display a prompt mark on the second graphic information.
  • the cross in the prompt message 801 is updated to an exclamation mark.
  • the exclamation mark can be the prompt identification.
  • the electronic device plays the voice "Bend your body to the side, don't bend forward" for the follow-up action unit.
  • the second graphic information stops being displayed after it has been displayed continuously for a preset period of time, or after the training actions of the same type end.
  • the end of the same type of training action here refers to the end of this type of training action of dumbbell lateral flexion.
  • if the non-main process voice selected by the electronic device is an action improvement voice, and the electronic device has already played that action improvement voice for this type of training action in the current fitness training (for example, multiple training actions of the same type may need to be performed), the electronic device discards the voice and re-selects a non-main process voice to play according to the trigger order or at random. This prevents the electronic device from frequently pointing out the same problem and making the same improvement request. In some technical solutions, when playing the newly selected non-main process voice, the electronic device may also display the second graphic information corresponding to the action improvement voice.
  • one dumbbell lateral flexion training action is one action unit.
  • in another dumbbell lateral flexion training action after FIG. 8A, that is, in another action unit of the same type of training action, the voice selected by the electronic device for playback is the action improvement voice "Please bend to the dumbbell hand side". Since the electronic device has already played this action improvement voice, it discards the voice and selects a new voice to play.
  • if the non-main process voice selected by the electronic device is an action improvement voice for a first error (for example, the user bends toward the side of the hand that is not holding the dumbbell), and the electronic device has already played this action improvement voice for this type of training action in the current fitness training, the electronic device can play the action points for the first error and, only after the action points have been played, determine whether voices are triggered by subsequent training actions, to avoid a mismatch between subsequent training actions and voice.
  • the electronic device can continue to play the coach animation of the subsequent course, and the user can continue the subsequent training action.
  • the content of the corresponding action points is also different.
  • one dumbbell lateral flexion training action is an action unit.
  • the voice selected by the electronic device for playback is the action improvement voice "Please bend to the dumbbell hand side"
  • the electronic device no longer plays the voice
  • the action point of the training action is "Keep your back straight and lean over, hold a dumbbell in one hand, put the other hand behind your head, bend toward the dumbbell side, and don't bend forward."
  • the electronic device can continue to play the coach animation of the subsequent courses, but it is not yet certain whether the voice is triggered during the subsequent courses.
  • when the electronic device plays the action points, it can also display the second graphic information corresponding to the action improvement voice.
  • the electronic device can discard the action improvement voice and display the second graphic information corresponding to the action improvement voice. Then, in some embodiments, the electronic device reselects a voice to play from the other voices triggered by the action unit; in other embodiments, the electronic device no longer plays any voice triggered by the action unit.
  • the electronic device selects the voice with the highest priority to play. If there are multiple voices with the highest priority, one voice is selected to play according to the trigger sequence or randomly.
  • if the electronic device determines that one non-main process voice is triggered according to the user's training action while the user performs the same action unit, the electronic device plays that non-main process voice.
  • the electronic device plays the training rhythm voice.
  • the electronic device may determine that many action evaluation voices are triggered. If the electronic device played an action evaluation voice for every action, voice would be playing almost continuously; action evaluation voices would be played too frequently, and the user experience would be poor.
  • the electronic device may use a bye mechanism (that is, randomly sitting out a turn) to play the action evaluation voice.
  • the electronic device randomly determines whether it is currently in the bye mode or the play mode. If the result is the bye mode, the action evaluation voice is discarded and the electronic device does not play it. If the result is the play mode, the electronic device randomly selects one voice to play from the rich corpus of action evaluation voices at the level (for example, the perfect level or the awesome level) that corresponds to the degree of matching between the user's training action and the standard action.
  • the randomness and unpredictability of voice playback can be increased, the appearance of repetitive action evaluation voices can be reduced, the dullness and predictability of repeated evaluations can be reduced, and the user can have a new and fresh experience.
  • the electronic device may randomly bye or play a perfect level of action evaluation voice.
  • the perfect level of action evaluation voice played by the electronic device is also randomly determined.
  • when the electronic device is playing the action evaluation voice, it may also display graphic information corresponding to the action evaluation voice, so as to better affirm and encourage the user, give the user confidence, and increase the user's interest in training.
  • graphic information corresponding to the action evaluation voice that is, the prompt information 1101.
  • the priority of the voice is divided by the voice type, and the priority of the voice can also be determined in other ways.
  • the priority of voice can be determined according to the actual situation of the user and combined with artificial intelligence AI.
  • if the electronic device determines, according to the results of AI learning, that the user's historical performance was poor but that this training action is performed to a significantly higher standard, this indicates that the user has made great progress, so the electronic device can give priority to playing a positive and encouraging action evaluation voice; that is, the action evaluation voice has the highest priority. In this way, users can be informed of their progress in time and feel the joy of progress, which increases their confidence and interest.
  • the electronic device can also adjust the played voice through AI. For example, squats are more difficult for heavier users. If such a user's squat training action does not quite reach the awesome level of matching with the standard action but is close to it, the electronic device can play an awesome-level action evaluation voice to encourage the user; if the degree of matching is close to the perfect level, the electronic device can play a perfect-level action evaluation voice to encourage the user and enhance the user's self-confidence.
  • the electronic device can give priority to playing the training rhythm voice and increase its playing frequency, so that the user learns the current training progress and the number of remaining actions in time and is encouraged to persist through a complete set of training actions.
  • when the electronic device plays the main process voice, action improvement voice, action evaluation voice, or training rhythm voice, it can also display the text content corresponding to the voice on the screen, so that the user can understand the voice prompt content of the electronic device more clearly. Exemplarily, in the case shown in FIG. 5A, referring to FIG. 12, the electronic device may display the text content 1201 corresponding to the voice at the top of the screen.
  • the electronic device is preset with a voice library including multiple voices of various types and dimensions.
  • when the user performs exercise training according to the coach animation, the electronic device can select different voices from the voice library to play in real time according to the user's current exercise state and exercise actions.
  • the electronic device is not preset with a voice library including multiple voices, but a single-character corpus is preset.
  • the single-character corpus includes multiple single characters or words.
  • the electronic device can analyze and synthesize more targeted and humanized voice prompts based on the user's current exercise state and exercise actions in real time when the user performs exercise training according to the coach animation, so as to perform voice playback.
  • the single-character/word corpus includes "beautiful, perfect, handsome, invincible, you, you, myself, screen, temperament, front, side, bend, yes, exactly the same, true, very, very, good, consistent, and ising", and so on.
  • a voice synthesized by the electronic device in real time is "You are so handsome!"
  • Another exemplary voice synthesized by the electronic device in real time is "Beautiful, in line with your own temperament!"
  • Another embodiment of the present application also provides an intelligent voice playback method, which may be applied to an electronic device, and the electronic device may include a screen, a camera, and an audio player.
  • the method may include:
  • the electronic device displays an interface of a first application program, and the first application program is used for a user to perform sports training.
  • the electronic device may display the interface of the first application program on the screen.
  • the first application program may be the aforementioned AI fitness APP
  • the interface of the first application program may be the interface shown in (c) in FIG. 4.
  • the sports training may be AI fitness training, AI yoga training, AI aerobics training, an AI somatosensory game, or other sports training.
  • the electronic device collects images of the user's training actions.
  • the electronic device can collect images of the user's training actions through the camera.
  • the electronic device plays the animation of the standard action and displays the image of the user's training action.
  • Exemplarily, an interface on which the electronic device plays the standard action animation and displays the image of the user's training action is shown in FIG. 6.
  • the electronic device determines a voice to be selected triggered by a first action unit in the user's training action.
  • the voice to be selected includes multiple voices, and the first action unit is a training action or a part of a training action.
  • the first action unit may be any action unit in the user's exercise training process.
  • the electronic device selects a voice from the to-be-selected voices to play.
  • for the related descriptions of step 1304 and step 1305, refer to the descriptions of step 305 to step 310 above; details are not repeated here.
  • the electronic device can determine in real time, for the user's current action unit, multiple candidate voices that match the user's motion state, and select different voices from the candidate voices to play, so as to give the user real-time voice feedback. The voice content played is rich and varied and not prone to repetition, and the user experience is better.
  • Another embodiment of the present application also provides an electronic device, which may include a display unit, a collection unit, a determination unit, a playback unit, and so on. These units can execute the steps in the above-mentioned embodiments to realize the intelligent voice playback method.
  • the embodiments of the present application also provide an electronic device, including one or more processors; a memory; and one or more computer programs.
  • One or more computer programs are stored in the memory, and the one or more computer programs include instructions.
  • when the instructions are executed, the electronic device is caused to execute each step in the foregoing embodiment, so as to realize the foregoing intelligent voice playback method.
  • the embodiment of the present application also provides a computer storage medium that stores computer instructions. When the computer instructions run on the electronic device, the electronic device executes the above-mentioned related method steps to realize the intelligent voice playback method in the above-mentioned embodiment.
  • the embodiments of the present application also provide a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned related steps, so as to realize the intelligent voice playback method executed by the electronic device in the above-mentioned embodiment.
  • the embodiments of the present application also provide a device.
  • the device may specifically be a chip, component or module.
  • the device may include a processor and a memory connected to each other.
  • the memory is used to store computer execution instructions.
  • the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the intelligent voice playback method executed by the electronic device in the foregoing method embodiments.
  • the electronic device, computer storage medium, computer program product, or chip provided in this embodiment are all used to execute the corresponding method provided above. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above; details are not repeated here.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate.
  • the parts displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application essentially, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

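An intelligent voice playback method and device, relating to the field of electronic technologies and applicable to artificial intelligence fitness scenarios. Different voices can be played in real time according to the user's current training state and training actions, giving the user real-time voice feedback and real-time guidance for improving actions; the voice content is rich and varied, and the user experience is good. The voice playback method includes: an electronic device displays an interface of a first application program, where the first application program is used by a user for sports training (1301); collects images of the user's training actions (1302); plays an animation of standard actions and displays the images of the user's training actions (1303); determines multiple candidate voices triggered by a first action unit in the user's training actions, where the first action unit is one training action or a part of one training action (1304); and selects one voice from the candidate voices to play (1305).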

Description

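An intelligent voice playback method and device
This application claims priority to Chinese Patent Application No. 201910818708.0, filed with the China National Intellectual Property Administration on August 30, 2019 and entitled "An intelligent voice playback method and device" (一种智能语音播放方法及设备), which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of electronic technologies, and in particular, to an intelligent voice playback method and device.
Background
In recent years, as the demand for fitness has risen and fitness awareness has grown, the population of fitness enthusiasts has gradually increased. As the pace of life accelerates, many fitness enthusiasts often have no time to go to a gym for dedicated fitness training. An intelligent fitness solution based on image processing enables users to obtain professional fitness guidance without leaving home.
One intelligent fitness solution in the prior art plays an animation of a coach's standard actions on a large-screen device to provide the user with professional action guidance, and the user trains by following the coach's standard actions. In addition, the large-screen device can play a preset set of voices along with the coach animation to help the coach animation guide the user's fitness training.
In this solution, the large-screen device mechanically repeats the same preset, fixed set of voices, which makes voice playback monotonous and uninteresting. In particular, over repeated training sessions the user hears the same voices every time, which easily bores the user and causes a loss of interest, so the user experience is poor.
Summary
The embodiments of this application provide an intelligent voice playback method and device that can play different voices in real time according to the user's current training state and training actions, thereby giving the user real-time voice feedback and real-time guidance for improving actions. The voice content is rich and varied, and the user experience is good.
To achieve the foregoing objective, the embodiments of this application adopt the following technical solutions: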
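In one aspect, an embodiment of this application provides a voice playback method. The method includes: an electronic device displays an interface of a first application program, where the first application program is used by a user for sports training. The electronic device collects images of the user's training actions. The electronic device plays an animation of standard actions and displays the images of the user's training actions. The electronic device determines candidate voices triggered by a first action unit in the user's training actions, where the candidate voices include multiple voices, and the first action unit is one training action or a part of one training action. The electronic device selects one voice from the candidate voices to play.
In this solution, the electronic device can determine in real time, for the user's current action unit, multiple candidate voices that match the user's motion state, and select different voices from the candidate voices to play. This gives the user real-time voice feedback; the voice content played is rich and varied and not prone to repetition, and the user experience is good.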
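In a possible design, the candidate voices include main process voices and non-main process voices. A main process voice is used to enable the sports training to proceed normally. Non-main process voices include one or more of an action improvement voice, an action evaluation voice, or a training rhythm voice. A main process voice is a high-priority voice, and a non-main process voice is a low-priority voice. That the electronic device selects one voice from the candidate voices to play includes: the electronic device selects one voice from the candidate voices to play according to the priorities of the voices.
That is, the voices that the electronic device can play are of multiple types. When the user performs sports training by following the coach animation, the electronic device can play voices of different types in real time according to voice priority, the user's current motion state, and the user's actions. The voice content to be played is therefore rich and diverse, which avoids the monotony of playing one fixed set of voices, increases the user's interest in training, and improves the user experience.
In another possible design, the candidate voices include at least one high-priority main process voice. That the electronic device selects one voice from the candidate voices to play according to the priorities of the voices includes: the electronic device separately plays each high-priority main process voice among the candidate voices.
Because a main process voice is used to ensure that the sports training proceeds normally, it is relatively important and has a high priority, so the electronic device can play each main process voice in turn.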
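In another possible design, when the electronic device plays each main process voice, the method further includes: the electronic device stops playing the animation of standard actions. The electronic device displays first graphic information, where the first graphic information corresponds to the main process voice. After detecting that the user has adjusted to the state required by the first graphic information, the electronic device displays a completion mark on the first graphic information. Then the electronic device stops displaying the first graphic information and continues playing the animation of standard actions.
The first graphic information may be graphic information in the form of a task box. When playing a main process voice, the electronic device may display the first graphic information and pause the progress of the sports training. After the user makes the corresponding adjustment according to the main process voice or the first graphic information, the electronic device may stop displaying the first graphic information and continue the subsequent sports training.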
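In another possible design, in addition to the at least one main process voice, the candidate voices may further include at least one non-main process voice.
That is, if the electronic device determines that the candidate voices triggered by the first action unit include at least one main process voice and at least one non-main process voice, the electronic device plays the at least one high-priority main process voice.
In another possible design, the candidate voices include multiple low-priority non-main process voices. That the electronic device selects one voice from the candidate voices to play according to the priorities of the voices includes: the electronic device selects a first target voice from the multiple low-priority non-main process voices, and plays the first target voice.
In this solution, if the electronic device determines that the candidate voices triggered by the first action unit include at least one low-priority non-main process voice and no main process voice, the electronic device selects one voice from the at least one non-main process voice to play. This avoids the problem that playing multiple voices for one action unit causes voice playback to mismatch subsequent action units.
In another possible design, the first target voice is the voice triggered first among the multiple non-main process voices.
That is, the electronic device selects, according to the order of triggering, the voice triggered first for playback.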
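The priority rules in the designs above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Voice` record, the `Priority` enum, and the `trigger_order` field are hypothetical names introduced here, and the sketch assumes the variant in which the first-triggered non-main process voice is chosen.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0   # non-main process voices
    HIGH = 1  # main process voices


@dataclass(frozen=True)
class Voice:
    text: str
    priority: Priority
    trigger_order: int  # smaller means triggered earlier within the action unit


def select_for_playback(candidates: list[Voice]) -> list[Voice]:
    """Return the voices to play for one action unit, in playback order."""
    main = sorted(
        (v for v in candidates if v.priority == Priority.HIGH),
        key=lambda v: v.trigger_order,
    )
    if main:
        # Every main process voice is played in trigger order;
        # non-main process voices of this action unit are discarded.
        return main
    non_main = [v for v in candidates if v.priority == Priority.LOW]
    if not non_main:
        return []
    # Only one low-priority voice is played: the one triggered first.
    return [min(non_main, key=lambda v: v.trigger_order)]
```

Returning a list rather than a single voice keeps the two cases uniform: several main process voices may be played in sequence, while at most one non-main process voice is played per action unit.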
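In another possible design, the first target voice is a first action improvement voice for a first error in the action unit. That the electronic device plays the first target voice includes: if, in the current sports training, the electronic device has not played the first action improvement voice for the type of training action to which the action unit belongs, the electronic device plays the first action improvement voice. The method further includes: if, in the current sports training, the electronic device has already played the first action improvement voice for the type of training action to which the action unit belongs, the electronic device plays action points for the first error, or the electronic device selects a second target voice from the non-main process voices other than the first action improvement voice.
In this solution, if the voice selected from the candidate voices is an action improvement voice that has already been played, the electronic device does not play it again. This avoids the poor user experience caused by frequently giving the same voice improvement prompt for the same error.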
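A minimal sketch of this de-duplication rule is shown below. The class name `ImprovementHistory` and its method are hypothetical, and the order of the two fallbacks (action points first, then another non-main process voice) is an assumption; the text presents them as alternatives without ranking them.

```python
from typing import Optional


class ImprovementHistory:
    """Tracks action improvement voices already played in one training session."""

    def __init__(self) -> None:
        # Keys are (training action type, improvement voice text).
        self._played: set[tuple[str, str]] = set()

    def choose(
        self,
        action_type: str,
        improvement: str,
        action_points: Optional[str],
        other_voices: list[str],
    ) -> Optional[str]:
        """Return the voice to play for this action unit, or None for silence."""
        key = (action_type, improvement)
        if key not in self._played:
            self._played.add(key)
            return improvement  # first time for this action type: play it
        if action_points is not None:
            return action_points  # already played: explain the action points
        # Otherwise fall back to another non-main process voice, if any.
        return other_voices[0] if other_voices else None
```

Because the history is keyed by training action type, the same improvement prompt can still be played once for a different type of training action in the same session.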
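In another possible design, when the electronic device plays the first action improvement voice, the method further includes: the electronic device displays second graphic information, where the second graphic information corresponds to the first action improvement voice. After detecting that the user has adjusted the training action to the state required by the second graphic information, the electronic device displays a completion mark on the second graphic information. Then the electronic device stops displaying the second graphic information.
The second graphic information may be graphic information in the form of a task box. When playing an action improvement voice, the electronic device may display the second graphic information. After the user makes the corresponding adjustment according to the action improvement voice or the second graphic information, the electronic device may stop displaying the second graphic information.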
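In another possible design, the first target voice is a first action evaluation voice. That the electronic device plays the first target voice includes: if the electronic device determines in a random manner that it is currently in a first mode, the electronic device plays the first action evaluation voice. The method further includes: if the electronic device determines in a random manner that it is currently in a second mode, the electronic device does not play the first action evaluation voice.
That is, after selecting an action evaluation voice from the candidate voices, the electronic device can randomly determine whether to play it. This avoids frequent, regular playback of action evaluation voices and makes their playback less predictable.
In another possible design, action evaluation voices include multiple levels, and each level includes multiple voices. The first action evaluation voice is an action evaluation voice of a first level. That the electronic device plays the first action evaluation voice includes: the electronic device randomly selects one voice from the action evaluation voices of the first level to play.
In this way, when the electronic device determines to play an action evaluation voice, it can randomly select one from a corpus containing multiple action evaluation voices. This makes the evaluation content less predictable, avoids repeated playback of the same action evaluation voice, and gives the user a consistently fresh experience.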
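The two random decisions described above (whether to play at all, and which voice of the level to play) could look roughly like this. The corpus phrases, the `play_probability` parameter, and the function name are assumptions made for illustration; the text only says that the mode is chosen at random and does not give a probability.

```python
import random
from typing import Optional

# Hypothetical corpus: each evaluation level holds several interchangeable voices.
CORPUS = {
    "perfect": ["Perfect!", "Flawless form!", "Textbook movement!"],
    "awesome": ["Awesome!", "Nicely done!", "Great work!"],
}


def pick_evaluation_voice(
    level: str,
    rng: random.Random,
    play_probability: float = 0.5,  # assumed value, not taken from the source
) -> Optional[str]:
    """Randomly skip the evaluation or pick a random voice of the given level."""
    if rng.random() >= play_probability:
        return None  # second mode: the evaluation voice is discarded
    # First mode: pick any voice from the level's corpus.
    return rng.choice(CORPUS[level])
```

Passing a `random.Random` instance instead of using the module-level functions makes the behaviour reproducible in tests by seeding the generator.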
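In another possible design, that the electronic device plays the first target voice includes: if the electronic device determines that the voice triggered by another action unit before the first action unit has finished playing, the electronic device plays the first target voice. The method further includes: if the electronic device determines that the voice triggered by another action unit before the first action unit has not finished playing, the electronic device does not play the first target voice.
That is, the electronic device plays another voice only after one voice has finished playing, so voice playback is not interrupted by a conflict between the voices of different action units.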
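One way to picture this rule is a player that refuses new voices while an earlier one is still playing. The sketch below is an assumption-laden illustration: `VoicePlayer`, `try_play`, and the use of explicit timestamps and known durations are all hypothetical, and real audio output is omitted.

```python
from typing import Optional


class VoicePlayer:
    """Plays one voice at a time; a new voice is dropped while one is playing."""

    def __init__(self) -> None:
        self._busy_until = 0.0           # time at which the current voice ends
        self.current: Optional[str] = None

    def try_play(self, voice: str, duration: float, now: float) -> bool:
        """Start `voice` at time `now` unless an earlier voice is still playing."""
        if now < self._busy_until:
            return False  # earlier voice not finished: discard the new voice
        self.current = voice
        self._busy_until = now + duration
        return True
```

With three squats starting at 0 s, 3 s, and 6 s and a 5-second voice selected for the first squat, the voice of the second squat is discarded and the third squat can trigger playback again.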
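In another possible design, the candidate voices include one low-priority non-main process voice. That the electronic device performs voice playback according to the priorities of the voices in the candidate voices includes: the electronic device plays this low-priority first target voice.
In this solution, if the electronic device determines that the candidate voices triggered by the first action unit include one low-priority non-main process voice, the electronic device plays that voice.
In another possible design, after the electronic device displays the interface of the first application program and before the electronic device determines the candidate voices triggered by the first action unit in the user's training actions, the method further includes: the electronic device determines, according to the progress of the sports training or the user's state, that a main process voice is triggered; and the electronic device plays the main process voice.
That is, after entering the first application program, the electronic device can first determine whether a main process voice is triggered and play the triggered main process voice, to ensure that the sports training can start normally.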
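In another possible design, an action improvement voice is used to guide the user to improve a training action. An action evaluation voice is used to give the user a positive evaluation of the user's training action. A training rhythm voice is used to inform the user of the progress of the sports training.
In this way, voices of different types can prompt the user about the sports training from different aspects and perspectives.
In another possible design, main process voices include one or more of a procedural voice, a position adjustment voice, a stance adjustment voice, or a humanized prompt voice. Action improvement voices include one or more of a frequency improvement voice, an amplitude improvement voice, or a posture improvement voice.
That is, each type of voice can further include voice prompts of multiple dimensions. The electronic device can therefore prompt the user from different dimensions and perspectives, which further enriches the playable voice content. The voice content can be flexible and varied, and the prompts more specific and comprehensive, giving the user a fresh and engaging experience and improving the user experience.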
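The voice types and dimensions named above can be written down as a small data model. This is only one possible encoding; the identifiers `VoiceType`, `DIMENSIONS`, and `is_high_priority` are hypothetical, and the empty tuples reflect that this passage lists no dimensions for those two types.

```python
from enum import Enum


class VoiceType(Enum):
    MAIN_PROCESS = "main process"
    ACTION_IMPROVEMENT = "action improvement"
    ACTION_EVALUATION = "action evaluation"
    TRAINING_RHYTHM = "training rhythm"


# Dimensions per type, as listed in the design above.
DIMENSIONS: dict[VoiceType, tuple[str, ...]] = {
    VoiceType.MAIN_PROCESS: (
        "procedural",
        "position adjustment",
        "stance adjustment",
        "humanized prompt",
    ),
    VoiceType.ACTION_IMPROVEMENT: (
        "frequency improvement",
        "amplitude improvement",
        "posture improvement",
    ),
    VoiceType.ACTION_EVALUATION: (),  # no dimensions listed in this passage
    VoiceType.TRAINING_RHYTHM: (),    # no dimensions listed in this passage
}


def is_high_priority(voice_type: VoiceType) -> bool:
    """Main process voices are high priority; all other types are low priority."""
    return voice_type is VoiceType.MAIN_PROCESS
```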
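In another possible design, that the electronic device determines the candidate voices triggered by the first action unit in the user's training actions includes: the electronic device determines, according to the user's state during execution of the first action unit, a position adjustment voice, a stance adjustment voice, or a humanized prompt voice among the candidate voices. The electronic device determines, according to the actions in the first action unit, the non-main process voices among the candidate voices.
In this way, the electronic device can determine the candidate voices triggered by the first action unit according to the user's state during execution of the first action unit and according to action information such as how standard the user's actions are and the number of actions.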
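As an illustration of how action information might be turned into non-main process candidates, the sketch below combines detected errors, a matching score, and a repetition count. All function names, the prompt texts, the score thresholds (0.9 and 0.75), and the default of 2 remaining repetitions are assumptions for illustration; the text only states that a high degree of matching triggers an evaluation voice and that reaching the halfway point or having a small number N of repetitions left triggers a rhythm voice.

```python
from typing import Optional


def training_rhythm_voice(done: int, total: int, n_left: int = 2) -> Optional[str]:
    """Return a rhythm prompt at the halfway point or when n_left reps remain."""
    if done * 2 == total:
        return "Halfway there!"
    if total - done == n_left:
        return f"{n_left} more to go, hold on!"
    return None


def evaluation_level(match_score: float) -> Optional[str]:
    """Map a 0..1 matching score to an evaluation level (thresholds are made up)."""
    if match_score >= 0.9:
        return "perfect"
    if match_score >= 0.75:
        return "awesome"
    return None


def non_main_candidates(
    done: int, total: int, match_score: float, errors: list[str]
) -> list[tuple[str, str]]:
    """Collect (voice type, content) candidates triggered by one action unit."""
    candidates = [("action improvement", e) for e in errors]
    level = evaluation_level(match_score)
    if level is not None:
        candidates.append(("action evaluation", level))
    rhythm = training_rhythm_voice(done, total)
    if rhythm is not None:
        candidates.append(("training rhythm", rhythm))
    return candidates
```

The resulting list is only the candidate set; which single voice is actually played is decided afterwards by the selection rules described earlier.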
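In another aspect, an embodiment of this application provides a voice playback apparatus, which is included in an electronic device. The apparatus has the function of implementing the behavior of the electronic device in any method of the foregoing aspect and possible designs. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the function, for example, a display module/unit, a collection module/unit, a determination module/unit, and a playback module/unit.
In another aspect, an embodiment of this application provides an electronic device, including one or more processors and a memory, where the memory stores code. When the code is executed by the electronic device, the electronic device is caused to perform the voice playback method in any possible design of the foregoing aspect.
In another aspect, an embodiment of this application provides a computer storage medium, including computer instructions. When the computer instructions run on a mobile terminal, the mobile terminal is caused to perform the voice playback method in any possible design of the foregoing aspect.
In yet another aspect, an embodiment of this application provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the voice playback method in any possible design of the foregoing aspect.
For the beneficial effects corresponding to the foregoing other aspects, refer to the description of the beneficial effects of the method aspect; details are not repeated here.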
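Brief Description of Drawings
FIG. 1A is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 1B is a schematic diagram of a system according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of another electronic device according to an embodiment of this application;
FIG. 3 is a flowchart of voice playback according to an embodiment of this application;
FIG. 4 is a set of schematic interface diagrams according to an embodiment of this application;
FIG. 5A is a schematic diagram of a voice prompt and interface display effect according to an embodiment of this application;
FIG. 5B is a schematic diagram of an interface display effect according to an embodiment of this application;
FIG. 5C is a schematic diagram of another interface display effect according to an embodiment of this application;
FIG. 6 is a schematic diagram of another interface display effect according to an embodiment of this application;
FIG. 7A is a schematic diagram of another interface display effect according to an embodiment of this application;
FIG. 7B is a schematic diagram of another voice prompt and interface display effect according to an embodiment of this application;
FIG. 7C is a schematic diagram of another voice prompt and interface display effect according to an embodiment of this application;
FIG. 8A is a schematic diagram of another voice prompt and interface display effect according to an embodiment of this application;
FIG. 8B is a schematic diagram of another interface display effect according to an embodiment of this application;
FIG. 8C is a schematic diagram of another voice prompt and interface display effect according to an embodiment of this application;
FIG. 9A is a schematic diagram of another interface display effect according to an embodiment of this application;
FIG. 9B is a schematic diagram of another voice prompt and interface display effect according to an embodiment of this application;
FIG. 10 is a timing diagram of action evaluation voice playback according to an embodiment of this application;
FIG. 11 is a schematic diagram of another voice prompt and interface display effect according to an embodiment of this application;
FIG. 12 is a schematic diagram of another interface display effect according to an embodiment of this application;
FIG. 13 is another flowchart of voice playback according to an embodiment of this application.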
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
本申请实施例提供了一种智能运动系统中的智能语音播放方法,可以在用户根据教练动画进行运动训练时,实时针对用户当前的运动状态和运动动作播放不同的语音,从而给用户以实时的语音反馈和动作改善的实时引导,并且语音内容丰富多变,用户体验较好。
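The exercise system is used in scenarios in which the user performs AI exercise training by following a coach animation. For example, the user may perform AI fitness training, AI yoga training, AI aerobics training, AI motion-sensing games, or other AI exercise training.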
在一些实施例中,参见图1A,本申请实施例提供的智能语音播放方法可以应用于具有屏幕10的电子设备01。尤其可以应用于具有大屏的电子设备01。该电子设备01还可以包括摄像头20。该摄像头20可以集成在电子设备01中,也可以是电子设备01主体外独立的摄像头,并通过有线或无线的方式与电子设备01的主体连接。此外,该电子设备01还可包括音频播放器30。该音频播放器30可以集成在电子设备01中,例如可以是扬声器或音箱等。该音频播放器30也可以是通过有线或无线的方式与电子设备01的主体连接的音频播放设备,例如可以是音箱等。
其中,该摄像头20可以用于采集用户实时运动的图像。电子设备01的屏幕10可以用于播放教练动画并显示用户实时运动的图像画面。电子设备01在用户根据教练动画进行运动训练时,实时确定针对用户当前的训练状态和训练动作的待播放语音,并通过音频播放器30进行语音播放。
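For example, the electronic device 01 may be a television, a desktop computer, a tablet computer, a laptop computer, a mobile phone, a smart screen, a projector, an ultra-mobile personal computer (UMPC), a netbook, or an augmented reality (AR)/virtual reality (VR) device. The embodiments of this application do not limit the specific type of the electronic device 01.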
在另一些实施例中,本申请实施例提供的智能语音播放方法还可以应用于如图1B所示的系统。该系统包括具有屏幕的电子设备02,以及与电子设备02配合使用的电子设备03。电子设备02或电子设备03可以包括摄像头,该摄像头用于采集用户实时运动的图像。电子设备02或电子设备03可以包括音频播放器,该音频播放器用于播放语音。例如,该电子设备03可以是手机、可穿戴设备(例如手表或手环等)、平板电脑或笔记本电脑等。
比如,大屏电视机和手机配合使用,电视机的屏幕用于播放教练动画并显示用户实时运动的图像画面。与电视机配合使用的手机可以在用户根据教练动画进行运动训练时,实时确定针对用户当前的训练状态和训练动作的待播放语音,手机或电视机上的音频播放器可以进行语音播放。
示例性的,图2示出了采用本申请实施例提供的智能语音播放方法的电子设备100的一种结构示意图。如图2所示,电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,电源管理模块140,天线,无线通信模块160,音频模块170,扬声器170A,麦克风170C,音箱接口170B,传感器模块180,按键190,指示器191,摄像头193,以及显示屏192等。
其中,上述传感器模块180可以包括距离传感器,接近光传感器,指纹传感器,温度传感器,触摸传感器,环境光传感器等传感器。
可以理解的是,本实施例示意的结构并不构成对电子设备100的具体限定。在另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound, I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,和/或USB接口等。
可以理解的是,本实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
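The power management module 140 is configured to connect to a power supply. The power management module 140 may also be connected to the processor 110, the internal memory 121, the display 192, the camera 193, the wireless communication module 160, and so on. The power management module 140 receives input from the power supply and supplies power to the processor 110, the internal memory 121, the display 192, the camera 193, the wireless communication module 160, and so on. In some embodiments, the power management module 140 may also be disposed in the processor 110.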
电子设备100的无线通信功能可以通过天线和无线通信模块160等实现。其中,无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。
无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。在一些实施例中,电子设备100的天线和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。
电子设备100通过GPU,显示屏192,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏192和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏192用于显示图像,视频等。该显示屏192包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。
在本申请的实施例中,显示屏192可以用于显示教练动画和用户实时运动的图像画面。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏192以及应用处理器等实现拍摄功能。ISP用于处理摄像头193反馈的数据。在一些实施例中,ISP可以设置在摄像头193中。
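The camera 193 is used to capture still images or video. An object passes through the lens to produce an optical image that is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1. For example, the camera 193 may be disposed at the upper edge of the display 192 of the electronic device 100. The embodiments of this application do not limit the position of the camera 193 on the electronic device 100.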
或者,电子设备100可以不包括摄像头,即上述摄像头193并未设置于电子设备100中。电子设备100可以通过接口(如USB接口130)外接摄像头193。该外接的摄像头193可以通过外部固定件(如带夹子的摄像头支架)固定在电子设备100上。例如,外接的摄像头193可以通过外部固定件,固定在电子设备100的显示屏192的边缘处,如上侧边缘处。
在本申请的实施例中,该摄像头193可以用于采集用户实时运动的图像画面。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
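The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 runs the instructions stored in the internal memory 121 to execute the various functional applications and data processing of the electronic device 100. The internal memory 121 may include a program storage area and a data storage area.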
其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
在本申请的实施例中,处理器110可以在用户根据教练动画进行运动训练时,实时针对用户当前的运动状态和运动动作确定不同的语音。
电子设备100可以通过音频模块170,扬声器170A,麦克风170C,音箱接口170B,以及应用处理器等实现音频功能。例如,音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。
音箱接口170B用于连接有线音箱。音箱接口170B可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
在本申请的实施例中,扬声器170A,或者音箱接口170B及连接的音箱,可以用于播放处理器110在用户根据教练动画进行运动训练时,实时针对用户当前的运动状态和运动动作确定不同的语音。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
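The indicator 191 may be an indicator light that shows whether the electronic device 100 is powered on, in standby, or powered off. For example, the light being off may indicate that the electronic device 100 is powered off; a green or blue light may indicate that the electronic device 100 is powered on; a red light may indicate that the electronic device 100 is in standby.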
通常,电子设备100会配有一遥控器。该遥控器用于控制电子设备100。该遥控器可以包括:多个按键,如电源按键、音量按键、以及其他的多个选择按键。遥控器上的按键可以是机械按键,也可以是触摸式按键。遥控器可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入,并向电子设备100发送相应的控制信号,以控制电子设备100。例如,遥控器可以通过红外信号等向电子设备100发送控制信号。该遥控器还可以包括电池收纳腔,用于安装电池,为遥控器供电。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。其可以具有比图2中所示出的更多的或者更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置。例如,该电子设备还可以包括音箱等部件。图2中所示出的各种部件可以在包括一个或多个信号处理或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
可以理解的是,电子设备100也可以包括上述不同的部件,可以具有比图2中所示出的更多的或者更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置,本申请实施例不予限定。
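In the embodiments of this application, the camera 193 of the electronic device 100 shown in FIG. 2 may be used to capture images of the user exercising in real time. The display 192 may be used to display the coach animation and the images of the user exercising in real time. The processor 110 may, according to preset rules, determine different voices in real time for the user's current exercise state and exercise action while the user trains by following the coach animation. The speaker 170A may be used to play the voice determined by the processor 110. In this way, while the user trains by following the coach animation, the electronic device can play different voices in real time for the user's current exercise state and action, giving the user real-time voice feedback and guidance on improving the action; the voice content is rich and varied, and the user experience is good.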
以下将以具有图2所示结构的电子设备为例,对本申请实施例的智能语音播放方法进行说明。
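In the intelligent voice playing method provided in the embodiments of this application, the voices that the electronic device can play include main-flow voices and non-main-flow voices. The non-main-flow voices may include one or more of the following types: action improvement voices, training rhythm voices, and action evaluation voices. While the user trains by following the coach animation, the electronic device can play several different types of voices in real time for the user's current exercise state and action. The played content is therefore rich, varied, and unlikely to repeat, which avoids the monotony of a fixed set of voices, increases the user's interest in training, and improves the user experience.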
并且,每种类型的语音还可以包括多个维度的语音提示。这样,电子设备可以从不同的维度和角度给用户以语音提示,从而可以进一步丰富可播放语音的语音内容,语音内容可以灵活多变,语音提示更为具体和全面,因而可以给用户以新鲜、有趣的感觉,能够提高用户的使用体验。
其中,主流程语音可以用于向用户介绍一些当前训练课程的相关信息,保证训练流程的正常进行等。例如,主流程语音可以包括流程性语音,位置调整语音,站位调整语音或人性化提示语音等多个维度中的一项或多项。
其中,流程性语音用于介绍当前课程进度的相关信息,例如可以包括用于介绍动作名称和组数的语音,动作开始提示语音,以及动作结束提示语音等。比如,流程性语音可以为“欢迎使用健身系统,第一节,手助力深蹲,共3组,每组5个动作”。再比如,动作结束提示语音可以为“不错哦,您已完成第一节的训练动作”或“哇哦,太厉害了,看来你已经完全掌握了动作要点,下一个动作也要继续保持哦”等。
位置调整语音可以用于提示用户站在特定的位置,以使得摄像头能够采集到用户的图像,保证训练能够正常进行。例如,用户位置调整语音可以为“请站到屏幕前”或“请移动至屏幕的中间区域”等。
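The stance adjustment voice may be used to prompt the user to adjust the direction they are facing, so that the camera can capture images of specific parts of the user's body and training can proceed normally. For example, the stance adjustment voice may be "Please stand side-on to the screen".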
人性化提示语音可以用于对用户进行一些人性化提示。比如,电子设备通过摄像头采集到的图像画面确定用户离桌角较近时,可以语音提示用户“请注意远离桌角,以避免受伤”。
其中,电子设备可以根据训练课程的进度,确定是否播放流程性语音。电子设备可以根据用户的位置和朝向等状态,确定是否播放位置调整语音或站位调整语音。电子设备可以根据用户当前所处的环境等状态,确定是否播放人性化提示语音。
示例性的,主流程语音的内容举例可以参见表1。
表1
Figure PCTCN2020110623-appb-000001
Figure PCTCN2020110623-appb-000002
上述动作改善语音用于引导用户改善训练动作,提高训练动作的标准程度。上述动作评价语音用于对用户训练动作进行肯定、鼓励和赞扬。在用户运动训练的过程中,电子设备可以根据采集到的用户的图像,结合预设的检测算法,确定用户训练动作与教练动画中标准动作的匹配程度。
若用户训练动作与教练动画中标准动作的匹配程度较小,则电子设备可以通过动作改善语音引导用户改善动作的标准程度,以使得用户能够根据实时的语音反馈,及时获知不足之处和改善方向,从而及时调整和提升动作质量。
在本申请的实施例中,动作改善语音为正向的,积极的,引导性的语音,用于帮助用户提高和改善动作质量从而纠正错误;并不是持续地单纯指出用户训练动作错误或不规范而导致用户体验较差。并且,即使指出用户训练动作错误或某个部位动作不规范,用户通常也不知如何进行提高和改善。
在本申请的实施例中,电子设备可以实时针对用户当前的运动动作,播放改善和提高当前训练动作质量的动作改善语音,从而针对当前训练动作给用户以实时的引导,指导用户将动作做得更为标准和规范。并且,动作改善语音可以是偏向口语化的简短易懂的语言内容。
例如,动作改善语音可以包括频率改善语音、幅度改善语音或姿态改善语音等多个维度中的一项或多项。示例性的,动作改善语音的内容举例可以参见表2。
表2
Figure PCTCN2020110623-appb-000003
若用户训练动作与教练动画中标准动作的匹配程度较大,则电子设备可以通过动作评价语音对用户给予积极性的鼓励,以增大用户的信心和成就感,提升用户的兴趣感。
例如,动作评价语音可以包括完美级别和真棒级别等多个维度。示例性的,动作评价语音的内容举例可以参见表3。
表3
Figure PCTCN2020110623-appb-000004
Figure PCTCN2020110623-appb-000005
在运动训练过程中,若动作频率较快或要求的动作数量较多,则用户不容易记住动作数量,用户不清楚已经做了几个动作,还有几个动作。或者用户并不想记住已经做了几个动作。电子设备可以通过训练节奏语音提示用户当前训练动作的进度,让用户获知还有几组/个动作,已进行了几组/个动作,帮助用户了解课程的进展状态,从而调整好身体和心理状态。或者,用户在运动过程中可能有点累了,电子设备可以通过训练节奏语音提示用户还剩几组,马上就结束了,帮助用户坚持一下。
示例性的,训练节奏语音的内容举例可以参见表4。
表4
Figure PCTCN2020110623-appb-000006
通过上述多种类型和多种维度的语音提示,电子设备可以在用户运动训练过程中,通过语音实时引导、纠正和鼓励用户,指导用户顺利进行运动训练。
以下将以用户进行健身训练为例,对本申请实施例提供的智能语音播放方法进行阐述。参见图3,该方法可以包括:
301、电子设备检测到用户打开AI健身APP的操作后,打开AI健身APP并显示APP的界面。
其中,电子设备可以检测用户通过遥控器或语音指令等方式指示电子设备打开AI健身APP的操作。或者,若电子设备的屏幕为触摸屏,则电子设备还可以检测用户指示打开AI健身APP的触摸操作。电子设备检测到用户打开AI健身APP的操作后,可以打开AI健身APP,从而开始健身课程。
示例性的,参见图4中的(a),电子设备上显示有AI健身APP图标401,电子设备检测到用户通过遥控器点击该AI健身APP图标401的操作后,打开AI健身APP,显示如图4中的(b)所示的AI健身APP的界面。而后,电子设备检测到用户选择一个健身课程(例如入门课程402)的操作后,开始该健身课程,并显示如图4中的(c)所示的界面。
302、若电子设备根据课程进度,确定触发主流程语音中的流程性语音,则电子设备播放该流程性语音。
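After the course starts, the electronic device may play procedural voices according to the course progress, to introduce information about the current training course and to explain the training actions the user is about to perform. For example, the electronic device may play a procedural voice when the course starts, such as "Welcome to the beginner fitness course" or "Training is about to start, please get ready!".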
在一些实施例中,电子设备还可以在屏幕上显示流程性语音的文字信息,以从视觉上进一步提示用户。
303、若电子设备根据用户的状态和预设条件1,确定触发主流程语音中的位置调整语音、站位调整语音或人性化提示语音,则电子设备播放触发的语音。
其中,用户的状态包括用户的位置、站位朝向和所处环境等。例如,当预设条件1为用户不在摄像头采集视角范围内时,电子设备确定触发位置调整语音。再例如,当预设条件1为用户站位朝向不符合预设角度时,电子设备确定触发站位调整语音。再例如,当预设条件1为用户距离危险对象较近时,电子设备确定触发人性化提示语音。
电子设备可以根据用户的状态和预设条件1,确定是否触发主流程语音中的位置调整语音、站位调整语音或人性化提示语音,以便及时提示用户调整用户的位置、朝向等状态,保证用户能够正常开始健身训练。
例如,在图4中的(c)所示情况下,若电子设备的摄像头未检测到用户,则电子设备确定触发主流程语音“请站到屏幕前”。
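The checks under preset condition 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `UserState` fields, the function name, and the returned labels are all hypothetical, and the sketch assumes the image-analysis stage has already reduced the camera frames to three boolean observations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserState:
    """Hypothetical summary of what the camera pipeline observed."""
    in_camera_view: bool  # user is inside the camera's field of view
    facing_ok: bool       # user's orientation matches the required angle
    near_hazard: bool     # user is close to a dangerous object, e.g. a table corner


def main_flow_voices(state: UserState) -> List[str]:
    """Return the main-flow voice types triggered by the user's state."""
    voices = []
    if not state.in_camera_view:
        voices.append("position_adjustment")  # e.g. "Please stand in front of the screen"
    if not state.facing_ok:
        voices.append("stance_adjustment")    # e.g. "Please stand side-on to the screen"
    if state.near_hazard:
        voices.append("humanized_prompt")     # e.g. "Please keep away from the table corner"
    return voices
```

Each triggered type would then be mapped to a concrete voice clip and, optionally, to its first graphic-text information.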
在一些实施例中,电子设备在播放位置调整语音、站位调整语音或人性化提示语音等主流程语音时,可以在屏幕上显示第一图文信息。该第一图文信息为该主流程语音对应的图文信息。
例如,电子设备可以在显示用户图像画面的一侧,以任务框的形式显示第一图文信息,以方便用户根据该第一图文信息获知用户状态需要调整和改进的目标。该第一图文信息可以为针对主流程语音的更为简洁的图文表述。示例性的,该任务框的形式的第一图文信息可以参见图5A中的提示信息501。
此外,电子设备还可以在屏幕中间显示第一图文信息,以方便用户更容易看到相关的提示内容。示例性的,第一图文信息还可以包括图5A中的提示信息502。
需要注意的是,图5A所示的信息500表示电子设备播放的主流程语音。
电子设备在确定用户根据主流程语音和第一图文信息,将用户的状态调整到第一图文信息要求的状态后,电子设备上的任务框形式的第一图文信息可以进行相应的变化,以响应用户的状态调整。例如,第一图文信息上可以显示完成标识。示例性的,参见图5B,电子设备检测到用户站到屏幕前之后,可以将提示信息501之前的叉号更新为对勾。该对勾可以为该完成标识。在一些实施例中,随着任务框形式的第一图文信息的变化,电子设备还可以播放响应语音(例如嘀一声)。而后,参见图5C,屏幕上的第一图文信息消失。电子设备继续播放教练动画,从而继续后续课程。
若电子设备检测到用户未将状态调整到第一图文信息要求的状态,则第一图文信息持续显示。第一图文信息不会消失,课程也不会往下继续进行,教练动画不会播放后续内容。
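If the electronic device determines, based on the same state of the user, that multiple main-flow voices are triggered, the electronic device may play each main-flow voice separately. For example, the multiple main-flow voices may include "Please stand in front of the screen alone", "Please stand side-on to the screen", and "Bring your head back into the screen". After determining that the user, following the main-flow voices and the corresponding first graphic-text information, has adjusted to the state required by the first graphic-text information, the electronic device stops displaying the first graphic-text information and continues with the rest of the course.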
304、电子设备播放教练动画,并显示用户训练动作的图像画面。
在课程开始后,电子设备可以播放教练动画,该教练动画为教练标准动作的动画,用户可以参照教练动画进行训练动作。电子设备还可以采集用户训练动作(或称健身动作或运动动作)的图像画面,并显示在电子设备的屏幕上。即,电子设备可以在屏幕上同时显示教练动画的图像和用户训练动作的图像,以便用户对比自身的训练动作和教练动画中的标准动作的匹配程度。示例性的,在用户健身训练的过程中,电子设备显示的界面可以参见图6。
在一些实施例中,电子设备还可以在用户训练动作的图像画面上显示预设的骨骼节点,以方便用户根据骨骼节点更为准确地确定关键部位和与教练标准动作的区别,方便用户进行动作调整和改善。示例性的,骨骼节点可以如图6中的圆点601所示。
而后,在用户根据课程进行健身训练的过程中,电子设备可以确定是否触发主流程语音、动作改善语音、动作评价语音和训练节奏语音。
在步骤304之后,该方法还可以包括:
305、电子设备根据后续课程进度,确定是否播放主流程语音中的流程性语音。
例如,在课程进行过程中,电子设备可以在每组动作开始时播放流程性语音,以对接下来要进行的训练动作进行相关说明和提示。示例性的,在一组动作开始时,流程性语音可以提示该组动作的名称和组数等,例如“下一组动作…”或“3、2、1,开始!”等。
再例如,在每组动作结束后,或者在课程结束后,电子设备可以播放流程性语音。示例性的,该流程性语音可以为“您已完成全部动作!”。
306、电子设备持续监测用户的状态,根据预设条件1确定是否触发主流程语音中的位置调整语音、站位调整语音或人性化提示语音。
电子设备可以持续监测用户的状态,根据预设条件1确定是否触发主流程语音中的位置调整语音、站位调整语音或人性化提示语音,以便及时提示用户调整用户的位置、朝向等状态,保证用户能够正常进行健身训练。
307、电子设备持续检测用户训练动作,根据预设条件2确定是否触发动作改善语音。
例如,预设条件2可以包括用户训练动作,与教练动画中标准动作的匹配程度较小。动作改善语音可以用于针对当前训练动作给用户以实时的引导,指导用户将动作做得更为标准和规范。
308、电子设备持续检测用户训练动作,根据预设条件3确定是否触发动作评价语音。
例如,预设条件3可以包括用户训练动作,与教练动画中标准动作的匹配程度较大。动作评价语音可以用于针对当前训练动作给用户以积极性的鼓励,从而增大用户的信心和成就感,提升用户的兴趣感。
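Preset conditions 2 and 3 both depend on how closely the user's action matches the standard action. The sketch below assumes the detection algorithm yields a match score in [0, 1]; the thresholds `low` and `high` are hypothetical values chosen for illustration, since the text only speaks of a "small" or "large" degree of match.

```python
from typing import Optional


def classify_match(score: float, low: float = 0.6, high: float = 0.85) -> Optional[str]:
    """Map a match score to the voice type it would trigger, if any.

    score: degree of match between the user's action and the standard action.
    low/high: assumed thresholds; the source does not give numeric values.
    """
    if score < low:
        return "action_improvement"  # preset condition 2: poor match
    if score >= high:
        return "action_evaluation"   # preset condition 3: good match
    return None                      # in between: neither voice is triggered
```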
309、电子设备记录用户训练动作的数量,根据预设条件4确定是否触发训练节奏语音。
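For example, preset condition 4 may be that half of the repetitions in a set have been completed, in which case the training rhythm voice tells the user that the set is half done. As another example, preset condition 4 may be that N repetitions remain in a set, where N is a small value such as 1, 2, or 5; the training rhythm voice then tells the user that the set is almost finished and encourages the user to hold on a little longer.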
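Preset condition 4 reduces to simple counting. The following sketch is illustrative only: the function name, the rounding rule for odd-sized sets, the default `n_left`, and the prompt wording are assumptions.

```python
from typing import Optional


def rhythm_voice(done: int, total: int, n_left: int = 2) -> Optional[str]:
    """Return a training rhythm prompt for the current repetition count, if any.

    done: repetitions completed in the current set.
    total: repetitions required in the set.
    n_left: the small N at which an "almost done" prompt is triggered (assumed).
    """
    if done <= 0 or done >= total:
        return None
    if done == (total + 1) // 2:   # halfway point, rounded up for odd totals
        return "You're halfway there."
    if total - done == n_left:
        return f"Only {n_left} more to go, keep it up!"
    return None
```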
310、电子设备对同一动作单元触发的语音进行选择性播放。
由于语音类型和维度较多,因而电子设备通过步骤305-步骤309,根据用户的状态和动作确定触发的待播放语音通常有多条。为了保证语音播放的时效性,避免用户已经在进行下一个动作了,电子设备还在播放上一个动作的相关语音,避免多条语音的播放冲突,使得播放的语音能够与用户训练动作相对应,保证语音播放与用户训练动作实时匹配,电子设备可以从该多条待播放语音中选择一条进行播放。也就是说,电子设备可以对同一动作单元触发的语音进行选择性播放。
其中,动作单元是电子设备预设的进行语音选择性播放的单位。电子设备对同一动作单元触发的语音进行选择性播放是指,电子设备对同一动作单元中的动作触发的非主流程语音,以及同一动作单元中的动作执行过程中用户的状态触发的主流程语音,进行选择性播放。
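An action unit may be one complete training action or a part of one training action. Take the dumbbell side bend as an example: one dumbbell side bend may consist of a first action unit and a second action unit. The first action unit is holding the dumbbell in one hand and placing the other hand behind the head; the second action unit is bending toward the side holding the dumbbell. Within one dumbbell side bend, if the first action unit triggers 2 voices, the electronic device plays selectively from those 2 voices; if the second action unit triggers 3 voices, the electronic device plays selectively from those 3 voices.
In another example, for the squat, one squat is one action unit. If one squat triggers 4 voices, the electronic device plays selectively from these 4 voices.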
可以理解的是,电子设备可以在当前动作单元执行过程中进行选择性播放,而不用一定在当前动作单元的执行完再进行选择性播放。例如,对于深蹲这一训练动作来说,通常要求用户在下蹲的过程中遵循动作要点,在用户下蹲后起来的过程中基本没有特别的要点要求。因而,在下蹲结束后,电子设备就可以对用户下蹲的过程中触发的语音进行选择性播放,在用户下蹲后站起的过程中该语音可能就已经播放完了,不会与下一个训练动作发生冲突。
若电子设备针对一个动作单元选择播放的语音,在该动作单元的执行过程中确实未播放完,则课程和用户训练动作可以继续进行,且该语音不会被中断,而会继续播放直至播放完成。在该语音播放过程中的后续动作单元触发的语音将被舍弃。举例来说,一次深蹲训练动作为一个动作单元,用户深蹲了第一次、第二次和第三次。针对用户的第一次深蹲,电子设备从触发的多条语音中选择播放语音1。若第一次深蹲完成后语音1未播放完成,则在第二次深蹲的过程中继续播放语音1。并且,电子设备舍弃第二次深蹲触发的语音。若在第三次深蹲时语音1已经播放完成,则电子设备从第三次深蹲触发的多条语音中进行选择性播放。
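The rule that a playing voice is never interrupted, and that voices triggered by later action units are dropped while it plays, can be sketched with a simulated clock. The class and method names are hypothetical, and real playback would use the device's audio API rather than timestamps.

```python
class VoicePlayer:
    """Toy model of non-interrupting playback using explicit timestamps."""

    def __init__(self) -> None:
        self.busy_until = 0.0  # time at which the current voice finishes
        self.played = []       # voices that were actually played

    def offer(self, now: float, voice: str, duration: float) -> bool:
        """Try to play the voice selected for an action unit.

        Returns False (the voice is discarded) if an earlier voice is
        still playing at time `now`.
        """
        if now < self.busy_until:
            return False
        self.busy_until = now + duration
        self.played.append(voice)
        return True
```

Replaying the three-squat example: voice 1 starts during the first squat and runs for 5 seconds, the voice for the second squat is offered at 3 seconds and dropped, and the voice for the third squat is offered at 6 seconds and played.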
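In the embodiments of this application, the voices triggered by the same action unit may include multiple voices of multiple types and dimensions. By playing selectively from these voices, the electronic device makes the played content less repetitive, less predictable, and more varied, while keeping it matched in real time to the user's training action and state.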
在一些实施例中,对于同一动作单元触发的语音,电子设备可以根据触发的先后顺序,选择最先触发的一条语音进行播放。
在另一些实施例中,电子设备可以根据语音的优先级,对同一动作单元触发的多条语音进行选择性播放。其中,与优先级低的语音相比,优先级高的语音更为重要。即,电子设备可以优先播放重要的语音。
比如,在优先级划分方式1中,主流程语音的优先级高于动作改善语音,训练节奏语音和动作评价语音这些非主流程语音。再比如,在优先级划分方式2中,主流程语音的优先级高于动作改善语音,动作改善语音的优先级高于动作评价语音和训练节奏语音。
在一种情况下,若电子设备在用户同一动作单元的执行过程中,根据用户的状态确定触发了多条主流程语音,则由于主流程语音的功能较为重要,主流程语音的优先级较高,因而电子设备可以分别播放每条主流程语音,以保证用户的健身训练能够正常进行。具体的,电子设备可以根据触发的先后顺序依次播放每条主流程语音。
例如,在用户进行如图7A所示的哑铃侧体屈训练时,用户已偏离屏幕范围,且用户在桌子旁边。电子设备确定先触发了主流程语音为“请站到屏幕前”,后触发了主流程语音“请远离桌角”。此时,参见图7B,电子设备可以先播放主流程语音为“请站到屏幕前”,并显示该语音对应的第一图文信息。电子设备检测到用户站到屏幕前之后,参见图7C,电子设备可以播放主流程语音“请远离桌角”,并显示该语音对应的第一图文信息。
如前所述,在电子设备检测到用户根据主流程语音和第一图文信息,调整到第一图文信息要求的状态之前,第一图文信息不消失,课程也不会往下继续进行,教练动画不会播放后续内容。因而,电子设备分别播放每条主流程语音,不会使得后续的课程动作与语音播放不对应,不会影响语音播放的时效性。
在另一种情况下,若电子设备在用户同一个动作单元的执行过程中,根据用户的状态确定触发了至少一条主流程语音,并根据训练动作确定触发了至少一条非主流程语音,则由于主流程语音的优先级高,因而电子设备可以播放该至少一条主流程语音,并舍弃其他非主流程语音,以保证语音播放的时效性,保证语音播放与用户训练动作实时匹配,避免用户已经在进行下一个动作了,电子设备还在播放上一个动作的非主流程语音。
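For example, the first training action is the dumbbell side bend, and one dumbbell side bend is one action unit. The second training action is the dumbbell compound press, and one dumbbell compound press is one action unit. While the user performs the first training action, the electronic device determines that the main-flow voice "Please keep away from the table corner" and the action improvement voice "Bend a little further to the side" are triggered. Because the main-flow voice has the higher priority, the electronic device may play "Please keep away from the table corner" and discard the other, non-main-flow voices. Otherwise, if the electronic device played every voice triggered by the first training action, those voices might still be playing when the second training action is performed, so the playback would not match the training action and its timeliness would be poor.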
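The selection rule for one action unit described above can be sketched as follows, assuming priority division 1. The `Voice` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Voice:
    text: str
    kind: str            # "main_flow" or a non-main-flow kind such as "improvement"
    triggered_at: float  # trigger time within the action unit, in seconds


def select_for_unit(candidates: List[Voice]) -> List[Voice]:
    """Choose which of the voices triggered by one action unit to play.

    If any main-flow voice was triggered, all main-flow voices are played in
    trigger order and the non-main-flow voices are discarded. Otherwise only
    the earliest-triggered non-main-flow voice is played.
    """
    ordered = sorted(candidates, key=lambda v: v.triggered_at)
    main_flow = [v for v in ordered if v.kind == "main_flow"]
    if main_flow:
        return main_flow
    return ordered[:1]
```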
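In another case, while the user performs the same action unit, the electronic device determines, based on the user's training action, that multiple non-main-flow voices are triggered. In solution 1 for this case, if priority division 1 is used, the multiple non-main-flow voices have the same priority, and the electronic device may select and play the earliest-triggered one. Alternatively, the electronic device may randomly select one of the multiple non-main-flow voices to play.
In some embodiments, if the non-main-flow voice selected by the electronic device is an action improvement voice, the electronic device may also display second graphic-text information on the screen, which is the graphic-text information of the action improvement voice. For example, the electronic device may display the second graphic-text information beside the displayed image of the user.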
其中,该第二图文信息可以为任务框形式的更为简洁的图文表述。举例来说,一次哑铃侧体屈训练动作为一个动作单元,电子设备确定用户在执行一次哑铃侧体屈训练动作的过程中,先后触发了动作改善语音“请向哑铃手侧屈”,以及“侧屈幅度大一点可以更有效的刺激腹外斜肌哦”。这两条语音对应同一优先级。电子设备可以根据先后触发顺序,或者随机选择一条语音进行播放。例如,参见图8A,电子设备选择并播放了语音“请向哑铃手侧屈”,同时在屏幕上显示了该语音对应的第二图文信息,即提示信息801。
与任务框形式的第一图文信息类似,电子设备在确定用户根据动作改善语音和第二图文信息,将用户的训练动作调整到第二图文信息要求的状态后,电子设备上的任务框形式的第二图文信息可以进行相应的变化,以响应用户的状态调整。例如,第二图文信息上可以显示完成标识。而后,第二图文信息消失。示例性的,参见图8B,电子设备检测到用户向哑铃手侧弯曲后,提示信息801中的叉号更新为对勾。该对勾可以为该完成标识。而后,电子设备停止显示提示信息801。
若电子设备长时间内未检测到用户根据动作改善语音和第二图文信息,将训练动作调整到第二图文信息要求的状态,则如图8C所示,第二图文信息将持续显示,并且第二图文信息可以进行相应的变化,以引起用户的注意,提醒用户重点关注该第二图文信息的内容。例如,电子设备可以在第二图文信息上显示提示标识。示例性的,如图8C所示,提示信息801中的叉号更新为叹号。该叹号可以为该提示标识。并且,随着后续训练动作的进行,电子设备针对后续动作单元播放语音“身体向侧面弯曲,不要向前弯腰”。第二图文信息在持续显示预设时长后,或者在同种类型的训练动作结束后停止显示。其中,这里的同种类型的训练动作结束是指,哑铃侧体屈该类型的训练动作结束。
需要说明的是,与第一图文信息不同,在第二图文信息持续显示的过程中,课程、训练动作和语音播放会继续进行,因而不会使得动作与语音播放不对应,也不会影响语音播放的时效性。
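In some embodiments, if the non-main-flow voice selected by the electronic device is an action improvement voice, and the electronic device has already played that action improvement voice for this type of training action in the current fitness session (for example, the same type of training action may need to be performed several times), the electronic device discards that voice and again selects a non-main-flow voice to play, either by trigger order or at random. This prevents the electronic device from repeatedly pointing out the same problem and the same improvement requirement. In some technical solutions, when playing the reselected non-main-flow voice, the electronic device may also display the second graphic-text information corresponding to the action improvement voice.
For example, one dumbbell side bend is one action unit. In another dumbbell side bend after FIG. 8A (that is, another action unit of the same type of training action), if the voice selected for playing is the action improvement voice "Please bend toward the dumbbell hand", then because the electronic device has already played it, the electronic device discards it and reselects a voice to play.
In other embodiments, if the non-main-flow voice selected by the electronic device is an action improvement voice for a first error (for example, the user bends toward the side of the hand that is not holding the dumbbell), and the electronic device has already played that action improvement voice for this type of training action in the current fitness session, the electronic device may play the action key points for the first error. Only after the key points finish playing does the electronic device determine whether subsequent training actions trigger voices, so that subsequent training actions and voices do not become mismatched. While the key points are playing, the electronic device may continue playing the coach animation of the subsequent course, and the user may continue with subsequent training actions. In addition, different errors within the same type of training action correspond to different key-point content.
For example, one dumbbell side bend is one action unit. In another dumbbell side bend after FIG. 8A, if the voice selected for playing is the action improvement voice "Please bend toward the dumbbell hand", then because the electronic device has already played it, the electronic device does not play it again and instead plays the key points of the training action: "Keep your back straight and lean forward slightly, hold the dumbbell in one hand, place the other hand behind your head, and bend toward the dumbbell side; do not bend forward". While the key points are playing, the electronic device may continue playing the coach animation of the subsequent course, but for the time being does not determine whether voices are triggered during the subsequent course. In some technical solutions, when playing the key points, the electronic device may also display the second graphic-text information corresponding to the action improvement voice.
In the current fitness session, if for a later action unit of this type of training action the selected non-main-flow voice is again an action improvement voice for the first error, and the electronic device determines that it has already played the key points for the first error, the electronic device may discard that action improvement voice and display the corresponding second graphic-text information. Then, in some embodiments, the electronic device reselects a voice from the other voices triggered by that action unit; in other embodiments, the electronic device plays no voice triggered by that action unit.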
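The escalation for a repeated error (improvement voice first, then the action key points, then graphic-text only) can be sketched as a small per-session history. This follows the key-points embodiment only; the class name and the returned action labels are hypothetical.

```python
class ImprovementHistory:
    """Tracks, per training session, what has been played for each error."""

    def __init__(self) -> None:
        self._voice_played = set()       # (action_type, error) pairs already voiced
        self._key_points_played = set()  # (action_type, error) pairs with key points played

    def decide(self, action_type: str, error: str) -> str:
        """Decide how to handle a selected action improvement voice."""
        key = (action_type, error)
        if key not in self._voice_played:
            self._voice_played.add(key)
            return "play_improvement_voice"
        if key not in self._key_points_played:
            self._key_points_played.add(key)
            return "play_key_points"
        # Both were already played: drop the voice, show only the graphic-text hint.
        return "discard_and_show_graphic"
```

A new `ImprovementHistory` would be created for each fitness session, since the text scopes the repetition check to the current session and to one type of training action.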
在方案2中,若采用优先级划分方式2,则电子设备选择优先级最高的一条语音进行播放。若优先级最高的语音包括多条,则根据触发的先后顺序或者随机选择一条语音播放。
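Solution 2 can be sketched as a ranked pick over the non-main-flow voices of one action unit. The `RANK` table encodes priority division 2 (action improvement above action evaluation and training rhythm); the tuple layout and function name are assumptions made for this illustration.

```python
import random
from typing import List, Optional, Tuple

# Lower number means higher priority (priority division 2, non-main-flow voices only).
RANK = {"improvement": 0, "evaluation": 1, "rhythm": 1}


def pick_by_priority(
    candidates: List[Tuple[str, str]],
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[str, str]]:
    """Pick one (kind, text) voice; `candidates` must be in trigger order.

    Ties at the highest priority are broken by trigger order when `rng`
    is None, or at random when an `rng` is supplied.
    """
    if not candidates:
        return None
    best = min(RANK[kind] for kind, _ in candidates)
    top = [c for c in candidates if RANK[c[0]] == best]
    return rng.choice(top) if rng is not None else top[0]
```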
在另一种情况下,电子设备在用户执行同一个动作单元的过程中,根据用户的训练动作确定触发了一条非主流程语音,则电子设备播放该条非主流程语音。
举例来说,若电子设备在用户进行如图9A所示的肩部环绕训练动作的过程中,根据用户训练动作的数量确定触发了一条训练节奏语音“还有最后5次,加油”,则参见图9B,电子设备播放该训练节奏语音。
在一些情况下,若用户正在进行训练动作,且动作较为标准,因而主流程语音和动作改善语音触发的较少,训练节奏语音本身较少,则电子设备确定触发的动作评价语音较多。若电子设备针对每个动作都播放动作评价语音,则语音一直在持续播放,且动作评价语音播放过于频繁,用户体验较差。
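In the embodiments of this application, the electronic device may use a bye (skip-a-turn) mechanism for action evaluation voices. When solution 1 or solution 2 above results in the electronic device selecting an action evaluation voice from multiple voices, the electronic device randomly determines whether it is currently in bye mode or play mode. If the result is bye mode, the electronic device discards the action evaluation voice and does not play it. If the result is play mode, the electronic device randomly selects a voice to play from the rich corpus of action evaluation voices for the level (for example, the Perfect level or the Great level) that corresponds to the degree of match between the user's training action and the standard action.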
这样,可以增加语音播放的随机性和不可预测性,减少重复的动作评价语音出现,减少不断重复评价的枯燥感和可预测性,给用户以耳目常新的使用体验。
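The bye mechanism amounts to two random draws: one to decide between bye mode and play mode, and one to pick a clip from the corpus of the matching level. In the sketch below the corpus entries and the `play_probability` parameter are hypothetical; the source says only that the mode is determined randomly.

```python
import random
from typing import Optional

# Hypothetical corpus; the actual phrases are given in Table 3 of the source.
CORPUS = {
    "perfect": ["Perfect!", "Flawless form!", "Textbook movement!"],
    "great": ["Great job!", "Nice work!"],
}


def evaluation_voice(
    level: str,
    rng: random.Random,
    play_probability: float = 0.5,
) -> Optional[str]:
    """Return an action evaluation voice, or None when the turn is a bye."""
    if rng.random() >= play_probability:
        return None                    # bye mode: the voice is discarded
    return rng.choice(CORPUS[level])   # play mode: random clip from this level
```

Passing the random generator in explicitly keeps the behavior reproducible in tests while remaining unpredictable to the user in production.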
举例来说,参见图10所示的时序图,电子设备可以随机轮空或播放完美级别的动作评价语音。并且,电子设备所播放的完美级别的动作评价语音也是随机确定的。
在其他一些实施例中,电子设备在播放动作评价语音时,还可以显示动作评价语音对应的图文信息,以更好地肯定和鼓励用户,给用户以信心,提高用户的训练兴趣。示例性的,参见图11,电子设备在播放动作评价语音时,还可以显示动作评价语音对应的图文信息,即提示信息1101。
以上是以语音的优先级通过语音类型来划分为例进行说明的,语音的优先级还可以通过其他方式来确定。例如,语音的优先级可以根据用户的实际情况,并结合人工智能AI来确定。
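For example, for the same type of training action, if the electronic device determines from AI learning results that the user's past performance was poor but the standard of the current training action has clearly improved, this indicates that the user has made great progress. The electronic device may therefore give priority to positive, encouraging action evaluation voices; that is, action evaluation voices have the highest priority. In this way, the user learns of their progress promptly and enjoys it, which increases the user's confidence and interest.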
并且,电子设备还可以通过AI对播放的语音进行调整。比如,对于较胖的用户来说,深蹲动作较难执行。若用户深蹲的训练动作与标准动作的匹配程度不及真棒级别,但接近真棒级别,则电子设备可以播放真棒级别的动作评价语音,以鼓励用户;若用户训练动作与标准动作的匹配程度接近完美级别,则电子设备可以播放完美级别的动作评价语音,以鼓励用户,提升用户自信。
再比如,若电子设备根据AI学习结果确定用户的耐力较差,比较容易中途停止训练,则电子设备可以优先播放训练节奏语音,增大训练节奏语音的播放频率,以方便用户及时获知当前的训练进度以及剩余的动作次数,鼓励用户坚持做完整套训练动作。
在其他一些实施例中,电子设备在播放主流程语音、动作改善语音、动作评价语音或训练节奏语音时,还可以在屏幕上显示该语音对应的文字内容,以方便用户更为清楚地了解电子设备的语音提示内容。示例性的,在图5A所示情况下,参见图12,电子设备可以在屏幕的顶部显示该语音对应的文字内容1201。
在以上实施例描述的方案中,电子设备上预设有包括各种类型和各种维度的多条语音的语音库,电子设备可以在用户根据教练动画进行运动训练时,实时针对用户当前的运动状态和运动动作,从该语音库中选择不同的语音进行播放。
在其他一些实施例中,电子设备上未预设有包括多条语音的语音库,而是预设有单字语料库。该单字语料库中包括多个单个的字或词。电子设备可以在用户根据教练动画进行运动训练时,实时针对用户当前的运动状态和运动动作,通过大数据分析合成更加有针对性的人性化的语音提示,从而进行语音播放。
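For example, the word corpus includes "beautiful, perfect, awesome, invincible, you, you (polite form), yourself, screen, temperament, front, side, bend, is, exactly the same, really, very, of, not bad, matches, with, ...". For example, one voice synthesized by the electronic device in real time is "You are really awesome!" As another example, one voice synthesized by the electronic device in real time is "Beautiful, it matches your temperament very well!".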
本申请另一实施例还提供了一种智能语音播放方法,可以应用于电子设备,该电子设备可以包括屏幕、摄像头和音频播放器。参见图13,该方法可以包括:
1301、电子设备显示第一应用程序的界面,第一应用程序用于用户进行运动训练。
电子设备可以在屏幕上显示第一应用程序的界面。示例性的,第一应用程序可以为上述AI健身APP,第一应用程序的界面可以为图4中的(c)所示的界面。
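For example, the exercise training may be AI fitness training, AI yoga training, AI aerobics training, AI motion-sensing games, or other exercise training.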
1302、电子设备采集用户训练动作的图像。
电子设备可以通过摄像头采集用户训练动作的图像。
1303、电子设备播放标准动作的动画,并显示用户训练动作的图像。
示例性的,电子设备播放标准动作的动画,并显示用户训练动作的图像的界面可以参见图6。
1304、电子设备确定用户训练动作中的第一动作单元触发的待选语音,待选语音包括多条语音,该第一动作单元为一次训练动作或者为一次训练动作的一部分。
其中,该第一动作单元可以是用户运动训练过程中任意的动作单元。
1305、电子设备从待选语音中选择一条语音进行播放。
其中,关于步骤1304-步骤1305中的相关说明,可以参见上述步骤305-步骤310中的描述,此处不予赘述。
这样,在步骤1301-步骤1305描述的方案中,电子设备能够实时针对用户当前的动作单元,确定与用户的运动状态相匹配的多条待选语音,并从待选语音中选择播放不同的语音,从而给用户以实时的语音反馈,且播放的语音内容丰富多变,不易重复,用户体验较好。
本申请另一实施例还提供了一种电子设备,可以包括显示单元,采集单元,确定单元和播放单元等。这些单元可以执行上述实施例中的各个步骤,以实现智能语音播放方法。
另外,本申请实施例还提供了一种电子设备,包括一个或多个处理器;存储器;以及一个或多个计算机程序。一个或多个计算机程序被存储在存储器中,一个或多个计算机程序包括指令。当指令被一个或多个处理器执行时,使得电子设备执行上述实施例中的各个步骤,以实现上述智能语音播放方法。
本申请的实施例还提供一种计算机存储介质,该计算机存储介质中存储有计算机指令,当该计算机指令在电子设备上运行时,使得电子设备执行上述相关方法步骤实现上述实施例中的智能语音播放方法。
本申请的实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中电子设备执行的智能语音播放方法。
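In addition, an embodiment of this application further provides an apparatus, which may specifically be a chip, a component, or a module. The apparatus may include a processor and a memory that are connected to each other. The memory is configured to store computer-executable instructions; when the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the apparatus performs the intelligent voice playing method performed by the electronic device in the foregoing method embodiments.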
其中,本实施例提供的电子设备、计算机存储介质、计算机程序产品或芯片均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
通过以上实施方式的描述,所属领域的技术人员可以了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (20)

  1. 一种语音播放方法,其特征在于,包括:
    电子设备显示第一应用程序的界面,所述第一应用程序用于用户进行运动训练;
    所述电子设备采集所述用户训练动作的图像;
    所述电子设备播放标准动作的动画,并显示所述用户训练动作的图像;
    所述电子设备确定所述用户训练动作中的第一动作单元触发的待选语音,所述待选语音包括多条语音;所述第一动作单元为一次所述训练动作或者为一次所述训练动作的一部分;
    所述电子设备从所述待选语音中选择一条语音进行播放。
  2. 根据权利要求1所述的方法,其特征在于,所述待选语音包括主流程语音和非主流程语音;所述主流程语音用于使得所述运动训练正常进行;所述非主流程语音包括动作改善语音、动作评价语音或训练节奏语音中的一项或多项;所述主流程语音为高优先级的语音,所述非主流程语音为低优先级的语音;所述电子设备从所述待选语音中选择一条语音进行播放,包括:
    所述电子设备根据语音的优先级,从所述待选语音中选择一条语音进行播放。
  3. 根据权利要求2所述的方法,其特征在于,所述待选语音包括至少一条高优先级的所述主流程语音,所述电子设备根据语音的优先级,从所述待选语音中选择一条语音进行播放,包括:
    所述电子设备分别播放所述待选语音中的每条高优先级的所述主流程语音。
  4. 根据权利要求3所述的方法,其特征在于,在所述电子设备播放每条所述主流程语音时,所述方法还包括:
    所述电子设备停止播放所述标准动作的动画;
    所述电子设备显示第一图文信息,所述第一图文信息与所述主流程语音相对应;
    在所述电子设备检测到所述用户调整到所述第一图文信息要求的状态后,所述电子设备在所述第一图文信息上显示完成标识;
    所述电子设备停止显示所述第一图文信息;
    所述电子设备继续播放所述标准动作的动画。
  5. 根据权利要求3或4所述的方法,其特征在于,所述待选语音还包括至少一条低优先级的所述非主流程语音。
  6. 根据权利要求2所述的方法,其特征在于,所述待选语音包括多条低优先级的所述非主流程语音;所述电子设备根据语音的优先级,从所述待选语音中选择一条语音进行播放,包括:
    the electronic device selects a first target voice from the multiple low-priority non-main-flow voices;
    所述电子设备播放所述第一目标语音。
  7. 根据权利要求6所述的方法,其特征在于,所述第一目标语音为多条低优先级的所述非主流程语音中最先触发的一条语音。
  8. 根据权利要求6或7所述的方法,其特征在于,所述第一目标语音为针对所述第一动作单元中的第一错误的第一动作改善语音,所述电子设备播放所述第一目标语音,包括:
    若在本次运动训练中,所述电子设备针对所述第一动作单元所属类型的训练动作,未播放过所述第一动作改善语音,则所述电子设备播放所述第一动作改善语音;
    所述方法还包括:
    若在本次运动训练中,所述电子设备针对所述第一动作单元所属类型的训练动作,已播放过所述第一动作改善语音,则所述电子设备播放针对所述第一错误的动作要点;或者所述电子设备从所述第一动作改善语音以外的其他所述非主流程语音中选择第二目标语音。
  9. 根据权利要求8所述的方法,其特征在于,在所述电子设备播放所述第一动作改善语音时,所述方法还包括:
    所述电子设备显示第二图文信息,所述第二图文信息与所述第一动作改善语音相对应;
    在所述电子设备检测到所述用户将所述训练动作调整到所述第二图文信息要求的状态后,所述电子设备在所述第二图文信息上显示完成标识;
    而后,所述电子设备停止显示所述第二图文信息。
  10. 根据权利要求6或7所述的方法,其特征在于,所述第一目标语音为第一动作评价语音,所述电子设备播放所述第一目标语音,包括:
    若所述电子设备采用随机方式确定当前为第一模式,则所述电子设备播放所述第一动作评价语音;
    所述方法还包括:
    若所述电子设备通过随机的方式确定当前为第二模式,则所述电子设备不播放所述第一动作评价语音。
  11. 根据权利要求10所述的方法,其特征在于,所述动作评价语音包括多个级别,每个所述级别包括多条语音,所述第一动作评价语音为第一级别的动作评价语音,所述电子设备播放所述第一动作评价语音,包括:
    所述电子设备从所述第一级别的动作评价语音中,随机选择一条语音进行播放。
  12. 根据权利要求6-11任一项所述的方法,其特征在于,所述电子设备播放所述第一目标语音,包括:
    若所述电子设备确定所述第一动作单元之前的另一动作单元触发的语音已播放完成,则所述电子设备播放所述第一目标语音;
    所述方法还包括:
    若所述电子设备确定所述第一动作单元之前的另一动作单元触发的语音尚未播放完成,则所述电子设备不播放所述第一目标语音。
  13. 根据权利要求2-12任一项所述的方法,其特征在于,在所述电子设备显示第一应用程序的界面之后,且所述电子设备确定所述用户训练动作中的第一动作单元触发的待选语音之前,所述方法还包括:
    所述电子设备根据所述运动训练的进度或所述用户的状态确定触发所述主流程语音;
    所述电子设备播放所述主流程语音。
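  14. The method according to any one of claims 2 to 13, wherein the action improvement voice is used to guide the user to improve the training action;
    the action evaluation voice is used to give the user a positive evaluation of the user's training action;
    the training rhythm voice is used to inform the user of the progress of the exercise training.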
  15. 根据权利要求14所述的方法,其特征在于,所述主流程语音包括流程性语音,位置调整语音,站位调整语音或人性化提示语音中的一项或多项;
    所述动作改善语音包括频率改善语音,幅度改善语音或姿态改善语音中的一项或多项。
  16. 根据权利要求15所述的方法,其特征在于,所述电子设备确定所述用户的训练动作中的第一动作单元触发的待选语音,包括:
    所述电子设备根据所述第一动作单元执行过程中所述用户的状态,确定所述待选语音中的所述位置调整语音、所述站位调整语音或所述人性化提示语音;
    所述电子设备根据所述第一动作单元中的动作,确定所述待选语音中的所述非主流程语音。
  17. 一种电子设备,其特征在于,包括:
    屏幕,用于显示应用程序的界面和用户训练动作的图像;
    音频播放器,用于播放语音;
    一个或多个处理器;
    以及存储器,所述存储器中存储有代码;
    当所述代码被所述一个或多个处理器执行时,使得所述电子设备执行如权利要求1-16中任一项所述的语音播放方法。
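  18. A computer storage medium, comprising computer instructions, wherein when the computer instructions are run on an electronic device, the electronic device is caused to perform the voice playing method according to any one of claims 1 to 16.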
  19. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1-16中任一项所述的语音播放方法。
  20. 一种芯片系统,其特征在于,所述芯片系统应用于电子设备;所述芯片系统包括一个或多个接口电路和一个或多个处理器;所述接口电路和所述处理器通过线路互联;所述接口电路用于从所述电子设备的存储器接收信号,并向所述处理器发送所述信号,所述信号包括所述存储器中存储的计算机指令;当所述处理器执行所述计算机指令时,所述电子设备执行如权利要求1-16中任一项所述的语音播放方法。
PCT/CN2020/110623 2019-08-30 2020-08-21 一种智能语音播放方法及设备 Ceased WO2021036954A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/634,601 US12070673B2 (en) 2019-08-30 2020-08-21 Intelligent voice playing method and device
EP20856532.5A EP3991814B1 (en) 2019-08-30 2020-08-21 Intelligent speech playing method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910818708.0A CN112439180B (zh) 2019-08-30 2019-08-30 一种智能语音播放方法及设备
CN201910818708.0 2019-08-30

Publications (1)

Publication Number Publication Date
WO2021036954A1 true WO2021036954A1 (zh) 2021-03-04

Family

ID=74683330

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110623 Ceased WO2021036954A1 (zh) 2019-08-30 2020-08-21 一种智能语音播放方法及设备

Country Status (4)

Country Link
US (1) US12070673B2 (zh)
EP (1) EP3991814B1 (zh)
CN (2) CN112439180B (zh)
WO (1) WO2021036954A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115188064A (zh) * 2021-04-07 2022-10-14 华为技术有限公司 一种运动指导信息的确定方法、电子设备和运动指导系统
JP2023540286A (ja) * 2021-03-19 2023-09-22 シェンツェン・ショックス・カンパニー・リミテッド ユーザー動作を識別する方法及びシステム

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909831A (zh) * 2021-09-22 2023-04-04 北京卡路里信息技术有限公司 课程训练的数据处理方法及装置
CN115202531A (zh) * 2022-05-27 2022-10-18 当趣网络科技(杭州)有限公司 界面交互的方法、系统和电子装置
CN115620866B (zh) * 2022-06-17 2023-10-24 荣耀终端有限公司 运动信息提示方法及装置
CN115779394B (zh) * 2022-12-01 2025-10-31 炫彩互动网络科技有限公司 一种体感运动训练方法和系统
CN116678083B (zh) * 2023-05-31 2025-10-24 青岛海尔空调器有限总公司 智能家居互联的方法、控制装置及空调器
CN119229858A (zh) * 2023-06-30 2024-12-31 荣耀终端有限公司 录入语音并发冲突的处理方法及终端设备
CN116935484A (zh) * 2023-07-04 2023-10-24 康键信息技术(深圳)有限公司 深蹲跳动作计数方法、装置、设备及介质
CN119961870A (zh) * 2025-03-27 2025-05-09 北京一石科技有限责任公司 一种ai驱动全方位体态监测方法、系统、介质和程序产品

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101202994A (zh) * 2006-12-14 2008-06-18 北京三星通信技术研究有限公司 辅助用户健身的方法和装置
US20110307927A1 (en) * 2010-06-11 2011-12-15 Toshihisa Nakano Method, system and apparatus for managing network services
CN103050138A (zh) * 2012-11-19 2013-04-17 长沙中联消防机械有限公司 提示音播放控制方法、装置及工程机械设备
CN106139564A (zh) * 2016-08-01 2016-11-23 纳恩博(北京)科技有限公司 图像处理方法和装置
CN108654061A (zh) * 2018-05-14 2018-10-16 北京卡路里科技有限公司 分段跑的语音播放方法及装置
CN110110647A (zh) * 2019-04-30 2019-08-09 北京小米移动软件有限公司 基于ar设备进行信息显示的方法、装置及存储介质
CN110170159A (zh) * 2019-06-27 2019-08-27 郭庆龙 一种人体健身动作运动监测系统

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004024627A (ja) 2002-06-26 2004-01-29 Yamaha Corp 動作練習装置
JP2010194034A (ja) 2009-02-24 2010-09-09 Panasonic Electric Works Co Ltd 運動装置
CN102365115B (zh) * 2009-03-13 2014-05-21 高夫准株式会社 虚拟高尔夫模拟设备及其方法
JP2010137097A (ja) * 2010-03-23 2010-06-24 Namco Bandai Games Inc ゲーム装置および情報記憶媒体
CN103413018B (zh) * 2012-05-23 2017-04-12 微软技术许可有限责任公司 用于提供动态锻炼内容的方法
US9161708B2 (en) * 2013-02-14 2015-10-20 P3 Analytics, Inc. Generation of personalized training regimens from motion capture data
US20150347717A1 (en) * 2014-06-02 2015-12-03 Xerox Corporation Hybrid personal training system and method
CN105148479A (zh) 2015-07-07 2015-12-16 王瑜 一种舞蹈训练系统
CN106110627B (zh) 2016-06-20 2018-08-21 曲大方 体育和武术运动动作校正设备和方法
CN106730760A (zh) * 2016-12-06 2017-05-31 广州视源电子科技股份有限公司 健身动作检测方法、系统、可穿戴设备及终端
KR20180103280A (ko) 2017-03-09 2018-09-19 석원영 관절 사이의 거리 유사성을 토대로 자세인식을 수행하는 노인전용 운동안내 시스템
CN106984027B (zh) * 2017-03-23 2019-07-26 华映科技(集团)股份有限公司 一种动作对比分析方法、装置及一种显示器
US11037369B2 (en) * 2017-05-01 2021-06-15 Zimmer Us, Inc. Virtual or augmented reality rehabilitation
US20180339195A1 (en) * 2017-05-25 2018-11-29 Erik A. Bernotas Exercise Information System
CN107551521B (zh) * 2017-08-17 2020-05-08 广州视源电子科技股份有限公司 健身指导方法及装置、智能设备及存储介质
US11161236B2 (en) 2017-09-14 2021-11-02 Sony Interactive Entertainment Inc. Robot as personal trainer
US20190201744A1 (en) * 2017-12-31 2019-07-04 Zheng Shen Internet based asynchronous coaching system
CN108601133A (zh) * 2018-02-12 2018-09-28 甄十信息科技(上海)有限公司 一种智能台灯及基于智能台灯的坐姿纠正方法
CN108694240A (zh) * 2018-05-14 2018-10-23 北京卡路里科技有限公司 语音播放方法及装置
CN108853946A (zh) * 2018-07-10 2018-11-23 燕山大学 一种基于Kinect的健身指导训练系统及方法
CN109087694B (zh) * 2018-07-13 2021-03-30 广东小天才科技有限公司 一种帮助学生锻炼身体的方法及家教设备
CN109144247A (zh) * 2018-07-17 2019-01-04 尚晟 视频交互的方法以及基于可交互视频的运动辅助系统
CN109165560A (zh) * 2018-07-26 2019-01-08 深圳市梵高夫科技有限公司 运动姿势的学习方法、装置、系统及信息存储介质
CN109011508A (zh) * 2018-07-30 2018-12-18 三星电子(中国)研发中心 一种智能教练系统及方法
CN109407829A (zh) * 2018-09-18 2019-03-01 孔军民 一种应用于健身器材的人机交互系统及交互方法
CN109745163A (zh) * 2019-01-05 2019-05-14 张伟 身体姿势指导方法及系统
CN110052013A (zh) * 2019-05-28 2019-07-26 上海应用技术大学 一种健身即时辅助系统
CN210864619U (zh) 2019-10-12 2020-06-26 北京踏行天际科技发展有限公司 一种无脂镜显示系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3991814A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023540286A (ja) * 2021-03-19 2023-09-22 シェンツェン・ショックス・カンパニー・リミテッド ユーザー動作を識別する方法及びシステム
JP7508698B2 (ja) 2021-03-19 2024-07-01 シェンツェン・ショックス・カンパニー・リミテッド ユーザー動作を識別する方法、システム、および非一時的コンピュータ可読媒体
CN115188064A (zh) * 2021-04-07 2022-10-14 华为技术有限公司 一种运动指导信息的确定方法、电子设备和运动指导系统
EP4310724A4 (en) * 2021-04-07 2024-09-04 Huawei Technologies Co., Ltd. METHOD FOR DETERMINING EXERCISE GUIDANCE INFORMATION, ELECTRONIC DEVICE AND EXERCISE GUIDANCE SYSTEM

Also Published As

Publication number Publication date
CN112439180B (zh) 2021-12-28
CN112439180A (zh) 2021-03-05
EP3991814A1 (en) 2022-05-04
US20220323848A1 (en) 2022-10-13
CN114432683A (zh) 2022-05-06
US12070673B2 (en) 2024-08-27
EP3991814A4 (en) 2022-08-17
EP3991814B1 (en) 2026-04-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20856532

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 20856532.5

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2020856532

Country of ref document: EP

Effective date: 20220127

NENP Non-entry into the national phase

Ref country code: DE

WWG Wipo information: grant in national office

Ref document number: 2020856532

Country of ref document: EP