WO2020034849A1 - Method, apparatus, computing device, and medium for music recommendation - Google Patents
Method, apparatus, computing device, and medium for music recommendation
- Publication number
- WO2020034849A1 (PCT/CN2019/098861)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- music
- user
- matching
- visual semantic
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/11—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information not detectable on the record carrier
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/65—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/686—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
- G10H1/368—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems displaying animated or moving pictures synchronized with the music or audio part
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/441—Image sensing, i.e. capturing images or optical patterns for musical purposes or musical control purposes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/075—Musical metadata derived from musical analysis or for use in electrophonic musical instruments
- G10H2240/085—Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
- G10H2240/141—Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
Definitions
- the present application relates to the field of computer technology, and in particular, to a method, an apparatus, a computing device, and a medium for music recommendation.
- The embodiments of the present application provide a method, an apparatus, a computing device, and a medium for music recommendation, which provide personalized recommendation services for different users while using fewer processing resources and bandwidth resources of the computing device when recommending matching music for the user.
- An embodiment of the present application provides a method for music recommendation, which is executed by a server device and includes:
- each visual semantic tag is used to describe at least one content of the material
- the matching music is filtered according to the preset music filtering conditions, and the filtered matching music is recommended as the candidate music of the material.
- An embodiment of the present application further provides a method for music recommendation, which is executed by a terminal device, and includes:
- the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
- An embodiment of the present application further provides a device for music recommendation, including:
- an acquisition unit configured to acquire the material that needs a soundtrack;
- a first determining unit configured to determine at least one visual semantic tag of the material, and each visual semantic tag is used to describe at least one content of the material;
- a search unit configured to search each matching music that matches at least one visual semantic tag from the candidate music library
- a sorting unit configured to sort each matching music according to the user appreciation information for each matching music corresponding to the material
- a recommendation unit configured to filter the matching music according to a preset music filtering condition based on the sorting result, and to recommend the filtered matching music as candidate music for the material.
- An embodiment of the present application further provides a device for music recommendation, including:
- a sending unit configured to send the material that needs a soundtrack to the server device, and to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching the candidate music library for each matching music that matches the at least one visual semantic tag; sorting each matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music; and, based on the sorting result, filtering the matching music according to the preset music filtering conditions and recommending the filtered matching music as candidate music for the material;
- a receiving unit configured to receive candidate music returned by the server device
- the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
- An embodiment of the present application further provides a computing device including at least one processing unit and at least one storage unit, where the storage unit stores a computer program which, when executed by the processing unit, causes the processing unit to execute the steps of any of the foregoing music recommendation methods.
- An embodiment of the present application further provides a computer-readable medium that stores a computer program executable by a computing device, and when the program runs on a terminal device, causes the computing device to execute the steps of any of the above-mentioned music recommendation methods.
- With the method, device, computing device, and medium for music recommendation provided in the embodiments of the present application, the visual semantic tags of the material that needs a soundtrack are determined, matching music that matches the visual semantic tags is searched for, each matching music is sorted according to the user's appreciation information for it, and matching music is recommended to the user according to the sorting result.
- In this way, the reason for a music recommendation can be explained to the user through the visual semantic tags, and different users receive differentiated recommendations, which realizes a personalized music recommendation service.
- This further avoids the need to recommend again because the recommended music was inappropriate, which would waste processing resources of the computing device and occupy bandwidth between the terminal device and the server; the processing resources of the computing device and the bandwidth resources between the terminal device and the server are thereby saved.
- FIG. 1 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
- FIG. 2 is an implementation flowchart of a music recommendation method according to an embodiment of the present application
- FIG. 3a is an example diagram of an analytic image provided in an embodiment of the present application.
- FIG. 3b is a schematic diagram of an Inception submodule of Inception V1 provided in an embodiment of the present application.
- FIG. 3c is an exemplary diagram of a user music review provided in an embodiment of the present application.
- FIG. 3d is a second example of a user music review provided in an embodiment of the present application.
- FIG. 3e is a schematic structural diagram of a model of FastText provided in an embodiment of the present application.
- FIG. 3f is a first schematic diagram of a music recommendation application interface provided in an embodiment of the present application.
- FIG. 3g is a diagram of a matching music recommendation example of a material provided in an embodiment of the present application.
- FIG. 3h is a second schematic diagram of a music recommendation application interface provided in an embodiment of the present application.
- FIG. 3i is an information interaction diagram provided in an embodiment of the present application.
- FIG. 4a is a first schematic structural diagram of a music recommendation device according to an embodiment of the present application.
- FIG. 4b is a second schematic structural diagram of a music recommendation device according to an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
- the embodiments of the present application provide a method, device, computing device, and medium for music recommendation.
- Terminal device: an electronic device on which various applications can be installed and which can display the objects provided by an installed application.
- The electronic device can be mobile or fixed.
- Examples include a mobile phone, a tablet, an in-vehicle device, a personal digital assistant (PDA), or another electronic device capable of implementing the above functions.
- CNN: convolutional neural network.
- Visual semantic label vector: represents the probability distribution of a frame of image over the labels, and includes the score of the frame for each label. In the embodiments of the present application, a score may be the probability that the frame of image corresponds to a given label.
- An image can be labeled with multiple labels.
- Label recognition model: a model for identifying the input image and determining the label of the image.
- Music search model: a model for performing a music search according to the input search term and obtaining music that matches the search term.
- FastText: a word vector calculation and text classification tool open-sourced by Facebook in 2016. In text classification tasks, FastText can achieve accuracy comparable to that of deep networks while being many orders of magnitude faster than deep networks in training time.
- The embodiments of the present application provide a technical solution for music recommendation that determines the visual semantic labels of the material, searches for matching music that matches the visual semantic labels, and sorts and recommends the matching music according to the user's appreciation information for the matching music. In this way, differentiated recommendations can be made for different users, and personalized services can be provided.
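The sort-and-recommend stage described above can be illustrated with a minimal sketch. It assumes that the user's appreciation information is available as one numeric score per track and that the preset filtering condition is "keep the top N"; the function name `recommend`, the track names, and the scores are all hypothetical and are not taken from the application.

```python
from typing import Dict, List


def recommend(matching_music: List[str],
              estimated_appreciation: Dict[str, float],
              top_n: int = 3) -> List[str]:
    """Sort matching music by the user's estimated appreciation, keep the top N.

    `estimated_appreciation` maps a track name to a hypothetical score for one
    user; tracks without a score are treated as 0.0 (an assumption).
    """
    ranked = sorted(matching_music,
                    key=lambda track: estimated_appreciation.get(track, 0.0),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical tracks that all matched the same visual semantic label.
matches = ["track_a", "track_b", "track_c", "track_d"]
scores_user_1 = {"track_a": 0.2, "track_b": 0.9, "track_c": 0.5, "track_d": 0.7}
scores_user_2 = {"track_a": 0.8, "track_b": 0.1, "track_c": 0.6, "track_d": 0.3}

print(recommend(matches, scores_user_1, top_n=2))  # ['track_b', 'track_d']
print(recommend(matches, scores_user_2, top_n=2))  # ['track_a', 'track_c']
```

The same set of matching music yields different recommendations for the two users, which is the differentiated behavior the paragraph describes.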
- the method for music recommendation provided in the embodiments of the present application can be applied to a terminal device.
- the terminal device may be a mobile phone, a tablet computer, or a PDA (Personal Digital Assistant).
- FIG. 1 is a schematic structural diagram of a terminal device 100.
- the terminal device 100 includes: a processor 110, a memory 120, a power source 130, a display unit 140, and an input unit 150.
- the processor 110 is a control center of the terminal device 100, and uses various interfaces and lines to connect various components.
- The processor 110 executes various functions of the terminal device 100 by running or executing software programs and/or data stored in the memory 120, thereby performing overall monitoring of the terminal device.
- the processor 110 may include one or more processing units; the processor 110 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, and an application program.
- the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 110.
- the processor and the memory may be implemented on a single chip. In other embodiments, they may also be implemented on separate chips.
- the memory 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, various application programs, and the like; the storage data area may store data created according to the use of the terminal device 100 and the like.
- the memory 120 may include a high-speed random access memory, and may further include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
- the terminal device 100 further includes a power source 130 (such as a battery) for supplying power to various components.
- the power source can be logically connected to the processor 110 through a power management system, so as to implement functions such as management of charging, discharging, and power consumption through the power management system.
- the display unit 140 may be configured to display information input by the user or information provided to the user and various menus of the terminal device 100.
- The display unit 140 is mainly used to display the display interface of each application program in the terminal device 100 and the objects shown in that interface, such as text and pictures.
- the display unit 140 may include a display panel 141.
- the display panel 141 may be configured using a liquid crystal display (Liquid Crystal Display, LCD), an organic light emitting diode (Organic Light-Emitting Diode, OLED), or the like.
- the input unit 150 may be used to receive information such as numbers or characters input by a user.
- the input unit 150 may include a touch panel 151 and other input devices 152.
- The touch panel 151, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed by the user on or near the touch panel 151 with a finger, a touch pen, or any other suitable object or accessory).
- the touch panel 151 can detect a user's touch operation, and detect signals brought by the touch operation, convert these signals into contact coordinates, send them to the processor 110, and receive commands from the processor 110 and execute them.
- the touch panel 151 may be implemented in various types such as a resistive type, a capacitive type, an infrared type, and a surface acoustic wave.
- the other input devices 152 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on / off keys, etc.), a trackball, a mouse, a joystick, and the like.
- The touch panel 151 may cover the display panel 141.
- When the touch panel 151 detects a touch operation on or near it, it transmits the touch operation to the processor 110 to determine the type of the touch event, and a corresponding visual output is then provided on the display panel 141.
- The touch panel 151 and the display panel 141 may be implemented as two independent components that provide the input and output functions of the terminal device 100; in some embodiments, the touch panel 151 and the display panel 141 may instead be integrated to implement the input and output functions of the terminal device 100.
- the terminal device 100 may further include one or more sensors, such as a pressure sensor, a gravity acceleration sensor, a proximity light sensor, and the like.
- The above-mentioned terminal device 100 may further include other components such as a camera. Since these components are not the components used in the embodiments of the present application, they are not shown in FIG. 1 and will not be described in detail.
- FIG. 1 is an example of a terminal device and does not constitute a limitation on the terminal device.
- The terminal device may include more or fewer components than shown in the figure, some components may be combined, or different components may be used.
- the method for music recommendation may also be applied to a server device. Both the server device and the terminal device can adopt the structure shown in FIG. 1. Server devices and terminal devices are collectively referred to as computing devices.
- the method for music recommendation provided in the embodiments of the present application can be applied to recommend matching music for various materials, such as an image collection or a video.
- The image collection may include one or more images. The images or videos may be taken by the user or obtained from other sources.
- FIG. 2 is an implementation flowchart of a music recommendation method provided by an embodiment of the present application. The method is performed by a server device.
- the specific implementation process of the method includes steps 200 to 205, as follows:
- Step 200: The server device acquires the material that needs a soundtrack.
- When step 200 is performed, the material may be a video or an image collection, and the image collection includes at least one frame of image.
- The server device can obtain the material in the following ways: the server device receives the material sent by the terminal device, the server device directly obtains the material input by the user, or the server device itself sets the material that needs a soundtrack.
- The user may be a user of an instant messaging service (such as WeChat).
- The user can input various types of material through his or her own terminal device, such as short material to be scored and shared with WeChat friends, and the terminal device sends the short material to the server device through the communication network.
- the user uploads the material to be scored directly on the application interface provided on the server device side.
- the server device may also actively search for materials uploaded by the user to the public platform, then perform soundtrack on these materials, and then send the soundtracked materials to the user.
- Step 201: The server device determines at least one visual semantic tag of the material.
- When step 201 is performed, the following methods may be adopted:
- the first method is: determining at least one visual semantic tag specified by the user from the alternative visual semantic tags as at least one visual semantic tag of the material.
- the user may be provided with some alternative visual semantic tags for the user to choose, where the user designates and submits at least one visual semantic tag he wants, and determines the visual semantic tag specified by the user as at least one visual semantic tag of the material.
- the second method is to parse the content of the material and determine at least one visual semantic tag of the material. For example, the content of the video or image collection is parsed, and at least one visual semantic tag of the material is determined according to the analysis result.
- A pre-trained label recognition model is used to perform visual semantic label recognition on the material to obtain the visual semantic label vector of the material, and the visual semantic labels whose scores in the vector meet the preset filtering conditions are determined as the visual semantic labels corresponding to the material.
- the image collection includes at least one frame of image
- the visual semantic label vector of the material includes: at least one visual semantic label of the content identified from the material and its corresponding score
- The label recognition model is obtained by training on multiple label recognition samples, and each label recognition sample includes a sample image and a visual semantic label vector of the sample image.
- the server device parses the material according to a preset duration to obtain each frame image.
- the server device uses the pre-trained label recognition model to perform visual semantic label recognition on each frame of the image, and obtains the visual semantic label vector of each frame of the image.
- the server device determines an average vector of the visual semantic label vectors of each frame image, and determines the visual semantic labels whose scores meet the preset filtering conditions as the visual semantic labels corresponding to the material.
- the visual semantic label vector of a frame of image includes at least one visual semantic label of the content identified from the frame of the image and its corresponding score
- The label recognition model is obtained by training on multiple label recognition samples, and each label recognition sample includes a sample image and a visual semantic label vector of the sample image.
- The preset duration may be 1 s, that is, one frame of image is parsed out every 1 s.
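As a minimal sketch of parsing frames at a preset duration, the function below computes which frame indices would be picked from a video. It assumes the frame rate and total frame count are known; the function name `frame_indices` and the example clip are hypothetical, and actual decoding of the video is left out.

```python
from typing import List


def frame_indices(total_frames: int, fps: float, interval_s: float = 1.0) -> List[int]:
    """Return the indices of the frames picked every `interval_s` seconds."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    step = max(1, round(fps * interval_s))  # number of frames between two picks
    return list(range(0, total_frames, step))


# A hypothetical 5-second clip at 30 frames per second, parsed every 1 s.
print(frame_indices(total_frames=150, fps=30.0))  # [0, 30, 60, 90, 120]
```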
- the filter condition can be a specified number of visual semantic labels with the highest score. The specified number can be one or more.
- For example, the server device determines that the visual semantic label corresponding to the material is "sky", the label with the highest score.
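The averaging and filtering steps can be sketched as follows, assuming each frame already has a visual semantic label vector and that the filtering condition is "the k labels with the highest score". The label names follow the example label set used in this document; the two frame vectors and the helper names `average_vector` and `top_labels` are hypothetical.

```python
from typing import List, Sequence

LABELS = ["sky", "mountain", "sea", "plant", "animal",
          "person", "snow", "light", "car"]


def average_vector(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-frame visual semantic label vectors."""
    if not frame_vectors:
        raise ValueError("at least one frame vector is required")
    n = len(frame_vectors)
    return [sum(column) / n for column in zip(*frame_vectors)]


def top_labels(vector: Sequence[float], labels: Sequence[str], k: int = 1) -> List[str]:
    """Return the k labels with the highest score (ties keep label order)."""
    ranked = sorted(zip(labels, vector), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical label vectors for two parsed frames of one material.
frames = [
    [0.7, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0],
    [0.5, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0],
]
material_vector = average_vector(frames)
print(top_labels(material_vector, LABELS, k=1))  # ['sky']
print(top_labels(material_vector, LABELS, k=2))  # ['sky', 'mountain']
```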
- the label recognition model is a model for identifying an input image and determining a label of the image.
- the label recognition model may be a model obtained by training a large number of sample images and corresponding visual semantic label vectors, or may be a model established according to an association relationship between image features and visual semantic labels.
- the specific acquisition method of the label recognition model is not limited herein.
- In the following, a label recognition model obtained by training on sample images and visual semantic label vectors with a convolutional neural network algorithm is taken as an example for description.
- Before executing step 201, the server device uses a convolutional neural network algorithm in advance to train on a large number of sample images in the image database and the visual semantic label vectors of the sample images, thereby obtaining a label recognition model.
- Image databases usually contain tens of millions of images.
- The visual semantic label vector represents the probability distribution of a frame of image over the labels, and includes the score of the frame for each label.
- A score may be the probability that the frame of image corresponds to a given label.
- An image can be labeled with multiple labels.
- FIG. 3a is an example diagram of an analytic image. It is assumed that the set of visual semantic labels includes: sky, mountain, sea, plant, animal, person, snow, light, and car. Then, the server device determines that the visual semantic tag vector corresponding to the parsed image shown in FIG. 3a is {0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0}.
- An Inception V1 or Inception V3 model in a convolutional neural network may be used, and a cross-entropy loss (Cross Entropy Loss) may be used as the loss function to measure the similarity between the visual semantic label vector obtained by recognition and the sample visual semantic label vector.
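For illustration, a cross-entropy loss between a recognized label vector and a sample label vector can be computed as below. The predicted vector is the example vector given for FIG. 3a; the one-hot sample vector (only "sky" set) and the clipping constant `eps` are assumptions made for this sketch, and the application does not state how its sample vectors are encoded.

```python
import math
from typing import Sequence


def cross_entropy(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """H(target, predicted) = -sum_i target_i * log(predicted_i).

    Predicted scores are clipped to `eps` so that log(0) is never evaluated.
    """
    if len(target) != len(predicted):
        raise ValueError("vectors must have the same length")
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


predicted = [0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0]  # example vector for FIG. 3a
target = [1, 0, 0, 0, 0, 0, 0, 0, 0]                  # hypothetical sample vector
loss = cross_entropy(target, predicted)
print(round(loss, 4))  # 0.3567, i.e. -ln(0.7)
```

The loss shrinks toward 0 as the score assigned to the labeled class approaches 1, which is why it can serve as a measure of how close the recognized vector is to the sample vector.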
- FIG. 3b is a schematic diagram of an Inception submodule of Inception V1.
- The previous layer (Previous layer) provides the output value of the previous layer.
- 1x1, 3x3, and 5x5 are all convolution kernels.
- The Inception submodule performs convolution with each convolution kernel and pooling (3x3 max pooling) on the output value of the previous layer, combines the results by filter concatenation (Filter Concatenation), and outputs them to the next layer.
- a convolutional neural network algorithm can be used in advance to train a large number of sample images in the image database and the visual semantic label vectors of the sample images, thereby obtaining a label recognition model.
- The pre-trained label recognition model is used to perform visual semantic label recognition on each frame of image to obtain the visual semantic label vector of each frame, and the visual semantic tags corresponding to the material are determined according to the probability distribution of the material over the visual semantic tags.
- Different materials are thus labeled with different visual semantic tags, so that the reason for a music recommendation can be explained to the user through the visual semantic tags.
- a label recognition model is directly used to determine the visual semantic label vector of the image, and the visual semantic label of the image is determined according to the visual semantic label vector.
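The step from label vectors to tags can be sketched as follows. The label set and the image vector come from the FIG. 3a example; the per-frame averaging follows the claims for video material. The threshold of 0.1 stands in for the "preset filtering condition", and the two frame vectors are made-up values, so both are assumptions.

```python
LABELS = ["sky", "mountain", "sea", "plant", "animal", "person", "snow", "light", "car"]

def average_vectors(frame_vectors):
    """Element-wise mean of per-frame label vectors (one vector per frame)."""
    if not frame_vectors:
        raise ValueError("at least one frame vector is required")
    n = len(frame_vectors)
    return [sum(column) / n for column in zip(*frame_vectors)]

def select_tags(vector, labels=LABELS, threshold=0.1):
    """Return labels whose score reaches the (assumed) threshold, best first."""
    scored = [(score, label) for score, label in zip(vector, labels) if score >= threshold]
    return [label for score, label in sorted(scored, key=lambda x: -x[0])]

# Single image: the vector from the FIG. 3a example.
image_vector = [0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0]
image_tags = select_tags(image_vector)

# Video: hypothetical vectors for two frames, averaged before filtering.
frame_vectors = [
    [0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0],
]
video_tags = select_tags(average_vectors(frame_vectors))
```

With these inputs the image is tagged sky and sea, and the video is tagged snow and sky.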
- Step 202: The server device searches the candidate music library for each matching music that matches the at least one visual semantic tag.
- the server device uses the pre-trained music search model based on the at least one visual semantic tag to search for each matching music that matches the at least one visual semantic tag from the candidate music library.
- For example, the visual semantic label is "missing my old mother".
- According to the music search model, the server device finds in the candidate music library that the matching music for "missing my old mother" is "Mother" by Yan Weiwen.
- the music search model is a model for performing music search according to the input search word, and obtaining music matching the search word.
- the music search model can be obtained through text classification algorithms or the relationship between text and music.
- the specific method of obtaining the music search model is not limited here.
- The following description takes as an example a music search model obtained by training on text and music with a preset text classification algorithm.
- the server device may obtain a music search model after performing text training by using a preset text classification algorithm based on the music review information of each user on each music in advance.
- Text classification algorithms are used for text classification. This is because the massive music review information of each user on each song can reflect the theme and mood of each song, and different songs may have completely different review styles.
- FIG. 3c is an example of a user's music review.
- FIG. 3d is an example of a user's music review.
- The three songs in FIG. 3d are Huslan's "Hong Yan", Yan Weiwen's "Mother", and the military song "Green Flowers in the Army".
- The comments on "Hong Yan" mostly concern homesickness, hometown, Inner Mongolia, and Saibei; those on "Mother" mostly concern children's gratitude and parental love; and those on "Green Flowers in the Army" mostly recall army and military life.
- the text classification algorithm may adopt FastText.
- FIG. 3e is a schematic diagram of a model structure of FastText.
- the input layer (x1, x2, ..., xN) is used to input the user's music review information;
- the hidden layer is used to generate a hidden layer vector based on the input music review information;
- the output layer is used to classify the hidden layer vector, that is, to classify it by music.
- the optimization objective function of FastText is:
- $x_n$ is the music review information of a user;
- $y_n$ is the music;
- the matrix parameter A is a word-based quick lookup table, that is, the table of word embedding vectors;
- mathematically, the matrix operation $A x_n$ sums or averages the embedding vectors of the words to obtain the hidden layer vector;
- the matrix parameter B is the parameter of the function f;
- the function f is a multi-class linear function.
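The objective formula itself does not survive in this text. A form consistent with the symbols defined above is the usual FastText negative log-likelihood, $-\frac{1}{N}\sum_{n=1}^{N} y_n \log f(B A x_n)$; treating f as a linear layer followed by softmax is an assumption here. The sketch below evaluates that objective for toy, hand-set parameters (the matrices `A` and `B`, the word ids, and the song ids are all hypothetical) and does not perform training.

```python
import math

def hidden_vector(word_ids, embedding_table):
    """A x_n: average the embedding vectors of the words in one review."""
    dim = len(embedding_table[0])
    total = [0.0] * dim
    for w in word_ids:
        for d in range(dim):
            total[d] += embedding_table[w][d]
    return [v / len(word_ids) for v in total]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def predict(word_ids, embedding_table, output_weights):
    """f(B A x_n): linear layer B on the hidden vector, then softmax over songs."""
    h = hidden_vector(word_ids, embedding_table)
    scores = [sum(w * x for w, x in zip(row, h)) for row in output_weights]
    return softmax(scores)

def negative_log_likelihood(reviews, song_ids, embedding_table, output_weights):
    """-(1/N) * sum_n log f(B A x_n)[y_n] over N labeled reviews."""
    total = 0.0
    for word_ids, y in zip(reviews, song_ids):
        total += math.log(predict(word_ids, embedding_table, output_weights)[y])
    return -total / len(reviews)

# Toy parameters: 4 words with 2-dimensional embeddings, 2 songs.
A = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
B = [[2.0, -2.0], [-2.0, 2.0]]
probs = predict([0, 1], A, B)
loss = negative_log_likelihood([[0, 1], [2, 3]], [0, 1], A, B)
```

At search time, the same forward pass can be applied to a visual semantic tag treated as a short text, and the songs with the highest probabilities taken as matching music.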
- In this way, a preset text classification algorithm is used to perform text training to obtain a music search model, and the pre-trained music search model is used to search the candidate music library for each matching music that matches the visual semantic tags.
- Step 203: The server device determines the user appreciation information of the user corresponding to the material for each matching music.
- When step 203 is performed, any of the following methods may be adopted:
- The first method: for the music appreciation behavior data of the user who provides the material on each matching music, use one parameter value of that data, or a weighted average of several parameter values, as the user appreciation information.
- The second method: the server device predicts the user's estimated music appreciation information for each matching music based on the actual music appreciation information of the user's similar users for each matching music, and uses the estimated music appreciation information as the user appreciation information.
- The third method: the server device obtains a predetermined estimated evaluation matrix, reads from it the user's estimated music appreciation information for each matching music, and uses the estimated music appreciation information as the user appreciation information.
- priorities can be set for various methods.
- the priority order of the methods is not limited.
- the server device obtains user attribute information of each user who appreciates each matching music, and filters out similar users whose user attribute information is similar to the user attribute information of the user who inputs the material.
- the server device separately acquires actual music appreciation information of each similar user for each matching music.
- the server device averages the actual music appreciation information of each matching music by each similar user, and estimates the user's estimated music appreciation information of each matching music.
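The similar-user estimate can be sketched as below. How similarity between user attributes is measured is not stated in the text; counting shared attribute values against a `min_shared` cutoff is an assumption, as are all user names, attribute values, and scores.

```python
def similar_users(target_attrs, all_attrs, min_shared=3):
    """Users sharing at least `min_shared` attribute values with the target (assumed rule)."""
    result = []
    for user_id, attrs in all_attrs.items():
        shared = sum(1 for key, value in target_attrs.items() if attrs.get(key) == value)
        if shared >= min_shared:
            result.append(user_id)
    return result

def estimate_appreciation(target_attrs, all_attrs, actual_scores, songs):
    """Average similar users' actual scores per song; songs nobody rated are skipped."""
    peers = similar_users(target_attrs, all_attrs)
    estimates = {}
    for song in songs:
        values = [actual_scores[u][song] for u in peers if song in actual_scores.get(u, {})]
        if values:
            estimates[song] = sum(values) / len(values)
    return estimates

# Hypothetical users described by gender, age, education, and job.
users = {
    "u1": {"gender": "F", "age": "20-29", "education": "BSc", "job": "engineer"},
    "u2": {"gender": "F", "age": "20-29", "education": "BSc", "job": "teacher"},
    "u3": {"gender": "M", "age": "50-59", "education": "PhD", "job": "doctor"},
}
scores = {
    "u1": {"song_a": 4.0, "song_b": 2.0},
    "u2": {"song_a": 5.0},
    "u3": {"song_a": 1.0, "song_b": 5.0},
}
target = {"gender": "F", "age": "20-29", "education": "BSc", "job": "designer"}
estimates = estimate_appreciation(target, users, scores, ["song_a", "song_b"])
```

Here `u1` and `u2` count as similar users, so `song_a` is estimated at 4.5 and `song_b` at 2.0.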
- The server device sorts each matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music; this estimated music appreciation information is obtained based on the actual music appreciation information of different users for each candidate music.
- Alternatively, the server device sorts each matching music according to the parameter value of one type of music appreciation behavior data of the user corresponding to the material for the music, or according to a comprehensive value obtained by weighting the parameter values of at least two types of music appreciation behavior data for the music.
- the user attribute information is used to describe the characteristics of the user.
- the user attribute information may include: gender, age, education, and job.
- a user's actual music appreciation information for a piece of music is obtained by weighting each parameter value contained in the user's music appreciation behavior data; the music appreciation behavior data contains any one or any combination of the following parameters: music ratings, click-through rates, favorite behavior, like behavior, and sharing behavior.
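A possible form of this weighting is sketched below. The weight values, the assumption that every parameter is already scaled to the range 0 to 1, and the renormalization over the parameters that are actually present are all illustrative choices, not taken from the text.

```python
# Hypothetical weights for the five behavior parameters.
WEIGHTS = {"rating": 0.4, "click_rate": 0.2, "favorite": 0.15, "like": 0.15, "share": 0.1}

def actual_appreciation(behavior, weights=WEIGHTS):
    """Weighted sum over the parameters present, renormalized by their total weight."""
    used = {key: w for key, w in weights.items() if key in behavior}
    if not used:
        raise ValueError("behavior data has no known parameter")
    total_weight = sum(used.values())
    return sum(behavior[key] * w for key, w in used.items()) / total_weight

full = actual_appreciation(
    {"rating": 0.8, "click_rate": 0.5, "favorite": 1, "like": 1, "share": 0}
)
rating_only = actual_appreciation({"rating": 0.8})
```

Renormalizing keeps scores comparable when only some parameters are available, which matches the "any one or any combination" wording.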
- the estimated music appreciation information of the matching music by the users can be predicted, so that the matching music can be recommended for the users according to the actual music appreciation information of the similar users.
- the server device determines an estimated evaluation matrix in advance based on actual user appreciation information of each candidate music in the candidate music library by each user.
- the server device composes a scoring matrix based on each user's actual music appreciation information for each candidate music.
- the element mij in the scoring matrix represents a value corresponding to the appreciation of the music j by the user i.
- the server device performs matrix decomposition on the scoring matrix by using a preset matrix decomposition algorithm to obtain a user matrix and a music feature matrix.
- the product of the transposition of each music feature vector in the music feature matrix and each user vector in the user matrix is determined as the estimated music appreciation information of each user for each music.
- the matrix decomposition algorithm may use the FunkSVD algorithm, and the specific principle is as follows:
- M is a scoring matrix
- P is a user matrix
- Q is a music feature matrix
- m is the total number of users
- n is the total number of music
- k is a parameter.
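- The estimated music score of user i for music j can be expressed by $q_j^T p_i$, where p is a user vector and q is a music feature vector.
- The mean square error is used as a loss function to determine the final P and Q.
- Here, $p_i$ is the vector of user i, $q_j$ is the feature vector of music j, $\lambda$ is a regularization coefficient, i is a user number, and j is a music number.
- The vectors are updated iteratively as follows, where $\alpha$ is the learning rate:
- $p_i = p_i + \alpha\left((m_{ij} - q_j^T p_i)\, q_j - \lambda p_i\right)$;
- $q_j = q_j + \alpha\left((m_{ij} - q_j^T p_i)\, p_i - \lambda q_j\right)$;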
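The decomposition and loss formulas are missing from this text. The following is a reconstruction under the usual FunkSVD formulation, offered as an assumption rather than as the original formulas: it uses the symbols defined above, writes the loss for a single observed entry, and reproduces the two update rules when a gradient step of size $\alpha/2$ is taken.

```latex
% Assumed decomposition: P holds one k-dimensional column per user,
% Q holds one k-dimensional column per music.
M_{m \times n} \approx P_{k \times m}^{T} \, Q_{k \times n},
\qquad \hat{m}_{ij} = q_j^{T} p_i

% Assumed regularized squared error for one observed entry m_{ij}:
L_{ij} = \left(m_{ij} - q_j^{T} p_i\right)^2
       + \lambda \left(\lVert p_i \rVert_2^2 + \lVert q_j \rVert_2^2\right)

% Gradients:
\frac{\partial L_{ij}}{\partial p_i} = -2\left(m_{ij} - q_j^{T} p_i\right) q_j + 2\lambda p_i,
\qquad
\frac{\partial L_{ij}}{\partial q_j} = -2\left(m_{ij} - q_j^{T} p_i\right) p_i + 2\lambda q_j

% A descent step of size \alpha / 2 gives the update rules:
p_i \leftarrow p_i + \alpha\left(\left(m_{ij} - q_j^{T} p_i\right) q_j - \lambda p_i\right),
\qquad
q_j \leftarrow q_j + \alpha\left(\left(m_{ij} - q_j^{T} p_i\right) p_i - \lambda q_j\right)
```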
- In this way, the user matrix and the music feature matrix are obtained through matrix decomposition; the estimated evaluation matrix of each user for each music is then obtained from the user matrix and the music feature matrix, and is used as the users' estimated music appreciation information for each candidate music.
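The matrix decomposition can be sketched with stochastic gradient descent over the observed entries of the scoring matrix, applying the two update rules for $p_i$ and $q_j$. The hyperparameters (`k`, `alpha`, `lam`, `epochs`), the random initialization, and the small scoring matrix are assumptions chosen for the example.

```python
import random

def funk_svd(ratings, num_users, num_items, k=2, alpha=0.05, lam=0.02, epochs=500, seed=0):
    """ratings: dict {(i, j): m_ij} of observed entries. Returns (P, Q) as lists of k-vectors."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(num_users)]
    Q = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(num_items)]
    for _ in range(epochs):
        for (i, j), m_ij in ratings.items():
            p_old = P[i][:]  # keep the old user vector for the q_j update
            err = m_ij - sum(q * p for q, p in zip(Q[j], p_old))
            for d in range(k):
                P[i][d] += alpha * (err * Q[j][d] - lam * p_old[d])
                Q[j][d] += alpha * (err * p_old[d] - lam * Q[j][d])
    return P, Q

def predict(P, Q, i, j):
    """Estimated appreciation of user i for music j: q_j^T p_i."""
    return sum(q * p for q, p in zip(Q[j], P[i]))

# Hypothetical scoring matrix: 3 users, 3 songs, two entries unobserved.
ratings = {
    (0, 0): 5.0, (0, 1): 4.0,
    (1, 0): 5.0, (1, 1): 4.0, (1, 2): 1.0,
    (2, 1): 1.0, (2, 2): 5.0,
}
P, Q = funk_svd(ratings, num_users=3, num_items=3)
estimate_for_missing_entry = predict(P, Q, 0, 2)
```

Calling `predict` for every user and song fills in the estimated evaluation matrix, including entries such as (0, 2) that were never observed.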
- Step 204: The server device sorts each matching music according to the user appreciation information of the user corresponding to the material for each matching music.
- Step 205: Based on the sorting result, the server device filters the matching music according to a preset music filtering condition, and recommends the filtered matching music as candidate music of the material.
- Following the sorting, the server device selects the matching music that meets the preset music filtering condition, and either displays the selected candidate music directly to the user in sorted order or sends the information of the candidate music to the terminal device.
- The music filtering condition may be to select the matching music whose value in the user appreciation information is higher than a set value; or, following the sorting result from high to low, to select the matching music ranked above a set position; or to select a set number of matching music counting backward in the ranking.
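Steps 204 and 205 can be sketched as a sort followed by one of two filters. The song names, the scores, the set value 4.0, and the cutoff of two songs are hypothetical.

```python
def filter_by_threshold(ranked, min_value):
    """Keep songs whose appreciation value is higher than `min_value`."""
    return [(song, value) for song, value in ranked if value > min_value]

def filter_top_n(ranked, n):
    """Keep the first `n` songs of a ranking sorted from high to low."""
    return ranked[:n]

# Hypothetical user appreciation values for four matching songs.
estimates = {"song_a": 4.5, "song_b": 2.0, "song_c": 3.8, "song_d": 4.9}
ranked = sorted(estimates.items(), key=lambda item: item[1], reverse=True)
above_four = filter_by_threshold(ranked, 4.0)
top_two = filter_top_n(ranked, 2)
```

With these values both filters keep `song_d` and `song_a`, in that order.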
- The user can then select a favorite piece among the candidate music to use as the soundtrack of the material.
- FIG. 3f is a first schematic diagram of a music recommendation application interface.
- The terminal device asks the user whether to add a soundtrack to the short video.
- FIG. 3g is an example of matching music recommendation for a material.
- When the terminal device determines that the user wants to add a soundtrack to the short video, it sends the short video to the server device; the server device parses the short video and determines that its visual semantic tags are snow and motion. Then, the server device searches the massive music library (the candidate music library) for 5 songs matching snow and 5 songs matching motion. Next, the server device sorts these 10 songs according to the user's estimated music appreciation information for them.
- FIG. 3h is a second schematic diagram of a music recommendation application interface. In FIG. 3h, the top 5 songs in the ranking are recommended to the user.
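The snow and motion example can be traced end to end with a small sketch. The `search` callable stands in for the music search model and the `scores` dictionary for the user's estimated music appreciation information; the song names and score values are invented for illustration.

```python
def recommend(tags, search, estimated_scores, per_tag=5, top_n=5):
    """Search songs per tag, merge them, sort by estimated appreciation, keep top_n."""
    matches = []
    for tag in tags:
        for song in search(tag)[:per_tag]:
            if song not in matches:  # a song may match several tags
                matches.append(song)
    ranked = sorted(matches, key=lambda s: estimated_scores.get(s, 0.0), reverse=True)
    return ranked[:top_n]

# Hypothetical search results: 5 songs per visual semantic tag.
library = {
    "snow": ["snow_1", "snow_2", "snow_3", "snow_4", "snow_5"],
    "motion": ["motion_1", "motion_2", "motion_3", "motion_4", "motion_5"],
}
# Hypothetical estimated appreciation of one user for the 10 songs.
scores = {
    "snow_1": 0.9, "snow_2": 0.4, "snow_3": 0.7, "snow_4": 0.2, "snow_5": 0.1,
    "motion_1": 0.8, "motion_2": 0.95, "motion_3": 0.3, "motion_4": 0.6, "motion_5": 0.5,
}
top_five = recommend(["snow", "motion"], lambda tag: library.get(tag, []), scores)
```

Because the ranking depends on the per-user scores, a different user with the same video would generally receive a different top 5.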
- The terminal device receives the information of the candidate music returned by the server device and displays it to the user. When the terminal device receives the user's instruction information specifying the soundtrack music among the candidate music, it obtains and outputs the material synthesized with the soundtrack music in one of the following ways.
- The first way: send the instruction information to the server device, and receive from the server device the material synthesized with the soundtrack music.
- The second way: send the instruction information to the server device, receive the soundtrack music returned by the server device according to the instruction information, and synthesize the soundtrack music into the material.
- the server device receives the instruction information sent from the terminal device to specify the soundtrack music from the candidate music, synthesizes the soundtrack music into the material according to the instruction information, and sends the synthesized material to the terminal device.
- Several visual semantic tags of the material are determined; the matching music that matches these tags is searched based on a music search model obtained from each user's music review information for each music; the matching music is sorted based on the user appreciation information of the user; and music is recommended to the user based on the sorted result.
- In this way, personalized service can be provided according to different users' preferences for different music, that is, different recommendations are made to different users, so that the recommended music not only matches the material but is also music the user likes.
- An embodiment of the present application further provides a method for music recommendation, which is executed by a terminal device and includes:
- The terminal device sends the material to be soundtracked to the server device and triggers the server device to perform the following steps: determining at least one visual semantic tag of the material; searching the candidate music library for each matching music that matches the at least one visual semantic tag; sorting the matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music; and, based on the sorting result, filtering the matching music according to the preset music filtering condition and recommending the filtered matching music as alternative music for the material.
- the terminal device receives the alternative music returned by the server device.
- the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
- FIG. 3i is an interactive sequence diagram of a music soundtrack.
- the specific implementation process of this method is as follows:
- Step 301: The terminal device sends to the server device instruction information for adding a soundtrack to the material.
- Step 302: The terminal device receives the candidate music that the server device recommends based on the material.
- Step 303: The terminal device sends to the server device instruction information for using the specified music among the candidate music as the soundtrack.
- Step 304: The terminal device receives the material synthesized with the music, returned by the server device.
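The four steps can be mimicked with an in-memory stand-in for the server device. The class `FakeServer`, its method names, and the dictionary used to represent synthesized material are hypothetical; no real network protocol or media processing is implied.

```python
class FakeServer:
    """In-memory stand-in for the server device; no networking is involved."""

    def __init__(self, recommender):
        self._recommender = recommender
        self._materials = {}

    def request_soundtrack(self, user_id, material):
        # Steps 301-302: receive the material, return the recommended music.
        self._materials[user_id] = material
        return self._recommender(material)

    def choose_music(self, user_id, song):
        # Steps 303-304: synthesize the chosen music into the stored material.
        material = self._materials[user_id]
        return {"material": material, "soundtrack": song}

# Terminal side: the recommender is a fixed hypothetical list here.
server = FakeServer(lambda material: ["song_a", "song_b"])
candidates = server.request_soundtrack("user_1", "clip.mp4")
result = server.choose_music("user_1", candidates[0])
```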
- An embodiment of the present application further provides a device for music recommendation. Since the principle by which the device solves the problem is similar to that of the method for music recommendation described above, the implementation of the device can refer to the implementation of the method, and repeated details are not described again here.
- FIG. 4a is a first structural schematic diagram of a music recommendation device according to an embodiment of the present application, which includes:
- an obtaining unit 400 configured to obtain the material to be soundtracked;
- a first determining unit 401 configured to determine at least one visual semantic tag of the material, where each visual semantic tag is used to describe at least one item of content of the material;
- a search unit 402 configured to search the candidate music library for each matching music that matches the at least one visual semantic tag;
- a sorting unit 403 configured to sort the matching music according to the user appreciation information of the user corresponding to the material for each matching music;
- a recommendation unit 404 configured to filter the matching music according to a preset music filtering condition based on the sorting result, and to recommend the filtered matching music as candidate music of the material.
- the recommendation unit 404 is further configured to:
- the first determining unit 401 further includes:
- a second determining unit configured to determine at least one visual semantic tag specified by the user from the alternative visual semantic tags as at least one visual semantic tag of the material;
- a parsing unit configured to parse the content of the material and determine at least one visual semantic tag of the material.
- the parsing unit is specifically configured to:
- when the material is an image collection, use a pre-trained label recognition model to perform visual semantic label recognition on the material to obtain the visual semantic label vector of the material, and determine the visual semantic labels whose scores in the vector meet the preset filtering condition as the visual semantic labels corresponding to the material;
- the image collection includes at least one frame of image;
- the visual semantic label vector of the material includes at least one visual semantic label of the content identified from the material and its corresponding score;
- the label recognition model is obtained by training on multiple label recognition samples, and each label recognition sample includes a sample image and the visual semantic label vector of that sample image.
- the parsing unit is specifically configured to:
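- when the material is a video, perform frame parsing on the material to obtain each frame of image;
- use a pre-trained label recognition model to perform visual semantic label recognition on each frame of image to obtain the visual semantic label vector of each frame, and determine the visual semantic labels whose scores in the average of these vectors meet the preset filtering condition as the visual semantic labels corresponding to the material;
- the visual semantic label vector of a frame of image includes at least one visual semantic label of the content identified from that frame and its corresponding score;
- the label recognition model is obtained by training on multiple label recognition samples, and each label recognition sample includes a sample image and the visual semantic label vector of that sample image.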
- the search unit 402 is specifically configured to:
- the music search model is obtained by text classification training on the music review information of each user for each music.
- the sorting unit 403 is specifically configured to:
- sort each matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music, where the estimated music appreciation information is obtained based on the actual music appreciation information of different users for each candidate music;
- a user's actual music appreciation information for a piece of music is obtained by weighting the parameter values included in the user's music appreciation behavior data;
- the music appreciation behavior data includes any one or any combination of the following parameters: music ratings, click-through rates, favorite behavior, like behavior, and sharing behavior.
- the sorting unit 403 is specifically configured to:
- for each matching music, obtain the user attribute information of each user who appreciates the matching music, and filter out the similar users whose user attribute information is similar to that of the user who inputs the material;
- average the actual music appreciation information of the similar users for each matching music to estimate the user's estimated music appreciation information for each matching music.
- the sorting unit 403 is specifically configured to:
- the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix is determined as the estimated music appreciation information of each user for each music.
- the sorting unit 403 is specifically configured to:
- the music appreciation behavior data of a user for a piece of music includes any one or any combination of the following parameters: music score, click rate, favorite behavior, like behavior, and sharing behavior.
- FIG. 4b is a second structural schematic diagram of a music recommendation device according to an embodiment of the present application, which includes:
- a sending unit 410 configured to send the material to be soundtracked to the server device and to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching the candidate music library for each matching music that matches the at least one visual semantic tag; sorting the matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music; and, based on the sorting result, filtering the matching music according to the preset music filtering condition and recommending the filtered matching music as alternative music for the material;
- a receiving unit 411 configured to receive the alternative music returned by the server device;
- the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
- An embodiment of the present application further provides a computing device including at least one processing unit and at least one storage unit, where the storage unit stores a computer program, and when the program is executed by the processing unit, the processing unit performs the steps of the method described in the above embodiments.
- the computing device may be a server device or a terminal device, and both the server device and the terminal device may adopt the structure shown in FIG. 5.
- the terminal device is taken as an example to describe the structure of the computing device.
- An embodiment of the present application provides a terminal device 500. Referring to FIG. 5, the terminal device 500 is configured to implement the methods described in the foregoing method embodiments.
- The terminal device 500 may include a memory 501, a processor 502, an input unit 503, and a display panel 504.
- the memory 501 is configured to store a computer program executed by the processor 502.
- the memory 501 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the terminal device 500, and the like.
- the processor 502 may be a central processing unit (CPU), or a digital processing unit.
- the input unit 503 may be configured to obtain a user instruction input by a user.
- The display panel 504 is used to display information input by the user or provided to the user. In the embodiment of the present application, the display panel 504 is mainly used to display the display interface of each application in the terminal device and the control entities displayed in each display interface. The display panel 504 may be configured as a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display.
- The embodiments of the present application do not limit the specific connection medium between the memory 501, the processor 502, the input unit 503, and the display panel 504.
- In FIG. 5, the memory 501, the processor 502, the input unit 503, and the display panel 504 are connected by a bus 505.
- The bus 505 is indicated by a thick line in FIG. 5; this is for illustrative purposes only and is not limiting.
- the bus 505 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 5, but it does not mean that there is only one bus or one type of bus.
- The memory 501 may be a volatile memory, such as a random-access memory (RAM); the memory 501 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 501 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the memory 501 may be a combination of the above-mentioned memories.
- The processor 502 is configured to call the computer program stored in the memory 501 to execute the embodiment shown in FIG. 2.
- An embodiment of the present application further provides a computer-readable storage medium that stores the computer-executable instructions to be executed by the above processor, including the program to be executed by the above processor.
- the storage medium stores a computer program executable by a computing device, and when the program runs on the computing device, causes the computing device to execute the steps of the method described in the foregoing embodiment.
- Aspects of the method for music recommendation provided in the present application may also be implemented in the form of a program product that includes program code.
- When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the method for music recommendation according to the various exemplary embodiments of the present application described above in this specification.
- the terminal device may execute the embodiment shown in FIG. 2.
- the program product may employ any combination of one or more readable media.
- the readable medium may be a readable signal medium or a readable storage medium.
- The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- The program product for music recommendation may adopt a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device.
- the program product of the present application is not limited thereto.
- the readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
- the readable signal medium may include a data signal that is borne in baseband or propagated as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take a variety of forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
- Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- the program code used to perform the operations of this application can be written in any combination of one or more programming languages.
- The programming languages include object-oriented programming languages, such as Java and C++, and also conventional procedural programming languages, such as "C" or a similar programming language.
- The program code may be executed entirely on the user computing device, partly on the user computing device, as an independent software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server.
- Where a remote computing device is involved, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
- this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Claims (18)
- 一种音乐推荐的方法,其特征在于,由服务器设备执行,包括:获取待配乐的素材;确定所述素材的至少一个视觉语义标签,每个视觉语义标签用于描述素材的至少一项内容;从候选音乐库中,搜索出与所述至少一个视觉语义标签匹配的各个匹配音乐;根据所述素材对应的用户针对各个匹配音乐的用户鉴赏信息,对各个匹配音乐进行排序;基于排序结果,按照预设的音乐筛选条件对匹配音乐进行筛选,并将筛选出的匹配音乐推荐为所述素材的备选音乐。
- 如权利要求1所述的方法,其特征在于,所述方法还包括:接收终端设备发送的从所述备选音乐中指定配乐音乐的指示信息;根据所述指示信息,将所述配乐音乐合成到所述素材;将合成有音乐的素材发送给终端设备。
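- The method according to claim 1, wherein determining the at least one visual semantic tag of the material comprises: determining at least one visual semantic tag specified by the user from among candidate visual semantic tags as the at least one visual semantic tag of the material; or parsing the content of the material to determine the at least one visual semantic tag of the material.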
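- The method according to claim 3, wherein parsing the content of the material to determine the at least one visual semantic tag of the material comprises: when the material is an image set, performing visual semantic tag recognition on the material using a pre-trained tag recognition model to obtain a visual semantic tag vector of the material, and determining the visual semantic tags in the visual semantic tag vector whose scores meet a preset filtering condition as the visual semantic tags corresponding to the material; wherein the image set contains at least one frame of image; the visual semantic tag vector of the material comprises at least one visual semantic tag of content recognized from the material and its corresponding score; and the tag recognition model is obtained by training on a plurality of tag recognition samples, each tag recognition sample comprising a sample image and the visual semantic tag vector of that sample image.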
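- The method according to claim 3, wherein parsing the content of the material to determine the at least one visual semantic tag of the material comprises: when the material is a video, performing frame parsing on the material to obtain individual frames; performing visual semantic tag recognition on each frame using a pre-trained tag recognition model to obtain a visual semantic tag vector for each frame; and determining, in the average vector of the frames' visual semantic tag vectors, the visual semantic tags whose scores meet a preset filtering condition as the visual semantic tags corresponding to the material; wherein the visual semantic tag vector of a frame comprises at least one visual semantic tag of content recognized from that frame and its corresponding score; and the tag recognition model is obtained by training on a plurality of tag recognition samples, each tag recognition sample comprising a sample image and the visual semantic tag vector of that sample image.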
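- The method according to claim 1, wherein searching for the matching music that matches the at least one visual semantic tag comprises: obtaining, based on the at least one visual semantic tag and using a pre-trained music search model, the matching music that matches the at least one visual semantic tag; wherein the music search model is obtained by text classification training on users' music review information for pieces of music.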
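- The method according to any one of claims 1 to 6, wherein ranking the matching music according to the user appreciation information, for each piece of matching music, of the user corresponding to the material comprises: ranking the matching music according to estimated music appreciation information of the user corresponding to the material for each piece of matching music, the estimated music appreciation information being obtained based on actual music appreciation information of different users for the candidate music; wherein the actual music appreciation information of one user for one piece of music is obtained by weighting the parameter values contained in that user's music appreciation behavior data; and the music appreciation behavior data contains any one or any combination of the following parameters: music rating, click-through rate, favoriting behavior, liking behavior, and sharing behavior.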
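- The method according to claim 7, further comprising, before ranking the matching music according to the estimated music appreciation information of the user corresponding to the material for each piece of matching music: for the matching music, obtaining user attribute information of the users who have appreciated the matching music, and selecting similar users whose user attribute information is similar to that of the user; obtaining the actual music appreciation information of the similar users for the matching music; and, for each piece of matching music, averaging the similar users' actual music appreciation information for that piece to estimate the user's estimated music appreciation information for the matching music.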
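- The method according to claim 7, further comprising, before ranking the matching music according to the estimated music appreciation information of the user corresponding to the material for each piece of matching music: obtaining a rating matrix based on the actual music appreciation information of the users for the candidate music; performing matrix factorization and optimization on the rating matrix to obtain a user matrix and a music feature matrix; and determining the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix as the estimated music appreciation information of each user for each piece of music.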
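- The method according to any one of claims 1 to 6, wherein ranking the matching music according to the user appreciation information, for each piece of matching music, of the user corresponding to the material comprises: ranking the matching music according to the parameter value of one type of music appreciation behavior data of the user corresponding to the material for music, or according to a composite value obtained by weighting the parameter values of at least two types of music appreciation behavior data for music; wherein the music appreciation behavior data of one user for one piece of music contains any one or any combination of the following parameters: music rating, click-through rate, favoriting behavior, liking behavior, and sharing behavior.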
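- A music recommendation method, performed by a terminal device, comprising: sending material to which background music is to be added to a server device, triggering the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for matching music that matches the at least one visual semantic tag; ranking the matching music according to estimated music appreciation information of the user corresponding to the material for each piece of matching music; and, based on the ranking result, filtering the matching music according to a preset music filtering condition and recommending the filtered matching music as candidate music for the material; and receiving the candidate music returned by the server device; wherein the estimated music appreciation information of the user for each piece of matching music is obtained based on actual music appreciation information of different users for the candidate music.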
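- A music recommendation apparatus, comprising: an acquisition unit, configured to obtain material to which background music is to be added; a first determining unit, configured to determine at least one visual semantic tag of the material, each visual semantic tag describing at least one item of content of the material; a search unit, configured to search a candidate music library for matching music that matches the at least one visual semantic tag; a ranking unit, configured to rank the matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material; and a recommendation unit, configured to filter the matching music according to a preset music filtering condition based on the ranking result, and to recommend the filtered matching music as candidate music for the material.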
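- The apparatus according to claim 12, wherein the first determining unit further comprises: a second determining unit, configured to determine at least one visual semantic tag specified by the user from among candidate visual semantic tags as the at least one visual semantic tag of the material; or a parsing unit, configured to parse the content of the material to determine the at least one visual semantic tag of the material.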
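- The apparatus according to claim 12 or 13, wherein the ranking unit is specifically configured to: rank the matching music according to estimated music appreciation information of the user corresponding to the material for each piece of matching music, the estimated music appreciation information being obtained based on actual music appreciation information of different users for the candidate music; wherein the actual music appreciation information of one user for one piece of music is obtained by weighting the parameter values contained in that user's music appreciation behavior data; and the music appreciation behavior data contains any one or any combination of the following parameters: music rating, click-through rate, favoriting behavior, liking behavior, and sharing behavior.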
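- The apparatus according to claim 14, wherein the ranking unit is specifically configured to: for the matching music, obtain user attribute information of the users who have appreciated the matching music, and select similar users whose user attribute information is similar to that of the user; obtain the actual music appreciation information of the similar users for the matching music; for each piece of matching music, average the similar users' actual music appreciation information for that piece to estimate the user's estimated music appreciation information for the matching music; obtain a rating matrix based on the actual music appreciation information of the users for the candidate music; perform matrix factorization and optimization on the rating matrix to obtain a user matrix and a music feature matrix; determine the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix as the estimated music appreciation information of each user for each piece of music; or rank the matching music according to the parameter value of one type of music appreciation behavior data of the user corresponding to the material for music, or according to a composite value obtained by weighting the parameter values of at least two types of music appreciation behavior data for music; wherein the music appreciation behavior data of one user for one piece of music contains any one or any combination of the following parameters: music rating, click-through rate, favoriting behavior, liking behavior, and sharing behavior.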
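- A music recommendation apparatus, comprising: a sending unit, configured to send material to which background music is to be added to a server device, triggering the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for matching music that matches the at least one visual semantic tag; ranking the matching music according to estimated music appreciation information of the user corresponding to the material for each piece of matching music; and, based on the ranking result, filtering the matching music according to a preset music filtering condition and recommending the filtered matching music as candidate music for the material; and a receiving unit, configured to receive the candidate music returned by the server device; wherein the estimated music appreciation information of the user for each piece of matching music is obtained based on actual music appreciation information of different users for the candidate music.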
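- A computing device, comprising at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of the method according to any one of claims 1 to 10 or claim 11.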
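- A computer-readable medium, storing a computer program executable by a computing device, wherein the program, when run on the computing device, causes the computing device to perform the steps of the method according to any one of claims 1 to 10 or claim 11.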
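
The server-side steps recited in the claims (averaging per-frame tag vectors, weighting appreciation behavior data, averaging over similar users, then ranking and filtering) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. All function names, the 0.5 score threshold, the behavior weights, and the top-N filter are assumptions chosen for illustration. The tag recognition model, the music search model, and the selection of similar users are assumed to have already produced their outputs.

```python
from statistics import mean

def tags_from_frames(frame_vectors, threshold=0.5):
    """Average per-frame tag scores; keep tags whose mean score meets the
    (assumed) threshold, standing in for the preset filtering condition."""
    all_tags = {tag for vec in frame_vectors for tag in vec}
    avg = {t: mean(vec.get(t, 0.0) for vec in frame_vectors) for t in all_tags}
    return sorted(t for t, score in avg.items() if score >= threshold)

def actual_appreciation(behavior, weights):
    """Weighted sum of one user's behavior parameters for one track
    (rating, clicks, favorite, like, share); weights are hypothetical."""
    return sum(w * behavior.get(name, 0.0) for name, w in weights.items())

def estimated_appreciation(track, similar_users, behaviors, weights):
    """Mean of the similar users' actual appreciation for one track."""
    scores = [
        actual_appreciation(behaviors[(user, track)], weights)
        for user in similar_users
        if (user, track) in behaviors
    ]
    return mean(scores) if scores else 0.0

def recommend(matching, similar_users, behaviors, weights, top_n=2):
    """Rank matching tracks by estimated appreciation and keep the top N,
    an assumed stand-in for the preset music filtering condition."""
    ranked = sorted(
        matching,
        key=lambda t: estimated_appreciation(t, similar_users, behaviors, weights),
        reverse=True,
    )
    return ranked[:top_n]

# Illustrative data only: two video frames and three candidate tracks.
frames = [{"beach": 0.9, "dog": 0.2}, {"beach": 0.7, "dog": 0.6}]
weights = {"rating": 0.5, "like": 0.3, "share": 0.2}
behaviors = {
    ("u1", "songA"): {"rating": 4, "like": 1},
    ("u2", "songA"): {"rating": 2},
    ("u1", "songB"): {"rating": 5, "like": 1, "share": 1},
    ("u2", "songC"): {"rating": 1},
}
print(tags_from_frames(frames))
print(recommend(["songA", "songB", "songC"], ["u1", "u2"], behaviors, weights))
```

The matrix factorization alternative in the claims would replace `estimated_appreciation` with the product of learned user and music feature vectors; the ranking and filtering step would stay the same.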
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020549554A JP7206288B2 (ja) | 2018-08-14 | 2019-08-01 | Music recommendation method, apparatus, computing device, and medium |
| EP19849335.5A EP3757995A4 (en) | 2018-08-14 | 2019-08-01 | METHOD AND DEVICE FOR RECOMMENDING MUSIC AND COMPUTER DEVICE AND MEDIUM |
| US17/026,477 US11314806B2 (en) | 2018-08-14 | 2020-09-21 | Method for making music recommendations and related computing device, and medium thereof |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810924409.0A CN109063163B (zh) | 2018-08-14 | 2018-08-14 | Music recommendation method, apparatus, terminal device, and medium |
| CN201810924409.0 | 2018-08-14 | | |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/026,477 Continuation US11314806B2 (en) | 2018-08-14 | 2020-09-21 | Method for making music recommendations and related computing device, and medium thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020034849A1 (zh) | 2020-02-20 |
Family
ID=64683893
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2019/098861 Ceased WO2020034849A1 (zh) | 2018-08-14 | 2019-08-01 | Music recommendation method, apparatus, computing device, and medium |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11314806B2 (zh) |
| EP (1) | EP3757995A4 (zh) |
| JP (1) | JP7206288B2 (zh) |
| CN (1) | CN109063163B (zh) |
| WO (1) | WO2020034849A1 (zh) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
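| CN112597320A (zh) * | 2020-12-09 | 2021-04-02 | 上海掌门科技有限公司 | Social information generation method, device, and computer-readable medium |
| CN114372469A (zh) * | 2022-01-14 | 2022-04-19 | 平安科技(深圳)有限公司 | Entity sample extraction method, system, and storage medium |
| CN114390342A (zh) * | 2021-12-10 | 2022-04-22 | 阿里巴巴(中国)有限公司 | Video soundtrack method, apparatus, device, and medium |
| CN115115745A (zh) * | 2022-06-24 | 2022-09-27 | 北京华录新媒信息技术有限公司 | Method, system, storage medium, and electronic device for generating autonomously created digital art |
| CN115866327A (zh) * | 2021-09-22 | 2023-03-28 | 腾讯科技(深圳)有限公司 | Background music adding method and related apparatus |
| CN116501916A (zh) * | 2023-04-26 | 2023-07-28 | 腾讯音乐娱乐科技(深圳)有限公司 | Soundtrack recommendation method, model training method, device, and storage medium |
| JP2023535047A (ja) * | 2020-08-31 | 2023-08-15 | レモン インコーポレイテッド | Method and apparatus for creating a multimedia work, and computer-readable storage medium |
| CN117349257A (zh) * | 2022-06-28 | 2024-01-05 | 教育科技加私人有限公司 | Construction and application of a sheet music training database |
| CN119202307A (zh) * | 2024-09-18 | 2024-12-27 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for processing a song recommendation model, computer device, and readable storage medium |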
Families Citing this family (53)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8805854B2 (en) * | 2009-06-23 | 2014-08-12 | Gracenote, Inc. | Methods and apparatus for determining a mood profile associated with media data |
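| CN109063163B (zh) * | 2018-08-14 | 2022-12-02 | 腾讯科技(深圳)有限公司 | Music recommendation method, apparatus, terminal device, and medium |
| CN109587554B (zh) * | 2018-10-29 | 2021-08-03 | 百度在线网络技术(北京)有限公司 | Video data processing method, apparatus, and readable storage medium |
| CN109766493B (zh) * | 2018-12-24 | 2022-08-02 | 哈尔滨工程大学 | Cross-domain recommendation method combining personality traits under a neural network |
| CN111401100B (zh) * | 2018-12-28 | 2021-02-09 | 广州市百果园信息技术有限公司 | Video quality assessment method, apparatus, device, and storage medium |
| CN111435369B (zh) * | 2019-01-14 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Music recommendation method, apparatus, terminal, and storage medium |
| CN109862393B (zh) * | 2019-03-20 | 2022-06-14 | 深圳前海微众银行股份有限公司 | Soundtrack method, system, device, and storage medium for video files |
| CN110297939A (zh) * | 2019-06-21 | 2019-10-01 | 山东科技大学 | Music personalization system fusing user behavior and cultural metadata |
| CN112182281B (zh) * | 2019-07-05 | 2023-09-19 | 腾讯科技(深圳)有限公司 | Audio recommendation method, apparatus, and storage medium |
| CN115049052B (zh) * | 2019-08-28 | 2025-09-19 | 第四范式(北京)技术有限公司 | Training method and apparatus for a product recommendation model, and electronic device |
| CN110727785A (zh) * | 2019-09-11 | 2020-01-24 | 北京奇艺世纪科技有限公司 | Recommendation model training, search text recommendation method, apparatus, and storage medium |
| JP7188337B2 (ja) * | 2019-09-24 | 2022-12-13 | カシオ計算機株式会社 | Server device, performance support method, program, and information providing system |
| CN112559777B (zh) * | 2019-09-25 | 2024-10-25 | 北京达佳互联信息技术有限公司 | Content item delivery method, apparatus, computer device, and storage medium |
| CN110704682B (zh) * | 2019-09-26 | 2022-03-18 | 新华智云科技有限公司 | Method and system for intelligently recommending background music based on multi-dimensional video features |
| CN110728539A (zh) * | 2019-10-09 | 2020-01-24 | 重庆特斯联智慧科技股份有限公司 | Method and apparatus for differentiated customer management based on big data |
| CN110677711B (zh) * | 2019-10-17 | 2022-03-01 | 北京字节跳动网络技术有限公司 | Video soundtrack method, apparatus, electronic device, and computer-readable medium |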
| US11907963B2 (en) * | 2019-10-29 | 2024-02-20 | International Business Machines Corporation | On-device privacy-preservation and personalization |
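| CN110852047B (zh) * | 2019-11-08 | 2025-04-25 | 腾讯科技(深圳)有限公司 | Text soundtrack method, apparatus, and computer storage medium |
| CN110839173A (zh) * | 2019-11-18 | 2020-02-25 | 上海极链网络科技有限公司 | Music matching method, apparatus, terminal, and storage medium |
| CN110971969B (zh) * | 2019-12-09 | 2021-09-07 | 北京字节跳动网络技术有限公司 | Video soundtrack method, apparatus, electronic device, and computer-readable storage medium |
| CN111008287B (zh) * | 2019-12-19 | 2023-08-04 | Oppo(重庆)智能科技有限公司 | Audio and video processing method, apparatus, server, and storage medium |
| CN111031391A (zh) * | 2019-12-19 | 2020-04-17 | 北京达佳互联信息技术有限公司 | Video soundtrack method, apparatus, server, terminal, and storage medium |
| CN111259192B (zh) * | 2020-01-15 | 2023-12-01 | 腾讯科技(深圳)有限公司 | Audio recommendation method and apparatus |
| CN111259191A (zh) * | 2020-01-16 | 2020-06-09 | 石河子大学 | Music education learning system and method for primary and secondary schools |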
| US11461649B2 (en) * | 2020-03-19 | 2022-10-04 | Adobe Inc. | Searching for music |
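| CN111417030A (zh) * | 2020-04-28 | 2020-07-14 | 广州酷狗计算机科技有限公司 | Method, apparatus, system, device, and storage device for setting a soundtrack |
| CN111800650B (zh) * | 2020-06-05 | 2022-03-25 | 腾讯科技(深圳)有限公司 | Video soundtrack method, apparatus, electronic device, and computer-readable medium |
| CN111695041B (zh) * | 2020-06-17 | 2023-05-23 | 北京字节跳动网络技术有限公司 | Method and apparatus for recommending information |
| WO2022041182A1 (zh) * | 2020-08-31 | 2022-03-03 | 华为技术有限公司 | Music recommendation method and apparatus |
| CN112214636B (zh) * | 2020-09-21 | 2024-08-27 | 华为技术有限公司 | Audio file recommendation method, apparatus, electronic device, and readable storage medium |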
| US11544315B2 (en) * | 2020-10-20 | 2023-01-03 | Spotify Ab | Systems and methods for using hierarchical ordered weighted averaging for providing personalized media content |
| US11693897B2 (en) | 2020-10-20 | 2023-07-04 | Spotify Ab | Using a hierarchical machine learning algorithm for providing personalized media content |
| CN113434763B (zh) * | 2021-06-28 | 2022-10-14 | 平安科技(深圳)有限公司 | Method, apparatus, device, and storage medium for generating recommendation reasons for search results |
| US11876841B2 (en) | 2021-07-21 | 2024-01-16 | Honda Motor Co., Ltd. | Disparate player media sharing |
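| CN115687680A (zh) * | 2021-07-26 | 2023-02-03 | 脸萌有限公司 | Music filtering method, apparatus, device, storage medium, and program product |
| CN113706663B (zh) * | 2021-08-27 | 2024-02-02 | 脸萌有限公司 | Image generation method, apparatus, device, and storage medium |
| CN113569088B (zh) * | 2021-09-27 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Music recommendation method, apparatus, and readable storage medium |
| CN114168787B (zh) * | 2021-11-17 | 2026-02-24 | 卓尔智联(武汉)研究院有限公司 | Music recommendation method, apparatus, computer device, and storage medium |
| CN114117142B (zh) * | 2021-12-02 | 2024-11-15 | 南京邮电大学 | Tag-aware recommendation method based on an attention mechanism and hypergraph convolution |
| CN114302225A (zh) * | 2021-12-23 | 2022-04-08 | 阿里巴巴(中国)有限公司 | Video soundtrack method, data processing method, device, and storage medium |
| CN116431889B (zh) * | 2021-12-31 | 2026-03-31 | 腾讯科技(深圳)有限公司 | Content recommendation method, apparatus, electronic device, and storage medium |
| CN115114473B (zh) * | 2022-01-14 | 2025-09-09 | 长城汽车股份有限公司 | Music push method, apparatus, electronic device, and readable storage medium |
| CN116932808A (zh) * | 2022-03-31 | 2023-10-24 | 安徽华米健康科技有限公司 | Method and apparatus for determining sleep-aid music, and electronic device |
| CN114637867A (zh) * | 2022-05-18 | 2022-06-17 | 合肥的卢深视科技有限公司 | Video special effect configuration method, apparatus, electronic device, and storage medium |
| CN115795023B (zh) * | 2022-11-22 | 2024-01-05 | 百度时代网络技术(北京)有限公司 | Document recommendation method, apparatus, device, and storage medium |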
| US12505149B2 (en) * | 2022-12-28 | 2025-12-23 | Adobe Inc. | Multi-modal sound effects recommendation |
| US20250014610A1 (en) * | 2023-07-03 | 2025-01-09 | Epidemic Sound AB | Video parsing and audio pairing |
| CN119520894A (zh) * | 2023-08-23 | 2025-02-25 | 北京字跳网络技术有限公司 | Video processing method, apparatus, electronic device, and storage medium |
| US20250094774A1 (en) * | 2023-09-14 | 2025-03-20 | Lemon Inc. | Implementing dialog-based music recommendations for videos |
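| CN118626671B (zh) * | 2024-08-12 | 2024-12-17 | 南京财经大学 | Music recommendation method and system based on a dynamic-window word2vec model |
| CN119724132B (zh) * | 2024-11-27 | 2025-09-16 | 安徽城市管理职业学院 | Music generation method and apparatus based on intelligent speech analysis |
| CN119848289B (zh) * | 2025-03-18 | 2025-08-15 | 联通沃音乐文化有限公司 | Training method, system, and medium for a multimodal multi-task large model |
| CN121151655A (zh) * | 2025-11-20 | 2025-12-16 | 粤港澳大湾区数字经济研究院(国际先进技术应用推进中心(深圳)) | Video generation method, electronic device, and storage medium |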
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110320454A1 (en) * | 2010-06-29 | 2011-12-29 | International Business Machines Corporation | Multi-facet classification scheme for cataloging of information artifacts |
| CN102637178A (zh) * | 2011-02-14 | 2012-08-15 | 北京瑞信在线系统技术有限公司 | Music recommendation method, apparatus, and system |
| US20130077937A1 (en) * | 2011-09-26 | 2013-03-28 | Sony Corporation | Apparatus and method for producing remote streaming audiovisual montages |
| CN105975472A (zh) * | 2015-12-09 | 2016-09-28 | 乐视网信息技术(北京)股份有限公司 | Recommendation method and apparatus |
| WO2018081751A1 (en) * | 2016-10-28 | 2018-05-03 | Vilynx, Inc. | Video tagging system and method |
| CN108153831A (zh) * | 2017-12-13 | 2018-06-12 | 北京小米移动软件有限公司 | Music adding method and apparatus |
| WO2018145015A1 (en) * | 2017-02-06 | 2018-08-09 | Kodak Alaris Inc. | Method for creating audio tracks for accompanying visual imagery |
| CN109063163A (zh) * | 2018-08-14 | 2018-12-21 | 腾讯科技(深圳)有限公司 | Music recommendation method, apparatus, terminal device, and medium |
Family Cites Families (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE69637504T2 (de) * | 1996-09-13 | 2009-06-25 | Hitachi, Ltd. | Automatic music composition method |
| JP2006099740A (ja) | 2004-09-02 | 2006-04-13 | Olympus Corp | Information providing device, terminal device, information providing system, and information providing method |
| EP1666967B1 (en) * | 2004-12-03 | 2013-05-08 | Magix AG | System and method of creating an emotional controlled soundtrack |
| KR101329266B1 (ko) | 2005-11-21 | 2013-11-14 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | System and method for using content features and metadata of digital images to find related audio accompaniment |
| US9032297B2 (en) * | 2006-03-17 | 2015-05-12 | Disney Enterprises, Inc. | Web based video editing |
| US9111146B2 (en) * | 2008-02-15 | 2015-08-18 | Tivo Inc. | Systems and methods for semantically classifying and normalizing shots in video |
| JP2009266005A (ja) * | 2008-04-25 | 2009-11-12 | Clarion Co Ltd | Image search method, image search program, music playback device, and article for music search |
| CN101727943B (zh) | 2009-12-03 | 2012-10-17 | 无锡中星微电子有限公司 | Method for adding a soundtrack to images, image soundtrack apparatus, and image playback apparatus |
| WO2012004650A1 (en) * | 2010-07-08 | 2012-01-12 | Siun Ni Raghallaigh | Systems and methods for dynamic, distributed creation of a musical composition to accompany a visual composition |
| US9045967B2 (en) | 2011-07-26 | 2015-06-02 | Schlumberger Technology Corporation | System and method for controlling and monitoring a drilling operation using refined solutions from a panistic inversion |
| CN103793447B (zh) | 2012-10-26 | 2019-05-14 | 汤晓鸥 | Method and system for estimating semantic similarity between music and images |
| JP2014095966A (ja) * | 2012-11-08 | 2014-05-22 | Sony Corp | Information processing device, information processing method, and program |
| CN103605656B (zh) * | 2013-09-30 | 2018-02-02 | 小米科技有限责任公司 | Method and apparatus for recommending music, and mobile terminal |
| CN103795897A (zh) | 2014-01-21 | 2014-05-14 | 深圳市中兴移动通信有限公司 | Method and apparatus for automatically generating background music |
| CN105072354A (zh) | 2015-07-17 | 2015-11-18 | Tcl集团股份有限公司 | Method and system for synthesizing a video stream from multiple photos |
| TWI587574B (zh) | 2015-07-20 | 2017-06-11 | 廣達電腦股份有限公司 | Mobile device |
| US10178341B2 (en) * | 2016-03-01 | 2019-01-08 | DISH Technologies L.L.C. | Network-based event recording |
| CN105930429A (zh) * | 2016-04-19 | 2016-09-07 | 乐视控股(北京)有限公司 | Music recommendation method and apparatus |
| US9836853B1 (en) * | 2016-09-06 | 2017-12-05 | Gopro, Inc. | Three-dimensional convolutional neural networks for video highlight detection |
| KR20180036153A (ko) * | 2016-09-30 | 2018-04-09 | 주식회사 요쿠스 | Video editing system and method |
| JP6589838B2 (ja) * | 2016-11-30 | 2019-10-16 | カシオ計算機株式会社 | Moving image editing device and moving image editing method |
| WO2018104563A2 (en) | 2016-12-09 | 2018-06-14 | Tomtom Global Content B.V. | Method and system for video-based positioning and mapping |
| KR101863672B1 (ko) * | 2016-12-15 | 2018-06-01 | 정우주 | Method and apparatus for providing user-customized multimedia content based on multimedia content information |
| CN107220663B (zh) * | 2017-05-17 | 2020-05-19 | 大连理工大学 | Automatic image annotation method based on semantic scene classification |
| CN107707828B (zh) | 2017-09-26 | 2019-07-26 | 维沃移动通信有限公司 | Video processing method and mobile terminal |
| CN107959873A (zh) * | 2017-11-02 | 2018-04-24 | 深圳天珑无线科技有限公司 | Method, apparatus, terminal, and storage medium for embedding background music in a video |
| CN108600825B (zh) * | 2018-07-12 | 2019-10-25 | 北京微播视界科技有限公司 | Method, apparatus, terminal device, and medium for selecting background music to shoot a video |
2018
- 2018-08-14 CN CN201810924409.0A, patent CN109063163B (zh): Active
2019
- 2019-08-01 WO PCT/CN2019/098861, patent WO2020034849A1 (zh): Ceased
- 2019-08-01 EP EP19849335.5A, patent EP3757995A4 (en): Ceased
- 2019-08-01 JP JP2020549554A, patent JP7206288B2 (ja): Active
2020
- 2020-09-21 US US17/026,477, patent US11314806B2 (en): Active
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110320454A1 (en) * | 2010-06-29 | 2011-12-29 | International Business Machines Corporation | Multi-facet classification scheme for cataloging of information artifacts |
| CN102637178A (zh) * | 2011-02-14 | 2012-08-15 | 北京瑞信在线系统技术有限公司 | Music recommendation method, apparatus, and system |
| US20130077937A1 (en) * | 2011-09-26 | 2013-03-28 | Sony Corporation | Apparatus and method for producing remote streaming audiovisual montages |
| CN105975472A (zh) * | 2015-12-09 | 2016-09-28 | 乐视网信息技术(北京)股份有限公司 | Recommendation method and apparatus |
| WO2018081751A1 (en) * | 2016-10-28 | 2018-05-03 | Vilynx, Inc. | Video tagging system and method |
| WO2018145015A1 (en) * | 2017-02-06 | 2018-08-09 | Kodak Alaris Inc. | Method for creating audio tracks for accompanying visual imagery |
| CN108153831A (zh) * | 2017-12-13 | 2018-06-12 | 北京小米移动软件有限公司 | Music adding method and apparatus |
| CN109063163A (zh) * | 2018-08-14 | 2018-12-21 | 腾讯科技(深圳)有限公司 | Music recommendation method, apparatus, terminal device, and medium |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3757995A4 |
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
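| JP7502553B2 (ja) | 2020-08-31 | 2024-06-18 | レモン インコーポレイテッド | Method and apparatus for creating a multimedia work, and computer-readable storage medium |
| JP2023535047A (ja) * | 2020-08-31 | 2023-08-15 | レモン インコーポレイテッド | Method and apparatus for creating a multimedia work, and computer-readable storage medium |
| US12306867B2 (en) | 2020-08-31 | 2025-05-20 | Lemon Inc. | Production method of multimedia work, apparatus, and computer-readable storage medium |
| CN112597320A (zh) * | 2020-12-09 | 2021-04-02 | 上海掌门科技有限公司 | Social information generation method, device, and computer-readable medium |
| CN115866327A (zh) * | 2021-09-22 | 2023-03-28 | 腾讯科技(深圳)有限公司 | Background music adding method and related apparatus |
| CN114390342A (zh) * | 2021-12-10 | 2022-04-22 | 阿里巴巴(中国)有限公司 | Video soundtrack method, apparatus, device, and medium |
| CN114390342B (zh) * | 2021-12-10 | 2023-08-29 | 阿里巴巴(中国)有限公司 | Video soundtrack method, apparatus, device, and medium |
| CN114372469A (zh) * | 2022-01-14 | 2022-04-19 | 平安科技(深圳)有限公司 | Entity sample extraction method, system, and storage medium |
| CN114372469B (zh) * | 2022-01-14 | 2025-12-02 | 平安科技(深圳)有限公司 | Entity sample extraction method, system, and storage medium |
| CN115115745A (zh) * | 2022-06-24 | 2022-09-27 | 北京华录新媒信息技术有限公司 | Method, system, storage medium, and electronic device for generating autonomously created digital art |
| CN117349257A (zh) * | 2022-06-28 | 2024-01-05 | 教育科技加私人有限公司 | Construction and application of a sheet music training database |
| CN116501916A (zh) * | 2023-04-26 | 2023-07-28 | 腾讯音乐娱乐科技(深圳)有限公司 | Soundtrack recommendation method, model training method, device, and storage medium |
| CN119202307A (zh) * | 2024-09-18 | 2024-12-27 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for processing a song recommendation model, computer device, and readable storage medium |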
Also Published As
| Publication number | Publication date |
|---|---|
| US11314806B2 (en) | 2022-04-26 |
| EP3757995A4 (en) | 2021-06-09 |
| JP2021516398A (ja) | 2021-07-01 |
| CN109063163B (zh) | 2022-12-02 |
| EP3757995A1 (en) | 2020-12-30 |
| JP7206288B2 (ja) | 2023-01-17 |
| CN109063163A (zh) | 2018-12-21 |
| US20210004402A1 (en) | 2021-01-07 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2020034849A1 (zh) | Music recommendation method, apparatus, computing device, and medium | |
| US20230237093A1 (en) | Video recommender system by knowledge based multi-modal graph neural networks | |
| US11216496B2 (en) | Visual interactive search | |
| US11397873B2 (en) | Enhanced processing for communication workflows using machine-learning techniques | |
| US20210027160A1 (en) | End-to-end deep collaborative filtering | |
| US8838606B1 (en) | Systems and methods for classifying electronic information using advanced active learning techniques | |
| CN112380331A (zh) | Information push method and apparatus | |
| US20250225398A1 (en) | Data processing method and related apparatus | |
| CN111125422A (zh) | Image classification method, apparatus, electronic device, and storage medium | |
| US12050936B2 (en) | Enhanced processing for communication workflows using machine-learning techniques | |
| CN112905885B (zh) | Method, apparatus, device, medium, and program product for recommending resources to a user | |
| WO2014107193A1 (en) | Efficiently identifying images, videos, songs or documents most relevant to the user based on attribute feedback | |
| US20250200428A1 (en) | Cluster-based few-shot sampling to support data processing and inferences in imperfect labeled data environments | |
| CN113806588A (zh) | Method and apparatus for searching videos | |
| EP4100903A1 (en) | Enhanced processing for communication workflows using machine-learning techniques | |
| CN111512299A (zh) | Method for content search and electronic device therefor | |
| CN118626727A (zh) | Personalized recommendation method based on dynamic user profiles | |
| EP4700608A1 (en) | Data processing method and related apparatus | |
| CN114329004A (zh) | Digital fingerprint generation and data push method, apparatus, and storage medium | |
| CN113641900A (zh) | Information recommendation method and apparatus | |
| CN116975735A (zh) | Training method, apparatus, device, and storage medium for a relevance estimation model | |
| CN118172146B (zh) | Item data processing method, apparatus, computer device, and storage medium | |
| US20260064761A1 (en) | Visual Search Pivot Generation | |
| CN118939984A (zh) | Model training method, recommendation method, and related apparatus | |
| Foniakov et al. | Application of Multimodal Machine Learning for Image Recommendation Systems | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19849335; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2020549554; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2019849335; Country of ref document: EP; Effective date: 20200924 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWW | WIPO information: withdrawn in national office | Ref document number: 2019849335; Country of ref document: EP |