WO2020224128A1 - Information recommendation method and apparatus based on a user's short-term interest, electronic device, and medium - Google Patents


Info

Publication number
WO2020224128A1
Authority
WO
WIPO (PCT)
Prior art keywords
news
user
term
short
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/103700
Other languages
English (en)
Chinese (zh)
Inventor
王健宗
贾雪丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Publication of WO2020224128A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Definitions

  • This application relates to the field of data analysis technology, and more specifically, to a news recommendation method and device, electronic equipment, and media based on users' short-term interests.
  • the outline of a user based on the content is called a user portrait.
  • the key issue of content-based news recommendation is how to construct user portraits based on the user's reading history.
  • most content-based recommendation systems consider the user's reading history as a whole.
  • the long-term interest of a user may be relatively stable, but in the short term the content the user pays attention to changes. For example, a sports enthusiast's focus may shift as different events and competitions take place. Therefore, determining the user's preferences from long-term reading history alone can neither recommend news accurately nor stimulate the user's interest in reading.
  • the purpose of this application is to provide a news recommendation method and device, electronic equipment and medium based on the user's short-term interest that combine the long-term and short-term preferences of the user to recommend news to the user.
  • a news recommendation device based on a user's short-term interest, including: a collection module, which collects users' behavior data on news, the behavior data including a news matrix; a word vector matrix module, which obtains the corresponding word vector matrix from the news matrix; a clustering module, which clusters the word vector matrix to obtain a grouping result for each news item and assigns each news item to the corresponding news group according to the grouping result; a user portrait obtaining module, which obtains a long-term portrait and a short-term portrait of each user from that user's long-term and short-term behavior data for each news item, the long-term portrait and the short-term portrait representing the user's preference for the word vectors corresponding to the words contained in the news;
  • a first similarity obtaining module, which analyzes the similarity between the long-term portrait of each user and the different news groups to obtain multiple first similarities;
  • a preferred news group obtaining module, which sorts the multiple first similarities in descending order and obtains the first set number of news groups corresponding to each user based on the sorting result;
  • a second similarity obtaining module, which analyzes the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups; a bipartite graph construction module, which constructs a user-news bipartite graph according to the second similarity; and a recommendation module, which selects recommended news on the bipartite graph using an absorbing random walk method, so as to obtain the recommended news of each user.
  • a news recommendation method based on users' short-term interests, including: step S1, collecting users' behavior data on news, the behavior data including a news matrix; step S2, obtaining the corresponding word vector matrix from the news matrix; step S3, clustering the word vector matrix to obtain a grouping result for each news item, and assigning each news item to the corresponding news group according to the grouping result; step S4, obtaining a long-term portrait and a short-term portrait of each user from that user's long-term and short-term behavior data for each news item, the long-term portrait and the short-term portrait representing the user's preference for the word vectors corresponding to the words contained in the news; step S5, analyzing the similarity between the long-term portrait of each user and the different news groups to obtain multiple first similarities; step S6, sorting the multiple first similarities in descending order, and obtaining the first set number of news groups corresponding to each user; step S7, analyzing the second similarity between the latest short-term portrait of each
  • the present application also provides an electronic device including a memory and a processor, wherein the memory includes a news recommendation program based on the user's short-term interest, and the program, when executed by the processor, implements the above-mentioned news recommendation method based on the user's short-term interest.
  • the present application also provides a computer non-volatile readable storage medium, which includes a news recommendation program based on the user's short-term interest; when the program is executed by a processor, the steps of the above-mentioned news recommendation method based on the user's short-term interest are implemented.
  • the news recommendation method and device, electronic equipment, and medium based on users' short-term interests described in this application establish a user-item bipartite graph from long-term and short-term user portraits, and seamlessly integrate the long-term and short-term portraits to represent users' reading preferences.
  • using the absorbing random walk algorithm to select news across different topics not only provides news articles relevant to the user's interests, but also broadens the user's preferences by introducing articles on different topics.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a news recommendation method based on a user's short-term interest in this application;
  • Fig. 2 is a schematic diagram of a news recommendation device based on the short-term interests of users in this application;
  • Fig. 3 is a flowchart of a preferred embodiment of a news recommendation method based on a user's short-term interest in this application.
  • This application provides a news recommendation method based on a user's short-term interest, which is applied to an electronic device 1.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of the news recommendation method based on a user's short-term interest in this application.
  • the electronic device 1 may be a terminal client with computing functions such as a server, a mobile phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the memory 11 includes at least one type of readable storage medium.
  • the at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory, and the like.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • the readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store a news recommendation program 10 based on the user's short-term interests installed in the electronic device 1 and the like.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run the program code or process the data stored in the memory 11, for example, to execute the news recommendation program 10 based on the user's short-term interest.
  • the network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic clients.
  • the communication bus 14 is used to realize the connection and communication between these components.
  • FIG. 1 only shows the electronic device 1 with the components 11-14, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 1 may also include a user interface, and the user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another client with a voice recognition function, and a voice output device such as a speaker or earphones.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 1 may also include a display, and the display may also be called a display screen or a display unit.
  • it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an organic light-emitting diode (OLED) touch device.
  • the display is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the electronic device 1 further includes a touch sensor.
  • the area provided by the touch sensor for the user to perform a touch operation is called a touch area.
  • the touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
  • the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
  • the electronic device 1 may also include logic gate circuits, sensors, audio circuits, etc., which will not be repeated here.
  • the memory 11, as a computer storage medium, may include an operating system and a news recommendation program 10 based on the user's short-term interest; when the processor 12 executes the news recommendation program 10 stored in the memory 11, the following steps are implemented:
  • Step S1 collecting user behavior data on news, the behavior data including a news matrix
  • Step S2 Obtain a corresponding word vector matrix according to the news matrix
  • Step S3 clustering the word vector matrix to obtain a grouping result of each news, and grouping each news into a corresponding news group according to the grouping result;
  • Step S4 Obtain a long-term portrait and a short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news.
  • the long-term portrait and the short-term portrait are used to represent the user's preference for the word vectors corresponding to the words contained in the news.
  • Step S5 Analyze the similarity between the long-term portrait of each user and different newsgroups to obtain multiple first similarities
  • Step S6 sort the plurality of first similarities in descending order, and obtain a first set number of newsgroups corresponding to each user based on the sorting result;
  • Step S7 analyzing the second similarity between the latest short-term portrait of each user and each news in the first set number of newsgroups;
  • Step S8 construct a user news bipartite graph according to the second similarity
  • Step S9 Use the absorbing random walk method to select recommended news on the user-news bipartite graph, so as to obtain the recommended news of each user.
  • the news recommendation program 10 based on the user's short-term interests can also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by the processor 12 to implement this application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • the above electronic device obtains the long-term portrait of the user while also modeling the short-term reading preference of the user, and according to the short-term reading preference, recommends articles that can arouse the user's reading interest to expand the user's reading volume.
  • FIG. 2 is a schematic diagram of a news recommendation device based on a user's short-term interest in this application. As shown in FIG. 2, the news recommendation device includes:
  • the collection module 110 collects user behavior data on news.
  • the behavior data includes a news matrix.
  • preferably, the behavior data further includes a user matrix and a behavior matrix.
  • the behavior matrix holds the behavior data of each user in the user matrix for each news item in the news matrix.
  • the word vector matrix module 120 obtains a corresponding word vector matrix according to the news matrix
  • the clustering module 130 clusters the word vector matrix to obtain a grouping result of each news, and groups each news into a corresponding news group according to the grouping result;
  • the user portrait obtaining module 140 obtains a long-term portrait and a short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news.
  • the long-term portrait and the short-term portrait are used to represent the user's preference for the word vectors corresponding to the words contained in the news.
  • the first similarity obtaining module 150 analyzes the similarity between the long-term portrait of each user and different news groups to obtain multiple first similarities
  • the preferred newsgroup obtaining module 160 sorts the plurality of first similarities in descending order, and obtains a first set number of newsgroups corresponding to each user based on the sorting result;
  • the second similarity obtaining module 170 analyzes the second similarity between the latest short-term portrait of each user and each news in the first set number of news groups;
  • the bipartite graph construction module 180 constructs a user-news bipartite graph according to the second similarity
  • the recommendation module 190 selects the recommended news by using an absorbing random walk method on the bipartite graph, so as to obtain the recommended news of each user.
  • the aforementioned clustering module 130 includes:
  • a hierarchical clustering unit, which performs hierarchical clustering on the word vector matrix from the word vector matrix module to obtain a hierarchical clustering dendrogram, where each leaf node of the dendrogram corresponds to one news item;
  • a Dunn index obtaining unit, which obtains the Dunn index corresponding to each clustering result of the hierarchical clustering unit;
  • a cutting unit, which cuts the hierarchical clustering dendrogram at the layer corresponding to the maximum Dunn index obtained by the Dunn index obtaining unit, to obtain the best hierarchical clustering dendrogram;
  • a news grouping unit, which assigns the news items whose leaf nodes share the same parent node in the best hierarchical clustering dendrogram produced by the cutting unit to the same news group, thereby obtaining the news group of each news item.
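  • The Dunn-index cut used by the clustering module can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: it assumes the candidate labellings (one per dendrogram level) have already been produced by some hierarchical clustering routine, it uses Euclidean distance, and the names `dunn_index`, `best_cut`, `news_vectors`, and `cuts` are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())
    if len(groups) < 2:
        return 0.0  # undefined for one cluster; treated as the worst score here
    # Largest distance between two members of the same cluster.
    diameter = max(
        (dist(p, q) for g in groups for p, q in combinations(g, 2)),
        default=0.0,
    )
    # Smallest distance between members of two different clusters.
    separation = min(
        dist(p, q)
        for g, h in combinations(groups, 2)
        for p in g
        for q in h
    )
    return float("inf") if diameter == 0 else separation / diameter


def best_cut(points, candidate_cuts):
    """Return the candidate labelling with the largest Dunn index."""
    return max(candidate_cuts, key=lambda labels: dunn_index(points, labels))


# Hypothetical 2-D news embeddings and three candidate dendrogram cuts.
news_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
cuts = [
    [0, 0, 1, 1, 1],  # two news groups
    [0, 0, 1, 1, 2],  # three news groups
    [0, 1, 2, 2, 3],  # four news groups
]
best = best_cut(news_vectors, cuts)
```

  • In this toy data the three-group cut wins because its groups are tight and well separated; news items that share a label in `best` would then form one news group.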
  • the above-mentioned news recommendation device further includes a topic matrix construction module, which analyzes the word vector matrix using Latent Dirichlet Allocation (LDA) to obtain, for each news item, the topic probability matrix over multiple topics and the word probability matrix of the different word vectors corresponding to each topic; the topic value of each news item is obtained by combining its topic probability matrix, word probability matrix, and word vector matrix.
  • the topic values of all news items form the topic matrix.
  • the clustering module 130 obtains the topic vector of each news group from the topic matrix constructed by the topic matrix construction module; the first similarity obtaining module 150 uses a vector similarity measure to determine the first similarity between the user's long-term portrait and the topic vector of each news group; the second similarity obtaining module 170 uses a vector similarity measure to determine the second similarity between the user's short-term portrait and each of the first set number of news groups.
  • this application also provides a news recommendation method based on users' short-term interests.
  • FIG. 3 is a flowchart of a preferred embodiment of the news recommendation method based on a user's short-term interest in this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the news recommendation method based on the user's short-term interest includes:
  • Step S1 Collect user behavior data about news.
  • the behavior data includes a news matrix and, preferably, also a user matrix and a behavior matrix.
  • the behavior matrix holds the behavior indicators of each user in the user matrix for each news item in the news matrix:
  • $U = [u_1, u_2, \ldots, u_a]$, $N = [n_1, n_2, \ldots, n_b]$, $UN = [UN_1, UN_2, \ldots, UN_a]^{T}$, $UN_a = [un_{a1}, un_{a2}, \ldots, un_{ab}]$
  • where $U$ is the user matrix, $a$ is the total number of users, $N$ is the news matrix, $b$ is the total number of news items, $UN$ is the behavior matrix formed by each user's behavior indicators for each news item, $UN_a$ is the behavior vector of the $a$-th user, and $un_{ab}$ is the behavior indicator of the $a$-th user for the $b$-th news item.
  • the behavior indicators include one or more of the number of clicks, the number of reads, the number of likes, the number of evaluations, the reading duration, the click frequency (the number of clicks per unit time), the reading frequency, the like frequency, and the evaluation frequency. For example, users' browsing histories on news websites are collected with web crawler technology, user identifiers are arranged into a user matrix, news identifiers on the news websites are arranged into a news matrix, and the number of times any user clicks on any news item is used as that user's behavior indicator for that news item; when a user has not browsed a news item, the user's click count for it is 0. Together these values constitute the behavior matrix;
  • Step S2 Obtain the corresponding word vector matrix according to the news matrix, that is, convert the words of each news item in the news matrix into word vectors to form the corresponding word vector matrix:
  • $W = [W_1, W_2, \ldots, W_b]^{T}$, $W_b = [w_{b1}, w_{b2}, \ldots, w_{bc}]$
  • where $W$ is the word vector matrix of all news items, $c$ is the number of word vectors in the longest news item, $w_{bc}$ is the word vector of the $c$-th word in the $b$-th news item (a news item with fewer than $c$ word vectors is padded with zeros), and $W_b$ is the word vector matrix of the $b$-th news item;
  • Step S3 clustering the word vector matrix to obtain a grouping result of each news, and grouping each news into a corresponding news group according to the grouping result, and the news group represents the grouping of news clusters;
  • Step S4 Obtain a long-term portrait and a short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news.
  • the long term and the short term are periods of time (for example, the long term may be one month and the short term one week), the long term includes a plurality of short terms, and the long-term portrait and the short-term portrait represent the user's preference for the word vectors corresponding to the words contained in the news;
  • Step S5 separately analyze the first similarity of the word vector between the long-term portrait of each user and each news group;
  • Step S6 sort the plurality of first similarities in descending order, and obtain a first set number of newsgroups corresponding to each user based on the sorting result;
  • Step S7 respectively analyze the second similarity of the word vector between the short-term portrait of each user closest to the analysis time and each news in the first set number of news groups;
  • Step S8 construct a user-news bipartite graph according to the second similarity
  • Step S9 Use the absorbing random walk method to select the recommended news on the bipartite graph, so as to obtain the recommended news of each user.
  • the above-mentioned news recommendation method based on users' short-term interests accounts for the evolution of the user's interests when building user portraits, seamlessly integrates the long-term and short-term portraits as the user's reading preferences, establishes a graph relating specific news items to users, and then runs the absorbing random walk method on that graph to select news articles on different topics.
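  • Steps S1 and S2 can be illustrated with a small sketch. It assumes click counts are the only behavior indicator and that word vectors have already been produced by some embedding model; the helper names `build_behavior_matrix` and `pad_word_vectors`, and the sample identifiers, are hypothetical and not taken from the application.

```python
def build_behavior_matrix(users, news, click_log):
    """UN[i][j] = number of clicks of users[i] on news[j]; 0 if never clicked."""
    user_idx = {u: i for i, u in enumerate(users)}
    news_idx = {n: j for j, n in enumerate(news)}
    matrix = [[0] * len(news) for _ in users]
    for user, item in click_log:
        matrix[user_idx[user]][news_idx[item]] += 1
    return matrix


def pad_word_vectors(news_word_vectors, dim):
    """Pad each news item's list of word vectors with zero vectors up to length c."""
    c = max(len(vectors) for vectors in news_word_vectors)
    return [
        vectors + [[0.0] * dim for _ in range(c - len(vectors))]
        for vectors in news_word_vectors
    ]


# Hypothetical user matrix U, news matrix N, and a crawled click log.
users = ["u1", "u2"]
news = ["n1", "n2", "n3"]
click_log = [("u1", "n1"), ("u1", "n1"), ("u2", "n3")]
UN = build_behavior_matrix(users, news, click_log)

# Hypothetical 2-dimensional word vectors; the third news item has no words.
raw = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]], []]
W = pad_word_vectors(raw, dim=2)
```

  • Each row of `UN` plays the role of a user's behavior vector, and each entry of `W` plays the role of the zero-padded word vector matrix of one news item.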
  • the foregoing news recommendation method based on the user's short-term interest includes:
  • in step S4, the word vectors of each news item are used as labels, and the long-term portrait and the short-term portrait are the user's preference weights for each label,
  • where $P$ is the short-term portrait of a user, $P'$ is the long-term portrait of a user, $P_b$ is the user's short-term weight vector for the $b$-th news item, and $p_{bc}$ is the user's preference weight for the $c$-th word vector in the $b$-th news item.
  • in step S5, a matrix similarity measure is used to determine the first similarity between the user's long-term portrait and each news group, for example the correlation coefficient of the matrices or the cosine of the angle between vectors; that is, the similarity between the news group matrix (the word vectors of the news items in the group) and the corresponding long-term portrait sub-matrix (the user's preferences for those word vectors).
  • as another example, the news group matrix and the long-term portrait sub-matrix are flattened into vectors and the cosine function or another vector similarity measure is applied to obtain the first similarity; alternatively, the element-wise differences between the news group matrix and the long-term portrait sub-matrix are squared and summed;
  • in step S7, a matrix similarity measure is used to determine the second similarity between the user's short-term portrait and each of the first set number of news groups;
  • in step S8, each user's news groups are sorted in descending order of second similarity, and the top second set number (less than the first set number) of news groups is kept, which gives all the news items in those news groups for each user.
  • a user-news bipartite graph is constructed from each user and the news items in that user's second set number of news groups, where the weight of each edge of the bipartite graph is set according to the user's rating of the news item: the higher the rating, the greater the weight.
  • the above-mentioned news recommendation method based on the user's short-term interest screens newsgroups through the user's long-term portraits and short-term portraits, so that the selected newsgroups not only conform to the users' long-term preferences but also conform to the users' short-term interests, and improve the accuracy of news recommendation
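  • The flattening-based similarity of step S5 and the descending sort of step S6 might look like the sketch below. It is an illustration under assumptions: the matrices are small hand-made examples, `portraits` holds the long-term portrait sub-matrix matching each news group, and all function and variable names are hypothetical. Note that the squared-difference measure behaves as a distance (smaller means more similar), so it would be sorted in ascending order.

```python
from math import sqrt


def flatten(matrix):
    """Turn a matrix (list of rows) into one flat vector."""
    return [x for row in matrix for x in row]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def squared_difference(a, b):
    """Sum of squared element-wise differences (a distance, not a similarity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_groups(portraits, group_matrices, k):
    """Rank news groups by cosine similarity to the portrait and keep the top k."""
    scores = {
        name: cosine_similarity(flatten(portraits[name]), flatten(matrix))
        for name, matrix in group_matrices.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical news group matrices and matching long-term portrait sub-matrices.
group_matrices = {
    "sports": [[1.0, 0.0], [0.8, 0.2]],
    "finance": [[0.0, 1.0], [0.1, 0.9]],
    "tech": [[0.5, 0.5], [0.5, 0.5]],
}
portraits = {
    "sports": [[0.9, 0.1], [0.7, 0.3]],
    "finance": [[0.9, 0.1], [0.8, 0.2]],
    "tech": [[0.6, 0.4], [0.9, 0.1]],
}
preferred = top_groups(portraits, group_matrices, k=2)
```

  • Here `k` stands in for the "first set number"; the same ranking helper could be reused with the short-term portrait to obtain the second set number of news groups.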
  • vector similarity measures such as Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, normalized Euclidean distance, Mahalanobis distance, angle cosine, Hamming distance, Jaccard distance and Jaccard similarity coefficient, or correlation coefficient and correlation distance are used to obtain the second similarity between the user's short-term portrait and each news item in the first set number of news groups, for example, after the user's long-term portrait filtering
  • $d(P_i, W_i)$ is the second similarity between the user and news $n_1$;
  • each user's news items are sorted in descending order of second similarity, and the top third set number of news items is kept for each user; a user-news bipartite graph is then constructed from each user and that user's third set number of news items, where the weight of each edge of the bipartite graph is set according to the user's rating of the news item.
  • alternatively, the second similarity is used as the edge weight of the user-news bipartite graph, or the user-news bipartite graph is constructed directly without ranking by second similarity.
  • the above-mentioned news recommendation method based on users' short-term interests selects news in two stages: first, long-term portraits are used to decide whether news groups match the user's preferences, and then short-term portraits are used to filter specific news articles for the user, so that the user's long-term and short-term preferences are seamlessly connected, which improves the accuracy of recommendations.
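  • Constructing the user-news bipartite graph from the second similarities can be sketched as below. This assumes the variant in which the second similarity itself is used as the edge weight; `build_bipartite_graph`, `second_similarity`, and the sample scores are hypothetical, and `k` stands in for the "third set number".

```python
def build_bipartite_graph(second_similarity, k):
    """Keep each user's k most similar news items as weighted user-news edges."""
    edges = {}
    for user, scores in second_similarity.items():
        # Sort this user's news items by second similarity, highest first.
        ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
        edges[user] = dict(ranked[:k])
    return edges


# Hypothetical second similarities between short-term portraits and news items.
second_similarity = {
    "u1": {"n1": 0.9, "n2": 0.4, "n3": 0.7},
    "u2": {"n2": 0.8, "n3": 0.6, "n4": 0.1},
}
graph = build_bipartite_graph(second_similarity, k=2)
```

  • The result maps each user node to its adjacent news nodes and edge weights; swapping the similarity values for user ratings would give the rating-weighted variant.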
  • the news recommendation method based on the user's short-term interest includes:
  • in step S2, LDA (Latent Dirichlet Allocation) is used to analyze the word vector matrix to obtain the topic value of each news item and thereby the topic matrix. Specifically, LDA yields, for each news item in the news matrix, the topic probability matrix over multiple topics and the word probability matrix of the different word vectors corresponding to each topic
  • where $\theta_b$ is the topic probability matrix of the $b$-th news item, $\theta_{bd}$ is the probability that the $b$-th news item corresponds to the $d$-th topic, $\varphi_b$ is the word probability matrix of the $b$-th news item, and $\varphi_{dc}$ is the probability that the $d$-th topic generates the $c$-th word vector in the $b$-th news item;
  • $T_b$ is the topic value of the $b$-th news item, and "." denotes matrix multiplication
  • in step S3, the word vector matrix is clustered to obtain the news group to which each news item belongs, and thereby the topic vector of each news group.
  • for example, if a news group is $[n_i, n_j]$, the corresponding topic vector is $[z_i, z_j]$.
  • in step S4, LDA is used as a language model for detecting latent topics, and a long-term portrait and a short-term portrait of each user are obtained. Specifically, the long-term portrait and the short-term portrait are obtained from the topic probability matrix and word probability matrix of each news item and the behavior matrix, where the user's behavior indicator for a news item is taken as the user's behavior indicator for each word vector in that news item,
  • $z_a = [z_{a1}, z_{a2}, \ldots, z_{ab}]$
  • where $un_{ab}(c)$ is the behavior vector of the $a$-th user for the $c$ word vectors in the $b$-th news item, that is, $un_{ab}(c)$ consists of $c$ copies of $un_{ab}$; $z_{ab}$ is the $a$-th user's topic value for the $b$-th news item; and $z_a$ is the long-term portrait or short-term portrait of the $a$-th user.
  • in step S5, a similarity measure is used to determine the first similarity between the user's long-term portrait and each news group.
  • for example, the cosine similarity method is used to obtain the first similarity:
  • $s_{m,n} = \dfrac{\sum_{i=1}^{b} x_i y_i}{\sqrt{\sum_{i=1}^{b} x_i^2}\,\sqrt{\sum_{i=1}^{b} y_i^2}}$
  • where $s_{m,n}$ is the similarity between the $m$-th long-term portrait and the $n$-th news group, $(x_1, x_2, \ldots, x_b)$ is the topic vector of the $m$-th long-term portrait, and $(y_1, y_2, \ldots, y_b)$ is the topic vector of the $n$-th news group.
  • for example, if news group X includes the first news item and the third news item, the topic vector of the news group is $(z_1, z_3)$ and the corresponding long-term portrait vector of the $a$-th user is $(z_{a1}, z_{a3})$.
  • in step S7, the similarity measure of step S5 is used to determine the second similarity between the user's short-term portrait and each of the first set number of news groups.
  • in step S8, each user's news groups are sorted in descending order of second similarity, and the top second set number (less than the first set number) of news groups is kept, which gives all the news items in those news groups for each user.
  • a user-news bipartite graph is constructed from each user and the news items in that user's second set number of news groups, where the weight of each edge of the bipartite graph is set according to the user's rating of the news item.
  • the above-mentioned news recommendation method based on the user's short-term interest obtains the topic vector of each news and the user's short-term portrait and long-term portrait vector through LDA analysis, and screens newsgroups through similarity, which reduces the amount of calculation while ensuring the accuracy of recommendation .
  • In step S4, the long-term portrait is obtained by formula (3), and the short-term portrait is obtained by formula (5).
  • In step S7, a similarity measurement method is used to determine the second similarity between the short-term portrait of the user and each news item in each of the first set number of news groups.
  • The cosine similarity method is used to obtain the second similarity: $s'_{m,n} = \dfrac{\sum_{i=1}^{c} x_i y_i}{\sqrt{\sum_{i=1}^{c} x_i^2}\,\sqrt{\sum_{i=1}^{c} y_i^2}}$
  • $s'_{m,n}$ represents the similarity between the m-th short-term portrait and the n-th news item
  • $(x_1, x_2, \ldots, x_c)$ is the topic vector of the m-th short-term portrait
  • $(y_1, y_2, \ldots, y_c)$ is the word vector of the n-th news item; both are $1 \times c$ vectors.
  • For each user, the news items are sorted in descending order of second similarity and the top third set number of news items are kept. A user-news bipartite graph is then constructed from each user and their respective third set number of news items, where the weight of each edge of the bipartite graph is set according to the user's rating of the news.
  • Alternatively, the second similarity is used as the edge weight when constructing the user-news bipartite graph, or the user-news bipartite graph can be constructed directly without ranking by second similarity.
  • The above news recommendation method based on the user's short-term interest obtains the topic vector of each news item and the user's short-term and long-term portrait vectors through LDA analysis, and screens news groups and news items separately, which reduces the amount of calculation, speeds up recommendation, and improves recommendation accuracy.
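The graph construction of step S8 can be sketched as follows, for the variant in which the second similarity itself is used as the edge weight. The nested-dictionary layout, the function names `build_bipartite_graph` and `news_side`, and the sample scores are assumptions made for illustration only.

```python
def build_bipartite_graph(second_similarity, k):
    """Keep the k most similar news items per user.

    second_similarity: {user: {news: score}}
    returns:           {user: {news: edge_weight}} with edge_weight = score
    """
    graph = {}
    for user, scores in second_similarity.items():
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        graph[user] = dict(ranked[:k])
    return graph


def news_side(graph):
    """Invert the user -> news edges to obtain the news -> user side of the graph."""
    inverted = {}
    for user, edges in graph.items():
        for news, weight in edges.items():
            inverted.setdefault(news, {})[user] = weight
    return inverted


# Illustrative second similarities for two users.
similarity = {
    "u1": {"n1": 0.9, "n2": 0.4, "n3": 0.7},
    "u2": {"n2": 0.8, "n3": 0.6},
}
graph = build_bipartite_graph(similarity, k=2)
reverse = news_side(graph)
```

Here `k` corresponds to the third set number. If edge weights are to follow the user's rating of the news instead, the stored score would be replaced by that rating.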
  • In step S2, LDA is used to analyze the word vector matrix, and the topic vector of each news item is obtained by formula (7).
  • In step S7, the second similarity between each user's short-term portrait and each news item is obtained from the similarity between the user's short-term portrait and the topic vector of that news item.
  • In step S4, the step of obtaining the long-term portrait and the short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each news item further includes:
  • the long-term portrait of the user is obtained as a weighted combination of the user's portraits in each time frame, where short-term portraits closer to the analysis time receive a higher weight.
  • a time equation is used to combine multiple short-term portraits of a user into a long-term portrait by weighting
  • $P_u$ represents the long-term portrait
  • is the constant parameter of the time equation
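The time equation itself is not reproduced in this text, so the sketch below substitutes an exponential decay as one plausible time-sensitive weighting. This is an assumption, not the formula of the application; the function name `long_term_portrait`, the `decay` constant and the sample vectors are hypothetical.

```python
import math


def long_term_portrait(short_term_portraits, decay=0.1):
    """Weighted average of short-term portraits, favouring recent time frames.

    short_term_portraits: list of (age, vector), where age is the number of
    time frames between that portrait and the analysis time (0 = most recent).
    decay: assumed constant parameter of the (exponential) time weighting.
    """
    weights = [math.exp(-decay * age) for age, _ in short_term_portraits]
    total = sum(weights)
    dim = len(short_term_portraits[0][1])
    return [
        sum(w * vec[i] for w, (_, vec) in zip(weights, short_term_portraits)) / total
        for i in range(dim)
    ]


# An old portrait dominated by topic 0 and a recent one dominated by topic 1.
portraits = [(10, [1.0, 0.0]), (0, [0.0, 1.0])]
combined = long_term_portrait(portraits, decay=0.1)
```

With any positive `decay`, the recent portrait contributes more to `combined` than the old one, which matches the stated requirement that portraits closer to the analysis time have a higher weight.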
  • The aforementioned news recommendation method based on the user's short-term interests first constructs a long-term portrait of a given user using time-sensitive weighting, and then analyzes the user's latest reading history to determine the user's short-term preferences.
  • In step S3, the step of clustering the word vector matrix includes:
  • The above method of clustering the word vector matrix first uses a hierarchical agglomerative clustering algorithm to construct a news hierarchy based purely on the content of the news articles, and then uses the Dunn validity index to determine the best cut of the hierarchical dendrogram, which avoids having to decide the number of clusters in advance.
  • The Dunn index is the shortest distance between any two elements belonging to different clusters (inter-cluster distance) divided by the largest distance between two elements within any one cluster (intra-cluster distance). The larger the index, the greater the distance between clusters and the smaller the distance within clusters. The Dunn index is used to decide at which level to cut the dendrogram. After the news groups are obtained, LDA can be used to analyze each group, and the topic of each group can be represented by a topic vector that is matched against the long-term user portrait for group filtering.
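The Dunn index described above can be computed directly from its definition. The sketch below assumes each news item is a point with Euclidean distance and that the candidate cuts of the dendrogram are already available as lists of clusters; the agglomerative clustering itself is not shown, and the names `dunn_index` and `best_cut` are hypothetical.

```python
import math


def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest within-cluster distance.

    clusters: list of clusters, each a non-empty list of coordinate tuples.
    Requires at least two clusters.
    """
    min_between = min(
        math.dist(p, q)
        for a in range(len(clusters))
        for b in range(a + 1, len(clusters))
        for p in clusters[a]
        for q in clusters[b]
    )
    max_within = max(
        (math.dist(p, q) for c in clusters for p in c for q in c),
        default=0.0,
    )
    # If every cluster is a single point the diameter is 0; treat that as unbounded.
    return float("inf") if max_within == 0 else min_between / max_within


def best_cut(candidate_cuts):
    """Pick the dendrogram cut (a partition into clusters) with the largest Dunn index."""
    return max(candidate_cuts, key=dunn_index)


# Two candidate cuts of the same four points.
two_groups = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
three_groups = [[(0, 0)], [(0, 1)], [(5, 0), (5, 1)]]
chosen = best_cut([two_groups, three_groups])
```

In this toy example the two-cluster cut scores higher because its clusters are far apart relative to their diameters, so it is the cut that would be kept.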
  • In step S9, news is selected across different topics by the absorbing random walk method.
  • The absorbing random walk method first chooses an initial point and then, with probability p, jumps randomly to any point on the graph. The remaining probability 1-p is distributed over the adjacent points according to the edge weights. The same probabilities are used at every step to jump either to a random point or to an adjacent point, and the transition matrix is used to calculate the jump probabilities. After several iterations the jump probabilities stabilize, and the news with the highest transition probability is recommended. The random walk then lowers the jump probability of articles of the same type as the recommended article, so that more types of news are selected. In this way, the news recommendation method based on the user's short-term interest described in this application can not only provide news articles relevant to the user's interests, but also broaden the user's preferences by introducing articles on different topics.
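One common way to realize an absorbing random walk is sketched below: the first item is the one with the highest stationary probability, and each already selected item is then turned into an absorbing state so that walks near it end quickly and similar items receive fewer expected visits. This is a simplified illustration on a plain weighted item graph, not necessarily the exact procedure of the application; all function names, the teleport probability `p=0.15` and the sample weights are assumptions.

```python
def transition_matrix(weights, p):
    """Row-stochastic matrix: jump anywhere with prob. p, else follow edge weights."""
    n = len(weights)
    matrix = []
    for row in weights:
        total = sum(row)
        matrix.append([p / n + (1 - p) * (w / total if total else 1 / n) for w in row])
    return matrix


def stationary(matrix, iters=500):
    """Power iteration for the stationary distribution of the walk."""
    n = len(matrix)
    x = [1.0 / n] * n
    for _ in range(iters):
        x = [sum(x[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return x


def expected_visits(matrix, absorbed, tol=1e-12, max_iter=10000):
    """Expected visits to each non-absorbed node before the walk is absorbed."""
    free = [i for i in range(len(matrix)) if i not in absorbed]
    mass = {i: 1.0 / len(free) for i in free}  # uniform start over free nodes
    visits = {i: 0.0 for i in free}
    for _ in range(max_iter):
        if sum(mass.values()) < tol:
            break
        for i in free:
            visits[i] += mass[i]
        mass = {j: sum(mass[i] * matrix[i][j] for i in free) for j in free}
    return visits


def diverse_ranking(weights, k, p=0.15):
    """Select k items: the first by stationary probability, the rest by expected visits."""
    matrix = transition_matrix(weights, p)
    pi = stationary(matrix)
    ranked = [max(range(len(matrix)), key=pi.__getitem__)]
    while len(ranked) < min(k, len(matrix)):
        visits = expected_visits(matrix, set(ranked))
        ranked.append(max(visits, key=visits.get))
    return ranked


# Items 0-2 form one tight topic cluster, items 3-4 another; 0.1 links the clusters.
weights = [
    [0.0, 1.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.8, 0.1, 0.1],
    [1.0, 0.8, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 1.0],
    [0.1, 0.1, 0.1, 1.0, 0.0],
]
ranking = diverse_ranking(weights, k=3)
```

In this example the second pick comes from the other topic cluster rather than from the neighbours of the first pick, which is the diversification effect the paragraph above describes.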
  • Step S9 includes:
  • each user acts as a node, and each news item also acts as a node.
  • the random walk with restart method is used to obtain the correlation value between nodes;
  • the adjacent nodes of each user node form the adjacent set of that user; the correlation values between any two nodes in the adjacent set form the first sub-correlation matrix of the user; the reciprocal of the mean value of the off-diagonal elements of the first sub-correlation matrix is used as the bridging value of the user, and is combined with the bridging values of the user nodes in the adjacent set to form the bridging matrix of the user. For example, for a user node $u_1$ whose adjacent set is $[n_2, n_4, u_3]$, the first sub-correlation matrix of $u_1$ is formed from the pairwise correlation values of $n_2$, $n_4$ and $u_3$, where $r_{23}$ is the correlation value between news node $n_2$ and user node $u_3$; the bridging value $q_1$ of user node $u_1$ is computed from the mean of the off-diagonal elements of this matrix, and the bridging matrix of user node $u_1$ is $[q_1, q_3]$;
  • the correlation values of each user node, and of the user nodes in its adjacent set, with the news nodes in the adjacent set constitute the second sub-correlation matrix of each user. In the above example, the second sub-correlation matrix of user node $u_1$ contains the correlation values of $u_1$ and $u_3$ with the news nodes $n_2$ and $n_4$;
  • the bridging matrix of each user is multiplied by the second sub-correlation matrix to obtain the recommendation value of each news node
  • the news nodes are sorted by recommendation value in descending order, and the set number of top-ranked news items are recommended to the user.
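The bridging computation can be sketched on the example of user node u1 with adjacent set [n2, n4, u3]. The sketch assumes the bridging value is the reciprocal of the mean off-diagonal correlation, and that pairwise correlation values are already known; the data layout, the function names and all numeric values are hypothetical.

```python
from itertools import combinations


def bridging_value(user, neighbours, corr):
    """Reciprocal of the mean pairwise correlation inside the user's adjacent set."""
    pairs = list(combinations(neighbours[user], 2))
    if not pairs:
        return 0.0
    mean = sum(corr[frozenset(pair)] for pair in pairs) / len(pairs)
    return 1.0 / mean if mean else 0.0


def recommend(user, neighbours, corr, is_user, top_n):
    """Multiply the bridging row vector by the user-by-news correlation block."""
    adjacent = neighbours[user]
    users = [user] + [v for v in adjacent if is_user(v)]
    news = [v for v in adjacent if not is_user(v)]
    q = {u: bridging_value(u, neighbours, corr) for u in users}
    scores = {
        n: sum(q[u] * corr[frozenset((u, n))] for u in users)
        for n in news
    }
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return ranked, scores


# Illustrative correlation values (symmetric, keyed by unordered node pair).
corr = {
    frozenset(("n2", "n4")): 0.2,
    frozenset(("n2", "u3")): 0.4,
    frozenset(("n4", "u3")): 0.6,
    frozenset(("n2", "u1")): 0.5,
    frozenset(("n4", "u1")): 0.3,
}
neighbours = {"u1": ["n2", "n4", "u3"], "u3": ["n2", "u1"]}
ranked, scores = recommend("u1", neighbours, corr, lambda v: v.startswith("u"), top_n=2)
```

With these numbers the bridging row vector is approximately [2.5, 2.0] for (u1, u3), and multiplying it by the 2 x 2 block of user-news correlations gives one recommendation value per news node.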
  • The step of using the random walk with restart method to obtain correlation values between nodes includes:
  • iterative processing is performed on the adjacency matrix until it converges, and the elements of the converged matrix are the correlation values between one node and another node.
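A random walk with restart can be sketched as the following fixed-point iteration. The sketch computes the correlation of every node with a single start node by iterating a score vector rather than a whole matrix; repeating it for each start node fills one row of the correlation matrix at a time. The function name, the restart probability of 0.3 and the sample graph are assumptions.

```python
def random_walk_with_restart(adj, start, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `start`.

    adj: square list-of-lists of non-negative edge weights.
    At each step the walker returns to `start` with probability `restart`,
    otherwise it follows an outgoing edge chosen in proportion to its weight.
    """
    n = len(adj)
    out = [sum(row) for row in adj]
    scores = [0.0] * n
    scores[start] = 1.0
    for _ in range(max_iter):
        nxt = [0.0] * n
        nxt[start] = restart  # restart mass (the scores always sum to 1)
        for i in range(n):
            if out[i] == 0:
                # A node without edges sends its remaining mass back to the start.
                nxt[start] += (1 - restart) * scores[i]
                continue
            for j in range(n):
                nxt[j] += (1 - restart) * scores[i] * adj[i][j] / out[i]
        if max(abs(a - b) for a, b in zip(nxt, scores)) < tol:
            return nxt
        scores = nxt
    return scores


# A small chain of four nodes: 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
relevance = random_walk_with_restart(adj, start=0)
```

On this chain the scores fall off with distance from the start node, which is the behaviour expected of a correlation value between nodes.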
  • An embodiment of the present application also proposes a computer non-volatile readable storage medium that includes a news recommendation program based on the user's short-term interest. When the news recommendation program based on the user's short-term interest is executed by a processor, the following steps are implemented:
  • Step S1 Collect user behavior data on news, the behavior data includes a user matrix
  • Step S2 Obtain a corresponding word vector matrix according to the news matrix
  • Step S3 clustering the word vector matrix to obtain a grouping result of each news, and grouping each news into a corresponding news group according to the grouping result;
  • Step S4 Obtain a long-term portrait and a short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news.
  • the long-term portrait and the short-term portrait represent the user's topic values for the words contained in the news.
  • Step S5 Analyze the similarity between the long-term portrait of each user and different newsgroups to obtain multiple first similarities
  • Step S6 sort the plurality of first similarities in descending order, and obtain a first set number of newsgroups corresponding to each user based on the sorting result;
  • Step S7 analyzing the second similarity between the latest short-term portrait of each user and each news in the first set number of newsgroups;
  • Step S8 construct a user-news bipartite graph according to the second similarity
  • Step S9 Use the absorbing random walk method to select recommended news on the user-news bipartite graph, so as to obtain the recommended news of each user.
  • the specific implementation of the computer non-volatile readable storage medium of the present application is substantially the same as the specific implementation of the above-mentioned news recommendation method and device based on the user's short-term interest, and electronic equipment, and will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of data analysis and concerns a news recommendation method and apparatus based on a user's short-term interest, an electronic device, and a storage medium, which make it possible to combine the user's long-term and short-term preferences. The method comprises: collecting a user's behavior data on news (S1); obtaining a word vector matrix corresponding to a news matrix (S2); clustering the word vector matrix to obtain the news group of each news item (S3); obtaining a long-term portrait and a short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each news item (S4); analyzing a first similarity between the long-term portrait of each user and each news group (S5); sorting the news groups of each user in descending order of first similarity and taking a first set number of top-ranked news groups (S6); analyzing a second similarity between the recent short-term portrait of each user and each news item in the first set number of news groups (S7); constructing a user-news bipartite graph according to the second similarity (S8); and selecting recommended news on the bipartite graph using an absorbing random walk method (S9).
PCT/CN2019/103700 2019-05-08 2019-08-30 News recommendation method and apparatus based on a user's short-term interest, electronic device, and medium Ceased WO2020224128A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910379183.5A CN110275952A (zh) 2019-05-08 2019-05-08 基于用户短期兴趣的新闻推荐方法、装置及介质
CN201910379183.5 2019-05-08

Publications (1)

Publication Number Publication Date
WO2020224128A1 true WO2020224128A1 (fr) 2020-11-12

Family

ID=67959845

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103700 Ceased WO2020224128A1 (fr) 2019-05-08 2019-08-30 News recommendation method and apparatus based on a user's short-term interest, electronic device, and medium

Country Status (2)

Country Link
CN (1) CN110275952A (fr)
WO (1) WO2020224128A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883292A (zh) * 2021-02-06 2021-06-01 西北大学 用户行为推荐模型建立及基于时空信息的位置推荐方法
CN114969566A (zh) * 2022-06-27 2022-08-30 中国测绘科学研究院 一种距离度量的政务服务事项协同过滤推荐方法
CN115481236A (zh) * 2022-08-31 2022-12-16 电子科技大学 一种基于用户兴趣建模的新闻推荐方法
CN116738057A (zh) * 2023-06-27 2023-09-12 平安科技(深圳)有限公司 信息推荐方法、装置、计算机设备及存储介质
CN121412462A (zh) * 2025-12-29 2026-01-27 橙客时代(北京)网络科技有限公司 基于公众号私域流量的用户画像生成方法及系统

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733006B (zh) * 2019-10-14 2022-12-02 中国移动通信集团上海有限公司 用户画像的生成方法、装置、设备及存储介质
CN111062757B (zh) * 2019-12-17 2023-09-01 山大地纬软件股份有限公司 基于多路径寻优匹配的信息推荐方法及系统
CN111444428B (zh) * 2020-03-27 2022-08-30 腾讯科技(深圳)有限公司 基于人工智能的信息推荐方法、装置、电子设备及存储介质
CN111680218B (zh) * 2020-06-10 2023-08-11 网易传媒科技(北京)有限公司 用户兴趣识别方法、装置、电子设备及存储介质
CN111680073A (zh) * 2020-06-11 2020-09-18 天元大数据信用管理有限公司 一种基于用户数据的金融服务平台政策资讯推荐方法
CN112633356B (zh) * 2020-12-18 2024-09-10 平安科技(深圳)有限公司 推荐模型的训练方法、推荐方法、装置、设备及存储介质
CN114817753B (zh) * 2022-06-29 2022-09-09 京东方艺云(杭州)科技有限公司 一种艺术画作的推荐方法及装置
CN118051615A (zh) * 2024-02-07 2024-05-17 百果园技术(新加坡)有限公司 用户兴趣画像生成方法、装置、设备、存储介质以及产品
CN119005313B (zh) * 2024-07-23 2025-03-07 中国标准化研究院 一种知识图谱数据分类与管理方法及系统
CN118797014B (zh) * 2024-09-11 2024-11-19 广州方硅信息技术有限公司 聊天机器人应答方法及其装置、设备、介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5782487B2 (ja) * 2013-08-07 2015-09-24 日本電信電話株式会社 行動目的抽出方法及び装置
CN106503014A (zh) * 2015-09-08 2017-03-15 腾讯科技(深圳)有限公司 一种实时信息的推荐方法、装置和系统
CN107133290A (zh) * 2017-04-19 2017-09-05 中国人民解放军国防科学技术大学 一种个性化信息检索方法与装置
CN108197335A (zh) * 2018-03-09 2018-06-22 中国人民解放军国防科技大学 一种基于用户行为个性化查询推荐方法及装置
CN108446350A (zh) * 2018-03-09 2018-08-24 华中科技大学 一种基于主题模型分析与用户长短兴趣的推荐方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8589378B2 (en) * 2010-10-11 2013-11-19 Yahoo! Inc. Topic-oriented diversified item recommendation
CN103116639B (zh) * 2013-02-20 2016-05-11 新浪网技术(中国)有限公司 基于用户-物品二分图模型的物品推荐方法及系统
CN104572797A (zh) * 2014-05-12 2015-04-29 深圳市智搜信息技术有限公司 基于主题模型的个性化服务推荐系统和方法
CN105022840B (zh) * 2015-08-18 2018-06-05 新华网股份有限公司 一种新闻信息处理方法、新闻推荐方法和相关装置
CN105913296B (zh) * 2016-04-01 2020-01-03 北京理工大学 一种基于图的个性化推荐方法
CN108197211A (zh) * 2017-12-28 2018-06-22 百度在线网络技术(北京)有限公司 一种信息推荐方法、装置、服务器和存储介质

Also Published As

Publication number Publication date
CN110275952A (zh) 2019-09-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19927848

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19927848

Country of ref document: EP

Kind code of ref document: A1