WO2024201980A1 - Search system, search method, and information processing device - Google Patents
Search system, search method, and information processing device
- Publication number
- WO2024201980A1 (application PCT/JP2023/013487)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- query
- images
- search
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/532—Query formulation, e.g. graphical querying
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Definitions
- the present invention relates to a search system, a search method, and an information processing device, and in particular to a technique for searching for items according to an input query.
- Patent Document 1 discloses a search system that performs a product search by searching for product data that contains a character string that matches text entered by the user (an example of a query).
- this disclosure provides technology for flexibly accepting query input from a user and obtaining appropriate search results in response to the query.
- a search system includes a query acquisition unit that acquires a query for searching for a desired item, an image generation unit that generates an image using the query and a trained machine learning model, and a search unit that searches for an item based on the image.
- a search method includes a query acquisition step of acquiring a query for searching for a desired item, an image generation step of generating an image using the query and a trained machine learning model, and a search step of searching for an item based on the image.
- An information processing device includes a query acquisition unit that acquires a query for searching for a desired item, an image generation unit that generates a plurality of images using the query and a trained diffusion model, and a feature vector acquisition unit that acquires a feature vector for searching for the item based on the plurality of images.
- FIG. 1 shows an example of the configuration of an information processing system.
- FIG. 2 shows an example of the functional configuration of the information processing device.
- FIG. 3 shows an example of the functional configuration of the search device.
- FIG. 4 is a diagram for explaining an image generation model in the learning stage.
- FIG. 5 shows an example of the hardware configuration of the information processing device and the search device according to the embodiment.
- FIG. 6A is a diagram for explaining the procedure for generating a first type of image.
- FIG. 6B is a diagram for explaining the procedure for generating an image of the third type.
- FIG. 7 is a sequence diagram showing the flow of processing executed in the information processing system.
- FIG. 8A shows an example of an image generated by an information processing device.
- FIG. 8B shows an example of a search result page generated by the search device.
- FIG. 1 shows an example of the configuration of an information processing system 1 according to this embodiment.
- the information processing system 1 includes a user device 10 and a search system 20.
- the search system 20 includes an information processing device 21 and a search device 22.
- the user device 10 transmits a search query for a desired item (hereinafter also simply referred to as a query) to the search system 20 in response to a user's operation, and the search system 20, having received the query, operates to transmit search results that match the query to the user device 10.
- the user device 10 is, for example, a device such as a smartphone or a tablet, and is configured to be able to communicate with the search system 20 (i.e., an information processing device 21 and a search device 22) via a public network such as LTE (Long Term Evolution) or a wireless communication network such as a wireless LAN (Local Area Network).
- the user device 10 has a display unit (display surface) such as a liquid crystal display, and a user of the user device 10 can perform various operations using a GUI (Graphical User Interface) provided on the liquid crystal display.
- the operations include various operations on content such as images displayed on the screen, such as tapping, sliding, and scrolling using a finger or a stylus.
- the user device 10 is not limited to the device shown in Fig. 1, but may be a device such as a desktop PC (Personal Computer) or a notebook PC. In this case, the user may operate the device using an input device such as a mouse or a keyboard.
- the user device 10 may also
- the search device 22 is a server device that provides an electronic commerce site (EC site).
- the user of the user device 10 can access the EC site provided by the search device 22, input a query to search for a desired item in a search window (input field) on the site, and confirm the query.
- the user device 10 transmits the query input by the user to the search system 20.
- the information processing device 21 receives the query transmitted from the user device 10 and generates one or more images based on the query. Alternatively, the information processing device 21 may generate one or more images from information other than the query.
- the information processing device 21 acquires a feature vector (hereinafter also referred to as an image feature) that represents the features of the one or more images, and outputs the image feature to the search device 22.
- the search device 22 acquires the image feature acquired by the information processing device 21, and generates a web page (search result page) that displays the search results based on the image feature.
- the search device 22 transmits (distributes) the generated search result page to the user device 10.
- the information processing device 21 and the search device 22 are configured as separate devices, but they may be configured as a single information processing device.
- the query input by the user of the user device 10 may be text (e.g., text information describing an item), tags indicating attributes (e.g., the item's color, brand, category, genre, size, target gender (female, male, unisex, etc.)) (hereinafter also referred to as attribute tags), or images (e.g., images representing the item).
- the query may be a combination of two or more of text, attribute tags, and images.
- the image serving as a query may be an incomplete image, such as an image with missing parts, a blurry image, or a portion of an image.
- a query input by a user of the user device 10 is linked to information that identifies the user device 10 and information on the date and time of input. Furthermore, the query may be linked to characteristics of the user device 10 or the user (hereinafter also referred to as user characteristics).
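The query structure described above (text, attribute tags, and images in any combination, linked to device identification and the input date and time) could be modeled roughly as follows. This is a minimal sketch: the class name `Query` and all field names are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Query:
    """Hypothetical container for a search query; all names are illustrative."""
    device_id: str                                   # identifies the user device 10
    text: Optional[str] = None                       # text describing the item
    attribute_tags: Dict[str, str] = field(default_factory=dict)  # e.g. color, brand
    image_bytes: Optional[bytes] = None              # may be an incomplete image
    entered_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                                                # date and time of input
    user_characteristics: Dict[str, str] = field(default_factory=dict)

    def modalities(self) -> List[str]:
        """Return which of text / attribute tags / image this query carries."""
        present = []
        if self.text:
            present.append("text")
        if self.attribute_tags:
            present.append("attribute_tags")
        if self.image_bytes:
            present.append("image")
        return present


q = Query(device_id="dev-1", text="black T-shirt", attribute_tags={"color": "black"})
print(q.modalities())
```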
- the user characteristics may include demographic information, that is, user attributes such as gender, age, residential area, occupation, and family structure.
- the items to be searched for can be tangible or intangible goods and services that can be provided in relation to various services.
- in digital content services, items include video content such as movies and animations, and still image content such as photographs and illustrations.
- in e-commerce services, items include the intangible or tangible goods handled in online shopping.
- in travel services, items include information on and reservations for hotels, package tours, and transportation.
- in mobile services, items include mobile devices, public network/Internet connections, and communication usage fees.
- in sports and cultural services, items include events such as sporting events and concerts, and goods sold at those events.
- [Functional configuration of information processing device 21] FIG. 2 shows an example of the functional configuration of the information processing device 21 according to the present embodiment.
- the information processing device 21 includes a query acquisition unit 101, an image generation unit 102, a classifier selection unit 103, an image feature extraction unit 104, an output unit 105, and a learning model storage unit 110.
- the learning model storage unit 110 stores an image generation model 111, a classifier group 112 including a plurality of classifiers (image recognition models), and an image feature extraction model 113, which are learned machine learning models.
- the query acquisition unit 101 acquires a query sent from the user device 10 by user operation.
- the query may be text, attribute tags, or an image (e.g., an image representing an item).
- the query may be a combination of two or more of text, attribute tags, and images. Note that the query is not limited to these, and may be any information that the user device 10 can send to search for a desired item.
- the image generation unit 102 generates one or more images based on the query acquired by the query acquisition unit 101.
- the image generation unit 102 may generate one or more images based on a classifier described below without the query.
- the image generation unit 102 generates an image using a trained image generation model 111 stored in the learning model storage unit 110.
- FIG. 4 is a diagram for explaining the image generation model 111 in the learning stage.
- the image generation model 111 corresponds to a diffusion model in the machine learning field.
- the diffusion model is a model that generates an image by gradually removing noise from a noisy image; Stable Diffusion is one known example.
- the image generation model 111 is configured to also reflect the query information, so it can be called a diffusion model conditioned on the query.
- GAN (Generative Adversarial Network)
- an image 41 (a reference image for learning) and a query 42 are used as input data.
- the image 41 is input to an image encoder 401 in the image generation model 111.
- a VAE (Variational Autoencoder)
- the image encoder 401 converts the image 41 (pixel image) into a latent representation (latent image embedding representation) in a latent space. This embeds a high-dimensional image into a low-dimensional latent space, reducing the load on computational processing.
- noise (e.g., Gaussian noise) is then repeatedly added to the latent representation produced by the image encoder 401 over multiple time steps in a diffusion process 402, generating a noise image 403.
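The repeated noise addition in the diffusion process 402 can be illustrated with the closed-form noising step commonly used in DDPM-style diffusion models. This is a sketch under assumptions: the source does not specify the noise schedule, so the linear `betas` schedule, the latent shape, and the function name `add_noise` are illustrative choices only.

```python
import numpy as np


def add_noise(latent, t, betas, rng):
    """Noise a latent up to time step t (0-indexed) in a single jump.

    Under a DDPM-style schedule, adding Gaussian noise step by step for
    t + 1 steps is equivalent in distribution to
        sqrt(alpha_bar_t) * latent + sqrt(1 - alpha_bar_t) * eps.
    """
    alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal retention
    eps = rng.standard_normal(latent.shape)   # Gaussian noise
    noisy = np.sqrt(alphas_bar[t]) * latent + np.sqrt(1.0 - alphas_bar[t]) * eps
    return noisy, eps


betas = np.linspace(1e-4, 0.02, 1000)         # assumed linear schedule
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8))       # stand-in for the encoder 401 output
noisy, eps = add_noise(latent, 999, betas, rng)
```

At the final step almost none of the original latent remains, which corresponds to the noise image 403 used as the starting point for noise removal.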
- a U-Net may be used for noise removal in the de-diffusion process 404.
- the U-Net is a type of FCN (Fully Convolutional Network), and is configured to include multiple blocks including a convolution layer and multiple blocks including an attention layer.
- the de-diffusion process 404 also uses information on the query 42 input to the image generation model 111. Specifically, a feature vector (embedding vector) embedded in a feature space is extracted from the query 42 by the query encoder 406.
- the feature vector extracted by the query encoder 406 represents the features of the query 42, and is hereinafter also referred to as a query feature.
- the query encoder 406 is a trained model (transformer) configured to extract query features from the query 42.
- CLIP (Contrastive Language-Image Pre-training)
- the de-diffusion process 404 derives noise to be removed from the input noise image 403 based on the query feature obtained by the query encoder 406, and subtracts the derived noise from the noise image 403.
- when a U-Net is used in the de-diffusion process 404, the query feature obtained by the query encoder 406 is input to each of the plurality of blocks including an attention layer, and the noise is derived.
- the output obtained by subtracting the derived noise from the noise image 403 is finally converted into an image in the image decoder 405, and the image 43 is generated. Thereafter, the error (difference) between the image 43 (generated image) and the image 41 (teacher image) is calculated, and the image generation model 111 is trained so that the error is reduced. For example, the mean absolute error (MAE) or the mean square error (MSE) is calculated from the pixel values of the image 43 and the image 41, and the image generation model 111 is trained so that the error is reduced.
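The two error measures named above can be computed from pixel values as in the following sketch. The tiny 2x2 arrays stand in for image 41 (teacher image) and image 43 (generated image) and are made-up values for illustration.

```python
import numpy as np


def mae(generated, reference):
    """Mean absolute error over all pixel values."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    diff = generated.astype(np.float64) - reference.astype(np.float64)
    return float(np.mean(np.abs(diff)))


def mse(generated, reference):
    """Mean square error over all pixel values."""
    diff = generated.astype(np.float64) - reference.astype(np.float64)
    return float(np.mean(diff ** 2))


image_41 = np.array([[0, 128], [255, 64]], dtype=np.uint8)   # teacher image
image_43 = np.array([[2, 126], [255, 60]], dtype=np.uint8)   # generated image
print(mae(image_43, image_41), mse(image_43, image_41))      # 2.0 6.0
```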
- the image 41 may be at least a part of all images representing items that can be handled on the EC site provided by the search device 22.
- the image 41 is not limited to images representing items, and may be or may include images that are not directly related to items that can be handled on the EC site.
- the learning process described above (noise image generation followed by image generation) is performed on the at least a part of the images.
- the learned image generation model 111 is stored in the learning model storage unit 110.
- the image generation model 111 is further learned accordingly and stored in the learning model storage unit 110.
- the image generation model 111 may be learned by a predetermined functional block of the information processing device 21 and stored in the learning model storage unit 110, or may be learned by an external device and stored in the learning model storage unit 110.
- the image generation unit 102 generates one or more images using the image generation model 111 learned according to the above-mentioned flow.
- the image generation procedure by the image generation unit 102 will be described later.
- the classifier selection unit 103 selects a classifier to be used from a classifier group 112 that includes multiple classifiers stored in the learning model storage unit 110.
- the multiple classifiers included in the classifier group 112 are prepared in advance, and each classifier is a trained classifier (image recognition model) configured to classify images.
- the classifier group 112 and the selection of a classifier will be described later.
- the image feature extraction unit 104 uses the trained image feature extraction model 113 stored in the learning model storage unit 110 to extract a feature vector (embedded vector) embedded in a feature space from each of one or more images generated by the image generation unit 102.
- the feature vector extracted by the image feature extraction unit 104 corresponds to an image feature that represents the characteristics of each of the images.
- the image feature extraction model 113 is a learning model that has been trained to extract and output image features of an input image.
- the image feature extraction unit 104 inputs the image generated by the image generation unit 102 into the image feature extraction model 113 and extracts the image features of the image.
- the output unit 105 outputs the image features of each of the one or more images generated by the image generation unit 102, which are obtained by the image feature extraction unit 104, to the search device 22. Furthermore, the output unit 105 may output image data including one or more images generated by the image generation unit 102 to the search device 22. Furthermore, the output unit 105 may be configured to output (present) the image data including one or more images generated by the image generation unit 102 to the user device 10.
- FIG. 3 shows an example of the functional configuration of the search device 22 according to the present embodiment.
- the search device 22 is composed of an image feature acquisition unit 201, a search unit 202, an output unit 203, and a search database 210.
- the learning model storage unit 210 stores a learned feature extraction model 211.
- the search database 210 stores data related to items (hereinafter also referred to as item data) including images representing each item. Each item data is assigned identification information that identifies each item, and is linked to the image features of each image. Each item data may include attribute information (related information) such as the price and description of each item.
- the item data stored in the search database 210 may be updated according to the addition and/or deletion of items in the EC site provided by the search device 22, or the addition and/or deletion of search conditions.
- the image feature acquisition unit 201 acquires image features (feature vectors) generated by the information processing device 21. That is, the image feature acquisition unit 201 acquires image features representing the features of one or more images generated in the information processing device 21 based on a query transmitted from the user device 10. Alternatively, the image feature acquisition unit 201 may acquire image features representing the features of one or more images generated in the information processing device 21 based on a selected classifier. In addition to this, or instead of this, the image feature acquisition unit 201 may acquire image data including one or more images generated by the information processing device 21.
- the search unit 202 performs an item search based on the image features acquired by the image feature acquisition unit 201. For example, the search unit 202 searches the search database 210 using the image features as search criteria, and acquires item data having the same or similar features as the image features as search results. Furthermore, the search unit 202 creates (constructs) a web page (search result page) that includes the acquired item data.
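Searching for item data with "the same or similar features" is often implemented as a nearest-neighbor lookup over feature vectors. The sketch below uses cosine similarity, which is an assumption: the source does not name a similarity measure. Taking the best score over several generated images is likewise an illustrative choice, and `search_items` and the sample vectors are hypothetical.

```python
import numpy as np


def search_items(query_features, item_features, item_ids, top_k=3):
    """Rank items by cosine similarity to the generated-image features.

    query_features: (n_images, d) or (d,) features of the generated images
    item_features:  (n_items, d) features linked to the stored item data
    """
    q = np.atleast_2d(np.asarray(query_features, dtype=np.float64))
    items = np.asarray(item_features, dtype=np.float64)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    sims = q @ items.T                 # (n_images, n_items) cosine similarities
    best = sims.max(axis=0)            # best match over all generated images
    order = np.argsort(-best)[:top_k]  # highest similarity first
    return [(item_ids[i], float(best[i])) for i in order]


item_ids = ["item-a", "item-b", "item-c"]
item_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = search_items([1.0, 0.0], item_features, item_ids, top_k=2)
print(results)
```

A production system would typically replace the dense matrix product with an approximate nearest-neighbor index once the item catalog becomes large.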
- the search unit 202 may extract image features from the image acquired by the image feature acquisition unit 201, and search the search database 210 using the image features as search conditions to acquire item data.
- the search device 22 has a trained image feature extraction model 113.
- the search unit 202 may extract and acquire image features from the image acquired from the image feature acquisition unit 201 using the image feature extraction model 113, and may search the search database 210 using the image features as search conditions to acquire item data.
- the search unit 202 may create (configure) a search result page including the acquired item data.
- the output unit 203 transmits (distributes) the search result page created by the search unit 202 to the user device 10.
- the search result page is response data to the query transmitted from the user device 10, and can be displayed on the display unit of the user device 10.
- FIG. 5 is a block diagram showing an example of a hardware configuration of an information processing device 21 according to the present embodiment.
- the information processing device 21 according to this embodiment can be implemented on any single or multiple computers, mobile devices, or any other processing platform. In FIG. 5, the information processing device 21 is illustrated as being implemented in a single computer, but the information processing device 21 according to the present embodiment may be implemented in a computer system including multiple computers. The multiple computers may be connected to each other via a wired or wireless network so as to be able to communicate with each other.
- the information processing device 21 may include a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, a RAM 503, a HDD (Hard Disk Drive) 504, an input unit 505, a display unit 506, a communication I/F 507, a GPU (Graphics Processing Unit) 508, and a system bus 509.
- the information processing device 21 may also include an external memory.
- the CPU 501 generally controls the operation of the information processing device 21, and controls each of the components (502 to 508) via a system bus 509 which is a data transmission path.
- the ROM 502 is a non-volatile memory that stores a control program and the like necessary for the CPU 501 to execute processing.
- the program may be stored in a non-volatile memory such as the HDD 504 or an SSD (Solid State Drive) or an external memory such as a removable storage medium (not shown).
- the RAM (Random Access Memory) 503 is a volatile memory and functions as a main memory, a work area, etc. of the CPU 501. That is, when executing a process, the CPU 501 loads necessary programs, etc. from the ROM 502 into the RAM 503 and executes the programs, etc. to realize various functional operations.
- the RAM 503 may include the learning model storage unit 110 shown in FIG. 2.
- the HDD 504 stores, for example, various data and various information required when the CPU 501 performs processing using a program.
- the HDD 504 also stores, for example, various data and various information obtained when the CPU 501 performs processing using a program.
- the input unit 505 is composed of a keyboard and a pointing device such as a mouse.
- the display unit 506 is configured with a monitor such as a liquid crystal display (LCD), etc.
- the display unit 506 may be configured in combination with the input unit 505 to function as a GUI (Graphical User Interface).
- the communication I/F 507 is an interface that controls communication between the information processing device 21 and an external device.
- the communication I/F 507 provides an interface with a network and executes communication with an external device via the network.
- Various data, various parameters, etc. are transmitted and received between the external device and the communication I/F 507.
- the communication I/F 507 may execute communication via a wired LAN (Local Area Network) or a dedicated line that complies with a communication standard such as Ethernet (registered trademark).
- the network that can be used in this embodiment is not limited to this, and may be configured as a wireless network.
- This wireless network includes wireless PANs (Personal Area Networks) such as Bluetooth (registered trademark), ZigBee (registered trademark), and UWB (Ultra Wide Band). It also includes wireless LANs (Local Area Networks) such as Wi-Fi (Wireless Fidelity) (registered trademark) and wireless MANs (Metropolitan Area Networks) such as WiMAX (registered trademark). It also includes wireless WANs (Wide Area Networks) such as 4G and 5G. Note that the network only needs to be able to connect devices to each other and communicate with each other, and the communication standard, scale, and configuration are not limited to those described above.
- the GPU 508 is a processor specialized for image processing. The GPU 508 performs, for example, image generation processing according to this embodiment in cooperation with the CPU 501. Note that the GPU 508 may perform processing other than image processing.
- each element of the information processing device 21 shown in FIG. 2 can be realized by the CPU 501 executing a program. However, at least some of the functions of each element of the information processing device 21 shown in FIG. 2 may be operated as dedicated hardware. In this case, the dedicated hardware operates under the control of the CPU 501.
- FIG. 6A is a diagram for explaining a procedure of the first type of image generation.
- the image generation unit 102 uses a trained image generation model 111 to generate one or more images based on a query transmitted from the user device 10 and acquired by the query acquisition unit 101.
- the image generation unit 102 inputs the query 61 acquired by the query acquisition unit 101 to the trained image generation model 111.
- the query 61 is converted into a query feature by the query encoder 406 and input to the de-diffusion process 404.
- the de-diffusion process 404 derives noise to be removed from the noise image 403 based on the query feature acquired by the query encoder 406, and subtracts the derived noise from the noise image 403.
- the output obtained by subtracting the derived noise from the noise image 403 is finally converted into an image 62 in the image decoder 405 and output. By performing such processing on different noise images 403, multiple images including the image 62 are generated.
- generating an image 62 may mean generating multiple images including the image 62 by the above-described procedure.
- the image generation unit 102 acquires the query 61 input by the user of the user device 10 to search for a desired item, and generates one or more images based on the query 61 and the trained image generation model 111.
- in this image generation, using the query feature (feature vector) generated by the query encoder 406, which indicates the features of the query 61, makes it possible to generate one or more images based on the query 61.
- the image serving as query 61 may be an incomplete image, such as an image with missing parts, an unclear image, or a portion of an image.
- the missing parts are masked in the image, and noise is added to the areas other than the masked parts.
- the generated noise image and the masked parts are then combined into an image, and noise is removed from the image by a de-diffusion process, thereby generating a more complete image in which the missing parts are predicted (inpainted).
- the second type of image generation is image generation by sampling.
- the de-diffusion process 404 generates a large amount of output from a single query feature obtained by the query encoder 406 by sampling.
- the output is converted into a plurality of images including the image 62 in the image decoder 405 and output. This allows a large amount of images to be generated.
- the large amount of images corresponds to images forming a distribution of the single query 61.
- FIG. 6B is a diagram for explaining the procedure of the third type of image generation.
- the image generation unit 102 generates one or more images using a trained image generation model 111 including a classifier. Specifically, the image generation unit 102 generates one or more images using the trained image generation model 111 and gradients from the classifier. This type of image generation corresponds to guided image generation by a classifier.
- a classifier selected by the classifier selection unit 103 from the classifier group 112 stored in the learning model storage unit 110 is used.
- Each classifier included in the classifier group 112 is a trained classifier (image recognition model) configured to classify images having a predetermined characteristic.
- Each classifier is assigned with identification information that identifies the classifier.
- each classifier is associated with one or more characteristics (keywords) related to the image to be classified.
- this type of image generation differs from the first type of image generation described above (see FIG. 6A) in that the gradient output from the classifier 601 is input to the de-diffusion process 404.
- the classifier 601 is selected from the classifier group 112 by the classifier selection unit 103.
- the classifier selection unit 103 can select a classifier in the following procedure.
- the classifier selection unit 103 may analyze the query 61, extract features included in the query 61, and select a classifier having the features.
- the classifier selection unit 103 may extract (predict) features such as an item name or an item genre included in the text, and select a classifier associated with the features.
- the classifier selection unit 103 may extract the features on a rule basis or by using machine learning (e.g., the query encoder 406).
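A rule-based selection of the kind mentioned above could look like the following sketch. The registry contents, the classifier identifiers, and the keyword-overlap rule are all hypothetical; the source only states that each classifier is associated with one or more characteristics (keywords).

```python
import re

# Hypothetical registry: classifier identification info -> associated keywords.
CLASSIFIER_KEYWORDS = {
    "clf_apparel": {"t-shirt", "shirt", "dress", "jacket"},
    "clf_food": {"cake", "coffee", "noodles"},
}


def select_classifier(query_text, registry=CLASSIFIER_KEYWORDS):
    """Return the id of the classifier whose keywords best match the query text.

    Returns None when no keyword matches, in which case the caller could
    fall back to generation without classifier guidance.
    """
    tokens = set(re.findall(r"[a-z0-9\-]+", query_text.lower()))
    best_id, best_hits = None, 0
    for clf_id, keywords in registry.items():
        hits = len(tokens & keywords)   # number of shared keywords
        if hits > best_hits:
            best_id, best_hits = clf_id, hits
    return best_id


print(select_classifier("black and white striped V-neck T-shirt"))
```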
- the classifier selection unit 103 may select a classifier according to a user instruction 63.
- the user may be a user of the user device 10.
- the classifier selection unit 103 may select a classifier corresponding to the identification information.
- the classifier selection unit 103 may select a classifier associated with the feature indicated by the text.
- the classifier selection unit 103 may extract (predict) the feature indicated by the text and select a classifier associated with the feature.
- the classifier selection unit 103 may extract the feature on a rule basis or by using machine learning.
- User instructions 63 for selecting a classifier may be included in the query 61 or may be sent from the user device 10 separately from the query 61 .
- the classifier selection unit 103 may select a classifier according to preset setting information 64.
- the setting information 64 may be preset in the classifier selection unit 103, or may be set from an external device.
- the classifier selection unit 103 may select a classifier according to the type of webpage.
- the classifier selection unit 103 may select a classifier associated with characteristics related to food.
- the selected classifier 601 outputs the gradient from the classifier 601 as a weight to the de-diffusion process 404 so that there is a high probability that an image that can be classified by the classifier 601 is generated.
- the de-diffusion process 404 removes noise from the noise image 403 according to the query features output from the query encoder 406 and the gradient output from the classifier 601 to generate an image 62. This allows the image generation unit 102 to generate images limited to those that can be classified by the selected classifier 601.
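One common way to feed a classifier gradient into noise removal is the classifier-guidance formulation from the diffusion-model literature, in which the predicted noise is shifted along the gradient of the classifier's log-probability. The sketch below assumes that formulation; the source does not state the exact update rule, and the logistic classifier, `scale`, and all function names are illustrative stand-ins for the classifier 601 and the de-diffusion process 404.

```python
import numpy as np


def log_prob_grad(x, w, b):
    """Gradient of log p(y=1 | x) for a toy logistic classifier.

    p(y=1 | x) = sigmoid(w @ x + b), so the gradient of its logarithm
    with respect to x is (1 - sigmoid(w @ x + b)) * w.
    """
    z = float(w @ x + b)
    return (1.0 - 1.0 / (1.0 + np.exp(-z))) * w


def guided_noise(eps_pred, grad, alpha_bar_t, scale=1.0):
    """Shift the predicted noise so that denoising favors the target class.

    eps_hat = eps_pred - sqrt(1 - alpha_bar_t) * scale * grad
    """
    return eps_pred - np.sqrt(1.0 - alpha_bar_t) * scale * grad


x_t = np.zeros(2)                 # stand-in for the current noisy latent
w, b = np.array([1.0, 0.0]), 0.0  # toy classifier parameters
eps_pred = np.zeros(2)            # stand-in for the query-conditioned U-Net output
eps_hat = guided_noise(eps_pred, log_prob_grad(x_t, w, b), alpha_bar_t=0.75)
print(eps_hat)
```

Subtracting `eps_hat` instead of `eps_pred` moves the denoised result in the direction that raises the classifier's probability, which matches the stated goal of making images that the selected classifier can classify more likely.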
- the image generating unit 102 may remove noise from the noise image 403 according to the gradient from the classifier 601 without the query 61, to generate the image 62.
- the classifier selecting unit 103 may select the classifier 601 according to a user instruction 63 transmitted from the user device 10, and the image generating unit 102 may remove noise from the noise image 403 according to the gradient of the classifier 601, to generate the image 62.
- the classifier selecting unit 103 may select the classifier 601 based on the setting information 64 at a predetermined timing, and the image generating unit 102 may remove noise from the noise image 403 according to the gradient from the classifier 601, to generate the image 62.
- the image generation unit 102 generates one or more images based on the query 61 input by the user of the user device 10 to search for a desired item, the gradient from the classifier 601, and the trained image generation model 111.
- the classifier 601 can be selected by the classifier selection unit 103 based on the query 61, the user instruction 63, or the setting information 64. This allows the image generation unit 102 to generate an image that can be classified by the selected classifier 601, i.e., an image that is the target of the classifier 601. Also, in this type of image generation, the image generation unit 102 can generate an image according to the classifier selected based on the user instruction 63 or the setting information 64, even without the query 61.
- FIG. 7 is a sequence diagram showing a process flow executed in the information processing system 1.
- the information processing system 1 is composed of a user device 10, and a search system 20 including an information processing device 21 and a search device 22.
- a process flow will be described when a query for searching for a desired item is transmitted from the user device 10 in response to an operation by a user.
- the query acquisition unit 101 of the information processing device 21 receives and acquires the query (S701).
- the query may be text (e.g., text information describing an item) or attribute tags indicating attributes of the item (e.g., its color, brand, category, genre, size, and target gender (female, male, unisex, etc.)).
- the query may be a combination of two or more of text, attribute tags, and images.
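One way to picture such a multi-part query is as a small container with optional fields. The sketch below is hypothetical: the class name `Query` and its fields `text`, `attribute_tags`, and `image` are not taken from the source and only illustrate that any one modality, or a combination of two or more, may be present.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Query:
    """Hypothetical container for a query; field names are illustrative."""
    text: Optional[str] = None
    attribute_tags: dict = field(default_factory=dict)  # e.g. {"color": "black"}
    image: Optional[bytes] = None                       # raw bytes, possibly an incomplete image

    def modalities(self) -> list:
        """Return the names of the modalities that are actually present."""
        present = []
        if self.text:
            present.append("text")
        if self.attribute_tags:
            present.append("attribute_tags")
        if self.image:
            present.append("image")
        return present

    def validate(self) -> None:
        """A query must carry at least one modality."""
        if not self.modalities():
            raise ValueError("query must contain text, attribute tags, or an image")


q = Query(text="black and white striped V-neck T-shirt",
          attribute_tags={"category": "T-shirt", "gender": "unisex"})
q.validate()
print(q.modalities())
```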
- the image generation unit 102 of the information processing device 21 generates a plurality of images based on the query and the trained image generation model 111 (S702). The image generation process is as described above.
- the image feature extraction unit 104 of the information processing device 21 extracts and acquires image features (feature vectors) of each of the generated images using the trained image feature extraction model 113 (S703).
- the output unit 105 of the information processing device 21 outputs the image features to the search device 22 (S704).
- FIG. 8A shows an example of images generated in S702 for a query sent from the user device 10 in S701.
- FIG. 8A shows an image group 81 generated by the image generation unit 102 of the information processing device 21 when the query 80 sent from the user device 10 in S701 is the text "black and white striped V-neck T-shirt.”
- the image group 81 includes four images (images 811, 812, 813, and 814).
- the query 80 is composed of a relatively long sentence (a long query without breaks), but in the image generation model 111, the query encoder 406 extracts the query features of the query 80 and then generates images reflecting the query features, making it possible to generate an image group 81 that takes into account the features of the query 80.
- the image feature acquisition unit 201 of the search device 22 acquires the image feature (S704).
- the search unit 202 of the search device 22 performs an item search using the search database 210 based on the image feature (S705).
- the search unit 202 searches the search database 210 using the image feature as a search condition, and acquires item data having the same or similar features as the image feature as the search result.
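A minimal sketch of such a feature-based search follows. It assumes cosine similarity as the similarity measure and a plain dictionary as a stand-in for the search database 210; the source does not specify either, and the names `cosine_similarity`, `search_items`, and `top_k` are illustrative. Each item is scored by its best match against any of the image features obtained from the generated images.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_items(image_features, database, top_k=2):
    """Return item IDs ranked by their best similarity to any query feature.

    image_features: non-empty list of feature vectors, one per generated image.
    database: dict mapping item ID -> stored image feature vector.
    """
    best = {}
    for item_id, item_feature in database.items():
        best[item_id] = max(cosine_similarity(f, item_feature)
                            for f in image_features)
    ranked = sorted(best, key=lambda item_id: best[item_id], reverse=True)
    return ranked[:top_k]


database = {
    "item-001": [0.9, 0.1, 0.0],
    "item-002": [0.0, 1.0, 0.0],
    "item-003": [0.1, 0.0, 0.9],
}
features = [[1.0, 0.0, 0.0], [0.0, 0.8, 0.2]]
results = search_items(features, database)
```

Taking the maximum over the image features is one design choice among several; averaging the features before searching, or merging per-image result lists, would also fit the description.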
- the search unit 202 creates (constructs) a search result page including the acquired item data (S706).
- the output unit 203 of the search device 22 transmits (distributes) the created search result page to the user device 10 (S707).
- FIG. 8B shows an example of a search result page created in S706 for the query sent in S701.
- FIG. 8B shows a search result page 82 including item data searched based on image features extracted from images 811-814 included in the image group 81 generated for the query 80 shown in FIG. 8A.
- the item data 821 may include an image 822 representing an item and attribute information 823 such as the price and description of the item.
- since the search result page 82 includes item data searched based on image features extracted from each of the images 811-814 included in the image group 81, it is composed of item data reflecting features explicitly and/or implicitly included in the images 811-814.
- the search result page 82 is sent to the user device 10 and displayed on the display unit of the user device 10. The user of the user device 10 can receive the search result page 82 as a search result for the query 80.
- the search system 20 accepts a query for a desired item from a user, generates one or more images that reflect the characteristics of the query, and generates search results based on the characteristics of the images, and provides them to the user.
- the query is not limited to text, but can be any form of query, such as attribute tags or images (including incomplete images). This enables the search system 20 to more flexibly accept query inputs from users and provide the user with appropriate search results in accordance with the query, which can improve customer satisfaction on EC sites and lead to increased purchasing motivation.
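The overall flow summarized above (query, image generation, feature extraction, search) can be sketched as a simple composition of functions. Everything here is a stand-in: `run_search` and the three `fake_*` helpers are hypothetical and do not represent the image generation model 111, the image feature extraction model 113, or the search unit 202. They only show how the stages hand data to one another.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_search(query: str,
               generate_images: Callable[[str], Sequence[object]],
               extract_feature: Callable[[object], Vector],
               search: Callable[[List[Vector]], List[str]]) -> List[str]:
    """Query -> generated images -> image features -> item IDs."""
    images = generate_images(query)                        # corresponds to S702
    features = [extract_feature(img) for img in images]    # corresponds to S703
    return search(features)                                # corresponds to S704-S705


# Stand-in components (hypothetical; not the models of the embodiment).
def fake_generate(query):
    return [f"{query}#{i}" for i in range(4)]


def fake_extract(image):
    return [float(len(image)), float(image.endswith("0"))]


def fake_search(features):
    return [f"item-{i}" for i, _ in enumerate(features)]


page = run_search("black and white striped V-neck T-shirt",
                  fake_generate, fake_extract, fake_search)
```

Passing the stages in as callables keeps the sketch independent of any particular model or database, which matches the split between the information processing device 21 and the search device 22.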
- FIG. 7 describes the processing flow when a query is sent from the user device 10.
- the information processing device 21 may generate an image based on the query from the user device 10 and the gradient from the selected classifier, as described with reference to FIG. 6B (S702).
- the classifier selection unit 103 of the information processing device 21 may select a classifier based on the query received in S701.
- the classifier selection unit 103 may select a classifier based on a user instruction (user instruction 63 in FIG. 6B) or setting information (setting information 64).
- the information processing device 21 may generate an image based on the gradient from the selected classifier without a query from the user device 10, as described with reference to FIG. 6B (S702).
- the classifier selection unit 103 may select a classifier based on a user instruction (user instruction 63 in FIG. 6B) or setting information (setting information 64).
- the information processing device 21 can generate an image regardless of the timing of receiving a query, and in response, the search device 22 can create a search result page based on the generated image (S705, S706) and transmit it to the user device 10 (S707).
- the search result page transmitted to the user may include items that match the potential needs of the user, and therefore may be an effective push-type advertisement.
- the information processing device 21 may be configured to transmit the generated image group 81 to the user device 10 in order to present it to the user.
- the user who viewed the image group 81 may operate the user device 10 to delete unintended images from the image group 81 and transmit one or more remaining images to the information processing device 21, and the information processing device 21 may acquire image features of the remaining images and transmit them to the search device 22.
- the user who viewed the image group 81 may operate the user device 10 to identify one image that matches the user's intention from the image group 81 and transmit it to the information processing device 21, and the information processing device 21 may acquire the image features of the identified image and transmit them to the search device 22. This may bring the search results closer to the results intended by the user.
- a search system comprising: a query acquisition unit that acquires a query for searching for a desired item; an image generation unit that generates an image using the query and a trained machine learning model; and a search unit that searches for an item based on the image.
- the search system described in [5] further includes a selection unit that selects the classifier from a plurality of pre-prepared, trained classifiers configured to classify images.
- a search system according to any one of [1] to [6], wherein the query includes any one of text, tags indicating attributes, and images, or a combination of two or more of text, tags indicating attributes, and images.
- a search method including a query acquisition step of acquiring a query for searching for a desired item, an image generation step of generating an image using the query and a trained machine learning model, and a search step of searching for an item based on the image.
- An information processing device comprising: a query acquisition unit that acquires a query for searching for a desired item; an image generation unit that generates a plurality of images using the query and a trained diffusion model; and a feature vector acquisition unit that acquires a feature vector for searching for an item based on the plurality of images.
- An information processing method including: a query acquisition step of acquiring a query for searching for a desired item; an image generation step of generating a plurality of images using the query and a trained diffusion model; and a feature vector acquisition step of acquiring a feature vector for searching for the item based on the plurality of images.
- An information processing program for causing a computer to execute information processing, the program causing the computer to execute processes including a query acquisition process for acquiring a query for searching for a desired item, an image generation process for generating an image using the query and a trained machine learning model, and a search process for searching for an item based on the image.
- An information processing program for causing a computer to execute information processing, the program causing the computer to execute processes including a query acquisition process for acquiring a query for searching for a desired item, an image generation process for generating a plurality of images using the query and a trained diffusion model, and a feature vector acquisition process for acquiring a feature vector for searching for an item based on the plurality of images.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
Description
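Note that the user device 10 is not limited to a device of the form shown in FIG. 1, and may be a device such as a desktop PC (Personal Computer) or a notebook PC. In that case, user operations may be performed using an input device such as a mouse or keyboard. The user device 10 may also have a separate display unit.
FIG. 2 shows an example of the functional configuration of the information processing device 21 according to this embodiment. As an example of its functional configuration, the information processing device 21 comprises a query acquisition unit 101, an image generation unit 102, a classifier selection unit 103, an image feature extraction unit 104, an output unit 105, and a learning model storage unit 110. The learning model storage unit 110 stores trained machine learning models: an image generation model 111, a classifier group 112 containing a plurality of classifiers (image recognition models), and an image feature extraction model 113.
The de-diffusion (reverse diffusion) process 404 also uses information from the query 42 input to the image generation model 111. Specifically, the query encoder 406 extracts from the query 42 a feature vector (embedding vector) embedded in a feature space. This feature vector represents the features of the query 42 and is hereinafter also referred to as the query features. The query encoder 406 is a trained model (Transformer) configured to extract query features from the query 42. When the query 42 is text, CLIP (Contrastive Language-Image Pre-training), for example, can be used as the query encoder 406.
The de-diffusion process 404 derives, from the input noise image 403, the noise to be removed based on the query features obtained by the query encoder 406, and subtracts the derived noise from the noise image 403. When a U-Net is used for the de-diffusion process 404, the query features are input to each of a plurality of blocks that include attention layers, and the noise is derived. The output obtained by subtracting the derived noise from the noise image 403 is finally converted into an image by the image decoder 405, generating the image 43. The error (difference) between the image 43 (generated image) and the image 41 (teacher image) is then calculated, and the image generation model 111 is trained so as to reduce this error. For example, the mean absolute error (MAE) or mean squared error (MSE) is calculated from the pixel values of the image 43 and the image 41, and the image generation model 111 is trained so that this error becomes small.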
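The two error measures named above can be written out directly. The sketch below computes MAE and MSE over flattened pixel values; the function names and the sample pixel lists are illustrative assumptions, and a real training loop would compute these over image tensors and backpropagate through the model.

```python
def mean_absolute_error(generated, teacher):
    """Average absolute per-pixel difference between two equal-length lists."""
    if len(generated) != len(teacher):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(g - t) for g, t in zip(generated, teacher)) / len(generated)


def mean_squared_error(generated, teacher):
    """Average squared per-pixel difference between two equal-length lists."""
    if len(generated) != len(teacher):
        raise ValueError("images must have the same number of pixels")
    return sum((g - t) ** 2 for g, t in zip(generated, teacher)) / len(generated)


teacher_pixels = [0.0, 0.5, 1.0, 0.25]     # stands in for image 41 (teacher image)
generated_pixels = [0.1, 0.5, 0.8, 0.25]   # stands in for image 43 (generated image)

mae = mean_absolute_error(generated_pixels, teacher_pixels)
mse = mean_squared_error(generated_pixels, teacher_pixels)
```

MSE penalizes large per-pixel deviations more heavily than MAE, which is the usual reason for choosing one over the other.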
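FIG. 3 shows an example of the functional configuration of the search device 22 according to this embodiment. As an example of its functional configuration, the search device 22 comprises an image feature acquisition unit 201, a search unit 202, an output unit 203, and a search database 210. The search database 210 stores data on items (hereinafter also referred to as item data), including an image representing each item. Each piece of item data is given identification information that identifies the item and is linked to the image features of its image. Each piece of item data may include attribute information (related information) such as the item's price and description. The item data stored in the search database 210 may be updated as items are added to and/or deleted from the EC site provided by the search device 22, or as search conditions are added and/or deleted.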
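The item data described here (identification information, linked image features, optional attribute information) maps naturally onto a small record type. The following is a sketch under assumptions: `ItemData` and `SearchDatabase` are hypothetical names, and an in-memory dictionary stands in for whatever storage the search database 210 actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ItemData:
    """Hypothetical item record: ID, linked image feature, attribute info."""
    item_id: str
    image_feature: List[float]
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. price, description


class SearchDatabase:
    """In-memory stand-in for the search database."""

    def __init__(self) -> None:
        self._items: Dict[str, ItemData] = {}

    def add(self, item: ItemData) -> None:
        """Register or overwrite an item, e.g. when it is added to the EC site."""
        self._items[item.item_id] = item

    def remove(self, item_id: str) -> None:
        """Delete an item if present, e.g. when it is removed from the EC site."""
        self._items.pop(item_id, None)

    def ids(self) -> List[str]:
        return sorted(self._items)


db = SearchDatabase()
db.add(ItemData("item-001", [0.9, 0.1], {"price": "2,000 JPY"}))
db.add(ItemData("item-002", [0.2, 0.8]))
db.remove("item-001")
```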
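Next, an example of the hardware configuration of the information processing device 21 and the search device 22 is described. Because both devices may have the same hardware configuration, the information processing device 21 is described here. The user device 10 may also have the same hardware configuration.
FIG. 5 is a block diagram showing an example of the hardware configuration of the information processing device 21 according to this embodiment.
The information processing device 21 according to this embodiment can be implemented on one or more computers, mobile devices, or any other processing platform.
FIG. 5 shows an example in which the information processing device 21 is implemented on a single computer, but the information processing device 21 according to this embodiment may be implemented on a computer system that includes a plurality of computers. The computers may be connected so that they can communicate with one another over a wired or wireless network.
The CPU 501 performs overall control of the operation of the information processing device 21 and controls each component (502 to 508) via the system bus 509, which is a data transmission path.
The RAM (Random Access Memory) 503 is a volatile memory and functions as the main memory, work area, and the like of the CPU 501. That is, when executing processing, the CPU 501 loads the necessary programs and the like from the ROM 502 into the RAM 503 and executes them to realize various functional operations. In the case of the information processing device 21, the RAM 503 may include the learning model storage unit 110 shown in FIG. 2.
The input unit 505 consists of a keyboard and a pointing device such as a mouse.
The display unit 506 consists of a monitor such as a liquid crystal display (LCD). The display unit 506 may be combined with the input unit 505 to function as a GUI (Graphical User Interface).
The communication I/F 507 provides an interface to a network and communicates with external devices via the network. Various data, parameters, and the like are exchanged with external devices via the communication I/F 507. In this embodiment, the communication I/F 507 may communicate via a wired LAN (Local Area Network) conforming to a communication standard such as Ethernet (registered trademark), or via a dedicated line. However, the networks usable in this embodiment are not limited to these and may be wireless networks. Such wireless networks include wireless PANs (Personal Area Networks) such as Bluetooth (registered trademark), ZigBee (registered trademark), and UWB (Ultra Wide Band); wireless LANs (Local Area Networks) such as Wi-Fi (Wireless Fidelity) (registered trademark); wireless MANs (Metropolitan Area Networks) such as WiMAX (registered trademark); and wireless WANs (Wide Area Networks) such as 4G and 5G. The network need only connect the devices so that they can communicate with one another; its communication standard, scale, and configuration are not limited to the above.
The GPU 508 is a processor specialized for image processing. The GPU 508 cooperates with the CPU 501 to perform, for example, the image generation processing according to this embodiment. The GPU 508 may also perform processing other than image processing.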
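Next, the image generation procedures executed by the image generation unit 102 of the information processing device 21 are described. In this embodiment, three types of image generation procedure (a first type, a second type, and a third type) are described. For the configuration of the image generation model 111, refer to FIG. 4.
FIG. 6A is a diagram for explaining the procedure of the first type of image generation. In the first type of image generation, the image generation unit 102 uses the trained image generation model 111 to generate one or more images based on the query transmitted from the user device 10 and acquired by the query acquisition unit 101.
The second type of image generation is image generation by sampling. Referring to FIG. 6A, the de-diffusion process 404 generates, by sampling, a large number of outputs from the single query feature obtained by the query encoder 406. These outputs are converted by the image decoder 405 into a plurality of images including the image 62 and are output. A large number of images can thus be generated. These images correspond to images forming the distribution of the single query 61.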
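The idea of drawing many outputs from one query feature can be illustrated with a toy sampler. This is only a sketch: `sample_outputs` and its parameters are hypothetical, the "denoising" is a simple pull toward the query feature with a little fresh noise, and no real diffusion model or image decoder is involved. It shows that different starting noise yields different outputs that nevertheless cluster around the same query feature.

```python
import random


def sample_outputs(query_feature, num_samples, steps=30, step=0.2, seed=0):
    """Draw several toy outputs conditioned on one query feature vector."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_samples):
        # Each sample starts from its own Gaussian noise vector.
        x = [rng.gauss(0.0, 1.0) for _ in query_feature]
        for _ in range(steps):
            # Toy denoising: move toward the query feature while keeping a
            # little fresh noise so that the samples stay distinct.
            x = [xi + step * (qi - xi) + 0.05 * rng.gauss(0.0, 1.0)
                 for xi, qi in zip(x, query_feature)]
        outputs.append(x)
    return outputs


query_feature = [0.0, 1.0, 0.5]
samples = sample_outputs(query_feature, num_samples=5)
```

Fixing `seed` makes the sketch reproducible; varying it would give a different set of samples around the same query feature.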
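FIG. 6B is a diagram for explaining the procedure of the third type of image generation. In the third type of image generation, the image generation unit 102 generates one or more images using the trained image generation model 111 including a classifier. Specifically, the image generation unit 102 generates one or more images using the trained image generation model 111 and the gradients from the classifier. This type of image generation corresponds to classifier-guided image generation. In this embodiment, the classifier selected by the classifier selection unit 103 from the classifier group 112 stored in the learning model storage unit 110 is used. Each classifier in the classifier group 112 is a trained classifier (image recognition model) configured to classify images having predetermined features. Each classifier is given identification information that identifies it, and each classifier is associated with one or more features (keywords) relating to the images it classifies.
For example, the classifier selection unit 103 may analyze the query 61, extract the features contained in the query 61, and select a classifier associated with those features. For example, when the query 61 is text, the classifier selection unit 103 may extract (predict) features such as an item name or item genre contained in the text and select the classifier associated with those features. The classifier selection unit 103 can extract these features on a rule basis or by using machine learning (for example, the query encoder 406).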
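A rule-based version of this selection can be sketched as keyword matching. The function `select_classifier`, the classifier IDs, and the keyword sets below are all hypothetical; the sketch simply picks the classifier whose associated keywords overlap most with the words of a text query.

```python
def select_classifier(query_text, classifiers):
    """Pick the classifier whose keywords best match the query text.

    classifiers: dict mapping classifier ID -> set of lowercase keywords.
    Returns the ID with the most keyword hits, or None if nothing matches.
    """
    # Split hyphenated terms such as "T-shirt" into separate words.
    words = set(query_text.lower().replace("-", " ").split())
    best_id, best_hits = None, 0
    for classifier_id, keywords in classifiers.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_id, best_hits = classifier_id, hits
    return best_id


classifiers = {
    "clf-tops": {"shirt", "blouse", "sweater"},
    "clf-shoes": {"sneakers", "boots", "sandals"},
}
selected = select_classifier("black and white striped V-neck T-shirt", classifiers)
```

A machine-learning variant would replace the word overlap with a similarity score between the query features and an embedding of each classifier's keywords.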
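The user instruction 63 for selecting a classifier may be included in the query 61, or may be transmitted from the user device 10 separately from the query 61.
The flow of processing executed in the information processing system 1 according to this embodiment is now described. FIG. 7 is a sequence diagram showing the flow of processing executed in the information processing system 1. As shown in FIG. 1, the information processing system 1 is composed of the user device 10 and the search system 20, which includes the information processing device 21 and the search device 22. Here, the flow of processing is described for the case in which a query for searching for a desired item is transmitted from the user device 10 in response to an operation by the user.
[1] A search system comprising: a query acquisition unit that acquires a query for searching for a desired item; an image generation unit that generates an image using the query and a trained machine learning model; and a search unit that searches for an item based on the image.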
Claims (9)
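- A search system comprising:
  a query acquisition unit that acquires a query for searching for a desired item;
  an image generation unit that generates an image using the query and a trained machine learning model; and
  a search unit that searches for an item based on the image.
- The search system according to claim 1, wherein the trained machine learning model is a diffusion model on which machine learning has been executed so as to reduce the error between a generated image, generated by removing noise from a noise image obtained by adding noise to a training image, and the training image.
- The search system according to claim 2, wherein
  the image generation unit generates a plurality of images using the query and the diffusion model, and
  the search unit searches for an item based on the plurality of images.
- The search system according to claim 1, wherein
  the trained machine learning model is a diffusion model on which machine learning has been executed so as to remove noise from a training image to which noise has been added and generate the training image,
  the image generation unit generates a plurality of images using the query and the diffusion model, and
  the search unit searches for an item based on the plurality of images.
- The search system according to claim 3 or 4, wherein the image generation unit generates the plurality of images using the query and the diffusion model including a trained classifier configured to classify images.
- The search system according to claim 5, further comprising a selection unit that selects the classifier from a plurality of trained classifiers that are prepared in advance and configured to classify images.
- The search system according to any one of claims 1 to 4, wherein the query includes any one of text, tags indicating attributes, and images, or a combination of two or more of text, tags indicating attributes, and images.
- A search method including:
  a query acquisition step of acquiring a query for searching for a desired item;
  an image generation step of generating an image using the query and a trained machine learning model; and
  a search step of searching for an item based on the image.
- An information processing device comprising:
  a query acquisition unit that acquires a query for searching for a desired item;
  an image generation unit that generates a plurality of images using the query and a trained diffusion model; and
  a feature vector acquisition unit that acquires a feature vector for searching for an item based on the plurality of images.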
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23930615.2A EP4675459A4 (en) | 2023-03-31 | 2023-03-31 | RESEARCH SYSTEM, RESEARCH METHOD AND INFORMATION PROCESSING DEVICE |
| PCT/JP2023/013487 WO2024201980A1 (ja) | 2023-03-31 | 2023-03-31 | 検索システム、検索方法、および情報処理装置 |
| JP2025509571A JPWO2024201980A1 (ja) | 2023-03-31 | 2023-03-31 |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/013487 WO2024201980A1 (ja) | 2023-03-31 | 2023-03-31 | 検索システム、検索方法、および情報処理装置 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024201980A1 (ja) | 2024-10-03 |
Family ID: 92904486
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2023/013487 Ceased WO2024201980A1 (ja) | 2023-03-31 | 2023-03-31 | 検索システム、検索方法、および情報処理装置 |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP4675459A4 (ja) |
| JP (1) | JPWO2024201980A1 (ja) |
| WO (1) | WO2024201980A1 (ja) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001265781A (ja) | 2000-03-17 | 2001-09-28 | Nec Corp | 電子ショップ用商品情報表示システム及び方法並びに記録媒体 |
| JP2002366575A (ja) * | 2001-03-26 | 2002-12-20 | Lg Electronics Inc | イメージ検索方法及び検索装置 |
| JP2020098521A (ja) * | 2018-12-19 | 2020-06-25 | 富士通株式会社 | 情報処理装置、データ抽出方法およびデータ抽出プログラム |
| WO2022265992A1 (en) * | 2021-06-14 | 2022-12-22 | Google Llc | Diffusion models having improved accuracy and reduced consumption of computational resources |
2023
- 2023-03-31: WO application PCT/JP2023/013487 (WO2024201980A1, ja), status: Ceased
- 2023-03-31: JP application JP2025509571A (JPWO2024201980A1, ja), status: Pending
- 2023-03-31: EP application EP23930615.2A (EP4675459A4, en), status: Pending
Non-Patent Citations (2)
| Title |
|---|
| "Generating images from text: Visual language model is key(Special feature: Amazing AI encyclopedia that will change industries)", NIKKEI KONPYUTA - NIKKEI COMPUTER, NIKKEI MAGUROUHIRUSHA, TOKYO,, JP, no. 1078, 29 September 2022 (2022-09-29), JP , pages 38 - 41, XP009557803, ISSN: 0285-4619 * |
| See also references of EP4675459A1 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4675459A1 (en) | 2026-01-07 |
| EP4675459A4 (en) | 2026-02-25 |
| JPWO2024201980A1 (ja) | 2024-10-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11803872B2 (en) | Creating meta-descriptors of marketing messages to facilitate in delivery performance analysis, delivery performance prediction and offer selection | |
| JP2024503774A (ja) | 融合パラメータの特定方法及び装置、情報推奨方法及び装置、パラメータ測定モデルのトレーニング方法及び装置、電子機器、記憶媒体、並びにコンピュータプログラム | |
| US20170329840A1 (en) | Computerized system and method for performing a feature-based search and displaying an interactive dynamically updatable, multidimensional user interface therefrom | |
| JP2019531548A (ja) | 視覚検索プラットフォームのための映像取り込みフレームワーク | |
| JP2019527395A (ja) | コンテンツを効果的に配信するための動的クリエイティブの最適化 | |
| JP2017174062A (ja) | 購買行動分析装置およびプログラム | |
| US12417244B2 (en) | Determining user affinities for content generation applications | |
| JP7138264B1 (ja) | 情報処理装置、情報処理方法、情報処理システム、およびプログラム | |
| KR102301663B1 (ko) | 시각 검색 쿼리를 사용하여 물리적 객체를 식별하는 기법 | |
| KR101990502B1 (ko) | 범용화된 정보 추출 방법 및 이를 적용한 디바이스 | |
| EP4123548B1 (en) | Information processing device for generating changed advertising content | |
| EP4195135B1 (en) | Information processing device, information processing method, information processing system, and program | |
| US20240202532A1 (en) | Information processing apparatus, information processing method, and storage medium | |
| CN113641900A (zh) | 信息推荐方法及装置 | |
| JP7550264B1 (ja) | 画像提供装置、画像提供方法及び画像提供プログラム | |
| WO2024201980A1 (ja) | 検索システム、検索方法、および情報処理装置 | |
| JP2022037575A (ja) | 画像評価予測装置および方法 | |
| KR20210041730A (ko) | 패션 상품 추천 방법, 장치 및 컴퓨터 프로그램 | |
| JP2020047013A (ja) | 情報表示プログラム、情報表示方法、情報表示装置、および情報処理システム | |
| JP7676662B2 (ja) | 情報処理装置、情報処理方法、および情報処理プログラム | |
| US20250217578A1 (en) | Information processing system, information processing method, and non-transitory computer readable medium storing program | |
| JP7457157B2 (ja) | 情報処理装置、情報処理方法、およびプログラム | |
| KR102910896B1 (ko) | 성격 유형에 기초한 향수 및 패션 트렌드 추천 방법 | |
| JP7218847B2 (ja) | 情報処理装置、情報処理方法、およびプログラム | |
| JP7068999B2 (ja) | 情報処理装置、情報処理方法及び情報処理プログラム |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23930615; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2025509571; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025509571; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023930615; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2023930615; Country of ref document: EP; Effective date: 20250930 |
| | WWP | Wipo information: published in national office | Ref document number: 2023930615; Country of ref document: EP |