WO2019196659A1 - Multimedia resource matching method and apparatus, storage medium, and electronic apparatus - Google Patents

Multimedia resource matching method and apparatus, storage medium, and electronic apparatus

Info

Publication number
WO2019196659A1
WO2019196659A1 (PCT/CN2019/079988, CN2019079988W)
Authority
WO
WIPO (PCT)
Prior art keywords
target
frame image
resource
feature
multimedia resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/079988
Other languages
English (en)
French (fr)
Inventor
徐敘遠
龚国平
吴韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to JP2020545252A priority Critical patent/JP7013587B2/ja
Priority to EP19785786.5A priority patent/EP3761187B1/en
Publication of WO2019196659A1 publication Critical patent/WO2019196659A1/zh
Priority to US16/930,069 priority patent/US11914639B2/en
Legal status: Ceased (anticipated expiration)

Classifications

    • G06F 16/583: retrieval of still image data characterised by using metadata automatically derived from the content
    • G06F 16/43: querying of multimedia data
    • G06F 16/45: clustering; classification of multimedia data
    • G06F 16/783: retrieval of video data characterised by using metadata automatically derived from the content
    • G06F 18/214: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 20/46: extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/48: matching video sequences
    • G06V 10/462: salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • the present application relates to the field of computers, and in particular, to a method, an apparatus, a storage medium, and an electronic device for matching multimedia resources.
  • the multimedia resource providing platform sometimes needs to match the multimedia resources to perform subsequent processing on the multimedia resources.
  • the matching accuracy and efficiency of current multimedia resource matching are low; how to match multimedia resources with high accuracy and efficiency has therefore become key to improving the efficiency of multimedia resource processing.
  • the embodiment of the present application provides a method, an apparatus, a storage medium, and an electronic device for matching multimedia resources, so as to at least solve the technical problem that the matching efficiency of multimedia resources in the related art is low.
  • a method for matching a multimedia resource includes: searching for a first media resource set in a multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and a feature of the first target frame image matches a feature in a frame image of a multimedia resource to be matched and satisfies a first matching condition; determining a second target frame image in the first target frame image, where a feature of the second target frame image matches a feature in the frame image of the to-be-matched multimedia resource and satisfies a second matching condition; and acquiring matching information of the second target frame image, where the matching information indicates a total duration and a playing time of the second target frame image in the target media resource, the target media resource being the media resource where the second target frame image is located.
  • the matching method of the multimedia resource is applied to the target device.
  • the target device includes: a terminal device, or a server device.
  • a matching device for a multimedia resource includes: a searching module, configured to search for a first media resource set in a multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the feature of the first target frame image matches a feature in the frame image of the multimedia resource to be matched and satisfies a first matching condition; a first determining module, configured to determine a second target frame image in the first target frame image, where a feature of the second target frame image matches a feature in the frame image of the to-be-matched multimedia resource and satisfies a second matching condition; and a first obtaining module, configured to acquire matching information of the second target frame image, where the matching information indicates a total duration and a playing time of the second target frame image in the target media resource
  • the target media resource is the media resource where the second target frame image is located.
  • a storage medium, wherein a computer program is stored in the storage medium, the computer program being configured to perform, when run, the method described above.
  • an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the computer program to perform the method described in any of the above.
  • by matching the features of the first target frame images contained in the multimedia resource set against the features of the frame images of the multimedia resource to be matched under the first matching condition, and requiring the first target frame images to satisfy the target condition, media resources similar to the to-be-matched multimedia resource are found in the resource library, forming the first media resource set; from the first target frame images of the media resources in the first media resource set, the second target frame images whose features match the features in the frame images of the to-be-matched resource and satisfy the second matching condition are determined, and their matching information is acquired. In this way, the most similar multimedia resources are filtered out from those similar to the multimedia resource to be matched and their specific matching information is obtained, achieving the technical effect of improving the matching efficiency of multimedia resources and thereby solving the technical problem in the related art that the matching efficiency of multimedia resources is low.
  • FIG. 1 is a schematic diagram of an optional method for matching multimedia resources according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an application environment of an optional multimedia resource matching method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an optional multimedia resource matching method according to an alternative embodiment of the present application.
  • FIG. 4 is a schematic diagram of an optional multimedia resource matching method according to an alternative embodiment of the present application.
  • FIG. 5 is a schematic diagram of an optional multimedia resource matching apparatus according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an application scenario of an optional multimedia resource matching method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an alternative electronic device in accordance with an embodiment of the present application.
  • a method for matching a multimedia resource includes:
  • the target device searches for a first media resource set in the multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the feature of the first target frame image and the to-be-matched multimedia Feature matching in the frame image of the resource and satisfying the first matching condition;
  • the target device determines, in the first target frame image, the second target frame image, where the feature of the second target frame image matches the feature in the frame image of the multimedia resource to be matched, and satisfies the second matching condition;
  • the target device acquires the matching information of the second target frame image, where the matching information is used to indicate the total duration and the playing time of the second target frame image in the target media resource, and the target media resource is the media where the second target frame image is located. Resources.
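The three steps above can be sketched end to end in code. The sketch below is illustrative only: frame features are simulated with plain Python sets, and all names and thresholds (`min_hits`, `min_overlap`) are hypothetical assumptions rather than values from this application.

```python
# Illustrative two-stage matching flow: a coarse first pass selects
# candidate resources, a fine second pass confirms matching frames.

def find_first_set(resource_set, query_features, min_hits=2):
    """Stage 1: keep resources whose frames share a coarse (first-type)
    feature with the query and that satisfy the target condition
    (here: at least `min_hits` matching frames)."""
    first_set = {}
    for res_id, frames in resource_set.items():
        hits = [t for t, feats in frames.items()
                if feats["coarse"] in query_features["coarse"]]
        if len(hits) >= min_hits:          # target condition
            first_set[res_id] = hits
    return first_set

def refine_second_targets(resource_set, first_set, query_features, min_overlap=3):
    """Stage 2: within first-stage candidates, keep frames whose fine
    (second-type) features overlap the query strongly enough."""
    second = []
    for res_id, hit_times in first_set.items():
        for t in hit_times:
            fine = resource_set[res_id][t]["fine"]
            if len(fine & query_features["fine"]) >= min_overlap:
                second.append((res_id, t))
    return second
```

The second-stage result is the list of (resource, time) pairs from which the matching information (total duration and playing time) would then be derived.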
  • the foregoing method for matching multimedia resources may be applied to a hardware environment formed by the target device 202 as shown in FIG. 2.
  • the target device 202 searches for a first media resource set in the multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the first target frame image
  • the feature matches the feature in the frame image of the multimedia resource to be matched, and satisfies the first matching condition.
  • a second target frame image is determined in the first target frame image, wherein features of the second target frame image match features in the frame image of the multimedia resource to be matched, and the second matching condition is satisfied.
  • Obtaining matching information of the second target frame image wherein the matching information is used to indicate a total duration and a playing time of the second target frame image in the target media resource.
  • the foregoing target device 202 may be, but is not limited to, a terminal device, or may be, but is not limited to, a server device.
  • a terminal device capable of installing a multimedia-enabled client, such as a mobile phone, a tablet computer, a personal computer (PC), and the like.
  • a server corresponding to a client supporting multimedia.
  • the above is only an example, and is not limited in this embodiment.
  • the foregoing matching method of the multimedia resource may be, but is not limited to, applied to a scenario in which the multimedia resource is matched.
  • the above client may be, but is not limited to, various types of applications, such as an online education application, an instant messaging application, a community space application, a game application, a shopping application, a browser application, a financial application, a multimedia application (video application, audio). Applications, etc.), live applications, etc.
  • it may be, but is not limited to, being applied to a scenario in which a video resource is matched in the foregoing video application, or may be, but not limited to, being applied to a scenario in which the audio resource is matched in the instant messaging application to improve The matching efficiency of multimedia resources.
  • the above is only an example, and is not limited in this embodiment.
  • the foregoing multimedia resource may include, but is not limited to, a video resource (a video file, a video stream, and the like), an audio resource (an audio file, an audio stream, and the like), a picture resource (a moving picture, an audio picture, etc.). ), text resources, and more.
  • the target condition that the first target frame image of each media resource in the foregoing first media resource set needs to satisfy may be, but is not limited to, a condition for determining similarity between two multimedia resources.
  • for example: the number of first target frame images in a multimedia resource exceeds a first number; the proportion of first target frame images among the frame images of the multimedia resource exceeds a first ratio; the number of temporally consecutive first target frame images in a multimedia resource exceeds a second number; or the proportion of temporally consecutive first target frame images exceeds a second ratio; and the like.
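As a hedged sketch, the example target conditions above can be checked as follows; the helper names and default thresholds are illustrative assumptions, not values from the application.

```python
# Check the example target conditions: absolute count, proportion,
# and longest run of temporally consecutive matching frames.

def longest_consecutive(frame_indices):
    """Length of the longest run of consecutive frame indices."""
    best, run, prev = 0, 0, None
    for i in sorted(frame_indices):
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best

def satisfies_target_condition(match_indices, total_frames,
                               first_number=10, first_ratio=0.1,
                               second_number=5, second_ratio=0.05):
    count = len(match_indices)
    consecutive = longest_consecutive(match_indices)
    return (count > first_number
            or count / total_frames > first_ratio
            or consecutive > second_number
            or consecutive / total_frames > second_ratio)
```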
  • the first matching condition that the feature of the first target frame image needs to satisfy may be, but is not limited to, that the first target frame image and the frame image of the multimedia resource to be matched have the same feature of a first type.
  • one or more features of the first type can be extracted from each frame image; for example, a first-type feature may be a feature extracted by deep learning. If some or all of the first-type features extracted separately from two frame images are identical, the two frame images can be regarded as similar.
  • the second matching condition that the features of the second target frame image and the features in the frame image of the multimedia resource to be matched need to satisfy may be, but is not limited to, that the number of second-type features extracted from the second target frame image that are identical or similar to the second-type features extracted from the frame image of the multimedia resource to be matched is higher than a target value, or that the proportion of such identical or similar features among the total number of features is higher than a certain value.
  • the second-type features may be extracted by a feature extraction algorithm, for example the scale-invariant feature transform (SIFT) algorithm, the speeded-up robust features (SURF) algorithm, etc.
  • if the identical or similar second-type features in two frame images reach a certain number, the two frame images can be considered to be the same.
  • the matching information of the second target frame image may include, but is not limited to, a total duration and a playing time of the second target frame image in the target media resource.
  • the matching information may also include, but is not limited to, the scaling relationship of the matching segment between the target media resource and the to-be-matched media resource, the total duration of the second target frame image in the target media resource as a percentage of the duration of the target media resource, and the like.
  • the matching information may be used to process the matched multimedia resource, for example to determine whether the resource is infringing, to push the multimedia resource, to lay out the multimedia resource on an interface, and the like.
  • for example, the video resource to be matched (FT) is input to a deep learning network (for example, a Visual Geometry Group network, VGGNet for short). The VGG feature of each frame in FT is extracted, and these features are matched against the VGG features of the frame images of the multimedia resources in the multimedia resource set; the first target frame images having matching VGG features in the multimedia resource set are selected, the second target frame images are determined among them, and the matching information of the second target frame images is acquired.
  • by matching the features of the first target frame images contained in the multimedia resource set against the features of the frame images of the multimedia resource to be matched under the first matching condition, with the first target frame images satisfying the target condition, media resources similar to the to-be-matched multimedia resource are found in the resource library, forming the first media resource set; the second target frame images whose features match the features in the frame images of the to-be-matched resource and satisfy the second matching condition are then determined from the first target frame images, and their matching information is acquired. The most similar multimedia resources and their specific matching information are thus obtained from among the resources similar to the multimedia resource to be matched, achieving the technical effect of improving the matching efficiency of multimedia resources and solving the technical problem in the related art that the matching efficiency of multimedia resources is low.
  • the target device searching for the first media resource set in the multimedia resource set includes:
  • the target device determines, from a frame image of the multimedia resource in the multimedia resource set, a first target frame image that satisfies a target condition.
  • the target device acquires the first multimedia resource to which the first target frame image belongs, where the first media resource set includes the first multimedia resource.
  • the storage form in the multimedia resource set may be, but is not limited to, a form of a feature-frame image pair, where the frame image may be represented by a multimedia resource identifier and a coordinate form of a play time point.
  • where D_t and D_(t+1) are features, t is a time point, and videoID is the identifier of the video.
  • after the first target frame images that satisfy the target condition are acquired, they may be aggregated by multimedia resource to find the first multimedia resources to which the first target frame images belong, thereby obtaining the first media resource set.
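The feature/frame-image-pair storage and the aggregation step described above can be sketched as follows; the dictionary layout and the names `build_index` and `first_media_resource_set` are illustrative assumptions, not from the application.

```python
# Feature -> (videoID, t) index, then aggregation of matching frames
# by the multimedia resource they belong to.
from collections import defaultdict

def build_index(resources):
    """resources: {video_id: {t: feature}} -> {feature: [(video_id, t), ...]}"""
    index = defaultdict(list)
    for video_id, frames in resources.items():
        for t, feature in frames.items():
            index[feature].append((video_id, t))
    return index

def first_media_resource_set(index, query_feature):
    """Aggregate the frames that carry the query feature by resource."""
    grouped = defaultdict(list)
    for video_id, t in index.get(query_feature, []):
        grouped[video_id].append(t)
    return dict(grouped)
```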
  • the target device determines, from the frame image of the multimedia resource in the multimedia resource set, the first target frame image that meets the target condition, including:
  • the target device extracts the first feature from the frame image of the multimedia resource to be matched.
  • the target device acquires a target frame image set corresponding to the first feature from the feature and the frame image set having the corresponding relationship, where the target frame image set includes the frame image of the first feature in the multimedia resource of the first media resource set. And the feature of the frame image in the target frame image set matches the feature in the frame image of the multimedia resource to be matched, and satisfies the first matching condition;
  • the target device acquires a second multimedia resource to which the frame image in the target frame image set belongs.
  • the target device acquires a continuous number of frame images having the first feature in the second multimedia resource.
  • the target device determines, as first target frame images that meet the target condition, the frame images having the first feature in those second multimedia resources whose number of consecutive frame images having the first feature falls within the target number threshold range;
  • the target device determines, as the first media resource set, the media resource where the first target frame image that meets the target condition is located.
  • the first feature in the frame image of the multimedia resource to be matched may be extracted as follows: a classification network model is trained with a plurality of multimedia resource samples and similarity data, which indicates the similarity between the multimedia resource samples, to obtain a target classification network model; the loss function of the classification network model is set to a contrastive loss function; the input parameter of the target classification network model is a frame image of a multimedia resource, and its output parameter is the feature corresponding to that frame image. The frame image of the multimedia resource to be matched is input into the target classification network model, and the first feature output by the target classification network model is obtained.
  • the foregoing classification network model may include, but is not limited to, a VGG network, GoogLeNet, a ResNet network, and the like.
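As a minimal sketch of the contrastive loss mentioned above, assuming the common pairwise margin form (the application does not spell out the formula): similar pairs are pulled together, dissimilar pairs pushed apart up to a margin.

```python
# Pairwise contrastive loss over two frame-image embeddings.
import math

def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    d = math.dist(emb_a, emb_b)          # Euclidean distance
    if label == 1:                       # similar pair: penalize distance
        return d * d
    return max(0.0, margin - d) ** 2     # dissimilar pair: penalize closeness
```

In training, `label` would come from the similarity data over the multimedia resource samples.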
  • determining, by the target device, the second target frame image in the first target frame image includes:
  • the target device extracts a second feature from the first target frame image, and extracts a third feature from the frame image of the multimedia resource to be matched.
  • the target device acquires a correspondence between the first target frame image and the frame image of the multimedia resource to be matched.
  • the target device acquires the number of features that match between the second features of the first target frame image having the corresponding relationship and the third features of the frame image of the multimedia resource to be matched, and the number of mutually distinct features across the two;
  • the target device obtains the ratio between the number of matching features and the number of mutually distinct features;
  • the target device determines, as the second target frame image, a first target frame image whose ratio falls within the first ratio range, where a frame image whose ratio falls within the first ratio range is a frame image whose features match the features of the frame image of the multimedia resource to be matched and satisfy the second matching condition.
  • for example, suppose the second features of the first target frame image having the corresponding relationship are S1, S2, S3, S4, S5, S6, S7, S8, S9 and S10, and the third features of the matching frame image of the multimedia resource to be matched are S1, S2, S3, S4, S5, S6, S7, S8, S9 and S11. The features that match between the two are S1 through S9, so the number of matching features is 9; the mutually distinct features across the two are S1 through S11, so their number is 11. The ratio between the number of matching features and the number of mutually distinct features is therefore 9/11. Assuming the first ratio range is "greater than 3/4", since 9/11 is greater than 3/4, the first target frame image may be determined as the second target frame image.
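The arithmetic of this example can be reproduced as follows, assuming the denominator counts the distinct features across both frame images (the interpretation that yields the 9/11 of the example):

```python
# Ratio of matching features to mutually distinct features between
# two frame images, using exact fractions.
from fractions import Fraction

def match_ratio(features_a, features_b):
    matched = features_a & features_b        # features present in both
    distinct = features_a | features_b       # mutually distinct features
    return Fraction(len(matched), len(distinct))

a = {"S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10"}
b = {"S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S11"}
ratio = match_ratio(a, b)                    # Fraction(9, 11)
is_second_target = ratio > Fraction(3, 4)    # 9/11 > 3/4
```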
  • the target device acquiring the matching information of the second target frame image includes:
  • the target device acquires a target media resource where the second target frame image is located.
  • the target device determines the number of second target frame images included in each target media resource and the frame rate value of each target media resource, where the frame rate value indicates the number of frame images each target media resource plays per second;
  • the target device determines the quotient of the number of second target frame images included in each target media resource divided by the frame rate value of that target media resource as the total duration corresponding to each target media resource, and determines the playing time points, in each target media resource, of the second target frame images it includes as the playing time corresponding to each target media resource.
  • the total length of time for which the to-be-matched multimedia resource matches a multimedia resource may thus be determined by, but not limited to, the number of second target frame images in that multimedia resource and the frame rate of that multimedia resource.
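A sketch of deriving the matched duration, assuming the frame rate is in frames per second, so the duration covered by N matched frames is N divided by the frame rate; the helper names are illustrative.

```python
# Total matched duration and playing interval from matched frames.

def matched_duration_seconds(num_second_target_frames, frame_rate):
    """Seconds of content covered by the matched frames at the given fps."""
    return num_second_target_frames / frame_rate

def play_interval(match_times):
    """Playing time of the matched segment: first to last matched time point."""
    return (min(match_times), max(match_times))
```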
  • the scaling relationship of the matching part between the multimedia resources may be determined by, but is not limited to, constructing a mapping between the time points of the second target frame images having the corresponding relationship and the time points of the frame images to be matched.
  • a least-squares fit is used to estimate the temporal deformation of the video.
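The least-squares estimate of the temporal deformation can be sketched as an ordinary linear fit t_target = a * t_query + b over matched frame-time pairs, where the slope a estimates the speed-up or slow-down; this closed-form fit is a standard technique, not code from the application.

```python
# Ordinary least-squares fit of a linear time mapping between the
# query video's time points and the target video's time points.

def fit_time_mapping(pairs):
    """pairs: [(t_query, t_target), ...] -> (slope, intercept)."""
    n = len(pairs)
    sx = sum(p[0] for p in pairs)
    sy = sum(p[1] for p in pairs)
    sxx = sum(p[0] * p[0] for p in pairs)
    sxy = sum(p[0] * p[1] for p in pairs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

A slope near 1 means the matched segment plays at the same speed in both videos; a slope of 2 would mean the target runs twice as fast over the matched span.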
  • the method further includes:
  • the target device determines that the to-be-matched multimedia resource infringes the copyright of the target media resource when the ratio between the total duration and the duration of the target media resource falls within a second ratio range, where the target media resource is a copyrighted multimedia resource.
  • the infringement determination may be performed on the multimedia resource to be matched according to the obtained matching information; for example, if the total duration for which the video to be matched matches a video in the video library exceeds 50% of that video's duration, it can be determined that the video to be matched infringes the copyright of that video.
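The infringement rule above reduces to a simple threshold check; the 50% threshold below is the example's value, not a fixed rule.

```python
# Flag the resource to be matched when the matched duration exceeds
# the given fraction of the copyrighted resource's total duration.

def infringes(matched_seconds, target_total_seconds, threshold=0.5):
    return matched_seconds / target_total_seconds > threshold
```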
  • the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, or the part of it that contributes to the related art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM).
  • the software product includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods described in the various embodiments of the present application.
  • a matching device for implementing a multimedia resource matching method as shown in FIG. 5, the device includes:
  • the searching module 52 is configured to search, in the multimedia resource set, the first media resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the first target frame image The feature matches the feature in the frame image of the multimedia resource to be matched, and satisfies the first matching condition;
  • the first determining module 54 is configured to determine a second target frame image in the first target frame image, wherein the feature of the second target frame image matches the feature in the frame image of the multimedia resource to be matched, and satisfies Two matching conditions;
  • the first acquisition module 56 is configured to acquire the matching information of the second target frame image, where the matching information indicates the total duration and the playing time of the second target frame image in the target media resource, and the target media resource is the media resource where the second target frame image is located.
  • the foregoing matching device of the multimedia resource may be applied to a hardware environment formed by the target device 202 as shown in FIG. 2 .
  • the target device 202 searches for a first media resource set in the multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the first target frame image
  • the feature matches the feature in the frame image of the multimedia resource to be matched, and satisfies the first matching condition.
  • a second target frame image is determined in the first target frame image, wherein features of the second target frame image match features in the frame image of the multimedia resource to be matched, and the second matching condition is satisfied.
  • Obtaining matching information of the second target frame image wherein the matching information is used to indicate a total duration and a playing time of the second target frame image in the target media resource.
  • the foregoing target device 202 may be, but is not limited to, a terminal device, or may be, but is not limited to, a server device.
  • a terminal device capable of installing a multimedia-enabled client, such as a mobile phone, a tablet computer, a PC, and the like.
  • it may be a server corresponding to a client supporting multimedia.
  • the above is only an example, and is not limited in this embodiment.
  • The foregoing matching method for multimedia resources may be, but is not limited to, applied to scenarios in which multimedia resources are matched.
  • The above client may be, but is not limited to, various types of applications, such as online education applications, instant messaging applications, community space applications, game applications, shopping applications, browser applications, financial applications, multimedia applications (video applications, audio applications, etc.), live streaming applications, and so on.
  • Optionally, the method may be, but is not limited to, applied to scenarios in which video resources are matched in the foregoing video applications, or may be, but is not limited to, applied to scenarios in which audio resources are matched in the foregoing instant messaging applications, so as to improve the matching efficiency of multimedia resources.
  • The above is only an example, and this embodiment is not limited thereto.
  • The foregoing multimedia resources may include, but are not limited to: video resources (video files, video streams, etc.), audio resources (audio files, audio streams, etc.), picture resources (moving pictures, pictures with sound, etc.), text resources, and so on.
  • The target condition that the first target frame images of each media resource in the foregoing first media resource set need to satisfy may be, but is not limited to, a condition for determining the similarity between two multimedia resources.
  • For example: the number of first target frame images in a multimedia resource is greater than a first number; the proportion of the first target frame images in the multimedia resource is higher than a first ratio; the number of temporally consecutive first target frame images in a multimedia resource is greater than a second number; the proportion of such consecutive first target frame images in the multimedia resource is higher than a second ratio; and so on.
  • The first matching condition that the features of the first target frame image need to satisfy may be, but is not limited to, that the first target frame image and the frame image of the multimedia resource to be matched have the same features of a first type.
  • For example, one or more features of the first type can be extracted from each frame image; the features of the first type may be features extracted by deep learning, and if all or some of the first-type features extracted from two frame images are identical, it can be confirmed that the two frame images are similar.
  • The second matching condition that the features of the second target frame image and the features in the frame image of the multimedia resource to be matched need to satisfy may be, but is not limited to, that the number of identical or similar features between the second-type features extracted from the second target frame image and the second-type features extracted from the frame image of the multimedia resource to be matched is higher than a target value, or that the proportion of such identical or similar features in the total number of features is higher than a certain value.
  • For example, features of the second type can be extracted from a frame image by a feature extraction algorithm (e.g., the SIFT algorithm, the SURF algorithm, etc.); if the number of identical or similar second-type features in two frame images reaches a certain value, the two frame images can be considered to be the same frame image.
  • The matching information of the second target frame image may include, but is not limited to: the total duration and playing times of the second target frame image in the target media resource.
  • Alternatively, the matching information may also include, but is not limited to: the scaling relationship of the matched segments between the target media resource and the media resource to be matched, the percentage of the duration of the target media resource accounted for by the total duration of the second target frame image in the target media resource, and so on.
  • After the matching information of the second target frame image is obtained, it may be used to process the multimedia resource to be matched, for example: to determine whether the resource is infringing, to push multimedia resources, to lay out the multimedia resources on an interface, and so on.
  • In an optional implementation, taking video resources as an example, as shown in FIG. 3, the video resource to be matched (FT) is input into a deep learning network (for example, a VGG network), the VGG features of each frame in FT are extracted, and these VGG features are matched with the VGG features of the frame images of the multimedia resources in the multimedia resource set; the first target frame images having these VGG features are filtered out of the multimedia resource set, and the multimedia resources where these first target frame images are located are determined as the media resources in the first media resource set.
  • SIFT features are then extracted from the frame images of the multimedia resource to be matched and matched with the SIFT features of the first target frame images; the successfully matched frame images among the first target frame images are determined as the second target frame images, and the matching information of the second target frame images is acquired.
  • Through the above steps, the media resources whose first target frame images have features that match the features of the frame images of the multimedia resource to be matched and satisfy the first matching condition, and whose first target frame images satisfy the target condition, are first found in the multimedia resource set; media resources in the resource library that are similar to the multimedia resource to be matched are thereby found and form the first media resource set. The second target frame images whose features match the features in the frame images of the multimedia resource to be matched and satisfy the second matching condition are then determined from the first target frame images of the media resources in the first media resource set, and the matching information of the second target frame images is acquired. Multimedia resources with even higher similarity are thereby filtered out of the multimedia resources similar to the multimedia resource to be matched, and specific matching information is obtained, achieving the technical effect of improving the matching efficiency of multimedia resources and thus solving the technical problem of low matching efficiency of multimedia resources in the related art.
  • As an optional solution, the lookup module includes:
  • a first determining unit, configured to determine, from the frame images of the multimedia resources in the multimedia resource set, the first target frame images that satisfy the target condition; and
  • a first acquiring unit, configured to acquire the first multimedia resources to which the first target frame images belong, where the first media resource set includes the first multimedia resources.
  • The storage form in the multimedia resource set may be, but is not limited to, the form of feature-frame image pairs, where a frame image may be represented in the coordinate form of a multimedia resource identifier and a play time point, for example: [D_t]: {[t_j, videoID_k], [t_k, videoID_x], ...}, [D_t+1]: {[t_j+n, videoID_k+h], [t_k, videoID_x], ...}, and so on.
  • D_t and D_t+1 are features, t is a time point, and videoID is the ID number of the video. In this form, it can be determined which frame image of which multimedia resource in the multimedia resource set has the same or similar features as the frame images of the multimedia resource to be matched.
  • After the first target frame images that satisfy the target condition are acquired, the first target frame images may be aggregated by multimedia resource to find the first multimedia resources to which the first target frame images belong, thereby obtaining the first media resource set.
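The feature-frame image storage form described above can be sketched as a minimal inverted table. This is an illustrative sketch, not the patent's implementation: the hash values, video IDs, and time points are invented placeholders, and a plain dictionary stands in for the fingerprint library.

```python
# Minimal sketch of the feature -> {[t, videoID], ...} storage form and of
# aggregating matched frame images by multimedia resource. All data is invented.
from collections import defaultdict

def build_inverted_table(fingerprints):
    """fingerprints: iterable of (feature_hash, video_id, time_point)."""
    table = defaultdict(list)
    for d, video_id, t in fingerprints:
        table[d].append((t, video_id))
    return table

def videos_with_feature(table, feature_hash):
    """Aggregate, by multimedia resource, the frame images carrying `feature_hash`."""
    hits = defaultdict(list)
    for t, video_id in table.get(feature_hash, []):
        hits[video_id].append(t)
    return dict(hits)

fingerprints = [
    ("D1", "video_A", 0.0), ("D2", "video_A", 0.33),
    ("D1", "video_B", 5.0), ("D3", "video_B", 5.33),
]
table = build_inverted_table(fingerprints)
print(videos_with_feature(table, "D1"))  # {'video_A': [0.0], 'video_B': [5.0]}
```

Looking up a hash returns every (time point, videoID) coordinate that carries it, which is what lets matched frame images be grouped into candidate first multimedia resources.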
  • As an optional solution, the first determining unit is configured to: extract a first feature from the frame images of the multimedia resource to be matched, and acquire, from sets of features and frame images having a correspondence, the target frame image set corresponding to the first feature, where the target frame image set includes the frame images that have the first feature among the multimedia resources of the first media resource set, and the features of the frame images in the target frame image set match the features in the frame images of the multimedia resource to be matched and satisfy the first matching condition.
  • The first determining unit is further configured to: train a classification network model using multiple multimedia resource samples and similarity data to obtain a target classification network model, where the similarity data is data used to indicate the similarity between the multiple multimedia resource samples, the loss function of the classification network model is set to a contrastive loss function, the input parameter of the target classification network model is a frame image of a multimedia resource, and the output parameter of the target classification network model is the feature corresponding to the frame image of the multimedia resource; and input the frame images of the multimedia resource to be matched into the target classification network model to obtain the first features output by the target classification network model.
  • The foregoing classification network model may include, but is not limited to, a VGG network, a GoogLeNet network, a ResNet network, and the like.
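The contrastive loss that replaces the classification network's last loss layer can be sketched numerically. This is a hedged sketch: the patent does not give the loss formula or a margin value, so the standard contrastive loss form and the margin of 1.0 are assumptions.

```python
# Sketch of a contrastive loss over a pair of frame-image features with label y
# (1 = similar pair, 0 = dissimilar pair), using the common form
#   L = y * d^2 + (1 - y) * max(0, margin - d)^2
# where d is the Euclidean distance. The margin and feature values are assumed.
import math

def contrastive_loss(feat_a, feat_b, y, margin=1.0):
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
    return y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2

# A similar pair that is close in feature space incurs a small loss...
print(contrastive_loss([0.1, 0.2], [0.1, 0.25], y=1))  # ~0.0025
# ...while a dissimilar pair inside the margin is penalized, pushing it apart.
print(contrastive_loss([0.1, 0.2], [0.1, 0.25], y=0))  # ~0.9025
```

Training with such a loss pulls features of similar frame images together and pushes dissimilar ones apart, which is what gives the migrated network its ability to distinguish picture similarity.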
  • The first determining module is configured to: extract second features from the first target frame images, and extract third features from the frame images of the multimedia resource to be matched; acquire the correspondence between the first target frame images and the frame images of the multimedia resource to be matched; acquire the number of matched features and the number of mutually unmatched features between the second features of the first target frame images having the correspondence and the third features of the frame images of the multimedia resource to be matched; acquire the ratio between the number of matched features and the number of mutually unmatched features; and determine the first target frame images whose ratio falls within a first ratio range as the second target frame images, where a frame image whose ratio falls within the first ratio range is a frame image whose features match the features in the frame images of the multimedia resource to be matched and satisfy the second matching condition.
  • For example, the second features of the first target frame image having the correspondence include S1, S2, S3, S4, S5, S6, S7, S8, S9, and S10, and the third features of the frame image of the multimedia resource to be matched include S1, S2, S3, S4, S5, S6, S7, S8, S9, and S11. The matched features between the two are S1, S2, S3, S4, S5, S6, S7, S8, and S9, so the number of matched features is 9; the features counted as mutually unmatched between the two are S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, and S11, so the number of mutually unmatched features is 11. The ratio between the number of matched features and the number of mutually unmatched features is therefore 9/11. Assuming that the first ratio range is greater than 3/4, since 9/11 is greater than 3/4, the first target frame image can be determined as a second target frame image.
  • As an optional solution, the first obtaining module includes:
  • a second acquiring unit, configured to acquire the target media resources where the second target frame images are located;
  • a second determining unit, configured to determine the number of second target frame images included in each target media resource among the target media resources and the frame rate value of each target media resource, where the frame rate value is used to indicate the number of frame images played by each target media resource per second; and
  • a third determining unit, configured to determine the product of the number of second target frame images included in each target media resource and the frame rate value of each target media resource as the total duration corresponding to each target media resource, and determine the play time points, within each target media resource, of the second target frame images included in that target media resource as the playing times corresponding to each target media resource.
  • The total duration for which the multimedia resource to be matched matches a multimedia resource may be determined by, but is not limited to, the number of second target frame images in the multimedia resource and the frame rate of the multimedia resource.
  • The scaling relationship of the matched part in the multimedia resource may be determined by, but is not limited to, constructing a mapping relationship between the time points of the second target frame images having the correspondence and the time points of the frame images to be matched.
  • For example, the least squares error is used to estimate the temporal deformation of the video.
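The least-squares estimate of the temporal deformation can be sketched as fitting a linear time mapping over matched time-point pairs. This is an illustrative sketch under the assumption that the deformation is modeled as t_ref ≈ a·t_query + b (the patent only names least squares, not the model); the sample pairs are fabricated.

```python
# Sketch of estimating the temporal deformation by least squares: fit
# t_ref ≈ a * t_query + b over matched time-point pairs; `a` captures the
# temporal scaling (speed-up/slow-down) and `b` the offset of the matched part.
def fit_time_mapping(pairs):
    """pairs: list of (t_query, t_ref) matched time points."""
    n = len(pairs)
    sx = sum(t for t, _ in pairs)
    sy = sum(r for _, r in pairs)
    sxx = sum(t * t for t, _ in pairs)
    sxy = sum(t * r for t, r in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # least-squares slope
    b = (sy - a * sx) / n                          # least-squares intercept
    return a, b

# Fabricated example: a query video played at 2x speed, offset by 10 s
# relative to the copyrighted video.
pairs = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0), (3.0, 16.0)]
a, b = fit_time_mapping(pairs)
print(a, b)  # 2.0 10.0
```

With the mapping in hand, the matched segment's start, end, and scaling relationship follow directly from `a` and `b`.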
  • As an optional solution, the above apparatus further includes:
  • a second obtaining module, configured to obtain the ratio between the total duration and the duration of the target media resource; and
  • a second determining module, configured to determine that the multimedia resource to be matched infringes the copyright of the target media resource when the ratio between the total duration and the duration of the target media resource falls within a second ratio range, where the target media resource is a copyrighted multimedia resource.
  • An infringement determination may be performed on the multimedia resource to be matched according to the obtained matching information. For example, if the total duration for which the video to be matched matches a video in the video library exceeds 50% of the duration of that video, it can be determined that the video to be matched infringes the copyright of that video.
  • The application environment of this embodiment of the present application may be, but is not limited to, the application environment in the foregoing embodiments.
  • This embodiment of the present application provides an optional application example for implementing the foregoing matching method for multimedia resources.
  • Optionally, the foregoing matching method for multimedia resources may be, but is not limited to, applied to the scenario of matching video resources shown in FIG. 6.
  • In this scenario, the video matching process includes two feature matching processes: VGG hash feature matching and rootSIFT feature matching.
  • The similarity matching of videos is first performed using the VGG hash features (the VGG hash fingerprint library refers to the feature set of the copyrighted videos).
  • In the VGG feature matching process, if the videos are not similar, the result is output directly; if they are similar, a second correction, namely rootSIFT feature matching, is performed. After the rootSIFT feature matching, the corrected result is output uniformly.
  • The video features are extracted as follows: the input video (corresponding to the foregoing multimedia resource to be matched) first undergoes a frame rate change to K frames/second (for example, K is 3); then two methods of feature extraction are applied: deep learning feature extraction and traditional feature extraction.
  • For deep learning feature extraction, migration learning is performed on a pre-trained traditional classification network, such as VGG, GoogLeNet, or ResNet (for example, a VGG network with 1000 object classes, trained on the public dataset ImageNet).
  • The ability to measure the similarity of two images is obtained by collecting a batch of picture similarity data sets and changing the last loss layer of the classification network VGG (VGG is used as an example here; other networks are also applicable) to a contrastive loss, and then performing migration learning to obtain a network capable of distinguishing picture similarity.
  • After feature extraction, each picture has only one feature, denoted here as Fd_t, where t represents a certain time point.
  • The extracted features are converted into hashes by median cut and recorded as D_t.
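The median-cut hash conversion can be sketched as thresholding each dimension of a frame feature against the feature vector's own median. A minimal sketch with invented feature values; the real features would come from the migrated VGG network.

```python
# Sketch of the median-cut hash above: each dimension of a frame feature Fd_t is
# binarized against the median of that feature vector, giving the hash D_t.
def median_hash(feature):
    ordered = sorted(feature)
    n = len(ordered)
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2 if n % 2 == 0 else ordered[n // 2]
    # 1 for dimensions above the median, 0 otherwise.
    return "".join("1" if v > median else "0" for v in feature)

fdt = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7]  # invented frame feature
print(median_hash(fdt))  # '100101'
```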
  • For traditional feature extraction, the rootSIFT method is adopted. First, SIFT feature extraction is performed on the extracted video frames to obtain P features; a normalization operation is then performed on the P features, which can increase robustness to noise:
  • V_sift = (v_1, v_2, ..., v_128);
  • the normalized features are subjected to median binarization, and P hash values are obtained for each frame, denoted as T_t,i, where i ∈ [0, P).
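The rootSIFT step can be sketched end to end. This is a hedged sketch: the patent text only says "a normalization operation", so the usual rootSIFT normalization (L1-normalize, then element-wise square root) is assumed, and the descriptors below are fabricated stand-ins for real 128-dimensional SIFT output.

```python
# Sketch of the rootSIFT pipeline above: normalize each SIFT descriptor, then
# median-binarize it, so a frame with P descriptors yields P hash values T_{t,i}.
import math

def root_sift(descriptor):
    # Assumed rootSIFT normalization: L1-normalize, then element-wise sqrt.
    s = sum(abs(v) for v in descriptor) or 1.0
    return [math.sqrt(abs(v) / s) for v in descriptor]

def binarize(vector):
    ordered = sorted(vector)
    n = len(ordered)
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2 if n % 2 == 0 else ordered[n // 2]
    return "".join("1" if v > median else "0" for v in vector)

def frame_hashes(descriptors):
    return [binarize(root_sift(d)) for d in descriptors]  # P hash values per frame

# Fabricated 4-dim descriptors standing in for 128-dim SIFT descriptors.
descriptors = [[4.0, 1.0, 0.0, 9.0], [1.0, 1.0, 2.0, 0.0]]
print(frame_hashes(descriptors))  # ['1001', '0010']
```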
  • The process of video fingerprint matching is performed as follows: the matching of video fingerprints includes two processes: 1. VGG feature matching; 2. rootSIFT feature matching.
  • The flow is as follows: the input video first undergoes VGG feature matching, because the features extracted by VGG are more abstract and the number of hashes is smaller, which is ideal for first-pass video matching filtering, and VGG feature matching can achieve a high recall rate. After the VGG features are matched, the similarity of the videos can be calculated; for similarities greater than a threshold, rootSIFT matching analysis is performed to further confirm the matching information of the videos. RootSIFT describes details well and thus better ensures accuracy.
  • The VGG feature matching process includes: fingerprint feature extraction, hash conversion, and time-domain matching analysis.
  • The input video first undergoes video fingerprint feature extraction and is then converted by median binarization to obtain a series of hash feature values and the time points corresponding to the hashes.
  • The fingerprint library stores the features of the copyrighted videos as (D_t, videoID, t), where t is the time point and videoID is the ID number of the video, and these features are stored according to the data structure of an inverted table:
  • The input video is divided into multiple pieces of K seconds (here K is 5), and each piece is matched individually.
  • Since the frame rate has been changed to 3 frames/second, a single piece has a total of 15 hash values (D_i, i ∈ [0, 15)).
  • Each D_i is compared with the features in the fingerprint library to find the video information corresponding to that hash feature value (such as the data whose D_t is equal), i.e., [t_j, videoID_k], [t_k, videoID_x], and so on; the results are then aggregated by videoID_k, and the number of temporally consecutive matched frame images for videoID_k is counted and divided by 15 to obtain the similarity.
  • A video clip with a similarity greater than 0.8 is taken as a matching segment.
  • The similarity can be calculated for each K-second piece, and finally the number R of pieces of each video that are similar to the input video is obtained.
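The per-segment similarity above can be sketched against a toy fingerprint library. This is an illustrative sketch: the library contents are invented, frame positions stand in for time points, and "consecutive" is taken as consecutive frame indices for the same videoID.

```python
# Sketch of the per-segment VGG-hash similarity: look up each of a segment's
# hash values D_i in the fingerprint library, aggregate hits by videoID, count
# the longest run of temporally consecutive matched frames, and divide by the
# number of hashes per segment (15 here). Library contents are invented.
from collections import defaultdict

HASHES_PER_SEGMENT = 15

def segment_similarity(segment_hashes, library):
    """library: feature hash -> list of (frame_index, videoID)."""
    matched_frames = defaultdict(set)
    for d in segment_hashes:
        for frame_idx, video_id in library.get(d, []):
            matched_frames[video_id].add(frame_idx)
    scores = {}
    for video_id, frames in matched_frames.items():
        longest = run = 0
        previous = None
        for idx in sorted(frames):
            run = run + 1 if previous is not None and idx == previous + 1 else 1
            longest = max(longest, run)
            previous = idx
        scores[video_id] = longest / HASHES_PER_SEGMENT
    return scores

library = {f"D{i}": [(i, "video_A")] for i in range(13)}  # 13 consecutive hits
scores = segment_similarity([f"D{i}" for i in range(15)], library)
print(scores["video_A"] > 0.8)  # True: 13/15 ≈ 0.87, taken as a matching segment
```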
  • If the similarity MatchPer calculated by the VGG feature matching is greater than a certain threshold (for example Q, where Q is 50), the rootSIFT feature matching is performed.
  • At this point, a list of videoIDs matching in the VGG stage has been obtained.
  • The rootSIFT matching between the input video and the VGG-matched videos is a secondary match. First, the rootSIFT features of the input video are extracted, and the rootSIFT features of the VGG-matched videos are read from the fingerprint library.
  • A pairwise matching strategy is adopted, that is, the input video is matched one by one with the videos in the videoID list to find the matching information.
  • The similarity of each frame image is calculated as follows:
  • T_t1 is the set of hash features of the input video at time t1;
  • T'_t2 is the set of hash features of a video in the videoID list at time t2;
  • the numerator is the number of similar features between the two video features, and the denominator is the total number of distinct hash features in the two video features, i.e., similarity(t1, t2) = |T_t1 ∩ T'_t2| / |T_t1 ∪ T'_t2|.
  • Through this calculation, the information on the matching time points of the two videos is obtained.
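The frame-similarity calculation and the collection of matching time points can be sketched together. A hedged sketch: the hash values are invented stand-ins for rootSIFT hashes, and the 0.5 threshold for keeping a time-point pair is an assumption, not a value from the text.

```python
# Sketch of the frame similarity above: shared hash features divided by the
# total number of distinct hash features of the two frames, then all time-point
# pairs whose similarity clears an (assumed) threshold are kept.
def frame_similarity(t_hashes, t2_hashes):
    a, b = set(t_hashes), set(t2_hashes)
    return len(a & b) / len(a | b)

def matching_time_points(input_frames, ref_frames, threshold=0.5):
    """Each argument maps a time point to that frame's rootSIFT hash values."""
    return [
        (t1, t2)
        for t1, h1 in input_frames.items()
        for t2, h2 in ref_frames.items()
        if frame_similarity(h1, h2) >= threshold
    ]

input_frames = {0.0: {"a", "b", "c"}, 1.0: {"x", "y"}}
ref_frames = {10.0: {"a", "b", "d"}, 11.0: {"p", "q"}}
print(matching_time_points(input_frames, ref_frames))  # [(0.0, 10.0)]
```

The resulting (t1, t2) pairs are exactly the matched time points that the least-squares step then fits.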
  • The least squares error is then used to estimate the temporal deformation of the video.
  • From this, the length of the match can be obtained.
  • The matching percentage of the video can then be calculated based on the length of the match and the length of the input video (the percentage calculation can be adjusted according to the relevant business logic).
  • Whether the videos match is determined by the percentage or by the matching duration information. In the rootSIFT secondary match, the videos in the videoID list are matched in turn.
  • Finally, the matching result (including the matching time points, matching duration, etc.) is output.
  • In this way, the originality of videos created by high-quality users can be protected, original-content protection and advertising benefits can be provided to users, and copyright protection can be provided for copyrighted content such as movies, TV series, and variety shows.
  • As shown in FIG. 7, the electronic device for implementing the matching of the foregoing multimedia resources includes: one or more processors 702 (only one is shown in the figure), a memory 704, a sensor 706, an encoder 708, and a transmission device 710. A computer program is stored in the memory, and the processor is configured to perform the steps in any one of the above method embodiments by means of the computer program.
  • the foregoing electronic device may be located in at least one network device of the plurality of network devices of the computer network.
  • The foregoing processor may be configured to perform the following steps by means of a computer program:
  • a first media resource set is searched for in the multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies the target condition, and the features of the first target frame image match features in the frame images of the multimedia resource to be matched and satisfy the first matching condition;
  • a second target frame image is determined among the first target frame images, where the features of the second target frame image match features in the frame images of the multimedia resource to be matched and satisfy the second matching condition; and
  • the matching information of the second target frame image is obtained, where the matching information is used to indicate the total duration and playing times of the second target frame image in the target media resource, and the target media resource is the media resource where the second target frame image is located.
  • The structure shown in FIG. 7 is merely illustrative, and the electronic device may also be a terminal device such as a smart phone (such as an Android mobile phone, an iOS mobile phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD.
  • FIG. 7 does not limit the structure of the above electronic device.
  • For example, the electronic device may also include more or fewer components (such as a network interface, a display device, etc.) than shown in FIG. 7, or have a configuration different from that shown in FIG. 7.
  • The memory 702 can be configured to store software programs and modules, such as the program instructions/modules corresponding to the matching method and apparatus for multimedia resources in the embodiments of the present application; the processor 704 runs the software programs and modules stored in the memory 702, thereby performing various functional applications and data processing, that is, implementing the above matching method for multimedia resources.
  • The memory 702 can include high-speed random access memory, and can also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • In some examples, the memory 702 can further include memory remotely located relative to the processor 704, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The transmission device 710 is configured to receive or transmit data via a network. Examples of the above network may include wired networks and wireless networks.
  • In one example, the transmission device 710 includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network.
  • In one example, the transmission device 710 is a Radio Frequency (RF) module configured to communicate with the Internet wirelessly.
  • the memory 702 is configured to store an application.
  • Embodiments of the present application also provide a storage medium in which a computer program is stored, where the computer program is configured to execute the steps of any one of the method embodiments described above when run.
  • Optionally, the above storage medium may be configured to store a computer program for performing the following steps:
  • a first media resource set is searched for in the multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies the target condition, and the features of the first target frame image match features in the frame images of the multimedia resource to be matched and satisfy the first matching condition;
  • a second target frame image is determined among the first target frame images, where the features of the second target frame image match features in the frame images of the multimedia resource to be matched and satisfy the second matching condition; and
  • the matching information of the second target frame image is obtained, where the matching information is used to indicate the total duration and playing times of the second target frame image in the target media resource, and the target media resource is the media resource where the second target frame image is located.
  • Optionally, the storage medium is further configured to store a computer program for performing the steps included in the methods in the above embodiments, which will not be described in detail in this embodiment.
  • Optionally, the storage medium may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
  • If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in the above computer-readable storage medium.
  • Based on such an understanding, the technical solution of the present application, in whole or in part, may be embodied in the form of a software product, which is stored in the storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the disclosed client may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • For example, the division of the units is only a logical function division; in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

Abstract

The present application discloses a matching method and apparatus for multimedia resources, a storage medium, and an electronic apparatus. The method includes: a target device searches for a first media resource set in a multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the features of the first target frame image match features in the frame images of the multimedia resource to be matched and satisfy a first matching condition; the target device determines a second target frame image among the first target frame images, where the features of the second target frame image match features in the frame images of the multimedia resource to be matched and satisfy a second matching condition; and the target device obtains matching information of the second target frame image, where the matching information is used to indicate the total duration and playing times of the second target frame image in the target media resource. The present application solves the technical problem of low matching efficiency of multimedia resources in the related art.

Description

Matching method and apparatus for multimedia resources, storage medium, and electronic apparatus
This application claims priority to Chinese Patent Application No. 2018103338056, filed with the China Patent Office on April 13, 2018 and entitled "Matching method and apparatus for multimedia resources, storage medium, and electronic apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computers, and in particular to a matching method and apparatus for multimedia resources, a storage medium, and an electronic apparatus.
Background
With the rapid development of computer and network technologies, people can access more and more multimedia resources on the network. Platforms that provide multimedia resources sometimes need to match multimedia resources in order to perform subsequent processing on them. However, current multimedia resource matching approaches have low accuracy and low matching efficiency. How to match multimedia resources accurately and efficiently has become the key to improving the processing efficiency of multimedia resources.
In view of the above problems, no effective solution has been proposed at present.
Summary
Embodiments of the present application provide a matching method and apparatus for multimedia resources, a storage medium, and an electronic apparatus, so as to solve at least the technical problem of low matching efficiency of multimedia resources in the related art.
According to one aspect of the embodiments of the present application, a matching method for multimedia resources is provided, including: searching for a first media resource set in a multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the features of the first target frame image match features in the frame images of the multimedia resource to be matched and satisfy a first matching condition; determining a second target frame image among the first target frame images, where the features of the second target frame image match features in the frame images of the multimedia resource to be matched and satisfy a second matching condition; and obtaining matching information of the second target frame image, where the matching information is used to indicate the total duration and playing times of the second target frame image in the target media resource, and the target media resource is the media resource where the second target frame image is located.
Optionally, in this embodiment, the matching method for multimedia resources is applied to a target device.
Optionally, in this embodiment, the target device includes: a terminal device, or a server device.
According to another aspect of the embodiments of the present application, a matching apparatus for multimedia resources is further provided, including: a lookup module, configured to search for a first media resource set in a multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the features of the first target frame image match features in the frame images of the multimedia resource to be matched and satisfy a first matching condition; a first determining module, configured to determine a second target frame image among the first target frame images, where the features of the second target frame image match features in the frame images of the multimedia resource to be matched and satisfy a second matching condition; and a first obtaining module, configured to obtain matching information of the second target frame image, where the matching information is used to indicate the total duration and playing times of the second target frame image in the target media resource, and the target media resource is the media resource where the second target frame image is located.
According to another aspect of the embodiments of the present application, a storage medium is further provided, in which a computer program is stored, where the computer program is configured to perform the method described in any one of the above when run.
According to another aspect of the embodiments of the present application, an electronic apparatus is further provided, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to perform the method described in any one of the above by means of the computer program.
In the embodiments of the present application, the media resources whose first target frame images have features that match the features of the frame images of the multimedia resource to be matched and satisfy the first matching condition, and whose first target frame images satisfy the target condition, are first found in the multimedia resource set; media resources in the resource library that are similar to the multimedia resource to be matched are thereby found and form the first media resource set. The second target frame images whose features match the features in the frame images of the multimedia resource to be matched and satisfy the second matching condition are then determined from the first target frame images of the media resources in the first media resource set, and the matching information of the second target frame images is acquired. Multimedia resources with even higher similarity are thereby filtered out of the multimedia resources similar to the multimedia resource to be matched, and specific matching information is obtained, achieving the technical effect of improving the matching efficiency of multimedia resources and thus solving the technical problem of low matching efficiency of multimedia resources in the related art.
Brief Description of the Drawings
The drawings described here are used to provide a further understanding of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation on the present application. In the drawings:
FIG. 1 is a schematic diagram of an optional matching method for multimedia resources according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application environment of an optional matching method for multimedia resources according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an optional matching method for multimedia resources according to an optional implementation of the present application;
FIG. 4 is a schematic diagram of an optional matching method for multimedia resources according to an optional implementation of the present application;
FIG. 5 is a schematic diagram of an optional matching apparatus for multimedia resources according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an application scenario of an optional matching method for multimedia resources according to an embodiment of the present application; and
FIG. 7 is a schematic diagram of an optional electronic apparatus according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", and the like in the specification, claims, and above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units not clearly listed or inherent to the process, method, product, or device.
According to one aspect of the embodiments of the present application, a matching method for multimedia resources is provided. As shown in FIG. 1, the method includes:
S102: A target device searches for a first media resource set in a multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the features of the first target frame image match features in the frame images of the multimedia resource to be matched and satisfy a first matching condition.
S104: The target device determines a second target frame image among the first target frame images, where the features of the second target frame image match features in the frame images of the multimedia resource to be matched and satisfy a second matching condition.
S106: The target device obtains matching information of the second target frame image, where the matching information is used to indicate the total duration and playing times of the second target frame image in the target media resource, and the target media resource is the media resource where the second target frame image is located.
Optionally, in this embodiment, the above matching method for multimedia resources can be applied in the hardware environment constituted by the target device 202 shown in FIG. 2. As shown in FIG. 2, the target device 202 searches for a first media resource set in a multimedia resource set, where the first target frame image of each media resource in the first media resource set satisfies a target condition, and the features of the first target frame image match features in the frame images of the multimedia resource to be matched and satisfy a first matching condition; determines a second target frame image among the first target frame images, where the features of the second target frame image match features in the frame images of the multimedia resource to be matched and satisfy a second matching condition; and obtains matching information of the second target frame image, where the matching information is used to indicate the total duration and playing times of the second target frame image in the target media resource.
Optionally, in this embodiment, the above target device 202 may be, but is not limited to, a terminal device, or may be, but is not limited to, a server device, for example: a terminal device on which a multimedia-enabled client can be installed, such as a mobile phone, a tablet computer, a Personal Computer (PC), and the like; or it may be a server corresponding to a client that supports multimedia. The above is only an example, and this embodiment is not limited thereto.
Optionally, in this embodiment, the above matching method for multimedia resources can be, but is not limited to being, applied to scenarios in which multimedia resources are matched. The above client may be, but is not limited to, various types of applications, for example, online education applications, instant messaging applications, community space applications, game applications, shopping applications, browser applications, financial applications, multimedia applications (video applications, audio applications, etc.), live streaming applications, and so on. Optionally, the method can be, but is not limited to being, applied to scenarios in which video resources are matched in the above video applications, or can be, but is not limited to being, applied to scenarios in which audio resources are matched in the above instant messaging applications, so as to improve the matching efficiency of multimedia resources. The above is only an example, and this embodiment is not limited thereto.
Optionally, in this embodiment, the above multimedia resources may include, but are not limited to: video resources (video files, video streams, etc.), audio resources (audio files, audio streams, etc.), picture resources (moving pictures, pictures with sound, etc.), text resources, and so on.
Optionally, in this embodiment, the target condition that the first target frame images of each media resource in the above first media resource set need to satisfy may be, but is not limited to, a condition for determining the similarity between two multimedia resources, for example: the number of first target frame images in a multimedia resource is greater than a first number; the proportion of the first target frame images in the multimedia resource is higher than a first ratio; the number of temporally consecutive first target frame images in a multimedia resource is greater than a second number; the proportion of such consecutive first target frame images in the multimedia resource is higher than a second ratio; and so on.
Optionally, in this embodiment, the first matching condition that the features of the first target frame image need to satisfy may be, but is not limited to, that the first target frame image and the frame image of the multimedia resource to be matched have the same features of a first type. For example: one or more features of the first type can be extracted from each frame image; for instance, the features of the first type may be features extracted by deep learning, and when all or some of the first-type features extracted from two frame images are identical, it can be confirmed that the two frame images are similar.
Optionally, in this embodiment, the second matching condition that the features of the second target frame image and the features in the frame image of the multimedia resource to be matched need to satisfy may be, but is not limited to, that the number of identical or similar features between the second-type features extracted from the second target frame image and the second-type features extracted from the frame image of the multimedia resource to be matched is higher than a target value, or that the proportion of such identical or similar features in the total number of features of the two is higher than a certain value. For example: features of the second type can be extracted from a frame image by a feature extraction algorithm (for example, the Scale-Invariant Feature Transform (SIFT) algorithm, the Speeded Up Robust Features (SURF) algorithm, etc.); if the number of identical or similar second-type features in two frame images reaches a certain value, the two frame images can be considered to be the same frame image.
Optionally, in this embodiment, the matching information of the second target frame image may include, but is not limited to: the total duration and playing times of the second target frame image in the target media resource. Alternatively, the matching information may also include, but is not limited to: the scaling relationship of the matched segments between the target media resource and the media resource to be matched, the percentage of the duration of the target media resource accounted for by the total duration of the second target frame image in the target media resource, and so on.
Optionally, in this embodiment, after the matching information of the second target frame image is obtained, it may be used to process the multimedia resource to be matched, for example: to determine whether the resource is infringing, to push multimedia resources, to lay out the multimedia resources on an interface, and so on.
In an optional implementation, taking video resources as an example, as shown in FIG. 3, the video resource to be matched (FT) is input into a deep learning network (for example, a Visual Geometry Group network (VGGNet)), the VGG features of each frame in FT are extracted, and these VGG features are matched with the VGG features of the frame images of the multimedia resources in the multimedia resource set; the first target frame images having these VGG features are filtered out of the multimedia resource set, and the multimedia resources where these first target frame images are located are determined as the media resources in the first media resource set. SIFT features are then extracted from the frame images of the multimedia resource to be matched and matched with the SIFT features of the first target frame images; the successfully matched frame images among the first target frame images are determined as the second target frame images, and the matching information of the second target frame images is acquired.
It can be seen that, through the above steps, the media resources whose first target frame images have features that match the features of the frame images of the multimedia resource to be matched and satisfy the first matching condition, and whose first target frame images satisfy the target condition, are first found in the multimedia resource set; media resources in the resource library that are similar to the multimedia resource to be matched are thereby found and form the first media resource set. The second target frame images whose features match the features in the frame images of the multimedia resource to be matched and satisfy the second matching condition are then determined from the first target frame images of the media resources in the first media resource set, and the matching information of the second target frame images is acquired. Multimedia resources with even higher similarity are thereby filtered out of the multimedia resources similar to the multimedia resource to be matched, and specific matching information is obtained, achieving the technical effect of improving the matching efficiency of multimedia resources and thus solving the technical problem of low matching efficiency of multimedia resources in the related art.
As an optional solution, the target device searching for the first media resource set in the multimedia resource set includes:
S1: The target device determines, from the frame images of the multimedia resources in the multimedia resource set, the first target frame images that satisfy the target condition.
S2: The target device acquires the first multimedia resources to which the first target frame images belong, where the first media resource set includes the first multimedia resources.
可选地,在本实施例中,多媒体资源集合中的存储形式可以但不限于是特征-帧图像对的形式,其中,帧图像可以用多媒体资源标识和播放时间点的坐标形式来进行表示。例如:[D_t]:{[t_j,videoID_k],[t_k,videoID_x]...}、[D_t+1]:{[t_j+n,videoID_k+h],[t_k,videoID_x]...}等等。其中,D_t和D_t+1为特征,t是时间点,videoID是视频的id编号。通过这种形式就可以筛选出在第一多媒体资源集合中的哪个多媒体资源的哪个帧图像具有与待匹配多媒体资源的帧图像相同或相似的特征。
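上述"特征-帧图像对"的存储形式可以用一个简化的倒排表草稿来示意(纯Python字典实现,函数名与字段名均为示例假设):

```python
from collections import defaultdict


def build_inverted_index(frames):
    # frames: (特征哈希, 时间点, videoID) 三元组的可迭代对象
    # 返回 特征 -> [(时间点, videoID), ...] 形式的倒排表
    index = defaultdict(list)
    for d, t, vid in frames:
        index[d].append((t, vid))
    return index


def lookup(index, feature_hash):
    # 查询具有给定特征的所有 (时间点, videoID) 对
    return index.get(feature_hash, [])
```

查询某个特征哈希时,即可直接得到拥有该特征的全部(时间点, videoID)对,再按videoID聚合,就能找出哪个多媒体资源的哪个帧图像具有该特征。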
可选地,在本实施例中,在获取到满足目标条件的第一目标帧图像后,可以对第一目标帧图像按照多媒体资源进行聚合,找出第一目标帧图像所属的第一多媒体资源。从而得到第一媒体资源集合。
作为一种可选的方案,目标设备从多媒体资源集合中的多媒体资源的帧图像中确定满足目标条件的第一目标帧图像包括:
S1,目标设备从待匹配多媒体资源的帧图像中提取第一特征;
S2,目标设备从具有对应关系的特征和帧图像集合中获取第一特征对应的目标帧图像集合,其中,目标帧图像集合中包括第一媒体资源集合的多媒体资源中具有第一特征的帧图像,目标帧图像集合中的帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件;
S3,目标设备获取目标帧图像集合中的帧图像所属的第二多媒体资源;
S4,目标设备获取第二多媒体资源中连续的具有第一特征的帧图像的数量;
S5,目标设备将连续的具有第一特征的帧图像的数量落入目标数量阈值范围的第二多媒体资源中的具有第一特征的帧图像确定为满足目标条件的第一目标帧图像;
S6,目标设备将所述满足所述目标条件的所述第一目标帧图像所在的媒体资源确定为所述第一媒体资源集合。
可选地,在本实施例中,可以通过以下方式提取待匹配多媒体资源的帧图像中的第一特征:使用多个多媒体资源样本和相似度数据训练分类网络模型,得到目标分类网络模型,其中,相似度数据为用于指示多个多媒体资源样本之间的相似度的数据,分类网络模型的损失函数设置为对比损失函数,目标分类网络模型的输入参数为多媒体资源的帧图像,目标分类网络模型的输出参数为多媒体资源的帧图像对应的特征;将待匹配多媒体资源的帧图像输入目标分类网络模型,得到目标分类网络模型输出的第一特征。
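对比损失函数的单样本对形式可以用如下示意性片段说明(这里采用常见的margin形式作为假设;实际训练中该损失由深度学习框架在分类网络的最后一层实现,此处仅用于说明其作用):

```python
import math


def contrastive_loss(f1, f2, is_similar, margin=1.0):
    # 对比损失的单样本对形式: 相似对的损失为特征距离的平方,
    # 不相似对仅在距离小于 margin 时产生损失
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

相似样本对(is_similar为真)的损失随特征距离增大而增大,不相似样本对只有在距离小于margin时才产生损失,从而使网络学到能区分图片相似度的特征。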
可选地,在本实施例中,上述分类网络模型可以但不限于包括VGG网络、谷歌网络(GoogleNet)、残差网络(Resnet)网络等等。
作为一种可选的方案,目标设备在第一目标帧图像中确定第二目标帧图像包括:
S1,目标设备从第一目标帧图像中提取第二特征,并从待匹配多媒体资源的帧图像中提取第三特征;
S2,目标设备获取第一目标帧图像与待匹配多媒体资源的帧图像之间的对应关系;
S3,目标设备获取具有对应关系的第一目标帧图像的第二特征与待匹配多媒体资源的帧图像的第三特征中相匹配的特征的数量以及互不匹配的特征的数量;
S4,目标设备获取相匹配的特征的数量以及互不匹配的特征的数量之间的比值;
S5,目标设备将比值落入第一比值范围的第一目标帧图像确定为第二目标帧图像,其中,比值落入第一比值范围的帧图像为特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件的帧图像。
在一个可选的实施方式中,如图4所示,具有对应关系的第一目标帧图像的第二特征包括S1、S2、S3、S4、S5、S6、S7、S8、S9、S10,待匹配多媒体资源的帧图像的第三特征包括:S1、S2、S3、S4、S5、S6、S7、S8、S9、S11,那么,二者之间相匹配的特征为S1、S2、S3、S4、S5、S6、S7、S8、S9,相匹配的特征的数量为9,二者之间互不匹配的特征为S1、S2、S3、S4、S5、S6、S7、S8、S9、S10、S11,互不匹配的特征的数量为11,则相匹配的特征的数量以及互不匹配的特征的数量之间的比值为9/11,假设第一比值范围为大于3/4,该比值9/11大于3/4,则可以将该第一目标帧图像确定为第二目标帧图像。
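上述比值计算可以用如下示意性片段复现本段的例子(按本段口径:相匹配的特征数为两个特征集合的交集大小,互不匹配的特征数为并集大小;函数名与比值范围均为示例假设):

```python
def match_ratio(second_feats, third_feats):
    # 按实施例口径: 相匹配的特征数为交集大小, 互不匹配的特征数为并集大小
    a, b = set(second_feats), set(third_feats)
    return len(a & b) / len(a | b)


def is_second_target_frame(second_feats, third_feats, lower_bound=0.75):
    # 比值落入第一比值范围(此处假设为大于3/4)时, 判定为第二目标帧图像
    return match_ratio(second_feats, third_feats) > lower_bound
```

以本段的S1~S10与S1~S9、S11为例,交集为9个、并集为11个,比值9/11大于3/4,因此判定为第二目标帧图像。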
作为一种可选的方案,目标设备获取第二目标帧图像的匹配信息包括:
S1,目标设备获取第二目标帧图像所在的目标媒体资源;
S2,目标设备确定目标媒体资源中每个目标媒体资源包含的第二目标帧图像的数量以及每个目标媒体资源的帧率值,其中,帧率值用于指示每个目标媒体资源每一秒所播放的帧图像的数量;
S3,目标设备将每个目标媒体资源包含的第二目标帧图像的数量与每个目标媒体资源的帧率值的乘积值确定为每个目标媒体资源对应的总时长,并将每个目标媒体资源包含的第二目标帧图像在每个目标媒体资源中的播放时间点确定为每个目标媒体资源对应的播放时刻。
可选地,在本实施例中,可以但不限于通过一个多媒体资源中第二目标帧图像数量以及该多媒体资源的帧率来确定待匹配多媒体资源与该多媒体资源相匹配的总时长。
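按"帧率为每秒播放的帧图像数量"的定义,匹配总时长在量纲上等于第二目标帧图像数量与帧间隔(帧率的倒数)的乘积,即数量除以帧率。以下为一个示意性片段(函数名为示例假设,并非本申请限定的实现):

```python
def matched_duration_seconds(num_matched_frames, fps):
    # 帧率 fps 为每秒播放的帧图像数量,
    # 故匹配总时长 = 帧数量 * 帧间隔(1/fps) = 帧数量 / fps
    return num_matched_frames / fps


def matched_percentage(num_matched_frames, fps, total_duration):
    # 匹配总时长占目标媒体资源时长的百分比
    return 100.0 * matched_duration_seconds(num_matched_frames, fps) / total_duration
```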
可选地,在本实施例中,还可以但不限于通过构造具有对应关系的第二目标帧图像和待匹配的帧图像的时间点之间的映射关系来确定多媒体资源中匹配的部分的缩放关系。例如:通过构造at1+bt2=c的时间点映射关系(t1为输入的视频的时间点,t2为匹配的视频的时间点)去估算视频匹配时域上的缩放关系,并采用最小二乘法(least squares)去估计视频时域变形的信息。
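时间点映射的最小二乘估计可以写成如下示意性片段(这里假设将at1+bt2=c改写为等价的t2=a*t1+b形式,用闭式解拟合;函数名为示例假设,并非本申请限定的实现):

```python
def fit_time_mapping(pairs):
    # 对匹配时间点对 (t1, t2) 用最小二乘法拟合 t2 ≈ a*t1 + b
    n = len(pairs)
    sx = sum(t1 for t1, _ in pairs)
    sy = sum(t2 for _, t2 in pairs)
    sxx = sum(t1 * t1 for t1, _ in pairs)
    sxy = sum(t1 * t2 for t1, t2 in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```

斜率a即两段视频在时域上的缩放系数:a接近1表示没有变速,偏离1则说明匹配片段存在时域伸缩。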
作为一种可选的方案,在目标设备获取第二目标帧图像的匹配信息之后,还包括:
S1,目标设备获取总时长与目标媒体资源的时长之间的比值;
S2,目标设备在总时长与目标媒体资源的时长之间的比值落入第二比值范围的情况下,确定待匹配多媒体资源侵犯了目标媒体资源的版权,其中,目标媒体资源是具有版权的多媒体资源。
可选地,在本实施例中,可以根据得到的匹配信息对待匹配多媒体资源进行侵权判定。例如:如果待匹配视频与视频库中某个视频匹配的时长超过了该视频总时长的50%,则可以确定待匹配视频侵犯了该视频的版权。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
根据本申请实施例的另一个方面,还提供了一种用于实施上述多媒体资源的匹配方法的多媒体资源的匹配装置,如图5所示,该装置包括:
1)查找模块52,被设置为在多媒体资源集合中查找第一媒体资源集合,其中,第一媒体资源集合中的每个媒体资源的第一目标帧图像满足目标条件,第一目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件;
2)第一确定模块54,被设置为在第一目标帧图像中确定第二目标帧图像,其中,第二目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件;
3)第一获取模块56,被设置为获取第二目标帧图像的匹配信息,其中,匹配信息用于指示第二目标帧图像在目标媒体资源中的总时长和播放时刻,目标媒体资源为第二目标帧图像所在的媒体资源。
可选地,在本实施例中,上述多媒体资源的匹配装置可以应用于如图2所示的目标设备202所构成的硬件环境中。如图2所示,目标设备202在多媒体资源集合中查找第一媒体资源集合,其中,第一媒体资源集合中的每个媒体资源的第一目标帧图像满足目标条件,第一目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件。在第一目标帧图像中确定第二目标帧图像,其中,第二目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件。获取第二目标帧图像的匹配信息,其中,匹配信息用于指示第二目标帧图像在目标媒体资源中的总时长和播放时刻。
可选地,在本实施例中,上述目标设备202可以但不限于为终端设备,或者,也可以但不限于是服务器设备。例如:能够安装支持多媒体的客户端的终端设备,比如:手机、平板电脑、个人计算机(PC)等等。或者,还可以是支持多媒体的客户端对应的服务器。上述仅是一种示例,本实施例中对此不做任何限定。
可选地,在本实施例中,上述多媒体资源的匹配装置可以但不限于应用于对多媒体资源进行匹配的场景中。其中,上述客户端可以但不限于为各种类型的应用,例如,在线教育应用、即时通讯应用、社区空间应用、游戏应用、购物应用、浏览器应用、金融应用、多媒体应用(视频应用、音频应用等等)、直播应用等。可选的,可以但不限于应用于在上述视频应用中对视频资源进行匹配的场景中,或还可以但不限于应用于在上述即时通讯应用中对音频资源进行匹配的场景中,以提高多媒体资源的匹配效率。上述仅是一种示例,本实施例中对此不做任何限定。
可选地,在本实施例中,上述多媒体资源可以但不限于包括:视频资源(视频文件、视频流等)、音频资源(音频文件、音频流等)、图片资源(动图、有声图片等)、文字资源等等。
可选地,在本实施例中,上述第一媒体资源集合中的每个媒体资源的第一目标帧图像需满足的目标条件可以但不限于是用于确定两个多媒体资源相似度的条件。例如:一个多媒体资源中第一目标帧图像的数量多于第一数量、一个多媒体资源中第一目标帧图像在该多媒体资源中所占的比例高于第一比例、一个多媒体资源中在时间上连续的第一目标帧图像的数量多于第二数量、上述连续的第一目标帧图像的数量在该多媒体资源中所占的比例高于第二比例等等。
可选地,在本实施例中,第一目标帧图像的特征需满足的第一匹配条件可以但不限于包括第一目标帧图像与待匹配多媒体资源的帧图像具有相同的第一类型的特征。例如:从每个帧图像中能够提取出一个或者多个该第一类型的特征,比如:第一类型的特征可以是通过深度学习提取的特征,当两个帧图像中分别提取的第一类型的特征中全部或者是有部分特征是相同的,则可以确认两个帧图像是相似的。
可选地,在本实施例中,第二目标帧图像的特征与待匹配多媒体资源的帧图像中的特征需满足的第二匹配条件可以但不限于包括从第二目标帧图像中提取的第二类型的特征与从待匹配多媒体资源的帧图像中提取的第二类型的特征中相同或者相似的特征的数量高于目标值,或者该相同或者相似的特征的数量占二者特征总数量的比例高于某值。例如:可以通过特征提取算法(例如:sift算法、surf算法等)从一个帧图像中提取第二类型的特征,如果两个帧图像中相同或者相似的第二类型的特征达到一定的数量,则可以认为两个帧图像是相同的帧图像。
可选地,在本实施例中,第二目标帧图像的匹配信息可以但不限于包括:第二目标帧图像在目标媒体资源中的总时长和播放时刻。或者,匹配信息还可以但不限于包括:目标媒体资源与待匹配媒体资源之间匹配的片段的缩放关系,第二目标帧图像在目标媒体资源中的总时长占目标媒体资源的时长的百分比等等。
可选地,在本实施例中,获取到第二目标帧图像的匹配信息后,可以使用这些匹配信息对待匹配多媒体资源进行处理。例如:判定该资源是否侵权、进行多媒体资源推送、对界面上的多媒体资源进行排版等等。
在一个可选的实施方式中,以视频资源为例,将待匹配的视频资源FT输入到深度学习网络(例如VGG网络)中,提取出FT中每一帧的VGG特征,将这些VGG特征与多媒体资源集合中多媒体资源的帧图像的VGG特征进行匹配,筛选出多媒体资源集合中具有这些VGG特征的第一目标帧图像,将这些第一目标帧图像所在的多媒体资源确定为第一媒体资源集合中的媒体资源。再从待匹配多媒体资源的帧图像中提取sift特征,并将待匹配多媒体资源的帧图像中的sift特征与第一目标帧图像中的sift特征进行匹配,将第一目标帧图像中匹配成功的帧图像确定为第二目标帧图像,并获取第二目标帧图像的匹配信息。
可见,通过上述装置,首先从多媒体资源集合中查找到包括的第一目标帧图像的特征与待匹配多媒体资源的帧图像的特征匹配并满足第一匹配条件并且这些第一目标帧图像满足目标条件的媒体资源,从而找到资源库中与待匹配多媒体资源相似的媒体资源,组成第一媒体资源集合,再从第一媒体资源集合中媒体资源的第一目标帧图像中确定特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件的第二目标帧图像,并获取第二目标帧图像的匹配信息,从而从与待匹配多媒体资源相似的多媒体资源中筛选出相似度更加高的多媒体资源,并获取到具体的匹配信息,从而实现了提高多媒体资源的匹配效率的技术效果,进而解决了相关技术中多媒体资源的匹配效率较低的技术问题。
作为一种可选的方案,查找模块包括:
1)第一确定单元,被设置为从多媒体资源集合中的多媒体资源的帧图像中确定满足目标条件的第一目标帧图像;
2)第一获取单元,被设置为获取第一目标帧图像所属的第一多媒体资源,其中,第一媒体资源集合中包括第一多媒体资源。
可选地,在本实施例中,多媒体资源集合中的存储形式可以但不限于是特征-帧图像对的形式,其中,帧图像可以用多媒体资源标识和播放时间点的坐标形式来进行表示。例如:[D_t]:{[t_j,videoID_k],[t_k,videoID_x]...}、[D_t+1]:{[t_j+n,videoID_k+h],[t_k,videoID_x]...}等等。其中,D_t和D_t+1为特征,t是时间点,videoID是视频的id编号。通过这种形式就可以筛选出在第一多媒体资源集合中的哪个多媒体资源的哪个帧图像具有与待匹配多媒体资源的帧图像相同或相似的特征。
可选地,在本实施例中,在获取到满足目标条件的第一目标帧图像后,可以对第一目标帧图像按照多媒体资源进行聚合,找出第一目标帧图像所属的第一多媒体资源。从而得到第一媒体资源集合。
作为一种可选的方案,第一确定单元被设置为:
从待匹配多媒体资源的帧图像中提取第一特征;
从具有对应关系的特征和帧图像集合中获取第一特征对应的目标帧图像集合,其中,目标帧图像集合中包括第一媒体资源集合的多媒体资源中具有第一特征的帧图像,目标帧图像集合中的帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件;
获取目标帧图像集合中的帧图像所属的第二多媒体资源;
获取第二多媒体资源中连续的具有第一特征的帧图像的数量;
将连续的具有第一特征的帧图像的数量落入目标数量阈值范围的第二多媒体资源中的具有第一特征的帧图像确定为满足目标条件的第一目标帧图像;
将所述满足所述目标条件的所述第一目标帧图像所在的媒体资源确定为所述第一媒体资源集合。
可选地,在本实施例中,第一确定单元还被设置为:使用多个多媒体资源样本和相似度数据训练分类网络模型,得到目标分类网络模型,其中,相似度数据为用于指示多个多媒体资源样本之间的相似度的数据,分类网络模型的损失函数设置为对比损失函数,目标分类网络模型的输入参数为多媒体资源的帧图像,目标分类网络模型的输出参数为多媒体资源的帧图像对应的特征;将待匹配多媒体资源的帧图像输入目标分类网络模型,得到目标分类网络模型输出的第一特征。
可选地,在本实施例中,上述分类网络模型可以但不限于包括VGG网络、GoogleNet网络、Resnet网络等等。
作为一种可选的方案,第一确定模块被设置为:从第一目标帧图像中提取第二特征,并从待匹配多媒体资源的帧图像中提取第三特征;获取第一目标帧图像与待匹配多媒体资源的帧图像之间的对应关系;获取具有对应关系的第一目标帧图像的第二特征与待匹配多媒体资源的帧图像的第三特征中相匹配的特征的数量以及互不匹配的特征的数量;获取相匹配的特征的数量以及互不匹配的特征的数量之间的比值;将比值落入第一比值范围的第一目标帧图像确定为第二目标帧图像,其中,比值落入第一比值范围的帧图像为特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件的帧图像。
在一个可选的实施方式中,具有对应关系的第一目标帧图像的第二特征包括S1、S2、S3、S4、S5、S6、S7、S8、S9、S10,待匹配多媒体资源的帧图像的第三特征包括:S1、S2、S3、S4、S5、S6、S7、S8、S9、S11,那么,二者之间相匹配的特征为S1、S2、S3、S4、S5、S6、S7、S8、S9,相匹配的特征的数量为9,二者之间互不匹配的特征为S1、S2、S3、S4、S5、S6、S7、S8、S9、S10、S11,互不匹配的特征的数量为11,则相匹配的特征的数量以及互不匹配的特征的数量之间的比值为9/11,假设第一比值范围为大于3/4,该比值9/11大于3/4,则可以将该第一目标帧图像确定为第二目标帧图像。
作为一种可选的方案,第一获取模块包括:
1)第二获取单元,被设置为获取第二目标帧图像所在的目标媒体资源;
2)第二确定单元,被设置为确定目标媒体资源中每个目标媒体资源包含的第二目标帧图像的数量以及每个目标媒体资源的帧率值,其中,帧率值用于指示每个目标媒体资源每一秒所播放的帧图像的数量;
3)第三确定单元,被设置为将每个目标媒体资源包含的第二目标帧图像的数量与每个目标媒体资源的帧率值的乘积值确定为每个目标媒体资源对应的总时长,并将每个目标媒体资源包含的第二目标帧图像在每个目标媒体资源中的播放时间点确定为每个目标媒体资源对应的播放时刻。
可选地,在本实施例中,可以但不限于通过一个多媒体资源中第二目标帧图像数量以及该多媒体资源的帧率来确定待匹配多媒体资源与该多媒体资源相匹配的总时长。
可选地,在本实施例中,还可以但不限于通过构造具有对应关系的第二目标帧图像和待匹配的帧图像的时间点之间的映射关系来确定多媒体资源中匹配的部分的缩放关系。例如:通过构造at1+bt2=c的时间点映射关系(t1为输入的视频的时间点,t2为匹配的视频的时间点)去估算视频匹配时域上的缩放关系,并采用最小二乘法(least squares)去估计视频时域变形的信息。
作为一种可选的方案,上述装置还包括:
1)第二获取模块,被设置为获取总时长与目标媒体资源的时长之间的比值;
2)第二确定模块,被设置为在总时长与目标媒体资源的时长之间的比值落入第二比值范围的情况下,确定待匹配多媒体资源侵犯了目标媒体资源的版权,其中,目标媒体资源是具有版权的多媒体资源。
可选地,在本实施例中,可以根据得到的匹配信息对待匹配多媒体资源进行侵权判定。例如:如果待匹配视频与视频库中某个视频匹配的时长超过了该视频总时长的50%,则可以确定待匹配视频侵犯了该视频的版权。
本申请实施例的应用环境可以但不限于参照上述实施例中的应用环境,本实施例中对此不再赘述。本申请实施例提供了用于实施上述多媒体资源的匹配方法的一种可选的应用示例。
作为一种可选的实施例,上述多媒体资源的匹配方法可以但不限于应用于如图6所示的对视频资源进行匹配的场景中。在本场景中,视频的匹配流程包括两个特征匹配的过程:VGG哈希特征的匹配和rootsift的特征匹配。首先利用VGG的哈希特征进行视频的相似性匹配(VGG哈希指纹库指的是拥有版权的视频的特征集合)。在VGG的特征匹配过程中,如果不相似,直接输出结果;如果相似,则进行二次校正,即rootsift的特征匹配。经过rootsift的特征匹配,最后将校正后的结果统一输出。
可选地,在本实施例中,通过以下方式进行视频特征的提取:输入的视频(相当于上述待匹配多媒体资源)首先将帧率变换到K帧/秒(例如:K取3),然后应用两种特征提取的方法:深度学习的特征提取和传统特征提取。
可选地,在本实施例中,深度学习的特征提取中,采用传统的分类网络(如VGG,GoogleNet,Resnet)。在预训练的分类网络里(例如:具有对1000个物体分类的VGG网络,利用公开的数据集imageNet训练所得)进行迁移学习。通过收集一批图片的相似数据集,并将分类的网络VGG(这里以VGG为例子,其他网络也适用)最后的损失层改为对比损失(contrastive loss),可以衡量两个图片的相似度。然后进行迁移学习,得到一个具有区分图片相似度能力的网络。基于VGG提取的特征,每个图片只有一个特征,这里记为Fdt,t代表某个时间点。提取得到的特征,经过中值二值化(median cut),转变成哈希,记为D_t。
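特征经"中值二值化"转为哈希的过程可以用如下示意性片段说明(以特征向量自身的中值作为阈值,这一口径为本文说明所作的假设,并非本申请限定的实现):

```python
def median_binarize(feature):
    # 中值二值化: 以特征向量的中值为阈值, 将每一维转为0/1, 得到哈希位串
    s = sorted(feature)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return "".join("1" if v > median else "0" for v in feature)
```

这样每帧的浮点特征向量被压缩为一个短位串,便于在指纹库中按哈希快速比对。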
可选地,在本实施例中,传统的特征提取中,采用的是rootSift的方法。首先对提取的视频帧进行sift特征提取,得到P个特征。然后对P个特征采取归一化的操作。归一化可以增加抗噪能力:
原sift向量:V_sift=(v_1,v_2,...,v_128);
变形公式:
V_rootsift=sqrt(V_sift/||V_sift||_1),即先对V_sift做L1归一化,再对每个分量开平方。
归一化后的特征,进行中值二值化的转化(median cut),每一帧得到P个哈希值,记为T_t,i,其中i∈[0,P)。
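rootSift的归一化变形可以用如下示意性片段表达(先做L1归一化再逐元素开平方,这是rootSift的常见定义,此处作为对上文变形公式的说明性假设):

```python
import math


def root_sift(v):
    # rootSIFT 变换: 先对 sift 向量做 L1 归一化, 再逐元素开平方
    # 归一化可以增加特征的抗噪能力
    l1 = sum(abs(x) for x in v)
    if l1 == 0:
        return [0.0] * len(v)
    return [math.sqrt(x / l1) for x in v]
```

变换后的向量平方和为1,等价于在欧氏距离下比较原向量的Hellinger距离,对光照等噪声更稳健。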
可选地,在本实施例中,通过以下方式进行视频指纹匹配的过程:视频指纹的匹配包括两个流程:1.VGG特征匹配,2.rootSift特征匹配。流程如下:输入的视频,首先进行VGG特征匹配,因为VGG提取的特征较为抽象,并且哈希的数量较少,非常适合作为第一次的视频匹配过滤。VGG特征匹配可以有很高的召回率。在VGG特征匹配后,可以计算视频的相似度,对于相似度大于阈值的,采取rootsift的匹配分析,进一步确认视频的匹配信息。rootsift具有较好的细节描述,可以更好地保证准确率。
可选地,在本实施例中,VGG的特征匹配过程包括:指纹特征提取,哈希转换和时域匹配分析。输入的视频首先经过视频指纹特征提取,再经过中值二值转化,得到一连串的哈希特征值和哈希对应的时间点。指纹库中存储着版权视频的特征(D_t,videoID,t),t是时间点,videoID是视频的id编号。而且这类特征按照倒排表的数据结构存储:
[D_t]:{[t_j,videoID_k],[t_k,videoID_x]...}
[D_t+1]:{[t_j+n,videoID_k+h],[t_k,videoID_x]...}
...
在匹配的时候,将输入的视频分为K秒(现在K取5)的多个片段,单独针对每一个片段进行匹配。以每秒有三个特征帧来说,在K取5时,单个分片总共有15个哈希值(D_i,i∈[0,15))。对于每个D_i,将其和指纹库中的特征比较,找出哈希特征值相等(如D_t相等的数据)的对应视频的信息([t_j,videoID_k],[t_k,videoID_x]...),然后按videoID_k进行聚合,数出对于videoID_k来说在时间上连续匹配的帧图像的个数,然后除以15,得到相似度。取相似值大于0.8的视频片段作为匹配的片段。依照上面的方法,可以对每个K秒的片段进行相似度的计算,最后得出每个视频与输入视频相似的片段个数R。通过相似的片段个数,可以计算出视频的相似度:MatchPer=100*R*K/dur。dur为匹配的视频的时长。
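上述片段相似度与视频相似度MatchPer的计算可以整理为如下示意性片段(分片哈希数15对应K取5、每秒3个特征帧的设定;函数名为示例假设,并非本申请限定的实现):

```python
def segment_similarity(consecutive_matched, hashes_per_segment=15):
    # 单个K秒片段的相似度: 时间上连续匹配的帧数除以片段内哈希总数
    # (K取5、每秒3个特征帧时, 片段内共15个哈希)
    return consecutive_matched / hashes_per_segment


def match_per(similar_segments, k, dur):
    # 视频级相似度: MatchPer = 100 * R * K / dur,
    # R为相似片段个数, K为片段时长(秒), dur为匹配视频的时长(秒)
    return 100.0 * similar_segments * k / dur
```

例如某视频有6个相似片段、K取5、时长60秒时,MatchPer为50,表示约一半时长与输入视频相似。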
可选地,在本实施例中,在VGG特征匹配计算的相似度MatchPer大于一定的阈值(例如:Q,Q取50)时,会进行rootsift的特征匹配。在VGG特征匹配之后,可以得到VGG匹配的一个videoID列表。在rootsift匹配时,对输入视频和VGG匹配的videoID进行二次匹配的校准。首先输入的视频会经过rootsift特征提取,而VGG匹配的视频rootsift特征会从指纹库中读取。
在匹配的过程中采取两两匹配的策略,即输入的视频和videoID列表中的视频逐个进行匹配,找出匹配的信息。在这里每帧图像的相似度用下面的方式计算:
S=|T_t1∩T'_t2|/|T_t1∪T'_t2|
其中,S为相似度,T_t1为输入视频的t1时间的视频特征,而T'_t2为videoID列表的特征在t2时间的视频特征。∩描述了两个视频特征相似的个数,而∪描述了两个视频特征中不同哈希种类特征总数。
S描述了两个视频的相似度,如果1-S小于某个阈值E(例如:E取0.011),那么视频帧T_t1和T'_t2匹配,并记录匹配的时间信息(t1,t2)。
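上述S的计算即两帧哈希特征集合的交并比,可以用如下示意性片段说明(其中以相异度1-S小于阈值E作为匹配条件,这一口径为本文说明所作的假设):

```python
def jaccard_similarity(hashes_a, hashes_b):
    # S = |T∩T'| / |T∪T'|: 相似特征个数与不同哈希种类总数之比
    a, b = set(hashes_a), set(hashes_b)
    return len(a & b) / len(a | b)


def frames_match(hashes_a, hashes_b, e=0.011):
    # 假设匹配条件为相异度 1-S 小于阈值 E
    return 1.0 - jaccard_similarity(hashes_a, hashes_b) < e
```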
在视频两两匹配的过程中,得到了两个视频的匹配时间点的信息。通过构造at1+bt2=c的时间点映射关系(t1为输入的视频的时间点,t2为匹配的视频的时间点)去估算两个视频匹配时域上的缩放关系,并采用最小二乘法(least squares)去估计视频时域变形的信息。最后,通过分析匹配的时间点,可以得到匹配的时长。视频的匹配的百分比可以根据匹配的时长和输入视频的时长来计算得出(百分比的计算可根据相关的业务逻辑来调整)。最后通过百分比或者匹配时长的信息,来决定视频的匹配情况。在rootsift二次匹配中,会依次对videoID列表中的视频进行两两匹配。最后把匹配的结果(包括匹配时间点,匹配时长等)进行输出。
通过上述方式,能够保护优质用户制作的视频的原创性,为用户提供原创保护,并可以提供广告分成,鼓励优质的视频制作商为平台提供更优质的内容。此外,还可以为电影、电视剧、综艺等内容提供版权保护。
另一方面,还可以应用在视频的重复检测中。不但可以净化视频平台的存量视频,提升平台的视频质量,而且还能应用在推荐时,对推荐的视频进行过滤。
根据本申请实施例的又一个方面,还提供了一种用于实施上述多媒体资源的匹配的电子装置,如图7所示,该电子装置包括:一个或多个(图中仅示出一个)处理器702、存储器704、传感器706、编码器708以及传输装置710,该存储器中存储有计算机程序,该处理器被设置为通过计算机程序执行上述任一项方法实施例中的步骤。
可选地,在本实施例中,上述电子装置可以位于计算机网络的多个网络设备中的至少一个网络设备。
可选地,在本实施例中,上述处理器可以被设置为通过计算机程序执行以下步骤:
S1,在多媒体资源集合中查找第一媒体资源集合,其中,第一媒体资源集合中的每个媒体资源的第一目标帧图像满足目标条件,第一目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件;
S2,在第一目标帧图像中确定第二目标帧图像,其中,第二目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件;
S3,获取第二目标帧图像的匹配信息,其中,匹配信息用于指示第二目标帧图像在目标媒体资源中的总时长和播放时刻,目标媒体资源为第二目标帧图像所在的媒体资源。
可选地,本领域普通技术人员可以理解,图7所示的结构仅为示意,电子装置也可以是智能手机(如Android手机、iOS手机等)、平板电脑、掌上电脑以及移动互联网设备(Mobile Internet Devices,MID)、PAD等终端设备。图7并不对上述电子装置的结构造成限定。例如,电子装置还可包括比图7中所示更多或者更少的组件(如网络接口、显示装置等),或者具有与图7所示不同的配置。
其中,存储器702可被设置为存储软件程序以及模块,如本申请实施例中的多媒体资源的匹配方法和装置对应的程序指令/模块,处理器704通过运行存储在存储器702内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的多媒体资源的匹配方法。存储器702可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器702可进一步包括相对于处理器704远程设置的存储器,这些远程存储器可以通过网络连接至终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
上述的传输装置710被设置为经由一个网络接收或者发送数据。上述的网络可选实例可包括有线网络及无线网络。在一个实例中,传输装置710包括一个网络适配器(Network Interface Controller,NIC),其可通过网线与其他网络设备和路由器相连,从而可与互联网或局域网进行通讯。在一个实例中,传输装置710为射频(Radio Frequency,RF)模块,其被设置为通过无线方式与互联网进行通讯。
其中,可选地,存储器702被设置为存储应用程序。
本申请的实施例还提供了一种存储介质,该存储介质中存储有计算机 程序,其中,该计算机程序被设置为运行时执行上述任一项方法实施例中的步骤。
可选地,在本实施例中,上述存储介质可以被设置为存储用于执行以下步骤的计算机程序:
S1,在多媒体资源集合中查找第一媒体资源集合,其中,第一媒体资源集合中的每个媒体资源的第一目标帧图像满足目标条件,第一目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件;
S2,在第一目标帧图像中确定第二目标帧图像,其中,第二目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件;
S3,获取第二目标帧图像的匹配信息,其中,匹配信息用于指示第二目标帧图像在目标媒体资源中的总时长和播放时刻,目标媒体资源为第二目标帧图像所在的媒体资源。
可选地,存储介质还被设置为存储用于执行上述实施例中的方法中所包括的步骤的计算机程序,本实施例中对此不再赘述。
可选地,在本实施例中,本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁盘或光盘等。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
上述实施例中的集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在上述计算机可读取的存储介质中。基于这样的理解,本申请的技术方案本质上或者说对相关技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在存储介质中,包括若干指令用以使得一台或多台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。
在本申请的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的客户端,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
以上所述仅是本申请的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本申请的保护范围。

Claims (14)

  1. 一种多媒体资源的匹配方法,包括:
    目标设备在多媒体资源集合中查找第一媒体资源集合,其中,所述第一媒体资源集合中的每个媒体资源的第一目标帧图像满足目标条件,所述第一目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件;
    所述目标设备在所述第一目标帧图像中确定第二目标帧图像,其中,所述第二目标帧图像的特征与所述待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件;
    所述目标设备获取所述第二目标帧图像的匹配信息,其中,所述匹配信息用于指示所述第二目标帧图像在目标媒体资源中的总时长和播放时刻,所述目标媒体资源为所述第二目标帧图像所在的媒体资源。
  2. 根据权利要求1所述的方法,其中,所述目标设备在所述多媒体资源集合中查找所述第一媒体资源集合包括:
    所述目标设备从所述多媒体资源集合中的多媒体资源的帧图像中确定满足所述目标条件的所述第一目标帧图像;
    所述目标设备获取所述第一目标帧图像所属的第一多媒体资源,其中,所述第一媒体资源集合中包括所述第一多媒体资源。
  3. 根据权利要求2所述的方法,其中,所述目标设备从所述多媒体资源集合中的多媒体资源的帧图像中确定满足所述目标条件的所述第一目标帧图像包括:
    所述目标设备从所述待匹配多媒体资源的帧图像中提取第一特征;
    所述目标设备从具有对应关系的特征和帧图像集合中获取所述第一特征对应的目标帧图像集合,其中,所述目标帧图像集合中包括所述第一媒体资源集合的多媒体资源中具有所述第一特征的帧图像,所述目标帧图像集合中的帧图像的特征与所述待匹配多媒体资源的帧图像中的特征匹配、且满足所述第一匹配条件;
    所述目标设备获取所述目标帧图像集合中的帧图像所属的第二多媒体资源;
    所述目标设备获取所述第二多媒体资源中连续的具有所述第一特征的帧图像的数量;
    所述目标设备将连续的具有所述第一特征的帧图像的数量落入目标数量阈值范围的所述第二多媒体资源中的具有所述第一特征的帧图像确定为满足所述目标条件的所述第一目标帧图像;
    所述目标设备将所述满足所述目标条件的所述第一目标帧图像所在的媒体资源确定为所述第一媒体资源集合。
  4. 根据权利要求3所述的方法,其中,所述目标设备从所述待匹配多媒体资源的帧图像中提取所述第一特征包括:
    所述目标设备使用多个多媒体资源样本和相似度数据训练分类网络模型,得到目标分类网络模型,其中,所述相似度数据为用于指示所述多个多媒体资源样本之间的相似度的数据,所述分类网络模型的损失函数设置为对比损失函数,所述目标分类网络模型的输入参数为多媒体资源的帧图像,所述目标分类网络模型的输出参数为所述多媒体资源的帧图像对应的特征;
    所述目标设备将所述待匹配多媒体资源的帧图像输入所述目标分类网络模型,得到所述目标分类网络模型输出的所述第一特征。
  5. 根据权利要求1所述的方法,其中,所述目标设备在所述第一目标帧图像中确定所述第二目标帧图像包括:
    所述目标设备从所述第一目标帧图像中提取第二特征,并从所述待匹配多媒体资源的帧图像中提取第三特征;
    所述目标设备获取所述第一目标帧图像与所述待匹配多媒体资源的帧图像之间的对应关系;
    所述目标设备获取具有所述对应关系的所述第一目标帧图像的所述第二特征与所述待匹配多媒体资源的帧图像的所述第三特征中相匹配的特征的数量以及互不匹配的特征的数量;
    所述目标设备获取所述相匹配的特征的数量以及所述互不匹配的特征的数量之间的比值;
    所述目标设备将所述比值落入第一比值范围的所述第一目标帧图像确定为所述第二目标帧图像,其中,所述比值落入所述第一比值范围的帧图像为特征与所述待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件的帧图像。
  6. 根据权利要求1所述的方法,其中,所述目标设备获取所述第二目标帧图像的匹配信息包括:
    所述目标设备获取所述第二目标帧图像所在的目标媒体资源;
    所述目标设备确定所述目标媒体资源中每个目标媒体资源包含的所述第二目标帧图像的数量以及所述每个目标媒体资源的帧率值,其中,所述帧率值用于指示所述每个目标媒体资源每一秒所播放的帧图像的数量;
    所述目标设备将所述每个目标媒体资源包含的所述第二目标帧图像的数量与所述每个目标媒体资源的帧率值的乘积值确定为所述每个目标媒体资源对应的所述总时长,并将所述每个目标媒体资源包含的第二目标帧图像在所述每个目标媒体资源中的播放时间点确定为所述每个目标媒体资源对应的所述播放时刻。
  7. 根据权利要求1所述的方法,其中,在所述目标设备获取所述第二目标帧图像的匹配信息之后,所述方法还包括:
    所述目标设备获取所述总时长与所述目标媒体资源的时长之间的比值;
    所述目标设备在所述总时长与所述目标媒体资源的时长之间的比值落入第二比值范围的情况下,确定所述待匹配多媒体资源侵犯了所述目标媒体资源的版权,其中,所述目标媒体资源是具有版权的多媒体资源。
  8. 一种多媒体资源的匹配装置,包括一个或多个处理器,以及一个或多个存储程序单元的存储器,其中,所述程序单元由所述处理器执行,所述程序单元包括:
    查找模块,被设置为在多媒体资源集合中查找第一媒体资源集合,其中,所述第一媒体资源集合中的每个媒体资源的第一目标帧图像满足目标条件,所述第一目标帧图像的特征与待匹配多媒体资源的帧图像中的特征匹配、且满足第一匹配条件;
    第一确定模块,被设置为在所述第一目标帧图像中确定第二目标帧图像,其中,所述第二目标帧图像的特征与所述待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件;
    第一获取模块,被设置为获取所述第二目标帧图像的匹配信息,其中,所述匹配信息用于指示所述第二目标帧图像在目标媒体资源中的总时长和播放时刻,所述目标媒体资源为所述第二目标帧图像所在的媒体资源。
  9. 根据权利要求8所述的装置,其中,所述查找模块包括:
    第一确定单元,被设置为从所述多媒体资源集合中的多媒体资源的帧图像中确定满足所述目标条件的所述第一目标帧图像;
    第一获取单元,被设置为获取所述第一目标帧图像所属的第一多媒体资源,其中,所述第一媒体资源集合中包括所述第一多媒体资源。
  10. 根据权利要求8所述的装置,其中,所述第一确定模块被设置为:
    从所述第一目标帧图像中提取第二特征,并从所述待匹配多媒体资源的帧图像中提取第三特征;
    获取所述第一目标帧图像与所述待匹配多媒体资源的帧图像之间的对应关系;
    获取具有所述对应关系的所述第一目标帧图像的所述第二特征与所述待匹配多媒体资源的帧图像的所述第三特征中相匹配的特征的数量以及互不匹配的特征的数量;
    获取所述相匹配的特征的数量以及所述互不匹配的特征的数量之间的比值;
    将所述比值落入第一比值范围的所述第一目标帧图像确定为所述第二目标帧图像,其中,所述比值落入所述第一比值范围的帧图像为特征与所述待匹配多媒体资源的帧图像中的特征匹配、且满足第二匹配条件的帧图像;
    将所述满足所述目标条件的所述第一目标帧图像所在的媒体资源确定为所述第一媒体资源集合。
  11. 根据权利要求8所述的装置,其中,所述第一获取模块包括:
    第二获取单元,被设置为获取所述第二目标帧图像所在的目标媒体资源;
    第二确定单元,被设置为确定所述目标媒体资源中每个目标媒体资源包含的所述第二目标帧图像的数量以及所述每个目标媒体资源的帧率值,其中,所述帧率值用于指示所述每个目标媒体资源每一秒所播放的帧图像的数量;
    第三确定单元,被设置为将所述每个目标媒体资源包含的所述第二目标帧图像的数量与所述每个目标媒体资源的帧率值的乘积值确定为所述每个目标媒体资源对应的所述总时长,并将所述每个目标媒体资源包含的第二目标帧图像在所述每个目标媒体资源中的播放时间点确定为所述每个目标媒体资源对应的所述播放时刻。
  12. 根据权利要求8所述的装置,其中,所述装置还包括:
    第二获取模块,被设置为获取所述总时长与所述目标媒体资源的时长之间的比值;
    第二确定模块,被设置为在所述总时长与所述目标媒体资源的时长之间的比值落入第二比值范围的情况下,确定所述待匹配多媒体资源侵犯了所述目标媒体资源的版权,其中,所述目标媒体资源是具有版权的多媒体资源。
  13. 一种存储介质,所述存储介质中存储有计算机程序,其中,所述计算机程序被设置为运行时执行所述权利要求1至7任一项中所述的方法。
  14. 一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为通过所述计算机程序执行所述权利要求1至7任一项中所述的方法。
PCT/CN2019/079988 2018-04-13 2019-03-28 多媒体资源的匹配方法、装置、存储介质及电子装置 Ceased WO2019196659A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2020545252A JP7013587B2 (ja) 2018-04-13 2019-03-28 マルチメディアリソースのマッチング方法、装置、コンピュータプログラムおよび電子装置
EP19785786.5A EP3761187B1 (en) 2018-04-13 2019-03-28 Method and apparatus for matching multimedia resource, and storage medium and electronic device
US16/930,069 US11914639B2 (en) 2018-04-13 2020-07-15 Multimedia resource matching method and apparatus, storage medium, and electronic apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810333805.6A CN108647245B (zh) 2018-04-13 2018-04-13 多媒体资源的匹配方法、装置、存储介质及电子装置
CN201810333805.6 2018-04-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/930,069 Continuation US11914639B2 (en) 2018-04-13 2020-07-15 Multimedia resource matching method and apparatus, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
WO2019196659A1 true WO2019196659A1 (zh) 2019-10-17

Family

ID=63746162

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/079988 Ceased WO2019196659A1 (zh) 2018-04-13 2019-03-28 多媒体资源的匹配方法、装置、存储介质及电子装置

Country Status (5)

Country Link
US (1) US11914639B2 (zh)
EP (1) EP3761187B1 (zh)
JP (1) JP7013587B2 (zh)
CN (1) CN108647245B (zh)
WO (1) WO2019196659A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115623267A (zh) * 2021-07-16 2023-01-17 北京字跳网络技术有限公司 动态图片显示方法、装置、电子设备和存储介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647245B (zh) * 2018-04-13 2023-04-18 腾讯科技(深圳)有限公司 多媒体资源的匹配方法、装置、存储介质及电子装置
CN109871490B (zh) * 2019-03-08 2021-03-09 腾讯科技(深圳)有限公司 媒体资源匹配方法、装置、存储介质和计算机设备
CN110390352A (zh) * 2019-06-26 2019-10-29 华中科技大学 一种基于相似性哈希的图像暗数据价值评估方法
JP7622641B2 (ja) * 2019-11-29 2025-01-28 ソニーグループ株式会社 情報処理装置、情報処理方法および情報処理プログラム
CN111314736B (zh) * 2020-03-19 2022-03-04 北京奇艺世纪科技有限公司 一种视频的版权分析方法、装置、电子设备及存储介质
CN111737522B (zh) * 2020-08-14 2021-03-02 支付宝(杭州)信息技术有限公司 视频匹配方法、基于区块链的侵权存证方法和装置
CN112257595A (zh) * 2020-10-22 2021-01-22 广州市百果园网络科技有限公司 视频匹配方法、装置、设备及存储介质
CN113254707B (zh) * 2021-06-10 2021-12-07 北京达佳互联信息技术有限公司 模型确定、关联媒体资源确定方法和装置
TWI780881B (zh) * 2021-08-27 2022-10-11 緯創資通股份有限公司 瑕疵檢測模型的建立方法及電子裝置
CN113810733B (zh) * 2021-09-22 2024-06-25 湖南快乐阳光互动娱乐传媒有限公司 媒体资源上线方法和装置
CN113870133B (zh) * 2021-09-27 2024-03-12 抖音视界有限公司 多媒体显示及匹配方法、装置、设备及介质
CN117131212A (zh) * 2022-08-31 2023-11-28 深圳Tcl新技术有限公司 多媒体资源聚合方法、装置、电子设备和可读存储介质
TWI838028B (zh) * 2022-12-21 2024-04-01 宏碁股份有限公司 用於讀取圖形資源和影像處理模型的電子裝置和方法
CN118337997B (zh) * 2024-06-13 2024-08-20 中亿(深圳)信息科技有限公司 基于嵌入式系统的视频识别方法、装置、介质及设备
WO2026063683A1 (en) * 2024-09-17 2026-03-26 Samsung Electronics Co., Ltd. Method for handling video compression in extended reality (xr) environment by electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605773A (zh) * 2013-11-27 2014-02-26 乐视网信息技术(北京)股份有限公司 一种多媒体文件搜索方法及装置
CN104504059A (zh) * 2014-12-22 2015-04-08 合一网络技术(北京)有限公司 多媒体资源推荐方法
US20160180379A1 (en) * 2014-12-18 2016-06-23 Nbcuniversal Media, Llc System and method for multimedia content composition
CN107766571A (zh) * 2017-11-08 2018-03-06 北京大学 一种多媒体资源的检索方法和装置
CN108647245A (zh) * 2018-04-13 2018-10-12 腾讯科技(深圳)有限公司 多媒体资源的匹配方法、装置、存储介质及电子装置

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009272936A (ja) * 2008-05-08 2009-11-19 Sony Corp 情報処理装置および方法、並びにプログラム
US8671109B2 (en) * 2009-10-01 2014-03-11 Crim (Centre De Recherche Informatique De Montreal) Content-based video copy detection
KR101541495B1 (ko) * 2012-08-17 2015-08-05 네이버 주식회사 캡쳐된 이미지를 이용한 동영상 분석 장치, 방법 및 컴퓨터 판독 가능한 기록 매체
US9659014B1 (en) * 2013-05-01 2017-05-23 Google Inc. Audio and video matching using a hybrid of fingerprinting and content based classification
CN104182719B (zh) * 2013-05-21 2017-06-30 宁波华易基业信息科技有限公司 一种图像识别方法及装置
US9465995B2 (en) * 2013-10-23 2016-10-11 Gracenote, Inc. Identifying video content via color-based fingerprint matching
CN103593464B (zh) * 2013-11-25 2017-02-15 华中科技大学 基于视觉特征的视频指纹检测及视频序列匹配方法及系统
US9934453B2 (en) * 2014-06-19 2018-04-03 Bae Systems Information And Electronic Systems Integration Inc. Multi-source multi-modal activity recognition in aerial video surveillance
CN105681898B (zh) * 2015-12-31 2018-10-30 北京奇艺世纪科技有限公司 一种相似视频和盗版视频的检测方法及装置
US20170277955A1 (en) * 2016-03-23 2017-09-28 Le Holdings (Beijing) Co., Ltd. Video identification method and system
CN106202413B (zh) * 2016-07-11 2018-11-20 北京大学深圳研究生院 一种跨媒体检索方法
CN106844528A (zh) 2016-12-29 2017-06-13 广州酷狗计算机科技有限公司 获取多媒体文件的方法和装置
CN107180074A (zh) * 2017-03-31 2017-09-19 北京奇艺世纪科技有限公司 一种视频分类方法及装置
CN107291910A (zh) * 2017-06-26 2017-10-24 图麟信息科技(深圳)有限公司 一种视频片段结构化查询方法、装置及电子设备
CN111723813B (zh) * 2020-06-05 2021-07-06 中国科学院自动化研究所 基于类内判别器的弱监督图像语义分割方法、系统、装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605773A (zh) * 2013-11-27 2014-02-26 乐视网信息技术(北京)股份有限公司 一种多媒体文件搜索方法及装置
US20160180379A1 (en) * 2014-12-18 2016-06-23 Nbcuniversal Media, Llc System and method for multimedia content composition
CN104504059A (zh) * 2014-12-22 2015-04-08 合一网络技术(北京)有限公司 多媒体资源推荐方法
CN107766571A (zh) * 2017-11-08 2018-03-06 北京大学 一种多媒体资源的检索方法和装置
CN108647245A (zh) * 2018-04-13 2018-10-12 腾讯科技(深圳)有限公司 多媒体资源的匹配方法、装置、存储介质及电子装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3761187A4 *


Also Published As

Publication number Publication date
CN108647245B (zh) 2023-04-18
US11914639B2 (en) 2024-02-27
US20200349385A1 (en) 2020-11-05
JP7013587B2 (ja) 2022-01-31
EP3761187B1 (en) 2025-03-19
JP2021518005A (ja) 2021-07-29
CN108647245A (zh) 2018-10-12
EP3761187A1 (en) 2021-01-06
EP3761187A4 (en) 2021-12-15

Similar Documents

Publication Publication Date Title
WO2019196659A1 (zh) 多媒体资源的匹配方法、装置、存储介质及电子装置
US20190080177A1 (en) Video detection method, server and storage medium
CN104317959B (zh) 基于社交平台的数据挖掘方法及装置
US9749710B2 (en) Video analysis system
CN110020122B (zh) 一种视频推荐方法、系统及计算机可读存储介质
CN104063706B (zh) 一种基于surf算法的视频指纹提取方法
JP6365024B2 (ja) サービス提供装置、方法、及びプログラム
KR20190022662A (ko) 일치하는 컨텐츠를 식별하는 시스템 및 방법
CN106575280B (zh) 用于分析用户关联图像以产生非用户生成标签以及利用该生成标签的系统和方法
US10769208B2 (en) Topical-based media content summarization system and method
CN103988232A (zh) 使用运动流形来改进图像匹配
CN107657004A (zh) 视频推荐方法、系统及设备
WO2014090034A1 (zh) 实现增强现实应用的方法及设备
WO2015135475A1 (en) Multimedia file push method and apparatus
WO2017166472A1 (zh) 广告数据匹配方法、装置及系统
CN108024148B (zh) 基于行为特征的多媒体文件识别方法、处理方法及装置
CN109977738A (zh) 一种视频场景分割判断方法、智能终端及存储介质
CN103810241B (zh) 一种低频点击的过滤方法和装置
CN107666573A (zh) 摄像头场景下对象视频的录制方法及装置、计算设备
CN107733874A (zh) 信息处理方法、装置、计算机设备和存储介质
CN114973293B (zh) 相似性判断方法、关键帧提取方法及装置、介质和设备
JP6934001B2 (ja) 画像処理装置、画像処理方法、プログラムおよび記録媒体
CN110839167B (zh) 一种视频推荐方法、装置及终端设备
CN106454398A (zh) 一种视频处理的方法及终端
CN104867026A (zh) 提供商品图像的方法和系统以及输出商品图像的终端装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19785786

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020545252

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019785786

Country of ref document: EP

Effective date: 20200929