WO2021051604A1 - OSD text area recognition method, device, and storage medium - Google Patents
OSD text area recognition method, device, and storage medium
- Publication number
- WO2021051604A1 (PCT/CN2019/118284)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bounding box
- bounding
- text area
- osd
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/635—Overlay text, e.g. embedded captions in a TV programme
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- This application relates to the field of image recognition technology, and in particular to a method, device and storage medium for recognizing the text area of OSD.
- OSD display technology is widely used in the market.
- OSD stands for On-Screen Display.
- Special fonts or graphics are generated on the display screen so that users can obtain information from them.
- The text and the video are nested in each frame, and text embedded in an image is an important way of expressing the semantic content of the image. If this text can be automatically extracted and recognized, a machine can automatically understand and classify the pictures, and mature text retrieval technology can then use the text to label and retrieve them, thereby providing a way to perform content-based image and video retrieval. How to accurately find the range of the text, working in the reverse direction from the image, is a problem that urgently needs to be solved.
- The existing image text segmentation technologies fall mainly into three categories: threshold-based methods, clustering-based methods, and statistical-model-based methods.
- With these methods, foreground interference cannot be removed from the font area, and the interference of dynamic changes in lighting cannot be shielded.
- This application provides an OSD text area recognition method, an electronic device, and a computer-readable storage medium, which mainly use an edge-detection-based segmentation method to segment the OSD text and obtain the segmentation area. After the interference is shielded, adaptive binarization is applied to lock the OSD text segmentation area and complete the text extraction.
- the present application also provides a method for recognizing the text area of OSD, which is applied to an electronic device.
- The method includes: S110, preprocessing the OSD file to obtain frame-by-frame images, using the Canny algorithm to perform binarization threshold filtering on the frame-by-frame images, acquiring frame contours from the filtered images, and finding the discrete points of the contour of the text area; S120, encapsulating the discrete points into at least two polygonal bounding boxes, calculating the area of the bounding boxes, and screening the bounding boxes according to preset conditions; S130, using an expansion (dilation) algorithm to enlarge the screened bounding boxes proportionally so that the outlines of the bounding boxes connect with one another; S140.
- An OSD text area recognition system includes an image preprocessing unit, a bounding box forming unit, and a bounding box selection unit. The image preprocessing unit is used to preprocess the OSD file to obtain frame-by-frame images, perform binarization threshold filtering on the frame-by-frame images using the Canny algorithm, acquire the frame contours of the filtered images, and find the discrete points of the contour of the text area. The bounding box forming unit is used to encapsulate the discrete points into at least two polygonal bounding boxes, calculate the area of the bounding boxes, and screen the bounding boxes according to preset conditions; to enlarge the screened bounding boxes proportionally with an expansion (dilation) algorithm so that the outlines of the bounding boxes connect with one another; and, after the outlines are connected, to take the union of the remaining bounding boxes according to a preset condition. The bounding box selection unit is used to set the font according
- The present application provides an electronic device comprising a memory and a processor. The memory stores an OSD text area recognition program, and when the program is executed by the processor,
- the following steps are implemented: S110, preprocess the OSD file to obtain frame-by-frame images, use the Canny algorithm to perform binarization threshold filtering on the frame-by-frame images, acquire the frame contours of the filtered images, and find the discrete points of the contour of the text area; S120, encapsulate the discrete points into at least two polygonal bounding boxes, calculate the area of the bounding boxes, and screen the bounding boxes according to preset conditions;
- S130, use an expansion (dilation) algorithm to enlarge the screened bounding boxes proportionally so that the outlines of the bounding boxes connect with one another; S140, after the outlines of the bounding boxes are connected, take the union of the remaining bounding boxes according to a preset condition; S150, the rectangular range of the bounding box after the regression calculation according to the width of the font of
- The present application also provides a computer-readable storage medium that stores a computer program. The computer program includes an OSD text area recognition program, and when the OSD text area recognition program is executed by a processor, the steps of the method for recognizing the text area of the OSD are implemented.
- The OSD text area recognition method, electronic device, and computer-readable storage medium proposed in this application segment the OSD Chinese characters and obtain the segmented region through a segmentation method based on edge detection and, after shielding interference, apply adaptive binarization to lock the OSD text segmentation area and complete the text extraction.
- the beneficial effects are as follows:
- FIG. 1 is a flowchart of a preferred embodiment of a method for recognizing the text area of the OSD of this application;
- FIG. 2 is a structural diagram of a preferred embodiment of the OSD text area recognition system of this application.
- FIG. 3 is a schematic structural diagram of a preferred embodiment of the electronic device of this application.
- This application provides a method for recognizing the text area of OSD.
- FIG. 1 is a flowchart of a preferred embodiment of the method for recognizing an OSD text area of the present application.
- the method can be executed by a device, and the device can be implemented by software and/or hardware.
- This application uses a segmentation method based on edge detection to segment the OSD Chinese text area and obtain the segmentation area. After the interference is shielded, the OSD text segmentation area is locked through adaptive binarization to complete the text extraction.
- This application uses an edge-based image segmentation method. An edge is the set of continuous pixels on the boundary between two different regions of an image; it reflects the discontinuity of the local features of the image and marks abrupt changes in image characteristics such as grayscale, color, and texture.
- The edge-based segmentation method refers to edge detection based on gray values, which rests on the observation that the gray value at an edge shows a step-shaped or roof-shaped change.
- The gray values of the pixels on the two sides of a step edge are clearly different, while a roof edge lies at the turning point where the gray value changes from rising to falling.
- Differential operators can be used for edge detection; that is, the extrema of the first derivative and the zero crossings of the second derivative are used to locate the edge.
- In practice, this can be implemented by convolving the image with a template.
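The derivative criteria above can be illustrated on a one-dimensional gray-level profile. The following is a minimal sketch, not the application's implementation; the function names and the sample profile are assumptions made for illustration.

```python
def first_derivative(signal):
    """Forward finite difference: d[i] = signal[i + 1] - signal[i]."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]

def strongest_edge_index(signal):
    """Index where the absolute first derivative reaches its extremum."""
    d1 = first_derivative(signal)
    return max(range(len(d1)), key=lambda i: abs(d1[i]))

def zero_crossings(values):
    """Indices i where the sign changes between values[i] and values[i + 1]."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < 0]

# A smoothed step edge: dark on the left, bright on the right (illustrative data).
profile = [10, 10, 12, 20, 40, 48, 50, 50]
d1 = first_derivative(profile)   # largest where the gray value rises fastest
d2 = first_derivative(d1)        # changes sign at the same place
edge_at = strongest_edge_index(profile)
crossings = zero_crossings(d2)
```

Both criteria point at the same location: the first derivative peaks between samples 3 and 4, and the second derivative changes sign there.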
- The method for recognizing the text area of the OSD includes steps S110 to S150.
- S110: Preprocess the OSD file to obtain frame-by-frame images, use the Canny algorithm to perform binarization threshold filtering on the frame-by-frame images, acquire frame contours from the filtered images, and find the discrete points of the contour of the text area. This achieves the effect of initially deleting part of the darker area.
- the frame-by-frame image is obtained by real-time decoding of the OSD image.
- Edge detection is performed by the Canny algorithm. In image edge detection, noise suppression and precise edge localization cannot both be fully satisfied at the same time.
- Some edge detection algorithms remove noise through smoothing filtering, which also increases the uncertainty of edge localization; conversely, improving the sensitivity of the edge detection operator to edges also increases its sensitivity to noise.
- The Canny algorithm strives for the best compromise between noise resistance and precise localization.
- The Canny algorithm is characterized by a high signal-to-noise ratio, high localization accuracy, and a good single-edge response.
- The ordinary Canny algorithm blurs the image P and then convolves it with a pair of orthogonal differential filters (such as Prewitt filters) to generate images H and V, which contain the derivatives in the horizontal and vertical directions, respectively.
- For each pixel (i, j), the gradient direction and amplitude are calculated. If the amplitude exceeds a critical value, the pixel is assigned to an edge (this is called the threshold method here, but its effect is not good).
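A minimal sketch of the derivative images H and V and the per-pixel gradient, using plain Python lists. The blurring step is omitted for brevity, the kernel is slid without flipping (which only changes the sign convention for these antisymmetric kernels), and all names are illustrative assumptions.

```python
import math

# Prewitt kernels: PREWITT_X responds to horizontal changes, PREWITT_Y to vertical ones.
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over the image; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(
                kernel[a][b] * image[i - 1 + a][j - 1 + b]
                for a in range(3) for b in range(3)
            )
    return out

def gradient(image):
    """Return (magnitude, direction in radians) for every pixel."""
    H = correlate3x3(image, PREWITT_X)  # horizontal derivative image
    V = correlate3x3(image, PREWITT_Y)  # vertical derivative image
    h, w = len(image), len(image[0])
    magnitude = [[math.hypot(H[i][j], V[i][j]) for j in range(w)] for i in range(h)]
    direction = [[math.atan2(V[i][j], H[i][j]) for j in range(w)] for i in range(h)]
    return magnitude, direction

# A vertical step edge: black on the left half, white on the right half.
image = [[0, 0, 255, 255] for _ in range(4)]
magnitude, direction = gradient(image)
```

On this image the interior pixels next to the step have a large magnitude and a direction of 0 radians, meaning the gray value increases along the horizontal axis.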
- Image binarization based on dual thresholds is adopted; that is, the gray value of every pixel in the image is set to 0 or 255, so that the entire image is clearly black and white only.
- An image includes target objects, background, and noise.
- To extract the target objects directly from a multi-valued digital image, a common method is to set a threshold T and use it to divide the image data into two parts: the group of pixels greater than T and the group of pixels smaller than T. This is a special case of grayscale transformation, called image binarization.
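Binarization with a single global threshold T can be sketched as follows; the function name and sample values are assumptions for illustration.

```python
def binarize(gray, T):
    """Map every pixel to 255 if it is greater than T, otherwise to 0."""
    return [[255 if p > T else 0 for p in row] for row in gray]

gray = [[10, 200], [128, 127]]
binary = binarize(gray, T=127)
```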
- An adaptive threshold does not require a fixed value; instead, the threshold is set adaptively from the local features of the image before binarization is performed. This application locks the OSD text segmentation area through adaptive binarization processing.
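One common way to set the threshold from local features is to compare each pixel with the mean of its neighborhood. The sketch below assumes that local-mean rule; the source does not say which adaptive method is used, and `radius` and `offset` are hypothetical parameters.

```python
def adaptive_binarize(gray, radius=1, offset=0):
    """Binarize each pixel against the mean of its (2*radius+1)^2 neighborhood."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                gray[y][x]
                for y in range(max(0, i - radius), min(h, i + radius + 1))
                for x in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            local_threshold = sum(window) / len(window) - offset
            out[i][j] = 255 if gray[i][j] > local_threshold else 0
    return out

# One image row whose background brightens from left to right,
# with two bright strokes at columns 2 and 6 (illustrative data).
row = [[10, 20, 80, 40, 50, 60, 120, 80]]
adaptive = adaptive_binarize(row)
```

With a single global threshold of 70, the bright background pixel in the last column (value 80) would also be marked as foreground; the local rule keeps only the two strokes.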
- In step S110, using the Canny algorithm to perform binarization threshold filtering on the frame-by-frame images includes: obtaining the gradient amplitudes of the pixels of the frame-by-frame image by using a cumulative histogram, setting a high threshold TH and a low threshold TL, and judging pixels whose gradient amplitude is greater than the high threshold TH to be edges.
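The source does not state how TH and TL are derived from the cumulative histogram. A frequently used heuristic, assumed here purely for illustration, picks TH as the amplitude below which a fixed fraction of pixels falls and sets TL to a fixed ratio of TH; `keep_fraction`, `low_ratio`, and `bins` are hypothetical parameters.

```python
def thresholds_from_cumulative_histogram(magnitudes, keep_fraction=0.7,
                                         low_ratio=0.4, bins=64):
    """Return (TH, TL) chosen from the cumulative histogram of gradient amplitudes."""
    flat = [m for row in magnitudes for m in row]
    max_mag = max(flat)
    if max_mag == 0:
        return 0.0, 0.0
    # Histogram of amplitudes over equally wide bins.
    hist = [0] * bins
    for m in flat:
        hist[min(int(m / max_mag * bins), bins - 1)] += 1
    # Walk the cumulative histogram until keep_fraction of the pixels is covered.
    target = keep_fraction * len(flat)
    high = max_mag
    cumulative = 0
    for idx, count in enumerate(hist):
        cumulative += count
        if cumulative >= target:
            high = (idx + 1) / bins * max_mag
            break
    return high, low_ratio * high

# Eight flat pixels and two pixels with a strong gradient (illustrative data).
magnitudes = [[0, 0, 0, 0, 0, 0, 0, 0, 100, 100]]
TH, TL = thresholds_from_cumulative_histogram(magnitudes)
edges = [[m > TH for m in row] for row in magnitudes]
```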
- Adaptive threshold processing can handle both problems: too many edges at a low threshold and missing edges at a high threshold. The high threshold is used to detect the important, prominent lines and contours in the image, while the low threshold is used to ensure that details are not lost.
- The edges detected with the low threshold are more abundant, but many of them are not of interest, and what is pursued is the single-edge response criterion. Finally, a search algorithm keeps the low-threshold lines that overlap with the high-threshold edges and deletes all other lines.
- The high threshold is used to determine the position of the true edges, and the low threshold is used to repair the true edges so determined.
- The image is processed with the high threshold to obtain the high-threshold image, the endpoints of the true edges in that image are searched for, and the edges near those endpoints are repaired based on the low threshold.
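The double-threshold search described above is commonly implemented as hysteresis: pixels above the high threshold seed the edges, and pixels at or above the low threshold are kept only if they are connected to a seed. The following is a minimal sketch under that reading; the names and the use of 8-connectivity are assumptions.

```python
from collections import deque

def hysteresis(magnitudes, low, high):
    """Keep strong pixels (> high) and weak pixels (>= low) connected to them."""
    h, w = len(magnitudes), len(magnitudes[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()
    # Seed with the true edges found by the high threshold.
    for i in range(h):
        for j in range(w):
            if magnitudes[i][j] > high:
                edges[i][j] = 255
                queue.append((i, j))
    # Grow along 8-connected neighbors that pass the low threshold.
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < h and 0 <= nj < w and edges[ni][nj] == 0
                        and magnitudes[ni][nj] >= low):
                    edges[ni][nj] = 255
                    queue.append((ni, nj))
    return edges

# One strong pixel with weak neighbors, plus an isolated weak pixel (illustrative data).
mags = [[0, 50, 90, 50, 0, 50, 0]]
linked = hysteresis(mags, low=40, high=80)
```

The weak pixels next to the strong one are retained as part of the edge, while the isolated weak pixel is deleted.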
- In S110, using the Canny algorithm to perform threshold filtering on the frame contour includes: S111, convolving the image with a Gaussian filter to smooth it and initially delete part of the darker area, obtaining a processed image guass; S112, calculating the magnitude and direction of the gradient using finite differences of the first-order partial derivatives; S113, performing non-maximum suppression on the gradient magnitude, that is, traversing the image and, if the value of a pixel is not the largest compared with its neighbors, setting the pixel value to 0 so that it is not an edge; S114, applying the double-threshold algorithm to detect and connect the edges.
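Step S113 can be sketched as follows. This version quantizes the gradient direction into four sectors and compares each pixel with its two neighbors along that direction, which is the usual formulation of non-maximum suppression; the function name is an assumption, directions are taken in radians with the row index increasing downward, and border pixels are simply left at 0.

```python
import math

def non_max_suppression(magnitude, direction):
    """Zero out pixels that are not a local maximum along the gradient direction."""
    h, w = len(magnitude), len(magnitude[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            deg = math.degrees(direction[i][j]) % 180
            if deg < 22.5 or deg >= 157.5:      # gradient roughly horizontal
                a, b = magnitude[i][j - 1], magnitude[i][j + 1]
            elif deg < 67.5:                    # roughly 45 degrees
                a, b = magnitude[i - 1][j - 1], magnitude[i + 1][j + 1]
            elif deg < 112.5:                   # roughly vertical
                a, b = magnitude[i - 1][j], magnitude[i + 1][j]
            else:                               # roughly 135 degrees
                a, b = magnitude[i - 1][j + 1], magnitude[i + 1][j - 1]
            if magnitude[i][j] >= a and magnitude[i][j] >= b:
                out[i][j] = magnitude[i][j]
    return out

# A blurred vertical edge: the response is three pixels wide before thinning.
mag = [[10, 40, 100, 40, 10] for _ in range(3)]
ang = [[0.0] * 5 for _ in range(3)]
thin = non_max_suppression(mag, ang)
```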
- the threshold is for the gradient.
- Using the high threshold in the Canny algorithm means taking the largest gradient values as the criterion for determining edges. In other words, once the two thresholds have been calculated with the cumulative histogram, anything greater than the high threshold must be an edge, and anything less than the low threshold must not be an edge.
- the MyResult array is obtained, which contains only 0 or 255 image data, and the image data is directly assigned to the image object and returned.
- Light changes are divided into sudden changes and gradual changes of light.
- The background model must be able to adapt to gradual changes of light in an outdoor environment during the day; likewise, it must adapt to lights being suddenly turned on in an indoor environment. In short, changes of light strongly affect the background model and may lead to false detections.
- the OSD text is segmented and the segmentation area is obtained, while shielding the interference of the dynamic change of the illumination on the detection.
- The edge detection methods based on the Roberts operator, the Sobel operator, and the Kirsch operator do not perform well in image segmentation because of factors such as uneven brightness.
- the histogram equalization of the canny algorithm is used to remove the influence of the illumination and the double threshold algorithm is used to realize the edge detection, so as to achieve the technical effect of shielding the interference of the dynamic change of the illumination.
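Histogram equalization spreads the gray levels of a frame over the full range, which reduces the effect of overall brightness changes before edges are detected. The sketch below uses the standard cumulative-distribution mapping for 8-bit images; the function name and sample data are assumptions.

```python
def equalize_histogram(gray, levels=256):
    """Remap gray levels so that their cumulative distribution becomes roughly linear."""
    flat = [p for row in gray for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # a single gray level: nothing to stretch
        return [row[:] for row in gray]
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in gray]

# A dim, low-contrast patch (illustrative data).
dim = [[50, 60], [70, 80]]
equalized = equalize_histogram(dim)
```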
- The output of step S110 is a binarized image.
- The binarized image is then used to find contours.
- A contour is a point set (a set of discrete points), and it is the input from which a convex hull is constructed.
- The discrete points of the outline of the text area are obtained with findContours, which finds the discrete points (that is, discrete pixels) of the contours in the acquired frame contour. Smaller objects have only one layer of contours, larger objects have multiple layers, and the layers are not connected to one another.
- The findContours function computes the contours of the image and traverses every point in each contour; that is, it retrieves the contours of all feature points in the image, each contour being a polygon, to obtain the contour set of the feature points. The contour search is performed on the binarized image produced by the Canny edge detection: contours are retrieved from the binary image and the number of detected contours is returned. This achieves a preliminary extraction of the boundary of the text area.
- The contours parameter holds the groups of contours and is multi-layered; because there are multiple layers, it is used repeatedly in the later area calculation. It is a vector of vectors (a double vector).
- Each element of the outer vector stores a group of continuous Point values.
- In this vector of point sets, each set of Point values is one contour, so the vector contours has as many elements as there are contours.
- The function that extracts the target contour is findContours. Its input is a binary image, and its output is the set of contour points of each connected area: vector<vector<Point>>.
- the size of the outer vector represents the number of contours in the image, and the size of the inner vector represents the number of points on the contour.
- The findContours function has three parameters. The first is the input image. The second is the contour retrieval mode, which indicates the type of contour search; here the outer contour is used, but all contours can also be found, including the parts that enclose holes, such as the outline formed by the arms and waist of a figure in the image.
- The third parameter specifies the contour representation, that is, the contour approximation method.
- The parameters in the program indicate that the contour includes all points. Other values can be used to reduce the points on a straight line so that only the start and end points of the line are saved.
- The second return value, the contours, is a Python list that stores all the contours in the image.
- Each contour is a NumPy array containing the (x, y) coordinates of the boundary points of the object.
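To make the "list of contours, each a list of points" structure concrete without depending on OpenCV, the sketch below extracts outer boundary pixels per connected region in pure Python. It is a simplification and not the border-following procedure that findContours performs: points come out in scan order rather than traversal order, and holes are not handled. All names are assumptions.

```python
from collections import deque

def is_boundary(binary, y, x):
    """A foreground pixel is a boundary point if a 4-neighbor is background or outside."""
    h, w = len(binary), len(binary[0])
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if not (0 <= ny < h and 0 <= nx < w) or binary[ny][nx] == 0:
            return True
    return False

def find_outer_contours(binary):
    """Return one list of (x, y) boundary points per 8-connected foreground region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    contours = []
    for i in range(h):
        for j in range(w):
            if binary[i][j] == 0 or seen[i][j]:
                continue
            # Breadth-first search collects the pixels of one connected region.
            component, queue = [], deque([(i, j)])
            seen[i][j] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            contours.append([(x, y) for (y, x) in component if is_boundary(binary, y, x)])
    return contours

binary = [
    [0, 0,   0,   0,   0, 0,   0],
    [0, 255, 255, 255, 0, 0,   0],
    [0, 255, 255, 255, 0, 255, 0],
    [0, 255, 255, 255, 0, 0,   0],
    [0, 0,   0,   0,   0, 0,   0],
]
contours = find_outer_contours(binary)
```

The outer list has one entry per region (two here), and each inner list holds that region's boundary points, mirroring the shape of the structure described above.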
- S120 Encapsulate the discrete points into at least two polygonal bounding boxes, calculate the area of the bounding boxes, and filter the bounding boxes according to preset conditions;
- the discrete points obtain convex hulls through convexhull, and the obtained convex hulls are encapsulated into a polygonal bounding box one by one.
- the bounding box is formed because there is inevitably noise in the picture, and the noise will also be found as discrete points, but the area of the noise is smaller than the real text area.
- Convex Hull is a concept in computational geometry (graphics).
- In a real vector space V, for a given set X, the intersection S of all convex sets containing X is called the convex hull of X.
- The convex hull of X can be constructed from the convex combinations of all points (x1, x2, ..., xn) in X.
- OpenCV provides the convexHull() function to find the convex hull of an object in an image, that is, the convex polygon with the smallest area that encloses the given points.
- The convexHull() function takes four parameters.
- The first parameter is the input two-dimensional point set, as Mat-type data; the second is the output parameter, which receives the convex hull found by the function call; the third is the orientation flag: when it is true the output convex hull is clockwise, otherwise it is counterclockwise; the fourth is the operation flag, whose default value is true.
- When the operation flag is true, the points of the convex hull are returned; otherwise, the indices of the convex hull points are returned.
- When the output array is a std::vector, this flag is ignored.
- Algorithms for forming convex hulls include the Graham scan and the Jarvis march.
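As an illustration of hull construction, the sketch below uses Andrew's monotone chain, a sort-based variant closely related to the Graham scan. It is swapped in here because it is short and robust, not because the source uses it; the names are assumptions.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive means a left turn at a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices, counterclockwise in a y-up coordinate system."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop points that would make a right turn or lie on the same line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Four corner points, two interior points, and one point on an edge.
points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (2, 0)]
hull = convex_hull(points)
```

Interior points and points lying on a hull edge are discarded, leaving only the four corners.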
- The bounding box may be an axis-aligned bounding box (AABB), a bounding sphere, an oriented bounding box (OBB), a fixed-direction hull (FDH), and the like.
- A bounding box algorithm solves for the optimal enclosing space of a set of discrete points.
- The basic idea is to use a slightly larger geometric body with simple characteristics (called a bounding box) to approximate a complex geometric object.
- A bounding box is a simple geometric space that contains objects with complex shapes.
- The smallest polygon that can contain all the edges of the two-dimensional convex hull is a bounding box. Several bounding boxes are constructed at the same time, and the one with the smallest area is selected.
- Common types are the axis-aligned bounding box (AABB), the bounding sphere, the oriented bounding box (OBB), and fixed-direction hulls (FDH, also called k-DOP).
- The bounding box of a group of objects is a closed space that completely contains the group. Encapsulating complex objects in a simple bounding box, and using its simple shape to approximate the complex geometry, improves the efficiency of geometric operations; simple objects are also usually easier to check for overlap with one another.
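The overlap check that makes simple boxes attractive reduces to four comparisons for axis-aligned boxes. A minimal sketch, assuming boxes are stored as (x_min, y_min, x_max, y_max) tuples:

```python
def boxes_overlap(a, b):
    """True if two axis-aligned boxes (x_min, y_min, x_max, y_max) touch or intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

touching = boxes_overlap((0, 0, 4, 3), (3, 2, 6, 5))
separate = boxes_overlap((0, 0, 4, 3), (5, 0, 6, 1))
```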
- The calculation of the bounding box area involves three types of algorithms: the O'Rourke algorithm, the projection rotation method, and principal component analysis.
- The method for calculating the area of the bounding box includes: determining the 4 edge points of the bounding box, calculating the width and height of the bounding box from those 4 edge points, and obtaining the area of the bounding box from the width and height values.
- The bounding box is two-dimensional, so its area can be estimated as width * height.
- Because the bounding box is a rectangle with 4 edge points, its points can also be converted into a center point plus the offsets to the top and to the left.
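A minimal sketch of the area calculation for an axis-aligned box, treating coordinates as continuous values (so a box spanning x = 1 to x = 5 has width 4). The function names and the tuple layout are assumptions.

```python
def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def box_area(box):
    """Area estimated as width * height."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)

def to_center_form(box):
    """Convert corners to (center_x, center_y, offset_left, offset_top)."""
    x_min, y_min, x_max, y_max = box
    half_w, half_h = (x_max - x_min) / 2, (y_max - y_min) / 2
    return (x_min + half_w, y_min + half_h, half_w, half_h)

box = bounding_box([(1, 2), (5, 2), (3, 6)])
area = box_area(box)
center_form = to_center_form(box)
```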
- Screening the bounding boxes according to preset conditions means deleting the bounding boxes with a smaller area.
- Interfering object regions with a small area are filtered out: bounding boxes with a small area are deleted by setting the size and concentration parameters of the erosion particles. In other words, after sorting, the smaller inner areas can be deleted through hyperparameters; a bounding box region with a small area is treated as an interfering object region and must be deleted, because what is needed is the largest outer area.
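The screening step can be sketched as a simple area filter. Here `min_area` is a hypothetical hyperparameter standing in for the preset condition, which the source does not quantify.

```python
def filter_small_boxes(boxes, min_area):
    """Drop boxes (x_min, y_min, x_max, y_max) whose area is below min_area."""
    return [
        b for b in boxes
        if (b[2] - b[0]) * (b[3] - b[1]) >= min_area
    ]

# Two noise-sized boxes and one text-sized box (illustrative data).
candidates = [(0, 0, 2, 2), (10, 10, 60, 30), (5, 5, 6, 6)]
kept = filter_small_boxes(candidates, min_area=50)
```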
- Dilate uses a specific structuring element to dilate (expand) the input image.
- The main purpose of the dilation performed by the Dilate() function is to connect regions: because there are gaps between characters, dilation joins them, which makes the processing after contour extraction easier.
- Dilate performs its operations with a 3×3 structuring element (template) and also supports structuring elements of any shape. With a 3×3 element, dilation grows an object by one pixel; it smooths the edges of objects, reduces or fills the distance between objects, and shrinks the small holes caused by differences between images, so that the regional image is complete.
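Binary dilation with a 3×3 square structuring element can be sketched in pure Python as follows: an output pixel becomes foreground if any pixel under the element is foreground. The function name and the `iterations` parameter are assumptions.

```python
def dilate3x3(binary, iterations=1):
    """Grow foreground regions by one pixel per iteration (3x3 square element)."""
    h, w = len(binary), len(binary[0])
    current = [row[:] for row in binary]
    for _ in range(iterations):
        grown = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                neighborhood = (
                    current[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))
                )
                grown[i][j] = 255 if any(neighborhood) else 0
        current = grown
    return current

# Two strokes separated by a two-pixel gap (illustrative data).
strokes = [[0, 0, 255, 0, 0, 255, 0, 0, 0, 0]]
joined = dilate3x3(strokes)
```

After one pass each stroke has grown by one pixel on every side, so the two-pixel gap between them is closed and they form a single connected region.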
- In this way, the contours of multiple bounding boxes can connect with each other.
- After step S130, if the outline definition of the text area has not reached the preset threshold, steps S120 and S130 are repeated until it does.
- The hyperparameters fall into at least three categories: 1. the threshold parameter for contour definition; 2. the size and concentration of the erosion particles; 3. the contours of the connected area, that is, the parameter for the number of contours.
- The preliminary edge detection is corrected through the erosion + connectivity parameter settings, and accurate edge detection results are obtained.
- The erosion + connectivity parameters are set for different scenes and different texts (size, color, and background). The preliminary positioning uses a basic parameter obtained through rough estimation; after the parameters are corrected for these scenes and texts, the best effect is achieved by finally combining them.
- convexhull is applied again to obtain a bounding box for the remaining pixels: after the bounding boxes with a small area are deleted (that is, the noise is removed) and the outlines of the bounding boxes are connected by erosion (that is, simple characters are filtered), the remaining discrete pixels need to be gathered into one aggregate.
- S150: Regress the rectangular range of the merged bounding box according to the font width of the text area, and select the bounding box with the smallest rectangular range; the area corresponding to that bounding box is the text area to be recognized.
- The rectangular range of the merged bounding box is further regressed according to the font width. Because the font size and area in the video are fixed, they can be handled separately, that is, different parameters can be set for each. Starting from pixels, image processing introduces errors, such as rectangles that are too large and wide or too small and narrow, which are affected by lighting, color, background, and so on. The font, however, is fixed, so there is a measurable deviation from it, such as the scaling ratio of width and height, that serves as the reference for the regression.
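The union and selection steps can be sketched as below. This is a rough illustration under stated assumptions: boxes are (x_min, y_min, x_max, y_max) tuples, connected boxes are merged into their enclosing rectangle, and the smallest merged rectangle is chosen. The font-width regression itself is not specified in enough detail in the source and is left out.

```python
def overlap(a, b):
    """True if two axis-aligned boxes touch or intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(a, b):
    """Smallest axis-aligned box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_connected(boxes):
    """Repeatedly replace any two overlapping boxes by their union."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged

def smallest_box(boxes):
    """Box with the smallest rectangular range (area)."""
    return min(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))

# Three dilated character boxes on one text line, plus a large unrelated region.
boxes = [(0, 0, 10, 5), (8, 0, 20, 5), (19, 1, 30, 6), (50, 50, 120, 90)]
merged = merge_connected(boxes)
text_box = smallest_box(merged)
```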
- In summary, this application uses the binarizing Canny algorithm to detect contours, uses findContours to find the discrete points of the contours, finds the convex hulls with convexhull and forms the bounding boxes, calculates the bounding box areas, selects the best bounding box, and completes the text recognition (extraction).
- the present application provides an OSD text area recognition system 200.
- FIG. 2 is a schematic structural diagram of a preferred embodiment of the OSD text area recognition system of this application.
- The OSD text area recognition system 200 includes an image preprocessing unit 210, a bounding box forming unit 220, and a bounding box selection unit 230. The image preprocessing unit 210 is used to preprocess the OSD file to obtain frame-by-frame images, perform binarization threshold filtering on them using the Canny algorithm, acquire frame contours from the filtered images, and find the discrete points of the contour of the text area. The bounding box forming unit 220 is used to encapsulate the discrete points into at least two polygonal bounding boxes, calculate the area of the bounding boxes, and screen them according to preset conditions; to enlarge the screened bounding boxes proportionally with an expansion (dilation) algorithm so that their outlines connect with one another; and, after the outlines are connected, to take the union of the remaining bounding boxes according to a prese
- The image preprocessing unit 210 includes a threshold filtering subunit 211 and a discrete point acquisition subunit 212. The threshold filtering subunit 211 is used to preprocess the OSD file to obtain frame-by-frame images and to perform binarization threshold filtering on them using the Canny algorithm; the discrete point acquisition subunit 212 is used to acquire frame contours from the filtered images and find the discrete points of the contour of the text area, which it obtains with findContours.
- The bounding box forming unit 220 includes a bounding box forming subunit 221, a bounding box connecting subunit 222, and a bounding box merging subunit 223. The bounding box forming subunit 221 is used to encapsulate the discrete points into at least two polygonal bounding boxes, calculate the area of the bounding boxes, and screen them according to preset conditions; the bounding box connecting subunit 222 is used to enlarge the screened bounding boxes proportionally with an expansion (dilation) algorithm so that their outlines connect with one another; the bounding box merging subunit 223 is used to take the union of the remaining connected bounding boxes according to a preset condition.
- It further includes a threshold judging subunit (not shown in the figure) for judging whether the outline definition of the text area has reached a preset threshold after the bounding box connecting subunit 222 has connected the bounding boxes. If not, the bounding box forming subunit 221 and the bounding box connecting subunit 222 repeat the screening until the outline definition of the text area reaches the preset threshold; if so, the bounding box merging subunit 223 takes the union of the remaining connected bounding boxes.
- A bounding box screening unit 240 is used to screen the bounding boxes by setting the erosion particle size and concentration parameters, and to set the contour layer parameters of the connected area so that the contours of the bounding boxes connect with one another.
- the bounding box forming subunit 221 includes a bounding box forming module and a bounding box screening module.
- the bounding box forming module is used to encapsulate the discrete points into at least two polygonal bounding boxes; the bounding box screening module uses In calculating the area of the bounding box, the bounding box is screened according to preset conditions; wherein the bounding box forming module includes a convex hull forming sub-module and a bounding box forming sub-module, and the convex hull forming sub-module,
- the discrete points are used to obtain convex hulls through convexhull; the bounding box forms a sub-module for encapsulating the obtained convex hulls into polygonal bounding boxes one by one.
- the bounding box screening module includes a size obtaining sub-module, an area obtaining sub-module, and a screening sub-module; the size obtaining sub-module is used to determine the 4 edge points of the bounding box and to calculate the width and height of the bounding box from these 4 edge points.
- the area obtaining sub-module is used to obtain the area of the bounding box from the width and height; the screening sub-module is used to screen the bounding boxes according to preset conditions.
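A minimal sketch of the size, area, and screening sub-modules follows. The description does not state the preset conditions, so the `min_area` and `max_area` limits, the function names, and the sample polygons are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def box_from_polygon(polygon: List[Point]) -> Box:
    """Take the extreme left, top, right, and bottom coordinates of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box: Box) -> int:
    """Area computed from the width and height of the box."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return width * height


def screen_boxes(boxes: List[Box], min_area: int, max_area: int) -> List[Box]:
    """Keep boxes whose area lies inside the (assumed) preset range."""
    return [b for b in boxes if min_area <= box_area(b) <= max_area]


# Hypothetical polygons: a speck of noise, a character, and a large background shape.
polygons = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],
    [(10, 10), (30, 10), (30, 40), (10, 40)],
    [(0, 0), (500, 0), (500, 300), (0, 300)],
]
boxes = [box_from_polygon(p) for p in polygons]
kept = screen_boxes(boxes, min_area=50, max_area=10_000)
```

Screening by area removes boxes that are too small to be characters (noise) or too large (background structures); the actual limits would depend on the OSD font size.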
- the threshold filtering subunit includes a gradient amplitude acquisition module and an edge determination module; the gradient amplitude acquisition module is used to preprocess the OSD file to obtain a frame-by-frame image, and to obtain the gradient amplitude of the frame-by-frame image by using a cumulative histogram.
- the edge determination module is used to set a high threshold TH and a low threshold TL, and determine the pixel with a gradient amplitude greater than the high threshold TH as an edge.
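The double-threshold idea can be sketched as follows. The description only states that a cumulative histogram is used and that pixels above TH are edges; the choice of TH as a fixed fraction of the cumulative histogram, the ratio between TL and TH, and the treatment of values between TL and TH as weak candidates (as in the usual Canny formulation) are assumptions here.

```python
from typing import List, Tuple


def thresholds_from_cumulative_histogram(
    magnitudes: List[List[int]],
    high_fraction: float = 0.7,  # assumed: share of pixels at or below TH
    low_ratio: float = 0.4,      # assumed: TL as a fraction of TH
    levels: int = 256,
) -> Tuple[int, int]:
    """Pick TH where the cumulative histogram reaches high_fraction, then derive TL."""
    hist = [0] * levels
    for row in magnitudes:
        for m in row:
            hist[min(m, levels - 1)] += 1
    total = sum(hist)
    cumulative = 0
    th = levels - 1
    for value, count in enumerate(hist):
        cumulative += count
        if cumulative >= high_fraction * total:
            th = value
            break
    tl = int(low_ratio * th)
    return th, tl


def classify(magnitudes: List[List[int]], th: int, tl: int) -> List[List[int]]:
    """2 = edge (above TH), 1 = weak candidate (between TL and TH), 0 = non-edge."""
    return [[2 if m > th else 1 if m > tl else 0 for m in row] for row in magnitudes]


# Hypothetical 3x3 gradient amplitude map.
gradient = [
    [0, 10, 200],
    [5, 120, 250],
    [0, 0, 90],
]
TH, TL = thresholds_from_cumulative_histogram(gradient)
labels = classify(gradient, TH, TL)
```

Deriving the thresholds from the image's own histogram, rather than fixing them, is one way to keep edge detection stable when the overall brightness of the frames changes.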
- the present application provides a method for recognizing the text area of the OSD, which is applied to an electronic device 3.
- FIG. 3 is a schematic diagram of an application environment of a preferred embodiment of the method for recognizing the text area of the OSD of this application.
- the electronic device 3 may be a terminal device with computing capability, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
- the electronic device 3 includes a processor 32, a memory 31, a communication bus 33, and a network interface 34.
- the memory 31 includes at least one type of readable storage medium.
- the at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, card-type memory, and the like.
- the readable storage medium may be an internal storage unit of the electronic device 3, such as a hard disk of the electronic device 3.
- the readable storage medium may also be an external memory of the electronic device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 3.
- the readable storage medium of the memory 31 is generally used to store the OSD text area recognition program 30 and other software installed in the electronic device 3.
- the memory 31 can also be used to temporarily store data that has been output or will be output.
- the processor 32 may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code or process the data stored in the memory 31, for example to execute the OSD text area recognition program 30.
- the communication bus 33 is used to realize the connection and communication between these components.
- the network interface 34 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 3 and other electronic devices.
- FIG. 3 only shows the electronic device 3 with the components 31-34, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
- the electronic device 3 may also include a user interface.
- the user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with voice recognition capability, and a voice output device such as a speaker or earphones.
- the user interface may also include a standard wired interface and a wireless interface.
- the electronic device 3 may also include a display, and the display may also be called a display screen or a display unit.
- the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like.
- the display is used to display the information processed in the electronic device 3 and used to display recognized characters.
- the electronic device 3 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
- the memory 31, as a computer storage medium, may include an operating system and an OSD text area recognition program 30; when the processor 32 executes the OSD text area recognition program 30 stored in the memory 31, the following steps are implemented: preprocess the OSD file to obtain a frame-by-frame image, use the Canny algorithm to perform binarization threshold filtering on the frame-by-frame image, obtain the frame contour of the filtered image, and find the discrete points of the contour of the text area; encapsulate the discrete points into multiple polygonal bounding boxes, and screen the bounding boxes according to preset conditions; proportionally enlarge the screened bounding boxes by means of an expansion algorithm; take the union of the remaining connected bounding boxes according to preset conditions; back-calculate the rectangular range of each merged bounding box according to the font width of the text area, and select the bounding box with the smallest rectangular range, the area corresponding to that bounding box being the text area to be recognized.
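The enlargement and union steps can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: boxes are axis-aligned rectangles, "proportional enlargement" is taken to mean scaling each box about its center by a hypothetical factor, and the union of connected boxes is taken to be their enclosing rectangle. The font-width back-calculation is not modeled.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def scale_box(box: Box, factor: float) -> Box:
    """Enlarge a box proportionally about its center."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) / 2 * factor, (y1 - y0) / 2 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def overlaps(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes touch or intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest axis-aligned rectangle enclosing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_connected(boxes: List[Box]) -> List[Box]:
    """Repeatedly merge any pair of overlapping boxes until none overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Hypothetical character boxes: two neighbouring glyphs and one far away.
chars = [(0, 0, 10, 10), (12, 0, 22, 10), (100, 0, 110, 10)]
enlarged = [scale_box(b, 1.5) for b in chars]
regions = merge_connected(enlarged)
```

In this example the two neighbouring glyph boxes do not touch before enlargement, but after scaling by 1.5 they overlap and merge into one candidate text region, while the distant box stays separate.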
- the OSD text area recognition program 30 can also be divided into one or more modules, and the one or more modules are stored in the memory 31 and executed by the processor 32 to implement this application.
- the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
- the OSD text area recognition program 30 can be divided into: an image preprocessing unit, a bounding box forming unit, and a bounding box selection unit.
- the image preprocessing unit is used to preprocess the OSD file to obtain a frame-by-frame image, use the Canny algorithm to perform binarization threshold filtering on the frame-by-frame image, obtain the frame contour of the filtered image, and find the discrete points of the contour of the text area.
- the bounding box forming unit is used to encapsulate the discrete points into multiple polygonal bounding boxes, calculate the areas of the bounding boxes, screen the bounding boxes according to preset conditions, and proportionally enlarge the screened bounding boxes by means of an expansion algorithm so that the outlines of the bounding boxes become interconnected; after the outlines are connected, it takes the union of the remaining connected bounding boxes according to preset conditions.
- the bounding box selection unit is used to back-calculate the rectangular range of each merged bounding box according to the font width of the text area, and to select the bounding box with the smallest rectangular range, the area corresponding to that bounding box being the text area to be recognized.
- an embodiment of the present application also proposes a computer-readable storage medium.
- the computer-readable storage medium includes an OSD text area recognition program.
- when the OSD text area recognition program is executed by a processor, the following operations are implemented: preprocess the OSD file to obtain a frame-by-frame image, use the Canny algorithm to perform binarization threshold filtering on the frame-by-frame image, obtain the frame contour of the filtered image, and find the discrete points of the contour of the text area; encapsulate the discrete points into multiple polygonal bounding boxes, and screen the bounding boxes according to preset conditions; proportionally enlarge the screened bounding boxes by means of an expansion algorithm; and take the union of the remaining connected bounding boxes according to preset conditions.
- the specific implementation of the computer-readable storage medium of the present application is substantially the same as the specific implementation of the above-mentioned OSD text area recognition method and electronic device, and will not be repeated here.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
The present invention relates to the technical field of image recognition and concerns a method for identifying an OSD text region. The method comprises: preprocessing an OSD file to obtain a frame-by-frame image, performing binarization threshold filtering on the frame-by-frame image using the Canny algorithm, acquiring the frame contour of the filtered image, and finding the discrete points of the contour of a text region; encapsulating the discrete points into at least two polygonal bounding boxes, and screening the bounding boxes according to a preset condition; proportionally enlarging the screened bounding boxes by means of an expansion algorithm; taking, according to a preset condition, the union of the bounding boxes remaining after connection; and back-calculating, according to the font width in the text region, the rectangular ranges of the bounding boxes after the union, selecting the bounding box with the smallest rectangular range, and taking the region corresponding to that bounding box as the text region to be identified. According to the present application, a segmentation method based on edge detection is used to segment OSD text and obtain segmented regions, so that the detection is shielded from interference caused by dynamic changes in illumination.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910885665.8A (CN110717489B, zh) | 2019-09-19 | 2019-09-19 | Method, device, and storage medium for recognizing the text area of OSD |
| CN201910885665.8 | 2019-09-19 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021051604A1 (fr) | 2021-03-25 |
Family
ID=69209932
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2019/118284 (Ceased; WO2021051604A1, fr) | Method for identifying OSD text region, device, and storage medium | 2019-09-19 | 2019-11-14 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN110717489B (fr) |
| WO (1) | WO2021051604A1 (fr) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100303356A1 (en) * | 2007-11-28 | 2010-12-02 | Knut Tharald Fosseide | Method for processing optical character recognition (ocr) data, wherein the output comprises visually impaired character images |
| CN107563380A (zh) * | 2017-09-08 | 2018-01-09 | 上海理工大学 | Vehicle license plate detection and recognition method based on a combination of MSER and SWT |
| CN108805116A (zh) * | 2018-05-18 | 2018-11-13 | 浙江蓝鸽科技有限公司 | Image text detection method and system |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
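| CN108171104B (zh) * | 2016-12-08 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Text detection method and device |
| CN109002824B (zh) * | 2018-06-27 | 2021-11-12 | 淮阴工学院 | OpenCV-based method for detecting label information in architectural drawings |
| CN109670500B (zh) * | 2018-11-30 | 2024-06-28 | 平安科技(深圳)有限公司 | Text region acquisition method, device, storage medium, and terminal device |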
2019
- 2019-09-19 CN CN201910885665.8A patent/CN110717489B/zh, status: Active
- 2019-11-14 WO PCT/CN2019/118284 patent/WO2021051604A1/fr, status: Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN110717489B (zh) | 2023-09-15 |
| CN110717489A (zh) | 2020-01-21 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19945508; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19945508; Country of ref document: EP; Kind code of ref document: A1 |