BRPI0604212A - Automatic detection of character encoding - Google Patents
Automatic detection of character encoding
Info
- Publication number
- BRPI0604212A (BRPI0604212-0A)
- Authority
- BR
- Brazil
- Prior art keywords
- legally
- candidates
- text strings
- character encoding
- automatic character
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Document Processing Apparatus (AREA)
Abstract
AUTOMATIC DETECTION OF CHARACTER ENCODING. The present invention relates to a method for detecting the encoding used in an electronic document. The method includes testing the text strings to determine whether the electronic document contains only text strings having legal numeric codes. A statistical analysis of the text strings is then performed to provide a mapping of the legally encoded candidates. The legally encoded candidates are ranked and combined with an expected ranking of the legally encoded candidates to provide a most probable character mapping.
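The three steps named in the abstract (legality test, statistical ranking, combination with an expected ranking) can be sketched roughly as below. This is an illustrative sketch only, not the patented method: the candidate list, the prior weights in `EXPECTED_RANK`, the `statistical_score` heuristic, and the choice to combine scores by multiplication are all assumptions introduced here for the example.

```python
# Hypothetical candidate encodings with an assumed "expected ranking" weight.
# These names and numbers are illustrative; they do not come from the patent.
EXPECTED_RANK = {"ascii": 1.0, "utf-8": 0.9, "shift_jis": 0.5, "latin-1": 0.3}


def is_legal(data: bytes, encoding: str) -> bool:
    """Step 1: the data is legal if every byte sequence decodes in this encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def statistical_score(data: bytes, encoding: str) -> float:
    """Step 2 (assumed heuristic): fraction of decoded characters that look like text."""
    text = data.decode(encoding)
    if not text:
        return 0.0
    plausible = sum(
        ch.isalnum() or ch.isspace() or ch in ".,;:!?'\"()-" for ch in text
    )
    return plausible / len(text)


def detect(data: bytes) -> list[str]:
    """Step 3: rank legal candidates by statistic times expected-rank weight."""
    legal = [enc for enc in EXPECTED_RANK if is_legal(data, enc)]
    return sorted(
        legal,
        key=lambda enc: statistical_score(data, enc) * EXPECTED_RANK[enc],
        reverse=True,
    )


if __name__ == "__main__":
    sample = "detecção".encode("utf-8")
    print(detect(sample))
```

The first element of the returned list is the most probable encoding under these assumed weights. A real detector would replace the character-class heuristic with frequency statistics gathered from reference text in each encoding.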
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/198,428 US7148824B1 (en) | 2005-08-05 | 2005-08-05 | Automatic detection of character encoding format using statistical analysis of the text strings |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| BRPI0604212A (pt) | 2007-07-17 |
Family
ID=37497287
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| BRPI0604212-0A BRPI0604212A (pt) | Automatic detection of character encoding | 2005-08-05 | 2006-08-07 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US7148824B1 (pt) |
| JP (1) | JP2007048284A (pt) |
| BR (1) | BRPI0604212A (pt) |
Families Citing this family (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7499943B2 (en) * | 2006-01-09 | 2009-03-03 | International Business Machines Corporation | Mapping for mapping source and target objects |
| NZ549548A (en) * | 2006-08-31 | 2009-04-30 | Arc Innovations Ltd | Managing supply of a utility to a customer premises |
| WO2009002593A2 (en) * | 2007-04-20 | 2008-12-31 | Stephen Murphy | Apparatuses, methods and systems for a multi-modal data interfacing platform |
| US8156432B2 (en) * | 2007-11-14 | 2012-04-10 | Zih Corp. | Detection of UTF-16 encoding in streaming XML data without a byte-order mark and related printers, systems, methods, and computer program products |
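| JP2010176237A (ja) * | 2009-01-28 | 2010-08-12 | Nec Corp | Automatic character code identification system, automatic character code identification method, and automatic character code identification program |
| CN102567293B (zh) * | 2010-12-13 | 2015-05-20 | 汉王科技股份有限公司 | Method and device for detecting the encoding format of a text file |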
| GB2489512A (en) | 2011-03-31 | 2012-10-03 | Clearswift Ltd | Classifying data using fingerprint of character encoding |
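| CN104156373B (zh) * | 2013-05-15 | 2017-06-06 | 宏碁股份有限公司 | Encoding format detection method and device |
| CN104516862B (zh) * | 2013-09-29 | 2018-05-01 | 北大方正集团有限公司 | Method and system for selecting the encoding format used to read a target document |
| CN104361021B (zh) * | 2014-10-21 | 2018-07-24 | 小米科技有限责任公司 | Web page encoding identification method and device |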
| US9665546B1 (en) * | 2015-12-17 | 2017-05-30 | International Business Machines Corporation | Real-time web service reconfiguration and content correction by detecting in invalid bytes in a character string and inserting a missing byte in a double byte character |
| DE102018108693A1 (de) * | 2017-04-13 | 2018-10-18 | Hirschmann Car Communication Gmbh | Character set detection |
| US10949617B1 (en) * | 2018-09-27 | 2021-03-16 | Amazon Technologies, Inc. | System for differentiating encoding of text fields between networked services |
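| CN110196968B (zh) * | 2019-06-06 | 2023-04-07 | 北京林业大学 | System and method for automatically identifying Simplified Chinese encodings based on searching for specific character strings |
| CN113569534A (zh) * | 2020-04-29 | 2021-10-29 | 杭州海康威视数字技术股份有限公司 | Method and device for detecting garbled characters in a document |
| CN117424765B (zh) * | 2023-12-19 | 2024-03-22 | 天津医康互联科技有限公司 | Distributed one-hot encoding method, apparatus, electronic device, and computer storage medium |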
Family Cites Families (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4843389A (en) * | 1986-12-04 | 1989-06-27 | International Business Machines Corp. | Text compression and expansion method and apparatus |
| US5282194A (en) * | 1992-08-17 | 1994-01-25 | Loral Aerospace Corporation | Interactive protocol analysis system |
| US5414650A (en) * | 1993-03-24 | 1995-05-09 | Compression Research Group, Inc. | Parsing information onto packets using context-insensitive parsing rules based on packet characteristics |
| US5649214A (en) * | 1994-09-20 | 1997-07-15 | Unisys Corporation | Method and apparatus for continued use of data encoded under a first coded character set while data is gradually transliterated to a second coded character set |
| US5684478A (en) * | 1994-12-06 | 1997-11-04 | Cennoid Technologies, Inc. | Method and apparatus for adaptive data compression |
| US5778361A (en) * | 1995-09-29 | 1998-07-07 | Microsoft Corporation | Method and system for fast indexing and searching of text in compound-word languages |
| JP3499671B2 (ja) * | 1996-02-09 | 2004-02-23 | 富士通株式会社 | Data compression apparatus and data decompression apparatus |
| US7058726B1 (en) * | 1996-07-08 | 2006-06-06 | Internet Number Corporation | Method and systems for accessing information on a network using message aliasing functions having shadow callback functions |
| US6035268A (en) * | 1996-08-22 | 2000-03-07 | Lernout & Hauspie Speech Products N.V. | Method and apparatus for breaking words in a stream of text |
| TW421750B (en) * | 1997-03-14 | 2001-02-11 | Omron Tateisi Electronics Co | Language identification device, language identification method and storage media recorded with program of language identification |
| US6049869A (en) * | 1997-10-03 | 2000-04-11 | Microsoft Corporation | Method and system for detecting and identifying a text or data encoding system |
| US6525831B1 (en) * | 1998-12-02 | 2003-02-25 | Xerox Corporation | Non-format violating PDL guessing technique to determine the page description language in which a print job is written |
| US6314469B1 (en) * | 1999-02-26 | 2001-11-06 | I-Dns.Net International Pte Ltd | Multi-language domain name service |
| MXPA01010103A (es) * | 1999-04-05 | 2002-11-04 | Neomedia Tech Inc | System and method for using machine-readable or human-readable link codes to access networked data resources |
| US6400287B1 (en) * | 2000-07-10 | 2002-06-04 | International Business Machines Corporation | Data structure for creating, scoping, and converting to unicode data from single byte character sets, double byte character sets, or mixed character sets comprising both single byte and double byte character sets |
| US6668085B1 (en) * | 2000-08-01 | 2003-12-23 | Xerox Corporation | Character matching process for text converted from images |
| US6829386B2 (en) * | 2001-02-28 | 2004-12-07 | Sun Microsystems, Inc. | Methods and apparatus for associating character codes with optimized character codes |
| US7010779B2 (en) * | 2001-08-16 | 2006-03-07 | Knowledge Dynamics, Inc. | Parser, code generator, and data calculation and transformation engine for spreadsheet calculations |
| US6650261B2 (en) * | 2001-09-06 | 2003-11-18 | Xerox Corporation | Sliding window compression method utilizing defined match locations |
| US6701320B1 (en) * | 2002-04-24 | 2004-03-02 | Bmc Software, Inc. | System and method for determining a character encoding scheme |
2005
- 2005-08-05: US application US11/198,428, patent US7148824B1 (en); not active, Expired - Fee Related
2006
- 2006-08-01: JP application JP2006209344A, patent JP2007048284A (ja); active, Pending
- 2006-08-07: BR application BRPI0604212-0A, patent BRPI0604212A (pt); not active, IP Right Cessation
Also Published As
| Publication number | Publication date |
|---|---|
| US7148824B1 (en) | 2006-12-12 |
| JP2007048284A (ja) | 2007-02-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| BRPI0604212A (pt) | Automatic detection of character encoding | |
| BR112012009445A2 (pt) | Audio encoder, audio decoder, method for encoding audio information, method for decoding audio information, and computer program using detection of a group of previously decoded spectral values | |
| Saastamoinen | The phraseology and structure of Latin building inscriptions in Roman north Africa | |
| BR112012011230A2 (pt) | Risk factors and prediction of myocardial infarction | |
| BRPI0720343A2 (pt) | Method, apparatus, and computer program for detecting computer fraud | |
| MY149569A (en) | Improvements in resisting the spread of unwanted code and data | |
| BR112014007214A8 (pt) | Method for determining the likelihood that an individual is at elevated risk of a cardiovascular event, method for assessing the risk of a future cardiovascular event, and computer-implemented method for assessing the risk of a cardiovascular event | |
| BR112015022493A2 (pt) | Demographic context determination system | |
| BR112012013160A2 (pt) | Beverage preparation machine with ambient emulsion functionality | |
| BRPI0600359A (pt) | Method and computer-readable medium for providing spreadsheet-driven key performance indicators | |
| BR112013003391A2 (pt) | Pancreatic cancer biomarkers and uses thereof | |
| ATE527834T1 (de) | Economical loudness measurement of coded audio | |
| ATE522875T1 (de) | Identification of text passages | |
| BR112015009022A2 (pt) | Methods for determining the abundance of an analyte in a plurality of samples | |
| WO2015003143A3 (en) | Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus | |
| Pawelka et al. | Is this code written in English? A study of the natural language of comments and identifiers in practice | |
| BR112013023409A2 (pt) | Method and test kit for determining nitrate concentration | |
| BR112015014557A2 (pt) | Method, apparatus, and system for indexing content based on time information | |
| RU2011153489A (ru) | Method for automated determination of the language and/or encoding of a text document | |
| BR112013006724A2 (pt) | Apparatus, integrated circuit or chipset, positioning device, method, computer program, and signal | |
| BRPI0705108A2 (pt) | System and method for evaluating capacitive bushings | |
| Olsson et al. | Perception of glare in relation to the CIE scale on Unified Glare Rating (UGR) and the impact of ambient light on both UGR and Subjective Glare Index Scales (SGI) | |
| 王薇 | Analysis of Addition in English Translation of Su Shi's Lyric Poems | |
| Sabry | The phenomenon of substitution in the Akkadian and Arabic languages-a comparative study | |
| Azenabor | Developing electronic government models for Nigeria: an analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | B08F | Application dismissed because of non-payment of annual fees [chapter 8.6 patent gazette] | Free format text: Regarding the 10th annuity. |
| | B08K | Patent lapsed as no evidence of payment of the annual fee has been furnished to INPI [chapter 8.11 patent gazette] | Free format text: By virtue of the dismissal published in RPI 2385 of 20-09-2016, and considering the absence of any response within the legal deadlines, the dismissal of the patent application is to be maintained, in accordance with Article 12 of Resolution 113/2013. |