Differences
This page shows the differences between two revisions of the page.
wiki:tesseract [2012/07/20 10:48] [OPTIONS] formatting |
wiki:tesseract [2017/03/22 20:56] |
||
---|---|---|---|
Line 1: | Line 1: | ||
======== Tesseract ======== | ======== Tesseract ======== | ||
+ | ''tesseract'' is a command-line OCR engine. | ||
- | ''tesseract'' is a command-line OCR engine | + | ==== Description ==== |
+ | ''Tesseract'' is a high-quality, open-source command-line OCR engine. The program currently works with UTF-8; language support (including Russian since version 3.0) is provided by additional modules. | ||
+ | |||
+ | Several graphical front ends (GUIs) exist for the program: //gImageReader, OCRFeeder, YAGF//. | ||
==== Syntax ==== | ==== Syntax ==== | ||
<code bash>tesseract imagename outbase [-l lang] [-psm N] [configfile ...]</code> | <code bash>tesseract imagename outbase [-l lang] [-psm N] [configfile ...]</code> | ||
- | ==== Description ==== | ||
- | |||
- | ''tesseract(1)'' is a commercial-quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005, and has been developed at Google since then. | ||
==== Options ==== | ==== Options ==== | ||
<code bash>imagename</code> | <code bash>imagename</code> | ||
Line 47: | Line 47: | ||
<note>Nota Bene: The options -l lang and -psm N must occur before any configfile.</note> | <note>Nota Bene: The options -l lang and -psm N must occur before any configfile.</note> | ||
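The ordering rule in the note can be illustrated with a small shell sketch. It assumes only the syntax line given on this page (''-l'', ''-psm'', then config files); the helper ''build_tesseract_cmd'' and the names ''scan.png'', ''result'' and ''myconfig'' are hypothetical and chosen purely for illustration. The function only prints the command line, so it runs even where Tesseract is not installed.

```bash
# build_tesseract_cmd is a hypothetical helper, not part of Tesseract.
# It prints a command line in which -l and -psm precede any config file.
# Usage: build_tesseract_cmd imagename outbase lang psm [configfile ...]
build_tesseract_cmd() {
    image=$1
    outbase=$2
    lang=$3
    psm=$4
    shift 4                                   # what remains are config files
    cmd="tesseract $image $outbase"
    if [ -n "$lang" ]; then cmd="$cmd -l $lang"; fi
    if [ -n "$psm" ]; then cmd="$cmd -psm $psm"; fi
    for cfg in "$@"; do                       # config files always go last
        cmd="$cmd $cfg"
    done
    printf '%s\n' "$cmd"
}

build_tesseract_cmd scan.png result rus 3 myconfig
# prints: tesseract scan.png result -l rus -psm 3 myconfig
```

Because the helper joins its arguments into one string, it is meant for display only; file names containing spaces would need proper quoting before the command is actually run.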
==== Languages ==== | ==== Languages ==== | ||
- | There are currently language packs available for the following | + | There are currently language packs available for the following languages: |
- | languages: | + | |
- | ara (Arabic), aze (Azerbaijani), bul (Bulgarian), cat (Catalan), ces | + | - ara (Arabic), |
- | (Czech), chi_sim (Simplified Chinese), chi_tra (Traditional Chinese), | + | - aze (Azerbaijani), |
- | chr (Cherokee), dan (Danish), dan-frak (Danish (Fraktur)), deu | + | - bul (Bulgarian), |
- | (German), ell (Greek), eng (English), enm (Middle English), epo | + | - cat (Catalan), |
- | (Esperanto), est (Estonian), fin (Finnish), fra (French), frm (Middle | + | - ces (Czech), |
- | French), glg (Galician), heb (Hebrew), hin (Hindi), hrv (Croatian), hun | + | - chi_sim (Simplified Chinese), |
- | (Hungarian), ind (Indonesian), ita (Italian), jpn (Japanese), kor | + | - chi_tra (Traditional Chinese), |
- | (Korean), lav (Latvian), lit (Lithuanian), nld (Dutch), nor | + | - chr (Cherokee), |
- | (Norwegian), pol (Polish), por (Portuguese), ron (Romanian), rus | + | - dan (Danish), |
- | (Russian), slk (Slovakian), slv (Slovenian), sqi (Albanian), spa | + | - dan-frak (Danish (Fraktur)), |
- | (Spanish), srp (Serbian), swe (Swedish), tam (Tamil), tel (Telugu), tgl | + | - deu (German), |
- | (Tagalog), tha (Thai), tur (Turkish), ukr (Ukrainian), vie (Vietnamese) | + | - ell (Greek), |
+ | - eng (English), | ||
+ | - enm (Middle English), | ||
+ | - epo (Esperanto), | ||
+ | - est (Estonian), | ||
+ | - fin (Finnish), | ||
+ | - fra (French), | ||
+ | - frm (Middle French), | ||
+ | - glg (Galician), | ||
+ | - heb (Hebrew), | ||
+ | - hin (Hindi), | ||
+ | - hrv (Croatian), | ||
+ | - hun (Hungarian), | ||
+ | - ind (Indonesian), | ||
+ | - ita (Italian), | ||
+ | - jpn (Japanese), | ||
+ | - kor (Korean), | ||
+ | - lav (Latvian), | ||
+ | - lit (Lithuanian), | ||
+ | - nld (Dutch), | ||
+ | - nor (Norwegian), | ||
+ | - pol (Polish), | ||
+ | - por (Portuguese), | ||
+ | - ron (Romanian), | ||
+ | - rus (Russian), | ||
+ | - slk (Slovakian), | ||
+ | - slv (Slovenian), | ||
+ | - sqi (Albanian), | ||
+ | - spa (Spanish), | ||
+ | - srp (Serbian), | ||
+ | - swe (Swedish), | ||
+ | - tam (Tamil), | ||
+ | - tel (Telugu), | ||
+ | - tgl (Tagalog), | ||
+ | - tha (Thai), | ||
+ | - tur (Turkish), | ||
+ | - ukr (Ukrainian), | ||
+ | - vie (Vietnamese) | ||
- | To use a non-standard language pack named foo.traineddata, set the | + | To use a non-standard language pack named foo.traineddata, set the TESSDATA_PREFIX environment variable so the file can be found at TESSDATA_PREFIX/tessdata/foo.traineddata and give Tesseract the argument -l foo. |
- | TESSDATA_PREFIX environment variable so the file can be found at | + | |
- | TESSDATA_PREFIX/tessdata/foo.traineddata and give Tesseract the | + | |
- | argument -l foo. | + | |
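The lookup rule for a non-standard language pack can be sketched in shell as follows. This is only an illustration of the path rule stated above: ''traineddata_path'' is a hypothetical helper and ''/opt/ocr'' is a placeholder directory, neither of which comes from Tesseract itself.

```bash
# traineddata_path is a hypothetical helper: it prints the path at which,
# per the rule above, the language pack is expected to be found.
# $1 = value of TESSDATA_PREFIX, $2 = language name (e.g. foo)
traineddata_path() {
    # ${1%/} drops one trailing slash so the result has no "//"
    printf '%s/tessdata/%s.traineddata\n' "${1%/}" "$2"
}

# /opt/ocr is a placeholder; point it at the directory that contains tessdata/.
TESSDATA_PREFIX=/opt/ocr
export TESSDATA_PREFIX

pack=$(traineddata_path "$TESSDATA_PREFIX" foo)
if [ -f "$pack" ]; then
    echo "found $pack; run: tesseract imagename outbase -l foo"
else
    echo "missing $pack"
fi
```

Checking for the file first gives a clearer message than a failed recognition run when the prefix points at the wrong directory.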
==== History ==== | ==== History ==== | ||
+ | ''Tesseract'' was developed by HP between 1985 and 1995 and then went unchanged for ten years. Its source code was opened in 2005. Since 2006, development of the engine has been sponsored by Google. | ||
+ | |||
The engine was developed at Hewlett Packard Laboratories Bristol and at | The engine was developed at Hewlett Packard Laboratories Bristol and at | ||
Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some | Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some | ||
Line 100: | Line 135: | ||
==== Resources ==== | ==== Resources ==== | ||
- | Main web site: http://code.google.com/p/tesseract-ocr/ Information on | + | * Project site: https://github.com/tesseract-ocr |
- | training: | + | * Documentation: https://github.com/tesseract-ocr/tesseract/wiki |
- | http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 | + | * Wikipedia: https://ru.wikipedia.org/wiki/Tesseract |
==== See also ==== | ==== See also ==== | ||
ambiguous_words(1), cntraining(1), combine_tessdata(1), | ambiguous_words(1), cntraining(1), combine_tessdata(1), | ||
Line 120: | Line 154: | ||
Samuel Charron, Sheelagh Lloyd, Shobhit Saxena, and Thomas Kielbus. | Samuel Charron, Sheelagh Lloyd, Shobhit Saxena, and Thomas Kielbus. | ||
- | ==== COPYING ==== | + | ==== Copying ==== |
- | Licensed under the Apache License, Version 2.0 | + | |
+ | Licensed under the //Apache License, Version 2.0// | ||
{{tag>tesseract}} | {{tag>tesseract}} |