Differences

The differences between two versions of this page are shown below.

--- wiki:tesseract [2012/07/20 10:24]
[Author]
+++ wiki:tesseract [2017/03/22 20:56] (current)
@@ Line 1: / Line 1: @@
 ======== Tesseract ========
+''tesseract'' is a command-line OCR engine.
-''tesseract'' is a command-line OCR engine
+==== Description ====
+''Tesseract'' is a high-quality, open-source command-line OCR engine. It currently works with UTF-8; language support (including Russian since version 3.0) is provided by additional modules.
+Several graphical front ends (GUIs) exist for the program: //gImageReader, OCRFeeder, YAGF//.
 ==== Syntax ====
-<code bash>
+<code bash>tesseract imagename outbase [-l lang] [-psm N] [configfile ...]</code>
-       tesseract imagename outbase [-l lang] [-psm N] [configfile ...]
-       </code>
-==== DESCRIPTION ====
+==== Options ====
-       tesseract(1) is a commercial quality OCR engine originally developed at
+<code bash>imagename</code>
-       HP between 1985 and 1995. In 1995, this engine was among the top 3
+The name of the input image. Most image file formats (anything readable by Leptonica) are supported.
-       evaluated by UNLV. It was open-sourced by HP and UNLV in 2005, and has
-       been developed at Google since then.
-==== OPTIONS ====
+<code bash>outbase</code>
-       imagename
+The basename of the output file (to which the appropriate extension will be appended). By default the output will be named outbase.txt.
-           The name of the input image. Most image file formats (anything
-           readable by Leptonica) are supported.
-       outbase
+<code bash>-l lang</code>
-           The basename of the output file (to which the appropriate extension
-           will be appended). By default the output will be named outbase.txt.
-       -l lang
+The language to use. If none is specified, English is assumed. Multiple languages may be specified, separated by plus characters. Tesseract uses 3-character ISO 639-2 language codes. (See LANGUAGES)
-           The language to use. If none is specified, English is assumed.
-           Multiple languages may be specified, separated by plus characters.
-           Tesseract uses 3-character ISO 639-2 language codes. (See
-           LANGUAGES)
-       -psm N
+<code bash>-psm N</code>
-           Set Tesseract to only run a subset of layout analysis and assume a
+Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
-           certain form of image. The options for N are:
-= Orientation and script detection (OSD) only.
+  *    0 = Orientation and script detection (OSD) only.
-= Automatic page segmentation with OSD.
+  *    1 = Automatic page segmentation with OSD.
-= Automatic page segmentation, but no OSD, or OCR.
+  *    2 = Automatic page segmentation, but no OSD, or OCR.
-= Fully automatic page segmentation, but no OSD. (Default)
+  *    3 = Fully automatic page segmentation, but no OSD. (Default)
-= Assume a single column of text of variable sizes.
+  *    4 = Assume a single column of text of variable sizes.
-= Assume a single uniform block of vertically aligned text.
+  *    5 = Assume a single uniform block of vertically aligned text.
-= Assume a single uniform block of text.
+  *    6 = Assume a single uniform block of text.
-= Treat the image as a single text line.
+  *    7 = Treat the image as a single text line.
-= Treat the image as a single word.
+  *    8 = Treat the image as a single word.
-= Treat the image as a single word in a circle.
+  *    9 = Treat the image as a single word in a circle.
-= Treat the image as a single character.
+  *    10 = Treat the image as a single character.
-       -v
+<code bash>-v</code>
-           Returns the current version of the tesseract(1) executable.
+Returns the current version of the tesseract(1) executable.
-       configfile
+<code bash>configfile</code>
-           The name of a config to use. A config is a plaintext file which
-           contains a list of variables and their values, one per line, with a
-           space separating variable from value. Interesting config files
-           include:
-           o   hocr - Output in hOCR format instead of as a text file.
+The name of a config to use. A config is a plaintext file which contains a list of variables and their values, one per line, with a space separating variable from value. Interesting config files include:
-       Nota Bene: The options -l lang and -psm N must occur before any
+  * hocr - Output in hOCR format instead of as a text file.
-       configfile.
+<note>Nota Bene: The options -l lang and -psm N must occur before any configfile.</note>
 ==== Languages ====
-       There are currently language packs available for the following
+There are currently language packs available for the following languages:
-       languages:
-       ara (Arabic), aze (Azerbauijani), bul (Bulgarian), cat (Catalan), ces
+  - ara (Arabic)
-       (Czech), chi_sim (Simplified Chinese), chi_tra (Traditional Chinese),
+  - aze (Azerbaijani)
-       chr (Cherokee), dan (Danish), dan-frak (Danish (Fraktur)), deu
+  - bul (Bulgarian)
-       (German), ell (Greek), eng (English), enm (Old English), epo
+  - cat (Catalan)
-       (Esperanto), est (Estonian), fin (Finnish), fra (French), frm (Old
+  - ces (Czech)
-       French), glg (Galician), heb (Hebrew), hin (Hindi), hrv (Croation), hun
+  - chi_sim (Simplified Chinese)
-       (Hungarian), ind (Indonesian), ita (Italian), jpn (Japanese), kor
+  - chi_tra (Traditional Chinese)
-       (Korean), lav (Latvian), lit (Lithuanian), nld (Dutch), nor
+  - chr (Cherokee)
-       (Norwegian), pol (Polish), por (Portuguese), ron (Romanian), rus
+  - dan (Danish)
-       (Russian), slk (Slovakian), slv (Slovenian), sqi (Albanian), spa
+  - dan-frak (Danish (Fraktur))
-       (Spanish), srp (Serbian), swe (Swedish), tam (Tamil), tel (Telugu), tgl
+  - deu (German)
-       (Tagalog), tha (Thai), tur (Turkish), ukr (Ukrainian), vie (Vietnamese)
+  - ell (Greek)
+  - eng (English)
+  - enm (Old English)
+  - epo (Esperanto)
+  - est (Estonian)
+  - fin (Finnish)
+  - fra (French)
+  - frm (Old French)
+  - glg (Galician)
+  - heb (Hebrew)
+  - hin (Hindi)
+  - hrv (Croatian)
+  - hun (Hungarian)
+  - ind (Indonesian)
+  - ita (Italian)
+  - jpn (Japanese)
+  - kor (Korean)
+  - lav (Latvian)
+  - lit (Lithuanian)
+  - nld (Dutch)
+  - nor (Norwegian)
+  - pol (Polish)
+  - por (Portuguese)
+  - ron (Romanian)
+  - rus (Russian)
+  - slk (Slovakian)
+  - slv (Slovenian)
+  - sqi (Albanian)
+  - spa (Spanish)
+  - srp (Serbian)
+  - swe (Swedish)
+  - tam (Tamil)
+  - tel (Telugu)
+  - tgl (Tagalog)
+  - tha (Thai)
+  - tur (Turkish)
+  - ukr (Ukrainian)
+  - vie (Vietnamese)
-       To use a non-standard language pack named foo.traineddata, set the
+To use a non-standard language pack named foo.traineddata, set the TESSDATA_PREFIX environment variable so the file can be found at TESSDATA_PREFIX/tessdata/foo.traineddata and give Tesseract the argument -l foo.
-       TESSDATA_PREFIX environment variable so the file can be found at
-       TESSDATA_PREFIX/tessdata/foo.traineddata and give Tesseract the
-       argument -l foo.
  ==== History ====
+''Tesseract'' was developed by HP between 1985 and 1995 and then went unchanged for ten years. The source code was opened in 2005. Since 2006, development of the engine has been sponsored by Google.
        The engine was developed at Hewlett Packard Laboratories Bristol and at
        Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
@@ Line 113: / Line 135: @@
  ==== Resources ====
-       Main web site: http://code.google.com/p/tesseract-ocr/ Information on
+  * Project site: https://github.com/tesseract-ocr
-       training:
+  * Documentation: https://github.com/tesseract-ocr/tesseract/wiki
-       http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
+  * Wikipedia: https://ru.wikipedia.org/wiki/Tesseract
  ==== See also ====
        ambiguous_words(1), cntraining(1), combine_tessdata(1),
@@ Line 133: / Line 154: @@
        Samuel Charron, Sheelagh Lloyd, Shobhit Saxena, and Thomas Kielbus.
-==== COPYING ====
+==== Copying ====
-       Licensed under the Apache License, Version 2.0
+Licensed under the //Apache License, Version 2.0//
 {{tag>tesseract}}
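
The syntax and option rules in the current revision (languages joined with plus characters, ''-l'' and ''-psm'' placed before any configfile, and the TESSDATA_PREFIX/tessdata/foo.traineddata layout) can be sketched as a small helper. This is an illustration only: the function names `build_tesseract_command` and `custom_pack_location` are hypothetical and not part of Tesseract, and the flag spelling simply follows this page, so check the help output of your installed version before relying on it.

```python
import os
from pathlib import Path


def build_tesseract_command(imagename, outbase, langs=None, psm=None, configfiles=()):
    """Assemble argv for: tesseract imagename outbase [-l lang] [-psm N] [configfile ...].

    Hypothetical helper; it only builds the argument list and runs nothing.
    """
    cmd = ["tesseract", imagename, outbase]
    if langs:
        # Multiple languages are separated by plus characters, e.g. "rus+eng".
        cmd += ["-l", "+".join(langs)]
    if psm is not None:
        # The page lists segmentation modes 0 through 10.
        if not 0 <= psm <= 10:
            raise ValueError("psm must be between 0 and 10")
        cmd += ["-psm", str(psm)]
    # Config files go last: -l and -psm must occur before any configfile.
    cmd += list(configfiles)
    return cmd


def custom_pack_location(prefix, lang):
    """Return (env, path) for a non-standard language pack.

    env is a copy of the current environment with TESSDATA_PREFIX set;
    path is where <lang>.traineddata is expected under that prefix.
    """
    env = dict(os.environ, TESSDATA_PREFIX=str(prefix))
    path = Path(prefix) / "tessdata" / f"{lang}.traineddata"
    return env, path
```

With Tesseract installed, the resulting list could be passed to `subprocess.run(cmd, env=env, check=True)`; for example, `build_tesseract_command("scan.png", "scan", ["rus", "eng"], 6, ["hocr"])` requests Russian plus English, a single uniform block of text, and hOCR output.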
