Update version in README and manpages (tesseract-ocr#1381)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil authored and zdenop committed Mar 12, 2018
1 parent 8fb6874 commit bdf6629
Showing 3 changed files with 22 additions and 22 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -33,7 +33,7 @@ In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.

The latest stable version is **[3.05.01](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.01)**, released on June 1, 2017. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05).

-Source code for the new **[LSTM based 4.00.00alpha version](https://github.com/tesseract-ocr/tesseract)** is available from the master branch on GitHub. Please note this branch is under active development.
+Source code for the new **[LSTM based 4.0 version](https://github.com/tesseract-ocr/tesseract)** is available from the master branch on GitHub. Please note this branch is under active development.

See **[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases.

38 changes: 19 additions & 19 deletions doc/combine_tessdata.1.asc
@@ -11,7 +11,7 @@ SYNOPSIS

DESCRIPTION
-----------
-combine_tessdata(1) is the main program to combine/extract/overwrite/list/compact
+combine_tessdata(1) is the main program to combine/extract/overwrite/list/compact
tessdata components in [lang].traineddata files.

To combine all the individual tessdata components (unicharset, DAWGs,
@@ -59,10 +59,10 @@ OPTIONS
*-c* '.traineddata' 'FILE'...:
Compacts the LSTM component in the .traineddata file to int.
*-d* '.traineddata' 'FILE'...:
Lists directory of components from the .traineddata file.
*-e* '.traineddata' 'FILE'...:
Extracts the specified components from the .traineddata file
@@ -81,15 +81,15 @@ CAVEATS
COMPONENTS
----------
The components in a Tesseract lang.traineddata file as of
-Tesseract 4.00alpha are briefly described below; For more information on
+Tesseract 4.0 are briefly described below; For more information on
many of these files, see
<https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>
and
<https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>
lang.config::
(Optional) Language-specific overrides to default config variables.
-For 4.00alpha traineddata files, lang.config provides control parameters which
+For 4.0 traineddata files, lang.config provides control parameters which
can affect layout analysis, and sub-languages.
lang.unicharset::
@@ -148,36 +148,36 @@ lang.params-model::
(Optional - 3.0x legacy tesseract) .
lang.lstm::
-(Required - 4.00alpha LSTM) Neural net trained recognition model generated by lstmtraining.
+(Required - 4.0 LSTM) Neural net trained recognition model generated by lstmtraining.
lang.lstm-punc-dawg::
-(Optional - 4.00alpha LSTM) A dawg made from punctuation patterns found around words.
+(Optional - 4.0 LSTM) A dawg made from punctuation patterns found around words.
The "word" part is replaced by a single space. Uses lang.lstm-unicharset.
lang.lstm-word-dawg::
-(Optional - 4.00alpha LSTM) A dawg made from dictionary words from the language.
+(Optional - 4.0 LSTM) A dawg made from dictionary words from the language.
Uses lang.lstm-unicharset.
lang.lstm-number-dawg::
-(Optional - 4.00alpha LSTM) A dawg made from tokens which originally contained digits.
+(Optional - 4.0 LSTM) A dawg made from tokens which originally contained digits.
Each digit is replaced by a space character. Uses lang.lstm-unicharset.
lang.lstm-unicharset::
-(Required - 4.00alpha LSTM) The unicode character set that Tesseract recognizes, with properties.
+(Required - 4.0 LSTM) The unicode character set that Tesseract recognizes, with properties.
Same unicharset must be used to train the LSTM and build the lstm-*-dawgs files.
lang.lstm-recoder::
-(Required - 4.00alpha LSTM) Unicharcompress, aka the recoder, which maps the unicharset
+(Required - 4.0 LSTM) Unicharcompress, aka the recoder, which maps the unicharset
further to the codes actually used by the neural network recognizer. This is created as
part of the starter traineddata by combine_lang_model.
lang.version::
-(Optional) Version string for the traineddata file.
-First appeared in version 4.00alpha of Tesseract.
-Old version of traineddata files will report Version string:Pre-4.0.0.
-4.00alpha version of traineddata files may include the network spec
+(Optional) Version string for the traineddata file.
+First appeared in version 4.0 of Tesseract.
+Old version of traineddata files will report Version string:Pre-4.0.0.
+4.0 version of traineddata files may include the network spec
used for LSTM training as part of version string.
HISTORY
-------
combine_tessdata(1) first appeared in version 3.00 of Tesseract
4 changes: 2 additions & 2 deletions doc/tesseract.1.asc
@@ -115,7 +115,7 @@ SINGLE OPTIONS
LANGUAGES
---------

-The currently available traineddata files for tesseract 4.00
+The currently available traineddata files for tesseract 4.0
for the following languages are in
(in https://github.com/tesseract-ocr/tessdata_fast):

@@ -244,7 +244,7 @@ argument '-l foo'.
SCRIPTS
-------
-The traineddata files for the following scripts for tesseract 4.00
+The traineddata files for the following scripts for tesseract 4.0
are also in https://github.com/tesseract-ocr/tessdata_fast.
In most cases, each of these contains all the languages that use that script PLUS English.
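
The text substitutions visible in the hunks above can be approximated with a short Python sketch. The commit does not say how the edits were made, so this is an illustration only, not the author's tooling: the helper name `bump_version_text` and both regular expressions are assumptions derived from the changed lines shown here. It does not reproduce the whitespace-only changes, and it deliberately leaves strings such as `TrainingTesseract-4.00` and `Pre-4.0.0` untouched, as the commit does.

```python
import re

# Patterns inferred from the visible hunks (assumptions, not the author's tooling).
# Matches "4.00alpha" and "4.00.00alpha".
_ALPHA_VERSION = re.compile(r"4\.00(?:\.00)?alpha")
# Matches "tesseract 4.00" in running text, but not URL fragments such as
# "TrainingTesseract-4.00".
_TESSERACT_400 = re.compile(r"(tesseract )4\.00\b")


def bump_version_text(text: str) -> str:
    """Rewrite 4.00alpha-style version strings to "4.0" (hypothetical helper)."""
    text = _ALPHA_VERSION.sub("4.0", text)
    return _TESSERACT_400.sub(r"\g<1>4.0", text)


if __name__ == "__main__":
    sample = "(Required - 4.00alpha LSTM) Neural net trained recognition model."
    print(bump_version_text(sample))
```

Keeping the two patterns separate makes it easy to see which rule touched a given line and to drop the second rule if a file refers to other 4.00.x releases.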