3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
zdenop@gmail.com committed Nov 23, 2010
1 parent 7511d76 commit 4523ce9
Showing 558 changed files with 51,651 additions and 41,572 deletions.
35 changes: 35 additions & 0 deletions ChangeLog
@@ -1,3 +1,38 @@
2010-09-22 - V3.01
* Thread-safety! Moved all critical globals and statics to
members of the appropriate class. Tesseract is now
thread-safe (multiple instances can be used in parallel
in multiple threads), with the minor exception that some
control parameters are still global and affect all threads.
* Added Cube, a new recognizer for Arabic. Cube can also be
used in combination with normal Tesseract for other languages
with an improvement in accuracy at the cost of (much) lower speed.
There is no training module for Cube yet.
* OcrEngineMode in Init replaces AccuracyVSpeed to control cube.
* Greatly improved segmentation search with consequent accuracy and
speed improvements, especially for Chinese.
* Added PageIterator and ResultIterator as cleaner ways to get the
full results out of Tesseract, that are not currently provided
by any of the TessBaseAPI::Get* methods.
All other methods, such as the ETEXT_STRUCT in particular are
deprecated and will be deleted in the future.
* ApplyBoxes totally rewritten to make training easier.
It can now cope with touching/overlapping training characters,
and a new boxfile format allows word boxes instead of character
boxes, BUT to use that you have to have already bootstrapped the
language with character boxes. "Cyclic dependency" on traineddata.
* Auto orientation and script detection added to page layout analysis.
* Deleted *lots* of dead code.
* Fixxht module replaced with scalable data-driven module.
* Output font characteristics accuracy improved.
* Removed the double conversion at each classification.
* Upgraded oldest structs to be classes and deprecated PBLOB.
* Removed non-deterministic baseline fit.
* Added fixed length dawgs for Chinese.
* Handling of vertical text improved.
* Handling of leader dots improved.
* Table detection greatly improved.

2010-09-21 - V3.00
* Preparations for thread safety:
* Changed TessBaseAPI methods to be non-static
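The thread-safety entry above describes moving "critical globals and statics" into members of the appropriate class. As a rough illustration of why that matters, here is a minimal sketch with a hypothetical `ToyRecognizer` class (not a Tesseract type; every name in it is an assumption made for this example). Because all mutable state lives in the object, two instances can be used side by side without seeing each other's state, which is the precondition for giving each thread its own instance.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy class, NOT part of Tesseract. It only illustrates
// keeping per-run state in members instead of globals or statics.
class ToyRecognizer {
 public:
  explicit ToyRecognizer(const std::string& language)
      : language_(language), words_seen_(0) {}

  // Pretend to recognize one word. All mutable state lives in *this.
  void RecognizeWord(const std::string& word) {
    assert(!word.empty());  // precondition of this toy example
    ++words_seen_;
    last_word_ = word;
  }

  int words_seen() const { return words_seen_; }
  const std::string& last_word() const { return last_word_; }
  const std::string& language() const { return language_; }

 private:
  std::string language_;   // stands in for a former global "current language"
  int words_seen_;         // stands in for a former function-level static
  std::string last_word_;  // stands in for a former global result buffer
};

// Interleaves calls on two instances and reports whether each one kept
// its own, independent state.
bool InstancesAreIndependent() {
  ToyRecognizer eng("eng");
  ToyRecognizer ara("ara");
  eng.RecognizeWord("hello");
  ara.RecognizeWord("salaam");
  eng.RecognizeWord("world");
  return eng.words_seen() == 2 && ara.words_seen() == 1 &&
         eng.last_word() == "world" && ara.last_word() == "salaam";
}
```

If `words_seen_` were a file-level static instead, the interleaved calls would share one counter. The changelog's caveat about control parameters that "are still global and affect all threads" is that same situation.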
2 changes: 1 addition & 1 deletion Makefile.am
@@ -1,6 +1,6 @@
# TODO(luc) Add 'doc' to this list when ready
ACLOCAL_AMFLAGS = -I m4
-SUBDIRS = ccstruct ccutil classify cutil dict image textord viewer wordrec ccmain training tessdata testing java api vs2008
+SUBDIRS = ccstruct ccutil classify cube cutil dict image neural_networks/runtime textord viewer wordrec ccmain training tessdata testing java api
#if USING_GETTEXT
#SUBDIRS += po
#AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
2 changes: 1 addition & 1 deletion Makefile.in
@@ -234,7 +234,7 @@ top_srcdir = @top_srcdir@

# TODO(luc) Add 'doc' to this list when ready
ACLOCAL_AMFLAGS = -I m4
-SUBDIRS = ccstruct ccutil classify cutil dict image textord viewer wordrec ccmain training tessdata testing java api vs2008
+SUBDIRS = ccstruct ccutil classify cube cutil dict image neural_networks/runtime textord viewer wordrec ccmain training tessdata testing java api
#if USING_GETTEXT
#SUBDIRS += po
#AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
35 changes: 35 additions & 0 deletions ReleaseNotes
@@ -1,3 +1,38 @@
Tesseract release notes Oct 1 2010 - V3.01
* Thread-safety! Moved all critical globals and statics to
members of the appropriate class. Tesseract is now
thread-safe (multiple instances can be used in parallel
in multiple threads), with the minor exception that some
control parameters are still global and affect all threads.
* Added Cube, a new recognizer for Arabic. Cube can also be
used in combination with normal Tesseract for other languages
with an improvement in accuracy at the cost of (much) lower speed.
There is no training module for Cube yet.
* OcrEngineMode in Init replaces AccuracyVSpeed to control cube.
* Greatly improved segmentation search with consequent accuracy and
speed improvements, especially for Chinese.
* Added PageIterator and ResultIterator as cleaner ways to get the
full results out of Tesseract, that are not currently provided
by any of the TessBaseAPI::Get* methods.
All other methods, such as the ETEXT_STRUCT in particular are
deprecated and will be deleted in the future.
* ApplyBoxes totally rewritten to make training easier.
It can now cope with touching/overlapping training characters,
and a new boxfile format allows word boxes instead of character
boxes, BUT to use that you have to have already bootstrapped the
language with character boxes. "Cyclic dependency" on traineddata.
* Auto orientation and script detection added to page layout analysis.
* Deleted *lots* of dead code.
* Fixxht module replaced with scalable data-driven module.
* Output font characteristics accuracy improved.
* Removed the double conversion at each classification.
* Upgraded oldest structs to be classes and deprecated PBLOB.
* Removed non-deterministic baseline fit.
* Added fixed length dawgs for Chinese.
* Handling of vertical text improved.
* Handling of leader dots improved.
* Table detection greatly improved.

Tesseract release notes Sep 30 2010 - V3.00
* Preparations for thread safety:
* Changed TessBaseAPI methods to be non-static
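The release notes introduce PageIterator and ResultIterator as the preferred way to read results, and the api/Makefile.am diff lists the new `pageiterator.h` and `resultiterator.h` headers. Their declarations are not shown in this excerpt, so the following is only a sketch of the general idea: one cursor that can be stepped at different granularities. `ToyLevel`, `ToyWord`, `ToyResultIterator` and `CollectAll` are hypothetical names invented for this example and do not reproduce Tesseract's signatures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical two-level hierarchy, for illustration only.
enum ToyLevel { TOY_LINE, TOY_WORD };

struct ToyWord {
  std::string text;
  int line;  // index of the text line this word belongs to
};

// Toy cursor over recognized words that can step by word or by line.
class ToyResultIterator {
 public:
  explicit ToyResultIterator(const std::vector<ToyWord>& words)
      : words_(words), pos_(0) {}

  bool Empty() const { return pos_ >= words_.size(); }

  // Text of the element at the current position, at the requested level.
  std::string Text(ToyLevel level) const {
    if (Empty()) return "";
    if (level == TOY_WORD) return words_[pos_].text;
    // TOY_LINE: join the words from the cursor to the end of its line.
    std::string out;
    const int line = words_[pos_].line;
    for (std::size_t i = pos_; i < words_.size() && words_[i].line == line;
         ++i) {
      if (!out.empty()) out += ' ';
      out += words_[i].text;
    }
    return out;
  }

  // Moves to the start of the next element at the requested level.
  // Returns false once the end of the page has been reached.
  bool Next(ToyLevel level) {
    if (Empty()) return false;
    const int line = words_[pos_].line;
    ++pos_;
    if (level == TOY_LINE) {
      while (pos_ < words_.size() && words_[pos_].line == line) ++pos_;
    }
    return pos_ < words_.size();
  }

 private:
  std::vector<ToyWord> words_;
  std::size_t pos_;
};

// Walks the whole page at one level and collects the text of each element.
std::vector<std::string> CollectAll(const std::vector<ToyWord>& words,
                                    ToyLevel level) {
  std::vector<std::string> out;
  ToyResultIterator it(words);
  if (it.Empty()) return out;
  do {
    out.push_back(it.Text(level));
  } while (it.Next(level));
  return out;
}
```

The same `do { ... } while (it.Next(level))` loop serves every granularity, so a caller picks a level instead of calling a separate accessor for each kind of result.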
6 changes: 4 additions & 2 deletions api/Makefile.am
@@ -8,13 +8,15 @@ AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"\
-I$(top_srcdir)/textord

include_HEADERS = \
-baseapi.h tesseractmain.h
+apitypes.h baseapi.h pageiterator.h resultiterator.h tesseractmain.h

lib_LTLIBRARIES = libtesseract_api.la
-libtesseract_api_la_SOURCES = baseapi.cpp
+libtesseract_api_la_SOURCES = baseapi.cpp pageiterator.cpp resultiterator.cpp
libtesseract_api_la_LDFLAGS = -version-info $(GENERIC_LIBRARY_VERSION)
libtesseract_api_la_LIBADD = \
../ccmain/libtesseract_main.la \
+../cube/libtesseract_cube.la \
+../neural_networks/runtime/libtesseract_neural.la \
../textord/libtesseract_textord.la \
../wordrec/libtesseract_wordrec.la \
../classify/libtesseract_classify.la \
13 changes: 10 additions & 3 deletions api/Makefile.in
@@ -74,6 +74,8 @@ am__installdirs = "$(DESTDIR)$(libdir)" "$(DESTDIR)$(bindir)" \
"$(DESTDIR)$(includedir)"
LTLIBRARIES = $(lib_LTLIBRARIES)
libtesseract_api_la_DEPENDENCIES = ../ccmain/libtesseract_main.la \
+../cube/libtesseract_cube.la \
+../neural_networks/runtime/libtesseract_neural.la \
../textord/libtesseract_textord.la \
../wordrec/libtesseract_wordrec.la \
../classify/libtesseract_classify.la \
@@ -82,7 +84,8 @@ libtesseract_api_la_DEPENDENCIES = ../ccmain/libtesseract_main.la \
../image/libtesseract_image.la ../cutil/libtesseract_cutil.la \
../viewer/libtesseract_viewer.la \
../ccutil/libtesseract_ccutil.la
-am_libtesseract_api_la_OBJECTS = baseapi.lo
+am_libtesseract_api_la_OBJECTS = baseapi.lo pageiterator.lo \
+resultiterator.lo
libtesseract_api_la_OBJECTS = $(am_libtesseract_api_la_OBJECTS)
libtesseract_api_la_LINK = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) \
$(LIBTOOLFLAGS) --mode=link $(CXXLD) $(AM_CXXFLAGS) \
@@ -294,13 +297,15 @@ AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"\
-I$(top_srcdir)/textord

include_HEADERS = \
-baseapi.h tesseractmain.h
+apitypes.h baseapi.h pageiterator.h resultiterator.h tesseractmain.h

lib_LTLIBRARIES = libtesseract_api.la
-libtesseract_api_la_SOURCES = baseapi.cpp
+libtesseract_api_la_SOURCES = baseapi.cpp pageiterator.cpp resultiterator.cpp
libtesseract_api_la_LDFLAGS = -version-info $(GENERIC_LIBRARY_VERSION)
libtesseract_api_la_LIBADD = \
../ccmain/libtesseract_main.la \
+../cube/libtesseract_cube.la \
+../neural_networks/runtime/libtesseract_neural.la \
../textord/libtesseract_textord.la \
../wordrec/libtesseract_wordrec.la \
../classify/libtesseract_classify.la \
@@ -446,6 +451,8 @@ distclean-compile:
-rm -f *.tab.c

@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/baseapi.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/pageiterator.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/resultiterator.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tesseractmain.Po@am__quote@

.cpp.o:
31 changes: 31 additions & 0 deletions api/apitypes.h
@@ -0,0 +1,31 @@
///////////////////////////////////////////////////////////////////////
// File: apitypes.h
// Description: Types used in both the API and internally
// Author: Ray Smith
// Created: Wed Mar 03 09:22:53 PST 2010
//
// (C) Copyright 2010, Google Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
///////////////////////////////////////////////////////////////////////

#ifndef TESSERACT_API_APITYPES_H__
#define TESSERACT_API_APITYPES_H__

#include "publictypes.h"

// The types used by the API and Page/ResultIterator can be found in
// ccstruct/publictypes.h.
// API interfaces and API users should be sure to include this file, rather
// than the lower-level one, and lower-level code should be sure to include
// only the lower-level file.

#endif // TESSERACT_API_APITYPES_H__