From 47cc64a41f2c085f3de5890e903a0ad326453929 Mon Sep 17 00:00:00 2001
From: Shree Devi Kumar
Date: Thu, 1 Jun 2017 12:34:28 +0530
Subject: [PATCH] Reorganize Readme.md

---
 README.md | 102 +++++++++++++++++++++++++++++------------------------
 1 file changed, 55 insertions(+), 47 deletions(-)

diff --git a/README.md b/README.md
index 3b24334d0f..39c030b9c6 100644
--- a/README.md
+++ b/README.md
@@ -1,73 +1,50 @@
 # Tesseract OCR

-For the latest online version of the README.md see:
-
-https://github.com/tesseract-ocr/tesseract/blob/master/README.md
-
-### Build
-[![Build Status](https://travis-ci.org/tesseract-ocr/tesseract.svg?branch=master)](https://travis-ci.org/tesseract-ocr/tesseract)
-[![Build status](https://ci.appveyor.com/api/projects/status/miah0ikfsf0j3819/branch/master?svg=true)](https://ci.appveyor.com/project/zdenop/tesseract/)
+**Travis**
+[![Travis Build Status](https://travis-ci.org/tesseract-ocr/tesseract.svg?branch=master)](https://travis-ci.org/tesseract-ocr/tesseract)
+**Appveyor**
+[![Appveyor Build status](https://ci.appveyor.com/api/projects/status/miah0ikfsf0j3819/branch/master?svg=true)](https://ci.appveyor.com/project/zdenop/tesseract/)

-### Other
+**Other**
 [![Coverity Scan Build Status](https://scan.coverity.com/projects/tesseract-ocr/badge.svg)](https://scan.coverity.com/projects/tesseract-ocr)
-[![Insight.io](https://www.insight.io/repoBadge/github.com/tesseract-ocr/tesseract)](https://insight.io/github.com/tesseract-ocr/tesseract)
+[![Insight.io Documentation](https://www.insight.io/repoBadge/github.com/tesseract-ocr/tesseract)](https://insight.io/github.com/tesseract-ocr/tesseract)

-# About
+## About

-This package contains an OCR engine - `libtesseract` and a command line program - `tesseract`.
+This package contains an **OCR engine** - `libtesseract` and a **command line program** - `tesseract`.

 The lead developer is Ray Smith. The maintainer is Zdenko Podobny.
 For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS) and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/graphs/contributors).

-Tesseract has unicode (UTF-8) support, and can recognize more than 100
-languages "out of the box". It can be trained to recognize other languages. See [Tesseract Training](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) for more information.
+Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 languages** "out of the box".

-Tesseract supports various output formats: plain-text, hocr(html), pdf.
+Tesseract supports **various output formats**: plain-text, hocr(html), pdf, tsv, invisible-text-only pdf.

-This project does not include a GUI application. If you need one, please see the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) wiki page.
+You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract.

-You should note that in many cases, in order to get better OCR results, you'll need to [improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image you are giving Tesseract.
+This project **does not include a GUI application**. If you need one, please see the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) wiki page.

-The latest stable version is 3.05.00, released in February 2017.
+Tesseract **can be trained to recognize other languages**. See [Tesseract Training](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) for more information.

-# Brief history
+## Brief history

 Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998.
-
 In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.

-[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)
+The latest stable version is **[3.05.00](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.00)**, released in February 2017. Source code is available from [3.05 branch on github](https://github.com/tesseract-ocr/tesseract/tree/3.05). 3.05.01 bug-fix release is expected in May/June 2017.

-# For developers
+Source code for the new **[LSTM based 4.00.00alpha version](https://github.com/tesseract-ocr/tesseract)** is available from the master branch on github. Please note this branch is under active development.

-Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/master/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/master/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper](https://github.com/tesseract-ocr/tesseract/wiki/AddOns#tesseract-wrappers) section on AddOns wiki page.
+See **[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases.

-Documentation of Tesseract generated from source code by doxygen can be found on [tesseract-ocr.github.io](http://tesseract-ocr.github.io/).
-
-# License
-
-    The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
-    you may not use this file except in compliance with the License.
-    You may obtain a copy of the License at
-
-        http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing, software
-    distributed under the License is distributed on an "AS IS" BASIS,
-    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-    See the License for the specific language governing permissions and
-    limitations under the License.
-
-**NOTE**: This software depends on other packages that may be licensed under different open source licenses.
-
-# Installing Tesseract
+## Installing Tesseract

 You can either [Install Tesseract via pre-built binary package](https://github.com/tesseract-ocr/tesseract/wiki) or [build it from source](https://github.com/tesseract-ocr/tesseract/wiki/Compiling).

-## Supported Compilers
+Supported Compilers are:

 * GCC 4.8 and above
 * Clang 3.4 and above
@@ -75,18 +52,49 @@ You can either [Install Tesseract via pre-built binary package](https://github.c

 Other compilers might work, but are not officially supported.

-# Running Tesseract
+## Running Tesseract

-Basic command line usage:
+Basic **[command line usage](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage)**:

-    tesseract imagename outputbase [-l lang] [--psm pagesegmode] [configfiles...]
+    tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]

 For more information about the various command line options use `tesseract --help` or `man tesseract`.

-# Support
+## For developers
+
+Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/master/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/master/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper](https://github.com/tesseract-ocr/tesseract/wiki/AddOns#tesseract-wrappers) section on AddOns wiki page.
+
+Documentation of Tesseract generated from source code by doxygen can be found on [tesseract-ocr.github.io](http://tesseract-ocr.github.io/).
+
+## Support
+
+First read the [Wiki](https://github.com/tesseract-ocr/tesseract/wiki), particularly the [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists.

 Mailing-lists:
 * [tesseract-ocr](https://groups.google.com/d/forum/tesseract-ocr) - For tesseract users.
 * [tesseract-dev](https://groups.google.com/d/forum/tesseract-dev) - For tesseract developers.

-Please read the [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) before asking any question in the mailing-list or reporting an issue.
+Please report an issue only for a **bug**, not for asking questions.
+
+## License
+
+    The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+
+**NOTE**: This software depends on other packages that may be licensed under different open source licenses.
+
+## Latest Version of README
+
+For the latest online version of the README.md see:
+
+https://github.com/tesseract-ocr/tesseract/blob/master/README.md
+