Image Text Extraction

This repository contains a Node.js script that extracts text from images using optical character recognition (OCR), parses the text for specific fields, and writes the result in CSV format.

Assignment Task

The assignment task involves the following steps:

Read Input Images: Place the input images containing the text you want to extract in the images/ directory.
Extract Selective Text: Extract specific information from the images, such as check marks on checkboxes and text written on lines.
Output Key-Value Pairs in CSV Format: Organize the extracted information into key-value pairs and write them out in CSV format. The fields to include are Name, Husband Name/Father's Name, House Number, Age, and Gender.
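
The parsing step above can be sketched as follows. This is a minimal illustration, not the code in index.js: it assumes the OCR text contains "Label: value" pairs, and the `FIELD_PATTERNS` table and `parseFields` function are hypothetical names introduced here. Detecting check marks on checkboxes is not covered, and the patterns would need adjusting to the wording on the actual forms.

```javascript
// Label patterns assumed to appear in the OCR text (hypothetical; adjust them
// to the wording on the actual forms). Keys are the CSV field names.
const FIELD_PATTERNS = {
  // Anchored at the start of a line so it does not match "Husband's Name".
  "Name": /^[ \t]*Name[ \t]*[:\-][ \t]*(.+)$/im,
  "Husband Name/Father's Name":
    /\b(?:Husband|Father)(?:'s)?[ \t]+Name[ \t]*[:\-][ \t]*(.+)$/im,
  "House Number": /\bHouse[ \t]+(?:Number|No\.?)[ \t]*[:\-][ \t]*(\S+)/i,
  "Age": /\bAge[ \t]*[:\-][ \t]*(\d{1,3})\b/i,
  "Gender": /\bGender[ \t]*[:\-][ \t]*(Male|Female|Other)\b/i,
};

// Turn raw OCR text into a key-value record.
// Fields that are not found are returned as empty strings.
function parseFields(ocrText) {
  const record = {};
  for (const [field, pattern] of Object.entries(FIELD_PATTERNS)) {
    const match = pattern.exec(ocrText);
    record[field] = match ? match[1].trim() : "";
  }
  return record;
}

// Example with made-up text in the assumed layout.
const sampleText = [
  "Name: Asha Devi",
  "Husband's Name: Ram Kumar",
  "House Number: 12/4",
  "Age: 34 Gender: Female",
].join("\n");
console.log(parseFields(sampleText));
```

Keeping the patterns in one table means a changed label on the form only requires editing a single entry.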

Installation

Clone the Repository: Clone the repository to your local machine using Git:
```
git clone https://github.com/your-username/image-text-extraction.git
```
Install Dependencies: Navigate into the cloned directory, install the Tesseract OCR engine (the apt-get command below applies to Debian/Ubuntu), and install the Node.js dependencies using npm:
```
cd image-text-extraction
sudo apt-get install tesseract-ocr
npm install
```

Usage

Place Your Image Files: Put your image files containing the text you want to extract into the images/ directory within the repository.
Edit index.js: Customize the index.js file to specify the path to your image file(s) and any other settings you need to adjust.
Run the Script: Execute the Node.js script using the following command:
```
node index.js
```
Check Output: Once the script finishes running, you'll find the extracted information saved in an output.csv file in the root directory of the repository.
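
For reference, writing the output file could look roughly like the sketch below. It assumes one CSV row per image with one column per field, which is only one possible reading of "key-value pairs"; `COLUMNS`, `csvEscape`, `toCsv`, and `writeCsv` are hypothetical names, not functions taken from index.js.

```javascript
const fs = require("fs");

// Column order taken from the field list in the task description.
const COLUMNS = [
  "Name",
  "Husband Name/Father's Name",
  "House Number",
  "Age",
  "Gender",
];

// Quote a value when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (the usual CSV convention).
function csvEscape(value) {
  const s = String(value ?? "");
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build the CSV text: a header row followed by one row per record.
// Missing fields become empty cells.
function toCsv(records, columns = COLUMNS) {
  const header = columns.map(csvEscape).join(",");
  const rows = records.map((record) =>
    columns.map((column) => csvEscape(record[column])).join(",")
  );
  return [header, ...rows].join("\n") + "\n";
}

// Write the records to a file such as output.csv.
function writeCsv(filePath, records) {
  fs.writeFileSync(filePath, toCsv(records), "utf8");
}

// Usage (not executed here):
// writeCsv("output.csv", [{ Name: "Asha Devi", Age: "34", Gender: "Female" }]);
```

Escaping matters because OCR output can contain commas or stray quotes, which would otherwise shift values into the wrong columns.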

Dependencies

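node-tesseract-ocr: Node.js wrapper around the Tesseract command-line tool, used for optical character recognition (OCR) to extract text from images. Tesseract itself must be installed separately, as shown under Installation.
fs: Built-in Node.js module for file system operations; it needs no installation.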
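
As an illustration of how these two dependencies might work together, the sketch below lists the image files in a directory with fs and passes each one to an OCR function. The helper names `listImages` and `ocrDirectory`, the extension list, and the configuration values are assumptions made for this example; the actual index.js may be organized differently. The OCR function is passed in as a parameter so the sketch can be tried without Tesseract installed.

```javascript
const fs = require("fs");
const path = require("path");

// Image extensions to pick up from the input directory (an assumed list).
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".tif", ".tiff"]);

// Return the sorted paths of all image files directly inside `dir`.
function listImages(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => IMAGE_EXTENSIONS.has(path.extname(name).toLowerCase()))
    .sort()
    .map((name) => path.join(dir, name));
}

// Run OCR on each image in turn. `recognize` is any function that takes an
// image path and returns a promise for the recognized text.
async function ocrDirectory(dir, recognize) {
  const results = [];
  for (const imagePath of listImages(dir)) {
    const text = await recognize(imagePath);
    results.push({ imagePath, text });
  }
  return results;
}

// Example wiring with node-tesseract-ocr (needs the package and Tesseract):
// const tesseract = require("node-tesseract-ocr");
// const config = { lang: "eng", oem: 1, psm: 3 };
// ocrDirectory("images", (p) => tesseract.recognize(p, config))
//   .then((pages) => console.log(pages));
```

Processing the images one at a time keeps memory use and the number of Tesseract processes low; for large batches, a limited amount of parallelism could be added.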

Contributing

Contributions are welcome! If you encounter any issues or have suggestions for improvement, feel free to open an issue or submit a pull request to the repository.
