Return per-character coordinates after text recognition #10377

shiyutang · 2023-07-13T02:56:29Z

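Background

Following the requirements call (#10334) and the discussion at the weekly technical seminar (#10223), we settled on the task of returning per-character coordinates after text recognition. It matters in scenarios such as document comparison, keyword extraction, and checking contracts for tampering. Completing this task will significantly improve the granularity of OCR results and has many applications.

Solution steps

In OCR, to locate each character in the image without character-level annotations, one recommended approach is to compute, during the final CTC step, the region covered by each character's repeated predictions.
Map that region back onto the original image to obtain each character's position.
Compare the computed per-character results against the ground truth to confirm they are correct, and tune further on bad cases.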
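
The first two steps can be sketched as follows. This is a minimal illustration, not the PaddleOCR implementation: it assumes the recognizer emits one class id per time step (column), that id 0 is the CTC blank, and that columns are spread evenly across the text line. The names `ctc_char_spans` and `spans_to_pixels` are hypothetical.

```python
from typing import List, Tuple


def ctc_char_spans(frame_ids: List[int], blank: int = 0) -> List[Tuple[int, int, int]]:
    """Collapse per-column argmax ids into (class_id, first_col, last_col) spans.

    Consecutive repeats of the same non-blank id belong to one character;
    a blank between two identical ids starts a new character.
    """
    spans: List[Tuple[int, int, int]] = []
    prev = blank
    for col, cid in enumerate(frame_ids):
        if cid == blank:
            prev = blank
            continue
        if cid == prev:
            # Same character repeated: extend the current span.
            c, start, _ = spans[-1]
            spans[-1] = (c, start, col)
        else:
            spans.append((cid, col, col))
        prev = cid
    return spans


def spans_to_pixels(spans, num_cols: int, line_width: float):
    """Map column spans to horizontal pixel ranges inside the text line."""
    cell = line_width / num_cols  # assumed: every column covers the same width
    return [(cid, start * cell, (end + 1) * cell) for cid, start, end in spans]


# Toy sequence: two columns of class 5, then class 7 twice, separated by a blank.
frames = [0, 0, 5, 5, 0, 7, 0, 7, 7, 0]
spans = ctc_char_spans(frames)
pixels = spans_to_pixels(spans, num_cols=len(frames), line_width=100.0)
print(spans)
print(pixels)
```

The pixel ranges are relative to the cropped text line; adding the line's left edge in the original image gives image coordinates.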

shiyutang · 2023-08-14T03:56:34Z

The task has been completed.

evanlin88 · 2024-01-16T07:02:42Z

@shiyutang How do I use this feature? Is there any documentation for it? Thanks.

-- Just found it: --return_word_box true. You just need to install the latest 2.6.1 release.
paddleocr --image_dir ./11.jpg --use_angle_cls true --use_gpu false --return_word_box true

xiaolling · 2024-03-01T09:09:18Z

Quoting @evanlin88: found --return_word_box true; it requires installing the latest 2.6.1 release.

Which 2.6.1 sub-version is that? I tried them and none has the --return_word_box parameter.

bucaiLi · 2024-03-27T13:33:51Z

Quoting @xiaolling: which 2.6.1 sub-version is that? None of the ones I tried has the --return_word_box parameter.

Hi, have you found a solution? I also want to get the position of each individual character. Did you manage to get it working?

gongdj · 2024-06-10T23:41:13Z

I can't find this option: unrecognized arguments: --return_word_box true

GreatV · 2024-06-11T01:06:16Z

@gongdj Try the latest version from the main branch:
pip install -e git+https://github.com/PaddlePaddle/PaddleOCR.git

paddleocr --image_dir=338306764-525d60c9-9383-4525-b707-dda104919b78.jpg --return_word_box=true

gongdj · 2024-06-12T13:14:45Z

@GreatV I tried it and it didn't work. My dependencies are:
(paddle_env) PS F:\github\PaddleOCR> pip list
Package Version

anyio 4.4.0
astor 0.8.1
attrdict 2.0.1
Babel 2.15.0
bce-python-sdk 0.9.14
beautifulsoup4 4.12.3
blinker 1.8.2
cachetools 5.3.3
certifi 2024.6.2
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
contourpy 1.1.1
cssselect 1.2.0
cssutils 2.11.1
cycler 0.12.1
Cython 3.0.10
decorator 5.1.1
et-xmlfile 1.1.0
exceptiongroup 1.2.1
fire 0.6.0
Flask 3.0.3
flask-babel 4.0.0
fonttools 4.53.0
future 1.0.0
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
idna 3.7
imageio 2.34.1
imgaug 0.4.0
importlib_metadata 7.1.0
importlib_resources 6.4.0
itsdangerous 2.2.0
Jinja2 3.1.4
kiwisolver 1.4.5
lazy_loader 0.4
lmdb 1.4.1
lxml 5.2.2
MarkupSafe 2.1.5
matplotlib 3.7.5
more-itertools 10.2.0
networkx 3.1
numpy 1.24.4
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.10.0.82
openpyxl 3.1.3
opt-einsum 3.3.0
packaging 24.0
paddleocr 2.7.3
paddlepaddle 2.6.1
pandas 2.0.3
pdf2docx 0.5.8
pillow 10.3.0
pip 24.0
premailer 3.10.0
protobuf 3.20.2
psutil 5.9.8
pyclipper 1.3.0.post5
pycryptodome 3.20.0
PyMuPDF 1.24.5
PyMuPDFb 1.24.3
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-docx 1.1.2
pytz 2024.1
PyWavelets 1.4.1
PyYAML 6.0.1
rapidfuzz 3.9.3
rarfile 4.2
requests 2.32.3
scikit-image 0.21.0
scipy 1.10.1
setuptools 69.5.1
shapely 2.0.4
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
termcolor 2.4.0
tifffile 2023.7.10
tqdm 4.66.4
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.1
visualdl 2.5.3
Werkzeug 3.0.3
wheel 0.43.0
zipp 3.19.2

gongdj · 2024-06-12T13:15:38Z

@GreatV
The command is:
paddleocr --image_dir="D:\ocr\scan-1.png" --det_model_dir="F:\github\ocr_model\ch_PP-OCRv4_det_server_infer" --return_word_box=true --lang=ch

gongdj · 2024-06-12T13:17:36Z

Error: paddleocr: error: unrecognized arguments: --return_word_box=true
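
A quick way to see which paddleocr release is installed before trying the flag is sketched below. The 2.8.0 threshold is an assumption taken from this thread only (2.7.3 rejects the flag above, while a later comment reports the option working on 2.8.0 from the master branch); it is not a documented minimum. The helper names are hypothetical.

```python
import re
from importlib import metadata

# Assumption from this thread, not from official release notes.
ASSUMED_MIN_VERSION = "2.8.0"


def version_tuple(version: str) -> tuple:
    """Turn '2.7.3' or '2.8.0.post1' into a comparable (major, minor, patch) tuple."""
    nums = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:
        nums.append(0)  # pad short versions such as '2.8'
    return tuple(nums)


def likely_supports_word_box(installed: str, minimum: str = ASSUMED_MIN_VERSION) -> bool:
    """Heuristic check against the assumed minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_paddleocr_version():
    """Return the installed paddleocr version string, or None if it is not installed."""
    try:
        return metadata.version("paddleocr")
    except metadata.PackageNotFoundError:
        return None


installed = installed_paddleocr_version()
if installed is None:
    print("paddleocr is not installed in this environment")
else:
    print(installed, "likely supports return_word_box:", likely_supports_word_box(installed))
```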

xiehurricane · 2024-06-15T08:07:44Z

Quoting @gongdj: Error: paddleocr: error: unrecognized arguments: --return_word_box=true

Here is the code I use, which may help. It runs on 2.8.0, the master branch as of 2024-06-14.

from paddleocr import PaddleOCR

# Needs to run only once to download the model and load it into memory
ocr = PaddleOCR(use_angle_cls=False, lang="ch", return_word_box=True, use_gpu=False)
img_path = r'D:\projectTest\PaddleOCR\doc\imgs\11.jpg'
# cls=False matches use_angle_cls=False above (no angle classifier is loaded)
result = ocr.ocr(img_path, cls=False)
# ocr() returns the coordinates and text of each character

for res in result:
    if res is None:  # nothing was detected in this image
        continue
    for line in res:
        print(line)

[[[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]], ('纯臻营养护发素', 0.9946897625923157, [46.085826210826205, [['纯', '臻', '营', '养', '护', '发', '素']], [[3, 10, 16, 23, 30, 36, 43]], ['cn']])]
This [[3, 10, 16, 23, 30, 36, 43]] should be the character offsets; it took me quite a while to find it here.
By the way, which parameters can be passed to the PaddleOCR object, and where are they documented?

Alanhzl · 2024-06-17T07:57:33Z

Quoting @xiehurricane's code and output above (2.8.0, master branch as of 2024-06-14), which suggested that [[3, 10, 16, 23, 30, 36, 43]] is the character offsets.

I checked it against the image: [[3, 10, 16, 23, 30, 36, 43]] does not look like offsets, the positions don't line up.

LinZhineng · 2024-06-19T09:37:42Z

@Alanhzl Hi, have you figured out what these values mean? I don't know how to map them to pixel coordinates.

Alanhzl · 2024-06-20T01:23:52Z

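Quoting @LinZhineng: have you figured out what these values mean? I don't know how to map them to pixel coordinates.

These values are the positions of the recognized characters among the CTC slices, that is, the approximate character positions. The width of each slice is (total image width / 46.085826210826205); multiplying it by the values in [[3, 10, 16, 23, 30, 36, 43]] gives each character's position relative to the text line. The position is not very accurate: it is not good enough to draw a tight box around each character, it only gives a rough location.
I also don't quite understand the 46.085826210826205. It is the number of slices * (wh_ratio / max_wh_ratio), and I'm not sure why that factor is applied.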

buptlj · 2024-07-23T10:42:41Z

Quoting @Alanhzl's explanation above, including the question of why the number of slices is multiplied by (wh_ratio / max_wh_ratio).

In batched processing, the images are padded to max_wh_ratio, so multiplying by (wh_ratio / max_wh_ratio) maps the slice count back to the original image width. Dividing the total image width by 46.085826210826205 then gives the width of each cell.
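
The arithmetic described in the last two comments can be sketched on the sample output posted earlier. This is a rough illustration, not PaddleOCR's own box calculation: it assumes a horizontal, axis-aligned text line, reads the word-box list as (column count, character groups, column indices, language tags) based only on that sample, and does not distinguish whether a value marks a character's center or its left edge. `approx_char_x` is a hypothetical helper name.

```python
def approx_char_x(line_box, word_info):
    """Estimate an x coordinate in the original image for every character."""
    col_num, word_groups, col_groups, _states = word_info
    xs = [point[0] for point in line_box]
    x_min, x_max = min(xs), max(xs)
    cell_width = (x_max - x_min) / col_num  # width of one CTC cell in pixels
    positions = []
    for chars, cols in zip(word_groups, col_groups):
        for ch, col in zip(chars, cols):
            positions.append((ch, x_min + col * cell_width))
    return positions


# Sample line copied from the output shown in this thread.
line = [
    [[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]],
    ('纯臻营养护发素', 0.9946897625923157,
     [46.085826210826205,
      [['纯', '臻', '营', '养', '护', '发', '素']],
      [[3, 10, 16, 23, 30, 36, 43]],
      ['cn']]),
]

positions = approx_char_x(line[0], line[1][2])
for ch, x in positions:
    print(ch, round(x, 1))
```

As noted above, these are only approximate locations; they are not tight per-character boxes.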

DoiiarX · 2024-08-05T04:25:52Z

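Detail: from reading the source code, the coordinate mentioned here is the center coordinate for Chinese characters and the left-edge coordinate for English.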

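paddle-bot bot assigned tink2123 Jul 13, 2023

shiyutang changed the title ~~Text recognition returns per-character coordinates~~ OCR process returns per-character coordinates Jul 13, 2023

shiyutang changed the title ~~OCR process returns per-character coordinates~~ Return per-character coordinates after text recognition Jul 13, 2023

This was referenced Jul 13, 2023

CV suite development special campaign PaddlePaddle/PaddleSeg#3333

Closed

🏅️ PaddlePaddle suite Happy Open Source regular competition #10223

Closed

shiyutang assigned shiyutang and unassigned tink2123 Jul 13, 2023

shiyutang closed this as completed Aug 14, 2023

paddle-bot bot added the status/close label Aug 14, 2023

Ligoml mentioned this issue Oct 31, 2023

PaddlePaddle Happy Open Source campaign, fully upgraded 🔥 PaddlePaddle/Paddle#56689

Closed

DoiiarX mentioned this issue Jul 25, 2024

On the difference between single-character and multi-character detection and recognition #2828

Closed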
