update readme

CosmosShadow · Jun 28, 2024 · 40ecbb0
1 parent 99cdf8e
commit 40ecbb0
Showing 2 changed files with 11 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -17,13 +17,15 @@ This package use [GeneralAgent](https://github.com/CosmosShadow/GeneralAgent) li
 
 ## Process steps
 
-1. Use the PyMuPDF library to parse the PDF and extract all non-text rectangular areas.
-2. Convert all non-text rectangular areas on the PDF into pictures and number them
-3. Mark each page of the PDF with a red rectangle and number and save it as an image, similar to the following:
+1. Use the PyMuPDF library to parse the PDF and extract all non-text areas.
+
+2. Convert all non-text areas on the PDF into images and number them
+
+3. Mark the non-text areas and numbers on each page of the PDF and save them as images, similar to the following:
 
 ![](docs/demo.jpg)
 
-4. Based on the picture in step 3, use a large visual model (such as GPT-4o) to parse and get the markdown content.
+4. Based on the image in step 3, use a large visual model (such as GPT-4o) to parse and obtain the markdown content.
 
 
 

diff --git a/README_CN.md b/README_CN.md
@@ -17,9 +17,11 @@
 
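 ## Process steps
 
-1. Use the PyMuPDF library to parse the PDF and extract all non-text rectangular areas (including tables, images, icons, etc.)
-2. Convert all non-text rectangular areas on the PDF into images and number them
-3. Mark red rectangles and numbers on each page of the PDF and save it as an image, similar to the following:
+1. Use the PyMuPDF library to parse the PDF and extract all non-text areas (including tables, images, icons, etc.)
+
+2. Convert all non-text areas on the PDF into images and number them
+
+3. Mark the non-text areas and numbers on each page of the PDF and save it as an image, similar to the following: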
 
 ![](docs/demo.jpg)
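
For readers of this diff, steps 1 to 3 of the README could be sketched roughly as below. This is a minimal illustration, not the package's actual code: the function names `number_regions` and `mark_pdf`, the output file naming, the top-to-bottom numbering order, and the choice to detect only embedded images (rather than tables or vector drawings) are all assumptions. Red outlines are used only because the earlier README wording mentioned them.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def number_regions(rects: List[Rect]) -> List[Tuple[int, Rect]]:
    """Number regions from 1, ordered top-to-bottom then left-to-right (assumed order)."""
    ordered = sorted(rects, key=lambda r: (r[1], r[0]))
    return list(enumerate(ordered, start=1))


def mark_pdf(pdf_path: str, out_dir: str, dpi: int = 150) -> None:
    """Hypothetical helper: crop and mark embedded-image regions on every page."""
    import os
    import fitz  # PyMuPDF; imported lazily so number_regions works without it

    os.makedirs(out_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    for page_no, page in enumerate(doc):
        # Step 1: collect bounding boxes of embedded images only (a simplification;
        # tables and icons drawn as vector graphics are not detected here).
        rects = [tuple(info["bbox"]) for info in page.get_image_info()]
        numbered = number_regions(rects)

        # Step 2: save each region as its own numbered image.
        for idx, rect in numbered:
            pix = page.get_pixmap(clip=fitz.Rect(rect), dpi=dpi)
            pix.save(os.path.join(out_dir, f"{page_no}_{idx}.png"))

        # Step 3: outline and label each region, then save the whole page.
        for idx, rect in numbered:
            r = fitz.Rect(rect)
            page.draw_rect(r, color=(1, 0, 0), width=1)
            page.insert_text(fitz.Point(r.x0 + 2, r.y0 + 10), str(idx),
                             fontsize=10, color=(1, 0, 0))
        page.get_pixmap(dpi=dpi).save(os.path.join(out_dir, f"{page_no}.png"))
    doc.close()
```

The marked page images written in step 3 are what step 4 would hand to a visual model; that call is left out here because the README does not describe its interface.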