Document Intelligence Series — Part-1: Table Detection with YOLOv8

Rajat Roy
3 min read · Aug 13, 2023
Photo by Mr Cup / Fabien Barral on Unsplash

Introduction

When working with unstructured data, you often need an efficient way to retrieve information from a table inside a document.

Pytesseract is a Python library that performs OCR on documents to extract all the text they contain. But what if you want only the tabular information?

To extract tabular data, you first need to locate the table within the document. You then crop the image so that only the table region remains. Finally, you pass the cropped image to pytesseract to extract the text.

In this article, I write code that detects a table in a given image and passes the cropped region to pytesseract to extract its text.

Code

Let’s break down the process step by step. The first stage is table detection, which is handled by a Python library called ultralytics. The library supports deep learning tasks such as object detection, classification, and segmentation. This example uses a YOLOv8 model from ultralytics.

Please note that the code is executed within the Google Colab environment.

1. Install dependencies.

# Installing tesseract in system
!sudo apt install tesseract-ocr

# Installing required dependencies
!pip install pytesseract transformers ultralyticsplus==0.0.23 ultralytics==8.0.21

2. Import Packages.

import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

import pytesseract
from pytesseract import Output

from ultralyticsplus import YOLO, render_result
from PIL import Image

3. Load Image

## Image downloaded from below link
# https://stackoverflow.com/questions/50829874/how-to-find-table-like-structure-in-image

image = './borderless_table.jpg'

img = Image.open(image)
img
Sample Image with a borderless table

4. Initialize YOLOv8m model.

# load model
model = YOLO('keremberke/yolov8m-table-extraction')

# set model parameters
model.overrides['conf'] = 0.25 # NMS confidence threshold
model.overrides['iou'] = 0.45 # NMS IoU threshold
model.overrides['agnostic_nms'] = False # NMS class-agnostic
model.overrides['max_det'] = 1000 # maximum number of detections per image
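
To make the `iou` setting more concrete: during non-maximum suppression, two candidate boxes whose intersection-over-union (IoU) exceeds the threshold are treated as duplicates, and only the higher-scoring one is kept. The helper below is a minimal, standalone sketch of that calculation for boxes in `(x1, y1, x2, y2)` form. The name `box_iou` is hypothetical and is not part of ultralytics.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that share half their width (made-up coordinates)
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Here the overlap is 50 and the union is 150, so the IoU is one third. That is below the 0.45 threshold, so both boxes would survive suppression.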

5. Table Detection.

# perform inference
results = model.predict(img)

# observe results
print('Boxes: ', results[0].boxes)
render = render_result(model=model, image=img, result=results[0])
render
Model Output
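
A page may yield several detections, or none at all, so it can help to choose the box explicitly instead of relying on the first row. The sketch below assumes each detection is a row of `(x1, y1, x2, y2, confidence, class_id)`, which matches the six values unpacked in the cropping step. `best_detection` is a hypothetical helper name, and the sample rows are made-up numbers.

```python
def best_detection(rows, min_conf=0.25):
    """Return the highest-confidence row at or above min_conf, or None."""
    candidates = [row for row in rows if row[4] >= min_conf]
    if not candidates:
        return None  # nothing detected with enough confidence
    return max(candidates, key=lambda row: row[4])


# Made-up detections: (x1, y1, x2, y2, confidence, class_id)
rows = [
    (12.0, 40.0, 300.0, 220.0, 0.62, 1.0),
    (10.0, 38.0, 305.0, 224.0, 0.91, 1.0),
    (400.0, 10.0, 420.0, 30.0, 0.10, 0.0),
]
print(best_detection(rows))
print(best_detection([]))
```

With real results, the rows could come from something like `results[0].boxes.data.cpu().tolist()`. Checking for `None` before cropping avoids an error on pages where no table is found.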

6. Cropping tabular part from image.

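# Take the first detected box; each row is (x1, y1, x2, y2, confidence, class).
# .cpu() keeps this working when inference ran on a GPU.
x1, y1, x2, y2 = (int(v) for v in results[0].boxes.data.cpu().numpy()[0][:4])
img = np.array(Image.open(image))
# Crop the table region (NumPy indexes rows, i.e. y, first)
cropped_image = img[y1:y2, x1:x2]
cropped_image = Image.fromarray(cropped_image)
cropped_image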
Cropped Table Region
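
A detected box can sit tight against the table and clip characters at its edges, so a small margin around the crop sometimes helps OCR. The following is a minimal sketch of padding a box and clamping it to the image bounds. `pad_and_clamp` is a hypothetical helper, and the 10-pixel default is an arbitrary assumption, not a recommended value.

```python
def pad_and_clamp(box, image_size, pad=10):
    """Grow an (x1, y1, x2, y2) box by `pad` pixels, staying inside the image.

    image_size is (width, height), the same order as PIL's Image.size.
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    return (
        max(0, x1 - pad),
        max(0, y1 - pad),
        min(width, x2 + pad),
        min(height, y2 + pad),
    )


# Made-up box near the left edge of a 100x200 image
print(pad_and_clamp((5, 20, 95, 80), (100, 200)))
```

The returned tuple can be used for the same slice as before, `img[y1:y2, x1:x2]`. Note that a NumPy image array reports its shape as `(height, width, channels)`, the reverse of PIL's `(width, height)`.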

7. Perform OCR.

ext_df = pytesseract.image_to_data(cropped_image, output_type=Output.DATAFRAME, config="--psm 6 --oem 3")
ext_df.head()
OCR Output Dataframe
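
The output of `image_to_data` has one row per detected element, with columns such as `block_num`, `par_num`, `line_num`, `word_num`, `conf`, and `text`. Rows that are not words carry a `conf` of -1 and an empty text field. As a rough sketch of how those rows might be stitched back into table lines, the helper below works on a list of dictionaries, for example `ext_df.to_dict("records")`. `words_to_lines` is a hypothetical name, and the sample records are made up.

```python
def words_to_lines(records, min_conf=0):
    """Join OCR word rows into text lines, ordered by block, paragraph, line."""
    lines = {}
    for row in records:
        text = row.get("text")
        # Skip missing text; NaN is the only float that is not equal to itself
        if text is None or (isinstance(text, float) and text != text):
            continue
        text = str(text).strip()
        # Skip blank strings and low-confidence (or non-word) rows
        if not text or float(row["conf"]) < min_conf:
            continue
        key = (row["block_num"], row["par_num"], row["line_num"])
        lines.setdefault(key, []).append((row["word_num"], text))
    ordered = sorted(lines.items(), key=lambda item: item[0])
    return [
        " ".join(word for _, word in sorted(words, key=lambda w: w[0]))
        for _, words in ordered
    ]


# Made-up records in the shape produced by ext_df.to_dict("records")
records = [
    {"block_num": 1, "par_num": 1, "line_num": 1, "word_num": 0, "conf": -1, "text": float("nan")},
    {"block_num": 1, "par_num": 1, "line_num": 1, "word_num": 2, "conf": 91.0, "text": "Price"},
    {"block_num": 1, "par_num": 1, "line_num": 1, "word_num": 1, "conf": 95.0, "text": "Item"},
    {"block_num": 1, "par_num": 1, "line_num": 2, "word_num": 1, "conf": 88.0, "text": "Pen"},
]
print(words_to_lines(records))
```

This only recovers lines of text. Splitting each line into table columns would need extra logic, for instance based on the `left` coordinate of each word.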

Conclusion

The YOLOv8 model is a significant help for table detection, handling both bordered and borderless tables in documents. I’m eager to explore and experiment with its capabilities further.

If you want to dig deeper and learn more about the model, it can be found here.
