What algorithm should I follow for extracting tables containing text from PDF using Python, OpenCV and Tesseract?
First, you need to convert the PDF pages into images. Please check the paper below for table detection in scanned document images. Use that method to identify the table region, then apply Tesseract to convert each table cell region into text. Tesseract has a table detection module, but it won't detect all kinds of tables in a PDF.