OCR PDF

How Do I Find Data From A PDF Using Blue Prism?


Sorry for the late reply. I am assuming the PDF is an actual PDF document and not an image. This might sound a bit confusing, but there are two types of PDF documents:

1. PDF documents. The file ends with a .pdf extension and was created using OCR PDF or Microsoft Word. You can verify this by sending Ctrl + A and Ctrl + C to select and copy the content of the document. Then, in Word, a text file, Excel or PowerPoint, send Ctrl + V to paste. If successful, you should be able to paste the clipboard content into any cell or line, and/or use GetClipboard in your process to retrieve the clipboard content.

2. PDF images. The file ends with a .pdf or .tiff extension and is most likely a scanned document. Unfortunately, copying and pasting will never work for this category, so you will always have to use OCR technology. OCR will only work for PDF images that are computer generated rather than handwritten, as it recognises the system fonts available on your computer. The success of OCR in this scenario depends on the quality of the image, e.g. a 300 dpi scan will give a higher success rate.

Let's assume your PDF is an actual document. Blue Prism has three ways of extracting information from your PDF:

- Windows Clipboard
- OCR
- OCR PDF API

Windows Clipboard

1. Launch or attach to your document displayed in OCR PDF Reader, e.g. using Utility - Environment > Start Process with a file location parameter.
2. Send keystrokes, e.g. Ctrl + A followed by Ctrl + C, or Alt + E followed by Copy File to Clipboard.
3. Use GetClipboard in a calculation stage to retrieve the text, or use the Paste from Clipboard action in MS Excel if Excel is the intended recipient.
4. Use Instr, Mid, Left, Right or another relevant text function in a calculation stage to extract specific text from the clipboard content.

OCR

It is advisable to run through the Surface Automation & OCR training before going down this route. The steps would normally be:

1. Launch or attach to your document displayed in OCR PDF Reader, e.g. using Utility - Environment > Start Process with a file location parameter.
2. Maximise and/or resize the document to ensure the screen behaviour is standardised.
3. Spy the window in region mode and create a region covering the content area.
4. In a read stage, call Read Text with OCR and store the value in a text item.
5. Perform sanity checks where required to ensure the success rate is at an acceptable percentage and to avoid unnecessary false positives.
6. Use Instr, Mid, Left, Right or another relevant text function in a calculation stage to extract specific text from the stored OCR result.

OCR PDF API

It is advisable to fully understand the Blue Prism Data Sheet - Extending Automation using the .Net Framework before having a go at this. It is possible to convert a PDF document to other readable formats, such as XML or Word, for the sole purpose of reading or extracting text. Note that a license cost may be incurred, because this functionality requires either OCR PDF Standard or Professional to be installed on the resource PCs where the process will run. This needs to be considered as part of the solution design. Assuming the license has been procured, the steps would normally be:

1. Create a code stage that interfaces with the OCR PDF API.
2. Call the Save/Save As functions within the API to save the document in a different format.
3. Open the file in the new format, e.g. Word.
4. Copy the full content to the clipboard using keystrokes, or use actions within the VBOs to read the content.
5. Use Instr, Mid, Left, Right or another relevant text function in a calculation stage to extract specific text from the clipboard content.

I hope this helps.
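All three routes end with the same step: pulling a specific value out of the retrieved text using Instr, Mid, Left and Right. The Python sketch below only illustrates that logic; in Blue Prism it would be written as calculation stage expressions rather than Python. The helper names, the optional start argument on instr, the sample text, the "Invoice No:" label and the INV-plus-five-digits format are all assumptions made up for the example, not anything from Blue Prism or from a real document.

```python
import re
from typing import Optional


def instr(text: str, search: str, start: int = 1) -> int:
    """1-based position of `search` in `text`, or 0 if it is absent.

    The optional `start` argument is an addition for this sketch.
    """
    return text.find(search, start - 1) + 1


def mid(text: str, start: int, length: int) -> str:
    """`length` characters of `text`, beginning at 1-based position `start`."""
    return text[start - 1:start - 1 + length]


def left(text: str, length: int) -> str:
    """First `length` characters of `text`."""
    return text[:length]


def right(text: str, length: int) -> str:
    """Last `length` characters of `text`."""
    return text[-length:] if length > 0 else ""


def extract_after_label(text: str, label: str) -> Optional[str]:
    """Return the rest of the line that follows `label`, or None if not found."""
    pos = instr(text, label)
    if pos == 0:
        return None
    value_start = pos + len(label)
    line_end = instr(text, "\n", value_start)
    if line_end == 0:
        # The label sits on the last line, which has no trailing newline.
        line_end = len(text) + 1
    return mid(text, value_start, line_end - value_start).strip()


def looks_like_invoice_number(value: str) -> bool:
    """Sanity check against a hypothetical format: 'INV-' plus five digits."""
    return re.fullmatch(r"INV-\d{5}", value) is not None


# Hypothetical text, as it might come back from the clipboard or an OCR read.
sample = "Invoice No: INV-00123\nTotal: 450.00\n"
invoice = extract_after_label(sample, "Invoice No:")
print(invoice, looks_like_invoice_number(invoice))
```

A format check like looks_like_invoice_number is one way to do the sanity check mentioned in the OCR steps: a misread such as the letter O in place of a zero fails the pattern, so the item can be flagged instead of being passed on as a false positive.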

OCR PDF: All You Need to Know

A Note about Open Image Formats

Blue Prism requires an image to be present in the document before OCR will operate. Even so, some text may exist only in the image file's metadata, and Blue Prism will not be able to extract it. This can happen if the file is either a RIFF or a JPG. You can check the file's metadata to confirm whether any text is present there.