

What is your favorite coding project you have done?
My favorite is building the Person of Interest machine ;)

In my sophomore year I came across the CBS series Person of Interest, which revolves around a computer system dubbed the Machine that can analyze data from surveillance cameras, electronic communications, and audio input, and predict acts of crime. That gave me a spark back in 2013: why can't I make one such machine? While most of the things the Machine does are fairly impossible at present, I was able to design a working system that can perform at least half of what the POI Machine does. The Machine as described in Person of Interest has artificial consciousness, which is pretty much out of the equation at the moment.

I started by listing all the features the Machine came packed with; the episode where Nathan boots the system reveals most of them. The first step was to decide on the technologies I needed, and I ended up with the following: Node.js for forwarding and handling the video streamed from users' webcams through WebSockets, and C++ for processing the [...] on Day 1.

To put it in action, I connected it to my department's local network and subscribed to the video streams from the computers (well, not legal). I can still recall my friend saying to another during the demonstration, "We must kill Raghav before this weekend at the central park." I got an email within seconds notifying me that I was predicted to be a victim.

Now I needed to test the system in the real world. Our college has over 13 CCTV cameras installed around the campus, and since our lab is the relay to that network, I was able to tap into it, and voila. As days progressed I added features such as the system calling me through Twilio, and I revamped the UI to match Samaritan. Apparently my friends weren't aware that it was running on their systems as well. It has been 2 years since I ran it, and it is quietly resting inside my backup HDD.
Edit: Dug up some of the old pics.

I needed a medium through which the machine could communicate with me on the go. Facebook didn't have its Bots API back then, so I created a separate account for the machine and integrated the Facebook API, which allowed it to chat with me from anywhere. I also made a dashboard through which I could view the locations of the people the system tracks and their devices connected to our campus network. Finally, I upgraded the system to the new UI and wiped all memory (Day ).

Along with my friend, I was able to build a portable version of the system that is solar powered and can be fitted anywhere, such as on lamp posts. The main goal of the portable version is to provide assistance during disasters: the government can remotely access affected areas and find casualties, and the unit can serve as a charging station for smartphones and as a WiFi hotspot for nearby people by connecting to Outernet. We showcased the system at a Nam event.

Edit #2: While I am overwhelmed by all the positive responses, I came across a few comments questioning the authenticity of the answer above, and I would be very happy to share some things. I had posted the above answer on my personal Medium blog in late Nov 2014 (How I built the Person of Interest Machine, Raghav, Medium, @ragav_g), and, surprisingly, I was curious enough to share the first day I started working on this thing on Twitter in Nov 2013.
What is the best approach to extract data from a receipt or invoice that is a PDF and categorize it by name, total cost, description, etc.?
We have helped a number of companies from the financial, logistics, and retail domains build innovative applications using OCR solutions as well as rules-based machine learning systems. You can read about one such solution here: Mobile Based Receipt Scanning & Data Extraction System. The trick to building a really effective data extraction and categorization solution is to allow for the flexibility of writing custom rules while also building machine learning models that automatically improve with usage. Having worked on these technologies for 4-5 years now, we have arrived at what we consider the right balance between the two. We also provide white-labeled mobile apps that can perform this extraction on images taken with mobile phones. Most solutions out there give you a quick start, but their model does not leave any room for customization. This leaves a lot of plumbing work to be done on your end to make the solution work with other business applications.
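As a rough illustration of the custom-rules idea, here is a minimal Python sketch that pulls a few fields out of text that has already been extracted from a receipt by some OCR step. It is not the system described above: the `RULES` table, the field names, and the regular expressions are all hypothetical, and real receipts would need many more (and more forgiving) patterns.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical rule set: one regex per field, each with a single capture group.
# These patterns are illustrative assumptions, not a general receipt grammar.
RULES: Dict[str, Pattern[str]] = {
    # Treat the first non-empty line as the vendor name.
    "vendor": re.compile(r"^\s*(.+?)\s*$", re.MULTILINE),
    # ISO-style date anywhere in the text.
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    # A line that starts with "Total" (so "Subtotal" is not matched).
    "total": re.compile(
        r"^\s*total\s*[:\-]?\s*\$?\s*(\d+(?:\.\d{2})?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    ),
}


def extract_fields(
    text: str, rules: Dict[str, Pattern[str]] = RULES
) -> Dict[str, Optional[str]]:
    """Apply each rule to the OCR text; a field is None when its rule finds nothing."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in rules.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    sample = "ACME Hardware\nDate: 2014-11-20\nHammer 12.50\nNails 3.25\nTotal: 15.75\n"
    print(extract_fields(sample))
```

Keeping the rules in a plain dictionary means a new document layout can be supported by swapping in a different rule table rather than changing the extraction code.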
Which is the best Android OCR library?
There are many OCR libraries available for integration with Android; Tesseract is very widely used. From my experience, extraction with on-device OCR is generally not that great. So what you should do is run a basic extraction test on the Android device to make sure the image was taken properly (no shake, etc.) and then send it to a server-side library for pre-processing and deeper extraction.

The trickier part is what to do after the OCR engine gives you the text. Text extraction is way more complicated than vanilla OCR. For extraction you need to worry about two more things.

Extraction rules: OCR software usually dumps the text in your document into a free-form field. This works great if you are scanning a page from a book or a doc. But if you need to separate the line items in the document, you also need to apply a lot of rules around the text. That can take a lot more time than integrating the OCR engine.

Context: for business apps there are situations where the OCR engine is pretty confident of the extracted data, but the text does not add up in the context of all the other data around it. This is where classic OCR engines fail. A lot of companies have been able to get around this problem by building strong machine-learning-based algorithms that can plug the gaps in the OCR engine's readability.
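To make the two post-OCR concerns concrete, here is a minimal Python sketch: one rule that splits OCR lines into line items, and one context check that flags a receipt whose items do not add up to the stated total. This is a plain arithmetic check standing in for the machine learning approaches mentioned above, not an implementation of them. The `LINE_ITEM` pattern, the function names, and the one-cent tolerance are assumptions made for the example.

```python
import re
from decimal import Decimal
from typing import List, Tuple

# Hypothetical rule: a line item is a description followed by an amount
# with two decimal places, e.g. "Hammer 12.50".
LINE_ITEM = re.compile(r"^(?P<desc>.+?)\s+(?P<amount>\d+\.\d{2})$")


def parse_line_items(lines: List[str]) -> List[Tuple[str, Decimal]]:
    """Keep only the lines that match the line-item rule."""
    items = []
    for line in lines:
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append((match.group("desc"), Decimal(match.group("amount"))))
    return items


def total_is_consistent(
    items: List[Tuple[str, Decimal]],
    stated_total: Decimal,
    tolerance: Decimal = Decimal("0.01"),
) -> bool:
    """Context check: do the extracted amounts add up to the stated total?"""
    computed = sum((amount for _, amount in items), Decimal("0"))
    return abs(computed - stated_total) <= tolerance


if __name__ == "__main__":
    ocr_lines = ["Hammer 12.50", "Nails 3.25", "Thank you!"]
    items = parse_line_items(ocr_lines)
    # A total misread by the OCR engine (say 18.75 instead of 15.75) fails the check.
    print(total_is_consistent(items, Decimal("15.75")))
    print(total_is_consistent(items, Decimal("18.75")))
```

A failed check does not say which value was misread, only that the document should be re-scanned or sent for review.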
How can I implement OCR technique in apache spark?
There are a number of techniques that you may want to check out for applying OCR with Apache Spark:
Use Tesseract-OCR via PyTesseract from PySpark (pytesseract 0.1.6).
While not OCR per se, here is an interesting read: -analysis-and-analytics-using-spark
A fun one is Tim Hunter's blog post combining Apache Spark and Google TensorFlow via TensorFrames: Deep Learning with Apache Spark and TensorFlow.
Another approach is to use a data science service such as Algorithmia with its various OCR algorithms (ocr Algorithms - Algorithmia). You can find a sample of how to call Algorithmia from Spark at algorithmiaio.
HTH!
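For the first option, a minimal sketch of running Tesseract from PySpark might look like the following. It assumes the Tesseract binary, `pytesseract`, and Pillow are installed on every worker node; the function names (`tesseract_ocr`, `ocr_partition`, `run`) and the app name are made up for this example. The OCR function is passed in as a parameter so the partition logic can be tried without Spark or Tesseract.

```python
import io
from typing import Callable, Iterable, Iterator, Tuple


def tesseract_ocr(image_bytes: bytes) -> str:
    """OCR a single image; imports are local so they resolve on the workers."""
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


def ocr_partition(
    records: Iterable[Tuple[str, bytes]],
    ocr_fn: Callable[[bytes], str] = tesseract_ocr,
) -> Iterator[Tuple[str, str]]:
    """Turn (path, file bytes) pairs into (path, recognized text) pairs."""
    for path, content in records:
        yield path, ocr_fn(content)


def run(image_glob: str):
    """Read image files as binary blobs and OCR them partition by partition."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ocr-sketch").getOrCreate()
    # binaryFiles yields one (path, bytes) pair per file.
    pairs = spark.sparkContext.binaryFiles(image_glob)
    return pairs.mapPartitions(ocr_partition).collect()
```

Calling `collect()` is only reasonable for small batches; for larger jobs the resulting pairs would more likely be written back to storage.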