OCR (Optical Character Recognition) is a technique for converting images of text into editable text formats. OCR has worked well for text in the Roman (Latin) script, where characters are usually written individually and separated by spacing. Devanagari is more challenging because its written forms are built from many combinations of vowels and consonants. Characters become difficult to identify when there is no boundary between adjacent characters, or between characters on adjacent lines, as in the continuous strokes of cursive writing. If text cannot be segmented into individual characters, OCR models may not work well. Variation in handwriting, font size and similar factors adds further difficulty.
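The segmentation problem described above can be illustrated with a minimal sketch. It assumes a binarized text-line image stored as a list of rows (1 for ink, 0 for background) and splits it wherever a column contains no ink. The function name `segment_columns` and the toy images are hypothetical, and this blank-column method is only one simple approach, not necessarily what any particular OCR system uses.

```python
def segment_columns(binary_image):
    """Split a binarized text-line image into character column spans.

    binary_image: list of rows, each a list of 1 (ink) or 0 (background).
    Returns a list of (start, end) column indices, end exclusive.
    """
    # Count ink pixels in each column (the vertical projection profile).
    profile = [sum(column) for column in zip(*binary_image)]

    spans = []
    start = None
    for col, ink_count in enumerate(profile):
        if ink_count > 0 and start is None:
            start = col                   # a run of inked columns begins
        elif ink_count == 0 and start is not None:
            spans.append((start, col))    # a blank column ends the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


# Two toy "characters" separated by one blank column.
separated = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]

# The same shapes joined by a connecting stroke, as in cursive writing.
touching = [row[:] for row in separated]
touching[0][2] = 1

print(segment_columns(separated))  # [(0, 2), (3, 5)]
print(segment_columns(touching))   # [(0, 5)]
```

A single connecting stroke is enough to merge the two shapes into one span, which is why text without clear boundaries between characters is hard to segment this way.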
OCR software has been developed for many languages and scripts, but support for the many character forms of Devanagari is less mature. Devanagari was chosen here because it is encoded in Unicode, so it can be stored and displayed on computers without special software, and because it is used to write Hindi, Marathi, Nepali and Sanskrit. (Tamil, Punjabi and Urdu are normally written in their own scripts.) The Devanagari script has a large character inventory: about 33 consonants and, depending on how they are counted, 11 to 14 vowels, in addition to dependent vowel signs and conjunct forms. Several consonants come in unaspirated and aspirated [ʰ] pairs, and every consonant carries an inherent vowel, pronounced [a], unless another vowel sign replaces it. Chart 1 lists the consonants and vowels.
Chart 1: Devanagari script, number of consonants and vowels
Vowels and Aids to Recognize Consonants
Below is a chart showing the different vowel sounds in Devanagari that can aid in recognizing specific Devanagari letters.
Vowels
A few words in Devanagari are formed by combinations of two or more different.
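To make the idea of combined characters concrete, the sketch below uses Python's standard `unicodedata` module to list the code points behind two Devanagari written units. The helper name `describe` and the two sample strings are illustrative choices, not taken from the original text.

```python
import unicodedata


def describe(text):
    """Return the Unicode name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# A consonant followed by a dependent vowel sign: the syllable "ki".
ki = "\u0915\u093F"

# A conjunct: two consonants joined by the virama sign, "ksha".
ksha = "\u0915\u094D\u0937"

for sample in (ki, ksha):
    print(sample, len(sample), describe(sample))
```

Each sample is drawn as one visual unit but is stored as two or three code points, so an OCR system for Devanagari has to map a single segmented shape to a sequence of characters rather than to one character.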