What are some ways of implementing an OCR using neural networks with backpropagation?
One way is to use convolutional neural networks (CNNs). The key is to omit the fully connected layers usually found at the end of such networks and instead express the final layers as convolutional layers, so that the classification step is performed on each of the multiple patches received from the preceding layers. Such networks can be called fully convolutional networks, because the classification part emerges from a convolution just like the feature-extraction part. This idea goes back to the multi-digit recognition experiments by Yann LeCun's group (see the MNIST demos and the LeNet-5 demos on Yann LeCun's website), and in particular to the paper "Multi-Digit Recognition Using a Space Displacement Neural Network". The same concept can be applied to alphanumeric character recognition; it is not limited to the numeric digits of the MNIST database.
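To make the "classifier as a convolution" idea concrete, here is a minimal pure-Python sketch. It is a 1-D toy, not the Space Displacement Neural Network from the paper: the weights, the window size of 3, and the function names are all made up for illustration, and in a real network the weights would be learned with backpropagation and the convolution would be 2-D.

```python
# Toy illustration: a dense classifier over a fixed-size window, re-expressed
# as a convolution so the same weights can slide over a wider input.

def dense_classify(window, weights, biases):
    """Score one window; weights[c] has the same length as the window."""
    return [sum(w * x for w, x in zip(weights[c], window)) + biases[c]
            for c in range(len(weights))]

def conv_classify(features, weights, biases, stride=1):
    """Apply the same weights at every position of a longer input."""
    k = len(weights[0])
    return [dense_classify(features[i:i + k], weights, biases)
            for i in range(0, len(features) - k + 1, stride)]

def argmax(scores):
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical hand-picked weights for 2 classes over a window of 3 features.
WEIGHTS = [[1.0, 0.0, -1.0],   # class 0 responds to a falling pattern
           [-1.0, 0.0, 1.0]]   # class 1 responds to a rising pattern
BIASES = [0.0, 0.0]

# One window in, one score vector out (the usual fully connected case).
single = dense_classify([3.0, 2.0, 1.0], WEIGHTS, BIASES)

# A wider input yields one score vector per window position.
wide = conv_classify([3.0, 2.0, 1.0, 3.0, 5.0], WEIGHTS, BIASES)
labels = [argmax(scores) for scores in wide]
```

The point of the sketch is that `conv_classify` needs no retraining to handle a wider input: the first output position reproduces the single-window result, and the remaining positions are classifications of the shifted patches.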
How do I integrate a barcode scanner in an app using Java API?
Well, that is an OCR-style task (OCR: Optical Character Recognition), so you need a Java API that is capable of that kind of recognition. There is a project on GitHub that maintains such an API; it is called ZXing ("Zebra Crossing"). There are also examples for Android out there that use this library. For starters, here is the zxing s
In your opinion, what is better for a GUI: Java or C#?
I think if you're happy to be making a Windows-only application, then WPF is probably, on balance, a little nicer than JavaFX. They are *really* similar though. Basically, you've got an XML-based GUI design language (XAML for WPF and Windows Store apps, FXML for JavaFX). Then you've got the languages, C# and Java, which are basically the same. WPF wins on tooling: Visual Studio is very nice, and while I'm used to NetBeans now, Visual Studio is better. The GUI designer in Visual Studio is notably better than Scene Builder for JavaFX, although I think that now Gluon are taking care of Scene Builder, more will happen on that front. JavaFX is of course cross-platform, though, and for me it is by far the best cross-platform GUI toolkit out there. I think WPF and C# suit beginners more, simply because of Visual Studio; it's just a more cohesive offering than NetBeans + JDK + Scene Builder. WPF is better if you want to make native-looking applications. Consider that if you want a native dialog box in JavaFX, you're going to be using JNI + C. I use JavaFX for a few applications, and I've got about 13 lines of C to make the whole thing seem like a native application. 13 lines isn't very much, but if you're not used to C it'll be hard going. I think they are very similar offerings, both excellent. WPF, if you're working on Windows only, is better for a lot of things, especially if you want to hook into DirectX or some other Windows-only technology. If you need to run on something other than Windows, like Mac, Linux, Android, or iPhone, then JavaFX is obviously the way to go. I think choosing between WPF and JavaFX is like choosing between a BMW and an Audi: you're unlikely to regret either.
How can I grab identical articles in corrupted OCR texts?
Your question seems to be attacking two different problems. (1) Are you trying to figure out which articles in the index are identical? Even if some of the words are missing, the solution to 1 is pretty straightforward; Sujit has outlined it nicely in his answer. In case that's not what you are trying to solve, you should look at machine learning to extrapolate the missing words. If all of your scanned documents belong to the same category (I mean all of them are hotel bills, or all are tax returns, etc.), then you should build a simple Bayesian model to figure out which words occur next to each other. Once you have this training corpus loaded into a machine learning platform, you should be able to easily identify the missing words. We have used a similar technique for a variety of purposes: Understanding Expense Types (Mobile Based Receipt Scanning & Data Extraction System) and Loyalty Management (Receipt Based Loyalty Management Program For Indirect Sales). Hope this helps.
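As a rough illustration of the "which words occur next to each other" idea above, here is a minimal sketch that uses plain bigram counts as a stand-in for a proper Bayesian model. The tiny corpus, the function names, and the scoring rule (count of previous-to-candidate times count of candidate-to-next) are assumptions for the example, not the system we actually used.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows another in a clean corpus."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def fill_gap(prev_word, next_word, following):
    """Guess a missing word from its two neighbours.

    Scores each candidate w by count(prev, w) * count(w, next) and
    returns the best one, or None if no candidate fits both sides.
    """
    prev_word, next_word = prev_word.lower(), next_word.lower()
    best, best_score = None, 0
    for cand, left_count in following.get(prev_word, Counter()).items():
        score = left_count * following.get(cand, Counter())[next_word]
        if score > best_score:
            best, best_score = cand, score
    return best

# Hypothetical clean corpus from a single document category (hotel bills).
CORPUS = [
    "total amount due 120",
    "total amount paid 120",
    "room service charge 15",
    "total amount due 80",
]
MODEL = train_bigrams(CORPUS)

guess_a = fill_gap("total", "due", MODEL)      # OCR read "total ___ due"
guess_b = fill_gap("room", "charge", MODEL)    # OCR read "room ___ charge"
guess_c = fill_gap("total", "charge", MODEL)   # no candidate fits both sides
```

With a real corpus you would want smoothing and probabilities rather than raw counts, but the narrower the document category, the better even this crude version tends to work.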