I would break the problem down into an ordinary program that finds the pieces of the image to send to the nets. For example, if everything were neatly computer-printed in a fixed-width font, you could divide the page into a grid and send each grid cell to a net capable of recognizing a single character.
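Here is a rough sketch of that grid idea in Python. It assumes the page is already a 2D list of pixel values and that you know the cell size in pixels. The name `split_into_cells` and the toy `page` are made up for illustration; real scans would first need deskewing and margin detection, which this skips.

```python
def split_into_cells(image, cell_height, cell_width):
    """Cut a 2D image (a list of pixel rows) into a grid of fixed-size cells.

    Returns a list of grid rows; each cell is itself a list of pixel rows.
    Partial cells at the right or bottom edge are dropped.
    """
    rows = len(image) // cell_height
    cols = (len(image[0]) // cell_width) if image else 0
    grid = []
    for r in range(rows):
        top = r * cell_height
        grid_row = []
        for c in range(cols):
            left = c * cell_width
            # Slice the same column range out of every pixel row in this band.
            cell = [line[left:left + cell_width]
                    for line in image[top:top + cell_height]]
            grid_row.append(cell)
        grid.append(grid_row)
    return grid


# A 4x6 toy "page" (1 = ink, 0 = blank) with 2x3 character cells.
page = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
cells = split_into_cells(page, cell_height=2, cell_width=3)
```

Each entry of `cells` is then one candidate character image that you could hand to a single-character net.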
In practice, we often get more accurate results from a net designed for a specific language and usage set. In this case, the net is often trained to recognize full words at once. This has an added property: because a net is biased toward the letter sequences it was trained on, if you feed it a slightly misspelled word, it may recognize it as the correctly spelled word. To be accurate on all words, though, this requires a *much* larger training set (i.e., the whole dictionary).
In any case, use a simple algorithm to break the text up into lines, and then into chunks you can feed into a net. The chunks may be characters or full words, or, for a net that can analyze grammar, you might even try full sentences. (Hint: don't actually try this, but realize the implications if you could get it to work, and think of it as an advanced topic.)
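One simple way to do that splitting, sketched below, is to look for fully blank pixel rows to separate lines and then fully blank pixel columns to separate chunks within a line. This is only one possible approach, and it assumes a clean, unskewed, already binarized image (1 = ink, 0 = blank). The helper names `find_runs`, `split_lines`, and `split_chunks` are invented for this example.

```python
def find_runs(has_ink):
    """Return (start, end) index pairs for each run of True values."""
    runs = []
    start = None
    for i, flag in enumerate(has_ink):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(has_ink)))
    return runs


def split_lines(image):
    """Split a binary image into text lines at fully blank pixel rows."""
    row_has_ink = [any(row) for row in image]
    return [image[top:bottom] for top, bottom in find_runs(row_has_ink)]


def split_chunks(line):
    """Split one text line into chunks at fully blank pixel columns."""
    width = len(line[0])
    col_has_ink = [any(row[x] for row in line) for x in range(width)]
    return [[row[left:right] for row in line]
            for left, right in find_runs(col_has_ink)]


# Toy page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
lines = split_lines(page)
chunks = [split_chunks(line) for line in lines]
```

As written, any blank column starts a new chunk, so this yields roughly character-sized pieces. To get word-sized chunks instead, you would probably only split where the blank gap is wider than some threshold that you tune for the font.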