|
Post by Devonator on Dec 5, 2012 4:18:39 GMT -5
Project Members
Logan Buchy, Martin Pajchel
Brief
Abstract about the feature, including something about significance. What should it do? How should we do it? What is the deliverable?
Significance
Broader context about why this feature is important to add to the robot.
Milestones
- Optical Character Recognition
- Page Segmentation?
Technicals
Image Processing
Filter descriptions and implementations may go here.
|
|
|
Post by martin on Dec 5, 2012 20:39:09 GMT -5
Information:
- Project completed as part of EECE 466 (DSP)
- Program segments and reads scanned documents in Times New Roman
- Problems if it is to be integrated with the robot:
|
|
|
Post by martin on Dec 5, 2012 20:40:47 GMT -5
(cont.)
- skew
- perspective
- lens (fisheye) distortion from the camera
- Logan and I are probably going to work on other things that are more core functions the robot needs.
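In case someone picks this up later, here is a rough OpenCV sketch of how those three problems might be attacked. The calibration matrix, distortion coefficients and page corners below are placeholders; the real values would have to come from camera calibration and a corner detector:

import cv2
import numpy as np

img = cv2.imread("frame.png")          # camera frame containing the document
h, w = img.shape[:2]

# 1) Lens distortion: undo it with calibration data (placeholders here;
#    real values would come from cv2.calibrateCamera on a checkerboard).
camera_matrix = np.array([[800.0, 0.0, w / 2.0],
                          [0.0, 800.0, h / 2.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)

# 2) Skew/perspective: map the four detected page corners to a flat rectangle.
page_corners = np.float32([[50, 40], [600, 55], [620, 460], [35, 440]])
flat_rect = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])
M = cv2.getPerspectiveTransform(page_corners, flat_rect)
flattened = cv2.warpPerspective(undistorted, M, (640, 480))

cv2.imwrite("flattened.png", flattened)

The flattened image could then be handed to the existing segmentation code as if it were a scan.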
|
|
|
Post by Devonator on Dec 6, 2012 20:51:53 GMT -5
Here are the logs as taken from Trac. As per the post above, I believe this project is being discontinued for now?
Log
10/09 - MARTIN:
- Current training script I have (enclosed).
- Most literature says to use a multi-layered NN with the number of input nodes equal to the number of pixels.
- Got it to converge pretty close with 30000 epochs; am trying to put a max of 100000 epochs and train till convergence.
- Found more links on how to train these; let me know if you want them. Some StackOverflow links: http://stackoverflow.com/questions/9092821/python-neurolab-feed-forward-neural-network, http://stackoverflow.com/questions/12404128/neural-network-to-train-a-image-so-as-to-get-its-unicode-as-output-python
- The lib book referenced an article that talks about splitting up the NN for classes of letters. We could use vertical projections to class into: [a, o, m, n], [b, t, l, ...], [y, j, q, ...]. A lower number of letters makes it easier to train the NN (less computing time, probably more accurate as well).
- Tesseract OCR: http://code.google.com/p/tesseract-ocr/ -- open-source OCR for documents. Got it installed; written in C++, runs in a Visual Studio 2008 environment. Hope we can use it for the car recognition part if needed. Haven't played around with it yet.
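For the record, here is a minimal sketch of the kind of training setup described above, using neurolab from the first StackOverflow link. The glyph size, hidden-layer size and the random placeholder data are illustrative assumptions, not the enclosed script:

import numpy as np
import neurolab as nl

# One input node per pixel, as the literature suggests. 8x8 glyphs and the
# random "images" below are placeholders standing in for real letter samples.
n_pixels = 8 * 8
letters = ['a', 'o', 'm', 'n']          # one vertical-projection class of letters
n_classes = len(letters)
samples_per_letter = 50

inputs = np.random.rand(samples_per_letter * n_classes, n_pixels)
targets = np.zeros((samples_per_letter * n_classes, n_classes))
for i in range(len(targets)):
    targets[i, i % n_classes] = 1.0     # one-hot target: which letter it is

# Feed-forward net: n_pixels inputs -> 30 hidden neurons -> one output per letter.
net = nl.net.newff([[0.0, 1.0]] * n_pixels, [30, n_classes])

# Train until the error goal is hit or the epoch cap is reached
# (the log mentions a cap of 100000; kept small here so the sketch runs quickly).
net.train(inputs, targets, epochs=2000, show=200, goal=0.01)

# Classify a glyph: the output neuron with the largest activation wins.
out = net.sim(inputs[:1])
print(letters[int(np.argmax(out[0]))])

The same structure would be repeated per letter class, which is what makes the split by vertical projections attractive.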
10/15 - LOGAN:
- Referenced paper: "Optical Character Recognition by a Neural Network" by Michael Sabourin. It suggests that this may be more difficult than just attaching the input neurons directly to the pixels.
- Martin and I have confirmed that for a larger set of letters (more than 6-ish) the NN will not accurately recognise any letters.
- The paper suggests finding the contours of letters and deriving a 'tangent field' from these contours.
Quote from paper...
In this classification system, the primary feature is the shape of the object, as represented by the tangent field of its contour. The tangent field is derived by smoothing the chain code description and then uniformly sampling the contour to 64 points. Smoothing reduces noise influences, and uniform sampling makes this feature scale invariant. The angle between adjacent samples of the smoothed contour are encoded as a vector, which forms the input to the neural network.
- Have implemented some code to find the 'tangent field' for letters. The current state of the code requires large, high-resolution letters to generate the tangent fields.
- The paper is quite extensive on how they approached OCR. They have many layers of networks to recognise characters that are similar to each other. They also use the genus of the object (the number of contours within the letter is how I interpreted this).
- For the neural network, we need to create a larger data set to train with. We need to generate many samples of 'A' (rotated? added noise?). The paper used 200 per letter, but they can also identify many typefaces.
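To make the quoted feature concrete, here is a rough sketch of one way to compute such a tangent field. This is not the code mentioned above; the moving-average smoothing and the "angle of the segment between adjacent samples" reading are my assumptions about the paper's description:

import numpy as np

def tangent_field(contour, n_samples=64, smooth=5):
    """Smooth a closed contour, resample it uniformly to n_samples points,
    and return the direction angle of each segment between adjacent samples
    as a scale-invariant feature vector (64 values for the NN input)."""
    pts = np.asarray(contour, dtype=float)            # (N, 2) closed contour

    # Circular moving average as a stand-in for smoothing the chain code.
    kernel = np.ones(smooth) / smooth
    sm = np.column_stack([
        np.convolve(np.r_[pts[-smooth:, i], pts[:, i], pts[:smooth, i]],
                    kernel, mode="same")[smooth:-smooth]
        for i in (0, 1)
    ])

    # Resample uniformly by arc length so the feature is scale invariant.
    closed = np.vstack([sm, sm[:1]])
    seg_len = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    arc = np.r_[0.0, np.cumsum(seg_len)]
    even = np.linspace(0.0, arc[-1], n_samples, endpoint=False)
    resampled = np.column_stack([
        np.interp(even, arc, closed[:, i]) for i in (0, 1)
    ])

    # Angle of each segment joining adjacent samples -> the tangent field.
    d = np.diff(np.vstack([resampled, resampled[:1]]), axis=0)
    return np.arctan2(d[:, 1], d[:, 0])

Each glyph would then contribute a fixed-length 64-angle vector as the NN input instead of raw pixels, which is what the quote means by the uniform sampling making the feature scale invariant.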
|
|