RECENT DEVELOPMENT IN OPTICAL
CHARACTER RECOGNITION AT M.I.T.*

Lawrence G. Roberts

Research Laboratory of Electronics
Massachusetts Institute of Technology
Cambridge, Massachusetts



During the past five years, the work on optical character recognition at M.I.T. has been conducted mainly at Lincoln Laboratory. Selfridge, Doyle, Roberts, and Sherman have all published results on various aspects of character recognition research, as reviewed briefly below. In the past year, however, the emphasis at M.I.T. has changed from these hand-printed character studies to research on cursive handwriting conducted by Eden, Dennis, and their students at the Institute. Progress has been made in characterizing handwritten strokes and applying this to the handwriting recognition problem.

In the past, there has been considerable activity at M.I.T. on two-dimensional shape recognition. Although the emphasis of these studies has been on the self-organizing aspects of such systems, the shapes tested have been characters. Much of the theoretical work has been done by Minsky and Selfridge, thus providing fruitful ideas for others [1, 2]. The pandemonium concept of Selfridge was tested by Doyle on hand-printed characters [3]. This method consists of many computation routines testing various features of the characters. Each routine has a binary output, and the set of these outputs is correlated with output probabilities, as in a Bayes net. Using the same characters, Roberts then experimented with a perceptron type of net in an attempt to find good reward procedures [4]. The main difference from a pandemonium system is that the computation routines are replaced by threshold elements directly connected to several input cells. Their output is still treated by a Bayes net, wherein the learning takes place. Selfridge intended pandemonium to modify its computation routines as a more powerful learning technique, but this is difficult to accomplish in a computer. However, it proved possible to have the computer modify the set of input cells feeding a threshold unit whenever the unit became inactive. This two-stage adaptation has the effect of reducing both the number of units needed and the learning time.
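The arrangement described above, threshold elements feeding a probabilistic output stage, can be illustrated with a minimal sketch. It is not the program used in [3] or [4]. The 3x3 input field, the two hand-wired units, and all names are hypothetical, and the "Bayes net" is modeled here as a simple naive Bayes classifier, which is an assumption. The rewiring of inactive units is not shown.

```python
import math
from collections import defaultdict


def threshold_unit(image, cells, threshold):
    """Return 1 when at least `threshold` of the wired input cells are on."""
    return 1 if sum(image[i] for i in cells) >= threshold else 0


def unit_outputs(image, units):
    """Binary output of every threshold unit for one input image."""
    return tuple(threshold_unit(image, cells, t) for cells, t in units)


class BayesLayer:
    """Learns, per class, how often each unit fires (naive Bayes assumption)."""

    def __init__(self, n_units):
        self.class_count = defaultdict(int)
        self.fire_count = defaultdict(lambda: [0] * n_units)

    def learn(self, outputs, label):
        self.class_count[label] += 1
        for k, bit in enumerate(outputs):
            self.fire_count[label][k] += bit

    def classify(self, outputs):
        total = sum(self.class_count.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_count.items():
            score = math.log(n / total)
            for k, bit in enumerate(outputs):
                # Laplace smoothing keeps the probabilities away from 0 and 1.
                p_fire = (self.fire_count[label][k] + 1) / (n + 2)
                score += math.log(p_fire if bit else 1.0 - p_fire)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical 3x3 input field, stored row by row as 9 cells.
UNITS = [((1, 4, 7), 2),   # unit wired to the middle column
         ((3, 4, 5), 2)]   # unit wired to the middle row
VERTICAL_BAR = [0, 1, 0, 0, 1, 0, 0, 1, 0]
HORIZONTAL_BAR = [0, 0, 0, 1, 1, 1, 0, 0, 0]

layer = BayesLayer(len(UNITS))
layer.learn(unit_outputs(VERTICAL_BAR, UNITS), "I")
layer.learn(unit_outputs(HORIZONTAL_BAR, UNITS), "-")

NOISY_BAR = [0, 1, 0, 0, 1, 0, 0, 0, 0]  # vertical bar with one cell missing
print(layer.classify(unit_outputs(NOISY_BAR, UNITS)))
```

In this sketch all learning takes place in the counts held by the output stage, while the units themselves stay fixed, which corresponds to the simpler of the two adaptation stages mentioned in the text.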

Recently, however, researchers at M.I.T. have been investigating types of input material other than the scanned, single character. Handwriting analysis was begun and very well characterized by Eden and Halle [5]. They found that only 18 strokes are used to construct all of the English characters. These strokes can be generated and joined together by simple rules in a computer so as to "write" any word. By finding the ranges of the stroke parameters for an individual, the computer can forge his handwriting. Continuing this work toward handwriting recognition, Earnest has developed a set of simple tests on the strokes in a word, which enables a computer to pick likely words from a dictionary [6]. Presently these tests count the closed loops, the tails above and below the small letters, and the axis crossings at the center of the word. Each letter contributes some amount to each category, and words can be chosen if their letters would produce a similar test result. With just these tests, approximately 20% of the 10,000 words are recognized uniquely and the average number of words chosen is 20. A test for upstrokes is being added, and the order of the tails, upstrokes, and closed curves will be recorded. With these changes it is hoped the number of words chosen from the dictionary will decrease to a point at which analysis by synthesis can be tried. That is, the chosen words can be written by the computer and compared with the input word. It is also possible that a few more tests will complete the recognition job.
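The dictionary look-up described above can be sketched as follows. This is an illustration only, not Earnest's program. The per-letter counts in LETTER_FEATURES are rough guesses, the six-word dictionary and all function names are hypothetical, and the axis-crossing test is omitted for brevity. The text speaks of a "similar" test result, whereas this sketch requires an exact match.

```python
# (closed loops, tails above, tails below) for each small letter.
# These counts are illustrative guesses, not values from the paper.
LETTER_FEATURES = {
    "a": (1, 0, 0), "b": (1, 1, 0), "c": (0, 0, 0), "d": (1, 1, 0),
    "e": (1, 0, 0), "f": (0, 1, 1), "g": (1, 0, 1), "h": (0, 1, 0),
    "i": (0, 0, 0), "j": (0, 0, 1), "k": (0, 1, 0), "l": (0, 1, 0),
    "m": (0, 0, 0), "n": (0, 0, 0), "o": (1, 0, 0), "p": (1, 0, 1),
    "q": (1, 0, 1), "r": (0, 0, 0), "s": (0, 0, 0), "t": (0, 1, 0),
    "u": (0, 0, 0), "v": (0, 0, 0), "w": (0, 0, 0), "x": (0, 0, 0),
    "y": (0, 0, 1), "z": (0, 0, 1),
}


def word_signature(word):
    """Sum the per-letter counts to get the test result expected for a word."""
    loops = above = below = 0
    for letter in word.lower():
        l, a, b = LETTER_FEATURES[letter]
        loops += l
        above += a
        below += b
    return (loops, above, below)


def candidate_words(measured, dictionary):
    """Dictionary words whose expected signature equals the measured one."""
    return [w for w in dictionary if word_signature(w) == measured]


DICTIONARY = ["dog", "cat", "bag", "pig", "hat", "good"]

# Suppose the tests on a written word found 3 loops, 1 tail above, 1 below.
print(candidate_words((3, 1, 1), DICTIONARY))
```

With these assumed counts, more than one word can share a signature, which is why the text expects several candidate words per input and proposes further tests to narrow the list.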

Another type of character input that is being considered at M.I.T. is an "electronic drafting board" intended for real-time multisequence computer usage. This device, now being constructed by Teager [7], consists of a 20" x 20" sheet of paper on a base laid with wires in two directions. Pulses sent down these wires announce to a stylus above the paper the binary position of the wire just under the stylus. The stylus consists of a pen and a set of coils, for the dual purpose of marking and sensing the pen's x and y position. Evidently the problem of receiving pulses from several wires is not very troublesome. The wires are laid approximately 0.01" apart and wound in and out of pulse transformer cores to provide drive current. By properly choosing the cores to loop, the pulse code for each wire is selected. Thus, approximately 11 cores are needed for each axis of the board.
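The figure of approximately 11 cores per axis follows from the board dimensions, as the short calculation below shows. It assumes one core per bit of a plain binary wire code, which is one reading of the text rather than a documented detail of Teager's design. The function names are hypothetical.

```python
BOARD_SIDE_INCHES = 20.0     # from the text
WIRE_SPACING_INCHES = 0.01   # from the text ("approximately")


def wires_per_axis(side, spacing):
    """Number of wires needed to cover one side of the board."""
    return round(side / spacing)


def bits_needed(n_wires):
    """Smallest number of bits that gives every wire a distinct binary code."""
    return (n_wires - 1).bit_length()


def cores_looped(wire_index, n_bits):
    """Cores a wire would loop through under plain binary coding (assumed)."""
    return [bit for bit in range(n_bits) if (wire_index >> bit) & 1]


n_wires = wires_per_axis(BOARD_SIDE_INCHES, WIRE_SPACING_INCHES)
n_cores = bits_needed(n_wires)
print(n_wires, n_cores, cores_looped(5, n_cores))
```

About 2000 wires per axis need 11 bits, since 2^10 = 1024 is too few and 2^11 = 2048 is enough.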

When the pen is moved to a new board position, it will receive a new pulsed position. This 22-bit position can be sent to the computer each time the pen moves, or only when a further slope sensor says the pen has turned a corner. Thus, the computer will not be interrupted more often than every millisecond. Since the sequence of points is also known by the computer, character recognition is simpler. For separated letters, Teager estimates that 100 machine instructions are sufficient to identify a character from a list of characters that the user has previously drawn. Everything considered, each such user should consume only a small fraction of the machine's time. This makes on-line programming possible. Of course, many other uses can be thought of. This whole project is in the prototype stage.
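The two reporting ideas in this paragraph, a 22-bit position word and reporting only at corners, can be sketched as below. The bit layout (x in the high 11 bits, y in the low 11 bits) and the software corner test are assumptions made for illustration. The text describes a hardware slope sensor and does not give the word format.

```python
BITS_PER_AXIS = 11                      # half of the 22-bit position
AXIS_MASK = (1 << BITS_PER_AXIS) - 1


def pack_position(x, y):
    """Combine two wire indices into one 22-bit word (layout assumed)."""
    if not (0 <= x <= AXIS_MASK and 0 <= y <= AXIS_MASK):
        raise ValueError("wire index out of range")
    return (x << BITS_PER_AXIS) | y


def unpack_position(word):
    """Recover the (x, y) wire indices from a packed word."""
    return word >> BITS_PER_AXIS, word & AXIS_MASK


def corner_points(points):
    """Keep the end points and every point where the pen changes direction."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        ax, ay = cur[0] - prev[0], cur[1] - prev[1]
        bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
        cross = ax * by - ay * bx       # zero when the two steps are parallel
        dot = ax * bx + ay * by         # positive when they point the same way
        if cross != 0 or dot <= 0:
            kept.append(cur)
    kept.append(points[-1])
    return kept


# An L-shaped stroke: right along the x axis, then up along the y axis.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print([pack_position(x, y) for x, y in corner_points(stroke)])
```

Only three of the five sampled points are reported for this stroke, which illustrates how corner-based reporting reduces the number of interruptions.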

One further type of character-input treatment could be mentioned, and that is in connection with my present work on recognition of 3-D solids. Although I am mainly concerned with identifying and positioning objects in a three-dimensional space, using a photograph as input, it is often the case that letters will appear on the surface of objects. If these letters are treated in the same manner as objects and mapped by a three-dimensional projective transform to a model, then the distortion caused by perspective is removed and the model is an invariant transform of that letter. This procedure can be carried out and is purely mathematical, involving a topology fit and a minimum square-error match with the models. It is designed to make the identification of an object independent of the viewpoint. However, it may be true that many of the variations in font and letter style are really not much more than perspective changes, since we normally see as identical almost all two-dimensional perspective transformations of a pattern. Thus, one model may fit all of the forms of each letter, completing the job of recognition.
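The idea of removing perspective distortion by mapping a letter onto a model can be illustrated with a simplified sketch. It substitutes a two-dimensional plane-to-plane projective mapping for the three-dimensional transform described above, and it assumes that four point correspondences between the picture and the model are already known. It is not the procedure used in the work on 3-D solids, and all names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_projective(src, dst):
    """Eight parameters mapping four src points exactly onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve(a, b)


def apply_projective(h, point):
    """Map one point through the projective transform h."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


def squared_error(h, observed, model):
    """Sum of squared distances between mapped observed points and the model."""
    total = 0.0
    for p, (mx, my) in zip(observed, model):
        px, py = apply_projective(h, p)
        total += (px - mx) ** 2 + (py - my) ** 2
    return total


# Model: corners of a unit square plus its center (a stand-in for a letter).
MODEL = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
# A made-up perspective view of the model, used to simulate a photograph.
VIEW = [2.0, 0.3, 1.0, 0.1, 1.5, 2.0, 0.2, 0.1]
OBSERVED = [apply_projective(VIEW, p) for p in MODEL]

h = fit_projective(OBSERVED[:4], MODEL[:4])
print(squared_error(h, OBSERVED, MODEL))
```

The transform is fitted from the four corners only, so the error at the fifth point is a genuine check. It comes out at essentially zero because the simulated view is an exact perspective image of the model.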

REFERENCES

  1. M. L. Minsky, "Steps Toward Artificial Intelligence," Proc. IRE, Vol. 49, pp. 8-30; January 1961.
  2. O. G. Selfridge and U. Neisser, "Pattern Recognition by Machine," Scientific American, Vol. 203, No. 3, pp. 60-68; August 1960.
  3. W. Doyle, "Recognition of Sloppy Hand-printed Characters," Proc. WJCC, San Francisco, May 3-5, 1960, Vol. 17, pp. 133-142.
  4. L. G. Roberts, "Pattern Recognition With An Adaptive Network," 1960 IRE International Convention Record, Pt. 2, pp. 66-70.
  5. M. Eden and M. Halle, "Characterization of Cursive Handwriting," Proc. Fourth London Symposium on Information Theory, C. Cherry (ed.), Butterworth Scientific Publications, London, 1961.
  6. L. Earnest, Mitre Corporation, Bedford, Mass.
  7. H. Teager, Computation Center, M.I.T., Cambridge, Mass.


Copyright 2001 Dr. Lawrence G. Roberts
