HanziLookup

Jordan Kiang
jordan-at-kiang.org


Here's a Java Swing component for looking up Chinese characters from handwritten or mouse-written input. It's basically the lookup component from my earlier HanziDict project, detached so it can be used on its own. It is designed to support both simplified and traditional character forms. It works by comparing the direction and length of the strokes you enter against the values stored in its data file. Correct stroke order is important, but the matching is designed to be a little forgiving; your mileage may vary. It can't really compete with commercial handwriting recognition, but hey, it's just a widget I coded up in my spare time.
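To make the direction-and-length idea concrete, here is a minimal Java sketch. It is not HanziLookup's actual algorithm: the stroke model (just a start and end point in a 0..1 box), the distance weighting, and all class and method names are assumptions made for illustration. Strokes are compared in the order they were drawn, so the right strokes in the wrong order score badly, while small wobbles in angle or length only add a small penalty instead of ruling a candidate out.

```java
/**
 * Illustrative sketch of direction-and-length stroke matching.
 * Not HanziLookup's implementation: the stroke model, the weighting,
 * and every name here are hypothetical.
 */
public class StrokeMatchSketch {

    /** Hypothetical stroke model: start and end points in a 0..1 box (y grows downward). */
    static final class Stroke {
        final double x1, y1, x2, y2;

        Stroke(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        double direction() { return Math.atan2(y2 - y1, x2 - x1); } // radians
        double length() { return Math.hypot(x2 - x1, y2 - y1); }
    }

    /** Smallest angle between two directions, in [0, PI]. */
    static double angleDiff(double a, double b) {
        double d = Math.abs(a - b) % (2 * Math.PI);
        return d > Math.PI ? 2 * Math.PI - d : d;
    }

    /** 0 means same direction and length; larger values mean a worse match. */
    static double strokeDistance(Stroke drawn, Stroke stored) {
        double dir = angleDiff(drawn.direction(), stored.direction()) / Math.PI; // 0..1
        double len = Math.abs(drawn.length() - stored.length());
        return dir + 0.5 * len; // hypothetical weighting of the two features
    }

    /**
     * Compares strokes pairwise in drawing order and returns the mean distance.
     * Lower is better; candidates with a different stroke count are rejected.
     */
    static double characterScore(Stroke[] drawn, Stroke[] stored) {
        if (drawn.length != stored.length) return Double.POSITIVE_INFINITY;
        if (drawn.length == 0) return 0.0;
        double total = 0.0;
        for (int i = 0; i < drawn.length; i++) {
            total += strokeDistance(drawn[i], stored[i]);
        }
        return total / drawn.length;
    }

    public static void main(String[] args) {
        // Stored template for a two-stroke character: horizontal, then vertical.
        Stroke h = new Stroke(0.1, 0.5, 0.9, 0.5);
        Stroke v = new Stroke(0.5, 0.1, 0.5, 0.9);
        // A slightly wobbly user drawing in the correct order.
        Stroke[] drawn = { new Stroke(0.1, 0.5, 0.9, 0.52), new Stroke(0.5, 0.1, 0.48, 0.9) };
        System.out.printf("correct order: %.3f%n", characterScore(drawn, new Stroke[] { h, v }));
        System.out.printf("reversed order: %.3f%n", characterScore(drawn, new Stroke[] { v, h }));
    }
}
```

Running `main` should show a much lower (better) score for the correct order than for the reversed one. A fuller matcher would probably also want to cope with curved strokes and an off-by-one stroke count; this sketch simply rejects any candidate whose count differs.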

The source includes two stroke-definition data files. strokes.txt contains the original hand-input definitions for over 4,000 characters. The second, strokes-extended.txt, contains those plus almost 10,000 more that Erik Peterson of www.mandarintools.com generated programmatically. On older machines you may want to stick with the smaller file, since it reduces both the computation needed to find a match and the memory used.

1/2006 Updates

  • Added optional auto-lookup after each stroke.
  • Removed the "Select" button. Clicking a character in the results now triggers the handler event directly.
  • Added a "looseness" adjuster that can limit the comparison to characters of similar complexity. This may speed up lookups on slower machines.
  • Added font chooser.
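One way to picture the looseness adjuster is as a cheap pre-filter that runs before any detailed stroke comparison. The sketch below assumes that "complexity" means stroke count and that looseness is the allowed difference between a candidate's stroke count and the number of strokes drawn so far; both are assumptions, and the `Entry` type and `candidates` method are hypothetical, not HanziLookup's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a "looseness" pre-filter. Assumes complexity means stroke count;
 * the Entry type and method names are hypothetical, not HanziLookup's API.
 */
public class LoosenessFilterSketch {

    /** Hypothetical data-file entry: a character and its stroke count. */
    static final class Entry {
        final char character;
        final int strokeCount;

        Entry(char character, int strokeCount) {
            this.character = character;
            this.strokeCount = strokeCount;
        }
    }

    /**
     * Keeps entries whose stroke count differs from the number of strokes
     * drawn by at most `looseness`. Only these would go on to full scoring.
     */
    static List<Entry> candidates(List<Entry> all, int drawnStrokes, int looseness) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : all) {
            if (Math.abs(e.strokeCount - drawnStrokes) <= looseness) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> all = List.of(
                new Entry('\u4e00', 1),  // yi (one)
                new Entry('\u5341', 2),  // shi (ten)
                new Entry('\u5927', 3),  // da (big)
                new Entry('\u4e2d', 4),  // zhong (middle)
                new Entry('\u56fd', 8)); // guo (country)
        int drawnStrokes = 3;
        for (int looseness = 0; looseness <= 2; looseness++) {
            StringBuilder sb = new StringBuilder();
            for (Entry e : candidates(all, drawnStrokes, looseness)) {
                sb.append(e.character);
            }
            System.out.println("looseness " + looseness + ": " + sb);
        }
    }
}
```

With three strokes drawn, `main` keeps only 大 at looseness 0, 十大中 at looseness 1, and 一十大中 at looseness 2; 国 (eight strokes) never reaches the scoring step. A tighter window means fewer candidates to score, which is where any savings on slower machines would come from.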

Requirements

You'll need Sun's Java runtime version 1.4 or higher. You can download it here.

You'll also need a font capable of displaying Chinese characters. Your operating system probably has facilities for adding Chinese support, which should include a Chinese font. This page is a good resource for information on enabling your computer for Chinese computing.

Usage

The HanziLookup class is a JPanel that can be dropped into a Swing app. See its source for instructions on how to use it; it should be fairly straightforward.
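Here is a minimal sketch of hosting a lookup panel in a Swing window, with a selection handler that appends each chosen character to a text area in the spirit of HanziInput. HanziLookup's real constructor and handler interface are described in its source and are not reproduced here: the `SelectionHandler` interface, `buildContent`, and `appendTo` are hypothetical, and a plain `JPanel` stands in for the HanziLookup component.

```java
import java.awt.BorderLayout;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class EmbedSketch {

    /** Hypothetical callback; check the HanziLookup source for the real handler type. */
    interface SelectionHandler {
        void characterSelected(char c);
    }

    /** Puts the lookup widget in the center and a scrollable output area below it. */
    static JPanel buildContent(JPanel lookupPanel, JTextArea output) {
        JPanel content = new JPanel(new BorderLayout());
        content.add(lookupPanel, BorderLayout.CENTER);
        content.add(new JScrollPane(output), BorderLayout.SOUTH);
        return content;
    }

    /** A handler that appends each selected character, HanziInput-style. */
    static SelectionHandler appendTo(JTextArea output) {
        return c -> output.append(String.valueOf(c));
    }

    public static void main(String[] args) {
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            // Stand-in for the real component; construct HanziLookup as its source describes.
            JPanel lookupPanel = new JPanel();
            JTextArea output = new JTextArea(3, 20);
            SelectionHandler handler = appendTo(output);
            // Register `handler` with the real component here; see its source for the listener API.
            JFrame frame = new JFrame("HanziLookup demo");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setContentPane(buildContent(lookupPanel, output));
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

Because HanziLookup is itself a JPanel, swapping out the stand-in should only require constructing a real instance and wiring the handler through whatever listener mechanism the component exposes.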

Apps

  • HanziDict: an app/applet that looks up definitions for the selected characters.
  • HanziInput (jar download): an app/applet that just puts the selected characters into a text pane you can copy and paste from. It can be used as a crude IME.
  • DimSum: Erik Peterson has integrated it into his great set of DimSum tools at www.mandarintools.com.
  • Pablo: Emmanuel Haton has ported the recognition code and used it in his native Windows dictionary, Pablo.

Source

Released under the GNU GPL.
Source zip (includes related data files)

Thanks to

  • Erik Peterson of www.mandarintools.com for the aforementioned stroke data extensions.
  • Ian Johnson for adding the ability to undo strokes.

You may also be interested in a Java-based Pinyin IME that I have worked on. You can find it here.
