Lars Yencken


I'm an engineer and former researcher in human languages, machine learning and web systems. I love making and discovering; sometimes I even write about it.

I tried to model what language learners know to better help them learn, and turned these models into tools for studying Japanese and Chinese. My interest in text mining led me to NICTA, where I began analysing medical text corpora. This sparked my interest in machine reading and big data. I joined 99designs to learn how to serve larger audiences, and stayed to delve into the rich data that comes with customer interaction.

Large datasets help to unveil the mathematical beauty underpinning the world. I like to discover and share this beauty, and to have an impact on people's lives.


  • The Great Language Game: learn to distinguish between spoken languages
  • marelle: test-driven sysadmin through logic programming
  • doko: a command-line tool for determining your current location
  • colorific: library for detecting significant color in designs
  • anytop: an ncurses frequency visualisation from streaming input
  • simsearch: an open source visual similarity search for Japanese kanji
  • foks: an intelligent dictionary for learners of Japanese
  • kanji tester: a study tool for JLPT levels 3 and 4 centred around adaptive testing
  • cjktools: a Python library for working with Japanese and Chinese dictionaries

Some of my work is open source and available on GitHub or Bitbucket.

Data sets


  • Tara McIntosh, Lars Yencken, Timothy Baldwin and James Curran: “Relation guided bootstrapping of semantic lexicons”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR (2011) [pdf]
  • Su Nam Kim, David Martinez, Lawrence Cavedon and Lars Yencken: “Automatic classification of sentences to support evidence based medicine”, BMC Bioinformatics, 12:S2 (2011) [pdf]
  • Lars Yencken and Timothy Baldwin: “Predicting and compensating for lexicon access errors”, Proceedings of the 2011 International Conference on Intelligent User Interfaces, Palo Alto, CA (2011) [pdf]
  • Lars Yencken: “Orthographic support for passing the reading hurdle in Japanese”, PhD Thesis, University of Melbourne (2010) [pdf]
  • Lars Yencken and Timothy Baldwin: “Measuring and predicting orthographic associations: modelling the similarity of Japanese kanji”, in Proceedings of COLING 2008, Manchester, UK (2008) [pdf bib errata]
  • Lars Yencken and Timothy Baldwin: “Orthographic similarity search for dictionary lookup of Japanese words”, in Proceedings of ECAI 2008, Patras, Greece (2008) [pdf bib errata]
  • Lars Yencken, Zhihui Jin and Kumiko Tanaka-Ishii: “Pinyomi - Dictionary lookup via orthographic associations”, in Proceedings of PACLING 2007, Melbourne, Australia (2007) [pdf bib]
  • Zhihui Jin, Lars Yencken and Kumiko Tanaka-Ishii: “漢字対応に基づく日中辞書検索 (Japanese-Chinese dictionary lookup using ideogram transliteration)”, in Proceedings of NLP 2007, Otsu, Shiga, Japan (2007) [pdf]
  • Lars Yencken and Timothy Baldwin: “Modelling the orthographic neighbourhood for Japanese Kanji”, in Proceedings of ICCPOL 2006, Singapore (2006) [pdf bib]
  • Lars Yencken and Timothy Baldwin: “Efficient grapheme-phoneme alignment for Japanese”, in Proceedings of ALTW 2005, Sydney, Australia, pp. 143-151 (2005) [pdf bib]