Lars Yencken


I'm an engineer and former researcher in human languages, machine learning and web systems. I love making, discovering and writing.

For my PhD, I worked on tools for learning Japanese and Chinese that understand learners' mistakes better. At NICTA, I analysed medical texts using machine learning, sparking my interest in larger datasets. I joined 99designs to learn how to serve larger audiences, and stayed to delve into the rich data that comes with customer interaction.

I believe that a mathematical beauty underpins our world. I stand for kindness, reason, collaboration, openness and growth.


  • Quietly amused: musings on language, code and life.
  • Halvsvensk: a bilingual blog exploring Australian and Swedish culture.


  • Great Language Game: learn to distinguish between spoken languages.
  • Simsearch: an open source visual similarity search for Japanese kanji.
  • FOKS: an intelligent dictionary for learners of Japanese.
  • Kanji Tester: a study tool for JLPT levels 3 and 4 centred around adaptive testing.

Open source

  • csvdiff: compare CSV files for differences.
  • marelle: test-driven sysadmin through logic programming.
  • colorific: library for detecting significant color in designs.
  • anytop: an ncurses frequency visualisation from streaming input.
  • doko: a command-line tool for determining your current location.
  • cjktools: a Python library for working with Japanese and Chinese dictionaries.

For a fuller list, check GitHub or Bitbucket.

Data sets

Publications

  • Tara McIntosh, Lars Yencken, Timothy Baldwin and James Curran: “Relation guided bootstrapping of semantic lexicons”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR (2011) [pdf]
  • Su Nam Kim, David Martinez, Lawrence Cavedon and Lars Yencken: “Automatic classification of sentences to support evidence based medicine”, BMC Bioinformatics, 12:S2 (2011) [pdf]
  • Lars Yencken and Timothy Baldwin: “Predicting and compensating for lexicon access errors”, Proceedings of the 2011 International Conference on Intelligent User Interfaces, Palo Alto, CA (2011) [pdf]
  • Lars Yencken: “Orthographic support for passing the reading hurdle in Japanese”, PhD Thesis, University of Melbourne (2010) [pdf]
  • Lars Yencken and Timothy Baldwin: “Measuring and predicting orthographic associations: modelling the similarity of Japanese kanji”, in Proceedings of COLING 2008, Manchester, UK (2008) [pdf bib errata]
  • Lars Yencken and Timothy Baldwin: “Orthographic similarity search for dictionary lookup of Japanese words”, in Proceedings of ECAI 2008, Patras, Greece (2008) [pdf bib errata]
  • Lars Yencken, Zhihui Jin and Kumiko Tanaka-Ishii: “Pinyomi - Dictionary lookup via orthographic associations”, in Proceedings of PACLING 2007, Melbourne, Australia (2007) [pdf bib]
  • Zhihui Jin, Lars Yencken and Kumiko Tanaka-Ishii: “漢字対応に基づく日中辞書検索 (Japanese-Chinese dictionary lookup using ideogram transliteration)”, in Proceedings of NLP 2007, Otsu, Shiga, Japan (2007) [pdf]
  • Lars Yencken and Timothy Baldwin: “Modelling the orthographic neighbourhood for Japanese Kanji”, in Proceedings of ICCPOL 2006, Singapore (2006) [pdf bib]
  • Lars Yencken and Timothy Baldwin: “Efficient grapheme-phoneme alignment for Japanese”, in Proceedings of ALTW 2005, Sydney, Australia, pp. 143-151 (2005) [pdf bib]