Lars Yencken


I'm an engineer and former researcher in human languages, machine learning and web systems. I love making, discovering and writing.

Today I lead Data Science at Lifesum, where we help people to live happier and healthier lives, and try to understand what we need, how we behave and what drives us as humans.

I believe that a mathematical beauty underpins our world. I stand for warmth, reason, collaboration, openness and growth.


  • Quietly amused: musings on language, code and life.
  • Halvsvensk: a bilingual blog exploring Australian and Swedish culture.


  • Great Language Game: learn to distinguish between spoken languages.
  • Simsearch: an open source visual similarity search for Japanese kanji.
  • FOKS: an intelligent dictionary for learners of Japanese.
  • Kanji Tester: a study tool for JLPT levels 3 and 4 centred around adaptive testing.

Open source

  • csvdiff: compare CSV files for differences.
  • marelle: test-driven sysadmin through logic programming.
  • colorific: library for detecting significant color in designs.
  • anytop: an ncurses frequency visualisation from streaming input.
  • doko: a command-line tool for determining your current location.
  • cjktools: a Python library for working with Japanese and Chinese dictionaries.

For a fuller list, check GitHub or Bitbucket.

Data sets


  • Tara McIntosh, Lars Yencken, Timothy Baldwin and James Curran: “Relation guided bootstrapping of semantic lexicons”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR (2011) [pdf]
  • Su Nam Kim, David Martinez, Lawrence Cavedon and Lars Yencken: “Automatic classification of sentences to support evidence based medicine”, BMC Bioinformatics, 12:S2 (2011) [pdf]
  • Lars Yencken and Timothy Baldwin: “Predicting and compensating for lexicon access errors”, Proceedings of the 2011 International Conference on Intelligent User Interfaces, Palo Alto, CA (2011) [pdf]
  • Lars Yencken: “Orthographic support for passing the reading hurdle in Japanese”, PhD Thesis, University of Melbourne (2010) [pdf]
  • Lars Yencken and Timothy Baldwin: “Measuring and predicting orthographic associations: modelling the similarity of Japanese kanji”, in Proceedings of COLING 2008, Manchester, UK (2008) [pdf bib errata]
  • Lars Yencken and Timothy Baldwin: “Orthographic similarity search for dictionary lookup of Japanese words”, in Proceedings of ECAI 2008, Patras, Greece (2008) [pdf bib errata]
  • Lars Yencken, Zhihui Jin and Kumiko Tanaka-Ishii: “Pinyomi - Dictionary lookup via orthographic associations”, in Proceedings of PACLING 2007, Melbourne, Australia (2007) [pdf bib]
  • Zhihui Jin, Lars Yencken and Kumiko Tanaka-Ishii: “漢字対応に基づく日中辞書検索 (Japanese-Chinese dictionary lookup using ideogram transliteration)”, in Proceedings of NLP 2007, Otsu, Shiga, Japan (2007) [pdf]
  • Lars Yencken and Timothy Baldwin: “Modelling the orthographic neighbourhood for Japanese Kanji”, in Proceedings of ICCPOL 2006, Singapore (2006) [pdf bib]
  • Lars Yencken and Timothy Baldwin: “Efficient grapheme-phoneme alignment for Japanese”, in Proceedings of ALTW 2005, Sydney, Australia, pp. 143-151 (2005) [pdf bib]