Lars Yencken


I'm an engineering leader and data scientist based in Geneva, Switzerland. I love making, discovering and doing work with impact.

Today I lead the tech team at Our World In Data, where we use data and evidence to shine a light on human progress and its gaps, striving always to improve the quality of public debate and public policy.

I believe that a mathematical beauty underpins our world, including the fractal complexity of human affairs. I stand for warmth, reason, collaboration, openness and growth.


  • Great Language Game: learn to distinguish between spoken languages
  • Simsearch: an open source visual similarity search for Japanese kanji
  • Kanji Tester: a study tool for JLPT levels 3 and 4 centred around adaptive testing

Open source

  • csvdiff: compare CSV files for differences
  • marelle: test-driven sysadmin through logic programming
  • colorific: library for detecting significant color in designs
  • anytop: an ncurses frequency visualisation from streaming input
  • doko: a command-line tool for determining your current location
  • cjktools: a Python library for working with Japanese and Chinese dictionaries

For a fuller list, check GitHub.

Data sets


  • Hedvig Skirgård, Seán G. Roberts and Lars Yencken: “Why are some languages confused for others? Investigating data from the Great Language Game”, PLoS ONE, 12(4) (2017) [pdf]
  • Tara McIntosh, Lars Yencken, Timothy Baldwin and James Curran: “Relation guided bootstrapping of semantic lexicons”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR (2011) [pdf]
  • Su Nam Kim, David Martinez, Lawrence Cavedon and Lars Yencken: “Automatic classification of sentences to support evidence based medicine”, BMC Bioinformatics, 12:S2 (2011) [pdf]
  • Lars Yencken and Timothy Baldwin: “Predicting and compensating for lexicon access errors”, Proceedings of the 2011 International Conference on Intelligent User Interfaces, Palo Alto, CA (2011) [pdf]
  • Lars Yencken: “Orthographic support for passing the reading hurdle in Japanese”, PhD Thesis, University of Melbourne (2010) [pdf]
  • Lars Yencken and Timothy Baldwin: “Measuring and predicting orthographic associations: modelling the similarity of Japanese kanji”, in Proceedings of COLING 2008, Manchester, UK (2008) [pdf bib errata]
  • Lars Yencken and Timothy Baldwin: “Orthographic similarity search for dictionary lookup of Japanese words”, in Proceedings of ECAI 2008, Patras, Greece (2008) [pdf bib errata]
  • Lars Yencken, Zhihui Jin and Kumiko Tanaka-Ishii: “Pinyomi - Dictionary lookup via orthographic associations”, in Proceedings of PACLING 2007, Melbourne, Australia (2007) [pdf bib]
  • Zhihui Jin, Lars Yencken and Kumiko Tanaka-Ishii: “漢字対応に基づく日中辞書検索 (Japanese-Chinese dictionary lookup using ideogram transliteration)”, in Proceedings of NLP 2007, Otsu, Shiga, Japan (2007) [pdf]
  • Lars Yencken and Timothy Baldwin: “Modelling the orthographic neighbourhood for Japanese Kanji”, in Proceedings of ICCPOL 2006, Singapore (2006) [pdf bib]
  • Lars Yencken and Timothy Baldwin: “Efficient grapheme-phoneme alignment for Japanese”, in Proceedings of ALTW 2005, Sydney, Australia, pp. 143-151 (2005) [pdf bib]