Lars Yencken


Profile

I am an engineer and researcher in text, machine learning and web systems. My background is in software engineering, mathematics and natural language processing, but I have broad interests in artificial intelligence, first and second language acquisition, information retrieval, agile web development and rationality.

My love of languages led to my research in user modelling for language learners, which aimed to give learners of Japanese and Chinese better tools for their study. I followed text mining applications to NICTA, where analysing medical text corpora piqued my interest in machine reading and big data. I then joined the 99designs team to explore larger-scale audiences and the rich data associated with customer interactions.

I like building systems for real use and finding the structure and beauty in large data sets. These are the things that drive me to explore new and interesting projects.

Projects

  • marelle: test-driven sysadmin through logic programming
  • doko: a command-line tool for determining your current location
  • colorific: library for detecting significant color in designs
  • anytop: an ncurses frequency visualisation from streaming input
  • simsearch: an open source visual similarity search for Japanese kanji
  • foks: an intelligent dictionary for learners of Japanese
  • kanji tester: a study tool for JLPT levels 3 and 4 centred around adaptive testing
  • cjktools: a Python library for working with Japanese and Chinese dictionaries

Some of my work is open source and available on Bitbucket.

Data sets

A number of data sets related to my PhD work are available for download.

Publications

  • Tara McIntosh, Lars Yencken, Timothy Baldwin and James Curran: “Relation guided bootstrapping of semantic lexicons”, to appear in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR (2011) [pdf]
  • Su Nam Kim, David Martinez, Lawrence Cavedon and Lars Yencken: “Automatic classification of sentences to support evidence based medicine”, BMC Bioinformatics, 12:S2 (2011) [pdf]
  • Lars Yencken and Timothy Baldwin: “Predicting and compensating for lexicon access errors”, Proceedings of the 2011 International Conference on Intelligent User Interfaces, Palo Alto, CA (2011) [pdf]
  • Lars Yencken: “Orthographic support for passing the reading hurdle in Japanese”, PhD Thesis, University of Melbourne (2010) [pdf]
  • Lars Yencken and Timothy Baldwin: “Measuring and predicting orthographic associations: modelling the similarity of Japanese kanji”, in Proceedings of COLING 2008, Manchester, UK (2008) [pdf bib errata]
  • Lars Yencken and Timothy Baldwin: “Orthographic similarity search for dictionary lookup of Japanese words”, in Proceedings of ECAI 2008, Patras, Greece (2008) [pdf bib errata]
  • Lars Yencken, Zhihui Jin and Kumiko Tanaka-Ishii: “Pinyomi - Dictionary lookup via orthographic associations”, in Proceedings of PACLING 2007, Melbourne, Australia (2007) [pdf bib]
  • Zhihui Jin, Lars Yencken and Kumiko Tanaka-Ishii: “漢字対応に基づく日中辞書検索 (Japanese-Chinese dictionary lookup using ideogram transliteration)”, in Proceedings of NLP 2007, Otsu, Shiga, Japan (2007) [pdf]
  • Lars Yencken and Timothy Baldwin: “Modelling the orthographic neighbourhood for Japanese Kanji”, in Proceedings of ICCPOL 2006, Singapore (2006) [pdf bib]
  • Lars Yencken and Timothy Baldwin: “Efficient grapheme-phoneme alignment for Japanese”, in Proceedings of ALTW 2005, Sydney, Australia, pp. 143-151 (2005) [pdf bib]