Wide Language Index
The Wide Language Index is an audio catalog of broadcasts and podcasts in 102 languages. It is designed to be “wide”, covering a large variety of languages, but “shallow”, containing only 5-20 samples of each language. The catalog was created to serve as the base for the Great Language Game, but is now a standalone dataset that can be reused for other purposes.
The catalog is available on GitHub at: https://github.com/larsyencken/wide-language-index
You can clone it and enter the repository with:
    git clone https://github.com/larsyencken/wide-language-index
    cd wide-language-index
You will then find the catalog in the index/ folder. To download the audio samples that match the catalog entries, run: