What is this?
The goal of the project is to help you practice listening comprehension.
It works by giving you random sentences in the language you're learning and asking you to fill in the gaps. The sentences were submitted by contributors to the Mozilla Common Voice platform.
The project aims to not require any knowledge of a meta language in order to start learning. If you are interested in a more traditional course creation project, check out LibreLingo.
The game works by ordering the questions by difficulty; you are then given batches of five, with a random task for each question. When you successfully answer a batch of five in less time than the audio takes to play, you advance a level and are given a new batch of five.
- Fill in the blanks: A cloze-style task
- Drag and drop: Get a set of tiles and click on them to build a word or sentence
- Pick the right one: Get two options and choose the right one
- Spot the word: Get a set of six tiles and click on the ones that appear in the audio
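The batching and levelling rules described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `Question` fields, the task identifiers and the function names are all hypothetical, and the sketch assumes that "less time than the audio takes to play" compares the total answer time with the total audio duration of the batch.

```python
import random
from dataclasses import dataclass

# Hypothetical identifiers for the four task types listed above.
TASKS = ["fill_in_the_blanks", "drag_and_drop", "pick_the_right_one", "spot_the_word"]
BATCH_SIZE = 5


@dataclass
class Question:
    sentence: str
    difficulty: float     # assumed score: lower means easier
    audio_seconds: float  # duration of the recording


def make_batches(questions, batch_size=BATCH_SIZE):
    """Order questions by difficulty and split them into batches."""
    ordered = sorted(questions, key=lambda q: q.difficulty)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def assign_tasks(batch, rng=random):
    """Pair every question in a batch with a randomly chosen task."""
    return [(question, rng.choice(TASKS)) for question in batch]


def advances(batch, all_correct, answer_seconds):
    """Advance a level only if every answer was right and the batch was
    completed faster than its audio takes to play (assumed: totals)."""
    audio_total = sum(q.audio_seconds for q in batch)
    return all_correct and answer_seconds < audio_total
```

With ten questions, `make_batches` yields two batches of five, easiest first. A player who gets a batch right in less than its total audio time moves on to the next batch.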
Space: Play the recording
- Submit and check if you got it right
- If already submitted, move to the next recording
The data comes from the Common Voice dataset releases.
This system is designed with two main user groups in mind:
- People who want to learn a new language
- People who want to learn how to write their native language
The system endeavours to be audio first, with knowledge of writing built up by hearing.
Talk to us!
- Matrix: #OmniLingo:matrix.org (access via Element)
- Telegram: OmniLingo
All of the languages available in the Common Voice 6.1 dataset:
Abkhaz · Arabic · Assamese · Breton · Catalan · Hakha Chin · Czech · Chuvash · Welsh · German · Dhivehi · Greek · English · Esperanto · Spanish · Estonian · Basque · Persian · Finnish · French · Frisian · Irish · Hindi · Upper Sorbian · Hungarian · Interlingua · Indonesian · Italian · Japanese · Georgian · Kabyle · Kyrgyz · Luganda · Lithuanian · Latvian · Mongolian · Maltese · Dutch · Odia · Punjabi · Polish · Portuguese · Romansh Sursilvan · Romansh Vallader · Romanian · Russian · Kinyarwanda · Sakha · Slovenian · Swedish · Tamil · Thai · Turkish · Tatar · Ukrainian · Vietnamese · Votic · Chinese (China) · Chinese (Hong Kong) · Chinese (Taiwan)
If you want to work with a language not yet in Common Voice, we highly recommend that you get set up in Common Voice, but in the meantime, you can check out the format guidelines.
To bootstrap the project for Finnish, git clone the repository, then run the following commands:

    pip install poetry
    pip install -r requirements.txt
    make
    poetry install
    poetry run omnilingo serve
The project should be accessible through http://localhost:5001/index.html
To add more languages, download a dataset from Common Voice and put it in
Happy hacking! :)
For those who prefer to install their dependencies through their package manager in Debian/Ubuntu, the following dependencies are available there:
- python3-mutagen - audio metadata editing library (Python 3)
- python3-jieba - Jieba Chinese text segmenter (Python 3)
- python3-flask - micro web framework based on Werkzeug and Jinja2 - Python 3.x
Logo by Fabi Yamada! Licensed under CC-BY.
Remove Japanese tokenizer - it's crashing.
We might need to use a different tokenizer library. Here's the crash report:
Tokenisers multi issue
Here is a list of languages that might require specific tokenisation for a first release:
Click when checked and implemented. Each tokeniser should have a test set.
Issue #11 fix?
The app isn't running properly on my system (the page isn't updating when I reload the server), but surely this fixes the issue? Sorry if this doesn't help at all, but surely the onBlur event means that the input is checked every time the input element loses focus, such as when the audio is played? I don't know, I tried lol. Also, I added the encoding option because I was getting a UnicodeDecodeError otherwise. This might just be an issue for me, though. Possibly fixes #11.
Static Site Commands
The following are steps towards having an omnilingo command that can perform the steps for build, deploy, etc., and that is automatically tested in a production-like manner. These are subject to change, but this issue will be a tracking point for those changes as the pull requests that implement them continue to land.
Incorporation of Pictures
Add an option where an audio file is played and the user selects pictures of the things that are said in the audio stream. This could help users start to actually learn new words from the app.
As a first pass at gamification it would be good to have a timer that starts when the user presses play and stops when the user completes the task. At the end the interface can report the score for the task as a multiplier: user time / audio time, shown with an "x".
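A minimal sketch of that score, assuming it is simply the ratio of the time the user took to the length of the recording, displayed as a multiplier. The function names and the one-decimal formatting are illustrative choices, not part of the project.

```python
def task_score(user_seconds: float, audio_seconds: float) -> float:
    """Return user time divided by audio time (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return user_seconds / audio_seconds


def format_score(score: float) -> str:
    """Render the ratio as a multiplier with one decimal place."""
    return f"{score:.1f}x"


# A user who needs 6 seconds for a 4-second recording took 1.5 times the audio length.
print(format_score(task_score(6.0, 4.0)))
```

A score below 1 would mean the task was completed before the recording finished playing.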
Create a tokeniser for Italian
The Italian corpus has a lot of non-alphabetic characters in it:
This probably requires a bit of a different approach, e.g. stripping some of the characters, or filtering some of the sentences.
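Both ideas could look roughly like the sketch below. It is not a proposed implementation: the function names and the `max_ratio` threshold are made up for illustration, and treating every non-letter (including the apostrophe in elisions such as "l'amico") as a separator is a simplifying assumption that a real Italian tokeniser would probably want to revisit.

```python
def strip_non_alphabetic(sentence: str) -> list[str]:
    """Replace every non-letter with a space, then split into words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in sentence)
    return cleaned.split()


def non_alphabetic_ratio(sentence: str) -> float:
    """Share of non-whitespace characters that are not letters."""
    chars = [ch for ch in sentence if not ch.isspace()]
    if not chars:
        return 1.0
    return sum(not ch.isalpha() for ch in chars) / len(chars)


def keep_sentence(sentence: str, max_ratio: float = 0.2) -> bool:
    """Filter out sentences dominated by digits, symbols or punctuation.

    max_ratio is an arbitrary illustrative threshold.
    """
    return non_alphabetic_ratio(sentence) <= max_ratio
```

`str.isalpha` is Unicode-aware, so accented letters such as "é" are kept as part of the word.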
add tokenisers with different levels
we have words, but also useful would be, e.g.
Example: kinkʼowinik kimbʼek
k i n kʼ o w i n i k k i m bʼ e k
kin kʼow in ik kim bʼek
k in kʼow in ik k im bʼe k
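The first segmentation in the example appears to split each word into characters while keeping the modifier apostrophe attached to the consonant before it. A minimal sketch of that one level follows, assuming the apostrophe is U+02BC; the function names are hypothetical, and the other two levels are not attempted here because they need language-specific knowledge.

```python
MODIFIER_APOSTROPHE = "\u02bc"  # assumed code point of the apostrophe in the example


def character_tokens(word: str) -> list[str]:
    """Split a word into characters, gluing the modifier apostrophe
    onto the preceding character (e.g. k + apostrophe is one token)."""
    tokens: list[str] = []
    for ch in word:
        if ch == MODIFIER_APOSTROPHE and tokens:
            tokens[-1] += ch
        else:
            tokens.append(ch)
    return tokens


def tokenise_characters(sentence: str) -> list[list[str]]:
    """Apply character-level tokenisation to every word in a sentence."""
    return [character_tokens(word) for word in sentence.split()]


for word_tokens in tokenise_characters("kink\u02bcowinik kimb\u02bcek"):
    print(" ".join(word_tokens))
```

Each tokeniser level could expose the same interface (sentence in, list of token lists out), which would also make it straightforward to give every level its own test set.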
Create an announcement board
Modal for settings
Initially two settings:
other apps often have an option to slow down the audio, this seems to work by people putting pauses in between the words. it would be cool to support something like this, but chopping the audio is hard. some other ideas:
Sometimes spacebar is needed for the answer, but its use reloads the page