Antoine's blog


Why and how to sentence mine with Migaku (to Anki)

This is a niche article, but if you're learning a language and want to do some sentence mining, you are in the right place. I'll go over what sentence mining is, how to do it with Migaku (no affiliation or sponsorship, by the way) and how it syncs to Anki. Let's get started!


What is the point of sentence mining?

When using Anki (or any other SRS), you need to add cards to learn new words. The default solution is to take a prebuilt deck and learn its cards. This approach is the fastest, as you can get started right away. It works well to memorize the most basic words and structures of a language, but not so much afterwards.

The problem is twofold:

  1. You learn words in the wrong order. Vocabulary is very context-dependent, and to start consuming content you enjoy as quickly as possible, you should focus on the words that content actually uses. A pre-made deck is a lot less time-efficient in that regard. To me, the main risk is that you will lose momentum and stop learning the language.

    For 99% of people, language learning is all about motivation. After all, if you already speak English, you likely do not really need to learn a new language and you will be doing it as a hobby. This means that giving up is the biggest risk. My personal approach here is to get as quickly as possible to a point where I can consume content I enjoy, without giving up in the process. Sentence mining is perfect for this: you get to that point quicker, and even if you aren't really comfortable with the content you enjoy yet, you can use it as a way to improve. That's a lot nicer than using textbooks!

  2. Words are learned out of context. When you sentence mine, you might extract a word from a TV show you enjoy, along with the associated audio and picture. When reviewing this card, all this context will help your brain burn the word into your memory. It also makes the reviews more interesting as, in a way, you are re-consuming content. Pre-made decks might have audio and pictures, but they mean nothing to you.

Additionally, when reviewing your cards daily, you will be replaying the audio for every single sentence. Even if you only add 5 cards daily (you might want to do more, I'm usually at 10), that comes to around 1 hour of listening weekly in your target language. Over time, that adds up to a lot of practice!
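
The 1-hour figure depends on how many daily reviews those new cards turn into and how long each audio clip runs, neither of which is stated above. As a rough sanity check, here is a small Python sketch. The review count and clip length are assumed values chosen for illustration, not measurements.

```python
def weekly_listening_minutes(reviews_per_day: int, seconds_per_review: float) -> float:
    """Estimate minutes of target-language audio heard per week during reviews."""
    seconds_per_week = reviews_per_day * seconds_per_review * 7
    return seconds_per_week / 60


# Assumed values (not from the article): adding 5 new cards a day eventually
# settles at roughly 50 reviews a day, and each review plays about 10 seconds
# of audio (sentence clip, word audio, and the occasional replay).
estimate = weekly_listening_minutes(reviews_per_day=50, seconds_per_review=10)
print(f"{estimate:.0f} minutes per week")
```

Under these assumptions the estimate lands just under an hour. Plug in your own review count and clip length to see how it changes for your deck.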


Technically, how does one sentence mine?

Sentence mining is about creating cards that keep a word together with the context it came from. Such a card includes:
  • The sentence audio (extracted from the video)
  • The word audio
  • The written sentence
  • The translation for the word
  • A screenshot of when it was said

Having all of that information is great for retention, but how can you technically extract it?
You could use some scripting but it is a lot of work to set up correctly, which is why I would instead recommend a tool like Migaku.
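
To give an idea of what the scripting route involves, here is a minimal Python sketch of its last step: sending an already-extracted clip and screenshot to Anki. It assumes the AnkiConnect add-on is installed and Anki is running. The note type and the field names ("Word", "Sentence", "Translation", "Sentence Audio", "Screenshot") are hypothetical and would need to match your own setup. This is not how Migaku does it internally, just one way a home-made script could work. The word audio is left out to keep the example short.

```python
import json
import os
import urllib.request

# AnkiConnect listens on this address by default.
ANKI_CONNECT_URL = "http://127.0.0.1:8765"


def build_add_note_request(deck, note_type, word, sentence, translation,
                           audio_path, screenshot_path):
    """Build the JSON body for AnkiConnect's "addNote" action.

    The field names used here are hypothetical and must match the fields
    of your own note type.
    """
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": note_type,
                "fields": {
                    "Word": word,
                    "Sentence": sentence,
                    "Translation": translation,
                },
                # Each media entry names a local file and the field(s)
                # that should reference it on the card.
                "audio": [{
                    "path": audio_path,
                    "filename": os.path.basename(audio_path),
                    "fields": ["Sentence Audio"],
                }],
                "picture": [{
                    "path": screenshot_path,
                    "filename": os.path.basename(screenshot_path),
                    "fields": ["Screenshot"],
                }],
            }
        },
    }


def send_to_anki(request_body):
    """POST the request to AnkiConnect and return the id of the new note."""
    data = json.dumps(request_body).encode("utf-8")
    http_request = urllib.request.Request(
        ANKI_CONNECT_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(http_request) as response:
        reply = json.load(response)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]
```

Calling `send_to_anki(build_add_note_request(...))` would create one card. The hard part is everything before that: cutting the audio at the right subtitle timestamps, grabbing the screenshot, and looking up the word, which is exactly the work a tool like Migaku takes off your hands.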


Using Migaku

Migaku primarily works through a Chrome extension. When you watch something on YouTube, Netflix, Disney+, etc., Migaku enhances the subtitles to make mining easier. It keeps track of which words you know or are in the process of learning. When hovering over a word, you will see a lot of information, including an AI explanation that uses the context of the sentence, which is super useful in practice.

In the following example, you'll notice that in the subtitles, the word entführten is highlighted with 3 stars. This is because Migaku knows both that I haven't tracked this word yet and that it's quite common and might be worth saving. This comes in handy at the start of your learning journey, when you might want to prioritize the most frequently used words.

Subtitle interface and definition overview

In one click, you can send the word to the card creator:

The card creator where you can polish your card before sending it over to Anki

This highlights a real problem with Migaku. If you look at the definition, you will see that entführten is identified as a conjugated form of the verb entführen... but there isn't any definition. It's not a huge deal, as I personally use an AI command with Raycast to automatically generate a translation and add it to the definition (which takes only a few seconds).

Once you click on Create card, if you have set up the Anki connection, it will send it over and you are done!


Should you use Anki or Migaku's own SRS?

Since Migaku also offers an SRS, wouldn't it be more streamlined to use it directly? Well, maybe, but in my opinion there are some huge reasons to use Anki instead:

  • Owning your data: At the moment it isn't possible to extract cards or reviews from Migaku. This means that if you ever want to leave... you're stuck. Considering all the hours you might invest in sentence mining, to me it is a deal breaker.
  • FSRS: Anki supports FSRS (you enable it in the deck options), which is one of the most modern scheduling algorithms for flashcard apps. Migaku uses an inferior algorithm. In practice, I do not know how much of a difference it makes. Are we talking 5% less time spent on reviews or 25%? In any case, I like my free time so Anki wins.
  • Full customization: Anki has a learning curve but you can do whatever you want with it. You can customize the looks of things, what is displayed, the learning order, etc.

If they address the data question, then it might be worth using their system to keep things simple. But for the time being, I can't recommend it.

Nevertheless, even if you only use Migaku to create cards, I think it's an awesome tool well worth the price.


A note on subtitles

This entire process assumes you have quality subtitles. In practice, that's not always the case.
On YouTube, subtitles are often generated automatically. The technology has gotten a lot better, especially over the last few months. It's still imperfect, but I find that (at least in Spanish or German) newer videos are good enough. However, YouTube hasn't reprocessed older videos, and I find it's not worthwhile extracting from those. It's unlikely YouTube will ever fix this as it would require massive amounts of compute for a very small payoff.

On streaming platforms the situation is a bit more frustrating and complex. If you listen to dubbed audio, the subtitles often won't match it unless they are closed captions, meaning subtitles written to help people with hearing difficulties. Even closed captions might not be perfect because, from what I understand, the people writing them will sometimes skip or rephrase sentences to stay below a certain threshold of characters per second. This is to ensure they can be read in time, which is fair but might mess up a few sentences.

The main challenge is that closed captions are rarely available for dubbed content. The solution is therefore to find content originally produced in your target language.


Should you sentence mine or word mine?

The common advice online is to extract sentences not words. The reasoning is that words often have multiple meanings. For example, you can go for a run or run a company. Therefore it only makes sense to learn them in the correct context.

I personally have a more nuanced take. My problem with focusing on sentences is that it makes it a lot harder to remember the translation. To go back to the previous example, what I would do is learn one of the two translations. Which one? Well, the first one that came up, as statistically it's likely the most commonly used one. Then later on, when I encounter another meaning that I'm not familiar with, I will create a new card with a bit more context to help with this new meaning. This approach takes more effort when learning new cards but speeds up learning tremendously.

Hopefully this article helps you get started mining! Feel free to send me a message if you struggle setting things up.

#anki #language-learning
