I explored memory models for spaced repetition in my master's thesis and later built an SRS product. This post shares my thoughts on content-aware memory models.
I believe this technical shift in how SRS models the student's memory won't just improve scheduling accuracy but, more critically, will unlock better product UX and new types of SRS.
I've been playing with something similar, but far less thought out than what you have.
I have a script for it, but am basically waiting until I can run a powerful enough LLM locally to chug through it with good results.
Basically like the knowledge tree you mention towards the end, but attempting to build a knowledge DAG by asking an LLM "does card (A) imply knowledge of card (B), or vice versa?" Then I take that DAG and use it to schedule the cards in a breadth-first ordering. So when reviewing a new deck with a lot of new cards, I'll be sure to get questions like "what was the primary cause of the Civil War?" before questions like "who was the Confederate general who fought at Bull Run?"
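A minimal sketch of that scheduling step, assuming the per-pair LLM answers have already been collapsed into a boolean `prereq(a, b)` ("should card a be learned before card b?") -- the function name and edge direction are my own framing:

```python
from collections import defaultdict, deque

def prerequisite_order(cards, prereq):
    """Order cards so prerequisites come first, level by level (BFS).

    `cards` is a list of card ids; `prereq(a, b)` stands in for the LLM
    query and returns True if a should be learned before b. An edge
    a -> b means a is a prerequisite of b.
    """
    edges = defaultdict(set)
    indegree = {c: 0 for c in cards}
    for a in cards:
        for b in cards:
            if a != b and prereq(a, b) and b not in edges[a]:
                edges[a].add(b)
                indegree[b] += 1

    # Kahn's algorithm: repeatedly emit cards with no unmet prerequisites.
    queue = deque(c for c in cards if indegree[c] == 0)
    order = []
    while queue:
        card = queue.popleft()
        order.append(card)
        for nxt in edges[card]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order
```

This assumes the pairwise answers actually form a DAG; in practice the LLM will occasionally produce cycles, which need to be detected and broken before scheduling.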
What I like about your approach is that it circumvents the data problem. You don't need a dataset with review histories and flashcard content in order to train a model.
I've got a system for learning languages that does some of the things you mention. The goal is to recommend content for a user to read that combines 1) an appropriate level of difficulty with 2) usefulness for learning. The idea is to have the SRS built into the system, so you just sit and read what it gives you, and reviewing old words and learning new words (chosen by frequency) happens automatically.
Separating the recall model from the teaching model as you say opens up loads of possibilities.
Brief introduction:
1. Identify "language building blocks" for a language; this includes not just pure vocabulary, but the grammar concepts, inflected forms of words, and can even include graphemes and what-not.
2. For each building block, assign a value -- normally this is the frequency of the building block within the corpus.
3. Get a corpus of selections to study. Tag them with the language building blocks. This is similar to Math Academy's approach, but while they have hundreds of math concepts, I have tens of thousands of building blocks.
4. Use a model to estimate the current difficulty of each building block. (I'm using "difficulty" here as the inverse of "retrievability", for reasons that will be clear later.)
5. Estimate the delta in difficulty of each building block after it is viewed. Multiply this delta by the block's value to get its study value.
6. For each selection, calculate the total difficulty, average difficulty, and total study value. (This is why I use "difficulty" rather than "retrievability": it lets me calculate the total cognitive load of a selection.)
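The study-value steps above could be sketched like this. All the numbers, the block names, and the "halve the difficulty on viewing" update rule are made-up assumptions for illustration, not the actual model:

```python
# Hypothetical data: difficulty in [0, 1], value = corpus frequency.
blocks = {
    "logos":  {"difficulty": 0.2, "value": 300},  # common, well-known
    "agape":  {"difficulty": 0.7, "value": 120},
    "kardia": {"difficulty": 0.9, "value": 40},   # rare, nearly unknown
}

def difficulty_after_view(d, learning_rate=0.5):
    """Assumed update rule: viewing a block halves its difficulty."""
    return d * (1 - learning_rate)

def study_value(block):
    """Expected drop in difficulty, weighted by the block's value."""
    delta = block["difficulty"] - difficulty_after_view(block["difficulty"])
    return delta * block["value"]

def selection_stats(selection, blocks):
    """Aggregate difficulty and study value over one selection (a text)."""
    ds = [blocks[b]["difficulty"] for b in selection]
    return {
        "total_difficulty": sum(ds),             # total cognitive load
        "average_difficulty": sum(ds) / len(ds),
        "total_study_value": sum(study_value(blocks[b]) for b in selection),
    }
```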
Now the teaching algorithm has a lot of options. It can calculate a selection score that balances study value, difficulty, and repetitiveness. It can take the word with the highest study value and look for selections containing that word. Or it can take a specific selection that you want to read or listen to, find the most important word in it, and then look for other material that reinforces that word.
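A toy version of such a selection score -- the linear form, the weights, and the target difficulty are all my invention, just to show the kind of trade-off being balanced:

```python
def selection_score(total_study_value, average_difficulty,
                    target_difficulty=0.5,
                    value_weight=1.0, difficulty_weight=50.0):
    """Score a selection: reward study value, penalize being too far
    from a comfortable average difficulty. Weights are arbitrary."""
    comfort_penalty = abs(average_difficulty - target_difficulty)
    return value_weight * total_study_value - difficulty_weight * comfort_penalty
```

The teaching algorithm would then simply rank candidate selections by this score and serve the top one; a repetitiveness term could be added as another penalty on recently seen material.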
You mentioned computational complexity -- calculating all of this from scratch certainly takes a lot of work, but the key observation is that each time you study something, only a handful of values change. That makes it possible to keep everything up to date very efficiently using incremental computation [1].
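One way to picture that incremental update -- a hand-rolled sketch, not the general-purpose framework the [1] link presumably describes -- is an inverted index from building block to the selections containing it, so a difficulty change patches only the affected totals:

```python
from collections import defaultdict

class IncrementalTotals:
    """Keep per-selection total difficulty current as single blocks change."""

    def __init__(self, selections, difficulty):
        self.selections = selections        # selection id -> list of block ids
        self.difficulty = dict(difficulty)  # block id -> difficulty
        self.totals = {sid: sum(self.difficulty[b] for b in bs)
                       for sid, bs in selections.items()}
        self.index = defaultdict(list)      # block id -> selection ids
        for sid, bs in selections.items():
            for b in bs:
                self.index[b].append(sid)   # one entry per occurrence

    def update_difficulty(self, block, new_d):
        """O(#selections containing `block`), not O(#all selections)."""
        delta = new_d - self.difficulty[block]
        self.difficulty[block] = new_d
        for sid in self.index[block]:
            self.totals[sid] += delta       # applied once per occurrence
```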
I've got several active users without really having done any advertising; I'm working on revamping the UI and redesigning the website before I do a big push and start advertising. Most of the people using the site have learned Biblical Greek entirely through the system.
There are experimental ports to Korean and Japanese as well, but those (along with the Mandarin port) aren't public yet. The primary missing pieces are:
1. Content: the system relies on having large amounts of high-quality content. Finding it, tagging it, and dealing with copyright will take some time.
2. On-ramp: it works best at helping intermediate learners advance, but if you start at an intermediate level, the system doesn't yet know what you know.
Another thread I'm pursuing is exposing the algorithm via an API to other language-learning apps.