
This is an amazing vision. I want my browser to remind me if I lose focus, and to analyze and show me what I've been doing so I can learn from myself. Self-reflection is powerful here.


LLMs should be used to REFLECT cognitive states while writing, not to generate text. Reflecting thought patterns would be a mode in which the writer deepens their understanding while writing essays and gains better decision-making and coherence, as the LLM assesses the text and suggests where the thinking could be refined. That would help counter the accumulation of cognitive debt and increase cognitive breadth and depth.

Cogilo (https://cogilo.me/) was built for exactly this purpose over the last few weeks, so this paper comes at a very welcome time. Cogilo is a Google Docs add-on (https://workspace.google.com/marketplace/app/cogilo/31975274...) that detects thinking patterns in essays. It operates on a semantic level and tries to reveal the writer's cognitive state and the thinking present in the text - back to the writer themselves - thereby helping them deepen both their thinking and the essay.

Ultimately, I think that in 300 years, looking back at the effect and power AI had on humanity, we will see that it was built by us, and existed, to reflect human intelligence. That, I think, is where the power of LLMs will be greatest for us.


The entire production-ready platform took us 3-4 weeks to build, including figuring out RL and GPU infrastructure. If you want to learn more about RL, check out Huggingface. You can also hop on Augento https://augento.ai and join our Slack community - we'll answer and discuss any questions there, together with others. You get $250 worth of free credits you can use to tinker with RL right away - it'll teach you a few things.


That’s impressive. I have an extremely long chat with Claude from about a month ago discussing an idea very similar to this. Obviously an idea is worth next to nothing compared to what you and the team have created here, but it’s becoming a genuine obsession of mine. Will Brown’s recent talk on RL ignited this even further, given what he explained.

I’ll jump in this weekend.

Part of me wishes I did CS instead of learning SWE. There’s so much to uncover in RL and jumping straight in at the top feels like the wrong strategy to learn effectively.

I love the idea, love the platform. I’ll be keeping a close eye on how you guys go.

If you need a Technical Product Manager, let me know! I’m currently an Artificial Intelligence Lead at a hardware-enabled SaaS company but genuinely believe RL and agents will be the next step towards AGI.


Hi everyone, we removed the need to connect a subscription when you Import a Provider. You wouldn't have had to pay anyway - but now you can just go ahead and start ingesting data into Augento without any friction. And we remain happy to hear and respond to your feedback!


Thanks for this very lucid post! For many use cases, such as coding and formatting, it's very clear to users how to define the reward function. For more intricate ones, you're right that it can be tricky. I like your ideas of providing tools to help here and offering recurring reward functions as templates that only need slight adaptation. It will still be the user defining it, but there's a path to simplification. The operational friction of getting GPUs, optimizing compute, and preparing the training is the hard part of RL, so we got those things out of the way. Thanks for the very thoughtful suggestions and for reaching out - great input!
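
To make the template idea concrete, here is a minimal sketch of a reusable reward function for a formatting use case. The (prompt, completion) -> float signature and the function name are assumptions for illustration, not Augento's actual interface:

    import json

    # Hypothetical reward-function template; signature and name are
    # illustrative, not Augento's actual API.
    def json_format_reward(prompt: str, completion: str) -> float:
        """Return 1.0 if the model's completion is valid JSON, else 0.0."""
        try:
            json.loads(completion)
            return 1.0
        except json.JSONDecodeError:
            return 0.0

Adapting such a template would usually mean swapping only the check (valid JSON, passing tests, matching a schema) while keeping the scoring shape the same.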


That's true, thanks for the feedback! In the end it wasn't boredom but the long hours - we put too much energy into the platform ;) Taking it to heart for the next one!


No, DPO avoids a reinforcement learning training loop. For the current iteration on verifiable domains, our method is GRPO. Let me elaborate: DPO is for preference learning - each sample in the dataset contains two pieces, a preferred and a non-preferred response (what the model should avoid generating), and DPO optimizes for the preferred response of the two. That makes DPO an effective method for teaching a model sentiment or preference. We call a generalization of this "alignment mode" - it's on our roadmap.

On the current GRPO implementation side, dataset needs on Augento are simpler: just the prompt, plus some captured context if you like - the reward function then scores the model's generations. With GRPO, training is done on verifiable domains: not preference pairs, but single outputs judged by a deterministic reward function or by a reward model (the user decides, through how they define the reward function). See the sketch below for how the two dataset formats differ.

(EDIT: Would you use DPO? Do you have experience with it, or a need for it?)
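
For illustration, here is roughly how the two dataset shapes differ. The field names are hypothetical, chosen just to show the structure, not Augento's actual schema:

    # Hypothetical sample shapes - field names are illustrative only.

    # DPO / preference learning: each sample carries a preferred and a
    # non-preferred response; training pushes the model toward the former.
    dpo_sample = {
        "prompt": "Summarize this support ticket.",
        "chosen": "The customer reports a failed payment and asks for a refund.",
        "rejected": "Payment stuff happened, idk.",
    }

    # GRPO on a verifiable domain: only the prompt (plus optional captured
    # context) is stored; a reward function scores the model's generations
    # at training time.
    grpo_sample = {
        "prompt": "Extract the invoice total as JSON.",
        "context": {"captured_from": "production run"},
    }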


To add, there is an important distinction between RLHF (Reinforcement Learning from Human Feedback) and RL. DPO is a simpler and more efficient way to do RLHF. In its current iteration, Augento does RL (in the term coined by OpenAI: Reinforcement Fine-Tuning), which improves model performance on domains where a verification function exists for the answer that you can use for scoring, rather than the preferred answers that DPO needs. But as said, such a preference mode is on the roadmap.
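
As a toy example of a verifiable domain: if each prompt is a math word problem with a known numeric answer, a deterministic verifier can score a completion without any preference pair. The sketch below assumes a (completion, expected) -> score signature purely for illustration:

    import re

    # Hypothetical verifier for a math-answer domain - signature is illustrative.
    def verify_answer(completion: str, expected: float) -> float:
        """Return 1.0 if the last number in the completion matches the
        expected answer, else 0.0."""
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        if not numbers:
            return 0.0
        return 1.0 if abs(float(numbers[-1]) - expected) < 1e-6 else 0.0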


People pay for convenience, that's true - and it's part of the equation here. Agreed! The approach is to make data capturing as convenient as possible: you just paste an API key and base URL into your existing code, and you gather all your runs. Beyond that, Reinforcement Learning is hard to figure out, so one of the goals is to commoditize it - which is what you're alluding to. In its current iteration, the platform ships with the verifiable mode, where Augento takes away all the headache of GPU infrastructure, GRPO implementation, training configuration, and dataset curation - you just select your gathered runs and start the training. But we'll go past that and expand Augento into a platform for alignment and self-learning. TL;DR: yes, indeed! We designed Augento with convenience in mind.
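
To illustrate what "paste in API key + base URL" means in practice, here is a minimal sketch using the OpenAI Python SDK pointed at an OpenAI-compatible proxy endpoint. Both the base URL and the key below are placeholders, not Augento's actual values:

    from openai import OpenAI

    # Point an existing OpenAI-compatible client at a capture proxy.
    # Placeholder endpoint and key - not Augento's real values.
    client = OpenAI(
        base_url="https://example-capture-proxy.invalid/v1",
        api_key="YOUR_PROVIDER_KEY",
    )

    # Application code stays unchanged; requests pass through the proxy,
    # where runs can be captured for later training.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Extract the invoice total as JSON."}],
    )
    print(response.choices[0].message.content)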


Yes, indeed


Thanks for stating your preference! This is something we can incorporate into the platform.

