Well. I'm working on a product that relies on AI assistants in the user-facing parts as well as LLM inference in the data processing pipeline. If we let our LLM guy run free, he would create an inscrutable tangled mess of Python code, notebooks, Celery tasks, and expensive VMs in the cloud.
I know Pythonistas regard themselves more as artists than engineers, but the rest of us need reliable, deterministically running applications with observability, authorization, and accessible documentation. I don't want to drop into a notebook to understand what the current throughput is, and I don't want to deploy huge pickle and CSV files alongside my source just to do something interesting.
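To be concrete about what I mean by observability in application code rather than a notebook, here's a rough sketch. The call_llm callable is a hypothetical stand-in for whatever client we actually use; the logging wrapper is the point.

    # Minimal sketch: make latency and input size show up in logs/metrics,
    # not in an ad-hoc notebook cell. call_llm is a placeholder callable.
    import logging
    import time

    logger = logging.getLogger("pipeline.llm")

    def summarize(call_llm, text: str) -> str:
        """Wrap the model call so throughput questions are answerable from logs."""
        start = time.monotonic()
        result = call_llm(prompt=f"Summarize: {text}")
        elapsed = time.monotonic() - start
        logger.info(
            "llm_call finished",
            extra={"latency_s": round(elapsed, 3), "chars_in": len(text)},
        )
        return result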
LangChain might not be the answer, but having no standard tools at all isn't either.
That's the central idea here. Most guys available to hire aren't very good, hence they get constrained into a framework that limits the damage they can cause. In other areas of software development the frameworks are quite mature at this point, so it works well enough.
This AI/LLM/whatever-you-want-to-call-it area of development, however, hadn't garnered much interest until recently, so there isn't much in the way of frameworks to lean on. But business is trying to ramp up around it, and thus needs to hire those who aren't good to fill seats. Like the parent says, LangChain may not be the framework we want, but it is the one we have, which beats letting the not-very-good developers create some unconstrained mess.
If you win the lottery by snagging one of the few genuinely good developers out there, then certainly you can let them run wild engineering a much better solution. But not everyone is so fortunate.
LLMs are, at least at present, exactly the kind of thing where trying to use an abstraction without understanding what it actually does is what's going to create a mess in the long run.
Some hiring teams just don't operate in unlimited venture capital land and have tight boundaries in terms of compensation. There's someone good at anything if you can throw enough money at the problem.
"More artists than engineers": yes and no.
I've been working with Pandas and Scikit-learn since 2012, and I haven't even put any "LLM/AI" keywords on my LinkedIn/CV, although I've worked on relevant projects.
I remember collaborating back then with PhDs in ML, and at the end of the day, we'd both end up using sklearn or NLTK, and I'd usually be "faster and better" because I could write software faster and better.
The problem is that the only "LLM guy" I could trust with such a description is someone who has co-authored a substantial paper or has hands-on training experience at real big shops.
Everyone else stands somewhere between artist and engineer: i.e., LLM work is still largely artisanal. We'll need something like scikit-learn eventually, but I doubt it will be LangChain or any of the other tools I see now.
You can see their source code and literally watch in the commit history the moment they discover things an experienced software engineer would have done in the first pass.
I'm not belittling their business model! I'm focusing solely on the software. I don't think they or their investors are naive or anything.
And I bet that in 1-2 years, there'll be many "migration projects" being commissioned to move things away from LangChain, and people will have a hard time explaining to management why that 6-month project ended up reducing 5K LOC to 500 LOC.
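If I had to guess what the "after" side of such a migration looks like, it's roughly this kind of thing: a plain function around the vendor client instead of layers of chains and callbacks. The model name and prompt here are just illustrative, and I'm assuming the openai>=1.0 Python client with an OPENAI_API_KEY in the environment.

    # Hypothetical post-migration code: one plain function, no framework.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def extract_keywords(text: str) -> str:
        """Ask the model for keywords; a direct call replaces the chain abstraction."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": "Return a comma-separated list of keywords."},
                {"role": "user", "content": text},
            ],
            temperature=0,
        )
        return response.choices[0].message.content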
For the foreseeable future though, I think most projects will have to rely on great software engineers with experience with different LLMs and a solid understanding of how these models work.
It's like the various "Databricks certifications" I see around. They may help with some job opportunities, but I've never met a great engineer who had one. They're mostly junior ones or experienced code-monkeys (to continue the analogy).
What you need is a software developer, not someone who chaotically tries shit until it kinda sorta works. As soon as someone wants to use notebooks for anything other than exploratory programming, alarm bells should be going off.