Consciousness implies self-awareness in space and time, and a progressive formation of the self. That is not something acquired instantly by design; it is acquired through a developmental process in which certain conditions have to be met. The keys to consciousness lie closer to developmental neurobiology than to the transformer architecture.
Languages come and go. There was a time when there was huge momentum behind Ruby (and Rails). That is (sadly) no longer the case. It is a matter of traction. C'est la vie. I remember back in the 90s there was great interest in Delphi (Borland's object-oriented Pascal), but then came Java. I don't even know if anyone is still coding in Delphi. I guess Ruby will eventually go the same way.
MCP is not mature enough to put servers in an Internet-facing position, unless you put gateways (inspecting JWTs, filtering out sensitive data) in front of them. The spec still has a long way to go, especially on the Streamable HTTP/SSE + OAuth front.
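To illustrate, here's a minimal sketch of the kind of gateway I mean, assuming FastAPI, PyJWT and httpx; the upstream URL, secret and route are placeholders, not anything prescribed by the MCP spec:

```python
# Minimal sketch of an auth gateway in front of an MCP server.
# Assumes FastAPI, PyJWT and httpx; URL, secret and route are placeholders.
import httpx
import jwt
from fastapi import FastAPI, HTTPException, Request, Response

UPSTREAM = "http://localhost:8000/mcp"   # internal, non-exposed MCP server
JWT_SECRET = "change-me"                 # placeholder secret

app = FastAPI()

@app.post("/mcp")
async def proxy_mcp(request: Request) -> Response:
    # Reject anything without a valid bearer token before it reaches the MCP server.
    auth = request.headers.get("authorization", "")
    if not auth.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="missing bearer token")
    try:
        jwt.decode(auth.removeprefix("Bearer "), JWT_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="invalid token")

    # Forward the JSON-RPC body and relay the reply; a real gateway would also
    # filter sensitive fields out of the response here.
    async with httpx.AsyncClient() as client:
        upstream = await client.post(
            UPSTREAM,
            content=await request.body(),
            headers={"content-type": "application/json"},
        )
    return Response(content=upstream.content, media_type="application/json")
```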
Exploring MCP (Model Context Protocol) using Claude as the base LLM. I understand that this is quite new and may change a lot in the next few months, but I feel something interesting could be done by plugging transactional APIs into an LLM. Reminds me of the old CGI (Common Gateway Interface) stuff from the 1990s.
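For a concrete idea of what that plugging looks like, here's a minimal sketch using the official Python MCP SDK (FastMCP); the create_order tool and the REST endpoint behind it are made up:

```python
# Minimal sketch of an MCP tool server wrapping a transactional API.
# Assumes the official `mcp` Python SDK; the tool and endpoint are hypothetical.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")

@mcp.tool()
def create_order(sku: str, quantity: int) -> str:
    """Place an order through an existing transactional REST API."""
    resp = httpx.post(
        "https://internal.example.com/api/orders",   # hypothetical endpoint
        json={"sku": sku, "quantity": quantity},
    )
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    mcp.run()   # stdio transport by default: spawned per client, much like CGI
```

Run over stdio it really isn't far from CGI: the client spawns the process, sends structured requests, and reads structured responses back.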
This kind of hack could turn LLMs into the browsers of the 21st century.
Oh yes... also working on preparing my retirement at the end of this year...
I think big parts of the answer involve the time domain, multi-agent setups, and iterative concepts.
Language is about communication of information between parties. One instance of an LLM doing one-shot inference is not leveraging much of this. Only first-order semantics can really be explored. There is a limit to what can be communicated in a context of any size if you only get one shot at it. Change over time is a critical part of our reality.
Imagine if your agent could determine that it has been thinking about something for too long and adapt its strategy automatically: escalate to a higher-parameter model, adapt the context, etc.
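Roughly something like this; the model names and the call_llm helper are hypothetical placeholders, not any particular API:

```python
# Rough sketch of a "thinking too long" escalation loop.
# Model names and call_llm are hypothetical placeholders.
import time

MODELS = ["small-model", "medium-model", "large-model"]   # cheapest first

def call_llm(model: str, prompt: str) -> str | None:
    """Stand-in for a real inference call; returns None when the answer isn't usable."""
    raise NotImplementedError

def solve(prompt: str, budget_per_model: float = 30.0) -> str | None:
    for model in MODELS:
        deadline = time.monotonic() + budget_per_model
        while time.monotonic() < deadline:
            answer = call_llm(model, prompt)
            if answer is not None:
                return answer
        # Budget exhausted: adapt the context and escalate to a bigger model.
        prompt = "Earlier attempts ran out of time; answer concisely.\n\n" + prompt
    return None
```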
Perhaps we aren't seeking total AGI/ASI either (aka inventing new physics). From a business standpoint, it seems like we mostly have what we need now. The next ~3 months are going to be a hurricane in our shop.
The training space is more important. I don’t think a general intelligence will spawn from text corpuses. A person only able to consume text to learn would be considered severely disabled.
A significant part of intelligence comes from existence in meatspace and the ability to manipulate and observe that meatspace. A two year old learns much faster with much less data than any LLM.
We already have multimodal models that take both images and text as input. The bulk of the training for these models was in text, not images. This shouldn’t be surprising. Text is a great way of abstractly and efficiently representing reality. Of course those patterns are useful for making sense of other modalities.
Beyond modeling the world, text is also a great way to model human thought and reason. People like to explain their thought process in writing. LLMs already pick up on and mimic chain of thought well.
Contained within large datasets are crystallized thought and efficient descriptions of reality that have proven useful for processing modalities beyond text. To me that seems like a great foundation for AGI.
> To me that seems like a great foundation for AGI.
It's only one part. Predicting text is relatively straightforward because it doesn't require predicting arbitrary sequences like 'a S23mz s.zawsds': statistical analysis shows that humans use a limited number of word combinations, so with hundreds of billions of parameters significant compression is possible. Mathematics is different, as it requires actual reasoning, an area where LLMs often struggle significantly because they lack the capability for genuine reasoning.
Text and 2D images are a tiny subset of physical reality as perceived by an able-bodied human. Even our best approximation (a 3D VR headset with Spatial Audio) is a poor representation. We don't even bother to simulate touch, temperature, equilibrioception (the sense of balance), etc. And the more detailed you get, the less data you have.
These senses can be described via text, but I’m highly skeptical that the learning outcomes will be the same.
>> Text and 2D images are a tiny subset of physical reality as perceived by an able-bodied human. Even our best approximation is a poor representation.
This is wrong. There’s nothing magical about human perception. You see the world because a 2D image is projected onto your retina.
GPT-4 was trained on text and generalized the ability to output 2D images. There's absolutely nothing to suggest text can't generalize further to new modalities. GPT-4 is forced to serialize images as SVGs to output them (a crazy emergent ability, btw), but that demonstrates an inherent spatial reasoning capability baked into the model.
GPT-4V was created with a transfer learning step where image embeddings are passed as input in place of text. That's further evidence of the model's ability to generalize to new modalities.
Everything you need for multimodal input and output is already trained in; GPT-4V, I'm sure, is just the start.
And it shows. It has a poor grasp of reality. It does a poor job with complex tasks. It cannot be trusted with specialized tasks typically done by expert humans. It is certainly an amazing technical achievement that does a decent job with simple tasks requiring cursory knowledge, but that’s all it is at this time.
>There's absolutely nothing to suggest text can't generalize further to new modalities
Nope. OpenAI has already demonstrated the ability to generalize GPT-4 to a new modality. Your claim that text models can only generalize to images and not to other modalities is utterly unconvincing. Explain to me why vision is so different from, say, audio.
>> And it shows. It has a poor grasp of reality. It does a poor job with complex tasks.
GPT-4 is a proof of concept more than anything. I'm excited to see how much reliability improves over time. Its grasp of reality isn't perfect, but at least it understands how burden of proof works.
I walked back nothing. OpenAI was surprised by the mass adoption of ChatGPT, they saw it as an early technical preview.
I don't understand why some people have such a hard time envisioning the potential of new technologies without a polished end product in their hands. Imagine if AI researchers had the same attitude.
Technology can be both real and unpolished at the same time. Those two things are not contradictory.
A two year old learns faster because it has inherited training data from its ancestors in the form of evolutionary memory. Think of it as a BIOS for human beings. The LLM takes longer to learn because we are building this BIOS for it. Remember it took billions of years for the human BIOS to be developed.
Definitions, again. OpenAI defines AGI as highly autonomous agents that can replace humans in most of the economically important jobs. Those don't need to look or function like humans.
LLMs as we currently understand them won't reach AGI. But AGI will very likely have an LLM as a component. What is language but a way to represent arbitrary structure? Of course that's relevant to AGI.
I'd say it depends on your data model. If it is fairly simple, Ruby on Rails is quite awesome. I didn't experiment with Clojure, Scala, etc., but RoR is quite an impressive toolset for doing CRUD on simple stuff.
Apart from the legal aspects, a problem I see is this: the day you have X subscribers paying to get your (aggregated) content, what happens when some sources (playing cat and mouse with you, or not) refactor their websites, basically f*#king up your data pipes? Then you'll have to turn around quite fast, because you'll have tens and tens of customers yelling at you.