I think you're underestimating the difference between a "commercial" NN and the human brain. NNs are usually designed for specific tasks that cover only a small subset of human capability. The human brain is huge, so an equivalent NN would also have to be huge. What we have instead is lots of small networks, many of which compete and try to solve the same problem.
You need a lot of samples because each network starts from scratch. If you had one super NN as powerful as a whole bunch of small networks combined, it could generalize easily because it could use its existing representations as a starting point. The amount of existing knowledge that is useful for an unknown task grows with the size of the NN.
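To make that concrete, here's a minimal sketch (assuming PyTorch; the layer sizes and the `pretrained_encoder` are made-up stand-ins, not a real model) of reusing an existing network's representations for a new task instead of starting from scratch:

```python
import torch
import torch.nn as nn

# Hypothetical encoder already trained on some large prior task.
# Its weights hold general-purpose features, so we freeze them.
pretrained_encoder = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
)
for p in pretrained_encoder.parameters():
    p.requires_grad = False  # reuse the representation, don't relearn it

# Only this small head is trained for the new task, so far fewer
# samples are needed than training every layer from scratch.
task_head = nn.Linear(64, 10)

optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)         # fake batch of inputs
y = torch.randint(0, 10, (32,))  # fake labels for the new task

logits = task_head(pretrained_encoder(x))
loss = loss_fn(logits, y)
loss.backward()
optimizer.step()
```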
An NLP NN for English could be combined with an image recognition NN. Since the NLP NN already has a concept of "car", it only has to associate that already-learned definition with images of cars. With separate NNs you have to teach both networks what a car is, twice. Small NNs will always carry some redundancy like that, and the redundancy is a fixed cost you pay per network.
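As a toy illustration of that "associating" step (the embeddings below are random stand-ins, not outputs of any real model), linking the two networks can be as little as learning a projection from image-feature space into the language network's existing embedding space:

```python
import torch
import torch.nn as nn

# Stand-in for the NLP network's already-learned embedding of "car".
car_text_embedding = torch.randn(128)
car_text_embedding = car_text_embedding / car_text_embedding.norm()

# Stand-in for features the image network produces for car photos.
car_image_features = torch.randn(32, 512)

# The only thing trained: a map from image space into text space.
projection = nn.Linear(512, 128)
optimizer = torch.optim.Adam(projection.parameters(), lr=1e-2)

for _ in range(100):
    projected = projection(car_image_features)
    projected = projected / projected.norm(dim=1, keepdim=True)
    # Pull projected image features toward the existing "car" concept
    # by maximizing cosine similarity.
    loss = 1 - (projected @ car_text_embedding).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Neither network relearns the concept; the projection just reuses the definition one of them already has.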