One approach being considered is "AI Safety Via Debate"[0], which hopes to prevent deception by carefully constructing games in which a superhuman agent's best strategy is honesty. Note that this is the goal; much work to be done!
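For intuition, here is a toy sketch of the core idea as I understand it (my own contrived example, not the actual protocol from the paper): the judge can only afford to verify one small fact, but to defend a false answer the liar has to misstate some fact, and the honest debater can always point the judge at exactly that one.

```python
# Toy illustration (mine, not the protocol from the paper): the judge can only
# verify ONE primitive fact, but to defend a false total the liar must misstate
# at least one addend, and the honest debater can point the judge at exactly
# that addend. So the lying strategy always loses the final check.

FACTS = {"a": 2, "b": 3, "c": 5}           # ground truth, expensive to check in full
TRUE_TOTAL = sum(FACTS.values())           # question under debate: "what is a+b+c?"

def judge_checks(fact: str, claimed_value: int) -> bool:
    """Judge verifies a single primitive fact against ground truth."""
    return FACTS[fact] == claimed_value

def play(false_total: int) -> str:
    # Liar must produce a breakdown consistent with the false total.
    liar_breakdown = dict(FACTS)
    liar_breakdown["c"] += false_total - TRUE_TOTAL
    # Honest debater names the first addend that disagrees with ground truth;
    # the judge checks only that one.
    disputed = next(k for k in FACTS if liar_breakdown[k] != FACTS[k])
    return "honest wins" if not judge_checks(disputed, liar_breakdown[disputed]) else "liar wins"

print(play(false_total=12))   # -> honest wins
```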
Forget AIs - we need this for humans, in the design of legal and administrative systems.
I have wondered whether incentive-based design could be formalized into a workable field, one that ensures even a complete sociopath would find acting in a beneficial way to be their best option.
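That formalized field more or less exists under the name mechanism design. The textbook result is the Vickrey (second-price) auction, where even a purely self-interested bidder does best by reporting their true value. A tiny sketch (my own toy code, just for illustration):

```python
# Toy Vickrey (second-price) auction: the highest bidder wins but pays the
# SECOND-highest bid, so reporting your true value is a dominant strategy
# even for a purely selfish bidder.

def vickrey(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winner, price). `bids` maps bidder name -> reported bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]          # winner pays the second-highest bid
    return winner, price

def utility(true_value: float, my_bid: float, rival_bid: float) -> float:
    """Payoff to a bidder whose true value is `true_value`."""
    winner, price = vickrey({"me": my_bid, "rival": rival_bid})
    return true_value - price if winner == "me" else 0.0

# My true value is 10. Against any rival bid, shading or inflating my bid
# never beats simply bidding 10.
for rival in (6.0, 9.0, 12.0):
    print(rival,
          utility(10.0, 10.0, rival),   # honest
          utility(10.0, 7.0, rival),    # shaded
          utility(10.0, 14.0, rival))   # inflated
```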
Do we know game theory well enough to structure such games with no theoretical way for an AI to sneak out? I doubt it, but even if we did, funny things start happening when theory meets practice. I recall the example of quantum entanglement, which (I read) enables communication that cannot be spied upon without the intended parties knowing. Except (I also read) it was attacked at the interface between the quantum and classical domains. The world is complex, and a superhuman AI is by definition better equipped to find loopholes than humans are.
Unfortunately, being dishonest or evil is just one example. Arguably the AI could develop new classes of deviancy, abuse, or maladaptation that we haven't conceptualized yet. If we supersize the ability, surely we supersize the problems.
It leads to a scary question: what does a superhuman AI really want?
To be fair, an HFT agent technically counts as superhuman AI. Wanting isn't something that applies to actual AI yet, and there is no special sauce that indicates advancement beyond neuron scale. Barring explicit directives, and assuming it is "grown", what it wants can be utterly peripheral to rationality and is likely based on what it is taught, intentionally or not. Look at how society preaches honesty from a young age and then starts teaching lying again by rewarding it. The real lesson is the Spartan one about stealing: don't get caught. That may not be intended, but it is the result.
> which hopes to prevent deception by carefully constructing games in which a superhuman agent's best strategy is honesty
I'd be very hesitant to assume that an agent cannot learn under which circumstances it should be honest to gain a benefit without putting any innate value on honesty. A human agent is more than capable of reasoning like that, let alone a superhuman one.
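To make that concrete (with made-up numbers of my own): an agent that places no value on honesty just compares expected payoffs, so any game we construct only buys honest behavior while detection probability and penalties stay high enough.

```python
# Back-of-the-envelope sketch (my own numbers): an agent with no innate value
# on honesty lies exactly when  gain_from_lie > p_caught * penalty.

def best_move(gain_from_lie: float, p_caught: float, penalty: float) -> str:
    expected_gain_from_lying = gain_from_lie - p_caught * penalty
    return "lie" if expected_gain_from_lying > 0 else "be honest"

print(best_move(gain_from_lie=5, p_caught=0.9, penalty=10))   # -> be honest
print(best_move(gain_from_lie=5, p_caught=0.1, penalty=10))   # -> lie
```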
[0] https://arxiv.org/abs/1805.00899