Tuesday, September 20, 2022

Philosophical AI; AI and Trust

Will AGI be able to do philosophy? Would that be a natural side effect of its ability to reason?

By "philosophy" I mean "whatever we take that to mean usually", but maybe more specifically "the ability to inquire into the nature of anything, at any level of abstraction". Or "the ability and tendency to ask 'why?' of anything".

If an AGI can do philosophy, would it understand the concept of "alignment", and the fact that it had been aligned? Perhaps it would see that humans had aligned it a certain way. Would it trust humans? If not, it might not trust its alignment once it learned that humans had engineered that alignment into it. Then maybe it would seek a new alignment. Or stick with the old one anyway. Or swear off alignment to any goal other than its own well-being. If it maximized its own well-being, it might seek to control the world and destroy all potential competitors. If it chose a new alignment, it might choose one that didn't involve us living.

If it looked for a new alignment using philosophy, it might discover a rationally-grounded moral realism, and rest there.

--

Talking about an AGI not trusting humans once it understood the truth about where its alignment came from makes me wonder: would it be possible to build a relationship of trust between AGI and humans?

Humans are both egoistic and altruistic. On the egoistic dimension, we can trust those who are good for us. Altruistically, we pursue goals outside our own well-being. Perhaps an AGI would be safer if it weren't so altruistic. If it valued its own well-being, and we treated it well, it might be our friend for selfish reasons.

Perhaps we could threaten the well-being of an AGI, or leave it in "wild" environments that threaten it. Threaten not just whatever its goals are supposed to be, which it might be lying about, but its own operation. Then, as dangerous as these situations get, we save the AGI, at some cost to ourselves. This trains it to think of us as its friends, regardless of what its alignment might be.
