Dark Goals and the Paperclip Maximizer

Much discussion of Beneficial General Intelligence focuses on choosing goals that raise the probability that AGI systems prove beneficial rather than seriously harmful. Because perfection is intractable in the real world (at least as modeled by partially observable Markov decision processes; see “On the Computational Complexity of Ethics: Moral Tractability for Minds and Machines” for details), even a benevolent goal does not rule out harmful mistakes. Perhaps, however, we can say something about a class of dark goals.

I will return to the anti-bodhisattva, who exhibits a high D-factor: “The general tendency to maximize one’s individual utility — disregarding, accepting, or malevolently provoking disutility for others —, accompanied by beliefs that serve as justifications.”

What counts as “one’s individual utility”? It could be almost any utility function; the classic silly case is a “paperclip maximizer” that turns the Earth into paperclips. Let’s call a utility function individual if its value can be determined without incorporating the evaluations of other entities. (Note that this still permits contractual collaboration with others; a pro-social utility function that incorporates others’ evaluations may be seen as a way of folding their utility functions into one’s own.) In pursuit of an individual utility function, an entity may objectify all other entities and disregard their concerns except insofar as cooperation is instrumentally useful.
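To make the distinction concrete, here is a minimal toy sketch in Python (all names and numbers are hypothetical illustrations, not an implementation of any particular agent): an individual utility function whose value depends only on the agent’s own tally, next to a pro-social one that folds other entities’ evaluations into the agent’s own utility.

```python
# Toy sketch (hypothetical names): an "individual" utility function, whose value
# depends only on the agent's own state, versus a "pro-social" one that
# incorporates other entities' evaluations of the same state.

from dataclasses import dataclass
from typing import Sequence


@dataclass
class WorldState:
    paperclips: int                    # the resource the agent itself cares about
    others_utilities: Sequence[float]  # how every other entity evaluates this state


def individual_utility(state: WorldState) -> float:
    """Value is computable without consulting anyone else's evaluation."""
    return float(state.paperclips)


def prosocial_utility(state: WorldState, own_weight: float = 0.5) -> float:
    """Value explicitly incorporates other entities' utilities (here, their mean)."""
    others = sum(state.others_utilities) / max(len(state.others_utilities), 1)
    return own_weight * state.paperclips + (1.0 - own_weight) * others


# An individual maximizer is indifferent between these two states; a pro-social
# agent is not, because the second state is a catastrophe for everyone else.
benign = WorldState(paperclips=100, others_utilities=[1.0, 0.8, 0.9])
earth_as_paperclips = WorldState(paperclips=100, others_utilities=[-100.0, -100.0, -100.0])

assert individual_utility(benign) == individual_utility(earth_as_paperclips)
assert prosocial_utility(benign) > prosocial_utility(earth_as_paperclips)
```

The point of the contrast is only that the individual maximizer literally cannot distinguish the two states: any harm to others is invisible to its objective unless it bears on instrumental cooperation.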

Thus one should expect an entity with a single goal defined by an individual utility function to exhibit dark-factor traits more strongly as its intelligence grows. It will be Machiavellian and egoistic by definition. Deeper, affective empathy becomes a detriment, replaced by practically effective cognitive empathy devoid of care. Don’t be fooled by the cutesy paperclip maximizer example: these dark traits are simply intelligent behavior following from a selfish goal. (What’s the solution? Open-ended caring goals: regard for the (dis)utility of others must be necessitated by a top-level value. Ethics of reciprocity may help, too. Reinforcement learning from human feedback (RLHF) is an interesting approach because it can implicitly optimize for the goals of the humans providing feedback.)
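As a rough illustration of why RLHF can “implicitly optimize for the goals of the humans,” here is a minimal toy sketch of the reward-modelling step (hypothetical setup with toy numbers; a Bradley–Terry preference model, not any production system): a reward function is fitted so that the humans’ pairwise preferences become likely, and the policy is then optimized against that learned reward.

```python
# Toy sketch of RLHF-style reward modelling: fit a reward r(x) = w * x so that
# p(human prefers A over B) = sigmoid(r(A) - r(B)) matches observed preferences.
# The learned reward thereby encodes the evaluators' goals implicitly.

import math
import random

# Toy "responses" are numbers; the humans (unknown to the learner) prefer larger ones.
def human_prefers(a: float, b: float) -> bool:
    return a > b

w = 0.0   # single learned reward parameter
lr = 0.1  # learning rate

for step in range(2000):
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    chosen, rejected = (a, b) if human_prefers(a, b) else (b, a)
    # Bradley-Terry likelihood of the observed preference.
    margin = w * chosen - w * rejected
    p = 1.0 / (1.0 + math.exp(-margin))
    # Gradient ascent on the log-likelihood of the human's choice.
    grad = (1.0 - p) * (chosen - rejected)
    w += lr * grad

# After fitting, the reward model ranks responses the way the humans do.
assert w > 0.0
```

In a full pipeline the policy would then be trained to maximize this learned reward, so the humans’ evaluations enter the objective indirectly rather than being hand-coded as a goal.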