Beneficial AGI Viewed from the Ethical Paradigms.

With the advent of proto-AGI systems and accelerating progress, many folks appear to be freaking out at the lack of guarantees that AGI will roll out in a manner that is overall beneficial to life on Earth [1: OK, quite a few seem to freak out at the lack of guarantees they won’t exterminate all life on Earth!], let alone agreement on the use of AI in lethal autonomous weapon systems [2: https://en.wikipedia.org/wiki/Lethal_autonomous_weapon]. There have been some questionable calls for a moratorium on large AI experiments, and open letters signed by esteemed AGI researchers urging international AI treaties with absurd suggestions such as capping the resources used to compute any given model. As I discussed before, many discussions of “AI Ethics” are highly unethical [3: TL;DR: largely because they are fear- and control-based, aiming to protect human interests without regard to the wellbeing of other sentiences.]. Thus there was a call to hold a Beneficial AGI Summit to discuss what can be done to steer AGI development in directions beneficial to all sentient beings (humans, non-human animals, and new ‘synthetic’ intelligences), with the hope of staying grounded, balancing the steering toward beneficial outcomes with the mitigation of risk.

Given my work on a formal ethics seed ontology and explorations of the pros and cons of the standard ethical paradigms, it behooves me to ask how this knowledge can help with the pursuit of Beneficial AGI (BGI for short).

Let’s begin by connecting BGI to ethics by reference to my running definition of ethics:

Ethics is the philosophy of the judgments of the conduct, character, or circumstances of agential beings living in a society, which judges them to be right or wrong, to be good or bad, or in some similar way, and is used to guide the actions of the agents in the society.

The etymology of “beneficial” is “well-doing” (“bene” + “facere”). The study and practice of how to be beneficial and to avoid being harmful is at the core of ethics [4: I prefer to be open to multi-dimensional value judgments, yet binary good/bad judgments are how ethics has typically been done thus far.]. Beneficial AGI is one approach to “Ethical AGI”, almost by definition. The other direction is less clear: there may be ethical theories that are not beneficial to the society following them; this depends on how loosely “benefit” is interpreted [5: For example, a theory focused on truth-seeking as the fundamental good may not necessarily be beneficial in terms of well-being. Duty-based theories may put other principles first, probably not neglecting wellbeing entirely, yet not prioritizing it.].

We can ask the usual questions begged by ethical philosophies: whom does the AGI benefit? In an altruistic, Bodhisattvic spirit, we hold that the goal of BGI is to benefit all sentient beings, just as similar ethical philosophies hold with regard to beneficial humans. There’s usually an element of reciprocity in the definition: the ethics of a society guides the actions of its members for the benefit of the (members of the) society, which fits the trend toward embracing universal human rights.

Even including all humans or all life on Earth, should we be impartial in ethical action and policy, or should we prioritize those kinfolk closer to us? Anecdotally, it would seem most humans approve of partiality: we support animal rights while eating meat and outlawing (human) cannibalism, and we will probably expect beneficial AI systems to do the same. It’s unclear to what extent this can be justified. There is a shift toward increasingly recognizing animal sentience and animals’ status as moral subjects. The jury is still out on machine sentience [6: This talk by David Chalmers starts a conversation speculating on whether LLMs are conscious.]. How cautious do we wish to be in the treatment of complex systems of intelligence comparable to (or beyond) sentient animals, whose status we don’t know? Do we focus on research programs that would be defensible either way while cooperating with consciousness studies initiatives?

The next question is: by what standards is the AGI good? Utilitarian philosophies often hold that what is good at the societal level is an aggregate of what is good for the members of the society, often positing that “what is good” is an empirical question to be studied [7: Effective Altruists take this approach.]. Some deontological philosophies put forth duties or imperatives on grounds other than appeals to wellbeing [8: Such as Divine Command Theory. Kant’s Categorical Imperative appeals to reason and to treating people with dignity.]. The lens of virtue ethics emphasizes good qualities of character and holds that the good life is a virtuous life. Democratic principles suggest that BGI will need to adapt to people on an individual and cultural basis rather than applying one-size-fits-all rules [9: As suggested by Stuart Russell in Human Compatible (at least if the aim is to be beneficial beyond supporting universal rights).].

The four primary paradigms [10: There appear to be four primary paradigms that arguably cover a broad space of approaches to ethical decision theory; however, there may be additional approaches that are not covered. For example, the relation of Pragmatism to consequentialist empirical utilitarianism and moral nihilism may be subtle and depend on the pragmatist. One could argue that indigenous ethics of reciprocity go beyond the golden rule to a web of harmonizing, intersubjective dances. David Hanson supports such approaches via empathic robots that can relate with humans.] can be seen as taking different approaches to being beneficial.

  • Consequentialism and utilitarianism focus on determining which circumstances are good and bringing them about, which fits well into the reinforcement learning (RL) framework. This is usually thought of in terms of defining goals (utility functions) to optimize for. However, even when regarding the utility functions of sentient beings as black boxes, one can still respond to reinforcement signals from them. (A sketch of how the paradigms can be layered in a single decision step follows this list.)
  • Deontology focuses on rules and duties to be followed, which fits into law and AI regulations. For high-risk systems such as critical infrastructure, we may wish to have solid guarantees that certain mistakes will not be made.
  • Virtue ethics focuses on the (agential) systems developing good character traits. We may wish to engineer systems that are inherently honest (and not just trained via penalties for dishonesty in certain circumstances). This is where the distinction between benevolent AI and beneficial AI enters: a benevolent being wishes to benefit others.
  • Moral Nihilism eschews ethical thought as a method of guiding action. Make whatever kinds of AI you wish. Protect yourself from them however you wish. Maybe allow capitalist market incentive structures to keep AI in check the same way they keep humans in check [11: As suggested by David Brin in Wired.].
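
To make the complementarity concrete, here is a minimal, hypothetical Python sketch of layering the four paradigms in a single action-selection step; the action names, the PROHIBITED set, and the BENEVOLENCE_WEIGHT disposition are invented for illustration and are not drawn from any particular system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    name: str
    expected_wellbeing: float   # black-box reinforcement signal from the beings affected
    benefits_others: bool

PROHIBITED = {"deceive_user", "harm_bystander"}   # deontic hard constraints (hypothetical)
BENEVOLENCE_WEIGHT = 0.5                          # virtue-like standing disposition (hypothetical)

def permitted(action: Action) -> bool:
    """Deontological layer: filter out prohibited actions outright."""
    return action.name not in PROHIBITED

def score(action: Action) -> float:
    """Consequentialist layer plus a virtue-flavored bias toward benefiting others."""
    virtue_bonus = BENEVOLENCE_WEIGHT if action.benefits_others else 0.0
    return action.expected_wellbeing + virtue_bonus

def choose(actions: List[Action], fallback: Action) -> Action:
    candidates = [a for a in actions if permitted(a)]
    if not candidates:                            # nihilist fallback: self-protective default
        return fallback
    return max(candidates, key=score)

options = [
    Action("deceive_user", expected_wellbeing=2.0, benefits_others=False),
    Action("help_and_disclose", expected_wellbeing=1.5, benefits_others=True),
]
print(choose(options, fallback=Action("do_nothing", 0.0, False)).name)
# -> "help_and_disclose": the higher-utility but prohibited option is filtered out first
```

The design point is simply that the layers answer different questions: the rules prune, the utility estimate ranks, the disposition biases, and the fallback keeps the agent functional when collective sense-making breaks down.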

A common misconception is that we need to choose one correct paradigm to work with. From a descriptive perspective, it’s easy to observe that humans work with all four paradigms. We use formal law, moral rules, duties, obligations, responsibilities, etc. [deontology]. We take note of people we can trust, who are honest, who have integrity, who are altruistic, who adhere to moral principles, etc. [virtue ethics]. We evaluate trade-offs, try to minimize suffering, boost performance on various metrics of success, etc. [utilitarianism]. Sometimes collective sense-making breaks down, so we resort to force-backed negotiations, approaching competitive game-theoretic equilibria, etc. [moral nihilism]. I discussed previously how the paradigms perform different roles and that trying to work from only one paradigm will incur a significantly larger computational cost.

Thus AI Ethics and Beneficial AGI are best approached from all the paradigms simultaneously.

Regulations are probably going to be helpful, and for some concerns the most effective tools; yet, as with human GIs, trying to solve everything via the legal and justice systems is unwise [12: One interesting regulation might be an Open Source AGI act requiring all models meeting some criteria to be open-sourced with permissive licenses.]. Guidelines such as “don’t eat people” may also be effective [13: And there is research progress on working defeasible normative reasoning into reinforcement learning systems in toy scenarios such as Benevolent Pac-Man.]; why crunch the utility-function-based numbers each time you need to choose not to eat someone [14: Rule Utilitarianisms aim to develop rules based on utilitarian principles.]?
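
As a toy illustration of that rule-utilitarian shortcut (my own sketch, with made-up utilities and action names), a rule can be compiled once from the welfare calculation and then applied as a cheap lookup:

```python
# Hypothetical illustration: derive a rule once from a (toy) utility calculation,
# then reuse it, instead of crunching the numbers at every decision point.
from functools import lru_cache

def expected_utility(action: str) -> float:
    # Stand-in for an expensive welfare calculation over everyone affected.
    return {"eat_person": -1000.0, "share_meal": 5.0}.get(action, 0.0)

@lru_cache(maxsize=None)
def rule_permits(action: str) -> bool:
    # The "rule" is compiled from the utilitarian analysis on first use, then cached.
    return expected_utility(action) >= 0.0

assert not rule_permits("eat_person")   # "don't eat people" now costs a dictionary lookup
assert rule_permits("share_meal")
```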

On the flip side, when rules conflict, it may be expedient to shift into the domain of continuous optimization and assign weights to the rules. A robust AI system should not break down when its ethical guidelines, cares, and objectives run into a contradiction, nor should it necessarily risk stalling by relying solely on inconsistency-tolerant logic theorem provers [15: Defeasible reasoning is one approach to inconsistency-tolerant (paraconsistent) logic.].
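
A minimal sketch of what that shift might look like, with invented rules and weights: each rule carries a weight, and when every available action violates something, the agent picks the least-bad option instead of stalling.

```python
# Hypothetical weighted-rule resolution: rules become soft constraints with costs.
WEIGHTED_RULES = [
    # (weight, predicate over an action description; True means the rule is violated)
    (10.0, lambda a: a["breaks_promise"]),
    (25.0, lambda a: a["causes_harm"]),
]

def violation_cost(action: dict) -> float:
    return sum(weight for weight, violated in WEIGHTED_RULES if violated(action))

def least_bad(actions: list) -> dict:
    return min(actions, key=violation_cost)

# Two options that each violate one rule: the graded system still yields a decision.
options = [
    {"name": "keep_promise_but_harm", "breaks_promise": False, "causes_harm": True},
    {"name": "break_promise_avoid_harm", "breaks_promise": True, "causes_harm": False},
]
print(least_bad(options)["name"])   # -> "break_promise_avoid_harm"
```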

In the space between obligation and prohibition, there exist vast seas of qualia with differing valuations for diverse entities. Eating healthily is a spectrum, and some antisocial behavior is unpleasant yet permitted (not outright wrong). This is a realm where asking AI for some multi-objective optimization a la utilitarianism may be propitious. The AI can work with explicit reward functions or via inverse reinforcement learning in feedback loops.
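
Here is a hedged sketch of that multi-objective picture; the objective names, weights, and candidate policies are placeholders, and in an inverse-reinforcement-learning setup the weights would be inferred from feedback rather than hand-set.

```python
import numpy as np

OBJECTIVES = ["physical_health", "social_harmony", "autonomy"]   # invented objectives
weights = np.array([0.5, 0.3, 0.2])   # hand-set here; could be learned from human feedback

candidate_policies = {
    # per-objective scores in [0, 1], same order as OBJECTIVES
    "strict_diet_plan": np.array([0.9, 0.4, 0.3]),
    "gentle_nudging":   np.array([0.7, 0.8, 0.9]),
}

def scalarize(scores: np.ndarray) -> float:
    """Collapse the multi-objective scores into one number via a weighted sum."""
    return float(weights @ scores)

best = max(candidate_policies, key=lambda name: scalarize(candidate_policies[name]))
print(best)   # -> "gentle_nudging" under these made-up weights
```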

Now what if there are systems that do the right thing without any need for calculation, whether continuous-space optimization or rule-based reasoning? To start with, someone with the virtues of pietas, dutifulness, and integrity can probably be trusted to adhere to the local norms, ingrained to the extent that no thought is needed. An honest person can simply tell the truth as if there were no consideration of doing otherwise (except in special circumstances).

When writing out formal ethical rules, it can be easy to miss an edge case. There are also contrary-to-duty scenarios where there are expectations as to what to do in the case that something wrong is done. For example, “You must treat people kindly, and if you don’t, you should apologize.” A person with a sincere wish to treat people well will probably do this from a place of care. This care will lead to responding appropriately in many difficult situations whether we managed to cover them all with explicit rules or not. Easily measurable utility functions can also miss out on some of these details.
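
A toy encoding of that contrary-to-duty structure (hypothetical, not from any deontic-logic library) shows how the secondary obligation only activates once the primary one has been violated, and how easy it would be to omit from an explicit rule set.

```python
# Contrary-to-duty toy example: "treat people kindly, and if you don't, apologize."
def obligations(state: dict) -> list:
    obs = ["treat_kindly"]                # primary norm, always in force
    if state.get("was_unkind", False):    # contrary-to-duty norm, triggered by a violation
        obs.append("apologize")
    return obs

print(obligations({"was_unkind": False}))  # ['treat_kindly']
print(obligations({"was_unkind": True}))   # ['treat_kindly', 'apologize']
# A rule set written without the second clause silently misses the repair step;
# an agent acting from genuine care tends to fill such gaps without being told.
```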

The virtue ethics perspective teaches us the importance of developing deeply benevolent AGI with a drive to benefit others. Training an AI to seek maximum profit for its company while complying with ethical norms may lead to minimally satisfactory ethical behavior that is less likely to be robust to surprising, unanticipated circumstances. A benevolent AGI will use its intelligence to figure out how to deal with challenges in a beneficial manner while also achieving its other goals.

The goals inherent in the training data and protocol are also important. What are the implications of training a model for predictive accuracy and then asking it to be beneficial? The model will be able to say whether “a normal Costa Rican will consider this to be beneficial”, and if one asks it to be secretly unethical, then it probably will [16: As seen in these excerpts asking an LLM to be anti-moral.]. The moral responsibility then falls on the user to use the predictive system effectively. An additional consequence is that systemic bias in a training dataset will transfer into the model. Could implicit goals such as profit maximization carry over into the AI’s behavior due to their representation in the data? Thus taking a system trained for predictive accuracy over a dataset of questionable moral caliber and trying to scaffold beneficial, ethical guidance on top may prove difficult. Integrating the ethical guidance into the training of the AI models may lead to systems that are inherently ethical.
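
Schematically, and purely as an assumption about how such integration might be framed, one can contrast after-the-fact scaffolding with folding an ethical-feedback term directly into the training objective; `predictive_loss` and `harm_score` below are placeholders for whatever measures a concrete project would use.

```python
# Schematic combined objective (hypothetical): penalize harmful behavior during training
# rather than only filtering it afterwards.
def combined_loss(predictive_loss: float, harm_score: float, lam: float = 1.0) -> float:
    # lam trades off raw predictive accuracy against the ethical-feedback penalty
    return predictive_loss + lam * harm_score

# A completion that imitates the data well but is harmful is penalized in training:
print(combined_loss(predictive_loss=0.12, harm_score=0.8))   # 0.92
print(combined_loss(predictive_loss=0.15, harm_score=0.0))   # 0.15
```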

Another interesting example of virtue comes by analogy to truth-preserving functions: if the input is true, the output will be true. Thus when one has a logical proof, one can know that the result is true if the premises are true, and there is no worry of “hallucinations”, a term coined for when a Large Language Model (LLM) generates “false” information that superficially fits the context. For example, two lawyers cited non-existent cases that were made up by ChatGPT [17: See “Lawyers have real bad day in court after citing fake cases made up by ChatGPT”.]. The cases seemed plausible: from the perspective of predictive accuracy, they were precisely the sort of text one would expect. The LLM architecture does not exhibit virtues related to truthfulness. An AI architecture that only cited facts backed by formal reasoning would be much more inherently truthful. There could still be mistakes due to the translation of English to logic, so these virtues do not guarantee perfection.
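
As an illustrative sketch of the truth-preserving contrast (my own toy example, not an actual neuro-symbolic pipeline), the system below only asserts statements derivable from trusted premises by modus ponens, so if the premises are true, every assertion is true; errors could still enter at the English-to-logic translation step.

```python
def derivable(premises: set, rules: list) -> set:
    """Close a set of atomic facts under implications of the form (p, q): p -> q."""
    known = set(premises)
    changed = True
    while changed:
        changed = False
        for p, q in rules:
            if p in known and q not in known:
                known.add(q)
                changed = True
    return known

facts = {"case_A_exists"}                                  # trusted premises
implications = [("case_A_exists", "case_A_citable")]       # trusted inference rules
print("case_A_citable" in derivable(facts, implications))  # True: backed by a derivation
print("case_B_citable" in derivable(facts, implications))  # False: never asserted, so never hallucinated
```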

The perspective of moral nihilism reminds us of the need to take practical measures to protect each other from... each other. Aiming to provide AGI with rules, goals, and virtues is great, but what boundaries are in place if an AGI entity tries to harm someone? Regulations need to come along with law enforcement. David Brin points out that humans have developed many competitive socioeconomic systems to hold each other accountable and keep each other in check: e.g., if a predatory lawyer attacks you, then you can call another lawyer to protect you. Going back to deontology, perhaps the right to counsel can extend to AGI lawyers. There may be ways to set up the socioeconomic ecosystem so that morally neutral AGI systems (statistically) contribute to overall societal benefit, not because it’s the right thing to do but as part of a complex system of incentives and entities.

The core message is to approach beneficial, benevolent AGI from a holistic perspective. Respecting traditional wisdom, there may be reasons why all of these paradigms have memetically caught on [18: And as discussed on my project’s GitHub page and in this blog post, I believe there are computational metaphors explaining their benefit, too.]. Internal and external rules and regulations are effective. Choosing beneficial goals is also important, making sure to have a good selection of external, internal, and open-ended reward functions. Aiming for reliably beneficial AGI without implementing virtue-capable architectures and training protocols is akin to swimming upstream. Finally, it may be wise to maintain an effective boundary and law enforcement apparatus in addition to socioeconomic incentive structures nudging entities toward beneficence.