To Gen or Not to Gen: The Ethical Use of Generative AI
This article by Johannes Link and Jakob Schnell explores the ethical dimensions of generative AI (GenAI), focusing on large language models. It highlights both promises and harms, including ecological impact, misinformation, threats to education and democracy, and digital colonialism. The authors argue for a balanced, informed approach that weighs benefits against risks, often requiring trade-offs.
Article intelligence
Key points
- GenAI has significant downsides: massive energy use, e-waste, misinformation, and IP issues.
- LLMs lack true reasoning and are prone to hallucinations; they cannot distinguish truth from falsehood.
- Ethical use requires weighing advantages against damages, sometimes sacrificing efficiency or profit.
- The authors recommend using small, domain-specific models and staying informed about GenAI's impacts.
Why it matters
This matters because genAI has significant downsides: massive energy use, e-waste, misinformation, and IP issues.
Technical impact
May affect model selection, inference cost, product capability, and evaluation benchmarks.
This blog entry started out as a translation of an article that my colleague Jakob and I wrote for a German magazine. After that we added more stuff and enriched it by additional references and sources. We aim at giving an overview about many - but not all - aspects that we learned about GenAI and that we consider relevant for an informed ethical opinion. As for the depth of information, we are just scratching the surface; hopefully, the loads of references can lead you to diving in deeper wherever you want. Since we are both software developers our views are biased and distorted. Keep also in mind that any writing about a “hot” topic like this is nothing but a snapshot of what we think to know today. By the time you read it the authors’ knowledge and opinions have already changed.
Last Update: January 17, 2026.
Table of Contents
Abstract
About us
Johannes Link
Jakob Schnell
Introduction
Ethics, what does that even mean?
Clarification of terms
Basics
Can LLMs think?
What LLMs are good at
GenAI as a knowledge source
GenAI in software development
Actual vs. promised benefits
Harmful aspects of GenAI
GenAI is an ecological disaster
Power
Water
Electronic Waste
GenAI threatens education and science
GenAI is destroying the free internet.
GenAI is a danger to democracy
GenAI versus human creativity
Digital colonialism
Political aspects
Conclusion
Can there be ethical GenAI?
How to act ethically
Abstract
ChatGPT, Gemini, Copilot. The number of generative AI applications (GenAI) and models is growing every day. In the field of software development in particular, code generation, coding assistants and vibe coding are on everyone’s lips. Like any technology, GenAI has two sides. The great promises are offset by numerous disadvantages: immense energy consumption, mountains of electronic waste, the proliferation of misinformation on the internet and the dubious handling of intellectual property are just a few of the many negative aspects. Ethically responsible behaviour requires us to look at all the advantages, disadvantages and collateral damages of a technology before we use it or recommend its use to others.
In this article, we examine both sides and eventually arrive at our personal and naturally subjective answer to whether and how GenAI can be used in an ethical manner.
About us
Johannes Link
… has been programming for over 40 years, 30 of them professionally. Since the end of the last century, extreme programming and other human-centred software development approaches have been at the heart of his work. The meaningful and ethical implementation of his private and professional life has been his driving force for years. He has been involved with GenAI since the early days of OpenAI’s GPT language models. More about Johannes can be found at https://johanneslink.net.
Jakob Schnell
… studied mathematics and computer science and has been working as a software developer for 5 years. He works as a lecturer and course director in university and non-university settings. As a youth leader, he also comes into regular contact with the lives of children and young people. In all these environments, he observes the growing use of GenAI and its impact on people.
Introduction
Ethics, what does that even mean?
Ethical behaviour sounds like the title of a boring university seminar. However, if you look at the wikipedia article of the term 1, you will find that ‘how individuals behave when confronted with ethical dilemmas’ is at the heart of the definition. So it’s about us as humans taking responsibility and weighing up whether and how we do or don’t do certain things based on our values.
We have to consider ethical questions in our work because all the technologies we use and promote have an impact on us and on others. Therefore, they are neither neutral nor without alternative. It is about weighing up the advantages and potential against the damage and risks; and that applies to everyone, not just us personally. Because often those who benefit from a development are different from those who suffer the consequences.
As individuals and as a society, we have the right to decide whether and how we want to use technologies. Ideally, this should be in a way that benefits us all; but under no circumstances should it be in a way that benefits a small group and harms the majority.
The crux of the matter is that ethical behaviour does not come for free. Ethics are neither efficient nor do they enhance your economic profit. That means that by acting according to your values you will, at some point, have to give something up. If you’re not willing to do that, you don’t have values - just opinions.
Clarification of terms
When we write ‘generative AI’ (GenAI), we are referring to a very specific subset of the many techniques and approaches that fall under the term ‘artificial intelligence’. Strictly speaking, these are a variety of very different approaches that range from symbolic logic, over automated planning up to the broad field of machine learning (ML). Nowadays most effort, hype and money goes into deep learning (DL): a subfield of ML that uses multi-layered artificial neural networks to discover statistical correlations (aka patterns) based on very large amounts of training data in order to reproduce those patterns later.
Large language models (LLM) and related methods for generating images, videos and speech now make it possible to apply this idea to completely unstructured data. While traditional ML methods often managed with a few dozen parameters, these models now work with several trillion (10^12) parameters. In order for this to produce the desired results, both the amount of training data and the training duration must be increased by several orders of magnitude.
This brings us to the definition of what we mean by ‘GenAI’ in this article: Hyperscaled models that can only be developed, trained and deployed by a handful of companies in the world. These are primarily the GenAI services provided by OpenAI, Anthropic, Google and Microsoft, or based on these services. We also focus primarily on language models; the generation of images, videos, speech and music plays only a minor role in this article.
Our focus on hyperscale services does not mean that other ML methods are free of ethical problems; however, we are dealing with a completely different order of magnitude of damage and risk here. For example, there do exist variations of GenAI that use the same or similar techniques, but on a much smaller scale and restricted domains (e.g. AlphaFold 2). These approaches tend to bring more value with fewer downsides.
Basics
GenAI models are designed to interpolate and extrapolate 3, i.e. to fill in the gaps between training data and speculate beyond the limits of the training data. Together with the stochastic nature of the training data, this results in some interesting properties:
GenAI models ‘invent’ answers; with LLMs, we like to refer to this as ‘hallucinations’.
GenAI models do not know what is true or false, good or bad, efficient or effective, only what is statistically probable or improbable in relation to training data, context and query (aka prompt).
GenAI models cannot explain their output; they have no capability of introspection. What is sold as introspection is just more output, with the previous output re-injected.
GenAI models do not learn from you; they only draw from their training material. The learning experience is faked by reinjecting prior input into a conversation’s context 4.
The context, i.e. the set of input parameters provided, is decisive for the accuracy of the generated result, but can also steer the model in the wrong direction. Increasing the context window makes a query much more computation-intensive - likely in a quadratic way. Therefore, the promised increase of “maximum context window” of many models is mostly fake 5.
The reliability of LLMs cannot be fundamentally increased by even greater scaling 6.
Can LLMs think?
Proponents of the language-of-thought hypothesis 7 believe it is possible for purely language-based models to acquire the capabilities of the human brain – reasoning, modelling, abstraction and much more. Some enthusiasts even claim that today’s models have already acquired this capability. However, recent studies 8 9 show that today’s models are neither capable of genuine reasoning nor do they build internal models of the world. Moreover, “…according to current neuroscience, human thinking is largely independent of human language 10” and there is fundamental scientific doubt that achieving human cognition through computation is achievable in practice let alone by scaling up training of deep networks 11.
An example of a lack of understanding of the world is the prompt ‘Give me a random number between 0 and 50’. The typical GenAI response to this is ‘27’, and it is significantly more reliable than true randomness would allow. (If you don’t believe it, just try it out!) This is because 27 is the most likely answer in the GenAI training data – and not because the model understands what ‘random’ means.
‘Chain of Thought (CoT)’ approaches and ‘Reasoning models’ attempt to improve reasoning by breaking down a prompt, the query to the model, into individual (logical) steps and then delegating these individual steps back to the LLM. This allows some well-known reasoning benchmarks to be met, but it also multiplies the necessary computational effort by a factor between 30 and 700 12. In addition, multistep reasoning lets individual errors chain together to form large errors. And yet, CoT models do not seem to possess any real reasoning abilities 13 14 and improve the overall accuracy of LLMs only marginally 15.
The following thought experiment from 16 underscores the lack of real “thinking” capabilities: LLMs have simultaneous access to significantly more knowledge than humans. Together with the postulated ability of LLMs to think logically and draw conclusions, new insights should just fall from the sky. But they don’t. Getting new insights from LLMs would require these to be already encoded in the existing training material, and to be decoded and extracted by pure statistical means.
What LLMs are good at
Undoubtedly, LLMs represent a major qualitative advance when it comes to extracting information from texts, generating texts in natural and artificial languages, and machine translation. But even here, the error rate, and above all the type of error (‘hallucinations’), is so high that autonomous, unsupervised use in serious applications must be considered highly negligent.
GenAI as a knowledge source
As we have pointed out above, LLMs cannot differentiate between true and false - regardless of the training material. It does not answer the question “What is XYZ?” but the question “How would an answer to question ‘What is XYZ?’ look like?”. Nevertheless, many people claim that the answers that ChatGPT and alike provide for the typical what-how-when-who queries are good enough and often better than what a “normal” web search would have given us. Arguably, this is the most prevalent use case for “AI” bots today. The problem is that most of the time we will never learn about the inaccuracies, left-outs, distortions and biases that the answer contained - unless we re-check everything, which defies the whole purpose of speeding up knowledge retrieval. The less we already know, the better the “AI’s” answer looks to us, but the less equipped we are to spot the problems. A recent by the BBC and 22 Public Service Media organizations shows that 45% of all “AI” assistants’ answers on questions about news and current affairs have significant errors 17.
Moreover, LLMs are easy prey for manipulation - either by the service providing organization or by third parties. A recent study claims that even multi-billion-parameter models can be “poisoned” by injecting just a few corru
[truncated for AI cost control]