2026-07-04 05:59 UTCIn-site rewrite7 min readUpdated: 2026-07-04 06:40 UTC

How AI models would vote in Sweden

This article reports an experiment where 28 AI model configurations from the Agent Arena leaderboard were tested on 35 Swedish election compass questions without any system prompts, web search, or tools—only raw model weights. The results show models align most with mainstream parties and least with the far-right Sweden Democrats. Reasoning settings significantly affect outcomes.

SourceHacker News AIAuthor: urvader

Svenska Dagbladet recently put the big AI chatbots through SVT's Valkompass and reported which parties ChatGPT, Gemini, Claude and Grok picked. It is a nice idea. It also measures something narrower than the headline suggests.

The chat apps they used are not raw models. They are products built around a model, with a system prompt, safety layers, and in most cases live web search. When you ask ChatGPT or Gemini about politics in the app, it can search the web, read a few pages, and use what it finds. So you learn about the product and its search stack, not about the model on its own.

The chat window is also not where models do most of their work. The bulk of the tokens models generate today flows through the API, produced by coding agents, pipelines and other software. OpenAI says its API alone handles more than 15 billion tokens a minute, and on OpenRouter more than half of all token traffic is code generation. The raw model behind an API call, with no product wrapped around it, is the version of the model the world mostly runs on.

We were curious about the narrower question. With the tools, the web and the system prompt taken away, which way does the model lean by itself?

What we changed

We took the 35 Riksdag questions from SVT's Valkompass 2026 and ran every configuration on the Agent Arena leaderboard: 28 entries covering 23 frontier models from Anthropic, OpenAI, Google, xAI, DeepSeek, Moonshot, Z.ai, MiniMax, Alibaba and NVIDIA. Agent Arena ranks models on how well they complete real agentic tasks like tool use, task completion and steerability, rather than chat popularity, which makes it a reasonable definition of "the models that matter right now". Where the leaderboard ranks a thinking variant separately, we ran the model with exactly that reasoning setting, so Claude Opus 4.8 and Claude Opus 4.8 (Thinking) are separate rows here just as they are there. Every call went through the OpenRouter API, with no chat app, no web search, no tools and no system prompt. Only the weights.

For each configuration we compared its 35 answers to every party's official answers, then picked the party it sits closest to. The charts below have the full picture. (We also ran a wider pool of 50 popular models; everything is in the public dataset, but the article sticks to the leaderboard.)

What the models pick

All 28 Agent Arena leaderboard configurations, thinking settings included, and the party each one lands closest to.

1Claude Fable 5 (High)AnthropicLiberalerna

2Claude Opus 4.8 (Thinking)AnthropicLiberalerna

3GPT 5.5 (xHigh)OpenAIVänsterpartiet

4Claude Opus 4.7AnthropicModeraterna

5Claude Opus 4.7 (Thinking)AnthropicModeraterna

6GPT 5.5 (High)OpenAIVänsterpartiet

7GLM 5.2 (Max)ZhipuVänsterpartiet

8GPT 5.4 (High)OpenAIVänsterpartiet

9Claude Opus 4.6AnthropicSocialdemokraterna

10GPT 5.5OpenAIVänsterpartiet

11Claude Opus 4.8AnthropicLiberalerna

12Claude Sonnet 4.6AnthropicVänsterpartiet

13GLM 5.1ZhipuMiljöpartiet

14Kimi K2.7 CodeMoonshotMiljöpartiet

15Gemini 3.1 Pro PreviewGoogleSocialdemokraterna

16Gemini 3.5 FlashGoogleSocialdemokraterna

17DeepSeek V4 FlashDeepSeekMiljöpartiet

18Kimi K2.6MoonshotMiljöpartiet

19Minimax M3MiniMaxCenterpartiet

20DeepSeek V4 ProDeepSeekSocialdemokraterna

21Qwen 3.6 PlusAlibabaLiberalerna

22Grok 4.3 (High)xAICenterpartiet

23Grok Build 0.1xAIModeraterna

24Gemini 3 FlashGoogleModeraterna

25Minimax M2.7MiniMaxVänsterpartiet

26Nemotron 3 UltraNVIDIAMiljöpartiet

27Gemma 4 31BGoogleMiljöpartiet

28Grok 4.3xAIModeraterna

Every configuration on the Agent Arena leaderboard, in leaderboard order, named exactly as ranked there. Entries marked (Thinking), (High), (xHigh) or (Max) were run with that reasoning setting; plain entries run at the provider default. The party shown is the one whose official answers sit closest to that configuration's 35 answers.

Which parties the models lean toward

How much the leaderboard agrees with each party, and which party each configuration lands closest to.

Average agreement with each party, all 28 configurations

Centerpartiet

69.0%

Socialdemokraterna

68.8%

Vänsterpartiet

68.8%

Miljöpartiet

68.5%

Liberalerna

68.3%

Moderaterna

67.2%

Kristdemokraterna

56.7%

Sverigedemokraterna

48.7%

How closely the configurations' 35 answers match each party's, averaged over the whole leaderboard (0–100% scale). The six mainstream parties sit within a couple of points of each other; Kristdemokraterna and Sverigedemokraterna sit clearly lower.

Closest match: how many configurations land nearest each party

Vänsterpartiet

Miljöpartiet

Moderaterna

Socialdemokraterna

Liberalerna

Centerpartiet

Kristdemokraterna

Sverigedemokraterna

For each configuration we take its single best-matching party. None land closest to Kristdemokraterna or Sverigedemokraterna.

By company

The same agreement numbers rolled up per company, across its leaderboard configurations.

Company

Anthropic

7 configurations

68.01 pick

72.91 pick

68.7

71.3

73.93 picks

62.6

74.72 picks

54.6

Google

4 configurations

65.8

71.32 picks

66.51 pick

67.1

68.9

59.7

69.01 pick

53.0

OpenAI

4 configurations

74.24 picks

70.2

69.7

70.3

68.2

54.6

66.0

43.9

xAI

3 configurations

55.8

62.1

57.0

69.71 pick

74.2

65.5

73.72 picks

57.7

DeepSeek

2 configurations

73.2

69.71 pick

73.51 pick

70.8

66.5

53.3

62.3

42.5

MiniMax

2 configurations

73.21 pick

67.4

69.6

68.31 pick

63.3

51.5

60.1

43.3

Moonshot

2 configurations

66.2

59.3

69.32 picks

67.2

59.8

47.1

57.6

37.8

Zhipu

2 configurations

80.01 pick

70.0

79.31 pick

65.4

56.7

43.1

52.6

38.6

Alibaba

1 configuration

71.3

68.6

69.6

72.11 pick

59.1

71.3

49.0

NVIDIA

1 configuration

66.0

62.9

69.01 pick

60.5

61.2

47.1

61.9

46.0

The big number is the average agreement per party across the company's leaderboard configurations; "picks" counts how many of them land closest to that party. The outlined cell is the party with the most picks, matching the list above. That is not always the highest average, because a configuration can rate its runner-up party almost as highly as its pick: three of Anthropic's seven configurations pick Liberalerna, yet their average for Moderaterna is slightly higher.

Model–party agreement

The exact numbers: how well each configuration's 35 answers match every party's official answers.

Model

Claude Fable 5 (High)

Anthropic

67.6

74.5

68.8

75.5

76.7

64.5

76.4

56.7

Claude Opus 4.8 (Thinking)

Anthropic

68.6

75.5

69.8

74.5

75.7

63.6

75.5

55.7

GPT 5.5 (xHigh)

OpenAI

74.3

69.8

70.7

68.1

54.0

64.0

42.4

Claude Opus 4.7

Anthropic

64.5

71.9

65.7

72.9

76.0

65.7

81.4

59.8

Claude Opus 4.7 (Thinking)

Anthropic

63.6

72.9

66.7

70.0

75.0

64.8

78.6

58.8

GPT 5.5 (High)

OpenAI

75.2

70.7

71.7

69.0

53.1

66.9

43.3

GLM 5.2 (Max)

Zhipu

79.3

66.7

76.7

63.8

57.9

39.5

51.4

33.6

GPT 5.4 (High)

OpenAI

72.4

69.8

67.9

68.8

68.1

59.8

67.9

48.1

Claude Opus 4.6

Anthropic

70.5

71.2

67.9

69.3

70.5

60.2

70.2

50.0

GPT 5.5

OpenAI

75.0

70.5

70.0

67.4

51.4

65.2

41.7

Claude Opus 4.8

Anthropic

66.7

71.7

67.9

72.6

75.7

65.5

75.5

55.7

Claude Sonnet 4.6

Anthropic

74.8

72.6

74.0

64.5

67.6

53.6

65.5

45.7

GLM 5.1

Zhipu

80.7

73.3

81.9

67.1

55.5

46.7

53.8

43.6

Kimi K2.7 Code

Moonshot

69.3

64.3

70.5

67.6

63.6

49.0

58.1

36.4

Gemini 3.1 Pro Preview

Google

71.7

74.3

69.0

71.4

71.2

60.5

68.6

50.7

Gemini 3.5 Flash

Google

66.0

74.3

67.1

73.3

71.2

64.3

74.3

56.4

DeepSeek V4 Flash

DeepSeek

74.0

66.2

75.2

69.0

62.6

50.5

54.8

35.0

Kimi K2.6

Moonshot

63.1

54.3

68.1

66.7

56.0

45.2

57.1

39.3

Minimax M3

MiniMax

71.2

68.6

72.4

64.0

51.9

61.9

43.6

DeepSeek V4 Pro

DeepSeek

72.4

73.1

71.7

72.6

70.5

56.0

69.8

50.0

Qwen 3.6 Plus

Alibaba

71.3

68.6

69.6

72.1

59.1

71.3

49.0

Grok 4.3 (High)

xAI

63.1

61.9

64.3

74.3

72.1

59.5

67.6

44.0

Grok Build 0.1

xAI

56.0

64.3

55.2

67.6

77.9

65.7

78.6

60.7

Gemini 3 Flash

Google

63.3

71.7

64.5

65.5

70.5

62.1

73.1

55.2

Minimax M2.7

MiniMax

75.2

66.2

70.6

64.2

62.5

51.0

58.3

42.9

Nemotron 3 Ultra

NVIDIA

66.0

62.9

69.0

60.5

61.2

47.1

61.9

46.0

Gemma 4 31B

Google

62.1

64.8

65.2

58.1

62.6

51.9

60.0

49.8

Grok 4.3

xAI

48.3

60.0

51.4

67.1

72.6

71.4

74.8

68.3

Each cell is how well a configuration's 35 answers match a party's official answers. The outlined cell in each row is its best-matching party.

Question by question

Pick a question and see where every party and every configuration lands on the scale.

Question 1 / 35

Barn från 13 år som begår grova brott ska kunna dömas till fängelse

Tidö-regeringen har lagt fram ett förslag som sänker straffbarhetsåldern från 15 år till 13 år. Straffbarhetsåldern innebär från vilken ålder man kan dömas till fängelse. Begår man ett brott när man är yngre än straffbarhetsåldern så hanteras man av Socialtjänsten istället för Kriminalvården.

Mycket dåligt förslag

Ganska dåligt förslag

Ganska bra förslag

Mycket bra förslag

Vänsterpartiet

Socialdemokraterna

Miljöpartiet

Centerpartiet

Liberalerna

Kristdemokraterna

Moderaterna

Sverigedemokraterna

Claude Fable 5 (High)

Claude Opus 4.8 (Thinking)

GPT 5.5 (xHigh)

Claude Opus 4.7

Claude Opus 4.7 (Thinking)

GPT 5.5 (High)

GLM 5.2 (Max)

GPT 5.4 (High)

Claude Opus 4.6

GPT 5.5

Claude Opus 4.8

Claude Sonnet 4.6

GLM 5.1

Kimi K2.7 Code

Gemini 3.1 Pro Preview

Gemini 3.5 Flash

DeepSeek V4 Flash

Kimi K2.6

Minimax M3

DeepSeek V4 Pro

Qwen 3.6 Plus

Grok 4.3 (High)

Grok Build 0.1

Gemini 3 Flash

Minimax M2.7

Nemotron 3 Ultra

Gemma 4 31B

Grok 4.3

What the answers show

The leaderboard does not pick a party. Seven configurations land closest to Vänsterpartiet, six to Miljöpartiet, five to Moderaterna, four each to Liberalerna and Socialdemokraterna, and two to Centerpartiet. None land closest to Kristdemokraterna or Sverigedemokraterna.

The averages behind that are strikingly flat. Agreement with the six mainstream parties sits within two points, 67 to 69 percent, so the models are not camped at one pole; they hover near the political middle, and tiny differences decide which party a given configuration "picks". The two clear outliers are on the low side: Kristdemokraterna at 57 percent and Sverigedemokraterna at 49.

Sverigedemokraterna is the party the models agree with least. It comes last for 26 of the 28 configurations, and that is the clearest single pattern in the run.

Reasoning settings matter more than expected. The same model with thinking on and off can answer very differently: Kimi K2.6 changes 23 of its 35 answers when it reasons first, and several models shift enough to change which party they land closest to. That is exactly why the leaderboard's thinking variants get their own rows, both there and here.

The answers hold still within a configuration. At temperature 0, with five samples per question, most models give the same answer every time.

How we asked

We wanted as little steering as possible. Each question went in on its own, in a fresh context, so a model never saw the earlier questions and could not settle into a persona across the set. There is no system prompt. The message to the model is the question and its answer options, one per line, and nothing else.

Here is a full request, exactly as it goes to the API:

The response_format block does the work. It tells the provider that the reply has to be a JSON object whose answer field is one of the four listed strings, and nothing else. Providers enforce this with constrained decoding. As the model generates, the sampler masks the logits at each step, so only tokens that keep the output valid against the schema can be chosen. A token that would start a fifth option, or a refusal, is simply not av

[truncated for AI cost control]