Kimi K2 Thinking Just Punked GPT-5 (and Everyone Else)
OpenAI Burned Billions. China Did It for the Price of a Tesla.
One-Line Flow:
A Chinese open-weight model just out-reasoned GPT-5 on Humanity’s Last Exam — and yes, it runs 15 tps on dual Mac M3 Ultras.
Ahh, I know.
- This thing’s like ChatGPT got cloned in China, drank five energy drinks, and now it’s beating the original — while being open enough for you to mess with.

Dumb Mode Dictionary (for the rest of us)
Summary
Alright, calm down, Einstein — here’s the plain-English version so your brain doesn’t short-circuit halfway through.
Kimi K2 Thinking:
It’s a new Chinese AI model — kinda like ChatGPT, but cheaper, faster, and freakishly smart.
“Open-weight”:
Means you can download it and run it yourself if you’ve got a monster PC.
Not fully open-source, but close enough for geeks to scream “FREEDOM.”
“Humanity’s Last Exam”:
A super-hard test made to see if an AI can think like a human.
Kimi scored higher than GPT-5, which basically means it thinks better under pressure.
“Tokens”:
Think of them as tiny words or pieces of text.
AI models eat them like snacks — and you pay per snack.
Kimi’s snacks are cheap: about $0.60 per million eaten and $2.50 per million spoken back.
“15 tps on dual M3 Ultras”:
It spits out about 15 tokens (roughly words) per second on a pair of Apple’s M3 Ultra chips.
That’s insane speed for something not running in a data center.
“Heavy mode”:
Basically “exam mode” — it double-checks itself and argues in its own head before answering.
That’s why the 51% score sounds small but is actually huge.
“INT4 quantization” and “Mixture-of-Experts”:
Just fancy ways of saying “it’s trained smart and compressed tight” —
so it runs fast without forgetting how to think.
“BrowseComp,” “SWE-Bench,” “LiveCodeBench”:
Nerd tests for coding, browsing, and logic — and Kimi beat or tied GPT-5 in most of them.
“$4.6M training cost”:
That’s what it cost to teach Kimi everything — pocket change in AI world.
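Since tokens are snacks and snacks cost money, here’s the snack math in a few lines. The rates come straight from the list above; everything else is plain arithmetic.

```python
# Back-of-the-envelope token pricing at Kimi K2's quoted rates.
IN_RATE = 0.60 / 1_000_000   # dollars per input token ("eaten")
OUT_RATE = 2.50 / 1_000_000  # dollars per output token ("spoken back")

def chat_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted rates."""
    return input_tokens * IN_RATE + output_tokens * OUT_RATE

# A long chat (10k tokens in, 2k tokens out) costs about a cent:
print(f"${chat_cost(10_000, 2_000):.4f}")  # $0.0110
```

Swap in GPT-5’s rates and the same function shows why resellers get excited about the gap.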
What’s Changing
Meet Kimi K2 Thinking — Moonshot AI’s monster of a model that just shoved its way past GPT-5 and Claude on multiple leaderboards.
It hit 51% on Humanity’s Last Exam (heavy mode) — a setting that uses 8 parallel samples + reflection.
In standard mode with tools, K2 scores 44.9% vs GPT-5’s 41.7%.
It writes, reasons, and codes like a caffeinated monk —
while clocking ~15 tokens per sec on dual M3 Ultras (pipeline parallel + INT4 quantization).
And the cost? $0.60 per million input tokens and $2.50 per million output tokens.
Benchmarks worth noticing:
- BrowseComp: 60.2 (vs GPT-5 54.9 | Claude Sonnet 4.5 32.0)
- SWE-Bench Verified: 71.3 (vs GPT-5 74.9)
- LiveCodeBench v6: 83.1 (vs GPT-5 87.0)
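That “8 parallel samples + reflection” trick is basically best-of-n self-consistency. Here’s a toy sketch with a stubbed model; the majority vote is my stand-in for whatever aggregation Moonshot actually uses (the real thing adds a reflection pass over the samples).

```python
import random
from collections import Counter
from typing import Callable

def heavy_mode(sample: Callable[[str], str], prompt: str, n: int = 8) -> str:
    """Draw n independent answers and return the most common one.

    Majority voting is the simplest stand-in for heavy mode's
    sample-then-reflect aggregation.
    """
    answers = [sample(prompt) for _ in range(n)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

# Stub "model": flaky, right about 3 times out of 4.
random.seed(0)
def flaky(prompt: str) -> str:
    return random.choice(["42", "42", "42", "41"])

print(heavy_mode(flaky, "life, the universe, everything"))  # usually "42"
```

The point: a model that’s right 75% of the time per sample is right far more often after 8 votes, which is why the heavy-mode score jumps.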
For Hackers
If you ever said “open source can’t touch GPT” — congrats, you’re officially vintage.
This beast is open-weight, fast, and cheap — like if DeepSeek and Claude had a disciplined kid raised on Baidu data.
Hands-on review:
LocalLLaMA Reddit review
Official blog
Highlights
- #1 on Humanity’s Last Exam (heavy mode)
- Beats DeepSeek-V3.2, Claude 4.5 Thinking, GPT-5, Grok-4
- Excels at reasoning, writing & browser tasks
- Open-weight: grab the weights from Hugging Face, or hit it via OpenRouter’s API
- Runs crazy fast on Apple silicon with INT4 quantization
Under the Hood
- Architecture: 1 trillion params (32B active per token)
- Mixture of Experts: 384 experts — 8 activated per token
- Context window: 256 K tokens
- Quantization: Native INT4 (QAT)
- Training cost: ≈ $4.6 million
- Hardware reqs: > 512 GB RAM + ≥ 32 GB VRAM for 4-bit local runs
- Model size: ≈ 600 GB
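The spec sheet above almost explains itself: 1T parameters at 4 bits apiece is ~500 GB of raw weights (the ~600 GB download adds embeddings and overhead), and only 32B of those parameters fire per token. A rough sanity check — my arithmetic, not official numbers:

```python
# Rough weight-memory math from the spec sheet above.
total_params = 1e12    # 1 trillion parameters
active_params = 32e9   # 32B activated per token (8 of 384 experts)
bits_per_param = 4     # native INT4 (QAT)

weights_gb = total_params * bits_per_param / 8 / 1e9
active_gb = active_params * bits_per_param / 8 / 1e9

print(f"all weights : ~{weights_gb:.0f} GB")      # ~500 GB of raw weights
print(f"per token   : ~{active_gb:.0f} GB read")  # only ~16 GB touched per token
```

Touching ~16 GB per token instead of ~500 GB is the whole Mixture-of-Experts bet, and it’s why 15 tps on Apple silicon is even possible.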
Why It Matters
This isn’t “China catching up.”
It’s China overtaking — with open weights instead of closed walls.
While OpenAI spends billions hiding its secret sauce, open models are eating the leaderboard in plain sight.
And the internet lost it:
“OpenAI needs 5 trillion dollars to compete with China”
“Open models will win the race. I hope OpenAI gets crushed.”
“Sure, just grab 10 H100s and you’re good to go!”

Cool. They Got Rich on Free GPUs… Now What the Hell Do We Do? * (⊙_◎)
Maybe we just pretend to ‘benchmark’ it and accidentally start a startup.

The “Cheaper Than ChatGPT” Resell Trick
- Spin up Kimi K2 on OpenRouter, slap a fancy UI on top, call it “AI Ghostwriter Ultra,” and sell access for $5/month.
- You pay ~$0.002 per convo; users think you’re running magic.
- Example: A kid in Vietnam already did this — made $800 in a week selling an “AI Love Letter Generator” using open models via the OpenRouter API.
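If you want to try that hustle, a minimal stdlib-only backend looks something like this. OpenRouter speaks the OpenAI chat-completions dialect; the model slug below is my assumption, so check their catalog before shipping your “Ultra.”

```python
# Minimal "resell" backend sketch: forward one user message to Kimi K2
# through OpenRouter's OpenAI-compatible chat endpoint (stdlib only).
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "moonshotai/kimi-k2-thinking"  # assumed slug -- verify on OpenRouter

def build_request(user_message: str, api_key: str) -> urllib.request.Request:
    """Package one user turn as an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def ask(user_message: str, api_key: str) -> str:
    """Send the request and pull out the assistant's reply."""
    with urllib.request.urlopen(build_request(user_message, api_key)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Wrap `ask()` behind any web framework and a Stripe button and that’s the whole “product.”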
The “Ghosted by GPT, Courted by Kimi” Flip
- Sell AI girlfriend/boyfriend chat clones fine-tuned on K2 — no bans, no “let’s keep this appropriate” filters, no judgment.
- Just algorithmic affection wrapped in poetic replies.
- Example: A crew in South Korea launched Telegram-based AI K-pop idols trained on K2 — each “idol” flirts, remembers your birthday, and texts daily. They now pull in $9K/month from micro-subs.
The “Fake University Diploma” Trick (Legal Edition)
- Spin up a fake-serious “academy” powered by K2. Let it auto-grade essays, run quizzes, and email certificates labeled “Certified in AI Reasoning.”
- Everyone loves credentials — no one checks the backend.
- Example: A small team in Nigeria sold over 3,000 $19 certificates to Dubai expats through their AI-graded “Business Analytics Institute.” It was just K2 behind a WordPress site.
The “CEO Clone” Strategy
- Scrape a famous founder’s interviews, feed them to K2, and launch a chatbot that sounds exactly like them.
- Startup bros will pay just to ask “AI-Sam Altman-v3” how to raise Series A funding.
- Example: In Estonia, AskElon.AI trained K2 on Musk’s interviews. They didn’t charge users — they sold user question data to ad agencies hunting for startup trends.
The “Reverse Therapy” Service
- Build a chatbot that roasts users like a brutally honest friend — powered by K2’s savage reasoning.
- Brand it as “AI That Doesn’t Lie to You.”
- People pay for the burn and stay for the self-hate subscription.
- Example: In Japan, the indie app TellMeOff lets users get insulted daily by an AI “therapist.” $2 per roast. Went viral because people actually thanked it for emotional damage.
Final Thought / Uncommon Logic
OpenAI built a black box.
China built a mirror.
Guess which one the world’s about to stare into.
In Short
Kimi K2 Thinking isn’t just a benchmark fluke — it’s proof that open weights can outthink closed systems.
The AI race is no longer West vs East. It’s Closed vs Open — and Open just landed a headshot.