AI News - Fri Jun 26 2026

Top Story: Google Rewrote the Science Benchmarks — While Everyone Looked the Other Way

 

The Tech‑Reader AI Digest

Friday, June 26, 2026

#AI #TechNews #Digest





The week ends on pure industry. Google rewrote the science benchmark leaderboard and almost nobody noticed. SpaceX's Colossus has become one of the most valuable infrastructure businesses on earth. And OpenAI just became a chip company.


Story 1: Google Rewrote the Science Benchmarks — While Everyone Looked the Other Way

What happened: On June 22, Google launched Gemini 2.5 Pro with Deep Think, its most capable model to date. The benchmark results rewrote the leaderboard: MMLU-Pro at 89.8% — highest of any publicly available model. GPQA Diamond at 82.4% — surpassing GPT-5.5's 76.3%. HumanEval+ at 94.1% — highest ever recorded. The 2-million-token context window is double what any competitor currently offers.

Deep Think is Google's extended reasoning mode. It runs internal chain-of-thought before generating output, specifically boosting hard science, math, and complex reasoning tasks. Google says Deep Think improves accuracy on complex multi-step problems by 15–25% compared to standard mode, at the cost of 3–5x longer response times and approximately 4x the token usage. The model is available now to Google AI Ultra subscribers in the Gemini app, and via the Gemini API and Vertex AI for enterprise developers. Pricing runs at approximately $2.50/$15 per million tokens in standard mode, with Deep Think usage billed at approximately 4x that rate.
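The pricing figures above can be turned into a rough per-request estimate. This is a minimal sketch, not Google's billing logic. It assumes the quoted $2.50/$15 pair means input and output rates respectively, and that the "approximately 4x" Deep Think rate is a flat multiplier on the standard bill. The function name and the token counts in the example are hypothetical.

```python
# Rates quoted in the digest, in USD per million tokens.
STANDARD_INPUT_USD_PER_M = 2.50    # assumed to be the input-token rate
STANDARD_OUTPUT_USD_PER_M = 15.00  # assumed to be the output-token rate
DEEP_THINK_MULTIPLIER = 4.0        # "approximately 4x", treated as a flat multiplier


def estimate_cost_usd(input_tokens: int, output_tokens: int, deep_think: bool = False) -> float:
    """Estimate the cost of one request under the assumptions above."""
    multiplier = DEEP_THINK_MULTIPLIER if deep_think else 1.0
    input_cost = input_tokens / 1_000_000 * STANDARD_INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * STANDARD_OUTPUT_USD_PER_M
    return (input_cost + output_cost) * multiplier


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
standard = estimate_cost_usd(200_000, 20_000)
deep = estimate_cost_usd(200_000, 20_000, deep_think=True)
print(f"standard: ${standard:.2f}, Deep Think: ${deep:.2f}")
# prints: standard: $0.80, Deep Think: $3.20
```

The digest does not say whether the 4x billing already reflects the roughly 4x token usage or is charged on top of it, so treat the Deep Think figure as a lower bound until the official rate card is checked.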

The practical benchmark picture heading into the weekend: Gemini 2.5 Pro Deep Think leads on science and reasoning. Fable 5 leads on software engineering at 88.6% SWE-bench Verified and 88.0% Terminal-Bench 2.1 — but is offline. GPT-5.6 Sol is the unknown quantity, arriving today in limited preview with no public benchmark card yet. For teams building in life sciences, financial analysis, graduate-level research, or hard math, the Gemini 2.5 Pro Deep Think result is the most consequential model development of the week.

Why it matters: Gemini 2.5 Pro is the most underrated frontier model of 2026 — and that is not a compliment to Google's marketing. While ChatGPT and Claude dominate consumer mindshare, Gemini 2.5 Pro has quietly matched or beaten them on nearly every major benchmark, while offering something neither can touch: a genuine 2-million-token context window and native video understanding. The launch landed on one of the most news-saturated days of the AI year and generated almost no developer conversation. That is a distribution problem, not a capability problem. The model that leads graduate-level science benchmarks deserved a better news cycle than the one it got.

Aaron's take — Google dropped a benchmark-leading model on the same day the AI industry was watching two other stories and got essentially zero developer attention for it. GPQA Diamond at 82.4% is a real number. MMLU-Pro at 89.8% is a real number. A 2-million-token context window that nobody else is matching is a real advantage. The irony is that Google's biggest problem in AI has never been what it builds — it has been whether anyone is paying attention when it ships. This week is the clearest possible example of that problem. Demis Hassabis has a model that leads the leaderboard on the benchmarks that matter for serious scientific work. The news cycle did not cooperate. The benchmarks do not care.


Story 2: The xAI Compute Empire — $80 Billion in Committed Revenue Through 2029

What happened: The full scale of SpaceX's Colossus infrastructure business came into focus this week with the Reflection AI deal details. SpaceX signed a compute lease with Reflection AI on June 22 for $150 million per month starting July 1, 2026, through the end of 2029, totaling approximately $6.3 billion if the contract runs its full term. Reflection gains immediate access to Nvidia GB300 chips at SpaceX's Colossus 2 facility in Memphis, Tennessee.
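The $6.3 billion total follows directly from the monthly rate and the term. A quick check, assuming "through the end of 2029" means billing for every calendar month from July 2026 through December 2029 inclusive (the helper name is illustrative):

```python
from datetime import date

MONTHLY_FEE_USD = 150_000_000  # stated lease rate per month


def months_inclusive(start: date, end: date) -> int:
    """Count calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


lease_start = date(2026, 7, 1)   # stated start date
lease_end = date(2029, 12, 31)   # "end of 2029", read here as December 31

billing_months = months_inclusive(lease_start, lease_end)
contract_total_usd = billing_months * MONTHLY_FEE_USD

print(billing_months)                        # 42
print(f"${contract_total_usd / 1e9:.1f}B")   # $6.3B
```

Six months of 2026 plus three full years gives 42 months, and 42 times $150 million is $6.3 billion, matching the reported figure.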

This is the fourth major external compute lease SpaceX has signed for Colossus, following Anthropic (approximately $45 billion through mid-2029), Google (approximately $30 billion through 2029, at $920 million per month), and Cursor. SpaceX's committed compute revenues from outside clients now exceed $80 billion through 2029. Colossus was originally built to train Grok; SpaceX has since converted the facility into one of the largest commercial AI compute platforms in the world.
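The "exceeds $80 billion" claim can be sanity-checked from the figures reported above. This sketch uses only the disclosed amounts. The Cursor lease has no dollar figure in this digest, so it is left as unknown rather than guessed.

```python
# Figures in billions of USD, as reported above; None marks an undisclosed value.
commitments_busd = {
    "Anthropic": 45.0,
    "Google": 30.0,
    "Cursor": None,        # no figure given in this digest
    "Reflection AI": 6.3,  # $150M per month over the full term
}

disclosed_total = sum(v for v in commitments_busd.values() if v is not None)
undisclosed = [name for name, v in commitments_busd.items() if v is None]

print(f"disclosed total: ${disclosed_total:.1f}B")  # disclosed total: $81.3B
print("undisclosed:", undisclosed)                  # undisclosed: ['Cursor']
```

The three disclosed leases alone clear the $80 billion mark, so the headline number holds regardless of the size of the Cursor deal.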

Reflection AI was founded by veterans of Google DeepMind and is backed by Nvidia, Sequoia, and Lightspeed. Its thesis: American governments, banks, and defense contractors want frontier-capable AI with open weights, won't use closed US labs for data sovereignty reasons, and won't use Chinese open-weight models for security reasons. Reflection is positioning as the third option: American, open-weight, and frontier-scale. The company is now raising at a $25 billion valuation.

Why it matters: The Colossus revenue picture reframes what SpaceX's AI strategy actually is. Colossus began as a cluster for training Grok, but what Musk ended up with is a commercial infrastructure platform carrying more than $80 billion in committed external revenue, with Grok training running alongside it. The ultimate goal of the OpenAI and Broadcom partnership is to deploy gigawatt-scale data centers, meaning facilities whose power draw is comparable to that of a city. SpaceX is already operating at that scale and leasing capacity to four of the most strategically significant AI tenants on earth. The vertical integration picture for xAI (frontier models, developer tooling via Cursor, and now a dominant compute platform) is becoming one of the most consequential infrastructure bets in the industry.

Aaron's take — Eighty billion dollars in committed compute revenues from four external tenants through 2029. That number alone makes Colossus one of the most valuable infrastructure businesses in technology, independent of anything xAI does with Grok. Musk started building Colossus because he couldn't get enough Nvidia GPUs to train his models competitively. He ended up building the facility that Google, Anthropic, Cursor, and now Reflection are all paying billions to access. The Reflection deal is the most strategically interesting of the four — an American open-weight lab with DeepMind DNA, backed by Nvidia, positioning itself as the sovereignty-safe alternative for governments and enterprises that need frontier AI without the export control exposure. SpaceX gets paid regardless of which tenant wins. That is a very comfortable position.


Story 3: OpenAI Becomes a Chip Company — Jalapeño Lands in Nine Months

What happened: OpenAI and Broadcom unveiled Jalapeño on June 24 — OpenAI's first custom Intelligence Processor, an accelerator architected specifically around LLM inference. The chip went from initial design to manufacturing tape-out in just nine months, which the companies describe as potentially the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors.

The chip is an ASIC (application-specific integrated circuit) purpose-built for inference rather than training. It is less flexible than Nvidia's GPUs but less expensive, and it is optimized for the specific task of serving AI models to users at scale. OpenAI President Greg Brockman told CNBC the chip was designed with significant help from OpenAI's own AI models. "The degree to which our models have been able to accelerate it was very surprising to us," Brockman said.

Jalapeño is the first step in a multi-generation compute platform designed for initial deployment by the end of 2026, combining OpenAI-designed accelerators with Broadcom silicon implementation and networking technologies. Early testing shows performance per watt substantially better than current state-of-the-art. Broadcom CEO Hock Tan confirmed gigawatt-scale deployment with Microsoft and other partners beginning in 2026. Microsoft has been confirmed as a primary deployment partner and is reported to have guaranteed 40% of the first production run.

Why it matters: By finalizing the Jalapeño design, OpenAI is seeking to move beyond being a software lab and compete directly with the largest cloud and infrastructure companies. The ultimate goal is to deploy gigawatt-scale data centers with Microsoft and other partners: facilities drawing a gigawatt or more of power, populated with Jalapeño-based inference clusters. Every dollar OpenAI spends on Nvidia inference hardware is a dollar it cannot spend on research or return to investors. Jalapeño is a direct attack on that cost structure. If it delivers the performance-per-watt gains claimed in early testing, it changes the unit economics of serving GPT-5.6 Sol at scale.

Aaron's take — Nine months from design to tape-out. OpenAI used its own AI models to design a chip that will run its AI models faster and cheaper. That recursive loop — AI accelerating AI infrastructure — is the most important long-term signal in this announcement, more than any specific benchmark number. Brockman framed it correctly: the world is moving to a compute-powered economy, and the companies that control the physics of their inference pipeline control their own destiny. OpenAI has spent four years as Nvidia's largest and most dependent customer. Jalapeño is the first credible step toward changing that relationship. It will not replace Nvidia overnight — training still runs on Nvidia hardware and likely will for years. But inference is where the revenue is, and inference is exactly what Jalapeño was built for.


Quick Hits — The Rest of Today's AI World

Anthropic / Claude

  • Fable 5 and Mythos 5 remain offline (Day 14). No restoration as of press time; July 8 Persona ID verification remains the most likely structural path forward.

OpenAI

  • GPT-5.6 Sol, Terra, and Luna announced today — limited preview to approximately 20 trusted partners. Sol at $5/$30, Terra at $2.50/$15, Luna at $1/$6 per million tokens. General availability in ChatGPT and API planned in coming weeks.
  • GPT-4.5 retired from ChatGPT today.
  • Jalapeño chip: deployment begins end of 2026. Microsoft confirmed as primary partner.

xAI / SpaceX

  • Colossus committed external compute revenues exceed $80B through 2029 — Anthropic, Google, Cursor, Reflection AI.
  • Nasdaq down 4.6% on the week. SPCX near $2 trillion market cap amid tech volatility.

Gemini (Google)

  • Gemini 2.5 Pro Deep Think: GPQA Diamond 82.4%, MMLU-Pro 89.8%, HumanEval+ 94.1%. Available now to Google AI Ultra subscribers.
  • Gemini 3.5 Pro — four days remain in June. Still no GA announcement.

Microsoft / GitHub Copilot

  • DeepSeek V4 evaluation for Copilot Cowork — no final decision confirmed. GitHub AWS capacity arrangement ongoing.

DeepSeek / Alibaba Qwen / Z.ai

  • GLM-5.2 and MiniMax M2.5 continue absorbing displaced enterprise developer traffic.

Cohere / Aleph Alpha

  • $20B merger pending regulatory approval.

Presight AI

  • Banco Santander and Presight sign memorandum of understanding to explore strategic cooperation in artificial intelligence.
  • Abu Dhabi Chamber and Presight — a G42 company — sign a strategic partnership to deploy agentic AI across 102,000 SMEs.

That's your AI world for Friday. Back Monday. - Aaron





    Aaron Rose is a software engineer and technology writer at tech-reader.blog


