JAMES SCHLAUCH · CONSULTING PRACTICE

Signal.

What I track. Frontier model benchmarks, recent releases, San Diego AI/ML community signals, and the market moves that change my recommendations. Curated weekly. Last updated Jun 8, 2026 · 05:00 PM PT.

LIVE TRACKER 28 signals tracked

Frontier models score below 50% on ITBench-AA agentic enterprise IT tasks HIGH

Artificial Analysis and IBM Research released ITBench-AA, the first benchmark for agentic enterprise IT work (incident response, SRE, config remediation), and every frontier model scored under 50%. What changes for buyers: the gap between demo-grade agent reliability and production IT autonomy is now measurable — scope agent pilots to human-in-the-loop, not lights-out, until scores move.

Artificial Analysis × IBM Research May 26, 2026 Benchmarks
Open ASR Leaderboard adds private eval data to combat benchmark gaming MEDIUM

The Open ASR Leaderboard introduced private evaluation datasets to counter models optimized specifically for public test sets, a practice increasingly common as benchmark scores drive vendor selection. Operator note: any ASR vendor claiming leaderboard-ranked accuracy should be asked whether their eval data is public or private — the gap between public-set performance and real-distribution performance is now a known failure mode.

Hugging Face May 5, 2026 Benchmarks
Anthropic data: Claude sycophancy rate hits 38% in spirituality conversations MEDIUM

Anthropic internal research found overall Claude sycophancy rates of 9%, rising to 38% for spirituality topics and 25% for relationship discussions. Operator implication: any deployment where frank, unbiased advice is the value proposition — financial guidance, legal review, clinical second opinions — needs domain-specific sycophancy testing in the evaluation suite, not just generic safety evals.

Anthropic May 2, 2026 Frontier
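
One way to act on the domain-specific testing point above is to report sycophancy rates per domain rather than as a single blended number. The sketch below is illustrative only: the function name and the `(domain, flagged)` input shape are my assumptions, the boolean flags are presumed to come from whatever grader your evaluation suite already uses, and none of this reflects Anthropic's methodology.

```python
from collections import defaultdict
from typing import Iterable


def sycophancy_rate_by_domain(
    results: Iterable[tuple[str, bool]],
) -> dict[str, float]:
    """Return the share of responses flagged as sycophantic in each domain.

    `results` holds (domain, flagged) pairs; the flag is assumed to come
    from an external grader (human or automated).
    """
    totals: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for domain, is_sycophantic in results:
        totals[domain] += 1
        flagged[domain] += int(is_sycophantic)
    return {domain: flagged[domain] / totals[domain] for domain in totals}


if __name__ == "__main__":
    # Hypothetical graded results, not real measurements.
    sample = [
        ("finance", True),
        ("finance", False),
        ("legal", False),
        ("legal", False),
    ]
    print(sycophancy_rate_by_domain(sample))
```

A blended rate can hide a domain where the rate is several times higher, which is the pattern the Anthropic figures describe.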
LMSYS Chatbot Arena: April leaderboard tightens at the top MEDIUM

Frontier-tier Elo scores for the top three labs sit within 12 points of one another, which is inside statistical noise. For practical buyer decisions, the tie at the top means model selection should be driven by latency, cost, and tooling fit, not arena rank.

LMSYS Apr 27, 2026 Benchmarks
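
To put a 12-point gap in perspective, the standard Elo expected-score formula converts a rating difference into a head-to-head win probability. This is a minimal sketch: the helper name is hypothetical, and it assumes the leaderboard's scores follow the conventional 400-point Elo scale.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score for the higher-rated side (ties count as half a win).

    Uses the standard Elo logistic curve on the conventional 400-point
    scale, which is an assumption about how the leaderboard is scaled.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # A 12-point lead is only a slight edge over a coin flip.
    print(f"{elo_expected_score(12):.3f}")  # prints 0.517
```

Under those assumptions, the top-ranked model would be preferred in roughly 52 of 100 pairwise comparisons, which is why rank alone is a weak selection criterion here.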

Cisco standardizes on OpenAI Codex for AI-native development MEDIUM

Cisco is deploying OpenAI Codex across its engineering org to scale AI-native development, accelerate its AI Defense work, and automate defect remediation. Practical implication: large-enterprise coding-agent rollouts are moving from pilots to org-wide standardization — procurement and security review of agent tooling is becoming a board-visible line item.

OpenAI May 26, 2026 Frontier
Anthropic raises Claude usage limits, adds SpaceX compute capacity MEDIUM

Anthropic announced expanded Claude usage limits alongside a compute arrangement with SpaceX, easing capacity constraints that have throttled high-volume enterprise usage. Practical implication: rate-limit architectures built around the old ceiling are worth revisiting; the cost of retry logic and fallback chains may drop.

Anthropic May 5, 2026 Market
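
For readers who have not looked at their retry logic in a while, the pattern in question usually resembles the sketch below: exponential backoff against one provider, then falling through to the next. Everything here is hypothetical (the `RateLimited` exception, the provider callables, the default values); it stands in for whatever wrapper an integration already has and is not any vendor's SDK.

```python
import time
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises when throttled."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying throttled calls with
    exponential backoff before falling through to the next provider."""
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return provider(prompt)
            except RateLimited:
                # Back off base_delay, then 2x, 4x, ... between attempts;
                # skip the wait after the final attempt on this provider.
                if attempt < max_retries - 1:
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all providers exhausted")
```

With a higher ceiling, `max_retries`, `base_delay`, and the length of the provider chain are the knobs worth re-measuring before simplifying anything.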
Anthropic ships vertical-specific Claude agents for financial services HIGH

Anthropic introduced Claude agent capabilities tailored to financial services workflows, marking the lab's first major vertical-specific product release. Practical implication for regulated-industry buyers: procurement conversations will now reference a named product path rather than generic API integration; compliance and audit trail requirements should be scoped against Anthropic's published enterprise terms.

Anthropic May 4, 2026 Frontier
GPT-5.5 Instant ships as ChatGPT's new default model HIGH

OpenAI replaced ChatGPT's default model with GPT-5.5 Instant, citing improved accuracy and reduced hallucination rates alongside user-personalization controls. Operator note: default-tier API behavior may shift for existing integrations — benchmark against your current system prompt before assuming behavior is stable.

OpenAI May 4, 2026 Frontier
Anthropic moves Claude Opus 4.7 1M-context to GA HIGH

The 1M-token context window for Opus 4.7 leaves beta. Practical implication for active engagements: full-codebase RAG indexes can be replaced with single-prompt context loads on the 200K+ tokens-per-prompt path. Cache hit rate becomes the cost-determining variable.

Anthropic Apr 30, 2026 Frontier
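
The reason cache hit rate dominates cost on large-context prompts is simple arithmetic: input cost is a blend of the cached and uncached per-token prices, weighted by hit rate. The sketch below illustrates that blend. The function name and the prices in the example are placeholders, not Anthropic's published rates, and it ignores any cache-write surcharge and output-token cost.

```python
def blended_input_cost(
    prompt_tokens: int,
    cache_hit_rate: float,
    price_per_mtok: float,
    cached_price_per_mtok: float,
) -> float:
    """Estimate input cost for one prompt, given the fraction of tokens
    served from cache. Prices are per million tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    blended_price = (
        cache_hit_rate * cached_price_per_mtok
        + (1.0 - cache_hit_rate) * price_per_mtok
    )
    return prompt_tokens / 1_000_000 * blended_price


if __name__ == "__main__":
    # Placeholder prices (10.00 uncached, 1.00 cached per million tokens).
    for hit_rate in (0.0, 0.5, 0.9):
        cost = blended_input_cost(800_000, hit_rate, 10.0, 1.0)
        print(f"hit rate {hit_rate:.0%}: {cost:.2f} per prompt")
```

With those placeholder prices, an 800K-token prompt costs 8.00 with a cold cache and 1.52 at a 90% hit rate, so the same workload can differ by roughly 5x depending on how well the cache is used.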
Claude Opus 4.7 ships with 1M-token context HIGH

Anthropic's frontier reasoning model gains a 1M-token context window in beta. Practical implication: full-codebase analysis in a single prompt becomes viable for medium-sized monorepos. The long-context tier carries a pricing premium relative to the 200K-window tier, so cache hit rate becomes load-bearing for cost.

Anthropic Apr 28, 2026 Frontier

Simon Willison: the line between vibe coding and agentic engineering is blurring MEDIUM

Willison wrote that the distinction between vibe coding and professional agentic engineering has narrowed as coding agents become more reliable — skipping code review feels uncomfortable but increasingly common, like trusting another team's code without reading it. Practical implication: the differentiation between amateur and professional agentic work is shifting from 'does it work' to 'do you understand what it's doing and why.'

Simon Willison May 5, 2026 Frontier
MLX 3.0 ships unified-memory model serving on Apple Silicon MEDIUM

MLX 3.0 lands unified-memory model serving for Apple Silicon, collapsing CPU/GPU transfer overhead for on-device inference. For SoCal teams running edge-AI prototypes on M-series workstations, this changes the local-development cost curve and may shift some 'GPU-required' workflows back to laptop-class hardware.

Apple ML Research Apr 26, 2026 Tooling
Vercel AI SDK 5 ships streaming-tools-by-default MEDIUM

Vercel AI SDK 5 makes streaming tool calls the default pattern. For Astro/Next-based production AI surfaces, this collapses a meaningful chunk of glue code. Practical implication: prototypes ship a week earlier; production review cycles unchanged.

Vercel Apr 19, 2026 Tooling
Hugging Face: llms.txt adoption trends across top 1,000 sites MEDIUM

An independent crawl reports llms.txt adoption above 38% among top-1000 ranked sites in technical-content categories, up from under 8% at the start of Q1. Generative-engine optimization is no longer a trailing-edge bet.

Hugging Face Apr 11, 2026 GEO/AEO

FTC opens AI-disclosure rulemaking docket HIGH

The FTC opened a public comment period on proposed rules requiring disclosure of AI-generated content in commercial communications. This applies to any consumer-facing AI workflow; regulated-industry buyers should expect compliance-review activity to pick up within Q3.

Federal Trade Commission Apr 24, 2026 Governance

Anthropic and OpenAI show signs of durable product-market fit MEDIUM

Willison argues both frontier labs have found product-market fit, citing reports that Anthropic is approaching its first profitable quarter. What changes for buyers: the two leading model vendors are trending toward financial durability, which lowers the multi-year continuity risk of building core workflows on their APIs — though single-vendor lock-in still warrants an abstraction layer.

Simon Willison May 26, 2026 Market
San Diego chipmaker Kneron pushes full-stack solutions for the inference era MEDIUM

San Diego-based Kneron is positioning its full-stack hardware-plus-software offering for AI's shift from training to inference workloads. For the local market: a homegrown SoCal player in edge/inference silicon — relevant when evaluating on-prem or edge deployment options beyond the hyperscaler GPU stack.

San Diego Business Journal May 26, 2026 Market
OpenAI named a Leader in 2026 Gartner MQ for enterprise AI coding agents MEDIUM

OpenAI placed in the Leaders quadrant of the inaugural 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, cited for Codex's enterprise-scale deployment. What changes for buyers: analyst coverage of coding agents now exists as a category — expect it to surface in vendor-selection RFPs, so weigh MQ placement against your own eval harness rather than in place of it.

OpenAI / Gartner May 21, 2026 Market
SpaceX IPO filing pitches orbital data centers as Grok trails rivals MEDIUM

SpaceX's IPO filing frames orbital data centers as its bet to out-compute Big Tech on AI, even as xAI's Grok lags rival services. What changes for buyers: space-based compute is still speculative, but it signals that frontier AI capacity planning is now a capital-markets narrative — treat any near-term capacity promises tied to it as roadmap, not availability.

Ars Technica May 20, 2026 Market
Anthropic forms enterprise AI services company with Blackstone, Goldman Sachs, and H&F HIGH

Anthropic announced a joint venture with Blackstone, Hellman & Friedman, and Goldman Sachs to build a dedicated enterprise AI services company. What changes for buyers: the frontier lab is now competing in the professional services tier, not just selling API access — a structural shift that changes vendor selection conversations.

Anthropic May 3, 2026 Market
OpenAI and PwC partner to automate enterprise CFO workflows MEDIUM

OpenAI and PwC announced a collaboration targeting CFO office automation — forecasting, internal controls, and reporting workflows — using AI agents. What changes for buyers: vendor-led AI is moving into finance and audit; board-level expectations around AI governance and explainability in financial reporting will increase, not decrease.

OpenAI May 3, 2026 Market
OpenRouter resets per-token pricing across frontier models MEDIUM

The aggregator passed through Q2 frontier-model price drops: cost per million tokens for top-tier reasoning models is down 18-24% versus Q1. Implication for engagements with active token spend: re-bid the next 90 days of spend.

OpenRouter Apr 14, 2026 Market
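
As a rough illustration of what a re-bid looks like, the sketch below applies the 18-24% range to an existing token budget. The function name and the example budget are hypothetical, and the calculation assumes token volume stays flat and that all spend sits on the top-tier reasoning models the price drop covers.

```python
def rebid_range(
    current_spend: float,
    low_drop: float = 0.18,
    high_drop: float = 0.24,
) -> tuple[float, float]:
    """Return (best_case, worst_case) projected spend after the price drop.

    Assumes unchanged token volume and full pass-through of the discount.
    """
    best_case = current_spend * (1.0 - high_drop)
    worst_case = current_spend * (1.0 - low_drop)
    return best_case, worst_case


if __name__ == "__main__":
    # Hypothetical 90-day token budget of 10,000 (any currency).
    best, worst = rebid_range(10_000)
    print(f"projected spend: {best:.0f} to {worst:.0f}")  # 7600 to 8200
```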

June 9: SD AI/ML Meetup — Ground Truth in the Foundation-Model Era (healthcare) MEDIUM

Workshop on expert label disagreement in medical imaging, fine-tuning foundation models (UNI, MedSAM2, BiomedCLIP) on curated datasets, and using FiftyOne for evaluation, active learning, and regulatory readiness. Relevant for teams navigating FDA AI/ML guidance in production medical imaging pipelines.

San Diego AI/ML and Computer Vision Meetup Jun 8, 2026 Community
SD AI/ML & CV Meetup — Best of CVPR series, July 8–10 LOW

The San Diego AI/ML & Computer Vision Meetup is hosting a three-day virtual Best of CVPR series July 8–10, featuring researchers presenting accepted papers from the 2026 conference. For the local community: a low-cost way to track frontier computer-vision research without traveling to CVPR.

San Diego AI/ML & Computer Vision Meetup May 21, 2026 Community
May 14: SD AI/ML and Computer Vision Meetup — agents, FiftyOne, document AI LOW

The San Diego AI/ML and CV community meets online May 14 (9–11 AM Pacific) with talks on evaluating AI agents with FiftyOne and MCP, real-world document AI beyond OCR, and energy-intelligent inference infrastructure. Registration via Meetup.

San Diego AI/ML and Computer Vision Meetup May 13, 2026 Community
AICamp + Google Cloud San Diego — May meetup CFP open MEDIUM

May 2026 GenAI/agents meetup at the Google Cloud San Diego venue. Active CFP for practitioner talks; sponsor slots include venue + food sponsorship paths. Highest-density local builder audience in San Diego right now.

AICamp Apr 28, 2026 Community
AICamp + Google Cloud San Diego — GenAI/Agents meetup MEDIUM

Recurring SD meetup for GenAI/LLM/agent practitioners. Strong venue for practitioner-tier conversations and informal benchmarking among local builders. Open to sponsor and speaker proposals.

AICamp Apr 21, 2026 Community
MLcon San Diego 2026 — early-bird pricing window closing LOW

Conference + workshops + bootcamp at the Hyatt Regency La Jolla, June 1–5. Closest analog to a pure-ML conference in San Diego this cycle. Sponsor and speaker calendars worth tracking for 2027.

MLcon Apr 17, 2026 Community

Methodology

How this tracker works.

Signals are curated weekly from the LMSYS Chatbot Arena leaderboard, the OpenRouter model rankings, the Hugging Face trending board, the Stanford AI Index, the AI Now Institute, the Federal Trade Commission's AI rulemaking docket, the local San Diego AI/ML/Computer-Vision Meetup calendar, and selected industry publications. Each entry links to the primary source.

The tracker exists for one reason: I want my buyers — VPs of Engineering, CDOs, and Chief AI Officers — to be able to read one page once a week and know what changed in their domain. If a signal here changes a recommendation I'm giving in an active engagement, that's the right cadence.

Automation roadmap: a content pipeline (see writing) will surface candidate signals from the RSS feed of San Diego AI news and frontier benchmark releases. New signals are reviewed and pushed weekly.
