Introduction: Beyond the Bubble Narrative
As market volatility leads some to question the longevity of the AI sector, many commentators have labeled the technology a speculative bubble. Stock-market bubbles reflect sentiment—optimism, hype, and fear—which can burst overnight. The “bubble” narrative persists because speculative frenzy in public markets is far easier to see than the technical curve of a model’s benchmark performance. However, conflating the two is like judging the reality of the internet in 1995 by the volatility of a few dot-com stocks.
To understand where AI is actually headed, we must look past market hype and fear to the empirical curve of what these systems can do. Researchers at the nonprofit METR track a different kind of graph: the “meter-graph”. It captures the measured capability of AI systems, distinct from investor sentiment, by answering a simple, objective question: how long can an AI work on a real-world task entirely on its own before needing a human? This focus on task length sidesteps hype-driven benchmarks; each trial yields a binary outcome (the task was completed autonomously or it was not), which makes the meter-graph an unusually informative tool for cutting through the noise. Just as Moore’s Law persisted for decades because the underlying physics allowed it, the meter-graph reflects the compounding of compute, data, and algorithmic efficiency. Labeling AI’s progress a “bubble” therefore mischaracterizes the empirical trend: this is not a market cycle but a tangible capability curve.
Documented Findings (supported by METR’s published data)
- 2020: GPT-3 writes a simple email (~15 seconds).
- 2021: GPT-3 fixes a software bug.
- 2022-2023: GPT-4 & Claude 3 can code significant application components from scratch.
- 2019-2023: Task-length capability doubles every ~7 months (average).
- 2024-2025: Doubling interval shortens to ~4 months.
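The doubling pattern above can be made concrete with a little arithmetic. The sketch below extrapolates task-length capability under an assumed doubling interval; the starting value and time span are illustrative inputs, not METR’s raw data.

```python
def task_horizon(start_minutes: float, months_elapsed: float,
                 doubling_months: float) -> float:
    """Exponential growth: capability doubles every `doubling_months`."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)

# Illustrative only: start from a ~15-second (0.25-minute) task and apply
# a 7-month doubling interval over four years (48 months).
horizon = task_horizon(0.25, 48, 7)  # roughly half an hour
```

One doubling interval exactly doubles the horizon, which is a handy sanity check on the formula.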
Jagged Progress: High Peaks and Persistent Potholes
This progress, however, is not a smooth line; it is jagged. While capabilities surge in some areas, specific failure modes persist in others. On the GPQA benchmark of graduate-level, “Google-proof” questions, frontier models scored roughly 60% a year ago; today they score roughly 90%, surpassing many human specialists.
Yet AI still stumbles on seemingly simple tasks such as hand-drawing, precise character counting, or basic physical reasoning. This unevenness largely exists because large language models process text in grouped “tokens” rather than individual letters, and they currently lack the grounded physical embodiment needed for intuitive spatial reasoning. Highlighting these persistent failures without acknowledging the rapid gains elsewhere paints a misleading picture of stagnation.
Comparison to Moore’s Law
The pace of AI advancement is not just steady; it is significantly outpacing the historical growth of computing hardware.
- Traditional Moore’s Law: chip performance doubles every 18–24 months.
- METR “AI-Moore”: task-length capability doubles every ~4 months (2024-2025).
- Example: GPT-4o completes a software-engineering task that would take a human > 3 hours.
- Example: Claude 3.5 Sonnet handles near-5-hour tasks.
(Note: While traditional Moore’s Law measures hardware transistor density and the “AI-Moore” metric tracks autonomous task-length, comparing the two vividly illustrates the unprecedented rate of exponential scaling in artificial intelligence.)
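A fixed observation window makes the gap vivid: over the same 24 months, a 2-year doubling produces a 2x gain while a 4-month doubling produces a 64x gain. The helper below is a minimal sketch of that arithmetic.

```python
def growth_factor(window_months: float, doubling_months: float) -> float:
    """How much a quantity multiplies over `window_months`, given its doubling time."""
    return 2 ** (window_months / doubling_months)

# Same 24-month window, two doubling regimes:
moore_gain = growth_factor(24, 24)  # classic ~2-year doubling -> 2x
ai_gain = growth_factor(24, 4)      # ~4-month doubling -> 64x
```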
Documented Impacts of Accelerating AI
The abstract doubling intervals translate into tangible productivity shifts. We are moving from theoretical potential to measurable execution.
- Week-long tasks → a full AI workday: Ongoing research indicates potential for automating many white-collar activities.
- Month-long tasks → hours of AI execution: Measurable productivity gains in software development and complex data analysis (e.g., automating regulatory compliance reporting).
- Year-long tasks → days: Faster prototyping of large-scale systems and accelerated scientific simulations.
The “Scaling Is Dead” Debate and Cognitive Biases
Skeptics periodically claim AI scaling is exhausted, though historical predictions by figures like Gary Marcus and Yann LeCun have repeatedly proved premature. The exponential trend persists because each technological paradigm follows a familiar S-shaped trajectory: a slow start, followed by a steep climb as breakthroughs compound, eventually leveling off as the approach matures. Critics often mistake a temporary plateau, the top of one S-curve, for the end of progress.
However, history shows that when a plateau appears, a new paradigm begins its own S-curve (e.g., multimodal models, reasoning-enhanced agents), keeping the aggregate trajectory exponential.
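The layered S-curve argument can be sketched numerically: sum several logistic curves with staggered midpoints and growing ceilings, and the aggregate keeps climbing even after each individual curve has flattened. The paradigm parameters below are hypothetical, chosen only to illustrate the shape.

```python
import math

def logistic(t: float, midpoint: float, rate: float, ceiling: float) -> float:
    """One S-curve: slow start, steep middle, plateau at `ceiling`."""
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))

def aggregate_capability(t: float, paradigms) -> float:
    """Aggregate trajectory: each paradigm contributes its own S-curve."""
    return sum(logistic(t, mid, rate, cap) for mid, rate, cap in paradigms)

# Hypothetical paradigms: staggered midpoints, successively larger ceilings.
paradigms = [(0.0, 1.0, 1.0), (5.0, 1.0, 4.0), (10.0, 1.0, 16.0)]
```

By t = 12 the first curve sits at its ceiling, yet the sum is still rising because the later paradigms are mid-climb.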
Our cognition is tuned to linear change, causing us to misinterpret these exponential curves. This psychological bias explains why experts consistently under-predict AI progress:
- Flat-section bias: Seeing a short-term plateau at the base of a new S-curve and assuming a permanent slowdown.
- Lily-pond effect: In a pond where lilies double daily, coverage looks negligible for most of the season, then the pond fills in the final few doublings; exponential systems likewise stay near zero for a long time before dominating with shocking speed.
Recognizing this layered S-curve structure helps reconcile divergent views: the skeptics are seeing the end of a specific curve, while optimists see the beginning of the next.
Expert Opinions & Existential Risks
The same exponential curve that drives rapid productivity gains also compresses the timeline for when these systems might operate beyond human control. With such transformative power comes significant responsibility. Expert forecasts are subjective, but they put the risks at non-trivial levels.
- AI-Impacts 2024 expert survey: 16% reported probability of AI-induced extinction.
- Anthropic CEO Dario Amodei: 10-25% reported probability.
These probabilities place AI alongside other high-impact existential risks, such as global catastrophic climate change (~10%) and a pandemic causing > 1 billion deaths (~5%).
Methodology
How METR Generates the Meter-Graph
- Data span: 15 data points collected from 2019 to 2025.
- Models evaluated: Frontier LLMs from OpenAI, Anthropic, Google DeepMind, and others.
- Task categories: Software engineering, physics problem solving, chemistry simulation, mathematics, coding, reasoning, and multimodal tasks.
- Analysis: Log-linear regression on task-length versus calendar time yields a doubling interval of ~7 months (2019-2023) and ~4 months (2024-2025).
- Source: METR’s public report “Time-Horizons of Frontier AI Models” (2025).
Further Reading
- METR Time-Horizons Report (2025) — detailed methodology and raw data.
- OpenAI GPT-4 technical report and Anthropic Claude 3 model card — capabilities and multi-hour autonomous coding.
- AI-Impacts 2024 Expert Survey — risk perception statistics.
- Tim Urban, “The AI Timeline” — accessible overview of exponential AI growth.
- Import AI by Jack Clark and The Batch by DeepLearning.AI — newsletters tracking AI capability metrics.
Conclusion & Call-to-Action
AI capability is exponential, jagged, and accelerating. The “bubble” narrative conflates market sentiment with genuine technical progress, obscuring both the opportunities and the risks. We are not witnessing a financial bubble about to burst, but a technological explosion redefining the boundaries of possible work.
What you can do today
- Adapt: Identify one repetitive task in your weekly workflow and commit to using an AI tool to complete it for a week to experience the pace of change firsthand.
- Subscribe: Track newsletters that follow AI capability metrics rather than just stock prices.
- Participate: Join local AI-ethics or policy discussion groups to influence responsible governance.
- Contribute: Support open-source safety tooling or research through organizations like METR or the Alignment Research Center (ARC).
Staying informed, engaged, and proactive is the most effective way to navigate the rapid transformation ahead.