CodeStory (Summer 2023)

Overview

CodeStory was a London-based AI coding tool startup, founded in 2023 by Sandeep Pani and Naresh Ramesh as part of Y Combinator's Summer 2023 batch. The company built Aide, an AI-native IDE forked from VSCode, with a privacy-first local architecture and a focus on multi-file agentic code editing. Over roughly 20 months, CodeStory pivoted three times: from chat and autocomplete to agentic editing, then to a background multi-agent task runner called AgentFarm, and briefly to a VSCode extension. Along the way it reached genuine technical milestones, including back-to-back state-of-the-art results on the SWE-Bench coding benchmark. The company shut down in February 2025 having raised only $500K, with YC as its sole institutional backer. The core thesis of failure: CodeStory repeatedly made correct technical bets but executed them without the distribution, capital, or monetization strategy needed to convert benchmark leadership into a durable business before better-funded competitors like Cursor commoditized the same capabilities.

Founding Story

Sandeep Pani and Naresh Ramesh met during their freshman year at university and remained close friends for over a decade before co-founding CodeStory.^[1] That long-standing relationship reduced one of the most common early-stage risks: co-founder conflict. When they decided to build together, they were not strangers stress-testing a new partnership but longtime friends who already understood each other's working styles.

Both founders brought substantial technical credentials. Pani had served as a tech lead for testing infrastructure at Meta and as a founding engineer at Findly.ai, a YC S22 company.^[2] Ramesh had built payments infrastructure supporting 150,000 merchants and scaled an engineering team from 5 to 35 people at an Indian fintech startup. He was also a world finalist in the ACM ICPC, one of the most competitive programming contests globally, and an open-source contributor to gRPC and the Bazel build system.^[3] On paper, this was a team with the depth to build something technically ambitious.

The motivation was direct and personal. CodeStory was, in the founders' own framing, built to scratch their own itch as working developers.^[4] They were frustrated by the limitations of existing AI coding tools and believed the dominant paradigm—chat interfaces and line-by-line autocomplete—was underselling what large language models could actually do.

Their initial vision was straightforward: build the best AI-augmented coding experience inside VSCode. In their first YC launch post, they described a future where "AI agents fix bugs and write tests autonomously," and included a demo of CodeStory resolving a race condition bug without manual intervention.

Launch YC: CodeStory ✨ is an AI-powered mod of VSCode

That initial vision would not survive contact with the market. Within six months, the founders concluded that chat and autocomplete were the wrong abstraction entirely—and began a series of pivots that would define, and ultimately doom, the company.

Timeline

June 2023 — CodeStory is incorporated and raises a $500K seed round from Y Combinator (S23 batch).^[5]
August 7, 2023 — CodeStory publicly launches as an AI-powered mod of VSCode. YC LinkedIn post marks the public debut.^[6]
August 2023 — Initial product focuses on chat and autocomplete inside the editor—the dominant paradigm at the time, competing directly with GitHub Copilot and early Cursor.
February 2024 — After approximately six months of building, founders pivot away from chat/autocomplete, concluding the UX is too limiting for LLM capabilities.^[7]
March 2024 — Founders begin experimenting with multi-file agentic editing: "What if LLMs could make edits across multiple files without breaking the logic?"^[8]
July 1, 2024 — Aide achieves #1 on SWE-Bench Lite with 40.3% accepted solutions using multi-agent collaboration with Claude Sonnet 3.5 and GPT-4o. Covered by press.^[9]
July–November 2024 — Aide holds the top position on SWE-Bench Lite at 43% for approximately four months.^[10]
November 10, 2024 — Show HN post submitted for Aide as an open-source AI-native IDE—a public user acquisition push approximately three months before shutdown.^[11]

November 2024 — Aide is surpassed on SWE-Bench Lite by Devlo (47.3%) and Globant Code Fix (48.3%), losing benchmark leadership.^[10]
November–December 2024 — Aide achieves new SOTA on SWE-Bench Verified at 62.2% using test-time scaling with Claude Sonnet 3.5 only—the company's highest technical achievement and peak public visibility (143K tweet views).^[12]
February 25, 2025 — GitHub repository codestoryai/aide is archived by the owner and made read-only, marking the end of active development.^[13]
February 2025 — Shutdown announcement posted on @aide_dev Twitter/X account; active subscribers notified of refunds and end-of-month access cutoff.^[14]
February 2025 — YC company page updated to list CodeStory as "Inactive" with founders listed as "Former Founders."^[15]

What They Built

CodeStory's product went through three distinct incarnations over its 20-month life. Understanding what they built requires tracking each phase separately.

Phase 1: AI-Powered VSCode Mod (August 2023 – February 2024)

The original product was a modified version of Visual Studio Code—the most widely used code editor in the world—with AI features layered on top. This was the same architectural approach used by Cursor, which also forked VSCode. CodeStory's version added a chat panel where developers could ask questions about their codebase and receive AI-generated code suggestions, plus an autocomplete layer that predicted the next lines of code as the developer typed. The experience was familiar to anyone who had used GitHub Copilot, and that was precisely the problem the founders would later identify.

Phase 2: Aide — Agentic Multi-File IDE (March 2024 – late 2024)

After concluding that chat and autocomplete were "limiting to the real abilities of these LLMs,"^[16] the founders rebuilt around a different premise: instead of the developer asking the AI for help, the AI would act as an autonomous agent capable of reading, reasoning about, and editing code across an entire codebase simultaneously.

The rebuilt product, renamed Aide, introduced several technically notable components. A Rust-based local process called "Sidecar" ran on the developer's own machine, handling AI computation locally to minimize the transmission of sensitive source code to external servers.^[17] This privacy-first architecture was a genuine differentiator in a market where most tools sent code to cloud APIs by default—a meaningful concern for enterprise developers working on proprietary codebases.

Aide's agent could use the Language Server Protocol (LSP)—the same underlying technology that powers code intelligence features like "go to definition" and "find all references" in modern editors—to understand the semantic structure of a codebase, not just its text. This allowed the agent to make edits that respected the actual logic of the code, not just its surface appearance. The agent could iterate on linter errors automatically, follow type definitions across files, and propose coordinated changes spanning multiple modules.
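To make the LSP mechanism concrete, here is a minimal sketch of what a "go to definition" query looks like on the wire. It is not taken from Aide's codebase. The method name `textDocument/definition` and the `Content-Length` framing come from the public LSP specification, while the helper names, the file URI, and the cursor position are hypothetical and chosen only for illustration.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Encode a JSON-RPC payload using the LSP base-protocol framing."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # The header carries the byte length of the JSON body, then a blank line.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def definition_request(request_id: int, file_uri: str, line: int, character: int) -> dict:
    """Build a textDocument/definition request; LSP positions are zero-based."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": file_uri},
            "position": {"line": line, "character": character},
        },
    }


# Hypothetical file and cursor position, for illustration only.
request = definition_request(1, "file:///workspace/src/app.py", 41, 8)
message = frame_lsp_message(request)
print(message.decode("utf-8"))
```

An agent that sends this kind of request to a running language server gets back the location of the symbol's definition, which is how it can follow a type or function across files instead of guessing from text matches.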

Launch YC: Aide — The Open Source IDE to solve hard problems

Aide offered a subscription model with a two-week free trial.^[18] The codebase was open-sourced, accumulating 2,200 GitHub stars and 340 forks before archival.^[19]

Phase 3: AgentFarm and VSCode Extension (late 2024)

In the company's final months, CodeStory pivoted again to a product called AgentFarm—a system that allowed developers to spawn multiple AI agents simultaneously, each working on different tasks in a codebase autonomously in the background, without requiring the developer to remain in the loop.^[20] This was a further abstraction away from the IDE interaction model: rather than an AI-assisted editor, it was closer to an autonomous coding workforce.
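The background execution model described above can be sketched roughly as follows. This is an assumption-laden illustration of running several agents concurrently, not AgentFarm's actual design: `run_agent`, `run_farm`, `TaskResult`, and the `max_parallel` cap are all hypothetical names, and the agent body is a placeholder where model calls and code edits would go.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TaskResult:
    task: str
    status: str


async def run_agent(task: str) -> TaskResult:
    """Hypothetical stand-in for one agent working a task to completion."""
    await asyncio.sleep(0)  # placeholder for model calls and code edits
    return TaskResult(task=task, status="done")


async def run_farm(tasks: list[str], max_parallel: int = 3) -> list[TaskResult]:
    """Run every task concurrently, with at most max_parallel agents at once."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> TaskResult:
        async with limit:
            return await run_agent(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(t) for t in tasks))


results = asyncio.run(run_farm(["fix flaky test", "add retry logic", "update docs"]))
for result in results:
    print(result.task, result.status)
```

The cap on parallel agents is one plausible design choice here, since each agent consumes model API calls and the source gives no detail on how AgentFarm scheduled its work.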

The company also briefly ported its agent to the VSCode extensions marketplace—a distribution experiment designed to reach developers without requiring them to switch IDEs entirely.^[21] Neither pivot had time to prove itself before the company shut down.

Market Position

Target Customers

CodeStory's primary target was professional software engineers—specifically those working on complex, multi-file codebases where the limitations of line-by-line autocomplete were most acutely felt. The privacy-first local architecture suggested a secondary focus on enterprise developers and teams working on proprietary or sensitive code, where sending source code to external cloud APIs was a compliance or security concern. The 20+ daily active professional engineers using Aide at peak represented this profile: experienced developers willing to adopt a new IDE in exchange for meaningfully better AI assistance.^[22]

Market Size

The AI coding tools market was one of the fastest-growing segments in developer tooling during CodeStory's operating period. GitHub Copilot reported over 1.3 million paid subscribers by early 2024. Cursor, CodeStory's most direct competitor, raised a $60 million Series A in August 2024 at a reported $400 million valuation, and went on to raise a $900 million round in 2025 at a reported valuation of roughly $9 billion. Those figures illustrate both the scale of the opportunity and the capital intensity required to compete. The broader developer tools market was estimated in the tens of billions of dollars annually, with AI-native tools capturing an increasing share. CodeStory was competing for a slice of a large and rapidly expanding market, but doing so with $500K against competitors raising eight- and nine-figure rounds.

Competition

The competitive landscape was CodeStory's most structurally difficult challenge. Three categories of competitors defined the market:

GitHub Copilot held the distribution advantage. Integrated directly into VSCode and JetBrains IDEs, Copilot had access to Microsoft's enterprise sales channels and GitHub's 100 million+ developer user base. It did not need to convince developers to switch editors.

Cursor was CodeStory's most direct analog—also a VSCode fork, also founded in 2022–2023, also focused on AI-native coding. But Cursor executed on distribution and fundraising in ways CodeStory did not. By the time CodeStory was achieving benchmark leadership in mid-2024, Cursor had already built a large paying user base and was raising capital at a scale that allowed it to iterate faster, hire more engineers, and spend on user acquisition. The founding design engineer at CodeStory later acknowledged: "IDEs like Cursor were just starting out" when CodeStory built multi-file editing—but Cursor scaled before CodeStory could convert its technical lead into market share.^[23]

SWE-Bench leaderboard competitors like Devlo and Globant Code Fix represented a third category: teams optimizing specifically for benchmark performance rather than product adoption. When these teams surpassed Aide on SWE-Bench Lite in November 2024—just as CodeStory was attempting its most public user acquisition push via Show HN—it underscored that benchmark leadership was not a durable moat.

Business Model

CodeStory operated on a direct subscription model. Aide offered a standard subscription plan with a two-week free trial, after which users paid a recurring fee for continued access.^[18] The existence of active paying subscribers at the time of shutdown—confirmed by the refund announcement—indicates the model generated some revenue, though no figures were disclosed.^[14]

The open-source release of the Aide codebase on GitHub served a dual purpose: it lowered the barrier to adoption by allowing developers to inspect and self-host the tool, and it generated community credibility (2,200 stars, 340 forks) that supported the commercial product.^[19] The privacy-first local architecture, while a genuine product differentiator, also implied a cost structure advantage over cloud-heavy competitors—local computation reduced API costs per user. However, the company still relied on frontier model APIs (Claude Sonnet 3.5, GPT-4o) for its most capable features, meaning model costs remained a significant operational expense. With only $500K in total funding and no follow-on capital, the runway to reach sustainable subscription revenue was always constrained.^[5]

Traction

CodeStory's most measurable successes were benchmark results rather than business metrics.

In July 2024, Aide achieved #1 on SWE-Bench Lite—an industry-standard benchmark that tests AI systems on real GitHub issues from open-source Python repositories—with 40.3% accepted solutions, later reported as 43%.^[9] The company held that top position for approximately four months, until November 2024, when Devlo (47.3%) and Globant Code Fix (48.3%) surpassed it.^[10]

In November–December 2024, Aide achieved its highest technical milestone: #1 on SWE-Bench Verified at 62.2%, using only Claude Sonnet 3.5 and a test-time scaling approach.^[12] Co-founder Sandeep Pani's announcement tweet generated 143,000 views and 559 reposts—the company's highest-visibility moment.

SOTA on swebench-verified: relearning the bitter lesson

On product adoption, the company reported 200+ repositories onboarded and more than 20 professional engineers using Aide daily at one point.^[22] The GitHub repository accumulated 2,200 stars and 340 forks before archival.^[19] These figures indicate real but modest adoption—meaningful for a small team validating product-market fit, but not at the scale required to sustain a venture-backed business in a capital-intensive market. No revenue figures, MRR, or paying subscriber counts were publicly disclosed.

Post-Mortem

CodeStory's shutdown in February 2025 was not caused by a single catastrophic failure. It was the cumulative result of four compounding problems: a capital structure that was mismatched to the competitive environment, a product strategy that prioritized technical novelty over distribution, repeated pivots that prevented compounding user growth, and a benchmark-first orientation that generated visibility without commercial conversion.

Structural Capital Disadvantage

CodeStory raised $500,000 from Y Combinator (the standard YC deal) and never raised follow-on capital.^[5] YC was the sole institutional investor.^[24] By comparison, Cursor raised a $60 million Series A in August 2024 and a $900 million round in 2025. The capital gap was not a minor disadvantage; it was a structural constraint on every other decision the company made.
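The size of that gap is easy to quantify from the figures quoted above. The short calculation below uses only the amounts stated in this article; the variable names and round labels are illustrative.

```python
CODESTORY_TOTAL_RAISED = 500_000  # total funding, per the article

# Cursor round sizes as quoted in the article; labels are illustrative.
CURSOR_ROUNDS = {
    "Series A (August 2024)": 60_000_000,
    "later $900M round": 900_000_000,
}

# Each Cursor round expressed as a multiple of CodeStory's total funding.
ratios = {
    name: amount / CODESTORY_TOTAL_RAISED
    for name, amount in CURSOR_ROUNDS.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f}x CodeStory's total funding")
```

On these numbers, Cursor's Series A alone was 120 times CodeStory's lifetime funding, and the later round was 1,800 times.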

With $500K, CodeStory could not afford meaningful paid user acquisition, could not hire a dedicated sales or marketing function, could not absorb the API costs of running frontier models at scale for a large user base, and could not sustain a long enough runway to iterate through multiple product pivots until one found product-market fit. The founding design engineer's description of the AgentFarm pivot—"within two days, I rebranded, set up ad campaigns with an automated sales funnel and completed sales calls"^[25]—captures the resource reality: a two-person founding team and one design engineer attempting to execute go-to-market strategy in 48 hours.

The company appears to have either not attempted to raise a Series A or attempted and failed. No evidence of a fundraising process exists in public records. Given that the company achieved genuine benchmark leadership and had a credible technical story, the absence of follow-on capital suggests either that the commercial metrics were insufficient to attract investors, or that the founders did not prioritize fundraising alongside product development.

Three Pivots in Twenty Months

CodeStory pivoted its core product three times in under two years: from chat/autocomplete (August 2023) to multi-file agentic editing (March 2024), then to AgentFarm background agents (late 2024), and finally, briefly, to a port on the VSCode extensions marketplace.^[21] Each pivot was technically motivated and each reflected a genuine insight about LLM capabilities. But each pivot also reset the user acquisition clock.

The first pivot—away from chat and autocomplete—was the most consequential. The founders concluded that "the UX around chat and copilot felt limiting to the real abilities of these LLMs."^[16] This was a technically defensible position in early 2024. But it meant abandoning the product category that Cursor was simultaneously scaling into a large business. The founding design engineer later acknowledged the cost of this decision directly: "For codegen UX, the bitter lesson is accepting that the chat interface wins over smart approaches. Over-designing the way we serve LLM output to the user has been a big miss on my behalf."^[26]

The second pivot to AgentFarm—background multi-agent task execution—was a further step away from the interaction model that developers were actually adopting. There is no public explanation of why AgentFarm failed or was abandoned, but the timing suggests it did not generate sufficient traction before the company ran out of runway.

Benchmark Leadership Without Commercial Conversion

CodeStory's most visible achievements were its SWE-Bench results: #1 on SWE-Bench Lite at 43% (held July–November 2024) and #1 on SWE-Bench Verified at 62.2% (November–December 2024).^[10]^[12] The SWE-Bench Verified result generated 143,000 tweet views and 559 reposts—the company's single highest-visibility moment. The GitHub repository was archived approximately ten weeks later.

The benchmark results were real technical achievements. SWE-Bench Verified tests AI systems on actual GitHub issues from production open-source repositories, and a 62.2% resolution rate represented genuine capability. But benchmark leadership in AI coding tools in 2024 was not a durable competitive moat. The leaderboard turned over rapidly—CodeStory lost its SWE-Bench Lite position within four months—and the connection between benchmark ranking and user acquisition was never established. The Show HN post in November 2024, submitted at approximately the same time as the benchmark loss and three months before shutdown, suggests the team was still attempting to convert technical credibility into user growth at a very late stage.^[11]

Correct Technical Bets, Wrong Timing and Scale

The founders and their design engineer articulated a pattern in retrospect that is the most honest account of what went wrong. CodeStory made the right technical calls—multi-file editing, agentic coding, test-time scaling—but made them either too early, without sufficient distribution infrastructure, or without the capital to sustain the lead until the market caught up.

The design engineer noted that multi-file editing "may seem trivial nowadays... but they were engineering milestones back then. IDEs like Cursor were just starting out."^[23] This is accurate: CodeStory was building multi-file agentic editing in March 2024 when Cursor was still primarily a chat-and-autocomplete tool. But Cursor had the capital and distribution to catch up and surpass CodeStory's capabilities within months, while CodeStory had neither the users nor the revenue to sustain its technical lead.

The founders also acknowledged learning the "bitter lesson"—the principle from machine learning research that scale and compute consistently outperform algorithmic cleverness—twice over: once when their multi-agent approach topped SWE-Bench Lite, and again when test-time scaling produced their Verified SOTA result.^[27] In their own words: "We at CodeStory are surprised and amazed at this learning. The surprise comes from the fact that for the better half of this year, we figured algorithmic smartness would yield better results."^[28] The irony is that the same lesson applies to their business: scale—of capital, of users, of distribution—beats algorithmic cleverness in markets as well as in machine learning.

Key Lessons

Benchmark leadership is not a business. CodeStory held the #1 position on SWE-Bench Lite for four months and achieved SOTA on SWE-Bench Verified weeks before shutdown. Neither result translated into measurable commercial traction. In markets where the underlying technology is improving rapidly, benchmark positions turn over faster than sales cycles close. Technical credibility is a necessary but insufficient condition for building a software business.
Pivoting away from the commercially validated category is a high-risk bet that requires capital to survive. CodeStory's decision to abandon chat and autocomplete in February 2024 was technically motivated and arguably correct in its assessment of LLM capabilities. But chat and autocomplete were what the market was buying—as Cursor's growth demonstrated. Departing from a validated category requires enough runway to wait for the market to follow. With $500K in total funding and no follow-on capital, CodeStory did not have that runway.
Distribution is a product decision, not an afterthought. CodeStory's privacy-first local architecture was a genuine differentiator, but it also required developers to download and switch to an entirely new IDE—a high-friction adoption path. The late-stage experiment of porting to the VSCode extensions marketplace suggests the team recognized this problem, but the attempt came too late to change the trajectory. Cursor's growth, by contrast, was partly built on the familiarity of the VSCode interface and aggressive word-of-mouth among developers who could onboard in minutes.
In capital-intensive markets, the funding round is part of the product roadmap. The AI coding tools market in 2023–2025 required significant capital to compete: frontier model API costs, engineering headcount, and user acquisition all scaled with ambition. CodeStory operated on $500K against competitors raising tens and then hundreds of millions of dollars. The absence of a follow-on fundraise—whether by choice or circumstance—effectively capped the company's ability to execute on any of its technical advantages.
The "bitter lesson" applies to startups as well as to AI systems. The founders explicitly learned that scale beats algorithmic cleverness in machine learning. The same principle governed their competitive environment: Cursor's scale of capital, users, and distribution compounded faster than CodeStory's technical sophistication. Recognizing this lesson in the product domain earlier might have redirected energy toward growth and fundraising rather than successive technical pivots.