The growing demand for computers to run AI models has only accelerated, but there are two major hurdles anyone in the business needs to overcome: getting the right chips, and getting them into data centers where they can start generating revenue.
General Compute is a new inference cloud: a company that rents out AI processing power and specializes in the phase when models are running and responding to users rather than being trained. Its answers to both questions shed light on where the AI ecosystem is headed. Those answers helped it raise a $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Kariya Venture Partners and Village Global Ventures.
First, what is the right chip? Demand for GPUs has gone through the roof, but it’s becoming conventional wisdom that they aren’t the best chips for running AI models after training. The stage of AI where a model is actively generating responses has different computational requirements than training, and a new class of chips is being developed specifically for it. Nvidia’s $20 billion Groq transaction in December and Cerebras’ $57 billion IPO last week point to that.
With both of these companies running low on capacity, General Compute’s co-founders, CEO Finn Puklowski and CTO Jason Goodison, looked for another option. They’re turning to specialized chips made by SambaNova, an Intel-backed chipmaker that focuses on inference and has stayed somewhat outside the Silicon Valley conversation.
That may change when SambaNova releases its new chips this year. The architecture is more flexible and uses more memory to store context during inference calculations, and SambaNova claims it outperforms not only GPUs but also other specialized chips made by the likes of Groq or Cerebras. Puklowski says the new chips will generate 600 to 700 tokens per second, versus about 250 tokens per second for GPUs.
General Compute has $300 million of SambaNova’s SN50 chips on order and says it will be the first neocloud to deploy them.
These chips also help General Compute solve the other big problem, which is where to put them: They’re air-cooled, not water-cooled, and use less power, so they can be installed in existing data center facilities without new infrastructure investments.
Puklowski is pursuing colocation deals (arrangements where General Compute installs its hardware in someone else’s facility) not only with data center providers, but also with crypto miners looking to retool their infrastructure because the cost of producing bitcoin often exceeds its value.
General Compute launched its cloud offering last week, claiming it’s already the fastest at running the powerful open-source LLM MiniMax 2.7.
Joe Hasselman is a venture investor who got in on the ground floor of the inference boom when he invested in Groq in 2021. That year, he launched a new fund, Evercrest Partners, focused on the AI space, and made General Compute his first investment. Hasselman sees SambaNova’s partnership with General Compute as a parallel to CoreWeave’s relationship with Nvidia, and to Groq’s pairing of chipmaking with its earlier cloud offering.
“They need a healthy mix of users who will put their chips in an environment that’s highly developed for them,” Hasselman said. “As General Compute is betting on SambaNova, SambaNova is betting on General Compute.”
The question is which kind of computing architecture will matter most in AI’s future. Inference clouds are an implicit bet on a world of multiple models and agents, where no single provider dominates and inference speed and cost become key competitive variables. Consider the $113 million Series B that OpenRouter raised this week, which points to the value of the company’s ability to offer users access to multiple models so they can optimize their token spending.
In this calculation, speed matters both for price and for capacity. Puklowski wants to turn hours-long workloads for coding agents into five- or ten-minute tasks, and to make voice agents for customer service, which require faster inference to communicate effectively, more economical.
“If you use ChatGPT and it gives you 50 tokens per second, that’s still faster than we read,” Puklowski told TechCrunch.