Why are models like DeepSeek so much better?

Whenever people discuss LLM performance, the terms used to evaluate it get very “slippery,” which makes comparisons difficult or impossible. Case in point (and this is all said before I actually dig into it): my understanding is that DeepSeek essentially uses outputs from the popular models as its inputs, which is illegitimate. However, it also has a much more efficient model. With both of those factors in play, it is extremely difficult or impossible to compare models in common terms. Moreover, US- and China-based model developers operate in different domestic contexts, and it is in those circumstances that their different approaches become salient. Officially, China is communist, with a government that has final authority over economic priorities. In contrast, the US officially has a free market, which means US-based companies have more incentive to make fantastic claims, particularly when they are evaluated in the context of an “AI race.” As a result, both the US and China have incentives to mislead anybody trying to compare “AI.”

Moreover, the Chinese government has been known to have backdoors and extensive surveillance throughout its software products. The US government has similar surveillance authority, but from a purely game-theoretic perspective, I would surmise that the more open the society, the harder a time it will have sustaining long-running surveillance. Here’s an example: I fully suspect the big AI chatbot companies facilitate surveillance in the US. However, to do that, the government needs budgeting, procurement, legal scrutiny, and so on. I have heard a saying to the effect of: “a secret known by three people is no longer a secret.” That one definitely applies here. Even if the US and Chinese governments had the same type of surveillance programs, the culture in the US is one that would incentivize individuals in the system to seek personal glory, thus undermining long-term secrecy. Moreover, the US doesn’t exactly have a stellar track record for secrecy.

Anyway, I’m getting off-track. The point here is that tools like DeepSeek are extremely difficult to evaluate in the same way as the top frontier models, but that may not even be a particularly useful evaluation.
