Almost all data sources are poisoned by ideology

Published by marco on

Updated by marco on

The article “What people get wrong about the leading Chinese open models: Adoption and censorship” by Nathan Lambert (Interconnects) discusses the politics behind AI models, but only from the perspective of the western empire. It makes a good point but can’t see that it applies in both directions.

“People vastly underestimate the number of companies that cannot use Qwen and DeepSeek open models because they come from China. This includes on-premise solutions built by people who know the fact that model weights alone cannot reveal anything to their creators.”

This article is absolutely correct in saying that people are strongly disinclined to use Chinese models, even those with open weights, because they still can’t know what’s in the training data. That’s a great instinct, and one that they utterly failed to apply—and continue to fail to apply—to western models.

People continue to blindly trust Western models with closed training data and closed weights and closed everything, even after a track record of exactly that kind of software being replete with backdoors and ideological slant arguably stronger than that of China.

Just because you’ve learned to agree with a certain propaganda doesn’t mean it’s not there, for God’s sake.

For example, there’s the following concern, which apparently magically comes into focus when the source model is Chinese…and blends right back into the background noise as an SEP (somebody else’s problem) when the model comes from the good old U.S. of A.

“A technical example of this is that companies worry about the code generated by the models having security backdoors — treading the line between information and traditional security risks. As models become more reliant on tool-use, this also involves them executing code on a company’s infrastructure, which presents more immediate worries.”

The article includes a good analysis, with data, of tests that measure how willing, evasive, or outright unwilling the various models are to assist in criticizing Chinese policy or historical interpretation. That is, to what degree does the machine just answer questions, and to what degree does it toe the CCP line?

“When you look at queries about China specifically, the Chinese models will evade many requests.”

Again, a very interesting line of inquiry and one which has been utterly absent from analysis of Western models or sources.

For example, Wikipedia’s article on Taiwan is incredibly slanted to the interpretation that Taiwan is its own country, first citing a good handful of very reliable sources like the f@&king Atlantic Magazine, which write things like “[…] already a de facto state” and “is in fact a sovereign country from our perspective”—something so mush-mouthed and self-contradictory (it can’t be both a “fact” and “from our perspective,” you utter poltroons) that it can hardly be taken seriously—before grudgingly admitting deep into the description that, “the ROC no longer represents China as a member of the United Nations after UN members voted in 1971 to recognize the PRC instead.”

That the ROC is still an autonomous state is not a “fact” but a fantasy promulgated by western neocons who would prefer that all of Taiwan’s chip-manufacturing not be located in China. The civil war, by now over ¾ of a century in the past, is described not as the overwhelming majority of communists on the mainland having taken over China in a revolution but as a setback for the ROC that “resulted in the loss of the Chinese mainland to Communist forces”. The whole article is written as if the ROC’s defeat were a temporary setback that will soon be rectified for the forces of good and light (the anticommunist ones, of course).

This long interlude about Chinese history serves to say that we accept the narrative that is served to us and view everything else as propaganda. Perhaps some of the “propaganda” that we see coming from Chinese models is just that they’re programmed to describe things from a non-Western view, one where the revolution in China lies far, far, far in the past and Taiwan is a part of China (as even the U.N. agrees and continues to agree, and as even official U.S. policy continues to agree with the One-China Policy).

Look, just stop asking pointed questions of these machines. They will give answers that align with what their creators believe. See what ChatGPT thinks about Palestinians and Israelis if you don’t believe me.