Science
Leaked Document Reveals Troubling Details About How AI Is Really Being Trained
Jul 19, 9:45 AM EDT by Joe Wilkins
Talk about a brain teaser.
Under the hood of a huge amount of artificial intelligence is an immense amount of human labor.
This can take many forms, but a particularly prominent one is "data labeling": the process of annotating material like written text, audio, or video, so that it can be used to train an algorithm.
Fueling the multi-billion dollar AI industry is a vast army of remote contract workers, often from less wealthy countries like the Philippines, Pakistan, Kenya, and India. Most data labelers are typically overworked and underpaid, and have to contend with the mental impact of repetitive work, punitive bosses, as well as exposure to hate speech, violent rhetoric, or other harmful and desensitizing material.
Recently, a trove of "safety guidelines" from billion-dollar data labeling company Surge AI was uncovered by the magazine Inc. Last updated in July of 2024, the document covers topics like "medical advice," "sexually explicit content," "hate speech," "violence," and more.
As Inc notes, Surge AI is a middleman firm, hiring contractors to train commercial large language models (LLMs) like Anthropic's Claude through a subsidiary, DataAnnotation.Tech. Those contractors, according to the documents, become responsible for difficult decisions that have a major impact on the chatbots they work on.
More:
https://futurism.com/documents-ai-training-surge

Think. Again.
...the more it seems like a huge fraud being run hot and heavy before the public catches on.
Bernardo de La Paz
Another AI winter is coming, but it will be briefer and less pronounced than previous ones, when funding almost completely dried up.
AI works, just not nearly as well as those promoting it would have us believe. It is not going away, and it will get better and more powerful, though not immediately. Consumer robots in 2029 might happen, but there won't be much uptake. AI penetration into call centres and "agents" will continue, but mostly just as the first filtering stages until some time later. Companies will pull back some from agentic AI but not abandon it. AI will be used to gather information, suggest options, and assist with data-intensive work like programming and legal briefs, but everybody involved will pull back some due to "hallucinations" and the strict requirements for testing and fact-checking. Courts have no patience for fake citations in legal briefs.
But by 2050, I think self-driving and free standing self-directing robotics will be plentiful and useful.
Think. Again.
...but it's a whole bunch of underpaid human lackeys compiling answer sets for the computers to click on later?
Bernardo de La Paz
The AI systems in learning mode are compiling much more detail and information than the people in the process are. The people are categorizing images with very broad descriptors like "red toy truck". That is a very simple level of compilation.
When the AI looks at a ton of red toy truck images, perhaps a couple thousand in that category alone, it is also looking at thousands of other categories and reading millions of pieces of text. If you looked at a few thousand red toy truck pictures, you too would learn more than you would be consciously aware of: details like the common natures of the plastics used, the shades of red used, the most popular models, etc.
Think of it as three phases: 1) Assemblage and categorization of the images and text, 2) learning by the AI from the data and the categories, 3) generating responses to queries. Phases 2 and 3 are run without human intervention.
So it is not true that the people are compiling answer sets. That is phase 3.
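The three phases described above can be sketched in code. Everything here is hypothetical: toy data, broad human labels, and a trivial keyword-count "model" standing in for a real learning system:

```python
from collections import defaultdict

# Phase 1: humans assemble and categorize the raw data,
# attaching only broad descriptors like "red toy truck".
labeled_data = [
    ("bright red plastic truck, 10cm", "red toy truck"),
    ("crimson die-cast pickup truck", "red toy truck"),
    ("yellow rubber duck", "bath toy"),
]

# Phase 2: the system learns from the data and categories
# without human intervention. Here: count which words occur
# under each label (a stand-in for real model training).
word_counts = defaultdict(lambda: defaultdict(int))
for description, label in labeled_data:
    for word in description.split():
        word_counts[label][word.rstrip(",")] += 1

# Phase 3: responses to queries are generated without human
# intervention, using only what was learned in phase 2.
def classify(query):
    scores = {
        label: sum(counts[w] for w in query.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)

print(classify("red plastic truck"))  # → red toy truck
```

The point the post makes is visible in the split: humans appear only in phase 1; phases 2 and 3 run on their own, and the learned detail (every word, not just the label) exceeds what the labelers explicitly wrote down.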
Think. Again.
...that all the computer is doing is putting together random reactions to human queries that are only as fine-tuned as the amount of work humans have put in beforehand.
Edit to add:
The only credit I'll give to current AI is, it's a great syntax machine. In American English, anyway.
Bernardo de La Paz
I'm a programmer with an interest in the field, and I have come to a different conclusion. I could be wrong, but I don't think so. The AIs have found new alloys, as just one example, which were combinations that never occurred to any metallurgists with years of high-level experience in the field. That's not "fine tuning," and it does not come from the work the people did categorizing texts. It does come from standing on the shoulders of researchers going back decades, just like any metallurgist stands on the same shoulders. But the machine was able to "think outside the box".
Think. Again.
...that eventually write great novels due to chance and probability.
BadgerKid
And even still, the AI output is a prediction with an attached probability. Yes, sometimes the best prediction will differ from ground truth.
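That point can be illustrated with a softmax, the standard way raw model scores are turned into probabilities. The candidate answers and scores below are made up for the example; real systems work the same way at the final step:

```python
import math

def softmax(scores):
    # Convert raw model scores into probabilities that sum to 1.
    # Subtracting the max keeps exp() numerically stable.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate answers.
labels = ["answer A", "answer B", "answer C"]
scores = [2.0, 1.0, 0.5]

probs = softmax(scores)
best = labels[probs.index(max(probs))]

# The "best" prediction is just the highest-probability option;
# it can still differ from ground truth, since the probability
# is never 1 for any single answer.
for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
```

Running this shows "answer A" winning with well under 100% probability, which is exactly the caveat in the post: the top prediction carries a confidence, not a guarantee.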