Science

Judi Lynn

(163,982 posts) Sun Jul 20, 2025, 01:05 AM

Leaked Document Reveals Troubling Details About How AI Is Really Being Trained

Jul 19, 9:45 AM EDT by Joe Wilkins

Talk about a brain teaser.

Under the hood of a huge amount of artificial intelligence is an immense amount of human labor.

This can take many forms, but a particularly prominent one is "data labeling": the process of annotating material like written text, audio, or video, so that it can be used to train an algorithm.

Fueling the multi-billion dollar AI industry is a vast army of remote contract workers, often from less wealthy countries like the Philippines, Pakistan, Kenya, and India. Data labelers are typically overworked and underpaid, and have to contend with the mental impact of repetitive work, punitive bosses, and exposure to hate speech, violent rhetoric, or other harmful and desensitizing material.

Recently, a trove of "safety guidelines" from billion-dollar data labeling company Surge AI was uncovered by the magazine Inc. Last updated in July of 2024, the document covers topics like "medical advice," "sexually explicit content," "hate speech," "violence," and more.

As Inc notes, Surge AI is a middleman firm, hiring contractors to train commercial large language models (LLMs) like Anthropic's Claude through a subsidiary, DataAnnotation.Tech. Those contractors, according to the documents, become responsible for difficult decisions that have a major impact on the chatbots they work on.

More:
https://futurism.com/documents-ai-training-surge

13 replies

Leaked Document Reveals Troubling Details About How AI Is Really Being Trained Judi Lynn Jul 2025 OP

The more I learn about AI.... Think. Again. Jul 2025 #1

No, it is the real deal, but it *IS* being oversold and a bubble has formed. Bernardo de La Paz Jul 2025 #2

How is it the 'real deal' if it isn't computers giving answers... Think. Again. Jul 2025 #3

Because computers ARE giving answers. You err as to who is compiling what. Bernardo de La Paz Jul 2025 #4

I think it's obvious from the hallucinations and off-topic rambling... Think. Again. Jul 2025 #5

And you would be wrong. It does much more than fine tuning. Bernardo de La Paz Jul 2025 #8

Basically a bunch of super-fast monkeys with typewriters... Think. Again. Jul 2025 #10

Nonsense. Do you know how an artificial neuron works? How a network of them works? I think you do not. Bernardo de La Paz Jul 2025 #11

I think you're right... Think. Again. Jul 2025 #12

AI usefulness depends on the granularity of labels used. BadgerKid Jul 2025 #6

Yeah, just another brand of snake oil. Think. Again. Jul 2025 #7

It's more like the categorizations prime the pump and the AI teaches itself, so granularity is less of an issue. .nt Bernardo de La Paz Jul 2025 #9

Data Proles BoRaGard Jul 2025 #13