Science
Leaked Document Reveals Troubling Details About How AI Is Really Being Trained
Jul 19, 9:45 AM EDT by Joe Wilkins
Talk about a brain teaser.
Under the hood of a huge amount of artificial intelligence is an immense amount of human labor.
This can take many forms, but a particularly prominent one is "data labeling": the process of annotating material like written text, audio, or video, so that it can be used to train an algorithm.
Fueling the multi-billion dollar AI industry is a vast army of remote contract workers, often from less wealthy countries like the Philippines, Pakistan, Kenya, and India. Most data labelers are typically overworked and underpaid, and have to contend with the mental impact of repetitive work, punitive bosses, as well as exposure to hate speech, violent rhetoric, or other harmful and desensitizing material.
Recently, a trove of "safety guidelines" from billion-dollar data labeling company Surge AI was uncovered by the magazine Inc. Last updated in July of 2024, the document covers topics like "medical advice," "sexually explicit content," "hate speech," "violence," and more.
As Inc notes, Surge AI is a middleman firm, hiring contractors to train commercial large language models (LLMs) like Anthropic's Claude through a subsidiary, DataAnnotation.Tech. Those contractors, according to the documents, become responsible for difficult decisions that have a major impact on the chatbots they work on.
More:
https://futurism.com/documents-ai-training-surge

Think. Again.
...the more it seems like a huge fraud being run hot and heavy before the public catches on.
Bernardo de La Paz
Another AI winter is coming, but it will be briefer and less pronounced than previous ones, when funding almost completely dried up.
AI works, just not nearly as well as those promoting it would have us believe. It is not going away, and it will get better and more powerful, though not immediately. Consumer robots in 2029 might happen, but there won't be much uptake. AI penetration into call centres and "agents" will continue, but mostly just as the first filtering stages until some time later. Companies will pull back some from agentic AI but not abandon it. AI will be used to gather information, suggest options, and assist with data-intensive work like programming and legal briefs, but everybody involved will pull back some due to "hallucinations" and the strict requirements for testing and fact-checking. Courts have no patience for fake citations in legal briefs.
But by 2050, I think self-driving and free standing self-directing robotics will be plentiful and useful.
Think. Again.
...but it's a whole bunch of underpaid human lackeys compiling answer sets for the computers to click on later?
Bernardo de La Paz
The AI systems in learning mode are compiling much more detail and information than the people in the process are. The people are categorizing images with very broad descriptors like "red toy truck". That is a very simple level of compilation.
When the AI looks at a ton of red toy truck images, perhaps a couple thousand in that category alone, it is also looking at thousands of other categories and reading millions of pieces of text. If you looked at a few thousand red toy truck pictures, you too would learn more than you would be consciously aware of: details like the common natures of the plastics used, the shades of red used, the most popular models, etc.
Think of it as three phases: 1) Assemblage and categorization of the images and text, 2) learning by the AI from the data and the categories, 3) generating responses to queries. Phases 2 and 3 are run without human intervention.
So it is not true that the people are compiling answer sets. That is phase 3.
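The three phases described above can be sketched in code. Everything here is hypothetical: toy data, broad human labels, and a trivial keyword-count "model" standing in for a real learning system:

```python
from collections import defaultdict

# Phase 1: humans assemble and categorize the raw data,
# attaching only broad descriptors like "red toy truck".
labeled_data = [
    ("bright red plastic truck, 10cm", "red toy truck"),
    ("crimson die-cast pickup truck", "red toy truck"),
    ("yellow rubber duck", "bath toy"),
]

# Phase 2: the system learns from the data and categories
# without human intervention. Here: count which words occur
# under each label (a stand-in for real model training).
word_counts = defaultdict(lambda: defaultdict(int))
for description, label in labeled_data:
    for word in description.split():
        word_counts[label][word.rstrip(",")] += 1

# Phase 3: responses to queries are generated without human
# intervention, using only what was learned in phase 2.
def classify(query):
    scores = {
        label: sum(counts[w] for w in query.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)

print(classify("red plastic truck"))  # → red toy truck
```

The point the post makes is visible in the split: humans appear only in phase 1; phases 2 and 3 run on their own, and the learned detail (every word, not just the label) exceeds what the labelers explicitly wrote down.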
Think. Again.
...that all the computer is doing is putting together random reactions to human queries that are only as fine-tuned as the amount of work humans have put in beforehand.
Edit to add:
The only credit I'll give to current AI is, it's a great syntax machine. In American English, anyway.
Bernardo de La Paz
I'm a programmer with an interest in the field, and I have come to a different conclusion. I could be wrong, but I don't think so. The AIs have found new alloys, as just one example, which were combinations that never occurred to any metallurgists with years of high-level experience in the field. That's not "fine tuning," and it does not come from the work the people did categorizing texts. It does come from standing on the shoulders of researchers going back decades, just like any metallurgist stands on the same shoulders. But the machine was able to "think outside the box".
Think. Again.
...that eventually write great novels due to chance and probability.
BadgerKid
And even still, the AI output is a prediction with an attached probability. Yes, sometimes the best prediction will differ from ground truth.
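That point can be illustrated with a softmax, the standard way raw model scores are turned into probabilities. The candidate answers and scores below are made up for the example; real systems work the same way at the final step:

```python
import math

def softmax(scores):
    # Convert raw model scores into probabilities that sum to 1.
    # Subtracting the max keeps exp() numerically stable.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate answers.
labels = ["answer A", "answer B", "answer C"]
scores = [2.0, 1.0, 0.5]

probs = softmax(scores)
best = labels[probs.index(max(probs))]

# The "best" prediction is just the highest-probability option;
# it can still differ from ground truth, since the probability
# is never 1 for any single answer.
for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
```

Running this shows "answer A" winning with well under 100% probability, which is exactly the caveat in the post: the top prediction carries a confidence, not a guarantee.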