Crowdsourcing Datasets to Train AI Fuses Artificial and Human Intelligence

CSW's 2024 theme is the unleashing of collective intelligence through the fusion of AI and HI. Do CEOs and COOs trust such datasets to train AI?

Written by Clive Reffell

In January 2024 we set our theme for the year: unleashing collective intelligence through the fusion of AI and human intelligence (HI), particularly in regard to creating datasets to train AI. We said we would delve into the heart of this theme to unravel the layers of HI + AI, where human and artificial intelligence converge to amplify our collective cognitive abilities. However, before we race too far ahead, our partner ScaleHub consulted a cross-section of business leaders, mainly CEOs and COOs, to check where their current consensus lies on this topic, if they have one.

The ScaleHub portal is a crowdsourcing platform that takes traditional data extraction to the cloud and provides access to both public and private global crowd communities for purposes of document automation. This enables businesses to scale and automate on demand through faster, easier, super-accurate data extraction.

Benefits of AI and crowdsourced datasets

The generally recognised reasons for CEOs and COOs to consider introducing AI trained on crowdsourced data to their businesses are as follows:

1. High-Quality Data Training

AI systems are reliant on data for learning and improvement. Crowdsourcing provides access to a vast pool of people who can label data, identify objects in images, or transcribe audio. This human input helps AI learn from more diverse and nuanced information, leading to greater accuracy and generalisability.

Human input is necessary to differentiate between satire and hate speech, identify sarcasm or hidden meanings behind positive or negative phrasing, or apply ethical considerations.
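As a rough illustration of how several people's labels for the same item might be combined into one training label, here is a minimal sketch of majority voting. It is not ScaleHub's method or any particular platform's API: the function name, the 0.6 agreement threshold and the example labels are all assumptions made for illustration.

```python
from collections import Counter

def aggregate_labels(votes, min_agreement=0.6):
    """Combine crowd votes for one item by majority vote.

    Returns (label, agreement). The label is None when the share of
    votes for the most common label is below min_agreement, which
    signals that the item needs further human review.
    The 0.6 default is an illustrative assumption, not a standard.
    """
    if not votes:
        return None, 0.0
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement

# Hypothetical crowd votes on two social media posts.
crowd_votes = {
    "post_1": ["satire", "satire", "hate_speech", "satire", "satire"],
    "post_2": ["satire", "hate_speech", "hate_speech", "satire"],
}

results = {item: aggregate_labels(votes) for item, votes in crowd_votes.items()}
for item, (label, agreement) in results.items():
    print(item, label, round(agreement, 2))
```

Items on which the crowd is split, such as the second post here, are held back rather than forced into a label, which is one simple way to keep ambiguous cases out of a training dataset.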

2. Enhanced AI Evaluation and Debugging

Identifying and fixing biases or errors in AI systems can be challenging. Crowdsourcing can be used to gather feedback on an AI’s performance. People can evaluate the AI’s outputs, highlighting areas where it struggles or produces incorrect results. This feedback loop allows for targeted improvements and helps ensure the AI is functioning as intended.
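One simple way to turn such feedback into targeted improvements is to group reviewers' verdicts by type of input and compare error rates. The sketch below assumes each piece of feedback is a pair of a category name and a correct/incorrect verdict; the categories and figures are invented for illustration.

```python
from collections import defaultdict

def error_rate_by_group(feedback):
    """Return the share of outputs marked incorrect for each group.

    feedback is an iterable of (group, is_correct) pairs, where
    is_correct is a reviewer's verdict on one AI output.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, is_correct in feedback:
        totals[group] += 1
        if not is_correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical reviewer verdicts on extracted document data.
feedback = [
    ("invoices", True),
    ("invoices", True),
    ("handwritten_forms", False),
    ("handwritten_forms", True),
]

rates = error_rate_by_group(feedback)
for group, rate in rates.items():
    print(group, rate)
```

A group with a noticeably higher error rate is a candidate for more training data or closer human checking.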

3. Humans-in-the-Loop for Complex Tasks

Certain tasks require human judgment or common sense that AI struggles with. Crowdsourcing allows you to leverage human intelligence for specific steps within the AI workflow. This can be particularly useful for tasks like sentiment analysis or identifying complex patterns in data.

A medical diagnosis system might struggle with a rare or atypical case. Human doctors can use their experience and knowledge to consider less common possibilities.
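A common pattern for this kind of hand-off is to let the AI decide only when it is confident and to send everything else to a person. The following is a minimal sketch under stated assumptions: the function name, the 0.9 threshold and the example cases are hypothetical, and a real system would need a properly calibrated confidence score.

```python
def route_prediction(prediction, confidence, threshold=0.9):
    """Accept the AI's answer only when its confidence meets the threshold.

    Low-confidence cases are passed to a human reviewer, with the AI's
    answer attached as a suggestion rather than a decision.
    The 0.9 default is an illustrative assumption.
    """
    if confidence >= threshold:
        return {"decision": prediction, "decided_by": "ai"}
    return {
        "decision": None,
        "decided_by": "human_review",
        "ai_suggestion": prediction,
    }

# Hypothetical model outputs: (predicted label, confidence score).
cases = [("common_condition", 0.97), ("rare_condition", 0.55)]

routed = [route_prediction(label, score) for label, score in cases]
for result in routed:
    print(result)
```

Raising the threshold sends more cases to people and costs more human time; lowering it does the opposite, so the right setting depends on how costly a wrong automated decision would be.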

4. Broader Innovation and Idea Generation

AI development can benefit from fresh perspectives. Crowdsourcing platforms can be used to solicit ideas for new AI applications or solutions to specific problems. This can lead to a wider range of creative solutions and accelerate innovation cycles.

Also, AI trained on datasets of historical data might struggle to handle a completely new situation or a sudden shift in trends. Humans can use their understanding of cause and effect to adapt to changing circumstances.

5. Cost-Effectiveness

Compared to hiring a dedicated team, crowdsourcing tasks can be a more cost-effective way to access human intelligence for data training, evaluation, or specific steps within the AI development process.

The general AI and crowdsourcing background to the debate

The in-person debate about using artificial intelligence for business purposes was somewhat overshadowed by criticisms of the shortcomings of generative AI. Its hallucinations, the controversy over scraping copyrighted material, and possible breaches of contract law have all contributed to a perceived need for caution about anything to do with, or that uses, AI.

Training AI with extensive and comprehensive datasets is the standard advice for ensuring a good AI-driven customer experience and sound internal processes. Such datasets can be crowdsourced on demand. However, the chill in the room over the questionable reliability of AI-driven systems and tools spread to crowdsourcing. It was agreed that crowdsourcing democratises data through the greater diversity of contributors, but could this form of collective intelligence be trusted enough for a business to put AI trained on it at the core of its operations?

It was agreed that where AI has been introduced so far, in chatbots for example, it has been applied to the “low-hanging fruit,” the simplest tasks. Future use of AI to tackle more complex matters will demand even more of high-quality training datasets.

AI Challenges

The questions and issues CEOs and COOs want answers to include the following:

  • How to use AI to allow people to work better, rather than having fewer people produce the same overall level of work as before.
  • How to create perceived value so customers pay more for better services that AI can actually make cheaper to deliver.
  • How to outsource the introduction of AI into company processes to a reliable and trustworthy third party.
  • How to fuse AI with HI so that the outcome is better than using just one of them.

Crowdsourcing Challenges

  • How to develop data protection rules and a feedback system that validates results and removes machine bias.
  • How to demonstrate that crowdsourcing can be used to upskill people.
  • How to establish best practice guides on the use of generally accepted guard rails.
  • How to select a crowd to address specific issues within collective intelligence.
  • How to combine data sources from different time periods. This can be complicated, but a finance company had a good idea: look at data from the stock market crash of 1929 when considering 21st century monetary crises.
  • Whether there is a better way than using humans to sample and check what AI is doing.

Data Security

Training AI with crowdsourced data offers some great benefits in the healthtech sector. However, the security of personal data is an issue that could make many people think it’s too risky to let their records be used.

In the UK, for example, the Government has a poor track record both on digitisation projects left incomplete (e.g. centralised health records) and on cybersecurity (e.g. ransomware hacks of the National Health Service, scamming of people who pay their television licence and vehicle tax online).

Key Takeaways

Such failings may not be at the heart of debates over using AI in business, and training it on crowdsourced datasets, but for many people they are the only thing they have to add to the debate. CEOs and COOs may be wise to express their caution over moving too fast, given concerns over the trustworthiness of how the data is created, managed and protected, and by whom. Will the fusion of AI and HI, and the collective intelligence it creates, actually be better for them than investing in just one of either AI or HI?

There are also numerous historical examples of developments and innovations that became mainstream before fundamental flaws became apparent. Diversity, a benefit of crowdsourcing, remains vital. 

  • Facial recognition systems that struggle to distinguish between people with darker skin tones are an often-quoted example.
  • Early Covid treatment was largely based on analyses of how mostly white people responded to treatment, and Covid mortality rates were higher in other ethnic groups – at least initially.
  • Going back further, car seat belts were designed using data drawn predominantly from male drivers, and female drivers suffer higher rates of injury.

The rollout of flawed “advances” due to failings by the teams gathering, creating and interpreting the data raises the question: “Is AI+HI actually better than the sum of its parts if humans train AI?”

These examples confirm the need for diversity in data sources, and for data sample sizes to be large enough for robust and accurate findings. Apart from this, what can, or should, service providers and platforms do to build confidence and encourage greater investment by businesses in using AI trained on crowdsourced datasets?

About Author

Clive Reffell

Clive has worked with Crowdsourcing Week on sourcing and creating content since May 2016. With knowledge and experience gained in a 30+ year marketing career based in London, UK, he operates as an independent crowdfunding advisor helping SMEs and startups to run successful crowdfunding projects, and with wider social media and content marketing issues.
