Data Labeling and AI

Labeled any data today? 🤔 🏷️ 🏷️

Not sure? Did you click on an image of stairs when logging onto a site today? 🧑‍💻 👩🏼‍💻 👨‍💻

Maybe not today, but if you ever have, then yes, you are indeed a data labeler.

Data labeling is vital to powering the magic of artificial intelligence and machine learning applications. 🪄

Okay… 🤨

Most people don’t realize that these technologies had to be “taught” what they know using vast datasets of information.

These datasets consist of various types of data: visual, textual, and numeric; quantitative and qualitative; structured, semi-structured, and unstructured; real-time or historical (retrospective). The type of data you’re handling is usually evident from the problem you’re working on.

But one thing transcends all of this: the need for data to be labeled.

Let’s use an example of a traffic light—think reCAPTCHA. 🚦

When you’re logging into a website and reCAPTCHA asks you to click on images of a traffic light, you’re actually labeling data for Google. 🥳

Your clicks help identify and re-confirm images that match a traffic light. This is labeling data, and it’s quite important to building AI/ML applications.
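To make the "re-confirm" idea concrete, here is a minimal sketch of one common way to combine many people's clicks into a single label: a majority vote with an agreement threshold. This is an illustration only, not a description of how reCAPTCHA works internally. The tile names, the votes, the `aggregate_labels` helper, and the 0.7 threshold are all made up for the example.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.7):
    """Return the most common label if enough voters agree, else None.

    Hypothetical helper: the 0.7 threshold is an arbitrary assumption.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Made-up clicks from five different users on two image tiles.
clicks = {
    "tile_01": ["traffic_light"] * 4 + ["not_traffic_light"],
    "tile_02": ["traffic_light"] * 2 + ["not_traffic_light"] * 3,
}

# Keep a label only where the crowd agrees strongly enough.
confirmed = {tile: aggregate_labels(votes) for tile, votes in clicks.items()}
print(confirmed)  # {'tile_01': 'traffic_light', 'tile_02': None}
```

Tiles where people disagree (like `tile_02` here) come back as `None`, which in practice would mean showing the image to more people before trusting the label.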

Now let’s consider medical MRI or CT images of tumors, tears, and other pathologies. A trained eye can spot these conditions on an image, then label and describe what it sees, adding the newly labeled image to a dataset. 🩻 👩‍⚕️

This image, along with thousands of others, can now be used to train new AI image processors to spot a specific disease or disorder.

Data labeling is often a human endeavor, requiring thousands of people to manage big datasets. There are many companies providing these services globally (more on this in the comments).

I wanted to point this out because the effort that goes into data labeling and handling is often left out when we talk about AI/ML applications in health care, or about ChatGPT.
