Labeled any data today? 🤔 🏷️ 🏷️
Not sure? Did you click on an image of stairs when logging onto a site today? 🧑‍💻 👩🏼‍💻 👨‍💻
Maybe not today, but if you ever have, then yes, you are indeed a data labeler.
Data labeling is vital to powering the magic of artificial intelligence and machine learning applications. 🪄
Most people don’t realize that these technologies had to be “taught” what they know using vast datasets of information.
These datasets consist of various types of data: visual, textual, and numeric; quantitative and qualitative; structured, semi-structured, and unstructured; real-time or historical (retrospective). The type you’re handling is usually evident from the context you’re working in.
But one thing transcends all of this: the need for data to be labeled.
Let’s use an example of a traffic light—think reCAPTCHA. 🚦
When you’re logging into a website and reCAPTCHA asks you to click on images of a traffic light, you’re actually labeling data for Google. 🥳
Your clicks help identify and confirm which images contain a traffic light. This is data labeling, and it’s essential for building AI/ML applications.
Now let’s consider medical MRI or CT images of tumors, tears, and other pathologies. A trained eye can spot these conditions on an image, label and describe what they see, and add the newly labeled image to a dataset. 👩‍⚕️
This image, along with thousands of others, can now be used to train new AI image processors to spot a specific disease or disorder.
Data labeling is often a human endeavor, requiring thousands of people to manage big datasets. Many companies provide these services globally (more on this in the comments).
I wanted to point this out because the effort behind data labeling and handling is often left out when we talk about AI/ML applications in health care, or about ChatGPT.