Labeling workflow

How to create manual labels in kern

Even though our software allows you to quickly scale your labeling tasks, we do not aim to eliminate manual labeling altogether. Manual labeling helps you understand your data better, identify potential obstacles, and build your model.

On this page we will explain how manual labeling works in kern.

🏷️ Classification and extraction

At this point, we offer labeling tasks for classification (both binary and multi-class) and information extraction. As of now, we do not support native multi-label classification tasks. But there is a workaround for that! Create one labeling task for each label of your multi-label problem (essentially binarizing the problem) and start labeling with that setup. We plan to integrate native multi-label labeling tasks in the near future.
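The binarization workaround described above can be sketched in a few lines of generic Python. The label names and the `binarize` helper are purely illustrative and not part of kern itself:

```python
# A minimal sketch of the multi-label workaround: one binary task per label.
# Label names are illustrative, not part of kern.

LABELS = ["urgent", "billing", "technical"]

def binarize(record_labels, labels=LABELS):
    """Turn one multi-label annotation into one binary decision per task."""
    return {label: label in record_labels for label in labels}

# A record tagged both "urgent" and "billing" becomes three yes/no answers:
tasks = binarize({"urgent", "billing"})
# -> {"urgent": True, "billing": True, "technical": False}
```

Each key in the result corresponds to one binary labeling task you would create in kern's settings.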

Labeling for information extraction tasks
For tasks like named entity recognition, you can use the information extraction option. If there is an information extraction task active on an attribute, you can hover over the tokens of that attribute and select them (highlight them as you would do when copying the text). Once you select a range of tokens, a drop-down menu will appear where you can choose from the available labels.

Labeling for classification tasks
Classification labeling tasks can be defined on record level or attribute level. Attribute-level tasks are always displayed just below their attribute, while record-level tasks are displayed at the bottom of the record. For very efficient labeling, you can display a certain number of label buttons directly, without opening the label drop-down (specify that number in the top right, just above the record card).

Deleting labels
To remove labels, you can either press the "x" next to them in the record card or delete the entry in the label overview table located at the very bottom of the labeling tab.

📘

Adding labels within the labeling tab

If you come across a record that sparks an idea for a new label, you can create that label without going to the settings tab. Just add labels to a labeling task on the fly by entering a new label name in the drop-down search bar and then clicking the + icon.

🕹️ Workflow

In kern, we make use of labeling sessions. A labeling session is started either by entering the labeling tab or by clicking the "play button" in the upper right of a record in the data browser. If no labeling session is active, entering the labeling tab creates a new one containing only scale records. If you want to label records that meet certain criteria, set the desired filters in the data browser and start your labeling session from there. A labeling session holds 1,000 records, which should be enough to keep you busy!

📘

Saving your filters as Data Slices

If you have found a set of filters that you want to keep for later use, you can save them as data slices. Right now we only support dynamic data slices, which means that the result of applying the saved filters is recalculated every time you use that data slice.
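The key property of a dynamic slice is that it stores the filter rather than the matching records. A hedged sketch in generic Python (the `DynamicSlice` class and record fields are illustrative, not kern's internals):

```python
# Illustrative sketch of a *dynamic* slice: we store the filter predicate,
# not the matching records, so the result is recomputed on every access.

records = [
    {"id": 1, "confidence": 0.4},
    {"id": 2, "confidence": 0.9},
]

class DynamicSlice:
    def __init__(self, predicate):
        self.predicate = predicate  # the saved filter, evaluated lazily

    def apply(self, records):
        return [r for r in records if self.predicate(r)]

low_confidence = DynamicSlice(lambda r: r["confidence"] < 0.5)
assert [r["id"] for r in low_confidence.apply(records)] == [1]

# New records are picked up automatically on the next evaluation:
records.append({"id": 3, "confidence": 0.2})
assert [r["id"] for r in low_confidence.apply(records)] == [1, 3]
```

Because the filter is re-applied on each use, records added or relabeled after the slice was saved are automatically included or excluded the next time you open it.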

The labels you set while labeling will be added to the overview table at the bottom of the page. This table also contains information about the labels set by your information sources and by weak supervision. These are initially hidden to avoid introducing bias; if you want to view all of them, toggle the switch on the right side just above the table.

We distinguish two kinds of non-manual labels: those derived from individual information sources and those derived from information integration, i.e. weak supervision. Only weakly supervised or manual labels are subsequently used as actual labels (e.g. for the confusion matrix in monitoring and for export).
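To make the distinction concrete, here is a hedged sketch of information integration. Majority voting stands in for kern's actual integration logic, which may be more sophisticated; the vote format is an assumption for illustration only:

```python
from collections import Counter

# Sketch: each information source emits a vote (or None if it abstains),
# and weak supervision integrates the votes into one label.
# Majority vote is a stand-in for kern's real integration logic.

def integrate(votes):
    """Combine information-source votes into one weakly supervised label."""
    votes = [v for v in votes if v is not None]  # drop abstaining sources
    if not votes:
        return None  # no source fired on this record
    return Counter(votes).most_common(1)[0][0]

label = integrate(["positive", None, "positive", "negative"])
# -> "positive"
```

The individual votes correspond to the information-source labels shown in the overview table, while the integrated result is the weakly supervised label that downstream processes actually use.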

📘

Enriching records

By enriching your raw data sets with results from information sources and with manual or weakly supervised labels, you essentially add more dimensions to the data, enhancing your ability to slice your data sets and find potential weaknesses. Labeling the data also increases the amount of reference data used to determine the quality of the information sources, which further improves the quality of weak supervision.

👪 Multi-user labeling

Labeling tasks are often worked on by several people at the same time, which is why kern implements multi-user capabilities. You and your colleagues can label data at the same time in the same project using your individual kern accounts. Labels set by different people are indicated by icons carrying their respective initials in the top right of the labeling page. Clicking on an icon lets you enter that person's user-specific view, where you can see the labels they have set. You cannot edit the labels set by your colleagues, so if you cannot label a record at a given moment, make sure that you are currently in your own user view.

Take a look at the labels set by your colleagues by utilizing the user views in the top right corner.

If there are disagreements among users when labeling a record, you can resolve the conflict by making use of our gold labels ⭐. Gold labels can be set in the labeling tab as soon as our system detects a conflict on the presented record.

As soon as there is a disagreement, a little star icon appears that lets you choose the label that should be used in the application: the gold label. This label can also differ from the annotations that were made by the users! Just enter the gold star view and label your record like you normally would.

🚧

Unresolved conflicts

Records with unresolved conflicts (no gold label but contradicting manual labels) will not be processed for active learning, statistics calculation, or the confusion matrix, as those processes require unambiguous data to work best.
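The exclusion rule above can be sketched as a simple predicate. The field names (`gold_label`, `manual_labels`) are illustrative assumptions, not kern's actual data model:

```python
# Sketch of the exclusion rule: a record feeds downstream processes only if
# its manual labels agree, or a gold label resolves the disagreement.
# Field names are illustrative, not kern's actual schema.

def is_unambiguous(record):
    if record.get("gold_label") is not None:
        return True  # conflict resolved by a gold label
    return len(set(record["manual_labels"])) <= 1  # no contradicting labels

records = [
    {"manual_labels": ["spam", "spam"], "gold_label": None},   # agreement
    {"manual_labels": ["spam", "ham"],  "gold_label": None},   # unresolved
    {"manual_labels": ["spam", "ham"],  "gold_label": "spam"}, # resolved
]
usable = [r for r in records if is_unambiguous(r)]
# only the first and third records remain
```

Setting a gold label on the second record would make it usable again for active learning and statistics.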
