Labeling functions

Write heuristics within a few lines of Python code

📘 Also available on 📺 YouTube: a video explanation of how you can build labeling functions.

If you want to automate parts of your data labeling, heuristics like labeling functions come in handy. To create one, head over to the heuristics page and select "Labeling function" from the "New heuristic" button.


Writing your labeling function

You'll land on the heuristic's page, which contains a code editor. Here you can write a Python function that takes a dictionary as input and outputs a label name. We loop over all records of your project, so think of the dictionary as one specific record, just as in the record IDE.
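As a minimal sketch of such a function: the attribute name headline, the keywords, and the label name politics are hypothetical, and text attributes are treated as plain strings to keep the example self-contained, which may differ from what the application passes in.

```python
def contains_election_terms(record):
    """Return a label name if the headline mentions an election-related term."""
    # "headline" is a hypothetical attribute name; use one from your own project.
    headline = str(record["headline"]).lower()
    # Hypothetical keywords and label name, chosen only for illustration.
    for keyword in ("election", "parliament", "senate"):
        if keyword in headline:
            return "politics"
    # No match: fall through and return None
    # (assumed here to mean "no prediction for this record").


print(contains_election_terms({"headline": "Senate passes new bill"}))
```

Keeping each function focused on one narrow signal like this makes it easier to judge how well it performs against your manual labels.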


We run this code as containerized functions, so the execution environment is prepared in advance and you can only use the libraries installed there. You can find them in the requirements.txt of our execution environment repository. If you're missing anything, please open a thread in our community forum.

As with any other heuristic, your function will automatically and continuously be evaluated against the data you label manually.
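To get a feeling for what such an evaluation means, here is a rough sketch you could run locally. It is not the application's implementation: the metrics (coverage and precision), the helper name evaluate, and the sample records are all assumptions made for illustration.

```python
def mentions_senate(record):
    """Hypothetical labeling function used only for this illustration."""
    return "politics" if "senate" in record["headline"].lower() else None


def evaluate(labeling_function, labeled_records):
    """Compare a labeling function's outputs with manual labels.

    labeled_records is a list of (record, manual_label) pairs.
    Returns (coverage, precision); precision is None if the function never fires.
    """
    hits = 0     # records for which the function returned a label
    correct = 0  # hits that agree with the manual label
    for record, manual_label in labeled_records:
        prediction = labeling_function(record)
        if prediction is None:
            continue
        hits += 1
        if prediction == manual_label:
            correct += 1
    coverage = hits / len(labeled_records) if labeled_records else 0.0
    precision = correct / hits if hits else None
    return coverage, precision


# Made-up manually labeled records.
labeled = [
    ({"headline": "Senate votes on budget"}, "politics"),
    ({"headline": "Senate cafeteria reopens"}, "other"),
    ({"headline": "Cup final tonight"}, "sports"),
    ({"headline": "Rain expected tomorrow"}, "other"),
]
print(evaluate(mentions_senate, labeled))
```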

Lookup lists for distant supervision

📘 Also available on 📺 YouTube: a video explanation of how you can build lookup-list-based labeling functions.

You'll quickly see that many of the functions you want to write are based on lists of terms. But hey, you most certainly don't want to maintain a long list inside your heuristic, right? That's why we've integrated automated lookup lists into our application.

As you manually label spans for your extraction tasks, we collect and store these values in a lookup list for the given label.


You can access them via the heuristic overview page when you click on "Lookup lists". You'll then find another overview page with the lookup lists.


If you click on "Details", you'll see the respective list and its terms. You can of course also create lists fully manually and add terms as you like. This is also helpful if you have a long list of regular expressions you want to check in your heuristics. The details page also shows the Python variable name of the lookup list (countries in this example).

In your labeling function, you can then import the list by its Python variable name (countries in this example) from the module knowledge, where we store your lookup lists.
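A minimal sketch of a lookup-list-based labeling function could look like this. The import follows the description above; the fallback list, the attribute name headline, and the label name politics are assumptions added so the example also runs outside the application, where the knowledge module does not exist.

```python
try:
    # Inside the application, lookup lists are imported from the knowledge module.
    from knowledge import countries
except ImportError:
    # Stand-in values so the sketch also runs outside the application.
    countries = ["Germany", "France", "Japan"]


def lkp_countries(record):
    """Return a label if any term of the lookup list occurs in the headline."""
    headline = str(record["headline"]).lower()  # hypothetical attribute name
    for country in countries:
        if country.lower() in headline:
            return "politics"  # hypothetical label name


print(lkp_countries({"headline": "Talks between France and Japan resume"}))
```

Note that this uses simple substring matching, so short terms can also match inside longer words; matching on whole tokens is stricter.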

Heuristics for extraction tasks

You might already wonder what labeling functions look like for extraction tasks, as labels are assigned at the token level. Essentially, they differ in two ways:

  • you use yield instead of return, as there can be multiple instances of a label in one text (e.g. multiple people)
  • you specify not only the label name but also the start index and end index of the span.

A typical example incorporates an existing knowledge base, such as a lookup list, to find further occurrences of this label type in your texts.
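The two differences can be sketched as follows. This is an illustration under assumptions, not the application's exact interface: whitespace splitting stands in for the real tokenizer, the indices are taken to be token positions with an exclusive end, and the fallback list, the attribute name headline, and the label name country are hypothetical.

```python
try:
    # Inside the application, lookup lists are imported from the knowledge module.
    from knowledge import countries
except ImportError:
    # Stand-in values so the sketch also runs outside the application.
    countries = ["Germany", "France", "Japan"]


def extract_countries(record):
    """Yield (label, start, end) for every token found in the lookup list."""
    known = {country.lower() for country in countries}
    # Whitespace tokenization stands in for the application's tokenizer.
    tokens = str(record["headline"]).split()
    for index, token in enumerate(tokens):
        # Strip trailing punctuation so "Japan." still matches "Japan".
        if token.strip(".,;:!?").lower() in known:
            # Assumed convention: token indices, end index exclusive.
            yield "country", index, index + 1


print(list(extract_countries({"headline": "Talks between France and Japan resume."})))
```

Because the function yields instead of returning, a single record can produce several spans, one per match.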

This is also where the tokenization via spaCy comes in handy. You can access properties such as noun_chunks on your record's attributes, which in many cases give you the very spans you want to label. Our template functions repository contains some great examples of how to use that.

Template functions

We realize that labeling functions can be a bit difficult to write at first. Because of that, we have a simple GitHub repository in which we show some example usages. You can copy and paste them, and even use them fully outside of our application.


If you have further ideas for template functions, please feel free to add them as issues.

