Labeling functions

Write heuristics within a few lines of Python code


Also available on 📺 YouTube: a video explanation of how you can build labeling functions.

If you want to automate parts of your data labeling, heuristics like labeling functions come in handy. To create one, head over to the heuristics page and select "Labeling function" from the "New heuristic" button.

Writing your labeling function

You'll land on a heuristic page with a code editor. Here you can write a Python function that takes a dictionary as input and outputs a label name. We loop over all records of your project, so think of the dictionary as one specific record, just as in the record IDE.
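To make the input and output concrete, here is a minimal sketch of such a function. The attribute name "headline" and the label name "urgent" are hypothetical, and treating a missing return value (None) as "no label for this record" is an assumption of this sketch.

```python
def lf_contains_breaking(record):
    # `record` is one record of the project, passed in as a dictionary.
    # "headline" is a hypothetical attribute name used for illustration.
    text = str(record["headline"]).lower()
    if "breaking" in text:
        return "urgent"  # hypothetical label name
    # No explicit return: the function yields None, which this sketch
    # assumes means "assign no label to this record".


# Trying the function on a hand-written record:
print(lf_contains_breaking({"headline": "Breaking: markets rally"}))  # prints "urgent"
```

Keeping each function to a single, narrow rule makes it easier to see how well that rule agrees with your manual labels.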

We run this code as containerized functions, which means the execution environment is prepared in advance. You can find the installed libraries in the requirements.txt of our execution environment repository.

As with any other heuristic, your function will automatically and continuously be evaluated against the data you label manually.

Lookup lists for distant supervision


Also available on 📺 YouTube: a video explanation of how you can build lookup-list-based labeling functions.

You'll quickly see that many of the functions you want to write are based on list expressions. But hey, you most certainly don't want to start maintaining a long list in your heuristic, right? That's why we've integrated automated lookup lists into our application.

As you manually label spans for your extraction tasks, we collect and store these values in a lookup list for the given label.

You can access them via the heuristic overview page when you click on "Lookup lists". You'll then find another overview page with the lookup lists.

If you click on "Details", you'll see the respective list and its terms. You can of course also create lookup lists fully manually and add terms as you like. This is also helpful if you have a long list of regular expressions you want to check in your heuristics. The details view also shows the Python variable name of the lookup list, which is countries in this example.

In your labeling function, you can then import it from the module knowledge, where we store your lookup lists. In this example, it would look as follows:
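A minimal sketch of such an import, using the countries list from above. The fallback values, the attribute name "headline", and the label name "geography" are hypothetical; the fallback only exists so the snippet also runs outside the application, where the knowledge module is not available.

```python
try:
    # Inside the application, lookup lists are imported from `knowledge`.
    from knowledge import countries
except ImportError:
    # Stand-in terms for running this sketch elsewhere (assumption).
    countries = ["Germany", "France", "Japan"]


def lf_mentions_country(record):
    # "headline" is a hypothetical attribute name used for illustration.
    text = str(record["headline"]).lower()
    for country in countries:
        if country.lower() in text:
            return "geography"  # hypothetical label name
    # Falls through with None when no term of the lookup list matches.
```

Because the terms live in the lookup list rather than in the function body, the function does not need to change as the list grows.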

Heuristics for extraction tasks

You might already wonder what labeling functions look like for extraction tasks, since labels are assigned at the token level. Essentially, they differ in two characteristics:

  • you use yield instead of return, as there can be multiple instances of a label in one text (e.g. multiple people)
  • you specify not only the label name but also the start index and end index of the span.

An example that incorporates an existing knowledge base to find further examples of this label type looks as follows:
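The following is a minimal sketch of the two characteristics listed above, not the original example. It assumes token-level indices with an exclusive end index, and it uses a plain whitespace split as a stand-in for the spaCy tokenization so that it runs without extra dependencies. The attribute name "headline", the label name "country", and the fallback list values are hypothetical.

```python
try:
    # Inside the application, lookup lists are imported from `knowledge`.
    from knowledge import countries
except ImportError:
    # Stand-in terms for running this sketch elsewhere (assumption).
    countries = ["Germany", "France", "Japan"]


def lf_country_spans(record):
    # Stand-in tokenization: whitespace split instead of spaCy tokens.
    tokens = str(record["headline"]).split()
    known = {country.lower() for country in countries}
    for index, token in enumerate(tokens):
        # Strip trailing punctuation so "Japan." still matches "Japan".
        if token.strip(".,;:!?").lower() in known:
            # yield (not return): one text can contain several spans.
            # Each result is label name, start index, end index (exclusive).
            yield "country", index, index + 1


spans = list(lf_country_spans({"headline": "Flights from Germany to Japan resume."}))
print(spans)
```

Since the function is a generator, it simply yields nothing for records without a match.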

This is also where the tokenization via spaCy comes in handy. You can access spaCy properties such as noun_chunks on the attributes of your record, which in many cases give you the very spans you want to label. Our template functions repository contains some great examples of how to use that.

Template functions

We realize that labeling functions can be a bit difficult to write at first. Because of that, we maintain a simple GitHub repository with example usages. You can copy and paste them, and even use them fully outside of our application.

If you have further ideas for template functions, please feel free to add them as issues.