Heuristics

Labeling functions

Labeling functions are the simplest concept of a heuristic. They consist of some Python code that labels the data according to some logic that the user can provide. This can be seen as transferring the domain expertise from concepts stored in a person's mind to explicit rules, which are human- and machine-readable.

Creating a labeling function

To create a labeling function simply navigate to the heuristics page and select "Labeling function" from the "New heuristic" button.

Fig. 1: Screenshot of the heuristics page where the user is about to add a labeling function by accessing the dropdown at the top.

After that, a modal will appear where you need to select the labeling task that this heuristic is for and give the labeling function a unique name and an optional description. These selections are not finite and can be changed easily later.

Fig. 2: The user selected the classification labeling task 'topic' for this labeling function, which will be a simple lookup function.

After creation, you will be redirected to the details page of that labeling function, which is also accessible from the heuristics overview. Here, you have the option to start your labeling function from scratch or to "search in bricks". Bricks is our open-source content library that collects a lot of standard NLP tasks, e.g. language detection, sentiment analysis, or profanity detection. If you are interested in integrating a bricks module, please look at the bricks integration page.

Fig. 3: Screenshot of the labeling function details page after initialization.

The next section will show you how to write a labeling function from scratch.

Writing a labeling function

After creation, you will be redirected to the details page, which packs many features. For this section, we will be concentrating on the code editor.

Before we start coding, let's talk about the signature of the labeling function:

The input record is a dictionary containing a single record with the attribute names as keys
- as we tokenize text attributes, they won't be a simple string, but rather a spaCy Doc object (use attribute.text for the raw string)
- the other attributes can be accessed directly, even categorical attributes
The output will be different depending on the type of labeling task
- a classification task must have a return statement that returns an existing label name as a string
- an extraction task must have a yield statement that follows the pattern of yield YOUR_LABEL, span.start, span.end where YOUR_LABEL also is an existing label name as a string

To write a labeling function, you just input your code into the code editor. Be aware that auto-save is always on! So if you plan to make big changes, either create a new labeling function or save the old code in a notepad.

Fig. 4: Screenshot of the labeling function details page where the user finished writing the labeling function. It imports a lookup list called 'business' and looks for any of these terms in the headline attribute in order to return the label 'business'. This could be extended to look for a minimum threshold of matching terms to increase accuracy.

Above the code editor, there are some quality-of-life features:

Attributes: list of attributes that are available in your data. The colors of these buttons indicate the data type, hover over them for more details. They can also be pressed which copies the name to your clipboard - no more typos!
Lookup lists: list of available lookup lists and the number of terms that are in them. Click them to copy the whole import statement to the clipboard.
Editor: the dropdown right next to it defines the labeling task that the labeling function will be run on. Next to that are more colorful buttons that represent the available labels for that task. Click them to copy to the clipboard!

There are many pre-installed useful libraries, e.g. beautifulsoup4, nltk, spacy, and requests. You can check what libraries are installed by clicking the See installed libraries button on the top right corner just above the editor. You read that right, requests work within labeling functions! So you can also call outside APIs and save those predictions as a labeling function. Though, if you have a production model that you want to incorporate, we suggest using model callbacks.

Our labeling function is still in the state initial, which means we cannot use it anywhere. To change that, the next chapter will cover running this function on your data.

Running a labeling function

After you've written your labeling function, you have three options to run it on your data:

Run on 10: Randomly sample 10 records from your data and run the selected function on it, then display the selected attribute of the record with the prediction of the labeling function.
Run: Run the function on all the records of your project. This will alter the state of your labeling function to running and after that (depending on the outcome) either to finished or error.
Run + weakly supervise: Just like Run but triggers a weak supervision calculation with all heuristics selected directly afterward.

Fig. 5: Screenshot of the labeling function details page where the user tested his labeling function called lookup_business on 10 randomly selected records with the 'run on 10' feature. The user can now observe what records are predicted to be 'business'-related to validate the function.

We generally recommend first running your function a bunch of times with the "run on 10" feature, as you can observe edge cases, bugs, and other things while doing so.

Fig. 6: Screenshot of the labeling function details page after running the labeling function on the full data (indicated by the state finished).

Your code is run as isolated containerized functions, so don't worry about breaking stuff. This is also the reason why we display "container logs" below the code editor. For all the information on these logs, go to the container logs section further down on this page.

Deleting a labeling function

Deleting a labeling function will remove all data associated with it, which includes the heuristic and all the predictions it has made. However, it does not reset the weak supervision predictions. So after deleting a labeling function that was included in the latest weak supervision run, the predictions are still included in the weak supervision labels. Consider re-calculating the weak supervision if that is an issue.

In order to delete the labeling function simply scroll to the very bottom of the labeling function details, click on the delete button, and confirm the deletion in the appearing modal.

Alternatively, you can delete any heuristic on the heuristics overview page by selecting it and going to Actions -> delete selected right next to the weak supervision button. Don't worry, there will still be a confirmation modal.

Container logs

When you run (or run on 10) your labeling function, refinery executes it in a freshly spawned docker container, which you cannot inspect easily from outside as it shuts itself down after the calculation finishes. That is why we display the resulting logs of the execution on the labeling function details page.

The logs of the latest full run are persisted in the database while the logs of the "run on 10" feature are only cached in the frontend, which means you will lose them as soon as you refresh or leave the page.

Fig. 7: Screenshot of the labeling function details page where the user added a print statement to the labeling function to observe which terms matched with his records.

Besides inspecting error messages, you can also use the container logs for debugging your labeling function using print statements.

Fig. 8: Screenshot of the labeling function details page where the user ran his labeling function with the print statement (from Fig. 7) on 10 randomly sampled records. You can see the print results in the container logs.

Best practices and examples

Validation

After running your labeling function on the whole dataset, you get statistics describing its performance. The statistics should be used as an indicator of the quality of the labeling function. Make sure to understand those statistics and follow best practices in evaluating heuristics.

Lookup lists for distant supervision

You'll quickly see that many of the functions you want to write are based on list expressions. But hey, you most certainly don't want to start maintaining a long list in your heuristic, right? That's why we've integrated automated lookup lists into our application.

As you manually label spans for your extraction tasks, we collect and store these values in a lookup list for the given label.

Fig. 9: Screenshot of the labeling suite where the user labels a span that is automatically added to a lookup list.

You can access them via the heuristic overview page when you click on "Lookup lists". You'll then find another overview page with the lookup lists.

Fig. 10: Screenshot of the lookup lists overview accessed by going through the heuristic overview page.

If you click on "Details", you'll see the respective list and its terms. You can of course also create them fully manually, and add terms as you like. This is also helpful if you have a long list of regular expressions you want to check for your heuristics. You can also see the python variable for the lookup list, as in this example country.

Fig. 11: Screenshot of the lookup list details view where one can manage the lookup list.

In your labeling function, you can then import it from the module knowledge, where we store your lookup lists. In this example, it would look as follows:

Fig. 12: Screenshot of a labeling function importing and utilizing a lookup list called 'country'.

Heuristics for extraction tasks

You might already wonder what labeling functions look like for extraction tasks, as labels are on token-level. Essentially, they differ in two characteristics:

you use yield instead of return, as there can be multiple instances of a label in one text (e.g. multiple people)
you specify not only the label name but also the start index and end index of the span.

An example that incorporates an existing knowledge base to find further examples of this label type looks as follows:

Fig. 13: Screenshot of a labeling function for an extraction task.

This is also where the tokenization via spaCy comes in handy. You can access attributes such as noun_chunks from your attributes, which show you the very spans you want to label in many cases. Our template functions repository contains some great examples of how to use that.

Active learners

Active learners are few-shot learning models that leverage the powerful pre-trained language models that created the embeddings of your records. You can treat them as custom classification heads that are trained on your labeled reference data. They are different from the classical notion of an active learner in the sense that they do not query the user for data points to be labeled, but more on that in the best practices section.

Many concepts in the UI are similar to the ones of the labeling functions.

Creating an active learner

To create a labeling function simply navigate to the heuristics page and select "Active learning" from the "New heuristic" button.

Fig. 1: Screenshot of the heuristics overview page where the user is about to create a new active learner.

After that, a modal will appear where you need to select the labeling task that this heuristic is for and give the active learner a unique name and an optional description. These selections are not finite and can be changed later, but they are used for the initial code generation, which will be a lot easier to use if you select the right things.

Fig. 2: Screenshot of the heuristics overview page where the user entered all the information required to create a new active learner.

After creation, you will be redirected to the details page (Fig. 3) of that active learner, which is also accessible from the heuristics overview. There is a lot of pre-filled code, which will be explained in the next section, where we write an active learner.

Fig. 3: Screenshot of the active learner details page after creation. The user has not modified anything, the code is entirely pre-filled.

Writing an active learner

When you create an active learner, there will be a lot of pre-filled code, which is a good baseline to work with, that you should keep as a structure and only exchange parts of. If you are not interested in the technical details and want to use active learners without touching code too much, please skip forward to the best practices and examples.

Generally, you have to define a new class that carries the name of the active learner, which implements an abstract LearningClassifier (see Fig. 3). In order to use this active learner, you have to implement three abstract functions:

__init__(self): initializes the self.model with an object that fulfills sklearn estimator interface. Can be modified to use the model of your choice.
fit(self, embeddings, labels): fits self.model to the embeddings and labels. Should not be modified.
predict_proba(self, embeddings): makes predictions with self.model and returns the probabilities of those predictions. Should not be modified.

If you are wondering why you should not modify two of those functions, it is because we put all the interchangeable parameters into the decorators (which begin with an "@").

In @params_fit you can specify the embedding name and train-test-split for the fitting process of your model. All valid embedding names can be seen above the code editor and the specified embedding will also be the one used for prediction. The train-test-split is currently fixed to a 50/50 split to ensure that there are enough records for thorough validation.

In @params_inference you can specify the minimal confidence that your model should have in order to output the prediction it made and the label names which this active learner should make predictions for. One active learner can be trained on any amount of classes. If you specify None, then all the available labels are used. The default value for min_confidenceis 0.9, which makes sure that your predictions are less noisy. If you find to suffer from very low coverage, consider changing this value, but the lower you set it, the more validation data should be available.

Per default, we use the LogisticRegression from sklearn, which is common practice and we recommend trying it before switching to another model.

Running an active learner

After you have finished writing your active learner (or just went with the default one), you can run it using either:

Run: Fits the active learner on all manually labeled records from the training split that carry a label specified in label_names. After the fitting, it will make predictions for all records while obeying the specified min_confidence. Finally, it will calculate the statistics using the test records. This action will alter the state of your active learner first to running and after that (depending on the outcome) either to finished or error.
Run + weakly supervise: Just like Run but triggers a weak supervision calculation with all heuristics selected directly afterward.

Fig. 4: Screenshot of the active learner details page after running it. The combination of extreme precision estimates and low coverages suggests that the user should try a lower confidence threshold.

For more details on the statistics, please visit the page about evaluating heuristics.

Deleting an active learner

Deleting an active learner will remove all data associated with it, which includes the heuristic and all the predictions it has made. However, it does not reset the weak supervision predictions. So after deleting an active learner that was included in the latest weak supervision run, the predictions are still included in the weak supervision labels. Consider re-calculating the weak supervision if that is an issue.

In order to delete the active learner simply scroll to the very bottom of the active learner details, click on the delete button, and confirm the deletion in the appearing modal.

Alternatively, you can delete any heuristic on the heuristics overview page by selecting it and going to "Actions" -> "delete selected" right next to the weak supervision button. Don't worry, there will still be a confirmation modal.

Best practices and examples

Validation

After running your active learner on the whole dataset, you get statistics describing its performance. The statistics should be used as an indicator of the quality of the active learner. Make sure to understand those statistics and follow best practices in evaluating heuristics.

Custom active learning model

You can use Scikit-Learn inside the editor as you like, e.g. to extend your model with grid search. The self.model is any model that fits the Scikit-Learn estimator interface, i.e. you can also write code like this:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

class ActiveDecisionTree(LearningClassifier):

    def __init__(self):
        params = {
            "criterion": ["gini", "entropy"],
            "max_depth": [5, 10, None]
        }
        self.model = GridSearchCV(DecisionTreeClassifier(), params, cv=3)

# ...

As with any other heuristic, your function will automatically and continuously be evaluated against the data you label manually.

Minimum confidence for finetuning

One way to improve the precision of your heuristics is to label more data (also, there typically is a steep learning curve, in the beginning, so make sure to label at least some records). Another way is to increase the min_confidence threshold of the @params_inference decorator. Generally, precision beats recall in active learners for weak supervision, so it is perfectly fine to choose higher values for the minimum confidence.

Active learning for extraction

We're using our own library sequencelearn to enable a Scikit-Learn-like API for programming span predictors, which you can also use outside of our application.

Other than importing a different library, the logic works analog to active learning classifiers.

Zero-shot classifiers

Zero-Shot Classification refers to the task of predicting a class that wasn't seen by the model during training. In refinery, this is accomplished by embedding the label names and text attributes into the same space, where distance metrics (e.g. cosine similarity) can be used to assign the closest label to the respective record. Read more about zero-shot classification here.

Zero-shot classifiers are only available for multiclass classification. We are still in the process of investigating the capabilities of zero-shot classifiers for extraction tasks.

Creating a zero-shot classifier

Just like the other heuristics, we can add a zero-shot classifier by visiting the heuristic overview page and selecting "zero-shot" in the "new heuristic" dropdown menu.

Fig. 1: Screenshot of the heuristic overview page where the user is creating a new zero-shot classifier.

The information we then have to fill out (see Fig. 2) is pretty straightforward. Select the task and attribute, then input a handle from Hugging Face. Make sure that the Hugging Face model is suitable for zero-shot classification. You can browse the available models here.

Fig. 2: Screenshot of the zero-shot classifier creation dialogue.

After creation, you are redirected to the details page of the zero-shot classifier (see Fig. 3).

Fig. 3: Screenshot of the zero-shot classifier details page.

Experimenting with the zero-shot classifier

In theory, you could run this classifier out of the box with the default configuration by just clicking on the "run" button. But there are a few things one should keep in mind for optimal performance. As this classifier is really relying on the wording of the labels for classification, you should first make sure that the label names are expressive. If a label is called yes, then it does not carry any information for this zero-shot module even though it might totally make sense for a human labeler in combination with the name of the labeling task. For example, consider renaming yes to clickbait headline if it is a clickbait labeling task.

Because this module requires a little experimentation, refinery gives you the tools to try out a lot of variations. The first thing is the "run on 10" feature, which is common among the heuristics. It samples 10 random records from your data and executes the classifier on them.

Fig. 4: Screenshot of the zero-shot classifier details page, where the user ran it on 10 randomly sampled records of the `AG News` sample use case.

After running the classifier on 10 samples, we can inspect the results in the table below (see Fig. 4). All the yellow exclamation marks indicate that these records are below the prediction threshold that was set further up on this page (a good default is 40 %).

When inspecting these results (Fig. 4), we can see that the first entry was predicted to carry the label world. If we read the headline carefully, we might get an idea why that could have been:

Gamers Eye Open Virtual Worlds Fans of open-source programming and multiplayer gaming explore ways to bring the concept of the metaverse -- an infinitely expandable virtual world -- closer to fruition. By Daniel Terdiman.

If we now click on the "view" button next to the label, we get the full details of that record and prediction (see Fig. 5). The classifier lists sci_tech as the second most probably option, which would be the correct label, but apparently sci_tech is not expressive enough (overall, it is not even a real world) and world expresses the wrong concept, even though we humans might be able to use a better abstraction of it.

Fig. 5: Screenshot of the detailed predictions after the user clicked on the 'view' button.

In order to prototype better label descriptions, we can use the zero-shot playground on this details page. For that, we just insert the text we want to classify into the first textbox and put the new label names into the textbox below (comma separated). We then press "compute example", which returns the probabilities of the new label names. This has been done in Fig. 6, where the user replaced sci_tech with science and technology and replaced world with international news. The assigned label is now the correct one with a better-looking probability distribution.

Fig. 6: Screenshot of the zero-shot classifier details page where the user experiments with new label names to improve zero-shot performance.

If you like the new label names and want to test them on more data, you can now use the "run on 10" feature with these new label names. Just leave them entered in the textbox and press "run on 10". This way you get a better overview, which helps you further identify potential issues. But don't get lost in this iterative process. In the end, the zero-shot classifier is just a single imperfect heuristic in a whole collection of other heuristics.

Fig. 7: Screenshot of the zero-shot classifier details page where the user utilizes the 'run on 10' feature with custom label names to test them on more data before finalizing the renaming.

Running a zero-shot classifier

Running the zero-shot classifier will only consider those labels whose boxes are checked at the top of the details page. So if you need to make final adjustments to the names of your labels, you need to head over to the settings page and change them before running this heuristic.

If you want to exclude some labels, because they are used more as flags instead of labels (e.g. can be deleted? or needs clarification), then make sure that they are unchecked before running the heuristic.

Finally, if everything is ready, you just have to click on "run" and the heuristic will be executed on all of your records. All predictions that don't surpass the set threshold are dropped.

Deleting a zero-shot classifier

You can delete the zero-shot classifier by scrolling to the very bottom of its details page and clicking on the delete button. This will remove all predictions of this heuristic, but won't change the weak supervision results that were calculated prior to the deletion.

Alternatively, you could go to the heuristics overview page, select the classifier, and then press on "actions" -> "delete selected". This will result in the same action.

Examples and best practices

Validation

After running your zero-shot classifier on the whole dataset, you get statistics describing its performance. The statistics should be used as an indicator of the quality of the classifier. Make sure to understand those statistics and follow best practices in evaluating heuristics.

Crowd labeling

This feature is only available in the managed version.

When you have some annotation budget available or just know some colleagues who have a little less domain expertise than required, you can still incorporate their knowledge as a crowd labeling heuristic. Before starting with this, you should be aware of the implications of this design choice. As this is only a heuristic, you will not be able to use these labels individually, but only in an aggregated form by incorporating them in the weak supervision process.

Pre-requisites

Before you can use the crowd labeling heuristic, you should first create static data slices that the annotators will work on. If you want to randomly distribute work, we suggest slicing your data according to the primary key (e.g. running_id), shuffling it randomly, and saving that to a slice.

You should also have at least one user with the annotator role assigned to your workspace. If that is not the case, get in contact with us and we will set it up together with you.

Creating a crowd labeling heuristic

As with every heuristic, we start by visiting the heuristics page and selecting "crowd labeling" in the dropdown "new heuristic".

Fig. 1: Screenshot of the heuristics overview page. The user is adding a new crowd labeling heuristic.

This heuristic also requires the user to specify the labeling task, a name, and a description. This can also still be updated in the next step.

Fig. 2: Screenshot of the modal that appears after selecting to add a new crowd labeling heuristic.

After you entered everything and clicked on "create", you will be redirected to the details page of that crowd heuristic. As you can see in Fig. 3, the state is still initial and there is no annotator or slice selected. You can now fill in the required information, so assign one annotator account to one static data slice. After that, you can generate a link that can be sent out to the annotator.

Fig. 3: GIF of the user filling in the required information for the crowd heuristic shortly after creation.

Distributing the labeling work

Before sending out any links, you should be sure that you selected the right labeling task, annotator, and data slice. As soon as one crowd label is assigned to this heuristic, these options will be locked and cannot be changed anymore.

Once you made sure everything is correct, you can just send the link out to the person that has access to the selected annotator account (e.g. send the link to [email protected] in the example of Fig. 3).

Want to revoke access to this heuristic but already sent out the link? Selecting the lock icon below the link will "lock" the annotator out of this heuristic, which means they won't be able to annotate or view the data anymore. This way you can revoke access to the data without deleting the heuristic.

As you can see in Fig. 4, as soon as the off-screen annotator set the first label, the settings get locked. The progress bar indicated the progress of the annotator on the selected slice. Statics are collected and live updated without refreshing the page. Read more on evaluating heuristics if you are curious about the meaning of the statistics.

Fig. 4: GIF of the crowd heuristic details page while the annotator is labeling off-screen. Live updates show the engineer how far the annotator progressed and with what accuracy compared to the manual reference labels.

Deleting a crowd labeling heuristic

Currently, there is no delete button on the crowd heuristic details page. That means the only way to delete this heuristic is by selecting it on the heuristic overview page and clicking on "actions" -> "delete selected".