v0.6 - Zero-Shot && Data Browser++

Usually, we give you a bunch of smaller improvements to further enhance your experience with kern. We still have some of those, but this time around we spared neither time nor headaches to give you some exciting new possibilities. This includes the brand-new "Zero-Shot" type of Information Source, as well as further Data Browser capabilities such as an embedding-powered similarity search (thanks, Qdrant!).

🎯 Zero Shot

In short, zero-shot learning is a problem setup in machine learning where, at test time, a learner observes samples from classes that were not observed during training. The only information the model needs is the name of the label itself, so it's best to avoid abbreviations and keep label names concise. This means that we can use pretty much any label and still get an evaluation of whether a given text fits that label, without any prior training or fine-tuning, which is awesome!
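Under the hood, this usually boils down to comparing a representation of the text against representations of the label names themselves. Here is a toy sketch of that idea (not kern's actual implementation, which relies on a proper pretrained model); the `embed` function below is just a bag-of-words counter standing in for a real sentence encoder:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in encoder: bag-of-words token counts. A real zero-shot
    # setup would use a pretrained sentence encoder or NLI model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def zero_shot(text: str, labels: list[str]) -> list[tuple[str, float]]:
    # Score each label name against the text; no training needed,
    # which is why concise, descriptive label names matter so much.
    doc = embed(text)
    scores = [(label, cosine(doc, embed(label))) for label in labels]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Highest-scoring label comes first.
print(zero_shot("win a free iphone now", ["free prize spam", "personal message"]))
```

Because the label name itself is the "training signal", renaming a label immediately changes the predictions, no retraining required.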



Image too small?

If you want to take a closer look at the images and GIFs, just click on them and they will scale to fit your screen!

Now, the downside is that zero-shot computations are somewhat performance-heavy, so you might need to wait a little for the results to show up. The good news: you can go grab a coffee or do something else (e.g. brainstorm labeling functions or label manually 😉) while the program works for you in the background.

But how can you ensure that the quality of the zero-shot classifier is what you want it to be before you execute it?
Introducing the zero-shot tester, which lets you get a feeling for the module's capabilities. We did our best to make it as intuitive as possible; take a look at the GIF below to get a first glimpse!


An example of how to use the zero-shot tester. Here you can apply the zero-shot classifier to different test sentences and labels. Every time you press the button to test on 10 records, a random selection of 10 records is fetched from your data. So you might want to press it multiple times to see the performance on different subsets!

๐Ÿ” Data Browser - Similarity Search

Embeddings are a great tool for building a computer-understandable representation of text, also called a "vector representation", and we can achieve a lot of great things with them. You should already know our Active Learning module, which uses these embeddings to train a model. But we can also compare these vectors against each other in terms of their proximity. That gives us similarity search: a way to explore similar records based on a given embedding. It is important to note that similarity search currently operates on the attribute level. This means you do not compare the whole record to others, but rather one attribute of the record to the same attribute of the others (e.g. search for similar headlines in our sample project).
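"Proximity" here is just a distance metric between vectors, typically cosine similarity, which Qdrant computes for us at scale. A minimal, self-contained sketch of the idea, with made-up three-dimensional vectors standing in for real attribute embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query_vec, records, k=2):
    # records: list of (record_id, attribute_embedding) pairs.
    # Rank every record's attribute vector against the query vector.
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [record_id for record_id, _ in ranked[:k]]

headlines = [
    ("rec-1", [0.9, 0.1, 0.0]),
    ("rec-2", [0.8, 0.2, 0.1]),
    ("rec-3", [0.0, 0.1, 0.9]),
]
print(most_similar([1.0, 0.0, 0.0], headlines, k=2))  # -> ['rec-1', 'rec-2']
```

Note that the comparison happens per attribute, matching how the similarity search works in kern: each record contributes one embedding per embedded attribute, not one embedding for the whole record.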


Search for similar records by pressing the similarity search button on the right side. Press "clear" to jump out of the similarity search view.


Similarity Search and other filters

At the moment you cannot combine similarity search with other filtering options, but you can start a labeling session from the results of a similarity query.


Similarity Search and Embeddings

Keep in mind that similarity search is based on embeddings, so the quality of the embeddings directly influences the quality of the similarity search.

😋 Further Data Browser enhancements

In our previous releases, we already mentioned that the Data Browser will be constantly evolving. So here are four enhancements that some of you asked for and we found to be very useful!
We finally included a way to filter manual labels by user ①! This one has been on our list for quite a while, and we are excited that it is now available.
Another addition is a display option to only show labels related to the current Weak Supervision run ②. This means that only labels used (or returned) by the current Weak Supervision run are displayed in the Data Browser; Labeling Functions that were not included in the calculation are hidden.
You want more? We got you covered.
Did you ever wonder where your Information Sources might differ? Where one Labeling Function predicts "spam" and the Active Learner says "ham"? With the next addition we can easily filter for exactly that ③, while also being able to combine the results with our other filter criteria.
Last but not least: the "Drill Down" ④. A name we aren't sure we will stick with. The feature, however, is a must-have.
Until now, you could only combine labels in the same task group with an OR operator, so finding only those records that have both label A and label B was quite the challenge. With the "Drill Down" activated, this task becomes a picnic.


For an explanation of the red numbers, see the text above the image!

So what are you waiting for? There are so many exciting new things to try and explore; jump right into kern and see what these features can do for you!

💗 Minor Changes

  • New Login Page 🔥
  • Increased the default value for label buttons displayed in the labeling tab to 5
  • Fixed an issue where on rare occasions multiple classification labels were created
  • Fixed an issue with the embedding creation progress bar