Also available on 📺 YouTube
Click here to see a video explanation of how you can use the data browser to navigate through your data.
The data browser is the heart of Kern Refinery. With it, you can create labeling sessions, filter down your data, find similar records, and much more. Let's dive right in!
The data browser comes with extensive filtering. The most straightforward option is to search for specific textual patterns in your texts, e.g. if an attribute contains a certain term. The filtered results highlight the found terms in your data.
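To make the idea concrete, here is a minimal sketch of such a text filter in plain Python. It is not how the data browser is implemented; the records, the attribute names (`headline`, `body`), and the `search` function are all made up for illustration, and "UK" is just an example search term.

```python
import re

# Hypothetical records; the attribute names are assumptions for this sketch.
records = [
    {"headline": "UK economy grows", "body": "Growth surprised analysts."},
    {"headline": "Markets rally", "body": "Stocks rose in the UK and the US."},
    {"headline": "Local news", "body": "Nothing to report."},
]

def search(records, term):
    """Keep records where any attribute contains `term`, with matches marked."""
    pattern = re.compile(re.escape(term))
    hits = []
    for record in records:
        if any(pattern.search(value) for value in record.values()):
            # Wrap each found term in brackets to mimic highlighting.
            highlighted = {
                key: pattern.sub(lambda m: f"[{m.group(0)}]", value)
                for key, value in record.items()
            }
            hits.append(highlighted)
    return hits

results = search(records, "UK")
for record in results:
    print(record)
```

Only the first two records survive the filter, and the matched term is marked in whichever attribute it appears.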
From the created filter, you could now directly jump into a labeling session - in other words, with the above-applied filter, you'd now only label records containing "UK" in some attribute. Now let's see how this can become more complex, and what great use cases you can build on that foundation for your labeling.
As you implement and run heuristics, they not only automate your labeling via weak supervision but also enrich your records, so that you can filter for them in the data browser. For instance, I can now look for the records that are hit by both the heuristics
DistilbertClassifier. As you see on the right side of the browser, I now have exactly those records.
This is super helpful when you want to better understand potential intersections and conflicts of heuristics. With that capability, you can better analyze where you need to debug your heuristics.
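Conceptually, looking for intersections and conflicts boils down to set operations over the records each heuristic has hit. The sketch below assumes a simple mapping from heuristic name to the labels it assigned per record id; `keyword_lookup` is a hypothetical heuristic name, and none of this is the actual Kern Refinery API.

```python
# Hypothetical heuristic outputs: heuristic name -> {record id: assigned label}.
# "keyword_lookup" is a made-up heuristic used only for this sketch.
heuristic_hits = {
    "keyword_lookup": {1: "positive", 2: "negative", 4: "positive"},
    "DistilbertClassifier": {2: "negative", 3: "positive", 4: "negative"},
}

def hit_by_all(heuristic_hits, names):
    """Record ids that every named heuristic has labeled (the intersection)."""
    id_sets = [set(heuristic_hits[name]) for name in names]
    return set.intersection(*id_sets)

def conflicts(heuristic_hits, names):
    """Shared record ids on which the named heuristics disagree."""
    shared = hit_by_all(heuristic_hits, names)
    return {
        record_id
        for record_id in shared
        if len({heuristic_hits[name][record_id] for name in names}) > 1
    }

names = ["keyword_lookup", "DistilbertClassifier"]
both = hit_by_all(heuristic_hits, names)
conflicting = conflicts(heuristic_hits, names)
print(sorted(both), sorted(conflicting))
```

Records in `conflicting` are good candidates to inspect first when debugging, since at least one of the heuristics must be wrong there.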
Alternatively, I can also use the confidence scores from the weakly supervised labels to order the records accordingly. Which records have super reliable labels? Easy to find out:
You can also use the labeling-task-specific drawers to select potential labeling mismatches. As you can order by the weak supervision confidence score, this makes it easy to find either manual labeling errors (i.e. there is a mismatch and the weakly supervised label has a high likelihood) or weak supervision bugs (i.e. there is a mismatch and the weakly supervised label has a low likelihood).
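As a rough illustration of that workflow, the following sketch selects records whose manual and weakly supervised labels disagree and orders them by confidence. The field names (`manual`, `weak`, `confidence`) and the sample values are assumptions made for this example, not the data browser's internal format.

```python
# Hypothetical records; field names and values are made up for this sketch.
records = [
    {"id": 1, "manual": "positive", "weak": "positive", "confidence": 0.95},
    {"id": 2, "manual": "negative", "weak": "positive", "confidence": 0.91},
    {"id": 3, "manual": "positive", "weak": "negative", "confidence": 0.35},
    {"id": 4, "manual": None, "weak": "negative", "confidence": 0.80},
]

def mismatches_by_confidence(records):
    """Manually labeled records whose weak label differs, most confident first."""
    mismatched = [
        r for r in records
        if r["manual"] is not None and r["manual"] != r["weak"]
    ]
    return sorted(mismatched, key=lambda r: r["confidence"], reverse=True)

ordered = mismatches_by_confidence(records)
ordered_ids = [r["id"] for r in ordered]
print(ordered_ids)
```

Reading the result from the top, high-confidence mismatches hint at manual labeling errors, while the low-confidence ones at the bottom are more likely to point at weak supervision issues.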
You can generally mix and match between different filter segments. The components you combine are joined by a conjunction (logical AND), so every additional component narrows down the result set.
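One way to picture the conjunction is as a list of predicates that a record must all satisfy. The sketch below is only a mental model under assumed record fields (`attributes`, `heuristics`, `confidence`); the helper names `contains`, `hit_by`, `min_confidence`, and `apply_filters` are hypothetical.

```python
# Hypothetical records; the field layout is an assumption for this sketch.
records = [
    {"id": 1, "attributes": {"text": "UK election"},
     "heuristics": {"DistilbertClassifier"}, "confidence": 0.9},
    {"id": 2, "attributes": {"text": "UK weather"},
     "heuristics": set(), "confidence": 0.7},
    {"id": 3, "attributes": {"text": "US election"},
     "heuristics": {"DistilbertClassifier"}, "confidence": 0.95},
    {"id": 4, "attributes": {"text": "UK budget"},
     "heuristics": {"DistilbertClassifier"}, "confidence": 0.4},
]

def contains(term):
    """Filter component: some attribute contains the term."""
    return lambda r: any(term in str(v) for v in r["attributes"].values())

def hit_by(heuristic):
    """Filter component: the record was hit by the given heuristic."""
    return lambda r: heuristic in r["heuristics"]

def min_confidence(threshold):
    """Filter component: weak supervision confidence is at least threshold."""
    return lambda r: r["confidence"] >= threshold

def apply_filters(records, predicates):
    """Join all components with AND: a record must pass every predicate."""
    return [r for r in records if all(p(r) for p in predicates)]

filters = [contains("UK")]
step1 = [r["id"] for r in apply_filters(records, filters)]
filters.append(hit_by("DistilbertClassifier"))
step2 = [r["id"] for r in apply_filters(records, filters)]
filters.append(min_confidence(0.8))
step3 = [r["id"] for r in apply_filters(records, filters)]
print(step1, step2, step3)
```

Each added component can only keep or shrink the result set, never grow it.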
Finally, you can store your filters to re-use them later on. Doing so, you have two options:
- storing them as dynamic slices: every time you select this filter, its conditions are re-computed. This keeps the filter highly flexible for additional customizations, but it takes longer to compute.
- storing them as static slices: the filtered result is stored in the form of indices. These slices are super fast and, because of that, can also be used on the monitoring page to drill down in your analysis.
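The trade-off between the two options can be sketched in a few lines of Python. This is one reading of the description above, not the actual implementation: the `DynamicSlice` and `StaticSlice` classes and the sample records are hypothetical.

```python
class DynamicSlice:
    """Stores the filter condition and re-evaluates it on every use."""

    def __init__(self, predicate):
        self.predicate = predicate

    def resolve(self, records):
        # Re-computed each time: slower, but reflects the current data.
        return [i for i, r in enumerate(records) if self.predicate(r)]


class StaticSlice:
    """Stores the matching indices once, at creation time."""

    def __init__(self, predicate, records):
        self.indices = [i for i, r in enumerate(records) if predicate(r)]

    def resolve(self, records):
        # Just a lookup of stored indices: fast, but frozen.
        return list(self.indices)


# Hypothetical records and filter condition.
records = [{"text": "UK news"}, {"text": "US news"}]

def mentions_uk(record):
    return "UK" in record["text"]

dynamic = DynamicSlice(mentions_uk)
static = StaticSlice(mentions_uk, records)

# A new matching record arrives after both slices were stored.
records.append({"text": "UK sports"})

dynamic_result = dynamic.resolve(records)
static_result = static.resolve(records)
print(dynamic_result, static_result)
```

In this sketch the dynamic slice picks up the newly added record because it re-runs the condition, while the static slice still returns only the indices it stored when it was created.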