Data management

Navigate through your data easily

Also available on YouTube: a video explanation shows how you can use the data browser to navigate through your data.

The data browser is the heart of Kern Refinery. With it, you can create labeling sessions, filter your data, find similar records, and much more. Let's dive right in!

Attribute filters

The data browser comes with extensive filtering. The most straightforward option is to search for specific textual patterns in your texts, e.g. whether an attribute contains a certain term. The matching terms are highlighted in the filtered results.
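
To make the idea concrete, here is a minimal sketch of a "contains" filter with highlighting in plain Python. It does not use the Kern Refinery API; the sample records, the attribute names, and the helpers `contains_term` and `highlight` are assumptions made up for illustration.

```python
import re

# Hypothetical sample records; the attribute names are illustrative only.
records = [
    {"headline": "UK economy grows", "body": "Growth surprised analysts."},
    {"headline": "Markets rally", "body": "Shares in the UK and US rose."},
    {"headline": "Local news", "body": "Nothing to report."},
]


def contains_term(record, term):
    """Return True if any attribute of the record contains the term (case-sensitive)."""
    return any(term in str(value) for value in record.values())


def highlight(text, term):
    """Wrap each occurrence of the term in ** markers to mimic highlighting."""
    return re.sub(re.escape(term), lambda m: f"**{m.group(0)}**", text)


# Keep only the records in which some attribute contains "UK".
matches = [r for r in records if contains_term(r, "UK")]

for r in matches:
    print({key: highlight(value, "UK") for key, value in r.items()})
```

The filter checks every attribute, which mirrors searching for a term "in some attribute"; restricting the search to a single attribute would only need a lookup on that key.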

From a created filter, you can jump directly into a labeling session. For example, with a filter for the term "UK" applied, you'd only label records containing "UK" in some attribute. Now let's see how this can become more complex, and what great use cases you can build on that foundation for your labeling.

Heuristic-based filters

As you implement and run heuristics, they not only automate your labeling via weak supervision; they also enrich your records so that you can filter for them in the data browser. For instance, you can look for the records that are hit by both the heuristics starts_with_digit and DistilbertClassifier. The right side of the browser then shows exactly those records.

This is super helpful when you want to better understand potential intersections and conflicts of heuristics. With that capability, you can better analyze where you need to debug your heuristics.
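
As a rough sketch of what such an intersection looks like, the snippet below selects records hit by both heuristics and then flags the ones where the two disagree. The heuristic names come from the example above; the record layout (a `hits` mapping from heuristic name to predicted label) and the labels "A" and "B" are assumptions for illustration, not the way Kern Refinery stores its data.

```python
# Hypothetical records: "hits" maps each heuristic that fired to the label it predicted.
records = [
    {"id": 0, "hits": {"starts_with_digit": "A", "DistilbertClassifier": "A"}},
    {"id": 1, "hits": {"starts_with_digit": "A", "DistilbertClassifier": "B"}},
    {"id": 2, "hits": {"DistilbertClassifier": "B"}},
    {"id": 3, "hits": {}},
]

wanted = ["starts_with_digit", "DistilbertClassifier"]


def hit_by_all(record, names):
    """True if every named heuristic produced a hit for this record."""
    return all(name in record["hits"] for name in names)


def has_conflict(record, names):
    """True if the named heuristics predicted more than one distinct label."""
    return len({record["hits"][name] for name in names}) > 1


# Intersection: records hit by both heuristics.
intersection = [r for r in records if hit_by_all(r, wanted)]
intersection_ids = [r["id"] for r in intersection]

# Conflicts: records in the intersection where the heuristics disagree.
conflict_ids = [r["id"] for r in intersection if has_conflict(r, wanted)]

print(intersection_ids, conflict_ids)
```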

Confidence-based ordering

Alternatively, you can use the confidence scores of the weakly supervised labels to order your records. Which records have super reliable labels? Ordering by confidence makes that easy to find out.
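
Conceptually, this is a sort on the confidence score. A minimal sketch, assuming each record carries a weakly supervised label and a confidence between 0 and 1 (the field names `ws_label` and `ws_confidence` are hypothetical):

```python
# Hypothetical records with a weakly supervised label and its confidence.
records = [
    {"id": 0, "ws_label": "A", "ws_confidence": 0.62},
    {"id": 1, "ws_label": "B", "ws_confidence": 0.97},
    {"id": 2, "ws_label": "A", "ws_confidence": 0.81},
]

# Sort in descending order so the most reliable labels come first.
by_confidence = sorted(records, key=lambda r: r["ws_confidence"], reverse=True)
ordered_ids = [r["id"] for r in by_confidence]

print(ordered_ids)
```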

Finding label mismatches

You can also use the labeling-task-specific drawers to select potential labeling mismatches. As you can order by the weak supervision confidence score, this makes it easy to find either manual labeling errors (i.e. there is a mismatch and the weakly supervised label has a high likelihood) or weak supervision bugs.
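
The logic behind this can be sketched in a few lines: keep the records where the manual label and the weakly supervised label differ, then order them by confidence. The field names and sample values are assumptions for illustration only.

```python
# Hypothetical records; manual_label is None where nobody has labeled the record yet.
records = [
    {"id": 0, "manual_label": "A", "ws_label": "A", "ws_confidence": 0.95},
    {"id": 1, "manual_label": "A", "ws_label": "B", "ws_confidence": 0.98},
    {"id": 2, "manual_label": "B", "ws_label": "A", "ws_confidence": 0.40},
    {"id": 3, "manual_label": None, "ws_label": "A", "ws_confidence": 0.90},
]

# A mismatch needs both labels to exist and to differ.
mismatches = [
    r for r in records
    if r["manual_label"] is not None and r["manual_label"] != r["ws_label"]
]

# Highest confidence first: these are the likeliest manual labeling errors.
mismatches.sort(key=lambda r: r["ws_confidence"], reverse=True)
mismatch_ids = [r["id"] for r in mismatches]

print(mismatch_ids)
```

In this sketch, mismatches at the top of the list point toward manual labeling errors, while low-confidence mismatches further down are the more likely candidates for weak supervision bugs.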

User filters

In the managed version, you can also filter for data labeled by different users. This is especially helpful if you want to determine the inter-annotator agreement for your users.
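
If you export the labels per user, one common way to quantify inter-annotator agreement is Cohen's kappa. The sketch below computes it by hand for two annotators; the user names, the labels, and the `cohen_kappa` helper are hypothetical, and this is not necessarily the measure Kern Refinery uses internally.

```python
from collections import Counter

# Hypothetical labels per user, keyed by record id.
labels_by_user = {
    "alice": {0: "A", 1: "B", 2: "A", 3: "B"},
    "bob": {0: "A", 1: "B", 2: "B", 3: "B"},
}


def cohen_kappa(a, b):
    """Cohen's kappa over the records that both annotators labeled."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n == 0:
        raise ValueError("the annotators share no labeled records")
    # Observed agreement: fraction of shared records with identical labels.
    observed = sum(a[i] == b[i] for i in shared) / n
    # Expected agreement by chance, from each annotator's label distribution.
    count_a = Counter(a[i] for i in shared)
    count_b = Counter(b[i] for i in shared)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


kappa = cohen_kappa(labels_by_user["alice"], labels_by_user["bob"])
print(kappa)
```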

Mixing filters

You can generally mix and match between different filter segments. The filter components are joined by a conjunction (logical AND), so each additional component narrows down the result set.
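
A conjunction of filters can be sketched as a list of predicates that all have to hold for a record. The record layout and the `apply_all` helper are assumptions for illustration.

```python
# Hypothetical records combining a text attribute with heuristic hits.
records = [
    {"id": 0, "text": "UK economy grows", "hits": set()},
    {"id": 1, "text": "3 UK banks merge", "hits": {"starts_with_digit"}},
    {"id": 2, "text": "5 tips for travel", "hits": {"starts_with_digit"}},
]

# Each filter component is a predicate over a single record.
filters = [
    lambda r: "UK" in r["text"],                 # attribute filter
    lambda r: "starts_with_digit" in r["hits"],  # heuristic-based filter
]


def apply_all(records, filters):
    """Keep the records for which every filter holds (logical AND)."""
    return [r for r in records if all(f(r) for f in filters)]


only_attribute = [r["id"] for r in apply_all(records, filters[:1])]
combined = [r["id"] for r in apply_all(records, filters)]

print(only_attribute, combined)
```

Because the components are joined with AND, the combined result is always a subset of the result of any single component.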

Saving filters

Finally, you can store your filters to re-use them later on. Doing so, you have two options:

  • storing them as dynamic slices: every time you select this filter, its conditions are re-computed. This makes the filter highly flexible for additional customizations, but it takes longer to compute.
  • storing them as static slices: the filtered result is stored in the form of indices. These slices are super-fast and can therefore also be used on the monitoring page to drill down in your analysis.
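
The trade-off between the two options can be illustrated with a small sketch. The classes `DynamicSlice` and `StaticSlice` are hypothetical and only model the behavior described above: one stores the condition and re-evaluates it, the other stores the resulting indices.

```python
class DynamicSlice:
    """Stores the filter condition and re-computes the result on every use."""

    def __init__(self, predicate):
        self.predicate = predicate

    def resolve(self, records):
        # Scans all records each time, so the result reflects the current data.
        return [i for i, r in enumerate(records) if self.predicate(r)]


class StaticSlice:
    """Stores the indices of the matching records at creation time."""

    def __init__(self, predicate, records):
        self.indices = [i for i, r in enumerate(records) if predicate(r)]

    def resolve(self, records):
        # No re-computation: simply return a copy of the stored indices.
        return list(self.indices)


def contains_uk(record):
    return "UK" in record["text"]


records = [{"text": "UK economy grows"}, {"text": "Local news"}]

dynamic = DynamicSlice(contains_uk)
static = StaticSlice(contains_uk, records)

# A new matching record arrives after both slices were saved.
records.append({"text": "UK banks merge"})

dynamic_result = dynamic.resolve(records)
static_result = static.resolve(records)
print(dynamic_result, static_result)
```

In this sketch, the dynamic slice picks up the new record because its condition is evaluated again, while the static slice returns the indices it stored when it was created.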
