Frequently asked questions for working with the kern application

For more general questions, please make sure to check out the FAQ on our website first.
This FAQ covers more detailed questions about the platform, the UI, and commonly encountered problems.

What is the difference between scale and test data?

Scale data is intended for training and validating the subsequent machine learning models, so the goal is to have as much of it as possible. Our Information sources and Information integration work exclusively on scale data.
Test data is intended to give a realistic estimate of your final machine learning model's performance. Its labels should be of the highest possible quality and must therefore be assigned manually.

What can I do with test data in kern?

There are two reasons we include test data in our application:

  • you get to label and manage all your data in kern
  • you can compare the label distributions of your scale and test data against each other to avoid drastic distribution imbalances

That means that in kern you can manually label and manage your test data and see its label distribution on the overview page of your project.
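
To illustrate what comparing label distributions means, here is a minimal sketch in plain Python. It is not part of kern and does not use any kern API; the function name `label_distribution` and the example labels are made up for illustration. It computes the relative frequency of each label in two sets and reports the largest gap between them.

```python
from collections import Counter


def label_distribution(labels):
    """Return the relative frequency of each label (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up example labels; in practice these would come from your own data.
scale_labels = ["Spam", "Not Spam", "Not Spam", "Not Spam"]
test_labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

scale_dist = label_distribution(scale_labels)
test_dist = label_distribution(test_labels)

# Largest difference in relative frequency across all labels.
max_gap = max(
    abs(scale_dist.get(label, 0.0) - test_dist.get(label, 0.0))
    for label in set(scale_dist) | set(test_dist)
)
print(scale_dist, test_dist, max_gap)
```

A large gap suggests that the test data may not reflect the label balance of the scale data, which is the kind of imbalance the overview page helps you spot.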

How can I bring my own labels?

If you are just starting with kern and want to bring your own labels, you must add them as attributes to the data that you are uploading. We automatically recognize which of your attributes are labels if you stick to the naming schema "$attribute__$labelTaskName". Here, "$attribute" is the name of the attribute you want to assign a label to (e.g. the subject of an email) and "$labelTaskName" is the name of the labeling task (e.g. "Subject Classification"). If you have classification labels for the whole record (e.g. "Spam" or "Not Spam"), leave "$attribute" empty (e.g. "__SpamClassification"). If you are still unsure about uploading labels, we encourage you to take another look at the Project setup.
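
The naming schema can be sketched in Python as follows. This is only an illustration under assumptions: the helper `label_attribute_name`, the attribute names "subject" and "body", the label value "Promotion", and the choice of JSON as the serialization format are all hypothetical and not taken from kern itself. Only the "$attribute__$labelTaskName" pattern and the two task names come from the description above.

```python
import json


def label_attribute_name(label_task_name, attribute=""):
    """Build a name following the "$attribute__$labelTaskName" schema.

    Leaving `attribute` empty yields a label for the whole record.
    """
    return f"{attribute}__{label_task_name}"


# A hypothetical record with two regular attributes and two label attributes.
record = {
    "subject": "Win a free cruise today",
    "body": "Click here to claim your prize.",
    # Label attached to the "subject" attribute.
    label_attribute_name("Subject Classification", attribute="subject"): "Promotion",
    # Label for the whole record: the attribute part stays empty.
    label_attribute_name("SpamClassification"): "Spam",
}

# Serialize for upload; JSON is an assumption made for this sketch.
payload = json.dumps([record], indent=2)
print(payload)
```

The resulting keys are "subject__Subject Classification" and "__SpamClassification", matching the schema described above.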
