data_balance
The data_balance job recommends adjustments that give a dataset balanced representation. It identifies over-represented and under-represented items from their diversity and their similarity to essential and forbidden examples. The resulting, more evenly distributed dataset supports improved model training, efficient archiving, and relevant discovery.
Required Account Privileges: "read"
Request JSON ["inputs"]:
"clustered_content_ids_sorted_by_decreasing_diversity_with_contents_sorted_by_distance_to_centroid":
list of lists of ints
null NOT allowed
A list of lists of integers, where each sublist holds the content IDs of one cluster. Clusters are sorted by decreasing diversity, and the content IDs within each cluster are sorted by their distance to the cluster centroid.
"ids_sorted_from_inliers_to_outliers":
list of ints
null allowed
An optional list of integers representing content IDs sorted from inliers to outliers.
"ids_sorted_by_essential_examples":
list of ints
null allowed
An optional list of integers representing content IDs sorted by their similarity to essential examples.
"ids_sorted_by_forbidden_examples":
list of ints
null allowed
An optional list of integers representing content IDs sorted by their similarity to forbidden examples.
Response JSON ["results"]:
"prioritized_over_represented_ids_to_remove":
list of ints
"prioritized_under_represented_ids_to_source":
list of ints
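The field rules above can be illustrated with a minimal Python sketch. It assumes the request body only needs the "inputs" object (a real request may require further envelope fields that this section does not list), and the helper names build_data_balance_inputs and drop_over_represented are hypothetical, not part of the API. The example response values are made up purely to show the documented shape of ["results"].

```python
import json
from typing import Optional

CLUSTERS_KEY = (
    "clustered_content_ids_sorted_by_decreasing_diversity"
    "_with_contents_sorted_by_distance_to_centroid"
)


def _is_int_list(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, list) and all(
        isinstance(v, int) and not isinstance(v, bool) for v in value
    )


def build_data_balance_inputs(
    clusters: list[list[int]],
    inliers_to_outliers: Optional[list[int]] = None,
    essential: Optional[list[int]] = None,
    forbidden: Optional[list[int]] = None,
) -> dict:
    """Hypothetical helper: assemble the ["inputs"] object described above."""
    # The clustered list is the only field for which null is NOT allowed.
    if not isinstance(clusters, list) or not all(_is_int_list(c) for c in clusters):
        raise ValueError(f"{CLUSTERS_KEY} must be a non-null list of lists of ints")
    optional = {
        "ids_sorted_from_inliers_to_outliers": inliers_to_outliers,
        "ids_sorted_by_essential_examples": essential,
        "ids_sorted_by_forbidden_examples": forbidden,
    }
    for name, value in optional.items():
        if value is not None and not _is_int_list(value):
            raise ValueError(f"{name} must be a list of ints or null")
    return {CLUSTERS_KEY: clusters, **optional}


def drop_over_represented(dataset_ids: list[int], results: dict) -> list[int]:
    """Hypothetical helper: apply the removal recommendation locally."""
    to_remove = set(results["prioritized_over_represented_ids_to_remove"])
    return [i for i in dataset_ids if i not in to_remove]


inputs = build_data_balance_inputs(
    clusters=[[4, 1, 7], [2, 9], [3]],  # three clusters, most diverse first
    essential=[9, 2],
)
# Omitted optional fields serialize as JSON null.
request_body = json.dumps({"inputs": inputs})

# Made-up response, only to show the documented shape of ["results"].
example_results = {
    "prioritized_over_represented_ids_to_remove": [7, 1],
    "prioritized_under_represented_ids_to_source": [3],
}
kept = drop_over_represented([1, 2, 3, 4, 7, 9], example_results)
```

Validating the inputs before sending keeps type errors local. How the returned ID lists are acted on (deleting items, sourcing more content like the under-represented ones) is left to the caller.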