How to create a CVAT benchmark

This guide explains how to create a CVAT benchmark for Auto QA evaluation within the platform.

Steps to Create a CVAT Benchmark

1. Prerequisites

  • You must have a project that uses CVAT.
  • A dataset is required for evaluating the project.

2. Open the Benchmark Management Modal

  • Navigate to the section where benchmarks are managed.
  • Click the "Create New Benchmark" button.
[Screenshot: Create new benchmark]

3. Fill in the Benchmark Details

  • Provide the following details (a summary example follows this list):
    • Name: Enter a name for the benchmark.
    • Description: Add a short description of the benchmark.
    • Type of Benchmark: Specify the benchmark type.
    • Blocking Benchmark: Indicate whether this benchmark blocks taskers.
    • Start Date/End Date: Set these dates if applicable.
    • Enabled: Specify whether the benchmark is enabled.
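
For illustration only, a completed set of details might read as follows. These values are entered through the form, not as JSON, and the field names here are hypothetical labels for the inputs listed above:

{
  "name": "Building polygon benchmark",
  "description": "Evaluates polygon annotations on a sample image",
  "type": "CVAT",
  "blockingBenchmark": true,
  "startDate": "2025-01-01",
  "endDate": "2025-12-31",
  "enabled": true
}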

4. Configure Input/Metadata/Output Fields

For this type of benchmark, only the Metadata field needs to be completed. Include information about:

  • Labels available for the task.
  • The image to be annotated.

Example of Expected Metadata

{ "cvatPayload": "{"payload":{"name":"CVAT task","labels":[{"name":"building","color":"#0000FF","type":"any","attributes":[]},{"name":"roof","color":"#FF0000","type":"any","attributes":[]},{"name":"property","color":"#00FF00","type":"any","attributes":[]}], "images":["https://valid/image/url"]}" }
  • Base the metadata on task data that was previously created with high quality in the same or a similar pipeline.
  • The labels and annotations should emulate a realistic CVAT task.

5. Select the Evaluation Procedure

  • Choose "CVAT Evaluation" from the available options.
  • This will enable the Ground Truth Data field.
[Screenshot: Evaluation procedure dropdown]

6. Fill in Ground Truth Data

  • Retrieve the Ground Truth Data from a task that was completed with excellent quality.
  • Use the information from output.taskResult.annotations. This object contains all of the polygon information created in the CVAT tool; a hypothetical sketch of its shape follows below.
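
The exact structure depends on your pipeline; only the output.taskResult.annotations path is confirmed by this guide, and the field names below are merely illustrative of CVAT-style polygon data:

{
  "shapes": [
    {
      "type": "polygon",
      "label": "building",
      "points": [120.5, 80.0, 340.2, 82.4, 338.9, 260.1, 118.7, 258.3]
    }
  ]
}

Each polygon is typically stored as a flat list of alternating x/y coordinates, as in CVAT's own annotation format.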

7. Define Evaluation Rules

  • Add one or more rules to evaluate the task. These rules help verify whether the polygons created in the task meet expectations.

Rule Configuration

  • Condition Fact: Choose what to evaluate, such as:
    • User's output
    • Intersection over union
    • Total matches
    • Score distribution
  • Path: Use a placeholder like "$." if no specific path is needed.
  • Operator: Select from operators such as >, <, =, or !=.
  • Value: Enter the expected value for the rule.

Example Rule

{ "conditionFact": "Average score", "path": "$.", "operator": ">", "value": 0.8 }

This rule specifies that the average score for the task must be greater than 0.8 for the task to be considered correct.
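
As a further illustration, a rule requiring a minimum polygon overlap might look like the following. "Intersection over union" is one of the condition facts listed above; the 0.7 threshold is a hypothetical value:

{
  "conditionFact": "Intersection over union",
  "path": "$.",
  "operator": ">",
  "value": 0.7
}

Here the task would be considered correct only when the measured polygon overlap with the ground truth exceeds 0.7.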


Notes

  • It is crucial to use a task with high-quality data as the basis for creating benchmarks.
  • Benchmarks provide a structured method to evaluate task performance based on specific criteria.