

class Evaluation

Calculates performance metrics for trained models.

Loads the best model (by validation accuracy) from the models directory in the job directory. All metrics and graphs are based on test_samples.json in the job directory. Plots are only shown if the number of classes is 20 or fewer.
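As a rough illustration of what "best model by validation accuracy" can mean in practice, the sketch below picks a checkpoint from a list of file names. The file-name pattern `model_<name>_<epoch>_<val_acc>.hdf5` and the helper `pick_best_model` are assumptions made for this example; they are not taken from the library and may differ from what it actually does.

```python
import re
from typing import List, Optional

# Hypothetical checkpoint naming scheme (an assumption for this sketch):
#   model_<base_model_name>_<epoch>_<val_acc>.hdf5
MODEL_PATTERN = re.compile(r"^model_.+_(\d+)_(\d+\.\d+)\.hdf5$")


def pick_best_model(filenames: List[str]) -> Optional[str]:
    """Return the file name with the highest validation accuracy, or None."""
    best_name, best_acc = None, -1.0
    for name in filenames:
        match = MODEL_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the assumed pattern
        val_acc = float(match.group(2))
        if val_acc > best_acc:  # strict comparison: the first file wins a tie
            best_name, best_acc = name, val_acc
    return best_name


candidates = [
    "model_mobilenet_01_0.700.hdf5",
    "model_mobilenet_02_0.812.hdf5",
    "notes.txt",
]
best = pick_best_model(candidates)
```

With these three names, `best` is the second checkpoint, since 0.812 is the highest accuracy among the files that match the assumed pattern.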

  • image_dir: Path of image directory.

  • job_dir: Path to job directory with samples.

  • batch_size: Number of images per batch (default 64).

  • base_model_name: Name of pretrained CNN (default MobileNet).


def __init__(image_dir, job_dir, batch_size, base_model_name, **kwargs)

Inits evaluation component.

Loads the best model from the job directory. Creates the evaluation directory if the app was started from the command line.


def get_correct_wrong_examples(label)

Gets correctly and wrongly predicted samples for a given label.

  • label: int or str (label for which the predictions should be considered).
  • Returns (correct, wrong): tuple of two image lists.
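The split into correct and wrong samples can be sketched as follows. This is a minimal stand-alone illustration, not the library's implementation: the helper `split_correct_wrong` is hypothetical, it returns sample indices rather than image lists, and it assumes that "for a given label" means samples whose true class equals that label.

```python
from typing import List, Sequence, Tuple


def split_correct_wrong(
    y_true: Sequence[int], y_pred: Sequence[int], label: int
) -> Tuple[List[int], List[int]]:
    """Split the indices of samples whose true class is `label`
    into correctly and wrongly predicted ones."""
    correct: List[int] = []
    wrong: List[int] = []
    for i, (t, p) in enumerate(zip(y_true, y_pred)):
        if t != label:
            continue  # assumption: only samples of the requested class count
        if p == t:
            correct.append(i)
        else:
            wrong.append(i)
    return correct, wrong


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1]
correct, wrong = split_correct_wrong(y_true, y_pred, label=1)
print(correct, wrong)  # [1, 4] [2]
```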


def visualize_images(image_list, title, show_heatmap, n_plot)

Visualizes images in a sample list.

  • image_list: sample list.

  • show_heatmap: boolean (generates a gradient based class activation map (grad-CAM), default False).

  • n_plot: maximum number of plots to be shown (default 20).


def run(report_create, report_kernel_name, report_export_html, report_export_pdf)

Runs the evaluation pipeline on the best model found in the job directory for the specific test set:

  • Makes predictions on the test set

  • Plots test set distribution

  • Plots classification report (accuracy, precision, recall)

  • Plots confusion matrix (on precision and on recall)

  • Plots correct and wrong examples
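To make "confusion matrix on precision and on recall" concrete, here is a small pure-Python sketch. It assumes the usual convention of rows for true classes and columns for predicted classes, with "on recall" meaning row-normalized and "on precision" meaning column-normalized. The function names are illustrative and are not part of the documented API.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> List[List[int]]:
    """Count matrix with rows = true class, columns = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def normalize_on_recall(cm: List[List[int]]) -> List[List[float]]:
    """Divide each row by its sum; the diagonal then holds per-class recall."""
    result = []
    for row in cm:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


def normalize_on_precision(cm: List[List[int]]) -> List[List[float]]:
    """Divide each column by its sum; the diagonal then holds per-class precision."""
    n = len(cm)
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    return [
        [cm[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
        for r in range(n)
    ]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total if total else 0.0


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=2)  # [[3, 1], [0, 2]]
on_recall = normalize_on_recall(cm)        # diagonal: 0.75 and 1.0
on_precision = normalize_on_precision(cm)  # diagonal: 1.0 and 2/3
acc = accuracy(cm)                         # 5/6
```

In this toy example class 0 has perfect precision but a recall of 0.75, while class 1 has perfect recall but a precision of 2/3, which is why the two normalizations are worth looking at separately.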

If not in IPython mode, an evaluation report is created.

  • report_create: boolean (create ipython kernel).

  • report_kernel_name: str (name of ipython kernel).

  • report_export_html: boolean (exports report to html).

  • report_export_pdf: boolean (exports report to pdf).