

Evaluation

class Evaluation

Calculates performance metrics for trained models.

Loads the best model (by validation accuracy) from the models directory in the job directory. All metrics and graphs are based on test_samples.json in the job directory. Plots are only shown if the number of classes is 20 or less.
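A minimal sketch of the two selection rules above. It assumes, hypothetically, that each file in the models directory encodes its validation accuracy at the end of its name (e.g. `model_mobilenet_07_0.912.hdf5`). The helper names `select_best_model` and `should_plot` are illustrative only and are not part of the class's API.

```python
import re
from pathlib import Path

# Hypothetical naming scheme (an assumption): the validation accuracy is the
# last underscore-separated number before the file extension,
# e.g. "model_mobilenet_07_0.912.hdf5".
_ACC_PATTERN = re.compile(r"_(\d+(?:\.\d+)?)\.[^.]+$")

MAX_CLASSES_TO_PLOT = 20  # plots are only shown up to this many classes


def select_best_model(model_files):
    """Return the file name with the highest parsed validation accuracy."""
    scored = []
    for name in model_files:
        match = _ACC_PATTERN.search(Path(name).name)
        if match:  # skip files that do not follow the assumed naming scheme
            scored.append((float(match.group(1)), name))
    if not scored:
        raise ValueError("no model file with a parsable validation accuracy")
    return max(scored)[1]


def should_plot(n_classes):
    """Mirror the documented rule: plot only for 20 classes or fewer."""
    return n_classes <= MAX_CLASSES_TO_PLOT
```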

Attributes

image_dir: Path to the image directory.
job_dir: Path to the job directory containing the samples.
batch_size: Number of images per batch (default 64).
base_model_name: Name of the pretrained CNN (default MobileNet).

init

def __init__(image_dir, job_dir, batch_size, base_model_name, **kwargs)

Initializes the evaluation component.

Loads the best model from the job directory. Creates an evaluation directory if the app was started from the command line.

get_correct_wrong_examples

def get_correct_wrong_examples(label)

Gets correctly and wrongly predicted samples for a given label.

Args

label: int or str (label for which the predictions should be considered).

Returns

(correct, wrong): Tuple of two image lists.
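The split performed by get_correct_wrong_examples can be sketched as follows. This standalone version is an assumption for illustration: it takes the samples and predictions as explicit arguments, treats each sample as a dict with hypothetical `image_id` and `label` keys, and considers only samples whose true label equals `label`.

```python
def get_correct_wrong_examples(samples, predictions, label):
    """Split samples of `label` into correctly and wrongly predicted images.

    Hypothetical data layout: each sample is a dict with "image_id" and a
    true "label"; predictions[i] is the predicted label for samples[i].
    Works for int or str labels as long as both sides use the same type.
    """
    correct, wrong = [], []
    for sample, predicted in zip(samples, predictions):
        if sample["label"] != label:
            continue  # only consider samples whose true label matches
        if predicted == label:
            correct.append(sample["image_id"])
        else:
            wrong.append(sample["image_id"])
    return correct, wrong
```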

visualize_images

def visualize_images(image_list, title, show_heatmap, n_plot)

Visualizes images in a sample list.

Args

image_list: list of samples.
title: str (title of the plot).
show_heatmap: boolean (generates a gradient-based class activation map (Grad-CAM), default False).
n_plot: maximum number of plots to be shown (default 20).
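The heatmap option relies on Grad-CAM. The sketch below shows only its final combination step in NumPy: each channel's gradients are averaged over the spatial axes to give a weight, the feature maps are summed with those weights, and a ReLU keeps only positive evidence. It assumes the last convolutional layer's activations and the gradients of the class score are already available as arrays; computing them with the model's framework is omitted, and `grad_cam` is a hypothetical name.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Combine conv feature maps and their gradients into a Grad-CAM heatmap.

    feature_maps: (H, W, K) activations of the last convolutional layer.
    gradients:    (H, W, K) gradient of the class score w.r.t. those activations.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # One weight per channel: global average pooling of the gradients.
    weights = gradients.mean(axis=(0, 1))                 # shape (K,)
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    # Normalise for display; an all-zero map is returned unchanged.
    return cam / peak if peak > 0 else cam
```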

run

def run(report_create, report_kernel_name, report_export_html, report_export_pdf)

Runs the evaluation pipeline on the best model found in the job directory for the specific test set:

Makes prediction on test set
Plots test set distribution
Plots classification report (accuracy, precision, recall)
Plots confusion matrix (on precision and on recall)
Plots correct and wrong examples

If not running in IPython mode, an evaluation report is created.

Args

report_create: boolean (create IPython kernel).
report_kernel_name: str (name of IPython kernel).
report_export_html: boolean (exports report to HTML).
report_export_pdf: boolean (exports report to PDF).
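The metrics listed for run can be sketched with NumPy alone, assuming integer labels `0..n_classes-1`. Normalising the confusion matrix by columns gives the precision view (share of each predicted class that is correct) and normalising by rows gives the recall view (share of each true class that is found). `evaluation_summary` is a hypothetical helper, not the library's implementation.

```python
import numpy as np


def evaluation_summary(y_true, y_pred, n_classes):
    """Class distribution, accuracy, per-class precision/recall and the
    confusion matrix normalised both ways, for integer labels 0..n_classes-1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Rows = true class, columns = predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)

    distribution = cm.sum(axis=1)       # test-set samples per true class
    predicted_counts = cm.sum(axis=0)   # samples predicted as each class

    with np.errstate(divide="ignore", invalid="ignore"):
        # Precision view: column-normalised; empty columns (0/0) become 0.
        cm_precision = np.nan_to_num(cm / predicted_counts[np.newaxis, :])
        # Recall view: row-normalised; empty rows (0/0) become 0.
        cm_recall = np.nan_to_num(cm / distribution[:, np.newaxis])

    return {
        "distribution": distribution,
        "accuracy": np.trace(cm) / len(y_true),
        "precision": np.diag(cm_precision),
        "recall": np.diag(cm_recall),
        "cm_precision": cm_precision,
        "cm_recall": cm_recall,
    }
```

For `y_true = [0, 0, 1, 1, 2]` and `y_pred = [0, 1, 1, 1, 0]` the accuracy is 0.6, and class 2 gets a precision of 0 because nothing is predicted as class 2 (the 0/0 entry is mapped to 0).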