Example 3: Understanding the statistical iterations

Context

In the previous example, we saw that, in order to output meaningful results, the platform splits the input dataset into a training set and a testing set.

However, even when the split is drawn at random, a lucky (or unlucky) draw can yield great (or poor) performance that only reflects this specific split.

To mitigate this issue, the platform can run the benchmark on several splits and return the mean scores.
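To make the effect of a single split concrete, here is a minimal sketch that is independent of SuMMIT. It scores a toy nearest-centroid classifier on several random train/test splits of the same synthetic dataset and reports the mean. The dataset, the classifier, the 25% test ratio and the function names are all assumptions made for illustration.

```python
import random
import statistics


def make_toy_dataset(n_samples, rng):
    """Two overlapping 1-D Gaussian classes (purely illustrative data)."""
    data = []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def split_accuracy(data, rng, test_ratio=0.25):
    """Accuracy of a nearest-centroid classifier on one random split."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # "Training" step: compute one centroid per class on the training set.
    centroids = {
        label: statistics.mean(x for x, y in train if y == label)
        for label in (0, 1)
    }
    # "Testing" step: predict the class whose centroid is closest.
    correct = sum(
        min(centroids, key=lambda c: abs(x - centroids[c])) == y
        for x, y in test
    )
    return correct / n_test


rng = random.Random(42)
dataset = make_toy_dataset(200, rng)
# Five random splits of the same dataset, one score per split.
scores = [split_accuracy(dataset, rng) for _ in range(5)]
print("per-split accuracy:", [round(s, 3) for s in scores])
print("mean: %.3f, std: %.3f" % (statistics.mean(scores), statistics.stdev(scores)))
```

The per-split scores generally differ from one another, which is why the mean over several splits is a more reliable summary than any single score.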

How to use it

This feature is controlled by a single argument, stats_iter:, in the config file. Setting stats_iter to a value greater than one slightly modifies the structure of the result directory: because the platform runs the benchmark on several train/test splits, the result directory is larger, as it keeps all the individual results.
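As a rough illustration of how such a directory can be explored, the sketch below collects the per-iteration subdirectories. It assumes, as in the listing shown later in this example, that each split is stored in its own iter_<n> subdirectory. The helper name iteration_dirs is hypothetical and is not part of SuMMIT, and the demonstration uses a throwaway directory rather than a real result directory.

```python
import re
import tempfile
from pathlib import Path


def iteration_dirs(result_dir):
    """Return the iter_<n> subdirectories of result_dir, sorted by n."""
    pattern = re.compile(r"iter_(\d+)")
    found = []
    for child in Path(result_dir).iterdir():
        match = pattern.fullmatch(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    # Sort numerically so that iter_10 comes after iter_2.
    return [path for _, path in sorted(found)]


# Demonstration on a temporary directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("iter_1", "iter_2", "iter_10", "feature_importances"):
        (Path(tmp) / name).mkdir()
    names = [path.name for path in iteration_dirs(tmp)]
    print(names)
```

Sorting on the integer suffix rather than on the directory name keeps the iterations in order once there are ten or more of them.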

To run SuMMIT on several train/test splits, run:

>>> from summit.execute import execute
>>> execute("example 3")

While SuMMIT computes, let us explore the new pseudo-code:

for each statistical iteration:
    ┌
    |for each monoview classifier:
    |    for each view:
    |        for each draw:
    |            for each fold:
    |                learn the classifier on all folds but one and test it on the remaining fold
    |            get the mean performance
    |        get the best hyper-parameter set
    |        learn on the whole training set
    |and
    |for each multiview classifier:
    |    for each draw:
    |        for each fold:
    |            learn the classifier on all folds but one and test it on the remaining fold
    |        get the mean performance
    |    get the best hyper-parameter set
    |    learn on the whole training set
    └
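The loop structure above can be sketched in plain Python for a single classifier on a single view. This is a simplified illustration and not SuMMIT's actual implementation: the function names, the 25% test ratio and the toy draw_params and score_fn callables are assumptions, and the last step of each iteration (scoring the refitted model on the test set) is added here so that the sketch returns one score per split.

```python
import random
import statistics


def cross_validate(score_fn, params, train_idx, n_folds):
    """Mean score of one hyper-parameter draw over n_folds folds."""
    folds = [train_idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for k, held_out in enumerate(folds):
        # Learn on all folds but one, test on the remaining fold.
        fit_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        scores.append(score_fn(params, fit_idx, held_out))
    return statistics.mean(scores)


def benchmark(n_samples, stats_iter, n_draws, n_folds, draw_params, score_fn, seed=0):
    """Return one test score per statistical iteration."""
    rng = random.Random(seed)
    results = []
    for _ in range(stats_iter):  # for each statistical iteration
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random train/test split
        n_test = n_samples // 4  # assumed test ratio
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        # For each draw: sample one hyper-parameter set.
        draws = [draw_params(rng) for _ in range(n_draws)]
        # Get the best hyper-parameter set according to the mean fold score.
        best = max(
            draws,
            key=lambda p: cross_validate(score_fn, p, train_idx, n_folds),
        )
        # Learn on the whole training set, then score on the test set.
        results.append(score_fn(best, train_idx, test_idx))
    return results


def toy_draw(rng):
    """Hypothetical hyper-parameter draw: a single value in [0, 1]."""
    return rng.uniform(0.0, 1.0)


def toy_score(param, fit_idx, eval_idx):
    """Stand-in for 'fit on fit_idx, score on eval_idx'; best near 0.5."""
    return 1.0 - abs(param - 0.5)


scores = benchmark(
    n_samples=100, stats_iter=3, n_draws=4, n_folds=5,
    draw_params=toy_draw, score_fn=toy_score,
)
print(scores, statistics.mean(scores))
```

In the full benchmark, this body would be repeated for each monoview classifier on each view and for each multiview classifier, as the pseudo-code shows.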

The result directory will be structured as:

  • feature_importances
    • doc_summit-generated_view_1-feature_importances.html
    • doc_summit-generated_view_1-feature_importances_dataframe.csv
    • doc_summit-generated_view_1-feature_importances_dataframe_stds.csv
    • doc_summit-generated_view_2-feature_importances.html
    • doc_summit-generated_view_2-feature_importances_dataframe.csv
    • doc_summit-generated_view_2-feature_importances_dataframe_stds.csv
    • doc_summit-generated_view_3-feature_importances.html
    • doc_summit-generated_view_3-feature_importances_dataframe.csv
    • doc_summit-generated_view_3-feature_importances_dataframe_stds.csv
    • doc_summit-generated_view_4-feature_importances.html
    • doc_summit-generated_view_4-feature_importances_dataframe.csv
    • doc_summit-generated_view_4-feature_importances_dataframe_stds.csv
  • iter_1
    • adaboost
      • generated_view_1
        • adaboost-doc_summit-generated_view_1-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_1-feature_importances.png
        • adaboost-doc_summit-generated_view_1-full_pred.csv
        • adaboost-doc_summit-generated_view_1-summary.txt
        • adaboost-doc_summit-generated_view_1-test_labels.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.png
        • adaboost-doc_summit-generated_view_1-times.csv
        • adaboost-doc_summit-generated_view_1-train_labels.csv
        • adaboost-doc_summit-generated_view_1-train_metrics.csv
        • adaboost-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • adaboost-doc_summit-generated_view_2-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_2-feature_importances.png
        • adaboost-doc_summit-generated_view_2-full_pred.csv
        • adaboost-doc_summit-generated_view_2-summary.txt
        • adaboost-doc_summit-generated_view_2-test_labels.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.png
        • adaboost-doc_summit-generated_view_2-times.csv
        • adaboost-doc_summit-generated_view_2-train_labels.csv
        • adaboost-doc_summit-generated_view_2-train_metrics.csv
        • adaboost-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • adaboost-doc_summit-generated_view_3-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_3-feature_importances.png
        • adaboost-doc_summit-generated_view_3-full_pred.csv
        • adaboost-doc_summit-generated_view_3-summary.txt
        • adaboost-doc_summit-generated_view_3-test_labels.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.png
        • adaboost-doc_summit-generated_view_3-times.csv
        • adaboost-doc_summit-generated_view_3-train_labels.csv
        • adaboost-doc_summit-generated_view_3-train_metrics.csv
        • adaboost-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • adaboost-doc_summit-generated_view_4-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_4-feature_importances.png
        • adaboost-doc_summit-generated_view_4-full_pred.csv
        • adaboost-doc_summit-generated_view_4-summary.txt
        • adaboost-doc_summit-generated_view_4-test_labels.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.png
        • adaboost-doc_summit-generated_view_4-times.csv
        • adaboost-doc_summit-generated_view_4-train_labels.csv
        • adaboost-doc_summit-generated_view_4-train_metrics.csv
        • adaboost-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • decision_tree
      • generated_view_1
        • decision_tree-doc_summit-generated_view_1-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_1-feature_importances.png
        • decision_tree-doc_summit-generated_view_1-full_pred.csv
        • decision_tree-doc_summit-generated_view_1-summary.txt
        • decision_tree-doc_summit-generated_view_1-test_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • decision_tree-doc_summit-generated_view_2-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_2-feature_importances.png
        • decision_tree-doc_summit-generated_view_2-full_pred.csv
        • decision_tree-doc_summit-generated_view_2-summary.txt
        • decision_tree-doc_summit-generated_view_2-test_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • decision_tree-doc_summit-generated_view_3-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_3-feature_importances.png
        • decision_tree-doc_summit-generated_view_3-full_pred.csv
        • decision_tree-doc_summit-generated_view_3-summary.txt
        • decision_tree-doc_summit-generated_view_3-test_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • decision_tree-doc_summit-generated_view_4-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_4-feature_importances.png
        • decision_tree-doc_summit-generated_view_4-full_pred.csv
        • decision_tree-doc_summit-generated_view_4-summary.txt
        • decision_tree-doc_summit-generated_view_4-test_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • feature_importances
      • doc_summit-generated_view_1-feature_importances.html
      • doc_summit-generated_view_1-feature_importances_dataframe.csv
      • doc_summit-generated_view_2-feature_importances.html
      • doc_summit-generated_view_2-feature_importances_dataframe.csv
      • doc_summit-generated_view_3-feature_importances.html
      • doc_summit-generated_view_3-feature_importances_dataframe.csv
      • doc_summit-generated_view_4-feature_importances.html
      • doc_summit-generated_view_4-feature_importances_dataframe.csv
    • folds
      • test_labels_fold_0.csv
      • test_labels_fold_1.csv
      • test_labels_fold_2.csv
      • test_labels_fold_3.csv
      • test_labels_fold_4.csv
    • weighted_linear_late_fusion
      • weighted_linear_late_fusion-doc_summit-confusion_matrix.csv
      • weighted_linear_late_fusion-doc_summit-summary.txt
    • doc_summit-2D_plot_data.csv
    • doc_summit-accuracy_score*-class.html
    • doc_summit-accuracy_score*.csv
    • doc_summit-accuracy_score*.html
    • doc_summit-accuracy_score*.png
    • doc_summit-bar_plot_data.csv
    • doc_summit-durations.html
    • doc_summit-durations_dataframe.csv
    • doc_summit-error_analysis_2D.html
    • doc_summit-error_analysis_2D.png
    • doc_summit-error_analysis_bar.html
    • doc_summit-error_analysis_bar.png
    • doc_summit-f1_score-class.html
    • doc_summit-f1_score.csv
    • doc_summit-f1_score.html
    • doc_summit-f1_score.png
    • train_indices.csv
    • train_labels.csv
  • iter_2
    • adaboost
      • generated_view_1
        • adaboost-doc_summit-generated_view_1-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_1-feature_importances.png
        • adaboost-doc_summit-generated_view_1-full_pred.csv
        • adaboost-doc_summit-generated_view_1-summary.txt
        • adaboost-doc_summit-generated_view_1-test_labels.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.png
        • adaboost-doc_summit-generated_view_1-times.csv
        • adaboost-doc_summit-generated_view_1-train_labels.csv
        • adaboost-doc_summit-generated_view_1-train_metrics.csv
        • adaboost-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • adaboost-doc_summit-generated_view_2-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_2-feature_importances.png
        • adaboost-doc_summit-generated_view_2-full_pred.csv
        • adaboost-doc_summit-generated_view_2-summary.txt
        • adaboost-doc_summit-generated_view_2-test_labels.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.png
        • adaboost-doc_summit-generated_view_2-times.csv
        • adaboost-doc_summit-generated_view_2-train_labels.csv
        • adaboost-doc_summit-generated_view_2-train_metrics.csv
        • adaboost-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • adaboost-doc_summit-generated_view_3-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_3-feature_importances.png
        • adaboost-doc_summit-generated_view_3-full_pred.csv
        • adaboost-doc_summit-generated_view_3-summary.txt
        • adaboost-doc_summit-generated_view_3-test_labels.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.png
        • adaboost-doc_summit-generated_view_3-times.csv
        • adaboost-doc_summit-generated_view_3-train_labels.csv
        • adaboost-doc_summit-generated_view_3-train_metrics.csv
        • adaboost-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • adaboost-doc_summit-generated_view_4-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_4-feature_importances.png
        • adaboost-doc_summit-generated_view_4-full_pred.csv
        • adaboost-doc_summit-generated_view_4-summary.txt
        • adaboost-doc_summit-generated_view_4-test_labels.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.png
        • adaboost-doc_summit-generated_view_4-times.csv
        • adaboost-doc_summit-generated_view_4-train_labels.csv
        • adaboost-doc_summit-generated_view_4-train_metrics.csv
        • adaboost-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • decision_tree
      • generated_view_1
        • decision_tree-doc_summit-generated_view_1-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_1-feature_importances.png
        • decision_tree-doc_summit-generated_view_1-full_pred.csv
        • decision_tree-doc_summit-generated_view_1-summary.txt
        • decision_tree-doc_summit-generated_view_1-test_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • decision_tree-doc_summit-generated_view_2-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_2-feature_importances.png
        • decision_tree-doc_summit-generated_view_2-full_pred.csv
        • decision_tree-doc_summit-generated_view_2-summary.txt
        • decision_tree-doc_summit-generated_view_2-test_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • decision_tree-doc_summit-generated_view_3-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_3-feature_importances.png
        • decision_tree-doc_summit-generated_view_3-full_pred.csv
        • decision_tree-doc_summit-generated_view_3-summary.txt
        • decision_tree-doc_summit-generated_view_3-test_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • decision_tree-doc_summit-generated_view_4-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_4-feature_importances.png
        • decision_tree-doc_summit-generated_view_4-full_pred.csv
        • decision_tree-doc_summit-generated_view_4-summary.txt
        • decision_tree-doc_summit-generated_view_4-test_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • feature_importances
      • doc_summit-generated_view_1-feature_importances.html
      • doc_summit-generated_view_1-feature_importances_dataframe.csv
      • doc_summit-generated_view_2-feature_importances.html
      • doc_summit-generated_view_2-feature_importances_dataframe.csv
      • doc_summit-generated_view_3-feature_importances.html
      • doc_summit-generated_view_3-feature_importances_dataframe.csv
      • doc_summit-generated_view_4-feature_importances.html
      • doc_summit-generated_view_4-feature_importances_dataframe.csv
    • folds
      • test_labels_fold_0.csv
      • test_labels_fold_1.csv
      • test_labels_fold_2.csv
      • test_labels_fold_3.csv
      • test_labels_fold_4.csv
    • weighted_linear_late_fusion
      • weighted_linear_late_fusion-doc_summit-confusion_matrix.csv
      • weighted_linear_late_fusion-doc_summit-summary.txt
    • doc_summit-2D_plot_data.csv
    • doc_summit-accuracy_score*-class.html
    • doc_summit-accuracy_score*.csv
    • doc_summit-accuracy_score*.html
    • doc_summit-accuracy_score*.png
    • doc_summit-bar_plot_data.csv
    • doc_summit-durations.html
    • doc_summit-durations_dataframe.csv
    • doc_summit-error_analysis_2D.html
    • doc_summit-error_analysis_2D.png
    • doc_summit-error_analysis_bar.html
    • doc_summit-error_analysis_bar.png
    • doc_summit-f1_score-class.html
    • doc_summit-f1_score.csv
    • doc_summit-f1_score.html
    • doc_summit-f1_score.png
    • train_indices.csv
    • train_labels.csv
  • iter_3
    • adaboost
      • generated_view_1
        • adaboost-doc_summit-generated_view_1-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_1-feature_importances.png
        • adaboost-doc_summit-generated_view_1-full_pred.csv
        • adaboost-doc_summit-generated_view_1-summary.txt
        • adaboost-doc_summit-generated_view_1-test_labels.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.png
        • adaboost-doc_summit-generated_view_1-times.csv
        • adaboost-doc_summit-generated_view_1-train_labels.csv
        • adaboost-doc_summit-generated_view_1-train_metrics.csv
        • adaboost-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • adaboost-doc_summit-generated_view_2-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_2-feature_importances.png
        • adaboost-doc_summit-generated_view_2-full_pred.csv
        • adaboost-doc_summit-generated_view_2-summary.txt
        • adaboost-doc_summit-generated_view_2-test_labels.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.png
        • adaboost-doc_summit-generated_view_2-times.csv
        • adaboost-doc_summit-generated_view_2-train_labels.csv
        • adaboost-doc_summit-generated_view_2-train_metrics.csv
        • adaboost-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • adaboost-doc_summit-generated_view_3-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_3-feature_importances.png
        • adaboost-doc_summit-generated_view_3-full_pred.csv
        • adaboost-doc_summit-generated_view_3-summary.txt
        • adaboost-doc_summit-generated_view_3-test_labels.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.png
        • adaboost-doc_summit-generated_view_3-times.csv
        • adaboost-doc_summit-generated_view_3-train_labels.csv
        • adaboost-doc_summit-generated_view_3-train_metrics.csv
        • adaboost-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • adaboost-doc_summit-generated_view_4-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_4-feature_importances.png
        • adaboost-doc_summit-generated_view_4-full_pred.csv
        • adaboost-doc_summit-generated_view_4-summary.txt
        • adaboost-doc_summit-generated_view_4-test_labels.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.png
        • adaboost-doc_summit-generated_view_4-times.csv
        • adaboost-doc_summit-generated_view_4-train_labels.csv
        • adaboost-doc_summit-generated_view_4-train_metrics.csv
        • adaboost-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • decision_tree
      • generated_view_1
        • decision_tree-doc_summit-generated_view_1-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_1-feature_importances.png
        • decision_tree-doc_summit-generated_view_1-full_pred.csv
        • decision_tree-doc_summit-generated_view_1-summary.txt
        • decision_tree-doc_summit-generated_view_1-test_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • decision_tree-doc_summit-generated_view_2-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_2-feature_importances.png
        • decision_tree-doc_summit-generated_view_2-full_pred.csv
        • decision_tree-doc_summit-generated_view_2-summary.txt
        • decision_tree-doc_summit-generated_view_2-test_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • decision_tree-doc_summit-generated_view_3-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_3-feature_importances.png
        • decision_tree-doc_summit-generated_view_3-full_pred.csv
        • decision_tree-doc_summit-generated_view_3-summary.txt
        • decision_tree-doc_summit-generated_view_3-test_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • decision_tree-doc_summit-generated_view_4-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_4-feature_importances.png
        • decision_tree-doc_summit-generated_view_4-full_pred.csv
        • decision_tree-doc_summit-generated_view_4-summary.txt
        • decision_tree-doc_summit-generated_view_4-test_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • feature_importances
      • doc_summit-generated_view_1-feature_importances.html
      • doc_summit-generated_view_1-feature_importances_dataframe.csv
      • doc_summit-generated_view_2-feature_importances.html
      • doc_summit-generated_view_2-feature_importances_dataframe.csv
      • doc_summit-generated_view_3-feature_importances.html
      • doc_summit-generated_view_3-feature_importances_dataframe.csv
      • doc_summit-generated_view_4-feature_importances.html
      • doc_summit-generated_view_4-feature_importances_dataframe.csv
    • folds
      • test_labels_fold_0.csv
      • test_labels_fold_1.csv
      • test_labels_fold_2.csv
      • test_labels_fold_3.csv
      • test_labels_fold_4.csv
    • weighted_linear_late_fusion
      • weighted_linear_late_fusion-doc_summit-confusion_matrix.csv
      • weighted_linear_late_fusion-doc_summit-summary.txt
    • doc_summit-2D_plot_data.csv
    • doc_summit-accuracy_score*-class.html
    • doc_summit-accuracy_score*.csv
    • doc_summit-accuracy_score*.html
    • doc_summit-accuracy_score*.png
    • doc_summit-bar_plot_data.csv
    • doc_summit-durations.html
    • doc_summit-durations_dataframe.csv
    • doc_summit-error_analysis_2D.html
    • doc_summit-error_analysis_2D.png
    • doc_summit-error_analysis_bar.html
    • doc_summit-error_analysis_bar.png
    • doc_summit-f1_score-class.html
    • doc_summit-f1_score.csv
    • doc_summit-f1_score.html
    • doc_summit-f1_score.png
    • train_indices.csv
    • train_labels.csv
  • iter_4
    • adaboost
      • generated_view_1
        • adaboost-doc_summit-generated_view_1-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_1-feature_importances.png
        • adaboost-doc_summit-generated_view_1-full_pred.csv
        • adaboost-doc_summit-generated_view_1-summary.txt
        • adaboost-doc_summit-generated_view_1-test_labels.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.png
        • adaboost-doc_summit-generated_view_1-times.csv
        • adaboost-doc_summit-generated_view_1-train_labels.csv
        • adaboost-doc_summit-generated_view_1-train_metrics.csv
        • adaboost-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • adaboost-doc_summit-generated_view_2-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_2-feature_importances.png
        • adaboost-doc_summit-generated_view_2-full_pred.csv
        • adaboost-doc_summit-generated_view_2-summary.txt
        • adaboost-doc_summit-generated_view_2-test_labels.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.png
        • adaboost-doc_summit-generated_view_2-times.csv
        • adaboost-doc_summit-generated_view_2-train_labels.csv
        • adaboost-doc_summit-generated_view_2-train_metrics.csv
        • adaboost-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • adaboost-doc_summit-generated_view_3-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_3-feature_importances.png
        • adaboost-doc_summit-generated_view_3-full_pred.csv
        • adaboost-doc_summit-generated_view_3-summary.txt
        • adaboost-doc_summit-generated_view_3-test_labels.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.png
        • adaboost-doc_summit-generated_view_3-times.csv
        • adaboost-doc_summit-generated_view_3-train_labels.csv
        • adaboost-doc_summit-generated_view_3-train_metrics.csv
        • adaboost-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • adaboost-doc_summit-generated_view_4-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_4-feature_importances.png
        • adaboost-doc_summit-generated_view_4-full_pred.csv
        • adaboost-doc_summit-generated_view_4-summary.txt
        • adaboost-doc_summit-generated_view_4-test_labels.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.png
        • adaboost-doc_summit-generated_view_4-times.csv
        • adaboost-doc_summit-generated_view_4-train_labels.csv
        • adaboost-doc_summit-generated_view_4-train_metrics.csv
        • adaboost-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • decision_tree
      • generated_view_1
        • decision_tree-doc_summit-generated_view_1-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_1-feature_importances.png
        • decision_tree-doc_summit-generated_view_1-full_pred.csv
        • decision_tree-doc_summit-generated_view_1-summary.txt
        • decision_tree-doc_summit-generated_view_1-test_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • decision_tree-doc_summit-generated_view_2-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_2-feature_importances.png
        • decision_tree-doc_summit-generated_view_2-full_pred.csv
        • decision_tree-doc_summit-generated_view_2-summary.txt
        • decision_tree-doc_summit-generated_view_2-test_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • decision_tree-doc_summit-generated_view_3-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_3-feature_importances.png
        • decision_tree-doc_summit-generated_view_3-full_pred.csv
        • decision_tree-doc_summit-generated_view_3-summary.txt
        • decision_tree-doc_summit-generated_view_3-test_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • decision_tree-doc_summit-generated_view_4-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_4-feature_importances.png
        • decision_tree-doc_summit-generated_view_4-full_pred.csv
        • decision_tree-doc_summit-generated_view_4-summary.txt
        • decision_tree-doc_summit-generated_view_4-test_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • feature_importances
      • doc_summit-generated_view_1-feature_importances.html
      • doc_summit-generated_view_1-feature_importances_dataframe.csv
      • doc_summit-generated_view_2-feature_importances.html
      • doc_summit-generated_view_2-feature_importances_dataframe.csv
      • doc_summit-generated_view_3-feature_importances.html
      • doc_summit-generated_view_3-feature_importances_dataframe.csv
      • doc_summit-generated_view_4-feature_importances.html
      • doc_summit-generated_view_4-feature_importances_dataframe.csv
    • folds
      • test_labels_fold_0.csv
      • test_labels_fold_1.csv
      • test_labels_fold_2.csv
      • test_labels_fold_3.csv
      • test_labels_fold_4.csv
    • weighted_linear_late_fusion
      • weighted_linear_late_fusion-doc_summit-confusion_matrix.csv
      • weighted_linear_late_fusion-doc_summit-summary.txt
    • doc_summit-2D_plot_data.csv
    • doc_summit-accuracy_score*-class.html
    • doc_summit-accuracy_score*.csv
    • doc_summit-accuracy_score*.html
    • doc_summit-accuracy_score*.png
    • doc_summit-bar_plot_data.csv
    • doc_summit-durations.html
    • doc_summit-durations_dataframe.csv
    • doc_summit-error_analysis_2D.html
    • doc_summit-error_analysis_2D.png
    • doc_summit-error_analysis_bar.html
    • doc_summit-error_analysis_bar.png
    • doc_summit-f1_score-class.html
    • doc_summit-f1_score.csv
    • doc_summit-f1_score.html
    • doc_summit-f1_score.png
    • train_indices.csv
    • train_labels.csv
  • iter_5
    • adaboost
      • generated_view_1
        • adaboost-doc_summit-generated_view_1-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_1-feature_importances.png
        • adaboost-doc_summit-generated_view_1-full_pred.csv
        • adaboost-doc_summit-generated_view_1-summary.txt
        • adaboost-doc_summit-generated_view_1-test_labels.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.csv
        • adaboost-doc_summit-generated_view_1-test_metrics.png
        • adaboost-doc_summit-generated_view_1-times.csv
        • adaboost-doc_summit-generated_view_1-train_labels.csv
        • adaboost-doc_summit-generated_view_1-train_metrics.csv
        • adaboost-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • adaboost-doc_summit-generated_view_2-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_2-feature_importances.png
        • adaboost-doc_summit-generated_view_2-full_pred.csv
        • adaboost-doc_summit-generated_view_2-summary.txt
        • adaboost-doc_summit-generated_view_2-test_labels.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.csv
        • adaboost-doc_summit-generated_view_2-test_metrics.png
        • adaboost-doc_summit-generated_view_2-times.csv
        • adaboost-doc_summit-generated_view_2-train_labels.csv
        • adaboost-doc_summit-generated_view_2-train_metrics.csv
        • adaboost-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • adaboost-doc_summit-generated_view_3-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_3-feature_importances.png
        • adaboost-doc_summit-generated_view_3-full_pred.csv
        • adaboost-doc_summit-generated_view_3-summary.txt
        • adaboost-doc_summit-generated_view_3-test_labels.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.csv
        • adaboost-doc_summit-generated_view_3-test_metrics.png
        • adaboost-doc_summit-generated_view_3-times.csv
        • adaboost-doc_summit-generated_view_3-train_labels.csv
        • adaboost-doc_summit-generated_view_3-train_metrics.csv
        • adaboost-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • adaboost-doc_summit-generated_view_4-confusion_matrix.csv
        • adaboost-doc_summit-generated_view_4-feature_importances.png
        • adaboost-doc_summit-generated_view_4-full_pred.csv
        • adaboost-doc_summit-generated_view_4-summary.txt
        • adaboost-doc_summit-generated_view_4-test_labels.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.csv
        • adaboost-doc_summit-generated_view_4-test_metrics.png
        • adaboost-doc_summit-generated_view_4-times.csv
        • adaboost-doc_summit-generated_view_4-train_labels.csv
        • adaboost-doc_summit-generated_view_4-train_metrics.csv
        • adaboost-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • decision_tree
      • generated_view_1
        • decision_tree-doc_summit-generated_view_1-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_1-feature_importances.png
        • decision_tree-doc_summit-generated_view_1-full_pred.csv
        • decision_tree-doc_summit-generated_view_1-summary.txt
        • decision_tree-doc_summit-generated_view_1-test_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_labels.csv
        • decision_tree-doc_summit-generated_view_1-train_pred.csv
      • generated_view_2
        • decision_tree-doc_summit-generated_view_2-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_2-feature_importances.png
        • decision_tree-doc_summit-generated_view_2-full_pred.csv
        • decision_tree-doc_summit-generated_view_2-summary.txt
        • decision_tree-doc_summit-generated_view_2-test_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_labels.csv
        • decision_tree-doc_summit-generated_view_2-train_pred.csv
      • generated_view_3
        • decision_tree-doc_summit-generated_view_3-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_3-feature_importances.png
        • decision_tree-doc_summit-generated_view_3-full_pred.csv
        • decision_tree-doc_summit-generated_view_3-summary.txt
        • decision_tree-doc_summit-generated_view_3-test_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_labels.csv
        • decision_tree-doc_summit-generated_view_3-train_pred.csv
      • generated_view_4
        • decision_tree-doc_summit-generated_view_4-confusion_matrix.csv
        • decision_tree-doc_summit-generated_view_4-feature_importances.png
        • decision_tree-doc_summit-generated_view_4-full_pred.csv
        • decision_tree-doc_summit-generated_view_4-summary.txt
        • decision_tree-doc_summit-generated_view_4-test_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_labels.csv
        • decision_tree-doc_summit-generated_view_4-train_pred.csv
      • generated_view_1feature_importances.pickle
      • generated_view_2feature_importances.pickle
      • generated_view_3feature_importances.pickle
      • generated_view_4feature_importances.pickle
    • feature_importances
      • doc_summit-generated_view_1-feature_importances.html
      • doc_summit-generated_view_1-feature_importances_dataframe.csv
      • doc_summit-generated_view_2-feature_importances.html
      • doc_summit-generated_view_2-feature_importances_dataframe.csv
      • doc_summit-generated_view_3-feature_importances.html
      • doc_summit-generated_view_3-feature_importances_dataframe.csv
      • doc_summit-generated_view_4-feature_importances.html
      • doc_summit-generated_view_4-feature_importances_dataframe.csv
    • folds
      • test_labels_fold_0.csv
      • test_labels_fold_1.csv
      • test_labels_fold_2.csv
      • test_labels_fold_3.csv
      • test_labels_fold_4.csv
    • weighted_linear_late_fusion
      • weighted_linear_late_fusion-doc_summit-confusion_matrix.csv
      • weighted_linear_late_fusion-doc_summit-summary.txt
    • doc_summit-2D_plot_data.csv
    • doc_summit-accuracy_score*-class.html
    • doc_summit-accuracy_score*.csv
    • doc_summit-accuracy_score*.html
    • doc_summit-accuracy_score*.png
    • doc_summit-bar_plot_data.csv
    • doc_summit-durations.html
    • doc_summit-durations_dataframe.csv
    • doc_summit-error_analysis_2D.html
    • doc_summit-error_analysis_2D.png
    • doc_summit-error_analysis_bar.html
    • doc_summit-error_analysis_bar.png
    • doc_summit-f1_score-class.html
    • doc_summit-f1_score.csv
    • doc_summit-f1_score.html
    • doc_summit-f1_score.png
    • train_indices.csv
    • train_labels.csv
  • 2020_04_02-14_12-.hdf5--doc_summit-LOG.log
  • clf_errors.csv
  • config_file.yml
  • doc_summit-durations.html
  • doc_summit-durations_dataframe.csv
  • doc_summit-durations_stds_dataframe.csv
  • doc_summit-mean_on_5_iter-accuracy_score*-class.html
  • doc_summit-mean_on_5_iter-accuracy_score*.csv
  • doc_summit-mean_on_5_iter-accuracy_score*.html
  • doc_summit-mean_on_5_iter-accuracy_score*.png
  • doc_summit-mean_on_5_iter-f1_score-class.html
  • doc_summit-mean_on_5_iter-f1_score.csv
  • doc_summit-mean_on_5_iter-f1_score.html
  • doc_summit-mean_on_5_iter-f1_score.png
  • error_analysis_2D.html
  • error_analysis_2D.png
  • error_analysis_bar.html
  • error_analysis_bar.png
  • example_errors.csv
  • random_state.pickle
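
The listing above contains one iter_<n> directory per statistical iteration. As a minimal sketch (not part of SuMMIT), these directories can be collected in numeric order with the standard library. The helper name list_iteration_dirs is hypothetical, and the only thing it relies on is the iter_<n> naming pattern shown in the listing.

```python
from pathlib import Path


def list_iteration_dirs(result_dir):
    """Return the iter_<n> sub-directories of result_dir, sorted by n.

    Hypothetical helper: it only assumes the iter_<n> naming shown in
    the listing, not any SuMMIT API.
    """
    prefix = "iter_"
    iter_dirs = [
        path for path in Path(result_dir).glob(prefix + "*")
        if path.is_dir() and path.name[len(prefix):].isdigit()
    ]
    # Sort on the integer suffix so that iter_10 comes after iter_2.
    return sorted(iter_dirs, key=lambda path: int(path.name[len(prefix):]))
```

Sorting on the integer suffix rather than on the name keeps the order correct once there are ten or more iterations.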

If you look closely, nearly all the files from Example 1 are present in each iter_ directory, and some new files have appeared at the top level, in which the main figures are saved. Indeed, the files stored in started_1560_12_25-15_42/ are the ones that show the mean results over all the statistical iterations. For example, started_1560_12_25-15_42/*-accuracy_score.html looks like :

Similarly for the f1-score :

The main difference between this plot and the one from Example 1 is that here, the scores are means over all the statistical iterations. The standard deviations are plotted as vertical lines on top of the bars and printed after each score under the bars as “± <std>”.
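
To make the “± <std>” annotation concrete, here is a minimal sketch of how a mean and a standard deviation can be obtained from the scores of one classifier over several statistical iterations. The function names and the example scores are hypothetical, and the population standard deviation is an assumption: this page does not state which estimator SuMMIT uses.

```python
from statistics import mean, pstdev


def summarize_scores(scores_per_iteration):
    """Return (mean, std) of one classifier's scores across iterations.

    Assumption: population standard deviation (pstdev); SuMMIT may use
    a different estimator.
    """
    return mean(scores_per_iteration), pstdev(scores_per_iteration)


def format_score(score_mean, score_std):
    """Format a score in the "mean ± std" style described above."""
    return f"{score_mean:.2f} ± {score_std:.2f}"


# Hypothetical test accuracies of one classifier on three splits.
score_mean, score_std = summarize_scores([0.80, 0.90, 1.00])
print(format_score(score_mean, score_std))
```

A small standard deviation indicates that the classifier's performance depends little on the train/test split that was drawn.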

This also affects the display of the error analysis, which now uses multiple shades of gray depending on the number of iterations that succeeded or failed on each sample :

If we zoom in, we can distinguish the shades better :

../_images/gray.png
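
The shades of gray can be thought of as a count of successes per sample and per classifier. The sketch below sums hypothetical 0/1 success matrices (one per statistical iteration, with samples as rows and classifiers as columns) and rescales the counts to a gray level between 0 and 1. Both the input layout and the mapping to gray levels are assumptions made for illustration, not the encoding SuMMIT uses internally.

```python
def success_counts(successes_per_iteration):
    """Count, for each (sample, classifier), the iterations that succeeded.

    successes_per_iteration: list of matrices (lists of rows) holding
    1 for a correct prediction and 0 for an error. Hypothetical layout.
    """
    n_samples = len(successes_per_iteration[0])
    n_classifiers = len(successes_per_iteration[0][0])
    counts = [[0] * n_classifiers for _ in range(n_samples)]
    for iteration in successes_per_iteration:
        for i, row in enumerate(iteration):
            for j, success in enumerate(row):
                counts[i][j] += success
    return counts


def gray_levels(counts, n_iterations):
    """Rescale counts to [0, 1]: 0 = always failed, 1 = always succeeded."""
    return [[count / n_iterations for count in row] for row in counts]
```

With a single iteration only two levels are possible, which is why the plot in Example 1 was strictly black and white.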

Duration

Increasing the number of statistical iterations can be costly in terms of computational resources: the computation time is multiplied almost exactly by the number of iterations.
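
As a rough back-of-the-envelope sketch, the total duration can be estimated from the duration of a single iteration. The function name and the timing value are hypothetical, and the estimate assumes that iterations run one after the other with negligible fixed overhead.

```python
def estimate_total_duration(single_iteration_seconds, stats_iter):
    """Estimate the total run time, assuming sequential iterations.

    Hypothetical helper: ignores any fixed start-up or reporting cost.
    """
    return single_iteration_seconds * stats_iter


# Hypothetical example: one iteration measured at 120 seconds.
print(estimate_total_duration(120, 5), "seconds for stats_iter = 5")
```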

Note

Parallelizing SuMMIT’s statistical iterations could improve its efficiency when using multiple iterations. This is currently a work in progress.