Taking control: Use your own algorithms
One of the main goals of this platform is to be able to add a classifier to it without modifying the main code.
Simple task: Adding a monoview classifier
Make it work
Let’s say we want to add a monoview classifier called “name_me” to the platform in order to compare it to the other available ones.
Let’s suppose that we have a python module algo_module.py in which name_me is defined in the class Algo, following scikit-learn’s guidelines.
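To make the rest of this tutorial concrete, here is a minimal sketch of what such a module could contain. Keep in mind that algo_module, Algo and its hyper-parameters are placeholders used throughout this page, and that the body of the estimator below is a toy stand-in, not a real algorithm:

# algo_module.py : a hypothetical scikit-learn-style estimator standing in
# for the "name_me" algorithm used in this tutorial.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class Algo(BaseEstimator, ClassifierMixin):

    def __init__(self, random_state=None, trade_off=0.5, norm_type='l1',
                 max_depth=50):
        self.random_state = random_state
        self.trade_off = trade_off
        self.norm_type = norm_type
        self.max_depth = max_depth

    def fit(self, X, y):
        # A real implementation would learn a model here; this toy version
        # simply memorises the majority class.
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)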
To add “name_me” to the platform, let’s create a file called algo.py in summit/multiview_platform/monoview_classifiers/. In this file, let’s define the class AlgoClassifier, inheriting from Algo and from BaseMonoviewClassifier, which contains the methods required by SuMMIT. Moreover, one has to add a variable called classifier_class_name that contains the class name (here 'AlgoClassifier'):
from algo_module import Algo
from ..monoview.monoview_utils import BaseMonoviewClassifier
classifier_class_name = "AlgoClassifier"
class AlgoClassifier(Algo, BaseMonoviewClassifier):
To be able to use the randomized hyper-parameter optimization, we need to provide some information in the __init__() method. Indeed, all the algorithms included in the platform must provide two hyper-parameter-related attributes:

- self.param_names, which contains the names of the hyper-parameters that have to be optimized (they must correspond to the names of attributes of the class Algo),
- self.distribs, which contains the distributions for each of these hyper-parameters.
For example, let’s suppose that name_me needs three hyper-parameters and a random state parameter allowing reproducibility:

- trade_off, a float between 0 and 1,
- norm_type, a string in ["l1", "l2"],
- max_depth, an integer between 0 and 100.
Then, the __init__() method of the AlgoClassifier class will be:
from algo_module import Algo
from ..monoview.monoview_utils import BaseMonoviewClassifier, CustomUniform, CustomRandint

classifier_class_name = "AlgoClassifier"

class AlgoClassifier(Algo, BaseMonoviewClassifier):

    def __init__(self, random_state=42, trade_off=0.5, norm_type='l1',
                 max_depth=50):
        super(AlgoClassifier, self).__init__(random_state=random_state,
                                             trade_off=trade_off,
                                             norm_type=norm_type,
                                             max_depth=max_depth)
        self.param_names = ["trade_off", "norm_type", "max_depth"]
        self.distribs = [CustomUniform(),
                         ["l1", "l2"],
                         CustomRandint()]
In this method, we added the needed attributes. See REF TO DOC OF DISTRIBS for the documentation of the distributions used here.
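To give an intuition of how these attributes are consumed, here is a conceptual sketch (not the platform’s actual search code) of how one candidate parameter set could be drawn. It assumes that the distribution objects expose a scipy-style rvs() method and that a plain list denotes a uniform choice among its values:

import numpy as np

def draw_params(classifier, rng):
    # Conceptual sketch of a single random-search draw over the declared
    # hyper-parameters (this is not SuMMIT's actual implementation).
    params = {}
    for name, distrib in zip(classifier.param_names, classifier.distribs):
        if isinstance(distrib, list):
            # A list is treated as a discrete set of possible values.
            params[name] = distrib[rng.randint(len(distrib))]
        else:
            # Otherwise we assume a distribution object with rvs().
            params[name] = distrib.rvs(random_state=rng)
    return params

# e.g. draw_params(AlgoClassifier(), np.random.RandomState(42)) might return
# something like {'trade_off': 0.37, 'norm_type': 'l2', 'max_depth': 83}.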
If “name_me” is implemented in a scikit-learn fashion, it is now usable in the platform.
Interpretation
It is possible to provide some information about the decision process of the algorithm in the get_interpretation method. It takes four arguments:

- directory, a string containing the directory where figures should be stored,
- base_file_name, a string containing the file name prefix that should be used to store figures,
- y_test, an array containing the labels of the test set,
- multiclass, a boolean that is True if the target is multiclass.
This method must return a string that will be appended to the summary file.
An example of such a method could be:
def get_interpretation(self, directory, base_file_name, y_test,
                       multiclass=False):
    interpret_string = "Algo is a very relevant algorithm that used all the features to classify"
    # Save a figure in os.path.join(directory, base_file_name + "figure_name.png")
    return interpret_string
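A slightly richer sketch could also save a figure. The following example assumes (hypothetically) that the wrapped estimator exposes a feature_importances_ attribute and uses matplotlib to store a bar plot next to the other result files:

import os
import matplotlib.pyplot as plt

def get_interpretation(self, directory, base_file_name, y_test,
                       multiclass=False):
    # Hypothetical example: plot the feature importances if the underlying
    # estimator provides them, and mention the saved figure in the summary.
    interpret_string = ""
    if hasattr(self, "feature_importances_"):
        figure_path = os.path.join(directory,
                                   base_file_name + "feature_importances.png")
        fig, ax = plt.subplots()
        ax.bar(range(len(self.feature_importances_)),
               self.feature_importances_)
        ax.set_xlabel("Feature index")
        ax.set_ylabel("Importance")
        fig.savefig(figure_path)
        plt.close(fig)
        interpret_string += "Feature importances were plotted in " + figure_path
    return interpret_string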
More complex task: Adding a multiview classifier
This part is a bit more complex as, to the best of our knowledge, there is no consensus regarding the multiview input format of a classifier.

The first step of the integration of a multiview classifier is very similar to the monoview one. Let us suppose one wants to add “new mv algo”, implemented in the class NewMVAlgo. To do so, create a new_mv_algo.py file in summit/multiview_platform/multiview_classifiers/.
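As in the monoview case, new_mv_algo_module and NewMVAlgo are placeholders. A minimal, hypothetical version of such a module, assuming the algorithm takes a list of per-view arrays as input, could look like:

# new_mv_algo_module.py : a hypothetical multiview classifier used as a
# placeholder in this tutorial; it expects a list of per-view arrays.
import numpy as np

class NewMVAlgo:

    def __init__(self, param_1=50, random_state=None, param_2="edge"):
        self.param_1 = param_1
        self.random_state = random_state
        self.param_2 = param_2

    def fit(self, list_X, y):
        # A real implementation would learn from every view; this toy
        # version simply memorises the majority class.
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, list_X):
        return np.full(list_X[0].shape[0], self.majority_)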
In this file, let’s define the class NewMVAlgoClassifier, inheriting from NewMVAlgo and from BaseMultiviewClassifier, which contains the methods required by the platform. Moreover, one has to add a variable called classifier_class_name that contains the class name (here 'NewMVAlgoClassifier'):
from new_mv_algo_module import NewMVAlgo
from ..multiview.multiview_utils import BaseMultiviewClassifier
from ..utils.hyper_parameter_search import CustomRandint

classifier_class_name = "NewMVAlgoClassifier"

class NewMVAlgoClassifier(BaseMultiviewClassifier, NewMVAlgo):

    def __init__(self, param_1=50,
                 random_state=None,
                 param_2="edge"):
        BaseMultiviewClassifier.__init__(self, random_state)
        NewMVAlgo.__init__(self, param_1=param_1,
                           random_state=random_state,
                           param_2=param_2)
        self.param_names = ["param_1", "random_state", "param_2"]
        self.distribs = [CustomRandint(5, 200), [random_state], ["val_1", "val_2"]]
In SuMMIT, the input of the fit() method is X, a dataset object that provides access to each view with the method dataset_var.get_v(view_index, sample_indices). So, in order to add a multiview classifier to SuMMIT, one will probably have to add a data-transformation step before using the class’s fit() method.
Moreover, to restrict the samples and descriptors used in the method, SuMMIT provides two supplementary arguments:

- train_indices, an array of sample indices that compose the training set,
- view_indices, an array of view indices that restricts the views on which the algorithm will train.
These two arguments are useful to reduce memory usage. Indeed, X, the dataset object, is just a wrapper around an HDF5 file object: the data of a view is only loaded when get_v is called, so the train and test sets are never loaded at the same time.
def fit(self, X, y, train_indices=None, view_indices=None):
    # Initialize the sample and view indices: if they are None, they are
    # replaced by the correct default values.
    train_indices, view_indices = get_samples_views_indices(X,
                                                            train_indices,
                                                            view_indices)
    needed_input = self.transform_data_if_needed(X, train_indices, view_indices)
    return NewMVAlgo.fit(self, needed_input, y[train_indices])

def predict(self, X, sample_indices=None, view_indices=None):
    sample_indices, view_indices = get_samples_views_indices(X,
                                                             sample_indices,
                                                             view_indices)
    needed_input = self.transform_data_if_needed(X, sample_indices, view_indices)
    return NewMVAlgo.predict(self, needed_input)
Similarly to monoview algorithms, it is possible to add an interpretation method.
Manipulate the dataset object
The input of the fit() and predict() methods is a Dataset object. The useful methods of this object are:

get_v

The get_v method is the way to access the view data in the dataset object. As explained earlier, SuMMIT passes the full dataset object and two index arrays to the fit() and predict() methods, to avoid loading the views when it is not necessary.
Example: build a list of all the view arrays

Let us suppose that the multiview algorithm that one wants to add to SuMMIT takes as input a list list_X of all the views. Then an example of self.transform_data_if_needed(X, sample_indices, view_indices) could be:
def transform_data_if_needed(self, X, sample_indices, view_indices):
    views_list = []
    # Iterate over the requested view indices
    for view_index in view_indices:
        # Get the data of this view from the dataset object, restricted to
        # the requested samples
        view_data = X.get_v(view_index, sample_indices=sample_indices)
        # Store it in the list
        views_list.append(view_data)
    return views_list
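If, instead, the hypothetical algorithm expects a single concatenated matrix rather than a list of views (an early-fusion-style input), a similar sketch could simply stack the requested views column-wise:

import numpy as np

def transform_data_if_needed(self, X, sample_indices, view_indices):
    # Hypothetical variant: concatenate the requested views column-wise so
    # the underlying algorithm receives a single
    # (n_samples, total_n_features) array instead of a list of arrays.
    return np.concatenate([X.get_v(view_index, sample_indices=sample_indices)
                           for view_index in view_indices],
                          axis=1)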