multiview_generator package

Subpackages

Submodules

multiview_generator.base module

class MultiViewSubProblemsGenerator(random_state=42, n_samples=100, n_classes=4, n_views=4, error_matrix=None, n_features=3, class_weights=1.0, redundancy=0.0, complementarity=0.0, complementarity_level=3, mutual_error=0.0, name='generated_dataset', config_file=None, sub_problem_type='base', sub_problem_configurations=None, min_rndm_val=-1, max_rndm_val=1, **kwargs)

Bases: object

This engine generates one monoview sub-problem for each view, with independent data. It then switches descriptions between the samples to create error and difficulty in the dataset.

Parameters:
  • random_state – The random state or seed.

  • n_samples – The number of samples that the dataset will contain

  • n_classes – The number of classes in which the samples will be labelled

  • n_views – The number of views describing the samples

  • error_matrix – The error matrix: the entry in row i, column j is the error of the Bayes classifier on class i for view j.

  • n_features – The number of features describing the samples for each view (can specify an int or array-like of length n_views)

  • class_weights – The proportion of the dataset that will be labelled in each class. Must be an array-like of size n_classes ([0.1,0.45,0.45] will output a dataset with 10% of the samples in the first class and 45% in each of the two others).

  • redundancy – The proportion of the samples that will be well-described by all the views.

  • complementarity – The proportion of samples that will be well-described only by some of the views.

  • complementarity_level – The number of views that will have a bad description of the complementary samples.

  • mutual_error – The proportion of samples that will be mis-described by all the views.

  • name – The name of the dataset (used to name the output file).

  • config_file – The path to the YAML config file. If provided, the config file entries will overwrite the ones passed as arguments.
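As a rough illustration of how these proportions become sample counts, the sketch below converts class_weights, redundancy, complementarity and mutual_error into integer counts for a given n_samples. The helper name `sample_counts`, the rounding to the nearest integer and the rule that the last class absorbs any remainder are assumptions made for this example; the package may round differently.

```python
import numpy as np


def sample_counts(n_samples, class_weights, redundancy, complementarity, mutual_error):
    """Hypothetical helper (not part of multiview_generator): proportions -> counts."""
    weights = np.asarray(class_weights, dtype=float)
    weights = weights / weights.sum()  # assumption: weights are normalised to sum to 1
    per_class = np.round(weights * n_samples).astype(int)
    per_class[-1] += n_samples - per_class.sum()  # assumption: last class takes the remainder
    subsets = {
        "redundant": int(round(n_samples * redundancy)),
        "complementary": int(round(n_samples * complementarity)),
        "mutual_error": int(round(n_samples * mutual_error)),
    }
    return per_class, subsets


per_class, subsets = sample_counts(100, [0.1, 0.45, 0.45], 0.05, 0.05, 0.01)
print(per_class)  # [10 45 45]
print(subsets)    # {'redundant': 5, 'complementary': 5, 'mutual_error': 1}
```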

gen_report(output_path='.', file_type='md', save=True, n_cv=5)

Generates a markdown report based on the configuration. If save is True, it will be saved in output_path as <self.name>.<file_type>.

Parameters:
  • output_path (str) – path to store the text report.

  • file_type (str) – Type of file in which the report is saved (currently supported: “md” or “txt”)

  • save (bool) – Whether to save the string in a file or not.

Returns:

The report string
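The file naming described above can be sketched as follows. `save_report` is a hypothetical stand-alone helper, not a method of the package; it only mirrors the documented behaviour of writing the report string to <output_path>/<name>.<file_type> for the two supported file types.

```python
import os
import tempfile


def save_report(report, name, output_path=".", file_type="md"):
    """Hypothetical helper: write `report` to <output_path>/<name>.<file_type>."""
    if file_type not in ("md", "txt"):  # the two documented file types
        raise ValueError(f"unsupported file_type: {file_type!r}")
    file_path = os.path.join(output_path, f"{name}.{file_type}")
    with open(file_path, "w", encoding="utf-8") as report_file:
        report_file.write(report)
    return file_path


# Usage: write into a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_report("# generated_dataset\n", "generated_dataset", tmp_dir)
    saved_name = os.path.basename(saved_path)
    with open(saved_path, encoding="utf-8") as report_file:
        saved_text = report_file.read()
```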

gen_view_report(view_index)
to_hdf5_mc(saving_path='.')

Saves the dataset in an HDF5 file compatible with SuMMIT.

Parameters:

saving_path (str) – where to save the dataset; the file will be named after the self.name attribute.

Returns:

None

multiview_generator.base_strs module

multiview_generator.gaussian_classes module

class MultiViewGaussianSubProblemsGenerator(random_state=42, n_samples=100, n_classes=4, n_views=4, error_matrix=None, n_features=3, class_weights=1.0, redundancy=0.05, complementarity=0.05, complementarity_level=3, mutual_error=0.01, name='generated_dataset', config_file=None, sub_problem_type='base', sub_problem_configurations=None, sub_problem_generators='StumpsGenerator', random_vertices=False, min_rndm_val=-1, max_rndm_val=1, **kwargs)

Bases: MultiViewSubProblemsGenerator

assign_complementarity()

Method assigning mis-described and well-described views to build complementary samples

assign_mutual_error()

Method assigning the mis-describing views to the mutual error samples.

assign_redundancy()

Method assigning the well-describing views to the redundant samples.
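The three assignment steps can be pictured as filling a samples × views table that records which views describe each sample well. The sketch below assumes the redundant, complementary and mutual-error subsets are disjoint random draws, that a complementary sample is mis-described by exactly complementarity_level randomly chosen views, and it marks every other sample with -1 ("left to the error matrix"). The function name `assign_subsets` and this encoding are illustrative only, not the package's internal representation.

```python
import numpy as np


def assign_subsets(n_samples, n_views, n_redundant, n_complementary, n_mutual_error,
                   complementarity_level, seed=42):
    """Hypothetical sketch: table[i, v] is 1 (well described), 0 (mis-described) or -1."""
    rng = np.random.RandomState(seed)
    # Draw disjoint sample indices for the three subsets.
    shuffled = rng.permutation(n_samples)
    redundant = shuffled[:n_redundant]
    complementary = shuffled[n_redundant:n_redundant + n_complementary]
    start = n_redundant + n_complementary
    mutual_error = shuffled[start:start + n_mutual_error]

    table = np.full((n_samples, n_views), -1, dtype=int)  # -1: left to the error matrix
    table[redundant] = 1     # redundancy: every view describes these samples well
    table[mutual_error] = 0  # mutual error: every view mis-describes these samples
    for idx in complementary:
        # complementarity: complementarity_level views mis-describe the sample
        bad_views = rng.choice(n_views, size=complementarity_level, replace=False)
        table[idx] = 1
        table[idx, bad_views] = 0
    return redundant, complementary, mutual_error, table
```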

generate_multi_view_dataset()

This is the main method. It will generate a multiview dataset according to the configuration. To do so,

  • it generates the labels of the multiview dataset,

  • then it assigns all the subsets of samples (redundant, …)

  • finally, for each view it generates a monoview dataset according to the configuration

Returns:

view_data, a list containing the views’ np.ndarrays, and y, the label array.

get_distance()

Method that records the distance of each description to the ideal decision boundary; it will be used later to quantify the quality of each description more precisely.
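This page does not say how the distance is measured. As an illustration only, for a linear decision boundary w·x + b = 0 the Euclidean distance of a description x is |w·x + b| / ‖w‖, and a decision stump on one feature is the special case where w is a unit vector along that feature. The linear boundary and the name `distance_to_hyperplane` are assumptions, not the package's implementation.

```python
import numpy as np


def distance_to_hyperplane(X, w, b):
    """Euclidean distance of each row of X to the hyperplane w.x + b = 0."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    w = np.asarray(w, dtype=float)
    return np.abs(X @ w + b) / np.linalg.norm(w)


# A stump on feature 0 with threshold 0.5 is the hyperplane x0 - 0.5 = 0.
print(distance_to_hyperplane([[1.0, 3.0], [0.0, -2.0]], w=[1.0, 0.0], b=-0.5))  # [0.5 0.5]
```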

multiview_generator.sub_problems module

class BaseSubProblem(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: object

The base class for all the sub-problem generators.

gen_report()

General method to generate the report on the view.

Returns:

A string containing the general report for the view

class RingsGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: BaseSubProblem

gen_data()

Generates the samples according to gaussian distributions with scales computed with the given error and class separation. The generator first computes a radius according to the gaussian distribution, then generates n_features-1 random angles to build the polar coordinates of the samples. The dataset returned is the cartesian version of this “polar” dataset.

Returns:

data, an np.ndarray of shape (n_classes, n_samples_per_class, n_features) containing the samples’ descriptions, sorted by class
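For the two-dimensional case (n_features=2, hence a single angle), the procedure above can be sketched as below: each class gets a ring, the radius of each sample is drawn from a Gaussian centred on the class's ring radius, and one uniform random angle places it on the ring. The function name `sample_rings_2d`, the ring radii and passing the Gaussian scale directly (instead of deriving it from the error and class separation, as the generator does) are assumptions of this sketch.

```python
import numpy as np


def sample_rings_2d(class_radii, n_samples_per_class, scale, seed=42):
    """Hypothetical 2-D ring sampler; output shape (n_classes, n_samples_per_class, 2)."""
    rng = np.random.RandomState(seed)
    data = np.empty((len(class_radii), n_samples_per_class, 2))
    for class_index, ring_radius in enumerate(class_radii):
        # Radius drawn from a Gaussian centred on this class's ring radius.
        radius = rng.normal(loc=ring_radius, scale=scale, size=n_samples_per_class)
        # n_features - 1 = 1 random angle in two dimensions.
        angle = rng.uniform(0.0, 2 * np.pi, size=n_samples_per_class)
        # Polar -> cartesian.
        data[class_index, :, 0] = radius * np.cos(angle)
        data[class_index, :, 1] = radius * np.sin(angle)
    return data


data = sample_rings_2d(class_radii=[1.0, 3.0], n_samples_per_class=100, scale=0.2)
print(data.shape)  # (2, 100, 2)
```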

gen_report()

Generates the specific report for RingsGenerator.

get_bayes_classifier()
class StumpsGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: BaseSubProblem

gen_data()

Generates the samples according to gaussian distributions with scales computed with the given error and class separation. This sub-problem is easily understandable by a decision tree.

The features are built as follows:

  • relevant features: the first math.ceil(math.log2(self.n_classes)) ones,

  • uniform noise features: all the remaining ones.

Returns:

data, an np.ndarray of shape (n_classes, n_samples_per_class, n_features) containing the samples’ descriptions, sorted by class
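Two pieces of the description above can be made concrete. The number of relevant features, math.ceil(math.log2(n_classes)), is exactly the number of bits needed to give every class its own binary code, so one plausible reading (an assumption, not confirmed by this page) is that class c sits at the hypercube vertex spelled by the binary digits of c. The link between an error and a Gaussian scale is shown with the textbook two-class case: for two equal-prior Gaussians of standard deviation sigma whose means are `separation` apart, the Bayes error is Phi(-separation / (2 sigma)), so sigma = separation / (2 Phi^-1(1 - error)). The helper names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def n_relevant_features(n_classes):
    # As documented: the first ceil(log2(n_classes)) features carry the class signal.
    return math.ceil(math.log2(n_classes))


def class_vertices(n_classes):
    """Hypothetical placement: class c sits on the hypercube vertex given by the bits of c."""
    n_bits = n_relevant_features(n_classes)
    return np.array([[(c >> bit) & 1 for bit in reversed(range(n_bits))]
                     for c in range(n_classes)])


def scale_for_error(error, separation=1.0):
    """Std of two equal-prior Gaussians `separation` apart whose Bayes error is `error`."""
    if not 0.0 < error < 0.5:
        raise ValueError("error must lie strictly between 0 and 0.5")
    # Bayes error = Phi(-separation / (2 * sigma))  =>  sigma = separation / (2 * Phi^-1(1 - error))
    return separation / (2 * NormalDist().inv_cdf(1 - error))


print(class_vertices(4).tolist())  # [[0, 0], [0, 1], [1, 0], [1, 1]]
```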

gen_report()

Generates the specific report for StumpsGenerator.

get_bayes_classifier()
class TreesGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: BaseSubProblem

Work in progress: a generator similar to StumpsGenerator, but that generates several blobs per class.

gen_data()

WIP

gen_report()

WIP

get_bayes_classifier()
to_cartesian(radius, angles)

Transforms polar coordinates to cartesian coordinates.
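A minimal sketch of the standard hyperspherical-to-cartesian conversion, assuming radius has shape (n_samples,) and angles has shape (n_samples, n_features - 1). Coordinate k is radius · sin(a_1) ... sin(a_{k-1}) · cos(a_k), and the last coordinate is radius times the product of all the sines. The name `polar_to_cartesian` and these shape conventions are assumptions; the package's to_cartesian may order the angles differently.

```python
import numpy as np


def polar_to_cartesian(radius, angles):
    """Hyperspherical -> cartesian: radius (n,), angles (n, d-1) -> points (n, d)."""
    radius = np.asarray(radius, dtype=float)
    angles = np.asarray(angles, dtype=float)
    n_samples, n_angles = angles.shape
    points = np.empty((n_samples, n_angles + 1))
    sin_product = np.ones(n_samples)  # running product sin(a_1) ... sin(a_{k-1})
    for k in range(n_angles):
        points[:, k] = radius * sin_product * np.cos(angles[:, k])
        sin_product *= np.sin(angles[:, k])
    points[:, n_angles] = radius * sin_product  # last coordinate uses only sines
    return points
```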

multiview_generator.utils module

format_array(input, size, type_needed=<class 'int'>)

Used to check that:

  • if the input is an array, it has the right size,

  • if the input is either a string or a scalar, an array with the input repeated size times is built.

Parameters:
  • input – either a string, a scalar or an array-like

  • size – an int, the size of the output array

Returns:

a numpy.ndarray of shape (size, )
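The behaviour described above can be sketched as follows. The name `format_array_sketch`, the use of type_needed as the output dtype and raising ValueError on a size mismatch are assumptions; this page does not say how the package reports a wrong size.

```python
import numbers

import numpy as np


def format_array_sketch(value, size, type_needed=int):
    """Hypothetical: repeat a scalar/string `size` times, or check an array-like's length."""
    if isinstance(value, (str, numbers.Number)):
        # Scalar or string: build an array with the value repeated `size` times.
        return np.array([value] * size, dtype=type_needed)
    array = np.asarray(value, dtype=type_needed)
    if array.shape != (size,):
        raise ValueError(f"expected an array of shape ({size},), got {array.shape}")
    return array
```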

get_config_from_file(file_path)

Loads the configuration from the YAML config file.

Parameters:

file_path – path to the config file.

Returns:

init_array_attr(attr, n_repeat, base_val=0)

Transforms a unique attribute into an array with the same value.

Parameters:
  • attr

  • n_repeat

  • base_val

Returns:

init_class_weights(class_weights, n_classes)

Initializes the class weights. Sets a uniform distribution if no distribution is specified.

Parameters:
  • class_weights

  • n_classes

Returns:
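A sketch of this initialization, assuming that None or a scalar (the generator's default class_weights is 1.0) means "no distribution specified" and that explicit weights are normalized to sum to 1. Both rules and the name `init_class_weights_sketch` are assumptions.

```python
import numbers

import numpy as np


def init_class_weights_sketch(class_weights, n_classes):
    """Hypothetical: uniform weights by default, otherwise normalised explicit weights."""
    if class_weights is None or isinstance(class_weights, numbers.Number):
        # No distribution specified: uniform over the classes.
        return np.full(n_classes, 1.0 / n_classes)
    weights = np.asarray(class_weights, dtype=float)
    if weights.shape != (n_classes,):
        raise ValueError(f"expected {n_classes} class weights, got shape {weights.shape}")
    return weights / weights.sum()  # assumption: normalised to sum to 1
```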

init_error_matrix(error_matrix, n_classes, n_views)

Initializes the error matrix

Parameters:
  • error_matrix

  • n_classes

  • n_views

Returns:
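Since the error matrix has one row per class and one column per view, NumPy broadcasting is a convenient way to expand a scalar (same error everywhere) or a per-view row (same errors for every class) to shape (n_classes, n_views). Accepting these input forms, rejecting None and checking that errors lie in [0, 1] are assumptions of this sketch; the package's handling of None is not documented on this page, and `init_error_matrix_sketch` is a hypothetical name.

```python
import numpy as np


def init_error_matrix_sketch(error_matrix, n_classes, n_views):
    """Hypothetical: expand a scalar, a per-view row or a full matrix to (n_classes, n_views)."""
    if error_matrix is None:
        raise ValueError("this sketch needs an explicit error value or matrix")
    errors = np.asarray(error_matrix, dtype=float)
    # Broadcasting: scalar -> every cell, length-n_views row -> every class.
    matrix = np.broadcast_to(errors, (n_classes, n_views)).copy()
    if ((matrix < 0) | (matrix > 1)).any():
        raise ValueError("errors must lie in [0, 1]")
    return matrix
```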

init_list(input, size, type_needed=<class 'dict'>)

Transforms a unique attribute into a list with the same value.

Parameters:
  • input

  • size

  • type_needed

Returns:

init_random_state(random_state)

Initializes the random state.

Parameters:

random_state

Returns:
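A sketch in the spirit of scikit-learn's check_random_state, assuming init_random_state accepts None, an integer seed or an existing numpy.random.RandomState (the RandomState default shown in the BaseSubProblem signature suggests the latter type). The name `init_random_state_sketch` is hypothetical.

```python
import numbers

import numpy as np


def init_random_state_sketch(random_state):
    """Hypothetical: turn None, an int seed or a RandomState into a RandomState."""
    if random_state is None:
        return np.random.RandomState()  # fresh, unseeded generator
    if isinstance(random_state, numbers.Integral):
        return np.random.RandomState(random_state)  # seeded, reproducible
    if isinstance(random_state, np.random.RandomState):
        return random_state  # reuse the caller's generator
    raise ValueError(f"{random_state!r} cannot be used to seed a RandomState")
```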

Module contents