multiview_generator package

Subpackages

multiview_generator.tests package

Submodules

multiview_generator.base module

class MultiViewSubProblemsGenerator(random_state=42, n_samples=100, n_classes=4, n_views=4, error_matrix=None, n_features=3, class_weights=1.0, redundancy=0.0, complementarity=0.0, complementarity_level=3, mutual_error=0.0, name='generated_dataset', config_file=None, sub_problem_type='base', sub_problem_configurations=None, min_rndm_val=-1, max_rndm_val=1, **kwargs)

Bases: object

This engine generates one monoview sub-problem for each view with independent data. It then switches descriptions between the samples to create error and difficulty in the dataset.

Parameters:

random_state – The random state or seed.
n_samples – The number of samples that the dataset will contain
n_classes – The number of classes in which the samples will be labelled
n_views – The number of views describing the samples
error_matrix – The error matrix giving in row i column j the error of the Bayes classifier on Class i for View j
n_features – The number of features describing the samples for each view (can specify an int or array-like of length n_views)
class_weights – The proportion of the dataset that will be labelled in each class. Must specify an array-like of size n_classes ([0.1,0.45,0.45] will output a dataset with 10% of the samples in the first class and 45% in each of the two others.)
redundancy – The proportion of the samples that will be well-described by all the views.

complementarity – The proportion of samples that will be well-described only by some views.
complementarity_level – The number of views that will have a bad description of the complementary samples.
mutual_error – The proportion of samples that will be mis-described by all the views.
name – The name of the dataset (will be used to name the file).
config_file – The path to the YAML config file. If provided, the config file entries will overwrite the ones passed as arguments.
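As an illustration of how class_weights relates to the number of samples per class, here is a minimal sketch. The helper samples_per_class is hypothetical and is not part of multiview_generator; the rounding rule (integer part, then leftover samples handed out from the first class onward) is an assumption, and the package may round differently.

```python
def samples_per_class(n_samples, class_weights):
    """Split n_samples across classes in proportion to class_weights.

    Hypothetical helper for illustration only.
    """
    total = sum(class_weights)
    # Normalise the weights, then keep the integer part of each share.
    counts = [int(n_samples * w / total) for w in class_weights]
    # Hand out the samples lost to truncation, one per class, from class 0 on.
    for i in range(n_samples - sum(counts)):
        counts[i % len(counts)] += 1
    return counts


counts = samples_per_class(100, [0.1, 0.45, 0.45])
```

With the weights from the description above, the first class receives roughly 10 of the 100 samples and the other two roughly 45 each.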

gen_report(output_path='.', file_type='md', save=True, n_cv=5)

Generates a markdown report based on the configuration. If save is True, it will be saved in output_path as <self.name>.<file_type>.

Parameters:

output_path (str) – path to store the text report.
file_type (str) – Type of file in which the report is saved (currently supported: “md” or “txt”)
save (bool) – Whether to save the string in a file or not.

Returns:

The report string
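The naming rule for the saved report can be sketched as follows. The function report_path is a hypothetical stand-in, not the package's code, and rejecting unsupported file types with a ValueError is an assumption.

```python
import os


def report_path(name, output_path=".", file_type="md"):
    """Build <output_path>/<name>.<file_type> (hypothetical helper)."""
    # Only the two file types listed in the documentation are accepted here.
    if file_type not in ("md", "txt"):
        raise ValueError("file_type must be 'md' or 'txt'")
    return os.path.join(output_path, f"{name}.{file_type}")


path = report_path("generated_dataset", output_path="reports", file_type="txt")
```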

gen_view_report(view_index)

to_hdf5_mc(saving_path='.')

Saves the dataset in an HDF5 file compatible with SuMMIT.

Parameters:: saving_path (str) – where to save the dataset; the file will be named after the self.name attribute.
Returns:: None

multiview_generator.base_strs module

multiview_generator.gaussian_classes module

class MultiViewGaussianSubProblemsGenerator(random_state=42, n_samples=100, n_classes=4, n_views=4, error_matrix=None, n_features=3, class_weights=1.0, redundancy=0.05, complementarity=0.05, complementarity_level=3, mutual_error=0.01, name='generated_dataset', config_file=None, sub_problem_type='base', sub_problem_configurations=None, sub_problem_generators='StumpsGenerator', random_vertices=False, min_rndm_val=-1, max_rndm_val=1, **kwargs)

Bases: MultiViewSubProblemsGenerator

assign_complementarity(): Method assigning mis-described and well-described views to build complementary samples

assign_mutual_error(): Method assigning the mis-describing views to the mutual error samples.

assign_redundancy(): Method assigning the well-describing views to the redundant samples.

generate_multi_view_dataset()

This is the main method. It generates a multiview dataset according to the configuration. To do so:

it generates the labels of the multiview dataset,
then it assigns all the subsets of samples (redundant, …),
finally, for each view, it generates a monoview dataset according to the configuration.

Returns:: view_data, a list containing one np.ndarray per view, and y, the label array.
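The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the package's implementation: sketch_multiview_dataset is a hypothetical function, each view has a single feature, complementarity and the error matrix are ignored, and a mis-described sample is simply drawn around the centre of the wrong class.

```python
import random


def sketch_multiview_dataset(n_samples=20, n_classes=2, n_views=3,
                             redundancy=0.25, mutual_error=0.25, seed=42):
    rng = random.Random(seed)

    # Step 1: generate the labels.
    y = [rng.randrange(n_classes) for _ in range(n_samples)]

    # Step 2: assign disjoint subsets of sample indices.
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_red = int(redundancy * n_samples)
    n_mut = int(mutual_error * n_samples)
    redundant = set(indices[:n_red])             # well described by all views
    mutual = set(indices[n_red:n_red + n_mut])   # mis-described by all views

    # Step 3: generate one monoview dataset per view.
    views = []
    for _ in range(n_views):
        view = []
        for i, label in enumerate(y):
            # Mutual-error samples are drawn around the wrong class centre;
            # all other samples are well described in this simplified sketch.
            centre = (label + 1) % n_classes if i in mutual else label
            view.append([rng.gauss(centre, 0.1)])
        views.append(view)
    return views, y, redundant, mutual


views, y, redundant, mutual = sketch_multiview_dataset()
```

The return value mirrors the documented one (a list of views plus the labels), with the two subsets added only so that the assignment step can be inspected.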

get_distance(): Method that records the distance of each description to the ideal decision limit, will be used later to quantify more precisely the quality of a description.

multiview_generator.sub_problems module

class BaseSubProblem(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: object

The base class for all the sub-problem generators.

gen_report()

General method to generate the report on the view.

Returns:: A string containing the general report for the view

class RingsGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: BaseSubProblem

gen_data()

Generates the samples according to Gaussian distributions with scales computed from the given error and class separation. The generator first draws a radius from the Gaussian distribution, then generates n_features-1 random angles to build the polar coordinates of the samples. The dataset returned is the Cartesian version of this “polar” dataset.

Returns:: data a np.ndarray of dimension n_classes, n_samples_per_class, n_features containing the samples’ descriptions, sorted by class

gen_report(): Generates the specific report for StumpsGenerator.

get_bayes_classifier()

class StumpsGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: BaseSubProblem

gen_data()

Generates the samples according to Gaussian distributions with scales computed from the given error and class separation. This sub-problem is easily understandable by a decision tree.

The features are built as follows:

relevant features – the first math.ceil(math.log2(self.n_classes)) ones
uniform noise features – all the remaining ones

Returns:: data a np.ndarray of dimension n_classes, n_samples_per_class, n_features containing the samples’ descriptions, sorted by class
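The split between relevant and noise features described above can be sketched as follows. The helper split_features is hypothetical, and raising an error when n_features is too small is an assumption; the package may handle that case differently.

```python
import math


def split_features(n_classes, n_features):
    """Return the indices of relevant and noise features (hypothetical helper)."""
    # One binary split per relevant feature separates up to 2**k classes.
    n_relevant = math.ceil(math.log2(n_classes))
    if n_relevant > n_features:
        raise ValueError("not enough features to separate the classes")
    relevant = list(range(n_relevant))
    noise = list(range(n_relevant, n_features))
    return relevant, noise


relevant, noise = split_features(n_classes=4, n_features=3)
```

With 4 classes and 3 features, the first two features carry the class information and the third is noise.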

gen_report(): Generates the specific report for StumpsGenerator.

get_bayes_classifier()

class TreesGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)

Bases: BaseSubProblem

Work in progress: a generator similar to StumpsGenerator, but one that generates several blobs per class.

gen_data(): WIP

gen_report(): WIP

get_bayes_classifier()

to_cartesian(radius, angles): Transforms polar coordinates to cartesian coordinates.
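For reference, one common way to convert a radius and n_features-1 angles into n_features Cartesian coordinates is the hyperspherical convention sketched below. The function to_cartesian_sketch is a hypothetical stand-in for a single sample; the angle convention actually used by to_cartesian is not stated here, so treat this as an assumption.

```python
import math


def to_cartesian_sketch(radius, angles):
    """Convert one point from hyperspherical to Cartesian coordinates."""
    coords = []
    sin_product = 1.0
    for angle in angles:
        # Each coordinate uses the cosine of its own angle times the sines
        # of all the previous angles.
        coords.append(radius * sin_product * math.cos(angle))
        sin_product *= math.sin(angle)
    # The last coordinate is the product of all the sines.
    coords.append(radius * sin_product)
    return coords


point = to_cartesian_sketch(2.0, [0.3, 1.2])
```

Whatever the angles, the resulting point lies at distance radius from the origin, which is what makes the generated classes ring-shaped.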

multiview_generator.utils module

format_array(input, size, type_needed=<class 'int'>)

Used to check or build an array of the requested size:

if the input is an array, checks that it is the right size,
if it is either a string or a scalar, builds an array with input repeated size times.

Parameters:

input – either a string, a scalar or an array-like
size – an int, the size of the output array

Returns:

a numpy.ndarray of shape (size, )
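The behaviour described above can be sketched without dependencies. In this illustration, format_array_sketch is a hypothetical function, a plain list stands in for the numpy.ndarray that the real function returns, the type_needed argument is omitted, and raising a ValueError on a size mismatch is an assumption.

```python
def format_array_sketch(value, size):
    """Repeat a scalar or string, or check the length of an array-like."""
    # Strings and scalars are repeated size times.
    if isinstance(value, (str, int, float)):
        return [value] * size
    # Array-likes are kept as they are, provided they have the right size.
    values = list(value)
    if len(values) != size:
        raise ValueError(f"expected {size} values, got {len(values)}")
    return values


n_features = format_array_sketch(3, 4)        # same value for each of 4 views
per_view = format_array_sketch([3, 5, 2], 3)  # one value per view
```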

get_config_from_file(file_path)

Loads the configuration from the YAML config file.

Parameters:: file_path – path to the config file.
Returns:

init_array_attr(attr, n_repeat, base_val=0)

Transforms a unique attribute into an array with the same value.

Parameters:

attr
n_repeat
base_val

Returns:

init_class_weights(class_weights, n_classes)

Initializes the class weights. Sets a uniform distribution if no distribution is specified.

Parameters:

class_weights
n_classes

Returns:
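A minimal sketch of this initialization, using plain lists instead of numpy arrays. The function init_class_weights_sketch is hypothetical; treating both None and a scalar as "no distribution specified", and normalising an explicit distribution so that it sums to 1, are assumptions.

```python
def init_class_weights_sketch(class_weights, n_classes):
    """Return one weight per class, summing to 1."""
    # No explicit distribution: fall back to a uniform one.
    if class_weights is None or isinstance(class_weights, (int, float)):
        return [1.0 / n_classes] * n_classes
    weights = list(class_weights)
    if len(weights) != n_classes:
        raise ValueError("class_weights must have one entry per class")
    # Normalise so that the weights form a proportion of the dataset.
    total = sum(weights)
    return [w / total for w in weights]


uniform = init_class_weights_sketch(None, 4)
custom = init_class_weights_sketch([1, 1, 2], 3)
```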

init_error_matrix(error_matrix, n_classes, n_views)

Initializes the error matrix

Parameters:

error_matrix
n_classes
n_views

Returns:

init_list(input, size, type_needed=<class 'dict'>)

Transforms a unique attribute into a list with the same value.

Parameters:

input
size
type_needed

Returns:

init_random_state(random_state)

Initializes the random state.

Parameters:: random_state
Returns:
