multiview_generator.gaussian_classes
gaussian_classes
- class MultiViewGaussianSubProblemsGenerator(random_state=42, n_samples=100, n_classes=4, n_views=4, error_matrix=None, n_features=3, class_weights=1.0, redundancy=0.05, complementarity=0.05, complementarity_level=3, mutual_error=0.01, name='generated_dataset', config_file=None, sub_problem_type='base', sub_problem_configurations=None, sub_problem_generators='StumpsGenerator', random_vertices=False, min_rndm_val=-1, max_rndm_val=1, **kwargs)
This engine generates one monoview sub-problem for each view with independent data. It then switches descriptions between the samples to create error and difficulty in the dataset.
- Parameters:
random_state – The random state or seed.
n_samples – The number of samples that the dataset will contain
n_classes – The number of classes in which the samples will be labelled
n_views – The number of views describing the samples
error_matrix – The error matrix giving in row i column j the error of the Bayes classifier on Class i for View j
n_features – The number of features describing the samples for each view (can specify an int or array-like of length n_views)
class_weights – The proportion of the dataset that will be labelled in each class. Must specify an array-like of size n_classes ([0.1, 0.45, 0.45] will output a dataset with 10% of the samples in the first class and 45% in each of the two others).
redundancy – The proportion of the samples that will be well-described by all the views.
complementarity – The proportion of samples that will be well-described only by some views
complementarity_level – The number of views that will have a bad description of the complementary samples
mutual_error – The proportion of samples that will be mis-described by all the views
name – The name of the dataset (will be used to name the file)
config_file – The path to the yaml config file. If provided, the config file entries will overwrite the ones passed as arguments.
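The shape conventions of the parameters above can be illustrated with a small standalone sketch. It does not use the library: `normalise_parameters` is a hypothetical helper, and both the reading of a scalar `class_weights` as "equal weights" and the rescaling of the weights to sum to 1 are assumptions made for illustration only.

```python
def normalise_parameters(n_classes, n_views, n_features=3, class_weights=1.0,
                         error_matrix=None):
    """Hypothetical helper: expand arguments to their full documented shape."""
    # n_features: an int applies to every view, otherwise one entry per view.
    if isinstance(n_features, int):
        n_features = [n_features] * n_views
    n_features = list(n_features)
    if len(n_features) != n_views:
        raise ValueError("n_features must be an int or have length n_views")

    # class_weights: ASSUMPTION, a scalar is read as "equal weights".
    if isinstance(class_weights, (int, float)):
        class_weights = [class_weights] * n_classes
    class_weights = list(class_weights)
    if len(class_weights) != n_classes:
        raise ValueError("class_weights must have length n_classes")
    # ASSUMPTION: weights are rescaled so that they sum to 1.
    total = sum(class_weights)
    class_weights = [w / total for w in class_weights]

    # error_matrix: row i is class i, column j is view j.
    if error_matrix is not None:
        if len(error_matrix) != n_classes or any(
            len(row) != n_views for row in error_matrix
        ):
            raise ValueError("error_matrix must be n_classes x n_views")
    return n_features, class_weights, error_matrix


n_features, class_weights, error_matrix = normalise_parameters(
    n_classes=3, n_views=4, n_features=3, class_weights=[0.1, 0.45, 0.45]
)
print(n_features)  # [3, 3, 3, 3]
```

An `error_matrix` passed to this helper would need 3 rows (classes) and 4 columns (views) to pass the shape check.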
- random_vertices
- sub_problem_generators
- generate_multi_view_dataset()
This is the main method. It generates a multiview dataset according to the configuration. To do so,
it generates the labels of the multiview dataset,
then it assigns all the subsets of samples (redundant, …),
and finally, for each view, it generates a monoview dataset according to the configuration.
- Returns:
view_data, a list containing the views as np.ndarrays, and y, the label array.
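The three steps described for this method can be sketched in plain Python. This is a toy illustration under stated assumptions, not the library's implementation: `toy_multiview_dataset` is a hypothetical function, classes are balanced, each view is drawn from a Gaussian centred on the class index, and the subsets are returned as a third value only so they can be inspected (the documented method returns `view_data` and `y`).

```python
import random


def toy_multiview_dataset(n_samples=100, n_classes=4, n_views=4, n_features=3,
                          redundancy=0.05, complementarity=0.05,
                          mutual_error=0.01, seed=42):
    """Hypothetical sketch of the label / subset / per-view generation steps."""
    rng = random.Random(seed)

    # Step 1: generate the labels (ASSUMPTION: balanced classes).
    y = [i % n_classes for i in range(n_samples)]
    rng.shuffle(y)

    # Step 2: assign disjoint subsets of samples from one shuffled index list.
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_red = round(redundancy * n_samples)
    n_comp = round(complementarity * n_samples)
    n_mut = round(mutual_error * n_samples)
    subsets = {
        "redundant": indices[:n_red],
        "complementary": indices[n_red:n_red + n_comp],
        "mutual_error": indices[n_red + n_comp:n_red + n_comp + n_mut],
    }

    # Step 3: one monoview dataset per view
    # (ASSUMPTION: Gaussian features centred on the class index).
    view_data = []
    for _ in range(n_views):
        view = [[rng.gauss(float(label), 0.5) for _ in range(n_features)]
                for label in y]
        view_data.append(view)
    return view_data, y, subsets


view_data, y, subsets = toy_multiview_dataset()
print(len(view_data), len(view_data[0]), len(view_data[0][0]))  # 4 100 3
```

With the default proportions and 100 samples, the sketch tags 5 redundant, 5 complementary and 1 mutual-error sample; all remaining samples are left untagged.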
- assign_mutual_error()
Method assigning the mis-describing views to the mutual error samples.
- assign_complementarity()
Method assigning mis-described and well-described views to build complementary samples.
- assign_redundancy()
Method assigning the well-describing views to the redundant samples.
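The three assignment methods can be pictured as filling, for each tagged sample, one flag per view saying whether that view describes the sample well. The sketch below is a hypothetical illustration of that idea. The function `toy_description_quality` and its dictionary input are assumptions and do not reflect the library's internal data structures.

```python
import random


def toy_description_quality(subsets, n_views=4, complementarity_level=3, seed=42):
    """Hypothetical sketch: map sample index -> list of per-view quality flags.

    True means the view describes the sample well, False means it mis-describes it.
    """
    rng = random.Random(seed)
    quality = {}
    # Redundant samples: well described by all the views.
    for idx in subsets.get("redundant", []):
        quality[idx] = [True] * n_views
    # Mutual-error samples: mis-described by all the views.
    for idx in subsets.get("mutual_error", []):
        quality[idx] = [False] * n_views
    # Complementary samples: complementarity_level views are bad, the rest good.
    for idx in subsets.get("complementary", []):
        bad_views = set(rng.sample(range(n_views), complementarity_level))
        quality[idx] = [view not in bad_views for view in range(n_views)]
    return quality


quality = toy_description_quality(
    {"redundant": [0, 1], "mutual_error": [2], "complementary": [3, 4]}
)
print(quality[0])  # [True, True, True, True]
print(quality[2])  # [False, False, False, False]
```

For the complementary samples, which views are flagged as bad depends on the seed, but each such sample always has exactly `complementarity_level` bad views.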
- get_distance()
Method that records the distance of each description to the ideal decision limit. This distance will be used later to quantify more precisely the quality of a description.
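As an illustration of what such a distance could look like, the sketch below computes a signed distance for the simplest case. It assumes two Gaussian classes with equal priors and identical spherical covariance, for which the Bayes decision boundary is the hyperplane halfway between the two class means. Reading "ideal decision limit" this way is an assumption, and `distance_to_midpoint_boundary` is a hypothetical function, not part of the library.

```python
import math


def distance_to_midpoint_boundary(x, mean_a, mean_b):
    """Signed distance from x to the hyperplane halfway between two means.

    Negative values lie on the side of mean_a, positive values on the side
    of mean_b, and 0 is exactly on the boundary.
    """
    direction = [b - a for a, b in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("the two class means must differ")
    midpoint = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]
    # Project (x - midpoint) onto the unit vector pointing from mean_a to mean_b.
    return sum((xi - mi) * di for xi, mi, di in zip(x, midpoint, direction)) / norm


print(distance_to_midpoint_boundary([0.0, 0.0], [0.0, 0.0], [2.0, 0.0]))  # -1.0
print(distance_to_midpoint_boundary([1.0, 5.0], [0.0, 0.0], [2.0, 0.0]))  # 0.0
```

Under this reading, a description with a small absolute distance sits close to the boundary and is therefore an ambiguous, lower-quality description.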