multiview_generator package
Subpackages
Submodules
multiview_generator.base module
- class MultiViewSubProblemsGenerator(random_state=42, n_samples=100, n_classes=4, n_views=4, error_matrix=None, n_features=3, class_weights=1.0, redundancy=0.0, complementarity=0.0, complementarity_level=3, mutual_error=0.0, name='generated_dataset', config_file=None, sub_problem_type='base', sub_problem_configurations=None, min_rndm_val=-1, max_rndm_val=1, **kwargs)
Bases:
object
This engine generates one monoview sub-problem for each view with independent data. It then switches descriptions between the samples to create error and difficulty in the dataset.
- Parameters:
random_state – The random state or seed.
n_samples – The number of samples that the dataset will contain
n_classes – The number of classes in which the samples will be labelled
n_views – The number of views describing the samples
error_matrix – The error matrix giving in row i column j the error of the Bayes classifier on Class i for View j
n_features – The number of features describing the samples for each view (can specify an int or an array-like of length n_views)
class_weights – The proportion of the dataset that will be labelled in each class. Must specify an array-like of size n_classes ([0.1, 0.45, 0.45] will output a dataset with 10% of the samples in the first class and 45% in each of the two others).
redundancy – The proportion of the samples that will be well-described by all the views.
complementarity – The proportion of samples that will be well-described only by some views
complementarity_level – The number of views that will have a bad description of the complementary samples
mutual_error – The proportion of samples that will be mis-described by all the views
name – The name of the dataset (will be used to name the file)
config_file – The path to the yaml config file. If provided, the config file entries will overwrite the ones passed as arguments.
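To make the class_weights parameter concrete, the following sketch turns a weight vector into per-class sample counts. The helper name samples_per_class and the largest-remainder rounding are assumptions made for illustration; the package may distribute samples differently.

```python
def samples_per_class(n_samples, class_weights):
    """Hypothetical helper (not part of the package): split n_samples by weight."""
    total = sum(class_weights)
    # Ideal, possibly fractional, share of each class.
    shares = [n_samples * w / total for w in class_weights]
    counts = [int(s) for s in shares]
    # Hand out the samples lost to truncation, largest remainder first.
    leftovers = n_samples - sum(counts)
    by_remainder = sorted(
        range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftovers]:
        counts[i] += 1
    return counts


print(samples_per_class(100, [0.1, 0.45, 0.45]))
```

With the weights from the parameter description, the first class receives 10 samples and the two others 45 each.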
- gen_report(output_path='.', file_type='md', save=True, n_cv=5)
Generates a markdown report based on the configuration. If save is True, it will be saved in output_path as <self.name>.<file_type>.
- Parameters:
output_path (str) – path to store the text report.
file_type (str) – Type of file in which the report is saved (currently supported : “md” or “txt”)
save (bool) – Whether to save the string in a file or not.
- Returns:
The report string
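The naming rule described above can be sketched as follows. The functions report_path and save_report are hypothetical stand-ins written for this illustration; only the <name>.<file_type> pattern and the two supported file types come from the description.

```python
import os


def report_path(output_path, name, file_type="md"):
    """Hypothetical helper: build <output_path>/<name>.<file_type>."""
    if file_type not in ("md", "txt"):
        raise ValueError("file_type must be 'md' or 'txt'")
    return os.path.join(output_path, f"{name}.{file_type}")


def save_report(report, output_path=".", name="generated_dataset",
                file_type="md", save=True):
    """Optionally write the report to disk, and always return the string."""
    if save:
        with open(report_path(output_path, name, file_type), "w") as handle:
            handle.write(report)
    return report
```

Returning the string whether or not it is saved mirrors the documented return value.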
- gen_view_report(view_index)
multiview_generator.base_strs module
multiview_generator.gaussian_classes module
- class MultiViewGaussianSubProblemsGenerator(random_state=42, n_samples=100, n_classes=4, n_views=4, error_matrix=None, n_features=3, class_weights=1.0, redundancy=0.05, complementarity=0.05, complementarity_level=3, mutual_error=0.01, name='generated_dataset', config_file=None, sub_problem_type='base', sub_problem_configurations=None, sub_problem_generators='StumpsGenerator', random_vertices=False, min_rndm_val=-1, max_rndm_val=1, **kwargs)
Bases:
MultiViewSubProblemsGenerator
- assign_complementarity()
Method assigning mis-described and well-described views to build complementary samples
- assign_mutual_error()
Method assigning the mis-describing views to the mutual error samples.
- assign_redundancy()
Method assigning the well-describing views to the redundant samples.
- generate_multi_view_dataset()
This is the main method. It will generate a multiview dataset according to the configuration. To do so,
it generates the labels of the multiview dataset,
then it assigns all the subsets of samples (redundant, …)
finally, for each view it generates a monoview dataset according to the configuration
- Returns:
view_data a list containing the views np.ndarrays and y, the label array.
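The second step, assigning samples to subsets, can be pictured with the simplified sketch below. It uses only the standard library, the function name assign_subsets is hypothetical, and it ignores the error matrix and the per-view assignments that the real generator handles.

```python
import random


def assign_subsets(n_samples, redundancy=0.05, complementarity=0.05,
                   mutual_error=0.01, seed=42):
    """Hypothetical sketch: partition sample indices into the named subsets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Size of each special subset, as a proportion of the dataset.
    sizes = {
        "redundant": round(n_samples * redundancy),
        "complementary": round(n_samples * complementarity),
        "mutual_error": round(n_samples * mutual_error),
    }
    subsets, start = {}, 0
    for subset_name, size in sizes.items():
        subsets[subset_name] = sorted(indices[start:start + size])
        start += size
    # Every sample not picked above keeps its ordinary description.
    subsets["remaining"] = sorted(indices[start:])
    return subsets


subsets = assign_subsets(100)
print({subset_name: len(members) for subset_name, members in subsets.items()})
```

Shuffling once and slicing keeps the subsets disjoint, so a sample is never both redundant and part of the mutual error.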
- get_distance()
Method that records the distance of each description to the ideal decision boundary. It will be used later to quantify more precisely the quality of a description.
multiview_generator.sub_problems module
- class BaseSubProblem(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)
Bases:
object
The base class for all the sub-problem generators.
- gen_report()
General method to generate the report on the view.
- Returns:
A string containing the general report for the view
- class RingsGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)
Bases:
BaseSubProblem
- gen_data()
Generates the samples according to gaussian distributions with scales computed with the given error and class separation. The generator first computes a radius according to the gaussian distribution, then generates n_features-1 random angles to build the polar coordinates of the samples. The dataset returned is the cartesian version of this “polar” dataset.
- Returns:
data a np.ndarray of dimension n_classes, n_samples_per_class, n_features containing the samples’ descriptions, sorted by class
- gen_report()
Generates the specific report for RingsGenerator.
- get_bayes_classifier()
- class StumpsGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)
Bases:
BaseSubProblem
- gen_data()
Generates the samples according to gaussian distributions with scales computed with the given error and class separation. This sub-problem is easily understandable by a decision tree.
The features are built as follows: relevant_features are the first math.ceil(math.log2(self.n_classes)) ones, and uniform noise features are all the remaining ones.
- Returns:
data a np.ndarray of dimension n_classes, n_samples_per_class, n_features containing the samples’ descriptions, sorted by class
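The feature split described for StumpsGenerator can be checked with a few lines. The function split_features is a hypothetical helper, and raising an error when there are too few features is an assumption of this sketch, not documented behaviour.

```python
import math


def split_features(n_classes, n_features):
    """Hypothetical helper: count relevant and uniform-noise features."""
    # Number of binary splits needed to tell n_classes classes apart.
    n_relevant = math.ceil(math.log2(n_classes))
    if n_relevant > n_features:
        # Assumption of this sketch: too few features is treated as an error.
        raise ValueError("n_features is too small for n_classes")
    return n_relevant, n_features - n_relevant


print(split_features(4, 3))
```

With the default n_classes=4 and n_features=3 of the multiview generators, two features carry the class information and one is noise.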
- gen_report()
Generates the specific report for StumpsGenerator.
- get_bayes_classifier()
- class TreesGenerator(n_classes=2, n_features=2, random_vertices=True, errors=array([0.5, 0.5]), random_state=RandomState(MT19937) at 0x7FE5AC3C3B40, n_samples_per_class=array([100, 100]), **configuration)
Bases:
BaseSubProblem
Work in progress: a generator similar to StumpsGenerator, but one that generates several blobs per class.
- gen_data()
WIP
- gen_report()
WIP
- get_bayes_classifier()
- to_cartesian(radius, angles)
Transforms polar coordinates to cartesian coordinates.
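One common way to convert a radius and n_features-1 angles to cartesian coordinates is the hyperspherical convention sketched below. The name to_cartesian_sketch is hypothetical, and the angle ordering used by the package itself may differ.

```python
import math


def to_cartesian_sketch(radius, angles):
    """Hypothetical sketch: hyperspherical to cartesian coordinates."""
    coords = []
    sin_product = 1.0  # running product of the sines of the previous angles
    for angle in angles:
        coords.append(radius * sin_product * math.cos(angle))
        sin_product *= math.sin(angle)
    # The last coordinate uses the product of all the sines.
    coords.append(radius * sin_product)
    return coords


point = to_cartesian_sketch(2.0, [0.3, 1.2, 2.0])
print(len(point), math.sqrt(sum(c * c for c in point)))
```

With a single angle this reduces to the familiar (r cos θ, r sin θ), and in any dimension the resulting point lies at distance radius from the origin.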
multiview_generator.utils module
- format_array(input, size, type_needed=<class 'int'>)
Used to test that:
* if the input is an array, it is the right size,
* if it is either a string or a scalar, build an array with input repeated size times.
- Parameters:
input – either a string, a scalar or an array-like
size – an int, the size of the output array
- Returns:
a numpy.ndarray of shape (size, )
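The two cases described for format_array can be mimicked with a pure-Python stand-in. Note the assumptions: format_array_sketch is a hypothetical name, it returns a list rather than a numpy.ndarray, and raising ValueError on a size mismatch is a guess at how the check might fail.

```python
def format_array_sketch(value, size, type_needed=int):
    """Hypothetical stand-in for format_array, returning a list."""
    if isinstance(value, (list, tuple)):
        # Array-like input: only check the length and cast the elements.
        if len(value) != size:
            raise ValueError(f"expected {size} values, got {len(value)}")
        return [type_needed(item) for item in value]
    # String or scalar input: repeat it `size` times.
    return [type_needed(value)] * size


print(format_array_sketch(3, 4))
print(format_array_sketch([1, 2, 3], 3))
```

This is the pattern that lets a parameter such as n_features accept either one int for all views or one value per view.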
- get_config_from_file(file_path)
Loads the configuration from the yaml config file.
- Parameters:
file_path – path to the config file.
- Returns:
- init_array_attr(attr, n_repeat, base_val=0)
Transforms a unique attribute into an array with the same value.
- Parameters:
attr
n_repeat
base_val
- Returns:
- init_class_weights(class_weights, n_classes)
Initializes the class weights. Sets a uniform distribution if no distribution is specified.
- Parameters:
class_weights
n_classes
- Returns:
- init_error_matrix(error_matrix, n_classes, n_views)
Initializes the error matrix
- Parameters:
error_matrix
n_classes
n_views
- Returns:
- init_list(input, size, type_needed=<class 'dict'>)
Transforms a unique attribute into a list with the same value.
- Parameters:
input
size
type_needed
- Returns:
- init_random_state(random_state)
Initializes the random state.
- Parameters:
random_state
- Returns: