.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_distance_correlation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_distance_correlation.py:


Distance correlation plot
=========================

Plot 2D synthetic datasets and compute the distance correlation between their
coordinates.

.. GENERATED FROM PYTHON SOURCE LINES 11-16

The objective of this example is to replicate the results shown on the
Wikipedia page for distance correlation.

We first include the necessary imports.

.. GENERATED FROM PYTHON SOURCE LINES 16-23

.. code-block:: default


    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats

    import dcor


.. GENERATED FROM PYTHON SOURCE LINES 24-27

We now create a random generator with a fixed seed for reproducibility.
We also define the number of samples per dataset.
Both are global variables in this script.

.. GENERATED FROM PYTHON SOURCE LINES 27-32

.. code-block:: default


    random_state = np.random.default_rng(seed=123456789)
    n_samples = 1000


.. GENERATED FROM PYTHON SOURCE LINES 33-35

We now define utility functions for plotting the data and generating the
synthetic datasets. The plotting function shows the correlation value,
rounded to one decimal place, as the title of each panel.

.. GENERATED FROM PYTHON SOURCE LINES 35-45

.. code-block:: default


    def plot_data(x, y, ax, xlim, ylim, correlation):
        """Plot the data without axes, using the correlation as the title."""
        ax.set_title(f"{correlation:.1f}")
        ax.set_xlim(xlim)
        ax.set_ylim(ylim)
        ax.scatter(x, y, s=1)
        ax.axis(False)


.. GENERATED FROM PYTHON SOURCE LINES 46-49

The first row of datasets consists of bivariate Gaussian distributions with
different correlations between the coordinates, so we define a function that
returns one of these datasets given the desired correlation.

.. GENERATED FROM PYTHON SOURCE LINES 49-60

.. code-block:: default


    def gaussian2d(correlation):
        """Generate 2D Gaussian data with a particular correlation."""
        return random_state.multivariate_normal(
            mean=[0, 0],
            cov=[[1, correlation], [correlation, 1]],
            size=n_samples,
        )


.. GENERATED FROM PYTHON SOURCE LINES 61-65

In the second row, the data of each dataset lie on a line, with a different
rotation per dataset.
We now define a function for rotating a dataset by a given number of degrees.
The rotation is performed using a rotation matrix.

.. GENERATED FROM PYTHON SOURCE LINES 65-76

.. code-block:: default


    def rotate(data, angle):
        """Apply a rotation in degrees."""
        angle = np.deg2rad(angle)
        rotation_matrix = [
            [np.cos(angle), -np.sin(angle)],
            [np.sin(angle), np.cos(angle)],
        ]
        # Each point is a row of ``data``, so the matrix is applied on the
        # right; a positive angle therefore rotates the points clockwise.
        return data @ rotation_matrix


.. GENERATED FROM PYTHON SOURCE LINES 77-85

The two final rows of datasets consist of data with complex relationships
between the coordinates. The only difference between these rows is the spread
of the data, so we make that a parameter.
The function is written as a generator that yields the datasets one at a
time, which simplifies looping over them.
As each distribution has a different support, we yield not only the data but
also the axis limits to use for plotting.

.. GENERATED FROM PYTHON SOURCE LINES 85-136

.. code-block:: default


    def other_datasets(spread):
        """Generate other complex datasets."""
        # W-shaped quartic curve with uniform noise.
        x = random_state.uniform(-1, 1, size=n_samples)
        y = (
            4 * (x**2 - 1 / 2)**2
            + random_state.uniform(-1, 1, size=n_samples) / 3 * spread
        )
        yield x, y, (-1, 1), (-1 / 3, 1 + 1 / 3)

        # Uniform square rotated by 22.5 degrees.
        y = random_state.uniform(-1, 1, size=n_samples)
        xy = rotate(np.column_stack([x, y]), -22.5)
        lim = np.sqrt(2 + np.sqrt(2)) / np.sqrt(2)
        yield xy[:, 0], xy[:, 1] * spread, (-lim, lim), (-lim, lim)

        # The same square rotated by a further 22.5 degrees (45 in total).
        xy = rotate(xy, -22.5)
        lim = np.sqrt(2)
        yield xy[:, 0], xy[:, 1] * spread, (-lim, lim), (-lim, lim)

        # Parabola with uniform noise.
        y = 2 * x**2 + random_state.uniform(-1, 1, size=n_samples) * spread
        yield x, y, (-1, 1), (-1, 3)

        # Two mirrored parabolas: each point gets a random sign.
        y = (
            (x**2 + random_state.uniform(0, 1 / 2, size=n_samples) * spread)
            * random_state.choice([-1, 1], size=n_samples)
        )
        yield x, y, (-1.5, 1.5), (-1.5, 1.5)

        # Circle with Gaussian noise. ``y`` is computed before ``x`` is
        # overwritten, so both use the same original angles.
        y = (
            np.cos(x * np.pi)
            + random_state.normal(0, 1 / 8, size=n_samples) * spread
        )
        x = (
            np.sin(x * np.pi)
            + random_state.normal(0, 1 / 8, size=n_samples) * spread
        )
        yield x, y, (-1.5, 1.5), (-1.5, 1.5)

        # Four Gaussian clusters, one per quadrant.
        xy = np.concatenate([
            random_state.multivariate_normal(
                mean,
                np.eye(2) * spread,
                size=n_samples,
            )
            for mean in ([3, 3], [-3, 3], [-3, -3], [3, -3])
        ])
        lim = 3 + 4
        yield xy[:, 0], xy[:, 1], (-lim, lim), (-lim, lim)


.. GENERATED FROM PYTHON SOURCE LINES 137-138

Finally, we define the function that yields all the datasets in order.

.. GENERATED FROM PYTHON SOURCE LINES 138-154

.. code-block:: default


    def all_datasets():
        """Generate all the datasets in the example."""
        for correlation in (1.0, 0.8, 0.4, 0.0, -0.4, -0.8, -1.0):
            x, y = gaussian2d(correlation).T
            yield x, y, (-4, 4), (-4, 4)

        line = gaussian2d(correlation=1)
        for angle in (0, 15, 30, 45, 60, 75, 90):
            x, y = rotate(line, angle).T
            yield x, y, (-4, 4), (-4, 4)

        yield from other_datasets(spread=1)
        yield from other_datasets(spread=0.3)


.. GENERATED FROM PYTHON SOURCE LINES 155-157

We can now plot each dataset and compute the distance correlation between its
coordinates.

.. GENERATED FROM PYTHON SOURCE LINES 157-169

.. code-block:: default


    subplot_kwargs = dict(
        figsize=(10, 6),
        constrained_layout=True,
        subplot_kw=dict(box_aspect=1),
    )
    fig, axes = plt.subplots(4, 7, **subplot_kwargs)
    for (x, y, xlim, ylim), ax in zip(all_datasets(), axes.flat):
        correlation = dcor.distance_correlation(x, y)
        plot_data(x, y, ax=ax, xlim=xlim, ylim=ylim, correlation=correlation)



.. image-sg:: /auto_examples/images/sphx_glr_plot_distance_correlation_001.png
   :alt: 1.0, 0.8, 0.4, 0.1, 0.3, 0.8, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.4, 0.1, 0.1, 0.3, 0.2, 0.2, 0.0, 0.4, 0.1, 0.2, 0.5, 0.3, 0.2, 0.0
   :srcset: /auto_examples/images/sphx_glr_plot_distance_correlation_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 170-174

For comparison, we include the results obtained with the standard Pearson
correlation, also available on Wikipedia.

.. GENERATED FROM PYTHON SOURCE LINES 174-179

.. code-block:: default


    fig, axes = plt.subplots(4, 7, **subplot_kwargs)
    for (x, y, xlim, ylim), ax in zip(all_datasets(), axes.flat):
        correlation = scipy.stats.pearsonr(x, y).statistic
        plot_data(x, y, ax=ax, xlim=xlim, ylim=ylim, correlation=correlation)



.. image-sg:: /auto_examples/images/sphx_glr_plot_distance_correlation_002.png
   :alt: 1.0, 0.8, 0.4, 0.0, -0.4, -0.8, -1.0, 1.0, 1.0, 1.0, -0.0, -1.0, -1.0, -1.0, 0.0, -0.0, 0.0, 0.1, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0
   :srcset: /auto_examples/images/sphx_glr_plot_distance_correlation_002.png
   :class: sphx-glr-single-img
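
The two figures differ mostly on the nonlinear datasets, where the Pearson
correlation is close to zero while the distance correlation is not. As a
rough illustration of why, the following is a minimal pure-Python sketch of
the standard (biased) sample distance correlation for one-dimensional data,
next to a plain Pearson coefficient. The helpers
``distance_correlation_sketch`` and ``pearson_sketch`` are hypothetical names
written for this illustration only. They are not part of ``dcor`` or SciPy,
and this is not meant to describe how ``dcor`` computes its result, which may
use different and much faster algorithms.

.. code-block:: default

    import math


    def _centered_distances(values):
        """Return the double-centered matrix of pairwise distances."""
        n = len(values)
        dist = [[abs(a - b) for b in values] for a in values]
        row_means = [sum(row) / n for row in dist]
        grand_mean = sum(row_means) / n
        # The distance matrix is symmetric, so column means equal row means.
        return [
            [
                dist[i][j] - row_means[i] - row_means[j] + grand_mean
                for j in range(n)
            ]
            for i in range(n)
        ]


    def distance_correlation_sketch(x, y):
        """Biased sample distance correlation of two 1D samples, O(n^2)."""
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        n = len(x)
        a = _centered_distances(x)
        b = _centered_distances(y)

        def mean_product(p, q):
            total = sum(p[i][j] * q[i][j] for i in range(n) for j in range(n))
            return total / n**2

        dcov2_xy = mean_product(a, b)  # squared distance covariance
        dvar2_x = mean_product(a, a)   # squared distance variance of x
        dvar2_y = mean_product(b, b)   # squared distance variance of y
        denominator = math.sqrt(dvar2_x * dvar2_y)
        if denominator == 0:
            # At least one sample is constant.
            return 0.0
        # Guard against tiny negative values caused by rounding.
        return math.sqrt(max(dcov2_xy, 0.0) / denominator)


    def pearson_sketch(x, y):
        """Pearson correlation; assumes both samples are non-constant."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / math.sqrt(var_x * var_y)


    # Symmetric grid on [-1, 1] and a noise-free parabola on top of it.
    grid = [i / 50 for i in range(-50, 51)]
    parabola = [v**2 for v in grid]
    print("Pearson:", pearson_sketch(grid, parabola))
    print("Distance correlation:", distance_correlation_sketch(grid, parabola))

On this symmetric parabola the Pearson coefficient is numerically zero,
because the positive and negative halves cancel, while the distance
correlation sketch stays clearly positive. For an exact linear relationship
both coefficients equal 1 in absolute value.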