.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_dcor_usage.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_dcor_usage.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_dcor_usage.py:


Usage of distance correlation
=============================

Example that shows the usage of distance correlation.

.. GENERATED FROM PYTHON SOURCE LINES 8-18

.. code-block:: Python


    import timeit
    import traceback

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats

    import dcor


.. GENERATED FROM PYTHON SOURCE LINES 19-27

Distance correlation is a measure of dependence between random variables,
analogous to the classical Pearson's correlation coefficient.
However, Pearson's correlation can be 0 even when there is a nonlinear
dependence, while distance correlation is 0 only when the variables are
independent.

As an example, consider the following data, sampled from two dependent
distributions:

.. GENERATED FROM PYTHON SOURCE LINES 27-37

.. code-block:: Python


    n_samples = 1000
    random_state = np.random.default_rng(123456)

    x = random_state.uniform(-1, 1, size=n_samples)
    y = x**2 + random_state.normal(0, 0.01, size=n_samples)

    plt.scatter(x, y, s=1)
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_usage_001.png
   :alt: plot dcor usage
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_usage_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 38-45

The first sample, x, is drawn from a uniform distribution.
The second sample, y, is a noisy function of the first one:
:math:`y \approx x^2`.
There is therefore a clear nonlinear dependence between the two.

The standard Pearson's correlation coefficient is not able to detect this
kind of dependence:

.. GENERATED FROM PYTHON SOURCE LINES 45-48

.. code-block:: Python


    scipy.stats.pearsonr(x, y).statistic


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    -0.04891153522218301


.. GENERATED FROM PYTHON SOURCE LINES 49-53

Note that Pearson's correlation takes values in :math:`[-1, 1]`, with values
near the extremes indicating a high degree of linear correlation.
In this case the value is near 0, because the dependence is not linear and is
thus missed by this method.

.. GENERATED FROM PYTHON SOURCE LINES 55-57

However, distance correlation correctly identifies the nonlinear
dependence:

.. GENERATED FROM PYTHON SOURCE LINES 57-60

.. code-block:: Python


    dcor.distance_correlation(x, y)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.4996764265352616


.. GENERATED FROM PYTHON SOURCE LINES 61-64

As an additional advantage, distance correlation can be applied to samples of
random vectors of arbitrary dimension, even when x and y have different
dimensions.

.. GENERATED FROM PYTHON SOURCE LINES 64-80

.. code-block:: Python


    n_features_x2 = 2

    x2 = random_state.uniform(-1, 1, size=(n_samples, n_features_x2))
    y2 = x2[:, 0]**2 + x2[:, 1]**2

    print(x2.shape)
    print(y2.shape)

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    ax.scatter(x2[:, 0], x2[:, 1], y2, s=1)
    plt.show()

    dcor.distance_correlation(x2, y2)


.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_usage_002.png
   :alt: plot dcor usage
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_usage_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (1000, 2)
    (1000,)

    0.3612284260066458


.. GENERATED FROM PYTHON SOURCE LINES 81-83

As with Pearson's correlation, distance correlation is computed from a related
covariance measure, called distance covariance:

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: Python


    dcor.distance_covariance(x, y)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.1519085406311944


.. GENERATED FROM PYTHON SOURCE LINES 87-92

The standard naive algorithm for computing distance covariance and
correlation requires the computation of the distance matrices between the
observations in x and between the observations in y. This has a computational
cost in both time and memory of :math:`O(n^2)`, with :math:`n` the number of
observations:

.. GENERATED FROM PYTHON SOURCE LINES 92-99

.. code-block:: Python


    n_calls = 100

    timeit.timeit(
        lambda: dcor.distance_correlation(x, y, method="naive"),
        number=n_calls,
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    1.6177208389999578


.. GENERATED FROM PYTHON SOURCE LINES 100-104

When both x and y are one-dimensional, there are alternative algorithms with
a computational cost of :math:`O(n\log(n))`, one based on AVL balanced trees
and one based on the mergesort sorting algorithm:

.. GENERATED FROM PYTHON SOURCE LINES 104-109

.. code-block:: Python


    timeit.timeit(
        lambda: dcor.distance_correlation(x, y, method="avl"),
        number=n_calls,
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.18873198099754518


.. GENERATED FROM PYTHON SOURCE LINES 110-115

.. code-block:: Python


    timeit.timeit(
        lambda: dcor.distance_correlation(x, y, method="mergesort"),
        number=n_calls,
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.10310449700045865


.. GENERATED FROM PYTHON SOURCE LINES 116-118

By default, these fast algorithms are used when possible (this is the meaning
of ``"auto"``, the default value of the ``method`` parameter):

.. GENERATED FROM PYTHON SOURCE LINES 118-124

.. code-block:: Python


    timeit.timeit(
        lambda: dcor.distance_correlation(x, y, method="auto"),
        number=n_calls,
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.13859588399645872


.. GENERATED FROM PYTHON SOURCE LINES 125-127

Note that explicitly requesting one of the fast algorithms with
multidimensional random vectors raises an error.

.. GENERATED FROM PYTHON SOURCE LINES 127-132

.. code-block:: Python


    try:
        dcor.distance_correlation(x2, y2, method="avl")
    except Exception:
        traceback.print_exc()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Traceback (most recent call last):
      File "/home/docs/checkouts/readthedocs.org/user_builds/dcor/checkouts/latest/examples/plot_dcor_usage.py", line 129, in <module>
        dcor.distance_correlation(x2, y2, method="avl")
      File "/home/docs/checkouts/readthedocs.org/user_builds/dcor/envs/latest/lib/python3.10/site-packages/dcor/_dcor.py", line 1033, in distance_correlation
        distance_correlation_sqr(
      File "/home/docs/checkouts/readthedocs.org/user_builds/dcor/envs/latest/lib/python3.10/site-packages/dcor/_dcor.py", line 911, in distance_correlation_sqr
        return method.value.stats_sqr(
      File "/home/docs/checkouts/readthedocs.org/user_builds/dcor/envs/latest/lib/python3.10/site-packages/dcor/_dcor.py", line 260, in stats_sqr
        ) = self.terms(
      File "/home/docs/checkouts/readthedocs.org/user_builds/dcor/envs/latest/lib/python3.10/site-packages/dcor/_fast_dcov_avl.py", line 533, in _distance_covariance_sqr_terms_avl
        x, y = _transform_to_1d(x, y)
      File "/home/docs/checkouts/readthedocs.org/user_builds/dcor/envs/latest/lib/python3.10/site-packages/dcor/_utils.py", line 125, in _transform_to_1d
        assert array.shape[1] == 1
    AssertionError


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.483 seconds)


.. _sphx_glr_download_auto_examples_plot_dcor_usage.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/VNMabus/dcor/develop?filepath=examples/plot_dcor_usage.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_dcor_usage.ipynb <plot_dcor_usage.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_dcor_usage.py <plot_dcor_usage.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
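
To make the :math:`O(n^2)` cost of the naive algorithm discussed in this
example concrete, the following is a minimal NumPy-only sketch of the usual
biased (V-statistic) sample distance correlation, computed from double-centered
pairwise distance matrices. It is an illustration under stated assumptions and
is not taken from the dcor source code: the names ``_centered_distances`` and
``naive_distance_correlation`` are hypothetical, and the demo data is generated
independently of the variables used earlier.

.. code-block:: Python

    import numpy as np


    def _centered_distances(z):
        """Return the double-centered pairwise Euclidean distance matrix."""
        z = np.asarray(z, dtype=float)
        # Treat a 1-D sample as n observations of a 1-dimensional vector.
        z = z.reshape(len(z), -1)
        # Pairwise distances between observations: O(n^2) time and memory.
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        # Subtract row and column means, then add back the grand mean.
        return (
            d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean()
        )


    def naive_distance_correlation(x, y):
        """Biased sample distance correlation (hypothetical helper)."""
        a = _centered_distances(x)
        b = _centered_distances(y)
        # Squared distance covariance and variances (V-statistic form).
        dcov2_xy = max((a * b).mean(), 0.0)  # guard against rounding below 0
        dvar2_x = (a * a).mean()
        dvar2_y = (b * b).mean()
        denominator = np.sqrt(dvar2_x * dvar2_y)
        if denominator == 0.0:
            # A constant sample has zero distance variance.
            return 0.0
        return float(np.sqrt(dcov2_xy / denominator))


    # Small demo with its own data (assumed sizes, kept small on purpose).
    rng = np.random.default_rng(0)
    x_demo = rng.uniform(-1, 1, size=200)
    y_demo = x_demo**2 + rng.normal(0, 0.01, size=200)
    print(naive_distance_correlation(x_demo, y_demo))

Because both distance matrices are materialized in full, memory grows
quadratically with the number of observations, which is why the
:math:`O(n\log(n))` methods are preferable for large one-dimensional samples.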