.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_dcor_t_test.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_dcor_t_test.py:


The distance correlation t-test of independence
===============================================

Example that shows the usage of the distance correlation t-test.

.. GENERATED FROM PYTHON SOURCE LINES 8-18

.. code-block:: default

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import scipy.stats

    import dcor

    # sphinx_gallery_thumbnail_number = 3

.. GENERATED FROM PYTHON SOURCE LINES 19-24

Given paired samples of two random vectors of arbitrary dimensions, a
bias-corrected version of the distance correlation can be used to construct
an asymptotic test of independence.
For an introduction to independence tests, see
:ref:`sphx_glr_auto_examples_plot_dcov_test.py`.

.. GENERATED FROM PYTHON SOURCE LINES 26-27

As in that example, we first consider the case of independent observations:

.. GENERATED FROM PYTHON SOURCE LINES 27-39

.. code-block:: default

    n_samples = 1000
    random_state = np.random.default_rng(83110)

    x = random_state.uniform(0, 1, size=n_samples)
    y = random_state.normal(0, 1, size=n_samples)

    plt.scatter(x, y, s=1)
    plt.show()

    dcor.independence.distance_correlation_t_test(x, y)

.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_t_test_001.png
   :alt: plot dcor t test
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_t_test_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    HypothesisTest(pvalue=0.8250527908156173, statistic=-0.9347949830875785)

.. GENERATED FROM PYTHON SOURCE LINES 40-41

We can also consider a case with a nonlinear dependence:

.. GENERATED FROM PYTHON SOURCE LINES 41-58

.. code-block:: default

    u = random_state.uniform(-1, 1, size=n_samples)

    y = (
        np.cos(u * np.pi)
        + random_state.normal(0, 0.01, size=n_samples)
    )
    x = (
        np.sin(u * np.pi)
        + random_state.normal(0, 0.01, size=n_samples)
    )

    plt.scatter(x, y, s=1)
    plt.show()

    dcor.independence.distance_correlation_t_test(x, y)

.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_t_test_002.png
   :alt: plot dcor t test
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_t_test_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    HypothesisTest(pvalue=0.0, statistic=29.97891473933373)

.. GENERATED FROM PYTHON SOURCE LINES 59-61

As we can observe, this test also correctly rejects the null hypothesis of
independence in the second case, but not in the first.

.. GENERATED FROM PYTHON SOURCE LINES 63-72

The test illustrated here is an asymptotic test. It relies on approximating
the distribution of the statistic under the null hypothesis by a Student's
t-distribution, an approximation that holds as the dimension of the data goes
to infinity. Because it does not need to recompute the statistic on
permutations of the data, this test is faster than permutation tests, and it
is also deterministic for a given dataset. However, at least in theory, the
test should only be applied to high-dimensional data.

.. GENERATED FROM PYTHON SOURCE LINES 74-81

For normally distributed data, we will now plot the histogram of the
statistic and compute the Type I error rate, estimated as the proportion of
tests that reject the (true) null hypothesis at the chosen significance
level, following :footcite:t:`szekely+rizzo_2013_distance`.
Users are encouraged to download this example and increase the number of
tests per sample size, ``n_tests``, to obtain better estimates of the Type I
error. To replicate the original results, set ``n_tests`` to 1000.

.. GENERATED FROM PYTHON SOURCE LINES 81-120

.. code-block:: default

    n_tests = 100
    dim = 30
    significance = 0.1
    n_obs_list = [25, 30, 35, 50, 70, 100]

    table = pd.DataFrame()
    table["n_obs"] = n_obs_list

    dist_results = []
    for n_obs in n_obs_list:
        n_errors = 0
        statistics = []
        for _ in range(n_tests):
            # Independent samples of size n_obs: the null hypothesis is true
            x = random_state.normal(0, 1, size=(n_obs, dim))
            y = random_state.normal(0, 1, size=(n_obs, dim))

            test_result = dcor.independence.distance_correlation_t_test(x, y)
            statistics.append(test_result.statistic)
            # Rejecting a true null hypothesis counts as a Type I error
            if test_result.pvalue < significance:
                n_errors += 1

        error_prob = n_errors / n_tests
        dist_results.append(error_prob)

    table["Type I error"] = dist_results

    # Histogram of the statistic for the last n_obs, against the t density
    df = len(x) * (len(x) - 3) / 2 - 1  # v - 1, with v = n * (n - 3) / 2
    plt.hist(statistics, bins=12, density=True)

    distribution = scipy.stats.t(df=df)
    u = np.linspace(distribution.ppf(0.01), distribution.ppf(0.99), 100)
    plt.plot(u, distribution.pdf(u))
    plt.show()

    table

.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_t_test_003.png
   :alt: plot dcor t test
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_t_test_003.png
   :class: sphx-glr-single-img

=====  ============
n_obs  Type I error
=====  ============
25     0.08
30     0.18
35     0.12
50     0.12
70     0.14
100    0.07
=====  ============
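
Each entry of the Type I error table is a proportion of rejections among
``n_tests`` simulated tests, so it is itself subject to sampling noise. As a
minimal sketch, assume that the tests are independent and that each one
rejects with probability equal to the nominal significance level
:math:`\alpha`. Then the number of rejections is binomial, and the standard
error of the estimate is :math:`\sqrt{\alpha (1 - \alpha) / N}`, where
:math:`N` is the number of tests. The helper ``type_i_error_std`` below is
hypothetical and not part of ``dcor``:

.. code-block:: python

    import math


    def type_i_error_std(alpha, n_tests):
        """Standard error of a rejection proportion estimated from n_tests runs.

        Hypothetical helper: assumes independent tests, each rejecting with
        probability alpha, so that the rejection count is Binomial(n_tests, alpha).
        """
        return math.sqrt(alpha * (1 - alpha) / n_tests)


    significance = 0.1
    for n_tests in (100, 1000):
        std = type_i_error_std(significance, n_tests)
        print(f"n_tests={n_tests}: standard error ~ {std:.4f}")

With ``n_tests = 100`` the standard error is 0.03. Entries that differ from
0.1 by a few hundredths are therefore compatible with a well-calibrated test.
With 1000 tests, the standard error drops to about 0.0095.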


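To make the asymptotic test described above more concrete, the following
NumPy-only sketch computes the statistic from scratch. It is a hypothetical
re-implementation, not the ``dcor`` code. The sketch assumes that the test is
based on the bias-corrected distance correlation :math:`\mathcal{R}^*_n`,
computed from U-centered distance matrices. The statistic is
:math:`T_n = \sqrt{v - 1}\,\mathcal{R}^*_n / \sqrt{1 - (\mathcal{R}^*_n)^2}`
with :math:`v = n(n - 3)/2`, and it is compared with a t-distribution with
:math:`v - 1` degrees of freedom, rejecting for large values. To avoid a SciPy
dependency, the upper-tail p-value uses the standard normal distribution
instead. This is a close approximation because :math:`v - 1` is already large
for moderate :math:`n` (4849 for :math:`n = 100`).

.. code-block:: python

    import math

    import numpy as np


    def pairwise_distances(x):
        """Euclidean distance matrix of the rows of x (1-D input = one feature)."""
        x = np.asarray(x, dtype=float)
        if x.ndim == 1:
            x = x[:, np.newaxis]
        diff = x[:, np.newaxis, :] - x[np.newaxis, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))


    def u_centered(a):
        """U-center a distance matrix; the diagonal is set to zero."""
        n = a.shape[0]
        row_term = a.sum(axis=1, keepdims=True) / (n - 2)
        col_term = a.sum(axis=0, keepdims=True) / (n - 2)
        total_term = a.sum() / ((n - 1) * (n - 2))
        u = a - row_term - col_term + total_term
        np.fill_diagonal(u, 0.0)
        return u


    def u_inner(a, b):
        """Inner product of U-centered matrices, normalized by n(n - 3)."""
        n = a.shape[0]
        return (a * b).sum() / (n * (n - 3))


    def dcor_t_test_sketch(x, y):
        """Return (statistic, approximate p-value) of the t-test of independence.

        Hypothetical re-implementation for illustration; requires n > 3 samples.
        """
        a = u_centered(pairwise_distances(x))
        b = u_centered(pairwise_distances(y))
        n = a.shape[0]
        # Bias-corrected distance correlation R*_n (analogue of the squared
        # distance correlation; it can be negative)
        r = u_inner(a, b) / math.sqrt(u_inner(a, a) * u_inner(b, b))
        v = n * (n - 3) / 2
        statistic = math.sqrt(v - 1) * r / math.sqrt(1 - r ** 2)
        # One-sided upper-tail p-value, with t(v - 1) replaced by N(0, 1)
        pvalue = 0.5 * math.erfc(statistic / math.sqrt(2))
        return statistic, pvalue


    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 30))
    y = rng.normal(size=(100, 30))
    print(dcor_t_test_sketch(x, y))

If ``dcor`` is installed, the statistic can be compared with
``dcor.independence.distance_correlation_t_test(x, y).statistic``. We would
expect them to agree up to rounding if ``dcor`` uses the same estimator, but
this sketch does not verify that.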
.. GENERATED FROM PYTHON SOURCE LINES 121-124

Bibliography
------------

.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 44.814 seconds)