.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_dcor_t_test.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_dcor_t_test.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_dcor_t_test.py:


The distance correlation t-test of independence
===============================================

This example shows how to use the distance correlation t-test of
independence.

.. GENERATED FROM PYTHON SOURCE LINES 8-18

.. code-block:: default


    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import scipy.stats

    import dcor

    # sphinx_gallery_thumbnail_number = 3


.. GENERATED FROM PYTHON SOURCE LINES 19-24

Given matching samples of two random vectors with arbitrary dimensions, the
distance covariance can be used to construct an asymptotic test of
independence.
For an introduction to independence tests, see
:ref:`sphx_glr_auto_examples_plot_dcov_test.py`.

.. GENERATED FROM PYTHON SOURCE LINES 26-27

We can consider the same case with independent observations:

.. GENERATED FROM PYTHON SOURCE LINES 27-39

.. code-block:: default


    n_samples = 1000
    random_state = np.random.default_rng(83110)

    x = random_state.uniform(0, 1, size=n_samples)
    y = random_state.normal(0, 1, size=n_samples)

    plt.scatter(x, y, s=1)
    plt.show()

    dcor.independence.distance_correlation_t_test(x, y)


.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_t_test_001.png
   :alt: plot dcor t test
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_t_test_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    HypothesisTest(pvalue=0.8250527908156173, statistic=-0.9347949830875785)
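
The p-value printed above is consistent with an upper-tail probability of the
statistic under a Student's t-distribution.
The following minimal sketch recomputes it from the printed statistic,
assuming ``n (n - 3) / 2 - 1`` degrees of freedom; it illustrates the
relationship and is not the library's own code.
The 0.05 level is only an illustrative choice.

.. code-block:: python

    import scipy.stats

    n_samples = 1000
    statistic = -0.9347949830875785  # value printed above

    # Assumed degrees of freedom of the reference t-distribution
    v = n_samples * (n_samples - 3) / 2
    df = v - 1

    # Upper-tail probability: large positive statistics indicate dependence
    pvalue = scipy.stats.t.sf(statistic, df=df)
    print(pvalue)  # close to the printed p-value of about 0.825

    significance = 0.05  # illustrative level, not taken from the example
    reject_independence = pvalue < significance
    print(reject_independence)

A negative statistic therefore gives a p-value above 0.5, and independence is
not rejected.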


.. GENERATED FROM PYTHON SOURCE LINES 40-41

We can also consider the case with nonlinear dependencies:

.. GENERATED FROM PYTHON SOURCE LINES 41-58

.. code-block:: default


    u = random_state.uniform(-1, 1, size=n_samples)

    y = (
        np.cos(u * np.pi)
        + random_state.normal(0, 0.01, size=n_samples)
    )
    x = (
        np.sin(u * np.pi)
        + random_state.normal(0, 0.01, size=n_samples)
    )

    plt.scatter(x, y, s=1)
    plt.show()

    dcor.independence.distance_correlation_t_test(x, y)


.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_t_test_002.png
   :alt: plot dcor t test
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_t_test_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    HypothesisTest(pvalue=0.0, statistic=29.97891473933373)


.. GENERATED FROM PYTHON SOURCE LINES 59-61

As we can observe, this test also correctly rejects the null hypothesis in
the second case and not in the first case.
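
The second case also shows why a test that is sensitive to nonlinear
dependence is useful.
As a minimal sketch with its own seed (so the data differ from the run
above), the Pearson correlation of points near the unit circle is close to
zero even though ``x`` and ``y`` are strongly related:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)  # seed chosen only for this sketch
    n_samples = 1000

    u = rng.uniform(-1, 1, size=n_samples)
    x = np.sin(u * np.pi) + rng.normal(0, 0.01, size=n_samples)
    y = np.cos(u * np.pi) + rng.normal(0, 0.01, size=n_samples)

    # Linear association is negligible...
    pearson = np.corrcoef(x, y)[0, 1]
    print(pearson)

    # ...yet every point lies close to the circle x**2 + y**2 = 1
    radius = np.hypot(x, y)
    print(radius.min(), radius.max())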

.. GENERATED FROM PYTHON SOURCE LINES 63-72

The test illustrated here is an asymptotic test that relies on approximating
the distribution of the statistic, under the null hypothesis, by a Student's
t-distribution as the dimension of the data goes to infinity.
This test is thus faster than permutation tests, as it does not require
permuting the data, and it is also deterministic for a given dataset.
However, at least in theory, the test should be applied only to
high-dimensional data.
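
To make the construction concrete, the following is a minimal NumPy sketch of
one common way to write the statistic: a bias-corrected squared distance
correlation :math:`R^*` is computed from U-centered distance matrices and
rescaled as :math:`T = \sqrt{v - 1} \, R^* / \sqrt{1 - (R^*)^2}` with
:math:`v = n (n - 3) / 2`.
The helper names ``u_centered_distances`` and ``t_test_sketch`` are
hypothetical, and the code is an illustration under these assumptions rather
than the implementation used by ``dcor``.

.. code-block:: python

    import numpy as np
    import scipy.stats


    def u_centered_distances(z):
        """Return the U-centered Euclidean distance matrix of the rows of z."""
        z = np.asarray(z, dtype=float)
        if z.ndim == 1:
            z = z[:, np.newaxis]  # treat a 1-D array as n scalar observations
        n = z.shape[0]
        diff = z[:, np.newaxis, :] - z[np.newaxis, :, :]
        a = np.sqrt((diff ** 2).sum(axis=-1))
        u = (
            a
            - a.sum(axis=1, keepdims=True) / (n - 2)
            - a.sum(axis=0, keepdims=True) / (n - 2)
            + a.sum() / ((n - 1) * (n - 2))
        )
        np.fill_diagonal(u, 0.0)  # the diagonal is defined to be zero
        return u


    def t_test_sketch(x, y):
        """Return (statistic, one-sided p-value); requires n > 3 samples."""
        a = u_centered_distances(x)
        b = u_centered_distances(y)
        n = a.shape[0]
        # The common 1 / (n (n - 3)) factors cancel in the ratio
        r_star = (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())
        v = n * (n - 3) / 2
        statistic = np.sqrt(v - 1) * r_star / np.sqrt(1 - r_star ** 2)
        pvalue = scipy.stats.t.sf(statistic, df=v - 1)
        return statistic, pvalue


    rng = np.random.default_rng(0)  # seed chosen only for this sketch
    x = rng.normal(size=(50, 30))
    y = rng.normal(size=(50, 30))
    print(t_test_sketch(x, y))

No resampling is involved, which is why the result is deterministic for a
given dataset.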

.. GENERATED FROM PYTHON SOURCE LINES 74-81

We will now plot, for normally distributed data, the histogram of the
statistic and estimate the Type I error, as in
:footcite:t:`szekely+rizzo_2013_distance`.
Users are encouraged to download this example and increase the number of
tests, ``n_tests``, to obtain better estimates of the Type I error.
In order to replicate the original results, one should set the value of
``n_tests`` to 1000.

.. GENERATED FROM PYTHON SOURCE LINES 81-120

.. code-block:: default


    n_tests = 100  # Monte Carlo repetitions per sample size
    dim = 30  # dimension of each random vector
    significance = 0.1
    n_obs_list = [25, 30, 35, 50, 70, 100]

    table = pd.DataFrame()
    table["n_obs"] = n_obs_list

    dist_results = []
    for n_obs in n_obs_list:
        n_errors = 0
        statistics = []
        for _ in range(n_tests):
            # Independent samples of n_obs observations each: the null
            # hypothesis holds, so every rejection is a Type I error
            x = random_state.normal(0, 1, size=(n_obs, dim))
            y = random_state.normal(0, 1, size=(n_obs, dim))

            test_result = dcor.independence.distance_correlation_t_test(x, y)
            statistics.append(test_result.statistic)
            if test_result.pvalue < significance:
                n_errors += 1

        error_prob = n_errors / n_tests
        dist_results.append(error_prob)

    table["Type I error"] = dist_results

    # Plot the distribution of the statistic for the last sample size,
    # together with the limiting t-distribution with v - 1 degrees of
    # freedom, where v = n (n - 3) / 2
    df = len(x) * (len(x) - 3) / 2 - 1

    plt.hist(statistics, bins=12, density=True)

    distribution = scipy.stats.t(df=df)
    u = np.linspace(distribution.ppf(0.01), distribution.ppf(0.99), 100)
    plt.plot(u, distribution.pdf(u))
    plt.show()

    table


.. image-sg:: /auto_examples/images/sphx_glr_plot_dcor_t_test_003.png
   :alt: plot dcor t test
   :srcset: /auto_examples/images/sphx_glr_plot_dcor_t_test_003.png
   :class: sphx-glr-single-img


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>n_obs</th>
          <th>Type I error</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>25</td>
          <td>0.08</td>
        </tr>
        <tr>
          <th>1</th>
          <td>30</td>
          <td>0.18</td>
        </tr>
        <tr>
          <th>2</th>
          <td>35</td>
          <td>0.12</td>
        </tr>
        <tr>
          <th>3</th>
          <td>50</td>
          <td>0.12</td>
        </tr>
        <tr>
          <th>4</th>
          <td>70</td>
          <td>0.14</td>
        </tr>
        <tr>
          <th>5</th>
          <td>100</td>
          <td>0.07</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />
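
The estimates in the table are themselves noisy, because each one is a
proportion computed over only ``n_tests`` repetitions.
As a rough guide, and assuming that the true Type I error equals the nominal
level, the binomial standard error of such a proportion can be sketched as
follows (the helper name is illustrative):

.. code-block:: python

    import math


    def type_i_error_standard_error(significance, n_tests):
        """Binomial standard error of an estimated rejection rate."""
        return math.sqrt(significance * (1 - significance) / n_tests)


    for n_tests in (100, 1000):
        se = type_i_error_standard_error(0.1, n_tests)
        print(f"n_tests={n_tests}: standard error {se:.3f}")

With 100 repetitions the standard error is about 0.03, so deviations of a few
hundredths from the nominal 0.1 are expected; 1000 repetitions reduce it to
about 0.009.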

.. GENERATED FROM PYTHON SOURCE LINES 121-124

Bibliography
------------
.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  44.814 seconds)


.. _sphx_glr_download_auto_examples_plot_dcor_t_test.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/VNMabus/dcor/master?filepath=examples/plot_dcor_t_test.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_dcor_t_test.py <plot_dcor_t_test.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_dcor_t_test.ipynb <plot_dcor_t_test.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_