.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/plot_performance.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_plot_performance.py: Performance of the methods ========================== This section shows the relative performance of different algorithms used to compute the functionalities offered in this package. .. GENERATED FROM PYTHON SOURCE LINES 11-31 Distance covariance and distance correlation -------------------------------------------- We offer currently three different algorithms with different performance characteristics that can be used to compute the estimators of the squared distance covariance, both biased and unbiased. These algorithms are: - The original naive algorithm, with order :math:`O(n^2)`. - The AVL-inspired fast algorithm described in :footcite:`fast_distance_correlation_avl`, which improved the performance and attained :math:`O(n\log n)` complexity. - The mergesort algorithm described in :footcite:`fast_distance_correlation_mergesort`, which also obtained :math:`O(n\log n)` complexity. The following code shows the differences in performance executing the algorithm 100 times for samples of different sizes. It then plots the resulting graph. .. GENERATED FROM PYTHON SOURCE LINES 31-121 .. code-block:: Python from timeit import repeat import matplotlib.pyplot as plt import numpy as np import dcor np.random.seed(0) n_times = 100 n_samples_list = [10, 50, 100, 250, 500, 750, 1000] avl_times = np.zeros(len(n_samples_list)) avl_uncompiled_times = np.zeros(len(n_samples_list)) mergesort_times = np.zeros(len(n_samples_list)) mergesort_uncompiled_times = np.zeros(len(n_samples_list)) naive_times = np.zeros(len(n_samples_list)) for i, n_samples in enumerate(n_samples_list): x = np.random.normal(size=n_samples) y = np.random.normal(size=n_samples) def avl(): return dcor.distance_covariance(x, y, method='AVL') def avl_uncompiled(): return dcor.distance_covariance( x, y, method='AVL', compile_mode=dcor.CompileMode.NO_COMPILE, ) def mergesort(): return dcor.distance_covariance(x, y, method='MERGESORT') def mergesort_uncompiled(): return dcor.distance_covariance( x, y, method='MERGESORT', compile_mode=dcor.CompileMode.NO_COMPILE, ) def naive(): return dcor.distance_covariance(x, y, method='NAIVE') avl_times[i] = min( repeat(avl, repeat=n_times, number=1), ) avl_uncompiled_times[i] = min( repeat(avl_uncompiled, repeat=n_times, number=1), ) mergesort_times[i] = min( repeat(mergesort, repeat=n_times, number=1), ) mergesort_uncompiled_times[i] = min( repeat(mergesort_uncompiled, repeat=n_times, number=1), ) naive_times[i] = min( repeat(naive, repeat=n_times, number=1), ) plt.title("Distance covariance performance comparison") plt.xlabel("Number of samples") plt.ylabel("Time (seconds)") plt.plot( n_samples_list, avl_times, label="avl", linewidth=2, marker="o", ) plt.plot( n_samples_list, mergesort_times, label="mergesort", linewidth=2, marker="o", ) plt.plot( n_samples_list, naive_times, label="naive", linewidth=2, marker="o", ) plt.legend() plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_performance_001.png :alt: Distance covariance performance comparison :srcset: /auto_examples/images/sphx_glr_plot_performance_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 122-125 We can see that the performance of the fast methods is much better than the performance of the naive algorithm. We can also check the difference made by the compilation with Numba: .. GENERATED FROM PYTHON SOURCE LINES 125-140 .. code-block:: Python plt.title("Distance covariance performance comparison") plt.xlabel("Number of samples") plt.ylabel("Time (seconds)") plt.plot(n_samples_list, avl_times, label="avl") plt.plot(n_samples_list, mergesort_times, label="mergesort") plt.plot(n_samples_list, avl_uncompiled_times, label="avl (uncompiled)") plt.plot( n_samples_list, mergesort_uncompiled_times, label="mergesort (uncompiled)", ) plt.legend() plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_performance_002.png :alt: Distance covariance performance comparison :srcset: /auto_examples/images/sphx_glr_plot_performance_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 141-145 In order to see the differences between the two fast methods, we will again compute them with more samples. The large sample sizes used here could not be used with the naive algorithm, as its used memory also grows quadratically. .. GENERATED FROM PYTHON SOURCE LINES 145-171 .. code-block:: Python n_samples_list = [10, 50, 100, 500, 1000, 5000, 10000] avl_times = np.zeros(len(n_samples_list)) mergesort_times = np.zeros(len(n_samples_list)) for i, n_samples in enumerate(n_samples_list): x = np.random.normal(size=n_samples) y = np.random.normal(size=n_samples) def avl(): return dcor.distance_covariance(x, y, method='AVL') def mergesort(): return dcor.distance_covariance(x, y, method='MERGESORT') avl_times[i] = min(repeat(avl, repeat=n_times, number=1)) mergesort_times[i] = min(repeat(mergesort, repeat=n_times, number=1)) plt.title("Distance covariance performance comparison") plt.xlabel("Number of samples") plt.ylabel("Time (seconds)") plt.plot(n_samples_list, avl_times, label="avl") plt.plot(n_samples_list, mergesort_times, label="mergesort") plt.legend() plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_performance_003.png :alt: Distance covariance performance comparison :srcset: /auto_examples/images/sphx_glr_plot_performance_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 172-184 Parallel computation of distance covariance -------------------------------------------- The following code shows the computation of the distance covariance between several random variables, using the :func:`dcor.rowwise` function. If the machine has several CPUs, the time spent using the parallel implementation would be divided by the number of CPUs. If there is only one, there will be no difference. For now, optimized and parallel implementations are only available for the fast AVL method, which is used by default when the operation is between random variables, and not random vectors. .. GENERATED FROM PYTHON SOURCE LINES 184-235 .. code-block:: Python n_times = 100 n_samples = 1000 n_comps_list = [10, 50, 100] naive_times = np.zeros(len(n_comps_list)) cpu_times = np.zeros(len(n_comps_list)) parallel_times = np.zeros(len(n_comps_list)) for i, n_comps in enumerate(n_comps_list): x = np.random.normal(size=(n_comps, n_samples)) y = np.random.normal(size=(n_comps, n_samples)) def naive(): return dcor.rowwise( dcor.distance_covariance_sqr, x, y, rowwise_mode=dcor.RowwiseMode.NAIVE, ) def cpu(): return dcor.rowwise( dcor.distance_covariance_sqr, x, y, compile_mode=dcor.CompileMode.COMPILE_CPU, ) def parallel(): return dcor.rowwise( dcor.distance_covariance_sqr, x, y, compile_mode=dcor.CompileMode.COMPILE_PARALLEL, ) naive_times[i] = min(repeat(naive, repeat=n_times, number=1)) cpu_times[i] = min(repeat(cpu, repeat=n_times, number=1)) parallel_times[i] = min(repeat(parallel, repeat=n_times, number=1)) plt.title("Distance covariance performance comparison") plt.xlabel("Number of computations of distance covariance") plt.ylabel("Time (seconds)") plt.plot(n_comps_list, naive_times, label="naive") plt.plot(n_comps_list, cpu_times, label="cpu") plt.plot(n_comps_list, parallel_times, label="parallel") plt.legend() plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_performance_004.png :alt: Distance covariance performance comparison :srcset: /auto_examples/images/sphx_glr_plot_performance_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 236-239 Bibliography ------------ .. footbibliography:: .. rst-class:: sphx-glr-timing **Total running time of the script:** (3 minutes 36.030 seconds) .. _sphx_glr_download_auto_examples_plot_performance.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/VNMabus/dcor/develop?filepath=examples/plot_performance.py :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_performance.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_performance.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_performance.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_