{ "cells": [ { "cell_type": "markdown", "id": "91f05010-b82d-42f3-b882-4f689eaa946c", "metadata": {}, "source": [ "(migration_guide)=\n", "# ArviZ migration guide\n", "\n", "We have been working on refactoring ArviZ to allow more flexibility and extensibility of its elements\n", "while keeping, as much as possible, a friendly user interface that gives sensible results with little to no arguments.\n", "\n", "One important change is enhanced modularity. Everything will still be available through a common namespace `arviz`,\n", "but ArviZ will now be composed of 3 smaller libraries:\n", "\n", "* [arviz-base](https://arviz-base.readthedocs.io/en/latest/) for data-related functionality, including converters from different PPLs.\n", "* [arviz-stats](https://arviz-stats.readthedocs.io/en/latest/) for statistical functions and diagnostics.\n", "* [arviz-plots](https://arviz-plots.readthedocs.io/en/latest/) for visual checks built on top of arviz-stats and arviz-base.\n", "\n", "Each library depends only on a minimal set of libraries, with a lot of functionality built on top of optional dependencies.\n", "This keeps ArviZ smaller and easier to install, as you can install only the components you really need. The main examples are:\n", "\n", "* `arviz-base` has no I/O library as a dependency, but you can use `netcdf4`, `h5netcdf` or `zarr` to read and write your data, allowing you to install only the one you need.\n", "* `arviz-plots` has no plotting library as a dependency, but it can generate plots with `matplotlib`, `bokeh` or `plotly` if they are installed.\n", "\n", "At the time of writing, the `arviz-xyz` libraries are independent of the `arviz` library, but `arviz` tries to import the `arviz-xyz` libraries\n", "and exposes all their elements through the `arviz.preview` namespace. In the future, with the ArviZ 1.0 release, the `arviz` namespace will look\n", "like `arviz.preview` does today.\n", "\n", "We encourage you to try it out and get a head start on the migration!" 
] }, { "cell_type": "code", "execution_count": 1, "id": "4074f836-233b-4b10-8483-d2177cad7424", "metadata": {}, "outputs": [], "source": [ "import arviz.preview as az\n", "# change to import arviz as az after ArviZ 1.0 release" ] }, { "cell_type": "markdown", "id": "a28ab6bd-d3c1-4f71-981f-3444f39ee249", "metadata": {}, "source": [ "Check that all 3 libraries have been exposed correctly:" ] }, { "cell_type": "code", "execution_count": 2, "id": "7c792e9f-9a22-4ba0-ad11-138fbc510784", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "arviz_base available, exposing its functions as part of arviz.preview\n", "arviz_stats available, exposing its functions as part of arviz.preview\n", "arviz_plots available, exposing its functions as part of arviz.preview\n", "\n" ] } ], "source": [ "print(az.info)" ] }, { "cell_type": "markdown", "id": "dba357f5-cc19-4fe3-918c-6a540723c3e9", "metadata": {}, "source": [ "## `arviz-base`" ] }, { "cell_type": "markdown", "id": "f9f0a3a9", "metadata": {}, "source": [ "### Credible intervals and rcParams\n", "\n", "Some global configuration settings have changed. For example, the default credible interval probability (`ci_prob`) has been updated from 0.94 to 0.89. Using 0.89 produces intervals with lower variability, leading to more stable summaries. At the same time, keeping a non-standard value (rather than 0.90 or 0.95) serves as a friendly reminder that the choice of interval should reflect the problem at hand.\n", "\n", "In addition, a new setting `ci_kind` has been introduced, which defaults to \"eti\" (equal-tailed interval). This controls the method used to compute credible intervals. The alternative is \"hdi\" (highest density interval), which was previously the default.\n", "\n", "Defaults set via `rcParams` are not fixed rules; they’re meant to be adjusted to fit the needs of your analysis. 
`rcParams` offers a convenient way to establish global defaults for your workflow, while most functions that compute credible intervals also provide `ci_prob` and `ci_kind` arguments to override these settings locally.\n", "\n", "\n", "You can check all default settings with:" ] }, { "cell_type": "code", "execution_count": 3, "id": "6e2e42a9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RcParams({'data.http_protocol': 'https',\n", " 'data.index_origin': 0,\n", " 'data.sample_dims': ('chain', 'draw'),\n", " 'data.save_warmup': False,\n", " 'plot.backend': 'matplotlib',\n", " 'plot.density_kind': 'kde',\n", " 'plot.max_subplots': 40,\n", " 'stats.ci_kind': 'eti',\n", " 'stats.ci_prob': 0.89,\n", " 'stats.ic_compare_method': 'stacking',\n", " 'stats.ic_pointwise': True,\n", " 'stats.ic_scale': 'log',\n", " 'stats.module': 'base',\n", " 'stats.point_estimate': 'mean'})" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "az.rcParams" ] }, { "cell_type": "markdown", "id": "b73c5b99-234c-4b5d-ab3e-adfce2fb2edc", "metadata": {}, "source": [ "### `DataTree`\n", "One of the main differences is that the `arviz.InferenceData` object no longer exists.\n", "`arviz-base` uses `xarray.DataTree` instead. This is a new data structure in xarray,\n", "so it might still have some rough edges, but it is much more flexible and powerful.\n", "For example, I/O will now be more flexible, and any format supported by\n", "xarray is automatically available to you, with no need to add wrappers on top of them within ArviZ.\n", "It is also possible to have arbitrary nesting of variables within groups and subgroups.\n", "\n", ":::{important}\n", "Not all the functionality of `xarray.DataTree` will be compatible with ArviZ, as it would be too much\n", "work for us to cover and maintain. 
If there are things you have always wanted to do that\n", "were not possible with `InferenceData` but are now possible with `DataTree`, please try\n", "them out and give feedback on them and on the desired behaviour for things that still don't work.\n", "After a couple of releases the \"ArviZverse\" will stabilize much more and it might not be\n", "possible to add support for them anymore.\n", ":::\n", "\n", "#### I already have an `InferenceData` object from an external library\n", "\n", "`InferenceData` already has a method to convert it to `DataTree`, `idata.to_datatree()`, but you can also use\n", "\n", "```python\n", "az.convert_to_datatree(idata)\n", "```\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "d005abd8", "metadata": {}, "outputs": [], "source": [ "import arviz as arviz_legacy\n", "idata = arviz_legacy.load_arviz_data(\"centered_eight\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "f7c75219", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DatasetView> Size: 0B\n", "Dimensions: ()\n", "Data variables:\n", " *empty*
<xarray.DatasetView> Size: 0B\n", "Dimensions: ()\n", "Data variables:\n", " *empty*" ], "text/plain": [ "
<xarray.DatasetView> Size: 0B\n", "Dimensions: ()\n", "Data variables:\n", " *empty*" ], "text/plain": [ "
| | mu | theta[Choate] | theta[Deerfield] | theta[Phillips Andover] | theta[Phillips Exeter] | theta[Hotchkiss] | theta[Lawrenceville] | theta[St. Paul's] | theta[Mt. Hermon] | tau |
|---|---|---|---|---|---|---|---|---|---|---|
| (0, 0) | 7.871796 | 12.320686 | 9.905367 | 14.951615 | 11.011485 | 5.579602 | 16.901795 | 13.198059 | 15.061366 | 4.725740 |
| (0, 1) | 3.384554 | 11.285623 | 9.129324 | 3.139263 | 9.433211 | 7.811516 | 2.393088 | 10.055223 | 6.176724 | 3.908994 |
| (0, 2) | 9.100476 | 5.708506 | 5.757932 | 10.944585 | 5.895436 | 9.992984 | 8.143327 | 7.604753 | 8.767647 | 4.844025 |
| (0, 3) | 7.304293 | 10.037275 | 8.809068 | 9.900924 | 5.768832 | 9.062876 | 6.958424 | 10.298256 | 3.155304 | 1.856703 |
| (0, 4) | 9.879675 | 9.149146 | 5.764986 | 7.015397 | 15.688710 | 3.097395 | 12.025763 | 11.316745 | 17.046142 | 4.748409 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| (3, 495) | 1.542688 | 3.737751 | 5.393632 | 0.487845 | 4.015486 | 0.717057 | -2.675760 | 0.415968 | -4.991247 | 2.786072 |
| (3, 496) | 1.858580 | -0.291737 | 0.110315 | 1.468877 | -3.653346 | 1.844292 | 6.055714 | 4.986218 | 9.290380 | 4.281961 |
| (3, 497) | 1.766733 | 3.532515 | 2.008901 | 0.510806 | 0.832185 | 2.647687 | 4.707249 | 3.073314 | -2.623069 | 2.740607 |
| (3, 498) | 3.486112 | 4.182751 | 7.554251 | 4.456034 | 3.300833 | 1.563307 | 1.528958 | 1.096098 | 8.452282 | 2.932379 |
| (3, 499) | 3.404464 | 0.192956 | 6.498428 | -0.894424 | 6.849020 | 1.859747 | 7.936460 | 6.762455 | 1.295051 | 4.461246 |

2000 rows × 10 columns
\n", "<xarray.DatasetView> Size: 656B\n", "Dimensions: (school: 8)\n", "Coordinates:\n", " * school (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'\n", "Data variables:\n", " mu float64 8B 1.65e+03\n", " theta_t (school) float64 64B 2.058e+03 2.51e+03 ... 2.455e+03 2.757e+03\n", " tau float64 8B 1.115e+03\n", " theta (school) float64 64B 1.942e+03 2.199e+03 ... 2.079e+03 2.106e+03
<xarray.DatasetView> Size: 96B\n", "Dimensions: (chain: 4)\n", "Coordinates:\n", " * chain (chain) int64 32B 0 1 2 3\n", "Data variables:\n", " theta (chain) float64 32B 1.129e+03 408.2 329.2 580.9\n", " theta_t (chain) float64 32B 499.2 339.0 430.1 1.052e+03
<xarray.DatasetView> Size: 840B\n", "Dimensions: (ci_bound: 2, school: 8)\n", "Coordinates:\n", " * ci_bound (ci_bound) <U5 40B 'lower' 'upper'\n", " * school (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'\n", "Data variables:\n", " mu (ci_bound) float64 16B -1.119 9.318\n", " theta_t (school, ci_bound) float64 128B -1.199 2.192 ... -1.451 1.739\n", " tau (ci_bound) float64 16B 0.004998 7.744\n", " theta (school, ci_bound) float64 128B -2.087 14.15 ... -2.658 13.1
<xarray.DatasetView> Size: 328B\n", "Dimensions: (chain: 4, ci_bound: 2)\n", "Coordinates:\n", " * chain (chain) int64 32B 0 1 2 3\n", " * ci_bound (ci_bound) <U5 40B 'lower' 'upper'\n", "Data variables:\n", " mu (chain, ci_bound) float64 64B -1.611 8.839 ... -0.9635 9.697\n", " theta_t (chain, ci_bound) float64 64B -1.438 1.679 -1.493 ... -1.405 1.669\n", " tau (chain, ci_bound) float64 64B 0.0111 7.34 0.02253 ... 0.1617 8.106\n", " theta (chain, ci_bound) float64 64B -2.477 12.87 -2.538 ... -3.462 12.54" ], "text/plain": [ "
<xarray.DatasetView> Size: 656B\n", "Dimensions: (school: 8)\n", "Coordinates:\n", " * school (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'\n", "Data variables:\n", " mu float64 8B 1.65e+03\n", " theta_t (school) float64 64B 2.058e+03 2.51e+03 ... 2.455e+03 2.757e+03\n", " tau float64 8B 1.115e+03\n", " theta (school) float64 64B 1.942e+03 2.199e+03 ... 2.079e+03 2.106e+03" ], "text/plain": [ "