<h1 id="when-the-spider-man-meme-is-relevant-to-multilevel-models">When the Spider-Man meme is relevant to multilevel models</h1>
<p><em>Ben Lacar, 2022-09-13</em></p>
<p>For a while, I’ve wondered about the different approaches for multilevel modeling, also known as mixed effects modeling. My initial understanding comes from a Bayesian perspective, since I learned about the topic from Statistical Rethinking. But when hearing others talk about “fixed effects”, “varying effects”, “random effects”, and “mixed effects”, I had trouble connecting my own understanding of the concept to theirs. Even more perplexing, I wasn’t sure what the <em>source(s)</em> of the differences were:</p>
<ul>
<li>Is it a frequentist vs. Bayesian thing?</li>
<li>Is it a statistical package thing?</li>
<li>Is it because there are five different definitions of “fixed and random effects”, <a href="https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/">infamously observed by Andrew Gelman</a> and why he avoids using those terms?</li>
</ul>
<p>I decided to take a deep dive to resolve my confusion, with much help from numerous sources. Please check out the <a href="#acknowledgements-and-references">Acknowledgments and references</a> section!</p>
<p>In this post, I’ll be comparing an example of mixed effects modeling across statistical philosophies and across statistical languages. As a bonus, a meme awaits.</p>
<table>
<thead>
<tr>
<th>method</th>
<th>approach</th>
<th>language</th>
<th>package</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>frequentist</td>
<td>R</td>
<td><code class="language-plaintext highlighter-rouge">lme4</code></td>
</tr>
<tr>
<td>2</td>
<td>Bayesian</td>
<td>Python</td>
<td><code class="language-plaintext highlighter-rouge">pymc</code></td>
</tr>
</tbody>
</table>
<p>Note that the default language in the code blocks is Python. A cell running R is designated with <code class="language-plaintext highlighter-rouge">%%R</code> at the top. A variable shared between the two languages can be passed into R (<code class="language-plaintext highlighter-rouge">-i</code>) or returned back to Python (<code class="language-plaintext highlighter-rouge">-o</code>) on that same line.</p>
<p><em>Special thanks to Patrick Robotham for providing a lot of feedback.</em></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">aesara</span> <span class="kn">import</span> <span class="n">tensor</span> <span class="k">as</span> <span class="n">at</span>
<span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">pymc</span> <span class="k">as</span> <span class="n">pm</span>
<span class="kn">import</span> <span class="nn">xarray</span> <span class="k">as</span> <span class="n">xr</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">rng</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">default_rng</span><span class="p">(</span><span class="mi">1234</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.95</span>
<span class="k">def</span> <span class="nf">standardize</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Enable running of R code
</span><span class="o">%</span><span class="n">load_ext</span> <span class="n">rpy2</span><span class="p">.</span><span class="n">ipython</span>
</code></pre></div></div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%%</span><span class="n">R</span><span class="w">
</span><span class="n">suppressMessages</span><span class="p">(</span><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">))</span><span class="w">
</span><span class="n">suppressMessages</span><span class="p">(</span><span class="n">library</span><span class="p">(</span><span class="n">lme4</span><span class="p">))</span><span class="w">
</span><span class="n">suppressMessages</span><span class="p">(</span><span class="n">library</span><span class="p">(</span><span class="n">arm</span><span class="p">))</span><span class="w">
</span><span class="n">suppressMessages</span><span class="p">(</span><span class="n">library</span><span class="p">(</span><span class="n">merTools</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
<h1 id="create-synthetic-cafe-dataset">Create synthetic <code class="language-plaintext highlighter-rouge">cafe</code> dataset</h1>
<p>The dataset I am using is created from a scenario described in Statistical Rethinking.</p>
<p>Here are a few more details of the dataset from Dr. McElreath’s book:</p>
<blockquote>
<p>Begin by defining the population of cafés that the robot might visit. This means we’ll define the average wait time in the morning and the afternoon, as well as the correlation between them. These numbers are sufficient to define the average properties of the cafés. Let’s define these properties, then we’ll sample cafés from them.</p>
</blockquote>
<p>Nearly all Python code is taken from the <a href="https://github.com/pymc-devs/pymc-resources/blob/main/Rethinking_2/Chp_14.ipynb">Statistical Rethinking pymc repo</a> with some minor alterations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a</span> <span class="o">=</span> <span class="mf">3.5</span> <span class="c1"># average morning wait time
</span><span class="n">b</span> <span class="o">=</span> <span class="o">-</span><span class="mf">1.0</span> <span class="c1"># average difference afternoon wait time
</span><span class="n">sigma_a</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="c1"># std dev in intercepts
</span><span class="n">sigma_b</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="c1"># std dev in slopes
</span><span class="n">rho</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.7</span> <span class="c1"># correlation between intercepts and slopes
</span>
<span class="n">Mu</span> <span class="o">=</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">]</span>
<span class="n">sigmas</span> <span class="o">=</span> <span class="p">[</span><span class="n">sigma_a</span><span class="p">,</span> <span class="n">sigma_b</span><span class="p">]</span>
<span class="n">Rho</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">matrix</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="n">rho</span><span class="p">],</span> <span class="p">[</span><span class="n">rho</span><span class="p">,</span> <span class="mi">1</span><span class="p">]])</span>
<span class="n">Sigma</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">sigmas</span><span class="p">)</span> <span class="o">*</span> <span class="n">Rho</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">sigmas</span><span class="p">)</span> <span class="c1"># covariance matrix
</span>
<span class="n">N_cafes</span> <span class="o">=</span> <span class="mi">20</span>
<span class="n">vary_effects</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">multivariate_normal</span><span class="p">(</span><span class="n">mean</span><span class="o">=</span><span class="n">Mu</span><span class="p">,</span> <span class="n">cov</span><span class="o">=</span><span class="n">Sigma</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">N_cafes</span><span class="p">)</span>
<span class="n">a_cafe</span> <span class="o">=</span> <span class="n">vary_effects</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
<span class="n">b_cafe</span> <span class="o">=</span> <span class="n">vary_effects</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>
</code></pre></div></div>
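Since this covariance structure drives everything downstream, it's worth a quick sanity check. Below is a self-contained sketch (re-declaring the same population parameters, and using a large number of draws purely for illustration) confirming that intercepts and slopes sampled this way carry the intended correlation:

```python
import numpy as np

# Re-create the population parameters from above
a, b = 3.5, -1.0             # average morning wait, average afternoon difference
sigma_a, sigma_b = 1.0, 0.5  # std dev in intercepts and slopes
rho = -0.7                   # correlation between intercepts and slopes

Mu = np.array([a, b])
Rho = np.array([[1, rho], [rho, 1]])
Sigma = np.diag([sigma_a, sigma_b]) @ Rho @ np.diag([sigma_a, sigma_b])

# With many simulated "cafes", the empirical correlation between the sampled
# intercepts and slopes should approach rho
rng = np.random.default_rng(1234)
draws = rng.multivariate_normal(mean=Mu, cov=Sigma, size=100_000)
corr = np.corrcoef(draws[:, 0], draws[:, 1])[0, 1]
print(corr)  # close to -0.7
```

With only 20 cafes, the empirical correlation will of course be much noisier, which is part of why estimating it well requires the pooling machinery discussed later.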
<p>Now simulate the observations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">N_visits</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">afternoon</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">tile</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">N_visits</span> <span class="o">*</span> <span class="n">N_cafes</span> <span class="o">//</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">cafe_id</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">repeat</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">N_cafes</span><span class="p">),</span> <span class="n">N_visits</span><span class="p">)</span>
<span class="n">mu</span> <span class="o">=</span> <span class="n">a_cafe</span><span class="p">[</span><span class="n">cafe_id</span><span class="p">]</span> <span class="o">+</span> <span class="n">b_cafe</span><span class="p">[</span><span class="n">cafe_id</span><span class="p">]</span> <span class="o">*</span> <span class="n">afternoon</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="c1"># std dev within cafes
</span>
<span class="n">wait</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="n">mu</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">N_visits</span> <span class="o">*</span> <span class="n">N_cafes</span><span class="p">)</span>
<span class="n">df_cafes</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">dict</span><span class="p">(</span><span class="n">cafe</span><span class="o">=</span><span class="n">cafe_id</span><span class="p">,</span> <span class="n">afternoon</span><span class="o">=</span><span class="n">afternoon</span><span class="p">,</span> <span class="n">wait</span><span class="o">=</span><span class="n">wait</span><span class="p">))</span>
</code></pre></div></div>
<p>To get a sense of the data structure we just created, let’s take a look at the first and last 5 rows.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_cafes</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>cafe</th>
<th>afternoon</th>
<th>wait</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0</td>
<td>0</td>
<td>2.724888</td>
</tr>
<tr>
<th>1</th>
<td>0</td>
<td>1</td>
<td>1.951626</td>
</tr>
<tr>
<th>2</th>
<td>0</td>
<td>0</td>
<td>2.488389</td>
</tr>
<tr>
<th>3</th>
<td>0</td>
<td>1</td>
<td>1.188077</td>
</tr>
<tr>
<th>4</th>
<td>0</td>
<td>0</td>
<td>2.026425</td>
</tr>
</tbody>
</table>
</div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_cafes</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>cafe</th>
<th>afternoon</th>
<th>wait</th>
</tr>
</thead>
<tbody>
<tr>
<th>195</th>
<td>19</td>
<td>1</td>
<td>3.394933</td>
</tr>
<tr>
<th>196</th>
<td>19</td>
<td>0</td>
<td>4.544430</td>
</tr>
<tr>
<th>197</th>
<td>19</td>
<td>1</td>
<td>2.719524</td>
</tr>
<tr>
<th>198</th>
<td>19</td>
<td>0</td>
<td>3.379111</td>
</tr>
<tr>
<th>199</th>
<td>19</td>
<td>1</td>
<td>2.459750</td>
</tr>
</tbody>
</table>
</div>
<p>Note that this dataset is balanced, meaning that each group (cafe) has the same number of observations. Mixed effects / multilevel models especially shine with unbalanced data, where they can leverage partial pooling.</p>
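To make that balance concrete, here is a small self-contained check (re-building just the index arrays from the simulation above) that every cafe contributes the same number of visits, split evenly between morning and afternoon:

```python
import numpy as np
import pandas as pd

N_cafes, N_visits = 20, 10
afternoon = np.tile([0, 1], N_visits * N_cafes // 2)
cafe_id = np.repeat(np.arange(N_cafes), N_visits)
df = pd.DataFrame(dict(cafe=cafe_id, afternoon=afternoon))

visits_per_cafe = df.groupby("cafe").size()          # 10 visits for every cafe
afternoon_share = df.groupby("cafe")["afternoon"].mean()  # half of visits are afternoon
print(visits_per_cafe.unique(), afternoon_share.unique())  # [10] [0.5]
```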
<h1 id="visualize-data">Visualize data</h1>
<p>Let’s plot the raw data and see how the afternoon influences wait time. Instead of plotting the cafes in their arbitrary order (0 to 19), I’ll order them by increasing average morning wait time so that we can appreciate the variability across the dataset.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%%</span><span class="n">R</span><span class="w"> </span><span class="o">-</span><span class="n">i</span><span class="w"> </span><span class="n">df_cafes</span><span class="w">
</span><span class="c1"># credit to TJ Mahr for a template of this code</span><span class="w">
</span><span class="n">xlab</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"Afternoon"</span><span class="w">
</span><span class="n">ylab</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"Wait time"</span><span class="w">
</span><span class="n">titlelab</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"Wait times for each cafe (ordered by increasing average time)"</span><span class="w">
</span><span class="c1"># order by increasing average morning wait time (intercept only)</span><span class="w">
</span><span class="n">cafe_ordered_by_avgwaittime</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">df_cafes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="n">afternoon</span><span class="o">==</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">cafe</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">summarize</span><span class="p">(</span><span class="n">mean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mean</span><span class="p">(</span><span class="n">wait</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">arrange</span><span class="p">(</span><span class="n">mean</span><span class="p">)</span><span class="w">
</span><span class="c1"># Turn the cafe column from a numeric into a factor with a certain order</span><span class="w">
</span><span class="n">df_cafes</span><span class="o">$</span><span class="n">cafe</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">df_cafes</span><span class="o">$</span><span class="n">cafe</span><span class="p">,</span><span class="w"> </span><span class="n">levels</span><span class="o">=</span><span class="n">cafe_ordered_by_avgwaittime</span><span class="o">$</span><span class="n">cafe</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">df_cafes</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">afternoon</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wait</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_boxplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="n">factor</span><span class="p">(</span><span class="n">afternoon</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">stat_summary</span><span class="p">(</span><span class="n">fun</span><span class="o">=</span><span class="s2">"mean"</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="o">=</span><span class="s2">"line"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="s2">"cafe"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xlab</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ylab</span><span class="p">,</span><span class="w"> </span><span class="n">title</span><span class="o">=</span><span class="n">titlelab</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/assets/2022-09-13-mixed_effects_freqvsbayes_cafes_files/2022-09-13-mixed_effects_freqvsbayes_cafes_15_0.png" alt="png" /></p>
<p>One pattern is that as the morning wait time (i.e. the intercept) increases, the drop in wait time in the afternoon (the slope) gets steeper. In other words, when we simulated this dataset, we included a <em>covariance</em> structure between the intercept and slope. When we develop an inferential model with this data, we want to be able to recover this covariance.</p>
<h1 id="definitions-of-mixed-effects-modeling">Definitions of mixed effects modeling</h1>
<h2 id="equation-set-1-both-fixed-and-random-effects-terms-in-linear-model">Equation set 1: both fixed and random effects terms in linear model</h2>
<p><a href="https://link.springer.com/book/10.1007/978-1-4614-3900-4">Galecki and Burzykowski</a>, <a href="https://en.wikipedia.org/wiki/Mixed_model">Wikipedia</a>, and <a href="https://stats.oarc.ucla.edu/other/mult-pkg/introduction-to-linear-mixed-models/">this page from UCLA</a> all describe a linear mixed model with an equation similar to equation 1 below.</p>
<p><em>I rely heavily on the UCLA page since it is the one that helped me the most. In fact, if you don’t care about how it connects to the Bayesian approach, stop reading this and check that out instead!</em></p>
<p>In contrast to the Bayesian set of equations we’ll see later, the fixed effects and random effects appear in the same equation here.</p>
\[\textbf{y} = \textbf{X} \boldsymbol{\beta} + \textbf{Z} \textbf{u} + \boldsymbol{\epsilon} \tag{1}\]
<p>The left side of the equation, $\textbf{y}$, represents all of our observations (the wait times in the cafe example). The $\boldsymbol{\beta}$ in the first term represents a vector of coefficients shared across the population of cafes. These are the fixed effects. The $\textbf{u}$ in the second term represents a vector of coefficients for <em>each individual cafe</em>. These are the random effects. Both $\textbf{X}$ and $\textbf{Z}$ are design matrices of covariates. Finally, there’s a residual error term $\boldsymbol{\epsilon}$.</p>
<p>When relating this equation back to the cafe dataset we just created, I needed to dig into how the terms represent an individual observation versus the group (cafe) level. Doing a dimensional analysis helped.</p>
<table>
<thead>
<tr>
<th>Equation 1 variable</th>
<th>Dimensions</th>
<th>Effects type</th>
<th>Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\textbf{y}$</td>
<td>200 x 1</td>
<td>n/a</td>
<td>This vector represents the wait time for all 200 observations. I’ll refer to this as $w_i$ later in equation 2.</td>
</tr>
<tr>
<td>$\textbf{X}$</td>
<td>200 x 2</td>
<td>associated with fixed</td>
<td>The first column of each observation is 1 since it is multiplied by the intercept term. The second column is $A$, which will be 0 or 1 for <code class="language-plaintext highlighter-rouge">afternoon</code>.</td>
</tr>
<tr>
<td>$\boldsymbol{\beta}$</td>
<td>2 x 1</td>
<td>fixed</td>
<td>The two elements in the $\boldsymbol{\beta}$ (bold font beta) are what I’ll refer to as the intercept $\alpha$ and the slope $\beta$ (unbolded beta) across all cafes in equation 2.</td>
</tr>
<tr>
<td>$\textbf{Z}$</td>
<td>200 x (2x20)</td>
<td>associated with random</td>
<td>The first 20 columns represent intercepts for each cafe and the second 20 represent the covariate (<code class="language-plaintext highlighter-rouge">afternoon</code>). See visual below.</td>
</tr>
<tr>
<td>$\textbf{u}$</td>
<td>(2x20) x 1</td>
<td>random</td>
<td>$\textbf{u}$ holds each of the 20 cafes’ intercept and slope terms, $a_\text{cafe}$ and $b_\text{cafe}$, expressed as deviations from the population-level $\boldsymbol{\beta}$. There’s an implied correlation structure between them.</td>
</tr>
<tr>
<td>$\boldsymbol{\epsilon}$</td>
<td>200 x 1</td>
<td>n/a</td>
<td>Normally distributed residual error.</td>
</tr>
</tbody>
</table>
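As a sanity check on these dimensions, here is a self-contained sketch that builds $\textbf{X}$, $\boldsymbol{\beta}$, $\textbf{Z}$, and $\textbf{u}$ with the shapes in the table and confirms that $\textbf{X}\boldsymbol{\beta} + \textbf{Z}\textbf{u} + \boldsymbol{\epsilon}$ yields one value per observation. The per-cafe deviations in <code class="language-plaintext highlighter-rouge">u</code> are drawn arbitrarily here, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N_cafes, N_visits = 20, 10
N = N_cafes * N_visits  # 200 observations

afternoon = np.tile([0, 1], N // 2)
cafe_id = np.repeat(np.arange(N_cafes), N_visits)

# Fixed-effects design matrix X: a column of 1s (intercept) and the afternoon indicator
X = np.column_stack([np.ones(N), afternoon])  # 200 x 2
beta = np.array([3.5, -1.0])                  # population intercept and slope

# Random-effects design matrix Z: an intercept column and an afternoon column per cafe
Z = np.zeros((N, 2 * N_cafes))                # 200 x 40
Z[np.arange(N), cafe_id] = 1
Z[np.arange(N), N_cafes + cafe_id] = afternoon

u = rng.normal(0.0, 0.5, size=2 * N_cafes)    # illustrative per-cafe deviations, 40 x 1
eps = rng.normal(0.0, 0.5, size=N)            # residual error, 200 x 1

y = X @ beta + Z @ u + eps
print(X.shape, Z.shape, u.shape, y.shape)  # (200, 2) (200, 40) (40,) (200,)
```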
<p>To better understand what $\textbf{Z}$ looks like, we can create an alternate representation of <code class="language-plaintext highlighter-rouge">df_cafes</code>. Each row of the matrix $\textbf{Z}$ corresponds to an individual observation. The first 20 columns are the intercept indicators for the 20 cafes (column 1 is cafe 1, column 2 is cafe 2, etc.). These columns all contain a 0 <em>except</em> for the column of the cafe the observation belongs to, which contains a 1. The next 20 columns (columns 21-40) represent <code class="language-plaintext highlighter-rouge">afternoon</code>. This second group of columns is all 0 <em>except</em> for the column of the cafe the observation belongs to, <em>and</em> only when the observation is an afternoon observation.</p>
<p>To be clear, the structure of <code class="language-plaintext highlighter-rouge">df_cafes</code>, where each row is an observation with the cafe, afternoon status, and wait time, is already in a form to be understood by the <code class="language-plaintext highlighter-rouge">lmer</code> and <code class="language-plaintext highlighter-rouge">pymc</code> packages. What I’m showing below is to help understand what the matrix $\textbf{Z}$ looks like in the above equations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Z</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">200</span><span class="p">,</span> <span class="mi">40</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">df_cafes</span><span class="p">.</span><span class="n">index</span><span class="p">:</span>
    <span class="n">cafe</span> <span class="o">=</span> <span class="n">df_cafes</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="s">'cafe'</span><span class="p">]</span>
    <span class="n">is_afternoon</span> <span class="o">=</span> <span class="n">df_cafes</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="s">'afternoon'</span><span class="p">]</span>  <span class="c1"># avoid clobbering the global afternoon array</span>
    <span class="n">Z</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">cafe</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="n">Z</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="mi">20</span> <span class="o">+</span> <span class="n">cafe</span><span class="p">]</span> <span class="o">=</span> <span class="n">is_afternoon</span>
</code></pre></div></div>
<p>We can take a look at the first 12 rows of Z. The first 10 rows are for the first cafe, and its observations alternate between morning and afternoon, hence the alternating values in column 20. I included the first two rows of the second cafe to show how the <code class="language-plaintext highlighter-rouge">1</code> shifts over a column after the first 10 rows. I’ll use <code class="language-plaintext highlighter-rouge">pandas</code> to better display the values.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pd</span><span class="p">.</span><span class="n">set_option</span><span class="p">(</span><span class="s">'display.max_columns'</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="p">(</span>
<span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">Z</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">12</span><span class="p">,</span> <span class="p">:])</span>
<span class="p">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<span class="p">.</span><span class="n">style</span>
<span class="p">.</span><span class="n">highlight_max</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">props</span><span class="o">=</span><span class="s">'color:navy; background-color:yellow;'</span><span class="p">)</span>
<span class="p">.</span><span class="n">highlight_min</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">props</span><span class="o">=</span><span class="s">'color:white; background-color:#3E0B51;'</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>
<style type="text/css">
#T_7cfc2_row0_col0, #T_7cfc2_row1_col0, #T_7cfc2_row1_col20, #T_7cfc2_row2_col0, #T_7cfc2_row3_col0, #T_7cfc2_row3_col20, #T_7cfc2_row4_col0, #T_7cfc2_row5_col0, #T_7cfc2_row5_col20, #T_7cfc2_row6_col0, #T_7cfc2_row7_col0, #T_7cfc2_row7_col20, #T_7cfc2_row8_col0, #T_7cfc2_row9_col0, #T_7cfc2_row9_col20, #T_7cfc2_row10_col1, #T_7cfc2_row11_col1, #T_7cfc2_row11_col21 {
color: navy;
background-color: yellow;
}
#T_7cfc2_row0_col1, #T_7cfc2_row0_col2, #T_7cfc2_row0_col3, #T_7cfc2_row0_col4, #T_7cfc2_row0_col5, #T_7cfc2_row0_col6, #T_7cfc2_row0_col7, #T_7cfc2_row0_col8, #T_7cfc2_row0_col9, #T_7cfc2_row0_col10, #T_7cfc2_row0_col11, #T_7cfc2_row0_col12, #T_7cfc2_row0_col13, #T_7cfc2_row0_col14, #T_7cfc2_row0_col15, #T_7cfc2_row0_col16, #T_7cfc2_row0_col17, #T_7cfc2_row0_col18, #T_7cfc2_row0_col19, #T_7cfc2_row0_col20, #T_7cfc2_row0_col21, #T_7cfc2_row0_col22, #T_7cfc2_row0_col23, #T_7cfc2_row0_col24, #T_7cfc2_row0_col25, #T_7cfc2_row0_col26, #T_7cfc2_row0_col27, #T_7cfc2_row0_col28, #T_7cfc2_row0_col29, #T_7cfc2_row0_col30, #T_7cfc2_row0_col31, #T_7cfc2_row0_col32, #T_7cfc2_row0_col33, #T_7cfc2_row0_col34, #T_7cfc2_row0_col35, #T_7cfc2_row0_col36, #T_7cfc2_row0_col37, #T_7cfc2_row0_col38, #T_7cfc2_row0_col39, #T_7cfc2_row1_col1, #T_7cfc2_row1_col2, #T_7cfc2_row1_col3, #T_7cfc2_row1_col4, #T_7cfc2_row1_col5, #T_7cfc2_row1_col6, #T_7cfc2_row1_col7, #T_7cfc2_row1_col8, #T_7cfc2_row1_col9, #T_7cfc2_row1_col10, #T_7cfc2_row1_col11, #T_7cfc2_row1_col12, #T_7cfc2_row1_col13, #T_7cfc2_row1_col14, #T_7cfc2_row1_col15, #T_7cfc2_row1_col16, #T_7cfc2_row1_col17, #T_7cfc2_row1_col18, #T_7cfc2_row1_col19, #T_7cfc2_row1_col21, #T_7cfc2_row1_col22, #T_7cfc2_row1_col23, #T_7cfc2_row1_col24, #T_7cfc2_row1_col25, #T_7cfc2_row1_col26, #T_7cfc2_row1_col27, #T_7cfc2_row1_col28, #T_7cfc2_row1_col29, #T_7cfc2_row1_col30, #T_7cfc2_row1_col31, #T_7cfc2_row1_col32, #T_7cfc2_row1_col33, #T_7cfc2_row1_col34, #T_7cfc2_row1_col35, #T_7cfc2_row1_col36, #T_7cfc2_row1_col37, #T_7cfc2_row1_col38, #T_7cfc2_row1_col39, #T_7cfc2_row2_col1, #T_7cfc2_row2_col2, #T_7cfc2_row2_col3, #T_7cfc2_row2_col4, #T_7cfc2_row2_col5, #T_7cfc2_row2_col6, #T_7cfc2_row2_col7, #T_7cfc2_row2_col8, #T_7cfc2_row2_col9, #T_7cfc2_row2_col10, #T_7cfc2_row2_col11, #T_7cfc2_row2_col12, #T_7cfc2_row2_col13, #T_7cfc2_row2_col14, #T_7cfc2_row2_col15, #T_7cfc2_row2_col16, #T_7cfc2_row2_col17, #T_7cfc2_row2_col18, #T_7cfc2_row2_col19, 
#T_7cfc2_row2_col20, #T_7cfc2_row2_col21, #T_7cfc2_row2_col22, #T_7cfc2_row2_col23, #T_7cfc2_row2_col24, #T_7cfc2_row2_col25, #T_7cfc2_row2_col26, #T_7cfc2_row2_col27, #T_7cfc2_row2_col28, #T_7cfc2_row2_col29, #T_7cfc2_row2_col30, #T_7cfc2_row2_col31, #T_7cfc2_row2_col32, #T_7cfc2_row2_col33, #T_7cfc2_row2_col34, #T_7cfc2_row2_col35, #T_7cfc2_row2_col36, #T_7cfc2_row2_col37, #T_7cfc2_row2_col38, #T_7cfc2_row2_col39, #T_7cfc2_row3_col1, #T_7cfc2_row3_col2, #T_7cfc2_row3_col3, #T_7cfc2_row3_col4, #T_7cfc2_row3_col5, #T_7cfc2_row3_col6, #T_7cfc2_row3_col7, #T_7cfc2_row3_col8, #T_7cfc2_row3_col9, #T_7cfc2_row3_col10, #T_7cfc2_row3_col11, #T_7cfc2_row3_col12, #T_7cfc2_row3_col13, #T_7cfc2_row3_col14, #T_7cfc2_row3_col15, #T_7cfc2_row3_col16, #T_7cfc2_row3_col17, #T_7cfc2_row3_col18, #T_7cfc2_row3_col19, #T_7cfc2_row3_col21, #T_7cfc2_row3_col22, #T_7cfc2_row3_col23, #T_7cfc2_row3_col24, #T_7cfc2_row3_col25, #T_7cfc2_row3_col26, #T_7cfc2_row3_col27, #T_7cfc2_row3_col28, #T_7cfc2_row3_col29, #T_7cfc2_row3_col30, #T_7cfc2_row3_col31, #T_7cfc2_row3_col32, #T_7cfc2_row3_col33, #T_7cfc2_row3_col34, #T_7cfc2_row3_col35, #T_7cfc2_row3_col36, #T_7cfc2_row3_col37, #T_7cfc2_row3_col38, #T_7cfc2_row3_col39, #T_7cfc2_row4_col1, #T_7cfc2_row4_col2, #T_7cfc2_row4_col3, #T_7cfc2_row4_col4, #T_7cfc2_row4_col5, #T_7cfc2_row4_col6, #T_7cfc2_row4_col7, #T_7cfc2_row4_col8, #T_7cfc2_row4_col9, #T_7cfc2_row4_col10, #T_7cfc2_row4_col11, #T_7cfc2_row4_col12, #T_7cfc2_row4_col13, #T_7cfc2_row4_col14, #T_7cfc2_row4_col15, #T_7cfc2_row4_col16, #T_7cfc2_row4_col17, #T_7cfc2_row4_col18, #T_7cfc2_row4_col19, #T_7cfc2_row4_col20, #T_7cfc2_row4_col21, #T_7cfc2_row4_col22, #T_7cfc2_row4_col23, #T_7cfc2_row4_col24, #T_7cfc2_row4_col25, #T_7cfc2_row4_col26, #T_7cfc2_row4_col27, #T_7cfc2_row4_col28, #T_7cfc2_row4_col29, #T_7cfc2_row4_col30, #T_7cfc2_row4_col31, #T_7cfc2_row4_col32, #T_7cfc2_row4_col33, #T_7cfc2_row4_col34, #T_7cfc2_row4_col35, #T_7cfc2_row4_col36, #T_7cfc2_row4_col37, #T_7cfc2_row4_col38, 
#T_7cfc2_row4_col39, #T_7cfc2_row5_col1, #T_7cfc2_row5_col2, #T_7cfc2_row5_col3, #T_7cfc2_row5_col4, #T_7cfc2_row5_col5, #T_7cfc2_row5_col6, #T_7cfc2_row5_col7, #T_7cfc2_row5_col8, #T_7cfc2_row5_col9, #T_7cfc2_row5_col10, #T_7cfc2_row5_col11, #T_7cfc2_row5_col12, #T_7cfc2_row5_col13, #T_7cfc2_row5_col14, #T_7cfc2_row5_col15, #T_7cfc2_row5_col16, #T_7cfc2_row5_col17, #T_7cfc2_row5_col18, #T_7cfc2_row5_col19, #T_7cfc2_row5_col21, #T_7cfc2_row5_col22, #T_7cfc2_row5_col23, #T_7cfc2_row5_col24, #T_7cfc2_row5_col25, #T_7cfc2_row5_col26, #T_7cfc2_row5_col27, #T_7cfc2_row5_col28, #T_7cfc2_row5_col29, #T_7cfc2_row5_col30, #T_7cfc2_row5_col31, #T_7cfc2_row5_col32, #T_7cfc2_row5_col33, #T_7cfc2_row5_col34, #T_7cfc2_row5_col35, #T_7cfc2_row5_col36, #T_7cfc2_row5_col37, #T_7cfc2_row5_col38, #T_7cfc2_row5_col39, #T_7cfc2_row6_col1, #T_7cfc2_row6_col2, #T_7cfc2_row6_col3, #T_7cfc2_row6_col4, #T_7cfc2_row6_col5, #T_7cfc2_row6_col6, #T_7cfc2_row6_col7, #T_7cfc2_row6_col8, #T_7cfc2_row6_col9, #T_7cfc2_row6_col10, #T_7cfc2_row6_col11, #T_7cfc2_row6_col12, #T_7cfc2_row6_col13, #T_7cfc2_row6_col14, #T_7cfc2_row6_col15, #T_7cfc2_row6_col16, #T_7cfc2_row6_col17, #T_7cfc2_row6_col18, #T_7cfc2_row6_col19, #T_7cfc2_row6_col20, #T_7cfc2_row6_col21, #T_7cfc2_row6_col22, #T_7cfc2_row6_col23, #T_7cfc2_row6_col24, #T_7cfc2_row6_col25, #T_7cfc2_row6_col26, #T_7cfc2_row6_col27, #T_7cfc2_row6_col28, #T_7cfc2_row6_col29, #T_7cfc2_row6_col30, #T_7cfc2_row6_col31, #T_7cfc2_row6_col32, #T_7cfc2_row6_col33, #T_7cfc2_row6_col34, #T_7cfc2_row6_col35, #T_7cfc2_row6_col36, #T_7cfc2_row6_col37, #T_7cfc2_row6_col38, #T_7cfc2_row6_col39, #T_7cfc2_row7_col1, #T_7cfc2_row7_col2, #T_7cfc2_row7_col3, #T_7cfc2_row7_col4, #T_7cfc2_row7_col5, #T_7cfc2_row7_col6, #T_7cfc2_row7_col7, #T_7cfc2_row7_col8, #T_7cfc2_row7_col9, #T_7cfc2_row7_col10, #T_7cfc2_row7_col11, #T_7cfc2_row7_col12, #T_7cfc2_row7_col13, #T_7cfc2_row7_col14, #T_7cfc2_row7_col15, #T_7cfc2_row7_col16, #T_7cfc2_row7_col17, #T_7cfc2_row7_col18, 
#T_7cfc2_row7_col19, #T_7cfc2_row7_col21, #T_7cfc2_row7_col22, #T_7cfc2_row7_col23, #T_7cfc2_row7_col24, #T_7cfc2_row7_col25, #T_7cfc2_row7_col26, #T_7cfc2_row7_col27, #T_7cfc2_row7_col28, #T_7cfc2_row7_col29, #T_7cfc2_row7_col30, #T_7cfc2_row7_col31, #T_7cfc2_row7_col32, #T_7cfc2_row7_col33, #T_7cfc2_row7_col34, #T_7cfc2_row7_col35, #T_7cfc2_row7_col36, #T_7cfc2_row7_col37, #T_7cfc2_row7_col38, #T_7cfc2_row7_col39, #T_7cfc2_row8_col1, #T_7cfc2_row8_col2, #T_7cfc2_row8_col3, #T_7cfc2_row8_col4, #T_7cfc2_row8_col5, #T_7cfc2_row8_col6, #T_7cfc2_row8_col7, #T_7cfc2_row8_col8, #T_7cfc2_row8_col9, #T_7cfc2_row8_col10, #T_7cfc2_row8_col11, #T_7cfc2_row8_col12, #T_7cfc2_row8_col13, #T_7cfc2_row8_col14, #T_7cfc2_row8_col15, #T_7cfc2_row8_col16, #T_7cfc2_row8_col17, #T_7cfc2_row8_col18, #T_7cfc2_row8_col19, #T_7cfc2_row8_col20, #T_7cfc2_row8_col21, #T_7cfc2_row8_col22, #T_7cfc2_row8_col23, #T_7cfc2_row8_col24, #T_7cfc2_row8_col25, #T_7cfc2_row8_col26, #T_7cfc2_row8_col27, #T_7cfc2_row8_col28, #T_7cfc2_row8_col29, #T_7cfc2_row8_col30, #T_7cfc2_row8_col31, #T_7cfc2_row8_col32, #T_7cfc2_row8_col33, #T_7cfc2_row8_col34, #T_7cfc2_row8_col35, #T_7cfc2_row8_col36, #T_7cfc2_row8_col37, #T_7cfc2_row8_col38, #T_7cfc2_row8_col39, #T_7cfc2_row9_col1, #T_7cfc2_row9_col2, #T_7cfc2_row9_col3, #T_7cfc2_row9_col4, #T_7cfc2_row9_col5, #T_7cfc2_row9_col6, #T_7cfc2_row9_col7, #T_7cfc2_row9_col8, #T_7cfc2_row9_col9, #T_7cfc2_row9_col10, #T_7cfc2_row9_col11, #T_7cfc2_row9_col12, #T_7cfc2_row9_col13, #T_7cfc2_row9_col14, #T_7cfc2_row9_col15, #T_7cfc2_row9_col16, #T_7cfc2_row9_col17, #T_7cfc2_row9_col18, #T_7cfc2_row9_col19, #T_7cfc2_row9_col21, #T_7cfc2_row9_col22, #T_7cfc2_row9_col23, #T_7cfc2_row9_col24, #T_7cfc2_row9_col25, #T_7cfc2_row9_col26, #T_7cfc2_row9_col27, #T_7cfc2_row9_col28, #T_7cfc2_row9_col29, #T_7cfc2_row9_col30, #T_7cfc2_row9_col31, #T_7cfc2_row9_col32, #T_7cfc2_row9_col33, #T_7cfc2_row9_col34, #T_7cfc2_row9_col35, #T_7cfc2_row9_col36, #T_7cfc2_row9_col37, #T_7cfc2_row9_col38, 
#T_7cfc2_row9_col39, #T_7cfc2_row10_col0, #T_7cfc2_row10_col2, #T_7cfc2_row10_col3, #T_7cfc2_row10_col4, #T_7cfc2_row10_col5, #T_7cfc2_row10_col6, #T_7cfc2_row10_col7, #T_7cfc2_row10_col8, #T_7cfc2_row10_col9, #T_7cfc2_row10_col10, #T_7cfc2_row10_col11, #T_7cfc2_row10_col12, #T_7cfc2_row10_col13, #T_7cfc2_row10_col14, #T_7cfc2_row10_col15, #T_7cfc2_row10_col16, #T_7cfc2_row10_col17, #T_7cfc2_row10_col18, #T_7cfc2_row10_col19, #T_7cfc2_row10_col20, #T_7cfc2_row10_col21, #T_7cfc2_row10_col22, #T_7cfc2_row10_col23, #T_7cfc2_row10_col24, #T_7cfc2_row10_col25, #T_7cfc2_row10_col26, #T_7cfc2_row10_col27, #T_7cfc2_row10_col28, #T_7cfc2_row10_col29, #T_7cfc2_row10_col30, #T_7cfc2_row10_col31, #T_7cfc2_row10_col32, #T_7cfc2_row10_col33, #T_7cfc2_row10_col34, #T_7cfc2_row10_col35, #T_7cfc2_row10_col36, #T_7cfc2_row10_col37, #T_7cfc2_row10_col38, #T_7cfc2_row10_col39, #T_7cfc2_row11_col0, #T_7cfc2_row11_col2, #T_7cfc2_row11_col3, #T_7cfc2_row11_col4, #T_7cfc2_row11_col5, #T_7cfc2_row11_col6, #T_7cfc2_row11_col7, #T_7cfc2_row11_col8, #T_7cfc2_row11_col9, #T_7cfc2_row11_col10, #T_7cfc2_row11_col11, #T_7cfc2_row11_col12, #T_7cfc2_row11_col13, #T_7cfc2_row11_col14, #T_7cfc2_row11_col15, #T_7cfc2_row11_col16, #T_7cfc2_row11_col17, #T_7cfc2_row11_col18, #T_7cfc2_row11_col19, #T_7cfc2_row11_col20, #T_7cfc2_row11_col22, #T_7cfc2_row11_col23, #T_7cfc2_row11_col24, #T_7cfc2_row11_col25, #T_7cfc2_row11_col26, #T_7cfc2_row11_col27, #T_7cfc2_row11_col28, #T_7cfc2_row11_col29, #T_7cfc2_row11_col30, #T_7cfc2_row11_col31, #T_7cfc2_row11_col32, #T_7cfc2_row11_col33, #T_7cfc2_row11_col34, #T_7cfc2_row11_col35, #T_7cfc2_row11_col36, #T_7cfc2_row11_col37, #T_7cfc2_row11_col38, #T_7cfc2_row11_col39 {
color: white;
background-color: #3E0B51;
}
</style>
<table id="T_7cfc2">
<thead>
<tr>
<th class="blank level0"> </th>
<th id="T_7cfc2_level0_col0" class="col_heading level0 col0">0</th>
<th id="T_7cfc2_level0_col1" class="col_heading level0 col1">1</th>
<th id="T_7cfc2_level0_col2" class="col_heading level0 col2">2</th>
<th id="T_7cfc2_level0_col3" class="col_heading level0 col3">3</th>
<th id="T_7cfc2_level0_col4" class="col_heading level0 col4">4</th>
<th id="T_7cfc2_level0_col5" class="col_heading level0 col5">5</th>
<th id="T_7cfc2_level0_col6" class="col_heading level0 col6">6</th>
<th id="T_7cfc2_level0_col7" class="col_heading level0 col7">7</th>
<th id="T_7cfc2_level0_col8" class="col_heading level0 col8">8</th>
<th id="T_7cfc2_level0_col9" class="col_heading level0 col9">9</th>
<th id="T_7cfc2_level0_col10" class="col_heading level0 col10">10</th>
<th id="T_7cfc2_level0_col11" class="col_heading level0 col11">11</th>
<th id="T_7cfc2_level0_col12" class="col_heading level0 col12">12</th>
<th id="T_7cfc2_level0_col13" class="col_heading level0 col13">13</th>
<th id="T_7cfc2_level0_col14" class="col_heading level0 col14">14</th>
<th id="T_7cfc2_level0_col15" class="col_heading level0 col15">15</th>
<th id="T_7cfc2_level0_col16" class="col_heading level0 col16">16</th>
<th id="T_7cfc2_level0_col17" class="col_heading level0 col17">17</th>
<th id="T_7cfc2_level0_col18" class="col_heading level0 col18">18</th>
<th id="T_7cfc2_level0_col19" class="col_heading level0 col19">19</th>
<th id="T_7cfc2_level0_col20" class="col_heading level0 col20">20</th>
<th id="T_7cfc2_level0_col21" class="col_heading level0 col21">21</th>
<th id="T_7cfc2_level0_col22" class="col_heading level0 col22">22</th>
<th id="T_7cfc2_level0_col23" class="col_heading level0 col23">23</th>
<th id="T_7cfc2_level0_col24" class="col_heading level0 col24">24</th>
<th id="T_7cfc2_level0_col25" class="col_heading level0 col25">25</th>
<th id="T_7cfc2_level0_col26" class="col_heading level0 col26">26</th>
<th id="T_7cfc2_level0_col27" class="col_heading level0 col27">27</th>
<th id="T_7cfc2_level0_col28" class="col_heading level0 col28">28</th>
<th id="T_7cfc2_level0_col29" class="col_heading level0 col29">29</th>
<th id="T_7cfc2_level0_col30" class="col_heading level0 col30">30</th>
<th id="T_7cfc2_level0_col31" class="col_heading level0 col31">31</th>
<th id="T_7cfc2_level0_col32" class="col_heading level0 col32">32</th>
<th id="T_7cfc2_level0_col33" class="col_heading level0 col33">33</th>
<th id="T_7cfc2_level0_col34" class="col_heading level0 col34">34</th>
<th id="T_7cfc2_level0_col35" class="col_heading level0 col35">35</th>
<th id="T_7cfc2_level0_col36" class="col_heading level0 col36">36</th>
<th id="T_7cfc2_level0_col37" class="col_heading level0 col37">37</th>
<th id="T_7cfc2_level0_col38" class="col_heading level0 col38">38</th>
<th id="T_7cfc2_level0_col39" class="col_heading level0 col39">39</th>
</tr>
</thead>
<tbody>
<tr>
<th id="T_7cfc2_level0_row0" class="row_heading level0 row0">0</th>
<td id="T_7cfc2_row0_col0" class="data row0 col0">1</td>
<td id="T_7cfc2_row0_col1" class="data row0 col1">0</td>
<td id="T_7cfc2_row0_col2" class="data row0 col2">0</td>
<td id="T_7cfc2_row0_col3" class="data row0 col3">0</td>
<td id="T_7cfc2_row0_col4" class="data row0 col4">0</td>
<td id="T_7cfc2_row0_col5" class="data row0 col5">0</td>
<td id="T_7cfc2_row0_col6" class="data row0 col6">0</td>
<td id="T_7cfc2_row0_col7" class="data row0 col7">0</td>
<td id="T_7cfc2_row0_col8" class="data row0 col8">0</td>
<td id="T_7cfc2_row0_col9" class="data row0 col9">0</td>
<td id="T_7cfc2_row0_col10" class="data row0 col10">0</td>
<td id="T_7cfc2_row0_col11" class="data row0 col11">0</td>
<td id="T_7cfc2_row0_col12" class="data row0 col12">0</td>
<td id="T_7cfc2_row0_col13" class="data row0 col13">0</td>
<td id="T_7cfc2_row0_col14" class="data row0 col14">0</td>
<td id="T_7cfc2_row0_col15" class="data row0 col15">0</td>
<td id="T_7cfc2_row0_col16" class="data row0 col16">0</td>
<td id="T_7cfc2_row0_col17" class="data row0 col17">0</td>
<td id="T_7cfc2_row0_col18" class="data row0 col18">0</td>
<td id="T_7cfc2_row0_col19" class="data row0 col19">0</td>
<td id="T_7cfc2_row0_col20" class="data row0 col20">0</td>
<td id="T_7cfc2_row0_col21" class="data row0 col21">0</td>
<td id="T_7cfc2_row0_col22" class="data row0 col22">0</td>
<td id="T_7cfc2_row0_col23" class="data row0 col23">0</td>
<td id="T_7cfc2_row0_col24" class="data row0 col24">0</td>
<td id="T_7cfc2_row0_col25" class="data row0 col25">0</td>
<td id="T_7cfc2_row0_col26" class="data row0 col26">0</td>
<td id="T_7cfc2_row0_col27" class="data row0 col27">0</td>
<td id="T_7cfc2_row0_col28" class="data row0 col28">0</td>
<td id="T_7cfc2_row0_col29" class="data row0 col29">0</td>
<td id="T_7cfc2_row0_col30" class="data row0 col30">0</td>
<td id="T_7cfc2_row0_col31" class="data row0 col31">0</td>
<td id="T_7cfc2_row0_col32" class="data row0 col32">0</td>
<td id="T_7cfc2_row0_col33" class="data row0 col33">0</td>
<td id="T_7cfc2_row0_col34" class="data row0 col34">0</td>
<td id="T_7cfc2_row0_col35" class="data row0 col35">0</td>
<td id="T_7cfc2_row0_col36" class="data row0 col36">0</td>
<td id="T_7cfc2_row0_col37" class="data row0 col37">0</td>
<td id="T_7cfc2_row0_col38" class="data row0 col38">0</td>
<td id="T_7cfc2_row0_col39" class="data row0 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row1" class="row_heading level0 row1">1</th>
<td id="T_7cfc2_row1_col0" class="data row1 col0">1</td>
<td id="T_7cfc2_row1_col1" class="data row1 col1">0</td>
<td id="T_7cfc2_row1_col2" class="data row1 col2">0</td>
<td id="T_7cfc2_row1_col3" class="data row1 col3">0</td>
<td id="T_7cfc2_row1_col4" class="data row1 col4">0</td>
<td id="T_7cfc2_row1_col5" class="data row1 col5">0</td>
<td id="T_7cfc2_row1_col6" class="data row1 col6">0</td>
<td id="T_7cfc2_row1_col7" class="data row1 col7">0</td>
<td id="T_7cfc2_row1_col8" class="data row1 col8">0</td>
<td id="T_7cfc2_row1_col9" class="data row1 col9">0</td>
<td id="T_7cfc2_row1_col10" class="data row1 col10">0</td>
<td id="T_7cfc2_row1_col11" class="data row1 col11">0</td>
<td id="T_7cfc2_row1_col12" class="data row1 col12">0</td>
<td id="T_7cfc2_row1_col13" class="data row1 col13">0</td>
<td id="T_7cfc2_row1_col14" class="data row1 col14">0</td>
<td id="T_7cfc2_row1_col15" class="data row1 col15">0</td>
<td id="T_7cfc2_row1_col16" class="data row1 col16">0</td>
<td id="T_7cfc2_row1_col17" class="data row1 col17">0</td>
<td id="T_7cfc2_row1_col18" class="data row1 col18">0</td>
<td id="T_7cfc2_row1_col19" class="data row1 col19">0</td>
<td id="T_7cfc2_row1_col20" class="data row1 col20">1</td>
<td id="T_7cfc2_row1_col21" class="data row1 col21">0</td>
<td id="T_7cfc2_row1_col22" class="data row1 col22">0</td>
<td id="T_7cfc2_row1_col23" class="data row1 col23">0</td>
<td id="T_7cfc2_row1_col24" class="data row1 col24">0</td>
<td id="T_7cfc2_row1_col25" class="data row1 col25">0</td>
<td id="T_7cfc2_row1_col26" class="data row1 col26">0</td>
<td id="T_7cfc2_row1_col27" class="data row1 col27">0</td>
<td id="T_7cfc2_row1_col28" class="data row1 col28">0</td>
<td id="T_7cfc2_row1_col29" class="data row1 col29">0</td>
<td id="T_7cfc2_row1_col30" class="data row1 col30">0</td>
<td id="T_7cfc2_row1_col31" class="data row1 col31">0</td>
<td id="T_7cfc2_row1_col32" class="data row1 col32">0</td>
<td id="T_7cfc2_row1_col33" class="data row1 col33">0</td>
<td id="T_7cfc2_row1_col34" class="data row1 col34">0</td>
<td id="T_7cfc2_row1_col35" class="data row1 col35">0</td>
<td id="T_7cfc2_row1_col36" class="data row1 col36">0</td>
<td id="T_7cfc2_row1_col37" class="data row1 col37">0</td>
<td id="T_7cfc2_row1_col38" class="data row1 col38">0</td>
<td id="T_7cfc2_row1_col39" class="data row1 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row2" class="row_heading level0 row2">2</th>
<td id="T_7cfc2_row2_col0" class="data row2 col0">1</td>
<td id="T_7cfc2_row2_col1" class="data row2 col1">0</td>
<td id="T_7cfc2_row2_col2" class="data row2 col2">0</td>
<td id="T_7cfc2_row2_col3" class="data row2 col3">0</td>
<td id="T_7cfc2_row2_col4" class="data row2 col4">0</td>
<td id="T_7cfc2_row2_col5" class="data row2 col5">0</td>
<td id="T_7cfc2_row2_col6" class="data row2 col6">0</td>
<td id="T_7cfc2_row2_col7" class="data row2 col7">0</td>
<td id="T_7cfc2_row2_col8" class="data row2 col8">0</td>
<td id="T_7cfc2_row2_col9" class="data row2 col9">0</td>
<td id="T_7cfc2_row2_col10" class="data row2 col10">0</td>
<td id="T_7cfc2_row2_col11" class="data row2 col11">0</td>
<td id="T_7cfc2_row2_col12" class="data row2 col12">0</td>
<td id="T_7cfc2_row2_col13" class="data row2 col13">0</td>
<td id="T_7cfc2_row2_col14" class="data row2 col14">0</td>
<td id="T_7cfc2_row2_col15" class="data row2 col15">0</td>
<td id="T_7cfc2_row2_col16" class="data row2 col16">0</td>
<td id="T_7cfc2_row2_col17" class="data row2 col17">0</td>
<td id="T_7cfc2_row2_col18" class="data row2 col18">0</td>
<td id="T_7cfc2_row2_col19" class="data row2 col19">0</td>
<td id="T_7cfc2_row2_col20" class="data row2 col20">0</td>
<td id="T_7cfc2_row2_col21" class="data row2 col21">0</td>
<td id="T_7cfc2_row2_col22" class="data row2 col22">0</td>
<td id="T_7cfc2_row2_col23" class="data row2 col23">0</td>
<td id="T_7cfc2_row2_col24" class="data row2 col24">0</td>
<td id="T_7cfc2_row2_col25" class="data row2 col25">0</td>
<td id="T_7cfc2_row2_col26" class="data row2 col26">0</td>
<td id="T_7cfc2_row2_col27" class="data row2 col27">0</td>
<td id="T_7cfc2_row2_col28" class="data row2 col28">0</td>
<td id="T_7cfc2_row2_col29" class="data row2 col29">0</td>
<td id="T_7cfc2_row2_col30" class="data row2 col30">0</td>
<td id="T_7cfc2_row2_col31" class="data row2 col31">0</td>
<td id="T_7cfc2_row2_col32" class="data row2 col32">0</td>
<td id="T_7cfc2_row2_col33" class="data row2 col33">0</td>
<td id="T_7cfc2_row2_col34" class="data row2 col34">0</td>
<td id="T_7cfc2_row2_col35" class="data row2 col35">0</td>
<td id="T_7cfc2_row2_col36" class="data row2 col36">0</td>
<td id="T_7cfc2_row2_col37" class="data row2 col37">0</td>
<td id="T_7cfc2_row2_col38" class="data row2 col38">0</td>
<td id="T_7cfc2_row2_col39" class="data row2 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row3" class="row_heading level0 row3">3</th>
<td id="T_7cfc2_row3_col0" class="data row3 col0">1</td>
<td id="T_7cfc2_row3_col1" class="data row3 col1">0</td>
<td id="T_7cfc2_row3_col2" class="data row3 col2">0</td>
<td id="T_7cfc2_row3_col3" class="data row3 col3">0</td>
<td id="T_7cfc2_row3_col4" class="data row3 col4">0</td>
<td id="T_7cfc2_row3_col5" class="data row3 col5">0</td>
<td id="T_7cfc2_row3_col6" class="data row3 col6">0</td>
<td id="T_7cfc2_row3_col7" class="data row3 col7">0</td>
<td id="T_7cfc2_row3_col8" class="data row3 col8">0</td>
<td id="T_7cfc2_row3_col9" class="data row3 col9">0</td>
<td id="T_7cfc2_row3_col10" class="data row3 col10">0</td>
<td id="T_7cfc2_row3_col11" class="data row3 col11">0</td>
<td id="T_7cfc2_row3_col12" class="data row3 col12">0</td>
<td id="T_7cfc2_row3_col13" class="data row3 col13">0</td>
<td id="T_7cfc2_row3_col14" class="data row3 col14">0</td>
<td id="T_7cfc2_row3_col15" class="data row3 col15">0</td>
<td id="T_7cfc2_row3_col16" class="data row3 col16">0</td>
<td id="T_7cfc2_row3_col17" class="data row3 col17">0</td>
<td id="T_7cfc2_row3_col18" class="data row3 col18">0</td>
<td id="T_7cfc2_row3_col19" class="data row3 col19">0</td>
<td id="T_7cfc2_row3_col20" class="data row3 col20">1</td>
<td id="T_7cfc2_row3_col21" class="data row3 col21">0</td>
<td id="T_7cfc2_row3_col22" class="data row3 col22">0</td>
<td id="T_7cfc2_row3_col23" class="data row3 col23">0</td>
<td id="T_7cfc2_row3_col24" class="data row3 col24">0</td>
<td id="T_7cfc2_row3_col25" class="data row3 col25">0</td>
<td id="T_7cfc2_row3_col26" class="data row3 col26">0</td>
<td id="T_7cfc2_row3_col27" class="data row3 col27">0</td>
<td id="T_7cfc2_row3_col28" class="data row3 col28">0</td>
<td id="T_7cfc2_row3_col29" class="data row3 col29">0</td>
<td id="T_7cfc2_row3_col30" class="data row3 col30">0</td>
<td id="T_7cfc2_row3_col31" class="data row3 col31">0</td>
<td id="T_7cfc2_row3_col32" class="data row3 col32">0</td>
<td id="T_7cfc2_row3_col33" class="data row3 col33">0</td>
<td id="T_7cfc2_row3_col34" class="data row3 col34">0</td>
<td id="T_7cfc2_row3_col35" class="data row3 col35">0</td>
<td id="T_7cfc2_row3_col36" class="data row3 col36">0</td>
<td id="T_7cfc2_row3_col37" class="data row3 col37">0</td>
<td id="T_7cfc2_row3_col38" class="data row3 col38">0</td>
<td id="T_7cfc2_row3_col39" class="data row3 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row4" class="row_heading level0 row4">4</th>
<td id="T_7cfc2_row4_col0" class="data row4 col0">1</td>
<td id="T_7cfc2_row4_col1" class="data row4 col1">0</td>
<td id="T_7cfc2_row4_col2" class="data row4 col2">0</td>
<td id="T_7cfc2_row4_col3" class="data row4 col3">0</td>
<td id="T_7cfc2_row4_col4" class="data row4 col4">0</td>
<td id="T_7cfc2_row4_col5" class="data row4 col5">0</td>
<td id="T_7cfc2_row4_col6" class="data row4 col6">0</td>
<td id="T_7cfc2_row4_col7" class="data row4 col7">0</td>
<td id="T_7cfc2_row4_col8" class="data row4 col8">0</td>
<td id="T_7cfc2_row4_col9" class="data row4 col9">0</td>
<td id="T_7cfc2_row4_col10" class="data row4 col10">0</td>
<td id="T_7cfc2_row4_col11" class="data row4 col11">0</td>
<td id="T_7cfc2_row4_col12" class="data row4 col12">0</td>
<td id="T_7cfc2_row4_col13" class="data row4 col13">0</td>
<td id="T_7cfc2_row4_col14" class="data row4 col14">0</td>
<td id="T_7cfc2_row4_col15" class="data row4 col15">0</td>
<td id="T_7cfc2_row4_col16" class="data row4 col16">0</td>
<td id="T_7cfc2_row4_col17" class="data row4 col17">0</td>
<td id="T_7cfc2_row4_col18" class="data row4 col18">0</td>
<td id="T_7cfc2_row4_col19" class="data row4 col19">0</td>
<td id="T_7cfc2_row4_col20" class="data row4 col20">0</td>
<td id="T_7cfc2_row4_col21" class="data row4 col21">0</td>
<td id="T_7cfc2_row4_col22" class="data row4 col22">0</td>
<td id="T_7cfc2_row4_col23" class="data row4 col23">0</td>
<td id="T_7cfc2_row4_col24" class="data row4 col24">0</td>
<td id="T_7cfc2_row4_col25" class="data row4 col25">0</td>
<td id="T_7cfc2_row4_col26" class="data row4 col26">0</td>
<td id="T_7cfc2_row4_col27" class="data row4 col27">0</td>
<td id="T_7cfc2_row4_col28" class="data row4 col28">0</td>
<td id="T_7cfc2_row4_col29" class="data row4 col29">0</td>
<td id="T_7cfc2_row4_col30" class="data row4 col30">0</td>
<td id="T_7cfc2_row4_col31" class="data row4 col31">0</td>
<td id="T_7cfc2_row4_col32" class="data row4 col32">0</td>
<td id="T_7cfc2_row4_col33" class="data row4 col33">0</td>
<td id="T_7cfc2_row4_col34" class="data row4 col34">0</td>
<td id="T_7cfc2_row4_col35" class="data row4 col35">0</td>
<td id="T_7cfc2_row4_col36" class="data row4 col36">0</td>
<td id="T_7cfc2_row4_col37" class="data row4 col37">0</td>
<td id="T_7cfc2_row4_col38" class="data row4 col38">0</td>
<td id="T_7cfc2_row4_col39" class="data row4 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row5" class="row_heading level0 row5">5</th>
<td id="T_7cfc2_row5_col0" class="data row5 col0">1</td>
<td id="T_7cfc2_row5_col1" class="data row5 col1">0</td>
<td id="T_7cfc2_row5_col2" class="data row5 col2">0</td>
<td id="T_7cfc2_row5_col3" class="data row5 col3">0</td>
<td id="T_7cfc2_row5_col4" class="data row5 col4">0</td>
<td id="T_7cfc2_row5_col5" class="data row5 col5">0</td>
<td id="T_7cfc2_row5_col6" class="data row5 col6">0</td>
<td id="T_7cfc2_row5_col7" class="data row5 col7">0</td>
<td id="T_7cfc2_row5_col8" class="data row5 col8">0</td>
<td id="T_7cfc2_row5_col9" class="data row5 col9">0</td>
<td id="T_7cfc2_row5_col10" class="data row5 col10">0</td>
<td id="T_7cfc2_row5_col11" class="data row5 col11">0</td>
<td id="T_7cfc2_row5_col12" class="data row5 col12">0</td>
<td id="T_7cfc2_row5_col13" class="data row5 col13">0</td>
<td id="T_7cfc2_row5_col14" class="data row5 col14">0</td>
<td id="T_7cfc2_row5_col15" class="data row5 col15">0</td>
<td id="T_7cfc2_row5_col16" class="data row5 col16">0</td>
<td id="T_7cfc2_row5_col17" class="data row5 col17">0</td>
<td id="T_7cfc2_row5_col18" class="data row5 col18">0</td>
<td id="T_7cfc2_row5_col19" class="data row5 col19">0</td>
<td id="T_7cfc2_row5_col20" class="data row5 col20">1</td>
<td id="T_7cfc2_row5_col21" class="data row5 col21">0</td>
<td id="T_7cfc2_row5_col22" class="data row5 col22">0</td>
<td id="T_7cfc2_row5_col23" class="data row5 col23">0</td>
<td id="T_7cfc2_row5_col24" class="data row5 col24">0</td>
<td id="T_7cfc2_row5_col25" class="data row5 col25">0</td>
<td id="T_7cfc2_row5_col26" class="data row5 col26">0</td>
<td id="T_7cfc2_row5_col27" class="data row5 col27">0</td>
<td id="T_7cfc2_row5_col28" class="data row5 col28">0</td>
<td id="T_7cfc2_row5_col29" class="data row5 col29">0</td>
<td id="T_7cfc2_row5_col30" class="data row5 col30">0</td>
<td id="T_7cfc2_row5_col31" class="data row5 col31">0</td>
<td id="T_7cfc2_row5_col32" class="data row5 col32">0</td>
<td id="T_7cfc2_row5_col33" class="data row5 col33">0</td>
<td id="T_7cfc2_row5_col34" class="data row5 col34">0</td>
<td id="T_7cfc2_row5_col35" class="data row5 col35">0</td>
<td id="T_7cfc2_row5_col36" class="data row5 col36">0</td>
<td id="T_7cfc2_row5_col37" class="data row5 col37">0</td>
<td id="T_7cfc2_row5_col38" class="data row5 col38">0</td>
<td id="T_7cfc2_row5_col39" class="data row5 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row6" class="row_heading level0 row6">6</th>
<td id="T_7cfc2_row6_col0" class="data row6 col0">1</td>
<td id="T_7cfc2_row6_col1" class="data row6 col1">0</td>
<td id="T_7cfc2_row6_col2" class="data row6 col2">0</td>
<td id="T_7cfc2_row6_col3" class="data row6 col3">0</td>
<td id="T_7cfc2_row6_col4" class="data row6 col4">0</td>
<td id="T_7cfc2_row6_col5" class="data row6 col5">0</td>
<td id="T_7cfc2_row6_col6" class="data row6 col6">0</td>
<td id="T_7cfc2_row6_col7" class="data row6 col7">0</td>
<td id="T_7cfc2_row6_col8" class="data row6 col8">0</td>
<td id="T_7cfc2_row6_col9" class="data row6 col9">0</td>
<td id="T_7cfc2_row6_col10" class="data row6 col10">0</td>
<td id="T_7cfc2_row6_col11" class="data row6 col11">0</td>
<td id="T_7cfc2_row6_col12" class="data row6 col12">0</td>
<td id="T_7cfc2_row6_col13" class="data row6 col13">0</td>
<td id="T_7cfc2_row6_col14" class="data row6 col14">0</td>
<td id="T_7cfc2_row6_col15" class="data row6 col15">0</td>
<td id="T_7cfc2_row6_col16" class="data row6 col16">0</td>
<td id="T_7cfc2_row6_col17" class="data row6 col17">0</td>
<td id="T_7cfc2_row6_col18" class="data row6 col18">0</td>
<td id="T_7cfc2_row6_col19" class="data row6 col19">0</td>
<td id="T_7cfc2_row6_col20" class="data row6 col20">0</td>
<td id="T_7cfc2_row6_col21" class="data row6 col21">0</td>
<td id="T_7cfc2_row6_col22" class="data row6 col22">0</td>
<td id="T_7cfc2_row6_col23" class="data row6 col23">0</td>
<td id="T_7cfc2_row6_col24" class="data row6 col24">0</td>
<td id="T_7cfc2_row6_col25" class="data row6 col25">0</td>
<td id="T_7cfc2_row6_col26" class="data row6 col26">0</td>
<td id="T_7cfc2_row6_col27" class="data row6 col27">0</td>
<td id="T_7cfc2_row6_col28" class="data row6 col28">0</td>
<td id="T_7cfc2_row6_col29" class="data row6 col29">0</td>
<td id="T_7cfc2_row6_col30" class="data row6 col30">0</td>
<td id="T_7cfc2_row6_col31" class="data row6 col31">0</td>
<td id="T_7cfc2_row6_col32" class="data row6 col32">0</td>
<td id="T_7cfc2_row6_col33" class="data row6 col33">0</td>
<td id="T_7cfc2_row6_col34" class="data row6 col34">0</td>
<td id="T_7cfc2_row6_col35" class="data row6 col35">0</td>
<td id="T_7cfc2_row6_col36" class="data row6 col36">0</td>
<td id="T_7cfc2_row6_col37" class="data row6 col37">0</td>
<td id="T_7cfc2_row6_col38" class="data row6 col38">0</td>
<td id="T_7cfc2_row6_col39" class="data row6 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row7" class="row_heading level0 row7">7</th>
<td id="T_7cfc2_row7_col0" class="data row7 col0">1</td>
<td id="T_7cfc2_row7_col1" class="data row7 col1">0</td>
<td id="T_7cfc2_row7_col2" class="data row7 col2">0</td>
<td id="T_7cfc2_row7_col3" class="data row7 col3">0</td>
<td id="T_7cfc2_row7_col4" class="data row7 col4">0</td>
<td id="T_7cfc2_row7_col5" class="data row7 col5">0</td>
<td id="T_7cfc2_row7_col6" class="data row7 col6">0</td>
<td id="T_7cfc2_row7_col7" class="data row7 col7">0</td>
<td id="T_7cfc2_row7_col8" class="data row7 col8">0</td>
<td id="T_7cfc2_row7_col9" class="data row7 col9">0</td>
<td id="T_7cfc2_row7_col10" class="data row7 col10">0</td>
<td id="T_7cfc2_row7_col11" class="data row7 col11">0</td>
<td id="T_7cfc2_row7_col12" class="data row7 col12">0</td>
<td id="T_7cfc2_row7_col13" class="data row7 col13">0</td>
<td id="T_7cfc2_row7_col14" class="data row7 col14">0</td>
<td id="T_7cfc2_row7_col15" class="data row7 col15">0</td>
<td id="T_7cfc2_row7_col16" class="data row7 col16">0</td>
<td id="T_7cfc2_row7_col17" class="data row7 col17">0</td>
<td id="T_7cfc2_row7_col18" class="data row7 col18">0</td>
<td id="T_7cfc2_row7_col19" class="data row7 col19">0</td>
<td id="T_7cfc2_row7_col20" class="data row7 col20">1</td>
<td id="T_7cfc2_row7_col21" class="data row7 col21">0</td>
<td id="T_7cfc2_row7_col22" class="data row7 col22">0</td>
<td id="T_7cfc2_row7_col23" class="data row7 col23">0</td>
<td id="T_7cfc2_row7_col24" class="data row7 col24">0</td>
<td id="T_7cfc2_row7_col25" class="data row7 col25">0</td>
<td id="T_7cfc2_row7_col26" class="data row7 col26">0</td>
<td id="T_7cfc2_row7_col27" class="data row7 col27">0</td>
<td id="T_7cfc2_row7_col28" class="data row7 col28">0</td>
<td id="T_7cfc2_row7_col29" class="data row7 col29">0</td>
<td id="T_7cfc2_row7_col30" class="data row7 col30">0</td>
<td id="T_7cfc2_row7_col31" class="data row7 col31">0</td>
<td id="T_7cfc2_row7_col32" class="data row7 col32">0</td>
<td id="T_7cfc2_row7_col33" class="data row7 col33">0</td>
<td id="T_7cfc2_row7_col34" class="data row7 col34">0</td>
<td id="T_7cfc2_row7_col35" class="data row7 col35">0</td>
<td id="T_7cfc2_row7_col36" class="data row7 col36">0</td>
<td id="T_7cfc2_row7_col37" class="data row7 col37">0</td>
<td id="T_7cfc2_row7_col38" class="data row7 col38">0</td>
<td id="T_7cfc2_row7_col39" class="data row7 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row8" class="row_heading level0 row8">8</th>
<td id="T_7cfc2_row8_col0" class="data row8 col0">1</td>
<td id="T_7cfc2_row8_col1" class="data row8 col1">0</td>
<td id="T_7cfc2_row8_col2" class="data row8 col2">0</td>
<td id="T_7cfc2_row8_col3" class="data row8 col3">0</td>
<td id="T_7cfc2_row8_col4" class="data row8 col4">0</td>
<td id="T_7cfc2_row8_col5" class="data row8 col5">0</td>
<td id="T_7cfc2_row8_col6" class="data row8 col6">0</td>
<td id="T_7cfc2_row8_col7" class="data row8 col7">0</td>
<td id="T_7cfc2_row8_col8" class="data row8 col8">0</td>
<td id="T_7cfc2_row8_col9" class="data row8 col9">0</td>
<td id="T_7cfc2_row8_col10" class="data row8 col10">0</td>
<td id="T_7cfc2_row8_col11" class="data row8 col11">0</td>
<td id="T_7cfc2_row8_col12" class="data row8 col12">0</td>
<td id="T_7cfc2_row8_col13" class="data row8 col13">0</td>
<td id="T_7cfc2_row8_col14" class="data row8 col14">0</td>
<td id="T_7cfc2_row8_col15" class="data row8 col15">0</td>
<td id="T_7cfc2_row8_col16" class="data row8 col16">0</td>
<td id="T_7cfc2_row8_col17" class="data row8 col17">0</td>
<td id="T_7cfc2_row8_col18" class="data row8 col18">0</td>
<td id="T_7cfc2_row8_col19" class="data row8 col19">0</td>
<td id="T_7cfc2_row8_col20" class="data row8 col20">0</td>
<td id="T_7cfc2_row8_col21" class="data row8 col21">0</td>
<td id="T_7cfc2_row8_col22" class="data row8 col22">0</td>
<td id="T_7cfc2_row8_col23" class="data row8 col23">0</td>
<td id="T_7cfc2_row8_col24" class="data row8 col24">0</td>
<td id="T_7cfc2_row8_col25" class="data row8 col25">0</td>
<td id="T_7cfc2_row8_col26" class="data row8 col26">0</td>
<td id="T_7cfc2_row8_col27" class="data row8 col27">0</td>
<td id="T_7cfc2_row8_col28" class="data row8 col28">0</td>
<td id="T_7cfc2_row8_col29" class="data row8 col29">0</td>
<td id="T_7cfc2_row8_col30" class="data row8 col30">0</td>
<td id="T_7cfc2_row8_col31" class="data row8 col31">0</td>
<td id="T_7cfc2_row8_col32" class="data row8 col32">0</td>
<td id="T_7cfc2_row8_col33" class="data row8 col33">0</td>
<td id="T_7cfc2_row8_col34" class="data row8 col34">0</td>
<td id="T_7cfc2_row8_col35" class="data row8 col35">0</td>
<td id="T_7cfc2_row8_col36" class="data row8 col36">0</td>
<td id="T_7cfc2_row8_col37" class="data row8 col37">0</td>
<td id="T_7cfc2_row8_col38" class="data row8 col38">0</td>
<td id="T_7cfc2_row8_col39" class="data row8 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row9" class="row_heading level0 row9">9</th>
<td id="T_7cfc2_row9_col0" class="data row9 col0">1</td>
<td id="T_7cfc2_row9_col1" class="data row9 col1">0</td>
<td id="T_7cfc2_row9_col2" class="data row9 col2">0</td>
<td id="T_7cfc2_row9_col3" class="data row9 col3">0</td>
<td id="T_7cfc2_row9_col4" class="data row9 col4">0</td>
<td id="T_7cfc2_row9_col5" class="data row9 col5">0</td>
<td id="T_7cfc2_row9_col6" class="data row9 col6">0</td>
<td id="T_7cfc2_row9_col7" class="data row9 col7">0</td>
<td id="T_7cfc2_row9_col8" class="data row9 col8">0</td>
<td id="T_7cfc2_row9_col9" class="data row9 col9">0</td>
<td id="T_7cfc2_row9_col10" class="data row9 col10">0</td>
<td id="T_7cfc2_row9_col11" class="data row9 col11">0</td>
<td id="T_7cfc2_row9_col12" class="data row9 col12">0</td>
<td id="T_7cfc2_row9_col13" class="data row9 col13">0</td>
<td id="T_7cfc2_row9_col14" class="data row9 col14">0</td>
<td id="T_7cfc2_row9_col15" class="data row9 col15">0</td>
<td id="T_7cfc2_row9_col16" class="data row9 col16">0</td>
<td id="T_7cfc2_row9_col17" class="data row9 col17">0</td>
<td id="T_7cfc2_row9_col18" class="data row9 col18">0</td>
<td id="T_7cfc2_row9_col19" class="data row9 col19">0</td>
<td id="T_7cfc2_row9_col20" class="data row9 col20">1</td>
<td id="T_7cfc2_row9_col21" class="data row9 col21">0</td>
<td id="T_7cfc2_row9_col22" class="data row9 col22">0</td>
<td id="T_7cfc2_row9_col23" class="data row9 col23">0</td>
<td id="T_7cfc2_row9_col24" class="data row9 col24">0</td>
<td id="T_7cfc2_row9_col25" class="data row9 col25">0</td>
<td id="T_7cfc2_row9_col26" class="data row9 col26">0</td>
<td id="T_7cfc2_row9_col27" class="data row9 col27">0</td>
<td id="T_7cfc2_row9_col28" class="data row9 col28">0</td>
<td id="T_7cfc2_row9_col29" class="data row9 col29">0</td>
<td id="T_7cfc2_row9_col30" class="data row9 col30">0</td>
<td id="T_7cfc2_row9_col31" class="data row9 col31">0</td>
<td id="T_7cfc2_row9_col32" class="data row9 col32">0</td>
<td id="T_7cfc2_row9_col33" class="data row9 col33">0</td>
<td id="T_7cfc2_row9_col34" class="data row9 col34">0</td>
<td id="T_7cfc2_row9_col35" class="data row9 col35">0</td>
<td id="T_7cfc2_row9_col36" class="data row9 col36">0</td>
<td id="T_7cfc2_row9_col37" class="data row9 col37">0</td>
<td id="T_7cfc2_row9_col38" class="data row9 col38">0</td>
<td id="T_7cfc2_row9_col39" class="data row9 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row10" class="row_heading level0 row10">10</th>
<td id="T_7cfc2_row10_col0" class="data row10 col0">0</td>
<td id="T_7cfc2_row10_col1" class="data row10 col1">1</td>
<td id="T_7cfc2_row10_col2" class="data row10 col2">0</td>
<td id="T_7cfc2_row10_col3" class="data row10 col3">0</td>
<td id="T_7cfc2_row10_col4" class="data row10 col4">0</td>
<td id="T_7cfc2_row10_col5" class="data row10 col5">0</td>
<td id="T_7cfc2_row10_col6" class="data row10 col6">0</td>
<td id="T_7cfc2_row10_col7" class="data row10 col7">0</td>
<td id="T_7cfc2_row10_col8" class="data row10 col8">0</td>
<td id="T_7cfc2_row10_col9" class="data row10 col9">0</td>
<td id="T_7cfc2_row10_col10" class="data row10 col10">0</td>
<td id="T_7cfc2_row10_col11" class="data row10 col11">0</td>
<td id="T_7cfc2_row10_col12" class="data row10 col12">0</td>
<td id="T_7cfc2_row10_col13" class="data row10 col13">0</td>
<td id="T_7cfc2_row10_col14" class="data row10 col14">0</td>
<td id="T_7cfc2_row10_col15" class="data row10 col15">0</td>
<td id="T_7cfc2_row10_col16" class="data row10 col16">0</td>
<td id="T_7cfc2_row10_col17" class="data row10 col17">0</td>
<td id="T_7cfc2_row10_col18" class="data row10 col18">0</td>
<td id="T_7cfc2_row10_col19" class="data row10 col19">0</td>
<td id="T_7cfc2_row10_col20" class="data row10 col20">0</td>
<td id="T_7cfc2_row10_col21" class="data row10 col21">0</td>
<td id="T_7cfc2_row10_col22" class="data row10 col22">0</td>
<td id="T_7cfc2_row10_col23" class="data row10 col23">0</td>
<td id="T_7cfc2_row10_col24" class="data row10 col24">0</td>
<td id="T_7cfc2_row10_col25" class="data row10 col25">0</td>
<td id="T_7cfc2_row10_col26" class="data row10 col26">0</td>
<td id="T_7cfc2_row10_col27" class="data row10 col27">0</td>
<td id="T_7cfc2_row10_col28" class="data row10 col28">0</td>
<td id="T_7cfc2_row10_col29" class="data row10 col29">0</td>
<td id="T_7cfc2_row10_col30" class="data row10 col30">0</td>
<td id="T_7cfc2_row10_col31" class="data row10 col31">0</td>
<td id="T_7cfc2_row10_col32" class="data row10 col32">0</td>
<td id="T_7cfc2_row10_col33" class="data row10 col33">0</td>
<td id="T_7cfc2_row10_col34" class="data row10 col34">0</td>
<td id="T_7cfc2_row10_col35" class="data row10 col35">0</td>
<td id="T_7cfc2_row10_col36" class="data row10 col36">0</td>
<td id="T_7cfc2_row10_col37" class="data row10 col37">0</td>
<td id="T_7cfc2_row10_col38" class="data row10 col38">0</td>
<td id="T_7cfc2_row10_col39" class="data row10 col39">0</td>
</tr>
<tr>
<th id="T_7cfc2_level0_row11" class="row_heading level0 row11">11</th>
<td id="T_7cfc2_row11_col0" class="data row11 col0">0</td>
<td id="T_7cfc2_row11_col1" class="data row11 col1">1</td>
<td id="T_7cfc2_row11_col2" class="data row11 col2">0</td>
<td id="T_7cfc2_row11_col3" class="data row11 col3">0</td>
<td id="T_7cfc2_row11_col4" class="data row11 col4">0</td>
<td id="T_7cfc2_row11_col5" class="data row11 col5">0</td>
<td id="T_7cfc2_row11_col6" class="data row11 col6">0</td>
<td id="T_7cfc2_row11_col7" class="data row11 col7">0</td>
<td id="T_7cfc2_row11_col8" class="data row11 col8">0</td>
<td id="T_7cfc2_row11_col9" class="data row11 col9">0</td>
<td id="T_7cfc2_row11_col10" class="data row11 col10">0</td>
<td id="T_7cfc2_row11_col11" class="data row11 col11">0</td>
<td id="T_7cfc2_row11_col12" class="data row11 col12">0</td>
<td id="T_7cfc2_row11_col13" class="data row11 col13">0</td>
<td id="T_7cfc2_row11_col14" class="data row11 col14">0</td>
<td id="T_7cfc2_row11_col15" class="data row11 col15">0</td>
<td id="T_7cfc2_row11_col16" class="data row11 col16">0</td>
<td id="T_7cfc2_row11_col17" class="data row11 col17">0</td>
<td id="T_7cfc2_row11_col18" class="data row11 col18">0</td>
<td id="T_7cfc2_row11_col19" class="data row11 col19">0</td>
<td id="T_7cfc2_row11_col20" class="data row11 col20">0</td>
<td id="T_7cfc2_row11_col21" class="data row11 col21">1</td>
<td id="T_7cfc2_row11_col22" class="data row11 col22">0</td>
<td id="T_7cfc2_row11_col23" class="data row11 col23">0</td>
<td id="T_7cfc2_row11_col24" class="data row11 col24">0</td>
<td id="T_7cfc2_row11_col25" class="data row11 col25">0</td>
<td id="T_7cfc2_row11_col26" class="data row11 col26">0</td>
<td id="T_7cfc2_row11_col27" class="data row11 col27">0</td>
<td id="T_7cfc2_row11_col28" class="data row11 col28">0</td>
<td id="T_7cfc2_row11_col29" class="data row11 col29">0</td>
<td id="T_7cfc2_row11_col30" class="data row11 col30">0</td>
<td id="T_7cfc2_row11_col31" class="data row11 col31">0</td>
<td id="T_7cfc2_row11_col32" class="data row11 col32">0</td>
<td id="T_7cfc2_row11_col33" class="data row11 col33">0</td>
<td id="T_7cfc2_row11_col34" class="data row11 col34">0</td>
<td id="T_7cfc2_row11_col35" class="data row11 col35">0</td>
<td id="T_7cfc2_row11_col36" class="data row11 col36">0</td>
<td id="T_7cfc2_row11_col37" class="data row11 col37">0</td>
<td id="T_7cfc2_row11_col38" class="data row11 col38">0</td>
<td id="T_7cfc2_row11_col39" class="data row11 col39">0</td>
</tr>
</tbody>
</table>
<p>We can visualize all of $\textbf{Z}$ here.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">Z</span><span class="p">,</span> <span class="n">aspect</span><span class="o">=</span><span class="s">'auto'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">220</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="s">'intercept (cafe)'</span><span class="p">,</span> <span class="n">ha</span><span class="o">=</span><span class="s">'center'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">14</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mi">30</span><span class="p">,</span> <span class="mi">220</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="s">'covariate (afternoon)'</span><span class="p">,</span> <span class="n">ha</span><span class="o">=</span><span class="s">'center'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">14</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'observations'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Visual representation of Z'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Text(0.5, 1.0, 'Visual representation of Z')
</code></pre></div></div>
<p><img src="/assets/2022-09-13-mixed_effects_freqvsbayes_cafes_files/2022-09-13-mixed_effects_freqvsbayes_cafes_25_1.png" alt="png" /></p>
<p>The vector $\textbf{u}$ is where the mixed effects model takes advantage of the covariance structure of the data. In our dataset, the first 20 elements of the vector represent the random intercepts of the cafes and the next 20 represent the random slopes. A cafe’s random effects can be thought of as offsets from the population parameters (the fixed effects). Accordingly, the random effects are multivariate normally distributed, with mean 0 and covariance matrix $\textbf{S}$.</p>
\[\textbf{u} \sim \text{Normal}(0, \textbf{S}) \tag{2}\]
<p>Remember that $\textbf{u}$ is a (2 x 20) x 1 vector containing each cafe’s intercept offset $a_\text{cafe}$ and slope offset $b_\text{cafe}$. Therefore, we can also write this as:</p>
\[\textbf{u} = \begin{bmatrix} a_{\text{cafe}} \\ b_{\text{cafe}} \end{bmatrix} \sim \text{MVNormal} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \textbf{S} \right) \tag{3}\]
<p>In other words, in equations 2 and 3, the random intercept and random slope are both expected to lie at 0. With regards to $\textbf{S}$, <a href="https://benslack19.github.io/data%20science/statistics/cov_matrix_weirdness/">my prior post</a> talked about covariance matrices, so I won’t elaborate here. The key conceptual point in this problem is that the covariance matrix $\textbf{S}$ can reflect the correlation ($\rho$) between the intercept (average morning wait time) and the slope (difference between morning and afternoon wait time).</p>
\[\textbf{S} = \begin{pmatrix} \sigma_{\alpha}^2 & \rho\sigma_{\alpha}\sigma_{\beta} \\
\rho\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^2 \end{pmatrix} \tag{4}\]
<p>We know there is a correlation because (a) we generated the data that way and (b) we can directly observe this when we <a href="#visualize-data">visualized the data</a>.</p>
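<p>To make equations 2&ndash;4 concrete, here is a minimal numpy sketch of drawing the cafes&rsquo; random intercepts and slopes from a zero-centered multivariate normal. The values of $\sigma_{\alpha}$, $\sigma_{\beta}$, and $\rho$ below are illustrative placeholders, not the parameters used to simulate <code class="language-plaintext highlighter-rouge">df_cafes</code>.</p>

```python
import numpy as np

# Illustrative values only (not the ones used to generate the dataset)
sigma_a, sigma_b, rho = 1.0, 0.5, -0.7

# Covariance matrix S from equation 4
S = np.array([
    [sigma_a**2,              rho * sigma_a * sigma_b],
    [rho * sigma_a * sigma_b, sigma_b**2],
])

rng = np.random.default_rng(19)
# One row per cafe: column 0 = random intercept offset, column 1 = random slope offset
u = rng.multivariate_normal(mean=np.zeros(2), cov=S, size=20)
print(u.shape)  # (20, 2)
```

<p>With enough cafes, the sample correlation between the two columns approaches $\rho$; with only 20 draws it will merely be in the right neighborhood.</p>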
<p>Finally, the role of $\boldsymbol{\epsilon}$ is to capture any residual variance. The residuals are assumed to be homogeneous and independent across observations.</p>
<h3 id="non-linear-algebra-form">Non-linear algebra form</h3>
<p>Equation 1 is written concisely in linear algebra form. However, since our dataset is relatively simple (only one predictor variable), it can be rewritten in an expanded, alternative form as equation 6. This might make it easier to understand (at least it did for me). The notation will start to get hairy with subscripts, so I will explicitly rename some variables for this explanation. It will also better match the Bayesian set of equations described in the McElreath text. Equation 6 is written at the level of a single observation $i$. I’ll repeat equation 1 here (as equation 5) so it’s easier to see the conversion.</p>
\[\textbf{y} = \textbf{X} \boldsymbol{\beta} + \textbf{Z} \textbf{u} + \boldsymbol{\epsilon} \tag{5}\]
\[W_i = (\alpha + \beta \times A_i) + (a_{\text{cafe}[i]} + b_{\text{cafe}[i]} \times A_i) + \epsilon_i \tag{6}\]
<p>Let’s start with the left side, where $\textbf{y}$ becomes $W_i$, the wait time for observation $i$. On the right side, I have segmented the fixed and random effects with parentheses, deconstructing each linear algebra expression into a simpler form. After rearrangement, we obtain the following form in equation 7.</p>
\[W_i = (\alpha + a_{\text{cafe}[i]}) + (\beta + b_{\text{cafe}[i]}) \times A_i + \epsilon_{\text{cafe}} \tag{7}\]
<p>Here, we can better appreciate how a cafe’s random-effects intercept can be thought of as an offset from the population intercept. The same logic applies to its slope. We will come back to equation 7 after covering equation set 2, the Bayesian approach.</p>
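<p>As a quick numeric sketch of equation 7, a single observation&rsquo;s expected wait combines the population terms with that cafe&rsquo;s offsets. All values below are made up purely for illustration; they are not estimates from the model.</p>

```python
# Hypothetical values for illustration only (not model estimates)
alpha, beta = 3.5, -1.0     # population ("fixed") intercept and slope
a_cafe, b_cafe = 0.5, 0.25  # one cafe's random-effect offsets
A_i = 1                     # afternoon indicator for this observation

# Equation 7 without the residual term epsilon
mu_i = (alpha + a_cafe) + (beta + b_cafe) * A_i
print(mu_i)  # 3.25
```

<p>This cafe waits longer than average in the morning (positive intercept offset) and its afternoon drop is smaller than average (positive slope offset partially cancels the negative population slope).</p>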
<h2 id="equation-set-2-fixed-effects-as-an-adaptive-prior-varying-effects-in-the-linear-model">Equation set 2: fixed effects as an adaptive prior, varying effects in the linear model</h2>
<p>The following equations are taken from Chapter 14 of Statistical Rethinking. This set of equations looks like a beast but, to be honest, is more intuitive to me, probably because I learned this approach first. I’ll state the equations before comparing them directly with equation set 1, but you may already start seeing the relationship. Essentially, the equations above are rewritten in a Bayesian way such that the fixed effects act as an adaptive prior.</p>
<p>\(W_i \sim \text{Normal}(\mu_i, \sigma) \tag{8}\)
\(\mu_i = \alpha_{\text{cafe}[i]} + \beta_{\text{cafe}[i]} \times A_{i} \tag{9}\)
\(\sigma \sim \text{Exp}(1) \tag{10}\)</p>
<p>Equation 8 states that wait time is normally distributed with mean $\mu_i$ and standard deviation $\sigma$. By making $W_i$ stochastic instead of deterministic (using a ~ instead of =), $\sigma$ replaces $\epsilon_i$. In equation 10, the prior for $\sigma$ is exponentially distributed with rate parameter 1. The expected value $\mu_i$ comes from the linear model in equation 9. You can start to see the similarities with equation 7 above.</p>
\[\begin{bmatrix}\alpha_{\text{cafe}} \\ \beta_{\text{cafe}} \end{bmatrix} \sim \text{MVNormal} \left( \begin{bmatrix}{\alpha} \\ {\beta} \end{bmatrix} , \textbf{S} \right) \tag{11}\]
<p>The $\alpha_{\text{cafe}}$ and $\beta_{\text{cafe}}$ terms come from sampling a multivariate normal distribution, as shown in equation 11. <strong>Note the very subtle difference in the placement of the subscript <code class="language-plaintext highlighter-rouge">cafe</code> when compared to equations 6 and 7. This is an important point I’ll discuss later.</strong> On the right side, the two-dimensional normal distribution’s expected values are $\alpha$ and $\beta$. The rest of the equations shown below are our priors for each parameter we’re trying to estimate.</p>
\[\textbf{S} = \begin{pmatrix} \sigma_{\alpha}^2 & \rho\sigma_{\alpha}\sigma_{\beta} \\
\rho\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^2 \end{pmatrix} = \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix} \textbf{R} \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix} \tag{12}\]
\[\alpha \sim \text{Normal}(5, 2) \tag{13}\]
\[\beta \sim \text{Normal}(-1, 0.5) \tag{14}\]
<p>\(\sigma, \sigma_{\alpha}, \sigma_{\beta} \sim \text{Exp}(1) \tag{15}\)
\(\textbf{R} \sim \text{LKJCorr}(2) \tag{16}\)</p>
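<p>The factoring of $\textbf{S}$ in equation 12 into standard deviations and a correlation matrix is easy to verify numerically. Here is a small numpy check, with illustrative values:</p>

```python
import numpy as np

sigma_a, sigma_b, rho = 1.0, 0.5, -0.7  # illustrative values only

# Left-hand side of equation 12: S written out directly
S_direct = np.array([
    [sigma_a**2,              rho * sigma_a * sigma_b],
    [rho * sigma_a * sigma_b, sigma_b**2],
])

# Right-hand side: diag(sigmas) @ R @ diag(sigmas)
D = np.diag([sigma_a, sigma_b])
R = np.array([[1.0, rho],
              [rho, 1.0]])
S_decomposed = D @ R @ D

print(np.allclose(S_direct, S_decomposed))  # True
```

<p>This separation is what lets equations 15 and 16 place priors on the standard deviations and on the correlation matrix $\textbf{R}$ independently.</p>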
<h2 id="comparison-of-equation-sets">Comparison of equation sets</h2>
<p>To recap, the first equation set has an explicit fixed effects term and a varying effects term in the linear model. In the second equation set, the linear model is already “mixed”: it implicitly contains both the fixed and varying effects, with the fixed effects appearing as the means of the multivariate normal in equation 11.</p>
<p>You can think of these $\alpha_{\text{cafe}}$ and $\beta_{\text{cafe}}$ terms as already incorporating the information from the fixed and random effects simultaneously.</p>
<p>Now that we have the dataset, we can run the two models, one with <code class="language-plaintext highlighter-rouge">lmer</code> and one with <code class="language-plaintext highlighter-rouge">pymc</code>, and see how each package implements these equations.</p>
<h1 id="running-equation-set-1-with-lmer-frequentist">Running equation set 1 with <code class="language-plaintext highlighter-rouge">lmer</code> (frequentist)</h1>
<p>The <code class="language-plaintext highlighter-rouge">lmer</code> (and, by extension, <code class="language-plaintext highlighter-rouge">brms</code>) syntax was initially confusing to me.</p>
<p><code class="language-plaintext highlighter-rouge">lmer(wait ~ 1 + afternoon + (1 + afternoon | cafe), df_cafes)</code></p>
<p>The <code class="language-plaintext highlighter-rouge">1</code> corresponds to inclusion of the intercept term; a <code class="language-plaintext highlighter-rouge">0</code> would exclude it. The <code class="language-plaintext highlighter-rouge">1 + afternoon</code> part corresponds to the “fixed effects” portion of the model ($\alpha + \beta \times A_i$), while <code class="language-plaintext highlighter-rouge">(1 + afternoon | cafe)</code> is the “varying effects” portion ($a_{\text{cafe}} + b_{\text{cafe}} \times A_i$).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%%</span><span class="n">R</span><span class="w"> </span><span class="o">-</span><span class="n">i</span><span class="w"> </span><span class="n">df_cafes</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">m</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">df_fe_estimates</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">df_fe_ci</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">df_fe_summary</span><span class="w">
</span><span class="c1"># m df_fe_summary</span><span class="w">
</span><span class="n">m</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lmer</span><span class="p">(</span><span class="n">wait</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">afternoon</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">afternoon</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">cafe</span><span class="p">),</span><span class="w"> </span><span class="n">df_cafes</span><span class="p">)</span><span class="w">
</span><span class="n">arm</span><span class="o">::</span><span class="n">display</span><span class="p">(</span><span class="n">m</span><span class="p">)</span><span class="w">
</span><span class="c1"># get fixed effects coefficients</span><span class="w">
</span><span class="n">df_fe_estimates</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">summary</span><span class="p">(</span><span class="n">m</span><span class="p">)</span><span class="o">$</span><span class="n">coefficients</span><span class="p">)</span><span class="w">
</span><span class="c1"># get fixed effects coefficient CIs</span><span class="w">
</span><span class="n">df_fe_ci</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">confint</span><span class="p">(</span><span class="n">m</span><span class="p">))</span><span class="w">
</span><span class="n">df_fe_summary</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">merge</span><span class="p">(</span><span class="w">
</span><span class="n">df_fe_estimates</span><span class="p">,</span><span class="w">
</span><span class="n">df_fe_ci</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="s1">'(Intercept)'</span><span class="p">,</span><span class="w"> </span><span class="s1">'afternoon'</span><span class="p">),</span><span class="w"> </span><span class="p">],</span><span class="w">
</span><span class="n">by.x</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">by.y</span><span class="o">=</span><span class="m">0</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">rownames</span><span class="p">(</span><span class="n">df_fe_summary</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">df_fe_summary</span><span class="p">[,</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lmer(formula = wait ~ 1 + afternoon + (1 + afternoon | cafe),
data = df_cafes)
coef.est coef.se
(Intercept) 3.64 0.23
afternoon -1.04 0.11
Error terms:
Groups Name Std.Dev. Corr
cafe (Intercept) 0.99
afternoon 0.39 -0.74
Residual 0.48
---
number of obs: 200, groups: cafe, 20
AIC = 369.9, DIC = 349.2
deviance = 353.5
R[write to console]: Computing profile confidence intervals ...
</code></pre></div></div>
<p>Can we get the partial pooling results from the <code class="language-plaintext highlighter-rouge">lmer</code> output and see how it compares with the unpooled estimates? Let’s export it for use later.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%%</span><span class="n">R</span><span class="w"> </span><span class="o">-</span><span class="n">i</span><span class="w"> </span><span class="n">m</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">df_partial_pooling</span><span class="w"> </span><span class="o">-</span><span class="n">o</span><span class="w"> </span><span class="n">random_sims</span><span class="w">
</span><span class="c1"># Make a dataframe with the fitted effects</span><span class="w">
</span><span class="n">df_partial_pooling</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">m</span><span class="p">)[[</span><span class="s2">"cafe"</span><span class="p">]]</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">rownames_to_column</span><span class="p">(</span><span class="s2">"cafe"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">rename</span><span class="p">(</span><span class="n">Intercept</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`(Intercept)`</span><span class="p">,</span><span class="w"> </span><span class="n">Slope_Days</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">afternoon</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_column</span><span class="p">(</span><span class="n">Model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Partial pooling"</span><span class="p">)</span><span class="w">
</span><span class="c1"># estimate confidence interval</span><span class="w">
</span><span class="n">random_sims</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">REsim</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="n">n.sims</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="p">)</span><span class="w">
</span><span class="c1">#plotREsim(random_sims)</span><span class="w">
</span></code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">random_sims</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>groupFctr</th>
<th>groupID</th>
<th>term</th>
<th>mean</th>
<th>median</th>
<th>sd</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>cafe</td>
<td>0</td>
<td>(Intercept)</td>
<td>-1.277651</td>
<td>-1.283341</td>
<td>0.379761</td>
</tr>
<tr>
<th>2</th>
<td>cafe</td>
<td>1</td>
<td>(Intercept)</td>
<td>0.164935</td>
<td>0.162715</td>
<td>0.420411</td>
</tr>
<tr>
<th>3</th>
<td>cafe</td>
<td>2</td>
<td>(Intercept)</td>
<td>-1.047076</td>
<td>-1.043646</td>
<td>0.387153</td>
</tr>
<tr>
<th>4</th>
<td>cafe</td>
<td>3</td>
<td>(Intercept)</td>
<td>0.474320</td>
<td>0.500552</td>
<td>0.400053</td>
</tr>
<tr>
<th>5</th>
<td>cafe</td>
<td>4</td>
<td>(Intercept)</td>
<td>-1.473647</td>
<td>-1.468940</td>
<td>0.394707</td>
</tr>
<tr>
<th>6</th>
<td>cafe</td>
<td>5</td>
<td>(Intercept)</td>
<td>0.086072</td>
<td>0.082010</td>
<td>0.408971</td>
</tr>
<tr>
<th>7</th>
<td>cafe</td>
<td>6</td>
<td>(Intercept)</td>
<td>-0.640217</td>
<td>-0.628944</td>
<td>0.412642</td>
</tr>
<tr>
<th>8</th>
<td>cafe</td>
<td>7</td>
<td>(Intercept)</td>
<td>1.507154</td>
<td>1.516430</td>
<td>0.391119</td>
</tr>
<tr>
<th>9</th>
<td>cafe</td>
<td>8</td>
<td>(Intercept)</td>
<td>-0.657831</td>
<td>-0.659448</td>
<td>0.394984</td>
</tr>
<tr>
<th>10</th>
<td>cafe</td>
<td>9</td>
<td>(Intercept)</td>
<td>0.332758</td>
<td>0.331037</td>
<td>0.388295</td>
</tr>
<tr>
<th>11</th>
<td>cafe</td>
<td>10</td>
<td>(Intercept)</td>
<td>-1.018611</td>
<td>-1.025387</td>
<td>0.389930</td>
</tr>
<tr>
<th>12</th>
<td>cafe</td>
<td>11</td>
<td>(Intercept)</td>
<td>0.925071</td>
<td>0.913997</td>
<td>0.397095</td>
</tr>
<tr>
<th>13</th>
<td>cafe</td>
<td>12</td>
<td>(Intercept)</td>
<td>-1.407149</td>
<td>-1.403259</td>
<td>0.384820</td>
</tr>
<tr>
<th>14</th>
<td>cafe</td>
<td>13</td>
<td>(Intercept)</td>
<td>-0.412975</td>
<td>-0.414958</td>
<td>0.412863</td>
</tr>
<tr>
<th>15</th>
<td>cafe</td>
<td>14</td>
<td>(Intercept)</td>
<td>1.346380</td>
<td>1.343109</td>
<td>0.403694</td>
</tr>
<tr>
<th>16</th>
<td>cafe</td>
<td>15</td>
<td>(Intercept)</td>
<td>0.336807</td>
<td>0.346523</td>
<td>0.390567</td>
</tr>
<tr>
<th>17</th>
<td>cafe</td>
<td>16</td>
<td>(Intercept)</td>
<td>0.747439</td>
<td>0.735906</td>
<td>0.413094</td>
</tr>
<tr>
<th>18</th>
<td>cafe</td>
<td>17</td>
<td>(Intercept)</td>
<td>-0.046579</td>
<td>-0.035018</td>
<td>0.396795</td>
</tr>
<tr>
<th>19</th>
<td>cafe</td>
<td>18</td>
<td>(Intercept)</td>
<td>1.659019</td>
<td>1.646634</td>
<td>0.393909</td>
</tr>
<tr>
<th>20</th>
<td>cafe</td>
<td>19</td>
<td>(Intercept)</td>
<td>0.323375</td>
<td>0.327348</td>
<td>0.392401</td>
</tr>
<tr>
<th>21</th>
<td>cafe</td>
<td>0</td>
<td>afternoon</td>
<td>0.498557</td>
<td>0.501401</td>
<td>0.182594</td>
</tr>
<tr>
<th>22</th>
<td>cafe</td>
<td>1</td>
<td>afternoon</td>
<td>-0.336036</td>
<td>-0.337360</td>
<td>0.193462</td>
</tr>
<tr>
<th>23</th>
<td>cafe</td>
<td>2</td>
<td>afternoon</td>
<td>0.395379</td>
<td>0.391621</td>
<td>0.189140</td>
</tr>
<tr>
<th>24</th>
<td>cafe</td>
<td>3</td>
<td>afternoon</td>
<td>0.296956</td>
<td>0.293144</td>
<td>0.191710</td>
</tr>
<tr>
<th>25</th>
<td>cafe</td>
<td>4</td>
<td>afternoon</td>
<td>0.059611</td>
<td>0.055121</td>
<td>0.189680</td>
</tr>
<tr>
<th>26</th>
<td>cafe</td>
<td>5</td>
<td>afternoon</td>
<td>-0.033068</td>
<td>-0.036143</td>
<td>0.194723</td>
</tr>
<tr>
<th>27</th>
<td>cafe</td>
<td>6</td>
<td>afternoon</td>
<td>0.236107</td>
<td>0.237904</td>
<td>0.192575</td>
</tr>
<tr>
<th>28</th>
<td>cafe</td>
<td>7</td>
<td>afternoon</td>
<td>-0.473485</td>
<td>-0.479199</td>
<td>0.185549</td>
</tr>
<tr>
<th>29</th>
<td>cafe</td>
<td>8</td>
<td>afternoon</td>
<td>0.408039</td>
<td>0.411507</td>
<td>0.194145</td>
</tr>
<tr>
<th>30</th>
<td>cafe</td>
<td>9</td>
<td>afternoon</td>
<td>-0.402131</td>
<td>-0.393931</td>
<td>0.186868</td>
</tr>
<tr>
<th>31</th>
<td>cafe</td>
<td>10</td>
<td>afternoon</td>
<td>0.316072</td>
<td>0.309198</td>
<td>0.189218</td>
</tr>
<tr>
<th>32</th>
<td>cafe</td>
<td>11</td>
<td>afternoon</td>
<td>-0.335749</td>
<td>-0.340427</td>
<td>0.186644</td>
</tr>
<tr>
<th>33</th>
<td>cafe</td>
<td>12</td>
<td>afternoon</td>
<td>0.521558</td>
<td>0.519243</td>
<td>0.184606</td>
</tr>
<tr>
<th>34</th>
<td>cafe</td>
<td>13</td>
<td>afternoon</td>
<td>-0.006800</td>
<td>-0.014344</td>
<td>0.199548</td>
</tr>
<tr>
<th>35</th>
<td>cafe</td>
<td>14</td>
<td>afternoon</td>
<td>-0.277165</td>
<td>-0.281127</td>
<td>0.188748</td>
</tr>
<tr>
<th>36</th>
<td>cafe</td>
<td>15</td>
<td>afternoon</td>
<td>-0.234501</td>
<td>-0.235683</td>
<td>0.192804</td>
</tr>
<tr>
<th>37</th>
<td>cafe</td>
<td>16</td>
<td>afternoon</td>
<td>-0.182673</td>
<td>-0.185997</td>
<td>0.194017</td>
</tr>
<tr>
<th>38</th>
<td>cafe</td>
<td>17</td>
<td>afternoon</td>
<td>-0.017126</td>
<td>-0.023784</td>
<td>0.187302</td>
</tr>
<tr>
<th>39</th>
<td>cafe</td>
<td>18</td>
<td>afternoon</td>
<td>-0.364424</td>
<td>-0.364049</td>
<td>0.187532</td>
</tr>
<tr>
<th>40</th>
<td>cafe</td>
<td>19</td>
<td>afternoon</td>
<td>-0.028883</td>
<td>-0.032691</td>
<td>0.185824</td>
</tr>
</tbody>
</table>
</div>
<p>OK, now let’s try the Bayesian approach and compare answers.</p>
<h1 id="running-equation-set-2-with-pymc-bayesian">Running equation set 2 with <code class="language-plaintext highlighter-rouge">pymc</code> (Bayesian)</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">n_cafes</span> <span class="o">=</span> <span class="n">df_cafes</span><span class="p">[</span><span class="s">'cafe'</span><span class="p">].</span><span class="n">nunique</span><span class="p">()</span>
<span class="n">cafe_idx</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">Categorical</span><span class="p">(</span><span class="n">df_cafes</span><span class="p">[</span><span class="s">"cafe"</span><span class="p">]).</span><span class="n">codes</span>
<span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">m14_1</span><span class="p">:</span>
<span class="c1"># can't specify a separate sigma_a and sigma_b for sd_dist but they're equivalent here
</span> <span class="n">chol</span><span class="p">,</span> <span class="n">Rho_</span><span class="p">,</span> <span class="n">sigma_cafe</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">LKJCholeskyCov</span><span class="p">(</span>
<span class="s">"chol_cov"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">sd_dist</span><span class="o">=</span><span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="mf">1.0</span><span class="p">),</span> <span class="n">compute_corr</span><span class="o">=</span><span class="bp">True</span>
<span class="p">)</span>
<span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mf">2.0</span><span class="p">)</span> <span class="c1"># prior for average intercept
</span> <span class="n">b_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"b_bar"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span> <span class="c1"># prior for average slope
</span>
<span class="n">ab_subject</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">MvNormal</span><span class="p">(</span>
<span class="s">"ab_subject"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="n">at</span><span class="p">.</span><span class="n">stack</span><span class="p">([</span><span class="n">a_bar</span><span class="p">,</span> <span class="n">b_bar</span><span class="p">]),</span> <span class="n">chol</span><span class="o">=</span><span class="n">chol</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">n_cafes</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="p">)</span> <span class="c1"># population of varying effects
</span> <span class="c1"># shape needs to be (n_cafes, 2) because we're getting back both a and b for each cafe
</span>
<span class="n">mu</span> <span class="o">=</span> <span class="n">ab_subject</span><span class="p">[</span><span class="n">cafe_idx</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">ab_subject</span><span class="p">[</span><span class="n">cafe_idx</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">df_cafes</span><span class="p">[</span><span class="s">"afternoon"</span><span class="p">].</span><span class="n">values</span> <span class="c1"># linear model
</span> <span class="n">sigma_within</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">(</span><span class="s">"sigma_within"</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span> <span class="c1"># prior stddev within cafes (in the top line)
</span>
<span class="n">wait</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"wait"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sigma_within</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_cafes</span><span class="p">[</span><span class="s">"wait"</span><span class="p">].</span><span class="n">values</span><span class="p">)</span> <span class="c1"># likelihood
</span>
<span class="n">idata_m14_1</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="n">target_accept</span><span class="o">=</span><span class="mf">0.9</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [chol_cov, a_bar, b_bar, ab_subject, sigma_within]
</code></pre></div></div>
<div>
      100.00% [8000/8000 02:03<00:00 Sampling 4 chains, 1 divergences]
</div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 140 seconds.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># take a glimpse at the head and tail of the summary table
</span><span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">(</span>
<span class="p">[</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">idata_m14_1</span><span class="p">).</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">idata_m14_1</span><span class="p">).</span><span class="n">tail</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="p">]</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/Users/blacar/opt/anaconda3/envs/pymc_env2/lib/python3.10/site-packages/arviz/stats/diagnostics.py:586: RuntimeWarning: invalid value encountered in double_scalars
(between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_2.5%</th>
<th>hdi_97.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>3.654</td>
<td>0.223</td>
<td>3.203</td>
<td>4.074</td>
<td>0.003</td>
<td>0.002</td>
<td>4802.0</td>
<td>3140.0</td>
<td>1.0</td>
</tr>
<tr>
<th>b_bar</th>
<td>-1.049</td>
<td>0.109</td>
<td>-1.265</td>
<td>-0.844</td>
<td>0.002</td>
<td>0.001</td>
<td>3446.0</td>
<td>3200.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[0, 0]</th>
<td>2.380</td>
<td>0.200</td>
<td>1.996</td>
<td>2.785</td>
<td>0.003</td>
<td>0.002</td>
<td>4271.0</td>
<td>2783.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[0, 1]</th>
<td>-0.587</td>
<td>0.245</td>
<td>-1.071</td>
<td>-0.119</td>
<td>0.004</td>
<td>0.003</td>
<td>3077.0</td>
<td>2833.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[1, 0]</th>
<td>3.820</td>
<td>0.199</td>
<td>3.442</td>
<td>4.220</td>
<td>0.003</td>
<td>0.002</td>
<td>3988.0</td>
<td>3167.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[1, 1]</th>
<td>-1.402</td>
<td>0.248</td>
<td>-1.897</td>
<td>-0.945</td>
<td>0.004</td>
<td>0.003</td>
<td>3165.0</td>
<td>3182.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[2, 0]</th>
<td>2.606</td>
<td>0.199</td>
<td>2.210</td>
<td>2.988</td>
<td>0.003</td>
<td>0.002</td>
<td>4702.0</td>
<td>3450.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[2, 1]</th>
<td>-0.681</td>
<td>0.240</td>
<td>-1.156</td>
<td>-0.218</td>
<td>0.004</td>
<td>0.003</td>
<td>3696.0</td>
<td>3014.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[3, 0]</th>
<td>4.120</td>
<td>0.203</td>
<td>3.739</td>
<td>4.532</td>
<td>0.003</td>
<td>0.002</td>
<td>3475.0</td>
<td>2800.0</td>
<td>1.0</td>
</tr>
<tr>
<th>ab_subject[3, 1]</th>
<td>-0.707</td>
<td>0.266</td>
<td>-1.213</td>
<td>-0.184</td>
<td>0.005</td>
<td>0.004</td>
<td>2482.0</td>
<td>2921.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov[0]</th>
<td>0.988</td>
<td>0.163</td>
<td>0.710</td>
<td>1.328</td>
<td>0.002</td>
<td>0.002</td>
<td>5207.0</td>
<td>3263.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov[1]</th>
<td>-0.226</td>
<td>0.105</td>
<td>-0.442</td>
<td>-0.033</td>
<td>0.002</td>
<td>0.001</td>
<td>2769.0</td>
<td>3178.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov[2]</th>
<td>0.299</td>
<td>0.093</td>
<td>0.120</td>
<td>0.481</td>
<td>0.002</td>
<td>0.002</td>
<td>1379.0</td>
<td>1308.0</td>
<td>1.0</td>
</tr>
<tr>
<th>sigma_within</th>
<td>0.482</td>
<td>0.027</td>
<td>0.431</td>
<td>0.534</td>
<td>0.000</td>
<td>0.000</td>
<td>3773.0</td>
<td>2542.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov_corr[0, 0]</th>
<td>1.000</td>
<td>0.000</td>
<td>1.000</td>
<td>1.000</td>
<td>0.000</td>
<td>0.000</td>
<td>4000.0</td>
<td>4000.0</td>
<td>NaN</td>
</tr>
<tr>
<th>chol_cov_corr[0, 1]</th>
<td>-0.579</td>
<td>0.192</td>
<td>-0.898</td>
<td>-0.196</td>
<td>0.003</td>
<td>0.002</td>
<td>3196.0</td>
<td>2983.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov_corr[1, 0]</th>
<td>-0.579</td>
<td>0.192</td>
<td>-0.898</td>
<td>-0.196</td>
<td>0.003</td>
<td>0.002</td>
<td>3196.0</td>
<td>2983.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov_corr[1, 1]</th>
<td>1.000</td>
<td>0.000</td>
<td>1.000</td>
<td>1.000</td>
<td>0.000</td>
<td>0.000</td>
<td>4087.0</td>
<td>4000.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov_stds[0]</th>
<td>0.988</td>
<td>0.163</td>
<td>0.710</td>
<td>1.328</td>
<td>0.002</td>
<td>0.002</td>
<td>5207.0</td>
<td>3263.0</td>
<td>1.0</td>
</tr>
<tr>
<th>chol_cov_stds[1]</th>
<td>0.386</td>
<td>0.107</td>
<td>0.182</td>
<td>0.605</td>
<td>0.003</td>
<td>0.002</td>
<td>1541.0</td>
<td>1201.0</td>
<td>1.0</td>
</tr>
</tbody>
</table>
</div>
<h1 id="comparison-of-lmer-and-pymc-outputs">Comparison of <code class="language-plaintext highlighter-rouge">lmer</code> and <code class="language-plaintext highlighter-rouge">pymc</code> outputs</h1>
<p>While <code class="language-plaintext highlighter-rouge">pymc</code> returns posterior estimates for every parameter, including $\rho$, in this post we are interested in the output comparable to the “fixed effects” and “varying effects” from <code class="language-plaintext highlighter-rouge">lmer</code>. The equations above help us piece together the relevant bits of information. The fixed intercept and slope are easy to identify because equation set 2 uses the same characters, $\alpha$ and $\beta$, as equation set 1.</p>
<p>However, identifying the “varying effects” requires some arithmetic with the <code class="language-plaintext highlighter-rouge">pymc</code> output. In contrast with <code class="language-plaintext highlighter-rouge">lmer</code>, <code class="language-plaintext highlighter-rouge">pymc</code> returns an estimate for each cafe with the varying effects “baked in”. In other words, the offsets we see in equation 7 ($a_{\text{cafe}[i]}$ and $b_{\text{cafe}[i]}$)</p>
\[W_i = (\alpha + a_{\text{cafe}[i]}) + (\beta + b_{\text{cafe}[i]}) \times A_i + \epsilon_{\text{cafe}} \tag{7}\]
\[\mu_i = \alpha_{\text{cafe}[i]} + \beta_{\text{cafe}[i]} \times A_{i} \tag{9}\]
<p>are already embedded in $\alpha_{\text{cafe}[i]}$ and $\beta_{\text{cafe}[i]}$ in equation 9. We’ll therefore have to subtract out the fixed effects from the <code class="language-plaintext highlighter-rouge">pymc</code> output before we can compare with the <code class="language-plaintext highlighter-rouge">lmer</code> output. First, let’s get the fixed effects from <code class="language-plaintext highlighter-rouge">pymc</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_summary_int_and_slope</span> <span class="o">=</span> <span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">idata_m14_1</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="p">[</span><span class="s">'a_bar'</span><span class="p">,</span> <span class="s">'b_bar'</span><span class="p">])</span>
<span class="n">df_summary_int_and_slope</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_2.5%</th>
<th>hdi_97.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>3.654</td>
<td>0.223</td>
<td>3.203</td>
<td>4.074</td>
<td>0.003</td>
<td>0.002</td>
<td>4802.0</td>
<td>3140.0</td>
<td>1.0</td>
</tr>
<tr>
<th>b_bar</th>
<td>-1.049</td>
<td>0.109</td>
<td>-1.265</td>
<td>-0.844</td>
<td>0.002</td>
<td>0.001</td>
<td>3446.0</td>
<td>3200.0</td>
<td>1.0</td>
</tr>
</tbody>
</table>
</div>
<p>These estimates and uncertainties compare well with the fixed effect estimates from <code class="language-plaintext highlighter-rouge">lmer</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="p">(</span><span class="n">ax0</span><span class="p">,</span> <span class="n">ax1</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
<span class="c1"># value to generate data
# a, average morning wait time was defined above
</span><span class="n">ax0</span><span class="p">.</span><span class="n">vlines</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">a</span><span class="p">,</span> <span class="n">ymin</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">ymax</span><span class="o">=</span><span class="mf">1.2</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'dashed'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'red'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">vlines</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">b</span><span class="p">,</span> <span class="n">ymin</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">ymax</span><span class="o">=</span><span class="mf">1.2</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'dashed'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'red'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'simulated value'</span><span class="p">)</span>
<span class="c1"># pymc fixed effects value
</span><span class="n">ax0</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_summary_int_and_slope</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'a_bar'</span><span class="p">,</span> <span class="s">'mean'</span><span class="p">],</span> <span class="mf">1.1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">)</span>
<span class="n">ax0</span><span class="p">.</span><span class="n">hlines</span><span class="p">(</span><span class="n">xmin</span><span class="o">=</span><span class="n">df_summary_int_and_slope</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'a_bar'</span><span class="p">,</span> <span class="s">'hdi_2.5%'</span><span class="p">],</span> <span class="n">xmax</span><span class="o">=</span><span class="n">df_summary_int_and_slope</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'a_bar'</span><span class="p">,</span> <span class="s">'hdi_97.5%'</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="mf">1.1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_summary_int_and_slope</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'b_bar'</span><span class="p">,</span> <span class="s">'mean'</span><span class="p">],</span> <span class="mf">1.1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">hlines</span><span class="p">(</span><span class="n">xmin</span><span class="o">=</span><span class="n">df_summary_int_and_slope</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'b_bar'</span><span class="p">,</span> <span class="s">'hdi_2.5%'</span><span class="p">],</span> <span class="n">xmax</span><span class="o">=</span><span class="n">df_summary_int_and_slope</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'b_bar'</span><span class="p">,</span> <span class="s">'hdi_97.5%'</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="mf">1.1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'pymc estimate'</span><span class="p">)</span>
<span class="c1"># lmer fixed effects estimate
</span><span class="n">ax0</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_fe_summary</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'(Intercept)'</span><span class="p">,</span> <span class="s">'Estimate'</span><span class="p">],</span> <span class="mf">0.9</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">)</span>
<span class="n">ax0</span><span class="p">.</span><span class="n">hlines</span><span class="p">(</span><span class="n">xmin</span><span class="o">=</span><span class="n">df_fe_summary</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'(Intercept)'</span><span class="p">,</span> <span class="s">'X2.5..'</span><span class="p">],</span> <span class="n">xmax</span><span class="o">=</span><span class="n">df_fe_summary</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'(Intercept)'</span><span class="p">,</span> <span class="s">'X97.5..'</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="mf">0.9</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_fe_summary</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'afternoon'</span><span class="p">,</span> <span class="s">'Estimate'</span><span class="p">],</span> <span class="mf">0.9</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">hlines</span><span class="p">(</span><span class="n">xmin</span><span class="o">=</span><span class="n">df_fe_summary</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'afternoon'</span><span class="p">,</span> <span class="s">'X2.5..'</span><span class="p">],</span> <span class="n">xmax</span><span class="o">=</span><span class="n">df_fe_summary</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="s">'afternoon'</span><span class="p">,</span> <span class="s">'X97.5..'</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="mf">0.9</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'lmer estimate'</span><span class="p">)</span>
<span class="c1"># plot formatting
</span><span class="n">f</span><span class="p">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">'Fixed effect estimates'</span><span class="p">)</span>
<span class="n">ax0</span><span class="p">.</span><span class="n">set_yticks</span><span class="p">([</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">])</span>
<span class="n">ax0</span><span class="p">.</span><span class="n">set_yticklabels</span><span class="p">([</span><span class="s">'lmer'</span><span class="p">,</span> <span class="s">'pymc'</span><span class="p">])</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">set_yticks</span><span class="p">([</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">])</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">set_yticklabels</span><span class="p">([</span><span class="s">''</span><span class="p">,</span> <span class="s">''</span><span class="p">])</span>
<span class="n">ax0</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'intercept'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'slope'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">(</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/var/folders/tw/b9j0wcdj6_9cyljwt364lx7c0000gn/T/ipykernel_5516/1253574855.py:30: UserWarning: This figure was using constrained_layout, but that is incompatible with subplots_adjust and/or tight_layout; disabling constrained_layout.
plt.tight_layout()
</code></pre></div></div>
<p><img src="/assets/2022-09-13-mixed_effects_freqvsbayes_cafes_files/2022-09-13-mixed_effects_freqvsbayes_cafes_46_1.png" alt="png" /></p>
<p>As promised, here is the meme that rewards you for paying attention this far!
<img src="/assets/2022-09-13-mixed_effects_freqvsbayes_cafes_files/spideman_IMG_4672.JPG" alt="jpg" /></p>
<p>Now to get the varying effects from <code class="language-plaintext highlighter-rouge">pymc</code> output, we’ll take each sample’s intercept and slope and subtract the fixed estimate.</p>
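<p>Before running the full per-sample computation, the idea can be sanity-checked with the posterior means from the summary table above. This is only a sketch with the rounded means for the first four cafes (and since the mean of a difference equals the difference of the means, the point estimates come out the same either way):</p>

```python
import numpy as np

# Posterior means copied (rounded) from the az.summary table above: the
# per-cafe intercepts/slopes with the population mean "baked in" (equation 9)
alpha_cafe = np.array([2.380, 3.820, 2.606, 4.120])     # ab_subject[0..3, 0]
beta_cafe = np.array([-0.587, -1.402, -0.681, -0.707])  # ab_subject[0..3, 1]
a_bar, b_bar = 3.654, -1.049                            # fixed effects

# Subtracting the fixed effects recovers the lmer-style offsets
# (a_{cafe[i]} and b_{cafe[i]} in equation 7)
int_offsets = alpha_cafe - a_bar
slope_offsets = beta_cafe - b_bar
print(int_offsets.round(3))    # [-1.274  0.166 -1.048  0.466]
print(slope_offsets.round(3))  # [ 0.462 -0.353  0.368  0.342]
```

<p>For example, cafe 0’s varying intercept of about $-1.27$ says it sits roughly 1.27 minutes below the average morning wait time.</p>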
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Convert to pandas dataframe and take a glimpse at the first few rows
</span><span class="n">idata_m14_1_df</span> <span class="o">=</span> <span class="n">idata_m14_1</span><span class="p">.</span><span class="n">to_dataframe</span><span class="p">()</span>
<span class="n">idata_m14_1_df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>chain</th>
<th>draw</th>
<th>(posterior, a_bar)</th>
<th>(posterior, b_bar)</th>
<th>(posterior, ab_subject[0,0], 0, 0)</th>
<th>(posterior, ab_subject[0,1], 0, 1)</th>
<th>(posterior, ab_subject[1,0], 1, 0)</th>
<th>(posterior, ab_subject[1,1], 1, 1)</th>
<th>(posterior, ab_subject[10,0], 10, 0)</th>
<th>(posterior, ab_subject[10,1], 10, 1)</th>
<th>(posterior, ab_subject[11,0], 11, 0)</th>
<th>(posterior, ab_subject[11,1], 11, 1)</th>
<th>(posterior, ab_subject[12,0], 12, 0)</th>
<th>(posterior, ab_subject[12,1], 12, 1)</th>
<th>(posterior, ab_subject[13,0], 13, 0)</th>
<th>(posterior, ab_subject[13,1], 13, 1)</th>
<th>(posterior, ab_subject[14,0], 14, 0)</th>
<th>(posterior, ab_subject[14,1], 14, 1)</th>
<th>(posterior, ab_subject[15,0], 15, 0)</th>
<th>(posterior, ab_subject[15,1], 15, 1)</th>
<th>...</th>
<th>(log_likelihood, wait[97], 97)</th>
<th>(log_likelihood, wait[98], 98)</th>
<th>(log_likelihood, wait[99], 99)</th>
<th>(log_likelihood, wait[9], 9)</th>
<th>(sample_stats, tree_depth)</th>
<th>(sample_stats, max_energy_error)</th>
<th>(sample_stats, process_time_diff)</th>
<th>(sample_stats, perf_counter_diff)</th>
<th>(sample_stats, energy)</th>
<th>(sample_stats, step_size_bar)</th>
<th>(sample_stats, diverging)</th>
<th>(sample_stats, energy_error)</th>
<th>(sample_stats, lp)</th>
<th>(sample_stats, acceptance_rate)</th>
<th>(sample_stats, n_steps)</th>
<th>(sample_stats, largest_eigval)</th>
<th>(sample_stats, smallest_eigval)</th>
<th>(sample_stats, index_in_trajectory)</th>
<th>(sample_stats, step_size)</th>
<th>(sample_stats, perf_counter_start)</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0</td>
<td>0</td>
<td>3.397744</td>
<td>-0.993140</td>
<td>2.353823</td>
<td>-0.712216</td>
<td>3.936642</td>
<td>-1.328451</td>
<td>2.497521</td>
<td>-0.990675</td>
<td>4.589760</td>
<td>-1.271864</td>
<td>2.272038</td>
<td>-0.780358</td>
<td>3.400074</td>
<td>-1.307487</td>
<td>4.660517</td>
<td>-0.920542</td>
<td>3.967868</td>
<td>-1.339014</td>
<td>...</td>
<td>-0.592594</td>
<td>-0.280869</td>
<td>-1.783441</td>
<td>-0.404212</td>
<td>5</td>
<td>-0.452539</td>
<td>0.234946</td>
<td>0.067260</td>
<td>194.679539</td>
<td>0.246795</td>
<td>False</td>
<td>-0.226605</td>
<td>-167.432037</td>
<td>0.975607</td>
<td>31.0</td>
<td>NaN</td>
<td>NaN</td>
<td>-17</td>
<td>0.284311</td>
<td>192.355518</td>
</tr>
<tr>
<th>1</th>
<td>0</td>
<td>1</td>
<td>3.227032</td>
<td>-1.105823</td>
<td>2.486742</td>
<td>-0.657790</td>
<td>3.890044</td>
<td>-1.788579</td>
<td>2.894867</td>
<td>-0.741011</td>
<td>4.346072</td>
<td>-1.048541</td>
<td>2.446301</td>
<td>-0.678041</td>
<td>3.564795</td>
<td>-1.520221</td>
<td>5.013627</td>
<td>-1.128684</td>
<td>3.793134</td>
<td>-1.084814</td>
<td>...</td>
<td>-0.581570</td>
<td>-0.708670</td>
<td>-1.709776</td>
<td>-0.741664</td>
<td>4</td>
<td>0.498338</td>
<td>0.123327</td>
<td>0.033713</td>
<td>196.867266</td>
<td>0.246795</td>
<td>False</td>
<td>0.273832</td>
<td>-177.694232</td>
<td>0.809115</td>
<td>15.0</td>
<td>NaN</td>
<td>NaN</td>
<td>-8</td>
<td>0.284311</td>
<td>192.423125</td>
</tr>
<tr>
<th>2</th>
<td>0</td>
<td>2</td>
<td>3.393307</td>
<td>-0.926431</td>
<td>2.348434</td>
<td>-0.604619</td>
<td>3.905778</td>
<td>-1.355137</td>
<td>2.712834</td>
<td>-1.124770</td>
<td>4.409195</td>
<td>-1.291088</td>
<td>2.324233</td>
<td>-0.754508</td>
<td>3.586107</td>
<td>-1.562165</td>
<td>5.050191</td>
<td>-1.556993</td>
<td>4.122478</td>
<td>-1.718417</td>
<td>...</td>
<td>-0.452885</td>
<td>-0.109849</td>
<td>-2.293094</td>
<td>-0.559207</td>
<td>5</td>
<td>-0.382814</td>
<td>0.236803</td>
<td>0.063232</td>
<td>207.926089</td>
<td>0.246795</td>
<td>False</td>
<td>-0.347905</td>
<td>-176.112370</td>
<td>0.968229</td>
<td>31.0</td>
<td>NaN</td>
<td>NaN</td>
<td>6</td>
<td>0.284311</td>
<td>192.457135</td>
</tr>
<tr>
<th>3</th>
<td>0</td>
<td>3</td>
<td>3.750943</td>
<td>-1.109148</td>
<td>2.613325</td>
<td>-0.667234</td>
<td>3.682009</td>
<td>-1.293790</td>
<td>2.558511</td>
<td>-0.362557</td>
<td>4.548968</td>
<td>-1.266139</td>
<td>2.264383</td>
<td>-0.445725</td>
<td>3.102086</td>
<td>-0.903726</td>
<td>4.589499</td>
<td>-0.409875</td>
<td>4.063760</td>
<td>-1.249921</td>
<td>...</td>
<td>-1.239451</td>
<td>-0.574010</td>
<td>-0.906557</td>
<td>-1.015460</td>
<td>4</td>
<td>-0.530897</td>
<td>0.116930</td>
<td>0.037484</td>
<td>198.279760</td>
<td>0.246795</td>
<td>False</td>
<td>-0.024171</td>
<td>-180.489888</td>
<td>0.987683</td>
<td>15.0</td>
<td>NaN</td>
<td>NaN</td>
<td>-9</td>
<td>0.284311</td>
<td>192.520656</td>
</tr>
<tr>
<th>4</th>
<td>0</td>
<td>4</td>
<td>3.416951</td>
<td>-1.152993</td>
<td>2.478859</td>
<td>-0.812085</td>
<td>3.773041</td>
<td>-1.423143</td>
<td>2.136978</td>
<td>-0.465100</td>
<td>4.385045</td>
<td>-1.180823</td>
<td>2.160109</td>
<td>-0.395771</td>
<td>3.459758</td>
<td>-1.300131</td>
<td>5.527213</td>
<td>-2.107117</td>
<td>3.906480</td>
<td>-1.388326</td>
<td>...</td>
<td>-0.400278</td>
<td>-0.240346</td>
<td>-2.188396</td>
<td>-0.471960</td>
<td>5</td>
<td>-0.382498</td>
<td>0.241781</td>
<td>0.072736</td>
<td>207.993298</td>
<td>0.246795</td>
<td>False</td>
<td>-0.041904</td>
<td>-183.942618</td>
<td>0.999986</td>
<td>31.0</td>
<td>NaN</td>
<td>NaN</td>
<td>-24</td>
<td>0.284311</td>
<td>192.558443</td>
</tr>
</tbody>
</table>
<p>5 rows × 270 columns</p>
</div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Get the "unbaked in" varying intercept and slope
</span><span class="n">bayesian_int</span> <span class="o">=</span> <span class="nb">list</span><span class="p">()</span>
<span class="n">bayesian_slope</span> <span class="o">=</span> <span class="nb">list</span><span class="p">()</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
<span class="n">idata_m14_1_df</span><span class="p">[</span><span class="sa">f</span><span class="s">'varying_int_</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">idata_m14_1_df</span><span class="p">[</span> <span class="p">(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="sa">f</span><span class="s">'ab_subject[</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">,0]'</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">)]</span> <span class="o">-</span> <span class="n">idata_m14_1_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'a_bar'</span><span class="p">)]</span>
<span class="n">bayesian_int</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">idata_m14_1_df</span><span class="p">[</span><span class="sa">f</span><span class="s">'varying_int_</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">'</span><span class="p">].</span><span class="n">mean</span><span class="p">())</span>
<span class="n">idata_m14_1_df</span><span class="p">[</span><span class="sa">f</span><span class="s">'varying_slope_</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">idata_m14_1_df</span><span class="p">[</span> <span class="p">(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="sa">f</span><span class="s">'ab_subject[</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">,1]'</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="mi">1</span><span class="p">)]</span> <span class="o">-</span> <span class="n">idata_m14_1_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'b_bar'</span><span class="p">)]</span>
<span class="n">bayesian_slope</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">idata_m14_1_df</span><span class="p">[</span><span class="sa">f</span><span class="s">'varying_slope_</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">'</span><span class="p">].</span><span class="n">mean</span><span class="p">())</span>
</code></pre></div></div>
<p>We can now make a direct comparison between the <code class="language-plaintext highlighter-rouge">lmer</code> and <code class="language-plaintext highlighter-rouge">pymc</code> outputs. I’ll ignore the uncertainties for the sake of a cleaner plot.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">random_sims_int</span> <span class="o">=</span> <span class="n">random_sims</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">random_sims</span><span class="p">[</span><span class="s">'term'</span><span class="p">]</span><span class="o">==</span><span class="s">'(Intercept)'</span><span class="p">,</span> <span class="s">'mean'</span><span class="p">].</span><span class="n">copy</span><span class="p">()</span>
<span class="n">random_sims_slope</span> <span class="o">=</span> <span class="n">random_sims</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">random_sims</span><span class="p">[</span><span class="s">'term'</span><span class="p">]</span><span class="o">==</span><span class="s">'afternoon'</span><span class="p">,</span> <span class="s">'mean'</span><span class="p">].</span><span class="n">copy</span><span class="p">()</span>
<span class="n">f</span><span class="p">,</span> <span class="p">(</span><span class="n">ax0</span><span class="p">,</span> <span class="n">ax1</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">min_max_int</span> <span class="o">=</span> <span class="p">[</span><span class="nb">min</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">random_sims_int</span><span class="p">)</span> <span class="o">+</span> <span class="n">bayesian_int</span><span class="p">),</span> <span class="nb">max</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">random_sims_int</span><span class="p">)</span> <span class="o">+</span> <span class="n">bayesian_int</span><span class="p">)]</span>
<span class="n">min_max_slope</span> <span class="o">=</span> <span class="p">[</span><span class="nb">min</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">random_sims_slope</span><span class="p">)</span> <span class="o">+</span> <span class="n">bayesian_slope</span><span class="p">),</span> <span class="nb">max</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">random_sims_slope</span><span class="p">)</span> <span class="o">+</span> <span class="n">bayesian_slope</span><span class="p">)]</span>
<span class="c1"># intercepts
</span><span class="n">ax0</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">random_sims_int</span><span class="p">,</span> <span class="n">bayesian_int</span><span class="p">,</span> <span class="n">facecolors</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span> <span class="n">edgecolors</span><span class="o">=</span><span class="s">'navy'</span><span class="p">)</span>
<span class="n">ax0</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">min_max_int</span><span class="p">,</span> <span class="n">min_max_int</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'dashed'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">ax0</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'lmer intercept estimates'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'pymc intercept estimates'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'Comparison of varying intercepts'</span><span class="p">)</span>
<span class="c1"># slopes
</span><span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">random_sims_slope</span><span class="p">,</span> <span class="n">bayesian_slope</span><span class="p">,</span> <span class="n">facecolors</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span> <span class="n">edgecolors</span><span class="o">=</span><span class="s">'navy'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">min_max_slope</span><span class="p">,</span> <span class="n">min_max_slope</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'dashed'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'lmer slope estimates'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'pymc slope estimates'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'Comparison of varying slopes'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Text(0.5, 0, 'lmer slope estimates'),
Text(0, 0.5, 'pymc slope estimates'),
Text(0.5, 1.0, 'Comparison of varying slopes')]
</code></pre></div></div>
<p><img src="/assets/2022-09-13-mixed_effects_freqvsbayes_cafes_files/2022-09-13-mixed_effects_freqvsbayes_cafes_53_1.png" alt="png" /></p>
<p>As you can see, we get very similar cafe-specific estimates (varying effects) for the intercept and slope between the <code class="language-plaintext highlighter-rouge">lmer</code> and <code class="language-plaintext highlighter-rouge">pymc</code> approaches.</p>
<h1 id="summary">Summary</h1>
<p>In this post, I set out to compare different mixed model approaches. I looked at the equations and the programmatic implementations, and concluded by showing how the two methods arrive at the same answer. Getting there required a careful understanding of the differences in the equations and in the language- and package-specific implementations. Several points confused me while writing this post, but working through them provided opportunities for deeper understanding.</p>
<h1 id="acknowledgements-and-references">Acknowledgements and references</h1>
<p>Acknowledgements</p>
<ul>
<li>Special shoutout to Patrick Robotham (@probot) from the University of Bayes Discord channel for helping me work through <em>many</em> of my confusions.</li>
<li>Eric J. Daza for discussions about mixed effects modeling, which reminded me to improve my knowledge in this area.</li>
<li>Members of the Glymour group at UCSF for checking some of my code.</li>
</ul>
<p>References</p>
<ul>
<li><a href="https://stats.oarc.ucla.edu/other/mult-pkg/introduction-to-linear-mixed-models/">UCLA introduction to linear mixed models</a>.</li>
<li>Richard McElreath’s Statistical Rethinking for my introduction to Bayesian multilevel modeling and the <a href="https://github.com/pymc-devs/pymc-resources/blob/main/Rethinking_2/Chp_14.ipynb">Statistical Rethinking Chapter 14 repo</a>.</li>
<li>Andrzej Gałecki and Tomasz Burzykowski’s <a href="https://link.springer.com/book/10.1007/978-1-4614-3900-4">Linear Mixed-Effects Models Using R</a>, which references the <code class="language-plaintext highlighter-rouge">lme4</code> package. Dr. McElreath referenced this package as a non-Bayesian alternative in his book.</li>
<li>Andrew Gelman wrote about why he doesn’t like using “fixed and random effects” (in a <a href="https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/">blog</a> and in a <a href="https://projecteuclid.org/journals/annals-of-statistics/volume-33/issue-1/Analysis-of-variancewhy-it-is-more-important-than-ever/10.1214/009053604000001048.full">paper</a>).</li>
<li>TJ Mahr’s <a href="https://www.tjmahr.com/plotting-partial-pooling-in-mixed-effects-models/">partial pooling blog post</a>.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span> <span class="o">-</span><span class="n">p</span> <span class="n">aesara</span><span class="p">,</span><span class="n">aeppl</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The watermark extension is already loaded. To reload it, use:
%reload_ext watermark
Last updated: Tue Sep 13 2022
Python implementation: CPython
Python version : 3.10.6
IPython version : 8.4.0
aesara: 2.8.2
aeppl : 0.0.35
pymc : 4.1.7
xarray : 2022.6.0
pandas : 1.4.3
sys : 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:43:44) [Clang 13.0.1 ]
arviz : 0.12.1
matplotlib: 3.5.3
aesara : 2.8.2
numpy : 1.23.2
Watermark: 2.3.1
</code></pre></div></div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%%</span><span class="n">R</span><span class="w">
</span><span class="n">sessionInfo</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Monterey 12.5.1
Matrix products: default
LAPACK: /Users/blacar/opt/anaconda3/envs/pymc_env2/lib/libopenblasp-r0.3.21.dylib
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] merTools_0.5.2 arm_1.13-1 MASS_7.3-58.1 lme4_1.1-30
[5] Matrix_1.4-1 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.9
[9] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.8
[13] ggplot2_3.3.6 tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] httr_1.4.4 jsonlite_1.8.0 splines_4.1.3
[4] foreach_1.5.2 modelr_0.1.9 shiny_1.7.2
[7] assertthat_0.2.1 broom.mixed_0.2.9.4 googlesheets4_1.0.1
[10] cellranger_1.1.0 globals_0.16.1 pillar_1.8.1
[13] backports_1.4.1 lattice_0.20-45 glue_1.6.2
[16] digest_0.6.29 promises_1.2.0.1 rvest_1.0.3
[19] minqa_1.2.4 colorspace_2.0-3 httpuv_1.6.5
[22] htmltools_0.5.3 pkgconfig_2.0.3 broom_1.0.0
[25] listenv_0.8.0 haven_2.5.1 xtable_1.8-4
[28] mvtnorm_1.1-3 scales_1.2.1 later_1.3.0
[31] tzdb_0.3.0 googledrive_2.0.0 farver_2.1.1
[34] generics_0.1.3 ellipsis_0.3.2 withr_2.5.0
[37] furrr_0.3.1 cli_3.3.0 mime_0.12
[40] magrittr_2.0.3 crayon_1.5.1 readxl_1.4.1
[43] fs_1.5.2 future_1.27.0 fansi_1.0.3
[46] parallelly_1.32.1 nlme_3.1-159 xml2_1.3.3
[49] hms_1.1.2 gargle_1.2.0 lifecycle_1.0.1
[52] munsell_0.5.0 reprex_2.0.2 compiler_4.1.3
[55] rlang_1.0.4 blme_1.0-5 grid_4.1.3
[58] nloptr_2.0.3 iterators_1.0.14 labeling_0.4.2
[61] boot_1.3-28 gtable_0.3.0 codetools_0.2-18
[64] abind_1.4-5 DBI_1.1.3 R6_2.5.1
[67] lubridate_1.8.0 fastmap_1.1.0 utf8_1.2.2
[70] stringi_1.7.8 parallel_4.1.3 Rcpp_1.0.9
[73] vctrs_0.4.1 dbplyr_2.2.1 tidyselect_1.1.2
[76] coda_0.19-4
</code></pre></div></div>Ben LacarFor a while, I’ve wondered about the different approaches for multilevel modeling, also known as mixed effects modeling. My initial understanding is with a Bayesian perspective since I learned about it from Statistical Rethinking. But when hearing others talk about “fixed effects”, “varying effects”, “random effects”, and “mixed effects”, I had trouble connecting my own understanding of the concept to theirs. Even more perplexing, I wasn’t sure what the source(s) of the differences were: Is it a frequentist vs. Bayesian thing? Is it a statistical package thing? Is it because there are five different definitions of “fixed and random effects”, infamously observed by Andrew Gelman and why he avoids using those terms?LKJCorr and LKJCov in pymc2022-04-12T00:00:00+00:002022-04-12T00:00:00+00:00https://benslack19.github.io/data%20science/statistics/LKJcorrcov<p>While continuing to deep dive on covariance priors following my <a href="/data%20science/statistics/cov_matrix_weirdness/">prior post</a>, I investigated implementations in <code class="language-plaintext highlighter-rouge">pymc</code>. I played around with the <code class="language-plaintext highlighter-rouge">LKJcorr</code> and <code class="language-plaintext highlighter-rouge">LKJcov</code> functions (named for the authors Lewandowski, Kurowicka, and Joe). There are already great explanations out there of the rationale for these distributions.</p>
<ul>
<li><a href="https://www.sciencedirect.com/science/article/pii/S0047259X09000876?via%3Dihub">Original paper</a></li>
<li><a href="https://distribution-explorer.github.io/multivariate_continuous/lkj.html">Explanation from Distribution Explorer</a></li>
<li><a href="https://yingqijing.medium.com/lkj-correlation-distribution-in-stan-29927b69e9be">Stan example implementation</a></li>
<li><a href="https://stats.stackexchange.com/questions/304684/why-lkjcorr-is-a-good-prior-for-correlation-matrix">Why LKJcorr is a good prior for correlation matrix?</a></li>
<li><a href="https://docs.pymc.io/en/v3/pymc-examples/examples/case_studies/LKJ.html">PyMC documentation on covariance priors</a></li>
</ul>
<p>I felt those references were more useful after I got my hands dirty. I learned by continuing to try some of McElreath’s Statistical Rethinking problems and dissecting some of the <code class="language-plaintext highlighter-rouge">pymc</code> output. That’s what I document here. <a href="https://media.giphy.com/media/BpGWitbFZflfSUYuZ9/giphy.gif">Let’s do this</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">pymc3</span> <span class="k">as</span> <span class="n">pm</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">expit</span><span class="p">,</span> <span class="n">logit</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">from</span> <span class="nn">theano</span> <span class="kn">import</span> <span class="n">tensor</span> <span class="k">as</span> <span class="n">tt</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">8927</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="n">RANDOM_SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.89</span> <span class="c1"># sets default credible interval used by arviz
</span>
<span class="k">def</span> <span class="nf">standardize</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
<h1 id="lkjcorr-distribution"><code class="language-plaintext highlighter-rouge">LKJcorr</code> distribution</h1>
<p>This function is used to draw correlation values that would comprise one set of parameters that produce a covariance matrix. The other is a vector of standard deviations which we’ll cover in a later section. The smallest correlation matrix you can have is one that is 2x2. Let’s take 5 draws from an LKJ distribution for a 2x2 matrix with an <code class="language-plaintext highlighter-rouge">eta</code> value of 2. Since a matrix is always square, we only need one value for its size, represented by <code class="language-plaintext highlighter-rouge">n</code> in the function. I’ll come back to what <code class="language-plaintext highlighter-rouge">eta</code> is representing later.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># 5 draws of rho values for 2x2 correlation matrix
</span><span class="n">pm</span><span class="p">.</span><span class="n">LKJCorr</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mi">2</span><span class="p">).</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 0.44536007],
[ 0.47925197],
[ 0.11555357],
[ 0.32794659],
[-0.36670234]])
</code></pre></div></div>
<p>Where does the one unique value in each draw come from? Since the correlation matrix is 2x2 and the diagonal is 1, we only need one rho value to complete the matrix, because the off-diagonals are symmetric, like this, where I’m using $a$ as the placeholder for one of the five values we sampled above:</p>
\[\begin{bmatrix} 1 & a \\ a & 1 \end{bmatrix}\]
<p>Why do we even need the LKJ distribution? If we need a prior distribution for rho, can’t we use a beta distribution? After all, we only need one value since the off-diagonals are symmetric. But the 2x2 correlation matrix is a special case, as we immediately appreciate once we use a 3x3 matrix.</p>
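<p>One reason independent per-entry priors (like a beta on each rho) break down beyond 2x2 is that the rho values are not free of each other: together they must form a positive semi-definite matrix. A quick sketch with made-up rho values, each legal on its own, that jointly fail this requirement:</p>

```python
import numpy as np

# Each pairwise rho below is a legal correlation on its own,
# but together they cannot form a valid correlation matrix:
# a valid one must be positive semi-definite.
R_bad = np.array(
    [
        [1.0, 0.9, 0.9],
        [0.9, 1.0, -0.9],
        [0.9, -0.9, 1.0],
    ]
)

eigenvalues = np.linalg.eigvalsh(R_bad)
print(eigenvalues.min() < 0)  # True: a negative eigenvalue, so not valid
```

The LKJ distribution sidesteps this by only ever generating valid correlation matrices.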
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># 5 draws of rho values for 3x3 correlation matrix
</span><span class="n">pm</span><span class="p">.</span><span class="n">LKJCorr</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mi">2</span><span class="p">).</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[-0.03769077, -0.11296013, -0.1146735 ],
[ 0.02914172, 0.87776358, 0.23930909],
[ 0.22782414, 0.17480301, -0.33129416],
[-0.10766053, 0.01332989, -0.47423766],
[-0.11734588, -0.56739363, -0.03539502]])
</code></pre></div></div>
<p>Where do the three unique values from each draw come from? We now have three unique rho values which are symmetric around the diagonal vector of 1s. I’ll use $a$, $b$, and $c$ to represent rho values between the first and second parameters, first and third parameters, and second and third parameters, respectively.</p>
\[\begin{bmatrix} 1 & a & b \\ a & 1 & c \\ b & c & 1 \end{bmatrix}\]
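<p>Assuming the flat draw lists the upper-triangle entries row by row (i.e., in the order $a$, $b$, $c$ above), a small helper can rebuild the full symmetric matrix. The function name here is my own, not part of <code class="language-plaintext highlighter-rouge">pymc</code>:</p>

```python
import numpy as np

def corr_matrix_from_flat(rho_flat, n):
    """Rebuild a symmetric n x n correlation matrix from the
    n*(n-1)/2 off-diagonal values of one LKJCorr draw (assuming
    row-major upper-triangle ordering)."""
    R = np.eye(n)
    iu = np.triu_indices(n, k=1)  # upper-triangle indices, row by row
    R[iu] = rho_flat
    R[iu[1], iu[0]] = rho_flat    # mirror into the lower triangle
    return R

R3 = corr_matrix_from_flat([0.1, 0.2, 0.3], 3)
# R3[0, 1] is a, R3[0, 2] is b, R3[1, 2] is c, mirrored below the diagonal
```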
<p>We see this same pattern continue when we increase the dimensions of the covariance matrix by one yet again.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># 5 draws of rho values for 4x4 correlation matrix
</span><span class="n">pm</span><span class="p">.</span><span class="n">LKJCorr</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mi">2</span><span class="p">).</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[-0.38633472, 0.68960155, 0.48638167, -0.51097999, -0.33741679,
0.41864488],
[ 0.21746371, -0.3314052 , 0.55754886, 0.14152549, 0.426748 ,
-0.03264997],
[-0.24486788, -0.01114335, 0.09792299, 0.52026915, 0.25866165,
0.50274309],
[ 0.89416089, -0.23197608, -0.29084055, -0.2306694 , -0.37897343,
0.10800676],
[-0.50015375, 0.30615676, -0.16429283, -0.36701092, -0.17276691,
-0.34336828]])
</code></pre></div></div>
<p>The six unique values when the correlation matrix is 4x4 are shown in this arrangement. Again, the diagonal is 1, while the off-diagonals are symmetric around it.</p>
\[\begin{bmatrix} 1 & a & b & c \\ a & 1 & d & e \\ b & d & 1 & f \\ c & e & f & 1 \end{bmatrix}\]
<p>Above I’ve illustrated what each draw of <code class="language-plaintext highlighter-rouge">LKJcorr</code> represents as it varies with <code class="language-plaintext highlighter-rouge">n</code>, but the values themselves are controlled by the parameter <code class="language-plaintext highlighter-rouge">eta</code>. Since the rho values are correlation coefficients, they are bounded by -1 and 1. The <code class="language-plaintext highlighter-rouge">eta</code> parameter influences the shape of the distribution within these bounds.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># repo code (for correlation matrix of 2x2)
</span><span class="n">f</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">textloc</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">]]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
<span class="k">for</span> <span class="n">eta</span><span class="p">,</span> <span class="n">loc</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="n">textloc</span><span class="p">):</span>
<span class="n">R</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">LKJCorr</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="n">eta</span><span class="p">).</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_kde</span><span class="p">(</span><span class="n">R</span><span class="p">,</span> <span class="n">plot_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"alpha"</span><span class="p">:</span> <span class="mf">0.8</span><span class="p">},</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">text</span><span class="p">(</span><span class="n">loc</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">loc</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">"eta = %s"</span> <span class="o">%</span> <span class="p">(</span><span class="n">eta</span><span class="p">),</span> <span class="n">horizontalalignment</span><span class="o">=</span><span class="s">"center"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"correlation"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"Density"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="si">}</span><span class="s">x</span><span class="si">{</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="si">}</span><span class="s"> correlation matrix"</span><span class="p">);</span>
</code></pre></div></div>
<p><img src="/assets/2022-04-22-LKJcorrcov_files/2022-04-22-LKJcorrcov_11_0.png" alt="png" /></p>
<p>You can see that an <code class="language-plaintext highlighter-rouge">eta</code> value of 2 is fairly conservative, placing most of the probability mass on low correlations. The size of the correlation matrix itself also influences the distribution, particularly at the tails.</p>
<h1 id="manually-building-a-covariance-matrix">Manually building a covariance matrix</h1>
<p>Remember that at this point we don’t have the covariance matrix yet. We only have sampled <em>correlation</em> matrices. But we can get the covariance matrix once we take some sampled sigmas. Let’s do that to see how this would look in 5 draws from a prior distribution. We wouldn’t do this for a real problem. This is just to demonstrate what sampling would look like, step-by-step. We’ll go back to a 2x2 covariance matrix to make things simple.</p>
<p>For each pass through this loop, we will:</p>
<ul>
<li>take one rho value, since this is a 2x2 correlation matrix</li>
<li>take two standard deviation (sigma) values and they’ll be from slightly different distributions just to make things interesting (Exp(1) and Exp(0.5))</li>
<li>arrange the sigmas as a vector</li>
<li>generate a new covariance matrix from the rho and sigmas</li>
<li>output the covariance matrix and the values that went into it</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
<span class="c1"># the [0] is just to get the value out of the array
</span> <span class="n">rho</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">LKJCorr</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mi">2</span><span class="p">).</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="n">Rmat</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="n">rho</span><span class="p">],</span> <span class="p">[</span><span class="n">rho</span><span class="p">,</span> <span class="mi">1</span><span class="p">]])</span>
<span class="c1"># the sigmas themselves need to be sampled; again the [0] is to get the value out of the array
</span> <span class="n">sigma_a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="mf">1.0</span><span class="p">).</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">sigma_b</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="mf">0.5</span><span class="p">).</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">sigmas</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">sigma_a</span><span class="p">,</span> <span class="n">sigma_b</span><span class="p">])</span> <span class="c1"># arrange the sigmas as a vector
</span> <span class="n">Sigma</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">sigmas</span><span class="p">).</span><span class="n">dot</span><span class="p">(</span><span class="n">Rmat</span><span class="p">).</span><span class="n">dot</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">sigmas</span><span class="p">))</span> <span class="c1"># use one of the weird ways a covariance matrix is made
</span> <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"sample </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="se">\n</span><span class="s"> -- sigma for a: </span><span class="si">{</span><span class="n">sigma_a</span><span class="p">:</span><span class="mf">0.3</span><span class="n">f</span><span class="si">}</span><span class="s">,</span><span class="se">\t</span><span class="s"> sigma for b: </span><span class="si">{</span><span class="n">sigma_b</span><span class="p">:</span><span class="mf">0.3</span><span class="n">f</span><span class="si">}</span><span class="s">,</span><span class="se">\t</span><span class="s">Rho: </span><span class="si">{</span><span class="n">rho</span><span class="p">:</span><span class="mf">0.3</span><span class="n">f</span><span class="si">}</span><span class="s"> </span><span class="se">\n</span><span class="s"> -- covariance matrix:</span><span class="se">\n</span><span class="si">{</span><span class="n">Sigma</span><span class="si">}</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sample 0
-- sigma for a: 0.411, sigma for b: 2.389, Rho: -0.289
-- covariance matrix:
[[ 0.16873355 -0.28350151]
[-0.28350151 5.7070884 ]]
sample 1
-- sigma for a: 1.969, sigma for b: 10.462, Rho: 0.436
-- covariance matrix:
[[ 3.87629967 8.97898901]
[ 8.97898901 109.46212826]]
sample 2
-- sigma for a: 1.932, sigma for b: 2.676, Rho: -0.404
-- covariance matrix:
[[ 3.73187298 -2.08904139]
[-2.08904139 7.1588578 ]]
sample 3
-- sigma for a: 0.601, sigma for b: 2.326, Rho: 0.233
-- covariance matrix:
[[0.36063857 0.32606953]
[0.32606953 5.41025897]]
sample 4
-- sigma for a: 3.659, sigma for b: 0.002, Rho: 0.458
-- covariance matrix:
[[1.33858317e+01 4.08000425e-03]
[4.08000425e-03 5.92186930e-06]]
</code></pre></div></div>
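<p>Any covariance matrix assembled this way should come out symmetric and positive semi-definite. As a quick sanity check (a sketch, with the numbers simply copied from sample 0 of the printed output above):</p>

```python
import numpy as np

# Covariance matrix from sample 0 above, copied from the printed output
Sigma = np.array([[0.16873355, -0.28350151],
                  [-0.28350151, 5.7070884]])

# A valid covariance matrix is symmetric with non-negative eigenvalues
assert np.allclose(Sigma, Sigma.T)
eigvals = np.linalg.eigvalsh(Sigma)
print("eigenvalues:", eigvals)
assert np.all(eigvals >= 0)

# The implied correlation should match the rho that generated it (-0.289)
rho = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print("implied rho:", round(rho, 3))
```

<p>Dividing the off-diagonal by the product of the standard deviations recovers the correlation coefficient, which is just the covariance construction run in reverse.</p>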
<h1 id="lkjcholeskycov-distribution"><code class="language-plaintext highlighter-rouge">LKJCholeskyCov</code> distribution</h1>
<p>Unlike <code class="language-plaintext highlighter-rouge">LKJCorr</code>, there’s no <code class="language-plaintext highlighter-rouge">.dist</code> method that we can sample from directly. However, we can wrap this in a model container and sample from it. Again, this is not recommended practice. It is merely to get an idea of what this function is producing. Check out the <a href="https://docs.pymc.io/en/v3/pymc-examples/examples/case_studies/LKJ.html">pymc example</a> I referenced above for an authoritative source.</p>
<p>(One difference compared to the manual way of constructing the covariance matrix above is that I don’t think there’s a way to specify different prior distributions for the sigmas in the <code class="language-plaintext highlighter-rouge">sd_dist</code> parameter, at least with version 3.11.0 of <code class="language-plaintext highlighter-rouge">pymc</code>.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">m1</span><span class="p">:</span>
<span class="n">packed_L</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">LKJCholeskyCov</span><span class="p">(</span><span class="s">"packed_L"</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">sd_dist</span><span class="o">=</span><span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="mf">1.0</span><span class="p">))</span>
<span class="n">trace_m1</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="n">tune</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [packed_L]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 32 seconds.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.4781781973772051, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.705022869734104, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pm</span><span class="p">.</span><span class="n">trace_to_dataframe</span><span class="p">(</span><span class="n">trace_m1</span><span class="p">).</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>packed_L__0</th>
<th>packed_L__1</th>
<th>packed_L__2</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2.248979</td>
<td>-1.339856</td>
<td>2.574068</td>
</tr>
<tr>
<th>1</th>
<td>1.521427</td>
<td>-1.494167</td>
<td>1.882593</td>
</tr>
<tr>
<th>2</th>
<td>2.729876</td>
<td>-1.076162</td>
<td>2.427225</td>
</tr>
<tr>
<th>3</th>
<td>0.781399</td>
<td>-0.943537</td>
<td>1.931551</td>
</tr>
<tr>
<th>4</th>
<td>1.161240</td>
<td>-0.111627</td>
<td>0.320429</td>
</tr>
</tbody>
</table>
</div>
<p>With each sample, we have three values. They form a lower triangular matrix, but it is <em>not</em> the covariance matrix itself. Rather, it is the <strong>Cholesky decomposition</strong> of the covariance matrix. For practical purposes and for interpretation, it is better to use the following instantiation where we can get the rho and sigma values back automatically.</p>
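<p>To see what "Cholesky decomposition of the covariance matrix" means concretely, we can unpack one draw by hand. Below is a sketch using the three <code class="language-plaintext highlighter-rouge">packed_L</code> values from row 0 of the table above, interpreting them as the lower-triangular entries in row-major order ([L00, L10, L11] — the same ordering you can verify against the <code class="language-plaintext highlighter-rouge">chol_stds</code> columns in the next model):</p>

```python
import numpy as np

# Packed values from row 0 of the table above, read as a lower-triangular
# matrix in row-major order: [L00, L10, L11]
packed_L = [2.248979, -1.339856, 2.574068]
L = np.array([[packed_L[0], 0.0],
              [packed_L[1], packed_L[2]]])

# The covariance matrix is recovered as L @ L.T
Sigma = L @ L.T
print(Sigma)

# The sigmas and rho fall out of the recovered covariance matrix
sigmas = np.sqrt(np.diag(Sigma))
rho = Sigma[0, 1] / (sigmas[0] * sigmas[1])
print("sigmas:", np.round(sigmas, 3), "rho:", round(rho, 3))
```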
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">m2</span><span class="p">:</span>
<span class="n">chol</span><span class="p">,</span> <span class="n">corr</span><span class="p">,</span> <span class="n">stds</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">LKJCholeskyCov</span><span class="p">(</span>
<span class="s">"chol"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">sd_dist</span><span class="o">=</span><span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="mf">1.0</span><span class="p">),</span> <span class="n">compute_corr</span><span class="o">=</span><span class="bp">True</span>
<span class="p">)</span>
<span class="n">cov</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Deterministic</span><span class="p">(</span><span class="s">"cov"</span><span class="p">,</span> <span class="n">chol</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">chol</span><span class="p">.</span><span class="n">T</span><span class="p">))</span>
<span class="n">trace_m2</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="n">tune</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [chol]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 29 seconds.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.47696968632616427, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.6910481731222186, but should be close to 0.8. Try to increase the number of tuning steps.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5735096676868859, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">trace_m2_df</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">trace_to_dataframe</span><span class="p">(</span><span class="n">trace_m2</span><span class="p">)</span>
<span class="n">trace_m2_df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>chol__0</th>
<th>chol__1</th>
<th>chol__2</th>
<th>chol_stds__0</th>
<th>chol_stds__1</th>
<th>chol_corr__0_0</th>
<th>chol_corr__0_1</th>
<th>chol_corr__1_0</th>
<th>chol_corr__1_1</th>
<th>cov__0_0</th>
<th>cov__0_1</th>
<th>cov__1_0</th>
<th>cov__1_1</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>3.622728</td>
<td>-0.515597</td>
<td>1.710626</td>
<td>3.622728</td>
<td>1.786640</td>
<td>1.0</td>
<td>-0.288585</td>
<td>-0.288585</td>
<td>1.0</td>
<td>13.124155</td>
<td>-1.867867</td>
<td>-1.867867</td>
<td>3.192082</td>
</tr>
<tr>
<th>1</th>
<td>0.697870</td>
<td>-0.172701</td>
<td>1.485430</td>
<td>0.697870</td>
<td>1.495436</td>
<td>1.0</td>
<td>-0.115485</td>
<td>-0.115485</td>
<td>1.0</td>
<td>0.487023</td>
<td>-0.120523</td>
<td>-0.120523</td>
<td>2.236328</td>
</tr>
<tr>
<th>2</th>
<td>0.600202</td>
<td>0.428665</td>
<td>1.425867</td>
<td>0.600202</td>
<td>1.488909</td>
<td>1.0</td>
<td>0.287906</td>
<td>0.287906</td>
<td>1.0</td>
<td>0.360242</td>
<td>0.257286</td>
<td>0.257286</td>
<td>2.216850</td>
</tr>
<tr>
<th>3</th>
<td>1.777497</td>
<td>-0.023333</td>
<td>0.703293</td>
<td>1.777497</td>
<td>0.703680</td>
<td>1.0</td>
<td>-0.033159</td>
<td>-0.033159</td>
<td>1.0</td>
<td>3.159494</td>
<td>-0.041475</td>
<td>-0.041475</td>
<td>0.495165</td>
</tr>
<tr>
<th>4</th>
<td>0.671842</td>
<td>-0.077284</td>
<td>0.361800</td>
<td>0.671842</td>
<td>0.369962</td>
<td>1.0</td>
<td>-0.208898</td>
<td>-0.208898</td>
<td>1.0</td>
<td>0.451371</td>
<td>-0.051923</td>
<td>-0.051923</td>
<td>0.136872</td>
</tr>
</tbody>
</table>
</div>
<p>We can verify the values of the covariance matrix by using the standard deviations and correlation coefficients of the posterior.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
<span class="n">sigmas</span> <span class="o">=</span> <span class="n">trace_m2_df</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">[</span><span class="s">'chol_stds__0'</span><span class="p">,</span> <span class="s">'chol_stds__1'</span><span class="p">]]</span>
<span class="n">rho</span> <span class="o">=</span> <span class="n">trace_m2_df</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="s">'chol_corr__0_1'</span><span class="p">]</span>
<span class="n">Rmat</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="n">rho</span><span class="p">],</span> <span class="p">[</span><span class="n">rho</span><span class="p">,</span> <span class="mi">1</span><span class="p">]])</span>
<span class="n">Sigma</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">sigmas</span><span class="p">).</span><span class="n">dot</span><span class="p">(</span><span class="n">Rmat</span><span class="p">).</span><span class="n">dot</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">sigmas</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'draw: </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">'</span><span class="p">,</span> <span class="n">Sigma</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>draw: 0
[[13.12415508 -1.86786674]
[-1.86786674 3.19208164]]
draw: 1
[[ 0.48702298 -0.1205228 ]
[-0.1205228 2.23632763]]
draw: 2
[[0.36024213 0.25728563]
[0.25728563 2.21685039]]
draw: 3
[[ 3.15949443 -0.0414752 ]
[-0.0414752 0.49516542]]
draw: 4
[[ 0.45137105 -0.05192275]
[-0.05192275 0.13687202]]
</code></pre></div></div>
<p>Compare the printed values from each sample draw and you see that we get an exact match with the <code class="language-plaintext highlighter-rouge">cov__0_0</code>, <code class="language-plaintext highlighter-rouge">cov__0_1</code>, <code class="language-plaintext highlighter-rouge">cov__1_0</code>, and <code class="language-plaintext highlighter-rouge">cov__1_1</code> columns.</p>
<h1 id="summary">Summary</h1>
<p>In this post, I wanted to better understand the <code class="language-plaintext highlighter-rouge">LKJcorr</code> and <code class="language-plaintext highlighter-rouge">LKJcov</code> outputs. You often wouldn’t need to go into this much detail, but it helped me gain a better understanding of using and interpreting these distributions when applied to problems with multivariate normals, including varying effects models.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Tue Apr 12 2022
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.20.0
seaborn : 0.11.1
matplotlib: 3.3.4
pymc3 : 3.11.0
scipy : 1.6.0
theano : 1.1.0
pandas : 1.2.1
numpy : 1.20.1
sys : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12)
[Clang 11.0.1 ]
arviz : 0.11.1
Watermark: 2.1.0
</code></pre></div></div>

<p>Ben Lacar</p>

<p>While continuing to deep dive on covariance priors following my prior post, I investigated implementations in pymc. I played around with the LKJcorr and LKJcov functions (named for the authors Lewandowski, Kurowicka, and Joe). There are already great explanations out there of the rationale for these distributions.</p>

<h1 id="weird-ways-that-covariance-matrices-are-made">Weird ways that covariance matrices are made</h1>

<p>2022-03-28 · <a href="https://benslack19.github.io/data%20science/statistics/cov_matrix_weirdness">https://benslack19.github.io/data%20science/statistics/cov_matrix_weirdness</a></p>

<p>Covariance priors for multivariate normal models are an important tool for the implementation of varying effects. By representing more than one parameter with a covarying structure, even more partial pooling can result than if the parameters had their own separate distribution. Before talking more about varying effects, I thought I’d write about the weird ways that covariance matrices are made.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">pymc3</span> <span class="k">as</span> <span class="n">pm</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="nn">scipy.linalg</span> <span class="k">as</span> <span class="n">linalg</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">8927</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="n">RANDOM_SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.89</span> <span class="c1"># sets default credible interval used by arviz
</span>
<span class="k">def</span> <span class="nf">standardize</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
<p>What is a covariance matrix? One way to think of it is through an analogy: a standard deviation is to a univariate normal distribution as a covariance matrix is to a multivariate normal distribution.</p>
<p>In equation form, you could have variables that look like this:</p>
\[x \sim \text{Normal}(\mu, \sigma) \tag{univariate normal distribution}\]
\[\begin{bmatrix}x_1 \\ x_2 \\ ... \\ x_n \end{bmatrix} \sim \text{MVNormal} \left( \begin{bmatrix} \mu_1 \\ \mu_2 \\ ... \\ \mu_n \end{bmatrix} , \Sigma \right) \tag{multivariate normal distribution}\]
<p>In both cases, we have variables parameterized by random distributions. In the univariate case, a single draw from the distribution will result in one value. In the multivariate case, a single draw will result in <em>n</em> values, one for each parameter. In the multivariate normal (MVN) case, we have a vector of means ($\mu$), but the interesting relationships will result from the covariance matrix $\Sigma$. It will tell us about the variability of the parameters and also possible correlative relationships between them. This is seen in how we can construct covariance matrices.</p>
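<p>The "one draw gives <em>n</em> values" point is easy to see numerically. Here is a minimal sketch with arbitrary numbers (not yet the cafe example):</p>

```python
import numpy as np

rng = np.random.default_rng(19)

# Univariate: one draw from Normal(mu, sigma) gives one value
x = rng.normal(loc=0.0, scale=1.0)
print(x)

# Multivariate (n=2): one draw gives two values at once, and their
# relationship is governed by the covariance matrix Sigma
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, -0.35],
                  [-0.35, 0.25]])
x1, x2 = rng.multivariate_normal(mu, Sigma)
print(x1, x2)
```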
<p>Using numbers helps me understand things so let’s use Dr. McElreath’s example involving cafe waiting times. For the purposes of this post, you don’t need to know the details of the problem, but it is described in <a href="https://www.youtube.com/watch?v=yfXpjmWgyXU&list=PLDcUM9US4XdNM4Edgs7weiyIguLSToZRI&index=17&t=484s">this lecture</a>.</p>
<p>The multivariate normal distribution for this cafe waiting times example is described here:</p>
\[\begin{bmatrix}\alpha_{\text{cafe}} \\ \beta_{\text{cafe}} \end{bmatrix} \sim \text{MVNormal} \left( \begin{bmatrix}\alpha \\ \beta \end{bmatrix} , \textbf{S} \right) \tag{population of varying effects}\]
<p>We’ll create a simple 2x2 covariance matrix but the lessons can be extended to larger sizes. To construct it, we’ll need values for each parameter’s standard deviation (what I’ll call $\sigma$ below) and a correlation coefficient $\rho$. For a proper multivariate normal distribution, we’ll also need values for the means (the $\mu$ vector described above), denoted as <em>a</em> and <em>b</em>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a</span> <span class="o">=</span> <span class="mf">3.5</span> <span class="c1"># average morning wait time
</span><span class="n">b</span> <span class="o">=</span> <span class="o">-</span><span class="mf">1.0</span> <span class="c1"># average difference afternoon wait time
</span><span class="n">sigma_a</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="c1"># std dev in intercepts
</span><span class="n">sigma_b</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="c1"># std dev in slopes
</span><span class="n">rho</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.7</span> <span class="c1"># correlation between intercepts and slopes
</span></code></pre></div></div>
<p>While our focus is on the covariance matrix, let’s get the first term of the MVN distribution out of the way. I’ll generate the vector of the averages which is straightforward.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Mu</span> <span class="o">=</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Vector of means: "</span><span class="p">,</span> <span class="n">Mu</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Vector of means: [3.5, -1.0]
</code></pre></div></div>
<h1 id="intuitive-construction">Intuitive construction</h1>
<p>The first way the matrix can be made is the most intuitive to me.</p>
\[\textbf{S} = \begin{pmatrix} \sigma_{\alpha}^2 & \rho\sigma_{\alpha}\sigma_{\beta} \\ \rho\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^2 \end{pmatrix}\]
<p>The diagonals show each individual parameter’s variance (standard deviation squared) while the off-diagonal shows the co-variance, represented as the correlation coefficient $\rho$ multiplied by the parameters’ standard deviations.</p>
<p>I’ll use <code class="language-plaintext highlighter-rouge">Sigma1</code> with capital S to represent this covariance matrix with the <code class="language-plaintext highlighter-rouge">1</code> representing this first method of assembly but as you’ll see, they will be equivalent. (In equations like the one shown above, the covariance matrix is represented by a bold, capital S.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cov_ab</span> <span class="o">=</span> <span class="n">rho</span> <span class="o">*</span> <span class="n">sigma_a</span> <span class="o">*</span> <span class="n">sigma_b</span>
<span class="n">Sigma1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="n">sigma_a</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="n">cov_ab</span><span class="p">],</span> <span class="p">[</span><span class="n">cov_ab</span><span class="p">,</span> <span class="n">sigma_b</span><span class="o">**</span><span class="mi">2</span><span class="p">]])</span>
<span class="n">Sigma1</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 1. , -0.35],
[-0.35, 0.25]])
</code></pre></div></div>
<p>The important parts are the off-diagonals, which show a negative covariance between the $\alpha$ and $\beta$ terms. They are symmetric because the calculation is the same in either direction. Hopefully there’s no confusion in how this covariance matrix resulted.</p>
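<p>One way to convince yourself that <code class="language-plaintext highlighter-rouge">Sigma1</code> behaves as advertised is to simulate from the multivariate normal it defines and check that the empirical covariance recovers it. A quick simulation sketch:</p>

```python
import numpy as np

rng = np.random.default_rng(8927)  # reusing the notebook's seed

Mu = [3.5, -1.0]
Sigma1 = np.array([[1.0, -0.35],
                   [-0.35, 0.25]])

# Draw many samples from the MVN defined by Mu and Sigma1
draws = rng.multivariate_normal(Mu, Sigma1, size=100_000)

# The empirical covariance should be close to Sigma1 ...
print(np.round(np.cov(draws.T), 3))

# ... and the empirical correlation close to rho = -0.7
print(round(float(np.corrcoef(draws.T)[0, 1]), 3))
```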
<h1 id="standard-deviation-diagonals">Standard deviation diagonals</h1>
<p>The second method for building the covariance matrix will be weirder:</p>
<ul>
<li>arrange the standard deviations along the diagonal and fill in zeros everywhere else</li>
<li>matrix multiply by a <em>correlation</em> matrix</li>
<li>matrix multiply by the same arrangement of standard deviations along the diagonal</li>
</ul>
<p>Here’s how it looks in equation form:</p>
\[\textbf{S} = \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix} \textbf{R} \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix}\]
<p>To create a matrix where the standard deviations are on the diagonal and zeros are everywhere else, we can use a handy <code class="language-plaintext highlighter-rouge">numpy</code> function called <code class="language-plaintext highlighter-rouge">diag</code> that can be applied to the parameter standard deviations arranged in a vector:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># put the sigmas in a vector first
</span><span class="n">sigmas</span> <span class="o">=</span> <span class="p">[</span><span class="n">sigma_a</span><span class="p">,</span> <span class="n">sigma_b</span><span class="p">]</span>
<span class="c1"># represent on the diagonal
</span><span class="n">sigma_diag</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">sigmas</span><span class="p">)</span>
<span class="n">sigma_diag</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[1. , 0. ],
[0. , 0.5]])
</code></pre></div></div>
<p>The $\textbf{R}$ matrix has <code class="language-plaintext highlighter-rouge">rho</code>, the correlation between the two parameters, in the off-diagonals. The diagonals are 1 since each parameter is always perfectly correlated with itself.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Rmat</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="n">rho</span><span class="p">],</span> <span class="p">[</span><span class="n">rho</span><span class="p">,</span> <span class="mi">1</span><span class="p">]])</span>
<span class="n">Rmat</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 1. , -0.7],
[-0.7, 1. ]])
</code></pre></div></div>
<p>Now the final step is the matrix multiplication. In <code class="language-plaintext highlighter-rouge">numpy</code>, you can do this with a small chain of matrix multiplication (taken from <a href="https://stackoverflow.com/questions/11838352/multiply-several-matrices-in-numpy">this SO post</a>).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Sigma2</span> <span class="o">=</span> <span class="n">sigma_diag</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">Rmat</span><span class="p">).</span><span class="n">dot</span><span class="p">(</span><span class="n">sigma_diag</span><span class="p">)</span>
<span class="n">Sigma2</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 1. , -0.35],
[-0.35, 0.25]])
</code></pre></div></div>
<p>As expected, we get the same values of the covariance matrix as we did with the previous method.</p>
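<p>To convince ourselves, here’s a self-contained sketch that rebuilds both versions side by side, assuming the same parameter values as above, and confirms they match:</p>

```python
import numpy as np

# parameter values assumed from earlier in the post
sigma_a, sigma_b, rho = 1.0, 0.5, -0.7

# method 1: fill in the variances and the covariance directly
cov_ab = rho * sigma_a * sigma_b
Sigma1 = np.array([[sigma_a**2, cov_ab], [cov_ab, sigma_b**2]])

# method 2: sandwich the correlation matrix between diagonal sd matrices
sigma_diag = np.diag([sigma_a, sigma_b])
Rmat = np.array([[1.0, rho], [rho, 1.0]])
Sigma2 = sigma_diag @ Rmat @ sigma_diag

print(np.allclose(Sigma1, Sigma2))  # True
```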
<h1 id="cholesky-factors">Cholesky factors</h1>
<p>Ok, now we have the third method of creating a covariance matrix. As promised, it gets even weirder. It deserves its own exploration, but I’ll just show how it works now and explain why later. The first thing we need to do is get the Cholesky factor, which can be derived from the $\textbf{R}$ correlation matrix. Other sources, like <a href="https://en.wikipedia.org/wiki/Cholesky_decomposition">the Wikipedia page</a>, explain Cholesky factors in more depth.</p>
<p>The matrix $\textbf{R}$ can be derived from this Cholesky factor with the following equation:</p>
<p>$ \textbf{R} = \textbf{LL}^\intercal $</p>
<p>Accordingly, we can substitute for $\textbf{R}$ in the equation we saw above:</p>
\[\textbf{S} = \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix} \textbf{LL}^\intercal \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix}\]
<p>$\textbf{L}$ is <strong>not</strong> simply the lower triangle of the correlation matrix.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># WRONG - this is not how to get L
</span><span class="n">np</span><span class="p">.</span><span class="n">tril</span><span class="p">(</span><span class="n">Rmat</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 1. , 0. ],
[-0.7, 1. ]])
</code></pre></div></div>
<p>There is a different <code class="language-plaintext highlighter-rouge">numpy</code> function that calculates the lower triangle properly. (Note that <code class="language-plaintext highlighter-rouge">scipy.linalg.cholesky</code> returns the upper triangle $\textbf{U} = \textbf{L}^\intercal$ by default. With that factor, you’d compute $\textbf{R} = \textbf{U}^\intercal \textbf{U}$ instead.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># numpy.linalg.cholesky does the lower triangle
</span><span class="n">L</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">cholesky</span><span class="p">(</span><span class="n">Rmat</span><span class="p">)</span>
<span class="n">L</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 1. , 0. ],
[-0.7 , 0.71414284]])
</code></pre></div></div>
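<p>We can verify that multiplying this factor by its transpose recovers $\textbf{R}$, and that <code class="language-plaintext highlighter-rouge">scipy.linalg.cholesky</code> returns the transpose of what <code class="language-plaintext highlighter-rouge">numpy</code> gives us. A quick check, assuming the same <code class="language-plaintext highlighter-rouge">rho</code> as above:</p>

```python
import numpy as np
from scipy.linalg import cholesky as scipy_cholesky

rho = -0.7  # value assumed from earlier in the post
Rmat = np.array([[1.0, rho], [rho, 1.0]])

L = np.linalg.cholesky(Rmat)  # numpy: lower-triangular factor
U = scipy_cholesky(Rmat)      # scipy default: upper-triangular factor

print(np.allclose(L @ L.T, Rmat))  # True
print(np.allclose(U.T @ U, Rmat))  # True
print(np.allclose(U, L.T))         # True: they are transposes of each other
```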
<p>In code, we can get this third re-construction of $\textbf{S}$ like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Sigma3</span> <span class="o">=</span> <span class="n">sigma_diag</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">L</span><span class="p">).</span><span class="n">dot</span><span class="p">(</span><span class="n">L</span><span class="p">.</span><span class="n">T</span><span class="p">).</span><span class="n">dot</span><span class="p">(</span><span class="n">sigma_diag</span><span class="p">)</span>
<span class="n">Sigma3</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 1. , -0.35],
[-0.35, 0.25]])
</code></pre></div></div>
<p>As we would expect, all three ways to get a covariance matrix give equivalent results. Why would you even use this last, strange way? It will have to do with sampling efficiency in a varying effects problem. The Cholesky factors will allow us to generate non-centered parameterizations. I’ll cover this in a later post.</p>
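<p>Here’s a small preview of why the Cholesky factor is useful for sampling: it lets us turn <em>uncorrelated</em> standard-normal draws into draws with exactly the covariance structure $\textbf{S}$. A sketch, assuming the same parameter values as above:</p>

```python
import numpy as np

rng = np.random.default_rng(8927)

# parameter values assumed from earlier in the post
sigma_a, sigma_b, rho = 1.0, 0.5, -0.7
sigma_diag = np.diag([sigma_a, sigma_b])
Rmat = np.array([[1.0, rho], [rho, 1.0]])
L = np.linalg.cholesky(Rmat)

# uncorrelated standard-normal draws...
z = rng.normal(size=(2, 100_000))
# ...transformed deterministically into correlated draws with covariance S
samples = sigma_diag @ L @ z

print(np.cov(samples))  # close to [[1, -0.35], [-0.35, 0.25]]
```

This deterministic transformation of simple draws is the same idea behind non-centered parameterizations.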
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Mon Mar 28 2022
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.20.0
sys : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12)
[Clang 11.0.1 ]
matplotlib: 3.3.4
pandas : 1.2.1
pymc3 : 3.11.0
arviz : 0.11.1
scipy : 1.6.0
seaborn : 0.11.1
numpy : 1.20.1
Watermark: 2.1.0
</code></pre></div></div>Ben LacarCovariance priors for multivariate normal models are an important tool for the implementation of varying effects. By representing more than one parameter with a covarying structure, even more partial pooling can result than if the parameters had their own separate distribution. Before talking more about varying effects, I thought I’d write about the weird ways that covariance matrices are made.Escaping the Devil’s Funnel2022-03-26T00:00:00+00:002022-03-26T00:00:00+00:00https://benslack19.github.io/data%20science/statistics/devilsfunnel_cnc_param<p>Multi-level models are great for improving our estimates. However, the intuitive way these kinds of models are specified (which goes by the <a href="https://media.giphy.com/media/LS4AuDMMDZaUJJcusY/giphy.gif">unhelpful</a> name “centered” parameterization) can be <a href="https://media.giphy.com/media/AsDBIwyLjHc9G/giphy.gif">notorious</a> for producing posterior distributions that are difficult to sample using Markov chain Monte Carlo. This is because when a parameter (such as the scale variable of one distribution) depends on other parameters, the posterior can have weird shapes. This is the rationale for re-specifying the model into a <a href="https://benslack19.github.io/data%20science/statistics/diagnosing-a-model/#me-attempt-4-re-paramaterization">“non-centered” parameterization</a>.</p>
<p>One does not need a multi-level model to appreciate this concept. In the <a href="https://www.youtube.com/watch?v=n2aJYtuGu54&list=PLDcUM9US4XdMROZ57-OIRtIK0aOynbgZN&index=13&t=2319s">divergent transition section of Statistical Rethinking lecture 13</a>, Dr. McElreath illustrates the centered and non-centered parameterization ideas with what he calls “The Devil’s Funnel”. A funnel can be seen when plotting $\nu$ and $x$ from the following centered parameterization (figures shown below).</p>
\[\nu \sim \text{Normal}(0, \sigma)\]
\[x \sim \text{Normal}(0, \text{exp}(\nu))\]
<p>This is numerically equivalent to the non-centered form. The noteworthy trick is converting $x$ from a stochastic relationship to a deterministic one by creating a new variable $z$ that is easier to sample.</p>
\[\nu \sim \text{Normal}(0, \sigma)\]
\[z \sim \text{Normal}(0, 1)\]
\[x = z \times \text{exp}(\nu)\]
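<p>We can check this equivalence by simulation: for the same draws of $\nu$, a centered draw of $x$ and the transformed $z \times \text{exp}(\nu)$ should have the same marginal distribution. A minimal sketch with an assumed $\sigma = 0.5$:</p>

```python
import numpy as np

rng = np.random.default_rng(8927)
sigma = 0.5   # sd of v, assumed here for illustration
n = 200_000

# shared draws of v
v = rng.normal(0.0, sigma, size=n)

# centered: x drawn stochastically with scale exp(v)
x_centered = rng.normal(0.0, np.exp(v))

# non-centered: z drawn from a standard normal, x obtained deterministically
z = rng.normal(0.0, 1.0, size=n)
x_noncentered = z * np.exp(v)

# the two marginal standard deviations should agree closely
print(np.std(x_centered), np.std(x_noncentered))
```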
<p>In an online discussion forum, we shared experiences with these kinds of parameterizations around the time <a href="https://www.youtube.com/watch?v=n2aJYtuGu54&list=PLDcUM9US4XdMROZ57-OIRtIK0aOynbgZN&index=13">lecture 13 of Statistical Rethinking</a> was released. In the <a href="https://www.youtube.com/watch?v=n2aJYtuGu54&list=PLDcUM9US4XdMROZ57-OIRtIK0aOynbgZN&index=13&t=2319s">divergent transition section of the lecture</a>, I noticed that the centered parameterization produced a distribution that looked somewhat bivariate Gaussian when $\sigma$ was low, which made me think the choice of parameterization wouldn’t affect sampling efficiency in that regime. I then asked: at what value of $\sigma$ does the parameterization start to matter for sampling efficiency? Let’s find out!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># boiler plate setup code
</span><span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">pymc3</span> <span class="k">as</span> <span class="n">pm</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">expit</span>
<span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">logit</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="n">sm</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">8927</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="n">RANDOM_SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.89</span> <span class="c1"># sets default credible interval used by arviz
</span>
<span class="k">def</span> <span class="nf">standardize</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
<p>The Devil’s Funnel variables were originally specified like this:</p>
\[\nu \sim \text{Normal}(0, \sigma=3)\]
\[x \sim \text{Normal}(0, \text{exp}(\nu))\]
<p>The funnel gets more extreme with higher values for the standard deviation of $\nu$. Since there is no data here, this would be like manipulating priors only. I therefore experimented with different values for the standard deviation (sigma) of $\nu$.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Generate a list of sigmas for the prior nu
</span><span class="n">sigmas</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">geomspace</span><span class="p">(</span><span class="mf">0.025</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="c1"># Create dictionaries for storage
# samples for plotting
</span><span class="n">traces_C</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
<span class="n">traces_NC</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
<span class="c1"># summary results
</span><span class="n">summary_C</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
<span class="n">summary_NC</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
<span class="c1"># number of divergences
</span><span class="n">div_C</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
<span class="n">div_NC</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Look at sigma values
</span><span class="n">sigmas</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([0.025 , 0.03766575, 0.05674836, 0.0854988 , 0.12881507,
0.19407667, 0.29240177, 0.44054134, 0.66373288, 1. ])
</code></pre></div></div>
<p>The following code evaluates each sigma value and uses that to build centered and non-centered models. I’ll save the results at the end of each model run and then plot the sampling metrics down below.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">sigma</span> <span class="ow">in</span> <span class="n">sigmas</span><span class="p">:</span>
<span class="c1"># Centered model
</span> <span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mC</span><span class="p">:</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"v"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">v</span><span class="p">))</span>
<span class="n">trace_mC</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">tune</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">chains</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="c1"># Save results
</span> <span class="n">traces_C</span><span class="p">[</span><span class="n">sigma</span><span class="p">]</span> <span class="o">=</span> <span class="n">trace_mC</span>
<span class="n">summary_C</span><span class="p">[</span><span class="n">sigma</span><span class="p">]</span> <span class="o">=</span> <span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mC</span><span class="p">)</span>
<span class="n">div_C</span><span class="p">[</span><span class="n">sigma</span><span class="p">]</span> <span class="o">=</span> <span class="n">trace_mC</span><span class="p">[</span><span class="s">"diverging"</span><span class="p">].</span><span class="nb">sum</span><span class="p">()</span>
<span class="c1"># Non-centered model
</span> <span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mNC</span><span class="p">:</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"v"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"z"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="c1"># transformed variable
</span> <span class="n">x</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Deterministic</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span> <span class="n">z</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">v</span><span class="p">))</span>
<span class="n">trace_mNC</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">tune</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">chains</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="c1"># Save results
</span> <span class="n">traces_NC</span><span class="p">[</span><span class="n">sigma</span><span class="p">]</span> <span class="o">=</span> <span class="n">trace_mNC</span>
<span class="n">summary_NC</span><span class="p">[</span><span class="n">sigma</span><span class="p">]</span> <span class="o">=</span> <span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mNC</span><span class="p">)</span>
<span class="n">div_NC</span><span class="p">[</span><span class="n">sigma</span><span class="p">]</span><span class="o">=</span> <span class="n">trace_mNC</span><span class="p">[</span><span class="s">"diverging"</span><span class="p">].</span><span class="nb">sum</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x, v]
(Removed the rest of the pymc output to save space. Notes about divergences are explored down below.)
</code></pre></div></div>
<h1 id="exploration-of-the-joint-distribution">Exploration of the joint distribution</h1>
<p>We’ll plot the joint distribution of $x$ and $\nu$ under the centered parameterization (blue) and see how that compares to the non-centered parameterization (green). In the latter, we’re sampling $z$ from a standard Gaussian distribution and getting $x$ through a deterministic transformation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sigmas_sampled</span> <span class="o">=</span> <span class="n">sigmas</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">sigmas</span><span class="p">):</span><span class="mi">2</span><span class="p">]</span> <span class="c1"># plot every other sigma evaluated
</span>
<span class="n">f</span><span class="p">,</span> <span class="n">axes</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">),</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="c1"># top row: centered model
</span><span class="k">for</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">ax</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">,</span> <span class="n">axes</span><span class="p">.</span><span class="n">flat</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">)]):</span>
<span class="n">samples_C</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">trace_to_dataframe</span><span class="p">(</span><span class="n">traces_C</span><span class="p">[</span><span class="n">sigma</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">samples_C</span><span class="p">[</span><span class="s">'x'</span><span class="p">],</span> <span class="n">samples_C</span><span class="p">[</span><span class="s">'v'</span><span class="p">],</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">facecolors</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span> <span class="n">edgecolors</span><span class="o">=</span><span class="s">'navy'</span><span class="p">)</span>
<span class="n">sigma_str</span> <span class="o">=</span> <span class="s">'{:.3f}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">sigma</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="sa">f</span><span class="s">'sigma = </span><span class="si">{</span><span class="n">sigma_str</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'x (stochastic)'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ax</span><span class="p">.</span><span class="n">is_first_col</span><span class="p">()</span> <span class="o">&</span> <span class="n">ax</span><span class="p">.</span><span class="n">is_first_row</span><span class="p">():</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'centered</span><span class="se">\n\n</span><span class="s">v'</span><span class="p">)</span>
<span class="c1"># middle row: non-centered model, z on x-axis
</span><span class="k">for</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">ax</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">,</span> <span class="n">axes</span><span class="p">.</span><span class="n">flat</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">):</span><span class="mi">2</span><span class="o">*</span><span class="nb">len</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">)]):</span>
<span class="n">samples_NC</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">trace_to_dataframe</span><span class="p">(</span><span class="n">traces_NC</span><span class="p">[</span><span class="n">sigma</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">samples_NC</span><span class="p">[</span><span class="s">'z'</span><span class="p">],</span> <span class="n">samples_NC</span><span class="p">[</span><span class="s">'v'</span><span class="p">],</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">facecolors</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span> <span class="n">edgecolors</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'z (stochastic)'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ax</span><span class="p">.</span><span class="n">is_first_col</span><span class="p">():</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'non-centered</span><span class="se">\n\n</span><span class="s">v'</span><span class="p">)</span>
<span class="c1"># bottom row: non-centered model, x on x-axis
</span><span class="k">for</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">ax</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">,</span> <span class="n">axes</span><span class="p">.</span><span class="n">flat</span><span class="p">[</span><span class="mi">2</span><span class="o">*</span><span class="nb">len</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">):</span><span class="mi">3</span><span class="o">*</span><span class="nb">len</span><span class="p">(</span><span class="n">sigmas_sampled</span><span class="p">)]):</span>
<span class="n">samples_NC</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">trace_to_dataframe</span><span class="p">(</span><span class="n">traces_NC</span><span class="p">[</span><span class="n">sigma</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">samples_NC</span><span class="p">[</span><span class="s">'x'</span><span class="p">],</span> <span class="n">samples_NC</span><span class="p">[</span><span class="s">'v'</span><span class="p">],</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">facecolors</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span> <span class="n">edgecolors</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'x (deterministic)'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ax</span><span class="p">.</span><span class="n">is_first_col</span><span class="p">()</span> <span class="o">&</span> <span class="n">ax</span><span class="p">.</span><span class="n">is_last_row</span><span class="p">():</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'non-centered</span><span class="se">\n\n</span><span class="s">v'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><ipython-input-21-5993b7c4c1c1>:34: UserWarning: This figure was using constrained_layout==True, but that is incompatible with subplots_adjust and or tight_layout: setting constrained_layout==False.
plt.tight_layout()
</code></pre></div></div>
<p><img src="/assets/2022-03-26-devilsfunnel_cnc_param_files/2022-03-26-devilsfunnel_cnc_param_8_1.png" alt="png" /></p>
<p>At the top is the centered target distribution with increasing values of $\sigma$. We can see that the Devil’s Funnel begins to form as $\sigma$ exceeds 0.2. However, the middle row looks like samples from a plain old bivariate Gaussian distribution and doesn’t change. That’s because we’ve defined it to <em>not</em> change: it will always be $z \sim \text{Normal}(0,1)$ regardless of $\sigma$. The last row shows that we can get $x$ and our target distribution back with a deterministic transformation.</p>
<p>Let’s look at other metrics to inform us about sampling efficiency.</p>
<h1 id="number-of-divergences">Number of divergences</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">div_C</span><span class="p">.</span><span class="n">keys</span><span class="p">(),</span> <span class="n">div_C</span><span class="p">.</span><span class="n">values</span><span class="p">(),</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">div_C</span><span class="p">.</span><span class="n">keys</span><span class="p">(),</span> <span class="n">div_C</span><span class="p">.</span><span class="n">values</span><span class="p">(),</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Centered'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">div_NC</span><span class="p">.</span><span class="n">keys</span><span class="p">(),</span> <span class="n">div_NC</span><span class="p">.</span><span class="n">values</span><span class="p">(),</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">div_NC</span><span class="p">.</span><span class="n">keys</span><span class="p">(),</span> <span class="n">div_NC</span><span class="p">.</span><span class="n">values</span><span class="p">(),</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Non-centered'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Number of divergences'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'Number of divergences</span><span class="se">\n</span><span class="s">Centered vs Non-centered'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><matplotlib.legend.Legend at 0x7fd419834520>
</code></pre></div></div>
<p><img src="/assets/2022-03-26-devilsfunnel_cnc_param_files/2022-03-26-devilsfunnel_cnc_param_11_1.png" alt="png" /></p>
<p>We don’t see any divergences at all in the non-centered parameterization, regardless of $\sigma$. We can get away with the centered parameterization only at low values of $\sigma$, as the bivariate plots suggested.</p>
<h1 id="number-of-effective-samples-and-r-hat">Number of effective samples and R-hat</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Put the summary results in one table to facilitate plotting
</span><span class="n">df_summary_C</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">(</span><span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">summary_C</span><span class="p">[</span><span class="n">sigma</span><span class="p">]).</span><span class="n">reset_index</span><span class="p">().</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">'index'</span><span class="p">:</span> <span class="s">'var'</span><span class="p">})</span> <span class="k">for</span> <span class="n">sigma</span> <span class="ow">in</span> <span class="n">sigmas</span><span class="p">).</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df_summary_C</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">sigmas</span><span class="p">)</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span>
<span class="n">df_summary_NC</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">(</span><span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">summary_NC</span><span class="p">[</span><span class="n">sigma</span><span class="p">]).</span><span class="n">reset_index</span><span class="p">().</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">'index'</span><span class="p">:</span> <span class="s">'var'</span><span class="p">})</span> <span class="k">for</span> <span class="n">sigma</span> <span class="ow">in</span> <span class="n">sigmas</span><span class="p">).</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df_summary_NC</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">sigmas</span><span class="p">)</span><span class="o">*</span><span class="mi">3</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="p">((</span><span class="n">ax1</span><span class="p">,</span> <span class="n">ax2</span><span class="p">),</span> <span class="p">(</span><span class="n">ax3</span><span class="p">,</span> <span class="n">ax4</span><span class="p">))</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="c1"># Top row (v) ---------
</span>
<span class="c1"># plot centered ESS
</span><span class="n">df_centered_v</span> <span class="o">=</span> <span class="n">df_summary_C</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_summary_C</span><span class="p">[</span><span class="s">'var'</span><span class="p">]</span><span class="o">==</span><span class="s">'v'</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_centered_v</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_centered_v</span><span class="p">[</span><span class="s">'ess_mean'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Centered'</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_centered_v</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_centered_v</span><span class="p">[</span><span class="s">'r_hat'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Centered'</span><span class="p">)</span>
<span class="c1"># plot non-centered ESS
</span><span class="n">df_noncentered_v</span> <span class="o">=</span> <span class="n">df_summary_NC</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_summary_NC</span><span class="p">[</span><span class="s">'var'</span><span class="p">]</span><span class="o">==</span><span class="s">'v'</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_noncentered_v</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_noncentered_v</span><span class="p">[</span><span class="s">'ess_mean'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Non-centered'</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_noncentered_v</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_noncentered_v</span><span class="p">[</span><span class="s">'r_hat'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Non-centered'</span><span class="p">)</span>
<span class="c1"># plot decorations
</span><span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'ESS'</span><span class="p">,</span> <span class="n">xscale</span><span class="o">=</span><span class="s">'linear'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'Effective sample size for v'</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">ax2</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'R-hat'</span><span class="p">,</span> <span class="n">xscale</span><span class="o">=</span><span class="s">'linear'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'R-hat for v'</span><span class="p">)</span>
<span class="c1"># Bottom row (x) ---------
</span>
<span class="c1"># plot centered ESS
</span><span class="n">df_centered_x</span> <span class="o">=</span> <span class="n">df_summary_C</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_summary_C</span><span class="p">[</span><span class="s">'var'</span><span class="p">]</span><span class="o">==</span><span class="s">'x'</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">ax3</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_centered_x</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_centered_x</span><span class="p">[</span><span class="s">'ess_mean'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Centered'</span><span class="p">)</span>
<span class="n">ax4</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_centered_x</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_centered_x</span><span class="p">[</span><span class="s">'r_hat'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Centered'</span><span class="p">)</span>
<span class="c1"># plot non-centered ESS
</span><span class="n">df_noncentered_x</span> <span class="o">=</span> <span class="n">df_summary_NC</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">df_summary_NC</span><span class="p">[</span><span class="s">'var'</span><span class="p">]</span><span class="o">==</span><span class="s">'x'</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">ax3</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_noncentered_x</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_noncentered_x</span><span class="p">[</span><span class="s">'ess_mean'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Non-centered'</span><span class="p">)</span>
<span class="n">ax4</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_noncentered_x</span><span class="p">[</span><span class="s">'sigma'</span><span class="p">],</span> <span class="n">df_noncentered_x</span><span class="p">[</span><span class="s">'r_hat'</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'darkgreen'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Non-centered'</span><span class="p">)</span>
<span class="c1"># plot decorations
</span><span class="n">ax3</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">ax3</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'ESS'</span><span class="p">,</span> <span class="n">xscale</span><span class="o">=</span><span class="s">'linear'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'Effective sample size for x'</span><span class="p">)</span>
<span class="n">ax4</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">ax4</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'R-hat'</span><span class="p">,</span> <span class="n">xscale</span><span class="o">=</span><span class="s">'linear'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'R-hat for x'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Text(0.5, 0, 'sigma'),
Text(0, 0.5, 'R-hat'),
None,
Text(0.5, 1.0, 'R-hat for x')]
</code></pre></div></div>
<p><img src="/assets/2022-03-26-devilsfunnel_cnc_param_files/2022-03-26-devilsfunnel_cnc_param_16_1.png" alt="png" /></p>
<h1 id="conclusion">Conclusion</h1>
<p>Looking at the number of divergences, effective sample size, and R-hat, small values of $\sigma$ give good sampling in both the centered and non-centered forms of the Devil’s Funnel equations. However, once $\sigma$ is between 0.2 and 0.4, all three metrics clearly favor the non-centered form.</p>
<h1 id="appendix-environment-and-system-parameters">Appendix: Environment and system parameters</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Sat Mar 26 2022
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.20.0
pymc3 : 3.11.0
arviz : 0.11.1
statsmodels: 0.12.2
numpy : 1.20.1
sys : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12)
[Clang 11.0.1 ]
matplotlib : 3.3.4
pandas : 1.2.1
scipy : 1.6.0
seaborn : 0.11.1
Watermark: 2.1.0
</code></pre></div></div>Ben LacarMulti-level models are great for improving our estimates. However, the intuitive way these kinds of models are specified (which goes by the unhelpful name “centered” parameterization) can be notorious for producing posterior distributions that are difficult to sample using Markov chain Monte Carlo. This is because when a parameter (such as the scale variable of one distribution) depends on other parameters, the posterior can have weird shapes. This is the rationale for re-specifying the model into a “non-centered” parameterization.Correlated data, different DAGs2022-02-03T00:00:00+00:002022-02-03T00:00:00+00:00https://benslack19.github.io/data%20science/statistics/stats_rethinking_corr_diffDAGs<p>One of the lessons from <a href="https://xcelab.net/rm/statistical-rethinking/">Statistical Rethinking</a> that really hit home for me was the importance of considering the data generation process. Different datasets can show similar patterns, but the data generation can be different. I’ll illustrate this below, showing how correlated data can arise from these varying processes.</p>
<p>As an homage to someone who’s recently been in the news … <a href="https://media.giphy.com/media/26FL5WoWnkl0xvXnG/giphy.gif">LFG!</a></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">daft</span>
<span class="kn">from</span> <span class="nn">causalgraphicalmodels</span> <span class="kn">import</span> <span class="n">CausalGraphicalModel</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">8927</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="n">RANDOM_SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.89</span> <span class="c1"># sets default credible interval used by arviz
</span></code></pre></div></div>
<p>We’ll examine three different directed acyclic graphs (DAGs), each relating only three variables, <a href="https://media.giphy.com/media/3oEduXdm2gjnrsJBOo/giphy.gif">named</a> X, Y, and Z. In each one, we’ll have 100 datapoints, <code class="language-plaintext highlighter-rouge">X</code> will be our “predictor” variable, and <code class="language-plaintext highlighter-rouge">Y</code> will always be the “outcome” variable. <code class="language-plaintext highlighter-rouge">Z</code> will be the wild card, moving around so we can see what effects it has on the relationship between <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">Y</code>.</p>
<h1 id="pipe">Pipe</h1>
<p>The first will be a mediator, AKA a pipe. Here <code class="language-plaintext highlighter-rouge">Z</code> passes on information from <code class="language-plaintext highlighter-rouge">X</code> to <code class="language-plaintext highlighter-rouge">Y</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dag</span> <span class="o">=</span> <span class="n">CausalGraphicalModel</span><span class="p">(</span>
<span class="n">nodes</span><span class="o">=</span><span class="p">[</span><span class="s">"X"</span><span class="p">,</span> <span class="s">"Y"</span><span class="p">,</span> <span class="s">"Z"</span><span class="p">],</span>
<span class="n">edges</span><span class="o">=</span><span class="p">[</span>
<span class="p">(</span><span class="s">"X"</span><span class="p">,</span> <span class="s">"Z"</span><span class="p">),</span>
<span class="p">(</span><span class="s">"Z"</span><span class="p">,</span> <span class="s">"Y"</span><span class="p">),</span>
<span class="p">],</span>
<span class="p">)</span>
<span class="n">pgm</span> <span class="o">=</span> <span class="n">daft</span><span class="p">.</span><span class="n">PGM</span><span class="p">()</span>
<span class="n">coordinates</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"X"</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="s">"Z"</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="s">"Y"</span><span class="p">:</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">dag</span><span class="p">.</span><span class="n">dag</span><span class="p">.</span><span class="n">nodes</span><span class="p">:</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">add_node</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="o">*</span><span class="n">coordinates</span><span class="p">[</span><span class="n">node</span><span class="p">])</span>
<span class="k">for</span> <span class="n">edge</span> <span class="ow">in</span> <span class="n">dag</span><span class="p">.</span><span class="n">dag</span><span class="p">.</span><span class="n">edges</span><span class="p">:</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">add_edge</span><span class="p">(</span><span class="o">*</span><span class="n">edge</span><span class="p">)</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">render</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><matplotlib.axes._axes.Axes at 0x7fe709497550>
</code></pre></div></div>
<p><img src="/assets/2022-02-03-stats_rethinking_corr_diffDAGs_files/2022-02-03-stats_rethinking_corr_diffDAGs_4_1.png" alt="png" /></p>
<p>When I say “data generation”, I literally mean that. We can simulate the relationships between all variables. Since X is a starting point, it is the easiest to get. We’ll simply take 100 random samples from a normal distribution with mean 0 and standard deviation 1 (represented in <code class="language-plaintext highlighter-rouge">stats.norm.rvs</code> as <code class="language-plaintext highlighter-rouge">loc</code> and <code class="language-plaintext highlighter-rouge">scale</code>, respectively). Then to generate Z, we’ll take another set of random samples, where the mean of this new normal distribution is the product of a coefficient <code class="language-plaintext highlighter-rouge">bXZ</code> and <code class="language-plaintext highlighter-rouge">X</code>. We draw Z from a normal distribution because a simple multiplication would give perfectly correlated data, which is not what we’re trying to represent. We then do something similar with Y, except this time the mean is the product of <code class="language-plaintext highlighter-rouge">bZY</code> and <code class="language-plaintext highlighter-rouge">Z</code>.</p>
<p>A few quick notes: the values I assign to the coefficients are arbitrary, and I’m appending “p” to the variable names, like <code class="language-plaintext highlighter-rouge">Xp</code>, to mark the variables belonging to this pipe DAG.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">n</span> <span class="o">=</span> <span class="mi">100</span>
<span class="c1"># pipe X > Z > Y
</span><span class="n">bXZ</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">bZY</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">Xp</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
<span class="n">Zp</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="n">bXZ</span><span class="o">*</span><span class="n">Xp</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
<span class="n">Yp</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="n">bZY</span><span class="o">*</span><span class="n">Zp</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
</code></pre></div></div>
<p>We’ll plot the relationship of <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">Y</code> from this pipe at the end, after we generate all data examples.</p>
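<p>Before plotting, a quick numeric sanity check (an aside I’m adding, not part of the original walkthrough): under the pipe, X and Y should already be positively correlated even though X never influences Y directly. With both coefficients equal to 1 and unit noise at each step, Y = X + e1 + e2, so the theoretical correlation is 1/√3 ≈ 0.58.</p>

```python
# Sanity check (an aside, not in the original post): regenerate the pipe
# data and compute the X-Y correlation. With bXZ = bZY = 1 and unit noise
# at each step, corr(X, Y) should be near 1/sqrt(3) ~ 0.58.
import numpy as np
from scipy import stats

np.random.seed(8927)  # same seed the notebook sets
n = 100
bXZ, bZY = 1, 1
Xp = stats.norm.rvs(loc=0, scale=1, size=n)
Zp = stats.norm.rvs(loc=bXZ * Xp, scale=1, size=n)
Yp = stats.norm.rvs(loc=bZY * Zp, scale=1, size=n)

r = np.corrcoef(Xp, Yp)[0, 1]
print(round(r, 2))  # should land in the rough vicinity of 0.58
```

<p>The exact value wobbles with the sample, but the sign and rough magnitude are set by the coefficients, not by any direct X → Y arrow.</p>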
<h1 id="fork">Fork</h1>
<p>Let’s look at the second DAG, which is a fork. <code class="language-plaintext highlighter-rouge">Z</code> influences both <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">Y</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dag</span> <span class="o">=</span> <span class="n">CausalGraphicalModel</span><span class="p">(</span>
<span class="n">nodes</span><span class="o">=</span><span class="p">[</span><span class="s">"X"</span><span class="p">,</span> <span class="s">"Y"</span><span class="p">,</span> <span class="s">"Z"</span><span class="p">],</span>
<span class="n">edges</span><span class="o">=</span><span class="p">[</span>
<span class="p">(</span><span class="s">"Z"</span><span class="p">,</span> <span class="s">"X"</span><span class="p">),</span>
<span class="p">(</span><span class="s">"Z"</span><span class="p">,</span> <span class="s">"Y"</span><span class="p">),</span>
<span class="p">],</span>
<span class="p">)</span>
<span class="n">pgm</span> <span class="o">=</span> <span class="n">daft</span><span class="p">.</span><span class="n">PGM</span><span class="p">()</span>
<span class="n">coordinates</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"X"</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="s">"Z"</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="s">"Y"</span><span class="p">:</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">dag</span><span class="p">.</span><span class="n">dag</span><span class="p">.</span><span class="n">nodes</span><span class="p">:</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">add_node</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="o">*</span><span class="n">coordinates</span><span class="p">[</span><span class="n">node</span><span class="p">])</span>
<span class="k">for</span> <span class="n">edge</span> <span class="ow">in</span> <span class="n">dag</span><span class="p">.</span><span class="n">dag</span><span class="p">.</span><span class="n">edges</span><span class="p">:</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">add_edge</span><span class="p">(</span><span class="o">*</span><span class="n">edge</span><span class="p">)</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">render</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><matplotlib.axes._axes.Axes at 0x7fe6c9853eb0>
/Users/blacar/opt/anaconda3/envs/stats_rethinking/lib/python3.8/site-packages/IPython/core/pylabtools.py:132: UserWarning: Calling figure.constrained_layout, but figure not setup to do constrained layout. You either called GridSpec without the fig keyword, you are using plt.subplot, or you need to call figure or subplots with the constrained_layout=True kwarg.
fig.canvas.print_figure(bytes_io, **kw)
</code></pre></div></div>
<p><img src="/assets/2022-02-03-stats_rethinking_corr_diffDAGs_files/2022-02-03-stats_rethinking_corr_diffDAGs_8_2.png" alt="png" /></p>
<p>The code will look similar to the pipe’s, but the relationships in the data generation process will reflect the DAG depicting this fork. You can see how <code class="language-plaintext highlighter-rouge">Z</code> influences both the predictor <code class="language-plaintext highlighter-rouge">X</code> and the outcome <code class="language-plaintext highlighter-rouge">Y</code>; <code class="language-plaintext highlighter-rouge">Z</code> is a confound of this relationship.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># fork X < Z > Y
</span><span class="n">bZX</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">bZY</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">Zf</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
<span class="n">Xf</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">bZX</span><span class="o">*</span><span class="n">Zf</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
<span class="n">Yf</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">bZY</span><span class="o">*</span><span class="n">Zf</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
</code></pre></div></div>
<h1 id="collider">Collider</h1>
<p>Now let’s move on to the trickiest DAG, the collider. Here, our predictor and outcome variables are no longer the consequences of <code class="language-plaintext highlighter-rouge">Z</code>, but are instead the <em>causes</em> of <code class="language-plaintext highlighter-rouge">Z</code>. They both influence and “collide” on Z.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dag</span> <span class="o">=</span> <span class="n">CausalGraphicalModel</span><span class="p">(</span>
<span class="n">nodes</span><span class="o">=</span><span class="p">[</span><span class="s">"X"</span><span class="p">,</span> <span class="s">"Y"</span><span class="p">,</span> <span class="s">"Z"</span><span class="p">],</span>
<span class="n">edges</span><span class="o">=</span><span class="p">[</span>
<span class="p">(</span><span class="s">"X"</span><span class="p">,</span> <span class="s">"Z"</span><span class="p">),</span>
<span class="p">(</span><span class="s">"Y"</span><span class="p">,</span> <span class="s">"Z"</span><span class="p">),</span>
<span class="p">],</span>
<span class="p">)</span>
<span class="n">pgm</span> <span class="o">=</span> <span class="n">daft</span><span class="p">.</span><span class="n">PGM</span><span class="p">()</span>
<span class="n">coordinates</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"X"</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="s">"Z"</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
<span class="s">"Y"</span><span class="p">:</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">dag</span><span class="p">.</span><span class="n">dag</span><span class="p">.</span><span class="n">nodes</span><span class="p">:</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">add_node</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="o">*</span><span class="n">coordinates</span><span class="p">[</span><span class="n">node</span><span class="p">])</span>
<span class="k">for</span> <span class="n">edge</span> <span class="ow">in</span> <span class="n">dag</span><span class="p">.</span><span class="n">dag</span><span class="p">.</span><span class="n">edges</span><span class="p">:</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">add_edge</span><span class="p">(</span><span class="o">*</span><span class="n">edge</span><span class="p">)</span>
<span class="n">pgm</span><span class="p">.</span><span class="n">render</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><matplotlib.axes._axes.Axes at 0x7fe6681ed070>
</code></pre></div></div>
<p><img src="/assets/2022-02-03-stats_rethinking_corr_diffDAGs_files/2022-02-03-stats_rethinking_corr_diffDAGs_12_1.png" alt="png" /></p>
<p>To get correlated data with the collider, we’ll have to do some non-intuitive things that are still faithfully represented on this DAG. I’ll show the code first, then explain it afterward.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># collider X > Z < Y
</span><span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">expit</span>
<span class="n">n</span><span class="o">=</span><span class="mi">200</span>
<span class="n">bXZ</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">bYZ</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">Xc</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mf">1.5</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
<span class="n">Yc</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mf">1.5</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
<span class="n">Zc</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">bernoulli</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">expit</span><span class="p">(</span><span class="n">bXZ</span><span class="o">*</span><span class="n">Xc</span> <span class="o">-</span> <span class="n">bYZ</span><span class="o">*</span><span class="n">Yc</span><span class="p">),</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span>
<span class="n">df_c</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"Xc"</span><span class="p">:</span><span class="n">Xc</span><span class="p">,</span> <span class="s">"Yc"</span><span class="p">:</span><span class="n">Yc</span><span class="p">,</span> <span class="s">"Zc"</span><span class="p">:</span><span class="n">Zc</span><span class="p">,</span> <span class="s">"Znorm"</span><span class="p">:</span><span class="n">bXZ</span><span class="o">*</span><span class="n">Xc</span> <span class="o">-</span> <span class="n">bYZ</span><span class="o">*</span><span class="n">Yc</span><span class="p">})</span>
<span class="n">df_c0</span> <span class="o">=</span> <span class="n">df_c</span><span class="p">[</span><span class="n">df_c</span><span class="p">[</span><span class="s">'Zc'</span><span class="p">]</span><span class="o">==</span><span class="mi">0</span><span class="p">]</span>
<span class="n">df_c1</span> <span class="o">=</span> <span class="n">df_c</span><span class="p">[</span><span class="n">df_c</span><span class="p">[</span><span class="s">'Zc'</span><span class="p">]</span><span class="o">==</span><span class="mi">1</span><span class="p">]</span>
</code></pre></div></div>
<p>What are we doing in the code?</p>
<ol>
<li>We’re going to make <code class="language-plaintext highlighter-rouge">Z</code> take on a value of 0 or 1. It is essentially a categorical variable. This is to aid in visualization.</li>
<li>To ensure <code class="language-plaintext highlighter-rouge">Z</code> acts as a collider, we still have to represent the causal influences of <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">Y</code>. Z is generated by first taking the difference <code class="language-plaintext highlighter-rouge">bXZ*Xc - bYZ*Yc</code> and then applying the expit (AKA logistic sigmoid) function. We take the difference to get a positively correlated relationship that mimics the pipe and fork examples, and it is still a faithful representation of the DAG.</li>
<li>We’re going to show only one subset of the data, those where <code class="language-plaintext highlighter-rouge">Z</code> equals 1, again to aid in visualization.</li>
<li>Since we’re subsetting the data, we double the number of observations.</li>
</ol>
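<p>To make the expit step concrete, here is a minimal sketch (using my own illustrative values, not numbers from the post) showing that expit squashes an unbounded score like <code class="language-plaintext highlighter-rouge">bXZ*Xc - bYZ*Yc</code> into the (0, 1) range, which is what lets it serve as a Bernoulli probability for Z:</p>

```python
# Minimal sketch (illustrative values of my own, not from the post) of how
# expit maps the unbounded score bXZ*Xc - bYZ*Yc into (0, 1) so it can
# serve as the Bernoulli probability for Z.
import numpy as np
from scipy import stats
from scipy.special import expit

scores = np.array([-4.0, 0.0, 4.0])  # stand-ins for bXZ*Xc - bYZ*Yc
probs = expit(scores)                # logistic sigmoid: 1 / (1 + exp(-score))
print(probs.round(3))                # probabilities 0.018, 0.5, 0.982

# Very negative scores make Z=0 likely; very positive scores make Z=1 likely
np.random.seed(0)
draws = stats.bernoulli.rvs(probs)
print(draws)
```

<p>So datapoints where X is large and Y is small tend to get Z = 1, which is exactly the selection that distorts the X–Y relationship when we look within one value of Z.</p>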
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_c</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Xc</th>
<th>Yc</th>
<th>Zc</th>
<th>Znorm</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1.517475</td>
<td>-0.412633</td>
<td>1</td>
<td>3.860216</td>
</tr>
<tr>
<th>1</th>
<td>-2.698825</td>
<td>0.746912</td>
<td>0</td>
<td>-6.891475</td>
</tr>
<tr>
<th>2</th>
<td>2.503817</td>
<td>0.868102</td>
<td>1</td>
<td>3.271430</td>
</tr>
<tr>
<th>3</th>
<td>-0.241992</td>
<td>-0.795235</td>
<td>1</td>
<td>1.106486</td>
</tr>
<tr>
<th>4</th>
<td>0.182652</td>
<td>-2.039822</td>
<td>1</td>
<td>4.444948</td>
</tr>
</tbody>
</table>
</div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">()</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_c</span><span class="p">[</span><span class="s">'Znorm'</span><span class="p">],</span> <span class="n">df_c</span><span class="p">[</span><span class="s">'Zc'</span><span class="p">])</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'bXZ*Xc - bYZ*Yc'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Z'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'relationship before (x-axis) and after (y-axis) expit'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Text(0.5, 0, 'bXZ*Xc - bYZ*Yc'),
Text(0, 0.5, 'Z'),
Text(0.5, 1.0, 'relationship before (x-axis) and after (y-axis) expit')]
</code></pre></div></div>
<p><img src="/assets/2022-02-03-stats_rethinking_corr_diffDAGs_files/2022-02-03-stats_rethinking_corr_diffDAGs_17_1.png" alt="png" /></p>
<p>OK, now let’s plot the data!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="p">(</span><span class="n">ax1</span><span class="p">,</span> <span class="n">ax2</span><span class="p">,</span> <span class="n">ax3</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span> <span class="n">sharex</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">sharey</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">Xp</span><span class="p">,</span> <span class="n">Yp</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'X'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Y'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'pipe, X > Z > Y'</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">Xf</span><span class="p">,</span> <span class="n">Yf</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'X'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Y'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'fork, X < Z > Y'</span><span class="p">)</span>
<span class="c1">#ax3.scatter(df_c0['Xc'], df_c0['Yc'], color='gray', label='Z=0')
</span><span class="n">ax3</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_c1</span><span class="p">[</span><span class="s">'Xc'</span><span class="p">],</span> <span class="n">df_c1</span><span class="p">[</span><span class="s">'Yc'</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Z=1'</span><span class="p">)</span>
<span class="n">ax3</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">'X'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'Y'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'collider, X > Z < Y,</span><span class="se">\n</span><span class="s">(inverse logit(bXZ*Xc - bYZ*Yc)'</span><span class="p">)</span>
<span class="n">ax3</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><matplotlib.legend.Legend at 0x7fe6c142c4f0>
</code></pre></div></div>
<p><img src="/assets/2022-02-03-stats_rethinking_corr_diffDAGs_files/2022-02-03-stats_rethinking_corr_diffDAGs_19_1.png" alt="png" /></p>
<p>As you can see, each DAG produces a positively correlated relationship between X and Y, despite the DAGs being different and our working with only three variables! This drives home the point that we can’t look only at our data to infer the proper causal relationships. A data-generating model is key to knowing how (and whether) to stratify our statistical models.</p>
<p>Appendix: Environment and system parameters</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Thu Feb 03 2022
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.20.0
seaborn : 0.11.1
numpy : 1.20.1
pandas : 1.2.1
arviz : 0.11.1
daft : 0.1.0
sys : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12)
[Clang 11.0.1 ]
pymc3 : 3.11.0
scipy : 1.6.0
matplotlib : 3.3.4
statsmodels: 0.12.2
Watermark: 2.1.0
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
</code></pre></div></div>Ben LacarOne of the lessons from Statistical Rethinking that really hit home for me was the importance of considering the data generation process. Different datasets can show similar patterns, but the data generation can be different. I’ll illustrate this below, showing how correlated data can arise from these varying processes.Running models forwards and backwards2022-01-03T00:00:00+00:002022-01-03T00:00:00+00:00https://benslack19.github.io/data%20science/statistics/stats_rethinking_ch03_sim4blog_part1<p>The value of simulations is highighted by Dr. McElreath throughout his textbook and by <a href="nature.com/articles/s43586-020-00001-2.pdf">van de Schoot and colleagues</a>. I didn’t entirely appreciate its value until I implemented this myself. I started by reviewing some materials, then I went down a rabbit hole, where one question naturally branched into other questions. I’ll talk about simulations in multiple posts and how model running <a href="https://media.giphy.com/media/2bUpP71bbVnZ3x7lgQ/giphy.gif">forwards</a> and <a href="https://media.giphy.com/media/g08eUPeabWkAOh6n4Q/giphy.gif">backwards</a> can help with understanding.</p>
<p>We’ll start simple, with a weighted coin example. Then in later posts, we’ll move into simple linear regression, then multiple linear regression and thinking from a causal perspective. As most of my recent posts have been inspired by my learning of Bayesian inference, much credit goes to Dr. McElreath’s Statistical Rethinking, its associated <a href="https://github.com/pymc-devs/resources/tree/master/Rethinking_2"><code class="language-plaintext highlighter-rouge">pymc</code> repo</a>, and friends on the Discord server. Colleagues at UCSF who are experts in simulations and causal models have also been a great source.</p>
<p>Let’s get started using our weighted coin example!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">pymc3</span> <span class="k">as</span> <span class="n">pm</span>
<span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="n">sm</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="n">sns</span><span class="p">.</span><span class="n">set_context</span><span class="p">(</span><span class="s">"talk"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">8927</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="n">RANDOM_SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.89</span> <span class="c1"># sets default credible interval used by arviz
</span></code></pre></div></div>
<h1 id="running-models-forwards-and-backwards">Running models forwards and backwards</h1>
<p>We can input parameters to generate data. Or we can start from data and come up with a model to infer a parameter. When most people do statistics, they’re doing the latter. But running a model in both directions can help us understand things more deeply.</p>
<p><img src="/assets/2022-01-03-stats_rethinking_ch03_sim4blog_part1_files/model_running-01.png" alt="png" /></p>
<p>Let’s pretend we have a weighted coin. (We’ll use this in place of McElreath’s globe tossing example.) In many problems, we’re asked to deduce the true proportion that the coin comes up heads (or tails). This would be <strong>parameter estimation</strong> or what McElreath calls “running a model backwards”. A parameter is typically <em>unobserved</em> in contrast to data, which we can observe. We can count the number of heads after a known number of tosses. But for simulation (running a model forwards), we input a <em>known parameter</em> (how weighted the coin is) and generate data.</p>
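<p>As a minimal sketch of the two directions, consider a coin weighted to come up heads 70% of the time. The quick frequentist point estimate below is just a stand-in for the “backwards” step; the rest of this post does it properly with a posterior distribution. (The seed and trial count here are arbitrary choices for illustration.)</p>

```python
import numpy as np
from scipy import stats

np.random.seed(42)  # arbitrary seed for reproducibility

# Forwards: a known parameter (p = 0.7) generates data
data = stats.binom.rvs(n=2, p=0.7, size=1000)  # 1000 trials of 2 tosses each

# Backwards: the data are used to estimate the unobserved parameter
p_hat = data.sum() / (2 * len(data))  # fraction of all tosses that came up heads
print(p_hat)
```

<p>With enough trials, the estimate lands close to the known parameter that generated the data.</p>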
<p>In our simulation, let’s say that the true proportion the coin comes up heads is 0.7. Since each coin toss is independent of the others and has the same probability of heads, the conditions for a binomial distribution are met. We’ll illustrate simulations using <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html"><code class="language-plaintext highlighter-rouge">scipy.stats</code></a> and inference using <code class="language-plaintext highlighter-rouge">pymc</code>. I’ll also show how to do a simulation using <code class="language-plaintext highlighter-rouge">pymc</code> alone, but I think the <code class="language-plaintext highlighter-rouge">scipy.stats</code> module offers more flexibility.</p>
<h1 id="running-forwards-simulation-to-get-data">Running forwards: simulation to get data</h1>
<p>Let’s imagine we do two coin flips. The possibilities after two tosses ($n$=2) are to observe 0 heads (H), 1 H, or 2 H. The number of observed heads is assigned $k$. Formally, the model equation with the known parameters will look like this.</p>
\[H \sim \text{Binomial}(n=2, p=0.7)\]
<p>It can be read as “the number of heads we will observe is binomially distributed after two tosses of a coin, with true known proportion of heads of 0.7”. We can get the probability mass function (PMF) for this distribution like this.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># k = 0, 1, or 2 heads
</span><span class="n">stats</span><span class="p">.</span><span class="n">binom</span><span class="p">.</span><span class="n">pmf</span><span class="p">(</span><span class="n">k</span><span class="o">=</span><span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">),</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([0.09, 0.42, 0.49])
</code></pre></div></div>
<p>This means that, after two coin flips, we expect to observe 0 heads with 9% probability, 1 head with 42% probability, and 2 heads with 49% probability. These three possibilities sum to 100%. We can’t observe more heads ($k$) than the number of tosses ($n$). This is made clear if we try to input a value of $k$ that is greater than $n$. (This time we’ll construct the list of $k$ values explicitly.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stats</span><span class="p">.</span><span class="n">binom</span><span class="p">.</span><span class="n">pmf</span><span class="p">(</span><span class="n">k</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([0.09, 0.42, 0.49, 0. ])
</code></pre></div></div>
<p>We don’t get an error; we simply get a probability of zero when $k > n$. Now let’s simulate.</p>
<p>We can use <code class="language-plaintext highlighter-rouge">stats.binom.rvs</code> to input parameters and generate data. Let’s do multiple trials, which is parameterized by <code class="language-plaintext highlighter-rouge">size</code>. To be clear, each trial means we’re doing two tosses and recording the number of heads in that trial. We’ll repeat this until we have 10 total trials.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Use rvs to make dummy observations
</span><span class="n">stats</span><span class="p">.</span><span class="n">binom</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([2, 2, 1, 1, 1, 2, 2, 2, 2, 1])
</code></pre></div></div>
<p>If you keep executing this cell, you’ll get a new set of values for observed H. We can also generate a high number of trials by simply making <code class="language-plaintext highlighter-rouge">size</code> large. We’ll do 100,000 trials.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dummy_h</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">binom</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10</span> <span class="o">**</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">dummy_h</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([2, 2, 1, ..., 1, 1, 2])
</code></pre></div></div>
<p>And from here, we can see how well the proportion of samples for each number of heads generated by the simulation matches the proportions determined analytically (using <code class="language-plaintext highlighter-rouge">stats.binom.pmf</code>).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Wrap the list in a series so I can use `value_counts`
</span><span class="n">pd</span><span class="p">.</span><span class="n">Series</span><span class="p">(</span><span class="n">dummy_h</span><span class="p">).</span><span class="n">value_counts</span><span class="p">()</span> <span class="o">/</span> <span class="mi">10</span> <span class="o">**</span> <span class="mi">5</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2 0.48837
1 0.42089
0 0.09074
dtype: float64
</code></pre></div></div>
<p>The numbers are pretty close to the PMF we calculated above.</p>
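<p>To make that comparison explicit, we can line the analytic PMF up against the simulated proportions. (The seed below is an arbitrary choice, and the data are regenerated so the block stands on its own, standing in for <code class="language-plaintext highlighter-rouge">dummy_h</code> above.)</p>

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(19)  # arbitrary seed

# Regenerate the simulated trials: 100,000 trials of 2 tosses each
dummy_h = stats.binom.rvs(n=2, p=0.7, size=10**5)

# Analytic probabilities vs. empirical proportions for k = 0, 1, 2 heads
comparison = pd.DataFrame({
    "analytic": stats.binom.pmf(k=range(3), n=2, p=0.7),
    "simulated": pd.Series(dummy_h).value_counts(normalize=True).sort_index(),
})
print(comparison)
```

<p>With this many trials, the two columns agree to within roughly the third decimal place.</p>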
<h1 id="running-backwards-inference-to-estimate-the-parameter">Running backwards: inference to estimate the parameter</h1>
<p>Now let’s run the model backwards with <code class="language-plaintext highlighter-rouge">pymc</code> starting with the data that we generated from running the model forward. This might seem like a silly exercise since the purpose of inference is to estimate an unknown parameter. But the point of this exercise is to see how the two directions of model running are connected.</p>
<p>Here is our model equation.</p>
<p>\(H \sim \text{Binomial}(n=2, p)\)
<br />
\(p \sim \text{Beta}(\alpha=2, \beta=2)\)</p>
<p>We can read this as “the number of heads we will observe is binomially distributed after two tosses, with some unknown parameter $p$.” We have to give $p$ some plausible prior distribution. Since $p$ should be between 0 and 1, a beta distribution is a good choice. I wrote about the beta distribution in a prior post <a href="https://benslack19.github.io/data%20science/statistics/prior-and-beta/">here</a>.</p>
<p>I chose the beta prior to be parameterized as (2, 2) since it is fairly conservative, with most of the mass suggesting a fair coin. You’ll see its shape below, when we plot it alongside the posterior. We can get the posterior with the following code. <strong>There is a closed-form solution where we can get the posterior analytically; the following obtains it with MCMC instead. While MCMC is overkill for this problem, it scales to more complicated models.</strong></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Generate observed data
</span><span class="n">dummy_h</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">binom</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="c1"># Infer the parameter
</span><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">m1</span><span class="p">:</span>
<span class="c1"># prior
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Beta</span><span class="p">(</span><span class="s">"p"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="c1"># likelihood with unknown parameter p, observed dummy_h
</span> <span class="n">H</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"H"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">dummy_h</span><span class="p">)</span>
<span class="c1"># posterior
</span> <span class="n">trace_m1</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span>
<span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [p]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 12 seconds.
</code></pre></div></div>
<p>Let’s plot the prior and posterior for $p$ together.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">b</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">f</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="c1"># known parameter to generate data
</span><span class="n">ax1</span><span class="p">.</span><span class="n">axvline</span><span class="p">(</span>
<span class="mf">0.7</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">"dashed"</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"known p to generate data"</span>
<span class="p">)</span>
<span class="c1"># prior
</span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">beta</span><span class="p">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">0.00</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span> <span class="n">stats</span><span class="p">.</span><span class="n">beta</span><span class="p">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">1.00</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">prior_label</span> <span class="o">=</span> <span class="s">"prior, beta("</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="o">+</span> <span class="s">", "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span> <span class="s">")"</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">stats</span><span class="p">.</span><span class="n">beta</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span> <span class="n">lw</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"gray"</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">prior_label</span><span class="p">)</span>
<span class="c1"># Make the posterior values accessible and plot
</span><span class="n">df_trace_m1</span> <span class="o">=</span> <span class="n">trace_m1</span><span class="p">.</span><span class="n">to_dataframe</span><span class="p">()</span>
<span class="n">sns</span><span class="p">.</span><span class="n">kdeplot</span><span class="p">(</span><span class="n">df_trace_m1</span><span class="p">[(</span><span class="s">"posterior"</span><span class="p">,</span> <span class="s">"p"</span><span class="p">)],</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"posterior"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Distribution of p"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"random variable x"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"PDF"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><matplotlib.legend.Legend at 0x7fb94274c190>
</code></pre></div></div>
<p><img src="/assets/2022-01-03-stats_rethinking_ch03_sim4blog_part1_files/2022-01-03-stats_rethinking_ch03_sim4blog_part1_19_1.png" alt="png" /></p>
<p>You can see how our distribution of $p$ narrows and gets closer to the known true value of 0.7, but hasn’t centered over the true value yet. That will happen with more trials. Still, more of the probability mass sits near 0.7 than under the prior. We can look more specifically at the 89% compatibility interval like this.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_m1</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>p</th>
<td>0.623</td>
<td>0.097</td>
<td>0.474</td>
<td>0.778</td>
<td>0.002</td>
<td>0.002</td>
<td>1725.0</td>
<td>1714.0</td>
<td>1749.0</td>
<td>2869.0</td>
<td>1.0</td>
</tr>
</tbody>
</table>
</div>
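<p>That narrowing with more trials can also be seen directly from the conjugate form of the posterior: holding the observed proportion of heads fixed at 0.7, the 89% interval shrinks roughly with the square root of the total number of tosses. A sketch with idealized data (the trial counts are arbitrary choices for illustration):</p>

```python
from scipy import stats

a, b = 2, 2   # the same Beta(2, 2) prior
p_true = 0.7

widths = []
for n_trials in [10, 100, 1000]:
    n_tosses = 2 * n_trials                  # 2 tosses per trial
    heads = round(p_true * n_tosses)         # idealized count at the true proportion
    # central 89% interval of the exact Beta posterior
    lo, hi = stats.beta(a + heads, b + n_tosses - heads).interval(0.89)
    widths.append(hi - lo)
    print(n_trials, round(hi - lo, 3))
```

<p>At 10 trials the interval width is comparable to the MCMC interval in the table above; by 1,000 trials it has collapsed to a few hundredths.</p>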
<h1 id="summary">Summary</h1>
<p>In this post, we talked about running a model forwards and backwards. I used a simple binomial example to illustrate how a known parameter can be used to generate data (running forwards) and how using observed data can help us obtain plausible parameter values in the form of a distribution. You may know already that <code class="language-plaintext highlighter-rouge">pymc</code> (and other software) has built-in capability to produce prior and posterior predictive simulations. We’ll use this functionality in a later post.</p>
<p>Appendix: Environment and system parameters</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Mon Jan 03 2022
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.20.0
json : 2.0.9
arviz : 0.11.1
pandas : 1.2.1
matplotlib : 3.3.4
pymc3 : 3.11.0
scipy : 1.6.0
seaborn : 0.11.1
numpy : 1.20.1
statsmodels: 0.12.2
Watermark: 2.1.0
</code></pre></div></div>Ben LacarThe value of simulations is highlighted by Dr. McElreath throughout his textbook and by van de Schoot and colleagues. I didn’t entirely appreciate its value until I implemented this myself. I started by reviewing some materials, then I went down a rabbit hole, where one question naturally branched into other questions. I’ll talk about simulations in multiple posts and how model running forwards and backwards can help with understanding.Exploring modeling failure2021-11-23T00:00:00+00:002021-11-23T00:00:00+00:00https://benslack19.github.io/data%20science/statistics/diagnosing-a-model<blockquote>
<p>In <a href="https://benslack19.github.io/data%20science/statistics/multilevel_modeling_01/">my last post</a>, I gave an example of a multilevel model using a binomial generalized linear model (GLM). The varying intercept model helped illustrate partial pooling, shrinkage, and information sharing. The equation to create the mixed effects model was simple. But how exactly is “information shared”? Let’s get started!</p>
</blockquote>
<p>This is how I was going to start this post. I thought this would be a fairly quick write-up where I would simplify the dataset by showing a few visualizations to demonstrate some concepts. But during this process, I realized the act of simplifying my dataset (using only three clusters) caused divergences when trying to obtain the posterior distribution. This is something that Richard McElreath had warned readers about on pages 407 and 408 of Statistical Rethinking. I tested and adjusted my priors–hard and unglamorous work that I planned to minimize to not distract from the main lesson.</p>
<p>However, I listened to <a href="https://www.learnbayesstats.com/episodes/9#showEpisodes">Alex Andorra’s conversation with Michael Betancourt in the Learning Bayesian Statistics podcast #6</a>. Around the 30 min mark, Alex and Michael talk about how failure is necessary for learning. Remarkably (for me), a few minutes later Michael gives an example specific to hierarchical models, about how problems can come from interactions between population location and population scale, including when there are only a small number of groups.</p>
<p>Instead of glossing over my observation, let’s dive into the problem, explore the model failure, and see how adjustments to priors and the modeling equations can resolve it.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">pymc3</span> <span class="k">as</span> <span class="n">pm</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">gaussian_kde</span>
<span class="kn">from</span> <span class="nn">theano</span> <span class="kn">import</span> <span class="n">tensor</span> <span class="k">as</span> <span class="n">tt</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">8927</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="n">RANDOM_SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.89</span> <span class="c1"># sets default credible interval used by arviz
</span><span class="n">sns</span><span class="p">.</span><span class="n">set_context</span><span class="p">(</span><span class="s">"talk"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">standardize</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
<p>I’ll be re-using the problem described in the <a href="https://benslack19.github.io/data%20science/statistics/multilevel_modeling_01/#problem-description">last post</a>, so I won’t re-write everything. But let’s show the fixed effects and mixed effects models from the original post since we’ll be contrasting them.</p>
<p><strong>Equation for fixed effects model</strong></p>
<p>Model <code class="language-plaintext highlighter-rouge">mfe</code> equation</p>
\[C_i \sim \text{Binomial}(1, p_i) \tag{binomial likelihood}\]
\[\text{logit}(p_i) = \alpha_{\text{district}[i]} \tag{linear model using logit link}\]
\[\alpha_j \sim \text{Normal}(0, 1.5) \tag{regularizing prior}\]
<p><strong>Equation for mixed effects model</strong></p>
<p>Model <code class="language-plaintext highlighter-rouge">mme</code> equation</p>
\[C_i \sim \text{Binomial}(1, p_i) \tag{binomial likelihood}\]
\[\text{logit}(p_i) = \alpha_{\text{district}[i]} \tag{linear model using logit link}\]
\[\alpha_j \sim \text{Normal}(\bar{\alpha}, \sigma) \tag{adaptive prior}\]
\[\bar{\alpha} \sim \text{Normal}(0, 1.5) \tag{regularizing hyperprior}\]
\[\sigma \sim \text{Exponential}(1) \tag{regularizing hyperprior}\]
<p>The main difference is the use of an adaptive prior and the regularizing hyperpriors in the mixed effects equations. There will be another change to the $\sigma$ term of the mixed effects model, which I’ll detail later.</p>
<h1 id="data-exploration-and-setup">Data exploration and setup</h1>
<p>I’ll load and clean the data in this one cell and skip the details. Review the last post if you’d like to revisit this step.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_bangladesh</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span>
<span class="s">"../pymc3_ed_resources/resources/Rethinking/Data/bangladesh.csv"</span><span class="p">,</span>
<span class="n">delimiter</span><span class="o">=</span><span class="s">";"</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Fix the district variable
</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">Categorical</span><span class="p">(</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district"</span><span class="p">]).</span><span class="n">codes</span>
</code></pre></div></div>
<p>To make the lessons of multilevel modeling more comprehensible, let’s limit the dataframe to only the first three districts. (As you’ll see, here is where I encountered trouble.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_bangladesh_first3</span> <span class="o">=</span> <span class="n">df_bangladesh</span><span class="p">[</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]</span> <span class="o"><</span> <span class="mi">3</span><span class="p">].</span><span class="n">copy</span><span class="p">()</span>
</code></pre></div></div>
<p>We can also get a count of women represented in each district. The variability in the number of women will help drive some of the lessons home further.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">'district_code'</span><span class="p">].</span><span class="n">value_counts</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 117
1 20
2 2
Name: district_code, dtype: int64
</code></pre></div></div>
<h1 id="fixed-effects-model">Fixed-effects model</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mfe</span><span class="p">:</span>
<span class="c1"># alpha prior, one for each district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mfe</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
</code></pre></div></div>
<p>Let’s do some model diagnostics with some <code class="language-plaintext highlighter-rouge">arviz</code> functions.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">az.summary</code> can provide the effective sample size (like with <code class="language-plaintext highlighter-rouge">ess_mean</code>) and <code class="language-plaintext highlighter-rouge">r_hat</code> values. The effective sample size is an indication of how well the posterior distribution was explored by HMC. Since Markov chains are typically autocorrelated, sampling within a chain is not entirely independent. The effective sample size accounts for this correlation and can even exceed the raw sample size. The <code class="language-plaintext highlighter-rouge">r_hat</code> value is computed from the “estimated between-chains and within-chain variances for each model parameter” (<a href="https://blog.stata.com/2016/05/26/gelman-rubin-convergence-diagnostic-using-multiple-chains/">source</a>). (While an imperfect analogy, I think of <a href="https://en.wikipedia.org/wiki/Analysis_of_variance">ANOVA and the F-test</a>, which is calculated as variance between groups divided by variance within groups.) McElreath cautions that an <code class="language-plaintext highlighter-rouge">r_hat</code> value above 1.00 is a signal of danger, but a value of 1.00 is not a guarantee of safety; an invalid chain can still reach 1.00.</li>
</ul>
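<p>To make the between-chain vs. within-chain variance idea concrete, here is a rough numpy sketch of the original (non-split) Gelman–Rubin calculation. The function name and the simulated chains are illustrative only; in practice <code class="language-plaintext highlighter-rouge">az.rhat</code> computes a more robust rank-normalized, split-chain version.</p>

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Rough, non-split R-hat for one parameter.
    `chains` has shape (n_chains, n_draws)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled posterior-variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(19)
good = rng.normal(size=(4, 1000))      # four chains sampling the same target
bad = good + np.arange(4)[:, None]     # four chains stuck in different places
print(gelman_rubin_rhat(good))         # close to 1.0
print(gelman_rubin_rhat(bad))          # well above 1.0
```

<p>When chains disagree, the between-chain term inflates the pooled variance estimate relative to the within-chain variance, pushing R-hat above 1.</p>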
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a[0]</th>
<td>-1.058</td>
<td>0.211</td>
<td>-1.380</td>
<td>-0.710</td>
<td>0.003</td>
<td>0.002</td>
<td>4574.0</td>
<td>4005.0</td>
<td>4666.0</td>
<td>2322.0</td>
<td>1.0</td>
</tr>
<tr>
<th>a[1]</th>
<td>-0.592</td>
<td>0.439</td>
<td>-1.287</td>
<td>0.086</td>
<td>0.006</td>
<td>0.005</td>
<td>5652.0</td>
<td>3980.0</td>
<td>5674.0</td>
<td>3009.0</td>
<td>1.0</td>
</tr>
<tr>
<th>a[2]</th>
<td>1.220</td>
<td>1.162</td>
<td>-0.582</td>
<td>3.069</td>
<td>0.015</td>
<td>0.014</td>
<td>5776.0</td>
<td>3345.0</td>
<td>5811.0</td>
<td>2619.0</td>
<td>1.0</td>
</tr>
</tbody>
</table>
</div>
<p>The <code class="language-plaintext highlighter-rouge">az.plot_trace</code> and <code class="language-plaintext highlighter-rouge">az.plot_rank</code> functions output visualizations. The former can, in theory, be used to assess how well chains are mixing, but this can be hard to see. The latter produces histograms of the ranked samples, which tell us how evenly a particular chain’s samples rank across all samples and chains. Efficient exploration of the posterior should yield uniform distributions in these kinds of plots (known as trace rank, or “trank”, plots).</p>
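<p>A trank plot boils down to a simple computation: pool the draws from every chain, rank them, and histogram the ranks that belong to each chain. Here is a hypothetical numpy sketch of that bookkeeping with four well-mixed (independent, identically distributed) chains.</p>

```python
import numpy as np

rng = np.random.default_rng(19)
chains = rng.normal(size=(4, 1000))    # 4 chains x 1000 draws, well mixed

flat = chains.ravel()
# rank of every draw across all chains (0..3999), reshaped back per chain
ranks = flat.argsort().argsort().reshape(chains.shape)

# Histogram chain 0's ranks; well-mixed chains take an even share of each bin.
hist_chain0, _ = np.histogram(ranks[0], bins=20, range=(0, flat.size))
print(hist_chain0)   # each of the 20 bins holds roughly 1000 / 20 = 50 draws
```

<p>A stuck or shifted chain would instead pile its ranks toward one end of the histogram, which is exactly the non-uniformity a trank plot makes visible.</p>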
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_trace</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[<AxesSubplot:title={'center':'a'}>,
<AxesSubplot:title={'center':'a'}>]], dtype=object)
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_18_1.png" alt="png" /></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_rank</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([<AxesSubplot:title={'center':'a\n0'}, xlabel='Rank (all chains)', ylabel='Chain'>,
<AxesSubplot:title={'center':'a\n1'}, xlabel='Rank (all chains)', ylabel='Chain'>,
<AxesSubplot:title={'center':'a\n2'}, xlabel='Rank (all chains)', ylabel='Chain'>],
dtype=object)
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_19_1.png" alt="png" /></p>
<p>As you can see in this case, the model ran great. Some indications for this:</p>
<ul>
<li>Pymc gave no warnings about divergences.</li>
<li>While we would need something to compare it to, the effective sample size (<code class="language-plaintext highlighter-rouge">ess_mean</code>) is high.</li>
<li>The <code class="language-plaintext highlighter-rouge">r_hat</code> is 1.0 which is a sign of “lack of danger”.</li>
<li>The trace plots apparently show good “wiggliness”, and inter-mixing between chains is confirmed by the trank plots.</li>
</ul>
<p>In the plots, the blue chain is <code class="language-plaintext highlighter-rouge">district0</code>, orange is <code class="language-plaintext highlighter-rouge">district1</code>, and green is <code class="language-plaintext highlighter-rouge">district2</code>, but let’s not focus too much on interpretation for now.</p>
<p>Let’s see how the first iteration of our mixed effects model looks. We’ll parameterize it as we had done in the last post and as the equations show above.</p>
<h1 id="mixed-effects-me-attempt-0">Mixed-effects (ME) attempt 0</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mme0</span><span class="p">:</span>
<span class="c1"># prior for average district
</span> <span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
<span class="c1"># prior for SD of districts
</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="c1"># alpha prior, one for each district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="n">a_bar</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mme0</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, sigma, a_bar]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
There were 71 divergences after tuning. Increase `target_accept` or reparameterize.
There were 71 divergences after tuning. Increase `target_accept` or reparameterize.
There were 99 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.7116085658262987, but should be close to 0.8. Try to increase the number of tuning steps.
There were 49 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 10% for some parameters.
</code></pre></div></div>
<p>Wow, we’ve already got a <a href="https://media.giphy.com/media/fwbo0KVql262TePTMZ/giphy.gif">number of issues</a>. Let’s take a look at our summary and trace plots.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme0</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>-0.449</td>
<td>0.703</td>
<td>-1.638</td>
<td>0.532</td>
<td>0.024</td>
<td>0.017</td>
<td>874.0</td>
<td>874.0</td>
<td>848.0</td>
<td>394.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[0]</th>
<td>-1.031</td>
<td>0.206</td>
<td>-1.355</td>
<td>-0.712</td>
<td>0.008</td>
<td>0.006</td>
<td>666.0</td>
<td>666.0</td>
<td>662.0</td>
<td>1641.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[1]</th>
<td>-0.642</td>
<td>0.434</td>
<td>-1.379</td>
<td>-0.025</td>
<td>0.014</td>
<td>0.011</td>
<td>996.0</td>
<td>821.0</td>
<td>993.0</td>
<td>330.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[2]</th>
<td>0.282</td>
<td>1.358</td>
<td>-1.513</td>
<td>2.137</td>
<td>0.051</td>
<td>0.036</td>
<td>715.0</td>
<td>715.0</td>
<td>663.0</td>
<td>382.0</td>
<td>1.00</td>
</tr>
<tr>
<th>sigma</th>
<td>0.966</td>
<td>0.752</td>
<td>0.167</td>
<td>1.885</td>
<td>0.033</td>
<td>0.023</td>
<td>523.0</td>
<td>523.0</td>
<td>232.0</td>
<td>134.0</td>
<td>1.01</td>
</tr>
</tbody>
</table>
</div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_trace</span><span class="p">(</span><span class="n">trace_mme0</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[<AxesSubplot:title={'center':'a_bar'}>,
<AxesSubplot:title={'center':'a_bar'}>],
[<AxesSubplot:title={'center':'a'}>,
<AxesSubplot:title={'center':'a'}>],
[<AxesSubplot:title={'center':'sigma'}>,
<AxesSubplot:title={'center':'sigma'}>]], dtype=object)
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_25_1.png" alt="png" /></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_rank</span><span class="p">(</span><span class="n">trace_mme0</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[<AxesSubplot:title={'center':'a_bar'}, xlabel='Rank (all chains)', ylabel='Chain'>,
<AxesSubplot:title={'center':'a\n0'}, xlabel='Rank (all chains)', ylabel='Chain'>,
<AxesSubplot:title={'center':'a\n1'}, xlabel='Rank (all chains)', ylabel='Chain'>],
[<AxesSubplot:title={'center':'a\n2'}, xlabel='Rank (all chains)', ylabel='Chain'>,
<AxesSubplot:title={'center':'sigma'}, xlabel='Rank (all chains)', ylabel='Chain'>,
<AxesSubplot:>]], dtype=object)
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_26_1.png" alt="png" /></p>
<p>The <code class="language-plaintext highlighter-rouge">r_hat</code> seems to indicate that the chains didn’t have a lot of variation between each other. But another thing to look at is the <code class="language-plaintext highlighter-rouge">ess_mean</code>, which is an indicator of the effective sample size. A higher number means sampling of the posterior distribution is more efficient. When we compare the fixed effects model with this first version of our mixed effects model, we can see that the latter had trouble sampling. The number of effective samples in the mixed effects model is much smaller for each district than in the fixed effects model. The trank plots are sounding alarms, particularly with the <code class="language-plaintext highlighter-rouge">sigma</code> parameter, but let’s come back to this.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">bar</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mf">0.1</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)],</span> <span class="n">height</span><span class="o">=</span><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">)[</span><span class="s">'ess_mean'</span><span class="p">],</span> <span class="n">width</span><span class="o">=</span><span class="mf">0.25</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'fixed effects'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">bar</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mf">0.1</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)],</span> <span class="n">height</span><span class="o">=</span><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme0</span><span class="p">)[</span><span class="s">'ess_mean'</span><span class="p">].</span><span class="n">iloc</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">4</span><span class="p">],</span> <span class="n">width</span><span class="o">=</span><span class="mf">0.25</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'navy'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'mixed effects v0'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xticks</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xticklabels</span><span class="p">([</span><span class="s">'district'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">(</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'effective sample size (mean)'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Text(0, 0.5, 'effective sample size (mean)')
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_28_1.png" alt="png" /></p>
<p>Let’s now take a closer look at some of the warnings.</p>
<p><code class="language-plaintext highlighter-rouge">There were 71 divergences after tuning.</code> This is an indication that the model had trouble exploring all of the posterior distribution and that there could be a problem with the chains. Divergences result when the “energy at the start of the trajectory differs substantially from the energy at the end,” according to pg. 278 of Statistical Rethinking. The energy pertains to the physics simulation that Hamiltonian Monte Carlo performs.</p>
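<p>As an aside, when sampling with <code class="language-plaintext highlighter-rouge">return_inferencedata=True</code>, the per-draw divergence flags live in the trace’s <code class="language-plaintext highlighter-rouge">sample_stats</code> group, so you can count them directly. A sketch using arviz’s bundled <code class="language-plaintext highlighter-rouge">centered_eight</code> example trace (a classic divergence-prone centered hierarchical model) as a stand-in for our trace object:</p>

```python
import arviz as az

# Bundled example InferenceData from a centered hierarchical model
idata = az.load_arviz_data("centered_eight")

diverging = idata.sample_stats["diverging"]   # boolean array, dims (chain, draw)
print("total divergences:", int(diverging.sum()))
print("per chain:", diverging.sum(dim="draw").values)
```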
<p><code class="language-plaintext highlighter-rouge">Increase `target_accept` or reparameterize.</code> This suggestion follows the above warning. We’ll talk about reparameterization later, but what is <code class="language-plaintext highlighter-rouge">target_accept</code> for? Per <a href="https://docs.pymc.io/en/stable/api/inference.html">the pymc documentation</a>, this controls the step size of the physics simulation. A higher <code class="language-plaintext highlighter-rouge">target_accept</code> value will lead to smaller step sizes, or a smaller duration of time to run each segment of a simulation. This will help explore tricky and curvy parts of a posterior distribution where a smaller <code class="language-plaintext highlighter-rouge">target_accept</code> value might overshoot and miss exploring these areas. (Why would you want a smaller <code class="language-plaintext highlighter-rouge">target_accept</code> if you can get away with it? The model will sample more efficiently if the posterior is not problematic.) Therefore, increasing the <code class="language-plaintext highlighter-rouge">target_accept</code> parameter is an easy thing we can change, so let’s try this first. The default value is 0.8, so let’s try 0.9.</p>
<h1 id="me-attempt-1-higher-target_accept">ME attempt 1: higher <code class="language-plaintext highlighter-rouge">target_accept</code></h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mme1</span><span class="p">:</span>
<span class="c1"># prior for average district
</span> <span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
<span class="c1"># prior for SD of districts
</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="c1"># alpha prior, one for each district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="n">a_bar</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mme1</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">target_accept</span><span class="o">=</span><span class="mf">0.9</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, sigma, a_bar]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
There were 151 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5782174075147031, but should be close to 0.9. Try to increase the number of tuning steps.
There were 34 divergences after tuning. Increase `target_accept` or reparameterize.
There were 56 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.8194929209484552, but should be close to 0.9. Try to increase the number of tuning steps.
There were 58 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme1</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>-0.497</td>
<td>0.650</td>
<td>-1.371</td>
<td>0.559</td>
<td>0.024</td>
<td>0.017</td>
<td>720.0</td>
<td>720.0</td>
<td>647.0</td>
<td>636.0</td>
<td>1.01</td>
</tr>
<tr>
<th>a[0]</th>
<td>-1.010</td>
<td>0.208</td>
<td>-1.332</td>
<td>-0.685</td>
<td>0.009</td>
<td>0.007</td>
<td>495.0</td>
<td>463.0</td>
<td>544.0</td>
<td>1926.0</td>
<td>1.01</td>
</tr>
<tr>
<th>a[1]</th>
<td>-0.652</td>
<td>0.405</td>
<td>-1.273</td>
<td>0.029</td>
<td>0.011</td>
<td>0.008</td>
<td>1290.0</td>
<td>1290.0</td>
<td>1198.0</td>
<td>1199.0</td>
<td>1.05</td>
</tr>
<tr>
<th>a[2]</th>
<td>0.193</td>
<td>1.327</td>
<td>-1.449</td>
<td>2.039</td>
<td>0.066</td>
<td>0.047</td>
<td>407.0</td>
<td>407.0</td>
<td>369.0</td>
<td>1246.0</td>
<td>1.02</td>
</tr>
<tr>
<th>sigma</th>
<td>0.866</td>
<td>0.741</td>
<td>0.091</td>
<td>1.765</td>
<td>0.051</td>
<td>0.036</td>
<td>214.0</td>
<td>214.0</td>
<td>42.0</td>
<td>21.0</td>
<td>1.07</td>
</tr>
</tbody>
</table>
</div>
<p>We got higher <code class="language-plaintext highlighter-rouge">ess_mean</code> values, but our <code class="language-plaintext highlighter-rouge">r_hat</code> actually got worse. Looks like we have more work to do. But let’s try <code class="language-plaintext highlighter-rouge">target_accept</code> again by <a href="https://media.giphy.com/media/bHG5gzKfPESAGr4Dxg/giphy.gif">cranking it up</a> to 0.99.</p>
<h1 id="me-attempt-2-even-higher-target_accept">ME attempt 2: even higher <code class="language-plaintext highlighter-rouge">target_accept</code></h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mme2</span><span class="p">:</span>
<span class="c1"># prior for average district
</span> <span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
<span class="c1"># prior for SD of districts
</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
    <span class="c1"># alpha prior, one intercept per district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="n">a_bar</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mme2</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">target_accept</span><span class="o">=</span><span class="mf">0.99</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, sigma, a_bar]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 35 seconds.
There were 10 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.9658147278026631, but should be close to 0.99. Try to increase the number of tuning steps.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
There were 3 divergences after tuning. Increase `target_accept` or reparameterize.
There were 5 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 10% for some parameters.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme2</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>-0.497</td>
<td>0.669</td>
<td>-1.382</td>
<td>0.558</td>
<td>0.022</td>
<td>0.016</td>
<td>913.0</td>
<td>913.0</td>
<td>942.0</td>
<td>1475.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[0]</th>
<td>-1.015</td>
<td>0.209</td>
<td>-1.327</td>
<td>-0.664</td>
<td>0.006</td>
<td>0.004</td>
<td>1358.0</td>
<td>1358.0</td>
<td>1352.0</td>
<td>1871.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[1]</th>
<td>-0.684</td>
<td>0.406</td>
<td>-1.355</td>
<td>-0.065</td>
<td>0.011</td>
<td>0.008</td>
<td>1419.0</td>
<td>1419.0</td>
<td>1389.0</td>
<td>1621.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[2]</th>
<td>0.122</td>
<td>1.315</td>
<td>-1.444</td>
<td>1.987</td>
<td>0.054</td>
<td>0.038</td>
<td>603.0</td>
<td>603.0</td>
<td>709.0</td>
<td>1132.0</td>
<td>1.00</td>
</tr>
<tr>
<th>sigma</th>
<td>0.865</td>
<td>0.760</td>
<td>0.019</td>
<td>1.749</td>
<td>0.034</td>
<td>0.024</td>
<td>489.0</td>
<td>489.0</td>
<td>349.0</td>
<td>425.0</td>
<td>1.01</td>
</tr>
</tbody>
</table>
</div>
<p>These numbers look better, but the suggestions to reparameterize persist. The problem we’re observing is a good example of <a href="https://statmodeling.stat.columbia.edu/2008/05/13/the_folk_theore/">the folk theorem of statistical computing</a> and why we should make the effort to reparameterize. But first, let’s see if we can visualize this problematic posterior using the arviz <code class="language-plaintext highlighter-rouge">plot_pair</code> function.</p>
<h2 id="visualizing-divergent-transitions">Visualizing divergent transitions</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">import</span> <span class="n">figure</span>
<span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_pair</span><span class="p">(</span><span class="n">trace_mme2</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">'kde'</span><span class="p">,</span> <span class="n">divergences</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[<AxesSubplot:ylabel='a\n0'>, <AxesSubplot:>, <AxesSubplot:>,
<AxesSubplot:>],
[<AxesSubplot:ylabel='a\n1'>, <AxesSubplot:>, <AxesSubplot:>,
<AxesSubplot:>],
[<AxesSubplot:ylabel='a\n2'>, <AxesSubplot:>, <AxesSubplot:>,
<AxesSubplot:>],
[<AxesSubplot:xlabel='a_bar', ylabel='sigma'>,
<AxesSubplot:xlabel='a\n0'>, <AxesSubplot:xlabel='a\n1'>,
<AxesSubplot:xlabel='a\n2'>]], dtype=object)
<Figure size 800x600 with 0 Axes>
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_39_2.png" alt="png" /></p>
<p>That’s a lot of red dots! Each represents a divergent transition. Why could this happen? McElreath again provides a clue by pointing to the sigma parameter. When using a logit function “floor and ceiling effects sometimes render extreme values of the variance equally plausible as more realistic values.” Let’s look closer at some of the diagnostic metrics.</p>
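<p>To see that ceiling effect numerically, here’s a quick sketch (my own illustration, not part of the original notebook): the inverse-logit squashes increasingly extreme log-odds values onto nearly identical probabilities, so large values of <code class="language-plaintext highlighter-rouge">sigma</code> buy almost no change in the likelihood.</p>

```python
# Inverse-logit saturates: values far apart on the log-odds scale
# map to probabilities that are nearly indistinguishable.
from scipy.special import expit  # inverse-logit

for a in [1, 3, 5, 10]:
    print(f"invlogit({a:>2}) = {expit(a):.6f}")
```

<p>Once the intercepts are a few units from zero, doubling them barely moves the probabilities, which is exactly why extreme values of the variance can look as plausible as realistic ones.</p>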
<p>First, we see a low <code class="language-plaintext highlighter-rouge">ess_mean</code> and an <code class="language-plaintext highlighter-rouge">r_hat</code> above 1.00, even if barely.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme0</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>sigma</th>
<td>0.966</td>
<td>0.752</td>
<td>0.167</td>
<td>1.885</td>
<td>0.033</td>
<td>0.023</td>
<td>523.0</td>
<td>523.0</td>
<td>232.0</td>
<td>134.0</td>
<td>1.01</td>
</tr>
</tbody>
</table>
</div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_trace</span><span class="p">(</span><span class="n">trace_mme0</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[<AxesSubplot:title={'center':'sigma'}>,
<AxesSubplot:title={'center':'sigma'}>]], dtype=object)
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_42_1.png" alt="png" /></p>
<p>It’s hard to tell what’s going on in the trace plot, but the rank plot definitely indicates something strange is going on.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_rank</span><span class="p">(</span><span class="n">trace_mme0</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><AxesSubplot:title={'center':'sigma'}, xlabel='Rank (all chains)', ylabel='Chain'>
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_44_1.png" alt="png" /></p>
<p>There’s something simple we can do first to help avoid extreme values of variance: use a much more informative prior. Let’s use a half-normal prior instead of an exponential. It will keep values of <code class="language-plaintext highlighter-rouge">sigma</code> positive while avoiding the more extreme values that an exponential distribution allows.</p>
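<p>To make that concrete (a sketch of my own, using unit-scale parameters for comparison), we can check how much upper-tail mass each prior places on extreme values of $\sigma$:</p>

```python
# Compare upper-tail probabilities: the Exponential(1) prior used above
# leaves noticeably more mass on extreme sigma values than a
# Half-Normal with the same (unit) scale does.
from scipy import stats

for thresh in [2, 3, 4]:
    p_exp = stats.expon(scale=1.0).sf(thresh)    # P(sigma > thresh), Exponential(1)
    p_hn = stats.halfnorm(scale=1.0).sf(thresh)  # P(sigma > thresh), Half-Normal(1)
    print(f"P(sigma > {thresh}): exponential = {p_exp:.5f}, half-normal = {p_hn:.5f}")
```

<p>The exponential’s heavier tail is what lets the sampler wander into those implausibly large variances.</p>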
<h1 id="me-attempt-3-more-informative-prior-for-sigma">ME attempt 3: more informative prior for sigma</h1>
<p>First, let’s plot the normal prior we’ll keep for $\bar{\alpha}$ alongside a half-normal prior for $\sigma$.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="p">(</span><span class="n">ax1</span><span class="p">,</span> <span class="n">ax2</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="c1"># First two subplots ------------------------------------------
</span><span class="n">x1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">0.01</span><span class="p">),</span>
<span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">0.99</span><span class="p">),</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">x2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">halfnorm</span><span class="p">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">0.01</span><span class="p">),</span>
<span class="n">stats</span><span class="p">.</span><span class="n">halfnorm</span><span class="p">.</span><span class="n">ppf</span><span class="p">(</span><span class="mf">0.99</span><span class="p">),</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x1</span><span class="p">),</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="sa">r</span><span class="s">'prior for $\bar{\alpha}$'</span><span class="p">,</span> <span class="n">xlabel</span> <span class="o">=</span> <span class="s">'x'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'density'</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x2</span><span class="p">,</span> <span class="n">stats</span><span class="p">.</span><span class="n">halfnorm</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">x2</span><span class="p">),</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="sa">r</span><span class="s">'prior for $\sigma$'</span><span class="p">,</span> <span class="n">xlabel</span> <span class="o">=</span> <span class="s">'x'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'density'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Text(0.5, 1.0, 'prior for $\\sigma$'),
Text(0.5, 0, 'x'),
Text(0, 0.5, 'density')]
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_48_1.png" alt="png" /></p>
<p>Model <code class="language-plaintext highlighter-rouge">mme3</code> equation</p>
\[C_i \sim \text{Binomial}(1, p_i) \tag{binomial likelihood}\]
\[\text{logit}(p_i) = \alpha_{\text{district}[i]} \tag{linear model using logit link}\]
\[\alpha_j \sim \text{Normal}(\bar{\alpha}, \sigma) \tag{adaptive prior}\]
\[\bar{\alpha} \sim \text{Normal}(0, 1.5) \tag{regularizing hyperprior}\]
\[\sigma \sim \text{Half-Normal}(\text{TBD}) \tag{regularizing hyperprior}\]
<p>I’m going to leave the exact value of our half-normal parameter TBD because I really don’t know what’s going to work. <a href="https://media.giphy.com/media/3o7TKCFii3mAz693Ko/giphy.gif">We’ll just have to try</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mme3</span><span class="p">:</span>
<span class="c1"># prior for average district
</span> <span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
<span class="c1"># prior for SD of districts
</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfNormal</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">)</span>
    <span class="c1"># alpha prior, one intercept per district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="n">a_bar</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mme3</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">target_accept</span><span class="o">=</span><span class="mf">0.99</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, sigma, a_bar]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 34 seconds.
There were 8 divergences after tuning. Increase `target_accept` or reparameterize.
There were 6 divergences after tuning. Increase `target_accept` or reparameterize.
There were 201 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6423494210597291, but should be close to 0.99. Try to increase the number of tuning steps.
There were 8 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.
</code></pre></div></div>
<p>Hmmm… that looks worse. Let’s make the prior on <code class="language-plaintext highlighter-rouge">sigma</code> tighter by setting the half-normal scale parameter to 0.1.</p>
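<p>Before sampling, it’s worth checking what this prior actually implies (a quick sanity check of my own, matching the scale of 0.1 used below):</p>

```python
# Quantiles of a Half-Normal prior with scale 0.1: nearly all of the
# prior mass sits below ~0.26, so this is a very restrictive choice.
from scipy import stats

prior = stats.halfnorm(scale=0.1)
for q in [0.5, 0.945, 0.99]:
    print(f"{q:.1%} of prior mass below sigma = {prior.ppf(q):.3f}")
```

<p>Keep that in mind when reading the posterior summary: with a prior this tight, the estimate of <code class="language-plaintext highlighter-rouge">sigma</code> may be driven as much by the prior as by the data.</p>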
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mme3b</span><span class="p">:</span>
<span class="c1"># prior for average district
</span> <span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
<span class="c1"># prior for SD of districts
</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfNormal</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">)</span>
    <span class="c1"># alpha prior, one intercept per district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="n">a_bar</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mme3b</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">target_accept</span><span class="o">=</span><span class="mf">0.99</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, sigma, a_bar]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 42 seconds.
There were 5 divergences after tuning. Increase `target_accept` or reparameterize.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
There were 3 divergences after tuning. Increase `target_accept` or reparameterize.
There were 4 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 10% for some parameters.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme3b</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>-0.924</td>
<td>0.205</td>
<td>-1.244</td>
<td>-0.597</td>
<td>0.009</td>
<td>0.006</td>
<td>571.0</td>
<td>542.0</td>
<td>573.0</td>
<td>517.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[0]</th>
<td>-0.952</td>
<td>0.186</td>
<td>-1.260</td>
<td>-0.661</td>
<td>0.007</td>
<td>0.005</td>
<td>640.0</td>
<td>613.0</td>
<td>650.0</td>
<td>662.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[1]</th>
<td>-0.917</td>
<td>0.220</td>
<td>-1.244</td>
<td>-0.557</td>
<td>0.009</td>
<td>0.007</td>
<td>600.0</td>
<td>567.0</td>
<td>601.0</td>
<td>608.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[2]</th>
<td>-0.911</td>
<td>0.230</td>
<td>-1.269</td>
<td>-0.548</td>
<td>0.010</td>
<td>0.007</td>
<td>585.0</td>
<td>567.0</td>
<td>578.0</td>
<td>649.0</td>
<td>1.00</td>
</tr>
<tr>
<th>sigma</th>
<td>0.081</td>
<td>0.059</td>
<td>0.004</td>
<td>0.161</td>
<td>0.003</td>
<td>0.002</td>
<td>346.0</td>
<td>346.0</td>
<td>286.0</td>
<td>532.0</td>
<td>1.02</td>
</tr>
</tbody>
</table>
</div>
<h1 id="me-attempt-4-re-paramaterization">ME attempt 4: re-parameterization</h1>
<p>The <code class="language-plaintext highlighter-rouge">r_hat</code> values for the <code class="language-plaintext highlighter-rouge">a</code> parameters are better, but the <code class="language-plaintext highlighter-rouge">ess_mean</code> still indicates some inefficiencies. The <code class="language-plaintext highlighter-rouge">sigma</code> also still looks bad. Now we should really consider re-parameterizing. Luckily, we have an example of how to do this in the book.</p>
<p>Let’s look again at the original equation. We’ve got a substitution we can make for $\alpha$. We can also loosen the prior on $\sigma$ back up, using a Half-Normal(0.5) prior.</p>
<p>Centered equation</p>
\[C_i \sim \text{Binomial}(1, p_i) \tag{binomial likelihood}\]
\[\text{logit}(p_i) = \alpha_{\text{district}[i]} \tag{linear model using logit link}\]
\[\alpha_j \sim \text{Normal}(\bar{\alpha}, \sigma) \tag{adaptive prior}\]
\[\bar{\alpha} \sim \text{Normal}(0, 1.5) \tag{regularizing hyperprior}\]
\[\sigma \sim \text{Half-Normal}(\text{TBD}) \tag{regularizing hyperprior}\]
<p>Non-centered equation</p>
\[C_i \sim \text{Binomial}(1, p_i) \tag{binomial likelihood}\]
\[\text{logit}(p_i) = \bar{\alpha} + z_{\text{district}[i]} \sigma \tag{substituting for alpha}\]
\[z \sim \text{Normal}(0, 1) \tag{adaptive prior}\]
\[\bar{\alpha} \sim \text{Normal}(0, 1.5) \tag{regularizing hyperprior}\]
\[\sigma \sim \text{Half-Normal}(0.5) \tag{regularizing hyperprior}\]
<p>The non-centered equation gives us a posterior that is easier to explore, while allowing us to use transformations to get back the numerical values we care about.</p>
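<p>The substitution works because if $z \sim \text{Normal}(0, 1)$, then $\bar{\alpha} + z\sigma$ has a $\text{Normal}(\bar{\alpha}, \sigma)$ distribution. A quick simulation (my own sketch, with arbitrary parameter values) confirms the two parameterizations describe the same distribution for $\alpha$:</p>

```python
# Centered vs. non-centered draws of alpha come from the same
# Normal(a_bar, sigma) distribution; arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(19)
a_bar, sigma, n = -0.5, 0.9, 200_000

a_centered = rng.normal(a_bar, sigma, n)  # a ~ Normal(a_bar, sigma)
z = rng.normal(0.0, 1.0, n)               # z ~ Normal(0, 1)
a_noncentered = a_bar + z * sigma         # transform back to alpha

print(a_centered.mean(), a_noncentered.mean())  # both near -0.5
print(a_centered.std(), a_noncentered.std())    # both near 0.9
```

<p>The sampler only ever sees the well-behaved standard-normal <code class="language-plaintext highlighter-rouge">z</code>; the geometry-inducing multiplication by <code class="language-plaintext highlighter-rouge">sigma</code> happens deterministically afterward.</p>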
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mme4</span><span class="p">:</span>
<span class="c1"># prior for average district
</span> <span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
    <span class="c1"># prior for SD of districts
</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfNormal</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>
<span class="c1"># our substitution
</span> <span class="n">z</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"z"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
    <span class="c1"># the centered alpha prior we are replacing:
</span> <span class="c1"># a = pm.Normal("a", a_bar, sigma, shape=len(df_bangladesh_first3["district_code"].unique()))
</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a_bar</span> <span class="o">+</span> <span class="n">z</span><span class="p">[</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]]</span> <span class="o">*</span> <span class="n">sigma</span><span class="p">)</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh_first3</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mme4</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">target_accept</span><span class="o">=</span><span class="mf">0.99</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
INFO:pymc3:Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
INFO:pymc3:Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc3:Multiprocess sampling (4 chains in 4 jobs)
NUTS: [z, sigma, a_bar]
INFO:pymc3:NUTS: [z, sigma, a_bar]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
INFO:pymc3:Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme4</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>-0.731</td>
<td>0.409</td>
<td>-1.343</td>
<td>-0.137</td>
<td>0.014</td>
<td>0.010</td>
<td>905.0</td>
<td>905.0</td>
<td>1116.0</td>
<td>1060.0</td>
<td>1.0</td>
</tr>
<tr>
<th>z[0]</th>
<td>-0.513</td>
<td>0.883</td>
<td>-1.884</td>
<td>0.894</td>
<td>0.025</td>
<td>0.018</td>
<td>1238.0</td>
<td>1238.0</td>
<td>1245.0</td>
<td>1783.0</td>
<td>1.0</td>
</tr>
<tr>
<th>z[1]</th>
<td>0.059</td>
<td>0.870</td>
<td>-1.237</td>
<td>1.504</td>
<td>0.019</td>
<td>0.015</td>
<td>2150.0</td>
<td>1661.0</td>
<td>2138.0</td>
<td>2537.0</td>
<td>1.0</td>
</tr>
<tr>
<th>z[2]</th>
<td>0.453</td>
<td>1.002</td>
<td>-1.138</td>
<td>2.039</td>
<td>0.021</td>
<td>0.016</td>
<td>2334.0</td>
<td>1919.0</td>
<td>2332.0</td>
<td>2212.0</td>
<td>1.0</td>
</tr>
<tr>
<th>sigma</th>
<td>0.393</td>
<td>0.290</td>
<td>0.000</td>
<td>0.783</td>
<td>0.008</td>
<td>0.006</td>
<td>1216.0</td>
<td>1216.0</td>
<td>1037.0</td>
<td>1015.0</td>
<td>1.0</td>
</tr>
</tbody>
</table>
</div>
<p>Hallelujah! We’ve got no divergences and great <code class="language-plaintext highlighter-rouge">ess_mean</code> and <code class="language-plaintext highlighter-rouge">r_hat</code> values. Let’s visualize what we’ve got.</p>
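<p>Rather than eyeballing the summary table, it can be handy to flag suspect parameters programmatically. Here is a small sketch (the threshold values are common rules of thumb, not hard cutoffs, and <code class="language-plaintext highlighter-rouge">flag_diagnostics</code> is a helper I’m making up for illustration) that works on a frame shaped like the one <code class="language-plaintext highlighter-rouge">az.summary</code> returns:</p>

```python
import pandas as pd

def flag_diagnostics(summary_df, rhat_max=1.01, ess_min=400):
    """Return rows of an ArviZ-style summary whose r_hat or bulk ESS look suspect."""
    bad_rhat = summary_df["r_hat"] > rhat_max
    bad_ess = summary_df["ess_bulk"] < ess_min
    return summary_df[bad_rhat | bad_ess]

# e.g. flag_diagnostics(az.summary(trace_mme4)) should come back empty for this model
```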
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_pair</span><span class="p">(</span><span class="n">trace_mme4</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">'kde'</span><span class="p">,</span> <span class="n">divergences</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[<AxesSubplot:ylabel='z\n0'>, <AxesSubplot:>, <AxesSubplot:>,
<AxesSubplot:>],
[<AxesSubplot:ylabel='z\n1'>, <AxesSubplot:>, <AxesSubplot:>,
<AxesSubplot:>],
[<AxesSubplot:ylabel='z\n2'>, <AxesSubplot:>, <AxesSubplot:>,
<AxesSubplot:>],
[<AxesSubplot:xlabel='a_bar', ylabel='sigma'>,
<AxesSubplot:xlabel='z\n0'>, <AxesSubplot:xlabel='z\n1'>,
<AxesSubplot:xlabel='z\n2'>]], dtype=object)
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_60_1.png" alt="png" /></p>
<p>Clean pair plots!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">plot_rank</span><span class="p">(</span><span class="n">trace_mme4</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="s">'sigma'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><AxesSubplot:title={'center':'sigma'}, xlabel='Rank (all chains)', ylabel='Chain'>
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_62_1.png" alt="png" /></p>
<p>Great rank plots!</p>
<p>Now let’s get our <code class="language-plaintext highlighter-rouge">a</code> values back. It might be a little tricky to see where this came from, so let’s review the most important parts of the centered and non-centered equations.</p>
<p>Centered equation
\(\text{logit}(p_i) = \alpha_{\text{district}[i]} \tag{linear model using logit link}\)
\(\alpha_j \sim \text{Normal}(\bar{\alpha}, \sigma) \tag{adaptive prior}\)</p>
<p>Non-centered equation
\(\text{logit}(p_i) = \bar{\alpha} + z_{\text{district}[i]} \sigma \tag{substituting for alpha}\)
\(z \sim \text{Normal}(0, 1) \tag{adaptive prior}\)</p>
<p>$\alpha$ became $\bar{\alpha} + z_{\text{district}[i]} \sigma$.</p>
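<p>The two parameterizations describe the same prior; only the geometry presented to the sampler changes. As a quick sanity check in plain NumPy (the <code class="language-plaintext highlighter-rouge">a_bar</code> and <code class="language-plaintext highlighter-rouge">sigma</code> values here are arbitrary, not the fitted ones), drawing $z \sim \text{Normal}(0, 1)$ and computing $\bar{\alpha} + z\sigma$ gives the same distribution as drawing from $\text{Normal}(\bar{\alpha}, \sigma)$ directly:</p>

```python
import numpy as np

rng = np.random.default_rng(19)
a_bar, sigma = -0.7, 0.4

# centered: draw alpha directly from its adaptive prior
alpha_centered = rng.normal(a_bar, sigma, size=100_000)

# non-centered: draw a standardized z, then scale and shift
z = rng.normal(0.0, 1.0, size=100_000)
alpha_noncentered = a_bar + z * sigma

print(alpha_centered.mean(), alpha_noncentered.mean())  # both ≈ -0.7
print(alpha_centered.std(), alpha_noncentered.std())    # both ≈ 0.4
```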
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">trace_mme4_df</span> <span class="o">=</span> <span class="n">trace_mme4</span><span class="p">.</span><span class="n">to_dataframe</span><span class="p">()</span>
<span class="n">trace_mme4_df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">7</span><span class="p">]</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>chain</th>
<th>draw</th>
<th>(posterior, a_bar)</th>
<th>(posterior, z[0], 0)</th>
<th>(posterior, z[1], 1)</th>
<th>(posterior, z[2], 2)</th>
<th>(posterior, sigma)</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0</td>
<td>0</td>
<td>0.185127</td>
<td>-1.501203</td>
<td>-0.768144</td>
<td>1.673570</td>
<td>0.547466</td>
</tr>
<tr>
<th>1</th>
<td>0</td>
<td>1</td>
<td>-0.025513</td>
<td>-1.515975</td>
<td>-1.125922</td>
<td>-0.096766</td>
<td>0.684001</td>
</tr>
<tr>
<th>2</th>
<td>0</td>
<td>2</td>
<td>-0.334302</td>
<td>-1.174676</td>
<td>-1.535301</td>
<td>0.709949</td>
<td>0.569022</td>
</tr>
<tr>
<th>3</th>
<td>0</td>
<td>3</td>
<td>-0.134617</td>
<td>-2.165963</td>
<td>-1.432953</td>
<td>0.741756</td>
<td>0.376201</td>
</tr>
<tr>
<th>4</th>
<td>0</td>
<td>4</td>
<td>-0.151168</td>
<td>-2.197087</td>
<td>-1.454579</td>
<td>0.214046</td>
<td>0.226324</td>
</tr>
</tbody>
</table>
</div>
<p>Therefore, we can get the <code class="language-plaintext highlighter-rouge">a</code> parameter for each district (<code class="language-plaintext highlighter-rouge">a[0], a[1], a[2]</code>) by doing the appropriate arithmetic on each row. I’m going to ignore the chains for now.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Initialize
</span><span class="n">df_a</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">trace_mme4_df</span><span class="p">),</span> <span class="mi">3</span><span class="p">)))</span>
<span class="c1"># Fill in rows with transformation
</span><span class="n">df_a</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'a_bar'</span><span class="p">)]</span> <span class="o">+</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'z[0]'</span><span class="p">,</span> <span class="mi">0</span><span class="p">)]</span> <span class="o">*</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'sigma'</span><span class="p">)]</span>
<span class="n">df_a</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'a_bar'</span><span class="p">)]</span> <span class="o">+</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'z[1]'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)]</span> <span class="o">*</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'sigma'</span><span class="p">)]</span>
<span class="n">df_a</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'a_bar'</span><span class="p">)]</span> <span class="o">+</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'z[2]'</span><span class="p">,</span> <span class="mi">2</span><span class="p">)]</span> <span class="o">*</span> <span class="n">trace_mme4_df</span><span class="p">[(</span><span class="s">'posterior'</span><span class="p">,</span> <span class="s">'sigma'</span><span class="p">)]</span>
</code></pre></div></div>
<p>We can plot our parameters now, with a bit more plotting code for the middle plot since we have raw values. And just as a reminder of the original goal of the post, we’ll show the fixed effects model as well. We’ll plot them on the same x-scale to show the dramatic difference that partial pooling has.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="p">(</span><span class="n">ax1</span><span class="p">,</span> <span class="n">ax2</span><span class="p">,</span> <span class="n">ax3</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="n">sharex</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># z parameters, mixed effects
</span><span class="n">az</span><span class="p">.</span><span class="n">plot_forest</span><span class="p">(</span><span class="n">trace_mme4</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="s">'z'</span><span class="p">,</span> <span class="n">combined</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax1</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"z parameters</span><span class="se">\n</span><span class="s">mixed-effects"</span><span class="p">)</span>
<span class="c1"># a parameters, mixed effects
</span><span class="n">ax2</span><span class="p">.</span><span class="n">hlines</span><span class="p">(</span><span class="n">xmin</span><span class="o">=</span><span class="n">df_a</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.055</span><span class="p">),</span> <span class="n">xmax</span><span class="o">=</span><span class="n">df_a</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.945</span><span class="p">),</span> <span class="n">y</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">hlines</span><span class="p">(</span><span class="n">xmin</span><span class="o">=</span><span class="n">df_a</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.055</span><span class="p">),</span> <span class="n">xmax</span><span class="o">=</span><span class="n">df_a</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.945</span><span class="p">),</span> <span class="n">y</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">hlines</span><span class="p">(</span><span class="n">xmin</span><span class="o">=</span><span class="n">df_a</span><span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.055</span><span class="p">),</span> <span class="n">xmax</span><span class="o">=</span><span class="n">df_a</span><span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.945</span><span class="p">),</span> <span class="n">y</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_a</span><span class="p">.</span><span class="n">mean</span><span class="p">(),</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">),</span> <span class="n">facecolors</span><span class="o">=</span><span class="s">'white'</span><span class="p">,</span> <span class="n">edgecolors</span><span class="o">=</span><span class="s">'blue'</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">invert_yaxis</span><span class="p">()</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">set_yticks</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">))</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">set_yticklabels</span><span class="p">([</span><span class="s">"a"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)])</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"a parameters</span><span class="se">\n</span><span class="s">mixed-effects"</span><span class="p">)</span>
<span class="c1"># a parameters, fixed effects
</span><span class="n">az</span><span class="p">.</span><span class="n">plot_forest</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="s">'a'</span><span class="p">,</span> <span class="n">combined</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax3</span><span class="p">)</span>
<span class="n">ax3</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"a parameters</span><span class="se">\n</span><span class="s">fixed-effects"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Text(0.5, 1.0, 'a parameters\nfixed-effects')
</code></pre></div></div>
<p><img src="/assets/2021-11-23-diagnosing-a-model_files/2021-11-23-diagnosing-a-model_70_1.png" alt="png" /></p>
<p>In the mixed-effects model, we see that the point estimates of the <code class="language-plaintext highlighter-rouge">a</code> parameters mirror the pattern of the <code class="language-plaintext highlighter-rouge">z</code> parameters, which is to be expected. Of course, after we have gone through our model diagnostics and iterations, the bigger picture is that the multilevel model yields a much different result than the fixed-effects model that we created at the start of the post. We see shrinkage of our estimates, so the districts appear to differ less among themselves than the fixed-effects model originally suggested. We also see a reduction in variance. Both the movement in the point estimate and the reduction in variance are most dramatic for district 2, which has the smallest sample size of the three shown here.</p>
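<p>To build intuition for why the small district shrinks the most, here is a toy normal-normal approximation of partial pooling. This is not the binomial model fit above, and the <code class="language-plaintext highlighter-rouge">sigma_within</code> and <code class="language-plaintext highlighter-rouge">tau</code> values are made up; the point is only that each district estimate behaves like a precision-weighted average of its raw mean and the grand mean, so less data means more pull toward the grand mean:</p>

```python
def partial_pool(raw_mean, n, grand_mean, sigma_within, tau):
    """Normal-normal approximation of a varying-intercept estimate:
    a precision-weighted average of the district's raw mean and the grand mean."""
    w_data = n / sigma_within**2   # precision contributed by the district's data
    w_prior = 1 / tau**2           # precision contributed by the population prior
    return (w_data * raw_mean + w_prior * grand_mean) / (w_data + w_prior)

grand, sw, tau = -0.7, 1.0, 0.4
big = partial_pool(raw_mean=0.5, n=100, grand_mean=grand, sigma_within=sw, tau=tau)
small = partial_pool(raw_mean=0.5, n=5, grand_mean=grand, sigma_within=sw, tau=tau)
# same raw mean, but the small-sample district is pulled much closer to the grand mean
```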
<h1 id="summary">Summary</h1>
<p>Well, what started as a simple post led to a deeper understanding of how to diagnose a model and which knobs to fiddle with. More often than not, we’ll have to turn to alternate parameterizations as we did here. In another post, we’ll get back to the original goal of understanding how partial pooling happens, resulting in shrinkage of estimates for our clusters.</p>
<h1 id="appendix-environment-and-system-parameters">Appendix: Environment and system parameters</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Mon Nov 22 2021
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.20.0
scipy : 1.6.0
sys : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12)
[Clang 11.0.1 ]
pandas : 1.2.1
seaborn : 0.11.1
pymc3 : 3.11.0
numpy : 1.20.1
theano : 1.1.0
arviz : 0.11.1
matplotlib: 3.3.4
Watermark: 2.1.0
</code></pre></div></div>
<h1 id="multilevel-modeling-with-binomial-glm">Multilevel modeling with binomial GLM</h1>
<p><em>Ben Lacar, 2021-10-23, https://benslack19.github.io/data%20science/statistics/multilevel_modeling_01</em></p>
<p>I’ve been on a journey learning multilevel models and Bayesian inference through <a href="https://xcelab.net/rm/statistical-rethinking/">Richard McElreath’s Statistical Rethinking book</a>. The concepts of shrinkage and partial pooling that are inherent to multilevel models are really interesting to me. First, let’s get some terminology out of the way. As Dr. McElreath highlights, there are multiple terms that are used for multilevel models.</p>
<ul>
<li>hierarchical models</li>
<li>mixed-effects models</li>
<li>varying effects models</li>
</ul>
<p>Be mindful, though, because other authors may have different definitions for these terms. In this post, I will use these terms interchangeably, as Dr. McElreath does. Regardless of what they’re called, multilevel models use information sharing based on grouping variables to make more accurate estimates, especially when groups are of variable sizes. These concepts only made sense to me after working through a problem. Let’s look at the impact of a multilevel model structure using a binomial generalized linear model (GLM) example from the book. Dr. McElreath uses the “tadpole data”, but here I’ll use problem 13H1, which has similar concepts. This question illustrates the use of “varying intercepts”, the simplest kind of varying effects model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">pymc3</span> <span class="k">as</span> <span class="n">pm</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">expit</span> <span class="k">as</span> <span class="n">logistic</span>
<span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">logit</span>
<span class="kn">from</span> <span class="nn">scipy.optimize</span> <span class="kn">import</span> <span class="n">curve_fit</span>
<span class="kn">from</span> <span class="nn">causalgraphicalmodels</span> <span class="kn">import</span> <span class="n">CausalGraphicalModel</span>
<span class="kn">from</span> <span class="nn">theano</span> <span class="kn">import</span> <span class="n">tensor</span> <span class="k">as</span> <span class="n">tt</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">8927</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="n">RANDOM_SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"stats.hdi_prob"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.89</span> <span class="c1"># sets default credible interval used by arviz
</span><span class="n">sns</span><span class="p">.</span><span class="n">set_context</span><span class="p">(</span><span class="s">"talk"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The nb_black extension is already loaded. To reload it, use:
%reload_ext nb_black
The watermark extension is already loaded. To reload it, use:
%reload_ext watermark
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">standardize</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
<h1 id="problem-description">Problem description</h1>
<p><em>The description of problem 13H1 is taken directly from the book.</em></p>
<blockquote>
<p>In 1980, a typical Bengali woman could have 5 or more children in her lifetime. By the year 2000, a typical Bengali woman had only 2 or 3. You’re going to look at a historical set of data, when contraception was widely available but many families chose not to use it. These data reside in <code class="language-plaintext highlighter-rouge">data(bangladesh)</code> and come from the 1988 Bangladesh Fertility Survey. Each row is one of 1934 women. There are six variables, but you can focus on two of them for this practice problem:</p>
</blockquote>
<blockquote>
<ol>
<li><code class="language-plaintext highlighter-rouge">district</code>: ID number of administrative district each woman resided in</li>
<li><code class="language-plaintext highlighter-rouge">use.contraception</code>: An indicator (0/1) of whether the woman was using contraception</li>
</ol>
</blockquote>
<blockquote>
<p>The first thing to do is ensure that the cluster variable, <code class="language-plaintext highlighter-rouge">district</code>, is a contiguous set of integers. Recall that these values will be index values inside the model. If there are gaps, you’ll have parameters for which there is no data to inform them. Worse, the model probably won’t run. Look at the unique values of the <code class="language-plaintext highlighter-rouge">district</code> variable:</p>
</blockquote>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># R code 13.40
sort(unique(d$district))
[1] 1 2 3 4 5 ... 51 52 53 55 56 57.... 61
</code></pre></div></div>
<blockquote>
<p>District 54 is absent. So <code class="language-plaintext highlighter-rouge">district</code> isn’t yet a good index variable, because it’s not contiguous. This is easy to fix. Just make a new variable that is contiguous. This is enough to do it:</p>
</blockquote>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># R code 13.41
d$district_id <-as.integer(as.factor(d$district))
sort(unique(d$district_id))
[1] 1 2 3 4 5 ... 60
</code></pre></div></div>
<blockquote>
<p>Now there are 60 values, contiguous integers 1 to 60. Now, focus on predicting <code class="language-plaintext highlighter-rouge">use.contraception</code>, clustered by <code class="language-plaintext highlighter-rouge">district_id</code>. Fit both (1) a traditional fixed-effects model that uses an index variable for district and (2) a multilevel model with varying intercepts for district. Plot the predicted proportions of women in each district using contraception, for both the fixed-effects model and the varying-effects model. That is, make a plot in which district ID is on the horizontal axis and expected proportion using contraception is on the vertical. Make one plot for each model, or layer them on the same plot, as you prefer. How do the models disagree? Can you explain the pattern of disagreement? In particular, can you explain the most extreme cases of disagreement, both why they happen where they do and why the models reach different inferences?</p>
</blockquote>
<p>This problem provides some nice opportunities for how multilevel models work so I’ll make some additional plots in addition to what the question asks for. Let’s dive in!</p>
<h1 id="data-exploration-and-setup">Data exploration and setup</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_bangladesh</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span>
<span class="s">"../pymc3_ed_resources/resources/Rethinking/Data/bangladesh.csv"</span><span class="p">,</span>
<span class="n">delimiter</span><span class="o">=</span><span class="s">";"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">df_bangladesh</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>woman</th>
<th>district</th>
<th>use.contraception</th>
<th>living.children</th>
<th>age.centered</th>
<th>urban</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>1</td>
<td>0</td>
<td>4</td>
<td>18.4400</td>
<td>1</td>
</tr>
<tr>
<th>1</th>
<td>2</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>-5.5599</td>
<td>1</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>1</td>
<td>0</td>
<td>3</td>
<td>1.4400</td>
<td>1</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>1</td>
<td>0</td>
<td>4</td>
<td>8.4400</td>
<td>1</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>-13.5590</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
<p>The dataframe has several columns, but for this problem, we’ll focus on only the outcome variable <code class="language-plaintext highlighter-rouge">use.contraception</code> and the <code class="language-plaintext highlighter-rouge">district</code> feature. Note how each row represents one woman.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">"shape of df: "</span><span class="p">,</span> <span class="n">df_bangladesh</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>shape of df: (1934, 6)
</code></pre></div></div>
<p>Per the assignment, we fix the district variable since it is not a contiguous set of integers. Luckily, this is easy enough to do with <code class="language-plaintext highlighter-rouge">pd.Categorical</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">Categorical</span><span class="p">(</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district"</span><span class="p">]).</span><span class="n">codes</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># inspect and see that it's now 0-indexed for Python
</span>
<span class="k">print</span><span class="p">(</span>
<span class="s">"Head of the dataframe: "</span><span class="p">,</span> <span class="n">df_bangladesh</span><span class="p">[[</span><span class="s">"district"</span><span class="p">,</span> <span class="s">"district_code"</span><span class="p">]].</span><span class="n">drop_duplicates</span><span class="p">().</span><span class="n">head</span><span class="p">()</span>
<span class="p">)</span>
<span class="c1"># and also that it accounts for missing district 54
</span><span class="k">print</span><span class="p">(</span>
<span class="s">"Tail: of the dataframe "</span><span class="p">,</span> <span class="n">df_bangladesh</span><span class="p">[[</span><span class="s">"district"</span><span class="p">,</span> <span class="s">"district_code"</span><span class="p">]].</span><span class="n">drop_duplicates</span><span class="p">().</span><span class="n">tail</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Head of the dataframe: district district_code
0 1 0
117 2 1
137 3 2
139 4 3
169 5 4
Tail: of the dataframe district district_code
1622 51 50
1659 52 51
1720 53 52
1739 55 53
1745 56 54
1790 57 55
1817 58 56
1850 59 57
1860 60 58
1892 61 59
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Inspect the outcome variable
</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">].</span><span class="n">value_counts</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 1175
1 759
Name: use.contraception, dtype: int64
</code></pre></div></div>
<p>Now that we have a sense for how the data is structured, we can start building our models. To appreciate the mixed-effects model, we will start by creating a fixed-effects model so that we can highlight the differences between them.</p>
<h1 id="fixed-effects-model">Fixed-effects model</h1>
<p>Our goal is to predict <code class="language-plaintext highlighter-rouge">use.contraception</code>. Since there are two possible outcomes, it makes sense to use a binomial GLM for this problem. We’ll use an index variable for district, and it will be an intercept-only model. We are using a binomial likelihood where the number of trials <em>n</em> is 1 for each observation, since each row of our dataset represents one woman. (Alternatively, we could have used a Bernoulli likelihood.) The parameter <em>p</em> is the probability of a woman using contraception. We obtain it from the linear model on the second line, which uses the <a href="https://en.wikipedia.org/wiki/Logit">logit function</a> as our link function for the binomial GLM. Finally, we have a regularizing prior for $\alpha$ so that our considered values for <em>p</em> are within reason. What $\alpha$ represents in this case is the average contraception use for each district, <em>regardless</em> of any other variables in our dataframe, because we have omitted them from our model. This point will be contrasted with a future post where we build on this problem.</p>
<p><strong>Model <code class="language-plaintext highlighter-rouge">mfe</code> equation</strong></p>
\[C_i \sim \text{Binomial}(1, p_i) \tag{binomial likelihood}\]
\[\text{logit}(p_i) = \alpha_{\text{district}[i]} \tag{linear model using logit link}\]
\[\alpha_j \sim \text{Normal}(0, 1.5) \tag{regularizing prior}\]
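To see why $\text{Normal}(0, 1.5)$ works as a regularizing prior, we can push draws from it through the inverse-logit and inspect the implied prior on the probability scale. This is an illustrative check (not part of the original analysis), using only <code class="language-plaintext highlighter-rouge">numpy</code>:

```python
import numpy as np

rng = np.random.default_rng(19)

# draw alphas from the Normal(0, 1.5) prior on the log-odds scale
alpha = rng.normal(0.0, 1.5, size=10_000)

# inverse-logit (expit): map log-odds to probabilities
p = 1.0 / (1.0 + np.exp(-alpha))

# the implied prior on p stays inside (0, 1) and is centered near 0.5,
# keeping "considered values for p within reason"
print(p.min(), p.max())
print(np.quantile(p, [0.055, 0.945]))
```

A much wider prior (say, SD of 10) would instead pile most of its probability mass near 0 and 1, which is rarely what we intend.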
<p>Now let’s use <code class="language-plaintext highlighter-rouge">pymc</code> to build our model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mfe</span><span class="p">:</span>
<span class="c1"># alpha prior, one for each district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mfe</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
INFO:pymc3:Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
INFO:pymc3:Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc3:Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a]
INFO:pymc3:NUTS: [a]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
INFO:pymc3:Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># View summary of mfe model
</span><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">).</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a[0]</th>
<td>-1.052</td>
<td>0.205</td>
<td>-1.389</td>
<td>-0.730</td>
<td>0.002</td>
<td>0.002</td>
<td>10791.0</td>
<td>8837.0</td>
<td>11082.0</td>
<td>2622.0</td>
<td>1.0</td>
</tr>
<tr>
<th>a[1]</th>
<td>-0.584</td>
<td>0.452</td>
<td>-1.287</td>
<td>0.134</td>
<td>0.005</td>
<td>0.005</td>
<td>9885.0</td>
<td>4432.0</td>
<td>9897.0</td>
<td>2824.0</td>
<td>1.0</td>
</tr>
<tr>
<th>a[2]</th>
<td>1.240</td>
<td>1.156</td>
<td>-0.647</td>
<td>2.980</td>
<td>0.012</td>
<td>0.014</td>
<td>8647.0</td>
<td>3240.0</td>
<td>8900.0</td>
<td>2517.0</td>
<td>1.0</td>
</tr>
<tr>
<th>a[3]</th>
<td>-0.003</td>
<td>0.362</td>
<td>-0.572</td>
<td>0.579</td>
<td>0.004</td>
<td>0.007</td>
<td>9690.0</td>
<td>1409.0</td>
<td>9763.0</td>
<td>2666.0</td>
<td>1.0</td>
</tr>
<tr>
<th>a[4]</th>
<td>-0.569</td>
<td>0.330</td>
<td>-1.077</td>
<td>-0.020</td>
<td>0.004</td>
<td>0.003</td>
<td>8362.0</td>
<td>4628.0</td>
<td>8431.0</td>
<td>2847.0</td>
<td>1.0</td>
</tr>
</tbody>
</table>
</div>
<p>Let’s visualize the posterior distributions of the $\alpha$ parameter for each district.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">16</span><span class="p">))</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_forest</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">,</span> <span class="n">combined</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax1</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">"log-odds"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"alpha for each district"</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'posterior distribution (fixed effects model)'</span><span class="p">);</span>
</code></pre></div></div>
<p><img src="/assets/2021-10-23-multilevel_modeling_01_files/2021-10-23-multilevel_modeling_01_19_0.png" alt="png" /></p>
<p>We can learn a lot by looking at these results. First, it is clear that some districts are much less likely to use contraception (negative log-odds) than others (log-odds that span zero or are wholly positive). The district indexed as 10 (originally district 11 in the raw dataset) has the lowest mean estimate of contraceptive usage; if you inspect the raw data, no woman in that district used contraception.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">"No. of district index 10 women who used contraception: "</span><span class="p">,</span> <span class="p">(</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]</span> <span class="o">==</span> <span class="mi">10</span><span class="p">][</span><span class="s">'use.contraception'</span><span class="p">]).</span><span class="nb">sum</span><span class="p">())</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>No. of district index 10 women who used contraception: 0
</code></pre></div></div>
<p>Another point worth our focus is the width of the credible intervals, which represents the uncertainty of our estimates. For example, district index 2 has the most positive mean estimate among all the districts, but its credible interval is exceptionally wide, ranging from a log-odds of -0.647 to 2.980. Other districts, however, have relatively narrow credible intervals, such as district index 13. This difference in variability can be explained by the number of women in each district: higher counts yield lower uncertainty.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">"Top 5 lowest districts for counts of women:</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">df_bangladesh</span><span class="p">[</span><span class="s">'district_code'</span><span class="p">].</span><span class="n">value_counts</span><span class="p">().</span><span class="n">sort_values</span><span class="p">().</span><span class="n">head</span><span class="p">())</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Top 5 highest districts for counts of women:</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">df_bangladesh</span><span class="p">[</span><span class="s">'district_code'</span><span class="p">].</span><span class="n">value_counts</span><span class="p">().</span><span class="n">sort_values</span><span class="p">().</span><span class="n">tail</span><span class="p">())</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Top 5 lowest districts for counts of women:
2 2
48 4
53 6
57 10
41 11
Name: district_code, dtype: int64
Top 5 highest districts for counts of women:
5 65
24 67
45 86
0 117
13 118
Name: district_code, dtype: int64
</code></pre></div></div>
<p>While the fixed-effects model is a reasonable approach, we can do better with a multilevel (mixed-effects) model. Let’s do that next.</p>
<h1 id="mixed-effects-model">Mixed-effects model</h1>
<p>Here we can allow information to pool between clusters (districts). This makes sense since the number of women varies across districts, as we identified above. We would expect our district index 2 estimate to become more precise (a narrower credible interval).</p>
<p>How can information be shared? This is where the structure of our equations can give some insight. The main change is the third line in the equations below. We’re now using an <strong>adaptive prior</strong> that borrows information from <em>each</em> district to make better estimates for <em>all</em> districts. Instead of specifying numerical values in our prior for $\alpha_j$, we replace them with new parameters: an “average alpha” $\bar{\alpha}$ and a standard deviation $\sigma$. These in turn have their own priors, which we call hyperpriors. Seeing parameters embedded within other parameters is how we can appreciate the “multilevel”-ness of the multilevel model. (As McElreath states in an earlier lecture, it can become parameters all the way down.)</p>
<p><strong>Model <code class="language-plaintext highlighter-rouge">mme</code> equation</strong></p>
\[C_i \sim \text{Binomial}(1, p_i) \tag{binomial likelihood}\]
\[\text{logit}(p_i) = \alpha_{\text{district}[i]} \tag{linear model using logit link}\]
\[\alpha_j \sim \text{Normal}(\bar{\alpha}, \sigma) \tag{adaptive prior}\]
\[\bar{\alpha} \sim \text{Normal}(0, 1.5) \tag{regularizing hyperprior}\]
\[\sigma \sim \text{Exponential}(1) \tag{regularizing hyperprior}\]
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># multilevel model
</span><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">mme</span><span class="p">:</span>
<span class="c1"># prior for average district
</span> <span class="n">a_bar</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>
<span class="c1"># prior for SD of districts
</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Exponential</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="c1"># alpha priors for each district
</span> <span class="n">a</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span> <span class="n">a_bar</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">].</span><span class="n">unique</span><span class="p">()))</span>
<span class="c1"># link function
</span> <span class="n">p</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"district_code"</span><span class="p">]])</span>
<span class="c1"># likelihood, n=1 since each represents an individual woman
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Binomial</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">])</span>
<span class="n">trace_mme</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">draws</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="mi">19</span><span class="p">,</span> <span class="n">return_inferencedata</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-assigning NUTS sampler...
INFO:pymc3:Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
INFO:pymc3:Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc3:Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, sigma, a_bar]
INFO:pymc3:NUTS: [a, sigma, a_bar]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 17 seconds.
INFO:pymc3:Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 17 seconds.
</code></pre></div></div>
<p>It looks like there are no divergences here so we don’t have to worry about re-parameterizing. Let’s take a look now.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># View summary of mme model
</span><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme</span><span class="p">).</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sd</th>
<th>hdi_5.5%</th>
<th>hdi_94.5%</th>
<th>mcse_mean</th>
<th>mcse_sd</th>
<th>ess_mean</th>
<th>ess_sd</th>
<th>ess_bulk</th>
<th>ess_tail</th>
<th>r_hat</th>
</tr>
</thead>
<tbody>
<tr>
<th>a_bar</th>
<td>-0.540</td>
<td>0.088</td>
<td>-0.679</td>
<td>-0.397</td>
<td>0.002</td>
<td>0.001</td>
<td>3125.0</td>
<td>3125.0</td>
<td>3125.0</td>
<td>3066.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[0]</th>
<td>-0.992</td>
<td>0.198</td>
<td>-1.299</td>
<td>-0.679</td>
<td>0.003</td>
<td>0.002</td>
<td>5931.0</td>
<td>5154.0</td>
<td>5973.0</td>
<td>2419.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[1]</th>
<td>-0.599</td>
<td>0.360</td>
<td>-1.144</td>
<td>-0.001</td>
<td>0.004</td>
<td>0.004</td>
<td>7047.0</td>
<td>3716.0</td>
<td>7138.0</td>
<td>2443.0</td>
<td>1.01</td>
</tr>
<tr>
<th>a[2]</th>
<td>-0.240</td>
<td>0.501</td>
<td>-1.003</td>
<td>0.559</td>
<td>0.006</td>
<td>0.008</td>
<td>7605.0</td>
<td>2191.0</td>
<td>7501.0</td>
<td>3176.0</td>
<td>1.00</td>
</tr>
<tr>
<th>a[3]</th>
<td>-0.179</td>
<td>0.298</td>
<td>-0.652</td>
<td>0.309</td>
<td>0.004</td>
<td>0.004</td>
<td>6683.0</td>
<td>2486.0</td>
<td>6638.0</td>
<td>3036.0</td>
<td>1.00</td>
</tr>
</tbody>
</table>
</div>
<p>Let’s visualize by plotting the mixed effects model posterior side-by-side with the fixed-effects model that we already visualized above. This will help us appreciate the impact of the multilevel model structure.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="p">(</span><span class="n">ax1</span><span class="p">,</span> <span class="n">ax2</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">16</span><span class="p">),</span> <span class="n">sharex</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">sharey</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_forest</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">,</span> <span class="n">combined</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax1</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">"log-odds"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"alpha for each district"</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'posterior distribution (fixed effects model)'</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_forest</span><span class="p">(</span><span class="n">trace_mme</span><span class="p">,</span> <span class="n">var_names</span><span class="o">=</span><span class="s">'a'</span><span class="p">,</span> <span class="n">combined</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax2</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">"log-odds"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"alpha for each district"</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">'posterior distribution</span><span class="se">\n</span><span class="s">(mixed effects model)'</span><span class="p">);</span>
</code></pre></div></div>
<p><img src="/assets/2021-10-23-multilevel_modeling_01_files/2021-10-23-multilevel_modeling_01_31_0.png" alt="png" /></p>
<p>The first thing that jumps out is that the uncertainty for several districts is now much smaller, particularly those with small sample sizes like district index 2. This is where the multilevel model really shines. Another change is that the mean estimates get pulled towards the center, especially those with more extreme values in the fixed effects model. This shrinkage may be harder to appreciate in this visualization, so we will also plot on the outcome scale later. Let’s explore these differences in a few ways.</p>
<p>Let’s look more closely at how sample size impacts the uncertainty for each district in the fixed-effects versus mixed-effects model for each district.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Create a new dataframe
</span><span class="n">col2inspect</span> <span class="o">=</span> <span class="p">[</span><span class="s">"mean"</span><span class="p">,</span> <span class="s">"sd"</span><span class="p">,</span> <span class="s">"hdi_5.5%"</span><span class="p">,</span> <span class="s">"hdi_94.5%"</span><span class="p">]</span>
<span class="n">df_summary</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">pd</span><span class="p">.</span><span class="n">merge</span><span class="p">(</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mfe</span><span class="p">)[</span><span class="n">col2inspect</span><span class="p">],</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme</span><span class="p">)[</span><span class="n">col2inspect</span><span class="p">],</span>
<span class="n">how</span><span class="o">=</span><span class="s">"inner"</span><span class="p">,</span>
<span class="n">left_index</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">right_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="p">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="p">)</span>
<span class="c1"># Add number of women for each district
</span><span class="n">df_summary</span><span class="p">[</span><span class="s">"n_women"</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_bangladesh</span><span class="p">.</span><span class="n">groupby</span><span class="p">(</span><span class="s">"district_code"</span><span class="p">).</span><span class="n">count</span><span class="p">().</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
<span class="c1"># Inspect
</span><span class="n">df_summary</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean_x</th>
<th>sd_x</th>
<th>hdi_5.5%_x</th>
<th>hdi_94.5%_x</th>
<th>mean_y</th>
<th>sd_y</th>
<th>hdi_5.5%_y</th>
<th>hdi_94.5%_y</th>
<th>n_women</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>-1.052</td>
<td>0.205</td>
<td>-1.389</td>
<td>-0.730</td>
<td>-0.992</td>
<td>0.198</td>
<td>-1.299</td>
<td>-0.679</td>
<td>117</td>
</tr>
<tr>
<th>1</th>
<td>-0.584</td>
<td>0.452</td>
<td>-1.287</td>
<td>0.134</td>
<td>-0.599</td>
<td>0.360</td>
<td>-1.144</td>
<td>-0.001</td>
<td>20</td>
</tr>
<tr>
<th>2</th>
<td>1.240</td>
<td>1.156</td>
<td>-0.647</td>
<td>2.980</td>
<td>-0.240</td>
<td>0.501</td>
<td>-1.003</td>
<td>0.559</td>
<td>2</td>
</tr>
<tr>
<th>3</th>
<td>-0.003</td>
<td>0.362</td>
<td>-0.572</td>
<td>0.579</td>
<td>-0.179</td>
<td>0.298</td>
<td>-0.652</td>
<td>0.309</td>
<td>30</td>
</tr>
<tr>
<th>4</th>
<td>-0.569</td>
<td>0.330</td>
<td>-1.077</td>
<td>-0.020</td>
<td>-0.577</td>
<td>0.279</td>
<td>-1.007</td>
<td>-0.118</td>
<td>39</td>
</tr>
</tbody>
</table>
</div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span>
<span class="n">data</span><span class="o">=</span><span class="n">df_summary</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"sd_x"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"sd_y"</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="s">"n_women"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"black"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.25</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax1</span>
<span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="s">"black"</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">"--"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span>
<span class="n">xlim</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">],</span> <span class="n">ylim</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">],</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">"fixed effects SD"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"mixed effects SD"</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"Impact of sample size on SD"</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[(0.0, 1.2),
(0.0, 1.2),
Text(0.5, 0, 'fixed effects SD'),
Text(0, 0.5, 'mixed effects SD'),
Text(0.5, 1.0, 'Impact of sample size on SD')]
</code></pre></div></div>
<p><img src="/assets/2021-10-23-multilevel_modeling_01_files/2021-10-23-multilevel_modeling_01_34_1.png" alt="png" /></p>
<p>On the x-axis are the standard deviations of the $\alpha$ values for each district in the fixed effects model. On the y-axis are the corresponding SD values for the mixed effects model. The size of each point corresponds to the number of women in that district. The dashed diagonal line marks where the x- and y-axis values are equal.</p>
<p>Here, we can see that the fixed effects model shows greater uncertainty, especially when the number of women in a district is small (points toward the right of the plot). The lower uncertainty in the mixed effects model is due to partial pooling. When the number of women is high, the mixed effects model shows uncertainty on par with the fixed effects model, meaning there’s “less benefit” to using a mixed effects model, but it also doesn’t hurt.</p>
<p>Now, let’s plot on the outcome scale. Here, we’ll show the predicted proportion of women in each district using contraception with fixed-effects and mixed-effects estimates shown side-by-side. We’ll use the <code class="language-plaintext highlighter-rouge">logistic</code> function to transform the log-odds back on the probability scale.</p>
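<p>The <code class="language-plaintext highlighter-rouge">logistic</code> function used below isn’t defined in this excerpt; it is presumably the standard inverse-logit. A minimal sketch with the standard library:</p>

```python
import math

def logistic(x):
    """Inverse-logit: map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(logistic(0.0))  # 0.5: log-odds of 0 corresponds to a 50% probability
```

<p>In the notebook it is applied to whole pandas Series, so an elementwise <code class="language-plaintext highlighter-rouge">np.exp</code>-based version would be used in practice.</p>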
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="c1"># Plot means
</span><span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span>
<span class="n">df_summary</span><span class="p">.</span><span class="n">index</span> <span class="o">-</span> <span class="mf">0.15</span><span class="p">,</span>
<span class="n">logistic</span><span class="p">(</span><span class="n">df_summary</span><span class="p">[</span><span class="s">"mean_x"</span><span class="p">]),</span>
<span class="n">color</span><span class="o">=</span><span class="s">"gray"</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="n">label</span><span class="o">=</span><span class="s">"fixed effect"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span>
<span class="n">df_summary</span><span class="p">.</span><span class="n">index</span> <span class="o">+</span> <span class="mf">0.15</span><span class="p">,</span>
<span class="n">logistic</span><span class="p">(</span><span class="n">df_summary</span><span class="p">[</span><span class="s">"mean_y"</span><span class="p">]),</span>
<span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="n">label</span><span class="o">=</span><span class="s">"mixed effect"</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Plot uncertainties
</span><span class="n">ax1</span><span class="p">.</span><span class="n">vlines</span><span class="p">(</span>
<span class="n">x</span><span class="o">=</span><span class="n">df_summary</span><span class="p">.</span><span class="n">index</span> <span class="o">-</span> <span class="mf">0.15</span><span class="p">,</span>
<span class="n">ymin</span><span class="o">=</span><span class="n">logistic</span><span class="p">(</span><span class="n">df_summary</span><span class="p">[</span><span class="s">"hdi_5.5%_x"</span><span class="p">]),</span>
<span class="n">ymax</span><span class="o">=</span><span class="n">logistic</span><span class="p">(</span><span class="n">df_summary</span><span class="p">[</span><span class="s">"hdi_94.5%_x"</span><span class="p">]),</span>
<span class="n">color</span><span class="o">=</span><span class="s">"gray"</span><span class="p">,</span>
<span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">vlines</span><span class="p">(</span>
<span class="n">x</span><span class="o">=</span><span class="n">df_summary</span><span class="p">.</span><span class="n">index</span> <span class="o">+</span> <span class="mf">0.15</span><span class="p">,</span>
<span class="n">ymin</span><span class="o">=</span><span class="n">logistic</span><span class="p">(</span><span class="n">df_summary</span><span class="p">[</span><span class="s">"hdi_5.5%_y"</span><span class="p">]),</span>
<span class="n">ymax</span><span class="o">=</span><span class="n">logistic</span><span class="p">(</span><span class="n">df_summary</span><span class="p">[</span><span class="s">"hdi_94.5%_y"</span><span class="p">]),</span>
<span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span>
<span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Plot average mixed effect line
</span><span class="n">me_mean</span> <span class="o">=</span> <span class="n">logistic</span><span class="p">(</span><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">trace_mme</span><span class="p">).</span><span class="n">loc</span><span class="p">[</span><span class="s">"a_bar"</span><span class="p">,</span> <span class="s">"mean"</span><span class="p">])</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
<span class="p">[</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">62</span><span class="p">],</span>
<span class="p">[</span><span class="n">me_mean</span><span class="p">,</span> <span class="n">me_mean</span><span class="p">],</span>
<span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span>
<span class="n">lw</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="n">linestyle</span><span class="o">=</span><span class="s">"--"</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">label</span><span class="o">=</span><span class="s">"mixed effect mean"</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Plot raw fixed effect line
</span><span class="n">fe_mean</span> <span class="o">=</span> <span class="n">df_bangladesh</span><span class="p">[</span><span class="s">"use.contraception"</span><span class="p">].</span><span class="n">mean</span><span class="p">()</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
<span class="p">[</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">62</span><span class="p">],</span>
<span class="p">[</span><span class="n">fe_mean</span><span class="p">,</span> <span class="n">fe_mean</span><span class="p">],</span>
<span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">,</span>
<span class="n">lw</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="n">linestyle</span><span class="o">=</span><span class="s">"--"</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">label</span><span class="o">=</span><span class="s">"fixed effect mean"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">ax1</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span>
<span class="n">xlim</span><span class="o">=</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="mi">60</span><span class="p">],</span>
<span class="n">ylim</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">xlabel</span><span class="o">=</span><span class="s">"district index"</span><span class="p">,</span>
<span class="n">ylabel</span><span class="o">=</span><span class="s">"proportion predicted</span><span class="se">\n</span><span class="s">for contraception use"</span><span class="p">,</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[(-2.0, 60.0),
(0.0, 1.0),
Text(0.5, 0, 'district index'),
Text(0, 0.5, 'proportion predicted\nfor contraception use')]
</code></pre></div></div>
<p><img src="/assets/2021-10-23-multilevel_modeling_01_files/2021-10-23-multilevel_modeling_01_36_1.png" alt="png" /></p>
<p>The district index is shown on the x-axis, and the predicted proportion of contraception use is on the y-axis. This scale is more directly interpretable than log-odds when thinking about the proportion of women using contraception. The horizontal dashed lines mark the overall means for the fixed effects model (red) and the mixed effects model (blue). The two lines differ because the red line is the raw proportion pooled over all women, while the blue line is the model’s estimate of the average district (<code class="language-plaintext highlighter-rouge">a_bar</code>), which weighs districts differently than the raw average does. As we saw on the log-odds scale, the districts with the smallest numbers of women (like district index 2) have their estimates most affected by the multilevel model. The estimates get pulled toward the horizontal blue dashed line, illustrating the concept of <strong>shrinkage</strong> that results from the <strong>partial pooling</strong> of information across districts.</p>
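<p>The shrinkage behavior can be caricatured without any MCMC. A stylized sketch (not the pymc model above): treat the partially pooled estimate for a district as a precision-weighted average of its raw mean and the grand mean, where the weight on the district’s own data grows with its sample size. The <code class="language-plaintext highlighter-rouge">pool_strength</code> constant here is an illustrative assumption standing in for the estimated between-district variance, not a value estimated from the data.</p>

```python
def partial_pool(raw_mean, n, grand_mean, pool_strength=25.0):
    """Precision-weighted average: the weight on the district's own raw
    mean grows with its sample size n, so small districts get pulled
    toward the grand mean more than large ones."""
    w = n / (n + pool_strength)
    return w * raw_mean + (1 - w) * grand_mean

grand = 0.39                              # roughly the overall contraception-use rate
small = partial_pool(0.80, 2, grand)      # tiny district: pulled strongly toward 0.39
large = partial_pool(0.80, 100, grand)    # big district: stays near its raw mean of 0.80
print(small, large)
```

<p>This is only a caricature: the real model shrinks on the log-odds scale and estimates the pooling strength from the data, but the qualitative behavior matches the figure.</p>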
<h1 id="summary">Summary</h1>
<p>In this post, we covered a simple example of multilevel modeling using a binomial GLM. We used a dataset where clusters (districts) contained variable sample sizes. By contrasting a fixed effects model with a mixed effects model, we can see how multilevel modeling improves our estimates and reduces uncertainty. Here, we covered an example of varying intercepts. In a later post, we’ll add varying slopes which will help us incorporate predictor variables.</p>
<h1 id="appendix-environment-and-system-parameters">Appendix: Environment and system parameters</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Sat Oct 23 2021
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.20.0
pandas : 1.2.1
numpy : 1.20.1
scipy : 1.6.0
seaborn : 0.11.1
matplotlib: 3.3.4
sys : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12)
[Clang 11.0.1 ]
pymc3 : 3.11.0
arviz : 0.11.1
theano : 1.1.0
Watermark: 2.1.0
</code></pre></div></div>Ben LacarI’ve been on a journey learning multilevel models and Bayesian inference through Richard McElreath’s Statistical Rethinking book. The concepts of shrinkage and partial pooling that are inherent to multilevel models are really interesting to me. First, let’s get some terminology out of the way. As Dr. McElreath highlights, there are multiple terms that are used for multilevel models. hierarchical models mixed-effects models varying effects modelsWorking with PyTorch’s Dataset and Dataloader classes (part 1)2021-06-24T00:00:00+00:002021-06-24T00:00:00+00:00https://benslack19.github.io/data%20science/statistics/pytorch-dataset-objects<p>Recently, I built a simple NLP algorithm for a work project, following the template described in <a href="https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html#sphx-glr-beginner-nlp-deep-learning-tutorial-py">this tutorial</a>. As I looked to increase my model’s complexity, I started to come across references to <a href="https://pytorch.org/tutorials/beginner/basics/data_tutorial.html">Dataset and Dataloader</a> classes. I tried adapting my work-related code to use these objects, but I found myself running into <a href="https://media.giphy.com/media/xwEVCKetQWpeYyumJJ/giphy.gif">pesky bugs</a>. I thought I should take some time to figure out how to properly use <code class="language-plaintext highlighter-rouge">Dataset</code> and <code class="language-plaintext highlighter-rouge">Dataloader</code> objects. In this post, I adapt the PyTorch NLP tutorial to work with <code class="language-plaintext highlighter-rouge">Dataset</code> and <code class="language-plaintext highlighter-rouge">Dataloader</code> objects. Since my focus is primarily on using these objects, please refer to the <a href="https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html#sphx-glr-beginner-nlp-deep-learning-tutorial-py">tutorial</a> for details regarding the NLP model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>
<span class="kn">import</span> <span class="nn">torch.nn.functional</span> <span class="k">as</span> <span class="n">F</span>
<span class="kn">import</span> <span class="nn">torch.optim</span> <span class="k">as</span> <span class="n">optim</span>
<span class="kn">from</span> <span class="nn">torch.utils.data</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">DataLoader</span>
<span class="n">torch</span><span class="p">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><torch._C.Generator at 0x7fef88a746f0>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">load_ext</span> <span class="n">nb_black</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s">'retina'</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">watermark</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Figure aesthetics
</span><span class="n">sns</span><span class="p">.</span><span class="n">set_theme</span><span class="p">()</span>
<span class="n">sns</span><span class="p">.</span><span class="n">set_context</span><span class="p">(</span><span class="s">"talk"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"white"</span><span class="p">)</span>
</code></pre></div></div>
<h1 id="first-attempt">First attempt</h1>
<p>The tutorial generates a simple dataset to use for a logistic regression bag-of-words classifier. It takes sentences and trains a classifier to predict whether each sentence is in English or Spanish. The data was originally structured so that each sample is a tuple containing a word list and a language label.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_data</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">(</span><span class="s">"me gusta comer en la cafeteria"</span><span class="p">.</span><span class="n">split</span><span class="p">(),</span> <span class="s">"SPANISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">"Give it to me"</span><span class="p">.</span><span class="n">split</span><span class="p">(),</span> <span class="s">"ENGLISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">"No creo que sea una buena idea"</span><span class="p">.</span><span class="n">split</span><span class="p">(),</span> <span class="s">"SPANISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">"No it is not a good idea to get lost at sea"</span><span class="p">.</span><span class="n">split</span><span class="p">(),</span> <span class="s">"ENGLISH"</span><span class="p">),</span>
<span class="p">]</span>
<span class="n">test_data</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">(</span><span class="s">"Yo creo que si"</span><span class="p">.</span><span class="n">split</span><span class="p">(),</span> <span class="s">"SPANISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">"it is lost on me"</span><span class="p">.</span><span class="n">split</span><span class="p">(),</span> <span class="s">"ENGLISH"</span><span class="p">),</span>
<span class="p">]</span>
</code></pre></div></div>
<p>Before putting the data into the <code class="language-plaintext highlighter-rouge">Dataset</code> object, I’ll organize it into a dataframe for easier input.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Combine so we have one data object
</span><span class="n">data</span> <span class="o">=</span> <span class="n">train_data</span> <span class="o">+</span> <span class="n">test_data</span>
<span class="c1"># Put into a dataframe
</span><span class="n">df_data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">df_data</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s">"words"</span><span class="p">,</span> <span class="s">"labels"</span><span class="p">]</span>
<span class="n">df_data</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>words</th>
<th>labels</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>[me, gusta, comer, en, la, cafeteria]</td>
<td>SPANISH</td>
</tr>
<tr>
<th>1</th>
<td>[Give, it, to, me]</td>
<td>ENGLISH</td>
</tr>
<tr>
<th>2</th>
<td>[No, creo, que, sea, una, buena, idea]</td>
<td>SPANISH</td>
</tr>
<tr>
<th>3</th>
<td>[No, it, is, not, a, good, idea, to, get, lost...</td>
<td>ENGLISH</td>
</tr>
<tr>
<th>4</th>
<td>[Yo, creo, que, si]</td>
<td>SPANISH</td>
</tr>
<tr>
<th>5</th>
<td>[it, is, lost, on, me]</td>
<td>ENGLISH</td>
</tr>
</tbody>
</table>
</div>
<h2 id="putting-the-data-in-dataset-and-output-with-dataloader">Putting the data in <code class="language-plaintext highlighter-rouge">Dataset</code> and output with <code class="language-plaintext highlighter-rouge">Dataloader</code></h2>
<p>Now it is time to put the data into a <code class="language-plaintext highlighter-rouge">Dataset</code> object. I referred to <a href="https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#">PyTorch’s tutorial on datasets and dataloaders</a> and <a href="https://towardsdatascience.com/how-to-use-datasets-and-dataloader-in-pytorch-for-custom-text-data-270eed7f7c00">this helpful example specific to custom text</a>, especially for making my own dataset class, which is shown here.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">TextDataset</span><span class="p">(</span><span class="n">Dataset</span><span class="p">):</span>
<span class="s">"""
    Characterizes a custom text dataset for PyTorch
"""</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ids</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">labels</span><span class="p">):</span>
<span class="s">"""
Initialization. Ids can be useful after splitting the dataset.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">ids</span> <span class="o">=</span> <span class="n">ids</span>
<span class="bp">self</span><span class="p">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span>
<span class="bp">self</span><span class="p">.</span><span class="n">labels</span> <span class="o">=</span> <span class="n">labels</span>
<span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="s">"""
        This is simply the number of labels in the dataset.
"""</span>
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">labels</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">idx</span><span class="p">):</span>
<span class="s">"""
Generate one sample of data
"""</span>
<span class="n">label</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">labels</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span>
<span class="n">text</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">text</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span>
<span class="n">sample</span> <span class="o">=</span> <span class="p">{</span><span class="s">"Text"</span><span class="p">:</span> <span class="n">text</span><span class="p">,</span> <span class="s">"Label"</span><span class="p">:</span> <span class="n">label</span><span class="p">}</span>
<span class="k">return</span> <span class="n">sample</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Put train and test into dataset objects
</span><span class="n">train_ids</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="n">test_ids</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span>
<span class="n">train_DS1</span> <span class="o">=</span> <span class="n">TextDataset</span><span class="p">(</span>
<span class="n">train_ids</span><span class="p">,</span>
<span class="n">df_data</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">train_ids</span><span class="p">,</span> <span class="s">"words"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="n">df_data</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">train_ids</span><span class="p">,</span> <span class="s">"labels"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="p">)</span>
<span class="n">test_DS1</span> <span class="o">=</span> <span class="n">TextDataset</span><span class="p">(</span>
<span class="n">train_ids</span><span class="p">,</span>
<span class="n">df_data</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">test_ids</span><span class="p">,</span> <span class="s">"words"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="n">df_data</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">test_ids</span><span class="p">,</span> <span class="s">"labels"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="p">)</span>
</code></pre></div></div>
<p>When putting the data into their respective dataset objects, it is important to use the <code class="language-plaintext highlighter-rouge">.tolist()</code> method or else <code class="language-plaintext highlighter-rouge">DataLoader</code> will return an error when retrieving the data. Now let’s use <code class="language-plaintext highlighter-rouge">DataLoader</code> and a simple for loop to return the values of the data. I’ll use only the training data and a <code class="language-plaintext highlighter-rouge">batch_size</code> of 1 for this purpose.</p>
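<p>Why does <code class="language-plaintext highlighter-rouge">.tolist()</code> matter? One likely reason (my reading, not stated in the tutorial): a pandas <code class="language-plaintext highlighter-rouge">Series</code> sliced from <code class="language-plaintext highlighter-rouge">df_data</code> keeps its original index labels, so the positional-looking access in <code class="language-plaintext highlighter-rouge">__getitem__</code> becomes a label lookup and can raise a <code class="language-plaintext highlighter-rouge">KeyError</code>. A quick illustration:</p>

```python
import pandas as pd

# Mimic the test split: these rows keep their original index labels 4 and 5
labels = pd.Series(["SPANISH", "ENGLISH"], index=[4, 5])

try:
    labels[0]        # with an integer index this is a *label* lookup -> KeyError
    failed = False
except KeyError:
    failed = True

print(failed)                 # True
print(labels.tolist()[0])     # 'SPANISH' -- plain lists index positionally
```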
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_DL</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_DS1</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Batch size of 1"</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">train_DL</span><span class="p">):</span> <span class="c1"># Print the 'text' data of the batch
</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Text data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">])</span> <span class="c1"># Print the 'class' data of batch
</span> <span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Label data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">])</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Batch size of 1
0 Text data: [('me',), ('gusta',), ('comer',), ('en',), ('la',), ('cafeteria',)]
0 Label data: ['SPANISH']
1 Text data: [('Give',), ('it',), ('to',), ('me',)]
1 Label data: ['ENGLISH']
2 Text data: [('No',), ('creo',), ('que',), ('sea',), ('una',), ('buena',), ('idea',)]
2 Label data: ['SPANISH']
3 Text data: [('No',), ('it',), ('is',), ('not',), ('a',), ('good',), ('idea',), ('to',), ('get',), ('lost',), ('at',), ('sea',)]
3 Label data: ['ENGLISH']
</code></pre></div></div>
<p>At first glance, things might look okay, but the eagle-eyed will notice that each word in our list is now wrapped in its own single-element tuple. If we increase <code class="language-plaintext highlighter-rouge">batch_size</code> to 2, we get an <a href="https://media.giphy.com/media/eGNtzon6aSbPA6qgU4/giphy.gif">ugly error</a>.</p>
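<p>The wrapping comes from <code class="language-plaintext highlighter-rouge">DataLoader</code>’s default collate function, which effectively transposes a batch so that position <em>i</em> across all samples ends up together. With a batch of one sample whose text field is a word list, that zips the list into one-word tuples. A simplified mimic of that transposition reproduces the output above:</p>

```python
# One "batch" whose single sample's Text field is a word list
batch_text = [["me", "gusta", "comer", "en", "la", "cafeteria"]]

# default_collate-style transposition: zip position i across all samples
transposed = list(zip(*batch_text))
print(transposed)
# [('me',), ('gusta',), ('comer',), ('en',), ('la',), ('cafeteria',)]
```

<p>With two samples of different lengths in the batch, this zipping has no consistent shape to produce, hence the error below.</p>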
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_DL2</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_DS1</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Batch size of 2"</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">train_DL2</span><span class="p">):</span> <span class="c1"># Print the 'text' data of the batch
</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Text data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">])</span> <span class="c1"># Print the 'class' data of batch
</span> <span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Label data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">],</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Batch size of 2
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-9-b81921277760> in <module>
2
3 print("Batch size of 2")
----> 4 for (idx, batch) in enumerate(train_DL2): # Print the 'text' data of the batch
5
6 print(idx, "Text data: ", batch["Text"]) # Print the 'class' data of batch
~/opt/anaconda3/envs/sdoh_text/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
515 if self._sampler_iter is None:
516 self._reset()
--> 517 data = self._next_data()
518 self._num_yielded += 1
519 if self._dataset_kind == _DatasetKind.Iterable and \
~/opt/anaconda3/envs/sdoh_text/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
555 def _next_data(self):
556 index = self._next_index() # may raise StopIteration
--> 557 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
558 if self._pin_memory:
559 data = _utils.pin_memory.pin_memory(data)
~/opt/anaconda3/envs/sdoh_text/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
45 else:
46 data = self.dataset[possibly_batched_index]
---> 47 return self.collate_fn(data)
~/opt/anaconda3/envs/sdoh_text/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
71 return batch
72 elif isinstance(elem, container_abcs.Mapping):
---> 73 return {key: default_collate([d[key] for d in batch]) for key in elem}
74 elif isinstance(elem, tuple) and hasattr(elem, '_fields'): # namedtuple
75 return elem_type(*(default_collate(samples) for samples in zip(*batch)))
~/opt/anaconda3/envs/sdoh_text/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py in <dictcomp>(.0)
71 return batch
72 elif isinstance(elem, container_abcs.Mapping):
---> 73 return {key: default_collate([d[key] for d in batch]) for key in elem}
74 elif isinstance(elem, tuple) and hasattr(elem, '_fields'): # namedtuple
75 return elem_type(*(default_collate(samples) for samples in zip(*batch)))
~/opt/anaconda3/envs/sdoh_text/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
79 elem_size = len(next(it))
80 if not all(len(elem) == elem_size for elem in it):
---> 81 raise RuntimeError('each element in list of batch should be of equal size')
82 transposed = zip(*batch)
83 return [default_collate(samples) for samples in transposed]
RuntimeError: each element in list of batch should be of equal size
</code></pre></div></div>
<p>What’s going on? After some investigation, which I’ll spare you, it appears that having each sample’s text already split into a list confuses <code class="language-plaintext highlighter-rouge">DataLoader</code>’s default collation. Let’s restructure our data differently.</p>
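To make the failure concrete, here is a minimal sketch that reproduces the error without any of our classes, by calling PyTorch's default collate function directly on two dict samples whose token lists have different lengths. (The import path for <code>default_collate</code> moved in torch 1.11, so both locations are tried.)

```python
try:
    from torch.utils.data import default_collate  # torch >= 1.11
except ImportError:
    from torch.utils.data._utils.collate import default_collate

# Two samples whose "Text" fields are token lists of *different* lengths,
# mimicking what our Dataset was returning
batch = [
    {"Text": ["Give", "it", "to", "me"], "Label": "ENGLISH"},
    {"Text": ["me", "gusta", "comer", "en", "la", "cafeteria"], "Label": "SPANISH"},
]

try:
    default_collate(batch)
except RuntimeError as e:
    print(e)  # each element in list of batch should be of equal size
```

Equal-length lists would collate fine (they get transposed into per-position lists), which is why the error only surfaces once sentences of different lengths land in the same batch.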
<h1 id="re-structuring-data-as-a-comma-separated-string">Re-structuring data as a comma-separated string</h1>
<p>Due to the structure of our model, we still need a way to vectorize each sentence sample, but we can’t have each sample stored as a list of tokens. Here is a workaround, even if the syntax is awkward: I’m rejoining the tokens into a comma-separated string like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">"me gusta comer en la cafeteria"</span><span class="p">.</span><span class="n">split</span><span class="p">())</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>'me, gusta, comer, en, la, cafeteria'
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_data2</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">(</span><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">"me gusta comer en la cafeteria"</span><span class="p">.</span><span class="n">split</span><span class="p">()),</span> <span class="s">"SPANISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">"Give it to me"</span><span class="p">.</span><span class="n">split</span><span class="p">()),</span> <span class="s">"ENGLISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">"No creo que sea una buena idea"</span><span class="p">.</span><span class="n">split</span><span class="p">()),</span> <span class="s">"SPANISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">"No it is not a good idea to get lost at sea"</span><span class="p">.</span><span class="n">split</span><span class="p">()),</span> <span class="s">"ENGLISH"</span><span class="p">),</span>
<span class="p">]</span>
<span class="n">test_data2</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">(</span><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">"Yo creo que si"</span><span class="p">.</span><span class="n">split</span><span class="p">()),</span> <span class="s">"SPANISH"</span><span class="p">),</span>
<span class="p">(</span><span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">"it is lost on me"</span><span class="p">.</span><span class="n">split</span><span class="p">()),</span> <span class="s">"ENGLISH"</span><span class="p">),</span>
<span class="p">]</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data2</span> <span class="o">=</span> <span class="n">train_data2</span> <span class="o">+</span> <span class="n">test_data2</span>
<span class="n">df_data2</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data2</span><span class="p">)</span>
<span class="n">df_data2</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s">"words"</span><span class="p">,</span> <span class="s">"labels"</span><span class="p">]</span>
</code></pre></div></div>
<p>Here’s how the data looks.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_data2</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>words</th>
<th>labels</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>me, gusta, comer, en, la, cafeteria</td>
<td>SPANISH</td>
</tr>
<tr>
<th>1</th>
<td>Give, it, to, me</td>
<td>ENGLISH</td>
</tr>
<tr>
<th>2</th>
<td>No, creo, que, sea, una, buena, idea</td>
<td>SPANISH</td>
</tr>
<tr>
<th>3</th>
<td>No, it, is, not, a, good, idea, to, get, lost,...</td>
<td>ENGLISH</td>
</tr>
<tr>
<th>4</th>
<td>Yo, creo, que, si</td>
<td>SPANISH</td>
</tr>
<tr>
<th>5</th>
<td>it, is, lost, on, me</td>
<td>ENGLISH</td>
</tr>
</tbody>
</table>
</div>
<h2 id="putting-the-data-in-dataset-and-output-with-dataloader-1">Putting the data in <code class="language-plaintext highlighter-rouge">Dataset</code> and output with <code class="language-plaintext highlighter-rouge">Dataloader</code></h2>
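As a reminder, <code>TextDataset</code> was defined earlier in the post. A minimal sketch of the kind of <code>Dataset</code> subclass assumed here (the real definition may carry extra bookkeeping) wraps parallel lists of ids, texts, and labels, and returns each sample as a dict so that batches come out as <code>{"Text": [...], "Label": [...]}</code>:

```python
from torch.utils.data import Dataset, DataLoader

class TextDataset(Dataset):
    """Minimal sketch of the Dataset class used in this post."""

    def __init__(self, ids, texts, labels):
        self.ids = list(ids)
        self.texts = list(texts)
        self.labels = list(labels)

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Returning a dict is what makes DataLoader yield
        # batch["Text"] and batch["Label"] below
        return {"Text": self.texts[idx], "Label": self.labels[idx]}
```

Because each sample's text is now a single string rather than a list, the default collation simply gathers the strings into a list per batch, with no length check to trip over.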
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_DS2</span> <span class="o">=</span> <span class="n">TextDataset</span><span class="p">(</span>
<span class="n">train_ids</span><span class="p">,</span>
<span class="n">df_data2</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">train_ids</span><span class="p">,</span> <span class="s">"words"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="n">df_data2</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">train_ids</span><span class="p">,</span> <span class="s">"labels"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="p">)</span>
<span class="n">test_DS2</span> <span class="o">=</span> <span class="n">TextDataset</span><span class="p">(</span>
<span class="n">test_ids</span><span class="p">,</span>
<span class="n">df_data2</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">test_ids</span><span class="p">,</span> <span class="s">"words"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="n">df_data2</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">test_ids</span><span class="p">,</span> <span class="s">"labels"</span><span class="p">].</span><span class="n">tolist</span><span class="p">(),</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_DL2a</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_DS2</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"batch size of 1"</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">train_DL2a</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Text data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Label data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">],</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>batch size of 1
0 Text data: ['me, gusta, comer, en, la, cafeteria']
0 Label data: ['SPANISH']
1 Text data: ['Give, it, to, me']
1 Label data: ['ENGLISH']
2 Text data: ['No, creo, que, sea, una, buena, idea']
2 Label data: ['SPANISH']
3 Text data: ['No, it, is, not, a, good, idea, to, get, lost, at, sea']
3 Label data: ['ENGLISH']
</code></pre></div></div>
<p>Great, this is closer to the expected output: each batch is a list containing one sample, represented as a single string, created by <code class="language-plaintext highlighter-rouge">DataLoader</code>. We still have to vectorize it before feeding it into our model, but we can worry about that later. Additionally, when we increase the <code class="language-plaintext highlighter-rouge">batch_size</code>, we no longer get an error.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_DL2b</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_DS2</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"batch size of 2"</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">train_DL2b</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Text data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Label data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">],</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>batch size of 2
0 Text data: ['me, gusta, comer, en, la, cafeteria', 'Give, it, to, me']
0 Label data: ['SPANISH', 'ENGLISH']
1 Text data: ['No, creo, que, sea, una, buena, idea', 'No, it, is, not, a, good, idea, to, get, lost, at, sea']
1 Label data: ['SPANISH', 'ENGLISH']
</code></pre></div></div>
<p>We can also verify that this works for our test set in its own <code class="language-plaintext highlighter-rouge">DataLoader</code> object.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">test_DL2b</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">test_DS2</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"batch size of 2"</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">test_DL2b</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Text data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="s">"Label data: "</span><span class="p">,</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">],</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>batch size of 2
0 Text data: ['Yo, creo, que, si', 'it, is, lost, on, me']
0 Label data: ['SPANISH', 'ENGLISH']
</code></pre></div></div>
<h1 id="train-model-using-dataloader-objects">Train model using <code class="language-plaintext highlighter-rouge">DataLoader</code> objects</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># word_to_ix maps each word in the vocab to a unique integer, which will be its
# index into the Bag of words vector
</span><span class="n">word_to_ix</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">sent</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">sent</span><span class="p">:</span>
<span class="k">if</span> <span class="n">word</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">word_to_ix</span><span class="p">:</span>
<span class="n">word_to_ix</span><span class="p">[</span><span class="n">word</span><span class="p">]</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">word_to_ix</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">word_to_ix</span><span class="p">)</span>
<span class="n">VOCAB_SIZE</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">word_to_ix</span><span class="p">)</span>
<span class="n">NUM_LABELS</span> <span class="o">=</span> <span class="mi">2</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{'me': 0, 'gusta': 1, 'comer': 2, 'en': 3, 'la': 4, 'cafeteria': 5, 'Give': 6, 'it': 7, 'to': 8, 'No': 9, 'creo': 10, 'que': 11, 'sea': 12, 'una': 13, 'buena': 14, 'idea': 15, 'is': 16, 'not': 17, 'a': 18, 'good': 19, 'get': 20, 'lost': 21, 'at': 22, 'Yo': 23, 'si': 24, 'on': 25}
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sent</span> <span class="o">=</span> <span class="s">"me, gusta, comer"</span>
<span class="n">sent</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">", "</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>['me', 'gusta', 'comer']
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">BoWClassifier</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span> <span class="c1"># inheriting from nn.Module!
</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">num_labels</span><span class="p">,</span> <span class="n">vocab_size</span><span class="p">):</span>
<span class="c1"># calls the init function of nn.Module. Dont get confused by syntax,
</span> <span class="c1"># just always do it in an nn.Module
</span> <span class="nb">super</span><span class="p">(</span><span class="n">BoWClassifier</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
<span class="c1"># Define the parameters that you will need. In this case, we need A and b,
</span> <span class="c1"># the parameters of the affine mapping.
</span> <span class="c1"># Torch defines nn.Linear(), which provides the affine map.
</span> <span class="c1"># Make sure you understand why the input dimension is vocab_size
</span> <span class="c1"># and the output is num_labels!
</span> <span class="bp">self</span><span class="p">.</span><span class="n">linear</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">vocab_size</span><span class="p">,</span> <span class="n">num_labels</span><span class="p">)</span>
<span class="c1"># NOTE! The non-linearity log softmax does not have parameters! So we don't need
</span> <span class="c1"># to worry about that here
</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">bow_vec</span><span class="p">):</span>
<span class="c1"># Pass the input through the linear layer,
</span> <span class="c1"># then pass that through log_softmax.
</span> <span class="c1"># Many non-linearities and other functions are in torch.nn.functional
</span> <span class="k">return</span> <span class="n">F</span><span class="p">.</span><span class="n">log_softmax</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">linear</span><span class="p">(</span><span class="n">bow_vec</span><span class="p">),</span> <span class="n">dim</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">make_bow_vector</span><span class="p">(</span><span class="n">sentence</span><span class="p">,</span> <span class="n">word_to_ix</span><span class="p">):</span>
<span class="s">"""
Edited from original to get words wrapped in a list back
"""</span>
<span class="n">sentence</span> <span class="o">=</span> <span class="n">sentence</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="s">", "</span><span class="p">)</span>
<span class="n">vec</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">word_to_ix</span><span class="p">))</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">sentence</span><span class="p">:</span>
<span class="n">vec</span><span class="p">[</span><span class="n">word_to_ix</span><span class="p">[</span><span class="n">word</span><span class="p">]]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">vec</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">make_target</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">label_to_ix</span><span class="p">):</span>
<span class="s">"""
Altered to extract label from list
"""</span>
<span class="k">return</span> <span class="n">torch</span><span class="p">.</span><span class="n">LongTensor</span><span class="p">([</span><span class="n">label_to_ix</span><span class="p">[</span><span class="n">label</span><span class="p">[</span><span class="mi">0</span><span class="p">]]])</span>
</code></pre></div></div>
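To see what the edited <code>make_bow_vector</code> produces on the comma-joined format, here is a standalone sketch with a toy three-word vocabulary (the function body mirrors the helper above; the toy vocabulary is just for illustration):

```python
import torch

def make_bow_vector(sentence, word_to_ix):
    # Mirrors the edited helper above: unwrap the single-element list
    # that DataLoader yields, then split on the comma separator
    words = sentence[0].split(", ")
    vec = torch.zeros(len(word_to_ix))
    for word in words:
        vec[word_to_ix[word]] += 1
    return vec.view(1, -1)

toy_vocab = {"me": 0, "gusta": 1, "comer": 2}
print(make_bow_vector(["me, gusta, comer"], toy_vocab))
# tensor([[1., 1., 1.]])
```

Each position of the output counts how often the corresponding vocabulary word appears in the sample, and the <code>view(1, -1)</code> gives the row-vector shape the model's linear layer expects.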
<h2 id="batch-size-of-1">Batch size of 1</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_DL2a</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_DS2</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">test_DL2a</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">test_DS2</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span> <span class="o">=</span> <span class="n">BoWClassifier</span><span class="p">(</span><span class="n">NUM_LABELS</span><span class="p">,</span> <span class="n">VOCAB_SIZE</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">param</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">():</span>
<span class="k">print</span><span class="p">(</span><span class="n">param</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Parameter containing:
tensor([[ 0.0544, 0.0097, 0.0716, -0.0764, -0.0143, -0.0177, 0.0284, -0.0008,
0.1714, 0.0610, -0.0730, -0.1184, -0.0329, -0.0846, -0.0628, 0.0094,
0.1169, 0.1066, -0.1917, 0.1216, 0.0548, 0.1860, 0.1294, -0.1787,
-0.1865, -0.0946],
[ 0.1722, -0.0327, 0.0839, -0.0911, 0.1924, -0.0830, 0.1471, 0.0023,
-0.1033, 0.1008, -0.1041, 0.0577, -0.0566, -0.0215, -0.1885, -0.0935,
0.1064, -0.0477, 0.1953, 0.1572, -0.0092, -0.1309, 0.1194, 0.0609,
-0.1268, 0.1274]], requires_grad=True)
Parameter containing:
tensor([0.1191, 0.1739], requires_grad=True)
</code></pre></div></div>
<p>Note that the model parameters are randomly initialized to small, non-zero values: random initialization breaks the symmetry between units, and keeping the values small avoids saturated activations that would make gradient descent slow. This point is explained more fully by Andrew Ng in <a href="https://www.youtube.com/watch?v=6by6Xas_Kho&list=PLkDaE6sCZn6Ec-XTbcX1uRg2_u4xOEky0&index=35">this video</a>.</p>
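The two tensors printed above are simply the weight matrix and bias vector of the model's single <code>nn.Linear</code> layer. A quick sketch of their shapes, assuming the 26-word vocabulary and 2 labels from above:

```python
import torch.nn as nn

# vocab_size -> num_labels, matching BoWClassifier's linear layer
layer = nn.Linear(26, 2)
for name, p in layer.named_parameters():
    print(name, tuple(p.shape))
# weight (2, 26)
# bias (2,)
```

So the model has 2 × 26 + 2 = 54 trainable parameters: one weight per (label, vocabulary word) pair, plus one bias per label.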
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">label_to_ix</span> <span class="o">=</span> <span class="p">{</span><span class="s">"SPANISH"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"ENGLISH"</span><span class="p">:</span> <span class="mi">1</span><span class="p">}</span>
</code></pre></div></div>
<h3 id="run-on-test-data-before-we-train-just-to-see-a-before-and-after">Run on test data before we train, just to see a before-and-after</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">torch</span><span class="p">.</span><span class="n">no_grad</span><span class="p">():</span>
<span class="k">for</span> <span class="n">batch</span> <span class="ow">in</span> <span class="n">test_DL2a</span><span class="p">:</span>
<span class="c1"># Alter code from tutorial
</span> <span class="c1"># for instance, label in test_data:
</span> <span class="n">instance</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">],</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">label</span><span class="p">)</span>
<span class="n">bow_vec</span> <span class="o">=</span> <span class="n">make_bow_vector</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">word_to_ix</span><span class="p">)</span>
<span class="n">log_probs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">bow_vec</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">log_probs</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
<span class="c1"># Print the matrix column corresponding to "creo"
</span><span class="k">print</span><span class="p">(</span>
<span class="s">"Tensor for 'creo' (before training): "</span><span class="p">,</span>
<span class="nb">next</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">())[:,</span> <span class="n">word_to_ix</span><span class="p">[</span><span class="s">"creo"</span><span class="p">]],</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>['Yo, creo, que, si'] ['SPANISH']
tensor([[-0.9736, -0.4744]])
['it, is, lost, on, me'] ['ENGLISH']
tensor([[-0.7289, -0.6586]])
Tensor for 'creo' (before training): tensor([-0.0730, -0.1041], grad_fn=<SelectBackward>)
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss_function</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">NLLLoss</span><span class="p">()</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">optim</span><span class="p">.</span><span class="n">SGD</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">):</span>
<span class="c1"># for instance, label in data:
</span>
<span class="k">for</span> <span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">train_DL2a</span><span class="p">):</span> <span class="c1"># Print the 'text' data of the batch
</span> <span class="n">instance</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">],</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">]</span>
<span class="c1"># Step 1. Remember that PyTorch accumulates gradients.
</span> <span class="c1"># We need to clear them out before each instance
</span> <span class="n">model</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>
<span class="c1"># Step 2. Make our BOW vector and also we must wrap the target in a
</span> <span class="c1"># Tensor as an integer. For example, if the target is SPANISH, then
</span> <span class="c1"># we wrap the integer 0. The loss function then knows that the 0th
</span> <span class="c1"># element of the log probabilities is the log probability
</span> <span class="c1"># corresponding to SPANISH
</span> <span class="n">bow_vec</span> <span class="o">=</span> <span class="n">make_bow_vector</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">word_to_ix</span><span class="p">)</span>
<span class="n">target</span> <span class="o">=</span> <span class="n">make_target</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">label_to_ix</span><span class="p">)</span>
<span class="c1"># Step 3. Run our forward pass.
</span> <span class="n">log_probs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">bow_vec</span><span class="p">)</span>
<span class="c1"># Step 4. Compute the loss, gradients, and update the parameters by
</span> <span class="c1"># calling optimizer.step()
</span> <span class="n">loss</span> <span class="o">=</span> <span class="n">loss_function</span><span class="p">(</span><span class="n">log_probs</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
<span class="n">loss</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>
<span class="n">optimizer</span><span class="p">.</span><span class="n">step</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="n">idx</span> <span class="o">%</span> <span class="mi">4</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">epoch</span> <span class="o">%</span> <span class="mi">20</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span> <span class="c1"># Edit when datasets are bigger
</span> <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"epoch: </span><span class="si">{</span><span class="n">epoch</span><span class="si">}</span><span class="s">, training sample: </span><span class="si">{</span><span class="n">idx</span><span class="si">}</span><span class="s">, loss = </span><span class="si">{</span><span class="n">loss</span><span class="p">.</span><span class="n">item</span><span class="p">():</span><span class="mf">0.04</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>epoch: 0, training sample: 0, loss = 0.8369
epoch: 20, training sample: 0, loss = 0.0507
epoch: 40, training sample: 0, loss = 0.0257
epoch: 60, training sample: 0, loss = 0.0172
epoch: 80, training sample: 0, loss = 0.0129
</code></pre></div></div>
<p>We see the loss decrease quickly and saturate by the end of the training epochs.</p>
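<p>One lightweight way to confirm that saturation, rather than eyeballing the printout, is to accumulate the losses and check that the tail of the curve is flat. This is a small illustrative sketch (the <code class="language-plaintext highlighter-rouge">has_saturated</code> helper and the hard-coded history are my own, not part of the tutorial code):</p>

```python
# Loss history mirroring the values printed above
loss_history = [0.8369, 0.0507, 0.0257, 0.0172, 0.0129]

def has_saturated(losses, window=2, tol=0.02):
    """Return True if the last `window` losses changed by less than `tol`."""
    tail = losses[-window:]
    return max(tail) - min(tail) < tol

print(has_saturated(loss_history))  # True: the last two losses differ by ~0.004
```

In a real training loop, you would append <code class="language-plaintext highlighter-rouge">loss.item()</code> to such a list on each print step and use a check like this for early stopping.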
<h3 id="evaluation-after-training">Evaluation after training</h3>
<p>Look at the test set again, after model training.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">torch</span><span class="p">.</span><span class="n">no_grad</span><span class="p">():</span>
<span class="k">for</span> <span class="n">batch</span> <span class="ow">in</span> <span class="n">test_DL2a</span><span class="p">:</span>
<span class="c1"># Alter code from tutorial
</span> <span class="c1"># for instance, label in test_data:
</span> <span class="n">instance</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Text"</span><span class="p">],</span> <span class="n">batch</span><span class="p">[</span><span class="s">"Label"</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">label</span><span class="p">)</span>
<span class="n">bow_vec</span> <span class="o">=</span> <span class="n">make_bow_vector</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">word_to_ix</span><span class="p">)</span>
<span class="n">log_probs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">bow_vec</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">log_probs</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>['Yo, creo, que, si'] ['SPANISH']
tensor([[-0.2056, -1.6828]])
['it, is, lost, on, me'] ['ENGLISH']
tensor([[-2.7960, -0.0630]])
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Print the matrix column corresponding to "creo"
</span><span class="k">print</span><span class="p">(</span>
<span class="s">"Matrix for 'creo' (after training): "</span><span class="p">,</span>
<span class="nb">next</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">())[:,</span> <span class="n">word_to_ix</span><span class="p">[</span><span class="s">"creo"</span><span class="p">]],</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Matrix for 'creo' (after training): tensor([ 0.3702, -0.5473], grad_fn=<SelectBackward>)
</code></pre></div></div>
<p>We see that the coefficients for the Spanish word “creo” separate quite nicely relative to the initial values. <a href="https://media.giphy.com/media/U8GLl0bUYFLZVquOfY/giphy.gif">I believe</a> that the model training was successful.</p>
<h1 id="summary">Summary</h1>
<p>In this post, I sought to better understand how to use <code class="language-plaintext highlighter-rouge">Dataset</code> and <code class="language-plaintext highlighter-rouge">Dataloader</code> objects, especially in the context of model training. Fleshing this out showed me where I had to re-structure my data to get my code to work properly. Here, I had a batch size of 1, to mimic the original PyTorch tutorial. In a later post, I’ll write about how to take advantage of batching which is more relevant in larger datasets.</p>
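<p>As a preview of that later post, a <code class="language-plaintext highlighter-rouge">DataLoader</code> with a batch size greater than 1 groups samples automatically through its default collate function. The sketch below is hypothetical — <code class="language-plaintext highlighter-rouge">SentencesDataset</code> and the sample sentences are illustrative stand-ins, not the classes or data used above:</p>

```python
from torch.utils.data import Dataset, DataLoader

class SentencesDataset(Dataset):
    """Toy dataset mirroring the "Text"/"Label" keys used in this post."""
    def __init__(self, pairs):
        self.pairs = pairs  # list of (sentence, label) tuples

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        text, label = self.pairs[idx]
        return {"Text": text, "Label": label}

data = [
    ("me gusta comer en la cafeteria", "SPANISH"),
    ("Give it to me", "ENGLISH"),
    ("No creo que sea una buena idea", "SPANISH"),
    ("it is lost on me", "ENGLISH"),
]

# batch_size=2 yields two samples per iteration; the default collate_fn
# gathers each dict field into a list (or stacks tensors) across the batch.
loader = DataLoader(SentencesDataset(data), batch_size=2, shuffle=False)
for batch in loader:
    print(batch["Text"], batch["Label"])
```

Each iteration now returns lists of two sentences and two labels, so the vectorization step would need to handle a batch of texts rather than a single instance — exactly the restructuring alluded to above.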
<h1 id="appendix-environment-and-system-parameters">Appendix: Environment and system parameters</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">watermark</span> <span class="o">-</span><span class="n">n</span> <span class="o">-</span><span class="n">u</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">iv</span> <span class="o">-</span><span class="n">w</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Last updated: Thu Jun 24 2021
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.22.0
numpy : 1.19.5
torch : 1.8.1
re : 2.2.1
json : 2.0.9
seaborn: 0.11.1
pandas : 1.2.1
Watermark: 2.1.0
</code></pre></div></div>Ben LacarRecently, I built a simple NLP algorithm for a work project, following the template described in this tutorial. As I looked to increase my model’s complexity, I started to come across references to Dataset and Dataloader classes. I tried adapting my work-related code to use these objects, but I found myself running into pesky bugs. I thought I should take some time to figure out how to properly use Dataset and Dataloader objects. In this post, I adapt the PyTorch NLP tutorial to work with Dataset and Dataloader objects. Since my focus is primarily on using these objects, please refer to the tutorial for details regarding the NLP model.Extending the chain2021-06-21T00:00:00+00:002021-06-21T00:00:00+00:00https://benslack19.github.io/revisiting-the-chain<p>Last month, I <a href="https://benslack19.github.io/the-chain/">wrote</a> about how I aimed to make daily contributions towards my goals, no matter how small. Not surprisingly, it has been challenging. Nevertheless, I <em>have</em> maintained a chart where I can at least see what goals I’ve been able to contribute to in the last month.</p>
<p>Where have I spent the most time? It’s been the subject that most interests me and what I do first thing in the morning: statistics and Bayesian data analysis. Where have I fallen behind? The areas of tool development and writing. I feel like tool development has led to the lowest ROI currently. It is probably good that I don’t spend too much time there. But writing I can maintain. I continue to think of topics and will make additional entries soon.</p>Ben LacarLast month, I wrote about how I aimed to make daily contributions towards my goals, no matter how small. Not surprisingly, it has been challenging. Nevertheless, I have maintained a chart where I can at least see what goals I’ve been able to contribute to in the last month.