{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using the parsim API\n",
    "\n",
    "**parsim** is intended to be used as a command-line tool. For some particular tasks, however, it can be practical to use the Python API in your own Python script, or interactively. Post-processing and analysis of results is one such task. Here we show a few examples.\n",
    "\n",
    "We assume that there is a parsim project in which we have worked with the simplistic sample models discussed in the tutorial section. We make that project directory, here named `demo`, our current directory, before starting a Python interpreter.\n",
    "\n",
    "    $ cd demo\n",
    "\n",
    "Before you continue, please have a look at the tutorial, so that you are familiar with the simple model template `box` used there. We recall that the Python simulation script is called `calc.py`. For our examples, we will create and execute two small experimental designs (the output of the parsim command is not shown here). In both cases, the stochastic parameters are defined in a parameter file `box_uniform.par`:\n",
    "\n",
    "    length:     uniform(10, 4)     # 12 [m]\n",
    "    width:      uniform(3, 2)      # 4 [m]\n",
    "    height:     uniform(1.3, 0.4)  # 1.5 [m]\n",
    "    density:    uniform(950, 100)  # 1000 [kg/m3]\n",
    "\n",
    "First, we create a two-level full factorial design called `box_ff2n`. With four varying parameters, this yields 16 cases.\n",
    "\n",
    "```\n",
    "$ psm doe -t box --name box_ff2n box_uniform.par ff2n beta=0.999\n",
    "```\n",
    "\n",
    "We execute the simulation script and collect results:\n",
    "\n",
    "```\n",
    "$ psm run box_ff2n calc.py\n",
    "$ psm collect box_ff2n\n",
    "```\n",
    "\n",
    "Second, we try the Generalized Subset Design (GSD) available in the `pyDOE2` package. This scheme makes it possible to create reduced designs with more than two levels.\n",
    "Here we try three levels for `length` and `width`, and two levels for the other parameters. The full design would have 3 x 3 x 2 x 2 = 36 cases; reducing it by a factor of two gives us 18 cases.\n",
    "\n",
    "```\n",
    "$ psm doe -t box --name box_gsd box_uniform.par gsd beta=0.999 levels=3,3,2,2 reduction=2\n",
    "$ psm run box_gsd calc.py\n",
    "$ psm collect box_gsd\n",
    "```\n",
    "\n",
    "Now start a Python interpreter of your choice, e.g. `ipython` or a Jupyter Notebook (this API tutorial is itself created as a Jupyter Notebook)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\n"
     ]
    }
   ],
   "source": [
    "%cd demo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading parsim Case and Study objects\n",
    "\n",
    "In the previous section, we created the parameter studies `box_ff2n` and `box_gsd`. The parsim API can be used to load the corresponding parsim objects into a Python session.\n",
    "\n",
    "We start by importing the parsim API:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import parsim.core as ps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember that we started the Python interpreter in the project directory, which is therefore our current directory. Let's load the two studies from the example above. To do this, we supply the study name as an argument to the `Study` class constructor:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_ff2n = ps.Study('box_ff2n')\n",
    "s_gsd = ps.Study('box_gsd')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also load individual cases with the `Case` class. To open an individual case that is part of a study, we give the names of both the study and the case, separated by a colon:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "c_gsd_4 = ps.Case('box_gsd:4')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the command line, the `psm info` command can be used to output information about studies and cases. In a Python console, you can get the same information by printing the output of the object's `info` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Study name            : box_ff2n\n",
      "Creation date         : 2019-08-29 16:55:58\n",
      "Description           : \n",
      "Project name          : myProject\n",
      "Template path         : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\modelTemplates\\box\n",
      "Parsim version        : 1.0.dev\n",
      "Project path          : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\n",
      "Caselist/DOE params   : ['length', 'width', 'height', 'density']\n",
      "DOE scheme            : ff2n\n",
      "DOE arguments         : {'beta': 0.999}\n",
      "--------------------------------------------------------\n",
      "Variable parameter distributions (DOE)\n",
      "--------------------------------------------------------\n",
      "length                : uniform(10, 4)\n",
      "width                 : uniform(3, 2)\n",
      "height                : uniform(1.3, 0.4)\n",
      "density               : uniform(950, 100)\n",
      "--------------------------------------------------------\n",
      "Default parameters (defined in template)\n",
      "--------------------------------------------------------\n",
      "output_file           : results.json\n",
      "color                 : black\n",
      "-------------------------------------------------------------------------\n",
      " Case#  Case_ID           Description\n",
      "-------------------------------------------------------------------------\n",
      " 1      1                 \n",
      " 2      2                 \n",
      " 3      3                 \n",
      " 4      4                 \n",
      " 5      5                 \n",
      " 6      6                 \n",
      " 7      7                 \n",
      " 8      8                 \n",
      " 9      9                 \n",
      " 10     10                \n",
      " 11     11                \n",
      " 12     12                \n",
      " 13     13                \n",
      " 14     14                \n",
      " 15     15                \n",
      " 16     16                \n",
      "-------------------------------------------------------------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(s_ff2n.info())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a similar manner you can show the log for a case or study. To look at the logged history of the 4th GSD case, for example,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2019-08-29 16:56:49 - INFO: Parsim case \"4\" successfully created\n",
      "2019-08-29 16:56:52 - INFO: Executing command/script/executable: calc.py \n",
      "2019-08-29 16:56:52 - INFO: Executable finished successfully (runtime: 0:00:00.155078)\n",
      "   Executable : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\study_box_gsd\\case_4\\calc.py \n",
      "   stdout     : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\study_box_gsd\\case_4\\calc.out \n",
      "   stderr     : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\study_box_gsd\\case_4\\calc.err\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(c_gsd_4.log())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with input parameters and results\n",
    "\n",
    "In the following example, we show how the parsim API can be combined with the functionality of the `pandas` library.\n",
    "We also show how to build a simple linear regression model using the `statsmodels` library.\n",
    "\n",
    "We start by importing the popular packages `pandas` and `statsmodels`. We also import the parsim API from `parsim.core`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import statsmodels.api as sm\n",
    "\n",
    "import parsim.core as ps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We load the full-factorial study, as before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_ff2n = ps.Study('box_ff2n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameters and results as pandas DataFrame and Series objects\n",
    "\n",
    "Both the `Study` and `Case` classes provide data in the form of pandas objects. For a study, the `caselist` attribute is a `DataFrame` with the values of all varying parameters. Similarly, all collected results of the study are aggregated in `results`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>length</th>\n",
       "      <th>width</th>\n",
       "      <th>height</th>\n",
       "      <th>density</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CASENAME</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          length  width  height  density\n",
       "CASENAME                                \n",
       "1         13.998  4.999  1.6998  1049.95\n",
       "2         10.002  4.999  1.6998  1049.95\n",
       "3         13.998  3.001  1.6998  1049.95\n",
       "4         10.002  3.001  1.6998  1049.95\n",
       "5         13.998  4.999  1.3002  1049.95\n",
       "6         10.002  4.999  1.3002  1049.95\n",
       "7         13.998  3.001  1.3002  1049.95\n",
       "8         10.002  3.001  1.3002  1049.95\n",
       "9         13.998  4.999  1.6998   950.05\n",
       "10        10.002  4.999  1.6998   950.05\n",
       "11        13.998  3.001  1.6998   950.05\n",
       "12        10.002  3.001  1.6998   950.05\n",
       "13        13.998  4.999  1.3002   950.05\n",
       "14        10.002  4.999  1.3002   950.05\n",
       "15        13.998  3.001  1.3002   950.05\n",
       "16        10.002  3.001  1.3002   950.05"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_ff2n.caselist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>base_area</th>\n",
       "      <th>volume</th>\n",
       "      <th>mass</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CASENAME</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>69.976002</td>\n",
       "      <td>118.945208</td>\n",
       "      <td>124886.521349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>49.999998</td>\n",
       "      <td>84.989997</td>\n",
       "      <td>89235.246931</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>42.007998</td>\n",
       "      <td>71.405195</td>\n",
       "      <td>74971.884491</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>30.016002</td>\n",
       "      <td>51.021200</td>\n",
       "      <td>53569.709150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>69.976002</td>\n",
       "      <td>90.982798</td>\n",
       "      <td>95527.388551</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>49.999998</td>\n",
       "      <td>65.009997</td>\n",
       "      <td>68257.246770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>42.007998</td>\n",
       "      <td>54.618799</td>\n",
       "      <td>57347.008010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>30.016002</td>\n",
       "      <td>39.026806</td>\n",
       "      <td>40976.194750</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>69.976002</td>\n",
       "      <td>118.945208</td>\n",
       "      <td>113003.895050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>49.999998</td>\n",
       "      <td>84.989997</td>\n",
       "      <td>80744.746270</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>42.007998</td>\n",
       "      <td>71.405195</td>\n",
       "      <td>67838.505510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>30.016002</td>\n",
       "      <td>51.021200</td>\n",
       "      <td>48472.691250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>69.976002</td>\n",
       "      <td>90.982798</td>\n",
       "      <td>86438.207050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>49.999998</td>\n",
       "      <td>65.009997</td>\n",
       "      <td>61762.748029</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>42.007998</td>\n",
       "      <td>54.618799</td>\n",
       "      <td>51890.589990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>30.016002</td>\n",
       "      <td>39.026806</td>\n",
       "      <td>37077.416851</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          base_area      volume           mass\n",
       "CASENAME                                      \n",
       "1         69.976002  118.945208  124886.521349\n",
       "2         49.999998   84.989997   89235.246931\n",
       "3         42.007998   71.405195   74971.884491\n",
       "4         30.016002   51.021200   53569.709150\n",
       "5         69.976002   90.982798   95527.388551\n",
       "6         49.999998   65.009997   68257.246770\n",
       "7         42.007998   54.618799   57347.008010\n",
       "8         30.016002   39.026806   40976.194750\n",
       "9         69.976002  118.945208  113003.895050\n",
       "10        49.999998   84.989997   80744.746270\n",
       "11        42.007998   71.405195   67838.505510\n",
       "12        30.016002   51.021200   48472.691250\n",
       "13        69.976002   90.982798   86438.207050\n",
       "14        49.999998   65.009997   61762.748029\n",
       "15        42.007998   54.618799   51890.589990\n",
       "16        30.016002   39.026806   37077.416851"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_ff2n.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `results` DataFrame may contain missing values, `NaN`, if a case simulation crashed or failed to produce a result.\n",
    "\n",
    "The `parameters` attribute shows the values and sources of the parameters that all cases of the study have in common. In the present example, all constant parameters have their default values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "      <th>source</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>color</th>\n",
       "      <td>black</td>\n",
       "      <td>default</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>output_file</th>\n",
       "      <td>results.json</td>\n",
       "      <td>default</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    value   source\n",
       "color               black  default\n",
       "output_file  results.json  default"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_ff2n.parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Case objects also have the attributes `parameters` and `results` but, for obvious reasons, no `caselist`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "      <th>source</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>length</th>\n",
       "      <td>10.002</td>\n",
       "      <td>caselist</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>width</th>\n",
       "      <td>3.001</td>\n",
       "      <td>caselist</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>height</th>\n",
       "      <td>1.6998</td>\n",
       "      <td>caselist</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>density</th>\n",
       "      <td>1049.95</td>\n",
       "      <td>caselist</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>color</th>\n",
       "      <td>black</td>\n",
       "      <td>default</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>output_file</th>\n",
       "      <td>results.json</td>\n",
       "      <td>default</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    value    source\n",
       "length             10.002  caselist\n",
       "width               3.001  caselist\n",
       "height             1.6998  caselist\n",
       "density           1049.95  caselist\n",
       "color               black   default\n",
       "output_file  results.json   default"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c_ff2n_4 = ps.Case('box_ff2n:4')\n",
    "\n",
    "c_ff2n_4.parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the `source` column makes it easy to see whether a parameter is set by the caselist, on the user command line, or is a default value.\n",
    "\n",
    "`Case.parameters` is a pandas DataFrame, while `Case.results` is a pandas Series object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "base_area       30.016002\n",
       "volume          51.021200\n",
       "mass         53569.709150\n",
       "Name: results, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c_ff2n_4.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Merging datasets from several Studies\n",
    "\n",
    "Sooner or later, you will want to make a joint analysis of the results from more than one Study. Maybe you started out with a small study that only allows for a simple linear model, for example a two-level factorial design such as the `box_ff2n` study above. Then you realize that one or two parameters should rather be varied on at least three levels, in order to capture quadratic terms as well. The `box_gsd` study we saw earlier is one such example: both `length` and `width` were varied on three levels. From a statistical point of view, this is a rather poor example, but that's another story... Let's go ahead and load the GSD study as well, and merge the two for analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_gsd = ps.Study('box_gsd')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results and the caselist from both studies can now be concatenated, but we need to decide how to handle the fact that the DataFrames from both studies have the same index entries. We could either ignore the original indices altogether, creating a new index in the process, or try to keep track of the identities of the cases in the concatenated datasets. Here we choose the latter option, by creating a hierarchical index indicating the names of the original studies. (The option `sort=False` is used to avoid sorting the columns in lexical order.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = pd.concat([s_ff2n.results, s_gsd.results], keys=['ff2n', 'gsd'], sort=False)\n",
    "caselist = pd.concat([s_ff2n.caselist, s_gsd.caselist], keys=['ff2n', 'gsd'], sort=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>length</th>\n",
       "      <th>width</th>\n",
       "      <th>height</th>\n",
       "      <th>density</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>CASENAME</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"16\" valign=\"top\">ff2n</th>\n",
       "      <th>1</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"18\" valign=\"top\">gsd</th>\n",
       "      <th>1</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>13.998</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>10.002</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.000</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.000</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>13.998</td>\n",
       "      <td>4.000</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>10.002</td>\n",
       "      <td>4.000</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>12.000</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>12.000</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>12.000</td>\n",
       "      <td>4.999</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>12.000</td>\n",
       "      <td>3.001</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>12.000</td>\n",
       "      <td>4.000</td>\n",
       "      <td>1.6998</td>\n",
       "      <td>1049.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>12.000</td>\n",
       "      <td>4.000</td>\n",
       "      <td>1.3002</td>\n",
       "      <td>950.05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               length  width  height  density\n",
       "     CASENAME                                \n",
       "ff2n 1         13.998  4.999  1.6998  1049.95\n",
       "     2         10.002  4.999  1.6998  1049.95\n",
       "     3         13.998  3.001  1.6998  1049.95\n",
       "     4         10.002  3.001  1.6998  1049.95\n",
       "     5         13.998  4.999  1.3002  1049.95\n",
       "     6         10.002  4.999  1.3002  1049.95\n",
       "     7         13.998  3.001  1.3002  1049.95\n",
       "     8         10.002  3.001  1.3002  1049.95\n",
       "     9         13.998  4.999  1.6998   950.05\n",
       "     10        10.002  4.999  1.6998   950.05\n",
       "     11        13.998  3.001  1.6998   950.05\n",
       "     12        10.002  3.001  1.6998   950.05\n",
       "     13        13.998  4.999  1.3002   950.05\n",
       "     14        10.002  4.999  1.3002   950.05\n",
       "     15        13.998  3.001  1.3002   950.05\n",
       "     16        10.002  3.001  1.3002   950.05\n",
       "gsd  1         13.998  4.999  1.6998  1049.95\n",
       "     2         13.998  3.001  1.6998  1049.95\n",
       "     3         10.002  4.999  1.6998  1049.95\n",
       "     4         10.002  3.001  1.6998  1049.95\n",
       "     5         13.998  4.999  1.3002   950.05\n",
       "     6         13.998  3.001  1.3002   950.05\n",
       "     7         10.002  4.999  1.3002   950.05\n",
       "     8         10.002  3.001  1.3002   950.05\n",
       "     9         13.998  4.000  1.6998   950.05\n",
       "     10        10.002  4.000  1.6998   950.05\n",
       "     11        13.998  4.000  1.3002  1049.95\n",
       "     12        10.002  4.000  1.3002  1049.95\n",
       "     13        12.000  4.999  1.6998   950.05\n",
       "     14        12.000  3.001  1.6998   950.05\n",
       "     15        12.000  4.999  1.3002  1049.95\n",
       "     16        12.000  3.001  1.3002  1049.95\n",
       "     17        12.000  4.000  1.6998  1049.95\n",
       "     18        12.000  4.000  1.3002   950.05"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "caselist"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Linear regression analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the `statsmodels` package to fit the combined data to a simple linear model.\n",
    "\n",
    "For numerical robustness, you are strongly advised to normalize or standardize your data. We name our standardized datasets `y` and `x`, for the dependent and independent variables, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = (results - results.mean())/results.std()\n",
    "x = (caselist - caselist.mean())/caselist.std()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fitting a model with `statsmodels` typically involves three steps:\n",
    "\n",
    "1. Use the model class to describe the model.\n",
    "2. Fit the model.\n",
    "3. Inspect the results.\n",
    "\n",
    "Here we use the `OLS` class, for ordinary least squares. \n",
    "\n",
    "First we define and fit the model to our data; we use the output variable `volume` for this example:"
   ]
  },
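  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.api as sm\n",
    "\n",
    "# No intercept term is added: the standardized data has zero mean\n",
    "mod = sm.OLS(y['volume'], x)  # define model\n",
    "res = mod.fit()               # fit model to data"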
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's look at the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>OLS Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>         <td>volume</td>      <th>  R-squared:         </th> <td>   0.973</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                   <td>OLS</td>       <th>  Adj. R-squared:    </th> <td>   0.969</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>             <td>Least Squares</td>  <th>  F-statistic:       </th> <td>   269.5</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>             <td>Thu, 26 Sep 2019</td> <th>  Prob (F-statistic):</th> <td>4.81e-23</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                 <td>16:53:39</td>     <th>  Log-Likelihood:    </th> <td>  13.618</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>No. Observations:</th>      <td>    34</td>      <th>  AIC:               </th> <td>  -19.24</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Residuals:</th>          <td>    30</td>      <th>  BIC:               </th> <td>  -13.13</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Model:</th>              <td>     4</td>      <th>                     </th>     <td> </td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Covariance Type:</th>      <td>nonrobust</td>    <th>                     </th>     <td> </td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "     <td></td>        <th>coef</th>     <th>std err</th>      <th>t</th>      <th>P>|t|</th>  <th>[0.025</th>    <th>0.975]</th>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>length</th>  <td>    0.4915</td> <td>    0.030</td> <td>   16.361</td> <td> 0.000</td> <td>    0.430</td> <td>    0.553</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>width</th>   <td>    0.7373</td> <td>    0.030</td> <td>   24.541</td> <td> 0.000</td> <td>    0.676</td> <td>    0.799</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>height</th>  <td>    0.4333</td> <td>    0.030</td> <td>   14.398</td> <td> 0.000</td> <td>    0.372</td> <td>    0.495</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>density</th> <td> -1.11e-16</td> <td>    0.030</td> <td>-3.69e-15</td> <td> 1.000</td> <td>   -0.061</td> <td>    0.061</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "  <th>Omnibus:</th>       <td> 6.096</td> <th>  Durbin-Watson:     </th> <td>   1.613</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Prob(Omnibus):</th> <td> 0.047</td> <th>  Jarque-Bera (JB):  </th> <td>   5.828</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Skew:</th>          <td> 1.004</td> <th>  Prob(JB):          </th> <td>  0.0543</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Kurtosis:</th>      <td> 2.713</td> <th>  Cond. No.          </th> <td>    1.06</td>\n",
       "</tr>\n",
       "</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            OLS Regression Results                            \n",
       "==============================================================================\n",
       "Dep. Variable:                 volume   R-squared:                       0.973\n",
       "Model:                            OLS   Adj. R-squared:                  0.969\n",
       "Method:                 Least Squares   F-statistic:                     269.5\n",
       "Date:                Thu, 26 Sep 2019   Prob (F-statistic):           4.81e-23\n",
       "Time:                        16:53:39   Log-Likelihood:                 13.618\n",
       "No. Observations:                  34   AIC:                            -19.24\n",
       "Df Residuals:                      30   BIC:                            -13.13\n",
       "Df Model:                           4                                         \n",
       "Covariance Type:            nonrobust                                         \n",
       "==============================================================================\n",
       "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
       "------------------------------------------------------------------------------\n",
       "length         0.4915      0.030     16.361      0.000       0.430       0.553\n",
       "width          0.7373      0.030     24.541      0.000       0.676       0.799\n",
       "height         0.4333      0.030     14.398      0.000       0.372       0.495\n",
       "density     -1.11e-16      0.030  -3.69e-15      1.000      -0.061       0.061\n",
       "==============================================================================\n",
       "Omnibus:                        6.096   Durbin-Watson:                   1.613\n",
       "Prob(Omnibus):                  0.047   Jarque-Bera (JB):                5.828\n",
       "Skew:                           1.004   Prob(JB):                       0.0543\n",
       "Kurtosis:                       2.713   Cond. No.                         1.06\n",
       "==============================================================================\n",
       "\n",
       "Warnings:\n",
       "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
       "\"\"\""
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The middle section of this summary output shows the regression coefficients, their standard errors and some statistics (t and p-values, with 95% confidence intervals) to help us judge how important the parameters are for predicting the output quantity. Since we standardized both parameters and result quantities, the standard error is the same for all independent variables. It should come as no surprise that `density` does not contribute to the predicted `volume` (its coefficient is zero to within rounding error)."
   ]
  }
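  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The coefficient table can also be read programmatically. The following is a minimal sketch, not part of the parsim project: it assumes a synthetic stand-in for the combined case list (random box dimensions, with `volume` computed as length times width times height) and uses the `params` and `pvalues` attributes of the fitted `statsmodels` results object to rank the parameters by the magnitude of their coefficients.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import statsmodels.api as sm\n",
    "\n",
    "# Hypothetical stand-in data (assumption): NOT the parsim case list\n",
    "rng = np.random.default_rng(0)\n",
    "n = 50\n",
    "caselist = pd.DataFrame({\n",
    "    'length': rng.uniform(10, 14, n),\n",
    "    'width': rng.uniform(3, 5, n),\n",
    "    'height': rng.uniform(1.3, 1.7, n),\n",
    "    'density': rng.uniform(950, 1050, n),\n",
    "})\n",
    "volume = caselist['length'] * caselist['width'] * caselist['height']\n",
    "\n",
    "# Standardize inputs and output, as in the cells above\n",
    "x = (caselist - caselist.mean()) / caselist.std()\n",
    "y = (volume - volume.mean()) / volume.std()\n",
    "\n",
    "res = sm.OLS(y, x).fit()\n",
    "\n",
    "# Collect coefficients and p-values in one table, largest effect first\n",
    "table = pd.DataFrame({'coef': res.params, 'p_value': res.pvalues})\n",
    "order = table['coef'].abs().sort_values(ascending=False).index\n",
    "print(table.reindex(order))\n",
    "```\n",
    "\n",
    "With the objects from this notebook, the same `pd.DataFrame` construction should work directly on `res`."
   ]
  }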
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}