{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using the parsim API\n", "\n", "**parsim** is intended to be used as a command-line tool. For some particular tasks, however, it can be practical to use the Python API in your own Python script, or interactively. Post-processing and analysis of results is one such task. Here we show a few examples.\n", "\n", "We assume that there is a parsim project in which we have worked with the simplistic sample models discussed in the tutorial section. We make that project directory, here named `demo`, our current directory, before starting a Python interpreter. \n", "\n", "    $ cd demo\n", "\n", "Before you continue, please have a look at the tutorial, so that you are familiar with the simple model template `box` used there. We recall that the Python simulation script is called `calc.py`. For our examples, we will create and execute two small experimental designs (the output of the parsim command is not shown here). In both cases, the stochastic parameters are defined in a parameter file `box_uniform.par`:\n", "\n", "    length: uniform(10, 4) # 12 [m]\n", "    width: uniform(3, 2) # 4 [m]\n", "    height: uniform(1.3, 0.4) # 1.5 [m]\n", "    density: uniform(950, 100) # 1000 [kg/m3]\n", "\n", "First, we create a two-level full factorial design called `box_ff2n`. With four varying parameters, this yields 16 cases.\n", "\n", "```\n", "$ psm doe -t box --name box_ff2n box_uniform.par ff2n beta=0.999\n", "``` \n", "\n", "We execute the simulation script and collect results:\n", "\n", "```\n", "$ psm run box_ff2n calc.py\n", "$ psm collect box_ff2n\n", "``` \n", "\n", "Second, we try the Generalized Subset Design (GSD) available in the `pyDOE2` package. This scheme makes it possible to create reduced designs with more than two levels. \n", "Here we try three levels for `length` and `width`, and two levels for the other parameters. 
The size of the design is reduced by a factor of two, which gives us 18 cases.\n", "\n", "```\n", "$ psm doe -t box --name box_gsd box_uniform.par gsd beta=0.999 levels=3,3,2,2 reduction=2\n", "$ psm run box_gsd calc.py\n", "$ psm collect box_gsd\n", "```\n", "\n", "Now start a Python interpreter of your choice, e.g. `ipython` or a Jupyter Notebook (this API tutorial is itself created as a Jupyter Notebook). " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\n" ] } ], "source": [ "%cd demo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading parsim Case and Study objects\n", "\n", "In the previous section, we created the parameter studies `box_ff2n` and `box_gsd`. The parsim API can be used to load the corresponding parsim objects into a Python session.\n", "\n", "We start by importing the parsim API:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import parsim.core as ps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that we started the Python interpreter in the project directory, which is therefore our current directory. Let's load the two studies from the example above. To do this we supply the study name as an argument to the `Study` class constructor: " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "s_ff2n = ps.Study('box_ff2n')\n", "s_gsd = ps.Study('box_gsd')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also load individual cases with the `Case` class. 
To open an individual case that is part of a Study, we use the names of both the Study and the Case, separated by a colon:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "c_gsd_4 = ps.Case('box_gsd:4')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the command-line, the `psm info` command can be used to output information about studies and cases. In a Python console, you can get the same information by printing the output of the object's `info` method:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Study name : box_ff2n\n", "Creation date : 2019-08-29 16:55:58\n", "Description : \n", "Project name : myProject\n", "Template path : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\modelTemplates\\box\n", "Parsim version : 1.0.dev\n", "Project path : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\n", "Caselist/DOE params : ['length', 'width', 'height', 'density']\n", "DOE scheme : ff2n\n", "DOE arguments : {'beta': 0.999}\n", "--------------------------------------------------------\n", "Variable parameter distributions (DOE)\n", "--------------------------------------------------------\n", "length : uniform(10, 4)\n", "width : uniform(3, 2)\n", "height : uniform(1.3, 0.4)\n", "density : uniform(950, 100)\n", "--------------------------------------------------------\n", "Default parameters (defined in template)\n", "--------------------------------------------------------\n", "output_file : results.json\n", "color : black\n", "-------------------------------------------------------------------------\n", " Case# Case_ID Description\n", "-------------------------------------------------------------------------\n", " 1 1 \n", " 2 2 \n", " 3 3 \n", " 4 4 \n", " 5 5 \n", " 6 6 \n", " 7 7 \n", " 8 8 \n", " 9 9 \n", " 10 10 \n", " 11 11 \n", " 12 12 \n", " 13 13 \n", " 14 14 \n", " 15 15 \n", " 16 16 \n", 
"-------------------------------------------------------------------------\n", "\n" ] } ], "source": [ "print(s_ff2n.info())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a similar manner, you can show the log for a case or study. For example, to look at the logged history of the 4th GSD case:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2019-08-29 16:56:49 - INFO: Parsim case \"4\" successfully created\n", "2019-08-29 16:56:52 - INFO: Executing command/script/executable: calc.py \n", "2019-08-29 16:56:52 - INFO: Executable finished successfully (runtime: 0:00:00.155078)\n", " Executable : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\study_box_gsd\\case_4\\calc.py \n", " stdout : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\study_box_gsd\\case_4\\calc.out \n", " stderr : C:\\Users\\olawid\\PycharmProjects\\psm\\doc\\demo\\study_box_gsd\\case_4\\calc.err\n", "\n" ] } ], "source": [ "print(c_gsd_4.log())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with input parameters and results\n", "\n", "In the following example, we will show how we can use the parsim API together with the functionality of the `pandas` library. \n", "We will also show how we can make a simple linear regression model using the `statsmodels` library.\n", "\n", "We start by importing the popular packages `pandas` and `statsmodels`. 
We also import the parsim API from `parsim.core`.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import statsmodels.api as sm\n", "\n", "import parsim.core as ps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We load the full-factorial study, as before:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "s_ff2n = ps.Study('box_ff2n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters and results as pandas DataFrame and Series objects\n", "\n", "Both `Study` and `Case` classes provide data in the form of pandas objects. For a study, the `caselist` attribute is a `DataFrame` with the values of all varying parameters. Similarly, all collected results of the study are aggregated in `results`:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " length width height density\n", "CASENAME \n", "1 13.998 4.999 1.6998 1049.95\n", "2 10.002 4.999 1.6998 1049.95\n", "3 13.998 3.001 1.6998 1049.95\n", "4 10.002 3.001 1.6998 1049.95\n", "5 13.998 4.999 1.3002 1049.95\n", "6 10.002 4.999 1.3002 1049.95\n", "7 13.998 3.001 1.3002 1049.95\n", "8 10.002 3.001 1.3002 1049.95\n", "9 13.998 4.999 1.6998 950.05\n", "10 10.002 4.999 1.6998 950.05\n", "11 13.998 3.001 1.6998 950.05\n", "12 10.002 3.001 1.6998 950.05\n", "13 13.998 4.999 1.3002 950.05\n", "14 10.002 4.999 1.3002 950.05\n", "15 13.998 3.001 1.3002 950.05\n", "16 10.002 3.001 1.3002 950.05" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s_ff2n.caselist" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " base_area volume mass\n", "CASENAME \n", "1 69.976002 118.945208 124886.521349\n", "2 49.999998 84.989997 89235.246931\n", "3 42.007998 71.405195 74971.884491\n", "4 30.016002 51.021200 53569.709150\n", "5 69.976002 90.982798 95527.388551\n", "6 49.999998 65.009997 68257.246770\n", "7 42.007998 54.618799 57347.008010\n", "8 30.016002 39.026806 40976.194750\n", "9 69.976002 118.945208 113003.895050\n", "10 49.999998 84.989997 80744.746270\n", "11 42.007998 71.405195 67838.505510\n", "12 30.016002 51.021200 48472.691250\n", "13 69.976002 90.982798 86438.207050\n", "14 49.999998 65.009997 61762.748029\n", "15 42.007998 54.618799 51890.589990\n", "16 30.016002 39.026806 37077.416851" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s_ff2n.results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `results` DataFrame may contain missing values, `NaN`, if a case simulation crashed or failed to produce a result.\n", "\n", "The `parameters` attribute shows the values and sources of the parameters that all cases of the study have in common. In the present example, all constant parameters have their default values." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value source\n", "color black default\n", "output_file results.json default" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s_ff2n.parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Case objects have the attributes `parameters` and `results`, but for obvious reasons not the `caselist`. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value source\n", "length 10.002 caselist\n", "width 3.001 caselist\n", "height 1.6998 caselist\n", "density 1049.95 caselist\n", "color black default\n", "output_file results.json default" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_ff2n_4 = ps.Case('box_ff2n:4')\n", "\n", "c_ff2n_4.parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that it's easy to see if a parameter is set by the caselist, on the user command line, or if it's a default value.\n", "\n", "`Case.parameters` is a pandas DataFrame, while `Case.results` is a pandas Series object:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "base_area 30.016002\n", "volume 51.021200\n", "mass 53569.709150\n", "Name: results, dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_ff2n_4.results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Merging datasets from several Studies\n", "\n", "Sooner or later, you will be in a situation where you want to make a joint analysis of the results from more than one Study. Maybe you started out with a small study, which only allows for a simple linear model, for example a factorial design in two levels, such as the `box_ff2n` study above. Then you realize that one or two parameters should rather be varied on at least three levels, in order to capture quadratic terms as well. An example is the `box_gsd` study we saw earlier, where both the `length` and `width` parameters were varied on three levels. From a statistical point of view, this is a rather poor example, but that's another story... Let's go ahead and load the GSD study as well, and merge the two for analysis..." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "s_gsd = ps.Study('box_gsd')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results and the caselist from both studies can now be concatenated, but we need to decide how to handle the fact that the DataFrames from both studies have the same index entries. We could either ignore the original indices altogether, creating a new index in the process, or try to keep track of the identities of the cases in the concatenated datasets as well. Here we choose the latter option, by creating a hierarchical index indicating the names of the original studies. (The option `sort=False` is used to avoid sorting the columns in lexical order.)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "results = pd.concat([s_ff2n.results, s_gsd.results], keys=['ff2n', 'gsd'], sort=False)\n", "caselist = pd.concat([s_ff2n.caselist, s_gsd.caselist], keys=['ff2n', 'gsd'], sort=False)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " length width height density\n", " CASENAME \n", "ff2n 1 13.998 4.999 1.6998 1049.95\n", " 2 10.002 4.999 1.6998 1049.95\n", " 3 13.998 3.001 1.6998 1049.95\n", " 4 10.002 3.001 1.6998 1049.95\n", " 5 13.998 4.999 1.3002 1049.95\n", " 6 10.002 4.999 1.3002 1049.95\n", " 7 13.998 3.001 1.3002 1049.95\n", " 8 10.002 3.001 1.3002 1049.95\n", " 9 13.998 4.999 1.6998 950.05\n", " 10 10.002 4.999 1.6998 950.05\n", " 11 13.998 3.001 1.6998 950.05\n", " 12 10.002 3.001 1.6998 950.05\n", " 13 13.998 4.999 1.3002 950.05\n", " 14 10.002 4.999 1.3002 950.05\n", " 15 13.998 3.001 1.3002 950.05\n", " 16 10.002 3.001 1.3002 950.05\n", "gsd 1 13.998 4.999 1.6998 1049.95\n", " 2 13.998 3.001 1.6998 1049.95\n", " 3 10.002 4.999 1.6998 1049.95\n", " 4 10.002 3.001 1.6998 1049.95\n", " 5 13.998 4.999 1.3002 950.05\n", " 6 13.998 3.001 1.3002 950.05\n", " 7 10.002 4.999 1.3002 950.05\n", " 8 10.002 3.001 1.3002 950.05\n", " 9 13.998 4.000 1.6998 950.05\n", " 10 10.002 4.000 1.6998 950.05\n", " 11 13.998 4.000 1.3002 1049.95\n", " 12 10.002 4.000 1.3002 1049.95\n", " 13 12.000 4.999 1.6998 950.05\n", " 14 12.000 3.001 1.6998 950.05\n", " 15 12.000 4.999 1.3002 1049.95\n", " 16 12.000 3.001 1.3002 1049.95\n", " 17 12.000 4.000 1.6998 1049.95\n", " 18 12.000 4.000 1.3002 950.05" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "caselist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linear regression analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the `statsmodels` package to fit the combined data to a simple linear model. \n", "\n", "For numerical robustness, you are strongly advised to normalize or standardize your data! We name our standardized datasets `y` and `x`, for dependent and independent variables, respectively. 
" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "y = (results - results.mean())/results.std()\n", "x = (caselist - caselist.mean())/caselist.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fitting a model with `statsmodels` typically involves three steps:\n", "\n", "1. Use the model class to describe the model\n", "2. Fit the model\n", "3. Inspect the results\n", "\n", "Here we use the `OLS` class, for ordinary least squares. \n", "\n", "First we define and fit the model to our data; we use the output variable `volume` for this example:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "mod = sm.OLS(y['volume'], x) # define model\n", "res = mod.fit() # fit model to data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look at the results:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: volume R-squared: 0.973\n", "Model: OLS Adj. R-squared: 0.969\n", "Method: Least Squares F-statistic: 269.5\n", "Date: Thu, 26 Sep 2019 Prob (F-statistic): 4.81e-23\n", "Time: 16:53:39 Log-Likelihood: 13.618\n", "No. Observations: 34 AIC: -19.24\n", "Df Residuals: 30 BIC: -13.13\n", "Df Model: 4 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "length 0.4915 0.030 16.361 0.000 0.430 0.553\n", "width 0.7373 0.030 24.541 0.000 0.676 0.799\n", "height 0.4333 0.030 14.398 0.000 0.372 0.495\n", "density -1.11e-16 0.030 -3.69e-15 1.000 -0.061 0.061\n", "==============================================================================\n", "Omnibus: 6.096 Durbin-Watson: 1.613\n", "Prob(Omnibus): 0.047 Jarque-Bera (JB): 5.828\n", "Skew: 1.004 Prob(JB): 0.0543\n", "Kurtosis: 2.713 Cond. No. 1.06\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The middle section of this summary output shows the regression coefficients, their standard errors and some statistics (t values and p-values) to help us judge how important the parameters are for predicting the output quantity. Since we standardized both parameters and result quantities, the standard error is the same for all independent variables. 
It should come as no surprise that `density` does not contribute to the predicted `volume` (the coefficient is essentially zero)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }