Using the parsim API

parsim is intended to be used as a command-line tool. For some particular tasks, however, it can be practical to use the Python API in your own Python script, or interactivly. Post-processing and analysis of results is one such task. Here we show a few examples.

We assume that there is a parsim project in which we have have worked with the simplistic sample models discussed in the tutorial section. We make that project directory, here named demo, our current directory, before starting a Python interpreter.

$ cd demo

Before you continue, please have look at the tutorial, so that you are familiar with the simple model template box used there. We recall that the Python simulation script is called calc.py. For our examples, we will create and execute two small experimental designs (the output of the parsim command is not shown here). In both cases, the stochastic parameters are defined in a parameter file box_uniform.par:

length:     uniform(10, 4)     # 12 [m]
width:      uniform(3, 2)      # 4 [m]
height:     uniform(1.3, 0.4)  # 1.5 [m]
density:    uniform(950, 100)  # 1000 [kg/m3]

First, we create a two-level full factorial design called box_ff2n. With four varying parameters, this yields 16 cases.

$ psm doe -t box --name box_ff2n box_uniform.par ff2n beta=0.999

We execute the simulation script and collect results:

$ psm run box_ff2n calc.py
$ psm collect box_ff2n

Second, we try the Generalized Subset Design (GSD) available in the pyDOE2 package. This scheme makes it possible to create reduced designs with more than two levels. Here we try three levels for length and width, and two levels for the other parameters. The size of the design is reduced by a factor two, which gives us 18 cases.

$ psm doe -t box --name box_gsd box_uniform.par gsd beta=0.999 levels=3,3,2,2 reduction=2
$ psm run box_gsd calc.py
$ psm collect box_gsd

Now start a Python interpreter of your choice, e.g. ipython or a Jupyter Notebook (this API tutorial is itself created as a Jupyter Notebook).

[1]:
%cd demo
C:\Users\olawid\PycharmProjects\psm\doc\demo

Loading parsim Case and Study objects

In the previous section, we created parameters studies box_ff2n and box_gsd. The parsim APi can be used to load the corresponding parsim objects into a Python session.

We start by importing the parsim API:

[2]:
import parsim.core as ps

Remeber that we started the Python interpreter in the project directory, which is therefor our current directory. Let’s load the two studies from the example above. To do this we uspply the study name as an argument to the Study class constructor:

[3]:
s_ff2n = ps.Study('box_ff2n')
s_gsd = ps.Study('box_gsd')

We can also load individual cases with the Case class. To open an individual case, which is part of Study, we use the names of both Study and Case, separated by a colon:

[4]:
c_gsd_4 = ps.Case('box_gsd:4')

On the command-line, the psm info command can be used to output information about studies and cases. In a Python console, you can have the same information by printing the output of the object info method:

[5]:
print(s_ff2n.info())
Study name            : box_ff2n
Creation date         : 2019-08-29 16:55:58
Description           :
Project name          : myProject
Template path         : C:\Users\olawid\PycharmProjects\psm\doc\demo\modelTemplates\box
Parsim version        : 1.0.dev
Project path          : C:\Users\olawid\PycharmProjects\psm\doc\demo
Caselist/DOE params   : ['length', 'width', 'height', 'density']
DOE scheme            : ff2n
DOE arguments         : {'beta': 0.999}
--------------------------------------------------------
Variable parameter distributions (DOE)
--------------------------------------------------------
length                : uniform(10, 4)
width                 : uniform(3, 2)
height                : uniform(1.3, 0.4)
density               : uniform(950, 100)
--------------------------------------------------------
Default parameters (defined in template)
--------------------------------------------------------
output_file           : results.json
color                 : black
-------------------------------------------------------------------------
 Case#  Case_ID           Description
-------------------------------------------------------------------------
 1      1
 2      2
 3      3
 4      4
 5      5
 6      6
 7      7
 8      8
 9      9
 10     10
 11     11
 12     12
 13     13
 14     14
 15     15
 16     16
-------------------------------------------------------------------------

In a similar manner you can show the log for a case or study. To look at the logged history of the 4th GSD case, for example,

[6]:
print(c_gsd_4.log())
2019-08-29 16:56:49 - INFO: Parsim case "4" successfully created
2019-08-29 16:56:52 - INFO: Executing command/script/executable: calc.py
2019-08-29 16:56:52 - INFO: Executable finished successfully (runtime: 0:00:00.155078)
   Executable : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.py
   stdout     : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.out
   stderr     : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.err

Working with input parameters and results

In the following example, we will show how we can use the parsim API and the functionality of the pandas library. We will also show how we can make a simple linear regression model using the statsmodels library

We start by importing the popular packages pandas and statsmodels. We also import the parsim API from parsim.core.

[7]:
import pandas as pd
import statsmodels.api as sm

import parsim.core as ps

We load the full-factorial study, as before:

[8]:
s_ff2n = ps.Study('box_ff2n')

Parameters and results as pandas DataFrame and Series objects

Both Study and Case classes provide data in the form of pandas objects. For a study, the caselist attribute is a DataFrame with the values of all varying parameters. Similarily, all collected results of the study are aggregated in results:

[9]:
s_ff2n.caselist
[9]:
length width height density
CASENAME
1 13.998 4.999 1.6998 1049.95
2 10.002 4.999 1.6998 1049.95
3 13.998 3.001 1.6998 1049.95
4 10.002 3.001 1.6998 1049.95
5 13.998 4.999 1.3002 1049.95
6 10.002 4.999 1.3002 1049.95
7 13.998 3.001 1.3002 1049.95
8 10.002 3.001 1.3002 1049.95
9 13.998 4.999 1.6998 950.05
10 10.002 4.999 1.6998 950.05
11 13.998 3.001 1.6998 950.05
12 10.002 3.001 1.6998 950.05
13 13.998 4.999 1.3002 950.05
14 10.002 4.999 1.3002 950.05
15 13.998 3.001 1.3002 950.05
16 10.002 3.001 1.3002 950.05
[10]:
s_ff2n.results
[10]:
base_area volume mass
CASENAME
1 69.976002 118.945208 124886.521349
2 49.999998 84.989997 89235.246931
3 42.007998 71.405195 74971.884491
4 30.016002 51.021200 53569.709150
5 69.976002 90.982798 95527.388551
6 49.999998 65.009997 68257.246770
7 42.007998 54.618799 57347.008010
8 30.016002 39.026806 40976.194750
9 69.976002 118.945208 113003.895050
10 49.999998 84.989997 80744.746270
11 42.007998 71.405195 67838.505510
12 30.016002 51.021200 48472.691250
13 69.976002 90.982798 86438.207050
14 49.999998 65.009997 61762.748029
15 42.007998 54.618799 51890.589990
16 30.016002 39.026806 37077.416851

The results DataFrame may contain missing values, NaN, if the case simulation crashed or failed to produce a result.

The parameter attribute shows the values and sources of the parameters that all cases of the study have in common. In the present example, all constant parameters have their default values.

[11]:
s_ff2n.parameters
[11]:
value source
color black default
output_file results.json default

Case objects have the attributes parameters and results, but for obvious reasons not the caselist.

[12]:
c_ff2n_4 = ps.Case('box_ff2n:4')

c_ff2n_4.parameters
[12]:
value source
length 10.002 caselist
width 3.001 caselist
height 1.6998 caselist
density 1049.95 caselist
color black default
output_file results.json default

Note that it’s easy to see if a parameter is set by the caselist, on the user commandline, or if it’s a default value.

Case.parameters is a pandas DataFrame, while Case.results is pandas Series object,

[13]:
c_ff2n_4.results
[13]:
base_area       30.016002
volume          51.021200
mass         53569.709150
Name: results, dtype: float64

Merging datasets from several Studies

Sooner or later, you will be in a situation where you will want to make a joint analysis of the results from more than one Study. Maybe you started out with a small study, which only allows for a simple linear model, for example a factorial design in two levels, such as the box_ff2n study avobe. Then you realize that a one or two parameters should rather be varied on at least three levels, in order to capture also quadratic terms. For example the box_gsd study se saw earlier, where both length and width parameters were varied on three levels. From a statistical point of view, this is a rather poor example, but that’s another story… Let’s go ahead and load the GSD study as well, and merge the two for analysis…

[14]:
s_gsd = ps.Study('box_gsd')

The results and the caselist from both studies can now be concatenated, but we need to decide what to do with the fact that the DataFrames from both studies have the same index entries. We could either ignore the original indices altogether, creating a new index in the process, or try to keep track of the identities of the cases also in the concatenated datasets. Here we chose the latter option, by creating a hierachical index indicating the names of the original studies. (The option sort=False is used to avoid sorting the columns in lexical order.)

[15]:
results = pd.concat([s_ff2n.results, s_gsd.results], keys=['ff2n', 'gsd'], sort=False)
caselist = pd.concat([s_ff2n.caselist, s_gsd.caselist], keys=['ff2n', 'gsd'], sort=False)
[16]:
caselist
[16]:
length width height density
CASENAME
ff2n 1 13.998 4.999 1.6998 1049.95
2 10.002 4.999 1.6998 1049.95
3 13.998 3.001 1.6998 1049.95
4 10.002 3.001 1.6998 1049.95
5 13.998 4.999 1.3002 1049.95
6 10.002 4.999 1.3002 1049.95
7 13.998 3.001 1.3002 1049.95
8 10.002 3.001 1.3002 1049.95
9 13.998 4.999 1.6998 950.05
10 10.002 4.999 1.6998 950.05
11 13.998 3.001 1.6998 950.05
12 10.002 3.001 1.6998 950.05
13 13.998 4.999 1.3002 950.05
14 10.002 4.999 1.3002 950.05
15 13.998 3.001 1.3002 950.05
16 10.002 3.001 1.3002 950.05
gsd 1 13.998 4.999 1.6998 1049.95
2 13.998 3.001 1.6998 1049.95
3 10.002 4.999 1.6998 1049.95
4 10.002 3.001 1.6998 1049.95
5 13.998 4.999 1.3002 950.05
6 13.998 3.001 1.3002 950.05
7 10.002 4.999 1.3002 950.05
8 10.002 3.001 1.3002 950.05
9 13.998 4.000 1.6998 950.05
10 10.002 4.000 1.6998 950.05
11 13.998 4.000 1.3002 1049.95
12 10.002 4.000 1.3002 1049.95
13 12.000 4.999 1.6998 950.05
14 12.000 3.001 1.6998 950.05
15 12.000 4.999 1.3002 1049.95
16 12.000 3.001 1.3002 1049.95
17 12.000 4.000 1.6998 1049.95
18 12.000 4.000 1.3002 950.05

Linear regression analysis

We will use the statsmodels package to fit the combined data to a simple linear model.

For numerical robustness, you are strongly adviced to normalize or standardize your data! We name our standardized datasets y and x, for dependent and independent variables, respectively.

[17]:
y = (results - results.mean())/results.std()
x = (caselist - caselist.mean())/caselist.std()

Fitting a model statsmodelstypically involves three steps:

  1. Use the model class to describe the model,

  2. Fit the model

  3. Inspect the results

Here we use the OLS class, for ordinary least squares.

First we define and fit the model to our data; we use the output variable volume for this example:

[20]:
mod = sm.OLS(y['volume'], x)  # define model
res = mod.fit()               # fit model to data

Now let’s look at the results:

[21]:
res.summary()
[21]:
OLS Regression Results
Dep. Variable: volume R-squared: 0.973
Model: OLS Adj. R-squared: 0.969
Method: Least Squares F-statistic: 269.5
Date: Thu, 26 Sep 2019 Prob (F-statistic): 4.81e-23
Time: 16:53:39 Log-Likelihood: 13.618
No. Observations: 34 AIC: -19.24
Df Residuals: 30 BIC: -13.13
Df Model: 4
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
length 0.4915 0.030 16.361 0.000 0.430 0.553
width 0.7373 0.030 24.541 0.000 0.676 0.799
height 0.4333 0.030 14.398 0.000 0.372 0.495
density -1.11e-16 0.030 -3.69e-15 1.000 -0.061 0.061
Omnibus: 6.096 Durbin-Watson: 1.613
Prob(Omnibus): 0.047 Jarque-Bera (JB): 5.828
Skew: 1.004 Prob(JB): 0.0543
Kurtosis: 2.713 Cond. No. 1.06


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The middle section of this summary output shows the regression coefficients, their standrad error and some statistics (t and F values) to help us judge how important the parameters are for predicting the output quantity. Since we standardized both parameters and result quantities, the standard error is the same for all independent variables. It should come as no surprise that density does not contribute to the predicted volume (the coefficient is extremly low).