Using the parsim API¶
parsim is intended to be used as a command-line tool. For some particular tasks, however, it can be practical to use the Python API in your own Python script, or interactively. Post-processing and analysis of results is one such task. Here we show a few examples.
We assume that there is a parsim project in which we have worked with the simplistic sample models discussed in the tutorial section. We make that project directory, here named demo, our current directory before starting a Python interpreter.
$ cd demo
Before you continue, please have a look at the tutorial, so that you are familiar with the simple model template box used there. We recall that the Python simulation script is called calc.py. For our examples, we will create and execute two small experimental designs (the output of the parsim command is not shown here). In both cases, the stochastic parameters are defined in a parameter file box_uniform.par:
length: uniform(10, 4) # 12 [m]
width: uniform(3, 2) # 4 [m]
height: uniform(1.3, 0.4) # 1.5 [m]
density: uniform(950, 100) # 1000 [kg/m3]
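The relation between the parameter file and the design values can be illustrated with a short sketch. It assumes that uniform(a, b) spans the interval [a, a+b] and that beta=0.999 places the two design levels symmetrically near the distribution tails; this assumption matches the caselist values shown further down (e.g. 10.002 and 13.998 for length):

```python
# Sketch (assumption): uniform(a, b) spans [a, a+b]; beta=0.999 puts the
# two factorial levels at the corresponding quantiles near the tails.
beta = 0.999
for name, (a, b) in {'length': (10, 4), 'width': (3, 2),
                     'height': (1.3, 0.4), 'density': (950, 100)}.items():
    low = a + b * (1 - beta) / 2
    high = a + b * (1 + beta) / 2
    print(f'{name}: {low:g} .. {high:g}')   # e.g. length: 10.002 .. 13.998
```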
First, we create a two-level full factorial design called box_ff2n. With four varying parameters, this yields 16 cases.
$ psm doe -t box --name box_ff2n box_uniform.par ff2n beta=0.999
We execute the simulation script and collect results:
$ psm run box_ff2n calc.py
$ psm collect box_ff2n
Second, we try the Generalized Subset Design (GSD) available in the pyDOE2 package. This scheme makes it possible to create reduced designs with more than two levels. Here we use three levels for length and width, and two levels for the other parameters. The size of the design is reduced by a factor of two, which gives us 18 cases.
$ psm doe -t box --name box_gsd box_uniform.par gsd beta=0.999 levels=3,3,2,2 reduction=2
$ psm run box_gsd calc.py
$ psm collect box_gsd
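The case count above follows directly from the level specification:

```python
# Full multi-level grid: 3*3*2*2 combinations; reduction=2 keeps half of them.
levels = [3, 3, 2, 2]
full = 1
for n in levels:
    full *= n       # 36 combinations in the full grid
print(full // 2)    # 18 cases with reduction=2
```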
Now start a Python interpreter of your choice, e.g. ipython or a Jupyter Notebook (this API tutorial is itself created as a Jupyter Notebook).
[1]:
%cd demo
C:\Users\olawid\PycharmProjects\psm\doc\demo
Loading parsim Case and Study objects¶
In the previous section, we created the parameter studies box_ff2n and box_gsd. The parsim API can be used to load the corresponding parsim objects into a Python session.
We start by importing the parsim API:
[2]:
import parsim.core as ps
Remember that we started the Python interpreter in the project directory, which is therefore our current directory. Let's load the two studies from the example above. To do this, we supply the study name as an argument to the Study class constructor:
[3]:
s_ff2n = ps.Study('box_ff2n')
s_gsd = ps.Study('box_gsd')
We can also load individual cases with the Case class. To open an individual case which is part of a Study, we use the names of both the Study and the Case, separated by a colon:
[4]:
c_gsd_4 = ps.Case('box_gsd:4')
On the command line, the psm info command can be used to output information about studies and cases. In a Python console, you can get the same information by printing the output of the object's info method:
[5]:
print(s_ff2n.info())
Study name : box_ff2n
Creation date : 2019-08-29 16:55:58
Description :
Project name : myProject
Template path : C:\Users\olawid\PycharmProjects\psm\doc\demo\modelTemplates\box
Parsim version : 1.0.dev
Project path : C:\Users\olawid\PycharmProjects\psm\doc\demo
Caselist/DOE params : ['length', 'width', 'height', 'density']
DOE scheme : ff2n
DOE arguments : {'beta': 0.999}
--------------------------------------------------------
Variable parameter distributions (DOE)
--------------------------------------------------------
length : uniform(10, 4)
width : uniform(3, 2)
height : uniform(1.3, 0.4)
density : uniform(950, 100)
--------------------------------------------------------
Default parameters (defined in template)
--------------------------------------------------------
output_file : results.json
color : black
-------------------------------------------------------------------------
Case# Case_ID Description
-------------------------------------------------------------------------
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
-------------------------------------------------------------------------
In a similar manner you can show the log for a case or study. To look at the logged history of the 4th GSD case, for example:
[6]:
print(c_gsd_4.log())
2019-08-29 16:56:49 - INFO: Parsim case "4" successfully created
2019-08-29 16:56:52 - INFO: Executing command/script/executable: calc.py
2019-08-29 16:56:52 - INFO: Executable finished successfully (runtime: 0:00:00.155078)
Executable : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.py
stdout : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.out
stderr : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.err
Working with input parameters and results¶
In the following example, we will show how the parsim API can be used together with the functionality of the pandas library. We will also show how to make a simple linear regression model using the statsmodels library.

We start by importing the popular packages pandas and statsmodels. We also import the parsim API from parsim.core.
[7]:
import pandas as pd
import statsmodels.api as sm
import parsim.core as ps
We load the full-factorial study, as before:
[8]:
s_ff2n = ps.Study('box_ff2n')
Parameters and results as pandas DataFrame and Series objects¶
Both the Study and Case classes provide data in the form of pandas objects. For a study, the caselist attribute is a DataFrame with the values of all varying parameters. Similarly, all collected results of the study are aggregated in the results attribute:
[9]:
s_ff2n.caselist
[9]:
CASENAME | length | width | height | density
---|---|---|---|---
1 | 13.998 | 4.999 | 1.6998 | 1049.95
2 | 10.002 | 4.999 | 1.6998 | 1049.95
3 | 13.998 | 3.001 | 1.6998 | 1049.95
4 | 10.002 | 3.001 | 1.6998 | 1049.95
5 | 13.998 | 4.999 | 1.3002 | 1049.95
6 | 10.002 | 4.999 | 1.3002 | 1049.95
7 | 13.998 | 3.001 | 1.3002 | 1049.95
8 | 10.002 | 3.001 | 1.3002 | 1049.95
9 | 13.998 | 4.999 | 1.6998 | 950.05
10 | 10.002 | 4.999 | 1.6998 | 950.05
11 | 13.998 | 3.001 | 1.6998 | 950.05
12 | 10.002 | 3.001 | 1.6998 | 950.05
13 | 13.998 | 4.999 | 1.3002 | 950.05
14 | 10.002 | 4.999 | 1.3002 | 950.05
15 | 13.998 | 3.001 | 1.3002 | 950.05
16 | 10.002 | 3.001 | 1.3002 | 950.05
[10]:
s_ff2n.results
[10]:
CASENAME | base_area | volume | mass
---|---|---|---
1 | 69.976002 | 118.945208 | 124886.521349
2 | 49.999998 | 84.989997 | 89235.246931
3 | 42.007998 | 71.405195 | 74971.884491
4 | 30.016002 | 51.021200 | 53569.709150
5 | 69.976002 | 90.982798 | 95527.388551
6 | 49.999998 | 65.009997 | 68257.246770
7 | 42.007998 | 54.618799 | 57347.008010
8 | 30.016002 | 39.026806 | 40976.194750
9 | 69.976002 | 118.945208 | 113003.895050
10 | 49.999998 | 84.989997 | 80744.746270
11 | 42.007998 | 71.405195 | 67838.505510
12 | 30.016002 | 51.021200 | 48472.691250
13 | 69.976002 | 90.982798 | 86438.207050
14 | 49.999998 | 65.009997 | 61762.748029
15 | 42.007998 | 54.618799 | 51890.589990
16 | 30.016002 | 39.026806 | 37077.416851
The results DataFrame may contain missing values (NaN) if a case simulation crashed or failed to produce a result.
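Rows with missing values can simply be dropped before further analysis; a minimal sketch with a hypothetical results table in which one case failed:

```python
import numpy as np
import pandas as pd

# Hypothetical results table: case 2 crashed and produced no output.
results = pd.DataFrame({'volume': [118.9, np.nan, 71.4],
                        'mass': [124886.5, np.nan, 74971.9]},
                       index=[1, 2, 3])
clean = results.dropna()        # keep only fully successful cases
print(clean.index.tolist())     # [1, 3]
```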
The parameters attribute shows the values and sources of the parameters that all cases of the study have in common. In the present example, all constant parameters have their default values.
[11]:
s_ff2n.parameters
[11]:
 | value | source
---|---|---
color | black | default
output_file | results.json | default
Case objects have the attributes parameters and results, but for obvious reasons not the caselist.
[12]:
c_ff2n_4 = ps.Case('box_ff2n:4')
c_ff2n_4.parameters
[12]:
 | value | source
---|---|---
length | 10.002 | caselist
width | 3.001 | caselist
height | 1.6998 | caselist
density | 1049.95 | caselist
color | black | default
output_file | results.json | default
Note that it's easy to see whether a parameter is set by the caselist, on the user command line, or left at its default value.
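Since the source information is an ordinary column, the caselist-driven parameters of a case can be picked out with a plain boolean filter; a sketch using a small hand-made stand-in for the parameters table:

```python
import pandas as pd

# Hand-made stand-in for a parameters table, with 'value' and 'source' columns:
parameters = pd.DataFrame(
    {'value': [10.002, 3.001, 'black'],
     'source': ['caselist', 'caselist', 'default']},
    index=['length', 'width', 'color'])
from_caselist = parameters[parameters['source'] == 'caselist']
print(from_caselist.index.tolist())  # ['length', 'width']
```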
Case.parameters is a pandas DataFrame, while Case.results is a pandas Series object:
[13]:
c_ff2n_4.results
[13]:
base_area 30.016002
volume 51.021200
mass 53569.709150
Name: results, dtype: float64
Merging datasets from several Studies¶
Sooner or later, you will be in a situation where you want to make a joint analysis of the results from more than one Study. Maybe you started out with a small study which only allows for a simple linear model, for example a factorial design in two levels, such as the box_ff2n study above. Then you realize that one or two parameters should rather be varied on at least three levels, in order to also capture quadratic terms. For example, the box_gsd study we saw earlier, where both the length and width parameters were varied on three levels. From a statistical point of view, this is a rather poor example, but that's another story. Let's go ahead and load the GSD study as well, and merge the two for analysis.
[14]:
s_gsd = ps.Study('box_gsd')
The results and the caselist from both studies can now be concatenated, but we need to decide what to do with the fact that the DataFrames from both studies have the same index entries. We could either ignore the original indices altogether, creating a new index in the process, or keep track of the identity of the cases also in the concatenated datasets. Here we choose the latter option, creating a hierarchical index indicating the names of the original studies. (The option sort=False is used to avoid sorting the columns in lexical order.)
[15]:
results = pd.concat([s_ff2n.results, s_gsd.results], keys=['ff2n', 'gsd'], sort=False)
caselist = pd.concat([s_ff2n.caselist, s_gsd.caselist], keys=['ff2n', 'gsd'], sort=False)
[16]:
caselist
[16]:
STUDY | CASENAME | length | width | height | density
---|---|---|---|---|---
ff2n | 1 | 13.998 | 4.999 | 1.6998 | 1049.95
 | 2 | 10.002 | 4.999 | 1.6998 | 1049.95
 | 3 | 13.998 | 3.001 | 1.6998 | 1049.95
 | 4 | 10.002 | 3.001 | 1.6998 | 1049.95
 | 5 | 13.998 | 4.999 | 1.3002 | 1049.95
 | 6 | 10.002 | 4.999 | 1.3002 | 1049.95
 | 7 | 13.998 | 3.001 | 1.3002 | 1049.95
 | 8 | 10.002 | 3.001 | 1.3002 | 1049.95
 | 9 | 13.998 | 4.999 | 1.6998 | 950.05
 | 10 | 10.002 | 4.999 | 1.6998 | 950.05
 | 11 | 13.998 | 3.001 | 1.6998 | 950.05
 | 12 | 10.002 | 3.001 | 1.6998 | 950.05
 | 13 | 13.998 | 4.999 | 1.3002 | 950.05
 | 14 | 10.002 | 4.999 | 1.3002 | 950.05
 | 15 | 13.998 | 3.001 | 1.3002 | 950.05
 | 16 | 10.002 | 3.001 | 1.3002 | 950.05
gsd | 1 | 13.998 | 4.999 | 1.6998 | 1049.95
 | 2 | 13.998 | 3.001 | 1.6998 | 1049.95
 | 3 | 10.002 | 4.999 | 1.6998 | 1049.95
 | 4 | 10.002 | 3.001 | 1.6998 | 1049.95
 | 5 | 13.998 | 4.999 | 1.3002 | 950.05
 | 6 | 13.998 | 3.001 | 1.3002 | 950.05
 | 7 | 10.002 | 4.999 | 1.3002 | 950.05
 | 8 | 10.002 | 3.001 | 1.3002 | 950.05
 | 9 | 13.998 | 4.000 | 1.6998 | 950.05
 | 10 | 10.002 | 4.000 | 1.6998 | 950.05
 | 11 | 13.998 | 4.000 | 1.3002 | 1049.95
 | 12 | 10.002 | 4.000 | 1.3002 | 1049.95
 | 13 | 12.000 | 4.999 | 1.6998 | 950.05
 | 14 | 12.000 | 3.001 | 1.6998 | 950.05
 | 15 | 12.000 | 4.999 | 1.3002 | 1049.95
 | 16 | 12.000 | 3.001 | 1.3002 | 1049.95
 | 17 | 12.000 | 4.000 | 1.6998 | 1049.95
 | 18 | 12.000 | 4.000 | 1.3002 | 950.05
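The hierarchical index also makes it easy to get back at the individual studies afterwards; a sketch using a small stand-in for the concatenated caselist:

```python
import pandas as pd

# Two small stand-in caselists, concatenated with study-name keys as above:
ff2n = pd.DataFrame({'length': [13.998, 10.002]}, index=[1, 2])
gsd = pd.DataFrame({'length': [12.000]}, index=[1])
caselist = pd.concat([ff2n, gsd], keys=['ff2n', 'gsd'], sort=False)
print(caselist.loc['gsd'].index.tolist())   # rows of the gsd study only: [1]
print(caselist.loc[('ff2n', 2), 'length'])  # a single case value: 10.002
```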
Linear regression analysis¶
We will use the statsmodels package to fit the combined data to a simple linear model.
For numerical robustness, you are strongly advised to normalize or standardize your data! We name our standardized datasets y and x, for dependent and independent variables, respectively.
[17]:
y = (results - results.mean())/results.std()
x = (caselist - caselist.mean())/caselist.std()
Fitting a model with statsmodels typically involves three steps:

1. Use the model class to describe the model
2. Fit the model
3. Inspect the results

Here we use the OLS class, for ordinary least squares.
First we define and fit the model to our data; we use the output variable volume for this example:
[20]:
mod = sm.OLS(y['volume'], x) # define model
res = mod.fit() # fit model to data
Now let’s look at the results:
[21]:
res.summary()
[21]:
Dep. Variable: | volume | R-squared: | 0.973
---|---|---|---
Model: | OLS | Adj. R-squared: | 0.969
Method: | Least Squares | F-statistic: | 269.5
Date: | Thu, 26 Sep 2019 | Prob (F-statistic): | 4.81e-23
Time: | 16:53:39 | Log-Likelihood: | 13.618
No. Observations: | 34 | AIC: | -19.24
Df Residuals: | 30 | BIC: | -13.13
Df Model: | 4 | |
Covariance Type: | nonrobust | |

 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
length | 0.4915 | 0.030 | 16.361 | 0.000 | 0.430 | 0.553
width | 0.7373 | 0.030 | 24.541 | 0.000 | 0.676 | 0.799
height | 0.4333 | 0.030 | 14.398 | 0.000 | 0.372 | 0.495
density | -1.11e-16 | 0.030 | -3.69e-15 | 1.000 | -0.061 | 0.061

Omnibus: | 6.096 | Durbin-Watson: | 1.613
---|---|---|---
Prob(Omnibus): | 0.047 | Jarque-Bera (JB): | 5.828
Skew: | 1.004 | Prob(JB): | 0.0543
Kurtosis: | 2.713 | Cond. No. | 1.06
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The middle section of this summary output shows the regression coefficients, their standard errors and some statistics (t-statistics and p-values) to help us judge how important the parameters are for predicting the output quantity. Since we standardized both parameters and result quantities, the standard error is the same for all independent variables. It should come as no surprise that density does not contribute to the predicted volume (its coefficient is extremely small).