Using the parsim API¶
parsim is intended to be used as a command-line tool. For some particular tasks, however, it can be practical to use the Python API in your own Python script, or interactively. Post-processing and analysis of results is one such task. Here we show a few examples.
We assume that there is a parsim project in which we have worked with the simplistic sample models discussed in the tutorial section. We make that project directory, here named demo, our current directory before starting a Python interpreter.
$ cd demo
Before you continue, please have a look at the tutorial, so that you are familiar with the simple model template box used there. We recall that the Python simulation script is called calc.py. For our examples, we will create and execute two small experimental designs (the output of the parsim command is not shown here). In both cases, the stochastic parameters are defined in a parameter file box_uniform.par:
length: uniform(10, 4) # 12 [m]
width: uniform(3, 2) # 4 [m]
height: uniform(1.3, 0.4) # 1.5 [m]
density: uniform(950, 100) # 1000 [kg/m3]
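The relation between the parameter file and the design values can be illustrated with a short sketch. It assumes that uniform(a, b) spans the interval [a, a+b] and that beta=0.999 places the two design levels symmetrically near the distribution tails; this assumption matches the caselist values shown further down (e.g. 10.002 and 13.998 for length):

```python
# Sketch (assumption): uniform(a, b) spans [a, a+b]; beta=0.999 puts the
# two factorial levels at the corresponding quantiles near the tails.
beta = 0.999
for name, (a, b) in {'length': (10, 4), 'width': (3, 2),
                     'height': (1.3, 0.4), 'density': (950, 100)}.items():
    low = a + b * (1 - beta) / 2
    high = a + b * (1 + beta) / 2
    print(f'{name}: {low:g} .. {high:g}')   # e.g. length: 10.002 .. 13.998
```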
First, we create a two-level full factorial design called box_ff2n. With four varying parameters, this yields 16 cases.
$ psm doe -t box --name box_ff2n box_uniform.par ff2n beta=0.999
We execute the simulation script and collect results:
$ psm run box_ff2n calc.py
$ psm collect box_ff2n
Second, we try the Generalized Subset Design (GSD) available in the pyDOE2 package. This scheme makes it possible to create reduced designs with more than two levels. Here we use three levels for length and width, and two levels for the other parameters. The size of the design is reduced by a factor of two, which gives us 18 cases.
$ psm doe -t box --name box_gsd box_uniform.par gsd beta=0.999 levels=3,3,2,2 reduction=2
$ psm run box_gsd calc.py
$ psm collect box_gsd
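The case count above follows directly from the level specification:

```python
# Full multi-level grid: 3*3*2*2 combinations; reduction=2 keeps half of them.
levels = [3, 3, 2, 2]
full = 1
for n in levels:
    full *= n       # 36 combinations in the full grid
print(full // 2)    # 18 cases with reduction=2
```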
Now start a Python interpreter of your choice, e.g. ipython or a Jupyter Notebook (this API tutorial is itself created as a Jupyter Notebook).
[1]:
%cd demo
C:\Users\olawid\PycharmProjects\psm\doc\demo
Loading parsim Case and Study objects¶
In the previous section, we created the parameter studies box_ff2n and box_gsd. The parsim API can be used to load the corresponding parsim objects into a Python session.
We start by importing the parsim API:
[2]:
import parsim.core as ps
Remember that we started the Python interpreter in the project directory, which is therefore our current directory. Let's load the two studies from the example above. To do this, we supply the study name as an argument to the Study class constructor:
[3]:
s_ff2n = ps.Study('box_ff2n')
s_gsd = ps.Study('box_gsd')
We can also load individual cases with the Case class. To open an individual case which is part of a Study, we use the names of both the Study and the Case, separated by a colon:
[4]:
c_gsd_4 = ps.Case('box_gsd:4')
On the command line, the psm info command can be used to output information about studies and cases. In a Python console, you can get the same information by printing the output of the object's info method:
[5]:
print(s_ff2n.info())
Study name : box_ff2n
Creation date : 2019-08-29 16:55:58
Description :
Project name : myProject
Template path : C:\Users\olawid\PycharmProjects\psm\doc\demo\modelTemplates\box
Parsim version : 1.0.dev
Project path : C:\Users\olawid\PycharmProjects\psm\doc\demo
Caselist/DOE params : ['length', 'width', 'height', 'density']
DOE scheme : ff2n
DOE arguments : {'beta': 0.999}
--------------------------------------------------------
Variable parameter distributions (DOE)
--------------------------------------------------------
length : uniform(10, 4)
width : uniform(3, 2)
height : uniform(1.3, 0.4)
density : uniform(950, 100)
--------------------------------------------------------
Default parameters (defined in template)
--------------------------------------------------------
output_file : results.json
color : black
-------------------------------------------------------------------------
Case# Case_ID Description
-------------------------------------------------------------------------
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
-------------------------------------------------------------------------
In a similar manner you can show the log for a case or study. To look at the logged history of the 4th GSD case, for example:
[6]:
print(c_gsd_4.log())
2019-08-29 16:56:49 - INFO: Parsim case "4" successfully created
2019-08-29 16:56:52 - INFO: Executing command/script/executable: calc.py
2019-08-29 16:56:52 - INFO: Executable finished successfully (runtime: 0:00:00.155078)
Executable : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.py
stdout : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.out
stderr : C:\Users\olawid\PycharmProjects\psm\doc\demo\study_box_gsd\case_4\calc.err
Working with input parameters and results¶
In the following example, we will show how the parsim API can be used together with the functionality of the pandas library. We will also show how to make a simple linear regression model using the statsmodels library.

We start by importing the popular packages pandas and statsmodels. We also import the parsim API from parsim.core.
[7]:
import pandas as pd
import statsmodels.api as sm
import parsim.core as ps
We load the full-factorial study, as before:
[8]:
s_ff2n = ps.Study('box_ff2n')
Parameters and results as pandas DataFrame and Series objects¶
Both the Study and Case classes provide data in the form of pandas objects. For a study, the caselist attribute is a DataFrame with the values of all varying parameters. Similarly, all collected results of the study are aggregated in the results attribute:
[9]:
s_ff2n.caselist
[9]:
CASENAME | length | width | height | density
---|---|---|---|---
1 | 13.998 | 4.999 | 1.6998 | 1049.95
2 | 10.002 | 4.999 | 1.6998 | 1049.95
3 | 13.998 | 3.001 | 1.6998 | 1049.95
4 | 10.002 | 3.001 | 1.6998 | 1049.95
5 | 13.998 | 4.999 | 1.3002 | 1049.95
6 | 10.002 | 4.999 | 1.3002 | 1049.95
7 | 13.998 | 3.001 | 1.3002 | 1049.95
8 | 10.002 | 3.001 | 1.3002 | 1049.95
9 | 13.998 | 4.999 | 1.6998 | 950.05
10 | 10.002 | 4.999 | 1.6998 | 950.05
11 | 13.998 | 3.001 | 1.6998 | 950.05
12 | 10.002 | 3.001 | 1.6998 | 950.05
13 | 13.998 | 4.999 | 1.3002 | 950.05
14 | 10.002 | 4.999 | 1.3002 | 950.05
15 | 13.998 | 3.001 | 1.3002 | 950.05
16 | 10.002 | 3.001 | 1.3002 | 950.05
[10]:
s_ff2n.results
[10]:
CASENAME | base_area | volume | mass
---|---|---|---
1 | 69.976002 | 118.945208 | 124886.521349
2 | 49.999998 | 84.989997 | 89235.246931
3 | 42.007998 | 71.405195 | 74971.884491
4 | 30.016002 | 51.021200 | 53569.709150
5 | 69.976002 | 90.982798 | 95527.388551
6 | 49.999998 | 65.009997 | 68257.246770
7 | 42.007998 | 54.618799 | 57347.008010
8 | 30.016002 | 39.026806 | 40976.194750
9 | 69.976002 | 118.945208 | 113003.895050
10 | 49.999998 | 84.989997 | 80744.746270
11 | 42.007998 | 71.405195 | 67838.505510
12 | 30.016002 | 51.021200 | 48472.691250
13 | 69.976002 | 90.982798 | 86438.207050
14 | 49.999998 | 65.009997 | 61762.748029
15 | 42.007998 | 54.618799 | 51890.589990
16 | 30.016002 | 39.026806 | 37077.416851
The results DataFrame may contain missing values (NaN) if a case simulation crashed or failed to produce a result.
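Rows with missing values can simply be dropped before further analysis; a minimal sketch with a hypothetical results table in which one case failed:

```python
import numpy as np
import pandas as pd

# Hypothetical results table: case 2 crashed and produced no output.
results = pd.DataFrame({'volume': [118.9, np.nan, 71.4],
                        'mass': [124886.5, np.nan, 74971.9]},
                       index=[1, 2, 3])
clean = results.dropna()        # keep only fully successful cases
print(clean.index.tolist())     # [1, 3]
```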
The parameters attribute shows the values and sources of the parameters that all cases of the study have in common. In the present example, all constant parameters have their default values.
[11]:
s_ff2n.parameters
[11]:
 | value | source
---|---|---
color | black | default
output_file | results.json | default
Case objects have the attributes parameters and results, but for obvious reasons not the caselist.
[12]:
c_ff2n_4 = ps.Case('box_ff2n:4')
c_ff2n_4.parameters
[12]:
 | value | source
---|---|---
length | 10.002 | caselist
width | 3.001 | caselist
height | 1.6998 | caselist
density | 1049.95 | caselist
color | black | default
output_file | results.json | default
Note that it's easy to see whether a parameter is set by the caselist, on the user command line, or left at its default value.
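Since the source information is an ordinary column, the caselist-driven parameters of a case can be picked out with a plain boolean filter; a sketch using a small hand-made stand-in for the parameters table:

```python
import pandas as pd

# Hand-made stand-in for a parameters table, with 'value' and 'source' columns:
parameters = pd.DataFrame(
    {'value': [10.002, 3.001, 'black'],
     'source': ['caselist', 'caselist', 'default']},
    index=['length', 'width', 'color'])
from_caselist = parameters[parameters['source'] == 'caselist']
print(from_caselist.index.tolist())  # ['length', 'width']
```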
Case.parameters is a pandas DataFrame, while Case.results is a pandas Series object:
[13]:
c_ff2n_4.results
[13]:
base_area 30.016002
volume 51.021200
mass 53569.709150
Name: results, dtype: float64
Merging datasets from several Studies¶
Sooner or later, you will be in a situation where you want to make a joint analysis of the results from more than one Study. Maybe you started out with a small study which only allows for a simple linear model, for example a factorial design in two levels, such as the box_ff2n study above. Then you realize that one or two parameters should rather be varied on at least three levels, in order to also capture quadratic terms. For example, the box_gsd study we saw earlier, where both the length and width parameters were varied on three levels. From a statistical point of view, this is a rather poor example, but that's another story. Let's go ahead and load the GSD study as well, and merge the two for analysis.
[14]:
s_gsd = ps.Study('box_gsd')
The results and the caselist from both studies can now be concatenated, but we need to decide what to do with the fact that the DataFrames from both studies have the same index entries. We could either ignore the original indices altogether, creating a new index in the process, or keep track of the identity of the cases also in the concatenated datasets. Here we choose the latter option, creating a hierarchical index indicating the names of the original studies. (The option sort=False is used to avoid sorting the columns in lexical order.)
[15]:
results = pd.concat([s_ff2n.results, s_gsd.results], keys=['ff2n', 'gsd'], sort=False)
caselist = pd.concat([s_ff2n.caselist, s_gsd.caselist], keys=['ff2n', 'gsd'], sort=False)
[16]:
caselist
[16]:
STUDY | CASENAME | length | width | height | density
---|---|---|---|---|---
ff2n | 1 | 13.998 | 4.999 | 1.6998 | 1049.95
 | 2 | 10.002 | 4.999 | 1.6998 | 1049.95
 | 3 | 13.998 | 3.001 | 1.6998 | 1049.95
 | 4 | 10.002 | 3.001 | 1.6998 | 1049.95
 | 5 | 13.998 | 4.999 | 1.3002 | 1049.95
 | 6 | 10.002 | 4.999 | 1.3002 | 1049.95
 | 7 | 13.998 | 3.001 | 1.3002 | 1049.95
 | 8 | 10.002 | 3.001 | 1.3002 | 1049.95
 | 9 | 13.998 | 4.999 | 1.6998 | 950.05
 | 10 | 10.002 | 4.999 | 1.6998 | 950.05
 | 11 | 13.998 | 3.001 | 1.6998 | 950.05
 | 12 | 10.002 | 3.001 | 1.6998 | 950.05
 | 13 | 13.998 | 4.999 | 1.3002 | 950.05
 | 14 | 10.002 | 4.999 | 1.3002 | 950.05
 | 15 | 13.998 | 3.001 | 1.3002 | 950.05
 | 16 | 10.002 | 3.001 | 1.3002 | 950.05
gsd | 1 | 13.998 | 4.999 | 1.6998 | 1049.95
 | 2 | 13.998 | 3.001 | 1.6998 | 1049.95
 | 3 | 10.002 | 4.999 | 1.6998 | 1049.95
 | 4 | 10.002 | 3.001 | 1.6998 | 1049.95
 | 5 | 13.998 | 4.999 | 1.3002 | 950.05
 | 6 | 13.998 | 3.001 | 1.3002 | 950.05
 | 7 | 10.002 | 4.999 | 1.3002 | 950.05
 | 8 | 10.002 | 3.001 | 1.3002 | 950.05
 | 9 | 13.998 | 4.000 | 1.6998 | 950.05
 | 10 | 10.002 | 4.000 | 1.6998 | 950.05
 | 11 | 13.998 | 4.000 | 1.3002 | 1049.95
 | 12 | 10.002 | 4.000 | 1.3002 | 1049.95
 | 13 | 12.000 | 4.999 | 1.6998 | 950.05
 | 14 | 12.000 | 3.001 | 1.6998 | 950.05
 | 15 | 12.000 | 4.999 | 1.3002 | 1049.95
 | 16 | 12.000 | 3.001 | 1.3002 | 1049.95
 | 17 | 12.000 | 4.000 | 1.6998 | 1049.95
 | 18 | 12.000 | 4.000 | 1.3002 | 950.05
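The hierarchical index also makes it easy to get back at the individual studies afterwards; a sketch using a small stand-in for the concatenated caselist:

```python
import pandas as pd

# Two small stand-in caselists, concatenated with study-name keys as above:
ff2n = pd.DataFrame({'length': [13.998, 10.002]}, index=[1, 2])
gsd = pd.DataFrame({'length': [12.000]}, index=[1])
caselist = pd.concat([ff2n, gsd], keys=['ff2n', 'gsd'], sort=False)
print(caselist.loc['gsd'].index.tolist())   # rows of the gsd study only: [1]
print(caselist.loc[('ff2n', 2), 'length'])  # a single case value: 10.002
```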
Linear regression analysis¶
We will use the statsmodels package to fit the combined data to a simple linear model.
For numerical robustness, you are strongly advised to normalize or standardize your data! We name our standardized datasets y and x, for dependent and independent variables, respectively.
[17]:
y = (results - results.mean())/results.std()
x = (caselist - caselist.mean())/caselist.std()
Fitting a model with statsmodels typically involves three steps:

1. Use the model class to describe the model
2. Fit the model
3. Inspect the results

Here we use the OLS class, for ordinary least squares.
First we define and fit the model to our data; we use the output variable volume for this example:
[20]:
mod = sm.OLS(y['volume'], x) # define model
res = mod.fit() # fit model to data
Now let’s look at the results:
[21]:
res.summary()
[21]:
Dep. Variable: | volume | R-squared: | 0.973
---|---|---|---
Model: | OLS | Adj. R-squared: | 0.969
Method: | Least Squares | F-statistic: | 269.5
Date: | Thu, 26 Sep 2019 | Prob (F-statistic): | 4.81e-23
Time: | 16:53:39 | Log-Likelihood: | 13.618
No. Observations: | 34 | AIC: | -19.24
Df Residuals: | 30 | BIC: | -13.13
Df Model: | 4 | |
Covariance Type: | nonrobust | |

 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
length | 0.4915 | 0.030 | 16.361 | 0.000 | 0.430 | 0.553
width | 0.7373 | 0.030 | 24.541 | 0.000 | 0.676 | 0.799
height | 0.4333 | 0.030 | 14.398 | 0.000 | 0.372 | 0.495
density | -1.11e-16 | 0.030 | -3.69e-15 | 1.000 | -0.061 | 0.061

Omnibus: | 6.096 | Durbin-Watson: | 1.613
---|---|---|---
Prob(Omnibus): | 0.047 | Jarque-Bera (JB): | 5.828
Skew: | 1.004 | Prob(JB): | 0.0543
Kurtosis: | 2.713 | Cond. No. | 1.06
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The middle section of this summary output shows the regression coefficients, their standard errors and some statistics (t-statistics and p-values) to help us judge how important the parameters are for predicting the output quantity. Since we standardized both parameters and result quantities, the standard error is the same for all independent variables. It should come as no surprise that density does not contribute to the predicted volume (its coefficient is extremely small).