aivis Response Analysis is one of the engines of the aivis Technology Platform by Vernaio.
aivis Response Analysis provides insights into underlying causes and shows how to accomplish a user-defined goal with clear and actionable instructions discovered from the input data.
It provides breakdowns of the input data with respect to a goal, i.e., a key performance indicator (KPI), such that users can learn not only the cause-and-effect but also ways to achieve/optimize the KPI.
This is done by recursively performing counterfactual analyses on the relationships between the KPI and the corresponding data input, looking for conditions that have the most impact on the change in KPI, a concept that is similar to the controlled direct effect of causal inference, proposed by J. Pearl (2001).
As such, its explainability is one of the key strengths of the engine, which differentiates itself from other conventional machine learning models.
By improving traditional decision/regression tree models with novel mathematics, aivis Response Analysis achieves cutting-edge performance in identifying the causes of disruptions and/or in finding ways to accomplish user-defined goals, while requiring minimal input and configuration from the user.
The engine generates an analysis report based on historical tabular data that includes data values representing the user's objective, such as a KPI.
This documentation explains the usage and principles behind aivis Response Analysis to data and software engineers. For detailed API descriptions of docker images, web endpoints and SDK functions, please consult the reference manual of the respective component:
SDKs
Docker Images
App-API
Web-API
For additional support, go to Vernaio Support.
Currently, aivis Response Analysis is distributed to a closed user base only. To gain access to the artifacts, as well as for any other questions, you can open a support ticket via aivis Support.
Unlike other aivis engines, the Response Analysis engine takes tabular data as input. It is thus important to understand the concept of tabular data in the context of the Response Analysis engine. This chapter explains the terminology as well as the required format of tabular data.
A typical example of tabular data is a score board, say, of a football league.
Team | Score | Win | Loss | Draw | Home | Injury |
---|---|---|---|---|---|---|
FC Blue | 109 | 10 | 2 | 3 | "City A" | true |
FC Red | 92 | 8 | 3 | 4 | "City A" | false |
FC Black | 78 | 6 | 3 | 6 | "City B" | |
... | ... | ... | ... | ... | ... | ... |
Tabular data consist of columns. Every column contains two things, these being:
- a column_id, like Score and Win. The column_id needs to be unique within the data.
- cells, each of which is a pair (row_id, value), like (FC Blue, 109).

Tabular data fed into the engine input may look like this:
Score | Win | Loss | Draw | Home | Injury |
---|---|---|---|---|---|
(FC Blue, 109) | (FC Blue, 10) | (FC Blue, 2) | (FC Blue, 3) | (FC Blue, "City A") | (FC Blue, true) |
(FC Red, 92) | (FC Red, 8) | (FC Red, 3) | (FC Red, 4) | (FC Red, "City A") | (FC Red, false) |
(FC Black, 78) | (FC Black, 6) | (FC Black, 3) | (FC Black, 6) | (FC Black, "City B") |
Here, Score, Win, Loss, etc., are column_ids, and the data entries are cells.
The value of a cell can be a float, a string, or a boolean. The aivis Response Analysis engine can handle empty cells.
In aivis we also call a row a record. In the above case, we have 3 records.
Numbers are stored as 64-bit Floating Point numbers. They are written in scientific notation like -341.4333e-44, so they consist of the compulsory part Significand and an optional part Exponent that is separated by an e or E.
The Significand contains one or multiple digits and optionally a decimal separator (.). In that case, digits before or after the separator can be omitted and are assumed to be 0. It can be prefixed with a sign (+ or -).
The Exponent contains one or multiple digits and can be prefixed with a sign, too.
The 64-bit Floating Point specification also allows for 3 non-finite values (not a number, positive infinity and negative infinity) that can be written as nan, inf/+inf and -inf (case insensitive). These values are valid, but the engine regards them as being unknown and they are therefore skipped.
Regular expression: (?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?
String values must be encoded as UTF-8. Empty strings are regarded as being unknown values and are therefore skipped.
Boolean values must be written in one of the following ways:
- true/false (case insensitive)
- 1/0
- 1.0/0.0 with an arbitrary number of additional zeros at the end

Regular expression: (?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?
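As a quick, purely illustrative check (not part of the engine), the regular expressions above can be tried out in Python; the sample values here are made up for demonstration:

# check sample values against the float and boolean regular expressions from above
import re

FLOAT_RE = re.compile(r"(?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?")
BOOL_RE = re.compile(r"(?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?")

for value in ["-341.4333e-44", ".5", "+inf", "NaN", "abc"]:
    print(value, bool(FLOAT_RE.fullmatch(value)))   # abc -> False, all others -> True

for value in ["true", "FALSE", "1.000", "0", "yes"]:
    print(value, bool(BOOL_RE.fullmatch(value)))    # yes -> False, all others -> True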
aivis Response Analysis engine performs an analysis on the input tabular data. The tabular data should contain a column that the user wants to analyze, such as a KPI. This column will be referred to as the target. The end result of the engine is the report.
Equipped with the knowledge of tabular data, we are now ready to use the aivis Response Analysis engine. As an illustrative use case, we will use the engine to learn about German credit health, e.g., which attributes contribute positively and negatively to a person's credit health. Each person is represented by a row, listing a number of attributes and the person's credit health; the latter will be used as the target KPI.
The SDK of aivis Response Analysis allows for direct calls from your C, Java or Python program code. All language SDKs internally use our native shared library (FFI). As C APIs can be called from various other languages as well, the C-SDK can also be used with languages such as R, Go, Julia, Rust, and more. Compared to the docker images, the SDK enables a more fine-grained usage and tighter integration.
In this chapter we will show you how to get started using the SDK.
A working SDK example that builds on the code explained below can be downloaded directly here:
This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset, which is in the data subfolder.
Required artifacts:
- Several .whl files, which you will receive in a libs.zip directly from aivis Support:
  - vernaio_aivis_engine_v2_ra_runtime_python_full-{VERSION}-py3-none-win_amd64.whl: A response analysis full python runtime
  - vernaio_aivis_engine_v2_base_sdk_python-{VERSION}-py3-none-any.whl: The base python sdk
  - vernaio_aivis_engine_v2_ra_sdk_python-{VERSION}-py3-none-any.whl: The response analysis python sdk
  - vernaio_aivis_engine_v2_toolbox-{TOOLBOX-VERSION}-py3-none-any.whl: The toolbox python sdk - optional for HTML report generation

Preparations:
- Make sure you have a Python (>=3.10) installation.
- Create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
- Unzip response-analysis-examples.zip. The data CSV train_ra.csv needs to stay in **/data.
- Unzip libs.zip. These .whl files need to be in **/libs.

The folder now has the following structure:
+- data
| +- train_ra.csv
|
+- docker
| +- # files to run the example via docker images, which we will not need now
|
+- java
| +- # files to run the example via java sdk, which we will not need now
|
+- libs
| +- # the .whl files to run aivis
|
+- python
| +- # files to run the example via python sdk
Running the example code:
- Navigate to the **/python subfolder. Here, you find the classic python script example_ra.py and the jupyter notebook example_ra.ipynb. Both run the exact same example and output the same result. Choose which one you want to run.
- In both cases, the .whl files need to be installed first. We will now explain two options, which are installing them via pip install or installing them via poetry. Many other options are also possible, of course.

Option A: pip install (only for the classic python script example_ra.py, not for the jupyter notebook example_ra.ipynb)

Navigate to the **/python subfolder and run the following commands:

# installs the `.whl` files
pip install -r requirements-<platform>.txt
# runs the classic python script `example_ra.py`
python example_ra.py --input=../data --output=output
Option B: poetry install
# installs poetry (a package manager)
python -m pip install poetry
# installs the `.whl` files
poetry install --no-root
# runs the classic python script `example_ra.py`
poetry run python example_ra.py --input=../data --output=output
To run the jupyter notebook example_ra.ipynb, run the following commands in the **/python subfolder. The first one might take a while, the third one opens a tab in your browser.

# installs the `.whl` files
poetry install --no-root
# installs jupyter kernel
poetry run ipython kernel install --user --name=test_ra
# runs the jupyter python script `example_ra.ipynb`
poetry run jupyter notebook example_ra.ipynb
After running the scripts, you will find your computation results in **/python/output.
In addition to the response-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.
Required artifacts:
- Several .jar files, which you will receive in a libs.zip directly from aivis Support:
  - aivis-engine-v2-ra-runtime-java-full-win-x8664-{VERSION}.jar: A response analysis full java runtime, here for windows - choose the one fitting your operating system; see artifacts for other options on linux and macos.
  - aivis-engine-v2-base-sdk-java-{VERSION}.jar: The base java sdk
  - aivis-engine-v2-ra-sdk-java-{VERSION}.jar: The response analysis java sdk

Preparations:
- Make sure you have a Java (>=11) installation.
- Create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
- Unzip response-analysis-examples.zip. The data CSV train_ra.csv needs to stay in **/data.
- Unzip libs.zip. These .jar files need to be in **/libs.

The folder now has the following structure:
+- data
| +- train_ra.csv
|
+- docker
| +- # files to run the example via docker images, which we will not need now
|
+- java
| +- # files to run the example via java sdk
|
+- libs
| +- # the .jar files to run aivis
|
+- python
| +- # files to run the example via python sdk, which we will not need now
Running the example code:
- Navigate to the **/java subfolder. Here, you find the build.gradle. Check whether the paths point correctly to your aivis engine v2 .jar files in the **/libs subfolder.
- Stay in the **/java subfolder and run the following commands:

# builds this Java project with gradle wrapper
./gradlew clean build
# runs Java with parameters referring to input and output folder
java -jar build/libs/example_ra.jar --input=../data --output=output
After running the scripts, you will find your computation results in **/java/output.
Our SDK artifacts come in one flavor, full (note that an inf flavor will be introduced in a future release):
- full packages provide the full functionality and are available for mainstream targets only:
  - win-x8664
  - macos-armv8* (macOS 11 "Big Sur" or later)
  - macos-x8664* (macOS 11 "Big Sur" or later; until aivis engine version 2.9.0)
  - linux-x8664 (glibc >= 2.14)

* Only Python and C SDKs are supported. Java SDK is not available for this target.

In this chapter we want to demonstrate the full API functionality and thus always use the full package.
To use the Python-SDK you must download the SDK artifact (flavor and target generic) for your pythonpath at build time. Additionally, at installation time, the runtime artifact must be downloaded with the right flavor and target.
The artifacts are distributed through a PyPI registry.
Using Poetry you can simply set a dependency on the artifacts specifying flavor and version. The target is chosen depending on your installation system:
aivis_engine_v2_ra_sdk_python = "{VERSION}"
aivis_engine_v2_ra_runtime_python_{FLAVOR} = "{VERSION}"
To use the Java-SDK, you must download at build time:
- the SDK artifact (flavor and target generic), and
- the runtime artifact with the right flavor and target.
It is possible to include multiple runtime artifacts for different targets in your application to allow cross-platform usage. The SDK chooses the right runtime artifact at runtime.
The artifacts are distributed through a Maven registry.
Using Maven, you can simply set a dependency on the artifacts specifying flavor, version and target:
<dependency>
<groupId>com.vernaio</groupId>
<artifactId>aivis-engine-v2-ra-sdk-java</artifactId>
<version>{VERSION}</version>
</dependency>
<dependency>
<groupId>com.vernaio</groupId>
<artifactId>aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}</artifactId>
<version>{VERSION}</version>
<scope>runtime</scope>
</dependency>
Alternatively, with Gradle:
implementation 'com.vernaio:aivis-engine-v2-ra-sdk-java:{VERSION}'
runtimeOnly 'com.vernaio:aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}:{VERSION}'
To use the C-SDK, you must download the SDK artifact at build time (flavor and target generic). For final linkage/execution you need the runtime artifact with the right flavor and target.
The artifacts are distributed through a Conan registry.
Using Conan, you can simply set a dependency on the artifact specifying flavor and version. The target is chosen depending on your build settings:
aivis-engine-v2-ra-sdk-c/{VERSION}
aivis-engine-v2-ra-runtime-c-{FLAVOR}/{VERSION}
The SDK artifact contains:
- include/aivis-engine-v2-ra-core-full.h

The runtime artifact contains:
- lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.lib
- bin/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.dll (also containing the import library)
- lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.so (also containing the import library)

The runtime library must be shipped to the final execution system.
A valid licensing key is necessary for every aivis calculation in every engine and every component.
It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.
aivis will send HTTPS requests to https://v3.aivis-engine-v2.vernaio-licensing.com
(before release 2.7: https://v2.aivis-engine-v2.vernaio-licensing.com
, before release 2.3: https://aivis-engine-v2.perfectpattern-licensing.de
) to check if your licensing key is valid. Therefore, an active internet connection is required, and no firewall may block applications other than the browser from calling this URL.
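A small sanity check (illustrative only, not part of the SDK) that the key is actually visible to the process which will run the engine:

# verify that the licensing key is set and contains no whitespace
import os

key = os.environ.get("AIVIS_ENGINE_V2_API_KEY", "")
if not key or any(ch.isspace() for ch in key):
    raise RuntimeError("AIVIS_ENGINE_V2_API_KEY is missing or contains whitespace")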
If aivis returns a licensing error, please check the following items before contacting aivis Support:
- The licensing key must have the format <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there must be no whitespace.
- Open https://v3.aivis-engine-v2.vernaio-licensing.com in your browser. The expected outcome is "Method Not Allowed". In that case, at least the url is not generally blocked.

Before we can invoke API functions of our SDK, we need to set it up for proper usage and consider the following things.
It is important to ensure the release of allocated memory for unused objects.
In Python, freeing objects and destroying engine resources is done automatically. You can force resource destruction with the appropriate destroy function.
In Java, freeing objects is done automatically, but you need to destroy all engine resources like Data- and Analysis-objects with the appropriate destroy function. As they all implement Java's AutoCloseable interface, we can also write a try-with-resources statement to auto-destroy them:
try(final ResponseAnalysisData inputData = ResponseAnalysisData.create()) {
// ... do stuff ...
} // auto-destroy when leaving block
In C, you must always free engine-allocated objects and destroy engine resources yourself with the corresponding free/destroy functions (as shown with aivis_free and aivis_tabular_data_destroy below).
Errors and exceptions report what went wrong on a function call. They can be caught and processed by the outside.
In Python, an Exception is thrown and can be caught conveniently.
In Java, an AbstractAivisException is thrown and can be caught conveniently.
In C, every API function can write an error to the given output function parameter &err (to disable this, just set it to NULL). This parameter can then be checked by a helper function similar to the following:
const Error *err = NULL;
void check_err(const Error **err, const char *action) {
// everything is fine, no error
if (*err == NULL)
return;
// print information
printf("\taivis Error: %s - %s\n", action, (*err)->json);
// release error pointer
aivis_free(*err);
*err = NULL;
// exit program
exit(EXIT_FAILURE);
}
Failures within function calls will never affect the state of the engine.
The engine emits log messages to report on the progress of each task and to give valuable insights. These log messages can be caught via registered loggers.
# create logger
class Logger(EngineLogger):
def log(self, level, thread, module, message):
if (level <= 3):
print("\t... %s" % message)
# register logger
ResponseAnalysisSetup.register_logger(Logger())
// create and register logger
ResponseAnalysisSetup.registerLogger(new EngineLogger() {
public void log(int level, String thread, String module, String message) {
if (level <= 3) {
System.out.println(String.format("\t... %s", message));
}
}
});
// create logger
void logger(const uint8_t level, const char *thread, const char *module, const char *message) {
if (level <= 3)
printf("\t... %s\n", message);
}
// register logger
aivis_setup_register_logger(&logger, &err);
check_err(&err, "Register logger");
During the usage of the engine, a lot of calculations are done. Parallelism can drastically speed things up. Therefore, set the maximal thread count to a limited number of CPU cores, or set it to 0 to use all available cores (defaults to 0).
# init thread count
ResponseAnalysisSetup.init_thread_count(4)
// init thread count
ResponseAnalysisSetup.initThreadCount(4);
// init thread count
aivis_setup_init_thread_count(4, &err);
check_err(&err, "Init thread count");
Now that we are done setting up the SDK, we need to create a data store that holds our historical tabular data. In general, all data must always be provided through data stores. You can create as many as you want.
After the creation of the data store, you can fill it with signal data. The classic way to do it is writing your own reading function and adding signals, i.e. lists of data points, to the data context yourself, as it is shown in Data Reader Options.
We recommend using the built-in files reader, which processes a folder with csv files that have to follow the CSV Format Specification.
We assume that the folder path/to/input/folder/ contains train_ra.csv.
# create empty data context for analysis data
analysis_data = ResponseAnalysisData.create()
# create config for files reader
files_reader_config = json.dumps(
{
"folder": "path/to/input/folder/"
}
)
# read data
analysis_data.read_files(files_reader_config)
# ... use analysis data ...
// create empty data context for analysis data
try(final ResponseAnalysisData analysisData = ResponseAnalysisData.create()) {
// create config for files reader
final DtoTabularFilesReaderConfig filesReaderConfig = new DtoTabularFilesReaderConfig("path/to/input/folder/");
// read data
analysisData.readFiles(filesReaderConfig);
// ... use analysis data ...
} // auto-destroy analysis data
// create empty data context for analysis data
TabularDataHandle analysis_data = aivis_tabular_data_create(&err);
check_err(&err, "Create analysis data context");
// create config for files reader
const char *reader_config = "{"
"\"folder\": \"path_to_input_folder\""
"}";
// read data
aivis_tabular_data_read_files(analysis_data, (uint8_t *) reader_config, strlen(reader_config), &err);
check_err(&err, "Read Files");
// ... use analysis data ...
// destroy data context
aivis_tabular_data_destroy(analysis_data, &err);
check_err(&err, "Destroy data context");
analysis_data = 0;
In the following, we will assume you have read in the file train_ra.csv shipped with the Example Project.
With the data store filled with historical tabular data, we can now create our analysis:
# build analysis config
analysis_config = json.dumps(
{
"dataFilter": {"excludeColumns": ["telephone"]}, # this field is optional
"target": {
"kpi": "TARGET",
"interest": "HIGH_KPI",
},
"strategy": { # this field is optional
"minimalFractionOfRecords": 0.05,
},
}
)
# create analysis
analysis = ResponseAnalysis.create(analysis_data, analysis_config)
# ... use analysis ...
// build analysis config
final DtoAnalysisConfig analysisConfig =
new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI)).withDataFilter(
new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" }) // this field is optional
).withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05)); // this field is optional
// create analysis
try (final ResponseAnalysis analysis = ResponseAnalysis.create(analysisData, analysisConfig)) {
// ... use analysis ...
} // auto-destroy analysis
// build analysis config
const char *analysis_config = "{"
"\"dataFilter\": {\"excludeColumns\": [\"telephone\"]}," // this field is optional
"\"target\": {"
"\"kpi\": \"TARGET\","
"\"interest\": \"HIGH_KPI\" "
"},"
"\"strategy\": {" // this field is optional
"\"minimalFractionOfRecords\": 0.05 "
"}"
"}";
// create analysis
ResponseAnalysisHandle analysis_handle = aivis_response_analysis_create(
analysis_data,
(uint8_t *) analysis_config,
strlen(analysis_config),
&err
);
check_err(&err, "Create analysis");
// ... use analysis ...
// destroy analysis
aivis_response_analysis_destroy(analysis_handle, &err);
check_err(&err, "Destroy analysis");
analysis_handle = 0;
Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The field strategy is optional; its sub-field minimalFractionOfRecords dictates the minimal fraction of records that must remain in the leaf nodes.
For the moment, you may take this configuration as it is. The different keys will become clearer from the later sections and the reference manual.
The docker images of aivis Response Analysis are prepared for easy usage. They use the SDK internally, but have a simpler file-based interface. If you have a working docker workflow system like Argo, you can build your own automated workflow based on these images.
In this chapter, we will show you how to get started using docker images. Usage of the SDK will be covered by the next chapter.
A working example that builds on the code explained below can be downloaded directly here: response-analysis-examples.zip.
This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.
Prerequisites: In addition to the response-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.
- The docker images aivis-engine-v2-ra-worker and (optionally, for HTML report generation) aivis-engine-v2-toolbox.

As a Kubernetes user, even without deeper Argo knowledge, the aivis-engine-v2-example-ra-argo.yaml shows best how the containers are executed one after the other, how the analysis worker is provided with a folder that contains the data CSV, and how the toolbox assembles an HTML report at the end.
There is one docker image:
{REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}
The docker image is Linux-based.
You need an installation of Docker on your machine as well as access to the engine artifacts:
docker -v
docker pull {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}
A valid licensing key is necessary for every aivis calculation in every engine and every component.
It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.
aivis will send HTTPS requests to https://v3.aivis-engine-v2.vernaio-licensing.com
(before release 2.7: https://v2.aivis-engine-v2.vernaio-licensing.com
, before release 2.3: https://aivis-engine-v2.perfectpattern-licensing.de
) to check if your licensing key is valid. Therefore, an active internet connection is required, and no firewall may block applications other than the browser from calling this URL.
If aivis returns a licensing error, please check the following items before contacting aivis Support:
- The licensing key must have the format <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there must be no whitespace.
- Open https://v3.aivis-engine-v2.vernaio-licensing.com in your browser. The expected outcome is "Method Not Allowed". In that case, at least the url is not generally blocked.
All artifacts use CSV as the input data format. As the CSV format is highly non-standardized, we will discuss it briefly in this section.
CSV files must be stored in a single folder specified in the config under data.folder. Within this folder the CSV files can reside in an arbitrary subfolder hierarchy. In some cases (e.g. for HTTP requests), the folder must be passed as a ZIP file.
General CSV rules:
- The file must be split into lines (by CR LF/LF). In other words, each record must be on its own line.

Special rules:
- One column must have the header id and contain the row_ids.
- All other columns, i.e. all columns not named id, are interpreted as "columns" of tabular data.
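For illustration only, a minimal CSV file following these rules could look like the score board example from the tabular data chapter (exact quoting and separator conventions are described in the CSV Format Specification of the reference manual):

id,Score,Win,Loss,Draw,Home,Injury
FC Blue,109,10,2,3,"City A",true
FC Red,92,8,3,4,"City A",false
FC Black,78,6,3,6,"City B",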
Here, we will analyze the target column with aivis Response Analysis engine.
At the beginning, we create a folder docker, a subfolder analysis-config, and add the configuration file config.yaml:
data:
folder: /srv/data
dataTypes:
defaultType: FLOAT
stringColumns: # this field is optional
- "status_of_existing_checking_account"
- "purpose"
- "savings_account"
- "credit_history"
- "marital_status_and_sex"
- "other_debtors"
- "property"
- "other_installment_plans"
- "housing"
- "job"
- "telephone"
- "foreign_worker"
analysis:
dataFilter: # this field is optional
excludeColumns:
- telephone
target:
kpi: TARGET
interest: HIGH_KPI
strategy: # this field is optional
minimalFractionOfRecords: 0.05
output:
folder: /srv/output
Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The field strategy is optional; its sub-field minimalFractionOfRecords dictates the minimal fraction of records that must remain in the leaf nodes.
For the moment, you may take this file as it is. The different keys will become clearer from the later sections and the docker reference manual.
As a next step, we create a second folder data and add the Input Data CSV file train_ra.csv to the folder. Afterwards, we create a blank folder output.
Our folder structure should now look like this:
+- docker
| +- analysis-config
| +- config.yaml
|
+- data
| +- train_ra.csv
|
+- output
Finally, we can start our analysis via:
docker run --rm -it \
-v $(pwd)/docker/analysis-config:/srv/conf \
-v $(pwd)/data/train_ra.csv:/srv/data/train_ra.csv \
-v $(pwd)/output:/srv/output \
{REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}
docker run --rm -it `
-v ${PWD}/docker/analysis-config:/srv/conf `
-v ${PWD}/data/train_ra.csv:/srv/data/train_ra.csv `
-v ${PWD}/output:/srv/output `
{REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}
After a short time, this results in an output file analysis-report.json in the output folder.
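If you want a first programmatic look at the result, the report is plain JSON. A minimal sketch (the exact structure of the report is described in the reference manual):

# peek into the generated analysis report
import json

with open("output/analysis-report.json", encoding="utf-8") as f:
    report = json.load(f)

# print only the top-level keys to get an overview of the report structure
print(list(report.keys()))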
aivis Response Analysis engine outputs a report, which can be visualized as follows.
On the very left, one finds the root node that contains all records.
Suppose TARGET values range from 0 to 1. The mean KPI in the root node is 0.7, which is the average of the TARGET entries of all records. The root node splits into two child nodes.
Each child node is reached by a predicate, which is expressed as, for example, A < x, where A is a column_id and x is a value that splits the records of the node. The core engine algorithm picks the column and the value that best split the node.
A child node is itself again split into two nodes and so on.
Splitting stops when a stopping criterion is met, e.g., when a further split would leave a node with too small a fraction of records (see the strategy configuration below).
The final un-split nodes are called leaf nodes (the square nodes). Not only are the records that belong to a leaf node grouped to have similar (if not the same) TARGET values, they also tend to have extreme values.
This implies that the "good" and "bad" records are well separated. The aivis Response Analysis engine identifies the most informative pathways that arrive at "good" and "bad" TARGET value scenarios.
By inspecting the predicates that lead to the leaf nodes, therefore, one can learn the reasons for good (or the cause of bad) performance of the KPI.
In this section, we discuss the results of our example project, German credit health. It contains 1000 people's credit health evaluation, compiled by a bank. The target KPI, which naturally represents the credit health, is binary, i.e., 0 for bad and 1 for good. The global average of the KPI is 0.7 meaning that 700 people (records) have good credit health (1) and 300 bad (0).
Previous sections gave an introduction on how to use aivis Response Analysis and also shed some light on how it works. The following sections will explain more on the concept and provide a more profound background. It is not necessary to know this background to use aivis Response Analysis! However, you may find convenient solutions for specific problems, or information on how to optimize your usage of aivis Response Analysis. It will become clear that only minimal user input is required for the engine to perform well. Nevertheless, the user has the option to control the process with several input parameters which will be presented below.
First, an overview of all possible configuration keys is presented. A more minimal analysis configuration was used above in the SDK analysis and in the Docker analysis. This example may mainly serve as a quick reference. The meaning of the different keys is explained in the following sections, and a definition of the syntax is given in the reference manuals.
analysis:
dataFilter:
excludeColumns:
- telephone
# includeColumns: ... either exclude or include columns
target:
kpi: TARGET
interest: HIGH_KPI # or LOW_KPI
weight: MY_WEIGHT_COLUMN
columns:
- column: COLUMN_1
interpreter:
_type: Categorical
- column: COLUMN_2
interpreter:
_type: Numerical
- column: COLUMN_3
interpreter:
_type: Numerical
quantileCount: 100
strategy:
minimalFractionOfRecords: 0.05
analysis_config = json.dumps({
"dataFilter": {
"excludeColumns": ["telephone"]
# "includeColumns": ... either exclude or include columns
},
"target": {
"kpi": "TARGET",
"interest": "HIGH_KPI",
"weight": "MY_WEIGHT_COLUMN",
},
"columns": [
{
"column": COLUMN_1,
"interpreter": {
"_type": "Categorical"
}
},
{
"column": COLUMN_2,
"interpreter": {
"_type": "Numerical"
}
},
{
"column": COLUMN_3,
"interpreter": {
"_type": "Numerical",
"quantileCount": 100,
}
},
],
"strategy": {
"minimalFractionOfRecords": 0.05
}
})
final DtoAnalysisConfig analysisConfig =
new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI).withWeight("MY_WEIGHT_COLUMN")).withDataFilter(
new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" })
)
// .withDataFilter(new DtoTabularDataFilter().withIncludeColumns(new String[] {"..."})) either exclude or include columns
.withColumns(
new IDtoColumnConfig[] {
new DtoColumnConfig("COLUMN_1").withInterpreter(new DtoCategoricalColumnInterpreter()),
new DtoColumnConfig("COLUMN_2").withInterpreter(new DtoNumericalColumnInterpreter()),
new DtoColumnConfig("COLUMN_3").withInterpreter(new DtoNumericalColumnInterpreter().withQuantileCount(100)), }
)
.withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05));
const char *analysis_config = "{"
"\"dataFilter\": {"
"\"excludeColumns\": ["
"\"telephone\""
"]"
// "\"includeColumns\": [...]" either exclude or include columns
"},"
"\"target\": {"
"\"kpi\": \"TARGET\","
"\"interest\": \"HIGH_KPI\","
"\"weight\": \"MY_WEIGHT_COLUMN\""
"},"
"\"columns\": [{"
"\"column\": \"COLUMN_1\","
"\"interpreter\": {"
"\"_type\": \"Categorical\""
"}},{"
"\"column\": \"COLUMN_2\","
"\"interpreter\": {"
"\"_type\": \"Numerical\""
"}},{"
"\"column\": \"COLUMN_3\","
"\"interpreter\": {"
"\"_type\": \"Numerical\","
"\"quantileCount\": 100"
"}}"
"],"
"\"strategy\": {"
"\"minimalFractionOfRecords\": 0.05"
"}"
"}";
The following sections list and explain the parameters the user may configure to control the analysis. The sections are organized along the structure of the configuration classes.
The data filter allows you to define the columns that are used for the analysis. This can be done in either of two ways: provide a list of column ids to exclude (exclude columns), or, alternatively, provide a list of column ids to include (include columns).
The target column reflects the KPI of the analysis and must therefore clearly represent the goal of the analysis.
The field kpi takes the ID of the target column, and interest decides whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the kpi.
The optional field weight takes a column_id as input. Each cell of this column holds a positive float that defines how much weight the corresponding TARGET cell should take, e.g., a weight of 2.0 behaves like two identical records of weight 1.0. If the weight is 0.0, or if the weight value is missing for some row, this row is excluded from the analysis.
If there is no column that exactly matches your definition of the target or weight, you may want to express it as a function of other columns. The easiest way to do so is via the expression language. If you construct the target via some expression, all columns used in this expression are automatically excluded from the analysis.
The column configuration is the place to pass additional information about a column in order to enforce a special treatment. Each column configuration refers to one specific column.
At the core of the column configuration is the interpreter. The interpreter defines which predicates can be built from a column. Very often the default configuration is the best choice and you don't need to set any interpreter. Below you find a table on the different interpreters, followed by some more in-depth explanations.
By default, all float columns are interpreted as numerical and all string and boolean columns are interpreted as categorical.
The numerical interpreter should be used for all columns for which the order of numbers is meaningful. For the aivis Response Analysis engine, the numerical interpreter takes an additional optional argument, namely quantile count, whose default value is 20.
quantile count sets a resolution.
Imagine a numerical column A whose cell values range from 0 to 100. Suppose quantile count is 5. This means the engine will consider the values of this column up to a resolution of (100-0)/5 = 20. So it will generate 4 different predicates (plus their negations):
- Is A larger than 20?
- Is A larger than 40?
- Is A larger than 60?
- Is A larger than 80?
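To make this concrete, here is a small illustration in plain Python (not the engine's internal code) of how such equidistant thresholds arise:

# illustrative only: thresholds for a column ranging from 0 to 100 with quantile count 5
lo, hi, quantile_count = 0.0, 100.0, 5
step = (hi - lo) / quantile_count                      # resolution: 20.0
thresholds = [lo + step * k for k in range(1, quantile_count)]
print(thresholds)                                      # [20.0, 40.0, 60.0, 80.0]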
String and boolean columns are always interpreted as categorical. Categorical data has nominal scale, i.e., it takes only specific levels and does not necessarily follow any order. In practice, this would express information about certain states, such as "green", "red", or "blue". This information may be present in form of strings, booleans, or also encoded in numbers. An example could be a column for which "1.0" stands for "pipe open", "2.0" for "pipe blocked", and "3.0" for "pipe sending maintenance alarm". A categorical column A with possible values a1, a2 and a3 will currently generate 3 predicates (plus their negations):
- Is A equal to a1?
- Is A equal to a2?
- Is A equal to a3?

One may now wonder: if any column can be interpreted as categorical, why does the numerical interpreter exist?
- Numerical predicates respect the order of the values, so the engine can express conditions like COLUMN_ID < 10, instead of COLUMN_ID != 10.
- If a column has N unique values and is interpreted as categorical, the engine will create N predicates that all need to be considered. Therefore, if N is large, it is recommended to interpret the column as numerical with a quantile count smaller than N.

The field strategy contains a sub-field called minimal fraction of records, which defines a stopping criterion for splitting nodes. Splitting stops when a further split would leave a node with a smaller fraction of all records than this value. For example, with 1000 records and minimalFractionOfRecords = 0.05, every node of the resulting tree contains at least 50 records.
As a result of the analysis, a report is produced, which contains a tree model. With an appropriate visualization tool, one can render the tree model contained in the report.
Before starting the workflow, sometimes there is the need to add a new column to the dataset (a synthetic column) that is derived from other columns already present (available since engine version 2.10). There are various reasons for this, for example if no existing column exactly matches your definition of the target or weight.
Technically, you can add synthetic columns using the docker images or any SDK Data API.
To create new synthetic columns in a flexible way, aivis features a rich Expression Language to articulate the formula.
The Expression Language is an extension of the scripting language Rhai. We have mainly added support for handling columns natively. This means, you can use columns in normal operators and functions as if they were primitive values. You can even mix columns and primitive values in the same invocation. If at least one parameter is a column, the result will also be a column. The list of operators and functions that allow native column handling can be found in the section on operators and functions.
Information on the basic usage of the language can be found in the very helpful Language Reference of the Rhai Book. This documentation will mainly focus on the added features.
A column consists of a list of data points that represents a series (row ids and values of the same type).
The following value types are supported:
- bool: Boolean
- i64: 64-bit Integer
- f64: 64-bit Floating Point
- string: UTF-8 String

A column type and its value type are written generically as column<T> and specifically like e.g. column<i64> for an integer column.
It is not possible to write down a column literally, but you can refer to an already existing column in your dataset.
Referring to an already existing column is done via:
c(column_id: string literal): column<T>
This function must be used exactly with the syntax above. It is not allowed to invoke it as a method on the column id. The column id must be a simple literal without any inner function invocation!
Examples:
c("my column id") // OK
c("my c" + "olumn id") // FAIL
"my column id".c() // FAIL
To begin with, let's start with a very simple example. Let "a" and "b" be the IDs of two float columns. Then
c("a") + c("b")
yields the sum of the two columns. The Rhai +
operator has been overloaded to work directly on columns (such as many other operators, see below). Therefore, the above expression yields a new column. It contains data points for all rows for which there are entries in "a" and in "b".
However, you might instead want data points for all rows for which there are entries in "a" or in "b". A possible solution for this case is providing default values for each column:
c("a").or(1.0) + c("b").or(0.0)
For more details on handling missing values, see below.
As mentioned above, there are many more functions available for which there is native column support, and you can mix columns with primitive values:
(2 * PI() * (c("x") + 0.5)).sin()
If these functions do not satisfy your needs, there is always the alternative not to work on the level of the columns but on the level of the primitive column entries.
The following expression creates a string column that contains the string large
for each row for which the float column "x" is larger than 10.
let val = c("x").at_or(id, 0.0);
if (val > 10.0) {
"large"
} else {
"small"
}
As explained below, the default value 0.0 in the at_or function has no effect in this case.
More on the function at_or and the literal id can be found in the dedicated section.
There are a few functions for handling missing values, as listed below. But before going into details, it is important to understand how the Expression Language deals with the fact that different columns can have data points for different row ids. Essentially, there are three steps involved:
Note: Rows that are not part of any column in the expression are not known to the expression and therefore cannot be handled, not even as missing rows.
The functions to handle missing values are:
- filter(column: column<T>, condition: column<bool>): column<T> – returns a new column with the same values as column for all rows for which condition is true. For all other rows, values are missing in the returned column.
- is_missing(column: column<T>): column<bool> – returns a new column that is true for all rows for which a value is missing in column, and false for rows for which column has a value.
- or(column_1: column<T>, column_2: column<T>): column<T> – returns a new column with the same values as column_1 but replacing missing values by column_2
- or(column_1: column<T>, default: T): column<T> – returns a new column with the same values as column_1 but replacing missing values by some default value

Moreover, also the following function replaces missing values:
- at_or(column: column<T>, row_id: string, default: T): T – returns the column value at the given row, or the default value if it is missing.
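As a small illustration of these functions (assuming your dataset contains float columns "a" and "b" and a boolean column "flag"), the following expression keeps the values of "a" only where "flag" is true and fills the remaining rows from "b":

c("a").filter(c("flag")).or(c("b"))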
While the column-level functions are already powerful, you can unlock the full capabilities of Rhai by operating on primitive values instead of columns. To bridge this gap and work with individual row values (the primitives), you can use the at_or function in conjunction with the special literal id.
- The at_or function allows you to retrieve the value of a specific row from a column. For instance, c("x").at_or("A", 0.0) will return the value from column "x" at row "A" if it exists; otherwise it will return the default value of 0.0.
- The special literal id is replaced by the current row id. For instance, you may write c("x").at_or(id, 0.0). This will return the value from column "x" for the current row (or the default value).

The point is that the expression is evaluated iteratively for all rows, and the results are collected into a series.
A motivating example was already presented in the example section.
The following aggregation functions reduce a column to a single value:
- avg(column: column<f64>): f64 – returns the average over all rows. Any occurrence of nan is ignored.
- count(column: column<T>): i64 – returns the number of (non-missing) rows.
- max(column: column<f64>): f64 – returns the maximum of all values in the column. Any occurrence of nan is ignored.
- median(column: column<f64>): f64 – returns the median of the values in the column. Any occurrence of nan is ignored.
- min(column: column<f64>): f64 – returns the minimum of all values in the column. Any occurrence of nan is ignored.
- mode(column: column<bool/i64/string>): bool/i64/string – returns the value that occurs most often in the column. If there are several values with the same count, it returns the last element (according to the respective order).
- sum(column: column<f64>): f64 – returns the sum of all values in the column. Any occurrence of nan is ignored.
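As an example (assuming a float column "x"), an aggregation result can be mixed back into a column expression; the following centers "x" around its mean:

c("x") - c("x").avg()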
Here, we list all other functions that do not have a direct Rhai counterpart (in contrast to the section on overloaded operators and functions):
- row_ids(column: column<T>): column<i64> – returns a new column constructed from the given one, where the value of each data point is set to the timestamp
The following operators were defined:
+(i64/f64): i64/f64
-(i64/f64): i64/f64
+(i64/f64, i64/f64): i64/f64
-(i64/f64, i64/f64): i64/f64
*(i64/f64, i64/f64): i64/f64
/(i64/f64, i64/f64): i64/f64
%(i64/f64, i64/f64): i64/f64
**(i64/f64, i64/f64): i64/f64
&(i64, i64): i64
|(i64, i64): i64
^(i64, i64): i64
<<(i64, i64): i64
>>(i64, i64): i64
!(bool): bool
&(bool, bool): bool
|(bool, bool): bool
^(bool, bool): bool
+(string, string): string
Comparison operators (will return false on different argument types):
==(bool/i64/f64/string, bool/i64/f64/string): bool
!=(bool/i64/f64/string, bool/i64/f64/string): bool
<(i64/f64, i64/f64): bool
<=(i64/f64, i64/f64): bool
>(i64/f64, i64/f64): bool
>=(i64/f64, i64/f64): bool
Binary arithmetic and comparison operators can handle mixed i64 and f64 arguments properly; the other parameter is then implicitly converted beforehand via to_float. Binary arithmetic operators will return f64 if at least one f64 argument is involved.
The following functions were defined:
abs(i64/f64): i64/f64
sign(i64/f64): i64
sqrt(f64): f64
exp(f64): f64
ln(f64): f64
log(f64): f64
log(f64, f64): f64
sin(f64): f64
cos(f64): f64
tan(f64): f64
sinh(f64): f64
cosh(f64): f64
tanh(f64): f64
asin(f64): f64
acos(f64): f64
atan(f64): f64
asinh(f64): f64
acosh(f64): f64
atanh(f64): f64
hypot(f64, f64): f64
atan(f64, f64): f64
floor(f64): f64
ceiling(f64): f64
round(f64): f64
int(f64): f64
fraction(f64): f64
contains(string): bool
len(string): i64
trim(string): string – with whitespace characters as defined in UTF-8
to_upper(string): string
to_lower(string): string
sub_string(value: string, start: i64, end: i64): string
to_int(bool): i64 – returns 1/0
to_float(bool): f64 – returns 1.0/0.0
to_string(bool): string – returns "true"/"false"
to_float(i64): f64
to_string(i64): string
to_int(f64): i64 – returns 0 on NAN; values beyond INTEGER_MAX/INTEGER_MIN are capped
to_string(f64): string
to_degrees(f64): f64
to_radians(f64): f64
parse_int(string): i64 – throws error if not parsable
parse_float(string): f64 – throws error if not parsable
is_zero(i64/f64): bool
is_odd(i64): bool
is_even(i64): bool
is_nan(f64): bool
is_finite(f64): bool
is_infinite(f64): bool
is_empty(string): bool
Binary minimum/maximum functions (ignoring NAN):
max(i64/f64, i64/f64): i64/f64
min(i64/f64, i64/f64): i64/f64
Comparison operators can handle mixed i64 and f64 arguments properly; the other parameter is then implicitly converted beforehand via to_float. The result will be f64 if at least one f64 argument is involved.
The Boolean conversion and comparison functions were added and are not part of the official Rhai.
The following constants are defined in Rhai:
PI(): f64 – Archimedes' constant: 3.1415...
E(): f64 – Euler's number: 2.718...
Usually the steps of the workflow will run as part of two different service applications: Training App and Inference App
The diagrams below display typical blueprints of these service applications using different available components of the engine, as well as where they might be located in the end-customer infrastructure landscape (execution environments).
Hereby the following color code was used:
The service application Training App covers the workflow step Training, as well as any bulk inference, e.g. for historical evaluation.
It is executed in the so-called Cold World, which means that it consists of long running tasks that are executed infrequently and have a high resource consumption. Training App works on historical data that was previously archived and thus needs to be retrieved in an extra step from the Data Lake / Cold Storage.
Because of its high resource consumption it is usually not located in the OT network, but is a good fit for the cloud or an on-premise datacenter.
The service application Inference App provides the means for live prediction.
In contrast to the Training App, it runs within the Hot World. Usually it is an ongoing process which serves to predict the current value and only needs minimal resources. Inference App works on live data that is easily available from the Historian / Hot Storage.
As the outcome often influences the production systems (e.g. Advanced Process Control), it usually runs in the OT network. Thanks to its low resource consumption, it can run on practically any environment/device, be it in the cloud, on-premise, on-edge or even embedded.
aivis engine v2 toolbox is not an official part of aivis engine v2 but an associated side project. It mainly provides tools to turn output artifacts of aivis engine v2 into technical, single-file HTML reports for data scientists. Its API and behaviour are experimental and subject to change. Users should already know the concepts of aivis engine v2 beforehand.
Caveats:
The aivis engine v2 toolbox does not need a licensing key. Its python code is free to look into or even adapt. The respective toolbox release belonging to an aivis engine v2 release {VERSION} is available as:
- Python wheel: aivis_engine_v2_toolbox-{VERSION}-py3-none-any.whl
- Docker image: aivis-engine-v2-toolbox:{VERSION}
Each call to construct a toolbox HTML report for engine xy has the following structure:
from aivis_engine_v2_toolbox.api import build_xy_report
config = {
"title": "My Use Case Title",
...
"outputFile": "/path/to/my-use-case-report.html"}
build_xy_report(config)
Additionally, the config needs to contain references to the respective engine's output files, e.g. "analysisReportFile": "/path/to/analysis-report.json"
. The full call to create a report for any engine can for example be found in the python or argo examples of the respective engine.
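For aivis Response Analysis this could look as follows; note that the function name build_ra_report and the config values are assumptions for illustration, the authoritative call is contained in the example project:

from aivis_engine_v2_toolbox.api import build_ra_report

config = {
    "title": "German Credit Health",
    "analysisReportFile": "/path/to/analysis-report.json",
    "outputFile": "/path/to/analysis-report.html",
}
build_ra_report(config)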
There are many optional expert configurations to customize your HTML report. Some examples:
The aivis engine v2 toolbox always assumes timestamps to be unix timestamps and translates them to readable dates. This behaviour can be switched off via "advancedConfig": {"unixTime": False}, so that timestamps always remain long values.
By referring to a metadata file via "metadataFile": "/path/to/metadata.json", signals are not only described via their signal id but enriched with more information. The metadata json contains an array of signals with the key id (must) as well as name, description, unitSymbol, unitType (all optional):
{
"signals": [{
"id": "fa6c65bb-5cee-45fa-ab19-355ba94889e9",
"name": "et 1",
"description": "extruder temperature nr. 1",
"unitName": "Kelvin",
"unitSymbol": "K"
}, {
"id": "dc3477e5-a83c-4485-b7f4-7528d336d9c4",
"name": "abc 2"
},
...
]}
To every HTML report which contains a timeseries plot, additional signals can be added for display. However, the signals of the dataset are not all included automatically, since a full dataset is typically too much data to put into a single-file HTML.
All custom configuration options can be seen in the api.py file in src/aivis_engine_v2_toolbox.