aivis Response Analysis is one of the engines of the aivis Technology Platform by Vernaio.
aivis Response Analysis provides insights into underlying causes and informs about how to accomplish user-defined goals with clear and actionable instructions discovered from the input data.
It provides breakdowns of the input data with respect to a goal, i.e., a key performance indicator (KPI), such that users can learn not only the cause-and-effect but also ways to achieve/optimize the KPI.
This is done by recursively performing counterfactual analyses on the relationships between the KPI and the corresponding data input, looking for conditions that have the most impact on the change in KPI, a concept that is similar to the controlled direct effect of causal inference, proposed by J. Pearl (2001).
As such, explainability is one of the key strengths of the engine, differentiating it from conventional machine learning models.
By improving traditional decision/regression tree models with novel mathematics, aivis Response Analysis achieves cutting-edge performance in identifying the causes of disruptions and/or in finding the ways to accomplish user-defined goals, while requiring minimal input/configurations from the users.
The engine generates an analysis report based on historical tabular data that includes a column representing the user's objective, such as a KPI.
This documentation explains the usage and principles behind aivis Response Analysis for data and software engineers. For detailed API descriptions of docker images, web endpoints and SDK functions, please consult the reference manual of the relevant component:
SDKs
Docker Images
App-API
Web-API
For additional support, go to Vernaio Support.
Currently, aivis Response Analysis is distributed to a closed user base only. To gain access to the artifacts, as well as for any other questions, you can open a support ticket via aivis Support.
Unlike other aivis engines, the Response Analysis engine takes tabular data as input. It is therefore important to understand the concept of tabular data in the context of the Response Analysis engine. This chapter explains the terminology as well as the required format of tabular data.
A typical example of tabular data is a score board, say, of a football league.
| Team | Score | Win | Loss | Draw | Home | Injury |
|---|---|---|---|---|---|---|
| FC Blue | 109 | 10 | 2 | 3 | "City A" | true |
| FC Red | 92 | 8 | 3 | 4 | "City A" | false |
| FC Black | 78 | 6 | 3 | 6 | "City B" | |
| ... | ... | ... | ... | ... | ... | ... |
Tabular data consist of columns. Every column contains two things:
- a column_id, e.g., Score or Win. The column_id needs to be unique within the data.
- a list of cells, each being a pair of (row_id, value), like (FC Blue, 109).

Tabular data fed into the engine input may look like this:
| Score | Win | Loss | Draw | Home | Injury |
|---|---|---|---|---|---|
| (FC Blue, 109) | (FC Blue, 10) | (FC Blue, 2) | (FC Blue, 3) | (FC Blue, "City A") | (FC Blue, true) |
| (FC Red, 92) | (FC Red, 8) | (FC Red, 3) | (FC Red, 4) | (FC Red, "City A") | (FC Red, false) |
| (FC Black, 78) | (FC Black, 6) | (FC Black, 3) | (FC Black, 6) | (FC Black, "City B") | |
Here, Score, Win, Loss, etc., are column_ids, and the data entries are cells.
The value of a cell can be float, string, or boolean. aivis Response Analysis engine can handle empty cells.
In aivis we also call a row a record. In the above case, we have 3 records.
Numbers are stored as 64-bit Floating Point numbers. They can be written in scientific notation like -341.4333e-44, so they consist of the compulsory part Significand and an optional part Exponent, separated by an e or E.
The Significand contains one or multiple digits and optionally a decimal separator .. In such a case, digits before or after the separator can be omitted and are assumed to be 0. It can be prefixed with a sign (+ or -).
The Exponent contains one or multiple digits and can be prefixed with a sign, too.
The 64-bit Floating Point specification also allows for 3 non-finite values (not a number, positive infinity and negative infinity) that can be written as nan, inf/+inf and -inf (case insensitive). These values are valid, but the engine regards them as being unknown and they are therefore skipped.
Regular expression: (?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?
String values must be encoded as UTF-8. Empty strings are regarded as being unknown values and are therefore skipped.
Boolean values must be written in one of the following ways:
- true/false (case insensitive)
- 1/0
- 1.0/0.0 with an arbitrary number of additional zeros at the end

Regular expression: (?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?
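To make the accepted formats concrete, here is a small illustrative Python sketch (not part of aivis) that checks sample values against equivalents of the regular expressions above; the scoped (?i:...) flags from the specification are expressed via re.IGNORECASE so the sketch also runs on Python 3.10:

import re

# equivalents of the documented patterns, with case-insensitivity applied globally
FLOAT_RE = re.compile(r"nan|[+-]?inf|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?", re.IGNORECASE)
BOOL_RE = re.compile(r"true|false|1(\.0+)?|0(\.0+)?", re.IGNORECASE)

# "-341.4333e-44", "+.5" and "INF" are valid floats; "abc" is not
for value in ["-341.4333e-44", "+.5", "INF", "abc"]:
    print(value, bool(FLOAT_RE.fullmatch(value)))

# "TRUE", "1" and "0.000" are valid booleans; "2" is not
for value in ["TRUE", "1", "0.000", "2"]:
    print(value, bool(BOOL_RE.fullmatch(value)))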
aivis Response Analysis engine performs an analysis on the input tabular data. The tabular data should contain a column that the user wants to analyze, such as a KPI. This column will be referred to as the target. The end result of the engine is the report.
Equipped with the knowledge of tabular data, we are now ready to use the aivis Response Analysis engine. As an illustrative use case example, we will use the engine to learn about German credit health, e.g., what attributes positively and negatively contribute to one's credit health. Each person is represented by a row, listing a number of attributes and one's credit health, the latter of which will be used as the target KPI.
The SDK of aivis Response Analysis allows for direct calls from your C, Java or Python program code. All language SDKs internally use our native shared library (FFI). As C APIs can be called from various other languages as well, the C-SDK can also be used with languages such as R, Go, Julia, Rust, and more. Compared to the docker images, the SDK enables a more fine-grained usage and tighter integration.
In this chapter we will show you how to get started using the SDK.
A working SDK example that builds on the code explained below can be downloaded directly here:
For the following installation instruction, always replace:
- {ENGINE} by the engine's acronym, here ra for response analysis
- {VERSION} by the aivis version you want to install, e.g. 2.11.0
- {TARGET} by the target fitting your operating system, e.g. win_amd64 - see artifacts for other options on linux and macos

We recommend running the example in the following way:
- Make sure you have a working Python (>=3.10) installation.
- Set the environment variable AIVIS_ENGINE_V2_API_KEY and assign the aivis licensing key to it.
- Unzip the example project. Its folder structure looks like this:
+- data
| +- # CSV file(s) containing the data the example is based on; Docker, Java and Python code read the same CSV files
|
+- docker
| +- # files to run the example via Docker images which we won't need now
|
+- java
| +- # files to run the example via Java SDK which we won't need now
|
+- python
| +- # files to run the example via Python SDK
Go to the **/python subfolder. Here, you will find the classic .py Python script and a .ipynb Jupyter notebook.
Both run the exact same example and output the same result. Choose which one you want to run. Then open a console in the **/python subfolder and run the following commands:
# install Poetry
python -m pip install poetry
# configure your credentials
poetry config http-basic.vernaio-python <user token name code> <user token pass code>
# install the dependencies defined in pyproject.toml; this step can take a little while
poetry install --no-root
# runs the classic Python script `example_{ENGINE}.py`
poetry run python example_{ENGINE}.py --input=../data --output=output
# installs Jupyter kernel
poetry run ipython kernel install --user --name=aivis
# runs the Jupyter Python script `example_{ENGINE}.ipynb`
poetry run jupyter notebook example_{ENGINE}.ipynb
You will find the output in **/python/output. Done!
Of course, there are various ways to install Python dependencies on your machine. We only mention some alternatives briefly here, as this is not specific to aivis and is documented elsewhere. For example, you could download the dependencies as .whl files from the artifact repository. You'll need the following ones, as also listed in Poetry's configuration file pyproject.toml:
- vernaio_aivis_engine_v2_{ENGINE}_runtime_python_full-{VERSION}-py3-none-{TARGET}.whl: A full Python runtime for the engine you want to run
- vernaio_aivis_engine_v2_base_sdk_python-{VERSION}-py3-none-any.whl: The base Python SDK
- vernaio_aivis_engine_v2_{ENGINE}_sdk_python-{VERSION}-py3-none-any.whl: The Python SDK for the engine you want to run
- vernaio_aivis_engine_v2_toolbox-{TOOLBOX-VERSION}-py3-none-any.whl: The toolbox Python SDK to post-process the output of aivis and generate an HTML report

These .whl files can now be installed directly.
You could still use Poetry and adapt the pyproject.toml. (In that case, remove the vernaio-python source definition from pyproject.toml and skip the poetry config step from above.)
[tool.poetry.dependencies]
vernaio-aivis-engine-v2-base-sdk-python = { file = "path/to/vernaio_aivis_engine_v2_base_sdk_python-{VERSION}-py3-none-any.whl" }
# etc.
You are even free to ignore Poetry altogether and install the .whl files directly in pip:
pip install path/to/vernaio_aivis_engine_v2_base_sdk_python-{VERSION}-py3-none-any.whl
# etc.
Then you can directly run the Python script (don't forget --input=../data --output=output) or the notebook in your preferred way.
Of course, there are various ways to install Java dependencies. We recommend running the example in the following way:
- Make sure you have a working Java (>=11) installation.
- Set the environment variable AIVIS_ENGINE_V2_API_KEY and assign the aivis licensing key to it.
- Unzip the example project. Its folder structure looks like this:
+- data
| +- # CSV file(s) containing the data the example is based on; Docker, Java and Python code read the same CSV files
|
+- docker
| +- # files to run the example via Docker images which we won't need now
|
+- java
| +- # files to run the example via Java SDK
|
+- python
| +- # files to run the example via Python SDK which we won't need now
Enter your access credentials in build.gradle:
credentials {
username <user token name code>
password <user token pass code>
}
(If you like, you could alternatively adapt your gradle.properties file.) Open a console in the **/java subfolder and run the following commands:
# builds this Java project with Gradle wrapper
./gradlew clean build
# runs Java with parameters referring to input and output folder
java -jar build/libs/example_{ENGINE}.jar --input=../data --output=output
You will find the output in **/java/output. Done!
Our SDK artifacts currently come in one flavor, full (an inf flavor will be introduced in a future release):
full packages provide the full functionality and are available for mainstream targets only:
- win-x86_64
- macos-armv8* (macOS 11 "Big Sur" or later)
- macos-x86_64* (macOS 11 "Big Sur" or later; until aivis engine version 2.9.0)
- linux-x86_64 (glibc >= 2.14)

* Only Python and C SDKs are supported. Java SDK is not available for this target.
In this chapter we want to demonstrate the full API functionality and thus always use the full package.
To use the Python-SDK you must download the SDK artifact (flavor and target generic) for your pythonpath at build time. Additionally at installation time, the runtime artifact must be downloaded with the right flavor and target.
The artifacts are distributed through a PyPI registry.
Using Poetry you can simply set a dependency on the artifacts specifying flavor and version. The target is chosen depending on your installation system:
aivis_engine_v2_ra_sdk_python = "{VERSION}"
aivis_engine_v2_ra_runtime_python_{FLAVOR} = "{VERSION}"
To use the Java-SDK, you must download at build time:
It is possible to include multiple runtime artifacts for different targets in your application to allow cross-platform usage. The SDK chooses the right runtime artifact at runtime.
The artifacts are distributed through a Maven registry.
Using Maven, you can simply set a dependency on the artifacts specifying flavor, version and target:
<dependency>
<groupId>com.vernaio</groupId>
<artifactId>aivis-engine-v2-ra-sdk-java</artifactId>
<version>{VERSION}</version>
</dependency>
<dependency>
<groupId>com.vernaio</groupId>
<artifactId>aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}</artifactId>
<version>{VERSION}</version>
<scope>runtime</scope>
</dependency>
Alternatively, with Gradle:
implementation 'com.vernaio:aivis-engine-v2-ra-sdk-java:{VERSION}'
runtimeOnly 'com.vernaio:aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}:{VERSION}'
To use the C-SDK, you must download the SDK artifact at build time (flavor and target generic). For final linkage/execution you need the runtime artifact with the right flavor and target.
The artifacts are distributed through a Conan registry.
Using Conan, you can simply set a dependency on the artifact specifying flavor and version. The target is chosen depending on your build settings:
aivis-engine-v2-ra-sdk-c/{VERSION}
aivis-engine-v2-ra-runtime-c-{FLAVOR}/{VERSION}
The SDK artifact contains:
- include/aivis-engine-v2-ra-core-full.h

The runtime artifact contains:
- lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.lib
- bin/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.dll (also containing the import library)
- lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.so (also containing the import library)

The runtime library must be shipped to the final execution system.
A valid licensing key is necessary for every aivis calculation in every engine and every component.
It has to be set (exported) as the environment variable AIVIS_ENGINE_V2_API_KEY.
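For example, on Linux or macOS you would export the variable in the shell before starting your application. If everything is driven from Python, a minimal hedged sketch (with a placeholder key) can also set it programmatically before the SDK is used:

import os

# placeholder key; in production, export AIVIS_ENGINE_V2_API_KEY in the environment instead
os.environ["AIVIS_ENGINE_V2_API_KEY"] = "<FirstPartOfKey>.<SecondPartOfKey>"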
aivis will send HTTPS requests to https://v3.aivis-engine-v2.vernaio-licensing.com (before release 2.7: https://v2.aivis-engine-v2.vernaio-licensing.com; before release 2.3: https://aivis-engine-v2.perfectpattern-licensing.de) to check whether your licensing key is valid. The requirements are therefore an active internet connection and no firewall rule that blocks applications other than the browser from calling this URL.
If aivis returns a licensing error, please check the following items before contacting aivis Support:
- The licensing key must have the form <FirstPartOfKey>.<SecondPartOfKey>, with the first and second parts being UUIDs. In particular, there must be no whitespace.
- Call https://v3.aivis-engine-v2.vernaio-licensing.com in your browser. The expected outcome is "Method Not Allowed". In that case, at least the URL is not generally blocked.

Before we can invoke API functions of our SDK, we need to set it up for proper usage and consider the following things.
It is important to ensure the release of allocated memory for unused objects.
In Python, freeing objects and destroying engine resources is done automatically. You can force resource destruction with the appropriate destroy function.
In Java, freeing objects is done automatically, but you need to destroy all engine resources like Data and Analysis objects with the appropriate destroy function. As they all implement Java's AutoCloseable interface, you can also write a try-with-resources statement to auto-destroy them:
try(final ResponseAnalysisData inputData = ResponseAnalysisData.create()) {
// ... do stuff ...
} // auto-destroy when leaving block
In C, you must always free returned objects (such as errors) and destroy all engine resources yourself by calling the appropriate free/destroy functions, as shown in the code snippets below.
Errors and exceptions report what went wrong on a function call. They can be caught and processed by the outside.
In Python, an Exception is thrown and can be caught conveniently.
In Java, an AbstractAivisException is thrown and can be caught conveniently.
In C, every API function can write an error to the given output function parameter &err (to disable this, just set it to NULL). This parameter can then be checked by a helper function similar to the following:
const Error *err = NULL;
void check_err(const Error **err, const char *action) {
// everything is fine, no error
if (*err == NULL)
return;
// print information
printf("\taivis Error: %s - %s\n", action, (*err)->json);
// release error pointer
aivis_free(*err);
*err = NULL;
// exit program
exit(EXIT_FAILURE);
}
Failures within function calls will never affect the state of the engine.
The engine emits log messages to report on the progress of each task and to give valuable insights. These log messages can be caught via registered loggers.
# create logger
class Logger(EngineLogger):
def log(self, level, thread, module, message):
if (level <= 3):
print("\t... %s" % message)
# register logger
ResponseAnalysisSetup.register_logger(Logger())
// create and register logger
ResponseAnalysisSetup.registerLogger(new EngineLogger() {
public void log(int level, String thread, String module, String message) {
if (level <= 3) {
System.out.println(String.format("\t... %s", message));
}
}
});
// create logger
void logger(const uint8_t level, const char *thread, const char *module, const char *message) {
if (level <= 3)
printf("\t... %s\n", message);
}
// register logger
aivis_setup_register_logger(&logger, &err);
check_err(&err, "Register logger");
During the usage of the engine, a lot of calculations are done. Parallelism can drastically speed things up. Therefore, set the maximal thread count to limit the number of CPU cores used, or set it to 0 to use all available cores (the default is 0).
# init thread count
ResponseAnalysisSetup.init_thread_count(4)
// init thread count
ResponseAnalysisSetup.initThreadCount(4);
// init thread count
aivis_setup_init_thread_count(4, &err);
check_err(&err, "Init thread count");
Now that we are done setting up the SDK, we need to create a data store that holds our historical tabular data. In general, all data must always be provided through data stores. You can create as many as you want.
After the creation of the data store, you can fill it with signal data. The classic way to do it is writing your own reading function and adding signals, i.e. lists of data points, to the data context yourself, as it is shown in Data Reader Options.
We recommend using the built-in files reader, which processes a folder with CSV files that have to follow the CSV Format Specification.
We assume that the folder path/to/input/folder/ contains train_ra.csv.
# create empty data context for analysis data
analysis_data = ResponseAnalysisData.create()
# create config for files reader
files_reader_config = json.dumps(
{
"folder": "path/to/input/folder/"
}
)
# read data
analysis_data.read_files(files_reader_config)
# ... use analysis data ...
// create empty data context for analysis data
try(final ResponseAnalysisData analysisData = ResponseAnalysisData.create()) {
// create config for files reader
final DtoTabularFilesReaderConfig filesReaderConfig = new DtoTabularFilesReaderConfig("path/to/input/folder/");
// read data
analysisData.readFiles(filesReaderConfig);
// ... use analysis data ...
} // auto-destroy analysis data
// create empty data context for analysis data
TabularDataHandle analysis_data = aivis_tabular_data_create(&err);
check_err(&err, "Create analysis data context");
// create config for files reader
const char *reader_config = "{"
"\"folder\": \"path_to_input_folder\""
"}";
// read data
aivis_tabular_data_read_files(analysis_data, (uint8_t *) reader_config, strlen(reader_config), &err);
check_err(&err, "Read Files");
// ... use analysis data ...
// destroy data context
aivis_tabular_data_destroy(analysis_data, &err);
check_err(&err, "Destroy data context");
analysis_data = 0;
In the following, we will assume you have read in the file train_ra.csv shipped with the Example Project.
With the data store filled with historical tabular data, we can now create our analysis:
# build analysis config
analysis_config = json.dumps(
{
"dataFilter": {"excludeColumns": ["telephone"]}, # this field is optional
"target": {
"kpi": "TARGET",
"interest": "HIGH_KPI",
},
"strategy": { # this field is optional
"minimalFractionOfRecords": 0.05,
},
}
)
# create analysis
analysis = ResponseAnalysis.create(analysis_data, analysis_config)
# ... use analysis ...
// build analysis config
final DtoAnalysisConfig analysisConfig =
new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI)).withDataFilter(
new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" }) // this field is optional
).withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05)); // this field is optional
// create analysis
try (final ResponseAnalysis analysis = ResponseAnalysis.create(analysisData, analysisConfig)) {
// ... use analysis ...
} // auto-destroy analysis
// build analysis config
const char *analysis_config = "{"
"\"dataFilter\": {\"excludeColumns\": [\"telephone\"]}," // this field is optional
"\"target\": {"
"\"kpi\": \"TARGET\","
"\"interest\": \"HIGH_KPI\" "
"},"
"\"strategy\": {" // this field is optional
"\"minimalFractionOfRecords\": 0.05 "
"}"
"}";
// create analysis
ResponseAnalysisHandle analysis_handle = aivis_response_analysis_create(
analysis_data,
(uint8_t *) analysis_config,
strlen(analysis_config),
&err
);
check_err(&err, "Create analysis");
// ... use analysis ...
// destroy analysis
aivis_response_analysis_destroy(analysis_handle, &err);
check_err(&err, "Destroy analysis");
analysis_handle = 0;
Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The optional field strategy contains minimalFractionOfRecords, which dictates the minimal fraction of records in the leaf nodes.
For the moment, you may take this file as it is. The different keys will become clearer from the later sections and the reference manual.
The docker images of aivis Response Analysis are prepared for easy usage. They use the SDK internally, but have a simpler file-based interface. If you have a working docker workflow system like Argo, you can build your own automated workflow based on these images.
In this chapter, we will show you how to get started using docker images. Usage of the SDK will be covered by the next chapter.
A working example that builds on the code explained below can be downloaded directly here: response-analysis-examples.zip.
This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.
Prerequisites: In addition to the response-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.
- the docker images aivis-engine-v2-ra-worker and (optionally, for HTML report generation) aivis-engine-v2-toolbox

As a Kubernetes user, even without deeper Argo knowledge, the aivis-engine-v2-example-ra-argo.yaml shows best how the containers are executed one after another, how the analysis worker is provided with a folder that contains the data CSV, and how the toolbox assembles an HTML report at the end.
There is one docker image:
- docker-releases.artifacts.vernaio.com/vernaio/aivis-engine-v2-ra-worker:{VERSION}

The docker image is Linux-based.
You need an installation of Docker on your machine as well as access to Vernaio's artifact repository. Log in to Vernaio's artifact repository and retrieve your user token name code and user token pass code (upper right).
docker -v
docker login docker-releases.artifacts.vernaio.com <user token name code> <user token pass code>
docker pull docker-releases.artifacts.vernaio.com/vernaio/aivis-engine-v2-ra-worker:{VERSION}
A valid licensing key is necessary for every aivis calculation in every engine and every component.
It has to be set (exported) as the environment variable AIVIS_ENGINE_V2_API_KEY.
aivis will send HTTPS requests to https://v3.aivis-engine-v2.vernaio-licensing.com (before release 2.7: https://v2.aivis-engine-v2.vernaio-licensing.com; before release 2.3: https://aivis-engine-v2.perfectpattern-licensing.de) to check whether your licensing key is valid. The requirements are therefore an active internet connection and no firewall rule that blocks applications other than the browser from calling this URL.
If aivis returns a licensing error, please check the following items before contacting aivis Support:
- The licensing key must have the form <FirstPartOfKey>.<SecondPartOfKey>, with the first and second parts being UUIDs. In particular, there must be no whitespace.
- Call https://v3.aivis-engine-v2.vernaio-licensing.com in your browser. The expected outcome is "Method Not Allowed". In that case, at least the URL is not generally blocked.
All artifacts use CSV as the input data format. As the CSV format is highly non-standardized, we will discuss it briefly in this section.
CSV files must be stored in a single folder specified in the config under data.folder. Within this folder the CSV files can reside in an arbitrary subfolder hierarchy. In some cases (e.g. for HTTP requests), the folder must be passed as a ZIP file.
General CSV rules:
- Records are separated by line breaks (CR LF/LF). In other words, each record must be on its own line.

Special rules:
- One column must be named id and contain the row_ids.
- All other columns, apart from id, are interpreted as "columns" of tabular data.
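As an illustration (this is not the shipped train_ra.csv), a conforming CSV file for the football example above could look like this, with the id column holding the row_ids and an empty cell for the missing Injury value:

id,Score,Win,Loss,Draw,Home,Injury
FC Blue,109,10,2,3,City A,true
FC Red,92,8,3,4,City A,false
FC Black,78,6,3,6,City B,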
Here, we will analyze the target column with aivis Response Analysis engine.
At the beginning, we create a folder docker, a subfolder analysis-config and add the configuration file config.yaml:
data:
folder: /srv/data
dataTypes:
defaultType: FLOAT
stringColumns: # this field is optional
- "status_of_existing_checking_account"
- "purpose"
- "savings_account"
- "credit_history"
- "marital_status_and_sex"
- "other_debtors"
- "property"
- "other_installment_plans"
- "housing"
- "job"
- "telephone"
- "foreign_worker"
analysis:
dataFilter: # this field is optional
excludeColumns:
- telephone
target:
kpi: TARGET
interest: HIGH_KPI
strategy: # this field is optional
minimalFractionOfRecords: 0.05
output:
folder: /srv/output
Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The optional field strategy contains minimalFractionOfRecords, which dictates the minimal fraction of records in the leaf nodes.
For the moment, you may take this file as it is. The different keys will become clearer from the later sections and the docker reference manual.
As a next step, we create a second folder data and add the Input Data CSV file train_ra.csv to the folder. Afterwards, we create a blank folder output.
Our folder structure should now look like this:
+- docker
| +- analysis-config
| +- config.yaml
|
+- data
| +- train_ra.csv
|
+- output
Finally, we can start our analysis via:
docker run --rm -it \
-v $(pwd)/docker/analysis-config:/srv/conf \
-v $(pwd)/data/train_ra.csv:/srv/data/train_ra.csv \
-v $(pwd)/output:/srv/output \
docker-releases.artifacts.vernaio.com/vernaio/aivis-engine-v2-ra-worker:{VERSION}
docker run --rm -it `
-v ${PWD}/docker/analysis-config:/srv/conf `
-v ${PWD}/data/train_ra.csv:/srv/data/train_ra.csv `
-v ${PWD}/output:/srv/output `
docker-releases.artifacts.vernaio.com/vernaio/aivis-engine-v2-ra-worker:{VERSION}
After a short time, this results in an output file analysis-report.json in the output folder.
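If you want a quick look at the result before visualizing it, a minimal hedged Python sketch (making no assumptions about the report's exact structure) is:

import json

# peek into the generated report; adjust the path to your output folder
with open("output/analysis-report.json", encoding="utf-8") as f:
    report = json.load(f)
print(list(report)[:10])  # top-level keys, assuming the report is a JSON object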
aivis Response Analysis engine outputs a report, which can be visualized as follows.
On the very left, one finds the root node that contains all records.
Suppose TARGET values range from 0 to 1.
The mean KPI in the root node is 0.7, which is the average value of TARGET
entries of all records. The root node splits out to two child nodes.
Each child node is reached by a predicate, which is expressed as, for example,
A < x, where A is a column_id, and x is a value that splits the records
of the node. It is the core engine algorithm that picks the column and the
value to best split the node.
A child node is itself again split into two nodes and so on.
Splitting stops when a stopping criterion is met, for example when a further split would leave a node with less than the configured minimal fraction of records (see the strategy section below).
The final un-split nodes are called leaf nodes (the square nodes). Not only are those records that belong to leaf nodes
grouped to have similar (if not the same) TARGET values, but they also tend to have extreme values.
This implies that the "good" and "bad" records are well separated. aivis Response Analysis engine identifies the most informative pathways to arrive at "good" and "bad" TARGET value scenarios.
By inspecting the predicates that lead to the leaf nodes, therefore, one can learn the reasons for good (or the cause of bad) performance of the KPI.
In this section, we discuss the results of our example project, German credit health. It contains 1000 people's credit health evaluation, compiled by a bank. The target KPI, which naturally represents the credit health, is binary, i.e., 0 for bad and 1 for good. The global average of the KPI is 0.7 meaning that 700 people (records) have good credit health (1) and 300 bad (0).
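You can verify this global average yourself with a small hedged Python sketch, assuming train_ra.csv contains the TARGET column encoded as 0/1 as described:

import csv

# recompute the global KPI average from the example data
with open("data/train_ra.csv", newline="", encoding="utf-8") as f:
    values = [float(row["TARGET"]) for row in csv.DictReader(f) if row["TARGET"] != ""]
print(sum(values) / len(values))  # should be close to 0.7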
Previous sections gave an introduction on how to use aivis Response Analysis and also shed some light on how it works. The following sections will explain more about the concept and provide a more profound background. It is not necessary to know this background to use aivis Response Analysis! However, you may find convenient solutions for specific problems, or information on how to optimize your usage of aivis Response Analysis. It will become clear that only minimal user input is required for the engine to perform well. Nevertheless, the user has the option to control the process with several input parameters which will be presented below.
First, an overview of all kinds of possible configuration keys is presented. A more minimal analysis configuration was used above in SDK analysis, respectively, in Docker analysis. This example may mainly serve as a quick reference. The meaning of the different keys is explained in the following sections, and a definition of the syntax is given in the reference manuals.
analysis:
dataFilter:
excludeColumns:
- telephone
# includeColumns: ... either exclude or include columns
target:
kpi: TARGET
interest: HIGH_KPI # or LOW_KPI
weight: MY_WEIGHT_COLUMN
columns:
- column: COLUMN_1
interpreter:
_type: Categorical
- column: COLUMN_2
interpreter:
_type: Numerical
- column: COLUMN_3
interpreter:
_type: Numerical
quantileCount: 100
strategy:
minimalFractionOfRecords: 0.05
analysis_config = json.dumps({
"dataFilter": {
"excludeColumns": ["telephone"]
# "includeColumns": ... either exclude or include columns
},
"target": {
"kpi": "TARGET",
"interest": "HIGH_KPI",
"weight": "MY_WEIGHT_COLUMN",
},
"columns": [
{
"column": COLUMN_1,
"interpreter": {
"_type": "Categorical"
}
},
{
"column": COLUMN_2,
"interpreter": {
"_type": "Numerical"
}
},
{
"column": COLUMN_3,
"interpreter": {
"_type": "Numerical",
"quantileCount": 100,
}
},
],
"strategy": {
"minimalFractionOfRecords": 0.05
}
})
final DtoAnalysisConfig analysisConfig =
new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI).withWeight("MY_WEIGHT_COLUMN")).withDataFilter(
new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" })
)
// .withDataFilter(new DtoTabularDataFilter().withIncludeColumns(new String[] {"..."})) either exclude or include columns
.withColumns(
new IDtoColumnConfig[] {
new DtoColumnConfig("COLUMN_1").withInterpreter(new DtoCategoricalColumnInterpreter()),
new DtoColumnConfig("COLUMN_2").withInterpreter(new DtoNumericalColumnInterpreter()),
new DtoColumnConfig("COLUMN_3").withInterpreter(new DtoNumericalColumnInterpreter().withQuantileCount(100)), }
)
.withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05));
const char *analysis_config = "{"
"\"dataFilter\": {"
"\"excludeColumns\": ["
"\"telephone\","
"]",
//"\"includeColumns\": [...]" either exclude or include columns
"}",
"\"target\": {"
"\"kpi\": \"TARGET\","
"\"interest\": \"HIGH_KPI\","
"\"weight\": \"MY_WEIGHT_COLUMN\","
"}",
"\"columns\" : [{"
"\"column\" : \"COLUMN_1\","
"\"interpreter\": {"
"\"_type\": \"Categorical\""
"}},{"
"\"column\" : \"COLUMN_2\","
"\"interpreter\": {"
"\"_type\": \"Numerical\""
"}},{"
"\"column\" : \"COLUMN_3\","
"\"interpreter\": {"
"\"_type\": \"Numerical\","
"\"quantileCount\": 100"
"}}"
"],"
"\"strategy\" : {"
"\"minimalFractionOfRecords\": 0.05,"
"}"
"}";
The following sections list and explain the parameters the user may configure to control the analysis. The sections are organized along the structure of the configuration classes.
The data filter allows you to define the columns that are used for the analysis. This can be done in one of two ways: provide a list of column IDs to exclude (exclude columns) or, alternatively, a list of column IDs to include (include columns).
The target column represents the KPI of the analysis. It must therefore clearly reflect the goal of the analysis.
The field kpi takes the ID of the target column, and interest decides whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the KPI.
The field weight is optional and takes a column_id as input. Each cell of this column represents a positive float that defines how much weight the corresponding TARGET cell should carry, e.g., a weight of 2.0 behaves like two identical records of weight 1.0.
If the weight is 0.0, or if the weight value is missing for some row, this row is excluded from analysis.
If there is no column that exactly matches your definition of the target or weight, you may want to express it as a function of other columns. The easiest way to do so is via the expression language. If you construct the target via some expression, all columns used in this expression are automatically excluded from the analysis.
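For instance, if your data contained the hypothetical columns good_parts and total_parts instead of a ready-made KPI, a sketch of such a synthetic target expression could be:

// hypothetical columns; yields the fraction of good parts as the target KPI
c("good_parts") / c("total_parts")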
The column configuration is the place to pass additional information about a column in order to enforce special treatment. Each column configuration refers to one specific column.
At the core of the column configuration is the interpreter. The interpreter defines which predicates can be built from a column. Very often the default configuration is the best choice and you don't need to set any interpreter. Below you find a table on the different interpreters, followed by some more in-depth explanations.
By default, all float columns are interpreted as numerical, and all string and boolean columns are interpreted as categorical.
The numerical interpreter should be used for all columns for which the order of numbers is meaningful. For the aivis Response Analysis engine, the numerical interpreter takes an additional optional argument, the quantile count, whose default value is 20.
The quantile count sets a resolution.
Imagine a numerical column A whose cell values range from 0 to 100. Suppose the quantile count is 5. This means the engine will consider the values of this column up to a resolution of (100-0)/5 = 20, so it will generate 4 different predicates (plus their negations):
- A larger than 20?
- A larger than 40?
- A larger than 60?
- A larger than 80?

String and boolean columns are always interpreted as categorical. Categorical data has a nominal scale, i.e., it takes only specific levels and does not necessarily follow any order. In practice, this would express information about certain states, such as "green", "red", or "blue". This information may be present in the form of strings, booleans, or also encoded in numbers. An example could be a column for which "1.0" stands for "pipe open", "2.0" for "pipe blocked", and "3.0" for "pipe sending maintenance alarm". A categorical column A with possible values a1, a2 and a3 will currently generate 3 predicates (plus their negations):
- A equal to a1?
- A equal to a2?
- A equal to a3?

One may now wonder: if any column can be interpreted as categorical, why does the numerical interpreter exist?
- For numerical data, the order of values is meaningful, so it is usually more informative to build predicates like COLUMN_ID < 10 instead of COLUMN_ID != 10.
- If a column has N unique values and is interpreted as categorical, the engine will create N predicates that all need to be considered. Therefore, if N is large, it is recommended to interpret the column as numerical with a quantile count smaller than N.

The field strategy contains a sub-field called minimal fraction of records, which defines a stopping criterion for splitting nodes: a node is not split further if the split would produce a node containing a smaller fraction of records than this value.
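Putting these options together, a hedged Python sketch of an analysis configuration that forces the categorical interpreter on a hypothetical float-encoded state column (like the pipe example above) and sets the stopping criterion could look like this:

import json

# "pipe_state" is a hypothetical column encoding states as 1.0/2.0/3.0,
# so the categorical interpreter is forced instead of the numerical default
analysis_config = json.dumps({
    "target": {"kpi": "TARGET", "interest": "HIGH_KPI"},
    "columns": [
        {"column": "pipe_state", "interpreter": {"_type": "Categorical"}}
    ],
    "strategy": {"minimalFractionOfRecords": 0.05}
})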
As a result of the analysis, a report is produced, which contains a tree model. With an appropriate visualization tool, one can render the tree model from the report.
Before starting the workflow, there is sometimes a need to add a new column to the dataset (a synthetic column) that is derived from other columns already present. There can be various reasons for this.
Technically, you can add synthetic columns using the Docker images or any SDK Data API.
To create new synthetic columns in a flexible way, aivis features a rich Expression Language to articulate the formula.
The Expression Language is an extension of the scripting language Rhai. We have mainly added support for handling columns natively. This means you can use columns in normal operators and functions as if they were primitive values. You can even mix columns and primitive values in the same invocation. If at least one parameter is a column, the result will also be a column. The list of operators and functions that allow native column handling can be found in the section on operators and functions.
Information on the basic usage of the language can be found in the very helpful Language Reference of the Rhai Book. This documentation will mainly focus on the added features.
A column consists of a list of data points that represents a series (row ids and values of the same type).
The following value types are supported:
- bool : Boolean
- i64 : 64-bit Integer
- f64 : 64-bit Floating Point
- string : UTF-8 String

A column type and its value type are written generically as column<T> and specifically like, e.g., column<i64> for an integer column.
It is not possible to write down a column literally, but you can refer to an already existing column in your dataset.
Referring to an already existing column is done via:
c(column_id: string literal): column<T>
This function must be used exactly with the syntax above. It is not allowed to invoke it as a method on the column ID. The column ID must be a simple literal without any inner function invocation!
Examples:
c("my column id") // OK
c("my c" + "olumn id") // FAIL
"my column id".c() // FAIL
To begin, let's start with a very simple example. Let "a" and "b" be the IDs of two float columns. Then
c("a") + c("b")
yields the sum of the two columns. The Rhai + operator has been overloaded to work directly on columns (as have many other operators, see below). Therefore, the above expression yields a new column. It contains data points for all rows for which there are entries in both "a" and "b".
However, you might instead want data points for all rows for which there are entries in "a" or in "b". A possible solution for this case is providing default values for each column:
c("a").or(1.0) + c("b").or(0.0)
For more details on handling missing values, see below.
As mentioned above, there are many more functions available for which there is native column support, and you can mix columns with primitive values:
(2 * PI() * (c("x") +0.5)).sin()
If these functions do not satisfy your needs, there is always the alternative not to work on the level of the columns but on the level of the primitive column entries.
The following expression creates a string column that contains the string large for each row for which the float column "x" is larger than 10, and the string small otherwise.
let val = c("x").at_or(id, 0.0);
if (val > 10.0) {
"large"
} else {
"small"
}
As explained below, the default value 0.0 in the at_or function has no effect in this case.
More information on the function at_or and the literal id can be found in the dedicated section.
There are a few functions for handling missing values, as listed below. But before going into details, it is important to understand how the Expression Language deals with the fact that different columns can have data points for different row IDs. Essentially, there are three steps involved:
Note: Rows that are not part of any column in the expression are not known to the expression and therefore cannot be handled, not even as missing rows.
The functions to handle missing values are:
- filter(column: column<T>, condition: column<bool>): column<T> – returns a new column with the same values as column for all rows for which condition is true. For all other rows, values are missing in the returned column.
- is_missing(column: column<T>): column<bool> – returns a new column that is true for all rows for which a value is missing in column, and false for rows for which column has a value.
- or(column_1: column<T>, column_2: column<T>): column<T> – returns a new column with the same values as column_1 but replacing missing values by column_2
- or(column_1: column<T>, default: T): column<T> – returns a new column with the same values as column_1 but replacing missing values by some default value
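For example, a hedged expression combining these functions, with "x" and "x_backup" as hypothetical column IDs:

// keep values of "x" only where "x" is positive, fall back to "x_backup" where missing
c("x").filter(c("x") > 0.0).or(c("x_backup"))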
Moreover, the following function also replaces missing values:
- at_or(column: column<T>, row_id: string, default: T): T – returns the column value at a given row.
While the column-level functions are already powerful, you can unlock the full capabilities of Rhai by operating on primitive values instead of columns. To bridge this gap and work with individual row values (the primitives), you can use the at_or function in conjunction with the special literal id.
- The at_or function allows you to retrieve the value of a specific row from a column. For instance, c("x").at_or("A", 0.0) will return the value from column "x" at row "A" if it exists; otherwise it will return the default value of 0.0.
- The special literal id is replaced by the current row ID. For instance, you may write c("x").at_or(id, 0.0). This will return the value from column "x" for the current row (or return the default value).
The point is that the expression is evaluated iteratively for all rows, and the results are collected into a series.
A motivating example was already presented in the example section.
The following aggregation functions reduce a column to a single primitive value:
- avg(column: column<f64>): f64 – returns the average over all rows. Any occurrence of nan is ignored.
- count(column: column<T>): i64 – returns the number of (non-missing) rows.
- max(column: column<f64>): f64 – returns the maximum of all values in the column. Any occurrence of nan is ignored.
- median(column: column<f64>): f64 – returns the median of the values in the column. Any occurrence of nan is ignored.
- min(column: column<f64>): f64 – returns the minimum of all values in the column. Any occurrence of nan is ignored.
- mode(column: column<bool/i64/string>): bool/i64/string – returns the value that occurs most often in the column. If there are several values with the same count, it returns the last element (according to the respective order).
- sum(column: column<f64>): f64 – returns the sum of all values in the column. Any occurrence of nan is ignored.
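Aggregations can be mixed with column arithmetic; for instance, a hedged sketch that centers a hypothetical column "x" around its mean:

// subtract the column average from every row value
c("x") - c("x").avg()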
Here, we list all other functions that do not have a direct Rhai counterpart (in contrast to the section on overloaded operators and functions):
- row_ids(column: column<T>): column<i64> – returns a new column constructed from the given one, where the value of each data point is set to the timestamp
See:
The following operators are defined:
- Unary operators: +(i64/f64): i64/f64, -(i64/f64): i64/f64
- Binary arithmetic operators: +(i64/f64, i64/f64): i64/f64, -(i64/f64, i64/f64): i64/f64, *(i64/f64, i64/f64): i64/f64, /(i64/f64, i64/f64): i64/f64, %(i64/f64, i64/f64): i64/f64, **(i64/f64, i64/f64): i64/f64
- Bitwise operators: &(i64, i64): i64, |(i64, i64): i64, ^(i64, i64): i64, <<(i64, i64): i64, >>(i64, i64): i64
- Boolean operators: !(bool): bool, &(bool, bool): bool, |(bool, bool): bool, ^(bool, bool): bool
- String concatenation: +(string, string): string
- Comparison operators (returning false on different argument types): ==(bool/i64/f64/string, bool/i64/f64/string): bool, !=(bool/i64/f64/string, bool/i64/f64/string): bool, <(i64/f64, i64/f64): bool, <=(i64/f64, i64/f64): bool, >(i64/f64, i64/f64): bool, >=(i64/f64, i64/f64): bool

Binary arithmetic and comparison operators can handle mixed i64 and f64 arguments properly; the other parameter is then implicitly converted beforehand via to_float. Binary arithmetic operators will return f64 if at least one f64 argument is involved.
See:
The following functions are defined:
Math functions:
- abs(i64/f64): i64/f64
- sign(i64/f64): i64
- sqrt(f64): f64
- exp(f64): f64
- ln(f64): f64
- log(f64): f64
- log(f64, f64): f64
- sin(f64): f64
- cos(f64): f64
- tan(f64): f64
- sinh(f64): f64
- cosh(f64): f64
- tanh(f64): f64
- asin(f64): f64
- acos(f64): f64
- atan(f64): f64
- asinh(f64): f64
- acosh(f64): f64
- atanh(f64): f64
- hypot(f64, f64): f64
- atan(f64, f64): f64

Rounding functions:
- floor(f64): f64
- ceiling(f64): f64
- round(f64): f64
- int(f64): f64
- fraction(f64): f64

String functions:
- contains(string): bool
- len(string): i64
- trim(string): string – with whitespace characters as defined in UTF-8
- to_upper(string): string
- to_lower(string): string
- sub_string(value: string, start: i64, end: i64): string

Conversion functions:
- to_int(bool): i64 – returns 1/0
- to_float(bool): f64 – returns 1.0/0.0
- to_string(bool): string – returns "true"/"false"
- to_float(i64): f64
- to_string(i64): string
- to_int(f64): i64 – returns 0 on NaN; values beyond INTEGER_MAX/INTEGER_MIN are capped
- to_string(f64): string
- to_degrees(f64): f64
- to_radians(f64): f64
- parse_int(string): i64 – throws error if not parsable
- parse_float(string): f64 – throws error if not parsable

Check functions:
- is_zero(i64/f64): bool
- is_odd(i64): bool
- is_even(i64): bool
- is_nan(f64): bool
- is_finite(f64): bool
- is_infinite(f64): bool
- is_empty(string): bool

Min/max functions:
- max(i64/f64, i64/f64): i64/f64
- min(i64/f64, i64/f64): i64/f64

Comparison operators can handle mixed i64 and f64 arguments properly; the other parameter is then implicitly converted beforehand via to_float. They will return f64 if at least one f64 argument is involved.
The Boolean conversion and comparison functions have been added and are not part of the official Rhai.
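Since these functions also support native column handling as described above, a hedged sketch that normalizes a string column such as purpose from the example data before it is used as a categorical input could be:

// trim whitespace and lower-case every entry of the column
c("purpose").trim().to_lower()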
The following constants are defined in Rhai:
- PI(): f64 – Archimedes' constant: 3.1415...
- E(): f64 – Euler's number: 2.718...

Usually the workflow steps will run as part of two different service applications: the Training App and the Inference App.
The diagrams below display typical blueprints of these service applications using different available components of the engine as well as where they might be located in the end-customer infrastructure landscape (execution environments).
The following color code is used:
The service application Training App covers the workflow step Training, as well as any bulk inference, e.g., for historical evaluation.
It is executed in the so-called Cold World, which means that it consists of long-running tasks that are executed infrequently and have high resource consumption. Training App works on historical data that was previously archived and thus needs to be retrieved in an additional step from the Data Lake / Cold Storage.
Because of its high resource consumption, it is usually not located in the OT network, but is a good fit for the cloud or an on-premise datacenter.
The service application Inference App provides the means for live prediction.
In contrast to the Training App, it runs within the Hot World. Usually it is an ongoing process which serves to predict the current value and only needs minimal resources. Inference App works on live data that is easily available from the Historian / Hot Storage.
As the outcome often influences the production systems (e.g., Advanced Process Control), it usually runs in the OT network. Thanks to low resource consumption, it can run on practically any environment/device, be it in the cloud, on-premise, on-edge or even embedded.
aivis engine v2 toolbox is not an official part of aivis engine v2 but an associated side project. It mainly provides tools to turn output artifacts of aivis engine v2 into technical, single-file HTML reports for data scientists. Its API and behavior are experimental and subject to change. Users should already know the concepts of aivis engine v2 beforehand.
Caveats:
The aivis engine v2 toolbox does not need a licensing key. Its Python code is free to look into or even adapt. The respective toolbox release belonging to an aivis engine v2 release {VERSION} is available as:
- as a Python wheel: aivis_engine_v2_toolbox-{VERSION}-py3-none-any.whl
- as a docker image: aivis-engine-v2-toolbox:{VERSION}

Each call to construct a toolbox HTML report for engine xy has the following structure:
from aivis_engine_v2_toolbox.api import build_xy_report
config = {
"title": "My Use Case Title",
...
"outputFile": "/path/to/my-use-case-report.html"}
build_xy_report(config)
Additionally, the config needs to contain references to the respective engine's output files, e.g., "analysisReportFile": "/path/to/analysis-report.json". The full call to create a report for any engine can, for example, be found in the Python or Argo examples of the respective engine.
There are many optional expert configurations to customize your HTML report. Some examples:
The aivis engine v2 toolbox always assumes timestamps to be Unix and translates them to readable dates. This behavior can be switched off via "advancedConfig": {"unixTime": False}, so that timestamps always remain long values.
By referring to a metadata file via "metadataFile": "/path/to/metadata.json", signals are not only described via their signal ID but enriched with more information. The metadata JSON contains an array of signals with the keys id (required) as well as name, description, unitSymbol, unitType (all optional):
{
"signals": [{
"id": "fa6c65bb-5cee-45fa-ab19-355ba94889e9",
"name": "et 1",
"description": "extruder temperature nr. 1",
"unitName": "Kelvin",
"unitSymbol": "K"
}, {
"id": "dc3477e5-a83c-4485-b7f4-7528d336d9c4",
"name": "abc 2"
},
...
]}
Additional signals can be added to every HTML report that contains a time series plot, so that they are displayed as well. However, not all signals of the dataset are included automatically, since a full dataset typically contains more data than should be put into a single-file HTML report.
All custom configuration options can be seen in the api.py file in src/aivis_engine_v2_toolbox.