aivis Engine v2 - Response Analysis - User Guide

aivis Response Analysis is one of the engines of the aivis Technology Platform by Vernaio. It provides breakdowns of the input data with respect to a goal, e.g., a key performance indicator (KPI), so that users can learn not only how to achieve or optimize the goal but also what causes sub-optimal outcomes under given circumstances. This is done by analyzing the relationships between the data the user wants to optimize and all other input data. aivis Response Analysis tells users how to accomplish their goals with clear and actionable instructions. This explainability is one of the key strengths of the engine and differentiates it from conventional machine learning models.

By improving traditional decision/regression tree models with novel mathematics, aivis Response Analysis achieves cutting-edge performance in identifying the causes of disruptions and in finding ways to accomplish user-defined goals, while requiring minimal input and configuration from the user.

The engine generates an analysis report based on historical tabular data that includes a column representing the user's objective, such as a KPI.

Introduction

API References

This documentation explains the usage and principles behind aivis Response Analysis to data and software engineers. For detailed API descriptions of docker images, web endpoints and SDK functions, please consult the reference manual of the respective component:

For additional support, go to Vernaio Support.

Artifact Distribution

Currently, aivis Response Analysis is distributed to a closed user base only. To gain access to the artifacts, as well as for any other questions, you can open a support ticket via aivis Support.

Tabular Data / Columns

Unlike other aivis engines, the Response Analysis engine takes tabular data as input. It is thus important to understand the concept of tabular data in the context of the Response Analysis engine. This chapter explains the terminology as well as the required format of tabular data.

A typical example of tabular data is a score board, say, of a football league.

Team Score Win Loss Draw Home Injury
FC Blue 109 10 2 3 "City A" true
FC Red 92 8 3 4 "City A" false
FC Black 78 6 3 6 "City B"
... ... ... ... ... ... ...

Tabular data consists of columns. Every column contains two things:

  • a column_id, which is an arbitrary string, like Score or Win. The column_id needs to be unique within the data.
  • a list of cells, each of which is a tuple of (row_id, value), like (FC Blue, 109).

Tabular data fed into the engine may look like this:

Score Win Loss Draw Home Injury
(FC Blue, 109) (FC Blue, 10) (FC Blue, 2) (FC Blue, 3) (FC Blue, "City A") (FC Blue, true)
(FC Red, 92) (FC Red, 8) (FC Red, 3) (FC Red, 4) (FC Red, "City A") (FC Red, false)
(FC Black, 78) (FC Black, 6) (FC Black, 3) (FC Black, 6) (FC Black, "City B")

Here, Score, Win, Loss, etc., are column_ids, and the data entries are cells. The value of a cell can be a float, a string, or a boolean. The aivis Response Analysis engine can handle empty cells. In aivis we also call a row a record. In the above case, we have 3 records.
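
To make the column and cell terminology concrete, the following plain-Python sketch (an illustration, not the SDK API) models the scoreboard above:

# Plain-Python illustration of the tabular data model (not the SDK API):
# each column maps a column_id to a list of (row_id, value) cells.
columns = {
    "Score": [("FC Blue", 109.0), ("FC Red", 92.0), ("FC Black", 78.0)],
    "Win": [("FC Blue", 10.0), ("FC Red", 8.0), ("FC Black", 6.0)],
    "Home": [("FC Blue", "City A"), ("FC Red", "City A"), ("FC Black", "City B")],
    # "Injury" has an empty cell for FC Black, so that cell is simply absent:
    "Injury": [("FC Blue", True), ("FC Red", False)],
}

# every record (row) is identified by its row_id
row_ids = {row_id for cells in columns.values() for row_id, _ in cells}
print(len(row_ids))  # 3 records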

Float Cell

Numbers are stored as 64-bit floating point numbers. They are written in scientific notation like -341.4333e-44: they consist of the compulsory part, the Significand, and an optional part, the Exponent, separated by an e or E.

The Significand contains one or multiple digits and optionally a decimal separator (.). In that case, the digits before or after the separator can be omitted and are assumed to be 0. It can be prefixed with a sign (+ or -).

The Exponent contains one or multiple digits and can be prefixed with a sign, too.

The 64-bit Floating Point specification also allows for 3 non-finite values (not a number, positive infinity and negative infinity) that can be written as nan, inf/+inf and -inf (case insensitive). These values are valid, but the engine regards them as being unknown and they are therefore skipped.

Regular expression: (?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?

String Cell

String values must be encoded as UTF-8. Empty strings are regarded as being unknown values and are therefore skipped.

Boolean Cell

Boolean values must be written in one of the following ways:

  • true/false (case insensitive)
  • 1/0
  • 1.0/0.0 with an arbitrary number of additional zeros at the end

Regular expression: (?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?
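
As an illustration of the cell types, the following plain-Python sketch (not part of the SDK) classifies raw values with the two regular expressions given above. Note that in the engine, the interpretation is determined by the configured data type of the column, not guessed per value:

import re

# regular expressions from the float and boolean cell sections above
FLOAT_RE = re.compile(r"(?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?")
BOOL_RE = re.compile(r"(?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?")
NON_FINITE_RE = re.compile(r"(?i:nan)|[+-]?(?i:inf)")

def classify(raw: str) -> str:
    if raw == "":
        return "unknown (empty, skipped)"
    if NON_FINITE_RE.fullmatch(raw):
        return "float (non-finite, skipped as unknown)"
    if BOOL_RE.fullmatch(raw):
        return "boolean"  # a column's configured type decides in the engine
    if FLOAT_RE.fullmatch(raw):
        return "float"
    return "string"

for raw in ["-341.4333e-44", "1.000", "TRUE", "-inf", "City A", ""]:
    print(f"{raw!r}: {classify(raw)}")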

Workflow

aivis Response Analysis engine performs an analysis on the input tabular data. The tabular data should contain a column that the user wants to analyze, such as a KPI. This column will be referred to as the target. The end result of the engine is the report.

Example Use Case

Equipped with this knowledge of tabular data, we are now ready to use the aivis Response Analysis engine. As an illustrative use case, we will use the engine to learn about German credit health, e.g., which attributes contribute positively and negatively to one's credit health. Each person is represented by a row listing a number of attributes and the person's credit health, the latter of which will be used as the target KPI.

Getting Started (Docker)

The docker images of aivis Response Analysis are prepared for easy usage. They use the SDK internally, but have a simpler file-based interface. If you have a working docker workflow system like Argo, you can build your own automated workflow based on these images.

In this chapter, we will show you how to get started using the docker images. Usage of the SDK is covered in the next chapter.

Run Example Code

A working example that builds on the code explained below can be downloaded directly here: response-analysis-examples.zip.

This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.

Prerequisites: Additionally to the response-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

  • The docker images aivis-engine-v2-ra-worker and (optionally for HTML report generation) aivis-engine-v2-toolbox
  • An aivis licensing key, see licensing

As a Kubernetes user, even without deeper Argo knowledge, the aivis-engine-v2-example-ra-argo.yaml shows best how the containers are executed one after another, how the analysis worker is provided with a folder that contains the data CSV, and how the toolbox assembles an HTML report at the end.

Artifacts

There is one docker image:

  • The Worker creates the report:
    {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

The docker image is Linux-based.

Requirements

You need an installation of Docker on your machine as well as access to the engine artifacts:

docker -v
docker pull {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

Licensing

A valid licensing key is necessary for every aivis calculation in every engine and every component. It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.

If aivis returns a licensing error despite the environment variable being set, please check the following items:

  • Terminals usually need to be restarted to learn newly set environment variables.
  • Licensing keys have the typical form <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there is no whitespace.
  • A common error source is that the user's firewall does not let HTTPS requests to v3.aivis-engine-v2.vernaio-licensing.com pass (before release 2.7: v2.aivis-engine-v2.vernaio-licensing.com; before release 2.3: aivis-engine-v2.perfectpattern-licensing.de), so the licensing request never reaches the licensing server. In that case, outgoing connections to that hostname and TCP port 443 need to be whitelisted.

CSV format

All artifacts use CSV as the input data format. As the CSV format is highly non-standardized, we will discuss it briefly in this section.

CSV files must be stored in a single folder specified in the config under data.folder. Within this folder the CSV files can reside in an arbitrary subfolder hierarchy. In some cases (e.g. for HTTP requests), the folder must be passed as a ZIP file.

General CSV rules:

  • The file’s charset must be UTF-8.
  • Records must be separated by Windows or Unix line ending (CR LF/LF). In other words, each record must be on its own line.
  • Fields must be separated by comma.
  • The first line of each CSV file represents the header, which must contain column headers that are file-unique.
  • Every record including the header must have the same number of fields.
  • Text values must be enclosed in quotation marks if they contain literal line endings, commas or quotation marks.
  • Quotation marks inside such a text value have to be prefixed (escaped) with another quotation mark.

Special rules:

  • One column must be called id and contain the row_ids.
  • All other columns, i.e. the ones that are not called id, are interpreted as "columns" of tabular data.
  • column_ids are defined by their column headers.
  • If there are multiple files containing the same column header, this data is regarded as belonging to the same "column" of tabular data.
  • Column values can be boolean values, numbers and strings.
  • Empty values are regarded as being unknown and are therefore skipped.
  • For a given row_id and column_id, there must be only one value; if there are multiple records for the same row_id and column_id, the engine raises an exception. (Note that this differs from the timeseries rules of other aivis engines!)
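
For illustration, a minimal CSV file following these rules could look like this (using the scoreboard example from above; note the mandatory id column and the empty Injury field of FC Black, which is skipped as unknown):

id,Score,Win,Loss,Draw,Home,Injury
FC Blue,109,10,2,3,City A,true
FC Red,92,8,3,4,City A,false
FC Black,78,6,3,6,City B,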

Analysis

Here, we will analyze the target column with the aivis Response Analysis engine.

At the beginning, we create a folder docker, a subfolder analysis-config and add the configuration file config.yaml:

data:
  folder: /srv/data
  dataTypes:
    defaultType: FLOAT
    stringColumns: # this field is optional
    - "status_of_existing_checking_account"
    - "purpose"
    - "savings_account"
    - "credit_history"
    - "marital_status_and_sex"
    - "other_debtors"
    - "property"
    - "other_installment_plans"
    - "housing"
    - "job"
    - "telephone"
    - "foreign_worker"
analysis:
  dataFilter: # this field is optional
    excludeColumns:
    - telephone
  target:
    kpi: TARGET
    interest: HIGH_KPI
  strategy: # this field is optional
    minimalFractionOfRecords: 0.05
output:
  folder: /srv/output

Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The field strategy is optional; its sub-field minimalFractionOfRecords controls the minimal fraction of records a node must contain (a stopping criterion for splitting).

For the moment, you may take this file as it is. The different keys will become clearer from the later sections and the docker reference manual.

As a next step, we create a second folder data and add the Input Data CSV file train_ra.csv to the folder. Afterwards, we create a blank folder output.

Our folder structure should now look like this:

+- docker
|  +- analysis-config
|      +- config.yaml
|
+- data
|  +- train_ra.csv
|
+- output

Finally, we can start our analysis via:

# Linux/macOS (bash)
docker run --rm -it \
  -v $(pwd)/docker/analysis-config:/srv/conf \
  -v $(pwd)/data/train_ra.csv:/srv/data/train_ra.csv \
  -v $(pwd)/output:/srv/output \
  {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

# Windows (PowerShell)
docker run --rm -it `
  -v ${PWD}/docker/analysis-config:/srv/conf `
  -v ${PWD}/data/train_ra.csv:/srv/data/train_ra.csv `
  -v ${PWD}/output:/srv/output `
  {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

After a short time, this should lead to an output file analysis-report.json in the output folder.

Getting Started (SDK)

The SDK of aivis Response Analysis allows for direct calls from your C, Java or Python program code. All language SDKs internally use our native shared library (FFI). As C APIs can be called from various other languages as well, the C-SDK can also be used with languages such as R, Go, Julia, Rust, and more. Compared to the docker images, the SDK enables a more fine-grained usage and tighter integration.

In this chapter we will show you how to get started using the SDK.

Run Example Code

A working SDK example that builds on the code explained below can be downloaded directly here: response-analysis-examples.zip.

This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.

Additionally to the `response-analysis-examples.zip` you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

Required artifacts:

  • These aivis engine v2 .whl-files which you will receive in a libs.zip directly from aivis Support:
    • aivis_engine_v2_ra_runtime_python_full-{VERSION}-py3-none-win_amd64.whl: A response analysis full python runtime
      (here for windows; choose the one fitting your operating system - see artifacts for other options on linux and macos)
    • aivis_engine_v2_base_sdk_python-{VERSION}-py3-none-any.whl: The base python sdk
    • aivis_engine_v2_ra_sdk_python-{VERSION}-py3-none-any.whl: The response analysis python sdk
    • aivis_engine_v2_toolbox-{TOOLBOX-VERSION}-py3-none-any.whl: The toolbox python sdk - optional for HTML report generation
  • An aivis licensing key, see licensing, which you will receive directly from aivis Support

Preparations:

  • Make sure you have a valid Python (>= 3.9) installation.
  • To apply the aivis licensing key, create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
  • Make sure you have an active internet connection so that the licensing server can be contacted.
  • Download and unzip the response-analysis-examples.zip. The data CSV train_ra.csv needs to stay in **/data.
  • Download and unzip the libs.zip. These .whl-files need to be in **/libs.

The folder now has the following structure:

+- data
|  +- train_ra.csv
|
+- docker
|  +- # files to run the example via docker images, which we will not need now
|
+- java
|  +- # files to run the example via java sdk, which we will not need now 
|
+- libs
|  +- # the .whl files to run aivis
|
+- python
|  +- # files to run the example via python sdk 

Running the example code:

  • Navigate to the **/python subfolder. Here, you find the classic python script example_ra.py and the jupyter notebook example_ra.ipynb. Both run the exact same example and output the same result. Choose which one you want to run.
  • There are various ways to install dependencies from .whl files. We will now explain two options: installing them via pip install or via poetry. Many other options are also possible, of course.

Option A: pip install (only for the classic python script example_ra.py, not for the jupyter notebook example_ra.ipynb)

  • Open a console in the **/python subfolder and run the following commands:
      # installs the `.whl` files
      pip install -r requirements-<platform>.txt
    
      # runs the classic python script `example_ra.py`
      python example_ra.py --input=../data --output=output
    

Option B: poetry install

  • If you have not already done so, install poetry, a python package manager:
      # installs poetry (a package manager)
      python -m pip install poetry
    
  • Run either the classic python script example_ra.py
      # installs the `.whl` files
      poetry install --no-root
    
      # runs the classic python script `example_ra.py`
      poetry run python example_ra.py --input=../data --output=output
    
  • Or run the jupyter notebook example_ra.ipynb by executing the following commands in a console opened in the **/python subfolder. The first one might take a while; the third one opens a tab in your browser.
      # installs the `.whl` files
      poetry install --no-root
    
      # installs jupyter kernel
      poetry run ipython kernel install --user --name=test_ra
    
      # runs the jupyter python script `example_ra.ipynb`
      poetry run jupyter notebook example_ra.ipynb
    

After running the scripts, you will find your computation results in **/python/output.

Additionally to the response-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

Required artifacts:

  • These aivis engine v2 .jar files which you will receive in a libs.zip directly from aivis Support:
    • aivis-engine-v2-ra-runtime-java-full-win-x8664-{VERSION}.jar: A response analysis full Java runtime (here for windows; choose the one fitting your operating system - see artifacts for other options on linux and macos)
    • aivis-engine-v2-base-sdk-java-{VERSION}.jar: The base java sdk
    • aivis-engine-v2-ra-sdk-java-{VERSION}.jar: The response analysis java sdk
    • There is NO toolbox jar for HTML report generation.
  • An aivis licensing key, see licensing, which you will receive directly from aivis Support

Preparations:

  • Make sure you have a valid Java (>= 11) installation.
  • To apply the aivis licensing key, create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
  • Make sure you have an active internet connection so that the licensing server can be contacted.
  • Download and unzip the response-analysis-examples.zip. The data CSV train_ra.csv needs to stay in **/data.
  • Download and unzip the libs.zip. These .jar-files need to be in **/libs.

The folder now has the following structure:

+- data
|  +- train_ra.csv
|
+- docker
|  +- # files to run the example via docker images, which we will not need now
|
+- java
|  +- # files to run the example via java sdk 
|
+- libs
|  +- # the .jar files to run aivis
|
+- python
|  +- # files to run the example via python sdk, which we will not need now 

Running the example code:

  • We use Gradle as our Java package manager. It is easiest to directly use the gradle wrapper.
  • Navigate to the **/java subfolder. Here, you find the build.gradle. Check whether the paths point correctly to your aivis engine v2 .jar files in the **/libs subfolder.
  • Open a console in the **/java subfolder and run the following commands:
      # builds this Java project with gradle wrapper
      ./gradlew clean build
    
      # runs Java with parameters referring to input and output folder
      java -jar build/libs/example_ra.jar --input=../data --output=output
    

After running the scripts, you will find your computation results in **/java/output.

Artifacts

Our SDK artifacts come in one flavor, full (an inf flavor will be introduced in a future release):

  • full packages provide the full functionality and are available for mainstream targets only:
    • win-x8664
    • macos-armv8* (macOS 11 "Big Sur" or later; since release 2.3)
    • macos-x8664* (macOS 11 "Big Sur" or later; since release 2.3, until aivis engine version 2.9.0)
    • linux-x8664 (glibc >= 2.14)

* Only Python and C SDKs are supported. Java SDK is not available for this target.

In this chapter we want to demonstrate the full API functionality and thus always use the full package.

To use the Python-SDK, you must download the SDK artifact (flavor- and target-generic) for your pythonpath at build time. Additionally, at installation time, the runtime artifact must be downloaded with the right flavor and target.

The artifacts are distributed through a PyPI registry.

Using Poetry you can simply set a dependency on the artifacts specifying flavor and version. The target is chosen depending on your installation system:

aivis_engine_v2_ra_sdk_python = "{VERSION}"
aivis_engine_v2_ra_runtime_python_{FLAVOR} = "{VERSION}"

To use the Java-SDK, you must download at build time:

  • SDK artifact (flavor and target generic) for your compile and runtime classpath
  • Runtime artifact with the right flavor and target for your runtime classpath

It is possible to include multiple runtime artifacts for different targets in your application to allow cross-platform usage. The SDK chooses the right runtime artifact at runtime.

The artifacts are distributed through a Maven registry.

Using Maven, you can simply set a dependency on the artifacts specifying flavor, version and target:

<dependency>
  <groupId>com.vernaio</groupId>
  <artifactId>aivis-engine-v2-ra-sdk-java</artifactId>
  <version>{VERSION}</version>
</dependency>
<dependency>
  <groupId>com.vernaio</groupId>
  <artifactId>aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}</artifactId>
  <version>{VERSION}</version>
  <scope>runtime</scope>
</dependency>

Alternatively, with Gradle:

implementation 'com.vernaio:aivis-engine-v2-ra-sdk-java:{VERSION}'
runtimeOnly    'com.vernaio:aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}:{VERSION}'

To use the C-SDK, you must download the SDK artifact at build time (flavor and target generic). For final linkage/execution you need the runtime artifact with the right flavor and target.

The artifacts are distributed through a Conan registry.

Using Conan, you can simply set a dependency on the artifact specifying flavor and version. The target is chosen depending on your build settings:

aivis-engine-v2-ra-sdk-c/{VERSION}
aivis-engine-v2-ra-runtime-c-{FLAVOR}/{VERSION}

The SDK artifact contains:

  • Headers: include/aivis-engine-v2-ra-core-full.h

The runtime artifact contains:

  • Import library (LIB file), if Windows target: lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.lib
  • Runtime library (DLL file), if Windows target: bin/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.dll (also containing the import library)
  • Runtime library (SO file), if Linux target: lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.so (also containing the import library)

The runtime library must be shipped to the final execution system.

Licensing

A valid licensing key is necessary for every aivis calculation in every engine and every component. It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.

If aivis returns a licensing error despite the environment variable being set, please check the following items:

  • Terminals usually need to be restarted to learn newly set environment variables.
  • Licensing keys have the typical form <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there is no whitespace.
  • A common error source is that the user's firewall does not let HTTPS requests to v3.aivis-engine-v2.vernaio-licensing.com pass (before release 2.7: v2.aivis-engine-v2.vernaio-licensing.com; before release 2.3: aivis-engine-v2.perfectpattern-licensing.de), so the licensing request never reaches the licensing server. In that case, outgoing connections to that hostname and TCP port 443 need to be whitelisted.

Setup

Before we can invoke API functions of our SDK, we need to set it up for proper usage and consider the following things.

Releasing Unused Objects

It is important to ensure the release of allocated memory for unused objects.

In Python, freeing objects and destroying engine resources is done automatically. You can force resource destruction with the appropriate destroy function.

In Java, freeing objects is done automatically, but you need to destroy all engine resources like Data- and Analysis-objects with the appropriate destroy function. As they all implement Java's AutoCloseable interface, we can also write a try-with-resources statement to auto-destroy them:

try(final ResponseAnalysisData inputData = ResponseAnalysisData.create()) {

  // ... do stuff ...

} // auto-destroy when leaving block

In C, you must always

  • free every non-null pointer allocated by the engine with aivis_free (all pointers returned by functions and all double pointers used as output function parameters, e.g., Error*).
    Note: aivis_free will only free the engine's own objects. Also, it will free objects only once, and it disregards null pointers.
  • free your own objects with free as usual.
  • destroy all handles after usage with the appropriate destroy function.

Error Handling

Errors and exceptions report what went wrong on a function call. They can be caught and processed by the caller.

In Python, an Exception is thrown and can be caught conveniently.

In Java, an AbstractAivisException is thrown and can be caught conveniently.

In C, every API function can write an error to the given output function parameter &err (to disable this, just set it to NULL). This parameter can then be checked by a helper function similar to the following:

const Error *err = NULL;

void check_err(const Error **err, const char *action) {

  // everything is fine, no error
  if (*err == NULL)
    return;

  // print information
  printf("\taivis Error: %s - %s\n", action, (*err)->json);

  // release error pointer
  aivis_free(*err);
  *err = NULL;

  // exit program
  exit(EXIT_FAILURE);
}

Failures within function calls will never affect the state of the engine.

Logging

The engine emits log messages to report on the progress of each task and to give valuable insights. These log messages can be caught via registered loggers.

# create logger
class Logger(EngineLogger):
    def log(self, level, thread, module, message):
        if (level <= 3):
            print("\t... %s" % message)

# register logger
ResponseAnalysisSetup.register_logger(Logger())
// create and register logger
ResponseAnalysisSetup.registerLogger(new EngineLogger() {
            
    public void log(int level, String thread, String module, String message) {
        if (level <= 3) {
            System.out.println(String.format("\t... %s", message));
        }
    }
});
// create logger
void logger(const uint8_t level, const char *thread, const char *module, const char *message) {
  if (level <= 3)
    printf("\t... %s\n", message);
}

// register logger
aivis_setup_register_logger(&logger, &err);
check_err(&err, "Register logger");

Thread Management

During the usage of the engine, a lot of calculations are done. Parallelism can drastically speed things up. Therefore, set the maximal thread count to a limited number of CPU cores, or set it to 0 to use all available cores (the default is 0).

# init thread count
ResponseAnalysisSetup.init_thread_count(4)
// init thread count
ResponseAnalysisSetup.initThreadCount(4);
// init thread count
aivis_setup_init_thread_count(4, &err);
check_err(&err, "Init thread count");

Data Input

Now that we are done setting up the SDK, we need to create a data store that holds our historical tabular data. In general, all data must always be provided through data stores. You can create as many as you want.

After the creation of the data store, you can fill it with column data.

# create empty data context
analysis_data = ResponseAnalysisData.create()

# add sample data
analysis_data.add_float_column("column-id", [
  DtoFloatCell(100, 1.0),
  DtoFloatCell(200, 2.0),
  DtoFloatCell(300, 4.0),
])

# ... use data ...
// create empty data context
try(final ResponseAnalysisData analysisData = ResponseAnalysisData.create()) {

  // add sample data
  analysisData.addFloatColumn("column-id", Arrays.asList(
    new DtoFloatCell(100L, 1.0),
    new DtoFloatCell(200L, 2.0),
    new DtoFloatCell(300L, 4.0)
  ));

  // ... use data ...

} // auto-destroy data
// create empty data context
TabularDataHandle analysis_data = aivis_tabular_data_create(&err);
check_err(&err, "Create analysis data context");

const DtoFloatCell cells[] = {
  {"100", 1.0},
  {"200", 2.0},
  {"300", 4.0},
};

// add sample data
aivis_tabular_data_add_float_column(analysis_data, "column-id", &cells[0], sizeof cells / sizeof *cells, &err);
check_err(&err, "Adding column");

// ... use data ...

// destroy data context
aivis_tabular_data_destroy(analysis_data, &err);
check_err(&err, "Destroy data context");
analysis_data = 0;

Above, we have filled the data store with three hard-coded data cells to illustrate the approach. Usually, you will read in the data from some other source. In the following, we will assume you have read in the file train_ra.csv shipped with the Example Project.
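
As a sketch of that step (plain Python with the standard csv module), the following code fills the data store from the CSV. The string-column handling is an assumption: add_string_column and DtoStringCell are assumed counterparts to the add_float_column and DtoFloatCell shown above, and STRING_COLUMNS must be adjusted to your dataset (compare the stringColumns list in the docker configuration):

import csv

# columns to treat as strings (assumption: extend to all string columns of your dataset)
STRING_COLUMNS = {"purpose", "housing", "job"}

analysis_data = ResponseAnalysisData.create()

with open("../data/train_ra.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for column_id in (c for c in rows[0] if c != "id"):
    # empty cells are regarded as unknown and therefore skipped;
    # row ids are passed as integers, matching the DtoFloatCell usage above
    cells = [(int(row["id"]), row[column_id]) for row in rows if row[column_id] != ""]
    if column_id in STRING_COLUMNS:
        # assumed API, analogous to add_float_column/DtoFloatCell
        analysis_data.add_string_column(column_id, [DtoStringCell(rid, v) for rid, v in cells])
    else:
        analysis_data.add_float_column(column_id, [DtoFloatCell(rid, float(v)) for rid, v in cells])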

Analysis

With the data store filled with historical tabular data, we can now create our analysis:

# build analysis config
import json

analysis_config = json.dumps(
    {
        "dataFilter": {"excludeColumns": ["telephone"]}, # this field is optional
        "target": {
            "kpi": "TARGET",
            "interest": "HIGH_KPI",
        },
        "strategy": { # this field is optional
            "minimalFractionOfRecords": 0.05,
        },
    }
)

# create analysis
analysis = ResponseAnalysis.create(analysis_data, analysis_config)

# ... use analysis ...
// build analysis config
final DtoAnalysisConfig analysisConfig =
  new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI)).withDataFilter(
    new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" }) // this field is optional
  ).withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05)); // this field is optional

// create analysis
try (final ResponseAnalysis analysis = ResponseAnalysis.create(analysisData, analysisConfig)) {

  // ... use analysis ...

} // auto-destroy analysis
// build analysis config
const char *analysis_config = "{"
  "\"dataFilter\": {\"excludeColumns\": [\"telephone\"]}," // this field is optional
  "\"target\": {"
  "\"kpi\": \"TARGET\","
  "\"interest\": \"HIGH_KPI\" "
  "},"
  "\"strategy\": {" // this field is optional
  "\"minimalFractionOfRecords\": 0.05 "
  "}"
  "}";

// create analysis
ResponseAnalysisHandle analysis_handle = aivis_response_analysis_create(
  analysis_data,
  (uint8_t *) analysis_config,
  strlen(analysis_config),
  &err
);
check_err(&err, "Create analysis");

// ... use analysis ...

// destroy analysis
aivis_response_analysis_destroy(analysis_handle, &err);
check_err(&err, "Destroy analysis");
analysis_handle = 0;

Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The field strategy is optional; its sub-field minimalFractionOfRecords controls the minimal fraction of records a node must contain (a stopping criterion for splitting).

For the moment, you may take this configuration as it is. The different keys will become clearer from the later sections and the reference manuals.

Getting Started (Analysis Report)

The aivis Response Analysis engine outputs a report, which can be visualized as follows.

Evaluation

On the very left, one finds the root node, which contains all records. Suppose TARGET values range from 0 to 1. The mean KPI in the root node is 0.7, which is the average of the TARGET entries of all records. The root node splits into two child nodes. Each child node is reached by a predicate, which is expressed as, for example, A < x, where A is a column_id and x is a value that splits the records of the node. The core engine algorithm picks the column and the value that best split the node. A child node is itself again split into two nodes, and so on.
Splitting stops when

  • the ratio of the number of records in the node to the total number of records is smaller than minimal fraction of records (when unspecified, it defaults to 0.01).
  • or TARGET values of all records in the node are the same.

The final un-split nodes are called leaf nodes (the square nodes). Not only are the records that belong to a leaf node grouped to have similar (if not the same) TARGET values, but they also tend to have extreme values. This implies that the "good" and "bad" records are well separated. The aivis Response Analysis engine identifies the most informative pathways to arrive at "good" and "bad" TARGET value scenarios. By inspecting the predicates that lead to the leaf nodes, one can therefore learn the reasons for good (or the causes of bad) KPI performance.
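
To make the stopping rule concrete, here is a minimal plain-Python sketch of the two criteria above (an illustration only, not engine code):

def stop_splitting(node_target_values, total_record_count, minimal_fraction_of_records=0.01):
    """Return True if the node becomes a leaf (illustration of the rule, not engine code)."""
    # criterion 1: the node holds too small a fraction of all records
    if len(node_target_values) / total_record_count < minimal_fraction_of_records:
        return True
    # criterion 2: all TARGET values in the node are identical
    return len(set(node_target_values)) == 1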

Getting Started (Example Results)

In this section, we discuss the results of our example project, German credit health. The dataset contains 1000 people's credit health evaluations, compiled by a bank. The target KPI, which naturally represents credit health, is binary, i.e., 0 for bad and 1 for good. The global average of the KPI is 0.7, meaning that 700 people (records) have good credit health (1) and 300 bad (0).

Above is an interactive report in which one can see the tree model generated by the aivis Response Analysis engine.

Evaluation

Here, we highlighted some example paths along which we explore the analysis results.

Path 1 is the pathway leading to one of the bad leaf nodes. This leaf node's mean KPI is 0.094 with 32 people (from here on, we use "people" and "records" interchangeably), meaning that many who belong to this node have very poor credit health. Our engine identified that, in order to reach this leaf node, one needs to have the following attributes:

  • Having a checking account in the bank (the bank that compiled this data).
  • The duration of the requested loan is longer than 21 months.
  • Having less than 100 Deutsche Mark in the savings account.
    • Having more than 100 DM in the savings account leads to Path 2.
  • The duration of the requested loan is shorter than or equal to 44.8 months.

What can we do to improve credit health? Let us examine Path 2. Path 2 diverges from Path 1 when one has more than 100 DM in the savings account. At the end node of Path 2, which is not a leaf node, the mean KPI is 0.625 with 96 records. From this example, one can already learn that having some savings can substantially improve credit health (0.393 -> 0.625).

Further improvement can be made by looking at Path 2.1. The predicate is

  • credit_history != Existing credits paid back duly till now

49 records belong to this node, with a mean KPI of 0.755. But what does that mean?

credit_history is the column_id of a categorical column. The other categories are:

  • no credits taken/ all credits paid back duly
  • all credits at this bank paid back duly
  • delay in paying off in the past
  • critical account/ other credits existing (not at this bank)

So, in the context of Path 2, if one's credit history is one of the above four (i.e., it is not Existing credits paid back duly till now), then one likely has better credit health: on average 0.755, which is better than the global average of 0.7.

What about people with great credit health? At the ends of Path 3 and Path 4 are nodes in which all records have credit health 1.

In the leaf node of Path 3 are 99 records whose credit health is 1. What these 99 people have in common is:

  • Not having a checking account in the bank.
  • Not having other monthly installment plans.
  • Being at least 30.2 years old.
    • Being younger than 30.2 years leads to Path 4.
  • credit_history = Critical account/ other credits existing (not at this bank)

If you are younger than 30.2 years, you still have a chance. The leaf node of Path 4 represents 41 people with credit health 1. They share the following attributes in addition to the shared attributes of Path 3:

  • Being younger than 30.2 years.
  • The ratio of the installment payment of the requested loan to disposable income is smaller than 4%.
  • Present employment duration is at least 3 years.
  • The reason for the loan is not to buy a new car.

As such, the aivis Response Analysis engine provides insights that are tailored to one's specific needs and situation. Not only does the engine clearly explain the reasons for good and bad performance by means of predicates, but the counter-measures are also actionable, which results in immediate improvements.

Preparation

The previous sections gave an introduction on how to use aivis Response Analysis and also shed some light on how it works. The following sections explain the concepts in more depth and provide a more profound background. It is not necessary to know this background to use aivis Response Analysis! However, you may find convenient solutions for specific problems, or information on how to optimize your usage of aivis Response Analysis. It will become clear that only minimal user input is required for the engine to perform well. Nevertheless, the user has the option to control the process with several input parameters, which are presented below.

Analysis

A Fully Loaded Analysis Configuration

First, an overview of all possible configuration keys is presented. A more minimal analysis configuration was used above in the SDK analysis and in the Docker analysis, respectively. This example may mainly serve as a quick reference. The meaning of the different keys is explained in the following sections, and a definition of the syntax is given in the reference manuals.

analysis:
  dataFilter:
    excludeColumns:
    - telephone
    # includeColumns: ... either exclude or include columns
  target:
    kpi: TARGET
    interest: HIGH_KPI # or LOW_KPI
    weight: MY_WEIGHT_COLUMN
  columns:
  - column: COLUMN_1
    interpreter:
      _type: Categorical
  - column: COLUMN_2
    interpreter:
      _type: Numerical
  - column: COLUMN_3
    interpreter:
      _type: Numerical
      quantileCount: 100
  strategy:
    minimalFractionOfRecords: 0.05
analysis_config = json.dumps({
  "dataFilter": {
    "excludeColumns": ["telephone"]
    # "includeColumns": ... either exclude or include columns
  },
  "target": {
    "kpi": "TARGET",
    "interest": "HIGH_KPI",
    "weight": "MY_WEIGHT_COLUMN",
  },
  "columns": [
    {
      "column": "COLUMN_1",
      "interpreter": {
        "_type": "Categorical"
      }
    },
    {
      "column": "COLUMN_2",
      "interpreter": {
        "_type": "Numerical"
      }
    },
    {
      "column": "COLUMN_3",
      "interpreter": {
        "_type": "Numerical",
        "quantileCount": 100,
      }
    },
  ],
  "strategy": {
    "minimalFractionOfRecords": 0.05
  }
})
final DtoAnalysisConfig analysisConfig =
  new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI).withWeight("MY_WEIGHT_COLUMN")).withDataFilter(
    new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" })
  )
    // .withDataFilter(new DtoTabularDataFilter().withIncludeColumns(new String[] {"..."})) either exclude or include columns
    .withColumns(
      new IDtoColumnConfig[] {
        new DtoColumnConfig("COLUMN_1").withInterpreter(new DtoCategoricalColumnInterpreter()),
        new DtoColumnConfig("COLUMN_2").withInterpreter(new DtoNumericalColumnInterpreter()),
        new DtoColumnConfig("COLUMN_3").withInterpreter(new DtoNumericalColumnInterpreter().withQuantileCount(100)), }
    )
    .withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05));
const char *analysis_config = "{"
  "\"dataFilter\": {"
    "\"excludeColumns\": [\"telephone\"]"
    //"\"includeColumns\": [...]" either exclude or include columns
  "},"
  "\"target\": {"
    "\"kpi\": \"TARGET\","
    "\"interest\": \"HIGH_KPI\","
    "\"weight\": \"MY_WEIGHT_COLUMN\""
  "},"
  "\"columns\": [{"
    "\"column\": \"COLUMN_1\","
    "\"interpreter\": {"
    "\"_type\": \"Categorical\""
  "}},{"
    "\"column\": \"COLUMN_2\","
    "\"interpreter\": {"
    "\"_type\": \"Numerical\""
  "}},{"
    "\"column\": \"COLUMN_3\","
    "\"interpreter\": {"
    "\"_type\": \"Numerical\","
    "\"quantileCount\": 100"
  "}}"
  "],"
  "\"strategy\": {"
    "\"minimalFractionOfRecords\": 0.05"
  "}"
"}";

The following sections list and explain the parameters the user may configure to control the analysis. The sections are organized along the structure of the configuration classes.

Data Filter: Exclude Parts of the Data

The data filter allows you to define the columns that are used for the analysis. This can be done in either of the following ways: exclude columns (a list of column ids to exclude), or, alternatively, include columns (a list of column ids to include).

Target: Define the Goal of The Analysis

The target column reflects the KPI of the analysis. It must therefore clearly reflect the goal of the analysis. The field kpi takes the ID of the target column, and interest decides whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the KPI. The optional field weight takes a column_id as input. Each cell of this column holds a positive float that defines how much weight the corresponding TARGET cell should carry, e.g., a weight of 2.0 behaves like two identical records of weight 1.0.
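
As a quick plain-Python illustration of that statement (not part of the SDK), consider the weighted mean of a node's KPI values:

# KPI values and their weights within a node (illustration only)
kpis    = [1.0, 0.0, 1.0]
weights = [2.0, 1.0, 1.0]

weighted_mean = sum(k * w for k, w in zip(kpis, weights)) / sum(weights)
print(weighted_mean)  # 0.75 -- the same as the plain mean of [1.0, 1.0, 0.0, 1.0]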

Columns Configuration: If Columns Require Special Treatment

The column configuration is the place to pass additional information about a column in order to enforce a special treatment. Each column configuration refers to one specific column.

Interpreter

At the core of the column configuration is the interpreter. The interpreter defines which predicates can be built from a column. Very often, the default configuration is the best choice and you don't need to set any interpreter. Below you find an overview of the different interpreters, followed by some more in-depth explanations.

By default, all float columns are interpreted as numerical and all string and boolean columns are interpreted as categorical.

The numerical interpreter should be used for all columns for which the order of numbers is meaningful. For the aivis Response Analysis engine, the numerical interpreter takes an additional optional argument, namely quantile count, whose default value is 20.

quantile count sets a resolution. Imagine a numerical column A whose cell values range from 0 to 100. Suppose quantile count is 5. This means the engine will consider the values of this column up to a resolution of (100-0)/5 = 20, so it will generate 4 different predicates (plus their negations):

  • is A larger than 20?
  • is A larger than 40?
  • is A larger than 60?
  • is A larger than 80?
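
To make this concrete, here is a small plain-Python sketch that reproduces the evenly spaced cut points of the example above (the engine's actual quantiles are computed from the data distribution, so this is an illustration only):

def example_thresholds(lo, hi, quantile_count):
    # evenly spaced cut points, reproducing the illustration above
    step = (hi - lo) / quantile_count
    return [lo + i * step for i in range(1, quantile_count)]

print(example_thresholds(0, 100, 5))  # [20.0, 40.0, 60.0, 80.0] -> predicates "is A larger than x?"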

String and boolean columns are always interpreted as categorical. Categorical data has nominal scale, i.e., it takes only specific levels and does not necessarily follow any order. In practice, this would express information about certain states, such as "green", "red", or "blue". This information may be present in the form of strings, booleans, or also encoded in numbers. An example could be a column for which "1.0" stands for "pipe open", "2.0" for "pipe blocked", and "3.0" for "pipe sending maintenance alarm". A categorical column A with possible values a1, a2 and a3 will currently generate 3 predicates (plus their negations):

  • is A equal to a1?
  • is A equal to a2?
  • is A equal to a3?

One may now wonder: if any column can be interpreted as categorical, why does the numerical interpreter exist?

  • If ordering matters, then a column is better interpreted as numerical, since the resulting predicate on a numerical column will be, e.g., COLUMN_ID < 10 instead of COLUMN_ID != 10.
  • If a numerical column contains N unique values and is interpreted as categorical, the engine will create N predicates that all need to be considered. Therefore, if N is large, it is recommended to interpret the column as numerical with a quantile count smaller than N.

Strategy

The field strategy contains a sub-field called minimal fraction of records, which defines a stopping criterion for splitting nodes. Splitting stops when

  • the ratio of the number of records in the node to the total number of records is smaller than minimal fraction of records (when unspecified, it defaults to 0.01),
  • or the TARGET values of all records in the node are the same.

Output: Report

As a result of the analysis, a report is produced, which contains a tree model. With an appropriate visualization tool, one can render the tree model from the report.

Appendix 1: Toolbox

aivis engine v2 toolbox is a side project of aivis engine v2. It mainly provides tools to turn the output artifacts of aivis engine v2 into technical, single-file HTML reports.

Disclaimer

It is explicitly not an official part of aivis engine v2. Therefore, its API and behaviour are subject to change and not necessarily thoroughly tested. It is very important to note that these HTML reports are not a designed UI but rather a visualization testing playground:
the aivis engine v2 toolbox targets researchers and data scientists who already know the concepts of aivis engine v2 and wish to quickly visualize and adapt its outputs.

Furthermore:

  • With exceptionally large input files (e.g. too many inferences) or the wrong configuration, the generated HTML pages can become too slow to handle.
  • The HTMLs are optimized for a wide screen.

Setup

The aivis engine v2 toolbox does not need a licensing key. The python code is free to look into or even adapt. The toolbox release corresponding to an aivis engine v2 release {VERSION} is available as:

  • Python Whl aivis_engine_v2_toolbox-{VERSION}-py3-none-any.whl
  • Docker Image aivis-engine-v2-toolbox:{VERSION}

Create Engine Report

Each call to construct a toolbox HTML report for engine xy has the following structure:

from aivis_engine_v2_toolbox.api import build_xy_report

config = {
    "title": "My Use Case Title", 
    ...
    "outputFile": "/path/to/my-use-case-report.html"}
build_xy_report(config)

Additionally, the config needs to contain references to the respective engine's output files, e.g. "analysisReportFile": "/path/to/analysis-report.json". The full call to create a report for any engine can be found in python or argo examples of the respective engine.
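
For aivis Response Analysis, such a call could look like the sketch below. The function name build_ra_report is inferred from the build_xy_report pattern above and should be checked against the python examples shipped with the engine:

from aivis_engine_v2_toolbox.api import build_ra_report  # name inferred from the build_xy_report pattern

config = {
    "title": "German Credit Health",
    "analysisReportFile": "/path/to/analysis-report.json",
    "outputFile": "/path/to/my-use-case-report.html",
}
build_ra_report(config)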

Expert Configuration

There are many optional expert configurations to customize your HTML report. Some examples:

  • The aivis engine v2 toolbox always assumes timestamps to be unix timestamps and translates them to readable dates. This behaviour can be switched off via "advancedConfig": {"unixTime": False}, so that timestamps remain long values.

  • By referring to a metadata file via "metadataFile": "/path/to/metadata.json", signals are not only described via their signal id but enriched with more information. The metadata json contains an array of signals with the keys id (must) as well as name, description, unitName, unitSymbol (all optional):

    {"signals": [{
        "id": "fa6c65bb-5cee-45fa-ab19-355ba94889e9",
        "name": "et 1",
        "description": "extruder temperature nr. 1",
        "unitName": "Kelvin",
        "unitSymbol": "K"
      }, {
        "id": "dc3477e5-a83c-4485-b7f4-7528d336d9c4", 
        "name": "abc 2"
        }, 
       ...
    ]}
    
  • For every HTML report that contains a timeseries plot, additional signals can be added so that they are also displayed.

All custom configuration options can be seen in the api.py file in src/aivis_engine_v2_toolbox.