aivis Engine v2 - Response Analysis - User Guide

aivis Response Analysis is one of the engines of the aivis Technology Platform by Vernaio. It provides breakdowns of the input data with respect to a goal, e.g., a key performance indicator (KPI), so that users can learn not only how to achieve or optimize the goal but also what causes sub-optimal outcomes under given circumstances. This is done by analyzing the relationships between the data the user wants to optimize and all other input data. aivis Response Analysis tells users how to accomplish their goals with clear and actionable instructions. This explainability is one of the key strengths of the engine and differentiates it from conventional machine learning models.

By improving traditional decision/regression tree models with novel mathematics, aivis Response Analysis achieves cutting-edge performance in identifying the causes of disruptions and in finding ways to accomplish user-defined goals, while requiring minimal input and configuration from the user.

The engine generates an analysis report based on historical tabular data that includes a column representing the user's objective, such as a KPI.

Introduction

API References

This documentation explains the usage and principles behind aivis Response Analysis to data and software engineers. For detailed API descriptions of docker images, web endpoints and SDK functions, please consult the reference manual of the respective component:

For additional support, go to Vernaio Support.

Artifact Distribution

Currently, aivis Response Analysis is distributed to a closed user base only. To gain access to the artifacts, as well as for any other questions, you can open a support ticket via aivis Support.

Tabular Data / Columns

Unlike other aivis engines, the Response Analysis engine takes tabular data as input. It is thus important to understand the concept of tabular data in the context of the Response Analysis engine. This chapter explains the terminology as well as the required format of tabular data.

A typical example of tabular data is a score board, say, of a football league.

Team Score Win Loss Draw Home Injury
FC Blue 109 10 2 3 "City A" true
FC Red 92 8 3 4 "City A" false
FC Black 78 6 3 6 "City B"
... ... ... ... ... ... ...

Tabular data consists of columns. Every column contains two things:

  • a column_id, which is an arbitrary string, like Score or Win. The column_id needs to be unique within the data.
  • a list of cells, each of which is a tuple of (row_id, value), like (FC Blue, 109).

Tabular data fed into the engine may look like this:

Score Win Loss Draw Home Injury
(FC Blue, 109) (FC Blue, 10) (FC Blue, 2) (FC Blue, 3) (FC Blue, "City A") (FC Blue, true)
(FC Red, 92) (FC Red, 8) (FC Red, 3) (FC Red, 4) (FC Red, "City A") (FC Red, false)
(FC Black, 78) (FC Black, 6) (FC Black, 3) (FC Black, 6) (FC Black, "City B")

Here, Score, Win, Loss, etc., are column_ids, and the data entries are cells. The value of a cell can be a float, a string, or a boolean. The aivis Response Analysis engine can handle empty cells. In aivis we also call a row a record. In the above case, we have 3 records.
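
To make the column and cell terminology concrete, the following plain-Python sketch (an illustration, not the SDK API) models the scoreboard above:

# Plain-Python illustration of the tabular data model (not the SDK API):
# each column maps a column_id to a list of (row_id, value) cells.
columns = {
    "Score": [("FC Blue", 109.0), ("FC Red", 92.0), ("FC Black", 78.0)],
    "Win": [("FC Blue", 10.0), ("FC Red", 8.0), ("FC Black", 6.0)],
    "Home": [("FC Blue", "City A"), ("FC Red", "City A"), ("FC Black", "City B")],
    # "Injury" has an empty cell for FC Black, so that cell is simply absent:
    "Injury": [("FC Blue", True), ("FC Red", False)],
}

# every record (row) is identified by its row_id
row_ids = {row_id for cells in columns.values() for row_id, _ in cells}
print(len(row_ids))  # 3 records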

Float Cell

Numbers are stored as 64-bit floating point numbers. They are written in scientific notation like -341.4333e-44: they consist of the compulsory part, the Significand, and an optional part, the Exponent, separated by an e or E.

The Significand contains one or multiple digits and optionally a decimal separator (.). In that case, the digits before or after the separator can be omitted and are assumed to be 0. It can be prefixed with a sign (+ or -).

The Exponent contains one or multiple digits and can be prefixed with a sign, too.

The 64-bit Floating Point specification also allows for 3 non-finite values (not a number, positive infinity and negative infinity) that can be written as nan, inf/+inf and -inf (case insensitive). These values are valid, but the engine regards them as being unknown and they are therefore skipped.

Regular expression: (?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?

String Cell

String values must be encoded as UTF-8. Empty strings are regarded as being unknown values and are therefore skipped.

Boolean Cell

Boolean values must be written in one of the following ways:

  • true/false (case insensitive)
  • 1/0
  • 1.0/0.0 with an arbitrary number of additional zeros at the end

Regular expression: (?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?
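
As an illustration of the cell types, the following plain-Python sketch (not part of the SDK) classifies raw values with the two regular expressions given above. Note that in the engine, the interpretation is determined by the configured data type of the column, not guessed per value:

import re

# regular expressions from the float and boolean cell sections above
FLOAT_RE = re.compile(r"(?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?")
BOOL_RE = re.compile(r"(?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?")
NON_FINITE_RE = re.compile(r"(?i:nan)|[+-]?(?i:inf)")

def classify(raw: str) -> str:
    if raw == "":
        return "unknown (empty, skipped)"
    if NON_FINITE_RE.fullmatch(raw):
        return "float (non-finite, skipped as unknown)"
    if BOOL_RE.fullmatch(raw):
        return "boolean"  # a column's configured type decides in the engine
    if FLOAT_RE.fullmatch(raw):
        return "float"
    return "string"

for raw in ["-341.4333e-44", "1.000", "TRUE", "-inf", "City A", ""]:
    print(f"{raw!r}: {classify(raw)}")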

Workflow

aivis Response Analysis engine performs an analysis on the input tabular data. The tabular data should contain a column that the user wants to analyze, such as a KPI. This column will be referred to as the target. The end result of the engine is the report.

Example Use Case

Equipped with this knowledge of tabular data, we are now ready to use the aivis Response Analysis engine. As an illustrative use case, we will use the engine to learn about German credit health, e.g., which attributes contribute positively and negatively to one's credit health. Each person is represented by a row listing a number of attributes and the person's credit health, the latter of which will be used as the target KPI.

Getting Started (Docker)

The docker images of aivis Response Analysis are prepared for easy usage. They use the SDK internally, but have a simpler file-based interface. If you have a working docker workflow system like Argo, you can build your own automated workflow based on these images.

In this chapter, we will show you how to get started using the docker images. Usage of the SDK is covered in the next chapter.

Run Example Code

A working example that builds on the code explained below can be downloaded directly here: response-analysis-examples.zip.

This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.

Prerequisites: Additionally to the response-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

  • The docker images aivis-engine-v2-ra-worker and (optionally for HTML report generation) aivis-engine-v2-toolbox
  • An aivis licensing key, see licensing

As a Kubernetes user, even without deeper Argo knowledge, the aivis-engine-v2-example-ra-argo.yaml shows best how the containers are executed one after another, how the analysis worker is provided with a folder that contains the data CSV, and how the toolbox assembles an HTML report at the end.

Artifacts

There is one docker image:

  • The Worker creates the report:
    {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

The docker image is Linux-based.

Requirements

You need an installation of Docker on your machine as well as access to the engine artifacts:

docker -v
docker pull {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

Licensing

A valid licensing key is necessary for every aivis calculation in every engine and every component. It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.

If aivis returns a licensing error despite the environment variable being set, please check the following items:

  • Terminals usually need to be restarted to learn newly set environment variables.
  • Licensing keys have the typical form <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there is no whitespace.
  • A common error source is that the user's firewall does not let HTTPS requests to v3.aivis-engine-v2.vernaio-licensing.com pass (before release 2.7: v2.aivis-engine-v2.vernaio-licensing.com; before release 2.3: aivis-engine-v2.perfectpattern-licensing.de), so the licensing request never reaches the licensing server. In that case, outgoing connections to that hostname and TCP port 443 need to be whitelisted.

CSV format

All artifacts use CSV as the input data format. As the CSV format is highly non-standardized, we will discuss it briefly in this section.

CSV files must be stored in a single folder specified in the config under data.folder. Within this folder the CSV files can reside in an arbitrary subfolder hierarchy. In some cases (e.g. for HTTP requests), the folder must be passed as a ZIP file.

General CSV rules:

  • The file’s charset must be UTF-8.
  • Records must be separated by Windows or Unix line ending (CR LF/LF). In other words, each record must be on its own line.
  • Fields must be separated by comma.
  • The first line of each CSV file represents the header, which must contain column headers that are file-unique.
  • Every record including the header must have the same number of fields.
  • Text values must be enclosed in quotation marks if they contain literal line endings, commas or quotation marks.
  • Quotation marks inside such a text value have to be prefixed (escaped) with another quotation mark.

Special rules:

  • One column must be called id and contain the row_ids.
  • All other columns, i.e. the ones that are not called id, are interpreted as "columns" of tabular data.
  • column_ids are defined by their column headers.
  • If there are multiple files containing the same column header, this data is regarded as belonging to the same "column" of tabular data.
  • Column values can be boolean values, numbers and strings.
  • Empty values are regarded as being unknown and are therefore skipped.
  • For a given row_id and column_id, there must be only one value; if there are multiple records for the same row_id and column_id, the engine raises an exception. (Note that this differs from the timeseries rules of other aivis engines!)
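
For illustration, a minimal CSV file following these rules could look like this (using the scoreboard example from above; note the mandatory id column and the empty Injury field of FC Black, which is skipped as unknown):

id,Score,Win,Loss,Draw,Home,Injury
FC Blue,109,10,2,3,City A,true
FC Red,92,8,3,4,City A,false
FC Black,78,6,3,6,City B,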

Analysis

Here, we will analyze the target column with the aivis Response Analysis engine.

At the beginning, we create a folder docker, a subfolder analysis-config and add the configuration file config.yaml:

data:
  folder: /srv/data
  dataTypes:
    defaultType: FLOAT
    stringColumns: # this field is optional
    - "status_of_existing_checking_account"
    - "purpose"
    - "savings_account"
    - "credit_history"
    - "marital_status_and_sex"
    - "other_debtors"
    - "property"
    - "other_installment_plans"
    - "housing"
    - "job"
    - "telephone"
    - "foreign_worker"
analysis:
  dataFilter: # this field is optional
    excludeColumns:
    - telephone
  target:
    kpi: TARGET
    interest: HIGH_KPI
  strategy: # this field is optional
    minimalFractionOfRecords: 0.05
output:
  folder: /srv/output

Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The field strategy is optional; its sub-field minimalFractionOfRecords controls the minimal fraction of records a node must contain (a stopping criterion for splitting).

For the moment, you may take this file as it is. The different keys will become clearer from the later sections and the docker reference manual.

As a next step, we create a second folder data and add the Input Data CSV file train_ra.csv to the folder. Afterwards, we create a blank folder output.

Our folder structure should now look like this:

+- docker
|  +- analysis-config
|      +- config.yaml
|
+- data
|  +- train_ra.csv
|
+- output

Finally, we can start our analysis via:

# Linux/macOS (bash)
docker run --rm -it \
  -v $(pwd)/docker/analysis-config:/srv/conf \
  -v $(pwd)/data/train_ra.csv:/srv/data/train_ra.csv \
  -v $(pwd)/output:/srv/output \
  {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

# Windows (PowerShell)
docker run --rm -it `
  -v ${PWD}/docker/analysis-config:/srv/conf `
  -v ${PWD}/data/train_ra.csv:/srv/data/train_ra.csv `
  -v ${PWD}/output:/srv/output `
  {REGISTRY}/{NAMESPACE}/aivis-engine-v2-ra-worker:{VERSION}

After a short time, this should lead to an output file analysis-report.json in the output folder.

Getting Started (SDK)

The SDK of aivis Response Analysis allows for direct calls from your C, Java or Python program code. All language SDKs internally use our native shared library (FFI). As C APIs can be called from various other languages as well, the C-SDK can also be used with languages such as R, Go, Julia, Rust, and more. Compared to the docker images, the SDK enables a more fine-grained usage and tighter integration.

In this chapter we will show you how to get started using the SDK.

Run Example Code

A working SDK example that builds on the code explained below can be downloaded directly here: response-analysis-examples.zip.

This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.

Additionally to the `response-analysis-examples.zip` you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

Required artifacts:

  • These aivis engine v2 .whl-files which you will receive in a libs.zip directly from aivis Support:
    • aivis_engine_v2_ra_runtime_python_full-{VERSION}-py3-none-win_amd64.whl: A response analysis full python runtime
      (here for windows; choose the one fitting your operating system - see artifacts for other options on linux and macos)
    • aivis_engine_v2_base_sdk_python-{VERSION}-py3-none-any.whl: The base python sdk
    • aivis_engine_v2_ra_sdk_python-{VERSION}-py3-none-any.whl: The response analysis python sdk
    • aivis_engine_v2_toolbox-{TOOLBOX-VERSION}-py3-none-any.whl: The toolbox python sdk - optional for HTML report generation
  • An aivis licensing key, see licensing, which you will receive directly from aivis Support

Preparations:

  • Make sure you have a valid Python (>= 3.9) installation.
  • To apply the aivis licensing key, create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
  • Make sure you have an active internet connection so that the licensing server can be contacted.
  • Download and unzip the response-analysis-examples.zip. The data CSV train_ra.csv needs to stay in **/data.
  • Download and unzip the libs.zip. These .whl-files need to be in **/libs.

The folder now has the following structure:

+- data
|  +- train_ra.csv
|
+- docker
|  +- # files to run the example via docker images, which we will not need now
|
+- java
|  +- # files to run the example via java sdk, which we will not need now 
|
+- libs
|  +- # the .whl files to run aivis
|
+- python
|  +- # files to run the example via python sdk 

Running the example code:

  • Navigate to the **/python subfolder. Here, you find the classic python script example_ra.py and the jupyter notebook example_ra.ipynb. Both run the exact same example and output the same result. Choose which one you want to run.
  • There are various ways to install dependencies from .whl files. We will now explain two options: installing them via pip install or via poetry. Many other options are also possible, of course.

Option A: pip install (only for the classic python script example_ra.py, not for the jupyter notebook example_ra.ipynb)

  • Open a console in the **/python subfolder and run the following commands:
      # installs the `.whl` files
      pip install -r requirements-<platform>.txt
    
      # runs the classic python script `example_ra.py`
      python example_ra.py --input=../data --output=output
    

Option B: poetry install

  • If you have not already done so, install poetry, a python package manager:
      # installs poetry (a package manager)
      python -m pip install poetry
    
  • Run either the classic python script example_ra.py
      # installs the `.whl` files
      poetry install --no-root
    
      # runs the classic python script `example_ra.py`
      poetry run python example_ra.py --input=../data --output=output
    
  • Or run the jupyter notebook example_ra.ipynb by executing the following commands in a console opened in the **/python subfolder. The first one might take a while; the third one opens a tab in your browser.
      # installs the `.whl` files
      poetry install --no-root
    
      # installs jupyter kernel
      poetry run ipython kernel install --user --name=test_ra
    
      # runs the jupyter python script `example_ra.ipynb`
      poetry run jupyter notebook example_ra.ipynb
    

After running the scripts, you will find your computation results in **/python/output.

Additionally to the response-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

Required artifacts:

  • These aivis engine v2 .jar files which you will receive in a libs.zip directly from aivis Support:
    • aivis-engine-v2-ra-runtime-java-full-win-x8664-{VERSION}.jar: A response analysis full Java runtime (here for windows; choose the one fitting your operating system - see artifacts for other options on linux and macos)
    • aivis-engine-v2-base-sdk-java-{VERSION}.jar: The base java sdk
    • aivis-engine-v2-ra-sdk-java-{VERSION}.jar: The response analysis java sdk
    • There is NO toolbox jar for HTML report generation.
  • An aivis licensing key, see licensing, which you will receive directly from aivis Support

Preparations:

  • Make sure you have a valid Java (>= 11) installation.
  • To apply the aivis licensing key, create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
  • Make sure you have an active internet connection so that the licensing server can be contacted.
  • Download and unzip the response-analysis-examples.zip. The data CSV train_ra.csv needs to stay in **/data.
  • Download and unzip the libs.zip. These .jar-files need to be in **/libs.

The folder now has the following structure:

+- data
|  +- train_ra.csv
|
+- docker
|  +- # files to run the example via docker images, which we will not need now
|
+- java
|  +- # files to run the example via java sdk 
|
+- libs
|  +- # the .jar files to run aivis
|
+- python
|  +- # files to run the example via python sdk, which we will not need now 

Running the example code:

  • We use Gradle as our Java package manager. It is easiest to directly use the gradle wrapper.
  • Navigate to the **/java subfolder. Here, you find the build.gradle. Check whether the paths point correctly to your aivis engine v2 .jar files in the **/libs subfolder.
  • Open a console in the **/java subfolder and run the following commands:
      # builds this Java project with gradle wrapper
      ./gradlew clean build
    
      # runs Java with parameters referring to input and output folder
      java -jar build/libs/example_ra.jar --input=../data --output=output
    

After running the scripts, you will find your computation results in **/java/output.

Artifacts

Our SDK artifacts come in one flavor, full (an inf flavor will be introduced in a future release):

  • full packages provide the full functionality and are available for mainstream targets only:
    • win-x8664
    • macos-armv8* (macOS 11 "Big Sur" or later; since release 2.3)
    • macos-x8664* (macOS 11 "Big Sur" or later; since release 2.3, until aivis engine version 2.9.0)
    • linux-x8664 (glibc >= 2.14)

* Only Python and C SDKs are supported. Java SDK is not available for this target.

In this chapter we want to demonstrate the full API functionality and thus always use the full package.

To use the Python-SDK, you must download the SDK artifact (flavor- and target-generic) for your pythonpath at build time. Additionally, at installation time, the runtime artifact must be downloaded with the right flavor and target.

The artifacts are distributed through a PyPI registry.

Using Poetry you can simply set a dependency on the artifacts specifying flavor and version. The target is chosen depending on your installation system:

aivis_engine_v2_ra_sdk_python = "{VERSION}"
aivis_engine_v2_ra_runtime_python_{FLAVOR} = "{VERSION}"

To use the Java-SDK, you must download at build time:

  • SDK artifact (flavor and target generic) for your compile and runtime classpath
  • Runtime artifact with the right flavor and target for your runtime classpath

It is possible to include multiple runtime artifacts for different targets in your application to allow cross-platform usage. The SDK chooses the right runtime artifact at runtime.

The artifacts are distributed through a Maven registry.

Using Maven, you can simply set a dependency on the artifacts specifying flavor, version and target:

<dependency>
  <groupId>com.vernaio</groupId>
  <artifactId>aivis-engine-v2-ra-sdk-java</artifactId>
  <version>{VERSION}</version>
</dependency>
<dependency>
  <groupId>com.vernaio</groupId>
  <artifactId>aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}</artifactId>
  <version>{VERSION}</version>
  <scope>runtime</scope>
</dependency>

Alternatively, with Gradle:

implementation 'com.vernaio:aivis-engine-v2-ra-sdk-java:{VERSION}'
runtimeOnly    'com.vernaio:aivis-engine-v2-ra-runtime-java-{FLAVOR}-{TARGET}:{VERSION}'

To use the C-SDK, you must download the SDK artifact at build time (flavor and target generic). For final linkage/execution you need the runtime artifact with the right flavor and target.

The artifacts are distributed through a Conan registry.

Using Conan, you can simply set a dependency on the artifact specifying flavor and version. The target is chosen depending on your build settings:

aivis-engine-v2-ra-sdk-c/{VERSION}
aivis-engine-v2-ra-runtime-c-{FLAVOR}/{VERSION}

The SDK artifact contains:

  • Headers: include/aivis-engine-v2-ra-core-full.h

The runtime artifact contains:

  • Import library (LIB file), if Windows target: lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.lib
  • Runtime library (DLL file), if Windows target: bin/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.dll (also containing the import library)
  • Runtime library (SO file), if Linux target: lib/aivis-engine-v2-ra-{FLAVOR}-{TARGET}.so (also containing the import library)

The runtime library must be shipped to the final execution system.

Licensing

A valid licensing key is necessary for every aivis calculation in every engine and every component. It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.

If aivis returns a licensing error despite the environment variable being set, please check the following items:

  • Terminals usually need to be restarted to learn newly set environment variables.
  • Licensing keys have the typical form <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there is no whitespace.
  • A common error source is that the user's firewall does not let HTTPS requests to v3.aivis-engine-v2.vernaio-licensing.com pass (before release 2.7: v2.aivis-engine-v2.vernaio-licensing.com; before release 2.3: aivis-engine-v2.perfectpattern-licensing.de), so the licensing request never reaches the licensing server. In that case, outgoing connections to that hostname and TCP port 443 need to be whitelisted.

Setup

Before we can invoke API functions of our SDK, we need to set it up for proper usage and consider the following things.

Releasing Unused Objects

It is important to ensure the release of allocated memory for unused objects.

In Python, freeing objects and destroying engine resources is done automatically. You can force resource destruction with the appropriate destroy function.

In Java, freeing objects is done automatically, but you need to destroy all engine resources like Data- and Analysis-objects with the appropriate destroy function. As they all implement Java's AutoCloseable interface, we can also write a try-with-resources statement to auto-destroy them:

try(final ResponseAnalysisData inputData = ResponseAnalysisData.create()) {

  // ... do stuff ...

} // auto-destroy when leaving block

In C, you must always

  • free every non-null pointer allocated by the engine with aivis_free (all pointers returned by functions and all double pointers used as output function parameters, e.g., Error*).
    Note: aivis_free will only free the engine's own objects. Also, it will free objects only once, and it disregards null pointers.
  • free your own objects with free as usual.
  • destroy all handles after usage with the appropriate destroy function.

Error Handling

Errors and exceptions report what went wrong on a function call. They can be caught and processed by the caller.

In Python, an Exception is thrown and can be caught conveniently.

In Java, an AbstractAivisException is thrown and can be caught conveniently.

In C, every API function can write an error to the given output function parameter &err (to disable this, just set it to NULL). This parameter can then be checked by a helper function similar to the following:

const Error *err = NULL;

void check_err(const Error **err, const char *action) {

  // everything is fine, no error
  if (*err == NULL)
    return;

  // print information
  printf("\taivis Error: %s - %s\n", action, (*err)->json);

  // release error pointer
  aivis_free(*err);
  *err = NULL;

  // exit program
  exit(EXIT_FAILURE);
}

Failures within function calls will never affect the state of the engine.

Logging

The engine emits log messages to report on the progress of each task and to give valuable insights. These log messages can be caught via registered loggers.

# create logger
class Logger(EngineLogger):
    def log(self, level, thread, module, message):
        if (level <= 3):
            print("\t... %s" % message)

# register logger
ResponseAnalysisSetup.register_logger(Logger())
// create and register logger
ResponseAnalysisSetup.registerLogger(new EngineLogger() {
            
    public void log(int level, String thread, String module, String message) {
        if (level <= 3) {
            System.out.println(String.format("\t... %s", message));
        }
    }
});
// create logger
void logger(const uint8_t level, const char *thread, const char *module, const char *message) {
  if (level <= 3)
    printf("\t... %s\n", message);
}

// register logger
aivis_setup_register_logger(&logger, &err);
check_err(&err, "Register logger");

Thread Management

During the usage of the engine, a lot of calculations are done. Parallelism can drastically speed things up. Therefore, set the maximal thread count to a limited number of CPU cores, or set it to 0 to use all available cores (the default is 0).

# init thread count
ResponseAnalysisSetup.init_thread_count(4)
// init thread count
ResponseAnalysisSetup.initThreadCount(4);
// init thread count
aivis_setup_init_thread_count(4, &err);
check_err(&err, "Init thread count");

Data Input

Now that we are done setting up the SDK, we need to create a data store that holds our historical tabular data. In general, all data must always be provided through data stores. You can create as many as you want.

After the creation of the data store, you can fill it with column data.

# create empty data context
analysis_data = ResponseAnalysisData.create()

# add sample data
analysis_data.add_float_column("column-id", [
  DtoFloatCell(100, 1.0),
  DtoFloatCell(200, 2.0),
  DtoFloatCell(300, 4.0),
])

# ... use data ...
// create empty data context
try(final ResponseAnalysisData analysisData = ResponseAnalysisData.create()) {

  // add sample data
  analysisData.addFloatColumn("column-id", Arrays.asList(
    new DtoFloatCell(100L, 1.0),
    new DtoFloatCell(200L, 2.0),
    new DtoFloatCell(300L, 4.0)
  ));

  // ... use data ...

} // auto-destroy data
// create empty data context
TabularDataHandle analysis_data = aivis_tabular_data_create(&err);
check_err(&err, "Create analysis data context");

const DtoFloatCell cells[] = {
  {"100", 1.0},
  {"200", 2.0},
  {"300", 4.0},
};

// add sample data
aivis_tabular_data_add_float_column(analysis_data, "column-id", &cells[0], sizeof cells / sizeof *cells, &err);
check_err(&err, "Adding column");

// ... use data ...

// destroy data context
aivis_tabular_data_destroy(analysis_data, &err);
check_err(&err, "Destroy data context");
analysis_data = 0;

Above, we have filled the data store with three hard-coded data cells to illustrate the approach. Usually, you will read in the data from some other source. In the following, we will assume you have read in the file train_ra.csv shipped with the Example Project.
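
As a sketch of that step (plain Python with the standard csv module), the following code fills the data store from the CSV. The string-column handling is an assumption: add_string_column and DtoStringCell are assumed counterparts to the add_float_column and DtoFloatCell shown above, and STRING_COLUMNS must be adjusted to your dataset (compare the stringColumns list in the docker configuration):

import csv

# columns to treat as strings (assumption: extend to all string columns of your dataset)
STRING_COLUMNS = {"purpose", "housing", "job"}

analysis_data = ResponseAnalysisData.create()

with open("../data/train_ra.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for column_id in (c for c in rows[0] if c != "id"):
    # empty cells are regarded as unknown and therefore skipped;
    # row ids are passed as integers, matching the DtoFloatCell usage above
    cells = [(int(row["id"]), row[column_id]) for row in rows if row[column_id] != ""]
    if column_id in STRING_COLUMNS:
        # assumed API, analogous to add_float_column/DtoFloatCell
        analysis_data.add_string_column(column_id, [DtoStringCell(rid, v) for rid, v in cells])
    else:
        analysis_data.add_float_column(column_id, [DtoFloatCell(rid, float(v)) for rid, v in cells])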

Analysis

With the data store filled with historical tabular data, we can now create our analysis:

# build analysis config
import json

analysis_config = json.dumps(
    {
        "dataFilter": {"excludeColumns": ["telephone"]}, # this field is optional
        "target": {
            "kpi": "TARGET",
            "interest": "HIGH_KPI",
        },
        "strategy": { # this field is optional
            "minimalFractionOfRecords": 0.05,
        },
    }
)

# create analysis
analysis = ResponseAnalysis.create(analysis_data, analysis_config)

# ... use analysis ...
// build analysis config
final DtoAnalysisConfig analysisConfig =
  new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI)).withDataFilter(
    new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" }) // this field is optional
  ).withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05)); // this field is optional

// create analysis
try (final ResponseAnalysis analysis = ResponseAnalysis.create(analysisData, analysisConfig)) {

  // ... use analysis ...

} // auto-destroy analysis
// build analysis config
const char *analysis_config = "{"
  "\"dataFilter\": {\"excludeColumns\": [\"telephone\"]}," // this field is optional
  "\"target\": {"
  "\"kpi\": \"TARGET\","
  "\"interest\": \"HIGH_KPI\" "
  "},"
  "\"strategy\": {" // this field is optional
  "\"minimalFractionOfRecords\": 0.05 "
  "}"
  "}";

// create analysis
ResponseAnalysisHandle analysis_handle = aivis_response_analysis_create(
  analysis_data,
  (uint8_t *) analysis_config,
  strlen(analysis_config),
  &err
);
check_err(&err, "Create analysis");

// ... use analysis ...

// destroy analysis
aivis_response_analysis_destroy(analysis_handle, &err);
check_err(&err, "Destroy analysis");
analysis_handle = 0;

Note that the target column is specified by kpi in target. The field interest then defines whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the target. The field strategy is optional; its sub-field minimalFractionOfRecords controls the minimal fraction of records a node must contain (a stopping criterion for splitting).

For the moment, you may take this configuration as it is. The different keys will become clearer from the later sections and the reference manuals.

Getting Started (Analysis Report)

The aivis Response Analysis engine outputs a report, which can be visualized as follows.

Evaluation

On the very left, one finds the root node, which contains all records. Suppose TARGET values range from 0 to 1. The mean KPI in the root node is 0.7, which is the average of the TARGET entries of all records. The root node splits into two child nodes. Each child node is reached by a predicate, which is expressed as, for example, A < x, where A is a column_id and x is a value that splits the records of the node. The core engine algorithm picks the column and the value that best split the node. A child node is itself again split into two nodes, and so on.
Splitting stops when

  • the ratio of the number of records in the node to the total number of records is smaller than minimal fraction of records (when unspecified, it defaults to 0.01).
  • or TARGET values of all records in the node are the same.

The final un-split nodes are called leaf nodes (the square nodes). Not only are the records that belong to a leaf node grouped to have similar (if not the same) TARGET values, but they also tend to have extreme values. This implies that the "good" and "bad" records are well separated. The aivis Response Analysis engine identifies the most informative pathways to arrive at "good" and "bad" TARGET value scenarios. By inspecting the predicates that lead to the leaf nodes, one can therefore learn the reasons for good (or the causes of bad) KPI performance.
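
To make the stopping rule concrete, here is a minimal plain-Python sketch of the two criteria above (an illustration only, not engine code):

def stop_splitting(node_target_values, total_record_count, minimal_fraction_of_records=0.01):
    """Return True if the node becomes a leaf (illustration of the rule, not engine code)."""
    # criterion 1: the node holds too small a fraction of all records
    if len(node_target_values) / total_record_count < minimal_fraction_of_records:
        return True
    # criterion 2: all TARGET values in the node are identical
    return len(set(node_target_values)) == 1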

Getting Started (Example Results)

In this section, we discuss the results of our example project, German credit health. The dataset contains 1000 people's credit health evaluations, compiled by a bank. The target KPI, which naturally represents credit health, is binary, i.e., 0 for bad and 1 for good. The global average of the KPI is 0.7, meaning that 700 people (records) have good credit health (1) and 300 bad (0).

Above is an interactive report in which one can see the tree model generated by the aivis Response Analysis engine.

Evaluation

Here, we highlighted some example paths along which we explore the analysis results.

Path 1 is the pathway leading to one of the bad leaf nodes. This leaf node's mean KPI is 0.094 with 32 people (from here on, we use "people" and "records" interchangeably), meaning that many who belong to this node have very poor credit health. Our engine identified that, in order to reach this leaf node, one needs to have the following attributes:

  • Having a checking account in the bank (the bank that compiled this data).
  • The duration of the requested loan is longer than 21 months.
  • Having less than 100 Deutsche Mark in the savings account.
    • Having more than 100 DM in the savings account leads to Path 2.
  • The duration of the requested loan is shorter than or equal to 44.8 months.

What can we do to improve credit health? Let us examine Path 2. Path 2 diverges from Path 1 when one has more than 100 DM in the savings account. At the end node of Path 2, which is not a leaf node, the mean KPI is 0.625 with 96 records. From this example, one can already learn that having some savings can substantially improve credit health (0.393 -> 0.625).

Further improvement can be made by looking at Path 2.1. The predicate is

  • credit_history != Existing credits paid back duly till now

49 records belong to this node, with a mean KPI of 0.755. But what does that mean?

credit_history is the column_id of a categorical column. The other categories are:

  • no credits taken/ all credits paid back duly
  • all credits at this bank paid back duly
  • delay in paying off in the past
  • critical account/ other credits existing (not at this bank)

So, in the context of Path 2, if one's credit history is one of the above four (i.e., it is not Existing credits paid back duly till now), then one likely has better credit health: on average 0.755, which is better than the global average of 0.7.

What about people with great credit health? At the ends of Path 3 and Path 4 are nodes in which all records have credit health 1.

In the leaf node of Path 3 are 99 records whose credit health is 1. What these 99 people have in common is:

  • Not having a checking account in the bank.
  • Not having other monthly installment plans.
  • Being at least 30.2 years old.
    • Being younger than 30.2 years leads to Path 4.
  • credit_history = Critical account/ other credits existing (not at this bank)

If you are younger than 30.2 years, you still have a chance. The leaf node of Path 4 represents 41 people with credit health 1. They share the following attributes in addition to the shared attributes of Path 3:

  • Being younger than 30.2 years.
  • The ratio of the installment payment of the requested loan to disposable income is smaller than 4%.
  • Present employment duration is at least 3 years.
  • The reason for the loan is not to buy a new car.

As such, the aivis Response Analysis engine provides insights that are tailored to one's specific needs and situation. Not only does the engine clearly explain the reasons for good and bad performance by means of predicates, but the counter-measures are also actionable, which results in immediate improvements.

Preparation

The previous sections gave an introduction on how to use aivis Response Analysis and also shed some light on how it works. The following sections explain the concepts in more depth and provide a more profound background. It is not necessary to know this background to use aivis Response Analysis! However, you may find convenient solutions for specific problems, or information on how to optimize your usage of aivis Response Analysis. It will become clear that only minimal user input is required for the engine to perform well. Nevertheless, the user has the option to control the process with several input parameters, which are presented below.

Analysis

A Fully Loaded Analysis Configuration

First, an overview of all possible configuration keys is presented. A more minimal analysis configuration was used above in the SDK analysis and in the Docker analysis, respectively. This example may mainly serve as a quick reference. The meaning of the different keys is explained in the following sections, and a definition of the syntax is given in the reference manuals.

analysis:
  dataFilter:
    excludeColumns:
    - telephone
    # includeColumns: ... either exclude or include columns
  target:
    kpi: TARGET
    interest: HIGH_KPI # or LOW_KPI
    weight: MY_WEIGHT_COLUMN
  columns:
  - column: COLUMN_1
    interpreter:
      _type: Categorical
  - column: COLUMN_2
    interpreter:
      _type: Numerical
  - column: COLUMN_3
    interpreter:
      _type: Numerical
      quantileCount: 100
  strategy:
    minimalFractionOfRecords: 0.05
analysis_config = json.dumps({
  "dataFilter": {
    "excludeColumns": ["telephone"]
    # "includeColumns": ... either exclude or include columns
  },
  "target": {
    "kpi": "TARGET",
    "interest": "HIGH_KPI",
    "weight": "MY_WEIGHT_COLUMN",
  },
  "columns": [
    {
      "column": "COLUMN_1",
      "interpreter": {
        "_type": "Categorical"
      }
    },
    {
      "column": "COLUMN_2",
      "interpreter": {
        "_type": "Numerical"
      }
    },
    {
      "column": "COLUMN_3",
      "interpreter": {
        "_type": "Numerical",
        "quantileCount": 100,
      }
    },
  ],
  "strategy": {
    "minimalFractionOfRecords": 0.05
  }
})
final DtoAnalysisConfig analysisConfig =
  new DtoAnalysisConfig(new DtoTargetConfig("TARGET", DtoInterest.HIGH_KPI).withWeight("MY_WEIGHT_COLUMN")).withDataFilter(
    new DtoTabularDataFilter().withExcludeColumns(new String[] { "telephone" })
  )
    // .withDataFilter(new DtoTabularDataFilter().withIncludeColumns(new String[] {"..."})) either exclude or include columns
    .withColumns(
      new IDtoColumnConfig[] {
        new DtoColumnConfig("COLUMN_1").withInterpreter(new DtoCategoricalColumnInterpreter()),
        new DtoColumnConfig("COLUMN_2").withInterpreter(new DtoNumericalColumnInterpreter()),
        new DtoColumnConfig("COLUMN_3").withInterpreter(new DtoNumericalColumnInterpreter().withQuantileCount(100)), }
    )
    .withStrategy(new DtoStrategy().withMinimalFractionOfRecords(0.05));
const char *analysis_config = "{"
  "\"dataFilter\": {"
    "\"excludeColumns\": [\"telephone\"]"
    //"\"includeColumns\": [...]" either exclude or include columns
  "},"
  "\"target\": {"
    "\"kpi\": \"TARGET\","
    "\"interest\": \"HIGH_KPI\","
    "\"weight\": \"MY_WEIGHT_COLUMN\""
  "},"
  "\"columns\": [{"
    "\"column\": \"COLUMN_1\","
    "\"interpreter\": {"
    "\"_type\": \"Categorical\""
  "}},{"
    "\"column\": \"COLUMN_2\","
    "\"interpreter\": {"
    "\"_type\": \"Numerical\""
  "}},{"
    "\"column\": \"COLUMN_3\","
    "\"interpreter\": {"
    "\"_type\": \"Numerical\","
    "\"quantileCount\": 100"
  "}}"
  "],"
  "\"strategy\": {"
    "\"minimalFractionOfRecords\": 0.05"
  "}"
"}";

The following sections list and explain the parameters the user may configure to control the analysis. The sections are organized along the structure of the configuration classes.

Data Filter: Exclude Parts of the Data

The data filter allows you to define the columns that are used for the analysis. This can be done in either of the following ways: exclude columns (a list of column ids to exclude), or, alternatively, include columns (a list of column ids to include).

Target: Define the Goal of The Analysis

The target column reflects the KPI of the analysis. It must therefore clearly reflect the goal of the analysis. The field kpi takes the ID of the target column, and interest decides whether the goal is to maximize (HIGH_KPI) or minimize (LOW_KPI) the KPI. The optional field weight takes a column_id as input. Each cell of this column holds a positive float that defines how much weight the corresponding TARGET cell should carry, e.g., a weight of 2.0 behaves like two identical records of weight 1.0.
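
As a quick plain-Python illustration of that statement (not part of the SDK), consider the weighted mean of a node's KPI values:

# KPI values and their weights within a node (illustration only)
kpis    = [1.0, 0.0, 1.0]
weights = [2.0, 1.0, 1.0]

weighted_mean = sum(k * w for k, w in zip(kpis, weights)) / sum(weights)
print(weighted_mean)  # 0.75 -- the same as the plain mean of [1.0, 1.0, 0.0, 1.0]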

Columns Configuration: If Columns Require Special Treatment

The column configuration is the place to pass additional information about a column in order to enforce a special treatment. Each column configuration refers to one specific column.

Interpreter

At the core of the column configuration is the interpreter. The interpreter defines which predicates can be built from a column. Very often, the default configuration is the best choice and you don't need to set any interpreter. Below you find an overview of the different interpreters, followed by some more in-depth explanations.

By default, all float columns are interpreted as numerical and all string and boolean columns are interpreted as categorical.

The numerical interpreter should be used for all columns for which the order of numbers is meaningful. For the aivis Response Analysis engine, the numerical interpreter takes an additional optional argument, namely quantile count, whose default value is 20.

quantile count sets a resolution. Imagine a numerical column A whose cell values range from 0 to 100. Suppose quantile count is 5. This means the engine will consider the values of this column up to a resolution of (100-0)/5 = 20, so it will generate 4 different predicates (plus their negations):

  • is A larger than 20?
  • is A larger than 40?
  • is A larger than 60?
  • is A larger than 80?
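
To make this concrete, here is a small plain-Python sketch that reproduces the evenly spaced cut points of the example above (the engine's actual quantiles are computed from the data distribution, so this is an illustration only):

def example_thresholds(lo, hi, quantile_count):
    # evenly spaced cut points, reproducing the illustration above
    step = (hi - lo) / quantile_count
    return [lo + i * step for i in range(1, quantile_count)]

print(example_thresholds(0, 100, 5))  # [20.0, 40.0, 60.0, 80.0] -> predicates "is A larger than x?"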

String and boolean columns are always interpreted as categorical. Categorical data has nominal scale, i.e., it takes only specific levels and does not necessarily follow any order. In practice, this would express information about certain states, such as "green", "red", or "blue". This information may be present in the form of strings, booleans, or also encoded in numbers. An example could be a column for which "1.0" stands for "pipe open", "2.0" for "pipe blocked", and "3.0" for "pipe sending maintenance alarm". A categorical column A with possible values a1, a2 and a3 will currently generate 3 predicates (plus their negations):

  • is A equal to a1?
  • is A equal to a2?
  • is A equal to a3?

One may now wonder: if any column can be interpreted as categorical, why does the numerical interpreter exist?

  • If ordering matters, then a column is better interpreted as numerical, since the resulting predicate on a numerical column will be, e.g., COLUMN_ID < 10 instead of COLUMN_ID != 10.
  • If a numerical column contains N unique values and is interpreted as categorical, the engine will create N predicates that all need to be considered. Therefore, if N is large, it is recommended to interpret the column as numerical with a quantile count smaller than N.

Strategy

The field strategy contains a sub-field called minimal fraction of records, which defines a stopping criterion for splitting nodes. Splitting stops when

  • the ratio of the number of records in the node to the total number of records is smaller than minimal fraction of records (when unspecified, it defaults to 0.01),
  • or the TARGET values of all records in the node are the same.

Output: Report

As a result of the analysis, a report is produced, which contains a tree model. With an appropriate visualization tool, one can render the tree model from the report.

Appendix 1: Toolbox

aivis engine v2 toolbox is a side project of aivis engine v2. It mainly provides tools to turn the output artifacts of aivis engine v2 into technical, single-file HTML reports.

Disclaimer

It is explicitly not an official part of aivis engine v2. Therefore, its API and behaviour are subject to change and not necessarily thoroughly tested. It is very important to note that these HTML reports are not a designed UI but rather a visualization testing playground:
the aivis engine v2 toolbox targets researchers and data scientists who already know the concepts of aivis engine v2 and wish to quickly visualize and adapt its outputs.

Furthermore:

  • With exceptionally large input files (e.g. too many inferences) or the wrong configuration, the generated HTML pages can become too slow to handle.
  • The HTMLs are optimized for a wide screen.

Setup

The aivis engine v2 toolbox does not need a licensing key. The python code is free to look into or even adapt. The toolbox release corresponding to an aivis engine v2 release {VERSION} is available as:

  • Python Whl aivis_engine_v2_toolbox-{VERSION}-py3-none-any.whl
  • Docker Image aivis-engine-v2-toolbox:{VERSION}

Create Engine Report

Each call to construct a toolbox HTML report for engine xy has the following structure:

from aivis_engine_v2_toolbox.api import build_xy_report

config = {
    "title": "My Use Case Title", 
    ...
    "outputFile": "/path/to/my-use-case-report.html"}
build_xy_report(config)

Additionally, the config needs to contain references to the respective engine's output files, e.g. "analysisReportFile": "/path/to/analysis-report.json". The full call to create a report for any engine can be found in python or argo examples of the respective engine.
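
For aivis Response Analysis, such a call could look like the sketch below. The function name build_ra_report is inferred from the build_xy_report pattern above and should be checked against the python examples shipped with the engine:

from aivis_engine_v2_toolbox.api import build_ra_report  # name inferred from the build_xy_report pattern

config = {
    "title": "German Credit Health",
    "analysisReportFile": "/path/to/analysis-report.json",
    "outputFile": "/path/to/my-use-case-report.html",
}
build_ra_report(config)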

Expert Configuration

There are many optional expert configurations to customize your HTML report. Some examples:

  • The aivis engine v2 toolbox always assumes timestamps to be unix timestamps and translates them to readable dates. This behaviour can be switched off via "advancedConfig": {"unixTime": False}, so that timestamps remain long values.

  • By referring to a metadata file via "metadataFile": "/path/to/metadata.json", signals are not only described via their signal id but enriched with more information. The metadata json contains an array of signals with the keys id (must) as well as name, description, unitName, unitSymbol (all optional):

    {"signals": [{
        "id": "fa6c65bb-5cee-45fa-ab19-355ba94889e9",
        "name": "et 1",
        "description": "extruder temperature nr. 1",
        "unitName": "Kelvin",
        "unitSymbol": "K"
      }, {
        "id": "dc3477e5-a83c-4485-b7f4-7528d336d9c4", 
        "name": "abc 2"
        }, 
       ...
    ]}
    
  • For every HTML report that contains a timeseries plot, additional signals can be added so that they are also displayed.

All custom configuration options can be seen in the api.py file in src/aivis_engine_v2_toolbox.