aivis Engine v2 - Dependency Analysis - User Guide

Download OpenAPI specification:Download

aivis Dependency Analysis is one of the engines of the aivis Technology Platform by Vernaio.
aivis Dependency Analysis enhances your understanding of signal interactions and helps in signal behavior analysis in complex industrial systems. The primary aim of this approach is to identify signals that behave similarly and group them into clusters. Similarity of two signals can be casted as a specific kind of distance metric: Signals are close to each other if they behave similar. Therefore, a cluster consists of signals that are close to each other and the cluster's center is the signal closest on average to every other signal within the cluster.

As a preparation step for aivis Anomaly Detection, aivis Dependency Analysis enhances the process of identifying the ideal target signals. It allows for the identification of the most representative or 'ideal' signal within a cluster, taking into account both its conformity to the cluster's characteristics and its distinctiveness.

Introduction

API References

This documentation explains the usage and principles behind aivis Dependency Analysis to data and software engineers. For detailed API descriptions of docker images, web endpoints and SDK functions, please consult the reference manual of the regarding component:

SDKs
- Python API reference: base, Dependency Analysis.
- Java API reference.
- C API reference.
Docker Images
- App-API
  - Dependency Analysis Worker.
- Web-API
  - Dependency Analysis Worker.

For additional support, go to Vernaio Support.

Artifact Distribution

Currently, aivis Dependency Analysis is distributed to a closed user base only. To gain access to the artifacts, as well as for any other questions, you can open a support ticket via aivis Support.

Workflow

aivis Dependency Analysis engine performs an analysis on the input Analysis Data. No target signal is required for this analysis. The end result of the engine is the report.

Evaluation

Example Use Case

This example case shows the usage of aivis Dependency Analysis applied to the Damadics dataset, which is a detailed collection of data from a sugar production process. This dataset includes 32 signals recorded over a 25-day period from October 29 to November 22, 2001. Capturing each signal every second, the dataset encompasses over two million data points. The aim is to shed light on the dependencies and relationships among the signals, providing a comprehensive overview of the signal dynamics during the analyzed period.

To this aim, a step-by-step instruction is provided below, guiding through the full process from data import over analysis to report generation. The signals are clustered based on correlation distances and the central signal in each cluster is pinpointed.

In the following sections of this guide, we will explore generating a detailed dependency report using the aivis Technology Platform SDKs, either directly, or via a docker image.

If you are interested in a broader view of the applications of the aivis Technology Platform, please note the more general examples.

Getting Started (SDK)

The SDK of aivis Dependency Analysis allows for direct calls from your C, Java or Python program code. All language SDKs internally use our native shared library (FFI). As C APIs can be called from various other languages as well, the C-SDK can also be used with languages such as R, Go, Julia, Rust, and more. Compared to the docker images, the SDK enables a more fine-grained usage and tighter integration.

In this chapter we will show you how to get started using the SDK.

Run Example Code

A working sdk example that builds on the code explained below can be downloaded directly here:

dependency-analysis-examples.zip (latest)

This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.

Additionally to the `dependency-analysis-examples.zip` you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

Required artifacts:

These aivis engine v2 .whl-files which you will receive in a libs.zip directly from aivis Support:
- vernaio_aivis_engine_v2_da_runtime_python_full-{VERSION}-py3-none-win_amd64.whl: A dependency analysis full python runtime
  (here for windows, fitting your operating system - see artifacts for other options on linux and macos.)
- vernaio_aivis_engine_v2_base_sdk_python-{VERSION}-py3-none-any.whl: The base python sdk
- vernaio_aivis_engine_v2_da_sdk_python-{VERSION}-py3-none-any.whl: The dependency analysis python sdk
- vernaio_aivis_engine_v2_toolbox-{TOOLBOX-VERSION}-py3-none-any.whl: The toolbox python sdk - optional for HTML report generation
An aivis licensing key, see licensing, which you will receive directly from aivis Support

Preparations:

Make sure you have a valid Python(>=3.10) installation.
To apply the aivis licensing key, create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
Make sure you have an active internet connection so that the licensing server can be contacted.
Download and unzip the dependency-analysis-examples.zip. The data CSV train_da.csv needs to stay in **/data.
Download and unzip the libs.zip. These .whl-files need to be in **/libs.

The folder now has the following structure:

+- data
|  +- train_da.csv
|
+- docker
|  +- # files to run the example via docker images, which we will not need now
|
+- java
|  +- # files to run the example via java sdk, which we will not need now 
|
+- libs
|  +- # the .whl files to run aivis
|
+- python
|  +- # files to run the example via python sdk

Running the example code:

Navigate to the **/python subfolder. Here, you find the classic python script example_da.py and the jupyter notebook example_da.ipynb. Both run the exact same example and output the same result. Choose which one you want to run.
There are various ways to install dependencies from .whl files. We will now explain two options, which are installing them via pip install or installing them via poetry. Many other options are also possible, of course.

Option A: pip install (only for the classic python script example_da.py, not for the jupyter notebook example_da.ipynb)

open a console in the **/python subfolder and run the following commands:

  # installs the `.whl` files
  pip install -r requirements-<platform>.txt

  # runs the classic python script `example_da.py`
  python example_da.py --input=../data --output=output

Option B: poetry install

If not already happened, install poetry, a python package manager:

  # installs poetry (a package manager)
  python -m pip install poetry

Run either the classic python script example_da.py

  # installs the `.whl` files
  poetry install --no-root

  # runs the classic python script `example_da.py`
  poetry run python example_da.py --input=../data --output=output

Or run jupyter notebook example_da.ipynb by executing the following commands in the console opened in the **/python subfolder. The first one might take a while, the third one opens a tab in your browser.

  # installs the `.whl` files
  poetry install --no-root

  # installs jupyter kernel
  poetry run ipython kernel install --user --name=test_da

  # runs the jupyter python script `example_da.ipynb`
  poetry run jupyter notebook example_da.ipynb

After running the scripts, you will find your computation results in **/python/output.

Additionally to the dependency-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

Required artifacts:

These aivis engine v2 .jar files which you will receive in a libs.zip directly from aivis Support:
- aivis-engine-v2-da-runtime-java-full-win-x8664-{VERSION}.jar: An dependency analysis full java runtime, here for windows, fitting your operating system - see artifacts for other options on linux and macos.
- aivis-engine-v2-base-sdk-java-{VERSION}.jar: The base java sdk
- aivis-engine-v2-da-sdk-java-{VERSION}.jar: The dependency analysis java sdk
- There is NO toolbox jar for HTML report generation.
An aivis licensing key, see licensing, which you will receive directly from aivis Support

Preparations:

Make sure you have a valid Java(>=11) installation.
To apply the aivis licensing key, create an environment variable AIVIS_ENGINE_V2_API_KEY and assign the licensing key to it.
Make sure you have an active internet connection so that the licensing server can be contacted.
Download and unzip the dependency-analysis-examples.zip. The data CSV train_da.csv needs to stay in **/data.
Download and unzip the libs.zip. These .jar-files need to be in **/libs.

The folder now has the following structure:

+- data
|  +- train_da.csv
|
+- docker
|  +- # files to run the example via docker images, which we will not need now
|
+- java
|  +- # files to run the example via java sdk 
|
+- libs
|  +- # the .jar files to run aivis
|
+- python
|  +- # files to run the example via python sdk, which we will not need now

Running the example code:

We use Gradle as our Java-Package-Manager. It's easiest to directly use the gradle wrapper.
Navigate to the **/java subfolder. Here, you find the build.gradle. Check, if the paths locate correctly to your aivis engine v2 .jar files in the **/libs subfolder.

open a console in the **/java subfolder and run the following commands:

  # builds this Java project with gradle wrapper
  ./gradlew clean build

  # runs Java with parameters referring to input and output folder
  java -jar build/libs/example_da.jar --input=../data --output=output

After running the scripts, you will find your computation results in **/java/output.

Artifacts

Our SDK artifacts come only in flavour full (but note that inf flavour will be introduced in a future release):

full packages provide the full functionality and are available for mainstream targets only:
- win-x8664
- macos-armv8* (macOS 11 "Big Sur" or later) 2.3
- macos-x8664* (macOS 11 "Big Sur" or later; until aivis engine version 2.9.0) 2.3
- linux-x8664 (glibc >= 2.14)

* Only Python and C SDKs are supported. Java SDK is not available for this target.

In this chapter we want to demonstrate the full API functionality and thus always use the full package.

To use the Python-SDK you must download the SDK artifact (flavour and target generic) for your pythonpath at build time. Additionally at installation time, the runtime artifact must be downloaded with the right flavour and target.

The artifacts are distributed through a PyPI registry.

Using Poetry you can simply set a dependency on the artifacts specifying flavour and version. The target is chosen depending on your installation system:

aivis_engine_v2_da_sdk_python = "{VERSION}"
aivis_engine_v2_da_runtime_python_{FLAVOUR} = "{VERSION}"

To use the Java-SDK, you must download at build time:

SDK artifact (flavour and target generic) for your compile and runtime classpath
Runtime artifact with the right flavour and target for your runtime classpath

It is possible to include multiple runtime artifacts for different targets in your application to allow cross-platform usage. The SDK chooses the right runtime artifact at runtime.

The artifacts are distributed through a Maven registry.

Using Maven, you can simply set a dependency on the artifacts specifying flavour, version and target:

<dependency>
  <groupId>com.vernaio</groupId>
  <artifactId>aivis-engine-v2-da-sdk-java</artifactId>
  <version>{VERSION}</version>
</dependency>
<dependency>
  <groupId>com.vernaio</groupId>
  <artifactId>aivis-engine-v2-da-runtime-java-{FLAVOUR}-{TARGET}</artifactId>
  <version>{VERSION}</version>
  <scope>runtime</scope>
</dependency>

Alternativly, with Gradle:

implementation 'com.vernaio:aivis-engine-v2-da-sdk-java:{VERSION}'
runtimeOnly    'com.vernaio:aivis-engine-v2-da-runtime-java-{FLAVOUR}-{TARGET}:{VERSION}'

To use the C-SDK, you must download the SDK artifact at build time (flavour and target generic). For final linkage/execution you need the runtime artifact with the right flavour and target.

The artifacts are distributed through a Conan registry.

Using Conan, you can simply set a dependency on the artifact specifying flavour and version. The target is chosen depending on your build settings:

aivis-engine-v2-da-sdk-c/{VERSION}
aivis-engine-v2-da-runtime-c-{FLAVOUR}/{VERSION}

The SDK artifact contains:

Headers: include/aivis-engine-v2-da-core-full.h

The runtime artifact contains:

Import library (LIB file), if Windows target: lib/aivis-engine-v2-da-{FLAVOUR}-{TARGET}.lib
Runtime library (DLL file), if Windows target: bin/aivis-engine-v2-da-{FLAVOUR}-{TARGET}.dll (also containing the import library)
Runtime library (SO file), if Linux target: lib/aivis-engine-v2-da-{FLAVOUR}-{TARGET}.so (also containing the import library)

The runtime library must be shipped to the final execution system.

Licensing

A valid licensing key is necessary for every aivis calculation in every engine and every component. It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.

aivis will send HTTPS requests to https://v3.aivis-engine-v2.vernaio-licensing.com (before release 2.7: https://v2.aivis-engine-v2.vernaio-licensing.com, before release 2.3: https://aivis-engine-v2.perfectpattern-licensing.de) to check if your licensing key is valid. Therefore, requirements are an active internet connection as well as no firewall blocking an application other than the browser calling this url.

If aivis returns a licensing error, please check the following items before contacting aivis Support:

Has the environment variable been correctly set?
Licensing keys have the typical form <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there must be no whitespace.
Applications and in particular terminals often need to be restarted to learn newly set environment variables.
Open https://v3.aivis-engine-v2.vernaio-licensing.com in your browser. The expected outcome is "Method Not Allowed". In that case, at least the url is not generally blocked.
Sometimes, firewalls block applications other than the browser acessing certain or all websites. Try to investigate if you have such a strict firewall.

Setup

Before we can invoke API functions of our SDK, we need to set it up for proper usage and consider the following things.

Releasing Unused Objects

It is important to ensure the release of allocated memory for unused objects.

In Python, freeing objects and destroying engine resources is done automatically. You can force resource destruction with the appropriate destroy function.

In Java, freeing objects is done automatically, but you need to destroy all engine resources like Data- and Analysis-objects with the appropriate destroy function. As they all implement Java’s AutoClosable interface, we can also write a try-with-resource statement to auto-destroy them:

try(final DependencyAnalysisData inputData = DependencyAnalysisData.create()) {

  // ... do stuff ...

} // auto-destroy when leaving block

In C, you must always

free every non-null pointer allocated by the engine with aivis_free (all pointers returned by functions and all double pointers used as output function parameter e.g. Error*)
Note: aivis_free will only free own objects. Also, it will free objects only once and it disregards null pointers.
free your own objects with free as usual.
destroy all handles after usage with the appropriate destroy function.

Error Handling

Errors and exceptions report what went wrong on a function call. They can be caught and processed by the outside.

In Python, an Exception is thrown and can be caught conveniently.

In Java, an AbstractAivisException is thrown and can be caught conveniently.

In C, every API function can write an error to the given output function parameter &err (to disable this, just set it to NULL). This parameter can then be checked by a helper function similar to the following:

const Error *err = NULL;

void check_err(const Error **err, const char *action) {

  // everything is fine, no error
  if (*err == NULL)
    return;

  // print information
  printf("\taivis Error: %s - %s\n", action, (*err)->json);

  // release error pointer
  aivis_free(*err);
  *err = NULL;

  // exit program
  exit(EXIT_FAILURE);
}

Failures within function calls will never affect the state of the engine.

Logging

The engine emits log messages to report on the progress of each task and to give valuable insights. These log messages can be caught via registered loggers.

# create logger
class Logger(EngineLogger):
    def log(self, level, thread, module, message):
        if (level <= 3):
            print("\t... %s" % message)

# register logger
DependencyAnalysisSetup.register_logger(Logger())

// create and register logger
DependencyAnalysisSetup.registerLogger(new EngineLogger() {
            
    public void log(int level, String thread, String module, String message) {
        if (level <= 3) {
            System.out.println(String.format("\t... %s", message));
        }
    }
});

// create logger
void logger(const uint8_t level, const char *thread, const char *module, const char *message) {
  if (lvl <= 3)
    printf("\t... %s\n", message);
}

// register logger
aivis_setup_register_logger(&logger, &err);
check_err(&err, "Register logger");

Thread Management

During the usage of the engine, a lot of calculations are done. Parallelism can drastically speed things up. Therefore, set the maximal threads to a limited number of CPU cores or set it to 0 to use all available cores (defaults to 0).

# init thread count
DependencyAnalysisSetup.init_thread_count(4)

// init thread count
DependencyAnalysisSetup.initThreadCount(4);

// init thread count
aivis_setup_init_thread_count(4, &err);
check_err(&err, "Init thread count");

Data Input

Now that we are done setting up the SDK, we need to create a data store that holds our historical Analysis Data. In general, all data must always be provided through data stores. You can create as many as you want.

After the creation of the data store, you can fill it with signal data. The classic way to do it is writing your own reading function and adding signals, i.e. lists of data points, to the data context yourself, as it is shown in Data Reader Options.

We recommend to use the built-in files reader, which processes a folder with csv files that have to follow the CSV Format Specification. We assume, that the folder path/to/input/folder/ contains train_da.csv.

# create empty data context for analysis data
analysis_data = DependencyAnalysisData.create()

# create config for files reader
files_reader_config = json.dumps(
    {
        "folder": "path/to/input/folder/"
    }
)

# read data 
analysis_data.read_files(files_reader_config)

# ... use analysis data ...

// create empty data context for analysis data
try(final DependencyAnalysisData analysisData = DependencyAnalysisData.create()) {
  
  // create config for files reader
  final DtoTimeseriesFilesReaderConfig filesReaderConfig = new DtoTimeseriesFilesReaderConfig("path/to/input/folder/");
  
  // read data 
  analysisData.readFiles(filesReaderConfig);
  
  // ... use analysis data ...
  
} // auto-destroy analysis data

// create empty data context for analysis data
TimeseriesDataHandle analysis_data = aivis_timeseries_data_create(&err);
check_err(&err, "Create analysis data context");

// create config for files reader
const char *reader_config = "{"
  "\"folder\": \"path_to_input_folder\""
"}";

// read data 
aivis_timeseries_data_read_files(analysis_data, (uint8_t *) reader_config, strlen(reader_config), &err);
check_err(&err, "Read Files");

// ... use analysis data ...

// destroy data context
aivis_timeseries_data_destroy(analysis_data, &err);
check_err(&err, "Destroy data context");
analysis_data = 0;

In the following, we will assume you have read in the file train_da.csv shipped with the Example Project.

Analysis

With the data store filled with historical Analysis Data, we can now create our analysis:

# build analysis config
analysis_config = json.dumps(
    {
        "sampling": {"maximalSampleCount": 10000},
        "clustering": {"minimalClusterSize": 1, "outputMultipleZoomLevels": True},
    }
)

# create analysis
analysis = DependencyAnalysis.create(analysis_data, analysis_config)

# ... use analysis ...

// build analysis config
final DtoAnalysisConfig analysisConfig =
  new DtoAnalysisConfig().withClustering(
    new DtoClusteringConfig().withMinimalClusterSize(1).withOutputMultipleZoomLevels(true))
    .withSampling(new DtoSamplingConfig().withMaximalSampleCount(10000));
// create analysis
final DependencyAnalysis analysis = DependencyAnalysis.create(analysisData, analysisConfig) {

  // ... use analysis ...

} // auto-destroy analysis

// build analysis config
const char *analysis_config =
  "{"
  "\"sampling\": {\"maximalSampleCount\": 10000},"
  "\"clustering\": {"
  "\"minimalClusterSize\": 1,"
  "\"outputMultipleZoomLevels\": true "
  "}"
  "}";

// create analysis
DependencyAnalysisHandle analysis_handle = aivis_dependency_analysis_create(
  analysis_data,
  (uint8_t *) analysis_config,
  strlen(analysis_config),
  &err
);
check_err(&err, "Create analysis");

// ... use analysis ...

// destroy analysis
aivis_dependency_analysis_destroy(analysis_handle, &err);
check_err(&err, "Destroy analysis");
analysis_handle = 0;

For the moment, you may take this file as it is. The different keys will become clearer from the later sections and the reference manual.

Getting Started (Docker)

The docker images of aivis Dependency Analysis are prepared for easy usage. They use the SDK internally, but have a simpler file-based interface. If you have a working docker workflow system like Argo, you can build your own automated workflow based on these images.

In this chapter, we will show you how to get started using docker images. Usage of the SDK will be covered by the next chapter.

Run Example Code

A working example that builds on the code explained below can be downloaded directly here: dependency-analysis-examples.zip.

This zip file contains example code for docker, python and java in respective subfolders. All of them use the same dataset which is in the data subfolder.

Prerequisites: Additionally to the dependency-analysis-examples.zip you just downloaded, you need the following artifacts. To gain access, you can open a support ticket via aivis Support.

The docker images aivis-engine-v2-da-worker and (optionally for HTML report generation) aivis-engine-v2-toolbox
An aivis licensing key, see licensing

As a Kubernetes user even without deeper Argo knowledge, the aivis-engine-v2-example-da-argo.yaml shows best how the containers are executed after each other, how analysis worker is provided with a folder that contains the data csv and how the toolbox assembles a HTML report at the end.

Artifacts

There is one docker image:

The Worker creates the report:
{REGISTRY}/{NAMESPACE}/aivis-engine-v2-da-worker:{VERSION}

The docker image is Linux-based.

Requirements

You need an installation of Docker on your machine as well as access to the engine artifacts

docker -v
docker pull {REGISTRY}/{NAMESPACE}/aivis-engine-v2-da-worker:{VERSION}

docker -v
docker pull {REGISTRY}/{NAMESPACE}/aivis-engine-v2-da-worker:{VERSION}

Licensing

A valid licensing key is necessary for every aivis calculation in every engine and every component. It has to be set (exported) as environment variable AIVIS_ENGINE_V2_API_KEY.

If aivis returns a licensing error, please check the following items before contacting aivis Support:

Has the environment variable been correctly set?
Licensing keys have the typical form <FirstPartOfKey>.<SecondPartOfKey> with first and second part being UUIDs. In particular, there must be no whitespace.
Applications and in particular terminals often need to be restarted to learn newly set environment variables.
Open https://v3.aivis-engine-v2.vernaio-licensing.com in your browser. The expected outcome is "Method Not Allowed". In that case, at least the url is not generally blocked.
Sometimes, firewalls block applications other than the browser acessing certain or all websites. Try to investigate if you have such a strict firewall.

Analysis

At the beginning, we create a folder docker, a subfolder analysis-config and add the configuration file config.yaml:

  data:
    folder: /mnt/workdir/csvArchive
    dataTypes: 
      defaultType: FLOAT
  analysis: 
    sampling: 
      maximalSampleCount: 10000
    clustering: 
      minimalClusterSize: 1
      outputMultipleZoomLevels: true 
  output: 
    folder: /mnt/workdir/analysis-worker-output

For the moment, you may take this file as it is. The different keys will become clearer from the later sections and the docker reference manual.

As a next step, we create a second folder data and add the Input Data CSV file train_da.csv to the folder. Afterwards, we create a blank folder output.

Our folder structure should now look like this:

+- docker
|  +- analysis-config
|      +- config.yaml
|
+- data
|  +- train_da.csv
|
+- output

Finally, we can start our analysis via:

docker run --rm -it \
  -v $(pwd)/docker/analysis-config:/srv/conf \
  -v $(pwd)/data/train_da.csv:/srv/data/train_da.csv \
  -v $(pwd)/output:/srv/output \
  {REGISTRY}/{NAMESPACE}/aivis-engine-v2-da-worker:{VERSION}

docker run --rm -it `
  -v ${PWD}/docker/analysis-config:/srv/conf `
  -v ${PWD}/data/train_da.csv:/srv/data/train_da.csv `
  -v ${PWD}/output:/srv/output `
  {REGISTRY}/{NAMESPACE}/aivis-engine-v2-da-worker:{VERSION}

After a short time, this should lead to an output file analysis-report.json in the output folder. The result is visualized below.

Getting Started Example Results

In this section, we discuss the results of our example project, Damadics. This dataset includes 32 signals recorded over a 25-day period from October 29 to November 22, 2001.

In the interactive report below, you can view various clusterizations generated by the aivis Dependency Analysis engine. On the left panel, a projection of the signals on a two-dimensional plane is shown, with different colors indicating the clusters they are aggregated in. The scrollbar at the top allows navigation through all zoom levels, altering the panels just below it. These panels display the different clusters per zoom level and the signals in each cluster, respectively. You can select multiple clusters, and the plot displaying the signals will update automatically.

Just below the lower right panel, there are two search bars: one to select a focus signal and the other to filter and find specific signals within a cluster. The focus signal is a specific signal of interest, and the correlations of the other signals with this one are then displayed. Two buttons are located below the panel showing the plot of the signals: 'Deselect All Clusters' and 'Export Config Snippets of Selected Clusters'. The former button allows you to deselect all chosen clusters. The latter is intended for export of the signals of selected clusters, formatted as input to other engines.

Data Specification

In the course of using aivis, large amounts of data are ingested. This chapter explains the terminology as well as the required format, quality and quantity.

Timeseries Data / Signals

Most aivis engines work on time series data that is made up of signals. Every signal consists of two things, these being

an ID, which is any arbitrary String except timestamp and availability. The ID needs to be unique within the data.
a list of data points. Each data point consists of a signal value and a specific point in time, the Detection Timestamp (optionally there can also be an Availability Timestamp, but more on that later). Usually the values are the result of a measurement happening in a physical sensor like a thermometer, photocell or electroscope, but you can also use market KPIs like stock indices or resource prices as a signal.

The data points for one or more signals for a certain detection time range are called a history.

Timeseries

The values of a signal can be boolean values, 64-bit Floating Point numbers or Strings. Non-finite numbers (NAN and infinity) and empty strings are regarded as being unknown and are therefore skipped.

Points in time are represented by UNIX Timestamps in milliseconds (64-bit Integer). This means the number of milliseconds that have passed since 01.01.1970 00:00:00 UTC.

Detection Timestamp

The point in time that a signal value belongs to is called the Detection Timestamp. This usually is the timestamp when the measurement originally has taken place. If the measurement is a longer offline process, it should refer to the point in time at which the measured property was established, e.g. the time point of sample drawing or the production time for delayed sampling.

Different signals may have different Detection Timestamps. Some might have a new value every second, some every minute, some just when a certain event happens. aivis automates the process of synchronizing them internally. This includes dealing with holes in the data.

Availability Timestamp

Strictly speaking, aivis Dependency Analysis does not perform any instance of (live) inference. Therefore, this section could be seen as unnecessary in the context of aivis Dependency Analysis. Nevertheless, given an important use case of aivis Dependency Analysis is its usage together with aivis Anomaly Detection, it is still worth mention the generalities of the timeseries data that underlies the engines in aivis.
When doing a historical evaluation, we want to know what the engine would have inferred/predicted for a list of Inference Timestamps that lie in the past (Inference Timestamps are the moments for which you want to get an inference). For a realistic inference, the engine must ignore all signal values that were not yet available to the database at the Inference Timestamp. A good example for such a case is a measurement, that is recorded by a human. The value of this measurement will be backdated by him/her to the Detection Timestamp, but it took e.g. 5 minutes to extract the value and report it to the system. So, it would be wrong to assume that one minute after this fictitious Detection Timestamp, the value would have been already available to the Inference. Another example case is the fully automated lagged data ingestion of distributed systems (especially cloud systems).

There are multiple ways to handle availability. Which strategy you use depends an the concrete use case.

To allow for these different strategies, every data point can have an additional Availability Timestamp that tells the system when this value became available or would have been available.

Training Data Filtering

There is the possibility of filtering the Training Data in multiple ways:

The overall time window can be restricted.
Signals can be excluded and included as a whole.
Specific time windows of specific signals can be excluded or included.

The filtering is configurable:

The docker image Analysis Worker can be configured in the main config file.
SDK Analysis API has filter nodes in the their config structure.

This means that two Dependency Reports could be constructed on the same data set, but on different time windows or signal sets. Alternatively, the user can of course also restrict the data that enters the engine beforehand.

CSV Format

All artifacts use CSV as the input data format. As the CSV format is highly non-standardized, we will discuss it briefly in this section.

CSV files must be stored in a single folder specified in the config under data.folder. Within this folder the CSV files can reside in an arbitrary subfolder hierarchy. In some cases (e.g. for HTTP requests), the folder must be passed as a ZIP file.

General CSV rules:

The file’s charset must be UTF-8.
Records must be separated by Windows or Unix line ending (CR LF/LF). In other words, each record must be on its own line.
Fields must be separated by comma.
The first line of each CSV file represents the header, which must contain column headers that are file-unique.
Every record including the header must have the same number of fields.
Text values must be enclosed in quotation marks if they contain literal line endings, commas or quotation marks.
Quotation marks inside such a text value have to be prefixed (escaped) with another quotation mark.

Special rules:

One column must be called timestamp and contain the Detection Timestamp as UNIX Timestamps in milliseconds (64-bit Integer).
Another column can be present that is called availability. This contains the Availability Timestamp in the same format as the Detection Timestamp.
All other columns, i.e. the ones that are not called timestamp or availability, are interpreted as signals.
Signal IDs are defined by their column headers.
If there are multiple files containing the same column header, this data is regarded as belonging to the same signal.
Signal values can be boolean values, numbers and strings.
Empty values are regarded as being unknown and are therefore skipped.
Files directly in the data folder or in one of its subfolders are ordered by their full path (incl. filename) and read in this order.
If there are multiple rows with the same Detection Timestamp, the data reader proceeds all to the engine which uses the last value that has been read.

Boolean Format

Boolean values must be written in one of the following ways:

true/false (case insensitive)
1/0
1.0/0.0 with an arbitrary number of additional zeros at the end

Regular expression: (?i:true)|(?i:false)|1(\.0+)?|0(\.0+)?

Number Format

Numbers are stored as 64-bit Floating Point numbers. They are written in scientific notation like -341.4333e-44, so they consist of the compulsory part Significand and an optional part Exponent that is separated by an e or E.

The Significand contains one or multiple figures and optionally a decimal separator .. In such a case, figures before or after the separator can be ommited and are assumed to be 0. It can be prefixed with a sign (+ or -).

The Exponent contains one or multiple figures and can be prefixed with a sign, too.

The 64-bit Floating Point specification also allows for 3 non-finite values (not a number, positive infinity and negative infinity) that can be written as nan, inf/+inf and -inf (case insensitive). These values are valid, but the engine regards them as being unknown and they are therefore skipped.

Regular expression: (?i:nan)|[+-]?(?i:inf)|[+-]?(?:\d+\.?|\d*\.\d+)(?:[Ee][+-]?\d+)?

String Format

String values must be encoded as UTF-8. Empty strings are regarded as being unknown values and are therefore skipped.

Example

timestamp,availability,SIGNAL_1,SIGNAL_2,SIGNAL_3,SIGNAL_4,SIGNAL_5
1580511660000,1580511661000,99.98,74.33,1.94,true,
1580511720000,1580511721000,95.48,71.87,-1.23,false,MODE A
1580511780000,1580511781000,100.54,81.19,,1e-5,MODE A
1580511840000,1580511841000,76.48,90.01,2.46,0.0,MODE C
...

Preparation

Previous sections gave an introduction on how to use aivis Dependency Analysis and also shed some light on how it works. The following sections will explain more on the concept and provide a more profound background. It is not necessary to know this background to use aivis Dependency Analysis! However, you may find convenient solutions for specific problems, or information on how to optimize your usage of aivis Dependency Analysis. It will become clear that only minimal user input is required for the engine to perform well. Nevertheless, the user has the option to control the process with several input parameters which will be presented below.

Analysis

Workflow

Data Preparation

The requirements for the input data that were formulated in section Data Specification serve the purpose of making it possible for aivis to read the data. Typically, several additional steps are required to make the data appropriate to be fed into a machine learning algorithm. Among others, these include:

synchronizing timestamps
dealing with missing values

All of the above is handled by aivis automatically. Here, data preparation steps that go beyond anything described in the “Data Specification” section are not necessary and even discouraged as they may alter the underlying information. Synchronization is not necessary as aivis treats each signal separately as a time series, and this also eliminates the need for imputation of missing values.

As mentioned above, aivis takes over many time-consuming data preparation tasks. For encoding of special kinds of signals such as angles, audio data, or categorical data, there are built-in interpreters.

Computation of Correlations and Clusterization

After preparation, each signal is paired with every other signal present in the data storage, and the correlation between them is computed. This process depends strongly on the selected interpreter for each signal. Once all possible combinations are exhausted, and the corresponding distances are derived from the correlations, a clustering process is executed and all possible clusterings are obtained for different values of the zoom factor (this quantity will be explained in detail in the next section).

A Fully Loaded Analysis Configuration

First, an overview of all kinds of possible configuration keys is presented. A more minimal analysis configuration was used above in SDK analysis, respectively, in Docker analysis. This example may mainly serve as a quick reference. The meaning of the different keys is explained in the following sections, and a definition of the syntax is given in the reference manuals.

analysis: 
  dataFilter:
    startTime: 1004569259000
    endTime: 1004655599000
    excludeSignals:
    - signal: S_10
      startTime: 1004654938000
      endTime: 1004655313000
    # includeSignals: ... similar
    includeRanges:
    - startTime: 1004569259000
      endTime: 1004654938000
    - startTime: 1004655530000
      endTime: 1004655599000
    # excludeRanges: ... similar
  sampling:
    maximalSampleCount: 10000
  operativePeriods: 
    signal: MY_BOOLEAN_OPERATIVE_SIGNAL
  clustering:
    minimalClusterSize: 1
    outputMultipleZoomLevels: true
  signals: 
  - signal: S_0
    interpreter: 
      _type: Categorical

analysis_config = json.dumps({
  "dataFilter": {
    "startTime": 1004569259000,
    "endTime": 1004655599000,
    "excludeSignals": [{
      "signal": "SIGNAL_10",
      "startTime": 1004654938000,
      "endTime": 1004655313000
    }],
    # "includeSignals": ... similar
    "includeRanges" : [{
      "startTime": 1004569259000,
      "endTime": 1004654938000
    },{
      "startTime": 1004655530000,
      "endTime": 1004655599000
    }],
    # "excludeRanges": ... similar
  },
  "sampling": {
    "maximalSampleCount": 10000,
  },
  "operativePeriods": {
    "signal": "MY_BOOLEAN_OPERATIVE_SIGNAL"
  },
  "clustering": {
    "minimalClusterSize": 1,
    "outputMultipleZoomLevels": True
  },
  "signals":
    [{"signal": "S_0",
      "interpreter": {"_type": "Categorical"}
     }],
  })

final DtoAnalysisConfig analysisConfig = new DtoAnalysisConfig()
  .withDataFilter(new DtoDataFilter()
    .withStartTime(1004569259000L)
    .withEndTime(1004655599000L)
    .withExcludeSignals(new DtoDataFilterRange[] { 
      new DtoDataFilterRange("S_10")
        .withStartTime(1004654938000L)
        .withEndTime(1004655313000L)
    })
    // .withIncludeSignals ... similar
    .withIncludeRanges(new DtoInterval[] { 
      new DtoInterval()
        .withStartTime(1004569259000L)
        .withEndTime(1004654938000L),
        new DtoInterval()
        .withStartTime(1004655530000L)
        .withEndTime(1004655599000L),
    })
    // .withExcludeRanges ... similar
  )
  .withSampling(new DtoSamplingConfig()
    .withMaximalSampleCount(10000)
  )
  .withOperativePeriods(new DtoOperativePeriodsConfig("MY_BOOLEAN_OPERATIVE_SIGNAL"))
  .withClustering(new DtoClusteringConfig()
    .withMinimalClusterSize(1)
    .withOutputMultipleZoomLevels(true)
  )
  .withSignals(new DtoSignalConfig[]{
    new DtoSignalConfig("S_0")
      .withInterpreter(new DtoCategoricalSignalInterpreter()),
  })
  ;

const char *analysis_config = "{"
 "\"dataFilter\": {"
   "\"startTime\": 1004569259000,"
   "\"endTime\": 1004655599000,"
   "\"excludeSignals\": [{"
   "\"signal\": \"S_10\","
   "\"startTime\": 1004654938000,"
   "\"endTime\": 1004655313000"
 "}]," 
  // "\"includeSignals\": ... similar
   "\"includeRanges\": [{"
   "\"startTime\": 1004569259000,"
   "\"endTime\": 1004654938000"
  "},{"
  "\"startTime\": 1004655530000,"
   "\"endTime\": 1004655599000"
  "}]" 
  // "\"excludeRanges\": ... similar
  "},"
  "\"sampling\": {"
    "\"maximalSampleCount\": 10000"
  "},"
  "\"operativePeriods\": {"
    "\"signal\": \"MY_BOOLEAN_OPERATIVE_SIGNAL\""
  "},"
 "\"clustering\" : {"
   "\"minimalClusterSize\": 1,"
   "\"outputMultipleZoomLevels\": true"
  "},"
  "\"signals\" : [{"
    "\"signal\" : \"S_0\","
    "\"interpreter\": {"
    "\"_type\": \"Categorical\""
   "}"
  "}]"
"}";

Data Filter: Exclude Parts of the Data

The following sections list and explain the parameters the user may configure to control the data on which the analysis is performed. The sections are organized along the structure of the configuration classes.

The data filter allows you define the signals and time range that are used for the analysis. Concretely, the data filter allows you to choose signals for the analysis. This can be done by either of the following ways: exclude signals, or, alternatively, provide a list of signal names to include (include signals). Beyond that, the data filter allows you to determine the time range of data that is used for the analysis, and even to include or exclude separate time ranges for specific signals.

It is also possible to include/exclude several time intervals globally, instead of just selecting/excluding one global time interval. This is carried out using the fields include ranges and exclude ranges 2.5.

Analogous to the global start time and end time, such intervals can also be specified for individual signals. Note, however, that it is usually advisable to apply time constraints globally.

Data excluded by data filters is still available for the expression language but does not directly enter the model building. Control on signal selection for model building is provided by the signal configuration. Finally, note that complementary to the data filter, periods of interest can conveniently be defined by operative periods.

Sampling Configuration: Adjust the number of analysis points

Standard practice often involves using all available data points for analysis. This, however, might be more than what is actually necessary. Adjusting the maximum sample count is a strategic way to avoid overly long analysis durations by specifically limiting the number of timestamps utilized for each signal in the dataset. For instance, consider a dataset with individual signals comprising 2 million values each. Utilizing every value from these signals for analysis could lead to significant computational demands. By default, this number is pared down to 70,000 per signal. For faster processing, this figure can be further reduced. Surprisingly effective outcomes are often achieved with just 10,000 sample points per signal, or even fewer. This works thanks to aivis Dependency Analysis automatically choosing the most informative sample points. This is achieved by ensuring a comprehensive representation of the entire range of behavioral patterns. Should the maximum sample count be set higher than the number of values in a signal, then all points from that signal are considered.

Operative Periods: Exclude Downtimes

Sometimes, the data includes data points which are to be excluded from analysis. A typical situation is the time during which a machine is off, possibly including warm up and cool down phases. Also signal behavior during maintenance time, typically is not of interest. To restrict the analysis to times of interest, a signal may be assigned to be the operative signal. An operative signal must be boolean. Then, analysis is restricted to timestamps for which the operative signal is true. Often, there is no such signal in the raw data but it may be easy to derive the operative times from other signals. For example some motor may be stopped when production is off. If motor speed is above a certain threshold, this may be used to define operative periods. For such situations, an operative signal may easily be created with help of the expression language.

Clustering configuration: Size of clusters and number of clusterings

This sector of the configuration controls two very important aspects of aivis Dependency Analysis, the minimum number of signals per cluster and the number of possible clusterizations to be given as output. In particular, minimal cluster size controls the minimum number of signals that we allow to form a cluster. Its default setting is one, but it could also be practical to be interested only in clusters with more than a given amount of signals. The second element in this configurations is output multiple zoom levels. To explain its behavior, we should first introduce the concept of Zoom Level in the context of aivis Dependency Analysis. One clusterization is calculated for each zoom level. A clusterization is a partition of all signals into clusters. A high zoom level means that signals are distributed among many small clusters. For a low zoom level all signals are distributed to only a few large clusters. If output multiple zoom levels is set to true, the engine will save all clusterizations into the report. If set to false, the engine will only save the clusterization that yields the best separation.

Signal Configuration: If Signals Require Special Treatment

The signal configuration is the place to pass additional information about feature signals in order to enforce a special treatment. Each signal configuration refers to one specific signal.

Interpreter

At the core of the signal configuration is the interpreter. The interpreter defines which features can be built from a signal — that is which triggers can be applied to a signal. Very often the default configuration is the best choice and you don't need to set any interpreter. Below you find a table on the different interpreters, followed by some more in-depth explanations.

Interpreter	Short Explanation	Examples
Default	Corresponds to a numerical interpreter for float signals, and to a categorical one for string and boolean signals
Numerical	No special aspect generation. The signal is taken as it is.	Speed, temperature, weight,...
Categorical	Each signal value corresponds to some category. Categories have no order.	Color, operation mode, on/off,...
Cyclic	Signal values can be mapped to a finite interval. Lower and upper bound of this interval are identified with each other.	Angles (0° to 360°), time of the day (0:00 to 24:00),...
Oscillatory	Signal contains periodically recurrent parts. Interest is rather in the frequency of recurrences than the actual signal values.	Audio data, vibrations,...

By default, all float signals are interpreted as numerical. This interpreter should be used for all signals for which the order of numbers is meaningful and which don't require some special treatment. A thermometer, for example, generates numerical data: the smaller the number the colder the temperature. It is irrelevant whether the scale is continuous, or whether the thermometer’s reading precision is limited to integer degrees. The numerical signal kind is quite common for float signals but there are also situations, for which it does not fit. Therefore, float signals may be also declared any of the other signal kinds.

String and boolean signals are always interpreted as categorical. Categorical data has nominal scale, i.e. it takes only specific levels and does not necessarily follow any order. In practice, this would express the information about certain states, such as “green”, “red”, or “blue”. This information may be present in form of strings, booleans, or also encoded in numbers. An example could be a signal for which "1.0" stands for "pipe open", "2.0" for "pipe blocked", and "3.0" for "pipe sending maintenance alarm".

For a cyclic signal only the residue from division by the cycle length is accounted for. This means the order of numbers is meaningful but it wraps at the cycle length. A common example are angles. Angles are usually defined in the interval \(0\) to \(2 \pi\). This means a cycle length of \(2 \pi\). If the signal takes a value outside this range, it is automatically mapped therein. For example, \(2.1 \pi\) is identified with \(0.1 \pi\). And, of course, 0 and \(1.99 \pi\) are considered to be close to each other. Another example can be derived from a continuous time signal. Let's say time is measured in the unit of hours. Then, applying an interpreter with cycle length 24, yields an aspect that describes the time of the day.

Finally, audio, vibration, or any other data that oscillates with some periodicity may best be interpreted as oscillatory. Oscillatory signals are interpreted in the frequency domain. In order to calculate a frequency spectrum and automatically derive the most relevant aspects, two configuration parameters are necessary. The mesh describes the shortest timespan to consider, the inverse sampling frequency. For example, a mesh of 2 milliseconds means a sample rate of 0.5 kHz. Within this documentation, the unit of the timestamps is usually assumed to be milliseconds to keep explanations concise. However, the unit of timestamps is irrelevant internally. Oscillatory signals may well have sample rates above 1 kHz for which a more fine-grained time unit is necessary. For example, a 32 kHz audio signal has a signal value each 0.031250 milliseconds. In this case, the usual notion of timestamps as milliseconds does not work anymore. Instead, timestamps may be provided in units of nanoseconds and full information is retained for a mesh of 31250. (Alternatively, timestamps may also be provided in units of a thirty second part of a millisecond, and full information is retained for a mesh of 1.) If the highest frequencies of the signal are expected not to be relevant, i.e. if the microphone or detector records with a higher rate than actually needed, the mesh may be chosen larger than the difference between timestamps in the data. In the above example, a mesh of 62500 nanoseconds would retain only each second value. The other parameter is the window length. It describes the longest time span to consider for a frequency spectrum. Therefore, it should reflect some reasonable time to "listen" to the signal before trying to get information out of it. The window length defines the period of the lowest frequency that can be analysed. Therefore, at the very least it should be as long as the period of the lowest relevant frequency. If no data are provided during some interval larger than twice the mesh, no frequency spectrum is calculated for this gap. Instead, the frequency spectrum is calculated for the last period for which there is no gap over the window length. This behavior allows for discontinuous signal transmission. To reduce the amount of data, for example, the signal may be provided in bunches which are sent each 2 minutes, each covering a time period of 10 seconds. The drawbacks are some delay of results, up to 2 minutes in the above example, and loss of information about anything happening within the gap.

Sometimes, it is not clear which interpreter to choose. As an example, take a signal for which "0.0" may stand for "no defects", "1.0" for "isolated microscopic defects" and "3.0" for "microscopic connected defects". A priori, one may assume that the effect of isolated defects may be somewhere in between no defects and connected defects, and thus assign the numerical scale. On the other hand, isolated microscopic defects may have no relevant effect: It may be irrelevant whether there are no or isolated defects. In this case, a categorical scale would be preferable. Such a situation can easily be dealt with: create a duplicate of the signal with the help of the expression language, configure one of the signals as numerical and the other as categorical, and let aivis make the best out of both.

Output: Report

As a result of analysis, a report is produced, which contains all the clusters with their corresponding signals, plus all viable correlations between pairs of signals. With an appropriate visualization tool, one can plot the clusters, the centers of the clusters and the signal correlations from the report.

Appendix 1: Expression Language

Before starting the workflow, sometimes there is the need to add a new signal to the dataset (a synthetic signal) that is derived from other signals already present. There are various reasons for this, especially if

you want to predict a quantity that is not in your Training Data, but it could be calculated by a formula. For that task, you need to add the new signal via an expression and then use this new synthetic signal as target.
you want to restrict the training to operative periods but there is no signal that labels when your machines were off. However, you may be able to reconstruct these periods based on some other signals.
you posess domain knowledge and you want to include and pinpoint the engine to some important derived quantity. Often certain derived quantities play a specific role in the application's domain, and might be easier to understand/verify as opposed to the raw quantities.

Technically, you can add synthetic signals using the docker images or any SDK Data API

To create new synthetic signals in a flexible way, aivis features a rich Expression Language to articulate the formula.

The Expression Language is an extension of the scripting language Rhai. We have mainly added support for handling signals natively. This means, you can use signals in normal operators and functions as if they were primitive values. You can even mix signals and primitive values in the same invocation. If at least one parameter is a signal, the result will also be a signal. The list of operators and functions that allow native signal handling can be found in the section on operators and functions.

Information on the basic usage of the language can be found in the very helpful Language Reference of the Rhai Book. This documentation will mainly focus on the added features.

Signal Type

A signal consists of a list of data points that represents a time series (timestamps and values of the same type).

The following value types are supported:

bool : Boolean
i64 : 64-bit Integer
f64 : 64-bit Floating Point
string : UTF-8 String

A signal type and its value type are written generically as signal<T> and specifically like e.g. signal<i64> for an integer signal.

It is not possible to write down a signal literally, but you can refer to an already existing signal in your dataset.

Signal References

Referring to an already existing signal is done via one of these two functions:

s(signal_id: string literal): signal<T>
s(signal_id: string literal, time_shift: integer literal): signal<T>

The optional time shift parameter shifts the data points into the future. For example, if the signal "a" takes the value 5.7 at timestamp 946684800000, then the following expression takes the same value 5.7 at timestamp 946684808000. The synthesized signal is therefore a lagged version of the original signal "a".

s("a", 8000)

These functions must be used exactly with the syntax above. It is not allowed to invoke them as methods on the signal id. Both parameters must be simple literals without any inner function invocation!

Examples:

s("my signal id")              // OK
s("my signal id", 8000)        // OK
s("my s" + "ignal id", 8000)   // FAIL
"my signal id".s(8000)         // FAIL
s("my signal id", 7000 + 1000) // FAIL

Examples

To begin with, let's start with a very simple example. Let "a" and "b" be the IDs of two float signals. Then

s("a") + s("b")

yields the sum of the two signals. The Rhai + operator has been overloaded to work directly on signals (such as many other operators, see below). Therefore, the above expression yields a new signal. It contains data points for all timestamps of "a" and "b".

A more common application of the expression language may be the aim to interpolate over several timestamps. For example, "a" might fluctuate and we may therefore be interested in a local linear approximation of "a" rather than in "a" itself:

trend_intercept(s("a"), t, -1000, 0)

Here, the literal t refers to the current timestamp. Therefore, the expression yields the present value as obtained from a linear approximation over the last second. As another example, the maximum within the last second:

max(slice(s("a"), t, -1000, 0))

A typical use of the expression language is synthesizing an operative signal. Assume you want to make inferences only when your production is running, and you are sure your production is off when some specific signal "speed" falls below a certain threshold, say 10. However, "speed" may also be above the threshold during maintenance. However, during maintenance "speed" exceeds the threshold only for a few hours. This is in contrast to production which usually runs stable for months. In this situation, an operative signal may thus be synthesized by adopting only intervals larger than one day, i.e. 86400000 ms:

set_sframe(s("speed") > 10, false, 86400000)

Additional Signal Functions

In the following, all functions are defined that operate directly on signals and do not have a Rhai counterpart (For functions with Rhai counterpart, such as the + operator, see the section on operators and functions.). Some functions directly return a signal. The others can be used to create signals via the t literal as will be explained below. Note that a timeseries is always defined on a finite number of timestamps: all timestamps of all signals involved in the expression are used for the synthesized signal. Time shifts specified in the signal function s(signal_id: string literal, time_shift: integer literal) are taken into account. On the other hand, arguments of the functions below (in particular time, from, and to) do not alter the evaluation timestamps. If you need more evaluation timestamps, please apply add_timestamps to some signal in the expression (see below).

add_timestamps(signal_1: signal<T>, signal_2: signal<S>): signal<T> – returns a new signal which extends signal_1 by the timestamps of signal_2. The signal values for the new timestamps are computed with respect to signal_1 using the latest predecessor similar to the below at() function. Adding timestamps can be important because expressions are evaluated only at timestamps of the involved signals. The syntax for this expression is s("x1").add_timestamps(s("x2")). 2.4
at(signal: signal<T>, time: i64): T – returns the signal value at a given time
If there is no value at that time, it will go back in history to find a nearest predecessor; if there is no predecessor, it returns NAN, 0, false or ""
floor_slice(signal: signal<T>, time: i64, from: i64, to: i64): array<T> – returns an array of signal values including the nearest predecessor at the start of the time window and all values within the time window.
The time window is defined by a closed interval of [time + from; time + to]. 2.10
lerp(signal_0: signal<f64>, signal_1: signal<f64>, fraction: signal<f64>): f64 – returns the result of a linear interpolation between signals signal_0 and signal_1. The fraction determines the relative contribution of signal_1: For fraction = 0.0 the result is identical to signal_0, for fraction = 1.0 the result is identical to signal_1. 2.10
moving_avg(signal: signal<f64>, time: i64, from: i64, to: i64): f64 – returns the moving average of the signal. In doing so, the signal is assumed to be piecewise constant between two consecutive timestamps. The considered values within the time window include the nearest predecessor at the start of the time window and all values within the time window.
The time window is defined by a closed interval of [time + from; time + to]. 2.10
moving_max(signal: signal<f64>, time: i64, from: i64, to: i64): f64 – returns the moving maximum of the signal. This is the maximum of all values within the time window and its nearest predecessor.
The time window is defined by a closed interval of [time + from; time + to]. 2.10
moving_min(signal: signal<f64>, time: i64, from: i64, to: i64): f64 – returns the moving minimum of the signal. This is the minimum of all values within the time window and its nearest predecessor.
The time window is defined by a closed interval of [time + from; time + to]. 2.10
moving_std(signal: signal<f64>, time: i64, from: i64, to: i64): f64 – returns the moving standard deviation of the signal. In doing so the signal is assumed to be piecewise constant between two consecutive timestamps. The considered values within the time window include the nearest predecessor at the start of the time window and all values within the time window.
The time window is defined by a closed interval of [time + from; time + to]. 2.10
set_lframe(signal: signal<bool>, new_value: bool, minimal_duration: i64) : signal<bool> – returns a new boolean signal, where large same-value periods of at least duration minimal_duration are set to new_value. Note that the duration of a period is only known after end of the period. This affects the result of this function especially for live prediction.
set_sframe(signal: signal<bool>, new_value: bool, maximal_duration: i64) : signal<bool> – returns a new boolean signal, where small same-value periods of at most duration maximal_duration are set to new_value. Note that the duration of a period is only known after end of the period. This affects the result of this function especially for live prediction.
slice(signal: signal<T>, time: i64, from: i64, to: i64): array<T> – returns an array with all values within a time window of the given signal.
The time window is defined by a closed interval of [time + from; time + to].
steps(signal: signal<T>, time: i64, from: i64, to: i64, step: i64): array<T> – returns an array with values extracted from the given signal using the at function step by step.
The following timestamps are used: (time + from) + (0 * step), (time + from) + (1 * step), ... (until time + to is reached inclusively)
time_since_transition(signal: signal<bool>, time: i64, max_time: i64) : f64 – returns a new float signal, which gives time since last switch of signal from false to true. If this time exceeds max_time we return max_time. Times before the first switch and times t where the signal gives false in [t - max_time , t] are mapped to max_time. 2.4
times(signal: signal<T>): signal<i64> – returns a new signal constructed from the given one, where the value of each data point is set to the timestamp
trend_slope/trend_intercept(signal: signal<i64/f64>, time: i64, from: i64, to: i64): f64 – returns the slope/y-intercept of a simple linear regression model
Any NAN value is ignored; returns NAN if there are no data points available; the following timestamps are used: [time + from; time + to]. The intercept at t = time is returned.

Additional Array Functions

The following functions for arrays were additionally defined:

some(items: array<bool>): bool – returns true if at least one item is true
all(items: array<bool>): bool – returns true if all items are true
sum(items: array<i64/f64>): f64 – returns the sum of all items and 0.0 on an empty array
product(items: array<i64/f64>): f64 – returns the product of all items and 1.0 on an empty array
max(items: array<i64/f64>): f64 – returns the largest array item; any NAN value is ignored; returns NAN on an empty array
min(items: array<i64/f64>): f64 – returns the smallest array item; any NAN value is ignored; returns NAN on an empty array
avg(items: array<i64/f64>): f64 – returns the arithmetic average of all array items; any NAN value is ignored; returns NAN on an empty array
median(items: array<i64/f64>): f64 – returns the median of all array items; any NAN value is ignored; returns NAN on an empty array

Best practice combining expressions

When combining several expressions which operate on time windows then, from a performance point of view, it might be better to build the expression step by step than writting the combination into one expression.

For example, if we want to exclude periods smaller than 30 minutes and periods bigger than 12 hours from an existing boolean signal with signal id "control" we may use the expression:

(s("control")).set_lframe(false, 12*60*60*1000).set_sframe(false, 30*60*1000)

When evaluating this expression at a timestamp t the synthesizer scans trough the 30 minutes time window before t and for each timestamp in there it scan through another 12 hour window before. This means constructing the desired synthesized signal is of complexity 12 × 60 × 30 × # timestamps. However, splitting the above in two expressions, we first generate a signal "helper" via

(s("control")).set_lframe(false, 12*60*60*1000)

and then we apply on the result the expression

(s("helper")).set_sframe(false, 30*60*1000)

In this case we end up with complexity 12 × 60 × # timestamps + 30 × # timestamps which is considerably smaller than before.

Basics of Rhai

Working with signals

In this section, we will briefly show the potential behind Rhai and what you can create out of it. Rhai supports many types including also collections. But Rhai does not have natively a signal type. Then, when working with signals, one approach involves extracting the primitive values from signals and converting the results back into a signal format. This process uses the literal

t: i64 – the current timestamp

together with the function s to refer to some signal and some other function defined above to extract values from the signal. For example, the sum of two signals "a" and "b" could be written without use of the overloaded + operator:

s("a").at(t) + s("b").at(t)

The results of such an expression are automatically translated into a new signal. In order to construct a signal from the results, the expression must not terminate with a ;. Of course, the additional signal functions can be used as any other functions in Rhai, and may thus be combined with the rest of Rhai's tools, when applicable.

Rhai is a scripting language

As such, you can script. A typical snippet would look like the following

let array = [[s("one").at(t), s("two").at(t)], [s("three").at(t), s("four").at(t)], [s("five").at(t), s("six").at(t)]];
let pair_avg = array.map(|sub| sub.avg());
pair_avg.filter(|x| !x.is_nan()).map(|cleaned| cleaned.abs().exp()).sum().ln()

Here, we used array functions (avg(), sum()) that will be clearly defined and presented in the following sections. The last line defines the result of the expression.

Rhai has the usual statements

In the same spirit of many other languages, you can create and control flow using statements if, for, do, while, and more (read Language Reference of the Rhai Book). Here's an example demonstrating their usage

let val = s("one").at(t);
if (val >= 10.0) && (val <= 42.0) {
  1.0 - (val - 42.0)/(10.0-60.0)
} else if (val <= 60.0) && (val > 42.0) {
  1.0 - (val - 42.0)/(60.0-42.0)
} else {
  0.0/0.0
}

In this code snippet, we determine a value to return based on the current state of the "one" signal. Different expressions are assigned depending on the signal's current value. Note that 0.0/0.0 will evaluate to NAN.

Rhai allows you to create your own functions

Like most other languages, you can create your own functions and use them whenever needed.

fn add(x, y) {
    x + y
}

fn sub(x, y,) {     // trailing comma in parameters list is OK
    x - y
}

Rhai allows you to do many more things than the ones here described. Careful reading of Language Reference of the Rhai Book brings numerous benefits in the usage of this programming language.

Operators / Functions overloaded for native signal handling

Operators

See:

The following operators were defined:

Arithmetic:
- +(i64/f64): i64/f64
- -(i64/f64): i64/f64
- +(i64/f64, i64/f64): i64/f64
- -(i64/f64, i64/f64): i64/f64
- *(i64/f64, i64/f64): i64/f64
- /(i64/f64, i64/f64): i64/f64
- %(i64/f64, i64/f64): i64/f64
- **(i64/f64, i64/f64): i64/f64
Bitwise:
- &(i64, i64): i64
- |(i64, i64): i64
- ^(i64, i64): i64
- <<(i64, i64): i64
- >>(i64, i64): i64
Logical:
- !(bool): bool
- &(bool, bool): bool
- |(bool, bool): bool
- ^(bool, bool): bool
String:
- +(string, string): string
Comparison (returns false on different argument types):
- ==(bool/i64/f64/string, bool/i64/f64/string): bool
- !=(bool/i64/f64/string, bool/i64/f64/string): bool
- <(i64/f64, i64/f64): bool
- <=(i64/f64, i64/f64): bool
- >(i64/f64, i64/f64): bool
- >=(i64/f64, i64/f64): bool

Binary arithmetic and comparison operators can handle mixed i64 and f64 arguments properly, the other parameter is then implicitly converted beforehand via to_float. Binary arithmetic operators will return f64 if at least one f64 argument is involved.

Functions

See:

The following functions were defined:

Arithmetic:
- abs(i64/f64): i64/f64
- sign(i64/f64): i64
- sqrt(f64): f64
- exp(f64): f64
- ln(f64): f64
- log(f64): f64
- log(f64, f64): f64
Trigonometry:
- sin(f64): f64
- cos(f64): f64
- tan(f64): f64
- sinh(f64): f64
- cosh(f64): f64
- tanh(f64): f64
- asin(f64): f64
- acos(f64): f64
- atan(f64): f64
- asinh(f64): f64
- acosh(f64): f64
- atanh(f64): f64
- hypot(f64, f64): f64
- atan(f64, f64): f64
Rounding:
- floor(f64): f64
- ceiling(f64): f64
- round(f64): f64
- int(f64): f64
- fraction(f64): f64
String:
- contains(string): bool
- len(string): i64
- trim(string): string – with whitespace characters as defined in UTF-8
- to_upper(string): string
- to_lower(string): string
- sub_string(value: string, start: i64, end: i64): string
Conversion:
- to_int(bool): i64 – returns 1/0
- to_float(bool): f64 – returns 1.0/0.0
- to_string(bool): string – returns "true"/"false"
- to_float(i64): f64
- to_string(i64): string
- to_int(f64): i64 – returns 0 on NAN; values beyond INTEGER_MAX/INTEGER_MIN are capped
- to_string(f64): string
- to_degrees(f64): f64
- to_radians(f64): f64
- parse_int(string): i64 – throws error if not parsable
- parse_float(string): f64 – throws error if not parsable
Testing:
- is_zero(i64/f64): bool
- is_odd(i64): bool
- is_even(i64): bool
- is_nan(f64): bool
- is_finite(f64): bool
- is_infinite(f64): bool
- is_empty(string): bool
Comparison (returns other parameter on NAN):
- max(i64/f64, i64/f64): i64/f64
- min(i64/f64, i64/f64): i64/f64

Comparison operators can handle mixed i64 and f64 arguments properly, the other parameter is then implicitly converted beforehand via to_float. It will return f64 if at least one f64 argument is involved.

The Boolean conversion and comparison functions were added and are not part of the official Rhai.

Constants

The following constants are defined in Rhai:

PI(): f64 – the Archimedes' constant: 3.1415...
E(): f64 – the Euler's number: 2.718...

Appendix 2: Toolbox

aivis engine v2 toolbox is no official part of aivis engine v2 but an associated side project. It mainly provides tools to turn output artifacts of aivis engine v2 into technical, single-file HTML reports for data scientists. Its api and behaviour is subject to change and experimental. Users should already know the concepts of aivis engine v2 beforehand.

Caveats:

Large input files or extensive settings might lead to a poor UI responsiveness.
UI layouts are optimized for wide screens.

Setup

The aivis engine v2 toolbox does not need a licensing key. Its python code is free to look into or even adapt. The respective toolbox release belonging to a aivis engine v2 release {VERSION} is available as:

Python whl file: aivis_engine_v2_toolbox-{VERSION}-py3-none-any.whl
Docker Image: aivis-engine-v2-toolbox:{VERSION}

Create Engine Report

Each call to construct a toolbox HTML report for engine xy has the following structure:

from aivis_engine_v2_toolbox.api import build_xy_report

config = {
    "title": "My Use Case Title", 
    ...
    "outputFile": "/path/to/my-use-case-report.html"}
build_xy_report(config)

Additionally, the config needs to contain references to the respective engine's output files, e.g. "analysisReportFile": "/path/to/analysis-report.json". The full call to create a report for any engine can for example be found in the python or argo examples of the respective engine.

Expert Configuration

There are many optional expert configurations to customize your HTML report. Some examples:

The aivis engine v2 toolbox always assumes timestamps to be unix and translates them to readable dates. This behaviour can be switched off via "advancedConfig": {"unixTime": False}, so that timestamps always remain long values.

By referring to a metadata file via "metadataFile": "/path/to/metadata.json", signals are not only described via their signal id but enriched with more information. The metadata json contains an array of signals with the keys id (must) as well as name, description, unitSymbol, unitType (all optional):

{
  "signals": [{
    "id": "fa6c65bb-5cee-45fa-ab19-355ba94889e9",
    "name": "et 1",
    "description": "extruder temperature nr. 1",
    "unitName": "Kelvin",
    "unitSymbol": "K"
   }, {
    "id": "dc3477e5-a83c-4485-b7f4-7528d336d9c4", 
    "name": "abc 2"
    }, 
   ...
]}

To every HTML report which contains a timeseries plot, additional signals can be added to also be displayed. It is however not an automatism to include all signals of the dataset for display, since a full dataset is typically an amount of data which should not be put into a single-file HTML.

All custom configuration options can be seen in the api.py file in src/aivis_engine_v2_toolbox.