Working with Tabular Data

Although MVG has been designed with vibration waveform data in mind, it also works well with non-vibration or KPI data. We refer to this as tabular timeseries data. Each measurement consists of one or more variables captured at the same time. This differs from vibration waveform data, where each measurement is a short sample from a single vibration sensor.
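For example, a batch of tabular timeseries data can be pictured as a dictionary with one list per variable, all sharing the same timestamps. The variable names below are made up purely for illustration:

[ ]:
# Hypothetical sketch of tabular timeseries data: two variables sampled at the same times
tabular_data = {
    "timestamp": [1517443200000, 1517449200000, 1517455500000],  # ms since epoch
    "temperature": [1.98, 2.38, 2.21],
    "power": [2.78, 3.05, 3.06],
}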

In this example, we will show how to work with tabular sources and measurements, and how to create an analysis. If you have seen any of the previous examples, much of this one will be familiar to you.

Imports

Beyond importing MVG and the ModeId analysis class, we want to use pandas for loading and processing the data before the analysis.

[1]:
import os
from pprint import pprint
from pathlib import Path

import pandas as pd

from mvg import MVG
from mvg.analysis_classes import ModeId

After importing, we create the MVG session that lets us communicate with the MVG server. For that we need an endpoint URL and a valid token. If you do not have a token, you can get one here.

Note

Each token is used for Authorization AND Authentication. Thus, each unique token represents a unique user, and each user has its own, unique database on the VA-MVG service.

You need to insert your token received from Viking Analytics here: just replace "os.environ['TEST_TOKEN']" with your token as a string.

[2]:
ENDPOINT = "http://api.beta.multiviz.com"
TOKEN = os.environ['TEST_TOKEN']
[3]:
session = MVG(ENDPOINT, TOKEN)

We will use a dataset collected from a heat pump in the cellar of one of the employees at Viking Analytics. The data is available in this git repository. We clone the repository to get access to it. If you don’t have git, you can download the data from the link (or read the file directly over HTTPS, as sketched below).

[4]:
!git clone https://github.com/vikinganalytics/tabular-example-data.git
Cloning into 'tabular-example-data'...
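
If git is not available, pandas can also read the .csv file directly over HTTPS. This is a sketch; the raw URL assumes the repository’s default branch is named main:

[ ]:
# Hedged alternative to cloning; the branch name "main" in the raw URL is an assumption
url = "https://raw.githubusercontent.com/vikinganalytics/tabular-example-data/main/heatpump.csv"
data_df = pd.read_csv(url)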
[5]:
data_path = Path.cwd() / "tabular-example-data" / "heatpump.csv"
data_df = pd.read_csv(data_path)
data_df.head()
[5]:
timestamp T_out T_down T_kitchen WATT T_up T_brine_in T_brine_out T_cellar
0 2018-02-01T00:00:00Z 1.976053 38.367181 20.512105 2.780167 40.028018 3.690000 2.068514 10.965206
1 2018-02-01T01:40:00Z 2.380000 36.940000 20.190000 3.045333 39.060000 3.560000 1.991486 10.924431
2 2018-02-01T03:25:00Z 2.211223 37.715920 20.190000 3.060000 39.840240 3.598392 1.961608 10.982982
3 2018-02-01T05:10:00Z 1.964956 37.940000 20.250000 3.151579 40.023789 3.611330 2.068670 10.888049
4 2018-02-01T06:55:00Z 0.349134 37.551293 20.406457 3.257167 39.856860 3.690000 2.120000 10.941380

We loaded the .csv file into a dataframe object and printed the first five rows. The timestamps were read as strings, but the MVG API requires timestamps in milliseconds since epoch. So we need to do some data wrangling to get them into the correct data type.

We can use the function pandas.to_datetime() to convert the column into datetime objects. Then we can convert the timestamps into integers, which automatically yields nanoseconds since epoch. Integer-dividing by 1,000,000 then gives the same time in milliseconds. We can see that the conversion was successful by printing the first five rows again.

[6]:
data_df["timestamp"] = pd.to_datetime(data_df["timestamp"]).astype("int64") // 1000000
data_df.head()
[6]:
timestamp T_out T_down T_kitchen WATT T_up T_brine_in T_brine_out T_cellar
0 1517443200000 1.976053 38.367181 20.512105 2.780167 40.028018 3.690000 2.068514 10.965206
1 1517449200000 2.380000 36.940000 20.190000 3.045333 39.060000 3.560000 1.991486 10.924431
2 1517455500000 2.211223 37.715920 20.190000 3.060000 39.840240 3.598392 1.961608 10.982982
3 1517461800000 1.964956 37.940000 20.250000 3.151579 40.023789 3.611330 2.068670 10.888049
4 1517468100000 0.349134 37.551293 20.406457 3.257167 39.856860 3.690000 2.120000 10.941380
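
As a quick sanity check, we can verify the arithmetic for the first timestamp using plain pandas; the result matches the first row of the table above:

[ ]:
# pd.Timestamp stores time as nanoseconds since epoch in its .value attribute
ns = pd.Timestamp("2018-02-01T00:00:00Z").value  # 1517443200000000000
ms = ns // 1_000_000  # integer-divide nanoseconds down to milliseconds
print(ms)  # 1517443200000, as in the first row above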

To upload the data to the API we need to convert the dataframe into a dictionary in the format {column: values}. Fortunately, dataframes have a method to do just that, called to_dict(). Its orient argument changes the orientation of the resulting dictionary; by using to_dict("list") we get it in the correct format.

Note

The dictionary must have a "timestamp" key and at least one more column, corresponding to a tracked variable. The to_dict("list") method will not include the index, so it is important to have the timestamp as a column in the data and not as an index.
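
Concretely, the payload should be shaped like this (a sketch using values from the first two rows above; the remaining columns follow the same pattern):

[ ]:
# Expected shape of the upload payload: one key per column, values as plain lists
example = {
    "timestamp": [1517443200000, 1517449200000],
    "T_out": [1.976053, 2.380000],
    # ... one key per remaining column (T_down, T_kitchen, WATT, T_up, ...)
}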

[7]:
data_dict = data_df.to_dict("list")

Now that the data has been converted to the correct format, we are ready to create the source and upload some measurements. When creating a tabular source, you must define the columns that the data will contain ahead of time. This prevents arbitrary data from being uploaded to the source. Apart from the columns argument, creating a tabular source works exactly the same way as creating a waveform source.

[8]:
session.create_tabular_source(
    sid="heatpump",
    meta={"location": "Molnlycke"},
    columns=data_df.columns.tolist()
)

We can check that the source was created correctly by getting the source information.

[9]:
source = session.get_source("heatpump")
pprint(source)
{'meta': {'location': 'Molnlycke'},
 'properties': {'columns': ['T_out',
                            'T_down',
                            'T_kitchen',
                            'WATT',
                            'T_up',
                            'T_brine_in',
                            'T_brine_out',
                            'T_cellar'],
                'data_class': 'tabular'},
 'source_id': 'heatpump'}

Next, let’s upload the data we prepared to the source.

[10]:
session.create_tabular_measurement(
    sid="heatpump",
    data=data_dict
)

Now that we have a source with measurements, we can request an analysis of the data from the server. Requesting analyses for tabular data works the same way as it does for waveform data, although not all features work with tabular data. The KPIDemo feature, for example, only works with waveform data. Another thing to keep in mind is that when requesting population analyses, all sources to be analyzed together must have the same data class.
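
If you are unsure which features your server supports, you can ask the client directly. This is a hedged sketch: it assumes your version of the mvg client exposes a supported_features() method; consult the API reference if it does not:

[ ]:
# Assumption: supported_features() is available on the MVG session object
pprint(session.supported_features())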

For now, we will simply run the mode identification algorithm.

[11]:
response = session.request_analysis(
    sid="heatpump",
    feature="ModeId",
)
request_id = response["request_id"]

To make the results easier to work with, we can load them into an analysis class; there is one for each analysis feature. Once the analysis is completed and we have received the results, we can instantiate the analysis class with the results dictionary.

[12]:
session.wait_for_analyses([request_id])
results = session.get_analysis_results(request_id)
analysis = ModeId(results)

The analysis class makes it very convenient to view the results. Let’s use the ModeId.plot() method to get an overview of the results.

[13]:
analysis.plot()
[Figure: ModeId plot of the identified operating modes over time for the heat pump data]

The different modes help to distinguish between idle and high-usage conditions across all seasons of the year.
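
If you want to work with the raw mode labels instead of the plot, the analysis classes provide helpers for that. This is a sketch, assuming the common to_df() and summary() helpers of the mvg analysis classes:

[ ]:
# Assumption: to_df() and summary() are available on the ModeId analysis class
mode_df = analysis.to_df()  # one row per timestamp with its mode label
print(mode_df.head())
analysis.summary()  # per-mode summary of the results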

When you are finished with the example, you can go ahead and delete the source.

[14]:
session.delete_source("heatpump")