Analysis and Results Visualization

This script describes the procedure to request an analysis from the Viking Analytics MultiViz Analytics Engine (MVG) service. It shows how to query for the results of a single-asset or asset-population analysis. In addition, it presents some examples of how to visualize the results available for the mode identification feature.

In this example, we will describe how to access and manipulate the analysis results directly. The “Analysis Classes” example provides a simplified and unified interface to access these results as a pandas dataframe, along with some basic visualization of the results.

Preliminaries

This procedure describes all the steps to request an analysis, get the analysis results and plot those results. The local visualization functions used to create the figures are imported from plotting.

[1]:
import os
import pandas as pd
import matplotlib.pyplot as plt
from requests import HTTPError

# import mvg library with python bindings to mvg-API
from mvg import MVG, plotting

Note that the TOKEN is used both for authorization and authentication. Thus, each unique token represents a unique user and each user has their own unique database on the VA vibration service.

You need to insert your token received from Viking Analytics here:

[2]:
# Replace by your own Token
VALID_TOKEN = os.environ['TEST_TOKEN']

Instantiate a session object with the MVG library. A session object caches the endpoint and the token to simplify the calls to the MVG library.

[3]:
ENDPOINT = "https://api.beta.multiviz.com"
session = MVG(ENDPOINT, VALID_TOKEN)

Asset Analysis

In this example, we will use the sources uploaded by the “Sources and Measurement” example. We start by checking whether the sources are available in the database. At least the sources “u0001” and “u0005” should appear as available.

[14]:
sources = session.list_sources()

print("Retrieved sources")
for src in sources:
    print(src)
    s_info = session.get_source(src['source_id'])
    print(f"Source info retrieved for one source: {s_info}")
Retrieved sources
{'source_id': 'u0001', 'meta': {'assetId': 'assetA', 'measPoint': 'mloc01', 'location': 'paris', 'updated': 'YES! I have been updated'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
Source info retrieved for one source: {'source_id': 'u0001', 'meta': {'assetId': 'assetA', 'measPoint': 'mloc01', 'location': 'paris', 'updated': 'YES! I have been updated'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
{'source_id': 'u0002', 'meta': {'assetId': 'assetB', 'measPoint': 'mloc01', 'location': 'paris'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
Source info retrieved for one source: {'source_id': 'u0002', 'meta': {'assetId': 'assetB', 'measPoint': 'mloc01', 'location': 'paris'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
{'source_id': 'u0003', 'meta': {'assetId': 'assetC', 'measPoint': 'mloc01', 'location': 'milano'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
Source info retrieved for one source: {'source_id': 'u0003', 'meta': {'assetId': 'assetC', 'measPoint': 'mloc01', 'location': 'milano'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
{'source_id': 'u0004', 'meta': {'assetId': 'assetD', 'measPoint': 'mloc01', 'location': 'milano'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
Source info retrieved for one source: {'source_id': 'u0004', 'meta': {'assetId': 'assetD', 'measPoint': 'mloc01', 'location': 'milano'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
{'source_id': 'u0005', 'meta': {'assetId': 'assetE', 'measPoint': 'mloc01', 'location': 'london'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
Source info retrieved for one source: {'source_id': 'u0005', 'meta': {'assetId': 'assetE', 'measPoint': 'mloc01', 'location': 'london'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
{'source_id': 'u0006', 'meta': {'assetId': 'assetF', 'measPoint': 'mloc01', 'location': 'london'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}
Source info retrieved for one source: {'source_id': 'u0006', 'meta': {'assetId': 'assetF', 'measPoint': 'mloc01', 'location': 'london'}, 'properties': {'data_class': 'waveform', 'channels': ['acc']}}

In this example, we will use the sources “u0001” and “u0005”, which were previously uploaded to our database.

[15]:
SOURCE_IDS = ["u0001", "u0005"]
SOURCE_IDS
[15]:
['u0001', 'u0005']

The Viking Analytics Vibration service has several features available. We list the available features along with the version of each of them in the following way:

[16]:
available_features = session.supported_features()
available_features
[16]:
{'RMS': '1.0.0', 'ModeId': '0.1.1', 'BlackSheep': '1.0.0', 'KPIDemo': '1.0.0'}
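
Before placing a request, it can be useful to confirm that the wanted features are offered by the service. The following is a minimal sketch under stated assumptions: `missing_features` is a hypothetical helper (not part of the mvg library), the dictionary is a stand-in copied from the output above, and "Envelope" is a made-up feature name used only to show a negative result.

```python
def missing_features(available, wanted):
    """Return the wanted feature names that are absent from `available`."""
    return [name for name in wanted if name not in available]


# Stand-in for the dictionary returned by session.supported_features()
features_example = {
    "RMS": "1.0.0",
    "ModeId": "0.1.1",
    "BlackSheep": "1.0.0",
    "KPIDemo": "1.0.0",
}

# "Envelope" is a hypothetical name, used only to illustrate a missing feature
missing = missing_features(features_example, ["RMS", "ModeId", "Envelope"])
print(missing)
```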

Once the source and its measurements have been uploaded to the database, we can request an analysis from the VA Vibration service. When we request an analysis, we need to specify which source is to be used and which feature is to be applied.

The analysis request returns a dictionary with two elements. The first element is a "request_id" that can be used to retrieve the results later. The second element is "request_status", which provides the status right after placing the analysis request.

Here, we will request the “RMS” feature from our source “u0001”.

[7]:
SOURCE_ID = SOURCE_IDS[0]
RMS_u0001 = session.request_analysis(SOURCE_ID, 'RMS')
RMS_u0001
[7]:
{'request_id': '99c500893a6184b4f3dcd4a7a3b1565b', 'request_status': 'queued'}

Now, we proceed to request the “ModeId” feature for the same source.

[8]:
ModeId_u0001 = session.request_analysis(SOURCE_ID, 'ModeId')
ModeId_u0001
[8]:
{'request_id': 'adb33a07b97ac7c1b2ab59ae3af16e5d', 'request_status': 'queued'}

Before we are able to get the analysis results, we need to wait until those analyses are successfully completed.

We can query for the status of our requested analyses. The possible statuses are:

- Queued: The analysis has not started on the remote server and is waiting in the queue.
- Ongoing: The analysis is being processed at this time.
- Failed: The analysis is complete and failed to produce a result.
- Successful: The analysis is complete and produced a successful result.

[9]:
REQUEST_IDS_u0001 = [RMS_u0001['request_id'], ModeId_u0001['request_id']]
status = session.get_analysis_status(REQUEST_IDS_u0001[0])
print(f"RMS Analysis: {status}")
status = session.get_analysis_status(REQUEST_IDS_u0001[1])
print(f"ModeId Analysis: {status}")
RMS Analysis: successful
ModeId Analysis: successful
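
Because an analysis may still be queued or ongoing when its status is first queried, one option is to poll until a final state is reached. The sketch below is an illustration, not a feature of the mvg library: `wait_until_done` is a hypothetical helper that accepts any status function (for example `session.get_analysis_status`), and the lowercase strings "ongoing" and "failed" are assumed by analogy with the "queued" and "successful" values shown in the outputs above. A fake status function stands in for the service so that the example runs offline.

```python
import time


def wait_until_done(get_status, request_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_status(request_id) until it reports a final state.

    get_status is any callable that returns a status string.
    Returns the final status, or raises TimeoutError if none is reached in time.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = str(get_status(request_id)).lower()
        # Assumed final states; "queued" and "ongoing" mean we keep waiting
        if status in ("successful", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Analysis {request_id} is still '{status}' after {timeout_seconds} s"
            )
        time.sleep(poll_seconds)


# Stand-in status function that mimics a job moving through the queue
_fake_statuses = iter(["queued", "ongoing", "successful"])
final_status = wait_until_done(
    lambda _request_id: next(_fake_statuses), "example-id", poll_seconds=0.01
)
print(final_status)
```

With a real session, the call would take the form `wait_until_done(session.get_analysis_status, RMS_u0001['request_id'])`.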

A similar procedure is repeated to request the “RMS” and “ModeId” features for our source “u0005”.

[17]:
SOURCE_ID = SOURCE_IDS[1]
RMS_u0005 = session.request_analysis(SOURCE_ID, 'RMS')
RMS_u0005
[17]:
{'request_id': 'caebeacdee1874386c48da874896ba1e', 'request_status': 'queued'}
[18]:
SOURCE_ID = SOURCE_IDS[1]
ModeId_u0005 = session.request_analysis(SOURCE_ID, 'ModeId')
ModeId_u0005
[18]:
{'request_id': '938331c0426f024997a353e85e039a25', 'request_status': 'queued'}

We also check the status of our analyses for source “u0005” to confirm that they have been completed successfully.

[19]:
REQUEST_IDS_u0005 = [RMS_u0005['request_id'], ModeId_u0005['request_id']]
status = session.get_analysis_status(REQUEST_IDS_u0005[0])
print(f"RMS Analysis: {status}")
status = session.get_analysis_status(REQUEST_IDS_u0005[1])
print(f"ModeId Analysis: {status}")
RMS Analysis: successful
ModeId Analysis: successful

Visualization

Once the analysis is complete, one can get the results by calling the corresponding “request_id” for each analysis.

First, let’s check all existing “request_id” in the database for each source and feature.

[20]:
REQUEST_IDS_RMS = [session.list_analyses(SOURCE_IDS[0], "RMS"), session.list_analyses(SOURCE_IDS[1], "RMS")]
print(f"The RMS analysis of {SOURCE_IDS[0]} has request_id {REQUEST_IDS_RMS[0]}.")
print(f"The RMS analysis of {SOURCE_IDS[1]} has request_id {REQUEST_IDS_RMS[1]}.")
REQUEST_IDS_MODEID = [session.list_analyses(SOURCE_IDS[0], "ModeId"), session.list_analyses(SOURCE_IDS[1], "ModeId")]
print(f"The ModeId analysis of {SOURCE_IDS[0]} has request_id {REQUEST_IDS_MODEID[0]}.")
print(f"The ModeId analysis of {SOURCE_IDS[1]} has request_id {REQUEST_IDS_MODEID[1]}.")
The RMS analysis of u0001 has request_id ['bc4ea1194b4929764a08e97c60360cb6', '99c500893a6184b4f3dcd4a7a3b1565b'].
The RMS analysis of u0005 has request_id ['caebeacdee1874386c48da874896ba1e'].
The ModeId analysis of u0001 has request_id ['fae66b2396b7dde3039a2f58e287686a', 'adb33a07b97ac7c1b2ab59ae3af16e5d'].
The ModeId analysis of u0005 has request_id ['938331c0426f024997a353e85e039a25'].

The following step is to retrieve the results by calling each one of the “request_id”.

The output of the "get_analysis_results" function is a dictionary. We show the keys of one of those dictionaries. The keys are the same for all features.

[21]:
rms_output1 = session.get_analysis_results(request_id=REQUEST_IDS_u0001[0])
mode_output1 = session.get_analysis_results(request_id=REQUEST_IDS_u0001[1])
rms_output5 = session.get_analysis_results(request_id=REQUEST_IDS_u0005[0])
mode_output5 = session.get_analysis_results(request_id=REQUEST_IDS_u0005[1])

rms_output1.keys()
[21]:
dict_keys(['status', 'request_id', 'feature', 'results', 'inputs', 'error_info', 'debug_info'])

Each dictionary contains seven key elements. These elements are:

- "status" indicates if the analysis was successful.
- "request_id" is the identifier of the requested analysis.
- "feature" is the name of the requested feature.
- "results" includes the numeric results.
- "inputs" includes the input information for the requested analysis.
- "error_info" includes the error information in case the analysis fails and it is empty if the analysis is successful.
- "debug_info" includes debugging (log) information related to the failed analysis.
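
Since "results" is only meaningful for a successful analysis, it may help to check "status" before using it. The sketch below is a hypothetical helper, not part of the mvg library. It assumes that the "status" value uses the same "successful" string reported by `get_analysis_status`, and it runs on hand-written stand-in dictionaries that only mirror the keys listed above.

```python
def extract_results(output):
    """Return output["results"], or raise if the analysis did not succeed."""
    if output["status"] != "successful":
        raise RuntimeError(
            f"Analysis {output['request_id']} ({output['feature']}) ended with "
            f"status {output['status']!r}: {output['error_info']}"
        )
    return output["results"]


# Stand-in dictionaries with made-up values (not real service output)
ok_output = {
    "status": "successful", "request_id": "abc", "feature": "RMS",
    "results": {"timestamps": [1, 2]}, "inputs": {},
    "error_info": "", "debug_info": "",
}
failed_output = dict(ok_output, status="failed", results={}, error_info="example error")

print(extract_results(ok_output))
try:
    extract_results(failed_output)
except RuntimeError as err:
    print(err)
```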

The "results" of the “RMS” feature are five lists. These lists are:

- timestamps: epoch (in seconds for this example) of the measurements.
- rms: rms value for each measurement.
- rms_dc: rms value without the dc component for each measurement.
- dc: dc component value for each measurement.
- utilization: boolean indicating whether the measurement was used for the rms calculation.

These lists can be converted into a dataframe for ease of manipulation. In this example, we will show how to access the results dictionary and convert it into a Pandas dataframe. Please check the “Analysis Classes” example for a way to get the results directly as a Pandas dataframe. In addition, the “timestamps” column is converted to a timestamp object in a column called “Date”.

[28]:
df_rms1 = pd.DataFrame(rms_output1["results"]["acc"])
df_rms1["timestamps"] = rms_output1["results"]['timestamps']
df_rms1['Date'] = pd.to_datetime(df_rms1['timestamps'], unit="s")
df_rms1.head()
[28]:
rms rms_dc dc utilization timestamps Date
0 0.647123 0.662183 -0.140420 1 1570273260 2019-10-05 11:01:00
1 0.646619 0.661652 -0.140239 1 1570359660 2019-10-06 11:01:00
2 0.646873 0.661923 -0.140347 1 1570446060 2019-10-07 11:01:00
3 0.646643 0.661714 -0.140423 1 1570532460 2019-10-08 11:01:00
4 0.646717 0.661709 -0.140055 1 1570618860 2019-10-09 11:01:00
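
The conversion above is repeated later for source “u0005”, so it can be wrapped in a small function. This is a sketch only: `rms_results_to_frame` is a hypothetical helper, and the nested layout of the stand-in dictionary (per-channel lists under "acc", with "timestamps" alongside) is inferred from the cell above rather than taken from the service documentation. The stand-in values are the first two rows of the table shown above.

```python
import pandas as pd


def rms_results_to_frame(results, channel="acc", unit="s"):
    """Build a dataframe from the "results" entry of an RMS analysis."""
    df = pd.DataFrame(results[channel])
    df["timestamps"] = results["timestamps"]
    # Convert the epoch values to timestamp objects in a "Date" column
    df["Date"] = pd.to_datetime(df["timestamps"], unit=unit)
    return df


# Stand-in for rms_output1["results"], using the first two rows shown above
example_results = {
    "timestamps": [1570273260, 1570359660],
    "acc": {
        "rms": [0.647123, 0.646619],
        "rms_dc": [0.662183, 0.661652],
        "dc": [-0.140420, -0.140239],
        "utilization": [1, 1],
    },
}
df_example = rms_results_to_frame(example_results)
print(df_example)
```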

The "results" of the “ModeId” feature are four lists and one dictionary:

- timestamps: the measurement epoch, in the same unit in which the measurement was uploaded (seconds for this example).
- labels: the mode label given to each timestamp.
- uncertain: a boolean indicating the uncertainty of the label.
- mode_probability: the mode probability of each mode label.

The dictionary, called "mode_info", contains the “Emerging modes” results.

“Emerging modes” is an additional output of the analysis results that describes the first appearance of each one of the identified modes.

We pass all the lists to a dataframe for ease of manipulation. As with the RMS feature, we access the results dictionary and convert it into a Pandas dataframe; the “Analysis Classes” example shows how to get the results directly as a Pandas dataframe. In addition, the “timestamps” column is converted to a timestamp object in a column called “Date”.

[30]:
mode_all1 = mode_output1["results"].copy()
mode_emerging1 = mode_all1.pop("mode_info")

# Conversion to dataframe of the full mode labels table
df_mode1 = pd.DataFrame(mode_all1)
df_mode1['Date'] = pd.to_datetime(df_mode1['timestamps'], unit="s")
df_mode1.head()
[30]:
timestamps labels uncertain mode_probability Date
0 1570273260 0 False 0.002528 2019-10-05 11:01:00
1 1570359660 0 False 0.000194 2019-10-06 11:01:00
2 1570446060 0 False 0.000825 2019-10-07 11:01:00
3 1570532460 0 False 0.000025 2019-10-08 11:01:00
4 1570618860 0 False 0.000704 2019-10-09 11:01:00
[31]:
# Conversion to dataframe of the emerging modes table
df_emerging1 = pd.DataFrame(mode_emerging1)
df_emerging1['emerging_Date'] = pd.to_datetime(df_emerging1['emerging_time'], unit="s")
df_emerging1.head()
[31]:
modes emerging_time max_prob_time max_probability emerging_Date
0 0 1570273260 1571137260 0.009954 2019-10-05 11:01:00
1 1 1571655660 1571655660 0.023025 2019-10-21 11:01:00
2 2 1572350460 1574424060 0.010912 2019-10-29 12:01:00
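
Since “Emerging modes” describes the first appearance of each mode, a rough cross-check is to take the earliest timestamp of each label in the mode labels table. The sketch below runs on a made-up stand-in table. It is only an approximation of the idea, because the service may apply additional rules (for example around uncertain labels) when it computes "emerging_time".

```python
import pandas as pd

# Stand-in for df_mode1, with made-up timestamps and labels
labels_example = pd.DataFrame({
    "timestamps": [100, 200, 300, 400, 500, 600],
    "labels": [0, 0, 1, 0, 2, 1],
})

# Earliest timestamp at which each mode label appears
first_seen = (
    labels_example.groupby("labels")["timestamps"]
    .min()
    .rename("first_seen")
    .reset_index()
)
first_seen["first_seen_date"] = pd.to_datetime(first_seen["first_seen"], unit="s")
print(first_seen)
```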

For the purpose of demonstration of our visualization functions, we will merge the dataframes of the “RMS” results and the “ModeId” results into a single dataframe.

[32]:
df_u0001 = pd.merge_asof(df_rms1, df_mode1, on="timestamps", by="Date")
df_u0001.head()
[32]:
rms rms_dc dc utilization timestamps Date labels uncertain mode_probability
0 0.647123 0.662183 -0.140420 1 1570273260 2019-10-05 11:01:00 0 False 0.002528
1 0.646619 0.661652 -0.140239 1 1570359660 2019-10-06 11:01:00 0 False 0.000194
2 0.646873 0.661923 -0.140347 1 1570446060 2019-10-07 11:01:00 0 False 0.000825
3 0.646643 0.661714 -0.140423 1 1570532460 2019-10-08 11:01:00 0 False 0.000025
4 0.646717 0.661709 -0.140055 1 1570618860 2019-10-09 11:01:00 0 False 0.000704

We repeat the same procedure of converting the results to a dataframe for source “u0005”.

[35]:
#RMS
df_rms5 = pd.DataFrame(rms_output5["results"]["acc"])
df_rms5["timestamps"] = rms_output5["results"]['timestamps']
df_rms5['Date'] = pd.to_datetime(df_rms5['timestamps'], unit="s")
#MODE_ID (full)
mode_all5 = mode_output5["results"].copy()
mode_emerging5 = mode_all5.pop("mode_info")
df_mode5 = pd.DataFrame(mode_all5)
df_mode5['Date'] = pd.to_datetime(df_mode5['timestamps'], unit="s")
#Merging dataframes
df_u0005 = pd.merge_asof(df_rms5, df_mode5, on="timestamps", by="Date")
df_u0005.head()
[35]:
rms rms_dc dc utilization timestamps Date labels uncertain mode_probability
0 0.646887 0.661967 -0.140491 1 1570186860 2019-10-04 11:01:00 0 False 0.000008
1 0.646963 0.661950 -0.140056 1 1570273260 2019-10-05 11:01:00 0 False 0.000001
2 0.646905 0.661932 -0.140244 1 1570359660 2019-10-06 11:01:00 0 False 0.000002
3 0.647395 0.662385 -0.140124 1 1570446060 2019-10-07 11:01:00 0 False 0.000046
4 0.647057 0.662064 -0.140165 1 1570532460 2019-10-08 11:01:00 0 False 0.000004

We can call the individual boxplot for one source and display the boxplot of the “RMS” for each one of the operating modes. Here, we use results for source “u0001”.

[36]:
image_boxplot = plotting.modes_boxplot(df_u0001, "rms", SOURCE_IDS[0])
[Figure: boxplot of the “RMS” per operating mode for source “u0001”]
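
The statistics behind such a boxplot can also be inspected numerically by grouping the merged dataframe by mode label. This is a minimal sketch on made-up stand-in data with the same "rms" and "labels" columns as the merged dataframe; it does not reproduce what `plotting.modes_boxplot` does internally.

```python
import pandas as pd

# Stand-in for the merged dataframe, with made-up values
merged_example = pd.DataFrame({
    "rms": [0.60, 0.62, 0.64, 0.90, 0.94],
    "labels": [0, 0, 0, 1, 1],
})

# Per-mode summary of the rms values
rms_by_mode = merged_example.groupby("labels")["rms"].agg(["count", "median", "min", "max"])
print(rms_by_mode)
```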

We create a list with all the sources dataframes for the “ModeId” feature and display the “RMS” boxplot across the different modes.

[37]:
plotting.modes_group_boxplot([df_u0001, df_u0005], "rms", SOURCE_IDS)
[Figure: boxplot of the “RMS” across the operating modes for sources “u0001” and “u0005”]

We display the modes of an individual source over time and identify all its operating modes. We set the parameter “timeunit” because the default unit in the function is milliseconds and the epochs in our data are in seconds.

[38]:
image_modes = plotting.modes_over_time(df_u0001, SOURCE_IDS[0], timeunit="s")
[Figure: operating modes over time for source “u0001”]

Uncertain areas appear as a gray rectangle above the corresponding periods in the modes plot. If there are no uncertain areas, the space is white.
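
The uncertain periods can also be listed as a table by finding consecutive runs of True in the “uncertain” column. The sketch below uses a made-up stand-in table with the same "timestamps" and "uncertain" columns as the mode labels dataframe. `uncertain_periods` is a hypothetical helper, independent of how the plotting function draws the rectangles.

```python
import pandas as pd


def uncertain_periods(df):
    """Return the start and end timestamp of each run of uncertain labels."""
    flag = df["uncertain"].astype(bool)
    # A new run starts whenever the flag differs from the previous row
    run_id = flag.ne(flag.shift(fill_value=False)).cumsum()
    return (
        df[flag]
        .groupby(run_id[flag])["timestamps"]
        .agg(start="min", end="max")
        .reset_index(drop=True)
    )


# Stand-in data with two uncertain periods
modes_example = pd.DataFrame({
    "timestamps": [0, 10, 20, 30, 40, 50],
    "uncertain": [False, True, True, False, True, False],
})
periods = uncertain_periods(modes_example)
print(periods)
```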

We create a list with all the sources dataframes and display the modes over time for all.

[39]:
plotting.modes_over_time_group([df_u0001, df_u0005], SOURCE_IDS, timeunit="s")
[Figure: operating modes over time for sources “u0001” and “u0005”]

Population Analysis

Another feature is “BlackSheep”, which targets a group of assets and aims to identify the atypical assets within a population.

In this example, we will use all the sources previously uploaded to our database.

[40]:
SOURCE_ID_ALL = ["u0001", "u0002", "u0003", "u0004", "u0005", "u0006"]
SOURCE_ID_ALL
[40]:
['u0001', 'u0002', 'u0003', 'u0004', 'u0005', 'u0006']

Similar to the single-asset features, the population analysis request returns a dictionary with two elements. The first element is a "request_id" that can be used to retrieve the results later. The second element is "request_status", which provides the status right after placing the analysis request.

Here, we will request the “BlackSheep” feature for all our sources. In addition, we show how to pass additional parameters into the analysis request. Additional parameters must be passed in the form of a dictionary. In this particular analysis, we relax the "atypical_threshold" parameter from its default value of 0.05 to 0.10 so that a larger number of assets are identified as atypical.

[41]:
params = {"atypical_threshold": 0.1}
BS_ALL = session.request_population_analysis(SOURCE_ID_ALL, 'BlackSheep', parameters=params)
BS_ALL
[41]:
{'request_id': 'e86e3478105f844a2487ba276cb9f728', 'request_status': 'queued'}

Similarly, we can query for the status of our requested analysis, where the different status options remain the same.

[42]:
REQUEST_ID_ALL = BS_ALL['request_id']
status = session.get_analysis_status(REQUEST_ID_ALL)
print(f"BlackSheep Analysis: {status}")
BlackSheep Analysis: successful

The next step is to retrieve the results by calling the “request_id”.

The output of the "get_analysis_results" function is similar to that of the single-asset features. Its seven elements are:

- "status" indicates if the analysis was successful.
- "request_id" is the identifier of the requested analysis.
- "feature" is the name of the requested feature.
- "results" includes the numeric results.
- "inputs" includes the input information for the requested analysis.
- "error_info" includes the error information in case the analysis fails and it is empty if the analysis is successful.
- "debug_info" includes debugging (log) information related to the failed analysis.

[43]:
blacksheep_table = session.get_analysis_results(request_id=REQUEST_ID_ALL)
blacksheep_table.keys()
[43]:
dict_keys(['status', 'request_id', 'feature', 'results', 'inputs', 'error_info', 'debug_info'])

The "results" of the “BlackSheep” feature indicate the atypical assets. "results" is a dictionary with a key element labelled "atypical_assets". "atypical_assets" is a list, and the length of the list indicates the number of atypical assets. Each element of this list is a dictionary. The dictionary includes the source_id of the atypical asset, a list with all the measurement timestamps and a corresponding list of booleans indicating whether each measurement is atypical.

Here, we show the number of atypical assets and the atypical order of these assets.

[44]:
atypical_output = blacksheep_table["results"]
atypical = atypical_output["atypical_assets"]

print(f"There is a total of {len(atypical)} atypical assets in this analysis.")
There is a total of 2 atypical assets in this analysis.
[45]:
atypical1 = atypical[0]
atypical2 = atypical[1]
print(f"The 1st blacksheep is {atypical1['source_id']}")
print(f"The 2nd blacksheep is {atypical2['source_id']}")
The 1st blacksheep is u0004
The 2nd blacksheep is u0002
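
To see how many measurements of each atypical asset were flagged, the list can be condensed into a small table. This is a sketch under loud assumptions: only the "source_id" key is confirmed by the cell above, while the key names "timestamps" and "atypical" are hypothetical placeholders for the timestamp list and the boolean list. They are exposed as arguments of the hypothetical `summarize_atypical` helper so that they can be replaced with the real key names. The data is made up.

```python
import pandas as pd


def summarize_atypical(atypical_assets, ts_key="timestamps", flag_key="atypical"):
    """Count flagged measurements per atypical asset.

    ts_key and flag_key are assumed key names; adjust them to the actual output.
    """
    rows = []
    for asset in atypical_assets:
        rows.append({
            "source_id": asset["source_id"],
            "n_measurements": len(asset[ts_key]),
            "n_atypical": sum(bool(flag) for flag in asset[flag_key]),
        })
    df = pd.DataFrame(rows)
    df["atypical_share"] = df["n_atypical"] / df["n_measurements"]
    return df


# Made-up stand-in for blacksheep_table["results"]["atypical_assets"]
atypical_example = [
    {"source_id": "u0004", "timestamps": [1, 2, 3, 4], "atypical": [True, True, False, True]},
    {"source_id": "u0002", "timestamps": [1, 2, 3, 4], "atypical": [False, True, False, False]},
]
atypical_summary = summarize_atypical(atypical_example)
print(atypical_summary)
```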

Please check the “Analysis Classes” example for some of the visualization options of the blacksheep results.