{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Inference for New Data\n", "\n", "After training a model on historical data, the most common next step is inferring the outcome for new data. MultiViz Analytics Engine (MVG) supports this with a feature called `ApplyModel`. Invoking this feature requires a successful analysis. The `request_id` of that analysis is then used to refer to the model and apply it to any range of data on the same source, or on any source that has the same columns (or channels).\n", "\n", "This example uses data from the Iris dataset to illustrate the process." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up the connection" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "from mvg import MVG, plotting\n", "from sklearn.datasets import load_iris\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "ENDPOINT = \"http://api.beta.multiviz.com\"\n", "# Replace with your own token\n", "TOKEN = os.environ[\"TEST_TOKEN\"]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "session = MVG(ENDPOINT, TOKEN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the data\n", "The data from the Iris dataset needs to be wrangled to conform to the format used by MVG.\n", "The data is a 2D numpy array, which needs to be converted to a dictionary where each key is the name of a column (in this case, a feature of the plants) and each value is the list of that column's values." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": "['sepal length (cm)',\n 'sepal width (cm)',\n 'petal length (cm)',\n 'petal width (cm)']" }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset = load_iris()\n", "column_names = dataset[\"feature_names\"]\n", "column_names" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Map each column name to the list of values in that column\n", "data = {}\n", "for idx, name in enumerate(column_names):\n", "    data[name] = list(dataset[\"data\"][:, idx])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this dataset is not a time series dataset.\n", "It was chosen because it is readily available in common Python packages such as sklearn.\n", "However, it is sufficient for demonstrating the `ApplyModel` feature.\n", "MVG also requires a timestamp for each data point.\n", "We use dummy timestamps from 0 to 149." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "data[\"timestamp\"] = list(range(len(dataset[\"data\"])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create source and measurements\n", "A tabular source holding these measurements needs to be created before the model can be built." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "session.create_tabular_source(sid=\"iris\", columns=column_names, meta={}, exist_ok=True)\n", "session.create_tabular_measurement(sid=\"iris\", data=data, meta={}, exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The three classes in the data are indicated in `dataset[\"target\"]`.\n", "The data is divided into three equal parts of 50 samples, one part per class." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])" }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[\"target\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build the model\n", "\n", "To demonstrate the `ApplyModel` feature, we train a model on the middle 100 samples (timestamps 25 to 124) and then apply the trained model to the entire dataset." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "request_data = session.request_analysis(\n", "    sid=\"iris\",\n", "    feature=\"ModeId\",\n", "    start_timestamp=25,\n", "    end_timestamp=124)\n", "request_id = request_data[\"request_id\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The helper function `wait_for_analyses` waits for a group of analysis jobs to finish." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "session.wait_for_analyses(request_id_list=[request_id])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "mode_output = session.get_analysis_results(request_id=request_id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Review the results\n", "\n", "We can plot the result of the analysis, which shows the three distinct modes." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": "[]" }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mode_output[\"results\"].pop(\"mode_info\", None)\n", "mode_output[\"results\"].pop(\"mode_probabilities\", None)\n", "plotting.modes_over_time(pd.DataFrame(mode_output[\"results\"]), request_id)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the resulting labels with the ground truth in the original Iris dataset shows that the class boundaries are identified precisely.\n", "Note that the `ModeId` feature numbers modes in order of first appearance: the first mode encountered is reported as 0, the second as 1, and so on.\n", "The Iris labels happen to follow the same order, but for other datasets the modes need to be reordered to match the ground-truth labels before comparison." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": "True" }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all(mode_output[\"results\"][\"labels\"] == dataset[\"target\"][25:125])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply the model on the whole dataset\n", "\n", "Now, we apply the model to the entire dataset.\n", "To that end, we request the `ApplyModel` feature and provide the `request_id` of the analysis that contains the model.\n", "Note that running `ApplyModel` is much faster than running `ModeId`." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "request_data_am = session.request_analysis(\n", "    sid=\"iris\",\n", "    feature=\"ApplyModel\",\n", "    parameters={\"model_ref\": request_id}\n", ")\n", "request_id_am = request_data_am[\"request_id\"]\n", "session.wait_for_analyses(request_id_list=[request_id_am])" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "am_output = session.get_analysis_results(request_id=request_id_am)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Review the results of the inference\n", "\n", "Comparing the results shows that the first 25 and last 25 samples, which were excluded when building the model earlier, are now correctly classified." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "True" }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all(am_output[\"results\"][\"labels\"] == dataset[\"target\"])\n" ] } ], "metadata": { "celltoolbar": "Tags", "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 2 }