Deploying Machine Learning Models

Machine learning models are commonly trained in a separate environment from the one they are deployed in. The models are typically serialized, for example as pickle files, which can easily be moved between environments. In this tutorial we are going to look at how we can take a pickled scikit-learn classifier and create a service that uses that model to predict a class for data sent to it.

The Classifier

In this example we will create a simple classifier for the famous iris dataset. We will not go into any detail about how it is done; there are plenty of tutorials online that go through this process:

import pickle
from pathlib import Path

import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

# Fix the random seed so the train/test split below is reproducible.
np.random.seed(1000)
THIS_DIR = Path(__file__).parent

# Load the iris dataset as pandas objects.
iris = datasets.load_iris(as_frame=True)

X_df = iris["data"]
y = iris["target"]

n = X_df.shape[0]
# Sample 70% of the rows for training. Sampling without replacement
# ensures the training indices are unique and do not leak into the test set.
i_train = np.random.choice(n, round(0.7 * n), replace=False)

y_train = y[i_train]
X_train = X_df.loc[i_train, :]

# The remaining rows form the test set.
y_test = y.drop(i_train, axis=0)
X_test = X_df.drop(i_train, axis=0)

# Train a k-nearest neighbors classifier with default hyperparameters.
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

# Serialize the trained model next to this script.
with open(THIS_DIR / "classifier.pkl", "wb") as fp:
    pickle.dump(knn, fp)
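
Before moving on, it is worth sanity-checking the artifact we just wrote. A minimal sketch (the exact accuracy depends on the random split, and the restored name is ours):

# Score the model on the held-out test set.
print(f"Test accuracy: {knn.score(X_test, y_test):.2f}")

# Verify that the pickled artifact round-trips to the same predictions.
with open(THIS_DIR / "classifier.pkl", "rb") as fp:
    restored = pickle.load(fp)
assert (restored.predict(X_test) == knn.predict(X_test)).all()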

The Service

As we have done previously, we use the daeploy init command to create a new project.

>>> daeploy init 
project_name [my_project]: iris_project

The first thing we have to do is create a directory in iris_project to put our model in.

>>> mkdir iris_project/models 
>>> mv classifier.pkl iris_project/models/ 
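
After these steps the project should look roughly like this (we assume the generated service code lives in service.py; the exact scaffold may vary between SDK versions):

iris_project/
├── models/
│   └── classifier.pkl
├── requirements.txt
└── service.py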

While it’s fresh in our memory, let’s add scikit-learn to the requirements.txt file: although unpickling an object does not require you to import its dependencies, they still have to be installed. requirements.txt should contain the following:

daeploy
scikit-learn
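
Pickled models are sensitive to library versions: an artifact produced under one version of scikit-learn is not guaranteed to unpickle cleanly under another. It is therefore good practice to pin the version used during training, for example (the version number here is only an illustration):

daeploy
scikit-learn==1.3.2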

A very simple service to make predictions using the classifier we just trained could look like this:

import logging
import pickle
from pathlib import Path

from daeploy import service
from daeploy.data_types import ArrayOutput, DataFrameInput

logger = logging.getLogger(__name__)
THIS_DIR = Path(__file__).parent

with open(THIS_DIR / "models/classifier.pkl", "rb") as fp:
    CLASSIFIER = pickle.load(fp)


@service.entrypoint
def predict(data: DataFrameInput) -> ArrayOutput:
    logger.info(f"Recieved data: \n{data}")
    pred = CLASSIFIER.predict(data)
    logger.info(f"Predicted: {pred}")
    return pred


if __name__ == "__main__":
    service.run()

In this service we unpickle the model into a global variable so that we can use it anywhere in the service, and we create the prediction entrypoint:

@service.entrypoint
def predict(data: DataFrameInput) -> ArrayOutput:
    logger.info(f"Recieved data: \n{data}")
    pred = CLASSIFIER.predict(data)
    logger.info(f"Predicted: {pred}")
    return pred

The entrypoint simply calls the predict method of the classifier, writes some logs, and responds with the result. Thanks to the special input and output types, described in Using Non-jsonable Data Types, we don’t have to worry about converting the input and output to JSON-compatible types ourselves.
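
Conceptually, the conversion those types perform looks something like the following sketch (this illustrates the idea, it is not daeploy’s actual implementation):

import numpy as np
import pandas as pd

# Incoming JSON such as {"col1": [1, 2], "col2": [3, 4]} is turned into
# a pandas DataFrame before the entrypoint sees it...
payload = {"col1": [1, 2], "col2": [3, 4]}
data = pd.DataFrame(payload)

# ...and a numpy array returned from the entrypoint is converted back
# into a JSON-compatible list for the response.
pred = np.array([0, 1])
response_body = pred.tolist()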

Deployment

Just like before, we deploy the service using the CLI:

>>> daeploy deploy iris_classifier 1.0.0 ./iris_project 
Active host: http://your-host
Deploying service...
Service deployed successfully
MAIN    NAME             VERSION    STATUS    RUNNING
------  ---------------  ---------  --------  -----------------------------------
*       iris_classifier  1.0.0      running   Running (since 2020-11-23 16:22:46)

The service should now be up and running. Go to the interactive documentation to test it out: http://your-host/services/iris_classifier/docs. The data should be in a format that can be transformed to a pandas dataframe, and the column names should match those the model was trained on, for example:

{
    "data":
    {
        "sepal length (cm)": [1, 2],
        "sepal width (cm)": [1, 2],
        "petal length (cm)": [1, 2],
        "petal width (cm)": [1, 2]
    }
}
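
You can also call the service programmatically. Here is a minimal sketch using the requests library (we assume the entrypoint is served at /services/iris_classifier/predict, mirroring the docs URL above, and that no authentication token is required; adjust both to your setup):

import requests

payload = {
    "data": {
        "sepal length (cm)": [1, 2],
        "sepal width (cm)": [1, 2],
        "petal length (cm)": [1, 2],
        "petal width (cm)": [1, 2],
    }
}

response = requests.post(
    "http://your-host/services/iris_classifier/predict",
    json=payload,
)
response.raise_for_status()
print(response.json())  # a list of predicted class labels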

Note

To get validation on your entrypoint input, take a look at Typing in the SDK.
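
As a sketch of the idea, assuming the service validates type-annotated entrypoint arguments (see the linked section for the actual details; this example is ours, not from the tutorial):

from daeploy import service

@service.entrypoint
def greet(name: str, excited: bool = False) -> str:
    # A request where `excited` is not a boolean would be rejected
    # with a validation error before this body executes.
    greeting = f"Hello, {name}"
    return greeting + "!" if excited else greeting

if __name__ == "__main__":
    service.run()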