Monitoring Your Service
Monitoring is an important part of software deployment. Deployed algorithms often deteriorate over time, a process called domain shift, and monitoring them lets you detect this at an early stage.
Monitoring Variables
To monitor the value of a variable in the code you can use the store() function. It takes an arbitrary number of keyword arguments and saves their values in a service-specific database, where they can be retrieved at a later time. store() supports numeric and string data. Let’s use the hello service as an example:
import logging

from daeploy import service

logger = logging.getLogger(__name__)
service.add_parameter("greeting_phrase", "Hello")


@service.entrypoint
def hello(name: str) -> str:
    greeting_phrase = service.get_parameter("greeting_phrase")
    logger.info(f"Greeting someone with the name: {name}")
    # Now the name and greeting will be stored every time this function is called
    service.store(name=name, greeting=greeting_phrase)
    return f"{greeting_phrase} {name}"
Monitoring an Entrypoint
It is also possible to automatically monitor the input and output of a service entrypoint. To do this we pass monitor=True to the entrypoint() decorator:
import logging

from daeploy import service

logger = logging.getLogger(__name__)
service.add_parameter("greeting_phrase", "Hello")


# Now the input and output of this entrypoint are stored after each call
@service.entrypoint(monitor=True)
def hello(name: str) -> str:
    greeting_phrase = service.get_parameter("greeting_phrase")
    logger.info(f"Greeting someone with the name: {name}")
    return f"{greeting_phrase} {name}"
In this case, the data is saved as JSON strings in order to support more data formats.
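Since monitored entrypoint data is stored as JSON strings, values read back from the database need to be decoded before use. The following is a minimal sketch, assuming each stored value is one JSON-encoded string. The helper name decode_values and the sample values are hypothetical and not taken from a real service:

```python
import json


def decode_values(raw_values):
    """Decode a list of JSON strings into Python objects."""
    return [json.loads(raw) for raw in raw_values]


# Hypothetical stored values; the real layout of a monitored entrypoint may differ.
raw = ['"Hello Bob"', '{"name": "Bob"}', "[1, 2, 3]"]
decoded = decode_values(raw)
print(decoded)
```

Storing JSON rather than raw values is what allows lists and dictionaries to be recorded alongside plain strings and numbers.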
Monitoring a Parameter
Parameters set with add_parameter() can also be monitored by passing monitor=True. Monitored parameters support integer, float and string data:
import logging

from daeploy import service

logger = logging.getLogger(__name__)
# Now the value of "greeting_phrase" is stored every time it's updated
service.add_parameter("greeting_phrase", "Hello", monitor=True)


@service.entrypoint
def hello(name: str) -> str:
    greeting_phrase = service.get_parameter("greeting_phrase")
    logger.info(f"Greeting someone with the name: {name}")
    return f"{greeting_phrase} {name}"
Getting the Data
Daeploy provides three options for accessing the time-series data for your monitored variables.
Option 1: JSON format
The time-series data in JSON format can be collected via the following entrypoint:
http://your-host/services/<service_name>_<service_version>/~monitor
To restrict the time range of the returned time-series data, add start and end query parameters to the URL, like this:
http://your-host/services/<service_name>_<service_version>/~monitor?end=<...>&start=<...>
The end and start query parameters need to have the following format:
YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [±]HH[:]MM]
For instance: 2020-01-01 02:30
It is also possible to specify which variables to query time-series data for by repeating the variables query parameter, like this:
http://your-host/services/<service_name>_<service_version>/~monitor?end=<...>&start=<...>&variables=a&variables=b
This will query time-series data for variables a and b. The returned JSON time-series data has the following format:
{
    "a": {
        "timestamp": [t1, t2, ..., tn],
        "value": [v1, v2, ..., vn]
    },
    "b": {
        "timestamp": [t1, t2, ..., tn],
        "value": [v1, v2, ..., vn]
    }
}
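The nested format above is often easier to work with as flat rows. A minimal sketch of that conversion, assuming the response has already been parsed into a Python dictionary. The function name to_rows and the sample timestamps and values are hypothetical:

```python
def to_rows(data):
    """Flatten {variable: {"timestamp": [...], "value": [...]}} into
    a list of (variable, timestamp, value) tuples."""
    rows = []
    for variable, series in data.items():
        # Timestamps and values are parallel lists, so pair them up.
        for timestamp, value in zip(series["timestamp"], series["value"]):
            rows.append((variable, timestamp, value))
    return rows


# Hypothetical sample in the documented shape; real timestamps may be formatted differently.
sample = {
    "name": {
        "timestamp": ["2020-01-01T02:30:00", "2020-01-01T02:31:00"],
        "value": ["Alice", "Bob"],
    },
    "greeting": {
        "timestamp": ["2020-01-01T02:30:00"],
        "value": ["Hello"],
    },
}
rows = to_rows(sample)
for row in rows:
    print(row)
```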
Example, using the requests package in Python:
import requests

# TOKEN must hold a valid authentication token for your host
response = requests.get(
    "http://your-host/services/name_version/~monitor",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
data = response.json()
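The start, end and variables query parameters described above can be assembled with the standard library instead of by hand. This is a minimal sketch, not part of the Daeploy API: the function name monitor_url, the service name hello and the version 1.0.0 are hypothetical. The variable names name and greeting come from the store() example earlier on this page.

```python
from datetime import datetime
from urllib.parse import urlencode


def monitor_url(host, service_name, service_version,
                start=None, end=None, variables=()):
    """Build a ~monitor URL with optional start, end and variables parameters."""
    base = f"http://{host}/services/{service_name}_{service_version}/~monitor"
    params = []
    if start is not None:
        # isoformat gives YYYY-MM-DDTHH:MM:SS, which matches the documented format
        params.append(("start", start.isoformat(timespec="seconds")))
    if end is not None:
        params.append(("end", end.isoformat(timespec="seconds")))
    # One "variables" entry per requested variable
    params.extend(("variables", variable) for variable in variables)
    return f"{base}?{urlencode(params)}" if params else base


url = monitor_url(
    "your-host", "hello", "1.0.0",
    start=datetime(2020, 1, 1, 2, 30),
    end=datetime(2020, 1, 2, 2, 30),
    variables=["name", "greeting"],
)
print(url)
```

urlencode joins the parameters with & and percent-encodes the colons in the timestamps. When using requests, passing the same values through its params argument performs the equivalent encoding.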
Option 2: CSV files
The time-series data can also be returned in CSV format. The entrypoint for this is:
http://your-host/services/<service_name>_<service_version>/~monitor/csv
To specify which variables and which time interval to query time-series data for, use the query parameters end, start and variables. For instance:
http://your-host/services/<service_name>_<service_version>/~monitor/csv?end=<...>&start=<...>&variables=a&variables=b
The end and start query parameters need to have the following format:
YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [±]HH[:]MM]
For instance: 2020-01-01 02:30
This will query time-series data for variables a and b. This entrypoint returns a zip file containing one CSV file per requested variable. The CSV files have the following format:
timestamp | value |
---|---|
t1 | v1 |
t2 | v2 |
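The returned zip file can be unpacked in memory without writing it to disk. The sketch below assumes the CSV files are comma-separated, UTF-8 encoded and start with a timestamp,value header row. None of that is confirmed here, so adjust it to what your host returns. The function name read_csv_zip and the file name name.csv are hypothetical:

```python
import csv
import io
import zipfile


def read_csv_zip(zip_bytes):
    """Return {file name: list of row dicts} for every file in a zip archive."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Assumption: each member is a UTF-8 encoded, comma-separated file
            text = archive.read(name).decode("utf-8")
            tables[name] = list(csv.DictReader(io.StringIO(text, newline="")))
    return tables


# Build a small hypothetical archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("name.csv", "timestamp,value\n2020-01-01 02:30:00,Alice\n")

tables = read_csv_zip(buffer.getvalue())
print(tables)
```

With the requests package, the bytes to pass in would be response.content from a call to the CSV entrypoint.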
Option 3: The whole service database
The time-series data is stored in SQLite databases, and the whole service database can be requested at the following entrypoint:
http://your-host/services/<service_name>_<service_version>/~monitor/db
Limiting the Number of Records in the Database
Service databases are not meant as permanent data storage. The records are kept for 365 days by default, but if a service stores large amounts of data often there is a risk of filling the disk of the host machine. To prevent this it’s possible to change how many instances of each variable can be stored or to automatically remove old instances.
These options are set using an environment variable in the service’s runtime. The easiest way to set it is in .s2i/environment:
- DAEPLOY_SERVICE_DB_TABLE_LIMIT
  Should have the format <number><limiter>.
  Limiter options: rows, days, hours, minutes or seconds.
To keep stored variables for at most 30 days, add the following to .s2i/environment:
DAEPLOY_SERVICE_DB_TABLE_LIMIT=30days
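A value in the <number><limiter> format can be checked before deployment with a small helper. This is an illustrative sketch and not Daeploy's own parser: the name parse_table_limit is hypothetical, and it assumes no whitespace is allowed between the number and the limiter.

```python
import re

# Limiter options as listed above
LIMITERS = ("rows", "days", "hours", "minutes", "seconds")
_LIMIT_PATTERN = re.compile(r"(\d+)(" + "|".join(LIMITERS) + r")")


def parse_table_limit(text):
    """Split a '<number><limiter>' string into (number, limiter).

    Raises ValueError if the string does not match the format.
    """
    match = _LIMIT_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"Invalid table limit: {text!r}")
    return int(match.group(1)), match.group(2)


limit = parse_table_limit("30days")
print(limit)
```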