# kibernetika-serving (ex. kuberlab-serving)

This document describes the capabilities and parameters of the kibernetika-serving tool.
## What is it?
The kibernetika-serving tool is a generic machine-learning model runner.
Basically, it starts a gRPC server and receives Google protobuf messages
(just like tensorflow_model_server does). Optionally, it can start an
HTTP proxy to that gRPC server, which makes requests to the server much easier.
It supports the following machine learning frameworks and formats:
- TensorFlow
- ONNX (via Intel nGraph)
- Intel OpenVINO
- PyTorch
It can also run without a model, with all required logic implemented in
processing hooks (see the Hooks section below); use the null driver for that.
The kibernetika-serving tool is available in the following serving containers:

- `kuberlab/serving:latest` (basic image, doesn't include OpenVINO support)
- `kuberlab/serving:latest-gpu` (includes GPU-related stuff)
- `kuberlab/serving:latest-openvino` (includes OpenVINO)
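For example, a hypothetical local run of the basic image might look like this (assuming the image exposes the kibernetika-serving binary; the mount point, model path, and port are illustrative):

```
docker run -it -p 9000:9000 -v $(pwd)/my-model:/model kuberlab/serving:latest \
    kibernetika-serving --driver tensorflow --model-path /model --port 9000
```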
## CLI interface

Once you have access to the kibernetika-serving executable, you are ready to use it. Let's look at the options and flags that can be provided at start.
**Note**: There is an alias `kserving` for `kibernetika-serving`.
```
usage: kibernetika-serving [-h] [--driver DRIVER] --model-path MODEL_PATH
                           [--hooks HOOKS] [--option OPTION] [--port <int>]
                           [--http-enable]
                           [--http-server-command HTTP_SERVER_COMMAND]
                           [--http-port HTTP_PORT]
```
Optional arguments:

- `-h, --help`: show this help message and exit
- `--driver DRIVER`: driver to use in the ML serving server
- `--model-path MODEL_PATH`: path to the model file or directory
- `--hooks HOOKS`: hooks Python file containing `preprocess(inputs)` and `postprocess(outputs)` functions
- `--option OPTION, -o OPTION`: additional options specific to the driver; format: `-o option_name=option_value`
- `--port <int>`: port on which the server will listen
- `--http-enable`: enables the HTTP proxy for the gRPC server
- `--http-server-command HTTP_SERVER_COMMAND`: command for running the HTTP tfservable-proxy
- `--http-port HTTP_PORT`: port for the HTTP server
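For example, a typical launch serving a TensorFlow model with the HTTP proxy enabled might look like this (the model path and ports are illustrative):

```
kibernetika-serving --driver tensorflow --model-path /models/my-model \
    --port 9000 --http-enable --http-port 9001
```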
## Drivers

This section describes the available backends (drivers) and their specific options.
### TensorFlow

Driver name used in options: `tensorflow`.
Options used by the TensorFlow driver (may be provided via `-o option_name=option_value`):

- `model_signature`: used if `--model-path` is a saved-model directory.
  If so, the driver extracts the provided `model_signature` from `saved_model.pb`.
  The default value is `serving_default` (the default constant in the tensorflow package).
  Example: `-o model_signature=transform`
- `inputs`: used only if `--model-path` is a `.pb` graph file. The service will
  use the provided tensor names as inputs. Example: `-o inputs=input,prob1`
- `outputs`: used only if `--model-path` is a `.pb` graph file. The service will
  use the provided tensor names as outputs. Example: `-o outputs=output,embeddings`
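Putting these together, a hypothetical launch with a frozen `.pb` graph could look like the following (assuming `-o` may be repeated; tensor names and paths are illustrative):

```
kibernetika-serving --driver tensorflow --model-path /models/graph.pb \
    -o inputs=input -o outputs=output
```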
Values accessible from processing hooks:

- `graph`: TensorFlow graph
- `sess`: TensorFlow session
- `model_inputs`: dict of tensor inputs, tensor name as a key and tensor as a value
- `model_outputs`: list of output tensor names
- `driver`: TensorFlowDriver object
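These values arrive in a hook's `**kwargs` (see the Hooks section below). A minimal sketch, assuming the keys listed above:

```python
def preprocess(inputs, **kwargs):
    # Driver-provided context arrives as keyword arguments
    sess = kwargs['sess']                  # TensorFlow session
    model_inputs = kwargs['model_inputs']  # tensor name -> tensor
    # e.g. inspect which tensors the model expects
    print('Model expects inputs: %s' % list(model_inputs.keys()))
    return inputs
```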
### PyTorch

Driver name used in options: `pytorch`.

Options used by the PyTorch driver (may be provided via `-o option_name=option_value`):

- `model_class`: required. An import path to the PyTorch Net class.
  Example: `-o model_class=package.module:NetClass`
Values accessible from processing hooks:

- `model`: PyTorch model object
- `model_class`: PyTorch model class
- `driver`: PyTorchDriver object
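The `package.module:NetClass` value means "class `NetClass` importable from module `package.module`". A minimal sketch of such a class (the architecture is purely illustrative):

```python
# package/module.py
import torch.nn as nn

class NetClass(nn.Module):
    def __init__(self):
        super(NetClass, self).__init__()
        self.fc = nn.Linear(784, 10)  # illustrative single layer

    def forward(self, x):
        return self.fc(x)
```

It would then be referenced at start with `-o model_class=package.module:NetClass`.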
### Intel OpenVINO

Driver name used in options: `openvino`.

Options used by the Intel OpenVINO driver (may be provided via `-o option_name=option_value`):

- `model_weights`: path to the weights (`.bin`) file. May be used in case the weights file is in a different location than the `.xml` file.
- `device`: the device to use for computation. Multiple devices can be provided; in that case the HETERO plugin will be used.
  Possible values: CPU, MYRIAD, FPGA. Examples: `-o device=CPU`, `-o device=MYRIAD`, `-o device=CPU,MYRIAD`.
- `flexible_batch_size`: use a variable first dimension in the input data. In this
  case the network will be executed multiple times. For example, if your network
  receives input shape (1, 28, 28) and outputs (1, 10), then `flexible_batch_size`
  makes it possible to pass (N, 28, 28) as an input; the serving output will then
  be (N, 10), matching each input request. Example: `-o flexible_batch_size=true`
Values accessible from processing hooks:

- `exec_net`: Executable Network object
- `model_inputs`: model input dict: input name -> input shape
- `model_outputs`: output name list
- `plugin`: Plugin object (may be used to load more networks in hooks on the same device)
- `driver`: IntelOpenVINODriver object
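As a hedged illustration of using these values, a preprocess hook could read `model_inputs` from its `**kwargs` to reshape incoming data to what the network expects (the names and fix-up logic are illustrative):

```python
import numpy as np

def preprocess(inputs, **kwargs):
    # model_inputs maps input name -> input shape (provided by the OpenVINO driver)
    model_inputs = kwargs['model_inputs']
    for name, shape in model_inputs.items():
        if name in inputs:
            # Illustrative fix-up: cast and reshape to the expected shape
            inputs[name] = np.asarray(inputs[name], dtype=np.float32).reshape(shape)
    return inputs
```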
### ONNX (Open Neural Network Exchange)

Driver name used in options: `onnx`.

Options used by the ONNX driver (may be provided via `-o option_name=option_value`):

- `runtime_backend`: runtime backend used for initializing the backend. Possible values: CPU, GPU, INTERPRETER, ARGON, NNP. Default value: CPU.
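A hypothetical launch with an ONNX model on the default CPU backend (the model path is illustrative):

```
kibernetika-serving --driver onnx --model-path /models/model.onnx -o runtime_backend=CPU
```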
### Null driver

Driver name used in options: `null`.

This driver is essentially meant to be used in conjunction with hooks. It makes it possible to build your own serving logic without a model at all; the driver itself just returns what it received as input.

There are no other driver-specific options.
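For example, a pure-hooks service could be started like this (`--model-path` still must be supplied, though with the null driver its value is presumably unused, as in the multi-model example below):

```
kibernetika-serving --driver null --model-path any --hooks hooks.py
```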
## Hooks

This section describes the possible ways of writing serving hooks and their capabilities.

### Hooks file structure
Basic hook functions set:

- `init_hook(**params)`: function-initializer. `params` is a dict containing all the options passed at start via `-o` flags, plus some additional parameters such as `model_path`.
- `preprocess(inputs, ctx, **kwargs)`: hook executed before model inference. Possible function declarations: `preprocess(inputs)` (ctx and kwargs unused), `preprocess(inputs, ctx)` (kwargs unused), `preprocess(inputs, **kwargs)` (ctx unused).
- `postprocess(outputs, ctx, **kwargs)`: hook executed after model inference. Possible function declarations: `postprocess(outputs)` (ctx and kwargs unused), `postprocess(outputs, ctx)` (kwargs unused), `postprocess(outputs, **kwargs)` (ctx unused).
Arguments in pre/postprocess hooks:

- `inputs`: dict containing input numpy arrays, e.g. `{'input-name': <numpy array>}`
- `outputs`: dict containing output numpy arrays, e.g. `{'output-name': <numpy array>}`
- `ctx`: gRPC context object. It can be used to pass data from the preprocess hook to the postprocess hook: for example, we can set an attribute on the ctx object in the preprocess hook and then read it in the postprocess hook.
- `**kwargs`: key-value arguments from the driver. Each driver passes a different set of kwargs to the hook (see the driver sections above for details).
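As a minimal sketch of the ctx mechanism (the attribute name is illustrative):

```python
def preprocess(inputs, ctx):
    # Stash a value on the context for the postprocess hook
    ctx.input_names = list(inputs.keys())
    return inputs

def postprocess(outputs, ctx):
    # Read the value stashed by preprocess
    print('Inputs were: %s' % ctx.input_names)
    return outputs
```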
### Hook example

Here is a hook example which just logs the fact that it was called.

`hooks.py`:

```python
import logging

LOG = logging.getLogger(__name__)

def init_hook(**params):
    LOG.info("Got params:")
    LOG.info(params)

def preprocess(inputs):
    """Processes inputs before the model inference.

    For example, here we can do checks, change shapes and other ops.

    :param inputs: dict containing input numpy arrays:
        {'input-name': <numpy array>}
    :type inputs: dict
    :return: processed inputs dict
    """
    LOG.info('Running preprocess...')
    return inputs

def postprocess(outputs):
    """Processes outputs after the model inference.

    :param outputs: dict containing output numpy arrays:
        {'output-name': <numpy array>}
    :type outputs: dict
    :return: processed outputs dict
    """
    LOG.info('Running postprocess...')
    return outputs
```
## Launching multiple models

To launch more than one model, you need to specify appropriate pre- and postprocessing hooks
that connect the current model's output to the next model's input. In the end, all the models
line up in one pipeline: pre-hook1 -> model1 -> post-hook1 -> pre-hook2 -> model2 -> post-hook2.

Accordingly, the hooks file needs more than one pre- and postprocessing function. To achieve this,
the `preprocess` and `postprocess` objects in the hooks file must be lists of functions whose
length equals the number of models. A `None` entry in a list means there is no hook at that position.
### Multiple models hooks file example

For the examples below, kibernetika-serving may be launched using the following command:

```
kibernetika-serving --driver null --model-path any --driver null --model-path any2
```

**Note**: The example works with the `null` driver, but you may want to change the
`--driver` and `--model-path` params to use your own driver and model. The example
merely demonstrates the concept.
```python
import logging

LOG = logging.getLogger(__name__)

def log(func):
    def decorator(*args):
        LOG.info('Running %s...' % func.__name__)
        return func(*args)
    return decorator

def init_hook(**params):
    LOG.info("Got params:")
    LOG.info(params)

@log
def preprocess1(inputs):
    return inputs

@log
def preprocess2(inputs):
    return inputs

@log
def postprocess1(outputs):
    return outputs

@log
def postprocess2(outputs):
    return outputs

preprocess = [preprocess1, preprocess2]
postprocess = [postprocess1, postprocess2]
```
Partially implemented hooks:

```python
import logging

LOG = logging.getLogger(__name__)

def log(func):
    def decorator(*args):
        LOG.info('Running %s...' % func.__name__)
        return func(*args)
    return decorator

def init_hook(**params):
    LOG.info("Got params:")
    LOG.info(params)

@log
def preprocess1(inputs):
    return inputs

@log
def postprocess2(outputs):
    return outputs

preprocess = [preprocess1, None]
postprocess = [None, postprocess2]
```
## Ignore (skip) model inference

In some cases model inference needs to be skipped (this is mostly useful with multiple models). For example, the first model detects faces and the second one detects emotions on those faces. If there are no faces in the image at all, emotion detection no longer makes sense because it requires face boxes.
Example:

```python
def preprocess1(inputs):
    return inputs

def postprocess1(outputs, ctx):
    # If the number of face boxes is 0, set skip_next to True, else False
    ctx.skip_next = len(outputs.get('face-boxes', [])) == 0
    return outputs

def preprocess2(inputs, ctx):
    if ctx.skip_next:
        # Mark the input so the second model's inference is skipped
        inputs['ml-serving-ignore'] = True
    return inputs

def postprocess2(outputs, ctx):
    # Was it skipped?
    if ctx.skip_next:
        # Nothing to return in case it was skipped
        return {}
    return outputs

preprocess = [preprocess1, preprocess2]
postprocess = [postprocess1, postprocess2]
```