ML Model Deployment Concepts (Part 1)
Deployment
Deployment is the process of running an application on a server or in a pipeline.
Given a model object, the question is how to make it part of your application.
It consists of the following components:
Deploy model
  Low level solution
  Off the shelf solution (high level solution)
Serving model
  Batch serving
  Low latency serving
Batch Serving
Pros:
  Very flexible
  Easy to set up; the model object and data can live anywhere
Cons:
  High latency
Process (sketched below):
  Read input
  Run inference
  Return output
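A minimal sketch of the batch serving process above, assuming a pickled scikit-learn model and a CSV input whose columns match the model's training features; all paths and file names are hypothetical.

```python
import pickle

import pandas as pd


def run_batch_job(model_path: str, input_path: str, output_path: str) -> None:
    # Load the trained model artifact (assumed to be a pickled scikit-learn estimator)
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    # Read input: a CSV here, but it could equally be a warehouse table or object store
    batch = pd.read_csv(input_path)

    # Run inference over the whole batch at once (columns assumed to match training features)
    batch["prediction"] = model.predict(batch)

    # Return output: write predictions for downstream consumers
    batch.to_csv(output_path, index=False)


if __name__ == "__main__":
    run_batch_job("model.pkl", "input.csv", "predictions.csv")
```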
Low Latency Serving
Pros:
  Low latency
Cons:
  Rigid request schema
  Infrastructure setup overhead
  Tends to only handle relatively simple models (inference has to fit within the latency budget)
Complexity (see the endpoint sketch after this list):
  Communication interface setup
  Format (gRPC/JSON)
  Security
  Multiple model deployment management
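As one concrete example of the communication-interface and format points, here is a minimal sketch of a low-latency JSON-over-HTTP endpoint using FastAPI; the model path, feature names, and route are hypothetical, and gRPC would be the other format option mentioned above.

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model artifact once at startup, not per request (path is hypothetical)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


# Rigid schema: requests must supply exactly these fields
class PredictRequest(BaseModel):
    feature_a: float
    feature_b: float


class PredictResponse(BaseModel):
    prediction: float


@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # Single-row inference; latency is dominated by model.predict
    y = model.predict([[req.feature_a, req.feature_b]])
    return PredictResponse(prediction=float(y[0]))

# Run with (assuming this file is saved as serve.py):
#   uvicorn serve:app --host 0.0.0.0 --port 8000
```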
Hybrid
Still has the same complexity as low latency serving, but with:
  A more accurate model
  More feasible infrastructure
  Request input that doesn't need to contain all features (heavy features can be precomputed in batch and looked up at request time; sketched below)
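A minimal sketch of this hybrid pattern, under the assumption that batch-precomputed features are persisted to a lookup store keyed by an ID; the store, key, and feature names are all hypothetical.

```python
import pandas as pd


def precompute_features(history_path: str, store_path: str) -> None:
    # Batch step: compute heavy aggregate features offline, keyed by user_id
    history = pd.read_csv(history_path)
    features = history.groupby("user_id")["purchase_amount"].agg(["mean", "count"])
    features.to_pickle(store_path)


def serve_request(model, store_path: str, user_id: int, realtime_feature: float) -> float:
    # Online step: the request carries only a key plus real-time features;
    # the rest are looked up from the precomputed store
    store = pd.read_pickle(store_path)
    precomputed = store.loc[user_id]
    row = [[precomputed["mean"], precomputed["count"], realtime_feature]]
    return float(model.predict(row)[0])
```

In production the lookup store would typically be a low-latency key-value store or feature store rather than a pickle on disk, but that substitution doesn't change the shape of the pattern.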
Packages
To properly productionise model deployment and serving, we may want to consider requirements related to:
  Model versioning
  Model metadata
  Model artifacts
Low level solution (each package only focuses on one thing)
  Model serving
    (language specific) TFServing (https://www.tensorflow.org/tfx/tutorials/serving/rest_simple); see the REST example below
    (generic) BentoML
    (generic) MPI capable package
  (and versioning)
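As a small illustration of the REST interface used in the TFServing tutorial linked above, here is a sketch of a client-side request; the host, port, model name, and input vector are hypothetical.

```python
import json

import requests

# Query a TensorFlow Serving REST endpoint (v1 predict API);
# host, port, model name, and the input values are hypothetical.
payload = json.dumps({
    "signature_name": "serving_default",
    "instances": [[1.0, 2.0, 3.0, 4.0]],
})
response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=payload,
    headers={"content-type": "application/json"},
)
predictions = response.json()["predictions"]
print(predictions)
```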
Off the shelf solution (high level solution)
  Kubeflow (GCP)
  SageMaker (AWS)
  MLflow (storing model metadata and model artifacts; see the sketch below)
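A minimal sketch of using MLflow to store model metadata (parameters and metrics) and the model artifact; the run name, logged values, and toy model are made up for illustration.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a toy model
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Log metadata and the model artifact in a single tracked run;
# MLflow records the run and stores the artifact under its tracking location
with mlflow.start_run(run_name="demo-run"):
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```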