ML Model Deployment Concepts (Part 1)
Deployment
Deployment is the process of running an application on a server or in a pipeline.
Given a model object, the question is how to make it part of your application.
It consists of the following components:
Deploy model
  Low level solution
  Off the shelf solution (high level solution)
Serving model
  Batch serving
  Low latency serving
Batch Serving
Pros:
  Very flexible
  Easy to set up; the model object and data can live anywhere
Cons:
  High latency
Process (sketched below):
  Read input
  Run inference
  Return output
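A minimal sketch of the batch serving process above, assuming a pickled scikit-learn model and a CSV input whose columns match the model's training features; all paths and file names are hypothetical.

```python
import pickle

import pandas as pd


def run_batch_job(model_path: str, input_path: str, output_path: str) -> None:
    # Load the trained model artifact (assumed to be a pickled scikit-learn estimator)
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    # Read input: a CSV here, but it could equally be a warehouse table or object store
    batch = pd.read_csv(input_path)

    # Run inference over the whole batch at once (columns assumed to match training features)
    batch["prediction"] = model.predict(batch)

    # Return output: write predictions for downstream consumers
    batch.to_csv(output_path, index=False)


if __name__ == "__main__":
    run_batch_job("model.pkl", "input.csv", "predictions.csv")
```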
Low Latency Serving
Pros:
  Low latency
Cons:
  Rigid request schema
  Infrastructure setup overhead
  Tends to only handle relatively simple models (inference has to fit within the latency budget)
Complexity (see the endpoint sketch after this list):
  Communication interface setup
  Format (gRPC/JSON)
  Security
  Multiple model deployment management
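As one concrete example of the communication-interface and format points, here is a minimal sketch of a low-latency JSON-over-HTTP endpoint using FastAPI; the model path, feature names, and route are hypothetical, and gRPC would be the other format option mentioned above.

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model artifact once at startup, not per request (path is hypothetical)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


# Rigid schema: requests must supply exactly these fields
class PredictRequest(BaseModel):
    feature_a: float
    feature_b: float


class PredictResponse(BaseModel):
    prediction: float


@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # Single-row inference; latency is dominated by model.predict
    y = model.predict([[req.feature_a, req.feature_b]])
    return PredictResponse(prediction=float(y[0]))

# Run with (assuming this file is saved as serve.py):
#   uvicorn serve:app --host 0.0.0.0 --port 8000
```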
Hybrid
Still has the same complexity as low latency serving, but with:
  A more accurate model
  More feasible infrastructure
  Request input that doesn't need to contain all features (heavy features can be precomputed in batch and looked up at request time; sketched below)
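A minimal sketch of this hybrid pattern, under the assumption that batch-precomputed features are persisted to a lookup store keyed by an ID; the store, key, and feature names are all hypothetical.

```python
import pandas as pd


def precompute_features(history_path: str, store_path: str) -> None:
    # Batch step: compute heavy aggregate features offline, keyed by user_id
    history = pd.read_csv(history_path)
    features = history.groupby("user_id")["purchase_amount"].agg(["mean", "count"])
    features.to_pickle(store_path)


def serve_request(model, store_path: str, user_id: int, realtime_feature: float) -> float:
    # Online step: the request carries only a key plus real-time features;
    # the rest are looked up from the precomputed store
    store = pd.read_pickle(store_path)
    precomputed = store.loc[user_id]
    row = [[precomputed["mean"], precomputed["count"], realtime_feature]]
    return float(model.predict(row)[0])
```

In production the lookup store would typically be a low-latency key-value store or feature store rather than a pickle on disk, but that substitution doesn't change the shape of the pattern.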
Packages
To properly productionise model deployment and serving, we may want to consider requirements related to:
  Model versioning
  Model metadata
  Model artifacts
Low level solution (each package only focuses on one thing)
  Model serving
    (language specific) TFServing (https://www.tensorflow.org/tfx/tutorials/serving/rest_simple); see the REST example below
    (generic) BentoML
    (generic) MPI capable package
  (and versioning)
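As a small illustration of the REST interface used in the TFServing tutorial linked above, here is a sketch of a client-side request; the host, port, model name, and input vector are hypothetical.

```python
import json

import requests

# Query a TensorFlow Serving REST endpoint (v1 predict API);
# host, port, model name, and the input values are hypothetical.
payload = json.dumps({
    "signature_name": "serving_default",
    "instances": [[1.0, 2.0, 3.0, 4.0]],
})
response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=payload,
    headers={"content-type": "application/json"},
)
predictions = response.json()["predictions"]
print(predictions)
```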
Off the shelf solution (high level solution)
  Kubeflow (GCP)
  SageMaker (AWS)
  MLflow (storing model metadata and model artifacts; see the sketch below)
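A minimal sketch of using MLflow to store model metadata (parameters and metrics) and the model artifact; the run name, logged values, and toy model are made up for illustration.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a toy model
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Log metadata and the model artifact in a single tracked run;
# MLflow records the run and stores the artifact under its tracking location
with mlflow.start_run(run_name="demo-run"):
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```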