ML Model Deployment Concepts (Part 1)

Deployment

Deployment is the process of making an application run on a server or in a pipeline.

Given a trained model object, the question is how to make it part of your application.


It consists of two parts:


Deploying the model

  • Low-level solution

  • Off-the-shelf solution (high-level solution)


Serving the model

  • Batch serving

  • Low-latency serving


Batch Serving

Pros:

  • Very flexible

  • Easy to set up; the model object/data can live anywhere

Cons:

  • High latency


Process (see the sketch after this list):

  • Read input

  • Run inference

  • Return output
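
A minimal sketch of that loop in Python, assuming a pickled scikit-learn-style model and CSV input/output; all file paths and column names below are hypothetical:

    import pickle

    import pandas as pd

    def run_batch(model_path, input_path, output_path):
        # Read input
        frame = pd.read_csv(input_path)
        # Run inference; the model object can be loaded from anywhere convenient
        with open(model_path, "rb") as handle:
            model = pickle.load(handle)
        frame["prediction"] = model.predict(frame)
        # Return output (here: write the scored rows back to disk)
        frame.to_csv(output_path, index=False)

    if __name__ == "__main__":
        run_batch("model.pkl", "inputs.csv", "predictions.csv")  # hypothetical paths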


Low-Latency Serving

Pros:

  • Low latency

Cons:

  • Rigid schema

  • Infrastructure setup overhead

  • May only handle relatively simple models

Complexity (a minimal endpoint is sketched after this list):

  • Communication interface setup

    • Format (gRPC/JSON)

    • Security

  • Managing deployments of multiple models
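
As a sketch of what the rigid schema and interface setup look like in practice, here is a minimal JSON-over-HTTP endpoint using Flask (one possible low-level choice; the route, payload shape, and model path are assumptions, not a prescribed setup):

    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    with open("model.pkl", "rb") as handle:  # hypothetical artifact path
        model = pickle.load(handle)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Rigid schema: the caller must send every feature the model expects,
        # in the agreed order, e.g. {"features": [0.3, 1.7, 5.0]}
        features = request.get_json()["features"]
        prediction = model.predict([features])[0]
        return jsonify({"prediction": float(prediction)})

    if __name__ == "__main__":
        app.run(port=8080)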


Hybrid

Still has the same complexity as low-latency serving, but with:

  • A more accurate model

  • Feasible infrastructure

    • The input doesn't need to contain all features (e.g. heavy features can be precomputed in batch and looked up at serving time; see the sketch below)
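
A rough sketch of the hybrid idea, assuming expensive features are materialised by a batch job into a lookup table (here just an in-memory dict; a real setup would use a key-value store or database) and joined with the lightweight request-time input:

    import pickle

    with open("model.pkl", "rb") as handle:  # hypothetical artifact path
        model = pickle.load(handle)

    # Populated offline by a batch job, keyed by entity id
    precomputed = {"user_42": [0.3, 1.7, 5.0]}

    def predict(entity_id, request_features):
        # The request only carries an id plus cheap features; the expensive
        # ones come from the batch-computed table
        features = precomputed[entity_id] + request_features
        return float(model.predict([features])[0])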


Packages

To properly productionise model deployment and serving, we may want to consider requirements related to:

  • Model versioning

  • Model metadata

  • Model artifact


Low-level solutions (each package focuses on one thing)

  • Model serving

  • (some also handle versioning)



Off-the-shelf solutions (high-level solutions)

  • Kubeflow (GCP)

  • SageMaker (AWS)

  • MLflow (storing model metadata and model artifacts)
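
For example, MLflow can cover the metadata and artifact requirements above with a few calls; the run name, parameter, and toy model here are illustrative only:

    import mlflow
    import mlflow.sklearn

    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # toy model

    with mlflow.start_run(run_name="example-run"):
        mlflow.log_param("C", 1.0)                # model metadata
        mlflow.sklearn.log_model(model, "model")  # model artifact, stored per run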

