
What does it do?¶
Hotvect is an open source library for developing real-time and batch machine learning applications, especially personalized content re-rankers. It supports the following tasks:
- Development of feature engineering code that can be shared across offline and online environment
- Integration of Machine Learning libraries like vowpal wabbit, catboost etc. into ML applications
- Definition of ML enabled models and policies, and packaging them into a reusable, modular form that can easily be shared, combined, and deployed into production
- Offline testing and hyperparameter optimization of models and policies, as well as bookkeeping of test results
- Integration with Sagemaker for running offline tests and hyperparameter optimization at scale
What does it not provide?¶
It does not provide:
- Machine Learning algorithms themselves (it is meant to be combined with existing machine learning libraries)
- Orchestration of machine learning pipelines (this needs to be provided through other frameworks like Airflow)
- Life-cycle management of models and policies (this is provided by the Experiment Management Service that supports Hotvect)
- Creation, management and execution of online experiments (this is also provided by the Experiment Management Service)
- Monitoring of ML applications and evaluation of online experiment results (this needs to be provided separately)
Notes¶
Hotvect is designed to be library-agnostic - i.e. you can integrate it with any library. However, currently the library must be “playable” from a JVM process (for example through JNI, or through pure java implementations of ML algorithms like h2o.ai’s xgboost-predictor). We plan to add inter-process integrations in the future.
The Feature Engineering is meant to be written in a JVM language (like Java, Kotlin, Scala etc.). The API for triggering various tasks like offline testing are provided as a python library.