# New in Wave ML: Overview
Several versions of H2O Wave ML have been released since the last announcement. Wave ML can now run on H2O AI Hybrid Cloud, use H2O Driverless AI (DAI) to train models, and push them into H2O MLOps. In addition, new utility functions were introduced to support H2O Enterprise Steam together with MLOps, and minor changes were made to the API.
## Wave ML on Cloud

You can run both the H2O-3 and DAI engines on Cloud.

For H2O-3, it works the same as running Wave ML locally: the H2O-3 server is initialized and used. No additional steps are necessary (future work will extend this further to use Cloud resources).

For DAI, it's essential to enable OpenID Connect (OIDC) and set up the correct environment variables. This is done in your `app.toml` file.
First, check whether the Cloud instance you are using (there are several) has the appropriate shared secrets available:
```shell
h2o secret list
```
We are looking for the MLOps gateway and the Steam address:

```
h2oai-mlops    ALL_USERS    gateway
h2oai-steam    ALL_USERS    api
```
Set up your `app.toml` accordingly:
```toml
[Runtime]
EnableOIDC = true

[[Env]]
Name = "H2O_WAVE_ML_STEAM_ADDRESS"
Secret = "h2oai-steam"
SecretKey = "api"

[[Env]]
Name = "H2O_WAVE_ML_MLOPS_GATEWAY"
Secret = "h2oai-mlops"
SecretKey = "gateway"
```
You can use your Steam DAI instance with Wave ML in either of two ways:
- Specify the instance name in an environment variable. This option is suitable for a multinode setup:

  ```toml
  [[Env]]
  Name = "H2O_WAVE_ML_STEAM_CLUSTER_NAME"
  Value = "my-cool-multinode-project-name"
  ```
- Specify the instance name directly in the API:

  ```python
  model = build_model(_steam_dai_instance_name='my-dai')
  ```
The rest of the Wave ML workflow remains the same.
## Utility Functions

A new module was introduced to Wave ML, available as `h2o_wave_ml.utils`. It contains some useful functions:
- `list_dai_instances()`: Gets a list of all available Driverless AI instances.
- `list_dai_multinodes()`: Gets a list of all available Driverless AI multinode instances.
- `save_autodoc()`: Saves the AutoDoc of a DAI model.
Use `list_dai_instances()` to obtain a list of your DAI instances. You can later feed `build_model()` with a particular instance name:
```python
from h2o_wave_ml.utils import list_dai_instances

@app('/')
async def serve(q: Q):
    dai_instances = list_dai_instances(access_token=q.auth.access_token)
```
An AutoDoc is available for download if the model was built using the DAI backend. Use `save_autodoc()` to save a `.docx` file to disk:
```python
from h2o_wave_ml import build_model
from h2o_wave_ml.utils import save_autodoc

@app('/')
async def serve(q: Q):
    model = build_model(...)
    file_name = save_autodoc(project_id=model.project_id, access_token=q.auth.access_token)
```
The `.project_id` property of the model is always available if the model was obtained via the `build_model()` function. However, if the model is returned by `get_model()`, it is available only if the model belongs to an authenticated user.
See the API for utils.
## OpenID Connect

OIDC tokens are used to authenticate across the services on Cloud. Tokens are available within the `q` argument of the Wave query handler, i.e., `q.auth.access_token` and `q.auth.refresh_token`. For more information, see the single sign-on section of the Wave page.
Some of the Wave ML functions were equipped with `access_token` and `refresh_token` arguments to handle the authentication routines behind the scenes. Use the appropriate token to authenticate.

In general, an `access_token` has a short lifespan and is not suitable for a long-running job. Use it when you are sure your routine will finish quickly, e.g., the listing example:
```python
from h2o_wave_ml.utils import list_dai_instances

@app('/')
async def serve(q: Q):
    dai_instances = list_dai_instances(access_token=q.auth.access_token)
```
If your task needs more time (e.g., `build_model()`), use `refresh_token` so Wave ML can generate access tokens as needed:
```python
from h2o_wave_ml import build_model

@app('/')
async def serve(q: Q):
    model = build_model(..., refresh_token=q.auth.refresh_token)
```
## Notable Features

- `build_model()` was updated with several H2O-3 and DAI parameters. See an example for H2O-3 parameters.
- A Pandas dataframe can be used for training, testing, and validation:

  ```python
  from h2o_wave_ml import build_model
  from sklearn.datasets import load_wine
  from sklearn.model_selection import train_test_split

  data = load_wine(as_frame=True)['frame']
  train_df, test_df = train_test_split(data, train_size=0.8)

  model = build_model(train_df=train_df, target_column='target')
  ```

- A validation dataset can be passed to the `build_model()` function. Choose `validation_file_path` in the case of a file or `validation_df` in the case of a Pandas dataframe.
- A list of categorical columns can be specified using the `categorical_columns` argument of the `build_model()` function.
- Columns can be dropped or included using the `drop_columns` or `feature_columns` arguments of the `build_model()` function.
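Pulling these options together, here is a hedged sketch of the data preparation. The three-way split extends the wine example, `'ash'` is just an arbitrary column chosen to illustrate `drop_columns`, and the final `build_model()` call is shown commented out because it requires a running backend.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Split the wine frame into train / validation / test dataframes.
data = load_wine(as_frame=True)['frame']
train_df, rest_df = train_test_split(data, train_size=0.8, random_state=0)
validation_df, test_df = train_test_split(rest_df, train_size=0.5, random_state=0)

# Assumed call combining the options listed above (argument names as listed):
# model = build_model(
#     train_df=train_df,
#     validation_df=validation_df,
#     target_column='target',
#     drop_columns=['ash'],      # illustrative column to exclude from training
# )
```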
See the API for Wave ML and the examples on the official Wave page.
Follow updates to Wave ML on GitHub: https://github.com/h2oai/wave-ml. Let us know what you think and how we can improve it.
We look forward to continuing our collaboration with the community and hearing your feedback as we further improve and expand the H2O Wave platform.