Hi folks, I have a problem with the model materializer. I can't use the output of the training step in the validation step. I found a workaround for that and it works, but when it comes to deployment, ZenML couldn't access the model and didn't deploy anything: ```An MLflow model with name model was not logged in the current pipeline run and no running MLflow model server was found. Please ensure that your pipeline includes a step with a MLflow experiment configured that trains a model and logs it to MLflow. This could also happen if the current pipeline run did not log an MLflow model because the training step was cached.```
Last active 8 days ago
hey @assiachahidi19, this could happen if your model training step was cached, as the message says
you could try disabling caching for the entire pipeline to see if that helps
by using:
@step(enable_cache=False) will do it
I already did it:
does your MLflow tracker UI show a model being logged in that run ?
the way this step works is it looks for an MLflow model called "model", logged in the current MLflow run. If it doesn't find one, it shows you this error message
so in the pipeline where you use this deployer step, you also need another step that trains and/or logs the model to MLflow. The step that logs the model also needs to run before the deployer step
check your pipeline run logs, what do they show ? is the trainer step being run ? or is it cached ? does it run before the deployer step ?
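A rough way to answer those three questions from a saved run log, as a minimal sketch: it assumes the log lines look like the "Step ... has started." and "Using cached version of ..." messages shown in this thread, and that the steps are named trainer and model_deployer. The function names here are made up for illustration, they are not part of ZenML.

```python
import re

# Matches the two log messages that tell us whether a step ran or was cached.
EVENT = re.compile(r"Step (\w+) has started\.|Using cached version of (\w+)\.")


def step_status(log_text: str) -> dict:
    """Map each step name to 'ran' or 'cached', in the order the steps started."""
    status = {}
    for match in EVENT.finditer(log_text):
        started, cached = match.groups()
        if started:
            # Assume the step really ran until a cache message says otherwise.
            status[started] = "ran"
        elif cached:
            status[cached] = "cached"
    return status


def trainer_ran_before_deployer(status: dict,
                                trainer: str = "trainer",
                                deployer: str = "model_deployer") -> bool:
    """True only if the trainer actually ran (not cached) and started before the deployer."""
    order = list(status)
    return (
        trainer in status
        and deployer in status
        and status[trainer] == "ran"
        and order.index(trainer) < order.index(deployer)
    )


if __name__ == "__main__":
    log = (
        "Step trainer has started. Using cached version of trainer. "
        "Step model_deployer has started."
    )
    print(step_status(log))
    print(trainer_ran_before_deployer(step_status(log)))
```

If the second function returns False, the deployer step had no freshly trained model to pick up in that run, which is the situation the error message describes.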
Thanks @stefan for all these suggestions, I will. In the run log the trainer step is being run, and the deploy step comes after validation; that is where I get the warning.
the model has to appear in the run output
this is output from a non-cached job run
```
Registered pipeline continuous_deployment_pipeline (version 1).
Running pipeline continuous_deployment_pipeline on stack local_mlflow_stack (caching enabled)
Step importer has started.
Step importer has finished in 0.649s.
Step normalizer has started.
Step normalizer has finished in 0.838s.
Step trainer has started.
2023/03/16 12:03:19 INFO mlflow.tracking.fluent: Experiment with name 'continuous_deployment_pipeline' does not exist. Creating a new experiment.
1875/1875 [==============================] - 2s 817us/step - loss: 0.3685 - accuracy: 0.8962
1875/1875 [==============================] - 2s 807us/step - loss: 0.2875 - accuracy: 0.9194
1875/1875 [==============================] - 2s 807us/step - loss: 0.2769 - accuracy: 0.9223
1875/1875 [==============================] - 2s 828us/step - loss: 0.2682 - accuracy: 0.9261
1875/1875 [==============================] - 2s 817us/step - loss: 0.2656 - accuracy: 0.9262
2023/03/16 12:03:31 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/stefan/aspyre/src/zenml/.venv/lib/python3.8/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils."
INFO:root:creating /home/stefan/.config/zenml/local_stores/ad2d713b-3596-44a3-98ed-0a4e72dd47f5/mlruns/912429668909778767/2d380ccc59f64c3e96910b702b5b828f/artifacts/model/data
INFO:root:creating /home/stefan/.config/zenml/local_stores/ad2d713b-3596-44a3-98ed-0a4e72dd47f5/mlruns/912429668909778767/2d380ccc59f64c3e96910b702b5b828f/artifacts/model/data/model
INFO:root:creating /home/stefan/.config/zenml/local_stores/ad2d713b-3596-44a3-98ed-0a4e72dd47f5/mlruns/912429668909778767/2d380ccc59f64c3e96910b702b5b828f/artifacts/model/data/model/variables
INFO:root:creating /home/stefan/.config/zenml/local_stores/ad2d713b-3596-44a3-98ed-0a4e72dd47f5/mlruns/912429668909778767/2d380ccc59f64c3e96910b702b5b828f/artifacts/model/data/model/assets
INFO:root:creating /home/stefan/.config/zenml/local_stores/ad2d713b-3596-44a3-98ed-0a4e72dd47f5/mlruns/912429668909778767/2d380ccc59f64c3e96910b702b5b828f/artifacts/tensorboard_logs/train
Step trainer has finished in 12.025s.
Step evaluator has started.
313/313 - 0s - loss: 0.2731 - accuracy: 0.9257 - 247ms/epoch - 789us/step
Step evaluator has finished in 0.521s.
Step deployment_trigger has started.
Step deployment_trigger has finished in 0.038s.
Step model_deployer has started.
Updating an existing MLflow deployment service: MLFlowDeploymentService[f3e975e5-6fca-42e5-837e-51c1d1c01a44] (type: model-serving, flavor: mlflow)
MLflow deployment service started and reachable at:
Step model_deployer has finished in 9.694s.
Pipeline run continuous_deployment_pipeline-2023_03_16-11_03_17_605871 has finished in 25.119s.
```
this is output from a cached job run
Reusing registered pipeline continuous_deployment_pipeline (version: 1).
Running pipeline continuous_deployment_pipeline on stack local_mlflow_stack (caching enabled)
Step importer has started.
Using cached version of importer.
Step normalizer has started.
Using cached version of normalizer.
Step trainer has started.
Using cached version of trainer.
Step evaluator has started.
Using cached version of evaluator.
Step deployment_trigger has started.
Using cached version of deployment_trigger.
Step model_deployer has started.
An MLflow model with name model was not trained in the current pipeline run. Reusing the existing MLflow model server.
Step model_deployer has finished in 0.115s.
Pipeline run continuous_deployment_pipeline-2023_03_16-11_06_04_936662 has finished in 1.440s.
if you're still sure you're training and logging a new MLflow model in the same pipeline as your deployer step, then something else is going on that I don't understand
by the way, I somehow assumed you were running the ZenML mlflow_deployment example when you got this error. If that's not the case here and you're running your own custom code, you should take a look at the example, try running it on the same stack, and see if you get different results.
My output is similar to the non-cached job run. How can I convert it to a cached job, please? I did even specify enable_cache=False on the train step.
I'm running custom code following quickstart 3.7
you can disable/enable caching at step level or at pipeline level, as described in the docs here:
if you're using Python 3.7, you may be running into another issue where the sklearn auto-logger doesn't work with MLflow because of a version mismatch. Do you see any warnings in your pipeline run logs? Anything like this?
WARNING mlflow.utils.autologging_utils: You are using an unsupported version of sklearn. If you encounter errors during autologging, try upgrading / downgrading sklearn to a supported version, or try upgrading MLflow.
that's why it's important to check the MLflow tracker UI to be 100% sure that the model is logged to MLflow
also equally important to not ignore warnings that you see in your logs
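If the tracking store is a local folder like the mlruns path in the log output above, one quick complement to the UI check is to look for the MLmodel file on disk. This is only a sketch under assumptions: it assumes the local layout mlruns/<experiment_id>/<run_id>/artifacts/model/ seen in that log, and the helper name logged_models is made up for illustration. It will not work for a remote tracking server.

```python
from pathlib import Path


def logged_models(mlruns_dir, artifact_name="model"):
    """Return sorted (experiment_id, run_id) pairs whose run contains a logged model.

    A run counts as having a logged model if it has the file
    artifacts/<artifact_name>/MLmodel, which MLflow writes when a model is logged.
    """
    root = Path(mlruns_dir)
    found = []
    for mlmodel in root.glob(f"*/*/artifacts/{artifact_name}/MLmodel"):
        # parents[0] is the model dir, parents[1] is artifacts, parents[2] is the run dir.
        run_dir = mlmodel.parents[2]
        found.append((run_dir.parent.name, run_dir.name))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical location; point this at your own local mlruns folder.
    for experiment_id, run_id in logged_models("mlruns"):
        print(f"experiment {experiment_id}, run {run_id}: model logged")
```

An empty result for the run you just executed matches the "empty artifact" symptom: the trainer ran, but nothing was logged under the name the deployer step looks for.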
Thanks @stefan for all these suggestions. Now, by following quickstart 3.7, MLflow is running well, but as you explained, the training didn't log to MLflow: I have an empty artifact.