QQ on `kubeflow` as metadata store :thread:
Last active 5 days ago
9 replies
1 views
- AM
I have a bunch of pipelines running on Kubeflow, and after a while I get this error:
```Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/ml_metadata/metadata_store/metadata_store.py", line 213, in _call_method
    response.CopyFrom(grpc_method(request, timeout=self._grpc_tim…
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.RESOURCE_EXHAUSTED
    details = "Received message larger than max (5282925 vs. 4194304)"
    debug_error_string = "UNKNOWN:Error received from peer metadata-grpc-service.kubeflow:8080 {grpc_message:"Received message larger than max (5282925 vs. 4194304)", grpc_status:8, created_time:"2023-03-17T18:00:48.942206603+00:00"}"
>

During handling of the above exception, another exception occurred:

  File "/usr/local/lib/python3.9/site-packages/ml_metadata/metadata_store/metadata_store.py", line 218, in _call_method
    raise _make_exception(e.details(), e.code().value[0])
ResourceExhaustedError: Received message larger than max (5282925 vs. 4194304)```
- AM
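To make the numbers in the error concrete: the serialized MLMD response (5,282,925 bytes) overshoots gRPC's default maximum receive size of 4 MiB. A quick sanity check, using only the values from the traceback above:

```python
# Values taken from the error message above.
DEFAULT_GRPC_MAX_BYTES = 4 * 1024 * 1024  # 4 MiB == 4194304, gRPC's default cap
response_size = 5282925                   # the rejected MLMD response

assert DEFAULT_GRPC_MAX_BYTES == 4194304
overshoot = response_size - DEFAULT_GRPC_MAX_BYTES
print(overshoot)  # 1088621 bytes over the cap
```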
there are a lot of similar issues I can see on the net; here are some hunches:
• the mysql db somehow gets unresponsive due to duplicate UUIDs of the run details
• the default gRPC message limit is 4 MB and should be increased
• the mysql `group_concat_max_len` parameter of the db should be increased
• … - AM
I am out of ideas; I was wondering if anyone has seen similar things before, so I am all ears :ear:
- AM
in the :point_up: issue, someone said:
@chensun I was not facing this issue initially, but started to face once Kubeflow has more than 5k pipeline runs. We are not logging any metadata outside metadata-writer. Looks like we have to implement pagination on how metadata-writer queries metadata-server. Please see the issue here google/ml-metadata#74 and google/ml-metadata#42
which is exactly my case; - HA
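The linked comment suggests paginating how metadata is queried so no single response grows past the gRPC cap. A generic batching helper (hypothetical; `batched` is not an MLMD API) illustrates the idea of fetching in fixed-size pages instead of one giant response:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield fixed-size pages so no single request/response grows unbounded."""
    page: List[T] = []
    for item in items:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:  # emit the final, possibly short, page
        yield page

# Instead of querying all 5k+ runs at once, fetch IDs and hydrate page by page.
run_ids = list(range(12))  # stand-in for the real pipeline-run IDs
pages = list(batched(run_ids, 5))
print(pages)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```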
oh interesting. It seems more like an underlying Kubeflow issue though @amir.benny :disappointed: Not sure how to solve it other than deleting the old pipelines?
- HA
I suppose one advantage with ZenML would be that you would have the old pipelines at least on the ZenML side
- AM
yeah; I gotta see if we can get away with some pagination or with updating the Kubeflow version. Thanks Hamza