A few of our users occasionally spin up pods that do a lot of number crunching. The front end is a web app that queries the pod and waits for a response.
Some of these queries exceed the default 30s timeout for the pod ingress. So, I added an annotation to the pod ingress to increase the timeout to 60s. Users still report occasional timeouts.
I asked how long they need the timeout to be. They requested 1 hour.
This seems excessive. My gut feeling is that this will cause problems. However, I don’t know enough about ingress timeouts to know what will break. So, what is the worst-case scenario of 3-10 pods having 1-hour ingress timeouts?
That’s outright insane. Does this mean that if their connection has any type of hiccup, all their work is lost?
Instead of having the web app work directly inside the request-response cycle, these long-running jobs should be treated as proper separate tasks: each one gets its own record in the database and can be queried for its result later.
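To make that concrete, here is a rough sketch of the submit-then-poll pattern in Python. Everything in it is an assumption for illustration: the `jobs` dict stands in for a real database table, `crunch` stands in for the actual number crunching, and `submit_job` / `get_job` are hypothetical names for what would be two short HTTP endpoints in the web app.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory job table; a real app would use its database
# so that records survive a pod restart.
jobs = {}
executor = ThreadPoolExecutor(max_workers=4)


def crunch(n):
    """Stand-in for the long-running computation (assumed, not from the post)."""
    return sum(i * i for i in range(n))


def _run(job_id, n):
    # Runs in a worker thread; replaces the whole record when finished.
    try:
        jobs[job_id] = {"status": "done", "result": crunch(n)}
    except Exception as exc:
        jobs[job_id] = {"status": "failed", "error": str(exc)}


def submit_job(n):
    """Create the job record and return its id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running"}
    executor.submit(_run, job_id, n)
    return job_id


def get_job(job_id):
    """Return a copy of the current job record; cheap enough to poll."""
    return dict(jobs[job_id])
```

The front end would call something like `submit_job` once, keep the returned id, and then call `get_job` every few seconds until the status is no longer `"running"`. Each of those requests finishes in milliseconds, so the ingress timeout stops mattering and a dropped connection only costs one poll rather than the whole computation.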