On Tue, Sep 25, 2018 at 09:03:03AM +0100, Neil Williams wrote:
On Tue, 25 Sep 2018 at 08:56, Corentin Labbe <clabbe@baylibre.com> wrote:
Hello
We have a job (number 332) stuck in the running state. After 23 hours of inactivity, the only way to stop it was to cancel it and mark it as failed. According to the logs, a brief disconnection happened between the slave and the master. The slave seems to have tried to update the final status of the job, but the master "ignored" it.
Which Debian package version(s) of lava-dispatcher are installed on the slave, and of lava-server on the master?
On slave:
ii  lava-common       2018.7-1+stretch  all    Linaro Automated Validation Architecture common
ii  lava-dispatcher   2018.7-1+stretch  amd64  Linaro Automated Validation Architecture dispatcher

On master:
ii  lava              2018.7-1+stretch  all    Linaro Automated Validation Architecture metapackage
ii  lava-common       2018.7-1+stretch  all    Linaro Automated Validation Architecture common
ii  lava-coordinator  0.1.7-1           all    LAVA Coordinator daemon
ii  lava-dev          2018.7-1+stretch  all    Linaro Automated Validation Architecture developer support
ii  lava-dispatcher   2018.7-1+stretch  amd64  Linaro Automated Validation Architecture dispatcher
ii  lava-server       2018.7-1+stretch  all    Linaro Automated Validation Architecture server
ii  lava-server-doc   2018.7-1+stretch  all    Linaro Automated Validation Architecture documentation
Can you attach (rather than inline) the test job log file (output.yaml)? The job ended very quickly, looks like a validate error.
This is the job output: https://lava.automotivelinux.org/scheduler/job/332
It also looks like both master and slave went offline at the same time. Can you confirm that master and slave are both running in the UTC timezone and that both have ntp installed?
Yes, I confirm.