On Tue, 18 Dec 2018 at 12:45, Tim Jaacks tim.jaacks@garz-fricke.com wrote:
Hello everyone,
I have written a backup script for my LAVA instance. While testing the restore process I stumbled upon issues. Are there any dependencies between the master and workers concerning backups? When the master crashes, but the worker does not, is it safe to restore the master only and keep the worker as it is? Or do I have to keep master and worker backups in sync and always restore both at the same time?
ZMQ buffers messages for a while (exactly how long depends on message volumes). However, if the buffer does fill up, simply restarting the service is enough.
So it is safe to start lava-master and lava-slave in either order; if there is a long gap between them, the lava-slave service might need to be restarted once the master is up.
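To illustrate why the start order is forgiving, here is a plain pyzmq sketch (not LAVA code, and the port is only an example): a socket can connect and send before its peer has bound, and the message waits in the local outgoing queue until the peer appears.

    # Plain pyzmq sketch (not LAVA code; the port is only an example).
    # A socket can connect and send before its peer binds; the message is
    # held in the local outgoing queue until the peer appears.
    import time
    import zmq

    ctx = zmq.Context()

    # "worker" side: connect and send while nothing is listening yet
    worker = ctx.socket(zmq.DEALER)
    worker.connect("tcp://127.0.0.1:5999")
    worker.send_multipart([b"HELLO"])      # queued locally, no peer bound yet

    time.sleep(1)                          # the "master" starts late

    # "master" side: bind afterwards and still receive the queued message
    master = ctx.socket(zmq.ROUTER)
    master.bind("tcp://127.0.0.1:5999")
    routing_id, msg = master.recv_multipart()
    print(msg)                             # b'HELLO'

If the master stays down longer than the queues can cover, restarting lava-slave once the master is back is all that is needed.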
Restoring my master as described in the LAVA docs generally works. The web interface is back online, all the jobs and devices are in consistent states.
Restoring the worker is relatively easy, according to the docs. I installed the LAVA packages in their previous versions on a fresh (virtual) machine, restored /etc/lava-dispatcher/lava-slave and /etc/lava-coordinator/lava-coordinator.conf. The worker has status "online" in the LAVA web interface afterwards, so the communication seems to work.
However, starting a multinode job does not work. The job log says:
Check that the lava-coordinator is running (wherever you installed it) and that it is configured on the worker. The coordinator can support multiple instances, but admins often install lava-coordinator alongside the lava-server package on the master and configure the workers to use the coordinator on the relevant master.
https://master.lavasoftware.org/static/docs/v2/first-installation.html#index...
https://master.lavasoftware.org/static/docs/v2/simple-admin.html#checking-fo...
Not every instance uses MultiNode, so this is an extra part of the backup/restore process for your lab.
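As a quick check from the worker, you can verify that the coordinator port is reachable. This is just a minimal Python sketch; the hostname and port are taken from your job log below (lava-server-vm:3079) and may differ in your lab, so adjust them to whatever /etc/lava-coordinator/lava-coordinator.conf says.

    # Minimal reachability check from the worker to the coordinator.
    # Host/port are taken from the job log below (lava-server-vm:3079);
    # adjust to match your lava-coordinator configuration.
    import socket

    COORDINATOR_HOST = "lava-server-vm"
    COORDINATOR_PORT = 3079

    try:
        with socket.create_connection((COORDINATOR_HOST, COORDINATOR_PORT), timeout=5):
            print("coordinator port is reachable")
    except OSError as exc:
        print(f"cannot reach coordinator: {exc}")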
lava-dispatcher, installed at version: 2018.5.post1-2~bpo9+1
start: 0 validate
Start time: 2018-12-18 12:25:14.335215+00:00 (UTC)
This MultiNode test job contains top level actions, in order, of: deploy, boot, test, finalize
lxc, installed at version: 1:2.0.7-2+deb9u2
validate duration: 0.01
case: validate
case_id: 112
definition: lava
result: pass
Initialising group b6eb846d-689f-40c5-b193-8afce41883ee
Connecting to LAVA Coordinator on lava-server-vm:3079 timeout=90 seconds.
This comes out in a loop, until the job times out.
The lava-slave logfile says:
2018-12-18 12:27:15,114 INFO master => START(12)
2018-12-18 12:27:15,117 INFO [12] Starting job [...]
2018-12-18 12:27:15,124 DEBUG [12] dispatch:
2018-12-18 12:27:15,124 DEBUG [12] env : {'overrides': {'LC_ALL': 'C.UTF-8', 'LANG': 'C', 'PATH': '/usr/local/bin:/usr/local/sbin:/bin:/usr/bin:/usr/sbin:/sbin'}, 'purge': True}
2018-12-18 12:27:15,124 DEBUG [12] env-dut :
2018-12-18 12:27:15,129 ERROR [EXIT] 'NoneType' object has no attribute 'send_start_ok'
2018-12-18 12:27:15,129 ERROR 'NoneType' object has no attribute 'send_start_ok'
It is the "job = jobs.create()" call in lava-slave's handle_start() routine which fails. Obviously there is a separate database on the worker (of which I did not know until now), which fails to be filled with values. Does this database have to be backup'ed and restored? What is the purpose of this database? Is there anything I need to know about it concerning backups?
The SQLite database on the worker is just to retain state so that the lava-slave service can be restarted without affecting running test jobs.
It should not be restored - the previous state of the worker needs to be cleared when doing a restore.
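If it helps, here is a rough sketch of what clearing the previous state can look like during a restore. The database path below is an assumption on my side (hypothetical); locate the actual SQLite file your lava-slave packaging uses before running anything like this.

    # Rough sketch only: stop the service, drop the stale worker state,
    # then start again.  The path below is hypothetical -- locate the
    # actual SQLite file your lava-slave packaging uses.
    import pathlib
    import subprocess

    STATE_DB = pathlib.Path("/var/lib/lava/dispatcher/slave/db.sqlite3")  # hypothetical path

    subprocess.run(["systemctl", "stop", "lava-slave"], check=True)
    if STATE_DB.exists():
        STATE_DB.unlink()   # previous worker state must not survive the restore
    subprocess.run(["systemctl", "start", "lava-slave"], check=True)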