[Lava-users] Dependencies between master and worker when restoring from backups

19 Dec 2018


Hello everyone,
I have written a backup script for my LAVA instance. While testing the restore process I ran into some issues. Are there any dependencies between the master and the workers concerning backups? If the master crashes but the worker does not, is it safe to restore only the master and keep the worker as it is? Or do I have to keep master and worker backups in sync and always restore both at the same time?
Restoring my master as described in the LAVA docs generally works. The web interface is back online, and all jobs and devices are in a consistent state.
Restoring the worker is relatively easy, according to the docs. I installed the LAVA packages in their previous versions on a fresh (virtual) machine and restored /etc/lava-dispatcher/lava-slave and /etc/lava-coordinator/lava-coordinator.conf. Afterwards the worker has the status "online" in the LAVA web interface, so the communication seems to work.
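For illustration, backing up and restoring the two worker files mentioned above could be sketched in Python as follows. This is only a minimal sketch, not the actual backup script: the helper names backup_files and restore_files and the root parameter are hypothetical, and the only paths assumed are the two named in this post.

```python
import tarfile
from pathlib import Path

# The two worker files named in this post; adjust if your setup differs.
WORKER_CONFIG_FILES = [
    "/etc/lava-dispatcher/lava-slave",
    "/etc/lava-coordinator/lava-coordinator.conf",
]


def backup_files(paths, archive, root="/"):
    """Archive those of `paths` that exist below `root`.

    Returns the list of relative names that were stored.
    (Hypothetical helper, not part of LAVA.)
    """
    root = Path(root)
    stored = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            rel = path.lstrip("/")
            src = root / rel
            if src.is_file():
                tar.add(str(src), arcname=rel)
                stored.append(rel)
    return stored


def restore_files(archive, root="/"):
    """Write every regular file from `archive` below `root`.

    Only file contents are restored; ownership and permissions are not.
    Intended for archives created by backup_files(), so member names
    are trusted. (Hypothetical helper, not part of LAVA.)
    """
    root = Path(root)
    restored = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            target = root / member.name
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            restored.append(member.name)
    return restored
```

On a real worker this would be called with the default root="/" and sufficient privileges; the root parameter exists only so that the sketch can be tried out in a scratch directory first.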
However, starting a multinode job does not work. The job log says:
    lava-dispatcher, installed at version: 2018.5.post1-2~bpo9+1
    start: 0 validate
    Start time: 2018-12-18 12:25:14.335215+00:00 (UTC)
    This MultiNode test job contains top level actions, in order, of: deploy, boot, test, finalize
    lxc, installed at version: 1:2.0.7-2+deb9u2
    validate duration: 0.01
    case: validate
    case_id: 112
    definition: lava
    result: pass
    Initialising group b6eb846d-689f-40c5-b193-8afce41883ee
    Connecting to LAVA Coordinator on lava-server-vm:3079 timeout=90 seconds.
The last message repeats in a loop until the job times out.
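To narrow this down, one can check from the worker whether the coordinator port is reachable at all. The following is a minimal sketch using only the Python standard library; the function name can_connect is hypothetical, and the host and port in the usage comment are the ones from the job log above. A successful connection only shows that something is listening on that port, not that the coordinator itself is working correctly.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened
    within `timeout` seconds, False otherwise.
    (Hypothetical helper, not part of LAVA.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Example call on the worker, with host and port taken from the job log:
#   can_connect("lava-server-vm", 3079)
```

If this returns False, the problem is more likely on the network or service side (coordinator not running, firewall, host name not resolvable) than in the restored configuration files.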
The lava-slave logfile says:
    2018-12-18 12:27:15,114    INFO master => START(12)
    2018-12-18 12:27:15,117    INFO [12] Starting job
    [...]
    2018-12-18 12:27:15,124   DEBUG [12] dispatch:
    2018-12-18 12:27:15,124   DEBUG [12] env     : {'overrides': {'LC_ALL': 'C.UTF-8', 'LANG': 'C', 'PATH': '/usr/local/bin:/usr/local/sbin:/bin:/usr/bin:/usr/sbin:/sbin'}, 'purge': True}
    2018-12-18 12:27:15,124   DEBUG [12] env-dut :
    2018-12-18 12:27:15,129   ERROR [EXIT] 'NoneType' object has no attribute 'send_start_ok'
    2018-12-18 12:27:15,129   ERROR 'NoneType' object has no attribute 'send_start_ok'
It is the "job = jobs.create()" call in lava-slave's handle_start() routine that fails. Apparently there is a separate database on the worker (which I did not know about until now), and it cannot be filled with values. Does this database have to be backed up and restored? What is the purpose of this database? Is there anything I need to know about it concerning backups?
Best regards
Tim Jaacks
DEVELOPMENT ENGINEER
Garz & Fricke GmbH
Tempowerkring 2
21079 Hamburg
Direct: +49 40 791 899 - 55
Fax: +49 40 791899 - 39
tim.jaacks@garz-fricke.com
www.garz-fricke.com
WE MAKE IT YOURS!
Registered office: D-21079 Hamburg
Court of registration: Amtsgericht Hamburg, HRB 60514
Managing directors: Matthias Fricke, Manfred Garz, Marc-Michael Braun
