Hi folks,

The 2020.09 tag has been pushed to master on git.lavasoftware.org.
.deb packages have been built in GitLab CI and are published at

  https://apt.lavasoftware.org/release

Docker images for amd64 and arm64 have been built in GitLab CI and
are available from

  https://hub.lavasoftware.org/

and

  https://hub.docker.com/u/lavasoftware

Changes in this release
=================

# Upgrading

This release brings a big architectural change: ZMQ has been replaced by HTTP(s) for server-worker communication.
Admins will have to update the configuration of each worker.

## From ZMQ to HTTP(s)

The protocol that LAVA uses to communicate between the server and the workers has been changed from ZMQ to HTTP(s).
This will improve performance and reliability, but admins will have to update their configuration after the upgrade.

## Database migrations

This release includes two migrations:
* lava_results_app.0017_testdata_onetoone_field: drop the bug link
* lava_scheduler_app.0053_testjob_and_worker_token
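The migrations are normally applied automatically during the package upgrade. To apply them by hand, a sketch assuming a default Debian installation (`lava-server manage` wraps Django's `manage.py`):

```shell
# Apply any pending database migrations manually; this is normally
# done by the package upgrade itself.
sudo lava-server manage migrate
```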

# Device-types

## New device-types

New supported devices:
* imx8mn-evk

## Juno-r2 and tee

Fixed a bug in LAVA 2020.08 that prevented the use of juno-r2 boards. This was a regression from 2020.07, introduced by the support for `tee` in u-boot jobs.

## SoCA9

Update the dtb address from `0x00000100` to `0x00001000` to prevent some issues with u-boot 2020.07.

# From ZMQ to HTTP(s)

In prior versions, the LAVA daemons were using ZMQ to communicate and to send logs. In this release, LAVA uses plain HTTP(s) to control the remote workers and to send job logs.

## Reasons

### Load-balancing and fault-tolerance

The previous architecture was not able to cope with a large number of jobs running in parallel, mainly because it was impossible to load-balance the traffic across multiple instances of `lava-logs` and `lava-master`.

By using HTTP(s), it is much easier to load-balance the traffic across multiple instances of `lava-server-gunicorn`.

With load-balancing we can also increase fault-tolerance and move toward zero-downtime upgrades.

### Master and scheduling

In the previous design, `lava-master` was both the master and the job scheduler. This introduced latency when starting jobs while many jobs were running in parallel.

With the new design, `lava-scheduler` runs in the background, scheduling jobs while `lava-server-gunicorn` serves both clients and workers.

### Proxies

Using HTTP(s) will also ease the adoption of remote workers: connecting to non-standard ports is often impossible in corporate environments.

### Job termination

With the previous design, `lava-logs` and `lava-master` were both responsible for terminating a job. This sometimes led to a deadlock where a job waited forever for its termination.

This is no longer possible, as `lava-server-gunicorn` is now responsible for both the logs and the job termination.

### Simplifying the architecture

By using HTTP(s) instead of ZMQ, we are able to decrease the number of services running on the server. We are planning to also drop the need for `lava-coordinator` in the future.

Using HTTP(s) also decreases the number of network ports that the server has to listen on. This simplifies deployment and helps with hosting many instances on the same physical server.

## Services

The following services have been dropped:
* `lava-logs`: the logs are sent directly to `lava-server-gunicorn`
* `lava-master`: the workers are pulling jobs from `lava-server-gunicorn`

This release introduces a new service called `lava-scheduler` that is solely responsible for scheduling jobs.

In this release, `lava-slave` has been rewritten from scratch and renamed `lava-worker`.

## Version mismatch

In previous LAVA versions, `lava-master` was not checking the `lava-slave` version. This sometimes led to strange behavior when the server was upgraded but the dispatcher was not.

`lava-server-gunicorn` now checks the `lava-worker` version every time the service requests jobs to run.
In the event of a version mismatch, the server will put the worker offline, refusing to start jobs on it.

When it's safe to stop the worker (the worker is done with the current set of jobs), the server will return a specific error. If you use the new [LAVA docker worker](#lava-docker-worker), `lava-worker` will be automatically upgraded to the server version whenever needed.

## Upgrading

After the upgrade, every worker will be inactive as the `lava-worker` services won't be able to connect to `lava-server-gunicorn`.

For each worker, admins will have to update the configuration.

* Update the `URL` variable in the worker configuration (`/etc/lava-dispatcher/lava-worker`). This is the full URL to the server.
* Add the worker token to `/var/lib/lava/dispatcher/worker/token`. Admins can find the token in the worker admin page at [http://INSTANCE/admin/lava_scheduler_app/worker/WORKER_NAME/change/](http://INSTANCE/admin/lava_scheduler_app/worker/WORKER_NAME/change/).
* Restart `lava-worker`.
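On a Debian worker, the steps above could look like this (a sketch; `lava.example.com` and the token value are placeholders to replace with your own instance URL and worker token):

```shell
# 1. Point the worker at the server; the URL must be the full server URL.
sudo sed -i 's|^URL=.*|URL="https://lava.example.com/"|' /etc/lava-dispatcher/lava-worker

# 2. Install the worker token copied from the server admin page.
echo "0123456789abcdef" | sudo tee /var/lib/lava/dispatcher/worker/token

# 3. Restart the service so it reconnects with the new configuration.
sudo systemctl restart lava-worker
```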

# LAVA docker worker

This release introduces a program called `lava-docker-worker` that runs a LAVA worker inside a Docker container. This script is provided by the `lava-dispatcher-host` package and has the following features:

* Takes the same parameters as the regular `lava-worker`.
* Detects the LAVA version of the server and runs the worker from that same LAVA version.
* Automatically upgrades the worker when the server is upgraded.
* Docker containers started by it are its siblings, not its children, i.e. they run directly on the host system.

This worker in Docker should support most use cases that are supported by the regular LAVA worker, except running LXC containers.
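Since it takes the same parameters as the regular worker, starting it could look like the following (a sketch; the URL, worker name, and token path are placeholder values, and the exact options should be checked with `lava-docker-worker --help`):

```shell
# Start a dockerized worker against an instance; the server's LAVA
# version is detected and the matching worker image is used.
lava-docker-worker --url https://lava.example.com/ \
                   --name docker-worker-01 \
                   --token-file /var/lib/lava/dispatcher/worker/token
```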

It's important to note that the container started by `lava-docker-worker` runs in privileged mode and with host networking, which means that it is less isolated from the host system than you would usually expect application containers to be:

- it has access to **all** devices under `/dev`.
- it uses the same networking stack as the host system.

You should therefore consider `lava-docker-worker` as a distribution facilitator, not as an isolation mechanism. You should not run `lava-docker-worker` on a host where you wouldn't run the regular LAVA worker.

# Bug link

The possibility to link a bug to a specific test job or result has been dropped. This feature was generating a huge load on the database server without real benefit.

# Tests from tar

Starting from this release, LAVA can pull tests from a tar archive instead of a git repository.

The job definition will look like this:

```yaml
- test:
   name: basic-linux-smoke
   timeout:
     minutes: 10
   definitions:
     - repository: https://github.com/Linaro/test-definitions/archive/2019.03.tar.gz
       from: url
       path: automated/linux/smoke/smoke.yaml
       name: linux-smoke
       compression: gz
```

# LAVA job id

LAVA is now exporting the job id to the lava test shell environment. The variable is called `LAVA_JOB_ID` and can be used with

```shell
echo "$LAVA_JOB_ID"
```
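Since the same test script may also run outside LAVA, it can be safer to treat the variable as optional. A small sketch (the log path is only an example):

```shell
# LAVA exports LAVA_JOB_ID in the test shell; fall back to "0" when the
# script runs outside of a LAVA job.
job_id="${LAVA_JOB_ID:-0}"
log="/tmp/smoke-${job_id}.log"
# Record the kernel version under a per-job log file.
uname -a > "$log"
echo "logs for job ${job_id} in ${log}"
```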

We plan to export more LAVA data as environment variables in the future.


Thanks

--
Rémi Duraffort
LAVA Architect
Linaro