Hello everyone,
I am trying to implement a test scenario in which I need to verify the partition scheme and environment variables after a reboot triggered by the watchdog I am using. So basically I need to be able to log in to the image successfully after the watchdog-triggered reboot.
It feels like the link between the test job and the Linux image is lost after the watchdog reboot. None of the actions placed after that reboot (boot or test actions) work.
Below is the template of my Job definition:
job details...
device_type: qemu
#### (timeouts)
### (priority)
## (notify, context etc.)

actions:
- deploy:
    images: ###
    firmware: ###
- boot:
    auto_login:
      login_prompt: "###"
      username: "##"
      password_prompt: "###"
      password: "##"
- test:
    definitions: ###
    repository: ####
    metadata: ###
    run:
      steps:
      - swupdate -i ####
      - reboot
-------------------------------------------------
At this stage I introduced a service which causes a kernel panic during the reboot (the one triggered by the explicit reboot command in the job above). After 'X' seconds of watchdog timeout, a second reboot is triggered by the watchdog. During that watchdog reboot the kernel booted successfully and all the services were set up fine, but the test stopped at the login stage.
I think it did not get the "login_prompt" and "password_prompt" from the boot action I wrote after the above test action (i.e. after the reboot done by the watchdog):
- boot:
    auto_login:
      login_prompt: "###"
      username: "##"
      password_prompt: "###"
      password: "##"
So is there a way to add boot and test actions such that the test job can continue after a reboot done by a watchdog?
Note: When I explicitly put "reboot" in the steps section of the test action, the link did not break, and I was able to reboot, log in and run the test action steps successfully as many times as I wanted.
This query is specifically about cases in which the reboot happens outside the test writer's scope (i.e. a reboot triggered by a watchdog).
On Fri, Dec 8, 2023 at 7:19 AM sai.sathujoda@toshiba-tsip.com wrote:
> So is there a way to add boot and test actions such that the test job can continue after a reboot done by a watchdog?
LAVA doesn't like the board being rebooted outside of its control. What you're trying to do won't work. The closest I can think of would be something like this:
actions:
- deploy
- boot
- test
    ...
    steps:
    - # introduce kernel crash service (I presume it's systemd or similar)
    - poweroff  # make sure the board doesn't start booting
- deploy (can be dummy, just download something)
- boot
    auto_login
    ...
    ignore_kernel_messages: true
If your board runs into the kernel panic only once (as a test) and boots normally after the watchdog reboot, you should be fine.
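As a rough illustration of the outline above, the test step and the follow-up boot might be sketched as below. This is an untested sketch under assumptions: the unit name `panic-on-boot.service` and the test name are hypothetical, the `method`/`media` values are assumed from the `device_type: qemu` in the question, and the dummy deploy is left as a comment because its contents depend on the device.

```yaml
actions:
# ... initial deploy and boot actions as in the original job ...
- test:
    definitions:
    - from: inline
      name: arm-kernel-panic            # hypothetical name
      path: inline/arm-kernel-panic.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: arm-kernel-panic
        run:
          steps:
          # Hypothetical unit: enable whatever service causes the panic.
          - systemctl enable panic-on-boot.service
          # Power off rather than reboot, so the next boot is started by LAVA.
          - poweroff
# - deploy: ...  (dummy deploy goes here; contents depend on the device)
- boot:
    method: qemu
    media: tmpfs
    prompts: ["###"]
    auto_login:
      login_prompt: "###"
      username: "##"
      password_prompt: "###"
      password: "##"
    # Do not fail the boot on the expected kernel panic messages.
    ignore_kernel_messages: true
```

The point of the design is that every boot the job depends on is started by a LAVA "boot" action, so LAVA knows when to expect the login prompt.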
> Note: When I explicitly put "reboot" in the steps section of the test action, the link did not break, and I was able to reboot, log in and run the test action steps successfully as many times as I wanted.
This is interesting. I'd like to see the job definition for this.
> This query is specifically about cases in which the reboot happens outside the test writer's scope (i.e. a reboot triggered by a watchdog).
Don't reboot in your tests. LAVA doesn't support it.
Best Regards, Milosz
lava-users mailing list -- lava-users@lists.lavasoftware.org
To unsubscribe send an email to lava-users-leave@lists.lavasoftware.org
Hi Milosz,
Thanks for your reply.
Let me explain the situation in detail.
I wanted to implement software update testing in LAVA. Below is the job definition for this.
---------------------------------
device_type: qemu
job_name: qemu x86_64 software update testing
timeouts:
  job:
    minutes: 20
  action:
    minutes: 10
  actions:
    power-off:
      seconds: 60
priority: high
visibility: personal
notify:
  criteria:
    status: finished
  recipients:
  - to:
      method: email
      email: sai.sathujoda@toshiba-tsip.com
context:
  arch: x86_64
  lava_test_dir: '/home/lava-%s'
# ACTION BLOCK
actions:
- deploy:
    timeout:
      minutes: 15
    to: tmpfs
    images:
      system:
        image_arg: '####'
        url: ####
        compression: xz
      firmware:
        image_arg: '****'
        url: ****
# BOOT BLOCK
- boot:
    timeout:
      minutes: 5
    method: qemu
    media: tmpfs
    prompts: ["####@demo:~#"]
    auto_login:
      login_prompt: "demo login:"
      username: "####"
      password_prompt: "Password:"
      password: "####"
# TEST_BLOCK
- test:
    timeout:
      minutes: 5
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: sample-test
          description: "check reboot version"
        run:
          steps:
          - lava-test-case uname --shell uname -a
          - cd /home
          - wget --no-check-certificate -q swupdate_artifact
          - lsblk
          - swupdate -i swupdate_artifact
          - reboot
      from: inline
      name: sample-test-1
      path: inline/sample-test.yaml
- boot:
    timeout:
      minutes: 5
    method: qemu
    media: tmpfs
    prompts: ["####@demo:"]
    auto_login:
      login_prompt: "demo login:"
      username: "####"
      password_prompt: "Password:"
      password: "####"
- test:
    timeout:
      minutes: 5
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: sample-test
          description: "check partition switch"
        run:
          steps:
          - uname -a
          - lsblk
          - reboot
      from: inline
      name: sample-test-2
      path: inline/sample-test.yaml

context:
  arch: x86_64
  lava_test_results_dir: '/home/lava-%s'
-------------------------------
If the reboot after a successful swupdate does not cause any kernel panic, then the reboot works fine: I can see the persistent changes in the root file system, and no matter how many reboots I explicitly trigger in the job definition, I do not face any issue.
But the problem is this: if the swupdate actually leads to a kernel panic during reboot, then after the 60 s watchdog timeout a reboot is triggered by the watchdog, as I mentioned earlier. Even then I found no issues in booting the kernel or setting up the services and root file system, but the auto login failed because none of the actions I have written after the watchdog reboot work.
So do you think this can be fixed on the LAVA side, or is this something LAVA is not ready to support?
Is some work going on on the LAVA development side to fix this issue? If you have no info on this, do you think asking on the lava-dev mailing list might help?
Thanks, Sai Ashrith
On Tue, Dec 12, 2023 at 2:46 AM sai.sathujoda@toshiba-tsip.com wrote:
> If the reboot after a successful swupdate does not cause any kernel panic, then the reboot works fine: I can see the persistent changes in the root file system, and no matter how many reboots I explicitly trigger in the job definition, I do not face any issue.
Do you boot without a login prompt after sw update? Job log would help understand what is happening.
> But the problem is this: if the swupdate actually leads to a kernel panic during reboot, then after the 60 s watchdog timeout a reboot is triggered by the watchdog, as I mentioned earlier. Even then I found no issues in booting the kernel or setting up the services and root file system, but the auto login failed because none of the actions I have written after the watchdog reboot work.
I wrote this before: LAVA doesn't like the board being rebooted without LAVA's "help". So if you reboot as a part of your test and it requires logging in, you need to implement it somehow in your test. I've no idea if it's even possible.
> So do you think this can be fixed on the LAVA side, or is this something LAVA is not ready to support?
It's not a bug, it's by design. LAVA doesn't allow a reboot in the middle of the test. The best you can do is to poweroff and let LAVA boot up using the "boot" action.
> Is some work going on on the LAVA development side to fix this issue? If you have no info on this, do you think asking on the lava-dev mailing list might help?
IMHO it's unlikely you'll get any other answer on lava-dev, as the crowd there is basically the same as here.
Hello,
On Tue, Dec 12, 2023 at 11:04, Milosz Wasilewski <milosz.wasilewski@foundries.io> wrote:
> I wrote this before: LAVA doesn't like the board being rebooted without LAVA's "help". So if you reboot as a part of your test and it requires logging in, you need to implement it somehow in your test. I've no idea if it's even possible.
Should be possible with an interactive test action that will wait for the boot to happen and then run some tests. Maybe a minimal boot would work too (not sure).
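As a rough illustration of the interactive idea, a test action that waits for the login prompt after the watchdog reboot and then logs in might look like the sketch below. It is untested and rests on assumptions: the action and step names are hypothetical, the prompt strings are taken from the job definition earlier in the thread, and it assumes that a script step with an empty `command` sends nothing and only waits for one of the `prompts`.

```yaml
- test:
    timeout:
      minutes: 10
    interactive:
    - name: login-after-watchdog-reboot   # hypothetical name
      prompts: ["demo login:", "Password:", "####@demo:"]
      script:
      # Assumption: an empty command sends nothing and only waits for a prompt.
      - name: wait-for-login-prompt
        command:
      # Placeholder credentials, as in the original job definition.
      - name: send-username
        command: "####"
      - name: send-password
        command: "####"
```

If this works on a given setup, a regular test action placed after it could then run the partition and environment checks from the logged-in shell.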
Hi Remi,
Thanks for your reply.
As I mentioned earlier, I was doing a prompt-based login after the reboot done by the watchdog. Since that reboot is done by the watchdog, which is outside the DUT's control, the subsequent actions written in the job definition do not work. So currently the test gets stuck waiting for the login prompt, leading to a timeout failure.
Regarding the interactive test action you mentioned: that stage comes only after I am able to log in to the image after the watchdog's reboot, right?
Thanks and regards, Sai Ashrith
Hello,
On 12/18/23 10:53, Remi Duraffort wrote:
> Should be possible with an interactive test action that will wait for the boot to happen and then run some tests. Maybe a minimal boot would work too (not sure).
A minimal boot action works with a little bit of tuning, like overwriting the soft reboot command:
- boot:
    timeout:
      minutes: 10
    method: minimal
    soft_reboot: echo c > /proc/sysrq-trigger
    prompts: ["root@demo:~#"]
    auto_login:
      login_prompt: "demo login:"
      username: "root"
      password_prompt: "Password:"
      password: "root"
    parameters:
      shutdown-message: "end Kernel panic - not syncing: sysrq triggered crash "
The full log shows how the watchdog will reboot the system.
- {"dt": "2024-01-03T09:19:15.791571", "lvl": "target", "msg": "[ 122.191970] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---"}
- {"dt": "2024-01-03T09:19:15.792074", "lvl": "debug", "msg": "end: 4.2.1 send-reboot-commands (duration 00:00:00) [common]"}
- {"dt": "2024-01-03T09:19:15.792278", "lvl": "results", "msg": {"case": "send-reboot-commands", "definition": "lava", "duration": "0.22", "extra": {"commands": ["sysctl -w kernel.panic="0" && echo c > /proc/sysrq-trigger"]}, "level": "4.2.1", "namespace": "common", "result": "pass"}}
- {"dt": "2024-01-03T09:19:15.792621", "lvl": "debug", "msg": "end: 4.2 reset-device (duration 00:00:00) [common]"}
- {"dt": "2024-01-03T09:19:15.792838", "lvl": "debug", "msg": "start: 4.3 auto-login-action (timeout 00:10:00) [common]"}
- {"dt": "2024-01-03T09:19:15.793015", "lvl": "debug", "msg": "Setting prompt string to ['Linux version [0-9]']"}
- {"dt": "2024-01-03T09:19:15.793183", "lvl": "debug", "msg": "auto-login-action: Wait for prompt ['Linux version [0-9]'] (timeout 00:10:00)"}
- {"dt": "2024-01-03T09:21:12.211543", "lvl": "target", "msg": "[2J[01;01H[=3h[2J[01;01H[2J[01;01H[=3h[2J[01;01H[2J[01;01H[=3h[2J[01;01HBdsDxe: failed to load Boot0001 \"UEFI QEMU HARDDISK QM00001 \" from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x0,0xFFFF,0x0): Not Found"}
I posted the complete job description at https://lists.lavasoftware.org/archives/list/lava-users@lists.lavasoftware.o...
You can also find the executed job at https://lava.ciplatform.org/scheduler/job/1068450.
Best regards,
Quirin
> --
> Rémi Duraffort
> Principal Tech Lead
> LAVA Tech Lead
> Automation Software Team
> Linaro