On 04/12/2018 15:51, Neil Williams wrote:
> On Tue, 4 Dec 2018 at 14:31, Andrejs Cainikovs
> <Andrejs.Cainikovs@netmodule.com> wrote:
>>
>> On 04/12/2018 14:41, Neil Williams wrote:
>>> On Tue, 4 Dec 2018 at 13:31, Andrejs Cainikovs
>>> <Andrejs.Cainikovs@netmodule.com> wrote:
>>>>
>>>> Hi Neil,
>>>>
>>>> Apart from inlined answers below, job configuration and full log is
>>>> attached.
>>>>
>>>> On 04/12/2018 11:02, Neil Williams wrote:
>>>>> On Tue, 4 Dec 2018 at 09:37, Andrejs Cainikovs
>>>>> <Andrejs.Cainikovs@netmodule.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> TWIMC: I got it working. However, I have noticed that last command in
>>>>>> command sequence (at least for U-Boot) never waits for command prompt,
>>>>>> so I had to update base-uboot.jinja2 to get past this issue.
>>>>>
>>>>> Compare with : https://staging.validation.linaro.org/scheduler/job/245806
>
> See https://staging.validation.linaro.org/scheduler/job/245785#L314
>
> Sending with 10 millisecond of delay
>
> https://staging.validation.linaro.org/scheduler/device/staging-black01/devicedict#defline4
>
> https://git.lavasoftware.org/lava/lava/blob/master/lava_scheduler_app/tests/device-types/beaglebone-black.jinja2#L6
>
> Rather than using sleep, it might just be that automation is firing
> the characters at the serial connection faster than a human can type
> and therefore revealing a bug in the hardware which cannot cope.
> Setting a character delay allows LAVA to wait that many milliseconds
> between each character, replicating what happens with a human.
>
> In your device-type jinja2, you can try:
> {% set boot_character_delay = 10 %}

I have played with char delay quite a lot, and it gave me no result.

Also, I wrote that sleep required by environment, and it has nothing to
do with serial connection or anything in this discussion. I will expand
it: negotiation between or Ethernet switch and our hardware takes few
seconds. I need that sleep or dhcp will fail because of timeout.

>>>>>
>>>>> The last command issued to U-Boot should be boot and that is not
>>>>> expected to ever get to a U-Boot prompt. Yet LAVA does check for a
>>>>> U-Boot prompt occurring after that command, along with other failure
>>>>> indicators like Must RESET board to recover
>>>>
>>>> You are, of course, right. I was not clear enough, and by last command I
>>>> meant something else, which we can skip now. Obviously, my aim is to
>>>> boot the DUT past the U-Boot into Linux, and then run some tests.
>>>>
>>>>> Please do more investigation on your end to work out if the problem is
>>>>> correctly fixed.
>>>>>
>>>>>> Now, when I'm looking at this patch which I prepared yesterday - I'm not
>>>>>> sure whether it was indeed issue with last command in a sequence, or
>>>>>> with `dhcp` command processing (there must be a reason why I moved
>>>>>> `setenv serverip` to top, but I'm lazy to check. Just giving it back to
>>>>>> community as it is in case someone will face something similar.
>>>>>
>>>>> Please take the time to check. When vcu.jinja2 is properly debugged, a
>>>>> contribution to upstream (with supporting unit test) would be very
>>>>> welcome.
>>>>
>>>> I have put some extra effort into debugging, and got working scenario
>>>> without changing base-uboot.jinja2. I did this by setting
>>>> `uboot_ipaddr_cmd` in device type template:
>>>>
>>>> {% set uboot_ipaddr_cmd = 'sleep 5; setenv serverip {SERVER_IP}; dhcp' %}
>>>
>>> I think this might be part of the problem. The U-Boot shell will not
>>> emit a prompt between commands if those commands are sent as  single
>>> line separated by a semi-colon. These should be separate lines in the
>>> U-boot commands. You shouldn't need the sleep.
>>
>> I need that sleep due to environment specifics, it's not a part of a
>> workaround here.
>
> In which case, use Jinja2 comments to clearly explain what is going on
> here or someone will later "optimise" it.
>
> {# the sleep is needed here because .... #}
>
> Be verbose in that comment, possibly even link to a URL where the full
> reasoning is explained.

Ditto.

>> I have tried just now to move the sleep into relay
>> control script, removing `uboot_ipaddr_cmd` from device type config at
>> all, and I hit the original problem again: `setenv serverip` is ignored
>> by U-Boot because it was sent while getting DHCP address without waiting
>> for command prompt.
>
> I fear that the sleep is causing this to work "by mistake" - i.e.
> accidentally - and that the core problem of this DUT has not been
> resolved. Sometime, somewhere, someone else is going to come across
> this and find intermittent failures because the sleep didn't turn out
> to be long enough.
>
>  From job_94.log
> - {"dt": "2018-12-04T14:21:35.918610", "lvl": "target", "msg": "=>
> dhcpsetenv serverip 172.17.1.189"}
>
> That is the core problem. The DUT hasn't processed the single command
> as a list of commands, it's concatenated.

What it was concatenated in the first place? If there would be a correct
prompt processing I would not see this issue at all, right?

>> Log attached.
>>
>>>> However, I still realize this is a hack, and looking into the log I
>>>> might be catching something:
>>>>
>>>> - {"dt": "2018-12-04T12:47:23.731495", "lvl": "debug", "msg":
>>>> "bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must
>>>> RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The
>>>> remote end did not respond in time.', 'Retry time exceeded'] (timeout
>>>> 00:04:56)"}
>>>> - {"dt": "2018-12-04T12:47:23.746562", "lvl": "input", "msg": "sleep 5;
>>>> setenv serverip 172.17.1.189; dhcp\n"}
>>>> - {"dt": "2018-12-04T12:47:23.852418", "lvl": "target", "msg": "sleep 5;
>>>> setenv serverip 172.17.1.189; dhcp"}
>>>> - {"dt": "2018-12-04T12:47:23.857183", "lvl": "debug", "msg":
>>>> "bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must
>>>> RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The
>>>> remote end did not respond in time.', 'Retry time exceeded'] (timeout
>>>> 00:04:56)"}
>>>> - {"dt": "2018-12-04T12:47:23.862239", "lvl": "target", "msg": "setenv
>>>> autoload no"}
>>>> - {"dt": "2018-12-04T12:47:23.866570", "lvl": "input", "msg": "setenv
>>>> serverip 172.17.1.189\n"}
>>>>
>>>> It looks like prompt waiting is skipped in this particular block! Why?
>>>>
>>>>> https://master.lavasoftware.org/static/docs/v2/contribution-intro.html
>>>>>
>>>>>> Also, attaching updated device type config.
>>>>>>
>>>>>> $ diff base-uboot.jinja2 base-uboot.jinja2.backup
>>>>>>
>>>>>> 19,21c19,20
>>>>>> < "          - setenv serverip {SERVER_IP}
>>>>>> <           - " + uboot_ipaddr_cmd|default('dhcp') + "
>>>>>> <           - sleep 1" ) -%}
>>>>>> ---
>>>>>>     > "          - " + uboot_ipaddr_cmd|default('dhcp') + "
>>>>>>     >           - setenv serverip {SERVER_IP}" ) -%}
>>>>>
>>>>> This doesn't need a change to base-uboot.jinja2 - this can be done in
>>>>> the device-type which extends base-uboot.jinja2, supported by a unit
>>>>> test.
>>>>
>>>> No, that's not true. `base_uboot_dhcp_command` variable from
>>>> base-uboot.jinja2 is not overrideable by device type templates. Or did I
>>>> missed something?
>>>
>>> OK, the base-boot.jinja2 could do with override support to allow for that.
>>>
>>> I'll have a look at your test job log file and see how to both remove
>>> the sleep and provide the override.
>
> For the override, it's simply a case of defining a variable and using
> the existing string as a default:
>
> {% set base_uboot_dhcp_command = uboot_dhcp_command|default(
> "          - " + uboot_ipaddr_cmd|default('dhcp') +"
>            - setenv serverip {SERVER_IP}" ) -%}
>
> Then your device-type jinja2 could set uboot_dhcp_command

In latest official release (2018.11-1~bpo9+1) it's:

{% set base_uboot_dhcp_command = (
"          - " + uboot_ipaddr_cmd|default('dhcp') + "
           - setenv serverip {SERVER_IP}" ) -%}

Which does not allow me to override `base_uboot_dhcp_command` from
device type config.

> When it does, it should set a list of commands not a single command
> using semicolons. Compare with x15.jinja2

Thanks, but we're not using fastboot, we're booting via tftp from
Jenkins. If I strip all fastboot related configuration from x15.jinja2,
it will be basically same configuration as I have now.

>>>
>>>>>> Best regards,
>>>>>> Andrejs Cainikovs.
>>>>>>
>>>>>> On 29/11/2018 16:38, Andrejs Cainikovs wrote:
>>>>>>> Hi Neil,
>>>>>>>
>>>>>>> On 29/11/2018 15:58, Neil Williams wrote:
>>>>>>>     > On Thu, 29 Nov 2018 at 14:42, Andrejs Cainikovs
>>>>>>>     > <Andrejs.Cainikovs@netmodule.com> wrote:
>>>>>>>     >> Hi everyone,
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> I’m facing an issue when U-Boot commands are sent one after another
>>>>>>> without waiting for a prompt. Obviously, device is not able to boot.
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> Excerpt from logs:
>>>>>>>     > (Much better to attach the complete log as downloaded from the menu on
>>>>>>>     > the job detail page, it includes timestamps and other useful content.)
>>>>>>>     > I'm afraid the content you've pasted into your original doesn't help.
>>>>>>>     > Please attach the complete test job log file, don't paste it into the
>>>>>>>     > body of the email where it will be mangled.
>>>>>>>
>>>>>>> Complete log attached.
>>>>>>>
>>>>>>>     > What version of lava-server and lava-dispatcher are you running?
>>>>>>>
>>>>>>> Latest from stretch-backports: 2018.11-1~bpo9+1
>>>>>>>
>>>>>>>     >
>>>>>>>     > One thing would be to obtain a U-Boot device which is already
>>>>>>>     > available on staging.validation.linaro.org or another instance and
>>>>>>>     > known to be working, then borrow the config from that. If you have
>>>>>>>     > something like that available already, it'll make things easier.
>>>>>>>
>>>>>>> Thanks for a hint, I'll have a look into this.
>>>>>>>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> As you see, all commands that were sent between `dhcp` command and
>>>>>>> until it was able to complete or fail are simply dropped.
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> Device type config:
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% extends 'base-uboot.jinja2' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% set device_type = "vcu" %}
>>>>>>>     > This line isn't required and a couple of other lines are the defaults.
>>>>>>>     >
>>>>>>>     >>
>>>>>>>     >> {% set console_device = 'ttyS2' %}
>>>>>>>     >>
>>>>>>>     >> {% set baud_rate = 115200 %}
>>>>>>>     >>
>>>>>>>     >> {% set interrupt_prompt = 'Press s to abort autoboot' %}
>>>>>>>     >>
>>>>>>>     >> {% set interrupt_char = 's' %}
>>>>>>>     >>
>>>>>>>     >> {% set bootloader_prompt = '=>' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% set uboot_mkimage_arch = 'arm' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% set bootz_kernel_addr = '0x80000000' %}
>>>>>>>     >>
>>>>>>>     >> {% set bootz_ramdisk_addr = '0x83000000' %}
>>>>>>>     >>
>>>>>>>     >> {% set bootz_dtb_addr = '0x82000000' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% set extra_kernel_args = 'ti_cpsw.rx_packet_max=1526' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% set kernel_start_message = 'Welcome to' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> Device config:
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% extends 'vcu.jinja2' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> {% set connection_command = 'telnet lava-disp-1.local 7000' %}
>>>>>>>     >>
>>>>>>>     >> {% set power_on_command = 'relay-ctrl --relay 1 --state on' %}
>>>>>>>     >>
>>>>>>>     >> {% set power_off_command = 'relay-ctrl --relay 1 --state off' %}
>>>>>>>     >>
>>>>>>>     >> {% set hard_reset_command = 'relay-ctrl --relay 1 --toggle' %}
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> Boot action block from job definition:
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> - boot:
>>>>>>>     >>
>>>>>>>     >>      timeout:
>>>>>>>     >>
>>>>>>>     >>        minutes: 5
>>>>>>>     >>
>>>>>>>     >>      method: u-boot
>>>>>>>     >>
>>>>>>>     >>      commands: ramdisk
>>>>>>>     >>
>>>>>>>     >>      auto_login:
>>>>>>>     >>
>>>>>>>     >>        login_prompt: 'am335x-nmhw21 login: '
>>>>>>>     >>
>>>>>>>     >>        username: root
>>>>>>>     >>
>>>>>>>     >>      prompts:
>>>>>>>     >>
>>>>>>>     >>      - 'fct@am335x-nmhw21:~# '
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >>
>>>>>>>     >> Have I misconfigured something? What I’m missing? Thanks!
>>>>>>>     >
>>>>>>>     >
>>>>>>>     >
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>