We observed failures of the msgstress0{3,4} tests because they hit the PID limit imposed by cgroups. This is usually not an issue, as at least one of the following holds:
 - the system is fast enough for the threads to complete before hitting the limit,
 - the PID limit is far higher than the number of threads started by the tests,
 - there is no cgroup restricting the PID count further,
 - the limit can be found in the cgroup user slice.
In the case of our distributions on the Morello boards, none of these is true. The tasks do not complete fast enough to avoid hitting the limit. The limit is quite low (4915 PIDs!) because of the low number of cores and systemd's default limit of 15% per service. And the root user is not in its own cgroup user slice, but rather under the serial or sshd service, depending on how the board is accessed.
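For reference, the 4915 figure is consistent with 15% of a 32768 PID ceiling. The sketch below only reproduces that arithmetic; the 32768 value is an assumption (the usual kernel default for pid_max on machines with few cores), and the helper name is made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: task limit obtained by applying a percentage to a
 * system-wide PID ceiling, rounding down.
 */
long tasks_max_from_percent(long pid_max, long percent)
{
	assert(pid_max > 0 && percent >= 0 && percent <= 100);
	return pid_max * percent / 100;
}
```

With an assumed ceiling of 32768 and 15%, this gives 4915, which matches the limit seen on the boards.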
This means that we needed a way for the test to discover this limit. In this patch, I implement it by checking /proc/self/cgroup and /proc/self/mountinfo to find the cgroup imposing the PID limit, as well as where the cgroup sysfs is mounted, if it is.
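To make the lookup more concrete, here is a minimal sketch of the idea, not the code in the patch. It assumes a cgroup v2 (unified) hierarchy only, and all function names and buffer sizes are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_LEN 256	/* hypothetical buffer size */

/* /proc/self/cgroup: the cgroup v2 entry looks like "0::/some/path". */
static int cgroup_path_from_line(const char *line, char *path, size_t len)
{
	size_t n;

	assert(line && path && len > 0);
	if (strncmp(line, "0::", 3))
		return -1;
	n = strcspn(line + 3, "\n");
	if (n >= len)
		return -1;
	memcpy(path, line + 3, n);
	path[n] = '\0';
	return 0;
}

/*
 * /proc/self/mountinfo: the filesystem type is the first field after the
 * " - " separator, and the mount point is the fifth field of the line.
 */
static int cgroup2_mount_from_line(const char *line, char *mnt, size_t len)
{
	const char *sep = strstr(line, " - "), *p = line;
	size_t n;
	int i;

	if (!sep || strncmp(sep + 3, "cgroup2 ", 8))
		return -1;
	for (i = 0; i < 4; i++) {
		p = strchr(p, ' ');
		if (!p || p >= sep)
			return -1;
		p++;
	}
	n = strcspn(p, " ");
	if (n >= len)
		return -1;
	memcpy(mnt, p, n);
	mnt[n] = '\0';
	return 0;
}

/* pids.max holds either a number or "max" (no limit, reported as -1). */
static long parse_pids_max(const char *text)
{
	char *end;
	long val;

	if (!strncmp(text, "max", 3))
		return -1;
	val = strtol(text, &end, 10);
	return (end == text || val < 0) ? -1 : val;
}

/* Returns the limit of the current cgroup, or -1 if none could be found. */
long find_cgroup_pids_max(void)
{
	char line[1024], cg[PATH_LEN] = "", mnt[PATH_LEN] = "";
	char file[2 * PATH_LEN + 16];
	long max = -1;
	FILE *f;

	f = fopen("/proc/self/cgroup", "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (!cgroup_path_from_line(line, cg, sizeof(cg)))
			break;
	fclose(f);

	f = fopen("/proc/self/mountinfo", "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (!cgroup2_mount_from_line(line, mnt, sizeof(mnt)))
			break;
	fclose(f);

	if (!cg[0] || !mnt[0])
		return -1;
	snprintf(file, sizeof(file), "%s%s/pids.max", mnt, cg);
	f = fopen(file, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		max = parse_pids_max(line);
	fclose(f);
	return max;
}
```

Like the patch, this sketch only reads pids.max from the process's own cgroup and does not walk up to parent cgroups.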
This has proven more reliable and more stable than the previous patch's implementation; however, it is still imperfect in a few ways:
 - it cannot find the limit if the cgroup sysfs is not mounted,
 - it assumes only one cgroup imposing limits (might be OK),
 - it assumes the cgroup is the one with the limit (it might be set by a parent cgroup),
 - it assumes the mountinfo peer groups are consistent across systems.
It also introduces a few warnings, as I use a #define to set the array lengths, which cannot be used in the format string to limit the number of characters.
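One possible way around this, which I have not tried in the patch, is to stringify the #define so that it can be pasted into the format string. The sketch below assumes the macro expands to a plain decimal literal (not an expression) and that the buffer has one extra byte for the terminator; FIELD_LEN and read_word() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD_LEN 15	/* hypothetical; must be a plain decimal literal */

/* Two levels are needed so FIELD_LEN is expanded before being stringified. */
#define STR_(x) #x
#define STR(x) STR_(x)

/*
 * Copy the first whitespace-delimited word of src into dst, reading at most
 * FIELD_LEN characters. The format string becomes "%15s" at compile time.
 */
int read_word(const char *src, char dst[FIELD_LEN + 1])
{
	assert(src && dst);
	return sscanf(src, "%" STR(FIELD_LEN) "s", dst) == 1 ? 0 : -1;
}
```

The width given to %s does not count the terminating NUL, hence the FIELD_LEN + 1 array size.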
An alternative solution would be to write a shell script that parses this data directly and passes it as an argument to the affected tests. See for example runtest/cap_bounds or runtest/commands. But this implies changing each affected test.
Note that upstream is aware that those tests are quite broken[0]. There was a patch series updating them, but it was never followed through.
As this is not really Morello-related, but rather directly linked to the tests and the systems they are run on, this change might be shelved indefinitely.
Thanks a lot to Beata for the help while investigating this!
[0]: https://lists.linux.it/pipermail/ltp/2022-January/027163.html
Teo Couprie Diaz (1):
  lib/tst_pid: Find cgroup pid.max programatically

 lib/tst_pid.c                                 | 43 +++++++++++++++++++
 .../syscalls/ipc/msgstress/msgstress03.c      |  2 +-
 2 files changed, 44 insertions(+), 1 deletion(-)