On Tue, Jan 31, 2023 at 11:49:10AM +0000, Teo Couprie Diaz wrote:
Hi Beata,
Thanks again for the comments, I just came back to this patch.
Thanks for bearing with me !
On 18/01/2023 14:16, Beata Michalska wrote:
On Fri, Jan 13, 2023 at 04:52:06PM +0000, Teo Couprie Diaz wrote:
On 12/01/2023 17:58, Beata Michalska wrote:
Hi Teo,
Thanks for the efforts in solving that cgroup nightmare!
On Thu, Jan 12, 2023 at 10:20:25AM +0000, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
As we are trying to determine the proper path for pid controller below, do we really need the checks above with fixed paths ?
I suppose not, I left there as a fall-back as it _seems_ to not be too much of an issue until now. I don't mind removing them as indeed they might be redundant.
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
We could actually drop early here - not much point reading the mountinfo proc entry if we don't have or couldn't find pids cgroup controller.
Agreed, I don't really now what would be the proper way to do that though ! Would a label+goto be appropriate here ? Or enclosing the code in another if, maybe a do {} while (false) ? What do you think ?
That's up to you really, getting another 'if' block would make sense, especially if we drop the previous two checks. Otherwise having do {} while (0) block makes sense as well.
What I chose to do in the end is to drop the previous ifs and have early returns when possible. This eliminates the `max_pids` variable, but a level of checks as well. I feel this makes it clearer and prevents too much condition nesting or an ugly `do {...} while(0);`.
That sounds good.
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
It makes sense as we need to make sure the whole format has been matched. It could be solved by modifying relevant file_lines_scanf function and using "%n" directive but that might be too much of a hassle so we can stick to this approach instead.
It might be a bit annoying, given that %n does not increase the number of matches, which means its value would need to be checked and depend on the format string I think ?
Indeed, which is why I have suggested to leave things as they are.
I'm not entirely sure whether assuming shared mounts is a good idea, but the group identifier should definitely not be fixed and we should not assume '16' here. I think we could skip those with the '*' and appropriate formats. (same applies below).
You're right. During my testing this was consistent across the three OSes I had, but since updating to using the 6.1 kernel it changed on my Morello board, so this is definitely a no-go. I have tested a change to the format string which seems to work by matching the "pids" at the end of the mount point, so it should be ok.
Also I am wondering if we should not consider mount's root, as mount point is supposed to be relative to root , so in some weird cases this will not provide an actual full path. I guess we can ignore that for now, but slapping a comment here would be good.
It would probably be some quite weird cases which is why I didn't take into account here. It shouldn't add much complexity to be honest, I can add it if you think it's useful, but the comment works for me as well.
No, it would indeed be pretty unusual thing so let's just leave some comment that this assumes /sys/fs/cgroup mount point to by the default/expected.
Agreed, added a comment.
Thanks.
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
My initial tought was to combine the two and decide which cgroup version we are dealing with based on parsing the meminfo with:
"%*s %*s %*s %*s %s %*s shared:9 - cgroup%[ 2]", &path, &cgroup_ver
but that might not work entirely with the unified hierarchy, where we would catch v2 but with no pids.max entry. It would also fail with mixed support. Looking also at the v1 support,this needs 'pids' name controller in the format string for sscanf otherwise we will get wrong path (it will match first entry with cgroup mount type, which might not be for pid controller and the final path will be the wrong one). So I guess the simplest approach is to check for each cgroup separately as you did.
I would have liked something less duplicated as well, but with the limited capabilities of the format strings compared to a regex for example I preferred splitting it that way.
Right, so how about:
Steps:
search for 'pids' entry in '/proc/self/mountinfo' if found: (we have the mount point by now): // *1 - here save the 'pids' relative path for later search for 'pids' controller path in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0: return max_pids search for 'cgroup2' in '/proc/self/mountinfo' if found: /* search for 0::......... */ search for relative path for cgroup controller in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0 return max_pids else /* here to cover the discrepancies between v1 and v2 and * the docs*/ read from cgroupv2_mount_point/pids_path from *1
If that makes sense ?
It makes a lot of sense, I ended up simplifying the return check as `read_session_pids_limit` already returns -1 if 0 or invalid, so we can return that directly.
However I'm not sure about the mixed part cgroupv2/path_from_pidsv1. Testing on the Morello board doesn't give me anything useful (there is no cgroupv2 pid controller active), or incorrect/incomplete on my work laptop (the cgroupv2 group is a child of the one this would give, plus the previous issue). Does it turn out differently when testing on your side ? Do you think this would be worthwhile to keep, rather than the simple "check 1, if not check 2, if not error" ?
So I am still (apparently) trying to wrap my head around cgroups v2. And you are right I might be off with the path mixing. Searching for 'pids' in mountinfo should succeed only with the PIDS controller being under cgroups v1 control, at which case the /proc/$PID/cgroup should have 'pids' entry. If there is no 'pids' entry in /proc/$PID/mountinfo then we should look for 'cgroup2' entry. If one is found, looking at the /proc/$PID/cgroup should give us a relative path to cgroup the process belongs to and thus to the 'pids.max' file if the PIDS controller is being enabled. If any of this fails it means there is no limit imposed on number of processes through cgroup interface. So you can ignore the 'discrepancies' section from my comment above. Thanks for pointing that one out.
One thing though when looking for mount point for PIDS controller when using cgroups v1: we should not look for 'pids' within the mount point but within the super block options as the mount point can be named differently (again that will probably not ever happen but still) e.g:
mkdir -p /cgroups_mountpoint/whatever mount -t cgroup -o pids none /cgroups_mountpoint/whatever/ cat /proc/self/mountinfo | grep pids
gives:
954 29 0:37 / /cgroups_mountpoint/whatever rw,relatime shared:649 - cgroup none rw,pids
Hope I didn't cause more confusion and apologies for one caused earlier.
--- BR B.
We could even try to cover the inheritance case with the 'max' specifier, by remembering the path and then going level up and then trying to find 'pids.max' entry with readdir (like find_stat_file in lib/tst_device.c i.e.) (I am happy to implement the latter if needed, I'm aware I have already ask for a lot)
Happy to have a look at it, I'm interested and hopefully it should be ok :) I'll try and implement it for the next revision of the patch, after testing the current changes.
BR B.
Thanks again for the comments, Téo
I would just suggest doing that in one block keeping with single declaration for needed variables.
I didn't think about that, as I mimicked the previous logic, but that would be a bit cleaner indeed.
if (max_pids < 0) return -1;
diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2;
Could we move that to a separate patch as this is dealing with a different issue?
Will do !
Thanks for the review, T�o
BR B.
if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), "
-- 2.25.1
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org