We observed failures of the msgstress0{3,4} because they hit the PID limit imposed by cgroup. This usually is not an issue as either: - the system is fast enough for the threads to complete before hitting the limit, - the PID limit is far higher than the number of threads started by the tests, - there is no cgroup restricting the PID count further, - the limit can be found in the cgroup user slice.
In the case of our distributions on the Morello boards, neither of those is true. Indeed, the tasks do not complete fast enough to prevent hitting the limit, the limit is quite low (4915 PIDs!) because of the low number of cores and the default limit set by systemd of 15% per service, and the root user is not in its own cgroup user-slice, rather under the serial or sshd service, depending on the access method to the board.
This means that we needed to find a way in the test to find this limit. In this patch, I implement it by checking /proc/self/cgroup and /proc/self/mountinfo to find the cgroup imposing the PID limit as well as where the cgroup sysfs is mounted, if it is.
This has proven more reliable than the previous patch's implementation and more stable, however it is still imperfect in a few ways: - it cannot find the limit if the cgroup sysfs is not mounted, - it assumes only one cgroup imposing limits (might be OK), - it assumes the cgroup is the one with the limit (it might be set by a parent cgroup), - it assumes the mountinfo peer groups are consistent across systems.
It also introduces a few warings as I use a #define to set the array lengths, which cannot be used in the format string to limit the number of characters.
An alternate solution could be implemented by writing a shell script parsing this data directly and passing it as an argument to the affected tests. See for example runtest/cap_bounds or runtest/commands. But this implies changing each affected tests.
To note, upstream is aware that those tests are quite broken[0]. There was a patch series updating them but it was never followed-through.
As this is not really Morello related, rather it is directly linked to the tests and the systems they are run on, this change might be shelved undefinitely.
Thanks a lot to Beata for the help while investigating this !
[0]: https://lists.linux.it/pipermail/ltp/2022-January/027163.html
Teo Couprie Diaz (1): lib/tst_pid: Find cgroup pid.max programatically
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com --- lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn); + /* Check for generic cgroup v1 pid.max */ + if (max_pids < 0) { + char path[PATH_MAX]; + char cgroup_pids[PATH_MAX]; + char catchall; + int cgroup_ok = 0; + + cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids); + /* + * This is a bit of hack of scanf format strings. It only works forward + * which means that if a variable has been matched, but not a later + * part of the format it still counts as a match. Thus, we need a + * catch-all match later in the string that will _not_ be counted if + * the static part of the format doesn't match. + * This way, the macro will go on the next line rather than returning + * on a line matching the %s but not the static part. + */ + cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall); + + if (!cgroup_ok) { + strncat(path, cgroup_pids, PATH_MAX); + strncat(path, "/pids.max", PATH_MAX); + max_pids = read_session_pids_limit(path, uid, + cleanup_fn); + } + } + /* Check for generic cgroup v2 pid.max */ + if (max_pids < 0) { + char path[PATH_MAX]; + char cgroup_pids[PATH_MAX]; + char catchall; + int cgroup2_ok = 0; + + cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids); + cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall); + + if (!cgroup2_ok) { + strncat(path, cgroup_pids, PATH_MAX); + strncat(path, "/pids.max", PATH_MAX); + max_pids = read_session_pids_limit(path, uid, + cleanup_fn); + } + }
if (max_pids < 0) return -1; diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup); + free_pids = tst_get_free_pids(cleanup) / 2; if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), "
On 12/01/2023 10:20, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
I forgot to make the scanf lines the appropriate length, will be fixed in a v3 if this solution is acceptable.
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
if (max_pids < 0) return -1; diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2; if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), "
Hi Teo,
Thanks for the efforts in solving that cgroup nightmare!
On Thu, Jan 12, 2023 at 10:20:25AM +0000, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
As we are trying to determine the proper path for pid controller below, do we really need the checks above with fixed paths ?
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
We could actually drop early here - not much point reading the mountinfo proc entry if we don't have or couldn't find pids cgroup controller.
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
It makes sense as we need to make sure the whole format has been matched. It could be solved by modifying relevant file_lines_scanf function and using "%n" directive but that might be too much of a hassle so we can stick to this approach instead. I'm not entirely sure whether assuming shared mounts is a good idea, but the group identifier should definitely not be fixed and we should not assume '16' here. I think we could skip those with the '*' and appropriate formats. (same applies below). Also I am wondering if we should not consider mount's root, as mount point is supposed to be relative to root , so in some weird cases this will not provide an actual full path. I guess we can ignore that for now, but slapping a comment here would be good.
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
My initial tought was to combine the two and decide which cgroup version we are dealing with based on parsing the meminfo with:
"%*s %*s %*s %*s %s %*s shared:9 - cgroup%[ 2]", &path, &cgroup_ver
but that might not work entirely with the unified hierarchy, where we would catch v2 but with no pids.max entry. It would also fail with mixed support. Looking also at the v1 support,this needs 'pids' name controller in the format string for sscanf otherwise we will get wrong path (it will match first entry with cgroup mount type, which might not be for pid controller and the final path will be the wrong one). So I guess the simplest approach is to check for each cgroup separately as you did. I would just suggest doing that in one block keeping with single declaration for needed variables.
if (max_pids < 0) return -1; diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2;
Could we move that to a separate patch as this is dealing with a different issue?
--- BR B.
if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), " -- 2.25.1
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org
On 12/01/2023 17:58, Beata Michalska wrote:
Hi Teo,
Thanks for the efforts in solving that cgroup nightmare!
On Thu, Jan 12, 2023 at 10:20:25AM +0000, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
As we are trying to determine the proper path for pid controller below, do we really need the checks above with fixed paths ?
I suppose not, I left there as a fall-back as it _seems_ to not be too much of an issue until now. I don't mind removing them as indeed they might be redundant.
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
We could actually drop early here - not much point reading the mountinfo proc entry if we don't have or couldn't find pids cgroup controller.
Agreed, I don't really now what would be the proper way to do that though ! Would a label+goto be appropriate here ? Or enclosing the code in another if, maybe a do {} while (false) ? What do you think ?
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
It makes sense as we need to make sure the whole format has been matched. It could be solved by modifying relevant file_lines_scanf function and using "%n" directive but that might be too much of a hassle so we can stick to this approach instead.
It might be a bit annoying, given that %n does not increase the number of matches, which means its value would need to be checked and depend on the format string I think ?
I'm not entirely sure whether assuming shared mounts is a good idea, but the group identifier should definitely not be fixed and we should not assume '16' here. I think we could skip those with the '*' and appropriate formats. (same applies below).
You're right. During my testing this was consistent across the three OSes I had, but since updating to using the 6.1 kernel it changed on my Morello board, so this is definitely a no-go. I have tested a change to the format string which seems to work by matching the "pids" at the end of the mount point, so it should be ok.
Also I am wondering if we should not consider mount's root, as mount point is supposed to be relative to root , so in some weird cases this will not provide an actual full path. I guess we can ignore that for now, but slapping a comment here would be good.
It would probably be some quite weird cases which is why I didn't take into account here. It shouldn't add much complexity to be honest, I can add it if you think it's useful, but the comment works for me as well.
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
My initial tought was to combine the two and decide which cgroup version we are dealing with based on parsing the meminfo with:
"%*s %*s %*s %*s %s %*s shared:9 - cgroup%[ 2]", &path, &cgroup_ver
but that might not work entirely with the unified hierarchy, where we would catch v2 but with no pids.max entry. It would also fail with mixed support. Looking also at the v1 support,this needs 'pids' name controller in the format string for sscanf otherwise we will get wrong path (it will match first entry with cgroup mount type, which might not be for pid controller and the final path will be the wrong one). So I guess the simplest approach is to check for each cgroup separately as you did.
I would have liked something less duplicated as well, but with the limited capabilities of the format strings compared to a regex for example I preferred splitting it that way.
I would just suggest doing that in one block keeping with single declaration for needed variables.
I didn't think about that, as I mimicked the previous logic, but that would be a bit cleaner indeed.
if (max_pids < 0) return -1; diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2;
Could we move that to a separate patch as this is dealing with a different issue?
Will do !
Thanks for the review, Téo
BR B.
if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), " -- 2.25.1
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org
On Fri, Jan 13, 2023 at 04:52:06PM +0000, Teo Couprie Diaz wrote:
On 12/01/2023 17:58, Beata Michalska wrote:
Hi Teo,
Thanks for the efforts in solving that cgroup nightmare!
On Thu, Jan 12, 2023 at 10:20:25AM +0000, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
As we are trying to determine the proper path for pid controller below, do we really need the checks above with fixed paths ?
I suppose not, I left there as a fall-back as it _seems_ to not be too much of an issue until now. I don't mind removing them as indeed they might be redundant.
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
We could actually drop early here - not much point reading the mountinfo proc entry if we don't have or couldn't find pids cgroup controller.
Agreed, I don't really now what would be the proper way to do that though ! Would a label+goto be appropriate here ? Or enclosing the code in another if, maybe a do {} while (false) ? What do you think ?
That's up to you really, getting another 'if' block would make sense, especially if we drop the previous two checks. Otherwise having do {} while (0) block makes sense as well.
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
It makes sense as we need to make sure the whole format has been matched. It could be solved by modifying relevant file_lines_scanf function and using "%n" directive but that might be too much of a hassle so we can stick to this approach instead.
It might be a bit annoying, given that %n does not increase the number of matches, which means its value would need to be checked and depend on the format string I think ?
Indeed, which is why I have suggested to leave things as they are.
I'm not entirely sure whether assuming shared mounts is a good idea, but the group identifier should definitely not be fixed and we should not assume '16' here. I think we could skip those with the '*' and appropriate formats. (same applies below).
You're right. During my testing this was consistent across the three OSes I had, but since updating to using the 6.1 kernel it changed on my Morello board, so this is definitely a no-go. I have tested a change to the format string which seems to work by matching the "pids" at the end of the mount point, so it should be ok.
Also I am wondering if we should not consider mount's root, as mount point is supposed to be relative to root , so in some weird cases this will not provide an actual full path. I guess we can ignore that for now, but slapping a comment here would be good.
It would probably be some quite weird cases which is why I didn't take into account here. It shouldn't add much complexity to be honest, I can add it if you think it's useful, but the comment works for me as well.
No, it would indeed be pretty unusual thing so let's just leave some comment that this assumes /sys/fs/cgroup mount point to by the default/expected.
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
My initial tought was to combine the two and decide which cgroup version we are dealing with based on parsing the meminfo with:
"%*s %*s %*s %*s %s %*s shared:9 - cgroup%[ 2]", &path, &cgroup_ver
but that might not work entirely with the unified hierarchy, where we would catch v2 but with no pids.max entry. It would also fail with mixed support. Looking also at the v1 support,this needs 'pids' name controller in the format string for sscanf otherwise we will get wrong path (it will match first entry with cgroup mount type, which might not be for pid controller and the final path will be the wrong one). So I guess the simplest approach is to check for each cgroup separately as you did.
I would have liked something less duplicated as well, but with the limited capabilities of the format strings compared to a regex for example I preferred splitting it that way.
Right, so how about:
Steps:
search for 'pids' entry in '/proc/self/mountinfo' if found: (we have the mount point by now): // *1 - here save the 'pids' relative path for later search for 'pids' controller path in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0: return max_pids search for 'cgroup2' in '/proc/self/mountinfo' if found: /* search for 0::......... */ search for relative path for cgroup controller in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0 return max_pids else /* here to cover the discrepancies between v1 and v2 and * the docs*/ read from cgroupv2_mount_point/pids_path from *1
If that makes sense ?
We could even try to cover the inheritance case with the 'max' specifier, by remembering the path and then going level up and then trying to find 'pids.max' entry with readdir (like find_stat_file in lib/tst_device.c i.e.) (I am happy to implement the latter if needed, I'm aware I have already ask for a lot)
--- BR B.
I would just suggest doing that in one block keeping with single declaration for needed variables.
I didn't think about that, as I mimicked the previous logic, but that would be a bit cleaner indeed.
if (max_pids < 0) return -1; diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2;
Could we move that to a separate patch as this is dealing with a different issue?
Will do !
Thanks for the review, Téo
BR B.
if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), " -- 2.25.1
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org
Hi Beata,
Thanks again for the comments, I just came back to this patch.
On 18/01/2023 14:16, Beata Michalska wrote:
On Fri, Jan 13, 2023 at 04:52:06PM +0000, Teo Couprie Diaz wrote:
On 12/01/2023 17:58, Beata Michalska wrote:
Hi Teo,
Thanks for the efforts in solving that cgroup nightmare!
On Thu, Jan 12, 2023 at 10:20:25AM +0000, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
As we are trying to determine the proper path for pid controller below, do we really need the checks above with fixed paths ?
I suppose not, I left there as a fall-back as it _seems_ to not be too much of an issue until now. I don't mind removing them as indeed they might be redundant.
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
We could actually drop early here - not much point reading the mountinfo proc entry if we don't have or couldn't find pids cgroup controller.
Agreed, I don't really now what would be the proper way to do that though ! Would a label+goto be appropriate here ? Or enclosing the code in another if, maybe a do {} while (false) ? What do you think ?
That's up to you really, getting another 'if' block would make sense, especially if we drop the previous two checks. Otherwise having do {} while (0) block makes sense as well.
What I chose to do in the end is to drop the previous ifs and have early returns when possible. This eliminates the `max_pids` variable, but a level of checks as well. I feel this makes it clearer and prevents too much condition nesting or an ugly `do {...} while(0);`.
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
It makes sense as we need to make sure the whole format has been matched. It could be solved by modifying relevant file_lines_scanf function and using "%n" directive but that might be too much of a hassle so we can stick to this approach instead.
It might be a bit annoying, given that %n does not increase the number of matches, which means its value would need to be checked and depend on the format string I think ?
Indeed, which is why I have suggested to leave things as they are.
I'm not entirely sure whether assuming shared mounts is a good idea, but the group identifier should definitely not be fixed and we should not assume '16' here. I think we could skip those with the '*' and appropriate formats. (same applies below).
You're right. During my testing this was consistent across the three OSes I had, but since updating to using the 6.1 kernel it changed on my Morello board, so this is definitely a no-go. I have tested a change to the format string which seems to work by matching the "pids" at the end of the mount point, so it should be ok.
Also I am wondering if we should not consider mount's root, as mount point is supposed to be relative to root , so in some weird cases this will not provide an actual full path. I guess we can ignore that for now, but slapping a comment here would be good.
It would probably be some quite weird cases which is why I didn't take into account here. It shouldn't add much complexity to be honest, I can add it if you think it's useful, but the comment works for me as well.
No, it would indeed be pretty unusual thing so let's just leave some comment that this assumes /sys/fs/cgroup mount point to by the default/expected.
Agreed, added a comment.
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
My initial tought was to combine the two and decide which cgroup version we are dealing with based on parsing the meminfo with:
"%*s %*s %*s %*s %s %*s shared:9 - cgroup%[ 2]", &path, &cgroup_ver
but that might not work entirely with the unified hierarchy, where we would catch v2 but with no pids.max entry. It would also fail with mixed support. Looking also at the v1 support,this needs 'pids' name controller in the format string for sscanf otherwise we will get wrong path (it will match first entry with cgroup mount type, which might not be for pid controller and the final path will be the wrong one). So I guess the simplest approach is to check for each cgroup separately as you did.
I would have liked something less duplicated as well, but with the limited capabilities of the format strings compared to a regex for example I preferred splitting it that way.
Right, so how about:
Steps:
search for 'pids' entry in '/proc/self/mountinfo' if found: (we have the mount point by now): // *1 - here save the 'pids' relative path for later search for 'pids' controller path in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0: return max_pids search for 'cgroup2' in '/proc/self/mountinfo' if found: /* search for 0::......... */ search for relative path for cgroup controller in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0 return max_pids else /* here to cover the discrepancies between v1 and v2 and * the docs*/ read from cgroupv2_mount_point/pids_path from *1
If that makes sense ?
It makes a lot of sense, I ended up simplifying the return check as `read_session_pids_limit` already returns -1 if 0 or invalid, so we can return that directly.
However I'm not sure about the mixed part cgroupv2/path_from_pidsv1. Testing on the Morello board doesn't give me anything useful (there is no cgroupv2 pid controller active), or incorrect/incomplete on my work laptop (the cgroupv2 group is a child of the one this would give, plus the previous issue). Does it turn out differently when testing on your side ? Do you think this would be worthwhile to keep, rather than the simple "check 1, if not check 2, if not error" ?
We could even try to cover the inheritance case with the 'max' specifier, by remembering the path and then going level up and then trying to find 'pids.max' entry with readdir (like find_stat_file in lib/tst_device.c i.e.) (I am happy to implement the latter if needed, I'm aware I have already ask for a lot)
Happy to have a look at it, I'm interested and hopefully it should be ok :) I'll try and implement it for the next revision of the patch, after testing the current changes.
BR B.
Thanks again for the comments, Téo
I would just suggest doing that in one block keeping with single declaration for needed variables.
I didn't think about that, as I mimicked the previous logic, but that would be a bit cleaner indeed.
if (max_pids < 0) return -1;
diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2;
Could we move that to a separate patch as this is dealing with a different issue?
Will do !
Thanks for the review, T�o
BR B.
if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), "
-- 2.25.1
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org
On Tue, Jan 31, 2023 at 11:49:10AM +0000, Teo Couprie Diaz wrote:
Hi Beata,
Thanks again for the comments, I just came back to this patch.
Thanks for bearing with me !
On 18/01/2023 14:16, Beata Michalska wrote:
On Fri, Jan 13, 2023 at 04:52:06PM +0000, Teo Couprie Diaz wrote:
On 12/01/2023 17:58, Beata Michalska wrote:
Hi Teo,
Thanks for the efforts in solving that cgroup nightmare!
On Thu, Jan 12, 2023 at 10:20:25AM +0000, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
As we are trying to determine the proper path for pid controller below, do we really need the checks above with fixed paths ?
I suppose not, I left there as a fall-back as it _seems_ to not be too much of an issue until now. I don't mind removing them as indeed they might be redundant.
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
We could actually drop early here - not much point reading the mountinfo proc entry if we don't have or couldn't find pids cgroup controller.
Agreed, I don't really now what would be the proper way to do that though ! Would a label+goto be appropriate here ? Or enclosing the code in another if, maybe a do {} while (false) ? What do you think ?
That's up to you really, getting another 'if' block would make sense, especially if we drop the previous two checks. Otherwise having do {} while (0) block makes sense as well.
What I chose to do in the end is to drop the previous ifs and have early returns when possible. This eliminates the `max_pids` variable, but a level of checks as well. I feel this makes it clearer and prevents too much condition nesting or an ugly `do {...} while(0);`.
That sounds good.
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
It makes sense as we need to make sure the whole format has been matched. It could be solved by modifying relevant file_lines_scanf function and using "%n" directive but that might be too much of a hassle so we can stick to this approach instead.
It might be a bit annoying, given that %n does not increase the number of matches, which means its value would need to be checked and depend on the format string I think ?
Indeed, which is why I have suggested to leave things as they are.
I'm not entirely sure whether assuming shared mounts is a good idea, but the group identifier should definitely not be fixed and we should not assume '16' here. I think we could skip those with the '*' and appropriate formats. (same applies below).
You're right. During my testing this was consistent across the three OSes I had, but since updating to using the 6.1 kernel it changed on my Morello board, so this is definitely a no-go. I have tested a change to the format string which seems to work by matching the "pids" at the end of the mount point, so it should be ok.
Also I am wondering if we should not consider mount's root, as mount point is supposed to be relative to root , so in some weird cases this will not provide an actual full path. I guess we can ignore that for now, but slapping a comment here would be good.
It would probably be some quite weird cases which is why I didn't take into account here. It shouldn't add much complexity to be honest, I can add it if you think it's useful, but the comment works for me as well.
No, it would indeed be pretty unusual thing so let's just leave some comment that this assumes /sys/fs/cgroup mount point to by the default/expected.
Agreed, added a comment.
Thanks.
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
My initial tought was to combine the two and decide which cgroup version we are dealing with based on parsing the meminfo with:
"%*s %*s %*s %*s %s %*s shared:9 - cgroup%[ 2]", &path, &cgroup_ver
but that might not work entirely with the unified hierarchy, where we would catch v2 but with no pids.max entry. It would also fail with mixed support. Looking also at the v1 support,this needs 'pids' name controller in the format string for sscanf otherwise we will get wrong path (it will match first entry with cgroup mount type, which might not be for pid controller and the final path will be the wrong one). So I guess the simplest approach is to check for each cgroup separately as you did.
I would have liked something less duplicated as well, but with the limited capabilities of the format strings compared to a regex for example I preferred splitting it that way.
Right, so how about:
Steps:
search for 'pids' entry in '/proc/self/mountinfo' if found: (we have the mount point by now): // *1 - here save the 'pids' relative path for later search for 'pids' controller path in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0: return max_pids search for 'cgroup2' in '/proc/self/mountinfo' if found: /* search for 0::......... */ search for relative path for cgroup controller in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0 return max_pids else /* here to cover the discrepancies between v1 and v2 and * the docs*/ read from cgroupv2_mount_point/pids_path from *1
If that makes sense ?
It makes a lot of sense, I ended up simplifying the return check as `read_session_pids_limit` already returns -1 if 0 or invalid, so we can return that directly.
However I'm not sure about the mixed part cgroupv2/path_from_pidsv1. Testing on the Morello board doesn't give me anything useful (there is no cgroupv2 pid controller active), or incorrect/incomplete on my work laptop (the cgroupv2 group is a child of the one this would give, plus the previous issue). Does it turn out differently when testing on your side ? Do you think this would be worthwhile to keep, rather than the simple "check 1, if not check 2, if not error" ?
So I am still (apparently) trying to wrap my head around cgroups v2. And you are right I might be off with the path mixing. Searching for 'pids' in mountinfo should succeed only with the PIDS controller being under cgroups v1 control, at which case the /proc/$PID/cgroup should have 'pids' entry. If there is no 'pids' entry in /proc/$PID/mountinfo then we should look for 'cgroup2' entry. If one is found, looking at the /proc/$PID/cgroup should give us a relative path to cgroup the process belongs to and thus to the 'pids.max' file if the PIDS controller is being enabled. If any of this fails it means there is no limit imposed on number of processes through cgroup interface. So you can ignore the 'discrepancies' section from my comment above. Thanks for pointing that one out.
One thing though when looking for mount point for PIDS controller when using cgroups v1: we should not look for 'pids' within the mount point but within the super block options as the mount point can be named differently (again that will probably not ever happen but still) e.g:
mkdir -p /cgroups_mountpoint/whatever mount -t cgroup -o pids none /cgroups_mountpoint/whatever/ cat /proc/self/mountinfo | grep pids
gives:
954 29 0:37 / /cgroups_mountpoint/whatever rw,relatime shared:649 - cgroup none rw,pids
Hope I didn't cause more confusion and apologies for one caused earlier.
--- BR B.
We could even try to cover the inheritance case with the 'max' specifier, by remembering the path and then going level up and then trying to find 'pids.max' entry with readdir (like find_stat_file in lib/tst_device.c i.e.) (I am happy to implement the latter if needed, I'm aware I have already ask for a lot)
Happy to have a look at it, I'm interested and hopefully it should be ok :) I'll try and implement it for the next revision of the patch, after testing the current changes.
BR B.
Thanks again for the comments, Téo
I would just suggest doing that in one block keeping with single declaration for needed variables.
I didn't think about that, as I mimicked the previous logic, but that would be a bit cleaner indeed.
if (max_pids < 0) return -1;
diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2;
Could we move that to a separate patch as this is dealing with a different issue?
Will do !
Thanks for the review, T�o
BR B.
if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), "
-- 2.25.1
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org
On 31/01/2023 22:03, Beata Michalska wrote:
On Tue, Jan 31, 2023 at 11:49:10AM +0000, Teo Couprie Diaz wrote:
Hi Beata,
Thanks again for the comments, I just came back to this patch.
Thanks for bearing with me !
On 18/01/2023 14:16, Beata Michalska wrote:
On Fri, Jan 13, 2023 at 04:52:06PM +0000, Teo Couprie Diaz wrote:
On 12/01/2023 17:58, Beata Michalska wrote:
Hi Teo,
Thanks for the efforts in solving that cgroup nightmare!
On Thu, Jan 12, 2023 at 10:20:25AM +0000, Teo Couprie Diaz wrote:
In our debian-based distribution, the two files used in lib/tst_pid are not available, but cgroups still imposes a task limit far lesser than the kernel pid_max. If the cgroup sysfs is mounted, we can use it to retrieve the current task limit imposed to the process. Implement the retrieval of this limit.
This can be done by first checking /proc/self/cgroup to get the cgroup the process is in, which will be a path under the cgroup sysfs. To get the path to the cgroup sysfs, check /proc/self/mountinfo. Finally, concatenate those two paths with pid.max to get the full path to the file containing the limit.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, as it expects that tasks will terminate quickly and not reach the PID limit, so it still hit it. Change its limit to account for all possible tasks running simultaneously.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
lib/tst_pid.c | 43 +++++++++++++++++++ .../syscalls/ipc/msgstress/msgstress03.c | 2 +- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c index 21cadef2a..1fc2f11f1 100644 --- a/lib/tst_pid.c +++ b/lib/tst_pid.c @@ -103,6 +103,49 @@ static int get_session_pids_limit(void (*cleanup_fn) (void)) if (max_pids < 0) max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid, cleanup_fn);
As we are trying to determine the proper path for pid controller below, do we really need the checks above with fixed paths ?
I suppose not, I left there as a fall-back as it _seems_ to not be too much of an issue until now. I don't mind removing them as indeed they might be redundant.
- /* Check for generic cgroup v1 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup_ok = 0;
cgroup_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :pids: %s \n", &cgroup_pids);
We could actually drop early here - not much point reading the mountinfo proc entry if we don't have or couldn't find pids cgroup controller.
Agreed, I don't really now what would be the proper way to do that though ! Would a label+goto be appropriate here ? Or enclosing the code in another if, maybe a do {} while (false) ? What do you think ?
That's up to you really, getting another 'if' block would make sense, especially if we drop the previous two checks. Otherwise having do {} while (0) block makes sense as well.
What I chose to do in the end is to drop the previous ifs and have early returns when possible. This eliminates the `max_pids` variable, but a level of checks as well. I feel this makes it clearer and prevents too much condition nesting or an ugly `do {...} while(0);`.
That sounds good.
/*
* This is a bit of hack of scanf format strings. It only works forward
* which means that if a variable has been matched, but not a later
* part of the format it still counts as a match. Thus, we need a
* catch-all match later in the string that will _not_ be counted if
* the static part of the format doesn't match.
* This way, the macro will go on the next line rather than returning
* on a line matching the %s but not the static part.
*/
cgroup_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:16 - cgroup %c", &path, &catchall);
It makes sense as we need to make sure the whole format has been matched. It could be solved by modifying relevant file_lines_scanf function and using "%n" directive but that might be too much of a hassle so we can stick to this approach instead.
It might be a bit annoying, given that %n does not increase the number of matches, which means its value would need to be checked and depend on the format string I think ?
Indeed, which is why I have suggested to leave things as they are.
I'm not entirely sure whether assuming shared mounts is a good idea, but the group identifier should definitely not be fixed and we should not assume '16' here. I think we could skip those with the '*' and appropriate formats. (same applies below).
You're right. During my testing this was consistent across the three OSes I had, but since updating to using the 6.1 kernel it changed on my Morello board, so this is definitely a no-go. I have tested a change to the format string which seems to work by matching the "pids" at the end of the mount point, so it should be ok.
Also I am wondering if we should not consider mount's root, as mount point is supposed to be relative to root , so in some weird cases this will not provide an actual full path. I guess we can ignore that for now, but slapping a comment here would be good.
It would probably be some quite weird cases which is why I didn't take into account here. It shouldn't add much complexity to be honest, I can add it if you think it's useful, but the comment works for me as well.
No, it would indeed be pretty unusual thing so let's just leave some comment that this assumes /sys/fs/cgroup mount point to by the default/expected.
Agreed, added a comment.
Thanks.
if (!cgroup_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
- /* Check for generic cgroup v2 pid.max */
- if (max_pids < 0) {
char path[PATH_MAX];
char cgroup_pids[PATH_MAX];
char catchall;
int cgroup2_ok = 0;
cgroup2_ok = FILE_LINES_SCANF(cleanup_fn, "/proc/self/cgroup", "%*d :: %s \n", &cgroup_pids);
cgroup2_ok |= FILE_LINES_SCANF(cleanup_fn, "/proc/self/mountinfo", "%*s %*s %*s %*s %s %*s shared:9 - cgroup2 %c", &path, &catchall);
if (!cgroup2_ok) {
strncat(path, cgroup_pids, PATH_MAX);
strncat(path, "/pids.max", PATH_MAX);
max_pids = read_session_pids_limit(path, uid,
cleanup_fn);
}
- }
My initial tought was to combine the two and decide which cgroup version we are dealing with based on parsing the meminfo with:
"%*s %*s %*s %*s %s %*s shared:9 - cgroup%[ 2]", &path, &cgroup_ver
but that might not work entirely with the unified hierarchy, where we would catch v2 but with no pids.max entry. It would also fail with mixed support. Looking also at the v1 support,this needs 'pids' name controller in the format string for sscanf otherwise we will get wrong path (it will match first entry with cgroup mount type, which might not be for pid controller and the final path will be the wrong one). So I guess the simplest approach is to check for each cgroup separately as you did.
I would have liked something less duplicated as well, but with the limited capabilities of the format strings compared to a regex for example I preferred splitting it that way.
Right, so how about:
Steps:
search for 'pids' entry in '/proc/self/mountinfo' if found: (we have the mount point by now): // *1 - here save the 'pids' relative path for later search for 'pids' controller path in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0: return max_pids search for 'cgroup2' in '/proc/self/mountinfo' if found: /* search for 0::......... */ search for relative path for cgroup controller in '/proc/self/cgroup' if found: read_session_pids_limit..... if max_pids > 0 return max_pids else /* here to cover the discrepancies between v1 and v2 and * the docs*/ read from cgroupv2_mount_point/pids_path from *1
If that makes sense ?
It makes a lot of sense, I ended up simplifying the return check as `read_session_pids_limit` already returns -1 if 0 or invalid, so we can return that directly.
However I'm not sure about the mixed part cgroupv2/path_from_pidsv1. Testing on the Morello board doesn't give me anything useful (there is no cgroupv2 pid controller active), or incorrect/incomplete on my work laptop (the cgroupv2 group is a child of the one this would give, plus the previous issue). Does it turn out differently when testing on your side ? Do you think this would be worthwhile to keep, rather than the simple "check 1, if not check 2, if not error" ?
So I am still (apparently) trying to wrap my head around cgroups v2. And you are right I might be off with the path mixing. Searching for 'pids' in mountinfo should succeed only with the PIDS controller being under cgroups v1 control, at which case the /proc/$PID/cgroup should have 'pids' entry. If there is no 'pids' entry in /proc/$PID/mountinfo then we should look for 'cgroup2' entry. If one is found, looking at the /proc/$PID/cgroup should give us a relative path to cgroup the process belongs to and thus to the 'pids.max' file if the PIDS controller is being enabled. If any of this fails it means there is no limit imposed on number of processes through cgroup interface. So you can ignore the 'discrepancies' section from my comment above. Thanks for pointing that one out.
No worries, we have reached the same understanding then !
One thing though when looking for mount point for PIDS controller when using cgroups v1: we should not look for 'pids' within the mount point but within the super block options as the mount point can be named differently (again that will probably not ever happen but still) e.g:
mkdir -p /cgroups_mountpoint/whatever mount -t cgroup -o pids none /cgroups_mountpoint/whatever/ cat /proc/self/mountinfo | grep pids
gives:
954 29 0:37 / /cgroups_mountpoint/whatever rw,relatime shared:649 - cgroup none rw,pids
Agreed, this is what I ended up looking for in my reworked scanf format string. It makes it a bit more understandable as well, not looking for some weird number in the middle.
Hope I didn't cause more confusion and apologies for one caused earlier.
All good : no harm done and I prefer we discuss and take the time to be clear on everything :)
BR B.
Thanks for the comments, Téo
We could even try to cover the inheritance case with the 'max' specifier, by remembering the path and then going level up and then trying to find 'pids.max' entry with readdir (like find_stat_file in lib/tst_device.c i.e.) (I am happy to implement the latter if needed, I'm aware I have already ask for a lot)
Happy to have a look at it, I'm interested and hopefully it should be ok :) I'll try and implement it for the next revision of the patch, after testing the current changes.
BR B.
Thanks again for the comments, T�o
I would just suggest doing that in one block keeping with single declaration for needed variables.
I didn't think about that, as I mimicked the previous logic, but that would be a bit cleaner indeed.
if (max_pids < 0) return -1;
diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c index 3cb70ab18..0c46927b8 100644 --- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c +++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c @@ -109,7 +109,7 @@ int main(int argc, char **argv) } }
- free_pids = tst_get_free_pids(cleanup);
- free_pids = tst_get_free_pids(cleanup) / 2;
Could we move that to a separate patch as this is dealing with a different issue?
Will do !
Thanks for the review, T�o
BR B.
if (nprocs >= free_pids) { tst_resm(TINFO, "Requested number of processes higher than limit (%d > %d), "
-- 2.25.1
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org
linux-morello-ltp@op-lists.linaro.org