terry@n54l:~$ cat /proc/sys/vm/drop_caches
cat: /proc/sys/vm/drop_caches: Permission denied
terry@n54l:~$ sudo -i
root@n54l:~# cat /proc/sys/vm/drop_caches
cat: /proc/sys/vm/drop_caches: Permission denied
root@n54l:~# whoami
root
# WTF?
Since Linux 5.5, drop_caches has been write-only (mode bits 0200) to avoid confusion when operating at scale.
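To see what mode 0200 means in practice, here is a minimal sketch that classifies an octal mode string as write-only or not. The helper name is_write_only is made up for illustration, and the sketch assumes GNU stat (for the -c '%a' format) is available.

```shell
#!/bin/sh
# Sketch only: is_write_only is a hypothetical helper, not part of any tool.
# It succeeds when an octal mode has at least one write bit and no read bits.
is_write_only() {
    # $1 is an octal mode string such as 200 or 644 (what stat -c '%a' prints).
    # A leading 0 makes shell arithmetic parse it as octal.
    mode=$(( 0$1 ))
    [ $(( mode & 0444 )) -eq 0 ] && [ $(( mode & 0222 )) -ne 0 ]
}

f=/proc/sys/vm/drop_caches
if [ -e "$f" ]; then
    m=$(stat -c '%a' "$f")
    if is_write_only "$m"; then
        echo "$f is write-only (mode $m)"
    else
        echo "$f is readable (mode $m)"
    fi
fi
```

On a 5.5 or newer kernel this should report the file as write-only, which matches the "Permission denied" that even root gets from cat.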
From the git commit:
kernel: sysctl: make drop_caches write-only
The drop_caches mode bits changed to 0200, which means the file is write-only:
Currently, the drop_caches proc file and sysctl read back the last value written, suggesting this is somehow a stateful setting instead of a one-time command. Make it write-only, like e.g
It makes sense: drop_caches is a one-off, stateless command. A value that reads back really does cause confusion when operating at scale; I've been in that boat before (many times).
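Treating drop_caches as a one-off command looks roughly like the sketch below. The wrapper name drop_caches and its optional second argument (a target path, handy for dry runs against a scratch file) are assumptions for illustration. The accepted values follow the kernel documentation: 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, 3 frees both.

```shell
#!/bin/sh
# Sketch only: a hypothetical wrapper around the one-shot write.
# Usage: drop_caches 1|2|3 [target]
drop_caches() {
    level=$1
    # Default to the real sysctl file; pass another path to dry-run.
    target=${2:-/proc/sys/vm/drop_caches}
    case "$level" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [target]" >&2; return 2 ;;
    esac
    # Flush dirty pages first, since only clean cache can be dropped.
    sync
    # Each write triggers one drop; nothing is remembered afterwards.
    printf '%s\n' "$level" > "$target"
}

# Example (needs root for the real file):
#   drop_caches 3
```

Since the write is an action rather than a setting, there is nothing to roll back, and running it again simply drops whatever has been cached since.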
The author explained a bit more, drawing on real-world experience:
While mitigating a VM problem at scale in our fleet, there was confusion about whether writing to this file will permanently switch the kernel into a non-caching mode. This influences the decision making in a tense situation, where tens of people are trying to fix tens of thousands of affected machines: Do we need a rollback strategy? What are the performance implications of operating in a non-caching state for several days? It also caused confusion when the kernel team said we may need to write the file several times to make sure it's effective ("But it already reads back 3?").
Another sysctl syscall fun fact
I came across this in the Linux 5.5 (https://kernelnewbies.org/Linux_5.5) changelog:
Remove the sysctl system call (deprecated a long time ago) commit
This system call has been deprecated almost since it was introduced.
In a survey of the linux distributions I can no longer find any of them that enable CONFIG_SYSCTL_SYSCALL. The only indication that I can find that anyone might care is that a few of the defconfigs in the kernel enable CONFIG_SYSCTL_SYSCALL. However this appears in only 31 of 414 defconfigs in the kernel, so I suspect this symbols presence is simply because it is harmless to include rather than because it is necessary.
As there appear to be no users of the sysctl system call, remove the code. As this removes one of the few uses of the internal kernel mount of proc I hope this allows for even more simplifications of the proc filesystem.
I decided to check the distributions I use daily. As you can see below, Arch Linux and Fedora were fine, as expected, but Ubuntu, hmm… ;-)
# fedora 31
PRETTY_NAME="Fedora 31 (Thirty One)"
root@n54l:/boot# grep CONFIG_SYSCTL_SYSCALL config-$(uname -r)

# arch
terry@netbook:~$ grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Arch Linux"
terry@netbook:~$ zcat /proc/config.gz | grep CONFIG_SYSCTL_SYSCALL

# ubuntu 18.04
$ grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Ubuntu 18.04.4 LTS"
$ grep CONFIG_SYSCTL_SYSCALL /boot/config-$(uname -r)
CONFIG_SYSCTL_SYSCALL=y
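The checks above differ only in where each distribution keeps its kernel config, so they can be folded into one helper. This is a rough sketch: kconfig_lookup is a made-up name, and it assumes the config lives either in /boot/config-$(uname -r) or in /proc/config.gz, the two places used above.

```shell
#!/bin/sh
# Sketch only: kconfig_lookup is a hypothetical helper name.
# Usage: kconfig_lookup OPTION [CONFIG_FILE]
# Prints the matching config line; exits non-zero if the option is absent.
kconfig_lookup() {
    opt=$1
    cfg=${2:-}
    if [ -z "$cfg" ]; then
        # Try the two locations used in the sessions above.
        if [ -r "/boot/config-$(uname -r)" ]; then
            cfg="/boot/config-$(uname -r)"
        elif [ -r /proc/config.gz ]; then
            cfg=/proc/config.gz
        else
            echo "no kernel config found" >&2
            return 2
        fi
    fi
    # Match "OPTION=..." as well as "# OPTION is not set", but not
    # longer option names that merely share the prefix.
    case "$cfg" in
        *.gz) zcat "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep -E "^(# )?${opt}[= ]"
}

# Example:
#   kconfig_lookup CONFIG_SYSCTL_SYSCALL || echo "not present"
```

No output and a non-zero exit status mean the symbol does not appear in the config at all, which is what Fedora and Arch show above.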