ACCESS ALL AREAS

Fixing the “100% CPU and no useful output” imklog+rsyslog kernel logging problem on Ubuntu guests under Xen PV

5 January 2013 by Mike Gogulski
Posted in technology | 3 Comments »

I’ve been helping the organizers of the unSYSTEM conference with some IT support and development work. Things are getting hot, so it was time to get off shared hosting and onto a VPS.

The VPS provider handed me a Xen VM just as I expected. What I didn’t expect was a three-year-old Linux kernel (2.6.32-5!?) and a 32-bit machine instead of a 64-bit. Okay, the 32-bitness isn’t a big deal, doesn’t matter for this application. The kernel needs upgrading, though, so I go and do my usual things like I do on other Xen VPSes I use. Turns out I can’t upgrade the kernel, since the disk partition where /boot lives before the initrd finishes is actually outside the domain of my VPS, and thus inaccessible. Huh. I sure as heck wouldn’t set up customer DomUs that way if I were the provider, but whatever. A very helpful admin got me running instead with PyGrub, so now I’m running a far more up-to-date 3.5.x kernel. Phew!

While I was investigating things before I got support involved I wanted to look at as much log data as I could, so naturally I was poking around extensively in /var/log to see what was going on. I found this:

root@nyancat:/var/log# tail /var/log/kern.log
[...]
Jan  3 13:26:03 unsystem kernel: Kernel logging (proc) stopped.
Jan  3 13:26:03 unsystem kernel: imklog 5.8.6, log source = /proc/kmsg started.
Jan  3 13:26:03 unsystem kernel: Cannot read proc file system: 1 - Operation
not permitted.
Jan  3 13:26:33 unsystem kernel: last message repeated 1879152 times
Jan  3 13:27:33 unsystem kernel: last message repeated 3727264 times
Jan  3 13:28:33 unsystem kernel: last message repeated 3768064 times
Jan  3 13:29:33 unsystem kernel: last message repeated 3756864 times
root@nyancat:/var/log#

Okay, so that’s decidedly not good. Lucky thing that rsyslogd deduplicates messages before writing them to disk, but even so the rsyslogd process was consuming 100% CPU on one of the cores in the machine (I hadn’t noticed, since performance with the single remaining core was fine. But still, this is bad stuff, so I had to dig in and fix it before I could use this system for, er, unSYSTEM. /proc/kmsg had permissions like:

root@nyancat:/var/log# ls -l /proc/kmsg
-r-------- 1 root root 0 Jan  3 11:05 /proc/kmsg

so I whipped out a can of

root@nyancat:/var/log# chmod 444 /proc/kmsg

… but no love. Try what I might, I was still getting the reports of millions of those EPERM messages.

Okay… so why does dmesg work while imklog is flailing?

root@nyancat:/var/log# strace dmesg > /dev/null
[...]
syslog(0xa, 0, 0)                       = 131072
syslog(0x3, 0x8900038, 0x20008)         = 12549
fstat64(1, {st_dev=makedev(202, 2), st_ino=368645,
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=0, st_size=0, st_atime=2013/01/03-13:37:27,
st_mtime=2013/01/03-13:37:29, st_ctime=2013/01/03-13:37:29}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7861000
write(1, "[    0.000000] Reserving virtual"..., 4096) = 4096
write(1, "ids:1 nr_node_ids:1\n[    0.00000"..., 4096) = 4096
write(1, "g initramfs...\n[    0.167294] Fr"..., 3742) = 3742
exit_group(0)
root@nyancat:/var/log#

dmesg works here for reading the kernel message log, but it’s doing it via a syslog(0x3, char *bufp, int len) call rather than by trying to read /proc/kmsg. Time to ask the internets WTF is going on here.

Turns out that this is a long-standing issue dating back at least to early 2010. Those feisty internets soon coughed up what seems to be the main bug description:

https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/523610

And some other bug pages:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=573980

https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/565288

The core of the problem goes something like this: some time in early 2010 the Ubuntu and/or Debian devs (I really don’t know which) decided that having rsyslogd running as root is a security risk. I can’t fault that thinking; it’s good practice for system utilities that even need to be launched as root to drop their privileges as soon as they’re able. So, the patch came in saying “Do your few initial root things, then seteuid(getpwnam(“syslog”)->pw_uid); (or whatever).” So, open /dev/kmsg (which you can do as root with no problem), then drop out of the system-privilege stratosphere to do your logging off the open file descriptor. Sounds good, right?

Problem is, /proc/kmsg ain’t a regular file. Nothing in /proc is a regular file, to be sure, but /proc/kmsg is extra special. One fine day, the Linux kernel devs came up with good reason to make sure that the kernel message buffer exposed in /proc could not be read by any non-root process. So, your fd = open(“/proc/kmsg”, “r”);, as root, succeeds, but as soon as you drop privileges the part of the kernel responsible for policing that buffer starts throwing EPERMs at you as soon as you attempt a read() on the file descriptor. Huh. Why bloody rsyslogd doesn’t adapt by using the syslog(3, …, …); method dmesg uses is a mystery for another day (i.e. never).

My eventual workaround for this was inspired by:

http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg761493.html:

root@nyancat:/var/log# mkdir -p /var/run/rsyslogd
root@nyancat:/var/log# mkfifo /var/run/rsyslogd/kmsg
root@nyancat:/var/log# chown -R syslog /var/run/rsyslogd
root@nyancat:/var/log# chmod -R 700 /var/run/rsyslogd
root@nyancat:/var/log# echo '$KLogPath /var/run/rsyslogd/kmsg' >>
/etc/rsyslog.conf
root@nyancat:/var/log# dd bs=1 if=/proc/kmsg of=/var/run/rsyslogd/kmsg &

What we’re doing here is creating a named pipe on the filesystem that we’ll use to shuttle data to rsyslogd from /proc/kmsg, using a long-running dd process in byte-at-a-time mode. Gross, but it works. The $KLogPath business added to rsyslog.conf tells the daemon to read its kernel messages off the named pipe we’ve created.

Test:

root@nyancat:~# cat > test.c << EOF
main() {
	char *x=0;
	puts(x);
}
EOF
root@nyancat:~# gcc test.c -o test
root@nyancat:~# ./test
Segmentation fault
root@nyancat:~# tail -1 /var/log/kern.log
Jan  3 14:32:55 unsystem kernel: [18497.165293] a.out[19891]: segfault
at 0 ip b7637ee1 sp bffe10ec error 4 in libc-2.15.so[b75ba000+1a3000]
root@nyancat:~#

Yay, it works. Now let’s make it survive a reboot, via an upstart job:

root@nyancat:/var/log# cat > /etc/init/kmsg-pipe.conf << EOF
#
# Ye Olde /proc/kmsg hack by Mike Gogulski
# from http://www.nostate.com/4228/fixing-the-100-cpu-and-no-useful-output-imklogrsyslog-kernel-logging-problem-on-ubuntu-guests-under-xen-pv
#
# This is free and unencumbered software released into the public domain under
# the terms of the Unlicense [http://unlicense.org/].
#
description	"/proc/kmsg pipe hack for rsyslogd"
start on started rsyslog
stop on stopped rsyslog
respawn
script
	mkdir -p /var/run/rsyslogd || true
	mkfifo /var/run/rsyslogd/kmsg || true
	chown -R syslog /var/run/rsyslogd || true
	chmod -R 700 /var/run/rsyslogd || true
	exec dd bs=1 if=/proc/kmsg of=/var/run/rsyslogd/kmsg
end script
EOF
root@nyancat:/var/log#

The $KLogPath entry must be in /etc/rsyslog.conf as shown above, or someplace under /etc/rsyslog.d/whatever.conf.

Test:

root@nyancat:/var/log# date
Sat Jan  5 01:24:36 CET 2013
root@nyancat:/var/log# service rsyslog stop
rsyslog stop/waiting
root@nyancat:/var/log# ps ax | grep kmsg | grep -v grep
root@nyancat:/var/log# service rsyslog start
rsyslog start/running, process 2288
root@nyancat:/var/log# ~/test
Segmentation fault
root@nyancat:/var/log# tail -1 /var/log/kern.log
Jan  5 01:25:00 nyancat kernel: [ 5629.864293] a.out[2297]: segfault
at 0 ip b7661ee1 sp bfb7e0cc error 4 in libc-2.15.so[b75e4000+1a3000]
root@nyancat:/var/log# !ps
ps ax | grep kmsg | grep -v grep
 2289 ?        Ss     0:00 dd bs=1 if=/proc/kmsg of=/var/run/rsyslogd/kmsg
root@nyancat:/var/log# kill 2289
root@nyancat:/var/log# !ps
ps ax | grep kmsg | grep -v grep
 2305 ?        Ss     0:00 dd bs=1 if=/proc/kmsg of=/var/run/rsyslogd/kmsg
root@nyancat:/var/log# !~
~/a.out
Segmentation fault
root@nyancat:/var/log# !t
tail -1 /var/log/kern.log
Jan  5 01:25:24 nyancat kernel: [ 5653.661550] a.out[2313]: segfault
at 0 ip b761aee1 sp bfe5ed5c error 4 in libc-2.15.so[b759d000+1a3000]
root@nyancat:/var/log#

Groovy. Enjoy!

PS: sudo is for sissies! Nyan!

  1. 3 Responses to “Fixing the “100% CPU and no useful output” imklog+rsyslog kernel logging problem on Ubuntu guests under Xen PV”

  2. By plastoc on 26 March 2013

    For once someone not just pointing to the bug report !
    It works like a charm…
    <3

  3. By pedro on 10 November 2013

    Does this solution create any bad side-effects? Aka, does it solves the CPU/writing problem well? I have the same problem and in your article the solution was “gross” so I wanted to see if it was good enough for a production server

  4. By Egor on 23 July 2014

    On Ubuntu 12.04.1 I uninstalled rsyslogd and install sysklogd with distrib configs and all working now. But it uses the same solution as author – here is the processes started:
    syslog … /sbin/syslogd -u syslog
    klog … /sbin/klogd -P /var/run/klogd/kmsg
    root … /bin/dd bs=1 if=/proc/kmsg of=/var/run/klogd/kmsg


    comments rss Comments RSS

Post a Comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

  • Categories

  • Archives

  • Core Dogma

  • LinkWorth

  • Other groovy stuff