Splunk high CPU utilization starting July 1, 2012
Posted by Admin • Thursday, July 5. 2012 • Category: Linux
This has to be written down for posterity: My single server installation of splunk started using all of the CPU about then. Nagios caught it, but I had no time to debug it. I tried disabling indexes, apps, and inputs - none of that made any difference. I reduced an index size, which again had no effect. I cleaned it out and installed it fresh (empty database), only to see the now familiar steady CPU usage.
In desperation I went to open a question on splunk support forums where it suggested I read a related thread which in fact had the solution: http://splunk-base.splunk.com/answers/52109/universal-forwarder-high-cpu-after-leap-second-correction Apparently the culprit is the leap year second inserted at midnight on June 30. Why that makes splunk go nuts I'm not quite sure, but i did see a warning about it in my dmesg:
Apparently this is the same problem that took down Amazon. The solution involves running a perl command to alter the system date ever so slightly in the absence of ntp.
In desperation I went to open a question on splunk support forums where it suggested I read a related thread which in fact had the solution: http://splunk-base.splunk.com/answers/52109/universal-forwarder-high-cpu-after-leap-second-correction Apparently the culprit is the leap year second inserted at midnight on June 30. Why that makes splunk go nuts I'm not quite sure, but i did see a warning about it in my dmesg:
[3073222.768708] Clock: inserting leap second 23:59:60 UTC
Apparently this is the same problem that took down Amazon. The solution involves running a perl command to alter the system date ever so slightly in the absence of ntp.