Wed Jan 18, 2012 3:30 am
I guess this is the right thread to share my findings.
Today I replaced my old RB411 with a brand new RB411UAHR, because the old one was restarting quite often (triggered by the watchdog timer) and every restart was preceded by huge CPU usage from the management process (I also needed the second radio). Since the RB411 has a slow CPU, it sat at 100%, with management taking as much as it could get and other processes using almost nothing. With the faster RB411UAHR it looked a little different: management took ~70%, snmp ~20%, and 10% was idle. The relatively big share for snmp made me investigate it.
And sure enough, in my case it's definitely SNMP-related. While it was happening, either disabling SNMP on the RB or blocking incoming SNMP packets in the firewall stopped it immediately, and the router went back to >95% idle. Another thing I quickly noticed was ~500 incoming SNMP packets every second (normally it's 10-20 every few seconds). SNMP logging on the RB revealed that the queries produce some error and the Dude retries them again and again in an infinite loop. Unfortunately I didn't write the error down before I "fixed" it, but it was something like an invalid or missing value, or something not found. What I do know for certain is that it was happening in the DHCP leases table (.1.3.6.1.2.1.9999.1.1.6.4.1). When I tried to snmpwalk the RB with a different tool, it also got stuck in the same infinite loop, at OID 1.3.6.1.2.1.9999.1.1.6.4.1.4.192.168.82.50. A manual snmpwalk in the Dude did not get stuck, but under dhcpv4ServerClientEntry the only item shown was dhcpv4ServerClientLeaseType, and all the others that are usually present were missing.
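For anyone who wants to poll this table without hammering the router: a walk can protect itself against this kind of loop. Below is a rough Python sketch of the idea, not what the Dude or any real SNMP tool does internally. It assumes the loop comes from the agent answering a GETNEXT with an OID that is not greater than the one requested (I did not verify that this is what ROS was doing). The names guarded_walk, make_fake_agent and WalkError are made up for illustration, and the "agent" is just an in-memory fake so the sketch runs without a router.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Oid = Tuple[int, ...]
# A GETNEXT-style callable: takes an OID, returns (next_oid, value) or None at end of MIB.
GetNext = Callable[[Oid], Optional[Tuple[Oid, str]]]


class WalkError(Exception):
    """Raised when the walk would otherwise loop forever."""


def parse_oid(text: str) -> Oid:
    """Turn '1.3.6.1' or '.1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def guarded_walk(get_next: GetNext, root: Oid,
                 max_steps: int = 10000) -> Iterator[Tuple[Oid, str]]:
    """Walk the subtree under root, stopping instead of looping forever."""
    current = root
    for _ in range(max_steps):
        result = get_next(current)
        if result is None:
            return  # agent reported end of MIB
        next_oid, value = result
        if next_oid[:len(root)] != root:
            return  # left the subtree we were asked to walk
        # Tuple comparison is lexicographic, which matches OID ordering.
        if next_oid <= current:
            raise WalkError("OID not increasing at %s" % ".".join(map(str, current)))
        yield next_oid, value
        current = next_oid
    raise WalkError("gave up after %d steps" % max_steps)


def make_fake_agent(oids: List[Oid], stuck_at: Optional[Oid] = None) -> GetNext:
    """Hypothetical in-memory agent; answers with stuck_at itself when asked about it."""
    ordered = sorted(oids)

    def get_next(oid: Oid) -> Optional[Tuple[Oid, str]]:
        if stuck_at is not None and oid == stuck_at:
            return stuck_at, "broken-row"  # simulated misbehaviour
        for candidate in ordered:
            if candidate > oid:
                return candidate, "value"
        return None

    return get_next


if __name__ == "__main__":
    root = parse_oid(".1.3.6.1.2.1.9999.1.1.6.4.1")
    rows = [parse_oid("1.3.6.1.2.1.9999.1.1.6.4.1.4.192.168.82.%d" % n)
            for n in (49, 50, 51)]
    agent = make_fake_agent(rows, stuck_at=rows[1])
    try:
        for oid, value in guarded_walk(agent, root):
            print(".".join(map(str, oid)), value)
    except WalkError as err:
        print("walk aborted:", err)
```

The point is only that the poller gives up on a broken row instead of retrying hundreds of times per second; it does nothing about whatever is wrong in the lease table itself.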
The "fix" in my case was to remove all dynamic DHCP leases and to disable and then re-enable all static ones. Since then it hasn't happened again. Well, so far... but I'm pretty sure it will come back when least needed.
ROS 5.11 (and a few older 5.x versions before that), The Dude 4.0beta3. Not sure whose fault it is.