CCIE Security: On the rocks - Troubleshooting (mainly ISE)


OK, so it's not really ISE per se, but clearly something is a bit foobar.

Thankfully I can Wireshark it!

Let's start with the ISE server. This is the traffic going out:

[Image: Debugging ICMP with Wireshark]

See, nothing coming back.

So let's head over to the server and see if anything gets to that:

[Image: Debugging ICMP with Wireshark]

The server gets the traffic, and sends it out again.

The logical next step would be for SW1 to send it (directly) to SW4, but it doesn't. It sends it to SW3:

[Image: Debugging ICMP with Wireshark]

So far, so good.

But somewhere (i.e. SW3), the traffic does not get sent back to SW4:

[Image: Debugging ICMP with Wireshark]

For some reason, the reply traffic is actually going up to SW2:

[Image: Debugging ICMP with Wireshark]

But the replies never go back down to SW4. I won't include a screenshot of this, but the filtered Wireshark capture was empty.

Shutting down the connection to SW2 didn't help either. Things did look very promising when I connected the laptop to the Wi-Fi, but then it just wouldn't connect, and the same issues appeared again.

This isn't working much better either...

[Image: Bridging in UNetLab]

Grrr.

I think I will swap the switches and see if that helps. Time to shut everything down and re-build.

After a (quick) rebuild...

[Image: IOL images in UNetLab]

It's now running IOL images, and we STILL have the same problem.

What is making it worse is that the switches will turn themselves off:

[Image: Bad images turn off in UNetLab]
Thanks for that, switch.

So, another rebuild, back to vIOS, with a slightly older image now. Let's see how this one fares. I will be back in a couple of hours...

So, how do things look now? Well, sadly, not all that great. I even moved the AD server to be on the same switch as the ISE:

[Image: Moving things around]

It still craps out:
ISE20/admin# ping 10.1.4.100
PING 10.1.4.100 (10.1.4.100) 56(84) bytes of data.
64 bytes from 10.1.4.100: icmp_seq=1 ttl=127 time=7.11 ms
64 bytes from 10.1.4.100: icmp_seq=2 ttl=127 time=7.74 ms
64 bytes from 10.1.4.100: icmp_seq=3 ttl=127 time=9.45 ms
64 bytes from 10.1.4.100: icmp_seq=4 ttl=127 time=6.95 ms

--- 10.1.4.100 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3012ms
rtt min/avg/max/mdev = 6.954/7.816/9.456/0.997 ms

ISE20/admin# ping 10.1.4.100
PING 10.1.4.100 (10.1.4.100) 56(84) bytes of data.

--- 10.1.4.100 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 33000ms


ISE20/admin# ping 10.1.4.100
PING 10.1.4.100 (10.1.4.100) 56(84) bytes of data.

--- 10.1.4.100 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 33000ms


ISE20/admin#
How bad does this suck? Even more so as I have also lost access to the vWLC, and need to fix that again. So, how to fix this? Windows firewall is off, so that is not the problem. Instead, I moved the AD server back to SW1 and dual-homed it, with a second connection (192.168.90.100) into SW4.
ISE20/admin# ping 192.168.90.100
PING 192.168.90.100 (192.168.90.100) 56(84) bytes of data.
64 bytes from 192.168.90.100: icmp_seq=1 ttl=128 time=9.79 ms
64 bytes from 192.168.90.100: icmp_seq=2 ttl=128 time=2.20 ms
64 bytes from 192.168.90.100: icmp_seq=3 ttl=128 time=2.61 ms
64 bytes from 192.168.90.100: icmp_seq=4 ttl=128 time=2.72 ms

--- 192.168.90.100 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3008ms
rtt min/avg/max/mdev = 2.205/4.335/9.793/3.157 ms

ISE20/admin# ping 192.168.90.100
PING 192.168.90.100 (192.168.90.100) 56(84) bytes of data.
64 bytes from 192.168.90.100: icmp_seq=1 ttl=128 time=5.93 ms
64 bytes from 192.168.90.100: icmp_seq=2 ttl=128 time=9.14 ms
64 bytes from 192.168.90.100: icmp_seq=3 ttl=128 time=3.17 ms
64 bytes from 192.168.90.100: icmp_seq=4 ttl=128 time=2.86 ms

--- 192.168.90.100 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 2.861/5.281/9.148/2.533 ms

ISE20/admin#
Let's give it a few minutes...
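
Rather than eyeballing each run while waiting, the summary lines can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and it assumes the Linux-style ping summary format shown in the output above.

```python
import re

# Matches the summary line printed at the end of each ping run, e.g.
# "4 packets transmitted, 0 received, 100% packet loss, time 33000ms"
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received, (\d+)% packet loss"
)

def loss_per_run(transcript):
    """Return the packet-loss percentage of every ping run in the transcript."""
    return [int(m.group(3)) for m in SUMMARY_RE.finditer(transcript)]

def is_flapping(losses):
    """True if at least one run was clean and at least one lost everything."""
    return any(l == 0 for l in losses) and any(l == 100 for l in losses)

# Summary lines copied from the three runs against 10.1.4.100
sample = """\
4 packets transmitted, 4 received, 0% packet loss, time 3012ms
4 packets transmitted, 0 received, 100% packet loss, time 33000ms
4 packets transmitted, 0 received, 100% packet loss, time 33000ms
"""

losses = loss_per_run(sample)
print(losses, is_flapping(losses))  # [0, 100, 100] True
```

A link that works for one run and then drops everything for the next two shows up as "flapping", which is the pattern seen towards 10.1.4.100. Two clean runs in a row, as seen towards 192.168.90.100, do not.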

Seems to be stable. I have fixed (hopefully) the vWLC (the port you can see in CDP on the switch needs to be an access port, not a trunk), and the ISE can ping the AD box, and all the checks pass.

Changing the switch image (instead of the platform) would have been much easier, but let's see if this is fixed before celebrating.

I did have one theory when in the shower this morning: that the issue could stem from HSRP. I haven't put HSRP back in; instead each switch gets a VIF with its own IP address: 10.1.4.1, 192.168.90.1 for SW1 and so on. HSRP is not that great in a virtualized environment!

The vWLC can see the AP, but I am not seeing the WLANs. I think I need to switch the AP back to FlexConnect mode instead of local.

Will find out tonight when I get home.

Upon returning home, the ISE looks to be in a much better state. However, the issue now seems to have moved to the vWLC. I changed the AP back to FlexConnect mode and I can see the WLANs again, but the vWLC seems to lose contact. Same issue, different device. I think this might have something to do with it:
SW2(config-if)#
SW2(config-if)#
-Traceback= 1DBB7C8z 8DBFE5z 90522Ez 904F50z 904D5Dz 900F45z 901B7Bz 901B0Fz 7A8738z 7F8D8Dz - Process "Net Input", CPU hog, PC 0x008FD5AD

-Traceback= 1DBB7C8z 8DBFE5z 90522Ez 904F50z 904D5Dz 900F45z 901B7Bz 901B0Fz 7A8738z 7F8D8Dz - Process "Net Input", CPU hog, PC 0x008FD5B5
SW2(config)#
*Jun  3 18:14:17.110: %SYS-3-CPUHOG: Task is running for (1997)msecs, more than (2000)msecs (0/0),process = Net Input.
*Jun  3 18:14:19.110: %SYS-3-CPUHOG: Task is running for (3997)msecs, more than (2000)msecs (0/0),process = Net Input.
SW2(config-router)#
So I rebooted all the switches. Things look better (again). I even get as far as being prompted for the ISE certificate when trying to connect to the CCIE.Sec-Admin WLAN. It doesn't actually connect, but still, we do have progress. Clearly there are issues, but I think rebooting the switches (rather than wasting time redesigning the LAN) is the way around it at the moment, at least until I can fix the CPU hog issue with spanning-tree. Moving to a solely layer-3 design would be the best solution here.
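
To get a feel for how often a switch is complaining, and about which process, the CPUHOG lines from a saved console log can be tallied. The following is a minimal sketch, assuming the log has been saved as plain text and that the messages follow the same format as the two lines shown above. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches IOS messages such as:
# "%SYS-3-CPUHOG: Task is running for (1997)msecs, more than (2000)msecs (0/0),process = Net Input."
CPUHOG_RE = re.compile(
    r"%SYS-3-CPUHOG: Task is running for \((\d+)\)msecs.*?process = (.+?)\.\s*$"
)

def cpuhog_summary(log_text):
    """Count CPUHOG messages per process and record the longest run time seen."""
    counts = Counter()
    longest = {}
    for line in log_text.splitlines():
        m = CPUHOG_RE.search(line)
        if m:
            msecs, process = int(m.group(1)), m.group(2)
            counts[process] += 1
            longest[process] = max(longest.get(process, 0), msecs)
    return counts, longest

# The two log lines from SW2, copied from the console output
sample = """\
*Jun  3 18:14:17.110: %SYS-3-CPUHOG: Task is running for (1997)msecs, more than (2000)msecs (0/0),process = Net Input.
*Jun  3 18:14:19.110: %SYS-3-CPUHOG: Task is running for (3997)msecs, more than (2000)msecs (0/0),process = Net Input.
"""

counts, longest = cpuhog_summary(sample)
print(counts["Net Input"], longest["Net Input"])  # 2 3997
```

If the count for "Net Input" keeps climbing after a reboot, that would suggest the reboot only buys time rather than fixing anything.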

The final task is to actually get something connected via the ISE server, which will be the next post, once I can think up a suitable ISE/ICE-based pun.

Edit:

Ahhh, fuck!!!
(Cisco Controller) >ping 192.168.90.205

Send count=3, Receive count=0 from 192.168.90.205

(Cisco Controller) >
I might just replace all the switches with a single Arista switch.

Enough for now. I will just leave this here to outline my current feelings.

[Image: condescending wonka]

CCIE #49337, author of CCNA and Beyond, BGP for Cisco Networks, MPLS for Cisco Networks, VPNs and NAT for Cisco Networks.

3 comments

Gaz
8 June 2016 at 08:57

Stu,

What IOS versions were these vIOS and IOL switches? I'm still doing R&S but I've also found issues when labbing out a few advanced scenarios.

I didn't want to invest much more time if you're seeing these kinds of issues on Cisco L2 IOL/vIOS devices. I'll have to fire up an L2 device and check my versions, but it's the one from the latest VIRL release. Just wondering if you have the same device and are seeing these issues? In fact I have it here: it's the IOSvL2 - 15.2.4055 DSGS image.

I've had some success by disabling IGMP snooping on these virtual L2 devices and was wondering if you'd tried that.

8 June 2016 at 12:37

Hey Gaz,

I used that image as well. I didn't try disabling IGMP snooping though... But vEOS (Arista) is working well for me at the moment.

Gaz
12 June 2016 at 07:26

Glad to hear it's working a lot better for you. Not much is more frustrating than troubleshooting a faulty image when you're trying to get something else working. I've not tried those switches out yet, but I'd heard good things about them when run virtually, so I'll have to give them a try.
