Troubleshooting Checkpoint ClusterXL

I recently came across an issue where SmartView Monitor showed an error for ClusterXL on a freshly rebuilt Checkpoint IP565 firewall. Both Synchronization and Filter were stuck in an initializing state. We initially tried the following troubleshooting steps, to no avail:

  1. cphastop followed by cphastart
  2. cpstop followed by cpstart
  3. reboot of the affected firewall

On digging deeper, we noticed that one of the firewalls was configured to use multicast for cluster communications and the other broadcast. We identified this using the command ‘cphaprob -a if’, which produces the following output:

  eth-s1p3c0      non sync(non secured)
  eth-s4p3c0      non sync(non secured)
  eth-s4p4c0      non sync(non secured)
  eth-s1p1c0      non sync(non secured)
  eth-s1p4c0      sync(secured), multicast
  eth-s1p2c0      non sync(non secured)
  eth-s4p1c0      non sync(non secured)
  eth-s4p2c0      non sync(non secured)

  Virtual cluster interfaces: 7

  eth-s1p3c0      xx.xx.xx.xx
  eth-s4p3c0      xx.xx.xx.xx
  eth-s4p4c0      xx.xx.xx.xx
  eth-s1p1c0      xx.xx.xx.xx
  eth-s1p2c0      xx.xx.xx.xx
  eth-s4p1c0      xx.xx.xx.xx
  eth-s4p2c0      xx.xx.xx.xx
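
If you capture this output from both cluster members, the comparison can be scripted. The following is a minimal Python sketch, not a Check Point tool: the function names are made up for illustration, and it assumes that a member in broadcast mode prints the sync line in the same layout with ‘broadcast’ in place of ‘multicast’. How you collect the output from each member (SSH, console paste, etc.) is left out.

```python
import re

# Matches lines such as "eth-s1p4c0      sync(secured), multicast".
# Assumption: the broadcast variant uses the same layout.
SYNC_LINE = re.compile(r"\s*(\S+)\s+sync\(secured\),\s*(multicast|broadcast)\s*$")


def ccp_modes(cphaprob_if_output):
    """Return {interface: mode} for every sync interface in the output."""
    modes = {}
    for line in cphaprob_if_output.splitlines():
        match = SYNC_LINE.match(line)
        if match:
            modes[match.group(1)] = match.group(2)
    return modes


def ccp_mismatch(member_a_output, member_b_output):
    """True when the two members do not report the same set of CCP modes."""
    modes_a = set(ccp_modes(member_a_output).values())
    modes_b = set(ccp_modes(member_b_output).values())
    return modes_a != modes_b


# Sample text: member_a is taken from the output above,
# member_b is a hypothetical peer left in broadcast mode.
member_a = """
  eth-s1p3c0      non sync(non secured)
  eth-s1p4c0      sync(secured), multicast
  eth-s1p2c0      non sync(non secured)
"""
member_b = """
  eth-s1p3c0      non sync(non secured)
  eth-s1p4c0      sync(secured), broadcast
  eth-s1p2c0      non sync(non secured)
"""

print(ccp_modes(member_a))
print("mismatch:", ccp_mismatch(member_a, member_b))
```

A mismatch reported here corresponds to the situation we hit: one member on multicast and the other on broadcast.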

Both firewalls must be configured to use the same method of communication, which can be changed using ‘cphaconf set_ccp multicast’ or ‘cphaconf set_ccp broadcast’. Provided your switching infrastructure supports multicast, you should use that mode because of the performance overhead of broadcast communication. In our case the command failed to change the method of communication, which left us with no option other than to perform the following steps:

  1. Set the Checkpoint packages as inactive, then delete them, ensuring that the Connectra package is removed first.
  2. Re-install the Checkpoint R65 IPSO Wrapper
  3. Re-install HFA 70
  4. Re-establish SIC via CPConfig and SmartDashboard
  5. Unassign and re-assign license via SmartUpdate
  6. Push policy from the SmartDashboard

After performing these steps the cluster CCP was back to multicast (bizarre really…). We had to reboot the second device once this was completed, at which point both nodes of the cluster reported no ClusterXL errors and ‘cphaprob list’ showed the following output:

# cphaprob list

Registered Devices:

Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 213003 sec

Device Name: Filter
Registration number: 1
Timeout: none
Current state: OK
Time since last report: 213003 sec

Device Name: cphad
Registration number: 2
Timeout: 5 sec
Current state: OK
Time since last report: 0.7 sec

Device Name: fwd
Registration number: 3
Timeout: 5 sec
Current state: OK
Time since last report: 0.5 sec
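
Checking this output by eye is fine for four devices, but it can also be scripted. Below is a minimal Python sketch that reads the ‘Device Name’ and ‘Current state’ fields and lists any device that is not OK. The function names are hypothetical, and the ‘problem’ state in the sample is included purely for illustration.

```python
def device_states(cphaprob_list_output):
    """Return {device name: current state} parsed from 'cphaprob list' text."""
    states = {}
    name = None
    for line in cphaprob_list_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no "key: value" pair on this line
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            name = value
        elif key == "Current state" and name is not None:
            states[name] = value
    return states


def unhealthy(states):
    """Return the sorted names of devices whose state is anything but OK."""
    return sorted(name for name, state in states.items() if state != "OK")


# Shortened sample; the second device's state is made up for illustration.
sample = """
Registered Devices:

Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 213003 sec

Device Name: Filter
Registration number: 1
Timeout: none
Current state: problem
Time since last report: 213003 sec
"""

states = device_states(sample)
print(states)
print("not OK:", unhealthy(states))
```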

‘fw ctl pstat’ should also list the sync status as ‘Able to Send/Receive sync packets’:

# fw ctl pstat

Machine Capacity Summary:
  Memory used: 14% (90MB out of 637MB) – below low watermark
  Concurrent Connections: 26% (17876 out of 67900) – below low watermark
  Aggressive Aging is in monitor only

Hash kernel memory (hmem) statistics:
  Total memory allocated: 200278016 bytes in 48894 4KB blocks using 2 pools
  Initial memory allocated: 20971520 bytes (Hash memory extended by 179306496 bytes)
  Memory allocation  limit: 536870912 bytes using 10 pools
  Total memory bytes  used: 23487660   unused: 176790356 (88.27%)   peak: 34170776
  Total memory blocks used:     7126   unused:    41768 (85%)   peak:     9164
  Allocations: 1183931215 alloc, 0 failed alloc, 1183678473 free

System kernel memory (smem) statistics:
  Total memory  bytes  used: 250335916   peak: 300842432
    Blocking  memory  bytes   used:  1865892   peak:  2596156
    Non-Blocking memory bytes used: 248470024   peak: 298246276
  Allocations: 160033475 alloc, 0 failed alloc, 160032829 free, 0 failed free

Kernel memory (kmem) statistics:
  Total memory  bytes  used: 73389696   peak: 101169940
        Allocations: 1184023246 alloc, 0 failed alloc, 1183769860 free, 0 failed free
        External Allocations: 0 for packets, 0 for SXL

Kernel stacks:
        0 bytes total, 0 bytes stack size, 0 stacks,
        0 peak used, 0 max stack bytes used, 0 min stack bytes used,
        0 failed stack calls

        1029526467 packets, -2128289516 operations, 373013811 lookups,
        2035 record, 183665476 extract

        -1649393933 total, 0 alloc, 0 free,
        4607 dup, -1525329462 get, 138972711 put,
        -1565092568 len, 217535 cached len, 0 chain alloc,
        0 chain free

        54513276 total, 52537755 TCP, 1898998 UDP, 76506 ICMP,
        17 other, 49485065 anticipated, 1 recovered, 17882 concurrent,
        24286 peak concurrent

        213594 fragments, 105472 packets, 389 expired, 0 short,
        0 large, 0 duplicates, 0 failures

        23444077/0 forw, 29804768/0 bckw, 53234829 tcpudp,
        14016 icmp, 702040-723136 alloc

        Version: new
        Status: Able to Send/Receive sync packets
        Sync packets sent:
         total : 78286072,  retransmitted : 16171, retrans reqs : 20,  acks : 3
        Sync packets received:
         total : 17030603,  were queued : 16591, dropped by net : 15
         retrans reqs : 8840, received 3 acks
         retrans reqs for illegal seq : 0
         dropped updates as a result of sync overload: 0
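
Only the last few lines of that output matter for sync health. As a rough sketch, the Python below pulls the status line and the sent/retransmitted counters out of captured ‘fw ctl pstat’ text. The function name and the returned keys are my own, and the parsing assumes the layout shown above, which may differ between versions.

```python
import re

EXPECTED_STATUS = "Able to Send/Receive sync packets"


def sync_summary(pstat_output):
    """Extract the sync status and sent-packet counters from 'fw ctl pstat' text."""
    summary = {"status": None, "sent_total": None, "retransmitted": None}

    status = re.search(r"^\s*Status:\s*(.+?)\s*$", pstat_output, re.MULTILINE)
    if status:
        summary["status"] = status.group(1)

    # The counters sit on the line after "Sync packets sent:".
    sent = re.search(
        r"Sync packets sent:\s*total\s*:\s*(\d+),\s*retransmitted\s*:\s*(\d+)",
        pstat_output,
    )
    if sent:
        summary["sent_total"] = int(sent.group(1))
        summary["retransmitted"] = int(sent.group(2))

    summary["sync_ok"] = summary["status"] == EXPECTED_STATUS
    return summary


# Sample lines copied from the output above.
sample = """
        Version: new
        Status: Able to Send/Receive sync packets
        Sync packets sent:
         total : 78286072,  retransmitted : 16171, retrans reqs : 20,  acks : 3
"""

print(sync_summary(sample))
```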