The Corelatus Blog
E1/T1 and SDH/SONET telecommunications

How does TCP behave on an interrupted network?

Posted August 27th 2009

GTH E1/T1 modules are always controlled by a general-purpose server, usually some sort of unix machine. The server and GTH are connected by ethernet and communicate using TCP sockets. Normally, that ethernet connection is chosen to be simple and reliable, for instance by putting the server and the GTH in the same rack, connected to the same ethernet switch.

I experimented a bit to see what happens when that network is interrupted. To interrupt it in a reproducible way, I disabled and re-enabled the server's ethernet port for a known length of time while running a <recorder>. (A <recorder> sends all the data from an E1 timeslot, typically someone talking, to the server over a TCP socket at 8000 octets per second.)
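
To make the recorder concrete, here is a minimal Python sketch of the server side of one recorder connection. It assumes that the server listens and the GTH connects to it. How the recorder is pointed at the server is configured through the GTH API and isn't shown. The port 4000, the output path and both function names are hypothetical. The sketch copies everything it receives to a file until the GTH closes the socket, then reports how many seconds of audio that was.

```python
import socket

OCTETS_PER_SECOND = 8000  # one E1 timeslot: 8000 octets per second


def receive_recording(conn: socket.socket, out) -> float:
    """Copy a recorder's stream from conn to out until the peer closes it.

    Returns the amount of audio received, in seconds.
    """
    total = 0
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # an empty read means the peer (the GTH) closed its end
            break
        out.write(chunk)
        total += len(chunk)
    return total / OCTETS_PER_SECOND


def serve_once(port: int = 4000, path: str = "/tmp/recording.raw") -> None:
    """Accept one recorder connection on a hypothetical port and save the audio."""
    with socket.create_server(("", port)) as listener:
        conn, peer = listener.accept()
        with conn, open(path, "wb") as out:
            seconds = receive_recording(conn, out)
    print(f"{peer[0]}: {seconds:.3f} s of audio written to {path}")
```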

Capturing the ethernet packets

Here's what I did to capture traffic and interrupt the ethernet:

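  # In one terminal: capture everything on eth0 except ssh (port 22)
  sudo tcpdump -i eth0 -w /tmp/capture.pcap -s 0 not port 22
  # In another: take eth0 down for five seconds, then bring it back up
  sudo ifconfig eth0 down; sleep 5; sudo ifconfig eth0 up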
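
Wireshark finds the interruption for you, but underneath it is just a gap in the packet timestamps. As a sketch of that idea, here is a small Python reader for the classic libpcap file format, which is what tcpdump -w writes. It doesn't handle pcapng. The function names and the 0.5 s threshold are arbitrary choices for illustration.

```python
import struct

# Classic libpcap magic numbers: microsecond and nanosecond timestamps.
# The file's byte order is whichever order makes the magic read correctly.
MAGIC_USEC = 0xA1B2C3D4
MAGIC_NSEC = 0xA1B23C4D


def packet_times(data: bytes) -> list[float]:
    """Return the capture timestamp, in seconds, of every packet in a pcap file."""
    for endian in ("<", ">"):
        (magic,) = struct.unpack_from(endian + "I", data, 0)
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng isn't handled)")
    divisor = 1e6 if magic == MAGIC_USEC else 1e9
    times = []
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-packet header: seconds, fraction, captured length, original length
        sec, frac, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        times.append(sec + frac / divisor)
        offset += 16 + incl_len  # skip the header and the captured bytes
    return times


def find_gaps(times: list[float], threshold: float = 0.5) -> list[tuple[int, float]]:
    """Return (packet number, gap) for every gap longer than threshold seconds.

    Packet numbers count from 1, as in Wireshark, and identify the first
    packet after the gap.
    """
    return [
        (n + 1, later - earlier)
        for n, (earlier, later) in enumerate(zip(times, times[1:]), start=1)
        if later - earlier > threshold
    ]
```

For example, find_gaps(packet_times(open("/tmp/capture.pcap", "rb").read())) lists each silence together with the packet that ends it.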
  

A trace where traffic recovers in time to prevent an overrun

The GTH buffers about two seconds of timeslot traffic, so a 'sleep' of about one second in the command above won't result in an overrun. Here's what it looks like in Wireshark:
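
The reasoning here is simple arithmetic. While TCP can't deliver anything, the GTH keeps collecting 8000 octets per second from the timeslot, and it overruns once that backlog is bigger than its buffer. The Python sketch below works through the numbers. It treats the buffer as exactly two seconds (16000 octets), which is an assumption because the GTH's buffer is only "about" two seconds.

```python
OCTETS_PER_SECOND = 8000                # one timeslot of audio
BUFFER_OCTETS = 2 * OCTETS_PER_SECOND   # assumption: exactly two seconds


def backlog_octets(stall_seconds: float) -> int:
    """Octets piling up in the GTH while TCP delivers nothing."""
    return round(stall_seconds * OCTETS_PER_SECOND)


def overruns(stall_seconds: float) -> bool:
    """True if a stall this long would overflow the (assumed) buffer."""
    return backlog_octets(stall_seconds) > BUFFER_OCTETS


for stall in (1.0, 1.75, 5.0):
    print(f"{stall:4.2f} s stall: {backlog_octets(stall):5d} octets buffered, "
          f"overrun: {overruns(stall)}")
```

What matters is how long TCP stalls, not how long ethernet is down. As the traces below show, the stall can be noticeably longer than the outage.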

Packet   Time  Direction      Flags       Seq. #
------------------------------------------------
 133    7.596  GTH -> server  [PSH, ACK]  59393
 134    7.633  server -> GTH  [ACK]       1
 135    7.724  GTH -> server  [PSH, ACK]  60417
 136    7.761  server -> GTH  [ACK]       1
 137    7.852  GTH -> server  [PSH, ACK]  61441
 138    7.889  server -> GTH  [ACK]       1
 139    7.980  GTH -> server  [PSH, ACK]  62465
 140    8.017  server -> GTH  [ACK]       1
 141    8.108  GTH -> server  [PSH, ACK]  63489
 142    8.145  server -> GTH  [ACK]       1
 143    8.236  GTH -> server  [PSH, ACK]  64513
 144    8.273  server -> GTH  [ACK]       1
 145    8.364  GTH -> server  [PSH, ACK]  65537
 146    8.401  server -> GTH  [ACK]       1
 147   10.151  GTH -> server  [PSH, ACK]  66561
 148   10.151  server -> GTH  [ACK]       1
 149   10.151  GTH -> server  [ACK]       67585
 150   10.151  server -> GTH  [ACK]       1

Everything up to packet 146 is normal: the GTH (172.16.2.5) sends 8000 octets per second and the server (172.16.2.1) acknowledges them. The GTH happens to send them in chunks of 1024 octets, about eight times per second (one every 0.128 s). After packet 146, about 8.4 seconds into the capture, the ethernet interface went down and stayed down for about one second. The TCP stream resumed at 10.151 s, 1.75 s after the last packet before the outage, and then 'caught up' by sending many packets in quick succession.
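
The figures in this paragraph can be checked against the table. The short Python sketch below uses the GTH's data packets copied from the trace. It shows that every chunk is 1024 octets and that consecutive chunks are 0.128 s apart, which is exactly 1024/8000 s. It also shows that the 1.787 s between data packets 145 and 147 corresponds to a backlog of roughly 14300 octets, which fits inside a buffer of about two seconds (16000 octets).

```python
# GTH -> server data packets (the odd-numbered packets 133 to 147 in the
# trace): (time in s, Seq. #)
DATA_PACKETS = [
    (7.596, 59393), (7.724, 60417), (7.852, 61441), (7.980, 62465),
    (8.108, 63489), (8.236, 64513), (8.364, 65537), (10.151, 66561),
]
OCTETS_PER_SECOND = 8000

pairs = list(zip(DATA_PACKETS, DATA_PACKETS[1:]))
# Sequence numbers count octets, so their differences are chunk sizes.
(chunk,) = {b[1] - a[1] for a, b in pairs}  # fails unless all chunks match
# Timestamps in the table are rounded to milliseconds.
spacings = [round(b[0] - a[0], 3) for a, b in pairs]

print("chunk size:", chunk, "octets")
print("spacing expected from 8000 octets/s:", chunk / OCTETS_PER_SECOND, "s")
print("spacing observed before the outage:", spacings[:-1])
stall = spacings[-1]
print(f"stall: {stall} s, about {stall * OCTETS_PER_SECOND:.0f} octets buffered")
```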

A trace where traffic didn't recover

I took a second trace like the first one, except that this time I disabled ethernet for about five seconds:

Packet Time     Source IP     Dest IP    SPort   DPort Flags, Seq.
----------------------------------------------------------------------
 28   1.040083  172.16.2.5 -> 172.16.2.1 54271 > 45195 [PSH, ACK] Seq=7169
 29   1.040095  172.16.2.1 -> 172.16.2.5 45195 > 54271 [ACK] Seq=1
 30   1.168065  172.16.2.5 -> 172.16.2.1 54271 > 45195 [PSH, ACK] Seq=8193
 31   1.168078  172.16.2.1 -> 172.16.2.5 45195 > 54271 [ACK] Seq=1
 32   1.296067  172.16.2.5 -> 172.16.2.1 54271 > 45195 [PSH, ACK] Seq=9217
 33   1.296079  172.16.2.1 -> 172.16.2.5 45195 > 54271 [ACK] Seq=1
 34   1.424068  172.16.2.5 -> 172.16.2.1 54271 > 45195 [PSH, ACK] Seq=10241
 35   1.424081  172.16.2.1 -> 172.16.2.5 45195 > 54271 [ACK] Seq=1
 36   7.782851  172.16.2.5 -> 172.16.2.1 54271 > 45195 [PSH, ACK] Seq=11265
 37   7.782863  172.16.2.1 -> 172.16.2.5 45195 > 54271 [ACK] Seq=1
 38   7.783406  172.16.2.5 -> 172.16.2.1 54271 > 45195 [ACK] Seq=12289
 39   7.783413  172.16.2.1 -> 172.16.2.5 45195 > 54271 [ACK] Seq=1
 40   7.783569  172.16.2.5 -> 172.16.2.1 54271 > 45195 [ACK] Seq=13737
...
 50   7.784962  172.16.2.5 -> 172.16.2.1 54271 > 45195 [FIN, PSH, ACK] Seq=23873
 51   7.784972  172.16.2.1 -> 172.16.2.5 45195 > 54271 [ACK] Seq=1
 52   7.785026  172.16.2.1 -> 172.16.2.5 45195 > 54271 [FIN, ACK] Seq=1
 53   7.785348  172.16.2.5 -> 172.16.2.1 54271 > 45195 [ACK] Seq=25322

Everything is normal up to packet 35. Then ethernet is disabled for five seconds, and TCP takes roughly another second to recover: the stream only resumes at 7.78 s, about 6.4 s after packet 35. That is longer than the GTH (172.16.2.5) can buffer, so it overruns. The GTH sends a final burst of data, closes the socket (the FIN in packet 50) and also sends an overrun event to the application so that it knows why the socket was closed.
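
Why does TCP take longer to recover than the ethernet outage lasts? The traces don't say. One common cause is retransmission backoff. Each time a retransmitted segment is lost, the sender doubles its retransmission timeout (RTO) before trying again, as RFC 6298 specifies. As a result, the first retransmission after the link comes back can be sent well after the link itself recovers. The Python sketch below models this. It assumes an initial RTO of 0.2 s (Linux's minimum RTO; RFC 6298 recommends 1 s) and a 60 s cap. The GTH's real timer values aren't given here, and link renegotiation when the interface comes back up can add further delay.

```python
from collections.abc import Iterator


def retransmit_times(initial_rto: float = 0.2, max_rto: float = 60.0) -> Iterator[float]:
    """Yield when each retransmission is sent, in s after the first lost segment.

    The timeout doubles after every unanswered retransmission, as RFC 6298
    specifies, up to max_rto. Both default values are assumptions.
    """
    t, rto = 0.0, initial_rto
    while True:
        t += rto
        yield t
        rto = min(2 * rto, max_rto)


def resume_delay(outage: float, **timers: float) -> float:
    """Seconds between the link coming back and the first retransmission.

    Assumes the first lost segment was sent just as the outage began and
    that the first retransmission sent after the outage gets through.
    """
    for t in retransmit_times(**timers):
        if t >= outage:
            return t - outage


for outage in (1.0, 5.0):
    print(f"{outage:.0f} s outage: TCP resumes about "
          f"{resume_delay(outage):.1f} s after the link is back")
```

With these assumed values, a one-second outage leaves TCP idle for about another 0.4 s and a five-second outage for about another 1.2 s. That is the same order as the extra delay in the traces, but the real numbers depend on the GTH's TCP implementation.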

Bottom line

The GTH uses IP for both control and traffic, so it's important that the IP link between the GTH and the server is simple and reliable. Ideally, the GTH and the server are in the same rack and connected through the same ethernet switch.

A system can survive a short interruption to the ethernet traffic (less than about a second) without the calls being recorded getting interrupted. For longer interruptions, all bets are off.

(Interruptions aren't the only kind of network problem. For example, radio networks such as 802.11 can suffer significant packet loss, which can trigger TCP congestion avoidance. But that's another topic.)

Tags: GTH, questions-from-customers