High network retransmission rate as root cause

Understanding retransmission rate

When a network link or segment is overloaded or underperforming, it drops data packets. This happens because the queues of overloaded network equipment are purged during periods of excessive traffic or when hardware resources are limited. In response, TCP attempts to recover by retransmitting the dropped packets.

Ideally, retransmission rates should not exceed 0.5% on local area networks and 2% on Internet or cloud-based networks. Retransmission rates above 3% will negatively affect user experience in most modern applications. Retransmission issues are especially noticeable to customers using mobile devices in areas with poor network coverage.
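To make these thresholds concrete, here is a minimal Python sketch of how a retransmission rate could be estimated and compared against them. It is not how Dynatrace measures the rate; that is not described here. The sketch assumes a Linux host, where /proc/net/snmp exposes the cumulative TCP counters OutSegs and RetransSegs, and it approximates the rate as retransmitted segments divided by sent segments over a sampling interval. The function names and the classification labels are hypothetical and chosen only for illustration.

```python
def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp-style text as a dict."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a 'Tcp:' header line and a 'Tcp:' value line")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def retransmission_rate(before, after):
    """Approximate retransmission rate (percent) between two counter samples."""
    sent = after["OutSegs"] - before["OutSegs"]
    retransmitted = after["RetransSegs"] - before["RetransSegs"]
    if sent <= 0:
        return 0.0  # nothing was sent in the interval, so there is no rate to report
    return 100.0 * retransmitted / sent


def classify(rate_percent, network="lan"):
    """Compare a rate with the guideline values: 0.5% (LAN), 2% (Internet/cloud), 3%."""
    if rate_percent > 3.0:
        return "degraded"   # above 3%: user experience is likely affected
    ideal_limit = 0.5 if network == "lan" else 2.0
    if rate_percent > ideal_limit:
        return "elevated"   # above the ideal limit for this network type
    return "ok"
```

On a Linux host, the two samples could be taken by reading /proc/net/snmp twice, some seconds apart, and passing each result through parse_tcp_counters. For instance, 80 retransmitted segments out of 2,000 sent segments in the interval gives a rate of 4%, which this sketch labels "degraded" for either network type.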

See the example below at 04:00 on the timeline. This period of high retransmission dramatically increased the duration of each user action and reduced the number of user actions per minute. Although applications vary in their sensitivity to poor network connection quality, such a condition is unlikely to be detectable only at the infrastructure level. It will also affect the response time of your application’s services and ultimately degrade user experience in the form of increased load times.

The problem is detected

Dynatrace detects this problem and monitors its severity across the infrastructure, service, and real user monitoring layers, showing you how this infrastructure issue translates into user experience problems for your customers.

Below is an example of packet loss causing a high TCP retransmission rate on an Apache web server. The high retransmission rate increases service response time because the server stack needs more time to retransmit the missing data packets. This ultimately affects end-user experience because users have to wait longer for their web pages to load.

Root cause analysis of the problem

Dynatrace correlates these incidents across the infrastructure, service, and user experience layers and presents its root cause analysis. See the Root cause section of the Problem page below, where retransmission rate is shown to be the root cause of this problem.