Theses Doctoral

On SIP Server Clusters and the Migration to Cloud Computing Platforms

Kim, Jong Yul

This thesis looks in depth at telephony server clusters, the modern switchboards at the core of a packet-based telephony service. The most widely used de facto standard protocols for telecommunications are the Session Initiation Protocol (SIP) and the Real-time Transport Protocol (RTP). SIP is a signaling protocol used to establish, maintain, and tear down communication channels between two or more parties. RTP is a media delivery protocol that allows packets to carry digitized voice, video, or text.
SIP telephony server clusters that provide communications services, such as an emergency calling service, must be scalable and highly available. We evaluate existing commercial and open source telephony server clusters to see how they differ in scalability and high availability.
We also investigate how a scalable SIP server cluster can be built on a cloud computing platform. Elasticity of resources is an attractive property for SIP server clusters because it allows the cluster to grow or shrink organically based on traffic load. However, simply deploying existing clusters to cloud computing platforms is not enough to take full advantage of elasticity. We explore the design and implementation of clusters that scale in real time. The database tier of our cluster was modified to use a scalable key-value store so that the SIP proxy tier and the database tier can scale independently of each other. Load monitoring and reactive threshold-based scaling logic are presented and evaluated.
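As a rough illustration of what reactive threshold-based scaling can look like, the sketch below decides whether a tier should add or remove one node based on the mean load reported by its nodes. It is not the thesis's implementation: the names `ScalingPolicy` and `desired_nodes`, the threshold values, and the one-node-at-a-time step are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalingPolicy:
    # All values below are illustrative assumptions, not figures from the thesis.
    scale_out_above: float = 0.75  # mean load above which one node is added
    scale_in_below: float = 0.30   # mean load below which one node is removed
    min_nodes: int = 1
    max_nodes: int = 10


def desired_nodes(current_nodes: int, loads: Sequence[float],
                  policy: ScalingPolicy) -> int:
    """Return the node count a tier should have after one monitoring interval.

    `loads` holds one utilization sample (0.0 to 1.0) per node in the tier.
    """
    if not loads:
        # No measurements this interval: leave the tier unchanged.
        return current_nodes
    mean_load = sum(loads) / len(loads)
    if mean_load > policy.scale_out_above:
        return min(current_nodes + 1, policy.max_nodes)
    if mean_load < policy.scale_in_below:
        return max(current_nodes - 1, policy.min_nodes)
    return current_nodes
```

Because the function takes the loads of a single tier, the proxy tier and the database tier could each be evaluated with their own policy, which matches the idea of scaling the two tiers separately.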
Server clusters also need to reduce processing latency. Otherwise, subscribers experience poor quality of service such as delayed call establishment, dropped calls, and inadequate media quality. Cloud computing platforms do not guarantee latency on virtual machines because virtual machines on the same physical host contend for resources. The extra latency caused by this contention is temporary in nature. Therefore, we propose and evaluate a mechanism that temporarily distributes more incoming calls to responsive SIP proxies, based on measurements of the processing delay in proxies.
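One simple way to favor responsive proxies is to weight each proxy by the inverse of its recently measured processing delay. The sketch below shows that idea only; the abstract does not say which weighting the thesis uses, so the inverse-delay rule, the function names `dispatch_weights` and `pick_proxy`, and the `floor_ms` parameter are assumptions made for illustration.

```python
import random
from typing import Dict, Optional


def dispatch_weights(delays_ms: Dict[str, float],
                     floor_ms: float = 1.0) -> Dict[str, float]:
    """Map each proxy name to the share of new calls it should receive.

    Assumed rule: weight is proportional to 1 / measured delay, so a proxy
    that is four times slower receives a quarter of the weight.
    `floor_ms` avoids division by zero for very small measurements.
    """
    if not delays_ms:
        raise ValueError("at least one proxy measurement is required")
    inverse = {name: 1.0 / max(delay, floor_ms)
               for name, delay in delays_ms.items()}
    total = sum(inverse.values())
    return {name: weight / total for name, weight in inverse.items()}


def pick_proxy(delays_ms: Dict[str, float],
               rng: Optional[random.Random] = None) -> str:
    """Choose a proxy for the next incoming call using the weights above."""
    generator = rng if rng is not None else random.Random()
    weights = dispatch_weights(delays_ms)
    names = list(weights)
    return generator.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Since the weights are recomputed from fresh measurements on every call to `dispatch_weights`, a proxy that slows down only briefly regains its share of traffic once its measured delay drops again.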
Availability of SIP server clusters is also a challenge on platforms where a node may fail at any time. We investigated how single component failures in a cluster can lead to a complete system outage. We found that for single component failures, simply having redundant components of the same type is enough to mask those failures. However, for client-facing components, smarter clients and DNS resolvers are necessary.
Throughout the thesis, a prototype SIP proxy cluster is reused, with variations in the architecture or configuration, to demonstrate and address the issues mentioned above. This allows us to tie all of our approaches for different issues into one coherent system that is dynamically scalable, is responsive despite latency variations of virtual machines, and is tolerant of single component failures in cloud platforms.



More About This Work

Academic Units
Computer Science
Thesis Advisors
Schulzrinne, Henning G.
Ph.D., Columbia University
Published Here
April 21, 2016