Thesis Defense
"Fast and Highly-Available Stream Processing"
Jeong-Hyon Hwang
Wednesday, July 16, 2008 at 3:00 P.M.
Room 368 (CIT 3rd floor)
Recently, there has been significant interest in applications where high-volume, continuous data streams need to be processed with low latency. Such applications include financial market monitoring, network monitoring, intrusion detection, call analysis, battlefield monitoring, asset tracking, and ecosystem monitoring. Because these applications monitor real-time events, the value of a result decays rapidly over time, which makes low-latency processing a key requirement.
Stream processing systems enable efficient implementation of these applications. Currently, many such systems are geared toward distributed processing because a large number of applications inherently involve geographically dispersed data sources and because the processing capability of a system improves as more servers are used. However, the more computation and communication resources a system uses, the higher the odds of failure. In stream processing, a failure prevents low-latency processing because it blocks the flow of data streams. To make matters worse, it may also result in the loss of data essential to producing correct results.
In this dissertation, we propose various techniques that realize both reliable and timely processing of data streams in the face of server and network failures. We first discuss our basic recovery models, comparing them in terms of recovery speed, CPU and network utilization, and their relationship to various recovery semantics. Next, we describe a fast recovery technique for commodity server clusters. In this technique, operators on each server are backed up on different servers and thus can be recovered in parallel. This technique assigns backup servers and schedules checkpoints in a manner that maximizes the recovery speed. Finally, we discuss our approach to Internet-scale stream processing. In this approach, multiple operator replicas send outputs to downstream replicas, allowing each replica to use whichever data arrives first. To further reduce latency, replicas run without coordination, possibly processing data in different orders. Despite this relaxation, the approach guarantees that applications always receive the same results as in the non-replicated, failure-free case. It also deploys replicas at locations that effectively improve performance and availability. Our experimental results demonstrate the effectiveness of these approaches. The results were obtained from a server cluster at Brown University and a worldwide network testbed called PlanetLab.
Host: Stan Zdonik