_____________________________________________________________________
SUMMARY
Data processing solutions for cloud computing infrastructures, such as Map-Reduce and DryadLINQ, were designed for batch processing of bulk data and have proved inadequate for real-time processing. Latency can be improved by extending batch processing systems to expose partial computations and early results (e.g. Map-Reduce Online), or by pursuing incremental models. Such approaches can weaken resilience to faults. SwiftComp seeks to combine the low-latency benefits of soft-state-based decentralized computations with the fault tolerance guarantees of a backing store.
SwiftComp will design specialized computation CRDTs (Conflict-free Replicated Data Types), such as min/max registers, counters, accumulators, products, and sorted sets/maps, that perform computations as new data is added to the CRDT, thus performing incremental computations. Since updates to CRDTs happen primarily in in-memory replicas, computations will benefit from the low latency and high throughput of soft state. The project will leverage past experience in stream and incremental data processing and use Participatory Sensing (P/S) as a reference scenario. P/S exploits the ubiquity of sensor-capable mobile phones to capture information about the environment, users’ routines, etc.
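As an illustration only, the following Python sketch shows how two of the computation CRDTs named above (a max register and a counter) could update their result incrementally and be merged across replicas. The class and method names (MaxRegister, GCounter, add, increment, merge) are assumptions made for this example and are not taken from the SwiftComp project.

```python
from dataclasses import dataclass, field


@dataclass
class MaxRegister:
    """State-based CRDT that keeps the largest value seen so far."""
    value: float = float("-inf")

    def add(self, x: float) -> None:
        # The running maximum is updated as each new value arrives.
        self.value = max(self.value, x)

    def merge(self, other: "MaxRegister") -> None:
        # max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order.
        self.value = max(self.value, other.value)


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot maximum."""
    replica_id: str
    slots: dict = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # Each replica only ever writes to its own slot.
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        for rid, count in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), count)

    @property
    def total(self) -> int:
        return sum(self.slots.values())


# Two replicas receive different (hypothetical) sensor readings.
peak_a, peak_b = MaxRegister(), MaxRegister()
peak_a.add(3.0)
peak_a.add(7.0)
peak_b.add(5.0)

count_a, count_b = GCounter("a"), GCounter("b")
count_a.increment(2)
count_b.increment(3)

# Exchange state in both directions; merging twice changes nothing.
peak_a.merge(peak_b)
peak_b.merge(peak_a)
count_a.merge(count_b)
count_b.merge(count_a)
count_a.merge(count_b)
```

After the exchange both replicas hold the same maximum and the same total, without any coordination at update time.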
_____________________________________________________________________
SOLUTIONS
- Improving cloud computing architecture by designing data processing solutions which combine the low latency benefits of soft state based decentralized computations with the fault tolerance guarantees of a backing store
- Addressing real-time, incremental data processing by tightly integrating computations into a decentralized storage system, striving for both performance and resilience, to ensure deterministic computations in the presence of faults
- Using the Conflict-free Replicated Data Type (CRDT) abstraction to allow multiple replicas of the same data structure to be modified without coordination, guaranteeing that replicas can be merged later without the need for any conflict resolution policy
- Pursuing a programming model for combining computation and storage abstractions in a seamless way, including provisions for computations that process live streams and globally consistent snapshots of partial or whole datasets
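The combination of in-memory soft state with a backing store, as described in the points above, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: GSet is a simple grow-only set CRDT, and BackingStore is a hypothetical durable store simulated with an in-memory dictionary. Neither reflects the actual SwiftComp design.

```python
class GSet:
    """Grow-only set CRDT: merge is set union."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        # Union is commutative, associative and idempotent.
        self.items |= other.items

    def snapshot(self):
        # Independent copy of the current state.
        return GSet(self.items)


class BackingStore:
    """Hypothetical durable store, simulated here with a dict."""

    def __init__(self):
        self._saved = {}

    def checkpoint(self, key, state):
        self._saved[key] = state.snapshot()

    def load(self, key):
        return self._saved.get(key, GSet()).snapshot()


store = BackingStore()
replica_a, replica_b = GSet(), GSet()

# Updates are applied in memory first (low latency).
replica_a.add("reading-1")
replica_a.add("reading-2")
store.checkpoint("a", replica_a)   # occasional write to the backing store

replica_a.add("reading-3")         # in memory only, not yet checkpointed
replica_b.merge(replica_a)         # but already propagated to a peer
replica_b.add("reading-4")

# Simulated crash of replica A: its soft state is lost.
del replica_a

# Recovery: start from the last checkpoint, then merge a peer's state.
# Re-merging data already in the checkpoint is harmless (idempotence).
recovered = store.load("a")
recovered.merge(replica_b)
```

In this sketch the recovered replica ends up with all four readings, because the update that was never checkpointed had already reached a peer.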
_____________________________________________________________________
LINK: http://citi.di.fct.unl.pt/project/project.php?id=107
_____________________________________________________________________