The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models

February 18, 2024

A fundamental challenge in computer science is designing efficient data structures. Given how important and time-consuming this is, the authors of The Data Calculator (SIGMOD 2018) ask whether parts of the process can be automated. While I do not find their system applicable to the general problem of data structure design, it has some neat ideas, many of which may be useful for automatically optimizing and tuning implementations.

EdgeKV: Distributed Key-Value Store for the Network Edge

August 16, 2021

To avoid the communication bottleneck caused by cloud-centric data storage, the authors of this paper propose EdgeKV, a general-purpose storage system, as a decentralized alternative. Unlike other edge storage systems in the current literature, EdgeKV provides strong consistency guarantees. The fundamental idea of EdgeKV is to store key-value pairs in a Distributed Hash Table (DHT) spread over a set of edge servers, referred to as gateway nodes. Each gateway node is connected to exactly one nearby group of edge nodes, and no two gateway nodes share a group; a group is a cluster of edge devices in close proximity to one another. Instead of storing key-value pairs on the gateway nodes directly, each gateway node stores its data on a Replicated State Machine (RSM) formed over the nodes of its group.
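To make the two-level layout concrete, here is a minimal sketch in Python. It is my own illustration, not code from the paper: the names `Group`, `Gateway`, and `Ring` are hypothetical, the DHT is modeled as a simple hash ring keyed by SHA-1, and the RSM is reduced to "apply every write to every replica" with no consensus protocol, failures, or networking.

```python
import hashlib
from bisect import bisect_right


def ring_position(name: str) -> int:
    """Map a string to a point on the hash ring (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class Group:
    """Stand-in for an RSM: every replica applies the same writes in order.

    Assumption: a real RSM would agree on the write order via consensus,
    which this sketch does not model.
    """

    def __init__(self, replica_count: int):
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def get(self, key):
        # All replicas hold the same state, so any one of them can answer.
        return self.replicas[0].get(key)


class Gateway:
    """A gateway node that delegates storage to its own group."""

    def __init__(self, name: str, group: Group):
        self.name = name
        self.group = group


class Ring:
    """Hash ring over gateways; a key belongs to the next gateway clockwise."""

    def __init__(self, gateways):
        points = sorted(
            ((ring_position(g.name), g) for g in gateways),
            key=lambda point: point[0],
        )
        self.positions = [position for position, _ in points]
        self.gateways = [gateway for _, gateway in points]

    def owner(self, key: str) -> Gateway:
        index = bisect_right(self.positions, ring_position(key))
        return self.gateways[index % len(self.gateways)]

    def put(self, key: str, value):
        self.owner(key).group.put(key, value)

    def get(self, key: str):
        return self.owner(key).group.get(key)


# Three gateways, each backed by a group of three edge devices.
store = Ring([Gateway(f"gateway-{i}", Group(3)) for i in range(3)])
store.put("read-001", "ACGT")
```

A lookup hashes the key to find the responsible gateway, and the gateway hands the request to its group, so the gateway itself holds no key-value data.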

About

I am a PhD candidate at the University at Buffalo. I am interested in applied algorithms and data structures for managing massive volumes of data in resource-constrained environments. My research focuses on problems with applications to scientific computing, especially those important to computational biologists. My goals are to (i) develop the theoretical tools needed by system designers and (ii) apply these tools by implementing full-scale systems for scientific applications. My current focus is on edge computing for real-time DNA analytics.

Andrew J. Mikalsen
