This post was originally published on October 16th, 2023 in The Data Source, my monthly newsletter covering the top innovations in data infrastructure, engineering and developer-first tooling. Subscribe here and never miss an issue!
Like cloud computing before it, edge computing is a significant paradigm shift and is poised to be the next platform shift. The shift stems from a fundamental change in how data is processed and managed. Unlike the traditional cloud model, which relies on centralized data centers, edge computing processes data close to where it is generated.
There are two critical factors for the widespread adoption of edge computing: speed and cost-effectiveness. Businesses today are increasingly reliant on near-instantaneous processing of data. This is particularly true for applications that require real-time responses, such as IoT devices, autonomous systems, and immersive technologies like virtual and augmented reality. Edge computing addresses this need by enabling data processing to occur closer to the source, significantly reducing the time it takes for information to traverse back and forth between devices and centralized cloud servers.
This past month, I spent time digging into the world of edge computing to understand the top trends shaping the broader ecosystem, the challenges surrounding data management at the edge as well as areas poised for innovation. Some observations:
Trends Shaping The Edge Computing Ecosystem
- The rise of intelligent edge orchestration: Container technologies like Docker and orchestrators like Kubernetes have simplified app development and deployment at the edge, cutting down the costs associated with managing complex software. As edge computing advances, there will be a stronger emphasis on improving intelligent orchestration at the edge. This involves developing sophisticated tools to manage and distribute computing resources and enable flexible workload allocation across various edge devices. It also includes optimizing processing tasks based on factors like latency, data sensitivity, and available resources. The goal is better collaboration between edge devices, fog nodes, and central cloud infrastructure, so that applications relying on real-time data processing get consistently low response times.
The advent of WebAssembly (WASM) also ties into this broader trend: because WASM modules are lightweight, they consume fewer resources, leading to cost savings that can benefit end-users. With its promise of "write once, run anywhere," the same code could be executed across various platforms and environments.
- Better data management at the network edge: The increasing demand for real-time, interactive web applications is changing what CDNs do. CDNs are now incorporating stateful capabilities at the network edge, allowing for seamless and personalized content delivery. Because of this shift, more data is being created and processed near users, which puts the focus on good data management, i.e., storing, synchronizing, and caching data at the edge.
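To make the orchestration idea concrete, here is a minimal sketch of a placement decision that weighs latency, data sensitivity, and available resources. Everything in it is an assumption for illustration: the `Node` and `Workload` types, the tier names, the sample numbers, and the policy itself (sensitive data stays on the edge tier, and the lowest-latency eligible node wins). Real orchestrators use far richer scheduling logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    tier: str           # "edge", "fog", or "cloud" (hypothetical tier labels)
    latency_ms: float   # round-trip latency from the data source to this node
    free_cpu: float     # available CPU cores
    free_mem_mb: int    # available memory


@dataclass
class Workload:
    name: str
    max_latency_ms: float  # latency budget the workload can tolerate
    cpu: float
    mem_mb: int
    sensitive: bool        # assumed policy: sensitive data must stay on the edge tier


def place(workload, nodes):
    """Return the lowest-latency node that satisfies every constraint, or None."""
    candidates = [
        n for n in nodes
        if n.latency_ms <= workload.max_latency_ms
        and n.free_cpu >= workload.cpu
        and n.free_mem_mb >= workload.mem_mb
        and (not workload.sensitive or n.tier == "edge")
    ]
    return min(candidates, key=lambda n: n.latency_ms, default=None)


# Illustrative inventory; the numbers are made up.
nodes = [
    Node("camera-gateway", "edge", 5, 1.0, 512),
    Node("regional-fog", "fog", 20, 8.0, 16384),
    Node("central-cloud", "cloud", 120, 64.0, 262144),
]
```

With this inventory, a small latency-critical job lands on the edge gateway, a heavier job with a relaxed latency budget spills over to the fog node, and a heavy job that handles sensitive data cannot be placed at all, which is exactly the kind of tension an orchestrator has to resolve.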
But Managing Data At The Edge Is No Easy Feat…
- Limited computational resources: Edge devices often have constrained processing power, memory, and storage capacity compared to centralized servers. This limitation can hinder the speed and complexity of data processing tasks.
- Intermittent connectivity can impact database operations: Edge devices may operate in environments with unreliable / intermittent network connectivity. This can lead to difficulties in transmitting data to centralized servers or other edge nodes, affecting real-time data processing. For instance, if an edge device loses its connection to a central database server, it may not be able to perform real-time reads or writes.
- Data security and privacy concerns: Ensuring the security and privacy of data at the edge can be challenging. Edge devices may not have the same level of security measures as centralized data centers, making them potentially more vulnerable to breaches. Moreover, ensuring compliance with data protection regulations can be a complex endeavor in distributed environments.
Opportunities In The Edge Computing Market
- Data synchronization and caching: Implementing a dynamic caching system that intelligently stores and retrieves frequently accessed data on the edge device. This system would adapt to changing network conditions and prioritize data synchronization based on predefined criteria, so that processing can continue even in unreliable network environments.
- Edge-optimized data deployment: This involves pipelines that leverage lightweight data processing techniques, distribute workloads for parallel processing, and intelligently offload tasks to more powerful central servers or the cloud when needed. It incorporates caching mechanisms, operates asynchronously, and offers offline capabilities to mitigate the impact of intermittent connectivity on database operations.
- Edge-first security and compliance solutions: Creating a comprehensive edge security framework that includes encryption at rest and in transit, secure boot processes, and continuous monitoring for anomalies. This would incorporate compliance tools that automate the adherence to data protection regulations, ensuring that edge deployments meet legal requirements.
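The synchronization and offline-capability ideas above can be sketched as a local write buffer that drains by priority once the uplink returns. This is a minimal in-memory illustration, not a production design: `OfflineSyncQueue`, `FlakyUplink`, and the priority scheme (lower number syncs first) are all hypothetical names and choices, and a real device would persist the buffer to disk.

```python
import heapq
import itertools


class OfflineSyncQueue:
    """Buffers writes locally and flushes them by priority when the link is up."""

    def __init__(self, send):
        # send(record) is assumed to raise ConnectionError while offline.
        self._send = send
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def write(self, record, priority=10):
        # Lower priority number means the record is synchronized earlier.
        heapq.heappush(self._heap, (priority, next(self._counter), record))

    def flush(self):
        """Send pending records in priority order; stop at the first failure."""
        sent = 0
        while self._heap:
            _, _, record = self._heap[0]
            try:
                self._send(record)
            except ConnectionError:
                break  # still offline: keep the record for the next attempt
            heapq.heappop(self._heap)
            sent += 1
        return sent

    def pending(self):
        return len(self._heap)


class FlakyUplink:
    """Stand-in for a connection to a central server (hypothetical)."""

    def __init__(self):
        self.online = False
        self.received = []

    def __call__(self, record):
        if not self.online:
            raise ConnectionError("uplink unavailable")
        self.received.append(record)
```

A record is only removed from the buffer after `send` succeeds, so a dropped connection mid-flush loses nothing; the trade-off is that the central server may see a record twice if the failure happens after delivery, so the receiving side would need to tolerate duplicates.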
What stood out the most from my initial research is that caching as a technique is a crucial aspect of edge computing platforms. It stores frequently accessed data close to users, reducing the need for communication with distant servers and thus minimizing delays. This is important for applications like voice assistants and NLP, live broadcasting and streaming, and video conferencing that require real-time or low-latency interactions. Companies like Akamai, Cloudflare and Fastly have demonstrated the value of caching by baking it into their solutions to improve website performance and responsiveness.
However, there are challenges associated with caching. Implementing it is technically difficult because it requires specialized software and hardware configurations to handle storing, retrieving, and expiring cached content effectively. It is also a niche topic that requires domain expertise in distributed systems and hardware. Despite its complexity, caching is a basic requirement for running data and compute on the edge and is required for data-intensive applications. This is why we see so many startups today, such as Readyset, Materialize, Polyscale and others, building in the “caching” market. What’s interesting is that while caching might work as a product wedge for many, the key question remains: how and where will these caching startups evolve in the next couple of years? This is what intrigues me most and where I’m focusing my research.
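For readers newer to the topic, the three operations mentioned above (storing, retrieving, and expiring cached content) can be shown in a few lines. This is a single-process sketch under stated assumptions: `TTLCache` is a hypothetical name, entries expire after a fixed time-to-live, and the least recently used entry is evicted when the cache is full. It says nothing about the hard distributed parts, such as invalidation across many edge locations.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Tiny read-through cache with time-based expiry and LRU eviction."""

    def __init__(self, max_items=1024, ttl_seconds=60.0, clock=time.monotonic):
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._items = OrderedDict()    # key -> (expires_at, value), LRU first

    def get(self, key, load):
        """Return the cached value, or call load(key) on a miss or expiry."""
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            self._items.move_to_end(key)   # mark as recently used
            return entry[1]
        value = load(key)                  # the slow trip to the origin
        self._items[key] = (now + self._ttl, value)
        self._items.move_to_end(key)
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict the least recently used
        return value
```

Here `load` stands in for whatever fetches from the distant server or database. Choosing the TTL is the real design decision: a short TTL keeps data fresh but sends more traffic to the origin, while a long one does the opposite.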
As I get up to speed on this topic, here’s a compilation of resources that I’ve enjoyed:
So, you want to deploy on the edge? by Zak Knill
“If a user makes a request from Europe, and the apps run in US East, that adds an extra 100-150 ms of latency just by round-tripping across the Atlantic… Edge computing tries to solve this problem, by letting app developers deploy their applications across the globe, so that apps serve the user requests closer to the user. This removes a lot of the round-trip latency because the request has to travel less far before getting to a data center that hosts the app. …Edge computing sounds great for reducing response times for users, but the main thing stopping developers from adopting edge computing is data consistency.”
Making Shopify’s Flagship App 20% Faster in 6 Weeks Using a Novel Caching Solution by Ryan Ehrlich
“At Shopify, we use two different technologies for caching: Memcached and Redis. Redis is more powerful than Memcached, supporting more complex operations and storing more complex objects. Memcached is simpler, has less overhead, and is more widely used for caching inside Shop. While we use Redis for managing queues and some caches, we didn’t need Redis’ complexity, so we chose a distributed Memcached.”
Cloudflare on the Edge by Ben Thompson
“Most computing resources that run on cloud computing platforms, including serverless platforms, are created by developers who work at companies where compliance is a foundational requirement. And, up until to now, that’s meant ensuring that platforms follow government regulations like GDPR (European privacy guidelines) or have certifications providing that they follow industry regulations such as PCI DSS (required if you accept credit cards), FedRamp (US government procurement requirements), ISO27001 (security risk management), SOC 1/2/3 (Security, Confidentiality, and Availability controls), and many more… But there’s a looming new risk of regulatory requirements that legacy cloud computing solutions are ill-equipped to satisfy. Increasingly, countries are pursuing regulations that ensure that their laws apply to their citizens’ personal data. One way to ensure you’re in compliance with these laws is to store and process data of a country’s citizens entirely within the country’s borders.”
People to Follow
Kurt is the co-founder and CEO of Fly.io. Fly is an application delivery cloud that enables developers to run their apps closer to the end user.
Pekka is the co-founder and CTO of Turso. Turso is a database that enables developers to write, deploy and maintain highly distributed and performant apps.
Sahn is a software engineer at Discord and co-author of the System Design Interview Book Series.
Practitioners and startup builders, if this is an area of interest to you, please reach out to chat!