Case: Lokad Shelfcheck

by Rinat Abdullin, February 2012.

About the Problem

Studies show that, on average, 5 to 10% of the products offered in grocery stores are unavailable at any given time, a situation that has not improved in two decades. Products that are not available to the consumer on their dedicated shelf space are referred to as being out-of-shelf (OOS). Monitoring OOS is the cornerstone of on-shelf availability optimization.

70% of surveyed shoppers would shop for an item at a competitor or online if it was unavailable. (Source: SymphonyIRI Group)

Lost sales due to OOS for the top 100 retailers are estimated at $69 billion. (Source: EPCglobal)

Lokad Shelfcheck is a web system that monitors on-shelf availability of products in stores and detects OOS items. It is suitable for both very large and small companies.


Features of Lokad Shelfcheck

  • Natively runs locally and on two clouds: Rackspace and Windows Azure.
  • Delivery of OOS alerts to any device, from desktop to smartphone.
  • Multi-tenant with flexible integration capabilities.
  • Elastic scalability of storage and computing - can handle large retail networks.
  • Big Data processing based on principles of MapReduce and Event Sourcing.
  • Full audit logs.

Story

Lokad Shelfcheck was the third CQRS-based cloud project at Lokad. In this project we applied the lessons learned in previous projects. We also pushed four extremely important principles further:

  • The crucial role of Domain-Driven Design in the strategic design of the system.
  • Use of a full event sourcing model with immutable blobs for persistence-ignorant and scalable data processing.
  • Full use of Lokad.CQRS abstractions, which allowed us to cleanly isolate cloud-specific concerns.
  • Techniques to reduce development friction.

Let's dive into these points.

1. Crucial Role of DDD

In previous systems, the biggest share of problems was caused by complex designs. They were fine to deliver in the first few iterations, but ongoing maintenance introduced an ever-increasing amount of development friction.

To deal with this problem, Domain-Driven Design was applied from the very start, leveraging Greg Young's "domain modeling by coding" advice. DDD helped to shape the initial Bounded Contexts and to choose implementation technologies for them. Every dialog with domain experts was also a domain modeling exercise that helped to explicitly formalize concepts in a field that was new to us (and to the market as well).

2. Technology Choices

A rough split of the server-side blocks by technology:

  • Behavioral code, implemented as Aggregate Roots with Event Sourcing. It handled customer tenants and projects, as well as the algorithmic behaviors that guide big data processing and complex integration scenarios (with various failure conditions).
  • Integration code (the messy details), implemented as a non-scalable set of command handlers that took a chunk of data (and/or endpoint details), performed the requested operation and then published an event at the end.
  • Data processing code: immutable command handlers that performed CPU-intensive or IO-intensive computations and published events at the end.

Behavioral, integration and data processing code was hosted within the master process (a worker in the cloud). Elastic CPU scaling was achieved by launching multi-threaded slave processes that ran the data processing code (the most CPU-intensive part).

Both master and slaves were implemented as simple instances of the Lokad.CQRS engine.
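To make the "Aggregate Roots with Event Sourcing" idea from the list above more concrete, here is a minimal sketch. It is not Shelfcheck code and does not use the Lokad.CQRS API: the `IEvent` interface, the `ProjectCreated` and `ProjectRenamed` events and the `ProjectAggregate` class are all hypothetical names invented for this illustration. The point it shows is that the aggregate's state is derived only from events, so replaying the recorded events rebuilds the same state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event contracts, invented for this illustration.
public interface IEvent { }

public sealed class ProjectCreated : IEvent
{
    public string Name { get; private set; }
    public ProjectCreated(string name) { Name = name; }
}

public sealed class ProjectRenamed : IEvent
{
    public string NewName { get; private set; }
    public ProjectRenamed(string newName) { NewName = newName; }
}

// Hypothetical aggregate root: behavior methods validate and emit events,
// while state is changed only by applying those events.
public sealed class ProjectAggregate
{
    readonly List<IEvent> _changes = new List<IEvent>();
    string _name;
    bool _created;

    // Events produced since the aggregate was loaded (to be persisted/published).
    public IList<IEvent> Changes { get { return _changes; } }
    public string Name { get { return _name; } }

    // Rebuild state by replaying previously recorded events.
    public ProjectAggregate(IEnumerable<IEvent> history)
    {
        foreach (var e in history) Mutate(e);
    }

    public void Create(string name)
    {
        if (_created) throw new InvalidOperationException("Project already exists.");
        Apply(new ProjectCreated(name));
    }

    public void Rename(string newName)
    {
        if (!_created) throw new InvalidOperationException("Project does not exist.");
        if (newName == _name) return; // nothing changed, so no event is recorded
        Apply(new ProjectRenamed(newName));
    }

    // Record a new event and update in-memory state from it.
    void Apply(IEvent e)
    {
        Mutate(e);
        _changes.Add(e);
    }

    // The only place where state changes; used for both new and replayed events.
    void Mutate(IEvent e)
    {
        var created = e as ProjectCreated;
        if (created != null) { _created = true; _name = created.Name; return; }

        var renamed = e as ProjectRenamed;
        if (renamed != null) { _name = renamed.NewName; }
    }
}
```

A caller would load the aggregate from its stored history, invoke a behavior such as `Rename`, and then append whatever ends up in `Changes` to the event stream.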

3. Cloud Ignorance

Shelfcheck was the first Lokad project for the Windows Azure cloud that was initially developed without any use of Azure technologies. This reduced development friction (no need to depend on the Azure development emulator and storage, which are not fit for processing gigabyte-sized datasets locally) and simplified testing and the proof-of-concept experience. For instance, in certain tight cases, we had systems fully deployed and running in parallel on the Windows Azure cloud and the Rackspace cloud.
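One common way to get this kind of cloud ignorance is to let domain code depend on a small storage interface and to plug in a file-based, in-memory or cloud-backed implementation at startup. The sketch below illustrates that idea only; `IBlobStore`, `FileBlobStore`, `MemoryBlobStore` and `SalesImporter` are hypothetical names and are not the actual Lokad.CQRS abstractions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical storage abstraction; not the actual Lokad.CQRS interface.
public interface IBlobStore
{
    void Write(string name, byte[] data);
    byte[] Read(string name);
    bool Exists(string name);
}

// Local implementation backed by plain files, usable without any cloud emulator.
public sealed class FileBlobStore : IBlobStore
{
    readonly string _root;

    public FileBlobStore(string root)
    {
        _root = root;
        Directory.CreateDirectory(root); // no-op if the folder already exists
    }

    string PathFor(string name) { return Path.Combine(_root, name); }

    public void Write(string name, byte[] data) { File.WriteAllBytes(PathFor(name), data); }
    public byte[] Read(string name) { return File.ReadAllBytes(PathFor(name)); }
    public bool Exists(string name) { return File.Exists(PathFor(name)); }
}

// In-memory implementation, convenient for unit tests.
public sealed class MemoryBlobStore : IBlobStore
{
    readonly Dictionary<string, byte[]> _items = new Dictionary<string, byte[]>();

    public void Write(string name, byte[] data) { _items[name] = data; }
    public byte[] Read(string name) { return _items[name]; }
    public bool Exists(string name) { return _items.ContainsKey(name); }
}

// Hypothetical domain code: it sees only IBlobStore and never a cloud SDK.
public static class SalesImporter
{
    public static int CountLines(IBlobStore store, string name)
    {
        var text = Encoding.UTF8.GetString(store.Read(name));
        return text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

A cloud-specific implementation of the same interface (for Azure or Rackspace storage) would live in a separate assembly, so the rest of the system can be developed and tested entirely on a local machine.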

4. Reducing Development Friction

The initial iterations of Shelfcheck had to be delivered really fast (with 1 to 1.5 full-time developers on average). Hence we discarded everything that didn't add value or that introduced friction.

Initially there was no UI at all (an event stream visualizer and some simple projections were more than enough). The sources for the event stream visualizer were later open-sourced as the Audit tool in the Lokad.CQRS Sample Project.

[Figure: case-sh-01.png]

We used dead-simple persistence (everything is stored as files or blobs), which gave us both highly reliable storage and xcopy deployment to any machine with .NET 4.0 (obviously, Azure deployments had to include some scaffolding).
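As an illustration of how little code file-based persistence can require, here is a minimal sketch of an append-only log stored in a single file. It is an assumption-laden example rather than the Shelfcheck implementation: the `FileAppendOnlyLog` class and its length-prefixed record format are hypothetical, and the sketch ignores concurrency between writers and recovery from partially written records.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical append-only log: each record is stored as
// a 4-byte length prefix followed by the record bytes.
public sealed class FileAppendOnlyLog
{
    readonly string _path;

    public FileAppendOnlyLog(string path) { _path = path; }

    // Append one record to the end of the file (the file is created if missing).
    public void Append(byte[] record)
    {
        using (var stream = new FileStream(_path, FileMode.Append, FileAccess.Write, FileShare.Read))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(record.Length);
            writer.Write(record);
        }
    }

    // Read all records back in the order they were appended.
    public List<byte[]> ReadAll()
    {
        var records = new List<byte[]>();
        if (!File.Exists(_path)) return records;

        using (var stream = File.OpenRead(_path))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                var length = reader.ReadInt32();
                records.Add(reader.ReadBytes(length));
            }
        }
        return records;
    }
}
```

Because the log is just a file, deploying or backing up the data amounts to copying a folder, which is what makes xcopy-style deployment practical.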

A Code Contracts DSL was used for rapid generation of message contracts. It was later polished and open-sourced as part of the Lokad.CQRS project. This code DSL converts (or updates) on the fly statements like this

AddSecurityPassword?(SecurityId id, string displayName, string login, string password)

into:

[DataContract(Namespace = "Sample")]
public partial class AddSecurityPassword : ICommand<SecurityId>
{
    [DataMember(Order = 1)] public SecurityId Id { get; private set; }
    [DataMember(Order = 2)] public string DisplayName { get; private set; }
    [DataMember(Order = 3)] public string Login { get; private set; }
    [DataMember(Order = 4)] public string Password { get; private set; }

    AddSecurityPassword () {}
    public AddSecurityPassword (SecurityId id, string displayName, string login, string password)
    {
        Id = id;
        DisplayName = displayName;
        Login = login;
        Password = password;
    }
}

Lessons Learned

  • Cloud ignorance is a blessing and a way to massively reduce development friction. It became the default option for all new projects at Lokad.
  • At a certain level of elastic scaling, infrastructure can become a bottleneck. In the case of Shelfcheck, adding more than 10 slave workers yielded diminishing returns. The performance of Azure Queues (even with partitioning) was below our needs. So a system should be built in such a way that scaling it up automatically scales up the infrastructure as well. Fortunately, the ZeroMQ project provides detailed guidance and an easy way to overcome this limitation in a cloud-agnostic way.
  • DDD is extremely important and should be pushed further.
  • The ability to defer decisions and extremely low development friction are essential for building real products with limited resources (a situation common to startups and small companies).


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License