Mobile Sync for Mongo

We here at Zumero have been exploring the possibility of a mobile sync solution for MongoDB.

We first released our Zumero for SQL Server product almost 18 months ago, and today there are bunches of people using mobile apps which sync using our solution.

But not everyone uses SQL Server, so we often wonder what other database backends we should consider supporting. In this blog entry, I want to talk about some progress we've made toward a "Zumero for Mongo" solution and "think out loud" about the possibilities.

Background: Mobile Sync

The basic idea of mobile sync is to keep a partial copy of the database on the mobile device so the app doesn't have to go back to the network for every single CRUD operation. The benefit is an app that is faster, more reliable, and works offline. The flip side of that coin is the need to keep the mobile copy of the database synchronized with the data on the server.

Sync is tricky, but as mobile continues its explosive growth, this approach is gaining momentum.

If the folks at Mongo are already working on something in this area, we haven't seen any sign of it. So we decided to investigate some ideas.

Pieces of the puzzle

In addition to the main database (like SQL Server or MongoDB or whatever), a mobile sync solution has three basic components:

Mobile database
  • Runs on the mobile device as part of the app

  • Probably an embedded database library

  • Keeps a partial replica of the main database

  • Wants to be as similar as possible to the main database

Sync server
  • Monitors changes made by others to the main database

  • Sends incremental changes back and forth between clients and the main database

  • Resolves conflicts, such as when two participants want to change the same data

  • Manages authentication and permissions for mobile clients

  • Filters data so that each client only gets what it needs

Sync client
  • Monitors changes made by the app to the mobile database

  • Talks over the network to the sync server

  • Pushes and pulls incremental changes to keep the mobile database synchronized

For this blog entry, I want to talk mostly about the mobile database. In our Zumero for SQL Server solution, this role is played by SQLite. There are certainly differences between SQL Server and SQLite, but on the whole, SQLite does a pretty good job pretending to be SQL Server.

What embedded database could play this role for Mongo?

This question has no clear answer, so we've been building a lightweight Mongo-compatible database. Right now it's just a prototype, but its development serves the purpose of helping us explore mobile sync for Mongo.

Embeddable Lite Mongo

Or "Elmo", for short.

Elmo is a database that is designed to be as Mongo-compatible as it can be within the constraints of mobile devices.

In terms of the status of our efforts, let me begin with stuff that does NOT work:

  • Sharding is an example of a Mongo feature that Elmo does not support and probably never will.

  • Elmo also has no plans to support any feature which requires embedding a JavaScript engine, since that would violate Apple's rules for the App Store.

  • We do hope to support full text search ($text, $meta, etc), but this is not yet implemented.

  • Similarly, we have not yet implemented any of the geo features, but we consider them to be within the scope of the project.

  • Elmo does not support capped collections, and we are not yet sure if it should.

Broadly speaking, except for the above, everything works. Mostly:

  • All documents are stored in BSON

  • Except for JS code, all BSON types are supported

  • Comparison and sorting of BSON values (including different types) works

  • All basic CRUD operations are implemented

  • The update command supports all the update operators except $isolated

  • The update command supports upsert as well

  • The findAndModify command includes full support for its various options

  • Basic queries are fully functional, including query operators, projection, and sorting

  • The matcher supports Mongo's notion of query predicates matching any element of an array

  • CRUD operations support resolution of paths into array subobjects, like x.y to {x:[{y:2}]}

  • Regex works, with support for the i, s, and m options

  • The positional operator $ works in update and projection

  • Cursors and batchSize are supported

  • The aggregation pipeline is supported, including all expression elements and all stages (except geo)
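
The array-related items above (predicates matching any element of an array, and paths like x.y resolving into {x:[{y:2}]}) can be illustrated with a minimal Python sketch. This is not Elmo's code, which is written in F#. The function names walk_path and matches_eq are made up for this illustration, and plain Python equality stands in for real BSON comparison.

```python
from typing import Any, Iterator


def walk_path(doc: Any, path: str) -> Iterator[Any]:
    """Yield every value reachable from doc by following a dotted path.

    When a step lands on a list, the remaining path is applied to each
    subdocument in it, which is how "x.y" reaches the 2 in {x: [{y: 2}]}.
    """
    if path == "":
        yield doc
        return
    head, _, rest = path.partition(".")
    if isinstance(doc, dict):
        if head in doc:
            yield from walk_path(doc[head], rest)
    elif isinstance(doc, list):
        # A numeric step selects one element by index, e.g. "x.0.y".
        if head.isdigit() and int(head) < len(doc):
            yield from walk_path(doc[int(head)], rest)
        # The same step is also tried against each subdocument in the array.
        for item in doc:
            if isinstance(item, dict):
                yield from walk_path(item, path)


def matches_eq(doc: dict, path: str, wanted: Any) -> bool:
    """True if the path reaches wanted, or reaches an array containing wanted."""
    for value in walk_path(doc, path):
        if value == wanted:
            return True
        if isinstance(value, list) and wanted in value:
            return True
    return False
```

With these helpers, matches_eq({"x": [{"y": 2}]}, "x.y", 2) is true, and so is matches_eq({"tags": ["a", "b"]}, "tags", "b"). A real matcher also has to handle the other query operators and BSON type ordering, which this sketch ignores.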

More caveats:

  • Support for indexes is being implemented, but they don't actually speed anything up yet.

  • The dbref format is tolerated, but is not [yet] resolved.

  • The $explain feature is not implemented yet.

  • For the purpose of storing BSON blobs, Elmo is currently using SQLite. Changing this later will be straightforward, as we're basically just using SQLite as a key-value store, so the API between all of Elmo's CRUD logic and the storage layer is not very wide.
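
To give a feel for how narrow that storage API can be, here is a rough Python sketch of a key-value layer on top of SQLite. It is not Elmo's actual storage code: the BlobStore class, the one-table-per-collection layout, and the column names are assumptions for illustration, and JSON bytes stand in for BSON because Python's standard library has no BSON encoder.

```python
import json
import sqlite3
from typing import Optional


def encode_doc(doc: dict) -> bytes:
    # Stand-in for BSON encoding (assumption: a real store would hold BSON).
    return json.dumps(doc).encode("utf-8")


def decode_doc(blob: bytes) -> dict:
    return json.loads(blob.decode("utf-8"))


class BlobStore:
    """Hypothetical key-value layer: one table per collection, id -> blob."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)

    @staticmethod
    def _table(collection: str) -> str:
        # Quote the identifier so a collection name cannot break the SQL.
        return '"' + collection.replace('"', '""') + '"'

    def create(self, collection: str) -> None:
        with self.conn:
            self.conn.execute(
                f"CREATE TABLE IF NOT EXISTS {self._table(collection)} "
                "(id BLOB PRIMARY KEY, doc BLOB NOT NULL)"
            )

    def put(self, collection: str, key: bytes, blob: bytes) -> None:
        # Insert a new blob or overwrite the one already stored under key.
        with self.conn:
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self._table(collection)} "
                "(id, doc) VALUES (?, ?)",
                (key, blob),
            )

    def get(self, collection: str, key: bytes) -> Optional[bytes]:
        row = self.conn.execute(
            f"SELECT doc FROM {self._table(collection)} WHERE id = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def delete(self, collection: str, key: bytes) -> None:
        with self.conn:
            self.conn.execute(
                f"DELETE FROM {self._table(collection)} WHERE id = ?", (key,)
            )
```

The point of the sketch is that the surface is only create, put, get, and delete on opaque blobs. All of the query, update, and aggregation logic would sit above this layer, which is why swapping out the storage engine later looks manageable.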

Notes on testing:

  • Although Elmo is mobile-focused and does not need an actual server, it has one, simply so that we can run the jstests suite against it.

  • The only test suite sections we have worked on are jstests/core and jstests/aggregation.

  • Right now, Elmo can pass 311 of the test cases from jstests.

  • We have never tried contacting Elmo with any client driver except the mongo shell, so other drivers probably don't work yet.

  • Elmo's server only supports the new style protocol, including OP_QUERY, OP_GET_MORE, OP_KILL_CURSORS, and OP_REPLY. None of the old "fire and forget" messages are implemented.

  • Where necessary to make a test case pass, Elmo tries to return the same error numbers as Mongo itself.

  • All effort thus far has been focused on making Elmo functional, with no effort spent on performance.
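
For readers unfamiliar with the wire protocol mentioned above: every message begins with a 16-byte header of four little-endian int32 values (messageLength, requestID, responseTo, opCode). Here is a small Python sketch of packing and parsing that header. The opcode numbers are the ones given in the MongoDB wire protocol documentation; the helper names pack_message and parse_header are ours, the message body is treated as opaque bytes, and none of this is taken from Elmo's F# source.

```python
import struct
from typing import NamedTuple

# Opcodes for the four message types Elmo's server handles.
OP_REPLY = 1
OP_QUERY = 2004
OP_GET_MORE = 2005
OP_KILL_CURSORS = 2007

# Standard message header: four little-endian signed 32-bit integers.
HEADER = struct.Struct("<iiii")


class MsgHeader(NamedTuple):
    message_length: int  # total size in bytes, header included
    request_id: int      # identifier chosen by the sender
    response_to: int     # request_id being answered (0 for a request)
    op_code: int


def pack_message(request_id: int, response_to: int, op_code: int, body: bytes) -> bytes:
    """Prefix an already-encoded body with the standard header."""
    total = HEADER.size + len(body)
    return HEADER.pack(total, request_id, response_to, op_code) + body


def parse_header(data: bytes) -> MsgHeader:
    """Read the header from the first 16 bytes of a message."""
    return MsgHeader(*HEADER.unpack_from(data, 0))
```

A proxy that dumps traffic between the shell and a server only needs something like parse_header to split the byte stream into messages: read 16 bytes, then read message_length minus 16 more bytes for the body.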

How Elmo should work:

  • In general, our spec for Elmo's behavior is the MongoDB documentation plus the jstests suite.

  • In cases where the Mongo docs seem to differ from the actual behavior of Mongo, we try to make Elmo behave like Mongo does.

  • In cases where the Mongo docs are silent, we often stick a proxy in front of the Mongo server and dump all the messages so we can see exactly what is going on.

  • We occasionally consult the Mongo server source code for reference purposes, but no Mongo code has been copied into Elmo.

Notes on the code:

  • Elmo is written in F#, which was chosen because it's an insanely productive environment and we want to move quickly.

  • But while F# is a great language for this exploratory prototype, it may not be the right choice for production, simply because it would confine Elmo use cases to Xamarin, and Miguel's world domination plan is not quite complete yet. :-)

  • The Elmo code is now available on GitHub at https://github.com/zumero/Elmo. Currently the license is GPLv3, which makes it incompatible with production use on mobile platforms. That is okay for now, since Elmo isn't ready for production use anyway. We'll revisit licensing issues later.

Next steps:

  • Our purpose in this blog entry is to start conversations with others who may be interested in mobile sync solutions for Mongo.

  • Feel free to post a question or comment or whatever as an issue on GitHub: https://github.com/zumero/Elmo/issues

  • Or email me: eric@zumero.com

  • Or Tweet: @eric_sink

  • If you're interested in a face-to-face conversation or a demo, we'll be at MongoDB World in NYC at the beginning of June.