Zumero: Efficient sync by using multiple SQLite files

The code for this little project is getting too long to be inlined here in the article, so I've posted it on Github.

Mobile offline RSS reader, Part 3

Lots of new stuff. I've now got three separate executables:

  • z_rss_create.exe -- For each feed that has not yet been initialized, create its dbfile and set the permissions on it.

  • z_rss_update.exe -- Retrieve XML and store items for some feeds. The word "some" is defined as "all feeds that have never been updated" plus "up to five feeds that have not been updated in at least an hour".

  • z_rss_sync.exe -- Synchronize the feed dbfiles up to the Zumero server.

All of these are part of the "admin" side of things. In other words, I still haven't written any code that is intended to run on a mobile device. So far, I'm only doing stuff that is intended to run as a cron job on a server somewhere.

C# Rocks

The Zumero core code is written in C, mostly because SQLite is written in C.

But most people don't actually use SQLite from C. Rather, they call it through some binding for their favorite higher-level language. Or they are even further away, and they're not calling SQLite at all, because some sort of ORM is calling it for them.

I'm probably as much a fan of C as anybody. I've had love affairs with a lot of different programming languages, but C is the one I always come back to.

Still, there's something nice about writing this:

string sync_result = db.ExecuteScalar(
    @"SELECT zumero_sync(                            
        'main',                             
        ?, 
        ?, 
        zumero_internal_auth_scheme('zumero_users_admin'), 
        'admin', 
        ?
        );", 
    server_url, 
    dbfile_name_for_this_feed, 
    password
    );

Instead of this:

int rc;
char* sync_result = NULL;
sqlite3_stmt* pStmt = NULL;

rc = sqlite3_prepare_v2(
    db, 
    "SELECT zumero_sync("
    "'main',"
    "?,"
    "?,"
    "zumero_internal_auth_scheme('zumero_users_admin'),"
    "'admin',"
    "?"
    ");", 
    -1, 
    &pStmt, 
    NULL); 
if (rc) goto fail;
rc = sqlite3_bind_text(pStmt, 1, server_url, -1, NULL); 
if (rc) goto fail;
rc = sqlite3_bind_text(pStmt, 2, dbfile_name_for_this_feed, -1, NULL); 
if (rc) goto fail;
rc = sqlite3_bind_text(pStmt, 3, password, -1, NULL); 
if (rc) goto fail;
rc = sqlite3_step(pStmt);
if (rc != SQLITE_ROW) goto fail;
sync_result = strdup(sqlite3_column_text(pStmt, 0));
sqlite3_finalize(pStmt);
pStmt = NULL;

// use sync_result

free(sync_result);
sync_result = NULL;

I like C, but I like higher-level languages too.

In fact, my original plan was to use C# for the first part of these series and then do the subsequent parts in Python or whatever. But I forgot how much I like C#. I'm comfy in this particular chair, and I don't feel like getting up. So for now, Python will have to wait.

Why am I putting each feed in its own SQLite file?

If you stop and think about it, it's really amazing how much power is packed into today's smart phones.

As I briefly mentioned in the post where I announced Zumero, our company got its start by doing mobile stuff. Way back in 1997, we were one of the many contractors working on a Motorola project to build one of the first smart phones. That device had a CPU that would be considered glacially slow by today's standards. The idea of storing movies or songs wasn't even considered, since WiFi and 3G didn't exist yet, and the amount of flash storage on the phone was a number that currently rounds to zero.

In relative terms, the iPhone 5 is like a supercomputer.

But bandwith and on-device storage are still precious commodities, and mobile apps need to pay careful attention to how these resources are used.

So why am I putting each feed in a separate SQLite file? Because not everybody wants every feed, and I don't want to waste bandwidth and storage on things that the user is not going to read.

If you're building a mobile app on Zumero (or any other replicate-and-sync database platform), it is worth thinking about the best way to structure your data such that each device can get exactly the stuff it wants.

SQLite Rocks

Splitting things up sounds like it would cause a lot of hassle, but it turns out that this is one area where SQLite shines. By using the ATTACH statement, SQLite allows you to work with multiple database files from a single connection handle.

For example, in the "all_feeds" database, I have a table called "about", which is where I store the title of the feed. I am also using the presence of a title as a flag which indicates whether I have created the dbfile for the feed's content or not.

So, when somebody adds a new feed to the list, I need to:

  • In the per-feed dbfile (which I need to create), create tables for storing content and permissions.

  • In the "all_feeds" dbfile, insert a row for the title of the feed.

These two operations need to happen in two different SQLite databases. But like I said, SQLite makes this easy. Take a look at some snippets from z_rss_create.cs:

I already have SQLiteConnection db as the SQLite handle for my "all_feeds" database. To do the two steps above, I first ATTACH the new dbfile into this handle, giving it the name "cur".

db.Execute("ATTACH ? AS cur;", dbfile_name_for_this_feed);

Now I can perform SQL operations on either database by simply qualifying the table name with "cur." (for the new dbfile for this feed) ...

db.Execute(
        @"CREATE VIRTUAL TABLE 
        cur.items 
        USING zumero
        (
          id TEXT PRIMARY KEY NOT NULL, 
          title TEXT NOT NULL,
          summary TEXT NOT NULL,
          pubdate_unix_time INTEGER NOT NULL
        );"
        );

... or "main." (for the "all_feeds" db).

db.Execute("INSERT INTO main.about (feedid, title) VALUES (?,?)",
        q.feedid,
        f.Title.Text
        );

Furthermore, by placing both of these operations inside an explicit transaction, the two steps become atomic.

This same technique is also used in z_rss_update.cs where I need to (1) store the actual content of the feed in its dbfile, and (2) update "all_feeds"."last_update" to remember when we retrieved the the XML for that feed.

More to come in part 4...