sudo fry rolls /* This blog is actually mobile-friendly! */

A physical postcard in a world of virtual goods

No matter how much technology advances and how many aspects of life migrate into the virtual realm, there is still this strange attachment to physical objects. I still like keeping small photobooks of my own photographs, and I very much enjoy shooting film. To me, being able to appreciate the physical embodiment of one's work is a magical feeling, given how completely digital the rest of my life is.

Long story short, I whipped up an MVP for Esplorio Postcards while we were on a coding trip in Sri Lanka, out of the frustration of having to find a post office, hunt for a generic card (as a photographer I want to send the gift of my own creation and experience), then try to communicate which stamps I wanted. Now that I already have an Esplorio feed of organised, pretty photos, a customised physical postcard to my loved ones is just a tap and an Apple Pay touch away on the new version of the Esplorio iOS app. See how easy it is:

This is why I love working at a startup - going from idea to production is a very quick journey.

If you are interested in my work, please go ahead and check out https://home.esplor.io

How TechCrunch Japan broke our app - Handling local calendars in Swift


"Timezone", "calendar", "region", "locale" are 4 words that strike fear into my mind

I am writing this blog entry as a note to myself for future reference, and in case other people run into the same problems, especially if you use DateTools and CoreData in your app.

Time format plays a big role in our application. We not only need to make requests to our backend servers in the right format (say, "get all data before 1 Jan 2016"), but also need to show the user dates in a consistent way. Since we suddenly got a big feature on TechCrunch Japan and had an influx of new users from the other side of the planet, I thought it might be a good idea to switch to the Japanese calendar and see if our app still worked correctly.

This uncovered a whole world of hurt, given that we had not done any proper localisation yet and are really keen on keeping the Gregorian calendar as the standard across our codebase.

Problem 1

Unless specified otherwise, the default locale for any dates in the system obeys the locale of the local region. This is a well-known problem - Apple even has official documentation stating that you should use en_US_POSIX if you want to keep everything consistently Gregorian. We solved this easily by setting all our date formatters to use that locale.
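
A minimal sketch of that fix (the helper name is just for illustration, not our actual code):

    // Pin a formatter to en_US_POSIX so fixed-format dates ignore the user's region settings
    func makePosixFormatter(format: String) -> NSDateFormatter {
        let formatter = NSDateFormatter()
        formatter.locale = NSLocale(localeIdentifier: "en_US_POSIX")
        formatter.dateFormat = format
        return formatter
    }

    // e.g. a formatter for dates sent to the backend API
    let apiDateFormatter = makePosixFormatter("dd MMM yyyy")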

Problem 2

Like many other apps, we use CoreData as the backend for our client-side database and DateTools to handle many datetime-related tasks. The issue starts with NSDate instances stored in CoreData obeying the system calendar. It means this piece of code:

    let today = <A date value retrieved from CoreData>
    let formatter = NSDateFormatter()
    formatter.locale = NSLocale(localeIdentifier: "en_US_POSIX")
    formatter.dateFormat = "dd MMM yyyy"
    formatter.stringFromDate(today)

would result in 04 Jan 0028 once the phone is set to use the Japanese calendar, and today behaves as if it were the year 0028 in the Gregorian calendar. If you then use it for calculations or manipulations, or send it as-is to your backend API, the result will be wrong.

The solution is to keep using the en_US_POSIX locale, but to set both the default calendar identifier in DateTools and the calendar identifier on each formatter to the default system identifier. If you do not use DateTools, do make sure that whatever date/calendar solution you end up with does this as well!

    // The DateTools method to set default calendar
    // (do this once at app launch)
    NSDate.setDefaultCalendarIdentifier(NSCalendar.currentCalendar().calendarIdentifier)

    // Rest of the code
    let today = <A date value retrieved from CoreData>
    let formatter = NSDateFormatter()
    formatter.locale = NSLocale(localeIdentifier: "en_US_POSIX")
    formatter.dateFormat = "dd MMM yyyy"
    formatter.stringFromDate(today)

This means that any NSDate instances retrieved from CoreData will always point to the right point in time, i.e. 04 Jan 0028 in the Japanese calendar now always refers to the same instant as 04 Jan 2016 in the Gregorian calendar (both with Unix timestamp 1451865600). The en_US_POSIX locale then causes the date instance with timestamp 1451865600 to be formatted into the Gregorian form we want: 04 Jan 2016.
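
To make that equivalence concrete, here is a quick illustration (not from our codebase) that formats the same timestamp with the two calendars:

    // The same instant (Unix timestamp 1451865600) rendered with two different calendars
    let instant = NSDate(timeIntervalSince1970: 1451865600)

    let formatter = NSDateFormatter()
    formatter.locale = NSLocale(localeIdentifier: "en_US_POSIX")
    formatter.timeZone = NSTimeZone(name: "UTC")
    formatter.dateFormat = "dd MMM yyyy"

    formatter.calendar = NSCalendar(identifier: NSCalendarIdentifierGregorian)
    formatter.stringFromDate(instant)   // "04 Jan 2016"

    formatter.calendar = NSCalendar(identifier: NSCalendarIdentifierJapanese)
    formatter.stringFromDate(instant)   // "04 Jan 0028" (Heisei 28)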

Problem 3

Right, so we have solved the problem with CoreData, but what happens to dates created inside the app and not saved to CoreData? The answer is that they get totally messed up as well, since you have called NSDate.setDefaultCalendarIdentifier(NSCalendar.currentCalendar().calendarIdentifier) at launch.

The problem is that when creating new NSDate instances within the app, the local system calendar is used by default:

    // Notice the `NSDate()`
    let today = NSDate()
    let formatter = NSDateFormatter()
    formatter.locale = NSLocale(localeIdentifier: "en_US_POSIX")
    formatter.dateFormat = "dd MMM yyyy"
    formatter.stringFromDate(today)

The above code results in the string 04 Jan 0028. To work around this, I temporarily fall back to the Gregorian calendar in the formatter for all the dates that we need to show in the UI but that were not created/retrieved from CoreData, then reset it to the local system calendar using defer. The full code is then:

  // A date formatter used for display views
  private static let displayDateFormatter: NSDateFormatter = {
    // Note: the following call is here for illustration only;
    // in reality it should happen once at app launch
    NSDate.setDefaultCalendarIdentifier(NSCalendar.currentCalendar().calendarIdentifier)
    let formatter = NSDateFormatter()
    // Use the default calendar identifier but with the en_US_POSIX locale
    // to stop weird dates from popping up
    formatter.locale = NSLocale(localeIdentifier: "en_US_POSIX")
    formatter.calendar = NSCalendar(identifier: NSDate.defaultCalendarIdentifier())
    return formatter
  }()

  /**
   dateToDayDisplayString

   Given a date (NSDate), it returns a string representation of it with only the date
   */
  class func dateToDayDisplayString(date: NSDate, withFormat format: String,
    withTimeZone timezone: NSTimeZone? = nil, withCalendarIdentifier calendarIdentifier: String? = nil) -> String {
      if let timezone = timezone {
        displayDateFormatter.timeZone = timezone
      } else {
        displayDateFormatter.timeZone = NSTimeZone(name: "UTC")
      }

      displayDateFormatter.dateFormat = format

      if let calendarIdentifier = calendarIdentifier {
        displayDateFormatter.calendar = NSCalendar(calendarIdentifier: calendarIdentifier)
      }
      // defer: always reset the calendar to the default one, even after the return
      defer {
        if let _ = calendarIdentifier {
          displayDateFormatter.calendar = NSCalendar(identifier: NSDate.defaultCalendarIdentifier())
        }
      }
      return displayDateFormatter.stringFromDate(date)
  }
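
To round it off, here are a couple of illustrative call sites (DateUtils is a stand-in class name, assuming the helper lives on a utility class):

    // A date retrieved from CoreData: formatted with the default (system) calendar
    let storedDate = NSDate()   // stand-in for a value retrieved from CoreData
    DateUtils.dateToDayDisplayString(storedDate, withFormat: "dd MMM yyyy")

    // A date created inside the app (Problem 3): temporarily force the Gregorian calendar
    DateUtils.dateToDayDisplayString(NSDate(), withFormat: "dd MMM yyyy",
        withCalendarIdentifier: NSCalendarIdentifierGregorian)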

If you are interested in my work, you can find out more about our product at home.esplor.io

How I almost screwed up the Esplorio iOS launch and fixed it with duct tape

Team Esplorio officially launched the iOS app


Meet Polo - The Esplorio GPS Kitty

We first built our tracking app a long time ago. In the past few months, we put a beautiful UI on it and re-engineered the whole platform in the process.

We went from this simple one-page tracker prototype:

to a beautiful trip recording/sharing app:

this awesome app

With a bit of luck, we got Hunted and featured at the top of Product Hunt's Tech page for the day. Now we are live on the App Store with a shiny Product Hunt badge to boot, which is pretty awesome.

What happened behind the scenes?

For the 2 days leading up to the launch, we camped at Tim's place and worked our asses off. The first day we called it quits at 3am, and the second day we pulled an all-nighter getting all the launch material together, then stayed up until late afternoon to respond to all the new traffic. That was almost 40 hours of work in 2 days - pretty much a week's worth for most people. It was insane! I do not recommend it.

And I almost f*cked it up

When shit hits the fan just before launch, it hits real hard.

About 13 hours before launch, I was doing routine maintenance on our servers, restarting some machines because the OS required a restart for security updates. One faulty restart took out our whole database cluster. The cluster seemed to get into a very bad race condition and never recovered, no matter what we did to save it. We decommissioned it and spun up a new production cluster to replace it using the latest backup we had at the time. However, by the time the backup data was in place it was already 3 hours before launch, and our database views still had not finished indexing - which meant the site and the app were both unusable.

Tim, Essa and I then had to make a call on whether we should keep going with the launch. It was a Thursday, and the coming weekend would be the last one before Christmas, so we thought launching any later (even on the Friday) would be a bad idea. At this point we realised that we still had a staging database, with the data replicated from production and all the views already warmed up, lying there ready to be used. We quickly tested it and everything seemed to work. The only risk was that, since these were just staging servers, we had no replication set up, so a failure of any one box would mean a much bigger screw-up.

We bit the bullet and used that cluster anyway. It worked flawlessly for the whole launch period. During the launch we ran an XDCR from this substitute staging cluster to the new production cluster we had built overnight, so that it always had the newest data, with the hope that the view indices would be ready later in the day, or the day after at worst.

This afternoon, we confirmed that the new production cluster was ready. We made sure all the data was in place, switched all our servers over to it, and reversed the XDCR back to its original direction (production -> staging).

Yes, that's right. We just fixed our app launch with duct tape and it worked - you can now get it at https://home.esplor.io!

Startup life is fun.

All our base are belong to Google (with a few gotchas)

Workday

The Esplorio team has moved into the same town, which means I now share a flat with Essa, and we took our servers along with us (I kid, I kid).

A while back we managed to get into the Google Launch programme, which includes $100k of Google Compute Engine (GCE) credits. The idea is that Google assists exciting new startups to scale with the many resources at its command, and beefy servers are just one of its specialties. There are a few technical gotchas I will mention at the end, so if you want to skip the BS, go all the way down to The gotchas.

The move

These credits sat around for quite some time because our whole team (3+1) was dead focused on getting the iOS app out, until about 3 weeks ago when we asked our friend George Hickman to join Esplorio once more to help us with this huge switch, which involved a lot of different moving parts:

  • API servers serving the webapp and iOS app
  • Web frontend server
  • A single-node 8GB database box that we needed to convert into a proper distributed cluster, the way it was designed to run (Couchbase's minimum requirements state 16GB of RAM for each node in the cluster)
  • A myriad of other servers to process geodata, images and queued tasks

By the end of the move, with George's tremendous help, we had rewritten all of our deployment scripts using the awesome Apache Libcloud. Spinning up a whole database cluster now takes a single deploy.db_cluster:node_count=100 line in the terminal. After all the scripts were rewritten, it took me another couple of days to complete the switch, tighten our firewalls, and scrub all the old servers. As careful as I was, some parts of the system still went down for about half an hour because of a DNS change.

We now even have a staging database cluster, which makes us feel a bit more like a proper software company, and plenty of firepower to prep for growth (fingers crossed).

It was a great experience. After months and months of writing way too much JavaScript and Swift, I finally got my hands dirty with some much-needed DevOps work. It also serves as a reminder to myself: Esplorio is definitely not a simple system to run!

The gotchas

Here are the problems we encountered during the move:

1. Unusual traffic from China

When we first set up some test GCE boxes, we noticed some suspicious traffic hitting our Django servers. Fortunately, Django's ALLOWED_HOSTS check filters out requests with invalid hosts before they reach most of our endpoints, and on top of that it alerts us about these repeated spoofing attempts, like this:

ERROR: Invalid HTTP_HOST header: 'azzvxgoagent5.appspot.com'. You may need to add u'azzvxgoagent5.appspot.com' to ALLOWED_HOSTS

No stack trace available

Request repr() unavailable.

(The HOST header may vary: azzvxgoagent5.appspot.com, azzvxgoagent3.appspot.com, azzvxgoagent1.appspot.com, www.google.com.hk)

After some investigation, it turned out all of these requests came from a program called GoAgent, which snoops around Google App Engine (GAE) servers and uses them as a free resource to create a proxy service. As you might have guessed, it is apparently used by many people in China to bypass the Great Firewall. Our Compute Engine boxes must have fallen within the same IP range that GAE boxes use, and we had thousands of these requests coming our way.

We decided to filter out these requests before they reach our Django instances, returning HTTP 444 (a non-standard status that closes the connection without sending any response) as soon as they hit our HTTP server.

2. Funky network setup by Google

To bring our database over to the new infrastructure without any downtime, we used a technique in Couchbase called XDCR (Cross-DataCenter Replication). The process is to first build the new cluster, and then set up an automatic unidirectional copy of the data from the old cluster into the new one where every single document in the old cluster will be sent over as part of the copy (each copy request is thus called an XDCR op). Once all the data is in place, one can simply flip the switch for the application to use the new cluster, and all the precious data will be there in the new cluster, ready to use. When all of the left-over XDCR ops finish, we can make a backup of the old server and then archive it.

In order for this to happen successfully, all nodes in the 2 clusters need to be able to talk to each other. We first set the new cluster up so that its nodes could all talk to each other over Google's internal IP addresses, leaving only one box exposed to the old cluster, because we thought pointing the XDCR target at this "leader" box would be enough. XDCR failed, of course, because Couchbase clusters treat each node equally, so every individual node needs to be reachable. I did some further digging into the GCE network structure and found that Google has a funky setup where the IP address on eth0 is the internal one, and the external address is generated and configured elsewhere. The idea is that nodes are connected to the Internet not directly but via a separate layer, and as a result external IP addresses can be changed either at creation time or even on the fly. It's quite cool.

I suspect our database cluster would perform even better with an all-internal network setup, but that is a task for another day.

3. Quotas

The last gotcha we hit during the migration was quotas, which I assume Google enforces to prevent abuse of the platform. Our whole setup required a total of a few dozen CPUs and several terabytes of SSD to run, so we had to ask for quota raises twice. Thankfully this was not much trouble, since we are a totally legit startup (yay!) and Google's support was very quick and receptive about it.

Conclusion

I normally consider myself a (somewhat) full-stack developer, but much of the fun I've had still comes from back-end and DevOps. Building these new servers was a bit like assembling many parts of a big puzzle, and the end result was very satisfying. Now on to a tonne of other stuff waiting for me to complete while I procrastinate by writing this blog entry...

San Francisco stories (no.5)

Not your kind of people

Supermoon observers

POV of an outsider

On one hand, I got to meet exceptional individuals on my trip to San Francisco and the Bay Area. These people are the ones in the driving seat, pushing the limits of technology, constantly at the forefront of innovation, taking shots at the impossible, and keeping the wealth flowing in. No exaggeration: they left me in awe of their intelligence and talent. Oh yes, there is plenty more talent to tap into outside of Silicon Valley - but these superhumans are a totally different breed, seriously...

On the other hand, there were homeless folks roaming almost every street we walked or drove down, and many of them were mentally ill. One night, while I was waiting for the BART in the centre of SF to get to our place in Oakland, an old man came around and kept spouting gibberish at everybody nearby - amongst which I made out the part where he said he was a vet in 'Nam (funnily enough...). He then proceeded to sing a song I could not really understand either, danced along with it in a deranged way, spoke some more gibberish and walked away. I glanced into his eyes and found them... well, soulless. It would have made a great photograph, but it was all just really sad, so I hesitated and decided not to take the shot. Ever since, I have been reading more and more about San Francisco's homelessness and mental health problems - fascinating stuff.

All of it was just like a dream - SF is now 8 timezones away. Jet lag is like leaving your heart and soul somewhere else. It is time to readjust.

From High Wycombe, England