Slides are available at http://asfws12.files.wordpress.com/2012/11/appsec2012_keynote.pdf

Alok Menghrajani graduated from EPFL in Lausanne with a Master's degree before joining Facebook in 2008, which back then was still a young startup with “only” 100 million users. He gave us an interesting insight into how Facebook manages over 10 million lines of code while keeping “move fast and break things” as its motto. In practice, this translates into two code releases to production per day, where competitors release only every couple of weeks or once a month. From a technical perspective, all web-facing content is written in PHP, while the respective native languages are used for the mobile apps. The backend systems are a mix: C++, Java and Erlang are used, for example.

No team owns a given set of code: everyone can alter any part of it, but needs to find a competent reviewer before being able to check it into the internal Git repository. As a second quality measure, some regex-based greps are run on the committed code, and between 1000 and 2000 WebDriver test cases (run within a browser) are executed before the code gets released to production. Most security aspects are covered by dedicated frameworks.
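
To make the regex-based check more concrete, here is a minimal sketch in Python of what such a grep over committed code could look like. The two rules and the `lint` function are purely illustrative assumptions; the talk did not disclose Facebook's actual patterns or tooling.

```python
import re

# Hypothetical rules, for illustration only: each pairs a regex with a message.
RULES = [
    (re.compile(r"\beval\s*\("), "avoid eval()"),
    (re.compile(r"\$_(GET|POST|REQUEST)\b"),
     "raw request input; use the framework accessor"),
]


def lint(source: str) -> list[tuple[int, str]]:
    """Return (line_number, message) for every rule that matches a line."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for pattern, message in RULES:
            if pattern.search(line):
                findings.append((number, message))
    return findings
```

A check like this is cheap enough to run on every commit, which is presumably why it works as a second line of defence behind the human reviewer.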

One of the essential frameworks in use is GateKeeper, which allows serving differentiated, personally tailored content. Based on rules such as “is the user an employee or not”, “is the user over 18 years old” etc., the rendered page can be tuned or a specific feature disabled. Such rules are distributed worldwide within 1 second, and the whole framework introduces a latency of only 20 ms. This makes it handy for quickly restricting content, for instance because of a lawsuit in a given country. From a usability point of view, it is also a great way to test different layouts and gather vital intelligence on what users appreciate.
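
The rule-based gating described above can be pictured roughly as follows. This is only a sketch under my own assumptions: the `User` fields, the `FEATURES` table, the feature names and the `is_enabled` function are all hypothetical, and GateKeeper's real interface was not shown in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class User:
    is_employee: bool
    age: int
    country: str


Rule = Callable[[User], bool]

# Hypothetical feature table: a feature is served only if all its rules pass.
FEATURES: dict[str, list[Rule]] = {
    "new_layout": [lambda u: u.is_employee],
    # Assumption: "over 18" is read as "18 or older"; "XX" is a placeholder
    # for a country where the content has to be restricted.
    "adult_content": [lambda u: u.age >= 18, lambda u: u.country != "XX"],
}


def is_enabled(feature: str, user: User) -> bool:
    """Evaluate every rule of a feature for a user; unknown features stay off."""
    rules = FEATURES.get(feature)
    if rules is None:
        return False
    return all(rule(user) for rule in rules)
```

Keeping the rules as data rather than code is what would make it possible to change them without a release, for example by adding a country restriction to an existing feature.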

Another effective feature is an IPS embedded at the heart of the PHP MVC pages served by the site. Here again, rules can be altered on the fly and deployed across the globe within 1 second. In case of a security flaw, Facebook can write an IPS rule within a few minutes and deploy it across all servers immediately. The focus then shifts to patching the issue in the code (~30 minutes, including review) before performing an emergency release (count 40 minutes to deploy a patch to all 200’000 servers). A serious security flaw can thus be contained within a few minutes and patched within a little more than an hour. This IPS feature is also flexible enough to protect users against other threats, such as the recent 0-day in Internet Explorer.
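
As an illustration of how an in-application IPS rule might stop an attack while the real patch is being written, here is a small Python sketch. The `Request`, `BlockRule` and `Filter` classes, the `/photo.php` path and the `id` parameter are all made up for the example; the talk did not describe the real rule format.

```python
import re
from dataclasses import dataclass


@dataclass
class Request:
    path: str
    params: dict[str, str]


@dataclass
class BlockRule:
    """Hypothetical rule: block requests under a path prefix whose
    parameter value matches a pattern."""
    path_prefix: str
    param: str
    pattern: re.Pattern

    def matches(self, request: Request) -> bool:
        if not request.path.startswith(self.path_prefix):
            return False
        value = request.params.get(self.param, "")
        return self.pattern.search(value) is not None


class Filter:
    """Holds the active rules; rules can be added while requests are served."""

    def __init__(self) -> None:
        self.rules: list[BlockRule] = []

    def add_rule(self, rule: BlockRule) -> None:
        self.rules.append(rule)

    def allow(self, request: Request) -> bool:
        # A request passes only if no active rule matches it.
        return not any(rule.matches(request) for rule in self.rules)
```

With something like `Filter().add_rule(BlockRule("/photo.php", "id", re.compile(r"[^0-9]")))`, any non-numeric `id` sent to that page would be rejected at once, which buys the time needed for the roughly 30 minutes of patching and 40 minutes of deployment mentioned above.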

Also of interest was the explanation of why data deletion on Facebook takes some time: the data is stored in a very decentralized way. My previous take was that Facebook did not delete data, but kept it all. It seems that things have changed in this regard and your data might now get deleted if you decide to leave the social network.

The whitehat program was also covered: it pays USD 1’500 and upwards (with no upper cap) for a security issue you report to Facebook. The main rules are that you need to involve Facebook’s team first, before publishing the flaw elsewhere, and that the bounty reflects the severity of the bug you reported. An issue that needed several days of work will therefore likely be rewarded with more than USD 1’500. Furthermore, once the security flaw has been confirmed, the researcher is put in direct contact with the engineer tasked with fixing the issue. Around 100 bugs have been reported so far, according to Alok.

Questions from the audience:

  • Has the code been completely rewritten lately?
    Yes, the code was completely rewritten in 2009 when XHP (Facebook’s PHP extension) was introduced. Moreover, very little code in the code base is currently older than 1 year!
  • How is the code managed?
    There is one code base, which gets branched in Git on Sunday and released to production on Tuesday. It’s a very iterative process, as most changes are first tried out on 1% of the user base (thanks to GateKeeper) and constantly monitored.
  • What advice can Facebook give to a small or medium business?
    The best thing in Alok’s opinion is to establish this culture of “everyone owns the code”, and therefore empower everyone to see, review and alter code across the whole company.
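
Coming back to the 1% sampling mentioned in the second answer: one common way to pick a stable slice of users is to hash the user ID into a bucket. The Python sketch below shows that general technique; it is my own assumption about how such a sampling could be done, not a description of how GateKeeper does it, and the function names are hypothetical.

```python
import hashlib


def bucket(user_id: int, feature: str, buckets: int = 100) -> int:
    """Map a (feature, user) pair to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(user_id: int, feature: str, percent: int) -> bool:
    """True if the user falls inside the first `percent` buckets."""
    return bucket(user_id, feature) < percent
```

Because the bucket depends only on the feature name and the user ID, a given user always sees the same variant, and raising the percentage only ever adds users to the sample.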

After this first keynote, it was time for all participants to take a break before the presentations started.