Friday, May 14

Flickr Architecture

Flickr is one of my favorite sites and a leading photo-sharing website. Flickr must face some amazing challenges when it comes to managing its content: ever-expanding content, an ever-increasing number of users, and a constant stream of new features, all while providing excellent performance. How do they do it?

Site: flickr.com

Flickr and PHP (an early document)
Capacity Planning for LAMP
Federation at Flickr: Doing Billions of Queries a Day by Dathan Pattishall.

Platform
PHP
MySQL
Shards
Memcached for a caching layer.
Squid as a reverse proxy for HTML and images.
Linux (RedHat)
Smarty for templating
Perl
PEAR for XML and Email parsing
ImageMagick, for image processing
Java, for the node service
Apache
SystemImager for deployment
Ganglia for distributed system monitoring
Subcon stores essential system configuration files in a subversion repository for easy deployment to machines in a cluster.
Cvsup for distributing and updating collections of files across a network.

The Stats
More than 4 billion queries per day.
35M photos in squid cache (total)
2M photos in squid’s RAM
470M photos, 4 or 5 sizes of each
38k req/sec to memcached (12M objects)
2 PB raw storage (about 1.5 TB consumed on Sunday)
Over 400,000 photos being added every day
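
To put the daily totals above in perspective, here is a rough back-of-the-envelope conversion to per-second averages. It assumes load is spread evenly over 24 hours, which real traffic is not, so peak rates would be higher than these figures. The variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted in the stats list above.
queries_per_day = 4_000_000_000  # "more than 4 billion queries per day"
photos_per_day = 400_000         # "over 400,000 photos being added every day"

# Averages only: this assumes a perfectly flat traffic curve.
avg_queries_per_sec = queries_per_day / SECONDS_PER_DAY
avg_uploads_per_sec = photos_per_day / SECONDS_PER_DAY

print(f"average SQL queries/sec: {avg_queries_per_sec:,.0f}")
print(f"average photo uploads/sec: {avg_uploads_per_sec:.2f}")
```

So the database tier averages roughly 46,000 queries every second, and between four and five new photos arrive every second.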

The Architecture
A pretty picture of Flickr's architecture can be found on this slide. A simple depiction is:
-- Pair of ServerIrons
---- Squid Caches
------ NetApps
---- PHP App Servers
------ Storage Manager
------ Master-master shards
------ Dual Tree Central Database
------ Memcached Cluster
------ Big Search Engine
Use dedicated servers for static content.
The source material also covers how to support Unicode.
Use a shared-nothing architecture.
Everything (except photos) is stored in the database.
Statelessness means they can bounce people around servers and it's easier to make their APIs.
Use horizontal scaling so they just need to add more machines.
Shards (google it)
Every user's reads and writes are kept in one shard, so the notion of replication lag is gone.
The average page runs 27-35 SQL statements.
Each shard holds data for 400K+ users.
ibbackup runs on a cron job across various shards at different times.
Snapshots are taken every night across the entire cluster of databases.
Photos are stored on the filer. Pointers are stored in the database.
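
To make the sharding idea above concrete, here is a minimal sketch of keeping every read and write for a user in one shard, with the database holding only a pointer to the photo file. It is not Flickr's actual code: the class name, the table layout, the round-robin assignment of new users, and the in-memory SQLite databases standing in for MySQL shards are all assumptions made for illustration.

```python
import sqlite3


class ShardedPhotoStore:
    """Toy federation: each user is pinned to exactly one shard."""

    def __init__(self, shard_count):
        # In-memory SQLite databases stand in for the MySQL shards.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute(
                "CREATE TABLE photos ("
                "photo_id INTEGER PRIMARY KEY, user_id INTEGER, filer_path TEXT)"
            )
        # Stand-in for a central lookup table: user_id -> shard index.
        self.user_to_shard = {}

    def shard_for(self, user_id):
        # Hypothetical policy: new users are assigned round-robin,
        # and the assignment never changes afterwards.
        if user_id not in self.user_to_shard:
            self.user_to_shard[user_id] = len(self.user_to_shard) % len(self.shards)
        return self.user_to_shard[user_id]

    def add_photo(self, user_id, filer_path):
        # Only a pointer to the file goes in the database;
        # the image bytes themselves would live on the filer.
        db = self.shards[self.shard_for(user_id)]
        db.execute(
            "INSERT INTO photos (user_id, filer_path) VALUES (?, ?)",
            (user_id, filer_path),
        )
        db.commit()

    def photos_for(self, user_id):
        # Reads hit the same shard as the writes, so a user always
        # sees their own latest data.
        db = self.shards[self.shard_for(user_id)]
        rows = db.execute(
            "SELECT filer_path FROM photos WHERE user_id = ? ORDER BY photo_id",
            (user_id,),
        )
        return [path for (path,) in rows]


store = ShardedPhotoStore(shard_count=2)
store.add_photo(1, "/filer/a.jpg")
store.add_photo(2, "/filer/b.jpg")
store.add_photo(1, "/filer/c.jpg")
print(store.photos_for(1))
```

Because a user's data never spans shards in this sketch, adding capacity means adding another shard and pointing new users at it, which is the horizontal scaling described above.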
