Platform
For us, of course, the architecture of the project is of particular interest: how the main components of the system interact, what in-house developments were required, and what tricks had to be used. But before moving on to it, you need to get acquainted with the basics: the technologies and products used.
Debian Linux is used as the main operating system: a time-tested solution and one of the oldest and most stable modern distributions. To balance the load between application servers, the nginx HTTP server is used in reverse proxy mode. Its responsibilities include maintaining the connection with the user’s browser, passing requests to the servers that execute the PHP code, and controlling the delivery of the result back to the browser. PHP code is executed by the mod_php module for Apache. There are quite a few alternatives, especially ones based on the FastCGI protocol, but VKontakte’s management took the more conservative path here and chose the most time-tested solution. No special systems for optimizing PHP performance are used (Facebook, for example, wrote its own compiler, HipHop, which translates PHP into C++); the only external optimization is opcode caching with the publicly available XCache.
The situation with data storage looks rather unclear: on the one hand, an in-house database management system, written in C and created by the “best minds” of Russia, is actively used; on the other hand, MySQL was often mentioned as the main storage. VKontakte’s own database is described in more detail below. Speaking of data storage, one cannot fail to mention such an important aspect as caching frequently used information (keeping it in RAM for quick access). A very popular product in this area is used for this: memcached. If you haven't heard of it, this system performs very simple atomic operations, such as storing and retrieving arbitrary data by key. Its main features are lightning-fast access and the ability to easily combine the RAM of a large number of servers into a common pool for temporary storage of “hot” data.
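How the RAM of many cache servers can be combined into one pool is easiest to see in a small sketch. The code below is only an illustration under stated assumptions: FakeCacheServer is an in-memory stand-in for a memcached node, ShardedCache is a hypothetical client, and the modulo-based key routing is one common choice, not something the article attributes to VKontakte.

```python
import hashlib


class FakeCacheServer:
    """In-memory stand-in for one memcached server (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class ShardedCache:
    """Routes every key to exactly one server, so the servers' RAM acts as one pool."""

    def __init__(self, servers):
        self.servers = servers

    def _server_for(self, key):
        # Assumption: hash the key and pick a server by modulo. Real clients
        # often use consistent hashing to limit remapping when servers change.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

    def set(self, key, value):
        self._server_for(key).set(key, value)

    def get(self, key):
        return self._server_for(key).get(key)


servers = [FakeCacheServer(f"cache{i}") for i in range(4)]
cache = ShardedCache(servers)
for user_id in range(1000):
    cache.set(f"user:{user_id}:name", f"User {user_id}")
```

Each key lives on exactly one server, so adding servers adds capacity; the client alone decides where a key goes, and the servers do not need to know about each other.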
Side projects that are not central to VKontakte are often implemented either with rather exotic solutions or, conversely, with the simplest technologies. For example, the instant messaging service is implemented in node.js (you can read more about this platform in the article “Server-side JavaScript” in ][ 08/2010) using the XMPP protocol, also known as Jabber (we will return to it later). Video conversion is implemented with the simplest and most effective library, ffmpeg, whose codecs are also used by the very popular VLC video player.
Main technologies used
- Debian Linux is the main operating system
- nginx - load balancing
- PHP + XCache
- Apache + mod_php
- memcached
- MySQL
- Own DBMS in C, created by the “best minds” of Russia
- node.js - a layer for implementing the XMPP protocol, lives behind HAProxy (haproxy.1wt.eu)
- xfs - file system for storing images and delivering them to the user
- ffmpeg - video conversion
Popularity of VKontakte
According to recent studies, VKontakte is the most popular social network, and its audience has the most purchasing power.
VKontakte is becoming more popular every day, and it is no longer used only by Russian citizens. Besides individuals, a huge number of online stores, beauty salons, sanatoriums, guest houses and training centers maintain accounts on the network alongside their own websites.
About 500 million people are registered on VKontakte, and about 100 million visit every day.
But even such a well-promoted social network does not suit every type of business. Groups make sense for businesses and niches in which the company interacts directly with the end consumer.
Architecture
The most noticeable difference from the architecture of many other large Internet projects is that VKontakte servers are multifunctional. That is, there is no clear division into database servers, file servers and so on: each server is used in several roles at once. Roles are redistributed semi-automatically, with the participation of system administrators. On the one hand, this makes more efficient use of system resources, which is good; on the other hand, it increases the likelihood of conflicts at the operating system level within a single server, which leads to stability problems. However, even though servers are used in several roles, the project’s computing power is usually less than 20% utilized.
Load balancing between servers is multi-layered: it includes balancing at the DNS level (the domain is served from 32 IP addresses) as well as request routing inside the system, with different servers handling different types of requests. For example, generating the news pages (now called a microblog) follows a clever scheme that uses the memcached protocol’s ability to request data for a large number of keys in parallel. If the data is not in the cache, a similar request is sent to the data storage system, and the results are sorted, filtered and trimmed of the excess in PHP code. Facebook does this in a similar way (the two companies recently exchanged experience), except that Facebook uses MySQL instead of an in-house DBMS.
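The news feed scheme described above can be sketched roughly as follows. This is a minimal illustration, not VKontakte's code: DictCache and DictStorage are hypothetical in-memory stand-ins for memcached and the storage system, and the key format "news:<id>", the "hidden" flag and the "ts" field are assumptions made up for the example.

```python
class DictCache:
    """Stand-in for a cache that supports batched lookups (hypothetical)."""

    def __init__(self):
        self.data = {}

    def get_multi(self, keys):
        # Returns only the keys that are present, as a new dict.
        return {k: self.data[k] for k in keys if k in self.data}

    def set_multi(self, mapping):
        self.data.update(mapping)


class DictStorage:
    """Stand-in for the slower data storage system (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows
        self.requested = []  # remembers which batches were asked for

    def load_many(self, keys):
        self.requested.append(list(keys))
        return {k: self.rows[k] for k in keys if k in self.rows}


def build_feed(friend_ids, cache, storage, limit=3):
    keys = [f"news:{fid}" for fid in friend_ids]

    # One batched cache lookup for all friends at once.
    found = cache.get_multi(keys)

    # Only the keys the cache did not have go to the storage system.
    missing = [k for k in keys if k not in found]
    if missing:
        loaded = storage.load_many(missing)
        cache.set_multi(loaded)
        found.update(loaded)

    # Sorting, filtering and trimming happen in application code.
    entries = [e for key in keys for e in found.get(key, [])]
    entries = [e for e in entries if not e["hidden"]]
    entries.sort(key=lambda e: e["ts"], reverse=True)
    return entries[:limit]


storage = DictStorage({
    "news:1": [{"ts": 10, "text": "a", "hidden": False}],
    "news:2": [{"ts": 30, "text": "b", "hidden": False},
               {"ts": 5, "text": "c", "hidden": True}],
    "news:3": [{"ts": 20, "text": "d", "hidden": False}],
})
cache = DictCache()
cache.set_multi({"news:1": storage.rows["news:1"]})

feed = build_feed([1, 2, 3], cache, storage, limit=2)
feed_again = build_feed([1, 2, 3], cache, storage, limit=2)
```

In this sketch the first call goes to storage only for the two friends missing from the cache, and the second call is served from the cache entirely.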
A large amount of software has been developed in-house at VKontakte that meets the project’s needs more precisely than the available open-source and commercial solutions. In addition to the aforementioned proprietary DBMS, there is a monitoring system with SMS notifications (Pavel himself helped design the interface), an automatic code testing system, and statistics and log analyzers.
The project uses fairly powerful equipment; the following server characteristics were tentatively named:
- 8-core Intel processors (two per server, apparently);
- 64 GB of RAM;
- 8 hard drives;
- RAID is not used (replication and backup are carried out at the software level).
It is noteworthy that the servers are not from a major brand but are assembled by a specialized Russian company. The project’s equipment is currently located in 4 data centers in St. Petersburg and Moscow: the entire main database is in one St. Petersburg data center, and only audio and video are hosted in Moscow. There are plans to replicate the database to another data center in the Leningrad region, and to use a content delivery network to speed up media downloads in the regions.
Many projects that have to deal with a large number of photos invent their own solutions for storing them and delivering them to users. This was the first question put to Pavel from the audience: “How do you store images?” - “On disks!” One way or another, VKontakte representatives said that this whole pile of photos of all colors and sizes is simply stored on and served from the file system (they use xfs) of a large number of servers, without additional frills. The only confusing thing is that this approach did not work for other major projects; they probably did not know the magic word :).
No less magical is the in-house database written in C. This product probably received the most attention from the audience, yet almost no details about what it actually is were made public. It is known that the DBMS was developed by the “best minds” of Russia, winners of Olympiads and TopCoder competitions, and that it is used in the most heavily loaded VKontakte services:
- Private messages
- Messages on the walls
- Statuses
- Search
- Privacy
- Friends lists
Unlike MySQL, a non-relational data model is used, and most operations are performed in RAM. The access interface is an extended memcached protocol. Specially composed keys return the results of complex queries (most often specific to a particular service).
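The article gives no details about how these keys are composed, so the following is purely a hypothetical sketch of the general idea: a store that answers memcached-style get(key) calls, where certain key prefixes trigger a computed query instead of a plain lookup. The class name, the prefixes "friends_online" and "mutual_friends", and the argument syntax are all invented for illustration.

```python
class TinyKeyQueryStore:
    """Toy in-memory store: plain keys return stored values, while keys with
    a registered prefix run a query (hypothetical design, not VKontakte's)."""

    def __init__(self):
        self.values = {}    # plain key -> value
        self.friends = {}   # user id -> set of friend ids
        self.online = set() # ids of users currently online
        self.queries = {
            "friends_online": self._friends_online,
            "mutual_friends": self._mutual_friends,
        }

    def set(self, key, value):
        self.values[key] = value

    def get(self, key):
        # Assumed key syntax: "<prefix>:<int>[,<int>...]" for queries.
        prefix, _, args = key.partition(":")
        handler = self.queries.get(prefix)
        if handler is None:
            return self.values.get(key)  # ordinary key-value lookup
        return handler(*[int(a) for a in args.split(",")])

    def _friends_online(self, user_id):
        return sorted(self.friends.get(user_id, set()) & self.online)

    def _mutual_friends(self, a, b):
        return sorted(self.friends.get(a, set()) & self.friends.get(b, set()))


store = TinyKeyQueryStore()
store.set("status:1", "at work")
store.friends = {1: {2, 3, 4}, 2: {1, 3}}
store.online = {3, 4, 5}
```

The appeal of such a design is that any existing memcached client can talk to the store: the client sends an ordinary get, and the query logic lives entirely on the server side.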
The system was designed taking into account the possibility of clustering and automatic data replication. The developers would like to make this system a universal DBMS and publish it under the GPL, but so far it has not been possible due to the high degree of integration with other services.
Interesting facts about VKontakte
- The development process is close to the Agile methodology with weekly iterations (cycles), within which all stages of development take place: planning, requirements analysis, design, development and testing.
- The operating system kernel has been modified (to work with memory), and there is its own package base for Debian.
- Photos are uploaded to two hard drives on one server at the same time and then backed up to another server.
- There are many improvements to memcached, including ones for more stable and long-term placement of objects in memory; there is even a version that ensures data persistence.
- Photos are not deleted to minimize fragmentation.
- Decisions on the development of the project are made by Pavel Durov and Andrey Rogozov; responsibility for each service rests with them and with the developer who implemented it.
- Pavel Durov has been economizing on hosting since his first year at university :).
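The photo upload scheme from the list above (two disks on one server, with a backup to another server later) can be illustrated with a minimal sketch. Everything here is an assumption: store_photo is a hypothetical helper, temporary directories stand in for the two disks, the copies are written one after another rather than truly simultaneously, and the later backup to a second server is left out.

```python
import tempfile
from pathlib import Path


def store_photo(photo_id, data, disk_roots):
    """Write the same photo bytes under every given disk root and
    return the paths of the copies."""
    paths = []
    for root in disk_roots:
        path = Path(root) / f"{photo_id}.jpg"
        path.write_bytes(data)
        paths.append(path)
    return paths


# Two temporary directories play the role of two physical disks.
with tempfile.TemporaryDirectory() as disk_a, tempfile.TemporaryDirectory() as disk_b:
    written = store_photo("photo_1", b"fake-jpeg-bytes", [disk_a, disk_b])
    copies_match = all(p.read_bytes() == b"fake-jpeg-bytes" for p in written)
```

Keeping two copies on separate disks gives software-level redundancy, which fits the statement earlier in the article that RAID is not used.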
Facebook Features
Facebook is a popular social network created in 2004 by a Harvard student. Its main goal is to organize interaction between people over the Internet. Users can exchange messages, photos, videos and music.
Facebook became one of the first social networks to win a worldwide audience, and its rise made a strong impression around the world. VK and Instagram appeared only later. Distinctive features of the platform include:
- no built-in audio section of the kind VK users are used to;
- no way to track page visit statistics.
The remaining functions are no different from those of VK, although it would be more accurate to say that VK is a copy of Facebook. Facebook users can listen to music, exchange messages, send photos and visit communities. Anyone can also create their own interest group and gather an audience.
How is Facebook different from Instagram? The main feature of the platform is that it is known worldwide. This matters most when you want to find a friend abroad or ask for help, for example when you need a place to stay in another country.
Subprojects
Audio and video services are secondary to the social network, and the project’s creators do not particularly focus on them. This is mainly because they have little to do with the main purpose of a social network, communication, and because they create a large number of problems. Video traffic is the project’s main expense, and on top of that there are the well-known problems with illegal content and claims from copyright holders. Between 1000 and 1500 servers are used to transcode video, and the video is also stored on them. When media files are deleted at the request of copyright holders, they are banned by hash, but this is ineffective and there are plans to improve the mechanism. Evidently, this means developing a more intelligent algorithm for recognizing audio and video content by tags, as is done, for example, on YouTube, where an uploaded video that violates a license can be deleted automatically within a few minutes of uploading.
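Why banning by hash is ineffective is easy to show in a few lines. The sketch below is an illustration only: HashBlocklist is a hypothetical class, and SHA-256 is an assumed choice, since the article does not say which hash function VKontakte uses.

```python
import hashlib


class HashBlocklist:
    """Remembers the hashes of removed files and rejects exact re-uploads."""

    def __init__(self):
        self._banned = set()

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # Assumption: SHA-256 over the raw file bytes.
        return hashlib.sha256(data).hexdigest()

    def ban(self, data: bytes) -> None:
        self._banned.add(self.fingerprint(data))

    def is_banned(self, data: bytes) -> bool:
        return self.fingerprint(data) in self._banned


blocklist = HashBlocklist()
original = b"\x00\x01fake-video-bytes" * 100
blocklist.ban(original)

# The same content with a single extra byte, as any re-encoding,
# trimming or metadata change would produce.
reencoded = original + b"\x00"
```

An exact copy of the removed file is caught, but the slightly modified copy has a completely different hash and passes the check, which is why content-based recognition is needed.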
As you know, some time ago it became possible to communicate on VKontakte via the Jabber protocol (also known as XMPP). The protocol is completely open and has many open-source implementations. For a number of reasons (including problems with integrating other VKontakte services), the team decided to write its own server within a month, which would act as a layer between VKontakte’s internal services and the implementation of the XMPP protocol. It is implemented in node.js: the choice was made because almost all of the project’s developers know JavaScript, and because node.js offers a good set of tools for the task. The difficult part was working with large contact lists. Many users have hundreds or thousands of VKontakte friends, and status changes are frequent: people go online and offline more often than in comparable services. In addition, close integration with VKontakte’s internal personal messaging system was required. As a result, the service has 60-80 thousand people online, and 150 thousand at its peak. The HAProxy TCP/HTTP load balancer handles incoming connections and is used to distribute requests across servers and to deploy new versions.
When choosing a data storage system, the developers considered non-relational stores (in particular, MongoDB), but in the end decided to use the familiar MySQL. The service runs on 5 servers of different configurations, each of which runs node.js code (4 processes per server), and the three most powerful ones also run MySQL. An interesting feature is that groups of friends in XMPP are not linked to groups of friends on the site. This was done at the request of users who did not want friends looking over their shoulder to see which group they had been placed in.
An important subproject is also integration with external resources, which is far from easy to implement in a highly loaded service. Increasingly, on the pages of third-party projects you can see “Like” widgets, which allow you to quickly share an interesting post with your friends, as well as small “We are VKontakte” blocks with data about users within the linked group. The main steps taken in this direction, with some comments:
- Maximum cross-browser compatibility for widgets and IFrame applications, based on the easyXDM and fastXDM libraries, which handle interaction between a third-party resource and the VKontakte API. This solved both the problem of cross-domain interaction and the issue of working in all browsers.
- Cross-posting statuses on Twitter, implemented using request queues.
- A “share with friends” button that supports openGraph tags and automatically selects a suitable illustration (by comparing the contents of the tag
- Ability to add videos via third-party video hosting sites (YouTube, RuTube, Vimeo, etc.).
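The cross-posting of statuses through request queues, mentioned in the list above, might look roughly like the sketch below. It is a minimal illustration under assumptions: the job format, run_crosspost_worker and fake_send are hypothetical, and fake_send only stands in for a real call to an external API.

```python
import queue
import threading


def fake_send(user_id, text):
    """Stand-in for a call to an external service (hypothetical)."""
    if not text:
        raise ValueError("empty status")


def run_crosspost_worker(jobs, send, results):
    """Take jobs off the queue one by one until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        try:
            send(job["user_id"], job["text"])
            results.append((job["user_id"], "sent"))
        except Exception as exc:
            # A failed post is recorded and does not stop the worker.
            results.append((job["user_id"], f"failed: {exc}"))
        finally:
            jobs.task_done()


jobs = queue.Queue()
results = []
worker = threading.Thread(target=run_crosspost_worker, args=(jobs, fake_send, results))
worker.start()

# The web request only enqueues the job and returns immediately.
jobs.put({"user_id": 1, "text": "hello"})
jobs.put({"user_id": 2, "text": ""})
jobs.put(None)

jobs.join()
worker.join()
```

The point of the queue is that a slow or failing external service delays only the background worker, not the page that the user is waiting for.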
Instagram Features
Initially, this platform was created so that users could upload photos from their travels or important events. But the service does not stand still. Among the features of Instagram today are:
- sending messages. Users can communicate both by text and by voice;
- blogging. Instagram has become a productive platform for many bloggers, who not only write informative posts but also make money from blogging;
- creating short stories. Instagram Stories is a feature that lets you publish a photo or video for one day, after which the material disappears or is moved to the archive. Recently, the application’s developers added the ability to save stories in the Highlights section, shown on the main page of the user’s profile.
Another innovation was a dedicated channel called IGTV, where users can post long videos. Its introduction brought Instagram closer to video platforms such as YouTube.
Not a secret
The veil of secrecy over VKontakte’s technical implementation has been lifted slightly and many interesting details have been published, but many points still remain a secret. Perhaps more detailed information will appear in the future about VKontakte’s own DBMS, which, as it turns out, is the key to solving the hardest scalability problems in the system. Whatever one thinks of VKontakte, the service is very interesting from the point of view of building highly loaded systems. After all, 11 billion requests per day, the highest uptime and almost 100 million users count for a lot.
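To get a feel for the figure of 11 billion requests per day, it helps to convert it into an average rate per second. The short calculation below assumes the load is spread evenly over 24 hours, which real traffic is not, so peak rates would be noticeably higher.

```python
requests_per_day = 11_000_000_000
seconds_per_day = 24 * 60 * 60  # 86,400 seconds

# Average request rate under the even-load assumption.
average_rps = requests_per_day / seconds_per_day
print(round(average_rps))  # 127315
```

That is roughly 127 thousand requests per second on average across the whole system.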
What is more popular and what is better to choose
Instagram, Facebook or VKontakte: which to choose? Today, Facebook is considered the popular platform, and many Internet users will confirm this.
Interestingly, Facebook is beginning to lose ground to Instagram as the latter develops and improves. Most of the audience is gradually moving to that platform, preferring short stories to routinely scrolling through the Facebook news feed.
The second question, which one is better to choose, is harder, since there is no single answer. Each user needs to decide which platform suits them best and why they want to register. If the goal is to make new acquaintances in other countries, then Facebook is the suitable service. To blog, publish photos and earn money, register on Instagram.
Finally, VK is the platform suited to simply spending time comfortably. The choice is up to the user.