While marketplaces are taking over the world (Wildberries reported a 96% increase in turnover in 2020, up to 437.2 billion rubles), we developers are looking for ways to make these marketplaces process requests in a matter of milliseconds and scale in a snap.
In this article I will share my experience with a highly loaded project that I have been working on for almost two years. For now I've decided to cover only a few points that I find really interesting.
One of the tasks was to bring the site's speed in line with the recognized market leaders. We studied the experience of Ozon, Wildberries, and Lamoda in order to find the best architecture and databases, create universal constructors, and share common logic with the mobile application.
Everything is really well thought out at Ozon, but the response speed is not that fast. Lamoda has a great design, but it's not perfect in terms of speed either, while Wildberries performs better than the others. That's why we set ourselves the goal of reaching exactly their level of speed. It's a pity that Wildberries developers don't share much, so if any of them read this article, please tell us about your experience. I think everyone would be very interested.
Below I describe the universal solutions our team found for different tasks along the way. The new site is being prepared for release, and very soon you will be able to judge for yourself how the methods described below worked out and how they affected the speed. Until then, read on and ask questions in the comments.
Server architecture development, database selection
How good your project turns out depends on many things: how experienced and forward-thinking the tech lead is; how carefully he thinks through the server architecture so that it can scale easily as the project grows; which databases he picks so that the site runs fast and doesn't annoy users. MySQL and PostgreSQL are the usual choices, and we have been working with the latter for two years already. In this case, however, it was not enough, because the project involved high loads and faceted search, so we had to look for another solution.
And while the load problem is clear (complex requests like "show all red products in size L for women" and "hide all filter values that have no such products" pile up in a queue, eat all the time, and the site ends up working slow as hell), I want to dwell on speed separately.
So, problem #1 is making faceted search convenient. Faceted search is search where filters depend on each other: if you choose the brand "boss", you are shown only the colors available for that brand; if you then choose size L, you see all items of the chosen brand, color, and size. All filters depend on one another.
At the same time, alongside faceted search, the customer wanted rather complex selections on the product page. For example, a selection of all products in the same color, or selections of discounted products in the store where you're choosing your outfit. This was task #2, and it pushed us to the conclusion that we needed another database: PostgreSQL alone would not be enough.
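To make the "filters depend on each other" idea concrete, here is a minimal in-memory sketch of the facet logic (product data and filter names are invented for the example, not from the project): for each facet, we count the values that remain available given all the *other* selected filters.

```typescript
// Minimal illustration of dependent (faceted) filters.
type Product = { brand: string; color: string; size: string };

const products: Product[] = [
  { brand: "boss", color: "red", size: "L" },
  { brand: "boss", color: "black", size: "M" },
  { brand: "acme", color: "red", size: "L" },
];

type Filters = Partial<Product>;

// For each facet, count values among products that match all *other*
// selected filters — so choosing a brand narrows the colors and sizes
// shown, and vice versa.
function facetCounts(items: Product[], selected: Filters) {
  const facets: Record<string, Record<string, number>> = {};
  const keys: (keyof Product)[] = ["brand", "color", "size"];
  for (const facet of keys) {
    facets[facet] = {};
    for (const item of items) {
      const matchesOthers = keys
        .filter((k) => k !== facet && selected[k] !== undefined)
        .every((k) => item[k] === selected[k]);
      if (matchesOthers) {
        facets[facet][item[facet]] = (facets[facet][item[facet]] ?? 0) + 1;
      }
    }
  }
  return facets;
}

const counts = facetCounts(products, { brand: "boss" });
// Only the colors available for "boss" remain: red and black.
```

In production this counting runs inside the database (MongoDB's aggregation pipeline can compute several facets per query), but the dependency between filters is the same as in this sketch.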
Yes, I understand that queries can be optimized, but in any case, the database needs to shovel a lot of data and spend a lot of time processing it.
In short, we began to look for a database that would read data quickly and make faceted search convenient. I won't enumerate the bunch of options we tried; in the end, MongoDB and Elasticsearch remained.
For our purposes they solve the same problem, but Elasticsearch is written in Java and consumes tons of server resources: with a catalog of even 100 thousand products, 8 GB of memory has to be allocated for Elasticsearch alone. And we generally find Java rather slow. That's why MongoDB won.
Solving the second problem, reducing the load on the site and the number of queries, led us to data denormalization: preparing data for output in advance. For example, we gathered everything about a product (availability in stores, related products, reviews, price) in one place. The product page then takes one query instead of several (which stores have it in stock, at what price, similar products, which sizes, and so on).
Next, we built a separate event system so that when availability changes or a review is added, the denormalized product data is updated. Yes, there is a slight delay before the data is refreshed, but we believe this delay is negligible for catalog listings. This is essentially a read/write split model. In the cart, for example, when placing an order, we no longer use denormalized data, but the selections there are simpler and the load is lower.
Of course, we could store this data in PostgreSQL as well, but that makes little sense given that all the faceted-search data lives in Mongo: it would mean querying one database to filter the data and another to output it.
Final database architecture
PostgreSQL is the main database: all logic and data live here, and all writes and queries go here. MongoDB is used in addition: the denormalized data lives here, read-only. Mongo is very fast and easy to replicate if the data or the load grows.
We built the process of writing data to MongoDB on top of several queues, which also lets us scale the project later by adding servers that drain the queues faster. It sounds simple, but the process requires utmost care, because it's easy to miss something.
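The event-plus-queue flow described above can be sketched like this. This is an illustration under heavy assumptions: the queue is a plain array and the "databases" are in-memory maps, whereas in the real project the queue would be a proper broker, the primary store PostgreSQL, and the read model MongoDB. All names are invented.

```typescript
// Sketch of the event → queue → denormalized-read-model flow.
type ChangeEvent = { productId: number; kind: "stock" | "review" | "price" };

const queue: ChangeEvent[] = [];
const denormalized = new Map<number, { id: number; stock: number; reviews: string[] }>();

// Stand-in for the primary database (PostgreSQL in the article).
const primary = {
  stock: new Map<number, number>([[1, 5]]),
  reviews: new Map<number, string[]>([[1, ["great"]]]),
};

function publish(event: ChangeEvent) {
  queue.push(event); // the write path only enqueues — it stays fast
}

// A worker drains the queue and rebuilds the affected documents.
// More workers can drain the same queue in parallel to scale.
function drain() {
  let event: ChangeEvent | undefined;
  while ((event = queue.shift())) {
    denormalized.set(event.productId, {
      id: event.productId,
      stock: primary.stock.get(event.productId) ?? 0,
      reviews: primary.reviews.get(event.productId) ?? [],
    });
  }
}

primary.stock.set(1, 4);              // an item sold in the offline store
publish({ productId: 1, kind: "stock" });
drain();                              // the read model catches up with a small delay
```

The product page then reads the whole document from `denormalized` in a single lookup, which is the "one query instead of several" effect described earlier.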
As a result, we got a very fast catalog and quick page-data generation: about 7 ms to select and prepare a product page, and about 30–40 ms for a catalog page. And that is on a server with 2 processors and 4 GB of memory, so the scaling prospects and the potential are enormous. The only likely bottleneck is disk speed.
I’d also note that our entire server architecture is managed by Kubernetes. It is good for business for the following reasons:
- Different branches are automatically deployed to the test site. That is, we can develop several new features in parallel and test each one on a separate domain. The key point is that it happens automatically.
- Kubernetes is a server-side orchestration tool, and you can take that literally: it conducts the container orchestra (DB1, DB2, front end, PHP). If a container goes down, it reboots it; if the load increases, it replicates it to another machine. It is our conductor. However, don't assume it can do everything by itself: developers have to describe all the logic in Kubernetes, writing a bunch of instructions for every situation. Someone might say it adds a bit of response time because it builds its own networks to manage the containers, but if a person did this by hand it would take even more time, and a human-error factor would appear. By the way, we always account for network time when measuring how long code takes to execute. Say you know the code takes 30 ms, but the browser console shows a request time of 100 ms. How so? The remaining 70 ms is the time spent transferring the data. If you want to reduce this time, keep servers in close proximity to the client; in the console you can often see that Wildberries, for example, tells the client the nearest server to load data from, in order to spend less time delivering it.
- Deployment to production happens without downtime, at the click of one button.
Common features with a mobile app. What is the value of code reuse?
I was, am, and will be for cross-platform development: it is important that the client gets the same functionality on any device and any operating system. Ideally I'd like to write code that suits both web and native apps, to save not only time but also to avoid discrepancies in logic. However, this is not fully possible yet.
Nevertheless, we finally managed to partially reuse code and logic between the web and mobile apps, since the web app is in React and the mobile app is in React Native.
At the same time, each platform has its own appearance. Technically speaking, we share a common Redux store, so all work with data happens in one place, but it does not affect the visual layer in any way.
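The idea of a shared store can be sketched as follows. The reducer and actions live in a common package, while web (React) and mobile (React Native) each render the state their own way. To keep the example self-contained I hand-roll a miniature `createStore` instead of importing the redux package; all names are illustrative.

```typescript
// Shared slice: same state shape and same actions on every platform.
type CartState = { items: string[] };
type Action = { type: "cart/add"; payload: string } | { type: "cart/clear" };

function cartReducer(state: CartState = { items: [] }, action: Action): CartState {
  switch (action.type) {
    case "cart/add":
      return { items: [...state.items, action.payload] };
    case "cart/clear":
      return { items: [] };
    default:
      return state;
  }
}

// Tiny stand-in for redux's createStore so the sketch runs on its own.
function createStore(reducer: typeof cartReducer) {
  let state = reducer(undefined, { type: "cart/clear" });
  return {
    getState: () => state,
    dispatch: (a: Action) => { state = reducer(state, a); },
  };
}

const store = createStore(cartReducer);
store.dispatch({ type: "cart/add", payload: "boss t-shirt" });
// Web and mobile dispatch the same actions; only the view layer differs.
```

The payoff is exactly what the article describes: data handling is written once, and each platform keeps full freedom over its own components.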
Of course, one could also mention React Native Web, which is designed to let one codebase run both in the browser with standard web technologies and on iOS or Android as a real native mobile application. A year ago we tried it, but it was still quite raw. So we did the following.
A front-end developer building the website wrote a store; then, with a slight lag, a mobile developer came to the same section, took that store, and built his own presentation on top (it could just as well have happened the other way around). This also saves us from diverging logic, and I mean not only between Android and iOS but also between the web and mobile apps. Plus it gives us a tremendous increase in development speed. In my opinion it's awesome.
Browser side caching
Surely it annoys everyone when you press "back" in the browser and land not on the row of the store listing where you were, but in a random place on the page, and you have to scroll back to the product you stopped at. What a shame! And sometimes it also takes a long time. That's why we do browser caching within the user's session: if the user clicks back, we simply show the page from the browser cache. It happens instantly, and the user returns to the place where he stopped browsing. The community offers many makeshift solutions, but as practice has shown, caching ajax requests is the most powerful and the easiest way to let users enjoy browsing your website's catalogue.
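A minimal sketch of session-scoped caching of catalogue requests: the first visit hits the network, and pressing "back" replays the cached response instantly. In the browser the cache would live in sessionStorage and the request would be asynchronous; here a Map and a stubbed request keep the example self-contained, and all names are invented.

```typescript
// Session-scoped ajax cache: one network call per URL per session.
const responses = new Map<string, string>();
let networkCalls = 0;

function requestPage(url: string): string {
  networkCalls++; // stand-in for a real ajax request
  return `catalogue page for ${url}`;
}

function cachedRequest(url: string): string {
  const hit = responses.get(url);
  if (hit !== undefined) return hit; // "back" button: served instantly
  const body = requestPage(url);
  responses.set(url, body);
  return body;
}

cachedRequest("/catalog?page=2"); // goes to the network
cachedRequest("/catalog?page=2"); // instant, from the session cache
```

Because the cached response is byte-identical to what was rendered before, the page can also restore the exact scroll position the user left from.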
Receiving and storing session data, token generation, data merging
Now let's talk about how sessions and data merging are organized in the project. Why do we separate GET and POST requests, and why do we cache them? Where did the idea of generating the token on the front end come from, and what does it give us?
First about the cache
Since the project is expected to scale, we decided to account for the additional load in advance and worked out a caching scheme. To make the site fast, the product page should load instantly, and the user profile should then be fetched asynchronously. This way the entire product page can be put in the cache, while the user profile is loaded separately. Faster? Obviously!
GET and POST requests
We concluded that it is enough to fetch user data only on the client side (it is not needed for SSR, and that gives us the ability to cache pages on the Nginx side). This gives an instant response, because the entire server-rendered page, i.e. all GET requests, can be cached (an option we keep in reserve for high loads), while information specific to each user (his favorites, the number of items in the cart, etc.) is fetched on the client side. A GET request is cached separately even for unregistered visitors and does not break on refresh. User data must never be modified by GET requests. Thus, if necessary, the entire site can be put into the cache without touching session data (authorization).
Let me illustrate with an example why this is important.
Let's say I logged in and opened the main page, and it got cached with my data on it. If we did not separate requests, then the next visitor would see the main page with my data (my favorites) received from POST requests, because Nginx would consider the cached page valid and serve it exactly as it was stored. With requests separated, the main page comes from the cache while user data is fetched separately for each visitor.
Session token (not to be confused with authorization)
The backend needs it to understand exactly who is talking to it and what data to return. It is a unique key that used to be generated on the backend, but we did it differently: on the client side. What did that give us? First, we know the source the person came from (web, iOS, Android); second, the front end no longer has to wait for one request to finish before sending another (a "create session" request followed by an "update browsing history" request), as it did before. So we gained speed again.
We made it so that we don't have to wait for the backend all the time, and also so that information about the client (for example, whether it is Android or iOS) reaches the backend immediately. After all, pricing in the store sometimes depends directly on the device.
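A minimal sketch of generating the session token on the client: a random unique key with the source embedded, so the backend knows the platform from the very first request. The token format here is made up for illustration; the project's actual scheme is not described in the article.

```typescript
import { randomUUID } from "node:crypto";

type Source = "web" | "ios" | "android";

// Generated on the client, so no round trip to the backend is needed
// before the first "real" request can be sent.
function newSessionToken(source: Source): string {
  return `${source}:${randomUUID()}`;
}

const token = newSessionToken("android");
// Both of these can now be fired in parallel, each carrying the token,
// instead of first waiting for a "create session" response:
//   1) fetch the session data   2) record a page view
```

This is where the speedup comes from: the two requests that used to be serialized (create session, then update history) become independent.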
The difference between a session token and an authorization token is that even if we had no user registration mechanism at all, the site would work exactly as described above. The session cookie is a unique identifier to which the backend can bind any data it needs. This approach lets us distinguish between two entities: user and session. What's cool about that?
user — Ilya A
session 1 — web
session 2 — android
Ilya, logged into his account on his computer, put 2 products in the cart. Then he opened the site on his phone without logging in and started filling a cart there too. For now this is "session 3", because we don't know who it is or whose cart it is. But as soon as Ilya logs into his account from the phone, all the goods are grouped together in Ilya's personal cart.
As a result, we connect all of a user's devices into one account and merge carts using session cookies. The performance benefit was explained above: it's a small thing that makes a big difference. Besides, we always know which devices our users visit the site from. In our specific case this influenced pricing: users of the mobile application received a 1% discount.
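The cart merge from the Ilya example can be sketched like this. The data shapes are illustrative (carts as lists of product names, in-memory maps instead of a database); the point is only the moment of binding an anonymous session to a user.

```typescript
// Merge an anonymous session cart into the user's cart on login.
type Cart = string[];

const userCarts = new Map<string, Cart>();     // keyed by user id
const sessionCarts = new Map<string, Cart>();  // keyed by session cookie

function login(sessionId: string, userId: string) {
  const anonymous = sessionCarts.get(sessionId) ?? [];
  const own = userCarts.get(userId) ?? [];
  // Union of both carts; a duplicate product collapses into one line.
  userCarts.set(userId, [...new Set([...own, ...anonymous])]);
  sessionCarts.delete(sessionId); // the session is now bound to the user
}

userCarts.set("ilya", ["jeans", "t-shirt"]); // sessions 1–2: logged in
sessionCarts.set("s3", ["sneakers"]);        // session 3: anonymous phone
login("s3", "ilya");
// userCarts.get("ilya") now holds all three items
```

After `login`, every later request carrying the "s3" cookie would resolve to Ilya's account, which is exactly the "connect all devices into one account" behavior described above.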
We also paid a lot of attention to images in this project, because they are key to the customer's sales. There is now a whole zoo of screen resolutions and pixel densities (retina, for example), and all of this must be taken into account to present the user with an ideal picture on every screen, in terms of both quality and proportions.
How can this be achieved? Let me tell you right away: it's a bad idea to upload one big high-resolution picture to show everyone the good quality. You either waste a lot of traffic, or you hit the problem where a picture loaded into a banner looks beautiful on a desktop while on a phone, say, a person's head gets cropped off.
Therefore, we load images through the picture tag, as a whole array: for phones and desktops, in different resolutions, with different pixel densities. By the way, the webp format is now a good option (at the time of writing, Safari did not support it, so both formats were used); it weighs less than the usual jpg. As a result, each device receives its own image. Of course, this takes up a lot of space (there are 14 images for 1 product, and that's only the product page; add the cart, the catalog, and so on, and we need about 50 files so that each user gets exactly the picture he needs), but space is the cheapest resource right now. And the goal is not to save money, but to show the product so well that the user buys it without going to the store.
We cache all images on the server side so as not to regenerate the entire array every time: the client uploads one image, and we produce as many variants as we need.
Front-end in the admin panel
Why is a good admin panel a cool tool that requires complex development, time, and money?
For us, the myths around admin panels are a real problem: for example, the belief that if the site is built on a CMS, the admin panel can be edited independently, without developers. We have already analyzed that particular story elsewhere (in Russian), so we will omit it here.
We always build everything from scratch, the admin panel included. We have a handy package, which we plan to open-source in the future, that lets us quickly create controls for simple entities like news or category names. Complex entities still demand a lot of attention, though: menus, constructors, the behavior of a staggered grid in real estate, or interface logic in logistics, for example. This is a really big piece of work and it shouldn't be underestimated. That's why we put in the hours, involve a front-end developer, and design and build convenient control mechanisms. Hence the corresponding price tag, but in the end you get a custom, convenient tool tailored to your project. There are 2 great examples below.
We built it so that the client could change and add things on his own, without developers: for example, new menu categories. What can they look like with the current constructor, using our project as an example?
As you can see, anything at all can be specified. But if the data were selected dynamically by such criteria, it would, as I said, be hard on the database, and everything would load slowly. So we fall back on the data denormalization we implemented: to add a new menu item in the admin panel, a task is launched that assembles the item from the selected parameters, and once it's ready, it opens very quickly thanks to that notorious denormalization.
But we still want to show the universality of the admin panel. Watch the video of the convenient, user-friendly constructor we created, which lets us solve all the client's tasks.
I would especially like to draw your attention to blocks: a super feature that lets us add pre-designed, pre-laid-out blocks to the catalog alongside products, for example to insert information about the brand or some kind of promo before the product list.
We also have the concept of collections in our project. It is clear that there is machine learning, automatic recommendations and other technologies and services, but sometimes you need to create collections manually. Let’s analyze the following case.
For example, we would like to recommend Boss jeans to go with Boss T-shirts, or T-shirts of the same color. All well and good, but we are building a universal architecture: today the client sells T-shirts that come in different colors, tomorrow it's laptops with different amounts of memory, and so on. Therefore, we built a constructor once again; check it out in the video.
Already during testing we realized we had made quite a powerful constructor that can express rules of an even more complex level, for example recommending other goods from the same store as the product you are viewing, but at a discount. Developers are no longer needed for this; a marketing specialist can do it.
Collection display Web Secret
Exchange with 1C
There are several scenarios for exchange with third-party systems:
- a third-party provider delivers a standardized file to the site, and the site processes it; this is how almost all CMSs work.
- the site periodically requests a standardized file by URL and applies updates when it is convenient for the site; this is how Yandex.Market works, for example.
- there is a full-fledged API on the site's side, and the third-party provider calls its methods, changing only what it needs; this is how Ozon works, for example. This method is also called push-based.
Scenarios 1 and 2 above are entry-level. In some cases they are enough, but large projects need option 3: although it is more complex, it is much faster, more flexible, and more capable.
Our project's exchange was implemented on the last principle. We chose it because a lot of data had to be updated and requested from outside. A full product exchange, for instance, is a routine operation, so everyone gets by with options 1 or 2. But we needed 1C to notify the site the moment a product was taken from the offline store, so that we could immediately remove it from the page as out of stock. Or say the importer for all Boss goods changes: we cannot rewrite every product by hand, so you change the importer's name once and it propagates to all goods. In addition, there are many background processes that require a full-fledged API (exchanging client information, push notifications, etc.), all of which is described in the documentation.
It is important to note that we are architecturally prepared for many suppliers sending us data, and obviously we won't process it all at once. That's why we respond to every API call with the number of the task placed in the queue. The supplier can check at any time, via a separate method, what status his task has: queued, completed, or failed with errors. Or, with notifications set up, he can receive the result as soon as the task is processed.
This lets us balance the load on our side however is convenient for us, without overloading the server if all suppliers suddenly start costly operations at once. It also scales easily: more servers can be added to process tasks in parallel when necessary.
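The queued-exchange API described above can be sketched like this. Everything here is in-memory and illustrative: the endpoint names in the comments and the status values are invented, and in the real project `submit` and `status` would sit behind HTTP methods of the site's API.

```typescript
// Every API call is answered with a task number; the supplier polls
// a separate status method while a worker processes tasks at the
// server's own pace.
type Status = "queued" | "done" | "error";

const tasks = new Map<number, Status>();
let nextId = 0;

// e.g. POST /api/stock — accept the payload, answer immediately.
function submit(payload: unknown): { taskId: number } {
  const taskId = ++nextId;
  tasks.set(taskId, "queued");
  return { taskId };
}

// e.g. GET /api/tasks/:id — the supplier checks progress at any time.
function status(taskId: number): Status | undefined {
  return tasks.get(taskId);
}

// A worker picks tasks up whenever the server has capacity, so a
// burst of supplier calls never overloads the site. More workers can
// run in parallel to scale.
function work() {
  for (const [id, s] of tasks) if (s === "queued") tasks.set(id, "done");
}

const { taskId } = submit({ sku: "boss-42", stock: 0 });
// status(taskId) stays "queued" until a worker runs
work();
```

The key property is that `submit` never does the expensive work itself; it only records the task, so the response to the supplier is instant regardless of the current load.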
In this article I've shared the experience we gained working with a great client, experience that can be reused and improved in other projects. I'll be happy if it is useful to someone; I wrote it not to boast but to share. So leave comments and share your own cool solutions.