I have a web app in Azure, which has roughly 100k visitors a month, with less than 2 page views per session (purely SEO visitors).
I just studied our Azure bills, and was shocked to find out that during the last month we sent 3.41 TB of data out.
Terabytes.
This makes absolutely no sense. Our average page size is less than 3 MB (a lot, but not the 30 MB the math would suggest). Going by the bill, the data out per session works out to:
3431000 (MB) / 150000 (sessions) = 23 MB per session, which is absolutely bogus. A result from a service such as Pingdom says:
(Seems Stack.Imgur is down – temp link: http://prntscr.com/gvzoaz )
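The per-session arithmetic above can be reproduced with a few lines of Python. This is only a sanity check using the numbers quoted in the question; the upper bound of 2 page views at 3 MB each comes from the "less than" figures stated earlier, and the function name is just for illustration.

```python
def mb_per_session(total_mb, sessions):
    """Average data out per session, in MB."""
    return total_mb / sessions

billed_mb = 3_431_000   # data out according to the bill, in MB
sessions = 150_000      # sessions in the same month

# What the bill implies per session.
observed = mb_per_session(billed_mb, sessions)

# Upper bound from the question: under 2 page views of under 3 MB each.
expected_max_per_session = 2 * 3
expected_max_total_mb = expected_max_per_session * sessions

print(f"observed: {observed:.1f} MB per session")
print(f"expected at most: {expected_max_per_session} MB per session")
print(f"expected at most: {expected_max_total_mb} MB in total")
print(f"bill is {billed_mb / expected_max_total_mb:.1f}x the expected maximum")
```

Even with the generous upper bound, the billed egress is several times larger than what human page views alone would explain, which points at traffic that does not show up as sessions.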
My graph looks like this, and it’s not something that just came up. I have not analyzed our bills for a while, so this could easily have been going on for a while:
(Seems Stack.Imgur is down – temp link: http://prntscr.com/gvzohm )
The pages with the most visits are autogenerated SEO pages that read from a database with more than 3 million records, but it's quite optimized and our databases are not that expensive. The main challenge is the data out, which costs a lot.
However, how do I go about testing this? Where do I start?
My architecture:
I honestly believe that all my resources are in the same region. Here is a screenshot of my main killers of usage, my app and database:
App:
Database:
All my resources:
2 Answers
After some very good help from a Ukrainian developer I found on Upwork, we've finally solved the issue.
The challenge was in our robots.txt.
It turned out that we had SO many crawler requests on our pages (and we have 3.6 million address pages) that it simply added up to a HUGE amount of requests. That's why the data out was so big.
We have now solved it by:
I'm happy!
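The robots.txt from this site is not shown here, so the rules below are purely hypothetical and are not the fix that was applied. As a rough sketch, Python's standard-library `urllib.robotparser` can be used to check what a given robots.txt allows a crawler to fetch and whether it sets a crawl delay; the bot name, URLs, and directives are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the actual robots.txt from this site.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search/
"""

def load_rules(text):
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(HYPOTHETICAL_ROBOTS_TXT)

# "ExampleBot" and example.com are placeholders.
address_allowed = rules.can_fetch("ExampleBot", "https://example.com/address/123")
search_allowed = rules.can_fetch("ExampleBot", "https://example.com/search/foo")
delay = rules.crawl_delay("ExampleBot")

print("address page allowed:", address_allowed)
print("search page allowed:", search_allowed)
print("crawl delay (seconds):", delay)
```

Note that `Crawl-delay` is a non-standard directive that some crawlers ignore, so the parsed value only tells you what the file asks for, not what a crawler will do.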
Follow guidance given in Understand your bill for Microsoft Azure.
Review billing from subscription level perspective.
Find out whether egress is sent to or requested from Azure services in other regions, or is largely requested by website visitors. Check the backup panel of the web app as well, along with any other backups that run regularly.
Review performance monitoring or performance test. Any tests from other regions responsible for larger egress?
Find out if egress follows site load during business times. If not, dig deeper.
Find out if SEO visitors trigger any downloads. If so, adjust links accordingly.
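One way to see who is actually requesting the egress is to total the bytes sent per user agent from the web server logs. The sketch below assumes the logs are in W3C extended format with a `#Fields:` header and that the field list includes `cs(User-Agent)` and `sc-bytes`; check your own log files, as the enabled fields may differ. The sample lines and bot name are made up for illustration.

```python
from collections import Counter

def bytes_by_user_agent(lines):
    """Sum sc-bytes per user agent from W3C extended log lines."""
    fields = []
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the columns for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed rows
        row = dict(zip(fields, values))
        try:
            sent = int(row.get("sc-bytes", "0"))
        except ValueError:
            continue
        totals[row.get("cs(User-Agent)", "-")] += sent
    return totals

# Made-up sample data; in practice, pass an open log file instead.
SAMPLE = [
    "#Fields: date time cs-method cs-uri-stem cs(User-Agent) sc-status sc-bytes",
    "2017-10-01 00:00:01 GET /address/1 Mozilla/5.0+(compatible;+ExampleBot/1.0) 200 2500000",
    "2017-10-01 00:00:02 GET /address/2 Mozilla/5.0+(compatible;+ExampleBot/1.0) 200 2500000",
    "2017-10-01 00:00:03 GET / Mozilla/5.0+(Windows+NT+10.0) 200 1000000",
]

totals = bytes_by_user_agent(SAMPLE)
for agent, sent in totals.most_common(10):
    print(f"{sent / 1_000_000:10.1f} MB  {agent}")
```

If a handful of crawler user agents account for most of the bytes, the egress is coming from bots rather than from the sessions seen in analytics.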