
We have an application that does a lot of data-heavy work on the server for a multi-tenant workspace.

Here is what it does:

  1. It loads data from files in different formats.
  2. It executes idempotence rules based on the defined logic.
  3. It executes processing logic such as adding discounts based on a user's country, calculating tax amounts, and so on. These rules are specific to each tenant (a simplified sketch follows this list).
  4. It generates refreshed data for bulk editing.
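
For illustration, one such tenant-specific processing step looks roughly like this (a minimal sketch; the table and column names are made up and not our real schema):

    -- Simplified sketch of one tenant-specific processing step:
    -- apply a per-country discount and recompute the tax amount.
    -- Tables Users and TenantCountryRules are illustrative only.
    DECLARE @TenantId int = 42;

    UPDATE u
    SET u.DiscountPercent = r.DiscountPercent,
        u.TaxAmount       = u.NetAmount * r.TaxRate
    FROM dbo.Users AS u
    INNER JOIN dbo.TenantCountryRules AS r
        ON  r.TenantId    = u.TenantId
        AND r.CountryCode = u.CountryCode
    WHERE u.TenantId = @TenantId;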

After this processing is done, the tenant goes to the interface, applies some bulk-edit overrides to users, and finally downloads the data in some format.

We have tried a number of solutions before, such as:

  • Doing it in one SQL database, with each tenant separated by a tenant ID.
  • Doing it in Azure Blob Storage.
  • Loading it from files on the file system.

But none of these performed well. So the current design is:

  • We have a central database that keeps track of all the customer databases (a simplified catalog sketch follows this list).
  • We have a number of database elastic pools in Azure.
  • When a new tenant comes in, we create a database, do all the processing for their users, and notify them to do the manual job.
  • When they have downloaded all the data, we keep the database for future use.
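
Roughly, the central database holds a catalog along these lines (a minimal sketch; the real schema has more bookkeeping columns):

    -- Illustrative central catalog of tenant databases (simplified).
    CREATE TABLE dbo.TenantDatabases
    (
        TenantId         int           NOT NULL PRIMARY KEY,
        DatabaseName     sysname       NOT NULL,
        ElasticPoolName  nvarchar(128) NOT NULL,
        Status           varchar(20)   NOT NULL,  -- e.g. 'Processing', 'ReadyForEdit', 'Downloaded'
        LastDownloadedAt datetime2     NULL
    );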

Now, as you know, elastic pools have a limit on the number of databases, which led us to create multiple elastic pools and keeps increasing our Azure cost immensely, while 90% of the databases are not in use at any given point in time. We already have more than 10 elastic pools, each containing 500 databases.

Proposed Changes:

As we are gradually incurring more and more Azure cost, we are thinking about how to reduce it.

What I am proposing is:

  1. We create one elastic pool, with its 500-database limit and enough DTUs.
  2. In this pool, we create blank databases in advance.
  3. When a customer comes in, their data is loaded into one of the blank databases.
  4. It does all the calculations and notifies the tenant to do the manual job.
  5. When the manual job is done, we keep the database for the next 7 days.
  6. After 7 days, we back up the database to Azure Blob Storage and run the cleanup job on the database (a sketch of selecting databases due for archival follows this list).
  7. Finally, if the same customer comes in again, we restore the backup onto a blank database and continue. (This step might take 15–20 minutes to set up, which is fine for us, but if we can reduce it, even better.)
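
For step 6, the cleanup job would be driven from the central catalog with something like the query below (a sketch against the illustrative TenantDatabases table above; the actual export to Blob Storage would be done with SqlPackage or the Azure export service rather than T-SQL):

    -- Sketch: find databases whose data was downloaded more than 7 days ago
    -- and which are therefore due for export to Blob Storage and cleanup.
    SELECT TenantId, DatabaseName, ElasticPoolName
    FROM dbo.TenantDatabases
    WHERE Status = 'Downloaded'
      AND LastDownloadedAt < DATEADD(day, -7, SYSUTCDATETIME());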

What do you think is best suited for this kind of problem?

Our objective is to reduce Azure cost while also providing the best solution to our customers. Please suggest any architecture you think would be best suited to this scenario.

Each customer can have millions of records; we even see customers with 50–100 GB databases, and each tenant has a different workload.

3 Answers


  1. Pick a single provider and host the workloads there. On demand, fan out across cloud providers when needed.

    This solution requires minimal data transfer.

  2. Here is where the problem starts:

    "[…] When they have downloaded all the data we keep the Database for future."

    This is very wrong because it leads to:

    "[…] keeping on increasing the Azure Cost immensely, while 90% of the databases are not in use at a given point of time. We already have more than 10 elastic pools each consisting of 500 databases."

    This is not only a cost problem but also a security and compliance problem.

    • How long should you store that data?
    • Does that data comply with the relevant country's data policies?

    Here are my two solutions:

    1. It goes without saying that if you don't need that data, you should simply delete those databases. You will lower your costs immediately.
    2. If you cannot delete them, then, since they are barely used, switch them from the elastic pool to the serverless tier.

    EDIT:

    Serverless Azure SQL databases get expensive only when you actually use them.

    If they are unused, they cost next to nothing (while auto-paused, you are billed only for storage). But "unused" means no connections to them: if you have some internal tool that wakes them up every hour, they will never auto-pause and you will pay a lot.

    HOW TO TEST SERVERLESS:

    Take a database that you know is unused and move it to the serverless tier for one week; you will see the cost of that database drop in Cost Management. And of course, take it out of the elastic pool first.
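
    For example, moving one database out of the pool and onto a serverless objective is a one-liner from the logical server's master database (a sketch; the database name is made up, and 'GP_S_Gen5_1' is just one of the serverless General Purpose objectives, so pick the size that fits your workload):

    -- Sketch: move the (hypothetical) Tenant_Contoso database out of the
    -- elastic pool onto a 1-vcore serverless General Purpose objective.
    ALTER DATABASE [Tenant_Contoso]
        MODIFY (SERVICE_OBJECTIVE = 'GP_S_Gen5_1');

    As far as I know, the auto-pause delay (one hour by default) is configured in the portal, CLI, or REST API rather than in T-SQL.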

    You can run this query on the master database:

    DECLARE @StartDate date = DATEADD(day, -30, GETDATE()); -- 30-day look-back (sys.resource_stats itself only retains about 14 days)
    
    SELECT
        @@SERVERNAME AS ServerName
        ,database_name AS DatabaseName
        ,sysso.edition
        ,sysso.service_objective
        ,(SELECT TOP 1 dtu_limit FROM sys.resource_stats AS rs3 WHERE rs3.database_name = rs1.database_name ORDER BY rs3.start_time DESC)  AS DTU
        /*,(SELECT TOP 1 storage_in_megabytes FROM sys.resource_stats AS rs2 WHERE rs2.database_name = rs1.database_name ORDER BY rs2.start_time DESC)  AS StorageMB */
        /*,(SELECT TOP 1 allocated_storage_in_megabytes FROM sys.resource_stats AS rs4 WHERE rs4.database_name = rs1.database_name ORDER BY rs4.start_time DESC)  AS Allocated_StorageMB*/ 
        ,avcon.AVG_Connections_per_Hour
        ,CAST(MAX(storage_in_megabytes) / 1024 AS DECIMAL(10, 2)) StorageGB
        ,CAST(MAX(allocated_storage_in_megabytes) / 1024 AS DECIMAL(10, 2)) Allocated_StorageGB
        ,MIN(end_time) AS StartTime
        ,MAX(end_time) AS EndTime
        ,CAST(AVG(avg_cpu_percent) AS decimal(4,2)) AS Avg_CPU
        ,MAX(avg_cpu_percent) AS Max_CPU
        ,(COUNT(database_name) - SUM(CASE WHEN avg_cpu_percent >= 40 THEN 1 ELSE 0 END) * 1.0) / COUNT(database_name) * 100 AS [CPU Fit %]
        ,CAST(AVG(avg_data_io_percent) AS decimal(4,2)) AS Avg_IO
        ,MAX(avg_data_io_percent) AS Max_IO
        ,(COUNT(database_name) - SUM(CASE WHEN avg_data_io_percent >= 40 THEN 1 ELSE 0 END) * 1.0) / COUNT(database_name) * 100 AS [Data IO Fit %]
        ,CAST(AVG(avg_log_write_percent) AS decimal(4,2)) AS Avg_LogWrite
        ,MAX(avg_log_write_percent) AS Max_LogWrite
        ,(COUNT(database_name) - SUM(CASE WHEN avg_log_write_percent >= 40 THEN 1 ELSE 0 END) * 1.0) / COUNT(database_name) * 100 AS [Log Write Fit %]
        ,CAST(AVG(max_session_percent) AS decimal(4,2)) AS 'Average % of sessions'
        ,MAX(max_session_percent) AS 'Maximum % of sessions'
        ,CAST(AVG(max_worker_percent) AS decimal(4,2)) AS 'Average % of workers'
        ,MAX(max_worker_percent) AS 'Maximum % of workers'
      
      
    FROM sys.resource_stats AS rs1
    INNER JOIN sys.databases dbs ON rs1.database_name = dbs.name
    INNER JOIN sys.database_service_objectives sysso ON sysso.database_id = dbs.database_id
    INNER JOIN
    (SELECT t.name
        ,round(avg(CAST(t.Count_Connections AS FLOAT)), 2) AS AVG_Connections_per_Hour
    FROM (
        SELECT name
            --,database_name
            --,success_count
            --,start_time
            ,CONVERT(DATE, start_time) AS Dating
            ,DATEPART(HOUR, start_time) AS Houring
            ,sum(CASE 
                    WHEN name = database_name
                        THEN success_count
                    ELSE 0
                    END) AS Count_Connections
        FROM sys.database_connection_stats
        CROSS JOIN sys.databases
        WHERE start_time > @StartDate
            AND database_id != 1
        GROUP BY name
            ,CONVERT(DATE, start_time)
            ,DATEPART(HOUR, start_time)
        ) AS t
    GROUP BY t.name) avcon on avcon.name = rs1.database_name
    
    
    WHERE start_time > @StartDate
    GROUP BY database_name, sysso.edition, sysso.service_objective,avcon.AVG_Connections_per_Hour
    ORDER BY database_name , sysso.edition, sysso.service_objective
    

    The query returns statistics for all the databases on the server.

    • AVG_Connections_per_Hour: based on the last 30 days of connection data
    • All AVG and MAX statistics: based on the last 14 days of data (the retention window of sys.resource_stats)
  3. You could perhaps denormalise the data you need and store it in ClickHouse. It is a fast columnar database for online analytical processing, which means you can run queries that compute discounts on the fly, and it is very fast (millions to billions of rows per second). You query it with its own SQL dialect, which is intuitive, powerful, and can be extended with Python/C++.

    You can try the approach you used before, but with ClickHouse and a distributed deployment:

    "Doing it in one SQL database, with each tenant separated by a tenant ID."


    A ClickHouse cluster can be deployed on Kubernetes using the Altinity operator; it is free and you only pay for the resources. Paid and managed options are also available.


    ClickHouse also supports many integrations, which means you could stream data into it from Kafka or RabbitMQ, or load it from local files / S3 files.


    I have been running a test ClickHouse cluster with 150M rows and 70 columns, mostly Int64 fields. A query with 140 filters across all the columns took about 7-8 seconds under light load and 30-50 seconds under heavy load. The cluster had 5 members (2 shards, 3 replicas).

    Note: I am not affiliated with ClickHouse; I just like the database. You could also look for another OLAP alternative on Azure.
