I've integrated the Azure SDK for C++ into my application, and there is a significant slowdown compared to the old Azure SDK. After the azure-sdk-for-cpp upgrade that added upload parallelism, uploads work better, but downloads are still VERY SLOW.
It can be reproduced with a simple example: just try to download a 1 GB file from Azure Storage to the local file system.
- Old SDK: ~1 min
- New SDK: ~5 min
The old SDK was built on the C++ REST SDK, which used concurrency::streams::istream m_stream;. There is no equivalent in the new SDK except TransferOptions.Concurrency, which does almost nothing.
Is there any way to speed up DownloadTo, or should parallelism be implemented on top of the library?
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

#include <azure/storage/blobs.hpp>

#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <vector>

std::string GetConnectionString()
{
  const static std::string ConnectionString = "";

  if (!ConnectionString.empty())
  {
    return ConnectionString;
  }
  // std::getenv returns nullptr when the variable is unset; constructing a
  // std::string from nullptr is undefined behavior, so check first.
  const char* envConnectionString = std::getenv("AZURE_STORAGE_CONNECTION_STRING");
  if (envConnectionString != nullptr && envConnectionString[0] != '\0')
  {
    return envConnectionString;
  }
  throw std::runtime_error("Cannot find connection string.");
}

int main()
{
  using namespace Azure::Storage::Blobs;

  const std::string containerName = "sample-container";
  const std::string blobName = "sample-blob";
  const std::string blobContent = "Hello Azure!";

  auto containerClient
      = BlobContainerClient::CreateFromConnectionString(GetConnectionString(), containerName);
  containerClient.CreateIfNotExists();

  BlockBlobClient blobClient = containerClient.GetBlockBlobClient(blobName);

  std::vector<uint8_t> buffer(blobContent.begin(), blobContent.end());
  blobClient.UploadFrom(buffer.data(), buffer.size());

  Azure::Storage::Metadata blobMetadata = {{"key1", "value1"}, {"key2", "value2"}};
  blobClient.SetMetadata(blobMetadata);

  auto properties = blobClient.GetProperties().Value;
  for (const auto& metadata : properties.Metadata)
  {
    std::cout << metadata.first << ":" << metadata.second << std::endl;
  }

  // We know the blob size is small, so it's safe to cast here.
  buffer.resize(static_cast<size_t>(properties.BlobSize));
  blobClient.DownloadTo(buffer.data(), buffer.size());

  std::cout << std::string(buffer.begin(), buffer.end()) << std::endl;

  return 0;
}
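For reference, the TransferOptions knob mentioned above is set like this. A minimal fragment that reuses blobClient and buffer from the sample; the values shown are arbitrary examples I tried, not verified tuning:

// Sketch: tuning the built-in transfer options on DownloadTo.
// Field names (TransferOptions.Concurrency / ChunkSize) are as in recent
// azure-storage-blobs-cpp releases; the values are arbitrary examples.
Azure::Storage::Blobs::DownloadBlobToOptions options;
options.TransferOptions.Concurrency = 8;             // parallel range requests
options.TransferOptions.ChunkSize = 4 * 1024 * 1024; // 4 MB per range request
blobClient.DownloadTo(buffer.data(), buffer.size(), options);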
2 Answers
Long story short, CACHING was the solution. Our system is designed in such a way that the read function always reads only 32 KB at a time, so you can imagine the number of HTTP requests that generates... At first I tried downloading the whole 1 GB locally and serving each read from that local copy; afterwards I reduced the cache all the way down to 4 MB, which showed great results. The speedup was insane.
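A minimal sketch of that read-ahead cache, assuming the Range field on DownloadBlobToOptions; CachedBlobReader and its interface are names made up for illustration, not SDK types:

#include <azure/storage/blobs.hpp>

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Every small Read() is served from a 4 MB chunk fetched with a single
// ranged download, instead of issuing one HTTP request per 32 KB read.
class CachedBlobReader
{
public:
  CachedBlobReader(Azure::Storage::Blobs::BlobClient client, int64_t blobSize)
      : m_client(std::move(client)), m_blobSize(blobSize)
  {
  }

  // Copies up to `size` bytes at `offset` into `dest`; returns bytes copied.
  // May return fewer bytes when the read crosses a chunk boundary; call
  // again for the remainder.
  size_t Read(int64_t offset, size_t size, uint8_t* dest)
  {
    if (offset >= m_blobSize)
    {
      return 0;
    }
    if (offset < m_cacheOffset
        || offset + static_cast<int64_t>(size)
            > m_cacheOffset + static_cast<int64_t>(m_cache.size()))
    {
      FetchChunk(offset); // cache miss: pull the surrounding 4 MB chunk
    }
    const size_t begin = static_cast<size_t>(offset - m_cacheOffset);
    const size_t count = std::min(size, m_cache.size() - begin);
    std::copy_n(m_cache.data() + begin, count, dest);
    return count;
  }

private:
  void FetchChunk(int64_t offset)
  {
    m_cacheOffset = (offset / ChunkSize) * ChunkSize; // align to chunk boundary
    const int64_t length = std::min(ChunkSize, m_blobSize - m_cacheOffset);
    m_cache.resize(static_cast<size_t>(length));
    Azure::Storage::Blobs::DownloadBlobToOptions options;
    Azure::Core::Http::HttpRange range;
    range.Offset = m_cacheOffset;
    range.Length = length;
    options.Range = range;
    m_client.DownloadTo(m_cache.data(), m_cache.size(), options);
  }

  static constexpr int64_t ChunkSize = 4 * 1024 * 1024; // 4 MB

  Azure::Storage::Blobs::BlobClient m_client;
  int64_t m_blobSize;
  int64_t m_cacheOffset = -1; // nothing cached yet
  std::vector<uint8_t> m_cache;
};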
I would recommend splitting your download into chunks and parallelizing them manually. This approach resembles the method some HTTP clients use to download files in parallel. You can use code like the sketch below to download much faster with the C++ SDK.
Code:
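A minimal, self-contained sketch of that approach; the connection-string lookup, container/blob/file names, and the one-task-per-chunk scheduling are placeholders and simplifications:

// Sketch: manual chunked, parallel download built on ranged DownloadTo calls.
#include <azure/storage/blobs.hpp>

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <future>
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <vector>

int main()
{
  using namespace Azure::Storage::Blobs;

  const char* connectionString = std::getenv("AZURE_STORAGE_CONNECTION_STRING");
  if (connectionString == nullptr)
  {
    throw std::runtime_error("Cannot find connection string.");
  }

  // Placeholder names; replace with your own.
  auto blobClient
      = BlobClient::CreateFromConnectionString(connectionString, "sample-container", "large-blob");

  const int64_t blobSize = blobClient.GetProperties().Value.BlobSize;
  constexpr int64_t ChunkSize = 4 * 1024 * 1024; // 4 MB per chunk

  // Create the destination file up front; chunks are written at their offsets,
  // and the OS fills any gaps past the current end of file with zeros.
  {
    std::ofstream create("large-blob.bin", std::ios::binary);
  }
  std::fstream file("large-blob.bin", std::ios::binary | std::ios::in | std::ios::out);
  std::mutex fileMutex;

  auto downloadChunk = [&](int64_t offset, int64_t length) {
    std::vector<uint8_t> buffer(static_cast<size_t>(length));
    DownloadBlobToOptions options;
    Azure::Core::Http::HttpRange range;
    range.Offset = offset;
    range.Length = length;
    options.Range = range;
    blobClient.DownloadTo(buffer.data(), buffer.size(), options);

    // seekp/write on the shared stream is not thread-safe; serialize it.
    std::lock_guard<std::mutex> lock(fileMutex);
    file.seekp(offset);
    file.write(reinterpret_cast<const char*>(buffer.data()), length);
  };

  std::vector<std::future<void>> tasks;
  for (int64_t offset = 0; offset < blobSize; offset += ChunkSize)
  {
    tasks.push_back(std::async(
        std::launch::async, downloadChunk, offset, std::min(ChunkSize, blobSize - offset)));
  }
  for (auto& task : tasks)
  {
    task.get(); // rethrows if a chunk download failed
  }

  std::cout << "Downloaded " << blobSize << " bytes" << std::endl;
  return 0;
}

In practice you would bound the number of in-flight tasks (for example with a small worker pool) instead of launching one std::async task per chunk of a 1 GB blob.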
The above code divides the file into 4 MB chunks and downloads them concurrently using std::async for efficient multi-threading, ensuring thread-safe writing with std::mutex.
Also follow up on the GitHub issue you created; the maintainers there can also provide good suggestions for the C++ SDK.