How to get chrome headless output to memory efficiently with C#? - Asp.net

MAzyoksul
September 25, 2021
163 views
0 votes
2 Answers

Upon request, my ASP.NET server should convert an HTML file to PDF using a chrome headless instance and return the resulting PDF.

CMD command:

chrome --headless --disable-gpu --print-to-pdf-no-header --print-to-pdf="[pdf-file-path]" --no-margins "[html-file-path]"

Dealing with the PDF file is not trivial. The server needs to clean up the PDF file from the previous request, detect when the new PDF has been created, and then read the file into memory. All of this is just too slow.

Is there a better solution to this? Could I get the file directly into memory somehow? Or manage the PDF file better?

Tags: asp.net c# google-chrome-headless

Answers

Chosen as BEST ANSWER

Stop driving Chrome through the command-line interface and control it from C# with a browser automation library such as Selenium or Puppeteer instead. For Selenium, use the following NuGet package:

https://www.nuget.org/packages/Selenium.WebDriver/4.0.0-rc2

Then you can print your HTML to PDF using the following code. The PDF ends up in memory as a byte array (document), with no intermediate file:

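using System;
using System.Collections.Generic;
using System.Text;
using OpenQA.Selenium.Chrome;

// Inputs (not defined here): html is the HTML string to print,
// webdriverPath is the directory that contains the chromedriver executable.

// Base64-encode the HTML so it can be loaded through a data URL
var textBytes = Encoding.UTF8.GetBytes(html);
var b64Html = Convert.ToBase64String(textBytes);

// Create a headless Chrome driver
var chromeOptions = new ChromeOptions();
chromeOptions.AddArguments(new List<string> { "no-sandbox", "headless", "disable-gpu" });
using var driver = new ChromeDriver(webdriverPath, chromeOptions);
// Load the HTML without a temporary file by navigating to a data URL.
// See: https://stackoverflow.com/a/52498445/7279624
driver.Navigate().GoToUrl("data:text/html;base64," + b64Html);

// Print through the DevTools protocol
var printOptions = new Dictionary<string, object> {
    // Docs: https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
    // Paper size is given in inches; these values are A4 (210 mm x 297 mm).
    { "paperWidth", 210 / 25.4 },
    { "paperHeight", 297 / 25.4 },
};
var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
// The response carries the PDF as a base64 string under "data"; decode it to raw bytes.
var document = Convert.FromBase64String(printOutput["data"] as string);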
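
Since Puppeteer is mentioned above as the other option, here is a minimal sketch of the same idea using the PuppeteerSharp NuGet package (a .NET port of Puppeteer). It is only a sketch under assumptions: Chrome is assumed to be installed already, chromePath is a hypothetical parameter holding the path to its executable, and the names HtmlToPdf and RenderAsync are made up for illustration.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;
using PuppeteerSharp.Media;

public static class HtmlToPdf
{
    // chromePath: path to an installed Chrome/Chromium executable (assumption).
    public static async Task<byte[]> RenderAsync(string html, string chromePath)
    {
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            ExecutablePath = chromePath,
        });
        await using var page = await browser.NewPageAsync();

        // Load the HTML directly; no temporary .html file is needed.
        await page.SetContentAsync(html);

        // PdfDataAsync returns the PDF as a byte array, so nothing is written to disk.
        return await page.PdfDataAsync(new PdfOptions { Format = PaperFormat.A4 });
    }
}
```

Launching a browser on every request is likely to be the expensive part here. Keeping one browser instance alive and opening a new page per request would probably be faster, but that is left out of the sketch.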


- AlbertDKallal
- September 25, 2021 at 7:52 pm
- 0 votes
I would consider several options.

Print the output to a PostScript printer, then use Ghostscript to convert the PostScript to a PDF.

Probably even better: use the .NET PDFsharp library, together with some code that renders HTML on top of that library.

Consider this:

https://www.nuget.org/packages/HtmlRenderer.PdfSharp/1.5.1-beta1
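
A rough sketch of what that could look like with the package linked above, producing the PDF bytes in memory without launching a browser. The class and method names (HtmlToPdfSharp, Render) are made up for illustration. As far as I know, this renderer targets roughly HTML 4.01 and CSS 2, so the output may differ from what Chrome would print.

```csharp
using System.IO;
using PdfSharp;
using TheArtOfDev.HtmlRenderer.PdfSharp;

public static class HtmlToPdfSharp
{
    public static byte[] Render(string html)
    {
        // Lay the HTML out onto A4 pages, entirely in-process.
        using var document = PdfGenerator.GeneratePdf(html, PageSize.A4);

        // Save into a MemoryStream instead of a file; "false" keeps the
        // stream open so its contents can be copied out afterwards.
        using var stream = new MemoryStream();
        document.Save(stream, false);
        return stream.ToArray();
    }
}
```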
