Dynamic Open Graph images with puppeteer

April 28, 2022 · 6 min read

What is what

Open Graph is a protocol that lets a web page be represented as a rich object in a social graph. In practice, it means that social media supporting Open Graph show a link to the page together with its title, type, image and description.
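
To make this more concrete, here is a minimal sketch of how the Open Graph tags for a page could be assembled. The `buildOgTags` helper, the shape of its argument and the example.com URLs are assumptions made for illustration; only the `og:*` property names come from the protocol itself.

```javascript
// Escape characters that would break an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build the <meta> tags for one page.
// `page` is a hypothetical object: { title, url, image, type?, description? }.
function buildOgTags(page) {
  const properties = {
    'og:title': page.title,
    'og:type': page.type ?? 'article',
    'og:url': page.url,
    'og:image': page.image,
    'og:description': page.description,
  };

  return Object.entries(properties)
    // Skip optional properties that were not provided.
    .filter(([, content]) => content != null && content !== '')
    .map(([property, content]) =>
      `<meta property="${property}" content="${escapeAttr(content)}" />`)
    .join('\n');
}

console.log(buildOgTags({
  title: 'Dynamic Open Graph images with puppeteer',
  url: 'https://example.com/blog/puppeteer-og',
  image: 'https://example.com/static/puppeteer-og/og.jpg',
}));
```

The `og:image` value is the part this post is about: it should point to an image generated for that particular page.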

Puppeteer is an open-source Node.js library for controlling headless or full (non-headless) Chrome or Chromium over the DevTools Protocol.

The problem is automatically generating an image for each page whose link might be shared on social networks or messengers like Twitter and Telegram. Some people generate the images ahead of time, during the build; others prefer to generate them on demand, when a page is requested for the first time.
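
The rest of this post follows the build-time route. For comparison, the on-demand variant could be sketched roughly like this. `createOnDemandRenderer` and the `render` callback are hypothetical names, and the in-memory `Map` is only an assumption to keep the example short; a real setup would more likely store the image on disk or behind a CDN.

```javascript
// Wrap a render function so that each slug is rendered at most once.
// `render` is a hypothetical async function: (slug) => Promise<image>.
function createOnDemandRenderer(render) {
  const cache = new Map(); // slug -> Promise of the rendered image

  return function getImage(slug) {
    if (!cache.has(slug)) {
      const pending = Promise.resolve()
        .then(() => render(slug))
        .catch((error) => {
          cache.delete(slug); // do not cache failures, so a retry is possible
          throw error;
        });
      cache.set(slug, pending);
    }
    return cache.get(slug);
  };
}
```

Caching the promise rather than the finished image means that two simultaneous first requests for the same page share one render instead of starting two.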

We are going to solve this using Puppeteer.

Pros and cons

Of course, the Puppeteer approach is not the only one, and like any other solution or technology it has its pros and cons.

  • It's super easy to implement
  • It's very flexible due to the power of web pages and CSS

But at the same time:

  • It might be a bit heavy because we have to launch the whole web browser under the hood

I mean that if you have to generate dozens of images somewhere in the cloud, Puppeteer might not be the best solution in terms of performance and resource costs.

There's a nice package that helps to run it on AWS Lambda: https://github.com/alixaxel/chrome-aws-lambda

Implementation

First we need to create a regular HTML page with the desired OG image design:

<div class="breadcrumbs">~/blog</div>
<div class="title" id="title">Dynamic Open Graph images with puppeteer</div>
<div class="author">
  <div class="meta gradient" id="tag">#frontend</div>
  Eugene Dzhumak
</div>

Notice that I gave some elements unique identifiers. This will come in handy in a moment.

Now let's run Puppeteer, update the data and take a screenshot of the page:

import puppeteer from 'puppeteer';
import { resolve } from 'path';
import { getAllPostsMeta } from './generate-feed.mjs';

(async () => {
  const posts = await getAllPostsMeta();
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 1200, height: 630 });
  await page.goto('file:'.concat(resolve('./src/og.html')));

  for (const post of posts) {
    await page.evaluate((post) => {
      document.getElementById('title').textContent = post.meta.icon.concat(' ', post.meta.title);
      document.getElementById('tag').textContent = post.meta.genre
        ? '#'.concat(post.meta.genre)
        : '';
    }, post);

    await page.screenshot({
      type: 'jpeg',
      path: resolve('./public/static/', post.slug, 'og.jpg'),
      quality: 70,
    });
  }

  await browser.close();
})();

  • Line 10: Open the HTML page we created previously as a file
  • Lines 13-18: Evaluate a function in the context of the page, passing our post variable as an argument
  • Lines 20-24: Take a screenshot of the page viewport (1200×630, as set on line 9) and save it to the desired path

UPD 17.05.2022

Recently I had a chance to use Puppeteer for a close-to-production problem: generating thousands of PDF certificates for completing an online course or participating in an online event.

These certificates are usually emailed to the participants as evidence of their participation and as thanks for their contribution to the event.

  • Initial data: ~18k users
  • Expected result: a generated PDF certificate for each user

Using Puppeteer and the exact same technique I've described above, I was able to write a pretty fast generation script that took around 16 minutes to go through every user and create a PDF of the page.
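
The original script is not shown here, so the following is only a sketch of what one step of such a loop might look like. The `renderCertificate` function, the `name` element id, the `user` shape and the output path layout are all hypothetical; `page` is assumed to be a Puppeteer page that already has the certificate template open.

```javascript
// Fill in the certificate template and save it as a PDF.
// `page` is an already opened Puppeteer Page showing the template;
// the `name` element id and the `user` fields are hypothetical.
async function renderCertificate(page, user, outDir) {
  // Same trick as with the OG image: update the DOM inside the page.
  await page.evaluate((user) => {
    document.getElementById('name').textContent = user.name;
  }, user);

  // page.pdf() replaces page.screenshot() from the OG image script.
  await page.pdf({
    path: `${outDir}/${user.id}.pdf`,
    format: 'A4',
    landscape: true,
    printBackground: true, // backgrounds are omitted from PDFs by default
  });
}
```

As with the OG images, a single page can be reused for all users, so the template is loaded only once.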

Furthermore, using Node.js Worker Threads I sped the script up to 5 minutes. It's not clear how exactly the best number of threads should be picked, so I assume that if I wanted to achieve even better results, I could play a bit with the number of worker threads.

yarn run v1.22.17
$ node index.mjs
 ████████████████████████████░░░░░░░░░░░░ 69% | ETA: 65s | 12943/18397

✨ I decided to stop here, since 60 PDFs per second on a local machine is fast enough for me.
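
For the Worker Threads part, the users have to be divided between the workers somehow. A minimal sketch of one way to do it, assuming a simple round-robin split, is below; `splitIntoChunks` and the `worker.mjs` file mentioned in the comment are hypothetical and not taken from the original script.

```javascript
// Split `items` into at most `workerCount` chunks whose sizes differ by at most one.
function splitIntoChunks(items, workerCount) {
  // Never create more chunks than items, and always create at least one.
  const count = Math.max(1, Math.min(workerCount, items.length));
  const chunks = Array.from({ length: count }, () => []);
  items.forEach((item, index) => chunks[index % count].push(item));
  return chunks;
}

// Each chunk could then be handed to its own worker, for example:
//   new Worker(new URL('./worker.mjs', import.meta.url), { workerData: chunk })
// where `worker.mjs` is a hypothetical script that launches its own browser
// and renders one PDF per user in `workerData`.
```

With this split, trying a different number of threads only means changing `workerCount` and measuring again.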

Recap

Puppeteer may be considered a go-to solution for simple use cases. It can be used for crawling web pages [1], generating PDFs [2] and automated testing [3][4].

There's even a book available on O'Reilly: UI Testing with Puppeteer

Footnotes

  1. https://www.digitalocean.com/community/tutorials/how-to-build-a-concurrent-web-scraper-with-puppeteer-node-js-docker-and-kubernetes

  2. https://pptr.dev/#?product=Puppeteer&version=main&show=api-pagepdfoptions

  3. https://www.browserstack.com/guide/ui-automation-testing-using-puppeteer

  4. https://github.com/smooth-code/jest-puppeteer