My first Chrome extension

TL;DR

Extensions are not hard; reverse-engineering the web page you extend is hard. There are some gotchas though.

I've been developing software for almost 20 years now, but I've never developed a real Chrome extension. The technology behind it is called the WebExtensions API and it is a cross-browser technology. So with minor modifications an extension should work on Chrome, Firefox, Microsoft Edge, Opera, Vivaldi and maybe also Safari. That's the theory!

I think it was 10 years ago that, out of curiosity, I created a Hello World extension and then never touched browser extensions again. There was just no use case for me where a browser extension would have made sense. I associate them with React Dev Tools, Wappalyzer, ad blockers and the like, but nothing that fit the bill for me. That's changed now!

Use case for a browser extension

I'm working on a project which should extend an existing CRM web app with AI capabilities. After a little bit of brainstorming on how I could integrate the additional functionality, I settled on a browser extension. In theory the extension should be able to augment the web app so that people using it don't even notice that the additional functionality comes from someone other than the CRM vendor. That's my hope.

Extensions scared me

I don't know why, but without really noticing I've always stayed away from browser extensions. Maybe it's because I'm not familiar with them. Maybe it's because they feel hacky to me. They inject something into existing apps, and you make assumptions which then don't hold true and break your extension. Stuff like this is flying through my brain.

Lifecycle of an extension

In order to get over the fear I went through MDN's and Chrome's documentation about browser extensions. By the way, both are excellent and really not complicated, especially if you're a web developer and know the DOM and other browser APIs.

An extension usually consists of JavaScript files, resource files and a manifest.json. The manifest is the only mandatory file. It contains metadata like name and description and tells the browser which JavaScript files run at which moment. Through its content scripts an extension has full access to the DOM of the web pages it is allowed to run on.

Assumptions for DOM access

The content_scripts key defines the JS files to be executed when a page loads. Which pages trigger the execution of a JS file is defined with the matches key. It is very flexible and can cover every web page on this planet or just one specific URL.
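To get a feeling for how a match pattern selects URLs, here is a rough sketch that turns a pattern into a regular expression. The browser does this matching internally, so you never write this yourself. The function name matchPatternToRegExp is made up for this post, and the sketch is a simplified approximation: it only handles http/https patterns and ignores special cases like <all_urls> or ports.

```javascript
// Simplified approximation of match-pattern semantics, for illustration only.
// This is NOT the browser's implementation and covers only http/https patterns.
function matchPatternToRegExp(pattern) {
  const parts = /^(\*|https?):\/\/(\*|\*\.[^/*]+|[^/*]+)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error(`Unsupported match pattern: ${pattern}`);
  const [, scheme, host, path] = parts;

  // Escape characters that have a special meaning in regular expressions.
  const escape = (s) => s.replace(/[.+?^${}()|[\]\\]/g, '\\$&');

  // "*" as scheme stands for http or https.
  const schemeRe = scheme === '*' ? 'https?' : scheme;

  let hostRe;
  if (host === '*') {
    hostRe = '[^/]+'; // any host
  } else if (host.startsWith('*.')) {
    hostRe = `(?:[^/]+\\.)?${escape(host.slice(2))}`; // the domain and its subdomains
  } else {
    hostRe = escape(host); // exact host
  }

  // "*" in the path matches any sequence of characters.
  const pathRe = path.split('*').map(escape).join('.*');
  return new RegExp(`^${schemeRe}://${hostRe}${pathRe}$`);
}

const crmPages = matchPatternToRegExp('https://www.example.com/*');
console.log(crmPages.test('https://www.example.com/contacts/42'));
console.log(crmPages.test('https://other.example.org/'));
```

The first check is a URL under the pattern's host and therefore matches; the second one uses a different host and does not.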

What's still missing is the moment at which the browser executes a JS file. Broadly speaking, it is every time the browser loads a matching page. When you type a URL in the browser's address bar and confirm, the browser executes the extension's JS file. When you reload the page, it is executed again.

Does the JS file run before the DOM is available, after, or somewhere in between? By default (document_idle) the browser picks the moment it thinks is best. By then the DOM is ready, although resources like images may still be loading. This helps to keep the impact of extensions on page load times low. It is possible to change the timing with the run_at directive.
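If you ever switch a script to run_at document_start, the DOM may not be parsed yet when your code runs. A small helper can hide that difference. This is only a sketch: the name whenDomReady is made up, and the document is passed in as a parameter so the helper can also be tried outside of a browser.

```javascript
// Hypothetical helper: runs `callback` once the DOM has been parsed,
// no matter which run_at value caused the script to be injected.
function whenDomReady(doc, callback) {
  if (doc.readyState === 'loading') {
    // Typical for run_at "document_start": the DOM is not parsed yet.
    doc.addEventListener('DOMContentLoaded', () => callback(), { once: true });
  } else {
    // "interactive" or "complete": the DOM is already available.
    callback();
  }
}

// In a content script you would pass the real document:
// whenDomReady(document, () => { /* modify the page */ });
```

With document_end or the default timing the else branch is taken and the callback runs immediately.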

One more speciality of extensions is that each extension's content scripts execute JavaScript in their own sandbox (an "isolated world"). They share the DOM with the web page, but they cannot access any global JS variables from the web page or from other extensions.

That's mostly all you need to know if the only thing you want to do is modify the DOM of a web page.

CORS restriction

Until the late 2010s you could fetch any URL from a content script. In the name of better security this has changed: requests from a content script are now subject to the same CORS policy as the page it runs in. There is still a way for an extension to fetch from any URL. You need to do it from a background script, which is not bound by CORS for the hosts the extension has permission for, and pass messages between the content and background scripts. This is more involved now than it was before.
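A minimal sketch of that message passing could look like the following. It assumes a Manifest V3 extension in Chrome with a registered background service worker and host_permissions that cover the target URL. The message shape { type: 'fetch-json', url } and the function names handleMessage and fetchViaBackground are made up for this post. Both halves are shown in one block for brevity; in a real extension they live in separate files.

```javascript
// --- background.js (service worker): performs the cross-origin request ---
// The message shape { type: 'fetch-json', url } is an assumption of this sketch.
async function handleMessage(message, fetchImpl = fetch) {
  if (!message || message.type !== 'fetch-json') {
    return { ok: false, error: 'unknown message type' };
  }
  try {
    const response = await fetchImpl(message.url);
    if (!response.ok) return { ok: false, error: `HTTP ${response.status}` };
    return { ok: true, data: await response.json() };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

// Register the listener only where the extension API exists.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    handleMessage(message).then(sendResponse);
    return true; // keep the message channel open for the async response
  });
}

// --- content.js: asks the background script to fetch on its behalf ---
async function fetchViaBackground(url) {
  const reply = await chrome.runtime.sendMessage({ type: 'fetch-json', url });
  if (!reply.ok) throw new Error(reply.error);
  return reply.data;
}
```

Keep in mind that the content script runs inside a page you don't control. In a real extension the background script should only accept requests for the endpoints it actually needs instead of relaying arbitrary URLs.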

What we have so far:

browser executes JS files on every page load
matches restricts on which URLs to execute JS files
run_at tells when JS files are executed, normally after DOM is ready
JS files of a content script cannot fetch arbitrary URLs since the page's CORS restrictions apply
use a background script and message passing to get around the CORS restrictions

Examples

Given this manifest fragment, here are examples of how you can program your JS files.

manifest.json:

"content_scripts": [
    {
      "matches": ["https://www.example.com/*"],
      "js": ["content.js"],
      "run_at": "document_end"
    }
],

content.js:

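// With run_at "document_end" the DOM is parsed, so you can access and modify it right away
const someElement = document.querySelector('.some-class')

// For SPAs (single page applications) you can observe DOM changes like this
const observer = new MutationObserver((mutations) => {
    mutations.forEach((mutation) => {
        // Add the button only once: appending it is itself a mutation and
        // would otherwise trigger this callback again and again
        if (mutation.addedNodes.length && !document.getElementById('my-extension-button')) {
            const button = document.createElement('button')
            button.id = 'my-extension-button' // hypothetical id, only used for the check above
            ...
            document.body.appendChild(button)
        }
    })
})

observer.observe(document.body, {
    childList: true,
    subtree: true,
    attributes: true,
})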

With run_at set to document_end (or left at the default) you don't need to wrap your code in document.addEventListener('DOMContentLoaded', () => {...}).

Obviously there's a lot more that browser extensions are capable of, like executing things in the background or hooking into browser events, such as the opening of tabs, that do not exist for web pages. Currently I don't need them so I will end it here. Feel free to go deeper.

Conclusion

I found out that browser extensions are not scary. With a little bit of documentation reading and tinkering I got the basics that take me towards my goal. The real meat of developing a browser extension lies in analysing and reverse-engineering the page that you want to extend.