chrome extensions 101 - beginner guide


why is building chrome extensions fun?

because w the advancement in LLMs, you basically are a magician. the only thing needed now from your side is taste. if you're reading this blog, you probably spend most of your time inside a browser. that might be duckduckgo, helium or zen, but it's a browser nonetheless.

most of these browsers share the same architecture under the hood: chromium, with chrome's v8 javascript engine. the exceptions are safari and firefox (and firefox-based browsers like zen), which use different engines. we'll talk about those later.

so making simple chrome extensions is not hard.

i'll primarily assume chrome throughout this article, but a single chrome extension runs in most places you could think of. with minor tweaks, it would run in safari as well

speedrun through this cool playlist i learnt myself from


underlying architecture:

the best part about all this: we need not build any surrounding boilerplate infra to run chrome extensions, since they run directly in your browser. the browser is the runtime.

i made a chrome extension which finds the most relevant results from reddit for the object you select on the webpage. very helpful with surfing/shopping on sites like amazon, since reddit is where the most organic results live. we’ll go through the file structure of a chrome extension taking redditHelp as a reference:

annnnnd we’ll go through each one by one

chrome provides you exhaustive APIs, surrounding infra, runtime already. you need a way for the browser to know about your extension. manifest.json is the file which lets you do this.

manifest_version: 3 - this tells chrome the extension uses MV3. MV3 is the current standard and MV2 is the older one. chrome has been phasing MV2 extensions out, so start new projects on MV3. more on this later.
"permissions": ["activeTab", "storage"] - this tells the browser that the extension wants access to the currently active tab and to chrome storage. (you might use a db as well if it suits your use case)
host_permissions - this extension calls reddit's API, so reddit's origin has to be listed under host_permissions
action.default_popup: "popup.html" - when you click the extension's icon in the toolbar, popup.html is what opens
content_scripts with matches: ["<all_urls>"] - injects dist/content.js into every page the user visits. This script runs alongside the page and can read and manipulate its DOM. But <all_urls> is a broad permission, and chrome will flag it to users.
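
put together, the fields above could look roughly like this. it's a sketch written as a typescript object that you would serialise into manifest.json. the name, version, reddit host pattern and the dist/background.js path are assumptions for illustration, not redditHelp's actual manifest.

```typescript
// hypothetical manifest for illustration; JSON.stringify(manifest) gives you manifest.json
const manifest = {
  manifest_version: 3,
  name: "redditHelp", // assumed display name
  version: "1.0.0", // assumed version
  permissions: ["activeTab", "storage"],
  host_permissions: ["https://www.reddit.com/*"], // assumed host pattern
  action: { default_popup: "popup.html" },
  background: { service_worker: "dist/background.js" }, // assumed build output path
  content_scripts: [
    { matches: ["<all_urls>"], js: ["dist/content.js"] },
  ],
};

console.log(JSON.stringify(manifest, null, 2));
```

if the content script only matters on a handful of sites, narrowing matches to those sites avoids the broad-permission warning.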

that said, the manifest has three distinct entry points - popup.html, background.js and content.js.
why are there 3 separations? 3 separate scripts?

these are 3 isolated runtimes that can't call each other directly.

- content script runs inside the webpage, sandboxed there intentionally. if it had full chrome API access, any malicious website could piggyback on it.
- background worker has full chrome API access but no DOM. it is the trusted core, kept away from the page content.
- popup is just the UI shell and it dies the moment you close it, so it cannot hold state

// content script — runs inside every page (or matched pages)
// has DOM access, no direct chrome API access (limited subset only)

// listen for messages from background or popup
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === "PING") {
    sendResponse({ status: "content script alive" })
  }
})

// can directly read/manipulate the DOM
const selectedText = window.getSelection()?.toString()

// send something to background
chrome.runtime.sendMessage({ type: "TEXT_SELECTED", payload: selectedText })

// background service worker: the brain of the extension
// no DOM access, full chrome API access
// runs independently of any tab or popup, but chrome shuts it down when idle,
// so keep anything that must survive in chrome.storage, not in globals

// listen for extension install
chrome.runtime.onInstalled.addListener(() => {
  console.log("extension installed")
})

// central message hub — everything routes through here
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === "TEXT_SELECTED") {
    // do something: fetch an API, store data, message another tab ....
    fetch(`https://api.example.com?q=${encodeURIComponent(message.payload)}`)
      .then(res => res.json())
      .then(data => sendResponse({ results: data }))
      .catch(err => sendResponse({ error: String(err) })) // always answer, even on failure

    return true // keep channel open for async
  }
})

the next natural question would be how do these 3 talk to each other. the pattern remains the same. someone sends the message, background listens and responds.

// content.ts or popup.ts
chrome.runtime.sendMessage({ type: "FETCH_REDDIT", query: selectedText }, (response) => {
  console.log(response.results)
})

// background.ts
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === "FETCH_REDDIT") {
    fetchRedditResults(message.query)
      .then(results => sendResponse({ results }))
      .catch(err => sendResponse({ results: [], error: String(err) })) // reply even if the fetch fails
    return true // critical — tells chrome to keep the channel open for async response
  }
})

without the return true, chrome closes the message channel before your async function resolves and sendResponse does nothing


storage:

since your popup, background and content scripts are three different JS environments, localStorage doesn't work how you'd expect: the popup's localStorage belongs to the extension's origin, a content script sees the host page's localStorage, and the background service worker has no localStorage at all.
that is, whatever the popup writes to localStorage is invisible to the content script.

chrome gives you chrome.storage, which is shared across all three contexts. use this in your code instead of localStorage
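
a minimal sketch of what that looks like. StorageAreaLike is a hand-written stand-in for the promise-based get/set that chrome.storage.local and chrome.storage.sync expose, and the lastQuery key plus both helper names are made up for this example. the in-memory area is only there so the snippet runs outside a browser.

```typescript
// minimal shape of the promise-based storage area this sketch relies on
interface StorageAreaLike {
  get(key: string): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

// hypothetical helper: remember the last query so popup, background and content all see it
async function saveLastQuery(area: StorageAreaLike, query: string): Promise<void> {
  await area.set({ lastQuery: query });
}

// hypothetical helper: read it back, or null if nothing was stored yet
async function loadLastQuery(area: StorageAreaLike): Promise<string | null> {
  const items = await area.get("lastQuery");
  const value = items["lastQuery"];
  return typeof value === "string" ? value : null;
}

// in-memory stand-in so the sketch runs outside a browser
function createMemoryArea(): StorageAreaLike {
  const data = new Map<string, unknown>();
  return {
    async get(key: string) {
      const found: Record<string, unknown> = {};
      if (data.has(key)) found[key] = data.get(key);
      return found;
    },
    async set(items: Record<string, unknown>) {
      Object.keys(items).forEach((k) => data.set(k, items[k]));
    },
  };
}

// inside the extension you would pass the real area instead, e.g.
//   await saveLastQuery(chrome.storage.local, "mechanical keyboard");
```

because every context talks to the same storage area, the popup can read what the content script saved a moment ago.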


chrome apis:

below are the ones you should keep in mind while making a chrome extension.

chrome.tabs - query the current tab, get its URL, send messages to it (script injection moved to chrome.scripting in MV3). Most extensions need this.
chrome.scripting.executeScript - run a function in the context of a tab from the background. Alternative to content scripts for one-off injections.
chrome.contextMenus - add items to the right-click menu. This is how redditHelp could've been triggered instead of a popup. Very underrated API, gives your extension a native feel.
chrome.alarms - set recurring tasks from the background worker. Think cron but for extensions. Useful for polling, reminders, periodic syncs.
chrome.storage.local vs chrome.storage.sync - 'sync' automatically syncs across the user's Chrome instances via their Google account. 'local' stays on device.
chrome.runtime.onInstalled - runs once on install/update. good for setting default config in storage
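
here is a rough sketch of how onInstalled, contextMenus and alarms fit together in a background worker. ChromeLike is a hand-written slice of the chrome API so the snippet type-checks on its own (a real project would get these types from @types/chrome). the menu id, alarm name, 30 minute period and the two callbacks are all assumptions. the manifest would also need "contextMenus" and "alarms" under permissions.

```typescript
// hand-written slice of the chrome API used below (stand-in for @types/chrome)
interface ChromeLike {
  runtime: { onInstalled: { addListener(cb: () => void): void } };
  contextMenus: {
    create(props: { id: string; title: string; contexts: string[] }): void;
    onClicked: {
      addListener(cb: (info: { menuItemId: string | number; selectionText?: string }) => void): void;
    };
  };
  alarms: {
    create(name: string, info: { periodInMinutes: number }): void;
    onAlarm: { addListener(cb: (alarm: { name: string }) => void): void };
  };
}

const MENU_ID = "search-reddit"; // hypothetical menu item id
const ALARM_NAME = "cache-sweep"; // hypothetical alarm name

function registerBackgroundHooks(
  api: ChromeLike,
  onSearch: (text: string) => void,
  onSweep: () => void,
): void {
  // onInstalled fires on install/update: create the menu item and the alarm once
  api.runtime.onInstalled.addListener(() => {
    // "%s" in the title is replaced by the selected text
    api.contextMenus.create({ id: MENU_ID, title: 'search reddit for "%s"', contexts: ["selection"] });
    api.alarms.create(ALARM_NAME, { periodInMinutes: 30 });
  });

  // right-click on a selection, then our menu item
  api.contextMenus.onClicked.addListener((info) => {
    if (info.menuItemId === MENU_ID && info.selectionText) onSearch(info.selectionText);
  });

  // periodic task, cron style
  api.alarms.onAlarm.addListener((alarm) => {
    if (alarm.name === ALARM_NAME) onSweep();
  });
}

// in background.ts you would call something like:
//   registerBackgroundHooks(chrome, runSearch, sweepCache)
```

listeners are registered at the top level on purpose: the service worker can be restarted at any time, and chrome only delivers events to listeners that were registered when the script first ran.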

everything else is a simple lookup on the docs page. refer to it as you build your extension, or let LLMs do the heavy lifting


redditHelp flow and architecture:

we take a brief look at the architecture of redditHelp. it's not that deep, especially in the AI era. i made this while we still somewhat wrote code by hand.

the query selection is the most interesting part. content.ts actively collects three things on the active webpage:

  • selected text

  • cleaned up page title (strips keywords, keeps 6 meaningful words)

  • URL metadata

the popup then picks the query in priority order: selected text > title > url metadata.
obviously, we want the text the user manually selected to take top priority when we send the query to reddit.
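
the priority order above can be sketched as a few pure functions. this is not redditHelp's actual code: the noise word list, the "last url path segment" heuristic and all function names are assumptions picked for illustration.

```typescript
// assumed noise words; the real list in redditHelp may differ
const NOISE_WORDS = new Set([
  "buy", "online", "price", "best", "the", "a", "an", "and", "for", "with", "at", "in", "of",
]);

// lowercases the title, drops noise words and keeps the first maxWords meaningful ones
function cleanTitle(title: string, maxWords = 6): string {
  return title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 1 && !NOISE_WORDS.has(word))
    .slice(0, maxWords)
    .join(" ");
}

// last non-empty path segment of the url, with dashes/underscores turned into spaces
function queryFromUrl(url: string): string {
  const segments = new URL(url).pathname.split("/").filter((s) => s.length > 0);
  const last = segments.length > 0 ? segments[segments.length - 1] : "";
  return last.replace(/[-_]+/g, " ").trim();
}

// selected text > cleaned title > url metadata
function pickQuery(selectedText: string, title: string, url: string): string {
  const selected = selectedText.trim();
  if (selected) return selected;
  const cleaned = cleanTitle(title);
  if (cleaned) return cleaned;
  return queryFromUrl(url);
}

console.log(pickQuery("", "Buy Sony WH-1000XM5 Wireless Headphones Online at Best Price", "https://example.com/"));
// prints "sony wh 1000xm5 wireless headphones"
```

keeping these functions free of DOM and chrome calls means you can test the query logic without loading the extension.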

SPA: single-page apps change the url without reloading the page, so content.ts polls the url every second to catch any change. since the user is actively surfing and browsing the page, things will change.
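
a minimal sketch of that polling loop, assuming a plain setInterval is good enough. createUrlChecker and watchUrl are hypothetical names, and the url getter is passed in so the logic can run outside a page.

```typescript
// returns a function that compares the current url with the last one seen
// and calls onChange only when it differs
function createUrlChecker(getUrl: () => string, onChange: (url: string) => void): () => void {
  let lastUrl = getUrl();
  return () => {
    const current = getUrl();
    if (current !== lastUrl) {
      lastUrl = current;
      onChange(current);
    }
  };
}

// runs the checker every intervalMs; the returned function stops the polling
function watchUrl(getUrl: () => string, onChange: (url: string) => void, intervalMs = 1000): () => void {
  const timer = setInterval(createUrlChecker(getUrl, onChange), intervalMs);
  return () => clearInterval(timer);
}

// in content.ts this could be wired up as:
//   const stop = watchUrl(() => location.href, (url) => { /* re-read selection, title, url */ });
```

polling is crude but dependable: it doesn't care how the site changes its url, it just notices that it did.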

background.ts - popup cannot (and should not) fetch reddit directly. so, popup messages background → background does the fetch → handles caching ( 5 min, 50 entries max ) → enforces rate limiting ( 1 second b/w requests ) → sends back the structured response
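
the caching and rate limiting steps could be sketched like this, using the numbers from above (5 min, 50 entries, 1 second). the factory names and the injected clock are assumptions, and this is one simple way to do it, not necessarily how redditHelp implements it.

```typescript
type Clock = () => number;

// cache with a time-to-live and a cap on entries (the oldest entry is evicted first)
function createTtlCache<V>(ttlMs = 5 * 60 * 1000, maxEntries = 50, now: Clock = Date.now) {
  const entries = new Map<string, { value: V; expiresAt: number }>();
  return {
    get(key: string): V | undefined {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (hit.expiresAt <= now()) {
        entries.delete(key); // stale: drop it and report a miss
        return undefined;
      }
      return hit.value;
    },
    set(key: string, value: V): void {
      entries.delete(key); // re-inserting makes this key the newest
      if (entries.size >= maxEntries) {
        // a Map iterates in insertion order, so the first key is the oldest
        const oldest = entries.keys().next().value;
        if (oldest !== undefined) entries.delete(oldest);
      }
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    size(): number {
      return entries.size;
    },
  };
}

// rate limiter: reserve() books the next free slot and returns how many ms to wait for it
function createRateLimiter(minGapMs = 1000, now: Clock = Date.now) {
  let nextFreeAt = 0;
  return {
    reserve(): number {
      const t = now();
      const startAt = Math.max(t, nextFreeAt);
      nextFreeAt = startAt + minGapMs;
      return startAt - t;
    },
  };
}

// in background.ts the two could be combined roughly like this:
//   const cached = cache.get(query); if (cached) return cached;
//   await new Promise((resolve) => setTimeout(resolve, limiter.reserve()));
//   const results = await fetchRedditResults(query); cache.set(query, results);
```

one caveat: since the service worker can be shut down when idle, an in-memory cache like this only lives as long as the worker does.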

no api key needed: since reddit's search.json endpoint is public, we do not need an api key.
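
a small sketch of talking to that endpoint: one function builds the search url, another flattens the listing that comes back. the limit and sort values, the RedditPost shape and both function names are assumptions for illustration. only the { data: { children: [{ data }] } } listing layout is reddit's.

```typescript
// builds the public search url; limit and sort are assumed defaults
function buildRedditSearchUrl(query: string, limit = 10): string {
  const params = new URLSearchParams({ q: query, limit: String(limit), sort: "relevance" });
  return `https://www.reddit.com/search.json?${params.toString()}`;
}

// hypothetical trimmed-down post shape for the popup
interface RedditPost {
  title: string;
  url: string;
  score: number;
}

// reddit listings look like { data: { children: [{ data: { title, permalink, score, ... } }] } }
function parseListing(json: any): RedditPost[] {
  const children: any[] = json && json.data && Array.isArray(json.data.children) ? json.data.children : [];
  return children.map((child) => ({
    title: String(child.data.title),
    url: "https://www.reddit.com" + String(child.data.permalink), // permalink is a relative path
    score: Number(child.data.score),
  }));
}

// in background.ts this might be used as:
//   const res = await fetch(buildRedditSearchUrl(query));
//   const posts = parseListing(await res.json());
```

sending only the fields the popup needs keeps the message between background and popup small.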


deploying chrome extensions:

you can load the extension locally to test it in your browser, and publish it on the chrome web store as well:

Load it locally (for development):

  • npm run build to compile

  • go to chrome://extensions

  • turn on Developer Mode (top right toggle)

  • click "Load unpacked" → select your root folder

  • every time you change code, rebuild and hit the refresh icon on the extension card

Web Store (to ship it publicly):

  • one-time $5 developer registration fee at Google Chrome Web Store Developer Dashboard

  • zip your extension folder, upload it

  • review takes anywhere from a few hours to a few days

  • any update you push also goes through review


building extensions for firefox and safari:

Firefox - uses the WebExtensions API, which is intentionally chrome-compatible. most extensions work with minimal changes. the main difference is that firefox still supports MV2 and has been vocal about not fully adopting MV3's restrictions, specifically in the declarativeNetRequest vs webRequest debate (firefox lets you keep the more powerful blocking webRequest).
so if you're building something that needs request interception, firefox actually gives you more power.

safari - apple has a conversion tool (xcrun safari-web-extension-converter) that wraps your extension in an Xcode project. It works but it's annoying. you need a mac/Xcode, and an Apple Developer account ($99/year) to distribute it. Not worth it unless you have a specific reason.

if you want to go cross-browser, build for Chrome first with MV3, then test on Firefox. most prolly it’d just work.


how to think through project ideas - sideQuest:

a small flex i have - never touched, bought or read a single page of a single book for my uni exams. i’ve prolly read more tech related papers and O'Reilly books at this point than i have for my uni. im sure most others do the same. so we take refuge of LLMs and ppts. but learning through gpt back then was tedious and frustrating. why? because as the chats become longer and you go on tangents, it becomes hard to scroll upwards and go back to the original prompt/paragraph you began reading with.

this was a big problem personally. cool guy pointed this out and that's when i knew this was global and others faced this as well

so i built sideQuest within a day: bookmark any particular prompt, and jump back to it when you need to. this works for gemini and claude as well

this problem was real, and solved natively by gpt’s recent UI change. it is basically the same feature

these are simple DOM tools, and nothing fancy. no time complexity, data structures or HLDs/LLDs. you might need microphone apis or DOM apis for certain extensions, but the core architecture remains same.

POINT OF WRITING ALL THIS : A LARGE NUMBER OF PEOPLE HAVE ASKED ME HOW I GO ABOUT THINKING UP PROJECT IDEAS, WHAT TO MAKE. this is how i personally approach my builds. keep it simple, but pick a problem which you personally face (even better when someone else does as well), build a simple rudimentary MVP around it, and im pretty sure there'll come a moment when you think "oh i can add this ft." while building it. that's how you end up converting a potential idea to a product.

i still use redditHelp for search, sideQuest for bookmarks in long chats to this day

i hope this helps :)


feel free to give any feedback and/or suggestions:

github - https://github.com/dexisback
twitter - https://x.com/dextertwts
redditHelp - https://github.com/dexisback/redditHelp
sideQuest - https://github.com/dexisback/SideQuest
tutorial playlist - i learnt from this