Building Web Agents Before Everyone Else
https://fl1nt.dev/blog/building-web-agents-before-everyone-else
Published: Saturday, July 26, 2025 • 11 min read

Back in early 2023, I built a browser extension that could read webpages and actually understand them. ChatGPT was still kinda new and everyone was losing their minds over it, but honestly I never found it that useful. In my mind, it was basically just a chatbot that couldn't actually do anything.

I called it Avora, and honestly I thought this was the most insanely good idea that nobody had thought of yet. Imagine having AI that could actually read and understand any webpage you're on. That seemed like it would be useful to literally everyone. "AI agents" were barely even a concept back then; it was just this really unique thing that I was convinced would blow people's minds once they tried it.

The Problem (aka Why This Was Actually Hard)

Making an LLM actually understand a webpage is way harder than it sounds. Anyone can dump HTML into some random model and get some kind of response, but getting it to actually understand the content? That's the hard part. Granted, this is easier now, but back in the earlier days of AI this was all foreign territory.

The main issues I ran into were:

  1. Web pages are messy as hell thanks to the enshittification of the internet. HTML can be a nightmare to parse. There are often more ads, navigation menus, and other dogshit mixed in than actual content, and it can be really hard to filter down to the content you actually need.
  2. Token limits suck because LLMs have context limits, and back then they were so low that you couldn't just dump an entire webpage and expect it to work.
  3. Finding relevant info is crucial: when someone asks a question, you need to find the right part of the page to focus on. With early models, passing in too much info would actually degrade performance.
  4. Making it actually useful means the AI needs to be able to search for more info or interact with the page, not just read it, or else what's the point of using the browser in the first place?
  5. Speed matters because nobody wants to wait 30 seconds for a response. The agent has to be quicker than the human would be or there's no point.

How I Actually Built It

Avora went through a lot of different iterations at first, but the core loop I ended up with was this two-tier system where Avora would choose between different AI models depending on what you were asking and whether you were paying me or not (lol):

def determine_prompt_type(message, page_content, user):
    if user.plan_id in [2, 3, 4]:  # Paid plans
        message_length = len(message.split())
        content_length = len(page_content.split()) if page_content else 0
        
        if message_length > 50 or content_length > 5000:
            return 'advanced'  # Claude with tools
    
    return 'basic'  # gpt-4o-mini

  • Basic prompts: gpt-4o-mini for quick and easy stuff. Fast, cheap, and pretty decent.
  • Advanced prompts: Claude Sonnet with all the fancy tool calling for complex questions.

This wasn't just about saving money (though that was nice); it was more about speed. It seemed kind of stupid to make someone wait for Claude to think really hard about "what's the price of this product" when gpt-4o-mini can just tell you instantly. So basic prompts had real utility beyond just padding margins to keep the thing profitable.
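
For a rough sense of how the routing played out (hypothetical inputs, using the function above):

# A short question on a normal-sized page stays on the cheap, fast path
determine_prompt_type("what's the price of this product?", product_page_text, paid_user)
# -> 'basic'

# A long research-y question, or a page over ~5000 words, kicks over to Claude with tools
determine_prompt_type(long_research_question, huge_docs_page_text, paid_user)
# -> 'advanced'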

Actually Getting Clean Content

When someone sent a message through the extension, a few things happened before the model saw anything. First, we grabbed all of the page text, then we grabbed all of the page's hyperlinks along with their surrounding context. Finally, we passed it all to the model.

def extract_structured_content(soup):
    # Collect every hyperlink with its anchor text and a bit of surrounding context
    links = []
    for link in soup.find_all('a', href=True):
        links.append({
            'url': link['href'],
            'text': link.get_text(strip=True),
            'context': get_surrounding_text(link, 50)  # 50 chars before/after
        })
    
    # Strip scripts, styles, nav, and footers so only real content is left
    for element in soup(["script", "style", "nav", "footer"]):
        element.extract()
    
    content = soup.get_text(separator=' ', strip=True)
    
    return {
        'main_content': content,
        'links': links,
        'structure': extract_heading_hierarchy(soup)
    }

This meant the AI could understand not just what was on the page, but what was connected to the page, which became crucial for the tool calling functionality later and general coherence of its responses.
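
get_surrounding_text and extract_heading_hierarchy aren't shown above; here's a minimal sketch of the idea, assuming they work roughly like this:

def get_surrounding_text(link, chars=50):
    # Grab a little text on either side of the anchor so the model knows why the link is there
    parent_text = link.parent.get_text(' ', strip=True) if link.parent else ''
    anchor_text = link.get_text(strip=True)
    idx = parent_text.find(anchor_text)
    if idx == -1:
        return parent_text[:chars * 2]
    return parent_text[max(0, idx - chars):idx + len(anchor_text) + chars]

def extract_heading_hierarchy(soup):
    # Keep the h1-h6 outline so the model can see how the page is organized
    return [
        {'level': int(tag.name[1]), 'text': tag.get_text(strip=True)}
        for tag in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
    ]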

A cool feature I added later on was the ability to extract YouTube transcripts, which were handled differently on the backend, but it basically gave Avora full context over a YouTube video if you were watching one.

from youtube_transcript_api import YouTubeTranscriptApi

def extract_youtube_transcript(page_url):
    video_id = extract_video_id(page_url)
    if not video_id:
        return None
        
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join([entry['text'] for entry in transcript])
    except Exception:
        return None

So Avora could "watch" YouTube videos and answer questions about them, which honestly blew my mind back then, even though in hindsight it's pretty trivial lol
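
For completeness, extract_video_id is just URL parsing; a rough sketch of the kind of thing it has to handle:

from urllib.parse import urlparse, parse_qs

def extract_video_id(page_url):
    # Handles standard watch URLs plus youtu.be short links (not exhaustive)
    parsed = urlparse(page_url)
    if parsed.hostname == 'youtu.be':
        return parsed.path.lstrip('/') or None
    if parsed.hostname and 'youtube.com' in parsed.hostname:
        return parse_qs(parsed.query).get('v', [None])[0]
    return None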

Embeddings, or How to Make Them Actually Work

This is where things got interesting. The biggest pain in the ass with AI back then (and still now tbh) is that you can only fit so much text into the context window. So when someone was on like a massive documentation site, I couldn't just dump the whole thing into GPT and hope for the best.

Instead, I had to get smart about finding the relevant parts, and this is when the logic started to get really complicated:

def process_page_content(page_content, message, max_words=None, max_tokens=None):
    # enc is a tiktoken encoder, set up once at module level
    tokens = enc.encode(page_content)
    token_count = len(tokens)
    
    if max_tokens and token_count > max_tokens:
        # Too big for context window - use embeddings
        dataset = create_dataset(page_content)  # Chunk into 1024-char pieces
        dataset_with_embeddings = store_embeddings(dataset)
        relevant_chunks = search_database(message, dataset_with_embeddings)
        return relevant_chunks
    
    return page_content

The way I set up embeddings was pretty decent for 2023. I'd basically call it RAG but before RAG was cool:

  1. We split everything into 1024-character pieces (not sentences, so it's easier to manage tokens)
  2. We generated embeddings for each 1024-character chunk in parallel (both steps sketched below)
  3. We pulled the relevant chunks by running cosine similarity between the user's prompt and each chunk, keeping the 3-5 chunks that actually mattered
  4. We put it back together by combining the relevant chunks into something the final LLM could use to actually answer the user's question
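
The create_dataset and store_embeddings helpers referenced earlier aren't shown; here's a simplified sketch of steps 1 and 2, assuming a pandas DataFrame (which is what the search function below expects):

import pandas as pd
from openai import OpenAI

client = OpenAI()

def create_dataset(page_content, chunk_size=1024):
    # Step 1: split the raw text into fixed-size character chunks
    chunks = [page_content[i:i + chunk_size] for i in range(0, len(page_content), chunk_size)]
    return pd.DataFrame({'content': chunks})

def store_embeddings(dataset):
    # Step 2: embed every chunk (shown here as one batched API call)
    response = client.embeddings.create(
        input=dataset['content'].tolist(),
        model='text-embedding-3-small'
    )
    dataset['embedding'] = [item.embedding for item in response.data]
    return dataset

Steps 3 and 4 then happened in search_database, which ranked the chunks against the user's question:
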
from sklearn.metrics.pairwise import cosine_similarity

def search_database(user_query, database):
    # Embed the user's question with the same model used for the chunks
    query_embedding = client.embeddings.create(
        input=[user_query],
        model='text-embedding-3-small'
    ).data[0].embedding

    similarity_scores = cosine_similarity(
        [query_embedding],
        database['embedding'].tolist()
    )[0]

    database['similarity'] = similarity_scores
    sorted_database = database.sort_values('similarity', ascending=False)
    
    # Return top 3 most relevant chunks
    return ' '.join(sorted_database.head(3)['content'].tolist())

So basically this enabled people to ask questions about huge PDFs or whatever sites and Avora would actually find the right parts instead of just completely fumbling the answer. This was a real problem with AI at the time, and there weren't really good solutions yet like there are now.

Making It Actually Agentic

So at this point, we have a system that:

  1. Grabs all the content from the current page, including hyperlinks with context
  2. Extracts YouTube transcripts if you're watching a video
  3. Uses embeddings to find the most relevant chunks if it's a super long context
  4. Chooses between fast/cheap models vs. smart/expensive models based on query complexity
  5. Can actually understand and answer questions about whatever page you're on

But I still wasn't totally happy with where Avora was. It could only work with what was already on the page, and I felt it would be really awesome if it went above and beyond and found the context it needed for you. That's when I made advanced prompts. I gave Claude the ability to search the web and fetch content from other pages, just like a human would when researching. This turned Avora from something that just read what was already on the page into something that could actually go and research for you.

tools = [
    {
        "name": "google_search",
        "description": "Search Google for information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            }
        }
    },
    {
        "name": "fetch_link_content", 
        "description": "Fetch content from a URL",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to fetch"}
            }
        }
    }
]

The coolest part of this is that Claude could chain these tools together. Like if it felt it was necessary to answer the user's query, it could search for something, read the results, then fetch specific pages it found interesting, then search for more specific info based on what it learned:

for block in response.content:
    if block.type == "tool_use":
        if block.name == "google_search":
            result = perform_google_search(block.input["query"])
        elif block.name == "fetch_link_content":
            result = fetch_link_content(block.input["url"])
        
        # Feed the tool result back as a user message and let Claude keep going
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{"type": "tool_result", "tool_use_id": block.id, "content": result}]
        })
        response = anthropic.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=4096,
            messages=messages,
            tools=tools
        )

So you could be on like idk some random article and ask "what are the latest developments in quantum computing?" and Avora would go search Google, read a bunch of articles, and give you an actual up to date answer regardless of where you were in the browser.
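
Those two tool functions were just thin wrappers. A sketch of what they could look like, assuming Serper for search (SERPER_API_KEY here is a placeholder for a key configured elsewhere):

import requests
from bs4 import BeautifulSoup

def perform_google_search(query):
    # Serper exposes Google results behind a simple JSON API
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"},
        json={"q": query}
    )
    results = resp.json().get("organic", [])
    return "\n\n".join(
        f"{r['title']}\n{r['link']}\n{r.get('snippet', '')}" for r in results[:5]
    )

def fetch_link_content(url):
    # Pull the page and strip it down to readable text, same as the extension does locally
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    for element in soup(["script", "style", "nav", "footer"]):
        element.extract()
    return soup.get_text(separator=' ', strip=True)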

Making It Feel Fast (Streaming)

After adding this whole agentic loop into the system, Avora started to feel really, really sluggish, because before showing anything to the user it would go off and run every tool call it needed. One thing that made a huge difference was streaming the responses, and if you're building an AI app you really should be streaming. Instead of making people wait 10-30 seconds staring at a loading spinner, they could see the words appearing as the AI was thinking and see its tool calls in real time. Instantly makes the UX 999x better.

def handle_basic_prompt(message, page_content, page_url, page_title, chat_history, user_id):
    # `messages` gets built from the page content, page metadata, and chat history (omitted here)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )

    # Yield newline-delimited JSON so the extension can render tokens as they arrive
    for chunk in response:
        if chunk.choices[0].delta.content is not None:
            yield json.dumps({"text": chunk.choices[0].delta.content}) + "\n"
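
Anything that can read newline-delimited JSON can consume that stream. A quick test-client sketch (the endpoint URL and payload shape here are hypothetical):

import json
import requests

# Hypothetical endpoint and payload shape, just to show consuming the NDJSON stream
payload = {"message": "what is this page about?", "page_content": "<extracted page text>"}
with requests.post("https://avora-api.example.com/chat", json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line)["text"], end="", flush=True)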

The Stack (and What I Would Change Today)

  • Flask: Because it's what I was most familiar with at the time.
  • OpenAI API: text-embedding-3-small for embeddings, gpt-4o-mini for basic prompts
  • Anthropic API: claude-3.5-sonnet for advanced prompts with tool calling
  • BeautifulSoup: For scraping and cleaning websites
  • youtube-transcript-api: To get video transcripts directly
  • scikit-learn: For the cosine similarity math
  • tiktoken: To count tokens as context was assembled, so we never hit limits
  • Serper API: For Google search.

If I were building Avora from scratch today, the stack would be completely different.

I'd probably use TypeScript for the backend, following OpenAPI format, instead of Python, because honestly Python APIs get messy fast and I've grown somewhat resentful of the language. I think it's great for scripting, but it really shouldn't be used to ship anything real. For model access, I'd definitely use OpenRouter instead of hitting OpenAI and Anthropic APIs directly. OpenRouter didn't exist back then, but it's so much better for managing multiple models. And for web search, I'd use Exa instead of Google. Exa is built specifically for AI agents to crawl the web and even handles semantic search for you, and honestly I really wish that existed back then lol. Would've saved me so much time building custom scraping logic.

Things I Accidentally Got Right about Agents

Looking back, I apparently made some pretty good decisions that everyone else figured out later:

  1. Using different models for different things is important. Now everyone does this, but back then people just used gpt-4 for everything
  2. Embeddings for dealing with long content is standard now, but in early 2023 most people were just truncating stuff
  3. Tool calling became the way everyone builds agents now
  4. If you aren't streaming responses, your app is automatically shit
  5. Actually understanding different content types matters, handling videos/PDFs/web pages each on their own terms instead of just treating everything as plain text

It's wild that the stuff I built for Avora in 2023 is basically what every AI agent startup is doing now. Embeddings for search, tool calling, streaming responses, using different models for different tasks. All of that is just standard practice now.

At the time, it felt like I stumbled onto something huge that nobody else was really doing yet. I'm not trying to say I'm some visionary or anything, but I definitely caught onto the same trends early on. It just so happens that a lot of other people ended up seeing the same opportunities around the same time.

I ended up sunsetting Avora earlier this year, but the whole experience taught me everything I know about building with LLMs and basically launched my entire AI career. What's crazy to me is that AI has changed so much in just 3 years that making something like Avora today could be done in probably just a day with new tooling. I'm really grateful for the experience gained.