What People Are Really Building in r/AI-Agents

Using embeddings, structural topic models (STM), and LLM summaries, this post breaks down the dominant themes in r/AI-Agents and examines how discussions evolve across the month.
Embeddings
LLMs
Topic Modelling
AI
Author

Alex Wainwright

Published

November 18, 2025

AI agents are everywhere right now, and nearly every workflow is being rebranded as “AI.” Psychometric measures are supposedly “handled by an AI agent,” and forms are being replaced with voice models. On the surface, it’s impressive. But let’s be honest: a percentile score doesn’t need an agent, and a basic form can handle initial screening. I’m not convinced.

Tools like Lovable, n8n, and others have also flooded the space with a wave of “vibe-coded” apps, many of which leak their own API keys in network calls. The result is a stream of identical-looking products, some claiming to be health or finance tools while overlooking basic security.

This post looks at how these AI services have fed into the constant push to make money fast. To do that, we’ll dive into the r/AI-Agents subreddit. The goal is simple: understand how this community posts and what they’re actually building.

Subreddit Posts

Using PRAW, I pulled 978 submissions from the subreddit. All data was collected on 12 November 2025. It’s still only a sample, so treat it as indicative rather than definitive.
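The collection script isn’t shown. A minimal sketch of how such a pull might look with PRAW follows, assuming read-only credentials stored in hypothetical environment variables (REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT) and an assumed set of saved columns. The helper names are illustrative, not the author’s code.

```python
import csv
import io
import os

# Columns saved per submission (an assumed set; the original CSV schema is not shown)
FIELDS = ["id", "title", "created_utc", "score", "num_comments"]


def submission_to_row(submission):
    """Flatten one PRAW Submission (or any object with the same attributes) into a dict."""
    return {field: getattr(submission, field) for field in FIELDS}


def rows_to_csv(rows):
    """Serialise submission rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fetch_submissions(subreddit_name, limit=None):
    """Pull the newest submissions from a subreddit (needs Reddit API credentials)."""
    import praw  # imported here so the helpers above work without PRAW installed

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],  # hypothetical variable names
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent=os.environ["REDDIT_USER_AGENT"],
    )
    # new() lists submissions newest first; limit=None requests as many as Reddit returns
    listing = reddit.subreddit(subreddit_name).new(limit=limit)
    return [submission_to_row(submission) for submission in listing]


# Usage (requires network access and credentials; the subreddit name is assumed):
# with open("data/subreddit_submissions.csv", "w", newline="") as f:
#     f.write(rows_to_csv(fetch_submissions("AI_Agents")))
```

Keeping `created_utc` as raw epoch seconds leaves date handling to the analysis step.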

library(data.table)
library(flextable)
library(ggplot2)
library(reticulate)
library(stm)
library(tidytext)

subreddit_submissions <-
  fread("data/subreddit_submissions.csv")

Daily posting is steady across the month (Figure 1). The volume isn’t huge, but it shows the subreddit has a consistent audience and that people are genuinely interested in building AI agents.

subreddit_submissions[, created_utc := as.POSIXct(created_utc)]
subreddit_submissions[, .N, by = .(posted_date = as.Date(created_utc))] |>
  ggplot(aes(x = posted_date, y = N)) +
  geom_col(
    colour = "black",
    fill = "white"
  ) +
  theme_classic() +
  labs(
    x = "Submission Posting Date",
    y = "Number of Submissions"
  )

Figure 1: Subreddit posts by day

For the analysis of these posts, we’ll triangulate across three approaches:

  • Embeddings

  • Topic modelling

  • Large language models

Each helps explore the types of content being posted and whether the themes shift over time.

Embeddings

The submission titles are encoded into embeddings, which are then averaged by week. These weekly averages are compared using cosine similarity to check for semantic shifts.

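import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

# Sentence-transformer model used to embed each title
model = SentenceTransformer("all-mpnet-base-v2")

# Pull the R data.table into pandas via reticulate
subreddit_submissions = r.subreddit_submissions

# One embedding per title, plus the week of the year (%W: weeks start on Monday)
subreddit_submissions["embeddings"] = subreddit_submissions["title"].apply(model.encode)
subreddit_submissions["week_of_year"] = subreddit_submissions["created_utc"].dt.strftime("%W")

# Average the title embeddings within each week (groupby yields weeks in sorted order)
week_embeddings = {
    week: np.vstack(group["embeddings"].tolist()).mean(axis=0)
    for week, group in subreddit_submissions.groupby("week_of_year")
}
weeks = list(week_embeddings)
week_matrix = np.vstack([week_embeddings[week] for week in weeks])

# Pairwise cosine similarity between the weekly mean embeddings
cosine_matrix = model.similarity(week_matrix, week_matrix)
cosine_matrix = pd.DataFrame(cosine_matrix.numpy(), index=weeks, columns=weeks)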
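The weekly comparison itself is ordinary cosine similarity. As an illustration, here is a NumPy sketch on made-up three-dimensional vectors standing in for the 768-dimensional weekly means: each row is scaled to unit length, and the matrix of dot products then holds every pairwise cosine.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    vectors = np.asarray(vectors, dtype=float)
    # Scale every row to unit length; the dot product of unit vectors is their cosine
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


# Toy "weekly mean embeddings" (hypothetical values, three dimensions instead of 768)
weeks = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])
similarity = cosine_similarity_matrix(weeks)
# Rows 0 and 1 have cosine 1/sqrt(2), about 0.707; row 2 is orthogonal to both (0)
print(np.round(similarity, 3))
```

Because cosine ignores vector length, the weekly means don’t need renormalising before they are compared.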

In the cosine similarity matrix, Week 39 stands out: its similarity to the other six weeks ranges from 0.888 to 0.916, while those six weeks cluster tightly together (0.967 to 0.988).

cosine_matrix <-
  py$cosine_matrix |>
  as.matrix()

cosine_matrix[upper.tri(cosine_matrix)] <- NA
cosine_matrix <- round(cosine_matrix, 3)
cosine_matrix
      39    40    41    42    43    44 45
39 1.000    NA    NA    NA    NA    NA NA
40 0.915 1.000    NA    NA    NA    NA NA
41 0.916 0.987 1.000    NA    NA    NA NA
42 0.912 0.988 0.984 1.000    NA    NA NA
43 0.896 0.984 0.981 0.982 1.000    NA NA
44 0.912 0.986 0.983 0.983 0.982 1.000 NA
45 0.888 0.967 0.968 0.967 0.969 0.971  1

A simple explanation shows up in the weekly counts. Week 39 contains only 15 posts, compared with 206 in Week 40. With such a small sample, the averaged embedding for Week 39 is more sensitive to topic variation and noise.
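To see why a small week drifts, here is a toy simulation, not the post’s data. Simulated “title embeddings” share one direction plus random noise, and each simulated week’s mean is compared against that shared direction. The noise scale, dimension, seed, and number of trials are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1410)
DIMS = 768  # same width as all-mpnet-base-v2 vectors; every value below is simulated
THEME = rng.normal(size=DIMS)  # hypothetical direction shared by all titles


def simulated_week(n_posts):
    """n_posts simulated title embeddings: the shared direction plus independent noise."""
    return THEME + rng.normal(scale=3.0, size=(n_posts, DIMS))


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mean_similarity_to_theme(n_posts, trials=50):
    """Average cosine between a simulated week's mean embedding and the shared direction."""
    return float(np.mean([
        cosine(simulated_week(n_posts).mean(axis=0), THEME) for _ in range(trials)
    ]))


small_week = mean_similarity_to_theme(15)   # week size like Week 39
large_week = mean_similarity_to_theme(206)  # week size like Week 40
print(f"15 posts: {small_week:.3f}  206 posts: {large_week:.3f}")
```

With only 15 posts the noise averages out far less, so the mean lands clearly further from the shared direction than with 206 posts, a gap qualitatively similar to the one between Week 39 and the rest.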

subreddit_submissions <- 
  py$subreddit_submissions |>
  as.data.table()

subreddit_submissions[, .N, by = .("Created Week" = week_of_year)] |>
  as_flextable()

Created Week    N
45             73
44            195
43            173
42            145
41            171
40            206
39             15

Overall, the cosine similarities suggest minimal semantic drift across weeks, aside from Week 39, which is likely an artefact of low volume. With short titles and uneven posting counts, embeddings give a high-level signal rather than detailed semantic structure.

Topic Modelling

For topic modelling, the submission titles are pre-processed and tokenised, then analysed with structural topic models (STM) using the stm package.

The workflow is straightforward:

  1. Multiple models are fitted with different numbers of topics. Each model is evaluated using exclusivity and semantic coherence.

  2. The topic count with the best balance of these metrics is selected as the preferred solution.

  3. STM is then re-run several times using this chosen topic number to account for variability in model initialisation.

  4. From these runs, the model with the strongest exclusivity–coherence profile is retained for interpretation.
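Semantic coherence, one of the two selection metrics, rewards topics whose top words tend to appear in the same documents. Below is a from-scratch Python sketch of the co-document-frequency definition from Mimno et al. (2011), which stm’s semantic coherence is based on. It is not stm’s code, the toy token lists are made up, and exclusivity is left out.

```python
import math
from itertools import combinations


def semantic_coherence(top_words, documents):
    """Semantic coherence of one topic, after Mimno et al. (2011).

    top_words: the topic's top words, highest-probability first.
    documents: iterable of documents, each an iterable of tokens.
    Sums log((D(w_i, w_j) + 1) / D(w_i)) over word pairs where w_i ranks above w_j;
    D(w) counts documents containing w, D(w_i, w_j) documents containing both.
    Assumes every top word appears in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(word in doc for word in words) for doc in doc_sets)

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):  # i < j, so word i ranks higher
        higher, lower = top_words[i], top_words[j]
        score += math.log((doc_freq(higher, lower) + 1) / doc_freq(higher))
    return score


# Hypothetical stemmed titles
titles = [
    ["agent", "build", "voic"],
    ["agent", "build"],
    ["autom", "workflow"],
    ["autom", "tool", "workflow"],
]
print(semantic_coherence(["agent", "build"], titles))     # log(3 / 2), about 0.405
print(semantic_coherence(["agent", "workflow"], titles))  # log(1 / 2), about -0.693
```

Words that co-occur push the score towards zero from below or above it, while words that never share a document pull it down, which is why coherence is traded off against exclusivity rather than maximised alone.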

This process favoured a three-topic solution.

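# Lowercase, strip stopwords, punctuation and numbers, then stem the titles
# (textProcessor defaults; stemming is why words such as "voic" and "autom" appear)
processed_text <-
  textProcessor(
    documents = subreddit_submissions$title
  )

# Build the document-term inputs; by default, words found in only one
# title are dropped, along with any titles left empty
prepared_text <-
  prepDocuments(
    documents = processed_text$documents,
    vocab = processed_text$vocab
  )

# Compare 2 to 7 topics on diagnostics that include exclusivity and semantic coherence
search_k <-
  searchK(
    documents = prepared_text$documents,
    vocab = prepared_text$vocab,
    K = 2:7
  )

# Fit 100 initialisations of the 3-topic model; selectModel carries the
# most promising runs forward
multi_topics <-
  selectModel(
    documents = prepared_text$documents,
    vocab = prepared_text$vocab,
    K = 3,
    runs = 100,
    seed = 1410
  )

# Keep the retained run with the strongest exclusivity–coherence profile
selected_model <- 
  multi_topics$runout[[2]]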

An inspection of the top words for each topic provides insight into the model’s interpretation.

Topic 1 focuses on building “AI agents,” especially around design principles and open-source models.

Topic 2 is about projects and workflows, often linked to sales or productivity.

Topic 3 captures products people have built, what they need, and requests for testing or feedback.

labelTopics(selected_model)
Topic 1 Top Words:
     Highest Prob: agent, build, use, voic, make, best, free 
     FREX: sourc, sdk, open, get, model, best, design 
     Lift: ’re, action, almost, behind, design, easier, email 
     Score: agent, build, voic, use, best, new, make 
Topic 2 Top Words:
     Highest Prob: autom, look, tool, anyon, can, help, workflow 
     FREX: engin, sale, talk, train, project, workflow, far 
     Lift: engin, far, suggest, agentkit, aiml, appli, background 
     Score: autom, tool, workflow, anyon, like, learn, busi 
Topic 3 Top Words:
     Highest Prob: built, actual, need, work, code, just, platform 
     FREX: work, code, product, test, now, claud, differ 
     Lift: work, accident, add, agi, altern, amaz, author 
     Score: built, work, code, actual, product, need, just 

To go beyond the top words, we sample representative titles from each topic.

Topic 1 focuses on seeking feedback, discussion, and advice on builds.

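# Titles that survived pre-processing, aligned with the rows of selected_model$theta.
# Both steps can drop documents; guard against empty removal vectors,
# because x[-integer(0)] returns an empty vector rather than all of x.
keep <- seq_along(subreddit_submissions$title)
if (length(processed_text$docs.removed) > 0) keep <- keep[-processed_text$docs.removed]
if (length(prepared_text$docs.removed) > 0) keep <- keep[-prepared_text$docs.removed]
titles_retained <- subreddit_submissions$title[keep]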

model_titles <-
  findThoughts(selected_model, titles_retained)

plotQuote(model_titles$docs[[1]])

Topic 2 centres on selling workflows, promoting sales opportunities, and offering paid work.

plotQuote(model_titles$docs[[2]])

Topic 3 highlights people showcasing their builds and giving advice to others.

plotQuote(model_titles$docs[[3]])
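findThoughts surfaces documents that strongly express a topic. Here is a NumPy sketch of that ranking idea, assuming documents are ordered by their proportion of the topic in theta. The proportions and titles are hypothetical toy values, and this is not stm’s implementation.

```python
import numpy as np


def representative_titles(theta, titles, topic, n=3):
    """Return the n titles with the highest proportion of `topic` (0-based column of theta)."""
    theta = np.asarray(theta)
    # argsort on the negated column orders documents from highest to lowest proportion
    order = np.argsort(-theta[:, topic], kind="stable")
    return [titles[i] for i in order[:n]]


# Hypothetical topic proportions for four titles and three topics (rows sum to 1)
theta = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.25, 0.15, 0.60],
    [0.55, 0.35, 0.10],
])
titles = [
    "Best open-source agent SDK?",
    "Automating my sales workflow",
    "I built a coding agent",
    "Designing voice agents",
]
print(representative_titles(theta, titles, topic=0, n=2))
```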

We assign each post to a topic using its maximum posterior topic proportion, then examine topic presence by week. The pattern mirrors the embedding results: in Week 39, only 1 of 15 posts (about 7%) falls under Topic 3, against 25 to 33% in other weeks, while Topic 2 takes 40%, against 18 to 26% elsewhere.

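# Keep the rows that survived pre-processing so they line up with selected_model$theta
keep <- seq_len(nrow(subreddit_submissions))
if (length(processed_text$docs.removed) > 0) keep <- keep[-processed_text$docs.removed]
if (length(prepared_text$docs.removed) > 0) keep <- keep[-prepared_text$docs.removed]
subreddit_submissions <- subreddit_submissions[keep]
# Assign each post to its highest-proportion topic (which.max takes the first on ties)
subreddit_submissions[, topic := apply(selected_model$theta, 1, which.max)]
prop.table(table(subreddit_submissions$week_of_year, subreddit_submissions$topic), margin = 1)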
    
              1          2          3
  39 0.53333333 0.40000000 0.06666667
  40 0.49756098 0.25365854 0.24878049
  41 0.45882353 0.25882353 0.28235294
  42 0.44366197 0.26056338 0.29577465
  43 0.45562130 0.21893491 0.32544379
  44 0.50785340 0.18848168 0.30366492
  45 0.49315068 0.17808219 0.32876712
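The assignment and weekly shares above can be mirrored in pandas. This sketch uses a hypothetical five-post theta matrix: argmax picks each post’s largest topic proportion (the first one on ties, as with which.max), and crosstab with normalize="index" plays the role of prop.table(..., margin = 1).

```python
import numpy as np
import pandas as pd

# Hypothetical topic proportions (theta) for five posts; columns are Topics 1 to 3
theta = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
])
weeks = ["39", "39", "40", "40", "40"]

# Hard assignment: each post goes to its largest topic proportion (1-based, like R)
topic = theta.argmax(axis=1) + 1

# Share of each week's posts per topic; normalize="index" makes each row sum to 1
shares = pd.crosstab(
    pd.Series(weeks, name="week"),
    pd.Series(topic, name="topic"),
    normalize="index",
)
print(shares)
```

Hard assignment discards mixed membership: the last toy post is split 0.3/0.3/0.4 but counts entirely towards Topic 3. Averaging theta rows by week would keep that information.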

Overall, the topic distribution is fairly steady across most weeks, with Week 39 as the main outlier. Its heavier focus on Topic 2 and near-absence of Topic 3 support the earlier embedding results: that week’s content differs, most likely because it contains so few posts. Beyond that, movement is gradual, with Topic 2 easing from about 25% to 18% and Topic 3 edging up from about 25% to 33% between Weeks 40 and 45. The subreddit shows consistent patterns in what people ask for, what they build, and what they try to sell.

Large Language Models

DeepSeek was prompted to extract topics and trends based on submission titles and dates.
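The prompt itself isn’t published. Below is a sketch of how titles and dates might be packed into a single prompt; the wording and example titles are hypothetical, and sending it to DeepSeek is omitted because the post doesn’t say how the model was called.

```python
from datetime import date


def build_trend_prompt(posts):
    """Build one prompt asking an LLM for topics and trends.

    posts: iterable of (posted_date, title) pairs. The instruction wording is
    hypothetical, not the prompt used in the post.
    """
    lines = [
        "Below are Reddit submission titles with their posting dates.",
        "1. List the main topics, with a one-line description of each.",
        "2. Describe how the topics change over time.",
        "",
    ]
    # Sort chronologically so the model sees the posts in time order
    for posted_date, title in sorted(posts):
        lines.append(f"{posted_date.isoformat()} | {title}")
    return "\n".join(lines)


prompt = build_trend_prompt([
    (date(2025, 11, 10), "Best framework for voice agents?"),
    (date(2025, 10, 6), "How do I get started with AI agents?"),
])
print(prompt)
```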

Main Topics:

  • AI Agent Development – Building agents, frameworks, tools, and technical issues.

  • Business Automation – Workflows, sales, customer service, and basic automation.

  • Voice AI Agents – Voice platforms, use cases, and tooling.

  • Agent Evaluation & Reliability – Testing, failures, debugging, and deployment concerns.

  • Learning & Getting Started – Beginner questions and early entry into the space.

Given the short window, shifts are small. Early posts focus on learning; later posts move toward building agents and discussing frameworks. Voice AI appears more toward the end, while business automation stays steady throughout.

Taken together, the embeddings, topic modelling, and LLM summaries point to the same pattern: most posts revolve around building and selling agents, with only minor variation week to week. The subreddit is active, but the themes are narrow and largely driven by new tools entering the space.