Frequently Asked Questions (FAQ)

About the Service & Technology

Q: What is the main purpose of this service?
A: While this tool excels at creating concise summaries of technical lectures and research, its primary goal is to build a personal map of content consumption. By visualizing every video I watch, I can reclaim control from recommendation algorithms. I want to see the "local neighborhood" of topics for every video to discover related content organically.
Map Status (Oct 2025): The current map is a manually generated UMAP plot of the first 4,000 entries. Now that the database has reached 8,000 entries, a regeneration is planned. You can view the example here: rocketrecap.com/exports/index.html
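For the curious, here is a minimal sketch of how such a map can be generated, assuming each summary has already been converted into an embedding vector; the file name and UMAP parameters are illustrative, not the exact pipeline used here:
    # Project per-video embedding vectors to 2-D with UMAP and plot the result.
    # Assumes "summary_embeddings.npy" holds one embedding row per video
    # (hypothetical file; the real pipeline may differ).
    import numpy as np
    import umap                      # pip install umap-learn
    import matplotlib.pyplot as plt

    embeddings = np.load("summary_embeddings.npy")    # shape: (n_videos, dim)
    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, metric="cosine")
    coords = reducer.fit_transform(embeddings)        # shape: (n_videos, 2)

    plt.figure(figsize=(10, 10))
    plt.scatter(coords[:, 0], coords[:, 1], s=3, alpha=0.6)
    plt.title("Local neighborhoods of watched videos")
    plt.savefig("video_map.png", dpi=200)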
Q: Why do I get the error: "Transcript is too short. Probably I couldn't download it"?
A: This error occurs when the system fails to retrieve a usable text transcript for the video. This usually happens for one of three reasons:
  • Technical Blocks: YouTube occasionally blocks automated download attempts. These issues often resolve themselves within a few days.
  • No Captions Available: The video may not have closed captions or an auto-generated transcript enabled. A quick way to check which caption tracks a video exposes is sketched after this answer. (I am working on integrating Whisper to handle these cases via audio-to-text, but it is not yet live.)
  • Language Compatibility: The video might be in a language not currently prioritized by the automated download function (e.g., Hindi).
The Workaround: If the video does have a transcript on YouTube, you can manually copy and paste the text into RocketRecap. This feature works best on desktop browsers.
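For reference, here is a minimal sketch of how to check which caption tracks a video exposes, using yt-dlp's Python interface (the options shown are a plausible configuration, not the site's exact setup):
    # List which caption tracks a YouTube video exposes (pip install yt-dlp).
    import yt_dlp

    def list_caption_languages(url: str) -> dict:
        with yt_dlp.YoutubeDL({"skip_download": True, "quiet": True}) as ydl:
            info = ydl.extract_info(url, download=False)
        return {
            "manual": sorted(info.get("subtitles", {})),         # uploader-provided captions
            "auto": sorted(info.get("automatic_captions", {})),  # auto-generated captions
        }

    print(list_caption_languages("https://www.youtube.com/watch?v=VIDEO_ID"))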
Q: Who pays for this?
A: I built this primarily for my own use. To keep it free for others, I use Google's free developer tier, which provides a shared daily quota (currently 20 requests for Flash models) that is split among all visitors to the site.
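For illustration, here is a minimal sketch of how such a shared per-day limit could be enforced server-side; the in-memory counter and function name are assumptions, not the site's actual implementation:
    # Toy shared daily quota: allow at most DAILY_LIMIT requests per calendar day.
    from datetime import date

    DAILY_LIMIT = 20
    _usage = {"day": date.today(), "count": 0}

    def try_consume_request() -> bool:
        """Return True if a request is allowed today, False once the quota is spent."""
        today = date.today()
        if _usage["day"] != today:      # new day: reset the counter
            _usage["day"], _usage["count"] = today, 0
        if _usage["count"] >= DAILY_LIMIT:
            return False
        _usage["count"] += 1
        return True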
Q: How does the AI "watch" the video from just a link?
A: For YouTube links, the system uses yt-dlp to fetch the captions/transcript. This text is then processed by a Large Language Model (LLM). For specific cases (like some Chinese videos), I manually process the audio using whisper.cpp to generate a transcript, though this isn't fully automated on the site yet.
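As a rough illustration of that pipeline, the sketch below reads a transcript saved by yt-dlp and asks a Gemini model to summarize it via the google-generativeai package; the file name, model name, and prompt are assumptions, and the production pipeline is more involved:
    # Transcript -> LLM summary sketch. Assumes the transcript was saved first, e.g.:
    #   yt-dlp --skip-download --write-auto-subs --sub-langs en --convert-subs srt URL
    # The file name, model name, and prompt below are illustrative.
    import google.generativeai as genai   # pip install google-generativeai

    genai.configure(api_key="YOUR_API_KEY")
    with open("video.en.srt", encoding="utf-8") as f:
        transcript = f.read()

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        "Write a detailed, self-contained summary of this lecture transcript:\n\n" + transcript
    )
    print(response.text)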
Q: What are some advanced ways to use this tool?
A: I often use it for specialized tasks:
  • Technical Glossaries: Asking the AI to "provide a glossary of medical/technical terms" from a lecture.
  • Informed Summaries: Manually pasting scientific paper links or YouTube comments into the prompt to give the AI more context.
  • Research Clusters: Using LLM embeddings to group related videos, a technique with great potential in fields like immune system research or genomic sequencing (a minimal sketch follows this list).
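Here is a minimal sketch of that last idea, grouping videos by embedding similarity; the embedding model, cluster count, and input summaries are illustrative assumptions:
    # Embed each summary with a Gemini embedding model and group them with k-means.
    import numpy as np
    import google.generativeai as genai   # pip install google-generativeai
    from sklearn.cluster import KMeans    # pip install scikit-learn

    genai.configure(api_key="YOUR_API_KEY")
    summaries = ["summary of video 1 ...", "summary of video 2 ...", "summary of video 3 ..."]

    vectors = np.array([
        genai.embed_content(model="models/text-embedding-004", content=text)["embedding"]
        for text in summaries
    ])
    labels = KMeans(n_clusters=2, n_init="auto").fit_predict(vectors)
    for text, label in zip(summaries, labels):
        print(label, text[:60])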
Q: Why doesn't YouTube provide these summaries itself?
A: YouTube does offer basic summaries for YouTube Premium subscribers. However, those are typically brief "teasers" designed to get you to watch the video. My goal is to create self-contained summaries that allow you to absorb the core information without necessarily watching the full video—something YouTube is naturally disincentivized to provide for ad-supported content.

Performance and Cost

Q: How much energy does generating a summary consume?
A: While exact metrics are difficult to track, the cost of API tokens is a good proxy for energy use. Currently, generating a text summary is significantly cheaper (and likely more energy-efficient) than streaming high-definition video for an hour. I am looking into more precise ways to analyze this "digital footprint."
Q: What are the costs for different AI models?
A: Approximate pricing per million tokens (a worked per-summary cost example follows the list):
  • Flash Lite: $0.10 (Input) / $0.40 (Output)
  • Flash: $0.30 (Input) / $2.50 (Output)
  • Pro: $1.25 (Input) / $10.00 (Output)
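As a back-of-the-envelope example (the token counts are illustrative assumptions), a roughly two-hour lecture transcript of ~30,000 tokens summarized into ~2,000 output tokens on the Flash tier costs about 1.4 cents:
    # Illustrative per-summary cost on the Flash tier at the prices listed above.
    input_tokens, output_tokens = 30_000, 2_000
    cost = input_tokens / 1_000_000 * 0.30 + output_tokens / 1_000_000 * 2.50
    print(f"~${cost:.3f} per summary")   # ~$0.014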
Q: What are the current usage limits?
A: The service relies on a free developer quota.
Current Status (2025-12-06): Google has significantly reduced the free tier. There are currently 0 "Pro" requests available and only 20 Flash requests per day for the entire site.

Limitations and Future Work

Q: Why are some summaries factually incorrect?
AI "hallucinations" or errors usually stem from:
  1. Knowledge Cutoffs: The model might not be aware of very recent events (e.g., referring to a current world leader as "hypothetical").
  2. Transcript Limitations: Current transcripts lack "speaker diarization" (knowing who is speaking), which can confuse the AI during multi-person interviews or debates.
Q: What is on the roadmap?
  • Live Grounding: Integrating Google Search to verify facts in real-time.
  • Auto-Translation: Support for Spanish and Chinese videos (including platforms like Bilibili).
  • Interactive Elements: Highlighting insightful YouTube comments or corrections made by the community.
Q: Is there a plan to commercialize?
A: No. This is a personal project used to explore the capabilities of LLMs (specifically the Gemini family) and to help me organize my own digital learning.