I really like this idea. It’s basically applying similar principles as are used ...

peterhunt · on April 29, 2023

I built summarize.tech

Yes, it's a great idea, and I have a version that is basically a convolution over the transcript. It works much better than the current version: it can automatically create cohesive chapters and summaries of those chapters. However, it consumes an order of magnitude more ChatGPT API calls, making it uneconomical (for now!)

e1g · on April 29, 2023

I'm inspired that this is a side project, given everything you run. Kudos.

peterhunt · on April 30, 2023

Thanks for the kind words. I built it on a few cross-country plane rides and now I mostly just leave it alone. The infrastructure and tooling we have these days is so incredible.

moneywoes · on April 29, 2023

Can you please ELI5 the difference between the old and new versions?

peterhunt · on April 30, 2023

Sure. The old one just splits the transcript into 5-minute chunks and summarizes those. This sucks because each 5-minute chunk could contain multiple topics, or the same topic could be repeated across multiple chunks.

This dumb technique is actually pretty useful for a lot of people though, and has the advantages of being super easy to parallelize and requiring only 1 pass through the data.
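
For illustration, here is a minimal sketch of the chunk-and-summarize approach described above. It is not the actual summarize.tech code. The transcript shape (a list of `(start_seconds, text)` pairs), the function names, and the `summarize` callable (a stand-in for whatever model call you use) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed transcript shape: (start time in seconds, spoken text).
Segment = tuple[float, str]


def split_into_chunks(segments: list[Segment], chunk_seconds: float = 300.0) -> list[str]:
    """Group transcript segments into fixed-length time windows (5 minutes by default)."""
    buckets: dict[int, list[str]] = {}
    for start, text in segments:
        buckets.setdefault(int(start // chunk_seconds), []).append(text)
    # Windows that contain no speech are simply skipped.
    return [" ".join(buckets[i]) for i in sorted(buckets)]


def summarize_chunks(
    chunks: list[str],
    summarize: Callable[[str], str],  # hypothetical: wraps one model call per chunk
    max_workers: int = 8,
) -> list[str]:
    """Summarize every chunk independently; one pass, trivially parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so summaries line up with chunks.
        return list(pool.map(summarize, chunks))
```

Because each chunk is summarized in isolation, nothing here can notice that a topic spans two windows, which is exactly the weakness mentioned above.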

The more advanced technique does a pass through large chunks of the transcript to create lists of chapters in each chunk. Then it combines them into a single canonical chapter list with timestamps (it usually takes a few tries for the model to get it right). Then it does a second pass through the transcript, summarizing the content for each chapter.

The end result is a lot more useful, but is way slower and more expensive.
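
A rough sketch of the two-pass flow described above, again under assumptions rather than the real implementation: `propose`, `merge`, and `summarize` are hypothetical callables standing in for model calls, `Chapter` is an invented type, and the "few tries" are modeled as a simple retry loop that checks the merged timestamps are strictly increasing.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed transcript shape: (start time in seconds, spoken text).
Segment = tuple[float, str]


@dataclass
class Chapter:
    title: str
    start: float  # chapter start time in seconds


def is_valid(chapters: list[Chapter]) -> bool:
    """A usable chapter list is non-empty with strictly increasing start times."""
    starts = [c.start for c in chapters]
    return bool(chapters) and all(a < b for a, b in zip(starts, starts[1:]))


def build_chapter_list(
    big_chunks: list[list[Segment]],
    propose: Callable[[list[Segment]], list[Chapter]],  # hypothetical model call
    merge: Callable[[list[Chapter]], list[Chapter]],    # hypothetical model call
    max_tries: int = 3,
) -> list[Chapter]:
    """Pass 1: propose chapters per large chunk, then merge into one canonical list."""
    candidates = [ch for chunk in big_chunks for ch in propose(chunk)]
    for _ in range(max_tries):
        merged = merge(candidates)
        if is_valid(merged):
            return merged
    raise ValueError("merge did not produce a valid chapter list")


def summarize_chapters(
    segments: list[Segment],
    chapters: list[Chapter],
    summarize: Callable[[str], str],  # hypothetical model call
) -> list[tuple[str, str]]:
    """Pass 2: summarize the transcript text that falls inside each chapter."""
    results = []
    for i, chapter in enumerate(chapters):
        # A chapter ends where the next one starts; the last runs to the end.
        end = chapters[i + 1].start if i + 1 < len(chapters) else float("inf")
        text = " ".join(t for s, t in segments if chapter.start <= s < end)
        results.append((chapter.title, summarize(text)))
    return results
```

The extra cost is visible in the structure: one call per large chunk, one or more merge attempts, then one call per chapter, and the second pass cannot start until the first has finished.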

ralusek · on April 29, 2023

This is the standard practice already