Unrelated, but wondering if anyone here could recommend a Darktable-ish web-based photo organization app, less focused on editing but supporting tagging, starring, etc.?
Twitter used to have a Firehose API, too. Over time they closed it off and made it available only to large customers like Google Search that have real-time indexing needs.
Twitter had a really outstanding search and streaming API, but after Musk bought it they put it behind a $60k/year paywall. You can see a corresponding and abrupt falloff in academic network research papers, with newer ones that revolve around Twitter largely limited to cannibalizing old datasets.
With luck, bsky keeps growing and researchers invest effort in studying a more open-by-design platform.
The documents saved by Stapler are also plain text (JSON). But because the app is trying to be a model citizen in the current model of macOS security/annoyance, it contains the file bookmarks that macOS gives us (which are binary blobs encoded as Base64 text, so incomprehensible to mere humans) rather than the human-readable file paths you might expect. Kind of annoying, but there we go!
Somewhat related: can anyone recommend a simple solution for sharing each node’s ephemeral disk (“emptyDir”) across the cluster? Speed matters more than durability; this is just for a temporary batch-job cluster. Ideally I could pool the nodes' disks and expose one big volume to all pods, either striped across nodes (RAID-0 style) or simply concatenated (JBOD style).
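To make the striping-versus-JBOD distinction concrete, here is a minimal Python sketch of how a pooled volume could map a logical byte offset onto one node's local disk. The function names, the five 200 GB nodes, and the 1 MiB stripe size are illustrative assumptions; this is only the address arithmetic, not a distributed filesystem or a Kubernetes storage driver.

```python
def locate_jbod(offset: int, node_sizes: list[int]) -> tuple[int, int]:
    """JBOD-style concatenation: fill node 0, then node 1, and so on.

    Returns (node_index, offset_on_that_node) for a logical byte offset.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for node, size in enumerate(node_sizes):
        if offset < size:
            return node, offset
        offset -= size  # skip past this node's whole capacity
    raise ValueError("offset beyond pooled capacity")


def locate_striped(offset: int, n_nodes: int, stripe: int) -> tuple[int, int]:
    """RAID-0-style striping: fixed-size chunks dealt round-robin across nodes.

    Assumes equal-sized nodes; returns (node_index, offset_on_that_node).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk, within = divmod(offset, stripe)  # which chunk, and where inside it
    node = chunk % n_nodes                  # round-robin placement
    local_chunk = chunk // n_nodes          # how many chunks this node already holds
    return node, local_chunk * stripe + within


GB = 10**9
MiB = 1 << 20
sizes = [200 * GB] * 5  # hypothetical: five nodes with 200 GB of scratch each
```

JBOD-style concatenation fills one disk before moving on, so a large sequential write mostly lands on a single node. Striping spreads consecutive chunks across all nodes, so large reads and writes hit every disk in parallel, at the cost that losing any one node can corrupt data throughout the volume. Either way, five 200 GB disks pool to 10**12 bytes (1 TB).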
I guess your biggest issue may be the multiple-writer problem, but you'd have the same issue on a local disk: the moment multiple writers need to update the same files, you'll run into trouble.
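To illustrate the multiple-writer problem on a single host, here is a minimal Python sketch in which several writers do a read-modify-write on a shared counter file, serialized with an advisory `fcntl.flock` lock. The counter file and helper names are hypothetical, and this is Unix-only; whether such locks are honored across nodes depends entirely on the shared filesystem you end up using, so treat it as a local illustration rather than a cluster solution.

```python
import fcntl
import os
import tempfile
import threading


def locked_increment(path: str) -> None:
    """Read-modify-write a counter file while holding an exclusive advisory lock."""
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder of the lock
        try:
            value = int(f.read() or "0")
            f.seek(0)
            f.write(str(value + 1))
            f.truncate()
            f.flush()  # make sure the new value is written before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def run_writers(n_writers: int, increments_each: int) -> int:
    """Start n_writers threads that each increment the shared counter; return its final value."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write("0")

        def worker() -> None:
            for _ in range(increments_each):
                locked_increment(path)  # each call opens its own file description

        threads = [threading.Thread(target=worker) for _ in range(n_writers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path) as f:
            return int(f.read())
    finally:
        os.remove(path)
```

Without the lock, two writers can read the same value and both write back value + 1, silently losing an update; with it, `run_writers(4, 50)` should end at 200.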
Have you thought about using TCP sockets between the apps to share state, or something like a Redis database?
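The TCP option can be sketched as a tiny shared key-value server that every pod talks to instead of a shared disk. This is a minimal Python illustration using the standard `socketserver` module; the line protocol and its `SET`/`GET` commands are made up for this example (loosely echoing Redis's command names), and there is no persistence, authentication, or replication. It is not Redis, just the shape of the idea.

```python
import socket
import socketserver
import threading


class KVHandler(socketserver.StreamRequestHandler):
    """Hypothetical line protocol: 'SET key value' or 'GET key', one command per line."""

    def handle(self) -> None:
        for raw in self.rfile:  # one iteration per line until the client disconnects
            parts = raw.decode().strip().split(" ", 2)
            if parts[0] == "SET" and len(parts) == 3:
                with self.server.lock:
                    self.server.store[parts[1]] = parts[2]
                reply = "OK"
            elif parts[0] == "GET" and len(parts) == 2:
                with self.server.lock:
                    reply = self.server.store.get(parts[1], "")
            else:
                reply = "ERR"
            self.wfile.write((reply + "\n").encode())


class KVServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # don't block interpreter exit on open connections

    def __init__(self, addr: tuple[str, int]) -> None:
        super().__init__(addr, KVHandler)
        self.store: dict[str, str] = {}  # shared state, guarded by self.lock
        self.lock = threading.Lock()


def start_in_background(host: str = "127.0.0.1", port: int = 0) -> KVServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = KVServer((host, port))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(addr: tuple[str, int], line: str) -> str:
    """Send one command and return the server's one-line reply."""
    with socket.create_connection(addr) as s, s.makefile("r") as reader:
        s.sendall((line + "\n").encode())
        return reader.readline().rstrip("\n")
```

A pod would call something like `request(addr, "SET job42 done")` and another would read it back with `request(addr, "GET job42")`. A real deployment would more likely just run Redis than maintain a homegrown protocol.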
...pods could somehow mount node{1..5} as a volume, which would give 5 * 200GB = ~1TB of space to write to... multiple pods could mount it and read the same data.
One thing I'm not sure of is how much of a larger piece of text should go into a single embedding. I assume it's a trade-off between context and recall: one word doesn't carry much meaning on its own, and a whole document is too much to represent with a single fixed-size vector of numbers. Is there a sweet spot (e.g., splitting by sentence), or am I missing something here?
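As a rough illustration of sentence-based splitting, here is a minimal Python sketch that groups sentences into chunks under a word budget and carries the last sentence(s) of each chunk into the next, so context isn't cut off abruptly at a boundary. The regex sentence splitter is naive, and the `max_words` and `overlap` defaults are arbitrary assumptions, not a recommended sweet spot; in practice you would likely budget in the embedding model's tokens rather than words.

```python
import re


def sentence_chunks(text: str, max_words: int = 100, overlap: int = 1) -> list[str]:
    """Group sentences into chunks of roughly max_words words.

    The last `overlap` sentences of each chunk are repeated at the start of the
    next one. A single sentence longer than max_words becomes its own chunk.
    """
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sent in sentences:
        n = len(sent.split())
        if current and sum(len(s.split()) for s in current) + n > max_words:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward; note current[-0:] would copy everything.
            current = current[-overlap:] if overlap else []
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, `sentence_chunks("A b c. D e f. G h i. J k l.", max_words=6, overlap=1)` yields three chunks that each share one sentence with their neighbor. Smaller chunks make each vector more specific, while more overlap keeps boundary context at the cost of storing more vectors.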