
That would assume the underlying data does not change. If you use BigQuery with ingest running 24/7, you often don't want to be served from cache. Besides, filters and slicers can be modified on a per-user basis, which would trigger recalculations. Again, I'm not sure a cache would handle this.
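A shared result cache only helps if the cache key covers both the query and every user-specific filter value. A minimal sketch (a plain dict as the cache, all names hypothetical) of why per-user slicers mostly produce cache misses:

```python
# Hypothetical query-result cache keyed by (query, filter values).
# Not any real BI tool's implementation.
from typing import Any, Callable

_cache: dict[tuple, Any] = {}

def cache_key(sql: str, filters: dict) -> tuple:
    # Two users running the "same" dashboard query with different
    # slicer values produce different keys, hence separate cache entries.
    return (sql, tuple(sorted(filters.items())))

def get_or_compute(sql: str, filters: dict, compute: Callable) -> Any:
    key = cache_key(sql, filters)
    if key not in _cache:
        _cache[key] = compute(sql, filters)  # cache miss: recompute
    return _cache[key]

calls = []
def fake_run(sql, filters):
    calls.append((sql, filters))
    return len(calls)

get_or_compute("SELECT ...", {"region": "EU"}, fake_run)  # miss
get_or_compute("SELECT ...", {"region": "US"}, fake_run)  # different slicer: miss
get_or_compute("SELECT ...", {"region": "EU"}, fake_run)  # hit, no recompute
```

With a 24/7 ingest stream on top of this, every entry would also need invalidation whenever the underlying table changes, which is the other half of the problem.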


It's important to remember how BigQuery actually stores its data: in a columnar datastore called Capacitor[0]. While you write data as rows, BigQuery does a certain amount of preprocessing to turn it into column-oriented data. Writing a single new row is non-trivial, so it tends to write data in batches (though there is some other magic going on here that I probably don't understand).

[0] https://cloud.google.com/blog/products/gcp/inside-capacitor-...
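Because single-row writes into a columnar store are expensive, clients typically buffer rows and flush them in batches. A library-free sketch of that pattern (the batch size and flush interface are assumptions for illustration, not BigQuery's actual internals):

```python
# Hypothetical row buffer that flushes in batches, mimicking the
# batch-oriented writes described above. Not BigQuery's real internals.
class BatchWriter:
    def __init__(self, flush_fn, batch_size: int = 500):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer: list[dict] = []

    def write_row(self, row: dict) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)  # one columnar rewrite per batch
            self.buffer = []

batches: list[list[dict]] = []
w = BatchWriter(batches.append, batch_size=3)
for i in range(7):
    w.write_row({"id": i})
w.flush()  # force out the remainder
# batches now holds flushes of 3, 3, and 1 rows
```

The trade-off is the usual one: larger batches amortize the columnar rewrite cost, at the price of rows sitting in the buffer longer before they are queryable.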


BigQuery is usually intended for data warehouse use cases, where missing the most recent 23 hours of data is less of a big deal. It's intended to feed things like exec dashboards, not production systems.


I believe the materialized views will refresh when the underlying data changes, so if you are streaming or inserting frequently, the costs could add up.


BigQuery isn't a great fit for data that needs to be queried repeatedly and updates that frequently, I don't think. With streaming inserts, you should generally only be inserting into the latest date/time-based partition, so that queries over all but the most recent partition can still be served from the query cache.
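One way to act on that: split a dashboard into a query over closed partitions (cacheable, since those partitions no longer change) plus a small live query over today's partition. A sketch that just builds the two SQL strings, assuming an ingestion-time partitioned table (which exposes the `_PARTITIONDATE` pseudocolumn); the table name and aggregate are made up:

```python
from datetime import date

def split_queries(table: str, today: date) -> tuple[str, str]:
    """Return (historical, live) queries for a date-partitioned table.

    The historical query touches only closed partitions, so its result
    can be served from cache; only the live query over today's
    partition needs re-running as streaming inserts arrive.
    """
    cutoff = today.isoformat()
    historical = (
        f"SELECT COUNT(*) AS n FROM `{table}` "
        f"WHERE _PARTITIONDATE < '{cutoff}'"
    )
    live = (
        f"SELECT COUNT(*) AS n FROM `{table}` "
        f"WHERE _PARTITIONDATE = '{cutoff}'"
    )
    return historical, live

hist, live = split_queries("proj.ds.events", date(2020, 1, 15))
```

The client would combine the two results; only the second query is billed on every refresh.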



