Hacker News
Ask HN: State of Timeseries Databases in 2024
11 points by jaapz on Aug 13, 2024 | 12 comments
Our product stores a lot of time series (sensor) data. We currently store it in InfluxDB. We started about a decade ago, and back then InfluxDB was the simplest and best option out there. A big plus was their focus on open source.

However, since then InfluxData seems to have shifted its focus away from open source and increasingly toward enterprise and cloud (as far as I know, v3, which they are currently working on, still does not have an open source version).

We have had trouble with InfluxDB randomly stalling and refusing all writes and reads. Taking into account the painful migration to v2, and now probably another painful migration to v3, we are looking around for alternatives.

We have also grown since we started, so scalability now matters more to us as well.

One option we've been looking at is VictoriaMetrics, which seems quite good.

HN, which time series databases should I definitely look at when searching for an alternative to InfluxDB?



Directly related to your question:

Both VictoriaMetrics and QuestDB are compatible (ingestion-wise) with the InfluxDB Line Protocol, so migration would be smoother than with other databases. Just point the old ingestion script at the new server URL, and data will start flowing in.

Taking a broader view, the time series database landscape is split into three categories (sorry for adding complexity!):

1. Observability (metrics from your hardware): Prometheus, and other engines that work well with Prometheus, such as VictoriaMetrics. I think their query languages are tightly coupled to PromQL. InfluxDB 1.x and 2.x used to be in this camp and were the market-leading solution for observability before Prometheus came along and saw incredible adoption. Chronosphere, built on M3DB, is also a big name in this category.

2. General purpose: TimescaleDB is built on top of Postgres and is increasingly seen as a "super Postgres" that can also handle time series data, among other things (it is now focusing on vectors as well).

3. Specialized: kdb+, QuestDB, some OLAP databases that can also do time series (ClickHouse and Druid), and perhaps InfluxDB 3.0, even though it's not OSS yet. Here the focus is on performance, and data volumes tend to be larger. Industries and use cases with demanding data loads, such as financial services, often require such specialized databases. Some have their own proprietary language (kdb+ with Q), some are closed source (kdb+), and others are OSS and use SQL (QuestDB, ClickHouse, Druid). InfluxDB 3.0 also uses SQL (via the DataFusion query engine) but is not OSS yet.


I would use a slightly different classification:

  - general purpose - OLTP
  - general purpose - OLAP
  - specialized:
    - Internet analytics / clickstream data / adtech
    - financial timeseries
    - monitoring/observability
    - IoT/sensors 
    - IIoT/manufacturing/process


Thanks for your insight. It looks like you've come to basically the same conclusions I reached during my own research, which is reassuring.

For us, VictoriaMetrics seems to be the most logical replacement at the moment since, as you say, it also supports line protocol.


In my experience, ClickHouse generally beats the dedicated time series databases on speed, while also having the advantage of supporting full traditional SQL. I think the market just hasn't realized it yet because ClickHouse is still relatively new.


That's another promising alternative that I've been looking at.

From what I could find, however, deployment gets quite complex once you want sharding/replication, it doesn't handle deduplication of data well, and storage might be up to 3x larger than with InfluxDB.

Other than that, it was in my top 3 (VictoriaMetrics, ClickHouse, QuestDB).
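The deduplication concern above (and the related question of whether the same value can be written twice safely) comes down to what happens when the same (series, timestamp) point arrives more than once, e.g. after a retried batch. A minimal sketch of last-write-wins semantics, assuming each point carries a hypothetical `version` (such as an ingestion sequence number). This is a plain-Python illustration, not any database's implementation; ClickHouse's ReplacingMergeTree engine applies a similar rule, but only during background merges, so duplicates can still show up in queries until a merge happens unless you query with `FINAL`.

```python
from typing import Dict, Iterable, List, Tuple

# (series_key, timestamp_ns, value, version); `version` is a hypothetical
# write sequence number, e.g. an ingestion timestamp.
Point = Tuple[str, int, float, int]


def deduplicate(points: Iterable[Point]) -> List[Point]:
    """Collapse points sharing (series_key, timestamp) to the highest version.

    Ties go to the point seen later, i.e. last write wins.
    """
    latest: Dict[Tuple[str, int], Point] = {}
    for point in points:
        key = (point[0], point[1])
        current = latest.get(key)
        if current is None or point[3] >= current[3]:
            latest[key] = point
    # emit in (series, time) order, like a sorted on-disk layout
    return [latest[key] for key in sorted(latest)]


if __name__ == "__main__":
    writes = [
        ("t1", 100, 20.0, 1),
        ("t1", 100, 21.0, 2),  # re-sent point with a newer version
        ("t2", 100, 5.0, 1),
        ("t1", 50, 19.0, 1),
    ]
    for p in deduplicate(writes):
        print(p)
```

Whether a database does this eagerly on write, lazily on merge, or not at all is worth checking for each candidate when ingestion can retry.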


A lot also depends on what kind of time series data you have and how you're going to query it.

VictoriaMetrics is awesome. ClickHouse is another popular choice if you need a more flexible data model and would like SQL.
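One common query shape for sensor data is downsampling: averaging raw samples into fixed time buckets for dashboards or reports. The engines mentioned here express this in their own dialects (for example TimescaleDB's `time_bucket()`, ClickHouse's `toStartOfInterval()`, QuestDB's `SAMPLE BY`). A minimal plain-Python sketch of the semantics only, assuming samples are `(unix_seconds, value)` pairs; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bucket_avg(samples: Iterable[Tuple[int, float]], bucket_s: int) -> Dict[int, float]:
    """Average (timestamp_s, value) samples into fixed-width time buckets.

    Keys are bucket start times; buckets with no samples are omitted.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in samples:
        start = ts - ts % bucket_s  # floor the timestamp to its bucket start
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


if __name__ == "__main__":
    raw = [(0, 1.0), (30, 3.0), (60, 10.0), (119, 20.0), (120, 5.0)]
    print(bucket_avg(raw, 60))  # one-minute averages
```

If most of your queries look like this rather than like ad hoc joins, how efficiently an engine runs these aggregations over long ranges may matter more than its SQL coverage.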


Have you heard of GreptimeDB? I am its creator. It's open source and compatible with the InfluxDB Line Protocol, but it supports only SQL as the query language. It aims to support all types of time series data, including metrics, logs, and events.

InfluxDB is a great project, but as you said, the v3 migration is painful and still a work in progress. As for performance, we have already benchmarked GreptimeDB against InfluxDB v2; you can find the results on our blog.


Thanks, that looks really neat. I will definitely look into it.


I don’t have personal experience with it, but I think TimescaleDB is worth investigating. It is a Postgres extension. The founder is an academic and a really nice person who has built a really great community.


I have been evaluating TimescaleDB. It looks really nice, and being built on the solid foundation of Postgres is a real plus, but their clustering support was discontinued and they point to some vague "cloud native solutions" to use instead. Incremental backups are not possible without third-party tools. I was unable to determine whether I can write the same value multiple times without running into trouble. And from what I could find, its performance (for both writes and reads) is worse than that of virtually every dedicated time series solution out there.


TimescaleDB has worse performance and a worse on-disk compression ratio than ClickHouse [1]. TimescaleDB is good for cases with relatively small amounts of time series data (e.g., up to a few tens of billions of rows) when you already use PostgreSQL and don't want to introduce a new database for time series data.

[1] https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...
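To see where "a few tens of billions of rows" falls for a sensor workload, a back-of-the-envelope estimate is enough: rows ≈ sensors × samples per second × seconds retained. A minimal sketch, assuming one row per sensor per sample (a narrow table layout) and ignoring leap years; the sensor counts and rates below are illustrative, not from the thread.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (leap years ignored)


def estimated_rows(sensors: int, samples_per_second: float, years: float = 1.0) -> int:
    """Rows stored, assuming one row per sensor per sample."""
    return round(sensors * samples_per_second * SECONDS_PER_YEAR * years)


if __name__ == "__main__":
    # 1,000 sensors sampled once per second for one year
    print(f"{estimated_rows(1_000, 1.0):,}")
    # a single sensor sampled once per minute for one year
    print(f"{estimated_rows(1, 1 / 60):,}")
```

By this estimate, 1,000 sensors at 1 Hz already produce about 31.5 billion rows per year, so a moderately sized sensor deployment reaches that range within a year or two of retention.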


How about Bigtable (for apps) and/or BigQuery (analytics)?



