The Hacker News archive is hosted in ClickHouse as a publicly accessible data lake. It is available without sign-up and is updated in real time. Example:
# Download ClickHouse:
curl https://clickhouse.com/ | sh
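# Run clickhouse-local (no server needed, queries run in-process):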
./clickhouse local
# Attach the table:
CREATE TABLE hackernews_history UUID '66491946-56e3-4790-a112-d2dc3963e68a'
(
    update_time DateTime DEFAULT now(),
    id UInt32,
    deleted UInt8,
    type Enum8('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
    by LowCardinality(String),
    time DateTime,
    text String,
    dead UInt8,
    parent UInt32,
    poll UInt32,
    kids Array(UInt32),
    url String,
    score Int32,
    title String,
    parts Array(UInt32),
    descendants Int32
)
-- ReplacingMergeTree deduplicates by the ORDER BY key (id), keeping the row with the latest update_time
ENGINE = ReplacingMergeTree(update_time)
ORDER BY id
SETTINGS refresh_parts_interval = 60,
    -- attach a read-only, S3-compatible disk pointing at the public bucket
    disk = disk(
        readonly = true,
        type = 's3_plain_rewritable',
        endpoint = 'https://clicklake-test-2.s3.eu-central-1.amazonaws.com/',
        use_environment_credentials = false);
# Run queries:
SELECT time, decodeHTMLComponent(extractTextFromHTML(text)) AS t
FROM hackernews_history ORDER BY time DESC LIMIT 10 \G
# Download everything as Parquet/JSON/CSV...
SELECT * FROM hackernews_history INTO OUTFILE 'dump.parquet'
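As a further usage sketch of my own (it only uses columns from the CREATE TABLE above; the numbers are whatever the live data returns), aggregations run directly against the bucket:
# Example: high-scoring stories per year
SELECT toYear(time) AS year, count() AS stories
FROM hackernews_history
WHERE type = 'story' AND score >= 100
GROUP BY year
ORDER BY year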
Honestly, I don't understand how Cloudflare thinks this is a higher priority than versioning, replication of buckets, or even geo-distribution of objects.
It's a strange direction. I thought Cloudflare viewed R2 mostly as competition for S3 when used as CDN backing storage (a very natural place to compete). For which, btw, it is great -- I use it seamlessly for ActiveStorage, and not only is it way cheaper, but configuring it is about 100x simpler than the S3/CloudFront/random ACLs/signed cookies stuff.
I vaguely remember reading comments here saying you can get rate-limited on R2 without warning if egress is too high. Was that true, and is it still true? If so, what is the limit?
I tried looking for that thread again and only found the exact opposite comment, from the Cloudflare founder:
> Not abuse. Thanks for being a customer. Bandwidth at scale is effectively free.[0]
I distinctly remember such a thread, though.
Edit: I did find these, but neither is what I remember:
https://news.ycombinator.com/item?id=42263554
https://news.ycombinator.com/item?id=33337183
[0] https://news.ycombinator.com/item?id=38124676
This post also introduces Iceberg pretty nicely. Details on Class A vs Class B operations are here[0].
What kind of latency/throughput are people getting from R2? Does it benefit from parallelism in the same way s3 does?
[0]: https://developers.cloudflare.com/r2/pricing/#class-a-operat...
> What kind of latency/throughput are people getting from R2?
Not sure about now, but upload speeds were very inconsistent when we tested it a year or so ago.
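One way to get a rough feel for R2 read throughput yourself (a sketch, not from the post; the account ID, bucket, credentials, and object layout below are placeholders) is ClickHouse's s3 table function, since R2 speaks the S3 API and a glob of objects is read in parallel:
SELECT count()
FROM s3('https://<account-id>.r2.cloudflarestorage.com/<bucket>/data/*.parquet',
        '<access_key_id>', '<secret_access_key>', 'Parquet')
SETTINGS max_threads = 16  -- rerun with different thread counts and compare wall-clock time
Timing the same query at 1, 8, and 32 threads gives a crude answer to the parallelism question.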
Woo this is cool! I hope they start hosting public datasets like Google does for BigQuery, such as (wink wink) Hacker News archive.
Also available on the public Playground: https://play.clickhouse.com/
Nice! And the CREATE TABLE in that example is exactly why I'd love to have it with a catalog ;-)
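To sketch what a catalog would buy you (my assumption of the workflow, not something from the thread; the bucket path and credentials are made up): if the data is published as an Iceberg table, ClickHouse can take the schema from the Iceberg metadata via the icebergS3 table function, so the hand-written column list above isn't needed; a REST catalog like R2 Data Catalog goes one step further and resolves the table's location by name.
-- schema is read from the Iceberg table metadata, no explicit column list
SELECT count()
FROM icebergS3('https://<bucket>.s3.<region>.amazonaws.com/warehouse/hackernews/',
               '<access_key_id>', '<secret_access_key>')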