The one all the cool kids use: https://commoncrawl.org/
Here is a good collection with ready-to-use instructions and downloads: https://clickhouse.com/docs/en/getting-started/example-datas...
Some governments run "open data" initiatives, like this one from the City of Toronto: https://open.toronto.ca/catalogue/
Amazon Open Data Registry: https://registry.opendata.aws/