Ask HN: Storing Images in PostgreSQL vs. Object Storage for Large Datasets

by atif089 on 2/28/2024, 8:29 PM with 3 comments

Is it advisable to store images directly in PostgreSQL for a dataset of 100 million records, each with a 200KB image, or should I use object storage with references from the start? My primary and only use case involves creating multimodal embeddings for search and relevance purposes.

by throwaway38375 on 2/29/2024, 10:31 AM

If you are storing 100 million images at 200KB each, that comes out to 20TB!
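The arithmetic behind that figure can be checked with a short sketch. The constant and function names here are illustrative, and the second result assumes "200KB" means 200 KiB, which the question does not specify.

```python
# Back-of-the-envelope storage estimate for the numbers in the question.
NUM_IMAGES = 100_000_000   # 100 million records
IMAGE_SIZE_KB = 200        # per-image size from the question


def total_bytes(num_images, size_kb, bytes_per_kb=1000):
    """Raw payload size in bytes, ignoring filesystem or database overhead."""
    return num_images * size_kb * bytes_per_kb


# Decimal units: 1 KB = 1000 bytes, 1 TB = 1000**4 bytes.
decimal_tb = total_bytes(NUM_IMAGES, IMAGE_SIZE_KB) / 1000**4

# Binary units (assumption): 1 KiB = 1024 bytes, 1 TiB = 1024**4 bytes.
binary_tib = total_bytes(NUM_IMAGES, IMAGE_SIZE_KB, 1024) / 1024**4

print(f"{decimal_tb:.2f} TB in decimal units")
print(f"{binary_tib:.2f} TiB if 200KB means 200 KiB")
```

Either way this is raw payload only; redundancy, backups, and filesystem overhead come on top.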

I would calculate the costs of something like S3 versus buying five 4TB HDDs and running a network file server.

You're going to save a ton of money hosting this yourself. I would go with two powerful used desktop PCs: one as the DB server and the other as the file server.

Store the images on the file server and store the image's path in the database server.
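A minimal sketch of that layout, under loud assumptions: `sqlite3` stands in for the database server and a temporary directory stands in for the file server so the example runs anywhere. The table name, column names, and the hash-prefix directory sharding are all illustrative choices, not something described in this thread.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_image(conn, root, name, data):
    """Write the bytes under root and record only the relative path in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumption: shard by hash prefix so no directory holds millions of files.
    rel_path = Path(digest[:2]) / digest[2:4] / name
    full_path = root / rel_path
    full_path.parent.mkdir(parents=True, exist_ok=True)
    full_path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO images (path, size_bytes) VALUES (?, ?)",
        (rel_path.as_posix(), len(data)),
    )
    conn.commit()
    return cur.lastrowid


def load_image(conn, root, image_id):
    """Look up the stored path by id, then read the bytes from the file store."""
    row = conn.execute(
        "SELECT path FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return (root / row[0]).read_bytes()


# Stand-ins: in-memory SQLite for the DB server, a temp dir for the file server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    "id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE, size_bytes INTEGER NOT NULL)"
)
root = Path(tempfile.mkdtemp())

image_id = store_image(conn, root, "cat.jpg", b"fake image bytes")
restored = load_image(conn, root, image_id)
```

Storing a path relative to a configurable root, rather than an absolute path, keeps the rows valid if the file server's mount point changes.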

by speedgoose on 2/29/2024, 5:48 AM

You could run some quick tests using bytea, TOAST, or large objects.

But an object store may be more convenient overall.

When I did something similar, I stored the embeddings and the image UUID in a table, and the images in an object store with the same UUIDs as filenames. That made it simpler to upload the images and make them available through a CDN.
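Roughly, that scheme might look like the sketch below. Everything concrete in it is an assumption for illustration: a plain dict stands in for the object store, `sqlite3` stands in for PostgreSQL, the embedding is serialized as JSON, and `CDN_BASE_URL` is a hypothetical address.

```python
import json
import sqlite3
import uuid

# Stand-in for an object store bucket: maps object key -> bytes.
object_store = {}

# Hypothetical CDN base URL; a real deployment would use its own domain.
CDN_BASE_URL = "https://cdn.example.com/images"


def add_image(conn, data, embedding):
    """Upload the image under a fresh UUID and record UUID + embedding in the table."""
    image_id = str(uuid.uuid4())
    object_store[image_id] = data  # the object key is the UUID itself
    conn.execute(
        "INSERT INTO image_embeddings (image_id, embedding) VALUES (?, ?)",
        (image_id, json.dumps(embedding)),
    )
    conn.commit()
    return image_id


def cdn_url(image_id):
    """The public URL is derived from the UUID alone, so no path column is needed."""
    return f"{CDN_BASE_URL}/{image_id}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_embeddings (image_id TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
)

image_id = add_image(conn, b"fake image bytes", [0.1, 0.2, 0.3])
row = conn.execute(
    "SELECT embedding FROM image_embeddings WHERE image_id = ?", (image_id,)
).fetchone()
stored_embedding = json.loads(row[0])
print(cdn_url(image_id))
```

Because the UUID is the only link between the two systems, a search hit on the embeddings table resolves to an image URL without a second lookup.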

by reactor on 3/2/2024, 9:25 AM

Use something like SeaweedFS or MinIO.