I am learning Spark and am interested in real-world applications of Common Crawl data. If you had huge compute power and the Common Crawl dataset, what would you do with them?