If the images are all black and white, why not train in a two-tone or greyscale colour space?
What neat results! Did you exclude the plants and shellfish? How many images were in the training set?
By my calculations that training would cost you ~$700 on AWS. Is that right?
4 days * 24 h/day * $7.20/h = $691.20
$7.20/h is the price for an 8-GPU instance: https://aws.amazon.com/ec2/instance-types/p2/
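For anyone who wants to plug in their own numbers, here is a rough sketch of the same estimate. The 4-day duration and the $7.20/h rate are just the figures from my comment, not confirmed by the author, and `training_cost` is a made-up helper name.

```python
# Back-of-the-envelope training cost estimate.
# Both inputs are assumptions taken from the comment above, not measured values.
DAYS = 4                    # assumed training duration in days
RATE_USD_PER_HOUR = 7.20    # quoted hourly price for the 8-GPU p2 instance


def training_cost(days, rate_per_hour, hours_per_day=24):
    """Return the total cost in USD for running one instance continuously."""
    return days * hours_per_day * rate_per_hour


cost = training_cost(DAYS, RATE_USD_PER_HOUR)
print(f"${cost:.2f}")
```

Spot pricing or a different instance type would change the rate, so treat the result as an upper-ish ballpark rather than a bill.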
Beautiful!
Silhouettes don't have enough information for a neural network to really learn the structural relations. You can do "generative zoology" in full color and get very convincing results. Here's a GAN trained on beetle illustrations: https://www.cunicode.com/works/confusing-coleopterists