I’ve worked out the “double free or corruption” error. I was assuming that the program would find an image that matched the target for every cell, and sometimes it didn’t, leaving the array index set to -1. Needless to say, accessing the -1st element of the image library (a vector) didn’t go so well when it tried to output the finished mosaic.
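For illustration, here is a minimal sketch of the kind of guard that avoids this. It is not the actual fix from my code: `LibraryImage` and `imageForCell` are made-up names, and it assumes C++17 for `std::optional`. The idea is just to check for the -1 sentinel (and any other out-of-range index) before touching the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry in the image library.
struct LibraryImage {
    std::string path;
};

// Returns the matched image for a cell, or std::nullopt when the index is
// the -1 "no match" sentinel or otherwise outside the library's bounds.
std::optional<LibraryImage> imageForCell(const std::vector<LibraryImage>& library,
                                         int matchIndex) {
    if (matchIndex < 0 ||
        static_cast<std::size_t>(matchIndex) >= library.size()) {
        return std::nullopt;  // caller decides what to draw for an unmatched cell
    }
    return library[static_cast<std::size_t>(matchIndex)];
}
```

The output step can then decide what to do with an empty result, for example leaving the cell blank or falling back to the closest image regardless of threshold.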
I decided to go for broke, and made the program recurse through my /home/pictures/ (just over 9000 images) and then run a battery of 100 tests, which took a good day to sort through. It tried 10 images at thresholds from 0-100 by increments of 10. Here you can view one large image with 24x18 cells (I’ve scaled it down and converted it to JPEG, so the cells are only about 8x6 and not really visible, but it should show what the algorithm’s doing). The two images are at thresholds of 50 and 100. 50 seems to be the sweet spot, though one particular image peaks at about 30:
I’m not entirely satisfied with this. It seems like some thresholds work better than others for different segments of the image. Part of me thinks that doing this unsupervised might require some sort of unsupervised image segmentation, and that’s almost an entirely different problem. On the other hand, parts of the image suggest that my similarity metric needs a little work. One really suspect area is ‘white’ skin. My metric seems to think that peach is exactly the same as pure white, which really shouldn’t happen for a ‘good’ metric.
There are a few ways I might improve this. One is by switching to a different color space like YCbCr, and treating each color channel differently. In YCbCr, the eye tends to care more about variation in Y, the luminance channel, than in Cb or Cr. That said, relying only on Y would create a wild mix of colors, even if the image might still be recognizable thanks to brightness. It might be interesting to test, though. It could also be interesting to do a greyscale test.
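To make the YCbCr idea concrete, here is a rough sketch of what a channel-weighted distance could look like. This is not what the program currently does. The conversion uses the standard full-range BT.601 (JPEG/JFIF) coefficients; the names `Rgb`, `YCbCr`, `toYCbCr` and `weightedDistance`, and the default weights of 1.0 for luma and 0.25 for chroma, are placeholders I made up and would need tuning.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

struct YCbCr {
    double y, cb, cr;
};

// Full-range BT.601 conversion, as used by JPEG/JFIF.
YCbCr toYCbCr(const Rgb& c) {
    const double r = c.r, g = c.g, b = c.b;
    return YCbCr{
        0.299 * r + 0.587 * g + 0.114 * b,
        128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b,
        128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b,
    };
}

// Euclidean distance with separate weights for luma and chroma.
// The default weights are illustrative guesses, not tuned values.
double weightedDistance(const YCbCr& a, const YCbCr& b,
                        double lumaWeight = 1.0, double chromaWeight = 0.25) {
    const double dy = a.y - b.y;
    const double dcb = a.cb - b.cb;
    const double dcr = a.cr - b.cr;
    return std::sqrt(lumaWeight * dy * dy +
                     chromaWeight * (dcb * dcb + dcr * dcr));
}
```

Setting `chromaWeight` to 0 gives the luminance-only test mentioned above. With any positive luma weight, a peach such as RGB (255, 218, 185) comes out at a nonzero distance from pure white, since the two differ in Y.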
The code is available on Github. There’s definitely room for improvement, but it can output some pretty nice results in the threshold = 30-60 range.