Image-hashing algorithm to produce natural primary-key values that work well in PostgreSQL table indices?

I'm building out a set of cooperative data stores with images, and I'm starting to implement some simple/trivial content-based search and sort algorithms: SIFT, sparse color-histogram distance, basic SVD, etc.

I am currently using SHA-1 hashes of binary data as indices in PostgreSQL tables. These hashes are 'dumb' -- they're calculated by feeding the data in question* straight to Python's hashlib.sha1, and stored in nullable char columns that are exactly as long as the SHA-1's base64 representation.
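For reference, the current 'dumb' keying scheme could look roughly like the sketch below. The function name `content_key` is hypothetical, and the standard base64 alphabet is an assumption; the original setup may encode differently.

```python
import base64
import hashlib


def content_key(data: bytes) -> str:
    """Return the base64 text of the SHA-1 digest of `data` (hypothetical helper)."""
    digest = hashlib.sha1(data).digest()          # 20 raw bytes
    return base64.b64encode(digest).decode("ascii")


key = content_key(b"example image bytes")
# A 20-byte digest encodes to 28 base64 characters (including one '=' pad),
# so a char(28) column would hold it exactly.
print(len(key))
```

Such a key identifies content exactly but carries no notion of similarity: changing one byte of the input yields an unrelated key.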

It would be quite a panacea to implement a hash algorithm that would yield hashes suitable for indexing Postgres tables, but that also described the image in some way, à la pHash or Hamming distance. While pHash looks like a good candidate, it turns out to require the use of a proprietary storage engine and API... I'm looking for something less 'turn-key' that will play nice with my existing Python/PostgreSQL/Solr/Redis-based ecosystem.

It doesn't have to be the fastest -- it's more important for me to implement an algorithm (or algorithms) that can be hacked up a bit and stay somewhat cogent.

( * ) mostly this consists of untransformed or lightly transformed harvests from my images -- things like: JPEG/PNG/DNG image file content, ICC profile data structures, JSON dumps of EXIF/IPTC tagsets, and the like.


A quite interesting approach is described at http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/.

Basically, the image is scaled to 15x15 px, then an intensity is calculated for each pixel (0.299 * red + 0.587 * green + 0.114 * blue). This array of 225 values is stored in a PostgreSQL table column with a GIN/GiST index for fast search of similar images.
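A minimal sketch of that signature step, under stated assumptions: the name `intensity_signature` is hypothetical, the input is an already-decoded grid of (r, g, b) tuples, and the downscale is a plain box average standing in for whatever resize the article actually uses. In practice an imaging library would probably do the decoding and resizing.

```python
def intensity_signature(pixels, size=15):
    """Reduce a grid of (r, g, b) pixels to size*size intensity values (0..255).

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples.
    """
    height = len(pixels)
    width = len(pixels[0])
    signature = []
    for ty in range(size):
        # Source rows covered by this output cell (always at least one row).
        y0 = ty * height // size
        y1 = max(y0 + 1, (ty + 1) * height // size)
        for tx in range(size):
            x0 = tx * width // size
            x1 = max(x0 + 1, (tx + 1) * width // size)
            total = 0.0
            count = 0
            for y in range(y0, y1):
                for x in range(x0, x1):
                    r, g, b = pixels[y][x]
                    # Intensity weights as quoted in the answer above.
                    total += 0.299 * r + 0.587 * g + 0.114 * b
                    count += 1
            signature.append(round(total / count))
    return signature


# Usage: a 30x30 solid white image gives 15 * 15 = 225 values.
white = [[(255, 255, 255)] * 30 for _ in range(30)]
print(len(intensity_signature(white)))
```

The resulting list of integers is what would go into the indexed array column; how it is indexed and queried depends on the setup the linked article describes.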


What about a space-filling curve, for example a Hilbert curve or a Moore curve?
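As a rough illustration of the idea, the sketch below maps a 2-D grid coordinate to its position along a Hilbert curve, using the well-known iterative conversion. The function name `hilbert_index` and the idea of quantizing two image features into `x` and `y` are assumptions made for the example, not something the answer specifies.

```python
def hilbert_index(order, x, y):
    """Return the position of cell (x, y) along a Hilbert curve.

    The curve covers a 2**order by 2**order grid; x and y must lie in
    range(2**order).
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Usage: the four cells of a 2x2 grid, visited in curve order.
cells = sorted(((0, 0), (0, 1), (1, 0), (1, 1)),
               key=lambda c: hilbert_index(1, c[0], c[1]))
print(cells)
```

Cells that are consecutive along the curve are always adjacent in the grid, so an integer key like this keeps many (not all) nearby points close together in an ordinary sorted index.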
