
The original image above has been modified by adding a watermark. Since this is a minimal modification, there is only a small Hamming distance of 2 between the perceptual hashes of the two images.
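The Hamming distance mentioned above counts the bit positions at which two equal-length hashes differ. Here is a minimal Python sketch of that calculation. The 64-bit values are made up for illustration and are not real perceptual hashes, and `hamming_distance` is a hypothetical helper name.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions at which two equal-length hashes differ."""
    # XOR leaves a 1 wherever the two hashes disagree; count those 1s.
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical 64-bit hash of the original image (value is illustrative only).
original_hash = 0xF0E1D2C3B4A59687

# Flip two bits to mimic the effect of a small edit such as a watermark.
watermarked_hash = original_hash ^ 0b101

print(hamming_distance(original_hash, watermarked_hash))  # 2
```

A small distance suggests the two images are near-duplicates, while unrelated images would typically differ in many more bits.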

To help moderate and deduplicate user media, one tool we use is perceptual hashing, backed by an internally built reverse image search system. This blog post describes the design and process we went through to build such a system at Canva's scale.

Hashing is one simple way of implementing a reverse image search. A hash function is an algorithm that maps data to fixed-size values called hash keys, which are typically represented as strings and are, for practical purposes, unique to their input. Common hash algorithms include MD5 and the SHA family. For instance, you could implement a simple reverse image search in an image library using the image file hash key as a primary key in a database. The ability to map any file to a unique hash key is useful for matching duplicate content, but it is not very viable for images, because two files that look identical can still differ at the byte level and therefore produce different hash keys. In the example below, we have two visually duplicated images (IMAGE_1 and IMAGE_2) whose SHA-512 hash keys are the primary keys in a DynamoDB table, and their image IDs are the secondary keys.
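To illustrate the exact-match lookup described above and where it falls short, here is a minimal Python sketch. It assumes a plain dictionary as a stand-in for the database table and short byte strings as stand-ins for image files. `file_hash_key` is a hypothetical helper, not part of any Canva system.

```python
import hashlib


def file_hash_key(data: bytes) -> str:
    """Return the SHA-512 digest of the raw file bytes as a hex string."""
    return hashlib.sha512(data).hexdigest()


# Stand-ins for two files that would look the same but differ by one byte,
# for example because of a metadata or re-encoding difference.
image_1 = b"fake image bytes"
image_2 = image_1 + b"\x00"

# In-memory stand-in for a table keyed by file hash (assumption: not DynamoDB).
table = {file_hash_key(image_1): "IMAGE_1"}

print(table.get(file_hash_key(image_1)))  # IMAGE_1
print(table.get(file_hash_key(image_2)))  # None
```

The lookup only succeeds for a byte-identical file. Any change to the bytes yields an unrelated key, which is why exact file hashes suit duplicate files better than visually duplicated images.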


Canva hosts a huge, ever-growing collection of user media that needs to be stored and managed properly. The diversity of this media poses challenges around moderation and the reduction of unnecessary duplicate content.
