RE: Some limitations that I probably should have mentioned.

in #photomag, 6 years ago

I'm currently working on a TinEye equivalent for Steemit that searches for duplicate images and, in the future, similar images too. Would you be interested? I'm pretty new to programming (2-3 years as a hobby).

Will it present better results than Google image search?
What exactly are you trying to achieve?

Well, the plan is to use it to check whether a picture has already been uploaded or used on Steemit.

It could be more exact than Google in certain cases because the approach is different. I calculate an image hash (pHash in my case), like TinEye does. Google's approach is not public, but it probably uses machine learning and pattern matching. From my own experience the algorithm I'm using is very fast: approx. 1-2 s on a 1 GHz single-core CPU for one hash, plus approx. 0.2 s for listing similar hashes from a database containing approx. 0.5 million hashes (the database is subject to change and is still missing many pictures from Steemit). The downside is that I can only find identical and slightly edited pictures, whereas Google is very good at finding similar pictures thanks to machine learning. Note that I'm doing this project just for fun, to learn database handling, image processing and multiprocessing.

It sounds like a great project. I didn't mean to criticize you, just wondered about the details. Even if it is not growing into something big, it will still be a great project to work on and learn from.

Finding similar images would be key though, as people tend to adjust 'stolen' images a little to make them look their own.

Well, it works to a certain extent. At the moment the problem is the database structure: the hashes are saved in SQL, where I can only check whether they are exactly the same. To look for similar images I would need to compute the Hamming distance between the query hash and every other hash in the database, which is very slow. Therefore I need to experiment with B-trees.
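To make the cost concrete, here is a minimal sketch of a Hamming distance check plus one possible index for it. Note that it uses a BK-tree (a metric tree often used for Hamming-distance lookups), not the B-tree mentioned above, and that the `hamming` function and `BKTree` class are hypothetical helpers written for illustration, not part of any library or of the project's actual code. Hashes are assumed to be stored as plain integers.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer-encoded hashes differ."""
    return bin(a ^ b).count("1")


class BKTree:
    """Illustrative BK-tree keyed on Hamming distance."""

    def __init__(self):
        # Each node is a tuple: (hash_value, {distance_to_parent: child_node})
        self.root = None

    def add(self, value: int) -> None:
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = hamming(value, node[0])
            if d == 0:
                return  # identical hash is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (value, {})
                return
            node = child

    def search(self, query: int, max_dist: int):
        """Return sorted (distance, hash) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = hamming(query, value)
            if d <= max_dist:
                results.append((d, value))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


def brute_force(hashes, query: int, max_dist: int):
    """Baseline: compare the query against every stored hash."""
    return sorted((hamming(query, h), h) for h in hashes if hamming(query, h) <= max_dist)


stored = [0b0000, 0b0001, 0b0011, 0b1111, 0b1000]
tree = BKTree()
for h in stored:
    tree.add(h)
matches = tree.search(0b0000, max_dist=1)
```

The tree returns the same matches as the brute-force scan but can skip whole subtrees when the allowed distance is small. How much that helps on 0.5 million real hashes is something that would have to be measured.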

Not sure if it is possible as I have not tried anything related before. But if you could just save the middle of the image somehow, you might be able to make a good comparison.

Well, to produce a hash, images are scaled down to a picture whose pixel count is a power of 2 (64 pixels being the smallest that gives good results). Before and after resizing, certain operations are applied to get better results. Depending on the operations used, the accuracy and the time to compute change. Sample operations are converting to grayscale, discrete wavelet transform, discrete cosine transform, etc. There is a Python library that I am using: https://github.com/JohannesBuchner/imagehash
That GitHub repo also has links to webpages on how the algorithms work and how effective they are.
Thanks to these algorithms, slight changes like JPEG compression artifacts, rescaling and slight cropping do not affect the hash that much. Cropping still affects the hash the most, though. I'll try out whether your idea or similar techniques work.
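As a rough illustration of the "scale down, then reduce to bits" idea, here is a pure-Python sketch of an average hash. This is a simpler relative of pHash (it has no DCT step) and is not the imagehash library's implementation. The function names are made up for this example, and the input is assumed to be a grayscale image given as a list of rows with at least 8 pixels per side.

```python
def downscale(pixels, size=8):
    """Shrink a grayscale grid (list of rows) to size x size by block averaging.

    Assumes the grid is at least size pixels wide and tall.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    small = downscale(pixels, size)
    flat = [v for row in small for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


# A 32x32 horizontal gradient, and a slightly brightened copy of it.
image = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[min(255, v + 10) for v in row] for row in image]

same = average_hash(image) == average_hash(brighter)
```

In this toy case the brightened copy produces the same 64-bit hash, which is the kind of robustness to small edits described above. Real photos and stronger edits will flip some bits, which is why a distance threshold is used instead of exact equality.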

Keep in mind I am just thinking out loud here. My idea is that changes made to an image mostly happen at the top and bottom. If it is possible to check just some area in the middle, you could find both identical and adjusted versions.
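The "middle of the image" idea could be sketched like this. It is only an illustration: `center_crop` is a hypothetical helper, the image is a grayscale list of rows, and the example assumes the edited copy keeps the original dimensions (a bar drawn over the top and bottom). If the edit actually cuts rows off, the center shifts and this simple version would no longer line up.

```python
def center_crop(pixels, fraction=0.5):
    """Return the middle region of a grayscale grid, `fraction` of each side."""
    h, w = len(pixels), len(pixels[0])
    ch, cw = max(1, int(h * fraction)), max(1, int(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in pixels[top:top + ch]]


# A 16x16 test pattern and a copy with white bars over the top and bottom rows.
original = [[(x * y) % 256 for x in range(16)] for y in range(16)]
edited = [row[:] for row in original]
for y in (0, 1, 14, 15):
    edited[y] = [255] * 16

middle_unchanged = center_crop(original) == center_crop(edited)
```

Hashing only the cropped region would then give identical hashes for both versions here, at the price of ignoring everything outside the middle.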

Yep, I'm just trying to explain how it works. I will test this on my database when I have time. At the moment I am waiting for new hardware to arrive so that it will run a bit faster (6 times as fast), because right now I am using a Raspberry Pi 2 Model B as a 24/7 server and that's just not strong enough for database + image hashing.
