Quote from: kmfkewm on April 29, 2012, 04:27 am

Quote
Note: once you choose a hashing function, you should stick to it. Also, obviously if buyers play games with their addresses, e.g. deliberate misspellings, then there's not a lot you can do about it. But if you see something obviously misspelt, that is a 'red flag' for you. Doesn't mean they are a scammer, it could be the buyer is trying to ensure more of the so-called 'plausible deniability', but still.

Fuzzy hashing could help eliminate that unless the customer horribly misspells shit. Instead of hashing the entire address, split it up into chunks of say 5 bytes, then hash those chunks. Now if they make a single misspelling, there will still be a % match to the previously stored hash that will be statistically higher than normal. This will take a bit more work to do.

You also generally want a big input when working with fuzzy hashes, and an address doesn't really give a whole lot of data to work with (versus say a 1 megabyte image that you want to be able to detect even if it has had some graphics editing done to it). If the sections are too small, it will make brute forcing easier and make salting much more necessary. And since it is such a small input and addresses are sometimes close to each other, it could lead to some false positives.

Right now I would like to find a list of all possible legitimate ways to write out the same address, so I can have my script standardize them prior to hashing.

edit: since it was Pine's idea and he/she/it seems keen to write the script, I will discontinue mine. What language are you going to write it in, Pine?

Fuzzy hashing seems like a superior idea, thanks for that. I was intending to write the code in JavaScript. It's a small language, but it's powerful enough for something like this. Really we ought to collaborate on the idea, many eyes make bugs shallow and all. Although the actual code is relatively trivial and short, a lot of thought needs to go into it to make it right.
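
To make the chunked-hash idea from the quote concrete, here is a rough sketch of how it might look in JavaScript under Node.js. It is only an illustration, not the planned script: the names `chunkHashes` and `matchFraction`, the salt value, and the choice of HMAC-SHA256 as the salted hash are all assumptions made for the example. It also slices by characters rather than raw bytes, which is the same thing only for plain ASCII input.

```javascript
const crypto = require('crypto');

// Hypothetical salt for the example; a real one would be secret and random.
const SALT = 'example-salt';
// "chunks of say 5 bytes", as suggested in the quoted post.
const CHUNK_SIZE = 5;

// Split the address into fixed-size chunks and hash each chunk
// separately with a salted hash (HMAC-SHA256 is assumed here).
function chunkHashes(address, salt = SALT, size = CHUNK_SIZE) {
  const hashes = [];
  for (let i = 0; i < address.length; i += size) {
    const chunk = address.slice(i, i + size);
    hashes.push(crypto.createHmac('sha256', salt).update(chunk).digest('hex'));
  }
  return hashes;
}

// Fraction of chunk positions whose hashes agree, from 0 to 1.
function matchFraction(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  let same = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] === b[i]) same++;
  }
  return same / longest;
}

// Usage: a one-letter typo changes only the chunk that contains it.
const stored = chunkHashes('123 main street springfield');
const typo = chunkHashes('123 main streat springfield');
console.log(matchFraction(stored, typo)); // 5 of the 6 chunks still match
```

One limitation of fixed offsets like this: a substituted letter only disturbs one chunk, but an inserted or dropped letter shifts every chunk after it, so the match rate collapses. That is part of why standardizing the address before hashing matters so much.
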
Also, I haven't done much work with the JavaScript API, but it looks easy to learn.

What I'll do is write up an initial draft with the bare-bones functionality. Then I'll come back here, and you and the other guys can critique it. That way I can get feedback, and people can point out problems and logical errors I haven't thought of, as well as make the code nicer and so on. After 2 or 3 drafts, it ought to be good.

On the issue of public hash blacklists, I think we ought to wait to brainstorm it until this is implemented. This little project shouldn't take too long anyway. Better to have something concrete first. Still, if anybody has any sudden insights, pop them onto the coding thread anyhow, just in case you forget them later on.

Also, on the issue of address formatting, there's a webpage posted here called "frank's compulsive guide to postal addresses" that might prove handy (it was one of the first threads on this forum).
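
For the address formatting step, a minimal sketch of standardizing an address before hashing could look like the following. The function name `normalizeAddress` and the small abbreviation table are made up for illustration; the table is not taken from the guide mentioned above and is nowhere near complete. The sketch also assumes plain ASCII addresses and simply drops any other characters.

```javascript
// Illustrative abbreviation table only; a real list would be much
// longer and would depend on the country's postal conventions.
const ABBREVIATIONS = new Map([
  ['street', 'st'],
  ['avenue', 'ave'],
  ['road', 'rd'],
  ['apartment', 'apt'],
]);

// Reduce different spellings of the same address to one canonical form:
// lowercase, punctuation removed, whitespace collapsed, common words abbreviated.
function normalizeAddress(raw) {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // treat punctuation as a separator
    .split(/\s+/)
    .filter(Boolean) // drop empty pieces left by leading or repeated spaces
    .map((word) => (ABBREVIATIONS.has(word) ? ABBREVIATIONS.get(word) : word))
    .join(' ');
}

// Usage: both spellings end up as the same string, so they would hash identically.
console.log(normalizeAddress('123 Main Street, Apt. 4'));
console.log(normalizeAddress('123  MAIN ST APT 4'));
```

Running the normalizer before any hashing means harmless differences in case, punctuation and spacing never reach the hash, which leaves the fuzzy matching to deal only with genuine misspellings.
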