Silk Road forums

Discussion => Security => Topic started by: kmfkewm on February 26, 2012, 11:23 am

Title: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on February 26, 2012, 11:23 am
http://dkn255hz262ypmii.onion/index.php?topic=13160.45 is what made me decide to make this largely pointless although fun thread.

Let's pretend that there is a scenario where there is a ciphertext protecting the content of communications between a vendor and a customer. The customer claims that the vendor sent them the following message "You are a dumb stupid ass". The vendor says they never said such a thing. My job is to determine who is lying, but I can't see the content of the message because it is encrypted with GPG. Let's simulate that. Encrypt a message to your public GPG key, and post the ciphertext here. This simulates me having access to the ciphertext block of the alleged mean comment (which I would have if I had access to the SR database). We will pretend that this is the ciphertext block that you claim the mean vendor sent to you. Now I will ask you to tell me what the mean vendor said to you by pasting exactly what the ciphertext block decrypted into. Try to trick me into thinking the vendor said something they didn't say, or be honest by pasting exactly what the vendor said. Make sure to paste *exactly* what the message decrypted into without making any changes to the resulting plaintext.

I will then use my super leet skills to help me determine if I trust your version of events or not. Feel free to tell me many messages, with one of them being what you really encrypted to your public key and all of the others being lies. Please paste plaintext in code blocks.

example:

Code: [Select]
You are a stupid fat meany!
Code: [Select]
You are the most awesome person ever!
if you already know how I am going to go about this feel free to say....
otherwise let me prove my claim :)

also post your public key btw
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on February 26, 2012, 01:22 pm
Blugh nobody wanted to play the game :(. So I will simply explain.

If I have the following ciphertext:
-----BEGIN PGP MESSAGE-----
Version: GnuPG v1.4.10 (GNU/Linux)

hQEMAwRh9rcEwAqAAQf7BgY44mNJkmAyQZH6C52eCWP6PA7iC4cXM6ArH0hM974/
iwuBk5D/xmdj7C8R6HwR3MeNgdOKFmPN8ctuImkXdgSWUqKqf6X/JPGjtr5I6+JB
lSzJrVNydGmQJ62gL8YZHR9spS/iLpXu4lrx6h1ZYOpdpZRWlGDFeMtEwW6zaK7o
03Cu00wx//ETDwGRZlrM8uSopSv+yy2LWGFpiKLnvSaHkUKIOi3DStPTHVfpVkL/
WoyvKQW2xC8a16kbrpr3buhzOlnhzrV/lXUULIPI2/SACa20DhJpQsTwXynrznY3
PLO2L5OOAZmP+yNu3SuKxXzuD7iqTqN3t5uu90vwZNJKAf39QBMn/NvT6alQjrTT
rwipvq75UgpR3xL0ptmhUi/cgtD3CeiwESe2kgqhmDxJWpx74ymOkzSrm4LsT5ZL
H2Z783xIxZZ1A6E=
=1bIX
-----END PGP MESSAGE-----


which is sent from Alice to Bob, and Bob says the message decrypts into "kmf is a stupid fuck" when I ask him what the message is (Alice of course claims that it decrypts into "kmfw is awesome"), I can quickly prove that Bob is lying. The ciphertext is 479 bytes, as it would be if the plaintext was "kmfw is awesome", whereas the ciphertext of "kmf is a stupid fuck" would have been 486 bytes. Of course I also need to know Bob's public key parameters. Also, the message could very well have been "kmf is a fool!" ... but it certainly could not be "kmf is a stupid fuck".

The moral of the story is that GPG ciphertexts don't disguise the size of GPG plaintexts.
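
To make the size leak concrete, here is a minimal Python sketch. It does not call GPG. It uses a toy length-preserving cipher with a fixed header, and the names toy_encrypt, could_be and HEADER_LEN are made up for illustration. Real GPG output also goes through compression and base64 armor, so the actual byte counts will differ from this toy, but the principle should be the same: ciphertext length is a function of plaintext length.

```python
import os

HEADER_LEN = 16  # assumed fixed per-message overhead (hypothetical, not GPG's real value)


def toy_encrypt(plaintext: bytes) -> bytes:
    """Toy cipher: random header plus plaintext XORed with a random keystream.

    The keystream is thrown away, so nothing can decrypt this. Only the
    length behaviour matters for the demonstration.
    """
    keystream = os.urandom(len(plaintext))
    body = bytes(p ^ k for p, k in zip(plaintext, keystream))
    return os.urandom(HEADER_LEN) + body


def could_be(ciphertext: bytes, claimed: bytes) -> bool:
    """True if the claimed plaintext has the length this ciphertext implies."""
    return len(ciphertext) - HEADER_LEN == len(claimed)


ciphertext = toy_encrypt(b"kmfw is awesome")

claims = [b"kmf is a stupid fuck", b"kmfw is awesome", b"kmfw is a dummy"]
for claim in claims:
    verdict = "possible" if could_be(ciphertext, claim) else "ruled out"
    print(claim.decode(), "->", verdict)
```

The observer never touches the key. The 20 character claim is ruled out, while both 15 character claims survive, so the check can only eliminate candidates and never confirm one.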

This could be very bad if you made an encryption system that only encrypts objects from a small, known set. Even though an attacker can not break the encryption algorithm, they can still work out which object was sent just from the size of the ciphertext.

This is also the theory behind website fingerprinting attacks. An attacker could run an exit node and make a list of all of the websites accessed with it over a certain period of time. Then they could spider the websites visited. Then they could see the size of every page of a given website and the size of every page linked to off of that page. Then they could determine the size of every page linked to off of every page linked to from that page, etc. Now they can measure encrypted stream size at the entry node as their target surfs. Then they can use the process of elimination to determine the sites that the target is not visiting, and make a pretty good guess as to the site they are visiting (since the pattern will match and show that the target is potentially visiting that site, but certainly not visiting any of the sites for which their traffic pattern could not match).
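
The process of elimination described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the reference table, the site names and the tolerance value are invented, and a real attack would look at far richer features than a list of transfer sizes.

```python
from typing import Dict, List


def candidate_sites(
    reference: Dict[str, List[int]],
    observed: List[int],
    tolerance: int = 0,
) -> List[str]:
    """Return the sites whose known page sizes can explain every observed size.

    A site is eliminated as soon as one observed transfer matches none of
    its spidered pages (within the given tolerance in bytes).
    """
    survivors = []
    for site, page_sizes in reference.items():
        if all(
            any(abs(seen - size) <= tolerance for size in page_sizes)
            for seen in observed
        ):
            survivors.append(site)
    return survivors


# Hypothetical spidered page sizes in bytes, keyed by site.
reference = {
    "site-a.example": [10_240, 48_100, 7_300],
    "site-b.example": [10_240, 95_000],
    "site-c.example": [3_200, 48_100, 7_300],
}

# Sizes measured on the encrypted stream at the entry side (also made up).
observed = [10_300, 48_000]

print(candidate_sites(reference, observed, tolerance=200))
```

With these numbers only site-a.example survives. That is still a "could be", not a "is": a site missing from the reference table can never show up in the answer.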

tl;dr: given a ciphertext encrypted to a known key, you can determine what the ciphertext will NOT decrypt into, and what it COULD decrypt into...but without breaking the encryption algorithm you can not determine what the ciphertext WILL decrypt into.
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: greatgreatgrandpa on February 28, 2012, 06:02 am
Just thinking out loud here;

What if there was a standard macro or ASCII that preceded and followed things like addresses for orders, to merely add disposable dummy text to the encrypted block, broadcasting a misleading amount of info.

ggg
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on February 28, 2012, 10:41 am
Quote
Just thinking out loud here;

What if there was a standard macro or ASCII that preceded and followed things like addresses for orders, to merely add disposable dummy text to the encrypted block, broadcasting a misleading amount of info.

ggg

that is generally called padding (or morphing, which is a more sophisticated way to apply padding). Really don't need to pad GPG messages though. Do you care if the attacker can determine how many characters your message is, as long as they can not determine what they are?

Tor already uses basic padding to protect somewhat against some sorts of website fingerprinting (which makes it much nicer than VPNs, which almost never have any features like this).
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on February 28, 2012, 10:46 am
of course if you are the military and the attacker knows you are going to transmit the name of a place to bomb (either "here" or "there"), then you should make sure to use padding :P
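
A minimal sketch of what that padding could look like, in Python. The 64 byte bucket size and the 2 byte length prefix are arbitrary choices made for this example, not part of GPG or any standard. The idea is to pad before encrypting so that "here" and "there" produce inputs of identical size.

```python
BLOCK = 64  # assumed bucket size in bytes (arbitrary for this sketch)


def pad(message: bytes, block: int = BLOCK) -> bytes:
    """Prefix a 2-byte length, then zero-fill up to the next multiple of `block`.

    Messages longer than 65535 bytes do not fit the 2-byte prefix and
    raise OverflowError.
    """
    framed = len(message).to_bytes(2, "big") + message
    padded_len = -(-len(framed) // block) * block  # round up to a full block
    return framed.ljust(padded_len, b"\x00")


def unpad(padded: bytes) -> bytes:
    """Recover the original message using the length prefix."""
    length = int.from_bytes(padded[:2], "big")
    return padded[2:2 + length]


for target in (b"here", b"there"):
    padded = pad(target)
    print(target.decode(), "->", len(padded), "bytes before encryption")
```

Both targets come out at 64 bytes, so the length of the resulting ciphertext no longer tells the attacker which one was sent.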
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: pine on February 28, 2012, 07:33 pm
OMG so I found kmfkewm's avatar!  :D :D :D

http://www.infobarrel.com/media/image/29906.jpg

Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: jpisbetterthanme on March 06, 2012, 04:59 pm
Wait is that process of elimination you described possible using tor? I thought tor covered stuff like that up?
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on March 06, 2012, 05:38 pm
Tor does a better job than most systems at trying to counter traffic classification, but it still isn't perfect. Even without using hidden Markov models, Tor traffic has been fingerprinted with over 50% accuracy.

http://www.wired.com/threatlevel/2010/12/flaws-spotlighted-in-tor-anonymity-network/

The accuracy would almost certainly be significantly higher if they took Markov modeling into consideration.

In general, if your entry node is pwnt you are pretty much pwnt. A lot of people are paranoid about the exit node but I am far more worried about the entry node.

You can take measures to protect against this sort of attack by adding chaff to your circuit yourself. For example, if you load a lot of pages over the same circuit, the fingerprint of any given page will be lost in the combination. However, this muddying of fingerprints will not be present for hidden services because they use dedicated circuits.

Still, Tor does a much better job at countering this sort of attack than pretty much any encrypted VPN or proxy does. Encrypted VPN/SSL traffic can be classified with 90-99% certainty using even less sophisticated classifiers than this one. Throw in hidden Markov modeling and the accuracy against VPNs, proxies and Tor alike is probably going to approach 100% if the target is viewing any substantially complex website or series of interlinked websites. Of course, without actually being able to decrypt the traffic you can not prove for certain what it is via traffic classification, but you can say that out of the ten million sites you have references for, the traffic Alice is getting has a 99% chance of coming from website number 374,982 in your database.

Tor devs would argue that traffic classifiers are not as worrying as they are made to sound, because nobody has a big enough reference set of fingerprints (after all, even if you have fingerprints for ten million websites, your dataset doesn't take another set of ten million other websites into consideration, so the accuracy figure is limited by the size of your reference database). Other people say that just making fingerprint references for all active hidden services (and it isn't impossible for an attacker to get such a list with a little work) plus all websites loaded through four or five malicious exit nodes is going to be enough. After all, there are a certain number of websites that most Tor users are loading, so if you have a reference database for the top million, some people would argue that it isn't that big of a deal that you don't have references for the bottom ten million.

For a while people wanted ISPs to use traffic classifiers to try and detect people using encryption to download child porn. I don't think much came of it. I used to have the technical specs for an ISP level traffic classification system intended to detect people who loaded CP sites through encrypted tunnels...let me see if I can dig it up again.
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: jpisbetterthanme on March 06, 2012, 06:57 pm
Interesting. That article is from 2010 though - surely there have been enough different forms of ghosting integrated into Tor since then to overcome those seemingly-basic issues?

I ask because I don't know. A lot of this is beyond what I know about but it sounds like you could piece together even a simple .bat file to get around these issues? Or that Vidalia's feature of changing your identity every ten minutes would all but make it irrelevant?

Also, how would your entry node end up compromised?

Interested to see that page of stats you were gonna go hunt down :)
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on March 06, 2012, 07:52 pm
Quote
Interesting. That article is from 2010 though - surely there have been enough different forms of ghosting integrated into Tor since then to overcome those seemingly-basic issues?

Nah, Tor is still just as vulnerable to this. The techniques to protect from it add significant bandwidth and latency overhead, and Tor tries to be fast.

Quote
I ask because I don't know. A lot of this is beyond what I know about but it sounds like you could piece together even a simple .bat file to get around these issues? Or that Vidalia's feature of changing your identity every ten minutes would all but make it irrelevant?

Changing circuits doesn't really help much against this; a ten minute sample is more than enough to get a traffic fingerprint. Plus your entry guards are persistent for 30-60 days and are reused on many circuits, so they have longer than a ten minute sample overall.

Quote
Also, how would your entry node end up compromised?

By being added to the network by an attacker is the most likely way. Hackers could also pwn entry guards that are added by legit people though. Also you need to worry about your ISP doing these attacks, they can see all of your traffic flows regardless of the entry guard used and regardless of if it is malicious or not.

The main techniques for getting around this sort of attack are morphing, padding and cover traffic. Morphing tries to make one traffic flow look like another; for example, a website might try to mimic Google's traffic fingerprint. Padding adds dummy traffic that distorts or removes the fingerprint depending on how it is implemented (morphing usually makes use of padding). Cover traffic is pretty much another implementation of padding that pads the entire flow (Tor currently pads packets but not flows...all Tor cells are padded to a fixed 512 bytes, but Tor doesn't add entirely dummy cells to pad the stream). Splitting is another technique that can hide fingerprints. So can multiplexing. Tor does use multiplexing and it helps against traffic classifiers; it's why loading multiple pages simultaneously over the same circuit makes it harder to pick a fingerprint out.
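
A rough Python sketch of why padding packets but not flows only goes so far. It takes the 512 byte figure from above and, as a simplification, treats the whole unit as usable payload (real cells spend part of that on headers). The page sizes are invented.

```python
import math

CELL_SIZE = 512  # fixed unit size quoted above; header overhead is ignored here


def cells_needed(num_bytes: int) -> int:
    """Number of fixed-size cells needed to carry num_bytes of data."""
    return math.ceil(num_bytes / CELL_SIZE)


# Two pages that differ by a few bytes look identical at cell granularity...
print(cells_needed(10_000), cells_needed(10_100))  # 20 20

# ...but pages of clearly different size still differ, since no dummy
# cells are added to the flow.
print(cells_needed(10_000), cells_needed(50_000))  # 20 98

# Multiplexing: if both pages load at once over one circuit, the observer
# only sees the combined total, not the two separate counts.
print(cells_needed(10_000) + cells_needed(50_000))  # 118
```

So per-cell padding hides exact byte counts, the cell count still leaks the rough size of the flow, and multiplexing blurs that by mixing several flows together.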
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: jpisbetterthanme on March 06, 2012, 08:29 pm
So if I understand correctly, it is actually BENEFICIAL to your session security to load clearnet sites through tor while you're on SR or doing whatever else? I was led to believe that was a bad thing so I actually have a whole different browser open for regular clearnet surfing (though I spend all my time here anyway:P)
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on March 06, 2012, 08:39 pm
Well for hidden services it doesn't help to load multiple sites at once because you use a dedicated circuit for hidden service connections. For clearnet it helps to load multiple things at once via the same circuit, because it significantly distorts the fingerprint of any given thing that you are loading.
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: jpisbetterthanme on March 06, 2012, 09:07 pm
Then how is a lowly peon supposed to defend himself? :(

PLEASE HELP US KMFKEWM YOU'RE OUR ONLY HOPE! :)
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on March 06, 2012, 09:14 pm
Get Tor to take anonymity more seriously than latency and/or implement a network that defends from these things. I guess you could have PHP scripts that add random padding to loaded pages to try to obscure their fingerprint, and add [padding][/padding] tags to the forum that drop what is in the padding. That might help.
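
A minimal sketch of the random page padding idea, written in Python rather than PHP to keep it short. The function name and the 4096 byte cap are assumptions for illustration, not something the forum software provides. Random hex is used as filler because a run of identical characters would shrink to almost nothing if the response is compressed.

```python
import secrets


def add_random_padding(html: str, max_pad: int = 4096) -> str:
    """Append an HTML comment holding a random amount of random filler.

    The filler length is drawn uniformly from 0..max_pad characters.
    Hex digits never contain "--", so the comment stays well-formed.
    """
    pad_len = secrets.randbelow(max_pad + 1)
    filler = secrets.token_hex(pad_len // 2 + 1)[:pad_len]
    return html + "<!--" + filler + "-->"


page = "<html><body>forum index</body></html>"
for _ in range(3):
    print(len(add_random_padding(page)))
```

Each load of the same page now has a different size. An observer who sees many loads of the same page can probably still average the noise out, so this blurs the fingerprint rather than removing it.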
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: lilith2u on March 06, 2012, 09:18 pm
Thanks for all the above. I wish I could play :( but it's over my head, especially after my first half joint of the day after work. My worry is that I have put so much encryption shit on my computer that, being a novice, I don't know what's really happening there. It's very helpful reading these forums and I want to learn as much as I can. Of course I'm using Tor, but my provider is the evil "Comcast". I wonder how vulnerable I have left myself? I get careless sometimes and realize that my Tor has been on all night. goddamn Valiums....anyway thanks for the post and the community  ::)...................hi Pine!
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: jpisbetterthanme on March 06, 2012, 09:25 pm
Quote
Get Tor to take anonymity more seriously than latency and/or implement a network that defends from these things. I guess you could have PHP scripts that add random padding to loaded pages to try to obscure their fingerprint, and add [padding][/padding] tags to the forum that drop what is in the padding. That might help.


So .... Nothing, then? :)
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on March 06, 2012, 09:26 pm
Honestly Tor so far seems to be fine by itself. The people who are attacking it and pwning it are really really smart people who generally have lots of formal and informal education. But then they throw the whitepapers out there, and it is a lot easier to implement an attack from a whitepaper than it is to come up with it on your own. Also Tor generally tries to make attacks against it harder when they are discovered, but often times they are putting a bandaid over a deep wound.

Yes Tor is weak to a lot of technical attacks, yes hidden services can be traced pretty damn easily (especially by an attacker who can order passive surveillance at specific points), no Tor will not keep you anonymous for very long from an attacker with a bit of money and know how, and there are a million and one ways that it can be attacked and pwnt.

No, nobody except for academic researchers (and signals intelligence agencies, but they don't apparently act on the intelligence gathered, or particularly give a tenth of a shit about even highly criminal Tor users) have apparently done any attacks against Tor that resulted in hidden services being traced or clients deanonymized (although LE have hacked hidden services to trace them they never pwnt Tor directly so far), yes Tor is the best (implemented) system for low latency anonymity, no Tor can not even begin to compare to the anonymity that could be offered by high latency solutions, no there are not any cutting edge high latency systems that have been implemented, the majority of this sort of shit never leaves the whitepaper or research lab (attacks and defenses) (except for prob by intelligence agencies, and no they don't give a shit about you) (even though you should never assume that attackers interested in you are too stupid to do interesting things) and nobody knows anyone who was busted via Tor being pwnt (and there are case studies that show people who used Tor getting away from the FBI and Interpol even with significant international resources focused on tracing them).

That pretty much sums it up as one big huge run on sentence. If you take away "Tor is secure!" from this or "Tor is insecure!" from this will largely depend on your outlook on things / personality type ;) (and perhaps who you think is after you....secure from who/what is a much better question after all...)
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: pine on March 06, 2012, 09:29 pm
Quote
Get Tor to take anonymity more seriously than latency and/or implement a network that defends from these things. I guess you could have PHP scripts that add random padding to loaded pages to try to obscure their fingerprint, and add [padding][/padding] tags to the forum that drop what is in the padding. That might help.

Choices:

> Lag
> Incarceration

It's like being between Scylla and Charybdis  :o

If you had a network with different 'update' cycles. Almost like time zones. Where e.g. this entire forum updates itself every 5 minutes instead of per request, then that might be acceptable. Faster moving applications like the Road, that's a bit of a problem unless you could isolate the wait time to one specific occurrence per login, it'd be ok then.
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: lilith2u on March 06, 2012, 09:58 pm
Quote
Get Tor to take anonymity more seriously than latency and/or implement a network that defends from these things. I guess you could have PHP scripts that add random padding to loaded pages to try to obscure their fingerprint, and add [padding][/padding] tags to the forum that drop what is in the padding. That might help.

Choices:

> Lag
> Incarceration

It's like being between Scylla and Charybdis  :o

If you had a network with different 'update' cycles. Almost like time zones. Where e.g. this entire forum updates itself every 5 minutes instead of per request, then that might be acceptable. Faster moving applications like the Road, that's a bit of a problem unless you could isolate the wait time to one specific occurrence per login, it'd be ok then.
I'll take the lag
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on March 06, 2012, 10:05 pm
Quote
Get Tor to take anonymity more seriously than latency and/or implement a network that defends from these things. I guess you could have PHP scripts that add random padding to loaded pages to try to obscure their fingerprint, and add [padding][/padding] tags to the forum that drop what is in the padding. That might help.

Choices:

> Lag
> Incarceration

It's like being between Scylla and Charybdis  :o

If you had a network with different 'update' cycles. Almost like time zones. Where e.g. this entire forum updates itself every 5 minutes instead of per request, then that might be acceptable. Faster moving applications like the Road, that's a bit of a problem unless you could isolate the wait time to one specific occurrence per login, it'd be ok then.

Sounds kind of how a mix network works.

https://en.wikipedia.org/wiki/Mix_network

I am working on implementing a mix network right now actually :). For data transfer of a more complex system that I hope will make shutting down /attacking the free market a little bit harder, even if it is already secure enough ;P.
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: jpisbetterthanme on March 06, 2012, 10:09 pm
Quote
It's like being between Scylla and Charybdis  :o


^ Pine you are awesome :-D
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: kmfkewm on March 06, 2012, 10:12 pm
Meh, that Wikipedia article is actually seriously lacking. They don't put enough emphasis on *The node shuffles the message order*, which as far as I am concerned is a requirement for a network to be considered a mix network.
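
To show what the shuffle requirement means in practice, here is a toy threshold mix in Python. It is a sketch of the general idea only, not the design of any real mix network: the class name and threshold are invented, and a real mix would also decrypt a layer of each message and make them all the same size.

```python
import random
from typing import List, Optional


class ThresholdMix:
    """Toy mix node: hold messages until `threshold` have arrived, then
    release them all at once in a random order."""

    def __init__(self, threshold: int, rng: Optional[random.Random] = None):
        self.threshold = threshold
        self.pool: List[bytes] = []
        # SystemRandom draws from the OS entropy source.
        self.rng = rng or random.SystemRandom()

    def receive(self, message: bytes) -> List[bytes]:
        """Queue a message; return a shuffled batch once the pool is full, else []."""
        self.pool.append(message)
        if len(self.pool) < self.threshold:
            return []
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)  # output order is unrelated to arrival order
        return batch


mix = ThresholdMix(threshold=3)
for msg in (b"from alice", b"from bob", b"from carol"):
    out = mix.receive(msg)
    if out:
        print(out)
```

The waiting is where the lag comes from, and the shuffle is what stops an observer from matching the first message in with the first message out.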
Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: pine on March 06, 2012, 10:21 pm
Quote
Get Tor to take anonymity more seriously than latency and/or implement a network that defends from these things. I guess you could have PHP scripts that add random padding to loaded pages to try to obscure their fingerprint, and add [padding][/padding] tags to the forum that drop what is in the padding. That might help.

Choices:

> Lag
> Incarceration

It's like being between Scylla and Charybdis  :o

If you had a network with different 'update' cycles. Almost like time zones. Where e.g. this entire forum updates itself every 5 minutes instead of per request, then that might be acceptable. Faster moving applications like the Road, that's a bit of a problem unless you could isolate the wait time to one specific occurrence per login, it'd be ok then.

Sounds kind of how a mix network works.

https://en.wikipedia.org/wiki/Mix_network

I am working on implementing a mix network right now actually :). For data transfer of a more complex system that I hope will make shutting down /attacking the free market a little bit harder, even if it is already secure enough ;P.

Onwards and upwards!  8)

Title: Re: let's talk about classification attacks since nobody wants to play a game
Post by: CaptainSensible on March 06, 2012, 11:04 pm
What problems do you see in Tor users running their own bridge (on another network) and configuring Tor to connect to that bridge?  That way the first hop in the Tor network is a known relay/bridge.  The only problem I can see is the user's bridge getting hacked by an attacker.