Silk Road forums

Discussion => Security => Topic started by: kmfkewm on June 09, 2013, 03:58 am

Title: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 09, 2013, 03:58 am
Right now there are three anonymity networks that are given any respect by the academic community. Of course Tor is the favorite of the academic world, but I2P and Freenet have both also had some professional analysis done on them. If we count remailer networks then we can also include Mixmaster and Mixminion.

Tor was primarily designed for accessing the clearnet anonymously. It separates routing nodes from clients, meaning that being a client does not imply being a relay. Tor primarily derives its anonymity by routing layer-encrypted client communications through three different nodes before they reach the destination server. For the most part, attackers cannot deanonymize a Tor user unless they can watch the user's traffic both enter and exit the network. By having a very large network of volunteer routing nodes, Tor significantly reduces the probability that any attacker will happen to control both the entry and exit nodes selected by a client. Tor also pads all packets to the same size, 512 bytes.
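
To make the layering concrete, here is a toy sketch of the onion encryption idea in Python. This is not Tor's actual cell protocol (real Tor negotiates keys with each hop and uses its own circuit crypto inside fixed 512-byte cells); I am just using the Fernet cipher from the 'cryptography' package as a stand-in and pretending the key exchange has already happened.

Code:
# Toy sketch of onion routing layering; NOT Tor's real protocol.
from cryptography.fernet import Fernet

# One symmetric key per hop, as if a key exchange already happened.
hop_keys = [Fernet.generate_key() for _ in range(3)]

def onion_wrap(payload: bytes, keys) -> bytes:
    """Client wraps the payload once per hop, innermost layer first."""
    for key in reversed(keys):          # exit layer first, guard layer last
        payload = Fernet(key).encrypt(payload)
    return payload

def onion_unwrap(cell: bytes, keys) -> bytes:
    """Each relay strips exactly one layer as the cell moves forward."""
    for key in keys:                    # guard, middle, exit
        cell = Fernet(key).decrypt(cell)
    return cell

cell = onion_wrap(b"GET / HTTP/1.1", hop_keys)
assert onion_unwrap(cell, hop_keys) == b"GET / HTTP/1.1"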

I2P was designed almost exclusively for accessing Eepsites anonymously. 'Eepsite' is the I2P jargon for a hidden service. In almost all cases, I2P clients are also I2P relays. I2P derives its anonymity in much the same way as Tor: outgoing communications are layer encrypted and routed through a series of user-selected relays prior to reaching their destination. I2P has some key differences from Tor though. For one, Tor uses bidirectional tunnels, meaning that forward and return traffic share the same circuit. I2P uses unidirectional tunnels: forward traffic and return traffic are routed through different tunnels. There is an argument that this offers some protection from internal website fingerprinting, but I doubt it does much. I2P developers also argue that it helps protect from internal timing attacks, but I believe it most likely increases the vulnerability to such attacks. One thing that possibly protects I2P users from internal timing attacks is the fact that virtually all nodes are relays, and tunnels by default can be variable length. This means that an internal attacker watching traffic originate at me and end at the destination may not be able to tell that the traffic really originated at me; it is possible that I forwarded it on for someone else. Tor almost always uses three nodes and hardly any clients are also routing nodes, so it does not get this potential protection of I2P. Another distinguishing feature of I2P is that pretty much all of its users can be enumerated in little time; this too is due to the fact that virtually all nodes are routing nodes. This makes I2P particularly weak to intersection attacks, where an attacker monitors who is currently online and looks for correlations between this set of users and the set of hidden services / pseudonyms currently making use of the network. Tor is not particularly weak to this sort of attack, because the entire list of Tor users is not so readily available. I2P also pads all packets to the same size, I believe 1 KB.

Freenet is the most different of the three anonymity networks. Whereas Tor focuses on allowing users to access the clearnet anonymously, and I2P focuses on allowing users to access 'Eepsites' anonymously, Freenet focuses on allowing users to publish and access files while maintaining plausible deniability. Freenet also focuses very strongly on being resistant to censorship. I2P and Tor allow clients to anonymize their access to websites, and websites to hide their location from clients. On the other hand, Freenet allows publishers to insert content and clients to retrieve it anonymously. Tor hidden services are almost always hosted on single servers; I2P allows multihomed Eepsites, but frequently they will be hosted on single servers as well. On the other hand, Freenet content is always redundantly hosted across the Freenet network of volunteer nodes. Freenet is not really for hosting a website like SR, with PHP etc.; it is more like an anonymous file sharing network. Of course, custom client-side software can be made to give people the ability to use Freenet for various things; for example, there are Freenet email software packages and Freenet forum software packages. As with I2P, virtually all Freenet clients are also Freenet relays. What is unique to Freenet is that essentially all Freenet clients also store data for the entire network: in addition to sharing their bandwidth with the network, Freenet clients also share their hard drive space. Freenet gains its anonymity entirely from the fact that all clients are also relays, and data can travel through paths of vastly different lengths prior to arriving at its destination. This means that if a client requests a file through their neighboring nodes, the neighboring nodes cannot easily determine if the client submitted the request themselves or if they are just forwarding it on for somebody else. Likewise, the fact that essentially all clients donate drive space to the network, and hold arbitrary files, means that if a content publisher adds a file to the network through their neighboring nodes, the neighboring nodes have a difficult time determining if the publisher originally published the content or if they are just forwarding it on for someone else. Freenet uses two layers of encryption: one layer for the content and one layer for the links. An encrypted file on Freenet looks the same at all positions on the network, but as file chunks are transferred throughout the network they are sent through dynamically encrypted links. Freenet also swarms files over many nodes, and pads all file fragments to the same size.
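
The content layer can be sketched with convergent encryption, where the key is derived from the file itself: the storing nodes hold only opaque ciphertext, while anyone given the key can fetch and verify the file. This is only the rough idea behind Freenet's content keys; the real format and key derivation differ. Assumes Python with the 'cryptography' package.

Code:
# Toy convergent encryption in the spirit of Freenet's CHKs;
# the real scheme differs in detail.
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def chk_insert(data: bytes):
    """Encrypt a file under a key derived from its own content."""
    key = hashlib.sha256(data).digest()           # content-derived key
    nonce = hashlib.sha256(key).digest()[:16]     # deterministic toy nonce
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ciphertext = enc.update(data) + enc.finalize()
    locator = hashlib.sha256(ciphertext).hexdigest()   # where/how to find it
    return locator, key, ciphertext    # the network stores only ciphertext

def chk_fetch(key: bytes, ciphertext: bytes) -> bytes:
    """Anyone holding the key can decrypt what nodes store blindly."""
    nonce = hashlib.sha256(key).digest()[:16]
    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    return dec.update(ciphertext) + dec.finalize()

locator, key, blob = chk_insert(b"some file contents")
assert chk_fetch(key, blob) == b"some file contents"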

Mixmaster and Mixminion are mix networks, better known as remailers. Remailer networks are used for sending anonymous email messages. They usually layer encrypt forward messages and pad all of them to the same size (they must be padded to the same size at every hop as well). They derive their anonymity by mixing messages, i.e. time delaying messages until the mix gathers many of them, then reordering the messages prior to sending them out. This technique offers an extremely high degree of anonymity compared to low latency networks like Tor and I2P. Even if an attacker watches the links between all mix nodes, the sender maintains their anonymity for some volume of messages sent. In fact, a single good mix on a message's path is usually all that is required to maintain anonymity. The academic literature regarding mix networks is vast and I cannot possibly hope to summarize it all here, but there are many different ways of constructing mixes as well as of trying to attack them. For example, one sort of mix is called a threshold mix. In a threshold mixing scheme, the mix node gathers messages until a threshold is met (say, 100 messages) prior to reordering the messages and forwarding them on. An attack against a threshold mix is called a flushing attack. In a flushing attack, the attacker first waits for the target's message to enter the mix, at which point they flood the mix with their own messages up to the threshold. This forces the mix to send the targeted message before it has gathered a large enough crowd to hide the message in, because the attacker's own messages can be filtered out by the attacker. An even more dangerous version of the flushing attack is the n-1 attack. In this case the attacker can delay the target's message, perhaps because they are the ISP of the target or they are the first node utilized by the target. The attacker flushes the subsequent mix entirely, then releases the target's message, then flushes the subsequent mix again. This leaves the target with a crowd size of 1, because the attacker manipulated the mix into only mixing the target's message with filterable attacker messages.
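
A threshold mix and the flushing attack against it are simple enough to sketch in a few lines of Python; the threshold and message format here are invented for illustration.

Code:
# Minimal threshold mix: gather N messages, shuffle, flush together.
import random

class ThresholdMix:
    def __init__(self, threshold: int):
        self.threshold = threshold
        self.pool = []

    def receive(self, message):
        self.pool.append(message)
        if len(self.pool) >= self.threshold:
            return self.flush()
        return []

    def flush(self):
        random.shuffle(self.pool)       # reordering breaks arrival-order links
        batch, self.pool = self.pool, []
        return batch

mix = ThresholdMix(threshold=100)
mix.receive(("target", "payload"))
# Flushing attack: the attacker tops the pool up with 99 of his own
# messages; the one message he cannot filter out of the batch is the target's.
for i in range(99):
    batch = mix.receive(("attacker", i))
print([m for m in batch if m[0] != "attacker"])    # -> the target's message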

Anyway now that I have explained the very basics of the techniques currently being utilized, I want to brainstorm a bit about what the ideal anonymity network would entail. First of all we would need to think about what our goals are:

1. Should all clients also be routers (as in I2P and Freenet) or not necessarily (as in Tor)?

Networks where all clients are routers have advantages and disadvantages. The most obvious advantage is that they scale fantastically well and can deal with large volumes of bandwidth. The most obvious disadvantage is that not all users are able to route traffic for others. In the case of a network like Tor or I2P, it is very beneficial to have a large network, because the more nodes there are, the smaller the percentage of them an attacker can control, and therefore the less likely it is that the attacker can monitor a user's traffic at the network edges. In the case of mix networks it is actually a disadvantage to have a very large network. This is because the smaller the network is, the more concentrated the messages traveling through it are, and therefore the larger the crowd sizes that messages can be mixed with. Another risk of all clients being routers is the possible added susceptibility to intersection attacks; we would want to think of a way to make it so the entire client list is not trivially enumerated, or else we would open up this risk.

2. For access to the clearnet or for hidden services?

Allowing access to the clearnet has advantages and disadvantages. The two primary disadvantages are that the exit node utilized is always going to be capable of spying on the user's traffic (although it could be user encrypted), and that the exit nodes are always going to be abused. The advantages of allowing exiting to the clearnet are that a lot more people will use the network, a lot more traffic will be routed through the network, and a much more diverse group of users will use the network.

3. For services on which content is put, or for content upon which services are built?

In the case of I2P and Tor, anonymity is provided to servers, and content can be put on the servers. In the case of Freenet, deniability is provided for access to raw data, and any services that utilize Freenet need to be custom designed to work with the raw data retrieved through or published to Freenet. Both of these strategies have advantages and disadvantages. In the case of systems such as Tor and I2P, the advantage is that people are already accustomed to running services on top of servers, and there is already an enormous library of PHP scripts and such that can be run on anonymized servers. It takes a lot more work to provide a service on top of Freenet. On the other hand, in the case of Freenet the security is arguably increased, as people can move away from massively bloated all-purpose browsers and to task-specific bare-bones programs. Additionally, in the case of I2P and Tor, content is vulnerable to hackers who can penetrate the server that it is hosted on; in the case of Freenet, the security of content is mostly dependent on the security of Freenet itself, which is likely to be more secure than most newbies can ever hope to make their own servers. Freenet is also essentially immune to content being censored or DDoSed, because content is spread through potentially thousands of different servers around the world.

4. High or low latency? Maybe variable latency??

Of course there are advantages and disadvantages to high and low latency networks. Low latency networks are snappy, like Tor and I2P. You can use them for browsing the internet and, although they feel a bit sluggish, they offer more or less the same experience as surfing the internet without an anonymity network at all. Of course the downside to low latency networks is that they are severely crippled in the anonymity department. In the case of Tor, you are only safe so long as your attacker cannot watch a single packet originate at you and arrive at your destination. I2P might be a bit more resistant to this because all clients are also routers and tunnels can be variable length, but it is still weak to a variety of attacks. On the other hand, mix networks are slow. The current mix networks can take hours and hours to deliver a single message. They are also unreliable because of this; if nodes go down before messages reach them, the messages are dropped and never arrive. I2P and Tor are so fast that reliability is much higher; your traffic goes from the first node you select all the way to your destination site in a matter of seconds. On the other hand, mix networks don't necessarily need to be slow: the only reason they are slow is so that each mix can gather an adequate number of messages prior to reordering and firing. On a very heavily used mix network, very small time delays would be required to gather an adequate crowd size. On the plus side, mix networks offer significant anonymity; even if an attacker can observe all links between mix nodes and the internal state of all but one mix node, some level of anonymity is still maintained.

Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 09, 2013, 03:58 am
5. How should we achieve our goals?

A. Untraceability

This means that an attacker cannot tell where traffic they observed originated from. Pretty much all anonymity networks put some focus on untraceability because it is required for essentially all other properties that anonymity networks strive for.

B. Unlinkability

This means that an attacker cannot tie multiple actions to the same user.

C. Deniability

This means that an attacker cannot prove beyond a reasonable doubt that a user originally published or intentionally accessed certain published information. The attacker may know that a certain user published certain information, but they cannot prove that the user originally or knowingly published it. Likewise, they may know that a certain user requested certain information, but they cannot determine if the user intentionally accessed it. This strategy is strongly utilized by Freenet, and to a lesser extent by I2P. Tor is the only one of the networks that puts absolutely no focus on deniability.

D. Membership Concealment

This means that an attacker cannot determine who is using the network. Tor and Freenet both put an emphasis on membership concealment; these days Tor puts a very strong focus on it with the advent of its steganographic bridge links. On the other hand, I2P has essentially zero membership concealment; the entire user list of the network is essentially an open book.

E. Censorship resistance / blocking resistance

This means that an attacker cannot prevent users from accessing the network, and also cannot prevent publishers from publishing content. Tor puts a large amount of effort into preventing attackers from blocking users from accessing the network, but it is currently quite weak to attackers censoring content published through the network. I2P puts essentially no effort into preventing users from being blocked from accessing the network, but it does make it hard to censor services accessed through the network (due to multihomed service support). Freenet puts effort into preventing blocking and also does a spectacular job of preventing content from being censored (it is extremely difficult to censor content on Freenet).

6. What are some attacks that we know we need to give consideration to?

A. Timing attacks

A timing attack is when an attacker links packets together through statistical analysis of their arrival times at multiple locations. There are two known ways to prevent timing attacks: mixing can offer very strong defenses against internal and external timing attacks, and plausible deniability via variable length tunnels and forced routing can prevent internal (though probably not external) timing attacks from coming to any certain conclusion.
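
As an illustration, an external observer only needs packet timestamps at two points to test whether two flows are the same; here is a minimal sketch that correlates inter-packet gaps (all timestamps invented).

Code:
# Toy end-to-end timing correlation on inter-packet gaps.
def gaps(timestamps):
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

entry = [0.00, 0.13, 0.19, 0.55, 0.61, 0.98]     # seen leaving the client
exit_a = [t + 0.35 for t in entry]               # same flow, just shifted
exit_b = [0.00, 0.40, 0.45, 0.70, 1.01, 1.30]    # unrelated flow
print(pearson(gaps(entry), gaps(exit_a)))   # ~1.0: almost certainly linked
print(pearson(gaps(entry), gaps(exit_b)))   # much lower: probably unrelated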

B. Byte counting attacks

A byte counting attack follows a flow through the network by counting how many bytes it consists of. The only way to protect from a byte counting attack is by using padding. If all flows are padded to the same size, as is the case with modern remailer networks, then byte counting attacks are impossible. If flows are merely rounded up to a fixed multiple, then byte counting attacks become less reliable but remain possible, as is the case in Tor (where traffic flows are rounded up to the next multiple of 512 bytes) and I2P (where traffic flows are rounded up to the next multiple of 1 KB). Of course there are two sorts of byte counting attack: counting the bytes of individual packets (easily prevented by padding all packets to the same size) and counting the bytes of entire traffic flows (harder to prevent unless all flows are padded to the same size, though accuracy can be reduced with any amount of padding).
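
For example, padding a flow up to the next multiple of a fixed cell size looks roughly like this; the length prefix is my own addition so the receiver can strip the padding again, not something Tor or I2P do in this exact form.

Code:
# Pad a payload up to the next multiple of a fixed cell size.
import os
import struct

CELL = 512

def pad_flow(payload: bytes, cell: int = CELL) -> bytes:
    framed = struct.pack(">I", len(payload)) + payload   # length prefix
    short = (-len(framed)) % cell       # bytes needed to reach a multiple
    return framed + os.urandom(short)   # random filler

def unpad_flow(padded: bytes) -> bytes:
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]

msg = b"lol"
padded = pad_flow(msg)
assert len(padded) % CELL == 0 and unpad_flow(padded) == msg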

C. Watermarking attacks / tagging attacks

These are less sophisticated than timing attacks but work in a similar fashion. A watermarking attack is when the attacker modifies a traffic stream to make it identifiable at a later point. One way of accomplishing this is by delaying individual packets to embed a detectable inter-packet arrival time fingerprint in the flow. Time delayed mixing can protect from watermarking attacks, because the mix gathers packets into a batch prior to forwarding them on, and this destroys the embedded watermark. Networks like Tor and I2P are weak to watermarking attacks because the relay nodes forward packets on as they get them, so the inter-packet arrival characteristics stay consistent once modified.

D. Intersection attacks

Intersection attacks work by identifying multiple crowds that the target must be in, and then removing from the suspect list every node that does not appear in all of those crowds. For example, if you can enumerate all of the nodes on a network during the time that the target sends communications to you, you can determine that the target is one of the nodes currently on the network. After doing this many times, you can shrink the suspect list thanks to natural node churn. Intersection attacks have a variety of different manifestations.
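
The core of the attack is nothing more than repeated set intersection; a toy example with invented node names:

Code:
# Each time the pseudonym acts, snapshot the set of nodes online;
# intersect the snapshots and let churn shrink the suspect list.
observations = [
    {"alice", "bob", "carol", "dave"},   # online when the nym posted, round 1
    {"alice", "carol", "dave"},          # round 2
    {"alice", "bob", "dave"},            # round 3
    {"alice", "dave"},                   # round 4
]

suspects = set.intersection(*observations)
print(suspects)   # {'alice', 'dave'} -- a few more rounds narrows it to one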

E. Traffic identification attacks

Traffic identification is the most trivial of attacks to protect against. If you send traffic through a series of nodes without layer encrypting it, a node can identify traffic it previously routed simply by looking for the same traffic at a later point. I2P and Tor protect from this attack by using layers of encryption. Freenet does *not* protect from internal traffic identification attacks (only external ones), but it doesn't really need to, because it relies so heavily on its strong plausible deniability techniques.

F. All of the known mix attacks, like flushing etc

I already explained these previously.



Anyway, I am a bit tired of typing and I cannot possibly summarize all of the things we would need to take into consideration anyway, so I will wrap this up with some suggestions.

First of all, I think that low latency networks are already covered with Tor and I2P. It is not likely that we are going to be able to make any significant advances to the state of low latency anonymity, and if we were going to, it would be by convincing the Tor developers to make some tweaks, not by designing a brand new network. I think that high latency networks are too slow to attract many users; although they technically can be used for browsing the internet, they are too slow to do so in practice. So I think that a variable latency network is the best bet. There is some research already done on this in the context of mix networks; it is called alpha mixing or tau mixing. As far as using a mix network goes, this is a bit of a tough call. On the one hand I think mixing is by far the most researched and proven way of providing strong anonymity; on the other hand I would really like to have a P2P anonymity network like I2P, and I would worry that a very large network would dilute the concentration of messages to mix together. Perhaps this can be slightly ameliorated by the utilization of dummy traffic, which would be more realistic on a P2P network with lots of bandwidth.
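
The basic alpha mixing idea from that research is easy to sketch: every message arrives with a sender-chosen alpha, all alphas tick down once per batch period, and everything that hits zero is flushed together, so low latency and high latency traffic mix with one another. A minimal sketch, with arbitrary alpha values:

Code:
# Sketch of an alpha mix: per-message delay parameter, shared batches.
import random

class AlphaMix:
    def __init__(self):
        self.pool = []                    # list of [alpha, message]

    def receive(self, message, alpha: int):
        self.pool.append([alpha, message])

    def tick(self):
        """One batch period: decrement alphas, flush everything at zero."""
        out = [m for a, m in self.pool if a <= 0]
        self.pool = [[a - 1, m] for a, m in self.pool if a > 0]
        random.shuffle(out)               # messages at zero leave together
        return out

mix = AlphaMix()
mix.receive("impatient web request", alpha=0)
mix.receive("email, can wait", alpha=10)
print(mix.tick())   # the web request leaves now; the mail buys more mixing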

I definitely think that any new network should support access to the clearnet. Networks that are only for hidden services simply do not attract as many people as networks that can be used for surfing the regular internet. Additionally, allowing access to the clearnet essentially guarantees a large pool of cover traffic for mixing, and that translates into more anonymity with less time delay. On the other hand, I think that I prefer the Freenet strategy of hosting content distributed throughout the network. I think that this will encourage more people to actually run hidden services, as they will not need to learn how to configure a server, and more importantly they won't need to buy a server in the first place. The primary disadvantage is that we would need to create use case specific applications, such as a software package for forums, one for email, one for blogging, etc. If Tor hidden services have shown us anything, it is that people who want to run hidden service servers don't have the technical expertise required to do so securely. I also like how resistant Freenet hidden services are to DDoS and similar censoring attacks.

I think that deniability is an important aspect that we should definitely utilize. Mixing traffic can protect from timing attacks being carried out; deniability techniques can prevent timing attacks from proving anything after they are carried out. We would primarily be focusing on a medium latency user base: people who want to access sites fast enough to surf the internet, but who require enough anonymity that they can wait a minute or two. By having variable time delays, traffic of all latencies is given an anonymity advantage, even traffic without any delay at all. This means that just by having some people use the network in a high latency fashion, the average user using it in a medium latency capacity will have increased anonymity. By having time delays at all we will be able to offer some protection from timing attacks; ideally you would have multi hour delays to protect from timing attacks, but even 0 seconds to 1 minute per hop should make the network more resistant to timing attacks than Tor or I2P are. Having variable length paths and having all clients route by default will provide plausible deniability as well. All of these things in combination should offer significant protection from timing attacks.

Another thing we need to consider is our padding strategy. It is very easy to pad all packets to the same size, and of course we should do this. However, it is also ideal if all traffic flows consist of a single packet. The more padding that is used, the more likely it is that an arbitrary webpage can be loaded with a single fixed size packet (i.e. if all packets are 1 MB, then all webpages 1 MB and below can be loaded with a single packet). On the other hand, larger packet sizes lead to inefficient networks that are impossible to scale (i.e. if all packets are 1 MB, then you just spent 1 MB times the number of routing nodes utilized to send your three byte "lol" message). Perhaps swarming can be used to help hide the size of large messages, or something sort of like I2P's unidirectional tunnels (except it would be more like hydra tunnels).
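
The tradeoff is easy to put numbers on; a quick back-of-the-envelope sketch, with the hop count picked arbitrarily:

Code:
# Cost of uniform fixed-size packets: every message costs
# packet_size * hops no matter how small the payload is.
PACKET = 1 * 1024 * 1024    # 1 MB packets, as in the example above
HOPS = 5                    # arbitrary illustrative path length

for payload in (3, 50_000, 900_000):
    sent = PACKET * HOPS
    print(f"{payload:>7} byte payload -> {sent / 1e6:.1f} MB sent, "
          f"{(sent - payload) / 1e6:.2f} MB of it pure overhead")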

I am a big fan of layer encrypted networks, and of course for mixing to be utilized layer encryption has to be utilized as well.

Another possibility is using PIR somewhere. The bleeding edge theoretical mix networks use PIR rather than SURBs for message retrieval.
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on June 09, 2013, 05:45 am
Great write up!

First, I want to point out, as you probably know but didn't mention, that hidden services can be multihomed, you simply publish the descriptor from two or more boxes. It isn't common, but I have talked to people who do it. Also, I2P has a hidden mode that is similar to entry guards and Freenet's darknet mode. So the features of Tor and I2P overlap more than is usually considered.

One of the most important properties of an anonymity network is the size of the user base. A high latency mix network with one user offers no anonymity. Similarly, I would feel a lot safer using I2P if it had a million concurrent users. Unfortunately, it only has 10,000 to 20,000 users. The main reason of course is that Tor offers easier access to clearnet sites, and it doesn't require you to be a relay. So those are the most important properties for an anonymity network with a large user base.

But to address your ideas, are you brainstorming a theoretical network, or something actually worth building? Because I think any competitor network will suffer the same problem that competitor darknet markets suffer. Everyone is on SR, so everyone will use SR, regardless of how good the alternatives are from a technical standpoint. Right now, 90% of anonymity network users are on Tor. It's doubtful a significant number of people would bother to use another anonymity network, even if it was much more robust. Tor is "good enough" for most people.

So if you're describing a theoretical network, your ideas are good. If you want to build something that people would actually use, why not layer it on top of Tor? Route it through Tor but with additional properties that enhance anonymity. Since Tor clients control their circuits, they can easily build variable length paths. Adding timing delays would require modification of relays, and thus cooperation of others, but it might be easier to convince the Tor developers and relay operators to do that than to build a useful competitor network.
Title: Re: Brainstorming the ideal anonymity network
Post by: AllDayLong on June 09, 2013, 06:18 am
Quote
So if you're describing a theoretical network, your ideas are good. If you want to build something that people would actually use, why not layer it on top of Tor? Route it through Tor but with additional properties that enhance anonymity. Since Tor clients control their circuits, they can easily build variable length paths. Adding timing delays would require modification of relays, and thus cooperation of others, but it might be easier to convince the Tor developers and relay operators to do that than to build a useful competitor network.

Doesn't Tor need like a lot of help with planned improvements?
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 09, 2013, 07:20 am
Quote
Great write up!

First, I want to point out, as you probably know but didn't mention, that hidden services can be multihomed, you simply publish the descriptor from two or more boxes. It isn't common, but I have talked to people who do it. Also, I2P has a hidden mode that is similar to entry guards and Freenet's darknet mode. So the features of Tor and I2P overlap more than is usually considered.

I did know that it was possible to multihome Tor hidden services, but I didn't mention it because I have never heard of anybody actually doing it before. Also, the I2P scene is much more focused on multihoming; their community has been very involved in working on forks of tahoe-lafs for multihoming dynamic Eepsites, whereas the Tor community generally ignores multihoming altogether. You are right though that multihoming is possible with Tor as well, and I should have mentioned this rather than give the impression that this feature is unique to I2P. Speaking of rarely used Tor features, it also supports authenticated access to hidden services such that clients without a specific cookie cannot even determine if the hidden service is up or not.

I am also aware of the hidden mode in I2P; this too is a rarely used configuration. The primary problem with operating in hidden mode is that afaik you no longer route traffic for other peers. I am not an expert in regards to I2P, but from what I can gather the developers generally advise against running in hidden mode due to it damaging some of the anonymity providing properties of I2P. Of course it also protects from some attacks, such as intersection attacks and obviously membership enumeration.

Quote
One of the most important properties of an anonymity network is the size of the user base. A high latency mix network with one user offers no anonymity. Similarly, I would feel a lot safer using I2P if it had a million concurrent users. Unfortunately, it only has 10,000 to 20,000 users. The main reason of course is that Tor offers easier access to clearnet sites, and it doesn't require you to be a relay. So those are the most important properties for an anonymity network with a large user base.

Yes definitely allowing exiting to the clearnet is required to gain a substantial user base (and all of the delicious cover traffic they bring with them). I am really torn between having all users route by default or not.

Advantages of all users routing:

A. The network can scale much more easily (Tor is constantly running into resource problems, I2P has an abundance of resources)
B. It makes it much easier to add plausible deniability
C. It opens up the possibility of having a distributed data store like Freenet, which I find very attractive
D. The abundance of resources allows for heavier use of dummy traffic and other anonymity increasing, bandwidth intensive techniques
E. The network is likely to grow much larger (20,000 routing nodes versus 3,000 routing nodes) which makes it harder for an attacker to monitor a large % of it

Disadvantages of all users routing:

A. Not as many people want to make resources available as want to consume resources. Having users route by default could lead to a much smaller overall user base, even if the number of routing nodes is larger.

B. If all users route it is very likely that it will open the network up to client enumeration, and this will likely lead to weakness to various sorts of intersection attack

C. If mixing is utilized, having a very large network will dilute the number of messages that mix together at any one hop, potentially significantly reducing the anonymity that can be provided by mixing

Quote
But to address your ideas, are you brainstorming a theoretical network, or something actually worth building? Because I think any competitor network will suffer the same problem that competitor darknet markets suffer. Everyone is on SR, so everyone will use SR, regardless of how good the alternatives are from a technical standpoint. Right now, 90% of anonymity network users are on Tor. It's doubtful a significant number of people would bother to use another anonymity network, even if it was much more robust. Tor is "good enough" for most people.

Well my interest in anonymity networks predated SR and the massive SR user base by many years, so I am not really concerned with SR being the primary destination of people who use the darknet. Tor is definitely by far the most popular network though, and any newcomer will have trouble even growing to the same size as Freenet or I2P. So I would say that I am brainstorming a theoretical network, but a theoretical network that would be worth building. I really do love Tor but I am entirely convinced that it is not capable of continuing to provide anonymity as the scrutiny against it increases. Simple analysis of Tor reveals that a fairly modest attacker can cause enormous damage to those who use it. We have not seen this carried out in practice yet, and we never will until we do. But looking at the theoretical strengths and weaknesses of Tor, the only conclusion I can come to is that Tor is just not something I want to continue trusting with my life. After the first wave of Tor arrests comes, and in my opinion this will be sometime in the fairly near future, perhaps in a year or two, people will look for alternatives because they will realize that Tor is actually no longer good enough. But I am interested in anonymity networks theoretically and practically, and even if nobody ever uses a superior network it is interesting enough in itself to make one.

Quote
So if you're describing a theoretical network, your ideas are good. If you want to build something that people would actually use, why not layer it on top of Tor? Route it through Tor but with additional properties that enhance anonymity. Since Tor clients control their circuits, they can easily build variable length paths. Adding timing delays would require modification of relays, and thus cooperation of others, but it might be easier to convince the Tor developers and relay operators to do that than to build a useful competitor network.

I can see merit to layering some things on top of Tor (for example a remailer network), but I think that something that is fundamentally an alternative to Tor would not make much sense to layer through Tor. I also doubt that the Tor developers have much interest in fundamentally changing their network. Right now we have low latency anonymity networks a la I2P and Tor, deniable file sharing networks a la Freenet, and high latency mix networks a la Mixminion and Mixmaster. I think that the remailer networks are so slow and unreliable and E-mail specific that hardly anybody will ever use them, that I2P and Tor are so fundamentally insecure that they will not withstand attack for much longer, and that Freenet is so unique that it fills sort of a niche market (it can't be used for surfing the internet, it can't be used for E-mail to people on the clearnet, it can't be used for hosting a traditional website, etc). I think that the anonymity network of the future will be a mixture of all of these things: fast enough to surf the internet but slow enough that timing attacks can be somewhat protected from (0-3 minutes of delay total), incorporating plausible deniability as much as possible while still allowing for the clearnet to be surfed, allowing hidden services that are stored distributed throughout the network like Freenet or multihomed like I2P and sometimes Tor, etc.

Pretty much I think it will be Freenet in that plausible deniability will be a primary focus (because this offers strong protection and is easier to obtain than actual anonymity), Tor in that exiting to the clearnet will be possible, and Mixminion in that it will look like a greatly watered down remailer network (using the same techniques as the remailers, but to a much smaller degree, to allow for reasonable latencies).

Also, I do believe that a user base would be attracted. Now more than ever before, people are taking an interest in stuff like this. Look at how quickly BitMessage got over 100 nodes. When the earliest academic papers analyzing Tor started coming out it only had a few dozen nodes.
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 09, 2013, 07:39 am
Essentially I think that Tor is the RSA-1024 of anonymity networks. In 2004 it was more than good enough. In 2013 it looks like it is probably already breakable by the most powerful attackers. Data retention laws are becoming more prevalent, the NSA is monitoring as much of the internet as it can, the FBI is well past Carnivore, more and more sophisticated attacks are being discovered, etc. Tor is roughly as untraceable as a single hop proxy. If the attacker is monitoring the site you visit, the middle and exit nodes are all but worthless. If the attacker owns your entry node, you are fucked. Entry guards rotate every 30 to 60 days. We need a network that protects from edge timing attacks, and we need a network that provides deniability in the event that an edge timing attack is successful. The only way we can obtain this, that I am aware of, is via time delayed mixing, uniform padding, variable path length and all nodes routing.
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on June 09, 2013, 06:54 pm
Quote
Doesn't Tor need like a lot of help with planned improvements?

They need a lot of help with a lot of stuff. It would be great if improving the hidden service protocol was a top priority. Tor started out as an anonymity network, but their focus has turned to censorship circumvention, because it happens to be a side effect of using an anonymity network -- although the same thing can be accomplished with one-hop proxies. That's why bridges and obfsproxy protocols were added to the network. The Tor Project works on specific projects that sponsors pay them to work on, and for the most part those sponsors are western government agencies and NGOs that want to help activists in repressive countries. So unless someone with deep pockets comes along and asks for specific deliverables related to hidden services, I don't expect to see much improvement there.

Quote
Speaking of rarely used Tor features, it also supports authenticated access to hidden services such that clients without a specific cookie cannot even determine if the hidden service is up or not.

I use that for all of my hidden services. :)

Quote
Yes definitely allowing exiting to the clearnet is required to gain a substantial user base (and all of the delicious cover traffic they bring with them). I am really torn between having all users route by default or not.

Advantages of all users routing:

A. The network can scale much more easily (Tor is constantly running into resource problems, I2P has an abundance of resources)
B. It makes it much easier to add plausible deniability
C. It opens up the possibility of having a distributed data store like Freenet, which I find very attractive
D. The abundance of resources allows for heavier use of dummy traffic and other anonymity increasing, bandwidth intensive techniques
E. The network is likely to grow much larger (20,000 routing nodes versus 3,000 routing nodes) which makes it harder for an attacker to monitor a large % of it

Disadvantages of all users routing:

A. Not as many people want to make resources available as want to consume resources. Having users route by default could lead to a much smaller overall user base, even if the number of routing nodes is larger.

More importantly, not all users are able to route. Some are behind unconfigurable NAT. Some have crappy connections. Some can only connect for short periods of time. I2P encourages you to stay connected, because it can take 15 minutes to establish a useful number of connections to the rest of the network. If you can only connect for an hour a day, you waste a lot of time just integrating yourself into the network. Apparently, Freenet is even worse on that point.

Relaying from home is free, whereas the Tor network relies on volunteers spending a lot of money to run high bandwidth relays to handle all of the users. Still, there seems to be sufficient interest that this hasn't harmed Tor yet.

Also, if you want to allow access to clearnet sites, you should not allow arbitrary newbs to be exit nodes. Some people will unwittingly get in a lot of trouble and that will drive everyone away from the network.

I think it's a combination of the network requirements and the lack of clearnet access that makes I2P users a very selective group. No offense to them, I think they are great people, but they are very homogeneous. Almost all of them know how to code. Almost all of them are professional technologists or very tech savvy hobbyists. That works well for them now, because there isn't a lot of controversial content on the network. There are no major drug or CP sites. But if I2P was invaded by those groups, that situation would change. Not only might technical weaknesses be revealed by serious adversaries, but it would become obvious that they lack the cover you get from mixing with very diverse crowds. If there was a major CP invasion, then everyone using I2P would be a suspect, whereas I'm quite comfortable using Tor even if someone sees me using it, because of the plausible deniability of the very diverse crowd.

So for these many reasons, I don't think people should be required to relay, and the size and diversity of the user base should be maximized.

Quote
Well my interest in anonymity networks predated SR and the massive SR user base by many years, so I am not really concerned with SR being the primary destination of people who use the darknet.

I was using that as an example of a monopoly and the pressures that come with it. The same logic applies to Facebook, for example. Everyone hates it but no one seems to be able to quit, even though open source, federated social networks exist (which you can run as hidden services or eepsites, even).

Quote
Tor is definitely by far the most popular network though, and any newcomer will have trouble even growing to the same size as Freenet or I2P. So I would say that I am brainstorming a theoretical network, but a theoretical network that would be worth building. I really do love Tor but I am entirely convinced that it is not capable of continuing to provide anonymity as the scrutiny against it increases. Simple analysis of Tor reveals that a fairly modest attacker can cause enormous damage to those who use it. We have not seen this carried out in practice yet, and we never will until we do. But looking at the theoretical strengths and weaknesses of Tor, the only conclusion I can come to is that Tor is just not something I want to continue trusting with my life. After the first wave of Tor arrests comes, and in my opinion this will be sometime in the fairly near future, perhaps in a year or two, people will look for alternatives because they will realize that Tor is actually no longer good enough. But I am interested in anonymity networks theoretically and practically, and even if nobody ever uses a superior network it is interesting enough in itself to make one.

Well, if you believe the network is going to be crippled by mass arrests, that's a good reason to start designing a robust alternative.

I still wonder if adding features like layered, permanent entry guards is not worth doing in the short term.

Quote
I can see merit to layering some things on top of Tor (for example a remailer network), but I think that something that is fundamentally an alternative to Tor would not make much sense to layer through Tor. I also doubt that the Tor developers have much interest in fundamentally changing their network. Right now we have low latency anonymity networks a la I2P and Tor, deniable file sharing networks a la Freenet, and high latency mix networks a la Mixminion and Mixmaster. I think that the remailer networks are so slow and unreliable and E-mail specific that hardly anybody will ever use them, that I2P and Tor are so fundamentally insecure that they will not withstand attack for much longer, and that Freenet is so unique that it fills sort of a niche market (it can't be used for surfing the internet, it can't be used for E-mail to people on the clearnet, it can't be used for hosting a traditional website, etc).

One thing I've thought about, especially since I've been hanging out with the I2P folks lately, is a trans-proxy. Similar to the onion.to and i2p.us in-proxies, or exit nodes and I2P out-proxies, but trans-proxies would proxy connections between anonymity networks. For example, to access eepsite whatever.i2p from Tor, you would go to whatever.i2p.transproxy.onion, and to access hidden service whatever.onion, you would go to whatever.onion.transproxy.i2p. You could chain these things together, so if you want to use an exit node from I2P, a modified dot exit URL like www.google.com.RelayName.exit.transproxy.i2p would get you there. Ok, that's a bit confusing for newbs, but you could access and enjoy the properties of different networks at the same time. Somehow, Freenet could be integrated into this too, so you get the plausible deniability of accessing files from Freenet, but through a hidden service, and thus a Tor connection that doesn't expose you as a Freenet user.

This might even be an easier way to get the mixed properties of your theoretical network.
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 10, 2013, 10:02 am
Quote
Also, if you want to allow access to clearnet sites, you should not allow arbitrary newbs to be exit nodes. Some people will unwittingly get in a lot of trouble and that will drive everyone away from the network.

Would definitely make it so people need to select to be exit nodes, after being given a warning about what it means to be an exit node.

Quote
I think it's a combination of the network requirements and the lack of clearnet access that makes I2P users a very selective group. No offense to them, I think they are great people, but they are very homogeneous. Almost all of them know how to code. Almost all of them are professional technologists or very tech savvy hobbyists. That works well for them now, because there isn't a lot of controversial content on the network. There are no major drug or CP sites. But if I2P was invaded by those groups, that situation would change. Not only might technical weaknesses be revealed by serious adversaries, but it would become obvious that they lack the cover you get from mixing with very diverse crowds. If there was a major CP invasion, then everyone using I2P would be a suspect, whereas I'm quite comfortable using Tor even if someone sees me using it, because of the plausible deniability of the very diverse crowd.

I think it is primarily the lack of clearnet access. Let's face it, most hidden services are boring. I have never looked at the I2P eepsites but my guess is that they are about as boring as Tor hidden services tend to be, if not more so. I guess there are some I2P torrent sites but I have heard torrenting through I2P is still pretty slow.

Quote
So for these many reasons, I don't think people should be required to relay, and the size and diversity of the user base should be maximized.

In the case of Tor you may have a larger user group to blend into, but the thing is that this will not protect you if an attacker is positioned to do a timing attack against you. If you use Freenet there might be a 90% chance that you are trading CP, but actually proving that somebody is trading CP on Freenet is arguably a lot harder than proving somebody is trading CP on Tor. In the case of Tor, if the attacker owns your entry guard and can observe your traffic arrive at the destination, you are screwed. In the case of Freenet the attacker can be your 'entry node', and indeed the entry node can always see the content it passes to you, but it still cannot easily prove that you requested the content.

Quote
Well, if you believe the network is going to be crippled by mass arrests, that's a good reason to start designing a robust alternative.

Just look at the recent HSDIR attack. An attacker is capable of being all HSDIR servers for a hidden service. That means they have the ability to constantly be positioned for 1/2 of a timing attack against any hidden service, and the clients accessing any hidden service. If they own 33.3333% of the bandwidth of the (I think?) 900 or so entry guards, they can deanonymize close to 100% of people who access the targeted hidden service within 60 days. That is the level of an attacker that can deanonymize almost all users of a targeted hidden service: if they can do the HSDIR attack and if they contribute 33.3333% of the entry guard bandwidth for 60 days. Even if they contribute less bandwidth and wait for 30 days, they are still going to be able to do some serious damage. Even if they own only a fraction of the entry guard bandwidth, they will be able to do serious damage over many months.

Quote
I still wonder if adding features like layered, permanent entry guards is not worth doing in the short term.

Layered entry guards could be good for hidden services, but as it stands the attacker doesn't need to own the hidden service's entry guards to do an edge timing attack against clients connecting to the hidden service. The attacker only needs to own the HSDIR nodes of the hidden service, or its introduction nodes. Permanent entry guards would also be a good idea, but I doubt the Tor developers will ever implement that, because it would lead to a lack of resource balancing.

Quote
One thing I've thought about, especially since I've been hanging out with the I2P folks lately, is a trans-proxy. Similar to the onion.to and i2p.us in-proxies, or exit nodes and I2P out-proxies, but trans-proxies would proxy connections between anonymity networks. For example, to access eepsite whatever.i2p from Tor, you would go to whatever.i2p.transproxy.onion, and to access hidden service whatever.onion, you would go to whatever.onion.transproxy.i2p. You could chain these things together, so if you want to use an exit node from I2P, a modified dot exit URL like www.google.com.RelayName.exit.transproxy.i2p would get you there. Ok, that's a bit confusing for newbs, but you could access and enjoy the properties of different networks at the same time. Somehow, Freenet could be integrated into this too, so you get the plausible deniability of accessing files from Freenet, but through a hidden service, and thus a Tor connection that doesn't expose you as a Freenet user.

I have actually used a transproxy to do Tor -> I2P once before, and I have used Tor to connect to Freenet in the past with no transproxy involved :). Transproxies are neat, but I don't think that they will offer the anonymity required unfortunately. Tor to Freenet is probably pretty good though.
Title: Re: Brainstorming the ideal anonymity network
Post by: jackofspades on June 10, 2013, 04:25 pm
Awesome post, very informative and professionally written.

I think a lot of people on here use Tor all the time and have no idea how it works. I feel smarter just after reading your post :)
I have nothing else to offer to the discussion other than "thanks" for your insight.
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on June 10, 2013, 04:47 pm
Quote
Just look at the recent HSDIR attack. An attacker is capable of being all HSDIR servers for a hidden service. That means they have the ability to constantly be positioned for 1/2 of a timing attack against any hidden service, and the clients accessing any hidden service. If they own 33.3333% of the bandwidth of the (I think?) 900 or so entry guards, they can deanonymize close to 100% of people who access the targeted hidden service within 60 days. That is the level of an attacker that can deanonymize almost all users of a targeted hidden service: if they can do the HSDIR attack and if they contribute 33.3333% of the entry guard bandwidth for 60 days. Even if they contribute less bandwidth and wait for 30 days, they are still going to be able to do some serious damage. Even if they own only a fraction of the entry guard bandwidth, they will be able to do serious damage over many months.

I think you overestimate how easy and effective that is. The total entry guard bandwidth in the network is 1200 + 800 = 2000 MB/s [1]. You need to add 50% bandwidth to the existing network to become 33% of the final bandwidth, so that's 1 GB/s. Assuming the attacker adds very high bandwidth relays, 30 MB/s each, he would have to add over 30 relays. If 30+ relays at 30 MB/s suddenly showed up on the network, people would notice.

Then the attacker would have to run these relays for at least a month, and what would he get? A list of people accessing SR. A list of tens of thousands of people. What could he do with it? Accessing the market doesn't prove you did anything illegal. It doesn't tell you who the vendors are. It would still be incredibly costly to perform traditional police work to identify the high value targets.

Once again, the large crowd protects you.


1. https://metrics.torproject.org/bwhist-flags.png
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on June 10, 2013, 04:58 pm
Well actually, once they have the IP addresses, they could order from the big vendors and find the city. The IP addresses that they enumerate in any given city would be a short list.

Vendors definitely need to use bridges, permanent entry guards or VPNs.
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 10, 2013, 05:13 pm
warning: I am really just brainstorming here and in the process of writing this post I came to the conclusion that some of the trains of thought I was following would not work very well. I didn't originally plan to, but several times I sort of had my train of thought abruptly diverge in the process of writing this. Most people will probably be very bored reading this, I am going to post it anyway on the off chance that anybody reads it and gives it any thought. I find that when I talk things out as if I am talking to people that I can think more clearly about a subject, I don't care if anybody actually is listening :D.

I am thinking something along these lines:

A. Native / default support for layering with Tor

This will pretty much be required to get a user base to start with. People trust Tor already, and Tor is already big enough that it can provide some anonymity. Layering Tor with another network can only help anonymity, and Tor already has put a lot of focus on blocking resistance / membership concealment, features that seem silly to try to reproduce in a new network when they are already provided adequately by Tor.

B. P2P network

I definitely think that all nodes should route by default. If all nodes run as hidden services we can get around the issue you pointed out of not all users being able to route due to being behind NAT. I suppose it would be H2H, hidden service to hidden service. Normally I would strongly advise against anything where users run as hidden services, but if we can add plausible deniability and protection from timing attacks to the picture, it should be acceptable. I think the only way we are going to be able to add plausible deniability is if all nodes route for all nodes. I also like the idea of all nodes providing some hard drive space to all nodes, this will allow for distributed content that is very resistant to censorship. 

C. Support for multiple use cases: exiting to clearnet, centralized hidden services, distributed content storage, internal E-mail

By being designed for as many use cases as possible, we may be able to attract a large number of users. The primary issue will be designing the system such that all of the different sorts of traffic blend in together, and do so while consuming reasonable amounts of bandwidth. I will need to give more thought to how these different sorts of traffic can be made to blend together.

I have some basic ideas for how traditional hidden services could be provided by a mix network. Utilization of single-use reply blocks (SURBs) seems to be an acceptable way to obtain this. Single-use reply blocks were introduced by Mixminion. When a user sends a forward message through a mix network, they first need to construct a layer encrypted cryptographic packet that securely routes the message payload through the network, starting from the node closest to the message sender, with layers of encryption being removed as the message is routed forward all the way to the node furthest from the sender. SURBs are cryptographic packets that route towards the person who constructs them. Essentially, Alice creates a SURB and sends it through the mix network to Bob, as described previously. After Bob obtains the SURB, he can attach it to an outgoing message and the message is routed to Alice. From a SURB, Bob only learns the node closest to himself; all of the other nodes are known only to Alice and to their neighboring nodes. One primary difference between normal forward routing and routing with a SURB is that in the case of the former the payload has a layer of encryption stripped from it at each mix, whereas in the case of the latter a layer of encryption is usually added to the payload at each mix.
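
Here is a toy sketch of the SURB mechanics just described. It is nowhere near Mixminion's actual packet format: I use the Fernet cipher from the 'cryptography' package in place of real hybrid encryption, and I let Alice strip the payload layers using the mixes' own keys, where a real design would embed per-hop session keys in the header for exactly that purpose.

Code:
# Toy single-use reply block: each mix peels one header layer to learn
# the next hop, and ADDS one encryption layer to the payload.
from cryptography.fernet import Fernet

# One symmetric key per mix; stand-ins for the mixes' real keys.
mix_keys = {name: Fernet.generate_key() for name in ("m1", "m2", "m3")}

def build_surb(path, alice_addr):
    """Alice layers the route so each mix learns only the next hop."""
    header = b""
    for name, nxt in reversed(list(zip(path, path[1:] + [alice_addr]))):
        header = Fernet(mix_keys[name]).encrypt(nxt.encode() + b"|" + header)
    return header

def mix_step(name, header, payload):
    """One mix: peel a header layer, add a payload layer."""
    plain = Fernet(mix_keys[name]).decrypt(header)
    nxt, _, rest = plain.partition(b"|")
    return nxt.decode(), rest, Fernet(mix_keys[name]).encrypt(payload)

# Bob received the SURB from Alice and attaches his message to it:
hop, header, payload = "m1", build_surb(["m1", "m2", "m3"], "alice"), b"hi"
while hop in mix_keys:
    hop, header, payload = mix_step(hop, header, payload)

# hop is now "alice"; she strips the layers the mixes added, in reverse:
for name in ("m3", "m2", "m1"):
    payload = Fernet(mix_keys[name]).decrypt(payload)
assert hop == "alice" and payload == b"hi"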

It seems to me that a hidden service could create SURBs that route data to it and then publish them somewhere. After clients obtain a SURB for the hidden service they want to access, they can establish a connection to the hidden service with it, sending their own SURB for the hidden service to send replies to them. Essentially this is the same exact way that remailer networks use SURBs, but instead of applying it to E-mail it would be applied to a client <-> centralized hidden service model. This is not a well thought out idea at this point; primarily, we would still need to think of a way for clients to obtain the SURBs in the first place, and for hidden services to refresh the supply of SURBs available to clients.

Another issue is that SURBs are not particularly secure as they are usually used. With SURBs, global passive adversaries can usually link two communicating parties together after only a few dozen to a few thousand messages are exchanged between them. Due to the limited anonymity of SURBs, state of the art theoretical remailers have ditched them altogether in favor of Private Information Retrieval (PIR). In the case of remailers that use PIR instead of SURBs, there is a mix network as well as a nymserver network and a PIR network. Users register an address at a nymserver through the mix network, and can control their account through the mix network as well. When Alice sends a message to Bob, she sends it through the mix network to Bob's nymserver. Bob's nymserver then batches Alice's message to Bob together (adds it to a bucket) with all other messages to Bob that arrived in a given time period (called a cycle). After each cycle completes, Bob's nymserver distributes all of its users' buckets to the PIR network. Every cycle, Bob engages in the PIR protocol with some number of the PIR nodes in order to obtain his bucket (hundreds of cycles worth of buckets can be stored at a time, and Bob can go through them at his leisure, so long as he always engages in the PIR protocol once for each completed cycle). This system enormously strengthens the anonymity of bidirectional communications through a mix network, as it removes the limit on the number of messages Alice and Bob can exchange before their anonymity suffers in the face of a global passive adversary. This comes at the expense of requiring a massive infrastructure to support it (mix network + nymserver network + PIR network), and of having semi-trusted intermediaries (the nymserver).
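
To give a feel for what the PIR step buys, here is a toy two-server XOR PIR sketch (Python). This is a standard textbook construction, not the protocol of any particular deployed design, and it assumes the two servers hold identical replicated buckets and do not collude: each server sees only a random-looking subset query, yet Bob recovers exactly his bucket.

Code:
# Toy 2-server information-theoretic PIR over equal-size buckets.
import secrets

buckets = [b"bucket0", b"bucket1", b"bucket2", b"bucket3"]  # replicated on both servers

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(store, mask):
    # XOR together every bucket whose bit is set in the query mask
    out = bytes(len(store[0]))
    for j, item in enumerate(store):
        if mask >> j & 1:
            out = xor_bytes(out, item)
    return out

def pir_fetch(i, n):
    mask1 = secrets.randbits(n)          # uniformly random: reveals nothing alone
    mask2 = mask1 ^ (1 << i)             # differs from mask1 only at bucket i
    a1 = server_answer(buckets, mask1)   # query sent to server 1
    a2 = server_answer(buckets, mask2)   # query sent to server 2
    return xor_bytes(a1, a2)             # everything cancels except bucket i

assert pir_fetch(2, len(buckets)) == b"bucket2"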

For E-mail type systems I would definitely be in favor of using something based on PIR, and PIR based systems are widely recognized as being greatly more anonymous than SURB based systems. However, for a traditional hidden service I don't think it is possible to use PIR. This does bring up a bit of a problem with integrating E-mail and hidden services into the same network: for E-mail it would be better to use PIR, but then it would be using a different mechanism than the hidden services. Perhaps by having plausible deniability due to all nodes routing for all nodes, we can use SURBs for E-mail messages without the traditional weakness to GPA long term intersection attacks. Let's break down the risks of SURBs; here is a quote from a paper that discusses the problems with them:

Quote
Nym servers based on reply blocks (discussed in Section 2.1 above) are currently the most popular option for receiving messages pseudonymously. Nevertheless, they are especially vulnerable to end-to-end traffic analysis.

Suppose an adversary is eavesdropping on the nym server, and on all recipients. The attacker wants to know which user (call her Alice) is associated with a given pseudonym (say, nym33). The adversary can mount an intersection attack, by noticing that Alice receives more messages, on average, after the nym server has received a message for nym33 than when it has not. Over time, the adversary will notice that this correlation holds for Alice but not for other users, and deduce that Alice is likely associated with nym33.

Recent work [19, 43] has studied an implementation of these intersection attacks called statistical disclosure, where an attacker compares network behavior when Alice has sent to network behavior when she is absent, in order to link an anonymous sender Alice to her regular recipients Bob1...BobN. Against pseudonymous recipients, however, these attacks are far easier: in the anonymity case, many senders may send to any given recipient Bobi, but with pseudonymous delivery, only one user sends or receives messages for a given pseudonym.

To examine this effect, we ran a version of the attack simulations described in [43], assuming a single target pseudonym and N non-target pseudonyms providing cover. In order to make the attack as difficult as possible (and thus establish an upper bound on security), we assume that users behave identically: they receive messages with equal probability according to independent geometric distributions in each unit of time (receiving no messages with probability 1 - PM); they use identical reply blocks with path length ℓ through mixes in a steady state that delay each message each round with probability PD.

We ran the simulated attack with different values for PM, PD, and ℓ, against a nym server with N = 2^16 active pseudonymous users. (This is probably an overestimate of the number of users on a typical nymserver today [45].) We performed 100 trials for each set of parameters. In the worst case (for the nym holder), when PM = 0.5, ℓ = 1, PD = 0.1, the lack of mix-net delay variance allowed the simulated attacker to guess the user's identity correctly after the user received an average of only 37 messages. In the best simulated case (PM = 0.5, PD = 0.9, ℓ = 4), the user received an average of only 1775 messages before the attacker guessed correctly. For an active user, this is well within a month's expected traffic.

Although there are ways to use dummy traffic to resist statistical disclosure attacks, these are difficult to implement perfectly in practice (due to outages) and even slight imperfections render users vulnerable to attack [43].

So in summary, the risk of SURBs is that Alice can obtain X reply blocks for Bob, use them to send X messages to Bob, and then watch to see if any of the IP addresses using the network receive X more messages than usual. It is possible to protect against this if the nymserver stores SURBs for Bob and only sends messages out in fixed cycles, but in that case the nymserver is still capable of attacking Bob's anonymity by not respecting the cycle scheme. There are some fundamental differences in the PIR based systems that protect against this attack: Alice (or the nymserver) pushes messages to Bob with SURBs, whereas with the PIR designs Bob pulls messages from the PIR network. With the PIR systems, Bob only pulls X bytes of data per cycle in all cases; with SURBs, Bob has as much data pushed to him as the people holding SURBs for him desire to push.
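
Here is a toy simulation of exactly that attack (Python, with made-up parameters): the attacker bursts messages at the target's nym on half the rounds, then simply ranks users by how much their inbound traffic rises on those rounds compared to the quiet rounds.

Code:
# Toy statistical disclosure run against undisguised reply-block delivery.
import random

USERS, ROUNDS, BURST, P_BG = 100, 400, 3, 0.3
target = random.randrange(USERS)            # "Bob"
on_rounds = [0.0] * USERS                   # inbound counts on attack rounds
off_rounds = [0.0] * USERS                  # inbound counts on quiet rounds

for r in range(ROUNDS):
    got = [random.random() < P_BG for _ in range(USERS)]  # background mail
    if r % 2 == 0:                          # attacker spams the nym
        for u in range(USERS):
            on_rounds[u] += got[u]
        on_rounds[target] += BURST          # burst arrives via Bob's SURBs
    else:
        for u in range(USERS):
            off_rounds[u] += got[u]

half = ROUNDS / 2
scores = [on_rounds[u] / half - off_rounds[u] / half for u in range(USERS)]
print("attacker's guess:", scores.index(max(scores)), "actual:", target)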

In the case of everybody gets everything PIR, this attack is protected against even if data is pushed rather than pulled. For example, with BitMessage, if somebody sends Bob X messages, every node on the network receives X additional messages, making the intersection attack impossible to carry out. So the fundamental mechanism behind the insecurity attributed to SURBs is that they can be used by an attacker to cause Bob to receive unique amounts of data. It seems that it is possible to use SURBs with nearly the same security as PIR if Bob receives messages at a nymserver and then sends the nymserver SURBs in fixed duration cycles, with the network itself enforcing the size of all routed messages (a timestamp enforcement mechanism would also be required). Since Bob determines when the nymserver has a SURB for him, and since the nymserver can only route X bytes to Bob per SURB, it seems that this avoids the specific attack attributed to SURBs. Of course, using SURBs for message retrieval still lowers the anonymity to that which can be provided by a mix network, whereas PIR can guarantee anonymity unless all of the utilized PIR servers are compromised (or even if they are all compromised, in the case of everybody gets everything).
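
A minimal sketch of that fixed-cycle idea (Python; the bucket size is an assumed network-enforced constant, and the SURB argument is just a placeholder): because every cycle delivers exactly one constant-size bucket per SURB, Bob's inbound volume carries no information about how much mail he actually received.

Code:
BUCKET = 4096  # network-enforced message size (an assumed constant)

def run_cycle(pending, surb):
    out = b""
    while pending and len(out) + len(pending[0]) <= BUCKET:
        out += pending.pop(0)              # overflow waits for a later cycle
    return out.ljust(BUCKET, b"\x00")      # always exactly BUCKET bytes via the SURB

queue = [b"message one", b"message two"]
bucket = run_cycle(queue, surb=None)       # surb is just a placeholder here
assert len(bucket) == BUCKET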

Anyway, I am getting a little bit off topic, because a hidden service could not utilize a nymserver like this and still serve hundreds of clients. The original goal was to determine if all nodes routing for all nodes protects from this weakness of SURBs. The answer is that it obviously does to an extent, but probably not enough of one. If the attacker is a GPA, they will see several nodes get X messages after spamming Bob, but they will be able to follow the flow, and it will end at Bob. Bob will maintain plausible deniability against a local internal attacker; after all, he could be forwarding the X messages on himself. However, a local external attacker will be able to defeat his plausible deniability. The only way Bob could protect against this is by sending out a number of dummy messages equivalent to the number of legitimate messages he obtains for himself. However, this will only create a crowd size, and unless it is an everybody gets everything PIR like BitMessage, the attack will still be usable to narrow in on Bob. Of course, it is probably asking too much to protect a nearly-low-latency network from a GPA, and this would still protect against a local internal attacker.
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 10, 2013, 05:17 pm
Like you said, they could enumerate the people connecting from a certain city and correlate that with known vendor shipping locations. Additionally, they could try to get a vendor to write a message to them (or just view a post from a vendor); that would get them a timestamp of when the vendor was online, which they could match against known connections through their entry guard. Not to mention they could also match the amount of data sent through their guard against the size of the message they saw from the vendor, etc. (timing attack + fingerprinting attack).
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 10, 2013, 05:46 pm
Also, 1 gigabyte per second really isn't that much bandwidth. A quick search of hosting providers shows 1 gigabit per second unmetered packages averaging around $700 a month. There are 8 gigabits in a gigabyte, so eight such servers, at $5,600 a month, would be enough bandwidth to fuck hidden services and their clients. It would take 60 days for all clients (and hidden services) to rotate entry guards enough to probabilistically select one of the bad entry nodes. That puts the price for massively breaking Tor anonymity at about $11,200 and 60 days: enough to deanonymize any hidden service and the majority of the clients connecting to it. I could afford to carry that attack out. I don't want to use a network that I can defeat with traffic analysis.
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on June 10, 2013, 06:07 pm
So you want to run a client that speaks the Tor protocol and creates a hidden service, but the service is to be a relay? That is a very interesting idea, and something that, at least superficially (I'll have to give it more thought), I wouldn't be afraid to use.

If you build a separate network, then someone has to run the first node. That's a problem for us, since we have a preexisting need for anonymity, being associated with this community. But if the new network layers features onto the Tor network, it will be easier to get people to use it, and if it includes something like a messaging system, which a lot of people want right now given the Tormail problems, they will have an incentive to use it.

And it's interesting that hidden relays could provide plausible deniability to hidden services in the event of known attacks.
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on June 10, 2013, 06:12 pm
Also, 1 gigabyte per second really isn't that much bandwidth. A quick search of hosting providers shows 1 gigabit per second unmetered packages averaging around $700 a month. There are 8 gigabits in a gigabyte, so eight such servers, at $5,600 a month, would be enough bandwidth to fuck hidden services and their clients. It would take 60 days for all clients (and hidden services) to rotate entry guards enough to probabilistically select one of the bad entry nodes. That puts the price for massively breaking Tor anonymity at about $11,200 and 60 days: enough to deanonymize any hidden service and the majority of the clients connecting to it. I could afford to carry that attack out. I don't want to use a network that I can defeat with traffic analysis.

It's not that simple. Buying 8 servers with 1 gigabit ports doesn't mean the full gigabit will be used. In fact, from discussions I've seen, that's guaranteed not to be the case.

But my objection has more to do with the fact that adding 50% bandwidth to the network in a week, or even a month, would be noticeable. An attacker would have to spread it out over several months, greatly increasing the cost of the attack.

And you have to factor in the 12-18 servers needed for HSDirs, and the computational cost of brute forcing their fingerprints to be closest to the descriptor ID.
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 10, 2013, 06:33 pm
So you want to run a client that speaks the Tor protocol and creates a hidden service, but the service is to be a relay? That is a very interesting idea, and something that, at least superficially (I'll have to give it more thought), I wouldn't be afraid to use.

If you build a separate network, then someone has to run the first node. That's a problem for us, since we have a preexisting need for anonymity, being associated with this community. But if the new network layers features onto the Tor network, it will be easier to get people to use it, and if it includes something like a messaging system, which a lot of people want right now given the Tormail problems, they will have an incentive to use it.

And it's interesting that hidden relays could provide plausible deniability to hidden services in the event of known attacks.

Making a messaging system that piggybacks on top of hidden services is (relatively) easy to do. There are already designs out there, and as a matter of fact such a system is about 85% implemented already (alpha mixing + all mixes are hidden services + PIR network for message retrieval + decentralized directory servers + nymservers + automatic message encryption + dummy traffic + provably secure cryptographic packet format). The challenge is extending what is already done so that it can support multiple types of traffic, rather than just E-mail like messages. Currently I am thinking about how what is already done could be modified into something that can be used for things like surfing the clearnet with mixing, running traditional hidden services with mixing, being a file share, etc.: a network that meets multiple use cases. One thing that I think would be nice is to add network wide plausible deniability, but I believe that would entail making it a P2P network (or H2H, like I said). That doesn't mesh well with the (semi-centralized, definitely not P2P) private information retrieval system that has been implemented. I am trying to determine how hard it will be to modify the system so that it can encompass all of these things without having to scrap large parts of the work that has already been done. Perhaps replacing the PIR part of the system with something like Freenet will be adequate. Perhaps the system should be left for messaging only.
Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 10, 2013, 06:45 pm
Also, 1 gigabyte per second really isn't that much bandwidth. A quick search of hosting providers shows 1 gigabit per second unmetered packages averaging around $700 a month. There are 8 gigabits in a gigabyte, so eight such servers, at $5,600 a month, would be enough bandwidth to fuck hidden services and their clients. It would take 60 days for all clients (and hidden services) to rotate entry guards enough to probabilistically select one of the bad entry nodes. That puts the price for massively breaking Tor anonymity at about $11,200 and 60 days: enough to deanonymize any hidden service and the majority of the clients connecting to it. I could afford to carry that attack out. I don't want to use a network that I can defeat with traffic analysis.

It's not that simple. Buying 8 servers with 1 gigabit ports doesn't mean the full gigabit will be used. In fact, from discussions I've seen, that's guaranteed not to be the case.

But my objection has more to do with the fact that adding 50% bandwidth to the network in a week, or even a month, would be noticeable. An attacker would have to spread it out over several months, greatly increasing the cost of the attack.

And you have to factor in the 12-18 servers needed for HSDirs, and the computational cost of brute forcing their fingerprints to be closest to the descriptor ID.

Even if it takes 12 servers with 1 gigabit ports, and they are added one at a time once per month:

$700 for the first month
$1,400 for the second month
$2,100 for the third month
$2,800 for the fourth month
$3,500 for the fifth month
$4,200 for the sixth month
$4,900 for the seventh month
$5,600 for the eighth month
$6,300 for the ninth month
$7,000 for the tenth month
$7,700 for the eleventh month
$8,400 for the twelfth month

$54,600 to obtain an adequate number of servers / amount of bandwidth, plus $8,400 * 2 to maintain it for two months (which wouldn't really be required, as many would be deanonymized before that point) = $71,400 and 14 months. That does make it a lot less realistic for me, although in my *ahem* glory days I did have $50,000 in cash once :D. On the other hand, I know big vendors who have stacks of a hundred thousand dollars sitting on their kitchen tables. I don't feel comfortable using a network that they can defeat with traffic analysis in a year.
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on June 10, 2013, 07:03 pm
The crypto in Tor makes it CPU-bound. From the discussions I've seen, top of the line relays max out around 300-400 Mbit/s. That is why the Torservers relays are carrying around 30 MB/s (240 Mbit/s): 1. Because their theoretical maximum is so low. 2. Because few relays carry their theoretical maximum, even the exit nodes. I believe herngaard was pushing 50 MB/s at one time, which is 400 Mbit/s, and that's the most I've ever seen. So having a 1 gigabit port doesn't mean much. In order to push 1 GB/s, you will need 30+ servers.

Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on June 10, 2013, 07:46 pm
I see 300 megabit per second unmetered servers for $300 each. I believe you are correct in saying that the relays are computationally limited, so I will agree that, conservatively, ~300 megabits is the most they can push, and that it would therefore be pointless to pay for more bandwidth on a single server (although I don't know the exact cutoff, you certainly seem to though :) ). That means that, rounding up in the spirit of conservatism, we need 27 servers to obtain enough *utilizable* bandwidth (an important distinction that you are correct to have mentioned). Assume we add two of these servers to the network per month, over about 14 months (rounded up as well):

$600 - first month
$1,200 - second month
$1,800 - third month
$2,400 - fourth month
$3,000 - fifth month
$3,600 - sixth month
$4,200 - seventh month
$4,800 - eighth month
$5,400 - ninth month
$6,000 - tenth month
$6,600 - eleventh month
$7,200 - twelfth month
$7,800 - thirteenth month
$8,100 - fourteenth month (the 27th server comes alone)

$8,100 * 2 + $62,700 = $78,900 over 16 months
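
For anyone who wants to double-check that arithmetic, here is a quick sketch (Python); the only inputs are the $300/month per server figure and the two-per-month schedule above.

Code:
# Staged purchase: two $300/month servers added per month until 27 are
# active, then two further months at full strength.
COST_PER, NEEDED, PER_MONTH = 300, 27, 2

total, active, months = 0, 0, 0
while active < NEEDED:
    active = min(active + PER_MONTH, NEEDED)
    total += active * COST_PER
    months += 1
total += 2 * NEEDED * COST_PER           # maintain for two more months
print(months + 2, "months, $" + format(total, ","))  # 16 months, $78,900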
Title: Re: Brainstorming the ideal anonymity network
Post by: astor on July 10, 2013, 10:17 pm
Figured I would add this here, or maybe we should create a separate thread for publishing literature about anonymity network security. AnonBib is focused mostly on Tor.

Here's a new paper about deanonymizing I2P users: http://wwwcip.informatik.uni-erlangen.de/~spjsschl/i2p.pdf

There's no date in the paper, but a thread was started about it 2 weeks ago here: http://zzz.i2p.us/topics/1414?page=1#p6850

It also references papers from 2012. So, I think it was published in the last month or two, and it was definitely published this year.



Practical Attacks Against The I2P Network

In this paper, we describe an attack that can be used to break the anonymity of a victim who is using anonymized resources in I2P – for example, a user browsing eepsites (I2P’s terminology for anonymous websites) or chatting. We are able, with high probability, to list the services the victim accesses regularly, the time of access, and the amount of time that is spent using the service.

We first show how an attacker can tamper with the group of nodes providing the netDB, until he controls most of these nodes. This is possible because I2P has a fixed maximum number of database nodes (only a small fraction of nodes in the entire network host the database). The set of nodes can be manipulated by exploiting the normal churn in the set of participating nodes or by carrying out a denial of service (DoS) attack to speed up the change. We show how a Sybil attack [6] can be used as an alternative approach to control the netDB.

By leveraging control over the network database, we demonstrate how an Eclipse [7, 8] attack can be launched. This results in services being unavailable or peers getting disconnected from the network. Finally, our deanonymization attack exploits the protocol used by peers to verify the successful storage of their peer information in the netDB. The storage and verification steps are done through two independent connections that can be linked based on timing. Using the information gathered by linking these two interactions, an attacker can determine (with high probability) which tunnel endpoints belong to specific participants (nodes) in the I2P network, and, therefore, deanonymize the participant.

Experimental results were gathered by tests performed both on our test network and on the real I2P network (against our victim nodes running the unmodified I2P software; no service disruption was caused to the actual users of the network).

In summary, the main contributions in this paper are the following:

1. A novel deanonymization attack against I2P, based on storage verification
2. Complete experimental evaluation of this attack in the real I2P network
3. Suggestions on how to improve the I2P to make it more robust

Title: Re: Brainstorming the ideal anonymity network
Post by: Rastaman Vibration on July 11, 2013, 03:29 am
Wow guys! Amazing info here! Thanks so much for sharing your wisdom

 8)
Title: Re: Brainstorming the ideal anonymity network
Post by: PingFail on July 11, 2013, 08:36 pm
@kmfkewm
I too have had similar thoughts and it is nice to find someone else to brainstorm with. We share the same goals and I would like to know your thoughts on this outline.


*****OVERVIEW*****
A network that consists of a mixed, decentralized, and distributed core; mixed PIR outer layers; variable latency via proof of work; multi data-type support; built-in multi-host network support (clearnet, I2P, Tor, etc. cross-communication); and the ability to completely hide who on the network is the recipient.


*****NETWORK*****
CORE
The core network is a mesh topology in which all members exchange all data. Similar to Freenet. Data is exchanged between machines using a mixed PIR retrieval system. Machines would accumulate data until the threshold is met. They would then mix and advertise their new data to the connections they have. The connections have the option to retrieve all, none, or some (Using PIR) of the advertised data.

Core members could choose whether or not they want their connection information added to a distributed list of connections (bootstrapping). This connection information could be an IP address, hostname, hidden service, etc. These core members would also be dedicating disk space and bandwidth to the network.

Data that is transmitted across the network consists of two parts. The first part is a small identifier for the second, larger part. The identifier would include the proof of work done on the message as well as an encrypted header for the second part. Once the accumulation threshold is reached, the first part is what is announced/sent to connections to advertise new data. If a connection already has one or some of the data being advertised, it will leave that data out of its request list to the announcer. This also allows the PIR operation to happen unannounced. Additionally, this group can/may be padded.
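
Here is a rough sketch of what I mean (Python; all field names are placeholders, not a wire format): a small identifier that gets advertised, and the larger body behind it.

Code:
from dataclasses import dataclass

@dataclass(frozen=True)
class Identifier:
    pow_nonce: int        # proof of work done on the message
    enc_header: bytes     # encrypted header describing the body

@dataclass(frozen=True)
class Message:
    ident: Identifier
    body: bytes           # the larger encrypted payload

def advertise(batch):
    # once the accumulation threshold is met, only identifiers go out
    return [m.ident for m in batch]

def request_list(adverts, already_have):
    # a connection skips anything it already holds (and can pad the list)
    return [i for i in adverts if i not in already_have]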

Proof of work level and data size would dictate the forwarding priority level. The higher the PoW, the lower the latency (assuming there is enough traffic to lower the latency at the time). Also, the smaller the data, the lower the latency. PoW would be required, and based on the size of the data being sent, but only at a minimal level.
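
Something like a hashcash-style scheme could do this (Python sketch; the size-to-difficulty rule here, one extra zero bit per doubling above 1 KB, is made up for illustration): senders spend more work to earn lower latency, and relays can verify it cheaply.

Code:
import hashlib, itertools

def bits_required(size_bytes, base_bits=8):
    return base_bits + max(0, (size_bytes // 1024).bit_length())

def mint(message, bits):
    for nonce in itertools.count():       # grind until the digest's top bits are zero
        digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - bits) == 0:
            return nonce

def verify(message, nonce, bits):
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0

msg = b"some payload"
bits = bits_required(len(msg))
nonce = mint(msg, bits)
assert verify(msg, nonce, bits)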

The sent data would be encrypted with the recipient's public key but could contain and be used for anything (messaging, transactions, publishing, file transfers, etc.).

*****SERVICES*****
REMAILER
A remailer would function as described earlier in this thread, with multi-layer encryption. This multi-layer encryption function can also be used to require recipients to have both sets of authentication.

This layering can also be used to transport other services from one location to another without leaving a traceable path.

CONTENT PUBLISHING
Publishers have the option to send to one person or to everyone. Since the system is distributed, if readers/recipients share the same decryption key then they would all be able to access the content from a single source.

Content can be sent directly to the requester, while the static framework for displaying the content remains accessible via the distributed, multi-person-accessible data. This would lower the overhead for both participants.

Other services could be built upon this transport structure: the network disconnects the sender and the receiver from each other, allowing services built on top of it to carry out whatever function they wish.

</brainstorming>

Title: Re: Brainstorming the ideal anonymity network
Post by: kmfkewm on July 12, 2013, 11:58 am
Quote
A network that consists of a mixed, decentralized, and distributed core; mixed PIR outer layers; variable latency via proof of work; multi data-type support; built-in multi-host network support (clearnet, I2P, Tor, etc. cross-communication); and the ability to completely hide who on the network is the recipient.

I think that mixing is very important for strong anonymity. PIR is also very useful but great care needs to be taken in how it is implemented. Cross network communication is a possibility, although it might be better to base it on top of Tor. In the case of routing nodes, it is better if they all share a common anonymity network.

Quote
CORE
The core network is a mesh topology in which all members exchange all data. Similar to Freenet.

In Freenet all members usually act as routing nodes and data stores, but all members don't share all data. All members sharing all data sounds more like BitMessage, and I don't like this because I think it will scale poorly. I also think that if all users are routing nodes the anonymity of the network will be hard to ensure; when clients act as routing nodes it tends to open up the risk of intersection attacks. Freenet has managed to do this in a pretty secure way, but I2P is very vulnerable to intersection attacks due to essentially all clients also being routers. BitMessage is also very weak to intersection attacks due to all clients being routing nodes. Tor has substantially protected its users from several sorts of intersection attack by not having all clients act as routing nodes.

Another issue with all nodes being routing nodes is that it dilutes the ability to mix messages. Mix networks are actually more secure with fewer routing nodes, because then more messages mix together at each hop. With a network like Tor, anonymity depends on the size of the network, because the goal is to prevent an attacker from observing both ends of a connection; the more nodes there are, and the more geographically distributed they are, the less likely it is that an arbitrary attacker can view traffic as it enters and exits the network. With a mix network, though, we generally assume the attacker is already capable of watching all links between all nodes, regardless of how many nodes there are. The anonymity of a mix network depends on the amount of traffic on the network, and particularly the amount of traffic passing through the utilized mixes; if there are only a few mixes, much more traffic passes through each of them, and therefore the anonymity provided significantly increases.
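
A back-of-the-envelope illustration of that last point (Python, with a made-up traffic figure): a fixed message load spread over fewer mixes means bigger batches, and bigger batches mean more messages mixing together at each hop.

Code:
traffic_per_hour = 10_000  # made-up network-wide message load
for mixes in (5, 50, 500):
    print(mixes, "mixes ->", traffic_per_hour // mixes, "messages per mix per hour")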

Quote
Data is exchanged between machines using a mixed PIR retrieval system. Machines would accumulate data until the threshold is met. They would then mix and advertise their new data to the connections they have. The connections have the option to retrieve all, none, or some (Using PIR) of the advertised data.

That is an interesting idea: a pull rather than push mix network. Normally in a mix network, a mix receives some number of messages, then reorders them and sends them on to the next mix on each message's path. Your proposal seems to be that mixes accumulate messages, then reorder them and advertise that they have them, at which point other nodes on the network can pull them with PIR. There are two important points to consider. The first is: how do the pulling nodes determine which messages they pull? In a traditional mix network, the client selects the path of mixes that the message travels through. The client first needs public keys for the mixes on the network, which it obtains via bootstrapping with a set of directory servers. So the client doesn't need to tell each mix individually that it will be used on the path; it merely constructs a message that will be routed through each of the mixes on its intended path. If the client instead has to contact each mix and tell it which messages to pull, the anonymity of the system is reduced to the anonymity the client has while making those requests. And if each message is tagged with the next mix on its path, the PIR is useless, since the mixes already know which messages are being pulled by which mix.

One possibility is for messages to be tagged with an ECDH shared secret generated from the mix's long-term public key and an ephemeral keypair generated by the client, with the ephemeral public key also attached to the message. If all mixes pull all attached ephemeral keys, they can derive shared secrets from them and then pull the messages whose tags match the shared secrets they generate with the attached ephemeral keys. This would require all mixes to obtain all ephemeral public keys though, and to derive ECDH shared secrets with all of them. That probably wouldn't scale very well.
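
A sketch of that tagging idea with X25519 (Python, requires the 'cryptography' package; the tag derivation is an assumption for illustration, not a defined protocol):

Code:
import hashlib
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

mix_key = X25519PrivateKey.generate()          # mix's long-term keypair
mix_pub = mix_key.public_key()

# Client side: ephemeral keypair -> shared secret -> message tag.
eph = X25519PrivateKey.generate()
tag = hashlib.sha256(eph.exchange(mix_pub)).digest()[:16]
# The message is published with (tag, eph.public_key()) attached.

# Mix side: derive the same secret from the attached ephemeral public key
# and pull any advertised message whose tag matches.
tag_mix = hashlib.sha256(mix_key.exchange(eph.public_key())).digest()[:16]
assert tag_mix == tag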

The second thing to take into consideration is that, generally, PIR that isn't everybody gets everything assumes that a message is present on several non-cooperating servers. This opens up the strong possibility of intersection attacks if there is a large distributed PIR network: if my outgoing message is on only 5 servers, and the next layer of mixes on its route pulls the message with PIR from those servers, then the message is identified as being one of the messages present on each of the 5 servers being pulled from. Unless all of the people routing messages use exactly the same path as me, this will severely degrade any provided anonymity.

Quote
Data that is transmitted across the network consists of two parts. The first part is a small identifier for the second, larger part. The identifier would include the proof of work done on the message as well as an encrypted header for the second part. Once the accumulation threshold is reached, the first part is what is announced/sent to connections to advertise new data. If a connection already has one or some of the data being advertised, it will leave that data out of its request list to the announcer. This also allows the PIR operation to happen unannounced. Additionally, this group can/may be padded.

Sounds a lot like BitMessage, and it will likely be weak to some of the same attacks that BitMessage is. For example, if Alice has a message before Bob does, then Bob could not be the sender of the message.
Title: Re: Brainstorming the ideal anonymity network
Post by: PingFail on July 13, 2013, 08:23 pm
I think that mixing is very important for strong anonymity. PIR is also very useful but great care needs to be taken in how it is implemented. Cross network communication is a possibility, although it might be better to base it on top of Tor. In the case of routing nodes, it is better if they all share a common anonymity network.
I was thinking about starting it on top of Tor to begin with, then expanding it out from there.

Quote
In Freenet all members usually act as routing nodes and data stores, but all members don't share all data. All members sharing all data sounds more like BitMessage, and I don't like this because I think it will scale poorly. I also think that if all users are routing nodes the anonymity of the network will be hard to ensure; when clients act as routing nodes it tends to open up the risk of intersection attacks. Freenet has managed to do this in a pretty secure way, but I2P is very vulnerable to intersection attacks due to essentially all clients also being routers. BitMessage is also very weak to intersection attacks due to all clients being routing nodes. Tor has substantially protected its users from several sorts of intersection attack by not having all clients act as routing nodes.

Another issue with all nodes being routing nodes is that it dilutes the ability to mix messages. Mix networks are actually more secure with fewer routing nodes, because then more messages mix together at each hop. With a network like Tor, anonymity depends on the size of the network, because the goal is to prevent an attacker from observing both ends of a connection; the more nodes there are, and the more geographically distributed they are, the less likely it is that an arbitrary attacker can view traffic as it enters and exits the network. With a mix network, though, we generally assume the attacker is already capable of watching all links between all nodes, regardless of how many nodes there are. The anonymity of a mix network depends on the amount of traffic on the network, and particularly the amount of traffic passing through the utilized mixes; if there are only a few mixes, much more traffic passes through each of them, and therefore the anonymity provided significantly increases.
Well, only the core members share all data. You can connect and not share all data if you like. It would also allow you to share what your neighbors have stored (or are sharing without storing).

 It could function like this:
- One of my neighbors alerts me to new data; they have collected and mixed X pieces of data before advertising the existence of this new data to me.
- This advertisement is done by sending me the header files.
- Once I have downloaded the header files, I can store them temporarily or for a longer period of time. Either way, I can mix and advertise them to my other connections (the mix would not necessarily use the same set of files). If none of them want or need any of the advertised files, then I do not have to request the data files from the original advertiser.
- I do not have to download or store all of the files on the network, especially if I am acting solely as support for the network.
- Back to me receiving the mixed header files from the advertiser: I can retrieve all, some, or none of the data files that go with the headers (depending on whether I already have some of the data, whether I only want to retrieve a few using PIR, or whether any of my connections request any of the files).

Quote
That is an interesting idea: a pull rather than push mix network. Normally in a mix network, a mix receives some number of messages, then reorders them and sends them on to the next mix on each message's path. Your proposal seems to be that mixes accumulate messages, then reorder them and advertise that they have them, at which point other nodes on the network can pull them with PIR.
Well, my proposal is to have a pushed mix and a PIR pull. Mixing would occur at every hop (it doesn't have to; it could be defined by the operator of the hop).

Quote
There are two important points to consider. The first is: how do the pulling nodes determine which messages they pull? In a traditional mix network, the client selects the path of mixes that the message travels through.
Relevance, request, and percentage.

Relevance
- I need one of the messages, so I either download them all or retrieve my message with PIR.

Request
- One of my connections needs the message, so I download it and/or relay it to them.

Percentage
- I download X percent of all messages. The percentage would not be hard set but rather a rough number (perhaps even randomized). This would allow me to store every few messages; if one of my connections needs a message, they would download it from me. If the person I retrieved my messages from goes offline, I now have a copy of some/all of those messages that the network can access.

Quote
The client first needs public keys for the mixes on the network, which it obtains via bootstrapping with a set of directory servers. So the client doesn't need to tell each mix individually that it will be used on the path; it merely constructs a message that will be routed through each of the mixes on its intended path. If the client instead has to contact each mix and tell it which messages to pull, the anonymity of the system is reduced to the anonymity the client has while making those requests. And if each message is tagged with the next mix on its path, the PIR is useless, since the mixes already know which messages are being pulled by which mix.
Clients certainly can have public keys. This would add additional functionality and features. I am trying to make a system that is overall simple but that can be used in different ways, making it as complex as you like. This would certainly add to that.

Paths are not pre-determined. Headers do go to everyone on this core network. If you don't need one, you don't have to store it or even forward it to others (that's up to the client; by default everyone shares with everyone). The body is requested, either by those that need it or by those that wish to store it.

Quote
The second thing to take into consideration is that, generally, PIR that isn't everybody gets everything assumes that a message is present on several non-cooperating servers. This opens up the strong possibility of intersection attacks if there is a large distributed PIR network: if my outgoing message is on only 5 servers, and the next layer of mixes on its route pulls the message with PIR from those servers, then the message is identified as being one of the messages present on each of the 5 servers being pulled from. Unless all of the people routing messages use exactly the same path as me, this will severely degrade any provided anonymity.
Well, even the first PIR request would ask for more than one message. Let's say it requests 3 from one of its connections; the connection either has all three stored, has at least one of the three stored, or has none of the three stored locally. Any locally stored messages would be provided to the requester; any non-stored messages would be requested from the other connections of that node. If 1 is the requester, 2 has none of the messages stored, 3 has one of the messages stored, and 4 has all of them stored, then the flow would look like this: 1 requests messages a, b, and c from 2; 2 requests messages a, b, and c from 3; 3 requests b and c from 4. 4 provides b and c; 3 provides a from local storage, plus messages b and c which it got from 4; 2 provides a, b, and c, which it gets from 3. A minimum threshold can apply where one node won't request only one message from another connection; instead it would request at least X messages using PIR, which would hide which messages a node has stored.
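
A toy walk-through of that chained pull (Python; the topology 1 - 2 - 3 - 4 and the stored sets are the hypothetical ones from the example): each hop serves what it has locally, forwards the rest, and may cache what passes through it.

Code:
stores = {2: set(), 3: {"a"}, 4: {"a", "b", "c"}}
chain = [2, 3, 4]

def fetch(wanted, hop=0):
    node = chain[hop]
    served = wanted & stores[node]             # answered from local storage
    missing = wanted - served
    if missing and hop + 1 < len(chain):
        served |= fetch(missing, hop + 1)      # ask the next connection
    stores[node] |= served                     # optional caching
    return served

print(fetch({"a", "b", "c"}))   # 3 serves a; 4 serves b and c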

Quote
Sounds a lot like BitMessage, and it will likely be weak to some of the same attacks that BitMessage is. For example, if Alice has a message before Bob does, then Bob could not be the sender of the message.
Indeed it does. It is also similar to Freenet. This is not limited to any particular data type; it could be used for a variety of things including messaging, web browsing, file sharing, content publishing, etc. I disagree: with thresholds and mixing, Bob may very well have the message before Alice, but either is waiting for a threshold to be reached, or mixed it out of the current threshold advertisement. The only way you know someone has a message is if they advertise it. Which messages were advertised would be tracked on a connection-to-connection basis, on both the advertiser's side and the ad recipient's side. The recipient would keep track of it so that PIR could occur.

This model also allows multi-latency support. Higher PoW would take precedence in making it into the current mix, but it is not a guarantee.
Title: Re: Brainstorming the ideal anonymity network
Post by: fucknuts on July 13, 2013, 08:36 pm
subbing

Thanks kmfkewm for this thread, and astor for adding his thoughts. Absolute treat to have such knowledgeable people here.