Warning: I am really just brainstorming here, and in the process of writing this post I came to the conclusion that some of the trains of thought I was following would not work very well. I didn't originally plan to, but several times my train of thought abruptly diverged while writing this. Most people will probably be very bored reading this, but I am going to post it anyway on the off chance that somebody reads it and gives it some thought. I find that I can think more clearly about a subject when I talk things out as if I am talking to people, whether or not anybody is actually listening.
I am thinking something along these lines:
A. Native / default support for layering with Tor
This will pretty much be required to get a user base to start with. People already trust Tor, and Tor is already big enough that it can provide some anonymity. Layering Tor with another network can only help anonymity, and Tor has already put a lot of focus on blocking resistance / membership concealment, features that would be silly to try to reproduce in a new network when Tor already provides them adequately.
B. P2P network
I definitely think that all nodes should route by default. If all nodes run as hidden services, we can get around the issue you pointed out of not all users being able to route due to being behind NAT. I suppose it would be H2H, hidden service to hidden service. Normally I would strongly advise against anything where users run as hidden services, but if we can add plausible deniability and protection from timing attacks to the picture, it should be acceptable. I think the only way we are going to be able to add plausible deniability is if all nodes route for all nodes. I also like the idea of all nodes providing some hard drive space to all nodes; this would allow for distributed content that is very resistant to censorship.
C. Support for multiple use cases: exiting to clearnet, centralized hidden services, distributed content storage, internal E-mail
By designing for as many use cases as possible, we may be able to attract a large number of users. The primary issue will be designing the system such that all of the different sorts of traffic blend together, and do so while consuming reasonable amounts of bandwidth. I will need to give more thought to how these different sorts of traffic can be made to blend together.
I have some basic ideas for how traditional hidden services could be provided by a mix network. Single use reply blocks (SURBs), introduced by Mixminion, seem to be an acceptable way to get this. When a user sends a forward message through a mix network, they first construct a layer-encrypted cryptographic packet that securely routes the message payload through the network, with a layer of encryption removed at each hop, starting from the node closest to the sender and ending at the node furthest from the sender. SURBs are cryptographic packets that route toward the person who constructs them. Essentially, Alice creates a SURB and sends it through the mix network to Bob as a forward message, as described previously. After Bob obtains the SURB, he can attach it to an outgoing message and the message is routed to Alice. From a SURB Bob only learns the node closest to himself; all of the other nodes are known only to Alice and to their neighboring nodes. One primary difference between normal forward routing and routing with a SURB is that in the former case the payload has a layer of encryption stripped from it at each mix, whereas in the latter case a layer of encryption is usually added to the payload at each mix.
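To make this concrete, here is a minimal sketch (my own illustration, not any particular deployed design) of forward onion wrapping versus SURB-style reply routing. It uses symmetric Fernet keys from Python's cryptography library as stand-ins for real mix keys, and it omits routing headers, padding, and replay protection entirely.

```
# Minimal sketch: forward onion routing vs. SURB-style reply routing.
# Node names, path, and keys are all hypothetical stand-ins.
from cryptography.fernet import Fernet

keys = {name: Fernet(Fernet.generate_key()) for name in ["mix1", "mix2", "mix3"]}
path = ["mix1", "mix2", "mix3"]          # mix1 is the node closest to the sender

def wrap_forward(payload: bytes) -> bytes:
    # Sender wraps once per hop, innermost layer for the last hop.
    for hop in reversed(path):
        payload = keys[hop].encrypt(payload)
    return payload

def route_forward(onion: bytes) -> bytes:
    # Each mix strips its own layer as the message moves forward.
    for hop in path:
        onion = keys[hop].decrypt(onion)
    return onion

def route_reply(payload: bytes) -> bytes:
    # SURB-style: each mix ADDS a layer as the reply travels toward Alice.
    for hop in path:
        payload = keys[hop].encrypt(payload)
    return payload

def unwrap_reply(blob: bytes) -> bytes:
    # Only Alice, who chose the path when building the SURB, can peel them all.
    for hop in reversed(path):
        blob = keys[hop].decrypt(blob)
    return blob

assert route_forward(wrap_forward(b"hello")) == b"hello"
assert unwrap_reply(route_reply(b"reply")) == b"reply"
```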
It seems to me that a hidden service could create SURBs that route data to it and then publish them somewhere. After clients obtain a SURB for the hidden service they want to access, they can use it to establish a connection to the hidden service, sending along their own SURB for the hidden service to reply to. Essentially this is exactly the way remailer networks use SURBs, but applied to a client <-> centralized hidden service model instead of E-mail. This is not a well thought out idea at this point; primarily, we would still need a way for clients to obtain the SURBs in the first place, and for hidden services to refresh the supply of SURBs available to clients.
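As a toy illustration of that flow, here is a sketch in which the publication point and the mix network are both reduced to in-memory dictionaries; every name here is hypothetical, and how SURBs would really be published and replenished is exactly the open question above.

```
# Toy sketch: a hidden service publishes single-use SURBs, a client grabs one
# and attaches its own reply SURB. The mix network is reduced to a dict
# mapping each SURB to the mailbox it routes to, purely to show the flow.
from collections import deque

directory = {}     # hidden-service name -> unused SURBs
mailboxes = {}     # party name -> delivered messages
surb_owner = {}    # SURB id -> party it routes to (stand-in for the mixnet)

def make_surbs(owner, count):
    mailboxes.setdefault(owner, deque())
    surbs = [f"{owner}-surb-{i}" for i in range(count)]
    surb_owner.update({s: owner for s in surbs})
    return surbs

def send_via_surb(surb, payload):
    mailboxes[surb_owner.pop(surb)].append(payload)   # single use

# Hidden service side: mint SURBs and publish them somewhere public.
directory["hiddensvc"] = deque(make_surbs("hiddensvc", 10))

# Client side: fetch one published SURB, send a request plus its own reply SURB.
my_reply_surb = make_surbs("client", 1)[0]
send_via_surb(directory["hiddensvc"].popleft(), ("GET /", my_reply_surb))

# Hidden service side: answer along the client's SURB.
request, reply_surb = mailboxes["hiddensvc"].popleft()
send_via_surb(reply_surb, "response to " + request)
assert mailboxes["client"].popleft() == "response to GET /"
```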
Another issue is that SURBs are not particularly secure as they are usually used. With SURBs, a global passive adversary can usually link two communicating parties together after only several dozen to several thousand messages have been exchanged between them. Due to the limited anonymity of SURBs, state of the art theoretical remailers have ditched them altogether in favor of Private Information Retrieval (PIR). In the remailers that use PIR instead of SURBs, there is a mix network as well as a nymserver network and a PIR network. Users register an address at a nymserver through the mix network, and can control their account through the mix network as well. When Alice sends a message to Bob, she sends it through the mix network to Bob's nymserver. Bob's nymserver then batches Alice's message together with all other messages to Bob that arrived in a given time period (called a cycle), adding them to a bucket. After each cycle completes, Bob's nymserver distributes all of its users' buckets to the PIR network. Every cycle, Bob engages in the PIR protocol with some number of the PIR nodes in order to obtain his bucket (hundreds of cycles worth of buckets can be stored at a time, and Bob can go through them at his leisure, so long as he always engages in the PIR protocol once for each completed cycle). This system enormously strengthens the anonymity of bidirectional communications through a mix network, as it removes the limit on the number of messages Alice and Bob can exchange before their anonymity suffers in the face of a global passive adversary. This comes at the expense of requiring a massive infrastructure to support it (mix network + nymserver network + PIR network), and of having semi-trusted intermediaries (the nymservers).
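For concreteness, here is a minimal sketch of one simple flavor of PIR, a two-server XOR scheme, which is not necessarily what those designs actually use. It shows how Bob could fetch his fixed-size bucket for a cycle without either server learning which bucket he asked for, assuming the two servers do not collude; all names and sizes are made up.

```
# Two-server XOR PIR sketch: Bob fetches one cycle's bucket without either
# (non-colluding) server learning which bucket he wanted. Buckets are padded
# to a fixed size so the amount fetched never depends on the user.
import secrets

BUCKET_SIZE = 32

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, query):
    # Each server XORs together the buckets its query bit-vector selects.
    out = bytes(BUCKET_SIZE)
    for bucket, bit in zip(db, query):
        if bit:
            out = xor(out, bucket)
    return out

def fetch_bucket(db_a, db_b, index):
    # Bob's side: two queries that differ only at his slot.
    n = len(db_a)
    q_a = [secrets.randbelow(2) for _ in range(n)]
    q_b = q_a.copy()
    q_b[index] ^= 1
    return xor(server_answer(db_a, q_a), server_answer(db_b, q_b))

# Demo: the nymserver pads each user's bucket and replicates it to both servers.
db = [f"bucket for user {i}".encode().ljust(BUCKET_SIZE, b"\0") for i in range(8)]
assert fetch_bucket(db, db, 3) == db[3]
```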
For E-mail type systems I would definitely be in favor of using something based on PIR, and PIR based systems are widely recognized as being far more anonymous than SURB based systems. However, for a traditional hidden service I don't think it is possible to use PIR. This does bring up a bit of a problem with integrating E-mail and hidden services into the same network: in the case of E-mail it would be better to use PIR, but then it would be using a different mechanism than the hidden services. Perhaps, by having plausible deniability due to all nodes routing for all nodes, we can use SURBs for E-mail messages without the traditional weakness to long term intersection attacks by a global passive adversary (GPA). Let's break down the risks of SURBs. Here is a quote from a paper that discusses the problems with them:
Nym servers based on reply blocks (discussed in Section 2.1 above) are currently the most popular option for receiving messages pseudonymously. Nevertheless, they are especially vulnerable to end-to-end traffic analysis.
Suppose an adversary is eavesdropping on the nym server, and on all recipients. The attacker wants to know which user (call her Alice) is associated with a given pseudonym (say, nym33). The adversary can mount an intersection attack, by noticing that Alice receives more messages, on average, after the nym server has received a message for nym33 than when it has not. Over time, the adversary will notice that this correlation holds for Alice but not for other users, and deduce that Alice is likely associated with nym33.
Recent work [19, 43] has studied an implementation of these intersection attacks called statistical disclosure, where an attacker compares network behavior when Alice has sent to network behavior when she is absent in order to link an anonymous sender Alice to her regular recipients Bob_1 ... Bob_N. Against pseudonymous recipients, however, these attacks are far easier: in the anonymity case, many senders may send to any given recipient Bob_i, but with pseudonymous delivery, only one user sends or receives messages for a given pseudonym.
To examine this effect, we ran a version of the attack simulations described in [43], assuming a single target pseudonym and N non-target pseudonyms providing cover. In order to make the attack as difficult as possible (and thus establish an upper bound on security), we assume that users behave identically: they receive messages with equal probability according to independent geometric distributions in each unit of time (receiving no messages with probability 1 - P_M); they use identical reply blocks with path length ℓ through mixes in a steady state that delay each message each round with probability P_D.
We ran the simulated attack with different values for P_M, P_D, and ℓ, against a nym server with N = 2^16 active pseudonymous users. (This is probably an overestimate of the number of users on typical nymservers today [45].) We performed 100 trials for each set of parameters. In the worst case (for the nym holder), when P_M = 0.5, ℓ = 1, P_D = 0.1, the lack of mix-net delay variance allowed the simulated attacker to guess the user's identity correctly after the user received an average of only 37 messages. In the best simulated case (P_M = 0.5, P_D = 0.9, ℓ = 4), the user received an average of only 1775 messages before the attacker guessed correctly. For an active user, this is well within a month's expected traffic.
Although there are ways to use dummy traffic to resist statistical disclosure attacks, these are difficult to implement perfectly in practice (due to outages) and even slight imperfections render users vulnerable to attack [43].
So in summary, the risk of SURBs is that Alice can obtain X reply blocks for Bob, use them to send X messages to Bob, and then watch to see whether any of the IP addresses using the network receives X more messages than usual. It is possible to protect against this if the nymserver stores SURBs for Bob and only sends messages out in fixed cycles, but in that case the nymserver is still capable of attacking Bob's anonymity by not respecting the cycle scheme. There is a fundamental difference in the PIR based systems that protects against this attack: with SURBs, Alice (or the nymserver) pushes messages to Bob, whereas with the PIR designs Bob pulls messages from the PIR network. With the PIR systems, Bob only pulls X bytes of data per cycle in all cases; with SURBs, Bob has as much data pushed to him as the people holding SURBs for him desire to push.
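Here is a toy simulation of that attack, just to show its shape; the user count, probabilities, and round structure are made-up parameters of mine, not the ones from the quoted paper.

```
# Toy statistical disclosure / intersection attack: compare how much each user
# receives in rounds when the attacker mailed the target pseudonym vs. rounds
# when it did not. Mix delays are ignored, so this is the attacker's best case.
import random

USERS, ROUNDS, P_BACKGROUND = 50, 2000, 0.3
target = 7                                    # nym33 really belongs to user 7

with_msg, without_msg = [0.0] * USERS, [0.0] * USERS
rounds_with = rounds_without = 0

for _ in range(ROUNDS):
    attacker_sends = random.random() < 0.5    # attacker mails nym33 this round
    for u in range(USERS):
        received = int(random.random() < P_BACKGROUND)   # background traffic
        if attacker_sends and u == target:
            received += 1                     # the spammed message also arrives
        if attacker_sends:
            with_msg[u] += received
        else:
            without_msg[u] += received
    if attacker_sends:
        rounds_with += 1
    else:
        rounds_without += 1

# The user whose average intake rises most when nym33 is spammed is the guess.
diffs = [with_msg[u] / max(rounds_with, 1) - without_msg[u] / max(rounds_without, 1)
         for u in range(USERS)]
print("attacker's guess:", diffs.index(max(diffs)))       # almost always user 7
```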
In the case of everybody-gets-everything PIR, this attack is protected against even if data is pushed rather than pulled. For example, with BitMessage, if somebody sends Bob X messages, every node on the network receives X additional messages, making the intersection attack impossible to carry out. So the fundamental mechanism behind the insecurity attributed to SURBs is that they can be used by an attacker to cause Bob to receive a unique amount of data. It seems possible to use SURBs with nearly the same security as PIR if Bob receives messages at a nymserver and then sends the nymserver SURBs in fixed duration cycles, with the network itself enforcing the size of all routed messages (a timestamp enforcement mechanism would also be required). Since Bob determines when the nymserver has a SURB for him, and since the nymserver can only route X bytes to Bob per SURB, this seems to avoid the specific attack attributed to SURBs. Of course, using SURBs for message retrieval still lowers the anonymity to that which can be provided by a mix network, whereas PIR can guarantee anonymity unless all of the utilized PIR servers are compromised (or even if they all are, in the case of everybody-gets-everything).
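A small sketch of that fixed-cycle idea, with made-up names and a made-up cell size, just to show why the traffic observable at Bob's end stays constant no matter how much is queued for him:

```
# Fixed-cycle SURB retrieval sketch: Bob deposits exactly one SURB per cycle,
# and the nymserver releases exactly one fixed-size (padded) cell per SURB,
# whether or not any real mail is waiting. Flooding Bob's pseudonym only
# makes his backlog drain more slowly; his observable traffic never changes.
from collections import deque

CELL_SIZE = 512                        # network-enforced message size

class Nymserver:
    def __init__(self):
        self.inbox = deque()           # messages pushed to Bob's pseudonym
        self.surbs = deque()           # SURBs Bob has deposited

    def receive_for_nym(self, msg):
        self.inbox.append(msg)

    def deposit_surb(self, surb):
        self.surbs.append(surb)

    def end_of_cycle(self):
        surb = self.surbs.popleft()
        payload = self.inbox.popleft() if self.inbox else b""   # dummy if empty
        return surb, payload.ljust(CELL_SIZE, b"\0")[:CELL_SIZE]

ns = Nymserver()
for i in range(100):                   # attacker floods the pseudonym
    ns.receive_for_nym(f"spam {i}".encode())
for cycle in range(3):
    ns.deposit_surb(f"surb-{cycle}")
    surb, cell = ns.end_of_cycle()
    assert len(cell) == CELL_SIZE      # every cycle emits exactly one cell
```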
Anyway, I am getting a little off topic, because a hidden service could not utilize a nymserver like this and still serve hundreds of clients. The original goal was to determine whether all nodes routing for all nodes protects against this weakness of SURBs. The answer is obviously that it does to an extent, but probably not enough of one. If the attacker is a GPA, they will see several nodes get X messages after spamming Bob, but they will be able to follow the flow, and it will end at Bob. Bob will maintain plausible deniability against a local internal attacker; after all, he could simply be forwarding the X messages onward himself. However, a local external attacker will be able to defeat his plausible deniability. The only way Bob could protect against this is by sending out a number of dummy messages equivalent to the number of legitimate messages he obtains for himself. However, this will only create a crowd, and unless it is an everybody-gets-everything PIR like BitMessage, the attack can still be used to narrow in on Bob. Of course it is probably asking too much to protect a nearly-low-latency network from a GPA, and this would still protect against a local internal attacker.