3. Anonymity concerns
A. In addition to scaling horribly, when it cannot scale any more it splits into sub networks
If all nodes receive all messages, it is natural to be concerned about the system’s scalability. To address
this, we propose that after the number of messages being sent through the Bitmessage network reaches a
certain threshold, nodes begin to self‐segregate into large clusters or streams. Users would start out using
only stream 1. The stream number is encoded into each address. Streams are arranged in a hierarchy.
This reduces the anonymity set size of the system to the userbase of an individual cluster. Essentially it breaks apart one giant everybody gets everything PIR network into a bunch of little everybody gets everything PIR networks. It also opens up room for a Sybil attack, an attacker could flood clusters with malicious nodes that fake network activity in the hopes of drowning out the number of legitimate users per cluster. They can filter off their own traffic and that could severely degrade the anonymity provided to the legitimate clients. Definitely see a lot of room for anonymity attack vectors based on segmentation of the network.
B. Clients acknowledge when they have received a message, opening it up to anonymity and security attacks
One thing I noticed is that clients sends acknowledge messages when they are able to decrypt a message. I realized that this could probably be used as a side channel to recover the user’s private key.
This can also be used as a deanonymizing attack. Alice is a local external attacker who observes IP address Bob. Over a period of several days, she monitors the amount of traffic Bob gets from the network and the amount of traffic Bob sends into the network. After getting this baseline measurement of Bobs activity, she sends 100 messages to a Bitmessage address of interest. If she observes Bob's outbound traffic increasing from average, and it correlates with the number of acknowledgements he has sent back into the network, Alice can correlate this and assume that Bob is linked to the Bitmessage address she sent all of her messages to.
To be fair the network also supports passive mode of operation, where Bob does not send acknowledgements to messages. However this is not the default behavior. It certainly should be. They also offer an alternative where Bob uses a number of middle nodes to send his message acknowledgements from. This is not adequate to protect from the previously mentioned attack, as Bobs outbound traffic will still increase significantly from his average in order for him to acknowledge all of Alice's messages to him.
C. Nodes keep an up to date index of all messages they have seen, and it can be remotely queried by design
This structure references inventory vectors. Inventory vectors are used for notifying other nodes about
objects they have or data which is being requested. Two rounds of SHA‐512 are used, resulting in a 64
byte hash. Only the first 32 bytes are used; the later 32 bytes are ignored. The rationale for not using SHA‐
256 is that Bitcoin uses SHA‐256 throughout, including for the proof‐of‐work for blocks. Bitmessage
should use a different algorithm so that using Bitcoin mining hardware to do Bitmessage POWs is at least
not completely trivial. SHA‐512 is used throughout Bitmessage (except for certain places within the
encryption implementation). Changing the POW algorithm to one designed to run “poorly” on GPUs and
custom hardware would be a positive change but rapidly developing GPGPU hardware makes it difficult
to judge what algorithms will be appropriate in the future.
So if Alice sends Bob a message, and then Bob sends Alice a reply to the message, Alice can query the inventory of every node on the stream and eliminate all nodes that don't have her message from possibly being Bob? That is an intersection attack that can likely quickly narrow in on Bob. Or can Bob get a message from Alice, and then query all nodes to see if they also have the message from Alice? That will make Alice weak to an intersection attack as well. Do that over a few messages and intersect the crowds, that will quickly allow Bob or Alice to identify each other, or at least greatly narrow in on their anonymity set sizes. There are solutions to this problem but they can find them out themselves
.
D. Clients may clear their caches in a predetermined time based fashion
Using this model, Client 1 downloads all objects stored by Client 2. We propose that each client store
objects for about two days and then delete them to reclaim disk space.
Do they store for exactly two days , down to the micro second, after they receive the messages? If so Bob can continuously query nodes for messages sent by Alice, and follow the messages progression through the network backwards by observing which nodes clear their caches first.
E. Can we count on them to use new initialization vectors for messages rebroadcast due to lack of acknowledgement? Probably not, given their previous crypto blunders.
If the receiver of a message is not
online to receive it during the two‐day window, the sender will notice that he never received an
acknowledgement and will resend the message after waiting an exponentially increasing amount of time.
Hopefully they encrypt it again with a new IV (hopefully they know what an IV is....) before resending it, or else another intersection attack is possible. If somebody resends the same exact ciphertext twice, it wasn't likely being sent to a person who was online in a two day period.
F. They are counting on sleeping to protect from cryptographic timing attacks that can be turned into anonymity attacks
Else if the decryption is not successful:
Sleep for a calculated amount of time (0.3 to several seconds depending on the
size of the message and the speed of your computer). Thus decrypting or
failing to decrypt the message will take the same amount of time.
Their solution to this problem is almost certainly bad. The answer to this sort of problem is pretty much invariably to implement constant time operations that are not based on timers / sleeping, but the actual logic of how the task is carried out. I would bet money that their solution to this problem is not secure. It is also horribly inefficient for the node to be trying to decrypt every single message anyway. Adding artificial time delay on top of that is going to slow things to a crawl. I don't understand how they have determined that they have failed to decrypt a message. You can decrypt a message with a key it wasn't encrypted for, it just produces gibberish. They should just decrypt all messages and then it is constant time and they don't need to sleep.
G. Intersection attacks out the ass
Let's say there is an attacker A and the network consists of 8 nodes, A[1..3], B, C, D, E, F.
A sends Bob (who is node B) a message. After getting the message, Bob sends an acknowledgement, which takes this route :
B -> C -> D -> A[1] -> F -> A[2] -> E -> A[3]
Now the attacker can conclude that Bob might be nodes B, C or D, but he is not nodes F or E. Over enough acknowledgements, or even messages sent in general for that matter, the attacker can narrow in on which node Bob is. This is an intersection attack.
They are a bit safer from internal than external attackers:
for example, assume Node 1 is the attacker and Bob sends an acknowledgement through it:
Bob -> Node 1 -> Node 2
Node 1 cannot tell that the acknowledgement originated at Bob, just as Node 2 cannot tell if the acknowledgement originated at Node 1. This is very similar to Freenet, although apparently with out link level encryption (they should really add link level encryption....)
Bob -> Bobs ISP -> Node 1 ISP -> Node 1 -> Node 1 ISP -> Node2 ISP -> Node2 -> Node 2 ISP
in this case, Bobs ISP can tell that the message originated at Bob, because there is not data incoming for all the data outgoing. Just like Node 2's ISP can tell that the outgoing data did not originate at Node 2, because there is a byte in for each byte out.
The attacker at Bobs ISP who is also forcing the unknown owner of Bobs address to send a lot of acknowledgement messages can still send Bobs address a lot of messages and wait for acknowledgements to be sent. First they observe Bobs in and out traffic ratio over a period of some days to establish an average, then they send Bob 100 messages and wait for acknowledgements. Now they compare Bobs in:out ratio average over the observation period to Bobs in:out average over the period around when they sent Bob 100 messages. If bob sends a number of bytes more out than he gets in, that correlates with the number of messages the attacker sent Bob, they can guess that Bob is their target. They can also rule out nodes that don't have an in:out ratio consistent with someone who acknowledged 100 messages. This is the case if Bob forwards on the message intended for him in addition to the acknowledgement included encrypted in the message. When Bob gets a message of 50 KB that has 49KB of acknowledgement data in it, he will have 50 KB in and 99KB out (50KB out to rebroadcast the original message, 49KB out for the separate acknowledge message). If the attacker sends Bob 100 such messages, they will cause him to take in 5,000 KB and send out 9,900 KB, a ratio difference the attacker will notice at their position at Bobs ISP. The nodes Bob routes these messages to will be getting 9,900 KB in from Bob and sending 9.900 KB out. And of course Bob cannot stop routing a message sent to him after he gets it, otherwise the attacker will notice that Bob got a message of 12 KB in and never sent any messages of 12KB out, then they can use one of their internal nodes to find the 12KB message and then they know it was sent to Bob.
Of course this attack is not even required though because apparently there is no link level encryption in the first place. So Bobs ISP doesn't even need to count packets, they can compare the content of packets in and packets out and only pay attention to packets out that don't have a data equivalent packet in.
H. Lack of link level encryption fucks anonymity up entirely from a local external attacker
The lack of link level encryption actually completely ruins unlinkability of communicating parties in the face of a local external attacker. If Alice requests the public key of Bob through the network, her ISP can see the key request coming from Alice and also determine that no key request came into Alice. Thus Alice's ISP can conclude that Alice requested Bobs public key and therefor link Alice to Bob.
Actually the lack of link level encryption totally fucks anonymity all together in the face of a local external attacker. If Alice sends a message that has her long term ECDH public key attached to it (which in itself is stupid), her ISP can see that this message was sent by Alice but did not get sent to Alice. Since Alice's address is derived from her public ECDH key, this actually let's Alice's ISP identify her bitmessage address.
I. Lack of swarming or padding messages to a constant size fucks anonymity from a local external attacker even if link level encryption is used
Actually even link level encryption is not perfect. If Alice's attacker owns an internal node on the network and is also local external, they can determine that the message originated at Alice by counting packets in and out like before, and additionally they can determine the size of the message by counting packets. Now when a message of that size passes through their internal node, they can determine that this is the message Alice sent by size correlation and now they can get Alice's address from the message + they already know her IP address from their local external positioning. Maybe they should look at Freenet some, Freenet solves this problem by swarming messages split up over several nodes, with each message chunk being the same size. The way Freenet does it, the attacker can still count chunks in and out from their local external positioning, but when *some* of those chunks pass through their internal nodes they are the same size as all other chunks.