What I've learned is that Tor is safe as long as you use the Tor browser to access .onion sites (aka "hidden services") within the Tor network and keep your private information private. Every published paper I've read on global network monitoring and such has to do with activity at the edges: traffic into and out of the Tor network, which happens when people use Tor to access clearnet sites. Clearnet browsing IS a big risk to anonymity because your IP's traffic can be correlated with traffic leaving the Tor network (often unencrypted). The connection goes something like this:
Yes, the direct deanonymizing attacks against Tor require edge surveillance. No, edge surveillance is not only possible in the case of connections to clearnet sites.
User <###> (entry node) <###> (relays) <###> (exit node) <---> website
"<###>" means encrypted connection
"<--->" means unencrypted connection
"(relays)" means at least one relay, maybe a chain of relays
Correct
You can see there is considerable vulnerability from the fact that the final connection to the website is completely unencrypted, which means that the exit node can see exactly what you're doing on the website. This is why Tor browser comes with "HTTPS Everywhere" installed by default. This way the exit node doesn't see unencrypted communications (assuming the site's SSL is configured securely), although it can still see the exact website you're visiting. But to monitor all traffic within the Tor network would require the adversary to own all the nodes/relays.
Wrong on a few counts. First of all, although you seem to understand the difference between communications security and traffic analysis, you are still somewhat confusing the issues by continuing to talk about encryption. Traffic analysis works regardless of whether the communications are encrypted, so as far as anonymity goes it is a good bet to just pretend that all the traffic is encrypted in any case. Of course if the traffic isn't encrypted at the entry, deanonymizing would be much easier, but an attacker can still do edge attacks, aka traffic confirmation, aka end point timing correlation, even if the packets are encrypted. Almost all attacks against anonymity solutions are concerned with packet metadata, such as time of arrival, not the actual payload data of individual packets. Additionally, you are correct to say that a purely active/internal attacker must own all Tor nodes in order to see all Tor traffic. However, it is possible to see all Tor traffic without owning a single node, by monitoring the traffic into and out of nodes at the ISP or IX level. Less sophisticated attackers, hopefully the FBI and DEA falling into this category, would attack Tor by adding nodes to the network. Powerful attackers such as the NSA would monitor Tor node traffic from ISPs / IXs, passively, without having to add any nodes to the network at all.
Hidden services are within the Tor network and follow the onion router protocol, so there's no exit node and everything is 100% end-to-end encrypted. The protocol is designed to ensure anonymous communication between the hidden server and the user. The way it works is the hidden service (HS) publishes a list with a number of relays it designates as "Introduction Points" (IP) that it listens to for connection requests. When a User wants to connect to the HS, he must select a different relay as a "Rendezvous" (R) and create a circuit to it, then build a circuit to one of the IPs to tell the HS the rendezvous point relay, which the HS then builds a circuit to. Then the user can close the IP circuit and communicate with the HS at the rendezvous point. It looks like this:
Yes, everything is 100% encrypted up to the hidden service, but this doesn't mean much in terms of anonymity. Well, to be fair it means a lot, because if the traffic is not encrypted en route then the attacker could just spy on it at your entry node to deanonymize you. But as I said before, when contemplating attacks on anonymity systems, more often than not it is safe to work from the assumption that all of the traffic is encrypted, because most anonymity attacks are concerned with packet metadata, which is available regardless of whether the traffic payload data is encrypted or plaintext. The difference is between communications privacy and communications anonymity; although anonymity massively benefits from encryption, a large majority of anonymity attacks remain viable even if the traffic is layer encrypted end to end. Hidden service connections being encrypted end to end provides you with communications privacy: an attacker at an exit node can no longer eavesdrop on your communications. Hidden service connections being encrypted end to end has virtually no impact on your anonymity: the packet arrival timing metadata is still available, and this is what is required to do the most feared deanonymizing attack against Tor (traffic confirmation, end point timing attack).
(1) User selects some relay as a rendezvous point (R) and creates circuit:
User <###> (entry node) <###> (relays) <###> (R)
(2) User connects to introduction point (IP) to tell HS the rendezvous relay:
User <###> (entry node) <###> (relays) <###> (IP) <###> (relays) <###> HS
(3) User connects to the HS using the circuit to the rendezvous:
User <###> (entry node) <###> (relays) <###> (R) <###> (relays) <###> HS
Yes you are correct about this. In the case of a connection to a clearnet site, active/internal timing attacks look like this:
user <###> Adversary Owned Entry <###> Good Middle <###> Adversary Owned Exit <---> Destination Server
The adversary can link the stream through the entry to the stream through the exit with statistics, using the packet arrival metadata, which exists regardless of whether the packet is plaintext or ciphertext.
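To make "link the streams with statistics" a bit more concrete, here is a rough toy sketch in Python. It is not how any real attacker or any Tor tooling does it; the traffic is synthetic, and every name and parameter (`bin_counts`, `through_network`, the 2 second window, the 0.3 second delay) is an assumption made up for illustration. It only shows the idea that packet counts per time window seen at the entry line up with the counts seen at the exit for the same stream, and not for an unrelated stream, without ever looking at a payload.

```python
import random

def bin_counts(timestamps, window, n_bins):
    """Count packets per time window. Only arrival times are used, never payloads."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t / window)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length count series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def synthetic_stream(rng, duration, bursts):
    """Hypothetical bursty traffic: a few bursts of 5 to 20 packets each."""
    stamps = []
    for _ in range(bursts):
        start = rng.uniform(0, duration)
        for _ in range(rng.randint(5, 20)):
            stamps.append(start + rng.uniform(0, 0.5))
    return sorted(stamps)

def through_network(rng, timestamps, base_delay=0.3, jitter=0.05):
    """Same packets seen later at the far edge: fixed delay plus a little jitter (assumed values)."""
    return sorted(t + base_delay + rng.uniform(0, jitter) for t in timestamps)

rng = random.Random(1)
duration, window = 60.0, 2.0
n_bins = int(duration / window) + 2

entry_stream = synthetic_stream(rng, duration, 15)                        # seen at the entry
exit_same = through_network(rng, entry_stream)                            # same stream at the exit
exit_other = through_network(rng, synthetic_stream(rng, duration, 15))    # somebody else's stream

entry_counts = bin_counts(entry_stream, window, n_bins)
score_same = pearson(entry_counts, bin_counts(exit_same, window, n_bins))
score_other = pearson(entry_counts, bin_counts(exit_other, window, n_bins))
print("same stream:", round(score_same, 3), "unrelated stream:", round(score_other, 3))
```

The matching stream should score much higher than the unrelated one. Nothing in the sketch depends on whether the packets are plaintext or ciphertext, which is the whole point.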
In the case of a hidden service, the completed circuit and internal timing attack look like this:
User <###> Adversary Owned Entry <###> Good Middle <###> Good Rendezvous <###> HS Good Final <###> HS Good Middle <###> Adversary Owned Entry <###> Hidden Service server
The attack is carried out in the same way, but now instead of having to own the client's entry and exit nodes, the attacker needs to own the client's entry node and the hidden service's entry node. Of course they will only see a connection to an IP address, and they cannot by this alone determine that the IP address is the hidden service. However, as they own one of the hidden service's entry nodes, they can do this:
Adversary <###> (Adversary's circuit to hidden service) <###> (Hidden Service Relays to adversary's circuit) <###> Adversary Owned Entry Node <###> Hidden Service
As the adversary is connecting to the hidden service with its .onion address, their timing attack can identify the hidden service once the packets they send as a client pass through their entry guard. Now they have identified the hidden service's IP address, and when they do their timing attack against regular users in the future, they know that the regular users are connecting to the hidden service instead of just some IP address that is not identified as being linked to any particular hidden service.
Also, an attacker can trace up to a hidden service's good entry guards in the following way:
Adversary <###> (Adversary's circuit to hidden service) <###> (Hidden Service Relays to adversary's circuit) <###> Entry Node <###> Hidden Service
Every time the adversary creates a connection to the hidden service, the relays on the hidden service's side of the circuit (between its entry node and the adversary's circuit) change, selected from the current pool of available Tor nodes, as determined by the Tor circuit construction protocol. The hidden service's entry node is selected from one of three nodes it has selected as guards, which currently rotate about once every month to two months. The attack is simply brute force: build a circuit to the hidden service, send a packet down it, close the circuit, rinse and repeat. The adversary can choose to use a rendezvous point that they own, allowing them to identify the final node from the hidden service. Eventually, after forcing the hidden service to open enough new circuits, the adversary will have one of their Tor relays on the circuit to the hidden service. Now they do timing attacks on their nodes, looking to see if one of the packets they send to the hidden service travels to it through one of their nodes. If they have the final node, they will be able to identify the middle node. If they own the middle node, they will be able to identify the entry guard, and they will know it is the middle node as they can identify the final node from their rendezvous point. And if one of their nodes forwards a packet to an address that is not a public Tor relay, they will know it is the hidden service and that they are its entry guard (which will be easy to confirm as they will see a LOT of the packets they send to the hidden service). This attack allows for quick tracing of hidden services up to their entry guards.
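For a rough feel of why "rinse and repeat" works, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each forced circuit independently puts an adversary relay in a given non-guard position with probability f. Real Tor weights relay selection by bandwidth and flags, so f and the function names here are hypothetical and the numbers are only indicative.

```python
import math

def prob_hit(f, n):
    """Chance that at least one of n forced circuits includes an adversary relay
    in the position of interest, assuming independent picks with probability f."""
    return 1.0 - (1.0 - f) ** n

def circuits_needed(f, target):
    """Smallest number of circuits for which prob_hit(f, n) reaches the target probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - f))

f = 0.01  # hypothetical: adversary is picked for that position 1% of the time
for target in (0.5, 0.95, 0.99):
    n = circuits_needed(f, target)
    print(f"target {target:.2f}: about {n} circuits (actual chance {prob_hit(f, n):.3f})")
```

Since building a circuit and sending one packet is cheap, even a small f gets the adversary onto the hidden service's circuit after a few hundred attempts under these assumptions.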
Additionally, it is incorrect to say that you are required to own Tor relays to do these attacks. In reality you are only required to be able to observe traffic going into and out of Tor relays. Attackers who cannot gain access to ISPs / IXs will have to resort to either running relays or hacking into operating relays. Powerful attackers such as the NSA can certainly spy on good Tor nodes (especially in the USA) to observe the traffic entering and exiting from them, and thus they do not have much motivation to add their own nodes to the network, especially as active surveillance is much easier to identify than passive surveillance.
This whole process is accomplished by the Vidalia client and requires no input from the user beyond entering the .onion address into the Tor browser. There's also a lot of rigorous PKI trust validation of relays, key-exchanges, asymmetric crypto, etc. happening at each step throughout the process (but I'm leaving that out because it's complicated enough to explain already). They can't 'crack' PGP (unless in individual cases when it's used incorrectly), so they cannot read the actual data being transmitted within the Tor network because it's all encrypted (and signed and verified). Communications are completely anonymized once they leave your entry node.
Someone already corrected you on this, and indeed Tor is what manages everything; Vidalia is merely a graphical user interface that allows you to control some of what Tor does. You can access hidden services without using Vidalia at all. Also, the cryptography is all but entirely irrelevant to the counter traffic analysis properties of Tor; although unencrypted traffic being sent from entry to exit would be almost completely incompatible with anonymity, the direct attacks on anonymity protocols that are studied today pretty much all work with the assumption that the traffic is end to end encrypted.
tl;dr:
As long as the entry node is not compromised, Tor is safe. If the entry node *is* compromised, you are still safe unless:
A. You're using Tor browser to connect to clearnet sites, or
B. You connect to a hidden service whose entry node is also compromised AND the adversary knows it is used by the hidden server
In order for B to be true, the hidden service itself has to already be de-anonymized and its IP address known. But that would be the hidden service that is compromised, not Tor itself.
A is mostly true, although I would clarify that:
1. The entry guard can be actively compromised, meaning that an attacker owns it
2. The entry guard can be passively compromised, meaning that it uses an ISP or IX that spies on it
3. The entry guard and its ISP/IX can both be good, but you can still be deanonymized if YOU are being monitored by your ISP or IX
B is true, and I guess you actually have understood that risk from the get go. However, if you own a hidden service's entry guard, it is trivial to determine that you do. Also I guess I should point out that you are less likely to use an entry guard owned by the same attacker who owns the hidden service's entry guard than you are to use a malicious exit node owned by the same person who owns your entry node, because you use a new exit node roughly once every ten minutes, whereas the hidden service's entry guard is selected from three guards that change only once every month to two months. So you are afforded some extra protection in this case. However, it is not that hard for an attacker to identify the entry guards used by a hidden service, and it is likely a fairly safe bet that a half decent attacker could put a hidden service under passive surveillance. Actual evidence points to the FBI not being such a skilled attacker, but this is likely due to incompetence on their part rather than the inherent security of Tor.
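To put rough numbers on the exit rotation versus guard rotation point, here is a small Python sketch. It assumes an adversary who is picked with the same hypothetical probability f for either position, a client who keeps Tor busy all month with a fresh exit every ten minutes, and a hidden service that goes through three guards in that month. All of those are simplifying assumptions for illustration, not measurements.

```python
def at_least_one_bad(f, draws):
    """Chance that at least one of `draws` independent picks lands on the adversary."""
    return 1.0 - (1.0 - f) ** draws

f = 0.01  # hypothetical share of picks that land on the adversary

exit_draws_per_month = 30 * 24 * 6   # a new exit roughly every ten minutes, nonstop use assumed
guard_draws_per_month = 3            # three guards, assuming the fastest (monthly) rotation

p_bad_exit = at_least_one_bad(f, exit_draws_per_month)
p_bad_hs_guard = at_least_one_bad(f, guard_draws_per_month)

print(f"some circuit this month uses a bad exit:     {p_bad_exit:.4f}")
print(f"hidden service has a bad guard this month:   {p_bad_hs_guard:.4f}")
```

Under these assumptions a client with a bad entry guard is nearly certain to pair it with a bad exit at some point in the month, while the chance that the hidden service is sitting behind a bad guard stays around 3%. That is the extra protection mentioned above, and it only holds against an attacker who has to own the nodes rather than watch them from an ISP or IX.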