There probably are, but with massive data sets they are mostly useless. "Big data" is the most overhyped thing in tech right now, much as "cloud" services were a few years ago. If you have a data set with thousands of parameters and you analyze it a million ways, then even with a strict standard like "a 1 in 1,000 probability that the result is due to chance", more formally a p-value of .001, you should expect around a thousand associations that clear that bar by chance alone! There are ways to address this, like the Bonferroni correction, but it is extremely difficult to find the 10 true associations and exclude the 990 false ones. You are bound to get high false positive and false negative rates. Then you will waste your time raiding innocent people or spend a lot of money following false leads, and really a judge should never sign a warrant based on evidence that has a 40% chance of being false. The NSA's current standard is a 51% probability of being true, or what John Oliver described as "a coin toss, plus one percent".

With signal analysis specifically, large data sets suffer from the base rate fallacy (see http://www.raid-symposium.org/raid99/PAPERS/Axelsson.pdf). A Tor developer named Mike Perry has argued at length that many of the threat assessments against Tor don't take it into account. He also argues that many of those assessments suffer from publication bias and are not reproducible under real-world conditions, even when the researchers run their analyses on the live Tor network, because there are still components they control, such as the hidden service they are trying to find.

In one sense, it's good that the NSA is collecting vast amounts of data. It makes drawing robust conclusions more difficult, so the more the better.
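
To make the multiple-comparisons arithmetic above concrete, here is a minimal Python sketch. It is my own illustration, not anything from an actual analysis pipeline: it runs a million significance tests on pure noise, counts how many clear the p < .001 bar, and then shows what a Bonferroni correction does to that count. The sample size and random seed are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests = 1_000_000      # "analyze it a million ways"
alpha = 0.001            # the "1 in 1,000" threshold
samples_per_group = 10   # arbitrary small sample size per test

# Every test compares two groups drawn from the SAME distribution,
# so any "significant" result is a false positive by construction.
a = rng.normal(size=(n_tests, samples_per_group))
b = rng.normal(size=(n_tests, samples_per_group))
p_values = stats.ttest_ind(a, b, axis=1).pvalue

print("spurious hits at p < .001:", int(np.sum(p_values < alpha)))
# Expect roughly n_tests * alpha = 1,000 -- all of them false.

# Bonferroni correction: divide the threshold by the number of tests.
print("spurious hits after Bonferroni:",
      int(np.sum(p_values < alpha / n_tests)))
# Near zero, but a real effect now also has to clear a 1e-9 bar,
# which is how the correction trades false positives for false negatives.
```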
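
And here is the base rate fallacy from the Axelsson paper as a worked Bayes' rule calculation. The detector numbers below are invented for illustration; they are not figures from the paper, from Mike Perry, or from any real surveillance program.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(actual target | flagged), via Bayes' rule."""
    flagged = (true_positive_rate * prior
               + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / flagged

prior = 1e-6   # assume 1 in a million people is a genuine target
tpr = 0.99     # the detector catches 99% of real targets
fpr = 0.001    # and wrongly flags only 0.1% of everyone else

print(f"P(real target | flagged) = {posterior(prior, tpr, fpr):.4%}")
# ~0.099% -- even a seemingly excellent detector is wrong more than
# 99.9% of the time when the thing it looks for is rare, which is why
# "51% probability of being true" is such a low bar in practice.
```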