Silk Road forums

Discussion => Security => Topic started by: kmfkewm on February 21, 2012, 04:25 am

Title: [intel] some of the tools used for attacking terrorist forums
Post by: kmfkewm on February 21, 2012, 04:25 am
http://ai.arizona.edu/research/terror/

Quote
Introducing: The Dark Web Forum Portal

As part of its Dark Web project, the Artificial Intelligence Lab has for several years collected international jihadist forums. These online discussion sites are dedicated to topics relating primarily to Islamic ideology and theology. The Lab now provides search access to these forums through its Dark Web Forum Portal, and in its beta form, the portal provides access to 28 forums, which together comprise nearly 13,000,000 messages. The Portal also provides statistical analysis, download, translation and social network visualization functions for each selected forum.

Interested in accessing the Dark Web Forum Portal?

You may request an account by submitting a Username Request form (available at http://cri-portal.dyndns.org/UserRequest/c?fromurl=http://cri-portal.dyndns.org):
    - Fill out the form completely.
    - Be sure to include your official institutional email address in either the Username or Notes section.

See also the project page for our NSF-funded project

"CRI:CRD - Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences" (CNS 0709338). [Previously located at http://ai.arizona.edu/research/terror/CRDabstract.htm]

See this important book for more information:

    H. Chen and C. Yang, eds. Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security, New York, NY: Springer, 2008.

Research Goal

The AI Lab Dark Web project is a long-term scientific research program that aims to study and understand the international terrorism (Jihadist) phenomena via a computational, data-centric approach. We aim to collect "ALL" web content generated by international terrorist groups, including web sites, forums, chat rooms, blogs, social networking sites, videos, virtual world, etc.

We have developed various multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis,  web metrics (technical sophistication) analysis, sentiment analysis, authorship analysis, and video analysis in our research.

The approaches and methods developed in this project contribute to advancing the field of Intelligence and Security Informatics (ISI). Such advances will help related stakeholders to perform terrorism research and facilitate international security and peace.

It is our belief that we (US and allies) are facing the dire danger of losing "The War on Terror" in cyberspace (especially when many young people are being recruited, incited, infected, and radicalized on the web) and we would like to help in our small (computational) way.

Funding

We thank the following agencies for providing research funding support.
Defense Threat Reduction Agency    July 2009 - July 2012
* WMD Intent Identification and Interaction Analysis Using the Dark Web (HDTRA1-09-1-0058)
    
Air Force Research Lab    July 2009 - July 2012
* Dark Web WMD-Terrorism Study (Subcontract No. FA8650-02)
    
National Science Foundation (NSF)    September 2003 – August 2010
* (CRI: CRD) Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences (NSF # CNS-0709338)
* (EXP-LA) Explosives and IEDs in the Dark Web: Discovery, Categorization, and Analysis (NSF # CBET-0730908)
* (SGER) Multilingual Online Stylometric Authorship Identification: An Exploratory Study (NSF # IIS-0646942)
* (ITR, Digital Government) COPLINK Center for Intelligence and Security Informatics Research (partial support)  (NSF # EIA-0326348)
 
Library of Congress    July 2005 – June 2008
* Capture of Multimedia, Multilingual Open Source Web-based At-Risk Content
 
DHS / CNRI    October 2003 - September 2005
* BorderSafe Initiative (partial support)

Acknowledgements

We thank the following academic partners and colleagues for their support, help, and comments. Many of our terrorism research colleagues have taught us much about the significance and intricacy of this important domain. They also help guide us in the development of our scientific, computational approach.

    Officers and domain experts of Tucson Police Department, Arizona Department of Customs and Border Protection, and San Diego Automated Regional Justice Information System (ARJIS) Program
    Dr. Marc Sageman, University of Pennsylvania
    Dr. Edna Reid, U.S. Department of Justice
    Dr. Johnny Ryan, The Institute of International and European Affairs (IIEA)
    Rick Eaton, Simon Wiesenthal Center
    Dr. Joshua Sinai, The Analysis Corporation
    Dr. Shlomo Argamon, Illinois Institute of Technology
    Chip Ellis, Memorial Institute for the Prevention of Terrorism (MIPT)
    Rex Hudson, Library of Congress
    Dr. Chris Yang, Drexel University
    Dr. Gabriel Weimann, University of Haifa, Israel
    Dr. Mark Last, Ben-Gurion University, Israel
    Drs. Henrik Larsen and Nasrullah Memon, Aalborg University, Denmark
    Dr. Katrina von Knop, George Marshall Center, Germany
    Dr. Jau-Hwang Wang and Robert Chang,  Central Police University, Taiwan
    Dr. Ee-Peng Lim, Singapore Management University, Singapore
    Dr. Feiyue Wang, Chinese Academy of Sciences, China
    Dr. Michael Chau, Hong Kong University

There has been significant interest from various intelligence, justice, and defense agencies in our computational methodologies, tools, and systems. However, we do not perform (security) clearance-level work nor do we conduct targeted cyber space crime or intelligence investigations. Our research staff members are primarily computer and information scientists from all over the world, and have expertise in more than 10 languages. We perform academic research, write papers (see below), and develop computer programs. We sincerely hope that our work can contribute to international security and peace.

Approach and Methodology

Claims: Dr. Gabriel Weimann of the University of Haifa has estimated that there are about 5,000 terrorist web sites as of 2006. Based on our actual spidering experience over the past 5 years, we believe there are about 50,000 sites of extremist and terrorist content as of 2007, including: web sites, forums, blogs, social networking sites, video sites, and virtual world sites (e.g., Second Life). The largest increase in 2006-2007 is in various new Web 2.0 sites (forums, videos, blogs, virtual world, etc.) in different languages (i.e., for home-grown groups, particularly in Europe).  We have found significant terrorism content in more than 15 languages.

Testbed: We collect (using computer programs) various web contents every 2 to 3 months; we started spidering in 2002. Currently we only collect the complete contents of about 1,000 sites, in Arabic, Spanish, and English languages. We also have partial contents of about another 10,000 sites. In total, our collection is about 2 TBs in size, with close to 500,000,000 pages/files/postings from more than 10,000 sites.

We believe our Dark Web collection is the largest open-source extremist and terrorist collection in the academic world. (We have no way of knowing what the intelligence, justice, and defense agencies are doing.) Researchers can have graded access to our collection by contacting our research center. 
Web sites

Our web site collection consists of the complete contents of about 1,000 sites, in various static (html, pdf, Word) and dynamic (PHP, JSP, CGI) formats. We collect every single page, link, and attachment within these sites. We also collect partial information from about 10,000 related (linked) sites. Some large well-known sites contain more than 10,000 pages/files in 10+ languages (in selected pages).
Forums

We collect the complete contents (authors, headings, postings, threads, time-tags, etc.) of about 300 terrorist forums. We also perform periodic updates. Some large radical sites include more than 30,000 members with close to 1,000,000 messages posted. See a recent poster summarizing our capabilities in analyzing forums.

We have also developed the Dark Web Forum Portal, which provides beta search access to several international jihadist “Dark Web” forums collected by the Artificial Intelligence Lab at the University of Arizona. Users may search, view, translate, and download messages (by forum member name, thread title, topic, keyword, etc.). Preliminary social network analysis visualization is also available.
Blogs, social networking sites, and virtual worlds

We have identified and extracted many smaller, transient (meaning, the sites appear and disappear very quickly) blogs and social networking sites, mostly hosted by terrorist sympathizers and “wannabes.” We have also identified more than 30 (self-proclaimed) terrorist or extremist groups in virtual world sites. (However, we are still unsure whether they are “real” terrorist/extremists or just playing the roles in virtual games.)
Videos and multimedia content

Terrorist sites are extremely rich in content, with heavy usage of multimedia formats. We have identified and extracted about 1,000,000 images and 15,000 videos from many terrorist sites and specialty multimedia file-hosting third-party servers. More than 50% of our videos are IED (Improvised Explosive Devices) related.
Computational Techniques (Data Mining, Text Mining, and Web Mining)

Our computational tools are grouped into two categories:

    Collection
    Analysis and Visualization

I. Collection

Web site spidering
We have developed various focused spiders/crawlers based on our previous digital library research. Our spiders can access password-protected sites and perform randomized (human-like) fetching. Our spiders are trained to fetch all HTML, PDF, and Word files, links, PHP, CGI, and ASP files, images, audio, and videos in a web site. To ensure freshness, we spider selected web sites every 2 to 3 months.

Forum spidering
Our forum spidering tool recognizes 15+ forum hosting software packages and their formats. We collect the complete forum, including authors, headings, postings, threads, time-tags, etc., which allows us to reconstruct participant interactions. We perform periodic forum spidering and incremental updates based on research needs. We have collected and processed forum contents in Arabic, English, Spanish, French, and Chinese using selected computational linguistics techniques.

Multimedia (image, audio, and video) spidering
We have developed specialized techniques for spidering and collecting multimedia files and attachments from web sites and forums. We plan to perform steganography research to identify encrypted images in our collection and multimedia analysis (video segmentation, image recognition, voice/speech recognition) to identify unique terrorist-generated video contents and styles.
II. Analysis and Visualization

Social network analysis (SNA)
We have developed various SNA techniques to examine web site and forum posting relationships. We have used various topological metrics (betweenness, degree, etc.) and properties (preferential attachment, growth, etc.) to model terrorist and terrorist site interactions. We have developed several clustering (e.g., Blockmodeling) and projection (e.g., Multi-Dimensional Scaling, Spring Embedder) techniques to visualize their relationships. Our focus is on understanding “Dark Networks” (unlike traditional “bright” scholarship, email, or computer networks) and their unique properties (e.g., hiding, justice intervention, rival competition, etc.).
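
For illustration only (this is not the lab's code): the two topological metrics named above, degree and betweenness, can be computed on a small reply graph with nothing but the Python standard library. The input format, a list of (author, replied_to) pairs, and all function names are assumptions made for this sketch. Betweenness uses Brandes' algorithm on an unweighted, undirected graph and is left unnormalized.

```python
from collections import defaultdict, deque


def build_reply_graph(replies):
    """Build an undirected graph from hypothetical (author, replied_to) pairs."""
    graph = defaultdict(set)
    for author, target in replies:
        if author != target:  # ignore self-replies
            graph[author].add(target)
            graph[target].add(author)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        preds = {node: [] for node in graph}   # predecessors on shortest paths
        sigma = {node: 0 for node in graph}    # number of shortest paths
        dist = {node: -1 for node in graph}    # BFS distance from source
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        delta = {node: 0.0 for node in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}
```

Members who sit on many shortest paths between other members get a high betweenness score even if they reply to few people directly, which is why the two metrics are usually read together.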

Content analysis
We have developed several detailed (terrorism-specific) coding schemes to analyze the contents of terrorist and extremist web sites. Content categories include: recruiting, training, sharing ideology, communication, propaganda, etc. We have also developed computer programs to help automatically identify selected content categories (e.g., web master information, forum availability, etc.).

Web metric analysis
Web metric analysis examines the technical sophistication, media richness, and web interactivity of extremist and terrorist web sites. We examine technical features and capabilities (e.g., their ability to use forms, tables, CGI programs, multimedia files, etc.) of such sites to determine their level of “web-savvy-ness.” Web metrics provide a measure of terrorists’/extremists’ capability and resources. All terrorist site web metrics are extracted and computed using computer programs.
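
A minimal sketch of how such features might be counted automatically, using Python's standard `html.parser` module. The feature groups and the tags assigned to each are assumptions chosen for illustration; the page above does not say which features or weights the project actually uses.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical feature groups; the tag choices are assumptions for this sketch.
FEATURE_TAGS = {
    "forms": {"form"},
    "tables": {"table"},
    "multimedia": {"img", "audio", "video", "embed", "object"},
    "scripts": {"script"},
    "links": {"a"},
}


class WebMetricCounter(HTMLParser):
    """Counts opening tags that belong to each feature group."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        for feature, tags in FEATURE_TAGS.items():
            if tag in tags:
                self.counts[feature] += 1


def web_metrics(html):
    """Return a dict of feature name -> number of matching tags in the page."""
    parser = WebMetricCounter()
    parser.feed(html)
    parser.close()
    return {feature: parser.counts[feature] for feature in FEATURE_TAGS}
```

Raw counts like these only become a sophistication measure once they are weighted and compared across sites, which this sketch does not attempt.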

Sentiment and affect analysis
Not all sites are equally radical or violent. Sentiment (polarity: positive/negative) and affect (emotion: violence, racism, anger, etc.) analysis allows us to identify radical and violent sites that warrant further study. We also examine how radical ideas become “infectious” based on their contents, and senders and their interactions. We rely heavily on recent advances in Opinion Mining – analyzing opinions in short web-based texts. We have also developed selected visualization techniques to examine sentiment/affect changes in time and among people. Our research includes several probabilistic multilingual affect lexicons and selected dimension reduction and projection (e.g., Principal Component Analysis) techniques.
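
As a rough illustration of what lexicon-based affect scoring can look like (a toy sketch, not the project's method): each lexicon entry maps a word to a weight per affect class, and a message's score for a class is the summed weight divided by the number of tokens. The lexicon entries, weights, and function name below are all made up for the example.

```python
import re

# Hypothetical toy lexicon: word -> {affect class: weight in [0, 1]}.
AFFECT_LEXICON = {
    "attack": {"violence": 0.9, "anger": 0.4},
    "destroy": {"violence": 0.8, "anger": 0.5},
    "hate": {"anger": 0.9},
    "furious": {"anger": 0.8},
}


def affect_scores(text, lexicon=AFFECT_LEXICON):
    """Return affect class -> summed lexicon weight per token of the message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    totals = {}
    for token in tokens:
        for affect, weight in lexicon.get(token, {}).items():
            totals[affect] = totals.get(affect, 0.0) + weight
    # Normalize by message length so long posts are not scored higher
    # simply for containing more words.
    return {affect: total / len(tokens) for affect, total in totals.items()}
```

Scoring each message this way and averaging per author or per month is one simple way to get the "changes in time and among people" view described above.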

Authorship analysis and Writeprint
Grounded in authorship analysis research, we have developed the (cyber) Writeprint technique to uniquely identify anonymous senders based on the signatures associated with their forum messages. We expand the lexical and syntactic features of traditional authorship analysis to include system (e.g., font size, color, web links) and semantic (e.g., violence, racism) features of relevance to online texts of extremists and terrorists. We have also developed advanced Inkblob and Writeprint visualizations to help visually identify web signatures. Our Writeprint technique has been developed for Arabic, English, and Chinese languages. The Arabic Writeprint consists of more than 400 features, all automatically extracted from online messages using computer programs. Writeprint can achieve an accuracy level of 95%.
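
To make the idea of lexical features concrete, here is a deliberately tiny sketch of generic stylometry: a handful of features per text, compared with cosine similarity. It is not Writeprint, it uses 11 features rather than 400+, and it makes no claim about accuracy. The function-word list, the feature choices, and the function names are assumptions for the example.

```python
import math
import re
from collections import Counter

# Hypothetical short function-word list; real feature sets are far larger.
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "is", "it")


def style_vector(text):
    """Return a small lexical feature vector for one text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(words)
    features = [
        sum(len(w) for w in words) / n,  # average word length
        len(counts) / n,                 # type-token ratio (vocabulary richness)
    ]
    # Relative frequency of each function word.
    features.extend(counts[w] / n for w in FUNCTION_WORDS)
    # Punctuation marks per character.
    features.append(sum(text.count(c) for c in ",;:!?") / max(len(text), 1))
    return features


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The features here are not scaled, so average word length dominates the similarity; a serious comparison would standardize each feature across the corpus first.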

Video analysis
A significant portion of our videos are IED related. Based on previous terrorism ontology research, we have developed a unique coding scheme to analyze terrorist-generated videos based on the contents, production characteristics, and meta data associated with the videos. We have also developed a semi-automated tool to allow human analysts to quickly and accurately analyze and code these videos.

IEDs in Dark Web analysis
We have conducted several systematic studies to identify IED related content generated by terrorist and insurgency groups in the Dark Web. A small number of sites are responsible for distributing a large percentage of IED related web pages, forum postings, training materials, explosive videos, etc. We have developed unique signatures for those IED sites based on their contents, linkages, and multimedia file characteristics. Much of the content needs to be analyzed by military analysts. Training materials also need to be developed for troops before their deployment (“seeing the battlefield from your enemies’ eyes”).
Title: Re: [intel] some of the tools used for attacking terrorist forums
Post by: kmfkewm on February 21, 2012, 04:31 am
I went to an Islamic terrorism forum once. They told their members not to rely on Tor to keep them anonymous.
Title: Re: [intel] some of the tools used for attacking terrorist forums
Post by: pine on February 21, 2012, 08:25 pm
Quote
I went to an Islamic terrorism forum once. They told their members not to rely on Tor to keep them anonymous.

Durka Durka Durka Durka!

Wait, is that a black helicopter I see... O_o
Title: Re: [intel] some of the tools used for attacking terrorist forums
Post by: Christy Nugs on February 22, 2012, 03:07 am
very interesting - i see a few things there i need to read up on
thanks...:)
Title: Re: [intel] some of the tools used for attacking terrorist forums
Post by: Derpasaurus on February 22, 2012, 04:16 am
They contract out to that company that was exposed in the UK who was making exploits and selling their forum killing services.
The CCC analyzed their stuff and found painfully amateur coding and typical script kiddy stuff. The DerpaDerkas don't know what they're doing and have shit site security and usually no security updates as they are out shooting off AKs and smoking opium.

I highly doubt that '95% accuracy' claim, definitely smells of a salesman for one of these contractors trying to convince scared govt employees into buying their junkware.

Also remember, they couldn't find Osama bin Laden, yet he was sending cleartext messages through hotmail and yahoo accounts in the same computer cafes every day through the same couriers, and it still took them forever to find him.



Title: Re: [intel] some of the tools used for attacking terrorist forums
Post by: orbitalics on February 22, 2012, 05:27 am
Don't ever forget that we are being watched!
Title: Re: [intel] some of the tools used for attacking terrorist forums
Post by: kmfkewm on February 22, 2012, 07:28 am
Quote
I went to an Islamic terrorism forum once. They told their members not to rely on Tor to keep them anonymous.

Durka Durka Durka Durka!

Wait, is that a black helicopter I see... O_o

For Real! How do people that live in caves and in a desert stay up with technology like this?

If you think all radical islamic terrorists live in caves in deserts (or even...many of them?) you are painfully naive.

Quote
They contract out to that company that was exposed in the UK who was making exploits and selling their forum killing services.
The CCC analyzed their stuff and found painfully amateur coding and typical script kiddy stuff. The DerpaDerkas don't know what they're doing and have shit site security and usually no security updates as they are out shooting off AKs and smoking opium.

I highly doubt that '95% accuracy' claim, definitely smells of a salesman for one of these contractors trying to convince scared govt employees into buying their junkware.

Also remember, they couldn't find Osama bin Laden, yet he was sending cleartext messages through hotmail and yahoo accounts in the same computer cafes every day through the same couriers, and it still took them forever to find him.

Yeah, in general I think they have shitty computer skills, but they certainly have some forums that know about Tor etc. The 95% accuracy claim is true, I don't see why you doubt it since you apparently don't know anything about the technology.

Yeah, it did take them forever to find Osama. Honestly makes me question how hard they were looking.