Abstract hxxp://arxiv.org/abs/1107.4524Anonymity in Bitcoin, a peer-to-peer electronic currency system, is a complicated issue. Within the system, users are identified by public-keys only. An attacker wishing to de-anonymize its users will attempt to construct the one-to-many mapping between users and public-keys and associate information external to the system with the users. Bitcoin frustrates this attack by storing the mapping of a user to his or her public-keys on that user's node only and by allowing each user to generate as many public-keys as required. In this paper we consider the topological structure of two networks derived from Bitcoin's public transaction history. We show that the two networks have a non-trivial topological structure, provide complementary views of the Bitcoin system and have implications for anonymity. We combine these structures with external information and techniques such as context discovery and flow analysis to investigate an alleged theft of Bitcoins, which, at the time of the theft, had a market value of approximately half a million U.S. dollars.I. INTRODUCTIONBitcoin is a peer-to-peer electronic currency system first described in a paper by Satoshi Nakamoto (probably a pseudonym) in 2008 [1]. It relies on digital signatures to prove ownership and a public history of transactions to prevent double-spending. The history of transactions is shared using a peer-to-peer network and is agreed upon using a proof-of-work system [2], [3].The first Bitcoins were transacted in January 2009 and by June 2011 there were 6.5 million Bitcoins in circulation among an estimated 10,000 users [4]. In recent months, the currency has seen rapid growth in both media attention and market price relative to existing currencies. It has featured in both The Economist and Forbes magazine. At its peak, a single Bitcoin traded for more than US$30 on popular Bitcoin exchanges. At the same time, U.S. Senators and lobby groups in Germany, such as Der Bundesverband Digitale Wirtschaft (BVWD) or the Federal Association of Digital Economy, have raised concerns regarding the untraceability of Bitcoins and their potential to harm society through tax evasion, money laundering and illegal transactions. The implications of the decentralized nature of Bitcoin for authorities' ability to regulate and monitor the flow of currency is as yet unclear.Many users adopt Bitcoin for political and philosophical reasons, as much as pragmatic ones. While there is an understanding amongst Bitcoin's technical users that anonymity is not a prominent design goal of the system, we believe that this awareness is not shared throughout the community. For example, WikiLeaks, an international organization for anonymous whistleblowers, recently advised its Twitter followers that it now accepts anonymous donations via Bitcoin (see Fig. 1) and states that:Bitcoin is a secure and anonymous digital currency. Bitcoins cannot be easily tracked back to you, and are a [sic] safer and faster alternative to other donation methods.[hxxp://wikileaks.org/support.html Retrieved: 22-07-2011]They proceed to describe a more secure method of donating Bitcoins that involves the generation of a one-time public-key but the implications for those who donate using the tweeted public-key are unclear. Is it possible to associate a donation with other Bitcoin transactions performed by the same user or perhaps identify them using external information? At present, there is little detailed work on Bitcoin anonymity in the public domain the extent to which this anonymity holds in the face of determined analysis remains to be tested.Fig 1[http://xfq5l5p4g3eyrct7.onion/view.php?image=dace8ebeadcfa54e5289a8936964d892.png]Fig. 1. Screen capture of a tweet from WikiLeaks announcing their acceptance of anonymous Bitcoin donations'.This paper is organized as follows. In Sect. II we consider some existing work relating to electronic currencies and anonymity. The economic aspects of the system, interesting in their own right, are beyond the scope of this work. In Sect. III we present an overview of the Bitcoin system; we focus on three features that are particularly relevant to our analysis. In Sect. IV we construct two network structures, the transaction network and the user network using the publicly available transaction history. We study the static and dynamic properties of these networks. In Sect. V we consider the implications of these network structures for anonymity. We also combine information external to the Bitcoin system with techniques such as flow and temporal analysis to illustrate how various types of information leakage can contribute to the deanonymization of the system's users. Finally, we conclude in Sect. VI.A. A Note Regarding Motivation and DisclosureOur motivation for this analysis is not to de-anonymize individual users of the Bitcoin system. Rather, it is to demonstrate, using a passive analysis of a publicly available dataset, the inherent limits of anonymity when using Bitcoin. This will ensure that users do not have expectations that are not being fulfilled by the system.In security-related research, there is considerable tension over how best to disclose vulnerabilities [5]. Many researchers favor full disclosure where all information regarding a vulnerability is promptly released. This enables informed users to promptly take defensive measures. Other researchers favor limited disclosure; while this provides attackers with a window in which to exploit uninformed users, a mitigation strategy can be prepared and implemented before public announcement, thus limiting damage, e.g. through a software update. Our analysis does not show a vulnerability of the Bitcoin system per se, but does illustrate some potential risks and pitfalls with regard to anonymity. However, there is no central authority which can fundamentally change the system's behavior. Furthermore, it is not possible to mitigate analysis of the existing transaction history.There are also two noteworthy features of the dataset when compared with, say, contentious social network datasets, e.g. the Facebook profiles of Harvard University students [6]. Firstly, the delineation between what is considered public and private is clear: the entire history of Bitcoin transactions is publicly available. Secondly, the Bitcoin system does not have a usage policy. After joining Bitcoin's peer-to-peer network, a client can freely request the entire history of Bitcoin transactions; there is no crawling or scraping required.Thus, we believe the best strategy to minimise the threat to user anonymity is to be descriptive about the risks of the Bitcoin system. We do not identify individual users apart from those in the case study but we note that it is not difficult for other groups to replicate our work. Indeed, given the passive nature of the analysis, other parties may already be conducting similar analyses.II. RELATED WORKThe related work for this paper can be categorized into two fields: electronic currencies and anonymity.A. Electronic CurrenciesElectronic currencies can be technically classified according to their mechanisms for establishing ownership, protecting against double-spending, ensuring anonymity and/or privacy, and generating and issuing new currency. Bitcoin is particularly noteworthy for the last of these mechanisms. The proof-of-work system [2], [3] that establishes consensus regarding the history of transactions also doubles as a minting mechanism. The scheme was first outlined in the B-Money Proposal [7]. We briefly consider some alternative mechanisms. Ripple [8] is an electronic currency where every user can issue currency. However, the currency is only accepted by peers who trust the issuer. Transactions between arbitrary pairs of users require chains of trusted intermediaries between the users. Saito [9] formalized and implemented a similar system, i-WAT, in which the the chain of intermediaries can be established without their immediate presence using digital signatures. KARMA [10] is an electronic currency where the central authority is distributed over a set of users that are involved in all transactions. PPay [11] is a micropayment scheme for peerto- peer systems where the issuer of the currency is responsible for keeping track of it. However, both KARMA and PPay may incur a large overhead when the rate of transactions is high. Mondex is a smart-card electronic currency [12]. It preserves a central bank's role in the generation and issuance of electronic currency. Mondex was an electronic replacement for cash in the physical world whereas Bitcoin is an electronic analog of cash in the online world.The authors are not aware of any studies of the network structure of electronic currencies. However, there are such studies of physical currencies. The community currency Tomamae-cho was introduced into the Hokkaido Prefecture in Japan for a three-month period during 200405 in a bid to revitalize local economy. The Tomamae-cho system involved gift-certificates that were re-usable and legally redeemable into yen. There was an entry space on the reverse of each certificate for recipients to record transaction dates, their names and addresses, and the purposes of use, up to a maximum of five recipients. Kichiji and Nishibe [13] used the collected certificates to derive a network structure that represented the flow of currency during the period. They showed that the cumulative degree distribution of the network obeyed a powerlaw distribution, the network had small-world properties (the average clustering coefficient was high whereas the average path length was low), the directionality and the value of transactions were significant features, and the double-triangle system [14] was effective. There also exist studies of the physical movement of currency: Where's George?' [15] is a crowd-sourced method for tracking U.S. dollar bills where users record the serial numbers of bills in their possession, along with their current location. If a bill is recorded sufficiently often, its geographical movement can be tracked over time. Brockmann et al. [16] used this dataset as a proxy for studying multi-scale human mobility and as a tool for computing geographic borders inherent to human mobility.B. AnonymityPrevious work has shown the difficulty of maintaining anonymity in the context of networked data and online services which expose partial user information. Backstrom et al. [17] considered privacy attacks which identify users using the structure of the network around them and discussed the difficulty of guaranteeing user anonymity in the presence of network data. Crandall et at. [18] infer social ties between users where none are explicitly stated by looking at patterns of coincidences' or common off-network co-occurences. Narayanan and Shmatikov [19] de-anonymized the Netflix Prize dataset using information from IMDB [hxxp://www.imdb.com] which had similar user content, showing that statistical matching between different but related datasets can be used to attack anonymity. Puzis et al. [20] simulated the monitoring of a communications network using strategically-located monitoring nodes and show that, using real-world network topologies, a relatively small number of nodes can collaborate to pose a significant threat to anonymity. All of this work points to the difficulty in maintaining anonymity where network data on user behaviour is available and illustrates how seemingly minor information leakages can be aggregated to pose significant risks.III. THE BITCOIN SYSTEMThe following is a simplified description of the Bitcoin system; see Nakamoto [1] for a more thorough treatment. Bitcoin is an electronic currency with no central authority or issuer. There is no central bank or fractional reserve system controlling the supply of Bitcoins. Instead, they are generated at a predictable rate such that the eventual total number will be 21 million. There is no requirement for a trusted thirdparty when making transactions. Suppose Alice wishes to send' a number of Bitcoins to Bob. Alice uses a Bitcoin client to join the Bitcoin peer-to-peer network and makes a public transaction or declaration stating that one or more identities that she controls (which can be verified using publickey cryptography), and which previously had a number of Bitcoins assigned to them, wish to re-assign those Bitcoins to one or more other identities, at least one of which is controlled by Bob. The participants of the peer-to-peer network form a collective consensus regarding the validity of this transaction by appending it to the public history of previously agreed-upon transactions (the longest block-chain). This process, known as mining, involves the repeated computation of a cryptographic hash function so that the digest of the transaction, along with other pending transactions, and an arbitrary nonce, has a specific form. This process is designed to require considerable computational effort, from which the security of the Bitcoin mechanism is derived. To encourage users to pay this computational cost, the process is incentivized using newly generated Bitcoins and/or transaction fees.In this paper, there are three features of the Bitcoin system that are of particular interest. Firstly, the entire history of Bitcoin transactions is publicly available. This is necessary in order to validate transactions and prevent double-spending in the absence of a central authority. The only way to confirm the absence of a previous transaction is to be aware of all previous transactions. The second feature of interest is that a transaction can have multiple inputs and multiple outputs. An input to a transaction is either the output of a previous transaction or a sum of newly generated Bitcoins and transaction fees. A transaction frequently has either a single input from a previous larger transaction or multiple inputs from previous smaller transactions. Also, a transaction frequently has two outputs: one sending payment and one returning change. Thirdly, the payer and payee(s) of a transaction are identified through public-keys from public-private key-pairs. However, a user can have multiple public-keys. In fact, it is considered good practice for a payee to generate a new public-private keypair for every transaction. Furthermore, a user can take the following steps to better protect their identity: they can avoid revealing any identifying information in connection with their public-keys; they can repeatedly send varying fractions of their Bitcoins to themselves using multiple (newly generated) public-keys; and/or they can use a trusted third-party mixer or laundry. However, these practices are not universally applied.The three features above, namely the public availability of Bitcoin transactions, the input-output relationship between transactions and the re-use and co-use of public-keys, provide a basis for two distinct network structures: the transaction network and the user network. The transaction network represents the flow of Bitcoins between transactions over time. Each vertex represents a transaction and each directed edge between a source and a target represents an output of the transaction corresponding to the source that is an input to the transaction corresponding to the target. Each directed edge also includes a value in Bitcoins and a timestamp. The user network represents the flow of Bitcoins between users over time. Each vertex represents a user and each directed edge between a source and a target represents an input-output pair of a single transaction where the input's public-key belongs to the user corresponding to the source and the output's public-key belongs to the user corresponding to the target. Each directed edge also includes a value in Bitcoins and a timestamp.We gathered the entire history of Bitcoin transactions from the first transaction on the 3rd January 2009 up to and including the last transaction that occurred on the 12th July 2011. We gathered the dataset using the Bitcoin client [hxxp://www.bitcoin.org] and a modified version of Gavin Andresen's bitcointools. [hxxp://github.com/gavinandresen/bitcointools] The dataset comprises 1 019 486 transactions between 1 253 054 unique public-keys. We describe the construction of the corresponding transaction and user networks and their analyses in the following sections. We will show that the two networks are complex, have a non-trivial topological structure, provide complementary views of the Bitcoin system and have implications for the anonymity of users.IV. THE TRANSACTION AND USER NETWORKSA. The Transaction NetworkThe transaction network T represents the flow of Bitcoins between transactions over time. Each vertex represents a transaction and each directed edge between a source and a target represents an output of the transaction corresponding to the source that is an input to the transaction corresponding to the target. Each directed edge also includes a value in Bitcoins and a timestamp. It is a straight-forward task to construct T from our dataset.Fig 2[http://xfq5l5p4g3eyrct7.onion/view.php?image=bf958d16c38b37a71d699d677b5a91ac.png]Fig. 2. An example sub-network from the transaction network. Each rectangular vertex represents a transaction and each directed edge represents a flow of Bitcoins from an output of one transaction to an input of another.Figure 2 shows an example sub-network of T . t1 is a transaction with one input and two outputs. [The transactions and public-keys used in our examples exist in our dataset. The unique identifier for the transaction t1 is 09441d3c52fa0018365fcd2949925182f6307322138773d52c201f5cc2bb5976. You can query the details of a transaction or public-key by examining Bitcoin's longest block-chain using, say, the Bitcoin Block Explorer (hxxp://www.blockexplorer.com).] It was added to the block-chain on the 1st May 2011. One of its outputs assigned 1.2 BTC (Bitcoins) to a user identified by the publickey pk1.[13eBhR3oHFD5wkE4oGtrLdbdi2PvK3ijMC] The public-keys are not shown in Fig. 2. Similarly, t2 is a transaction with two inputs and two outputs.[0c4d41d0f5d2aff14d449daa550c7d9b0eaaf35d81ee5e6e77f8948b14d62378] It was accepted on the 5th May 2011. One of its outputs sent 0.12 BTC to a user identified by a different public-key, pk2.[19smBSUoRGmbH13vif1Nu17S63Tnmg7h9n] t3 is a transaction with two inputs and and one output.[0c034fb964257ecbf4eb953e2362e165dea9c1d008032bc9ece5cebbc7cd4697] It was accepted on the 5th May 2011. Both of its inputs are connected to the two aforementioned outputs of t1 and t2. The only output of t3 was redeemed by t4.[f16ece066f6e4cf92d9a72eb1359d8401602a23990990cb84498cdbb93026402]T has 974 520 vertices and 1 558 854 directed edges. The number of vertices is less than the total number of transactions in the dataset because we omit transactions that are not connected to at least one other transaction. These correspond to newly generated Bitcoins and transactions fees that are not yet redeemed. The network has neither multi-edges (multiple edges between the same pair of vertices in the same direction) nor loops. It is a directed acyclic graph (DAG) since the output of a transaction can never be an input (either directly or indirectly) to the same transaction.