Silk Road forums

Discussion => Security => Topic started by: kmfkewm on August 01, 2013, 04:03 am

Title: anonymous membership query
Post by: kmfkewm on August 01, 2013, 04:03 am
I have an idea, it is very likely not original but I have never read about it before. The problem is that Alice wants to ask Bob if he has a certain item in his database, but Alice doesn't want Bob to know what she is interested in. The solution is for Bob to run a bloom filter, adding items as normal. When Alice does membership query, she already knows the hash of the item she is interested in, and she knows the parameters of Bobs bloom filter. So Alice needs to do PIR to get each bit from the bloom filter where it will be set to 1 or 0 (1 for all if the item is in the database, probabilistically). So this is really cool but for it to be really really cool it needs a low bandwidth single bit PIR (not sequential block PIR). I only know one PIR scheme really well (the Pynchon Gate PIR), and it would work for this but it would be even worse than simply sending Alice the entire database (Since Alice's query size grows by 1 bit for each message in the database, and in this case each message is 1 bit). Does anyone know a good PIR protocol that would work well with this for low bandwidth anonymous membership query?
Title: Re: anonymous membership query
Post by: P2P on August 01, 2013, 05:37 am
I do not think anyone but astor or sovereignty will even understand any part of what you just said.
Title: Re: anonymous membership query
Post by: astor on August 01, 2013, 07:06 am
Sorry, I know nothing about this aside from what you wrote on the BitMessage forum. :)
Title: Re: anonymous membership query
Post by: fiveotwo on August 01, 2013, 07:10 am
After reading this I had a look into PIR and it's interesting.  Can't offer any insight, but I'm curious as to what you're planning to use it for.
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 07:23 am
I want Alice to be able to ask Bob if he has a message for her, without Bob knowing which messages Alice is looking for. The easiest solution is for Bob to tell Alice all of the messages he has. This is great unless there are 500,000 users sending 10 messages a day each tagged with 128 bit string. Now the meta data alone is about 76 megabytes, and there are 500,000 users so Bob now takes about 36 terabytes of bandwidth a day just letting everybody know the metadata of which messages are available. If everybody checks for 1,000 messages a day and the PIR cost is even 512 bits per query, 500,000 * 1,000 * 512 = 29.8 gigabytes of bandwidth for all clients to check for so many messages. But the results are dependent on the cost of the PIR, and the only PIR I know would not work well for this at all (only theoretically would it work, in practice it would be absolutely stupid).

Essentially you and I do ECDH key exchange and come to a shared secret. We communicate with each other through Bob by tagging messages with our shared secret prior to uploading them to Bob anonymously. Now you know the message tags that indicate that a message is waiting for you, but you cannot ask Bob "Hey do you have this message tag?" because then Bob knows what message will be for you. So you need a way to ask Bob if he has something without him being able to tell what you are asking if he has. The solution could be for Bob to keep a bloom filter. Bloom filter is essentially a block of bits , a fresh bloom filter all bits are set to 0. When Bob gets an item he takes its hash value, translates the hash value into a series of numbers that map to a bit position in the grid, and Bob sets the bit positions in the grid to be 1. If all bit positions mapping to an inserted item are set to 1 then you can query the bloom filter and see that the message has probabilistically been added. So now we have Bob with his bloom filter, and he gets these message tags when people send messages to him for others to download and he puts the tags in his bloom filter. Now I can ask Bob if he has the message by doing PIR to get the bits at the positions in his bloom filter (Since I know the message tag I know the bits that will be set to 1 if the message has been added, since I also have the public parameters of Bob's bloom filter). If after I do PIR and get all of the bits they are all 1, then I know Bob has my message, if any of them are 0 then I know Bob doesn't have my message, but in either case Bob cannot tell which messages I asked him if he has. And now Bob doesn't need to send me and 500,000 other people 76 megabytes a day.
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 07:31 am
So essentially I need a PIR that meets the following requirements:

1. Single bit, non-sequential

I do not want to get a block of bytes from a position in a database, I want to get single bits that are at pseudo-random locations throughout the database. This puts the Pynchon Gate PIR out the window because it has query sizes that grow by one bit for each item in the database, which works fine if the items in the database are 1 megabyte but if they are 1 bit then it is is obviously worse than just doing everybody gets everything PIR.

2. Scalable, not computationally bound to theoretical papers

I envision hundreds of thousands of clients doing thousands of queries a day. They are all for small bit strings though.

3. Computational PIR. I want a single server PIR, so the database doesn't need to be distributed.

I don't know if any PIR schemes meet these requirements. I understand Pynchon Gate PIR inside and out because it is fairly simple, but many of the advanced PIR papers look to me as if somebody vomited up math symbols onto a paper. I want to spend time to implement something and I am willing to spend much time figuring out how to do it and learning the associated math, but I don't want to spend four weeks to get to the point of understanding that I realize this PIR will not work for me, thirty times before I find the one that will.
Title: Re: anonymous membership query
Post by: Railgun on August 01, 2013, 07:37 am
Yeah, you seem super-knowedgeable about these matters. If I can borrow a page from your own book, I would caution asking such questions that would be better suited and more properly answered on some place like stack-exchange because your topic is very niche-oriented.
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 07:54 am
Pretty much here is where I am at. I have been working for about two years now coding a system for secure communications, and have tens of thousands of lines of source code already done. I have been sort of designing this thing as I go. At first I tried to make entire design by myself. That was a bad plan. Then I focused on the academic literature and implemented other peoples ideas. That was a much better plan. I have Sphinx cryptographic packet format completely implemented in C (as well as LIONESS block cipher that it uses). I have greatly wrapped OpenSSL crypto library, and can do ECDH key exchange, AES counter encryption, MAC, and all of the other stuff, with a simple call. I have a NIST password entropy estimator implemented, as well as a bloom filter, as well as wrapped database libraries for easy database management, secure wipe, tons of shit. I also have Alpha mixing mix network system almost fully implemented, and distributed directory servers almost fully implemented, tons of networking code, and a lot of other shit I am sure I cannot remember right now (I have taken a multi month break on working on this because of various reasons). I also have the PIR of Pynchon Gate implemented. So I am close to having the entire Pynchon Gate whitepaper implemented, using Sphinx and Alpha mixing for forward messages, with automatic ECDH between communicating parties, etc. I also have Tor wrapped up and integrated, as all mixes are run as hidden services and all clients use Tor.

But I want to diverge from Pynchon Gate at this point. I don't like that Pynchon gate has a semi-trusted nymserver that gathers all messages to me and then puts them in a bucket that is distributed for PIR. I also don't like that Pynchon gate assumes person to person communications rather than group communications. I want to remove the nymserver by having contact strings between communicating parties. Essentially the communicating parties do ECDH key exchange and have now shared secrets which can be iteratively hashed out to tag to messages for identification. Now I do not need a trusted nymserver that gets messages for Bob and puts them in Bobs bucket, because each individual message is tagged with an unlinkable string that Bob can identify is for him. Okay that is great, except for two things. One thing is that Pynchon gate has buckets that are all padded to the same size, so every PIR cycle Bob always gets his entire bucket of messages and then sorts out the individual messages. This is made possible because of the Nymserver, which bunches all messages to Bob together for retrieval with block PIR. So instead I want Bob to have these contact strings for a thousand people he is communicating with, and every cycle I want Bob to query the PIR server to find out which of his contacts have sent him a message. This is what I am talking about in my original post, how Alice can ask Bob if he has a message for her without Bob knowing which message Alice is asking for. After Alice finds that Bob does have a message for her she then needs to retrieve it. I am a bit shakey on this part as well. If messages are not all the same size then using block PIR to get messages will be a waste of time since the message sizing leaks the message. So all things must be padded to the same size before they are retrieved. But I would rather for the user to be able to do non-sequential PIR to get say 6 KB of messaging data tagged with one contact string in one spot of the database and 2 KB of messaging data tagged with another contact string in another part of the database, and then 2 KB of padding, instead of the user having all of their total message data put in a sequential bucket with 2 KB of padding. So essentially I am working on figuring out

1. How can Alice ask Bob if Bob has a message for Alice without Bob knowing which Message Alice is interested in?
2. Once Alice knows she has a message, how can she get the message without leaking what the message is? 

without having to individually pad each message to the same size (bandwidth prohibitive) and without a trusted nymserver being able to group messages to Alice together so she can get them (padded to the same bucket size) with sequential byte PIR.

If I can find good answers to question 1 and 2 then I will have a very very secure and anonymous encrypted messaging system that can scale to group communications. If that fails I will probably just fall back to finishing Pynchon Gate, but honestly it is not exactly what I want. The hardest part is making shit scale to group communications, doing person to person is soooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo much easier than group communications.
Title: Re: anonymous membership query
Post by: White 0ut on August 01, 2013, 08:09 am
What in the literal fuck is going on in here exactly?
Title: Re: anonymous membership query
Post by: cirrus on August 01, 2013, 08:15 am
What in the literal fuck is going on in here exactly?

(shrugs)  O_o
Title: Re: anonymous membership query
Post by: astor on August 01, 2013, 08:16 am
How can you ask a server for a block of data without the server knowing which block it is?
Title: Re: anonymous membership query
Post by: Railgun on August 01, 2013, 08:18 am
How can you ask a server for a block of data without the server knowing which block it is?

I know a little about computers, but he's using a lot of security-niche language. Could you paraphrase and abridge for us lessors please?
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 08:34 am
Okay here is a simple and theoretically sound demonstration that unfortunately will not work in practice. Alice is the user and she talks to Carol, and Bob is the server.

Bob has a bloom filter. Okay, I am keeping this small because I need to write the entire thing out for demonstrative purpose. So Bob's blank bloom filter looks like this

00000

Alice sends a message through Bob to Carol tagged with "I am some secret between Alice and Carol". Bob uses a public algorithm to translate this tag into a series of bit positions, let's say position 0, 2 and 4. So now Bob sets the bits at those positions to be 1 in his bloom filter, to show (probabilistically) that a message with the tag "I am some secret between Alice and Carol" has been added to the database. This gives us a bloom filter that looks like this:

10101

Now Carol wants to know if Bob has a message to her that came from Alice. Since Carol knows "I am some secret between Alice and Carol" and she knows the public algorithm Bob used to translate this secret into a series of bits to set to 1 in his bloom filter, now Carol can determine probabilistically if Bob has that message in his database, by doing PIR to get the bit values at positions 0, 2 and 4 in Bob's bloom filter.

This is where I need to diverge significantly, because I only know one PIR system very well, and it does not work well for this, and it assumes distributed copies of the database. So now let's assume that there are 2 Bob's , and neither cooperates with the other, and they both have the same bloom filter. So the Bob's databases look like this:

10101

and Carol needs to get position 0, 2 and 4. In Pynchon Gate PIR Alice would do this by first generating a random string with one bit for each message in the database. In this case each message in the database IS a bit, so it kind of sucks, but for example:

00101

And now Carol needs to make a second random *looking* string that causes the first string to Xor to 0 at all positions except for the position she wants. So first Carol wants position 0:

00101

0 XOR 1 = 1
0 XOR 0 = 0
1 XOR 1 = 0
0 XOR 0 = 0
1 XOR 1 = 0

so Carol now has the second string 10101. So now Carol has the two strings

00101 and 10101 and she sends one to the first Bob and one to the second Bob. The Bob's then XOR together messages at positions marked by 1 in the string the receive and send the result back to Alice. So the database is 10101:

Bob1 gets 00101

1 XOR 1 = 0

Bob2 gets 10101

1 XOR 1 = 0 XOR 1 = 1

Carol gets 0 and 1 in response, 0 XOR 1 = 1, so the first bit in the bloom filter is set to one. I am not going to write this out all the way, but Carol does this for each of the bit positions associated with the message, and she finds that the result is

111

which means that probabilistically the message from Alice has been added to the list of messages.

Title: Re: anonymous membership query
Post by: SelfSovereignty on August 01, 2013, 08:37 am
I do not think anyone but astor or sovereignty will even understand any part of what you just said.

Sorry, I'm not gonna be much more help than Astor here.  The best (alright, alright, the only) example I know of that may be suited to your needs is Percy++, but I never actually played with it or anything.  I don't know man, it might be exactly what you need or it might be inferior to Pynchon Gate for your purposes.  I'm not familiar enough with both to know.  If you haven't run into it yet, http://percy.sourceforge.net/

Check out the advanced crypto software list at http://acsc.cs.utexas.edu/ too, perhaps.  Honestly that's the best input I have for you at this point -- you're literally writing these posts faster than I'm reading them :P
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 08:52 am
Thanks for suggestion of Percy++ SS, it looks like it has a very nicely written paper and I can probably figure out if it will work fairly quickly. It actually looks quite promising right now, after a brief read of the whitepaper, but I need to be sober and spend some time reading it before I conclude it will work. It is already implemented in C++ though, and has a really nice and well explained whitepaper, so it shouldn't take me too long to see if it will work for me. The biggest problem I see with it immediately is that it is multi-server, I would prefer computational single server PIR, but distributed database isn't the end of the world it just adds some to the complexity.

I note from the paper that it was actually designed with the intention of improving on the pynchon gate PIR, so in either case I will probably use this for Pynchon Gate if I end up not being able to do what I want.

Quote
of PIR, an “item” is often thought of as a single bit out of
an n-bit database, but it could also be a “block” of size b
bits. In the latter case, the n-bit database is considered to be
composed of n/b blocks, each of size b bits.

I really need a PIR system where it is a single bit out of an n-bit database, pynchon gate style PIR works great for blocks but when it comes to single bits it is worse than everybody gets everything.

Quote
A number of
applications have been proposed for PIR, including patent
and pharmaceutical databases [1], online census informa-
tion [17], and real-time stock quotes [17]. The Pynchon
Gate [11] shows how to use PIR for an arguably more real-
istic purpose: retrieving pseudonymously addressed email;
it argues that PIR is a more suitable primitive for this appli-
cation than previous proposals.
    A trivial solution to the PIR problem is simply to ask
the server for the whole database and look up the desired
bit or block yourself. To make things more interesting (not
to mention practical), we analyze the communication cost
of the protocol—the total number of bits transmitted—and
insist that it be sublinear; that is, less than n.
    There are two main types of PIR: information-theoretic
and computational. In information-theoretic PIR, the server
is unable to determine any information about your query
even with unbounded computing power. In computational
PIR (CPIR) [3, 8], the privacy of the query need only
be guaranteed against servers restricted to polynomial-time
computations. Note that in the information-theoretic case
the unbounded power is only to be used to try to compro-
mise your privacy; in either case we still insist that you and
the servers use only polynomial-time computations in order
to perform the protocol.
    It is an unsurprising fact that information-theoretic sub-
linear PIR is impossible with a single server. However, it is
possible when there are l servers, each with a copy of the
database—assuming that the servers do not collude in order
to determine your query. A t-private l-server PIR is a PIR
system in which the privacy of the query is information-
theoretically protected, even if up to t of the l servers col-
lude. (Of course, it must be the case that t < l.)
    Beimel and Stahl [2] investigate the case where servers
can fail to respond. In this event, it is important that the
Title: Re: anonymous membership query
Post by: Railgun on August 01, 2013, 08:58 am
Okay here is a simple and theoretically sound demonstration that unfortunately will not work in practice. Alice is the user and she talks to Carol, and Bob is the server.

Bob has a bloom filter. Okay, I am keeping this small because I need to write the entire thing out for demonstrative purpose. So Bob's blank bloom filter looks like this

00000

Alice sends a message through Bob to Carol tagged with "I am some secret between Alice and Carol". Bob uses a public algorithm to translate this tag into a series of bit positions, let's say position 0, 2 and 4. So now Bob sets the bits at those positions to be 1 in his bloom filter, to show (probabilistically) that a message with the tag "I am some secret between Alice and Carol" has been added to the database. This gives us a bloom filter that looks like this:

10101

Now Carol wants to know if Bob has a message to her that came from Alice. Since Carol knows "I am some secret between Alice and Carol" and she knows the public algorithm Bob used to translate this secret into a series of bits to set to 1 in his bloom filter, now Carol can determine probabilistically if Bob has that message in his database, by doing PIR to get the bit values at positions 0, 2 and 4 in Bob's bloom filter.

This is where I need to diverge significantly, because I only know one PIR system very well, and it does not work well for this, and it assumes distributed copies of the database. So now let's assume that there are 2 Bob's , and neither cooperates with the other, and they both have the same bloom filter. So the Bob's databases look like this:

10101

and Carol needs to get position 0, 2 and 4. In Pynchon Gate PIR Alice would do this by first generating a random string with one bit for each message in the database. In this case each message in the database IS a bit, so it kind of sucks, but for example:

00101

And now Carol needs to make a second random *looking* string that causes the first string to Xor to 0 at all positions except for the position she wants. So first Carol wants position 0:

00101

0 XOR 1 = 1
0 XOR 0 = 0
1 XOR 1 = 0
0 XOR 0 = 0
1 XOR 1 = 0

so Carol now has the second string 10101. So now Carol has the two strings

00101 and 10101 and she sends one to the first Bob and one to the second Bob. The Bob's then XOR together messages at positions marked by 1 in the string the receive and send the result back to Alice. So the database is 10101:

Bob1 gets 00101

1 XOR 1 = 0

Bob2 gets 10101

1 XOR 1 = 0 XOR 1 = 1

Carol gets 0 and 1 in response, 0 XOR 1 = 1, so the first bit in the bloom filter is set to one. I am not going to write this out all the way, but Carol does this for each of the bit positions associated with the message, and she finds that the result is

111

which means that probabilistically the message from Alice has been added to the list of messages.

So basically the database server never knows who it has a message for, only that said message exists. The more "Bobs" there are, the less likely the knowledge of the recipient of such a message is, which would be further obscured by increasing string length? But wouldn't they know the original person is leaving messages?

Also could this not be done with a network such as Tor already using a distributed network?
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 09:00 am
How can you ask a server for a block of data without the server knowing which block it is?

The answer to that question is in several papers on Computational Single Server PIR (CPIR). Please let me know if you find one that will work for what I am trying to do, I will read it after you suggest it to me :D.
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 09:02 am
Quote
So basically the database server never knows who it has a message for, only that said message exists. The more "Bobs" there are, the less likely the knowledge of the recipient of such a message is, which would be further obscured by increasing string length?

Yep. In Pynchon Gate the server knows who it has a message for, but not who gets the message. I want a system where the server doesn't know who it gets a message for and doesn't know who gets the message, to protect from network analysis attacks when I scale things to group communications.
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 09:12 am
Quote
So basically the database server never knows who it has a message for, only that said message exists. The more "Bobs" there are, the less likely the knowledge of the recipient of such a message is, which would be further obscured by increasing string length?

Yep. In Pynchon Gate the server knows who it has a message for, but not who gets the message. I want a system where the server doesn't know who it gets a message for and doesn't know who gets the message, to protect from network analysis attacks when I scale things to group communications.

Because I don't want the server to know "I got a message for Alice and Bob and Carol" because then it knows Alice and Bob and Carol communicate with some third party as well possibly. I want the server to know "I got a message for random one time use string A and random one time use string B and random one time use string C" because then it cannot link Alice and Bob and Carol, or one entity to another. Otherwise we could end up with a system where the server cannot tell which IP belongs to Alice and which IP belongs to Bob and which IP belongs to Carol, but it can tell that Alice and Bob and Carol communicate with each other. I want to avoid the ability of the server to tell who is communicating with who even if it cannot tie an IP address to any of the who.
Title: Re: anonymous membership query
Post by: astor on August 01, 2013, 09:14 am
How can you ask a server for a block of data without the server knowing which block it is?

The answer to that question is in several papers on Computational Single Server PIR (CPIR). Please let me know if you find one that will work for what I am trying to do, I will read it after you suggest it to me :D.

LOL. I thought the explanation with Alice and Carol was in response to my question, and I thought, that allows Carol to determine a message for her exists, but still, how does she ask the server for that message? I suppose she could ask for the whole database or a block of it that happens to include the message, so there's some plausible deniability.

Also, I want to know what Alice and Carol's secret is!
Title: Re: anonymous membership query
Post by: astor on August 01, 2013, 09:17 am
Also, if they are anonymous users connecting over Tor, then why does it matter? Why not ask for the message directly?
Title: Re: anonymous membership query
Post by: SelfSovereignty on August 01, 2013, 09:26 am
How can you ask a server for a block of data without the server knowing which block it is?

The answer to that question is in several papers on Computational Single Server PIR (CPIR). Please let me know if you find one that will work for what I am trying to do, I will read it after you suggest it to me :D.

Okay, you know more than me about this stuff, but... I thought it could be proven that this isn't even possible, meaning that you can either have a single server that's not computationally-theoretically robust but can be inefficient and information-theoretically robust, or multiple servers that are both (proportional to the number of parties) computationally-theoretically robust and information-theoretically robust?  Or is it... oh fuck it man, I can barely even remember the words let alone the concepts.  I forgot stuff like this the week after the tests ::)
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 09:27 am
How can you ask a server for a block of data without the server knowing which block it is?

The answer to that question is in several papers on Computational Single Server PIR (CPIR). Please let me know if you find one that will work for what I am trying to do, I will read it after you suggest it to me :D.

LOL. I thought the explanation with Alice and Carol was in response to my question, and I thought, that allows Carol to determine a message for her exists, but still, how does she ask the server for that message? I suppose she could ask for the whole database or a block of it that happens to include the message, so there's some plausible deniability.

Also, I want to know what Alice and Carol's secret is!

How she asks the server for the message after she determines it exists is part two hehe. I suppose she could do some private stream searching algorithm, but as far as I know all of these sort of assume that each message is padded to the same size, gah. Alice knowing that the server has a message for her is good, it would be even better if she could determine where in the database the message is. The simplest solution would be for the server to keep track of every contact string and send to every client the list of contact strings, the size of the message they point to, and where in the database they start. That is bandwidth prohibitive when things scale up to hundreds of thousands of clients. Assuming every message has 1 kb of associated data, and there are 500,000 clients getting 10 messages a day, that means there is about 4.8 gigabytes of metadata a day. Sending that to 500,000 clients is not going to fly, so the clients need to be able to query for the data that is relevant to them without the server being able to determine that the data is relevant. I think with a bloom filter and PIR I can hopefully maybe get it so the clients can determine if they have a message waiting for them with low bandwidth, but then the problem is how do the clients find out where the message to them is in the database, and how do they get it without the server knowing it is the message they got.

The biggest issue I am running into is that all of these systems seem to really like the idea of related data being blocked together into sequential buckets that are all padded to the same size instead of thrown up all over the database with 1kb here and 4kb there. Perhaps I could have every message broken down into like 1KB packets, with each packet tagged with the contact string and obtained with private stream searching. But the problem is that the server cannot learn the total number of packets that an individual message consists of. I want to have messages of various different sizes and let clients get them from random spots in the database, and then padding to a certain size, without the server being able to tell the number of individual packets that are returned for a single keyword search. Gah it is really hard to even think about this shit for me honestly, it is a level or two beyond what I know about cryptography and math, and I am struggling to think of how to do it. It is like I am right on the verge of a break through but I just cannot quite get all the way there. I wish I just knew some papers that will work to do what I want when I implement them, I have a much easier time being told what to implement, getting the paper and studying my ass off until I can do it, than I do trying to figure out exactly how to describe what I want and then reading fifty papers trying to find if any of them sound similar hehe.
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 09:29 am
How can you ask a server for a block of data without the server knowing which block it is?

The answer to that question is in several papers on Computational Single Server PIR (CPIR). Please let me know if you find one that will work for what I am trying to do, I will read it after you suggest it to me :D.

Okay, you know more than me about this stuff, but... I thought it could be proven that this isn't even possible, meaning that you can either have a single server that's not computationally-theoretically robust but can be inefficient and information-theoretically robust, or multiple servers that are both (proportional to the number of parties) computationally-theoretically robust and information-theoretically robust?  Or is it... oh fuck it man, I can barely even remember the words let alone the concepts.  I forgot stuff like this the week after the tests ::)

It is proven that a single server information theoretically secure PIR algorithm must be everybody gets everything (ie: the entire database must be sent to have information theoretically secure single server PIR), but single server computationally secure PIR is possible (just like a one time pad is information theoretically secure, but RSA is computationally secure. Single server PIR works if it has security comparable to RSA, but not comparable to a one time pad).
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 09:32 am
Also, if they are anonymous users connecting over Tor, then why does it matter? Why not ask for the message directly?

Because I don't trust the anonymity of Tor and I am trying to make a system that can resist the NSA :). If I simply wanted to make a distributed encrypted forum that uses Tor I would have been done last year. The anonymity part is what really makes things fucking hard.
Title: Re: anonymous membership query
Post by: SelfSovereignty on August 01, 2013, 09:40 am
You know, a quote suddenly comes to mind.  I find there to be tremendous wisdom in it: "premature optimization is the root of all evil."

Be sure you're solving a problem that you're actually faced with -- do you actually have all these users chomping at the bit, or will it be several years at least before your system (if it takes off) is under that kind of strain?  Are you taking into account the increased computational power that will be available to you at the time you need to actually solve this issue?

Just some thoughts that may or may not be of use... :)
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 09:56 am
You know, a quote suddenly comes to mind.  I find there to be tremendous wisdom in it: "premature optimization is the root of all evil."

Be sure you're solving a problem that you're actually faced with -- do you actually have all these users chomping at the bit, or will it be several years at least before your system (if it takes off) is under that kind of strain?  Are you taking into account the increased computational power that will be available to you at the time you need to actually solve this issue?

Just some thoughts that may or may not be of use... :)

That is a good point. I have been trying to make something that will scale to 500,000 users making 10 posts a day, but so far I have no users and they are making no posts a day. Even assuming ten thousand users making ten posts a day, things become much more realistic in the bandwidth department. Even if we assume all messages are 10KB, and the metadata associated with a message is 1KB:

10,000 users making 10 posts a day brings metadata to about 98 megabytes per day * 10,000 users = about 960 gigabytes in metadata bandwidth per day (98 megabytes per user per day).

10,000 users * 5,000 messages per day * 10 KB per message = 480 gigabytes in message data for receive

assuming Pynchon Gate PIR, there are 100,000 messages per day means each query is 100,000 bits about 12.2 KB * 5,000 posts * 10,000 users = 572 gigabytes in request data

960 + 572 + 480 = 2,012 / 1,024 = 1.96 terabytes of bandwidth required per day, which is realistically affordable with a high end server package.

On the other hand, assuming 500,000 users making 10 posts a day brings metadata to 4.8 gigabytes * 500,000 = 2343 terabytes in metadata a day, which is completely unrealistic.

But I would much rather make something that can scale to 500,000 users than something that only scales to 10,000 users.
Title: Re: anonymous membership query
Post by: astor on August 01, 2013, 10:04 am
I would like something that could replace Tormail this year, and that relies on no central server. That is Torchat's greatest strength. Nobody can take it down or overwhelm the network / server, like what appears to happen to Tormail every time there's a database error. The trade off is that Torchat is a live messaging system. Both parties must be online at the same time. If it stored messages like BitMessage, that would be perfect.
Title: Re: anonymous membership query
Post by: kmfkewm on August 01, 2013, 10:07 am
I would like something that could replace Tormail this year, and that relies on no central server. That is Torchat's greatest strength. Nobody can take it down or overwhelm the network / server, like what appears to happen to Tormail every time there's a database error. The trade off is that Torchat is a live messaging system. Both parties must be online at the same time. If it stored messages like BitMessage, that would be perfect.

Freenet sounds like what you want.
Title: Re: anonymous membership query
Post by: heisenbud on August 01, 2013, 01:53 pm
Fuck that - my head just blew up and is dripping onto the table.
Title: Re: anonymous membership query
Post by: /I_Surf_Worm_Holes on August 01, 2013, 02:08 pm
Interesting thread.  Just chiming in to remind myself to come back and parse.

Title: Re: anonymous membership query
Post by: SelfSovereignty on August 02, 2013, 11:05 pm
it really annoys me when i read something and have absolutely no idea what it means, I'm too curios and end up researching things online for no other reason then I HAVE to understand...

I spent a year studying poker and almost as soon as i felt i really understood the game and could win consistently i lost interest in playing.

it's all about the journey of discovery and not so much where i end up.

its a curse  ;)

Amen.  I'm not really sure what I'd do with my time if not for the pursuit of understanding though... it saved me from suicide, in the end.  Well, the quest for understanding paired with amphetamine abuse saved me from suicide, to be totally frank... there's nothing like amphetamine to keep you focused & distracted :P  But the curse is the only reason I have any clue wtf this thread is even about.  Still, it is kind of aggravating...

I'm obsessed with the horizon and don't seem to care that it's forever beyond reach.  I think it's fair to call it a curse when the goal isn't even something that's possible to achieve: there's always something more that you don't understand, isn't there... oh well.  Gotta try anyway :)

... wtf is my problem, getting all maudlin in the midst of geeky-cool techno-crypto time.  Pardon me, I think I may need another dose  :P
Title: Re: anonymous membership query
Post by: jackofspades on August 02, 2013, 11:21 pm
I do not think anyone but astor or sovereignty will even understand any part of what you just said.

Or Pine.

it really annoys me when i read something and have absolutely no idea what it means, I'm too curios and end up researching things online for no other reason then I HAVE to understand...

I spent a year studying poker and almost as soon as i felt i really understood the game and could win consistently i lost interest in playing.

it's all about the journey of discovery and not so much where i end up.

its a curse  ;)

Me too but if it makes ya feel better, OP mine as well have been speaking a different language, im not really the computer type haha.

Title: Re: anonymous membership query
Post by: kmfkewm on August 03, 2013, 03:20 am
Actually found a solution in the advanced crypto archives. I can break messages down into single packets and have the packets tagged with iteratively hashed contact strings. So after doing a key exchange with you we both generate two random strings associated with ourselves for each other, for example:

My Public Contact String: a4244aa43ddd6e3ef9e64bb80f4ee952f68232aa008d3da9c78e3b627e5675c8
My Shared Private String: ac8406ff09a4e60a031a2db5340ba1a6e8154773c39b6b797e5b76dc00e41433

Your Public Contact String: 20b2ba0d04afc53a0e448e084286168e9e7b310dcc3c9d5895ece72208f457f6
Your Shared Private String: 31b78669a324805ea5d0d8593fce14cc6631b36b1bdd3105afadd6d83359fe3f

Now assuming all message packets are 1kb, let's say you send me a 2 KB message. So the first packet is tagged with your original public contact string

20b2ba0d04afc53a0e448e084286168e9e7b310dcc3c9d5895ece72208f457f6

and the next packet is tagged with the hash value of your original public contact string concatenated with your private contact string:

49cf76433764ffd0064d312cc59958e3342e4e06a6b71d7c9244570c0b1e90ee

etc for every message. When I receive messages I use this system:

http://hms.isi.jhu.edu/acsc/privss/

Quote
Suppose a client sends some search keywords to a server. The server checks some documents against the keywords and eventually sends back all the documents that matched. But the catch is that the client wants all this to take place without the server being able to learn what keywords they are interested in or which documents they end up with. These programs let you do that.

and first I search for the first hash that I know will be concatenated to any messages from you, and I set my limit to be 1 returned packet (this protocol lets you set the maximum number of documents, in my case packets, to be returned by the search). After I get your first packet it has information letting me know there is a second packet, so now I do the keyword search for the next contact string, etc, until I have all of the packets you have sent to me so far. I do this for all of my contacts or until I hit a predefined amount of packets for a given cycle (period of time), and if the number of packets I receive so far does not equal the maximum number of packets I receive per cycle, I search for and receive padding packets up to my maximum.

It still might be more efficient if I have an anonymous membership query system so I only search for messages that are actually waiting for me though, depending on how efficient I can make the anonymous membership query versus how efficient it is to search for keywords that do not map to anything in particular. This is actually pretty neat I can probably make it work for static content websites as well, and possibly even specialized interactive hidden services. This encrypted keyword search algorithm has really good bandwidth properties, the bandwidth required is the size of the message received plus a small bit of metadata, so to receive 1kb packet only costs a little over 1kb, and with this I can let the clients get messages part by part from throughout the database instead of having to get a sequential bucket like I would need to do with block PIR (which means there is no need for a semi-trusted nymserver that can tell who is communicating with who, in order to block all messages to a single user together so they can be received with block PIR).
Title: Re: anonymous membership query
Post by: kmfkewm on August 03, 2013, 03:41 am
Automatically encrypted with ECDH-384 + AES-256 + signed with ECDSA-384 mixed outgoing messages and padding + padded private keyword search for getting messages , should be pretty good security and anonymity for forum communications. I will probably add blind Bitcoin mixing and integrate Bitcoin as well, for one click payment of pseudonyms (or maybe somebody will fork Bitcoin and integrate Zerocoin already). Time to get back to work :D.
Title: Re: anonymous membership query
Post by: astor on August 03, 2013, 04:41 am
Nice work, kmf.

At this point, the main barrier to libzerocoin integration is political rather than technical. The Bitcoin developers fear that Bitcoin will be seen as a money laundering system (even more than it is now) if Zerocoin is integrated, so they are unlikely to include it in the Bitcoin protocol / Bitcoin-QT client, although their opinion may change. That doesn't stop other altcoins from integrating it, of course.