You know, a quote comes to mind that I find tremendous wisdom in: "premature optimization is the root of all evil."
Be sure you're solving a problem you're actually faced with -- do you really have all these users chomping at the bit, or will it be several years at least before your system (if it takes off) is under that kind of strain? Are you taking into account the increased computational power that will be available to you by the time you actually need to solve this issue?
Just some thoughts that may or may not be of use...
That is a good point. I have been trying to make something that will scale to 500,000 users making 10 posts a day, but so far I have no users and they are making no posts a day. Assuming ten thousand users making ten posts a day instead, things become much more realistic in the bandwidth department. Say every message is 10 KB and the metadata associated with a message is 1 KB:
10,000 users making 10 posts each is 100,000 messages per day. At 1 KB of metadata per message, every user downloads about 98 megabytes of metadata per day, and 98 MB * 10,000 users ≈ 960 gigabytes of metadata bandwidth per day.
If each user retrieves 5,000 messages per day: 10,000 users * 5,000 messages * 10 KB per message ≈ 480 gigabytes per day in message data received.
Assuming Pynchon Gate PIR with 100,000 messages per day, each query is a 100,000-bit vector, or about 12.2 KB. 12.2 KB per query * 5,000 queries per user * 10,000 users ≈ 580 gigabytes per day in request data.
960 + 580 + 480 ≈ 2,020 gigabytes, or about 1.97 terabytes of bandwidth required per day, which is realistically affordable with a high-end server package.
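Here is a quick Python sketch of that arithmetic (assuming, as above, that every user pulls the full metadata list, retrieves 5,000 messages a day, and sends one 100,000-bit PIR query per retrieved message); it lands within rounding of the figures above:

```python
# Back-of-the-envelope bandwidth for the 10,000-user scenario.
KB = 1024
GB = 1024 * 1024 * KB

users = 10_000
posts_per_user = 10
messages_per_day = users * posts_per_user   # 100,000 messages/day
message_size = 10 * KB                      # assumed message size
metadata_size = 1 * KB                      # assumed metadata per message
retrieved_per_user = 5_000                  # messages each user fetches per day (assumption)

# Metadata: every user downloads 1 KB for every message posted that day.
metadata = users * messages_per_day * metadata_size        # ~954 GB/day

# Message data: each user downloads 5,000 messages of 10 KB each.
messages = users * retrieved_per_user * message_size       # ~477 GB/day

# PIR requests: one query per retrieved message, one bit per message in the pool.
query_size = messages_per_day / 8                          # ~12.2 KB per query
requests = users * retrieved_per_user * query_size         # ~582 GB/day

total = metadata + messages + requests
print(f"metadata {metadata / GB:6.0f} GB/day")
print(f"messages {messages / GB:6.0f} GB/day")
print(f"requests {requests / GB:6.0f} GB/day")
print(f"total    {total / GB / 1024:6.2f} TB/day")
```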
On the other hand, 500,000 users making 10 posts a day is 5,000,000 messages per day, which brings metadata to about 4.8 gigabytes per user per day; 4.8 GB * 500,000 users is roughly 2,300 terabytes of metadata a day, which is completely unrealistic.
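The same throwaway calculation with the larger numbers (same 1 KB-per-message metadata assumption) shows where it blows up:

```python
# Metadata alone for the 500,000-user scenario.
users = 500_000
posts_per_user = 10
metadata_kb = 1

messages_per_day = users * posts_per_user                    # 5,000,000 messages/day
per_user_gb = messages_per_day * metadata_kb / 1024 / 1024   # ~4.8 GB per user per day
total_tb = per_user_gb * users / 1024                        # ~2,300 TB per day
print(f"{per_user_gb:.1f} GB per user/day, {total_tb:,.0f} TB total/day")
```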
But I would much rather make something that can scale to 500,000 users than something that only scales to 10,000 users.