Data Dissemination

Van Jacobson gave a talk at Google on or around the 30th of August, 2006. He brings up interesting ideas for true data dissemination.

The thrust of the talk is basically along the lines of: “we shouldn’t care about where data is, but getting the data”. This effectively leads on to a caching-like system. Now be aware, calling it caching does it a bit of a disservice, but it’s the easiest way to explain it without watching the talk. Effectively any device connected to the network (by any means) should be able to provide data to any other client. In a time where we’re up against things such as Net Neutrality, this is rather interesting. We shouldn’t care where the end point is, only about getting the data and verifying that it’s the real data. Pretty interesting stuff, in my opinion. Unfortunately I see a few issues, which I don’t remember being explicitly brought up.
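To make “don’t care where it comes from, just check it’s real” a bit more concrete, here is a minimal sketch. It assumes the data is named by its SHA-256 digest, which is my own simplification and not necessarily the scheme from the talk; `name_for`, `fetch` and the dictionary “peers” are all made up for illustration.

```python
import hashlib


def name_for(data: bytes) -> str:
    """Name a blob by its SHA-256 digest (an assumption, not the talk's scheme)."""
    return hashlib.sha256(data).hexdigest()


def fetch(name: str, peers: list[dict[str, bytes]]) -> bytes:
    """Ask each peer in turn; accept the first copy whose digest matches the name."""
    for peer in peers:
        candidate = peer.get(name)
        if candidate is not None and name_for(candidate) == name:
            return candidate
    raise LookupError(f"no peer holds a valid copy of {name}")


original = b"hello, network"
name = name_for(original)

# Hypothetical peers: plain dictionaries standing in for devices on the network.
liar = {name: b"tampered bytes"}   # serves bad data under the right name
honest = {name: original}          # any device holding a genuine copy

# The bad copy is rejected and the genuine one is returned, whoever served it.
print(fetch(name, [liar, honest]) == original)
```

The catch is visible straight away: change a single byte and the name changes too, which is exactly the static-content limitation of the BitTorrent analogy.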

If you look at the system through a BitTorrent analogy, which Van Jacobson sort of does, the first issue is that BitTorrent works very well, but only on large data sets and static content. Once you release data onto a BitTorrent “network” it’s very difficult to alter that data without scrapping it and starting from scratch. To combat this, you need to look at the system in a slightly different way: perfecting the analogy, if you will.

Compare the concept to DNS. There’s a canonical source, which all secondary sources ping (even if via a “proxy”) to check whether there are changes. The issue is that whilst this copes with dynamic content, a change will often take some time to be distributed throughout the entire system. This brings up questions of data expiration and so forth. For private areas of web applications, how do you prevent content from being cached and then shared?
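The expiration problem is easier to see in code. This is a rough sketch of a DNS-style secondary copy with a time-to-live, assuming a single canonical source that can always be asked again; `TtlCache`, the `origin` callable and the fake clock are hypothetical names of mine, not anything from the talk.

```python
import time
from typing import Callable


class TtlCache:
    """Secondary copy that goes back to the canonical source once an entry expires."""

    def __init__(self, origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries: dict[str, tuple[bytes, float]] = {}

    def get(self, key: str) -> bytes:
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now < cached[1]:
            return cached[0]          # still within its TTL, possibly out of date
        value = self.origin(key)      # expired or missing: ask the source again
        self.entries[key] = (value, now + self.ttl_seconds)
        return value


# Hypothetical canonical source, plus a fake clock so the run is deterministic.
source = {"page": b"v1"}
now = [0.0]
cache = TtlCache(origin=lambda k: source[k], ttl_seconds=60, clock=lambda: now[0])

first = cache.get("page")   # fetched from the source
source["page"] = b"v2"      # the canonical copy changes
stale = cache.get("page")   # the old copy is still served until the TTL runs out
now[0] = 61.0
fresh = cache.get("page")   # TTL elapsed, so the change finally shows up
```

A short TTL means fresher data but more traffic back to the source, and a long one means the opposite. Nothing here addresses the private-content question at all.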

This leads on to data verification, which in my view is presented as a somewhat unsolved issue. Then again, this is a talk on infrastructure, not the finer details.

Finally, the major issue as I see it is this: how do you persuade people that sharing their storage space is worth it? If everything is acting as a “cache”, then how much space do you dedicate to the system? How do you prevent leechers?
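One possible answer to the “how much space” question is to let each device set a fixed byte budget and throw out whatever was used least recently. The sketch below assumes least-recently-used eviction, which is my choice and not something the talk proposes; `BoundedCache` and `max_bytes` are illustrative names.

```python
from collections import OrderedDict


class BoundedCache:
    """Cache limited to a byte budget, evicting the least recently used items."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.items = OrderedDict()   # name -> data, oldest use first

    def put(self, name: str, data: bytes) -> bool:
        if len(data) > self.max_bytes:
            return False             # can never fit, so refuse it outright
        if name in self.items:
            self.used_bytes -= len(self.items.pop(name))
        # Make room by dropping the least recently used entries.
        while self.used_bytes + len(data) > self.max_bytes:
            _, evicted = self.items.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.items[name] = data
        self.used_bytes += len(data)
        return True

    def get(self, name: str):
        data = self.items.get(name)
        if data is not None:
            self.items.move_to_end(name)   # mark as recently used
        return data


cache = BoundedCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"1234")
cache.get("a")            # "a" is now the most recently used
cache.put("c", b"123")    # over budget, so "b" is evicted
print(list(cache.items))  # ['a', 'c']
```

This only caps what a device gives up. It does nothing to persuade anyone to share in the first place, and nothing about leechers.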

Overall, it’s a very cool talk. It starts off with an introduction to how the ARPA network and TCP/IP came about, but from a very human standpoint. If you don’t care about networking concepts, but are interested in how things came about in a human way (including a brief overview of the arguments over what originally constituted a network), then it might be worth watching the first 30 minutes or so.