Scribe Notes on PeerDB: A P2P-based system for distributed data sharing.

by Joe Prokop (prokjos@iit)

The goal of this paper is to build a p2p network of database-enabled nodes that can all be searched with a SQL-like syntax. This is achieved by building the database application on top of the BestPeer p2p system. The major problem with this concept is that it seems to require peers to know the schema of the data on other peers. The schema is the format of the tables and columns within a database. When querying a database, knowledge of this schema is used to create SQL statements that specify the data to be retrieved. In this p2p model there may be thousands of peers, each with its own database that is built and maintained by that peer. It's extremely unlikely that if 30 peers all have the same type of data, they will all store it with exactly the same schema. For this reason the paper proposes attaching metadata to each schema to describe its content. With this metadata the PeerDB system can search other peers by content and not just by schema, as a typical database system would have to. This allows a user to enter a SQL-like statement and still get all relevant data even if it is stored under a different schema.
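To make the idea of searching by content rather than by schema concrete, here is a minimal sketch in Python. It assumes the metadata is a set of descriptive keywords per table and per column, and that a match is scored by simple keyword overlap. The names TableMeta, match_score, and rank_tables, the threshold value, and the sample tables are all made up for illustration and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical metadata record: keywords describing one shared table."""
    peer: str
    table: str
    table_keywords: set[str]
    # Maps each column name to the keywords that describe it.
    column_keywords: dict[str, set[str]] = field(default_factory=dict)


def match_score(query_keywords: set[str], meta: TableMeta) -> float:
    """Fraction of the query keywords found in the table or column keywords."""
    if not query_keywords:
        return 0.0
    known = set(meta.table_keywords)
    for words in meta.column_keywords.values():
        known |= words
    return len(query_keywords & known) / len(query_keywords)


def rank_tables(query_keywords, catalog, threshold=0.5):
    """Return (score, meta) pairs at or above the threshold, best first."""
    scored = [(match_score(query_keywords, m), m) for m in catalog]
    hits = [(s, m) for s, m in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Two peers store the same kind of data under different schemas.
    catalog = [
        TableMeta("peerA", "Kinases", {"protein", "kinase"},
                  {"seq": {"sequence"}}),
        TableMeta("peerB", "prot_tbl", {"protein"},
                  {"aa_string": {"sequence", "amino"}, "nm": {"name"}}),
        TableMeta("peerC", "songs", {"music"}, {"title": {"name"}}),
    ]
    for score, meta in rank_tables({"protein", "sequence"}, catalog):
        print(meta.peer, meta.table, score)
```

Both peerA and peerB match the query even though their table and column names have nothing in common, which is the point of describing content with metadata. The ranked list would then be shown to the user, who decides which tables to actually query.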

The BestPeer system is the base p2p system that the PeerDB system is built on. It functions much like other p2p systems such as Gnutella, with several major exceptions. First, the system allows for mobile agents. A mobile agent is a code fragment that can be transmitted to other peers and executed there. These mobile agents can perform tasks such as formatting data before it is returned to the peer that requested it. Second, the BestPeer system shares not only storage space but also processing power. This happens through the mobile agents, which execute code on neighboring peers. Third, the system can reconfigure the network itself in order to cluster peers that share similar data. Peers that share no data between them will probably not be neighbors in the network. Finally, each peer in the BestPeer system has a unique BestPeer Global Identity (BPID). This allows a peer to be identified as the same peer regardless of its current IP address. When a peer logs into the network it looks up its BPID on the Location-Independent Global Names Lookup Server (LIGLO).
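The BPID idea can be sketched as a small lookup table that maps a permanent identity to whatever IP address the peer currently has. The sketch below is only an illustration: the class LigloServer, its methods, the BPID format, and the assumption that a peer reports its current IP at login are all hypothetical and not taken from the paper.

```python
import itertools


class LigloServer:
    """Hypothetical in-memory stand-in for a LIGLO name server."""

    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)
        # Maps BPID -> current IP address, or None while the peer is offline.
        self._addresses = {}

    def register(self):
        """Issue a new BPID; the peer keeps it for life (assumed format)."""
        bpid = f"{self.name}:{next(self._counter)}"
        self._addresses[bpid] = None
        return bpid

    def login(self, bpid, ip):
        """Record the IP address the peer is using for this session."""
        if bpid not in self._addresses:
            raise KeyError(f"unknown BPID: {bpid}")
        self._addresses[bpid] = ip

    def logout(self, bpid):
        """Mark the peer as offline without forgetting its identity."""
        if bpid in self._addresses:
            self._addresses[bpid] = None

    def resolve(self, bpid):
        """Return the peer's current IP, or None if offline or unknown."""
        return self._addresses.get(bpid)


if __name__ == "__main__":
    server = LigloServer("liglo-1")
    bpid = server.register()
    server.login(bpid, "10.0.0.5")
    print(server.resolve(bpid))
    # The same peer comes back later with a different address.
    server.login(bpid, "192.168.1.9")
    print(server.resolve(bpid))
```

Because other peers remember the BPID rather than the IP address, a neighbor can still be found after it reconnects from a new address.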

The PeerDB system is built on top of the previously described BestPeer system. The PeerDB system itself consists of four main parts: a database, a DBAgent, a cache manager, and the user interface. The database used in this implementation is MySQL. It provides an efficient storage system for the data held by the local peer. Associated with the database are two dictionaries for metadata. One is a local dictionary for data that is not shared and the other is the export dictionary for shared data. The metadata stored in these dictionaries describes the content of each table and of each attribute within that table. The DBAgent provides a space for mobile agents to run their code. The mobile agents do most of the work of sorting through the metadata to find the best matches. The user has the final say as to which tables to retrieve from a peer that is being searched. The cache manager stores results from previous queries in case it receives a similar query from a user. This can limit the strain on the peer that actually maintains the data as well as reduce overall network usage. The user interface provides the interface between the system and the user.
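The role of the cache manager can be illustrated with a small result cache. This is a rough sketch under several assumptions: the class name QueryCache is hypothetical, a "similar" query is reduced here to the same query text after collapsing whitespace, and least-recently-used eviction is a choice made for the example, not something stated in these notes.

```python
from collections import OrderedDict


class QueryCache:
    """Hypothetical result cache keyed by whitespace-normalized query text."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first

    @staticmethod
    def _key(sql):
        # Collapse runs of whitespace; case is left alone so that string
        # literals inside the query are not altered.
        return " ".join(sql.split())

    def get(self, sql):
        """Return cached rows for the query, or None on a miss."""
        key = self._key(sql)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, sql, rows):
        """Store rows for the query, evicting the least recently used entry."""
        key = self._key(sql)
        self._entries[key] = rows
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


if __name__ == "__main__":
    cache = QueryCache(capacity=2)
    cache.put("SELECT name  FROM kinases", [("abl1",)])
    # Same query with different spacing is served locally.
    print(cache.get("SELECT name FROM kinases"))
```

A hit is answered locally without contacting the remote peer, which is where the savings in peer load and network traffic come from. A real cache would also need some way to notice that the remote data has changed.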

The PeerDB system described in this paper provides a fairly unusual method for sharing data. Whole files do not need to be transferred; only the desired data does. It also allows for the use of mobile agents, which can execute code on other peers' machines. This leaves open the possibility of hostile peers executing malicious code that damages other peers. This threat can probably be dealt with given enough safeguards in the system. A problem that is much more difficult to deal with is the creation of bad or incomplete metadata. Metadata is supposed to give a description of the data contained in a table or attribute. Incomplete descriptions will reduce the amount of relevant data a query can return. Bad metadata can come from a person setting up the system who doesn't fully understand what will be stored, or from a person who just doesn't know how to properly create this metadata. It's much harder to create good metadata than people would probably think.
9/17/03