Peer-to-Peer

PDBS - Peer-to-peer database system

DDBS - Distributed database system

The P2P paradigm is meant as an alternative to existing information system infrastructures. Its most important features are:

1. Scalability in terms of the number of nodes and distribution.
2. Direct access to data at the source, which guarantees freshness, in contrast to centralized repositories.
3. Robustness and resilience against attacks and churn (peers joining and leaving, so that data becomes unavailable while a peer is offline) by exploiting self-organization principles.
4. Simplified deployment, because resources (nodes) can join the network without any particular administrative task. For example, a new data repository can simply be added to a P2P network.

Peer-to-peer database systems:
The main data integration and interoperability idea in peer data management is to avoid a global schema by providing mappings between pairs of information sources. It is not necessary to map every peer to every other peer; it is enough that the graph of mappings is connected. Any two sources can then be related by composing the pairwise mappings along a path between them. The global index or mapping graph can be either centralized or distributed. The export schema of each PDBS contains only the elements of the local schema that the peer wants to share with other peers. Peers autonomously decide which part of their data to exchange with other peers in the data integration scenario by using mapping rules (source-to-target dependencies?).
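The connectivity argument above can be illustrated with a small sketch. It assumes the mapping graph is available as a plain list of peer pairs and that each mapping can be followed in both directions; the function name mapping_path and the peer names are made up for the example. A breadth-first search returns the shortest chain of pairwise mappings between two peers, if one exists.

```python
from collections import deque


def mapping_path(mappings, source, target):
    """Return the peers on a shortest chain of pairwise mappings, or None."""
    # Assumption: every mapping is usable in both directions.
    graph = {}
    for a, b in mappings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}  # also serves as the "visited" set
    queue = deque([source])
    while queue:
        peer = queue.popleft()
        if peer == target:
            # Walk the parent links back to the source.
            path = []
            while peer is not None:
                path.append(peer)
                peer = parent[peer]
            return path[::-1]
        for neighbor in sorted(graph[peer]):
            if neighbor not in parent:
                parent[neighbor] = peer
                queue.append(neighbor)
    return None  # the two peers lie in different components


# Hypothetical mapping graph: only three pairwise mappings for four peers.
mappings = [("A", "B"), ("B", "C"), ("C", "D")]
path = mapping_path(mappings, "A", "D")
```

Here A and D have no direct mapping, yet they can still be related through B and C. If the graph were disconnected, the function would return None for peers in different components.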

Unstructured and structured network types: pure P2P, super-peer, hybrid

In a pure P2P system all participants have the same functionality and there is no stored global index of peers.

In super-peer networks a number of peers act as super-peers. These may hold internal indexes describing the data of their normal peers and of the other super-peers. Communication takes place at two levels: first among the super-peers and then among the normal peers.
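A minimal sketch of the two-level idea, under simplifying assumptions: each super-peer indexes only its own normal peers by keyword, and it consults the other super-peers directly instead of sending network messages. The class name SuperPeer, its methods and the peer names are hypothetical.

```python
class SuperPeer:
    """Toy super-peer holding a keyword index over its attached normal peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # keyword -> set of normal peers offering matching data
        self.others = []   # other super-peers this one can ask

    def register(self, peer, keywords):
        # A normal peer announces which keywords describe its data.
        for keyword in keywords:
            self.index.setdefault(keyword, set()).add(peer)

    def lookup(self, keyword):
        # Level 1: consult the local index and the other super-peers.
        hits = set(self.index.get(keyword, ()))
        for other in self.others:
            hits |= other.index.get(keyword, set())
        # Level 2 (not modelled here): the requester contacts the returned peers.
        return hits


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.others.append(sp2)
sp1.register("p1", ["cars"])
sp2.register("p2", ["cars", "boats"])
candidates = sp1.lookup("cars")
```

The query touches only the super-peers; normal peers are contacted only once the indexes say they hold matching data.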

In hybrid P2P systems a server or cluster may hold a global index of the data.

The above systems are classified as unstructured because they place no restriction on where data is stored. Structured P2P networks are based on Distributed Hash Tables (DHTs), in which uniformly distributed hash keys are used to enable efficient lookups. Each peer in such a network uses a protocol to maintain local information about a subset of the other peers (its neighbors), which enables efficient routing.
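To make the data placement rule concrete, here is a toy sketch of a DHT on an identifier ring. It is not any particular DHT protocol: it assumes SHA-1 hashing into a 16-bit identifier space, assigns each key to the first peer whose identifier is greater than or equal to the key's identifier (wrapping around), and keeps a global view of the ring, which a real peer would not have. The names ring_id and ToyDHT are invented for the example.

```python
import hashlib
from bisect import bisect_left


def ring_id(name, bits=16):
    """Hash a peer name or data key into the identifier space [0, 2**bits)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Global-view toy: real peers only know a subset of the ring."""

    def __init__(self, peer_names, bits=16):
        self.bits = bits
        self.ring = sorted((ring_id(n, bits), n) for n in peer_names)
        self.store = {n: {} for n in peer_names}

    def responsible_peer(self, key):
        # First peer clockwise from the key's identifier, wrapping at the end.
        ids = [peer_id for peer_id, _ in self.ring]
        idx = bisect_left(ids, ring_id(key, self.bits)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key, value):
        self.store[self.responsible_peer(key)][key] = value

    def get(self, key):
        return self.store[self.responsible_peer(key)].get(key)


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.put("song.mp3", "stored at the responsible peer")
value = dht.get("song.mp3")
```

Because the hash of the key alone decides which peer stores it, a lookup does not need to ask every peer, in contrast to the unstructured case.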

Characteristics of P2P-based approaches
Degree of coupling: This is a peer’s “awareness” of the existence of other peers. In a DDBS all nodes are known to the other sites (or to the coordinator site) at any time, while in a PDBS peers can join or leave the network dynamically. Coupling is therefore tighter in a DDBS than in a PDBS. The degree of coupling also determines how much self-organization peers can do: this is limited in structured P2P systems, while peers in unstructured P2P systems can continuously reorganize themselves.

Overlay Network topology: The different classes of P2P overlay networks mainly differ in their topology unfinished...

Routing strategies: In systems without a fixed topology, the default routing method for answering requests is flooding the network, although solutions have been proposed that maintain routing information to allow directed, semantic routing. Structured PDBSs rely on stored information about neighbors to route each request towards the neighbor whose identifier is closer to the search key.
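The two strategies can be contrasted with a small sketch. It assumes integer peer identifiers, a hypothetical neighbor table, a time-to-live (TTL) limit on flooding, and the absolute difference as the distance between an identifier and the search key. Real structured overlays use their own distance metrics; the functions flood and greedy_route are illustrative only.

```python
def flood(graph, start, has_data, ttl):
    """Forward the query to all neighbors for ttl rounds; count messages."""
    seen = {start}
    frontier = [start]
    hits = {start} if has_data(start) else set()
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for peer in frontier:
            for neighbor in graph[peer]:
                messages += 1  # every forwarded copy costs one message
                if neighbor not in seen:
                    seen.add(neighbor)
                    if has_data(neighbor):
                        hits.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1
    return hits, messages


def greedy_route(graph, start, key, distance):
    """Repeatedly hop to the neighbor closest to the key; one message per hop."""
    path = [start]
    current = start
    while True:
        best = min(graph[current], key=lambda n: distance(n, key), default=current)
        if distance(best, key) >= distance(current, key):
            return path  # no neighbor is closer: current is the closest peer found
        current = best
        path.append(current)


# Hypothetical overlay: peer id -> ids of the neighbors it knows about.
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4],
         4: [3, 5, 0], 5: [4, 6], 6: [5, 7], 7: [6]}

hits, flood_messages = flood(graph, 0, lambda p: p == 6, ttl=3)
route = greedy_route(graph, 0, 6, lambda a, b: abs(a - b))
```

In this example flooding needs 13 messages to reach peer 6, while greedy routing reaches it along the path 0, 4, 5, 6 with 3 messages. With a TTL of 2 the flood would miss peer 6 entirely.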

Scalability: Unstructured networks that rely on flooding have poor scalability. Super-peer networks partially solve the flooding problem by flooding only among the super-peers. Another solution is the “random walk”, where a query is forwarded to only one peer at a time. Structured networks scale better because queries are routed only to selected peers, and they can guarantee perfect recall.
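A random walk can be sketched as follows, assuming a hypothetical neighbor table, a uniformly random choice of the next peer, and a step limit after which the query is abandoned. The function name random_walk and its parameters are invented for the example.

```python
import random


def random_walk(graph, start, has_data, max_steps, rng=None):
    """Forward the query to one random neighbor at a time.

    Returns (peer holding the data or None, number of messages sent).
    """
    rng = rng or random.Random()
    current = start
    messages = 0
    while True:
        if has_data(current):
            return current, messages
        if messages == max_steps:
            return None, messages  # give up: the walk did not reach a holder
        current = rng.choice(graph[current])
        messages += 1


# Hypothetical overlay: peer id -> ids of the neighbors it knows about.
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4],
         4: [3, 5, 0], 5: [4, 6], 6: [5, 7], 7: [6]}

found, messages = random_walk(graph, 0, lambda p: p == 6, max_steps=50,
                              rng=random.Random(1))
```

The message count is bounded by max_steps, no matter how many peers exist, which is what makes the approach scale better than flooding. The price is that the walk may end without finding data that is present in the network, so recall is not guaranteed.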

Anonymity: It is possible to achieve anonymity for both the peer that sent a request and the peer that holds the information by routing the request through many peers and by replicating content.