Survey of System Management Frameworks

Outline

Let's start with a brief outline of the rationale behind the criteria used to compare the different implementations. System management frameworks monitor a system, decide how to fix things, and execute those decisions. They typically consist of three parts: sensors, a brain that makes decisions, and a set of actuators that execute those decisions. From a 10,000-foot view, most architectures follow this model, but they differ in how they collect data, make decisions, and execute them. The first table compares them on this aspect.

System management frameworks themselves have to scale and provide high availability. To do so, they have to be composed of multiple servers (managers). Making decisions with multiple managers (coordination) is one of the key challenges in system management framework design. The second table compares and contrasts the different coordination models and the architectural styles they use.

The third table discusses decision models, in other words the implementation of the brain. The fourth, fifth and sixth tables discuss details about individual systems, and finally the seventh table presents some surveys on the topic.

  1. Systems Categorization based on Decision Loop Design - this compares how decisions are made. There are many models: for example, sensors feeding a single brain, a snapshot of system state that is evaluated, or decisions driven directly by reactions to events.
  2. Systems Categorization based on Coordination and Communication model - this discusses how managers in the framework coordinate their decisions and what architectures are used.
  3. Decision Models in Management Systems
  4. Details About Individual Systems
  5. Details about Decision Models
  6. Details about Coordination based Systems
  7. Surveys

Systems based on Functionality/Design

Sensors->Brain-> (Actuators)?

InfoSpect (Prolog), WRMFDS (AI), Rule-basedDiagnosis (Hierarchy) (Monitoring & Diagnosis)
Ref Archi, Autopilot, Policy2Actions (Centralized)
JADE, JINI-Fed, ReactiveSys, (Managers)
Tivoli (Managers + Manual Coordinator)

Sensors->Gauges->Brain-> (Actuators)?

Paradyn MRNet (Monitoring Only)
Rainbow (Archi Model*), ACME-CMU, CoordOfSysMngServices, Code
InfraMonMangGrid (Centralized)
eXtreme (KX), Galaxy* (Managers)

Sensors->DataModel->Brain->Actuators

DealingWithScale(CIM), Marvel( Centralized)
Scalable Management (PLDB/Managers), Nester (Distributed directory service), SelfOrgSWArchi

Sensors->ECA->Actuators

DREAM, SmartSub, IrisLog, HiFi, Sophia (Prolog) - (Managers)

Management Hierarchy

NaradaBrokerMng, WildCat, BISE, ReflexEngine

Queries on demand

ACME , Decentralizing Network Management, Mng4P2P (Managers)

Workflow systems/ Resource management

Unity, RecipeBased (Centralized),
SelfMngWFEngine, Automate Accord, ScWAreaRM, P2PDesktopGrid (Distributed)

Fault tolerant systems

WSRF-Container, GEMS (Distributed)

Decentralized

DMonA,Guerrilla, K-Componets

Deployment frameworks

SmartFrog , Plush, CoordApat

Monitoring

Inca (Centralized), PIPER (multicast over DHT), MDS (LDAP, Events), NWS (LDAP), Scalable Info management, Astrolabe (Gossip), Monitoring with accuracy, RGMA (DB-like access)

Systems based on Coordination and Communication model

Columns: Monitoring Only | One Manager without coordination | One Manager with coordination | Managers without coordination | Managers with coordination

Pull / events: InfoSpect, Sophia | Rainbow, Unity, Ref Archi, Autopilot, InfraMonMangGrid | CoordOfSysMngServices | JADE, DMonA, Tivoli (Manual)

Pub/Sub hierarchy: DealingWithScale, ACME-CMU | Scalable Management, DREAM, SmartSub, NaradaBrokerMng

P2P: PIPER | Automate Accord, Mng4P2P, WSRF-Container

Hierarchy: Ganglia, Globus MDS, Paradyn MRNet, WRMFDS | IrisLog, HiFi, JINI-Fed, eXtreme (KX), ScWAreaRM, WildCat

Gossip: Astrolabe, GEMS | Galaxy

Spanning Tree (Network/P2P): Scalable Info management, ACME

Group Communication: ReactiveSys, Galaxy, SelfOrgSWArchi

Distributed Queue: SelfMngWFEngine

Decision Models in Management Systems

Columns: Rules | Conflict resolution | Verifier | Batch mode used? | Meta-model used? | Planning

DIOS++: If/then | Yes, uses priority | No | Results of rules are applied in the next iteration | Yes | No

Rainbow: If/then | No | Yes | No

InfoSpect: Prolog-like | Only monitoring | No | No | Yes | No

Marvel (1995): PreCond->action->PostCond | Yes | No | Yes | No

CoordOfSysMngServices: Yes (Detect -> Human Help) | Yes | Yes | No | Yes

Sophia: Prolog-like | No | No | No

RecipeBased: Java code | Yes | No | No | No

HiFi, DREAM: Pub/sub filter->action | No | No | No | No | No

IrisLog: DB triggers | No | No | No | No | No

ACME: (timer/sensor/completion) conditions->action | No | No | Possible with timer conditions | Yes | No

ReactiveSys (1993): If/then | No | No | No | Yes | No

Policy2Actions (2007): Policy = name, conditions (name of method to run + parameters), actions, target component | Yes, based on runtime state + history | There are tests associated with each action that decide whether the action needs to run | - | No | Yes

Details About Management Systems

Columns: Name | Description | Decision-taking unit

SmartFrog

Abstract: A deployment and management framework - components can be grouped and placed in a hierarchy. Groups are managed via the top-level component's management methods. Each host runs a SmartFrog management agent (which uses RMI). It also provides a language for describing collections of components, their configurations and dependencies (start order).
Components implement interfaces that provide management actions (creation, activation, termination, getting state) and dynamic remote deployment. Components also have JMX support.

Does not discuss rules

Anubis(HP)
(READ second paper)
2005

Partition manager uses UDP multicast heartbeat to determine the partition view and stability. A leader is selected for each partition, and state manager provides state dissemination between partitions.
(Peer groups with multicast)

Plush (PlanetLab App management)

Abstract: A deployment service that allows a user to perform the same operation on many nodes and collect the results. An application controller runs on the user's machine and a client process runs on each node managed by Plush. The application controller parses an XML application description that explains the applications, where to find the sources, and the commands needed to install them. It finds resources and installs the applications; the remote clients monitor them, provide status updates to the controller, and take user-defined actions in case of a failure.

There is no central control; each action is orchestrated by the initializing host.

Monalisa

Abstract: JINI-based monitoring framework - based on Java JINI. There are agent services that collect monitoring data. Agent services register with discovery services using soft state (subscriptions that time out). Those registrations are replicated across discovery services in the same group. In addition, discovery services may register with other discovery services, making it possible to build a hierarchy.

Agent services are discovered and used to monitor the system. There is a station server for each site, which handles a set of agent services (sensors). Each station server collects, indexes and archives monitoring data about the site.
It is possible to take manual actions from the GUI.

Only monitoring
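
The soft-state registration described above (register with a lease, expire on timeout) can be sketched roughly as below; the class and method names are illustrative and not part of MonALISA's actual API.

    import time

    class SoftStateRegistry:
        """Registrations expire unless the agent service renews them in time."""
        def __init__(self, ttl_seconds=30):
            self.ttl = ttl_seconds
            self.entries = {}                      # service name -> lease expiry time

        def register(self, name):
            self.entries[name] = time.time() + self.ttl   # register or renew the lease

        def lookup(self):
            now = time.time()
            # drop registrations whose lease has timed out
            self.entries = {n: t for n, t in self.entries.items() if t > now}
            return list(self.entries)

    registry = SoftStateRegistry(ttl_seconds=30)
    registry.register("site-A/cpu-sensor")   # agents must keep re-registering periodically
    print(registry.lookup())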

GEMS:Gossip-Enabled Monitoring Service for Scalable Heterogeneous Distributed Systems (2006)

Gossip-based architecture for monitoring and failure detection. Each node maintains a gossip list, a suspect vector, and a suspect matrix (all suspect vectors). Nodes randomly gossip with each other, sending their gossip list and suspect matrix.
In hierarchical mode, nodes form a group hierarchy, and each node maintains information about its group and all parent groups. Nodes take turns talking to other groups, thus avoiding the need for a coordinator.

N/A
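
A rough sketch of the gossip-list/suspect-vector bookkeeping described above, assuming a fixed membership and integer heartbeat counters; GEMS' real message layout, consensus on the suspect matrix, and the hierarchical mode are not modeled.

    import random

    class GossipNode:
        def __init__(self, node_id, all_ids, cleanup=5):
            self.id = node_id
            self.cleanup = cleanup                        # rounds of silence before suspicion
            self.heartbeat = {n: 0 for n in all_ids}      # gossip list: freshest counters seen
            self.suspect = {n: False for n in all_ids}    # suspect vector

        def tick(self):
            self.heartbeat[self.id] += 1                  # advance own heartbeat
            for n, hb in self.heartbeat.items():
                self.suspect[n] = n != self.id and (self.heartbeat[self.id] - hb) > self.cleanup

        def gossip_with(self, other):
            # merge gossip lists: both sides keep the freshest heartbeat seen for every node
            for n in self.heartbeat:
                freshest = max(self.heartbeat[n], other.heartbeat[n])
                self.heartbeat[n] = other.heartbeat[n] = freshest

    nodes = [GossipNode(i, range(3)) for i in range(3)]
    for _ in range(10):
        for node in nodes:
            node.tick()
            node.gossip_with(random.choice(nodes))
    print(nodes[0].suspect)       # all False while every node keeps heartbeating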

Ganglia

Abstract: A cluster monitoring system - each node keeps information about all other nodes in the cluster, and each node multicasts its own monitoring data within the cluster. Across multicast domains, values are propagated using a hierarchy (which has one node from each domain) that uses polling to update its values.

The architecture is hierarchical; each node has alternative nodes in the same domain, so it is resilient to failures.

N/A

An Infrastructure for Monitoring and Management in Computational Grids (2000)

Abstract: A centralized Grid monitoring and management system.

Each node has sensor and actuators, and events are sent to an Event Service. The event service performs corrective actions.

Actions - programmers or Grid operations

Static code

Real Time Monitoring with accuracy objectives

Abstract: A monitoring service that pushes updates up a tree based on accuracy objectives.

GAP - a modified breadth-first spanning tree where each node updates its parent (push) when its value changes.
A-GAP only pushes updates based on accuracy objectives, and it uses heuristics to determine when to push data.

Monitoring only; uses AI to decide when to push data up
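
The push-only-when-needed idea can be illustrated as below; the per-node error budget is an invented stand-in for A-GAP's accuracy objectives and heuristics, which are more involved.

    class TreeNode:
        """Pushes its value to the parent only when it drifts past an error budget."""
        def __init__(self, parent=None, error_budget=5.0):
            self.parent = parent
            self.error_budget = error_budget
            self.local_value = 0.0
            self.last_pushed = 0.0

        def update(self, new_value):
            self.local_value = new_value
            # push up only if staying silent would violate the accuracy objective
            if abs(self.local_value - self.last_pushed) > self.error_budget:
                self.last_pushed = self.local_value
                if self.parent is not None:
                    self.parent.child_update(self.local_value)

        def child_update(self, value):
            print("parent received update:", value)

    root = TreeNode()
    leaf = TreeNode(parent=root, error_budget=5.0)
    leaf.update(3.0)    # within the budget: no push
    leaf.update(9.0)    # drifted too far: pushed to the parent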

DREAM

Abstract: A CEP system with reactive rules

Built on top of a hierarchical pub/sub system; it supports Event-Condition-Action rules, which provide reactive functionality

Event action rules, no details specified

ECA rules
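
Since the paper only names Event-Condition-Action rules without giving a syntax, the following generic ECA sketch shows the shape of such a rule; the rule fields and the event payload are invented for illustration.

    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass
    class EcaRule:
        event_type: str                      # which event type triggers evaluation
        condition: Callable[[Dict], bool]    # predicate over the event payload
        action: Callable[[Dict], None]       # reactive action to execute

    rules = [
        EcaRule("cpu_load",
                condition=lambda e: e["value"] > 0.9,
                action=lambda e: print("restart service on", e["host"]))
    ]

    def on_event(event):
        # the pub/sub hierarchy would deliver matching events to this handler
        for rule in rules:
            if rule.event_type == event["type"] and rule.condition(event):
                rule.action(event)

    on_event({"type": "cpu_load", "host": "node-7", "value": 0.95})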

IrisLog - Tolerating Correlated Failures in Wide Area Monitoring Services

Abstract: Provides sensors and sensor information in a distributed XML DB; corrective actions are triggered by DB triggers.

Requires sensors and actuators within each resource. Data is stored in a distributed XML database, which supports XML queries.
Data is represented by an XML tree, which is fragmented and distributed across a number of hosts. Each fragment is replicated across a number of nodes, called a replication group. Each group has a primary (smallest IP), which pushes updates to the parent element's replica group.
Every write has an associated timestamp, and every reader and writer must contact a quorum (majority) of replicas (so a reader is guaranteed to contact the most up-to-date replica).
An alternative approach is to contact a number of replicas in order, with each replica making sure that the replicas with higher numbers are up to date.
When the load exceeds a threshold, the data held by a node is distributed among less loaded hosts. The paper presents an algorithm to distribute the fragments.

Database triggers
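
A minimal sketch of timestamped majority-quorum reads and writes as described above; the XML fragments, the smallest-IP primary rule, and real replica selection are left out, and the names here are illustrative.

    class Replica:
        def __init__(self):
            self.value, self.ts = None, 0

    def quorum(replicas):
        return len(replicas) // 2 + 1                # a majority of the replication group

    def write(replicas, value, ts):
        for r in replicas[:quorum(replicas)]:        # contact a majority of replicas
            if ts > r.ts:
                r.value, r.ts = value, ts

    def read(replicas):
        # any majority overlaps the last majority write, so the freshest timestamp wins
        contacted = replicas[:quorum(replicas)]
        freshest = max(contacted, key=lambda r: r.ts)
        return freshest.value, freshest.ts

    group = [Replica() for _ in range(5)]
    write(group, "cpu=0.42", ts=1)
    print(read(group))                               # ('cpu=0.42', 1)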

Smart Subscriptions-2000

Hierarchical Pub/Sub system + Set of injectors to provide QOS + manageability via subscriptions

react based on subscriptions

no details

injectors - no details given

ECA

HiFi: A new Monitoring architecture for Distributed System Management (1999)

Abstract: A hierarchical CEP architecture that supports ECA based on composite events.

The system has Local Monitoring Agents (LMAs) and Domain MAs (DMAs), where a group of LMAs belongs to a DMA, and DMAs may form a hierarchy with other DMAs. Supports composite events.

When a filter is submitted, the monitoring language processor maps primitive events to LMAs, and decomposes composite events and maps them to LMA-DMA and DMA-DMA mappings.

When an event is detected at an MA, the corresponding actions are carried out. DMAs also use Petri nets to record and track event history

ECA (filter)

Astrolabe

Abstract: Gossip-based management hierarchy

A hierarchy of zones, where a zone is either a set of zones or a set of hosts. Each leaf node has attributes that describe its characteristics, and each non-leaf node has attributes defined by an aggregation function.

At each level, every node keeps the MIBs of every other node. Each node updates values by gossiping with nodes at the same level and with its immediate child nodes. Values are eventually consistent. The hierarchy can also be viewed as a database, and SQL queries over the hierarchy are supported. The architecture is decentralized and able to handle nodes leaving and joining.

No details given; based on queries, resources may be managed using RPC
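
A sketch of the zone hierarchy with non-leaf attributes computed by an aggregation function; the attribute name and the aggregation used are examples, not Astrolabe's actual MIB layout, and gossip dissemination is not modeled.

    class Zone:
        def __init__(self, name, children=None, attributes=None):
            self.name = name
            self.children = children or []        # child zones or hosts
            self.attributes = attributes or {}    # leaf zones carry raw attributes

        def aggregate(self, attr, fn):
            """Non-leaf zones derive the attribute by aggregating their children."""
            if not self.children:
                return self.attributes[attr]
            return fn(child.aggregate(attr, fn) for child in self.children)

    hosts = [Zone("h1", attributes={"free_mem": 512}),
             Zone("h2", attributes={"free_mem": 2048})]
    site = Zone("site-A", children=hosts)
    print(site.aggregate("free_mem", max))        # 2048: the value gossip would carry upward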

Reference Architecture for distributed System management 1994

Abstract: A centralized sensor-brain-actuator loop.
A set of agents collect data, and monitoring, control and configuration services build up the architecture. (Uses RPC)

Actuators: SNMP, CIM for network and instrumented OS and applications to control

Architecture is centralized and could have a single point of failure.

Explains the need for rules, but does not provide details

WSRF-Container (Rajkumar Buyya) 2007

Abstract: Services in a P2P ring, migrated based on SLA requirements

WSRF containers are placed on a P2P network, and based on the SLA and health metrics, services are migrated to a different container. If a request is directed to the old address, it can be routed to the correct address using the P2P overlay

static code

Automate Accord (Parashar et al.)

Abstract: A workflow system built with autonomic components placed in a P2P network

The system is built with autonomic components (functional, operation and control ports), each of which includes sensors, actuators and a rule engine.
These components are placed on a P2P network, and each component is capable of self-configuration, adaptation and optimization. Applications are created by composing these components together.
When a workflow is submitted, the component composer breaks the workflow down into a set of components and rules, and then the workflow is executed on the selected components in a decentralized manner. The execution shares data via a tuple space.

Rules that provide self-* properties and are generated from the workflow.

IF X THEN Y type rules

DIOS++ (2005): Rule-based steering of Distributed Scientific Applications (related to Automate Accord)

Abstract: An Autonomic Object (extending a computational object) framework that steers itself based on dynamic rules generated for the application.
The application is built from a number of Autonomic Objects, and the control network has nodes (each with a repository of local Autonomic Objects). A gateway combines those object repositories, and a rule engine decomposes rules, distributes and executes them within agents, and collects the results.

Also provides a Web interface to configure and access objects.

Application execution is said to go through an iteration at each request, and the results of rule execution are made available at the next iteration.

IF X THEN Y type rules

Management of Service-Oriented Architecture IBM Tivoli
(2005)

Abstract: Services, management agents that send events to an ESB, and managers

Services are connected to an ESB, and each service runs a management agent, which publishes messages to a message queue. Those messages are received and processed by management servers.
Users may monitor the status via a management console.

Via the management console (manual)

Diagnosis is done by comparing the dynamic model (the real status of the system) and the reference model (the expected values of the system).

An Architectural Blueprint for Autonomic Computing (IBM/2006)

Abstract: Managers that run a MAPE control loop; high-level managers orchestrate managers over an ESB.

Each resource has a touchpoint that provides sensors and actuators. There is a set of autonomic managers, which manage those resources by running a MAPE control loop. Those managers expose sensors and actuators to higher-level managers.
A set of high-level managers provides orchestration of the autonomic managers. They talk to each other using an ESB. A set of manual managers controls and gives direction to the high-level managers.
*State is kept in a set of knowledge sources: policies, symptoms, rules, configurations.

MAPE control loop; decisions are made based on policy (it is only an architectural blueprint)
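
Since the blueprint is only an architectural description, here is a schematic Monitor-Analyze-Plan-Execute loop; the touchpoint interface, the load threshold and the change request are invented placeholders.

    class AutonomicManager:
        def __init__(self, touchpoint, policy_threshold=0.8):
            self.touchpoint = touchpoint                  # sensors + actuators of one resource
            self.policy_threshold = policy_threshold      # knowledge: a trivial policy

        def run_once(self):
            metrics = self.touchpoint.read_sensors()              # Monitor
            symptom = metrics["load"] > self.policy_threshold     # Analyze
            plan = ["add_instance"] if symptom else []            # Plan
            for change in plan:                                   # Execute
                self.touchpoint.actuate(change)

    class FakeTouchpoint:
        def read_sensors(self):
            return {"load": 0.93}
        def actuate(self, change):
            print("executing change request:", change)

    AutonomicManager(FakeTouchpoint()).run_once()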

Globus MDS (Monitoring and Discovery Service)

Abstract: Monitors the Grid and publishes the information via LDAP.

An LDAP server is backed by a hierarchical pub/sub system (e.g. sensors -> Grid Resource Information Service -> Grid Index Information Service). Information is collected from host monitoring (Ganglia ...), services (GRAM, FTP, and RFT) and queues (PBS, LSF).

Provides WSRF, WS-NT, and LDAP interfaces, which allow users to query for information at different depths of the tree. Each level aggregates the data from its children.

Only monitoring

Paradyn MRNet

Abstract: Monitoring hierarchy, and manual control

Information collected via sensors is published to a re-publisher hierarchy, which aggregates monitoring data. Consumers get information about resources from the hierarchy

Actions: Actions are delivered with multicast

Control is manual

RGMA

Abstract: Allows database-like access to monitoring data

There are database producers and stream producers. All data are defined by a schema, which allows database-like access. E.g. new producer - create table; data - insert; delete producer - drop table; consumer queries - select; re-publishers - queries on the results of other queries

Only monitoring

Scalable Management (ICAC 2005/HP)

Abstract: An information model that captures system state by recording events, plus a set of managers

The system is built on a pub/sub broker hierarchy. Each node of the system has a sensor that outputs status changes to a designated channel. The system provides PLDB, a distributed event-recording service for broker events. PLDB is made up of multiple supervisors, each of which manages (adds/removes as needed) a number of recorders, and each recorder monitors and records the events (tuples) in a channel. A client may send a request for a data item, and a recorder will respond with the most up-to-date value. The system state is captured as an information model using PLDB, and management services operate on that model.

Remote deployment services are capable of handling a deployment schema and service dependencies.

No details given about decisions

Dealing With Scale and Adaptation of Global Web Service Management (2005/HP)
(DealingWithScale-2005)

Abstract: centralized framework for deploying, monitoring and redeploying components in case of failures

Each node runs a deployment service, which handles installation, configuration, activation, deactivation and de-installation. When a service is installed and started, a health monitoring service is started, which generates heartbeat/failure events for services on the same node.
An adaptation engine listens to the events via the event bus and updates a CIM repository to reflect the state of the system. The decision-making engine uses the model to redeploy services if they fail.
*Failure of the health monitoring service itself is detected via timeouts.

decision making engine/ no details given

A scalable distributed information management system (2004)

Abstract: Selective update of monitoring data collected over a tree (based on the internal trees of an ADHT), driven by the read/write ratio.

Builds aggregation trees using the internal trees of an ADHT (a variation of a DHT), and selectively uses Update-on-Write, Update-on-Read, or gossip based on the read/write ratio of each attribute.
Defines a modified DHT, the Autonomous DHT, which allows each domain to work without other domains being present (administrative isolation)

Only monitoring
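
The choice of propagation strategy from the read/write ratio can be sketched as below; the thresholds are arbitrary assumptions and the ADHT and aggregation trees themselves are not modeled.

    def choose_strategy(reads, writes):
        """Pick how an attribute's aggregate is kept fresh in the tree."""
        ratio = reads / max(writes, 1)
        if ratio > 10:
            return "update-on-write"    # read-dominated: push aggregates up eagerly
        if ratio < 0.1:
            return "update-on-read"     # write-dominated: compute lazily when queried
        return "gossip"                 # mixed workload: propagate periodically

    print(choose_strategy(reads=500, writes=3))    # update-on-write
    print(choose_strategy(reads=2, writes=400))    # update-on-read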

ACME (2003), Berkeley
Monitoring, Analyzing, and Controlling Internet-scale Systems with ACME

Abstract: Queries are deployed over a spanning tree of a P2P network, and actions are triggered when conditions are met

The system is built on a spanning tree of a P2P network (which is automatically reconstructed). Each node runs a query processor, sensors and actuators. The root node runs a trigger engine.
Queries are sent down (deployed over) the spanning tree and results are aggregated from the bottom up. (The system includes a value aggregation function, which concatenates all results.) In addition, it supports continuous queries that send results once per given duration.
The trigger engine runs alongside ACME; it queries the sensors specified in rules and runs the actions if the conditions are met.
Actions: each node runs an HTTP server and actions are encoded as URLs. The system is based on a P2P spanning tree, but has a single point of failure at the query engine.

Rules: provide conditions and actions; when all conditions for an action are fulfilled, the action is carried out. (Sensor conditions, time conditions and completion conditions)

PIPER: Querying the Internet with PIPER

Abstract: monitored data is stored in a DHT, and queries are performed by multicasting.

On a P2P network, each data item is stored in the node matched by its key (namespace + resource Id). The node accepts and deletes data using a soft-state protocol. A query is performed by multicasting it to all nodes that have a matching namespace.

Provides distributed data storage using the underlying P2P routing (CAN).

Only monitoring

A Hierarchical Architecture for a Distributed Management of P2P Networks and Services
(Mng4P2P)

Abstract: A management framework for P2P systems that builds a management hierarchy out of the most fit managers

Builds a management tree in which the most fit managers (according to fitness for management) are at the top of the hierarchy.
Each manager checks its children and parent:

  • New nodes are added at a random place in the tree
  • If children are missing, they are removed. If the parent is missing, a new join process is started.

The tree is reordered by promoting more fit nodes to higher-level managers.
Each node has JMX enabled

Static code
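
A toy version of the reordering rule (promote a child that is fitter than its parent); the fitness values and the swap-in-place approach are simplifying assumptions, not the paper's exact maintenance protocol.

    class ManagerNode:
        def __init__(self, name, fitness, children=None):
            self.name, self.fitness = name, fitness
            self.children = children or []

    def promote_fitter(node):
        """Swap a node with its fittest child if that child is more fit to manage."""
        for child in node.children:
            promote_fitter(child)
        if node.children:
            best = max(node.children, key=lambda c: c.fitness)
            if best.fitness > node.fitness:
                node.name, best.name = best.name, node.name
                node.fitness, best.fitness = best.fitness, node.fitness

    root = ManagerNode("a", 2, [ManagerNode("b", 9), ManagerNode("c", 4)])
    promote_fitter(root)
    print(root.name, [c.name for c in root.children])   # "b" is promoted to the top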

Tools for constructing reactive systems (1993)/Meta, (ReactiveSys-1993)

Abstract: ISIS based framework that triggers reactive actions to manage set of processes.

Each process has sensors (which can be subscribed to) and actuators. Rules are written in NPL, and they execute actions (using actuators) when the given conditions are met.
A manager subscribes to a process group (group communication) and is notified when processes are added or removed. The manager changes its sensor subscriptions to processes accordingly.
Passive replication is used for managers to guard against failures.

NPL rules

A Self Configuring Service Composition Engine
(SelfMngWFEngine)

Abstract: A distributed workflow engine that uses distributed task queues for communication

There are navigators (which schedule workflows from the workflow queue and add tasks to the task queue) and dispatchers that execute each task (service). The result of each task is added back to the event queue, and the state of each workflow is stored in a state repository. Each component has local queues, and events/tasks are submitted to local queues if their location is known.
There is a routing table that associates each process with a navigator, and targeted tasks are copied to its local queue from time to time.

Static Rules

Dispatchers/navigators register in a configuration registry. A self-healing component verifies that they are running; if a dispatcher is down, it re-executes the tasks.
If a navigator is down, the task is re-run from its last state in the state repository.

Unity: Experiences with a Prototype Autonomic Computing System

Abstract: A system that allocates resources to applications using predictions provided by the applications

There is a set of autonomic resources, a policy repository, a registry (of the capability of each resource), and monitoring. Arbiters allocate resources to applications using predictions provided by the applications.
Goal-driven self-assembly - locate resources from the registry and configure them based on policies/dependencies
Self-healing by active replication + monitoring

Allocates resources (autonomic elements) to a set of applications based on SLAs.

Policy + utility functions
The arbiter asks applications for an estimate of the effect if a new resource is given, and uses that information to derive decisions.
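
A sketch of an arbiter that asks each application for the estimated utility of one more resource unit and grants it to the highest bidder; the utility function and weights are invented for illustration and are not Unity's actual policies.

    def arbitrate(applications, free_resources):
        """Greedy allocation: each unit goes to the application predicting the most benefit."""
        allocation = {app: 0 for app in applications}
        for _ in range(free_resources):
            # ask every application: how much would one more unit help you right now?
            best = max(applications,
                       key=lambda app: app.predicted_utility(allocation[app] + 1)
                                       - app.predicted_utility(allocation[app]))
            allocation[best] += 1
        return allocation

    class App:
        def __init__(self, name, weight):
            self.name, self.weight = name, weight
        def predicted_utility(self, units):
            return self.weight * (units ** 0.5)    # diminishing returns
        def __repr__(self):
            return self.name

    print(arbitrate([App("web", 10), App("batch", 3)], free_resources=4))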

Rainbow: Architecture based self adaptation with reusable infrastructure

Abstract: sensors->gauges->brain (if/then)->actuators

The system contains actuators, resource discovery and probes. Gauges aggregate the information and update the model. The model is evaluated by a constraint evaluator using rules, and based on the outcome of the evaluator, the adaptation engine runs the strategies (actions), which call the actuators.
Model - components, connectors and the properties of each.

Provides adaptation based on an architectural model. Constraints + strategies = if/then rules

An Overview of the Galaxy management framework for Scalable Cluster Computing (2000)

Abstract: Provides a hybrid management framework with tight control of clusters and loose control of farms

Farm - organization-wide; each node runs a farm service (supporting farm membership, failure detection, intra-farm communication and state sharing). There is a management information database at each node, and it is updated using gossip protocols. In addition, each node runs event, measurement, process and job, remote script, and component installation services.

Clusters - clusters are formed within a farm and provide tighter management. Each runs a cluster service (using group communication), and all nodes see changes in the same order. In addition, it provides reliable communication services.
There is a management cluster, which runs a farm controller and a cluster designer among other management services. The architecture is hybrid - group communication within clusters and gossip within the farm.

Static logic

Loosely coupled federation of distributed Management Services (JINI-Fed)

Abstract: A JINI-based architecture that allows a set of managers to manage SNMP-enabled resources.
Resources are expected to support SNMP.
Each resource is assigned a nanny that captures SNMP traps and generates JINI events; at startup it is registered with a lookup service (found using the JINI discovery mechanism).
In addition, an SNMP gateway allows clients to issue SNMP commands.
There is a configuration service which stores configurations + rules + policies about resources, and notifies both resources and management applications about value changes.
Management applications may monitor the system via events generated from SNMP traps and may control resources.

It does not provide management solutions, but provides an architecture that supports management applications

Autonomic Management of Clustered Applications (JADE), 2006

Abstract: Provides an autonomic component model and presents load balancing (by changing the number of replicas in a J2EE deployment)

Fractal component model
A component is either composite (sub-components) or primitive (has an executable), and components support server and client interfaces. Server and client interfaces are bound by primitive (same address space) or composite (remote) bindings.
Each component has controllers to support introspection and reconfiguration.

The sensor->brain->actuators control loop is itself implemented as Fractal components, so it can be reused and can itself be managed using Fractal self-management

Kinesthetics eXtreme: An External Infrastructure for Monitoring Distributed Legacy Systems (2003)

Abstract: Gauges signal complex events, and triggered actions are carried out using an agent-based framework.

Each node has probes (which publish events onto a notification bus), and there is a set of gauges that process these events, identify complex events, and signal them. Those gauge events are listened to by a decision component, which uses rules/condition analysis and architecture transformations to decide the best form of action. The action is then carried out by the Cougaar distributed-agent-based Workflakes task processor. (Distributed agents with shared blackboards for carrying out actions)
Gauges - Event Distiller: uses a set of rules and finds correlations using time-based pattern matching; Event Packager: event aggregation, transformation and persistent storage.

Actions: Workflakes decomposes the action into sub-tasks, assigns the tasks to target components, and executes them using worklets, which allow code to be sent to nodes and executed.

Pub/Sub +

rules/condition analysis, architecture transformations

Constructing an Autonomic Computing Infrastructure Using Cougaar (WildCat-2006)

Abstract: Agent hierarchy based management system that uses policies to coordinate the managers

The agent hierarchy is built using the Cougaar agent system. The hierarchy is of the form ACSystem->System->Pool->Resource->Agent, and the agent architecture handles group membership for each group using naming conventions and an attribute called role. Each group has a blackboard that allows group members to talk to each other.
There is a special component called ACSystem that exists by default; it provides centralized coordination, maintains pools, and handles bootstrapping.

High-level planning domains specify the high-level goals (policies) using knowledge encoded as assertions. Based on these high-level goals, an adaptation engine local to each agent provides control loops and rule-based control. Each agent provides a servlet for administration, and configurations are exposed as operation modes

The system is managed by high-level policies at the top level and by control loops and rule-based control at the agent level.

Decentralizing Network Management-2006

Abstract: A framework for running a decentralized query/operation in the network.

Queries may be initialized via an http interface.
The framework works on any spanning tree (e.g. network/p2p).

Echo Pattern

  1. Expansion - the start node sends a request to all its neighbors, and each neighbor in turn sends the request to its own neighbors. This continues until the leaf nodes are reached
  2. Contraction - each leaf node responds, and receivers respond to their senders. The algorithm continues until all neighbors of the start node have responded.

Skip Echo - each node checks the reachability of its neighbors and does not wait for them to respond.
The algorithm may run mobile code on each node and aggregate results while responding back (see the sketch after this entry).

Only monitoring/Adaptation as future work
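
A recursive, single-process sketch of the echo pattern over a spanning tree (expansion down, aggregated contraction back up); a real deployment runs this asynchronously over the network, and the topology, load values and sum aggregation are invented.

    def echo(node, neighbors, aggregate, visited=None):
        """Expansion: forward the query to unvisited neighbors; contraction: combine replies."""
        visited = visited if visited is not None else {node}
        results = [aggregate(node)]               # e.g. run the mobile code locally
        for nb in neighbors.get(node, []):
            if nb not in visited:
                visited.add(nb)
                results.append(echo(nb, neighbors, aggregate, visited))
        return sum(results)                       # contraction: aggregate child responses

    topology = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
    load = {"a": 1, "b": 2, "c": 3, "d": 4}
    print(echo("a", topology, aggregate=lambda n: load[n]))   # 10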

Autopilot: Real-Time Performance Monitoring, Adaptive Control, and Interactive Steering of Computational Grids 2000

Abstract: Allows clients to find sensors in a registry and perform corrective actions.

A set of sensors is registered in a registry; clients find the sensors and subscribe to them. Based on the events, corrective actions are performed using fuzzy logic.

e.g. control loop for performance management

The actuators that change the parameters

Monitoring data is used with a fuzzy logic rule set to select suitable policies within the application. (Fuzzy logic is used to avoid conflicts)

ACME-CMU 2002, Exploiting Architectural Design Knowledge to Support Self-repairing Systems

Abstract: Self-repairing based on architectural models. Also provide ADL (Architectural Description Language) and Design Environment called Acme Studio

A set of probes listens to system calls, OS events and network monitoring and publishes events to an event bus (broker hierarchy).
A set of gauges interprets the events (based on constraints) and publishes them to a gauge bus. A gauge agent aggregates the gauge events, and the information is used by Acme Studio and Tailor for automatic reconfiguration.
Constraints are defined, and when those constraints are met, repair strategies are invoked (in an imperative language) by a centralized management unit.

Constraints->repair strategies (If then rules)

A Scalable Wide-Area Grid Resource Management Framework (ScWAreaRM-2006)

Abstract: Highly scalable wide area resource management hierarchy with election to select coordinators

Each resource has a resource manager (RM), and RMs are controlled by a cluster RM (CRM). CRMs may form a hierarchy with a number of levels, and at the top level there is a Grid RM (GRM).
Each RM at the upper levels may be selected manually or by an election algorithm. A number of GRM hierarchies may run in parallel; unlike the other levels, GRMs may talk to each other (discovering each other via a discovery mechanism).
Each level aggregates information, and higher levels hold information that is more general. When a request is received by a GRM:

  1. It is assigned to one of the CRMs based on policies and passed down the hierarchy
  2. If there are not enough resources, the GRM negotiates and assigns it to a peer GRM
  3. If the task cannot be assigned (node failures/other reasons), the request is sent back to the parents and recovery is performed (recovery can be local or global)

Policies, and static resource management algorithms

Decentralized Cooperative Management: A bottom up approach (2005) (/CuPs)

Abstract: Network System where each node monitors itself and its neighborhood and makes decisions to manage itself and its neighbors.

Each node run three planes

1. Communication Plane - supports filtering and adding annotations to packets

2. Coordination plane - topology (keep track of neighborhood topology), interaction protocol, query module (status update, behavior and control)

3. Management Plane

o Monitoring (Local system - Sensor Pool, neighborhood - Aggregation pool)

o Processing - process the sensor data and decide on high level actions based on policies

o Plan - Plan, verify actions

o Action - translates high-level policy actions into concrete actions (an action matrix; chooses the best action based on the situation)

Local decisions are based on policies: the query module changes policies, and the action module decides the action. Adding annotations to packets is also used.

Web-based remote monitoring and fault diagnosis system (WRMFDS-2004)

Sensors monitor the equipment and perform hardware and software preprocessing. Alarms are generated based on the preprocessing and sent to an intranet-level manager, which provides a GUI to the user.
The user chooses the monitoring points and runs the diagnosis, and based on user feedback the rule base and KB are updated.

A two-level hierarchy for monitoring and diagnosis of equipment

Uses case-based reasoning, rough sets, neural networks and decision trees

Scalable, Fault tolerant Management in Service Oriented Architecture - Harshawardan Gadgil, Geoffery Fox

This paper uses a pub/sub-based approach to system management. An example is establishing the links (hierarchy) of a message broker using the management system.

There is a management agent for each manageable resource, which provides a WS-Management based interface to the resource, and sends heartbeats.

Registry - System assumes existence of a scalable and fault tolerant registry (implemented using a database or WS-Context). Registry keeps track of the global state of the system (availability of managers, list of resources and their health, system policies).

Manager - one thread per resource to be managed, with independent checkpointing. Each manager keeps renewing itself in the registry. Managers monitor each other via the registry, and if there are resources without a manager, those resources are taken over by another manager (passive replication)

Bootstrap services - the starting point for all services of the system and the main fault-prevention component (checks for messaging nodes, registry endpoints and enough managers). Bootstrap services are arranged in a hierarchy. Leaf nodes manage the managers in their domain. Non-leaf nodes make sure the leaf nodes function. Non-leaf nodes may be replicated, and they check each other's status to make sure the top of the hierarchy is healthy.

The paper also discusses active (run on all replicas) and passive replication and checkpointing schemes. Checkpointing can be independent or coordinated. Independent states may be inconsistent, and recovery might need to fall back until a consistent state is found. Coordinated checkpointing can be blocking or non-blocking. Look at the paper for references for each.

Static Code

Designing the Architecture of P2P-Based network Management Systems Andre Panisson et, el.

Network management is traditionally achieved by the manager-agent model (managers query agents / agents report). P2P-based NM (network management) redefines the management entities:

1) Top Level Managers (TLM) - GUI front end for human users

2) Middle Level Managers (MLM) - react to requests from TLMs and other MLMs and provide reliable communication between management entities

3) Managed devices with agents - agents retrieve or change the status of managed devices on requests from MLMs

4) Management Services - provide management tasks to other management entities. Management services are grouped as peers for scalability and fault tolerance. These groups monitor each other and make sure a sufficient number of peers exists in the group. They also provide load balancing via the group peers

How does it work? 1) The TLM searches for a management service, 2) MLMs respond with service info, 3) the TLM makes a request to one of the MLMs, 4) the MLM performs the request by accessing the managed resources. The P2P layer was built with JXTA; the services are published and discovered through JXTA.

Only monitoring

Service Morphing (2003)

Data is added to a notification bus, and the project generates services that process (and may republish) the data. Those services are dynamically generated based on metadata about the data and monitoring information gathered from the resource (using a kernel-level pub/sub channel).

I-Queue

Monitors message sequences that are likely to create problems and applies application-specific corrective behaviors

System-S (2007)

Checkpointing-based recovery in a distributed stream system

A stream processing application processes streams by routing them across a number of processing elements (PEs). There is a centralized job manager, which plans the processing on selected hosts.

Both the job manager and each PE checkpoint their state. Only selected states are checkpointed; intermediate states are cancelled and retried on failure. A new job manager recovers from its checkpointed state and by talking to the PEs.

Transparent Autonomization in Composite Systems

Adds self-management via interceptors (at the programming-language or middleware level) without changing the existing code. Another way to do it is to use a bridge (mediation) that sits between server and client; the authors claim that adding a third party adds higher overhead than interceptors at the client or server. Adding interceptors is called Transparent Shaping, and it is achieved at the programming-language level or the middleware level. E.g. a system that calls a Web service if the CORBA server is down.

Self Organizing Scheduling on the Organic Grid

The paper tries to build a desktop grid where a job can be submitted from any node and the computation is automatically scheduled among the other nodes. For such a decentralized system: 1) few assumptions can be made about the system, 2) self-adaptation is the normal mode of operation, 3) deployment of components is non-trivial, 4) dependency on centralized entities should be avoided.

The model presented by the paper breaks down the task and distributes it using an overlay network (a spanning tree of a P2P system could be a good candidate).

Challenges - distribution of data, discovery of new nodes, load balancing, collection of results, tolerance to faults, detection of completion of tasks.

Algorithm

1) Each node has a list of friends that it can submit a task to (these friends effectively create the overlay network)

2) A job may be submitted at any node; if the job is too big, it is broken down and propagated down a tree

3) When a task is done, the results are sent to the parent. Each node keeps requesting new tasks from its parent

4) The performance of each node is measured by its number of responses; good nodes are moved up (nearer to the root), and bad nodes are moved down. This is done by each node reporting its best child to its parent and its worst child to its children (a sketch follows this list)

5) If the parent goes down (detected by timeouts), the node tries to talk to an alternative parent from its parent list. If no parent is found, the node shuts down.
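
A small sketch of step 4 above: a parent ranks its children by measured performance, keeps the best ones directly attached, and pushes slower ones one level down; the response metric and re-attachment rule are simplified assumptions.

    class GridNode:
        def __init__(self, name, responses=0):
            self.name, self.responses = name, responses
            self.children = []

    def rebalance(parent, max_direct_children=2):
        """Keep the best-performing children near the root; demote the rest one level down."""
        parent.children.sort(key=lambda c: c.responses, reverse=True)
        keep = parent.children[:max_direct_children]
        demoted = parent.children[max_direct_children:]
        parent.children = keep
        for slow in demoted:
            keep[-1].children.append(slow)        # the worst kept child adopts the demoted node

    root = GridNode("root")
    root.children = [GridNode("fast", 40), GridNode("ok", 15), GridNode("slow", 2)]
    rebalance(root)
    print([c.name for c in root.children],
          [c.name for c in root.children[-1].children])   # ['fast', 'ok'] ['slow']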

Code

Each resource has a Producer (which reads from a set of sensors), an Actor (an actuator) and a Manager. Based on the events from Producers, Managers take actions using a rule set. In addition, managers generate high-level events that are archived. A management GUI uses these events to illustrate status and perform simple management actions. It is a centralized solution.

NWS

Sensors (CPU load, memory utilization, end-to-end bandwidth) -> memory service -> forecasting processes -> Web front end for performance measurements and prediction. Sensors are managed through a sensor control process, and everything is registered in an LDAP registry.

Inca

A centralized monitoring system that allows users to deploy metric calculations/tests on hosts, and to collect and query them

Allows users to define their own reporters (which can include state info as well as tests and benchmarks) and register them with the agent. The agent deploys them on nodes using a reporter manager at each node. Reporters are executed on a schedule and their data is sent to a depot server (where it is archived); the collected information can be queried or visualized via a GUI. http://inca.sdsc.edu/www/downloads/inca2_07.pdf

Nagios

Monitoring nodes and network

http://portal.acm.org/citation.cfm?id=860378

An Architecture For Managing Distributed Systems

M. Sloman, J. Magee, K. Twidle, J. Kramer

STORM: Scalable Resource Management for Large-Scale Parallel Computers. Abstract: A resource management framework that defines three basic operations upon which resource management can be implemented.
Each node runs a node manager and a program launcher; a centralized machine manager reads the job, broadcasts it to the selected nodes, executes it, and de-allocates the nodes when it finishes. It defines three operations: 1) transfer and signal, 2) test (poll) a local event, 3) compare a local value to global values on a subset of variables, and optionally set it. These can be used to implement the RM functionality.
Coordinating adaptations in distributed systems, Brian Ensink (CoordAdapt)

Abstract: Discusses coordinating adaptation across different parts of a system so that changes do not break the system. (E.g. if changing the format in a video conferencing system, the change should be applied as one message passing through every node.)
Adaptation is coordinated by a runtime algorithm, and PCL is extended to support adaptation operations. Adaptations replace code, or run adaptation logic in a number of distributed agents at the same time, and the framework allows the changes to happen at non-conflicting points in all agents. The paper introduces the concept of a region in a process where adaptation can be done safely.

The algorithm connects to each process, gets information about the current region the process is running in, and notifies them of a specific future region in which every one of them should run a specific operation. Processes acknowledge the region, and perform the operation when execution reaches the designated region.

The Guerrilla Management Architecture for Ad-hoc networks

Abstract: Decentralized management support for ad-hoc networks; includes clusters of nodes built around managers, and managers may work together to handle operations (e.g. handoff).
Ad-hoc networks communicate over a multi-hop wireless infrastructure. Three types of nodes: a) SNMP-capable, b) probe-capable, c) full-featured. The network is managed using an agency model - nodes are clustered into groups with at least one manager, a node may move between managers, and managers handle the handoff. Managers may send probes (scripts) to do some work on another node (a) monitoring probes, b) task-specific probes). Managers are aware of other managers and take adaptive actions to handle the current situation.
Each manager runs a utility-based decision-making mechanism and a probabilistic reasoning mechanism to decide on actions based on events. Probes may be used to perform actions.

Design and architecture of the Microsoft cluster service (MSCluster) - Werner Vogels. Each node may own resources (e.g. printer, storage, database). A resource group is the unit of migration, and all resources in a group are owned by one node.
Each node in the cluster keeps a database of the cluster status. Each node tries to find an active cluster, and creates a new one if none exists. If an active member is found, it acts as the sponsor and notifies the others about the new node, and the new node receives a copy of the cluster database. Each node sends heartbeats and monitors the others.
If a node fails, the cluster starts a regroup.
Creating a Robust Desktop Grid using Peer-to-Peer Services, Jik-Soo Kim, Alan Sussman (P2PDesktopGrid). Abstract: A P2P-based desktop grid; users submit jobs, which are processed on a node in the P2P network, and the results are returned.
Desktop grids are characterized by low I/O and compute-intensive jobs.
When a job is submitted, it is routed to an owner node (selected randomly among the nodes). The owner node finds a suitable node via a matchmaking algorithm and submits the job to that node. The node that receives the job processes it FCFS and, until the job finishes, sends heartbeat messages to the owner. When the job is completed, the results are returned directly to the client. If the owner or the execution node fails, the other can start a recovery. If both fail, the user has to re-submit the job.
Matchmaking algorithms
1) RN-Tree, a tree (spanning tree) that keeps all the nodes in the network, where each node decides its parent node based only on local information. Each node periodically sends its local sub-tree to its parent node. The search for a matching node to run the job (matchmaking) is done by traversing the tree (first its own sub-tree, then the parent's sub-trees, etc.).
2) Content Addressable Network (CAN) maps the GUIDs of nodes and data points to a d-dimensional space. Using the parameters of a resource as dimensions, both jobs and nodes are mapped to a multi-dimensional space using CAN. Jobs are matched by searching the neighborhood of the multi-dimensional space for matching nodes.
Papers: Resource Discovery Techniques in Distributed Desktop Grid Environments (Grid '06).
Network Weather Service: A distributed resource performance forecasting service for meta-computing (NWS). Abstract: Provides forecasts (e.g. load) about resources and the network. The system consists of sensors (node and network), persistent state (PS) processes that store the current sensor data (usually every sensor has an associated PS process where it stores observations, but this may be handled by a PS process on a different machine), and forecasters that provide forecasts based on the data collected by PS processes. All other services are registered in the name server (LDAP, possibly distributed) using a soft-state protocol. The name server is the only well-known address in the system. Users may find a forecaster from the registry and get predictions about the system. The forecaster uses the name server to locate the services it needs.
In order not to overload the network (data is collected by running tests), network sensors are grouped as cliques; performance measurements are performed between nodes within a clique, but no performance tests are conducted across cliques. A hierarchy of sensors is created by forming another clique from one node of each clique. A token is used to make sure only one clique member conducts experiments at a time. There is a leader in the clique who generates the token; the token is regenerated on timeout, and the generating process becomes the leader.
Using Multicast-SNMP to Coordinate Distributed Management Agents (MulticastCoord). The system uses cooperating management agents instead of a hierarchy, and it is possible to build a hybrid solution by building a hierarchy out of cooperating peers.
Each node sends heartbeat messages to a well-known multicast address, and each node maintains group information; new nodes are added when they multicast, and removed when their heartbeat goes missing. A master agent is used to coordinate the cooperating peers, and the master agent is elected among the peers using SNMP multicast (IP multicast).
The master multicasts a trap every T time units; if that message is missing for a given time, a peer starts an election. If the peer receives a vote with a higher number, it stops its election; otherwise it becomes master and starts to send master multicast messages. Cooperating peers work over multicast, and the coordinator is used for synchronization and conflict resolution.
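
A compressed, single-process sketch of the election rule described above (a peer backs off if it hears a higher vote, otherwise it assumes the master role); the real agents exchange these votes via SNMP over IP multicast, which is not modeled here.

    def run_election(my_id, votes_heard):
        """Return True if this peer should assume the master role."""
        if any(v > my_id for v in votes_heard):
            return False      # a peer with a higher vote wins; stop our election
        return True           # no higher vote heard: become master and start multicasting

    peers = [3, 8, 5]
    for peer in peers:
        others = [p for p in peers if p != peer]
        print(peer, "becomes master?", run_election(peer, others))
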
Self-managed decentralized systems using K-components and collaborative reinforcement learning (K-components), Jim Dowling, Vinny Cahill

Abstract: A component framework for building self-adaptive systems, where adaptation rules are learned with reinforcement learning.
Systems typically use centralized or consensus-based approaches to establish and maintain system-wide properties; the paper presents an alternative: decentralized agents that make local decisions while coordinating with others.
Agents are built with components called K-Components, and each component maintains an AMM (architectural meta-model), which represents the local view of the architecture. There is no single AMM for the complete system. Each component runs a decision model that evaluates adaptation rules written in an adaptation contract language, a declarative programming language (ECA/if-then rules that match events to adaptive actions).
Furthermore, those rules/adaptation policies can be learned with collaborative reinforcement learning (CRL). In CRL an agent learns from the experience of its neighbors, e.g. load balancing using CRL.

Automated Rule-Based Diagnosis through a Distributed Monitor System (RuleBasedDiagnosis). Abstract: A system that provides diagnosis through a monitoring system that inspects messages. The system only looks at the messages (non-intrusive, using active forwarding/passive snooping), and monitors and diagnoses errors that occur in a protocol. It is implemented with a hierarchical monitoring system, where there are local monitors (LM), a few levels of intermediate monitors (IM) (which keep aggregated information), and one global monitor, which evaluates the system as a whole.
The system may operate in two modes: a pessimistic version (check all the messages) and an optimistic version (check only when an error happens). Each monitor node maintains a state diagram of the current state of each node, and runs a rule base to make decisions. Rules are triggered by failures. Errors can be of two types, local errors and propagated errors. The system provides a way to verify a node by inspecting its operation; when an error is detected, the node where the error was detected is tested first, and to detect propagated errors a suspicion set is created. The nodes in the suspicion set are tested, and the suspicion test is recursively expanded.
The system also supports masking LM and IM failures. To do so the system keeps 2f+1 LMs or IMs and uses a Byzantine model. Events triggered by LMs are multicast to all the other LMs, and all LMs agree on one event ID. All the events from LMs are sent to a group of IMs to allow Byzantine fault tolerance.
Exploiting Emergence in Autonomic Systems, Richard Anthony, Alun Butler, and Mohammed Ibrahim, in Autonomic Computing: Concepts, Infrastructure and Applications. Emergent systems have several properties: A) the whole being more than the sum of the parts, B) composition of parts and their interactions, where micro-level behavior leads to macro-level behavior (without central control), C) randomness. There are two types of emergent behavior: behavioral/first-order emergence (e.g. gossip protocols) and evolution/second-order emergence. Example of an emergent system: a market has emergent behavior - when supply goes down the price goes up, which encourages suppliers, and in turn supply goes up.
Emergent systems do not yield optimal or deterministic behavior, and they usually work using local decisions that collectively lead to macro behavior. They usually use an open feedback loop, where feedback is received via the environment.
The paper presents two emergent systems
Emergent election algorithm - communication is unreliable and unordered. Important messages are retransmitted until conditions hold. The system consists of nodes in three categories - Idle, Slave, Candidate - and there is one master. Idle nodes make sure there are enough slave nodes present. If a slave node detects that the master is down, it starts an election. Every node joins the election unless it sees a candidate message higher than its own. When a node sees a higher candidate message, it leaves candidacy.
If a candidate is not demoted for a given time, it assumes the master role and sends a master message, and any master that receives a higher master message demotes itself to a slave.
Layered support for high-level composition of emergence - three layers: System, Clusters and Application. All nodes belong to the System layer, and they elect a coordinator using the above algorithm. Clusters are created by the system coordinator, and each cluster elects a sub-coordinator locally. Applications deploy and use clusters. The cluster coordinator performs on-demand cluster creation, identifies node adjacencies, and assigns nodes to clusters. This is applied to an air-traffic control system. Nodes are assigned to cells; each node is usually a member of its own cluster and eight neighboring clusters, and one node in each cluster (non-deterministic) manages the center cell. Emergent behavior arises from the overlapping cells.
Papers - Emergence: a Paradigm for Robust and Scalable Distributed Applications; Layered Autonomic Systems, Richard John Anthony.
Nestor: Towards Self-Configuring Networks, A. V. Konstantinou. Abstract: Allows a set of applications to manage the network. The state of the network is captured by a distributed repository, and actions are coordinated by a centralized service.
Three main challenges: a) change propagation - a change to a network element needs to be propagated to related elements, b) changes may lead to inconsistent systems, c) different rules/policies need to be composed at runtime to perform a change operation.
Resources and their relationships are expressed using RDL, which specifies their interfaces and relationships; CDL (Constraint Definition Language) defines the constraints the resources must follow; and PDL is used to assign values to configuration model objects based on the configuration of related objects.
The system allows a set of applications to manage the network. The state of the network is captured by a distributed repository that is updated with a directory management protocol. A constraint and change propagation manager (centralized) makes sure system consistency is kept and change propagation rules do not lead to cycles (using CDL and PDL), and it has the right to reject an action. Changes are committed as transactions. (Change propagation rules - the effects of changes on other components, and side effects.) Adaptation is performed by applications (e.g. an application that makes sure a given agent is running on every host and selects a primary); how applications learn about the changes is not clear. Corrective actions are performed via the protocol adapter layer (which supports SNMP and other actuators.)
http://www1.cs.columbia.edu/dcc/nestor/
Related - ECA rules and SNMP instrumentation: S. K. Goli et al., ICON: A System for Implementing Constraints in Object-based Networks; and J. Haritsa et al., MANDATE: MAnaging Networks using DAtabase TEchnology, IEEE Journal on Selected Areas in Communications, vol. 11, pp. 1360-1371, 1993.

BISE: A Distributed Service Management Infrastructure for Enterprise Data Centers Based on Peer-to-Peer Technology, Chunqiang Tang, Rong N. Chang, and Edward So. Abstract: Places all nodes in the system in a spanning tree, monitors the system by aggregating data across the tree, and enforces SLAs on service deployments. Each node runs a management agent that monitors the node. The system is based on a P2P substrate that supports application-level routing and multicast. Multicast is used to apply configuration changes to all nodes, or to let all nodes know about a new service. The P2P substrate has an underlying spanning tree of the system. Monitoring data is aggregated across the spanning tree and arrives at the root node of the tree. The tree reorganizes itself if nodes fail. The applications to run on each machine are determined and changed based on the monitoring data. When any node receives a request, it routes the request to the correct service using the configuration information about the system.
Services supporting management of distributed applications and systems, by M.A. Bauer. Repository, configuration, monitoring and control services. Management agents carry out actions, e.g. via SNMP or CMIP. The system runs a management control loop and has centralized components.

Understanding self-healing in service-discovery systems

Service providers and users find each other by discovery. The discovery can happen by users multicasting requests or, with the addition of registries, by registries that keep track of existing services. A JINI-based service manager at startup discovers registries via multicast and stores the service descriptions in the registries (with soft state).
Service failure detection is done by polling or by the absence of events, and failure recovery is done using soft state and application persistence. (? Re-read)

Adaptive application specific middleware

The paper defines application-specific middleware: all messages pass through it as in conventional middleware, but it also supports adaptive structures for composition, control of service interactions and measurement of QoS.
The paper provides a self-managed composite, which allows different applications to play different roles, and there is one organizer which composes the applications that play the different roles together according to functional and QoS requirements.

Self-organising software architectures for distributed systems (SelfOrgSWArchi). Abstract: Component managers collaborate via group communication to maintain a system according to architectural constraints as nodes join and leave the system.
The paper presents a system that retains its architectural specification despite nodes joining and leaving the system. The architectural specification is a set of constraints that define the system. The system is a directed graph, and the constraints are defined in a language called Alloy. A component defines provided and required ports, and the language can define relationships among the components. (E.g. the paper gives a definition of a pipeline, saying the head and tail are not connected, every other component is connected, etc.) However, it is harder to enforce global constraints in a self-organizing system.
The paper presents an execution environment that supports self-organization by satisfying architectural constraints, but assumes no partition occurs. Components provide a management interface between the component application and the component manager, and support an event listener interface that notifies about binding/unbinding of components, etc. (Each component has a view of the system state, based on events sent by the component manager using atomic multicast.) There is a set of component managers; join/leave messages and management actions are handled using totally ordered multicast. Changes are triggered by join/leave messages provided by the group communication layer. It is the responsibility of the component managers to match required and provided ports in the system.

Robust software via agent-based redundancy

Abstract: Use agents to provide redundancy via different implementations. For robustness via redundancy, identical software components do not help, as they are deterministic; redundancy has to come from different implementations, and information theory defines how much redundancy is required. Possible strategies (a sketch of strategy B follows the list):
A) Use a preprocessing algorithm to choose the best implementation.
B) Run all the implementations and choose the best result.
C) Use preprocessing to find the best group of implementations, and use post-processing to compare the results and choose the best solution.
D) The algorithms jointly decide who should do the processing and how to compare results. Preprocessing: random/lottery, auction/election/criteria-based selection, sub-goals and teamwork. Post-processing: select the fastest or smallest result, vote on results, collaborate to remove controversial data by comparison, or work incrementally (two agents compare; if the results match, they are forwarded, otherwise a third agent is brought in, etc.).
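A small Python sketch of strategy B above (the sort implementations and the voting function are invented for illustration): run several independently written implementations and take the majority result.

    from collections import Counter

    def sort_builtin(xs):
        return sorted(xs)

    def sort_selection(xs):
        xs = list(xs)
        for i in range(len(xs)):
            j = min(range(i, len(xs)), key=xs.__getitem__)
            xs[i], xs[j] = xs[j], xs[i]
        return xs

    def sort_buggy(xs):
        return list(xs)            # a faulty implementation that forgets to sort

    def vote(implementations, data):
        """Run every implementation and return the result produced by the majority."""
        results = [tuple(impl(data)) for impl in implementations]
        winner, _ = Counter(results).most_common(1)[0]
        return list(winner)

    print(vote([sort_builtin, sort_selection, sort_buggy], [3, 1, 2]))   # [1, 2, 3]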
A model of scalable distributed network performance management (ScNwPerfMng). Event-based monitoring and measurement; there is a hierarchy of managers, and they trigger events when matching conditions are found.
Challenges and Best Practices in Policy-Based Autonomic Architectures. Abstract: observations about policy-based autonomic systems, and a centralized policy engine. The policy engine manages data center resources, processes, network traffic, servers, and load balancers (via a sensors-and-effectors interface). An agent runs on every machine; a server-pool-director allows agents to find each other.
The policy includes the following parts: a) scope - the group of managed resources to which the policy applies, b) value - a priority used for conflict resolution, c) condition - a Boolean expression, d) action - standards are needed for manageability adaptation (e.g. WSDM).

The policy engine requires metadata to enforce policies, and hard-coding resource metadata in the engine hurts extensibility. The presented system allows users to provide their own meta-model of the system: the policy engine accepts system resource configurations, parameters, characteristics, and relationships as a meta-model, which enables using the engine with a wide variety of systems.
The authors make the following observations about using policy:
a) business policy design is complex and error prone, b) scalability is important, c) business values are not constant, d) resource properties are not always simple values, e)
system model + policies -> engine -> monitor resources, aggregate them into collections, report state, and act upon them.
The authors identify modifiability, mutability, subscribability, type, and relationship as characteristics of properties. Based on the policies, the centralized policy engine subscribes to events and performs corrective actions.
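A minimal Python sketch of a policy record with the scope/value/condition/action parts listed above, evaluated by a centralized engine against monitored resource properties (all names and thresholds are invented; this is not the paper's API):

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Policy:
        scope: str                       # group of managed resources the policy applies to
        value: int                       # priority used for conflict resolution
        condition: Callable[[dict], bool]
        action: Callable[[str], None]

    def enforce(policies, resources):
        """resources: {resource_name: {"group": ..., metric values...}}"""
        for name, props in resources.items():
            matching = [p for p in policies
                        if p.scope == props["group"] and p.condition(props)]
            if matching:
                # Highest-priority matching policy wins when several conditions hold.
                max(matching, key=lambda p: p.value).action(name)

    policies = [
        Policy("web", 1, lambda r: r["cpu"] > 0.90, lambda n: print(f"throttle {n}")),
        Policy("web", 2, lambda r: r["cpu"] > 0.95, lambda n: print(f"add capacity for {n}")),
    ]
    enforce(policies, {"server-1": {"group": "web", "cpu": 0.97}})   # add capacity for server-1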

Oceano - SLA-based management of a computing utility. Abstract: The system maintains SLAs by dynamically allocating resources. Most businesses want to handle peak loads that are much higher than their usual load.
Monitoring agents issue events, which are sent to a resource director that controls the system through resource allocation/deallocation and throttling.
SLA characteristics: SLA metrics - requirement violations (a penalty is assessed) and goal violations (no penalty). Priority classes (Gold, Silver, and Bronze).
There are two groups of servers: one group is statically assigned to services and runs critical services like databases; the other group can be dynamically allocated and de-allocated at run time. Agents are placed on every server; they monitor the server and generate events (especially SLA violations). Those events are aggregated by an aggregator into composite events and sent to an SLA monitor, which makes decisions. The SLA monitor asks the resource director to take corrective actions. A hierarchy of correlation engines is used to make decisions, such that more general events are handled at a higher level; communication within the hierarchy is done using events and an open-problem database. A list of remedies is tried in order to fix the problem. (Assumes the SLA management infrastructure itself will be available without failures.)
Using a Utility Computing Framework to develop Utility Systems, T. Eilam, K. Appleby. Manages utility services - services that have assigned resources (e.g. a science experiment, a game environment). Allows users to set up utility services, and workflows to create, delete, and change those utility services. The workflows are automatically created based on the user's description. A utility service is managed by a manager supplied by the customer. There is a gateway, and different levels of controllers provide control at different levels (e.g. utility service, resource type, single resource).
GulfStream, a System for Dynamic Topology Management in Multi-domain Server Farms, Sameh A. Fakhouri, Germán Goldszmidt. Abstract: Establishes a dynamic hierarchy using elections and performs failure detection. Organizes all the network adaptors in a server farm (same network) into a group (Adaptor Management Group), and dynamically establishes a hierarchy for reporting topology and adaptor availability.
Each node discovers its local configuration. GulfStream runs an agent on every machine. Each agent broadcasts itself at startup, and the highest IP becomes the leader of the group; if there are no others, the agent becomes the leader. New adaptors may join by multicast (group merges are supported). Failures are detected by the leader, which reconfigures the group; if the leader fails, its successor takes over. The AMG leader of the administrative adaptor loads the current network configuration from a database and compares it with the discovered network. If the main leader fails, a new one is selected from among the administrative nodes.

Failure detection - the nodes of a group are placed in a ring, and they monitor each other around the ring; in GulfStream each node is monitored by several nodes. A simple extension is to send heartbeats in both directions.
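A simplified Python sketch of the two mechanisms above (IP addresses and the timeout are made up, and real GulfStream exchanges heartbeat messages between agents): the adaptor with the highest IP wins the leadership election, and nodes placed on a ring watch their ring successor for missing heartbeats.

    import ipaddress
    import time

    def elect_leader(adaptors):
        """The highest IP address in the group becomes the leader."""
        return max(adaptors, key=lambda ip: int(ipaddress.ip_address(ip)))

    def ring_monitor(adaptors, last_heartbeat, timeout=5.0):
        """Each node watches its ring successor; return (watcher, suspect) pairs."""
        now = time.time()
        ring = sorted(adaptors)
        suspected = []
        for i, node in enumerate(ring):
            successor = ring[(i + 1) % len(ring)]
            if now - last_heartbeat.get(successor, 0) > timeout:
                suspected.append((node, successor))
        return suspected

    group = ["10.0.0.3", "10.0.0.17", "10.0.0.9"]
    print("leader:", elect_leader(group))                             # 10.0.0.17
    print("suspected:", ring_monitor(group, {"10.0.0.3": time.time()}))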
A dynamic group management framework for large-scale distributed event monitoring. A group of monitoring agents exchanges event notifications and performs correlations collaboratively.
The paper presents a dynamic group management framework (using IP multicast) that reconfigures multicast-group membership dynamically based on event correlation information. It is based on the HiFi event system, where there is a hierarchy of LMAs and DMAs. Filters are decomposed and distributed across the hierarchy. Agents may send events to a group of managers, or managers may send events to agents to distribute an event correlation task.
For each event there is a multicast group, and each DMA interested in the event joins that group. In addition, there is a multicast group for each domain to share domain management information. Similarly, a multicast group is created for each composite filter expression used in DMA-DMA communication.
There is a MngrGrp group for communication among managers, and there are DMAGrp and LMAGrp groups used by managers to send decomposed filter assignments to agents. There is also a group created for each filter (<filter-name>Grp); the manager subscribes to that group, and when matching events are found, agents send events to the group. The authors also present a protocol to keep state consistent across all the agents.

System bootstrap
Each manager knows which LMAs should be in the system via the system specification. At startup, a manager connects to MgrGrp and LMAGrp and waits for all the LMAs to start up. When all LMAs have started, the manager sends out environment information; the LMAs then run an LMA leader election, the LMA leaders create DMAs, the DMAs again go through an election to create leader DMAs and finally a root DMA. The root DMA sends out a completed message, then query assignment begins, and after query assignment, monitoring begins.
Towards a Verifiable Real-Time, Autonomic, Fault Mitigation Framework for Large Scale Real-Time Systems by Abhishek Dubey et al. A hierarchy of local managers, regional managers, and a global manager. Local managers are closest to the cluster nodes, regional managers supervise and coordinate the managers in their region, and the global manager (on the cluster head node) is used to submit new jobs or plan the resources assigned to existing jobs. Uses a model-based approach: a state model of the system is provided and executed in the regional managers, and it is updated according to the events sent from resources. Supports migration actions, and actions are performed according to the state model. The authors pay special attention to jitter (variation in the time between periodic executions) and to synchronizing data collection across the system.

Decision Models

Maintaining a state machine and performing corrective actions when state transitions occur - ReflexEngine
Architecture for coordination of System management services 2004
(CoordOfSysMngServices)

Abstract: A framework to plan and refine management actions

All management events are sent to a repository, and the repository fires actions (and events). Those actions are analyzed by a plan executor, which refines them (so the system state stays consistent). If refinement is not possible, human intervention is sought.
The plan executor creates a plan from the refined actions, and the plan is given to a plan scheduler that performs the changes, using a feedback loop to control them.

The repository stores events, components (configuration/interconnections), user profiles (e.g. SLA, configuration), and policies. There is a state preserver (which makes sure the state is consistent) and a change coordinator that manages actions (expected events).

InfoSpect: using a logic language for system health monitoring in distributed systems (2002)

Abstract: A Prolog-based diagnosis system

A centralized system. Monitoring information is fed in as Prolog facts, and Prolog rules diagnose problems.

Monitoring and diagnosis only; uses Prolog-based rules.
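InfoSpect expresses this in Prolog; the following Python stand-in only sketches the idea: monitoring observations become facts, and a diagnosis rule is a query over the fact base (the facts and the rule are invented examples, not InfoSpect's rule set).

    # Monitoring observations as facts: (predicate, subject, object).
    facts = {
        ("running", "dns", "host1"),
        ("running", "web", "host2"),
        ("expects", "web", "dns"),
        ("expects", "web", "db"),
    }

    def missing_dependencies(facts):
        """Diagnose services whose required dependency is not running anywhere."""
        running = {svc for (pred, svc, _) in facts if pred == "running"}
        return [(svc, dep) for (pred, svc, dep) in facts
                if pred == "expects" and dep not in running]

    print(missing_dependencies(facts))   # [('web', 'db')]: web expects db, but db is not running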

Marvel: On a rule based management architecture (1995) T. Koch, B. Kramer, G. Rohde, International Workshop on Services in Distributed and Networked Environments

Abstract: runs rules based on a meta-model

Maps the system components to a data model and executes rules when preconditions are met. A centralized system.

Rules are of the form

  1. precondition - triggers the rule
  2. action
  3. postcondition - must be true after the rule

Rules are executed when their preconditions are true. Objects are locked to avoid concurrent edits. A shell envelope language (which looks like shell scripts) is used to perform actions. The rules operate on the mapped object model.
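A minimal Python sketch of this rule shape (the managed object and the conditions are hypothetical; Marvel itself uses its own rule language and shell envelopes): a rule fires when its precondition holds, runs the action with the object locked, and then checks the postcondition.

    import threading

    class Rule:
        def __init__(self, pre, action, post):
            self.pre, self.action, self.post = pre, action, post

        def fire(self, obj, lock):
            if not self.pre(obj):
                return False
            with lock:                      # objects are locked to avoid concurrent edits
                self.action(obj)
            assert self.post(obj), "postcondition violated after rule execution"
            return True

    service = {"state": "stopped"}
    restart = Rule(pre=lambda o: o["state"] == "stopped",
                   action=lambda o: o.update(state="running"),
                   post=lambda o: o["state"] == "running")
    restart.fire(service, threading.Lock())
    print(service)                          # {'state': 'running'}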

Sophia: An Information Plane for Networked Systems (2003), an extension of InfoSpect

Abstract: decompose and distribute queries to nodes

Sophia allows users to evaluate logic expressions over a distributed system; when a query is submitted, it is decomposed and distributed to the nodes closest to its dependencies. Actions are rule based, driven by input from sensors. Supports Prolog-like rules.

Recipe based Service Configuration and adaptation (RecipeBased) by Synthesizer

Abstract: Each Service has Service Recipes, and based on user requests, services are composed using service recipes.
An adaptation manager monitors the system and performs adaptations, and service composition is done using recipes based on demand.

A Service Recipe (Java logic invoked via an entry function, with access to the information model of the system) includes:

  1. Abstract Configuration
  2. Physical Mapping
  3. Coordinate Policies
  4. Adaptation Strategies

To coordinate the system, monitoring and adaptation rules plus proposals are submitted to an adaptation coordinator, which performs conflict resolution using FCFS and the Service Recipes.
Conflicts can be at the action level (two proposals change the same component) or at the problem level (two proposals have different intentions).

Mapping Policies into Autonomic Management Actions Adaptation Strategies in Policy-Driven Autonomic Management (Policy2Actions-2007)

Abstract: A centralized event handler listens to events and triggers actions defined by policies.

Sensors->Monitor Manager->Event Handler

A policy has a name, conditions (the name of a method to run plus parameters), actions, and a target component.
The event handler keeps a table of the values it should monitor based on the policies and sends violation events to a policy decision point (PDP). The system is event based; an event analyzer keeps a history of the policy state transition graph.

The PDP analyzes events and selects an action based on the conditions. At the PDP, more than one possible action may arise; the best action is identified by calculating the priority of each action based on the current runtime context or on past experience collected by the event analyzer.

Example: management of a single Apache server, with a GUI to specify policies.

Generic Support for Policy-Based Self-Adaptive Systems (AdeptivePolicy-2006)

Abstract: A framework for self-adaptive policies

A new object-oriented policy language, which supports different policy suites, and meta-policies that choose the right policy rules based on the environment.

The sample system divides the operating region into a few zones and applies different policies in different operating zones. Dead zones, in which no adaptation happens, are used to avoid oscillation.
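A small sketch of the dead-zone idea (the metric, zones, and thresholds are invented): no adaptation is triggered while the metric stays inside a dead zone around a threshold, which prevents oscillating between policy suites.

    def choose_zone(cpu, current_zone, low=0.4, high=0.8, dead=0.05):
        """Only leave the current operating zone once the metric clears the dead zone."""
        if current_zone != "overload" and cpu > high + dead:
            return "overload"
        if current_zone != "underload" and cpu < low - dead:
            return "underload"
        if current_zone != "normal" and low + dead <= cpu <= high - dead:
            return "normal"
        return current_zone        # inside a dead zone: keep the current policy suite

    zone = "normal"
    for cpu in [0.78, 0.82, 0.86, 0.79, 0.74]:
        zone = choose_zone(cpu, zone)
        print(cpu, "->", zone)     # stays "normal" until 0.86, then "overload" until 0.74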

Enabling Policy-Driven Self-Management for Enterprise-Scale Systems 2007

Defines an algorithm to partition the state space of the system so that the effect of state outside a partition can be ignored.
A macro model is then defined for each partition, and the system is modeled using micro models. Each rule has a confidence attribute that controls when the actions for the rule are triggered.

Defines management formally and presents an approach that partitions the state space into subspaces so the system can be solved using a set of micro models.

Coordination based Systems

Distributed Intelligent Agents - Multiagent Systems, Katia Sycara

Among the characteristics of multi-agent systems are: A) incomplete information and a limited viewpoint, B) no global system control, C) decentralized data, D) asynchronous communication.

Why multi-agent systems? A) problems too large for a centralized agent, B) interoperability and interconnection of legacy systems, C) solving problems that naturally map onto a society of autonomous agents, D) spatially distributed information sources, E) distributed expertise, F) enhanced computational efficiency, reliability, extensibility, robustness, maintainability, responsiveness, and flexibility.

Challenges: A) formulating, describing, decomposing, and allocating problems to agents and synthesizing the results, B) how agents communicate, interact, interoperate, and discover each other, C) taking coherent decisions from local data while avoiding unstable system behavior, D) how individual agents reason about the actions and plans of other agents in order to coordinate with them, E) conflict resolution, F) engineering multi-agent systems.

Agents may have static (assigned) tasks, or they may be assigned tasks dynamically (e.g. the Contract Net protocol, with sub-goals and bids).

An agent system accepts a task submitted by the user, processes it, and returns the results. The authors identify the following types of agents based on their roles: A) interface agents - get a task from the user, complete it, and return the results; B) task agents - autonomous problem solvers that get a task from the user, identify sub-goals, find agents to submit the sub-goals to, and assemble the results; C) information agents - generate events from changes to information sources (one-time, periodic, or triggered by changes).

Organizational models used by agents include: A) hierarchies, B) communities of experts (specialization), C) market models, D) scientific communities that revise and comment on plans. Some agent systems depend on emergent behavior; however, such behavior is unpredictable and unstable, and those systems are hard to engineer and model. In other systems, behavior is planned and coordinated.

Models used for multi-agent planning include: A) a static agent that sees everything, B) sharing plans and adjusting them, C) modeling the system, D) joint commitment and joint attitude, E) teamwork. In addition, agents resolve conflicts through: A) an agent that sees everything, B) standards/norms that avoid conflicts (as in society), C) game theory (optimizing utility), D) proposal and counter-proposal, E) negotiation over utility.

To discover other agents and coordinate with them, agents use middle agents, yellow pages, matchmakers, blackboard agents, and brokers. In addition, standard message formats are used, and a common ontology may be used to ensure a common understanding.

Resource allocation can be done using: A) operations-research-based methods, B) market-based methods.

Reusable Patterns for agent Coordination by Dwight Deugo

This paper is an effort to define patterns for agents, as has been done in OOP. The following are concerns within an agent-based system.

  1. Mobility and communication - with mobile agents, how can two agents talk? (carry the locations of other agents, use a lookup service, broadcast the message)
  2. Interoperability
  3. Temporal coupling (synchronization exists between agents) and spatial coupling (agents know each other by name; e.g. a pub/sub system removes spatial coupling)
  4. Problem partitioning
  5. Failures

Agent Patterns

  1. Blackboard - specialists can add data to a blackboard and subscribe to changes. A supervisor decides who can change the blackboard and when the work is completed.
  2. Meeting - create a meeting place (a meeting manager) that agents can call into, and allow an agent to initiate a meeting.
  3. Market Maker pattern - a broker accepts requests for resources and services and matches them up with sellers. The broker leaves the final choice to the agents.
  4. Master-slave pattern - a master divides the task into sub-tasks and submits them to slaves. The slaves finish their tasks, and when everyone is done, the master compiles the results together.

Negotiating agents - used to keep agents from getting in each other's way. Agents share action plans with other agents, and an agent may accept the actions, provide a counter-proposal, or reject them. Which action takes precedence is decided based on policies.

Coordination among agents using reactive rules - M. Berndtsson

Coordination in different types of systems

ARCHON - each agent has a layer that supports cooperative interactions; each agent shares information that might be useful to others and seeks assistance from other agents.
COOL - a coordination language for agents, based on speech acts that specify sub-goals as proposals, counter-proposals, accept, reject, cancel, satisfy, fail, commit, and de-commit.
Workflow systems - notification about work, and synchronization.
Active databases - ECA (Event, Condition, Action) rules: when an event E arrives, evaluate condition C and execute action A.
Collaborative problem solving usually includes three phases: a) negotiation - allocate tasks to problem-solving agents (static vs. dynamic allocation), b) execution - agents solve their partial problems, c) result reporting - collect the results and produce the final output.

When a task is submitted, if it cannot be solved by one agent it is broken down into subtasks, and a contract is created for each subtask. Bids are then called for each subtask, the subtasks are individually processed, and the results are collected. The process is modeled by the system as ECA rules.
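A minimal Python sketch of the contract-net allocation described above (the task and the bidding agents are hypothetical): the manager announces a subtask, agents bid, and the contract is awarded to the cheapest bid.

    def contract_net(subtask, agents):
        """agents: {name: cost_function}. Announce, collect bids, award the contract."""
        bids = {}
        for name, cost in agents.items():
            bid = cost(subtask)
            if bid is not None:            # an agent may decline to bid
                bids[name] = bid
        if not bids:
            return None
        winner = min(bids, key=bids.get)
        return winner, bids[winner]

    agents = {
        "agent-a": lambda task: len(task) * 2,    # charges per unit of work
        "agent-b": lambda task: len(task) * 3,
        "agent-c": lambda task: None,             # declines this kind of task
    }
    print(contract_net("analyse-logs", agents))   # ('agent-a', 24)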

Component Coordination in Middleware Systems by Matthias Radestock

Abstract: Intercept messages and dispatch them to the original recipients after a possible delay; voting among the related coordinators is used for decisions.

The paper presents a coordination model for open systems (where components cannot be changed). The model installs traps into the system that intercept and redirect messages to coordination components, based on translation rules. A coordination component may dispatch the message to the original recipient after some delay, usually after voting with the other related coordinators. As an example, the paper implements the dining philosophers problem using the model.

On Monitoring and Steering in Large scale multi-agent systems - Takahiro Murata et al

The paper presents a model to monitor and control agents by enforcing policies within each agent. Each agent has a controller, which intercepts its messages and enforces policies expressed in terms of rules and actions.

  1. RAPTOR - Fault Tolerant Distributed Information Systems, John C. Knight, Matthew C. Elder
  2. Policy Specification for Non-Local Fault Tolerance in Large Distributed Information Systems, Philip E. Varner
  3. The Willow Architecture: Comprehensive Survivability for Large-Scale Distributed Applications
  4. Fault Tolerant Distributed Information Systems, John C. Knight, Matthew C. Elder
  5. Fault Tolerance in Critical Information Systems, Matthew C. Elder
These systems maintain state machines in response to events monitored from the system. The state machines are placed in a hierarchy, and non-local errors are detected by the higher levels. State machines are described using a state description language (TEDL, a language for specifying the detection of errors and repair of faults in Internet-scale distributed systems) and react to failures. Supports non-local faults, fault hierarchies, and fault sequences. Used for information-system failure detection (e.g. intrusion detection systems). This approach has parallels to complex event processing (CEP); however, because each level keeps additional state in its state models, it can take more intelligent decisions. Upper layers have only a high-level view of the layers below them (similar to CEP), but for this application domain that is acceptable.
Survivability Management Architecture for Very Large Distributed Systems, Jonathan Charles Rowanhill, 2004, and ANDREA: Management for the Survival of Very Large Distributed Systems, Jonathan C. Rowanhill, John C. Knight. A hybrid between hierarchical (bottom-up) and collaborative (peer-to-peer) control. Each manager delegates tasks it cannot handle to other managers. The language includes delegation statements, and those statements are matched to other managers; the control hierarchy is therefore created on demand.
PROBABILISTIC FAULT MANAGEMENT IN DISTRIBUTED SYSTEMS - JIANGUO DING (thesis, 2007). Uses statistics/probability to handle uncertainty.
Reliable On-Demand Management Operations for Large-scale Distributed Applications, Jin Liang, Indranil Gupta, and Klara Nahrstedt. Creates on-demand overlays to evaluate queries or to push software into the overlays.

Deployment

An Architecture for Post-Development Configuration Management in a Wide-Area Network Richard
Configuration and dynamic reconfiguration of components using the coordination paradigm George A. Papadopoulos, Farhad Arbab

Surveys

A Taxonomy of Grid Monitoring Systems

Four types of Systems

Internal (does not have a publisher), Sensor->Consumer - MapCenter, GridICE: centralized web interfaces to monitor availability (the former) and utilization (the latter).

Sensor->Publisher->Subscriber - Autopilot: registers a set of sensors in a registry; clients find the sensors and subscribe to them. Monitoring data is used with fuzzy logic to select suitable policies within the application.

Sensor->Publisher->re-publisher->Consumer

GridRM - every organization has a Java-based gateway that collects the local monitoring information and makes it available via a common data model. The gateways are registered with a registry so they can be found and queried.

Hawkeye - every monitored node has a monitoring agent that periodically calculates a set of metrics and communicates them to a central manager. The central manager republishes indexes and the current state of the nodes for fast query execution, and periodically stores them in a round-robin database. Users can query those data via an API or a front end.

HBM (Globus Heartbeat Monitor) - each host has a local monitor, and every monitored process registers with the local monitor. Local monitors on different nodes periodically report the status of monitored processes to data collectors, and the data collectors publish them to subscribers.

JAMM - places a sensor manager on each host; each manager is associated with a gateway to which events are sent using the NetLogger data format. Consumers can look up an LDAP registry for the available sensors and their associated gateways, and retrieve events from the latter.

Mercury - a set of sensors -> local monitor (per host) -> main monitor -> monitoring service. The monitoring service receives queries and, if they are valid, the hierarchy directs the lower levels to perform the monitoring for the query and provide results. The main monitor can be deployed in many places for load distribution.

NetLogger - used for performance analysis of complex systems; provides an API, a library, sensors, and a visualization front end. Events may be sent to a local or a remote location. In an enhanced version, the log level is set on an activation manager, and NetLogger polls the manager for activation settings. Activation nodes periodically publish events to an activation producer, which matches them against consumer subscriptions.

OCM - Sensors <-> local monitor (per host) <-> service managers <-> end-user performance tools (there is also a return control flow).

Remos - a query-based interface for network interconnections and topology graphs. SNMP and other collectors -> master collector (per site) -> modeler prediction service. The modeler builds a flow or topology abstraction based on the query.

SCALEA-G - sensors -> sensor manager -> archival manager + sensor repository -> instrumentation service (consumers establish subscriptions or perform queries).

Survey of Adaptive Middleware

Adaptability is viewed in a very broad sense:

1) Configurable (Static) - compile time - Quo[66], Embedded Java [76],

2) Customizable (Static) - deployment time - CORBA interceptors, Config files, Eternal[79], IRL[74], Rocks[73]

3) Tunable (Dynamic) - after startup, but before use

4) Repeatedly Tunable (Dynamic) - core remains intact

5) Mutable - adaptation at runtime - reflection, late composition of components, dynamic weaving of aspects, e.g. OpenORB

Among the techniques used to create mutable middleware are:

a) Computational reflection - the ability of a system to reason about itself (a meta-model); b) component-based design: structural reflection (inspects and modifies the internal architecture) and behavioural reflection (inspects and modifies the computation), both based on a meta-model that represents the system;


c) AOP - separates cross-cutting concerns, woven in at deployment time.

There are different types of middleware that support adaptability.
Host infrastructure - On top of OS, e.g. specialized sockets, thread libraries

Distribution layer- high level programming abstractions e.g. RMI, SOAP, CORBA, DCOM

Common Services - fault tolerance, security, load balancing, event propagation, logging, persistence, real time scheduling, transactions

Domain services - tailored to a specific class of applications.

QOS oriented Middleware

1. Real time - hard/soft. ACE[71,80] is dynamically updated using the Service Configurator pattern; TAO uses the virtual component pattern; DynamicTAO uses the Service Configurator pattern and can load/unload/adapt component implementations (repeatedly tunable).

2. Among stream-oriented systems, OpenORB uses reflection to provide dynamic adaptation, and interceptor hooks are used to change its behavior.

3. Squirrel[90,91] uses smart proxies, and MetaSockets provide adaptive streams that use dynamic insertion or removal of filters to achieve repeatedly tunable behavior.

4. Among reflection-oriented systems, OpenCorba[48] varies properties of the ORB by changing the implementation of an object, and FlexiNet is a set of dynamically assembled components that uses interceptors to change channel behavior (repeatedly tunable).

5. Among aspect-oriented systems are QuO (aspects are defined in a language, and QuO uses interceptors at pre/post method calls to implement those aspects; interceptors are placed on both the server and the client side, similar to Axis2 handlers) and AspectIX[68] (dynamic weaving of aspects; each object consists of fragments, and the aspects associated with an object can be inspected and changed at runtime).

Dependable Middleware

Reliable communication - Reliable Sockets (use interception to protect socket-based applications from network failures), reliable packets (intercept packets at the OS level), and group communication (Ensemble, Isis, Horus, Totem, Coyote; e.g. Ensemble supports a protocol graph constructed from fine-grained components, QoS monitoring is achieved by inserting detectors into the protocol graph, and the detectors perform adaptation by distributing a new protocol graph to the participants that need it).

Fault-tolerant systems usually provide replication at different levels of the system. Among them, Electra[103] and Orbix+Isis[104] modify the ORB to multicast requests to replicas, so replication is handled at the ORB level. With Object Group Services, replication is transparent to the ORB, and a CORBA object multicasts the requests to the services. In addition, there are systems that intercept requests at the OS level or at the ORB interceptor level and multicast them. FRIENDS[51] handles non-functional requirements by adding interceptors on both sides and provides fault-tolerant replicas as meta objects.

Load balancer - the first request is sent to the load balancer, which forwards it to an object (notifying the client). The client then talks to that object, and if the load is too high, the object may in turn forward the request to another object. The load balancer monitors the load, and if the load is unbalanced, it may ask objects to forward requests back to the load balancer, which will redirect the client to a less loaded object.

Service Configurator pattern - a configurator creates and keeps services in a repository; services support init(), finish(), resume(), suspend(), and info(), provide a message-based control interface, and support dynamic updates.
Virtual Component pattern - decompose the middleware into components, and define virtual components for the optional ones. A virtual component may load the real component only when a request is received, or it may load one of several implementations of the component, allowing the middleware to be tunable.
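A small sketch of the Virtual Component pattern (the logging component is hypothetical, and this is not code from TAO): a placeholder stands in for an optional middleware component and loads a real implementation only when the first request arrives.

    class VirtualComponent:
        def __init__(self, factory):
            self._factory = factory          # knows how to build the real component
            self._real = None

        def handle(self, request):
            if self._real is None:           # lazily load on first use
                self._real = self._factory()
            return self._real.handle(request)

    class RealLoggingComponent:
        def handle(self, request):
            return f"logged: {request}"

    logger = VirtualComponent(RealLoggingComponent)
    print(logger.handle("GET /status"))      # the real component is created here, on demand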

A Survey of Distributed Enterprise Network and Systems Management Paradigms

Network and systems management has evolved from many proprietary technologies to SNMP and CMIP, which are mostly centralized. The main goal of those technologies was interoperability. These systems are used in network monitoring, systems management, and telecommunication network management. However, centralized systems lack scalability, flexibility, and robustness, and new technologies such as CORBA, Web services, Java, and multi-agent systems are available.

Taxonomy of management systems

1. The design can be centralized, weakly distributed (hierarchical), or strongly distributed (hierarchical or cooperative).

2. Delegation - usually based on a hierarchy. Horizontal delegation is used in cooperative systems; vertical delegation delegates work, power, accountability, and responsibility to a lower layer. Delegation can be done by domain (a manager at level N delegates tasks to level N+1) or by task (transferring dynamic tasks from level N to N+1; micro tasks such as editing a parameter or running queries, or macro tasks such as a high-level task or the management of a complete task).

3. Semantic richness of the information model: a) managed objects - expressed in terms of get/set, e.g. SNMP and CMIP; b) computational objects (meta objects) - provide high-level operations, e.g. JMX; c) goals - used only in the cooperative paradigm; when a goal is set by a manager for an agent, it is the agent's responsibility to work out the details.

4. Degree of automation

SNMP provides a centralized system with managed objects. SNMP together with RMON, M2M, and DISMAN provides a hierarchical system with managed objects.

OSI management uses a hierarchical model, and mobile code uses a hierarchical model with horizontal delegation. Intelligent agents are cooperative and strongly distributed, and they usually use a goal-based model.

Self-healing systems -- survey and synthesis

Three main areas in self healing systems

Maintenance of Health

1. Maintaining Redundancy by Cell division [17] , Self Assembly [40] , Multi-Agent Decision Making [24]

2. Monitoring Control loops - by Grid Adaptation [Rainbow, Extreme,6] , Decision Control Layer [26], Feedback control loop[50] , Adaptive Monitoring[7] , Sensors Gathering data from functional layer[38]

3. Architecture based Models - Maintain diversity by Functionality based security[48] , Diverse Design Model[13] , Identify Critical Paths[25]

4. Performance Log analysis by Watchtower[28] , Predict target event[47] , Finite Automata Scheme[20]

Detection of System Failures

1. Missing components - Disappearance of components (detected by the disappearance of neighbors)[40], Absence of a message[17], Missing response[2], Missing scheduled announcements[9]

2. System Monitoring - Monitoring data for trigger [4][6][16] , Difference between running and actual system[50] , Parallel space servicing[7] , Proactive Probing [37] , Component addition/missing [11]

3. Foreign Element - Proactive containment[37] , Representing system Diff to identify foreign element[11], notify about malicious replica [38],

System Recovery

1. Redundancy Techniques - Self assembly of components[40] , Replicating cells[17] , Recovery oriented computing[22]

2. Architectural models and repair plans - Using Gauges[6] , Feedback control loops[50] , Service and Contract[7] , Event based configuration [11]

3. Component interactions for healing - the system recovers to maintain the architecture according to the specification[19], soft state and application-level recovery in a discovery system[10], dynamic reconfiguration of the system architecture to isolate the faulty component[35]

4. Byzantine agreement and voting - finding the malicious node by voting[37]; preprocessing (decision making by lotteries - selecting an agent randomly, auctions, voting methods, team approach) before solving the problem, and post-processing after solving the problem to find the best answer[25]

5. Other methods e.g. Recovery Oriented Computing [23], structural re-organizing at run time [19].

Middleware

Review of Current Routing Protocols for Ad-Hoc Mobile Wireless Networks, Elizabeth M. Royer

In an ad-hoc network, nodes move, join, or leave rapidly. There are two classes of routing algorithms.

Table Driven routing

  1. Destination-Sequenced Distance Vector routing - each node maintains routes to all possible destinations within the network and the number of hops to each destination. Each route is marked with a sequence number assigned by the destination host. The routing tables are broadcast throughout the network from time to time.
  2. Cluster-head gateway routing protocol - each cluster has a cluster head elected from among its nodes. Messages are sent from a node to its cluster head, then to a gateway node (a gateway is in contact with two cluster heads), then to the destination's cluster head, and finally to the destination.
  3. Wireless Routing protocol

On-demand routing - routes are found on demand and may be maintained for some time once found.

  1. When a route is needed, a request is broadcast to the neighbors, which rebroadcast it in turn until a valid route is found. Once a route is established, it is kept for some time.
  2. Dynamic Source Routing - similar to #1: each node receiving a packet checks whether it knows a route; if not, it adds its own address to the route record and forwards the packet. The difference from #1 is not clear.
  3. Associativity-based routing - routes are built favoring stable nodes, based on how long a given node stays connected.
  4. A protocol that selects routes based on the signal strength between nodes and a node's location stability.

A Scalable Content-Addressable Network, Sylvia Ratnasamy, Scott Shenker

Abstract: A P2P routing algorithm that supports a DHT with multi-dimensional keys.

The d-dimensional space is partitioned and assigned to nodes, and each node stores the data whose keys fall into its assigned region. Each node keeps track of all its neighbors and their space assignments.
When a node is asked to route a message to a location k1, it sends the message to the neighbor closest (in distance within the d-dimensional space) to k1. Because routing can proceed along any of the d dimensions, the average path length is (d/4) * n^(1/d) hops.
Node join - when a new node arrives, it picks a random position in the space, and the node that owns that position shares its zone with the new node. The nodes involved update their own and their neighbors' routing tables to reflect the change.
Leaving nodes - a leaving node may hand its zone over to (merge it with) a neighbor before it leaves. To handle failures, each node sends heartbeat messages to its neighbors, and in the absence of heartbeats the neighbors start a takeover. In a takeover, the takeover message is sent to all neighbors of the failed node, and the node with the smallest zone takes precedence. Each node starts the takeover independently; on receiving a takeover message, a node either stops its own timer or responds with its own takeover message. If a node fails, the data stored there is lost unless it is refreshed by the owner of the data.
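A simplified Python sketch of CAN-style greedy routing (zones are reduced to single coordinates and neighbor sets are given explicitly, which glosses over zone splitting and joins): each hop forwards the message to whichever known node is closest to the key.

    import math

    def route(key, node, coords, neighbors):
        """Greedily walk toward the node whose coordinates are closest to the key."""
        path = [node]
        while True:
            candidates = [node] + neighbors[node]
            best = min(candidates, key=lambda n: math.dist(coords[n], key))
            if best == node:                 # no neighbor is closer: this node owns the key
                return path
            node = best
            path.append(node)

    coords = {"n1": (0.1, 0.1), "n2": (0.5, 0.2), "n3": (0.9, 0.8)}
    neighbors = {"n1": ["n2"], "n2": ["n1", "n3"], "n3": ["n2"]}
    print(route((0.8, 0.9), "n1", coords, neighbors))   # ['n1', 'n2', 'n3']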

Chord: A Scalable Peer to Peer lookup service for internet applications

A distributed lookup service: given a key, look up the node responsible for it (a DHT).
Each lookup is resolved in O(log N) messages, and each server maintains only O(log N) routing information about other servers. Data is stored on the node to which the key of the data maps.

Routing is based on a ring, where each node keeps track of its successor and predecessor. By following the ring, any node can send a message to any other node. To speed up lookups, each node also keeps a finger table of nodes distributed across the ring (with more detail about nodes near itself on the ring).
Join - a new node is given a random ID, and each node keeps track of its successor and predecessor. When a node is added, it finds its successor and creates its finger and successor entries with the help of that successor.
Stabilization protocol - a node n periodically checks its successor's predecessor and makes that node the successor of n if required.
Failures - each node maintains a list of its next N successors; if a node fails, the next live successor in the list is used as the successor.
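A simplified Python sketch of a Chord-style lookup on a small identifier ring (static successor and finger tables, no joins or failures): each hop forwards to the closest preceding finger, so a lookup takes O(log N) hops.

    RING = 2 ** 6                              # 6-bit identifier space for the example

    def in_interval(x, a, b):
        """True if x lies in the circular interval (a, b]."""
        return (a < x <= b) if a < b else (x > a or x <= b)

    def lookup(key, node, successor, fingers):
        """Return the node responsible for key, following fingers around the ring."""
        while not in_interval(key, node, successor[node]):
            preceding = [f for f in fingers[node] if in_interval(f, node, key - 1)]
            node = max(preceding, default=successor[node],
                       key=lambda f: (f - node) % RING)   # closest preceding finger
        return successor[node]

    successor = {1: 12, 12: 25, 25: 40, 40: 55, 55: 1}
    fingers = {n: list(successor) for n in successor}      # full view, just for the sketch
    print(lookup(30, 1, successor, fingers))               # 40 (the first node >= 30 on the ring)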

Elections in a Distributed Computing System Garcia-Molina, H. 1982

With no communication failures and only node pauses (in a synchronous network), the Bully algorithm works: challenge everyone older than you, and if nobody answers, tell every younger node that you are the coordinator.
With communication failures and node pauses, groups are formed by sending invitations and merged when other groups are detected (the Invitation algorithm); see the paper for details.
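A greatly simplified Python sketch of the Bully election (message passing and timeouts are elided, and node IDs stand in for "age"): a node challenges every higher ID, and the highest node that responds ends up as coordinator.

    def bully_election(starter, alive_nodes):
        """Return the elected coordinator; alive_nodes holds the IDs of responding nodes."""
        candidate = starter
        while True:
            higher_alive = [n for n in alive_nodes if n > candidate]
            if not higher_alive:
                return candidate             # nobody older answered: candidate wins
            candidate = max(higher_alive)    # a higher node takes over the election

    print(bully_election(2, alive_nodes={1, 2, 4, 7}))   # 7 becomes the coordinator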

LDAP (Light weight directory access protocol)

A directory service stores objects with properties and allows users to search for objects using those properties. It also supports: Compare - test whether a named entry contains a given attribute value; Add a new entry; Delete an entry; Modify an entry; Modify Distinguished Name (DN) - move or rename an entry. Generic distributed directory features can generally be divided into three areas: referrals, chaining, and replication (LDAP has referrals only).

Referrals - when a query comes into a directory server that cannot satisfy the request directly, the server may know of a different server that does have the answer. In that case, it can return a referral to the client.

Chaining - alternatively, the server that does not have the answer can, on behalf of the client, go and ask the server that does.

Replication - directory information is reproduced and synchronized between two directory servers. Replication can be master/slave, (a single directory is updated and propagates the changes to one or more slaves); or multi-master, under which any copy of the directory can be updated, automatically or periodically updating all the other copies.