Team 3: Project Page
17-654: Analysis of Software Artifacts / 18-846: Dependability Analysis of Middleware
			
		TEAM 3
		Team Members:
		
		Team Roles:
		- Project lead: Ackley 
  
			
- Baseline: Fry / Boyer 
  
			
- MTA front-end, Spam back-end: Fry / Wilson 
  
			
- Database: Ackley / Wilson 
  
			
- Testing: Ackley / Boyer 
  
			
- Documentation: Ackley / Boyer / Fry / Wilson
			Project Title: Spam'n'Beans - A High-Performance Mail Content Checker
			Baseline Application Description:
		A system that assists high-volume email servers by analyzing received email
			against a central database of spam. Mail servers may be configured to reject
			email classified as having a high probability of being spam.
			
Configuration:
			- Java 
  
				
- Enterprise JavaBeans (JBoss) 
  
				
- Linux
Third-party software, if any (databases):
			- Sendmail 
  
				
- SpamAssassin 
  
				
- PostgreSQL
Project Documents
			
			Project Downloads:
			
			
			Baseline Application
			Interfaces
			
			Scenarios/Interactions
			
				- The customer configures their MTA to forward email 
  into the Spam'n'Beans system. 
  
				
- The system then compares the content of each email against the 
  spam database and assigns a likelihood that the email is spam. 
  
				
- The system will return the original email back to the 
  customer's system with the appropriate spam likelihood embedded within the 
  message headers. 
  
				
- See Baseline Use Cases
Current Status
			
			Downloads
			
			
			
			Fault-Tolerant Baseline Application
			Architecture
			Fault Tolerance will be achieved with the following system 
            attributes:
			
- Replication
				- The system is designed with a cluster 
                  of replicated middle-tier servers, and one or more back-end 
                  servers which are not replicated.
- One back-end server hosts the global naming server, an EJB container, 
                  the Fault Detector and Replication Manager console, and the 
                  Load Balancing Manager.  A second back-end server runs the 
                  PostgreSQL database.  None of the back-end servers is 
                  replicated, so each remains a single point of failure within 
                  the system.  The back-end servers do not host middle-tier 
                  replicas.
		  
- Each middle-tier replica consists of its own JBoss naming service, EJB 
                  container, and SpamAssassin daemon.  All replicas in the 
                  cluster may process client requests simultaneously, though a 
                  given client request is processed by only one server at a time.
- Client requests are 
                  load-balanced across the servers in the middle-tier cluster.  
                  This spreads work across the available servers and is intended 
                  to reduce the average turnaround time for client requests.  
                  To permit load balancing, each client queries the Load 
                  Balancing Manager on the back-end server, which returns a 
                  particular middle-tier server for the client to contact.  
                  The client then submits its mail processing request to 
                  the indicated middle-tier server. 
- The Replication Manager console running on a back-end 
    server offers administrative functions to dynamically add machines to or 
    remove them from the pool of replica servers, and to individually launch or 
    shut down each middle-tier replica.  This permits system administrators to 
    perform routine maintenance on servers with minimal disruption to the system.  
    Using the console, a system administrator may shut down, or add and launch, a 
    middle-tier server without affecting the other middle-tier servers in the 
    cluster.
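
The client-to-Load-Balancing-Manager exchange described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `LoadBalancingManager`, its methods, and the round-robin policy are all assumptions, since the page does not say how a server is selected.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: hands out middle-tier servers in round-robin order.
// The selection policy is an assumption; the page does not specify one.
class LoadBalancingManager {
    private final List<String> replicas = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    // Would be called when the console adds and launches a replica.
    void addReplica(String host) {
        replicas.add(host);
    }

    // Would be called when the console shuts down or removes a replica.
    void removeReplica(String host) {
        replicas.remove(host);
    }

    // Returns the middle-tier server the querying client should contact.
    String nextServer() {
        // Work on a snapshot so a concurrent add/remove cannot cause an
        // index-out-of-bounds between reading the size and the element.
        Object[] snapshot = replicas.toArray();
        if (snapshot.length == 0) {
            throw new IllegalStateException("no middle-tier replicas available");
        }
        int index = Math.floorMod(next.getAndIncrement(), snapshot.length);
        return (String) snapshot[index];
    }
}
```

A client would call `nextServer()` before each request and then submit its mail processing request to the returned host. Removing a replica from the pool takes effect on the next query, which matches the console's ability to take one server out of service without affecting the others.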
 
- Fault Detection
				    - Faults are detected by throwing and handling exceptions. 
				        
- Such exceptions fall into 3 categories:
						- Application
							
- These are caused by invalid user input and are returned to the user as descriptive errors.
- Requests sent from invalid or inactive clients are 
                            returned to the user as a user error.
- Non-Fatal
							
- These are exceptions that occur as a result of a system 
								component failure.  
							- Such exceptions are not returned to the user, but are handled gracefully 
								using failover mechanisms.
							- Network exceptions received by clients from the 
                            middle-tier servers are considered Non-Fatal, and 
                            clients will transparently fail over to a secondary 
                            server using the same transaction ID.
                            - Exceptions received by middle-tier servers when 
                            communicating with clients result in no action by 
                            the middle-tier server. If the client detected the 
                            failure, it may retry the operation. The middle tier 
                            must then detect and ignore the duplicate transaction 
                            (which may have been routed to a secondary server). 
                            If the client does not retry the operation, no 
                            corrective system action is required.
- Fatal
							
- These are generally non-recoverable system faults which may require
								a complete system shutdown and restart.
							- These exceptions are returned to the user and 
                            system administrator where possible, and may result in total system 
								failure.
							- Examples include failures in the database 
                            back end, the Replication Manager, etc.
							- Exceptions received by the middle tier from the 
                            back-end (database) server are considered Fatal and 
                            will be reported to the client and system administrator.
							- Exceptions received by the back-end (database) 
                            server from the middle tier will result in a transaction rollback.
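
One way to picture the three categories is as distinct exception types that the client maps to a reaction. The sketch below is illustrative only: every type and method name is hypothetical, and treating any unrecognized exception as Fatal is an assumption that the page does not state.

```java
// Hypothetical exception types mirroring the three categories above.
class ApplicationException extends Exception {
    ApplicationException(String message) { super(message); }
}

class NonFatalException extends Exception {
    NonFatalException(String message) { super(message); }
}

class FatalException extends Exception {
    FatalException(String message) { super(message); }
}

// What the client does in response to each category.
enum ClientAction {
    REPORT_USER_ERROR,   // Application: invalid input, shown to the user
    FAIL_OVER,           // Non-Fatal: retry on another middle-tier server
    REPORT_UNAVAILABLE   // Fatal: no failover possible, system unavailable
}

class FaultClassifier {
    // Maps an exception to the client's reaction.
    static ClientAction actionFor(Exception e) {
        if (e instanceof ApplicationException) {
            return ClientAction.REPORT_USER_ERROR;
        }
        if (e instanceof NonFatalException) {
            return ClientAction.FAIL_OVER;
        }
        // Assumption: anything else, including FatalException, is
        // treated as non-recoverable.
        return ClientAction.REPORT_UNAVAILABLE;
    }
}
```

Keeping the mapping in one place means the client's request loop only has to switch on a `ClientAction` rather than inspect exception types at every call site.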
 
- The Replication Manager console also acts as a Fault Detector which 
                      periodically polls all middle-tier servers 
                      launched from it.  If a poll receives a Non-Fatal 
                      exception, the console will assume a
                      crash-fault of the middle-tier server 
                      and will automatically restart the failed 
                      replica.
- If the client receives a Non-Fatal 
                      exception, it will assume a crash-fault 
                      of the middle-tier server and will attempt to 
                      transparently fail over to another middle-tier server.  
                      Any Application exceptions received by the client are 
                      considered to be caused by invalid user input, and are 
                      indicated as such to the user.  Any Fatal exceptions 
                      received by the client indicate there is no opportunity 
                      for failover and are reported to the user to indicate 
                      that the system is unavailable.
- Assuming fail-silent behavior, the system is 
    capable of handling any number of simultaneous or successive faults in the 
    middle-tier replicas, provided there is at least one middle-tier replica 
    that is running at all times.
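
The console's polling behavior described above might look roughly like the following. This is a sketch under assumptions: the `Replica` interface, the `FaultDetector` class, and `pollOnce` are hypothetical names, and for simplicity any exception raised by a poll is treated as a crash fault.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of a middle-tier replica as seen from the console.
interface Replica {
    String name();
    void ping() throws Exception;  // throws if the replica cannot be reached
    void restart();
}

class FaultDetector {
    private final List<Replica> launched;

    FaultDetector(List<Replica> launched) {
        this.launched = launched;
    }

    // Performs one polling pass over every replica launched from the
    // console and returns the names of the replicas that were restarted.
    List<String> pollOnce() {
        List<String> restarted = new ArrayList<>();
        for (Replica replica : launched) {
            try {
                replica.ping();
            } catch (Exception e) {
                // Assumption: a failed poll means the replica crashed,
                // so it is restarted immediately.
                replica.restart();
                restarted.add(replica.name());
            }
        }
        return restarted;
    }
}
```

To poll periodically, `pollOnce` could be scheduled with `ScheduledExecutorService.scheduleAtFixedRate`. The polling interval is not given on this page, so it would have to be chosen by the operator.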
 
- Failover 
                - When a client 
                      receives a Non-Fatal exception (indicating a middle-tier 
                  server failure), it will transparently fail over 
                  by contacting the Load Balancing Manager for another 
                  middle-tier server and resubmitting its request.  This 
                  assumes the services of the back-end servers (such as the Load 
                  Balancing Manager) are always available.
- The system is designed so that client communication with session beans is 
    stateless and can be considered idempotent.  The only state retained 
    for client requests is maintained by entity beans, which are stored in 
    the back-end database.  The state of a client for a particular 
    transaction is saved or updated only on successful processing of a 
    message.  This should obviate the need for replica checkpoints.
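
The client-side failover loop described above can be sketched as follows. All names here (`MailChecker`, `FailoverClient`, `ReplicaUnavailableException`) are hypothetical, and the bounded retry count is an assumption; the page does not say how many times a client retries.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a Non-Fatal exception from a middle-tier server.
class ReplicaUnavailableException extends Exception {
    ReplicaUnavailableException(String message) { super(message); }
}

// Hypothetical remote call to a middle-tier server.
interface MailChecker {
    String check(String server, String msgId, String message)
            throws ReplicaUnavailableException;
}

class FailoverClient {
    private final Supplier<String> loadBalancer;  // asks for a server to contact
    private final MailChecker checker;
    private final int maxAttempts;                // assumption: retries are bounded

    FailoverClient(Supplier<String> loadBalancer, MailChecker checker, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.loadBalancer = loadBalancer;
        this.checker = checker;
        this.maxAttempts = maxAttempts;
    }

    // Submits the message, asking the load balancer for another server
    // after each failure. The same msgId is reused on every attempt so
    // the middle tier can recognize a duplicate.
    String submit(String msgId, String message) throws ReplicaUnavailableException {
        ReplicaUnavailableException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String server = loadBalancer.get();
            try {
                return checker.check(server, msgId, message);
            } catch (ReplicaUnavailableException e) {
                last = e;  // remember the failure and try another server
            }
        }
        throw last;
    }
}
```

Because requests are designed to be stateless and idempotent, resubmitting the same message to a different server needs no state transfer between replicas.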
 
- Unique Identifiers
                - Each client request will supply a unique 
                  MsgID. This unique identifier permits 
                  detection of duplicate transactions that may occur as a result 
                  of a failover, so that the middle-tier servers do not perform 
                  the transaction twice.
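
Duplicate detection by MsgID can be illustrated with the sketch below. It is an assumption-laden simplification: the class `DuplicateFilter` is hypothetical, and an in-memory map stands in for the shared back-end database, which is where a real deployment would have to record completed MsgIDs so that every middle-tier server sees them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: remembers the result for each MsgID so a retried
// request is not processed a second time.
class DuplicateFilter {
    // Stand-in for the shared database table of completed transactions.
    private final Map<String, String> completed = new ConcurrentHashMap<>();

    // Runs the scorer only the first time a MsgID is seen; later calls
    // with the same MsgID return the stored result without re-scoring.
    String processOnce(String msgId, Supplier<String> scorer) {
        return completed.computeIfAbsent(msgId, id -> scorer.get());
    }
}
```

With this shape, a client that fails over and resubmits the same MsgID receives the original result, and the expensive content check runs at most once per message.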
 
- Fault Injection Techniques
                - Inject a crash-fault using any of the following techniques:
		    - Use the console to shut down a replica gracefully 
      and remove it from the replica cluster. 
 
		    
- Use "repman" to shut down a replica, resulting in 
      the console restarting that replica. 
 
		    
- Use "kill" to kill a replica process - resulting 
      in the console restarting that replica. 
 
		    
- Disconnect the network cable or shut down a 
      machine running a replica, resulting in the console repeatedly and 
      unsuccessfully attempting to restart the replica until the machine is 
      restarted or reconnected. 
 
		    
- Run the "scripts/FaultInjector.pl" script to periodically kill
			replicas
 
 
Scenarios/Interactions
			
				- See Fault Tolerance Use Cases
				
- Current testing has used the following configuration:
				
				- The PostgreSQL database server is always running on the same 
                machine.
- The console (Admin, Fault Detector, Load Balancer) machine may 
                be chosen at startup to be any one of the ECE cluster machines 
                (preferably settlers, othello, or girltalk).  It is assumed that 
                JBoss has already been started on the machine that is to be the 
                console system before the console is launched.
- The middle-tier cluster machines may be chosen shortly after 
                system startup (preferably settlers, othello, or girltalk).
- It is assumed that all machines in the ECE cluster 
    mount the same network paths.  Thus the location of the JBoss 
    application and deployment directories is the same on each of the 
    systems.
 
Current Status
			
			Fault tolerant performance graph
			
 
			
Fault tolerant performance data
			
			  | Elapsed seconds | 1622 | 
			  | Seconds/message | 0.81 | 
			  | Messages/second | 1.23 | 
			  | Bytes processed | 11069358 | 
			  | Bytes/message | 5534.76 | 
			  | Bytes/second | 6824.62 | 
		        
			Fault tolerant performance graph with fault injection
			
 
			
The FaultInjector.pl script was configured to kill one replica every 180 seconds,
			or approximately every 200 messages.
			
Fault tolerant performance graph with multiple clients
			
 
			
Four machines were each running five clients.  Every client processed 100 messages
			for a total of 2000 messages.  The database time gets much larger when many clients
			are running simultaneously.
			
			
Downloads
			
			
			Real-Time Fault-Tolerant Baseline Application
			
			Scenarios/Interactions
			Current Status
			Downloads
			
			
			High-Performance Real-Time Fault-Tolerant Baseline Application
			
			Scenarios/Interactions
			Current Status
			Downloads
			
			
			$Id: index.html,v 1.24 2004/04/19 22:07:40 gca Exp $