Meeting Minutes:  Team Meeting w/ Priya
Time: 02/25/04 @ 6:00 - 6:30 PM
Where: MSE Cave
Attendees: Ackley, Boyer, Fry, Wilson
Purpose: Weekly Status Update & Demo
------------------------------------

--- Demo Fault Tolerance to Priya ---
In this initial implementation, the client asks a 'Load Balancer' on
the "golden machine" for a replica.  The client then contacts that
replica for message processing.  This permits a simple load balancing
mechanism.

Priya Comments (our TODO list)
*) We have successfully demonstrated that we've completed the Fault
   Tolerance Baseline.
*) Remaining tasks are:
   - Automatic Replica recovery - currently a failed replica must be
     manually re-started.
   - Obtain initial Performance Measurements
     - Must keep track of Faults generated & recovered from.
     - Baseline Performance (a fault-free report)
     - Recovery Time (performance with faults)
   - Revise Failover strategy for performance.
*) In our current design the load balancer, as a single connection point,
   will become a bottleneck in terms of scalability.
   - May wish to consider some other strategies to mitigate this, yet
     still permit load balancing.
   - Current design is a "pull" strategy, where clients must constantly
     query the load balancer for information.
   - Perhaps we could employ a "push" strategy, where clients retain a
     cache of system load, and this cache is periodically updated by
     the replication manager.
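
Sketch of the "push" idea above (not discussed in this detail at the
meeting).  The class name, method names, and the use of an integer load
figure are all assumptions made for illustration; the transport the
replication manager would use to deliver updates is left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client-side cache of system load for the "push" strategy.
// The replication manager pushes updates; the client picks a replica
// locally instead of querying the load balancer on every request.
class ReplicaLoadCache {
    // replica name -> last load figure pushed for it (unit is an assumption)
    private final Map<String, Integer> loadByReplica = new ConcurrentHashMap<>();

    // Called whenever the replication manager pushes a periodic update.
    void onLoadUpdate(String replica, int load) {
        loadByReplica.put(replica, load);
    }

    // Called when a replica is reported as failed.
    void remove(String replica) {
        loadByReplica.remove(replica);
    }

    // Choose the replica with the smallest cached load, if any is known.
    Optional<String> leastLoaded() {
        return loadByReplica.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

The trade-off is staleness: the client's choice is only as fresh as the
last update it received, so the update period would need tuning.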

Q) How would you classify our replication strategy - passive or active?
A) Neither - really it is a clustering strategy.
   - All the replicas are actively doing work - just not the same work.
   - We are the only team to adopt such a strategy for our project.

------------------------------------
Team Meeting (Before Demo)
Time: 02/25/04 @ 5:00 - 6:00 PM / 6:30-7:00 PM
Where: NSH Atrium / Wean Hall - 4th floor corridor niche

Design Update:
--------------
* System handling of duplicate MsgID's
  - Currently generating a FATAL exception.
  - System will be modified to handle duplicate MsgID by first checking
    for such duplicate MsgID.  If found, we will re-use that entity
    bean.
  - Any duplicate messages (as a result of failover) will be passed to
    SpamAssassin for processing.  We will NOT store the result of the
    processing in the database, as the activity is idempotent.
* Generation of MsgIDs
  - Currently the client generates a MsgID/TransactionID as a 
    concatenation of the clientID + hash(msg)
  - Issue is that hashes are NOT unique, and take some time to compute.
  - System will be modified to replace the hash(msg) part of the MsgID
    with the system timer (milliseconds).  This should ensure a
    unique MsgID per client, provided the client sends at most one
    message per millisecond, and involves less processing.
* Server/Replica State
  - The ServerStat table has been modified to contain a ServerState.
  - This state consists of "Active/Inactive", and indicates the state
    of the replica on that server.
  - A replica is marked "Active" at system startup.
  - When a client detects a fault, it marks the replica as "Inactive".
  - The load balancer will only give requesting clients the names of
    replicas marked as "Active".  Thus, a failed replica will never be
    contacted again.
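
Sketch of the Server/Replica State rules above.  This is an in-memory
stand-in for the ServerStat table, not the actual schema or bean code;
the class, enum, and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the ServerStat table.
class ServerStatTable {
    enum ServerState { ACTIVE, INACTIVE }

    // server name -> state of the replica on that server
    private final Map<String, ServerState> stateByServer = new LinkedHashMap<>();

    // A replica is marked Active at system startup.
    synchronized void registerAtStartup(String server) {
        stateByServer.put(server, ServerState.ACTIVE);
    }

    // Called by a client that has detected a fault on this replica.
    synchronized void markInactive(String server) {
        if (stateByServer.containsKey(server)) {
            stateByServer.put(server, ServerState.INACTIVE);
        }
    }

    // What the load balancer hands out: only replicas marked Active.
    synchronized List<String> activeReplicas() {
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, ServerState> entry : stateByServer.entrySet()) {
            if (entry.getValue() == ServerState.ACTIVE) {
                active.add(entry.getKey());
            }
        }
        return active;
    }
}
```

Nothing in this sketch flips a replica back to Active, which matches the
current behavior: a failed replica stays out of rotation until recovery
is added.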

New Assignments:
----------------

PW - Merge Fault-Tolerance & Java Log into CVS.

GA - Continue working Global Replication Manager
GA/CF - Add Automatic Replica Recovery/Restart
GA/CF - Add Automatic Fault Detection (Replica Polling)

CF - Merge eat-spam.pl into MPClient

CF - Track down current exception in system 
     - (an unhandled exception from SpamC?)

CF/PW - Complete Java Log Configuration & invocation

AB - Test Cleanup
     - Make test results obvious (Pass/Fail)
     - Make test activity obvious (display test activity)
     
** Obtain Performance Measurements
   - record number of failovers
   - determine fault recovery time
   - Postponed...
     - May depend on eat-spam/MPClient merge

AB - Implement Design Change : Re-use duplicate MsgIDs
AB - Implement Design Change : Replace hash(msg) --> timestamp


Notes:  Team goals are to:
* Complete component implementation by Monday
* Begin component integration Monday Evening.