18-642 Project 11
Updated 11/21/2021. Changelog
Learning objectives: implement more monitors in a style similar to safety
monitors used in life critical systems; confront race conditions and asynchrony
in distributed systems.
Lab Files:
- Project 11 maze files (These are
the same mazes used for project 12; no new mazes will be introduced after this
point.)
Hints/Helpful Links:
- $ECE642RTLE_DIR refers to ~/catkin_ws/src/ece642rtle (or
corresponding project workspace).
- Please follow the handin naming convention: every filename (before the
file extension) must end in [AndrewID]_[FamilyName]_[GivenName].
- See the Recitation video and slides on Canvas
Procedure:
- Write monitors for the rest of the invariants. Refer to the
previous project for reminders on how to write and compile these monitors. It
might be helpful to factor out some functionality in a utility file, such as
ANDREWID_monitor_utils.cpp. You should also use the structured types
defined in $ECE642RTLE_DIR/monitors/monitor_interface.h. You are
required to write the following monitors:
- ANDREWID_turn_monitor.cpp: update or use as-is from previous
project.
- ANDREWID_tick_monitor.cpp: between calls to tickInterrupt, there
shall be at most one call to each of poseInterrupt, visitsInterrupt, and
bumpInterrupt. tickInterrupt occurs after every call to moveTurtle, so this
effectively checks that your moveTurtle only makes one of each type of call.
- Consider whether it is possible for those three interrupt calls to happen
in varied orderings, and if that is an issue what you might do to ensure that a
changed ordering does not break your monitor design.
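One order-independent way to express the tick invariant is to keep a separate counter per interrupt type and reset all counters on each tick. The sketch below is only an illustration: TickWindow and its method names are hypothetical, and the violations vector stands in for the ROS_WARN calls a real monitor would make from poseInterrupt, visitsInterrupt, bumpInterrupt, and tickInterrupt.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: counts interrupts seen since the last tick.
// Counting per type makes the check independent of arrival order.
struct TickWindow {
  int pose = 0;
  int visits = 0;
  int bump = 0;
  std::vector<std::string> violations;  // stand-in for ROS_WARN output

  void onPose()   { record(pose, "poseInterrupt"); }
  void onVisits() { record(visits, "visitsInterrupt"); }
  void onBump()   { record(bump, "bumpInterrupt"); }

  // Called from tickInterrupt: start a fresh window.
  void onTick() { pose = visits = bump = 0; }

 private:
  // Increment one counter and log a violation on the second or later call.
  void record(int& count, const char* name) {
    ++count;
    if (count > 1) {
      violations.push_back(std::string("more than one ") + name +
                           " between ticks");
    }
  }
};
```

Because no counter depends on another, pose/visits/bump arriving in any order within one tick window produces the same result.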
- ANDREWID_forward_monitor.cpp: the turtle shall face the direction
it is moving. In other words, if the turtle moves from square A to square B,
and square B is direction DIR (example: east) of square A, the turtle must have
been facing DIR before moving. ADDITIONALLY, the turtle shall not move in a
round during which it has also turned.
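The forward check can be sketched as a pure function of the old cell, the new cell, the orientation before the move, and whether a turn was seen in the same round. The Dir and Cell types below are local stand-ins (the real monitor would use the structured types from the monitor interface header), and the axis convention (x grows east, y grows south) is an assumption to verify against your maze.

```cpp
// Local stand-ins for illustration only.
enum class Dir { North, East, South, West };
struct Cell { int x; int y; };

// Writes the direction of travel from a to b into out and returns true,
// or returns false if the cells are not edge-adjacent.
// ASSUMPTION: x grows to the east, y grows to the south.
bool directionOfMove(Cell a, Cell b, Dir& out) {
  int dx = b.x - a.x;
  int dy = b.y - a.y;
  if (dx == 1 && dy == 0)  { out = Dir::East;  return true; }
  if (dx == -1 && dy == 0) { out = Dir::West;  return true; }
  if (dx == 0 && dy == 1)  { out = Dir::South; return true; }
  if (dx == 0 && dy == -1) { out = Dir::North; return true; }
  return false;
}

// True if going from a to b is consistent with the forward invariant:
// the turtle faced the direction of travel and did not turn this round.
bool forwardMoveOk(Cell a, Cell b, Dir facingBeforeMove, bool turnedThisRound) {
  if (a.x == b.x && a.y == b.y) return true;  // no move, nothing to check
  if (turnedThisRound) return false;          // moved and turned in one round
  Dir travelled = Dir::North;
  if (!directionOfMove(a, b, travelled)) return false;  // jumped cells
  return travelled == facingBeforeMove;
}
```

Note that the orientation passed in must be the one recorded before the move, so the monitor needs to keep the previous pose and orientation rather than only the latest ones.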
- ANDREWID_wall_monitor.cpp: the turtle shall not go through walls
- Recall that the monitor connects to the system and eavesdrops on ROS
messages, so it does not actually have a ground-truth of where the walls are in
the maze. (In real safety critical systems it is difficult for a monitor to
know more about the environment than the rest of the system does since it has
to rely upon the same sensor technology as the main system.) Because of this, a
more precise definition of our invariant is: given a turtle that has gone
between squares A and B that share an edge E, we view an invariant violation
as: turtle has not checked the given edge E with bumped() before crossing
it, OR turtle has checked the given wall segment E with bumped(),
which returned true, and turtle still crossed it. In other words, the
turtle must have seen bumped() return false for E some time in the
past before going from A to B. This suggests that, every time your monitor
sees a bumped() call, it will have to store it for lookup later. It
also effectively outlaws designs that remember where the walls are to skip
calling bumped() -- but it is common for safety monitors to close off some
avenues of behavioral optimization, so this is a realistic tradeoff decision.
- You can simplify this approach by putting a bound on how many past
bumped() calls your monitor keeps track of. For example, if your
turtle implementation always checks bumped() before moving, you only
need to keep the last bumped() call stored. Similarly, if your turtle
spins and checks four directions for bumps before moving, you only need to keep
the last four bumped() calls stored. There are more sophisticated
approaches possible, but do be sure that when the turtle moves to a new square
your monitor doesn't retain stale bumped() data from the previous square.
- Note that, geometrically, the line segment between (0,1) and (0,2) and the
line segment between (0,2) and (0,1) are the same. Also, because all walls in
the maze run parallel to the x- and y- axes, either both x- or both y-
coordinates of each wall segment will be the same. We therefore suggest you
store your line-segments in a canonical format to make it easy to compare two
line segments: dictionary order, or all line segments fulfilling ((y1==y2
&& x1 < x2) || (x1==x2 && y1 < y2)) is a good one.
- Hint: Our implementation used a fixed-size circular queue (statically
declared array) and two main helper functions: Endpoints wallBetween(Pose
p1, Pose p2) and bool bumpedInMemory(Endpoints e). However, you do
not have to adopt this approach.
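The canonical-ordering and circular-queue ideas above can be sketched together as follows. This is one possible shape, not the course staff's implementation: Pose and Endpoints are local stand-ins for the structured types, BumpMemory, sawClear, and the capacity of 4 are hypothetical, and the geometry assumes cell (x, y) spans the corners (x, y) to (x+1, y+1).

```cpp
#include <cstddef>

// Local stand-ins; the real monitor would use the course's structured types.
struct Pose { int x; int y; };
struct Endpoints { int x1; int y1; int x2; int y2; };

// Put a segment into dictionary order so the same wall compares equal
// regardless of which endpoint was listed first.
Endpoints canonical(Endpoints e) {
  bool outOfOrder = (e.x1 > e.x2) || (e.x1 == e.x2 && e.y1 > e.y2);
  if (outOfOrder) return Endpoints{e.x2, e.y2, e.x1, e.y1};
  return e;
}

bool sameSegment(Endpoints a, Endpoints b) {
  a = canonical(a);
  b = canonical(b);
  return a.x1 == b.x1 && a.y1 == b.y1 && a.x2 == b.x2 && a.y2 == b.y2;
}

// Edge shared by two adjacent cells. ASSUMPTION: cell (x, y) spans corners
// (x, y) to (x+1, y+1); the caller guarantees the cells are adjacent.
Endpoints wallBetween(Pose p1, Pose p2) {
  if (p1.x != p2.x) {  // horizontal move crosses a vertical edge
    int wx = (p1.x > p2.x) ? p1.x : p2.x;
    return Endpoints{wx, p1.y, wx, p1.y + 1};
  }
  int wy = (p1.y > p2.y) ? p1.y : p2.y;  // vertical move, horizontal edge
  return Endpoints{p1.x, wy, p1.x + 1, wy};
}

struct BumpRecord { Endpoints wall; bool bumped; };

// Fixed-size circular memory of recent bumped() results.
struct BumpMemory {
  static const std::size_t kCapacity = 4;  // hypothetical bound
  BumpRecord slots[kCapacity];
  std::size_t used = 0;
  std::size_t next = 0;

  void remember(Endpoints wall, bool bumped) {
    slots[next] = BumpRecord{canonical(wall), bumped};
    next = (next + 1) % kCapacity;
    if (used < kCapacity) ++used;
  }

  // Call when the turtle enters a new square to drop stale results.
  void clear() { used = 0; next = 0; }

  // True only if the newest stored result for e was "not bumped".
  bool sawClear(Endpoints e) const {
    for (std::size_t i = 0; i < used; ++i) {
      std::size_t idx = (next + kCapacity - 1 - i) % kCapacity;
      if (sameSegment(slots[idx].wall, e)) return !slots[idx].bumped;
    }
    return false;  // never checked: crossing would be a violation
  }
};
```

On each pose change the monitor would compute wallBetween(oldPose, newPose), report a violation if sawClear returns false for that edge, and then call clear().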
- ANDREWID_face_monitor.cpp: for any call to bumped(x1,y1,x2,y2), the
turtle shall be facing the wall segment with endpoints (x1,y1) and (x2,y2).
- As a hint, note that this means there is only one valid wall segment that
the turtle can call bumped(...) for any given position and orientation. You
might consider computing this valid wall segment and comparing it to the
arguments of bumped(...).
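Computing the one valid segment might look like the sketch below. Dir and Endpoints are local stand-ins, and the geometry is an assumption: cell (x, y) is taken to span corners (x, y) to (x+1, y+1) with y growing to the south, so swap the North and South cases if your maze uses the opposite convention.

```cpp
// Local stand-ins for illustration only.
enum class Dir { North, East, South, West };
struct Endpoints { int x1; int y1; int x2; int y2; };

// The single wall segment directly in front of a turtle in cell (x, y).
// ASSUMPTIONS: cell (x, y) spans corners (x, y) to (x+1, y+1), and y grows
// to the south.
Endpoints facedWall(int x, int y, Dir facing) {
  switch (facing) {
    case Dir::North: return Endpoints{x, y, x + 1, y};
    case Dir::South: return Endpoints{x, y + 1, x + 1, y + 1};
    case Dir::West:  return Endpoints{x, y, x, y + 1};
    case Dir::East:  return Endpoints{x + 1, y, x + 1, y + 1};
  }
  return Endpoints{x, y, x, y};  // not reached for valid Dir values
}

// Compare the bumped() arguments to the expected segment, ignoring the
// order in which the two endpoints were given.
bool bumpArgsMatch(Endpoints asked, Endpoints expected) {
  bool same = asked.x1 == expected.x1 && asked.y1 == expected.y1 &&
              asked.x2 == expected.x2 && asked.y2 == expected.y2;
  bool flipped = asked.x1 == expected.x2 && asked.y1 == expected.y2 &&
                 asked.x2 == expected.x1 && asked.y2 == expected.y1;
  return same || flipped;
}
```

The monitor would report a violation whenever bumpArgsMatch returns false for the arguments of a bumped(...) call and the turtle's latest known position and orientation.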
- ANDREWID_solved_monitor.cpp: If the turtle has solved the maze
(atEnd(x,y)==true), it shall not move or turn
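A minimal sketch of the solved check is a latch that is set the first time atEnd returns true, after which any change in position or orientation counts as a violation. State and SolvedMonitor are hypothetical names, and orientation is stored as a plain int only to keep the example self-contained.

```cpp
// Local stand-in for the turtle state the monitor observes.
struct State { int x; int y; int orientation; };

struct SolvedMonitor {
  bool solved = false;    // latched once atEnd has returned true
  bool haveLast = false;  // false until the first pose is seen
  State last{0, 0, 0};

  // Call for every observed atEnd result.
  void onAtEnd(bool atEnd) {
    if (atEnd) solved = true;
  }

  // Call for every pose update. Returns false if this update violates the
  // invariant (the state changed after the maze was solved).
  bool onPose(State s) {
    bool changed = haveLast && (s.x != last.x || s.y != last.y ||
                                s.orientation != last.orientation);
    last = s;
    haveLast = true;
    return !(solved && changed);
  }
};
```

If the final pose message can arrive after the atEnd message, this simple latch would flag it, so a real monitor may need to tolerate that ordering.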
- ANDREWID_atend_monitor.cpp: Turtle shall only call atEnd(x,y) if it
is at a position x,y. (No remotely fishing around the maze to figure out where
the goal is for path planning.)
- We have seen occasional problems with the ROS Warning stream being
corrupted when terminating the simulation, probably due to the output stream
not being flushed properly when a Control-C is encountered. A suggested
workaround is that this monitor outputs a "Successful atEnd" using
ROS_WARN when the turtle has reached the end of its journey (atEnd(x,y) is
true) to flush out any warnings still buffered in the output queue. This way
you know that when you see a few "Successful atEnd" messages in a row
that any pending warnings issued before that have also been output
successfully. It might take several such messages to flush the output queue, so
this monitor can output the success message once every tick.
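The atEnd check and the flush workaround can be sketched as one small decision function. Pose is a local stand-in, checkAtEnd is a hypothetical name, and the returned string stands in for what the real monitor would pass to ROS_WARN; the violation wording is illustrative.

```cpp
#include <string>

// Local stand-in for the turtle position the monitor last observed.
struct Pose { int x; int y; };

// Returns the message to log for one atEnd(x, y) call, or an empty string
// if nothing needs to be logged. In the real monitor both non-empty cases
// would go through ROS_WARN.
std::string checkAtEnd(Pose current, int x, int y, bool atEnd) {
  if (x != current.x || y != current.y) {
    return "VIOLATION: atEnd called for a cell the turtle is not in";
  }
  if (atEnd) {
    return "Successful atEnd";  // also helps flush buffered warnings
  }
  return "";
}
```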
- Other Requirements and notes:
- "step_monitor" shall be included in your system. If you
need to update it for some reason you may, but the intent of the monitor shall
be unchanged.
- Your code shall use the ROS_WARN function to indicate an invariant
violation.
- You should use ROS_INFO to print out any relevant information for
your monitor to display.
- You can use ./run_642_monitors with any number of arguments to
run a subset of all the monitors at once and observe the violations in
VIOLATIONS.txt.
- You can also use the command rqt_console to view and sort ROS
logging messages.
- Take note of any invariant violations. Are they due to the turtle code or
due to bugs in your monitor(s)? If the latter, fix the bugs. In rare cases, ROS
timing issues might give you false positives if messages arrive out of order.
If you see these, you'll need to reconsider your design to tolerate this type
of issue.
- You should fix any invariant violations. (For the next project that becomes
"shall fix", but for now you don't need to be perfect.)
- The number of times the turtle visits a cell has been intentionally left
off the required monitors, but fewer is better.
- Monitors should send a "Monitor [NAME] is running at [time]"
entry to the violations file to help differentiate between a violations file
being empty because the monitor did not launch vs. a no-violations run.
- Monitors should send a "Monitor [NAME] is running at [time]"
entry to stderr / cerr to help catch issues with monitors failing to launch.
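The startup announcement might be produced along these lines. The function names are hypothetical and the raw time value is an arbitrary timestamp format; how the same line reaches the violations file depends on which log output your run script captures, so that part is left as a comment.

```cpp
#include <ctime>
#include <iostream>
#include <sstream>
#include <string>

// Builds the startup line. The numeric timestamp is an arbitrary choice;
// any format that identifies the run would do.
std::string startupMessage(const std::string& monitorName, std::time_t now) {
  std::ostringstream out;
  out << "Monitor " << monitorName << " is running at " << now;
  return out.str();
}

// Hypothetical call site, e.g. the first time the monitor's code runs.
void announceStartup(const std::string& monitorName) {
  std::string msg = startupMessage(monitorName, std::time(nullptr));
  std::cerr << msg << std::endl;  // std::endl flushes stderr immediately
  // In the real monitor, also send msg through the ROS logging call that
  // your run script captures into the violations file.
}
```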
- Try out the Project 11 maze files (see link in lab files
above). Notes:
- Mazes m1-m6 in this folder are the same m1-m6 that you have seen in
previous projects. m7-m10 are new.
- Put the mazes in the $ECE642RTLE_DIR directory.
- All of these mazes are your acceptance tests -- your turtle shall solve all
mazes m1-m10 to get the corresponding points for the final project, including
running without invariant violations. For this project you don't have to solve
all of them.
- Update the build package similar to the one for previous
projects, meeting the following requirements:
- ANDREWID_build.sh enables all previously required compiler flags
and runs all unit tests. Any compiler warnings and any unit test failures are
annunciated in a way that a grading TA can readily see.
- ANDREWID_run.sh k runs maze k with all run-time monitors enabled.
Any run-time monitor failure is annunciated in a way that a grading TA can
readily see.
- Writeup: Include the following in a writeup:
- Your name. (If the writeups are printed out, the file name info is lost, so
put your name and Andrew ID in the document text as well)
- Q1: Did you struggle with anything creating these invariants? If so,
briefly describe. (We strongly recommend you see a TA at office hours for any
difficult spots before handing in, or at least before working on the next
project.)
- Q2: Did you get any invariant violations due to turtle behavior? If so,
briefly describe.
- Q3: Did you get any invariant violations due to timing issues (that
includes timing issues you fixed before hand-in)? If so, briefly describe.
- Q4: Include a screen shot of solving maze m1 and the window showing
invariant violations generated with all invariants enabled. (If no violations,
show that this is the case. If there are more than 5 or 10 invariant violations
then just showing the last handful or two is fine according to what is reasonable
to show in a screen shot.)
- Q5: Summarize the status of your invariants as follows: For each invariant
monitor, state which mazes violate the monitor. (A table might be the easiest
way to do this with monitors on one axis and maze numbers on the other.)
- Q6: Summarize the status of solving each maze m1 through m10, minimally by
stating for each one if it is solved yes/no. Remember you'll want to solve all
the mazes for the final project.
- Q7: Any feedback about this project?
Handin checklist:
Hand in the following:
- A release candidate build file: ANDREWID_P11.tgz
- Your writeup, which should be in a single acrobat file called
p11_writeup_[FamilyName]_[GivenName]_[AndrewID].pdf
Other requirements:
- Use the usual project naming conventions with a prefix of "p11"
for items other than source code.
- The writeup shall be in a single acrobat file for ease of navigation.
- Fonts for all diagrams and text shall be rendered at no smaller than 10
point font. Smaller fonts will require a resubmission.
Zip the files and submit them as
P11_[Family name]_[First name]_[Andrew ID].zip.
The rubric for the project is found here.
Hints:
- The invariants defined in projects 10 & 11 completely replace the less
formal constraints you've seen in previous projects. The final project will
require that you satisfy the invariants written in projects 10 & 11 to implement the
constraints previously described.
- There is a sleep statement between launching monitors in the scripts we
give you. In some cases the sleep is too short to finish launching a monitor
(it depends on a lot of conditions such as how fast your virtual machine can
run on your physical machine and might vary from run to run). If you have
problems with monitors not running or situations that work with one monitor but
fail with multiple monitors try lengthening the sleep time in that launching
loop.
- Don't forget that there is a way in the code to speed up the turtle by
reducing a time-wasting loop if your system is running slower than you'd like
while you are doing testing.
- Mazes 9 and 10 might look the same, but maze 10 starts with a random
orientation.
- ROS is an inherently asynchronous system. This means that messages will
arrive in different orders, and it is possible for the same type of message to
arrive twice in a row without some other message arriving at all due to
changing message ordering. (The project calls these 'interrupts' from the
system point of view, but as implemented in ROS they are basically messages.)
Note that ROS is designed to operate in real time on physical systems, so even
though we are running simulations, the wall time (i.e., time perceived by you
while you watch the turtle run the maze) that passes when running the
simulation is relevant to ROS behavior.
- If you try to run your system faster than the jitter of message arrival
time in ROS you might see two, three, or more of one type of message before
another type of message arrives. That means that messages that arrive might be
from different turtle positions and/or different turtle orientations, causing
problems for your monitors.
- Slowing down how fast the maze runs (basically slowing the speed of the
turtle on the screen in real time) is usually the easiest way to resolve race
conditions caused by excessive message timing jitter. In general what you want
to do is run the turtle slow enough that all the messages from one move get
sent before the next move is attempted. ("move" is either a turn or
forward move.) In real life, if you need to run faster than this, you need to
deal with distributed time issues by, for example, using synchronization
or group membership protocols, in a way that gets pretty complicated and pretty
painful in a hurry.
- Hint: if you see monitor violations that change from run to run on the same
maze, that is very likely a race condition (asynchronous system defect) rather
than straightforward logic error.
- Grading criteria note: In case of monitor violations that might be due to
timing errors, most students are able to resolve these. However, for the
purposes of the course grade requirement of passing acceptance testing, the
following criteria will be used:
- If a maze does not produce monitor violations (and does not have false
negative violations involving unreported turtle misbehavior), the run counts as
violation-free.
- If a maze does produce monitor violations, it will be run a second time on
the same maze. If the monitor violations disappear on the second try, that also
counts as violation-free.
- If both first and second runs have violations, it will be run a third time
on the same maze.
- If the third run has no violations, that counts as violation-free.
- If any violation happens at the same cell in the maze, that will be
presumed to be a logic error unless the student can show otherwise.
- If all violations happen in different places in the maze (i.e., no three
runs have the same violation at the same square in the maze), the TA has
authority to determine it is a timing violation. The acceptance test may
be counted as violation-free at TA discretion. Points are likely to be taken
off for the project, but it may count as completed for the course letter
grade requirements.
- You might see out of order message arrival leading to issues with multiple
monitors. This is a result of the non-deterministic, unsynchronized nature of
ROS. We expect that your code works despite these circumstances. There are many
ways to mitigate this issue: sleeping a thread, sequence numbers, etc.
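As one illustration of the sequence-number idea, a monitor can remember the newest stamp it has processed and set aside anything older instead of checking it against the current state. OrderGuard is a hypothetical helper, and the sketch assumes every message carries a stamp or sequence number that increases at the sender.

```cpp
#include <cstdint>

// Tracks the newest stamp seen so far and classifies each arrival.
// ASSUMPTION: every message carries a stamp or sequence number that the
// sender assigns in increasing order.
struct OrderGuard {
  bool seenAny = false;
  std::uint64_t newest = 0;

  // Returns true if the message is in order and should be processed;
  // false if it is older than something already handled.
  bool accept(std::uint64_t stamp) {
    if (seenAny && stamp < newest) return false;
    newest = stamp;
    seenAny = true;
    return true;
  }
};
```

Whether a late message should be dropped, buffered, or checked against a saved earlier state is a design decision; dropping is the simplest but can hide a real violation.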
- Recall that from Project 10 we recommended you exit after a few cycles in
the goal square. If you use Ctrl-C you might see an error such as "boost:
mutex lock failed in pthread_mutex_lock: Invalid argument" As long as the
monitors execute properly we will not deduct points for this error.
- Repeated from Project 10:
- Bug fix you might need to make: We had a bug report that the monitor script
was reporting violations that didn't exist and seemed to be triggering on
ROS_INFO lines. After some investigation, it appears that the
run_642_monitors.sh script matches on regular expressions instead of
literal text, which causes it to match ROS_INFO output as well. This affects
some, but not all, students. If this affects you, the corrected code for
script lines 22 and 23 is:
- 22: grep -C 5 "\[ WARN\]" $m.output.tmp >> VIOLATIONS.txt
- 23: m_viol=`grep "\[ WARN\]" $m.output.tmp | wc -l`
- Some of you might see an exception that looks like this when you Ctrl-C the
terminal: [master] killing on exit terminate called after throwing an instance
of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error>
>' what(): boost: mutex lock failed in pthread_mutex_lock: Invalid argument
- In most cases this issue can be avoided by having the system do a clean
self-shutdown instead of a Ctrl-C at the end of the maze run.
- Note that while this isn't perfect, we will NOT be taking any points
off if this happens during a demo. So, if you are properly executing the
monitors, you will earn full points. Ctrl-C is a pretty violent way to shut
down a piece of software, and apparently ROS buffer management isn't very
robust to that. If you want to dig deeper come to office hours and we can
suggest some ways to do better, but this is NOT required for the project.
Changelog:
- 6/28/2021: issued
- 10/17/2021: updated hints
- 11/21/2021: deleted reference to unused documentation hand-in; changed
hand-in zip file name for consistency.
- 10/20/2023: updated with font size requirement