Reports

Rip’s Status Report for 4-26

This week I mainly worked on the final presentation. Since I will be presenting, I wanted to take the lead on the presentation’s formatting and content. I decided to use the assertion-evidence presentation format, because I’m a little bored with the “Solution Approach” slide titles I keep seeing in every presentation.

I also helped Niko track down some bugs in the interaction layer that came to light after integration and deployment. I control the emulator and whether or not it’s running, so anytime he’s debugging, I have to be involved for the most part. I think it’s a good thing to have two people involved in the debugging. It’s certainly helped us figure things out a little quicker.

Rip’s Status Report for 4-19

This week I finished the first version of the hardware emulator. I’ve been working pretty much all day for the past couple of days to get this all working together. The first version uses the Django admin page as a frontend and the Django REST framework as an interface between the hardware library in Niko’s interaction layer and the emulator webapp. I’m really excited to demo this, because it’s able to handle all different kinds of devices, with multiple pieces of hardware (sensors/actuators) simulated at once.

We also got the full system integrated from end to end. There are still some minor errors, but nothing that overshadows the system’s actual purpose. We can go into Richard’s webapp, create a new device, bring that up in Niko’s interaction layer, and then see details of that device in my emulator. I’m very excited to see how the demo to our advisors goes.

Rip’s Status Report for 4-12

This week I haven’t done much on this project. I’ve kept up with what Richard and Niko are doing, but that’s about it. I know that having something ready for the demo next week is going to be tough, but I’ve had a lot of other things due and have been lacking motivation. I’ll work to have something working for next week.

Rip’s Status Report for 4-5

This week I started the hardware emulator. I decided to use Django, a Python web framework, because Niko is building the interaction layer in Python and integration should be simpler if we use the same language.

For the emulator, I plan on first getting a basic webapp with data models up and running, and then working on the library that Niko will use in his interaction layer nodes. Once these two are talking, I want to implement the full communication API. I found a REST framework that works with Django, and I think I might use it. I’m debating whether to build a real frontend or to use the admin page that Django provides as a means to view the data. For now I’m going to use the admin page.

Richard’s Status Report for 4-26

This week, we did our demo for the professor and TA. It went pretty well, and while we were happy that it wasn’t a disaster, we were also aware of what we had left to do. We made a list of the things we still had to complete and got to work.

Since we weren’t planning on changing the webapp from what we showcased in the demo, I didn’t work on it. Instead, I helped Niko time the interaction layer, so that we could have solid numbers for the presentation and could back up our answers to the questions our peers will probably ask.

We discussed how to do this for an hour or two and decided that perhaps the best way would be to ensure the clocks on the nodes were synced, and log everything that happened. I’ll use timing an interaction as an example. We logged the start of the interaction, which is a value on a sensor changing. We also logged the end of the interaction, which is a value on a target device changing. Subtracting the log times of these events gives us a latency figure, which we intend to present tomorrow.
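To make the method concrete, here is a sketch of the arithmetic, assuming a hypothetical log format with an ISO-8601 timestamp at the front of each line (our real log layout may differ):

```python
from datetime import datetime

def parse_ts(line):
    """Extract the ISO-8601 timestamp from the front of a log line."""
    return datetime.fromisoformat(line.split(" ", 1)[0])

def interaction_latency(start_line, end_line):
    """Return the end-minus-start latency in milliseconds."""
    delta = parse_ts(end_line) - parse_ts(start_line)
    return delta.total_seconds() * 1000.0

# start of the interaction: a value on a sensor changing
start = "2020-04-25T12:00:00.100 sensor motion value changed to 6"
# end of the interaction: a value on the target device changing
end = "2020-04-25T12:00:00.350 device light value changed to 1"
print(interaction_latency(start, end))  # 250.0 (milliseconds)
```

Synced clocks matter here: the subtraction is only meaningful because both log lines come from machines whose clocks agree.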

There are some caveats to this method: this latency is much lower on the virtualized AWS network than it would be on a home Wi-Fi network. The network on AWS is state of the art, likely around 10 Gb/s, while even really good Wi-Fi networks reach almost 1 Gb/s. The AWS machines themselves are also quite fast, while the devices our system would have run on, Raspberry Pis, are significantly slower.

Even though AWS technology is much better and faster than a home Wi-Fi network would be, by our estimates our system would still fall well under the goal threshold we set at our design presentation.

Niko’s Status Report for 4-19

This past week, I put a lot of work into wrapping up the functionality of the interaction layer and integrating all three parts together.

The way we broke the project up, node infrastructure (aws hosting) fell under the interaction layer umbrella, since the ‘node’ process is the top level process running on the node and managing the other processes. As such, it fell onto me to take care of running the frontend webapp and making sure that the data read / write pipeline worked from the frontend all the way through the interaction layer and down to the hardware emulator for each node.

Below, I’ve broken up my work this past week into different categories, and I elaborate on what I worked on in each category.

  •  Infrastructure
    • setup script: I changed the setup scripts for setting up a new node to create a shared Python virtual environment for the project, rather than a different one for each of the three parts. This made it much easier when manipulating different parts of the system, since I no longer had to worry about which virtual environment was being used.
    • AWS
      • Set up all the nodes: I created 5 unique nodes, each with a different config and hardware description. Since initial device commissioning is out of the scope of the project, when the system “starts” for any demo, it has to already be initialized to a steady state. This means I had to hardcode each device to work properly.
      • Elastic IPs: I figured out how to create and assign a static IP address (called an elastic IP by AWS) to each node, so that I could easily hard code the IP address within the code instead of dynamically gathering it each time Amazon decides to change it.
      • Static IP for webapp: I started looking into defining an extra elastic IP address that is reserved for the webapp instead of for a specific node. The way this would work is that all nodes have a unique static IP address, but the master node has an additional public IP address pointing to it. If and when the master dies and another node promotes to master, the new master will dynamically claim that second static IP from AWS. The result of this would be that the same IP address would always point to the webapp, even if the node hosting the webapp changes. I hit a few issues, such as restarting the network stack on an Ubuntu 18.04 virtual machine, and couldn’t get this working by the final demo.
  • Interaction layer
    • CONFIG and hardware_description parser: I made the definition for the CONFIG file clearer and more rigid, and added functionality to easily parse it and get a dictionary. I also created a helper function to parse a “hardware_description.json” file, which describes the hardware located on the node. This was something required by Rip’s hardware emulator that I hadn’t expected, and had to be done during integration.
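As a rough illustration of the parser, a helper along these lines could load the file and index the hardware by id. The file layout shown here is an assumption, not the project’s exact schema:

```python
import json

def parse_hardware_description(path):
    """Load a node's hardware description and index the hardware by id."""
    with open(path) as f:
        desc = json.load(f)
    # one entry per simulated piece of hardware (sensor or actuator)
    return {hw["id"]: hw for hw in desc["hardware"]}
```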
    • Database schema updates: As discussed in last week’s update, Richard and I discussed certain updates that had to be made to the database schema. I updated the database (and database management helper functions) as per our discussion.
    • help_lib for Richard’s frontend webapp: I added the following functions to a file called help_lib.py, with the intent that these would facilitate communication between the webapp’s backend and the interaction layer’s distributed data storage.
      • addInteraction: takes a new interaction and sends it to all nodes in the system to write to their database
      • deleteInteraction: same as addInteraction; deletes an interaction on all nodes in the system
      • updateInteraction: same as above, but updates an interaction in place instead of deleting it. The way the frontend is currently set up, there is no way for a user to modify an existing interaction, so this function is unused.
      • updateNode: update the display name and description of a node. This again makes sure to sync across all nodes.
      • getNodeStatus: takes a list of node serial numbers and, for each node, gets that node’s current value from the node through the mqtt broker. If the node is dead, it will short-circuit and return that the node is dead instead of trying to ping it for a status update.
      • setStatus: set the value for a particular node. E.g. turn a light on, set off an alarm, etc. This will communicate to that node through the broker.
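To illustrate the short-circuit behavior in getNodeStatus, here is a hedged sketch with the liveness check and broker query injected as functions; the real helpers consult the local database and the mqtt broker, and the names here are illustrative:

```python
def get_node_status(serials, is_alive, query_broker):
    """For each serial, return its current value, or 'dead' without
    pinging the node if it is already known to be down."""
    statuses = {}
    for serial in serials:
        if not is_alive(serial):
            # short circuit: don't ask a dead node for a status update
            statuses[serial] = "dead"
        else:
            statuses[serial] = query_broker(serial)
    return statuses
```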
    • db_lib for sql interactions (read / write etc): added a lot of functionality to the db_lib file for managing the database, so that code outside the db_lib file has to do minimal work to read from / write to the database, and in particular, has to do nothing relating to sql.
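A minimal sketch of the db_lib idea, using sqlite3 with an illustrative one-table schema (not the project’s actual schema): callers read and write through small helpers and never touch sql themselves.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and make sure the (illustrative) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS devices (serial TEXT PRIMARY KEY, value INTEGER)"
    )
    return conn

def set_value(conn, serial, value):
    """Write a device value; callers never see the sql."""
    conn.execute(
        "INSERT OR REPLACE INTO devices (serial, value) VALUES (?, ?)",
        (serial, value),
    )
    conn.commit()

def get_value(conn, serial):
    """Read a device value, or None if the device is unknown."""
    row = conn.execute(
        "SELECT value FROM devices WHERE serial = ?", (serial,)
    ).fetchone()
    return row[0] if row else None
```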
    • Proper logging setup: updated the logging in the system to use the python logging module and create logs useful for debugging and system monitoring.
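A minimal sketch of such a setup with the stdlib logging module; the format and handlers here are illustrative, not the project’s exact configuration:

```python
import logging

def setup_logging(name="node", level=logging.DEBUG):
    """Configure root logging once and hand back a named logger."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return logging.getLogger(name)

log = setup_logging()
log.info("node process starting")
```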
    • MqttSocket: created a new class called MqttSocket (i.e. socket type functionality over mqtt). Currently only used by getNodeStatus, this class is meant to describe and facilitate a handshake type interaction between two nodes. We decided to do all node communication through the broker instead of directly from node to node in order to facilitate interactions. However, sometimes one node has to specifically request a piece of information from another node, which follows a request / response pattern. The async and inherently separated publish / subscribe nature of MQTT makes it fairly convoluted to follow this request / response pattern, so I packaged the convoluted logic into a neat helper class that makes it very easy to do the request / response cycle. Here is an example of how it’s used:

      sock = MqttSocket()

      # topic to listen for a response
      sock.setListen('node/response')

      # blocking function: sends 'data' to 'node/request' and returns the
      # response sent on 'node/response'
      response = sock.getResponse('node/request', data)

      sock.cleanup()
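A toy implementation of a wrapper like this might look as follows. The client object is injected and only assumed to have subscribe and publish methods, so this is a sketch of the pattern rather than the project’s actual MqttSocket (which builds on paho-mqtt):

```python
import threading

class MqttSocket:
    """Blocking request/response cycle on top of async publish/subscribe."""

    def __init__(self, client):
        self.client = client
        self.response = None
        self.got_response = threading.Event()

    def setListen(self, topic):
        # subscribe before sending the request so the response can't slip past
        self.client.subscribe(topic, self._on_message)

    def _on_message(self, payload):
        self.response = payload
        self.got_response.set()

    def getResponse(self, topic, data, timeout=5.0):
        """Publish 'data' to 'topic' and block until the response arrives."""
        self.got_response.clear()
        self.client.publish(topic, data)
        if not self.got_response.wait(timeout):
            raise TimeoutError("no response on the listen topic")
        return self.response

    def cleanup(self):
        self.client = None
```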

    • Master failover: master failover hinges on the fact that nodes know whether other nodes in the system (in particular the current master) are alive. Initially, I planned to do this using regular heartbeats. However, I realized that the mqtt protocol has built in support for exactly this behavior, namely topic wills and on_disconnect callbacks.
      • topic wills: any client that connects to a broker can define a ‘will’ message to be sent on any number of topics. If the node disconnects from the broker without properly sending a DISCONNECT packet, the broker will assume it died and send its will message to all nodes subscribed to that topic. I used this to implement heartbeats. All nodes listen to the “heartbeats” topic, and if a node dies, they will all be notified of its death and update their local database accordingly. If the master dies, this notification is done using the on_disconnect callback.
      • on_disconnect: if the underlying socket connection between the paho mqtt library (the library I’m using for client-side mqtt behavior between the nodes and the broker) and the broker is broken, it will invoke an optional on_disconnect callback. This callback is invoked any time the node client disconnects from the broker. However, since the nodes never intentionally disconnect from the broker, this will only happen if the broker has died. This way, nodes are notified of a master’s death and can begin the failover process.
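To make the will mechanism concrete, here is a toy model of the broker-side behavior in pure Python; it is not paho or a real mqtt broker, just an illustration of how a stored will gets published on an abnormal disconnect:

```python
class ToyBroker:
    """Minimal model of an mqtt broker's 'will' behavior."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks
        self.wills = {}        # client id -> (topic, message)

    def connect(self, client_id, will_topic, will_message):
        # like an mqtt will: stored by the broker, sent only on abnormal exit
        self.wills[client_id] = (will_topic, will_message)

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for cb in self.subscribers.get(topic, []):
            cb(message)

    def drop(self, client_id):
        # socket broke without a proper DISCONNECT packet: fire the will
        topic, message = self.wills.pop(client_id)
        self.publish(topic, message)
```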
    • By using topic wills and on_disconnect, I avoid sending frequent heartbeat publishes from each node, which would cost unnecessary bandwidth. If a node receives notice that the master has gone down, it will select a new master. The next master is the node with the lowest serial number among the nodes that are currently alive. If that happens to be the current node, it will start the master process; otherwise, it will try to connect to the broker on the new master node. Currently, all of the above works except for the final reconnect step. For some reason, the client mqtt library is having trouble throwing away the previous connection and connecting to the new broker. As such the failover happens, but nodes can’t communicate afterwards :(. I will fix this before the “public demo”.
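The selection rule itself is simple enough to sketch directly (names here are illustrative):

```python
def next_master(serials, is_alive):
    """Pick the live node with the lowest serial number, or None if
    no nodes are alive."""
    live = [s for s in serials if is_alive(s)]
    return min(live) if live else None
```

Because every node applies the same deterministic rule to the same liveness information, they all agree on the new master without any extra coordination.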
    • populate db: while hardcoding is generally frowned upon, since our system has a defined steady state for demos, I needed an easy way to get to that state. I made a script that populates the database with the hardcoded node values as they exist in the defined steady state, so that it’s very easy to reset a database’s state.
    • Path expansion for integration: this was a small bug that I found interesting enough to include here. On the node, the top level directory contains three repositories: hardware-emulator, interaction-layer, and ecp-webapp. The helper functions I wrote for Richard’s frontend live in interaction-layer/help_lib.py, and those functions are used in ecp-webapp/flask-backend/app.py. More importantly, those helper functions used relative paths to access the config and database files, which creates problems when the webapp calls them from a different working directory. I changed this behavior so that the helper lib expands relative paths to absolute paths, allowing the functions to be called from anywhere in the system.
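The fix can be sketched as a small helper. Here the anchor directory is passed in explicitly so the sketch is self-contained; the real help_lib would derive it from its own file location (e.g. os.path.dirname(os.path.abspath(__file__))):

```python
import os

def expand(path, base):
    """Return an absolute path, anchoring relative paths at 'base'
    instead of at the caller's current working directory."""
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(base, path))
```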
    • interactions: interaction definitions were ironed out to work with the way the hardware emulator expects values to be, and how the frontend expects them to be. Since my layer is in the middle, I had to be careful about parsing, storing, and acting upon interactions.
  • Problems to be fixed this upcoming week:
    • broker failover: as discussed above in the master failover section, after a master failover, nodes fail to reconnect to the new broker. This needs to be fixed.
    • conflicting interactions: the frontend allows you to define conflicting interactions, which would arbitrarily thrash the system. For example, I could define the following 2 interactions:

      if motion sensor > 5, turn light on
      if motion sensor > 5, turn light off

      Now, if the motion sensor is triggered, the light will begin rapidly flickering, which is annoying and almost certainly unintended. I think it would be cool if the frontend could identify such conflicting interactions, but that may end up being far more complicated than I suspect, and detecting loops of interactions might be too hard.
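A first cut at detecting the simplest kind of conflict, two interactions with the same condition setting the same target to different values, might look like this (the tuple representation of an interaction is an assumption for the sketch):

```python
from collections import defaultdict

def find_conflicts(interactions):
    """interactions: iterable of (condition, target, value) tuples.
    Return the (condition, target) pairs driven to more than one value."""
    values_for = defaultdict(set)
    for condition, target, value in interactions:
        values_for[(condition, target)].add(value)
    return [key for key, vals in values_for.items() if len(vals) > 1]
```

This only catches direct contradictions; chains of interactions that form a loop would need something closer to cycle detection on a dependency graph, which is the harder part alluded to above.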

    • setting sensors on frontend: the frontend currently allows you to set a value for any device, such as setting “light = 1” (turn on the light). However, if you try to set a value for a sensor node, the backend throws an exception, crashing the interaction layer. This behavior needs to be prohibited.
    • measure latency: in our design docs, we defined latency requirements. While the logging facilities are in place to measure this latency, we need to actually do the measurements for the final report.

Richard’s Status Report for 4-19

This week, we focused heavily on integration. Although Niko and I were already pretty integrated, we realized that there was actually a lot left to do. There were around six or seven places where my code sent Niko’s library a string when he expected an integer, or something along those lines. We spent a few hours going through the whole system, debugging every single error. This was much more time consuming than I expected.

Another thing that took a while was adding functionality. Niko’s library included code to delete and add interactions, which I didn’t have in my webapp. To ensure that Niko hadn’t wasted his time writing this code, I added these features to the web application so that his code is actually called.

Because I didn’t have Niko’s code running on my computer, I actually had to write test-dummy implementations of Niko’s library to test my code. This was very important and I think that it saved us hours on integration. It ensured that all the functions were getting passed exactly what they were expecting, and there were as few surprises as possible.

I think overall, we did a pretty good job with integration. Even though I communicated with Niko very clearly, we still ran into multiple misunderstandings and miscommunications. It’s a lesson in how important communication actually is in group projects.

Niko’s Status Report for 4-12

This past week I worked more on integration between the webapp and interaction layer. Richard and I had a discussion and realized we had different ideas of how the database would store the information we’d agreed upon. We had a long discussion and settled on a modified subset of the database schema. I then worked on modifying the schema to fit our discussion.

I also worked on some helper functions to facilitate interaction with both the database and the mqtt broker. Besides making my own code cleaner, it begins to provide an interface for Richard’s webapp’s flask backend to obtain node data and present it to the frontend for the user.

For this upcoming week, I want to finish the API that I worked on this past week and finish integrating it with Richard’s layer. By the end of the week, the frontend should be presenting no hardcoded data, but should be pulling information directly from the nodes.

I also would like to reopen the API discussion with Rip, and outline exactly how his layer and mine will interact.

Team Status Report for 4-5

Our team updates for the past week are largely in two parts. You can find more details in each person’s individual status report; here we will largely discuss integration efforts and issues.

  • Webapp and interaction layer integration:
    • This week we began integration between these two layers. Currently, the interaction layer can start and run the webapp, and make sure it comes back up even when it dies.
    • Setup scripts have been written for both layers (individually and together), facilitating future development.
  • The virtual environment problem:
    • When Richard developed the webapp, he used virtualenv to create the Python virtual environment. However, Niko used venv to create his. When creating the setup scripts, there was an issue with virtualenv that caused the environment to not activate properly and the webapp to fail during initialization. When testing manually, however, Niko would use venv, and the webapp would work. After a lot of collaborative debugging, Niko and Richard figured out that venv works better on the Ubuntu VMs, and they updated the setup scripts accordingly.
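For reference, the venv-based flow we settled on looks roughly like this (paths are illustrative; the real setup scripts use the project’s own directories):

```shell
# create the environment with the stdlib venv module (not virtualenv)
python3 -m venv --without-pip /tmp/ecp-venv

# activate it, confirm which interpreter is active, then deactivate
. /tmp/ecp-venv/bin/activate
python3 -c 'import sys; print(sys.prefix)'
deactivate
```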

Niko’s Status Report for 4-5

This week I did a lot of work on the interaction layer in preparation for the demo on Monday. Here are the areas I worked on:

  • Setup / install scripts:
    • While user-friendly device commissioning is not in the scope of our project, the fact remains that we still have to “commission” devices during development. This involves creating an AWS VM, cloning the appropriate repos, installing dependencies, setting up Python virtual environments, initializing device configs, initializing the database, etc. Since this is not something anybody wants to do more than once or twice, I created setup scripts for both the frontend webapp and the interaction layer. I also made a top level script that will clone all the repos and run all their setup scripts. That way, after starting a fresh VM, all you need to do is scp the top level script, run it, and wait for it to finish.
  • Integration:
    • I spent a lot of time this week working to integrate the interaction and frontend webapp layers. Currently, the interaction layer is able to start and run the frontend, and I have written the setup scripts for both layers. For next week, I still need to tie the webapp’s backend into the interaction layer so that it no longer has hardcoded dummy data.
  • Master process:
    • I initially wrote the master process in python, since that is what the rest of the interaction layer is written in. However, I quickly realized that all I was doing was running shell commands from python, such as checking if a process was up and then starting it if not. It doesn’t really make sense to be running only bash commands from within Python, and Python wasn’t making my life easier. I decided it would be better to implement the master in bash as a script. This greatly simplified its logic, and made it a more elegant program. The master is in charge of starting and keeping alive the frontend webapp, node process, and mqtt broker. Once the interaction layer is integrated with the hardware layer, it will also be in charge of starting and keeping alive the hardware simulation webapp.
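A sketch of the keep-alive idea in bash; the process names and commands below are placeholders, not the real master’s services:

```shell
# restart a service if no running process matches its command-line pattern
keep_alive() {
  pattern="$1"; shift
  if ! pgrep -f "$pattern" > /dev/null; then
    "$@" &    # (re)start the service in the background
  fi
}

# one pass of the watchdog; the real master would run checks like this
# in a loop for the webapp, node process, and mqtt broker
keep_alive "sleep 300" sleep 300
```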
  • Node interactions:
    • I got the nodes to be able to subscribe and publish to each other, and react to data from other nodes. While the actual definition of an “interaction” needs to be ironed out a bit between me and Richard (front end webapp), the infrastructure is now in place.