LibreNMS giving back

It’s been a while since the last blog post, so this one is long overdue!

Since the LibreNMS summit in 2015, we’ve had a little bit of the donation money left over, which day to day we have no use or need for. We’ve considered buying some kit to develop / test against but most of the time this isn’t necessary. So aside from keeping it in the bank, what do we do with it?

We’re not talking about a huge amount of cash (about $260 in total), but it makes sense to put it to use. The core development team had a chat, and we decided the best use for it would be to contribute back to other Open Source projects that LibreNMS relies on. So that’s what we did. We’ve only done it once so far, but we hope to spread the remaining amount to at least one other project.

So where has the money gone? To RRDtool, one of the biggest reasons why LibreNMS works. The work that Tobias has done and continues to do on RRDtool is nothing short of amazing. We’ve all banged our heads against a wall with it on the odd occasion, but the reality is that quite a few network monitoring platforms wouldn’t exist today if it wasn’t for RRDtool. You can see a list of people who’ve donated to RRDtool here: https://tobi.oetiker.ch/webtools/appreciators.txt

We wish we could do more, but we don’t generate revenue from the work we all put into LibreNMS so we just don’t have the resources available to make a bigger difference.

If you have any suggestions on where else we can donate the remainder of the money ($130), please drop a comment here or email team@librenms.org.

Summit report

Once again I must apologise that it has been too long since my last post!  It has been a busy few weeks at work since the summit, and also a busy few weeks for LibreNMS.

I’ve recently updated our web page to specifically thank those who contributed to the summit.  We are humbled by and grateful for your confidence in us.  Five people in total attended the summit:

  • Neil Lathwood, UK
  • Daniel Preußker, Germany
  • Søren Rosiak, Denmark
  • Mike Rostermund, Denmark
  • Paul Gear, Australia

All of the above are now registered committers/reviewers in the LibreNMS organisation on Github.

Our day started with personal introductions – many of us had not met in person until the weekend of the summit.  After a brief talk about some administrative issues, the morning was spent working through our future development priorities.  We then enjoyed lunch together, followed by an afternoon of reviewing issues and working on fixes.


Due to the provision of a meeting room by Canonical, our costs for running the summit were lower than expected, and we were able to pay for the accommodation and airfares not only for Daniel, but also Søren and Mike.  Here’s a breakdown of how we used the funds raised through the Indiegogo campaign:

  • Daniel Preußker – travel from Germany + accommodation in London – €492
  • Søren Rosiak – travel from Denmark + accommodation in London (for Søren and Mike) – €465
  • Paul Gear – lunch for the summit participants – €119
  • Neil Lathwood – remaining funds from summit for use in legal defence – €376

(The above amounts are not exact due to rounding and currency conversions, and exclude the fees from Indiegogo.  However, the above represents 100% of the usable funds from the summit campaign.)

The big items on our agenda on the technical side were:

  • Installation – The current setup works reasonably well, but we still have a steady stream of people coming through the IRC channel who manage to mess up permissions; we’d like to make this easier by creating a standard installer process that works out which distribution it’s running on and makes all the right adjustments.  https://github.com/joubertredrat had a first attempt at this, which we may use.  (There may be some issues dealing with things like SELinux, but Daniel feels this is solvable with a relatively small SELinux configuration.)
  • Documentation – There is plenty of improvement that could be made around installation, interacting with git, coding standards, and the FAQ.  We will work on this as time permits.  If you aren’t a coder, but would like to make a contribution to the project, this would be a great way to get involved!
  • Alerting – Daniel is working on the next version of alerting in which he hopes to incorporate both functionality and UI improvements.  The ability to share rules with other LibreNMS users was discussed, but no concrete plans have been made yet.  Your feedback would be appreciated: What works well? What doesn’t? What would you like to see?
  • Graphing engine – We are tracking several possibilities with respect to updated graphing engines that would be useful in overcoming some of our current limitations.  There are no immediate plans to migrate, but we will continue to track the projects which look most viable.
  • Poller – There are a number of common requests we see from time to time:
    1. ping-only polling
    2. SNMP-only polling [this was recently added]
    3. polling at custom intervals
    4. other polling methods such as netflow, HTTP, NTP

    All of the above are achievable, but will require non-trivial changes to the existing codebase.
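
    As a rough illustration of requests 1 to 3, here is a sketch (in Python rather than LibreNMS’s PHP) of how a per-device polling method and interval could be represented.  Every name in it is hypothetical, and it does not reflect how the existing poller is structured; it only shows the kind of per-device state such changes would need.

```python
from dataclasses import dataclass

DEFAULT_INTERVAL = 300  # seconds; assumes a five-minute default poll cycle


@dataclass
class Device:
    hostname: str
    methods: tuple = ("ping", "snmp")   # ("ping",) would mean ping-only polling
    interval: int = DEFAULT_INTERVAL    # custom interval, in seconds
    last_polled: float = 0.0            # Unix timestamp of the last poll


def due_devices(devices, now):
    """Return the devices whose own interval has elapsed since their last poll."""
    return [d for d in devices if now - d.last_polled >= d.interval]


devices = [
    Device("core-sw1", interval=60, last_polled=1000.0),
    Device("printer", methods=("ping",), last_polled=1000.0),
]
print([d.hostname for d in due_devices(devices, now=1090.0)])
```

    Ninety seconds after both devices were last polled, only the device with the 60-second interval is due again.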

If you were a contributor to the campaign and are due some priority attention on an issue (I’ve had one reminder about this already), please get in touch with us via email: team at librenms dot org.  If you haven’t yet submitted an issue on github, please do so before emailing us (or just let us know in the email if there are reasons why you can’t do that).

Blown away by community

It has been quite a while since my last post, so I thought I’d break the radio silence and take the time for a look back at what we’ve been up to. A lot of water has gone under the bridge since my first commit to LibreNMS on 28 October 2013, and the time has flown!

My biggest fear when I started the project was that my lack of time would mean the codebase would languish, and I’d be left with something that worked, but didn’t have a viable future.  However, LibreNMS was born because I felt the need for a network monitoring system whose community:

The community which has gathered over the last 20 months or so has amazed me in fulfilling this vision.  It’s not an understatement to say that all of my original goals for LibreNMS have been met and exceeded already.

Particular thanks must go to the other two core LibreNMS team members for their efforts:

  • Neil Lathwood has been far and away the most prolific coder on the project and has relentlessly pursued fixing bugs and adding features to benefit our user base.
  • Daniel Preussker wrote our alerting system and came on board as a code reviewer.  His emphasis on security and efforts to improve code quality have been a huge bonus to the team.

The last two months in particular have been a whirlwind, with the number of participants in the IRC channel, mailing list, and issue system showing a dramatic increase (possibly due to a few mailing list discussions and the odd reddit thread).  In IRC alone we went from around 10 regular channel participants to hitting 80 for the first time last week.

Our number of contributors has been growing rapidly in recent weeks, with various contributors joining to provide code to support their preferred devices.  I attribute a large portion of our success here to using git as our SCM and github as our method for collaboration – they make it easy to integrate and collaborate at the code level.  Many of our contributors have never even worked on a DVCS before!

It has been a privilege and a pleasure to see LibreNMS develop into a testimony to the power of community to make Free Software awesome.

In addition to the API which was integrated near the end of 2014 (see earlier blog posts), we’ve made some great progress on other features, including:

  • a customisable alerting system which includes integrations with Slack, HipChat, PagerDuty, and Pushover
  • an update of bootstrap to a more recent version, and wider use of it in various tables in the system through bootgrid
  • a distributed poller to allow segmenting and scaling for load
  • documentation integrated directly with our git repository using Read the Docs
  • new or improved support for dozens of device types, including many relevant to Wireless Internet Service Providers (WISPs)

There are many other fixes and improvements as well – see the changelog for full details.

It has been a wild ride so far with LibreNMS, and I’m both thankful to all who’ve contributed to our community so far, and excited at what the future holds.

API merged

In case anyone hasn’t noticed Neil’s blog post or our Twitter feed, we’ve recently merged in Neil’s API work, along with a few updates from me.  The API is based at /api/v0 in your LibreNMS install; it is marked as v0 to signify that it is a pre-stable interface – please do not assume that any part of the API is guaranteed stable until we mark it as v1.  As you may have gathered from my previous notes about API design, there are some interesting new developments afoot in the world of RESTful APIs, and we’d like to work towards implementing the best possible API design.
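
To show what the /api/v0 prefix means in practice, here is a minimal Python sketch that builds (but does not send) a request against it.  The host name, the "devices" resource, and the token value are placeholders, and passing the token in an X-Auth-Token header is an assumption of this sketch rather than something stated in this post; check the API documentation for your version.

```python
import urllib.request

API_PREFIX = "/api/v0"  # pre-stable prefix from the post; not guaranteed stable


def build_request(base_url, resource, token):
    """Build (but do not send) a GET request for an API resource."""
    url = base_url.rstrip("/") + API_PREFIX + "/" + resource.lstrip("/")
    # Assumption: the API token travels in an X-Auth-Token header.
    return urllib.request.Request(url, headers={"X-Auth-Token": token})


# Placeholder host, resource, and token, for illustration only.
req = build_request("https://librenms.example.com", "devices", "0123456789abcdef")
print(req.get_method(), req.full_url)
```

Keeping the prefix in one constant means a later move to a stable version only has to be made in one place on the client side.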

Please note that the implementation of the API is still a work in progress.  Known issues at the moment are:

  1. Incomplete checking of user permissions
  2. No security auditing or hardening has been done
  3. Encoding of interface names containing slashes needs to be tested
  4. LibreNMS doesn’t come with any specific support or documentation for setting up HTTPS by default
  5. Creation of API tokens is still manual

Because of these issues, I recommend that you do not expose your LibreNMS install’s API to untrusted systems, and especially do not make it available on public web sites.  I expect we’ll issue an update to the documentation soon to provide specific guidance for locking down the API, and hope that we’ll have a number of code updates to address the above implementation issues shortly.
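
Known issue 3 is easy to reproduce from the client side.  The sketch below, in Python, shows how an interface name such as GigabitEthernet0/1 can be percent-encoded so that it occupies a single URL path segment; the route shown is hypothetical and is not taken from the API itself.

```python
from urllib.parse import quote, unquote


def encode_segment(name):
    """Percent-encode a value for use as a single URL path segment."""
    # safe="" makes quote() encode "/" as well; by default "/" is left alone.
    return quote(name, safe="")


ifname = "GigabitEthernet0/1"
# Hypothetical route, used only to show where the encoded name would go.
path = "/api/v0/devices/router1/ports/" + encode_segment(ifname)
print(path)

# Decoding the last segment gives back the original interface name.
assert unquote(path.rsplit("/", 1)[1]) == ifname
```

Whether the server side then accepts the encoded slash is a separate question, and is part of what needs testing: Apache, for example, rejects %2F in paths unless AllowEncodedSlashes is changed from its default.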

Catching my own silly mistakes

When working on the recent API addition, I messed up a simple change – I dropped a closing parenthesis which was necessary.  Fortunately, Scrutinizer picked this up, and its integration with GitHub means there is a nice warning right where you would normally hit the merge button.

I already had a Makefile entry on my local system which uses php -l to syntax check files that aren’t committed, but I had already done the commit, so it didn’t get checked before I pushed it to GitHub.

I decided this was the wrong approach, and realised git already has a great method for supporting this: hooks.  Hooks can be run at various stages of the git work flow.  One common place is on push, which many people (GitHub included) use to perform notifications of various types (such as emailing interested parties, or starting a Scrutinizer run).

In this case I wanted a pre-commit hook, so that my silly mistakes are never recorded in the git history.  If a pre-commit hook fails, the commit fails, and you have to try again.  This is ideal for integrating syntax and style checkers into your coding process.  I found a discussion about this on Stack Overflow, which linked to a working pre-commit hook script.  I reviewed the code and tested it and it works pretty well for me.  Unfortunately, the .git/hooks directory is not part of the git repository, so each developer will have to add this script for him- or herself.
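
For readers who want to see the shape of such a hook, here is a minimal sketch in Python.  It is not the script linked from Stack Overflow; it is an independent illustration that assumes git and php are on the PATH, and the function names are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook that runs `php -l` on staged PHP files."""
import subprocess
import sys


def staged_files():
    """List files staged for commit (added, copied, or modified)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def php_lint(path):
    """Return True if `php -l` reports no syntax errors in the file."""
    return subprocess.run(["php", "-l", path], capture_output=True).returncode == 0


def failing_files(paths, lint=php_lint):
    """Return the PHP files among `paths` that fail the lint check."""
    return [p for p in paths if p.endswith(".php") and not lint(p)]


def main():
    bad = failing_files(staged_files())
    for path in bad:
        print(f"syntax error: {path}", file=sys.stderr)
    return 1 if bad else 0  # a non-zero exit status aborts the commit


# Installed as an executable .git/hooks/pre-commit, the script would end with:
#     sys.exit(main())
```

One limitation of this sketch is that it lints the file as it exists in the working tree, which can differ from the staged content if only part of a file has been added.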

I would like to eventually extend this to style checking, but it seems we’re quite a long way from being compliant with any PHP style guides, so that will have to be an exercise for the future.

RESTful APIs

Wow, did I ever open Pandora’s box when I started reading about REST!  What I thought would be a fairly straightforward exercise of choosing the URL structure and working out how to version the API turned out to be mind-bending (in a good sense). I’ll expand on this more as I get my paper into shape, but in the meantime, here are some of the things I’ve been reading about.

Foundations

Roy Fielding, one of the original authors of the HTTP 1.0 and 1.1 RFCs, introduced REST in chapter 5 of his 2000 dissertation: Architectural Styles and the Design of Network-based Software Architectures.  What I never realised until now is how big a vision the designers of HTTP had: they were actually trying to build into the network an architecture to support large-scale distributed systems, the scope of which we as an industry are only just beginning to glimpse.

API versioning fiasco

Every man and his dog has an opinion on how to do versioning in REST APIs.  In 2012, Tim Wood published a concise, helpful summary of the then-current state of which parties advocate for or implement which method of versioning.  His results were both interesting and confusing: basically, this is an area where the commercial world almost (but not quite) settled on one approach – embedding the API version in the URL – in complete disconnect from (or perhaps ignorance of) the research community.

What is REST, really?

This disconnect exhibited itself in a number of ways, and was indicative of a broader underlying misunderstanding.  As early as 2008, Fielding complained that people really weren’t getting the point of REST.  He explained in a number of different ways the idea encapsulated in his final point:

A REST API should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API). From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations or implied by the user’s manipulation of those representations. … [Failure here implies that out-of-band information is driving interaction instead of hypertext.] (Emphasis in original.)

To put this another way: the client’s choices for interacting with the server must be wholly dictated by the content of the server’s response to the initial request.  No knowledge of the server’s URL structure should be assumed or required on the part of the client.
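
A small Python sketch may make this more concrete.  The client below knows only a link relation name ("devices") and looks up the URL the server advertises for it; the document layout and the deliberately opaque hrefs are invented for illustration, and are not a format defined by Fielding or used by LibreNMS.

```python
import json

# Illustrative entry-point document; the "links" layout is an assumption.
ENTRY_POINT = json.loads("""
{
  "links": [
    {"rel": "devices", "href": "/things/a81f"},
    {"rel": "alerts",  "href": "/things/c2d9"}
  ]
}
""")


def follow(document, rel):
    """Return the href the server advertises for a link relation."""
    for link in document.get("links", []):
        if link["rel"] == rel:
            return link["href"]
    raise LookupError(f"server offers no {rel!r} link")


print(follow(ENTRY_POINT, "devices"))
```

Because the client never constructs a URL itself, the server is free to reorganise its URL structure without breaking it.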

Moving forward

A lot of people seemed to struggle to understand Fielding, as evidenced by the quantity and quality of the questions his post generated in response.  (It certainly took me several reads through the material to unwrap what he was getting at.)  But in the last few years, things have started to change.  A series of blog posts by Mark Nottingham, chair of the IETF HTTP working group, started to show me how discussion about versioning was missing the point, and that there are bigger issues at stake:

In addition, RESTful Web APIs, by Leonard Richardson, Mike Amundsen, & Sam Ruby, was very helpful for me in unlocking these concepts – they even include a guide to Fielding’s dissertation as an appendix.  The big ah-ha moment for me was when they demonstrated with JSON objects how to do, and how not to do, collection navigation.  I realised that nearly every web site I’ve ever visited re-implements this in different, incompatible ways that are non-machine-navigable.  By following paradigms of hypermedia, we will not only save ourselves a lot of work, but allow for the next generation of API clients.
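
As a rough idea of what machine-navigable collection navigation looks like, here is a Python sketch.  It is not the book’s example: the page layout, the "next" link name, and the in-memory stand-in for a server are all assumptions made for illustration.

```python
# Hypothetical paginated collection, keyed by URL to stand in for a server.
PAGES = {
    "/devices": {"items": ["sw1", "sw2"], "links": {"next": "/devices?cursor=b7"}},
    "/devices?cursor=b7": {"items": ["rtr1"], "links": {}},
}


def fetch(url):
    """Stand-in for an HTTP GET that returns a parsed JSON document."""
    return PAGES[url]


def iter_collection(start_url, fetch=fetch):
    """Yield every item, moving between pages only via server-supplied links."""
    url = start_url
    while url is not None:
        page = fetch(url)
        yield from page["items"]
        url = page["links"].get("next")  # no client-side "?page=N" arithmetic


print(list(iter_collection("/devices")))
```

The loop works unchanged whether the server paginates by page number, offset, or cursor, because the client never needs to know which.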

Anyway, stay tuned; this might seem a bit abstract at the moment, but I think it will have important concrete implications for LibreNMS and what we can do with our API.

Oh, look – shiny!

One of the biggest struggles I have with thinking about the direction of LibreNMS is all the features that I would love to add if I had the time. The fact is I’m never going to have the time to implement everything I would like to do, so being focused and using my time wisely is important.

So one thing I think is important for us to do in the near future is to decide on priorities and focus areas, so that we’re not running around working on everything that takes our fancy for a few minutes or immediately merging every pull request that comes in.

Here are a few of the things I had originally listed in the roadmap:

  1. Investigate generic device support based on MIBs
  2. Eliminate interface churn for transient interfaces (e.g. ppp/tun) on net-snmp
  3. Investigate solutions for poller performance improvement
  4. Investigate solutions for multiple communities/ports per device
  5. Integrate Nagios-based alerting
  6. Consider adding administrative functions such as enabling/disabling ports, etc.
  7. Front page customisation
  8. GUI configuration of most options

Neil is working on #8, and #7 is my main upcoming coding priority.

One item that overlaps somewhat with #4 is integration of non-SNMP data sources.  Neil has been working on getting ping, poll, and discovery stats added, and there are several others that I’d like to look into, including NTP quality monitoring via ntpq.  I wrote a small standalone utility to do this a while back, but LibreNMS would be a much better home for it.
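
To give a feel for what NTP quality monitoring via ntpq involves, here is a minimal Python sketch that parses peer output in the style of ntpq -pn.  It is not the standalone utility mentioned above; the sample output is made up (using documentation addresses), and the function names are hypothetical.

```python
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.512   -0.031   0.020
+192.0.2.11      192.0.2.10       2 u   12   64  377    1.204    0.118   0.095
"""


def parse_peers(text):
    """Parse `ntpq -pn` style output into a list of per-peer dicts."""
    peers = []
    for line in text.splitlines()[2:]:        # skip the two header lines
        if not line.strip():
            continue
        fields = line[1:].split()             # column 0 holds the tally code
        peers.append({
            "tally": line[0],
            "remote": fields[0],
            "stratum": int(fields[2]),
            "delay_ms": float(fields[7]),
            "offset_ms": float(fields[8]),
            "jitter_ms": float(fields[9]),
        })
    return peers


def system_peer(peers):
    """Return the peer marked '*' (the current sync source), or None."""
    return next((p for p in peers if p["tally"] == "*"), None)


print(system_peer(parse_peers(SAMPLE)))
```

The offset and jitter of the system peer are the kind of values that would be worth graphing over time.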

We still need to evaluate whether the Unix agent (which we inherited from Observium, but which few of the LibreNMS community seem to use) or using Net-SNMP extensions might be a better way of achieving this, or whether we need to add poller modules for it.

It may be that we need to provide multiple ways of getting at other data sources.  One LibreNMS user might be a service provider who has no control over the monitored equipment and SNMP is the only viable option; another might have complete configuration authority over a data centre full of Linux hosts, and Linux agents or SNMP extensions might be far preferable – please let us know which solutions work best for you.

The unifying theme of these points is that I want LibreNMS to give you as complete a picture of your infrastructure as possible in one place.

API requirements and design

Developing an API is one of our main focuses over the next few months.  Neil has already published an initial API that is being driven by developer requirements at his workplace, and we’ve had a request about this on the mailing list as well.

The two biggest questions in my mind are defining what we want the API to do (functional requirements), and making sure it is workable in the long run and adaptable to future requirements (i.e. well-designed).

With respect to the functional requirements, here are the things I think might be useful:

  1. Graphs
    • get a list of available graphs for a given device or interface
    • view a graph (for embedding in another application), given appropriate permissions
      • graphs should come in a number of convenient preset sizes, but we should also allow for limited customisation where possible.
      • allow retrieving of graph data in numeric form to allow it to be used in web APIs such as Google Charts?
  2. Devices
    • list known devices
      • filtering of the list by name, IP address, disabled/ignored state, etc. may be useful
    • add/discover one or more devices (should perform initial poll also)
    • force poll an existing device? (Not sure if this is very useful, since it has the potential to generate high load on the server, and polling more frequently than every 5 minutes is not something that we have on our radar.)
    • disable, ignore, or delete one or more devices
  3. Users/access control
    • add/delete a user
    • change password
    • grant/revoke user access to a device or interface
  4. Interfaces
    • list known interfaces for a device
      • filtering the list by type or IP address may be useful
    • disable, ignore, or delete one or more interfaces
  5. Alerts
    • list active alerts
    • enable/disable alerts globally
    • enable/disable global email address override

    The alerting system is fairly small and simple at the moment. Daniel Preussker started working on something a little more sophisticated on this a while back, but hasn’t had the time to develop it fully.
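
To illustrate one of the requirements above (listing devices with optional filtering by name, IP address, or disabled/ignored state), here is a short Python sketch.  The record fields and function name are hypothetical and do not describe the API as implemented; the point is only that each filter is optional and that filters combine.

```python
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    hostname: str
    ip: str
    disabled: bool = False
    ignored: bool = False


def filter_devices(devices, name=None, ip=None, disabled=None, ignored=None):
    """Return devices matching every filter that was supplied (None = no filter)."""
    result = []
    for d in devices:
        if name is not None and name.lower() not in d.hostname.lower():
            continue
        if ip is not None and d.ip != ip:
            continue
        if disabled is not None and d.disabled != disabled:
            continue
        if ignored is not None and d.ignored != ignored:
            continue
        result.append(d)
    return result


inventory = [
    DeviceRecord("core-sw1.example.net", "192.0.2.1"),
    DeviceRecord("edge-rtr1.example.net", "192.0.2.2", disabled=True),
]
print([d.hostname for d in filter_devices(inventory, name="rtr", disabled=True)])
```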

As always, if you have any input on the above, please get in touch.  IRC is our preferred contact method at the moment (you can use webchat if you don’t have an IRC client installed – join the channel ##librenms), since it allows group discussion in realtime.

In terms of API design, it seems a lot has been written on the subject.  I plan to do some more research on this over the next few days to try to distil the practitioner literature down to some useful guidelines for our API.  Once again, please let us know if you have any useful input in this area.

Other things we’ll probably have to consider at some point include:

  • API versioning.  From my brief reading on the topic, this seems to be an arcane art – more research definitely needed.
  • OAuth, to allow untrusted third parties access to selected features.  I’m a little reluctant to go down this path at the moment, because we don’t really have a good feel for the security posture of the code.  But it’s almost certain to be asked for down the track.  My assumption is that we need to get solid HTTPS support in place before this is viable.

Community contributions and improving code quality

One of my main goals for LibreNMS was to encourage community contribution. We’ve already made some significant progress in this by simply moving to git as our DVCS, and github as our code hosting provider.  This makes it trivially easy for newcomers to make a quick contribution.  If you haven’t experienced this – try it!

  1. If you haven’t signed up with github already, create an account.
  2. Find a bug or deficiency in, say, our installation documentation.
  3. Click the edit button.  This makes your own private fork of the repository on github’s servers.
  4. Make your change to the file, then at the bottom of the screen under “Propose file change”, explain your change, click the green “Propose file change” button, and we’ll instantly get a pull request for the change.

Another thing we’d like to make some progress on is improving our code quality.  PHP has a reputation for encouraging poor programmer discipline and making it easy to do the wrong thing, and we’d like to make this better.  One of the steps we’ve taken is signing up for Scrutinizer, a continuous integration (CI) service which integrates with github and runs various code checks on every pull request to report about code quality.  Here’s an example: https://scrutinizer-ci.com/g/librenms/librenms/inspections/e88e5faf-79b4-440a-8036-331dfd8a3b0e/issues/

Some further issues that could be tackled are little cleanups like moving certain standard code “micro-patterns” into utility functions.  For example, we don’t generate names for RRD files consistently, and we do it in many places in the code.  It would be great to get this centralised, and this is important if we decide later to use another storage mechanism like graphite.  A similar example is date formatting; there is no consistent means for doing this in the code.
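
As an example of what centralising one of these micro-patterns could look like, here is a Python sketch of a single helper for building RRD file names.  It is illustrative only: the helper name, the directory, and the naming scheme are assumptions, and the real code base is PHP.

```python
import os
import re

RRD_DIR = "/opt/librenms/rrd"  # assumed location; adjust to your install


def rrd_name(hostname, *parts):
    """Build an RRD file path from a hostname and descriptive name parts."""
    # Replace anything outside a conservative character set so that values
    # such as "Gi0/1" cannot introduce extra path components.
    safe = [re.sub(r"[^A-Za-z0-9._-]", "_", str(p)) for p in parts]
    return os.path.join(RRD_DIR, hostname, "-".join(safe) + ".rrd")


print(rrd_name("core-sw1", "port", "Gi0/1"))
```

With every caller going through one function like this, switching to a different storage mechanism later would mean changing one place instead of many.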

These are some simple ways that people can contribute to the project.  As always, if you have any suggestions or input, please get in touch.  Or, just create a fork and start sending in your contributions!