Visual Performance Degradation Troubleshooting with perfSONAR
Time 12/10/19 11:00AM-11:10AM
perfSONAR is widely used and very well suited for long-term regular testing of multi-domain network performance. Its most common use case is to deploy it at different points of a network and establish a mesh of regular measurements inside the network, as well as to remote destinations over paths of interest to the users. Its dashboard, with alerting and plotting capabilities, gives network operation center (NOC) engineers a quick and easy way to assess the current and past performance state on these paths.
When performance degrades, the engineers want to know more. They want to understand where the performance drop is coming from. They try to isolate the portion of the path, the network segment or the device that is causing the degradation. Doing so requires finding additional measurement points (MPs) along the affected path and then triggering more measurements to compare the results with the regular testing.
This whole process involves using the perfSONAR Service Directory, filtering MPs by service type and location, analysing traceroute outputs and finding the new MPs along the path. It then means running new on-demand measurements from the command line interface (CLI) using pScheduler, whose many lengthy options depend on the type of test to run and the tool to use.
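As an illustration of the "finding the new MPs along the path" step, here is a minimal sketch in Python. It is not the tool's actual logic: the function name `mps_along_path`, the MP record layout (`name`, `address`) and the rule "an MP is on the path if it shares a /24 prefix with a traceroute hop" are all assumptions made for this example, and the addresses are documentation prefixes rather than real hosts.

```python
import ipaddress


def mps_along_path(hops, mps, prefix_len=24):
    """Return (hop_index, mp_name) pairs for MPs that share a prefix with a hop.

    hops: hop addresses in path order, as taken from a traceroute output.
    mps:  hypothetical MP records, e.g. already filtered by service type.
    prefix_len: assumed "closeness" rule (same /24 by default).
    """
    matches = []
    for index, hop in enumerate(hops):
        # Network containing this hop, e.g. 198.51.100.7 -> 198.51.100.0/24
        hop_net = ipaddress.ip_network(f"{hop}/{prefix_len}", strict=False)
        for mp in mps:
            if ipaddress.ip_address(mp["address"]) in hop_net:
                matches.append((index, mp["name"]))
    return matches


# Example data using documentation address ranges only.
hops = ["192.0.2.1", "198.51.100.7", "203.0.113.9"]
mps = [
    {"name": "mp-a", "address": "198.51.100.20"},
    {"name": "mp-b", "address": "203.0.113.200"},
    {"name": "mp-c", "address": "10.0.0.5"},
]

print(mps_along_path(hops, mps))  # [(1, 'mp-a'), (2, 'mp-b')]
```

A prefix match is only a rough heuristic. In practice an engineer would also weigh location and domain information when picking candidate MPs.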
To execute this process, most NOC engineers would prefer to use a graphical user interface (GUI) rather than having to switch to an ssh session and use the CLI on a remote host. Can’t we have a GUI to help them? As all perfSONAR components involved have a good and well-documented API, this should be doable.
This is what a small team of GÉANT developers has been working on for some months now. It is also an attempt to replace a previously existing tool, the psUI.
Our new tool uses a custom-built GUI to choose an MP of interest from the Lookup Service (LS) list. It lists the different tests and tools available on the chosen MP, along with their corresponding options and parameters, so the network engineer can select the most appropriate one to use. The GUI then triggers the chosen measurement through calls to the remote pScheduler API. Finally, once the test is finished, it plots the result in a dedicated Grafana dashboard.
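To make the "calls to the remote pScheduler API" step more concrete, the sketch below builds the kind of JSON task description that would be sent to an MP. It is a hedged illustration, not the tool's code: the helper names `build_task` and `tasks_url` are hypothetical, the host names are placeholders, and the field layout (`schema`, `test.type`, `test.spec`, `tools`) and the `/pscheduler/tasks` endpoint reflect our understanding of the pScheduler task format and should be checked against the pScheduler documentation. No network request is made here.

```python
import json


def build_task(test_type, spec, tool=None):
    """Build a task description dict (assumed pScheduler task layout)."""
    task = {
        "schema": 1,
        "test": {"type": test_type, "spec": {"schema": 1, **spec}},
    }
    if tool is not None:
        # Assumption: preferred tools are given as a list under "tools".
        task["tools"] = [tool]
    return task


def tasks_url(mp_host):
    """URL the task would be POSTed to on the chosen MP (assumed endpoint)."""
    return f"https://{mp_host}/pscheduler/tasks"


# Placeholder hosts; a real run would use an MP picked from the LS list.
task = build_task("throughput", {"dest": "mp-b.example.net"}, tool="iperf3")
print(tasks_url("mp-a.example.net"))
print(json.dumps(task, indent=2))
```

Keeping the task construction separate from the HTTP call lets a GUI validate and display the request before anything is sent to the remote MP.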
The tool has progressed from a proof of concept at the end of 2018 to a prototype by mid-2019, and we are aiming for a first beta version by the end of 2019. We’d like to present the concept, architecture and design of this new tool and to demo it at the TechEX conference 2019.
Authors: Antoine Delvaux (PSNC), Erik Reid (GÉANT)
Speaker Antoine Delvaux PIONIER (Poznan Supercomputing and Networking Center)
Primary track Advanced Networking