Troubleshooting AmLight: Handling Network Events in a Production SDN Environment
Time 09/26/16 08:40AM-09:00AM
A wide range of tools is available for monitoring and measuring legacy networks. Most network protocols are well understood, and a variety of open source and commercial tools help network engineers with their daily operations. Network Monitoring Systems currently in use handle SNMP walks, ping probes, and traceroute routines natively. Some SNMP MIBs were standardized to collect protocol-specific information, such as VLAN utilization and OSPF LSAs. OpenFlow and SDN changed some network characteristics, making some of these tools less useful or, in some cases, completely useless. In the past, network engineers assumed that the control plane inside the network devices was working properly and focused most of their troubleshooting on the data plane or on configurations.
In OpenFlow and SDN environments, with an entirely new control plane in place, two additional components must be included in the troubleshooting scenario: the OpenFlow agent inside the network device, and the OpenFlow controller with its associated application. Complexity increases as soon as network engineers have to troubleshoot applications. In some cases, specific knowledge of programming languages is required, a skill that most network engineers have had little need for until now.
The complexity involved in managing OpenFlow and SDN environments has significantly affected, and continues to affect, AmLight network operations. Americas Lightpaths (AmLight) is a project of the U.S. National Science Foundation International Research Network Connections (IRNC) program to facilitate scientific research and education between the U.S. and the nations of Latin America. AmLight is a production network composed of a number of international network links connecting U.S. R&E networks to similar networks in Latin America.
After AmLight migrated to an OpenFlow-based network in 2014, the operations team had to acquire new skills, incorporate new troubleshooting tools, and create or customize some scripts and tools. To make things even more complex, network events in a production environment must be handled in the least disruptive way possible, unlike in network testing or simulation environments. The troubleshooting process should not affect production traffic. Sometimes, however, extreme actions have to be taken. Finding the right balance is currently more art than science, and having the appropriate set of tools is fundamental.
In past presentations at Internet2, TNC, and GLIF meetings, AmLight presented, from a high-level perspective, its experiences in supporting experimental testbeds in parallel with production traffic. This presentation focuses instead on the tools used in AmLight network operations. We will describe the initial challenges of migrating to SDN: which tools had to be improved, how they were improved, and how they fit into the AmLight network monitoring environment to support a smooth troubleshooting process.
The OpenFlow sniffer, customizations to Flow Space Firewall, interoperability with legacy network monitoring systems, experiences with SDN traceroutes, and other tools and scripts will be presented to give the audience some guidance in troubleshooting OpenFlow environments. This presentation will also help traditional network engineers become familiar with the new approaches and skills they will need in the near future.
Speaker Jeronimo Bezerra Florida International University
Primary track Advanced Networking