Extreme is Bringing Purple Rain from the Cloud

During Networking Field Day 21, Aerohive, I mean Extreme, presented their new “Cloud Driven End to End Enterprise” using ExtremeCloud IQ, formerly HiveManager. After Extreme’s acquisition of Aerohive there was a lot of speculation in the wireless community about what would happen to the product. The most obvious conjecture was that Extreme made the purchase for the cloud technology Aerohive already had, but how would they fold it into the mix with their other offerings?

Abby Strong (@wifi_princess on Twitter) started us off with a quick introduction to The New Extreme and the vision of the company. As Abby started us down the path we got some quick stats about technology users in the world, including the 5.1 billion mobile users and the USD $2 trillion being spent on digital transformation. Digital Transformation is one of the hot marketing buzzwords in the industry at the moment, but what is it exactly? According to Abby, “Digital Transformation is the idea of technology and policy coming together to create a new experience.” This is what Extreme has been focusing on, but how? Extreme is doing this via their Autonomous Network, using automation, insights, infrastructure and an ecosystem, all wrapped in machine learning, AI and security.


The concept behind this is using the insights and information Extreme has gathered to look at issues that arise in the network and recommend, for example, whether a driver issue is the likely cause or whether a code upgrade would fix the problem. This is a really cool take on automation and insights, which is where most companies in the industry are trying to get. From what was shown at NFD20 in February and then again at NFD21, I think they are almost there with their expanded portfolio of solutions in Applications, Switching, Routing and Wireless, plus an open ecosystem and open source. Check out more on those solutions and more about Extreme at https://www.extremenetworks.com/products/.

Next, Extreme brought us into their 3rd generation cloud solution, ExtremeCloud IQ, and showed their roadmap toward the 4th generation cloud.

The ExtremeCloud IQ Architecture was presented by Shyam Pullela and Gregor Vučajnk (@GregorVucajnk on Twitter) with a demo of the system.

The architecture is still the previous Aerohive design; however, without ever having really dug into the product before, I was impressed with how they have done the back-end cloud. Currently Extreme uses AWS to host the infrastructure, but we were assured it is not dependent on AWS and could run on any cloud provider. The setup is interesting: multiple regional data centers connect back to a global data center. This builds resiliency into the system, allows the service to run in any country in which a public cloud can run, and lets Extreme collect analytics and ML/AI data globally rather than just regionally. ExtremeCloud IQ can also be run in different formats, public cloud, private cloud and on-prem, to give customers flexibility. From a basic cloud architecture standpoint, there is nothing crazy or proprietary in the setup. The key is the scalability designed into the system: a simple architecture makes it easy for Extreme to just add compute power to the back-end to scale for large organizations.


With these regional data centers in use, ExtremeCloud IQ is processing data to the tune of 3.389 petabytes per day, across an astounding number of devices and clients, to feed the ML/AI decision-making the infrastructure handles. These stats were mind-blowing to me and really show the power of what Extreme has been building, especially around the Aerohive acquisition.


All of this data gets fed into the cloud dashboard, as we see with the majority of other vendors. The client analytics are very reminiscent of the dashboards we see from Cisco, Aruba, Mist, etc. There is nothing too different here, with the exception of only getting 30 days of data, with no longer retention options available at this point in time. This is not a major hit against the technology, only that there is no way to correlate data over more than a one-month period.

One of the differences I see in the system is the lower number of false-positive issues flagged. The ML built into ExtremeCloud IQ can recognize anomalies without presenting every one as a possible bad user session. False positives can cause headaches, especially in a wireless system where users enter and leave coverage areas with applications running. I will get deeper into these capabilities in an upcoming post.

The team that was on-camera also did not back down from some interesting and hard questions surrounding the roadmaps of the products, where things are and announcements that were made within 24 hours of the presentation being delivered.

All in all, the solutions and products I am seeing from Extreme are very positive. They seem to have begun the integration of Aerohive nicely, and I am excited to see where they go with the big purple cloud.

 

Security is the New Standard

Everywhere we look today we hear about hacked servers and email systems, compromised credit card systems, and public Wi-Fi treated as a ‘use at your own risk’ service. With all of the big bads out there, security should be the new standard within wireless.

Security is more than a buzzword

There are so many buzzwords in the industry at this point, 5G, WiFi6, OFDMA, WPA3 and so on, that security should not be treated as just another one. For years wireless security was nothing more than a rotating passphrase, if someone remembered to change it. WEP finally got cracked, which gave way to WPA and then WPA2. But for the most part all devices were still using a passphrase that was proudly displayed on a whiteboard, sandwich board or the like. When wireless was a ‘nice to have’ commodity this was just fine. With wireless now becoming the primary access medium, security is a must. Data moving back and forth between private and public clouds deserves better security than a passphrase. Certificates and centralized authentication, authorization and accounting have become a must, and consolidating these functions into a single system makes securing and monitoring devices on these data-sensitive networks manageable.

How can this go further within the network?

Taking security to the next level

Basic monitoring of security within the network, user logins, MAC authentications, machine authentications, failures, etc., is great for keeping up with what is happening or troubleshooting when a user is having an issue. But with the risks in today’s networks, both wired and wireless, a deeper level of understanding and monitoring is needed.

This is where a User and Entity Behavioral Analytics (UEBA) system comes into play.

The basics of a UEBA seem simple, but it is a very complicated process. Multiple feeds, such as packet capture and analysis, SIEM input, NAC devices, DNS flows, AD flows, etc., come into the system and are correlated against rules set up by the security administrators. As this traffic comes in and is analyzed per user, a score is assigned to that user based on where they are going on the Internet, traffic coming in from or going out to ‘dangerous’ locations (i.e. Russia or China), infected emails that were opened, etc. This score is then updated over time. Once customized thresholds configured by the administrators are met or exceeded, different actions can be taken on that device: disconnect it from the network, quarantine it on the network, or send an alert to an administrator.
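As a rough illustration of the scoring loop described above, here is a minimal Python sketch; the rule names, weights, thresholds and actions are all hypothetical, not any vendor’s schema:

```python
# Hypothetical UEBA risk-scoring sketch. Rule names, weights and the
# threshold/action mapping are illustrative assumptions only.

RULES = {
    "dest_country_flagged": 25,   # traffic to/from a watch-listed country
    "malicious_attachment": 40,   # opened a known-infected email attachment
    "dns_tunnel_suspected": 30,   # anomalous DNS flow pattern
    "failed_logins_burst": 15,    # burst of authentication failures
}

# Checked highest first; the first threshold the score meets wins.
THRESHOLDS = [(80, "disconnect"), (50, "quarantine"), (30, "alert")]

def score_user(events):
    """Sum the rule weights for every matching event seen for a user."""
    return sum(RULES.get(e, 0) for e in events)

def action_for(score):
    """Return the action for the highest threshold the score meets, else None."""
    for threshold, action in THRESHOLDS:
        if score >= threshold:
            return action
    return None
```

With these made-up weights, a user who sends traffic to a flagged country and opens an infected attachment scores 65, which crosses the quarantine threshold but not the disconnect one.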

Total Package

Designing and deploying networks with complete 360º security visibility is no longer optional but a must. With data flowing in and out of private and public clouds, into and out of Internet-based applications, and with the pervasiveness of wireless as a primary access medium, there has never been a more important time to make security a standard and not an afterthought.

WiFi 6 Why We Need It And What It Isn’t

Wireless networks have been around for a long time. We all know the history of the industry, starting as a nice-to-have feature that let us work without a cable. Today wireless has become the primary medium for connectivity in most industries and most households. As that shift has occurred, wireless technology has had to try to keep up. The latest phase of this race is the 802.11ax, or WiFi6, amendment.

Why do we need WiFi6?

By now everyone has heard that 5G is coming and about the crazy fast speeds it will bring on the cellular side. We will look at that more in another post. But WiFi is fighting the same issues as cellular in today’s world: we are oversubscribed on WiFi, speeds suffer because of older technology, wireless is the primary connection method for almost every device in the world, and IoT is coming. Enter WiFi6.

To be upfront as we begin: ratification of the 802.11ax standard looks to be at least a year away, with most estimates pointing to September 2020. Even without full ratification, manufacturers are starting to put out access points, and a few clients are starting to trickle into the market.

So with ratification still a year away, why do we need to worry about WiFi6 now? WiFi6 is more about capacity than speed. As more and more devices access the wireless network, bottlenecks begin appearing. WiFi6 handles this with a trick taken from the cellular industry: OFDMA (Orthogonal Frequency Division Multiple Access). The easiest way to explain it is that today we have a highway with 8 lanes that funnels down to a one-lane road; a huge bottleneck occurs and all traffic grinds to a halt, like the 405 in California. With WiFi6 and OFDMA, those 8 lanes stay 8 lanes and traffic can flow freely. Having these extra ‘lanes’ increases capacity, and that is the key part of WiFi6. There is a great white paper with well-done diagrams of the traffic lanes and more information here (https://www.arubanetworks.com/assets/so/SO_80211ax.pdf).
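To make the lane analogy concrete, here is a small Python sketch of OFDMA resource-unit (RU) scheduling on a single 20MHz channel. The RU counts come from the 802.11ax draft (a 20MHz channel can carry up to nine 26-tone RUs); the first-fit scheduler itself is my own illustration, not any vendor’s behavior:

```python
# RU capacity per 20 MHz channel, per the 802.11ax draft: up to nine 26-tone
# RUs, four 52-tone, two 106-tone, or one full-width 242-tone RU.
RU_SLOTS_20MHZ = {26: 9, 52: 4, 106: 2, 242: 1}

def schedule(clients, ru_size=26):
    """Give each client its own RU ('lane') until the channel is full.

    Legacy OFDM behaves like the ru_size=242 case: the whole channel goes
    to one client per transmit opportunity, and everyone else waits.
    """
    slots = RU_SLOTS_20MHZ[ru_size]
    return clients[:slots], clients[slots:]  # (served now, still waiting)
```

With eight clients and 26-tone RUs, all eight transmit in the same opportunity; with the legacy full-width behavior, seven of them are stuck waiting behind the first.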

We have all heard about the speeds and how fast we can now send and receive traffic on WiFi6, but capacity is the key to the system. More capacity equals more opportunities for devices to be serviced on the network, especially for time-sensitive data like voice and video over WiFi. As we move to mobility-first workplaces and stop pulling cables to desks, wireless becomes more and more important. Wireless design is ever more complex now, and we have to use the spectrum smarter to allow more of these devices to function, and function well.

As stated previously, the key to the new 802.11ax amendment is not all about speed; it is about capacity. We need to be looking at how we handle time-sensitive data, not just how we push it faster. With WiFi6, yes, the speed is there if you have the right client, but how do we service the least-capable device and make it function as if it were a WiFi6 device? Capacity is the key as we continue to add more devices: IoT, wireless-first deployments, nurse-call devices. WiFi6 is the key to solving this issue and granting the capacity we so badly need.

Auto-Channel Timing and the Issues it can Cause

All wireless network vendors have Auto-RF management of some manner: RRM for Cisco, AirMatch (formerly ARM) for Aruba, etc. Most of the industry leaves these features enabled in about 95% of installs to handle power level changes and channel changes based on interference or utilization. But something I have noticed time and again is the number of installs that run these Auto-RF algorithms with the default values.

So the question is, why do we care about this?

When using this for control of power I typically do not see a big issue with default timing values for the algorithm. However, for channel assignment I have seen lots of problems over the years with default values and the issues they can cause clients.

What is auto-channel management?

Simply put, auto-channel management is exactly what it says: centralized automatic management of the channels being used in the network by an RF or mobility master. Each manufacturer has its own way of managing and handling these changes, but the concept behind it is universal. We will look into each manufacturer’s way of doing it in another blog post. This one simply covers how it generally works.

During normal operation of the wireless network, access points collect data about the RF environment from dedicated sensors, off-channel scanning, the RSSI values at which clients are being seen, and neighbor messages from surrounding APs in the same RF group or neighborhood. This data includes client load and interference from radar, microwaves, Bluetooth or other networks in the surrounding area.

All of this data gets sent back to the RF master, typically the wireless controller on the network or a master controller handling these duties. The master then takes all of this data and calculates an optimized channel plan for the APs in the network to mitigate interference as much as possible.

Once this data is compiled on the master, changes are sent back to the network based on anchor times and interval settings. Cisco does this by default every 10 minutes, starting at midnight. Aruba sends this at 5 am local time to the Mobility Master by default. A common misconception I have run into over the years: just because RRM runs every 10 minutes does not mean the channels are necessarily changing every 10 minutes.
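To make the anchor-time and interval model concrete, here is a small Python sketch that lists when DCA evaluations would fire. Remember these are evaluation times, not guaranteed channel changes; the function is my own illustration, not a controller API:

```python
from datetime import datetime, timedelta

def dca_runs(anchor, interval_minutes, count):
    """Return the first `count` DCA evaluation times from the anchor time."""
    return [anchor + timedelta(minutes=interval_minutes * i) for i in range(count)]

# Cisco-style defaults: anchor at midnight, evaluate every 10 minutes.
for run in dca_runs(datetime(2019, 11, 8, 0, 0), 10, 3):
    print(run.strftime("%H:%M"))  # prints 00:00, 00:10, 00:20
```

Each of those evaluations may or may not result in a Channel Switch Announcement; that depends on what the calculations decide.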

 

Why is this an issue for clients?

With the addition of 802.11h, the management frame Information Elements now include Element ID 37 for Channel Switch Announcements, as shown below from the IEEE.

Element ID (1 octet) | Length (1) | Channel Switch Mode (1) | New Regulatory Class (1) | New Channel Number (1) | Channel Switch Count (1)

The Channel Switch Announcement is sent from an AP that has been marked as needing to change channels by the Auto-RF calculations. The important parts of the element are the Channel Switch Mode, New Channel Number and Channel Switch Count.

The Channel Switch Mode informs the clients on the AP that a channel change is going to occur. If this value is set to 1, the clients should cease transmitting data to the AP until the change has occurred, which causes a disruption in communication for a short period until the change is complete. If the value is set to 0, there are no restrictions on clients transmitting during the channel change.

The New Channel Number is pretty basic: this is the new channel the AP will be on after the channel change is complete.

The Channel Switch Count is basically the countdown timer for the channel switch. If the count is set to 0, the channel change could occur at any time. If it is some other number, that is the remaining time before the change occurs.
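As a sketch, the three fields just described can be packed into the basic 3-octet CSA element body (Element ID 37; the New Regulatory Class octet belongs to the extended variant of the announcement). This is an illustration of the layout, not a frame-crafting library:

```python
import struct

CSA_ELEMENT_ID = 37  # Channel Switch Announcement (802.11h)

def build_csa(mode, new_channel, count):
    """Pack: Element ID (1) | Length (1) | Mode (1) | New Channel (1) | Count (1)."""
    return struct.pack("BBBBB", CSA_ELEMENT_ID, 3, mode, new_channel, count)

def parse_csa(data):
    """Unpack a basic CSA element and return its three information fields."""
    eid, length, mode, channel, count = struct.unpack("BBBBB", data)
    if eid != CSA_ELEMENT_ID or length != 3:
        raise ValueError("not a basic Channel Switch Announcement element")
    return {"mode": mode, "new_channel": channel, "count": count}
```

For example, `build_csa(1, 6, 5)` would announce a move to channel 6 in 5 more beacon intervals, with mode 1 telling clients to hold their transmissions until the switch completes.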

So with this very basic overview, why does it matter to a client?

In wireless networking, a client’s channel is dictated by the AP it is connected to. If the client is connected to an AP on channel 11, the client communicates on channel 11. But again, why does this matter?

When an AP changes channels based on RRM calculations, every client associated to that AP must change as well. So when our AP on channel 11 changes to channel 6, every client associated to that AP needs to move to channel 6 based on the Channel Switch Announcement and the values within that element. Depending on the Channel Switch Count, if a client is downloading a file, making a video call, or just doing basic online tasks, there will be a disruption to that client. It could be very brief, but it depends on how long it takes the client to reassociate or roam to the AP on its new channel. With time-sensitive applications this can look like jitter, lag, or just slowness on the network. It can equate to the dreaded, “The network sucks right now”.

Back to the opening: if the default timer of, say, 10 minutes is in use, a network seeing interference from surrounding wireless networks, high channel overlap, low RSSI values, etc. could change an AP’s channel that frequently. Clients connected to those APs are then changing channels every 10 minutes as well, which can be mistaken for a small service disruption or just poor network quality. This topic will be looked at in depth in a coming post.

In the next post we will look at some other issues this constant changing of channels can produce, as well as how a couple of different manufacturers handle Auto-RF within their products.

 

Cisco RRM Restart

Recently, when working with Cisco wireless networks, I have been working to get Dynamic Channel Assignment (DCA) tuned in and to understand much more about it. Some of the important things to make sure you are setting correctly include the Anchor Time and the DCA Interval (please don’t use the default; there is a blog post coming about that).

One thing that became an option via CLI in the 7.3 code train is the ability to restart the RRM DCA process on the RF Group Leader. Why is this important, I can hear some of you saying, or why would I want to do this? Here are a couple of examples.

If a controller enters or leaves an RF group, or if the RF Leader goes down and comes back online, as in a reboot, DCA will automatically enter startup mode to reconfigure the channel plan regardless of the settings that have been changed on the controller, i.e. not using the default 10-minute interval. But is there ever a need to do this manually? Yes.

As you add new APs into the network, it is a good idea, and a Cisco recommendation, to initialize DCA startup mode. The reasoning is that as APs are added, DCA needs to rerun its calculations and produce a more optimized channel plan based on the newly added APs and what the other APs are seeing over the air. This command should be run from the RF Leader and will only affect the RF Leader.

The command should be run on both 2.4 GHz and 5 GHz radios:

2.4 GHz: config 802.11b channel global restart

5 GHz: config 802.11a channel global restart
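If you add APs often enough that these restarts become routine, the two commands can be scripted. A minimal Python sketch, assuming a Netmiko-style connection object (anything exposing `send_command()`) already opened to the RF Group Leader; host, credentials and session setup are left out:

```python
# The two DCA restart commands from the post; run against the RF Group Leader.
RRM_RESTART_COMMANDS = [
    "config 802.11b channel global restart",  # 2.4 GHz
    "config 802.11a channel global restart",  # 5 GHz
]

def push_rrm_restart(conn):
    """Send both DCA restart commands over an already-open CLI session.

    `conn` is assumed to behave like a Netmiko ConnectHandler; verify the
    commands against your WLC code version before running in production.
    """
    for cmd in RRM_RESTART_COMMANDS:
        conn.send_command(cmd)
```

Passing the connection in rather than opening it inside the function keeps the sketch testable and leaves the SSH details to whatever library you prefer.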

802.11ax The Future Begins

The networking industry is full of buzzwords and hype; A.I., M.L., SD-WAN and virtual everything. This is even more evident in the world of wireless networking: claims of speeds up to 1 Gbps, wired-like connectivity, mobility first, future-proofing and on and on. It all reminds me of one of my favorite Queen songs, Radio Ga Ga: “All we hear is radio ga ga, radio blah blah, radio, what’s new?”.

The new 802.11ax amendment (not yet a standard, thanks TheITRebel), or WiFi6 as it is now being called, is slated to be ratified later in 2019. This is causing all kinds of hype in some circles, and not so much in others yet, as end-user computing devices will probably not have chipsets to support 802.11ax until maybe the end of 2019. Looking forward, fuller adoption will probably not happen until 2020 or even as late as 2021.

What is 802.11ax?

802.11ax builds on the features the 802.11ac, or WiFi5, standard gave us, as well as adding some cool new things to help with the ever-growing demand on wireless networks. From the desire for mobility-first networks to the cellular offloading that carriers want (and sometimes need), 802.11ax has its work cut out for it.

802.11ac gave us some significant improvements with additional channel widths in the 5GHz space, allowing 80MHz channels in Wave 1 and 160MHz channels in Wave 2 and giving higher bandwidth to user devices whose chipsets supported it. The drawback is that with 80MHz and 160MHz channels we take the available 5GHz channels from a total of 24 down to 5 or even 1 non-overlapping channel, depending on the use of DFS channels. This makes it much harder to channel plan in an enterprise or LPV style of deployment, so I still recommend using 20MHz channels, or perhaps 40MHz if done properly. However, when a deployment is done this way, the whopping 1.3Gbps touted by the marketing folks cannot be met even when using 3×3 spatial streams. Again, an example of hype that really is not too useful outside of a small business or home deployment.

802.11ax can achieve throughput speeds of up to 4.8Gbps according to the data sheets and marketing put out so far. But how can we get to those speeds?
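The headline number checks out as back-of-the-envelope PHY math if we assume the best case: a 160MHz channel (1,960 data subcarriers), 1024-QAM (10 bits per subcarrier), 5/6 coding, a 0.8µs guard interval and 4 spatial streams. The little calculator below is just that arithmetic, not a claim about real-world throughput:

```python
def he_phy_rate(data_subcarriers=1960, bits_per_subcarrier=10,
                coding_rate=5 / 6, streams=4,
                symbol_us=12.8, guard_us=0.8):
    """Peak 802.11ax PHY data rate in bits/second for the given parameters."""
    bits_per_symbol = data_subcarriers * bits_per_subcarrier * coding_rate * streams
    return bits_per_symbol / ((symbol_us + guard_us) * 1e-6)

print(round(he_phy_rate() / 1e9, 1))  # → 4.8
```

Drop the channel to the 20MHz width recommended earlier and cut the streams to what a real phone supports, and the same formula lands far below the data-sheet figure, which is exactly the marketing-versus-reality point.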

As with 802.11ac, to reach the speeds marketing is telling us about we need two things: multiple bonded channels and clients that can support them. Let’s look at these one at a time.

802.11ac Wave 2 began to support 160MHz channels as well as Multi-User Multiple Input/Multiple Output (MU-MIMO) to carry multiple streams of data. That implementation yielded multi-user downlinks from the AP to clients; uplink traffic from client to AP, by contrast, was still a single client at a time. 802.11ax looks to improve this by allowing MU-MIMO APs to talk bi-directionally to up to 8 devices simultaneously, becoming almost ‘switch-like’ (I know, more buzzwords, sorry). The new amendment will also allow capable clients to take full advantage of MU-MIMO and use dual streams to an AP, potentially doubling the bandwidth to that client.

The best analogy I have seen of this so far: with 802.11ac there is an eight-lane road that funnels down to one lane, creating a bottleneck and allowing only a single car through at a time. This is how MU-MIMO worked previously with legacy uplink/downlink mechanisms. With 802.11ax, that one-lane road is extended to a full eight lanes, eliminating the bottleneck and allowing traffic to flow freely.

More to come on this subject soon.

 

 

Cisco Prime, What is it good for?

By now the majority of us have used some iteration of Prime, NCS, or WCS for wireless management: placing APs on maps, template building, backups, etc. But what else can Prime really do?

I recently did a project where we needed to integrate a new Prime instance with the standard CMX installs, which is a chore in and of itself (a post on that is coming), plus wireless management for the customer’s various buildings and some jobs to back up switch, router and ASA configs. There was then a larger project to push QoS to a large number of switches, around 1,000 or so. APIC-EM was attempted, but there was such a variety of switch models, chassis, IOS versions and QoS abilities that only about half the switches were supported in APIC-EM. Since we had just stood up the new Prime, it was decided to use Prime to push these configs to the switches. Let’s be totally honest before we begin: Prime was not built as a wired network management suite. It was built from the old WCS, pieces were added, and now we have this. It is not horrible, but it is not the best for wired either.

Fun now ensues.

Initial thoughts were to just push Auto-QoS to all switches; however, there was a requirement for more granularity. More fun begins. I set out writing config scripts in Prime for a couple of switch models to test on, the 4506-E and 4500X. Should be simple, right? Take a QoS config, put it in the template, select the switch and go. To write a script in Prime you need some knowledge of Apache Velocity templating commands, which can be a little confusing if you have not done coding previously, like myself. I was lucky and had someone who could write these scripts and teach along the way.

Some of the pitfalls along the way included the need to build in smarts to detect the switch platform so the proper commands were used, to check what version of code was on the switch, and to query the switch for port types and installed line cards. To accomplish this you first have to understand the Prime database structure and how to call the appropriate variables for what you need. This excerpt from the Prime 3.1 user guide is a good place to start to understand the variables and how to call them from inside the CLI config templates. Also see this Support Community post, which has some good info as well.

Now that we have our background info, we are ready to start jumping in and breaking, I mean writing, some scripts. This was a lot of trial and error for me, as we had to touch at least one version of each type of switch and verify we had the right CLI commands to enable QoS, since they differ between platforms and even between code trains within the same platform.

After a couple of false starts getting platform commands, interface commands and settings just right, we were able to get a working script for the first group of switches: the 4506-E, 4500-X and a test Nexus 7K. The platform check at the top of the script ended up looking like this:

#if ($Platform.contains("Data Center Switches"))

The trick is we had to match on the platform name, specifically “Data Center Switches”: if a sh platform is run on those switches, this is what is returned as the platform name. The reason we keyed off the platform was that it was easier, and seemed more stable, to call the platform type than to use the $Version.contains command to check IOS vs. IOS-X.

The policy-map configs for IOS-X go in that first branch. Then comes the #else branch, where we specify the non-IOS-X config elements (access-list, policy-map and class-map), followed by the interface loop:

#foreach ($interfaceName in $InterfaceNameList)
#if ($interfaceName == "GigabitEthernet0/0")
#else
int $interfaceName
service-policy output QOS-SHAPE
service-policy input QOS-MARK
#end
#end
#end

These are the lines where the magic really happens. This code queries the Prime DB for interfaces using $InterfaceNameList, then checks whether $interfaceName == "GigabitEthernet0/0", which is generally the management port on the switch. If the port has that name, we do not apply any QoS to it. To every other $interfaceName, we apply the service-policy config.

Gotcha #1 for me: make sure you account for all the #end statements you need. It becomes easy to lose track, and it will frustrate you when you import to Prime and try to test it the first time.

With this basic config, you can now customize based on switch type.

The next step is to get this config into Prime, if you didn’t write it there, and to make sure all our variables are working properly. After importing into Prime, the Form View tab and Add Variable tabs will be populated.

Our next post will cover deployment of the newly created script to either 1 or 1,000 switches, depending on the need.