Auto-Channel Timing and the Issues it can Cause

Every wireless network vendor has some form of Auto-RF management: RRM for Cisco, AirMatch (formerly ARM) for Aruba, and so on. Most of the industry relies on these features, in roughly 95% of installs, to handle power level changes and channel changes based on interference or utilization. But something I have noticed time and again is the number of installs that leave these Auto-RF algorithms running on their default timing values.

So the question is, why do we care about this?

When these features are used to control power, I typically do not see a big issue with using the default timing values for the algorithm. For channel assignment, however, I have seen lots of problems over the years with default values and the issues they can cause for clients.

What is auto-channel management?

Simply put, auto-channel management is exactly what it says: centralized, automatic management of the channels used in the network by an RF or mobility master. Each manufacturer has its own way of handling these changes, but the concept behind it is universal. We will look into each manufacturer’s way of doing it in another blog. This one is simply about how it generally works.

During normal operation of the wireless network, access points collect data about the RF environment from dedicated sensors, off-channel scanning, the RSSI values at which clients are seen, and neighbor messages from surrounding APs in the same RF group or neighborhood. This data includes client load and interference seen from radar, microwaves, Bluetooth, or other networks in the surrounding area.

All of this data gets sent back to the RF Master, typically the wireless controller on the network or a master controller that is handling these duties. This master then takes all of this data to make the calculations for the APs in the network for an optimized channel plan to help mitigate interference as much as possible.

Once this data is compiled on the master, the changes are sent back to the network based on anchor time and interval settings. By default, Cisco does this every 10 minutes starting at midnight. Aruba, by default, does this at 5 am local time from the Mobility Master. A common misconception I have run into over the years: just because RRM runs every 10 minutes does not mean that the channels are necessarily changing every 10 minutes.
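To make the anchor time and interval settings concrete, here is a minimal sketch of how the two values turn into a daily schedule. The function name is made up for illustration, the 10 minute interval at midnight is the Cisco default mentioned above, and the once-a-day run anchored at 2 am is just a hypothetical tuned setting, not a recommendation.

```python
from datetime import date, datetime, timedelta


def dca_run_times(day, anchor_hour, interval_minutes):
    """List every scheduled run on one calendar day, starting at the anchor hour."""
    current = datetime(day.year, day.month, day.day, anchor_hour)
    end_of_day = datetime(day.year, day.month, day.day) + timedelta(days=1)
    runs = []
    while current < end_of_day:
        runs.append(current)
        current += timedelta(minutes=interval_minutes)
    return runs


if __name__ == "__main__":
    default_runs = dca_run_times(date(2019, 1, 1), anchor_hour=0, interval_minutes=10)
    tuned_runs = dca_run_times(date(2019, 1, 1), anchor_hour=2, interval_minutes=24 * 60)
    print(len(default_runs), "opportunities for a channel change per day by default")
    print(len(tuned_runs), "opportunity per day with the hypothetical tuned setting")
```

Each entry is only an opportunity for the algorithm to apply a change, not a guaranteed channel change, which is the misconception called out above.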

 

Why is this an issue for clients?

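With the addition of 802.11h, the Information Elements carried in management frames now include Element ID 37 for the Channel Switch Announcement. Its layout, per the IEEE, is shown below.

Field:   Element ID | Length | Channel Switch Mode | New Channel Number | Channel Switch Count
Octets:  1          | 1      | 1                   | 1                  | 1

The Extended Channel Switch Announcement (Element ID 60) uses the same layout with one additional octet, the New Regulatory Class, between the Channel Switch Mode and the New Channel Number.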

The Channel Switch Announcement is sent from an AP that has been marked as needing to change channel by the AutoRF calculations. The important parts of the element are the Channel Switch Mode, New Channel Number and Channel Switch Count.

The Channel Switch Mode tells the clients on the AP that is scheduled to change channels whether they may keep transmitting until the change occurs. If this value is set to 1, the clients should cease transmitting data to the AP until the change has occurred, which will cause a disruption in communication for a short period until the change is complete. If the value is set to 0, there are no restrictions on the clients transmitting during the channel change.

The New Channel Number is pretty basic: this is the channel the AP will be on after the channel change is complete.

The Channel Switch Count is basically the countdown timer for the channel switch. If the count is set to 0, the channel change could occur at any time. If it is some other number, that is how many beacon intervals (TBTTs) remain before the change occurs.
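If you want to see these fields in a capture, the element is easy to pick apart by hand. The following is a minimal sketch, not a full 802.11 parser. It assumes the raw element bytes have already been extracted from a beacon or action frame, and that the basic announcement (Element ID 37) carries a three-octet body while the extended one (Element ID 60) carries four octets with the class field second. The function and key names are my own.

```python
def parse_channel_switch(element: bytes) -> dict:
    """Decode a Channel Switch Announcement (ID 37) or its extended form (ID 60)."""
    if len(element) < 2:
        raise ValueError("element needs at least an ID octet and a length octet")
    element_id, length = element[0], element[1]
    body = element[2:2 + length]
    if len(body) != length:
        raise ValueError("element is truncated")

    if element_id == 37 and length == 3:
        mode, new_channel, count = body
        regulatory_class = None  # the basic element has no class field
    elif element_id == 60 and length == 4:
        mode, regulatory_class, new_channel, count = body
    else:
        raise ValueError("not a channel switch announcement element")

    return {
        "clients_must_stop_transmitting": mode == 1,
        "regulatory_class": regulatory_class,
        "new_channel": new_channel,
        "beacons_until_switch": count,
    }


if __name__ == "__main__":
    # Hypothetical element: AP moving to channel 6 in 5 beacons, clients told to go quiet.
    print(parse_channel_switch(bytes([37, 3, 1, 6, 5])))
```

A mode of 1 combined with a large count is the case most likely to be felt by users, since clients are asked to stay quiet for the whole countdown.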

So with this very basic overview, why does it matter to a client?

In wireless networking a client’s channel is based on the AP it is connected to. If the client is connected to an AP on channel 11, the client will communicate on channel 11. But again, why does this matter?

When an AP changes channels based on RRM calculations, every client associated to that AP must change as well. So when our AP that was on channel 11 changes to channel 6, every client associated to that AP needs to move to channel 6 based on the Channel Switch Announcement and the values within that element. Depending on the Channel Switch Count, a client that is downloading a file, making a video call, or just doing basic online tasks would see a disruption. It could be very brief, but it depends on how long it takes the client to reassociate or roam to the AP’s new channel. With time-sensitive applications this can show up as jitter, lag, or even just slowness on the network. This can equate to the dreaded, “The network sucks right now”.

Back to the opening: if the default interval is set to, say, 10 minutes, a network that is seeing interference from surrounding wireless networks, high channel overlap, low RSSI values, etc. could change channels on an AP that frequently. Clients connected to these APs are then changing channels as often as every 10 minutes as well, which could be mistaken for a small service disruption or just poor network quality. This topic will be looked at in depth in a coming post.

In the next post we will look at some other issues this constant changing of channels can produce, as well as how a couple of different manufacturers handle AutoRF within their products.

 

Networking Field Day 20 Recap – Juniper is Hedging Their Bets

During Networking Field Day 20, which just wrapped on February 15, 2019, there was a most unexpected presentation from Juniper around automation and some of the things they are doing to hedge their bets on where the industry is moving over the next several years.

 

Mike Bushong (@mbushong) took the stage first for the team and laid out Juniper’s vision of where the industry is headed. He also gave a warning to some of us old guys: either evolve your skills and be ready to leave the CLI behind, or you will be left behind. Automation is no longer a buzzword in our industry; it is the here and now. If you are only now looking into automation, looking to learn it, or even just trying to understand what automation means, you are already behind. As Mike pointed out in his Networking Field Day 20 talk, Juniper has led the way in automation for quite some time, but we are now at a tipping point where CLIs are going to be a thing of the past very soon. Mike also made something very clear during his intro that had a few of us in the room scratching our heads: the tools Juniper is putting out are open to the public, not all of them are Juniper specific, and Juniper is getting no monetary value back from them. It is for a greater cause that benefits us all: a fundamental shift in the industry that needs to take place. And I truly must agree with Mike on this. We as engineers have to start getting better at this or we will be left behind.

Next, Raunak Tibrewal (@raunaktib) took over the mic and introduced us to Juniper’s new EngNet site and portal. EngNet is built around three pillars that help an engineer prepare for and learn automation: Learn, Build, and Explore. One of the things that impressed me is that it was built with community in mind. Juniper has a dedicated Slack channel for community support, as well as J-Net, to make this a very collaborative and open learning experience. I connected to the EngNet site and was pleasantly surprised by the content and how it was laid out, and really shocked at the amount of content available. Right up front you can sign up for the Slack channel, and as you continue down there is a nice roadmap to get you going no matter what your level might be. Obviously a lot of this content is around Junos OS, but there are some vendor-agnostic lessons as well. I think two of the coolest features are the Automation Exchange and the Learn area. The Automation Exchange has readily available Ansible playbooks, NAPALM scripts, and other goodies, all sortable and searchable by Type, Market Segment, Network Role, and Operational Process. The Learn area is the final piece that brings this all together: there you can follow Assisted Learning via different options or follow the Self-Learning track. Most everything within EngNet is free, but there are some items that come with a free trial of 60 days or so and then need to be paid for. All in all, this is a great place to start if you are looking to get into Junos OS, to learn through some open labs, or even just to see what others have done for automation.

The final presentation came from Matt Oswalt (@mierdin), who unveiled the Juniper NRE Labs platform. Matt started by building on what we had already heard from Mike: automation in our industry today is not a production-side problem but a consumption-side problem. The tools are there and the technology is there, but people are not consuming them. To try to help solve this consumption problem, Juniper has released NRE Labs, which is a “Community Platform for learning and teaching automation and Network Reliability Engineering”. Basically, they have put out a totally free (you do not even need to supply an email address), browser-based platform for learning vendor-neutral automation using tools such as YAML, Python, REST APIs, Git, Linux, and so on. It starts with fundamentals if you are just getting your feet wet in automation or coding. Then there are tools available to try out, like Salt, NAPALM, or Ansible. All of this runs natively in the browser with no need to download anything. The lessons are customizable based on your current strengths and weaknesses, so the tasks match your current knowledge level and you get a roadmap for learning with links. One of the coolest things Matt and the team have done is to enable the use of Jupyter notebooks in the learning. Basically, this gives you a Python interpreter running in real time, so you can see the output in your browser window as you run the code. There is so much that has been done by the team on this. I would suggest you go check it out and see for yourself the greatness that is there.

What Juniper has been working on to enable users to actually consume the tools and automation that are out in the industry is really amazing, especially the fact that, in the case of NRE Labs, they are not looking to monetize it. This is huge in my opinion, and in the long run it could actually help Juniper, given their product set and their products’ strong reliance on automation.

 

Check out the presentations from Networking Field Day 20 here.

Cisco RRM Restart

Recently, when working with Cisco wireless networks, I have been working to get Dynamic Channel Assignment (DCA) tuned in and to understand much more about it. Some of the important things to set correctly include Anchor Time and DCA Interval (please don’t use the default, there is a blog post coming about that).

One thing that became an option via CLI in the 7.3 code train was the ability to restart the RRM DCA process on the RF Group Leader. I can hear some of you asking why this is important, or why you would want to do this. Here are a couple of examples.

If a controller enters or leaves an RF Group, or if the RF Leader leaves and comes back online, as in a reboot, DCA will automatically enter startup mode to reconfigure itself regardless of the settings that have been changed on the controller, i.e. even when the default 10 minute interval is not in use. But is there a need to do this manually? Yes.

As you add new APs into the network it is a good idea, and a Cisco recommendation, to initialize DCA startup mode. The reasoning behind this is as APs are added, DCA needs to rerun calculations and provide a more optimized channel plan based on the newly added APs and what the other APs are seeing over the air. When this command is run, it should be done from the RF Leader and will only affect the RF Leader.

The command should be run on both 2.4 GHz and 5 GHz radios:

2.4 GHz: config 802.11b channel global restart

5 GHz: config 802.11a channel global restart

802.11ax The Future Begins

The networking industry is full of buzzwords and hype; A.I., M.L., SDWAN and virtual everything. This is even more evident in the world of wireless networking; claims of speeds up to 1 Gbps, wired-like connectivity, mobility first, future-proofing and on and on. It all reminds me of one of my favorite Queen songs Radio Gaga, “All we hear is radio ga ga, Radio blah blah, Radio, what’s new?”.

The new 802.11ax amendment (not yet standard, thanks TheITRebel), or WiFi6 as it is now being called, is slated to be ratified later in 2019. This is causing all kinds of hype in some circles and not so much in others yet as end-user computing devices will probably not have chipsets to support 802.11ax until maybe the end of 2019. Looking forward, more full adoption will probably not happen until 2020 or even as late as 2021.

What is 802.11ax?

802.11ax will build on the features that the 802.11ac, or WiFi5, standard gave us, as well as adding some cool new things to help with the ever-growing demand on wireless networks. From a desire for mobility-first networks to the cellular offloading that is wanted (and sometimes needed) by the carriers, 802.11ax has its work cut out for it.

802.11ac gave us some significant improvements with additional channel widths in the 5GHz space, allowing 80MHz channels in Wave 1 and 160MHz channels in Wave 2 and giving higher bandwidth to user devices that had the chipset to support it. The drawback is that with 80MHz and 160MHz channels we take the available 5GHz channels from a total of 24 down to 5 or 1 available non-overlapping channels, depending on the usage of DFS channels. This makes it much harder to channel plan in an enterprise or LPV style of deployment, so I still recommend using 20MHz channels, or perhaps 40MHz if done properly. However, with this style of deployment the whopping 1.3Gbps touted by the marketing folks cannot be met, even when using 3×3 spatial streams. Again, an example of more hype that really is not too useful outside of a small business or home deployment.

802.11ax can achieve throughput speeds of up to 4.8Gbps according to the data sheets and marketing put out so far. But how can we get to those speeds?
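The headline numbers fall out of fairly simple PHY arithmetic, sketched below. This is a back-of-the-envelope calculation under stated assumptions, not vendor math: it assumes the best-case modulation and coding for each amendment, the shortest guard interval, and data subcarrier counts taken from the 802.11ac and 802.11ax PHY definitions. Reading the 4.8Gbps figure as four spatial streams on a 160MHz channel is my own inference from the arithmetic.

```python
# Data subcarriers per channel width in MHz (assumed from the 802.11ac/ax PHY definitions).
VHT_DATA_SUBCARRIERS = {20: 52, 40: 108, 80: 234, 160: 468}
HE_DATA_SUBCARRIERS = {20: 234, 40: 468, 80: 980, 160: 1960}


def phy_rate_mbps(data_subcarriers, bits_per_subcarrier, coding_rate, symbol_us, streams):
    """Data bits carried per OFDM symbol on all streams, divided by the symbol time in microseconds."""
    return data_subcarriers * bits_per_subcarrier * coding_rate * streams / symbol_us


def vht_rate_mbps(width_mhz, streams):
    # 802.11ac best case: 256-QAM (8 bits), rate 5/6 coding, 3.6 us symbol (short guard interval).
    return phy_rate_mbps(VHT_DATA_SUBCARRIERS[width_mhz], 8, 5 / 6, 3.6, streams)


def he_rate_mbps(width_mhz, streams):
    # 802.11ax best case: 1024-QAM (10 bits), rate 5/6 coding, 13.6 us symbol (0.8 us guard interval).
    return phy_rate_mbps(HE_DATA_SUBCARRIERS[width_mhz], 10, 5 / 6, 13.6, streams)


if __name__ == "__main__":
    for width in (20, 40, 80):
        print(f"802.11ac, {width} MHz, 3 streams: {vht_rate_mbps(width, 3):.0f} Mbps")
    print(f"802.11ax, 160 MHz, 4 streams: {he_rate_mbps(160, 4):.0f} Mbps")
```

With three streams, 802.11ac only reaches 1.3Gbps on an 80MHz channel. At 20MHz and 40MHz the same radio tops out at roughly 289Mbps and 600Mbps, and those are PHY rates, so real throughput is lower still.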

As with 802.11ac, to reach the speeds marketing is telling us about we need two things, multiple bonded channels and clients that can support it. Let’s look at these one at a time.

802.11ac Wave 2 began to support 160MHz channels as well as Multi-User Multiple Input/Multiple Output (MU-MIMO) to support multiple streams of data. This implementation yielded multi-user downlinks from the AP to the clients. Uplink traffic from the clients to the AP, by contrast, is still a single client at a time. 802.11ax looks to improve this by allowing MU-MIMO APs to talk bi-directionally to up to 8 devices simultaneously and to become almost ‘switch-like’ (I know, more buzzwords, sorry). The new standard will also allow capable clients to take full advantage of MU-MIMO and to use dual streams to an AP, which would potentially double the bandwidth to that client.

The best analogy I have seen for this so far: with 802.11ac there is an eight-lane road that funnels down to one lane, creating a bottleneck and allowing only a single car through at a time. This is how MU-MIMO worked previously with legacy uplink/downlink mechanisms. Now with 802.11ax that one-lane road is extended to a full eight lanes, eliminating the bottleneck and allowing traffic to flow freely.

More to come on this subject soon.

 

 

To Predict or not to Predict, that is the question…

In the world there are many questions that polarize us all: did Han shoot first, Kirk or Picard, Left Twix or Right Twix. But the most important question of them all: should predictive designs exist? If you follow the wireless community, this is probably the most polarizing topic, right up there with whether lower data rates should be enabled or not.

 

Designing wireless is one of the most challenging things we do. We receive a set of drawings and put the Solo cup down and start drawing circles. Wait, bad flashback. These were the good old days. We would draw our circles, place APs and then go on site and verify locations and take some survey readings with an AP on a stick to verify all looks good, what does the spectrum look like, are there interferers in the area.

 

Today we still draw circles, but they are really cool looking ones using Ekahau typically. We draw walls that can help us predict what loss may occur from walls, doors, etc. Then we go on site and take readings with the same software and an AP on a stick to make sure those pretty circles match. But why do it ahead of time and not when you are on site?

 

I have had many instances over the years where a predictive survey was all I was able to do. The customer would not sign-off on doing an on-site active survey because of the disruption it may cause, the building has not been built yet, or just no budget for it in the project.

 

I have also had the opposite where a customer would tell me that they saw no reason for a predictive and the coverage they had was ‘good enough’. But is it?

 

With the stuff we are putting on wireless today, can we really be OK with just good enough? In a large portion of organizations, we have gone from wireless being a nice-to-have to a wireless-first strategy. This includes VoWiFi using Skype or some other demanding application/protocol. How do we handle this without trying to do some kind of prediction? Are we to just install the network and then do a remediation at additional cost after things blow up quicker than Lee Badman’s temper when they take the all-you-can-eat steak away?

 

With tools like Ekahau, and no they are not paying me, but they have awesome swag, you can do predictions based on applications, number of users, and device types. We no longer need the Solo cups, oh what? The keg just got tapped…

 

But all joking aside, is it really worth us guessing and throwing APs up, then coming back and doing remediations after the fact to make sure we handle the new generation of wireless networks appropriately? Or should we just do the extra work up front so we have an idea of what we are walking into? The reports we can provide ahead of time, as well as the comparison to post-installation surveys, are invaluable in this blogger’s opinion, and I will continue to fight for them as long as I do wireless.

 

 

Cisco Prime – This is what it is good for. Part 2

In the previous post the scripting needed for multi-linecard switches like the 6500 was discussed. In this post we will finally deploy the configs we have created through our scripts using the Prime Deployment function.

To start we simply go to our config template and open it in Prime. We can see the script in the bottom pane of the screen and the Deploy button is available at the top of the page.

Once we click Deploy we are presented with a screen to select the switches to which we want to deploy the configs.

To filter by a specific switch name or prefix, hit the filter icon and enter the name. As devices are selected with the checkbox, they are added to the Devices to Deploy area. When all devices to deploy are selected, click Next.

The next area is the Workflow screen. We did not do anything in this area and just clicked Next.

This then displays the devices selected, and we can now see the form created when the script was written. This is where, as in the case of the 6500, linecards can be selected. This area also has an option in the right corner to check the CLI commands against the selected device to make sure the commands are compatible.

After clicking Next we are presented with the Deployment Options area. We used two deployment methods, On-Demand and Scheduled.

For On-Demand, select the Now radio button and then click Next. If the deployment is to be scheduled for another date and time, use the Date radio button and select the appropriate Date/Time. Be careful, as this is the Date/Time of the server. If your server is centralized in a data center and your site is in another time zone, this needs to be taken into account.
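The time zone math is easy to get wrong at 2 am, so here is a minimal sketch of the conversion from the site’s maintenance window to the value you would enter on the server. The function name and the UTC offsets in the example are hypothetical, and fixed offsets ignore daylight saving time, so treat this as a sanity check rather than a scheduling tool.

```python
from datetime import datetime, timedelta, timezone


def to_server_time(site_local, site_utc_offset_hours, server_utc_offset_hours):
    """Convert a wall-clock time at the site into wall-clock time on the server."""
    site_tz = timezone(timedelta(hours=site_utc_offset_hours))
    server_tz = timezone(timedelta(hours=server_utc_offset_hours))
    aware = site_local.replace(tzinfo=site_tz)
    # Drop the tzinfo again so the result reads like the value typed into the scheduler.
    return aware.astimezone(server_tz).replace(tzinfo=None)


if __name__ == "__main__":
    # Hypothetical example: site at UTC-8 wants a 2:00 am change, server sits at UTC-5.
    window = datetime(2018, 1, 28, 2, 0)
    print("Schedule on the server for:", to_server_time(window, -8, -5))
```

For real scheduling across daylight saving boundaries, named zones (for example via Python’s zoneinfo module) are the safer choice.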

There are a couple other options at the bottom of this screen that help to make sure we do not lose our config that we have worked on so hard, Copy Running Config to Startup and Archive Config After Deployment. These are fairly self-explanatory, but the second option is used if you are archiving your device configurations to the Prime Server for back-ups.

Once we click Next we get the final Deploy verification screen, this is our last point of turning around. Once Deploy is clicked the job will begin running in Prime and we can only abort it in the Job Dashboard.

At this point, sit back and have some coffee or something stronger, and wait to see the job complete in Job Dashboard. Depending on the number of devices the config is being pushed to and how large the config ended up being, this can take 20 minutes or more to complete. You can keep an eye on it in Job Dashboard and make sure the deployment succeeds on all devices.

Some gotchas that gave us a little grief.

Portchannels. Depending on the model of switch, port channels need the input part of the port config applied to the physical interfaces and the output part applied to the port channel. We did this manually, as it was easier and these were few and far between, but with testing you could add this part to your script.

Random Errors. We would occasionally receive an error that a timeout occurred while pushing the config to the switch. After doing research and looking at the actual switch, we determined that the config had actually been pushed, and we never really figured out why this error would occur. If anyone else has seen this and has any further info, please let me know and I will update this post with it.

With that we complete the look at using Cisco Prime to push QoS configs to ~1,000 switches in the wild. I genuinely hope this helps some other folks out there and provides some info to all.

Look for more coming soon.

Cisco Prime – This is what it is good for. Part 1

In the last post we looked briefly at a scripting sample for adding QoS commands to IOS-XE and IOS switches using Prime Infrastructure. To recap, we were looking to push QoS policies to ~1,000 switches of various models, IOS versions, and even line cards. Using APIC-EM was not an option, as only about half the switches were supported, because of an old platform, IOS version, or other issues. Prime was selected since it had just been stood up for the wireless implementation and could push to all the various switch types, from 2960 to Nexus 7K.

With the scripts we needed to take into account the platform of the switch, the IOS version, and the linecards, as previously mentioned. This process has to combine automation through Prime with manual intervention: someone has to know which linecard is installed in each slot of the switch so it can be selected from a drop-down of available cards.

Last time we looked at a basic IOS config for QoS. So how do we handle a 6500 series with a variety of linecards? Below is a sample of how this had to be handled.

The first thing, as with the previous script, is that we have to query the Prime DB structure and set the variables for the slots on the switches.

<parameter-metadata>
    <param-group cliName="cli command set" isMandatory="true" name="Deploy_QoS_Cat6500 parameters">
        <description>Parameters for Deploy_QoS_Cat6500</description>
        <parameter name="slot1">
            <description>Line Card Slot 1 Type</description>
            <default-value label="Select the appropriate line card type or none for slot 1">None</default-value>
            <default-value>6148</default-value>
            <default-value>6524</default-value>
            <default-value>6704</default-value>
            <default-value>6724</default-value>
            <default-value>6748</default-value>
            <default-value>6824</default-value>
            <default-value>6848</default-value>
            <data-type>Dropdown</data-type>
            <mandatory>true</mandatory>
            <isGlobal>false</isGlobal>
            <syntax>
                <pattern/>
            </syntax>
            <isGlobalVariable>false</isGlobalVariable>
        </parameter>

 

This has to be done for each possible linecard slot, depending on the model. We went all the way to 13 because the customer had a number of 6513 chassis.
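Writing thirteen nearly identical parameter blocks by hand invites copy-and-paste mistakes, so one option is to generate them. The sketch below is my own helper, not something Prime provides. It rebuilds the slot parameter shown above with Python’s standard xml.etree.ElementTree, and the function names and the default of 13 slots are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Line card choices copied from the slot 1 dropdown above.
LINE_CARDS = ["6148", "6524", "6704", "6724", "6748", "6824", "6848"]


def slot_parameter(slot):
    """Build one <parameter> element for the given chassis slot number."""
    param = ET.Element("parameter", {"name": f"slot{slot}"})
    ET.SubElement(param, "description").text = f"Line Card Slot {slot} Type"
    label = f"Select the appropriate line card type or none for slot {slot}"
    ET.SubElement(param, "default-value", {"label": label}).text = "None"
    for card in LINE_CARDS:
        ET.SubElement(param, "default-value").text = card
    ET.SubElement(param, "data-type").text = "Dropdown"
    ET.SubElement(param, "mandatory").text = "true"
    ET.SubElement(param, "isGlobal").text = "false"
    syntax = ET.SubElement(param, "syntax")
    ET.SubElement(syntax, "pattern")
    ET.SubElement(param, "isGlobalVariable").text = "false"
    return param


def all_slot_parameters(slot_count=13):
    """Return the parameter blocks for slots 1..slot_count as one string."""
    blocks = [ET.tostring(slot_parameter(n), encoding="unicode") for n in range(1, slot_count + 1)]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(all_slot_parameters())
```

The output is unindented XML that would still need to be pasted inside the param-group element of the template and checked in Prime before use.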

Next we get to the meat of the QoS config that will be applied to the ports.

mls qos

mls qos map cos-dscp 0 10 18 26 34 46 48 56

 

##Queuing command structure

 

#set ( $OnePSevenQEightT = "wrr-queue queue-limit 10 25 10 10 10 10 10
wrr-queue bandwidth 1 25 4 10 10 10 10
priority-queue queue-limit 15
wrr-queue random-detect 1
wrr-queue random-detect 2
wrr-queue random-detect 3
wrr-queue random-detect 4
wrr-queue random-detect 5
wrr-queue random-detect 6
wrr-queue random-detect 7
wrr-queue random-detect max-threshold 1 100 100 100 100 100 100 100 100
wrr-queue random-detect min-threshold 1 80 100 100 100 100 100 100 100
wrr-queue random-detect max-threshold 2 100 100 100 100 100 100 100 100
wrr-queue random-detect min-threshold 2 80 100 100 100 100 100 100 100
wrr-queue random-detect max-threshold 3 100 100 100 100 100 100 100 100
wrr-queue random-detect min-threshold 3 80 100 100 100 100 100 100 100
wrr-queue random-detect max-threshold 4 100 100 100 100 100 100 100 100
wrr-queue random-detect min-threshold 4 80 100 100 100 100 100 100 100
wrr-queue random-detect max-threshold 5 100 100 100 100 100 100 100 100
wrr-queue random-detect min-threshold 5 80 100 100 100 100 100 100 100
wrr-queue random-detect max-threshold 6 100 100 100 100 100 100 100 100
wrr-queue random-detect min-threshold 6 80 100 100 100 100 100 100 100
wrr-queue random-detect max-threshold 7 100 100 100 100 100 100 100 100
wrr-queue random-detect min-threshold 7 100 100 100 100 100 100 100 100
wrr-queue cos-map 1 1 1
wrr-queue cos-map 2 1 0
wrr-queue cos-map 3 1 2
wrr-queue cos-map 4 1 3
wrr-queue cos-map 5 1 6
wrr-queue cos-map 6 1 7
wrr-queue cos-map 7 1 4
priority-queue cos-map 1 5" )

This is one example of the different structures that need to be created, each based on the linecard model and what the card will support for commands and QoS. If you are new to this, as I was, the name $OnePSevenQEightT seems confusing, but being a great cryptographer, you can quickly decipher it. OneP = One Priority, SevenQ = Seven Queues and EightT = Eight Thresholds.
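If you end up with a pile of these variable names, the naming scheme is regular enough to decode mechanically. Here is a small sketch that does so. The helper name and the word table are my own, and the table only covers number words that appear in the variable names used in these scripts.

```python
import re

# Number words seen in the queue-structure variable names in this post.
NUMBER_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Seven": 7, "Eight": 8}

NAME_PATTERN = re.compile(r"\$?([A-Z][a-z]+)P([A-Z][a-z]+)Q([A-Z][a-z]+)T")


def decode_queue_name(name):
    """Turn a name like $OnePSevenQEightT into its priority/queue/threshold counts."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized queue structure name: {name}")
    priority, queues, thresholds = (NUMBER_WORDS[word] for word in match.groups())
    return {"priority_queues": priority, "queues": queues, "thresholds": thresholds}


if __name__ == "__main__":
    print(decode_queue_name("$OnePSevenQEightT"))
```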

Now that we know the linecard models and have built the configs for the actual QoS commands, we can start on the interface configs for each slot.

## !---INTERFACE CONFIG for slot 1:

#if ( $slot1 == "6704" )
    #set ( $port_range = "Te1/1-4" )
    int range $port_range
        $OnePSevenQEightT

#elseif ( $slot1 == "6708" )
    #set ( $port_range = "Te1/1-8" )
    int range $port_range
        $OnePSevenQFourT

#elseif ( $slot1 == "6724" || $slot1 == "6824" )
    #set ( $port_range = "Gi1/1-24" )
    int range $port_range
        $OnePThreeQEightT

#elseif ( $slot1 == "6748" || $slot1 == "6848" )
    #set ( $port_range = "Gi1/1-48" )
    int range $port_range
        $OnePThreeQEightT

#elseif ( $slot1 == "6524" )
    #set ( $port_range = "Gi1/1-24" )
    int range $port_range
        $OnePThreeQOneT

#elseif ( $slot1 == "6148" )
    #set ( $port_range = "Gi1/1-48" )
    int range $port_range
        $OnePTwoQTwoT

#elseif ( $slot1 == "None" )
#end

In this code we are looking at each slot, #if ( $slot1, and we have to build a config for the slot for each possible linecard that could be installed, because each takes a different command or queueing structure, as we built in the first set of code.

The linecard model is then specified, == "6704" ). You may be asking, ‘Nick, why does this even matter? That seems like a lot of extra code I just really don’t want to deal with.’ It does matter, since each linecard model may have a different number of ports and even a different type of port. We cannot apply configs to a Gig interface when the linecard is a TenGig card. We also have to account for the case where the slot is not actually populated; you can’t really put a config on a card that is not installed. It is painful but needed. Copy and Paste is your friend, but be careful to make sure the slot number gets updated each time.
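If copy and paste across 13 slots makes you nervous, the per-slot blocks can also be generated so the slot number can never be missed. This is a minimal sketch of that idea, not part of the original scripts. The profile table is transcribed from the slot 1 block above, and the function name and output layout are my own choices.

```python
# (line card models, interface prefix, port count, Velocity variable holding the queue config)
CARD_PROFILES = [
    (("6704",), "Te", 4, "OnePSevenQEightT"),
    (("6708",), "Te", 8, "OnePSevenQFourT"),
    (("6724", "6824"), "Gi", 24, "OnePThreeQEightT"),
    (("6748", "6848"), "Gi", 48, "OnePThreeQEightT"),
    (("6524",), "Gi", 24, "OnePThreeQOneT"),
    (("6148",), "Gi", 48, "OnePTwoQTwoT"),
]


def slot_block(slot):
    """Return the Velocity #if/#elseif chain for one chassis slot."""
    lines = [f"## !---INTERFACE CONFIG for slot {slot}:"]
    for index, (models, prefix, ports, queue_var) in enumerate(CARD_PROFILES):
        keyword = "#if" if index == 0 else "#elseif"
        condition = " || ".join(f'$slot{slot} == "{model}"' for model in models)
        lines.append(f"{keyword} ( {condition} )")
        lines.append(f'    #set ( $port_range = "{prefix}{slot}/1-{ports}" )')
        lines.append("    int range $port_range")
        lines.append(f"        ${queue_var}")
    # An empty slot matches "None" and gets no interface config at all.
    lines.append(f'#elseif ( $slot{slot} == "None" )')
    lines.append("#end")
    return "\n".join(lines)


if __name__ == "__main__":
    print("\n\n".join(slot_block(slot) for slot in range(1, 14)))
```

Because every block is produced from the same table, adding a new linecard model means one new row rather than thirteen edits, and each generated block carries exactly one #end.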

At this point, just make sure you have the correct number of #end statements, and don’t forget to close the clicommand.

We will now move on to Deployment of the configs we have created.