Wi-Fi Issues Fall 2021

Current Resolution Efforts as of Oct. 29, 2021

Ongoing monitoring of the system continues after changes made late last week and early Monday morning resulted in significant improvements to campus-wide Wi-Fi performance. Additional hardware has been ordered to further augment the capacity of the Wi-Fi authentication system and will be installed once it is delivered (pandemic-related supply-chain issues have affected the planned delivery date, which is now Nov. 17). Additional sensors are being acquired and will be installed across campus to improve ongoing monitoring and detection of Wi-Fi issues.

Why are we having Wi-Fi issues?

We started Wi-Fi improvements in FY 2018, moving from obsolete Cisco Wi-Fi equipment to new Aruba equipment. The Aruba platform was selected through an RFP to provide a better Wi-Fi experience across campus, especially in large classroom environments. Prior to the start of the pandemic in 2020, Aruba equipment had been deployed in the majority of instructional buildings on campus and was operating in a stable manner (this map shows the locations that have migrated to Aruba equipment, highlighted in green). Because available funding was prioritized for improving classroom spaces, the buildings that mainly house administrative functions and the sports/entertainment venues remain on the obsolete (7-10 year old) system.

While diagnosing the widespread start-of-semester Wi-Fi issues, we discovered that vendor software updates in late 2019 introduced bugs that cause system performance issues under peak load and high user roaming activity. Unfortunately, when COVID hit and faculty, students, and staff went fully remote, the resulting low load on the system kept these bugs from being detected. As people returned to campus in large numbers this Fall and classroom activity ramped up, the bugs manifested, causing a painful and disruptive experience for a large number of faculty, students, and staff who use campus Wi-Fi. Although these issues were present around the world in Aruba equipment for many months, UC Berkeley was one of the first customers to detect them, due to our large environment and the load that we started placing on the system (many other college campuses were similarly affected). We engaged with the highest levels of vendor management and obtained their urgent attention, both for resolution of the issues and for prevention of future problems.

Initial resolution efforts improved the stability of the underlying infrastructure, but they did not fully resolve the issues experienced by faculty and students connecting to Wi-Fi on campus. The initial issue manifested as Wi-Fi access points losing connection to the system at peak times and in various locations, which caused devices to lose connection and be unable to reconnect. Mitigating this issue improved overall connection stability for devices that were able to connect. Subsequent analysis showed that, during peak times, devices were still experiencing delays of 15 minutes or longer when connecting. Over the past several weeks, additional changes have been implemented that have largely addressed these issues.

Issues & Response Summary

When IST started diagnosing the problem, it quickly became apparent that the situation required vendor involvement, so we engaged with Aruba to help troubleshoot and diagnose the issues. Workarounds and fixes were implemented to stabilize the system and, while they fixed the problem identified, they also uncovered additional issues. Further workarounds were implemented to stabilize the system, and the vendor continued to work on a permanent resolution to the bug.

As IST proceeded with troubleshooting, they realized that the data captured through monitoring was not providing a complete picture of the actual user experience, so they dispatched personnel to perform on-the-ground testing at some of the hardest-hit areas to better understand what was occurring. Sensors that simulate user activity were also installed to help measure user experience. This work continued throughout September and October, resulting in additional tuning while vendor work on the permanent fix continued.

Ongoing troubleshooting and analysis uncovered performance issues in the Wi-Fi authentication system, which were exacerbated by roaming protocol settings that required devices to re-authenticate when moving between Access Points. Implementation of 802.11r ‘fast roaming’ protocols, installation of the permanent fix to the vendor software bug, and several authentication system tuning changes have resulted in much-improved Wi-Fi performance campus-wide.
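
To illustrate why requiring re-authentication on every move between Access Points adds load, the following is a rough back-of-the-envelope sketch in Python. It assumes a simplified model: each device performs one full authentication when it first joins, and without fast roaming it performs another on every move between Access Points, while with fast roaming those moves do not reach the authentication servers. The function name and all figures are hypothetical and are not measurements from the campus network.

```python
def full_auths_per_hour(clients: int, roams_per_client: int, fast_roaming: bool) -> int:
    """Estimate full authentications reaching the authentication servers in one hour.

    Simplified model (an assumption, not the vendor's behavior in detail):
    every client authenticates once when it joins, and without fast roaming
    it authenticates again on every move between Access Points.
    """
    initial = clients
    roam_auths = 0 if fast_roaming else clients * roams_per_client
    return initial + roam_auths


# Hypothetical figures, chosen only for illustration.
clients = 10_000
roams_per_client = 6

without_fast_roaming = full_auths_per_hour(clients, roams_per_client, fast_roaming=False)
with_fast_roaming = full_auths_per_hour(clients, roams_per_client, fast_roaming=True)

print(f"without fast roaming: {without_fast_roaming} full authentications/hour")
print(f"with fast roaming:    {with_fast_roaming} full authentications/hour")
```

Under these assumed numbers, the roaming-driven authentications outnumber the initial ones six to one, which is the kind of load the 802.11r change was intended to remove.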

Issues & Response Timeline

  • Pre-March 2020 - Just prior to the start of the pandemic, changes were made in the vendor software that controls and manages device/AP/controller connections on the Wi-Fi management system. These changes introduced flaws in how the system handles those connections under high load in very large, complex environments like UCB. The issue remained largely dormant during the pandemic because of the extremely low population on campus (and in other similarly large Wi-Fi environments), but became obvious during the first full week of instruction. Other higher education customers of the vendor who were remote during the pandemic and returned to campus instruction in the Fall experienced the same issues.

  • Aug. 30 - Large-scale issues were first detected, with Access Points losing connection to the system and causing large numbers of devices to also lose connection, and IST started diagnosing the problem. It quickly became apparent that this was not something that had previously been experienced, so IST engaged with the Wi-Fi vendor to troubleshoot and diagnose the issues. Over the next two days, workarounds were identified and implemented to stabilize the system. System stability was observed from the afternoon of Sept. 2 through Sept. 7.
  • Sept. 8 - A software fix from the vendor was applied to the system, and while it fixed the problems identified, it uncovered an additional issue. Additional mitigations were implemented to stabilize the system that afternoon.
  • Sept. 9 - An instructor helpline was opened with extended hours of operation to assist with any Wi-Fi issues impacting instruction. Additional workarounds were implemented to stabilize the system, and it has remained stable since then. The vendor continued to work on a permanent fix, and IST worked with them on a plan to test, validate, and schedule the remaining fix in a way that minimizes or eliminates any further large-scale disruption to instruction.
  • Aug. 30 to Sept. 13 - As IST proceeded with troubleshooting, they realized that the data captured through system monitoring was not providing a complete picture of the actual user experience, so they dispatched personnel to perform on-the-ground testing at some of the hardest-hit areas to fully understand what was occurring and to enhance troubleshooting efforts.
  • Sept. 13 to Oct. 4 - Additional analysis of gathered data and inspection of system settings revealed that a system setting needed to be changed. This setting concerns the 802.11r ‘fast roaming’ protocol, and the change is expected to reduce authentication traffic and improve user experience.

  • Oct. 1 - Wi-Fi authentication traffic for CalVisitor routed to virtual servers to offload that traffic from the AirBears2/Eduroam authentication servers. Two sensors added on campus to measure and detect user experience issues and assist with troubleshooting efforts.

  • Oct. 7 - 802.11r ‘fast roaming’ protocol implemented in the early morning, with observed positive impact and elimination of ‘timeout errors’ in the system. Additional tuning of system settings implemented in the early afternoon, resulting in a reduction of Wi-Fi ‘association errors’ and overall improved Wi-Fi connection performance.

  • Oct. 8 - Three additional sensors added on campus to provide additional user experience data.

  • Oct. 12-14 - Permanent fix received from the vendor and installed to resolve the Wi-Fi controller software bug.

  • Oct. 15 - Wi-Fi authentication tuning changes implemented to reduce system congestion and improve system performance.

  • Oct. 18 - Additional vendor-recommended changes implemented to improve the performance of the authentication system.

  • Oct. 21 - Re-enabled monitoring that had been disabled to avoid system overload while the permanent fix was being developed. Added temporary capacity to Wi-Fi authentication by adding virtual machine resources.
  • Oct. 27-29 - Additional tuning changes implemented to improve authentication system performance and reduce device connection wait times. Individual Access Points rebooted to address location-specific issues affecting device connections and connection speeds.

Next Steps

Continued monitoring of the system, sensors, and tickets to detect persistent system health and user experience issues. 

Delivery (estimated Nov. 17) and installation of additional Wi-Fi authentication hardware to increase system capacity and handle peak traffic. 

Individual issues that remain with connecting to Eduroam are being addressed on a case-by-case basis.  

How You Can Help 

If you encounter persistent issues connecting to Wi-Fi and need assistance:

  • Students with personal devices - Drop-in IT support for students is available in Eshleman Hall (1st floor) and Moffitt Library (4th floor); see hours of operation. You can also contact Student Technology Services at 510-642-4357 or email sts-help@berkeley.edu.

  • Faculty, staff, and student employees - Drop-in technical support is available Monday through Friday, 9 a.m. to 3 p.m., in Dwinelle Hall, Room 128. You can also contact the ITCS Service Desk at 510-664-9000 (option 1) or submit a ticket.

The information you provide will help us better understand the impact of the problems and improve our troubleshooting efforts. When reporting an issue, the following details help us diagnose, troubleshoot, and resolve the issue:

  • time of day

  • specific location/building/room

  • Eduroam vs. AirBears2 vs. CalVisitor

  • what kind of device you are using

  • specific problem(s) encountered

  • screenshots of error messages

  • how often you have experienced this at similar/different locations within the past 24-48 hours
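
For anyone collecting these details programmatically (for example, a department gathering reports before submitting a ticket), the list above could be captured in a small structured record. The following is only a sketch in Python: the class name, field names, and the completeness check are hypothetical and do not correspond to any IST system or ticket form.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Network names as listed above.
KNOWN_NETWORKS = {"Eduroam", "AirBears2", "CalVisitor"}


@dataclass
class WifiIssueReport:
    """Hypothetical record mirroring the details requested in the list above."""

    observed_at: datetime            # time of day
    location: str                    # specific location/building/room
    network: str                     # Eduroam vs. AirBears2 vs. CalVisitor
    device: str                      # what kind of device you are using
    problem: str                     # specific problem(s) encountered
    screenshots: List[str] = field(default_factory=list)  # paths to error screenshots
    occurrences_last_48h: int = 1    # how often seen in the past 24-48 hours

    def missing_details(self) -> List[str]:
        """Return the names of required details that are empty or unrecognized."""
        missing = []
        if not self.location.strip():
            missing.append("location")
        if self.network not in KNOWN_NETWORKS:
            missing.append("network")
        if not self.device.strip():
            missing.append("device")
        if not self.problem.strip():
            missing.append("problem")
        return missing


# Example values are invented for illustration.
report = WifiIssueReport(
    observed_at=datetime(2021, 10, 28, 10, 15),
    location="Dwinelle Hall, Room 128",
    network="Eduroam",
    device="laptop",
    problem="Connected, then dropped after a few minutes",
)
print(report.missing_details())
```

A report with every detail filled in returns an empty list from missing_details(); screenshots are left optional in this sketch because they are not always available.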