Progress toward a new model for UC Berkeley’s computational infrastructure

April 13, 2022

Our world today is computing-intensive and power-hungry. At UC Berkeley, as more and more academic and research activity requires computing, aging power transmission and computing facilities are strained to the breaking point. This reality was the impetus for embarking on a multi-year project to rethink the outdated data center and cloud services model and create a roadmap that would be scalable and remain flexible into the future. Several milestones have been hit on the Data Center and Cloud Services Strategy Roadmap project. We will share our progress, but first a bit of background on how we arrived at this point and the decisions the University has faced.

Scrambling to free data center capacity

One unseasonably warm morning in March 2022, workers quietly loaded 40 large servers from the loading dock at UC Berkeley's Warren Hall onto a salvage truck. As the truck rounded the corner and disappeared from view, the small team in Berkeley IT scrambling to accommodate the campus's insatiable appetite for computing chalked it up as a minor win. Unplugging those 40 old machines running business applications reduced the data center's power draw by roughly 25 kilowatts (kW). A kilowatt measures the rate at which electricity is used: 25 kW is equivalent to running 250 100-watt bulbs at once, and sustaining that draw for an hour consumes 25 kilowatt-hours (kWh) of energy.
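The bulb comparison is simple arithmetic, and it can be checked with a short sketch. The function names below are illustrative assumptions; only the 25 kW and 100-watt figures come from the article.

```python
# Power (kW) is a rate; energy (kWh) is that rate sustained over time.
RECLAIMED_KW = 25.0   # from the article: power freed by retiring 40 servers
BULB_WATTS = 100.0    # from the article: one 100-watt bulb


def equivalent_bulbs(power_kw: float, bulb_watts: float = BULB_WATTS) -> float:
    """Return how many bulbs draw the same power as `power_kw`."""
    return power_kw * 1000.0 / bulb_watts


def energy_kwh(power_kw: float, hours: float) -> float:
    """Return the energy used when `power_kw` is sustained for `hours`."""
    return power_kw * hours


bulbs = equivalent_bulbs(RECLAIMED_KW)
one_hour_kwh = energy_kwh(RECLAIMED_KW, 1.0)
print(f"{RECLAIMED_KW:.0f} kW is about {bulbs:.0f} bulbs, "
      f"or {one_hour_kwh:.0f} kWh for every hour of operation")
```

Keeping power and energy in separate functions makes the unit distinction explicit: the first result is a count of bulbs running simultaneously, the second is what a meter would record after one hour.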

Later that same morning, the data center team received a new request that would eat up nearly half the power just reclaimed. For a single computer. 

The following week another request arrived for eight computers that would draw 52 kW on average and were critical to academic research driven by data science and machine learning. Over the course of one month, these eight computers will consume about 38,000 kilowatt-hours (kWh) of electricity. That’s more than 42 times what a typical American household uses in the same period.
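The monthly figure follows from multiplying the average draw by the hours in a month, which gives an energy total in kilowatt-hours. The sketch below reproduces it under two assumptions that are not from the article: an average month of 730 hours (8,760 hours per year divided by 12) and a typical U.S. household using roughly 900 kWh per month.

```python
CLUSTER_KW = 52.0                 # from the article: average draw of the eight computers
HOURS_PER_MONTH = 730.0           # assumption: 8,760 hours per year / 12
HOUSEHOLD_KWH_PER_MONTH = 900.0   # assumption: rough U.S. household average


def monthly_energy_kwh(avg_power_kw: float, hours: float = HOURS_PER_MONTH) -> float:
    """Energy in kWh consumed by a steady load of `avg_power_kw` over one month."""
    return avg_power_kw * hours


cluster_kwh = monthly_energy_kwh(CLUSTER_KW)
households = cluster_kwh / HOUSEHOLD_KWH_PER_MONTH

print(f"Monthly energy: {cluster_kwh:,.0f} kWh")
print(f"Equivalent households: {households:.1f}")
```

With these assumptions the total comes to just under 38,000 kWh, a little over 42 households' worth. A different household baseline would shift the ratio accordingly.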

Today these high-powered computers used to support machine learning and data science consume as much as 20 times the power of general-purpose machines. Having intensive computing housed and maintained on campus as the default — including at Warren Hall and even more so for local server closets — is inefficient from an expense management perspective. This approach also lacks resiliency from an energy and seismic perspective. 

Drivers of the current challenges

The most important trend is the growing application of computing to every business problem and every academic discipline. Berkeley has been at the epicenter of the development of key advances in data science, which combines statistics, informatics, and computer science. These increasing requirements for computing and power resources demand a modernized strategy. In addition, the trend toward increasingly power-hungry computers reflects what experts refer to as the growing "power density" of computing. Data center managers used to worry about running out of floor space; with growing power density, the problem is bringing in enough power to keep the computers running. Power-dense equipment also generates a lot of heat. It takes additional energy to cool things down, and this all has implications for Berkeley’s power usage and costs.

In short, today the Warren Hall Data Center is strained to the breaking point by the campus's insatiable demand for the powerful computing that increasingly drives the University's research mission. The University has historically not recovered the power costs of computing on the campus (they are considered part of overhead), nor, as other universities have, placed any restrictions on what computing is allowed on the campus. This makes power effectively "free" for departments and on-campus computing attractive to researchers. Local server rooms and powerful computers under desks are enlarging Berkeley's carbon footprint and adding millions of dollars to administrative budget deficits, and they are becoming more expensive every year.

As many new buildings come online and demand for powerful computing grows, the University itself has nearly exhausted available power on the main campus. While power expansion work has begun, new capacity will not come online until 2026 at the earliest. This means neither the campus data center nor local server rooms are the answer.

Work performed as part of the Data Center and Cloud Services Strategy Roadmap project has been further assessing the current situation and developing options to fend off disruption and offer new opportunities to the University’s teaching and research missions. 

Photo of servers in data center

 Old data center equipment that was decommissioned and a few of the newly emptied racks. Photo credit: Bill Allison

Learning the data center cannot be expanded

  • As a first step, in 2021, Berkeley IT partnered with Arup, a San Francisco-based engineering firm, to conduct an engineering assessment of the Warren Hall data center with the dual goals of empirically assessing its current state and the feasibility of expanding it while addressing any issues discovered. The Arup assessment produced several key findings. The data center was found to be:

    • At, or over, its maximum power capacity.

    • Exceeding structural weight limits.

    • Highly energy inefficient, requiring nearly twice the power to operate as modern commercial colocation facilities. 

Through this process, we also learned the improvement costs for long-term commitment to Warren Hall's data center are not competitive with other alternatives and therefore would not be advisable. The assessment recommended UC Berkeley immediately begin to shift IT workloads out of the facility.

Local servers in open spaces

Local servers like these, in open spaces or in server closets, can use three times the energy they would in a modern colocation facility and cause expensive failures in building environmental control systems. Yet the research represented in the photos above is world-class and can be supported better and more cost-effectively. Photo credit: Bill Allison

The current strategy is old and the world has changed

The Arup analysis provided important data for University leadership. Most of the University's computing needs have been adequately met in Warren since 2004. The Warren data center is cheaper than alternatives, it is reliable, and it has reduced the demand for local server closets. But this strategy dates back to the early 2000s, when there was not much difference between administrative computing and research workloads. Not so today: in addition to diverging in power consumption, workloads now call for specialized types of computing. The campus data center houses all types of computing infrastructure, including the Berkeley Research Computing high-performance computing clusters, multiple research clusters managed by departments, and many mission-critical administrative applications that run the University.

But much has changed over twenty years. Just imagine: the campus data center was planned half a decade before the first iPhone existed and before most of our undergraduate students were born! Right now researchers use the public cloud (e.g., AWS, Google, Azure) and private cloud (our Warren data center cloud options), and there are increasing colocation and edge computing options. Administrative computing needs are met by these options as well as by the public cloud and software-as-a-service (SaaS) applications. Given the rapidly changing technological, climate, and financial landscape, the University needs an updated strategy that remains agile and alert to changes and opportunities.

Developing new proposals for campus computing infrastructure

Another phase of the project surveyed campus stakeholders to further drill down into the impact of moving campus from a one-size-fits-all data center and departmentally managed server rooms to a hybrid approach that combines:

  • Off-campus Berkeley IT-managed server colocation (efficient, cost-effective);

  • On-campus standardized modular data centers and hardware research labs (to meet needs for low latency, edge computing, connection to scientific equipment, or hands-on hardware research);

  • Public cloud incentives and tooling to promote more secure cloud adoption. 

To ensure the project had a comprehensive understanding of the University's needs and the financial implications of changes, starting in September 2021, UC Berkeley partnered with Deloitte, a consulting firm, to develop options to address the University's computational infrastructure needs. Through an in-depth assessment, Berkeley found that reenvisioning the University's current approach and offering more cost-effective and powerful modern data center services, combined with greater use of the public cloud, will allow for greater efficiency, effectiveness, and security of data center assets while also enabling innovation and a new competitive advantage for research, faculty recruitment, retention, and more. The financial savings of the proposals being developed will exceed $200M over the next 15 years.

A new approach for computing infrastructure built with campus stakeholder input

The project team interviewed 81 stakeholders in 28 interviews, documenting the key themes and anecdotes heard across campus. Major themes included needing specific applications and specialized services, supporting innovation and the University’s mission and funding model, and needing reliable storage and backup (and lots of it) with more autonomy to move quickly. Based on the assessment and stakeholder interviews, Berkeley identified potential future-state computing infrastructure options that support the campus's strategy and requirements. Four strategic governing principles were considered:

  • Academic and Research Alignment: New computational infrastructure (onsite modular data center space, offsite data center colocation, and public cloud) will meet the energy-hungry and growing computational needs of research while enabling future campus growth through risk-mitigating adoption of modern infrastructure.

  • Financial Advantage: New computational infrastructure (onsite modular data center space, offsite data center colocation, and public cloud) will enable more cost-effective future campus growth at UC Berkeley, saving millions of dollars over the next decade.

  • Risk Management: New computational infrastructure (onsite modular data center space, offsite data center colocation, and public cloud) will improve agility through the adoption of leading infrastructure architecture practices. It will also enhance infrastructure scalability to optimize existing resources.

  • Operational Flexibility: New computational infrastructure (onsite modular data center space, offsite data center colocation, and public cloud) will improve fault tolerance in the event of unplanned service disruptions and strengthen the high availability and resiliency position of critical infrastructure components.

Next steps in the project

Deloitte and UC Berkeley partnered on a future state exercise to develop three options for campus stakeholders. Campus leadership has asked for detailed proposals for the most promising long-term and cost-effective option. The project team is working to produce "shovel-ready" plans for off-site colocation, standard energy-efficient on-campus small modular data centers, and more effective public cloud services. 

Following the campus governance process, once campus leadership confirms the future state direction, a detailed, transparent plan will be developed to move the campus from a one-size-fits-all data center to a new model for a computing infrastructure that will offer facilities on- and off-campus along with an integrated, strategic approach to the public cloud.

Berkeley will use a data-driven approach to change management to ensure a sustainable transformation out of Warren Hall to the new hybrid service model. During all points of this transformation, UC Berkeley will foster transparent communication and engagement to increase awareness, build readiness, and drive buy-in among our campus stakeholders.

The project is sponsored by Jenn Stringer, AVC IT and CIO, and Sally McGarrahan, AVC for Facilities Services, and led by Bill Allison, UC Berkeley CTO. The core project team includes Steve Aguirre, Manager of Data Center Operations; Charron Andrus, Associate Chief Information Security Officer; Elizabeth Brashers, Chief of Staff to the Vice Chancellor for Research; Dave Browne, Executive Director of Campus IT Infrastructure; Diane Coppini, Facilities Services, Director of Engineering and Technical Services; Eric Fraser, Assistant Dean, Director of IT, College of Engineering; Ken Lutz, Executive Director, Multiscale Systems Center; Liz Marsh, Executive Director of Strategy & Partnerships and Chief of Staff to CIO; Gert Reynaert, Senior Project Manager; Faye Snowden, Technology Program Office Manager; Walter Stokes, Director of Data & Platform Services; and David Turner, Director of Administrative Applications.

The project team and IT leadership will be engaging with campus through workshops and town halls, sharing updates and information through our project website, and briefing executive leadership and key committees. 

For more information and updates on the new data center project, please visit the Data Center & Cloud Services Strategy Roadmap Project website.