Exactly 10 years ago, at 15:05 Eastern Time on August 14, 2003, an overhead power line came into contact with an overgrown tree near Cleveland, Ohio.
What happened next is a frightening case study of how vulnerable modern economies are to even a small disruption in vital and highly interconnected systems.
Little more than an hour later, a cascading power failure had blacked out most of the Northeast United States and neighbouring parts of Canada — leaving 50 million people without power, some for up to four days.
By 16:13 Eastern Time, parts of Ohio, Michigan, Pennsylvania, New York, Vermont, Massachusetts, Connecticut, New Jersey and the Canadian province of Ontario had been sent back in time more than 100 years to a pre-electrical age.
“How did we become so vulnerable?” Amory and Hunter Lovins asked in a report for the U.S. government entitled “Brittle Power: Energy Strategy for National Security.”
The Lovinses could have been writing after the 2003 blackout. In fact, the report had been written more than two decades earlier, in 1982, for the Federal Emergency Management Agency, which even then was concerned about emerging threats to the nation’s critical infrastructure.
Amory and Hunter Lovins worried that “a few people could probably blackout most of the country.” They cited the risk of terrorism, sabotage or a hydrogen bomb, in what would come to be seen, after the attacks on the World Trade Center on September 11, 2001, as an eerily prescient warning about the risks to critical infrastructure.
Terrorism was quickly ruled out as a cause of the August 2003 blackout. But the Lovinses had prophetically identified the weakness in North America’s electricity network.
“America’s energy vulnerability is an unintended side effect of the nature and organisation of highly centralised technologies. Complex energy devices were built and linked together one by one without considering how vulnerable a system this was creating,” they wrote.
“Americans’ most basic functions depend … on a continuous supply of electricity,” the co-authors explained.
“Without it, subways and elevators stall, factories and offices grind to a halt, electric locks jam, intercoms and televisions stand mute, and we huddle without light, heat or ventilation. A brief faltering of our energy pulse can reveal … the hidden brittleness of our interdependent, urbanised society.”
The blackout concentrated minds in the electricity industry and on Capitol Hill. In the ten years since August 2003, the North American power industry has invested billions of dollars upgrading computer systems, training control room staff, and cutting down vegetation along power lines.
Reliability standards which were once voluntary have become mandatory. New synchrophasors are being rolled out that will provide updates on the state of the grid several times a second, giving control room staff more situational awareness.
Lessons have been learned. The precise failures which led to the 2003 blackout are unlikely ever to be repeated.
But every blackout results from a unique cocktail of causes. In complex systems, it is not possible to reduce the risk of catastrophic failure to zero.
The 2003 blackout was not the first. There had been earlier mass blackouts in 1965 (affecting 30 million people in the Northeast), 1977 (9 million people in New York), 1982 (5 million people on the West Coast), 1996 (two big blackouts on the West Coast) and 1998 (152,000 people in Minnesota and neighbouring states).
It will not be the last. In 2011, a blackout cut power to 2.7 million people across southern California and Arizona. In 2012, the two biggest blackouts in history rolled across India’s power grid, cutting power to states containing half of the country’s 1.2 billion people.
While some aspects of reliability have improved enormously, other vulnerabilities are increasing.
The risk of cyber-attacks by terrorist groups or hostile states has increased, ironically because the United States itself has demonstrated the power of cyber-operations with its Stuxnet attack on Iran’s nuclear programme.
Integrating more unpredictable sources of power like wind and solar into the electricity network is also increasing the reliability challenges for grid coordinators.
The trend towards linking up local and regional networks into super-grids, connecting entire countries or continents in China, India, Europe and Latin America, is increasing the very interconnectedness that lay at the heart of the 2003 power blackout.
Despite all the precautions, it could and will happen again.
AN ORDINARY DAY
August 14, 2003, was a typical summer day in Ohio. Power consumption was high as a result of air-conditioning demand, but well below the peaks recorded at the same time in previous years. The grid had coped with much worse.
“Peak load conditions on a less than peak load day,” is how managers from First Energy, which managed the local grid, described it to investigators from a joint U.S. and Canadian government task force set up to establish the causes of the blackout.
Several generating units were undergoing maintenance and were unavailable. At 13:31 the East Lake 5 generating unit, which produced almost 600 megawatts of power, unexpectedly tripped off and was no longer available. Without East Lake 5, the region became dangerously dependent on the nearby Perry nuclear power plant to keep meeting demand.
The Cleveland-Akron region, on the southern shore of Lake Erie, had been identified as a “transmission constrained area” with limited links to the rest of the Eastern Interconnection, the giant power network that serves the eastern two-thirds of the United States.
The potential for problems was well-known to grid operators because Cleveland-Akron had suffered severe power shortages and transmission congestion in 1994 and 2002. Two power lines had already failed that afternoon, increasing pressure on the grid.
However, none of these factors was responsible for the cascading blackout which occurred later that afternoon. Subsequent modelling by task force investigators established that the electricity grid was vulnerable but stable before a tree contact caused the third power line, Harding-Chamberlin, to trip at 15:05.
“The central organising principle of electricity reliability management is to plan for the unexpected,” the task force explained. “The unique characteristics of electricity mean that problems, when they arise, can spread and escalate very quickly if proper safeguards are not in place.”
No matter what happens, or how many generating units and transmission lines are unavailable, the grid must be operated at all times so that it remains in a safe condition.
Operators must assess the worst-case scenario, usually the loss of the largest generator or transmission line on the system, and plan how to meet it (the “N-1 criterion”). If it happens, they must be able to bring the system back into a safe operating condition within 30 minutes, and start planning to meet the next worst-case scenario.
To cope with emergencies, area controllers can order more generation to come on line or seek help from neighbouring areas by requesting transmission loading relief.
Grid managers can cut off customers on interruptible supply contracts and request voluntary conservation. But if all else fails, controllers are expected to start disconnecting blocks of customers to protect the rest. From a reliability perspective, it is better for a few customers to suffer a power cut than to risk a cascading power failure across the network.
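The N-1 logic and the last-resort load shedding described above can be illustrated with a toy calculation. The sketch below is a hypothetical, heavily simplified model of a load pocket, like Cleveland-Akron, that relies on local generators plus a few import lines; real security assessments use full power-flow studies, not this arithmetic. It assumes that surviving local units can ramp instantly to full capacity and that imports divide across the surviving lines in proportion to their ratings, so only total import capacity matters. The unit names, line names and megawatt figures are invented, and the 30-minute recovery window is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    capacity_mw: float  # maximum output of the local unit


@dataclass
class Line:
    name: str
    rating_mw: float  # thermal limit of an import line


def contingency_shortfalls(generators, lines, load_mw):
    """For each single (N-1) contingency, return how many MW the pocket would be short.

    Toy assumptions (hypothetical, not a real security study): surviving local
    units ramp to full capacity, and imports split across surviving lines in
    proportion to their ratings, so only total import capacity matters.
    """
    shortfalls = {}
    total_gen = sum(g.capacity_mw for g in generators)
    total_import = sum(l.rating_mw for l in lines)

    # Contingency type 1: lose one local generating unit.
    for g in generators:
        needed_import = load_mw - (total_gen - g.capacity_mw)
        shortfalls[f"loss of {g.name}"] = max(0.0, needed_import - total_import)

    # Contingency type 2: lose one import line.
    for l in lines:
        needed_import = load_mw - total_gen
        remaining = total_import - l.rating_mw
        shortfalls[f"loss of {l.name}"] = max(0.0, needed_import - remaining)
    return shortfalls


def load_to_shed(generators, lines, load_mw):
    """MW of customer load to disconnect now so that every single contingency is survivable."""
    # Shedding X MW lowers every import requirement by X MW, so the worst case decides.
    worst = max(contingency_shortfalls(generators, lines, load_mw).values(), default=0.0)
    return float(worst)


# Hypothetical example data, not the actual 2003 system.
gens = [Generator("Unit A", 600), Generator("Unit B", 1200), Generator("Unit C", 400)]
lines = [Line("Line 1", 1000), Line("Line 2", 1000), Line("Line 3", 800)]
print(load_to_shed(gens, lines, 3500))      # 0.0: every single contingency is survivable
print(load_to_shed(gens[1:], lines, 3500))  # 300.0 once Unit A has already tripped
```

Because shedding a megawatt of load lowers every contingency's import requirement by a megawatt, the amount to shed is simply the worst single-contingency shortfall. In this invented example, once Unit A is out the pocket is no longer N-1 secure, and disconnecting 300 MW of customers restores that margin.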
COMPUTERS AND TREES
When the Harding-Chamberlin power line came into contact with a tree, the failure of the Perry nuclear plant became the N-1 contingency. Grid controllers should have disconnected 1,500 megawatts of load to safeguard the system. It would have blacked out much of Cleveland-Akron, but the rest of the Eastern Interconnection would have been safe.
Unfortunately, the control room at First Energy (FE) was unaware of the danger because several critical computer systems, including the automatic alarm systems, were not operating properly. It did not help that reliability and transmission operators were situated in different rooms, or that, unusually, the control room lacked a visual display of the grid's topology showing its generating assets and transmission lines.
The first indication that something was wrong came only at 15:42, more than half an hour after the critical tree contact.
“Nothing seems to be updating on the computers,” operators at First Energy’s control room, who were responsible for controlling the grid in northern Ohio, told their IT staff.
“We’ve had people calling and reporting trips and nothing seems to be updating on the event summary … I think we’ve got something seriously sick.”
Around the same time, an operator from Perry nuclear power station telephoned the control room to warn that the plant risked an automatic shutdown for safety reasons: “I’m still getting a lot of voltage spikes and swings on the generator … I don’t know how much longer we’re going to survive.”
Four minutes later, the Perry operator telephoned again: “It’s not looking good … We ain’t going to be here much longer and you’re going to have a bigger problem.”
By that time, it was almost too late. Less than 24 minutes later, Cleveland-Akron was blacked out. Much worse was to follow. A cascading power failure had begun that would shut down 265 power plants, with 508 generating units, including 10 nuclear power stations, within the next 8 minutes.
As each power line failed, the remaining lines became more and more congested, heating up and sagging closer and closer to trees and other vegetation. Two additional power lines failed between 15:05 and 15:39, and then 16 more by 16:08, as the situation around Cleveland became critical.
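The self-reinforcing overloading described above can be illustrated with a deliberately crude model. The sketch assumes a hypothetical set of parallel lines that share a fixed transfer equally, with any line loaded above its rating tripping in the next round, standing in for a line that overheats, sags into a tree and is disconnected. Real power flows divide according to line impedances, and the 2003 cascade also involved relay action, voltage swings and generator trips that this ignores. The line names, ratings and transfers are invented.

```python
def simulate_cascade(ratings_mw, transfer_mw, first_trip):
    """Toy cascading-overload model for a set of parallel transmission lines.

    Hypothetical simplification: the transfer is shared equally by every line
    still in service, and any line loaded above its rating trips in the next
    round. Returns the lines tripped in each round and the lines left in service.
    """
    in_service = set(ratings_mw) - {first_trip}
    rounds = [[first_trip]]
    while in_service:
        flow_per_line = transfer_mw / len(in_service)
        overloaded = sorted(n for n in in_service if flow_per_line > ratings_mw[n])
        if not overloaded:
            break  # the surviving lines can carry the transfer, so the cascade stops
        in_service -= set(overloaded)
        rounds.append(overloaded)
    return rounds, sorted(in_service)


# Hypothetical ratings in MW; with all four lines in service, 2,400 MW fits (600 MW each).
ratings = {"A": 900, "B": 800, "C": 700, "D": 600}
print(simulate_cascade(ratings, 2400, "A"))  # ([['A'], ['C', 'D'], ['B']], [])
print(simulate_cascade(ratings, 1800, "A"))  # ([['A']], ['B', 'C', 'D'])
```

At a 2,400 MW transfer, losing line A pushes C and D over their limits, and the whole load then lands on B, which also fails. At 1,800 MW the same initial trip is absorbed, illustrating why reducing transfers or shedding load before the first failure matters more than reacting afterwards.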
Perhaps, if the grid controllers had initiated disconnections in Cleveland, a full-blown crisis could have been averted. But it was the failure of another power line, Sammis-Star, at 16:05 that turned a local problem into a regional disaster. The Sammis-Star outage was the critical event leading to widespread cascading, investigators concluded.
“The collapse of FE’s transmission system induced unplanned shifts of power across the region,” according to the task force. “With paths cut from the west, a massive power surge flowed (from Pennsylvania, New Jersey and Maryland) into New York and Ontario in a counter-clockwise flow around Lake Erie to serve the load still connected in eastern Michigan and northern Ohio”.
Protective relays monitoring power lines interpreted the surge as a fault and triggered the circuit breakers to protect the equipment. What followed was a high-speed race, in which power surged along the few remaining pathways on the grid, and the relays and circuit breakers rushed to disconnect more and more transmission lines.
First the Northeast U.S. power network and Ontario were disconnected from the rest of the United States and Canada. Seconds later this giant electrical island broke apart into dozens of smaller fragments.
As voltage and frequency started to fluctuate wildly within each island, protective relays shut down almost all of the remaining transmission lines and generators to protect them from damage. By 16:13 the Northeast was dark.
In its final report on the causes of the blackout, the U.S.-Canada Power System Outage Task Force identified poor vegetation management, computer failures, inadequate training and lack of real-time situational awareness of grid conditions as the main factors behind the disaster.
First Energy was harshly criticised, but the task force identified institutional failures across the industry, particularly in setting and enforcing reliability standards, and coordinating across the grid. No fewer than 46 recommendations were made to prevent the blackout recurring (“Final Report on the August 14, 2003 Blackout” April 2004).
As usual, a major blackout spurred the industry and Congress to enact long-stalled reforms. The North American Electric Reliability Council became the North American Electric Reliability Corporation (NERC). Under Title XII of the 2005 Energy Policy Act, NERC was given power to set mandatory rather than voluntary standards across the industry.
Although the 2003 blackout was not caused by a cyber-attack, NERC has stepped up efforts to protect the grid against malicious activity by hackers, terrorists and foreign powers, through its Critical Infrastructure Protection Committee.
Despite all this work, could massive blackouts happen again? Yes. The subsequent mass blackouts in California-Arizona and India point to the continuing risk.
Eight years after the August 2003 blackout, NERC found that the California blackout in September 2011 happened because “the system was not being operated in a secure N-1 state. The failure stemmed primarily from weaknesses in … operations planning and real-time situational awareness,” echoing the failures in Ohio.
The risk of cascading failure is inherent in complex interconnected systems, as the Lovinses realised. It can be reduced via careful systems analysis and contingency planning, but will never be reduced to zero.
(Editing by Anthony Barker)