The Big Reset
The late columnist and military science fiction author Jerry Pournelle once wrote that the scary thing about the idea of Armageddon is how attractive it is.
He was speaking in terms of civil and industrial design. Just imagine how much more efficiently we could build cities if we could just start over with proper planning and design! Instead of being forced to deal with the mess and clutter of centuries of legacy architecture, we could raze those old properties to the ground and build everything anew.
In this hypothetical shiny new world, traffic jams would be a thing of the past. There would be plenty of affordable housing for everyone. Even pollution would be minimized, since industries that spewed out waste products would be located next to factories that consumed those same materials as inputs.
Of course, Dr. Pournelle didn’t intend for this argument to be taken seriously. More than most, he understood the fallacy and hubris of centralized planning.
To start over would require a massive concentration of political and economic power, and the one thing we know is that unchecked power is always abused, regardless of whether that power is governmental, corporate or theocratic. Worse, there’s no guarantee that planners would have sufficient foresight to accommodate future needs.
Nevertheless, I have some sympathy with this view, especially when it comes to the design of computer hardware and software.
The fact is that our computers are packed full of legacy architecture and vestigial remnants of past designs. Computers keep getting better, but most of those improvements are simply layers covering over the previous generation of technological design, much like the strata of a geological rock formation.
This is particularly true in hardware, but it also applies to software and networking protocols.
Most computers today use Intel x86 processors, whose instruction set and register architecture are derived from the design of the 8086 microprocessor first produced in 1978. That’s right, the fundamental design of the chip in your laptop is over forty years old.
Similarly, the internet protocols that you use to surf the web and read email are based on the TCP/IP standard, first introduced in 1983.
And the operating system for your Android phone is based on Linux, which was originally released in 1991. The design of Linux, in turn, was greatly shaped by the earlier Unix operating system, which goes all the way back to 1971. And although both Unix and Linux have evolved greatly in the decades since, many of the core concepts about security, processes, file systems, and kernels have not changed all that much.
So why should we care? What does it matter that our modern technology is based on really old ideas? What’s wrong with “if it works, leave it alone”?
Because it’s not working. Not as well as it should.
The first aspect I want to talk about is performance. Modern computers are very fast; they are also very inefficient.
I remember back in the 1980s, when I wanted to write a document, I would fire up a copy of WordPerfect on my Commodore Amiga, or possibly (a few years later) a Macintosh SE.
Three decades later, I have a machine that is 10,000 times more powerful sitting on my lap, and yet the software doesn’t feel any faster; in fact, it often feels slower.
Of course, I can get a word processor today that has 100 times as many features. But you know what? I don’t really need all those extra features. Sure, it’s better than the word processor I had in 1986; but it’s not 10,000 times better.
Similarly, when I run the Chrome browser on my Ubuntu laptop, I need at least 12 GB of RAM to prevent Chrome from eating all my memory and slowing the machine to a crawl. That’s 12,000 times as much memory as I had on my Amiga 1000. Where does it all go?
Applications today are big, complex and slow, and there are several reasons why.
The first reason is that, traditionally, companies that make software only get paid when you either purchase or upgrade the software. So to maximize their profits, they needed a way to induce users to keep buying upgrades. Unfortunately, bug fixes alone simply aren’t enough to motivate most customers to shell out a hundred bucks for a new version. So software companies had to invent sexy new features that would make customers want to spend the money for the latest and greatest.
Unfortunately, most of the best feature ideas are exhausted early on, and after a while the set of new features becomes silly and trivial. As an example, I just got an advertisement to upgrade to the newest version of macOS, Mojave. And the top-ranked item on their list of (mostly inconsequential) new features was…a dark theme? That’s it?
The second reason for application bloat is that there’s a tradeoff between programmer productivity and software performance. The programming languages with the highest performance (such as C++) are much more difficult to use, and much slower to develop in, than easy-to-use scripting languages like Python or JavaScript running on Node.js.
But it isn’t just languages: there’s a whole host of development tools and frameworks that make programming faster, but which have a cost in terms of memory and CPU consumption.
The following is a gross oversimplification, but has some truth to it: if you are willing to have your software run twenty times slower, you can write that software twenty times faster. And for most companies, that is a trade-off worth making — after all, programmers are expensive and computing power is cheap, right?
A third factor is backwards compatibility. A web browser such as Internet Explorer 10, for example, has tons of additional logic in it, because it needs to be able to render web sites that were written for IE9, IE8, IE7, and so on. This makes the program much larger and more complex than if it only had to support web sites that conformed to the latest web standards. The same holds true for other kinds of legacy standards, such as file formats and operating system interfaces.
Software inefficiency didn’t matter so much in past decades, in large part due to the influence of Moore’s law. With computer power doubling every 18 months, the easiest way to speed up a slow program was simply to wait a year until computers got faster.
But now we’ve come to the end of Moore’s law. Computing power isn’t doubling every 18 months any more — it’s more like every 20 years. We’re not going to be able to depend on Mr. Moore to speed up our slow, inefficient applications any more.
The computer industry today is in a situation much like the one the auto industry was in during the early 1970s — cars were large “gas guzzlers” because gasoline was cheap, and auto makers were in an arms race to produce bigger, heavier vehicles like Cadillacs and Lincoln Continentals because that’s what car buyers wanted.
But then came the energy crisis, and gas prices spiked; governments started to clamp down on smog and pollution (which was killing people by the tens of thousands every year), and eventually we got things like CAFE standards. So auto makers were forced to start thinking small, to build cars that got 40 miles to the gallon instead of 10. Those giant behemoths of the 1970s now seem like dinosaurs or quaint museum pieces.
Unfortunately, making our software and hardware more efficient isn’t going to be an easy job.
Different aspects of computing technology evolve at different rates. In general, the things that are the hardest to change — that are most deeply entrenched — are things that can’t be done unilaterally by a single company, but require revising shared agreements between many stakeholders.
To get an idea of how difficult this is, imagine if the USA and Europe wanted to adopt a common set of standards for electrical power outlets. This decision would affect the design of every electrical device that plugs into a wall socket.
For analogous reasons, it’s very difficult to change low-level computing standards, such as machine instruction sets and register architectures, because they affect every piece of software that runs on that machine.
As one example, earlier I mentioned the fact that we’re still using the Intel x86 instruction set and register architecture, a design which is, to put it mildly, clunky and awkward to program. Years ago I wrote low-level machine code for x86 processors by hand, and it was significantly more difficult than writing code for modern processor designs.
In a way, it’s much like the human brain, where all of the snazzy modern architecture of the neocortex is layered on top of this ancient reptile brain.
As most hardware engineers know, we have had superior designs for CPUs for a long, long time. But we also have hundreds of thousands of mission-critical applications that will never be ported to a new instruction set. And so people who buy computers will tend to stay with the “safe” choice and go with what has worked before, rather than trying something new.
Apple (I have to give them credit) has managed to switch CPU architectures twice, but that’s because they control both the hardware and the core software. There is no way that a company like Microsoft could convince both hardware vendors and software developers to switch to a new processor architecture.
On the other hand, virtually all smartphones use the newer ARM processor design, which is much more evolved than the Intel x86. The reason this was possible is that smartphones themselves are relatively new, and when they were first introduced there were no legacy applications holding them back. The same holds true for game consoles and other devices which have no need to run older applications.
But desktops and laptops, for the most part, are still running on top of that ancient reptile brain.
The next aspect I want to talk about is security.
If you read the news at all, you probably know that major security breaches — in which sensitive private information about millions of people falls into the hands of cyber-attackers and thieves — are a fairly regular occurrence now, happening several times a year. We’ve become numb to the danger that criminals are easily able to obtain financial and personal data that could be used for scams, blackmail, credit card fraud, and identity theft.
But if you are a software engineer who pays attention to news sources that report security vulnerabilities, you know that the problem is much worse than most people think. New security holes are being discovered daily in all kinds of software, from the operating system on your desktop computer to the software that runs your car. Software that has been trusted for years is suddenly found to be untrustworthy.
Critical software programs — like the Windows operating system — are being audited over and over again to try to root out these problems and fix these vulnerabilities. And yet it never seems to end: no matter how many times we review our software, there always seem to be more holes. End users are constantly bombarded with patches and security updates, and wonder why their work has to keep being interrupted.
It gets even worse when we start to consider cheap imported devices like webcams and thermostats. A lot of the companies making these products don’t even try to find the security flaws, despite the fact that many of these products are known to be laughably insecure.
Can’t we just fix these problems once and for all? Isn’t there a silver bullet that will make our software systems secure by default?
Did it really have to be like this?
The answer is, no it didn’t.
The root of the problem is that we’ve been thinking about security the wrong way. Our operating systems and programming languages are designed around a particular approach to securing systems, one that is fundamentally weak. Worse, security experts have known this for a long time, but getting the software industry to adopt a radically different approach to building hardware and software is extremely difficult.
Our error is that we tend to think of security as a shield or a wall — that is, as a protective barrier that stands between the threats outside, and the thing we want to protect inside.
Imagine that you are a wealthy individual, living in a large house on a large estate. You don’t want thieves coming in and taking your stuff, so you build a high wall around the estate grounds. You also put in gates with strong locks, guard towers to watch the perimeter, trip wires, and every other security measure you can think of.
But you don’t want to have to deal with guards and trip wires in your daily life, so inside the compound there is no security at all. No limits on what you are allowed to do. Why worry, when you have this strong wall between you and any threat?
Getting past the wall is difficult, but not impossible. And once inside the outer perimeter, an attacker can pretty much do whatever they want — steal stuff, burn the house down, kidnap people, and so on.
And when the inevitable security breach happens, and there’s an outcry for greater security, what do we do? We respond by trying to make the wall stronger.
Of course, what I have just described is a metaphor, but it is one that fits fairly well. For example, the whole concept of a “firewall” — where a corporate network is isolated from the rest of the world via a protective barrier — works pretty much like I have described. The same holds true of “root access” or “privilege levels” within a single computing device, where the system administrator has special powers that give them permission to do anything.
Despite all the protections, attackers are able to bypass firewalls and gain root access on a fairly regular basis. Often they accomplish this via ‘social engineering’ — convincing some person inside the barrier that they should be allowed in. No wall is perfectly impenetrable, because walls need doors. A wall that lets no one in or out isn’t much use.
The reason we build software this way is because it’s convenient. It lets us design applications without having to think about the hassle of securing them. We can then take our insecure application and wrap it in a layer of security — “walls” — as an afterthought.
But if walls aren’t the right answer, what is? How do we fix this? What’s the alternative?
There are several possibilities. However, a holistic approach is required. It’s not enough to just re-write a single application — proper security has to include the entire stack, all the way down to the bare metal.
One idea that security researchers have been developing for decades is called the capability model. This is a world where, instead of everything being permitted by default, nothing is. You can’t do anything — not open a file, allocate memory, or even run code — unless you have a capability object giving permission to do so, and that capability object only gives you the narrow permissions you need to carry out your task. For example, an application that needs to load user preferences from a settings file might be granted the capability to load that file, and only that file — not the ability to read arbitrary files in your home directory.
Because there’s no “soft chewy center” in this model, an attacker who gains access to a particular program can’t do very much, since they can only exercise the capabilities granted to that program.
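To make the contrast concrete, here is a minimal sketch in Rust. All of the names in it (ReadCap, grant_settings_cap, the settings.toml file) are hypothetical, and a real capability system enforces this at the operating-system level rather than by convention; the point is simply that the component never names arbitrary files, it only uses the single handle a more privileged broker chose to hand it.

```rust
use std::fs::File;
use std::io::{self, Read};

/// A capability: a handle that grants read access to one specific file
/// and nothing else. (Hypothetical type, for illustration only.)
struct ReadCap {
    file: File,
}

impl ReadCap {
    /// Consume the capability and return the file's contents.
    fn read_to_string(mut self) -> io::Result<String> {
        let mut contents = String::new();
        self.file.read_to_string(&mut contents)?;
        Ok(contents)
    }
}

/// The trusted broker: the only place in the program that deals in paths.
/// It grants exactly the capability a component needs, and no more.
fn grant_settings_cap(path: &str) -> io::Result<ReadCap> {
    Ok(ReadCap { file: File::open(path)? })
}

/// Ambient-authority style: this function can open *any* path it likes,
/// so if it is compromised it can read anything the process can.
#[allow(dead_code)]
fn load_prefs_ambient(path: &str) -> io::Result<String> {
    let mut s = String::new();
    File::open(path)?.read_to_string(&mut s)?;
    Ok(s)
}

/// Capability style: this function can only use what it was handed.
/// Even if it is compromised, it cannot reach beyond that one file.
fn load_prefs_with_cap(cap: ReadCap) -> io::Result<String> {
    cap.read_to_string()
}

fn main() -> io::Result<()> {
    // "settings.toml" is a hypothetical preferences file for this sketch.
    let cap = grant_settings_cap("settings.toml")?;
    let prefs = load_prefs_with_cap(cap)?;
    println!("loaded {} bytes of preferences", prefs.len());
    Ok(())
}
```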
Unfortunately, up to this point the capability model has been relegated to the domain of academic researchers and hobbyists. A few lone voices in the wilderness extol the virtues of capabilities, but they tend to be ignored by the computer industry at large.
However, this may be changing, as we’ll see.
I want to briefly touch on networking. The protocols that make up the Internet, such as TCP/IP, DNS, and HTTPS, are brilliant in design, and are in large part responsible for the explosive growth and success of the web.
But at the same time, these protocols were created in a gentler, more trusting time. A lot of them came from an academic background, and didn’t anticipate the problems we suffer with disinformation, denial-of-service attacks, trolling, and cyber-intrusion by state-sponsored groups. While there have been measures taken to mitigate these threats, they are only partially effective.
One issue which has never been satisfactorily addressed is identity. It’s hard to hold people accountable for bad behavior when user accounts are disposable — instead of accepting punishment for violating the rules, an offender can simply close the account and make a new one. Or write a script to make a thousand new ones, since it costs nothing to do so.
At the same time, there’s also a need for people under threat from repressive governments to communicate without having to expose their real world identities.
What is needed is a middle ground of strong pseudonymity, which would let users establish robust, non-disposable identities that are costly to replace, while still allowing people to speak freely without fear of retribution. Unfortunately, no online service supporting such a concept has managed to gain widespread popularity.
However, strong identities are not enough if they can be created and discarded at no cost. What I would like to see is a world in which identities grow stronger with time and effort — that is, the more work you invest in building up an identity, the more far-reaching access and capability you gain. Throwing away that identity means throwing away all that work. Thus, the threat of blacklisting becomes an effective deterrent for bad behavior; and a system of fair adjudication and appeals would ensure that the system remains just.
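To sketch the idea in code (with entirely made-up names and thresholds), imagine that the privileges attached to a pseudonym unlock gradually as reputation accumulates, so that abandoning an identity means abandoning everything it has earned:

```rust
/// A toy sketch of "identities that grow stronger with invested effort".
/// The names, numbers and thresholds here are all hypothetical.
struct Identity {
    pseudonym: String,
    reputation: u64, // earned slowly through constructive participation
}

/// Privileges unlock as reputation accumulates. A brand-new throwaway
/// account starts with almost nothing, so discarding an identity is costly.
fn allowed_actions(id: &Identity) -> Vec<&'static str> {
    let mut actions = vec!["read"];
    if id.reputation >= 10 {
        actions.push("post");
    }
    if id.reputation >= 100 {
        actions.push("moderate");
    }
    actions
}

fn main() {
    let newcomer = Identity { pseudonym: "throwaway123".into(), reputation: 0 };
    let veteran = Identity { pseudonym: "longtime_contributor".into(), reputation: 250 };
    println!("{} may: {:?}", newcomer.pseudonym, allowed_actions(&newcomer));
    println!("{} may: {:?}", veteran.pseudonym, allowed_actions(&veteran));
}
```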
The problems I have described are so massive, so entrenched, that it seems hard to believe we’ll ever solve them.
I often fantasize about the idea of starting over — to throw out everything, and re-design all of our computer technology, hardware and software, from scratch. We would of course incorporate all of the wisdom we have gained, but we could leave behind the mistakes of the past.
In the software world, a term that we use is “refactoring”. Originally the term comes from mathematics, where it means to transform an equation in a way that simplifies it but doesn’t change its essential meaning. In software engineering, it’s considered good practice to refactor your code at regular intervals — to simplify and streamline the code in a way that makes it easier to understand and maintain, but which preserves its existing behavior.
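As a tiny illustration (the function names and numbers are invented), here is what a refactoring might look like: the program’s behavior is unchanged, but duplicated logic is pulled out into named helpers that are easier to read and change in one place.

```rust
// Before: the same tax and shipping logic is written out twice,
// so a change to the rules has to be made in two places (and can drift apart).
fn invoice_total_before(items: f64, express: bool) -> f64 {
    if express {
        items + items * 0.05 + 15.0
    } else {
        items + items * 0.05 + 5.0
    }
}

// After refactoring: identical behavior, but each rule lives in one place.
fn sales_tax(items: f64) -> f64 {
    items * 0.05
}

fn shipping(express: bool) -> f64 {
    if express { 15.0 } else { 5.0 }
}

fn invoice_total_after(items: f64, express: bool) -> f64 {
    items + sales_tax(items) + shipping(express)
}

fn main() {
    // Same inputs, same outputs: the refactoring preserved behavior.
    assert_eq!(invoice_total_before(100.0, true), invoice_total_after(100.0, true));
    assert_eq!(invoice_total_before(100.0, false), invoice_total_after(100.0, false));
    println!("refactored code produces identical results");
}
```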
What I’d like to do is refactor the entire computing industry.
I realize of course that starting over would have some risks, such as the danger of losing our history. For example, a modern graphics card is able to support programs that use ancient graphics standards like CGA, EGA and VESA. If we lost that capability, there would be no way to play landmark games like the original Prince of Persia. We could, of course, get around this by building software emulators which would simulate the display capabilities of ancient hardware.
Another danger is the intrusion of corporate and government agendas into the new designs. A lot of our software and network protocols were originally created by people who had a strong commitment to freedom and user empowerment, and those principles still linger, although they have been steadily weakened and undermined over time. We would want to make sure that the new designs don’t become a trap intended to disempower and ensnare users. An open, collaborative process where all stakeholders have a voice is the best way to achieve this.
I also recognize that these problems will require more than just technical solutions. They will require a change in corporate behavior, and since customer demand doesn’t seem to be effective at forcing that change, the only tool we know of is government regulation. An imperfect tool, but the only one we have.
But still — brand new hardware, new operating systems, new networking protocols and new applications. What a fun and exciting project this would be to work on!
It seems like a pipe dream, doesn’t it? Hardly worth even contemplating. Surely no one in their right mind would try to invent a new CPU architecture, or create a brand new operating system.
Except for one thing…
It’s happening.
RISC-V
I’m very excited about RISC-V, a new CPU architecture whose design began in 2010 at the University of California, Berkeley.
RISC-V is completely “open source”, which means that anyone can create their own RISC-V-derived processor without asking anyone’s permission or paying any license fees.
Of course, there have been other attempts to create open-source CPUs before, but they failed because the designs couldn’t compete with the performance of existing processors from companies like Intel and AMD. The RISC-V team included experienced hardware designers who could test every aspect of the design and validate that it would be fast and efficient when translated into silicon.
The RISC-V instruction set and register architecture include all of the features one would expect from a modern CPU design. The programming model is simple to understand, easy to learn, and the instruction formats are regular and consistent. There is already a rich ecosystem including compiler support, emulators, debuggers and other tooling.
RISC-V is designed to be extensible. There’s a small core set of instructions which is required to be present on every RISC-V CPU. In addition, there’s a set of “standard extensions” which manufacturers can choose to include on the chip. These extensions include things like hardware multiply and divide, floating point operations, vector operations, atomic instructions, and much more. Hardware vendors can also define their own custom extensions for specific uses such as graphics or machine learning. Exactly which extensions will be included will depend on the target application — for example, a microcontroller in a toaster might not need all of the fancy instructions that would be needed on a laptop.
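To make that concrete, the particular mix a chip supports is conventionally spelled out in its ISA string, such as RV32IMAC for a small microcontroller or RV64GC for an application-class core. The following rough sketch in Rust decodes a few of the common letters; it is illustrative only and glosses over much of the real naming scheme (version numbers, the newer “Z” extensions, and so on).

```rust
use std::collections::HashMap;

/// Decode a RISC-V ISA string such as "rv32imac" into human-readable parts.
/// Illustrative sketch only: it handles just a few common single-letter
/// extensions and ignores many details of the real naming convention.
fn describe_isa(isa: &str) -> Option<Vec<String>> {
    let meanings: HashMap<char, &str> = [
        ('i', "base integer instructions"),
        ('e', "reduced (embedded) integer base"),
        ('m', "integer multiply/divide"),
        ('a', "atomic instructions"),
        ('f', "single-precision floating point"),
        ('d', "double-precision floating point"),
        ('c', "compressed 16-bit instructions"),
        ('v', "vector operations"),
    ]
    .into_iter()
    .collect();

    let isa = isa.to_lowercase();
    // The prefix gives the width of the integer registers.
    let (width, rest) = if let Some(rest) = isa.strip_prefix("rv32") {
        (32, rest)
    } else if let Some(rest) = isa.strip_prefix("rv64") {
        (64, rest)
    } else {
        return None;
    };

    let mut parts = vec![format!("{}-bit registers", width)];
    for ch in rest.chars() {
        if ch == 'g' {
            // 'G' is shorthand for the common general-purpose bundle IMAFD.
            for g in ['i', 'm', 'a', 'f', 'd'] {
                parts.push(meanings[&g].to_string());
            }
        } else {
            parts.push(meanings.get(&ch)?.to_string());
        }
    }
    Some(parts)
}

fn main() {
    // A plausible microcontroller profile and a plausible application-class one.
    for isa in ["rv32imac", "rv64gc"] {
        println!("{} -> {:?}", isa, describe_isa(isa));
    }
}
```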
RISC-V base instructions are a fixed 32 bits wide, but the optional “C” (compressed) standard extension adds shorter 16-bit encodings of the most common instructions, making instruction lengths variable. Compressed instructions are desirable in environments where memory is limited and program size is a factor.
What’s most exciting, however, is the list of companies that are currently pushing RISC-V. Major players like Nvidia, Samsung, Western Digital, and Google — as well as a host of smaller firms — are actively working on RISC-V projects.
If you are really interested in finding out more, a great starting point is the slide deck for the RISC-V “state of the union” presentation given last year:
https://content.riscv.org/wp-content/uploads/2017/12/Tue0900-StateOfUnion-krste.pdf
Fuchsia OS
Fuchsia is a new operating system being built at Google. Unlike Android, which is based on Linux, Fuchsia is an entirely new OS being built from scratch.
Fuchsia uses the capability security model (described earlier) at every level. This means that it should be much more secure than current operating systems.
Fuchsia is being targeted as a mobile OS for phones and tablets, but I suspect that if it is successful it will eventually find its way to desktops, servers and embedded systems.
Of course, there have been many previous attempts to create independent open-source operating systems. For example, EROS was a capability-based OS that never caught on. So what makes this one worthy of attention? Because it’s freakin’ Google, that’s why!
It may be that Fuchsia won’t be as successful as I hope. Nor do I have any insight into what Google’s plans are. But what Fuchsia is promising is something the software industry badly needs.
Summary
I can’t predict whether RISC-V and Fuchsia will ever take off, but I hope they will. Of course, these are only two parts of a much larger problem, but these two parts are very important because they represent the very lowest levels of the computing stack. It’s the layers at the bottom that are typically the most difficult to replace.
There are a few other refactorings going on, but I’m not going to list them all here, especially since many of them are deeply technical. One worth mentioning is Named Data Networking (NDN), which is a complete rethinking of how content is accessed online.
There is a lot of work to be done, at every level. But I believe it is possible that someday we will have computers that are much more secure, and much more efficient, than the ones we have today.