About: Eric Lippert
At my company, Educative, we get to chat with developers from all over the world and learn their stories: who they are, and what inspired them to become developers and teach those around them. Today, I sat down with Eric Lippert to learn more about his career and the exciting world of C#, the System.Random class, and probabilistic programming.
I was always fascinated by computers, even as a small child. I started programming when I was nine, by writing out on paper some little animation programs to make rocket ships fly around the screen. I had to use paper because I didn’t own a computer; I’d then type the programs in on the Commodore PET in the library after school to see if I got it right. My elementary school librarian was a very kind and patient person, and we’re still occasionally in touch many decades later.
Pretty soon after that, my parents got me a Commodore 64, and I started programming in earnest. I worked at a compiler company as a summer job in high school, where I got my first understanding of how professionals program. After that, I did a joint computer science/applied mathematics degree at Waterloo, and ended up working at Microsoft on the Visual Basic compiler as part of the co-op program. It was then very easy for me to choose to come to Microsoft and continue to work on languages.
I left Microsoft in 2012 and went to work at Coverity on improving their C# static analysis product for a couple of years, and now I work on developer tools at Facebook. Essentially I’ve been working on developer tools almost exclusively for some decades now; it’s a lot of fun working on the sorts of tools that I would like to use myself!
GetAFix is an experimental developer tool where we analyze a corpus of code changes which we believe are fixes to particular defects, and then try to deduce what the common fix patterns are. When presented with a novel fragment of code that might contain a similar defect, we deduce which fix pattern seen in the corpus is most likely to resolve the problem. We then present the proposed solution to the developer, and most of the time they agree that it is a good fix.
HackPPL adds probabilistic programming to Hack, which is a statically-typed variant of PHP. I did some architecture work on the Hack compiler proper and helped build the first prototype of the PPL extensions; it has been interesting to see how it has evolved since.
Absolutely. We face all kinds of problems in modern programming that involve statistical or probabilistic reasoning, but many modern, general-purpose programming languages do not present any kind of unified, consistent approach to helping developers solve these problems. For example, cell phone sensors have some error associated with them, so even answering a simple question like, “is the phone moving or still?” involves some probabilistic reasoning, not to mention more complex problems like “is the phone moving on a route that will encounter a construction delay?”.
Almost any problem we face in modern programming has some sort of uncertainty. Think about a few of the probabilistic problems in travel management. What is the probability that the user will need to make a change to their itinerary, or that any plane will be delayed causing a missed connection? What’s the probability that the recommendation that the user wants most is shown in the first three choices? It’s safe to say that many problems involve making predictions of an unknowable future, and we can make better predictions if our tools support principled statistical reasoning right out of the box.
Just as object-oriented programming is programming with objects, and functional programming is programming with functions, probabilistic programming is, no surprise, programming with probabilities. But you’d be right to point out that this tautological answer doesn’t tell us much about any of those programming paradigms.
The basic idea of a probabilistic programming language is that we build into the language itself the notion that a particular value may represent a distribution of possible values, and those values are used by the program to make choices:
- this cell phone is 90% likely to be moving north but 50% of the time the user has been on this route, they stop for lunch on the next block; should we inform the user of the construction delay five blocks north of them?
- this user is 20% likely to click on link X, but 30% likely to click on link Y; which link should we present? (Remember, the value function associated with the two links may be different!)
- this code fragment is 90% likely to have a null dereference defect, and this fix is 70% likely to remove the defect if it exists; should we present the fix?
Based on the control flow, the program then infers new probabilities based on combinations of old ones. The question to the language designer is then: how do we represent those combinations? How do we represent “60% of all emails are not spam, but 99% of emails that mention Nigerian bank offices are spam”, and use that to make a good decision about whether to filter an incoming email? What specifically does such a program look like, and how can we make it natural and easy for the developer to write such a program?
In many modern languages we have created tools that provide a unified, consistent approach to solving problems involving sequences of data; think about LINQ in C#, or sequence comprehensions in Python. How did we do that?
We started by coming up with a unifying abstraction that all sequences have in common, and then we built language elements that allow developers to combine those abstractions in a powerful way. There are some mathematical abstractions that are so much the “air we breathe” that we don’t even think of them as abstractions anymore, like addition or multiplication. The genius of LINQ in C# was to say that, just as addition and multiplication are built into the language as operations on numbers, the operations of sort, filter, group, join, and project are built into the language as operations on sequences. Just as you say:
x = a + b * c;
and have a natural intuition about what that means, so too you can say:
results = from c in customers where c.City == "London" select c.LastName;
Even if you are not a C# programmer, it is pretty easy to see that we’ve got a collection of customers and we’re asking “what are the last names of the customers in London?”. These operations are baked into the language, just as addition is baked in.
We could embed a similar (in fact, almost identical!) approach to statistically distributed data into programming languages and their libraries. The connection between sequences and distributions is very strong; one way to think about a distribution is as an infinite sequence of values: a six-sided die can be modeled as an unbounded sequence of rolls where each number appears some fraction of the time.
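To make the die analogy concrete, here is a minimal C# sketch (an illustration added for this article, not code from the interview). The `Die` class and its `Rolls` method are hypothetical names; the only library calls assumed are `System.Random.Next(int, int)` and standard LINQ operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Die
{
    // Hypothetical helper: an unbounded sequence of rolls of a fair six-sided die.
    public static IEnumerable<int> Rolls(Random random)
    {
        while (true)
        {
            // Next(1, 7) returns an integer from 1 to 6; the upper bound is exclusive.
            yield return random.Next(1, 7);
        }
    }

    public static void Main()
    {
        var random = new Random(12345); // fixed seed only so that runs are repeatable
        const int count = 60000;

        // Take a finite prefix of the infinite sequence and tabulate it.
        var histogram = Rolls(random)
            .Take(count)
            .GroupBy(roll => roll)
            .OrderBy(group => group.Key);

        foreach (var group in histogram)
        {
            double fraction = (double)group.Count() / count;
            Console.WriteLine($"{group.Key}: {fraction:F3}");
        }
    }
}
```

Each face should show up in roughly one sixth of the rolls, which is the "fraction of the time" that characterizes the distribution.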
That said: the operations you typically perform on sequences can be very different than the operations you typically perform on distributions, so it is important to not go too far in treating two similar things as though they are the same thing. Operations like, “sample from this distribution” or “compute a posterior from this prior and this observation”, could be similarly abstracted into the type system and then supported by new features in the language. But we are only just starting to see these sorts of features appear in line-of-business languages.
In C# there is a class called System.Random that gives you two things: either a uniform distribution of fractions between 0.0 and 1.0, or a uniform distribution of integers between an upper and lower bound. Historically, the implementations of this class have been pretty poor in that it is very easy to write a buggy program using it; we want the natural, easy way to use a library to also be the right way, and it is not.
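As a rough illustration of the two operations described above, and of one well-known way to misuse the class, consider the sketch below. It is not from the interview; `RandomBasics` and `SameSeedSameSequence` are hypothetical names. The pitfall it hints at: on the older .NET Framework, the parameterless `Random` constructor took its seed from the system clock, so instances created in quick succession could produce identical sequences. The sketch demonstrates the underlying behavior deterministically by passing the same seed explicitly.

```csharp
using System;

public static class RandomBasics
{
    // Hypothetical helper: two generators built from the same seed
    // produce the same values, call after call.
    public static bool SameSeedSameSequence(int seed, int count)
    {
        var first = new Random(seed);
        var second = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            if (first.Next() != second.Next())
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        var random = new Random();

        // A uniformly distributed fraction: 0.0 <= fraction < 1.0.
        double fraction = random.NextDouble();

        // A uniformly distributed integer: 1 <= roll < 7.
        int roll = random.Next(1, 7);

        Console.WriteLine($"fraction = {fraction}, roll = {roll}");
        Console.WriteLine(SameSeedSameSequence(42, 100));
    }
}
```

The usual advice is to create one `Random` instance and reuse it, rather than constructing a new one each time a value is needed.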
Fortunately some of these problems have been fixed in .NET Core, but the true deficiency is deeper than the poor implementation choices. The real problem is that we are well beyond merely needing a source of uniform randomness on an interval to sample from; the problems we have to solve that use probabilities are orders of magnitude more complex.
If the problem you have is, “the probability of a random person in a population having a disease is X%, and we have a diagnostic test that is correct Y% of the time; if a randomly chosen person tests positive, what is the probability that they have the disease?”, then the answer is neither X nor Y, but a combination of the two which we can work out mathematically.
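For readers who want to see that combination written out, here is a worked version using Bayes' theorem. It rests on an assumption the question leaves open: that "correct Y% of the time" applies equally to sick and healthy people. Write $x = X/100$ for the prevalence and $y = Y/100$ for the test accuracy, $D$ for "has the disease" and $+$ for "tests positive":

$$
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} = \frac{x\,y}{x\,y + (1 - x)(1 - y)}
$$

With illustrative numbers $x = 0.01$ and $y = 0.99$, this gives $0.0099 / (0.0099 + 0.0099) = 0.5$: a positive result from a 99%-accurate test means only a 50% chance of having a disease that affects 1% of the population.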
This kind of reasoning is hard for humans to do, even if they’re trained. But if we have elements in our programming languages that represent prior probabilities, observations, and posterior probabilities, then we can write very straightforward programs that answer these questions for us correctly, just as we can write a straightforward program that means “give me the last names of customers in London”.
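One way such a program might look in today's C#, without any new language features, is sketched below. This is an illustration added for this article, not Eric's design: the `Inference` class and `Posterior` method are hypothetical names, the prior is a plain dictionary from hypotheses to probabilities, and the numbers (1% prevalence, 99% accuracy for sick and healthy people alike) are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Inference
{
    // Hypothetical helper: given a prior over hypotheses and the likelihood of
    // the observation under each hypothesis, return the posterior.
    // posterior(h) is proportional to prior(h) * likelihood(h).
    public static Dictionary<T, double> Posterior<T>(
        IReadOnlyDictionary<T, double> prior,
        Func<T, double> likelihood)
    {
        var unnormalized = prior.ToDictionary(
            pair => pair.Key,
            pair => pair.Value * likelihood(pair.Key));

        // Normalize so the posterior probabilities sum to one.
        double total = unnormalized.Values.Sum();
        return unnormalized.ToDictionary(
            pair => pair.Key,
            pair => pair.Value / total);
    }

    public static void Main()
    {
        // Prior: 1% of the population has the disease (made-up number).
        var prior = new Dictionary<bool, double>
        {
            [true] = 0.01,
            [false] = 0.99,
        };

        // Observation: a positive test, assumed 99% accurate either way.
        Func<bool, double> positiveTest = hasDisease => hasDisease ? 0.99 : 0.01;

        var posterior = Posterior(prior, positiveTest);
        Console.WriteLine(posterior[true]);
    }
}
```

With these numbers the posterior probability of disease comes out at one half. A probabilistic language would aim to make the prior, the observation, and the posterior first-class concepts rather than a dictionary and a delegate.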
I started programming in C# when I worked at Microsoft in the very early days of the language. Just as the design process for C# 3 was wrapping up, I joined the C# compiler team, where I implemented a lot of the semantic analyzer. I was then invited to join the C# design committee; I spent about seven years at Microsoft working on the design of C# 4, 5, and 6, and implementing the “Roslyn” version of the compiler, again mostly concentrating on the semantic analysis engine. I’m particularly pleased to have worked on the overload resolution and type inference engine in several versions of C#; there are some interesting problems to solve! And as I noted before, I worked at Coverity for a couple of years on a Roslyn-based static analyzer that looks for defects in real-world C# programs.
First off, C# is by far the language that I know best; I spent many thousands of hours studying it carefully and thinking about its design and implementation. When faced with a novel problem, my thought process usually begins with “how would I do this in C#?”.
But that’s not what I love about it. C# was designed by professional, pragmatic programmers for professional, pragmatic programmers. It’s firmly in the OO family of languages, but the design is not dogmatically OO; the designers look at what is working well in functional languages, declarative languages, research languages, and so on, and incorporate the best ideas from those languages without losing sight of what makes C# feel like C#. It was a privilege to work with that design team for so many years.
I am very excited by where C# is going in C# 8; on the language design front, embracing non-nullable reference types in the language is an enormously bold move that will pay off in improved developer productivity and fewer user-impacting bugs. But what is really exciting is how well Microsoft has embraced the open source ethos for the language, and how this encourages the spread of the language beyond the Windows ecosystem and into the broader software community. Making that transition was not easy, and I applaud my colleagues for embracing a new way of working.
I’ve seen no evidence at all that probabilistic programming in C# is on the design team’s radar; I hope it is now!
And from that, it was very natural to continue supporting users by answering questions on Stack Overflow, editing books about C#, and so on. It’s fun, and I learn a lot about where the pain points are in a language from all kinds of different perspectives. The problems that experts have with a language are very different from those that beginners have, but they’re both important; by improving the learning experience for a lot of beginners we will grow the next generation of experts.
I’ve been editing technical books as a hobby for a long time, and I’ve co-written a few books myself. I get asked to write new books about C# fairly frequently, and I push back on it primarily because I don’t have the immense amount of free time that it takes to write a book. But a secondary reason why I push back on writing new computer programming books is that it just seems bizarre to me that we write down computer programs on paper and expect people to learn from them like I did when I was nine years old. It’s not the 1980s anymore; computers are ubiquitous, and indeed, a lot of people do their reading exclusively on computerized devices now.
I’ve put up with this deficiency in my blog for over a decade now; I would really much rather people be able to see the code in the browser, and edit and execute it too. And I’ve often thought that if I were to write something of book length again, I would want to do it like an online fully-interactive course, rather than a static, ink-on-dead-trees book.
A platform like Educative, where I can embed runnable code right in each lesson, is a great start to improving this situation. And it is exciting because it is a start; there is so much more that could be done to make it even more interactive. For example, there are a lot of graphs in my lessons that show different probability distributions, but they are all static images; they could also be live and interactive, to allow the user to adjust different parameters and see what happens.
We’ve been trying to figure out how to incorporate computers into pedagogy literally since I was a small child, and the results have been mixed at best; we may finally be on the cusp of having the necessary tools to make it much easier for educators to connect with students in a rich and interactive way.
It’s apparent that probabilistic programming in C# can be tough, especially when using the System.Random class, as Eric has alluded to.