Out of the AI Winter and into the Cold

[Figure: dwave_log_temp_scale – a logarithmic temperature scale doesn’t have the appropriate visual impact to convey how extraordinarily cold 20 mK is.]

Any quantum computer using superconducting Josephson junctions has to be operated at extremely low temperatures. The D-Wave machine, for instance, runs at about 20 mK, which is much colder than anything in nature (including deep space). A logarithmic scale like the chart to the right, while technically correct, doesn’t really do this justice. This animated chart from D-Wave’s blog, scaled linearly, makes the point far more dramatically (the first link goes to an SVG file that should play in all modern browsers, but it takes ten seconds to get started).

Given that D-Wave’s most prominent use case is machine learning, a casual observer might be misled into thinking that the term “AI winter” refers to the propensity of artificial neural networks to blossom in this frigid environment. But what the term actually refers to is the brutal hype cycle that ravaged this field of computer science.

One of the first casualties of the collapse of artificial intelligence research in 1969 was the ancestor of the kind of learning algorithms that are now often implemented on D-Wave’s machines. This incident is referred to as the XOR affair, and the story that circulates goes like this:  “Marvin Minsky, being a proponent of structured AI, killed off the connectionist approach when he co-authored the now classic tome, Perceptrons. This was accomplished by mathematically proving that a single-layer perceptron is so limited it cannot even be used (or trained, for that matter) to emulate an XOR gate. Although this does not hold for multi-layer perceptrons, his word was taken as gospel, and it smothered this promising field in its infancy.”

Marvin Minsky begs to differ and argues that he of course knew about the capabilities of artificial neural networks with more than one layer, and that, if anything, only the proof that working with local neurons comes at the cost of some universality should have had any bearing. It seems impossible to untangle the exact dynamics that led to this most unfortunate AI winter, yet in hindsight it seems entirely misguided and avoidable: a learning algorithm (backpropagation) that allowed for the efficient training of multi-layer perceptrons had already been published a year prior, but at the time it received very little attention.
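The XOR point is easy to check numerically. Below is a minimal sketch of my own (not taken from Perceptrons or any of the works mentioned here, and using NumPy purely for illustration): the four XOR patterns are not linearly separable, so no single-layer perceptron can reproduce them, but a small two-layer perceptron trained with plain backpropagation learns them easily.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table: no single hyperplane separates the 1s from the 0s
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# two-layer perceptron: 2 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

lr = 1.0
for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backpropagation of the squared-error gradient
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

# converges toward [0, 1, 1, 0] for most random initializations
print(np.round(out, 2).ravel())
```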

The fact is, after Perceptrons was published, symbolic AI flourished and connectionism lay almost dormant for a decade. Given what the authors wrote in the foreword to the revised 1989 edition, there is little doubt how Minsky felt about this:

“Some readers may be shocked to hear it said that little of significance has happened in this field [since the first edition twenty years earlier]. Have not perceptron-like networks – under the new name connectionism – become a major subject of discussion at gatherings of psychologists and computer scientists? Has not there been a “connectionist revolution?” Certainly yes, in that there is a great deal of interest and discussion. Possibly yes, in the sense that discoveries have been made that may, in time, turn out to be of fundamental importance. But certainly no, in that there has been little clear-cut change in the conceptual basis of the field. The issues that give rise to excitement today seem much the same as those that were responsible for previous rounds of excitement. The issues that were then obscure remain obscure today because no one yet knows how to tell which of the present discoveries are fundamental and which are superficial. Our position remains what it was when we wrote the book: We believe this realm of work to be immensely important and rich, but we expect its growth to require a degree of critical analysis that its more romantic advocates have always been reluctant to pursue – perhaps because the spirit of connectionism seems itself to go somewhat against the grain of analytic rigor.” [Emphasis added by the blog author]

Fast-forward to 2013 and the reception that D-Wave receives from some academic quarters, and this feels like déjà vu all over again. Geordie Rose, founder and current CTO of D-Wave, unabashedly muses about spiritual machines, although he doesn’t strike me as a particularly romantic fellow. But he is very interested in using his amazing hardware to make for better machine learning, very much in “the spirit of connectionism”.  He published an excellent mini-series on this at D-Wave’s blog (part 1, 2, 3, 4, 5, 6, 7).  It would be interesting to learn whether Minsky would find fault with the analytic rigor on display there (he is now 86, but I hope he is still going as strong as he was ten years ago when this TED talk was recorded).

So, if we cast Geordie in the role of the 21st-century Frank Rosenblatt (the inventor of the original perceptron), then we surely must pick Scott Aaronson as the modern-day Marvin Minsky.  Only this time the argument is not about AI, but about how ‘quantum’ D-Wave’s device truly is.  The dynamic feels very similar: on one side, the theoretical computer scientist, equipped with boat-loads of mathematical rigor, who strongly prefers the gate model of quantum computing; on the other, the pragmatist, whose focus is on building something usable within the constraints of what chip foundries can produce today.

But the ultimate irony, at least in Scott Aaronson’s mind, is that the AI winter serves as the parable of warning that makes his case (as an anonymous poster to his blog pointed out): he thinks the D-Wave marketing hype can be equated with the over-promises of past AI research. Scott fears that if the company cannot deliver, the baby (i.e. quantum computing) will be thrown out with the bathwater, and so he blogged:

“I predict that the very same people now hyping D-Wave will turn around and—without the slightest acknowledgment of error on their part—declare that the entire field of quantum computing has now been unmasked as a mirage, a scam, and a chimera.”

A statement that of course didn’t go unchallenged in the comment section (Scott is exemplary in allowing this kind of frankness on his blog).

I don’t pretend to have any deeper conclusion to draw from these observations, and will leave it at this sobering thought: While we expect science to be conducted in an eminently rational fashion, history gives ample examples of how the progress of science happens in fits and starts and is anything but rational.

23 thoughts on “Out of the AI Winter and into the Cold”

  1. I think the bigger problem for academic physics is not any public backlash against “marketing hype” from D-Wave. The looming present danger is a public backlash against the “fairytale physics” so ably documented by Jim Baggott in his new book “Farewell to Reality”.

    I left Academic physics in 1997 because I became horrified by the extent to which Academics were prepared to ignore facts – both mathematical and physical – in order to preserve their cherished dreams of the multiverse and parallel worlds alongside many of the confused theories of information and computing then widely promulgated. Now, I think perhaps things have come full circle.

    We are just a few years now from the day when Academic physics must fight for its very life. There is too much bogus fantasy physics promulgated as real while honest real work goes ignored.

    This will change, as it must, and the reckoning will be furious.

    http://www.amazon.com/Farewell-Reality-Physics-Betrayed-Scientific/dp/1605984728

    1. Thanks for bringing this book to my attention. Jim Baggott very much seems to share my frustration with how physics gets packaged and sold to the public. As you suggest, I think this symptom hints at a deeper malady. And given how even the Copenhagen interpretation had rather unintended cultural consequences, the mind reels at what damage this may cause.

      Went ahead and purchased the Kindle edition.

  2. Thanks for this thought-provoking comment. There’s an interesting irony here (and something people might not guess from your post): namely, when it comes to AI, I’m a huge fan of Bayesian, statistical, and connectionist approaches. Sure, I suspect that at some point, logical and explicit knowledge-representation approaches will also be needed—indeed, that pretty much *everything* will be needed. But in the meantime, if statistical approaches are cleaning the clock of more traditional approaches at (say) text translation, voice recognition, or image classification, then quite obviously people should be using the former.

    I consider it obvious that we don’t have the right, as theorists, to tell the practitioners they shouldn’t be using something that works better than any alternative—simply because we don’t adequately understand *why* it works! Instead, the burden (and opportunity) is on us, to develop our theories to the point where they *can* explain why these approaches work in practice.

    But it’s worth emphasizing that the situation with D-Wave is nothing like the above. They claim that it’s possible to get a speedup over any classical approach, without worrying much about coherence, error-correction, or non-stoquasticity — i.e., the things that seem essential to theorists for a speedup. But here, the point is not merely that we don’t understand how their claims could be true. Rather, it’s that all the credible experimental evidence is consistent with their claims not being true! In order to claim a speedup, D-Wave and its boosters needed to misrepresent the situation fairly egregiously. And, as I wrote in my post, if Matthias Troyer’s conjecture is correct, and the D-Wave devices are efficiently simulable by quantum Monte Carlo with no sign problem, then it won’t matter how many more qubits are added: you *still* won’t get an honest speedup over classical.

    Of course, if D-Wave ever *did* get an honest speedup with its current approach, then the situation would shift to be more like that of statistical approaches to AI. I.e., there would obviously be something there, we wouldn’t fully understand it, and the burden would shift to us to explain it.

    1. Scott, thank you for filling in the gaps as to what drives your concerns with regard to D-Wave. Certainly the question of whether D-Wave’s chip can be efficiently simulated with quantum Monte Carlo algorithms, and how ‘efficiency’ means something very specific in computer science, will make for many more future blog posts.

      What’s reassuring, to some extent, is that everybody on all sides of this discussion knows about the AI winter and doesn’t want to see a repeat with regard to quantum computing.

      Let’s hope this counts for something and will ensure that history doesn’t repeat.

    2. Scott, I want to bring this paper (h/t Brian F.) to your attention, as it is rather timely and specific to the discussion: it argues that a quantum speed-up in machine learning can be achieved with adiabatic quantum computing.

      1. Henning, that sounds interesting, but a quick look shows there is no mention of “adiabetic” anywhere in this paper (nor analysis of robustness against noise). Can you explain the connection you see with dwave’s machine?

        1. Try searching for “adiabatic,” instead, which points in turn to these references:

          [10] H. Neven, V. Denchev, G. Rose, and W. Macready, arXiv:0811.0416 [quant-ph] (2008).
          [11] H. Neven, V. Denchev, G. Rose, and W. Macready, arXiv:0912.0779 (2009).
          [12] K. Pudenz and D. Lidar, arXiv:1109.0325 (2011).

          1. Brian F., these references do not provide any support for the claim that Rebentrost et al. “argues that a quantum speed-up in machine learning can be achieved with adiabatic quantum computing”.

            Rebentrost et al. are indeed claiming an exponential speed-up, but for a completely different algorithm (this one: http://arxiv.org/pdf/1307.0401v1.pdf).

          2. “Rebentrost et al. are indeed claiming an exponential speed-up, but for a completely different algorithm…”

            An adiabatic process is not an algorithm, but is defined as one “occurring without loss or gain of heat.”

            http://www.merriam-webster.com/dictionary/adiabatic

            A quantum computer (QC) may presumably execute all sorts of algorithms, so long as it maintains coherence — and coherence is often considered a prerequisite for QCs, although this strikes me as highly questionable in the more general case. Typically, though, it is considered necessary to shield a QC from thermal influences in order for the device to maintain coherence.

            In other views:

            http://bit.ly/19v4eqU

            http://arxiv.org/pdf/1106.2264v3.pdf

            http://eric.ed.gov/?id=EJ873690

            http://bit.ly/19v4Qgg

          3. Brian, adiabatic quantum computing describes an approach to this field that relies on adiabatically evolving a Hamiltonian constructed in such a fashion that its ground state gives the solution of the computational problem under investigation. D-Wave falls into this category, specifically for the Ising spin model that they implement.

            So, in a sense, once the Hamiltonian is constructed you indeed don’t have an algorithm to consider any more; you just let the machine evolve adiabatically (a toy sketch of this encoding follows below). This differs from the gate-based model, which specifically implements various unitary transformations, although one could argue that in the latter case, once the gates are set up, the QC will by its very nature also evolve adiabatically.

            A while ago I attempted a bit of a taxonomy of all the various approaches to QC.
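
            To make this concrete, here is a toy sketch of my own (an illustration, not D-Wave code): a tiny MAX-CUT instance is encoded so that its answer can be read off the ground state of an Ising Hamiltonian H(s) = Σ J_ij s_i s_j. An annealer would reach that ground state by adiabatic evolution; the sketch simply enumerates all spin configurations.

            ```python
            import itertools

            # antiferromagnetic couplings J_ij on a 3-node cycle (a frustrated triangle)
            J = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}

            def energy(spins):
                """Ising energy H(s) = sum_ij J_ij * s_i * s_j for spins in {-1, +1}."""
                return sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())

            # brute-force the ground state; an annealer would find it by adiabatic evolution
            ground = min(itertools.product([-1, +1], repeat=3), key=energy)
            cut = [(i, j) for (i, j) in J if ground[i] != ground[j]]
            print("ground state:", ground, "-> cut edges:", cut)
            ```

            For this frustrated triangle the ground state cuts two of the three edges, which is indeed the maximum cut.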

          4. Kalish, they claim a quantum speed-up for machine learning in this paper based on the quantum matrix inversion algorithm. My mistake was to think they were referring to adiabatic QC. It seems to me the paper you link to comes back to the same algorithm (of course we already established that my first-glance impression of papers is not to be trusted).

            This quantum self-analyzing idea is fascinating in its own right. It reminds me of this older paper which of course assumes the usual statistical sampling of the quantum states.

        2. Thank you for pointing this out. I’ve been too busy to give this a proper read yet and was thrown by their reference at the end of their first paragraph (line 20) where they mention SVM in the context of adiabatic QC [10–13].

          Have been setting this aside to read on the weekend, thinking that somebody had managed to prove a speed-up with the adiabatic approach. Alas, I see now that at the heart of this is their quantum matrix inversion algorithm which, while cool in its own right, is certainly not modelled adiabatically.

          Too bad, it’s unfortunately not as earth-shattering as I thought at first glance 🙁

          1. No problem, happens. Thank you for the discussion, and for providing this link in the first place!

            Actually, I wouldn’t be too surprised if five years from now it will be considered the most earth-shattering contribution since Shor.

  3. Creepy….Could Scott be a re-incarnation of Minsky???? Are you sure Minsky is still alive??? Even if he is, maybe part of his soul has already migrated into Scott.

    1. IMHO Minsky, being an adherent of strong AI, would strongly object to the conjecture that such soul stuff (migratory or not) exists.

      Then again I don’t know if you can teach CS at MIT without absorbing some of the old maestro’s memes 🙂

  4. I fully support the opinion that the entire field of quantum computing (in its existing form) should be unmasked as a mirage, a scam, and a chimera.

  5. Hello HD,
    Believe it or not, I was pals with Geordie once upon a time. You said: “he doesn’t strike me as a particularly romantic fellow”…. I can tell you from many hours of grappling against him that he is more of a cave-man than a don juan.

    Anyhow, I think I will try to read up on the quantum computing game and join you over here. But don’t worry, I won’t tell any of my awful stories.

  6. Henning: I notice that Scott is becoming a mellow fellow! I think that’s because he is trying to soften us up for the day when either NASA, Google, USRA, USC, Lockheed or D-Wave’s scientists come up (hopefully soon) with concrete, definitive & undisputed evidence of a real speed-up by D-Wave Two over classical systems. And then he will apologize to D-Wave, its staff & supporters, that he was “wrong”, and then harangue his fellow physicists to “come up with an explanation”, which ALL of us knew, all along, was due to the quantum mechanical effects that the computer has been using since its inception. It’s worth waiting for!

    1. Scott’s condition is pretty hardcore: he wants clear evidence of a quantum speed-up, whereas it’s a safe bet that D-Wave’s customers will be fine with a practically relevant polynomial speed-up.

      At any rate, from a PR standpoint things are just going swimmingly for D-Wave. While one MIT prof (Scott) denigrates them, another (Seth) bemoans that they stole his IP. It’s a twofer. Free publicity with Scott’s critical stance neutered by Seth’s contentions.

      After all, only something valuable would be worth stealing, and as there are no patents infringed there is no threat of litigation or any other potential harm for D-Wave. Just tons of free advertising.

Comments are closed.