What if they had put off the LIGO upgrades?

If a neutron star falls into a black hole but no one has upgraded the gravitational observatory to the required sensitivity, does it fail completely to change our view of the universe?

The Advanced Laser Interferometer Gravitational-Wave Observatory (aLIGO) consists of a pair of Fabry–Pérot interferometers spaced about 3000 km apart, each sporting two cavities about 4 km long and sensitive to length changes smaller than a proton. The tubes containing the optics operate at a vacuum with about 10 times lower pressure than that experienced by the International Space Station in low Earth orbit. The lasers put out in excess of 100 kW of laser power, and the power in the chambers is further amplified by each photon reflecting off of the test mass and back several hundred times. Each 20 kg test mass is balanced precariously on threads of glass thinner than things that are really rather thin already. In other words, it’s a huge friggin’ laser powerful enough to burn a burrito, with components precariously balanced in an inside out space ship.

On the 14th of September 2015, these instruments recorded measurements that would support the idea that spacetime changes size when masses accelerate. We usually refer to the instruments and all aspects of the research program supporting it by the same acronym: LIGO. Perhaps you’ve heard of it?

Although the colloquial story is that LIGO recorded the historic GW150914 gravitational wave event during an engineering run even before beginning formal scientific data collection, this isn’t strictly true. In fact LIGO had been performing science runs at the Hanford and Livingston sites since 2002. In 2005, LIGO reached an original design sensitivity of strain detection on the order of one part in 10^21. Another way to think about, and the common way to report, the sensitivity of the instruments is the distance at which a typical neutron-pair inspiral could nominally be detected. One part in 10^21 strain sensitivity corresponds to a search distance of about 8 million parsecs (about 26 million light years). This was the sort of sensitivity LIGO was capable of up until the latter part of 2010. As impressive as that is, there were no gravitational wave detections during operation of LIGO from 2002 to 2010.

The now famous GW150914 and subsequent detections GW151226 and GW170104 came after a comprehensive suite of upgrades that boosted sensitivity to a search distance of 80 million parsecs (~262 million light years). Four years of shutdown beginning in 2010 marked the transition from “initial LIGO” to “advanced LIGO” (aLIGO). Four years sounds like quite a while in human time, and an especially conservative experimenter might be wont to keep collecting data until proof-of-concept is established. As long as the machine is working in some rudimentary fashion, pushing to eke out just one detection before shutting down for risky upgrades might sound like it makes sense. What if LIGO had put off the upgrades to instead continue with scientific runs? Not much would have happened, as it turns out.

Our best guess for the frequency of observable events is based on what aLIGO picked up in the first science run. The first advanced run had about 1100 hours of uptime, time when both instruments were locked-in and active. During this run aLIGO picked up 2 confirmed events (and one almost-event, as yet unconfirmed), giving us a rate of 2 events per 1100 hours in a volume of about 2.145 million cubic megaparsecs (the search volume for an 80 megaparsec detection distance). This leads us to expect 1 detection for every 22.92 days of run time, or about 16 detections per year, not considering instrument downtime.

Prioritizing data collection at the cost of forgoing upgrades, we would probably still be waiting on the big announcement. Operating at a pre-2014 sensitivity of 8 megaparsecs, the search volume is 1000 times smaller, so we could expect a detection on average once every 62 years. Assuming a Poisson distribution (events are random), the chances of one or more detections in 4 years of data collection, pre-aLIGO sensitivity, would be just a tick over 6%. For a 50/50 split in the odds of making a detection, we’d have to wait 44 years. Chances are, funding bodies could very well lose interest in that time, and we certainly would not have seen the international enthusiasm in gravitational wave research resulting from the GW150914 announcement.
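For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate, using only the figures quoted above (2 events in about 1100 hours at 80 megaparsecs, scaled to 8 megaparsecs). The function names are my own, and the 365.25-day year is an assumption, so the results only roughly reproduce the rounded numbers in the text.

```python
import math

HOURS_PER_DAY = 24.0
DAYS_PER_YEAR = 365.25  # assumption; the rounding used in the text may differ


def search_volume(distance_mpc):
    """Volume of a sphere of the given radius, in cubic megaparsecs."""
    return 4.0 / 3.0 * math.pi * distance_mpc ** 3


def mean_wait_days(events, uptime_hours, volume_ratio=1.0):
    """Average days per detection, scaled by the relative search volume."""
    return uptime_hours / events / HOURS_PER_DAY / volume_ratio


def prob_at_least_one(years, mean_wait_years):
    """Poisson probability of one or more events within the time span."""
    return 1.0 - math.exp(-years / mean_wait_years)


# Advanced LIGO: 2 confirmed events in about 1100 hours at 80 Mpc.
advanced_wait_days = mean_wait_days(2, 1100.0)

# Initial LIGO: same event density, but a search radius of only 8 Mpc.
volume_ratio = search_volume(8.0) / search_volume(80.0)
initial_wait_years = mean_wait_days(2, 1100.0, volume_ratio) / DAYS_PER_YEAR

chance_in_four_years = prob_at_least_one(4.0, initial_wait_years)
years_for_even_odds = math.log(2.0) * initial_wait_years

print(f"aLIGO: one detection per {advanced_wait_days:.2f} days")
print(f"initial LIGO: one detection per {initial_wait_years:.1f} years")
print(f"chance of a detection in 4 years: {chance_in_four_years:.1%}")
print(f"years of running for 50/50 odds: {years_for_even_odds:.1f}")
```

The detection rate scales with the cube of the search distance, which is why a tenfold improvement in range is worth a thousandfold improvement in expected detections.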

The moral of the story? The difference between being “productive” and creating something great lies in the old “work smarter, not harder” paradigm. Blind diligence and the perseverance to keep on plugging away have little chance of pushing the boundaries of what is known to be possible.


Curious about any of the calculations discussed above? Tinker with my notes in this Jupyter notebook


I too built a rather decent deep learning rig for 900 quid


Robert Heinlein’s 1957 Door into Summer returns throughout to a theme of knowing when it’s “time to railroad.” Loosely speaking this is the idea that one’s success comes as much from historical context as it does from innate ability, hard work, and luck (though much of the latter can be attributed to historical context).

Many of the concepts driving our modern AI renaissance are decades old at least, but the field lost steam as the computers were too slow and the Amazookles of the world were yet to use them to power their recommendation engines and so on. In the meantime computers have gotten much faster and much better at beating humans at very fancy games. Modern computers are now fast enough to make deep learning feasible, and it works for many problems as well as providing insight into how our own minds might work.

I too have seen the writing on the wall in recent years. I can say with some confidence that now is the time to railroad, and by “railroad” I mean revolutionise the world with artificial intelligence. A lot of things changed in big ways during the original “time to railroad,” the industrial revolution. For some this meant fortune and progress and for others, ruin. I would like to think that we are all a bit brighter than our old-timey counterparts were back then and we have the benefit of our history to learn from, so I’m rooting for an egalitarian utopia rather than an AI apocalypse. In any case, collective stewardship of the sea changes underway is important and this means the more people learn about AI the less likely the future will be decided solely by the technocratic elites of today.

I’ve completed a few MOOCs on machine learning in general and neural networks in particular, coded up some of the basic functions from scratch and I’m beginning to use some of the major libraries to investigate more interesting ideas. As I moved on from toy examples like MNIST and housing price prediction one thing became increasingly clear:

It took me a week of work to realize I was totally on the wrong track training a vision model meant to mimic cuttlefish perception on my laptop. This sort of wasted time really adds up, so I decided to go deeper and build my own GPU-enhanced deep learning rig.

Luckily there are lots of great guides out there as everyone and their grandmother is building their own DL rig at the moment. Most of the build guides have something along the lines of “. . . for xxxx monies” in the title, which makes it easier to match budgets. Build budgets run the gamut from the surprisingly capable $800 machine by Nick Condo to the serious $1700 machine by Slav Ivanov all the way up to the low low price of “under $5000” by Kunal Jain. I did not even read the last one because I am not made of money.

I am currently living in the UK, so that means I have to buy everything in pounds. The prices for components in pounds sterling are. . . pretty much the same as they are in greenbacks. The exchange rate to the British pound can be a bit misleading, even now that Brexit has crushed the pound sterling as well as our hopes and dreams. In my experience it seems like you can buy about the same for a pound at the store as for a dollar in the US or a euro on the continent. It seems like the only thing they use the exchange rate for is calculating salaries.

I’d recommend first visiting Tim Dettmers’ guide to choosing the right GPU for you. I’m in a stage of life where buying the “second cheapest” appropriate option is usually best. With a little additional background reading and following Tim’s guide, I selected the Nvidia GTX 1060 GPU with 6GB of memory. This was from Tim’s “I have little money” category, one up from the “I have almost no money” category, and in keeping with my life philosophy of the second-cheapest. Going to the next tier up is often close to twice as costly, but not close to twice as good. This holds for my choice of GPUs as well: a single 1070 is about twice the cost and around 50% or so faster than a 1060. However, two 1060s do get you pretty close to twice the performance, and that’s what I went with. As we’ll see in the benchmarks, Tensorflow does make it pretty easy to take advantage of both GPUs, but doubling the capacity of my DLR by doubling the GPUs in the future won’t be plausible.

My upgradeability is somewhat limited by the number of threads (4) and PCIe lanes (16) of the modest i3 CPU I chose; if a near-term upgrade were a higher priority, I should have left out the second 1060 GPU and diverted that part of the budget to a better CPU (e.g. the Intel Xeon E5-1620 V4 recommended by Slav Ivanov). But if you’re shelling out so much for a higher-end system you’ll probably want a bigger GPU to start with, and it’s easy to see how one can go from a budget of $800 to $1700 rather quickly.

The rest of the computer’s job is to quickly dump data into the GPU memory without messing things up. I ended up using almost all the same components as those in Nick’s guide because, again, my physical makeup is meat rather than monetary in nature.

Here’s the full list of components. I sourced what I could from Amazon Warehouse Deals to try and keep the cost down.


GPU (x2): Gigabyte Nvidia GTX 1060 6GB (£205.78 each)
Motherboard: MSI Intel Z170 KRAIT-GAMING (£99.95)
CPU: Intel Core i3 6100 Skylake Dual-Core 3.7 GHz Processor (£94.58)
Memory: Corsair CMK16GX4M2A2400C14 Vengeance 2x8GB (£105.78)
PSU: Corsair CP-9020078-UK Builder Series 750W CS750M ATX/EPS Semi-Modular 80 Plus Gold Power Supply Unit (£77.25)
Storage: SanDisk Ultra II SSD 240 GB SATA III (£72.18)
Case: Thermaltake Versa H23 (£27.10)

Total: £888.40

I had never built a PC before and didn’t have any idea what I was doing. Luckily, YouTube did, and I didn’t even break anything when I slotted all the pieces together. I had an install thumb drive for Ubuntu 16.04 hanging around ready to go and consequently I was up and running relatively quickly.

The next step was installing the drivers and CUDA developer’s toolkit for the GPUs. I’ve been working mainly with Tensorflow lately, so I followed their guide to get everything ready to take advantage of the new setup. I am using Anaconda to manage Python environments for now, so I made one with tensorflow and another with tensorflow_gpu packages.

I decided to train on the CIFAR10 image classification dataset using this tutorial to test out the GPUs. I also wanted to see how fast training progresses on a project of mine, a two-category classifier for quantitative phase microscope images.

The CIFAR10 image classification tutorial from tensorflow.org was perfect because you can flag for the training to take place on one or two GPUs, or train on the CPU alone. It takes ~1.25 hours to train the first 10000 steps on the CPU, but only 4 minutes for the same training on one 1060. That’s a weeks-to-days/days-to-hours/hours-to-minutes level of speedup.

# CPU 10000 steps
2017-06-18 12:56:38.151978: step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)
2017-06-18 12:56:42.815268: step 10, loss = 4.60 (274.5 examples/sec; 0.466 sec/batch)

2017-06-18 14:12:50.121319: step 9980, loss = 0.80 (283.0 examples/sec; 0.452 sec/batch)
2017-06-18 14:12:54.652866: step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)

# One GPU
2017-06-18 15:50:16.810051: step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)
2017-06-18 15:50:17.678610: step 10, loss = 4.62 (6139.0 examples/sec; 0.021 sec/batch)
2017-06-18 15:50:17.886419: step 20, loss = 4.54 (6197.2 examples/sec; 0.021 sec/batch)

2017-06-18 15:54:00.386815: step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)

# Two GPUs
2017-06-25 14:48:43.918359: step 0, loss = 4.68 (4.7 examples/sec; 27.362 sec/batch)
2017-06-25 14:48:45.058762: step 10, loss = 4.61 (10065.4 examples/sec; 0.013 sec/batch)

2017-06-25 14:52:28.510590: step 6000, loss = 0.91 (8172.5 examples/sec; 0.016 sec/batch)

2017-06-25 14:54:56.087587: step 9990, loss = 0.90 (6167.8 examples/sec; 0.021 sec/batch)
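The throughput figures in these logs can be compared directly. Here is a small sketch that pulls the examples/sec value out of a log line and takes the ratio; the two sample lines are copied from the logs above, while the regular expression and function name are my own assumptions about the log format, not part of the TensorFlow tutorial.

```python
import re

# Matches the "(282.5 examples/sec; ..." portion of a tutorial log line.
RATE_PATTERN = re.compile(r"\(([\d.]+) examples/sec;")


def examples_per_sec(log_line):
    """Return the throughput reported on one log line, or None if absent."""
    match = RATE_PATTERN.search(log_line)
    return float(match.group(1)) if match else None


cpu_line = ("2017-06-18 14:12:54.652866: step 9990, loss = 1.03 "
            "(282.5 examples/sec; 0.453 sec/batch)")
gpu_line = ("2017-06-18 15:54:00.386815: step 10000, loss = 0.96 "
            "(5823.0 examples/sec; 0.022 sec/batch)")

speedup = examples_per_sec(gpu_line) / examples_per_sec(cpu_line)
print(f"one-GPU speedup over CPU: {speedup:.1f}x")
```

The step 0 lines are best left out of any comparison like this, since they include one-off startup time rather than steady-state training speed.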

That’s about a 21-32x speedup on the GPUs. It’s not quite double the speed on two GPUs because the model isn’t big enough to utilize all of both GPUs, as we can see in the output from nvidia-smi.

# Training on one GPU

# Training on two GPUs

My own model had a similar speedup, going from training about one 79-image minibatch per second to training more than 30 per second. Trying to train this model on my laptop, a Microsoft Surface Book, I was getting about 0.75 steps a second. [Aside: the laptop does have a discrete GPU, a variant of the GeForce 940M, but no Linux driver that I’m aware of :/].

# Training on CPU only
INFO:tensorflow:global_step/sec: 0.981465
INFO:tensorflow:loss = 0.673449, step = 173 (101.889 sec)
INFO:tensorflow:global_step/sec: 0.994314
INFO:tensorflow:loss = 0.64968, step = 273 (100.572 sec)

# Dual GPUs
INFO:tensorflow:global_step/sec: 30.3432
INFO:tensorflow:loss = 0.317435, step = 90801 (3.296 sec)
INFO:tensorflow:global_step/sec: 30.6238
INFO:tensorflow:loss = 0.272398, step = 90901 (3.265 sec)
INFO:tensorflow:global_step/sec: 30.5632
INFO:tensorflow:loss = 0.327474, step = 91001 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.5643
INFO:tensorflow:loss = 0.43074, step = 91101 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.6085

Overall I’m pretty satisfied with the setup, and I’ve got a lot of cool projects to try out on it. Getting the basics for machine learning is pretty easy with all the great MOOCs and tutorials out there, but the learning curve slopes sharply upward after that. Working directly on real projects with a machine that can train big models before the heat-death of the universe is essential for gaining intuition and tackling cool problems.

A Skeptic Over Coffee: Young Blood Part Duh

Does this cloudy liquid hold the secret to vitality in your first 100 years and beyond? I can’t say for sure that it doesn’t. What I can say is that I would happily sell it to you for $8,000.

Next time someone tries to charge you a premium to intravenously imbibe someone else’s blood plasma, you have my permission to tell them no thanks. Unless there’s a chance that it is fake, then it might be worth doing.

Californian company Ambrosia LLC has been making the rounds in publications like the New Scientist hype-machine to promote claims that their plasma transfusions show efficacy at treating symptomatic biomarkers of aging. Set up primarily to exploit rich people by exploiting younger, poorer people on the off chance that the Precious Bodily Fluids of the latter will invigorate the former, the small biotech firm performed a tiny study of over-35s receiving blood plasma transfusions from younger people. It’s listed on clinicaltrials.gov and everything.

First of all, to determine the efficacy of a treatment it’s important that both the doctors and the patients are blinded to whether they are administering/being administered the active therapeutic. That goes all the way up the line from the responsible physician to the phlebotomist to the statistician analyzing the data. But to blind patients and researchers the study must include a control group receiving a placebo treatment, which in this case there was not. So it’s got that going for it.

To be fair, this isn’t actually bad science. For that to be true, it would have to be actual science. Not only does a study like this require a control to account for any placebo effect*, but the changes reported for the various biomarkers may be well within common fluctuations.

Finally, remember that if you assess 20 biomarkers with the common significance cutoff of p=0.05, chances are one of the twenty will show a statistical difference from baseline. That is what a cutoff at that level means: when there is no real effect, a test will still flag a difference about 1 time in 20 by random chance alone. Quartz reports the Ambrosia study looked at about 100 different biomarkers and mentions positive changes in 3 of them. I don’t know if they performed statistical tests at a cutoff level of 0.05, but if so you should expect on average 5 of 100 biomarkers in a screen to show a statistical difference even if the treatment does nothing. This isn’t the first case of questionable statistics selling fountain of youth concepts.
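The multiple-comparisons arithmetic is easy to check. This is a minimal sketch that assumes the tests are independent and that the treatment has no real effect on any biomarker (both simplifications of mine; biomarkers in a real study are often correlated).

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests crosses the cutoff."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of null tests that cross the cutoff."""
    return n_tests * alpha


print(f"20 biomarkers, at least one hit: {prob_any_false_positive(20):.0%}")
print(f"100 biomarkers, expected hits: {expected_false_positives(100):.0f}")
```

Under those assumptions, three "positive" biomarkers out of about a hundred is fewer than you would expect from noise alone.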

All of this is not to say that the experiments disprove the positive effects of shooting up teenage PBFs. They also generated zero conclusive evidence against the presence of a large population of English teapots in erratic orbits around Saturn.

You could conclude by saying “more scientific investigation is warranted” but that would imply the work so far was science.

* The placebo effect can even apply to as seemingly objective a treatment as surgery. Take this 2013 study that found no statistical difference in the outcomes of patients with knee problems treated with either arthroscopic surgery or a surgeon pretending to perform the surgery.

I

What the cornerstone of any futuristic transportation mix should be.

The future has always promised exciting new forms of transport for the bustling hither and thither of our undoubtedly jumpsuit-wearing, cyborg selves. From the outlandish (flying cars) to the decidedly practical (electric cars), a better way of getting about is always just around the corner. Workers in the United States spend about 26 minutes twice a day on their commutes, and for most people this means driving. What’s worse, the negative effect of a long commute on life satisfaction is consistently under-estimated. Premature deaths in the United States due to automobile accidents and air pollution from vehicles are about 33,000 and an estimated 58,000 yearly, respectively. Add in all the costs associated with car ownership and road maintenance (not to mention the incalculable cost of automobiles’ contribution to the potentially existential threat of climate change) and the picture becomes clear: cars aren’t so much a convenient means of conveyance serving the humans they carry, but rather a demanding taskmaster that may be the doom of us all. There must be something better awaiting us in the transportation wonders of tomorrow.

What if we came up with a transportation mode that is faster than taking the bus, costs less than driving, and improves lifespan? What if it also happened to be the most efficient means of transport known? Anything offering up that long list of pros should be a centerpiece of any transportation blend. What wonder of future technology could I possibly be talking about?

I’m writing, of course, about the humble bicycle.

Prioritizing exotic transportation projects like Elon Musk’s hyperloop is like inventing a new type of ladder to reach the highest branches, all the while surrounded on all sides by drooping boughs laden with low-hanging fruit. In a great example of working harder, not smarter, city planners in the U.S. strive tirelessly to please our automobile overlords. Everyone needs a car to get to work and the supermarket, because everything is far apart, and everything is so far apart because everyone drives everywhere anyway. All the parking spaces and wide lanes push everything even further apart in a commuting nightmare feedback loop.

It doesn’t have to be that way, and it’s not too late to change. Consider the famously bikeable infrastructure of the Netherlands, where the bicycle is king. Many people take the purpose-built bike lanes for granted and assume they’ve always been there, but in fact they are a result of deliberate activism leading to a broad change in transportation policy beginning in the seventies. Likewise, the servile relationship many U.S. cities maintain with cars is not set in stone, and, contrary to popular belief, fuel taxes and registration fees don’t cover the costs.

Even if every conventional automobile were replaced tomorrow with a self-driving electric car, a bicycle would still be a more efficient choice. The reason comes down to simple physics: a typical bike’s ~10 kg is a fraction of the mass of the average rider, so most of the energy delivered to the pedals goes toward moving the human cargo. A car (even a Tesla) has to waste most of its energy moving the car itself. The only vehicle that has a chance of besting the bicycle in terms of efficiency is an electric-assist bicycle, once you factor in the total energy costs of producing and shipping the human fuel (food), but even that depends on where you buy your groceries [pdf].
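To put rough numbers on the mass argument, here is a back-of-the-envelope sketch. Only the ~10 kg bike comes from the text; the 75 kg rider and 1500 kg car are illustrative assumptions of mine, and payload mass fraction is only a crude proxy for energy use (it ignores drag, rolling resistance, and drivetrain losses).

```python
def payload_fraction(payload_kg, vehicle_kg):
    """Share of the total moving mass that is the person being carried."""
    return payload_kg / (payload_kg + vehicle_kg)


RIDER_KG = 75.0   # assumed rider mass, not from the text
BIKE_KG = 10.0    # "a typical bike's ~10 kg"
CAR_KG = 1500.0   # assumed car mass, not from the text

bike_share = payload_fraction(RIDER_KG, BIKE_KG)
car_share = payload_fraction(RIDER_KG, CAR_KG)

print(f"bicycle: {bike_share:.0%} of the moving mass is the rider")
print(f"car:     {car_share:.0%} of the moving mass is the driver")
```

With these assumed masses the rider is most of what a bicycle moves, while a lone driver is a small sliver of what a car moves.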

Bicycles have been around in more or less modern form for over a hundred years, but the right tool isn’t necessarily the newest. The law of parsimony posits that the simplest solution that suffices is generally the best, and for many of our basic transport needs that means a bicycle. It’s about time we started affording cycling the respect it deserves as a central piece of our future cities and towns. Your future transportation experience may mean you’ll go to the office in virtual reality, meet important clients by hybrid dirigible, and ship supplies to Mars by electric rocket, but you’ll pick up the groceries by bicycle on the way home from the station.

Image sources used for illustrations:

Fat bike CC SA BY Sylenius

Public Domain:

Tire tracks

Lunar lander module

Apollo footprint

Trolling a Neural Network to Learn About Color Cues

Neural networks are breaking into new fields and refining roles in old ones on a day-to-day basis. The main enabling breakthrough in recent years is the ability to efficiently train networks consisting of many stacked layers of artificial neurons. These deep learning networks have been used for everything from tomographic phase microscopy to learning to generate speech from scratch.

A particularly fun example of a deep neural net comes in the form of one @ColorizeBot, a Twitter bot that generates color images from black and white photographs. For landscapes, portraits, and street photography the results are reasonably realistic, even if they do fall into an uncanny valley that is eerie, striking, and often quite beautiful. I decided to try and trick @ColorizeBot to learn something about how it was trained and regularized, and maybe gain some insights into general color cues. First, a little background on how @ColorizeBot might be put together.

According to the description on @ColorizeBot’s Twitter page:

I hallucinate colors into any monochrome image. I consist of several ConvNets and have been trained on millions of images to recognize things and color them.

This tells us that CB is indeed an artificial neural network with many layers, some of which are convolutional layers. These share weights and give deep learning the ability to discover features from images rather than relying on a conventional machine vision approach of manual extraction of image features to train an algorithm. This gives CB the ability to discover important indicators of color that its handler wouldn’t necessarily have thought of in the first place. I expect CB was trained as a special type of autoencoder. Normally, an autoencoding neural network has the same data on both the input and output side and iteratively tries to reproduce the input at the output in an efficient manner. In this case instead of producing a single grayscale image at the output, the network would need to produce three versions, one image each for red, green, and blue color channels. Of course, it doesn’t make sense to totally throw away the structure of the black and white image, and the way the authors include this a priori knowledge to inform the output must have been important for getting the technique to work well and fast. CB’s Twitter bio claims to have trained on millions of photos, and I tried to trick it into making mistakes and revealing something about its inner workings and training data. To do this, I took some photos I thought might yield interesting results, converted them to grayscale, and sent them to @ColorizeBot.
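The grayscale conversion is where the color information gets thrown away, and it helps to see how lossy it is. This is a rough sketch that assumes the common Rec. 601 luma weights; the conversion actually used for these photos isn’t stated, and the two pixel values are made up for illustration.

```python
def luma(rgb):
    """Weighted sum of R, G, B using Rec. 601 luma coefficients."""
    red, green, blue = rgb
    return 0.299 * red + 0.587 * green + 0.114 * blue


# Hypothetical pixel values: one reddish, one greenish.
reddish = (200, 30, 30)
greenish = (40, 110, 40)

print(f"reddish pixel  -> gray level {luma(reddish):.1f}")
print(f"greenish pixel -> gray level {luma(greenish):.1f}")
```

Two pixels with very different hues can land on nearly the same gray level, so a colorizer has to recover hue from shape and context rather than from brightness alone.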

The first thing I wanted to try is a classic teaching example from black and white photography. If you have ever thought about dusting off a vintage medium format rangefinder and turning your closet into a darkroom, you probably know that a vibrant sun-kissed tomato on a bed of crisp greens looks decidedly bland on black and white film. If one wishes to pursue the glamorous life of a hipster salad photographer, it’s important to invest in a few color filters to distinguish red and green. In general, red tomatoes and green salad leaves have approximately the same luminance (i.e. brightness) values. I wrote about how this example might look through the unique eyes of cephalopods, which can perceive color with only one type of photoreceptor. Our own visual system can only see contrast between the two types of object by their color, but if a human viewer looks at a salad in a dark room (what? midnight is a perfectly reasonable time for salad), they can still tell what is and is not a tomato without distinguishing the colors. @ColorizeBot interprets a B&W photo of cherry tomatoes on spinach leaves as follows:

c2sel44vqaagemw-jpg-large

This scene is vaguely plausible. After all, some people may prefer salads with unripe tomatoes. Perhaps meal-time photos from these people’s social media feeds made it into the training data for @ColorizeBot. What is potentially more interesting is that this test image revealed a spatial dependence: the tomatoes in the corner were correctly filled in with a reddish hue, while those in the center remain green. Maybe this has something to do with how salad images used to train the bot were framed. Alternatively, it could be that the abundance of leaves surrounding the central tomatoes provides a confusing context and CB is used to recognizing more isolated round objects as tomatoes. In any case it does know enough to guess that spinach is green and some cherry tomatoes are reddish.

Next I decided to try and deliberately evoke evidence of overfitting with an Ishihara test. These are the mosaic images of dots with colored numbers written in the pattern. If @ColorizeBot scraped public images from the internet for some of its training images, it probably came across Ishihara tests. If the colorizer expects to see some sort of numbers (or any patterned color variation) in a circle of dots that looks like a color-blindness test, it’s probably overfitting; the black and white image by design doesn’t give any clues about color variation.

c2se-teveae2_ay-jpg-large

That one’s a pass. The bot filled in the flyer with a bland brown coloration, but didn’t overfit by dreaming up color variation in the Ishihara test. This tells us that even though there’s a fair chance the neural net may have seen an image like this before, it doesn’t expect it every time it sees a flat pattern of circles. CB has also learned to hedge its bets when looking at a box of colored pencils, which could conceivably be a box of brown sketching pencils.

c2seviwviaa87xo-jpg-large

What about a more typical type of photograph? Here’s an old truck in some snow:

c2scawfveaallw4-jpg-large

CB managed to correctly interpret the high-albedo snow as white (except where it was confused by shadows), and, although it made the day out to be a bit sunnier than it actually was, most of the winter grass was correctly interpreted as brown. But have a look on the right hand side of the photo, where apparently CB decided the seasons changed to a green spring in the time it takes to scan your eyes across the image. This is the sort of surreal, uncanny effect that CB is capable of. It’s more pronounced, and sometimes much more aesthetic, on some of the fancier photos on CB’s Twitter feed. The seasonal transformation from one side of the photo to the other tells us something about the limits of CB’s interpretation of context.

In a convolutional neural network, each part of an input image is convolved with kernels of a limited size, and the influence of one part of the image on its neighbors is limited to some degree by the size of the largest kernels. You can think of these convolutional kernels as smaller sub-images that are applied to the full image as a moving filter, and they are a foundational component of the ability of deep neural networks to discover features, like edges and orientations, without being explicitly told what to look for. The results of these convolutional layers propagate deeper through the network, where the algorithm can make increasingly complex connections between aspects of the image.
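To make the moving-filter picture concrete, here is a toy sketch in plain Python. It slides a 3x3 kernel over a tiny made-up image with no padding; like most deep learning libraries it does not flip the kernel, so strictly speaking it is a cross-correlation. The image and the hand-picked vertical-edge kernel are illustrative assumptions, whereas a real network learns its kernels from data.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, no kernel flip)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            # Sum of elementwise products over the patch under the kernel.
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Tiny grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Responds where brightness increases from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

Each output value depends only on the 3x3 patch under the kernel, which is the sense in which one part of an image has limited influence on distant parts.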

In the snowy truck and in the tomato/spinach salad examples, we were able to observe @ColorizeBot’s ability to change its interpretation of the same sort of objects across a single field of view. If you, fellow human, or I see an image that looks like it was taken in winter, we include in our expectations “This photo looks like it was taken in winter, so it is likely the whole scene takes place in winter because that’s how photographs and time tend to work.” Likewise, we might find it strange for someone to have a preference for unripe tomatoes, but we’d find it even stranger for someone to prefer a mixture of ripe-ish and unripe tomatoes on the same salad. Maybe the salad maker was an impatient type suffering from a tomato shortage, but given a black and white photo that wouldn’t be my first guess on how it came to be, based on the way most of the salads I’ve seen throughout my life have been constructed. In general we don’t see deep neural networks like @ColorizeBot generalizing that far quite yet, and the resulting sense of context can be limited. This is different from generative networks like Google’s “Inception” or style transfer systems like Deepart.io, which perfuse an entire scene with a cohesive theme (even if that theme is “everything is made of duck’s eyes”).

Finally, what does CB think of theScinder’s logo image? It’s a miniature magnetoplasmadynamic thruster built out of a camera flash and magnet wire. Does CB have any prior experience with esoteric desktop plasma generators?

c29xshxviaa2_g3

That’ll do CB, that’ll do.

Can’t get enough machine learning? Check out my other essays on the topic

@ColorizeBot’s Twitter feed

@CtheScinder’s Twitter feed

All the photographs used in this essay were taken by yours truly, (at http://www.thescinder.com), and all images were colorized by @ColorizeBot.

And finally, here’s the color-to-B&W-to-color transformation for the tomato spinach photo:

tomatotrickery

Teaching a Machine to Love XOR

xorsketch

The XOR function outputs true if exactly one of the two inputs is true

The exclusive or function, also known as XOR (but never going by both names simultaneously), has a special relationship to artificial intelligence in general, and neural networks in particular. This is thanks to a prominent book from 1969 by Marvin Minsky and Seymour Papert entitled “Perceptrons: an Introduction to Computational Geometry.” Depending on who you ask, this text was single-handedly responsible for the AI winter due to its critiques of the state of the art neural network of the time. In an alternative view, few people ever actually read the book but everyone heard about it, and the tendency was to generalize a special-case limitation of local and single-layer perceptrons to the point where interest and funding for neural networks evaporated. In any case, thanks to back-propagation, neural networks are now in widespread use and we can easily train a three-layer neural network to replicate the XOR function.

In words, the XOR function is true for two inputs if one of them, but not both, is true. When you plot the XOR as a graph, it becomes obvious why the early perceptron could never get all four cases right: a single straight boundary tops out at three out of four.

sketch2dxor

There’s not a way to draw a straight 2D line on the graph and separate the true and false outputs for XOR, red and green in the sketch above. Go ahead and try. The same is going to be true trying to use a plane to separate a 3D version and so on to higher dimensions.

sketch3dxor
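If you'd rather not try every line by hand, here is a minimal sketch of the 2D argument in Python. It brute-forces a grid of weights and biases for a single hard-threshold unit and reports the best score any of them achieves on the four XOR cases. The function names, the grid range, and the step count are all arbitrary choices made for illustration, not anything from the original perceptron literature.

```python
from itertools import product

# XOR truth table as ((x1, x2), target) pairs
XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def perceptron(w1, w2, b, x1, x2):
    """Single-layer perceptron: a weighted sum followed by a hard threshold at zero."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def best_linear_score(steps=21, limit=2.0):
    """Try a grid of (w1, w2, b) values; return the most XOR rows any single line gets right."""
    grid = [-limit + 2 * limit * i / (steps - 1) for i in range(steps)]
    best = 0
    for w1, w2, b in product(grid, repeat=3):
        correct = sum(perceptron(w1, w2, b, x1, x2) == t
                      for (x1, x2), t in XOR_TABLE)
        best = max(best, correct)
    return best

print(best_linear_score())  # prints 3: no straight line gets all four cases
```

A grid search only samples the possible lines, of course, but the result matches the geometric argument: three out of four is as good as a straight boundary gets.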

That’s a problem because a single layer perceptron can only classify points linearly. But if we allow ourselves a curved boundary, we can separate the true and false outputs easily, which is exactly what we get by adding a hidden layer to a neural network.

xorwhiddenlayer
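To make the hidden-layer idea concrete, here is a small hand-wired sketch in Python. The weights are picked by hand (an assumption for illustration, not learned and not taken from any figure in this post): one hidden unit acts like OR, the other like AND, and the output fires when OR is on but AND is off. With hard-threshold units the boundary is really two straight lines rather than a smooth curve, but the effect is the same: the true and false points end up separated.

```python
def step(z):
    """Hard threshold activation: 1 if the input is positive, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two hidden units (OR-like and AND-like) feeding a single output unit."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are on
    return step(h_or - h_and - 0.5)  # on when OR fires and AND does not

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

Each hidden unit draws its own straight line, and the output unit combines the two half-planes into the band in between, which is something no single line can do.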

The truth-table for XOR is as follows:

Input   Output
00      0
01      1
10      1
11      0

If we want to train a neural network to replicate the table above, we use backpropagation to flow the output errors backward through the network based on the activation of each node. Given the gradient of each activation and the error in the layer immediately above, the weights can be nudged to reduce the network error by something like gradient descent. As a result, our network can now be taught to represent a non-linear function. For a network with two inputs, three hidden units, and one output the training might go something like this:

trainingxor
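For readers who want to poke at the training loop themselves, below is a minimal pure-Python sketch of backpropagation on a 2-3-1 network. It is not the gist referenced in this post; the sigmoid activations, squared-error loss, learning rate, epoch count, and random seed are all assumptions chosen for illustration. Whether a given run actually nails all four XOR cases depends on the random initialization, so treat it as something to experiment with rather than a guaranteed result.

```python
import math
import random

# XOR truth table as ((x1, x2), target) pairs
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_h, w_o, x1, x2):
    """Return the hidden activations and the network output for one input pair."""
    h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
    y = sigmoid(sum(wo * hj for wo, hj in zip(w_o, h)) + w_o[-1])
    return h, y

def train_xor(epochs=10000, lr=0.5, hidden=3, seed=0):
    rng = random.Random(seed)
    # Each hidden unit holds [weight for x1, weight for x2, bias]
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # Output unit holds one weight per hidden unit plus a trailing bias
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    history = []  # summed squared error, recorded before each weight update
    for _ in range(epochs):
        grad_h = [[0.0] * 3 for _ in range(hidden)]
        grad_o = [0.0] * (hidden + 1)
        loss = 0.0
        for (x1, x2), target in XOR_DATA:
            h, y = forward(w_h, w_o, x1, x2)
            loss += 0.5 * (y - target) ** 2
            # Output error scaled by the sigmoid gradient at the output
            delta_o = (y - target) * y * (1.0 - y)
            for j in range(hidden):
                grad_o[j] += delta_o * h[j]
                # Flow the error backward through the output weight to hidden unit j
                delta_h = delta_o * w_o[j] * h[j] * (1.0 - h[j])
                grad_h[j][0] += delta_h * x1
                grad_h[j][1] += delta_h * x2
                grad_h[j][2] += delta_h
            grad_o[-1] += delta_o
        history.append(loss)
        # Plain full-batch gradient descent step
        for j in range(hidden):
            for k in range(3):
                w_h[j][k] -= lr * grad_h[j][k]
        for j in range(hidden + 1):
            w_o[j] -= lr * grad_o[j]
    return w_h, w_o, history

w_h, w_o, history = train_xor()
print("first loss:", history[0], "last loss:", history[-1])
for (x1, x2), target in XOR_DATA:
    print(x1, x2, "->", forward(w_h, w_o, x1, x2)[1], "target", target)
```

Try changing `seed`, `hidden`, or `lr` and watch how the loss history changes; setting `hidden=0` is not supported by this sketch, but the hand-search over straight lines earlier already tells you how that would end.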

Update (2017/03/02) Here’s the gist for making the gif above:

Things to Think About From 2016

thescinder2016wordle

A word cloud of theScinder’s output for 2016, made with wordle.net

CRISPR/Cas9

This subject includes throwbacks to 2015, when I did most of my writing about CRISPR/Cas9. That’s not to say 2016 didn’t contain any major genetic engineering news. In particular, scientists continue to move ahead with the genetic modification of human embryos.

If you feel like I did before I engaged in some deeper background reading, you can catch up with my notes on the basics. I used the protein structures for existing gene-editing techniques to highlight the differences between the old-school gene editing techniques and editing with Cas9. I also compared the effort it takes to modify a genome with Cas9 to how difficult it was using zinc-finger nucleases, the previous state-of-the-art (spoiler: it amounts to days of difference).

TLDR: The advantage of genetic engineering with Cas9 over previous methods is the difference between writing out a sequence of letters and solving complex molecular binding problems.

aLIGO and the detection of gravitational waves

In one of the most impressive scientific breakthroughs of the previous hundred years or so, a bunch of clever people with very sensitive machines announced they’d detected the squidge-squodging of space. A lot of the LIGO data is available from the LIGO Open Science Center, and working with it is a great way to learn signal processing techniques in Python. I synchronized the sound of gravitational wave chirp GW150914 to a simulated visualization (from SXS) of a corresponding black hole inspiral and the result is the following video. You can read my notes about the process here. I also modified the chirp to play the first few notes of the “Super Mario Brothers” theme.

Machine Learning

I’ve just started an intensive study of the subject, but machine learning continues to dip its toes into everything to do with modern human life. We have a lot of experience with meat-based learning programs, which should give us some insight into how to avoid common pitfalls. The related renewed interest in artificial intelligence should make the next few years interesting. If we do end up with a “hard” general artificial intelligence sometime soon, it might make competition a bit tough, if you could call it competition at all.

Devote a few seconds of thought to the twin issues of privacy and data ownership.

Mars

2016 also marked a renewed interest in manned space exploration, largely because of the announcement from space enthusiast Elon Musk that he’s really stoked to send a few people to Mars. NASA is still interested in Mars as well, and might be a good partner to temper Musk’s enthusiasm. In the Q&A at about 1:21 in the video below, Musk seems to suggest a willingness to die as the primary prerequisite for his first batch of settlers. There are some known unavoidable and unknown unknowable dangers in the venture, but de-prioritizing survivability as a mission constraint runs a better chance of delaying manned exploration as long as it remains as expensive as Musk optimistically expects.

Here’s some stuff that’s a lot less serious about living on Mars.

It doesn’t grab the headlines with such vigor, but Jeff Bezos’s Blue Origin had an impressive year: retiring their first rocket after five flights and exceeding the mission design in a final test of a launch escape system. Blue Origin is also working on an orbital launch system called New Glenn, in honor of the first astronaut from the USA to orbit the earth.

In that case, where are we headed?

The previous year provided some exciting moments to really trip the synapses, but we had some worrying turns as well. The biggest challenges of the next few decades will all have technical components, and understanding them doesn’t come for free. Humanity is learning more about biology at more fundamental levels, and medicine won’t look the same in ten years. A lot of people seem unconcerned that we probably won’t make the 2 degrees Celsius threshold for limiting climate change, although not worrying about something doesn’t mean it won’t kill anyone. Scientists and engineers have been clever enough to develop machine learners to assist our curiosity, and it’s exciting to think that resurgent interest in AI might give us someone to talk to soon. Hopefully they’ll be better conversationalists than the currently available chatbots, and a second opinion on the nature of the universe could be useful. It’s not going to be easy to keep up with improving automation, and humans will have to think about what working means to them.

Take some time to really dig into these subjects. You probably already have some thoughts and opinions on some of them, so try to read a contrary take. If you can’t think of evidence that might change your mind, you don’t deserve your conclusions.

Remember that science, technological development, and innovation have a much larger long-term effect on humans and our place in the universe than the petty machinations of human fractionation. So keep learning, figure out something new, and remember that if you possess general intelligence you can approach any subject. On the other hand, autogenous annihilation is one of the most plausible solutions to the Fermi Paradox. This is no time to get Kehoed.