The structure behind the simplicity of CRISPR/Cas9


The International Summit on Human Gene Editing took place in Washington D.C. a few weeks ago, underlining the critical attention that continues to follow CRISPR/Cas9 and its applications to genome editing. Recently I compared published protocols for CRISPR/Cas9 and a competing technique based on Zn-finger nucleases. That comparison suggested that editing with CRISPR/Cas9 is somewhat simpler than using Zn-fingers, but I didn’t discuss the biomolecular mechanisms underlying the increased ease of use. Here I’ll illustrate the fundamental difference between genome editing with Cas9 and with its competitors in simple terms, using relevant protein structures from the Protein Data Bank.

Each of the techniques I’ll mention here has the same end goal: break double-stranded DNA at a specific location. Once a DNA strand undergoes this type of damage, a cell’s own repair mechanisms take over to put it back together. It is possible to introduce a replacement strand and encourage the cell to incorporate this DNA into the break, instead of the original sequence.

The only fundamental difference in the main techniques used for genome editing is the way they are targeted. Cas9, Zn-finger, and Transcription Activator Like (TAL) nucleases all aim to make a targeted break in DNA. Other challenges, such as getting the system into cells in the first place, are shared alike by all three systems.


Zinc Fingers (red) bound to target DNA (orange). A sufficient number of fingers like these could be combined with a nuclease to specifically cut a target DNA sequence.


Transcription Activator Like (TAL) region bound to target DNA. Combined with a nuclease, TAL regions can also effect a break in a specific DNA location.


Cas9 protein (grey) with guide RNA (gRNA, red) and target DNA sequence (orange). The guide RNA is the component of this machine that does the targeting. This makes the guide RNA the only part that needs to be designed to target a specific sequence in an organism. The same Cas9 protein, combined with different gRNA strands, can target different locations on a genome.

Targeting a DNA sequence with an RNA sequence is simple. RNA and DNA are both chains of nucleotides, and the rules for binding are the same as for reading out or copying DNA: A binds with T, U binds with A, C binds with G, and G binds with C [1]. Targeting a DNA sequence with protein motifs is much more complicated. Unlike with nucleotide-nucleotide pairing, I can’t fully explain how these residues are targeted, let alone in a single sentence. This has consequences in the initial design of the gRNA as well as the efficacy of the system and the overall success rate.
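The pairing rules above are simple enough to write down as a lookup table. Here is a minimal sketch in Python; the function names and the toy sequence GATTACA are my own illustrative choices, not anything taken from a real gene. One detail the single-sentence version glosses over is that paired strands run in opposite directions, so the second function reverses the result to give the RNA in the conventional 5'-to-3' reading order.

```python
# Which RNA base pairs with each DNA base (the rules from the text).
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}


def rna_pairing(dna: str) -> str:
    """Return the RNA base that pairs with each DNA base, position by position."""
    try:
        return "".join(DNA_TO_RNA_PARTNER[base] for base in dna.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


def rna_reverse_complement(dna: str) -> str:
    """Same pairing, reversed so the RNA reads 5'->3' (strands are antiparallel)."""
    return rna_pairing(dna)[::-1]


toy_dna = "GATTACA"  # made-up example, not a real target site
print(rna_pairing(toy_dna))
print(rna_reverse_complement(toy_dna))
```

There is no comparably short lookup table for the protein-based techniques, which is the whole point of the comparison.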

So the comparative ease-of-application stems from the differences in protein engineering vs. sequence design. Protein engineering is hard, but designing a gRNA sequence is easy.

How easy is it really?

Say that New Year’s Eve is coming up, and we want to replace an under-functioning Acetaldehyde Dehydrogenase [2] with a functional version. First we would need a ~20 nucleotide sequence from the target DNA, like this one from just upstream of the ALDH1B gene:


You can write out the base-pairings by hand or use an online calculator to determine the complementary RNA sequence:


To associate the guide RNA to the Cas9 nuclease, the targeting sequence has to be combined with a scaffold RNA which the protein recognises.

Scaffold RNA:

Target Complement:

Target complement + scaffold = guide RNA:

With that sequence we could target the Cas9 nuclease to the acetaldehyde dehydrogenase (ALDH1B) gene, inducing a break and leaving it open to replacement. The scaffold sequence above turns back on itself at the end, sinking into the proper pocket in Cas9, while the target complement sequence coordinates the DNA target, bringing it close to the cutting parts of Cas9. If we introduce a fully functional version of the acetaldehyde dehydrogenase gene at the same time, then we surely deserve a toast as the target organism no longer suffers from an abnormal build-up of toxic acetaldehyde. Practical points remain to actually prepare the gRNA, make the Cas9 protein, and introduce the replacement sequence, but from an informatic design point of view that is, indeed, the gist.
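From the informatics side, "target complement + scaffold = guide RNA" really is just string concatenation. The sketch below assumes that ordering (targeting sequence first, scaffold second) as described above. Both sequences in it are made-up stand-ins chosen for illustration: the real scaffold depends on the Cas9 system in use and should be taken from a published protocol, not from this example.

```python
RNA_BASES = set("ACGU")


def assemble_guide_rna(target_complement: str, scaffold: str) -> str:
    """Join the targeting sequence and the scaffold into one guide RNA string."""
    target_complement = target_complement.upper()
    scaffold = scaffold.upper()
    for name, seq in (("target complement", target_complement),
                      ("scaffold", scaffold)):
        unexpected = set(seq) - RNA_BASES
        if unexpected:
            # A stray T usually means a DNA sequence was pasted in by mistake.
            raise ValueError(f"{name} contains non-RNA characters: {sorted(unexpected)}")
    return target_complement + scaffold


# Hypothetical placeholders, NOT real ALDH1B or Cas9 scaffold sequences.
toy_target_complement = "CUAAUGUCUAAUGUCUAAUG"  # 20 made-up nucleotides
toy_scaffold = "ACGUACGU"                       # stand-in for the real scaffold

guide = assemble_guide_rna(toy_target_complement, toy_scaffold)
print(guide, len(guide))
```

The check for non-RNA characters is the only design decision here: it catches the easy mistake of pasting in the DNA target itself instead of its RNA complement.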

That’s the basics of targeting Cas9 in 1,063 words. I invite you to try and explain the intricacies of TAL effector nuclease protein engineering with fewer words.


[1] That’s C for cytosine, G for guanine, U for uracil, and A for adenine. In DNA, the uracil is replaced with thymine (T).

[2] Acetaldehyde is an intermediate produced during alcohol metabolism, thought to be largely responsible for hangovers. A mutation in one or both copies of the gene can lead to the so-called “Asian Flush”.

Sources for structures:

I rendered all of the structures using PyMol. The data come from the following publications:

PDB structure: 3VEK (Zn-finger)

Wilkinson-White, L.E., Ripin, N., Jacques, D.A., Guss, J.M., Matthews, J.M. DNA recognition by GATA1 double finger. To be published.

PDB structure: 3ugm (TAL)

Mak, A.N., Bradley, P., Cernadas, R.A., Bogdanove, A.J., Stoddard, B.L. The Crystal Structure of TAL Effector PthXo1 Bound to Its DNA Target. (2012) Science 335: 716-719

PDB structure: 4oo8 (Cas9)
Nishimasu, H., Ran, F.A., Hsu, P.D., Konermann, S., Shehata, S.I., Dohmae, N., Ishitani, R., Zhang, F., Nureki, O. Crystal structure of Cas9 in complex with guide RNA and target DNA. (2014) Cell (Cambridge, Mass.) 156: 935-949

Comic cover original source:
“Amazing Stories Annual 1927” by Frank R. Paul – Scanned cover of pulp magazine. Licensed under Public Domain via Wikimedia Commons –

What’s the big deal with CRISPR/Cas9?


Cas9 (grey) in complex with yellow guide RNA and red target DNA. PDB structure 4oo8 manipulated in PyMOL by yours truly. Cas9, like competing genome editing technologies (TALENs and ZFNs), is a nuclease.

Summary: Eliminate hereditary diseases. Re-program pathological tissue. Design babies. Bring back the T. rex. The peril and promise of genetic engineering have been a long time coming. Generally speaking, none of the wonders we began collectively imagining with the deduction of DNA structure in the 1950s have come to fruition. At the turn of the millennium, with the completion of the human genome project(s), we expected personalized medicine to eradicate inefficacies and side effects in modern medicine. Current development based on bacterial immune systems promises to either revolutionise the treatment of genetic disease or fill the world with ten-foot tall babies shooting lasers out of their perfect blue eyes while playing professional basketball and winning Nobel Prizes.

My first foray into a wet lab consisted of a project straight out of the astounding futures your favourite sci-fis promised you, or warned you about: incorporating functional genetic elements from humans into fungal cells. After a summer spent pushing the limits of what is possible and blurring the lines of what it means to be human, I created a terrible organism neither man nor yeast. Unable to find acceptance among people and no longer satisfied by nature’s intentions, these fungal colonies, the bizarre offspring of one man’s twisted mind and leavening products, found the cruel world to be too much and jumped into an autoclave while reciting Macbeth.

Despite the hyperbolic passage above, the monsters yet live. The strain ended up in a laboratory-grade freezer at negative eighty degrees (Celsius, of course, the lab being free of both astrologers and barbarians). The little yeasties are probably still chilling in the small cardboard box where I left them, covered in frost and enjoying a nice bath of glycerol cryo-protectant, traveling through time in suspended animation until the world is ready for them.

The human genes and their counterparts in baker’s yeast are similar enough that in this case one could substitute for the other (at least in one direction). The function of these metabolic keystones known as ATP synthases is an ancient one: churning the potential energy of a proton gradient to make the cellular energy storage molecule adenosine triphosphate (ATP). They are primeval enough that the human version acts as a suitable stand-in for a strain of Saccharomyces cerevisiae otherwise incapable of aerobic respiration. I had precisely engineered a genetic vector that inserted directly into the location of the yeast’s genome where the native version had been removed. And by “precisely engineered” I mean that it was so easy, an undergrad could do it, as I did.

Recently a technique based on CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and CRISPR-Associated Proteins (such as Cas9) has garnered a lot of attention in the press as well as the scientific community. The word-sequences up-regulating all the excitement highlight the ease and effectiveness of CRISPR/Cas9 over previous methods. The technique’s critical reception has run the full range from drooling anticipation to worried alarm to bad puns.

Since my early days in the lab playing as a god with the design of human-yeast splices, I’ve continued down the rabbit-hole of biological scale to the point that I now work more often with the single molecule(s) of biomolecular machinery than with cells directly. So I’m certainly out of the loop and without a practical grasp of the rationale underlying CRISPR/Cas9 genome editing. After all, spider silk proteins have been produced in mammalian cells since before 2002, and are regularly produced in goat’s milk. Does CRISPR/Cas9 change the game to a degree that warrants the flood of interest?


The interest surrounding CRISPR

I’ll skip over the high-level technical overviews that you’ve probably read before, but for those with the time and interest I can recommend Jennifer Doudna’s Breakthrough Prize lecture. Instead I’ll compare two protocols, the first based on CRISPR/Cas9 and the second based on an older technique using another type of engineered nuclease, the zinc-finger nucleases (ZFNs). I scraped both protocols from the same publication, so apparent differences due to style should be small. To get a sense of the complexity of each technique, here are the two protocols as Wordle word clouds, which scale each of the 256 most frequently used words in a protocol according to its relative usage.


ZFN protocol: word frequency word cloud


CRISPR/Cas9 protocol: word frequency word cloud
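For anyone who wants the numbers behind a word cloud like these, the counting step is short. This is a minimal sketch of my own, not what Wordle does internally: it assumes the protocol text is already in a Python string, splits on anything that isn't a letter, digit, or apostrophe, and does not filter out common words such as "the" (which Wordle removes by default). The function name and the sample sentence are illustrative.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 256) -> list[tuple[str, int]]:
    """Return the n most frequent words in text with their counts."""
    # Lower-case first so "Cut" and "cut" are counted as the same word.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(n)


sample = "Cut the DNA. Then cut the vector; cut again."
for word, count in top_words(sample, n=2):
    print(word, count)
```

Dividing each count by the total number of words gives the relative usage that sets the size of each word in the cloud.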

The table below compares the complexity and length of each protocol. The reading complexity measures were generated with this tool; in short, the first measure decreases with increased complexity, while the other two increase with added complexity.


At first glance we see that the CRISPR/Cas9 protocol is much longer and more complicated. But the Zn-finger nuclease protocol only describes the process up to in vitro validation, so we can make a much fairer comparison by truncating the CRISPR/Cas9 protocol to its first 13 steps. The resulting comparison:


The associated Wordle even looks a bit friendlier.


So suffice it to say that it’s not easy to see the underpinnings of the excitement surrounding major developments such as CRISPR/Cas9 from the protocols alone. Essentially the advantages of the CRISPR-based approach stem from the difficulty of engineering guide RNAs versus that of engineering the DNA-binding domains, built from amino acid residues, required for the competing techniques ZFNs and TALENs (not compared here). In the brewer’s yeast I modified “back in the day,” targeting the desired genes to the desired location was as simple as including a sequence from the target location on the DNA to be inserted; there are sufficient double-stranded breaks in a flask of yeast culture to allow the gene to find its target a few times. With the specifically targetable nucleases such as Cas9, zinc-finger nucleases, and TALENs, one doesn’t have to count on such an easy model organism to precisely manipulate a small number of cells for a desired change to the genome.

The increased interest alone is sure to drum up funding, public intrigue, and private investment, driving the impact forward as a self-fulfilling prophecy. The more interested and excited people are about CRISPR/Cas9, particularly those people with the deep pockets to fill out scientists’ salaries, the more the technique will be subjected to use and refinement. More people using the tool drives the potential for meaningful breakthroughs. On the other hand, we have been promised and warned of this same onrushing biopunk dystopia before, and as they say: if this is the future, where are my gene-driven superpowers?


Published protocols referenced in this post:
[1] Carroll, D., Morton, J. J., Beumer, K. J., & Segal, D. J. (2006). Design, construction and in vitro testing of zinc finger nucleases. Nature Protocols, 1(3), 1329–1341.

[2] Ran, F. A., Hsu, P. D., Wright, J., Agarwala, V., Scott, D. A., & Zhang, F. (2013). Genome engineering using the CRISPR-Cas9 system. Nature Protocols, 8(11), 2281–2308.

[2015/12/14 EDIT – copyediting]