How do viruses jump species?
SARS-CoV-2 (the virus causing the Covid-19 disease) has jumped to us from another species, which is how we got most other deadly infectious diseases ... measles (more on this below), flu, HIV, Nipah, mumps, etc. The image above shows an artificially coloured scanning electron microscope image of a mass of yellow SARS-CoV-2 virus particles on a dying blue/green cell (Credit: NIAID).
Some people think that the Chinese know exactly how the virus jumped and are deliberately covering it up. But the truth is simpler. The truth is that nobody knows exactly what happened and probably never will. Think about HIV ... studied intensively for decades. It jumped to us from monkeys and chimps but the exact process will probably never be known. There are a few candidate theories but no way of being sure about which one is true. Once you understand the way viruses move between species, the virtual impossibility of knowing exactly how it happened will be clear. That doesn't mean we can't work out how to dramatically reduce the risk of future similar pandemics. We can do that even if we don't know exactly what caused this one.
To understand how viruses move between species, you need to understand what they are and how they work. Understanding this is made incredibly difficult by the vast amount of technical language obscuring the mechanisms for all but those with a professional interest. So this post is for people with a non-technical background; arts, law and the like. It will require hard thinking, but no maths and almost no jargon :)
What's a virus? Quick background
Imagine a long strip of paper with a line of letters printed along its length ... scrunch it up and wrap it in glad wrap. This is pretty much what a virus is. The letters correspond to the genetic material and the glad wrap is the container it comes in. A virus doesn't have much else. The glad wrap has plenty of hooks on it ... a bit like velcro. Virologists and geneticists have special names for all the bits and pieces ... the package can have various forms, each with a special name. The letters come in various forms (RNA or DNA) and configurations (segmented, unsegmented). The jargon is critical for experts, but you can understand a surprising amount of the complexity without it; just think about letters on a strip of paper!
One little glad wrapped bundle is a single virus particle, often called a virion. That's one of the few special words in this article.
A virion is not alive. It isn't made of cells and it has no nervous system. It doesn't eat or move of its own accord.
When you have a raging infection, you will have billions of virions; even if it starts with a handful. They don't cooperate with each other, they don't plot and plan. They merely react according to the laws of chemistry and physics.
A bacterium (singular), on the other hand, is a fully fledged little critter. It can reproduce and it feeds.
But a virus does nothing without a critter to infect. When we have a viral infection we think it infects us, but each virus particle actually only infects a single cell within us.
Viruses can infect bacterial critters also.
Bacteria feed and reproduce in you or on your skin. Viruses just hijack your cells and reproduce. That way of describing it makes viruses sound devious and cunning, but they aren't. They just drift about bumping into things. The hooks on their surface sometimes fit little hooks on the surface of a cell ... like a couple of hands of matching size and shape clasping each other. If hooks on the virus match hooks on a cell then the particle gets stuck. As it locks into place another bump on the glad wrap is driven into the cell. Depending on the viral species concerned, the bump causes a hole in the surface of the cell and the glad wrap unravels and dumps the paper strip into the body of the cell. The genetic material ends up being copied by the cell, because the machinery in cells is pretty happy to copy any damn genetic material it comes across. The immune system will try to get in the way, but we'll just assume you haven't got one for now!
Genetic material and proteins
In our analogy, genetic material is just letters on a strip of paper ... for example: "your foot shift". I've used words rather than a nonsense string like: "yofodl lddykfable" because I want the words to correspond to genes in the analogy. A gene can be very roughly thought of as an instruction manual for making a protein. So our three word string is like three genes.
Viral reproduction, in our analogy, is just the copying of the string. If the copy process is perfect then we have two identical strings: "your foot shift" and "your foot shift". After one or more copies are made in the cell, the resulting strings are wrapped in glad wrap with the hooks and bumps and eventually the cell bursts open (and dies) releasing all the little glad wrapped balls.
But for many viruses, the copying process is a little (or a lot) sloppy.
The SARS-CoV-2 genetic string is about 29,000 letters and every time it gets copied, there's a good chance of a mistake. For most viruses of the same general class as SARS-CoV-2, there would be perhaps 3 mistakes per copy, but coronaviruses have a mechanism which reduces this. It's a primitive form of the kind of repair that our genetic material has.
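For readers who like to tinker, sloppy copying can be mimicked in a few lines of Python. This is only a toy sketch of the paper-strip analogy, not a model of real viral chemistry. The function name sloppy_copy and the error rate are made up for illustration. Three mistakes in 29,000 letters is roughly one mistake per 10,000 letters, which would almost never show up in a 15 letter string, so the toy uses a much higher rate.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def sloppy_copy(string, error_rate, rng):
    """Copy a string letter by letter, sometimes writing a random letter instead.

    error_rate is the chance of a miscopy per letter (an illustrative number,
    far higher than the rates mentioned in the text).
    """
    copied = []
    for letter in string:
        # Spaces only separate the "genes" in the analogy, so they are kept.
        if letter != " " and rng.random() < error_rate:
            # The random pick can land on the original letter by chance,
            # in which case the "mistake" is invisible.
            copied.append(rng.choice(ALPHABET))
        else:
            copied.append(letter)
    return "".join(copied)


rng = random.Random(1)  # fixed seed so repeated runs give the same copies
parent = "your foot shift"
for _ in range(5):
    print(sloppy_copy(parent, error_rate=0.05, rng=rng))
```

Most of the printed copies will match the parent exactly; now and then one will have a changed letter, and many of those changes will be nonsense.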
Aside: When Linus Pauling calculated the number of cancers and birth defects due to radioactive fall out (back in 1959), geneticists thought mutations (in humans) were rare things which occurred "once in a thousand [human] generations". This was a mistake built on a lower level mistake in thinking that damage to DNA in cells was incredibly rare. We now know such damage is incredibly common ... 10,000 pieces per cell per day. The key to our long lives is not a low level of damage, but amazing repair processes. In viral genetic material, the damage due to the dangerous environment inside a cell is coupled with relatively sloppy copying mechanisms and usually no repair mechanism ... a combination punch.
Resume: Many changes to the letters are dead ends; meaning they turn the gene into nonsense. But those changes that aren't dead ends give rise to a split in the family tree. Here's a simple example. The starting set of letters is "your foot shift" in one virion.
The initial copy, shown as line 2, yields a dead end. The yozy string on the right is nonsense. The next copy yields two viral gene strings (on line 3) which are both viable, but with different second words. We now have two viral strains. The "food" strain and the "foot" strain.
Since each word in our analogy is a gene and each gene is the instruction manual for a protein, our two virus strains have slightly different 2nd proteins. Remember the hooks? The correct jargon for these on a SARS-CoV-2 virion is a spike and the gene (the word) which generates it is the S gene. Changing the shape of the hook by just a little can influence how likely it is to stick to various types of cell in any animal in which it finds itself. Single letter changes which change the hook slightly might improve or impede its capacity to latch on to the hooks on cells it bumps into.
The error rate in viral reproduction is reasonably well known for different kinds of virus, so if you look at a couple of strings from two virions, you can count the number of different letters and if there are, say, 20, then you know that they must have had a common ancestor virion some 10-20 generations ago. In the case of SARS-CoV-2, there are so many differences between it and the closest bat virus sample on record that it must have split about 30 years ago (assuming single letter changes during each copy).
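The counting step is simple enough to sketch in Python. The function names below are invented for illustration, and the generation estimate rests on a loud assumption: each copy changes about one letter, and both family lines have been changing independently since the split, so the differences are shared between two branches. Real estimates use far more careful statistics.

```python
def count_differences(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must be the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rough_generations_since_split(differences, changes_per_copy=1.0):
    """Very rough guess at generations since the common ancestor.

    Assumes changes pile up at the same steady rate on both branches,
    so each branch accounts for about half of the differences.
    """
    return differences / (2 * changes_per_copy)


print(count_differences("your foot shift", "your food shaft"))  # 2
print(rough_generations_since_split(20))  # 10.0
```

If only one of the two lines had been changing, the same 20 differences would point to about 20 generations, which is why the answer comes as a range rather than a single number.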
The third split is the interesting one. On one branch we have a single mutation of "shift" to "shaft". This is sensible, so the virus particle is viable. On the other branch we have "food" transforming into "yellow". What has happened? How on earth did we get so many changes in one copy operation? We didn't. It turns out there is another way to generate new viral gene strings.
Time for a break
That's a big chunk of information to digest. Let's relax and look at a picture of a real virus family tree based on real genetic material. Here it is below. Each dot on the image below corresponds to one string of letters from one little virus particle. The tree is drawn to grow sideways rather than down and the number on the left (the Y axis) is the number of different letters in the dot compared to the starting dot. The starting dot is the first genetic sequence lodged on an international database by researchers in China back in January. The image comes from the Nextstrain website built by a team of 18 people, but using data and software developed by many hundreds of people over many years.
You can see that my simple analogy is much simpler than the real thing, but still illustrates what is happening. Notice that my little tree has only two branches from each string. In the real world the number can be much bigger. Much bigger than even in this real image with real data ... because it too is a simplification of the real world!
If you've been paying attention, you might have realised that the gene strings of all the millions of viral particles in your body aren't the same. Evolution of the gene strings takes place within your body. This is why combination treatments of HIV are required. Give a person a single drug and the virus will soon evolve to beat that drug. What this means is that the drug will kill most of the viral particles, but not all ... precisely because they aren't all the same! And the ones that survive will be copied and keep the infection going. But using multiple drugs makes it hard for the virus to evolve around all of them simultaneously.
In any event, if you wondered how come each person's sample gets assigned a single string, then you are paying attention! There's a partial answer to that question ... namely that most of the variation in the gene strings of the viral particles is irrelevant to the functioning of the virus.
Note the "rate estimate": 26.179 subs (substitutions, meaning single letter changes) per year. That's the rate at which the virus is changing from the reference virus genetic string. This means that if you took samples of SARS-CoV-2 from a bunch of people a year from now, then, on average, their genetic strings would be 26 letters different from today's reference string. What will these changes do to the virus? Probably make it less dangerous (touch wood).
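To put that rate in perspective, here is the arithmetic as a small Python sketch. It uses only the two numbers already quoted in this post (26.179 changes per year and a string of about 29,000 letters) and assumes the rate stays steady, which is a simplification. The function names are made up for illustration.

```python
RATE_PER_YEAR = 26.179   # the "rate estimate" quoted above
GENOME_LENGTH = 29_000   # approximate number of letters in the string


def expected_changes(years):
    """Average number of changed letters after the given number of years."""
    return RATE_PER_YEAR * years


def fraction_changed(years):
    """The same changes expressed as a share of the whole string."""
    return expected_changes(years) / GENOME_LENGTH


print(round(expected_changes(1)))      # 26
print(f"{fraction_changed(1):.2%}")    # 0.09%
```

So a year of evolution touches less than a tenth of one percent of the letters, which is why samples taken months apart are still recognisably the same virus.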
But wait there's more
So what does it mean to find the origin of SARS-CoV-2?
To know the origin means to know the shape and structure of the tree before the first case of Covid-19 caused by SARS-CoV-2. Somewhere, there were one or more critters infected with one or more viruses and somebody caught SARS-CoV-2 from one of those critters. Or not. Virologists talk about "stuttering" during virus jumping. There may have been many viruses with strings similar to SARS-CoV-2 which infected many people, but they didn't have quite the mutations required to allow the next step ... transmission between people. So infections could have happened over and over for years or even decades until one virion got a mutation that enabled efficient transmission. That mutation may have occurred in a person or in multiple people. Meaning it could have occurred after the first jump into people.
If you are beginning to see how hard it is to find the origin of this virus, then buckle up because it is actually much harder than I've hinted at above.
In our simple case I drew the tree for you. I chose all the mutations (changes of letters).
Suppose instead I just gave you two strings: "your foot shaft" and "your yellow shift" and said: "Draw the tree, please". That's the real world problem.
You could draw many possible trees that would yield those two strings after a suitable number of steps. Which is the actual one that occurred? You could never know!
It's like finding a blind person sprawled in the street after tripping on a paving stone on a piece of paved footpath (sidewalk). Maybe they know exactly which stone they tripped on, but probably not. You could examine the stones and rank them in order of likelihood. The one with the biggest protruding edge closest to the centre of the path would be a good bet as the most likely. But the person may not have tripped on the most likely stone at all!
So it is with gene trees (the technical word is phylogenetic trees). All you can ever do is to estimate the most likely tree (or collection of trees). Doing this involves two branches of mathematics ... combinatorics and statistics, suitably informed by virologists who will insist on adding complicating details that mathematicians would prefer to ignore!
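To give a feel for the combinatorics, here is one standard counting result as a Python sketch. It counts only the possible branching shapes (rooted trees where every split is in two) that could relate a set of samples; it says nothing about which letters changed along each branch, which makes the real problem harder still. The function name is invented for illustration.

```python
def count_rooted_trees(n_samples):
    """Number of distinct rooted, two-way-branching trees joining n samples.

    This is the product of the odd numbers 1 x 3 x 5 x ... x (2n - 3).
    """
    if n_samples < 2:
        raise ValueError("need at least two samples to draw a tree")
    total = 1
    for odd in range(3, 2 * n_samples - 2, 2):
        total *= odd
    return total


for n in (2, 3, 4, 10):
    print(n, count_rooted_trees(n))
# 2 1
# 3 3
# 4 15
# 10 34459425
```

Ten samples already allow more than 34 million shapes, and real studies have thousands of samples, so nobody checks every tree; the statistics are about finding the likely ones.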
Recombination and "yellow"
Remember the "yellow" change in our tree above? I never explained how it happened.
Imagine a cell being simultaneously infected by two different kinds of virus; like getting the flu + SARS-CoV-2 at the same time; or even three, add in a second kind of coronavirus. A cell with those infections will contain three strings, for example, "your foot shift", "thongs yellow ideas" and "bicycle weight person".
When these three strings enter the cell, they float around and can get all mixed up. Whole slabs of one string can end up spliced into a different string. These are called recombinant or reassortant viruses and it's a potent way of rapid evolution. And it can get even messier. You can get a piece of your own normal genetic material wrapped in viral glad wrap together with hooks and bumps and that little ball can deliver your DNA into some other species.
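The splicing of one string into another can also be sketched in Python. This is a toy version of the analogy only: it assumes whole words (genes) are swapped cleanly at the same position, whereas real recombination is under no obligation to respect gene boundaries. The function name splice_gene is made up for illustration.

```python
def splice_gene(recipient, donor, position):
    """Return the recipient string with one word replaced by the donor's word.

    position counts words from 0, so position 1 is the second "gene".
    """
    recipient_words = recipient.split()
    donor_words = donor.split()
    recipient_words[position] = donor_words[position]
    return " ".join(recipient_words)


# The "food" strain picks up the second word of a different virus in one step.
print(splice_gene("your food shift", "thongs yellow ideas", 1))
# your yellow shift
```

One splice does what would otherwise take many single letter changes, which is how "food" can become "yellow" in a single generation.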
One of the current theories about how the jump to humans was made in the case of SARS-CoV-2 is just such a recombination ... most of the virus gene string came from bats and the spike gene came from a pangolin. But there is more to explain and if the HIV saga is anything to go by, it will be a long road to find more of the detail.
To people who are sick, or have loved ones who are sick, the origins of this spillover (the technical term usually used for such events) will be less important than medical treatment options. But we've had many such spillovers during recent decades and we need to understand the mechanisms so that we can avoid them as far as possible in the future. In a broad sense we do already know enough to prevent (or at least reduce) future pandemics. Stop factory farming, wet markets and encroaching on wildlife habitat. We need global efforts in this respect and we need action in preference to blame; because nobody is innocent!
Following SARS, Chinese researchers have been at the forefront of coronavirus research; both independently and in collaboration with researchers from all over the planet. Why did this pandemic start in China and so close to the Wuhan Institute of Virology? There is at least one collaborative project underway to look back in time at stored blood samples from pneumonia cases in many parts of China. Hopefully the attempts by Donald Trump and others to politicise the issue of the origins of SARS-CoV-2 won't get in the way of that or other projects. I'd predict that the project may well find evidence of "stuttering" ... previous infections that went unnoticed, both because they didn't occur anywhere close to a major virological research centre and because they didn't have the mutations to explode into a pandemic. They might find virions with the pangolin spike but lacking something else which is important ... some virulence factor.
I've tried to convey the complexity of the viral evolution process without letting too many details get in the way. If you want the details, then any virology textbook will have them.
Once you know and think about the full range of natural gene shuffling mechanisms, none of which are subject to any kind of ethics committee oversight, you'll probably realise how irrational opposition to regulated GM technologies is. It's perfectly reasonable to oppose a bad GM proposal ... like breeding chickens that grow twice as fast ... but that's true whether the chickens are GM or natural. The sadistic abomination that is the modern broiler was created entirely with natural breeding. What matters is the end result of the process, not whether it uses GM or other modes of what I'd call accelerated directed evolution.