"No-one is looking for sequences with anywhere near the scale and depth that we do”

Basecamp Research is scanning the planet for new and potentially useful protein sequences at an unprecedented rate — using AI to find molecules with commercially useful functions in biodiversity hotspots and understudied biomes. Can the start-up’s unique approach to sampling nature ethically, combined with a cutting-edge machine learning platform, accelerate the power of biotechnology and help nations value their biodiversity?

March 1st 

On the upper floors of a suite of trendy warehouse offices in central London, a team of field researchers, data engineers and protein scientists are assembling the most comprehensive database of the molecular diversity of planet Earth ever made. This is Basecamp Research, the latest Silicon Valley-funded company to disrupt and accelerate bioscience using artificial intelligence (AI) and machine learning.

In a small lab in the corner of Basecamp’s bright, open-plan office, two of their field researchers, back from their latest sampling trip, are analysing soil samples for new protein sequences. In bags of greenish water sagging on the floor of a meeting room, ever more molecules await discovery. As the start-up’s field researchers collect thousands of new protein sequences from the world’s most biodiverse and understudied biomes, its computer scientists are building a new kind of protein database, powered by AI, to make meaningful connections between all that data and learn new rules about what gives certain proteins certain functions.

According to Basecamp, their international sampling expeditions have already increased the number of proteins known to science by 50% in just a few years. In 25% of enzyme classes, they have 10 times as many sequences as can be found in public databases, and in 6% of enzyme classes, they have 50 times more. The start-up says this database or ‘map’ of life’s molecular diversity “unlocks the next holy grail of computational biology: in silico prediction of complex function.”

The company’s co-founder, Oliver Vince, says their aim is to be an interface between biodiversity and biotechnology. That means helping clients from a range of sectors find solutions to their problems among the great untapped diversity of biomolecules on the planet. Their in silico platform is already predicting which proteins from their enormous database will be most useful for specific industrial, research or medical requirements, often molecules that have never been studied before.

* * * *

What’s new about Basecamp’s technology is not just the number of sequences they have to draw on, but how they collect, annotate and link data. Whereas existing protein databases are essentially a catalogue or list, Basecamp’s database is a ‘knowledge graph’ – a deep network of information that adds context to protein sequences about the conditions in which they act. The graph connects proteins to relevant geographic, environmental, chemical and ecological data.

As Basecamp’s chief technology officer, Philipp Lorenz, explains, what you find in nature “is not really arranged as a list or catalogue. Things act more like a network. They interact with each other, they interact with other species, their environment, with specific conditions. So actually, constructing whatever resources you have as a network just feels a lot more intuitive, especially given the vast diversity of data we have.”
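
To make the contrast between a list and a network concrete, here is a minimal sketch of the general idea of a knowledge graph, written in Python. It is an illustration only and not Basecamp's platform: the class, the relation names (`sampled_at`, `has_condition`, `member_of`) and every protein and site identifier are hypothetical, invented for this example.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny labelled graph storing (subject, relation, object) triples."""

    def __init__(self):
        self._out = defaultdict(set)  # subject -> {(relation, object)}
        self._in = defaultdict(set)   # object  -> {(relation, subject)}

    def add(self, subject, relation, obj):
        """Record one fact, indexed in both directions."""
        self._out[subject].add((relation, obj))
        self._in[obj].add((relation, subject))

    def objects(self, subject, relation):
        """Everything `subject` points to via `relation`."""
        return {o for r, o in self._out.get(subject, ()) if r == relation}

    def subjects(self, relation, obj):
        """Everything that points to `obj` via `relation`."""
        return {s for r, s in self._in.get(obj, ()) if r == relation}


def proteins_from_condition(graph, condition):
    """Proteins sampled at any site that has the given condition."""
    found = set()
    for site in graph.subjects("has_condition", condition):
        found |= graph.subjects("sampled_at", site)
    return found


# Hypothetical toy data: proteins linked to sites, sites linked to conditions.
kg = KnowledgeGraph()
kg.add("protein_A", "sampled_at", "site_vent")
kg.add("protein_B", "sampled_at", "site_icecap")
kg.add("protein_C", "sampled_at", "site_vent")
kg.add("site_vent", "has_condition", "high_temperature")
kg.add("site_icecap", "has_condition", "low_temperature")
kg.add("protein_A", "member_of", "hydrolase")

print(sorted(proteins_from_condition(kg, "high_temperature")))
```

A flat catalogue could only answer "which sequences do we have?". Because each protein here is connected to where it was found, and each site to its conditions, a question such as "which proteins come from hot environments?" becomes a two-step walk across the graph.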

Their approach is also reducing the amount of redundancy in existing databases. For too long, Vince says, the search for or design of proteins has been based on the tiny sliver of the Earth’s biomolecules that has been studied as part of academic research.

“If you look at public databases, half of the sequences come from about 20 different species,” he says. “But there are estimated to be over a trillion species on Earth. Pretty much everything that has been sequenced up to now has been collected by researchers working on a specific project and specific species, and when you put it all together you see a lot of the time they are all looking at the same stuff. They are just not looking for sequences with anywhere near the scale and depth that we do.”

Basecamp are taking a different approach: to systematically and strategically gather as much information about protein biodiversity as possible. “If you think about an academic group, they might do one or two expeditions a year. We've done 70 expeditions on five continents in about a year and a half. We have sampled 45% of the World Wildlife Fund’s major biomes. We've also trained up multiple sets of outsourced researchers, research labs, even farmers who can study their own soil microbiome while helping us access it too.”

Vince met co-founder Glen Gowers while they were both at Oxford doing PhDs. The pair devised an expedition to become the first team to conduct entirely off-grid metagenomic DNA analysis, spending a month in a tent with a Nanopore device on an icecap. Vince had become disillusioned by the way PhD students were confined to making small amounts of progress on highly specific and often similar questions; Gowers had concluded from his time at GSK that designing or engineering proteins was never going to be as effective as finding the right one in nature.

“We thought that, if at least 99% of genetic biodiversity is undiscovered, what if you could systematically structure all that data in a way that would be open for machine learning?” says Vince. “We believe that nature already has the answers to our most pressing problems. We just have to go out and find them.”

Sampling the world's biomolecules 
Basecamp has deployed field researchers to sample biomes in 19 countries and a range of extreme or remote locations, including hydrothermal vents, jungles and icecaps.
The team of field researchers includes under-ice divers from the British Antarctic Survey and other professionals used to working in extreme environments.
“We have lots of applicants for roles here who think it will be like going on holiday, but it's not trivial to do one of these field trips,” says Lorenz. “This is tough. They're doing DNA extractions in jeeps, fields, tents, cars, hotel rooms. Or at minus twenty degrees on an ice cap. And there is an expectation when they come back they will have a certain amount of DNA.”
While remote, biodiverse, extreme or understudied regions are a good source of new sequences, the team says that because they use a variety of sequencing technologies they are also finding vast numbers of new genes and protein sequences even in ‘mundane’ soil samples.
 
The traditional starting point when searching for useful molecules in bioscience is sequence similarity. In other words, finding a protein that does something similar to what you need and working outwards from there. Basecamp’s algorithms work differently – and as a result the platform is finding that proteins can have the same or highly similar functions with completely different sequences.
 
Lorenz says that their platform is regularly making discoveries that overturn conventional wisdom about what kind of sequences do what. “There has been a lot of situations where the literature says, for example, ‘this five-amino-acid loop is required for this activity’. And yet we have found enzymes without that loop, and they still work. Or work even better. There's been tonnes of examples like that.”
 
Their technology is even directing where Basecamp’s team of field researchers should look for new proteins. “We've built a machine learning pipeline that can actually predict where in the world – which biomes or environments – you are going to find an enrichment of certain protein classes that are of interest,” says Lorenz.
 
The team hope that as their platform expands, they will be able to find useful enzymes for an ever-expanding range of sectors, aiding the transition from a petrochemical-based economy to a greener, biotech-based economy. “If more than 99% of genetic biodiversity remains undiscovered, then potentially 99% of biotechnologies that we could use are undiscovered as well,” says Vince.
 
Basecamp's co-founder, Oliver Vince.
 
* * * *
 
Exploiting the world’s biodiversity for commercial use at this unprecedented scale is an endeavour that will naturally concern many people who work to protect the natural world.

Vince is clear that valuing biodiversity, and the principles of access and benefit sharing, are built into Basecamp’s business model. They want Basecamp to be a model for how to do so-called ‘bio-prospecting’ in a fair and sustainable way that will ultimately help the world want to protect its natural diversity.

In each location Basecamp samples from, consent agreements are signed to provide short-, medium- and long-term benefits to that area: providing scientific equipment, employing local people to help with expeditions, setting up continuing scientific collaborations, and paying royalties should any commercial products come from those locations. In several locations they have set up local labs and trained local scientists so they can monitor biodiversity with the latest molecular tools.

“We benefit from a world where everybody is continuously studying their own biodiversity, and some of that information is coming to us,” says Vince.

Both Vince and Lorenz are blunt about the failures of previous attempts to link the benefits of research products to the location where the product was first found.

“The starting point of this is the last of the three goals of the 1993 Convention on Biological Diversity, the fair and equitable sharing of benefits arising from genetic biodiversity,” says Vince. “Biotech companies and academics have basically tried to argue their way around that and put so many conditions in place that what it has led to, over the years, is academics being the only people that can access nature, through research permits. They then upload sequences to public databases, and then big companies just stream down that data from all over the world for free without any sort of check.”

Vince says Basecamp can act as an interface between biotech companies and biodiverse nations or conservation charities, allowing one to benefit from the other without having to set up new agreements each time. “We sit in a space where there are lots of NGOs that want to do this sort of thing and lots of companies that don’t. We want to be the interface between the two.”

A member of the Basecamp team sampling habitats in Scotland.
 

There are still many conservationists who are uncomfortable with the idea of putting a monetary value on nature in order to protect it, and Vince and Lorenz admit that making a case for the commercial value of biodiversity can be seen as pretty ‘cold-hearted’.

But, as Lorenz puts it, a cold-hearted argument might be the only thing that certain leaders of certain nations will understand. “You can tell them that there are however many potential new gene editing therapies or new medicines in that rainforest that they are not going to find if they chop it down and turn it into soy or corn or wheat fields.”

Basecamp’s ambitions for the future are to find biomolecules for all sectors, not just pharma and life science-adjacent ones. They also want to improve our understanding of the enormity of molecular diversity on this planet with a network of Basecamp mini-labs around the world.

“Hopefully we'll employ lots of people all around the world to study biodiversity,” says Vince. “Emerging policies like biodiversity credits, for example, require you to be able to monitor your biodiversity, and one of the most reliable ways of doing that, one of the most valuable ways, is soil microbiology. In 30 years' time I want those labs to still be there, even if we're not.”

Oliver Vince and Philipp Lorenz were speaking to The Biologist's Tom Ireland. 

Tom Ireland MRSB is editor of The Biologist