ACS Spring 2022
Expanding Cheminformatics to Industries Adjacent to Small Molecule Drug Discovery
Beyond drug discovery: Breaking the boundaries of natural products information
Jonathan Bisson, Adriano Rutz, Guido Pauli, Jean-Luc Wolfender, Pierre-Marie Allard
Formerly at the University of Illinois Chicago
Now at Collaborative Drug Discovery Inc.
Core team
Adriano Rutz, Jonathan Bisson, Pierre-Marie Allard
The first contributors
Maria Sorokina, Jiří Vondrášek, Daniel Mietchen, Egon Willighagen, Roderic Page, Ralf Stephan, Christoph Steinbeck, Jakub Galgonek, James Graham, Guido Pauli, Arnaud Gaudry, Jean-Luc Wolfender
And all the contributors on Wikidata, too many to cite.
Funding
NIH U41 AT008706 / P50 AT000155 (NCCIH/ODS)
ChemBioSys (Project-ID 239748522, SFB 1127)
Alfred P. Sloan Foundation G-2019-11458
Elixir CZ MEYS LM2018131
Taxonomy
Chemistry
Over 750,000 entries added as of today:
250,000+ unique structures
30,000+ organisms
75,000+ references
Original question
"Hey Wikidata, which organisms are known to contain quercetin?"
SPARQL translation
SELECT DISTINCT ?parent_taxon ?parent_taxonname ?taxon ?taxonname WHERE {
  # Classes: chemical compound, group of stereoisomers
  VALUES ?classes { wd:Q11173 wd:Q59199015 }
  ?compound wdt:P31 ?classes;
            wdt:P235 "REFJWTPEDVJJIY-UHFFFAOYSA-N".   # P235: InChIKey (quercetin)
  ?taxon wdt:P171 ?parent_taxon.                      # P171: parent taxon
  {
    ?compound p:P703 ?stmt.                           # P703: found in taxon
    ?stmt ps:P703 ?taxon.
    {
      ?stmt prov:wasDerivedFrom ?ref.
      ?ref pr:P248 ?art.                              # P248: stated in
      ?art wdt:P356 ?art_doi.                         # P356: DOI
    }
  }
  ?taxon wdt:P225 ?taxonname.                         # P225: taxon name
  ?parent_taxon wdt:P225 ?parent_taxonname.
}
Sorted by parent taxon.
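The query above can also be sent to the public Wikidata SPARQL endpoint from a script. The following is a minimal Python sketch, not the project's own tooling: the helper names (`build_query`, `parse_bindings`, `fetch_taxa`), the `__INCHIKEY__` placeholder, the `User-Agent` string and the added `ORDER BY` clause (to mirror the "sorted by parent taxon" view) are all assumptions made for illustration. It uses only the standard library and the standard SPARQL JSON results layout.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Flattened variant of the slide's query; "__INCHIKEY__" is a hypothetical
# placeholder and the ORDER BY clause is an addition for this sketch.
QUERY_TEMPLATE = """
SELECT DISTINCT ?parent_taxonname ?taxonname WHERE {
  VALUES ?classes { wd:Q11173 wd:Q59199015 }
  ?compound wdt:P31 ?classes;
            wdt:P235 "__INCHIKEY__".
  ?compound p:P703 ?stmt.
  ?stmt ps:P703 ?taxon;
        prov:wasDerivedFrom ?ref.
  ?ref pr:P248 ?art.
  ?art wdt:P356 ?art_doi.
  ?taxon wdt:P171 ?parent_taxon;
         wdt:P225 ?taxonname.
  ?parent_taxon wdt:P225 ?parent_taxonname.
}
ORDER BY ?parent_taxonname ?taxonname
"""


def build_query(inchikey):
    """Return the SPARQL text for one InChIKey, rejecting malformed input."""
    if not INCHIKEY_RE.fullmatch(inchikey):
        raise ValueError("not a well-formed InChIKey: %r" % inchikey)
    return QUERY_TEMPLATE.replace("__INCHIKEY__", inchikey)


def parse_bindings(payload):
    """Turn a SPARQL JSON result into (parent taxon name, taxon name) pairs."""
    return [
        (row["parent_taxonname"]["value"], row["taxonname"]["value"])
        for row in payload["results"]["bindings"]
    ]


def fetch_taxa(inchikey, user_agent="lotus-slide-example/0.1"):
    """Run the query over HTTP (network access required)."""
    params = urllib.parse.urlencode({"query": build_query(inchikey), "format": "json"})
    request = urllib.request.Request(
        WIKIDATA_SPARQL + "?" + params,
        headers={"User-Agent": user_agent, "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(json.load(response))
```

With network access, `fetch_taxa("REFJWTPEDVJJIY-UHFFFAOYSA-N")` would return the taxa reported for quercetin. Validating the InChIKey before substitution keeps arbitrary text out of the query string.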
Which Zephyranthes species lack compounds known from at least two sister species?
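One way to read this question is as a set operation over compound-in-taxon pairs. The sketch below is only an illustration under assumptions: "sister species" is taken to mean any other species of the same genus present in the input, the function name `missing_shared_compounds` is hypothetical, and the input is a plain mapping from species name to reported compounds (for example, built from SPARQL results). A compound that is "missing" here is merely not reported for that species in the data.

```python
from collections import Counter


def missing_shared_compounds(compounds_by_species, min_sisters=2):
    """Map each species to compounds it lacks but that are reported in at
    least `min_sisters` other species of the same input set."""
    # Number of species in which each compound is reported.
    counts = Counter(
        compound
        for compounds in compounds_by_species.values()
        for compound in set(compounds)
    )
    result = {}
    for species, compounds in compounds_by_species.items():
        own = set(compounds)
        # For a compound absent from this species, the count comes
        # entirely from the other (sister) species.
        missing = sorted(
            c for c, n in counts.items() if c not in own and n >= min_sisters
        )
        if missing:
            result[species] = missing
    return result


# Placeholder names only; these are not real occurrence data.
toy = {
    "species_A": {"compound_1", "compound_2"},
    "species_B": {"compound_1", "compound_3"},
    "species_C": {"compound_3"},
}
print(missing_shared_compounds(toy))
```

Species with nothing missing are left out of the result, so the output lists only candidates worth investigating.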
This allows us to link Wikidata with PubChem (another integration follows in a few slides), DrugBank, ChEMBL, and ChEBI. It also enables substructure and similarity searches.
Work by Tiejun Cheng, Evan Bolton and Adriano Rutz.
A new rule-based approach to classify compounds using Wikidata by Ralf Stephan.
We need YOU!
Integration: data lakes, databases and knowledge graphs
Protocols and data are open. The SPARQL endpoint allows integration.
Contribute to open data and gain visibility. This is the same model as the bioassay data shared by companies on PubChem.
Combine the skills, tools and knowledge of other sectors in unprecedented ways.
The website: lotus.nprod.net
The manuscript: lotusnprod.github.io/lotus-manuscript
A frontend: lotus.naturalproducts.net
The code: github.com/lotusnprod
The Wikidata project: https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry/Natural_products