
How to find data in OpenTox datasets

OpenTox dataset services, currently running on the IdeaConsult server, host a number of datasets containing chemical structures, toxicity data and calculated properties. Some statistics as of today (number of datasets, chemical structures and properties):

$ curl http://apps.ideaconsult.net:8080/ambit2/stats/dataset
4120

$ curl http://apps.ideaconsult.net:8080/ambit2/stats/structures
521838

$ curl http://apps.ideaconsult.net:8080/ambit2/stats/properties
31387
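The same counters can also be read from a script. This is a minimal sketch, assuming each statistics endpoint keeps returning a bare integer as plain text, as in the cURL output above; the function names `parse_count` and `fetch_stat` are our own, not part of the OpenTox API.

```python
from urllib.request import urlopen

# Base URL of the AMBIT2 OpenTox services used throughout this post
AMBIT = "http://apps.ideaconsult.net:8080/ambit2"


def parse_count(body: str) -> int:
    """Convert a plain-text statistics response (e.g. '4120') to an int."""
    return int(body.strip())


def fetch_stat(kind: str) -> int:
    """Fetch one counter; kind is 'dataset', 'structures' or 'properties'."""
    with urlopen(f"{AMBIT}/stats/{kind}") as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `fetch_stat("structures")` sends the same GET request as the second cURL command.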

While all the information is accessible via the OpenTox API and a web browser, the growing amount of available data makes querying for specific portions of it essential. The examples use cURL, but most links can also be followed and viewed in a web browser.

Let's start with two simple tasks: exploring the available datasets and finding which datasets contain a column with a given name.

The most straightforward query is to look for datasets by name. The following query lists all datasets whose name starts with ToxCast.

$ curl -H "Accept:text/uri-list" http://apps.ideaconsult.net:8080/ambit2/dataset?search=ToxCast
http://apps.ideaconsult.net:8080/ambit2/dataset/103
http://apps.ideaconsult.net:8080/ambit2/dataset/104
http://apps.ideaconsult.net:8080/ambit2/dataset/105
http://apps.ideaconsult.net:8080/ambit2/dataset/106
http://apps.ideaconsult.net:8080/ambit2/dataset/107
http://apps.ideaconsult.net:8080/ambit2/dataset/108
http://apps.ideaconsult.net:8080/ambit2/dataset/109
http://apps.ideaconsult.net:8080/ambit2/dataset/110
http://apps.ideaconsult.net:8080/ambit2/dataset/111
http://apps.ideaconsult.net:8080/ambit2/dataset/112

List of ToxCast dataset URIs
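For scripting, the same search can be issued with an `Accept: text/uri-list` header and the response split into one URI per line. A minimal sketch, assuming the `search` query parameter behaves as in the cURL example; the helper names are hypothetical. The parser skips blank lines and `#` comment lines, which the text/uri-list format allows.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

AMBIT = "http://apps.ideaconsult.net:8080/ambit2"


def dataset_search_url(name: str) -> str:
    """Dataset search URL, e.g. .../dataset?search=ToxCast."""
    return f"{AMBIT}/dataset?{urlencode({'search': name})}"


def parse_uri_list(body: str) -> list[str]:
    """Split a text/uri-list body into URIs, ignoring blank and '#' comment lines."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def search_datasets(name: str) -> list[str]:
    """Return the URIs of the datasets matching `name`."""
    req = Request(dataset_search_url(name), headers={"Accept": "text/uri-list"})
    with urlopen(req) as resp:
        return parse_uri_list(resp.read().decode("utf-8"))
```

`search_datasets("ToxCast")` corresponds to the cURL query above.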

 

The RDF/XML representation of the datasets tells us that the URI of the ToxCast dataset containing the NovaScreen assays is http://apps.ideaconsult.net:8080/ambit2/dataset/110 (the prefix xml:base="http://apps.ideaconsult.net:8080/ambit2/" combined with the dataset resource identifier rdf:about="dataset/110").
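Resolving a relative `rdf:about` against `xml:base` is ordinary URI reference resolution (RFC 3986), so it can be reproduced with Python's `urllib.parse.urljoin`. The two input values below are taken from the text; the variable names are only illustrative.

```python
from urllib.parse import urljoin

# Values read from the RDF/XML representation of the dataset
xml_base = "http://apps.ideaconsult.net:8080/ambit2/"
rdf_about = "dataset/110"

# A relative rdf:about is resolved against xml:base (RFC 3986, section 5)
dataset_uri = urljoin(xml_base, rdf_about)
print(dataset_uri)  # http://apps.ideaconsult.net:8080/ambit2/dataset/110
```

The trailing slash of the base matters: without it, `urljoin` would replace the last path segment (`ambit2`) instead of appending to it.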

The datasets contain rows (chemical compounds) and columns (properties). Finding out which properties a dataset contains is easy: just append "/feature" to the dataset URI (the max parameter limits the number of results):

$ curl -H "Accept:text/uri-list" http://apps.ideaconsult.net:8080/ambit2/dataset/110/feature?max=10
http://apps.ideaconsult.net:8080/ambit2/feature/22687
http://apps.ideaconsult.net:8080/ambit2/feature/22688
http://apps.ideaconsult.net:8080/ambit2/feature/22689
http://apps.ideaconsult.net:8080/ambit2/feature/22690
http://apps.ideaconsult.net:8080/ambit2/feature/22691
http://apps.ideaconsult.net:8080/ambit2/feature/22692
http://apps.ideaconsult.net:8080/ambit2/feature/22693
http://apps.ideaconsult.net:8080/ambit2/feature/22694
http://apps.ideaconsult.net:8080/ambit2/feature/22695
http://apps.ideaconsult.net:8080/ambit2/feature/22696

URIs of the first 10 columns of the ToxCast NovaScreen dataset
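The column listing can be scripted the same way. A minimal sketch, assuming (from the listing above) that every feature URI ends in a numeric identifier; `dataset_features_url`, `feature_id` and `list_dataset_features` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

AMBIT = "http://apps.ideaconsult.net:8080/ambit2"


def dataset_features_url(dataset_id: int, max_items: int = 10) -> str:
    """URL listing the columns (features) of a dataset, limited to max_items."""
    return f"{AMBIT}/dataset/{dataset_id}/feature?{urlencode({'max': max_items})}"


def feature_id(feature_uri: str) -> int:
    """Numeric id at the end of a feature URI, e.g. .../ambit2/feature/22687 -> 22687."""
    return int(feature_uri.rstrip("/").rsplit("/", 1)[-1])


def list_dataset_features(dataset_id: int, max_items: int = 10) -> list[str]:
    """Fetch the feature URIs of a dataset as text/uri-list."""
    req = Request(dataset_features_url(dataset_id, max_items),
                  headers={"Accept": "text/uri-list"})
    with urlopen(req) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    # Drop any blank lines between entries
    return [line.strip() for line in lines if line.strip()]
```

`list_dataset_features(110)` issues the same request as the cURL command above.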

The same query can return the information in other formats, for example N3:

$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/dataset/110/feature?max=3
@prefix ot:      <http://www.opentox.org/api/1.1#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix :        <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix otee:    <http://www.opentox.org/echaEndpoints.owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ac:      <http://apps.ideaconsult.net:8080/ambit2/compound/> .
@prefix ad:      <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af:      <http://apps.ideaconsult.net:8080/ambit2/feature/> .

ot:hasSource
      a       owl:ObjectProperty .
ot:units
      a       owl:DatatypeProperty .
ot:Feature
      a       owl:Class .

ot:NumericFeature
      a       owl:Class ;
      rdfs:subClassOf ot:Feature .

<http://apps.ideaconsult.net:8080/ambit2/feature/22688>
      a       ot:Feature , ot:NumericFeature ;
      dc:creator "someone" ;
      dc:title "NVS_GPCR_hAdrb1" ;
      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt> ;
      ot:units "" ;
      =       otee:Receptor_binding_and_gene_expression .

<http://apps.ideaconsult.net:8080/ambit2/feature/22687>
      a       ot:Feature , ot:NumericFeature ;
      dc:creator "someone" ;
      dc:title "NVS_IC_rCaChN" ;
      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt> ;
      ot:units "" ;
      =       otee:Receptor_binding_and_gene_expression .

<http://apps.ideaconsult.net:8080/ambit2/feature/22689>
      a       ot:Feature , ot:NumericFeature ;
      dc:creator "someone" ;
      dc:title "NVS_ENZ_hPTPBAS" ;
      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt> ;
      ot:units "" ;
      =       otee:Receptor_binding_and_gene_expression .

The first 3 columns of the ToxCast NovaScreen dataset in RDF N3 format

The N3 output tells us the title of each column (dc:title) and its origin (ot:hasSource), and asserts, via = (the N3 shorthand for owl:sameAs), that the column represents the same entity as the one defined in the ECHA Endpoints ontology (otee:Receptor_binding_and_gene_expression).
 

The full ToxCast NovaScreen dataset can be retrieved in several formats, either by name

(RDF N3)

$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt

or by id

(SD File)

$ curl -H "Accept:chemical/x-mdl-sdfile" http://apps.ideaconsult.net:8080/ambit2/dataset/110

(RDF/XML File)

$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/110
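The format is chosen by the Accept header, so a script only needs to map a format name to its media type. A minimal sketch, assuming the server selects the representation from a standard media type in the Accept header; the short format keys (`n3`, `sdf`, `rdfxml`) and both function names are our own.

```python
import shutil
from urllib.parse import quote
from urllib.request import Request, urlopen

AMBIT = "http://apps.ideaconsult.net:8080/ambit2"

# Standard media types for the three representations discussed above
MEDIA_TYPES = {
    "n3": "text/n3",
    "sdf": "chemical/x-mdl-sdfile",
    "rdfxml": "application/rdf+xml",
}


def dataset_request(dataset: str, fmt: str) -> Request:
    """GET request for a dataset given by id ("110") or by name; fmt is a MEDIA_TYPES key."""
    url = f"{AMBIT}/dataset/{quote(dataset)}"
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def download_dataset(dataset: str, fmt: str, path: str) -> None:
    """Stream the dataset into a local file without loading it all into memory."""
    with urlopen(dataset_request(dataset, fmt)) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, `download_dataset("110", "sdf", "novascreen.sdf")` would save the SD file locally (the file name is arbitrary).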

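To retrieve only specific columns of the dataset, specify each column URI as a feature_uris[] parameter (when using cURL, quote the URL and add the -g option, so that the square brackets are not interpreted as a URL glob pattern):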

http://apps.ideaconsult.net:8080/ambit2/dataset/110?feature_uris[]=http://apps.ideaconsult.net:8080/ambit2/feature/22689
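Several columns can be requested by repeating the parameter. A minimal sketch that builds such a URL with `urllib.parse.urlencode`, which percent-encodes both the brackets and the nested URIs; we assume the server decodes these percent-encoded parameters as usual. `dataset_columns_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

AMBIT = "http://apps.ideaconsult.net:8080/ambit2"


def dataset_columns_url(dataset_id: int, feature_uris: list[str]) -> str:
    """Dataset URL restricted to the given columns, one feature_uris[] pair per column.

    The brackets become %5B%5D, so the URL contains no cURL glob characters.
    """
    query = urlencode([("feature_uris[]", uri) for uri in feature_uris])
    return f"{AMBIT}/dataset/{dataset_id}?{query}"
```

For example, passing the URIs of features 22689 and 22688 yields a URL with two `feature_uris%5B%5D=` parameters.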

 

Finally, let's assume one doesn't know whether ToxCast datasets are available via the dataset services, but would like to know whether there are columns with the prefix "NVS", which in ToxCast datasets denote NovaScreen assays.

Searching for columns (OpenTox features) by name is similar to searching datasets by name:

$ curl -H "Accept:text/n3" "http://apps.ideaconsult.net:8080/ambit2/feature?max=3&search=NVS"
@prefix ot:      <http://www.opentox.org/api/1.1#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix :        <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix otee:    <http://www.opentox.org/echaEndpoints.owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ad:      <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af:      <http://apps.ideaconsult.net:8080/ambit2/feature/> .

ot:hasSource
      a       owl:ObjectProperty .

ot:units
      a       owl:DatatypeProperty .

ot:Feature
      a       owl:Class .

<http://apps.ideaconsult.net:8080/ambit2/feature/22688>
      a       ot:Feature ;
      dc:creator "194.141.0.136" ;
      dc:title "NVS_GPCR_hAdrb1" ;
      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt> ;
      ot:units "" ;
      =       otee:Receptor_binding_and_gene_expression .

<http://apps.ideaconsult.net:8080/ambit2/feature/22687>
      a       ot:Feature ;
      dc:creator "194.141.0.136" ;
      dc:title "NVS_IC_rCaChN" ;
      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt> ;
      ot:units "" ;
      =       otee:Receptor_binding_and_gene_expression .

<http://apps.ideaconsult.net:8080/ambit2/feature/22689>
      a       ot:Feature ;
      dc:creator "194.141.0.136" ;
      dc:title "NVS_ENZ_hPTPBAS" ;
      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt> ;
      ot:units "" ;
      =       otee:Receptor_binding_and_gene_expression .

The first 3 columns with names starting with NVS, in RDF N3 format

The output contains links (ot:hasSource) to the dataset the NVS* columns belong to (http://apps.ideaconsult.net:8080/ambit2/dataset/ToxCast_Novascreen_20091214.txt), and the content of that dataset can easily be retrieved in several formats, as explained above.
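This last step, from a column name prefix to the datasets that contain such columns, can be scripted by collecting the distinct ot:hasSource links from the feature search results. A minimal sketch, assuming the N3 layout shown above; the helper names and the default of 100 results are hypothetical. Because the search is limited by max, datasets whose matching columns fall beyond the first max results would be missed unless max is raised.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

AMBIT = "http://apps.ideaconsult.net:8080/ambit2"

# Only "ot:hasSource <uri>" statements match, not the property declaration itself
HAS_SOURCE = re.compile(r"ot:hasSource\s+<([^>]+)>")


def feature_search_url(name: str, max_items: int = 3) -> str:
    """Feature search URL, e.g. .../feature?max=3&search=NVS."""
    return f"{AMBIT}/feature?{urlencode({'max': max_items, 'search': name})}"


def source_datasets(n3: str) -> list[str]:
    """Distinct ot:hasSource URIs in an N3 feature listing, in first-seen order."""
    return list(dict.fromkeys(HAS_SOURCE.findall(n3)))


def datasets_with_columns(name: str, max_items: int = 100) -> list[str]:
    """Datasets that contain columns whose names match `name`."""
    req = Request(feature_search_url(name, max_items), headers={"Accept": "text/n3"})
    with urlopen(req) as resp:
        return source_datasets(resp.read().decode("utf-8"))
```

`datasets_with_columns("NVS")` would return the ToxCast NovaScreen dataset URI, assuming the server still holds that data.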

 

In these blog posts we will try to provide a practical guide and examples, in addition to the official documentation on the OpenTox site, covering how to perform more complex queries via the OpenTox ontology service and SPARQL, how to annotate dataset columns with resources defined in existing ontologies, and how to use all this information and the API to calculate properties and build predictive models.

 
