If you've completed installation of the RDF extension as outlined in the previous exercise, you'll notice a new box in the upper right corner of your OpenRefine project screen, by Extensions, labeled "RDF." In this exercise, we will add a reconciliation service using this resource description framework extension. A reconciliation service works by querying websites and bringing controlled lists of terms over to your OpenRefine project. In this case we will add a service for MeSH, the Medical Subject Headings list, so we can compare terms from the Other_Diagnosis field in the sample dataset to the controlled MeSH terms used in PubMed and many biomedical repositories.
Steps to Add the MeSH SPARQL Endpoint for Reconciliation
- From the RDF extension drop down menu at the top right of your OpenRefine project screen, choose the option to add a reconciliation service.
- From the drop down list of options, choose to add a service based on a SPARQL endpoint. A SPARQL endpoint is a kind of port on an HTML website or web service that is capable of receiving queries in a specialized format. When you add such an endpoint to OpenRefine, it allows the program to query the web service.
- In the SPARQL Endpoint dialog box, provide a human-readable name for your SPARQL-based reconciliation service. Then enter the Medical Subject Headings (MeSH) SPARQL endpoint: http://id.nlm.nih.gov/mesh/sparql Then click OK.
- The SPARQL endpoint will now be available under the name provided, in this case "MeSH Reconciliation," whenever you would like to reconcile OpenRefine columns against Medical Subject Heading terms. Learn how to do this in the next section.