A powerful feature of OpenRefine is the ability to pull external data from URLs. This exercise applies this feature to the Other_Diagnosis column of the sample dataset. Pulling in data from outside sources can be time-consuming because OpenRefine must query the source once for every row, so we'll work with just one row.
"https://en.wikipedia.org/w/api.php?action=query&generator=prefixsearch&gpssearch="
+ value.escape('url')
+ "&prop=extracts&exintro=1&explaintext=1&redirects=1&limit=10&namespace=0&format=xml"
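To see what URL the expression above produces for a given cell, the same concatenation can be sketched outside OpenRefine. This is only an illustration in Python, not part of the exercise: `build_url` is a hypothetical helper, `quote_plus` is assumed to be a close stand-in for `value.escape('url')`, and `action=query` is assumed to be the intended API action.

```python
from urllib.parse import quote_plus

# The two string literals from the expression, on either side of the cell value.
# Assumption: action=query is the action the API call is meant to use.
PREFIX = "https://en.wikipedia.org/w/api.php?action=query&generator=prefixsearch&gpssearch="
SUFFIX = "&prop=extracts&exintro=1&explaintext=1&redirects=1&limit=10&namespace=0&format=xml"


def build_url(cell_value: str) -> str:
    """Join prefix, URL-escaped cell value, and suffix (hypothetical helper)."""
    # quote_plus is assumed to approximate value.escape('url'):
    # it percent-encodes reserved characters and turns spaces into '+'.
    return PREFIX + quote_plus(cell_value) + SUFFIX


if __name__ == "__main__":
    print(build_url("Asthma"))
    print(build_url("Type 2 diabetes"))
```

A single-word value such as Asthma passes through unchanged; escaping only matters for values that contain spaces or reserved characters, which would otherwise break the query string.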
See the next tab to learn how to parse useful bits from this data.
To extract the description of Asthma from the XML response retrieved in the previous exercise, a parsing expression can be used.
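The idea behind the parsing step can be sketched in Python for readers who want to inspect a response outside OpenRefine. This is a rough illustration only: the sample below is a hand-written placeholder shaped like an `extracts` response (the page id and description text are made up), and `first_extract` is a hypothetical helper rather than an OpenRefine function.

```python
import xml.etree.ElementTree as ET

# Placeholder response: structure assumed, pageid and text are invented.
SAMPLE_RESPONSE = """
<api batchcomplete="">
  <query>
    <pages>
      <page pageid="1" ns="0" title="Asthma" index="1">
        <extract xml:space="preserve">Placeholder description of asthma.</extract>
      </page>
    </pages>
  </query>
</api>
"""


def first_extract(xml_text: str):
    """Return the text of the first <extract> element, or None if absent."""
    root = ET.fromstring(xml_text.strip())
    node = root.find(".//extract")  # search the whole tree for the first match
    if node is None or node.text is None:
        return None
    return node.text.strip()


if __name__ == "__main__":
    print(first_extract(SAMPLE_RESPONSE))
```

The sketch selects the element by name and then takes its text, the same two-step pattern a parsing expression follows: locate the element that holds the description, then strip away the markup around it.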