Fetching Big Objects Iteratively

The more data the Knowledge Graph holds and the better the data is linked, the more careful a user must be when fetching an object.

Imagine a data set about a tour with a few properties, say, name, description, image, startLocation and endLocation. That's not an extraordinarily expressive data set, but it can already lead to some interesting side effects because the data is stored as a graph. Imagine the startLocation is not just a string or an object with a city name or geo coordinates, but a point of interest (POI) that is itself stored in the graph. This POI may have dozens of properties with complex nested data objects. If a query now asks for the tour without specifying how "deep" the graph should be traversed, it might try to return megabytes of data.

There are, of course, ways to deal with that, which are explained below. In summary:

  1. use the hop parameter (<=2)
  2. fetch the data iteratively.

1. The hop parameter

The API supports a request header called x-hop (in this article, simply hop); it is also described in the section Retrieving Objects. One hop is one level of depth in the nesting of an object. In the tour example above, hop=1 returns all first-level properties of the tour: name, description, image, startLocation and endLocation. However, it returns the actual values only for properties with primitive ranges, such as string, number or date. For properties whose range is an object, like startLocation and endLocation, the response contains a URI instead. This URI is the root node of a sub-graph and can be queried in turn. hop=2 additionally returns the first-level properties of startLocation and endLocation, and the query is fine as long as the data does not go much beyond that. hop=3 times out in most cases.
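
As an illustration of the header described above, the following Python sketch builds (but does not send) such a request using only the standard library. The function name build_request and its parameters are hypothetical; the endpoint, query parameters and header names are taken from the example calls in this article. Rejecting hop values above 2 is a design choice of this sketch, not documented API behaviour.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as used in the example calls of this article.
BASE_URL = "https://proxy.opendatagermany.io/api/ts/v1/kg/things"


def build_request(thing_id, namespace, api_key, hop=1, lang="de"):
    """Build (but do not send) a GET request for one object with an x-hop header."""
    if hop not in (1, 2):
        # Assumption of this sketch: refuse deeper hops, since they tend to time out.
        raise ValueError("hop should be 1 or 2")
    # Keep ':' and '/' unescaped so the URL looks like the calls in this article.
    query = urlencode({"lang": lang, "ns": namespace}, safe=":/")
    request = Request(f"{BASE_URL}/{thing_id}?{query}", method="GET")
    request.add_header("x-api-key", api_key)
    request.add_header("Content-Type", "application/json")
    request.add_header("x-hop", str(hop))
    return request
```

The returned object can be passed to urllib.request.urlopen to perform the call; a real API key is needed for that.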

2. Iterative data retrieval

If we need to fetch data that is nested more deeply, say something like Tour.startLocation.containsPlace.menu.mainDishes.ingredients, then this needs to be done one object at a time.

First, a call fetches the first-level properties with hop=1. If the value of a property is not a string (or a similar primitive) but an object containing nothing but an @id property, then a second call is required for that @id, again with hop=1.
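
The procedure above can be sketched in Python as a walk along a list of property names, one object per call. This is a minimal sketch, not a reference implementation: follow_path and only_id are hypothetical names, and fetch is a caller-supplied function that is assumed to perform the hop=1 request for a URI and return the decoded object as a dict. For multi-valued properties the sketch follows only the first entry.

```python
def only_id(value):
    """True if value is an object containing nothing but an @id property."""
    return isinstance(value, dict) and set(value) == {"@id"}


def follow_path(root_uri, path, fetch):
    """Walk `path` (a list of property names) from root_uri, one call per object.

    `fetch(uri)` is supplied by the caller and is assumed to issue the
    hop=1 request and return the decoded object as a dict.
    """
    current = fetch(root_uri)
    for prop in path:
        value = current.get(prop)
        if isinstance(value, list):
            # Simplification of this sketch: follow the first entry only.
            value = value[0] if value else None
        if not only_id(value):
            # Literal, missing, or already materialized: nothing more to fetch.
            return value
        current = fetch(value["@id"])
    return current
```

Passing fetch as a parameter keeps the traversal logic independent of the HTTP client and makes it easy to try out against a small in-memory dict before calling the real API.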

3. Example

Example: a hotel called "Hotel Restaurant Pelikan"
URL of the data set: https://mein.toubiz.de/api/v1/article/ab9248cb-45f9-41a7-a45c-9c3e38aac6d2

The API call to fetch the object without a hop definition times out. The correct call looks like this:

GET https://proxy.opendatagermany.io/api/ts/v1/kg/things/ab9248cb-45f9-41a7-a45c-9c3e38aac6d2?lang=de&ns=https://mein.toubiz.de/api/v1/article
x-api-key: <api-key>
Content-Type: application/json
x-hop: 1

(To learn how the query is built, read Retrieving Objects.)

The (full) response looks like this:

[
  {
    "@id": "https://mein.toubiz.de/api/v1/article/ab9248cb-45f9-41a7-a45c-9c3e38aac6d2",
    "@type": [
      "https://schema.org/Hotel",
      "https://schema.org/LodgingBusiness",
      "https://schema.org/HotelRoom",
      "https://odta.io/voc/PointOfInterest"
    ],
    "https://odta.io/voc/geoLink": [
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_2fa4bbec-ec4c-4f11-acc5-b349bebc7178"
      },
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_57c4e466-9fa7-43b9-87e7-6321faea22fe"
      },
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_91e08de1-59b7-45d8-adbe-7ee74022a5f9"
      },
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_06af03f4-2c71-4592-834f-63f487eb7a28"
      },
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_d8803ffd-b658-4f0a-9454-b509f168d74e"
      },
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_1eb33ce1-3492-43c0-8de4-8130e293eb6f"
      },
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_c718ce93-09d9-4587-bdf3-0af939f46360"
      },
      {
        "@id": "http://onlim.com/entity/geoEnrichmentID_14044074-2fc2-4205-9ae9-18fcc9ca26d7"
      }
    ],
    "https://schema.org/address": {
      "@id": "http://onlim.com/entity/ea0941cd-7f14-4cbc-8a37-8ad03612029c"
    },
    "https://schema.org/containedInPlace": {
      "@id": "https://gdz.bkg.bund.de/Gebietseinheiten/rg/L60"
    },
    "https://schema.org/geo": {
      "@id": "http://onlim.com/entity/d3f5ac81-d7f4-40b0-a8cc-87bc07630c86"
    },
    "https://schema.org/hasMap": {
      "@type": "http://www.w3.org/2001/XMLSchema#anyURI",
      "@value": "https://w3w.co/bahnsteigen.zufrieden.anf%C3%BChren"
    },
    "https://schema.org/identifier": {
      "@type": "http://www.w3.org/2001/XMLSchema#anyURI",
      "@value": "https://mein.toubiz.de/api/v1/article/ab9248cb-45f9-41a7-a45c-9c3e38aac6d2"
    },
    "https://schema.org/isAccessibleForFree": {
      "@type": "http://www.w3.org/2001/XMLSchema#boolean",
      "@value": "false"
    },
    "https://schema.org/name": {
      "@language": "de",
      "@value": "Hotel Restaurant Pelikan"
    },
    "https://schema.org/sdLicense": {
      "@type": "http://www.w3.org/2001/XMLSchema#anyURI",
      "@value": "https://creativecommons.org/licenses/by-sa/4.0/"
    },
    "https://schema.org/telephone": "+49 7466 910000",
    "https://schema.org/url": {
      "@type": "http://www.w3.org/2001/XMLSchema#anyURI",
      "@value": "https://pelikanhotel.de"
    },
    "https://vocab.sti2.at/ds/compliesWith": {
      "@id": "https://semantify.it/ds/sloejGAwT"
    }
  }
]

As you can see, some first-level information is already materialized (line numbers refer to the response above):

  • schema:hasMap: line 45
  • schema:name: line 57
  • schema:sdLicense: line 61
  • schema:telephone: line 65
  • ...
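
The materialized values come in two shapes in the response above: plain strings (schema:telephone) and objects with an @value and an optional @type or @language. As a hedged sketch, the hypothetical helper literal_value below turns both shapes into plain Python values. It converts only the two datatypes that occur in this article's responses (xsd:boolean and xsd:double); everything else is returned as text.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal_value(value):
    """Convert a first-level property value into a plain Python value."""
    if not isinstance(value, dict) or "@value" not in value:
        # Plain string such as schema:telephone, or a reference object.
        return value
    raw = value["@value"]
    datatype = value.get("@type", "")
    if datatype == XSD + "boolean":
        return str(raw).lower() in ("true", "1")
    if datatype == XSD + "double":
        return float(raw)
    # anyURI, language-tagged strings and other datatypes stay as they are.
    return raw
```

Reference objects (those with only an @id) are returned unchanged, so the caller can still decide whether to fetch them.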

But maybe you are interested in the postal address, linked locations or the geo coordinates? Then use a second call (in theory, a single call with hop=2 would work for schema:address or schema:geo, but better safe than sorry). To perform that call, use the URI (the @id property of the object) as the new identifier. In the example below, let's fetch the geolocation of the hotel (line 42 of the response: schema:geo):

URI: http://onlim.com/entity/d3f5ac81-d7f4-40b0-a8cc-87bc07630c86

Call:

GET https://proxy.opendatagermany.io/api/ts/v1/kg/things/d3f5ac81-d7f4-40b0-a8cc-87bc07630c86?lang=de&ns=http://onlim.com/entity
x-api-key: <api-key>
Content-Type: application/json
x-hop: 1

The response looks like this:

[
  {
    "@id": "http://onlim.com/entity/d3f5ac81-d7f4-40b0-a8cc-87bc07630c86",
    "@type": "https://schema.org/GeoCoordinates",
    "https://schema.org/latitude": {
      "@type": "http://www.w3.org/2001/XMLSchema#double",
      "@value": "48.0506835"
    },
    "https://schema.org/longitude": {
      "@type": "http://www.w3.org/2001/XMLSchema#double",
      "@value": "8.9701657"
    },
    "https://vocab.sti2.at/ds/compliesWith": {
      "@id": "https://semantify.it/ds/2NErTNGpd"
    }
  }
]
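
In both calls of this example, the last path segment of the URI is used as the identifier and everything before it as the ns parameter. A minimal sketch of that split follows; split_uri is a hypothetical helper, and whether the same rule holds for every URI in the graph is an assumption that is not confirmed here.

```python
def split_uri(uri):
    """Split an @id URI into (namespace, identifier) at the last slash."""
    namespace, sep, identifier = uri.rstrip("/").rpartition("/")
    if not sep or not identifier:
        raise ValueError(f"cannot split {uri!r} into namespace and identifier")
    return namespace, identifier
```

For the geo URI above, this yields the namespace http://onlim.com/entity and the identifier d3f5ac81-d7f4-40b0-a8cc-87bc07630c86, matching the second call.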

Note: programmatic condition

The (programmatic) decision whether to dig deeper (to do more iterations) can be made based on the shape of a property value. If the value is an object containing nothing but an @id property, a nested object is referenced but not materialized, and another call is needed to retrieve it:

 "https://schema.org/geo": {
   "@id": "http://onlim.com/entity/d3f5ac81-d7f4-40b0-a8cc-87bc07630c86"
 },
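
As a sketch of this condition in Python, the hypothetical helper pending_references below lists, for one decoded response object, the URIs that would need another call. It also handles multi-valued properties such as odta:geoLink in the hotel response, where the value is a list of reference objects.

```python
def is_reference(value):
    """True if value is an object with an @id and nothing else."""
    return isinstance(value, dict) and set(value.keys()) == {"@id"}


def pending_references(obj):
    """Map each property of obj to the URIs that are referenced but not materialized."""
    pending = {}
    for prop, value in obj.items():
        if prop.startswith("@"):
            # Skip the object's own @id and @type.
            continue
        candidates = value if isinstance(value, list) else [value]
        uris = [c["@id"] for c in candidates if is_reference(c)]
        if uris:
            pending[prop] = uris
    return pending
```

Applied to the hotel response, the result would include https://schema.org/geo and https://schema.org/address, but not https://schema.org/name or https://schema.org/telephone.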