Getting Started with structr • structr

library(structr)
#> Registered S3 method overwritten by 'structr':
#>   method     from    
#>   print.json jsonlite

Introduction

Have you ever had trouble with data-type consistency when interacting with external systems like a web API or a database? It is not uncommon for JSON parsers in R to fail to parse and serialize data to the specification of the external system.

For example, let’s say an API requires a field in a JSON object to be a single scalar integer value. However, you get constant 400 Bad Request because packages like jsonlite serialize the integer as a vector of length 1. If the API requires a nested structure unbox values becomes very impractical.

The structr package provides a solution to this problem by allowing you to define a structure for your data and then serialize and deserialize it to and from JSON. (With the added benefit of being blazingly fast.)

Defining a structure

Defining a structure with structr involves combining different data-types to build complex structures. These are combined through the s_* functions like ?s_map, ?s_vector, ?s_string and others.

Once the desired structure is reached it has to be compiled using the ?build_structure function. The compiled version is what will be used to parse or serialize the structures we want.

structure_basic <- build_structure(
  s_map(
    x = s_integer(),
    y = s_double(),
    z = s_optional(s_string())
  )
)
print(structure_basic)
#> Map {
#>     fields: {
#>         "x": Integer,
#>         "z": Optional(
#>             String,
#>         ),
#>         "y": Double,
#>     },
#>     ignore_extra_fields: false,
#>     expected_fields_str: [
#>         "x",
#>         "z",
#>         "y",
#>     ],
#> }

Parsing JSON

We can use a structure like the one defined above to parse a JSON exactly to the specification.

json_str <- '{"x": 1, "y": 99.9, "z": "value"}'

parse_json(json_str, structure_basic)
#> $x
#> [1] 1
#> 
#> $y
#> [1] 99.9
#> 
#> $z
#> [1] "value"

You may have noticed that z is a combination of ?s_optional and ?s_string. This is because structr is non-null by default. Every field is required be default, meaning it cannot be NULL or NA.

If we try parse a JSON without the field x we will get an error.

json_str <- '{"y": 99.9, "z": "value"}'
try(parse_json(json_str, structure_basic))
#> Error in parse_json(json_str, structure_basic) : 
#>   Serde("missing field `x`") at character 0

# Using 'null' for a non-optional field also causes an error
json_str_null <- '{"x": null, "y": 99.9, "z": "value"}'
try(parse_json(json_str_null, structure_basic))
#> Error in parse_json(json_str_null, structure_basic) : 
#>   ExpectedSigned at character 0

However, if we try to parse a JSON missing the optional field z, it will not fail. s_optional means the value associated with the key can be null or missing from the JSON object.

json_str_missing_z <- '{"x": 1, "y": 99.9}'
parse_json(json_str_missing_z, structure_basic)
#> $x
#> [1] 1
#> 
#> $y
#> [1] 99.9

json_str_null_z <- '{"x": 1, "y": 99.9, "z": null}'
parse_json(json_str_null_z, structure_basic)
#> $x
#> [1] 1
#> 
#> $y
#> [1] 99.9
#> 
#> $z
#> NULL

We will get errors if the types are incorrect. If we pass a string to x or y we will get errors.

json_str_bad_types <- '{"x": "1", "y": "99.9", "z": "value"}'
try(parse_json(json_str_bad_types, structure_basic))
#> Error in parse_json(json_str_bad_types, structure_basic) : 
#>   ExpectedSigned at character 0

Serializing JSON

We can use this same structure to convert R objects into the corresponding JSON. serialize_json ensures that the R object conforms to the structure before creating the JSON string.

r_list <- list(x = 1L, y = 99.9, z = "value")
serialize_json(r_list, structure_basic)
#> {"x":1,"z":"value","y":99.9}

r_list_null_z <- list(x = 2L, y = 100.1)
serialize_json(r_list_null_z, structure_basic)
#> {"x":2,"z":null,"y":100.1}

If the R object doesn’t match the structure (e.g., wrong type, missing required field), serialize_json will raise an error.

# Wrong type for 'x'
r_list_bad_type <- list(x = "1", y = 99.9, z = "value")
try(serialize_json(r_list_bad_type, structure_basic))
#> Error in serialize_json(r_list_bad_type, structure_basic) : 
#>   Failed to serialize structure: Serde("Type mismatch: expected an integer") at character 0

# Missing required field 'y'
r_list_missing_field <- list(x = 1L, z = "value")
try(serialize_json(r_list_missing_field, structure_basic))
#> Error in serialize_json(r_list_missing_field, structure_basic) : 
#>   Failed to serialize structure: Serde("Found NA/null values in non-optional field of type a floating-point number") at character 0

# Providing NA for a non-optional field 'y'
r_list_na_field <- list(x = 1L, y = NA_real_, z = "value")
try(serialize_json(r_list_na_field, structure_basic))
#> Error in serialize_json(r_list_na_field, structure_basic) : 
#>   Failed to serialize structure: Serde("Found NA/null values in non-optional field of type a floating-point number") at character 0

More Complex Structures (Parsing)

Real-world JSON, especially from APIs, often involves nested objects, arrays of objects, optional fields, and specific data types like dates. structr is designed to handle these complexities robustly during parsing.

Let’s imagine we’re interacting with an e-commerce API that returns order details. A typical JSON response might look like this:

{
  "order_id": "ORD-12345",
  "order_date": "2024-07-29",
  "status": "processing",
  "customer": {
    "customer_id": 987,
    "name": "Jane Doe",
    "email": "jane.doe@example.com"
  },
  "items": [
    {
      "sku": "ITEM001",
      "product_name": "Wireless Mouse",
      "quantity": 1,
      "unit_price": 25.99
    },
    {
      "sku": "ITEM002",
      "product_name": "Keyboard",
      "quantity": 1,
      "unit_price": 75.50
    }
  ],
  "shipping_address": {
    "street": "123 Example St",
    "city": "Metropolis",
    "zip_code": "10001",
    "country": "USA"
  },
  "notes": "Gift wrap requested."
}

We can define a structr schema to strictly match this structure:

# Define the schema for a single order item
item_schema_parse <- s_map(
  sku = s_string(),
  product_name = s_string(),
  quantity = s_integer(),
  unit_price = s_double()
)

# Define the schema for the customer
customer_schema_parse <- s_map(
  customer_id = s_integer(),
  name = s_string(),
  email = s_optional(s_string()) # Email might be null
)

# Define the schema for the shipping address
address_schema_parse <- s_map(
  street = s_string(),
  city = s_string(),
  zip_code = s_string(),
  country = s_string()
)

# Combine into the main order schema
order_schema_parse <- s_map(
  order_id = s_string(),
  order_date = s_date(format = "%Y-%m-%d"), # Expecting specific date format
  status = s_string(),
  customer = customer_schema_parse,              # Nest the customer schema
  items = s_vector(item_schema_parse),           # Expect a vector of items matching item_schema
  shipping_address = address_schema_parse,       # Nest the address schema
  notes = s_optional(s_string())           # Notes might be null
)

# Build the final structure
compiled_order_structure_parse <- build_structure(order_schema_parse)

# Print the structure (optional, for inspection)
# print(compiled_order_structure_parse)

Now, let’s define the JSON string in R and parse it using our compiled structure.

json_order_1 <- '{
  "order_id": "ORD-12345",
  "order_date": "2024-07-29",
  "status": "processing",
  "customer": {
    "customer_id": 987,
    "name": "Jane Doe",
    "email": "jane.doe@example.com"
  },
  "items": [
    {
      "sku": "ITEM001",
      "product_name": "Wireless Mouse",
      "quantity": 1,
      "unit_price": 25.99
    },
    {
      "sku": "ITEM002",
      "product_name": "Keyboard",
      "quantity": 1,
      "unit_price": 75.50
    }
  ],
  "shipping_address": {
    "street": "123 Example St",
    "city": "Metropolis",
    "zip_code": "10001",
    "country": "USA"
  },
  "notes": "Gift wrap requested."
}'

# Parse the JSON string
parsed_order_1 <- parse_json(json_order_1, compiled_order_structure_parse)

# Inspect the parsed R object structure and types
str(parsed_order_1, max.level = 2)
#> List of 7
#>  $ order_id        : chr "ORD-12345"
#>  $ order_date      : Date[1:1], format: "2024-07-29"
#>  $ status          : chr "processing"
#>  $ customer        :List of 3
#>   ..$ customer_id: int 987
#>   ..$ name       : chr "Jane Doe"
#>   ..$ email      : chr "jane.doe@example.com"
#>  $ items           :List of 2
#>   ..$ :List of 4
#>   ..$ :List of 4
#>  $ shipping_address:List of 4
#>   ..$ street  : chr "123 Example St"
#>   ..$ city    : chr "Metropolis"
#>   ..$ zip_code: chr "10001"
#>   ..$ country : chr "USA"
#>  $ notes           : chr "Gift wrap requested."

# Access specific parts
print(paste("Order Status:", parsed_order_1$status))
#> [1] "Order Status: processing"
print(paste("Number of items:", length(parsed_order_1$items)))
#> [1] "Number of items: 2"
print(paste("First item SKU:", parsed_order_1$items[[1]]$sku))
#> [1] "First item SKU: ITEM001"
# Notice the order_date is now an R Date object
print(paste("Order Date:", parsed_order_1$order_date))
#> [1] "Order Date: 2024-07-29"
class(parsed_order_1$order_date)
#> [1] "Date"

Let’s try another example where the optional fields (email and notes) are null:

json_order_2 <- '{
  "order_id": "ORD-67890",
  "order_date": "2024-07-30",
  "status": "shipped",
  "customer": {
    "customer_id": 654,
    "name": "John Smith",
    "email": null
  },
  "items": [
    {
      "sku": "ITEM003",
      "product_name": "USB Hub",
      "quantity": 2,
      "unit_price": 12.00
    }
  ],
  "shipping_address": {
    "street": "456 Test Ave",
    "city": "Gotham",
    "zip_code": "20002",
    "country": "USA"
  },
  "notes": null
}'

# Parse the second JSON
parsed_order_2 <- parse_json(json_order_2, compiled_order_structure_parse)

# Check the optional fields (which are NULL in R)
print(paste("Customer Email:", parsed_order_2$customer$email))
#> [1] "Customer Email: "
is.null(parsed_order_2$customer$email)
#> [1] TRUE
print(paste("Order Notes:", parsed_order_2$notes))
#> [1] "Order Notes: "
is.null(parsed_order_2$notes)
#> [1] TRUE

If the incoming JSON deviates from this structure (e.g., wrong type for quantity, missing status, unexpected extra field, incorrect date format), parse_json will raise a specific error, ensuring data integrity when working with external systems.

This example demonstrates how s_map, s_vector, s_optional, s_date, and atomic types (s_string, s_integer, s_double) can be combined to model and validate complex, real-world JSON data during parsing.

More Complex Structures (Serialization)

Just as structr helps enforce structure when parsing JSON, it’s equally valuable when creating JSON to send to external systems, especially those with strict API requirements. This ensures your R data is correctly formatted before transmission.

Let’s imagine we need to send event log data to an API. This API is very particular about the format:

It expects specific fields: eventId, eventDate, eventType, userId, details.
eventDate must be in YYYYMMDD format.
userId must be an integer.
details is a nested object containing source (string) and metadata (optional map).
metadata itself can contain key (string) and value (double), both optional within the metadata map, but if metadata is present, it must be an object.

We can define a structr schema to meet these strict requirements for serialization:

# Schema for the optional metadata map
metadata_schema_serialize <- s_map(
  key = s_optional(s_string()),
  value = s_optional(s_double())
)

# Schema for the details object
details_schema_serialize <- s_map(
  source = s_string(),
  metadata = s_optional(metadata_schema_serialize) # The whole metadata map is optional
)

# Main schema for the log entry
log_entry_schema_serialize <- s_map(
  eventId = s_string(),
  eventDate = s_date(format = "%Y%m%d"), # Strict date format required by API
  eventType = s_string(),
  userId = s_integer(),
  details = details_schema_serialize      # Nest the details schema
)

# Build the final structure for serialization
compiled_log_entry_structure_serialize <- build_structure(log_entry_schema_serialize)

print(compiled_log_entry_structure_serialize)
#> Map {
#>     fields: {
#>         "eventId": String,
#>         "eventType": String,
#>         "eventDate": Date {
#>             format: "%Y%m%d",
#>         },
#>         "details": Map {
#>             fields: {
#>                 "source": String,
#>                 "metadata": Optional(
#>                     Map {
#>                         fields: {
#>                             "key": Optional(
#>                                 String,
#>                             ),
#>                             "value": Optional(
#>                                 Double,
#>                             ),
#>                         },
#>                         ignore_extra_fields: false,
#>                         expected_fields_str: [
#>                             "key",
#>                             "value",
#>                         ],
#>                     },
#>                 ),
#>             },
#>             ignore_extra_fields: false,
#>             expected_fields_str: [
#>                 "source",
#>                 "metadata",
#>             ],
#>         },
#>         "userId": Integer,
#>     },
#>     ignore_extra_fields: false,
#>     expected_fields_str: [
#>         "eventId",
#>         "eventType",
#>         "eventDate",
#>         "details",
#>         "userId",
#>     ],
#> }

Now, let’s create some R data that we want to send to this API. It’s crucial that the R data types match the schema (e.g., use as.Date() for eventDate, integer for userId).

# Example 1: Log entry with full details including metadata
log_entry_data_1 <- list(
  eventId = "evt-aaa-001",
  eventDate = as.Date("2024-08-15"),
  eventType = "user_login",
  userId = 12345L,
  details = list(
    source = "web_frontend",
    metadata = list(
      key = "session_duration",
      value = 350.5
    )
  )
)

# Example 2: Log entry where metadata is missing
log_entry_data_2 <- list(
  eventId = "evt-bbb-002",
  eventDate = as.Date("2024-08-16"),
  eventType = "page_view",
  userId = 98765L,
  details = list(
    source = "mobile_app"
  )
)

# Example 3: Log entry where metadata is present, but its inner fields are NULL
log_entry_data_3 <- list(
  eventId = "evt-ccc-003",
  eventDate = as.Date("2024-08-17"),
  eventType = "item_click",
  userId = 12345L,
  details = list(
    source = "web_frontend",
    metadata = list(
      key = NULL,
      value = NULL
    )
  )
)

str(log_entry_data_1) # Check R types
#> List of 5
#>  $ eventId  : chr "evt-aaa-001"
#>  $ eventDate: Date[1:1], format: "2024-08-15"
#>  $ eventType: chr "user_login"
#>  $ userId   : int 12345
#>  $ details  :List of 2
#>   ..$ source  : chr "web_frontend"
#>   ..$ metadata:List of 2
#>   .. ..$ key  : chr "session_duration"
#>   .. ..$ value: num 350

Let’s serialize these R lists using our strict structure. serialize_json will validate the R list against the schema before producing the JSON.

# Serialize Example 1
serialize_json(log_entry_data_1, compiled_log_entry_structure_serialize, pretty = TRUE)
#> {
#>   "eventId": "evt-aaa-001",
#>   "eventType": "user_login",
#>   "eventDate": "20240815",
#>   "details": {
#>     "source": "web_frontend",
#>     "metadata": {
#>       "key": "session_duration",
#>       "value": 350.5
#>     }
#>   },
#>   "userId": 12345
#> }

# Serialize Example 2
serialize_json(log_entry_data_2, compiled_log_entry_structure_serialize, pretty = TRUE)
#> {
#>   "eventId": "evt-bbb-002",
#>   "eventType": "page_view",
#>   "eventDate": "20240816",
#>   "details": {
#>     "source": "mobile_app",
#>     "metadata": null
#>   },
#>   "userId": 98765
#> }

# Serialize Example 3
serialize_json(log_entry_data_3, compiled_log_entry_structure_serialize, pretty = TRUE)
#> {
#>   "eventId": "evt-ccc-003",
#>   "eventType": "item_click",
#>   "eventDate": "20240817",
#>   "details": {
#>     "source": "web_frontend",
#>     "metadata": {
#>       "key": null,
#>       "value": null
#>     }
#>   },
#>   "userId": 12345
#> }

Now, let’s see what happens if our R data doesn’t conform to the structure expected by the API schema.

# Error Case 1: Incorrect date type (character instead of Date)
bad_data_date_type <- log_entry_data_1
bad_data_date_type$eventDate <- "2024-08-15" # Should be as.Date("2024-08-15")
try(serialize_json(bad_data_date_type, compiled_log_entry_structure_serialize))
#> Error in serialize_json(bad_data_date_type, compiled_log_entry_structure_serialize) : 
#>   Failed to serialize structure: Serde("Type mismatch: expected a date") at character 0

# Error Case 2: Incorrect type for userId (numeric instead of integer)
bad_data_user_type <- log_entry_data_1
bad_data_user_type$userId <- 12345.0 # Should be 12345L
try(serialize_json(bad_data_user_type, compiled_log_entry_structure_serialize))
#> Error in serialize_json(bad_data_user_type, compiled_log_entry_structure_serialize) : 
#>   Failed to serialize structure: Serde("Type mismatch: expected an integer") at character 0

# Error Case 3: Missing a required field ('eventType') in the R list
bad_data_missing_field <- log_entry_data_1
bad_data_missing_field$eventType <- NULL # Remove the field
try(serialize_json(bad_data_missing_field, compiled_log_entry_structure_serialize))
#> Error in serialize_json(bad_data_missing_field, compiled_log_entry_structure_serialize) : 
#>   Failed to serialize structure: Serde("Found NA/null values in non-optional field of type a string") at character 0

# Error Case 4: Providing NA for a required, non-optional field ('source')
bad_data_na_field <- log_entry_data_1
bad_data_na_field$details$source <- NA_character_
try(serialize_json(bad_data_na_field, compiled_log_entry_structure_serialize))
#> Error in serialize_json(bad_data_na_field, compiled_log_entry_structure_serialize) : 
#>   Failed to serialize structure: Serde("Found NA/null values in non-optional field of type a string") at character 0

These examples show how serialize_json uses the structr schema to enforce type correctness and structure before generating the JSON. This is critical for reliable communication with APIs that demand specific formats, preventing malformed requests and runtime errors on the server side. The use of s_date with a specific format ensures dates are serialized exactly as required.