Actor output schema file specification 1.0 [work in progress]
This JSON file defines the schema of the Output object produced by a web Actor. The file is referenced from the main Actor file using the output property, and it is typically stored in .actor/output_schema.json.
The system uses the output schema to generate the output JSON object, whose fields correspond to the schema's properties and whose values are URLs to the dataset results, key-value store files, or the live view web server. The system needs to generate this output object right when the Actor starts, and it must remain static over the entire lifecycle of the Actor; only the linked content changes over time. This is necessary to enable integration of results with other systems: you don't need to run an Actor to see the format of its results, because it is predefined by the output schema.
The output schema is also used by the system to generate the user interface, API examples, integrations, etc.
The file format is a JSON Schema with our extensions.
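The generation of the output object described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the platform's implementation: the API_BASE host and the URL paths are placeholders, the run field names (defaultDatasetId, defaultKeyValueStoreId, containerUrl) are assumed, and a "$input." link is assumed to hold a dataset ID.

```python
from urllib.parse import urlencode

# Placeholder host and paths: the real URL formats are not defined by this spec.
API_BASE = "https://api.example.com/v2"


def resolve_link(prop: dict, run: dict, run_input: dict) -> str:
    """Return the URL for one output property, based on its "link" value."""
    link = prop["link"]
    if link == "$actor.dataset":
        return f"{API_BASE}/datasets/{run['defaultDatasetId']}/items"
    if link == "$actor.keyValueStore":
        url = f"{API_BASE}/key-value-stores/{run['defaultKeyValueStoreId']}/keys"
        prefixes = prop.get("keyPrefixes", [])
        # Assumption: only the first prefix is passed as a "prefix" query parameter.
        return f"{url}?{urlencode({'prefix': prefixes[0]})}" if prefixes else url
    if link == "$actor.webServer":
        # Assumption: the run object exposes the web server URL as "containerUrl".
        return run["containerUrl"] + prop.get("viewPath", "")
    if link.startswith("$input."):
        # Assumption: the referenced input field holds a dataset ID.
        dataset_id = run_input[link[len("$input."):]]
        return f"{API_BASE}/datasets/{dataset_id}/items"
    raise ValueError(f"Unsupported link: {link}")


def build_output(schema: dict, run: dict, run_input: dict) -> dict:
    """Build the static output object: one URL per property of the schema."""
    return {
        name: resolve_link(prop, run, run_input)
        for name, prop in schema["properties"].items()
    }
```

Because every URL depends only on IDs known at start time, the object can be computed once when the run is created and never changes afterwards; only the content behind the URLs does.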
Structure
{
  "actorOutputSchemaVersion": 1,
  "title": "Some title", // optional
  "description": "Text that is shown in the Output UI", // optional
  "type": "object",
  "properties": {
    // Properties in output reference Actor run's default storages,
    // or live view / Standby mode server.
    "currentProducts": {
      // We extend JSON Schema "type" with the new ones, all prefixed with '$':
      "type": "$dataset",
      // or specific storage JSON schema file
      "type": "./dataset_schema.json",
      // Specify where the value will eventually be produced
      "link": "$actor.dataset",
      // you can also reference a property from input object,
      // the linkage will be checked for type compatibility
      "link": "$input.myProductsDatasetId",
      // Select views how to render the output, using "views" defined by the Dataset schema file
      "views": ["productVariants"]
    },
    // Selects a specific group of records with a certain prefix. In UI, this can be shown
    // as a list of images. In the output object, this will be a link to an API with "prefix" param.
    "productImages": {
      "type": "$keyValueStore",
      // or specific storage JSON schema file
      "type": "./key_value_store_schema.json",
      "link": "$actor.keyValueStore",
      "title": "Product images",
      "description": "Yaddada", // optional
      // optionally, you can specify which files to display in UI for key-value stores
      "keyPrefixes": ["images-"],
      "collection": "screenshots"
    },
    // Live view web server for the Actor, or Standby mode fixed URL
    // In the "output" view, this page is rendered in an IFRAME
    "productExplorer": {
      // a generic web server
      "type": "$defaultWebServer",
      // or reference an OpenAPI schema of the web server
      "type": "./web_server_openapi.json",
      "link": "$actor.webServer",
      "title": "API server",
      "description": "API documentation is available in swagger.com/api/xxxx", // optional
      // specify a path to open?
      "viewPath": "/nice-report?query=123"
    }
  }
}
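The annotated structure above lists alternative values for "type" and "link" side by side; an actual file would pick one of each, since JSON allows neither comments nor meaningful duplicate keys. A basic structural check of such a file could look like the sketch below. The function name and the exact rules are hypothetical, inferred only from the example above, and are not part of the specification.

```python
# Values taken from the example above; the spec may define more.
BUILTIN_TYPES = {"$dataset", "$keyValueStore", "$defaultWebServer"}
ACTOR_LINKS = {"$actor.dataset", "$actor.keyValueStore", "$actor.webServer"}


def validate_output_schema(schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if schema.get("actorOutputSchemaVersion") != 1:
        errors.append("actorOutputSchemaVersion must be 1")
    if schema.get("type") != "object":
        errors.append('top-level "type" must be "object"')
    properties = schema.get("properties")
    if not isinstance(properties, dict):
        errors.append('"properties" must be an object')
        return errors
    for name, prop in properties.items():
        prop_type = prop.get("type", "")
        # Assumption: a type is either a built-in '$' type or a path to a JSON file.
        if prop_type not in BUILTIN_TYPES and not prop_type.endswith(".json"):
            errors.append(f'{name}: unknown "type" {prop_type!r}')
        link = prop.get("link", "")
        # Assumption: a link is either a known $actor.* target or an $input.* reference.
        if link not in ACTOR_LINKS and not link.startswith("$input."):
            errors.append(f'{name}: unknown "link" {link!r}')
    return errors
```

The type-compatibility check between an "$input." link and the referenced input field, which the comments above mention, is left out here because it would also need the input schema.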
Random notes
The output schema can reference other datasets, key-value stores, or request queues, but only those that are referenced in the input, or the default ones. Hence there is no point in including the storage schema here again, as that is done elsewhere.
- NOTE: The output schema should enable developers to define schema for the default dataset and key-value store. But how? It should be declarative so that the system can check that e.g. the overridden default dataset has the right schema. But then, when it comes to kv-store, that’s not purely output object but INPUT, similarly for overridden dataset or request queue. Perhaps the cleanest way would be to set these directly in .actor/actor.json.
- The Run Sync API could have an option to automatically return (or redirect to?) a specific property (i.e. URL) of the output object. This would supersede the outputRecordKey=OUTPUT API param as well as the run-sync-get-dataset-items API endpoint. Maybe we could have one of the output properties as the main one, which would be used by default for this kind of API endpoint, and just return data to user.
- Same as we show Output in UI, we need to autogenerate the OUTPUT in API e.g. JSON format. There would be properties like in the output_schema.json file, with e.g. URL to dataset, log file, kv-store, live view etc. So it would be an auto-generated field “output” that we can add to JSON returned by the Run API endpoints (e.g. https://docs.apify.com/api/v2#/reference/actor-tasks/run-collection/run-task)
- Also see: https://github.com/apify/actor-specs/pull/5#discussion_r775641112
- The output field will be a property of the run object, generated from the Output schema
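The ideas in these notes (an auto-generated output field on the run object, and a "main" output property that a Run Sync call could return or redirect to) might be sketched as below. Both function names are hypothetical, and the fallback to the first property is only an assumption, since the notes leave open how the main property would be chosen.

```python
from typing import Optional


def attach_output(run: dict, output: dict) -> dict:
    """Return a copy of the run object with the generated "output" field added."""
    return {**run, "output": dict(output)}


def pick_main_output(run: dict, property_name: Optional[str] = None) -> str:
    """Return the URL that a Run Sync call could return or redirect to."""
    output = run["output"]
    if property_name is None:
        # Assumption: with no explicit choice, the first property is the main one.
        property_name = next(iter(output))
    return output[property_name]
```

A caller that asks for a specific property gets that URL; a caller that asks for nothing gets the default one, which is the role outputRecordKey=OUTPUT plays today.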
Examples of ideal Actor run UI
- For the majority of Actors, we want to see the dataset with new records being added in realtime
- For Google Spreadsheet Import, we want to first display Live View for the user to set up OAuth, and once this is set up, we want to display the log next time.
- For technical Actors, it might be a log
- For an HTML to PDF converter, it’s a single record from the key-value store
- For Monitoring, it’s the log during runtime and a single HTML record in an iframe at the end
- For an Actor that has failed, it might be the log
How to define Actor run UI
Simple version
For every Actor with an output schema, there will be a new tab called “Output” on the Actor run detail. This tab will be in the first position and displayed by default. The tab will show the following:
- Items from output schema with property visible: true will be rendered in the same order as they are in schema
- The live view will be displayed only when it has visible: true and when it’s active. Otherwise, we should show just a short message “This show is over”.
- If the dataset has more views then we should have some select or tabs to select the view
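The rules for the simple version could be expressed as a small planning function like the one below. It is only a sketch: the function name and the shape of the returned components are hypothetical, the live view is identified by its "$actor.webServer" link, and the "This show is over" message is interpreted as applying to a visible live view that is no longer active.

```python
def plan_output_tab(schema: dict, live_view_active: bool) -> list:
    """Return the components to render on the Output tab, in schema order."""
    components = []
    # Python dicts preserve insertion order, so this follows the schema order.
    for name, prop in schema["properties"].items():
        if not prop.get("visible", False):
            continue  # only items with visible: true are rendered
        if prop.get("link") == "$actor.webServer":
            if live_view_active:
                components.append({"name": name, "kind": "liveView"})
            else:
                components.append(
                    {"name": name, "kind": "message", "text": "This show is over"}
                )
            continue
        component = {"name": name, "kind": "storage"}
        views = prop.get("views", [])
        if len(views) > 1:
            # More than one view: offer a select or tabs to choose between them.
            component["viewSelector"] = list(views)
        components.append(component)
    return components
```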
Ideal most comprehensive state
- Default setup, i.e., what output components should be displayed at the default run tab
- Optionally, the setup for different states
- Be able to programmatically change this using the API from the Actor itself