Actor output schema file specification 1.0 [work in progress]
This JSON file defines the schema of the output object produced by a web Actor. The file is referenced from the main Actor file using the output property, and it is typically stored in .actor/output_schema.json.
The format is a JSON Schema with our extensions, describing a single object.
The output schema is used by the system to generate the output JSON object, whose fields correspond to the schema's properties and whose values are URLs linking to the actual Actor results: a dataset, files in a key-value store, or a live view web server. The system generates this output object right when the Actor starts, without executing any of the Actor's code, and the object remains static over the entire lifecycle of the Actor; only the linked content changes over time as the Actor produces results. This is necessary to enable integrating the results with other systems: you don't need to run an Actor to see the format of its results, because the format is predefined by the output schema.
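Because the output object depends only on the schema and on data known when the run is created, it can be computed by plain template substitution. The following is a minimal sketch of that idea, not the platform's implementation: the function names `build_output_object` and `resolve`, and the shape of the `context` dictionary (run metadata under `actorRun`, the run input under `actorInput`), are hypothetical.

```python
import re
from typing import Any

# {{path.to.value}} placeholders, in the spirit of webhook templates
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


def resolve(path: str, context: dict[str, Any]) -> str:
    """Look up a dotted path such as 'actorRun.defaultDatasetUrl' in the context."""
    value: Any = context
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"Unknown template variable: {path}")
        value = value[key]
    return str(value)


def build_output_object(schema: dict[str, Any], context: dict[str, Any]) -> dict[str, str]:
    """Expand each property's template using only run metadata and input, never Actor code."""
    return {
        name: PLACEHOLDER.sub(lambda m: resolve(m.group(1), context), prop["template"])
        for name, prop in schema["properties"].items()
    }


# Hypothetical run metadata, available as soon as the run is created
context = {
    "actorRun": {"defaultDatasetUrl": "https://api.apify.com/v2/datasets/XYZabc/items"},
    "actorInput": {},
}
schema = {
    "properties": {
        "currentProductsDatasetUrl": {
            "type": "string",
            "template": "{{actorRun.defaultDatasetUrl}}?format=json&view=product_details",
        }
    }
}
output = build_output_object(schema, context)
print(output["currentProductsDatasetUrl"])
```

Failing loudly on an unknown variable (rather than leaving the placeholder in place) is a design choice of this sketch, so that a typo in a template is caught before any URL is published.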
The output schema is also used by the system to generate the user interface, API examples, integrations, etc.
Structure
{
"actorOutputSchemaVersion": 1,
"title": "Some title",
"description": "This text is shown in the Output UI",
"type": "object",
"properties": {
// This property in output object will contain a URL to the dataset containing Actor results,
// for example: https://api.apify.com/v2/datasets/XYZabc/items?format=json&view=product_details
"currentProductsDatasetUrl": {
// Type is string, because the value in output object is a URL
"type": "string",
"title": "Current products",
"description": "Products scraped during this run",
// Identifies what kind of object is referenced by this output property (same syntax as "resourceType" in input schema).
// If used, the system will interpret the linked resource accordingly and render the dataset in the UI in a special way.
"resourceType": "dataset",
// Defines how the output value is created, using a text format where {{x}} denotes a variable (same syntax as webhook templates)
"template": "{{actorRun.defaultDatasetUrl}}?format=json&view=product_details"
// Or reference a property from the input object; the linkage will be checked for type compatibility
// "template": "{{actorInput.myProductsDatasetId}}"
},
// Selects a specific group of records with a certain prefix. In the UI, this can be shown
// as a list of images. In the output object, this will be a link to an API with a "prefix" param.
"productImagesUrl": {
"type": "string",
"title": "Product screenshots",
"resourceType": "keyValueStore",
// Define how the URL is created, in this case it will link to the default Actor key-value store
"template": "{{actorRun.defaultKeyValueStoreUrl}}?collection=screenshots"
},
// Example of a reference to a file stored in the Actor's default key-value store.
// In the UI, this can be rendered as a file download.
"mainScreenshotFileUrl": {
"type": "string",
"title": "Main screenshot",
"description": "URL to an image with main product screenshot.",
"template": "{{actorRun.defaultKeyValueStoreUrl}}/screenshot.png"
},
// Live view web server of the Actor
// In the "output" view, this page is rendered in an IFRAME
"productExplorerWebUrl": {
"type": "string",
"resourceType": "webServer",
"title": "Live product explorer app",
"description": "API documentation is available in swagger.com/api/xxxx", // optional
// TODO: ideally this should be named {{actorRun.webServerUrl}} for consistency, but we'd need to change ActorRun everywhere
"template": "{{actorRun.containerUrl}}/product-explorer/"
}
}
}
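The comment on "template" says that a reference such as {{actorInput.myProductsDatasetId}} will be checked for type compatibility. A minimal sketch of such a check follows, assuming that "compatible" means the output property and the referenced input property declare the same "resourceType"; the function name `check_input_links` is hypothetical and the real rule may be richer.

```python
import re
from typing import Any

# Matches {{actorInput.someProperty}} references inside a template
INPUT_REF = re.compile(r"\{\{\s*actorInput\.([A-Za-z0-9_]+)\s*\}\}")


def check_input_links(output_schema: dict[str, Any], input_schema: dict[str, Any]) -> list[str]:
    """Return one error message per output property that links to a missing or
    incompatible input property (assumption: compatible == same resourceType)."""
    errors: list[str] = []
    input_props = input_schema.get("properties", {})
    for name, prop in output_schema.get("properties", {}).items():
        for ref in INPUT_REF.findall(prop.get("template", "")):
            target = input_props.get(ref)
            if target is None:
                errors.append(f"{name}: input property '{ref}' does not exist")
            elif target.get("resourceType") != prop.get("resourceType"):
                errors.append(
                    f"{name}: resourceType {prop.get('resourceType')!r} does not match "
                    f"input property '{ref}' ({target.get('resourceType')!r})"
                )
    return errors


output_schema = {
    "properties": {
        "currentProductsDatasetUrl": {
            "type": "string",
            "resourceType": "dataset",
            "template": "{{actorInput.myProductsDatasetId}}",
        }
    }
}
good_input = {"properties": {"myProductsDatasetId": {"type": "string", "resourceType": "dataset"}}}
bad_input = {"properties": {"myProductsDatasetId": {"type": "string", "resourceType": "keyValueStore"}}}
print(check_input_links(output_schema, good_input))
print(check_input_links(output_schema, bad_input))
```

Returning a list of messages instead of raising on the first problem lets a build or publish step report every broken link in one pass.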
Random notes
The output schema can reference other datasets, key-value stores, or request queues, but only those that are referenced in the input, or the default ones. Hence there's no point in including the storage schemas here again, as that's done elsewhere.
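One way to enforce the rule above is to require that every template starts with a variable from the run (default storages, live view) or from the input (storages passed in by the user), so that a hard-coded URL of some unrelated storage is rejected. This is a minimal sketch under that interpretation; the function `storage_source` and the labels "run" and "input" are hypothetical.

```python
import re

# First {{root.path}} variable of a template, e.g. {{actorRun.defaultDatasetUrl}}
VARIABLE = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\.[A-Za-z0-9_.]+\s*\}\}")
ALLOWED_ROOTS = {"actorRun", "actorInput"}


def storage_source(template: str) -> str:
    """Return 'run' for the run's own defaults, 'input' for storages passed in the input.

    Raises ValueError when the template does not start with an allowed variable,
    for example when it hard-codes the URL of an unrelated storage.
    """
    match = VARIABLE.match(template.strip())
    if match is None or match.group(1) not in ALLOWED_ROOTS:
        raise ValueError(f"template must start with an actorRun or actorInput variable: {template!r}")
    return "run" if match.group(1) == "actorRun" else "input"


print(storage_source("{{actorRun.defaultDatasetUrl}}?format=json&view=product_details"))
print(storage_source("{{actorInput.myProductsDatasetId}}"))
```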
- NOTE: The output schema should enable developers to define the schema for the default dataset and key-value store. But how? It should be declarative, so that the system can check that, e.g., an overridden default dataset has the right schema. But when it comes to the key-value store, that's not purely the output object but also INPUT, and similarly for an overridden dataset or request queue. Perhaps the cleanest way would be to set these directly in .actor/actor.json.
- The Run Sync API could have an option to automatically return (or redirect to?) a specific property (i.e. URL) of the output object. This would supersede the outputRecordKey=OUTPUT API param as well as the run-sync-get-dataset-items API endpoint. Maybe we could designate one of the output properties as the main one, which would be used by default for this kind of API endpoint to simply return the data to the user.
- Just as we show Output in the UI, we need to autogenerate the OUTPUT in the API, e.g. in JSON format. It would have properties like those in the output_schema.json file, e.g. URLs to the dataset, log file, key-value store, live view, etc. So it would be an auto-generated field “output” that we can add to the JSON returned by the Run API endpoints (e.g. https://docs.apify.com/api/v2#/reference/actor-tasks/run-collection/run-task)
- Also see: https://github.com/apify/actor-specs/pull/5#discussion_r775641112
- output will be a property of the run object, generated from the output schema
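The two API ideas in these notes (an auto-generated "output" field on the run object, and a Run Sync option that returns one output property) can be sketched as follows. Everything here is hypothetical: `attach_output`, `run_sync_target`, the per-call `requested` option, the "isMain" flag on a schema property, and the fallback to the first property in schema order are assumptions for illustration only.

```python
from typing import Any, Optional


def attach_output(run: dict[str, Any], output: dict[str, str]) -> dict[str, Any]:
    """Return a copy of a run object with the auto-generated "output" field added."""
    return {**run, "output": dict(output)}


def run_sync_target(output_schema: dict[str, Any], output: dict[str, str],
                    requested: Optional[str] = None) -> str:
    """Choose the output URL that a Run Sync call would return or redirect to.

    `requested` models a hypothetical per-call API option. Without it, the first
    property flagged with the hypothetical "isMain": true wins; otherwise the
    first property in schema order is used.
    """
    if requested is not None:
        if requested not in output:
            raise KeyError(f"unknown output property: {requested}")
        return output[requested]
    props = output_schema["properties"]
    for name, prop in props.items():
        if prop.get("isMain"):
            return output[name]
    return output[next(iter(props))]


schema = {
    "properties": {
        "productImagesUrl": {"type": "string"},
        "currentProductsDatasetUrl": {"type": "string", "isMain": True},
    }
}
output = {
    "productImagesUrl": "https://example.com/images",
    "currentProductsDatasetUrl": "https://example.com/products",
}
run = {"id": "run123", "status": "RUNNING"}
print(attach_output(run, output))
print(run_sync_target(schema, output))
```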
Examples of ideal Actor run UI
- For the majority of Actors, we want to see the dataset with new records being added in realtime
- For Google Spreadsheet Import, we want to first display the Live View so the user can set up OAuth, and once this is set up, display the log on subsequent runs.
- For technical Actors, it might be a log
- For an HTML to PDF converter, it’s a single record from the key-value store
- For Monitoring, it’s the log during the runtime and a single HTML record in an iframe at the end
- For an Actor that has failed, it might be the log
How to define Actor run UI
Simple version
Every Actor with an output schema will get a new tab called “Output” on the Actor run detail page. This tab will be in the first position and displayed by default. The tab will show the following:
- Items from the output schema with the property visible: true will be rendered in the same order as they appear in the schema
- The live view will be displayed only when it has visible: true and when it’s active. Otherwise, we should show just a short message: “This show is over”.
- If the dataset has multiple views, we should have a select or tabs to choose the view
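The three rules above can be read as a small filtering step over the schema properties. The sketch below is one interpretation, not a UI specification: `OutputTabItem` and `output_tab` are hypothetical names, the dataset views are assumed to be supplied per property name, a property without "resourceType" is assumed to render as a file download, and an invisible live view is assumed to be hidden entirely while a visible but inactive one shows the message.

```python
from dataclasses import dataclass, field
from typing import Any

SHOW_IS_OVER = "This show is over"


@dataclass
class OutputTabItem:
    name: str
    kind: str            # resourceType, or "file" when none is given (assumption)
    message: str = ""    # shown instead of the content, e.g. for an inactive live view
    views: list[str] = field(default_factory=list)  # dataset views offered in a select/tabs


def output_tab(schema: dict[str, Any], live_view_active: bool,
               dataset_views: dict[str, list[str]]) -> list[OutputTabItem]:
    """Build the Output tab: visible items only, in schema order."""
    items: list[OutputTabItem] = []
    # Python dicts preserve insertion order, so items keep the order used in the schema
    for name, prop in schema["properties"].items():
        if prop.get("visible") is not True:
            continue
        kind = prop.get("resourceType", "file")
        item = OutputTabItem(name=name, kind=kind)
        if kind == "webServer" and not live_view_active:
            item.message = SHOW_IS_OVER
        if kind == "dataset" and len(dataset_views.get(name, [])) > 1:
            item.views = list(dataset_views[name])
        items.append(item)
    return items
```

A view selector is offered only when there is more than one view, since a single view needs no choice.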
Ideal, most comprehensive state
- Default setup, i.e., what output components should be displayed on the default run tab
- Optionally, the setup for different states
- Be able to programmatically change this via the API from within the Actor itself
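A per-state setup could be as simple as a default list of output property names plus overrides keyed by run status, with an Actor-provided override taking precedence. This sketch mirrors the Monitoring example (log while running, an HTML record at the end) and the failed-run example (log). The `UI_SETUP` shape, the property names, and `components_for` are hypothetical; RUNNING, SUCCEEDED, FAILED, and ABORTED are used as run status names.

```python
from typing import Any, Optional

# Hypothetical shape: default components plus optional per-status overrides.
# None of these keys are defined by the output schema specification.
UI_SETUP: dict[str, Any] = {
    "default": ["productsDatasetUrl"],
    "states": {
        "RUNNING": ["logUrl"],
        "SUCCEEDED": ["reportHtmlUrl"],
        "FAILED": ["logUrl"],
    },
}


def components_for(setup: dict[str, Any], status: str,
                   override: Optional[list[str]] = None) -> list[str]:
    """Choose which output components to show for a given run status.

    `override` stands in for a value the Actor itself set at runtime via the API.
    """
    if override is not None:
        return list(override)
    return list(setup.get("states", {}).get(status, setup["default"]))


print(components_for(UI_SETUP, "RUNNING"))
print(components_for(UI_SETUP, "ABORTED"))
```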