Originally, C4 was designed to emulate Cenit, which takes a more linear approach to data transformation:
a series of scripts, each transforming the data in turn. With C4, though, we wanted average users to be able to
put syncs together themselves, at the level of "you buy C4, buy the integration packs, and snap the parts together".
To re-use data from earlier in the pipeline, a strictly linear pipeline would either have required users to write
actual code or made it impossible altogether. So we switched to a more general pipeline layout, which lets us feed
the same value from a single stage of the pipeline to multiple other stages.
Scripts are simply C# classes with common methods for setting and retrieving parameters stored in arbitrary
string properties on the class. An `IScript` by itself provides no means of execution beyond the `Settle*Params`
series of methods, which are called each time something finishes setting parameters on that script.
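A minimal Python sketch of this parameter pattern, assuming the behaviour described here and in the lifecycle steps (`SettleAllParams` calls `SettleParams` on the script and on its `[SubParam]` properties). `Script`, `Auth` and `HttpPicker` are hypothetical names, `sub_params` stands in for the C# `[SubParam]` attribute, and settling sub-params before their parent is an assumed order, not something the text specifies.

```python
class Script:
    """Hypothetical Python analogue of an IScript; not C4's actual API."""

    # Names of attributes holding nested scripts (what `[SubParam]` marks in C#).
    sub_params: tuple[str, ...] = ()

    def __init__(self) -> None:
        self.params: dict[str, str] = {}  # parameters arrive as raw strings

    def set_param(self, name: str, value: str) -> None:
        self.params[name] = value

    def get_param(self, name: str, default: str = "") -> str:
        return self.params.get(name, default)

    def settle_params(self) -> None:
        """Override this (SettleParams) to parse/validate the raw strings."""

    def settle_all_params(self) -> None:
        """Call this (SettleAllParams): settles sub-param scripts, then this script."""
        for attr in self.sub_params:
            getattr(self, attr).settle_all_params()
        self.settle_params()


class Auth(Script):
    def settle_params(self) -> None:
        self.token = self.get_param("token")


class HttpPicker(Script):
    sub_params = ("auth",)

    def __init__(self) -> None:
        super().__init__()
        self.auth = Auth()

    def settle_params(self) -> None:
        # Raw string parameters become typed fields only once settled.
        self.page_size = int(self.get_param("page_size", "50"))
```

Callers only ever invoke `settle_all_params()`, and subclasses only override `settle_params()`, which keeps the recursion into sub-params in one place.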
Pipelines are directed acyclic graphs. Each node is a C# script with zero or more input and output ports that
accept or emit data; edges point from output ports to input ports, and from control (ctrl) ports to nodes.
Pipelines themselves can have in/out/ctrl ports added to them (via their `Pipeline` object in R0).
Additionally, every pipeline has a reserved ctrl port named `Ctrl`, which indicates its current state of execution.
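One way to picture this structure, as a hypothetical Python sketch rather than C4's `Pipeline` type: linkages are stored as pairs of addresses using the `node:port` / `$` conventions, the pipeline's own in/out/ctrl ports are plain sets, and the reserved `Ctrl` port is always present. `parse_port`, `link` and `link_ctrl` are illustrative names; only node names and the pipeline's own ports are validated, and node port names are taken on trust.

```python
from dataclasses import dataclass, field

PIPELINE = "$"  # `$` addresses the pipeline itself as if it were a node


def parse_port(address: str) -> tuple[str, str]:
    """Split a `node:port` address into (node, port)."""
    node, sep, port = address.partition(":")
    if not (sep and node and port):
        raise ValueError(f"expected 'node:port', got {address!r}")
    return node, port


@dataclass
class Pipeline:
    nodes: set[str] = field(default_factory=set)
    # The pipeline's own ports; "Ctrl" is reserved and always present.
    in_ports: set[str] = field(default_factory=set)
    out_ports: set[str] = field(default_factory=set)
    ctrl_ports: set[str] = field(default_factory=lambda: {"Ctrl"})
    data_edges: list[tuple[str, str]] = field(default_factory=list)  # out port -> in port
    ctrl_edges: list[tuple[str, str]] = field(default_factory=list)  # ctrl port -> node

    def _check(self, address: str) -> None:
        node, port = parse_port(address)
        if node == PIPELINE:
            if port not in self.in_ports | self.out_ports | self.ctrl_ports:
                raise KeyError(f"pipeline has no port {port!r}")
        elif node not in self.nodes:
            raise KeyError(f"unknown node {node!r}")

    def link(self, out_port: str, in_port: str) -> None:
        """Data edge from an output port to an input port."""
        self._check(out_port)
        self._check(in_port)
        self.data_edges.append((out_port, in_port))

    def link_ctrl(self, ctrl_port: str, node: str) -> None:
        """Control edge from a ctrl port to a whole node."""
        self._check(ctrl_port)
        if node not in self.nodes:
            raise KeyError(f"unknown node {node!r}")
        self.ctrl_edges.append((ctrl_port, node))
```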
For any purpose relating to identifying nodes and/or ports in the pipeline, we use the following conventions:
- `node:port` identifies a specific port on a node
- `$` identifies the entire pipeline as a node (so `$:Ctrl` is the pipeline's reserved `Ctrl` port)

Note: Again, this assumes an acyclic graph. The `ScriptResolver` will handle checking for cycles before execution.
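The text doesn't say how `ScriptResolver` detects cycles. Kahn's algorithm is one standard way, sketched here in Python with a hypothetical `topological_order` function over node-level edges (each data or ctrl linkage collapsed to a source node and a destination node): repeatedly take nodes with no remaining inbound edges, and if some nodes never get there, the graph has a cycle.

```python
from collections import deque


def topological_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm: return an execution order, or raise if there is a cycle."""
    indegree = {n: 0 for n in nodes}
    children: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)  # source nodes
    order: list[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for child in children[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        # Whatever is left sits on a cycle or downstream of one.
        stuck = sorted(n for n in nodes if indegree[n] > 0)
        raise ValueError(f"pipeline has a cycle involving {stuck}")
    return order
```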
let toPump be all nodes without any attached input ports
if toPump is empty, write a HaltFlow to `$:Ctrl` and return.
for each node in toPump:
    if not all of the node's inbound linkages have been written to, skip it for now and continue to the next node.
    remove node from toPump
    pump the node
    check all ctrl ports:
        if any port returned a HaltFlow, write it to `$:Ctrl` and return
        if any port returned a SkipCycle, write it to `$:Ctrl` and return
            // Note that returning here does not mean the pipeline is over, merely that it's up to
            // whoever called our `Pump()` to keep calling it until we halt.
        if any port returned a SkipPath, blacklist any subtree connected to that port from execution
        each port that returns a ContinueFlow adds the immediately connected nodes to toPump
    for each of our out ports that has outbound linkages:
        read from the port once (important) and write that value to every connected input port
        add each of those input ports' nodes to toPump
repeat
Note: The above algorithm specifies that a node may not run until every node connected to one of its
inbound ports has finished executing.
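The current algorithm can be sketched in Python as below. This is a hedged illustration, not C4's implementation: `Flow`, `Pipeline`, `pump` and `_downstream` are hypothetical names, each node's pump is modeled as a callable returning `(data outputs, ctrl flows)`, and the returned `Flow` stands in for the value written to `$:Ctrl`. It also assumes that ctrl linkages count as inbound linkages, that "repeat" means looping until no runnable node is left, and that nodes stuck behind a SkipPath simply end the cycle.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class Flow(Enum):
    CONTINUE = auto()    # ContinueFlow
    SKIP_PATH = auto()   # SkipPath
    SKIP_CYCLE = auto()  # SkipCycle
    HALT = auto()        # HaltFlow


# Hypothetical node model: {in_port: value} -> ({out_port: value}, {ctrl_port: Flow}).
PumpFn = Callable[[dict], tuple[dict, dict]]


@dataclass
class Pipeline:
    nodes: dict[str, PumpFn]
    data: list[tuple[str, str, str, str]]  # (src node, out port, dst node, in port)
    ctrl: list[tuple[str, str, str]]       # (src node, ctrl port, dst node)


def _downstream(p: Pipeline, start: str) -> set[str]:
    """`start` plus every node reachable from it through data or ctrl linkages."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack += [d for s, _, d, _ in p.data if s == n]
            stack += [d for s, _, d in p.ctrl if s == n]
    return seen


def pump(p: Pipeline) -> Flow:
    """Run one cycle; the return value stands in for what is written to `$:Ctrl`."""
    # Every linkage ending at a node must be written before that node may run.
    waiting = {n: set() for n in p.nodes}
    for i, (_, _, dst, _) in enumerate(p.data):
        waiting[dst].add(("data", i))
    for i, (_, _, dst) in enumerate(p.ctrl):
        waiting[dst].add(("ctrl", i))
    inputs: dict[str, dict] = {n: {} for n in p.nodes}
    to_pump = [n for n in p.nodes if not waiting[n]]
    if not to_pump:
        return Flow.HALT
    blacklist: set[str] = set()
    while True:
        ready = [n for n in to_pump if not waiting[n] and n not in blacklist]
        if not ready:
            return Flow.CONTINUE  # drained, or only skipped/blocked nodes remain
        node = ready[0]
        to_pump.remove(node)
        outs, flows = p.nodes[node](inputs[node])  # pumped, and read, exactly once
        if Flow.HALT in flows.values():
            return Flow.HALT
        if Flow.SKIP_CYCLE in flows.values():
            return Flow.SKIP_CYCLE
        for i, (src, port, dst) in enumerate(p.ctrl):
            if src != node:
                continue
            if flows.get(port, Flow.CONTINUE) is Flow.SKIP_PATH:
                blacklist |= _downstream(p, dst)
            else:  # ContinueFlow (the default) releases the connected node
                waiting[dst].discard(("ctrl", i))
                if dst not in to_pump:
                    to_pump.append(dst)
        for i, (src, port, dst, in_port) in enumerate(p.data):
            if src == node:  # fan one read-out to every connected input port
                inputs[dst][in_port] = outs[port]
                waiting[dst].discard(("data", i))
                if dst not in to_pump:
                    to_pump.append(dst)
```

Because each node is pumped once and its outputs are fanned out from that single result, the sketch honours the "read once" rule even when one out port feeds several input ports.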
Example use cases for each Flow type:
- HaltFlow: an unrecoverable error, or a picker runs out of items
- SkipPath: a picker runs out of items, or we want to implement an if statement in the graph
- SkipCycle: we found some data we don't want to finish syncing (acts like C#'s `continue` keyword)
- ContinueFlow: the default if we do nothing else

The above is the current flow of execution. There is, however, a planned rewrite of it.
In the rewritten form, we change the format of the data itself.

Note: I'll use `#foo` to indicate what most languages would call an atom or symbol. You can treat it like a
string constant.

- Data is passed as `(Code, Data)` tuples.
- Linkages specify their acceptable `Code` values, as well as whether they're required; the defaults are `{#Ok}` and `true`.
- `Ctrl`: `N:M` writes per cycle, where `N` and `M` are exposed via reflection.

And we execute thusly:
let toPump be all nodes without any attached input ports
if toPump is empty, write `(#Err, null)` to `$:Ctrl` and return.
for each node in toPump:
    if not all of the node's inbound linkages have been written to, skip it for now and continue to the next node.
    remove node from toPump
    pump the node
    for each of our out ports:
        let `(code, data)` be the port's output
        for each outbound linkage from that port:
            if `code` is acceptable to the linkage, write `data` to the destination port and add that node to toPump
            else:
                if the linkage is required, write `(#Err, (code, data))` to `$:Ctrl`
                if the linkage isn't required, do nothing
repeat
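A matching sketch of the proposed algorithm, again in Python with hypothetical names (`Linkage`, `pump`): atoms are plain strings, each linkage carries its acceptable codes and required flag with the `{#Ok}` / `true` defaults, and a node's pump returns a `(code, data)` pair per out port. The steps don't say whether writing an error to `$:Ctrl` stops the cycle, so this sketch assumes it doesn't and simply collects those writes into a list.

```python
from dataclasses import dataclass
from typing import Any, Callable

OK, ERR = "#Ok", "#Err"  # atoms, treated as string constants


@dataclass
class Linkage:
    src: str
    out_port: str
    dst: str
    in_port: str
    accepts: frozenset = frozenset({OK})  # acceptable codes; default {#Ok}
    required: bool = True                 # default: required


# Hypothetical node model: {in_port: data} -> {out_port: (code, data)}.
PumpFn = Callable[[dict], dict]


def pump(nodes: dict[str, PumpFn], links: list[Linkage]) -> list[tuple[str, Any]]:
    """Run one cycle of the proposed algorithm; return every value written to `$:Ctrl`."""
    ctrl_writes: list[tuple[str, Any]] = []
    waiting = {n: {i for i, lk in enumerate(links) if lk.dst == n} for n in nodes}
    inputs: dict[str, dict] = {n: {} for n in nodes}
    to_pump = [n for n in nodes if not waiting[n]]
    if not to_pump:
        return [(ERR, None)]
    while True:
        ready = [n for n in to_pump if not waiting[n]]
        if not ready:
            return ctrl_writes  # drained, or the rest sit behind rejected linkages
        node = ready[0]
        to_pump.remove(node)
        outputs = nodes[node](inputs[node])  # each port's (code, data) is read once
        for i, link in enumerate(links):
            if link.src != node:
                continue
            code, data = outputs[link.out_port]
            if code in link.accepts:
                inputs[link.dst][link.in_port] = data
                waiting[link.dst].discard(i)
                if link.dst not in to_pump:
                    to_pump.append(link.dst)
            elif link.required:
                ctrl_writes.append((ERR, (code, data)))
            # A rejected optional linkage does nothing, so its destination never runs.
```

Routing `#Err` to a handler then becomes pure linkage configuration, e.g. `Linkage("pick", "item", "errors", "e", accepts=frozenset({ERR}), required=False)`, which is the kind of choice a UI could expose.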
Thus, it will eventually be up to the user, in the UI, to decide how to handle certain errors, which greatly
reduces complexity in the code itself.
For example, they could express that a picker:
- `#Ok`
- `#Err`

Note: This flow ignores the fact that pipelines are cached. It is simply for tracing the flow
of user (your) code.
Note: This ignores the scheduler. The scheduler is in fact simply another pipeline that picks schedules from R0
and uses a custom node to resolve and execute them.
run all pickers in order of definition
    // This takes care of essentially making a local copy of the remote systems in R0
for each pipeline:
    apply any stored (immediate + saved/sensitive) parameters to it and its nodes
    call `SettleAllParams`, which calls `SettleParams` on itself and any `[SubParam]` properties
        // As a general rule: always _call_ `SettleAllParams`, and only _override_ `SettleParams`
    call `OnPipelineStart`
        // This is where pickers are expected to create their inner lists and/or kick off any `IAsyncEnumerable` work.
    execute the pipeline until one of the pipeline's ctrl ports outputs `HaltFlow`
        // See [Flow of Execution] for more details
    call `OnPipelineEnd`
run all pushers in order
    // This takes care of making sure any changes we made in R0 are reflected in the remote systems
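The lifecycle above, sketched in Python with hypothetical names: `PipelineLike` and `run_sync` are illustrative, the methods are snake_case stand-ins for `SettleAllParams`, `OnPipelineStart` and `OnPipelineEnd`, pickers and pushers are modeled as plain callables, and ctrl values as the strings `"HaltFlow"`, `"SkipCycle"` and so on. Caching and the scheduler are ignored, as in the steps themselves.

```python
from typing import Callable, Protocol


class PipelineLike(Protocol):
    """Hypothetical interface; method names mirror the C# ones in snake_case."""

    def apply_params(self, params: dict[str, str]) -> None: ...
    def settle_all_params(self) -> None: ...
    def on_pipeline_start(self) -> None: ...
    def pump(self) -> str: ...  # what was written to `$:Ctrl` this cycle
    def on_pipeline_end(self) -> None: ...


def run_sync(
    pickers: list[Callable[[], None]],
    pipelines: list[tuple[str, PipelineLike]],
    pushers: list[Callable[[], None]],
    stored_params: dict[str, dict[str, str]],
) -> None:
    for pick in pickers:  # in order of definition: copy remote systems into R0
        pick()
    for name, pipeline in pipelines:
        pipeline.apply_params(stored_params.get(name, {}))
        pipeline.settle_all_params()  # always call SettleAllParams, never SettleParams directly
        pipeline.on_pipeline_start()
        # Anything other than HaltFlow (e.g. SkipCycle) means "call pump again".
        while pipeline.pump() != "HaltFlow":
            pass
        pipeline.on_pipeline_end()
    for push in pushers:  # reflect R0 changes back to the remote systems
        push()
```

As written, `on_pipeline_end` only runs if pumping finishes normally; whether C4 guarantees it (the equivalent of `try`/`finally`) isn't stated in the steps.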