Sunday, December 7, 2014

Process Invocation

Process invocation is important. There can be a portable abstraction and OS-specific implementations that Igor (and in particular tasks) can turn to. In the following I will call it a PIU (Process Invocation Unit) and its ancillary job is to mediate data between processes.

The workbench project already requires a new generation of tools that need to be more prudent about where and how they store user-visible data in a multi-device, single-user, one-world setup. So why not use this opportunity to revisit process invocation?

Aspects of a Process

I find it very revealing to use the Six W questions to identify all the factors of process invocation.

What: The code that is going to be executed. You will sometimes encounter a hierarchy of subcommands that eventually leads to exactly one operation or application to execute. Each tier in this hierarchical grouping might accept individual configuration options. My impression is that, beyond sharing functionality, these groupings often pertain to a conceptual workflow and/or share the same application-private data.

Who: The data to operate on. This usually involves product-data and not application-private data, which the application intrinsically knows how to find.

How: The way in which the operation is performed; the configuration.

When: The time or the event that triggers the operation.

Where: The physical location, as well as the system and the WS-node, where the process will be launched. In the context of the green/red node distinction I have usually called this part the privacy level (public, drone, private). There is also the WB and module location. This is special in a way, as it indirectly specifies the data (who) to work with.

Why: Why not? This can be captured in comments and ultimately depends on your individual meaning of life.

Of these six categories, who and how are information that is of interest to the process itself, whereas the others have a propensity to be managed externally. You can, of course, imagine a process that reschedules itself or iterates over several locations, but for the predominant use case what/when/where/why would be handled by an external facility (or the user).

(Who is what and what is who? It is a little bit ambiguous, but the question of what code to execute seems to be more appropriate.)
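To make the split concrete, here is a minimal Python sketch of an invocation record. The class `Invocation` and its method `process_view` are hypothetical names for illustration only; the point is that who and how are handed to the process, while what/when/where/why stay with the external facility.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Invocation:
    """Hypothetical record of one process invocation, split along the six Ws."""
    what: List[str]                 # subcommand path, e.g. ["vcs", "update"]
    who: Dict[str, str]             # labeled input data (product data)
    how: Dict[str, str]             # effective configuration
    when: Optional[str] = None      # time or triggering event
    where: str = "private"          # privacy level: public, drone, private
    why: str = ""                   # free-form comment

    def process_view(self) -> Dict[str, Dict[str, str]]:
        """Only who and how are handed to the process itself."""
        return {"who": dict(self.who), "how": dict(self.how)}


inv = Invocation(
    what=["vcs", "update"],
    who={"repo": "module-a"},
    how={"verbose": "true"},
    when="on-commit",
    where="drone",
    why="keep the drone in sync",
)
print(inv.process_view())
```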

Language Requirements

Let's turn the focus to the places where process invocation happens. Apart from starting one particular application (for example through a desktop) there are command-line and scripting languages at the process invocation (PI) level. I think these two contexts have diametrically different requirements and need to be strictly kept separate.

In a script you aim for understandability, use explicit expressions, and are able to comprehensively structure the script. Most importantly, you write code once and read it many times.

In contrast, the command-line is supposed to allow you to formulate jobs in a very concise way. You will often try to combine a series of operations on one line, to be able to later refine it after accessing it in the history. In this sense you might read it back once or twice, but only within a very short time frame.

A concrete example: In a scripting language with text objects (which may be endpoints of streams) you can add a statement to write content out to a file on a separate line, whereas on the command-line you would want something like the redirection features that exist today. The same goes for pipes. In a modern PI-scripting language you would be able to easily capture an output stream in a text object and stream it to another process on the next line. Incidentally, having text-objects on the command-line and a way to inspect them in tabs might make the difference less stringent. You may also be able to further filter such a stream with methods on the text-object and inspect the result in real-time.
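The following sketch emulates such a text object with Python's `subprocess` module: capture a stream on one statement, filter it with a method, and feed it to another process on the next. The `TextObject` class and its methods are hypothetical; a real PI-language would provide this natively rather than through Python.

```python
import subprocess
import sys


class TextObject:
    """Hypothetical text object: captured output that can feed another process."""

    def __init__(self, text: str) -> None:
        self.text = text

    @classmethod
    def capture(cls, argv: list) -> "TextObject":
        # Run a process and keep its standard output as a text object.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return cls(result.stdout)

    def lines_matching(self, needle: str) -> "TextObject":
        # Filter with a method on the object instead of an extra filter process.
        kept = [line for line in self.text.splitlines(keepends=True) if needle in line]
        return TextObject("".join(kept))

    def to_process(self, argv: list) -> "TextObject":
        # Stream the content into another process on a separate statement.
        result = subprocess.run(argv, input=self.text, capture_output=True, text=True, check=True)
        return TextObject(result.stdout)


# Statement 1: capture; statement 2: filter; statement 3: stream onward.
listing = TextObject.capture([sys.executable, "-c", "print('alpha\\nbeta\\ndelta')"])
filtered = listing.lines_matching("ta")
upper = filtered.to_process(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)
print(upper.text)
```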

In addition, note that a scripting language at the PI-level also has quite different requirements from (what I frivolously call) an in-process scripting language (eg Python, Perl). A PI-language needs strong orchestration features.

Text-objects would take on a special role, as text is the common ground to be able to combine processes implemented in different languages or even incorporate remote execution. There are worldwide standardization efforts to represent textual data.

Process Input

A process needs to know how to operate on who. Let's look at each more closely.

You usually would not pass data as arguments, if only because of the unwanted parsing that might happen under some conditions. Therefore you use streams and files. Having a range of streams designated as input streams, with optional labels, would often avoid the need for potentially insecure temporary-file handling. From a programming perspective, processes would gain more purity (referential transparency). You would need fewer predefined locations, in particular for small utility tasks.

Determining the effective configuration (how) is quite tedious nowadays. In some cases, you have to parse a configuration file, check the environment, then parse the command line, and eventually merge the results, all just to obtain the relevant values for how the operation is supposed to transpire.

In the ideal case, an application provides a configuration specification, and a process obtains a structure of the applicable configuration through one call or as a special value similar to argv.

A command-line or script could streamline the arguments of a call to normalize away the syntax flavor, and pass it to Igor/PIU for translation (eg short to long keys). Subsequently the effective configuration would be produced and provided to the process.
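A minimal sketch of that normalization step, assuming a configuration specification that maps each long key to a short alias and a flag saying whether it takes a value. The `SPEC` layout and the `normalize` function are hypothetical; the result is a flat mapping of long keys to string values, regardless of the syntax flavor used on the command line.

```python
from typing import Dict, List

# Hypothetical configuration specification: long key -> short alias, takes a value?
SPEC = {
    "verbose": {"short": "v", "takes_value": False},
    "output": {"short": "o", "takes_value": True},
    "jobs": {"short": "j", "takes_value": True},
}
SHORT_TO_LONG = {entry["short"]: key for key, entry in SPEC.items()}


def normalize(argv: List[str]) -> Dict[str, str]:
    """Translate '-v', '--output=x', '-j 4' style arguments into one long-key mapping."""
    config: Dict[str, str] = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--"):
            key, sep, value = arg[2:].partition("=")
        elif arg.startswith("-"):
            # Short key: translate to the long key from the specification.
            key, sep, value = SHORT_TO_LONG[arg[1:]], "", ""
        else:
            raise ValueError(f"unexpected positional argument: {arg}")
        if key not in SPEC:
            raise KeyError(f"not in specification: {key}")
        if SPEC[key]["takes_value"] and not sep:
            i += 1                      # value is the next argument
            value = argv[i]
        elif not SPEC[key]["takes_value"]:
            value = "true"              # flags become the string "true"
        config[key] = value
        i += 1
    return config


print(normalize(["-v", "--output=report.txt", "-j", "4"]))
```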

The how parts of a command-line processed in such a way can be seen as ad-hoc OTC (One Time Configuration), whereas OTC prepared in the global configuration and activated per call would be recorded OTC.

In terms of performance, note that this unified approach includes what would nowadays be a global configuration (eg in form of a file). If you are worried about the time this might cost (despite optimized data structures), you can of course work exclusively with unprocessed ad-hoc OTC, which would replicate a simple array of strings as you know it from argv.

Process Output

Inside a process you would be able to obtain temporary streams to manage large amounts of data, and later designate them as output streams. Those can again be labeled and there can be multiple. In essence, this would allow you to create pipes of multiple streams (multi-pipes). If the PIU recognizes a legacy script (or a dedicated single-stream application) it may even be capable of splitting and joining multiple streams for that stage. Text-objects with corresponding methods might reduce the demand for such filter applications, though.

Then there is the opportunity to create structured output and a consumer would be able to specify which part of the output it is interested in.

Imagine an output stream with the special type "structured". This isn't actually a stream but a key-value string mapping (possibly with nesting and lists) stored inside the PIU. A process that is capable of accepting structured input would access that record directly. Otherwise, that mapping may have a string representation which can be streamed to standard output and the user.

In a way, this also makes the differentiation between a script consumer and a user consumer explicit. Depending on which stream is being read, the structured data output stream can be transformed into a pretty-printed standard output text automatically, or the process might create a custom-made output for the user.

A command-line language could have special syntax for output picking, so that only the selected values are presented to the user. Right now, this is sometimes accomplished by adding command-line options that influence the output composition. In conclusion, output picking is a kind of how.
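Here is a small sketch of structured output with picking. Dotted paths as picking syntax and JSON as the fallback string representation are assumptions made for illustration; `pick` and `render_for_user` are hypothetical names.

```python
import json
from typing import Any, Dict, List


def pick(record: Dict[str, Any], paths: List[str]) -> Dict[str, Any]:
    """Return only the values at the given dotted paths (output picking)."""
    picked: Dict[str, Any] = {}
    for path in paths:
        node: Any = record
        for part in path.split("."):
            node = node[part]           # descend through nested mappings
        picked[path] = node
    return picked


def render_for_user(record: Dict[str, Any]) -> str:
    """Fallback string representation for a consumer that cannot read structure."""
    return json.dumps(record, indent=2, sort_keys=True)


status = {
    "repo": {"name": "module-a", "branch": "main"},
    "changes": ["README", "src/main.c"],
    "ahead": "2",
}
print(pick(status, ["repo.branch", "ahead"]))
print(render_for_user(pick(status, ["changes"])))
```

A script would read the structured record directly, while the user would only ever see the rendered text.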

In what way would these multiple output streams be presented in a terminal?

In a GUI-terminal, the actual terminal display could be reduced to user-IO and thereby unclutter the user's interactions. Data-streams could be displayed in tabs above that terminal, with the option of additional filters applied. The user would be able to create a text-object that references such a stream and thereby keep it alive. Of course, there could also be a history of output data-streams per prompt.

In a pure text terminal things would not be that convenient. Everything can certainly be redirected to standard output with a result like today's shared output. Maybe a stream selection mechanism, a "screen"-like setup, a multiple-window terminal, or configurable separators can help.

PI Language Pluralism

As mentioned above, PI scripting and the command line have different requirements. But what is even more interesting is the fact that there can actually be a whole range of languages, like they exist at a lower level.

In my opinion, the most striking omission from what's available for process orchestration scripting (at least for Unix shells) are text-objects and exceptions. I personally think that the higher you get on the language ladder, the more useful exceptions become.

If you think of text-objects as stream-ends, this will open up a whole new (and I think easier and more reliable) way to establish concurrent operations through process orchestration. Together with the option of multi-pipes, it would be possible to assemble a range of such text-objects, wait on them, and stream all to another process at once.
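A rough emulation in Python: several processes run concurrently, each producing one text object; the script waits on all of them and then streams everything to another process at once. Since Python has no multi-pipes, the labeled sections in `feed` are a hypothetical stand-in for labeled streams; `capture`, `gather`, and `feed` are illustrative names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def capture(argv: List[str]) -> str:
    # One "text object": the complete standard output of a process.
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def gather(jobs: Dict[str, List[str]]) -> Dict[str, str]:
    """Start all processes concurrently and wait until every text object is complete."""
    with ThreadPoolExecutor() as pool:
        futures = {label: pool.submit(capture, argv) for label, argv in jobs.items()}
        return {label: future.result() for label, future in futures.items()}


def feed(streams: Dict[str, str], argv: List[str]) -> str:
    # Without native multi-pipes, concatenate labeled sections into one stdin stream.
    payload = "".join(f"[{label}]\n{text}" for label, text in sorted(streams.items()))
    return subprocess.run(argv, input=payload, capture_output=True, text=True, check=True).stdout


py = sys.executable
texts = gather({
    "left": [py, "-c", "print('one')"],
    "right": [py, "-c", "print('two')"],
})
# The consumer counts the lines it received: two labels plus two payload lines.
print(feed(texts, [py, "-c", "import sys; print(len(sys.stdin.read().splitlines()))"]))
```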

Mappings at the PI-level

Essentially, this would establish a mapping-like structure at the PI-level that can be nested. There would be list-values, but other than that, all keys and values would be strings.

I think we could make a lot of progress with such a relatively simple structure. It would be eligible for the effective configuration, labeled streams, and structured IO. Maybe it would even make sense to store selected input data in the configuration, or use structured output as ad-hoc OTC.

The string values could serve as a basis for other secondary types (bool, numbers, etc). The type-info would just be carried along and processes would interpret it themselves. Note that the ability to have fragmented records and nesting reduces the need for composite text fields like URLs.
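A sketch of such a PI-level mapping: only strings, lists, and string-keyed mappings are allowed, and secondary types travel as type-info next to a string value. The `{"type": ..., "value": ...}` convention and the `DECODERS` table are assumptions for illustration, not a defined format.

```python
from typing import Any, Dict


def is_pi_value(value: Any) -> bool:
    """A PI-level value is a string, a list of PI values, or a string-keyed mapping."""
    if isinstance(value, str):
        return True
    if isinstance(value, list):
        return all(is_pi_value(item) for item in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_pi_value(v) for k, v in value.items())
    return False


# Hypothetical convention: type-info travels alongside the value as a string.
DECODERS = {
    "bool": lambda s: s == "true",
    "int": int,
    "str": str,
}


def decode(field: Dict[str, str]) -> Any:
    """The receiving process interprets the carried type-info itself."""
    return DECODERS[field["type"]](field["value"])


# A fragmented record instead of one composite text field like a URL.
remote = {
    "scheme": "https",
    "host": "example.org",
    "port": {"type": "int", "value": "8443"},
    "path": ["repos", "module-a"],
}
print(is_pi_value(remote), decode(remote["port"]))
```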

Signature and Polyglot

To recap, the effective configuration is a mapping with known keys from the configuration specification.

Notice what happened by improving the how aspect (and adding output picking). You can now easily check the signature of process invocation. You have a configuration structure that can be verified against the specification and a program can also determine what data it accepts and in what form.

This means that you can more reliably formulate tasks to be scheduled elsewhere. Igor can query the corresponding PIU for supported features and report upfront whether that task is possible at all.
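A minimal sketch of such a signature check, assuming the specification lists the known keys, optionally their allowed values, and which keys are required. `SPEC`, `REQUIRED`, and `check_signature` are hypothetical; an empty result means the call matches the specification and could be scheduled.

```python
from typing import Dict, List, Optional

# Hypothetical specification: key -> allowed string values (None means any string).
SPEC: Dict[str, Optional[List[str]]] = {
    "format": ["text", "json"],
    "jobs": None,
}
REQUIRED = {"format"}


def check_signature(config: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the call matches the spec."""
    problems = []
    for key in sorted(REQUIRED - set(config)):
        problems.append(f"missing required key: {key}")
    for key, value in config.items():
        allowed = SPEC.get(key)
        if key not in SPEC:
            problems.append(f"unknown key: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {key}: {value}")
    return problems


print(check_signature({"format": "json", "jobs": "4"}))
print(check_signature({"format": "xml", "color": "red"}))
```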

In addition, you obtain better guarantees (and thereby trust) when combining orchestration scripting with in-process scripting. Concretely, when you assemble data from diverse processes but would like to process that information with a more specialized in-process utility script, you could get a more reliable association to said utility script and can employ static checking. Absence of the need for temporary files would make things easier, too. The location-graph can facilitate packaging (in particular keeping track of an installation). Using the right tool for the right task may become a lot easier.

Configuration is fragmented

You will not get all of the applicable configuration at program startup. Specific configuration can apply to selected files or streams. You would be able to query the configuration that differs from the base configuration for such an item.

Igor/PIU would handle merging and filtering of effective configuration at different levels in the call stack. For example, the ad-hoc OTC of a call would have to be merged with the active global configuration/OTC. Then, the configuration specification would describe what configuration is accessed by a particular process and it could be filtered accordingly.

Igor would know about configured (global) aliases and be able to resolve them before further configuration processing.
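The three steps (alias resolution, merging, filtering) could look roughly like this. It is a sketch under simplifying assumptions: flat string mappings instead of nested ones, and "later layer wins" as the merge rule. `ALIASES`, `resolve_alias`, and `effective_config` are hypothetical names.

```python
from typing import Dict, List, Set

# Hypothetical global aliases, resolved before any other configuration processing.
ALIASES = {"up": ["vcs", "update"]}


def resolve_alias(what: List[str]) -> List[str]:
    # Expand the first word if it is an alias; keep the remaining arguments.
    return ALIASES.get(what[0], [what[0]]) + what[1:]


def effective_config(layers: List[Dict[str, str]], accessed: Set[str]) -> Dict[str, str]:
    """Merge layers (later wins), then keep only keys the process's spec says it reads."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return {key: value for key, value in merged.items() if key in accessed}


global_config = {"color": "auto", "remote": "origin", "editor": "vim"}
recorded_otc = {"remote": "backup"}     # prepared in the global configuration
ad_hoc_otc = {"color": "never"}         # from this particular call

print(resolve_alias(["up", "--all"]))
print(effective_config([global_config, recorded_otc, ad_hoc_otc], accessed={"color", "remote"}))
```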

Igor/PIU would cover all your configuration needs. And it could propel #hurling-the-beacon.

Update (December 9, 2014)

The mapping would be more like a linked list than a hash, with ordered, sequential access. Maybe buffering can be solved by designating exactly one field per stream that can grow without bound.

All the information in these records is meaningful to the process. The mapping simply serves as a vehicle to transmit that info. The process can choose what to do with that information and optimize access.
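One way to picture this update is a record written as ordered lines, where only the last field is unbounded and is handed to the reader as a lazy stream. The `key=value` lines and the `key:` marker for the unbounded field are an invented serialization for illustration; `write_record` and `read_record` are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def write_record(fixed: List[Tuple[str, str]], tail_key: str, tail: Iterable[str]) -> Iterator[str]:
    """Emit the bounded fields first, in order; only the last field may grow without bound."""
    for key, value in fixed:
        yield f"{key}={value}\n"
    yield f"{tail_key}:\n"      # marks the start of the unbounded field
    yield from tail             # passed through lazily, never buffered as a whole


def read_record(lines: Iterable[str]) -> Tuple[Dict[str, str], Iterator[str]]:
    """Read bounded fields sequentially, then hand over the rest as a stream."""
    it = iter(lines)
    fields: Dict[str, str] = {}
    for line in it:
        if line.endswith(":\n"):
            return fields, it   # remaining lines belong to the unbounded field
        key, _, value = line.rstrip("\n").partition("=")
        fields[key] = value
    return fields, iter(())


body = (f"line {i}\n" for i in range(3))
fields, rest = read_record(write_record([("format", "text"), ("source", "module-a")], "body", body))
body_lines = list(rest)
print(fields, body_lines)
```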

Monday, October 20, 2014

Workbench Instances

I would like to describe a few workbench replication utilities and conventions with the example arrangement shown in the picture below. It focuses on workbench instances from the perspective of hardware as a point of access. You may not want or need a replica of every workbench on every device.

Rows and Columns

Every oval represents a workbench of the same project. Each row is a different setup of that project, and accordingly corresponds to what could be called a Purpose (the setup may be temporarily different in one row: green/red). The top row labels the systems that span the entire column, and the vertical black lines show physical hardware boundaries.

Workshop Nodes

The blue, vertical bar separates the green workshop zone from the red-drone zone. For the sake of simplicity, assume the same WS/drone node (global user data) on each side and ignore the details of the system. Ideally, the system would be a blueprint that does not accumulate any data and can therefore be the same everywhere.

The green zone is your safe zone with private data and a higher convenience level. The workshop (WS) nodes here are on hardware you own and trust. The "Laptop" and "Host" system labels hint at harboring a full WS, but it could as well be a green-drone node in a system-container.

The red zone is for experiments, untrusted-code executions, and collaboration on hardware you don't own. You would try to put the least amount of data necessary for the task at hand onto these nodes. The data here may potentially go public. Interestingly, many scheduled tasks may not even need network connectivity, and so not every red-drone is necessarily threatened by information disclosure.


There are basically two ways to instantiate a workbench. You either intend to use slight variations of a conceptually different setup and consequently create a new Purpose, or you want to replicate a workbench on another system. In the latter case, you want to always synchronize the current Situation. For example, you changed S1 on the "Host" system, and want to continue at exactly that state on the "Laptop" system. I call this Replicate-To-Self (RTS) to distinguish it from a collaboration synchronization done through DM-schemes in modules.

RTS ensures that these workbench replicas appear as one. If you detach a workbench from the RTS-relation, it may either represent a potentially useless past state or you might evolve it into a Purpose of its own. If you think you need a backup, you can create a version manifest and spool all modules of a workbench on demand. Note that RTS may be supported by private, always-on, passive storage systems that are not shown in the sketch.

Data Access

Another aspect is data access. The almond-colored rectangle shows shared data access. This is a configuration that may be interpreted by the DM-schemes of modules in different ways, but it usually means that all modifications in other workbenches are visible without further action -- for example, because a shared backend is being replicated.

A more formal way to exchange data between Purposes is explicitly through an external repository. That is the same as collaborating with others, except that you may choose to use a private repository. Let's call it Collaborate-With-Self (CWS).

Danger Zone

Now on to damage control and calculated risk.

Having a modular structure and automation makes it easy to push only the data necessary when you schedule a long running task or when you plan to collaborate abroad. That is shown by the "slice" arrow from the green to the red zone. You can start a red-drone on a virtual system which provides you with the necessary separation.

So far I intend to call all workbench-slices on red-drones bots, regardless of whether they just provide resources or are meant for human interaction. The interactivity lock-down is merely a switch and does not need to be blessed with another name. By the way, instantiating another Purpose is very similar to slicing down a workbench for use in the red zone.

The labels of the bots in the red-zone are the same as in the green-zone but with a B suffix to indicate their strong relation to the Situation in the green zone. They are, however, completely independent from green-zone workbenches. RTS as well as sharing data access would undermine the motivation for the green/red separation. It would link the security levels of both contexts.

The drone and bot data puts some limit on what might become public in case of a successful break-in from inside (malicious project scripts) or outside (foreign environment). So slicing helps you contain the threat from information disclosure.

Back in the green zone, you can pull data to a passive state and inspect all modifications. You vet them, merge them, and destroy the bot (or just the changes). Here, you have the opportunity to detect tampering or unexpected changes. Note how the modifications on a bot (and drone) have a temporal nature. Their reason for existence is to be absorbed into the green zone to possibly be re-injected into the bot as vetted changes.
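A minimal sketch of slice and detect, assuming modules can be treated as content blobs and compared via SHA-256 digests. `slice_workbench` and `detect` are hypothetical names; a real implementation would work on module trees and their DM-schemes rather than single byte strings.

```python
import hashlib
from typing import Dict, Iterable, Set


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def slice_workbench(modules: Dict[str, bytes], needed: Iterable[str]) -> Dict[str, bytes]:
    """Push only the modules the task needs into the red zone."""
    return {name: modules[name] for name in needed}


def detect(sent: Dict[str, bytes], returned: Dict[str, bytes]) -> Dict[str, Set[str]]:
    """Compare what came back from the bot with what was sent out."""
    before = {name: digest(data) for name, data in sent.items()}
    after = {name: digest(data) for name, data in returned.items()}
    return {
        "changed": {n for n in before.keys() & after.keys() if before[n] != after[n]},
        "added": set(after) - set(before),
        "removed": set(before) - set(after),
    }


workbench = {"src": b"v1", "docs": b"d1", "secrets": b"token"}
sent = slice_workbench(workbench, ["src", "docs"])     # "secrets" never leaves
returned = {"src": b"v2", "docs": b"d1", "build-log": b"ok"}
print(detect(sent, returned))
```

Everything reported by `detect` is what you would then vet before merging it back into the green zone.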

Drones are little worlds of their own and bots can RTS within the red-zone drones. The bot Situation S2B demonstrates how you can prepare an abroad collaboration on a foreign system inside a virtualized system and the synchronization with the abroad system is handled automatically.

The slice/detect boundary is not exclusively meant for crossing the green/red context qualification. You can as well create such a boundary from the abroad system with the same red-drone and S2B Situation, just to get an understanding of what the result of an operation is (detect).

To stick with the analogy of a house and rooms, you can see the green/red boundary as a kind of front door. You dress for the public and accept that the things you take with you might be stolen. In the digital world, you can even use your phone in trusted mode to fetch stuff you forgot.

Monday, March 17, 2014

The Workbench Network

(This is a marked-up, but otherwise exact, copy of the feature tracker issue that you might know from


It would offer a lot of opportunities to be able to easily instantiate workbenches in a VM or the cloud. The benefits range from having a clearly visible and understandable security barrier to having a fully laid out cloud-workplace for newcomers to explore. A cloud-workspace would also be useful in an educational setting or collaboration in areas other than software development. Seeing the global resources as nodes that can be synced or read-only replicated to other systems supports this interesting perspective.

I'd like to paint a coarse picture of what a workbench network could look like. The tool possibilities this enables are vast, and I'll only scratch the surface of that topic.

But first a name change. What was previously known as Workbench Slaves I will now call Workbench Bots. I've created a Wiki page (Glossary) for naming ideas and discussions to keep that out of the drafting process.

The Workshop Node

The global resources belong to a WorkShop Node (WSN). This is your personal work-hub, the nerve center for all your workbenches. The resources managed by the WSN could be roughly separated into three categories: Configuration & Data, Tools, and Active Parts.

Configuration & Data

This is subject to syncing and/or replication. Configuration is a core responsibility of the workbench application and what facilities will be provided can certainly fill a separate topic.

Tools

Global tool usage could simply be recorded and synced with other data. Instead of syncing the tool fleet on every node, it would be possible to browse the tools used on other nodes in a general tool installation dialog.

Active Parts

Although the whole WSN would reside in a single location, and could therefore be switched for another one, you will certainly want to use just one (no configuration duplication). The configuration for active parts (start/stop workbenches, notification handling, cron-jobs, etc) could be arranged in profiles that could themselves be started and stopped. This would also allow you to switch, for example, between private and work profiles.


The master record of your WSN could be somewhere online, which would allow you to fully sync the WSN to multiple devices without much hassle. In addition to the full WSN, it would be possible to instantiate parts of the WSN read-only on Drone nodes. These nodes can be on VMs or in the cloud and would offer the mentioned easy-to-understand security container, where only notifications are routed back to your WSN. No automatic syncing of other resources would take place. Note that these nodes are usually accessible from a device where you have a fully capable WSN running: in another browser window or on the host of the VM. These drones are container nodes where you want to do actual work, but where the global resources are injected from your full WSN.

Drones could be managed with profiles that describe what tools to install initially, what parts of the configuration to push, and which notifications will be forwarded to the WSN. A special VM image could be kept up to date. When you want to explore a project that you cannot fully trust, the image could be cloned for that purpose.
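Such a drone profile could be as simple as the following sketch. The `DroneProfile` class and its fields are hypothetical; they only mirror the three aspects named above (initial tools, configuration to push, forwarded notifications).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DroneProfile:
    """Hypothetical drone profile: initial tools, pushed config, forwarded events."""
    tools: List[str]                                   # installed initially
    config_keys: Set[str]                              # parts of the WSN configuration to push
    forwarded_events: Set[str] = field(default_factory=set)

    def config_to_push(self, wsn_config: Dict[str, str]) -> Dict[str, str]:
        # Read-only injection: only selected keys leave the WSN.
        return {k: v for k, v in wsn_config.items() if k in self.config_keys}

    def forward(self, event: str) -> bool:
        # Only selected notifications are routed back; nothing else syncs automatically.
        return event in self.forwarded_events


untrusted = DroneProfile(
    tools=["git", "make"],
    config_keys={"user.name"},
    forwarded_events={"task.finished"},
)
wsn_config = {"user.name": "Jane", "credentials.github": "secret"}
print(untrusted.config_to_push(wsn_config), untrusted.forward("task.finished"))
```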

The WSN would know what drones and workbenches are available and could display status info about them.

Your tool fleet might include patched tools or extensions for tools managed with a workbench (e.g. a patched Vim or a workbench of Vim-Vundles (Vim extensions distributed through git)). To have those tools available on a Drone, it would be possible to push Workbench Bots there (remember: lightweight, raw workbenches that are completely independent and also used for automated tasks and deployment pipelines).

Cloud Workplaces

Projects could prepare a cloud workplace with web-based tools (or configuration for tool-categories with default tool selection). Users could explore the project without even having the workbench application installed locally. On the other hand, workbench application users could use the WSN to push their tool-configuration to that cloud workplace, manage credentials, and receive notifications (e.g. a review status change could be routed through the cloud workbench back to your WSN).

The WSN is not a workbench

It has different and far fewer responsibilities. However, it would certainly be advantageous to have some facilities of the workbench there too, like the navigation parts and k-loc managers (VCS/sync control of known locations). So it probably is a special, stripped-down workbench. By default, no parts of the WSN would be version-controlled, and the online storage that is used to sync the WSN could, for example, do incremental backups. Advanced users could gradually put k-locs under version control.

More on security

The WSN will probably manage credentials too. After all, you would want to be able to instantiate multiple workbenches of the same project. You would not want your whole workbench life to fall into the wrong hands, so maybe further means of compartmentalization can be incorporated, for example a credentials archive for projects the user is not currently actively involved in.

The devices the WSNs are on can provide additional means of security to shield against unauthorized access (locking, encryption). In addition to that, a mobile device could be a notification receiver without accessing the WSN.

How fine grained the permissions/capabilities of Drones have to be remains to be seen. Viewing drones as accessible from a device with a full WSN and having notification receivers separate from that will cover a lot. The possibility of fine-grained permissions can be another guiding principle for structuring the layout of the WSN.


So, what I'm saying is that the global node can be organized as well. That again offers a lot of opportunities in the tool space.

Monday, March 10, 2014

The Location Graph

At the macro level, the filesystem abstraction (VFS) is very inflexible and not user-friendly at all. Without falling back to symlink or mount tricks, paths force content to a physical location because they are also used to address that content in scripts or with bookmarks.

Directory names are used to categorize content at the macro level. And unlike when using tags and sets, you can only categorize one-way.

I suggest working in terms of a Location Graph from the Workshop (the root of your digital world) down to individual modules (known locations; repositories) to separate those concerns.

Graph Utilities

I will use the term Module here for leaves in the location graph. They hold data content and are the units used to configure data management (VCS, replication). Other nodes serve to give structure to the graph and can be equipped with further meaning by adding tags.

All nodes in the graph are addressable by IDs that will always be unique within a particular WS. Project and module names can be universally unique, adding qualification where necessary (for example to identify forks). This means that you can restructure the graph without invalidating bookmarks or scripts.
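A minimal sketch of hierarchy-independent addressing: a bookmark stores a node ID, so moving the node changes its path but not its address. Using random UUIDs as IDs is an assumption, and the `LocationGraph` class is hypothetical. Tags are included to show how a Data-Space could be resolved as a tag query.

```python
import uuid
from typing import Dict, Iterable, List, Optional, Set


class LocationGraph:
    """Hypothetical location graph: nodes are addressed by IDs, not by paths."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.name: Dict[str, str] = {}
        self.tags: Dict[str, Set[str]] = {}

    def add(self, name: str, parent: Optional[str] = None, tags: Iterable[str] = ()) -> str:
        node_id = uuid.uuid4().hex      # unique within this workshop
        self.parent[node_id] = parent
        self.name[node_id] = name
        self.tags[node_id] = set(tags)
        return node_id

    def move(self, node_id: str, new_parent: str) -> None:
        # Restructuring changes the path but never the ID.
        self.parent[node_id] = new_parent

    def path(self, node_id: str) -> str:
        parts: List[str] = []
        current: Optional[str] = node_id
        while current is not None:
            parts.append(self.name[current])
            current = self.parent[current]
        return "/".join(reversed(parts))

    def tagged(self, tag: str) -> List[str]:
        # A Data-Space as a tag: find all modules contributing to a resource group.
        return sorted(self.path(n) for n, t in self.tags.items() if tag in t)


g = LocationGraph()
ws = g.add("workshop")
wb = g.add("workbench", ws)
archive = g.add("archive", ws)
module = g.add("module-a", wb, tags={"wb-data"})
bookmark = module                       # a bookmark stores the ID, not the path
g.move(module, archive)
print(g.path(bookmark), g.tagged("wb-data"))
```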

To be able to put individual resources under a different DM-scheme without breaking applications, there are Data-Spaces and Content-Units. A Data-Space is a virtual parent node (or a tag) to find all the modules that contribute to a particular resource group. Inside a module, where regular file-system semantics are used, Content-Units describe the resources contained in a directory. This works similarly to the well-known .directory files and comparable special files.

Note in the following that while the project hierarchy is relatively stable, most other aspects can be grouped in Views that serve a specific purpose. You can, for example, view the contributed nodes of an extension-project separately, or merged with the project it extends.

An Example Graph

This picture schematically shows an example of the location graph reduced to the essential parts. At the top, you can see the Workshop node with the global Workshop Data-Space (the yellow oval) arranged in three modules (the blue rectangles).

Below that is one of possibly many Workbench nodes with its accompanying Data-Space.

The workbench is equipped with one project that got augmented (by the user) with one extension-project (the smaller dark-green oval). The extension-project adds one data-module and one project-repository (both in dark blue) displayed here inline with the main project. The additions of a simple extension-project are just inserted into the main project hierarchy on a node by node basis. An extension-project can also be fully specified and/or override the hierarchy of the main project.

You can see that the main project is actually a meta-project that references another project. The sub-project has its own area of responsibility (separated by a dashed line). Meta-projects resolve configuration conflicts between sub-projects and add resources to combine all parts into a whole.

The Workbench Meta-Project

The Workbench is also a place for local experimentation and in this regard acts like a meta-project that is not published. You can assemble multiple projects and modules in a workbench, and also add a project-repository to contribute configuration or process resources.

If you think your setup is worth publishing, you can wrap selected top-level nodes and the project-repository into a new meta-project. You can see an example transition in the picture below. In the initial state, the project-repository is shown as a child of the WB data-space. This is again just a matter of the grouping done in a View which can be more concise in the context of creating a new meta-project.

In a similar way, you can move top-level modules from your workbench to a [meta-]project or draft comprehensive changes to a project in an extension-project and ask the project to merge the changes.

You can see that hierarchy-independent addressing of nodes again plays a very valuable role.

Diverse Points

An interesting advantage of the physical decoupling of locations is that you can move the module you are currently working with the most onto the fastest storage device without disrupting anything.

By adding tags to Workbenches it is possible to group Workbenches according to varying and overlapping aspects.

There can also be Module-Groups to have an additional option for structuring the graph. A refinement could be Fork-Groups that enforce the necessary requirements for selecting one of many clones.

Content-Units can be saved/restored in/from an archive. That would make it possible to get the configuration of uninstalled applications out of the way, but still be able to restore it on demand.

For complex setups (eg: a meta-project of multiple workplaces (develop, design, etc)) of one large project, it might be necessary to recognize when the same module or project is used by more than one workplace. References could accomplish that. On the other hand, it might be necessary to instantiate the same module separately with different configurations. That should not be a problem.

Modules can be unmanaged known locations and projects can acquire further locations (eg for build or install tasks). I wanted to emphasize the role of known locations as data management units here.

Occasionally, it might be necessary to prioritize modules of a Data-Space against each other or de/activate selected modules.


As always: crush it, chew it, see how it tastes (?!) and what can be improved.

Tuesday, November 26, 2013

Workbench Intro -- The Software Development Use Case

(This is a copy of the wiki page that you might know from

What's a workbench?

The workbench is a managed directory in your filesystem that (very much like a VCS) has its own configuration and state.

It can be equipped with source code repositories which can be iterated, inspected and generally worked with.

A workbench can also be equipped with further tools in addition to what the core provides. Tools are scripts that can be executed as workbench subcommands and they have access to the extensive scripting library and facilities provided by the workbench (configuration, project info, repository info, VCS abstraction, Build system abstraction, notification, and more).

Tools can have several origins. The core workbench provides tools for the most basic tasks, like updating repositories, or accessing the configuration. Projects can provide tools, users can write tools, and tools can be downloaded from an online Tool-Shed.
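Since tools from several origins can share a name, the workbench needs some precedence rule. The order used below (core, then Tool-Shed, then project, then user, with later origins overriding earlier ones) is purely my assumption for this sketch; `build_registry` and `run` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[List[str]], str]

# Assumed precedence: later origins override earlier ones.
ORIGIN_ORDER = ["core", "tool-shed", "project", "user"]


def build_registry(sources: List[Tuple[str, Dict[str, Tool]]]) -> Dict[str, Tuple[str, Tool]]:
    """Collect tools from all origins into one subcommand table."""
    ranked = sorted(sources, key=lambda src: ORIGIN_ORDER.index(src[0]))
    registry: Dict[str, Tuple[str, Tool]] = {}
    for origin, tools in ranked:
        for name, tool in tools.items():
            registry[name] = (origin, tool)     # a later origin replaces an earlier entry
    return registry


def run(registry: Dict[str, Tuple[str, Tool]], argv: List[str]) -> str:
    # The first word selects the subcommand; the rest are its arguments.
    _origin, tool = registry[argv[0]]
    return tool(argv[1:])


core = {"update": lambda args: "core update", "config": lambda args: "core config"}
project = {"lint": lambda args: f"project lint {' '.join(args)}"}
user = {"update": lambda args: "user update (with my hooks)"}

registry = build_registry([("user", user), ("core", core), ("project", project)])
print(run(registry, ["update"]), "|", run(registry, ["lint", "src"]))
```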

A workbench is able to combine multiple projects, and projects can continuously provide all sorts of workbench customizations.

A workbench instance provides a coherent place to work with one or more projects while tedious repetitive tasks are automated.

There is an exploratory implementation that can give you a feel for how it could work and spark your creativity about the multitude of possibilities a full workbench implementation would enable.


Automate common development related activities

Have all the advantages of automation: Less human error; less manual repetition of tedious tasks; pace; more rewarding situations per day; more time for actual development tasks; more quality because of automated QA-procedures; etc

Allow projects to easily distribute information and workflows to recipients

Projects can push project-info, tools, tasks, extensions (VCS, BS), configuration down to developers, users, testers, technophiles. It is easy for projects to define, for example, quality standards by providing the configuration for tools that check the code.

Have a simplified and coherent view of a project

The workbench bundles and organizes all sorts of project data. That helps a lot with managing the complexities of a project. Similar to how well-structured code can be better understood, having a good structural foundation for a project will facilitate working with it.

Combine projects to work on upcoming features

If one project works on new features that another project would like to exploit before the final release, those projects could be combined into one workbench.

Remove high project involvement barriers and attract lots of aspiring developers

Interested developers (e.g. students) come out of curiosity and it is easy for them to get fully involved and to contribute back. In particular, IDE initiation tools would allow them to get from the temptation to look at a project to actual code browsing and writing within seconds.

Extensive customization opportunities

The resources provided by projects will allow even inexperienced developers to kick-start into the development process. But users who invest a bit of time to get to know the workbench facilities can unleash all the powers of the workbench and create new workflows.

It's easy to write your own tools, and it's easy to write them well

From command aliases and small shell scripts, to full-fledged workbench tools. If you do a thing a second time, and it's more than a single step, you can do something about it. The Workbench will provide the facilities (in particular workbench-local configuration and data storage) that will encourage you to write reusable tools.

Use familiar tools across projects

Get rid of rigid scripts that work for one project only. That does not help projects or aspiring contributors. There are many procedures that are common to every project (e.g. building, testing, contributing back). The workbench will make decent abstractions of those procedures possible.

Get feedback and query workbench state

Feedback and query tools are important to stay informed about the state of a workbench. Install handlers for events happening in a workbench. Specify if you want transient OSD notifications and/or persistent notification (issue tracking) for some events. Configure your prompt to show the most important information or use desktop apps (similar to a system monitor widget) to stay informed.
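A small sketch of event handlers routing to transient (OSD) and persistent (issue) notifications. The `EventBus` class, the event names, and the two in-memory sinks are hypothetical stand-ins for real notification backends.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class EventBus:
    """Hypothetical workbench event dispatch with transient and persistent sinks."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.osd: List[str] = []        # transient on-screen notifications
        self.issues: List[str] = []     # persistent notifications (issue tracking)

    def on(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def emit(self, event: str, payload: Dict[str, str]) -> None:
        # Call every handler registered for this event, in registration order.
        for handler in self.handlers[event]:
            handler(payload)


bus = EventBus()
bus.on("build.failed", lambda p: bus.osd.append(f"build failed in {p['repo']}"))
bus.on("build.failed", lambda p: bus.issues.append(f"investigate {p['repo']}"))
bus.on("pull.done", lambda p: bus.osd.append(f"updated {p['repo']}"))

bus.emit("build.failed", {"repo": "module-a"})
bus.emit("pull.done", {"repo": "module-b"})
print(bus.osd, bus.issues)
```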

Define repeatable tasks

It will be possible to create a description of a task (like build or analysis) and have that task executed in a lightweight slave workbench that will be created according to the task description. Therefore long running tasks will not block the master workbench, and the slave workbench will be an independent area where auditing and similar procedures can take place after the task finishes.