Unordered tree machine

From Esolang
Jump to navigation Jump to search

Unordered tree machines are a type of tree machine which navigates a head along a tree which has a lack of order between sibling nodes. In order to prevent movement ambiguity, the tree is modeled as nested sets instead of nested multisets. Unordered tree machines typically have a single tree, as well as several common instructions. Extremely simple shell scripting languages can often be shown to be trivially equivalent to unordered tree machines.

Motivation

Modern operating systems view long term data as being stored in files, organized in a hierarchical tree, called a directory tree. Operating systems which have a command line generally offer a method to run scripts which perform simple tasks. Some of these scripting languages are immediately quite powerful, but others are seemingly unable to perform computation. Tree machines provide a somewhat formalized framework to ascertain whether the seemingly incapable languages are in fact Turing complete through the use of simple translation.

Similar models

Tree machines model computations on a mutable tree. The equally powerful tree rewriting paradigm models computations on immutable trees. The two models are superficially similar, due to both operating on trees, but the way they interact with their trees is very different, and tree rewriting often relies on nodes having order. It is not immediately obvious how to transform a set of tree rewriting rules into a tree machine program.

The Folders language can be considered as a tree machine which also executes its tree, with Folders programs running on a universal tree machine.

Several variants of the tree machine description exist, highlighted below. The minimum symbol and depth values are for known models, they may not be truly minimal. In fact, the minimum symbol count is not known, k refers to the number of symbols required to implement a program equivalent to a universal Turing machine. The smallest s for a machine capable of universal computation likely corresponds to the minimum s required to implement either a universal unordered tree machine, or universal (cyclic) tag.

Variant Unbounded depth (min symbols) Unbounded symbols (min depth) Both unbounded
Core unknown (likely combinational) Combinational (d≥1) Combinational
With invoke Turing complete (sk) Finite-state machine (d≥1) Turing complete
invoke and self-modification Turing complete (sk) Turing complete (d≥1) Turing complete

Core model

An unordered tree machine consists of a special type of unordered rooted tree (made up of sets instead of multisets), as well as a head which navigates the tree according to a program. The tree is unbounded in size, or bounded by three parameters, the maximal depth d, maximal breadth b beneath a given node, and maximal number of unique symbols s, with bs. The tree is made up of two kinds of nodes, leaves, and non-leaves. Leaves are equivalent to files, with non-leaves being equivalent to directories, and are best modeled as a set containing its children. If non-leaf directory nodes must have at least one child, some special type of leaf node could exist which exists only under but beneath all non-leaf nodes to demarcate them as such. All notes have some identity, being a symbol.

The head always points to a non-leaf node, and can navigate to any other non-leaf node. Nodes can be referenced either relative to the position of the head (called relative paths), or relative to the root of the tree (absolute paths). Nodes can always be uniquely identified. Within a non-leaf node, all identifiers/names must be unique for leaf nodes, and non-leaf nodes, although an equivalently powerful model can require uniqueness across these categories. By convention, slashes are used to denote descent into the tree, delimiting parent and child nodes. Additionally, non-leaf nodes above the head can also be referenced relatively, which in turn allows for sibling, cousin, etc. nodes to also be referenced. The special node identities of . and .. reference the node the head points at (or the directory as is in a path), and the node above the head (or node above the prior node in a path), respectively.

This form of tree can be modeled using nested sets. Leaves are symbols, and non-leaves are sets containing a symbol, and a set containing leaves and non-leaves. A directory containing a file and directory is differentiated from just a directory by its nesting, the contents of non-leaves are always at some even nesting beneath the root. Example trees (note the lack of order or number):

  • {} the empty tree, the canonical starting state for core model tree machines
  • {a, b, c} a tree containing 3 leaves/files, /a, /b, /c
  • {b, a, b, c, {d, {}}} the same as before, but now with an empty directory/non-leaf /d
  • {a, b, c, c, {d, d, {c, a, b, b, c, c}}} the same as before, with the directory /d having 3 leaves/files, /d/a, /d/b, /d/c

For select b, d, s values, the tree degenerates into other data structures.

Max depth Max breadth Max symbols Equivalent data structure Possible maximum class Explanation
Unordered rooted tree, nested sets Turing complete
0 Set functionally incomplete (only able to evaluate combinations of OR), Turing complete (conditional invoke)
≥1 Singly nested set Turing complete
either 1 Tally marks no computations (invoke), bounded string (preamble state) If invoke is in the model the 1 symbol is used up by the entry state. If invoke is not present then the program can arbitrarily nest directories to create a string. In the case of 1 symbol the string is unary.
2 ≥3 Unbalanced unordered binary tree assumed Turing complete
any either 0 No data structure no computations Since no identifiers can be used, the tree cannot have any children, meaning it is always the empty set
<∞ <∞ <∞ Bounded tree bounded-storage-machine

Programs

Programs in the head are a list of instructions which are executed in order. When the head reaches the end of a program, the machine halts. The following commands are typically available to programs:

Command Description POSIX shell equivalent DOS/NT shell equivalent
add_directory(path) Adds a non-leaf node which can then be addressed by the given path mkdir path mkdir path
remove_directory(path) Removes a non-leaf node which is addressed by the given path rmdir path 2>/dev/null rmdir path
add_file(path) Adds a leaf node which can then be addressed by the given path touch path touch path
remove_file(path) Removes a leaf node which is addressed by the given path rm -f path del path
change_directory(path) Moves the head to point to the given path cd path 2>/dev/null cd path
copy_directory(from, to) Copies a non-leaf node from a given path (from) so it also exists at another path (to) cp -r from to 2>/dev/null copy from to
copy_file(from, to) Copies a leaf node from a given path (from) so it also exists at another path (to) cp from to 2>/dev/null copy from to

All commands always succeed. That is, if a node doesn't exist for removal, a node already exists when attempted to be created, or a node does not exist for being changed to, its actions are not performed. This means that all operations except for change_directory are idempotent between change_directory, or are always idempotent if only absolute addresses are used.

The add and copy commands do not create intermediary nodes, but the following programs do not rely on this behavior.

Complexity of core model

The model described thus far is total, it is guaranteed to terminate. However, it is able to perform some conditional behavior. The conditional behavior allows it to be in the class of combinational logic.

Consider the following program:

change_directory("d")
add_file("f")

If the program is run while the machine is in a configuration where the head is pointing at a non-leaf node with no children, the non-leaf node would now have a leaf node child identified as "f."

If, however, the program is run while there already exists a non-leaf node identified as "d," there would be a node identified as "d/f."

Directories can be viewed as values. With this perspective a NOR gate can be implemented with the following program:

change_directory("a")
change_directory("b")
add_directory("c")

A directory is set by it existing, thus a preceding add_directory("a") can be used to set a. The gate is set to output true if the path c exists, otherwise its output is false.

In the case that a is set, the machine descends into its corresponding directory. It attempts to further descend into b but is unable to. It then creates a directory, c, which is located at a/c. If both a and b are set, it descends into a first, but cannot descend into b, since it is a sibling. If only b is set, a similar situation occurs with a, but there existing b/c instead. Finally, if neither a nor b are set, the head does not descend at all, resulting in c existing.

The scheme above can be chained as desired to evaluate arbitrary expressions. This shows that a tree machine with limited depth and unbounded breadth can evaluate any combinational expression.

It is unknown if tree machines with unbounded depth and limited symbols or breadth can also evaluate arbitrary expressions.

Addition of invoke

If another command is added, invoke(path), tree machines enter the class of at least bounded-storage machines, with some being Turing complete. invoke runs a leaf node. This can be viewed in two ways, one is to view the head as containing many separate regions demarcated with labels. invoke is then effectively a goto, which examines the leaf node's identifier (and not its parent nodes) and jumps to the corresponding label. Equivalently, leaf nodes could contain data, specifically code which the head can execute. In this model, invoke clears the head's current program and replaces it with the program described in the leaf node. This model translates well to shell scripts, with e.g. POSIX shell using ./node for a relative invoke("node") and /node for an absolute invoke("/node"). For this reason, the target of invoke is often called a script.

It is important that a script, after a copy to a new name, performs the same instructions as its original. Scripts can be modeled as directories containing a file. When copied, the script directory can change name, but the file inside has the same name. invoke then looks in that directory and finds the corresponding code for the fixed filename. This allows for scripts to be modeled as an auxiliary set of tuples of (s, (c₀, c₁, ...)) where s is the symbol, or fixed filename, and (c₀, c₁, ...) are the instructions corresponding to the symbol.

In any view, it is assumed a tree machine equipped with invoke will have its code corresponding nodes placed into the tree in some known way. The tree could be initialized with them, or the machine could have a means of initializing the tree itself.

The idempotence of invoke does not matter, except for when d=0. If invoke is idempotent, then programs which have code following invoke are allowed. If not, then an equally powerful system can be used. Separate the code after the invoke into its own node. Instead of using invoke to perform conditional execution on whether or not the script path exists, cd and invoke are used to conditionally execute on whether or not a directory exists. Consider:

copy_file("if", "a/body")
copy_file("else", "body")
change_directory("a")
invoke("body")

If the directory a exists, then the if node is copied to the path a/body, the else node is copied to body, the head moves to the a directory, and the if code is executed. If not, then the first copy does nothing, but the second copy still does, the change directory also does nothing, resulting in the else code executing.

invoke is able to perform loops, if the path given to invoke refers to the same code currently executing, then it's equivalent to a loop.

Consider a tree machine which has a program like so, with the identifier iterate:

add_directory("a")
change_directory("a")
invoke("/iterate")

The first time this script is run, a non-leaf node with the identifier a is created, then the head moves to point to it. invoke then causes the script to run again, resulting in a/a. This process repeats indefinitely, resulting in an unbounded nesting of a.

Now consider another machine which has several programs:

main

add_directory("a")
add_directory("a/a")
add_directory("a/a/b")
invoke("descend")

descend

remove_file("iterate")
change_directory("a")
copy_file("/descend", "iterate")
copy_file("/stop", "b/iterate")
change_directory("b")
invoke("iterate")

stop

remove_file("iterate")

This machine will recursively descend its way down a path of nested a nodes until it reaches a b non-leaf node. By duplicating the descend program, and adding specific code for the stop invocations for each, the b non-leaf node can be nested an additional time, or raised up a level.

nest

change_directory("..")
add_directory("a")
copy_directory("b", "a/b")
remove_directory("b")

raise

change_directory("../..")
copy_directory("a/b", "b")
remove_directory("a/b")
remove_directory("a")

Finally, we can replace raise with the following to check to see if it has risen to a specific point before actually raising, otherwise execute hit:

copy_file("/raise", "hit")
copy_file("/hit", "c/hit")
change_directory("c")
invoke("hit")

The main program could initialize the tree like so:

add_directory("x")
add_directory("x/b")
add_directory("x/c")
add_directory("y")
add_directory("y/b")
add_directory("y/c")
...

After moving the head to the appropriate node, the extended pair of descend-based programs could be used to nest or raise the b node arbitrarily, with conditional execution possible by copying a specific file to /hit, which is executed when a raise cannot be performed due to it being as high as it can go.

This model encodes incrementation and decrementation as nesting or raising directories, respectively. For two nodes, a tree machine with unbounded depth can perform any computation a Minsky machine can.

Additional remove_file commands can be used to normalize the tree.

For bounded depth and unbounded breadth or symbols, the tree machine can still evaluate any combinational expression, as noted prior. Using invoke allows for the system to iterate its state, but it is unable to refer to any nodes that are not explicitly stated in its program. Since its program is finite, this model is a finite-state machine.

Self-modification

As shown above, invoke permits a tree machine to perform arbitrary computation, but only if it has unbounded depth. For unbounded breadth/symbols and not depth, self modification must be used. Self-modification is modeled with an additional command append_to(path, symbol). This command appends a single symbol to the contents of a leaf node. This command is best viewed as modifying scripts, or the instruction tuples if the auxiliary set of tuples is how scripts are viewed. One can encode any given set of scripts using many append_to commands, rather than using copy from an outside context:

a

add_file("aa")

b

add_file("bb")

equivalent to

add_file("a")
append_to("a", "a")
append_to("a", "d")
append_to("a", "d")
append_to("a", "_")
...
add_file("b")
append_to("b", "a")
append_to("b", "d")
append_to("b", "d")
append_to("b", "_")
...

Since partial scripts can now be made, the behavior of invoke when given an incomplete script is undefined. Execution of incomplete scripts is not used or encountered in the following proofs.

As shown previously, the existence of a directory can be checked. Using append_to we can construct a leaf node which checks for the existence of a directory when executed. The identity of the tested node can easily be seen as a combination of some bounded number of atomic symbols, at each point their value being tested, and if true, a symbol is added, if not, a different one is added. Bounded counting can be accomplished in this way. However, consider a program which builds two other programs.

An unbounded filename can be constructed by having a program which produces a reference to a filename, e.g. in an add_directory. The code to perform this is duplicated twice, or in models where it's equivalent, the resulting leaf node is copied. Importantly, only a partial script is generated. One leaf node is finalized as is, with the other having an additional symbol appended to the node name encoded in its program, and finalized. The existence of the shorter node name is checked first, if existing some code is executed which terminates this descent. If not, the other is executed, causing another pair of leaves to be generated with the cycle continuing. Using this scheme, a filename of unbounded length can not only be created, but also checked. When found the name could be extended or shrunk (using copy and remove), corresponding to incrementation and decrementation, respectively. An initial check to see if the minimal filename exists can be used to determine if the counter is at the minimum value. Multiple of these can be constructed, allowing for Minsky machines to be simulated by tree machines.

Other variants

The core model, extended with more traditional conditional looping, is also Turing complete.

Completion for unbounded breadth can also be achieved by adding some mechanism to generate a reference to a filename which can be modified, without interacting with the tree.

Variations of invoke

Invoke is likely to behave in one of three ways. In all of the proofs above, these variants all perform identically. Unconditional invoke requires the target to exist, thus always being the final executable instruction in a program. If the file exists, then it is executed, with the rest of the program which invoked it performing no additional instructions. If the file does not, then the machine halts. Returning invoke eventually always returns to the invoking program. In the case that the target does not exist, returning invoke is simply a no-op. Conditional invoke is the same as returning invoke, but it never returns.

If invoke is conditional and never returns, it's possible to implement an inverter. In a tree with 0 depth (set machine) this allows for NOR to be implemented like so:

copy_file("a", "b")
invoke("b")
add_file("c")

The variables can be set with add_file("a") or copy_file("noop", "a") for a (assuming a script which performs no action is in noop), similar for b. Chaining requires copying the next line of execution as the variables. Since NOR can be implemented, and invoke can also perform loops, and programs can only access some finite number of leaf nodes explicitly, a set machine with conditional invoke is a bounded-storage machine. With self-modification the system is Turing complete for unbounded symbols.

Returning invoke can also realize NOR, with variables being set with copy_file("exists", "a"), etc.:

copy_file("a", "b")
add_file("c")
invoke("b")

exists

remove_file("c")

Chaining must similarly involve copying scripts.

move

A pair of move commands could be added, move_file(from, to) and move_directory(from, to). These functions can behave in one of two ways. Clobbering move always overwrites the destination if the source exists. Non-clobbering or safe moves only overwrite the destination if it does not exist. Clobbering move is equivalent to copy(from, to) followed by remove(from) in many cases, but copy may be non-clobbering, in which case move cannot be trivially implemented.

Non-clobbering move can be used to create an AND gate.

move_file("a", "b")

The path a only exists after execution if both a and b exist. A tree machine with d=0 can therefore compute any combination of AND and OR.

Clobbering move, in conjunction with invoke can be used for conditional execution:

copy_file("else", "b")
move_file("a", "b")
invoke("b")

The input is set with copy_file("if", "a"). If set, the script will overwrite b causing the if script to execute. If cleared, no overwriting occurs, resulting in the else script executing.

Minimum requirements for Turing completeness

Minimum known s, d for Turing completeness
Unconditional invoke Returning invoke Conditional invoke
No self-modification s≥2, d=∞ s≥2, d=∞ s≥2, d=∞
With self-modification s≥2, d=∞ or s=∞, d≥1 s≥2, d=∞ or s=∞, d≥0 s≥2, d=∞ or s=∞, d≥0
Self modification, clobbering move s≥2, d=∞ or s=∞, d≥0 s≥2, d=∞ or s=∞, d≥0 s≥2, d=∞ or s=∞, d≥0

Minimum b, s

The symbols parameter is somewhat hard to pin down, but the best definition thus far is the magnitude of the largest set representable in the tree. Equivalently, it's the size of the possible set of node names under a given node. With a self-modifying tree machine, each node name is assumed to be made up of symbols itself, atomic symbols.

It was shown that for a depth of 0, unbounded symbols is sufficient for universal computation using returning invoke, conditional invoke, or clobbering move, with depth of 1 being sufficient using unconditional invoke. The model used can have a bounded breadth—largest magnitude at a given time beneath a node—but not bounded symbol count. If the symbol count were bounded there would only be a finite amount of node names, even if only a few nodes were used.

Unlike for depth, breadth and symbols cannot have a minimum of 0 even if the other parameters are unbounded, or if variants highlighted above are used.

Breadth

For breadth, it cannot be 0 as that would result in the tree degenerating into not even a zero-dimensional structure, and due to how all operations work on the tree, all operations become no-ops even if states are not in the tree.

For b=1 there are two scenarios: if states are not stored in the tree (so there is only one total state) then some bounded combinational expressions could be evaluated, if states are stored in the tree (required by invoke) then only a single state may exist in the tree.

For b=2 a single direction unbounded boolean tape can be modeled; nest an unbounded amount of directories, like d, d/d, d/d/d, d/d/d/d/..., then for each position i on the tape, if true store a symbol in the directory with a nesting of i, with d being 1, d/d being 2, etc. This can store unbounded data but it is believed it cannot actually use it. This is because at the beginning, if there is more than one state, the breadth of the root is all used up, and either the nested directory cannot be created, or one state must be deleted. A universal machine which uses a nested directory and begins with all of its states as children of the root would need to somehow use a single state which conditionally loops. In a model where each state is put into its own level of a nested directory then arbitrary states can be used. The tape model can be modified such that each intermediary level is not used to store data. Instead, these levels store a special script ("store"). Data levels instead now store a directory (say "b") which contain only a script ("store", but a distinct state from the other) if they store true, otherwise they are empty. To test a level:

change_directory("b")
change_directory("d")
invoke("store")

When executed in a data cell two things could happen. If the cell stores data, then the "b" directory exists, there is no "d" directory within, and the cell's script is executed. If the cell does not, then the machine cannot move to the "b" directory, so it nests one deeper, and executes the non-data cell. Presumably, a universal machine could be constructed.

For b=3 the standard system where states are stored in root would allow for 2 states to be used, which may be sufficient.

Symbols

For s=0 and s=1 a similar argument made for b=0 and b=1 can be made.

An observation can be made. In a model were invoke is used, in the best case there could be a free preamble state which sets up the tree, and a number of states able to be run with invoke. Any state which can be run with invoke must be in the tree, contributing to s. It is assumed from here on that there is no free preamble state, all states exist in the tree.

In order to actually store any data, then, without destroying states, s must be at least 1 greater than the number of states, sstates + 1. If this is true then something a bit like the unbounded nesting tape can be constructed, using the free symbol to create a directory in the root. In each level of the nested directory the amount of free symbols is effectively refreshed to s - 1.

Hypothetically a Turing machine program with m symbols and n states could potentially be translated into an unordered tree machine with max(n, m) states, or max(n, m) + 1 symbols. The free symbol is used to nest directories, with each state being used as a Turing machine symbol, as well as a state.

The minimum s for unbounded b and d is unknown. It has a lower bound of 2 (1 state and 1 free symbol), but as for the actual minimum, it's nearly the same problem of asking what the minimum number of states and symbols are required to implement a universal Turing machine. Assuming it is precisely the same question, the answer is likely under ten, perhaps 6, based off the (5, 5) Turing machine described by T. Neary and D. Woods in Four Small Universal Turing Machines (2009).

Notable examples

The following languages can be readily viewed as tree machines.

  • Batch in MS-DOS 2.00 and above, POSIX shell, KolibriOS's shell, and others are equivalent to a tree machine with unbounded d, b, s
  • MS-DOS Batch prior to MS-DOS 2.00 can be viewed as a tree machine with unbounded b, s, and d of 0, thus being a set machine
  • Folders is conceptually a universal tree machine with unbounded d, b, s, the actual Folders language being tree machine program descriptions encoded in a tree, executed by a fixed program