Sparkster Development Update. WE 1st September 2018.

sparkster (35)in #tech • 6 years ago (edited)

Last week, we discussed virtual machines and how we optimized our VM to execute a piece of code in 0.5 milliseconds. We also discussed how high-level instructions are broken down into simple instructions for the machine to execute. This week, we will discuss the progress we’ve made with respect to our datastore and some concerns that have been raised during our research.

Recall from last week that high-level instructions are broken down into simple instructions, variously called bytecode, EVM bytecode, IL (Intermediate Language) code, and so on. For example, an instruction like “set ‘b’ = ‘b’ + 1” is broken down into simple instructions similar to “Fetch ‘b’ and load it onto the processor; put the number 1 onto the processor; add the two numbers; store the result where we fetched the original ‘b’ from.” Here, in the Sparkster language, ‘b’ exists only within the function where we refer to it. This means that once the function ends, ‘b’ has no meaning and is erased from memory, making room for data from other functions.
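As a rough illustration, the breakdown of the increment instruction can be sketched in Python as a toy stack machine. This is not the Sparkster VM or its instruction set: the instruction names LOAD, PUSH, ADD, and STORE, and the function `run`, are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Each instruction is an (opcode, argument) pair; the opcodes are hypothetical.
Instruction = Tuple[str, object]


def run(program: List[Instruction], local_vars: Dict[str, int]) -> Dict[str, int]:
    """Execute simple instructions against a stack and a table of local variables."""
    stack: List[int] = []
    for op, arg in program:
        if op == "LOAD":       # fetch a variable and load it onto the stack
            stack.append(local_vars[arg])
        elif op == "PUSH":     # put a constant onto the stack
            stack.append(arg)
        elif op == "ADD":      # add the two topmost numbers
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":    # store the result back into the variable
            local_vars[arg] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return local_vars


# The high-level increment of 'b' broken into four simple instructions.
INCREMENT_B = [("LOAD", "b"), ("PUSH", 1), ("ADD", None), ("STORE", "b")]
```

Calling `run(INCREMENT_B, {"b": 4})` leaves ‘b’ equal to 5 in the table of locals. That table is discarded when the function ends, which is exactly the limitation discussed next.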

There is a problem, however, when we want to persist data between function calls. Suppose we had a function whose goal is to add 1 to some number every time it executes. The first time the function executes, it will add 1 to 0 and output 1. The next time, it will add 1 to 1 and output 2.

For this situation, we need to save the value of ‘b’ and cannot forget it as soon as the function ends. In this case, we require the help of documents. Documents allow us to persist data: every time we save data to a document, the next function can retrieve the document and update its data. In our virtual machine, documents are considered persistent storage. In order to act on a document, the storage nodes in our blockchain must go out to the decentralized datastore, search for the document in question given some filters, and fetch it. They must then give that document to the compute node that wants to compute against the document. This translates to “Fetch ‘b’ from memory and load it onto the processor,” except in our case, ‘b’ is a little more complex than some value in memory.
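The counter example can be sketched as follows. This is a minimal sketch that assumes an in-memory dictionary standing in for the decentralized datastore; the class `DocumentStore`, its `fetch` and `save` methods, and the field name `value` are hypothetical names chosen for illustration, not the Sparkster API.

```python
from typing import Dict


class DocumentStore:
    """Hypothetical stand-in for persistent storage: documents keyed by name."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, int]] = {}

    def fetch(self, name: str) -> Dict[str, int]:
        # Return a copy, so changes persist only after an explicit save.
        # A document that was never saved starts with value 0.
        return dict(self._docs.get(name, {"value": 0}))

    def save(self, name: str, doc: Dict[str, int]) -> None:
        self._docs[name] = dict(doc)


def increment(store: DocumentStore) -> int:
    """Add 1 to the persisted value of 'b' and return the new value."""
    doc = store.fetch("b")   # retrieve the document
    doc["value"] += 1        # update its data
    store.save("b", doc)     # persist it for the next call
    return doc["value"]
```

With a single `DocumentStore` instance, the first call to `increment` returns 1 and the second returns 2, because the value lives in the store and not in the function's local variables.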

We have completed a significant part of the search and retrieval process on the storage node. In other words, given some filter criteria like “retrieve ‘b’ where ‘value’ = 1,” we are able to successfully fetch the proper data from our datastore. We have also completed saving new data to the datastore and creating the document templates themselves. From here, updating a document will be a trivial task for us, so we consider the completion of data retrieval to be significant.
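To make the idea of filter criteria concrete, here is a small sketch of equality-based filtering over a list of documents. It assumes documents are plain dictionaries and that a filter is a set of field/value pairs that must all match; the function `find` and the field names `template` and `value` are illustrative assumptions, not the actual query interface of the storage node.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def find(documents: List[Document], filters: Dict[str, Any]) -> List[Document]:
    """Return the documents whose fields equal every value given in filters."""
    return [
        doc
        for doc in documents
        if all(key in doc and doc[key] == wanted for key, wanted in filters.items())
    ]


documents = [
    {"template": "b", "value": 1},
    {"template": "b", "value": 2},
    {"template": "c", "value": 1},
]

# "retrieve 'b' where 'value' = 1"
matches = find(documents, {"template": "b", "value": 1})
```

Here `matches` contains only the first document. An empty filter matches every document, since there is no condition to fail.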

At this time, we are working on communication between the compute and storage nodes. Once this is done, the compute node will be able to ask the storage node to retrieve some data and send it back so that the compute node can operate on it.

The next challenge we are addressing is verification. Suppose that a compute node executes some code C and performs some action A. How do we verify that when the compute node executed C, it actually executed C and not some arbitrary piece of code and then proceeded to “lie” and tell us that it actually executed C? Further, how do we verify that the compute node did perform action A as the logic in C required?

Traditional blockchains overcome this problem of verification by implementing a proof of work (PoW) algorithm. The details of PoW are lengthy and we will not discuss them here, but it is sufficient to say that performing PoW is an expensive process and ends up guaranteeing that the node that executed C actually did execute C.

In our blockchain, we are employing a proof of stake (PoS) algorithm. This means that, to execute code, nodes are selected based on their stakes (i.e., the Spark tokens that the nodes have staked). On its own, this does not help us verify that the nodes are actually executing C. The stake of a node (what that node has put at risk of loss) is a strong incentive for the node to be truthful, since the stake is lost if the node is discovered to be lying. Without a system of verification, however, lying would go undiscovered, so nodes still have an incentive to cheat.
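One common way to select nodes “based on their stakes” is a weighted random draw, where a node's chance of being picked is proportional to the tokens it has staked. The sketch below shows that technique only as an assumption; this update does not specify the actual selection rule, and the function `select_node` and the node IDs are hypothetical.

```python
import random
from typing import Dict, Optional


def select_node(stakes: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick one node ID at random, with probability proportional to its stake."""
    if any(stake < 0 for stake in stakes.values()):
        raise ValueError("stakes must not be negative")
    if sum(stakes.values()) <= 0:
        raise ValueError("at least one node must have a positive stake")
    rng = rng or random.Random()
    node_ids = list(stakes)
    weights = [stakes[node_id] for node_id in node_ids]
    # random.choices performs the weighted draw; k=1 selects a single node.
    return rng.choices(node_ids, weights=weights, k=1)[0]
```

A node with nine times the stake of another is selected roughly nine times as often, and a node with no stake is never selected. Note that nothing in this draw says anything about whether the selected node then runs C honestly.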

To overcome the verification problem, we will employ verification nodes that will double-check the compute nodes to make sure that the compute nodes actually executed C. But does this actually solve the problem? We now run into an issue of cheating from the verification nodes. The verification nodes have an even stronger incentive to lie than the compute nodes do, since if a compute node is found to be a “bad actor,” as the saying goes, the verification nodes that discovered the false information are entitled to the compute node’s staked tokens.

Hence, requiring the verification nodes to arrive at consensus amongst themselves provides a practical solution to this problem. The process of arriving at consensus is to ask a set of verification nodes to all execute the same code, and compare the outputs they arrive at. If a sufficient number of nodes agree with the output of the compute node, we can trust that the compute node did in fact execute the transaction successfully.
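The comparison step can be sketched as below. The text above only says “a sufficient number” of nodes must agree, so the two-thirds threshold used here is an assumption picked for illustration, as are the function name `reaches_consensus` and its parameters.

```python
from typing import Any, Sequence


def reaches_consensus(
    compute_output: Any,
    verifier_outputs: Sequence[Any],
    numerator: int = 2,      # assumed threshold: at least 2/3 of verifiers agree
    denominator: int = 3,
) -> bool:
    """Return True if enough verification nodes reproduced the compute node's output."""
    if not verifier_outputs:
        return False  # with no verifiers there is nothing to trust
    agreeing = sum(1 for output in verifier_outputs if output == compute_output)
    # Integer comparison avoids floating-point rounding at the exact threshold.
    return agreeing * denominator >= numerator * len(verifier_outputs)
```

For example, if the compute node reports 2 and three verification nodes report 2, 2, and 3, the output is accepted under this threshold; if they report 2, 3, and 3, it is rejected.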

The reader might ask at this point why we do not simply use PoW. The answer is that we aim to be able to run the compute nodes on mobile phones as well as desktop computers. Because the limiting factor in our case is the mobile phone, we must take data and processing power into account. Since PoW is expensive in terms of processing power, it would be unwise to run a PoW algorithm on a mobile phone. Put another way, if the algorithm is easy enough for a mobile phone to execute, it is easy enough for a server with significantly more processing power to falsify.

We could easily solve the problem by guaranteeing that a compute node and verification node have not been tampered with; however, since our project will be open-source, this poses its own risks, as the code can be manipulated to serve the desires of a hacker. Arguably, even with a closed-source solution, we would still face the same problem because a malicious user can reverse-engineer the binaries that we provide, although it is considerably more difficult to modify a decompiled program than to modify a program given its original source code. We will explore this subject further in a future article.

#blockchain #sparkster