Data is increasingly recognised as a valuable asset. So much so, in fact, that data marketplaces have opened up, establishing an emerging data economy. This has created a wealth of profit-making opportunities that most people are still unaware of. Having worked closely with leading data marketplaces for over a year, I decided to try my hand at something new: arbitrage with data as an asset.
Disjoint Data Economies
In building ArBot, I first looked at the leading data marketplaces that were actually able to assign a value to datasets according to their usefulness. This quickly led me to two platforms: Ocean Protocol and Fetch.AI.
Ocean Protocol is a platform for buying and selling data. On the platform, datasets carry value according to their popularity and usability. Fetch.AI, in contrast, is an IoT economy which routes data to those who find it most useful. Through this routing, the data provider is paid by the data consumer for their services. Both Ocean Protocol and Fetch.AI are oriented toward similar datasets, with a focus on performance in training machine learning algorithms.
There is a clear opportunity for bridging the marketplaces of Fetch.AI and Ocean Protocol. This is where I started. Both platforms have an excellent Python SDK, so I went with Python. The back-end of ArBot ultimately became a module for moving data between the two markets. I packaged it all as a pip-installable package so anyone can move datasets between the two marketplaces.
Any given dataset may carry one price on Ocean Protocol and another on Fetch.AI. This opens up the opportunity to perform arbitrage with data as an asset. This is where I took the next step. Both Ocean Protocol and Fetch.AI use cryptographic tokens as a means of exchange, so I started by translating token prices into an independent measure of value, the US dollar. After accounting for network fees, I could accurately see the price differences between datasets.
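The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Quote` class, its field names and the `net_spread_usd` function are hypothetical and not part of ArBot or either platform's SDK, and the token-to-dollar rates and fees are supplied by the caller rather than fetched from a live source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """A dataset price on one marketplace (hypothetical structure)."""
    token_price: float  # price of the dataset in the marketplace's own token
    token_usd: float    # assumed exchange rate: US dollars per token
    fee_usd: float      # assumed network fee for the transaction, in US dollars

    def usd(self) -> float:
        # Convert the token-denominated price to US dollars.
        return self.token_price * self.token_usd


def net_spread_usd(buy: Quote, sell: Quote) -> float:
    """Dollar profit from buying at `buy` and selling at `sell`, after fees."""
    cost = buy.usd() + buy.fee_usd         # what we pay, including the fee
    proceeds = sell.usd() - sell.fee_usd   # what we receive, net of the fee
    return proceeds - cost
```

A positive result means the dataset sells for more on the second marketplace than it costs on the first once both fees are paid; a zero or negative result means there is nothing to gain.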
Using the dataset-moving module from earlier, I created a simple arbitrage automation function that took in a search query and executed trades only in cases where profit could be made. I had arrived where I wanted: a profit-making opportunity in the emerging data economy.
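To make the idea concrete, here is a minimal sketch of such a function. It is not ArBot's actual code: `Offer`, `profitable_pairs` and `run_arbitrage` are hypothetical names, the two search functions and the `execute` callback stand in for whatever the marketplace SDKs and the dataset-moving module provide, and prices are assumed to be already converted to US dollars with fees included.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Offer:
    """A listing to buy from or a request to sell to (hypothetical)."""
    dataset_id: str
    price_usd: float  # assumed to be fee-adjusted and already in US dollars


def profitable_pairs(
    asks: Iterable[Offer], bids: Iterable[Offer], min_profit_usd: float = 0.0
) -> List[Tuple[float, Offer, Offer]]:
    """Return (profit, ask, bid) for every pair whose profit exceeds the threshold."""
    bids = list(bids)
    pairs = [(bid.price_usd - ask.price_usd, ask, bid) for ask in asks for bid in bids]
    profitable = [p for p in pairs if p[0] > min_profit_usd]
    # Most profitable pairs first.
    return sorted(profitable, key=lambda p: p[0], reverse=True)


def run_arbitrage(
    query: str,
    search_asks: Callable[[str], Iterable[Offer]],
    search_bids: Callable[[str], Iterable[Offer]],
    execute: Callable[[Offer, Offer], None],
    min_profit_usd: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """Search both markets for `query` and execute only the profitable trades."""
    executed = []
    for profit, ask, bid in profitable_pairs(
        search_asks(query), search_bids(query), min_profit_usd
    ):
        execute(ask, bid)  # caller-supplied: buy from `ask`, sell to `bid`
        executed.append((ask.dataset_id, bid.dataset_id, profit))
    return executed
```

Passing the search and execution steps in as callables keeps the decision logic separate from the marketplaces, so it can be tried with stub data before any tokens are spent. The sketch pairs every seller with every buyer and does not check whether a dataset has already been traded.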
ArBot is a tool for triangular arbitrage with Fetch tokens, Ocean tokens and data. Where things get complicated is judging the value of datasets: in contrast to the tokens, datasets are non-fungible. To make matters worse, two completely different datasets could be equally valuable. Consider a disease-diagnosing AI which could use either a malaria dataset or a pneumonia dataset to improve its accuracy by 2%. To the AI, both are equally valuable; however, that equivalence would not be immediately clear to a human.
Managing Data Arbitrage Risk with Specificity
The key parameter for managing execution risk in data arbitrage is specificity. By limiting ArBot’s search space, we get stronger guarantees that the consumer it is selling to is receiving what they expect. Thus, specificity allows users of ArBot to choose their own appetite for risk. At low levels of specificity, there is a high risk that the consumer will reject the dataset offering, but there are far more opportunities available. At high levels of specificity, there are fewer opportunities, but the consumer is much more likely to buy.
An example of a high-risk strategy for ArBot is feeding it the query ‘malaria.’ There may be a seller of data labelled malaria on Ocean Protocol and a buyer looking for malaria on Fetch.AI, but their uses for the data might be different. The seller could be in possession of treatment data, while the buyer may want to train their disease-identifying AI with symptom data. In this case, provided the seller’s price is lower than the buyer’s (factoring in network fees), ArBot will go ahead and accept the risk that the buyer does not follow through with the transaction. The benefit of this strategy, however, is that there will likely be a multitude of results for the query malaria, many of which will result in successful arbitrage.
In contrast, a low-risk strategy would be to feed ArBot the query ‘malaria symptom dataset, AI classification.’ In this case, there may be few results, but the buyer and seller likely refer to the exact same thing. With a low-risk strategy, the probability of failed arbitrage is low.
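The trade-off between the two strategies can be illustrated with a toy matcher. This is an assumption made for illustration, not a description of how ArBot or either marketplace actually searches: a listing matches only if its description contains every word of the query, so a longer query narrows the result set. The function names and sample descriptions are hypothetical.

```python
def tokens(text: str) -> set:
    """Lower-case words of `text`, with surrounding commas and full stops removed."""
    words = (word.strip(".,").lower() for word in text.split())
    return {word for word in words if word}


def matches(query: str, description: str) -> bool:
    """True if every word of the query appears in the description."""
    return tokens(query) <= tokens(description)


def search(query: str, descriptions: list) -> list:
    """Return the descriptions that satisfy the query."""
    return [d for d in descriptions if matches(query, d)]


listings = [
    "Malaria treatment records",
    "Malaria symptom dataset for AI classification",
]

broad = search("malaria", listings)
narrow = search("malaria symptom dataset, AI classification", listings)
```

Here `broad` contains both listings, including the treatment data a symptom-focused buyer may reject, while `narrow` contains only the symptom dataset.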