Open Sourcing of UA

in #utopian-io5 years ago

Repository

https://github.com/holgern/ua-python

steem-ua.png

user authority - a way to measure trust

UA works on the assumption that a user follow when he want's to see new posts from the account he starts to follow in his feed. When an account has a high UA value, it means that the account is followed by many other accounts as it has produced good content.

The whole thing doesn't work, though, since there are accounts that are followed by countless accounts without ever having produced any content at all.

One solution to this problem is the establishment of trusted accounts. Only accounts that were followed direct or indirect by trusted accounts have a UA value higher than zero. Who should be set as a trusted account? This is not an easy task. As witnesses in the steem blockchain have been elected, witnesses can be seen as somehow trusted accounts.

It can be seen on those figures from the paper "Combating Web Spam with TrustRank" from Z. Gyöngyi et al. how this works. Each trusted node gets an initial seed value, which propagates through the follow connection to followed accounts. The amount which is propagated depends on the number of following accounts.

There is damping. This means that an account which is not directly followed by a trusted account can accumulate less UA than an account which is directly followed by a trusted account.

Mathematics

We are starting with a static score distribution d which depends on the received witness votes (non-witnesses are set to zero)

d = []
for d in accounts:
   if d is witness:
      d.append(witness_votes)
   else:
      d.append(0)

The sum of d is normalized to 1

d = d/sum(d)

Now the UA_raw value can be computed by:

UA_raw = d
while True:
    UA_raw = alpha * T * UA_raw + (1-alpha) * d

where alpha is the decay factor and normally set to 0.85 and T is the transition matrix.

T*UA_raw is a matrix multiplication and is implemented as:

for a in accounts:
   new_value_a = 0
   for follow_a in a["followers"]:
      new_value_a += UA_raw_old[follow_a] / len(accounts[follow_a]["following])
      
   UA_raw_new[a] = alpha * new_value_a + (1-alpha) * d[a]

The algorithm is now extended by a virtual omega account. Whenever an account is nobody following (also called dangling account), a virtual follow to the virtual omega account which is following itself is added:

new_value_omega = 0
for follow_a in zero_followings:
    new_value_omega += UA_raw_old[follow_a]

UA_raw_omega = alpha * new_value_omega

Implementing the omega account is necessary, as without, the algorithm does not converge.

The sum of UA_raw_new + UA_raw_omega is now 1.

Example

In this simple example, B is followed by A. B is a dangling account. When no omega account is added, the sum is not 1 for every iteration. The algorithm is stable after two steps.

stepABsum
0101
11 * 0.151 * 0.851
21 * 0.150.15 * 0.850.277

Now, the omega account is added and assure that the sum is always 1. Now B is followed by A and omega is followed by itself and B.

stepABomegasum
01001
11 * 0.151 * 0.8501
21 * 0.150.15 * 0.850.85 * 0.851
31 * 0.150.15 * 0.85(0.85 * 0.85 + 0.15 * 0.85) * 0.851

Mapping UA_raw

UA_raw_new is mapped to a logarithm scale by:

def UA(x, N):
    score = math.log10(x * N + 1/N)*2 + 1
    if score < 0:
        score = 0
    if score > 10:
        score = 10
    return round(score, 3)

Technology Stack

The calculation of UA is divided into the following scripts:

streamCustomJsonOps.py

In order to calculate UA, all follow connection of all account must be freeze for the calculation. Thus, fetching just the follower for each account using the API does not work as this would take several hours and the first fetched accounts have a different follow state than the last ones.

The python script fetches all account creation operation and custom_json with follow id from the first block. By doing it this way, it is guaranteed that all follow connections for all accounts are known for the exact same time. The script parses account creation operation which are:

  • account_create
  • account_create_with_delegation
  • pow
  • pow2
  • account_create_with_delegation

and custom_json operation.

When a new account operation was found, it is stored into the database:

account_data = {"block_num": block_num, "type": type_id, "account_index": next_account_id, "account_name": account_name}

Whenever a custom_json with id=follow is found it is stored in the database. Stored are follower, following, the what field and the current block number.

build_follow_db.py

This script goes through the stored account creation and follow/unfollow operation and stores the current follow network state for a specific time point. The generated database contains all available account names and ids, and all follow connections.

In order to keep the database is small as possible, all follow connections are stored with the account number. Each follow is one data entry with:

{"follower": follower, "following": following, "state": state, "block_num": data["block_num"]}

where follower and following are the account ids.

At the moment this database has 112,732,000 entries. For the calculation of UA, all follow connection are needed. As they cannot be stored into RAM, they need to be read out when need from the database.

This process could be speeded up by storing the follow connection into a mongo database. The possibility to use a mongo database was added later and is optional.

When setting use_mongodb=True the accounts and its follower/followings are stored into a mongo database. This is the default setting.

init_accounts_for_ua.py

In this script, all parameter for UA calculation are reset and initialized.

set_trusted_accounts.py

Fetches all active witnesses and calculates the static score distribution for all accounts.

build_ua3.py

Implements the iterative loop to calculate UA_raw for all accounts. The calculation is stopped when 20 iterations are reached or when the difference between two iterations is smaller than 1e-8.

ua_top100.py

This script is used to rank all accounts and to calculate the top 100 accounts.

Roadmap

We will work on open sourcing more UA-related components and try to improve the documentation readability of the code.

How to contribute

Everyone can fork the repository and do a pull request. You can also contact us in our discord.

GitHub Account

https://github.com/holgern

Sort:  
  • Great leap forward for the steem ecosystem.
  • Code is clean and clear.
  • Looking forward the next steps in the project.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.


Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

Thank you for your review, @helo! Keep up the good work!

Very happy to finally see this and have little bit of an understanding on how things are working behind the scenes. @steem-ua is my second largest delegation. These systems can contribute a lot to the ecosystem on the long run due to being a system that is hard to game. It's also really nice to see that my UA Rank is basically 1500 ahead of my follower rankings. Thanks for everything you've done and Best of Luck For The Future!

nice to see this go OS!

Nice. Open source doesn’t mean losing but gaining in the long run. Good job

Awesome @steem-ua. Thank you for making this open source. I hope people will be able to improve this amazing algo to combat spam.

I was never keen on this inititiave, but it is more my belief everything becomes one big gratification ring here. I do keep an eye in my ranking out of curiosity. For most part, it seems reasonable. I know you’re trying to do good on the blockchain, so for that I am appreciative.

a great step in the right direction. Not sure about the 100% self vote??????

Gotta get that curation rewards you know.

hey, finally :)

You said it

Wow, very nice, great to see this.

Thank you very much for taking that step. I've been waiting very long for this. ;-)

Coin Marketplace

STEEM 0.27
TRX 0.13
JST 0.032
BTC 60782.22
ETH 2915.02
USDT 1.00
SBD 3.64