Bug on Hivemind’s following data

in #utopian-io, 5 years ago (edited)

Project Information

Problem

The Hivemind-backed api.steemit.com reports invalid or missing following data for some accounts when compared to a full node.

How to reproduce

  1. Query the user curbot's following list. (condenser_api.get_following)
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://api.steemit.com
  2. Do the same query on a full node (https://rpc.usesteem.com):
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://rpc.usesteem.com

You can see that the response from api.steemit.com is different and incomplete.
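The same comparison can be sketched in Python without any Steem library. This is a minimal sketch, not part of the original report: the helper names (`build_get_following_request`, `following_names`, `diff_following`) are hypothetical, and it assumes each entry in the JSON-RPC `result` list is an object with a `following` key. Fetching the responses (with curl, as above, or any HTTP client) is left out so the example stays offline.

```python
import json


def build_get_following_request(account, start=None, follow_type="blog", limit=100):
    """Build the same JSON-RPC body that the curl commands send."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "condenser_api.get_following",
        "params": [account, start, follow_type, limit],
        "id": 1,
    })


def following_names(response):
    """Extract followed account names from a decoded JSON-RPC response.

    Assumption: each entry in "result" is an object with a "following" key.
    """
    return {entry["following"] for entry in response.get("result", [])}


def diff_following(hivemind_response, full_node_response):
    """Return (missing_on_hivemind, missing_on_full_node) as sorted lists."""
    on_hivemind = following_names(hivemind_response)
    on_full_node = following_names(full_node_response)
    return sorted(on_full_node - on_hivemind), sorted(on_hivemind - on_full_node)
```

Decode each curl response with `json.loads` and pass both dictionaries to `diff_following`; a non-empty first list means the Hivemind-backed node is missing entries that the full node reports.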

A Python script to detect discrepancies

I believe this is not an exceptional case. I have seen more discrepancies like this while trying to test/benchmark the tower's new endpoints.

This Python script detects discrepancies in follower lists.

from steem import Steem
from steem.account import Account

HIVEMIND_NODE = "https://api.steemit.com"
FULL_NODE = "https://rpc.usesteem.com"


def get_followers(account, node):
    # Ask the given node for the account's follower list.
    return set(
        Account(account, steemd_instance=Steem(nodes=[node])).get_followers())


def get_diff(account):
    # Fetch the same follower list from both backends and compare them.
    followers_on_hivemind = get_followers(account, HIVEMIND_NODE)
    followers_on_full_node = get_followers(account, FULL_NODE)

    print(
        "Accounts listed on api.steemit.com but not in the rpc.usesteem.com")
    print(followers_on_hivemind.difference(followers_on_full_node))
    print("*" * 42)
    print(
        "Accounts listed on rpc.usesteem.com but not in the api.steemit.com")
    print(followers_on_full_node.difference(followers_on_hivemind))


if __name__ == "__main__":
    get_diff("emrebeyler")


The result for @emrebeyler's followers:

Accounts listed on api.steemit.com but not in the rpc.usesteem.com
set()
******************************************
Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'hariyati.amin', 'curbot', 'kenzyobiadi', 'erhanbute'}

After some digging, I found a rare case involving a differently formatted custom_json.

For example, I checked curbot's account history for the exact moment it followed my account, and found this transaction:

Transaction ID: aaccccb73b6dfcb4bbf95f6d2dcb76e1c87137e9

It looks like curbot was bundling multiple follow operations into one transaction, and steemd picked these up and registered them as valid follow actions.

However, Hive's indexer ignores the custom_json op if the loaded JSON's length is greater than 2.

https://github.com/steemit/hivemind/blob/f7a467921678d928a0d94928c811442b8ab80bce/hive/indexer/custom_op.py#L55

In this case the length is greater than 2 because the payload is a list of follow operations:

[
    ['follow', {
        'follower': 'curbot',
        'following': 'kevinwong',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'nothingismagick',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'simnrodrguez',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'steem-ua',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'decentraland',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'mikepm74',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'empath',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'emrebeyler',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'eroche',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'ervinneb',
        'what': ['blog']
    }]
]

This explains curbot.
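As an illustration only, here is a minimal sketch of how a parser could accept both the usual single-op form and the bundled form shown above. The function `iter_follow_ops` is hypothetical and is not Hivemind's actual code; whether Hive should accept bundled operations at all is a decision for its maintainers.

```python
import json


def iter_follow_ops(raw_json):
    """Yield (op_name, payload) pairs from a 'follow' custom_json body.

    Accepts the single-op form ["follow", {...}] and the bundled form
    [["follow", {...}], ["follow", {...}], ...]. Anything else is skipped.
    """
    data = json.loads(raw_json)
    if not isinstance(data, list) or not data:
        return
    # A single op starts with the op name; wrap it so both forms share a loop.
    if isinstance(data[0], str):
        data = [data]
    for item in data:
        if (isinstance(item, list) and len(item) == 2
                and isinstance(item[0], str) and isinstance(item[1], dict)):
            yield item[0], item[1]
```

With this shape, a bundled payload such as curbot's yields one pair per follow, and a regular payload yields exactly one pair, so the downstream handling does not need to know which form was used.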

Regarding my other 3 missing followers:

Follower       Following   Tx ID                                     Block num  Timestamp
erhanbute      emrebeyler  d10dcd1bdb661fc4e63f2464fa2262624db5d003  26710986   2018-10-11T09:55:21
kenzyobiadi    emrebeyler  9ef235eb36aac5e466b97ad3e459b7eb9495f898  26492393   2018-10-03T19:38:45
hariyati.amin  emrebeyler  383a36f7aa65724eb634ebdae141366674dc1df8  26450469   2018-10-02T08:41:33

The timestamps show that these follows happened between 2018-10-02 and 2018-10-11. These transactions don't involve anything unusual.

Additionally, I checked roadscape's followers on Steem:

I got these discrepancies:

{'curbot', 'kamvreto', 'msutyler'}

We already know the problem with curbot, so I checked the other accounts.

kamvreto followed roadscape at 2016-07-25T22:35:12.

Here is the account history output:

{
    'trx_id': '2b7595b1f3e0e0105156d518b83d7eeaa19b6070',
    'block': 3514062,
    'trx_in_block': 3,
    'op_in_trx': 0,
    'virtual_op': 0,
    'timestamp': '2016-07-25T22:35:12',
    'op': ['custom_json', {
        'required_auths': [],
        'required_posting_auths': ['kamvreto'],
        'id': 'follow',
        'json': '{"follower":"kamvreto","following":"roadscape","what":["posts","blog"]}'
    }]
}

It was a legacy custom_json transaction. The tricky part is that the transaction's what property includes two elements.

You can see the Follow constructor expects one element:

https://github.com/steemit/hivemind/blob/60dc61ee4bbde2080421a3fdf10c5b83be840e8b/hive/indexer/follow.py#L71

For this reason, Hive also ignores that operation.

The problem is the same with the other missing follower of roadscape:

{
    'trx_id': 'c7694ff17ba7ba3fbe1740f05c2727ecbd98cd62',
    'block': 3409232,
    'trx_in_block': 1,
    'op_in_trx': 0,
    'virtual_op': 0,
    'timestamp': '2016-07-22T06:18:27',
    'op': ['custom_json', {
        'required_auths': [],
        'required_posting_auths': ['msutyler'],
        'id': 'follow',
        'json': '{"follower":"msutyler","following":"roadscape","what":["posts","blog"]}'
    }]
}
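To make the legacy case concrete, here is a minimal sketch of a lenient parser for this payload shape. It is not Hivemind's code: the function `parse_legacy_follow` is hypothetical, and the interpretation of the what list ("ignore" means mute, "blog" means follow, other entries such as "posts" are skipped, and "ignore" wins if both appear) is an assumption made for illustration.

```python
import json


def parse_legacy_follow(raw_json):
    """Parse a legacy follow body (a bare JSON object) into a simple record.

    Returns None when the body is not an object or required keys are missing.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        return None
    follower = data.get("follower")
    following = data.get("following")
    what = data.get("what")
    if not follower or not following or not isinstance(what, list):
        return None
    # Assumption: pick one state from the list instead of requiring that
    # the list has exactly one element.
    if "ignore" in what:
        state = "ignore"
    elif "blog" in what:
        state = "blog"
    else:
        state = ""  # no recognized entry; treated as "no relationship" here
    return {"follower": follower, "following": following, "state": state}
```

Under these assumptions, both kamvreto's and msutyler's operations resolve to the "blog" state instead of being dropped.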

Expanding the sample size:

Discrepancies on @utopian-io's followers:

Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'qawazd', 'steemgems', 'curbot'}

Follower   Following   Tx ID                                     Block num  Timestamp
steemgems  utopian-io  25e9c3d8e625e634b68bd5e16e99327fd37174ae  26722368   2018-10-11T19:25:27
qawazd     utopian-io  8de43899a8ad84b8bd65a896e71e3e0eafda0757  26838941   2018-10-15T20:37:51

The follow operations are valid. Their dates, 2018-10-11 and 2018-10-15, are close to those of the follows missing from @emrebeyler's account.

TL;DR

  • Some follow ops are missing on api.steemit.com's Hive instance. (They are generally clustered around 2018-10.)

  • Hive ignores a follow operation if it bundles multiple follows, although steemd accepts it. (This is the case with @curbot.)

  • Hive ignores some legacy follow operations because these ops may include two elements in the what property. (Ex: ["posts", "blog"])

My GitHub Account

https://github.com/emre


Thanks for your contribution.

Apologies for the delay in review.

Your contribution is well detailed and the steps were very easy to follow. Overall I really like the amount of detail you put into the investigation, both within this contribution and inside the GitHub issue. This really is great!

I can see that a collaborator has acknowledged the issue which is also great to see.

Although no potential fix is provided, the level of detail you have added will considerably reduce the investigation required of any developer looking into this.

Overall, great work and once again, thanks for your contribution.


Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.


[utopian-moderator]

Thank you for your review, @tobias-g! Keep up the good work!

Great pickup! There are obviously still some teething issues with Hivemind, and there will always be a need for some full steemd nodes to enable these sorts of checks. The question is, who pays for them?

Witnesses! :-)

I am planning to fire up a full node. Just waiting for the top20. 🎉

Do you think it’s realistic to expect all Top 20 witnesses to run full steemd nodes (with 512 GB RAM instances and the cost that goes with this)?
I think it’s reasonable to expect them all to run Hive-based full nodes (2x64 GB + 32 GB instances), but a smaller subset will still need to run full steemd nodes. The question is how they are compensated for the extra cost.

I think it's reasonable to establish your own expectations for what witnesses at each level should be doing to deserve your vote, and it seems entirely reasonable for those expectations to depend on level.

(Cloud-based server fine under 50, physical server above 50, full node in top 20, for example)

2018-10-11 to 2018-10-15

Remind me again, what were the release dates for #hf20? Is this potentially related to one of the hotfixes applied at that time?

I don't think so. There is no problem on full nodes; they’re returning the correct data.

It might be a hiccup or a past bug in that timeframe. It's hard to say before doing a full index on a fresh Hivemind instance.

Would there be any way to detect this bug without a reference API node?

I don't think so. We need something to double check/cross reference. :)

Actually, I would love to learn this aspect of computing that deals with coding and things like this, but I fear that I will not be able to cope because of its level of complexity.


This post has been included in the latest edition of SoS Daily News - a digest of all you need to know about the State of Steem.

Hi @emrebeyler!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server

Hey, @emrebeyler!

Thanks for contributing on Utopian.
Congratulations! Your contribution was Staff Picked to receive a maximum vote for the bug-hunting category on Utopian for being of significant value to the project and the open source community.

We’re already looking forward to your next contribution!

