Steem Analytics - Distribution of Earnings


SteemAnalytics.png

This post is an investigative analysis. It forms part of a series in which I attempt to paint a broad picture of the Steem economy.


Introduction

I have been planning a series of analyses for a while. The aims are to paint a broad picture of the Steem economy, looking at where we are now and at where Steem might be headed.

In this first analysis I consider the distribution of earnings.

  • How much do users typically earn through posting and commenting on the Steem blockchain?
  • Is there an even distribution of rewards? Are most payouts taken by high earning accounts? Or do low earners collectively take home the largest piece of the pie?
  • How do curation and benefactor rewards impact the picture?

0. Data

I have based the earnings analysis on data from the first two weeks of September (September 1 - September 14 inclusive).

The value of post rewards is heavily influenced by the price of Steem. Given the high volatility of the crypto markets, different data periods will produce significantly different total rewards. However, in this analysis I am mainly interested in the comparative spread of earnings, which should hopefully be more stable. As a check that the data period is representative I have repeated the study with data from the first two weeks of August.

I decided to use posts and comments created in the 14 day period rather than posts and comments paid out over the 14 days, i.e. earnings accrued by activity rather than earnings accrued by payment. I believe that either approach would be appropriate.

All earnings amounts are expressed in STU throughout this analysis. Payouts in Vests, Steem and SBD are converted to STU using factors derived empirically for each hour of the 14 day period.
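As a rough illustration of the hourly conversion described above, here is a minimal sketch. The factor values, the `HOURLY_FACTORS` table and the function names are all hypothetical; the real factors are derived empirically by block.ops and are not reproduced here.

```python
from datetime import datetime

# Hypothetical factors: STU value of one unit of each currency, per hour.
# These numbers are invented for illustration only.
HOURLY_FACTORS = {
    datetime(2018, 9, 1, 0): {"STEEM": 0.90, "SBD": 0.95, "VESTS": 0.00045},
    datetime(2018, 9, 1, 1): {"STEEM": 0.92, "SBD": 0.96, "VESTS": 0.00046},
}


def to_stu(amount, currency, paid_at, factors=HOURLY_FACTORS):
    """Convert one amount to STU using the factor for the hour of payment."""
    hour = paid_at.replace(minute=0, second=0, microsecond=0)
    return amount * factors[hour][currency]


def payout_in_stu(payout, paid_at, factors=HOURLY_FACTORS):
    """Sum a mixed payout such as {"STEEM": 10, "SBD": 2} in STU."""
    return sum(to_stu(amount, currency, paid_at, factors)
               for currency, amount in payout.items())
```

A call such as `payout_in_stu({"STEEM": 10, "SBD": 2}, datetime(2018, 9, 1, 1, 59))` looks up the factors for the 01:00 hour and adds the converted amounts together.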

1. Distribution of earnings - Author earnings:

The first task was to produce a distribution of author earnings, i.e. earnings from posting and commenting. This was achieved by summing the author earnings for each user over the 14 days and then grouping users into buckets: users that earn <$1, users that earn <$2, and so forth. The number of users in each bucket provides the distribution.
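The summing and bucketing step might look something like the sketch below. It assumes the payouts are already converted to STU and supplied as `(author, amount)` pairs; the function names and the `width` / `cap` parameters are my own illustration rather than the block.ops implementation.

```python
from collections import Counter, defaultdict


def total_by_author(payouts):
    """Sum STU payouts per author from (author, stu) pairs."""
    totals = defaultdict(float)
    for author, stu in payouts:
        totals[author] += stu
    return dict(totals)


def bucket_label(total, width=50, cap=1500):
    """Label the fixed-width bucket a total falls into, e.g. '<50' or '1500+'.

    Assumes totals are non-negative.
    """
    if total >= cap:
        return f"{cap}+"
    upper = (int(total // width) + 1) * width
    return f"<{upper}"


def count_by_bucket(totals, width=50, cap=1500):
    """Count how many authors fall into each bucket."""
    return Counter(bucket_label(t, width, cap) for t in totals.values())
```

Passing `width=1` gives the fine-grained <$1, <$2 buckets, while `width=50` gives the coarser $50 buckets.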

However, it is difficult to get a clear visualisation of this distribution of author earnings. The distribution is broadly (negative) exponential, with the vast majority of users at the lower end of the scale, and with a long tail.

Using buckets of $50 (<$50 earned over the 14 days, <$100, ... up to $1500+) produces the following chart under a linear y-axis scale:

authorEarningsLinSep.png

Not particularly informative. Switching to a log scale for the y-axis makes the data visible but is not intuitive to read:

authorEarningsLogSep.png

Using small buckets (<$1, <$2 and so on) reduces the volume of the first bucket but only to a limited extent. The full chart on this basis would require a vast number of buckets (here are the first 50):

authorEarnings1BucketSep.png

Finally I have decided to create bespoke buckets that I feel are intuitive and best illustrate the distribution of earnings. The buckets are as follows:

  • Less than $1 in total over the 14 days;
  • Less than $1 per day (on average, i.e. $1-$14 in total);
  • $1 - $5 per day (on average, i.e. $14 - $70 in total); and
  • More than $5 per day (on average, i.e. $70+ in total).
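The bespoke buckets listed above can be expressed as a small classification function. This is only a sketch: the treatment of the exact boundaries ($1, $14 and $70 falling into the higher bucket) is my assumption, and the names are illustrative.

```python
from collections import Counter

DAYS = 14  # length of the study period in days


def bespoke_bucket(total, days=DAYS):
    """Assign a 14-day earnings total (in STU) to one of the four buckets."""
    if total < 1:
        return "less than $1 in total"
    if total < 1 * days:
        return "less than $1 per day"
    if total < 5 * days:
        return "$1 - $5 per day"
    return "more than $5 per day"


def bucket_shares(totals):
    """Percentage of users in each bucket, given a list of per-user totals."""
    counts = Counter(bespoke_bucket(t) for t in totals)
    n = len(totals)
    return {bucket: 100 * count / n for bucket, count in counts.items()}
```

Calling `bucket_shares` on the full list of per-user totals gives the percentages of users per bucket.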

authorEarningsBespoke.png

September 1-14

Already we have some interesting information. As can be seen, 71% of users earned less than $1 in author rewards in total over the 14 days. Another 19% earned less than $1 per day on average. 7% of users earned $1 - $5 per day on average and only 3% of users earned $5 per day or more (i.e. $70 or more over the 14 days).

To provide evidence that this data period was not unrepresentative I repeated the exercise for the first two weeks of August, the prior month. The distribution, which is broadly similar, is shown below:

authorEarningsBespokeAug.png

August 1-14

2. Spread of earnings - Author earnings:

The above analysis shows how many users earn at each earnings level (or bucket) over the 14 day period. But I was also interested in how the overall earnings were distributed across the different earning levels. Are most rewards distributed to the high earners? Or do lower earners take home the largest piece of the pie?

To generate this earnings spread distribution I replaced the count of users within each bucket with the sum of author earnings from all users in each bucket. The chart based on $50 buckets is as follows:
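Swapping the count for a sum is a small change in code terms. The sketch below assumes a list of per-user totals in STU and a bucketing function; `fifty_bucket` is a hypothetical helper for the $50 buckets, not something taken from block.ops.

```python
from collections import defaultdict


def fifty_bucket(total, width=50, cap=1500):
    """Hypothetical helper: label the $50 bucket for a non-negative total."""
    if total >= cap:
        return f"{cap}+"
    return f"<{(int(total // width) + 1) * width}"


def earnings_by_bucket(totals, bucket_of=fifty_bucket):
    """Sum the earnings (rather than count the users) in each bucket."""
    sums = defaultdict(float)
    for total in totals:
        sums[bucket_of(total)] += total
    return dict(sums)


def as_share(sums):
    """Express each bucket's earnings as a percentage of all earnings."""
    grand_total = sum(sums.values())
    if grand_total == 0:
        return {}
    return {bucket: 100 * s / grand_total for bucket, s in sums.items()}
```

Comparing the output of `as_share` with the user counts per bucket shows how much of the pie each earnings level takes.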

authorearningsSumSep.png

Being inquisitive by nature I had a look through the users in the $1500+ bucket (users earning in excess of $100 per day on average). After a brief investigation it became clear that I needed to remove the impact of voting bots.

3. Removing the impact of voting bots

As most readers will be aware, a fairly significant proportion of upvotes on Steem are currently purchased from voting bots. The use of voting bots can exaggerate a user's earnings as measured by post payout information, since the votes need to be paid for, reducing the user's net earnings. I decided that this element needed to be removed before I progressed any further.

In order to remove the impact of voting bots I used the following steps:

  • Creation of a list of voting bots;
  • Capture of all votes from each voting bot made on the posts included in the above author rewards analysis;
  • Calculation of the value of each vote;
  • Aggregation of these vote values by author;
  • Merging of the array of earnings by author and the array of voting bot deductions by author;
  • Creation of the earnings distribution from the new merged array of net author earnings.
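The steps above can be sketched roughly as follows. The bot names are placeholders, the vote values are assumed to be already calculated in STU, and the merge simply subtracts the deductions from gross earnings. Whether net earnings should be floored at zero is not something I have settled here, so the sketch leaves negative values as they are.

```python
from collections import defaultdict

# Placeholder account names; the real list of voting bots is not shown here.
BID_BOTS = {"examplebot1", "examplebot2"}


def bot_deductions(votes, bots=BID_BOTS):
    """Aggregate the STU value of bot votes by author.

    votes: iterable of (voter, author, vote_value_stu) tuples.
    """
    deductions = defaultdict(float)
    for voter, author, value in votes:
        if voter in bots:
            deductions[author] += value
    return dict(deductions)


def net_earnings(gross, deductions):
    """Merge gross author earnings with bot deductions, author by author."""
    return {author: earned - deductions.get(author, 0.0)
            for author, earned in gross.items()}
```

The resulting dictionary of net earnings can then be fed into the same bucketing as before.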

To illustrate the impact here is the earnings distribution from section 2 restated with the voting bot votes removed:

authorbidbotearningssumSep.png

There are a fair number of limitations here:

  • A more interesting approach might have been to deduct the amounts paid for each voting bot upvote and thus include voting bot profits and losses in the earnings distribution. However I was limited here by difficulties with the fx rates to convert between Steem and SBD transfers and the STU rewards.
  • I only excluded bid-bots but did not adjust for voting bots such as minnowbooster.
  • I probably should have added the voting bot deductions to the relevant vote-bot accounts in the earnings distribution. This would produce a more complete earnings distribution.

Plenty to work on in future!

4. Spread of earnings - bespoke buckets

Back to our bespoke buckets! We can now compare the count of users in each bucket with the sum of earnings in each bucket:

authorcountvearningsv2.png

It's an interesting chart with some tasty soundbites:

  • 2% of accounts (970 users) earn 57% of author rewards.
  • 9% of accounts (3907 users) earn 85% of author rewards.
  • 72% of accounts took home 1.5% of author rewards between them.

5. Curation and Benefactor rewards

Finally, how do curation and benefactor rewards impact the picture?

The distribution of curation rewards was included in the overall distribution by:

  • Taking each vote on posts included within the author earnings analysis (so the analysis considers a set of posts in completeness rather than votes made within the two week period - this felt like a more solid approach);
  • Capturing all votes with curation rewards (in Vests);
  • Translating the Vests rewards to STU;
  • Aggregating the curation rewards by user;
  • Merging of the array of curation rewards earnings by author and the array of author rewards earnings;
  • Creation of the earnings distribution from the new merged array.
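A minimal sketch of the curation steps above is shown below. It assumes each vote carries its curation reward in Vests plus a key for the hour of payout, and that a table of Vests-to-STU factors is available; the names and data shapes are illustrative assumptions.

```python
from collections import defaultdict


def curation_by_user(votes, vests_to_stu):
    """Aggregate curation rewards (converted to STU) by voter.

    votes: iterable of (voter, curation_vests, hour_key) tuples.
    vests_to_stu: mapping of hour_key -> STU value of one Vest (assumed).
    """
    rewards = defaultdict(float)
    for voter, vests, hour_key in votes:
        rewards[voter] += vests * vests_to_stu[hour_key]
    return dict(rewards)


def merge_earnings(*sources):
    """Add together per-user earnings from several sources (author,
    curation, benefactor), keeping users that appear in only one."""
    merged = defaultdict(float)
    for source in sources:
        for user, stu in source.items():
            merged[user] += stu
    return dict(merged)
```

Benefactor rewards, once aggregated by user in the same way, can be passed to `merge_earnings` as a further argument.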

Benefactor rewards were included using a broadly similar approach.

I have produced charts of the $50 bucket distribution:

50bucketscombinedSep.png

And the bespoke buckets distribution:

combinedbucketsSep.png

As can be seen, both curation rewards and benefactor rewards increase the proportion of rewards heading to high earning accounts.

6. Conclusions

In response to the original questions:

How much do users typically earn through posting and commenting on the Steem blockchain?
The majority of accounts earn very little. However it is too early to draw much in the way of conclusions from this data. Are these accounts new users? Are they bots? How much did they post? Are they really dedicated users gaining no rewards? Only a more in-depth study of these accounts would provide these answers.

Is there an even distribution of rewards? Are most payouts taken by high earning accounts? Or do low earners collectively take home the largest piece of the pie?
A small number of accounts, approximately 1000, claimed the majority of rewards (57% of net author rewards, or 67% once curation and benefactor rewards are included). It looks fairly safe to conclude that most payouts are taken by a small number of high earning accounts.

How do curation and benefactor rewards impact the picture?
Curation and benefactor rewards both skew the distribution towards high-earning accounts. This is an unexpected conclusion.


Next steps

In the next installment of this investigative series I will look at how earnings impact user retention and consider how many users the Steem blockchain can support.


Tools and Scripts

gears_blockops_green.jpg

I used the block.ops analysis system to produce this study. Block.ops is an open-source analysis tool designed for heavy-duty analyses of the Steem blockchain data.

You can find the repository for block.ops here:
https://github.com/miniature-tiger/block.ops

The study can be recreated by:

  • Loading the data for the relevant time period into block.ops.
  • Using the earningsdistribution command from the command line, for example:
    $ node blockOps earningsdistribution "2018-09-01" "2018-09-15"

Block.ops stores all posts and comments from the period in a MongoDB collection. The "earningsdistribution" command runs aggregation queries to summarise the results, then post-processes them for export to csv. Payout amounts are converted to STU using hourly fx factors derived from actual posts. I used the Numbers spreadsheet tool on the Mac for the chart illustrations. Eventually I will build my own charts for use with block.ops.


Relevant Links and Resources

Links are provided in the text.


Repository

https://github.com/steemit/steem

This analysis is of data from the Steem blockchain which is an open source project.


Thanks for reading!


Hi @miniature-tiger, that's an impressive work, really great!! We've had a couple of different earning reports in the past, but you've covered aspects and data groupings that I haven't seen before. I'm actually not very surprised about the general trends in the results, but seeing the distribution confirmed with data really emphasizes the situation. I especially like the approach to calculate the bid bots out. I think this gives a more realistic picture. Even though there might be more details that could be worked out as you mentioned, I guess the overall magnitude of this effect is reasonably reflected with your approach. Looking very much forward to your next post in this series!! :)

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.


Hey @crokkon,

"We've had a couple of different earning reports in the past"

Yes. When I started this analysis what I was really trying to look at was ideas around mass adoption and retention and how the different work / reward profiles of the various dApps might compare in building strong communities.

But the starting point was being able to generate the earnings distribution for a set of users (a dApp, a community, or overall) and when I looked at the overall earnings distribution I thought it was sufficiently interesting to post by itself.

I've got the overall earnings / retention part also pretty much complete now so I'll post that up in the next couple of days. There are some interesting conclusions coming out there too. The individual dApp analysis will take a little while longer.

Thanks for the review!

Hehe, it's indeed very often the case that it's not clear at the beginning what will come out at the end. Looking forward to the retention aspects! And you once more reminded me with your work to finally give blockOps a try - I hope I can make it! :)

Thank you for your review, @crokkon! Keep up the good work!

Thanks for the analysis. The most zoomed-in chart of earnings suggests a kind of censored data model. How many users make exactly 0? A chart starting at 0 with $0.001 increments could be very revealing. I suspect that a large number of users make 0 or some tiny amount, with a sharp downwards break to the "exponential curve." If so, earnings would be best modeled as a 2 step process: (1) the probability of receiving $0 (or whatever the cut-off is), and (2) the distribution of earnings, given positive earnings. This model also points to the process generating earnings. A large number of members make 0 (by not participating?) and the rest make earnings described by some smooth function up to random error.

Hey @rufusfirefly, that's a great point, thanks!

There is definitely a break between 0.000 and 0.015 due to the "minimum payout" threshold, under which posts which have an expected payout of less than 0.02 (so 0.015 after curation) do not actually pay out to users.

So you are right, there are a very large number of users who earn 0.000 - due to no votes received or insufficient votes to reach 0.02 payout on any post (I only include users in the author earnings distribution who make at least one post or comment).

Building an actual model of the earnings distribution would probably require looking at the distribution of the underlying reward shares (i.e. the raw vote amounts allocated to posts) with a conversion function to earnings payouts.

That underlying distribution will also most likely need separate processes between users that receive 0 votes/reward-shares and the remaining distribution.
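The two-step idea discussed in this exchange could be explored with something like the sketch below. The 0.02 threshold comes from the reply above; the 0.75 author share is only what the 0.02 to 0.015 figures imply, and the summary statistics are a placeholder for a proper fitted model.

```python
from statistics import mean, median

MIN_PAYOUT = 0.02    # minimum payout threshold mentioned above
AUTHOR_SHARE = 0.75  # assumption: implied by 0.02 becoming 0.015 after curation


def author_payout(pending):
    """Author payout for one post: zero below the threshold."""
    if pending < MIN_PAYOUT:
        return 0.0
    return pending * AUTHOR_SHARE


def two_part_summary(totals):
    """Step 1: share of users earning exactly zero.
    Step 2: simple summaries of earnings among users with positive totals.
    """
    positives = [t for t in totals if t > 0]
    p_zero = 1 - len(positives) / len(totals)
    return {
        "p_zero": p_zero,
        "mean_positive": mean(positives) if positives else 0.0,
        "median_positive": median(positives) if positives else 0.0,
    }
```

A fuller model would replace the mean and median with a distribution fitted to the positive earnings, or to the underlying reward shares.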



We all know who earns the 1500 at the end of the graphs without the aid of bidbots....13% of the reward pool at any given point in time not including his alt accounts curation rewards.
Absolutely brilliant stats, and so readable for thickies like me. Looking forward to your next analysis
Thanks and best wishes :-)

Thanks Nathen!

There were 11 accounts in the 1500+ author bucket pre voting-bot deductions, and 5 once voting bots were removed. Once you add in curation and benefactor rewards I'm pretty sure the user you're thinking of wasn't the top earner, even with the additional curation rewards from their alt account. If I were to add in the witness rewards, there would be some fair competition for the top spot.

The use of alt accounts is a good point. There's no concrete way to sum across the different accounts of a single user. I'm guessing a lot of the low earning accounts may belong to single users trying to farm which will upset the distribution somewhat.

11 Accounts? That sounds like a challenge to my 'manual' statistical analysis lol...I love statistics and figures and must learn some modern basic interrogatory analysis techniques. I'm thinking a 30 year old knowledge of COBOL isn't going to cut the mustard in 2018!

Posted using Partiko Android

Goodness! I am absolutely looking forward to that next article. How many indeed? This is the kind of statistical analysis that we need. I don't know how it can impact the behavior of minnow users, but I think it should absolutely impact the behavior of dolphins on up, if, indeed, this is not the distribution we think is healthiest.

Do you take SBI into account and do you think that's useful to be aware of?

Posted using Partiko Android

Hey @improv!

For SBI, to the extent that users receive the income through upvotes on their posts / comments then this would be included. Any bonus payments paid outside of post rewards (e.g. through transfers) would not be included.

There's lots more analysis work to do but I'm hopeful of getting a greater understanding of the retention / earnings relationship - more on this in the next issue! Then looking at things like SBI / minnowsupport and trying to see what actually makes a difference could be a really great addition!

I am fascinated by retention. I think a lot of it, most of it, is self-motivation (based on my experience with friends) but it is made much easier with support systems, like communities and votes.

My anecdotal experience is that I have friends who I've gotten on here who come and go, but the ones who have done even one @freewritehouse post are, even if they don't blog regularly, more likely to come back and stay for a couple days before disappearing again.
And the ones for whom I've purchased SBI are more likely to come by at least once a week. I'd guess that regular sporadic engagement (if that's a thing) leads to regular regular engagement.

Guesses. Hypotheses. I look forward to your actual analysis.

Good points!

I need to be careful with drawing any conclusions. There's plenty of causation/correlation style traps on retention. Whether users leave because they don't earn or whether they don't earn because they leave or whether they stay because they engage and this has the side effect that they earn but they would stay anyway.

Exactomundo.

Jesus. I think I may just about sneak into that elite group.

Along with most other people who attended SF3.

You just confirmed how much of a bubble that event was.

A pleasant bubble, but a bubble now firmly popped.

I have you at around the 1500 position mark, so at the top of the second bucket for the first two weeks of September.

Although clearly for individuals the position will depend heavily upon which dates are chosen for the study and whether those dates coincide with when you were posting. It's really more of a broad-brush picture across the whole blockchain.

For author rewards there can only be a relatively small core of accounts with reasonable earnings with the Steem price at current levels. There's only a certain amount of rewards to go around. But I'll talk more about that in another post.

OK thanks for that - is that author - beneficiary (ie steempress) + curation?

My steem activity has actually been quite up and down since I've been between houses, so I guess my ranking will be up and down.

Top 1500 is quite good anyway - and it's taken me a while to get even there!

So next step - bash out a bot that reports this to people once a week. Then one that compares earnings to UA... maybe develop a 'most overvalued user' ranking and get @arcange to create a new badge of honour/ shame.

This is the kind of post I've been waiting for!

Posted using Partiko Android

I think that it was a fairly similar ranking on author earnings (which is calculated with beneficiary deductions and voting bots removed) and on combined earnings (author plus curation rewards from your voting plus any beneficiary rewards that you receive (zero for most people)).

I think that there are quite a few complications that would need to be solved before getting an accurate individual report. I’ve made no adjustments for delegation costs for example, since these can be paid for months in advance. I have the data but it would need a lot more work!

There's many complications. As long as yer clear about the limitations, it's very useful analysis!

Posted using Partiko Android

are u optimistic about the future of steem?

I am, but it will take some time to get users back... hopefully we haven't missed our chance... it seems there are networks in place that can be jump started

I think that Steem is currently branching in different directions:

  • Content creation (blogging, videos, art)
  • Task-reward economy (Utopian, Oracle-D, Musing, Actifit)
  • Product / institution reviews
  • Discussion board?
  • Social network?

Each of those top 3 areas could succeed (I think task-reward is potentially the most interesting) but they all need some form of revenue model.

great summary, ty ... what is Oracle-D?

It's here: @oracle-d

Bringing businesses to Steem who then leverage Steem's army of content creators and copywriters to produce material. Lots of Steem Power delegation. I need to read more around how they're getting on.

ty, i just got on board with them. have a good one !

Interesting stuff - so many buckets!

Really nice analysis! :)
Thank you very much for sharing this data, it is very useful.

These write ups are AWESOME about what is going on inside the chain!

Posted using Partiko iOS

Glad to be of use!

Hey, @miniature-tiger!

Thanks for contributing on Utopian.
Congratulations! Your contribution was Staff Picked to receive a maximum vote for the analysis category on Utopian for being of significant value to the project and the open source community.

We’re already looking forward to your next contribution!

