[Analysis] Blocktrades worldcup contest #mypicks analysis on choices made

in #utopian-io, 6 years ago (edited)

Repository

https://github.com/superoo7/worldcup

Introduction

In my previous development update for the @blocktrades World Cup competition, I described the task I was assigned to work on. It is essentially a data extraction from more than 2000 posts on the Steem blockchain, needed to determine the final winners of the competition.

I decided to set up this repo as a reference for the data: https://github.com/superoo7/worldcup

Outline

  • Scope
  • Summary
  • Data Extraction
  • Data Visualization
  • Possible errors
  • Conclusion
  • Tools and Scripts
  • Relevant Links and Resources

Scope

Screen Shot 2018-06-17 at 11.15.03 AM.png
Time in Ghana (GMT+0), used as the reference time for the start of the first match of FIFA 2018

The data was extracted on 16th June 2018 and covers posts made before 14/06/2018 6pm GMT+0, just before the first match of FIFA 2018.

Results

Summary

A total of 2481 posts were submitted for this competition. To be valid, an entry needs to fulfill a few rules:

  • Last edit before 14/06/2018 6pm GMT+0.
  • Reputation > 35.
  • Use the tag of #blocktradesworldcup and #mypicks.
  • Submissions in English only.
  • Follow the template given, using w, l, t to indicate the winning condition.
  • If a logically invalid condition is received (e.g. 2 winning teams), the choice is considered n/a.

After filtering, 1815 posts are left, and these are used in this analysis.
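The machine-checkable rules above (tags, reputation, last edit time) could be expressed roughly as follows. This is only a sketch: the `EntryPost` shape, the constant names and the sample data are assumptions made for illustration, not the code in the repository, and the English-only and template rules are not covered here.

```typescript
// Hypothetical, simplified shape of one post after loading it from SteemSQL.
interface EntryPost {
  author: string;
  reputationScore: number; // displayed reputation, not the raw blockchain value
  tags: string[];
  lastUpdate: Date;
}

// Cutoff from the contest rules: 14/06/2018 6pm GMT+0.
const CONTEST_CUTOFF = new Date("2018-06-14T18:00:00Z");
const REQUIRED_TAGS = ["blocktradesworldcup", "mypicks"];
const MIN_REPUTATION = 35;

// True when the post carries both tags, was last edited before the cutoff,
// and its author has a reputation strictly above the minimum.
function isEligible(post: EntryPost): boolean {
  const hasAllTags = REQUIRED_TAGS.every((tag) => post.tags.indexOf(tag) !== -1);
  const editedInTime = post.lastUpdate.getTime() < CONTEST_CUTOFF.getTime();
  const reputable = post.reputationScore > MIN_REPUTATION;
  return hasAllTags && editedInTime && reputable;
}

// Made-up sample data, for illustration only.
const samplePosts: EntryPost[] = [
  { author: "alice", reputationScore: 52, tags: ["blocktradesworldcup", "mypicks"], lastUpdate: new Date("2018-06-13T09:00:00Z") },
  { author: "bob", reputationScore: 30, tags: ["blocktradesworldcup", "mypicks"], lastUpdate: new Date("2018-06-13T09:00:00Z") },
  { author: "carol", reputationScore: 60, tags: ["blocktradesworldcup", "mypicks"], lastUpdate: new Date("2018-06-15T08:00:00Z") },
];

const eligibleAuthors = samplePosts.filter(isEligible).map((p) => p.author);
console.log(eligibleAuthors); // only "alice" passes all three checks
```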

Data Extraction

The data extraction process is as follows:

  • Check reputation of the authors.
  • Check whether an author made multiple posts.
  • Extract the data using string replacement and regex.
  • Data Visualization

unknown (3).png
Script written in TypeScript to extract the data with regex and string replacement.

Based on SteemSQL, there are 2481 posts entered in the competition, identified by the tags #blocktradesworldcup and #mypicks.

After the first filter on author reputation (reputation > 35), 263 users violated the rule, which reduced the number of posts to 2218.
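For context, the reputation shown on steemit.com (the value compared against 35) is derived from a much larger raw number stored on chain. A rough sketch of the commonly used conversion follows. It assumes that the `Accounts.reputation` column holds the raw value, and the formula is a simplified form of the one used by Steem front ends, written from memory rather than taken from the repository. The function name is hypothetical.

```typescript
// Convert a raw Steem reputation value into the familiar displayed score.
// Simplified form: floor(max(log10(|raw|) - 9, 0) * sign * 9 + 25).
function toDisplayedReputation(raw: number): number {
  if (raw === 0) {
    return 25; // accounts with no reputation history start at 25
  }
  const sign = raw < 0 ? -1 : 1;
  const log10 = Math.log(Math.abs(raw)) / Math.LN10;
  const scaled = Math.max(log10 - 9, 0) * sign * 9 + 25;
  return Math.floor(scaled);
}

// Contest rule: reputation must be strictly greater than 35.
function passesReputationRule(raw: number): boolean {
  return toDisplayedReputation(raw) > 35;
}

console.log(toDisplayedReputation(20000000000)); // a raw value of 2e10
console.log(passesReputationRule(1000)); // tiny raw values stay at 25
```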

Then, the script checks whether an author made multiple posts. These posts will be checked later, either manually or by a script, so that the user's data can be added to the existing data. This reduces the number of posts to 2018.
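One simple way to set aside authors with multiple posts is to group the posts by author first. The sketch below illustrates the idea; the `AuthoredPost` shape and the function name are hypothetical and not taken from the repository.

```typescript
// Hypothetical minimal shape: only the fields needed for grouping.
interface AuthoredPost {
  author: string;
  permlink: string;
}

// Split posts into those from single-post authors and those that need a
// later review because their author submitted more than once.
function splitByAuthorCount(posts: AuthoredPost[]): { single: AuthoredPost[]; needsReview: AuthoredPost[] } {
  // Object.create(null) avoids clashes with built-in keys such as "constructor".
  const byAuthor: Record<string, AuthoredPost[]> = Object.create(null);
  for (const post of posts) {
    if (!byAuthor[post.author]) {
      byAuthor[post.author] = [];
    }
    byAuthor[post.author].push(post);
  }

  const single: AuthoredPost[] = [];
  const needsReview: AuthoredPost[] = [];
  for (const author of Object.keys(byAuthor)) {
    const group = byAuthor[author];
    if (group.length === 1) {
      single.push(group[0]);
    } else {
      needsReview.push(...group);
    }
  }
  return { single, needsReview };
}

// Made-up sample data, for illustration only.
const groupedSample = splitByAuthorCount([
  { author: "alice", permlink: "my-picks" },
  { author: "bob", permlink: "my-picks-1" },
  { author: "bob", permlink: "my-picks-2" },
]);
console.log(groupedSample.single.length, groupedSample.needsReview.length);
```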

Lastly, the data is extracted as w (win), l (lose), t (tie) or o (n/a), always expressed from the point of view of the Left Hand Side (LHS) team. For Russia VS Saudi Arabia, w indicates that Russia wins and l indicates that Saudi Arabia wins. In addition, the script is able to extract data from both HTML tables and Markdown tables.
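To illustrate the LHS convention, here is a minimal sketch of how one Markdown table row could be turned into a single w/l/t/o value. The row layout (`| Russia | w | l | Saudi Arabia |`), the regex and the function names are assumptions for this example. The real contest template and the repository's parser may differ.

```typescript
type Outcome = "w" | "l" | "t" | "o";

// Combine the marks written for the LHS and RHS teams into one outcome from
// the LHS point of view. Contradictory pairs (two winners, two losers,
// a tie on one side only) become "o" (n/a).
function resolveOutcome(lhsMark: string, rhsMark: string): Outcome {
  const a = lhsMark.trim().toLowerCase();
  const b = rhsMark.trim().toLowerCase();
  if (a === "w" && b === "l") return "w";
  if (a === "l" && b === "w") return "l";
  if (a === "t" && b === "t") return "t";
  return "o";
}

// Hypothetical row layout: | LHS team | mark | mark | RHS team |
const ROW_PATTERN = /^\|?\s*([^|]+?)\s*\|\s*([wlt])\s*\|\s*([wlt])\s*\|\s*([^|]+?)\s*\|?\s*$/i;

// Parse one table row; returns null when the line does not match the layout.
function parseRow(line: string): { lhs: string; rhs: string; outcome: Outcome } | null {
  const match = ROW_PATTERN.exec(line);
  if (!match) {
    return null;
  }
  return { lhs: match[1], rhs: match[4], outcome: resolveOutcome(match[2], match[3]) };
}

console.log(parseRow("| Russia | w | l | Saudi Arabia |"));
console.log(parseRow("| Egypt | w | w | Uruguay |")); // two winners, so n/a
```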

Data Visualization

Using Tableau for data visualization, I was able to build a few plots.

Combination of choices

Screen Shot 2018-06-17 at 10.45.12 AM.png
The number of combinations of choices made by all authors

Although we would expect every combination to be unique (since everyone would make different choices), a small group of people still chose the same outcomes across all 48 matches.
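Counting identical combinations can be done by joining each author's choices into one string and using it as a key. The sketch below uses made-up three-match entries instead of the real 48-match data, and the function names are hypothetical.

```typescript
// Count how many entries share each full combination of choices.
function countCombinations(entries: string[][]): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const choices of entries) {
    const key = choices.join(""); // e.g. ["w", "l", "t"] becomes "wlt"
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Return only the combinations chosen by more than one author.
function sharedCombinations(counts: Record<string, number>): string[] {
  return Object.keys(counts).filter((key) => counts[key] > 1);
}

// Made-up sample: two authors picked exactly the same three outcomes.
const comboCounts = countCombinations([
  ["w", "l", "t"],
  ["w", "l", "t"],
  ["l", "l", "t"],
]);
console.log(comboCounts, sharedCombinations(comboCounts));
```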

Choices made

Sheet 1.png

Based on the data collected, there are 87,120 (1815*48) choices, of which 293 are invalid due to invalid winning conditions (2 winning teams, 2 losing teams, etc.). These account for 0.3363% of the total number of choices.
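The figures above can be re-derived with a few lines of arithmetic. The variable names are chosen for this example only.

```typescript
const validPosts = 1815;
const matchesPerPost = 48;
const invalidChoices = 293;

// Every valid post contributes one choice per group-stage match.
const totalChoices = validPosts * matchesPerPost;

// Share of invalid choices, as a percentage of all choices.
const invalidSharePercent = (invalidChoices / totalChoices) * 100;

console.log(totalChoices); // 87120
console.log(invalidSharePercent.toFixed(4) + "%"); // 0.3363%
```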

Compilation of all data
Sheet 2.png

This is the compiled view of users' choices. Green indicates the LHS as the winner, blue indicates a draw, red indicates the LHS as the loser (and the RHS as the winner), and orange indicates invalid data.
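The per-match counts behind a chart like this can be produced with a simple tally. The sketch below is only an illustration with made-up two-match entries. The type and function names are hypothetical, and treating a missing choice as n/a is an assumption.

```typescript
type Choice = "w" | "l" | "t" | "o";

interface MatchTally {
  w: number; // LHS wins (green)
  t: number; // draw (blue)
  l: number; // LHS loses (red)
  o: number; // invalid (orange)
}

// Count, for every match index, how many authors picked each outcome.
function tallyByMatch(entries: Choice[][], matchCount: number): MatchTally[] {
  const tallies: MatchTally[] = [];
  for (let i = 0; i < matchCount; i++) {
    tallies.push({ w: 0, t: 0, l: 0, o: 0 });
  }
  for (const choices of entries) {
    for (let i = 0; i < matchCount; i++) {
      const choice: Choice = choices[i] || "o"; // missing choice counted as n/a (assumption)
      tallies[i][choice] += 1;
    }
  }
  return tallies;
}

// Made-up sample: three authors, two matches.
const sampleChoices: Choice[][] = [
  ["w", "l"],
  ["w", "t"],
  ["l", "t"],
];
const sampleTallies = tallyByMatch(sampleChoices, 2);
console.log(sampleTallies);
```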

Let's just take a few played matches as examples for analysis.

  1. Russia VS Saudi Arabia (W)
    Russia won this game, and the majority picked the correct outcome.

  2. Egypt VS Uruguay (L)
    Uruguay won this game, and the majority also picked Uruguay.

  3. Morocco VS Iran (L)
    Iran won this game, but the majority picked Morocco to win.

  4. Portugal VS Spain (T)
    This match was a draw, but based on the choices, Spain was favoured.

  5. France VS Australia (W)
    France won this game, and the majority again picked the correct outcome.

  6. Argentina VS Iceland (T)
    This was another tie, but the majority picked Argentina.

If you are interested in a more in-depth analysis of each match, you can check out @petermail's post from 7 days ago.

Possible errors

  • There are still 200 posts to be reviewed due to duplication; I will work on those manually or by script.
  • I added a lot of tests with Jest, a testing framework, while creating this analysis, so I think my script covers most cases (including HTML <table> tables instead of Markdown tables).
  • In some cases, authors did not follow the template given and used their own version: Win instead of W, country names changed into a non-English language, or misspelled country names (English instead of England).
  • The script also only accepts countries listed in the order given in the template.
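Deviations such as Win instead of W or English instead of England could in principle be reduced with a normalization step before parsing. The following is only a sketch of that idea: the alias tables and function names are hypothetical examples and do not describe what the script in the repository actually does.

```typescript
// Hypothetical alias tables; a real list would need to be built from the posts.
const MARK_ALIASES: Record<string, string> = {
  w: "w", win: "w",
  l: "l", lose: "l", loss: "l",
  t: "t", tie: "t", draw: "t",
};
const COUNTRY_ALIASES: Record<string, string> = {
  english: "England",
};

function hasOwnKey(table: Record<string, string>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(table, key);
}

// Map a free-form mark to w/l/t, or to "o" (n/a) when it is not recognised.
function normalizeMark(raw: string): string {
  const key = raw.trim().toLowerCase();
  return hasOwnKey(MARK_ALIASES, key) ? MARK_ALIASES[key] : "o";
}

// Replace known misspellings; unknown names are returned trimmed but unchanged.
function normalizeCountry(raw: string): string {
  const trimmed = raw.trim();
  const key = trimmed.toLowerCase();
  return hasOwnKey(COUNTRY_ALIASES, key) ? COUNTRY_ALIASES[key] : trimmed;
}

console.log(normalizeMark(" Win "), normalizeCountry("English"));
```

A table-driven approach like this keeps the parsing regex simple, at the cost of maintaining the alias lists by hand.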

Conclusion

I am glad that I was assigned to carry out this task. Just wait a few more days until the World Cup group stage is over, and we will know who the winners are!

I would suggest that the contest holder create a simple template for users to share their picks, to prevent confusion and avoid the need for such a complex extraction. The project is open source but requires a SteemSQL subscription in order to use it. If you don't have a SteemSQL subscription, you can try the JSON file in the repository for data analysis.

Tools and Scripts

  • SteemSQL - Extracting data
  • TypeScript - To run the data extraction with regex and string replacement
  • Jest - To test individual functions created for data extraction
  • Tableau - Data visualization tool

Relevant Links and Resources

-- Fetch every top-level post tagged for the contest, with the author's reputation
SELECT
  Comments.author,
  Accounts.reputation,
  Comments.permlink,
  Comments.json_metadata,
  Comments.created,
  Comments.last_update,
  -- Rebuild the steemit.com URL of the post
  'https://steemit.com/' + Comments.parent_permlink + '/@' + Comments.author + '/' + Comments.permlink  as url,
  Comments.body
FROM Comments
LEFT JOIN Accounts ON Comments.author = Accounts.name
WHERE
  -- depth 0 means a top-level post, not a reply
  Comments.depth = 0 AND
  CONTAINS(Comments.json_metadata, 'blocktradesworldcup') AND
  CONTAINS(Comments.json_metadata, 'mypicks') AND
  -- Created and last edited before the contest cutoff (14/06/2018 6pm GMT+0)
  Comments.created < '2018-06-14 18:00:00' AND
  Comments.last_update < '2018-06-14 18:00:00'

Proof of Authorship

https://github.com/superoo7/worldcup

wow what? haha

Damn, this facebook old tricks all over again 🙈

LOL... fun to play :P

Well done. I like the visualisation of unique picks. I will also post some interesting statistics e.g. no user guessed right all results after 8 matches and 124 users don't have any of the 8 results right.

really great work done by u! maybe you should share on Utopian on analysis too!

I posted some more nice graphs and statistics. After 19 matches, 242 users are above Steemit's average guess and 1096 are below.
https://steemit.com/utopian-io/@petermail/world-cup-result-analysis
WorldCup11.png
Points distribution after 19 matches.

Hello @petermail, I did read one of your analysis posts. Perhaps you can try your hand in doing a Utopian analysis?

Hi @superoo7, impressive work to collect and visualize all user picks from the posts! From the Utopian perspective this is a bit tricky: Blocktrades and the contest are not open source projects. As an analysis of the steemit/steem open source project, the blocktrades contest is a very narrow aspect. You've submitted this as an analysis of your tool, but from the data you've shown, the detection efficiency is probably among the few results that are really about your code. Nevertheless, we've decided to accept it, and we're looking forward to seeing more from you, maybe slightly more in the focus of Utopian! :)
Feel free to contact us on discord if you have questions!


[utopian-moderator]

Thanks for moderating, I did approach to @eastmael before this post, so I thought it should be fine haha

Yes, there's usually leniency on first analysis contributions. Then become stricter on the succeeding contributions. Hope to have your next analysis more focused on contributing to open-source projects. :)

Hello. The contest steembord inspired me to hold my charity.
I want to give three lucky football souvenirs from Russia.
Terms of the contest read here
https://steemit.com/worldcup/@yak15/attention-contest-i-send-a-gift-to-the-world-cup-2018-to-three-lucky-winners

P.S
I do not have the means for active advertising. I really want to give gifts to my new friends from steemit so I write such messages. Thank you for any help in distributing this information.

Hello @superoo7, in my selection I skipped one match: I did not pick anything and left both teams marked with X. I would like to know if my entry is disqualified.
