User talk:Chenzw/Archives/Jun 2023

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
This is a Wikipedia user talk page.

If you find this page on a site that is not Wikipedia, you are viewing a mirror site. The page may be old and the owner of this page may not have a relationship with sites that are not Wikipedia. The original page is located at http://simple.wikipedia.org/wiki/User_talk:Chenzw/Archives/Jun_2023.

Wikimedia Foundation
This is the User talk page for Chenzw, where you can send messages and comments to Chenzw.


The Signpost: 5 June 2023

 
News, reports and features from the English Wikipedia's newspaper

Tech News: 2023-23


MediaWiki message delivery 22:52, 5 June 2023 (UTC)Reply

Tech News: 2023-24


MediaWiki message delivery 14:51, 12 June 2023 (UTC)Reply

The Signpost: 19 June 2023

 
News, reports and features from the English Wikipedia's newspaper

Tech News: 2023-25


MediaWiki message delivery 20:09, 19 June 2023 (UTC)Reply

User:ChenzwBot for viwiki


Hi Chenzw, I've been asked on behalf of the Vietnamese Wikipedia project if it is possible to write and operate the Vietnamese version of ChenzwBot to revert vandalism on that project. I noticed that in the source code you wrote that it was theoretically possible to operate it on other MediaWiki wikis, is that still correct? I've subscribed to this topic and watched your talk page for response, so you don't have to be in a hurry anyway. Thank you and regards, NgocAnMaster (talk) 08:56, 3 June 2023 (UTC)Reply

...or if you're unable to do that, just send me an email with your full source code for ChenzwBot. NgocAnMaster (talk) 03:12, 8 June 2023 (UTC)Reply
(talk page stalker) @NgocAnMaster A part of the source code is published on gitlab. — *Fehufangą✉ Talk page 03:28, 8 June 2023 (UTC)Reply
@Fehufanga Thanks, I mean his full source code, not part of it. NgocAnMaster (talk) 03:32, 8 June 2023 (UTC)Reply
Hey @NgocAnMaster, thanks for reaching out. It's been a long time since I was actively working on the code, so please give me a while to find the code for training the machine learning models. I will try to get back to you by the end of next week. Chenzw  Talk  11:41, 13 June 2023 (UTC)Reply
Okay, take your time, I won't panic. Anyway, your bot's public source code is released under the GPL license, so I hope the development won't be affected much by those license restrictions. NgocAnMaster (talk) 16:16, 14 June 2023 (UTC)Reply
@NgocAnMaster: The support scripts are now available at [17]. Unfortunately, there is no documentation and there are a lot more hard-coded items than I would have liked, since I didn't anticipate its use on other wikis back in 2020. Here are some notes:
  • The initial starting point is the train-all.sh shell script.
  • There are two important input files that you need to provide:
      • rev-main_classifier.txt (1)
      • rev-nb_classifier.txt (2)
  • The above two files are intended to contain revision IDs to train the LightGBM and Naive Bayes classifiers, respectively. Do note that revision IDs you pass into (1) will also have their tokens extracted and fed into the training set for the Naive Bayes classifier. In other words: the main classifier is trained only on file (1), while the Naive Bayes classifier is trained on both (1) and (2).
  • Tokens and input features are extracted during training time and persisted to file, in order to account for revisions that may be deleted in future.
  • The files accept one line per revision, in space-delimited format:
123456 vandalism
123457 good
123458 good
  • When the tokens are extracted, each token is also subjected to POS tagging, which is done by the spaCy library. As far as I know, there is no official vi language model in spaCy, but there is one community-maintained model here. Your results may vary.
  • F1 score is 0.893 and ROC AUC is 0.937 based on initial tests in 2020. However, these metrics were based on a binary output decision. In practice, the bot obtains a probability score from the model for the vandalism class, and compares it against a pre-configured threshold to judge between vandalism/non-vandalism.
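The notes above on the input file format, the two training sets, and the threshold comparison can be sketched roughly as follows. This is only an illustration and is not taken from the bot's scripts: the function names, the restriction to the two labels shown in the sample lines, the rule that file (1) wins when a revision ID appears in both files, and the use of `>=` for the threshold comparison are all assumptions.

```python
from typing import Dict, Iterable, Tuple

# Assumption: only the two labels shown in the sample lines are valid.
VALID_LABELS = {"vandalism", "good"}


def parse_labels(lines: Iterable[str]) -> Dict[int, str]:
    """Parse space-delimited 'REVISION_ID LABEL' lines, skipping blank lines."""
    labels: Dict[int, str] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if len(parts) != 2 or parts[1] not in VALID_LABELS:
            raise ValueError(f"malformed line: {line!r}")
        labels[int(parts[0])] = parts[1]
    return labels


def build_training_sets(
    main_lines: Iterable[str], nb_lines: Iterable[str]
) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Return (main classifier set, Naive Bayes set).

    The main classifier sees only file (1); the Naive Bayes classifier
    sees files (1) and (2). Hypothetical tie-break: file (1) wins if a
    revision ID is listed in both files.
    """
    main_set = parse_labels(main_lines)
    nb_set = {**parse_labels(nb_lines), **main_set}
    return main_set, nb_set


def is_vandalism(probability: float, threshold: float) -> bool:
    """Compare the vandalism-class probability against a configured threshold."""
    return probability >= threshold  # assumption: inclusive comparison


if __name__ == "__main__":
    main_set, nb_set = build_training_sets(
        ["123456 vandalism", "123457 good"],  # rev-main_classifier.txt
        ["123458 good"],                      # rev-nb_classifier.txt
    )
    print(len(main_set), len(nb_set))
```

In real use the two line lists would come from opening rev-main_classifier.txt and rev-nb_classifier.txt; the threshold value itself is not given in this thread and would need to be tuned per wiki.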
I am quite sure there is still a lot I have not managed to explain yet, so please feel free to ask if there is anything about the bot code that you need clarification on. Chenzw  Talk  15:02, 22 June 2023 (UTC)Reply
Thanks for that. I'll try and get back to you if anything goes wrong. Have a good day! NgocAnMaster (talk) 13:11, 24 June 2023 (UTC)Reply

Tech News: 2023-26


MediaWiki message delivery 16:19, 26 June 2023 (UTC)Reply
