Wikipedia:Bots/Noticeboard/Archive 4



-BOT Process


So from time to time we have an issue with a bot running out of control, unapproved bots being run, etc. In a recent matter (actually it's still going on, but that's beside the point), members of the community seem to be discussing a proposal that would have (IMHO) compelled a bot owner to change the operation of his bot. But the bot owner was already on record as saying he would not pay attention to that proposal. I asked the crats what sort of consensus they would look for, and WJBscribe indicated they'd look towards the BAG [1] and that it might be nice if the BAG had some formal process "where someone can raise problems with bots and BAG can evaluate whether to require changes to the bot's operation be made in order for approval not to be withdrawn." I'm thinking a possible extension might be an RFC-bot, modeled on the RFC-user conduct and RFC-policy systems. Or something akin to Admins Recall, if it could be applied to all bots equally (not 500 different processes). Other ideas? MBisanz talk 07:56, 21 February 2008 (UTC)

Well, we already had a sort of rfc/bot attempt as a subpage of WP:Bots. (I'm not linking it, and if it doesn't get deleted it can be found with a prefix search.) There is a more general problem of lack of oversight of bot tasks. Basically, anyone can do anything. Bot policy allows "assisted scripts" to work without approval with very little real restriction, so long as it doesn't edit so fast that it cannot reasonably be an assisted script. If we AGF, it's difficult to justify under policy blocking most bots that do not display straightforward bugs.
One idea I've had is to say that any bot needs approval which 1) edits at a clip too fast to reasonably check a good proportion of its edits (maybe 5 per minute?), or 2) is doing a single job with more than some number of edits (maybe 1000), or 3) the operator does not respond to inquiries within 15-20 minutes (an operational definition of an unassisted script). And likewise, any *task* over 1000 edits needs to be posted somewhere (WP:BOTREQ, for instance) with time allowed for objections, unless an exemption is part of the bot approval for a specific type of task. That way admins will have something specific to point to when dealing with editors who start up AWB and make hundreds of edits removing spaces, as noted above. It would address the issue with certain javascript tools which tie up a browser for an hour. And BetacommandBot would be given a considerably larger per-task/per-day edit allowance for image tagging, but whether that's 2000 or 5000 per day would have some community input. Gimmetrow 08:26, 21 February 2008 (UTC)
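The three criteria Gimmetrow lists could be written down roughly as follows. This is only an illustrative sketch: the `BotActivity` record and `needs_approval` function are made-up names, and the default thresholds are simply the "maybe" figures from the comment above (5 edits per minute, 1000 edits per job, the upper end of the 15-20 minute response window), not anything in bot policy.

```python
from dataclasses import dataclass


@dataclass
class BotActivity:
    """Hypothetical summary of how a script is being run."""
    edits_per_minute: float
    edits_in_task: int
    operator_response_minutes: float


def needs_approval(activity, max_rate=5, max_task_edits=1000,
                   max_response_minutes=20):
    """Return True if any one of the three proposed criteria is met."""
    return (
        activity.edits_per_minute > max_rate            # 1) too fast to check
        or activity.edits_in_task > max_task_edits      # 2) single large job
        or activity.operator_response_minutes > max_response_minutes  # 3) unattended
    )
```

Because the criteria are joined with "or", a slow script that makes a very large number of edits would still need approval under this reading.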
Those are good ideas, but I think they tackle the bigger problem of not being able to keep track of all the bots and all their approved tasks (look here [2] at that prefix you gave me). Given the recent, shall I use the word, forum shopping, with BCB, that would be severely frowned upon if it happened to a human user or a policy, I'm wondering if we couldn't codify the WP:Bots subspace system. Like with a standard page naming format, rules of what IS a complaint, endorsing users, consensus closing, etc. MBisanz talk 08:42, 21 February 2008 (UTC)
I've been looking at Wikipedia:Requests for comment/User conduct and didn't realize there was a separate section for Admins and non-Admins. Maybe a third Bot section that creates a new page in the Bot subspace? MBisanz talk 23:32, 21 February 2008 (UTC)
Seeing as anything done can be undone, I've created the following Wikipedia:Requests_for_comment/User_conduct#Use_of_bot_privileges process as a proposal to show what I'm thinking of. MBisanz talk 02:18, 22 February 2008 (UTC)
Thanks for that, MBisanz. Hopefully this will work. I would like to see WP:BAG make some official, or semi-official pronouncement about this, as it will need their support to work. Some acknowledgment that they will act constructively on the results of bot requests for comments (ie. explaining things to people) rather than just dismissing "attack pages". How does one go about getting an "official" response from WP:BAG? Carcharoth (talk) 11:37, 23 February 2008 (UTC)
I'm counting 1413 active BAGers. Maybe some survey of them of how they'd respond to a Bot-RfC? My inspiration for this was WJB's suggestion at Wikipedia:BN#Bot_change, so maybe it would be better to wait till we have a Bot-RfC that comes to a consensus to do something, the operator refuses, and then see if the BAG responds. MBisanz talk 04:29, 24 February 2008 (UTC)


(copied from BN) IMHO WP:BRFA isn't enough in this respect. Consensus (and the bots themselves) can and do change. There needs to be a process for governing bot (and bot owner) activity including withdrawing approval if necessary. Sure, bots can be blocked but that tends to be reactionary and only takes one admin. I had a bot blocked a few days ago (see bot out of control from above) too and it just seems that, for lack of sufficient process, the block (which was not set a time limit) was just forgotten. We, as a community, need the ability to govern bots because when it comes down to it they are just too efficient. This bitterness and resentment seems to stem mostly from the lack of binding recourse either for the sake of justifying a bot, or for governing one. But as I said it's just my opinion. Adam McCormick (talk) 07:46, 24 February 2008 (UTC)

if there is a serious issue with a bot, leave a note on WT:BRFA and BAG will review the situation. βcommand 14:39, 12 March 2008 (UTC)

{{nobots}} proposal


Just wanted to drop a note here, there is presently a proposal underway at WT:BOTS, to require that all bots be {{nobots}} compliant. SQLQuery me! 03:28, 9 March 2008 (UTC)

nobots needs to be redesigned


I just got around to looking at the nobots system, and realized how far from best practices it is. The system is premised on the historical practice of downloading the entire content of a page before making any edit, even if the edit is only to append a new section to the bottom. Once the API editing is implemented, we probably won't need to download any page text at all to get an edit token and commit the new section. At that point, the nobots system will be completely broken.

It seems to me that we should discuss a nobots system that doesn't require bots to perform lots of needless downloads. Perhaps a database of per-bot exclusion lists, like Wikipedia:Nobots/BOTNAME or something like that, which would only require one fetch to get the full list. — Carl (CBM · talk) 12:59, 9 March 2008 (UTC)

Appending sections doesn't make any sense in mainspace (and other content namespaces) because it would break layout by adding information after stubs, categories and interwikis. If it's not for mainspace, most bot developers will ignore this possibility. MaxSem(Han shot first!) 13:07, 9 March 2008 (UTC)
It does make sense, however, for talk pages, which are the main source of interest for the nobots system. I don't advocate forcing bot developers to follow nobots or forcing them to use the new section editing method. But the current nobots system is (mis)designed assuming that all bots download the old page text before all edits, which is an actively bad assumption because it discourages bots from using more efficient editing methods. — Carl (CBM · talk) 13:20, 9 March 2008 (UTC)
There's whatlinkshere: section-editing bots could load all transclusions and then ignore everything that uses {{nobots}} and load whole pages for those who use {{bots}} with selective exclusion. I don't like the centralised system because it's prone to vandalism and eventually we'll have to fully protect the exclusion lists and build unnecessary bureaucracy around addition/removal from them. MaxSem(Han shot first!) 13:32, 9 March 2008 (UTC)
I agree it's a pain to have the exclusion lists on the wiki. By the way, because of bugzilla:12971, a bot would need to use the API list=embeddedin rather than backlinks to get a list of pages.
But the current system is worse. It is inherently flawed and riddled with technical problems because bots are not as smart as the wiki preprocessor. If the template is on user pages then bots need to be able to parse it as robustly as the wiki parser does. So this code should work:
{{bots|allow={{MyBotAllowList}}}}.
The only implementation I have seen of nobots is the pywikipedia one, and it does not support this because it does not resolve transclusions in template parameters. Also, if {{bots}} was transcluded from a user box, pywikipedia would not notice it even though it would be listed as a transclusion by the API. Which is reasonable enough - nobody should expect bots to parse pages in this way. — Carl (CBM · talk) 13:49, 9 March 2008 (UTC)
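To make the parsing problem concrete, here is a minimal sketch of the kind of plain-text check a bot might run on downloaded page text. It is not the pywikipedia code; the function name `allowed_to_edit` and the exact handling of the `allow` and `deny` parameters are assumptions for illustration. Like any regex-level check, it does not resolve transclusions, so it silently misses the nested `{{bots|allow={{MyBotAllowList}}}}` case.

```python
import re

# Matches {{nobots}} or {{bots|...}} with a flat (non-nested) parameter list.
TEMPLATE_RE = re.compile(
    r"\{\{\s*(nobots|bots)\s*(?:\|([^{}]*))?\}\}", re.IGNORECASE
)


def allowed_to_edit(wikitext, bot_name):
    """Return False if the raw page text tells bot_name to stay away."""
    wanted = {"all", bot_name.lower()}
    for match in TEMPLATE_RE.finditer(wikitext):
        if match.group(1).lower() == "nobots":
            return False  # assumption: any {{nobots}} bans every bot
        params = match.group(2) or ""
        for param in params.split("|"):
            key, _, value = param.partition("=")
            key = key.strip().lower()
            names = {v.strip().lower() for v in value.split(",")}
            if key == "deny" and wanted & names:
                return False
            if key == "allow" and not (wanted & names):
                return False
    # No template was recognised, so the bot assumes it may edit.
    return True
```

With the nested example, the pattern never matches, so the function falls through to `return True` regardless of what the allow list actually contains, and the same happens when the template arrives via a user box.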
Is there a [[Category:]]-like system that we could use? Just throwing out ideas - I doubt there is, but am trying to think "outside the hammer". -- SatyrTN (talk / contribs) 16:03, 9 March 2008 (UTC)
Here's one possibility along those lines. We could set up categories like Category:Pages not to be edited by bots, Category:Pages not to be edited by BOTNAME, Category:Pages that may be edited by BOTNAME and overload the bots and nobots template so that code like {{nobots|BOT1|BOT2|BOT3}} puts the page into the appropriate categories. — Carl (CBM · talk) 18:39, 9 March 2008 (UTC)
A category system just creates massive overhead: the bot would have to load the entire category tree and hold it in memory, whether or not the page was ever actually called on. Clever programming could minimise the overhead, but it's still quite substantial. For userpages, I would advocate using something like User:Example/bots.css. How efficient is a check for page existence? I don't know off the top of my head, but I expect it's pretty low overhead. Whenever a bot wants to edit a page in userspace, it checks for the existence of a bots.css page for that user. If it doesn't exist, it knows it has free rein in the userspace. If it does exist, it loads the page and parses it - we can work out the most versatile and efficient coding - and from that learns which bots can edit which pages in the user's userspace. This provides the additional advantage of being able to easily apply nobots to all your subpages if you so wish. I'm thinking something along the lines of:
exclude [[User:SineBot]] from [[User talk:Happy-melon]]
exclude [[User:MelonBot]] from [[User:Happy-melon]] [[User:Happy-melon/About]] [[User:Happy-melon/Boxes]]
exclude [[User:ClueBot]] from all
exclude all from [[User:Happy-melon/Articles]]
Is pretty easy to read by humans, easy to parse by bots, and easy to debug (redlinks = bad). Not sure how this would extend outside userspace, but how often is nobots used in other namespaces? Comments? Happymelon 20:40, 9 March 2008 (UTC)
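As a rough illustration of how easy the proposed syntax would be to parse, here is a sketch in Python. The format itself is only Happymelon's suggestion above, and the names `parse_exclusions` and `is_excluded` are invented for this example; malformed lines are simply skipped, which is an assumption about how a bot might want to behave.

```python
import re

LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")


def parse_exclusions(text):
    """Parse 'exclude <bot|all> from <pages|all>' lines into (bot, pages) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("exclude "):
            continue
        subject, sep, targets = line[len("exclude "):].partition(" from ")
        if not sep:
            continue  # malformed line: no " from " part
        bot_links = LINK_RE.findall(subject)
        if bot_links:
            bot = bot_links[0]
        elif subject.strip() == "all":
            bot = "all"
        else:
            continue  # malformed line: unrecognised bot
        pages = "all" if targets.strip() == "all" else set(LINK_RE.findall(targets))
        rules.append((bot, pages))
    return rules


def is_excluded(rules, bot_page, target_page):
    """Return True if any rule keeps bot_page away from target_page."""
    for bot, pages in rules:
        if bot in ("all", bot_page) and (pages == "all" or target_page in pages):
            return True
    return False
```

A bot would fetch the user's bots.css page once, call `parse_exclusions` on its text, and then call `is_excluded` with its own user page name before each edit in that userspace.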
When should {{nobots}} ever apply outside the user talk? BJTalk 20:50, 9 March 2008 (UTC)
I've had various instances of editors telling me to keep SatyrBot from adding WikiProject banners to the talk page of articles. I don't know if that's valid, but that's one instance where nobots might apply. And don't we tell certain bots to archive/not archive various talk pages? Or tell sinebot to watch / not watch certain pages? -- SatyrTN (talk / contribs) 21:11, 9 March 2008 (UTC)
Archiving is opt in and Sinebot had a cat last time I checked. BJTalk 21:17, 9 March 2008 (UTC)
Indeed, those were just the first three bots I could think of - don't think of them as anything more than examples. What do you think of the actual system? Happymelon 21:29, 9 March 2008 (UTC)
It's basically robots.txt, which has worked for years. But I still see no use for it. BJTalk 21:33, 9 March 2008 (UTC)
That was my inspiration, yes. I can't fully see the use of it myself, but someone said {{nobots}} was becoming obsolete, and proposed a (to my mind) impractical solution, so I came up with my own (hopefully less impractical) idea. Happymelon 21:37, 9 March 2008 (UTC)
Banners on talk pages. -- SatyrTN (talk / contribs) 21:39, 9 March 2008 (UTC)
I have no idea how your bot works but any nobots system doesn't seem like the best way to deal with that. BJTalk 21:45, 9 March 2008 (UTC)

Happy-melon: what do you mean the bot would hold the entire category tree in memory? There would be at most three categories to read: the list of pages forbidding all bots, the list permitting that particular bot, and the list forbidding that particular bot. This would mean (unless any of the lists is over 5000 entries long) only three HTTP queries, one time, to load the exclusions list. That's reasonable.

On the other hand, any system that requires an extra HTTP query for every edit that must be made is unreasonable because it is vastly inefficient. It would be possible to reduce the number of extra queries if you were just looking for page existence, but still every single bot.css file or whatever would have to be loaded, every time the bot wants to edit the corresponding page. That's far from ideal design. — Carl (CBM · talk) 22:59, 9 March 2008 (UTC)
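The "three queries, one time" approach might look something like the sketch below. It only builds the request URLs and leaves the fetching out. The category names are the ones floated earlier in this thread and do not necessarily exist; `exclusion_queries` and `may_edit` are hypothetical names; and letting the per-bot allow list override the deny lists is an assumption, since the thread does not settle precedence. The `list=categorymembers` module with `cmtitle` and `cmlimit` is part of the MediaWiki API.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def exclusion_queries(bot_name):
    """Build the three category-member queries a bot would run once at startup."""
    categories = [
        "Category:Pages not to be edited by bots",
        f"Category:Pages not to be edited by {bot_name}",
        f"Category:Pages that may be edited by {bot_name}",
    ]
    return [
        API + "?" + urlencode({
            "action": "query",
            "list": "categorymembers",
            "cmtitle": title,
            "cmlimit": "max",
            "format": "json",
        })
        for title in categories
    ]


def may_edit(page, deny_all, deny_bot, allow_bot):
    """Check a page title against the three sets loaded from those queries."""
    if page in allow_bot:
        return True  # assumption: an explicit allow wins over any deny
    return page not in deny_all and page not in deny_bot
```

After the three lists are loaded into sets, each per-edit check is an in-memory lookup, with no extra HTTP request per page.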

That is true, however, consider the ramifications of actually maintaining such a system, not simply using it. There are four hundred and eight accounts with the bot flag on the English Wikipedia, meaning that a complete system would require at least 800 categories to be created. We can't justify not creating the categories until they are needed, otherwise we will have slews of problems like "OI, my userpage was in Category:Pages not to be edited by SignBot, why did I still get notified??"
The problem we are essentially dealing with is that we need, in some manner, to compile a database table in an environment which doesn't really support multidimensional structures. We have a large number of bots which edit userpages; we have a larger number of userpages which might be edited by bots. We have to cross-reference those data sets in the most efficient manner possible. The real question is: do we divide the table up by bot, or by userpage? The Categories system is an attempt to break the table up by bot, which makes it easy for the table to be parsed by the bots which need to use it. The bots.css system breaks the table up by user, which makes it easier for individual users to manage it. I am of the opinion that, since the bots exist to serve the users, not the other way around, our priority should be to create a system that editors can use easily, even those with no programming experience. Asking them to add each individual page to Category:Pages not to be edited by bots is much more time-consuming and error-prone for them than just adding "exclude all from all" to one file. There's no reason why we can't use a bot to generate the alternative version of the table - even just a regularly-updated list of existing bots.css pages would reduce overhead. If the updating bot checked newpage-tagged RecentChanges, and the deletion logs, for pages with "/bots.css" in the title, the list would be completely current. I doubt that's a particularly onerous task, but it's not one that is even necessary for the system. Essentially what I'm saying is, Wikipedia's back streets are created for its editors, not for its bots - any system should put user-interface first, and bot-interface second. Of course we should endeavour to optimise both interfaces, but if that's not possible, the humans should win. Happymelon 19:35, 10 March 2008 (UTC)
It doesn't seem difficult to me to maintain two overall categories plus two categories per bot, given that the number of exceptions is always going to be very low for properly designed bots. Categories are, in a way, easier for individual users than writing a file using some new syntax that they don't already know. Adding a category is a task everyone is familiar with. (As an aside, naming it shouldn't be .css since it isn't a style sheet.)
But I would prefer to see a per-bot blacklist in any case; the idea of categories is only one proposal. — Carl (CBM · talk) 20:00, 10 March 2008 (UTC)
I suggested .css subpages because they can only be edited by the user, thus preventing vandalism. In the same way, it's just an idea. My main problem with categories is the necessity of maintaining a large tree of mostly-empty categories, since to avoid errors each bot should have existing categories, whether or not they are populated. Happymelon 13:09, 12 March 2008 (UTC)
It makes no difference from the point of view of the bot whether the category page exists or not - the contents of the category can still be queried either way. So I don't see the need to create all the categories at once. But I also am not a strong proponent of the category system - a simple blacklist maintained for each bot would be fine. — Carl (CBM · talk) 13:49, 12 March 2008 (UTC)

BJBot


I would like to urge those who approve bots, that bots like BJBot — which left an unwanted long notice on my talk page because I made a single edit to Adam Powell, telling me that it was listed on AfD — should honour {{nobots}}.

As a side note this response is rather uncalled for behaviour for a bot operator. I'm glad he struck that later, but it's still disappointing. Requests by users not to notify them should only be ignored if there is a good reason to do so. --Ligulem (talk) 19:00, 9 March 2008 (UTC)

How do I roll my eyes over the internet? If you would like something changed ask, don't tell me to stop running my bot. BJTalk 19:18, 9 March 2008 (UTC)
Well, this rant did actually include a hidden gem of a bug report. Thanks. BJTalk 20:08, 9 March 2008 (UTC)
Consider this bot not having my approval. --Ligulem (talk) 20:32, 9 March 2008 (UTC)
k? BJTalk 20:35, 9 March 2008 (UTC)
I personally wouldn't require anyone to implement the current nobots system. BJ, was the bug you mentioned that this editor shouldn't have gotten a notice? It does seem odd if everyone who edited the article even once gets notified. — Carl (CBM · talk) 14:54, 10 March 2008 (UTC)
You might want to read Wikipedia:Bots/Requests for approval/BJBot 4, where Bjweeks said he had implemented {{nobots}}. When I asked him to stop his bot until that actually works, he first denied my request (later struck his denial) and labelled my comment here as "rant". Besides, that bot task is entirely unneeded and unwanted anyway, so it doesn't have my approval (even if it would work as advertised). We simply don't need nor want this hard core talk page spamming. After all, there is a watchlist feature for a purpose. --Ligulem (talk) 16:40, 10 March 2008 (UTC)
I don't think that your individual approval (or mine, since I'm not a BAG member) is the deciding factor. But BJ did say in the bot request that the bot would honor nobots, and I think it is a reasonable thing for this bot to do, if its purpose is mainly to notify users on their talk pages. — Carl (CBM · talk) 17:01, 10 March 2008 (UTC)
There is no consensus for running this bot task. That's the deciding factor. BAG implements consensus. And as an admin, I may block a bot that doesn't follow its approval if its owner is unwilling to stop and fix it after I have asked him to do so. --Ligulem (talk) 17:48, 10 March 2008 (UTC)
The bot seems to be notifying a hell of a lot of people for a single AfD. Can we stop the bot, reopen the BRFA and seek wide community input please (as this task probably affects most of the community and could do with broader input than that provided in the previous one day BRFA)? Martinp23 18:06, 10 March 2008 (UTC)

I would like to see the approval for this task looked into further by BAG. The notifying of people with very few edits to articles seems rather an annoyance and the bot seems to be notifying a lot of people (IPs included) - I count about 50 notifications about the proposed deletion of Prussian Blue (duo) alone. This was probably a request that should have been scrutinised a little longer... WjBscribe 18:22, 10 March 2008 (UTC)

This request should not have been granted. But since there seems to be no procedure for withdrawing erroneous approvals, chances are small that anything will happen here. In case BAG or whoever actually does review this bot's task, I suggest at least rethinking whether it really makes sense to post lengthy notices about article deletions if the last edit of that editor on the article at hand dates back more than a year. Furthermore, notifying admins about page deletions is particularly pointless, since we can still see "deleted" pages anyway. Also, wiki-gnomes like myself who currently don't edit and who have many thousands of small edits in their contribs, are particularly annoyed by having their talk pages plastered with these pointless wordy "notifications" which don't serve much more than making inactive editors' talk pages look like they would pertain to some stupid newbie who needs a pile of corrective warnings about his misplaced steps on this wiki.
This project has really gone mad. Some bot operators with approvals seem to think they are on a heroic mission here and they have to be prepared to knee-jerk reject requests to stop and fix their bots. This attitude is harmful to this project. But that seems to be the norm nowadays on Wikipedia. --Ligulem (talk) 01:07, 12 March 2008 (UTC)
What part of it was a bug do you not get? BJTalk 02:57, 12 March 2008 (UTC)
We could just, you know, ask him to do something about it... --uǝʌǝsʎʇɹnoɟʇs(st47) 19:53, 10 March 2008 (UTC)
I'm confused, isn't that what's been going on so far? —Locke Coletc 20:06, 10 March 2008 (UTC)
I've made some small changes which halved the number of notices to that article. I'm also working on adding a check for when the person last edited the article. That should be done by tomorrow. BJTalk 03:50, 12 March 2008 (UTC)

There were in fact two different bugs that allowed Ligulem to get a notice. The first was me playing around with nobots early in the morning and had been fixed for hours (what he requested fixed on my talk), the second I didn't notice until he posted his rant here ("only one edit" got my interest), I also fixed that. If anybody sees unwarranted notices, leave a message on the bot's talk with a diff. I also plan to re-disable IP notices per a message on my talk, that should further reduce notices. BJTalk 01:48, 11 March 2008 (UTC)

Thanks for responding to, and fixing, the bugs mentioned somewhere in this complaint. Also, thanks for staying cool on this one. SQLQuery me! 04:16, 12 March 2008 (UTC)
So you do think that this response by BJ was fine? --Ligulem (talk) 09:02, 12 March 2008 (UTC)
This was clearly a misunderstanding by the operator, which he has since corrected. It's not a big deal. -- maelgwn - talk 09:30, 12 March 2008 (UTC)
Yes it's not a big deal, but it would have been nice to admit that in the first place instead of labelling my post here as a "rant". Second, it seems somewhat of an irony, that it was SQL who fully protected his talk page recently [3]. Of course, I do understand that he was under very tense stress in real life and with some recent on-wiki issues. --Ligulem (talk) 09:54, 12 March 2008 (UTC)
