Two days ago I restored a customer's data from backup because they typed "ALL" a...

dbcurtis · on Sept 24, 2020

An old office mate had been a systems programmer back in the days of mainframes. He once put a local-site patch into the mainframe boot code where the question about wiping and restoring the on-line storage required operators to type in, correctly capitalized and punctuated, the precise string: "Yes, I really do want to spend the entirety of my shift hanging tapes."

.... and someone still did it. The struggle is real.

segfaultbuserr · on Sept 24, 2020

In electrical engineering, the saying is "Nothing is foolproof to a sufficiently talented fool."

Also, in computer folklore, there are numerous stories of how non-technical users purposefully destroy foolproof mechanisms by brute force, e.g. cut the slot on a DDR3 socket to insert a DDR2 RAM module and fry everything... And I wonder whether "don't use brute-force, if you have difficulty getting it in, it means you are doing it wrong" should be taught as the first rule when working with hardware. Unfortunately, to add to the confusion, we also have connectors that can be surprisingly hard to connect and disconnect even under normal circumstances...

TeMPOraL · on Sept 24, 2020

> And I wonder whether "don't use brute-force, if you have difficulty getting it in, it means you are doing it wrong" should be taught as the first rule when working with hardware. Unfortunately, to add to the confusion, we also have connectors that can be surprisingly hard to connect and disconnect even under normal circumstances...

And everyone has likely experienced in their lives plenty of appliances, self-assembly kits and other objects where some components required application of force to put together, because there's resistance coming from the feature that prevents the object from coming apart once put together. My rule of thumb is now that if the force seems to be veering into "could break surrounding structure" levels, or if the thing starts making unexpected sounds, then I'm doing it wrong.

... and then I have to put a CPU on a motherboard and the correct way absolutely does involve close-to-breaking forces and squeaky sounds.

fer · on Sept 24, 2020

In my environment the adage is "when you say foolproof you mean fooldetecting".

IggleSniggle · on Sept 24, 2020

I think this is a good analogy for software. To the typical user, some software is easy and sensical, while other software is obtuse and requires significant jiggering just to do the thing it was ostensibly designed to do.

segfaultbuserr · on Sept 24, 2020

There is a difference, however. The connectors that require a lot of force for insertion and removal have some true advantages: they are usually the simplest, cheapest and generally reliable components. Almost nothing can go wrong with a simple wire terminal; it's just a piece of rectangular or round metal. It can be difficult to disconnect for servicing, but you're only expected to do that once per year. On the other hand, "easy" connectors are designed so that the mating force necessary for good contact is provided by the connector mechanism itself rather than by the user, and as a result they're often more complex, expensive, or fragile, such as a USB connector or a ZIF socket.

A software analogy for an "easy" connector would be, "fancy software with good user experience often has a lot of complexity hidden behind the scenes, and can be fragile". But I'm not sure what would be the analogy for a cheap connector. Perhaps, a shell script?

IggleSniggle · on Sept 24, 2020

Yes, a shell script, but for the sake of the discussion, you must imagine that to the average user, invoking a shell script is “the same” as navigating through a few settings menus and clicking some checkboxes they don’t understand: that is to say, when the UX is sufficiently lacking, users often enter “well I don’t really know what I’m doing just push through” mode.

rtx · on Sept 24, 2020

At least in this instance some blame lies with computer hardware designers. Make things simple, no one puts a three the wrong way.

laughinghan · on Sept 24, 2020

It sounds like you found the solution, you just made it really cumbersome: undo.

All confirmation dialogs should be replaced with undo. The happy path has lower friction, and in case of a mistake they'll heave a huge sigh of relief. When possible, it's better in all cases for all users, whether novices or power users.

And many things where undo at first blush seems impossible are actually easy to make undoable with a simple tweak: deleting data? Don't actually delete it until 24 hours later. Sending an email? Wait 10 seconds to actually send it, similar to Gmail's Undo Send.
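
A minimal sketch of the "don't actually delete it until 24 hours later" idea, assuming an in-memory store. The class `SoftDeleteStore`, its method names, and the `GRACE_PERIOD` constant are hypothetical and only for illustration; a real system would more likely use a `deleted_at` column plus a scheduled purge job.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=24)  # assumption: the 24-hour window mentioned above


class SoftDeleteStore:
    """In-memory stand-in for a table with a `deleted_at` column."""

    def __init__(self):
        self._rows = {}        # key -> value
        self._deleted_at = {}  # key -> time the delete was requested

    def put(self, key, value):
        self._rows[key] = value
        self._deleted_at.pop(key, None)

    def get(self, key):
        # Rows marked as deleted are hidden from normal reads.
        if key in self._deleted_at:
            return None
        return self._rows.get(key)

    def delete(self, key, now):
        # Only record the request; the data itself stays in place.
        if key in self._rows:
            self._deleted_at[key] = now

    def undo_delete(self, key):
        # Returns True if there was a pending delete to cancel.
        return self._deleted_at.pop(key, None) is not None

    def purge(self, now):
        # Run periodically: really delete rows whose grace period has expired.
        expired = [k for k, t in self._deleted_at.items() if now - t >= GRACE_PERIOD]
        for key in expired:
            del self._rows[key]
            del self._deleted_at[key]
        return expired
```

From the user's point of view the delete is immediate, but `undo_delete` keeps working until `purge` runs past the grace period.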

stickfigure · on Sept 24, 2020

Implementing soft-delete is much easier than "soft-update", which is what this would have been.

eru · on Sept 24, 2020

You could still wait 10 seconds, and have a 'fake' undo button that aborts. (You can even put up a progress bar to pretend you are doing work during those 10 seconds.)

That's purely a UI element and is completely independent of how the actual destructive operation is implemented in the backend and of how hard it would be to reverse.
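
A rough sketch of that kind of "fake" undo, assuming Python's standard `threading.Timer`. The `DelayedAction` class and its `state` values are made-up names for illustration; the destructive call is simply held back for the delay and dropped if the user aborts.

```python
import threading


class DelayedAction:
    """Runs `action` after `delay_seconds` unless `undo()` is called first."""

    def __init__(self, action, delay_seconds=10.0):
        self._timer = threading.Timer(delay_seconds, self._run)
        self._action = action
        self._lock = threading.Lock()
        self.state = "pending"  # "pending" -> "done" or "cancelled"
        self._timer.start()

    def _run(self):
        with self._lock:
            if self.state != "pending":
                return
            self.state = "done"
        self._action()

    def undo(self):
        """Returns True if the action was aborted before it ran."""
        with self._lock:
            if self.state != "pending":
                return False
            self.state = "cancelled"
        self._timer.cancel()
        return True

    def wait(self):
        # Block until the timer thread has finished (mostly useful in tests).
        self._timer.join()
```

The backend operation itself does not need to be reversible here: `undo()` only works while the timer is still pending, and returns `False` afterwards.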

laughinghan · on Sept 24, 2020

I don't think that would be much better than a confirmation dialog that makes you wait 10 seconds before you can click OK. It's often only after clicking around and seeing the resulting changes that it sinks in that a mistake was made, and they reach for Undo. And that would add just as much friction to the happy path.

eru · on Sept 24, 2020

It can be much better than the confirmation dialog, because it's meant to be implemented in such a way that you can get on with the rest of your work while the undo-countdown is ticking.

From personal experience with gmail's fake undo, in terms of things sinking in, it works almost as well as regular undo for me; and not like a confirmation dialog (which doesn't work at all).

So there's less friction: there's no extra click you need to make after ten seconds. And, also from personal experience, the force-delayed confirmation dialogs I've used (I think in Chrome and Firefox for certain actions) don't seem to lead me to think at all. At least not any better than a regular confirmation dialog.

But in any case, all these are empirical questions, and it would be interesting to run a little user study with the different options, instead of endless speculation.

stickfigure · on Sept 24, 2020

Gmail is a pretty specific case - email is fundamentally asynchronous and "delay send" for something that's already scheduled is straightforward.

Imagine trying to apply this undo to a bulk add/remove labels operation. Once you've committed the transaction, there is no simple 'undo'. It's possible to build a system capable of undo, sure, but you're talking about a lot of upfront work and complexity. Plus a fairly exotic database schema.

eru · on Sept 25, 2020

I don't see the problem?

I would imagine you would stick all your UI actions in something like a log, and then only apply that log to your actual data with a delay?

But not sure whether you call that 'a lot of upfront work and complexity'?
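
To make the idea concrete, here is a small sketch of such a log, with time passed in explicitly so it is easy to test. `PendingLog`, `record`, `undo` and `flush` are hypothetical names, and the 10-second default is just an assumed value.

```python
from dataclasses import dataclass, field


@dataclass
class PendingLog:
    """Queue of UI actions that are applied to the data only after a delay."""
    delay: float = 10.0                          # seconds; assumed value
    entries: list = field(default_factory=list)  # (apply_at, action_id, apply_fn)
    _next_id: int = 0

    def record(self, apply_fn, now):
        action_id = self._next_id
        self._next_id += 1
        self.entries.append((now + self.delay, action_id, apply_fn))
        return action_id

    def undo(self, action_id):
        # Undo is just dropping the entry before it has been applied.
        before = len(self.entries)
        self.entries = [e for e in self.entries if e[1] != action_id]
        return len(self.entries) < before

    def flush(self, data, now):
        # Apply every entry whose delay has elapsed, in the order recorded.
        due = [e for e in self.entries if e[0] <= now]
        self.entries = [e for e in self.entries if e[0] > now]
        for _, _, apply_fn in due:
            apply_fn(data)
        return len(due)
```

One trade-off this sketch ignores: until `flush` runs, reads of the real data do not reflect pending actions, so the UI would have to overlay the log on top of what it displays.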

Perhaps I'm a bit blind, because I come from a part of the programming world that's very keen on persistent data structures, where undos are trivial to implement. (https://en.wikipedia.org/wiki/Persistent_data_structure)
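
As a toy illustration of why undo is cheap with persistent structures, here is an immutable singly linked list where every change builds a new version that shares the old one as its tail. `Node`, `push`, `to_list` and `History` are names invented for this sketch, not any particular library's API.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a persistent singly linked list."""
    value: str
    rest: Optional["Node"]


def push(lst: Optional[Node], value: str) -> Node:
    # Builds a new version that shares the whole old list as its tail.
    return Node(value, lst)


def to_list(lst: Optional[Node]) -> list:
    out = []
    while lst is not None:
        out.append(lst.value)
        lst = lst.rest
    return out


class History:
    """Undo by keeping references to earlier versions."""

    def __init__(self, initial: Optional[Node] = None):
        self._versions = [initial]

    @property
    def current(self) -> Optional[Node]:
        return self._versions[-1]

    def apply(self, change) -> None:
        # `change` maps the current version to a new one without mutating it.
        self._versions.append(change(self.current))

    def undo(self) -> bool:
        if len(self._versions) == 1:
            return False
        self._versions.pop()
        return True
```

Because old versions are never mutated, undo is just moving a pointer back to an earlier version; nothing has to be recomputed or restored.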

laughinghan · on Sept 26, 2020

There's nothing exotic about it, you just need an OLAP rather than OLTP database schema: https://en.wikipedia.org/wiki/OLAP_cube

laughinghan · on Sept 24, 2020

Yeah, unfortunately in some cases if you didn't plan for it from the beginning it's not easy to tack on later.

In my opinion, it's usually worth it though. You only hear from the folks asking you to restore things from backup—you won't hear from the folks who experience unnecessary friction and tell their friends or coworkers "it's okay, it works, it's kind of annoying to use though, I can't put my finger on anything specific".

AstralStorm · on Sept 24, 2020

Unless you're actually trying to do a huge synthetic operation where info would choke, e.g. creating an archive while removing existing files.

That one is still possible to undo, just slower and more expensive...

Then there are fun ones like Windows update holding 20 GB of insufficient undo, mechanical or hardware failures induced by extra load, and how to decide where an operation ends.

pdkl95 · on Sept 24, 2020

Unfortunately human error can never be completely eliminated. However, I'm not really talking about this type of problem. In my previous comment [1], the operator understood how the tool worked; they simply made a mistake when typing the command, and the tool accepted the command to reboot the entire datacenter without any warning. Particularly telling are these comments: (sic)

    Operator-1: I ewas rebooting an rb
    Operator-1: forgot to put -n
    [...]
    Operator-5: [...] i've almost done what Operatolr-1
                just did a *number* of times.

That isn't a user understanding problem; it's a dangerous tool that doesn't fail safely. In your case, at least you detected the unusually destructive action and asked for verification. YouTube isn't even attempting simple sanity checks like your "N > 100" test.

> normally makes you type the number

> I just do not know how to make this more idiotproof

Requiring explicit typing of the number or an explicit phrase like "Yes, I want to delete everything." can help a lot.

If possible, another good approach is to explicitly show the full list of proposed changes. Phrases like "This will change ALL of ..." might have multiple interpretations (ALL what? All of the things in my entire account? All of the things in the current/last project/group? All of the things I think (perhaps incorrectly) were referenced in this action?). If someone is expecting to change only a few records, a confirmation popup that asks "Do you want to make these changes:" followed by a huge list has a large size/presence that should conflict with their expectations. "I only wanted to change a few things - wtf is this huge list?"
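
A small sketch combining both ideas: list the proposed changes, and require the exact count to be typed when the change is large. The function name `confirm_bulk_change` and its parameters are hypothetical; the threshold of 100 mirrors the "N > 100" test, and the I/O functions are injectable so the logic can be tested without a terminal.

```python
def confirm_bulk_change(affected, read_input=input, write=print,
                        threshold=100, preview_limit=100):
    """Return True only if the user confirms a change to `affected` records."""
    n = len(affected)
    write(f"Do you want to make these {n} changes:")
    for item in affected[:preview_limit]:
        write(f"  {item}")
    if n > preview_limit:
        write(f"  ... and {n - preview_limit} more")
    if n > threshold:
        # Large change: a plain "y" is not enough, the exact count is required.
        answer = read_input("Type the number of records to proceed: ")
        return answer.strip() == str(n)
    answer = read_input("Proceed? [y/N] ")
    return answer.strip().lower() == "y"
```

With this shape, typing "ALL" at the prompt for a 250-record change is rejected, because the only accepted answer is "250".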

humaniania · on Sept 24, 2020

Require a different user's authentication or admin code to approve an "ALL" transaction.
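
Roughly, as a sketch: the names `run_bulk_operation` and `ApprovalRequired` are invented here, and a real system would authenticate the approver (password, admin code, second login) rather than trust a user name passed in as an argument.

```python
class ApprovalRequired(Exception):
    """Raised when a bulk operation lacks a valid second-person approval."""


def run_bulk_operation(operation, requested_by, approved_by, admins):
    """Run `operation` only if a second person, who is an admin, approved it."""
    if approved_by is None:
        raise ApprovalRequired("an admin must approve this operation")
    if approved_by == requested_by:
        raise ApprovalRequired("the requester cannot approve their own request")
    if approved_by not in admins:
        raise ApprovalRequired(f"{approved_by} is not an admin")
    return operation()
```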

zentiggr · on Sept 24, 2020

While there might be emergency conditions that would make this cumbersome, in general that sort of two-person control makes sense. That's why the military uses it for especially dangerous actions or conditions. (Weapons loading on the sub I served on, for example).

segfaultbuserr · on Sept 24, 2020

Yes, it works.

But when the sample is large enough, it still happens once in a while. In 2004, a CSB investigation showed that an entire chemical plant exploded after the interlock was bypassed by the supervisor password [0][1].

> The explosion occurred when maintenance personnel entered a password to override computer safeguards, allowing premature opening of the sterilizer door. This caused an explosive mixture of ethylene oxide (EO) to be evacuated to the open-flame catalytic oxidizer by the chamber ventilation system. The oxidizer is used to remove EO in compliance with California air quality regulations. When the EO reached the oxidizer it ignited and the flame quickly traveled back through the ducting to the sterilizer where approximately fifty pounds of EO ignited and exploded.

Apparently the supervisor who owned the password didn't receive any training on the nature of the process and the dangers of bypassing the interlock...

[0] https://www.csb.gov/assets/1/20/sterigenics_report.pdf

[1] https://www.youtube.com/watch?v=_2UnKLm2Eag

C1sc0cat · on Sept 24, 2020

Why would you ever, ever have the ability to do this: "allowing premature opening of the sterilizer door"?

Ironic that it was air quality regulations that did for them.

bluGill · on Sept 24, 2020

There needs to be something in case the implementation forgot something. These are dangerous of course, but they can also save the day when something unexpected happens.

C1sc0cat · on Sept 24, 2020

This is a physical chemical plant not a website - if you do need to do something like that you do it manually.

tapland · on Sept 24, 2020

Printing a list of the first 100 or so affected files is useful to give a real wake-up call and a chance to double check.

heavenlyblue · on Sept 24, 2020

The bigger question I have: why did you not delete the data when asked to?

I understand idiot users, but what about users who actually want to delete it?

dannyw · on Sept 24, 2020

GP mentioned restoring from backups. You generally don’t delete from backups outside of the normal cycling policy, because otherwise they’re not backups.

john_minsk · on Sept 24, 2020

Don't allow the user to type "all"; require another person (an admin) to verify such commands.

waheoo · on Sept 24, 2020

Reverse survivorship bias...