More

liyanchang · on Sept 4, 2019

See also https://99percentinvisible.org/episode/invisible-women/

"Car crash test dummies are also generally male, based on an average man... and ignores anatomical differences, plus specific individual circumstances like a person being pregnant"

There's lot of places where technology is tilted in unintentional ways

liyanchang · on May 10, 2019

I've been really happy with how my current company[0] has been doing migrations and I've seen a couple others do it but it seems like it should be more widespread.

Database Schema as Code

Instead of writing up and down migrations, you define what the end state should look like. Then the computer will figure out how to get here. This is just how the industry started managing server configurations (Puppet) and infrastructure (Terraform).

We use protocol buffers so it was pretty straight forward to have a definition of what our tables should look like. We have a script that figures out what the delta is between two states (either proto files or a db) and can calculate the schema migration SQL (e.g. CREATE TABLE, etc).

From there, we run it through a safety check. Any unsafe migration (either for data loss or performance issues e.g. DROP TABLE) requires an extra approval file.

There's no real difference between an up migration and a down migration (except that one tends to result in an unsafe migrations). It's calculable at CI time so we can give devs a chance to look at what it's going to do and approve any unsafe migrations. API compatability checks enforce that you need to deprecate before you can drop.

DML, that is data changes, are handled via standard check in a sql file and CI will run it before the code deploy and after the schema migration.

Alembic is the one other place I've seen this concept (a couple others mentioned this) so it's not new, but surprised I haven't seen it more places.

[0] Shameless plug: We're hiring if you're interested in changing how healthcare is paid for, delivered, and experienced. https://www.devoted.com/about/tech-jobs/

evanelias · on May 10, 2019

I call this declarative schema management, since the repo declares the desired state, and the tooling knows how to reach this state. This concept is finally catching on lately, although some huge companies have already been doing it this way for quite some time. Facebook is a key example; they've managed their schema changes in a pure-SQL declarative fashion, company-wide, for nearly a decade.

I'm developing a suite of tools [1] to provide declarative schema management and "schema change by pull request" functionality, initially targeting MySQL and MariaDB. A few large companies have built pipelines using one of my tools -- including Twilio SendGrid, who wrote about their process in-depth recently [2].

[1] https://skeema.io

[2] https://sendgrid.com/blog/schema-management-with-skeema/

liyanchang · on May 10, 2019

This is good to know. As someone who didn't do much with databases before, I was frankly worried given how it didn't seem like many others were taking this approach when it made so much sense (we did have the advantage of having a defined schema which I know isn't always available). Seems like I just didn't know what to search for.

Git would never have worked it required devs to write the up/down patches - why should we have to write the up/down migrations for my schema?

Excited to see more tooling around declarative schema!

meowface · on May 10, 2019

I'm surprised this isn't more of a thing. It seems like the natural evolution of "[X] as code". I've always been a little turned off by migrations (though they were certainly an improvement over the previous situation, which was basically just indeterministic changes on the fly).

evanelias · on May 10, 2019

My thoughts exactly. But it's a major paradigm shift for those coming from the world of Rails/Django/etc migrations, and that unfamiliarity understandably leads to some initial resistance and skepticism.

fwiw, other declarative tools are starting to pop up -- besides my tool Skeema, some others I've seen recently are Migra [1] and sqldef [2]. And meanwhile a bunch of enterprise tools for MS SQL Server have operated in the declarative fashion for quite a long time, although usually with GUIs instead of being git / pull-request-driven. So I think/hope it's just a matter of time before this concept becomes more widely known.

[1] https://github.com/djrobstep/migra

[2] https://github.com/k0kubun/sqldef/

hobs · on May 10, 2019

It's definitely a thing, eg SQL Server Data Tools has this as a default - Schema Compare and Data Compare, and you can just use declarative approaches to defined your final state and let the tool take care of it.

That being said - if you want to do this the downside is usually that its slow as hell, and the non-migration approaches can cost you downtime.

Generic solutions to specific states often means copying all data somewhere else so you can modify the table and then put it back in a useful fashion - a migration often allows more piecemeal approaches.

Edit: a guy I like wrote a good model/migration set of articles http://dlmconsultants.com/model-vs-mig/

nickserv · on May 10, 2019

Curious, how do you deal with renaming fields or tables?

This is a (minor) pain point for traditional migration systems.

evanelias · on May 10, 2019

Excellent question! The short answer is Skeema doesn't directly support renames yet. Renames are inherently more imperative than declarative, so they don't fit in well with the model. I've thought about handling them via tracking their history/state, but it would be hacky.

Two workarounds exist in Skeema currently:

* You can do the rename "out of band" (e.g. manually, rather than via `skeema push`), and then update your schema repo via `skeema pull`. This isn't ideal, but then again, table/col renames typically involve nasty code-deploy-order complexities to begin with (regardless of migration system): there's no way to rename something at the same exact instant that your new code goes live, and it's difficult/annoying to write code that can correctly interact with both names.

* For new empty tables, happily a rename is equivalent to drop-then-re-add. So this case is trivial, and Skeema can be configured to allow destructive changes only on empty tables.

I've written a bit more about this at https://github.com/skeema/skeema/blob/master/doc/requirement... , as well as the bottom of https://www.skeema.io/blog/2019/01/18/declarative/ .

brlewis · on May 11, 2019

How about a column named x_no_wait_y declares a column named y, but if a column named x exists it's renamed?

teddyuk · on May 11, 2019

If you use ms sql server ssdt you use refactor/rename and it finds all references and changes them and then when you go to deploy it generates a sp_rename - 100% killer feature right there :)

barrkel · on May 10, 2019

Data migrations? Denormalizing columns from one table to one or more child tables, possibly more than one relation away? Switching one set of fields in a single table to be in a different table via a relation, converting something from 1:1 to 1:n?

The concept appeals to me, but it only seems to work for trivial migrations.

liyanchang · on May 10, 2019

I’ll openly admit that we don’t have everything figured out. You’re absolutely right that currently, we constrain ourselves on what we migrate to admittedly simple migrations.

I think there’s a whole set of problems to be solved in this space and frankly, I’m as surprised as anyone that given how SQL is declarative, we use procedural code to do migrations so part of my post was hoping people would tell me what tool I should be using or how this approach fails over time. So your examples are really helpful for me as I think through if it’s possible to do automatically, workaround, or get by without.

It seems to me that we just lack the ability to express these transitions mathematically that can help us do them. And of those, there’s probably only a subset which are possible to do without taking downtime.

In particular, the class of migrations that you outlines are a combination of DDL and DML changes and also have quite a bit of code complexity to do without downtime. It’s definitely a current weakness.

evanelias · on May 10, 2019

A totally valid point, but I'd argue those should be handled by a separate tool or process. Data migrations tend to be fully programmatic; tools and frameworks can help reduce the code required, but cannot handle every possible case. (having performed numerous multi-billion-row data migrations, I learned this painfully first-hand...)

For simpler cases, where it may make sense to run a data migration immediately after a schema change, a good generic middle-ground may be configurable hook scripts. A declarative schema management system can then pass relevant info to the hook (which tables were changed, for example) and then the script can run any arbitrary row data diff/apply/migrate type of logic.

I do understand your point though; for relatively straightforward data migrations, an imperative system can capture these much more cleanly by just coupling them with the corresponding schema migration code.

barrkel · on May 11, 2019

I honestly like the way Rails does it: both capturing the imperative deltas and dumping the final schema which gets checked in. Not a big fan of down migrations, usually a waste of time.

Otherwise I like Percona's OSC, particularly how it can tune down table rewrites when there's competing work, or replication is lagging too much. We're just at the point where we need to automate the OSC tool rather than using it as a point solution for migrating our bigger tenants.

PaulMest · on May 10, 2019

I am guessing that you are probably not using Python/Django... but is this any different than what Django offers?

Django allows you to define your models (schema) and then run a command that will generate the migrations. If you don't like the migration that was generated, you can modify it. You can customize up and down operations.

There are also tools that will take an existing database and generate the Django models for you.

All of these operations can also export the exact SQL that will run on each supported RDBMS platform in case you want to be extra sure on what exactly will be executed.

nickserv · on May 10, 2019

Django migrations can be problematic because they're meant to be sequential and have interdependencies. I've had problems merging multiple feature branches because of this, even though there are no code conflicts.

A system like Saltstack or Puppet for databases would not have checked in migrations, these would be generated on the fly at deploy time.

So you could very well have multiple state changes in a single run, by comparing actual DB state and desired DB state, then creating the SQL code as needed for that deployment.

Honestly not having to fiddle with the migrations table on a live server seems pretty nice ;-)

This could very well turn out to be Django's next gen migration tool...

Izkata · on May 10, 2019

> Django migrations can be problematic because they're meant to be sequential and have interdependencies. I've had problems merging multiple feature branches because of this, even though there are no code conflicts.

They're actually a directed graph; this means a conflict wasn't handled on the branches that should have been, and would probably have been a problem regardless.

PaulMest · on May 10, 2019

This was helpful to think about, thanks.

I've rarely encountered logical merge conflicts with migrations, but I could see it happening.

I used to be on the SQL Server team at Microsoft and had some exposure to the customer support teams. So data integrity and eliminating any potential for errors was huge.

So while I love the idea of migrations being generated on the fly from actual state in Production-System-5 to desired state of commit 27a73e, I'm skeptical of it working that well in practice. Certain cases come to mind where there might be intermediate migrations from [full name] -> ([first name] [last name]) -> ([first initial] [last name]). The system would have to be smart enough to know A -> C may require A -> B -> C or prompt the engineering team for a better DML migration script.

Also, you will want there to be documentation about what was performed whether that is a migrations table that points to a .py file... or a .json output... or a log file.

liyanchang · on May 10, 2019

Yeah. I’d love to see the academic paper with formalizations that help me understand the true scope of this problem. Your example is a great one that prompts many questions. Is it possible to travel directly to the commit o(1) or will the code have to calculate the diff of each commit and apply them one at a time o(n) and how much definition and dependency mapping humans need to do to have it work correctly?

kd5bjo · on May 11, 2019

The closest I can think of is trying to define a set of CRDT-compatible operations that are expressive enough to describe your database schema, starting from an empty database. Then, the migration you need to perform is whatever the CRDT merge operator says you need to do.

evanelias · on May 10, 2019

That's great initially, but problems definitely crop up at scale:

* What happens when your company creates new systems that aren't Python/Django? You can either still shoehorn all migrations into Django models, or have multiple separate schema change processes/pipelines... both options are not good.

* If someone makes an out-of-band schema change manually (either by accident or to do a rapid hotfix), you're no longer on a known version. Not sure about Django, but when this happens, most traditional migration systems cannot operate without additional manual cleanups... whereas declarative tools can inherently transition any state to any other state.

* Depending on the DBMS, with large tables and certain types of ALTERs, using an external online schema change tool is greatly preferable to just running an ALTER directly.

* Does Django support sharding / the notion that a change to a sharded model must be made to multiple shards?

PaulMest · on May 10, 2019

* I see your point on not standardizing on one framework. Generally when that has happened for me, it turns into a new service and it has its own database/tables/migration management. It does get quite annoying, for sure.

* I've seen enough things go wrong that on my teams I do not allow DDL to be executed outside of a controlled process that comes from code. But yeah, if that were to happen, it would annoying to figure out what was done and then try to re-model.

* With Django you can specify exact SQL to run. So you can break up operations into multiple smaller steps... canonical example is building a new column based on an old column. You first add the column with NULL. Then you populate in batches of ~10k records. Then you add on the constraints/indexes.

* I haven't used Django with sharding. It appears there are some posts about it, but it all appears to be community generated content and not part of the official docs.

All-in-all, I could see that at a large scale with very mature engineering organizations with lots of activity and complex operations that something like Django could fall short and a home-grown system like this may be beneficial, assuming it were reliable enough.

nikisweeting · on May 10, 2019

Sharding is usually handled by patching the QuerySet / ObjectManager, how Citus handles it is a good example: http://docs.citusdata.com/en/v8.1/develop/migration_mt_djang...

henning · on May 10, 2019

Prepare for the exciting future of DevOps transformation: it's `./manage.py makemigrations`.

daoxid · on May 10, 2019

This sounds nice! One Question: You said that DML changes are handled via "standard check in sql file". Does this simply mean a new SQL file for each migration? And how are DML changes connected to DDL changes? For example, if some code is two versions behind and updated to the current schema, wouldn't this mean that the DDL is updated in one step to the current state, but the DML potentially in two steps, breaking the update?

liyanchang · on May 10, 2019

That's correct. The DML changes as part of CI are somewhat new so we haven't ironed it all out yet.

Here's the scenario that I think you're laying out: 1. Commit A creates column foo 2. Commit B has DML that reference column foo 3. Commit C removes column foo

This works fine if our CI deployer does each commit individually. First roll out any schema changes, then run any DML SQL.

However, our deployer might pick up all those changes and since we roll out the schema migrations first (in this case a create + drop -> NOP) and then runs the DML (which will error), this is an issue because of the rollup.

In practice, we have yet to see this case (most of the time, the dev who write the DML is close enough to the code to know if it's going to be dropped soon and we don't drop that many columns - in part because we know that there be dragons) but truthfully, I haven't thought about it much and need to think through what the impact is beyond this example. Thanks for helping me refine my thinking and I'll have something to ponder on this weekend!

daoxid · on May 10, 2019

Yep, your example describes exactly (and better) what I meant. Thanks!

wvenable · on May 10, 2019

Can that handle column renames? Most schema-to-schema diff tools can't tell the difference between a rename and a delete/add.

liyanchang · on May 10, 2019

I’ll openly admit that we don’t have everything ironed out. In fact my next big project is to tackle derived columns (rename is a column where the transformation is the identity function).

It requires a bit more finesse and integration into our code base as it requires multiple deploys with code that knows how to handle both columns.

djrobstep · on May 11, 2019

Strong agree. You can do much better than than Rails/Django migrations.

I have been advocating for this approach for a while now: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...

safog · on May 10, 2019

Not sure about the state of the world currently after living in BigCo filter bubble for the past few years, but do you even need custom tools to calculate the delta between the schema as checked into VCS vs the database's actual state?

Spanner (https://cloud.google.com/spanner/) I think can auto-compute the diff between its current state and a given schema, generate appropriate SQL statements to perform a migration and get user confirmation for destructive schema changes.

atsaloli · on May 11, 2019

> This is just how the industry started managing server configurations (Puppet) ...

Yes, and CFEngine pioneered this in 1993 (Mark Burgess).

I make my living as a CFEngine consultant.

liyanchang · on Nov 1, 2018

We're building a better and more human health insurance company to improve the lives of seniors in America

We are guided by a deep belief that every senior should be treated like we would treat a member of our own family: with loving care and a profound commitment to their health and well-being through world-class technology and customer service.

Stack: Go, Python, Postgres, GraphQL, gRPC, React, TypeScript, Docker/Kubernetes, Terraform, AWS

liyanchang · on July 20, 2016

Disclosure: I'm an engineer at USDS and these are my own opinions.

I'm personally appreciative of the work that you've done and your continued work, in spite of the many challenges and frustrations. There are many ways to serve. It's always incredibly rewarding to be working along side dedicated and talented public servants and contractors. Thank you.

liyanchang · on July 20, 2016

Disclosure: I'm an engineer at USDS and these are my own opinions.

We are very proud to be working along side the existing agency and contracting staff. You are right that many times, they do know how to fix it and need some help from us getting management to let them do it. Some times they are skeptical, sometimes there is friction - that's all completely understandable.

Ultimately, my time in government has opened my eyes to the talented folks inside government right now and I'm proud to be working along side them. I hope you didn't get a poor impression of us and the work that is happening.

maxxxxx · on July 20, 2016

I didn't mean to offend you at all. USDS is great. I just see this dismissive attitude of existing staff by management. But in reality it's management that messed things up, not staff.

liyanchang · on July 20, 2016

Thanks. I appreciate that. I think you actually have a keen eye for the issues.

Since my time in government, I've become more sympathetic to agency employees, contractors, management - pretty much everyone. The entire environment is really challenging to do the right thing.

As an executive, the typical tools you have at your disposal, budget and HR, are circumscribed by congressional appropriations and federal hiring guidelines. Your ability to promote and demote and reorganize are limited by laws and unions and you definitely can't grant stock options. The metrics that you typically use to measure success are much fuzzier because there isn't a PNL but it's really easy for the press to find a single case where your org does a poor job.

I still maintain hope that good things can be done - I have seen it - and good management makes a huge difference.

Disclosure: I'm an engineer at USDS and these are my own opinions.

liyanchang · on July 20, 2016

Disclosure: I'm an engineer at USDS and these are my own opinions.

I absolutely agree that the interface that most citizens use to government can be better. President Obama agrees with you (from his sxsw remarks):

"I could change the politics of America faster than just about anything if I could just take control of all the DMVs in the country."

"If their primary interaction with government is the IRS, you just don’t have a good association with government when you’re writing that check."

USDS has been working with the IRS to help them get their Transcript service back online with better identity proofing though admittedly just a small step in a better online experience. (irs.gov/transcript)

A number of state and local governments have been making improvements as well (much harder to see if your state or your local government has not been). I'd encourage you to search them out or even start it.

liyanchang · on July 20, 2016

Disclosure: I'm an engineer at USDS and these are my own opinions.

We hire for experience, which comes in all sorts of shapes and sizes. Based on my personal observations, the team is actually the most diverse team I've ever worked on by a number of dimensions. We've pulled folks from retirement, others who are chronologically younger but have been contributing to Python and Debian for years, those who have uprooted their family. The job is an incredible mix - it requires being able to debug a slow performing database index one moment, briefing a deputy secretary the next, all while empowering the other federal employees and contractors to do great things.

I would be remiss not to take this opportunity to link to our hiring page. https://www.usds.gov/join

I do recognize that there are lots of talented folks that aren't able to move to DC or can't make it work for a variety of reasons. That's okay. There are many other ways to contribute to our country [0] - working for state/local government, volunteering in your community, being a good parent, inventing the next breakthrough, actually using your turn signals, and many more.

[0] Yes I recognize that not everyone here is US based. Though the point actually is probably true for all countries.

liyanchang · on July 20, 2016

Disclosure: I'm an engineer at USDS and these are my own opinions.

That's right. The US Government is probably not going to win on compensation - salary limits, no stock options, no lunch. However, while it's not a money making enterprise, it's enough to do just fine. I recognize the sacrifices that many of my colleagues have made to be here, not to mention the sacrifices that federal employees and contractors have made (some who are very good and could be making more in the private sector) and that makes this even more worthwhile.

The thing that government can offer is impact. I've always known that government has a big impact on people's lives but not sure if I could personally make an impact. That's what USDS, 18F, a number of other opportunities are offering.

This is not a job for everyone. But if you're a certain type, there's nothing quite like it.

mattmcknight · on July 20, 2016

The government could do a lot more by adjusting the differentials for technical competence on the GS scale to be somewhere near market (up and down). However, assessing technical competence isn't exactly straightforward.

liyanchang · on July 20, 2016

Disclosure: I'm an engineer at USDS and these are my own opinions.

You make a great point - USDS is just one part of the solution. In fact, in almost all of the USDS projects, we work very closely with agency employees and contractors (many of which are just as talented and have chosen to serve their country). Most of the times, I spend very little time hands-on-keyboard and helping empower the existing team.

As for the longer term solution, there is a less publicized version of what folks are doing. USDS has a number of contracting officers who are helping teach others in government how to be savvy customers of technology. 18F and GSA have been doing a lot on this front as well to help bring in really good contractors and writing agile purchasing agreements. The Office of Federal CIO is rewriting and simplifying tech policy and the Office of Federal Procurement Policy has been working on the procurement side. These are the long term changes that I'm personally excited about.

GVIrish · on July 20, 2016

That is great news, I think engaging and training COR's across government could yield some real improvements to how these projects go.

I'm personally seeing some 18F projects come down the pipe that are truly innovative and hopefully will encourage organizations to think differently about IT procurement. Still a lot of silly stuff out there but at least the ship is starting to change heading.

zacharycohn · on July 20, 2016

What 18F projects are you excited about?

[Disclaimer: I work for 18F and will probably point those teams at this comment so they can do a happy dance!]

liyanchang · on July 20, 2016

Disclosure: I'm an engineer at USDS and these are my own opinions.

So in my admittedly short time in the government [0], I've witnessed how all of these problems are due to good intentions. That's what makes this all really tough because everything you think is bonkers actually has a reason.

The 1400 page travel regulations is a result of trying to prevent fraud - every single issue that comes up results in a new rule.

The fact that it takes some projects years to deploy is that we would like to plan and make sure that every resource is well-spent, that it's in a number of languages and accessible to the blind.

It makes it hard for everyone - I've met lots of smart talented civil servants and government contractors who want to do things differently but have their hands tied behind their back.

[0] 2 years feels like forever to me but flash in the pan to many of the dedicated civil servants I've met.

niels_olson · on July 21, 2016

Disclosure: I'm active duty Navy and these are my own opinions.

In 22 years, I have never been so hopeful for meaningful improvement in my work life as I am now. Having met a few folks I am all too familiar with DTS and the JFTR (the 1400 pages in the article(1)). I think that's a great choice to start with: like Google going after the mundane problems of every person's life. This will make a difference. I am on travel now and was on the phone and DTS (simultaneously) for an hour today. And for anyone who tries to apologize for the 1400 pages, please don't. I have cut instructions from 238 pages to less than 30. I would argue the major problem is not that people are trying to solve every edge case. The major problem is that people are only in a job for a short period of time, come in, and while they may try to solve the edge cases they encounter, they often do that by trying to simplify things by inserting a new abstraction and taking ownership of that abstraction. So the layers of abstraction accrete like sediment. And as long as there's no direct logic conflicts, they can promote away from the problem.

I will gladly buy any USDS, 18F, or DDS hacker in San Diego a beer. Keep up the good work.

(1) It's actually 1602 pages: https://www.defensetravel.dod.mil/Docs/perdiem/JTR.pdf

the_watcher · on July 20, 2016

> every single issue that comes up results in a new rule.

This sentence is the simplest explanation for government (and bureaucratic) incompetence.

Think about writing software. Is the optimal solution to every single bug to write more code to deal with that specific situation? Of course not. In many cases, sorting out the underlying cause and fixing that (which may involve new code, rewriting old code, or even deleting outdated code) is the correct approach (assuming that the optimal solution is the desired outcome, there are of course cases where speed of getting something out that works trumps this, but government regulations only take effect once annually in most cases anyway, so they don't have a speed excuse).

Simply writing a new rule to deal with every scenario is an approach that inevitably leads here.

stupidcar · on July 20, 2016

You demonstrate intellectual fallacies common amongst software developers:

1. Believing that your experience in one particular field makes you qualified to make pronouncements regarding others about which you know nothing.

2. Believing that problems in other fields are inherently simple to understand and solve, and that the reason they aren't must therefore be due to the malice or incompetence of people working in those fields.

3. Believing that software and software development processes are a universal model that can be applied to any other problem via trite and reductive analogy.

devonkim · on July 20, 2016

With these considerations taken we can apply the same problems to people in industrial scale manufacturing that have typically managed many organizations that are now in charge of software projects. Then comes no real effective incentive structure with how most federal contracts are written despite billions spent on lawyers to protect the government and to get the most for the government's dollars supposedly. I've seen far too many projects with clearly the most talented and well-managed contractors getting tossed for actually finishing their deliverables while those that didn't get through 40%+ got renewed because they were just too critical to the political success of the greater project. And they'll continue to cite the project as past performance and renewal as an indicator of competence.

The fact there's so much opposition to USDS and 18F that's greater than any other group ever is the best indicator that the fat, bloated Beltway bandits are worried about their easy road to retirement. This isn't to say everyone's lazy - far from it. But government contracting has been largely insulated from the realities of most commercial enterprises through the politicized veil of "protecting veterans" and "defending the country" and for every legitimate, honest worker there's at least two that just want a cushy 9-5 for 35 years.

8note · on July 20, 2016

while i frequently agree with you, i disagree here.

both code and travel regulations are lists of rules written in a terse language, and both are subject to bloat all of the time, abd parts become deprecated as times and priorities change.

these are very comparable things, e cept that code can change quickly at a low cost, while regulations are costly to change.

Because of the cheapness, software folks have put a lot of effort and study into optimizing how you make rule changes, which makes it very applicable to some other problem spaces

rosser · on July 20, 2016

> ...except that code can change quickly at a low cost, while regulations are costly to change.

And you don't think that's a salient difference? Particularly in light of the adversarial nature of politics, which was one of your comment's parent's points, and which is a major contributing factor to such changes being so costly?

dave_sullivan · on July 20, 2016

Yet the information density of the parent comment is so much higher. You have 3 statements that can be reduced to "Things that work in programming don't necessarily work in other fields." No kidding? Then what's your solution for operational efficiency in government?

stupidcar · on July 21, 2016

Some of us believe detail and nuance add power to argument, rather than rejecting them in favour throwing around basic, unsupported claims.

And your reduction is not even correct. Assuming that problems in other fields are inherently easy to solve has nothing to do with the applicability of software engineering techniques. Assuming those working in other fields are incompetent or malicious has nothing to do with the applicability of software engineering techniques. You say my post was lacking in information density, yet you apparently weren't even able to grasp the arguments I did make. So maybe I needed more explanation, not less?

And are you implying that unless I come up with a solution for efficient government, it somehow renders my — entirely unrelated — argument invalid? That's nothing more than a lame attempt at argumental misdirection. But hey, I'll bite: My solution for operation efficiency in government is for everyone to think and act in the complete opposite manner to you. Is that reductive enough for you?

dave_sullivan · on July 21, 2016

> And are you implying that unless I come up with a solution for efficient government, it somehow renders my — entirely unrelated — argument invalid?

I'm implying that you're long-winded and it weakens your argument. If I reduced your response, I'd reduce it to, "You hurt my feelings and I'm angry about that." Fair enough, but the rest is so much filler.

r00fus · on July 20, 2016

That's fine in writing software - now try that in an adversarial environment.

With special interests (some of whom may be insiders working to undermine the exact fix that's needed), and you start to get the picture.

To add a bit of spice, address some things like time pressure related to elected administration-specific goals and/or election timeframes.

jnbiche · on July 21, 2016

> That's fine in writing software - now try that in an adversarial environment.

Oh, software is not an adversarial environment?

evgen · on July 21, 2016

Software development is little league compared to the major league bloodsport of bureaucratic politics played inside the beltway.

sn9 · on July 20, 2016

Writing software for whom?

Consider the software that NASA writes and how it's written. Rigorously specified, reviewed, and tested by some of the best engineers in the world to the point that almost bug-free code is produced at the expense of a much slower rate of development. Which is the best you can do with billions of dollars and human lives on the line for certain projects.

Now look at most government software infrastructure: frequently mercurial and ambiguous software specifications interacting and/or based on flawed laws and regulations filled with logical contradictions written by Congressmen and lobbyists with perverse incentives. And you have to justify every cent or risk the accusation of wasting taxpayer dollars.

nn3 · on July 20, 2016

>Rigorously specified, reviewed, and tested by some of the best engineers in the world

Actually NASA culture is to strictly avoid super stars. See:

http://www.fastcompany.com/28121/they-write-right-stuff

In the shuttle group's culture, there are no superstar programmers. The whole approach to developing software is intentionally designed not to rely on any particular person.

And the culture is equally intolerant of creativity, the individual coding flourishes and styles that are the signature of the all-night software world. "People ask, doesn't this process stifle creativity? You have to do exactly what the manual says, and you've got someone looking over your shoulder," says Keller. "The answer is, yes, the process does stifle creativity."

Bartweiss · on July 21, 2016

I think equating "best engineers" with "superstars" means you might be bringing your own associations to the topic. (Not unfairly, that's a standard association in the Valley, but still.)

The few NASA engineers I've known have been superb as NASA employees. They weren't grand innovators solving problems on their own, but they were knowledgeable and intelligent. They had deep understanding of the tools they worked with, were rigorously careful and formal, and understood the problems and tradeoffs of their work far beyond any spec they were handed.

To me, that counts as being one of the best engineers in the world. These are people who know what they need to do, why they need to do it, and how they can best accomplish it. In the case of NASA, that generally means doing something radically different than you would at a tech startup, but these people are still brining enormous ability and great care to their work.

sn9 · on July 21, 2016

I never said anything about superstars. That's precisely one of the articles that most informs my understanding of NASA software practices.

rosser · on July 20, 2016

Funnily enough, it turns out that collections of humans interacting isn't like software.

Go figure.

cs702 · on July 20, 2016

Thank you for posting here.

The same logic applies to all those regulations that seem bonkers.

If you think of all those regulations as a type of source code (which dictates what government employees and citizens can and cannot do, when, and under what conditions), it's clear that a lot of regulatory code needs major refactoring.

To use your example with travel regulations, those 1400 pages designed to prevent fraud likely consist primarily of thousands upon thousands of assertions and if-then statements. I wonder if it would be possible to reduce them to, say, a few dozen pages -- by refactoring all that 'regulatory code' to use different, higher-level abstractions.

tonysdg · on July 20, 2016

My guess is that most of those 1400 pages consist of the corner cases - things that will 95% of the time never pop up, but when they do will wreak havoc unless properly dealt with. Granted, I'm not sure travel fraud can wreak that much havoc, but I'm sure someone somewhere gets annoyed over it.

My (admittedly limited) experience in coding/engineering has taught me that it's unwise to look at technical problems and people problems in the same light; the former can be solved much easier than the latter. The trick is to figure out where you can solve the former to avoid the latter!

sliverstorm · on July 20, 2016

Not in government, but I understand the biggest problem with travel to be the expense account, which has a history of being abused to disgusting proportions.

Bartweiss · on July 21, 2016

Frankly, my take is that this is what happens when you try to beat human intent with formal rulings. Explicitly banning every possible form of travel fraud is almost unimaginable - certainly it can't be done without banning a huge amount of legitimate travel also. Rigorous safety around people problems is nigh-impossible, which is why most safe software systems take the approach of "do it our way or go to hell".

At a certain point you can only solve the people problems with oversight and good intentions. You could get one random employee to certify any given travel plan or reciept as "not obviously fraudulent" and recreate the benefit of ~700 pages of regulations, just by showing the thing to someone who doesn't benefit from fraud.

But of course, incremental change produces these kind of awful local minima. If you are punished for fraud, aren't punished for overhead, and can't change the whole system, what else would you do? You ban one known misbehavior, go on with your day, and everything gets a little bit worse.

niels_olson · on July 21, 2016

They're referring to the JFTR, now JTR. And it's actually 1602 pages

https://www.defensetravel.dod.mil/Docs/perdiem/JTR.pdf

DannyBee · on July 20, 2016

"The 1400 page travel regulations is a result of trying to prevent fraud - every single issue that comes up results in a new rule."

This seems like a serious inability to understand that no process designed to prevent future things you can't forsee is 100% effective (by definition). At some point, you have to declare "good enough", and live with it until the error rate becomes unacceptable overall again, then modify it.

IE it's likely 50 pages of those regulations gave them a 99.9%+ rate of avoiding fraud. They then added 1350 pages to get to probably 99.99%

This is unlikely to be worth it.

(and yes, before someone points it out, i'm likely being generous with the numbers)

joeyo · on July 20, 2016

Some of it comes down to agencies needing to protect themselves from congressional witch-hunts. Consider a congressman or political party that has ideological objections to an agency even existing and wishes to neuter or eliminate it. A stellar way of achieving this is by making the target seem wasteful and fraudulent [1]. If there is actual fraud, no matter how small or how much of a corner case, this task becomes even easier.

1. A good example of this is how Republicans periodically attempt to defund science agencies by mocking research projects that sound frivolous.

my_first_acct · on July 20, 2016

A Democrat, the late Senator William Proxmire of Wisconsin, was well known for mocking frivolous-sounding research projects. From the Wikipedia article[0]:

"In 1987, Stewart Brand accused Proxmire of recklessly attacking legitimate research for the crass purpose of furthering his own political career, with gross indifference as to whether his assertions were true or false as well as the long-term effects on American science and technology policy."

[0] https://en.wikipedia.org/wiki/William_Proxmire#Golden_Fleece...

knowaveragejoe · on July 22, 2016

It's good to see examples from both sides. That being said, what the above mentions is Republican doctrine, as opposed to isolated cases of politicians(on either side) simply blindly furthering their political careers.

dastbe · on July 20, 2016

Unfortunately there are a lot of people who believe government shouldn't do anything unless it is fraud-proof. The narratives around welfare and food stamp abuse make headlines for exactly this reason :(.

el_benhameen · on July 20, 2016

I think their intentions are a little more nefarious. It's not that they want e.g. a fraud-free welfare system; they fundamentally disagree with the idea of welfare and so use fraud, whether it's a legitimate issue or not, as a basis for trying to stymie or dismantle the institution.

temujin · on July 20, 2016

The problem is, once the government chooses to not close a known loophole, the number of people who exploit it may increase by orders of magnitude. Without a willingness to add the other 1350 pages, you may end up with something like 70% fraud prevention, not 99.9%.

What's needed is more refactoring. This would benefit from more capacity to try different sets of regulations in parallel.

DannyBee · on July 20, 2016

This is a generally true statement about any process. The solution to that is to enforce well enough that people don't think that's a good idea. I also did say you do have to refactor over time as compliance rate decreases. Past that, i don't think we actually disagree :)

If you have a speed limit sign, and it says "speed limit, 50 mph, enforced by satellite observation", most people will probably ignore it. Those that don't and get caught, yeah, they go looking for excuses for why they ignored it to post-justify it. Changing the regulation wording will not change this. You can make the sign much larger and say "speed limit 50 mph, even if you are really late for an appointment, etc" but honestly, it still will not help that. People ignore it because the enforcement mechanism makes them feel like it won't happen to them (and because it's not socially abhorrent, etc), not because of ignorance of the law

On the other hand, if you have a sign that says "speed limit 50mph, enforced by this guy, right here", and there is a smiling cop with a radar gun sitting next to the sign, enforcing it, most people will not ignore it. In fact, i'd bet you could write everything before "enforced by this guy" in small print people had to slow down to read, and most people would slow down and read it, because they believe the risk of enforcement is greater to them.

Will you get everyone to stop speeding there? Nope.

Even if you add spike strips, laser beams, whatever, someone is going to do it, and in fact, enforcing harder sometimes increases the rate (depending how low the rate is) based on the thrill some people get. 100% compliance is just pretty much impossible, no matter what words you use.

azernik · on July 20, 2016

You cannot fix a loophole with better enforcement. By definition, the behaviors involved are allowed.

Bartweiss · on July 21, 2016

I'm not convinced about that.

Some organizations do startlingly well with good enforcement and a rule against circumventing the rules. Yes, that's subjective and messy, but it can actually work quite nicely.

Hell, it's basically what financial structuring laws are: a rule saying "no using loopholes if you find them". With that in place, it becomes surprisingly easy to address loopholes by punishing everyone who employs them.

vidarh · on July 20, 2016

A common approach is to set a fixed amount per day for expenses based on cost levels in the country in question, and be extremely strict with extras, coupled with approved supplier lists and price ranges for the actual travel.

It "rewards" those who are prudent with extra cash, and so it certainly won't be perfectly efficient, but in return it makes it harder for those who would otherwise try to abuse the system who often will go far overboard, because any extra expensive claims can be given a lot more attention (and often will require advance approval), and it drastically cuts down on paperwork.

mattmcknight · on July 20, 2016

Amazingly, the government does that already! http://www.gsa.gov/portal/content/104877

Still they create mountains of rules....

sarchertech · on July 20, 2016

This is analogous to adding code to cover security issues. 99.9% isn't good enough when people are actively looking to exploit the 0.1%.

nitrogen · on July 20, 2016

But unlike security issues a single failure doesn't compromise 100% of the rest of the system. This is also why analogies between software/security/cryptography/privacy and the tangible world are so awkward.

temujin · on July 20, 2016

Fraud prevention actually is a security issue. Not an Internet security issue, so mistakes aren't punished that quickly, but the analogy is still sound.

nitrogen · on July 20, 2016

Someone buying a new watch with their expense account doesn't suddenly give them access to the whole treasury -- that's the difference between physical and digital realms I am trying to emphasize.

hiram112 · on July 21, 2016

Most security breaches don't allow the malicious user to root the entire server farm either.

I just spent a week fixing permission validation done in JS on the browser. Users could have potentially allowed themselves to see parts of documents outside their role. This didn't give them access to our payroll system, credit card processor, or the backend infrastructure.

Bartweiss · on July 21, 2016

This is a big part of the answer. Congressional hearings and reporting often act like "fraud is fraud", but allowing 1% fraud to save 20% overhead is entirely reasonable.

Improper resource usage is a better metaphor than security failures for this topic.

sliverstorm · on July 20, 2016

People are pretty sensitive about government financial workers committing fraud, similar to how they are rather sensitive to government police committing murder.

DannyBee · on July 20, 2016

Sadly, in neither case will you ever have 100% compliance. Pretending it's achievable, and trying to achieve it, is IMHO, silly.

Remember the regulations do not prevent fraud, enforcement prevents fraud. There already exist plenty of things saying it's not okay, etc. Saying "and also, don't do that" is probably not actually necessary most of the time, in the same way saying "don't shoot people" is sufficient. Saying "and also don't shoot them while they are handcuffed" isn't necessary. Crappy post-justification does mean the regulation was written wrong, and changing the regulation to account for the post-justification will not actually improve the process most of the time.

sliverstorm · on July 20, 2016

I don't think we can take this much further without knowing what's actually in the regulations, but I imagine they consist more of "officer's dash cam will be run 24/7 and backed up in triplicate", "officer will learn proper gun handling techniques X, Y, & Z", etc rather than "don't shoot people", "don't shoot handcuffed people", "don't shoot clowns", "don't shoot children".

Or, in the fraud case, "books will be audited at frequency X", "Y behavior makes it too easy to hide fraud and is not allowed". Rather than "fraud is illegal on Monday", "fraud is also illegal on Tuesday", "fraud is even illegal on holidays"...

Of course we can never achieve 100% with more regulation, but we make it more of a priority to make abuse harder to get away with than elsewhere, presumably increasing overhead in exchange for lowering abuse (yes, this is probably not a strictly linear curve)

Bartweiss · on July 21, 2016

In the travel regulations case, we can see, and horrifyingly it's more of a "fraud is illegal on Monday" situation: https://www.defensetravel.dod.mil/Docs/perdiem/JTR.pdf

There are some sensible regulations there, like having someone approve travel requests, but there are also a lot of very narrow restrictions obviously added by someone who wanted to prevent Fraud X, but lacked the authority to change what was already written. The result is that you get more overhead with depressingly little payoff.

In principle you're right about the trade-off, but that's only the case when rule-writers have the authority to sensibly restructure what already exists.

Bartweiss · on July 21, 2016

This is a good summary. At a certain point, you honestly can just have a rule against stupid or malicious behavior. The trick is to enforce it carefully and sensibly, rather than to pursue comprehensive objective rules.

Anyone who's played rules-lawyering games like Nomic will be aware that banning all misbehavior explicitly is impossible. You're basically limited to whitelisting approved behaviors, or implementing a general rule against malfeasance. Unless the consequences of misbehavior are enormous, the second option tends to be more efficient.

nitrogen · on July 20, 2016

I don't think that necessarily has to be the case. The public conversation could conceivably shift to a cost/benefit analysis of varying levels of enforcement vs. fraud, if only the media would cooperate.

Bartweiss · on July 21, 2016

This is definitely the case, because we already see different analyses for different topics.

When it comes to NSF, people worry about overhead and waste. When it's welfare or food stamps, people worry about fraud instead. Some of this is moral - people care about the 'undeserving poor' more than 'undeserving scientists' - because we tend to hate abuse of charity. But it clearly shows that there are different categories of concern, and that the public is capable of examining both topics.

Bartweiss · on July 21, 2016

This is exactly the problem. When you have a system that completely ignores inefficiency/overhead but goes berserk over fraud, you get totally absurd incentives. Those 1350 pages probably kept some managers from getting fired, but realistically have been a serious waste of time and money.

At a certain point, you either accept a low level of fraud or just make a rule saying "don't do bad, wasteful stuff." Then you fire anyone who breaks that rule and let things work themselves out. (This has other problems, but they can be addressed.)

Most of bureaucratic stupidity is ultimately moral hazard. Someone pays for one failure case but not another, so they spend absurd amounts minimizing what they're responsible for.

fapjacks · on July 20, 2016

I want to work there so bad. USDS or 18F. But I can't get anyone to call me back, even with twenty years working in (and running) startups. Dunno what that's about.

aschonfeld · on July 20, 2016

Hi Fapjacks. I'm on the talent team at 18F. Feel free to email me directly at amanda.schonfeld@gsa.gov. We don't have a direct phone number for folks to call, but I will definitely email you back. :)

bsaul · on July 20, 2016

Very informative comment, but then, did that change once the usds were in charge ? Because my intuition is that taking over an openly failed project makes it easy for the new team to tell everyone to try and keep things simple.

Then wouldn't the succes not be a matter of technical abilities or process, but rather goodwill by the client side to remain reasonable ?