I've used Metaflow for the past 4 years or so on different ML teams. It's really great!
Straightforward for data/ML scientists to pick up, familiar python class API for defining DAGs, and simplifies scaling out parallel jobs on AWS Batch (or k8s). The UI is pretty nice. Been happy to see the active development on it too.
Currently using it at our small biotech startup to run thousands of protein engineering computations (including models like RFDiffusion, ProteinMPNN, boltz, AlphaFold, ESM, etc.).
Data engineering focused DAG tools like Airflow are awkward for doing these kinds of ML computations, where we don't need the complexity of schedules, etc. Metaflow, imho, is also a step up from orchestration tools that were born out of bioinformatics groups, like Snakemake or Nextflow.
If you’ve tried, has it been clunky to run non-python based workflows? I.e., if you want to run bedtools or diamond without having to run a bunch of subprocess.run commands?
Right, for most of our workflows, we stay in python land, which is great and seamless with Metaflow being in python. But yes, there are occasions that we have to make a system call to run an old R script or even a compiled C++ executable :shrug: (Metaflow does have some native R support tho) I have not had to use the specific tools you called out, bedtools or diamond.
Most of the time this is not a blocking problem since each step in a flow is mapped to a Docker image and/or your choice of EC2 instance (e.g. one step on a GPU, another on a memory optimized instance). You can have one step use an image with all of your python-based ML stuff, and another step use a different image with compiled executables that are triggered by a system call. If needed, outputs from such a system call would then need to be persisted in a database/S3 or read back into the python flow for persistence. So, it is not as seamless as a flow in all python, but it can work "good enough".
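As a rough sketch of the "system call inside a step" pattern, here is plain standard-library Python (no Metaflow import, so it runs anywhere). The helper name `run_tool` and the file name `tool_output.tsv` are made up for illustration, and the Python interpreter stands in for the compiled executable or R script. Inside a real Metaflow step you would presumably call something like this and assign the returned text to an attribute on `self` so it gets stored as an artifact.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tool(cmd, out_path):
    """Run an external command, save its stdout to out_path, and return the text.

    Hypothetical helper for illustration; not part of Metaflow.
    """
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a failing tool fails the step instead of passing silently.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    out_path = Path(out_path)
    out_path.write_text(result.stdout)
    # Read the file back, as you would to pull tool output into the python flow.
    return out_path.read_text()


# Stand-in for a compiled executable or R script: the Python interpreter itself.
# In a real step this would be the actual tool's command line.
with tempfile.TemporaryDirectory() as tmp:
    output = run_tool(
        [sys.executable, "-c", "print('chr1\\t100\\t200')"],
        Path(tmp) / "tool_output.tsv",
    )

# Parse the tab-separated line the "tool" produced.
fields = output.strip().split("\t")
print(fields)
```

Wrapping the call in one small function at least keeps the subprocess boilerplate out of the step bodies.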
Hmm, meh, I feel like you can say a lot of pretty similar things about San Francisco and the Bay Area too. Let me defend Chicago with some counterpoints!
- Weather: Sure, the SF Bay Area wins here in terms of what you might think of as regularly experienced weather, but it is not without its extremes. Let's recall that most of California, including the surrounding SF Bay Area, has been in an extreme drought for the past several years [1], threatening local agriculture and water supply. Also, SF is famous for extremely devastating earthquakes that happen every so often [2] (though you may not categorize seismic activity as weather, it's still a natural occurrence sort of thing).
- Government: SF has its fair share of recent government corruption [3, 4] as well as historical seediness/lawlessness [5], not unlike some of Chicago's history.
- Traffic: Personally, I think the Bay Area has worse traffic than Chicago. The data in this report [6] seems to agree, ranking SF/Oakland as #3 worst in the country and San Jose as #5, whereas Chicago is #8 on the list. And, really, waiting on a few trains to pass for a sports game or special event is not unique to Chicago. Try taking the BART across the bay for a Giants game, Super Bowl 50, Bay to Breakers, Pride Parade, etc. I mean, every city's transportation system hits its limit with big events/gatherings. So what? It's part of living in a city that you learn to accept.
- Wages: Can't debate it. Chicago software engineers, making an average of $76k/yr [9], do get paid less than those in SF, at $110k/yr [10]. But, also consider that income tax in Illinois is a flat 3.75% [11], whereas California's income tax for that bracket is 9.3% [12]. Roughly 5.5 percentage points more tax in CA is not small. (Federal income tax will also be lower in Chicago with the lower salary, but I haven't looked up those numbers.) And, let's consider cost of living more in general. This handy website [13] says that "A salary of $76,000 in Chicago, Illinois should increase to $178,313 in San Francisco, California", which is mainly due to housing being 428% more expensive in SF than in Chicago. That says a lot! You might interpret that as the $110k/yr average salary in SF actually getting you less quality of living than the $76k/yr in Chicago (according to this website). Or something like that. Don't know, but my housing in SF costs twice as much for half the space as compared to what I had in Chicago. (What am I doing here?!)
- Crime: According to this (presumably reputable) data [7, 8], Chicago has been steadily improving in terms of crime for more than a decade. SF has been kind of steady, and, according to that data, is actually worse per capita than Chicago. I didn't expect that at first. Chicago is a much bigger city than SF, so that kind of helps to dilute crime among the population, I suppose. Anyway, my anecdotal evidence from friends tells me that it is improving. We must have different kinds of Chicago friends. Or maybe we should trust data instead?
- Schools: SF public schools have a lottery system too [14]. I will grant, though, that CPS does have a lot of ongoing battles with the city and state for funding, like recent closings and strikes. That stuff sucks because it's usually the poorest schools that suffer the most. Ultimately, this is a bigger societal problem than us engineers debating about where the best place for us to live is on a web forum. (Probability of existential crisis increasing...)
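For anyone who wants to poke at the wage numbers from the list above, here is a quick back-of-the-envelope sketch. It only uses the figures already quoted ([9]-[13]). Treating 9.3% as a flat rate on the whole SF salary is a simplifying assumption on my part (California's tax is actually bracketed), federal tax is ignored, and the variable names are just illustrative.

```python
# Figures quoted in the wages bullet above.
chicago_salary = 76_000
sf_salary = 110_000
il_rate = 0.0375   # Illinois flat income tax
ca_rate = 0.093    # California rate for that bracket, applied flat (assumption)
sf_equivalent_of_chicago_salary = 178_313  # cost-of-living calculator result


def after_state_tax(salary, rate):
    """Salary left after a flat state income tax (federal tax ignored)."""
    return salary * (1 - rate)


chicago_net = after_state_tax(chicago_salary, il_rate)
sf_net = after_state_tax(sf_salary, ca_rate)

# How many SF dollars the calculator says it takes to match one Chicago dollar.
col_factor = sf_equivalent_of_chicago_salary / chicago_salary

# The average SF salary expressed in Chicago purchasing power.
sf_salary_in_chicago_terms = sf_salary / col_factor

print(f"Chicago after state tax: ${chicago_net:,.0f}")
print(f"SF after state tax:      ${sf_net:,.0f}")
print(f"SF salary in Chicago purchasing power: ${sf_salary_in_chicago_terms:,.0f}")
```

By that calculator's factor, the average SF salary comes out well under the Chicago one in purchasing power, which is the point the wages bullet is making.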
Okay. Got that off my chest. I love Chicago. I hope you and others do too still. I may move back soon. Writing this stuff down has, well, made me think twice about wtf I'm doing out here on the West Coast...
Worth clarifying that I live on the Peninsula, and there are a ton of factors for why my circumstances might differ from yours. Crime out here is way lower than in SF, for example. Ultimately there are pros and cons aplenty to both places, and one's priorities dictate the weighting of them.
For me, weather and culture were huge, as was the crime. The crime piece likely could have been solved if I had moved to a nicer suburb, but ultimately that was lower on the list.
Like I said--Chicago will always have a place in my heart, but I've found that I'm ultimately much happier living in the Bay Area, despite the higher cost of living. I just wish I'd realized it sooner so I could have moved out here and bought five years ago.
> And, honestly, I think I was conflating the STEM in this article with Scanning Tunneling Microscopy
It's easy to do. TEM/STEM vs SEM vs STM. All completely different things. This is what happens when scientists name things :P
For those confused:
TEM/STEM: An electron beam is transmitted through your sample. Good for atomic-scale imaging.
SEM: An electron beam is scanned across your sample, but none are transmitted through. Good for topography/surface features (the interaction volume of the beam is too large for atomic resolution).
STM: No electron beam. Instead, think of a vinyl record player: you physically scan a very sharp tip across the surface of your sample. Good for atomic-scale imaging of a surface.
A good STM image and a good STEM image can, at first glance, look quite similar (especially for a 2D material like graphene), but they're very different techniques.
In brief, quantum particles act like waves sometimes (think ripples of water) and act like particles (think tiny billiard balls) at other times. The consequence of that, and a major tenet of quantum mechanics, is the 'wavefunction' of a quantum particle. A wavefunction of a particle, instead of just being a single point in space, amounts to a probability density of position and momentum.
So now that we know the above, in these images, what they are actually measuring is the spatial probability density of the electrons. The lighter values correspond to high density of electrons. The darker values correspond to less electron density. The high density occurs around the nuclei of the atoms. Thus, atomic resolution. However, note that individual electrons are not resolved in these images.
Finally, I want to recommend against thinking of atoms as electron planets orbiting a nucleus sun full of empty space in between. That thinking ignores quantum mechanics. The truth is much more fascinating, which is that electrons are wave-particles that have probabilistic densities.
P.S. Protons and neutrons are themselves made up of more elementary quantum particles: quarks!
Does the microscope care about probabilistic densities? Aren't these images rendering interference/difference between electrons sent out and electrons received?
Those 'particles' supposedly were in the space and interacted with the electron beams. They were or weren't in a place at a time.
In wave-particle duality, how are we not just suggesting particles are there because it gives us an approximate model that helps explain the behaviors observed?
In the faceted nano-diamond void, why do we see apparent 'ghosts' of the lattice? Are there particles there or not? If there are, why are they dim?
Right. Well, given your comment and niels_olson's, I feel I have misspoken about this STEM experiment. Not that what I said about quantum mechanics was wrong, just that applying it to the measurement in this experiment was misguided.
As niels_olson points out, the primary interaction here is between the electron beam and the atomic nuclei.
Just a satisfied customer of Metaflow here. thx