I am trying to find how the synthetic data was created (looking through the repo) and didn't find it. Maybe I am missing it - would love to see the prompts and the process for that aspect of the training data generation!
Oh my! Thanks for the memories - HPUX was my first workstation-class Unix operating system (SGIs were too expensive). I remember downloading and compiling gcc on HPUX. The idea of compiling a compiler with itself blew my mind!
Very nice. It's great to see how fast it boots, and it can run Doom (framebuffer): https://www.youtube.com/watch?v=Ce1pMlZO_mI (also nice to see, in the YouTube comments, that the dev takes the time to reply to an aspiring CS student about what it takes to grow in this field)
Could someone explain why/how NBD is better than just using a Linux host as an iSCSI target? Googling NBD vs iSCSI turns up old articles with no real solid conclusion.
NBD is an extremely simple protocol. Read range, write range, delete range, sync -- that's it. If you want to implement an NBD server from scratch, you can totally do so in an afternoon. I have done this and use it in production: https://github.com/sandstorm-io/blackrock/blob/master/src/bl...
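To give a feel for how small that command set is, here's an illustrative sketch in Python that models the four core operations against a plain file backend. This is not a real NBD server - it omits the wire protocol, handshake, and option negotiation entirely, and the file path and class name are made up for the example:

```python
import os

class FileBackend:
    """Toy backend modeling the four core NBD data commands."""

    def __init__(self, path, size):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, size)  # fixed-size block device

    def read(self, offset, length):            # ~ NBD_CMD_READ
        return os.pread(self.fd, length, offset)

    def write(self, offset, data):             # ~ NBD_CMD_WRITE
        os.pwrite(self.fd, data, offset)

    def trim(self, offset, length):            # ~ NBD_CMD_TRIM ("delete range")
        os.pwrite(self.fd, b"\0" * length, offset)  # naive: just zero the range

    def flush(self):                           # ~ NBD_CMD_FLUSH
        os.fsync(self.fd)

    def close(self):
        os.close(self.fd)
```

A real server wraps each of these in a tiny request/reply framing over a socket, which is why an afternoon really is enough.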
iSCSI is comparatively far more complex. It's a TCP-based adaptation of the SCSI protocol, which has existed for decades as a way to talk to hard drives. As I understand it, you can pass arbitrary SCSI commands over iSCSI; see https://en.wikipedia.org/wiki/SCSI_command. iSCSI is enterprise-y and has a bigger ecosystem. You can netboot a diskless machine into Windows over iSCSI (I do this: http://kentonsprojects.blogspot.com/2011/12/lan-party-house-...).
Personally I like NBD a lot better because the simplicity means you can build new, cool things with it. But there are others who would say that NBD is a toy compared to iSCSI.
Did you encounter any problems with NBD caching when it acknowledges the write to the application but doesn't pass it to your "backend" therefore leaving no room for error handling if that backend goes away?
NBD provides a virtual block device, so all the normal filesystem caching the kernel does above a hard drive applies to NBD as well. This is good: this is what makes it so fast.
Just because `write()` returned successfully does not mean that the data has been written to disk (whether you're using NBD or otherwise). The application needs to call `fsync()` to force writes to disk and get confirmation of success. An `fsync()` will send all pending NBD_CMD_WRITEs followed by NBD_CMD_FLUSH and will only return success when all of these have completed successfully.
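The durability contract described above can be sketched like this (a minimal Python example; the function name is mine, not from any library):

```python
import os

def durable_write(path, data):
    """Write data and only return once the device has confirmed it.

    A plain write() may be acknowledged from the page cache; over NBD
    it's the fsync() that drains pending NBD_CMD_WRITEs and issues an
    NBD_CMD_FLUSH before reporting success.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)  # may land only in the kernel's cache
        os.fsync(fd)        # blocks until the backend confirms durability
    finally:
        os.close(fd)
```

If the backend goes away mid-flight, the error surfaces at `fsync()` time, which is exactly where the application is supposed to check for it.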
Gut feeling: NBD vs. iSCSI is like NFS vs. SMB. It's not that NBD has magical features, but it is more integrated, more purpose-built, and built in (as in, in the kernel).
Development-wise it's a much simpler protocol: iSCSI has a lot of its own complexity plus the SCSI complexity to implement, while NBD has a reasonably short RFC-style document.
Slightly misleading on the source of the profits. It's important to read this section of the article:
"But the company's real profits are derived from a lesser-known side of the business: property development."
and
"Here's how it works: MTR enjoys a special relationship with the Hong Kong government, which is also its majority shareholder. The government provides land -- at no cost -- for use by the train operator, and MTR is then allowed to develop the areas above and around its stations."
So the government loans out land (which is crazy expensive in Hong Kong) for free, and the MTR gets to keep the profits from leasing out that land via malls, etc.