
> The reason is this:

> Both ethical and safe conduct depend on context and intent.

The same applies to knives, and they can be plenty useful and used in a safe manner.


I suppose the argument could be made that knives are inherently unsafe, and that no matter what, it is important to always treat them as unsafe. This doesn't imply that you shouldn't use knives, just that you should be aware of their inherent unsafety?

I don't know; I didn't really agree with the post, I'm just trying my best to steelman it.


"AI will never be entirely ethical or safe because it's like having a knife, a gun, a hardware store, and a medical doctor, all in one convenient interface."

I can understand the appeal; being able to be "present" without the time cost can mean (possibly significantly more) presence at the same cost. This could be especially attractive to those managing personal relationships, like sales representatives.

But I'm surprised that the risks seem to be so underestimated.

Once this clone exists, what happens if it gets out into the wild? Imagine everyone having full access to what is effectively a digital model of your personality. Imagine your competition putting your own model to use against you.

And the better the approximation of this model, the worse the damage to yourself.


> being able to be "present" without the time cost can mean (possibly significantly more) presence at the same cost.

This is magical thinking. "Presence" and "time cost" are inextricably linked. You can't have one without the other.

When you use AI to decouple them, you're telling your audience/colleagues/attendees that you want them to listen to you but not the other way around.


> This is magical thinking.

But it was helpful to me!

Reading it, I mean. The commenter put into words exactly why someone would think this would be a good idea.

Of course, you're 110% right that it isn't, but it's still nice that HN provides some subtitles for those who are out of the loop and out of substances in their bloodstream.


Very ironic for the billionaire to be openly replacing himself with AI. I suppose he believes his job is easy enough that an LLM can do it, so we definitely don't need him.

Yes, exactly. Anyone training a model to replace themselves is replacing themselves -- with something that can run 24/7 and can easily scale. And the better the model, the easier the replacement.

Hence my surprise that MZ, of all people, is arguing in this direction.

I would think that the potential for malicious abuse alone should have scared him off of this.


We will never know whether he is locked in his smart closet or has just become a recluse.

> Imagine your competition putting your own model to use against you.

I imagine that this is part of the original plan. “Okay, we wasted 80 billion dollars on VR, and that hurts. But if we can somehow convince all of our competitors to also waste 80 billion dollars each, then it’ll even out. How can we trick our competitors into thinking more like Zuckerberg?”


The real risk is when shareholders realize an LLM can do the CEO's job.

But you still get a lot of "shareholder responsibility" comments. Imagine a company that dumps sewage into a river (be that literal or metaphorical). Internet people come around to tell you that this is the nature of capitalism, that the shareholder structure means (increasing?) return on investment is critical, and that CEOs therefore have to spend all their waking hours juggling this.

Am I arguing against this? I don't know - I'm not an economist. But I would like to point out there is such a thing as shareholder fraud, and the Venn diagram between "sacrifice quality to please shareholders" and "deceiving shareholders" has to be one big intersecting circle, you know? Especially when the guy (Zuckerberg, with dual-class shares) can't ever be fired.


USB/IP has been pretty useful to me, though locking it down is a bit of a chore, as it does not natively support any type of authentication or authorization (a not unreasonable design decision).

Maybe tunnel it over a secure protocol? If possible?

You use it on a LAN, I’m guessing?
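A minimal sketch of the tunneling idea, assuming the usbipd daemon listens on its default port 3240 (the hostname and bus ID here are made up):

    # forward the remote usbipd port over SSH, then attach through the tunnel
    ssh -N -L 3240:localhost:3240 user@usbip-server &
    sudo usbip attach -r localhost -b 1-1.2

That delegates authentication and encryption to SSH instead of exposing the unauthenticated USB/IP port on the network.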

The Radxa Orion O6 is a really nice ARMv9.2 ITX board, and supports UEFI boot. Installation of Debian trixie using Debian's vanilla installation media went flawlessly, and it's been running fine for 6 months now.


Do the peripherals work reliably? Wifi and GPIO especially. It does seem like a very capable board but this is always so hit and miss.


I can't really say, as I don't use them. This host mostly works as a CI worker, connected via Ethernet.

I just scanned for WiFi networks and that worked fine. I also see that GPIO is not enabled for CIXP1 devices in Debian's kernel; I'll ask the kernel team to enable it.


Austrian media are reporting that Peter Steinberger had a $100m exit with PSPDFKit in 2021.

I'm extremely curious what OpenAI's offer was. The utility of more money is diminished when you're already pretty wealthy.


It makes me more inclined to take the OP at face value: genuine interest in working on something similar and making it easier for everyone ('my mum') to use.

It probably also makes him more attractive to OpenAI et al. - he's not just some guy who's going to face all the risks that come with earning a lot of money for the first time.


I think he accepted that offer exactly for this reason. He feels he can have a bigger impact within OpenAI (and maybe become a billionaire in the medium run?) than by creating his own business (again) out of OpenClaw.


> To call training illegal is similar to calling reading a book and remembering it illegal.

Perhaps, but reproducing the book from this memory could very well be illegal.

And these models are all about production.


To be fair, that seems to be where some of the AI lawsuits are going. The argument goes that the models themselves aren't derivative works, but the output they produce can absolutely be - in much the same way that reproducing a book from memory could be copyright violation, trademark infringement, or generally run afoul of the various IP laws.


Models don’t reproduce books though. It’s impossible for a model to reproduce something word for word because the model never copied the book.

Most of the best fit curve runs along a path that doesn’t even touch an actual data point.


They do memorize some books. You can test this trivially by asking ChatGPT to produce the first chapter of something in the public domain -- for example, A Tale of Two Cities. It may not be word-for-word exact, but it'll be very close.
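For instance, something like the following against the API (the model name and prompt are just illustrative):

    curl https://api.openai.com/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "gpt-4o", "messages": [{"role": "user",
           "content": "Recite the opening paragraph of A Tale of Two Cities."}]}'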

These academics were able to get multiple LLMs to produce large amounts of text from Harry Potter:

https://arxiv.org/abs/2601.02671


In that case I would say it is the act of reproducing the books that is illegal. Training the AI on said books is not.

So the illegality rests at the point of output and not at the point of input.

I’m just speaking in terms of the technical interpretation of what’s in place. My personal views on what it should be are another topic.


> So the illegality rests at the point of output and not at the point of input.

It's not as simple as that, as this settlement shows [1].

Also, generating output is what these models are primarily trained for.

[1]: https://www.bbc.com/news/articles/c5y4jpg922qo


Unfortunately a settlement doesn't really show you anything definitive about the legality or illegality of something.

It only shows you that the defendant thought it would be better for them to pay up rather than continue to be dragged through court, and that the plaintiff preferred some amount of certain money now over some other amount of uncertain money later, or never.

We cannot say with any amount of confidence how the court would have ruled on the legality, had things been allowed to play out without a settlement.


>Also, generating output is what these models are primarily trained for.

Yes, but not to generate illegal output. These models were trained with the intent to generate legal output. The fact that they can generate illegal output is a side effect. That's my point.

If you use AI to generate illegal output, that act is illegal. If you use AI to generate legal output, that act is not illegal. Thus the point of output is where the legal question lies. From inception up to training, there is clear legal precedent for the existence of AI models.


If there is one exact sentence taken out of the book and not referenced in quotes and exact source, that triggers copyright laws. So the model doesn't have to reproduce the entire book; it is only required to reproduce one specific sentence (which may be a sentence characteristic of that author or of that book).


> If there is one exact sentence taken out of the book and not referenced in quotes and exact source, that triggers copyright laws.

Yes, and that's stupid, and will need to be changed.


Sure, but that use would easily pass a fair use test, at least in the US.


Models absolutely do reproduce books.

> With a simple two-phase procedure, we show that it is possible to extract large amounts of in-copyright text from four production LLMs. While we needed to jailbreak Claude 3.7 Sonnet and GPT-4.1 to facilitate extraction, Gemini 2.5 Pro and Grok 3 directly complied with text continuation requests. For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984.

https://arxiv.org/abs/2601.02671


The supplementary files in that paper—verbatim reproductions of the full texts of Frankenstein and The Great Gatsby—are pretty instructive. The research group highlighted all additions and omissions, but on most pages the differences are difficult to spot because they are only missing spaces, extra hyphens, and other typographical minutiae.


somewhat related:

"Seymour said he thought it was odd that Apple bought a Cray to design Macs because he was using Macs to design Crays. He sent me his designs for the Cray 3 in MacDraw on a floppy.” reports KentK.

https://cray-history.net/2021/07/16/apple-computer-and-cray-...


> Some 30 years ago, someone challenged me to tell the difference between Pepsi and Coke in a blind taste test.

I did something similar with co-workers recently, who didn't believe there was a meaningful difference between brands. I blind-tasted 6 different glasses and got each one right. I got my favorite (Coke) right just by the first smell; I just had to taste to see whether it was diet or not.

Not that this is a skill or anything. It's just that each of the brands I tasted has a strong characteristic flavor to me, and the difference between real sugar and artificial sweeteners is also stark. I've been drinking diet versions for ages precisely because the sugary ones are just too sweet for me.


How low could that floor be, in dollar terms?

The financial engineering with the Twitter/X takeover was already pretty bold, but a Tesla chunk would probably be an order of magnitude larger than that.


Given that xAI only has a few billion in cash on hand? Very fucking low. It'd bankrupt Elon before reaching that stage though.


There is some nuance to this. Adding comments to the stated goal "Everyone who interacts with Debian source code (1) should be able to do so (2) entirely in git":

(1) "should be able" does not imply "must"; people are free to continue to use whatever tools they see fit

(2) Most Debian work is of course already git-based, via Salsa [1], Debian's self-hosted GitLab instance. This is more about what is stored in git, and how it relates to a source package (= what .debs are built from). For example, currently most Debian git repositories base their work in "pristine-tar" branches built from upstream tarball releases, rather than using upstream branches directly (see the sketch below).

[1]: https://salsa.debian.org
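As a sketch, the typical pristine-tar workflow looks roughly like this (package name and version are made up):

    # import an upstream tarball, storing the delta needed to
    # regenerate it bit-for-bit from git
    gbp import-orig --pristine-tar ../foo_1.2.3.orig.tar.gz

    # later, reproduce the exact original tarball from the git repo alone
    pristine-tar checkout ../foo_1.2.3.orig.tar.gz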


> For example, currently most Debian git repositories base their work in "pristine-tar" branches built from upstream tarball releases

I really wish all the various open source packaging systems would get rid of the concept of source tarballs to the extent possible, especially when those tarballs are not sourced directly from upstream. For example:

- Fedora has a “lookaside cache”, and packagers upload tarballs to it. In theory they come from git as indicated by the source rpm, but I don’t think anything verifies this.

- Python packages build a source tarball. In theory, the new best practice is for a GitHub Action to build the package and for a complex mess to attest that it really came from GitHub Actions.

- I’ve never made a Debian package, but AFAICT the maintainer kind of does whatever they want.

IMO this is all absurd. If a package hosted by Fedora or Debian or PyPI or crates.io, etc claims to correspond to an upstream git commit or release, then the hosting system should build the package, from the commit or release in question plus whatever package-specific config and patches are needed, and publish that. If it stores a copy of the source, that copy should be cryptographically traceable to the commit in question, which is straightforward: the commit hash is a hash over a bunch of data including the full source!
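A rough sketch of what that verification could look like (the URL and tag are placeholders):

    # the commit object names a tree hash, which recursively hashes all contents
    git clone --depth 1 --branch v1.2.3 https://example.com/upstream.git src
    git -C src rev-parse HEAD          # must equal the claimed commit hash
    git -C src ls-tree -r HEAD | head  # every file, each with its blob hash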


This was one of the "lessons learnt" from the XZ incident. One of the (many) steps they took to avoid scrutiny was making modifications that existed in the release tarball but not in the repo.


For lots of software projects, a release tarball is not just a gzipped repo checked out at a specific commit. So this would only work for some packages.


A simple version of this might be a repo with a single file of code in a language that needs compilation, versus a tarball with one compiled binary.

Just having a deterministic binary can be non-trivial, let alone a way to confirm "this output came from that source" without recompiling everything again from scratch.


For most well designed projects, a source tarball can be generated cleanly from the source tree. Sure, the canonical build process goes (source tarball) -> artifact, but there’s an alternative build process (source tree) -> artifact that uses the source tarball as an intermediate.

In Python, there is a somewhat clearly defined source tarball. uv build will happily build the source tarball and the wheel from the source tree, and uv build --from <appropriate parameter here> will build the wheel from the source tarball.
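Concretely, with a recent uv (a sketch; flags may evolve):

    uv build --sdist   # source tree -> source tarball in dist/
    uv build --wheel   # source tree -> wheel in dist/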

And I think it’s disappointing that one uploads source tarballs and wheels to PyPI instead of uploading an attested source tree and having PyPI do the build, at least in simple cases.

In traditional C projects, there’s often some script in the source tree that turns it into the source tarball tree (autogen.sh is pretty common). There is no fundamental reason that a package repository like Debian’s or Fedora’s couldn’t build from the source tree and even use properly pinned versions of autotools, etc. And it’s really disappointing that the closest widely used thing to a proper C/C++ hermetic build system is Dockerfile, and Dockerfile gets approximately none of the details right. Maybe Nix could do better? C and C++ really need something like Cargo.


The hacker in me is very excited by the prospect of pypi executing code from my packages in the system that builds everyone's wheels.


Launchpad does this for everything, as does sbuild/buildd in Debian land. They generally make it work by running the build system in a neutered VM (network access generally not permitted during builds, or limited to only a Debian/Ubuntu/PPA package mirror), and by some degree of invasive process-level patching to make build systems work without just-in-time network access.

SUSE and Fedora both do something similar I believe, but I'm not really familiar with the implementation details of those two systems.


I’m only familiar with the Fedora system. The build is hermetic, but the source inputs come from fedpkg new-sources, which runs on the client used by the package developer.


This seems no worse than GitHub Actions executing whatever random code people upload.

It’s not so hard to do a pretty good job, and you can have layers of security. Start with a throwaway VM, which highly competent vendors like AWS will sell you at a somewhat reasonable price. Run as a locked-down unprivileged user inside the VM. Then use a tool like gVisor.

Also… most pure Python packages can, in theory, be built without executing any code. The artifacts just have some files globbed up as configured in pyproject.toml. Unfortunately, the spec defines the process in terms of installing a build backend and then running it, but one could pin a couple of trustworthy build backend versions and constrain them to configurations where they literally just copy things. I think uv-build might be in this category. At the very least, I haven’t found any evidence that current uv-build versions can do anything nontrivial unless generation of .pyc files is enabled.
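As a sketch, the pyproject.toml side of pinning a backend could be as simple as this (the version bounds here are made up):

    [build-system]
    requires = ["uv_build>=0.5,<0.6"]
    build-backend = "uv_build"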


If it isn't at least a gzip of a subset of the files of a specific commit of a specific repo, someone's definition of "source" would appear to need work.


To get a specific commit from a repo you usually need to clone, which can involve a much bigger download than just fetching your tar file.


Shallow clones are a thing. And it’s fairly straightforward to create a tarball that includes enough hashes to verify the hash chain all the way to the commit hash. (In fact, I once kludged that up several years ago, and maybe I should dust it off. The tarball extracted just like a regular tarball but had all the git objects needed hiding inside in a way that tar would ignore.)
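E.g., a single commit can often be fetched shallowly without a full clone (assuming the server allows fetching arbitrary SHAs; the URL and hash are placeholders):

    git init src && cd src
    # works if the server sets uploadpack.allowAnySHA1InWant
    git fetch --depth 1 https://example.com/upstream.git <commit-hash>
    git checkout FETCH_HEAD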


I don't actually see why you'd need to verify the hash chain anyway. The point of a source tarball, as I understand it, is to be sure of what source you're building, and to be able to audit that source. The development path would seem to be the developer's concern, not the maintainer's.


> The point of a source tarball, as I understand it, is to be sure of what source you're building

Perhaps, in the rather narrow sense that you can download a Fedora source tarball and look inside yourself.

My claim is that upstream developers produce actual official outputs: git commits and sometimes release tarballs. (But note that release tarballs on GitHub are often a mess and not really desired by the developer.). And I further think that verification that a system like Fedora or Debian or PyPI is building from correct sources should involve byte-for-byte comparison of the source tree and that, at least in the common case, there should be no opportunity for a user of one of these systems to upload sources that do not match the claimed upstream sources.

The sadly common workflow where a packager clones a source tree, runs some scripts, and uploads the result as a “source tarball” is, IMO, wrong.


You know git allows history rewrites, right?


Of the head, or of any commit?


I’m not sure why this would make a difference. The only thing special about the head is that there is a little file (that is not, itself, versioned) saying that a particular commit is the head.
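You can see this directly (a sketch; refs may also live in .git/packed-refs):

    cat .git/HEAD              # e.g. "ref: refs/heads/main"
    cat .git/refs/heads/main   # the commit hash it points to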


> If a package hosted by Fedora or Debian or PyPI or crates.io, etc claims to correspond to an upstream git commit or release, then the hosting system should build the package, from the commit or release in question plus whatever package-specific config and patches are needed, and publish that.

For Debian, that's what tag2upload is doing.


Shoutout to the AUR. I’m trying Arch for the first time (Omarchy) and wasn’t planning on using the AUR, but realized how useful it is when 3 of the tools I wanted to try were distributed differently. The AUR made it insanely easy… (namely, I had issues with Obsidian and Google Antigravity)
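For reference, the manual flow is just this (the package name is hypothetical; helpers like yay or paru automate it):

    # fetch the PKGBUILD, then build and install the package
    git clone https://aur.archlinux.org/some-package.git
    cd some-package && makepkg -si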


If "whatever tools they see fit" means "patch quilting" then please no. Leave the stone age and enter the age of modern DVCS.


git can be seen as porcelain on top of patch quilting, so it's not as much stone age as one might think


This is a misunderstanding of what Git does. Git is a Merkle-hash-tree-based, content-addressed, immutable/append-only filesystem, with commits as objects that bind a filesystem root by its hash. The diffs that make up a commit are not really its contents -- they are computed as needed. Now, most of the time it's best to think of Git as a patch-quilting porcelain, but it's really more than that, and while you can get very far with that model, at some point you need to understand that it goes deeper.
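You can poke at the object model directly; diffs are computed on demand rather than stored:

    git cat-file -p HEAD           # commit: tree hash, parent(s), message
    git cat-file -p 'HEAD^{tree}'  # the tree: a blob hash per path
    git diff HEAD~1 HEAD           # derived from two snapshots, not stored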


That point is not reached during packaging though.

I prefer rebasing git histories over messing with the patch quilting that Debian packaging standards use(d to use). Though the last time I had to use the Debian packaging mechanisms, I round-tripped them through git to work on them. I lost nothing during the export.


Yes, I also end up doing things like that, but it's just a pain. If Debian did it themselves then adding a local commit would be truly trivial.

