Opinionated article by Alexander Hanff, a computer scientist and privacy technologist who helped develop Europe’s GDPR (General Data Protection Regulation) and ePrivacy rules.
We cannot allow Big Tech to continue to ignore our fundamental human rights. Had such an approach been taken 25 years ago in relation to privacy and data protection, arguably we would not have the situation we have today, where some platforms routinely ignore their legal obligations to the detriment of society.
Legislators did not understand the impact of weak laws or weak enforcement 25 years ago, but we have enough hindsight now to ensure we don’t make the same mistakes moving forward. The time to regulate unlawful AI training is now, and we must learn from mistakes past to ensure that we provide effective deterrents and consequences to such ubiquitous law breaking in the future.



That’s the neat part. It doesn’t.
Copyright hasn’t worked for the past 100 years. Copyright was born out of a social agreement that works generated under it would enter the public domain in a reasonable time frame. Thanks to Mark Twain and Disney, the limit is basically forever, or it might as well be. Here we are still arguing about the next Bond film for a book series that was made in the fucking 1950s. Or the Lord of the Rings series, the genesis of all fantasy. Or thousands of other things that deserve to be in the public domain already.
Copyright is a blunt tool that rich people use to bash the poor with. Whatever you think copyright is doing to protect your rights or your works is easy enough for them to just spend enough money with lawyers and cases until you cave. If copyright isn’t working for the public good, then we should abolish it.
People hate AI because it’s mostly developed and used by the rich as a shitty way to save money and lay off even more people than we’ve already had. But it doesn’t have to be. All of these LLM projects were based on freely available research. Hell, Stable Diffusion is still something you can just download and use for free, despite the fact that Stability AI is still trying to wrest back control of the model.
Instead of sticking our fingers in our ears and saying “la la la la, AI doesn’t exist, it must be destroyed/regulated/fined”, we could push this technology to be open source as much as possible. I mean, let’s assume that we somehow regulate AI so that people have to pay to use copyrighted works for training (as absurd as that is). AI training goes down drastically, and stagnates. Countries like China are not going to follow those same rules, and eventually, China will be the technological leader here.
Or the program works, and other people who don’t give a shit about copyright freely allow AI to train on their works. Then you have AI models that have to follow these arcane rules, but arrived at the same spot anyway, and only for the rich people who can afford the systems that allow for that regulation. What the fuck was the point in the regulation, except to make it even more expensive to make?
ISBNDB estimates that there are 158,464,880 published books in existence.
Meta’s annual revenue was ~$156 billion last year.
Assuming a one time purchase scenario and a $20 average cost that’s ~3.2 billion dollars. ~2% of their annual revenue.
Or you could assume a $0.20 annual license (similar to a lot of technology licenses), or $0.002 per “stream” (which in this instance would be ‘use of data to train model’).
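For anyone who wants to check the back-of-the-envelope math above, here is a rough Python sketch. The book count and the ~$156 billion figure are the ones quoted in this comment; the function names and the `uses_per_book` parameter are made up for illustration, and how many "streams" a training run would actually count as is an open assumption.

```python
# Rough cost estimates for licensing every published book for AI training.
# Figures come from the comment above; names and parameters are illustrative.

BOOKS = 158_464_880                 # ISBNDB estimate quoted above
META_ANNUAL_REVENUE_USD = 156e9     # ~$156 billion, as quoted above


def one_time_purchase(price_per_book: float = 20.0) -> float:
    """Total cost of buying every book once at an average price."""
    return BOOKS * price_per_book


def annual_license(fee_per_book: float = 0.20) -> float:
    """Total yearly cost of a flat per-book license fee."""
    return BOOKS * fee_per_book


def per_use(fee_per_use: float = 0.002, uses_per_book: int = 1) -> float:
    """Total cost of a per-"stream" fee; uses_per_book is an assumption."""
    return BOOKS * fee_per_use * uses_per_book


def share_of_revenue(cost: float) -> float:
    """Cost as a fraction of the quoted annual revenue figure."""
    return cost / META_ANNUAL_REVENUE_USD


if __name__ == "__main__":
    for label, cost in [
        ("one-time purchase", one_time_purchase()),
        ("annual license", annual_license()),
        ("per use (1 use per book)", per_use()),
    ]:
        print(f"{label}: ${cost:,.0f} ({share_of_revenue(cost):.4%} of revenue)")
```

Swap in your own prices or a different `uses_per_book` to see how sensitive the totals are; the point is only that the one-time purchase scenario lands around 2% of the quoted revenue figure.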
I agree with most of what you said, but if you buy into a lot of the economic paradigms your arguments are based on, you must also realize that those require that copyrighted works be paid for, and it’s not unreasonable to do so.
Sure. Copyright is broken. And it certainly doesn’t help that I’m paying Spotify etc. just so they can pocket the money. But don’t we need something so Hollywood can produce my favorite TV show? I mean, that stuff costs millions and millions to make before it somehow arrives on my screen. Or so an author can make a decent living by coming up with a nice fantasy novel series? What’s the alternative until we arrive at Star Trek and money is a thing of the past?
I’m pretty sure the AI companies are stealing copyrighted work. Afaik Meta admitted doing it. For several older ones we know which books were in the training datasets. There are several ongoing lawsuits dealing with books being used to train AI, Scarlett Johansson’s voice etc.
I agree. As is, AI is a plaything for rich companies. They have complete control, since they hired the experts and they have the money for all the graphics cards and electricity. If it’s as disruptive as people claim, it’s our bad. Because we’re out of the loop.