Monday, July 3rd 2023
Google Will Use Your Data to Train Their AI According to Updated Privacy Policy
Google made a small but important change to its privacy policy over the weekend that effectively lays claim to anything you post publicly online for use in training its AI models. The original wording of the relevant section of the privacy policy said that public data would be used for business purposes, research, and improving Google Translate services. Now, however, the section has been updated to read as follows:
Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.

Further down in the policy text, Google has another section that gives examples of the "publicly available" information it seeks to scrape:

For example, we may collect information that's publicly available online or from other public sources to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities. Or, if your business's information appears on a website, we may index and display it on Google services.

The change took effect on July 1st, 2023. Given the scope and longevity of Google accounts (think how long some people have had Gmail and YouTube accounts), this change now formally covers an incredibly vast amount of public interaction data spanning decades. What is still uncertain is whether people who have committed to "de-Googling" their online lives could be caught up in the dragnet of Google's data scraping regardless of whether they've agreed to this policy change, or whether simply having had any contact with Google over the years is enough. Large-scale scraping of public data, regardless of individual consent, has already been used to train other large language models, such as the one behind OpenAI's ChatGPT. Ideally, though, this change affects only those who have active accounts with Google services. One important point is that Google does not mention using private data, so data shared privately with Google is apparently safe from being ingested into the AI machine. For now.

Sources:
Gizmodo, Google
35 Comments on Google Will Use Your Data to Train Their AI According to Updated Privacy Policy
To not leverage that advantage would be asinine.
Always awesome to see these changes happen before a long weekend when everyone is too busy living life to put it in a news cycle. :shadedshu:
Google is built on other people's data. It's that simple.
I avoid their products like the plague.
It’s OK for Romance and Germanic languages, but there is a lot to be desired when going between, let’s say, English and Korean.
Do not answer this; laws won't fix anything, I am just venting.
And Google, you can go to hell! Assimilate this comment, you twat.
gizmodo.com/google-says-itll-scrape-everything-you-post-online-for-1850601486
Conflating information that isn't blocked from collection (so that it appears in search results) with information used to train AI seems suspiciously like Google abusing its market position.
developers.google.com/search/docs/crawling-indexing/block-indexing
This simple mechanism has existed forever. But companies don't use it. Why? Because they need traffic Google generates for them, many would go under without it. At the same time, they feel that because Google will slap a (contextual) ad next to their content and charge for that, they are entitled to a piece of that, too.
I mean, let's turn things around and imagine if Google asked for a part of the companies' revenue because they sent some traffic their way. Crazy, right?
At the same time, I have no doubt somewhere in Russia and China someone is training models on their troll farms specifically.
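The block-indexing page linked in the comment above describes Google's `noindex` rule, which a site can give either as a `<meta name="robots" content="noindex">` tag in a page's HTML or as an `X-Robots-Tag: noindex` HTTP response header. Below is a minimal sketch of both, assuming a toy server built on Python's standard library; the `NoIndexHandler` and `serve_once` names and the sample page are hypothetical, for illustration only. Per Google's documentation, the crawler has to be able to fetch the page to see the rule, so a page using `noindex` should not also be blocked in robots.txt.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample page carrying the page-level robots meta tag.
PAGE = b"""<!DOCTYPE html>
<html>
  <head>
    <!-- Asks crawlers that honor the rule not to index this page -->
    <meta name="robots" content="noindex">
    <title>Portfolio</title>
  </head>
  <body>Example page</body>
</html>
"""


class NoIndexHandler(BaseHTTPRequestHandler):
    """Serves PAGE and also sends X-Robots-Tag, the HTTP-header form of
    the robots meta tag (the only option for non-HTML files like PDFs)."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def serve_once():
    """Start the server on a free local port, fetch one page, and return
    the X-Robots-Tag header value and the body a client receives."""
    server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
            return resp.headers.get("X-Robots-Tag"), resp.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    header, body = serve_once()
    print(header)  # noindex
```

Either form is only a request: it keeps the page out of Search results for crawlers that respect it, which is also why, as the comment notes, it cuts off the search traffic sites depend on.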
I don't see how that isn't abusing a monopolistic position.
It’s not a reproduction of the material; it’s an interpretation of it.
Before Google, there were many people and shows that would summarize the printed press for the masses. Obviously they made money off of that; they would include advertising and stuff in their 15-30 minute slots. Yet nobody thought about asking for a piece of that pie.
What they can't do is leverage their effective monopoly on search as a means of preventing others from protecting their work from being used, for further profit, to train an AI model.
In certain sectors, AI threatens livelihoods. If that were you, you'd be pretty annoyed to be undercut by a derivative of your own work, taken without your consent... while person-on-the-internet's solution is to prevent indexing of your portfolio, as if that's a perfectly valid option. We're entering a murky new world where arguments about what a human is capable of don't really translate to an AI... which can 'learn' more in a few minutes than a human could in a lifetime.
This is Stable Diffusion's rendition of an image. I wonder if they scraped Getty Images?
On a new install of Windows, not logged into Google, I have no pagination, just an option to either auto-generate new results on scroll or to click manually.
Luckily, at the moment, I still have pagination on this PC whilst signed in.