Google Will Use Your Data to Train Their AI According to Updated Privacy Policy

Fouquin · Jul 4, 2023

Google made a small but important change to its privacy policy over the weekend that effectively lays claim to anything you post publicly online for use in training its AI models. The original wording of that section said that public data would be used for business purposes, research, and for improving Google Translate services. Now, however, the section has been updated to read as follows:

Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.

Further down in the policy text, Google has another section that gives examples of the "publicly available" information it seeks to scrape:

For example, we may collect information that's publicly available online or from other public sources to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities. Or, if your business's information appears on a website, we may index and display it on Google services.

The new change went into effect on July 1st, 2023. Given the scope and longevity of Google accounts (think how long some people have had Gmail and YouTube accounts), this change now formally includes an incredibly vast amount of public interaction data stretching over decades. What is still uncertain is whether individuals who have committed to "de-Googling" their online lives could be caught up in the dragnet of Google's data scraping regardless of whether they've agreed to this policy change, or if simply having had any contact with Google over the years is enough. Large-scale public scraping has already been happening regardless of individual consent for other large language models, such as those behind OpenAI's ChatGPT. Ideally, though, this change affects only those who have active accounts with various Google services. One important point is that Google does not mention anything about using private data, and such data shared with Google is apparently safe from being ingested into the AI machine. For now.


FierceRed · Jul 4, 2023

Can't be too surprised. The only reason Google is so powerful is their population of users using their services over decades. Deeper datasets = deeper insights usually.

To not leverage that advantage would be asinine.

Always awesome to see these changes happen before a long weekend when everyone is too busy living life to put it in a news cycle. :shadedshu:

R0H1T · Jul 4, 2023

You can always choose to delete your data with them like browsing history/youtube/maps/search queries et al. Though I'm not sure when it's completely purged from their servers, if at all.

Ferrum Master · Jul 4, 2023

If it trains on online trolls, what a bright future we will have here

Jism · Jul 4, 2023

Future fully automated - any human intervention gone at some point.

Google is built on other people's data. It's that simple.

Bomby569 · Jul 4, 2023

That's their business model: use other people's data to make money. The disclaimer is just that, like saying tobacco kills.
I avoid their products like the plague.

bug · Jul 4, 2023

They've always used users' data in their products (and been upfront about it), I'm surprised using it for AI wasn't already covered.

Bomby569 · Jul 4, 2023

bug said:
They've always used users' data in their products (and been upfront about it), I'm surprised using it for AI wasn't already covered.

AI will be heavily regulated in some places like the EU, and I think Japan too, so this is probably to avoid any legal problems from not mentioning it specifically.

kondamin · Jul 4, 2023

I hope we can see improvements to translations quickly.

It’s OK for Romance and Germanic languages, but there is a lot to be desired when going between, let’s say, English and Korean.

..0 · Jul 4, 2023

"AI": they still use that misleading marketing term. Are there no laws to stop this shit.....
Do not answer this, laws won't fix anything, I am just venting.
And Google, you can go to hell! Assimilate this comment, you twat.

trsttte · Jul 5, 2023

Wait, weren't they already doing so!?

claes · Jul 5, 2023

Yes, the original source is a little clearer on the change: in the past they used the term “language models” and now they explicitly say AI.

Google Says It'll Scrape Everything You Post Online for AI

An update to Google's privacy policy suggests that the entire public internet is fair game for its AI projects.

gizmodo.com

kapone32 · Jul 5, 2023

Since Google have decided not to cover Canadian news over having to pay for news content, they can go somewhere.

bushlin · Jul 17, 2023

Many things are publicly available but copyright, ownership and rights to reproduce are not as free and easy as Google appear to be expecting.
Conflating information not blocked for collection so that it appears in search results, with information used to train AI seems suspiciously like Google abusing their market position.

bug · Jul 17, 2023

bushlin said:
Many things are publicly available but copyright, ownership and rights to reproduce are not as free and easy as Google appear to be expecting.
Conflating information not blocked for collection so that it appears in search results, with information used to train AI seems suspiciously like Google abusing their market position.

Google is not abusing anything; companies are using a double standard when they complain about Google using their content.

Block Search Indexing with noindex | Google Search Central | Documentation | Google for Developers

A noindex tag can block Google from indexing a page so that it won't appear in Search results. Learn how to implement noindex tags with this guide.

developers.google.com

This simple mechanism has existed forever. But companies don't use it. Why? Because they need traffic Google generates for them, many would go under without it. At the same time, they feel that because Google will slap a (contextual) ad next to their content and charge for that, they are entitled to a piece of that, too.
I mean, let's turn things around and imagine if Google asked for a part of the companies' revenue because they sent some traffic their way. Crazy, right?
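
For anyone who hasn't seen the mechanism in practice: the linked guide describes noindex as either a robots meta tag in the page's HTML or an `X-Robots-Tag` HTTP response header. Below is a minimal Python sketch, standard library only, that checks a page for either form. The names `RobotsMetaParser` and `blocks_indexing` are made up for illustration and are not part of any Google tooling. Note that this only concerns Search indexing, as the guide's description says; nothing here should be read as a documented opt-out from AI training.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def blocks_indexing(html, headers=None):
    """Return True if the HTML or the response headers carry a noindex directive.

    Hypothetical helper: `headers` is assumed to be a plain dict of
    response header names to values.
    """
    parser = RobotsMetaParser()
    parser.feed(html)

    header_value = ""
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}

    return "noindex" in parser.directives or "noindex" in header_directives
```

Calling `blocks_indexing(page_html, response_headers)` on a fetched page gives a quick yes/no on whether the site has asked to be kept out of search results.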

R-T-B · Jul 17, 2023

Ferrum Master said:
If it trains on online trolls, what a bright future we will have here

This is largely why guardrails have been implemented: that has already become an issue.

bug · Jul 17, 2023

Ferrum Master said:
If it trains on online trolls, what a bright future we will have here

Most people don't understand how these models work (I got it wrong at first, too): they are trained on a set of curated inputs, the resulting model is tested and released only if it passes validation. It doesn't learn anything after that. It's quite a big limitation, but at the same time, it's the only way to guarantee models won't go off the farm.

At the same time, I have no doubt somewhere in Russia and China someone is training models on their troll farms specifically.

bushlin · Jul 18, 2023

bug said:
Google is not abusing anything; companies are using a double standard when they complain about Google using their content.

Block Search Indexing with noindex | Google Search Central | Documentation | Google for Developers

A noindex tag can block Google from indexing a page so that it won't appear in Search results. Learn how to implement noindex tags with this guide.

developers.google.com

This simple mechanism has existed forever. But companies don't use it. Why? Because they need traffic Google generates for them, many would go under without it. At the same time, they feel that because Google will slap a (contextual) ad next to their content and charge for that, they are entitled to a piece of that, too.
I mean, let's turn things around and imagine if Google asked for a part of the companies' revenue because they sent some traffic their way. Crazy, right?

By your logic, if you don't want to compromise the ownership and copyright of your online content for AI training you must cut off by far the most used, effectively a monopoly, search engine and tank your discoverability... Pick one.
I don't see how that isn't abusing a monopolistic position.

claes · Jul 19, 2023

Don’t know what copyright has to do with any of this. I consume copyrighted properties all day, analyze them, and then summarize and make observations about it. Is this illegal?

It’s not a reproduction of material, it’s an interpretation of it.

bug · Jul 19, 2023

bushlin said:
By your logic, if you don't want to compromise the ownership and copyright of your online content for AI training you must cut off by far the most used, effectively a monopoly, search engine and tank your discoverability... Pick one.
I don't see how that isn't abusing a monopolistic position.

By your own logic, Google is obligated to index everything under the Sun and pay for things they index?

Before Google there were many persons/shows that would summarize printed press for the masses. Obviously they made money off of that, they would include advertising and stuff in their 15-30 minute slots. Yet nobody thought about asking for a piece of that pie.

bushlin · Jul 19, 2023

bug said:
By your own logic, Google is obligated to index everything under the Sun and pay for things they index?

Before Google there were many persons/shows that would summarize printed press for the masses. Obviously they made money off of that, they would include advertising and stuff in their 15-30 minute slots. Yet nobody thought about asking for a piece of that pie.

What Google are obliged to do is not break antitrust law. Google profit greatly from the work of others by indexing it and serving ads alongside search results. They're not performing an altruistic act, it's business.
What they can't do is leverage the effective monopoly they have on search as a means of preventing others from protecting their work from being further profited from to train an AI model.

In certain sectors AI threatens livelihoods. If that were you, you'd be pretty annoyed at being undercut by a derivative of your own work, taken without your consent... while person-on-the-internet's solution is to prevent indexing of your portfolio, as if that's a perfectly valid option.

claes said:
Don’t know what copyright has to do with any of this. I consume copyrighted properties all day, analyze them, and then summarize and make observations about it. Is this illegal?

It’s not a reproduction of material, it’s an interpretation of it.

We're entering a new murky world where the arguments about what a human is capable of don't really translate to an AI... which can 'learn' more in a few minutes than would be possible in the lifetime of a human.
This is Stable Diffusion's implementation of an image, I wonder if they scraped Getty Images?

claes · Jul 20, 2023

You mean because I am stupider than a machine I can’t photoshop memes anymore? Damn.

AsRock · Jul 20, 2023

FierceRed said:
Can't be too surprised. The only reason Google is so powerful is their population of users using their services over decades. Deeper datasets = deeper insights usually.

To not leverage that advantage would be asinine.

Always awesome to see these changes happen before a long weekend when everyone is too busy living life to put it in a news cycle.

And you cannot avoid using them, as a lot of games even require you to connect to Google these days.

chrcoluk · Jul 20, 2023

A bigger issue is that they seem to be rolling out a removal of pagination on their text search page. That's a big enough issue for me that it will stop me using the search engine.

On a new install of Windows, with me not logged into Google, I have no pagination, just an option to either auto-generate new results on scroll or to click manually.

Luckily at the moment on this PC whilst signed in I still have pagination.

claes · Jul 20, 2023

Really?
