Sponsored material

Google Cloud
Introduction
Enterprises manage structured data in organized tables while also accumulating growing volumes of unstructured data such as images, audio, and documents. Analyzing these diverse data types together has traditionally been complicated, because they often require separate tools. Unstructured media usually requires exports to specialized services for processing (such as a computer vision service for image analysis, or speech-to-text for audio), which creates data silos and hinders comprehensive analysis.
Consider a fictional e-commerce support system: the ticket details live in a BigQuery table, while the support call recordings and images of damaged products sit in cloud object storage. Without a direct link between them, answering a context-rich question such as "identify all support tickets for a specific laptop model where the call audio indicates high user frustration and the damage is visible in the image" is a burdensome, multi-step process.
This article is a practical, technical guide to ObjectRef in BigQuery, a feature designed to unify this kind of analysis. We will walk through how to build, query, and govern multimodal datasets, enabling comprehensive insights using both SQL and Python.
Part 1: ObjectRef – the key to unifying multimodal data
ObjectRef’s structure and function
To overcome the challenge of siloed data, BigQuery introduced a special STRUCT data type called ObjectRef. An ObjectRef acts as a direct reference to an unstructured data object stored in Google Cloud Storage (GCS). It does not contain the unstructured data itself (such as a base64-encoded image in a database column, or duplicated audio bytes); instead, it points to the data's location, which allows scalable access and makes the data queryable for analysis.
The Object Reef structure consists of several important fields:
- uri (STRING): the GCS path of the object
- authorizer (STRING): the BigQuery connection that grants secure access to the GCS object
- version (STRING): the GCS generation identifier of the object, which pins the exact version for reproducible analysis
- details (JSON): a JSON element containing GCS metadata such as contentType and size
Here’s a JSON representation of an ObjectRef value:
JSON
{
"uri": "gs://cymbal-support/calls/ticket-83729.mp3",
"version": 1742790939895861,
"authorizer": "my-project.us-central1.conn",
"details": {
"gcs_metadata": {
"content_type": "audio/mp3",
"md5_hash": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
"size": 5120000,
"updated": 1742790939903000
}
}
}
By encapsulating this information, an ObjectRef provides all the details needed to locate, securely access, and understand the basic characteristics of an unstructured file in GCS. It forms the building block of multimodal tables and DataFrames, allowing structured data to live side by side with references to unstructured content.
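To make the anatomy concrete, here is a small client-side Python sketch that unpacks an ObjectRef-shaped value like the JSON above. This is illustrative only: parse_object_ref is a hypothetical helper doing plain dict and string handling, not a BigQuery client API.

PYTHON

```python
def parse_object_ref(ref: dict) -> dict:
    """Split an ObjectRef-style dict into its addressable parts.

    Hypothetical helper for illustration; it assumes the dict follows
    the JSON shape shown above and is not a BigQuery client API.
    """
    uri = ref["uri"]
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    # gs://<bucket>/<object path>
    bucket, _, object_path = uri[len("gs://"):].partition("/")
    metadata = ref.get("details", {}).get("gcs_metadata", {})
    return {
        "bucket": bucket,
        "object": object_path,
        "generation": ref.get("version"),
        "content_type": metadata.get("content_type"),
    }

ref = {
    "uri": "gs://cymbal-support/calls/ticket-83729.mp3",
    "version": 1742790939895861,
    "authorizer": "my-project.us-central1.conn",
    "details": {"gcs_metadata": {"content_type": "audio/mp3"}},
}
parsed = parse_object_ref(ref)
print(parsed["bucket"], parsed["object"])  # cymbal-support calls/ticket-83729.mp3
```

Because the version field carries the GCS generation number, a downstream reader could pin exactly that object revision when fetching the file, which is what makes analyses reproducible.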
Create multimodal tables
A multimodal table is a standard BigQuery table that includes one or more ObjectRef columns. This section covers how to build these tables and populate them with SQL.
When creating a new table, you can declare ObjectRef columns explicitly, or you can add them to existing tables. This flexibility lets you adapt your existing data models to take advantage of multimodal capabilities.
Create ObjectRef columns with object tables
If you have many files stored in a GCS bucket, an object table is an efficient way to generate ObjectRefs. An object table is a read-only table that reflects the contents of a GCS directory and automatically includes a column named ref of type ObjectRef.
SQL
CREATE EXTERNAL TABLE `project_id.dataset_id.my_table`
WITH CONNECTION `project_id.region.connection_id`
OPTIONS(
object_metadata="SIMPLE",
uris = ['gs://bucket-name/path/*.jpg']
);
The output is a new table with a ref column. You can use the ref column with functions such as AI.GENERATE, or join it to other tables.
Create ObjectRefs programmatically
For more dynamic workflows, you can create ObjectRefs programmatically with the OBJ.MAKE_REF() function. It is common to wrap this function in OBJ.FETCH_METADATA() to populate the details element with GCS metadata. The following code also works if you reference a gs:// URI field in an existing table instead of a literal path.
SQL
SELECT
OBJ.FETCH_METADATA(OBJ.MAKE_REF('gs://my-bucket/path/image.jpg', 'us-central1.conn')) AS customer_image_ref,
OBJ.FETCH_METADATA(OBJ.MAKE_REF('gs://my-bucket/path/call.mp3', 'us-central1.conn')) AS support_call_ref
Whether you use object tables or OBJ.MAKE_REF, you can build and maintain multimodal tables, setting the stage for integrated analytics.
Part 2: Working with multimodal tables in SQL
Secure access and governance
ObjectRef enables governance over your multimodal data. Access to the underlying GCS objects is not granted directly to the end user. Instead, it is delegated to a BigQuery connection resource, named in the ObjectRef's authorizer field. This model allows several layers of security.
Consider the following multimodal table, which stores information about our e-commerce store's pet product images. The table includes an ObjectRef column named image.
Column-level security: limit access to entire columns. For a set of users who only analyze product names and ratings, an administrator can apply column-level security to the image column. This denies those analysts the ability to select the image column while still allowing analysis of the other structured fields.
Row-level security: BigQuery allows filtering which rows a user can view based on defined rules. A row-level policy can restrict access based on the user's role. For example, a policy could state "do not allow these users to query dog-related products," which filters those rows out of query results as if they did not exist.
Multiple authorizers: two different connections are used in this table's image.authorizer element (conn1 and conn2). This lets an administrator centralize GCS permissions through connections. For example, conn1 can access a public image bucket, while conn2 accesses a restricted bucket containing new product designs. Even if a user can see all rows, their ability to query the underlying file for the "Bird Seed" product depends on whether they are permitted to use the more privileged conn2 connection.
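As a thought experiment, the interplay between row visibility and authorizer permissions can be sketched in a few lines of Python. This is a toy model, not a BigQuery API: the rows, connection names, and access check below are invented for illustration.

PYTHON

```python
# Toy model of authorizer-based access; all data here is invented
# for illustration and mirrors the conn1/conn2 example above.
rows = [
    {"product_name": "Dog Leash", "authorizer": "project.region.conn1"},
    {"product_name": "Bird Seed", "authorizer": "project.region.conn2"},
    {"product_name": "Cat Tree",  "authorizer": "project.region.conn1"},
]

def resolvable(rows, granted_connections):
    """Products whose underlying GCS object the user could actually open."""
    return [r["product_name"] for r in rows
            if r["authorizer"] in granted_connections]

# A user granted only conn1 may still see every row, but can only
# dereference the ObjectRefs that point through conn1.
print(resolvable(rows, {"project.region.conn1"}))  # ['Dog Leash', 'Cat Tree']
```

The design point: row and column policies control what a user can *see*, while the authorizer controls what they can *dereference*, and the two are enforced independently.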
AI-driven inference with SQL
The AI.GENERATE_TABLE function produces a new, structured table by applying a generative AI model to your multimodal data. It is ideal for data-enrichment tasks at scale. Let's use our e-commerce example to generate SEO keywords and a short marketing description for each product, using its name and image as source material.
The following query processes the products table, taking product_name and the image ObjectRef as inputs. It produces a new table containing the original product_id, a list of SEO keywords, and a product description.
SQL
SELECT
  product_id,
  seo_keywords,
  product_description
FROM AI.GENERATE_TABLE(
  MODEL `dataset_id.gemini`,
  (
    SELECT
      (
        CONCAT(
          'For the image of a pet product, generate: ',
          '1) 5 SEO search keywords and ',
          '2) a one-sentence product description'),
        product_name,
        image_ref
      ) AS prompt,
      product_id
    FROM `dataset_id.products_multimodal_table`
  ),
  STRUCT(
    "seo_keywords ARRAY<STRING>, product_description STRING" AS output_schema
  )
);
The result is a new structured table with the columns product_id, seo_keywords, and product_description. This automates a time-consuming marketing task and produces enriched, ready-to-use data that can be loaded directly into a content management system or used for further analysis.
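Once the generated table exists, downstream use is ordinary data plumbing. The sketch below assumes hypothetical rows shaped like the AI.GENERATE_TABLE output above and renders them as HTML meta tags for a content management system; the row values are invented, and in practice they would come from querying the generated table.

PYTHON

```python
# Hypothetical rows shaped like the generated table; values are invented.
generated = [
    {"product_id": "P-1001",
     "seo_keywords": ["dog leash", "durable leash", "pet walking",
                      "nylon leash", "dog accessories"],
     "product_description": "A durable nylon leash for everyday walks."},
]

def to_meta_tags(row: dict) -> str:
    """Render one generated row as HTML meta tags for a CMS page."""
    keywords = ", ".join(row["seo_keywords"])
    return (f'<meta name="keywords" content="{keywords}">\n'
            f'<meta name="description" content="{row["product_description"]}">')

print(to_meta_tags(generated[0]))
```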
Part 3: Python and BigQuery DataFrames for multimodal analysis
Python is the language of choice for many data scientists and data analysts. But practitioners often hit problems when their data is too large to fit in the memory of a local machine.
BigQuery DataFrames provides a solution. It offers a pandas-like API for interacting with data stored in BigQuery without ever pulling it into local memory. The library translates your code into SQL, which is pushed down and executed on BigQuery's massively scalable engine. It combines the familiar syntax of a popular Python library with the power of BigQuery.
This model extends naturally to multimodal analytics. A BigQuery DataFrame can represent both your structured data and references to your unstructured files, together in a single multimodal DataFrame. This lets you load, transform, and analyze your structured metadata and file pointers in the same environment.
Create a multimodal DataFrame
Once you have the BigQuery DataFrames library installed, you can start working with multimodal data. A key concept is the blob column: a special column that refers to unstructured files in GCS. Think of a blob column as the DataFrame representation of an ObjectRef: it does not hold the file itself, but points to it and provides methods for interacting with it.
There are three common ways to create or designate a blob column:
PYTHON
import bigframes.pandas as bpd

# 1. Create blob columns from a GCS location
df = bpd.from_glob_path(
    "gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*", name="image")

# 2. From an existing object table
df = bpd.read_gbq_object_table("", name="blob_col")

# 3. From a dataframe with a URI field
df["blob_col"] = df["uri"].str.to_blob()
To describe the approaches above:
- From a GCS location: use from_glob_path to scan a GCS bucket. Behind the scenes, this operation creates a temporary BigQuery object table and surfaces it as a DataFrame with a blob column ready to use.
- From an existing object table: if you already have a BigQuery object table, use the read_gbq_object_table function to load it. This reads the existing table without needing to re-scan GCS.
- From an existing DataFrame: if you have a BigQuery DataFrame with a column of GCS URI strings, simply call .str.to_blob() on that column to "upgrade" it to a blob column.
AI-driven inference with Python
The main advantage of creating a multimodal DataFrame is the ability to run AI-driven analysis directly on your unstructured data. BigQuery DataFrames lets you apply a large language model (LLM) to your data, including any blob columns.
A common workflow involves three steps:
- Create a multimodal DataFrame with blob columns pointing to unstructured data
- Load a pre-existing BigQuery ML model into a bigframes model object
- Call the model object's predict() method, passing your multimodal DataFrame as input
Let’s continue with the e-commerce example. We will use a gemini-2.5-flash model to produce a brief description for each pet product image.
PYTHON
import bigframes.pandas as bpd

# 1. Create the multimodal dataframe from a GCS location
df = bpd.from_glob_path(
    "gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*", name="image")

# Limit to 2 images for simplicity
df = df.head(2)

# 2. Specify a large language model
from bigframes.ml import llm
model = llm.GeminiTextGenerator(model_name="gemini-2.5-flash-preview-05-20")

# 3. Ask the LLM to describe what's in the picture
answer = model.predict(df, prompt=("Write a 1 sentence product description for the image.", df["image"]))
answer[["ml_generate_text_llm_result", "image"]]
When you call model.predict(df), BigQuery DataFrames builds and executes a SQL query using the ML.GENERATE_TEXT function, automatically passing the file references in the blob column and the text prompt as inputs. The BigQuery engine processes the request, sends the data to the Gemini model, and returns the generated text description in a new column.
This powerful integration lets you run multimodal analysis across thousands or millions of files using only a few lines of Python.
Go deeper with multimodal DataFrames
Beyond using LLMs for generation, the bigframes library offers a growing set of tools designed to process and analyze unstructured data. Key capabilities available on blob columns and their related methods include:
- Built-in transformations: prepare images for modeling at scale with transformations for common operations such as blur, normalize, and resize
- Embedding generation: enable semantic search by producing embeddings from multimodal data with a single function call to an embedding model
- PDF chunking: streamline RAG workflows by programmatically splitting document content into smaller, meaningful segments
These features show that BigQuery DataFrames is being built into an end-to-end platform for multimodal analytics and AI in Python. As development continues, you can expect bigframes to absorb more of the tooling traditionally found in separate, specialized libraries.
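To illustrate the idea behind chunking, here is a self-contained toy chunker. It splits plain text into overlapping character windows; bigframes' PDF chunking performs the real work server-side on document content, so treat this purely as a sketch of the concept.

PYTHON

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping character windows (toy RAG-style chunking)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "BigQuery ObjectRef links tables to files in Cloud Storage. " * 10
pieces = chunk_text(doc)
print(len(pieces))  # 8
```

The overlap preserves context across chunk boundaries, which is the same reason production chunkers overlap segments before embedding them for retrieval.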
Conclusion
Multimodal tables and DataFrames represent a shift in how organizations can approach data analytics. By creating a live, secure link between tabular data and unstructured files in GCS, BigQuery eliminates the data silos that have long complicated multimodal analysis.
Whether you are a data analyst writing SQL or a data scientist using Python, you now have the ability to easily analyze structured data alongside the related unstructured files.
To start building your own multimodal analytics solution, explore the following resources:
- Official documentation: Read an overview of how to analyze multimodal data in BigQuery
- Example notebook: Get hands-on with a BigQuery DataFrames example notebook
- Stepped tutorials:
Author: Jeff Nelson, Developer Relations Engineer