Table of Contents
Informatica Cloud Data Governance & Catalog (CDGC) is a powerful tool that enables organizations to efficiently manage their data, metadata, and data governance processes. A standout feature of CDGC is the Catalog Source, which acts as a gateway for bringing metadata and data from your source systems, such as Oracle databases, into CDGC. This functionality enables CDGC to seamlessly catalog and organize your information assets. In this post, we’ll explore what a Catalog Source is, how it works, and its key capabilities.
What is a Catalog Source?
A Catalog Source in CDGC is essentially a configuration that connects CDGC to your source system – like Oracle, SQL Server, Snowflake etc. This configuration allows CDGC to read metadata and optionally data, process it, and then ingest it into CDGC where it can be used for discovery, profiling, and governance.
In simple terms, the Catalog Source acts as the entry point for your data and metadata into the CDGC platform, enabling users to gain insights and control over their information assets.
Metadata vs. Data Read in Catalog Source
CDGC can read both metadata and data from your source system, depending on how the Catalog Source is configured.
- Metadata Read (Mandatory): Metadata is essential for the functioning of catalog source in CDGC. Metadata describes the structure and properties of your data (e.g., tables, columns, and relationships). The ability to read and catalog this metadata is a core function of CDGC and is always enabled in the Catalog Source.
- Data Read (Optional): Data read is optional and only occurs if you configure certain capabilities in your Catalog Source. For example, if you enable Data Quality or Profiling features, CDGC will also read and process the actual data from your source system, not just the metadata. This capability allows CDGC to assess data quality and perform in-depth analysis, which can be invaluable for governance purposes.
Role of the Secure Agent
It’s important to note that CDGC or IDMC does not directly connect to your source system. Instead, the Secure Agent plays a crucial role in this process. The Secure Agent is a lightweight, on-premises component that runs within your network, acting as an intermediary between CDGC and your source system.
Here’s how it works:
- The Secure Agent establishes a connection to the source system (e.g., Oracle database).
- It reads the metadata and/or data from the source system.
- It processes the data locally within your network to ensure that sensitive data is handled securely.
- Finally, the Secure Agent ingests the processed results into CDGC for further analysis and consumption.
Think of the Secure Agent as a trusted middleman—it handles all the communication between CDGC and your source system, ensuring that your data flows securely and efficiently into the governance and cataloging platform.
Why is Catalog Source Important?
The ability to connect CDGC to a wide variety of source systems, while ensuring that only necessary data is read, processed, and ingested, gives organizations a robust way to manage their metadata and data governance processes. The Secure Agent adds an extra layer of security by keeping sensitive information within your network while enabling CDGC to work seamlessly with your data.
By configuring the Catalog Source appropriately, you can ensure that your data governance processes are efficient, scalable, and secure, whether you’re focusing on metadata discovery or data profiling and quality.
Step by Step guide
- Create a database user and grant the required privileges.
Create a user in Oracle
CREATE USER oracle_service_account IDENTIFIED BY "welcome#123";
Grant the CONNECT privilege to the database user account to enable it to establish a connection to the Oracle database.
GRANT CONNECT TO oracle_service_account;
Grant the predefined SELECT_CATALOG_ROLE role to the user account
GRANT SELECT_CATALOG_ROLE TO oracle_service_account;
Grant the SELECT privilege on all tables and views in the specified schemas that you wish to profile
BEGIN
FOR t IN (SELECT owner, object_name, object_type
FROM all_objects
WHERE owner in('<SCHEMA 1>', 'SCHEMA 2', 'SCHEMA 3')
AND object_type IN ('TABLE', 'VIEW'))
LOOP
EXECUTE IMMEDIATE 'GRANT SELECT ON '||t.owner||'.'|| t.object_name || ' TO oracle_service_account';
END LOOP;
END;
2. Create a connection in IDMC administrator to connect to the Oracle database.
3. Create a Catalog Source configuration in Metadata Command Center utilizing the connection created in previous step.
Metadata Extraction is mandatory. Other capabilities are optional and can be enabled based on your requirements.
You can configure automatic stakeholder assignment and schedule the catalog source to run according to your requirements. These options are optional.
4. Run the Catalog Source.
You can monitor the catalog source execution.
5. Once the Catalog Source is successfully executed, you can view the ingested data in CDGC.