To get started, it is easiest to source a static data product from your desktop. We’ve even provided a link to some sample data at the bottom of this page for you to practice with.

Data products sourced from an endpoint can be either static or updating, and can consist of one or many files. Static data products use data that is loaded once and does not change; they can be sourced from either a desktop endpoint or a cloud endpoint.

Updating (repeating) data products use data that may be updated regularly. If your data is sourced from a cloud endpoint and you are not sure whether it will be updated, it is recommended to select an update frequency and update action anyway, even if the product never updates.

Note: You must have a Technician role and a Product Manager role to perform this action.

  1. Click on the Products icon on the navigation bar.

  2. Click on Create Product.

  3. Click on Original data product to begin publishing a static or updating data product created from an endpoint.

Set up your Data Product

  1. Proceed to the Publish: Setup step of the workflow:

    1. Set publish parameters:

      1. Enter the working Title for the data product. This can be changed when the data is finally packaged as a data product.

      2. Select the Frequency - Static or any other update frequency. The update Frequency is displayed on the data product page as part of the Metadata Metrics to give more information about your data product to other users. Follow the article 'Update your Data Product from an Endpoint' to understand how to trigger the updates.

      3. Enter the First Update date, time, and timezone to specify when the process should start looking for TNF files. If a TNF file is not found, the process keeps looking, so the Frequency selected above does not control the actual update frequency.

      4. Select the Update action - append or replace. All updates of tabular data products must adhere to the original schema definition. If the schema has changed then a new data product must be published.

      5. Add users to your Publish Team. You must have one or more users with the Technician role and the Product Owner role to be able to proceed. Two blue ticks indicate success.

      6. Click Save and Continue.
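Because every update of a tabular data product must adhere to the original schema (step 4 above), it can save a failed publish to check an update file before sending it. A minimal sketch in Python, assuming CSV tables; the function names are our own illustration, not part of the platform:

```python
import csv

def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))

def schema_matches(original_path, update_path):
    """Check that an update file keeps the original column names and order.

    The publish process rejects updates whose schema differs from the
    one detected at first publish, so this is worth checking up front.
    """
    return read_header(original_path) == read_header(update_path)
```

If the check fails, the schema has changed and a new data product must be published instead of an update.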

Define the Source of your Data

  1. Select Endpoint or Upload.

    1. If using an endpoint:

      1. Select Type.

      2. Select Endpoint.
        Your cloud endpoints must already be created.

    2. If using an upload:

      1. Click Browse (one table only)
        Or

      2. Drag and drop your file from your desktop onto the Browse button.

Copy your Data onto the Platform

  1. Get your data ready:

    1. If you have chosen a cloud endpoint as your Source then you must now transfer the data you want to publish to the specified folder structure on your endpoint, placing each table in a separate folder:

      1. Go to your cloud account.

        1. Create the specific folder structure.

        2. Move your data into the folder structure.

        3. Return to the Review page of the publish process.

        4. Select Refresh.

    2. If you have chosen your Desktop as your Source then no preparation is needed, as the source can only be a single file.

  2. If your source data files are listed as expected, click Save and Continue; if not, check for errors in your input file. The publish process will not complete successfully if this step is not correct.

    Review list of source files
  3. A popup asks you to confirm the format of the data:

    1. For Tabular file formats the data discovery process is initiated to determine the structure of your data. The process can take several minutes to complete, regardless of the size of the data.

    2. For Non-Tabular a Data Copy process begins which moves the data into a specific location created on the platform for the data product being created.

  4. Click Continue when this process has completed successfully.
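The one-folder-per-table layout described above can be staged locally before syncing to your cloud endpoint. A minimal sketch, assuming CSV files; the staging path and table names are hypothetical examples, so substitute the folder structure specified for your endpoint:

```python
from pathlib import Path
import shutil

def stage_tables(staging_root, table_files):
    """Place each table's data file in its own folder, since the
    publish process requires one folder per table.

    table_files maps a table name (hypothetical example names) to the
    path of that table's data file.
    """
    root = Path(staging_root)
    for table, src in table_files.items():
        folder = root / table          # one folder per table
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, folder / Path(src).name)
    return sorted(p.name for p in root.iterdir())
```

After syncing the staged folders to your cloud account, return to the Review page and select Refresh as described above.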

Confirm the Schema of your Data

  1. If your data is Tabular then the next step of the publish process is to confirm the format of the data that has been detected and to select tables for preview. If your data is not tabular then this step is not required.

    Review format of tabular data
  2. Review that the schema identified in the previous step is correct. Click on Save and Continue.

  3. Make sure your data adheres to these requirements to ensure your publish is successful:

    1. Always ensure the date column uses the Hive standard date format: yyyy-mm-dd, or accept the date column as a string and amend it in Spaces.

    2. Field names for Parquet and Orc input formats cannot contain the characters ,;{}()\n\t= as this causes a failure in Publish. The characters must be cleansed from the data ahead of the publish workflow.

    3. Special characters in field names are replaced with the "_" character for database compliance in Spaces and Export.

    4. If your table does not show headers, check that your column data types differ from the column header, e.g. a string header with a bigint data type.

    5. Make sure all attribute names are in lowercase as uppercase letters can result in publish failure.

    6. Do not include hyphens in a table name if the table is due to be updated.

  4. For each table identified, select it for preview if you want a random selection of rows to be shown as a Data Preview on the data product page. You can choose a maximum of 15 tables to be shown in a preview.

  5. Select Save and Continue to proceed.

  6. The validation process checks all records within the data file(s).

  7. If any tables have been selected to show a preview, the sample data is displayed as further confirmation that the data has been processed correctly.
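The field-name and date rules above can be applied to your data ahead of the publish workflow. A hedged sketch, assuming string inputs; the function names and the example source date format are our own, while the forbidden-character list, the lowercase rule, and the "_" replacement come from the requirements above:

```python
import re
from datetime import datetime

# Characters rejected in Parquet and ORC field names by Publish.
FORBIDDEN = set(",;{}()\n\t=")

def clean_field_name(name):
    """Lowercase a field name, drop the forbidden characters, and
    replace any remaining special characters with "_" (mirroring the
    replacement the platform applies for database compliance)."""
    name = "".join(ch for ch in name if ch not in FORBIDDEN)
    return re.sub(r"[^a-z0-9_]", "_", name.lower())

def to_standard_date(value, source_format="%d/%m/%Y"):
    """Reformat a date string to the Hive standard yyyy-mm-dd.
    The source format here is only an example."""
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")
```

For example, `clean_field_name("Order;ID (USD)")` yields `orderid_usd`, which is safe for Parquet input and for use in Spaces and Export.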

You are now ready to add subscription plan templates to your Data Product, or to update your data product from an endpoint on a repeating basis.


References and FAQs

Publish Team

Original Data Product

Metadata Metrics

Transfer Notification File (TNF)

Validation Process

The Publish Process

Sample Data for your First Data Product

Related Pages

Update your Data Product from an Endpoint

Add Subscription Plans to your Data Product