azure

Azure Databricks Integration with QlikView

Components

Instructions

  1. First let’s collect Azure Databricks related configuration as follows:
    • Select Cluster
      Azure Databricks → Clusters (select the cluster for compute and return results to QlikView)
    • Get Configuration
      Configuration tab → Advanced Settings → JDBC/ODBC
    • Gather following highlighted details Cluster-config
    • Generate personal token in Azure Databricks with proper comment and Lifetime (0 for live indefinitely)
      User Settings → Generate New Token
    • Copy this data to some file for further use
  2. Download Simba ODBC Driver from the link Databricks-Simba-odbc-driver or alternatively get it from this git prerequisites/SimbaODBCDriver.zip
  3. After extracting the downloaded file, we will have msi installer file as follows: SimbaExtracted
  4. Install the driver, by choosing appropriate options
  5. Navigate to setup Azure Databricks data source in the system as follows:
    Control Panel → View by (Large Icons) at top right corner → Administrative Tools cp-admintools datasources
  6. Click Add in User DSN/ System DSN, based on your needs. create-dsn
  7. Make sure to select/ update with following configuration

    Parameter/ Setting Value
    Data Source Name as required
    Description optional
    Spark Server Type SparkThriftServer
    Service Discovery Mode No Service Discovery
    Host HOST_FROM_DATABRICKS_CLUSTER
    Port PORT_FROM_DATABRICKS_CLUSTER (default value: 443)
    Database default is sufficient for this integration
    Authentication User Name and Password
    User Name token
    Password PERSONAL_ACCESS_TOKEN_GENERATED
    Thrift Transport HTTP
    SSL Options Enable SSL (check this) & let it use default cacerts.pem
    HTTP Options HTTP_PATH_FROM_DATABRICKS_CLUSTER
  8. Test Connection (this starts the cluster if it’s in terminated state and wait for the respone from cluster to the DSN ODBC driver Setup)
    • Result from DSN Setup dsn-success
    • Spark UI/ Session details spark-session
  9. Open QlikView and Edit Script (Ctrl+E)
  10. Select ODBC Driver as follows: odbc-select-qlik
  11. Click connect and check Show USER DSNs (if you have setup USER DSN rather than System DSN) → Select Data Source name we have created → Click OK (if required, you can test connection)
    select-dsn connect-test
  12. Click Select to select the database and tables to be loaded into QlikView reports as follows:
    • select-dataset
    • load-dataset
  13. Click Reload once we have the SQL statement append to the file, this queries and loads data from databricks database to the qvw table object
  14. Following is the data, when displayed using
    Object → New Sheet Object → Table Box result-load


References

  1. Databricks Spark drivers
  2. DataStax Configuring Simba drivers
  3. simba-partners/databricks

      👨‍💻 github.com/Vinay26k