Connecting to Tableau
You can use Tableau to connect to a MemSQL cluster for exploring your data. This guide provides guidelines and best practices for developing business Tableau dashboards using MemSQL data and covers the following:
Installing Tableau Desktop
To develop business Tableau dashboards using MemSQL data, you first need to install Tableau Desktop.
If you do not have Tableau Desktop already installed on your PC or Mac, you can download a 14-day trial version from the Tableau Download site.
(Optional) Install Tableau Reader
You can use Tableau Reader to package a dashboard, along with the supporting data, and share it with the recipients, even if they do not have Tableau Desktop (trial or licensed version) installed.
Tableau Reader is available as a free download.
Connecting Tableau to MemSQL
After installing Tableau Desktop, connect it to MemSQL through the following steps:
Download and install MySQL ODBC connector.
Configure MySQL ODBC connector.
Connect Tableau to MemSQL.
Download and Install ODBC Connector for MySQL
Navigate to the MYSQL ODBC Connector site.
5.3.12and then choose your
OS Versionfrom the drop-down lists. Newer 8.x versions of the ODBC connector may require additional configuration changes when connecting to a MemSQL cluster. See the MemSQL documentation for more information.
Pick the appropriate installer option for your system and then click
Download. After the download finishes, install the driver.
Configure MySQL ODBC Connector
You need an existing ODBC connection before connecting Tableau with MemSQL.
If you are using Windows, open the search bar and type
Set up data sources (ODBC).
Configure a new ODBC data source for MemSQL.
Testto test the connectivity and click
Connect Tableau to MemSQL
Open Tableau Desktop and navigate to
Connect > To a Server > More… > MemSQL.
Sign Inwindow, specify the
Require SSLcheck box if the MemSQL cluster has been configured for secure client connections.
Setting Up MemSQL Data Source in Tableau
On the data source page, perform the following steps:
The data source name defaults to the server value you entered previously. Enter a unique data source name to be used in Tableau. For example, use a data source naming convention that helps other users of the data source identify which data source to connect to.
On the left pane, navigate to the
Databasedrop-down list and select a database or search by database name.
Table, select a table or search by table name.
Drag a table to the canvas, and then select the sheet tab to start your analysis.
You can also perform different joins, table appends, filters in the data source along with setting up live connections and extracts. More information is available on the Tableau Online Help.
After the data source set up, you are ready to visualize and analyze data.
Tableau Best Practices
Follow these strategies to build efficient Tableau dashboards.
Keep simple data sources: The performance of visualization depends on the underlying data sources. To improve performance, extract only the data that is needed for the worksheet to perform its analysis.
Extract only the required data: Minimize joined tables. If the analysis requires data from joined tables, then edit the connection to remove unused data.
Execute data source filters before executing traditional filters for context filtering so that the extract is smaller and takes less time to refresh. Although context filtering creates a flat table with initial performance issues, performance improves for subsequent views and filters.
Use extracts filtered by context filters so that it contains only the data that is needed. Extracts are stored in an internal structure, which is easier for Tableau to query and access. Also, calculated fields are saved as actual data, saving further computation time. One of the drawback of using extracts is that the data is not real-time and a scheduled task is required to refresh the data.
Hide unused columns: Hiding unused columns (dimensions/measures) minimizes extract refresh time or custom SQL query time. You can hide fields in the data window or data source, or allow Tableau to hide all unused fields before creating the extract in the Extract Data box.
Use Extracts: Tableau uses different techniques to optimize the extract. You can also improve visualization by aggregating the data for a visible dimension, known as an aggregated extract. When users interact with an aggregated extract, all calculations and summations have already been compiled.
Extracts can be filtered with data source filters, which can help to control the size of the extract in two different ways, depending on when the filter is applied:
If a data source filter is in place prior to extract creation, the extract will contain filtered records.
If a data source filter is put in place after extract creation, the filter will be applied against the full extracted data set. So, your extract will contain all the data but will only show what the data source filter is allowing.
Minimize quick filters: Quick filters require Tableau to run a query against the database to determine the values to display for the selected dimension. Therefore, the quick filters that do not require querying the database for values are custom value list, wildcard match, relative date filters, and the browse period date filter.
It is recommended that you avoid quick filters that require knowledge of the values in the database such as multiple value list, single value list, compact list, slider, measure filters, and ranged date filters.
Use ‘All values in database’ option in quick filters: Use the default option ‘All values in database’ for a quick filter and avoid using ‘Only relevant values’ option. The default option makes all values in the database for that particular field available for user selection. In contrast, ‘Only relevant values’ compares the values returned from the database with those in other quick filters to show only the values that apply, given the choice made on the other filters. This behavior can bring performance issues, especially if the dashboard contains more than two quick filters.
Avoid quick filters or actions that generate context filters: Context filters create a context TEMP table with the values that go through the filter. All other filters access this TEMP table to draw their values from the limited set of data. This improves the performance of the dashboard; however, if the context filter does not trim down the data to a more manageable set in the new context table then it may cause a performance issue with the visualization.
If you use a context filter, make the TEMP table as small as possible. For example, eliminate columns that are not needed for that particular visualization to reduce the size of the data set at least to one tenth of the original size. Also, the context filter should be used against slow-changing values or dimensions only.
Keep range quick filters simple: To display results for across large separated periods of time, use a visualization rather than a quick filter. This is applicable for the following date filters:
Relative date filters, which are used for a date range that is relative to a specific date.
Range of date filters, which are used for a defined range of discrete dates.
Discrete date filters, which are used for the individual dates selected from a list. It is recommended to avoid discrete date filters.
Replace quick filters with action filters: Instead of using multiple quick filters, ‘Only Relevant Values’ option, or quick filters with too many values, use actions filters as they do not require Tableau to run additional queries. These filters work on the users actions, such as clicking on a mark. Action filters can also operate as cascading filters in a filter hierarchy, where values are filtered out as they traverse through the hierarchy.
Avoid using action filters from several sheets for a single dashboard layout as quick filters: Creating visualizations in different worksheets to use them later as action filters in a single dashboard generates extra load every time the dashboard is loaded, as the visualizations are refreshed with every action from the user. When this situation arises, quick filters may be a better solution as they are only loaded once when the dashboard is loaded, and then the filter is applied across all the sheets simultaneously.
Limit the use of customer SQL code in live connections: SQL connections are issued to the database inside a subquery, which can include other clauses from Tableau like GROUP BY, ORDER BY, WHERE and more. Even with efficient SQL code, the extra clauses issued to the database can slow down performance. Use custom SQL only if Tableau cannot generate the desired outcome.
If a SQL command is necessary, then create a view inside the database and connect to it from Tableau. If it is not possible to create a new view, build a data extract with the SQL code. It will run only once when the extract is built or refreshed, minimizing the effect on visualization performance.
Remove extra clauses: Effective SQL code provides Tableau the required data to produce the desired outcome. Remove extra clauses in the SQL code for Tableau to organize and visualize data effectively.
Consider data types for faster calculation: Tableau provides a massive list of functions, divided into different categories that assist in creating calculated fields. The performance impact from different data types may be unnoticeable on smaller data sets, but these differences are more pronounced as the number of records increase. In general, the fastest calculations involve Boolean or number data types, followed by dates, and finally string calculations. It is important to consider ways to achieve same calculated results using faster data types.
Avoid blended calculations: Blended calculations occur when you have to query different data sources to obtain a single calculated field in your visualization. In this case, Tableau needs to query each data source separately to retrieve the values. This can affect performance, especially in large data set. An alternative is to prepare a new view on the data layer on the database server to keep data processing outside Tableau.
Avoid row-level calculations involving parameters: Row-level or record-level calculations operate on every record in the underlying data. Every row calculation consumes time, but when the parameter contains a significant variety of values, for example a table as parameter, it increases time processing exponentially.
Avoid high mark counts: Marks are the points, plots, or symbols on the visualizations. Each mark must be created and positioned before the report can be rendered. Use the capabilities of Tableau to drill down or interact with visualizations to obtain better results.
Minimize the file size of images or custom shapes: Big images or shapes result in slow loading and exporting process. Keep images below 50Kb, 32x32 pixels in dimensions, and use efficient image compression formats to reduce load time.
Tableau Datasource Customization (TDC) helps optimize the interaction of the Tableau dashboard with a MemSQL cluster. Since Tableau is designed to create TEMP tables, it can sometimes result in performance issues. TDC can stop creation of these tables, along with other configuration settings.
Setting up TDC File
Before you create a TDC file, ensure that you have the Tableau version and the driver name. The following are few sample codes that can be used to set up TDC files for MemSQL.
\<?xml version=\'1.0\' encoding=\'utf-8\' ?\> \<connection-customization class=\'memsql\' enabled=\'true\' version=\'10.0\'\> \<vendor name=\'memsql\'/\> \<driver name=\'memsql\'/\> \<customizations\> \<customization name=\'CAP\_ODBC\_BIND\_SUPPRESS\_WIDE\_CHAR\' value=\'yes\' /\> \<customization name=\'CAP\_CREATE\_TEMP\_TABLES\' value=\'no\' /\> \<customization name=\'CAP\_QUERY\_SUBQUERY\_QUERY\_CONTEXT\' value=\'yes\' /\> \<customization name=\'CAP\_SELECT\_INTO\' value=\'no\' /\> \<customization name=\'CAP\_ODBC\_BIND\_FORCE\_MEDIUM\_STRING\_BUFFERS\' value=\'yes\' /\> \</customizations\> \</connection-customization\>
\<?xml version=\'1.0\' encoding=\'utf-8\' ?\> \<connection-customization class=\'mysql\' version=\'2019.1\' enabled=\'true\'\> \<vendor name=\'mysql\' /\> \<driver name=\'mysql\' /\> \<customizations\> \<customization name=\'CAP\_ODBC\_BIND\_SUPPRESS\_WIDE\_CHAR\' value=\'yes\' /\> \<customization name=\'CAP\_CREATE\_TEMP\_TABLES\' value=\'no\' /\> \<customization name=\'CAP\_QUERY\_SUBQUERY\_QUERY\_CONTEXT\' value=\'yes\' /\> \<customization name=\'CAP\_SELECT\_INTO\' value=\'no\' /\> \<customization name=\'CAP\_ODBC\_BIND\_FORCE\_MEDIUM\_STRING\_BUFFERS\' value=\'yes\' /\> \</customizations\> \</connection-customization\>
Copy the code into a text editor and save the file with a
extension. On a Windows machine, place the
.tdc file in the folder
C:\Users\<user name>\Documents\My Tableau Repository\Datasources. Make sure there is only one
.tdc file in this location.
Tableau will not create or customize a connector to work with a specific ODBC driver.
TDC Configuration Properties
As per the need and capability of the data source, configure the following properties for your TDC file:
|Customization Property||Recommended Setting|