Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge


Accelerating SQL code migration from Google BigQuery to Amazon Redshift can be a complex and time-consuming task. Businesses often struggle to efficiently translate their existing BigQuery code to Amazon Redshift, which can delay critical data modernization initiatives. However, with the right tools and approach, this migration process can be significantly streamlined.

This post explores how you can use BladeBridge, a leading data environment modernization solution, to simplify and accelerate the migration of SQL code from BigQuery to Amazon Redshift. BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse. BladeBridge provides a configurable framework to seamlessly convert legacy metadata and code into more modern services such as Amazon Redshift.

Amazon Redshift is a fully managed data warehouse service offered by Amazon Web Services (AWS). Tens of thousands of customers use Amazon Redshift every day to run analytics, processing exabytes of data for business insights. Whether your growing data is stored in operational data stores, data lakes, streaming data services, or third-party datasets, Amazon Redshift helps you securely access, combine, and share data with minimal movement or copying. Amazon Redshift is built for scale and delivers up to 7.9 times better price performance than other cloud data warehouses.

By using the BladeBridge Analyzer and BladeBridge Converter tools, organizations can significantly reduce the time and effort required to migrate BigQuery code to Amazon Redshift. The Analyzer provides detailed assessments of the complexity and requirements for the migration, and the Converter automates the actual code conversion process, using pattern-based customizable rules to streamline the transition.

In this post, we walk through the step-by-step process of using BladeBridge to accelerate the migration of BigQuery SQL code to Amazon Redshift.

Solution overview

The BladeBridge solution is composed of two key components: the BladeBridge Analyzer and the BladeBridge Converter.

BladeBridge Analyzer

The Analyzer is designed to thoroughly assess the complexities of the existing data environment, in this case, Google BigQuery. After assessment of the source SQL files, it generates a comprehensive report that provides valuable insights into the migration effort. The Analyzer report includes the following:

  • Summary of the total number of SQL scripts, file scripts, data definition language (DDL) statements, and other key metrics
  • Categorization of the SQL code complexity into levels such as low, medium, complex, and very complex
  • Insights that help both the organizations and systems integrators prepare more accurate project estimates and migration plans

BladeBridge Converter

The Converter is a pattern-based automation tool that streamlines the actual code conversion process from BigQuery to Amazon Redshift. The Converter uses a set of predefined conversion rules and patterns to automatically translate 70–95% of the legacy SQL code. This significantly reduces the manual effort required by developers. The Converter works by doing the following:

  • Parsing the source SQL files and analyzing the code semantically
  • Applying the appropriate translation rules and patterns to convert source database code to the target, in this case, Google BigQuery to Amazon Redshift

The out-of-the-box code handles most conversions. The Converter allows developers to customize the conversion patterns for more complex transformations.

The following is the migration procedure:

  1. Prepare SQL files
  2. Using BladeBridge Analyzer, create an analyzer report
  3. Purchase license keys for converter
  4. Using BladeBridge Converter, convert SQL files

The following diagram illustrates these steps.

Prerequisites

You need the following prerequisites to implement the solution:

Solution walkthrough

Follow these solution steps:

Prepare SQL files

For SQL data warehouses such as BigQuery, code preparation starts by exporting the SQL files out of the data warehouse solution. If your BigQuery SQL code is stored in a single file containing multiple database objects, you need to split them into individual files before using the BladeBridge tools to convert the code to Amazon Redshift. To split into multiple files, you can use the BladeBridge SQL File Splitter utility. The BladeBridge conversion process is optimized to work with each database object (for example, tables, views, and materialized views) and code object (for example, stored procedures and functions) stored in its own separate SQL file. This allows the BladeBridge Analyzer to scan each file individually, gaining a comprehensive understanding of the code patterns, complexity, and structure. To use BladeBridge SQL File Splitter utility, follow these steps:

  1. Log in to BladeBridge portal and download the SQL file splitter utility for your operating system.
  2. Create an input file directory and place your BigQuery SQL code files in the directory.
  3. Create an empty output file directory. The files generated by the splitter will be stored here.
  4. Navigate to the directory where you downloaded the bbsqlsplit executable file and run the following command in your terminal (Mac or Linux) or command prompt (Windows), replacing the input and output file directory paths:

Syntax

bbsqlsplit

######## OPTIONS ########

-d <<input file directory path>>

-o <<output file directory path>>

-E sql

Example 

bbsqlsplit

-d C:\Users\XXXXX\Desktop\BladeBridge\SplitFilesUtility\source

-o C:\Users\XXXXX\Desktop\BladeBridge\SplitFilesUtility\splitFiles

-E sql

For more options of the bbsqlsplit command, refer to the SQL file split documentation in the BladeBridge community portal.

Using BladeBridge Analyzer, create an analyzer report

The Analyzer provides a detailed assessment of the existing BigQuery code, generating a comprehensive report that outlines the complexity and requirements for the migration to Amazon Redshift.

To run the BladeBridge Analyzer, follow these steps:

  1. Log in to the BladeBridge portal and navigate to the Analyzer Download
  2. Download the Analyzer executable file for your operating system (for example, bbanalyzer.exe for Windows, bbanalyzer.gz for Linux and macOS). For macOS and Linux users, you need to deflate the downloaded gzip file.
  3. Download the configuration file (general_sql_specs.json) from the BladeBridge community portal, as shown in the following screenshot.

  1. On the BladeBridge community portal, choose Assets. This page should display the Analyzer key for your organization.

  1. From the assets page, download the Analyzer key as shown in the following screenshot.

In the directory where you downloaded the bbanalyzer executable file, run the following command in your terminal (Mac or Linux) or command prompt (Windows), replacing the necessary paths.

Syntax

bbanalyzer

######## OPTIONS ########

-c <<path to your analyzer key>>

-t SQL

-d <<path to your source code directory>>

-r <<name for the output report>>.xlsx

Example

bbanalyzer

-c C:\Users\XXXXX\Desktop\BladeBridge\analyzer_key.txt

-t SQL

-d C:\Users\XXXXX\Desktop\BladeBridge\SplitFilesUtility\splitFiles

-r analyzer_report.xlsx

After running the command successfully, the Analyzer generates a report. Review the report thoroughly, because it provides a summary and in-depth explanations of the SQL analysis. The summary sheet, shown in the following image, provides an overview of the migration, including the number of total SQL scripts, file scripts, and DDLs. Each SQL script is categorized into LOW, MEDIUM, COMPLEX, or VERY_COMPLEX complexities, which are determined by the Analyzer Complexity Determination Algorithm. The summary will also help with understanding the overall complexity and migration effort before performing the actual conversion.

If you observe an error when running BladeBridge Analyzer, review following troubleshooting tips:

  • Configure the write permission – You may need to add necessary permission to the analyzer executable file. For Mac and Linux users, run chmod 755 ./bbanalyzer to modify the permission.
  • Allow running third party software – Because BladeBridge Analyzer is a third-party software, MacOS may raise a warning or an error when running Analyzer. If you’re using Mac, follow the instructions in Open a Mac app from an unidentified developer.
  • Use local drive – In some cases, you might encounter an error if the executable is located in a network drive. We recommend that you run the executable on the local drive.
  • Don’t include whitespace in the path – Make sure the path to the executable doesn’t contain a directory with spaces in the directory name.

For more details, refer to the BladeBridge Analyzer Demo.

Purchase license keys for convertor

To use the BladeBridge Converter and automate the code translation process, you need to purchase the necessary license keys. These license keys are tied to the specific SQL files you are converting, making sure that updates to the source code require the appropriate license.

To obtain the license keys, follow these steps:

  1. Share the output of the BladeBridge Analyzer report and the provided pricing calculator Excel sheet with BladeBridge.
  2. The BladeBridge team will review the information and provide you with the required license keys to run the Converter.

The license key is tied to the file hash of the SQL files you are converting. If you make updates to the source SQL files, you need to purchase new license keys to convert the modified code. Therefore, make sure to purchase the necessary license keys and manage your files with a version control system to have smooth transitions when converting your BigQuery SQL code to Amazon Redshift.

Using BladeBridge Converter, convert SQL files

The Converter uses the predefined conversion rules that are available in the out-of-the-box configuration files to automatically translate 70–95% of the legacy code, significantly reducing manual effort for your development team. The out-of-the-box configuration file handles conversion for common code patterns from Google BigQuery to Amazon Redshift. For those custom patterns that aren’t covered by an out-of-the-box configuration file, you can create custom conversion rules by creating additional configuration files.

Follow these steps to run the BladeBridge Converter:

  1. Log in to the BladeBridge portal and on the Convertor downloads page, download the Convertor executable file for your operating system (sqlconv.exe for Windows or sqlconv.gz for Mac or Linux)

  1. From the same page, download the configuration file (general_sql_specs.json)
  2. Create an output directory where the converted files will be saved
  3. In the folder where you downloaded the Convertor executable sqlconv, run the following command

Syntax

sqlconv

-c <<converter license file name obtained from BladeBridge>>

-d <<input folder for SQL files>>

-n <<output folder for converted files>>

-u <<path for the config files/s provide at least one file>>

Example

sqlconv

-c converter_license.txt

-d C:\Users\XXXXX\Desktop\BladeBridge\SplitFilesUtility\splitFiles

-n C:\Users\XXXXX\Desktop\BladeBridge\SplitFilesUtility\cnvrtdFiles

-u bq2redshift.json

  1. Run the generated SQL files in your Amazon Redshift data warehouse. If you encounter errors, analyze them and determine if custom conversion rules, not already covered in the out-of-the-box configuration files, need to be applied.
  2. If custom conversion rules are needed, create a new configuration file following the guidelines in the Customize Configuration File section. Provide the new config file name in the -u option and rerun the Converter.
  3. Repeat these steps until all files are converted successfully or manually modified.

Customize configuration file

Customizing a configuration file is an iterative process that can help automate the conversion for occurrences in your codebase. However, manual conversion may be required if the conversion is needed for only a few files and a few occurrences.

The configuration is defined in a JSON file. There is a general configuration file with common rules and custom configuration files for each client with client-specific rules. Rules can be added to the general configuration file if they are applicable for all clients. For client-specific rules, a separate JSON file should be created and referenced. This keeps the general rules clean and organized.

The conversion rules in BladeBridge’s configuration file fall into one of three categories:

  1. Line substitution
  2. Block substitution
  3. Function substitution

Every line ending with a ; is a statement. This line ending also can be replaced with other breakers. Refer to this BladeBridge documentation to get more details on SQL and expression conversion.

The following are considerations while using the customized configuration:

  • Nested functions in BigQuery allow for complex operations within a single SQL statement, which may need to be broken down into multiple steps in Amazon Redshift
  • Array functions in BigQuery provide capabilities for manipulating and transforming array data types, which may require alternative approaches in Amazon Redshift
  • You need to carefully analyze the requirements and implement workarounds or alternative solutions when migrating from BigQuery to Amazon Redshift, especially for advanced functionality not directly supported in Amazon Redshift

Line substitution

Line substitution applies regular expressions to each line of code. This has the from clause, which has the expression to be converted. The to section has the target mapping for which it’ll be converted. Statement_categories limit the application of line substitution to specific statements such as DDL or procedure. For example:

  • The first expression in the following code example replaces the regular expression pattern ROWNUM with the SQL expression row_number() over (order by 1)
  • The second expression in the following code example replaces the regular expression pattern SYSDATE with the SQL expression CURRENT_TIMESTAMP.
line_subst” : [
{“from” : “\bROWNUM\b”, “to” : “row_number() over (order by 1)”},
{“from” : “SYSDATE”, “to” : “CURRENT_TIMESTAMP”}
]

Block substitution

Block substitution applies regular expressions across multiple lines. This applies to statements that stretch over multiple lines, which are generally more complex than the line substitutions. The following expression in the example replaces the block. In this example, the procedure is created in the target database.

BEGIN

EXECUTE IMMEDIATE(‘SQL Statement’);

EXCEPTION WHEN OTHERS

THEN

NULL

END;

To

CALL SP_DYN_SQL(‘parameters’);

“block_subst” : [
{“from”: “BEGIN(.*?)execute immediate(.*);.*exception\s*when\s*others\s*then\s*null(.*?)end;(.*)”, “to”: “CALL sp_dyn_sql($2);”}
]

Function substitution

Function substitution allows replacing one function with an equivalent function in the target data warehouse. The configuration also allows for specifying custom functions.

Function substitution points to an array of instructions responsible for altering function calls. This section is used when function translations are required or function arguments (function signature) have to be altered. The following expression converts the NVL2 function to CASE function on Amazon Redshift.

“function_subst” : [
{“from”: “NVL2”, “output_template” : “CASE WHEN $1 IS NOT NULL THEN $2 ELSE $3 END”}
]

Conclusion

In this post, we demonstrated how to use the BladeBridge Analyzer and BladeBridge Converter to streamline the migration of SQL code from Google BigQuery to Amazon Redshift. By using BladeBridge, organizations can significantly reduce the time and effort required to translate their existing BigQuery code for migration to the Amazon Redshift data warehouse. The Analyzer provides a detailed assessment of the source SQL code, and the Converter automates the actual conversion process using a set of predefined, customizable rules and patterns.

We also covered the customization capabilities of the BladeBridge solution, showcasing how you can tailor the conversion rules to handle more complex transformations. By using the line substitution, block substitution, and function substitution features, you can have a seamless migration that addresses the unique requirements of your data analytics infrastructure.

We encourage you to try out BladeBridge’s GCP BigQuery to Amazon Redshift solution and explore the various configuration options. If you encounter any challenges or have additional requirements, refer to the BladeBridge community support portal or reach out to the BladeBridge team for further assistance.


About the authors

Ritesh Kumar Sinha is an Analytics Specialist Solutions Architect based out of San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.

Anusha Challa is a Senior Analytics Specialist Solutions Architect focused on Amazon Redshift. She has helped many customers build large-scale data warehouse solutions in the cloud and on premises. She is passionate about data analytics and data science.

Yota Hamaoka is an Analytics Solution Architect at Amazon Web Services. He is focused on driving customers to accelerate their analytics journey with Amazon Redshift.

Milind Oke is a Data Warehouse Specialist Solutions Architect based out of New York. He has been building data warehouse solutions for over 15 years and specializes in Amazon Redshift.

Raza Hafeez is a Senior Product Manager at Amazon Redshift. He has over 13 years of professional experience building and optimizing enterprise data warehouses and is passionate about enabling customers to realize the power of their data. He specializes in migrating enterprise data warehouses to AWS Modern Data Architecture.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *