Python for Non-Data Analytics?

, , ,

Microsoft SQL Server 2017’s support for Python had me curious. Would this integration make it practical to use Python in non-data analytic scenarios where a general-purpose programming language—like Python—is more suitable than set-based T-SQL?

The answer to this question hinges around the integration’s design. Presumably, the new Python integration is designed to work well for data analysis, as this seems to be the prime impetus motivating it. An integration that excels in one scenario, like the assemble data -> process & analyze -> return results flow common in data analytics, may or may not be a good fit for other use cases.

To help us evaluate when and where SQL Server’s Python support may be helpful for non-data analytics applications, let’s compare it—or, more specifically, let’s compare the underlying external script execution environment that powers it—with two integrations that have been included with SQL Server for some time: Common Language Runtime (CLR) and xp_cmdshell.

Since the external script execution environment that powers the SQL Server-to-Python integration also powers SQL Server’s support for the R language, our comparison also applies to the SQL Server-to-R integration.

External Script Execution Environment CLR xp_cmdshell
Usage Executing scripts written in Python and R. Authoring stored procedures, functions, aggregates, triggers and user-defined types in .Net languages like C# and Microsoft Visual Basic. Invoking commands at the operating system command prompt level.
Method of Invocation Invoked by calling stored procedure sp_execute_external_script.

Example:
EXEC sp_execute_external_script
@language = N'Python',
@script = N'OutputDataSet = InputDataSet;
OutputDataSet["Col2"] = OutputDataSet["Col1"] * Multiplier',
@input_data_1 = N'SELECT 1 AS Col1',
@params = N'@Multiplier INT',
@Multiplier = 20;

Invoked directly using T-SQL.

Example:
EXEC MyClrStoredProcedure(@myVar);

Invoked by calling stored procedure xp_cmdshell.

Example:
EXEC xp_cmdshell 'dir *.txt';

Execution Context External to SQL session.

Scripts execute outside SQL Server under an operating system user account retrieved from a predefined pool of accounts. If a SQL user executes multiple scripts simultaneously, all scripts will execute under the same pool account. A pool account will not be used to execute scripts for multiple users at the same time.

Current SQL session.

For example, CLR-authored code can run queries which will be executed using the current SQL session user’s database permissions.

External to SQL session.

Commands are executed under the SQL Server service account or a proxy account, depending on the invoking SQL user and the SQL Server’s configuration.

Parameter Passing Input and output parameters must be passed explicitly, similar to how parameter are passed to sp_execute_sql. Some parameter types, including CLR types, are not supported. Full access to all parameters passed the method. Rudimentary. Data can be passed to the external command as string command line arguments. The text output of the command can be captured using INSERT…EXEC and then parsed by the invoking T-SQL.
Input Data A single string containing a query defining an input data set may be provided. This query’s results will be materialized and passed into the scripting environment as a pandas data frame (Python) or data frame (R). These results can be passed all at once or (Enterprise Edition only) in chunks which are incrementally processed by the script.

If additional data is needed, the script can connect back to SQL Server and execute queries. However, as the script is not running in the context of the current SQL session, these connections do not inherit the current user’s permissions.

Depending on the method type, table variables can be passed as input parameters.

If additional data is needed, queries can be executed which are run in the context of the current SQL session (e.g. under the current user’s permissions).

No mechanism provided.

External process could be programmed to open a connection to the SQL Server and query for data. However, as process is not running in the context of the current SQL session, these connections do not inherit the current user’s permissions. Another option is for the T-SQL that invokes the external process to first write data to a file which the external process has been programmed to read.

Returning Result Sets A single result set can be returned.

The result set is returned without column names, even if the underlying data structure in the scripting environment had column names. Column names can be assigned by using the WITH RESULT SETS clause of EXECUTE.

In the script, the result set to return is assembled as a pandas data frame (Python) or data frame (R).

Multiple result sets can be returned, if appropriate for the type of method (e.g. a stored procedure can return multiple results sets while an aggregate cannot return any). No mechanism provided.

Possible Workarounds: Textual output of command can be captured using INSERT…EXEC and then parsed by the invoking T-SQL. Also, the process could write results to a file which the invoking T-SQL could read.

Ease of Development/Administration Easy to deploy. Text of script to execute is passed as a string to sp_execute_external_script. Hassle involved. Building and deploying custom assemblies is required. Depends on what is being invoked. Appropriate commands and supporting infrastructure must be installed at an accessible location.

Conclusion

Based on the above comparison, I think the answer to our initial question is “it depends.”

The new external script execution environment best fits scenarios where the script does not need to interact with the database server as it executes. This implies both that the script does not modify data on the server and that the script can live within the constraints of receiving and returning at most a single result set plus parameters. While these boundaries aren’t absolute limits, exceeding them likely will require creative workarounds and may involve security compromises (e.g. opening a connection back to the server using hard-coded credentials so that data modification queries can be executed).

Another factor to consider when evaluating the new integration: A slight environment start up delay is sometimes present between the call to sp_execute_external_script and the beginning of script execution. This lag is probably a non-issue when if script itself takes some time to execute; however, when the script is fast and response time is expected to be instantaneous, the delay may make using the integration a no-go. (This concern may be moot if the yet-to-be-released pythonlauncher.config documentation reveals a way to tune the pre-creation of external script processes.)

Leave a Reply

Your email address will not be published. Required fields are marked *