Snowflake for SQL Server Users – Part 15 – Table types

Snowflake has a richer set of options for how tables are created than we have in SQL Server.

CREATE TABLE Variants

As well as the CREATE TABLE statement you'd already be familiar with from SQL Server, Snowflake offers these variants:

CREATE TABLE tablename AS SELECT

This is basically similar to a SELECT INTO in SQL Server. It executes the SELECT query and creates a table from the results (including the data).
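As a sketch (hypothetical table and column names):

```sql
-- Creates the table and populates it from the query results
CREATE TABLE high_value_orders AS
SELECT order_id, customer_id, order_total
FROM orders
WHERE order_total > 1000;
```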

CREATE TABLE tablename LIKE

This is an interesting variant. It creates a table with the same schema as an existing table but without any data i.e. it creates a new empty table based upon the design of another table.
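For example (hypothetical table names):

```sql
-- New empty table with the same column definitions as orders
CREATE TABLE orders_staging LIKE orders;
```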

CREATE TABLE tablename CLONE

This is another interesting variant. It clones one table to create another table. It's similar to the LIKE option but also includes all the data.
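For example (hypothetical table names):

```sql
-- New table with the same schema and all the data
CREATE TABLE orders_backup CLONE orders;
```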

Table Types

Permanent

Permanent tables are standard table types that are pretty much the same as the equivalents in SQL Server.

TEMPORARY

This is similar to a temporary table in SQL Server. The table and its data are retained until the end of the user's session. The syntax supports the concepts of LOCAL TEMPORARY and GLOBAL TEMPORARY, but these options have no effect: a standard TEMPORARY table is created either way.

Note that TEMPORARY can be abbreviated to TEMP, and has a synonym of VOLATILE.
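A minimal sketch (hypothetical table and column names):

```sql
-- Dropped automatically at the end of the session
CREATE TEMPORARY TABLE session_scratch
(
    id INTEGER,
    payload VARCHAR
);
```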

TRANSIENT

These tables aren't dropped at the end of a user session and stay until someone drops them. They are also visible to all users.

A key difference, though, is that they don't offer the same level of protection as standard tables: they don't have a Fail-safe period, so you might lose data if the system fails. They should only be used for data that can easily be recreated if needed.
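A minimal sketch (hypothetical table and column names):

```sql
-- Persists until explicitly dropped, but with reduced data protection
CREATE TRANSIENT TABLE staging_load
(
    id INTEGER,
    loaded_at TIMESTAMP_NTZ
);
```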


SQL: Sending email to an operator rather than a recipient in SQL Server

I was answering a question on a forum the other day, and the person asking the question was puzzled about how to send an email to an operator (i.e. msdb operator) rather than to just any other email address.

The reason he wanted to do that was that he wanted to use the Agent operator system to define a name. He didn't want to hard-code a recipient name in his code. By having it as an operator, it could be changed later, independent of the code.

Doing that isn't so hard. You just need to retrieve the operator's email address from msdb.dbo.sysoperators before sending the mail. Here's an example:
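The code images from the original post aren't included here, but a sketch of the lookup might be (the operator name is hypothetical):

```sql
DECLARE @EmailAddress nvarchar(100);

-- Retrieve the email address for the named msdb operator
SELECT @EmailAddress = o.email_address
FROM msdb.dbo.sysoperators AS o
WHERE o.name = N'DBA Team';
```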

And then send the email like this:
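A sketch of the whole pattern (the operator name and Database Mail profile name are hypothetical):

```sql
DECLARE @EmailAddress nvarchar(100);

-- Retrieve the operator's email address
SELECT @EmailAddress = o.email_address
FROM msdb.dbo.sysoperators AS o
WHERE o.name = N'DBA Team';

-- Send the mail to that address
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'DBA Mail Profile',
    @recipients = @EmailAddress,
    @subject = N'Overnight load status',
    @body = N'The overnight load completed successfully.';
```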

I hope that helps someone.

SDU Tools: Single space words in SQL Server T-SQL

We've got a lot of very useful string-related functions in SDU Tools. This is another one. At times, there's a need to remove excess spaces between words in strings. In our free SDU Tools for developers and DBAs, we added a function to do just that: SingleSpaceWords.

The function just takes one parameter: the string to be processed.
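Assuming the tools are installed in their usual SDU_Tools schema, a call looks like this:

```sql
-- Collapses the runs of spaces between the words
SELECT SDU_Tools.SingleSpaceWords(N'Hello     there      world');
```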

Find out more

You can see it in action in the main image above, and in the video here:

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

 

Spatial Data: If you are working with Australian mapping data, Mappify.io is worth knowing about

I spend quite a bit of each year working with spatial data (i.e. mostly mapping data but sometimes other data). One of the constant challenges is where to find the spatial data. Another challenge is where to find services that can help you to work with the data.

For Australian data, if you haven't considered Mappify.io, I think you should take a look.

You need to create an account before you start but it's free to create an account to try it out. Then there are a number of services.

Geocode

You can do simple web-based geocoding if you want. Enter values or upload a CSV and click Geocode.

You get back a map and results that you can export:

And of course there's an API that allows you to do this programmatically.

Reverse Geocoding

This takes a list of latitudes and longitudes and reverse geocodes them.

And again, a map and an exportable set of results:

Area Code

The Area Code option determines the area that a location is in. You can choose how the area is categorized:

For example, POA returns the postcodes.

Address Cleansing

I particularly love the option for cleaning addresses. Pass in a full address and have the system break the data into standard columns:

Others

There are many other options as well, like routing and driving directions and distances, spatial analytics, etc. This is a great tool and a great service.

Mappify.io currently say that if you create an account, the first 2,500 requests every day are free. After that they're 0.1¢ each. That seems a bargain.

Learning about Spatial Data

If you're not using spatial data in SQL Server, you should be. And if you need to learn about spatial data in SQL Server, check out our popular on-demand online course:

https://training.sqldownunder.com/p/sqlserver-spatial-data


T-SQL 101: #43 Concatenating strings in SQL Server

Joining strings together is called concatenating them. In the first example here, you can see a SELECT of 'hello', plus NULL, plus 'there'. The trick with this, though, is that NULL is a bit of a challenge.

Note that the outcome of that first SELECT statement is just NULL. As soon as you concatenate something with NULL, you end up with NULL as the overall answer.
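For example:

```sql
SELECT 'hello ' + NULL + 'there';
-- returns NULL (with the default CONCAT_NULL_YIELDS_NULL setting)
```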

So, in SQL Server 2012 they gave us a CONCAT function. And that allows you to put a list of values (it doesn't matter how many) and concatenate them all. When I first saw this function, I wondered why on earth we needed it, as we could already concatenate strings. However, it does two special things.

The first is that it ignores any NULL values. So you can see in the second SELECT statement above that it ignores the NULL and just concatenates the other two values.

The second excellent feature of the CONCAT function is that it doesn't care what the data type of your input values is, as it will implicitly convert all the values to strings before concatenating them.
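For example:

```sql
SELECT CONCAT('hello ', NULL, 'there ', '#', 42);
-- the NULL is ignored and 42 is implicitly converted: 'hello there #42'
```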

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Snowflake for SQL Server Users – Part 14 – Internal storage in Micropartitions

If you want to learn a lot about how Snowflake works internally, it's worth spending some time reading the ACM whitepaper that described the service.

It describes the overall structure of the service, and then describes how data is stored, followed by a description of how the encryption of the data within the service is accomplished. (I'll talk more about the management of encryption keys in a later blog post).

Columnstores

Columnstores have been familiar to SQL Server users since SQL Server 2012, and it's no surprise that the storage within Snowflake is essentially columnstore based.

Columnstores are great for adding data to, and are excellent for reading large amounts of data in bulk. What they aren't great for is being updated. Clustered columnstore indexes in SQL Server are updatable, but only because they have an associated rowstore that's used as the delta store.

Micropartitions

In Snowflake, the columnstores are formed by a series of what are called micropartitions. Each of these is a contiguous storage location that holds up to 16MB of compressed data (50 to 500 MB of uncompressed data), but importantly, the micropartitions are immutable i.e. once they are created, they are never modified.

The metadata for the micropartitions records the range of values for each of the columns that are stored in the micropartition, the number of distinct values, and a few other properties.

Tables get partitioned automatically during insert and load operations, based upon the order of the incoming data.

Interestingly, the file extension for these is FDN which is short for flocon de neige i.e. the French word for Snowflake.

Query performance against the columnstores is highly dependent upon being able to determine which micropartitions need to be read to satisfy a particular query. In Snowflake, the term used to describe this partition elimination is pruning.
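As a sketch (hypothetical table and column names), the metadata lets Snowflake prune any micropartition whose recorded range of order_date values can't match a filter like this one:

```sql
SELECT customer_id, order_total
FROM orders
WHERE order_date >= '2019-07-01'
  AND order_date <  '2019-08-01';
```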

 

SQL: Rolling up or archiving transactions past a cutoff date in T-SQL

Another question that I recently saw on Stack Overflow was asking how you SELECT a bunch of transactions back past a cutoff date, add them up, INSERT a roll-up row, and then delete the old ones. The concern was with how you make sure the ones you are deleting are the same rows that you are rolling up (or totalling). Someone else might UPDATE or INSERT a row at the same time.

I see exactly the same issue with people wanting to archive rows off into another table. They SELECT the rows into the other table, then DELETE them. But how do they know they deleted the same rows?

The answer is the OUTPUT clause for the DELETE statement.

Output Clause

When you execute a DELETE statement (or INSERT, or UPDATE, or MERGE, etc.), you can add an OUTPUT clause that returns details of what was inserted or deleted. The inserted and deleted virtual tables are just like the ones you have in triggers.

Example

So let's start with an example transactions table. I've kept it as simple as possible:
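The code images from the original post aren't shown here; a minimal sketch of such a table might be:

```sql
CREATE TABLE dbo.Transactions
(
    TransactionID bigint IDENTITY(1,1)
        CONSTRAINT PK_dbo_Transactions PRIMARY KEY,
    TransactionDate date NOT NULL,
    Amount decimal(18,2) NOT NULL
);
```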

Then let's populate it with some sample data and query the data:
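A sketch of populating and querying it (the values are illustrative only):

```sql
INSERT dbo.Transactions (TransactionDate, Amount)
VALUES ('2019-01-15', 250.00),
       ('2019-02-03', 100.00),
       ('2019-04-20', 75.50),
       ('2019-06-11', 310.25);

SELECT TransactionID, TransactionDate, Amount
FROM dbo.Transactions
ORDER BY TransactionDate;
```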

That leaves us with rows like this:

Roll-up Time

Now I'll calculate an archive cutoff date, as the first day of a month, 2 months back.

I've added a table variable to "catch" the OUTPUT of the DELETE statement. The poster asked for a table lock, so I added that, although it's not really needed or a good idea. (Consider removing it.)

Then, within the same transaction, I just INSERT the roll-up row back into the same table.
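Putting those steps together, a sketch (assuming a dbo.Transactions table with TransactionDate and Amount columns, as in this example):

```sql
-- First day of the month, 2 months back
DECLARE @CutoffDate date
    = DATEADD(month, -2, DATEFROMPARTS(YEAR(SYSDATETIME()), MONTH(SYSDATETIME()), 1));

-- Catches the rows removed by the DELETE
DECLARE @DeletedRows TABLE
(
    TransactionDate date,
    Amount decimal(18,2)
);

BEGIN TRAN;

DELETE t
OUTPUT deleted.TransactionDate, deleted.Amount
INTO @DeletedRows (TransactionDate, Amount)
FROM dbo.Transactions AS t
WHERE t.TransactionDate < @CutoffDate;

-- Roll the deleted rows back up into a single row just before the cutoff
INSERT dbo.Transactions (TransactionDate, Amount)
SELECT DATEADD(day, -1, @CutoffDate), SUM(dr.Amount)
FROM @DeletedRows AS dr
HAVING COUNT(*) > 0;  -- only add a roll-up row if anything was deleted

COMMIT;
```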

And Archiving?

Archiving data from one table into another table is even easier. You just do the DELETE in one table, and OUTPUT INTO the other table. Then you know that the rows you deleted are the same ones that you are inserting.
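A sketch of that (hypothetical archive table with a matching structure):

```sql
DELETE t
OUTPUT deleted.TransactionID, deleted.TransactionDate, deleted.Amount
INTO dbo.TransactionsArchive (TransactionID, TransactionDate, Amount)
FROM dbo.Transactions AS t
WHERE t.TransactionDate < '2019-01-01';
```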

I hope that helps someone.

 

SDU Tools: SQL Server SDU Tools Version

More and more users of SDU Tools are adding the deployment of the tools into their standard deployment pipelines, and into their standard operating environments (SOEs).

One common but simple request that we've had is that they need to be able to check which version of SDU Tools is currently installed. So we added a function to do that.

It doesn't take any parameters but just returns the version number as a string.
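Assuming the usual SDU_Tools schema (and note that the exact function name here is my assumption), a call would look like this:

```sql
-- Returns the installed SDU Tools version as a string
SELECT SDU_Tools.SDUToolsVersion();
```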

Find out more

You can see it in action in the main image above, and in the video here:

 

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

 

Power BI: Creating a TopoJSON file of Australian postcodes for use with Shape Map

In a recent consulting project, I needed to plot analytic values over postcodes. I couldn't just do it with the built-in maps control, mostly because the system that we're developing on doesn't have Internet connectivity. I was keen to upload the mapping data directly into the report, and the Shape Map control (while still in preview) seemed to fit what I needed exactly.

However, when you load custom maps into the control, they need to be in TopoJSON format. TopoJSON is a specific extension of GeoJSON, which has been pretty common for a while. I tried just using a GeoJSON file with the control, but had no luck.

What was successful was to download the postcode shapes from here:

https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202016?OpenDocument

I downloaded the 2011 shape file for Australian postcode areas.

Next I used the website at https://mapshaper.org/ to do the following:

  • Loaded the shape file
  • Simplified the shape file (I reduced it to about 2% of the original size, while opting to keep any small areas)
  • Exported the data as a TopoJSON file
  • (Optional) Edited the TopoJSON file to rename my postcode columns from POA_CODE to Postcode and from POA_NAME to PostcodeName

What an awesome site mapshaper.org is!

Notes on Shape Map

To use the shape map control, you need to do this:

  • Make sure you have the Shape Map control (it's currently a preview feature that needs to be enabled)
  • Drag the column containing the postcode to the Location field.
  • Once you've done that, on the format page, you can choose to add a custom map. I imported my TopoJSON postcodes file.
  • Drag the analytic value to the Color saturation field.
  • Set the background color and border color.

When I first did this, nothing worked for me. Eventually I worked out that my source data had integer postcodes but the ones in the file were just text fields. I added a computed column to my data, made sure I had text postcodes available, and used that column, then all was good.
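As a sketch on the source-data side (hypothetical table and column names), a text postcode can be derived from an integer one, which also preserves leading zeros in codes like 0800:

```sql
SELECT RIGHT('0000' + CAST(PostcodeID AS varchar(4)), 4) AS Postcode
FROM dbo.CustomerLocations;
```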

I hope this helps someone.


T-SQL 101: #42 Comparing strings by using collations in SQL Server

Another important concept when you're working with strings is the concept of Collations. If I write code like the first line here:

The question is: how would I know if a capital 'A' is treated the same as a lowercase 'a'? The answer is "it depends upon the collation". Collations are sets of rules that determine how strings are compared. SQL Server supports lots of collations. You can see them all by executing:
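The list comes from the fn_helpcollations system function:

```sql
SELECT name, description
FROM sys.fn_helpcollations()
ORDER BY name;
```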

If you've never done that, it's worth taking a look at the available options.

For most people, the most important parts of the collation names, apart from the language, are the CI and AS parts in the example above. CI says "case insensitive". If it was CS, it would have been "case sensitive". This determines how capital and lowercase letters are dealt with.

The AS is "accent sensitive" and you can similarly have AI for "accent insensitive". This relates to accents such as acutes, graves, umlauts, etc. that are commonly seen in European languages.

Some collation names start with a SQL prefix. These are old Sybase/SQL Server collations that are retained for backwards compatibility. You should avoid them and use the Windows collations i.e. the ones without the SQL prefix.

You can set default collations at the database and column levels. (Note you cannot set them at the table level).

If the two values you are comparing use different collations, or if you want to use a different collation to the database's default collation, you can specify which one you want by using the COLLATE clause. You can see it in the code image above. In that example, you might have a case-sensitive database but you want to do that one comparison in a case-insensitive way.
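A sketch of that (hypothetical table and column names):

```sql
-- Force this one comparison to be case-insensitive,
-- even in a case-sensitive database
SELECT CustomerName
FROM dbo.Customers
WHERE CustomerName COLLATE Latin1_General_CI_AS = N'angela jones';
```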

Temporary Tables

One common problem happens when your database has a different collation to the tempdb database, and you are comparing values between your database and some temporary object. When that happens, you'll see an error that says SQL Server can't resolve the collation conflict. In this case, you definitely need to override by using a COLLATE clause.

You could then specify a particular collation, or if you just want to default to what the database uses, you can say COLLATE DATABASE_DEFAULT without needing to know which collation the database uses.
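A sketch of that (hypothetical table and temp-table names):

```sql
-- Coerce the temp table's column to the current database's default
-- collation to avoid a collation conflict with tempdb
SELECT c.CustomerName
FROM dbo.Customers AS c
INNER JOIN #ImportedNames AS i
    ON c.CustomerName = i.CustomerName COLLATE DATABASE_DEFAULT;
```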

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.