Opinion: Singular vs Plural Table Names in SQL Server

There is a near-religious debate in the development and DBA communities about singular and plural table names. It's pointless rehashing all the arguments but I want to spell out what I do, and why.

What's driving this post is that a developer told me I was being inconsistent because, even though I generally use plural names, he had found a table where I used a singular name. It's not inconsistent, and here's why.

OLTP Databases 

In traditional OLTP databases, I generally use plural table names ie: my table will be Sales.Customers, not Sales.Customer. This makes it clear that the table holds data about a set of customers, not information about just one customer.

The main people who complain about this are those who think a table should directly relate to an object. Well, it doesn't. If it did, it still would be a Customers object that was a collection of Customer objects. The table would not map to one Customer.

However, all rules are meant to be broken. The one exception that I make to this is for tables that will only ever hold a single row of data ie: where more than a single row is not logical or meaningful.

So I might have a table called Application.SourceDatabase if that table will always have one and only one row. And I won't consider that to be inconsistent.
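
If it helps, such a single-row table can even enforce the one-row rule declaratively. A minimal sketch (the column names here are hypothetical):

```sql
-- Sketch of a single-row settings table. The primary key is constrained
-- to one fixed value, so a second row can never be inserted.
CREATE TABLE Application.SourceDatabase
(
    SourceDatabaseID int NOT NULL
        CONSTRAINT PK_Application_SourceDatabase PRIMARY KEY
        CONSTRAINT CK_Application_SourceDatabase_OneRow CHECK (SourceDatabaseID = 1),
    ServerName sysname NOT NULL,
    DatabaseName sysname NOT NULL
);
```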

OLAP Databases

Dimensional warehouses are perhaps treated differently. For me, it depends upon whether there will be a view layer above the database that's used by any analytic system accessing it.

I want analytic systems to see human-presentable data.

By human-presentable, I mean names that are ready to go directly into analytic or reporting models. This does mean they will have, for example, embedded spaces in names where sensible.

So I use one of two options for dimensional models:

  • (Less preferred) The database has singular table names, all names have embedded spaces where needed, except for key columns, lineage columns, etc. that are not ever part of the UI. The database can be consumed directly by those analytic applications.
  • (More preferred) The database is set up with the same naming as an OLTP database above, and has a view layer that provides the human-readable data. The analytic applications talk to the view layer, not to the database tables.
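
As a sketch of the more preferred option (all object and column names here are hypothetical), the table keeps OLTP-style naming and the view presents the human-readable names:

```sql
-- The analytic applications query this view, not the underlying table.
-- Key and lineage columns that never reach the UI are simply not exposed.
CREATE VIEW Analytics.[Customer]
AS
SELECT dc.CustomerName AS [Customer],
       dc.CustomerCategory AS [Customer Category],
       dc.DeliveryPostalCode AS [Postal Code]
FROM Dimension.Customers AS dc;
```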

SQL: Why can't I see my files in SSMS when I want to restore?

I had a curious discussion on a mailing list the other day where the questioner was complaining that SQL Server Management Studio (SSMS) wasn't working properly. He wanted to restore a backup, and his folders did not appear in the folder list to choose from. He could see them in Windows File Explorer but couldn't see them when trying to restore a backup.

What he wasn't understanding is that when you use SSMS to perform a restore, you are sending a command to SQL Server telling it to do the restore. It's SQL Server that opens the backup file and reads its contents, not your local copy of SSMS.

So, the issue is which files SQL Server can see, not the files that you can see.

This perhaps becomes clearer if you think about SSMS as just a client application that sends commands to a server, and that server often won't be on the same machine ie: you'll have a client system running SSMS and a server somewhere that's running SQL Server.

When you run Windows File Explorer, it's your identity that's doing the looking, either on your local machine, or on network shares on other machines.

When you restore a backup, it's SQL Server that's looking for the files, either on the server that's running SQL Server or again on some network share.

The issue then is the identity of SQL Server and the permissions granted to that identity. If you run SQL Server Configuration Manager (one of the tools that ships with SQL Server) on the server, and select the Services option in the left-hand pane as shown in the main image above, you can see who SQL Server is running as. That's the identity you need to be concerned about.
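
If you prefer a query to a GUI, the same information is available from a dynamic management view (you'll need VIEW SERVER STATE permission):

```sql
-- Shows each SQL Server-related service and the account it runs as
SELECT servicename, service_account, status_desc
FROM sys.dm_server_services;
```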


Learning Chinese: Why Learn Mandarin? Start with the Numbers

Anyone who knows me well will have heard me talking at some time over the last few years about learning Mandarin. At a SQL conference the other day, a few people asked me about it because they were interested in learning it, and a couple wondered why I'd learn it in the first place.

The first reason for this is how prevalent the language already is and how much more prevalent it will be.

I often see statistics that talk about the most common languages in the world, and invariably they count how many countries speak the language. English fares well in those comparisons.

But if you look at the number of native speakers of the different languages, it's quite a different story, particularly when projected forward.

A few years back, I was reading a Spanish airline (Iberia) magazine with an article on language trends and it was quite revealing. They argued that by 2025, when you rank languages by the number of native speakers, it'll look like this:

  1. Mandarin Chinese
    Daylight
  2. Spanish
  3. English
  4. Hindi
  5. Arabic

The aspect that I hadn't considered was birth rates. They argued that 1 in 4 children born will be native Mandarin speakers.

Just stop and think about that for a moment.

And 1 in 5 will be native Spanish speakers. (This is in no small part caused by the large number of large Catholic Spanish-speaking families in South America, where some people are still having 8 or 10 children. In many English-speaking countries, the native English speakers aren't even replacing themselves – with 1.3 or 1.4 children per couple.)

Now many people will say "oh but there are lots of Chinese dialects". That's true. In fact there are over 900 well-recognized dialects of Chinese. What's important to understand though is that the Chinese government is big on standardization. They even have a single timezone right across the huge country, but you have to do that when you have so many people.

As hard as it is for speakers of the less common dialects to accept, the game is over and down the track, it's all Mandarin.

Cantonese is the next most common, and is still the dominant language in Hong Kong. I'm sure it's hard for them to accept, but right from the moment Hong Kong was handed back to China, the writing has been on the wall for the language. Children there learn Mandarin from a very young age, and I'm sure that over time, if they want to deal with government, business, etc., it will be in Mandarin.

Even here in Melbourne, when I'm on trains, I used to hear mostly Cantonese. Now almost all I hear is Mandarin.

The positive thing here is that the different dialects share a common writing system (with some differences that I'll describe another day).

By the numbers, I've also seen projections saying that if you could speak Mandarin, Spanish, and English, you'd be able to communicate at some level with over 90% of the world by then.

I'll write more next week on methods for learning. If you want to get a taste for it in the meantime though, my current favorite is Tutor Ming. If you decide to try it, click here and it's a bit cheaper for you, and for me.

One argument is that technology will replace the need for learning languages. I don't think that's the case, and I'll discuss that further later too.


Book Review: Astrophysics for People in a Hurry – Neil deGrasse Tyson

I have to admit to being a bit of a fan of Neil deGrasse Tyson, so I was really looking forward to reading or listening (via Audible) to his book Astrophysics for People in a Hurry.

It's always a bit of a tall order to try to cover something like astrophysics in a short book. The title reminded me of silly book titles like "Applied Multivariate Analysis and Calculus for non-Mathematicians".

But I loved this book.

Neil has very carefully chosen many interesting aspects of the world and how we come to be here, and what we do and don't know but the thing I enjoy most is his unbridled enthusiasm for his subject. You can just tell how much he loves this material and how keen he is to share it with you.

I've read a number of books on these topics in the past but still found many aspects that I either hadn't encountered before or where Neil provided a different view of the topic.

The book covers some tough territory but I think he's done a great job of keeping the treatment of it light enough to avoid losing the majority of his audience.

Recommended!

Greg's rating: 8 out of 10

Note: as an Amazon Associate I earn from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs.

SDU Tools: Converting between Base64 and Varbinary in T-SQL

In our free SDU Tools for developers and DBAs, we have added many tools that help to convert data between different types and formats when working with T-SQL.

In some recent consulting work, I needed to convert data from BizTalk messages into other formats. To help make this easy, I added a function that converts from Base64 (used by BizTalk messages and other applications) to varbinary.

Not surprisingly, we called it Base64ToVarbinary. And for completeness, we also added the reverse function: VarbinaryToBase64.

In the main image above, you can see an example of them in use.
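
As a quick sketch of how they're used (assuming the tools are installed in their SDU_Tools schema in the current database):

```sql
-- Round-trip a value through both functions
SELECT SDU_Tools.VarbinaryToBase64(0xAABBCCDDEE);
SELECT SDU_Tools.Base64ToVarbinary('qrvM3e4=');
```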

You can also see them in action here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

Shortcut: Fixing or improving SQL Server Books Online

I mentioned in an earlier post that I think the online version of Books Online (BOL) is now superior to the version that you can install locally.

I particularly like the way that the online books are now cross-version ie: each page covers all supported versions, instead of having a separate page for each version.

But one of the really big bonuses is that you now have the opportunity to change the documentation if you think it's incorrect or you think it could be improved. Microsoft have placed all the documentation in a Git repository and you can change it. Doing so is easier than you might expect.

Let's look at an example.

I've searched for the LEN function in T-SQL and found the page:

Note the small list of faces appearing under the command name. These are people who have contributed to this page.

Now while I'm reading the page, I see this:

I'm puzzled why that example is specific to AdventureWorks2012. That code would work on all AdventureWorks versions. So let's try to change it. At the top of the page, there's an Edit link.

I'll click this and I'm taken to the page on GitHub:

Note in the top right-hand corner that I've already signed in to GitHub. Also notice the Edit pencil. I can't just directly change this info, so I click this to "fork" the project so that I have my own copy to edit.

Now I can make the change that I want to:

And at the bottom of the page, I explain why:

Then I click Propose file change, and I'm taken to a page that asks me to create a pull request.

GitHub runs some automated checks and makes sure that I'm not suggesting a change that can't easily be merged.

If I'm happy with the differences shown, I just click Create pull request, and then confirm it once more. The change is then sent off to the person who is responsible for maintaining the page:

If they agree, your change will be merged in. Either way, you'll receive emails telling you what's going on.

If you have added a lot of changes or code, you'll also receive another email asking you to agree to be a contributor.

This is a great new option and once you're used to it, very easy to do.

Opinion: SQL Server Databases in the Cloud – Single Tenant or Multi Tenant?

I spend a lot of time working with software houses (Microsoft calls these ISVs – Independent Software Vendors). More and more, these companies are looking to convert their on-premises applications to cloud-based SaaS (Software as a Service) offerings.

For many of the ISVs, their on-premises applications are single-tenant ie: they are designed to support a single organization. When they are looking to cloud offerings, one of the first decisions is whether they should create a single database to hold the data for all their client organizations, or whether they should create a separate database for each client organization.

As with most things in computing, there is no one simple answer to this.

Here are the main decision points that I look at:

Isolation

For me, this is the #1 item. You have to decide how important isolating one client's data from other clients' data is. Time and again, I hear how "the app does that", but you need to keep in mind that in most multi-tenant models, you are only one faulty WHERE clause away from showing one client another client's data. In fact, it's usually far harder to get the code correct in multi-tenant databases.
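
To make that concrete, here's a hypothetical multi-tenant query where everything looks fine except for one easily-forgotten predicate:

```sql
-- In a multi-tenant database, every query must filter by tenant.
-- Omit the last line and the query silently returns matching rows
-- from every tenant, not just the caller's.
SELECT o.OrderID, o.OrderDate, o.OrderTotal
FROM dbo.Orders AS o
WHERE o.CustomerID = @CustomerID
AND o.TenantID = @TenantID;
```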

So I think you need to start by considering what the outcome of that would be. For some ISVs, this would be deeply embarrassing but manageable. For other ISVs, this would simply be terminal for the organization.

Imagine the discussion between your CEO and the client whose data was disclosed. How would that discussion go? Would you still have a job? Would you still have an organization?

Image by Dmitry Ratushny

If you have even the slightest doubt about this, you should lean towards single-tenant ie: a separate database per client. You still have the chance to mess that up, but you are starting in a better place.

Price

This one is pretty easy. With every current cloud provider, a single larger database is more economical than a large number of smaller databases. I don't think that cost should be your primary concern here, but if it is, you will lean towards single-database designs.

If you are working in Azure SQL Database though, and haven't looked at their elastic database pools, you should consider them before making your decision.

Cloudiness

A key aspect of "cloudiness" is the granularity of providing resources as required, just when they are required. Having separate databases for each client is much more "cloudy". A simple example of this is that each of your clients might require a different level of performance and/or features.

One client might want to run a very low cost test, another might need good solid general performance, another might need the best performance available. If you use separate databases, even clients running the same application could use databases with different scale and/or performance.

You might also be able to add features for specific clients. For example, one client might want a read-only copy of their data in another location.

Having separate databases lets you decide these things on a client-by-client basis.

One other option to consider here is that you might have different requirements even for a single client organization. They might have different environments (ie: Production, UAT, Test, etc.) that require different capabilities.

Noisy Neighbors

Everyone who's used a single database to support a large number of clients has run into the "noisy neighbor" situation at some time.

Image by Nik Shuliahin

One client gets to the point that they can't get their work done because of the overhead being placed on the single database by another tenant.

If you have a situation where the load placed by different tenants varies, you are going to find life much easier if you have separate databases.

If you use a multi-tenant design, you will need to consider how to move one tenant to another database if that's required. (I've been in ISVs where this is only considered when someone starts complaining but it needs to be part of the design from day #1).

Query Performance

When you have a multi-tenant database, almost every table will have a tenant ID of some type, and these columns will be involved in almost every join operation.

It's not hard to imagine that these databases simply perform worse. Note that I'm not talking about single vs multi-database on a single SQL Server. In those situations, there can also be positive performance outcomes from a single database design – but that's a topic for another day.

Recovery/Rollback/Import/Export/Retention

If you have one client that needs to roll back their data to an earlier time, this is often extremely difficult with single-database designs. It's trivial to achieve with multi-database designs.

The same applies to situations where clients have different retention policies, or where there is a need for a client to be able to export their own data (or import it again).

Data Ownership

You need to consider who will own the data that's used by your application. When you use a single-database model with multiple tenants, it's clearly going to be a database that you own and are responsible for.

Is that an appropriate risk for your organization?

One model that I'm seeing become more common now is that while the ISV owns and operates the application, the end customers own their own databases (and pay for them). This might simplify your legal situation in regard to data ownership. It might also help if there are data sovereignty issues.

Take legal advice on this.

Summary

As I mentioned, this isn't a simple decision. Nowadays for cloud-based PaaS (Platform as a Service) databases to support SaaS applications though, unless there is a compelling reason not to, I'd suggest starting with a separate database for each client every time.

SQL: Finding rows that have changed in T-SQL – CHECKSUM, BINARY_CHECKSUM, HASHBYTES

If you have data in a SQL Server table and you want to know if any of the values in a row have changed, the best way to do that is by using the rowversion data type. (Note: this used to be called the timestamp data type in a rather unfortunate naming choice). I'll talk more about it in another post.

But today I wanted to discuss another issue. If I have an incoming row of data (let's say @Parameter1, @Parameter2, @Parameter3, @Parameter4) and I want to know if the incoming values match the ones already in the table, and update the table only if they are different, what's the best way to do that?

You might think that's easy and you'd just add a WHERE clause to your UPDATE like:
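
For example, something along these lines (the table and column names are hypothetical):

```sql
-- Only update the row when at least one value has actually changed
UPDATE dbo.Customers
SET TradingName = @Parameter1,
    PhoneNumber = @Parameter2,
    FaxNumber = @Parameter3,
    Website = @Parameter4
WHERE CustomerID = @CustomerID
AND (TradingName <> @Parameter1
     OR PhoneNumber <> @Parameter2
     OR FaxNumber <> @Parameter3
     OR Website <> @Parameter4);
```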

If you have a large number of columns, doing that gets old pretty fast. Worse, if the columns are nullable, it really needs to be more like this:
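
With nullable columns, every comparison has to allow for NULL on either side (again with hypothetical names):

```sql
-- <> evaluates to UNKNOWN when either side is NULL,
-- so NULL-to-value transitions must be checked explicitly
UPDATE dbo.Customers
SET TradingName = @Parameter1,
    PhoneNumber = @Parameter2
WHERE CustomerID = @CustomerID
AND (TradingName <> @Parameter1
     OR (TradingName IS NULL AND @Parameter1 IS NOT NULL)
     OR (TradingName IS NOT NULL AND @Parameter1 IS NULL)
     OR PhoneNumber <> @Parameter2
     OR (PhoneNumber IS NULL AND @Parameter2 IS NOT NULL)
     OR (PhoneNumber IS NOT NULL AND @Parameter2 IS NULL));
     -- ...and so on for each remaining column
```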

You can imagine what this looks like if there are a large number of columns, and you can imagine the amount of calculation that could be needed, just to see if one value has changed.

An alternative approach is to add one more column that represents a checksum or hash value for all the columns, and to just compare that.

The first challenge is to get a single value. Fortunately, the CONCAT function provided in SQL Server 2012 and later works a treat. It promotes all values passed to it to strings, ignores NULL values, and outputs a single string. This means we could calculate the hash or checksum like this:
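
For example (using a pipe separator, with the parameter names from above):

```sql
-- CONCAT promotes each value to a string, treats NULL as an empty
-- string, and returns one combined string ready to be checksummed
-- or hashed (the choice of hashing function is discussed below)
SELECT CHECKSUM(CONCAT(@Parameter1, N'|', @Parameter2, N'|',
                       @Parameter3, N'|', @Parameter4));
```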

and we could then compare that to the precomputed value we stored when we last inserted or updated the row (either by directly inserting the value, or via a persisted computed column).

It might be worth using an alternate separator if there's any chance it could occur at the end of any value.

If you are using SQL Server 2017 or later, you'd hope that you could use the CONCAT_WS (concatenate with separator) function instead but unfortunately, it ignores NULL values, so you'd have to wrap them all with ISNULL or COALESCE.
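
So a CONCAT_WS version ends up needing something like this (assuming string parameters):

```sql
-- CONCAT_WS skips NULL values (and their separators) entirely,
-- so each value must be wrapped to keep the positions stable
SELECT CONCAT_WS(N'|',
                 ISNULL(@Parameter1, N''),
                 ISNULL(@Parameter2, N''),
                 ISNULL(@Parameter3, N''),
                 ISNULL(@Parameter4, N''));
```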

It would have been great if that function had a way to not ignore NULL values, or to have another function provided. It would be really useful if you wanted to generate output lines for CSV or TSV files.

The question that remains is about which hashing function to use.

I often see customers trying to use the CHECKSUM function for this. The problem with CHECKSUM is that it returns an int and you have a reasonable chance of getting a collision. That would be bad, as you'd assume that data was the same when it wasn't. A while back, we got the documentation people to write specific notes about this. You'll notice the web page now says:

"If at least one of the values in the expression list changes, the list checksum will probably change. However, this is not guaranteed."

I'm assuming you want something better than that.

BINARY_CHECKSUM sounds better but it's really more like a case-sensitive version of CHECKSUM, and unless you're working with case-sensitive servers, that's even worse.

The real answer is a true hashing function, and in SQL Server that means HASHBYTES. It's computationally more intensive but is suitable for this type of change detection. The hassle with it was that it was limited to 8000 bytes. Fortunately, in SQL Server 2016 and later, that limitation was removed, so it's now the one to use:
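
A sketch of the final pattern (with SHA2_256 as the algorithm):

```sql
-- SHA2_256 gives a 32-byte hash with a negligible collision chance;
-- SQL Server 2016 removed the old 8000-byte input limit on HASHBYTES
SELECT HASHBYTES('SHA2_256',
                 CONCAT(@Parameter1, N'|', @Parameter2, N'|',
                        @Parameter3, N'|', @Parameter4));
```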

The other point made by our buddy Ron Dunn in the comments is extremely valuable. If you are using CONCAT to combine numbers and date/time values, it's critical that you always format them as strings yourself to control the format used. That way, you won't fall foul of different regional or language settings.


Book Review: Essentialism – Greg McKeown

Another book that I've been listening to lately on Audible is Greg McKeown's Essentialism: The Disciplined Pursuit of Less.

This was a very timely book for me. I find it very easy to get endlessly bogged down in minutia, rather than just always focusing on the big picture of what I'm doing. It's really easy to do that if you have a great desire to get things perfect or close to perfect. But that's not what you should be aiming for.

What I particularly liked about this book is that Greg didn't just talk about the problem. He had a large number of examples of situations and just how you should deal with them.

In particular, I loved the section on how to say No.

I found the book to be practical and useful, and now I just need to take a long hard look at everything I do, to see how it all fits into what I really need to get done.

If you find yourself super-busy with trivia, or just don't know how to politely say no to endless requests without offending people, you might find this book pretty useful.

Greg's rating: 8 out of 10

Note: as an Amazon Associate I earn from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs.

Shortcut: The magical F1 key – help on syntax and metadata

I used to always recommend that people install Books Online (BOL) on their systems. It's ironic that it was called "Online", given we're really talking about "Offline" nowadays, but back when we first were talking about it, we were comparing it to a physical book, not to a live reference on the Internet.

Nowadays though, I find that the version online is so far superior to the one that you can install locally, that I think it's better to just use the online version. I particularly like the way that the online books are now cross-version ie: each page covers all supported versions, instead of having a separate page for each version.

Given there is a lot of information online, what's the quickest way then to find the page that you're after?

For T-SQL commands, I find the best option is to type the command name followed by tsql into the search bar. For example, here's a search for the SET command:

Invariably, the command that you're after will be either the first or second entry.

But there's an even quicker way when you're editing in SQL Server Management Studio (SSMS). In this script that I'm editing:

If I double-click EXECUTE to highlight it, then hit F1, I'm taken directly to the correct page.

That's awesome but the F1 key can do more. If I was looking at or editing this script from WideWorldImporters:

and I'm wondering about the People table, I can highlight the full table name [Application].People, then hit Alt-F1. I'm then shown all sorts of information about the table:

What SSMS is doing is running the sp_help command for that object. In another post, we'll talk about how you could change that if needed, or change what happens with other function keys.
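
That means you can get the same output yourself in a query window:

```sql
-- What Alt-F1 runs under the covers for the highlighted object
EXEC sp_help 'Application.People';
```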