Data Modeling: a tale of a man with only one name

I've lost count of the number of computer systems I've worked with over the years that insist that everyone has at least two names. Yet there are many people in the community who have only a single name. I'm not just talking about people like Cher, who might well have a stage name but another name on her passport. I'm talking about people who really only have one name.

Can your systems deal with that?

And if they only have one name, which column does it go into?

Need to be PC

In Australia, we have standards for this sort of thing. While many existing systems use words like Surname, LastName, FirstName, ChristianName, etc., those terms aren't considered PC (politically correct or appropriate) now either. In our standards, the terms are:

Family Name
Given Names

Only the family name is mandatory. At Meteor (Metadata Online Registry), these are also defined as 40 characters long each. That standard says:

Some people do not have a family name and a given name, they have only one name by which they are known. If the person has only one name, record it in the 'Family name' field and leave the 'Given names' field blank.
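In table design terms, that means only the family name column should be mandatory. A minimal sketch of what that looks like (the table and column names here are just examples) is:

CREATE TABLE dbo.People
(
    PersonID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_People PRIMARY KEY,
    FamilyName nvarchar(40) NOT NULL,   -- the only mandatory name
    GivenNames nvarchar(40) NULL        -- left NULL for someone with only one name
);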

A Tale of a Disruptive Man with One Name

At a place that I previously worked, we had a guy apply for a job, and he only had one name. He'd legally changed his name to just a single word. He wanted nothing to do with his family any more and removed all reference to them. He was a pretty alternate sort of character but I so wanted him to come and work with us. He would have broken every system in the place.

But in an interesting twist of fate, he couldn't come to his job interview. The airline told us they were not prepared to allow him to fly in what he was wearing.

To this day, I can't imagine what you could wear to an interview that an airline wouldn't let you fly in.

T-SQL 101: #17 Paginating rows returned from SQL Server T-SQL queries

When I need to display a potentially large number of rows on a screen, I often need to show them in pages. This is called paginating the rows.

For example, if I'm showing bank transactions, I might want to show 25 per page. The challenge is that I also need to be able to ask for a specific page number. So my query becomes: give me page 7 where there are 25 rows per page. In that case, I want to skip the first 150 rows, then ask for the next 25 rows.

Now I could get SQL Server to send me all the rows at once, and I could just choose which rows to display. If there are many thousands though, that could be really wasteful, mostly in transferring all that data from SQL Server to my application. And if I want to only show a single page at a time, I really don't want to get all the rows every time and then just throw most of them away.

While we could do this with T-SQL in all versions of SQL Server, it was messy. I was really pleased when SQL Server 2012 gave us an option to help with this. It came in the form of extra clauses to the ORDER BY clause.

Take a look at the following query:
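Something along these lines does the job (I'm assuming a hypothetical dbo.Transactions table holding the bank transactions):

SELECT TransactionID, TransactionDate, Amount
FROM dbo.Transactions
ORDER BY TransactionDate, TransactionID
OFFSET 150 ROWS
FETCH NEXT 25 ROWS ONLY;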

It's quite elegant. It says to skip the first 150 rows, then return the next 25 rows, exactly as we wanted. To get any particular page, we just subtract one from the page, multiply by the number of rows per page, and use that as the offset:

OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY
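Putting that together, again with the same hypothetical dbo.Transactions table, a parameterized version would look something like this:

DECLARE @PageNumber int = 7;
DECLARE @RowsPerPage int = 25;

SELECT TransactionID, TransactionDate, Amount
FROM dbo.Transactions
ORDER BY TransactionDate, TransactionID
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY;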

Note that this option isn't really a performance enhancement but that's a story for another day. I like the fact that it is declarative though. You are telling SQL Server what you want, not how to do it. That's almost always a good thing.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Book: Pro Power BI Architecture

One of my Kiwi buddies who specializes in Power BI is Reza Rad. I was pleased to see that he now has a set of eBooks on Power BI, and I was especially happy to see one called Pro Power BI Architecture.

There are lots of books around that discuss how to use Power BI, but there's been a real lack of books on architecting solutions using Power BI. So if you want to learn to develop dashboards or reports, this isn't the book for you. Reza has other books for that.

I enjoyed reading the book, and I liked the degree of coverage it gave to the topics it addresses.

If you are looking for ways to integrate Power BI into your solutions architecture, this book is a pretty good starting point.

What I was really hoping for, though, was more info on administration. Mind you, the book doesn't claim to provide that. I keep getting asked about materials on administration issues. Perhaps that's another edition for Reza to consider. But architects who need a high-level overview of all the key topics should be pretty happy.

Bottom line?

I enjoyed reading this book, and it's a pretty easy read. Great for architects considering Power BI.

Greg's rating: 7 out of 10

Note: as an Amazon Associate I earn (a pittance) from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs. (But given the rate, that's not really likely anyway).

Fixing Locking and Blocking Issues in SQL Server – Part 4 – Deadlocks

This is part 4 in a series of posts:

  • Part 1 covered being sure there is a locking and blocking issue
  • Part 2 covered the impact of RCSI
  • Part 3 looked at the impacts of indexing on locking and blocking

Today, though, I want to look at the next potential issue, and that's deadlocks.

Deadlocks

I often hear people interchangeably using the words lock, block, and deadlock.

Locks are a standard feature of systems that are designed for concurrency. Nothing nasty to see here. They only become a problem in two situations:

  • They are held too long and cause a blocking problem
  • Incompatible locks are held and cause deadlocks

The simplest example of a lock is when I have an item and you want it, but you must wait until I finish with it. That's just normal with sharing something.

If I finish using it quickly, you might barely even notice. But if I hang onto it for a very long time, you experience blocking. Apart from slowing you down (which might well seem like a problem to you), it's not actually a fatal issue. When I finish, you'll still get it.

As soon as we have more than one item though, things can get messy. If I have item A and I want item B, but you are holding item B because you also want item A, then we have a problem. That's a deadlock.

It also doesn't have to just be two people. I could have item A and want item B. Terry might have item B and want item C. Nga might have item C and want item D. Paula might have item D and want item A. A larger deadlock has occurred.
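A minimal way to see this in SQL Server is with two sessions that update two tables in opposite orders (dbo.TableA and dbo.TableB are just hypothetical names):

-- Session 1
BEGIN TRANSACTION;
UPDATE dbo.TableA SET Quantity = Quantity + 1 WHERE Id = 1;  -- locks the row in TableA
-- (Session 2 now runs its first UPDATE)
UPDATE dbo.TableB SET Quantity = Quantity + 1 WHERE Id = 1;  -- waits on Session 2
COMMIT;

-- Session 2
BEGIN TRANSACTION;
UPDATE dbo.TableB SET Quantity = Quantity + 1 WHERE Id = 1;  -- locks the row in TableB
UPDATE dbo.TableA SET Quantity = Quantity + 1 WHERE Id = 1;  -- waits on Session 1: deadlock
COMMIT;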

Resolving Deadlocks – the Old Way

In the past, I worked with systems that didn't resolve this automatically. What would typically happen is that two users would suddenly have their applications freeze. Then a little while later, another user would need something that one of the first two was holding, and her application would freeze too.

This would go on and on across the building. We used to call it creeping death syndrome.

Eventually, someone would get sick of waiting and go to find an admin. The admin would work out which user had caused the first problem and kill their process. Everyone else would then come back to life.

Resolving Deadlocks – the SQL Server Way

In SQL Server, there was a conscious decision not to let this happen, and to resolve it automatically. By the word "resolve" though, you need to read "kill one of the users' processes".

SQL Server has a process that looks for these situations and cleans them up. The only question, of course, is which process to kill.

In early versions, SQL Server used to kill whichever process had just added the last link in the deadlock chain. That process would receive an error explaining that it had been chosen as a deadlock victim, and its work would be rolled back.

(Note: users don't take well to being told they're victims).

Over time though, this caused issues, as long-running processes could collide with short-running processes and get rolled back. The same thing could then happen to them again and again.

So a decision was made to look at the amount of data that had been written to the transaction log by the processes, and kill the one that had done the least work. That worked way better.
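From the application's point of view, the process that's chosen as the victim gets error 1205. A simple sketch of one way to handle that (catch the error and retry the work a limited number of times) looks like this:

DECLARE @Retries int = 3;

WHILE @Retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- ... the work that might be chosen as a deadlock victim goes here ...

        COMMIT TRANSACTION;
        BREAK;  -- the work succeeded, so no retry is needed
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        SET @Retries -= 1;

        -- Re-raise anything that isn't a deadlock (error 1205), or when the retries run out
        IF ERROR_NUMBER() <> 1205 OR @Retries = 0
        BEGIN
            THROW;
        END;
    END CATCH;
END;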

That's how SQL Server currently handles deadlocks. In the next post, I'll talk about how you should plan to work around deadlocks.

 

SDU Tools: Migrate SQL Server Object Names to PostgreSQL Names

A while back I wanted to do some testing of Azure Database for PostgreSQL and I needed a database to work with. So I thought I'd try to migrate a basic WideWorldImporters database across to PostgreSQL.

I wrote scripts to do the main migration but then decided that I wanted to change the names of the columns.  While a large number of people use PascalCased names in SQL Server, it's really, really common to use snake_cased names in PostgreSQL. Object names in PostgreSQL are also limited to 63 characters. I needed to alter these names while generating the scripts and I didn't want to do that manually.
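The kind of renaming needed looks something like this rough sketch (just an illustration of the idea, not the SDU Tools implementation):

CREATE OR ALTER FUNCTION dbo.ToSnakeCase
(
    @ObjectName sysname
)
RETURNS nvarchar(63)
AS
BEGIN
    DECLARE @Result nvarchar(260) = N'';
    DECLARE @Position int = 1;
    DECLARE @Character nchar(1);

    WHILE @Position <= LEN(@ObjectName)
    BEGIN
        SET @Character = SUBSTRING(@ObjectName, @Position, 1);

        -- Add an underscore before each upper-case letter (other than the first character)
        IF @Position > 1 AND UNICODE(@Character) BETWEEN 65 AND 90
            SET @Result += N'_';

        SET @Result += LOWER(@Character);
        SET @Position += 1;
    END;

    -- PostgreSQL identifiers are limited to 63 characters
    RETURN LEFT(@Result, 63);
END;
GO

-- For example, this returns stock_item_transactions
SELECT dbo.ToSnakeCase(N'StockItemTransactions');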

To make it easy, in our free SDU Tools for developers and DBAs, we added a function PGObjectName to help.

You can see it in action in the image above, and in the video here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

 

Opinion: Some thoughts for today on professionalism

Who moved my cheese (diagram tool)?

I was part of an interesting email chain today. It started with a guy complaining that in SQL Server Management Studio 18.x, the database diagram tool had been removed.

Now, I was disappointed to see the tool gone. Mind you, I never thought it was a great diagramming tool. I would have hoped they might have replaced it with a better one instead of just removing it.

Anyway, what caught my eye was his complaint that by removing it, Microsoft had stopped him from doing his work effectively on "complex legacy databases". In particular, he told me how he would use the diagram tool to add and delete foreign key relationships in the databases he worked on.

I have to admit to being a bit horrified by that. I can't imagine ever wanting to do that, let alone routinely.

If I want to add a relationship, I'd at the very least script it and run the script. That would give me a record of what I did. Often I'll need to apply the same change to another copy of the database anyway. But even if there was only one database, if it had to be restored to a point before the change, what would I do? Make the change again and hope I did it the same way?
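For example, rather than drawing the relationship in a diagram, I'd script something like this (the table and column names here are just placeholders):

ALTER TABLE dbo.Orders
ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);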

I mentioned that I'd really prefer this to be in a database project and in source control, and deployed from that.

Professionals vs Cowboys

And the guy mentioned that he always just made changes the simplest way and moved on, mostly because his customers were disorganized and never had things like source control or places to keep scripts. It all sounded like pure cowboy stuff, and it left me thinking about a consultant's role in this.


Perhaps age will fix it?

I especially loved it when he assumed I was young and that, once I'd been around long enough (he thought another 20 years), I'd think the same way. I've actually been in this industry 42 years so far, 27 of them with SQL Server. I might just have to disagree with him on that. I'm far from sure I'll still be doing this in 20 years' time. I suspect that if I'm still around at all, I'll be doing something more relaxing.


Simple Thought for the Day

Look, the message for today is simple: No matter how sloppy your customers are, you owe it to yourself to try to do professional quality work.

T-SQL 101: #16 Ordering the output of a T-SQL query

When you run a query against SQL Server, you might want the rows to be returned in a specific order. However, the first thing you should ask yourself is whether that's needed at all.

Sorting can be a very expensive operation if there is no suitable index that's already sorted the data for you. And the question is: should SQL Server be the one doing the sorting? In many cases, the client application that's reading the data might be a better place to do the sorting. That's not always true but it's worth considering. You often have way more clients than your single database server, and sorting data unnecessarily in the database server could limit your ability to scale your system to higher volumes.

Let's let SQL Server do it

However, let's assume that you do want SQL Server to sort the data. An important first concept is that there's no default order for data when you query SQL Server. I regularly see people who think that if a table has a primary key, or a clustering key (long story, we'll talk about that another time), the data will come back in that order.

It might, but you can't guarantee it. Don't assume that because it does in simple tests, it will continue to do so at production volumes and in production situations.

If you want rows to come back in a specific order, you need to say which order you want.

And we do that with the ORDER BY clause. Look at these queries:
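The queries are along these lines (assuming a hypothetical dbo.Products table):

SELECT ProductID, Description
FROM dbo.Products
ORDER BY Description, ProductID;

SELECT ProductID, Description
FROM dbo.Products
ORDER BY Description DESC, ProductID ASC;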

The simplest example would be to just order the rows by a single column. In the first query though, I've ordered by two columns. That means that the rows are sorted by Description first, and if the Description values are the same, they are then sorted by ProductID, within each Description.

Ordering can also be ascending (ASC) or descending (DESC). Ascending is the default and we usually don't put ASC in the queries, but there's nothing to stop you doing that if you prefer it.

When you have multiple columns in the ordering, each column can be sorted in a different order. In the second query, I've made the Description column sort descending, but then within each Description, the ProductID would be sorted ascending.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Learning Mandarin: Does Chinese have words for Yes and No?

I remember finding it strange when I was first learning about Chinese that it really doesn't have words for yes and no. It seemed obvious that any language would have them. Now I think they're unnecessary words, and what the Chinese do instead is better.

If you type yes and no into Google Translate and ask for the Chinese equivalents, you get 是 (shì) and 没有 (méi yǒu).

But the translation of yes here is more like "is" and the translation of no here is more like "not have".

And this is what's different. The general approach when you're asked a question in Chinese is to repeat back the same verb that was used when the question was asked. So if the question is:

你有什么食物吗? (Nǐ yǒu shénme shíwù ma?), which means "Do you have any food?". Instead of saying "yes", the answer is either 有 (yǒu, "have") or 没有 (méi yǒu, "not have").

And if you're asked if you like something, the answer is "like" or "not like".

This is really quite succinct and effective. Yes and no are quite superfluous words.

Learning Mandarin

I'll write more soon on the best methods for learning. If you want to get a taste for it in the meantime though, my current favorite site is iTalki, and my favorite teacher by far is Amy He. If you decide to try it, click here and it's cheaper for both you and me.

DevOps: Is AIOps just yet another almost meaningless acronym?

DevOps has quickly become a core part of how many organizations deliver IT, and in particular, how they deliver applications. But just as quickly as it has become popular, a whole series of XXXOps names have appeared. One of the latest is AIOps. So is it just yet another almost meaningless acronym?

Well, as Betteridge's Law of Headlines suggests, the answer is no.

When I first saw the term, I presumed it would be about how to deploy AI-based systems, and I wondered why on earth that would need a special name. But that's not what it is.

So what is AIOps?

AIOps is the use of artificial intelligence (AI) and machine learning (ML) techniques to allow us to analyze IT problems that occur so that we can respond to them fast enough to be useful.

The core problem is that we're now generating an enormous volume of metric and log data about every step of all our processes, and about the health of all our systems and applications, yet so much of that data is never processed, or at least not processed fast enough to be useful.

Only machines can do this.

The term AIOps seems to have been coined by Will Cappelli (a former analyst with Gartner). In the end, humans won't be scouring logs and responding to what they find. Instead, they'll be teaching the machines what to look for, and how to correlate information from a variety of sources to find what is really going on.

Cappelli is now at Moogsoft and sums up AIOps quite succinctly:

AIOps is the application of artificial intelligence for IT operations. It is the future of ITOps, combining algorithmic and human intelligence to provide full visibility into the state and performance of the IT systems that businesses rely on.

People are already doing this but it's likely in the future that this will become a well-known job role. It will be important to guide the machine's learning to teach it to recognize the appropriate patterns.

If you are working in related IT roles, it might be time to start to add some data science, AI, and/or ML into your learning plans.

Fixing Locking and Blocking Issues in SQL Server – Part 3 – Indexing

This is part 3 in a series of posts:

  • Part 1 covered being sure there is a locking and blocking issue
  • Part 2 covered the impact of RCSI

Today, though, I want to look at the next potential issue, and that's indexing.

Indexing impacts on locking

It might seem obvious that you need to fix indexing issues in your database applications, but it mightn't be so obvious that locking is one of the main things affected.

There are two ways that poor indexing impacts locking:

  • Query duration
  • Number of columns (and data volume) involved in the locks

One of the key points that I made in the previous post was that the longer queries run for, the longer they hold locks.

Before you start investigating any detailed issues around locking, you must ensure that the queries run as quickly as possible. 

So often, as soon as the queries run fast, the locking and blocking issues disappear.

But another more subtle issue surrounding indexes is that they can avoid the need to lock entire rows of data when reading. Take the following query plan running against AdventureWorks as an example:
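The query behind that plan is something like this simple query against the Person.Address table:

SELECT StateProvinceID
FROM Person.Address;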

This query just needs StateProvinceID from the Address table. There is an index on the StateProvinceID column, so SQL Server reads from that index rather than from the table. No big surprise there, but note that only a small amount of data will be read and locked.

Now, look at this query plan:
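Again, the underlying query is something like this:

SELECT City
FROM Person.Address;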

In this case, only the City column was needed but note the index that SQL Server chose to read. It is an index on AddressLine1, AddressLine2, City, StateProvinceID, and PostalCode. Notice that City was not the first component of the index. So why did SQL Server choose to use this index?

Because it was the smallest object it could read that contained the information, even though it had to read the whole index.

Whenever it can, SQL Server tries to minimize the amount of data it's reading, so once again, the index here will reduce the amount of locking that's going on. And less locking means less potential for locking and blocking issues.

Next time I'll talk about the issue that seems to get the most attention: deadlocks.

Learning about indexing

If indexing isn't your "thing" (yet), now would be a great time to take our online on-demand course on SQL Server Indexing for Developers. (It will help DBAs too).