Query: Avoid adding unnecessary sort columns in query with projection and optional navigation properties #6861
Description
Steps to reproduce
Run a query that does a projection such as the following.
var q = from b in lc.Bibs
select new
{
b.Id,
b.OclcNumber,
b.Title,
b.Author,
b.PublicationDate,
Language = b.Language.Name,
b.LanguageId,
Format = b.Format.Name,
b.FormatId,
BibStatus = b.BibStatus.Name,
b.BibStatusId,
b.BibStatusLastWriteTime,
b.BibStatusLastWriteUserName,
b.IsPublic,
b.Level,
b.CreationTime,
b.CreationUserName,
b.LastWriteTime,
b.LastWriteUserName
};
q = q.OrderByDescending(b => b.LastWriteTime);
q = q.Skip(200).Take(100);
var l = q.ToList();
The following zip file contains the definitions for the POCO classes that are used by the above query.
The issue
EF Core 1.1.0-preview1-final generates the following SQL. There are a few problems with the generated SQL. The main one is that you can see that it is sorting by FormatId and LanguageId columns in addition to the LastWriteTime column which is the only one that was specified in the LINQ query. The Bibs table used below is a table containing 7.7 million rows. The query takes 12 seconds to execute. If I run the SQL using SQL Management Studio, but, comment out the extra order by columns, the query executes in 0 seconds. It looks like what it's doing is adding sort columns for the foreign key columns that are not required. The project above is also bringing in the BibStatus navigation property. That one is required and there isn't a sort for it.
A second problem is that the query is doing a projection, but, it looks like EF Core is bringing in all columns from the tables that are used. In this particular query, there is a column of type VARCHAR(MAX) that contains a lot of data. So, I want to avoid bringing that in.
Another thing that I noticed is that it appears that it's using an INNER JOIN on BibStatus, which makes sense because it's a required field. It's doing LEFT JOINs on the the other fields which are not required, which also makes sense. However, EF 6 used a LEFT JOIN for all of the JOINs. I want to say LEFT JOIN is faster than INNER JOIN. For this particular query, it doesn't appear to make a difference. The main problem that seems to be slowing everything down is the extra ORDER BY columns on the foreign key fields.
exec sp_executesql N'SELECT [b].[Id], [b].[Author], [b].[BibStatusId], [b].[BibStatusLastWriteTime], [b].[BibStatusLastWriteUserName], [b].[Content], [b].[CreationTime], [b].[CreationUserName], [b].[FormatId], [b].[IsPublic], [b].[LanguageId], [b].[LastWriteTime], [b].[LastWriteUserName], [b].[Level], [b].[OclcNumber], [b].[PublicationDate], [b].[Title], [b].[Type], [b.Format].[Id], [b.Format].[CreationTime], [b.Format].[CreationUserName], [b.Format].[LastWriteTime], [b.Format].[LastWriteUserName], [b.Format].[Name], [b.Language].[Id], [b.Language].[Code], [b.Language].[CreationTime], [b.Language].[CreationUserName], [b.Language].[LastWriteTime], [b.Language].[LastWriteUserName], [b.Language].[Name], [b.BibStatus].[Name]
FROM [Bibs] AS [b]
INNER JOIN [BibStatuses] AS [b.BibStatus] ON [b].[BibStatusId] = [b.BibStatus].[Id]
LEFT JOIN [Formats] AS [b.Format] ON [b].[FormatId] = [b.Format].[Id]
LEFT JOIN [Languages] AS [b.Language] ON [b].[LanguageId] = [b.Language].[Id]
ORDER BY [b].[LastWriteTime] DESC, [b].[FormatId], [b].[LanguageId]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY',N'@__p_0 int,@__p_1 int',@__p_0=200,@__p_1=100
The following is what the SQL looks like for EF 6.1.3.
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[OclcNumber] AS [OclcNumber],
[Extent1].[Title] AS [Title],
[Extent1].[Author] AS [Author],
[Extent1].[PublicationDate] AS [PublicationDate],
[Extent2].[Name] AS [Name],
[Extent1].[LanguageId] AS [LanguageId],
[Extent3].[Name] AS [Name1],
[Extent1].[FormatId] AS [FormatId],
[Extent4].[Name] AS [Name2],
[Extent1].[BibStatusId] AS [BibStatusId],
[Extent1].[BibStatusLastWriteTime] AS [BibStatusLastWriteTime],
[Extent1].[BibStatusLastWriteUserName] AS [BibStatusLastWriteUserName],
[Extent1].[IsPublic] AS [IsPublic],
[Extent1].[Type] AS [Type],
[Extent1].[Level] AS [Level],
[Extent1].[CreationTime] AS [CreationTime],
[Extent1].[CreationUserName] AS [CreationUserName],
[Extent1].[LastWriteTime] AS [LastWriteTime],
[Extent1].[LastWriteUserName] AS [LastWriteUserName]
FROM [dbo].[Bibs] AS [Extent1]
LEFT OUTER JOIN [dbo].[Languages] AS [Extent2] ON [Extent1].[LanguageId] = [Extent2].[Id]
LEFT OUTER JOIN [dbo].[Formats] AS [Extent3] ON [Extent1].[FormatId] = [Extent3].[Id]
LEFT OUTER JOIN [dbo].[BibStatuses] AS [Extent4] ON [Extent1].[BibStatusId] = [Extent4].[Id]
ORDER BY [Extent1].[LastWriteTime] DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY
The query appears to work correct, just much slower than it should due to the extra sort columns.
Further technical details
EF Core version: 1.1.0-preview1-final
Operating system: Windows 10 Anniversary Update
Visual Studio version: 2015