.NET Technical bits: Joins in LINQ Query Expressions

Wednesday, April 7, 2010

Joins in LINQ Query Expressions

As we discussed in the prevoius post a join clause takes two source sequences as input. The elements in each sequence must either be or contain a property that can be compared to a corresponding property in the other sequence. The join clause compares the specified keys for equality by using the special equals keyword. All joins performed by the join clause are equijoins. The shape of the output of a join clause depends on the specific type of join you are performing. The following are three most common join types:

1. Inner join

The following example shows a simple inner equijoin. This query produces a flat sequence of “product name / category” pairs. The same category string will appear in multiple elements. If an element from categories has no matching products, that category will not appear in the results.

var innerJoinQuery =
from category in categories
join prod in products on category.ID equals prod.CategoryID
select new { ProductName = prod.Name, Category = category.Name }; //produces flat sequence


2. Group join

A join clause with an into expression is called a group join.

var innerGroupJoinQuery =
from category in categories
join prod in products on category.ID equals prod.CategoryID into prodGroup
select new { CategoryName = category.Name, Products = prodGroup };


A group join produces a hierarchical result sequence, which associates elements in the left source sequence with one or more matching elements in the right side source sequence. A group join has no equivalent in relational terms; it is essentially a sequence of object arrays.

If no elements from the right source sequence are found to match an element in the left source, the join clause will produce an empty array for that item. Therefore, the group join is still basically an inner-equijoin except that the result sequence is organized into groups.

If you just select the results of a group join, you can access the items, but you cannot identify the key that they match on. Therefore, it is generally more useful to select the results of the group join into a new type that also has the key name, as shown in the previous example.

You can also, of course, use the result of a group join as the generator of another subquery:

var innerGroupJoinQuery2 =
from category in categories
join prod in products on category.ID equals prod.CategoryID into prodGroup
from prod2 in prodGroup
where prod2.UnitPrice > 2.50M
select prod2;


3. Left outer join

In a left outer join, all the elements in the left source sequence are returned, even if no matching elements are in the right sequence. To perform a left outer join in LINQ, use the DefaultIfEmpty method in combination with a group join to specify a default right-side element to produce if a left-side element has no matches. You can use null as the default value for any reference type, or you can specify a user-defined default type. In the following example, a user-defined default type is shown:

var leftOuterJoinQuery =
from category in categories
join prod in products on category.ID equals prod.CategoryID into prodGroup
from item in prodGroup.DefaultIfEmpty(new Product{Name = String.Empty, CategoryID = 0})
select new { CatName = category.Name, ProdName = item.Name };


from: A query expression must begin with a from clause. Additionally, a query expression can contain sub-queries, which also begin with a from clause. The from clause specifies the following:

o The data source on which the query or sub-query will be run.
o A local range variable that represents each element in the source sequence.
Both the range variable and the data source are strongly typed. The data source referenced in the from clause must have a type of IEnumerable, IEnumerable<(Of <(T>)>), or a derived type such as IQueryable<(Of <(T>)>).

let: In a query expression, it is sometimes useful to store the result of a sub-expression in order to use it in subsequent clauses. You can do this with the let keyword, which creates a new range variable and initializes it with the result of the expression you supply. Once initialized with a value, the range variable cannot be used to store another value. However, if the range variable holds a queryable type, it can be queried.

In the following example let is used in two ways:
o To create an enumerable type that can itself be queried.
o To enable the query to call ToLower only one time on the range variable word. Without using let, you would have to call ToLower in each predicate in the where clause.

class LetSample1
{
static void Main()
{
string[] strings =
{
"A penny saved is a penny earned.",
"The early bird catches the worm.",
"The pen is mightier than the sword."
};

// Split the sentence into an array of words
// and select those whose first letter is a vowel.
var earlyBirdQuery =
from sentence in strings
let words = sentence.Split(' ')
from word in words
let w = word.ToLower()
where w[0] == 'a' || w[0] == 'e'
|| w[0] == 'i' || w[0] == 'o'
|| w[0] == 'u'
select word;

// Execute the query.
foreach (var v in earlyBirdQuery)
{
Console.WriteLine("\"{0}\" starts with a vowel", v);
}

// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
}
/* Output:
"A" starts with a vowel
"is" starts with a vowel
"a" starts with a vowel
"earned." starts with a vowel
"early" starts with a vowel
"is" starts with a vowel
*/

No comments:

Post a Comment