Best way to utilize Parallel / PLINQ in finding keywords in all Excel Worksheet Cells
As the title says, I have a List<string> keywords and a Workbook object model that is similar to Excel's.
I would like to get every WorkbookCell that matches one of the keywords in the list.
I was thinking that parallelizing the search might be a good idea:
//Loop through all the Worksheets in parallel
Parallel.ForEach(Workbook.Worksheets, (ws, st) =>
{
    if (!st.ShouldExitCurrentIteration)
    {
        //Loop through all the rows in parallel
        Parallel.ForEach(ws.Rows, (wr, tk) =>
        {
            if (!tk.ShouldExitCurrentIteration)
            {
                //Loop through all the columns in parallel
                Parallel.ForEach(wr.Cells, (cell, ctk) =>
                {
                    if (cell.Value != null)
                    {
                        var cellValue = cell.Value.ToString();
                        //Block keyword found, add the occurrence
                        var matchedKeyword = IsKeywordMatched(cellValue);
                        if (matchedKeyword != null)
                        {
                            matchedKeyword.AddMatchedCell(cell);
                        }
                    }
                });
            }
        });
    }
});
Would this in fact be too much parallelism? Please let me know if you have better ideas.
** In the normal case I have fewer than 20 worksheets, but every worksheet contains more than 10,000 rows and hundreds of columns.
By default, the number of parallel threads roughly equals the number of cores. Every parallel loop carries the overhead of splitting (partitioning) the data into n portions and merging the results again. I would say it makes sense to keep only the first (outer) parallel loop if the number of worksheets is usually greater than the number of cores; otherwise, parallelize at the second level (rows) instead. Nested parallel loops will only decrease performance. So yes, you are right: it's too much parallelism.
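With fewer than 20 worksheets but more than 10,000 rows each, that advice could translate into one parallel loop over the rows and an ordinary foreach over the cells. The following is only a minimal sketch under that assumption: SheetCell, SingleLevelSearch and the keyword-matching rule (case-insensitive substring) are hypothetical stand-ins, not the asker's actual object model.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the asker's WorkbookCell.
public sealed class SheetCell
{
    public SheetCell(int row, int column, object value)
    {
        Row = row;
        Column = column;
        Value = value;
    }

    public int Row { get; }
    public int Column { get; }
    public object Value { get; }
}

public static class SingleLevelSearch
{
    // Assumed matching rule: first keyword contained in the text, or null.
    public static string KeywordMatched(string text, IReadOnlyList<string> keywords)
    {
        return keywords.FirstOrDefault(
            k => text.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Parallelizes over rows only; the cells of a row are scanned sequentially.
    public static ConcurrentBag<KeyValuePair<string, SheetCell>> Search(
        IEnumerable<IReadOnlyList<SheetCell>> rows,
        IReadOnlyList<string> keywords)
    {
        // Thread-safe collection, because several rows add matches concurrently.
        var matches = new ConcurrentBag<KeyValuePair<string, SheetCell>>();

        Parallel.ForEach(rows, row =>
        {
            foreach (var cell in row)
            {
                if (cell.Value == null) continue;

                var keyword = KeywordMatched(cell.Value.ToString(), keywords);
                if (keyword != null)
                {
                    matches.Add(new KeyValuePair<string, SheetCell>(keyword, cell));
                }
            }
        });

        return matches;
    }
}
```

Collecting matches in a ConcurrentBag avoids relying on AddMatchedCell being thread-safe, which the question does not state. The order of the collected matches is not deterministic.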
This looks like a good candidate for parallelizing to me...
worksheet.Cells.AsParallel().Select(x => new { Cell = x, Keyword = KeywordMatched(x.Value.ToString()) }).Where(...)...
This should give you an almost linear performance improvement relative to the number of cores available.
HINT: Change your IsKeywordMatched function to KeywordMatched, which returns the matched string, or null if nothing matches. Then filter the resulting query (.Where(...)) down to the records where that string is not null.
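Spelled out, the PLINQ idea might look like the sketch below. It is an illustration under assumptions rather than the answerer's exact code: the cells are modelled as hypothetical (Address, Value) tuples standing in for worksheet.Cells, PlinqSearch is an invented name, and the matching rule is assumed to be a case-insensitive substring test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqSearch
{
    // Assumed matching rule: first keyword contained in the text, or null.
    public static string KeywordMatched(string text, IReadOnlyList<string> keywords)
    {
        return keywords.FirstOrDefault(
            k => text.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // cells: hypothetical flattened (Address, Value) pairs for one worksheet.
    public static List<(string Address, string Keyword)> Search(
        IEnumerable<(string Address, object Value)> cells,
        IReadOnlyList<string> keywords)
    {
        return cells
            .AsParallel()
            // Skip empty cells before calling ToString().
            .Where(c => c.Value != null)
            // Pair every cell address with the keyword it matched (or null).
            .Select(c => (Address: c.Address,
                          Keyword: KeywordMatched(c.Value.ToString(), keywords)))
            // Keep only the cells where a keyword was actually found.
            .Where(m => m.Keyword != null)
            .ToList();
    }
}
```

Without AsOrdered(), PLINQ does not guarantee that results come back in source order; add it after AsParallel() if the order of the matched cells matters.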