Factor analysis: choosing a rotation matrix based on the "Simple Structure Criteria"
One of the most important issues in using factor analysis is its interpretation. Factor analysis often uses factor rotation to enhance its interpretation. After a satisfactory rotation, the rotated factor loading matrix L' will have the same ability to represent the correlation matrix and it can be used as the factor loading matrix, instead of the unrotated matrix L.
The purpose of rotation is to make the rotated factor loading matrix have some desirable properties. One of the methods used is to rotate the factor loading matrix such that the rotated matrix will have a simple structure.
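As a quick check of the claim that a rotated loading matrix represents the correlation matrix just as well as the unrotated one, here is a minimal MATLAB sketch. The loadings and the rotation angle are made-up values for illustration, and the rotation is assumed to be orthogonal.

```matlab
% Made-up unrotated loadings: 4 items, 2 factors (illustration only).
L = [0.8 0.3; 0.7 0.4; 0.2 0.9; 0.3 0.6];

% An arbitrary orthogonal rotation matrix (rotation by 30 degrees).
theta = pi / 6;
T = [cos(theta) -sin(theta); sin(theta) cos(theta)];

% Rotated loadings.
L_rot = L * T;

% The common part of the correlation matrix reproduced by each solution.
R_unrotated = L * L';
R_rotated   = L_rot * L_rot';

% Largest absolute difference; it should be zero up to rounding error.
max_diff = max(max(abs(R_unrotated - R_rotated)));
disp(['Largest difference between reproduced matrices: ' num2str(max_diff)]);
```

Because T * T' is the identity matrix, both products are the same, so the choice between rotations has to rest on interpretability rather than on fit.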
L. L. Thurstone introduced the Principle of Simple Structure, as a general guide for factor rotation:
Simple Structure Criteria:
- Each row of the factor matrix should contain at least one zero
- If there are m common factors, each column of the factor matrix should have at least m zeros
- For every pair of columns in the factor matrix, there should be several variables for which entries approach zero in the one column but not in the other
- For every pair of columns in the factor matrix, a large proportion of the variables should have entries approaching zero in both columns when there are four or more factors
- For every pair of columns in the factor matrix, there should be only a small number of variables with nonzero entries in both columns
The ideal simple structure is such that:
- each item has a high, or meaningful, loading on one factor only and
- each factor has high, or meaningful, loadings for only some of the items.
The problem is that, when trying several rotation methods together with the parameters each one accepts (especially the oblique ones), the number of candidate matrices grows quickly, and it becomes very difficult to see which one best meets the above criteria.
When I first faced this problem, I realized that I could not select the best match merely by 'looking' at the matrices, and that I needed an algorithm to help me decide. Under the pressure of project deadlines, the most I could do was write the following MATLAB code, which accepts one rotated loading (pattern) matrix at a time and reports (under some assumptions) whether each criterion is met. A new version (if I ever get around to upgrading it) would accept a 3-D array (a set of 2-D matrices) as its argument and return the matrix that best fits the above criteria.
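A minimal sketch of what such a 3-D version might look like is given here. It is only one possible scoring rule, not an established method: the function name `pick_simplest_structure`, the default cutoff of 0.4, and the choice to score candidates by the share of items with exactly one meaningful loading are all assumptions made for illustration. It would be saved as pick_simplest_structure.m.

```matlab
function [best_idx, scores] = pick_simplest_structure(candidates, meaningful)
% PICK_SIMPLEST_STRUCTURE  Hypothetical helper, for illustration only.
%   candidates : items-by-factors-by-K array, one rotated loading matrix per page
%   meaningful : cutoff for a "meaningful" loading (assumed default: 0.4)
%   best_idx   : index of the page with the highest score
%   scores     : K-by-1 vector, share of items with exactly one meaningful loading
%
%   Example with made-up loadings:
%     A = [0.8 0.1; 0.7 0.2; 0.1 0.9; 0.2 0.6];
%     B = [0.6 0.5; 0.6 0.5; 0.5 0.6; 0.5 0.6];
%     [best_idx, scores] = pick_simplest_structure(cat(3, A, B));

if nargin < 2
    meaningful = 0.4;
end

ct = abs(candidates);
items = size(ct, 1);

% Number of meaningful loadings in each row of each page (items-by-1-by-K).
salient_per_item = sum(ct > meaningful, 2);

% Share of items that load meaningfully on exactly one factor, per page.
scores = sum(salient_per_item == 1, 1) / items;
scores = scores(:);

% The first page with the highest score wins; ties go to the lowest index.
[~, best_idx] = max(scores);
end
```

This only encodes the "each item loads on one factor" part of the ideal simple structure. The other criteria could be folded into the score in the same way, but how to weight them against each other is a judgment call.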
I am just asking for your opinions (I am also aware that the usefulness of the method itself has been criticized) and perhaps for better approaches to the rotation matrix selection problem. If someone wants to provide some code, I would prefer R or MATLAB.
P.S. The above formulation of the Simple Structure Criteria can be found in the book "Making Sense of Factor Analysis" by M. Pett, N. Lackey, and J. Sullivan.
P.S.2 (from the same book): "A test of successful factor analysis is the extent to which it can reproduce the original corr matrix. If you also used oblique solutions, among all select the one that generated the greatest number of highest and lowest factor loadings." This sounds like another constraint that the algorithm could use.
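The two ideas in that quote could be sketched roughly as follows. This is an illustration under stated assumptions, not the book's procedure: the function name `extremes_and_fit` and the cutoffs `lo` and `hi` are hypothetical, and the reproduced correlations are taken to be P * Phi * P' (with Phi the identity for orthogonal rotations). It would be saved as extremes_and_fit.m.

```matlab
function [n_extreme, rms_resid] = extremes_and_fit(R, P, Phi, lo, hi)
% EXTREMES_AND_FIT  Hypothetical helper, for illustration only.
%   R   : observed correlation matrix (items-by-items)
%   P   : pattern (loading) matrix (items-by-factors)
%   Phi : factor correlation matrix (assumed identity if omitted or empty)
%   lo  : loadings below this count as "lowest"  (assumed default: 0.1)
%   hi  : loadings above this count as "highest" (assumed default: 0.4)

if nargin < 3 || isempty(Phi)
    Phi = eye(size(P, 2));
end
if nargin < 4
    lo = 0.1;
end
if nargin < 5
    hi = 0.4;
end

% Count loadings that are clearly low or clearly high in absolute value.
ct = abs(P);
n_extreme = sum(ct(:) < lo | ct(:) > hi);

% Correlations reproduced by the common factors.
R_hat = P * Phi * P';

% Root mean square of the off-diagonal residuals.
off_diag = ~eye(size(R, 1));
resid = R(off_diag) - R_hat(off_diag);
rms_resid = sqrt(mean(resid .^ 2));
end
```

Note that rotating a given solution leaves P * Phi * P' unchanged, so `rms_resid` would only separate solutions that differ in the number of factors or the extraction method. Among rotations of the same solution, only `n_extreme` can differ.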
function [] = simple_structure_criteria (my_pattern_table)
%Simple Structure Criteria
%Making Sense of Factor Analysis, page 132
disp(' ');
disp('Simple Structure Criteria (Thurstone):');
disp('1. Each row of the factor matrix should contain at least one zero');
disp( '2. If there are m common factors, each column of the factor matrix should have at least m zeros');
disp( '3. For every pair of columns in the factor matrix, there should be several variables for which entries approach zero in the one column but not in the other');
disp( '4. For every pair of columns in the factor matrix, a large proportion of the variables should have entries approaching zero in both columns when there are four or more factors');
disp( '5. For every pair of columns in the factor matrix, there should be only a small number of variables with nonzero entries in both columns');
disp(' ');
disp( '(additional by Pedhazur and Schmelkin) The ideal simple structure is such that:');
disp( '6. Each item has a high, or meaningful, loading on one factor only and');
disp( '7. Each factor has high, or meaningful, loadings for only some of the items.');
disp(' ')
disp('Start checking...')
%test matrix
%ct=[76,78,16,7;19,29,10,13;2,6,7,8];
%test it by giving: simple_structure_criteria (ct)
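ct=abs(my_pattern_table);        % work with absolute loadings
items=size(ct,1);                % rows = items (variables)
factors=size(ct,2);              % columns = factors
% The thresholds below are working assumptions; adjust them as needed.
my_zero = 0.1;                   % loadings below this count as zero
approach_zero = 0.2;             % loadings below this "approach zero"
several = floor(items / 3);      % minimum count for "several variables"
small_number = ceil(items / 4);  % maximum count for "a small number"
large_proportion = 0.30;         % minimum share for "a large proportion"
meaningful = 0.4;                % loadings above this are "meaningful"
some_bottom = 2;                 % lower bound for "some of the items"
some_top = floor(items / 2);     % upper bound for "some of the items"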
% CRITERION 1
disp(' ');
disp('CRITERION 1');
for i = 1 : 1 : items
    count = 0;
    for j = 1 : 1 : factors
        if (ct(i,j) < my_zero)
            count = count + 1;
            break
        end
    end
    if (count == 0)
        disp(['Criterion 1 is NOT MET for item ' num2str(i)])
    end
end
% CRITERION 2
disp(' ');
disp('CRITERION 2');
for j = 1 : 1 : factors
    m=0;
    for i = 1 : 1 : items
        if (ct(i,j) < my_zero)
            m = m + 1;
        end
    end
    if (m < factors)
        disp(['Criterion 2 is NOT MET for factor ' num2str(j) '. m = ' num2str(m)]);
    end
end
% CRITERION 3
disp(' ');
disp('CRITERION 3');
for c1 = 1 : 1 : factors - 1
    for c2 = c1 + 1 : 1 : factors
        test_several = 0;
        for i = 1 : 1 : items
            if ( (ct(i,c1)>my_zero && ct(i,c2)<my_zero) || (ct(i,c1)<my_zero && ct(i,c2)>my_zero) ) % approach zero in one but not in the other
                test_several = test_several + 1;
            end
        end
        disp(['several = ' num2str(test_several) ' for factors ' num2str(c1) ' and ' num2str(c2)]);
        if (test_several < several)
            disp(['Criterion 3 is NOT MET for factors ' num2str(c1) ' and ' num2str(c2)]);
        end
    end
end
% CRITERION 4
disp(' ');
disp('CRITERION 4');
if (factors > 3)
    for c1 = 1 : 1 : factors - 1
        for c2 = c1 + 1 : 1 : factors
            test_several = 0;
            for i = 1 : 1 : items
                if (ct(i,c1)<approach_zero && ct(i,c2)<approach_zero) % approach zero in both
                    test_several = test_several + 1;
                end
            end
            disp(['large proportion = ' num2str((test_several / items)*100) '% for factors ' num2str(c1) ' and ' num2str(c2)]);
            if ((test_several / items) < large_proportion)
                pr = sprintf('%4.2g', (test_several / items) * 100 );
                disp(['Criterion 4 is NOT MET for factors ' num2str(c1) ' and ' num2str(c2) '. Proportion is ' pr '%']);
            end
        end
    end
end
% CRITERION 5
disp(' ');
disp('CRITERION 5');
for c1 = 1 : 1 : factors - 1
    for c2 = c1 + 1 : 1 : factors
        test_number = 0;
        for i = 1 : 1 : items
            if (ct(i,c1)>approach_zero && ct(i,c2)>approach_zero) % nonzero in both
                test_number = test_number + 1;
            end
        end
        disp(['small number = ' num2str(test_number) ' for factors ' num2str(c1) ' and ' num2str(c2)]);
        if (test_number > small_number)
            disp(['Criterion 5 is NOT MET for factors ' num2str(c1) ' and ' num2str(c2)]);
        end
    end
end
% CRITERION 6
disp(' ');
disp('CRITERION 6');
for i = 1 : 1 : items
    count = 0;
    for j = 1 : 1 : factors
        if (ct(i,j) > meaningful)
            count = count + 1;
        end
    end
    if (count == 0 || count > 1)
        disp(['Criterion 6 is NOT MET for item ' num2str(i)])
    end
end
% CRITERION 7
disp(' ');
disp('CRITERION 7');
for j = 1 : 1 : factors
    m=0;
    for i = 1 : 1 : items
        if (ct(i,j) > meaningful)
            m = m + 1;
        end
    end
    disp(['some items = ' num2str(m) ' for factor ' num2str(j)]);
    if (m < some_bottom || m > some_top)
        disp(['Criterion 7 is NOT MET for factor ' num2str(j)]);
    end
end
disp(' ')
disp('Checking completed.')
return
I know this is not what you're asking, but you might find this useful in other cases too:
In MATLAB, it is usually better to avoid loops when a vectorized operation can do the job. For example, your code
% CRITERION 6
disp(' ');
disp('CRITERION 6');
for i = 1 : 1 : items
    count = 0;
    for j = 1 : 1 : factors
        if (ct(i,j) > meaningful)
            count = count + 1;
        end
    end
    if (count == 0 || count > 1)
        disp(['Criterion 6 is NOT MET for item ' num2str(i)])
    end
end
could be written as
% CRITERION 6
disp(' ');
disp('CRITERION 6');
% For each row, count how many loadings exceed "meaningful" (summing along the 2nd dimension gives a column vector).
ct_lg_meaningful = sum(ct > meaningful, 2);
% Find the rows where that count is 0 or greater than 1.
criteria_not_met = find((ct_lg_meaningful == 0) | (ct_lg_meaningful > 1));
if ~isempty(criteria_not_met) % if any such rows were found, display them
    disp(['Criterion 6 is NOT MET for items ' num2str(criteria_not_met')])
end
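The same idea should carry over to the other criteria. Here is a rough sketch for criteria 1, 2 and 5. The small `ct` matrix is made up so that the snippet runs on its own, and the variables `c1_fail`, `c2_fail`, `both`, `pair_a` and `pair_b` are names introduced here, not part of your code.

```matlab
% Made-up absolute loadings so the snippet runs on its own: 4 items, 2 factors.
ct = abs([0.8 0.05; 0.7 0.3; 0.05 0.9; 0.3 0.6]);
items = size(ct, 1);
factors = size(ct, 2);
my_zero = 0.1;
approach_zero = 0.2;
small_number = ceil(items / 4);

% CRITERION 1: items whose row contains no (near-)zero loading.
c1_fail = find(~any(ct < my_zero, 2));

% CRITERION 2: factors whose column has fewer zeros than there are factors.
zeros_per_factor = sum(ct < my_zero, 1);
c2_fail = find(zeros_per_factor < factors);

% CRITERION 5: both(a,b) counts the items with nonzero loadings on factors a and b.
nz = double(ct > approach_zero);
both = nz' * nz;
% Keep each pair once (upper triangle, diagonal excluded) and flag the large counts.
[pair_a, pair_b] = find(triu(both, 1) > small_number);

disp(['Criterion 1 is NOT MET for items ' num2str(c1_fail')]);
disp(['Criterion 2 is NOT MET for factors ' num2str(c2_fail)]);
for k = 1 : numel(pair_a)
    disp(['Criterion 5 is NOT MET for factors ' num2str(pair_a(k)) ' and ' num2str(pair_b(k))]);
end
```

The matrix product in criterion 5 replaces the two nested loops over factor pairs: entry (a,b) of `nz' * nz` is the number of rows in which both columns exceed the cutoff.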