开发者

Segmenting a set of data with discrete and continuos data values into one of two groups without using analysis services?

Say I have a table with the following scheme (note: this example is hypothetical, though the real use case is similar).

Type      | Name         | Notes
=====================================================================================
Gender    | Gender       | Either Male or Female (not null)
GeoCoord  | Location     | Lattitude and longitude coordinates
string    | FullName     | 
Date      | BirthDate    | 
bool?     | LikesToParty | Data from a survey (null for people who didn't answer)

Manually looking at the data I know there is a strong correlation between LikesToParty and certain specific configurations of the other values. For example, men who have Wells as their middle name and who are between 15 and 30 years old and who comes from the LA area almost certainly has true in LikeToPa开发者_如何学运维rty. I would like to predict the value of LikesToParty for users that didn't answer the survey.

How do I mine this data using C# without having to buy an expensive package like analysis services? Are there any free libraries for c#?

I've already made a neural network that is capable of most of what I describe in my example above, but it is extremely slow to train and I'm not sure about if this is the right way to go. Maybe there is a better, more efficient, way to segment the data?


Because you are using both discrete and continous data, you might use a decision tree (C4.5, CART). There are some implemented libraries for them; don't beware of Java libs, as you can use the IKVM implementation of Java. For example, I have used the Weka API from C#.


What you describe is a standard problem in machine learning called: data classification.

Methods for data classifiation include: Neural Networks (as you mention), Support Vector Machines (see for example LIBSVM), Decision Trees (as mentioned in the previous answer). The output from these types of methods while very accurate can be difficult to interpret. You can also look probabilistic graphical models like Bayesian Networks, to answer deeper questions like: what is the probability that a male from southern California who likes to party is in his mid twenties.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜