How to make a group by statement (from MySQL) in Stata?

2023-04-01 08:54 问答作者：

I am not a statistics guy but have to deal with quite some data. In most cases these data sets come from an online survey; hence I do have a MySQL database and know how to get some results out of that.

However, now I got a Stata file and I am required to do some analysis. In MySQL I'd know how to do that, but I am stuck in Stata and ask for your help.

I have a not too small table (roughly 50k rows) containing following columns (there are more cols but these are the ones I have to work with):

id - Object ID, unique values

name - Name of object, string value

class - Class of object, integer range 1 - 6

origin - Origin of object, integer range 1 - 2

Within the 50k rows there are only about 7k different names. In Stata I can retrieve all names with list name and could even restrict it to a single class with list name if class == 2.

Now I want a list of all different names along with a count of objects having that name and have the list sorted by count. In MySQL I'd query SELECT name, COUNT(*) AS cnt FROM objects GROUP BY name ORDER BY cnt DESC. But how would that be done in Stata?

Next steps would be to get such lists for each class or for both origins, i.e. SELECT name, COUNT(*) AS cnt FROM objects WHERE class = 2 GROUP BY name ORDER BY cnt DESC, is that possible with Stata, too?

ps: I don't know if stackoverflow is the right place as Stata is not really a programming language, is it? But I 开发者_如何转开发found some Stata-related questions here; that's why I posted it here. If there's a better place to do so, please point me to the right place.

Keep in mind that Stata only works with rectangular tables of fixed length, so you can only add columns that span the whole 50k rows. Within this setup, this is what you can do.

For the first problem (the list of names and frequencies), you can

   collapse (count) freq = name, by(class)
   sort class freq name
   list class name freq, sepby(class)

Note that collapse will delete the existing data and replace with the summary. (Usually, I hate this command for this aspect of data management, but it should work here.) If you don't want this to happen, here's a more sophisticated trick:

   bysort class name : generate long freq = _N
   bysort class name : generate byte first = (_n==1)
   sort class freq name
   list class name freq if first, sepby(class)

(Explanation: _N is the number of observations in by-group, and _n is the number of the current observation within the by-group.)

You can then subset this to the class of interest with if class==#, as you already know.

Also check out the groups command downloadable using ssc inst groups.

继续阅读：group-by stata

How to make a group by statement (from MySQL) in Stata?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？