What is the best programming language for operationalizing research questions with large data sets? [closed]
Closed 1 year ago.
I have completed my graduate public policy program, but it was not at all tech heavy: some economics and econometrics, but nothing requiring any CS knowledge. A good portion of the research jobs in DC require a basic level of programming knowledge. Mostly they want people who can perform advanced search and retrieval functions with large datasets and save the results in different formats on their servers. They also want Stata/statistics knowledge, which I have some of.
My question is this: where is the best place to start learning some programming to get to this level? For instance, is Java, SQL, VBA or something else the best thing and most useful for these purposes? And, how much math do I need to write and run simple requests?
Thanks
My name is Alvaro. I worked as a senior bioinformatician on huge gene databases. Studied Bioinformatics at Harvard.
The scripting language you need for that is Perl.
Then you need a full understanding of SQL. You can find all of that on the web.
Once you are more advanced, you can also use the R programming language for statistics. Check the web for the R Project. And also MATLAB.
But not all at once!
Forget about Java or VBA for those purposes.
Good luck!
For statistics and database querying/manipulation I would start with SQL.
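To give a feel for what "search and retrieval" plus "saving in different formats" can look like, here is a minimal sketch. It uses Python and its built-in sqlite3 module purely for illustration; the table name, the columns and the sample rows are all made up, and a real job would point at whatever database server the organisation runs.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data: (state, year, grant amount in dollars).
ROWS = [
    ("VA", 2008, 1200.0),
    ("VA", 2009, 800.0),
    ("MD", 2008, 500.0),
    ("DC", 2009, 2500.0),
]


def total_by_state(min_total):
    """Return (state, total) pairs whose total is at least min_total."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE grants (state TEXT, year INTEGER, amount REAL)")
    conn.executemany("INSERT INTO grants VALUES (?, ?, ?)", ROWS)
    result = conn.execute(
        """
        SELECT state, SUM(amount) AS total
        FROM grants
        GROUP BY state
        HAVING SUM(amount) >= ?
        ORDER BY total DESC
        """,
        (min_total,),
    ).fetchall()
    conn.close()
    return result


def to_csv(rows):
    """Render the query result as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["state", "total"])
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Render the query result as a JSON array of objects."""
    return json.dumps([{"state": s, "total": t} for s, t in rows])


if __name__ == "__main__":
    rows = total_by_state(1000)
    print(to_csv(rows))
    print(to_json(rows))
```

The SQL in the middle is the part that carries over to any database; the Python around it only feeds in the data and writes the answer out.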
Keep in mind I have no knowledge of this field, as I'm a web developer, but I would think something like Haskell, F#, R, or Python would be your best bet.
And yes, SQL. I would learn SQL-92 inside and out, and once you have that lowest common denominator, move on to the extensions for MS SQL Server (I assume that, working for the government, you'll be primarily in a Windows environment).
Maybe you should extend your Stata knowledge and try accessing big data in Stata via its SQL/ODBC interface.
VBA is no longer actively developed and definitely not a good option.
Well, if you'll be working with databases you'll almost certainly need to know some SQL. But SQL on its own is really just a way of communicating with the database - it isn't an actual programming language. Quite often SQL is paired up with a programming language, such as Java or PHP. Personally, I'm not a fan of Java, but it is used and taught quite widely in universities, so it would probably be a good choice.
I'm a Java programmer who creates a lot of reports. I would recommend starting with both a programming language (naturally I would recommend Java) and SQL at the same time, because creating tables independent of any real use isn't very interesting.
I work primarily with DB2 but to start I would recommend a free database such as MySQL. Once installed you can set up tables and learn about referential integrity, simple queries, joins, and all kinds of good stuff.
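Here is a rough sketch of those three ideas (tables, referential integrity, a join). It swaps in SQLite through Python's built-in sqlite3 module instead of MySQL, only because that needs no installation; the agency and report tables are invented for the example.

```python
import sqlite3


def build_db():
    """Create two related tables in an in-memory database and fill them."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE report ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "agency_id INTEGER NOT NULL REFERENCES agency(id))"
    )
    conn.executemany(
        "INSERT INTO agency VALUES (?, ?)", [(1, "Treasury"), (2, "Labor")]
    )
    conn.executemany(
        "INSERT INTO report VALUES (?, ?, ?)",
        [(1, "Budget outlook", 1), (2, "Jobs survey", 2), (3, "Debt review", 1)],
    )
    return conn


def reports_with_agency(conn):
    """Join each report to the agency that produced it."""
    return conn.execute(
        "SELECT report.title, agency.name FROM report "
        "JOIN agency ON agency.id = report.agency_id ORDER BY report.id"
    ).fetchall()


def insert_orphan(conn):
    """Try to add a report for an agency that does not exist.

    Returns True if the database rejects it, which is referential
    integrity doing its job.
    """
    try:
        conn.execute("INSERT INTO report VALUES (99, 'Orphan', 42)")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = build_db()
    print(reports_with_agency(db))
    print("orphan rejected:", insert_orphan(db))
```

The same CREATE TABLE and JOIN statements should work in MySQL with little or no change, although the pragma line is specific to SQLite.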
Then you can create some simple programs that display data from the DB and write data to it. There are many simple examples of this in Java which will be easy to follow if you understand the basics of your database. The needs of the application will drive you to create more complicated DB designs.
After this, the current direction in Java is moving toward something called ORM (object-relational mapping). It sounds scary, but it isn't: what it works out to, more or less, is that you can forget about SQL, because all the tables are automatically transformed into objects. Objects are the basic building blocks when working in Java. Long story short, you can use a database knowing only database theory and the Java language, without specific knowledge of SQL (which is annoyingly different between databases). With all that said, it is still much easier to learn if you know SQL.
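To make the ORM idea less abstract, here is a hand-rolled toy in Python. It is not a real ORM library and not how the Java tools are implemented; it only shows the core trick of turning table rows into objects and back. The Person class and the person table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Person:
    """One object per row; the field names double as column names."""
    id: int
    name: str
    age: int


def create_table(conn):
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )


def save(conn, obj):
    """Build the INSERT statement from the object's fields."""
    cols = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO person ({', '.join(cols)}) VALUES ({placeholders})"
    conn.execute(sql, [getattr(obj, c) for c in cols])


def load_all(conn):
    """Turn every row of the table back into a Person object."""
    cols = ", ".join(f.name for f in fields(Person))
    cursor = conn.execute(f"SELECT {cols} FROM person ORDER BY id")
    return [Person(*row) for row in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    save(conn, Person(1, "Ana", 34))
    save(conn, Person(2, "Ben", 51))
    for person in load_all(conn):
        print(person.name, person.age)
```

The calling code at the bottom never writes SQL itself, which is the appeal. The SQL is still there underneath, which is why knowing it helps when something goes wrong.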
The advice to work with a scripting language such as Perl is good too. PHP also would be a good choice if you're a little interested in producing programs for the web.
You might want to pursue a certification of some sort. There are many for all major databases and many languages. The subjects are large and even if you only look into certification requirements you would have guidance on what to study.
I don't think you need much math for most applications. I only use average() and some date arithmetic. One of my passions is graphics programming, so I certainly don't mind math, but most databases are full of business data, so I wouldn't worry too much.
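As a rough idea of how little math that usually is, the sketch below computes an average and a difference in days. It assumes SQLite via Python's sqlite3 module, where the averaging function is spelled AVG and julianday converts a date to a day count; the invoice table and its rows are invented, and other databases have their own date functions.

```python
import sqlite3
from datetime import date

# Hypothetical invoices: (date issued, date paid, amount).
INVOICES = [
    ("2009-01-01", "2009-01-31", 100.0),
    ("2009-02-01", "2009-02-11", 300.0),
]


def load_invoices():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoice (issued TEXT, paid TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", INVOICES)
    return conn


def average_amount(conn):
    """Average invoice amount, computed by the database."""
    return conn.execute("SELECT AVG(amount) FROM invoice").fetchone()[0]


def days_to_pay(conn):
    """Whole days between issue and payment for each invoice."""
    rows = conn.execute(
        "SELECT CAST(julianday(paid) - julianday(issued) AS INTEGER) "
        "FROM invoice ORDER BY issued"
    ).fetchall()
    return [r[0] for r in rows]


def days_between(start, end):
    """The same date arithmetic done in plain Python."""
    return (end - start).days


if __name__ == "__main__":
    db = load_invoices()
    print(average_amount(db))
    print(days_to_pay(db))
    print(days_between(date(2009, 1, 1), date(2009, 1, 31)))
```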
How much time do you have?
From what I've read, my first (and only) suggestion is to take the nearest programmer you know to the closest pub to get a basic idea of what programming and databases are about :-) And then come back to stackoverflow.com
That's what I did in 1997, in almost the same circumstances, when I was a financial consultant at one of the former Big Five.
I would avoid any standard programming language and head for the statistical analysis platforms. I'm not an expert, but S-Plus comes to mind, as does SPSS. You may want to click on the link that says, "math" below, because programmers will guide you towards programming languages. Not sure what stats people do, but I doubt they all know SQL and Perl, for instance.