Big Data/Data Science General?

Let's see if we can get something started. Anybody interested in diving into this area? Let's help each other out. I'll be posting the books I'm reading right now. All of them can be found on Library Genesis btw.

Anybody working in this field? I wanna hear your stories: how did you get in, are you self-taught or a degree-holder, what tech should we learn, etc.

Once and for all

Is big data a meme?

Is it just some statistics + operating on data with a programming language?

I have a PhD in Cognitive Psychology and tons of experience analyzing experimental data, but now I'm trying to make the jump into big data. Using pic related to see where I stand on the stats/math side, nothing new so far.

I have experience with SPSS, Matlab and R, will try to learn Hadoop and all the Hadoop-related platforms.

>Is it just some statistics + operating on data with a programming language?

I'm reading pic related as a sort of theoretical intro to Big Data (no stats or programming in the book), and I'd say there's a bit more to it. Lots of database management, which is a lot more complex than just having a huge Excel sheet or even an SQL-type database, due to the size and scale of it.

Pic related is a good read to get a sense of where the field is at the moment. Most essays are rather superficial, but they're good for a general overview. One guy argues that the future of Big Data is not just analyzing data to answer a question, but building a decision-making and execution process on top of it to automate some activity.

The example is GPS units that show you a route (trivial today) vs. self-driving cars, where the GPS is just the base and the live data processing on top of it is what actually moves the car from A to B.

I am working as a Java Spring/Hibernate monkey right now and I am looking for something worth learning for the future. Big data or AI seems like the best option.

What's the difference between data and big data?
Why does this buzzword exist?

>data
mostly SQL-type databases, where everything is neatly organized, all nicely formatted, ideally you have no missing data

>big data
NoSQL (Not only SQL), which means data is whatever comes in, and you don't always have predefined categories. Big data is analyzing video streams together with text, or analyzing YouTube comments for thousands of videos, not stuff you could do with Excel or SQL.
Also, 3 V's: VELOCITY, VOLUME, VARIETY. You get tons of data coming in really fast and it's all kinds of stuff, not a pre-made form like "Name, age, gender, zip code, phone number".
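To make the "variety" point concrete, here is a minimal Python sketch, not anything from the books above. The sample records, the field names, and the helpers `field_profile` and `fits_form` are all made up for illustration. It only shows that a mixed feed can't be assumed to match a pre-made form, so you end up inspecting what fields actually show up.

```python
import json
from collections import Counter

# Hypothetical mixed feed, invented for this example.
RAW_EVENTS = [
    '{"type": "form", "name": "Ann", "age": 31, "zip": "10001"}',
    '{"type": "comment", "video_id": "abc123", "text": "nice video"}',
    '{"type": "sensor", "reading": 4.2}',
    'not even json',
]

# The "pre-made form" a classic SQL table would expect (assumed fields).
FORM_FIELDS = ("name", "age", "zip")


def field_profile(lines):
    """Count how often each field name appears; also count unparseable lines."""
    counts = Counter()
    skipped = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # malformed input is part of "whatever comes in"
            continue
        if not isinstance(record, dict):
            skipped += 1  # valid JSON, but not a key/value record
            continue
        counts.update(record.keys())
    return counts, skipped


def fits_form(record):
    """True if the record has every field the fixed form requires."""
    return all(field in record for field in FORM_FIELDS)


if __name__ == "__main__":
    counts, skipped = field_profile(RAW_EVENTS)
    print(dict(counts), "skipped:", skipped)
```

Only the first sample record would fit the fixed form. The rest need either a looser store or a separate cleanup step before they fit a table.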

Read the 1st chapter of for a better description.

There are some nice videos about big data on Udacity.

nobody here doing big data?

>"doing big data"
Are you?

I work with systems gathering data at the petabyte scale each day. It is hardly big data, just a lot of data.

There is a site with big data exercises, Hadoop, etc.

Anybody know the website's name?

No, I'm not, but I'm trying to get into it. Tell us your educational background, languages/skills to learn, any advice or tips.

Here's a tip: stop making inane posts on Sup Forums and educate yourself. 98% of people on Sup Forums are idiots with neither education nor jobs.


I have a degree in CS and work with surveillance systems for electricity, water and gas grids. They are wireless sensors connecting to base stations (Ethernet, GPRS, and a few other varieties). Everything gets sent back to us for analysis, then back to the customer in a heavily condensed format. The purpose is to detect damage (leaks, broken equipment, etc.) or generally measure performance.
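For anyone wondering what "heavily condensed" could look like, here is a rough Python sketch. It is not the system described in the post. The function `condense`, the `drop_ratio` threshold, and the assumption that readings are positive values such as pressure samples are all hypothetical. It just collapses a batch of raw samples into a small summary plus one fault flag.

```python
from statistics import mean, median


def condense(readings, drop_ratio=0.5):
    """Summarise one sensor's readings and flag a possible fault.

    readings: list of positive floats, oldest first (assumed format).
    drop_ratio: hypothetical threshold; the latest value is treated as
    suspicious when it falls below this fraction of the batch median.
    """
    if not readings:
        raise ValueError("no readings to condense")
    typical = median(readings)
    latest = readings[-1]
    return {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
        # A sharp drop against the typical level might indicate a leak.
        "suspect": latest < typical * drop_ratio,
    }


if __name__ == "__main__":
    print(condense([5.0, 5.1, 4.9, 5.0, 1.2]))
```

A real deployment would presumably use far more careful statistics. The point is only that thousands of samples go in and a handful of numbers come out.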

I am educating myself, working through all of the books I posted in the OP and following posts. Just looking for advice.

What framework do you use to manage your data? Is it Hadoop or something of your own?

>framework
Our own
(super old)

Self-taught. Was working in a molecular biology lab at the time, and we had started some big sequencing projects. Nobody in the lab knew any bioinformatics, so I learned how to go through our data myself.

Good thing RStudio makes it so easy for beginners, but too bad it's so bloated. Once I was experienced enough I switched to working within vim completely.

But do you still use R, or have you switched to something better?

That's kind of my thing too, in my psych lab I was doing all the stats and the little bit of programming we had to do (run some experiments on Matlab, minor Python shit), so now I'm trying to capitalize on that and learn a bit more.

What is your background? Degree, etc. I work in a molecular biology lab as an undergraduate and am thinking of going into big data genomics. But everyone seems to have a PhD in CS or Math.

I still use R. Since it can be slow for large jobs, I kinda want to learn some faster languages (the current consensus among R package devs seems to be to use C++ for the slower parts).

All my background is in molecular bio. I have never taken a CS course in my life, which unfortunately may have led to some bad habits that my colleagues like to make fun of.

>take big data and machine learning module
>oh boy, im sure ill be writing cool software that does cool machine learning shit
>nope
>3 months of Weka and Excel
>i took comp sci for this instead of just shaving my legs and being a slutty shitposter with actual income

Dude you gay.

wanna know how I know you're fat?

dont worry man, the person that posts all those gay images has a god awful ugly face. anyone have the picture?

because i took comp sci?

yup

doesnt make their legs any less smooth looking