Any thoughts on AWK?

Our astronomy undergrad involves learning it, among other scripting languages and programs. Am I getting memed? Is AWK worth researching further?
This IS a course which teaches C and fucking Fortran as an alternative. Astronomy is probably the most computer-wise retarded field there is.

I use Sed, Awk and BASH to manipulate text data files. That’s the extent of my programming ability though.

Care to elaborate? What field do you work in?

>Care to elaborate?
No

Sed and Awk are great for manipulating text files. Add some skill with Bash and you’re all set for easy to moderate text processing. If it gets really complicated, you might want to use something more general purpose like Python.

But for routine text work, especially quick one-liners, you'll be set. It's a great skill to have for making drudge work go quickly.
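To make that concrete, here are a couple of typical one-liners; the data file and its two-column layout are invented for the example:

```shell
# a made-up two-column file: label, value
printf '%s\n' 'a 1' 'b 2' 'c 3' > data.txt

# awk: sum the second column
awk '{ total += $2 } END { print total }' data.txt    # prints 6

# awk: print the columns in reverse order
awk '{ print $2, $1 }' data.txt
```

Each of these replaces a small throwaway script in a heavier language.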

For astronomy, I don’t know how useful C would be. However, Fortran is still in use in the sciences. It’s great and fast for number crunching.

I can’t comment too much without more details, but the curriculum doesn’t scream “stupid” to me.

Seismic data processing

>Any thoughts on AWK?
>Our astronomy undergrad involves learning it, among other scripting languages and programs. Am I getting memed? Is AWK worth researching further?
>This IS a course which teaches C and fucking Fortran as an alternative. Astronomy is probably the most computer-wise retarded field there is.
You could do a lot of interesting text manipulation with it. But I would rather recommend Python for general-purpose work, or even Julia as a more modern, scientific programming language

I'm in CS, but never had a dedicated class on tools like these.
easy enough for me to pick up though, a lifesaver with data manipulation like other people say.

From what I know, Python, Fortran, and C/C++ are the most used languages for astrophysics. You might be able to get a more definitive answer from /sci/ though
t. Astrophysics undergrad

AWK IS a general-purpose language. It shines with structured data like delimited ASCII.

The only thing you should really ever need awk for is to quickly parse out one or more columns from stdout given a delimiter. Using it as a full-on scripting language is retard-tier. Just use a good language like Ruby or Python if you need to do anything more complicated.

awk is easy to learn, and quite useful, being one of Unix's scripting languages. It's got nice functions for text and CSV.

>and CSV.
Care to elaborate? Let's say I have a CSV file with lines that look like this

field 1, field 2, field 3, field 4, "field , 5"
field 1, field 2, "field , 3", field 4, field 5

How would I use awk to output field 4 from all rows?

Easy, if you truly mean field four, which is 3" in the second line. You never use a delimiter which will occur in the data you're using.

You're going to need Fortran for Physics.

not her but i can't resist:
BEGIN { FS = " *, *" }
setting FS in a BEGIN block (or with -F) makes fields separated by commas, and the regex form also eats the whitespace around them (awk won't trim it for you just because you assign $4=$4),
and from there you just print $4.

That will split fields on commas that are in strings too though

>if you truly mean field four, which is 3" in the second line

No, it is not. Do you understand how the CSV file format works?

>not her
I'm not a girl

yes, you are.
if you want strings to be preserved you will need a regex as the FS, i don't think it will be very complicated though.
(i would give you one if i could use something to help me write one, but i am not in a good position to do that.)

>implying men exist on the internet
dick or gtfo

Dick or GTFO

post dick

>regex as the FS

This is why I prefer to use Ruby or Python over awk in 99% of cases.

require 'csv'
CSV.parse(File.read("file.csv")).each { |row| puts row[3] }


Done. An easy, elegant solution without any knowledge of regex required.

this

Lets see that dick of yours

UNZIP IT

>using a built in csv library
..or you could download a piece of someone else's awk script to parse CSV, and it would never realistically have compatibility issues (compared to a programming language whose libraries could see serious changes); it would more than likely be faster, lighter, and simpler, while actually having the code in your script rather than encapsulating it away, which is all fine and well until you want to slightly modify the format.
>without any knowledge of regex required
the library you're including probably uses regexes, so that point is lacking.
there is a time and a place for non-DSL "real" programming languages; that time is not now.

>You should download some random person's code from the internet rather than using a specialized parser which is included with the standard libraries for your language
I hope you never get a real job writing code because you would be absolutely terrible at it.

Only two things you need to know OP.
user@devuan:~$ du -h /usr/bin/python2.7
3.7M /usr/bin/python2.7
user@devuan:~$ du -h /usr/bin/mawk
116K /usr/bin/mawk
user@devuan:~$

Python is a disgusting whale and Awk is a petite angel. Don't be a whale hunter.

I work as an aerospace engineer and I would say learning awk, sed, and Unix in general is a huge thing that you should do.

Developing solvers and programs on windows is incredibly annoying if you're not using microsoft's C++. And even then most of the calcfarms you'll be using are going to be linux anyway. I constantly use awk/sed/coreutils on windows because parsing text or doing any sort of CLI processing is a fucking chore. Even powershell does not come with the same tools that a default install will come with.

When people say it's impossible to get "real work" done on linux I honestly have to question what they call real work, because doing anything that isn't some all in one install (like matlab) on windows is the fucking worst.

awk and sed are nice, but they're mostly for text formatting. awk can do a bit more, like averaging a column.

In a way, sed and awk can teach you more about big data by handling lines individually.
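The column-average case mentioned above, as a sketch (file name and numbers invented):

```shell
printf '%s\n' 10 20 30 60 > nums.txt

# sum the first column, then divide by the line count at end of input
awk '{ sum += $1; n++ } END { if (n) print sum / n }' nums.txt    # prints 30
```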

Stop being a baby. It's not hard to pick it up.

>how could I ever afford a hard drive with the capacity for a 3.7M interpreter as a NEET

>Our astronomy undergrad involves learning it, among other script languages and programs. Am i getting memed? Is AWK worth researching further?
AWK has saved me a bunch of times. Easily shaved a week off a project just by writing 30 lines to process the output of some AMPL code and integrate it in a format useful for R programming. It is a great investment

>downloading "some random person's" code rather than using a language's stdlib
standard libraries need to maintain compatibility as well as possible, and if they are for scripting languages then "specialized" is highly misleading.
whereas, getting a regex from somewhere else can literally mean anywhere, including some language's stdlib.
>for your language
why do you assume we will be using the language after we acquire the fourth field?
for all we know, it will literally be the end of a bash script, in which case, writing your own awk and using someone else's regex is easily the simplest method.
hell, if it's exactly as described, awk still beats out any "real" language for bug-surface, and using literally any regex that works could theoretically have 0 surface.
this is literally what awk was made for, regexes shouldn't scare you.
i hope you learned something.

>mawk
Is it any better than nawk? Why do you use it? I want to understand.

>the library you're including probably uses regexes, that point is lacking.
First of all, no it does not. Regex is slow as fuck. Ruby's implementation is much faster.
Second of all, even if it did use regex under the hood, the point would still stand because the person writing the high level code does not need to spend time manually dicking around with regex to solve the problem.

This is just what my system came with. It's unlikely your system is actually using 'awk'. The 'awk' command is probably a link to a specific implementation.

This is how retards think. I'm well versed in awk and regex as I used them regularly in my previous job before eventually replacing all my nonsense Bash scripts with equivalent (but faster and more elegant) Ruby scripts. Using some awk bullshit one-liner with a regex delimiter is inefficient and not very readable to anyone except a small minority of people. When you work within an organization with more than 3 people, code readability is more important than anything else except when performance becomes an issue. In this case, the Ruby implementation is more performant and also more readable.

Ruby confirmed for best.
user@devuan:~$ du -h /usr/bin/ruby2.1
8.0K /usr/bin/ruby2.1
user@devuan:~$ du -h /usr/bin/mawk
116K /usr/bin/mawk

Ah, okay. I too use what came with my system.

user@debian:~$ du -h /usr/bin/du
112K /usr/bin/du

Quick, someone rewrite 'du' in Ruby!

Jesus dude, we know you suck at regexes, and you probably suck at the Unix command line too since most utilities use regexes.

>her

As an astro postgrad, basically everything I use/write is python or c/cuda, and probably 80% of what I know about but don't use myself is python. Sed and awk are handy but most people just use python scripts instead because that's what they know.

It's a nice old school low level tool.

I need some gawk/regexp help

I want this to match any string that isn't a comment and contains ServerName and domain.com case insensitively

gawk '/[^#][[:space:]]*[sS]erver[nN]ame[[:space:]]*.+\.domain.com/' /tmp/notavhost.conf


[root@mutproxyf2r1 ~]# cat /tmp/notavhost.conf
ServerName toto.domain.com
ServerName toto.domain.com
ServerName toto.domain.com
Servername toto.domain.com
Servername toto.domain.com
#Servername toto.domain.com
#Servername toto.domain.com
# ServerName toto.domain.com
# Servername toto.domain.com
# ServerName toto.domain.com
# ServerName toto.domain.com

So far I've got it to match everything except when there's no space/tab at the beginning of the line.

awk is a great text munging language. C and FORTRAN are more for complex numeric processing (in scientific settings, anyway) and not really comparable.

I'm primarily a C dev but I use awk one liners to extract data from log files on a daily basis. Perl one liners are elder god tier for this but the learning curve is much steeper. Either will run circles around Python because both have regex and input tokenization as language level features (it's no exaggeration to say you can do stuff in 10 characters in Perl that would be 10 lines of Python).

However, if you have to pick one language to learn well I'd go with Python. For scientific applications most everything that doesn't have high performance requirements will be written in Python. It's broadly useful in ways that C and awk are not.
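For a taste of the log-extraction one-liners mentioned above (log format and file name are made up):

```shell
cat > app.log <<'EOF'
2024-01-01 INFO start
2024-01-02 ERROR disk full
2024-01-03 ERROR timeout
EOF

# print the date and the last word of every ERROR line
awk '/ERROR/ { print $1, $NF }' app.log
```

The regex match and the field split both come for free; no imports, no boilerplate loop.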

gawk takes ERE, not Perl regex, so no \s and no (?!#) lookahead. Anchor at the start of the line to skip comments, and lowercase the input for case insensitivity:

gawk 'tolower($0) ~ /^[[:space:]]*servername[[:space:]]+.*\.domain\.com/' /tmp/notavhost.conf

Try that.

grep -i ServerName /tmp/notavhost.conf | grep toto.domain.com
Sorry, but no, awk here is inefficient, you'll spend too much time writing it.

I was in a lecture called Introduction to High Performance computing last year. The Prof conceived a Computer architecture for training purposes and he used awk to print out the state of the CPU for each von Neumann cycle, given as input a list of OPcodes or Assembly commands. So awk was used as an Assembler.

>piping grep to grep

C'mon man, at least use a single expression.

Agreed on using grep over awk however - it's faster in both dev and machine time. gawk is insanely slow compared to grep if all you're doing is matching one regex - it's really noticeable if the input is large enough.

> at least use a single expression
I don't know how. It's not egrep since I need both to match; probably a regex will do it, like \b(\s+\t+)\b, but it still takes time to google.

.* is cruise control for concatenating regexen.
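Concretely, the two greps collapse into one expression; here against a cut-down copy of the conf from above:

```shell
cat > notavhost.conf <<'EOF'
ServerName toto.domain.com
Servername toto.domain.com
#Servername toto.domain.com
# ServerName toto.domain.com
EOF

# one ERE: optional leading whitespace, then ServerName (any case via -i),
# then anything, then the domain; comment lines fail because '#' is not whitespace
grep -icE '^[[:space:]]*servername[[:space:]]+.*\.domain\.com' notavhost.conf    # prints 2
```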

that's because /usr/bin/ruby2.1 is just a thin launcher that pulls in the actual bloated library from somewhere in /usr/lib

You can't just say that comma is the separator in CVS. When a field contains a string that contains a comma, you are fucked. Does Awk really have some CVS awareness?

>what is quoting
>CVS pharmacy

If you ever need to find data in a well defined output, awk will be very helpful. A program that outputs context-sensitive data (markup languages) won't work with grep.

Yeah, awk doesn't understand any escaping rules so if there's a possibility of fields containing commas just using awk with FS set to comma won't work. If you can guarantee the input is unambiguous it works great, however.

In general parsing CSV robustly requires an external library. The format really isn't that simple once you account for all the varying nonstandard ways different CSV exporters (*cough* Excel *cough*) handle escaping.

This. For an honest comparison you have to count the size of all dependencies. It's trivial to generate a small executable that pulls in several MB of shared libraries.

Fortran is still the best for high-performance computing. It's like a cross between C and Matlab. I work with it for particle simulations.

Part of the reason it's so good is because of openmpi. It's really easy to parallelize, which is important in astronomy.

Awk is probably most useful for sysadmins. It's like vi - it's always present and it always works.

Think of it as learning tools that you can use when collaborating with peers and professors. None of what you're learning will die anytime soon.

You know C has OpenMPI too right?

As far as I'm aware, Fortran sticks around for 2 reasons: first, there's a huge legacy of Fortran libraries written by PhDs in decades past that work well and no one wants to understand and rewrite. Second, Fortran has handled representation of composite numeric values like matrices and complex numbers much better than C and C++. In Fortran there's usually a single performant way to handle those whereas in C and C++ there's a hodgepodge of different implementations. Take multidimensional arrays as the prime example: in C the obvious solutions all have odd corner cases like varying levels of support for VLAs and needing to use restrict to defeat the standard performance degrading aliasing rules. In Fortran all that shit has just worked for longer than postdocs have been alive.

>It's like vi - it's always present and it always works.
I like AWK, but perl is in the base install practically everywhere too, so it's effectively always present as well.

im learning awk this semester too and it's been impressing me
you can do so much with so little, it's amazing

A professor at my college used AWK to process telescope images

If you are (probably not, if you are asking) or plan to be a Linux admin, it's worth learning some "advanced basics" by all means.

Why would I want to use sed when nano exists?

Learn the difference between buffers and streams and you'll find your answer young padawan.

Standard awk (aka POSIX awk) is a very small language; once you get the idea of having blocks executed when a line matches a condition, a short man page is enough to see everything the language does, and you quickly end up memorizing almost everything. It's not an important language per se, but it's so widespread, efficient for some tasks, and simple that there's no reason not to learn it if you use the CLI on any unixy system or if some people in your field use it.
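A minimal illustration of that pattern/action idea (data invented):

```shell
printf '%s\n' 'alice 34' 'bob 17' 'carol 52' > ages.txt

# the action runs only for lines where the condition holds;
# the END block runs once after the last line
awk '$2 >= 18 { count++ } END { print count, "adults" }' ages.txt    # prints "2 adults"
```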

sauce

>Ruby
kys

can't link to the SUS, but at any rate it's in the base install of every Linux/BSD; even OpenBSD has it in their base install.