Any thoughts on AWK?

Our astronomy undergrad involves learning it, among other scripting languages and programs. Am I getting memed? Is AWK worth researching further?
This IS a course which teaches C and fucking Fortran as an alternative. Astronomy is probably the most computer-wise retarded field there is.

I use Sed, Awk and BASH to manipulate text data files. That’s the extent of my programming ability though.

Care to elaborate? What field do you work in?

>Care to elaborate?
No

Sed and Awk are great for manipulating text files. Add some skill with Bash and you’re all set for easy to moderate text processing. If it gets really complicated, you might want to use something more general purpose like Python.

But for routine text work, especially quick one-liners, you'll be set. It's a great skill to have for making drudge work go quickly.
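To make that concrete, here are a couple of typical one-liners; the data file and its two-column layout are invented for the example:

```shell
# a made-up two-column file: label, value
printf '%s\n' 'a 1' 'b 2' 'c 3' > data.txt

# awk: sum the second column
awk '{ total += $2 } END { print total }' data.txt    # prints 6

# awk: print the columns in reverse order
awk '{ print $2, $1 }' data.txt
```

Each of these replaces a small throwaway script in a heavier language.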

For astronomy, I don’t know how useful C would be. However, Fortran is still in use in the sciences. It’s great and fast for number crunching.

I can’t comment too much without more details, but the curriculum doesn’t scream “stupid” to me.

Seismic data processing

>Any thoughts on AWK?
>Our astronomy undergrad involves learning it, among other scripting languages and programs. Am I getting memed? Is AWK worth researching further?
>This IS a course which teaches C and fucking Fortran as an alternative. Astronomy is probably the most computer-wise retarded field there is.
You could do a lot of interesting text manipulation with it. But I would rather recommend Python for general-purpose work, or even Julia as a more modern, scientific programming language

I'm in CS, but never had a dedicated class on tools like these.
easy enough for me to pick up though, a lifesaver with data manipulation like other people say.

From what I know, Python, Fortran, and C/C++ are the most used languages for astrophysics. You might be able to get a more definitive answer from /sci/ though
t. Astrophysics undergrad

AWK IS a general-purpose language. It shines with structured data like delimited ASCII.

The only thing you should really ever need awk for is to quickly parse out one or more columns from stdout given a delimiter. Using it as a full-on scripting language is retard-tier. Just use a good language like Ruby or Python if you need to do anything more complicated.

awk is easy to learn, and quite useful, being one of Unix's scripting languages. It's got nice functions for text and CSV.

>and CSV.
Care to elaborate? Let's say I have a CSV file with lines that look like this

field 1, field 2, field 3, field 4, "field , 5"
field 1, field 2, "field , 3", field 4, field 5

How would I use awk to output field 4 from all rows?

Easy, if you truly mean field four, which is 3" in the second line. You never use a delimiter which will occur in the data you're using.

You're going to need Fortran for Physics.

not her but i can't resist:
BEGIN { FS = " *, *" }
setting FS in a BEGIN block (or with -F) makes fields separated by commas, and the regex form also eats the whitespace around them (awk won't trim it for you just because you assign $4=$4),
and from there you just print $4.

That will split fields on commas that are in strings too though

>if you truly mean field four, which is 3" in the second line

No, it is not. Do you understand how the CSV file format works?

>not her
I'm not a girl

yes, you are.
if you want strings to be preserved you will need a regex as the FS, i don't think it will be very complicated though.
(i would give you one if i could use something to help me write one, but i am not in a good position to do that.)

>implying men exist on the internet
dick or gtfo

Dick or GTFO

post dick

>regex as the FS

This is why I prefer to use Ruby or Python over awk in 99% of cases.

require 'csv'
CSV.parse(File.read("file.csv")).each { |row| puts row[3] }


Done. An easy, elegant solution without any knowledge of regex required.

this

Lets see that dick of yours

UNZIP IT

>using a built in csv library
..or you could download a piece of someone else's awk script to parse CSV, and it would never realistically have compatibility issues (compared to a programming language whose libraries could see serious changes); it would more than likely be faster, lighter, and simpler, while actually having the code in your script rather than encapsulating it away, which is all fine and well until you want to slightly modify the format.
>without any knowledge of regex required
the library you're including probably uses regexes, so that point is lacking.
there is a time and a place for non-DSL "real" programming languages; that time is not now.

>You should download some random person's code from the internet rather than using a specialized parser which is included with the standard libraries for your language
I hope you never get a real job writing code because you would be absolutely terrible at it.

Only two things you need to know OP.
user@devuan:~$ du -h /usr/bin/python2.7
3.7M /usr/bin/python2.7
user@devuan:~$ du -h /usr/bin/mawk
116K /usr/bin/mawk
user@devuan:~$

Python is a disgusting whale and Awk is a petite angel. Don't be a whale hunter.

I work as an aerospace engineer and I would say learning awk, sed, and Unix in general is a huge thing that you should do.

Developing solvers and programs on windows is incredibly annoying if you're not using microsoft's C++. And even then most of the calcfarms you'll be using are going to be linux anyway. I constantly use awk/sed/coreutils on windows because parsing text or doing any sort of CLI processing is a fucking chore. Even powershell does not come with the same tools that a default install will come with.

When people say it's impossible to get "real work" done on linux I honestly have to question what they call real work, because doing anything that isn't some all in one install (like matlab) on windows is the fucking worst.

awk and sed are nice, but they're mostly for text formatting. awk can do a bit more, like averaging a column.

In a way, sed and awk can teach you more about big data by handling lines individually.
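The column-average case mentioned above, as a sketch (file name and numbers invented):

```shell
printf '%s\n' 10 20 30 60 > nums.txt

# sum the first column, then divide by the line count at end of input
awk '{ sum += $1; n++ } END { if (n) print sum / n }' nums.txt    # prints 30
```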

Stop being a baby. It's not hard to pick it up.

>how could I ever afford a hard drive with the capacity for a 3.7M interpreter as a NEET

>Our astronomy undergrad involves learning it, among other script languages and programs. Am i getting memed? Is AWK worth researching further?
AWK has saved me a bunch of times. Easily shaved a week off a project just by writing 30 lines to process the output of some AMPL code and integrate it in a format useful for R programming. It is a great investment

>downloading "some random person's" code rather than using a language's stdlib
standard libraries need to maintain compatibility as well as possible, and if they are for scripting languages then "specialized" is highly misleading.
whereas, getting a regex from somewhere else can literally mean anywhere, including some language's stdlib.
>for your language
why do you assume we will be using the language after we acquire the fourth field?
for all we know, it will literally be the end of a bash script, in which case, writing your own awk and using someone else's regex is easily the simplest method.
hell, if it's exactly as described, awk still beats out any "real" language for bug-surface, and using literally any regex that works could theoretically have 0 surface.
this is literally what awk was made for, regexes shouldn't scare you.
i hope you learned something.

>mawk
Is it any better than nawk? Why do you use it? I want to understand.

>the library you're including probably uses regexes, that point is lacking.
First of all, no it does not. Regex is slow as fuck. Ruby's implementation is much faster.
Second of all, even if it did use regex under the hood, the point would still stand because the person writing the high level code does not need to spend time manually dicking around with regex to solve the problem.

This is just what my system came with. It's unlikely your system is actually using 'awk'. The 'awk' command is probably a link to a specific implementation.

This is how retards think. I'm well versed in awk and regex as I used them regularly in my previous job before eventually replacing all my nonsense Bash scripts with equivalent (but faster and more elegant) Ruby scripts. Using some awk bullshit one-liner with a regex delimiter is inefficient and not very readable to anyone except a small minority of people. When you work within an organization with more than 3 people, code readability is more important than anything else except when performance becomes an issue. In this case, the Ruby implementation is more performant and also more readable.

Ruby confirmed for best.
user@devuan:~$ du -h /usr/bin/ruby2.1
8.0K /usr/bin/ruby2.1
user@devuan:~$ du -h /usr/bin/mawk
116K /usr/bin/mawk

Ah, okay. I too use what came with my system.

user@debian:~$ du -h /usr/bin/du
112K /usr/bin/du

Quick, someone rewrite 'du' in Ruby!

Jesus dude, we know you suck at regexes, and you probably suck at the Unix command line too since most utilities use regexes.

>her

As an astro postgrad, basically everything I use/write is python or c/cuda, and probably 80% of what I know about but don't use myself is python. Sed and awk are handy but most people just use python scripts instead because that's what they know.

It's a nice old school low level tool.

I need some gawk/regexp help

I want this to match any string that isn't a comment and contains ServerName and domain.com case insensitively

gawk '/[^#][[:space:]]*[sS]erver[nN]ame[[:space:]]*.+\.domain.com/' /tmp/notavhost.conf


[root@mutproxyf2r1 ~]# cat /tmp/notavhost.conf
ServerName toto.domain.com
ServerName toto.domain.com
ServerName toto.domain.com
Servername toto.domain.com
Servername toto.domain.com
#Servername toto.domain.com
#Servername toto.domain.com
# ServerName toto.domain.com
# Servername toto.domain.com
# ServerName toto.domain.com
# ServerName toto.domain.com

So far I've got it to match everything except when there's no space/tab at the beginning of the line.

awk is a great text munging language. C and FORTRAN are more for complex numeric processing (in scientific settings, anyway) and not really comparable.

I'm primarily a C dev but I use awk one liners to extract data from log files on a daily basis. Perl one liners are elder god tier for this but the learning curve is much steeper. Either will run circles around Python because both have regex and input tokenization as language level features (it's no exaggeration to say you can do stuff in 10 characters in Perl that would be 10 lines of Python).

However, if you have to pick one language to learn well I'd go with Python. For scientific applications most everything that doesn't have high performance requirements will be written in Python. It's broadly useful in ways that C and awk are not.
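For a taste of the log-extraction one-liners mentioned above (log format and file name are made up):

```shell
cat > app.log <<'EOF'
2024-01-01 INFO start
2024-01-02 ERROR disk full
2024-01-03 ERROR timeout
EOF

# print the date and the last word of every ERROR line
awk '/ERROR/ { print $1, $NF }' app.log
```

The regex match and the field split both come for free; no imports, no boilerplate loop.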

gawk takes ERE, not Perl regex, so no \s and no (?!#) lookahead. Anchor at the start of the line to skip comments, and lowercase the input for case insensitivity:

gawk 'tolower($0) ~ /^[[:space:]]*servername[[:space:]]+.*\.domain\.com/' /tmp/notavhost.conf

Try that.

grep -i ServerName /tmp/notavhost.conf | grep toto.domain.com
Sorry, but no, awk here is inefficient, you'll spend too much time writing it.

I was in a lecture called Introduction to High Performance computing last year. The Prof conceived a Computer architecture for training purposes and he used awk to print out the state of the CPU for each von Neumann cycle, given as input a list of OPcodes or Assembly commands. So awk was used as an Assembler.

>piping grep to grep

C'mon man, at least use a single expression.

Agreed on using grep over awk however - it's faster in both dev and machine time. gawk is insanely slow compared to grep if all you're doing is matching one regex - it's really noticeable if the input is large enough.

> at least use a single expression
I don't know how. It's not egrep since I need both to match; probably a regex will do it, like \b(\s+\t+)\b, but it still takes time to google.

.* is cruise control for concatenating regexen.
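Concretely, the two greps collapse into one expression; here against a cut-down copy of the conf from above:

```shell
cat > notavhost.conf <<'EOF'
ServerName toto.domain.com
Servername toto.domain.com
#Servername toto.domain.com
# ServerName toto.domain.com
EOF

# one ERE: optional leading whitespace, then ServerName (any case via -i),
# then anything, then the domain; comment lines fail because '#' is not whitespace
grep -icE '^[[:space:]]*servername[[:space:]]+.*\.domain\.com' notavhost.conf    # prints 2
```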

that's because /usr/bin/ruby2.1 is just a thin launcher that pulls in the actual bloated library from somewhere in /usr/lib

You can't just say that comma is the separator in CVS. When a field contains a string that contains a comma, you are fucked. Does Awk really have some CVS awareness?

>what is quoting
>CVS pharmacy

If you ever need to find data in a well defined output, awk will be very helpful. A program that outputs context-sensitive data (markup languages) won't work with grep.

Yeah, awk doesn't understand any escaping rules so if there's a possibility of fields containing commas just using awk with FS set to comma won't work. If you can guarantee the input is unambiguous it works great, however.

In general parsing CSV robustly requires an external library. The format really isn't that simple once you account for all the varying nonstandard ways different CSV exporters (*cough* Excel *cough*) handle escaping.

This. For an honest comparison you have to count the size of all dependencies. It's trivial to generate a small executable that pulls in several MB of shared libraries.

Fortran is still the best for high-performance computing. It's like a cross between C and Matlab. I work with it for particle simulations.

Part of the reason it's so good is because of openmpi. It's really easy to parallelize, which is important in astronomy.

Awk is probably most useful for sysadmins. It's like vi - it's always present and it always works.

Think of it as learning tools that you can use when collaborating with peers and professors. None of what you're learning will die anytime soon.

You know C has OpenMPI too right?

As far as I'm aware, Fortran sticks around for 2 reasons: first, there's a huge legacy of Fortran libraries written by PhDs in decades past that work well and no one wants to understand and rewrite. Second, Fortran has handled representation of composite numeric values like matrices and complex numbers much better than C and C++. In Fortran there's usually a single performant way to handle those whereas in C and C++ there's a hodgepodge of different implementations. Take multidimensional arrays as the prime example: in C the obvious solutions all have odd corner cases like varying levels of support for VLAs and needing to use restrict to defeat the standard performance degrading aliasing rules. In Fortran all that shit has just worked for longer than postdocs have been alive.

>It's like vi - it's always present and it always works.
I like AWK, but perl is in the base install practically everywhere too, so it's effectively always present as well.

im learning awk this semester too and it's been impressing me
you can do so much with so little, it's amazing

A professor at my college used AWK to process telescope images

If you are (probably not, if you are asking) or plan to be a Linux admin, it's worth learning some "advanced basics" by all means.

Why would I want to use sed when nano exists?

Learn the difference between buffers and streams and you'll find your answer young padawan.

Standard awk (aka POSIX awk) is a very small language; once you get the idea of having blocks executed when a line matches a condition, a short man page is enough to see everything the language does, and you quickly end up memorizing almost everything. It's not an important language per se, but it's so widespread, efficient for some tasks, and simple that there's no reason not to learn it if you use the CLI on any unixy system or if some people in your field use it.
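A minimal illustration of that pattern/action idea (data invented):

```shell
printf '%s\n' 'alice 34' 'bob 17' 'carol 52' > ages.txt

# the action runs only for lines where the condition holds;
# the END block runs once after the last line
awk '$2 >= 18 { count++ } END { print count, "adults" }' ages.txt    # prints "2 adults"
```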

sauce

>Ruby
kys

can't link to the SUS, but at any rate it's in the base install of every Linux/BSD; even OpenBSD has it in their base install.