Hey Sup Forums.
Explain to me how a compiler works.

it takes high-level code and spits out intermediate macro-assembler instructions
these instructions can then be translated into machine code for the CPU to slurp up
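
You can see the same idea one layer up with CPython's dis module, which shows high-level source being lowered to simpler instructions (VM bytecode here, not real machine code, but the translation step looks the same). Toy example:

import dis

def area(w, h):
    return w * h

# Prints a listing of simple load/multiply/return instructions
# (exact opcode names vary between Python versions).
dis.dis(area)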

en.wikipedia.org/wiki/Abstract_syntax_tree

Do you have a CS degree?

Nope. Why?
That's new. Thanks, user.

I know that the project is dead, but the installer always managed to work. What happened?

Text --> compiler magic --> machine code

Arch Anywhere

Arch Anywhere was better, so they trashed Architect

Of course, the compiler itself is a program, and it's often written in the same language it compiles. C compilers used to be written in C, though GCC has largely switched to C++; C++ compilers are written in C++.

1) Instruction set architecture (ISA). ARM is a company that designs ISAs; Intel and AMD design ISAs too. The same company can both design an ISA and implement it.

2) The ISA is implemented in (realized as) a physical microprocessor. How it is implemented depends on the microarchitecture, which is the design philosophy used to place the different components and wire up the communication between them.
Imagine a microchip: how the ISA's "add" is realized through the placement of transistors is a design decision. (Intel and AMD each have their own implementations of different ISAs.)
Once the ISA is implemented, you have a binary machine, i.e. it understands only a stream of binary information. How an ISA actually gets implemented is fuzzy to me too :( not an electronics guy. Feel free to fill in.

3) A processor by itself doesn't do anything; it just sits there with a basic knowledge of arithmetic. The implementer of the ISA (Intel or AMD) provides a set of human-readable mnemonics. These mnemonics are readable to humans; an assembler translates them into the binary instruction stream that the processor actually decodes and executes.
The set of these mnemonics is called the assembly language of the processor.

4) Even though it's human-readable, assembly still isn't practical enough to map what a programmer wants onto instructions. Hence programming languages: a programming language wraps the assembly instructions in more practical constructs. A programmer can use this programming language to make video games.

5) Between the programming language and the hardware lies the operating system. The operating system has a core called the kernel that owns the processor. The kernel exposes certain function calls (called the kernel interface) that the programming language can use. Why? Because it's safer to let someone decide whether your machine instructions are good or bad for the overall system. The programming language, and we can now say the C programming language because most of the time the immediate wrapper language is C,

>Compilers take a stream of symbols, figure out their structure according to some domain-specific predefined rules, and transform them into another symbol stream.
t. yegge

steve-yegge.blogspot.com/2007/06/rich-programmer-food.html

From the dragon book, in order of what happens first to last:

>character stream (a file)
>lexical analyzer (splits characters into meaningful bits)
>syntax analyzer (analyzes tokens into syntax tree)
>semantic analyzer (analyzes syntax tree)
>intermediate code generation (takes the syntax tree and turns it into a format that can be assembled into any machine-specific code)
>target machine code (the program machine code)

There are other things, like optimization, that also come into play, but the above is basically how compilers translate source files into object code.
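
To make those stages concrete, here's a toy sketch in Python that lexes, parses, and generates code for arithmetic like "1 + 2 * 3". Semantic analysis and the intermediate stage are skipped, and every name here is invented for illustration, nothing from a real compiler:

import re

def lex(source):
    """Lexical analysis: character stream -> token stream."""
    tokens = []
    for number, op in re.findall(r"\s*(?:(\d+)|(\S))", source):
        tokens.append(("NUM", int(number)) if number else ("OP", op))
    return tokens

def parse(tokens):
    """Syntax analysis: token stream -> syntax tree (nested tuples)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def atom():                      # atom -> NUM
        kind, value = take()
        assert kind == "NUM", "syntax error"
        return value

    def term():                      # term -> atom (('*'|'/') atom)*
        node = atom()
        while peek()[1] in ("*", "/"):
            node = (take()[1], node, atom())
        return node

    def expr():                      # expr -> term (('+'|'-') term)*
        node = term()
        while peek()[1] in ("+", "-"):
            node = (take()[1], node, term())
        return node

    return expr()

def codegen(node, out):
    """Code generation: syntax tree -> made-up stack machine instructions."""
    if isinstance(node, int):
        out.append(("PUSH", node))
    else:
        op, left, right = node
        codegen(left, out)
        codegen(right, out)
        out.append(("APPLY", op))
    return out

print(codegen(parse(lex("1 + 2 * 3")), []))
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('APPLY', '*'), ('APPLY', '+')]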

GCC and LLVM both convert every source language to an intermediate representation. With GCC it's called GIMPLE, with LLVM it's called LLVM IR. Pretty much all compiler optimizations are done on this intermediate code so as not to have to write an optimizer for each language. Once the code has been appropriately optimized, it's then compiled to something the machine can understand.
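
To give a rough feel for what such an intermediate form looks like, here is a made-up three-address-code style IR for x = a + b * c, plus one toy optimization pass over it. This is not real GIMPLE or LLVM IR, just the general shape: tiny instructions with explicit temporaries that are easy to analyze no matter which source language produced them.

# Invented IR for "x = a + b * c": one simple operation per entry.
ir = [
    ("mul", "t1", "b", "c"),   # t1 = b * c
    ("add", "t2", "a", "t1"),  # t2 = a + t1
    ("store", "x", "t2"),      # x  = t2
]

# Optimizations work on this form. Example: constant folding, if the
# frontend already knows that b and c are constants.
def fold_constants(ir, known):
    out = []
    for inst in ir:
        if inst[0] == "mul" and inst[2] in known and inst[3] in known:
            known[inst[1]] = known[inst[2]] * known[inst[3]]
            out.append(("const", inst[1], known[inst[1]]))
        else:
            out.append(inst)
    return out

print(fold_constants(ir, {"b": 2, "c": 3}))
# [('const', 't1', 6), ('add', 't2', 'a', 't1'), ('store', 'x', 't2')]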

A good way to think of this is in terms of real languages. If I wanted to translate something from Japanese to Italian I might have difficulty finding someone who speaks both languages well. Instead it'd be much easier to have someone translate the Japanese to English and then another person to translate the English to Italian.

>The programming language, and we can now say the C programming language because most of the time the immediate wrapper language is C,
has access to assembly language, but for an average programmer (who is expected to be writing application code) it's not necessary to grant that much privilege. He's better off using the kernel interface APIs.
There are several related terms:
Kernel API.
System calls.
Kernel interface.
I'm not completely clear on the distinctions, but roughly:
Assembly < the above three < the C language.
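
A rough way to see that layering from a high-level language (Python here; the same picture applies to C's printf eventually reaching write()):

import os

# High level: the language's own I/O. The runtime does formatting and
# buffering for you and eventually asks the kernel to do the actual output.
print("hello")

# One layer down: call the kernel interface directly through the write()
# system call wrapper. File descriptor 1 is standard output on Unix-like
# systems. Only the kernel talks to the hardware itself.
os.write(1, b"hello\n")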

6) Now comes the compiler. The C language specifies the semantics and syntax for writing programs. The compiler is the thing that takes your source code and produces the corresponding assembly code. Remember, the assembly language is designed by Intel/AMD for the processor.
How the compiler does this is pretty straightforward too, if you think abstractly:
>It parses your source code, makes decisions, and produces assembly code.
You will really need to read Ullman's compiler book to understand the rules and principles.

So how did the compiler come about? The first C compiler had to be implemented in assembly language; the immediate wrapper language can't be written in anything else. The only alternative would be raw binary, which is impossible in practice. Once a C compiler exists, later ones can be written in C itself.

7) The rest of the technology you see sits on top of the six things above, from C++ and Java to DirectX.

>dragon book
Alfred Aho, Jeffrey Ullman
Why can't you remember these great people? It hurts my soul.
>Holy shit, this captcha. WTF. Now I'm supposed to pick out women's dresses.

What did you mean by this?

The lexical analyzer splits the character stream into tokens; that's what the syntax analyzer uses to build the syntax tree.
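
A toy lexer sketch, token set invented for illustration:

import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("SKIP",   r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    # Character stream in, token stream out; whitespace is thrown away.
    for match in MASTER_RE.finditer(text):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(tokenize("total = total + 1")))
# [('IDENT', 'total'), ('ASSIGN', '='), ('IDENT', 'total'),
#  ('PLUS', '+'), ('NUMBER', '1')]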

Basically this.

Each language specifies its syntax with a context-free grammar; you can find Python's CFG here as an example:
docs.python.org/3/reference/grammar.html

This is basically a set of rules: you start with a start symbol and use the production rules to build up a representation of your source code as a tree structure.
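
For a feel of what production rules look like, here's a toy grammar written out as Python data (invented for illustration; Python's real grammar linked above is far bigger), along with one derivation:

# A context-free grammar is just production rules.
grammar = {
    "stmt": [["IDENT", "=", "expr"]],
    "expr": [["term", "+", "expr"], ["term"]],
    "term": [["IDENT"], ["NUMBER"]],
}

# One possible derivation of  x = y + 1  from the start symbol "stmt":
#   stmt
#   -> IDENT = expr            (rule for stmt)
#   -> IDENT = term + expr     (first rule for expr)
#   -> IDENT = IDENT + expr    (term -> IDENT)
#   -> IDENT = IDENT + term    (second rule for expr)
#   -> IDENT = IDENT + NUMBER  (term -> NUMBER)
# A parser runs this in reverse: given the tokens, it figures out which
# rules were applied, and the nested rule applications form the parse tree.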

A tool like yacc will analyze your source code using these production rules and build an abstract syntax tree from it
en.m.wikipedia.org/wiki/Yacc?wprov=sfla1

At this point, if there are syntax errors, yacc will throw an error because it can't follow the rules in the grammar to build a tree. Once the tree is built, you can perform analyses like type checking and making sure variables are declared before being used.
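
A toy version of one such semantic check, declared-before-use, over a made-up AST shape:

# Node shapes here are invented for illustration.
program = [
    ("declare", "x"),
    ("assign", "x", ("num", 1)),
    ("assign", "y", ("var", "x")),   # error: y was never declared
]

def check(program):
    declared = set()
    errors = []
    for node in program:
        if node[0] == "declare":
            declared.add(node[1])
        elif node[0] == "assign":
            target, value = node[1], node[2]
            if target not in declared:
                errors.append(f"assignment to undeclared variable {target!r}")
            if value[0] == "var" and value[1] not in declared:
                errors.append(f"use of undeclared variable {value[1]!r}")
    return errors

print(check(program))
# ["assignment to undeclared variable 'y'"]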

The AST is further analyzed and split into basic blocks. A control flow graph is created, which lets the compiler optimize the code in many ways, in addition to doing things like allocating processor registers.
en.wikipedia.org/wiki/Control_flow_graph?wprov=sfla1
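
A toy sketch of that step: split a linear instruction list into blocks at labels and record the edges between them (the instruction format is invented for illustration):

code = [
    ("label", "entry"),
    ("op",    "i = 0"),
    ("jump",  "loop"),
    ("label", "loop"),
    ("branch_if", "i >= 10", "done"),   # conditional jump to 'done'
    ("label", "body"),
    ("op",    "i = i + 1"),
    ("jump",  "loop"),
    ("label", "done"),
    ("op",    "return i"),
]

def basic_blocks(code):
    # Assumes every block starts with a label, as in the toy code above.
    blocks, current = {}, None
    for inst in code:
        if inst[0] == "label":
            current = inst[1]
            blocks[current] = []
        else:
            blocks[current].append(inst)
    return blocks

def edges(blocks):
    result = []
    names = list(blocks)
    for i, name in enumerate(names):
        for inst in blocks[name]:
            if inst[0] in ("jump", "branch_if"):
                result.append((name, inst[-1]))
        last = blocks[name][-1] if blocks[name] else None
        # fall through to the next block unless we ended on an unconditional jump
        if i + 1 < len(names) and (last is None or last[0] != "jump"):
            result.append((name, names[i + 1]))
    return result

print(edges(basic_blocks(code)))
# [('entry', 'loop'), ('loop', 'done'), ('loop', 'body'), ('body', 'loop')]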

The control flow graph can then be converted to machine language using a translation scheme, which is different for each processor instruction set and architecture
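
And a toy version of such a translation scheme: the same IR mapped through a per-target table (the mnemonics below are invented placeholders, not real x86 or ARM syntax):

TARGETS = {
    "toy-x86": {"add": "ADDL", "mul": "IMULL", "load": "MOVL"},
    "toy-arm": {"add": "ADD",  "mul": "MUL",   "load": "LDR"},
}

ir = [("load", "r1", "a"), ("load", "r2", "b"), ("add", "r0", "r1", "r2")]

def select(ir, target):
    # Map each IR op to the chosen target's instruction, keeping operands.
    table = TARGETS[target]
    return [(table[op], *operands) for op, *operands in ir]

print(select(ir, "toy-arm"))
# [('LDR', 'r1', 'a'), ('LDR', 'r2', 'b'), ('ADD', 'r0', 'r1', 'r2')]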

Compiler design is a complicated endeavor that uses lots of CS theory, and is a big area of academic research

>Why can't you remember these great people?
Let's not suck the dicks of their ghosts too hard.
They created a book that misleads students into the assumption that parsing is somehow a big deal.

Wow. I understand now.