Segmentation Fault


The OS extends its physical memory by using virtual memory, which is implemented using a technique called paging. Paging is a form of swapping pages between the HDD and physical memory.
If an application requests a page in memory and the OS cannot find it there, a Page Fault occurs.
If the address of the requested page is invalid, an Invalid Page Fault occurs, which causes the program to abort.

What is a segmentation fault?

In operating systems that use virtual memory, every process is given the impression that it is working with large, contiguous sections of memory. In reality, each process’ memory may be dispersed across different areas of physical memory, or may have been paged out to a backup storage (typically the hard disk). When a process requests access to its memory, it is the responsibility of the operating system to map the virtual address provided by the process to the physical address where that memory is stored. The page table is where the operating system stores its mappings of virtual addresses to physical addresses.

The page table lookup may fail for two reasons. The first is if there is no translation available for that address, meaning the memory access to that virtual address is invalid. This will typically occur because of a programming error, and the operating system must take some action to deal with the problem. On modern operating systems, it will send a segmentation fault to the offending program.

The page table lookup may also fail if the page is not resident in physical memory. This will occur if the requested page has been paged out of physical memory to make room for another page. In this case the page is paged to a secondary store located on a medium such as a hard disk drive (this secondary store, or “backing store”, is often called a “swap partition” if it’s a disk partition or a swap file, “swapfile”, or “page file” if it’s a file). When this happens the page needs to be taken from disk and put back into physical memory.
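The page-table lookup described above can be pictured with a toy sketch. Everything here (the dict-based table, the page size, the frame numbers) is invented for illustration; a real MMU does this in hardware:

```python
# A toy sketch of the page-table lookup described above: translate a virtual
# address to a physical one, or "fault" if no mapping exists.
PAGE_SIZE = 4096
page_table = {0: 14, 1: 7}   # virtual page number -> physical frame number

def translate(vaddr):
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in page_table:
        # No translation available: the OS would signal the process (SIGSEGV)
        raise MemoryError("segfault: no mapping for virtual page %d" % vpage)
    return page_table[vpage] * PAGE_SIZE + offset

print(translate(4100))   # virtual page 1, offset 4 -> 7*4096 + 4 = 28676
```

Accessing an unmapped page (say virtual page 3) raises the "fault" instead of returning an address, mirroring the invalid-access case.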

Common causes of a segmentation fault:

The following are some typical causes of a segmentation fault:

  •     Attempting to execute a program that did not compile correctly (some compilers output an executable file despite the presence of compile-time errors)
  •     Dereferencing NULL pointers
  •     Attempting to access memory the program does not have rights to (such as kernel structures in process context)
  •     Attempting to access a nonexistent memory address (outside process’s address space)
  •     Attempting to write read-only memory (such as code segment)
  •     A buffer overflow
  •     Using uninitialized pointers

When a Page Fault occurs, the OS does the following:

  •     Determine the location of the data in auxiliary storage.
  •     Obtain an empty page frame in RAM to use as a container for the data.
  •     Load the requested data into the available page frame.
  •     Update the page table to show the new data.
  •     Return control to the program, transparently retrying the instruction that caused the page fault.
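The handling steps above can be mimicked in a toy demand-paging sketch. All names here (the backing store dict, frame list, `access` helper) are invented for illustration:

```python
# Toy sketch of the page-fault handling steps above: locate the data in
# auxiliary storage, obtain an empty frame, load the data, update the
# page table, and transparently retry the access.
backing_store = {0: "page0-data", 1: "page1-data", 2: "page2-data"}
page_table = {}          # virtual page number -> frame number
frames = {}              # frame number -> loaded data
free_frames = [0, 1, 2]  # empty page frames in RAM

def access(vpage):
    if vpage not in page_table:                  # page fault!
        frame = free_frames.pop(0)               # obtain an empty frame
        frames[frame] = backing_store[vpage]     # load data from "disk"
        page_table[vpage] = frame                # update the page table
    return frames[page_table[vpage]]             # retry the access

print(access(1))   # faults, loads the page, returns "page1-data"
print(access(1))   # page is now resident: no fault this time
```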


Source: Wikipedia and Webopedia

When you hit a URL in the browser

The 4 things that happen:

1. DNS Resolution

2. TCP Connection

3. HTTP Request/Response

4. Browser Rendering


DNS Resolution in the Browser

The operating system first looks at the /etc/hosts file for the IP address of the domain (this lookup order can be changed via /etc/nsswitch.conf), then looks at /etc/resolv.conf for the IP of the DNS server configured for that machine.


Pick one of those DNS servers (call it "dinky").

Now dinky could be configured in 2 ways:
1. Recursive DNS
2. Iterative DNS

If "dinky" is a Recursive DNS server, here is what happens:

STEP 1: You enter the URL in the browser. The operating system's resolver sends a DNS query for the A record to the DNS server.

STEP 2: The DNS server, on receiving the query, looks through its tables (cache) to find the IP address (A record) for the domain, but it does not have the entry.

STEP 3: As the answer for the query is not available, our DNS server sends a query to one of the DNS root servers for the answer. An important fact to note here is that root servers are always iterative servers.

[POINTER #1: How to query a ROOT DNS server]

STEP 4: The DNS root server replies with a list of servers (a referral) that are responsible for handling the .COM gTLD.

STEP 5: Our DNS server selects one of the .COM gTLD servers from the list given by the root server and queries it for the answer.

STEP 6: Like the root servers, the gTLD servers are also iterative in nature, so the gTLD server replies to our DNS server with a list of IP addresses of the DNS servers responsible for the domain (the authoritative name servers for the domain).

STEP 7: Again our DNS server selects one IP from the given list of authoritative name servers and queries it for the A record. The queried authoritative name server replies with the A record, as below.

 = <XXX:XX:XX:XX> (some IP address)

STEP 8: Our DNS server replies to us with the IP-domain pair (and any other resources, if available). Now the browser sends a request for the web page to the IP given.

NOTE: In the above, dinky gets a list of servers and has to choose one server from the list to proceed further. The most famous DNS server software, BIND, uses a technique called the RTT (Round Trip Time) metric: the server tracks the RTT of each root server and selects the one with the lowest RTT.
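Step 1 of the flow above (asking the OS stub resolver, which consults /etc/hosts, /etc/nsswitch.conf and /etc/resolv.conf) can be exercised directly from Python. This minimal sketch uses "localhost" so the lookup needs no network; a real lookup would pass the actual hostname:

```python
import socket

# Ask the OS stub resolver for an IPv4 address (an A-record-style lookup).
# "localhost" resolves from /etc/hosts, so no DNS server is contacted here.
host = "localhost"
infos = socket.getaddrinfo(host, 80, socket.AF_INET, socket.SOCK_STREAM)
for family, socktype, proto, canonname, sockaddr in infos:
    print(host, "resolves to", sockaddr[0])
```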


If "dinky" is an Iterative DNS server (or Non-Recursive DNS), here is what happens:
In this mode dinky will only give referrals to other DNS servers and will not provide the final answer on behalf of our resolver.

Since our DNS server (dinky) is not a recursive name server (which means it is iterative), it will give us the answer only if it has it in its records. Otherwise it will give us a referral to the root servers (it will not query the root servers and the other servers by itself).

Now it is the job of our resolver to query the [#1] root servers, [#2] .COM TLD servers, and [#3] authoritative name servers for the answer.

STEP 1: You enter the URL in the browser. The operating system's resolver sends a DNS query for the A record to the DNS server.

STEP 2: The DNS server, on receiving the query, looks through its tables (cache) to find the IP address (A record) for the domain, but it does not have the entry.

STEP 3: Now, instead of querying the root servers, our DNS server replies to us with a referral to the root servers. Our operating system's resolver then queries the root servers for the answer.

The rest of the steps are all the same. The only differences in an iterative query are that:

- if the DNS server does not have the answer, it will not query any other server for it, but rather reply with a referral to the DNS root servers
- if the DNS server has the answer, it will give back the answer (the same as in a recursive query)
- the job of finding the answer (by following the given referrals) lies with the local operating system resolver.
POINTER 1: How to query a ROOT DNS server?
The root name servers (.) are the most important resource in the name server hierarchy. When any name server is asked for information it does not have, the first thing that name server does is ask one of the (.) root name servers.

There are 13 root name server addresses, named a.root-servers.net through m.root-servers.net.

The IP addresses of all these root servers are known to all DNS software packages by default, which means any DNS server can reach the root servers without help from any other DNS server.

Why only 13 root DNS server addresses?
When you plan a big architecture like the DNS root servers, you need to analyse performance issues in depth. As said, there are 13 IP addresses. If you are a networking guy or a system administrator, you might already know that UDP is better than TCP where performance is the requirement. For performance reasons, a UDP packet used for DNS is limited to 512 bytes; if the payload goes above 512 bytes, then TCP is used instead.

TCP involves much higher overhead, because establishing a TCP connection takes multiple steps and procedures, which slows the entire process.

TCP is better suited for reliability, while UDP is suited for performance. Things like DNS should never be slow, hence DNS by default works over UDP. A single 512-byte UDP packet can contain all 13 IP addresses along with the other protocol information (416 bytes for the 13 IP address records, the remainder for protocol information). Sure, you could have 30 or 40 DNS root server IP addresses, but you would not be able to send all of them in one UDP packet (you would have to send them in multiple packets, which would reduce performance). Hence, for performance and low network overhead, the root servers are limited to 13 IP addresses.

India, too, has had 3 DNS root server instances: one each in Bangalore, Chennai, and New Delhi.

There is a technology called Anycast that plays a major role in achieving this distributed architecture of DNS root servers. In simple terms, anycast makes multiple servers, in many different locations, share a single IP address. Whenever a request is sent to an anycast IP address, network routers route that request to the nearest server possible. This means that if I want to reach a root server from India, the nearest possible location is Chennai, rather than some other location in the world. This is why the DNS root servers rely heavily on IP anycast.
NOTE: An ALIAS record (say, pointing a domain to an ELB endpoint) is not exposed to commands like host, whereas if we point a domain to an ELB endpoint using a CNAME record, the DNS resolver will show it as an alias to the ELB endpoint.

TCP connection

Layer 5: This is the layer from where our applications try to establish a connection to a server. For example, imagine that you have the Firefox browser installed on your machine and are trying to establish a connection to a site. The browser knows how to open a temporary port and request a connection to port 80 on the server. This layer is called the Application Layer, where all our applications try to establish connections, be it a browser, an FTP client, or an SSH client.

Layer 4: This is the layer where our topic comes into the picture; it is named the Transport Layer. There are two protocols in this layer (TCP and UDP), and either of them can be used. In day-to-day life we mostly use TCP (because most applications require the reliable connection TCP provides). UDP is also used; for example, to query a DNS server we normally use UDP. Most of you must have heard about segments or MSS (Maximum Segment Size): TCP provides reliability in communication with the help of something called Positive Acknowledgment with Retransmission (PAR).
Step 1: Machine 1 wants to initiate a connection with Machine 2, so Machine 1 sends a segment with the SYN (Synchronize Sequence Number) bit set. This segment informs Machine 2 that Machine 1 would like to start a communication and tells it what sequence number Machine 1 will start its segments with.

Note: Sequence numbers are mainly used to keep data in order.

Step 2: Machine 2 responds to Machine 1 with a segment that has the "Acknowledgment" (ACK) and SYN bits set. Machine 2's segment does two things:
1. It acknowledges Machine 1's SYN segment.
2. It informs Machine 1 what sequence number Machine 2 will start its data with.

Step 3: Finally, Machine 1 acknowledges Machine 2's initial sequence number and its ACK signal. Then Machine 1 starts the actual data transfer.

Note: Initial Sequence Numbers are randomly selected while initiating connections between two machines.
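In practice the whole SYN / SYN+ACK / ACK exchange happens inside the kernel during connect() and accept(). A sketch, using a throwaway loopback server (invented for the demo) so a real handshake completes locally:

```python
import socket
import threading

# A throwaway server on the loopback interface; the three-way handshake is
# performed by the kernel inside connect()/accept().
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, addr = server.accept()   # returns once the handshake is done
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("", port))   # SYN -> SYN+ACK -> ACK happens here
data = client.recv(5)
print(data.decode())                  # data sent over the established connection
client.close()
t.join()
server.close()
```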

Session Layer

This means that once the client has sent a GET request of, say, 936 bytes, the session persists until the entire 936 bytes have arrived, and only then is the 2nd GET request issued.

Presentation Layer

Deals with data formats such as .jpg, .doc, .css, etc.

Physical Layer

The physical medium, e.g. CAT 6 cabling.

Browser Rendering

When the URL is entered in the browser, the DNS lookup, TCP connection, and HTTP request are made. Once the server sends the HTML page to the browser, the browser parses it and creates an in-memory object tree called the DOM. While the browser builds the DOM it encounters many references, such as images and JS; when it sees those references, they are downloaded. Some of the resources may be served from a different domain, which adds extra DNS resolutions and TCP connections.

DOM – at the top is the HTML tag, and below it two tags:
1. Head tag (browser-related things like the title, search-engine keywords, etc.)
2. Body tag (the content of the page)

The 3 primarily used resource types are:
a. CSS – dictates the font size and layout of the webpage.
b. Scripts like JS – the dynamic elements; they change the DOM dynamically (menu buttons, and 3rd-party components such as the FB Like button, inline Twitter streams, etc.)
c. Images – referenced from the page and downloaded as they are encountered.

CSS and scripts don't have to be external; they can be embedded inline in the code as well.

Modern browsers do many things in parallel, like building the DOM while downloading resources. But this is not always safe, because JS can do many things after, or even while, the page is loading (one such function is document.write, which injects HTML while the browser is reading it and can change the whole meaning of the already-created DOM/page). As a result, whenever the browser comes across a script, it stops building the DOM and executes the script. Since some scripts are external, the browser has to download and execute them before it can continue parsing the HTML.

Two types of resources block or delay the rendering of the page: CSS (until all CSS is downloaded and processed, the browser will not paint the page) and JS (which blocks construction of the DOM). They do so because browsers can only paint what they have understood and put into the DOM.

Best Practice:
CSS [placed in the head, so that it can be downloaded early by the browser]
JS [placed at the bottom, so that minimal functionality is lost if anything goes wrong with the scripts]
JS makes the webpage more dynamic, but the introduction of XMLHttpRequest made JS even more dynamic, by contacting the server and getting a response without a page reload/refresh.


Cookies are primarily used to turn a stateless web app into a stateful one.
There are "session-only cookies" that last only for a particular browsing session, and permanent cookies that last for multiple sessions.

Example of HTTP Request/Response

Example of HTTP request

GET /dumprequest HTTP/1.1
Host: 
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Accept-Language: en-US,en;q=0.8

Example of HTTP response

Scheme http
Port 80
Path /

Headers received from

Request time: 0.30609 s

HTTP/1.1 200 OK
Server: nginx
Date: Sun, 08 Jun 2014 15:13:00 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Vary: Cookie
X-hacker: If you're reading this, you should visit and apply to join the fun, mention this header.
Link: <>; rel=shortlink
Content-Encoding: gzip



NAS vs. SAN, and where iSCSI fits
The two big differences between NAS and Fibre Channel SAN are the wires and the protocols. In terms of wires, NAS runs on Ethernet, and FC-SAN runs on Fibre Channel. The protocols are also different. NAS communicates at the file level, with requests like create-file-MyHomework.doc or read-file-Budget.xls. FC-SAN communicates at the block level, with requests over the wire like read-block-thirty-four or write-block-five-thousand-and-two.


If you think the protocol is more important, then iSCSI is like SAN; if the wire is more important, then iSCSI is like NAS.


Technical people know that the protocol is more important; it determines how the compute server talks with the storage. With FC-SAN, a filesystem like UFS, VxFS, or ZFS runs on the host and converts file requests into the block requests that are sent over the wire. With NAS, the host sends file requests over the wire, so a filesystem must run in the storage system. There are advantages and disadvantages to both approaches, but the point is that from an architectural perspective, iSCSI looks just like FC-SAN. The filesystem runs on the host and sends block requests over the wire. (Many technical people are offended by the idea that iSCSI might be NAS.)


Business people focus on infrastructure, budgets, and org charts, so they worry about wires. Choosing NAS over SAN for Oracle, Exchange, or SAP affects the capital budget for Ethernet versus Fibre Channel, and it can even affect organizational structure. Sometimes an “Apps/Servers/Storage Group” owns Fibre Channel, while Ethernet belongs to a “Distributed Infrastructure Group”. Is the Apps group allowed to buy and manage their own Ethernet switches if they decide to run Oracle over NAS? They may argue, “We should own the switches between server and storage.” The Distributed Infrastructure group may argue, “We own all TCP/IP networking,” but the corporate network may not offer the bandwidth or quality of service required for Oracle-on-NAS. I’ve seen CIOs do reorgs over these issues. Business-wise, iSCSI looks just like NAS, so business people often assume that iSCSI is a form of NAS.


Credits: article by Dave


Indentation is what matters in Python.

Python can be used for both CGI and Scripts

Python scripts are usually stored in files with a .py extension. In Linux the extension has no special significance; it is just a naming convention to indicate that the file contains a Python script.

In a Python script, mention the Python shebang header; this tells the environment to use the Python interpreter.
Example: #!/usr/bin/python
The shebang header is useful for creating a standalone executable script (make it executable with chmod +x, then run it directly):

#./myscript

Otherwise we have to mention the Python interpreter explicitly:

#python myscript

It is also common in a Linux environment to have multiple versions of the Python interpreter.

The Print Statement
- We can give it arithmetic operations
- We have to use " " or ' ' to print strings

- On the Python interpreter, enter help()
help> keywords (to list out the keywords)

Data types

  1. string
  2. int
  3. float
  4. lists
  5. tuples
  6. dictionary

To find the type of a value in a variable, use the type() function:

ExampleVariable = "Unni"
type(ExampleVariable)
<type 'str'>

Arithmetic Precedence

PEMDAS – Parentheses, Exponents, Multiplication/Division, Addition/Subtraction

1. Concatenation using +

Example I:
#print "first", "second"
first second
Example II:
#print "first" + "second"

2. String Multiplication

#print "Hello"*3
HelloHelloHello
#print "Hello\n"*3
Hello
Hello
Hello
#print "Hello\t"*3
Hello Hello Hello

Keyboard Inputs
Python uses 2 functions for this purpose:
1. input()     : used to collect integers
2. raw_input() : used to collect any data type

message = raw_input("What is ur message? ")
var = input("Enter an integer? ")

String Manipulation

#message = "new string"

2. String extraction

#print message[0]

A range of the string can be printed using ":" (slicing), e.g. message[0:3].
NOTE: Strings are immutable, i.e. once assigned they cannot be changed. You can create a new variable with a variation of the string, but you cannot change the string itself.
Example: message[0] = 'm' will throw an error.

Importing the String Module

import string

1. Change all to upper case

#string.upper(message)
'NEW STRING'

2. Capitalize the first word

#string.capitalize(message)
'New string'

3. Capitalize each word

#string.capwords(message)
'New String'

4. Split & Join

#string.split(message)
['new', 'string']

#string.join(['new', 'string'])
'new string'
NOTE: the list is converted to a string provided it is a pure list, not a nested list.

This is important for creating lists and storing values in them (useful for log analysis).

A last example – Python script:

import string
message = "new string"
print message, "contains", len(message), "characters"
print "The first character in message is", message[0]
print "Example of slicing", message, message[0:4]
for letter in message:
    print letter


Lists allow storing values (strings, integers, etc.) separated by ','.
Lists are often used in conjunction with strings, e.g. by splitting strings.

Define an empty list:

numlist = []

Define a List with values:

#numlist = [1,2,3,4,5]

#print numlist

There are some methods that operate on lists.
1. Reverse Method

numlist.reverse()
#print numlist
[5, 4, 3, 2, 1]

numlist2 = [6,7,8,9,10]

2. Append Method

numlist.append(numlist2)
#print numlist
[5, 4, 3, 2, 1, [6, 7, 8, 9, 10]]

Now numlist is a nested list.

#print numlist[0]
5

But at index 5 there is a list, not an integer; hence when we enter

print numlist[5]

we get the whole inner list [6, 7, 8, 9, 10]. To address each element in the nested list we have to use two indices, such as:

#print numlist[5][0]
6

3. Pop Method
The pop function removes the last value, in last-in-first-out fashion; this treats the list as a stack.

numlist.pop()
[6,7,8,9,10] is removed

#print numlist
[5, 4, 3, 2, 1]

4. Extend Method
Used to grow an existing list into one flat list.

numlist.extend(numlist2)
#print numlist
[5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

5. Insert Method
Used to insert a value.
Syntax: insert(index, value)

numlist.insert(0, 100)   # e.g. insert 100 at index 0
print numlist

How all of these are normally used is in log processing: logs are read in using Python, split using string functions, converted into lists, then manipulated with list functions, e.g. dropping the 1st column and keeping the 5th, to form a modified version of the log.

Range Function
range is used to create a sequence of numbers, starting from 0 by default.

numlist3 = range(11)
print numlist3


numlist3 = range(1,11)      # to mention a start and stop in the range function
print numlist3


numlist3 = range(0,100,40)  # to mention an increment (step) value of 40
print numlist3

List Using Strings

stringlist = ["Unni","SR"]
print stringlist

All the list methods used above also apply here.

stringlist.reverse()
print stringlist
['SR', 'Unni']

stringlist2 = ["abc", "def"]
stringlist.append(stringlist2)
print stringlist
['SR', 'Unni', ['abc', 'def']]

print stringlist[0]
SR

print stringlist[2][0]
abc

print stringlist.pop()
['abc', 'def']

stringlist.extend(stringlist2)
print stringlist
['SR', 'Unni', 'abc', 'def']

stringlist.insert(2, "Hi")
print stringlist
['SR', 'Unni', 'Hi', 'abc', 'def']

An Example Scenario:
Pass a line of an Apache log to a variable, say:

logfile = "20040122 80 GET index.html"
import string
logfile2 = string.split(logfile) # this converts the value of logfile into a list
print logfile2
['20040122', '80', 'GET', 'index.html']

Now we can check the data types of logfile and logfile2:

type(logfile)   -> <type 'str'>
type(logfile2)  -> <type 'list'>

This is important, as now in logfile2 all the list methods are available, like appending, popping, etc.

All the values of the list variable can now be accessed using an index.

print logfile2[0]
20040122

print logfile2[0:2], or logfile3 = logfile2[0:2]
['20040122', '80']

Finally we have to convert back to string

logfile4 = string.join(logfile3)
print logfile4
20040122 80
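The whole split → slice → join pipeline above can be written in one place. This sketch uses Python 3 string methods, whereas the tutorial above uses the Python 2 string module; the logic is identical:

```python
# Split a log line into fields, keep only the wanted columns, join back.
logfile = "20040122 80 GET index.html"
fields = logfile.split()      # string -> list
wanted = fields[0:2]          # keep only the first two columns
result = " ".join(wanted)     # list -> string
print(result)                 # 20040122 80
```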



Tuples are immutable, unlike lists. A tuple is like a read-only version of a list.


#products = ["doll", "bag", "cake"]
#products2 = ("doll", "bag", "cake")

#print products
['doll', 'bag', 'cake'] – this is a list.

#print products2
('doll', 'bag', 'cake') – this is a tuple, which is immutable.

#type(products2)
<type 'tuple'>

Tuples can be used in dictionaries, because dictionary keys must be immutable.

Example of Dictionaries

#items = {'Apple':85, 'Orange':50}

NOTE: Keys have to be unique.

#print items
{'Apple': 85, 'Orange': 50}

To add a new key-value pair:

#items['Grape'] = 55

To change the value of an existing key:

#items['Apple'] = 90

#print items
{'Apple': 90, 'Orange': 50, 'Grape': 55}

items is a dictionary object in the object-oriented world of Python; therefore there are methods that can be used on dictionaries.

To list all keys in a dictionary object:

#items.keys()

To list all values in a dictionary object:

#items.values()

To delete a key:

#del items['Apple']

#items.keys()
['Orange', 'Grape']

#print items
{'Orange': 50, 'Grape': 55}

To iterate through all key-value pairs in a dict using a for loop:

for k,v in items.iteritems():
    print k,v

Orange 50
Grape 55

Now the value in a key-value pair of a dictionary can also be a LIST, i.e.:

#suiteprice = [100,200]
print suiteprice

items['Suite'] = suiteprice
print items
{'Suite': [100, 200], 'Orange': 50, 'Grape': 55}

Iterate using a for loop:

for k,v in items.iteritems():
    print k,v

Suite [100, 200]
Orange 50
Grape 55


IF Statement syntax

if min < max:   # conditions can be extended using 'and', 'or', etc., e.g. min < max and min == max
    print "abc"
    print "def"

if min == max:
    print "abc"
elif min < max:
    print "def"


SYS.ARGV

Usage I
for i in sys.argv:
    print i

Usage II
arglen = len(sys.argv)

Usage III
print sys.argv[0]   # this prints the script file name on screen

Combining Numbers and Strings

Take a look at this program, and see if you can figure out what it’s supposed to do.

print ("Please give me a number: ",)
number = raw_input()

plusTen = number + 10
print ("If we add 10 to your number, we get " + plusTen)
This program should take a number from the user, add 10 to it, and print out the result. But if you try running it, it won't work! You'll get an error that looks like this:
Traceback (most recent call last):
File "", line 7, in
print "If we add 10 to your number, we get " + plusTen
TypeError: cannot concatenate 'str' and 'int' objects

What’s going on here? Python is telling us that there is a TypeError, which means there is a problem with the types of information being used. Specifically, Python can’t figure out how to reconcile the two types of data that are being used simultaneously: integers and strings. For example, Python thinks that the number variable is holding a string, instead of a number. If the user enters 15, then number will contain a string that is two characters long: a 1, followed by a 5. So how can we tell Python that 15 should be a number, instead of a string?

Also, when printing out the answer, we are telling Python to concatenate together a string (“If we add 10 to your number, we get “) and a number (plusTen). Python doesn’t know how to do that — it can only concatenate strings together. How do we tell Python to treat a number as a string, so that we can print it out with another string?
Luckily, there are two functions that are perfect solutions for these problems. The int() function will take a string and turn it into an integer, while the str() function will take an integer and turn it into a string. In both cases, we put what we want to change inside the parentheses. Therefore, our modified program will look like this:

print ("Please give me a number:",)
response = raw_input()

number = int(response)
plusTen = number + 10

print ("If we add 10 to your number, we get " + str(plusTen))

Another way of doing the same is to add a comma after the string part and then the number variable, like this:

print ("If we add 10 to your number, we get ", plusTen)

or use special print formatting like this:

print ("If we add 10 to your number, we get %s" % plusTen)

or use format()

print ("If we add 10 to your number, we get {0}".format(plusTen))

That's all you need to know about strings and variables!

The input and raw_input functions accept a string as a parameter. This string is displayed at the prompt while waiting for the user input.
The difference between the two is that raw_input returns the data coming from the input device as a raw string, while input evaluates the data as Python code. This is why using input to get a string value from the user returns an error: the user would need to enter the string with quotes.
It is recommended to use raw_input at all times and use the int function to convert the raw string into an integer. This way we do not have to bother with error messages until the error-handling chapter, and we do not create a security vulnerability in the code.
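The recommended pattern can be sketched non-interactively. A fixed string stands in for what the user would type (Python 2's raw_input() is named input() in Python 3, and both return a string):

```python
# Read raw text, then convert explicitly with int() before doing arithmetic.
response = "15"               # stands in for raw_input("Please give me a number: ")
number = int(response)        # str -> int
plusTen = number + 10
print("If we add 10 to your number, we get " + str(plusTen))
```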

A list is like clay: you can mould it into any form.
A string is like stone: you cannot modify it, only create temporary variations of it.

More on SYS.ARGV

It is used to collect the arguments that are entered while executing the script
Example I

import sys
# To check the number of arguments
if len(sys.argv) < 3:
    print "At least 2 arguments are required"
# To list out all arguments
for i in sys.argv:
    print i
# To print the name of the script
print sys.argv[0]


#To print all characters in a string

name = "unni"
for var in name:
    print var

#To print all values in a List

list = ["apple",33,"orange",55]
for var in list:
  print var

#To form a list dynamically from a log file

logfile = ["a b c d e f g", "1 2 3 4 5 6 7"]
logfile2 = []
for i in logfile:
    logfile2.append(i.split())
print logfile2

Similarly sys.exit() function can be used to exit a program.



Example I – Counting Loop

count = 0
while count <= 10:
  print count
  count = count + 1

Example II – Indefinite Loop

while 1:
  print "Indefinite Loop"


The open function is used to open a file. There are several modes in which a file can be opened.
They are: r (read), rb (read binary), w (write), wb (write binary), a (append), r+ (read/write).
The output of open() is stored in a variable; that variable is a file object.
Example:

han1 = open("data1", "r")
print han1

Now han1 is a file object which has functions of its own, like:
1. readline() – reads the first line (up to \n) into a string
   print han1.readline()
2. read() – reads the entire file into one string, unless a number of characters is specified
   print   # prints up to the 50th character
3. readlines() – reads one line per list element
   print han1.readlines()

File Handlers
han1 = open("data1", "r")
han2 = open("data2", "w")

We can combine both file handlers. Here the data2 file will be created if it doesn't exist, or overwritten if it already exists.

Example Script

#2 file handlers - han1=read, han2=write
han1 = open("data1", "r")
han2 = open("data2", "w")
tmpread1 = han1.readlines()
for i in tmpread1:
    han2.write(i)

Functions for a write file object:
1. write() – writes strings.
2. writelines() – writes multiple lines (a list of strings) into a file; no need for a FOR loop.

When we want to write to a file from variables, the write() function only accepts strings, so some formatting is required.
The % operator, when applied to strings, performs formatting:
%s – string, %d – integer, %f – float

Example Program

han1 = open("file2.txt","w")
product = "Apple"
cost = 332
count = 1
han1.write("%s %d %d\n" % (product,cost,count))

What we have mentioned after the % is a tuple.
Running Linux Commands in Python

os.system allows running Linux commands:

import os
os.system("echo Hi")

commands.getoutput allows storing a Linux command's output in a variable:

import commands
x = commands.getoutput("echo Hi")
print x

pass means do nothing:

print "Hi Man"
while 1:
    pass

The above Python program will execute indefinitely until it gets a keyboard interrupt.

Stack Optimization for Magento

Caching is always the First Step towards performance enhancement.

Why use Redis for Magento?

Let's have a look at how Magento caching (which uses the Zend_Cache library) works.
Each cache entry consists of the following information

  • the cached data
  • a cache key (or ID), that uniquely identifies this entry and is used to retrieve the data from the cache
  • a cache lifetime, after which the cache entry expires
  • zero or more cache tags

On the cache management page, most of the cache tags used by Magento are listed. Depending on the modules and extensions installed there could be more or fewer tags, e.g. CONFIG, LAYOUT_GENERAL_CACHE_TAG, BLOCK_HTML, TRANSLATIONS, FULL_PAGE_CACHE, …

Magento offers several different options for what to use as cache storage, and each of these storage systems is used by means of a PHP class called a "cache backend".

By default, cache data is stored in files (located in the directory var/cache/).
Another option is the DB, which is not recommended, as it would cause more harm than good.
Memory is the ideal option in this scenario, except for one problem: APC and memcached only support storing simple key-value pairs, so the cache tags are lost! This renders the whole caching rather useless, because every time we need to clear only one part of the cache, EVERY cache entry is cleared.

But do not despair! The Zend Framework contains a solution to this problem: a special cache backend called Twolevels. The Twolevels backend uses a fast cache backend (e.g. APC or memcached) for the cache data, and a slow backend (e.g. files or database) for the lifetime and cache tag information. This way we can have the best of both worlds!

Now that we use Redis – it is capable of storing key-value pairs as well as cache tags, with APC used only for caching compiled PHP code. And since Magento makes heavy use of cache tags, the effect is quite noticeable, depending on the number of records in the cache. During migration I faced an issue of Magento pointing to the old DB, which can be avoided using the following redis-cli command:

#redis-cli -h <IP> flushdb

Magento makes it easy to configure all this by the way – have a look at the file app/etc/local.xml.additional for further information.
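As a sketch, the cache section of app/etc/local.xml configured for Redis might look like this (the backend class name is from the Cm_Cache_Backend_Redis extension commonly used with Magento; the host, port, and database values are placeholders):

```xml
<config>
  <global>
    <cache>
      <!-- Redis cache backend; cache tags are stored natively in Redis sets -->
      <backend>Cm_Cache_Backend_Redis</backend>
      <backend_options>
        <server>127.0.0.1</server> <!-- placeholder Redis host -->
        <port>6379</port>
        <database>0</database>     <!-- the DB you would clear with redis-cli flushdb -->
      </backend_options>
    </cache>
  </global>
</config>
```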

Install the DisableLog Magento Extension
By default, for each and every request, Magento puts an entry in the DB (updating 5 tables). This fills up the database slowly and also gives a slightly slower page load. And if you're using external site-statistics tools like Google Analytics or Piwik, this logging is pointless. Disable it using this extension.

Mysql DB Caching  [Not Implemented]
query_cache_size=64M   ,   query_cache_limit=2M

They have a list of settings there for my.cnf as follows:

key_buffer = 512M
max_allowed_packet = 64M
table_cache = 512
sort_buffer_size = 4M
read_buffer_size = 4M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 64M
tmp_table_size = 128M
query_cache_size = 96M
query_cache_type = 1
thread_cache_size = 8
max_connections = 400
wait_timeout = 300
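If the query cache is later enabled, its effectiveness can be judged from the server's status counters. This is a diagnostic sketch to run in the mysql client:

```sql
-- A healthy cache shows Qcache_hits well above Qcache_inserts
-- and few Qcache_lowmem_prunes (frequent prunes mean the cache is too small).
SHOW GLOBAL STATUS LIKE 'Qcache%';
SHOW GLOBAL VARIABLES LIKE 'query_cache%';
```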

APC for PHP Opcode Cache

Following are the settings in the /etc/php/php.d/apc.ini file
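The original settings are not reproduced here; as an illustrative sketch (the values are assumptions, though the directive names are standard APC ones), an apc.ini for a Magento box might contain:

```ini
apc.enabled = 1
apc.shm_size = 256M        ; shared memory for the opcode cache; tune using apc.php
apc.num_files_hint = 10000 ; Magento ships thousands of PHP files
apc.ttl = 7200
apc.stat = 1               ; set to 0 only if you clear the cache on every deploy
```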

Find the apc.php file at /usr/share/php/apc.php and copy it into your web server root. This file will help you determine how effectively APC is working on your system and whether you need to make adjustments to the amount of RAM you make available to APC.

Apache – Setting MaxClients
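The usual sizing rule for the prefork MPM is MaxClients ≈ (RAM you can spare for Apache) / (average Apache process size). A sketch with assumed numbers (2000 MB spare, ~40 MB per mod_php process, so about 50 clients):

```apache
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           50
    MaxRequestsPerChild 4000
</IfModule>
```

Setting MaxClients too high lets Apache outgrow physical RAM and push the box into swap, which is far worse than queueing a few requests.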

Apache – Enable Expires Headers

Browsers use caching extensively, and can save a lot of the elements included in a web site locally so that they can be served from the browser’s cache rather than the web server on the next request. This can help quite a bit in shortening load times. The problem is for the browser to know when a file can be served from the cache, and when not – because the local copy is outdated.  To solve this issue, browsers rely on two HTTP headers, Expires and Cache-Control.

Magento’s default .htaccess file already configures these according to Yahoo’s performance recommendations (more on them below), but does not enable them by default. To enable them, all you need to do is add the following lines to your Apache server configuration (usually found in /etc/apache2/apache.conf):

<IfModule mod_expires.c>
ExpiresActive On
# illustrative default: far-future expiry for cacheable content
ExpiresDefault "access plus 1 year"
</IfModule>

Remove unnecessary apache modules
Example :

LoadModule proxy_connect_module modules/

Apache may not have any configuration related to proxying, yet the module is still loaded on the spawning of each Apache process. Comment out or remove such LoadModule lines to disable them.

Remove unnecessary php modules for performance & security

#mv /etc/php.d/sqlite3.ini /etc/php.d/sqlite3.disable

Other compiled-in modules can only be removed by reinstalling PHP with a reduced configuration. You can download the PHP source code and compile it as follows with GD, FastCGI, and MySQL support:

./configure --with-libdir=lib64 --with-gd --with-mysql --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --cache-file=../config.cache --with-config-file-path=/etc --with-config-file-scan-dir=/etc/php.d  --enable-fastcgi --enable-force-cgi-redirect

Use mod_deflate in apache configuration

Check whether compression is enabled across multiple web browsers:
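A minimal sketch of enabling compression, assuming mod_deflate is already loaded (the MIME-type list is illustrative). Whether it works can then be verified by requesting a page with an Accept-Encoding: gzip header (e.g. curl -sI -H 'Accept-Encoding: gzip' against your site) and looking for Content-Encoding: gzip in the response:

```apache
<IfModule mod_deflate.c>
    # compress text-based responses only; images and archives are already compressed
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript application/json
</IfModule>
```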

Use FastCGI to run PHP [Not Implemented]

If you are using Apache as your web server, there are two ways you can sensibly set up PHP. The first way is to use the mod_php module, which is the easiest to use and hence the default with many hosting providers. In this case, a PHP interpreter runs within each process of the web server and stands by until a script is executed.

If you are wondering why your Apache needs a lot of memory, probably the fact that you are using mod_php is a big part of the answer. On a large site, there may well be hundreds of Apache processes running, and each has its own PHP interpreter. However, only very few of them – often less than one in ten – actually need to run PHP. The rest serve static files or simply wait for new requests. Because PHP uses a lot of memory, it is a good idea to see if you can avoid the overhead generated by having dozens and dozens idle PHP processes running.

The way to avoid the overhead is to use FastCGI instead of mod_php. With FastCGI, a separate internal daemon runs on your web server and is contacted by the web server only when execution of PHP is required. Thus you do not need to carry the PHP baggage for all requests.

Setting up FastCGI requires you to make some changes to your server configuration, but the benefits will be large. For more details, check the Apache web site and the documentation of your distribution.
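A sketch of what such a setup can look like with mod_fcgid (the directive names are mod_fcgid's; the wrapper script path is hypothetical and would invoke php-cgi):

```apache
<IfModule mod_fcgid.c>
    AddHandler fcgid-script .php
    FcgidWrapper /usr/local/bin/php-wrapper .php   # hypothetical wrapper calling php-cgi
    FcgidMaxProcesses 20                           # cap the pool of PHP daemons
    FcgidMaxRequestsPerProcess 500                 # recycle processes to contain leaks
</IfModule>
```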

Alternative: Turn Off Keep-Alive if you have to use mod_php [Not Implemented]

If you cannot, or do not want to, switch to FastCGI, you can still do something to reduce the memory usage per visitor. This is important because each server has only so much memory, and if you can serve a visitor with less memory, you can serve more visitors in total and scale further with given resources.

As I pointed out above, the big problem with mod_php is that you need to keep a PHP interpreter running with each Apache process, and that most Apache requests do not involve PHP. By default, Apache enables a feature called HTTP Keep-Alive, which lets visitors re-use a connection for several requests. This makes the web site faster for the visitor, because images and other static files can be loaded without continuously re-connecting to the web server. But it also means that your web server will have many idle processes, waiting for new requests that may never arrive. And each of these idle processes runs a full PHP interpreter.

To turn off Keep-Alive, search your Apache configuration files for the KeepAlive directive. If the directive is not set, add the following line to your config file

KeepAlive off
and then restart Apache. If it is set, ensure that it is set to “off“. You should start to see lower memory usage immediately.

There are a few other downsides to mod_php though: you're forced to use Apache's less efficient prefork MPM, and your Apache processes will get large. Serving static files with 32 MB+ Apache processes isn't very efficient.

What is the difference between PHP CGI and PHP FastCGI?

When PHP is run as a CGI program, the web server spawns a process each time a request is made. This can be less efficient unless your operating system is good at spawning processes quickly. Unlike mod_php, a CGI application won't bloat your Apache, so the processes can remain small for serving static files – important for high-traffic sites.

When PHP is run as FastCGI, a number of daemon processes are started and sit waiting for the web server to request them to run PHP code. This avoids the cost of spawning new processes each time, in addition to not bloating web server processes. Due to the nature of long-running processes, FastCGI may require some periodic process maintenance. Lighttpd's spawn-fcgi script provides a method to limit the number of requests a single FCGI process is allowed to serve, in an attempt to reduce the effect of memory leakage.

OS Optimizations

Select the partition which holds the document root, NFS mounts, etc.
Edit the /etc/fstab file, add “noatime” to the options in the fourth column, and remount everything with the mount -a command.
Now the file access time isn't updated after every access, giving us a small performance gain.
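For example, a hypothetical fstab entry for the document-root partition with noatime added in the fourth (options) column:

```
# <device>   <mount point>  <type>  <options>         <dump> <pass>
/dev/sda3    /var/www       ext4    defaults,noatime  0      2
```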

NFS I/O – Diagnostic
If using NFS, check its metrics with the nfsiostat command:

op/s        rpc bklog
1.91        0.00
read:             ops/s           kB/s          kB/op        retrans        avg RTT (ms)    avg exe (ms)
0.005       0.116      25.601        0 (0.0%)       3.681       3.730
write:            ops/s           kB/s          kB/op        retrans        avg RTT (ms)    avg exe (ms)
0.004       0.011       2.840        0 (0.0%)      1107.043       6.886

Here the avg RTT for writes is very high, so we can say that the write performance is poor. This looks like network or NFS server latency, because “avg RTT” should be much lower.
“avg RTT”: the network + server latency of the request
“avg exe”: the total time the request spent from init to release

When mounting an NFS export on an NFS client one can specify NFS mount options that over-ride the defaults. e.g.
#mount -t nfs -o intr,rsize=65536,wsize=65536,noacl,nocto,nodiratime nfsserver:/mnt/export /mnt/nfsclient-mountpoint/
Here’s a list of important NFS mount options:

intr: Use the intr mount option when using the default hard mount option. It allows signals to interrupt file operations, thus allowing recovery from what appears to be an NFS hang.

rsize: The typical default read size transferred in a packet is 32768 bytes. Increasing this value may increase performance depending on the size of the data being read. Recommended values for this parameter are powers of 2 (4096, 8192, …). Large values may not work with NFS version 2. Note: setting this to a value less than the largest supported block size will adversely affect performance. (max value = 65536)

wsize: The typical default write size transferred in a packet is 32768 bytes. Increasing this value may increase performance depending on the size of the data being written. Recommended values for this parameter are powers of 2 (4096, 8192, …). Note: setting this to a value less than the largest supported block size will adversely affect performance. (max value = 65536)

noatime / relatime: To avoid updating the inode's access time, use noatime. Alternately, use relatime to update the access time only when it is earlier than the modify or change time of the inode.

noacl: Disables Access Control List (ACL) processing.

nocto: Suppresses the retrieval of new attributes when creating a file.

nodiratime: Disables updating of the directory access time. This is the directory equivalent of noatime.

noac: Disables all forms of attribute caching entirely. This exacts a significant performance penalty, but it allows two different NFS clients to get reasonable results when both are actively writing to a common export on the server.

Current status on NFS
The amount and size of data that the server and the client use for passing data between them is very important.
Use df -hT to get the NFS server IP, then run the following to see the current mount options:

#grep <IP> /proc/mounts
/DoNotDelete/media nfs4 rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=,local_lock=none,addr= 0 0

Decreasing the read and write size of RPC packets increases the total number of IP packets that need to be sent over the network.

This means that if you have 1 MB of data, dividing it into equal chunks of 32 KB produces more chunks than dividing it into equal chunks of 64 KB. So if you decrease these values you have to send more IP packets over the network, and if you increase them you send fewer.

So the decision on modifying these parameters must always depend on the network's capability. If you have a 1 Gigabit port on your NFS server and client, and the network switches connecting these servers are also capable of 1G ports, then I would suggest tweaking these parameters to a higher value.
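The chunk arithmetic above can be sketched as a quick calculation (sizes in KB):

```shell
# Number of RPC round-trips needed to move 1 MB of data at a given rsize/wsize
data_kb=1024
for sz in 32 64; do
  echo "${sz}KB chunks: $(( data_kb / sz )) RPCs"
done
# → 32KB chunks: 32 RPCs
# → 64KB chunks: 16 RPCs
```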

Maximum Transfer Unit (MTU) of Network Interface
Running the command “tracepath <nfs-server>” from the NFS client shows the current path MTU. The MTU of the network card should be set equal to the path MTU obtained with the tracepath command. This should help reduce the number of dropped packets. For reference, see the ifconfig man pages.
Security Best Practice for the PHP App

More into Magento File System

1.Dir : magento/app
This is your Magento application directory. This is the directory where the application bootstrap file Mage.php (which defines the final class Mage) is stored.

2.Dir : magento/app/code
This is your Magento code directory. This is the base directory for the three Magento code pools (core, community, local).

3.Dir : magento/app/etc
The etc folder is where Magento stores system level (as opposed to module level) configuration files. The name etc is borrowed from the *nix family of operating systems, and Magento’s configuration files are all XML based.

File : magento/app/etc/local.xml – this is where DB credentials, session storage (memcache/DB), the cache backend (e.g. Redis), etc. are configured.

4.Dir : magento/media
Magento’s media folder is where media files (images, movies, etc.) related to data (products) are stored.
This dir is chosen as the shared directory in a multi-server environment.

5.Dir : magento/skin
The skin folder contains images, CSS, and JavaScript files used by your themes. This is not the only folder where you’ll find images, CSS, or JavaScript though; this folder is meant for files that are customized per theme.

6.Dir : magento/var
The var folder is another one borrowed from the *nix world. The var stands for Variable files, and is intended to store files which are expected to change during normal system operations.

7.Dir : magento/var/cache
Magento, rather famously, makes heavy use of caching for activities that might bog down the system if they had to be performed every time a page loads. For example, layout XML files are merged once, and then the tree is cached so they don’t need to be merged again. The cache folder is one place where Magento will store these cached results.

8.Dir : magento/var/log
Magento’s log folder is where is stores the system and exception logs. These logs can be turned on from the Admin Panel’s section. The apache/web-server user will need write permission on this folder and the files therein.
System -> Configuration -> Developer -> Log Settings

9.Dir : magento/var/session
During installation you have the option of storing user sessions on disk, or in the database. The session folder is where the user sessions are written out to and read from if you choose to store them in the filesystem.

10.Dir : magento/media/upload
There are a number of Admin Panel features which allow you to upload media files (default logos, etc.). The upload folder is where Magento stores these files.

More Steps
1. Increase Ulimit

2. Go to: System->Configuration->CATALOG/Catalog->Frontend

Use Flat Catalog Category: Yes

Use Flat Catalog Product : Yes

Go to: System->Configuration->ADVANCED/Developer: Merge javascript and CSS
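Step 1 above (raising the ulimit) can be sketched as follows; the 65535 value and the apache user name are illustrative assumptions:

```shell
# Raise the per-process open-file limit for the current shell
ulimit -n 65535 2>/dev/null || true   # raising the hard limit may require root
echo "open files limit: $(ulimit -n)"
# To make it permanent, add lines like these to /etc/security/limits.conf:
#   apache  soft  nofile  65535
#   apache  hard  nofile  65535
```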


To change the domain name of a Magento site, we need to update a table called core_config_data.
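A sketch of that update (the config paths web/unsecure/base_url and web/secure/base_url hold the site's base URLs; the domain is a placeholder, and backing up the table first is advisable):

```sql
UPDATE core_config_data
   SET value = 'http://www.new-domain.example/'
 WHERE path IN ('web/unsecure/base_url', 'web/secure/base_url');
```

Remember to flush the Magento cache afterwards so the old URLs are not served from cache.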

Identify Memory Leak

A memory leak, technically, is an ever-increasing usage of memory by an application.

With common desktop applications, this may go unnoticed, because a process typically frees any memory it has used when you close the application.

However, in the client/server model, memory leakage is a serious issue, because applications are expected to be available 24×7. Applications must not continue to increase their memory usage indefinitely, because this can cause serious issues. To monitor such memory leaks, we can use the following commands.

$ ps aux --sort pmem

root         1  0.0  0.0  1520  508 ?        S     2005   1:27 init
inst  1309  0.0  0.4 344308 33048 ?      S     2005   1:55 agnt (idle)
inst  2919  0.0  0.4 345580 37368 ?      S     2005  20:02 agnt (idle)
inst 24594  0.0  0.4 345068 36960 ?      S     2005  15:45 agnt (idle)
root 27645  0.0 14.4 1231288 1183976 ?   S     2005   3:01 /TaskServer/bin/./wrapper-linux-x86-32

In the above ps command, the --sort option puts the highest %MEM at the bottom. Note down the PID with the highest %MEM usage, then use the ps command to view all the details about that process ID and monitor the change over time. You have to repeat this manually, or run it from cron and redirect the output to a file.

$ ps ev --pid=27645
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32

$ ps ev --pid=27645
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32

Note: In the above output, if RSS (resident set size, in KB) increases over time (so would %MEM), it may indicate a memory leak in the application.
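The repeat-and-log step can be sketched as a one-shot sampler suitable for cron (the target PID is hypothetical; $$, this shell's own PID, is used for the demo):

```shell
# Sample the resident set size (KB) of a process; append the output to a log
# file from cron to watch for steady RSS growth over time.
pid=$$                                   # replace with the PID under suspicion
rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
echo "$(date '+%F %T') PID $pid RSS: ${rss_kb} KB"
```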