Saturday, 30 March 2013

IT Lab Session 10

Assignment 1

Create three vectors x, y and z of equal length, filled with random values.
Bind them into a matrix: T <- cbind(x, y, z)
Create a 3-dimensional plot of the matrix.



> sample<-rnorm(50,25,6)
> sample
 [1] 30.785023 31.702170 23.528853 18.208267 32.110218 35.820121 32.404731
 [8] 24.507976 14.959855 29.919671 27.677203 17.108632 27.514712 20.260337
[15] 26.557483 30.048945 23.540832 15.833124 29.411549 27.037098 29.744451
[22] 28.901576 31.999236 32.641413 24.628705 27.263692 32.895669 27.046758
[29] 20.699581 32.417177 20.637992 20.448817 29.045200  9.706208 19.479191
[36] 19.214362 30.487007 41.029803 26.190709 24.989519 28.134211 25.319421
[43] 22.595737 27.045515 20.529657 36.455755 31.249895 19.290580 24.701767
[50] 24.621257
> x<-sample(sample,10)
> y<-sample(sample,10)
> z<-sample(sample,10)
> x
 [1] 30.45576 20.63799 23.52885 20.69958 41.02980 29.74445 31.24990 30.48701
 [9] 32.64141 15.83312
> y
 [1] 20.69958 22.59574 36.45576 30.48701 30.78502 32.64141 32.40473 24.50798
 [9] 24.98952 26.55748
> z
 [1] 27.03710 32.40473 27.04676 24.98952 30.04895 24.50798 36.45576 29.04520
 [9] 19.29058 30.78502
> T<-cbind(x,y,z)
> T
             x        y        z
 [1,] 30.45576 20.69958 27.03710
 [2,] 20.63799 22.59574 32.40473
 [3,] 23.52885 36.45576 27.04676
 [4,] 20.69958 30.48701 24.98952
 [5,] 41.02980 30.78502 30.04895
 [6,] 29.74445 32.64141 24.50798
 [7,] 31.24990 32.40473 36.45576
 [8,] 30.48701 24.50798 29.04520
 [9,] 32.64141 24.98952 19.29058
[10,] 15.83312 26.55748 30.78502
> library(rgl)
> plot3d(T)
 
> plot3d(T,col=rainbow(1000))
 
> plot3d(T,col=rainbow(1000),type='s') 
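The session above can be repeated reproducibly with a fixed seed. The following is only a sketch of the same steps: the seed value and the names pool_values, n and pts are my own choices, not part of the original session. It avoids calling a variable "sample" (which is also the name of the function used to draw from it) and "T" (which R also reads as shorthand for TRUE). The plot is attempted only in an interactive session with the rgl package installed.

```r
# Sketch: reproducible version of the steps above (seed chosen arbitrarily).
set.seed(42)
pool_values <- rnorm(50, mean = 25, sd = 6)   # 50 draws from a normal with mean 25, sd 6

# Draw three vectors of equal length from the same pool of values.
n <- 10
x <- sample(pool_values, n)
y <- sample(pool_values, n)
z <- sample(pool_values, n)
stopifnot(length(x) == length(y), length(y) == length(z))

# Bind the vectors column-wise into an n x 3 matrix with columns x, y, z.
pts <- cbind(x, y, z)

# plot3d() comes from the rgl package; skip the plot if it is not available.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(pts, col = rainbow(n), type = "s")  # one colour per point, drawn as spheres
}
```

Using rainbow(n) rather than rainbow(1000) gives exactly one colour per point.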
 
Assignment 2
Read the documentation of rnorm and pnorm.
Create 2 random variables.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Colour code and draw the graph
4. Smooth and best fit line for the curve
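As a quick illustration of the two functions named in the assignment: rnorm draws random values from a normal distribution, while pnorm returns the cumulative probability P(X <= q). The sketch below is an assumption-laden example rather than part of the lab: the seed, the cut-off of 110 and the names draws, p_theory and p_sample are all chosen for illustration. It checks that the share of simulated values below the cut-off is close to the theoretical probability.

```r
# Sketch: relate simulated draws (rnorm) to the theoretical CDF (pnorm).
set.seed(1)
draws <- rnorm(1500, mean = 100, sd = 10)

# Theoretical probability that a N(100, 10^2) value is at most 110 (one sd above the mean).
p_theory <- pnorm(110, mean = 100, sd = 10)

# Empirical share of the simulated values that are at most 110.
p_sample <- mean(draws <= 110)

print(c(theory = p_theory, sample = p_sample))
```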


> x<-rnorm(1500,100,10)
> y<-rnorm(1500,85,5)
> z1<-sample(letters,5)
> z2<-sample(z1,1500,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)
> library(ggplot2)
> qplot(x,y)
 
> qplot(x,z)
 
> qplot(x,z,alpha=I(1/10)) 
 
> qplot(x,y,geom=c("point","smooth")) 
 
> qplot(x,y,colour=z) 
 
> qplot(log(x),log(y),colour=z) 
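Two caveats about the session above. First, cbind(x, y, z) turns the factor z into its integer codes, so the category labels are lost in the matrix; a data frame keeps them. Second, qplot() has been deprecated in recent ggplot2 releases in favour of ggplot(). The sketch below shows one possible equivalent of the four plots; the seed and the names dat, p_xy, p_xy_by_z, p_colour and p_fit are hypothetical, and the ggplot2 part is skipped if the package is not installed.

```r
# Sketch: the same plots built with ggplot() on a data frame.
set.seed(7)
dat <- data.frame(
  x = rnorm(1500, mean = 100, sd = 10),
  y = rnorm(1500, mean = 85, sd = 5),
  # five randomly chosen letters, assigned at random to the 1500 rows
  z = factor(sample(sample(letters, 5), 1500, replace = TRUE))
)

# cbind() on a factor yields a numeric matrix of integer codes, not labels.
codes <- cbind(dat$x, dat$z)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_xy      <- ggplot(dat, aes(x, y)) + geom_point(alpha = 0.3)        # 1. X-Y
  p_xy_by_z <- p_xy + facet_wrap(~ z)                                  # 2. X-Y | Z, one panel per category
  p_colour  <- ggplot(dat, aes(x, y, colour = z)) + geom_point()       # 3. colour coded by z
  p_fit     <- p_xy +                                                  # 4. smooth curve and best-fit line
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red")
}
```

Printing any of the objects, for example print(p_fit), draws the corresponding plot.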
 

Sunday, 24 March 2013

ITBAL Assignment Session 9 : Data Visualisation Tools


 

Data Visualisation Tool : visual.ly


Getting through a job interview in today's competitive world is all about selling yourself properly and, more importantly, uniquely. One application that helps with this is provided by visual.ly, which builds visual, pictorial resumes. It offers data visualisation in a basic yet effective and distinctive manner.

With the widespread use of the internet and the advent of online or digital marketing, it's a good idea to provide a visual alternative to the same boring Word documents or PDFs. Although a pictorial resume might sound unprofessional, it might just turn out to be your USP. Also notable is the ease with which the app interfaces with its data source: it fetches the relevant information directly from the LinkedIn profile to which it is mapped. Once the information is fetched, the visual resume is available within a few seconds. Equally impressive is the flexibility and variety of choices that visual.ly offers users in selecting the format of the resume.

Since the software is in the cloud and the service is available on demand, there are no prerequisites to install. Logging in over the internet is enough to use the service.

Thus, although very basic, the pictorial resume app by visual.ly has a very effective and attractive use in the field of data visualisation. "To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose: to communicate information".

Visual.ly was founded by Stew Langille, Lee Sherman, Tal Siach, and Adam Breckler in 2011. Besides online resume-making, users can use visual.ly to create and publish their own data visualizations. Through this tool, users can gather information from databases and APIs in an automated service to produce an infographic.

A few notable USPs of the visual.ly resume creator:
Simple, user-friendly interface
Numerous options for the visual presentation of different types of data
The full tool is available online, in the cloud
It is fast
Results are attractive and elegant
Options to retain the data and access it later are available

Data Visualisation Tool URL: visual.ly (http://visual.ly/)

Friday, 15 March 2013

IT BAL - Session 8, Mar 12, Panel Data


Problem: 

Perform Panel Data Analysis of "Produc" data

Solution:

Three models for panel data analysis:

Pooled Effects Model
Fixed Effects Model
Random Effects Model

Functions used:

pFtest: for choosing between the fixed and pooled models
plmtest: for choosing between the pooled and random models
phtest: for choosing between the random and fixed models

Loading the data : 

library(plm)
data(Produc, package = "plm")
head(Produc)
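Before fitting any model it can help to confirm the panel structure, that is, how many states and years there are and whether every state is observed in every year. The following is a minimal base R sketch, assuming the plm package is installed (it is skipped otherwise); the names n_states, n_years and balanced are illustrative only.

```r
# Sketch: inspect the structure of the Produc panel using base R only.
if (requireNamespace("plm", quietly = TRUE)) {
  data("Produc", package = "plm")

  n_states <- length(unique(Produc$state))   # number of cross-sectional units
  n_years  <- length(unique(Produc$year))    # number of time periods

  # A balanced panel has exactly one row for every state-year pair.
  balanced <- nrow(Produc) == n_states * n_years &&
    !any(duplicated(Produc[, c("state", "year")]))

  cat("states:", n_states, " years:", n_years, " balanced:", balanced, "\n")
}
```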

Pooled Effects Model

pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
            data = Produc, model = "pooling", index = c("state", "year"))
summary(pool)

Fixed Effects Model

fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
             data = Produc, model = "within", index = c("state", "year"))
summary(fixed)

Random Effects Model

random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
              data = Produc, model = "random", index = c("state", "year"))
summary(random)

Testing of Model

Hypothesis testing : 

H0: Null Hypothesis: the individual (state) and time effects are all zero
H1: Alternate Hypothesis: at least one of the individual or time effects is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Effects Model
Alternate Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)


Result:
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) 
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects 
Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Effects Model is preferred over the Pooled Effects Model.
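The F statistic reported by pFtest compares the residual sum of squares of the pooled model with that of the fixed effects model. The sketch below recomputes it by hand as a cross-check. It assumes the plm package is installed (and is skipped otherwise), refits both models so that it runs on its own, and uses names of my own choosing (f, rss_pool, rss_fixed, df1, df2, f_by_hand, f_plm).

```r
# Sketch: recompute the pooled-vs-fixed F statistic from residual sums of squares.
if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Produc", package = "plm")

  f <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
    log(gsp) + log(emp) + log(unemp)
  pool  <- plm(f, data = Produc, model = "pooling", index = c("state", "year"))
  fixed <- plm(f, data = Produc, model = "within",  index = c("state", "year"))

  rss_pool  <- sum(as.numeric(residuals(pool))^2)    # restricted model
  rss_fixed <- sum(as.numeric(residuals(fixed))^2)   # unrestricted model

  df1 <- pool$df.residual - fixed$df.residual        # number of restrictions tested
  df2 <- fixed$df.residual                           # residual df of the fixed model

  # F = ((RSS_pool - RSS_fixed) / df1) / (RSS_fixed / df2)
  f_by_hand <- ((rss_pool - rss_fixed) / df1) / (rss_fixed / df2)
  f_plm     <- unname(pFtest(fixed, pool)$statistic)

  print(c(by_hand = f_by_hand, pFtest = f_plm, df1 = df1, df2 = df2))
}
```

The two degrees of freedom printed here should correspond to the df1 and df2 values in the pFtest output above.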

Pooled vs Random

Null Hypothesis: Pooled Effects Model
Alternate Hypothesis: Random Effects Model

Command :
> plmtest(pool)

Result:

  Lagrange Multiplier Test - (Honda)
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects 

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Random Effects Model is preferred over the Pooled Effects Model.

Random vs Fixed

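Null Hypothesis: the individual effects are not correlated with the regressors, so the Random Effects Model is appropriate
Alternate Hypothesis: the individual effects are correlated with the regressors, so the Fixed Effects Model is appropriate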

Command:
 > phtest(fixed,random)

Result:

 Hausman Test
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent 

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Effects Model is preferred over the Random Effects Model.

Conclusion: 

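The Fixed Effects Model is best suited for the panel data analysis of the "Produc" data set.

Hence we conclude that each "state" has its own time-invariant effect that is correlated with the regressors, so the coefficients are best estimated from the variation over time within each state.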
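The sequence of three tests can be summarised as a small decision rule. The helper below is purely hypothetical (choose_panel_model is not a plm function, and the 5% significance level is an assumption); it only encodes the reasoning used above, taking the three p-values as inputs and returning the plm model name that the tests point to.

```r
# Sketch: hypothetical helper encoding the test sequence used above.
choose_panel_model <- function(p_ftest, p_lm, p_hausman, alpha = 0.05) {
  # Individual effects are present if either test against pooling rejects.
  effects_present <- (p_ftest < alpha) || (p_lm < alpha)
  if (!effects_present) {
    return("pooling")
  }
  # Hausman test: rejection means random effects is inconsistent, so use fixed.
  if (p_hausman < alpha) "within" else "random"
}

# All three p-values reported above are below 2.2e-16, used here as upper bounds.
chosen <- choose_panel_model(p_ftest = 2.2e-16, p_lm = 2.2e-16, p_hausman = 2.2e-16)
print(chosen)
```

With the values from this session the rule returns "within", which is the fixed effects model in plm's naming.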



Monday, 4 March 2013

WBC Assignment : Discussion on 'Astroturfing'


At a generic level, we are familiar with the term 'Astroturf' as the synthetic grass surface popular in sports such as hockey; basically, it means fake grass. Looked at more closely, however, astroturfing is a formal political, advertising or public relations campaign that seeks to create the impression of a spontaneous movement. It creates the impression of widespread support for a policy, individual or product when in reality little such support exists. Astroturfers are most often highly paid lobby groups or political operatives acting at the behest of an organization or individual with a stake in the campaign's success. Astroturfing can thus be undertaken by an individual pushing a personal agenda or by a highly organized group such as a political party.

A prominent example of an astroturfing campaign is the National Smokers Alliance, an early astroturf group created by Burson-Marsteller on behalf of tobacco giant Philip Morris. It worked to influence Federal legislation in 1995 by organizing mailings and running a phone bank urging people to call or write to politicians expressing their opposition to laws aimed at discouraging teens from starting to smoke. Another example is a pro-Kremlin group funding a vast network of online activists to create the illusion of widespread support for Vladimir Putin.