Saturday, December 27, 2008

Astonishing poetry: 2009 Come on, China!



I believe the author of this poem has a real sense of humor.

A simple interpretation:

The ''Da Wang'' (Tawang) district is part of the area disputed between China and India, and is now under Indian control. ''Zhong Shan Shi Tu'' 中山世土 here stands for the ''Diàoyútái Qúndǎo'' 钓鱼台群岛 in Chinese, or ''Senkaku Islands'' 尖閣諸島(せんかくしょとう) in Japanese. The phrase is an inscription that an emperor of the Qing Dynasty (China) bestowed on the King of the Ryūkyū Kingdom, 琉球國 (琉球王国 りゅうきゅうおうこく in Japanese). The islands are also a territory disputed between China and Japan.

Sunday, August 24, 2008

Mark 4

Chapter 7

"You have let go of the commands of God and are holding on to the traditions of men."

"You have a fine way of setting aside the commands of God in order to observe your own traditions!"

"Nothing outside a man can make him 'unclean' by going into him. Rather, it is what comes out of a man that makes him 'unclean.'"

"For from within, out of men's hearts, come evil thoughts, sexual immorality, theft, murder, adultery, greed, malice, deceit, lewdness, envy, slander, arrogance and folly."

There is a parable about children, dogs, and bread. "First let the children eat all they want, for it is not right to take the children's bread and toss it to their dogs."

"But even the dogs under the table eat the children's crumbs." It seems that the reason her daughter is possessed by a demon is that she is hungry.

Jesus healed people again.

Chapter 8

Jesus fed 4,000 people with seven loaves and a few small fish, after leaving the region of Tyre and Sidon.

When he went to Dalmanutha, he and his disciples were questioned by the Pharisees. He had to leave.

Jesus' disciples relied on Jesus so much that they did not care about their provisions. When Jesus told them to be careful about the yeast of the Pharisees and that of Herod, they became suspicious and scared that they might starve. Jesus told them they needed to be confident.

Jesus healed people again. It seems that Jesus healed people wherever he went, yet he was afraid of people finding out that he could heal even the hardest diseases.

When Jesus was at Bethsaida, most people thought of him as one of the earlier prophets.

Jesus told his disciples that he would be killed and rise three days later. Peter took him aside and began to rebuke him.

"If anyone would come after me, he must deny himself and take up his cross and follow me. For whoever wants to save his life will lose it, but whoever loses his life for me and for the gospel will save it. What good is it for a man to gain the whole world, yet forfeit his soul? Or what can a man give in exchange for his soul?"

Tuesday, August 19, 2008

Mark 3

Chapter 5

A herd of pigs was killed by Jesus, and such a herd could have been worth a fortune at that time. The owner of the pigs must have been angry on hearing that the pigs were possessed by demons and that the demons killed them. In some sense, the reason some people dislike Jesus is that some of the explanations for his behavior are too ridiculous to believe. It seems that Jesus did not compensate those who were not his followers for the damage done by the demons.

Jesus healed a very sick woman and raised a dead girl. It seems that what Jesus did for those who were not his followers was heal the sick. That is amazing, yet it is not easy to tell whether Jesus was a doctor.

Chapter 6

King Herod thought Jesus was John, who had been beheaded earlier, raised from the dead. John had interfered in the relationship between King Herod and Herodias, the wife of his brother Philip, so he was beheaded.

Five thousand people were fed by five loaves and two fish.

I do not understand why Jesus walked on the water. If he wanted to help his followers, he could simply have calmed the wind.

Saturday, May 31, 2008

Mark 2

Chapter 3 is about conflicts and Jesus' reaction

Jesus' enemies wanted to entrap him, so they brought a man with a shriveled hand before him on the Sabbath. To abide by the tradition, he could not heal the man. Jesus tried to appeal to his enemies' conscience, yet it did not work. He healed the man, and his enemies got their excuse to kill him. He withdrew to the lake and appointed twelve apostles to defend his church.

His family disliked what he was doing and tried to take charge of him. He denied his mother and brothers because they did not follow God.

Here is one difficulty for Christianity in China. Chinese tradition gives parents authority. It is hard for Chinese people to love God more than their parents.

Even though he was not in Jerusalem, his enemies condemned him as being possessed by an evil spirit. Jesus defended himself with several parables. Evil cannot drive out evil, and only the Holy Spirit can tie up evil. In his eyes, all evils must be the same, even though there are different kinds of evil.

Chapter 4 Parables

Sowing seeds and different results.

"For whatever is hidden is meant to be disclosed, and whatever is concealed is meant to be brought out into the open."

Jesus calms the storm.

Friday, May 30, 2008

Mark 1

"Mark" is a brief story of Jesus Christ. Because it is brief, it is a good introductory textbook for a beginner like me.

Chapter 1 shows that Jesus is omnipotent. He healed many and drove out evil spirits. He has great influence, since it seems that he could call disciples even before he had shown his power.

But there are two questions:

Jesus was baptized by another preacher, John. Why is there a John before Jesus? Is John a teacher of Jesus? It seems that he is not one of Jesus' disciples. Is he involved in Jesus' later activities?

Answer: John is said to be the one who prepared the way for Jesus. He is not a teacher of Jesus, nor is he a disciple of Jesus. Yes, he appears in the later events; see Luke.

In 1:35-39, Jesus left his disciples and prayed in a solitary place early in the morning. Why did he do that? When his disciples found him, why did he decide to go somewhere else?

Answer: Praying alone early in the morning is a good way to plan the whole day. After thinking, Jesus decided to build up his ministry.

In chapter 2 Jesus was challenged four times.

The first was when Jesus healed a paralytic. At first he said, "Your sins are forgiven." Some teachers of the law thought to themselves that Jesus was blaspheming. Jesus answered them by healing the paralytic, to show that he is the Lord and that there was no blasphemy. It is a quiet fight between Jesus and his enemies.

The second was when Jesus was invited to dinner with tax collectors and sinners. Someone asked his disciples why Jesus associated with such people. Jesus told them that he is a doctor and has come for sinners, not the righteous. This is an indirect fight.

The third time is about fasting. Jesus' disciples did not fast, and someone asked Jesus why. Jesus answered with two metaphors. He is the Lord; the Lord need not fast, and neither do his disciples while they are with the Lord. He is new and different from other preachers, so it is never a good idea to attach old things to something new. This is a direct fight, but Jesus used metaphors.

The fourth is about his disciples doing something forbidden on the Sabbath. Someone asked Jesus why they did unlawful things. Jesus gave the example of David, who had once done something forbidden; because of his dignity, it was not a problem at all. This is a totally direct fight, with no metaphors.

These four stories tell me that Jesus is audacious and righteous. He is audacious because he and his disciples did many things that seemed strange to common people. He is righteous because he always answered the people who questioned him directly, whether their questions were direct or indirect.

Wednesday, May 28, 2008

Go to hell, Lang Xianping.

I read an essay by Lang Xianping last night. This morning a friend of mine forwarded it to me. Reading something twice can push the reader to write a comment, and here it is.

I know little about Lang Xianping, yet I still find him disgusting. He seems to be preaching that Chinese people need to believe in God, or something like that; otherwise the Chinese do not deserve modern enterprises. Basically, the Chinese cannot benefit from modern enterprises because of their culture, and if they do benefit, they must be hypocrites. And the reason Americans and Europeans are wealthier is that they are more moral than the Chinese. I do not mean that the Chinese have no moral problems. But I think that blaming the dysfunction on all firm owners being immoral misses the point. Lang must be more of a propaganda machine than a serious and righteous economist. He should not turn a blind eye to the many shortcomings of the current system and blame education or morality instead. If he is right, then Adam Smith's invisible hand must be wrong.

I can imagine that his ideal world must be a Communist one, in which Lei Feng could be the ideal entrepreneur. He probably even thinks the Cultural Revolution in China was righteous.

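Thursday, May 8, 2008

Stata commands I often use, part 6

Stata's estimation and testing facilities are powerful, especially for microdata. Regression is a standard tool in empirical research, and under a linear specification ordinary least squares (OLS) and two-stage least squares (2SLS) are the most widely used methods, so they should be mastered first. I first describe how to run OLS and 2SLS in Stata, and then how to choose a sensible method for a practical problem. The latter part is heavily influenced by Professor Joshua Angrist, to whom I am grateful; I will note each place where I borrow his ideas.

The estimation command is simple and clear, so I will not say much about it.

reg (dependent variable) (explanatory variable 1) (explanatory variable 2) ...

After running the command above, Stata reports the regression estimates and an analysis-of-variance table. The table at the bottom lists, in order, the explanatory variables, their coefficient estimates, the standard errors of the estimates, the t-ratios, the p-values, and the confidence intervals at the (1-0.05) confidence level.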
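A minimal sketch of the command above. It assumes the auto example dataset that ships with Stata, and the choice of price, weight and mpg is only for illustration; none of this is the data discussed in this post.

```stata
* Load the example dataset bundled with Stata (an assumption for illustration)
sysuse auto, clear
* OLS of price on weight and mpg; the lower table shows coefficients,
* standard errors, t-ratios, p-values and 95% confidence intervals
regress price weight mpg
```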

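Once you have the regression results, first check whether the signs are right and whether the coefficients are significant. The direction and size of an explanatory variable's effect are read directly from the point estimate, and significance comes from the t statistic. Under the classical assumptions, the t-ratio follows a t distribution. The t distribution has a shape similar to the standard normal distribution, except that its tails are somewhat "fatter". As the sample size goes to infinity, the limiting distribution of the t distribution is the standard normal, so the fat tails gradually disappear. The table below lists the difference between the two for various degrees of freedom (Beyer 1987, "CRC Standard Mathematical Tables, 28th ed."; Goulden 1956, "Methods of Statistical Analysis, 2nd ed."). Once the degrees of freedom exceed one hundred, the difference is already quite small. So when the sample size is of the order of 100 or more, testing with the critical values of the standard normal distribution is fairly accurate. For example, 1.96 is the 97.5% critical value and 1.64 is the 95% critical value. These are familiar numbers.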


df     90%       95%       97.5%     99.5%
1      3.07768   6.31375   12.7062   63.6567
2      1.88562   2.91999   4.30265   9.92484
3      1.63774   2.35336   3.18245   5.84091
4      1.53321   2.13185   2.77645   4.60409
5      1.47588   2.01505   2.57058   4.03214
10     1.37218   1.81246   2.22814   3.16927
30     1.31042   1.69726   2.04227   2.75000
100    1.29007   1.66023   1.98397   2.62589
∞      1.28156   1.64487   1.95999   2.57588
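The critical values in the table can be checked inside Stata. The sketch below uses the built-in functions invttail() and invnormal(); the three rows picked here are only examples, and the results should agree with the 97.5% column of the table.

```stata
* invttail(df, p) returns the value t such that P(T > t) = p
* 97.5% critical value with 10 degrees of freedom
display invttail(10, 0.025)
* 97.5% critical value with 100 degrees of freedom
display invttail(100, 0.025)
* The limiting case: 97.5% critical value of the standard normal
display invnormal(0.975)
```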

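Readers may laugh at me at this point. Doesn't Stata already report the p-value and the confidence interval of the t test? Why not just look at those? The reason is that empirical papers often report only the parameter estimates and standard errors, and the reader has to divide the estimate by the standard error to judge significance. When you write an empirical paper yourself, you should also report the estimates and standard errors. The reported t-ratio and p-value are based on the null hypothesis that the parameter equals zero; if the reader wants to test a null hypothesis other than a zero coefficient, or to run a one-tailed test, having the standard error is convenient. So reporting standard errors is better than reporting p-values.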
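The point about standard errors can be sketched as follows. The example assumes the auto dataset that ships with Stata, and the null value of 1 is arbitrary and chosen only for illustration. It rebuilds the t-ratio from the stored coefficient and standard error, first for a zero null and then for a non-zero null with a one-sided p-value.

```stata
sysuse auto, clear
regress price weight mpg
* t-ratio for H0: coefficient on weight = 0, rebuilt from estimate and standard error
display _b[weight] / _se[weight]
* t-ratio for a different null, H0: coefficient on weight = 1 (illustrative value)
display (_b[weight] - 1) / _se[weight]
* One-sided p-value for H1: coefficient > 1, using the residual degrees of freedom
display ttail(e(df_r), (_b[weight] - 1) / _se[weight])
```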

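An important companion to the regression command is predict. It produces related statistics after a regression has been run. The syntax is:

predict (new variable name), (statistic name)

The statistic name is one of several options. Commonly used ones are: xb (the fitted value of the regression; this is the default, so with no option predict assigns the fitted value to the new variable); residuals (the residuals); and leverage (the leverage values). An example will follow later.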
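Until the author's own example appears, here is a small sketch of the three options just listed. It assumes the auto dataset that ships with Stata; the new variable names phat, ehat and lev are made up for illustration.

```stata
sysuse auto, clear
regress price weight mpg
* Fitted values (xb is the default when no option is given)
predict phat
* Residuals
predict ehat, residuals
* Leverage, the diagonal of the hat matrix
predict lev, leverage
summarize phat ehat lev
```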
(To be continued)

Friday, April 18, 2008

A simple study on "Wang Qian Yuan Incident" at Duke

This Wednesday, at an informal meeting, an American told me about an incident that happened at Duke last Wednesday. It involves both Tibet and a disobedient Chinese girl, Wang Qian Yuan (王千源). Usually the pro-China protests organized by the CSSA and endorsed by the Chinese government are boring and ineffective, so they do not attract much of my attention. This time is a little different, however, because we Chinese have once again succeeded in turning a conflict between Chinese and foreigners into a civil war among Chinese. This Friday's econometrics seminar was canceled, so I decided to study the incident and try to piece together the whole story.

From several videos on YouTube and several articles from the Duke Chinese Students and Scholars Association and The Chronicle, an independent daily newspaper at Duke University, the whole course of events seems to emerge.

The event began with a candlelight vigil supporting freedom in Tibet, organized by the Duke Human Rights Coalition on Wednesday evening. One leader of the coalition, Daniel Cordero, said they had reserved the Chapel Quad in advance, planning to advocate for Tibet's freedom from the People's Republic of China. However, "crowds of upset protesters flooded the Chapel Quadrangle". They carried signs and Chinese flags, expressing patriotism and criticizing Western media through chants and song.

During this, a Chinese woman in yellow came to the center of the stage and presented her views on Tibet. In the video I attached, we can hear several Chinese yelling "Are you Chinese?" at her. Later she was surrounded and questioned by many Chinese. Because the background songs are so loud, I cannot hear their conversation clearly.

Anyway, one video lists several of her "crimes". It is in Chinese; I have tried to translate it into English as follows:

(1) Write the slogan "Free Tibet" on the back of one "separatist";

(2) Make counter-active gestures together with "separatists" (which is based on the gesture for "one world, one dream" but falsified by "separatists").

(3) Compare the Tibetan ensign with the Hong Kong ensign.

What happened later is even more astonishing. Her name, her phone number, and her Chinese identity were posted on the web site of the Duke Chinese Students and Scholars Association. Later this private information was posted on several popular Chinese-language forums, such as Tianya and Kaidi. Even worse, her parents' contact information was also posted.

Since she has received many harassing phone calls and e-mails, she filed a report with the Duke University Police Department on Friday and accused DCSSA of releasing her private information. The president of DCSSA subsequently denied the accusation.

Though the story is not finished, because there might be more fights between Ms. Wang and DCSSA, I'd like to comment on what Duke CSSA has done.

(1) The organization of the pro-China protest was not effective at all.
They did not reserve the place for their protest. Protesting by chanting and yelling is not persuasive. They threatened other people's right to free speech. Basically, it seems that DCSSA does not know the rules in the US at all. What they have done will actually hurt the image of the Chinese in the end.

(2) It is illegal to release an individual's private information, if DCSSA did that.

(3) Answering such a severe accusation, that DCSSA released private information, with a mere denial is not sufficient at all. DCSSA needs to express its concern for the victim and provide all necessary help in finding the person who released the information, since it was released through DCSSA's email system.

It seems that DCSSA abuses the freedom it has in the US. Its actions are not constructive. It does not know what its interests and purposes are, or how to protect its rights and deliver its opinions in the US.

PS:

http://www.dukechronicle.com/media/storage/paper884/news/2008/04/14/News/Student.Gets.Threats.After.China.Protest-3322848.shtml

http://media.www.dukechronicle.com/media/storage/paper884/news/2008/04/10/News/ProTibet.ProChina.Protesters.Clash.On.Quad-3316313.shtml

http://www.youtube.com/watch?v=I4J6nfyb-3k





An Open Letter to Duke Community
Apr 14th, 2008

After last Wednesday’s high profile protest on Duke campus, a few subscribers to the mailing list China@duke.edu anonymously sent out messages verbally attacking one student using language we found troubling and heinous, as well as releasing this student’s private information. This mailing list was set up mainly for the purpose of helping students exchange information such as second-hand car or apartment sublease. It is open to the public, not limited only to Chinese students and scholars at Duke, for subscription and currently has more than 900 registered users, and like many other mailing list of this kind, we do not have a dedicated member to monitor it closely on a daily basis. However, we removed all the relevant messages once they were brought to our attention. And starting on Saturday, April 12, 2008, we have imposed stricter filter rules for messages sent through the mailing list. Duke Chinese Students and Scholars Association (DCSSA) hereby declares our unequivocal position that we strongly disagree and condemn the behavior of these few anonymous subscribers.

However, we are very disappointed by the story “Student gets threats after China protest” appearing on today’s Chronicle (Apr 14th, 2008). We feel regretful that this student considered it was DCSSA’s fault to release “all kinds of information” about her, and several other student organizations on campus blamed DCSSA for actions taken by certain subscribers to our mailing list, which, for the reasons stated above, we have to disagree with. We are sympathetic to this student’s situation, and as the representatives of DCSSA, we will try to contact this student to resolve any misunderstandings.

As one of the largest student groups on campus, DCSSA is an organization dedicated to promoting diversity on Duke Campus. We are always proud to bring the culture from China—our home country which has a glorious history of more than 5,000 years, to the Gothic Wonderland which we also call home. We hope that by learning from each other, we can work towards an even brighter future. We appreciate the increasing attention on China recently received from the Duke Community. In light of the recent events on and off campus, we welcome your constructive comments and healthy reflections on a wide range of topics, including the impartiality of media, freedom of speech, and effectiveness of cross-cultural communication. Please feel free to send us your email to: dcssa2008@gmail.com.

Thank you!

Zhizhong Li, DCSSA President
Weina Wang, DCSSA Vice President
Weining Bian, DCSSA Vice President

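Sunday, April 13, 2008

Stata commands I often use, part 5

(Continued)
Combining datasets can mean combining observations or combining variables. Observations are combined with append. When two datasets have exactly the same format but different observations, append using (filename) tacks one onto the end of the other. It is simple and hard to get wrong. merge calls for much more care. If two datasets contain the same observations but different variables, and you want to pull some variables from one dataset into the other, use merge. The full procedure is as follows:

use (filename) [open the auxiliary dataset]
sort (variable name) [sort by a variable that identifies observations and is shared by both datasets]
save (filename), replace [save the auxiliary dataset]
use (filename) [open the main dataset]
sort (variable name) [sort by the same variable]
merge (variable name) using (filename), keep((variable names))
[the first variable name is the one used after sort above, the filename is the name of the auxiliary dataset, and the variable names at the end are the ones you want to extract]
ta _merge [show the values of _merge. Observations with _merge equal to 1 appear only in the main dataset, 2 only in the auxiliary dataset, and 3 in both.]
drop if _merge==2 [drop observations that come only from the auxiliary dataset]
drop _merge [drop _merge]
save (filename), replace [save the merged file, usually under a new name]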
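The procedure above uses the merge syntax of older Stata releases. Stata 11 and later use a different form, merge 1:1 (or m:1, 1:m), in which keepusing() takes the place of keep() for selecting variables and no prior sort is required. The sketch below assumes that newer syntax and is meant to be run as a do-file; the two tiny datasets and the names id, x1 and y are made up for illustration.

```stata
* Build a small auxiliary dataset and store it in a temporary file
clear
input id x1
1 10
2 20
4 40
end
tempfile extra
save `extra'

* Build a small main dataset
clear
input id y
1 5
2 6
3 7
end

* Pull x1 from the auxiliary dataset into the main one, matching on id
merge 1:1 id using `extra', keepusing(x1)
tabulate _merge
* Drop observations that exist only in the auxiliary dataset (id 4 here)
drop if _merge == 2
drop _merge
list
```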

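At this point I should probably stop talking about creating and processing data and move on to more interesting things such as estimation and testing. I will close with one last example to show how laborious the preparation can be. The trouble is that there are always special requirements that cannot be met by simply applying a command. There are two roads to Rome: one is to find a more advanced command that does the job in one step; the other is to take several detours with simple commands you already know.

Here is a painful lesson, the most tedious construction of a new dataset I have encountered so far. The raw data are a household roster. It contains each person's individual information and his or her relationship to the household head, and the goal is to identify parent-child relationships. The idea is for the new dataset to use children as the unit of observation, find their parents, and attach the parents' variables to each observation. This is what I did: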

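use a1,clear [open the full-sample dataset]
keep if gender==2&agemos>=96&a8~=1&line<10
[keep married women above a certain age]
replace a5=1 if a5==0
[Variable a5 records the relationship to the household head: 0 is the head, 1 is the head's spouse. Here the head and the head's spouse are lumped together.]
keep if a5==1|a5==3|a5==7
[keep those who are the head (=1), a child of the head (=3), or a daughter-in-law of the head (=7).]

ren h hf [add the suffix f, for female, to the variables needed]
ren line lf [add the suffix f, for female, to the variables needed]
sort wave hhid
save b1,replace [sort and save]

keep if a5f==1 [keep those who are the head or the head's spouse]
save b2,replace [save]

use b1,clear
keep if a5f==3|a5f==7
save b3,replace [keep those who are a daughter or daughter-in-law of the head, and save]

use a3,clear [open the dataset of children who are children of the head]
sort wave hhid
merge wave hhid using CHNS01b2, keep(hf lf)
ta _merge
drop if _merge==2
sort hhid line wave [two-generation households: merge the dataset of female heads and spouses with the children dataset]

by hhid line wave: egen x=count(id)
drop x _merge [count the matches per household per wave; x only takes the value 1, so the two-generation match succeeded]
save b4,replace [save]

use a4,clear [open the dataset of children who are grandchildren of the head]
sort wave hhid
merge wave hhid using CHNS01b3, keep(a5f a8f schf a12f hf agemosf c8f lf)
ta _merge
drop if _merge==2 [three-generation households: merge the dataset of the head's daughters and daughters-in-law with the grandchildren dataset]

sort hhid line wave
by hhid line wave: egen x=count(id)
gen a=agemosf-agemos
drop if a<216&x==3 [count the matches per household per wave; x does not only take the value 1, so the three-generation match is not fully successful. Drop implausible observations, based on the age gap and on households with three possible mothers.]

gen xx=x[_n+1]
gen xxx=x[_n-1]
gen y=lf if x==1
replace y=lf[_n+1] if x==2&xx==1
replace y=lf[_n-1] if x==2&xxx==1
keep if x==1|(lf==y&x==2)
[For children with two possible mothers, the same woman's code can appear twice. The steps above ensure that these observations are not dropped.]

drop a x xx xxx y _merge
save b5,replace [save the merged dataset]

[The merge for the male data is entirely analogous and is omitted.]

log close
exit,clear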

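My approach reaches the goal by going round and round with simple commands, so I would very much welcome a simpler method. But one often cannot insist on elegant code, so I muddled through. Someone once asked me for the procedure above and I never replied. It is now public; I hope it helps those who need it, and it saves me from answering.
(To be continued)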

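Stata commands I often use, part 4

(Continued)
egen is another command for generating variables; its distinguishing feature is its powerful functions. gen supports simple functions and operations such as arithmetic. egen supports more complex functions, for example the mean of a variable:

egen ave = mean((variable name))

There are many more functions, which can be looked up in help; I will not list them. So far I have used functions such as the mean and the sum.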
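A short sketch of the two egen functions mentioned, the mean and the sum, plus a by-group variant. It assumes the auto dataset that ships with Stata; the new variable names are made up for illustration.

```stata
sysuse auto, clear
* Overall mean of price, repeated on every observation
egen ave = mean(price)
* Mean of price within each group of foreign
bysort foreign: egen ave_f = mean(price)
* Overall sum of price
egen tot = total(price)
list price foreign ave ave_f tot in 1/5
```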

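After all this talk, here is an example. Suppose a raw dataset uses the variable date to record dates in a format where December 11, 1980 is written as 19801211. How do you extract the year and the month and generate dummy variables for them? This is what I do:

gen yr=int(date/10000)
gen mo=int((date-yr*10000)/100)
ta yr, gen(yd)
ta mo, gen(md)

Here int() is the integer function, which drops the fractional part of a number.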
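The extraction can be checked on a couple of made-up dates. The sketch below assumes date is stored as long (or double); a float cannot hold every eight-digit integer exactly, so that storage type could distort the result. The two sample dates are invented for illustration.

```stata
clear
* Store the eight-digit dates as long so that no digits are lost
input long date
19801211
19990305
end
gen yr = int(date/10000)
gen mo = int((date - yr*10000)/100)
* Verify the extracted year and month for both observations
assert yr == 1980 & mo == 12 in 1
assert yr == 1999 & mo == 3 in 2
list
```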

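Once the variables you need are ready, you can save the result as a new dataset. The command is save (filename), replace. As mentioned before, the replace option overwrites the existing file with your changes, so use it with great care. Save under a new name: if you alter the raw data and cannot change them back, nobody will be able to help you.

Besides simple operations on a single dataset, one sometimes needs to change the structure of the data or to combine information drawn from different datasets. Of the commands of this kind, the ones I have used are reshape, which switches data between long and wide layouts; collapse, which creates a collapsed dataset of summary statistics; and append and merge, which combine datasets.

reshape changes the structure of longitudinal data. Longitudinal data are what is usually called panel data: they record the values of the same variable for the same agent at different points in time. Longitudinal data can be stored in wide or long format. In wide format the agent is the unit of record, and the same variable in different periods is stored within one observation. For example, suppose the agents are firms, the periods are 2000 and 2001, and the variables are employment and city, where employment differs across periods and the city does not change. In wide format each firm is one observation, there is no period variable, employment takes two variables holding the 2000 and 2001 figures, and city takes only one variable. In long format the agent and the time period jointly define an observation. In the example above, each firm has two observations, distinguished by a year variable, and employment and city each take a single variable that records the value in the given year. reshape converts a dataset from wide to long and from long to wide.

In the example above, the command for turning wide into long is:

reshape long (stub name of the employment variables), i((firm identifier variable)) j((period variable))

Because the city does not vary across periods, it does not need to be listed after reshape long, and the conversion leaves it unchanged. Conversely, to turn long into wide, simply replace long with wide.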
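The firm example can be sketched with two invented firms. The variable names firm, emp2000, emp2001 and city and all the values are hypothetical; emp is the stub shared by the two employment variables.

```stata
clear
* Wide format: one observation per firm, one employment variable per year
input firm emp2000 emp2001 str10 city
1 50 55 "Beijing"
2 20 18 "Shanghai"
end
* Wide to long: two observations per firm, distinguished by year
reshape long emp, i(firm) j(year)
list
* Long back to wide
reshape wide emp, i(firm) j(year)
list
```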

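collapse computes some statistics of a dataset and stores them as a new dataset. I use it somewhat reluctantly, because I could not find a command that directly reports the median and the percentiles from 1% to 99%. If anyone knows one, please tell me; thanks in advance. The command for computing the median is as follows.

collapse (median) ((variable names)), by((variable name))

The new dataset holds the medians of the variables in the first parentheses (there can be several). The by option on the right computes the medians by groups of some variable; without it, the median of the whole sample is computed.
(To be continued)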
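As one possible answer to the question above, several built-in commands report medians and percentiles without replacing the data in memory. The sketch below assumes the auto dataset that ships with Stata; the percentiles requested from centile are only examples.

```stata
sysuse auto, clear
* The detail option reports the 1st through 99th percentiles, including the median
summarize price, detail
* Chosen percentiles with confidence intervals
centile price, centile(1 50 99)
* Median by group
tabstat price, statistics(p50) by(foreign)
* collapse, by contrast, replaces the data in memory with the group medians
collapse (median) price, by(foreign)
list
```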

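Stata commands I often use, part 3

(Continued)
When building a new dataset, you often need to derive new variables from the raw ones. The commands for this are gen, egen and replace. The basic syntax is

gen (or replace) (variable name)=(expression)

The difference between the two is that gen creates a new variable while replace redefines an existing one.

A dummy variable takes the value 0 or 1 and marks some property of the agents in the sample. Dummy variables are used widely in empirical work, so I briefly describe how to create them. There are two basic methods. One is more concise:

gen (variable name)=((condition))

Here the outer parentheses around "condition" are typed as part of the command, while the inner ones only indicate that their content is a description. If an observation satisfies the condition, the dummy equals 1 for it, and 0 otherwise. The other method is

gen (variable name)=1 if (condition for the value 1)
replace (same variable name)=0 if (condition for the value 0)

There is a small difference between the two. If the variables in the condition have no missing values, both methods give the same result. If there are missing values, the first method still assigns a definite value to the observations where they occur, usually 0, although a condition such as x>5 yields 1 because Stata treats a missing value as larger than any number. The second method allows three outcomes for the dummy: 1, 0, and missing, as long as neither condition is satisfied by a missing value. This avoids wrongly putting observations whose information is unknown into the regression.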
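The difference between the two methods can be seen on a variable with missing values. The sketch below assumes the auto dataset that ships with Stata, in which rep78 is missing for a few cars; the threshold 4 and the names d1 and d2 are arbitrary.

```stata
sysuse auto, clear
* Method 1: missing rep78 counts as larger than 4, so d1 equals 1 there
gen d1 = (rep78 >= 4)
* Method 2: exclude missing values explicitly, so d2 stays missing there
gen d2 = 1 if rep78 >= 4 & !missing(rep78)
replace d2 = 0 if rep78 < 4
* Compare the two dummies, showing missing values as their own category
tabulate d1 d2, missing
```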

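When only a few dummies are needed, define them one by one. When a large number of similar dummies are needed, the basic methods become very time-consuming. For example, to create community dummies in data covering many communities, there may be hundreds or thousands of communities, which is far too much work. If each community is identified by a code, the following command creates the dummies in bulk.

ta (variable name), gen((variable name))

The variable name in the first parentheses is an existing variable, the community code in the example above. The variable name in the second parentheses is the common prefix of the newly created dummies, which is followed by a number to distinguish them. If I enter d here, the command creates d1, d2, and so on, until every community has a dummy.

One more remark. Controlling for this many community dummies in a regression is tiring if the variable names are typed one at a time. Two abbreviations help: the wildcard d* stands for all variables whose names begin with d, and the dash in d1-d150 stands for the first through the 150th community dummy.

There is also a way to control for the dummies directly in the regression without actually creating them, as follows.

areg (dependent variable) (explanatory variables), absorb(variable name)

The variable name in the absorb option is the same as the first variable name in the command discussed above, that is, the community code in the example. The results are the same as adding the corresponding dummies directly in reg.
(To be continued)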
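The equivalence of the two routes can be sketched on a small example. It assumes the auto dataset that ships with Stata, with rep78 (five categories) standing in for the community code; the coefficient on weight should be the same in both regressions.

```stata
sysuse auto, clear
* Keep only observations with a valid group code
drop if missing(rep78)
* Route 1: create the dummies d1 to d5, then include all but one of them
tabulate rep78, generate(d)
regress price weight d2-d5
* Route 2: absorb the group effects without creating any dummies
areg price weight, absorb(rep78)
```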

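Stata commands I often use, part 2

(Continued)
The first step is to clean the raw data. Uncleaned raw data contain errors, omissions and inconsistencies. For example, missing values of some variables are sometimes coded as a dot and sometimes as numbers such as -9 or -99. Running a regression without fixing this naturally gives absurd results. Also, the same variable may carry different names in different datasets, which makes merging troublesome. My personal advice: extract the information you need from the raw data, build a new dataset from it, and use only this new dataset in the subsequent analysis. If new information has to be added, modify the new dataset as well rather than working directly on the raw data. This part of the work is not hard, but it is fundamental. If you are not careful here, much of what follows will be wasted.

Now inspect the data. Commonly used commands are codebook, su, ta, des and list. codebook gives the most complete information; its drawback is that it reports everything at once, which is inconvenient when preparing tables, so other commands are needed as well. su (summarize) is followed by a space and variable names; it reports the number of non-missing observations, the mean, the standard deviation, the minimum and the maximum of each variable. ta (tabulate) is followed by a space and one (or two) variable names; it reports the frequencies, the percentages and the cumulative percentages, in ascending order of the values, of the values (excluding missing values) of a variable (a two-way table when two variables are given). des (describe) can likewise be followed by any number of variable names, as long as they exist in the data. It reports each variable's storage type, display format and label. (The label usually records the definition and unit of the variable.) list is also followed by variable names; it reports the observed values of the variables, and if or in can be used to restrict the range of observations. All of these commands except ta can also be run without any variable name, in which case they report the corresponding information for all variables in the dataset in use. Description only goes so far; better to open Stata and try it yourself.

An aside. Apart from codebook, the descriptive commands above belong to the r-class (also called general commands). After running one, return list shows the results stored in r(). The most typical r-class command is summarize. It stores the sample size, mean, standard deviation, variance, minimum, maximum, sum and other statistics. After running su, just type return list to see all of them. Besides the commands for descriptive statistics there are commands for estimation, such as regress. These estimation commands (also called e-class commands) also store a lot of related information. As with the descriptive commands, ereturn list shows it. These features are very useful in more complex applications, such as decomposing a regression or computing statistics that Stata does not compute directly.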
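A brief sketch of stored results, assuming the auto dataset that ships with Stata. r(mean), e(N) and e(r2) are standard stored results of summarize and regress; the variables used are only for illustration.

```stata
sysuse auto, clear
* r-class: summarize leaves its results in r()
summarize price
return list
display r(mean)
* e-class: regress leaves its results in e()
regress price weight
ereturn list
display e(N)
display e(r2)
```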

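codebook lets you see the range and units of a variable. If values such as -9 or -99 appear, suspect that they are missing-value codes. Check how the questionnaire records missing values and, once confirmed, recode them as a dot. The command is replace (variable name)=. if (variable name)==-9. A large share of missing values in the sample is bad for two reasons: the sample becomes small and the results may be insignificant, and there may be selection bias if the people with missing values differ a lot from the population. This is one criterion for choosing variables.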
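When several codes or several variables are involved, the built-in mvdecode command can do the same recoding in one line. This is an alternative to the replace command above, not the author's method; the variable income and its values are invented for illustration.

```stata
clear
input income
1200
-9
3000
-99
end
* Recode both numeric missing-value codes to Stata's missing value
mvdecode income, mv(-9 -99)
* The second and fourth observations are now missing
assert missing(income) in 2
assert missing(income) in 4
list
```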

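Unify the naming: either unify the labels or unify the rules for naming variables. The command for renaming a variable is

ren (old variable name) (new variable name)

The command for defining a label is

label var (variable name) "(label text)"

Uniform variable names are easier to remember, and concise labels help when analyzing the data.
(To be continued)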

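Sunday, March 2, 2008

Stata commands I often use, part 1

The two most important commands are surely help and search. Even if you use Stata regularly, it is hard to remember every detail of the common commands, let alone the rarely used ones. Nor is there any need to, because when you run into trouble you may be able to find an expert, or you can consult the help files that come with Stata. These help files are very detailed and cover everything. That is both a blessing and a nuisance: faced with a long help file, do you feel confident of finding the relevant information quickly?

help and search both look up help files. The difference is that help is an exact lookup while search is a fuzzy one. If you know exactly which command you want and need the details of its use, type help, a space, and the command's name in Stata's command window. After you press Enter, the full help file of the command is displayed. If you want to do something in Stata, such as an estimation or a calculation, but do not know how, use search. It works like help, except that the exact command name is replaced by a keyword. After you press Enter, Stata lists all help files and links related to the keyword. Pick the most relevant entry in the list and click it; a new window opens with the corresponding help file. With patient searching and repeated trials you can usually find what you need quickly.

Next, some preparation before processing data. In my experience it is best to record the work you have done in Stata's do-file editor. Very few empirical studies are finished in one sitting, so being able to reproduce earlier work when you resume is very important. Sometimes small differences make it impossible to reproduce earlier results. If you have recorded your work in a do-file, you do not have to repeat your labor or try again and again to reconstruct what you did. In the toolbar at the top of the Stata window there is a lone small button; hovering over it shows "bring do-file editor to front", and clicking it opens the do-file editor.

For a do-file to run smoothly, it generally needs a "header" and a "footer". Here are the ones I use.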

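/* (Label. A brief note on what this file does.) */

capture clear (clear the data in memory)
capture log close (close any open log file)
set mem 128m (set the amount of memory available to Stata)
set more off
(Turn off the more option. When it is on, output is shown one screen at a time, and you press the space bar to see the next screen until everything has been shown. When it is off, all output is shown at once without pausing.)
set matsize 4000 (set the maximum matrix order. Is my value too large?)

cd D: (change to the drive and folder holding the data. Much like the DOS command line.)
log using (filename).log,replace
(Open a log file, overwriting any old one. The log file records all output produced when the file is run; if you have changed the file, the replace option updates the log to the results of the latest run.)
use (filename),clear (open the data file.)

(main body of the file)

log close (close the log file.)
exit,clear (exit and clear the data in memory.)

The "header" and "footer" of this do-file are not my invention; I learned them from my teacher Shen Minggao (沈明高), and the credit is his. (To be continued)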