Abstract
In this article, we describe the two new commands
1 Introduction
Geographic information, because of its fundamental nature, has been playing an increasingly important role in scientific research. All statistical software programs need to address geographic data, especially when converting from human languages to geographic locations or vice versa. This is why there are several commands to address geographic location issues. Those commands include
Similar to Google Maps, Baidu Map is an online map provided by Baidu Co. Ltd., a company that focuses on search-engine services for Chinese-language users. Baidu Map provides online map and navigation services. It covers more than 400 cities and thousands of counties and districts. Its information is detailed enough to include a newly built flyover, a one-way lane with little traffic, and a country road used only by pedestrians.
Most people in China rely on its navigation services while driving or walking, whether to find a small restaurant in an unfamiliar place or to get around their own neighborhood. With the help of Baidu Map, people can easily locate a restaurant, a bank, a parking lot, a gas station, and the like.
On April 23, 2010, Baidu opened its map API to developers for free. Since then, it has provided both a JavaScript API and a web-service API. With these APIs, we can extract the longitude and latitude of a Chinese address and convert a location into the corresponding address in Chinese. On June 18, 2019, Baidu updated the API to version 3.0 to optimize its services and provided a completely new geocoding service, which is used by
2 Baidu Map API key
Before using these two commands to access Baidu Map in Stata, you must apply for a Baidu Map API key, which requires a Chinese mobile number. The API key is Baidu's official permission to use its APIs. To apply, users must first register for a Baidu account with the Chinese mobile number.
After registering, users can log on to the Baidu Map open platform (http://lbsyun.baidu.com) to apply for the Baidu Map API key. In this step, an applicant enters his or her name, mobile number, and email address and agrees to a declaration on certain legal issues to use the platform. The procedure is straightforward, but it takes several days for users to receive their API key after submitting the application form online.
A typical Baidu Map API key is an alphanumeric string. Both commands rely on this API key. Thus, users have to explicitly specify it. Suppose the user already has a Baidu Map API key, which is, say,
3 The commands
3.1 Overview
Both commands rely on Baidu Map. Because the source code of the Baidu Map website is UTF-8 encoded,
When using
Sometimes, we have both a separated address and an all-in-one address. In this case, users can even specify both. However, when both are used but only one of the addresses can return a meaningful location,
The accuracy of the commands depends on the accuracy of Baidu Map and on how the addresses are specified. If an address is not specific (say, we give only a university’s name for a university with several campuses in a certain city or province), Baidu Map may return the location of just one of the campuses, which may not be the one we are interested in.
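The address-to-coordinates lookup that both commands build on can be sketched outside Stata. The Python sketch below is not the commands' source code. It assumes the endpoint and parameter names (`address`, `output`, `ak`) of Baidu's web-service geocoding API, version 3.0, and the function names `build_geocoding_url` and `parse_location` are hypothetical. It shows how a Chinese address is percent-encoded as UTF-8 in the request and how a failed lookup turns into missing coordinates. The response text is illustrative, not real API output.

```python
import json
from urllib.parse import urlencode

# Assumption: endpoint of Baidu's web-service geocoding API, version 3.0.
GEOCODING_URL = "https://api.map.baidu.com/geocoding/v3/"


def build_geocoding_url(address, api_key):
    """Return the request URL for one address (hypothetical helper).

    urlencode percent-encodes the address as UTF-8.
    """
    query = urlencode({"address": address, "output": "json", "ak": api_key})
    return f"{GEOCODING_URL}?{query}"


def parse_location(response_text):
    """Return (longitude, latitude), or (None, None) when the lookup failed."""
    payload = json.loads(response_text)
    if payload.get("status") != 0:  # assumption: status 0 signals success
        return (None, None)
    location = payload.get("result", {}).get("location", {})
    return (location.get("lng"), location.get("lat"))


# Illustrative response text, not real API output.
sample = '{"status": 0, "result": {"location": {"lng": 116.3, "lat": 39.9}}}'

url = build_geocoding_url("北京市海淀区", "YOUR_API_KEY")
lng, lat = parse_location(sample)
failed = parse_location('{"status": 1}')
```

A vague address still yields a request that succeeds, so a check like `parse_location` cannot detect the multiple-campus problem described above. Only a more specific address can.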
3.2 Syntax of cngcode
We have a sample dataset,
Let’s now look at our dataset:
. use example.dta
Given the above information, addresses can be supplied in the following ways:
1. Separate address, where users supply the address parts in different variables. The syntax is as follows:
2. Full address, where users supply the address as an all-in-one address line, here the variable
As we can see above, different descriptions of one location may produce different results, and missing address information produces missing values of longitude and latitude. Users can also combine 1 and 2 in a single command line. The syntax is as follows:
In the combined address mode, as specified in both the separated address and the all-in-one address,
However, if both addresses yield meaningful locations, the default choice is to report the location generated from the separated address, which is normally more accurate than the all-in-one full address. Users can use the option
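The selection rule for the combined mode can be written down as a small Python sketch. It is only an illustration of the default behavior described in this section, not the command's implementation. The names `is_meaningful` and `choose_location` are hypothetical, and the sketch assumes that when only one of the two lookups returns a meaningful location, that location is the one reported.

```python
def is_meaningful(location):
    """A location counts as meaningful when both coordinates are present."""
    return location is not None and None not in location


def choose_location(separated, full):
    """Pick between two (longitude, latitude) results for the same record."""
    if is_meaningful(separated):
        return separated      # default: the separated address takes priority
    if is_meaningful(full):
        return full           # assumption: fall back to the all-in-one address
    return (None, None)       # neither lookup succeeded: missing values


both = choose_location((116.3, 39.9), (116.4, 40.0))
only_full = choose_location((None, None), (116.4, 40.0))
neither = choose_location(None, (None, None))
```

Here `both` keeps the separated-address result, `only_full` falls back to the all-in-one result, and `neither` stays missing.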
3.3 Syntax of cnaddress
Compared with the syntax for A simple command to get the corresponding Chinese address is as follows:
The output from the above command includes the address in separated form (province, city, district, and street) as well as a full all-in-one address. Different coordinate systems, such as WGS-84, GCJ-02, and BD-09, assign different latitude and longitude values to the same location. We can specify the type of coordinate in the option
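To make the role of the coordinate system concrete, the reverse lookup can be sketched in Python. This is not the command's source code. It assumes the endpoint, the `coordtype` and `location` parameters, and the response fields (`formatted_address`, `addressComponent`) of Baidu's web-service reverse-geocoding API, version 3.0. The helper names and the mapping `COORDTYPE_CODES` are hypothetical, and the response text is illustrative rather than real API output.

```python
import json
from urllib.parse import urlencode

# Assumption: endpoint of Baidu's web-service reverse-geocoding API, version 3.0.
REVERSE_URL = "https://api.map.baidu.com/reverse_geocoding/v3/"

# Assumption: request codes the web service uses for each coordinate system.
COORDTYPE_CODES = {"WGS-84": "wgs84ll", "GCJ-02": "gcj02ll", "BD-09": "bd09ll"}


def build_reverse_url(latitude, longitude, api_key, system):
    """Return the request URL for one coordinate pair (hypothetical helper)."""
    query = urlencode({
        "ak": api_key,
        "output": "json",
        "coordtype": COORDTYPE_CODES[system],
        "location": f"{latitude},{longitude}",  # latitude first, then longitude
    })
    return f"{REVERSE_URL}?{query}"


def parse_address(response_text):
    """Return the separated and all-in-one address, or None on failure."""
    payload = json.loads(response_text)
    if payload.get("status") != 0:  # assumption: status 0 signals success
        return None
    result = payload.get("result", {})
    parts = result.get("addressComponent", {})
    return {
        "province": parts.get("province"),
        "city": parts.get("city"),
        "district": parts.get("district"),
        "street": parts.get("street"),
        "address": result.get("formatted_address"),
    }


# Illustrative response text, not real API output.
sample = (
    '{"status": 0, "result": {"formatted_address": "北京市海淀区中关村大街27号", '
    '"addressComponent": {"province": "北京市", "city": "北京市", '
    '"district": "海淀区", "street": "中关村大街"}}}'
)

url = build_reverse_url(39.9, 116.3, "YOUR_API_KEY", "WGS-84")
address = parse_address(sample)
```

Passing the wrong coordinate system does not raise an error. The request still succeeds, but the returned address can be shifted from the intended location, so the system should match the source of the coordinates.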
Options for
4 Conclusion
Before the release of these two commands, it was difficult for Chinese Stata users to process Chinese geographic information.
Using longitude and latitude, we can also use Baidu Map’s navigation system to determine the travel time between two locations by air, train, or car (Li, Xue, and Zhang 2019). Such information could be useful for researchers in transportation, education, the environment, and economics. Much remains to be done, and we hope to see more researchers devote time and effort to this area to promote more in-depth research with Stata.
Supplemental Material
Supplemental Material, dm0104 for Extracting Chinese geographic data from Baidu Map API by Yuan Xue and Chuntao Li in The Stata Journal
5 Programs and supplemental materials
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
