Tourism statistics are a "monitor" for gauging the scale and quality of tourism development in a country or region, and the quality of the data directly affects how government, industry and academia judge that development. Because tourism is an industry defined from the demand side, it is not a separate category in China's national economic industry classification and is not a direct object of statistics for governments at any level. The quality of tourism statistics therefore has no direct bearing on governments' main statistical indicators, and for a long time tourism statistics received little attention from government leaders at all levels. The boundary of the tourism industry is drawn according to the industries involved in tourists' consumption, covering "food, accommodation, transportation, sightseeing, shopping and entertainment".

Although China's tourism authorities have revised the national tourism statistical survey system in line with the International Recommendations for Tourism Statistics 2008 and the Tourism Satellite Account: Recommended Methodological Framework 2008 promoted by UNWTO, doubts about the officially released data persist in both academia and industry. Critics argue, for example, that domestic tourism statistics are "not comparable across regions or over time", that "the scope of domestic tourist statistics is too broad" and that "tourism statistics are inflated". Some of these doubts stem from misconceptions about the complexity of tourism statistics, some from differences in how scholars from different disciplines understand basic concepts such as tourism, tourist and visitor, and some from the limitations of traditional sampling surveys and grassroots reporting.
Doubts from all sectors focus mainly on domestic tourism statistics, and especially on the accuracy of regional domestic tourism statistics. As China's economy and society enter a new era amid an uncertain external environment, expanding consumer demand has become an important driver of economic growth, and tourism consumption, as a multi-level, diversified and sustainable form of final service consumption, has become a much sought-after lever for governments at all levels to stimulate the economy. The accuracy of domestic tourism statistics therefore bears directly on national and regional tourism development strategy and on the healthy development of regional tourism. This article first analyzes the factors in the production of domestic tourism statistics that lead to inaccurate data, and then explores ways to improve data quality at the main stages of statistical data production.
I. Current methods for producing domestic tourism statistics in China
1. Domestic tourism statistical indicators
At present, tourism administrative departments at all levels in China carry out tourism statistics work in accordance with the "Tourism Statistical Survey System" (2017 edition).
The International Recommendations for Tourism Statistics 2008 state that domestic tourism comprises "the activities of a resident visitor within the country of reference, either as part of a domestic tourism trip or part of an outbound tourism trip". At the national level, the corresponding survey collects basic data and related indicators such as the volume of residents' travel, travel purposes, modes of transport, forms of travel organization and types of accommodation.
Because UNWTO defines domestic tourism only at the national level, national-level statistics alone fall far short of China's needs: domestic tourists account for more than 99% of the visitor market of provinces (autonomous regions and municipalities directly under the Central Government) and cities. China's current tourism statistical survey system therefore compiles domestic tourism statistics at three geographical scales: national, provincial and city. The core domestic tourism indicators at each scale are shown in Table 1. Because the statistical perspective and statistical unit differ between the national and regional levels, the indicators have different meanings even though both are reported as "domestic tourist numbers" and "tourism revenue" in governments' tourism economic performance reports. The national level counts the domestic trips made by residents, whereas the regional (provincial and city) level counts domestic visitor arrivals, including both domestic residents from outside the region and local residents travelling within it.
Table 1 Domestic tourism statistical indicators and explanations at different geographical scales in China
Note: 1. Compiled from the Tourism Statistical Survey System (2017 edition). 2. The definition of tourists used in this article follows UNWTO and China's Tourism Statistical Survey System.
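To illustrate why national and regional figures measure different things, the following Python sketch uses three hypothetical trips (the region codes and trip records are invented for illustration). It assumes, as a simplification of the actual reporting rules, that each region counts one arrival for every trip that visits it. A multi-destination trip then counts once in the national trip total but once in every region it visits, so regional figures cannot simply be added up to the national figure.

```python
from collections import Counter

# Hypothetical trip records: where the traveller lives and which regions the trip visited.
trips = [
    {"resident_of": "X", "regions_visited": ["Y", "Z"]},  # one trip, two destination regions
    {"resident_of": "X", "regions_visited": ["Y"]},
    {"resident_of": "Y", "regions_visited": ["Y"]},       # local resident travelling within own region
]

def national_trip_count(trips):
    """National view: every domestic trip by a resident counts once."""
    return len(trips)

def regional_arrivals(trips):
    """Regional view: each region counts every trip that visits it,
    including trips by its own residents (simplifying assumption)."""
    return Counter(r for t in trips for r in set(t["regions_visited"]))

arrivals = regional_arrivals(trips)
print(national_trip_count(trips))    # 3
print(sorted(arrivals.items()))      # [('Y', 3), ('Z', 1)]
print(sum(arrivals.values()))        # 4: the X -> Y, Z trip is counted by both regions
```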
2. Domestic tourism statistical survey methods at different geographical scales
1. National level domestic tourism statistical survey methods
The two core indicators of the national domestic tourism survey, domestic tourist numbers and tourism revenue, were commissioned by the former National Tourism Administration to the Social Situation and Public Opinion Survey Center of the National Bureau of Statistics. Using multi-stage random systematic (equal-interval) sampling, the survey measures the average household travel rate, the level of travel expenditure and the structure of that expenditure for urban and rural residents in 60 sample cities through computer-assisted telephone interviewing (CATI). The survey is conducted quarterly with the household as the survey unit, and national totals are estimated from the questionnaire results. See Table 2 for details.
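The national estimate can be sketched as a simple stratified expansion: for each stratum (here urban and rural residents), population times surveyed trips per capita gives trips, and trips times average expenditure per trip gives revenue. This is a minimal sketch under stated assumptions: the strata, population figures, rates and spending levels are hypothetical placeholders, and the actual survey's weighting across the 60 sample cities is more detailed than this.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    name: str
    population: float        # residents in the stratum (hypothetical)
    trips_per_capita: float  # quarterly trips per resident, from the household survey
    spend_per_trip: float    # average expenditure per trip in yuan, from the survey

def estimate_national(strata):
    """Expand sample-based rates to population totals, stratum by stratum."""
    total_trips = 0.0
    total_revenue = 0.0
    for s in strata:
        trips = s.population * s.trips_per_capita
        total_trips += trips
        total_revenue += trips * s.spend_per_trip
    return total_trips, total_revenue

strata = [
    Stratum("urban", 900e6, 1.2, 1000.0),
    Stratum("rural", 500e6, 0.6, 400.0),
]
trips, revenue = estimate_national(strata)
print(f"trips: {trips:,.0f}, revenue: {revenue:,.0f} yuan")
```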
2. Domestic tourism statistical survey methods at the prefecture-city level
Because the core provincial indicators of domestic tourists are aggregated from prefecture-level data, the prefecture-level city is currently the smallest geographical unit for domestic tourism statistics. A nationally unified sampling method is used: the main survey covers overnight tourists in tourist accommodation units, supplemented by surveys of day-trip tourists at tourist attractions and of tourists staying overnight with relatives and friends (Table 2). The number of overnight tourists received by accommodation units is estimated from the city's total number of guest rooms, the room occupancy rate, the proportion of guests who are tourists and other related data, all of which come from the accommodation units' monthly reports. The total sample size is set according to the number of domestic tourists received in the same period of the previous year; in principle, the annual sample should be no smaller than 0.03% of the previous year's domestic tourist arrivals.
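The room-based estimate and the 0.03% sample rule can be sketched as follows. The text names rooms, occupancy and the tourist share as inputs but does not give the exact formula, so the chain below is an assumed reconstruction: the parameters `guests_per_room` and `avg_nights` are hypothetical additions needed to turn occupied rooms into tourist arrivals, and the survey system's actual formula may differ.

```python
def estimate_overnight_tourists(rooms, days, occupancy, guests_per_room,
                                tourist_share, avg_nights):
    """Sketch of a room-based estimate of overnight tourist arrivals.

    rooms           -- guest rooms in the city's accommodation units (monthly reports)
    days            -- days in the reporting period
    occupancy       -- room occupancy rate (0-1)
    guests_per_room -- average guests per occupied room (assumed parameter)
    tourist_share   -- share of guests who are tourists (0-1)
    avg_nights      -- average nights per tourist stay (assumed parameter)
    """
    room_nights = rooms * days * occupancy
    guest_nights = room_nights * guests_per_room
    tourist_nights = guest_nights * tourist_share
    return tourist_nights / avg_nights

def min_annual_sample(prior_year_arrivals):
    """Smallest sample meeting the 0.03% rule, rounded up.
    Uses integer arithmetic (0.03% = 3/10000) to avoid float rounding."""
    return (prior_year_arrivals * 3 + 9999) // 10000

# A hypothetical city: 10,000 rooms over a 30-day month.
print(estimate_overnight_tourists(10_000, 30, 0.6, 1.5, 0.8, 2.0))  # about 108,000 arrivals
print(min_annual_sample(50_000_000))                                # 15000
```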
Domestic tourism revenue at the prefecture-city level is estimated from the number of domestic tourists received together with their average daily expenditure and average length of stay; the expenditure and length-of-stay data come from a sample survey of domestic tourist expenditure.
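The revenue estimate amounts to a product summed over tourist segments. The minimal sketch below assumes two segments with hypothetical figures. Counting a day trip as a one-day stay is an assumption made for illustration, not a rule stated in the survey system.

```python
def estimate_revenue(segments):
    """Revenue = arrivals x average daily expenditure x average stay (days),
    summed over tourist segments."""
    return sum(arrivals * daily_spend * stay_days
               for arrivals, daily_spend, stay_days in segments)

segments = [
    (108_000, 600, 2),  # overnight tourists: arrivals, yuan per day, days (hypothetical)
    (200_000, 150, 1),  # day-trip tourists, stay counted as one day (assumption)
]
print(estimate_revenue(segments))  # 159600000
```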
Table 2 Data sources and survey methods for the core domestic tourism indicators at different geographical scales in China
In summary, the basic data on domestic tourist numbers and tourism revenue at all geographical scales are collected mainly through sampling surveys and reports from grassroots statistical units. With the development of information and digital technology, some research institutions and provinces have begun to use big data as a source. Because traditional survey methods and big data methods rest on different principles, however, each affects the quality of domestic tourism statistics in its own way and to a different degree.
3. Implementation of domestic tourism statistical surveys
Under China's tourism statistical survey system, tourism statistics work covers data collection, aggregation, estimation, analysis and release, and each stage requires considerable professional expertise. At the national level, the Ministry of Culture and Tourism and the National Bureau of Statistics jointly formulate the survey plan and entrust its implementation to the Bureau's Social Situation and Public Opinion Survey Center, so both fieldwork and data processing are undoubtedly professional. In the provinces, however, tourism statistics is usually handled by the Policy and Regulations Department, which has only one or two full-time statisticians, and since tourism statistics began, the sampling surveys have mainly been carried out by the survey teams of the provincial statistics bureaus. With the rapid growth of domestic tourism, new tourism formats, changing modes of travel and governments' growing attention to data, the demands on data granularity and quality keep rising. Yet tourism administrations at all levels commonly lack staffing quotas and professional personnel. To meet the data needs of government, enterprises and academia, some provinces and cities have outsourced all tourism surveys and data analysis to survey companies, while others outsource project by project. Domestic tourism statistics are the central and most difficult part of provincial tourism statistics, and outsourcing has become the preferred solution.
II. Analysis of factors affecting the quality of domestic tourism statistics
1. Analysis of the impact of statistical survey methods on data quality
1. Data quality analysis of sampling survey
Sampling surveys are the main source of basic data on per capita expenditure per trip and per capita daily expenditure at all geographical scales. They comprise the resident travel survey and the survey of domestic tourists' expenditure in the receiving area. Because these surveys are still conducted face to face or by telephone, the following factors affect data quality even when the survey plan has been scientifically designed in advance:
(1) Impact of subjective and objective factors in questionnaire surveys on data quality
Data quality depends on whether respondents are willing to answer truthfully and on how accurate their answers are. Both the resident travel survey and the survey of domestic tourists' spending in the receiving area rely on respondents recalling what they spent over a given period; the resident travel survey in particular is conducted once a quarter and asks about the most recent trip. As electronic payment spreads, reservations and payments leave consumption records, so reported travel expenses should be somewhat more accurate than in the era of cash. Nevertheless, recall surveys have inherent weaknesses: because memory fades over time and a trip involves many different spending situations, most people find it hard to remember all their travel expenses accurately.
(2) The sample size is too small, which affects the representativeness of the sample and the accuracy of the overall data.
Because survey funding is limited, the national resident travel sampling survey has an annual sample of 80,000, which is clearly small relative to more than 6 billion domestic tourist trips. The provincial surveys of domestic visitor expenditure use a sample of 0.03% of the previous year's domestic arrivals, and some provinces reduce even this in practice, which to some extent lowers the quality of the population estimates derived from the sample. Meanwhile, traditional face-to-face surveys are costly and inefficient. With the development of information technology and the mobile Internet, the low cost and fast turnaround of online surveys have become more prominent, making a larger sample entirely feasible.
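One way to see the effect of sample size is the approximate margin of error of a mean under simple random sampling, z·CV/√n, which depends on the number of responses in each estimation cell. The sketch below makes several assumptions: a coefficient of variation of 1.5 for per-trip spending (hypothetical), a sample spread evenly over 60 cities and 4 quarters, and no design effect from clustering, which in practice would widen the intervals further. Under these assumptions the full-year sample yields a tight national interval, but precision deteriorates sharply once the sample is broken down by city and quarter.

```python
import math

def relative_margin_of_error(n, cv, z=1.96):
    """Approximate 95% relative margin of error of a sample mean under simple
    random sampling: z * CV / sqrt(n). Ignores design effects and the
    finite-population correction."""
    return z * cv / math.sqrt(n)

CV = 1.5  # assumed coefficient of variation of per-trip spending (hypothetical)
national = relative_margin_of_error(80_000, CV)
per_cell = relative_margin_of_error(80_000 // (60 * 4), CV)  # one city in one quarter
print(f"whole sample: ±{national:.1%}, one city-quarter: ±{per_cell:.1%}")
# whole sample: ±1.0%, one city-quarter: ±16.1%
```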
2. Data quality analysis of grassroots reports
In the current tourism statistical survey system, the basic information, visitor reception and finances of tourist hotels, travel agencies and tourist attractions are all collected through reports. In domestic tourism statistics these reports are used mainly to estimate the numbers of overnight tourists and day-trip tourists. The main factors affecting the quality of grassroots report data are:
(1) The directory of accommodation units is incomplete and not updated in a timely manner
Overnight tourists comprise those staying in commercial accommodation, with relatives and friends, or in second homes that they own or are lent free of charge for holidays. The accuracy of overnight arrival figures therefore depends mainly on how complete the directory of accommodation units included in the statistics is. At present, the size threshold for accommodation units included in the Ministry of Culture and Tourism's online direct reporting system varies from place to place, even between cities in the same province. In Hainan, for example, Wenchang City includes accommodation units with 60 beds in the direct reporting system, while Sanya City includes units with 39 beds.
(2) Most accommodation units are small and micro enterprises and self-employed individuals
With the development of all-for-one tourism and the resulting changes in travel patterns and accommodation needs, non-standard accommodation such as homestays, inns and apartment-style family hotels is gaining market share in both rural and urban areas. On the one hand, most of these units are run by self-employed individuals or small and micro enterprises, many of which have not registered with the industry and commerce administration or the tax authorities, so they fall outside the scope of supervision and statistics. On the other hand, they open and close quickly, so traditional methods of building and updating the directory database struggle to capture them.
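The effect of a bed-count threshold on coverage can be sketched with a handful of hypothetical units. The sketch assumes that each threshold means "units with at least N beds are included", which is one reading of the Wenchang and Sanya figures. It only shows the threshold effect: unregistered homestays that never enter the directory at all would lower true coverage further.

```python
# Hypothetical accommodation units: (beds, guest-nights in the period)
units = [(200, 40_000), (80, 12_000), (45, 6_000), (20, 2_500), (8, 1_500)]

def reported_coverage(units, min_beds):
    """Share of all guest-nights captured when only units with at least
    `min_beds` beds are in the direct reporting directory."""
    total = sum(nights for _, nights in units)
    covered = sum(nights for beds, nights in units if beds >= min_beds)
    return covered / total

for threshold in (60, 39):
    print(threshold, f"{reported_coverage(units, threshold):.1%}")
```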
(3) As open scenic areas multiply, day-trip tourists are difficult to count with traditional survey methods
New forms of tourism such as rural tourism, tourist towns, historic districts, tourism complexes and rural complexes are mostly open areas without gates or entrance fees, so the number of day-trip tourists is hard to estimate with traditional statistical methods. Technical questions such as how to delimit the boundaries of an open scenic area and how to distinguish tourists from other people limit the accurate measurement of the visitors such areas receive. Traditional statistical survey methods cannot resolve these obstacles, which directly undermines the accuracy of day-trip statistics.
3. Big data quality analysis
At present, the big data sources most widely used by tourism data centers and big data companies are mobile operator data, UnionPay transaction data and OTA (online travel agency) operating data. Because big data is produced in a fundamentally different way from government statistics (see Table 3), and there are no national standards for cleaning, analyzing and modeling it, using big data for domestic tourism statistics affects data quality to some extent.
Operator data are large in volume, rich in dimensions, accurate, continuous and traceable. By analyzing visitors' origins, behaviour and travel trajectories, they can distinguish different types of tourists, and they have become a new source for estimating arrivals of each type; matched with UnionPay transaction data, they can also yield tourist spending data. The main obstacle to their quality is the lack of unified screening standards. China's tourism statistical survey system defines a tourist technically as someone who travels more than 10 kilometers from their place of residence for more than 6 hours, and operator data mining must further specify how long a person has to stay at the destination to count as a day-trip tourist. Because no national standard for operator data mining has yet been issued, different data companies apply their own rules, producing inconsistent results that reduce the quality and value of operator data.
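The screening rule can be sketched as a classifier over trips inferred from signaling data. This is a hypothetical illustration, not any operator's or company's actual method. The `Trip` structure and coordinates are invented. Time is measured as hours away from home, and a trip that spans a calendar date change is treated as overnight, both simplifying assumptions. The thresholds are parameters because, as noted above, standards differ between companies. The usage line shows how changing the hour threshold alone reclassifies the same trip.

```python
import math
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Trip:
    home: tuple          # (lat, lon) of usual residence inferred from signaling
    destination: tuple   # (lat, lon) of the farthest stay point
    left_home: datetime
    returned_home: datetime

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def classify(trip, min_km=10.0, min_hours=6.0):
    """Apply a distance/duration rule with configurable thresholds."""
    hours = (trip.returned_home - trip.left_home).total_seconds() / 3600
    if haversine_km(trip.home, trip.destination) <= min_km or hours <= min_hours:
        return "non-tourist"
    # Simplification: a trip spanning a calendar date change counts as overnight.
    if trip.returned_home.date() > trip.left_home.date():
        return "overnight"
    return "day-trip"

home = (34.26, 108.94)
scenic = (34.38, 109.28)  # hypothetical destination about 34 km away
t = Trip(home, scenic, datetime(2024, 5, 1, 8), datetime(2024, 5, 1, 13))
print(classify(t), classify(t, min_hours=4))  # non-tourist day-trip
```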
As a source of tourists' travel spending, UnionPay transaction data are more accurate than spending reported in questionnaires, but they do not cover all of tourists' spending at the destination; cash payments, for example, are not captured.
OTA data serve two purposes. First, OTA booking records can serve as a source of tourist spending data, although they are less typical and representative than sample survey data because they cover only travellers who book through OTAs. Second, web crawlers can collect the locations, basic information and some business information of accommodation, catering, entertainment, scenic spots and other units from OTA platforms, greatly reducing the time and cost of building a traditional directory of basic units.
Table 3 Comparison of big data and traditional data production processes
2. Analysis of the impact of outsourcing models on the quality of domestic tourism statistics
1. Impact of outsourcing companies' professional capability on data quality
A number of companies undertaking tourism statistics have emerged across the country, but they differ in background, staff quality and business capability. Because tourism statistics are highly specialized, different types of outsourcing companies affect data quality at different stages and to different degrees. Based on a survey of tourism statistics work in several provinces, cities and prefecture-level cities and long-term tracking of government procurement websites at all levels, the companies undertaking tourism statistics work can be grouped into the following five types:
(1) Traditional survey companies
These companies were restructured from survey organizations formerly affiliated with statistics bureaus. They have long conducted tourist spending surveys and are highly skilled in sample selection, fieldwork, data aggregation and survey report writing. However, because they serve many industries, they can only aggregate survey data and perform simple analyses; they lack the expertise to carry out tourism-specific correlation analysis and to integrate multiple data sources.
(2) Relevant institutions of higher education or individual teachers
In the early days, when tourism administrations lacked professional staff, routine tourism statistics work was carried out in cooperation with teachers or research institutes at local universities. A group of teachers capable of tourism statistics work emerged; drawing on students for questionnaire surveys and on their own strong research and analytical skills, they produced high-quality data and in-depth research reports. Tourism statistics, however, is continuous routine work. Because these providers have no full-time staff, and universities have changed how they assess teachers and manage research funds, part-time work by university teachers cannot fully guarantee the continuity of statistical surveys.
(3) Big data companies
These companies have moved from tourism big data into tourism statistics, for example using mobile phone location data to produce single indicators such as day-trip counts. Tourism statistics, however, require the integration of multiple data sources, there is no unified standard for cleaning, analyzing and modeling big data, and these companies lack accumulated knowledge of tourism and statistics. To take on an entire outsourced tourism statistics package they would also need to build a survey team and a professional team for indicator accounting, which is hard to achieve in the short term, so data quality cannot be guaranteed.
(4) Tourism data companies
Such companies usually originate in the tourism sector and combine industry, academic and research backgrounds. While pursuing basic theoretical research on tourism statistics, they are rooted in local statistical work and closely integrate theory with local practice. With professionals in tourism statistics, big data, market research and data modeling, they are able to ensure the quality of tourism statistics.
(5) Other types of companies and institutions
These include IT companies, business consultancies and various associations. They vary in quality and each has its own strengths, but tourism statistics is only a sideline or a direction of business transformation for them. Their tourism statistics work lacks professional depth, and the quality of the data they deliver is uneven.
2. Impact of current bidding methods on data quality
At present, outsourced tourism statistics services are procured mainly through competitive consultation, negotiated consultation and commissioned tendering, each with its own strengths and weaknesses in fairness and professional rigour. In practice, however, because tourism statistics work is complex and specialized, the results of some local tenders have been unsatisfactory. Some companies treat tourism statistics as a quick way to make money: they rely on personal connections to split projects into small contracts that avoid tendering, or win bids with "flashy" proposals (elaborate bid documents that impress evaluation experts unfamiliar with tourism statistics) or with low prices. Having won a project, such companies are objectively unable to complete it properly, because they cut costs and lack the professional knowledge, survey standards and experience that tourism statistics require, making it difficult to guarantee data quality.
Because the current tourism statistical survey system sets no acceptance standards for data quality or for the content of survey reports, and the project review process leaves considerable room for discretion, the choice of procurement method affects the quality of tourism data to some extent.
III. Paths to improve the quality of domestic tourism statistics
1. Integrate big data and traditional data to build an upgraded tourism data system
As China's tourism statistical survey system continues to improve, many provinces and cities have set up tourism data centers and incorporated big data sources such as data from telecom operators, OTAs, Alipay and airlines. However, the multiple sources held by provincial tourism data centers have not been linked to one another, and the "information silo" problem is pronounced. Tourism data systems should therefore be built at all levels to integrate traditional and big data sources: identify the relationships among the data, apply information fusion theory and algorithms to turn raw data into information and then into tourism data, and process that tourism data into tourism statistics that serve government, enterprises and tourists. The data fusion path is shown in Figure 1.
(1) Data source module
By origin, the data in a tourism data platform fall into four categories: data collected directly within the tourism system; data exchanged with other departments (mainly administrative records or revenue records from statistics, public security, taxation, transportation, commerce, development and reform, culture and sports, housing and construction, forestry, agriculture, water affairs and other departments); purchased market data (obtained from telecom operators, OTAs, social networking sites, airlines and others); and public web data (obtained from the Internet with big data mining techniques such as web crawlers).
Figure 1 Tourism data system architecture
(2) Database system (meta matrix)
Data from different channels are collected by different means, then cleaned, classified and stored hierarchically to form a tourism database. The database can be divided into a tourism supply information system, a tourism market information system and a tourism destination GIS, as shown in Figure 2.
Figure 2 Tourism database subsystem
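The cleaning and classification step can be sketched as routing records into the three subsystems. The field names (`id`, `source`, `topic`, `lat`, `lon`), the category labels and the routing rule are hypothetical choices made for illustration. The sketch only mirrors the four source categories and three subsystems named in the text; it does not describe an existing platform.

```python
from collections import defaultdict

# Labels for the four source categories of the data source module (hypothetical names).
SOURCE_CATEGORIES = {"tourism_system", "department_exchange", "market_purchased", "web_public"}

def clean(records, required=("id", "source", "topic")):
    """Drop records missing required fields, with unknown sources, or with duplicate ids."""
    seen, out = set(), []
    for r in records:
        if any(r.get(k) in (None, "") for k in required):
            continue
        if r["source"] not in SOURCE_CATEGORIES or r["id"] in seen:
            continue
        seen.add(r["id"])
        out.append(r)
    return out

def route(records):
    """File cleaned records into the supply and market subsystems; any record
    with coordinates is also indexed in the destination GIS subsystem."""
    db = defaultdict(list)
    for r in clean(records):
        if r["topic"] in ("supply", "market"):
            db[r["topic"]].append(r)
        if "lat" in r and "lon" in r:
            db["gis"].append(r)
    return db
```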
(3) Application system
Applying established statistical principles and algorithms, the application system produces tourism statistics and special-topic research data to meet the different needs of government, enterprises, tourists and other users. Its most authoritative output is tourism statistics: tourism statistical indicators, accounting algorithms and related components are embedded in the platform at the top-level design stage of the system. Building tourism data systems at all levels is therefore not only an important way to integrate big data with traditional data and improve the quality of domestic tourism statistics, but also an important direction for the reform of China's tourism statistics.
2. Changes in statistical survey methods
1. Changes in the survey method for the number of tourists received by local areas
(1) The integrated survey method of big data + small survey is used to estimate the total number of domestic tourists received in the region
This method uses tourists' location information to estimate the total number of domestic tourists received in a region. Location information can come from mobile phone signaling, GPS, apps and other sources; the most mature and widely used at present is the signaling data of the three major operators. Taking a county or county-level city as an example, domestic tourists can be divided into three groups: those from outside the province, those from other counties in the same province, and those from within the county. Under a unified definition of tourists, records of non-tourists who entered the county are first removed. A sample survey is then used to derive an adjustment coefficient from the shares of visitors who carry no mobile phone (such as the elderly and children) and who carry two, yielding a more accurate estimate of tourist arrivals.
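One simple way to build such a coefficient is sketched below; the text does not give a formula, so this derivation is an assumption. If a share p0 of tourists carry no phone and a share p2 carry two, each tourist carries on average 1 − p0 + p2 devices, so tourists = devices / (1 − p0 + p2). The device counts and survey shares for the three origin groups are hypothetical.

```python
def adjust_signaling_count(observed_devices, share_no_phone, share_two_phones):
    """Convert de-duplicated tourist devices into tourists.

    With p0 = share carrying no phone and p2 = share carrying two, the expected
    number of devices per tourist is (1 - p0 - p2) * 1 + p2 * 2 = 1 - p0 + p2.
    """
    devices_per_tourist = 1 - share_no_phone + share_two_phones
    return observed_devices / devices_per_tourist

# Hypothetical county figures by origin group: (devices, p0, p2) from a sample survey.
groups = {
    "outside province": (120_000, 0.04, 0.10),
    "other counties in province": (300_000, 0.06, 0.05),
    "within county": (80_000, 0.10, 0.03),
}
total = sum(adjust_signaling_count(*g) for g in groups.values())
print(round(total))
```

Estimating the coefficient separately for each origin group allows for the plausible case that phone ownership differs between, say, long-distance and local visitors.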
(2) Multi-source data verification to estimate the total number of domestic tourists received by the region
The main sources of data on regional tourist arrivals currently include the accommodation units' online reporting system, the public security accommodation registration system, transport passenger systems, water and electricity consumption systems, operator signaling systems and OTA booking systems. By matching and cross-checking these sources and using big data storage and analysis technology, an estimation model for domestic tourist arrivals can be built to estimate the region's total.
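As a minimal illustration of cross-checking, the sketch below compares source-level estimates with their median, flags sources that deviate by more than a tolerance, and averages the rest. This median rule is a simple stand-in chosen for illustration, not the estimation model envisioned in the text. It assumes that each source has already been converted into an estimate of the same quantity (total domestic arrivals), and the source names, figures and 20% tolerance are hypothetical.

```python
from statistics import median

def cross_check(estimates, tolerance=0.2):
    """Compare arrival estimates from several sources against their median.

    estimates -- {source name: estimated arrivals}, each already scaled to the
                 same definition of domestic tourist arrivals (assumption)
    Returns (combined estimate, sorted list of flagged sources).
    """
    mid = median(estimates.values())
    flagged = sorted(s for s, v in estimates.items() if abs(v - mid) / mid > tolerance)
    kept = [v for s, v in estimates.items() if s not in flagged]
    if not kept:  # every source disagrees: fall back to the median
        return mid, flagged
    return sum(kept) / len(kept), flagged

combined, flagged = cross_check({
    "accommodation": 1_000_000, "police_registration": 1_050_000,
    "signaling": 1_020_000, "ota": 400_000,
})
print(flagged)  # ['ota']
```

A flagged source is a prompt for investigation (for example, OTA data covering only online bookings), not automatic proof that it is wrong.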
2. Changes in statistical survey methods for tourist spending levels
(1) Internet survey as the main method and traditional manual survey as the supplementary method
Compared with traditional face-to-face and telephone surveys, online surveys are cheaper and faster, offer a more visual interface and can be completed at any time. Online survey methods include:
① Promoting expense-tracking apps. Based on mobile phone location, expense-tracking apps are recommended to tourists by SMS and at key locations (airports, stations, hotels, scenic spots, etc.). The apps should be easy to use and flexible about the level of spending detail, and tourists can be encouraged to record their spending through incentives. A complete record of travel expenses is in effect a record of residents' travel spending, so if promoted nationwide such apps could partly supplement the resident travel survey.
② Online questionnaire surveys. Different questionnaires can be designed for different survey locations and periods. A daily expenditure questionnaire, filled in day by day, supports real-time data collection, makes the questionnaire much easier for tourists to complete and improves the accuracy of the collected data.
③ Intelligent questionnaire surveys. Drawing on the large stock of existing questionnaires and accumulated data on survey scenarios, machine learning can be used to build interactive human-computer questionnaires. These can raise tourists' willingness to take part and, once deployed at scale, greatly reduce survey costs.
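Daily diary entries of the kind described above can be aggregated into average daily expenditure and spending structure. In the sketch below, the entry format and figures are hypothetical. Average daily spend is computed as total spending divided by person-days, which assumes that every travel day has at least one recorded entry.

```python
from collections import defaultdict

# Hypothetical daily diary entries: (respondent, date, category, amount in yuan)
entries = [
    ("r1", "2024-05-01", "lodging", 300), ("r1", "2024-05-01", "food", 100),
    ("r1", "2024-05-02", "food", 80),     ("r1", "2024-05-02", "tickets", 120),
    ("r2", "2024-05-01", "food", 50),     ("r2", "2024-05-01", "transport", 50),
]

def summarize(entries):
    """Return (average spend per person-day, share of spending by category)."""
    person_days = {(e[0], e[1]) for e in entries}
    total = sum(e[3] for e in entries)
    by_category = defaultdict(float)
    for _, _, category, amount in entries:
        by_category[category] += amount
    structure = {c: v / total for c, v in by_category.items()}
    return total / len(person_days), structure

avg_daily, structure = summarize(entries)
print(round(avg_daily, 2))  # 233.33
```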
(2) Cross-referencing multiple data sources to mine the recorded spending of matched samples
Tourists generate large amounts of data of many kinds while travelling. By mining tourists' mobile phone location data, UnionPay transaction data and other data while protecting user privacy, their spending away from home or locally can be extracted and then combined with sample surveys to obtain a relatively complete set of tourist expenditure data.
3. Methods for integrating online public data with departmental exchange data
This method uses web crawlers to improve the creation and updating of the directory of tourism supply units. Starting from the existing directory of tourism activity units, statistics and tourism departments at all levels should, first, update the directory every six months using administrative records from market regulation, taxation, public security and other departments; and second, crawl information on accommodation, catering, entertainment, scenic spots and other units from major online platforms (Ctrip, Qunar, Mafengwo, Airbnb, Dianping, Meituan, etc.), then update the directory quarterly after cross-checking the sources and verifying by telephone and on site, so that the directory stays up to date.
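The multi-source comparison step can be sketched as matching normalized name and address keys. The normalization rule and data below are hypothetical. Real matching would need fuzzier comparison (abbreviations, Chinese address formats), which is why the sketch only produces candidates for the telephone and on-site verification the text describes. Units listed by both administrative records and OTAs could reasonably be verified first.

```python
import re

def _norm(s):
    """Lowercase and drop whitespace, punctuation and underscores."""
    return re.sub(r"[\W_]+", "", s.lower())

def unit_key(name, address):
    return f"{_norm(name)}|{_norm(address)}"

def update_directory(existing, admin_records, crawled):
    """Compare the directory with administrative records and crawled listings.

    Returns (candidate new units -> set of sources listing them,
             existing units seen in neither source, to check for closure).
    """
    current = {unit_key(*u) for u in existing}
    seen, candidates = set(), {}
    for source, units in (("admin", admin_records), ("ota", crawled)):
        for u in units:
            k = unit_key(*u)
            seen.add(k)
            if k not in current:
                candidates.setdefault(k, set()).add(source)
    return candidates, sorted(current - seen)

existing = [("Sunrise Hotel", "1 Main St"), ("Old Guesthouse", "2 River Rd")]
admin = [("Sunrise Hotel", "1 Main St."), ("Lakeside Inn", "5 Lake Rd")]
crawled = [("Lakeside  Inn", "5 Lake Rd"), ("Bamboo Homestay", "9 Hill Ln")]
new_units, possibly_closed = update_directory(existing, admin, crawled)
print(possibly_closed)  # ['oldguesthouse|2riverrd']
```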
3. Establishing a standardized management system for outsourced tourism statistics
In recent years tourism statistics have attracted wide public attention, and data quality is the focus of the National Bureau of Statistics' supervision of tourism statistics. As the competent department for tourism industry statistics, the Ministry of Culture and Tourism should strengthen the quality management of tourism statistics. Given the current disorderly market competition, it should establish a management system for outsourced tourism statistics as soon as possible, incorporate it into the tourism statistical survey system, and use standards and normative management to guide companies in improving their capabilities, thereby ensuring the quality of tourism statistics.
The management system for outsourced tourism statistics should include: ① an evaluation index system and national standards for assessing data quality at each stage (data collection, aggregation, modeling, analysis and research reports) at the provincial, municipal and county levels; ② a national team of tourism statistics experts responsible for project tendering, review and evaluation; ③ a qualification management system for tourism statistics, modeled on the qualification management measures for tourism planning.
Contributor: Li Ying, School of Economics and Management, Northwest University