Hierarchical Clustering of Population Pyramids Presented as Histogram Symbolic Data

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
The population pyramid is a very popular presentation of the age-sex distribution of the human population of a particular region. Its shape is influenced not only by demographic indicators, but also by many other social and political characteristics, such as birth control policy, wars, and lifestyle. In the paper Clustering of population pyramids (Korenjak-Cerne, Kejzar, Batagelj, Informatica, 2008), clusters of world countries with similar pyramid shapes were obtained using Ward's hierarchical clustering. The corresponding cluster shapes can offer field-related researchers additional insight about countries. In order to get clusters where the gender and size of the population are also taken into account, we present the data as histogram symbolic data (Billard, Diday, 2006). For their analysis we adapt the generalized Ward's hierarchical clustering procedure (Batagelj, 1988). The changes in the pyramid shapes, and also the changes of the countries inside the main clusters, will be examined for the years 1996, 2001, and 2006.

Author(s): Natasa Kejzar, Simona Korenjak-Cerne, Vladimir Batagelj
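
The Ward's clustering step named in the abstract above can be illustrated with a minimal sketch. It implements classical Ward's criterion on plain frequency vectors, not the generalized procedure adapted for histogram symbolic data that the paper describes. The function names, the input format, and the toy age-group shares are all assumptions made for illustration.

```python
from itertools import combinations


def ward_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def ward_clustering(histograms, n_clusters):
    """Agglomerative clustering with classical Ward's criterion.

    histograms: dict mapping a name to a list of relative frequencies
    (a hypothetical input format, not the paper's data structure).
    Returns a list of sets of names.
    """
    # Each cluster is stored as (members, size, centroid).
    clusters = [({name}, 1, list(h)) for name, h in histograms.items()]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge increases the error sum of squares least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]][1], clusters[p[0]][2],
                                    clusters[p[1]][1], clusters[p[1]][2]),
        )
        (ma, na, ca), (mb, nb, cb) = clusters[i], clusters[j]
        merged = (ma | mb, na + nb,
                  [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [members for members, _, _ in clusters]


# Made-up toy "pyramids": shares of young / middle / old age groups.
toy = {
    "A": [0.50, 0.35, 0.15],
    "B": [0.48, 0.36, 0.16],
    "C": [0.20, 0.45, 0.35],
    "D": [0.22, 0.44, 0.34],
}
result = ward_clustering(toy, 2)
print(sorted(sorted(c) for c in result))
```

In this toy input the two "young" profiles and the two "old" profiles end up in separate clusters. Taking population size into account, as the abstract proposes, would require weighting each unit rather than giving every unit size 1.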

Statistical Modulation of a Human Health Problem in Albania

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
Air pollution from industrial activity is very dangerous to human health. This paper analyzes data collected at three sites in the south of Albania: two polluted and one not. Using ANOVA, we analyze the influence of the site on hematological and pneumological measures. We build a multivariable regression model for the pneumological data using smoke, time of stay, and age as independent variables. The covariance method applied to the model shows that, once the smoke variable is set aside, there is no difference among the three sites in the pneumological field. The dependence of smoke on the time of stay is shown using the multi-ANOVA method.

Author(s): Luela Prifti, Etleva Beliu, Shpetim Shehu
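
The one-way ANOVA comparison of sites mentioned in the abstract above can be sketched as follows. This is only an illustration of the F statistic, assuming one numeric measurement per subject; the function name and the three samples are invented and are not data from the study.

```python
def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over a list of samples."""
    k = len(groups)                      # number of groups (sites)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Residual variation inside each group.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented measurements for three hypothetical sites.
site_1 = [1.0, 2.0, 3.0]
site_2 = [2.0, 3.0, 4.0]
site_3 = [5.0, 6.0, 7.0]
f_stat = one_way_anova_f([site_1, site_2, site_3])
print(f_stat)
```

A large F value relative to the F distribution with (k - 1, n - k) degrees of freedom suggests that at least one site mean differs from the others.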

Applications of Wavelet-Based Functional Mixed Models to Proteomics and Genomics Data

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
Various genomic and proteomic assays yield high-dimensional, irregular functional data. For example, MALDI-MS yields proteomics data consisting of one-dimensional spectra with many peaks, 2D gel electrophoresis and LC-MS yield two-dimensional images with spots that correspond to peptides present in the sample, and array CGH or SNP chip arrays yield one-dimensional functions of copy number information along the genome. In this talk, I will discuss how to identify candidate biomarkers for various types of proteomic and genomic data using Bayesian wavelet-based functional mixed models. This approach models the functions in their entirety, and so avoids reliance on peak or spot detection methods. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while adjusting for clinical or experimental covariates that may affect both the intensities and locations of the peaks and spots in the data. I will demonstrate how to identify regions of the functions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate at a pre-specified level. Time allowing, I will also demonstrate how to use this framework as the basis for classifying future samples based on their proteomic and genomic profiles in a way that can also combine information across multiple sources of data, including proteomic, genomic, and clinical. These methods will be applied to a series of proteomic and genomic data sets from cancer-related studies.

Author(s): Jeffrey S. Morris

A semiparametric Bayesian model for examiner agreement in periodontal research

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
An important measure of the severity of periodontal disease is the probing pocket depth (PPD), which is measured on up to 6 sites for each tooth in the mouth. Establishing and monitoring agreement among multiple examiners is critical to high quality periodontal research. We develop a Bayesian hierarchical model that links the true, observed and recorded values of PPD, permitting correlation among the measures within patient. Tooth-site-specific examiner effects are modeled as arising from a Dirichlet process mixture, facilitating discovery of subgroups among the periodontal sites according to degree of agreement with a reference examiner. We analyze data from a PPD calibration study and illustrate the effects of correlation on assessments of examiner agreement.

Author(s): Elizabeth H. Slate and Elizabeth G. Hill

A Bayesian Approach to Inferring the Contribution of Unobserved Ground Conditions to Observed Scores in Sports: The Example of Cricket

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
This paper is part of a wider research programme that uses a dynamic-programming approach to model the choices teams and players make about how much risk to take in international cricket. An important confounding variable in this analysis is the ground conditions (size and shape of stadium, condition of playing surface, and weather conditions) that affect the trade-off between risk and return that teams and players face. This variable does not exist in our historical data set and would in any event be very difficult to observe accurately on the day of a match. In this paper, we consider a way of estimating a distribution for the ground conditions using only the information contained in the scores and result of the match. In our approach we use the difference between the cumulative distribution function of scores and a probit estimate of the probability of each score being a winning score in order to infer the extent to which high scores on average reflect easy conditions rather than good performance. Using a Monte Carlo method we estimate the percentage of the variation in total scores that is due to the variation in conditions, and we subsequently use Bayes' law to estimate a distribution of conditions for each match. We develop our method using the example of cricket and we outline some potential applications of the method to other sporting contests.

Author(s): Scott R. Brooker, Seamus Hogan

Teaching mathematics to engineers with Mathematica

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Aug 2017

Abstract
In recent years, the use of computing tools in education has become widespread; in particular, interesting mathematical software with large-scale applications has been developed. Such tools have proved to be a great help in learning, not only because they make rapid calculation possible, but also because they let students grasp concepts more clearly. Moreover, in training future engineers, it is important that they learn to interpret numerical results correctly and fluently. They should also be able to design their own programs to solve problems. In this sense, mathematical software that integrates numeric and symbolic computation is an unquestionable help. In this work, the academic results of students learning with Mathematica are analyzed. The software was used to teach "Numeric Calculus", an optional subject that students can choose from their second year at university onward, so all of them have already taken a first-year course on the mathematical foundations of engineering. The number of students was never more than twenty-five, in order to allow a good evaluation of the efficiency of the methodology employed. The results correspond to the last five years at an engineering school.

Author(s): Conchita Marin

Generalized self-consistency algorithms for mixture models

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Aug 2017

Abstract
A generalization of the concepts of mixture, self-consistency, expectation, and imputation, and of the associated Quasi-EM algorithms, is presented and applied to a multinomial logistic model, a family of univariate survival models, and multivariate survival models motivated by frailties. A subclass of Archimedean copula models is identified that is characterized by monotonically convergent Quasi-EM algorithms. A connection is established to recently proposed MM algorithms that extend the EM concept without using missing data arguments.

Author(s): Alex Tsodikov

Anisotropic and Inhomogeneous Hidden Markov Models for the Analysis of Water Quality Spatio-Temporal Data on a Cylindrical Lattice

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Aug 2017

Abstract
Motivated by a real data problem, an anisotropic and inhomogeneous spatio-temporal hidden Markov model (HMM) with an unknown number of states is built on a cylindrical lattice. A Bayesian inference procedure, based on a reversible jump Markov chain Monte Carlo algorithm, is proposed to estimate both the dimension and the unknown parameters of the model. The real data problem is the modelling in time and in space of the concentrations of three forms of dissolved inorganic nitrogen recorded monthly by the Scottish Environmental Protection Agency in the 56 major Scottish rivers. The 56 gauging stations can be linked to create a circle, and so the spatio-temporal data set can be displayed on a cylinder. The states of the hidden Markov process allow the classification of the observations into a small set of groups. The different states can represent increasing levels of pollution. In the Bayesian model presented here, the hidden Markov process is an anisotropic and inhomogeneous Potts model. The Potts model is widely used in statistical mechanics to model the spins of elementary particles that are placed on a lattice. Here the hidden Potts model is assumed to be anisotropic (i.e., variant under rotations) and inhomogeneous (i.e., variant under translations). Anisotropy is due to the presence of two different parameters describing the link between neighbouring pixels: one for the temporal relation and the other for the spatial relation. Inhomogeneity is established by assuming that the spatial relation is a function of the physical distance between two neighbouring sites.

Author(s): Luigi Spezia
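
One way to write down the anisotropic, inhomogeneous Potts prior described in the abstract above is sketched below. The notation is an assumption, not the author's: $z_{s,t}$ is the hidden state at station $s$ and month $t$, $\beta_T$ is the temporal interaction parameter, $\beta_S(\cdot)$ is an unspecified function of the physical distance $d_{s,s+1}$ between neighbouring stations, and the station index is taken modulo 56 so that the stations close into a circle.

```latex
% Hedged sketch of a Potts prior on a cylindrical lattice (notation assumed).
% s = 1,...,56 indexes stations on the circle (s + 1 taken modulo 56),
% t = 1,...,T indexes months along the axis of the cylinder.
\[
  p(z) \;\propto\; \exp\!\Bigg(
      \beta_T \sum_{s=1}^{56} \sum_{t=1}^{T-1}
        \mathbf{1}\{ z_{s,t} = z_{s,t+1} \}
      \;+\;
      \sum_{s=1}^{56} \sum_{t=1}^{T}
        \beta_S\big(d_{s,s+1}\big)\,
        \mathbf{1}\{ z_{s,t} = z_{s+1,t} \}
  \Bigg)
\]
```

Using separate temporal and spatial terms reflects the anisotropy, and letting the spatial weight depend on $d_{s,s+1}$ reflects the inhomogeneity; the abstract does not state the functional form of that dependence, so it is left generic here.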

Parametric/Nonparametric Hybrid Two Sample Problem by Empirical Likelihood with Censored Data

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Aug 2017

Abstract
We use the hazard formulation of the censored data empirical likelihood to study the two-sample parametric/nonparametric hybrid model. We demonstrate that a proper empirical likelihood definition that takes censoring into account results in an empirical likelihood ratio test with a proper chi-square limiting distribution under the null hypothesis. We illustrate the use of the proposed test by testing the ROC curve with censored data, among other examples. Results are compared to Zhou & Liang (2005, Biometrika). Joint work with Hua Liang, Department of Biostatistics, University of Rochester.

Author(s): Mai Zhou

Bayesian analysis of physiologically based pharmacokinetics modelling of perchloroethylene in humans

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Aug 2017

Abstract
This study aims to estimate population distributions of PBPK model parameters and to make a dose reconstruction with clinical data from uncontrolled studies. Perchloroethylene (PCE) is a widely distributed pollutant in the environment. The cancer risks of PCE at low exposures are uncertain. PCE occurs widely in dry cleaning establishments and can also be found in indoor air. However, the concentrations of PCE are mostly below 1 ppm. Therefore, it is very important to assess cancer risks at these low concentrations. A human physiologically based pharmacokinetics (PBPK) model was used to quantify tissue doses of PCE and its key metabolite, trichloroacetic acid (TCA), after inhalation exposures. This PBPK model was integrated with a statistical hierarchical model to account for intraindividual variation, interindividual variation, measurement error, and differences between study methods. A Bayesian approach, Markov chain Monte Carlo analysis, was employed to analyze clinical data obtained from controlled studies. The data are on alveolar or exhaled breath concentrations of PCE, blood concentrations of PCE and TCA, and urinary excretion of TCA. The posterior distributions of the PBPK model parameters were obtained. The predictive ability of the posteriors was satisfactory: posterior predictions fit much better than prior predictions.

Author(s): Junshan Qiu